Model Support List
The following table lists the models currently supported by LiteBoost and their feature support status.
| Model | Hardware | Parallel | Attention | Quantization | Fused Operator | Notes |
|---|---|---|---|---|---|---|
|  | Atlas 300I Duo Inference Card | USP (CP) | NPU Flash Attention | Not supported | Not supported | RoPE rewrite (float32 real-valued arithmetic + cache) |
|  | Atlas 300I Duo Inference Card | USP (CP) + DP (temporal tiling) | NPU Flash Attention | Not supported | Not supported | RoPE rewrite (float32 real-valued arithmetic + cache) |
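The RoPE rewrite in the Notes column computes rotary position embeddings with float32 real-valued arithmetic and a cache. A common way to do this, assumed here, is to replace the complex-number rotation `x * exp(i * angle)` with the equivalent real cos/sin form and to cache the cos/sin tables per (sequence length, head dimension). The NumPy sketch below illustrates that general technique and is not LiteBoost's implementation: the interleaved pair layout, `theta = 10000.0`, the cache size, and the function names are all assumptions, and a PyTorch version would operate on `torch.Tensor` instead.

```python
from functools import lru_cache

import numpy as np


@lru_cache(maxsize=8)
def rope_cos_sin(seq_len: int, head_dim: int, theta: float = 10000.0):
    """Build (and cache) float32 cos/sin tables of shape (seq_len, head_dim // 2).

    Hypothetical helper: the cache key is (seq_len, head_dim, theta).
    """
    if head_dim % 2:
        raise ValueError("head_dim must be even for RoPE")
    inv_freq = 1.0 / (theta ** (np.arange(0, head_dim, 2, dtype=np.float32) / head_dim))
    angles = np.outer(np.arange(seq_len, dtype=np.float32), inv_freq)
    cos = np.cos(angles).astype(np.float32)
    sin = np.sin(angles).astype(np.float32)
    # Cached arrays are shared between calls, so make them read-only.
    cos.flags.writeable = False
    sin.flags.writeable = False
    return cos, sin


def apply_rope_real(x: np.ndarray, theta: float = 10000.0) -> np.ndarray:
    """Rotate interleaved pairs (x[2k], x[2k+1]) of x with shape (..., seq_len, head_dim).

    Uses only real float32 multiplies and adds, no complex dtype.
    """
    seq_len, head_dim = x.shape[-2], x.shape[-1]
    cos, sin = rope_cos_sin(seq_len, head_dim, theta)
    x = x.astype(np.float32)
    x_even, x_odd = x[..., 0::2], x[..., 1::2]
    out = np.empty_like(x)
    out[..., 0::2] = x_even * cos - x_odd * sin
    out[..., 1::2] = x_even * sin + x_odd * cos
    return out


def apply_rope_complex(x: np.ndarray, theta: float = 10000.0) -> np.ndarray:
    """Reference complex-valued formulation, used only to check equivalence."""
    seq_len, head_dim = x.shape[-2], x.shape[-1]
    inv_freq = 1.0 / (theta ** (np.arange(0, head_dim, 2) / head_dim))
    rot = np.exp(1j * np.outer(np.arange(seq_len), inv_freq))
    rotated = (x[..., 0::2] + 1j * x[..., 1::2]) * rot
    out = np.empty(x.shape, dtype=np.float64)
    out[..., 0::2] = rotated.real
    out[..., 1::2] = rotated.imag
    return out
```

Both functions compute the same rotation. The real-valued form avoids complex dtypes, which suits a backend where complex kernels are slow or unavailable, and the cache means repeated denoising steps at the same sequence length reuse one table instead of recomputing trigonometric functions.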
Column descriptions:

- Model: Model name, linked to the corresponding README in the LiteBoost source tree.
- Hardware: Supported Ascend hardware platforms.
- Parallel: Parallelism strategies applied by `ParallelManager`. USP (CP) = Ulysses Sequence Parallel (Context Parallel) for the DiT; DP = Data Parallel temporal tiling for the VAE.
- Attention: Attention implementation replacement. The auto-fallback chain is Flash Attention 3 → Flash Attention 2 → `npu_prompt_flash_attention`.
- Quantization: Whether quantization is supported.
- Fused Operator: Whether C++ fused operators (registered via `TORCH_LIBRARY` and invoking CANN `aclnn` interfaces) are used. The RoPE rewrite is a Python-layer optimization and is not classified as a fused operator.
- Notes: Additional optimization details.
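The auto-fallback chain for the Attention column can be pictured as an ordered probe that returns the first usable backend. This is a minimal sketch, not LiteBoost's code. The mapping from each backend to a Python module (`flash_attn_interface` for Flash Attention 3, `flash_attn` for Flash Attention 2, `torch_npu` for `npu_prompt_flash_attention`) and the names `FALLBACK_CHAIN` and `select_attention_backend` are assumptions. A real selector would likely also check the device type and GPU architecture rather than relying on import availability alone.

```python
import importlib.util
from typing import Callable, Optional, Sequence, Tuple

# Assumed (backend name, providing module) pairs in priority order.
# The module names are illustrative assumptions, not taken from LiteBoost.
FALLBACK_CHAIN: Sequence[Tuple[str, str]] = (
    ("flash_attention_3", "flash_attn_interface"),
    ("flash_attention_2", "flash_attn"),
    ("npu_prompt_flash_attention", "torch_npu"),
)


def module_available(module_name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def select_attention_backend(
    chain: Sequence[Tuple[str, str]] = FALLBACK_CHAIN,
    is_available: Callable[[str], bool] = module_available,
) -> Optional[str]:
    """Walk the chain in order and return the first backend whose module is available.

    Returns None when no backend in the chain is usable.
    """
    for backend_name, module_name in chain:
        if is_available(module_name):
            return backend_name
    return None


if __name__ == "__main__":
    # On an Ascend host where only torch_npu is installed, the chain falls
    # through to the NPU kernel; on a machine with none of them, this prints None.
    print(select_attention_backend())
```

Passing `is_available` as a parameter keeps the probe order testable on machines that have none of the accelerator libraries installed.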