Model Support List

The following table lists the models currently supported by LiteBoost and their feature support status.

| Model | Hardware | Parallel | Attention | Quantization | Fused Operator | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| Wan2.1-T2V-1.3B | Atlas 300I Duo Inference Card; Atlas 800I A2 Inference Server | USP (CP) | NPU Flash Attention (Flash Attention 3→2→npu_prompt_flash_attention) | Not supported | Not supported | RoPE rewrite (float32 real-valued arithmetic + cache); supports VACE variant |
| Wan2.2-TI2V-5B | Atlas 300I Duo Inference Card; Atlas 800I A2 Inference Server | USP (CP) + DP (temporal tiling) | NPU Flash Attention (Flash Attention 3→2→npu_prompt_flash_attention) | Not supported | Not supported | RoPE rewrite (float32 real-valued arithmetic + cache); VAE DP temporal tiling for encode/decode |

Column descriptions:

  • Model: Model name, linked to the corresponding README in the LiteBoost source tree.

  • Hardware: Supported Ascend hardware platforms.

  • Parallel: Parallelism strategies applied by ParallelManager. USP (CP) = Ulysses Sequence Parallel (Context Parallel) for DiT; DP = Data Parallel temporal tiling for VAE.

  • Attention: The attention implementation substituted for the model's default. Backends are tried in order with automatic fallback: Flash Attention 3 → Flash Attention 2 → npu_prompt_flash_attention.

  • Quantization: Whether quantization is supported.

  • Fused Operator: Whether C++ fused operators (registered via TORCH_LIBRARY and invoking CANN aclnn interfaces) are used. RoPE rewrite is a Python-layer optimization and is not classified as a fused operator.

  • Notes: Additional optimization details.
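
The auto-fallback chain described under Attention can be pictured as "try each backend in order and keep the first one that loads". The following is a minimal sketch of that idea only, not LiteBoost's actual code: `select_attention` and the three `load_*` functions are hypothetical names, and the loaders are stubs that simulate a machine where only the last backend is available.

```python
from typing import Callable, List, Tuple

# Each entry pairs a backend name with a loader that either returns a
# callable attention function or raises ImportError/AttributeError.
Backend = Tuple[str, Callable[[], Callable]]


def select_attention(backends: List[Backend]) -> Tuple[str, Callable]:
    """Return (name, fn) for the first backend whose loader succeeds."""
    errors = []
    for name, loader in backends:
        try:
            fn = loader()
        except (ImportError, AttributeError) as exc:
            # Remember why this backend was skipped, then try the next one.
            errors.append(f"{name}: {exc}")
            continue
        return name, fn
    raise RuntimeError("no attention backend available: " + "; ".join(errors))


# Hypothetical stub loaders (assumption: a real loader would import the
# backend's module here instead of raising or returning a placeholder).
def load_fa3() -> Callable:
    raise ImportError("Flash Attention 3 is not installed")


def load_fa2() -> Callable:
    raise ImportError("Flash Attention 2 is not installed")


def load_npu_pfa() -> Callable:
    return lambda q, k, v: "npu_prompt_flash_attention placeholder"


CHAIN: List[Backend] = [
    ("flash_attention_3", load_fa3),
    ("flash_attention_2", load_fa2),
    ("npu_prompt_flash_attention", load_npu_pfa),
]

name, attn = select_attention(CHAIN)
print(name)
```

Resolving the backend once at start-up, rather than on every attention call, keeps the fallback cost out of the inference loop; whether LiteBoost does this is not stated in the table above.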
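
The "RoPE rewrite (float32 real-valued arithmetic + cache)" note can be read as replacing a complex-number rotation with equivalent cos/sin arithmetic and reusing precomputed tables. The sketch below illustrates that reading under several assumptions that the table does not confirm: a 1D position index, interleaved pairing of channels, and a base of 10000. It uses plain Python floats (double precision) rather than float32 tensors, and all function names are hypothetical.

```python
import math
from functools import lru_cache


@lru_cache(maxsize=None)
def rope_table(seq_len: int, dim: int, theta: float = 10000.0):
    """Build (cos, sin) tables of shape [seq_len][dim // 2]; cached per shape."""
    freqs = [theta ** (-2.0 * i / dim) for i in range(dim // 2)]
    cos = tuple(tuple(math.cos(p * f) for f in freqs) for p in range(seq_len))
    sin = tuple(tuple(math.sin(p * f) for f in freqs) for p in range(seq_len))
    return cos, sin


def apply_rope_real(x, pos, cos, sin):
    """Rotate each pair (x[2i], x[2i+1]) using only real multiplies and adds."""
    out = []
    for i in range(len(x) // 2):
        a, b = x[2 * i], x[2 * i + 1]
        c, s = cos[pos][i], sin[pos][i]
        out.append(a * c - b * s)  # real part of (a + ib) * (c + is)
        out.append(a * s + b * c)  # imaginary part of (a + ib) * (c + is)
    return out


def apply_rope_complex(x, pos, theta: float = 10000.0):
    """Reference version using complex multiplication, for comparison only."""
    dim = len(x)
    out = []
    for i in range(dim // 2):
        angle = pos * theta ** (-2.0 * i / dim)
        z = complex(x[2 * i], x[2 * i + 1]) * complex(math.cos(angle), math.sin(angle))
        out.extend([z.real, z.imag])
    return out


x = [0.5, -1.0, 2.0, 0.25]
cos, sin = rope_table(8, len(x))
real_out = apply_rope_real(x, 3, cos, sin)
ref_out = apply_rope_complex(x, 3)
print(all(math.isclose(r, c, abs_tol=1e-12) for r, c in zip(real_out, ref_out)))
```

The real-valued form avoids complex dtypes entirely, which is useful on backends where complex tensor support is limited, and the cache means the trigonometric tables are built once per (sequence length, dimension) pair.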