lite_boost

LiteBoost is an inference acceleration toolkit for Ascend hardware, built on MindSpore Lite. It provides high-performance custom operators, multi-card parallel inference, quantization, sparsity, and other inference acceleration features.

Parallel

lite_boost.parallel.initialize_usp

Initialize the HCCL distributed environment for parallel inference.

lite_boost.parallel.ParallelManager

Modify a supported model in place for distributed parallel inference.

Operators

lite_boost.ops.rain_fusion_attention

Block-sparse fusion attention forward computation.
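
To illustrate what block-sparse attention computes, here is a minimal NumPy reference sketch for a single head. It is not the lite_boost API: the function name `block_sparse_attention`, its parameters, and the `(seq_len, head_dim)` layout are all assumptions made for illustration, and the real operator's signature and mask format are not described on this page.

```python
import numpy as np


def block_sparse_attention(q, k, v, block_mask, block_size):
    """Reference block-sparse attention for one head (hypothetical helper).

    q, k, v:     float arrays of shape (seq_len, head_dim)
    block_mask:  bool array of shape (n_blocks, n_blocks); entry [i, j] is
                 True when query block i attends to key block j
    block_size:  number of tokens per block (must divide seq_len)
    """
    seq_len, head_dim = q.shape
    n_blocks = seq_len // block_size
    assert n_blocks * block_size == seq_len, "seq_len must be a multiple of block_size"
    assert block_mask.shape == (n_blocks, n_blocks)

    scale = 1.0 / np.sqrt(head_dim)
    out = np.zeros(v.shape, dtype=np.float64)

    for i in range(n_blocks):
        rows = slice(i * block_size, (i + 1) * block_size)
        kept = np.flatnonzero(block_mask[i])
        if kept.size == 0:
            continue  # no key block selected: this query block's output stays zero

        # Gather only the key/value tokens that belong to the selected blocks.
        cols = np.concatenate(
            [np.arange(j * block_size, (j + 1) * block_size) for j in kept]
        )
        scores = (q[rows] @ k[cols].T) * scale
        scores -= scores.max(axis=-1, keepdims=True)  # numerically stable softmax
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)
        out[rows] = weights @ v[cols]

    return out
```

With an all-True `block_mask` the sketch reduces to ordinary dense softmax attention; skipped blocks are where a fused kernel would save compute and memory traffic.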

lite_boost.ops.sparse_attention

High-level sparse attention entry point.

lite_boost.ops.recurrent_gated_delta_rule

Recurrent GatedDeltaRule operator: recurrent linear attention for the decode phase, backed by CANN aclnn.
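
As a rough illustration of the recurrence behind a gated delta rule decode step, the NumPy sketch below uses the commonly published formulation S_t = alpha_t * S_{t-1} (I - beta_t k_t k_t^T) + beta_t v_t k_t^T with output o_t = S_t q_t. The function name `gated_delta_rule_step`, its arguments, and the single-head unbatched layout are assumptions for illustration only; the operator's actual signature, tensor layout, and any key normalization are not specified on this page.

```python
import numpy as np


def gated_delta_rule_step(state, q, k, v, alpha, beta):
    """One recurrent decode step of a gated delta rule (hypothetical helper).

    state: (d_v, d_k) memory matrix S_{t-1}
    q, k:  (d_k,) query and key for the current token
    v:     (d_v,) value for the current token
    alpha: scalar decay gate, typically in (0, 1]
    beta:  scalar write strength, typically in [0, 1]
    Returns (output of shape (d_v,), updated state of shape (d_v, d_k)).
    """
    # Gate: decay the whole memory before writing.
    decayed = alpha * state
    # Value the decayed memory currently associates with key k.
    predicted = decayed @ k
    # Delta rule: move the stored value for k toward v by a fraction beta.
    # Algebraically equal to alpha * S (I - beta k k^T) + beta v k^T.
    new_state = decayed + beta * np.outer(v - predicted, k)
    # Read out with the query.
    out = new_state @ q
    return out, new_state
```

Because the state has a fixed size of d_v x d_k, each decode step costs the same regardless of how many tokens came before, which is the usual motivation for a recurrent form at inference time.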