lite_boost
LiteBoost is an inference acceleration toolkit for Ascend hardware, built on top of MindSpore Lite. It provides inference acceleration capabilities including high-performance custom operators, multi-card parallel inference, quantization, and sparsity.
Parallel
Initialize the HCCL distributed environment for parallel inference.
Modify a supported model in-place for distributed parallel inference.
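The entries above do not say how a model is partitioned across cards. The sketch below illustrates one common scheme, column-wise tensor parallelism for a single linear layer. It is offered under that assumption only. The ranks are simulated in one process: each rank multiplies the input by its own slice of the weight columns, and an all-gather reassembles the full output. Here the all-gather is plain list concatenation, standing in for an HCCL collective. The names `matmul`, `split_columns`, and `column_parallel_linear` are hypothetical, and nothing here calls LiteBoost or HCCL.

```python
def matmul(x, w):
    """x: n x d_in, w: d_in x d_out, both lists of lists."""
    return [[sum(x[r][t] * w[t][c] for t in range(len(w))) for c in range(len(w[0]))]
            for r in range(len(x))]


def split_columns(w, world_size):
    """Give each simulated rank a contiguous slice of the output columns."""
    d_out = len(w[0])
    if d_out % world_size:
        raise ValueError("d_out must be divisible by world_size")
    shard = d_out // world_size
    return [[row[rank * shard:(rank + 1) * shard] for row in w] for rank in range(world_size)]


def column_parallel_linear(x, w, world_size):
    """Hypothetical column-parallel linear layer; ranks are simulated sequentially."""
    shards = split_columns(w, world_size)
    # Each rank computes x @ w_rank independently; no communication is needed here.
    partials = [matmul(x, shard) for shard in shards]
    # Stand-in for an all-gather along the feature dimension: concatenate in rank order.
    return [[val for part in partials for val in part[r]] for r in range(len(x))]


x = [[1, 2], [3, 4]]
w = [[1, 0, 2, 1], [0, 1, 1, 3]]
print(column_parallel_linear(x, w, world_size=2))  # [[1, 2, 4, 7], [3, 4, 10, 15]]
```

The sharded result equals the unsharded `matmul(x, w)`, which is the invariant any in-place parallel rewrite of a layer must preserve. The practical gain is that each card stores and computes only `1/world_size` of the layer's weights.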
Operators
Block-sparse fusion attention forward computation.
High-level sparse attention entry point.
Recurrent GatedDeltaRule operator: recurrent linear-attention decode backed by CANN aclnn.
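The entries above give only one-line descriptions. The pure-Python sketch below is a hedged reference for what two of these operators compute. It covers two pieces:

- Single-head block-sparse attention, where a boolean block mask decides which key blocks each query block scores.
- One decode step of the gated delta rule, in the form S_t = α_t S_{t−1}(I − β_t k_t k_tᵀ) + β_t v_t k_tᵀ with output o_t = S_t q_t.

The function names `block_sparse_attention` and `gated_delta_rule_step`, the list-of-lists layout, and all parameter names are hypothetical. This is not the LiteBoost API. The fused Ascend kernels may differ in tensor layout (for example, a transposed state), in q/k normalization and scaling, and in masking conventions.

```python
import math


def block_sparse_attention(q, k, v, block_mask, block_size):
    """Single-head block-sparse attention (hypothetical helper, not the LiteBoost API).

    q: n_q x d, k: n_k x d, v: n_k x d_v (lists of lists).
    block_mask[qb][kb] is True when query block qb may attend to key block kb.
    """
    d_v = len(v[0])
    scale = 1.0 / math.sqrt(len(q[0]))
    out = []
    for i, q_row in enumerate(q):
        qb = i // block_size
        # Keys in masked-out blocks are never scored, which is where the savings come from.
        keys = [j for j in range(len(k)) if block_mask[qb][j // block_size]]
        if not keys:
            out.append([0.0] * d_v)  # assumption: a fully masked query row outputs zeros
            continue
        scores = [scale * sum(a * b for a, b in zip(q_row, k[j])) for j in keys]
        m = max(scores)  # subtract the row max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append([sum(wt * v[j][c] for wt, j in zip(weights, keys)) / z for c in range(d_v)])
    return out


def gated_delta_rule_step(S, q, k, v, alpha, beta):
    """One recurrent decode step (hypothetical helper, not the LiteBoost API).

    S: d_v x d_k state; q, k: length d_k; v: length d_v.
    alpha in (0, 1] decays the old state; beta in [0, 1] is the write strength.
    Returns S_new = alpha * S @ (I - beta * k k^T) + beta * v k^T and o = S_new @ q.
    """
    d_v, d_k = len(S), len(S[0])
    # S @ k: the value the current state associates with key k
    Sk = [sum(S[i][j] * k[j] for j in range(d_k)) for i in range(d_v)]
    # Expanded form of the update: alpha * S + beta * (v - alpha * S k) k^T
    S_new = [[alpha * S[i][j] + beta * (v[i] - alpha * Sk[i]) * k[j] for j in range(d_k)]
             for i in range(d_v)]
    o = [sum(S_new[i][j] * q[j] for j in range(d_k)) for i in range(d_v)]
    return S_new, o


# Zero queries give uniform weights over each query's permitted keys.
q = [[0.0, 0.0]] * 4
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
v = [[1.0], [3.0], [10.0], [20.0]]
print(block_sparse_attention(q, k, v, [[True, False], [False, True]], block_size=2))
# [[2.0], [2.0], [15.0], [15.0]]

S = [[0.0, 0.0], [0.0, 0.0]]
S, o = gated_delta_rule_step(S, q=[1.0, 0.0], k=[1.0, 0.0], v=[2.0, 3.0], alpha=1.0, beta=1.0)
S, o = gated_delta_rule_step(S, q=[1.0, 0.0], k=[1.0, 0.0], v=[5.0, 7.0], alpha=1.0, beta=1.0)
print(o)  # [5.0, 7.0]
```

With `beta = 1` and a unit-norm key, a second write to the same key overwrites the stored value rather than adding to it. That is the delta rule's distinguishing property. Setting `alpha < 1` additionally decays everything previously stored. In decode, only the fixed-size state `S` is carried between tokens, so per-step cost does not grow with sequence length.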