# LiteBoost Introduction

## Overview

LiteBoost is an inference acceleration toolkit for Ascend hardware, built on top of MindSpore Lite. It provides high-performance custom operators, multi-card parallel inference, quantization and sparsity, and other inference acceleration capabilities.

LiteBoost builds on the PyTorch interface and calls Ascend CANN `aclnn` interfaces from C++ custom operators. It combines optimized Attention and RoPE implementations at the Python layer with HCCL multi-card communication to achieve end-to-end inference acceleration.

## Core Capabilities

### High-Performance Custom Operators

- Integrates CANN fused operators behind an easy-to-use interface, so fused operators can be adopted quickly to improve model inference performance.
- Supports developing custom fused operators and exposing them through LiteBoost interfaces to improve model inference performance with PyTorch.

### Multi-Card Parallelism

- Supports multiple parallel strategies such as TP, CP, SP, and DP.
- Applies different parallel strategies to adapt and optimize for different models, offers a simple experience for open-source models, and makes it easier for developers to enable multi-card parallelism.

## Technical Architecture

LiteBoost adopts a dual-layer architecture of **C++ Operator Layer + Python Acceleration Layer**:

- **C++ Operator Layer**: Registers custom operators through the PyTorch `TORCH_LIBRARY` mechanism, compiles them into shared libraries, and calls Ascend CANN `aclnn` interfaces to take full advantage of Ascend NPU hardware performance. More custom operators will continue to be developed to improve the inference performance of this component.
- **Python Acceleration Layer**: Encapsulates Python bindings for the C++ operators, optimized Attention layers, and the HCCL-based multi-card parallel solution behind a clean, easy-to-use Python API. Acceleration optimizations related to quantization and sparsity will continue to be added.
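
To make the TP strategy mentioned under Multi-Card Parallelism more concrete, the sketch below simulates a column-parallel linear layer on the CPU in pure Python. It assumes TP refers to tensor parallelism in its usual sense: the weight matrix is split by columns across cards, each card computes its slice of the output, and the slices are gathered. This is a conceptual illustration only and not LiteBoost code. Every function name here (`split_columns`, `all_gather_columns`, `column_parallel_linear`) is hypothetical, and the gather step is a plain list concatenation standing in for a real HCCL collective.

```python
# Conceptual sketch: tensor parallelism (TP) simulated on the CPU.
# Nothing here is LiteBoost API; all names are hypothetical.

def matmul(x, w):
    """Multiply x (m x k) by w (k x n), both given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def split_columns(w, world_size):
    """Split w column-wise into world_size equal shards, one per simulated card."""
    n = len(w[0])
    if n % world_size != 0:
        raise ValueError("column count must be divisible by world_size")
    step = n // world_size
    return [[row[r * step:(r + 1) * step] for row in w] for r in range(world_size)]


def all_gather_columns(partials):
    """Stand-in for an all-gather: concatenate each rank's output columns."""
    return [sum((p[i] for p in partials), []) for i in range(len(partials[0]))]


def column_parallel_linear(x, w, world_size):
    """Compute x @ w with the weight sharded by columns across simulated cards."""
    shards = split_columns(w, world_size)
    # Each simulated card multiplies the full input by its own weight shard.
    partials = [matmul(x, shard) for shard in shards]
    return all_gather_columns(partials)


x = [[1, 2], [3, 4]]
w = [[1, 0, 2, 1], [0, 1, 1, 3]]

full = matmul(x, w)                                   # single-card reference
sharded = column_parallel_linear(x, w, world_size=2)  # two simulated cards
print(full == sharded)
```

The sharded result matches the single-card result because each output column depends on only one weight column, so the column shards can be computed independently. On real hardware the per-card multiplications would run concurrently and the gather would be a collective communication call.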