# LiteBoost Introduction

## Overview

LiteBoost is an inference acceleration toolkit for Ascend hardware, built on top of MindSpore Lite. It provides high-performance custom operators, multi-card parallel inference, quantization, sparsity, and other inference acceleration capabilities. LiteBoost builds on the PyTorch interface and calls Ascend CANN `aclnn` interfaces directly from C++ custom operators. At the Python layer, it combines optimized Attention and RoPE implementations with HCCL multi-card communication to accelerate inference end to end.

## Core Capabilities

### High-Performance Custom Operators

- Integrates CANN fused operators behind an easy-to-use interface, so they can be adopted quickly to improve model inference performance.
- Supports developing custom fused operators and exposing them through LiteBoost interfaces, so that PyTorch models can use them to improve inference performance.
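
To make the idea of a fused operator concrete, here is a minimal pure-Python sketch. It is not LiteBoost code: the choice of "residual add followed by RMSNorm" as the fusion target, the function names, and the `eps` default are all assumptions made for illustration. The unfused version walks the data in separate passes, while the fused version computes the sum and its sum of squares in one loop, which is the kind of saving a fused kernel aims for on real hardware.

```python
import math


def add_then_rmsnorm_unfused(x, residual, weight, eps=1e-6):
    """Hypothetical unfused path: materialize the sum, then normalize it."""
    summed = [a + b for a, b in zip(x, residual)]        # pass 1: elementwise add
    mean_sq = sum(v * v for v in summed) / len(summed)   # pass 2: reduction
    inv_rms = 1.0 / math.sqrt(mean_sq + eps)
    normed = [v * inv_rms * w for v, w in zip(summed, weight)]  # pass 3: scale
    return normed, summed


def add_rmsnorm_fused(x, residual, weight, eps=1e-6):
    """Hypothetical fused path: the add and the reduction share one loop."""
    summed = []
    sq_total = 0.0
    for a, b in zip(x, residual):
        s = a + b
        summed.append(s)
        sq_total += s * s
    inv_rms = 1.0 / math.sqrt(sq_total / len(summed) + eps)
    normed = [s * inv_rms * w for s, w in zip(summed, weight)]
    return normed, summed
```

Both functions return the normalized values together with the updated residual, so a caller can swap one for the other without changing the surrounding model code.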

### Multi-Card Parallelism

- Supports multiple parallel strategies, such as tensor parallelism (TP), context parallelism (CP), sequence parallelism (SP), and data parallelism (DP).
- Applies the parallel strategy best suited to each model, offers a simple out-of-the-box experience for open-source models, and makes it easier for developers to enable multi-card parallelism.
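
As an illustration of one of these strategies, the sketch below shows column-wise tensor parallelism for a single linear layer in plain Python. It makes several assumptions: the helper names are hypothetical, the "cards" are simulated by a loop in one process, and `all_gather_columns` is only a stand-in for the collective that a library such as HCCL would perform across devices. It does not reflect LiteBoost's actual API.

```python
def matmul(a, b):
    """Plain matrix product for matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def split_columns(w, world_size):
    """Give each simulated rank a contiguous block of output columns."""
    n_cols = len(w[0])
    if n_cols % world_size != 0:
        raise ValueError("output columns must divide evenly across ranks")
    step = n_cols // world_size
    return [[row[r * step:(r + 1) * step] for row in w] for r in range(world_size)]


def all_gather_columns(parts):
    """Stand-in for a collective: concatenate per-rank outputs along columns."""
    return [sum((p[i] for p in parts), []) for i in range(len(parts[0]))]


def tensor_parallel_linear(x, w, world_size):
    """Compute x @ w with the weight sharded by columns across ranks."""
    shards = split_columns(w, world_size)
    # Each partial product would run on its own card in a real deployment.
    partial = [matmul(x, shard) for shard in shards]
    return all_gather_columns(partial)
```

Because each rank holds only a slice of the weight, per-card memory drops roughly in proportion to the number of ranks, at the cost of one gather step per layer.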

## Technical Architecture

LiteBoost adopts a two-layer architecture consisting of a **C++ Operator Layer** and a **Python Acceleration Layer**:

- **C++ Operator Layer**: Registers custom operators through the PyTorch `TORCH_LIBRARY` mechanism, compiles them into shared libraries, and calls Ascend CANN `aclnn` interfaces directly to make full use of Ascend NPU hardware performance. More custom operators will be added over time to further improve inference performance.
- **Python Acceleration Layer**: Wraps the C++ operators in Python bindings and provides optimized Attention layers and the HCCL-based multi-card parallel solution behind a clean, easy-to-use Python API. Acceleration optimizations for quantization and sparsity will be added in future updates.
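
The split between the two layers can be sketched as a thin Python entry point that prefers an accelerated kernel and falls back to a reference implementation. This is an assumption-laden illustration, not LiteBoost's implementation: `_KERNELS`, `register_kernel`, `rope_reference`, and `apply_rope` are hypothetical names, and the dictionary registry merely stands in for compiled operators (operators registered with `TORCH_LIBRARY` are normally reached from Python through `torch.ops`). RoPE serves as the example operation, using the interleaved-pair convention; other implementations pair the two halves of the vector instead.

```python
import math

# Hypothetical registry standing in for compiled C++ operators.
_KERNELS = {}


def register_kernel(name):
    """Decorator that records an accelerated implementation under a name."""
    def decorator(fn):
        _KERNELS[name] = fn
        return fn
    return decorator


def rope_reference(x, position, base=10000.0):
    """Rotate each pair (x[2i], x[2i+1]) by position * base**(-2i/d)."""
    d = len(x)
    if d % 2:
        raise ValueError("RoPE needs an even head dimension")
    out = []
    for i in range(d // 2):
        theta = position * base ** (-2.0 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out += [a * c - b * s, a * s + b * c]
    return out


def apply_rope(x, position):
    """Public entry point: use an accelerated kernel if present, else fall back."""
    kernel = _KERNELS.get("rope")
    return kernel(x, position) if kernel else rope_reference(x, position)
```

Keeping the reference path next to the accelerated one gives a simple way to check a new kernel for numerical agreement before it is enabled by default.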