Supported Features List
The features supported by the vLLM-MindSpore Plugin are consistent with those of the community version of vLLM. For feature descriptions and usage, please refer to the official vLLM documentation.
The following features are supported in the vLLM-MindSpore Plugin.
| Features | vLLM V0 | vLLM V1 |
|---|---|---|
| Chunked Prefill | √ | √ |
| Automatic Prefix Caching | √ | √ |
| Multi-step Scheduler | √ | × |
| DeepSeek MTP | √ | WIP |
| Async Output | √ | √ |
| Quantization | √ | √ |
| LoRA | WIP | WIP |
| Tensor Parallel | √ | √ |
| Pipeline Parallel | WIP | WIP |
| Expert Parallel | × | √ |
| Data Parallel | × | √ |
| Prefill Decode Disaggregation | × | WIP |
| Multi-Modality | WIP | WIP |
| Prompt Adapter | × | WIP |
| Speculative Decoding | × | WIP |
| LogProbs | × | √ |
| Prompt LogProbs | × | WIP |
| Best of | × | × |
| Beam Search | × | WIP |
| Guided Decoding | × | WIP |
| Enc-Dec | × | × |
| Pooling | × | × |
| Reasoning Outputs | √ | √ |
| Tool Calling | WIP | √ |
| Graph Capture | × | √ |
- √: Feature aligned with the community version of vLLM.
- ×: Currently unsupported; alternative solutions are recommended.
- WIP: Under development or planned for future implementation.
Feature Description
LoRA currently has two inference modes: static graph and dynamic graph. Static graph mode offers better performance but does not support dynamically loading or unloading LoRA adapters. The LoRA feature currently supports only Qwen2.5; other models are in the process of adaptation.
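As a rough sketch of how a LoRA adapter might be attached at serving time, the following uses the `--enable-lora` and `--lora-modules` options from upstream vLLM's OpenAI-compatible server; the model name, adapter name, and adapter path are placeholders, and the exact launch command for the vLLM-MindSpore Plugin should be confirmed against its own documentation.

```shell
# Launch an OpenAI-compatible server with LoRA enabled (upstream vLLM flags).
# "my-adapter" and the adapter path are placeholders for a real LoRA checkpoint.
vllm serve Qwen/Qwen2.5-7B-Instruct \
    --enable-lora \
    --lora-modules my-adapter=/path/to/lora_adapter
```

Requests can then target the adapter by setting `"model": "my-adapter"` in the completion request body, falling back to the base model name for non-LoRA inference.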
Tool Calling currently supports only the DeepSeek V3 0324 W8A8 model.
Atlas 300I Duo supports Chunked Prefill, LoRA (static graph), Automatic Prefix Caching, and Tensor Parallel; other features are in the process of adaptation.