Supported Features List


The features supported by the vLLM-MindSpore Plugin are consistent with those of the community version of vLLM. For feature descriptions and usage, please refer to the vLLM Official Documentation.

The following table lists the features supported by the vLLM-MindSpore Plugin.

| Features                       | vLLM V0 | vLLM V1 |
|--------------------------------|---------|---------|
| Chunked Prefill                | √       | √       |
| Automatic Prefix Caching       | √       | √       |
| Multi step scheduler           | √       | ×       |
| DeepSeek MTP                   | √       | WIP     |
| Async output                   | √       | √       |
| Quantization                   | √       | √       |
| LoRA                           | WIP     | WIP     |
| Tensor Parallel                | √       | √       |
| Pipeline Parallel              | WIP     | WIP     |
| Expert Parallel                | ×       | √       |
| Data Parallel                  | ×       | √       |
| Prefill Decode Disaggregation  | ×       | WIP     |
| Multi Modality                 | WIP     | WIP     |
| Prompt adapter                 | ×       | WIP     |
| Speculative decoding           | ×       | WIP     |
| LogProbs                       | ×       | √       |
| Prompt logProbs                | ×       | WIP     |
| Best of                        | ×       | ×       |
| Beam search                    | ×       | WIP     |
| Guided Decoding                | ×       | WIP     |
| Pooling                        | ×       | ×       |
| Enc-dec                        | ×       | ×       |
| Reasoning Outputs              | √       | √       |
| Tool Calling                   | √       | WIP     |
| Graph Capture                  | ×       | √       |

  • √: Feature aligned with the community version of vLLM.

  • ×: Currently unsupported; alternative solutions are recommended.

  • WIP: Under development or planned for future implementation.
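Because the plugin keeps vLLM's user interface, features marked √ are enabled through the standard vLLM engine arguments. Below is a minimal offline-inference sketch that turns on Chunked Prefill, Automatic Prefix Caching, and Tensor Parallel; the model name, parallel degree, and other values are illustrative only, not a definitive configuration.

```python
# Minimal sketch: the model name and argument values below are examples only.
import vllm_mindspore  # import the plugin first so it can patch vLLM for MindSpore
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",   # illustrative model
    tensor_parallel_size=2,             # Tensor Parallel
    enable_chunked_prefill=True,        # Chunked Prefill
    enable_prefix_caching=True,         # Automatic Prefix Caching
    max_model_len=8192,
)

outputs = llm.generate(
    ["Explain chunked prefill in one sentence."],
    SamplingParams(temperature=0.0, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```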

Feature Description

  • LoRA currently has two inference modes: static graph and dynamic graph. The static graph mode offers better performance but does not support dynamically loading or unloading LoRA adapters. The LoRA feature currently supports only Qwen2.5; other models are being adapted. A usage sketch follows this list.

  • Tool Calling currently supports only the DeepSeek V3 0324 W8A8 model. A request sketch follows this list.

  • Atlas 300I Duo supports Chunked Prefill, LoRA (static graph), Automatic Prefix Caching, and Tensor Parallel; other features are being adapted.
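As noted above, LoRA adaptation currently targets Qwen2.5. The following is a minimal sketch of LoRA inference through the standard vLLM interface; the adapter name, rank, and paths are placeholders, not values from this documentation.

```python
# Minimal LoRA sketch: adapter name, rank, and paths are placeholders.
import vllm_mindspore  # import the plugin before vllm
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

llm = LLM(
    model="Qwen/Qwen2.5-7B-Instruct",  # LoRA currently targets Qwen2.5
    enable_lora=True,
    max_lora_rank=64,
)

outputs = llm.generate(
    ["Summarize what a LoRA adapter changes in a model."],
    SamplingParams(temperature=0.0, max_tokens=64),
    lora_request=LoRARequest("demo_adapter", 1, "/path/to/lora_adapter"),
)
print(outputs[0].outputs[0].text)
```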
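For Tool Calling with the DeepSeek V3 0324 W8A8 model, requests follow the standard OpenAI-compatible interface once the model is served with tool support enabled. The sketch below is illustrative only: the serve command, tool-call parser name, port, and the get_weather tool are assumptions; check the vLLM serving documentation for the exact flags in your version.

```python
# Assumes an OpenAI-compatible server was started with tool support, e.g. something like:
#   vllm serve /path/to/DeepSeek-V3-0324-W8A8 --enable-auto-tool-choice --tool-call-parser deepseek_v3
# The parser name, port, and tool definition below are illustrative assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool used only for this example
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model=client.models.list().data[0].id,
    messages=[{"role": "user", "content": "What is the weather in Shanghai today?"}],
    tools=tools,
    tool_choice="auto",
)
print(resp.choices[0].message.tool_calls)
```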