Installation Guide
This document introduces the version compatibility of the vLLM-MindSpore Plugin, the installation steps, and a quick verification to confirm that the installation succeeded. Two installation methods are provided:
Docker Installation: Suitable for quick deployment scenarios.
Source Code Installation: Suitable for incremental development of vLLM-MindSpore Plugin.
Version Compatibility
OS: Linux-aarch64
Python: 3.9 / 3.10 / 3.11
Dependent software version compatibility:

| Software | Version and Link |
| --- | --- |
| CANN | |
| MindSpore | |
| MSAdapter | |
| MindSpore Transformers | |
| vLLM | |

Source code and download links of the vLLM-MindSpore Plugin:

| Item | Link |
| --- | --- |
| Source code | Source Code Link |
| Package | Package Link |
Docker Installation
We recommend using Docker for quick deployment of the vLLM-MindSpore Plugin environment. Below are the steps:
Building the Image
Users can execute the following command to clone the vLLM-MindSpore Plugin code repository:
git clone https://gitee.com/mindspore/vllm-mindspore.git
Build the image according to your NPU type:

For Atlas 800I A2:
bash build_image.sh

For Atlas 300I Duo:
bash build_image.sh -a 310p
After a successful build, users will see output similar to the following:
Successfully built e40bcbeae9fc
Successfully tagged vllm_ms_20250726:latest
Here, e40bcbeae9fc is the image ID, and vllm_ms_20250726:latest is the image name and tag. Users can run the following command to confirm that the Docker image has been created successfully:
docker images
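For example, to filter the listing for the newly built image (the vllm_ms tag shown above comes from a sample build; the actual tag may differ):

docker images | grep vllm_ms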
Creating a Container
After building the image, set DOCKER_NAME and IMAGE_NAME to the container and image names, then execute the following command to create the container:
export DOCKER_NAME=vllm-mindspore-container # your container name
export IMAGE_NAME=vllm_ms_20250726:latest # your image name
docker run -itd --name=${DOCKER_NAME} --ipc=host --network=host --privileged=true \
--device=/dev/davinci0 \
--device=/dev/davinci1 \
--device=/dev/davinci2 \
--device=/dev/davinci3 \
--device=/dev/davinci4 \
--device=/dev/davinci5 \
--device=/dev/davinci6 \
--device=/dev/davinci7 \
--device=/dev/davinci_manager \
--device=/dev/devmm_svm \
--device=/dev/hisi_hdc \
-v /usr/local/sbin/:/usr/local/sbin/ \
-v /var/log/npu/slog/:/var/log/npu/slog \
-v /var/log/npu/profiling/:/var/log/npu/profiling \
-v /var/log/npu/dump/:/var/log/npu/dump \
-v /var/log/npu/:/usr/slog \
-v /etc/hccn.conf:/etc/hccn.conf \
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-v /usr/local/dcmi:/usr/local/dcmi \
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
-v /etc/ascend_install.info:/etc/ascend_install.info \
-v /etc/vnpu.cfg:/etc/vnpu.cfg \
--shm-size="250g" \
${IMAGE_NAME} \
bash
The container ID will be returned if the container is created successfully. Users can also check it by executing the following command:
docker ps
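To narrow the listing to this container, docker ps also supports filtering by name, using the DOCKER_NAME set above:

docker ps --filter name=${DOCKER_NAME}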
Entering the Container
After creating the container, users can enter it using the environment variable DOCKER_NAME:
docker exec -it $DOCKER_NAME bash
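Once inside the container, users can confirm that the NPU devices are visible; npu-smi is mounted into the container by the docker run command above:

npu-smi info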
Source Code Installation
CANN Installation
For CANN installation methods and environment configuration, please refer to CANN Community Edition Installation Guide. If you encounter any issues during CANN installation, please consult the Ascend FAQ for troubleshooting.
The default installation path for CANN is /usr/local/Ascend. After completing CANN installation, configure the environment variables with the following commands:
LOCAL_ASCEND=/usr/local/Ascend # the root directory of run package
source ${LOCAL_ASCEND}/ascend-toolkit/set_env.sh
export ASCEND_CUSTOM_PATH=${LOCAL_ASCEND}/ascend-toolkit
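These variables apply only to the current shell. As a minimal sketch, users who want them configured in every new shell can append the same lines to ~/.bashrc (assuming bash is the login shell):

# Persist the CANN environment setup across shells.
cat >> ~/.bashrc << 'EOF'
LOCAL_ASCEND=/usr/local/Ascend
source ${LOCAL_ASCEND}/ascend-toolkit/set_env.sh
export ASCEND_CUSTOM_PATH=${LOCAL_ASCEND}/ascend-toolkit
EOF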
vLLM Prerequisites Installation
For vLLM environment configuration and installation methods, please refer to the vLLM Installation Guide.
vLLM-MindSpore Plugin Installation
The vLLM-MindSpore Plugin can be installed in either of the following two ways. Quick Installation is suitable for scenarios where users need fast deployment and usage; Manual Installation is suitable for scenarios where users require custom modifications to the dependent components.
vLLM-MindSpore Plugin Quick Installation
To install the vLLM-MindSpore Plugin, users need to pull the vLLM-MindSpore Plugin source code and then run the following commands to install the dependencies:
git clone https://gitee.com/mindspore/vllm-mindspore.git
cd vllm-mindspore
bash install_depend_pkgs.sh
Compile and install vLLM-MindSpore Plugin:
pip install .
Alternatively, users can refer to Version Compatibility, check their Python version, download the matching vLLM-MindSpore Plugin whl package, and install it with pip.
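Either way, a quick sanity check is to import the module (vllm_mindspore is the import name used in the verification script below); this assumes the environment variables from the CANN installation step are configured:

python -c "import vllm_mindspore"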
vLLM-MindSpore Plugin Manual Installation
If users require custom modifications to dependent components such as vLLM, MindSpore, or MSAdapter, they can prepare the modified installation packages locally and perform manual installation in a specific sequence. The installation sequence requirements are as follows:
Install vLLM
pip install /path/to/vllm-*.whl
Install MindSpore
pip install /path/to/mindspore-*.whl
Install MindSpore Transformers
pip install /path/to/mindformers-*.whl
Install MSAdapter
pip install /path/to/msadapter-*.whl
Install vLLM-MindSpore Plugin
Finally, pull the vLLM-MindSpore Plugin source code and run the installation:
git clone https://gitee.com/mindspore/vllm-mindspore.git
cd vllm-mindspore
pip install .
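After installation, users can confirm that all components are present with a pip listing; the package-name patterns below are assumptions based on the whl file names above:

pip list | grep -Ei "vllm|mindspore|mindformers|msadapter"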
Quick Verification
Users can verify the installation with a simple offline inference test. First, configure the environment variable with the following command:
export VLLM_MS_MODEL_BACKEND=MindFormers # Use MindSpore Transformers as the model backend.
For more details about the environment variable above, users can refer to the environment variables section.
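Users can confirm that the variable is set in the current shell:

echo $VLLM_MS_MODEL_BACKEND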
Users can use the following Python script to verify the installation with Qwen2.5-7B:
import vllm_mindspore # Add this line at the top of the script.

from vllm import LLM, SamplingParams

# Sample prompts.
prompts = [
    "I am",
    "Today is",
    "Llama is",
]

# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.0, top_p=0.95)

# Create an LLM.
llm = LLM(model="Qwen2.5-7B-Instruct")

# Generate texts from the prompts. The output is a list of RequestOutput objects
# that contain the prompt, generated text, and other information.
outputs = llm.generate(prompts, sampling_params)

# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}. Generated text: {generated_text!r}")
If successful, the output will resemble:
Prompt: 'I am'. Generated text: ' trying to create a virtual environment for my Python project, but I am encountering some'
Prompt: 'Today is'. Generated text: ' the 100th day of school. To celebrate, the teacher has'
Prompt: 'Llama is'. Generated text: ' a 100% natural, biodegradable, and compostable alternative'
Alternatively, refer to the Quick Start guide for online inference verification.