Installation Guide
This document describes the version compatibility of the vLLM-MindSpore Plugin, the steps for installing it, and a quick verification to confirm that the installation succeeded. Two installation methods are provided:
Docker Installation: Suitable for quick deployment scenarios.
Source Code Installation: Suitable for incremental development of vLLM-MindSpore Plugin.
Version Compatibility
OS: Linux-aarch64
Python: 3.9 / 3.10 / 3.11
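A quick host check before installing can confirm the environment matches these requirements; this is a minimal sketch assuming a standard Linux shell:
uname -m            # should print aarch64
python3 --version   # should print Python 3.9.x, 3.10.x, or 3.11.x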
Software version compatibility
The following software must be installed at versions compatible with the vLLM-MindSpore Plugin; the matching version and download link for each component are listed with the corresponding release:
CANN
MindSpore
MSAdapter
MindSpore Transformers
Golden Stick
vLLM
Note: The vLLM package is built from the vLLM 0.8.3 branch, with data parallelism added.
Docker Installation
We recommend using Docker for quick deployment of the vLLM-MindSpore Plugin environment. Below are the steps:
Building the Image
Users can execute the following commands to clone the vLLM-MindSpore Plugin code repository and build the image:
git clone -b r0.3.0 https://gitee.com/mindspore/vllm-mindspore.git
cd vllm-mindspore
bash build_image.sh
After a successful build, users will see output like the following:
Successfully built e40bcbeae9fc
Successfully tagged vllm_ms_20250726:latest
Here, e40bcbeae9fc is the image ID, and vllm_ms_20250726:latest is the image name and tag. Users can run the following command to confirm that the Docker image has been created successfully:
docker images
Creating a Container
After building the image, set DOCKER_NAME and IMAGE_NAME as the container and image names, then execute the following command to create the container:
export DOCKER_NAME=vllm-mindspore-container # your container name
export IMAGE_NAME=vllm_ms_20250726:latest # your image name
docker run -itd --name=${DOCKER_NAME} --ipc=host --network=host --privileged=true \
--device=/dev/davinci0 \
--device=/dev/davinci1 \
--device=/dev/davinci2 \
--device=/dev/davinci3 \
--device=/dev/davinci4 \
--device=/dev/davinci5 \
--device=/dev/davinci6 \
--device=/dev/davinci7 \
--device=/dev/davinci_manager \
--device=/dev/devmm_svm \
--device=/dev/hisi_hdc \
-v /usr/local/sbin/:/usr/local/sbin/ \
-v /var/log/npu/slog/:/var/log/npu/slog \
-v /var/log/npu/profiling/:/var/log/npu/profiling \
-v /var/log/npu/dump/:/var/log/npu/dump \
-v /var/log/npu/:/usr/slog \
-v /etc/hccn.conf:/etc/hccn.conf \
-v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
-v /usr/local/dcmi:/usr/local/dcmi \
-v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
-v /etc/ascend_install.info:/etc/ascend_install.info \
-v /etc/vnpu.cfg:/etc/vnpu.cfg \
--shm-size="250g" \
${IMAGE_NAME} \
bash
The container ID is returned if the container is created successfully. Users can also confirm that the container is running by executing the following command:
docker ps
Entering the Container
After creating the container, users can start and enter it using the environment variable DOCKER_NAME:
docker exec -it $DOCKER_NAME bash
Source Code Installation
CANN Installation
For CANN installation methods and environment configuration, please refer to CANN Community Edition Installation Guide. If you encounter any issues during CANN installation, please consult the Ascend FAQ for troubleshooting.
The default installation path for CANN is /usr/local/Ascend. After completing the CANN installation, configure the environment variables with the following commands:
LOCAL_ASCEND=/usr/local/Ascend # the root directory of run package
source ${LOCAL_ASCEND}/ascend-toolkit/set_env.sh
export ASCEND_CUSTOM_PATH=${LOCAL_ASCEND}/ascend-toolkit
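To confirm that the CANN environment took effect, a quick sanity check can be run; npu-smi ships with the Ascend driver:
echo ${ASCEND_CUSTOM_PATH}   # should print /usr/local/Ascend/ascend-toolkit
npu-smi info                 # should list the NPU devices visible to the host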
vLLM Prerequisites Installation
For vLLM environment configuration and installation methods, please refer to the vLLM Installation Guide. vLLM requires gcc/g++ >= 12.3.0, which can be installed with the following command:
yum install -y gcc gcc-c++
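The yum command above applies to RPM-based distributions; on Debian or Ubuntu, apt-get install -y gcc g++ is the equivalent. Either way, verify the compiler version afterwards:
gcc --version   # must report 12.3.0 or newer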
vLLM-MindSpore Plugin Installation
vLLM-MindSpore Plugin can be installed in the following two ways. vLLM-MindSpore Plugin Quick Installation is suitable for scenarios where users need quick deployment and usage. vLLM-MindSpore Plugin Manual Installation is suitable for scenarios where users require custom modifications to the components.
vLLM-MindSpore Plugin Quick Installation
To install the vLLM-MindSpore Plugin, pull the vLLM-MindSpore Plugin source code and run the following commands to install the dependencies:
git clone https://gitee.com/mindspore/vllm-mindspore.git
cd vllm-mindspore
bash install_depend_pkgs.sh
Compile and install vLLM-MindSpore Plugin:
pip install .
After executing the above commands, a mindformers folder is generated in the vllm-mindspore/install_depend_pkgs directory. Add this folder to the environment variables, where MF_PATH is the path of the generated folder:
export PYTHONPATH=$MF_PATH:$PYTHONPATH
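As a minimal sanity check, run the following from outside the vllm-mindspore source directory so the installed package, not the checkout, is imported (the plugin must be imported before vLLM, as in the verification script below):
python3 -c "import vllm_mindspore; import vllm; print('vLLM-MindSpore Plugin is installed')"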
vLLM-MindSpore Plugin Manual Installation
If users need to modify the components or use other versions, the components must be installed manually in a specific order. Version compatibility of the vLLM-MindSpore Plugin can be found in Version Compatibility, and the vLLM-MindSpore Plugin requires the following installation sequence:
Install vLLM
pip install /path/to/vllm-*.whl
Uninstall Torch-related components
pip uninstall torch torch-npu torchvision torchaudio -y
Install MindSpore
pip install /path/to/mindspore-*.whl
Clone the MindSpore Transformers repository and add it to PYTHONPATH, where MF_PATH is the path of the cloned repository:
git clone https://gitee.com/mindspore/mindformers.git
export PYTHONPATH=$MF_PATH:$PYTHONPATH
Install Golden Stick
pip install /path/to/mindspore_gs-*.whl
Install MSAdapter
pip install /path/to/msadapter-*.whl
Install vLLM-MindSpore Plugin
Pull the vLLM-MindSpore Plugin source code and run the installation:
git clone https://gitee.com/mindspore/vllm-mindspore.git
cd vllm-mindspore
pip install .
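After the sequence completes, the installed components can be listed as a rough check; the exact distribution names may differ slightly from the import names:
pip list | grep -iE "vllm|mindspore|msadapter"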
Quick Verification
Users can verify the installation with a simple offline inference test. First, configure the environment variables with the following commands:
export vLLM_MODEL_BACKEND=MindFormers # use MindSpore Transformers as model backend.
export MINDFORMERS_MODEL_CONFIG=$YAML_PATH # Set the corresponding MindSpore Transformers model's YAML file.
For more details about the environment variables above, users can also refer to here.
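For illustration only, the YAML variable might be set as follows; the file name and path are hypothetical and depend on the local MindSpore Transformers checkout and the model being served:
export MINDFORMERS_MODEL_CONFIG=$MF_PATH/research/qwen2_5/predict_qwen2_5_7b_instruct.yaml  # hypothetical path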
Users can run the following Python script to verify the installation with Qwen2.5-7B:
import vllm_mindspore # Add this line on the top of script.
from vllm import LLM, SamplingParams
# Sample prompts.
prompts = [
    "I am",
    "Today is",
    "Llama is",
]

# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.0, top_p=0.95)

# Create an LLM.
llm = LLM(model="Qwen2.5-7B-Instruct")
# Generate texts from the prompts. The output is a list of RequestOutput objects
# that contain the prompt, generated text, and other information.
outputs = llm.generate(prompts, sampling_params)
# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}. Generated text: {generated_text!r}")
If successful, the output will resemble:
Prompt: 'I am'. Generated text: ' trying to create a virtual environment for my Python project, but I am encountering some'
Prompt: 'Today is'. Generated text: ' the 100th day of school. To celebrate, the teacher has'
Prompt: 'Llama is'. Generated text: ' a 100% natural, biodegradable, and compostable alternative'
Alternatively, refer to the Quick Start guide for online inference verification.