Installation Guide

This document describes the version compatibility of the vLLM-MindSpore Plugin, the installation steps, and a quick verification to confirm that the installation succeeded. Two installation methods are provided: Docker installation and source code installation.

Version Compatibility

  • OS: Linux-aarch64

  • Python: 3.9 / 3.10 / 3.11

  • Software version compatibility

    Software                   Version
    ------------------------   --------
    CANN                       8.1.RC1
    MindSpore                  2.7.0
    MSAdapter                  0.5.0
    MindSpore Transformers     1.6.0
    Golden Stick               1.2.0
    vLLM                       0.8.3

Note: The vLLM package is built from the vLLM 0.8.3 branch, with data parallelism support added.

Docker Installation

We recommend using Docker for quick deployment of the vLLM-MindSpore Plugin environment. Below are the steps:

Building the Image

Users can execute the following commands to clone the vLLM-MindSpore Plugin code repository and build the image:

git clone -b r0.3.0 https://gitee.com/mindspore/vllm-mindspore.git
cd vllm-mindspore
bash build_image.sh

After a successful build, users will see output similar to the following:

Successfully built e40bcbeae9fc
Successfully tagged vllm_ms_20250726:latest

Here, e40bcbeae9fc is the image ID, and vllm_ms_20250726:latest is the image name and tag. Users can run the following command to confirm that the Docker image has been created successfully:

docker images  

Creating a Container

After building the image, set DOCKER_NAME and IMAGE_NAME as the container and image names, then execute the following command to create the container:

export DOCKER_NAME=vllm-mindspore-container  # your container name
export IMAGE_NAME=vllm_ms_20250726:latest  # your image name

docker run -itd --name=${DOCKER_NAME} --ipc=host --network=host --privileged=true \
        --device=/dev/davinci0 \
        --device=/dev/davinci1 \
        --device=/dev/davinci2 \
        --device=/dev/davinci3 \
        --device=/dev/davinci4 \
        --device=/dev/davinci5 \
        --device=/dev/davinci6 \
        --device=/dev/davinci7 \
        --device=/dev/davinci_manager \
        --device=/dev/devmm_svm \
        --device=/dev/hisi_hdc \
        -v /usr/local/sbin/:/usr/local/sbin/ \
        -v /var/log/npu/slog/:/var/log/npu/slog \
        -v /var/log/npu/profiling/:/var/log/npu/profiling \
        -v /var/log/npu/dump/:/var/log/npu/dump \
        -v /var/log/npu/:/usr/slog \
        -v /etc/hccn.conf:/etc/hccn.conf \
        -v /usr/local/bin/npu-smi:/usr/local/bin/npu-smi \
        -v /usr/local/dcmi:/usr/local/dcmi \
        -v /usr/local/Ascend/driver:/usr/local/Ascend/driver \
        -v /etc/ascend_install.info:/etc/ascend_install.info \
        -v /etc/vnpu.cfg:/etc/vnpu.cfg \
        --shm-size="250g" \
        ${IMAGE_NAME} \
        bash

The container ID will be returned if the container is created successfully. Users can also check the container by executing the following command:

docker ps  

Entering the Container

After creating the container, users can start and enter it using the DOCKER_NAME environment variable:

docker exec -it $DOCKER_NAME bash  

Source Code Installation

CANN Installation

For CANN installation and environment configuration, please refer to the CANN Community Edition Installation Guide. If you encounter any issues during CANN installation, please consult the Ascend FAQ for troubleshooting.

The default installation path for CANN is /usr/local/Ascend. After completing CANN installation, configure the environment variables with the following commands:

LOCAL_ASCEND=/usr/local/Ascend # the root directory of run package
source ${LOCAL_ASCEND}/ascend-toolkit/set_env.sh
export ASCEND_CUSTOM_PATH=${LOCAL_ASCEND}/ascend-toolkit
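
To confirm that the NPU driver and devices are visible after configuring CANN, query the device status with the Ascend driver's npu-smi tool:

npu-smi info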

vLLM Prerequisites Installation

For vLLM environment configuration and installation, please refer to the vLLM Installation Guide. vLLM requires gcc/g++ >= 12.3.0, which can be installed with the following command:

yum install -y gcc gcc-c++
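
Users can then confirm that the compiler meets the version requirement:

gcc --version  # should report 12.3.0 or newer
g++ --version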

vLLM-MindSpore Plugin Installation

vLLM-MindSpore Plugin can be installed in one of the following two ways. The quick installation is suitable for scenarios where users need fast deployment and usage. The manual installation is suitable for scenarios where users require custom modifications to the components.

  • vLLM-MindSpore Plugin Quick Installation

    To install the vLLM-MindSpore Plugin, users need to pull the vLLM-MindSpore Plugin source code and then run the following commands to install the dependencies:

    git clone https://gitee.com/mindspore/vllm-mindspore.git  
    cd vllm-mindspore  
    bash install_depend_pkgs.sh  
    

    Compile and install vLLM-MindSpore Plugin:

    pip install .  
    

    After executing the above commands, the mindformers folder will be generated in the vllm-mindspore/install_depend_pkgs directory. Point MF_PATH at this folder and add it to PYTHONPATH:

    export MF_PATH=$(pwd)/install_depend_pkgs/mindformers
    export PYTHONPATH=$MF_PATH:$PYTHONPATH
    
  • vLLM-MindSpore Plugin Manual Installation

    If users need to modify the components or use other versions, the components must be installed manually in a specific order. The version compatibility of the vLLM-MindSpore Plugin can be found in Version Compatibility, and the components must be installed in the following sequence:

    1. Install vLLM

      pip install /path/to/vllm-*.whl  
      
    2. Uninstall Torch-related components

      pip uninstall torch torch-npu torchvision torchaudio -y  
      
    3. Install MindSpore

      pip install /path/to/mindspore-*.whl  
      
    4. Clone the MindSpore Transformers repository and add it to PYTHONPATH

      git clone https://gitee.com/mindspore/mindformers.git
      export MF_PATH=$(pwd)/mindformers
      export PYTHONPATH=$MF_PATH:$PYTHONPATH
      
    5. Install Golden Stick

      pip install /path/to/mindspore_gs-*.whl  
      
    6. Install MSAdapter

      pip install /path/to/msadapter-*.whl  
      
    7. Install vLLM-MindSpore Plugin

      Users need to pull the vLLM-MindSpore Plugin source code and run the installation:

      git clone https://gitee.com/mindspore/vllm-mindspore.git
      cd vllm-mindspore
      pip install .  
      
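
Whichever installation method is used, a minimal sanity check (assuming the packages installed cleanly and MF_PATH is set as described above) is to import the key modules:

python3 -c "import vllm_mindspore, mindformers; print('vLLM-MindSpore Plugin import OK')"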

Quick Verification

Users can verify the installation with a simple offline inference test. First, configure the environment variables with the following commands:

export vLLM_MODEL_BACKEND=MindFormers # Use MindSpore Transformers as the model backend.
export MINDFORMERS_MODEL_CONFIG=$YAML_PATH # Path to the YAML file of the corresponding MindSpore Transformers model.

For more details about the above environment variables, refer to the environment variables documentation.
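
For example (the path below is hypothetical; substitute the prediction YAML shipped with MindSpore Transformers for your model):

export YAML_PATH=/path/to/predict_qwen2_5_7b_instruct.yaml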

Users can use the following Python script to verify the installation with Qwen2.5-7B:

import vllm_mindspore # Add this line at the top of the script.
from vllm import LLM, SamplingParams

# Sample prompts.
prompts = [
    "I am",
    "Today is",
    "Llama is"
]

# Create a sampling params object.
sampling_params = SamplingParams(temperature=0.0, top_p=0.95)

# Create an LLM.
llm = LLM(model="Qwen2.5-7B-Instruct")
# Generate texts from the prompts. The output is a list of RequestOutput objects
# that contain the prompt, generated text, and other information.
outputs = llm.generate(prompts, sampling_params)
# Print the outputs.
for output in outputs:
    prompt = output.prompt
    generated_text = output.outputs[0].text
    print(f"Prompt: {prompt!r}. Generated text: {generated_text!r}")

If successful, the output will resemble:

Prompt: 'I am'. Generated text: ' trying to create a virtual environment for my Python project, but I am encountering some'  
Prompt: 'Today is'. Generated text: ' the 100th day of school. To celebrate, the teacher has'  
Prompt: 'Llama is'. Generated text: ' a 100% natural, biodegradable, and compostable alternative'  

Alternatively, refer to the Quick Start guide for online inference verification.
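
As a hedged sketch of online verification (assuming a server has been started as described in the Quick Start guide and is listening on vLLM's default port 8000), the OpenAI-compatible completions endpoint can be queried with curl:

curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "Qwen2.5-7B-Instruct",
        "prompt": "I am",
        "max_tokens": 16,
        "temperature": 0
    }'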