Release Notes


vLLM-MindSpore Plugin 0.4.0 Release Notes

The vLLM-MindSpore Plugin 0.4.0 version is compatible with vLLM 0.9.1. Below are the new features and models supported in this release.

New Features

  • Architecture Adaptation: Completed the architectural upgrade and adapted to vLLM 0.9.1, fully inheriting all features of the previous version. Added support for deploying data-parallel (DP) services via Ray. For configuration details and operational instructions, refer to Multi-Machine Parallel Inference.

  • Quantization Support: The DeepSeek-R1 model now supports A8W4 quantization inference. For technical details and usage guidelines, see DeepSeek-R1 A8W4 Quantization Inference.

  • Performance Optimization: Integrated FA3 quantization inference and the MLA series of operators, significantly improving the performance of the DeepSeek model series in long-sequence scenarios, with gains exceeding 10% in some cases.

  • Usability Enhancements: Streamlined the deployment process for backend model services in MindSpore Transformers, eliminating the dependency on YAML configuration files. Users can now complete deployment directly with the open-source Hugging Face model configuration files published by the community.

New Models

  • DeepSeek Model Series:

    • DeepSeek-R1 A8W4 Quantization Inference

  • TeleChat Model Series:

    • TeleChat2 Model Inference

  • GLM Model Series:

    • GLM-4 Model Inference

    • GLM-4.5 Model Inference

  • Qwen3 Model Series:

    • Qwen3 Native Model

  • Qwen2.5 Model Series:

    • Qwen2.5-VL Native Model

Contributors

Thanks to the following contributors for their efforts:

alien_0119, candyhong, can-gaa-hou, ccsszz, cs123abc, dayschan, Erpim, fary86, hangangqiang, horcam, huandong, huzhikun, i-robot, jiahaochen666, JingweiHuang, lijiakun, liu lili, lvhaoyu, lvhaoyu1, moran, nashturing, one_east, panshaowu, pengjingyou, r1chardf1d0, tongl, TrHan, tronzhang, TronZhang, twc, uh, w00521005, wangpingan2, WanYidong, WeiCheng Tan, wusimin, yangminghai, yyyyrf, zhaizhiqiang, zhangxuetong, zhang_xu_hao1230, zhanzhan1, zichun_ye, zlq2020

Contributions to the project in any form are welcome!