# Custom Operator Integration

[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/master/docs/vllm_mindspore/docs/source_en/developer_guide/operations/custom_ops.md)

When the built-in operators do not meet your requirements, you can use MindSpore's custom operator functionality to integrate your own operators. This document describes how to integrate a new custom operator into the vLLM-MindSpore Plugin project, using the **`advance_step_flashattn`** operator as an example. The focus here is the integration process into vLLM-MindSpore Plugin. For details of custom operator development, refer to the official MindSpore tutorial: [CustomOpBuilder-Based Custom Operators](https://www.mindspore.cn/tutorials/en/master/custom_program/operation/op_customopbuilder.html). For AscendC operator development, see the official Ascend documentation: [Ascend C Operator Development](https://www.hiascend.com/document/detail/zh/canncommercial/81RC1/developmentguide/opdevg/Ascendcopdevg/atlas_ascendc_10_0001.html).

**Note: Currently, custom operators in vLLM-MindSpore Plugin are only supported in PyNative Mode.**

## File Structure

The directory `csrc` contains the declarations and implementations of the operators:

```text
vllm-mindspore/
├── csrc/
│   ├── CMakeLists.txt            // Operator build script
│   ├── ascendc/
│   │   ├── CMakeLists.txt        // AscendC operator build script
│   │   ├── adv_step_flash.h      // AscendC AdvanceStepFlashattn operator declaration
│   │   ├── adv_step_flash.c      // AscendC AdvanceStepFlashattn operator implementation
│   │   └── ...
│   └── module/
│       ├── module.h              // Common module registration header
│       ├── module.cpp            // Common module registration implementation
│       ├── adv_step_flash.cpp    // Integration layer code (Python interface registration)
│       └── ...
└── vllm_mindspore/
    └── _custom_ops.py            // Wrapper for the custom operator call interface
```

- **`csrc/ascendc/`**: Contains the AscendC custom operator implementation code.
- **`csrc/module/`**: Contains the operator integration layer code, including common module registration (`module.h`, `module.cpp`) and operator-specific integration (e.g., `adv_step_flash.cpp`).

## Integration Process

To integrate a custom operator, create the [Operator Interface Declaration](#operator-interface-declaration) and [Operator Implementation](#operator-implementation) in `csrc/ascendc/`, and the [Operator Integration](#operator-integration) code in `csrc/module/`. After the initial development and integration of the custom operator, add the [Operator Interface](#operator-interface) and perform [Operator Compilation and Testing](#operator-compilation-and-testing).

### Operator Interface Declaration

Create a header file in `csrc/ascendc/` to declare the operator function and related interfaces.
Refer to [adv_step_flash.h](https://gitee.com/mindspore/vllm-mindspore/blob/master/csrc/ascendc/adv_step_flash.h):

```cpp
#ifndef VLLM_MINDSPORE_CSRC_ASCENDC_ADV_STEP_FLASH_H
#define VLLM_MINDSPORE_CSRC_ASCENDC_ADV_STEP_FLASH_H

extern void AdvStepFlashKernelEntry(uint32_t blockDims, void *l2ctrl, void *aclStream, uint8_t *sampledTokenIds,
                                    uint8_t *blockTables, uint8_t *seqLensInput, uint8_t *inputTokens,
                                    uint8_t *inputPositions, uint8_t *seqLensOut, uint8_t *slotMapping,
                                    int32_t num_seqs, int32_t block_size, int32_t block_tables_stride);

#endif  // VLLM_MINDSPORE_CSRC_ASCENDC_ADV_STEP_FLASH_H
```

### Operator Implementation

Create an implementation file in `csrc/ascendc/` for the core logic of the operator. Refer to [adv_step_flash.c](https://gitee.com/mindspore/vllm-mindspore/blob/master/csrc/ascendc/adv_step_flash.c):

```cpp
#include "kernel_operator.h"

extern "C" __global__ __aicore__ void adv_step_flash_impl(GM_ADDR sampledTokenIds, GM_ADDR blockTables,
                                                          GM_ADDR seqLensInput, GM_ADDR inputTokens,
                                                          GM_ADDR inputPositions, GM_ADDR seqLensOut,
                                                          GM_ADDR slotMapping, int32_t num_seqs, int32_t block_size,
                                                          int32_t block_tables_stride) {
  // AscendC operator implementation
}

#ifndef __CCE_KT_TEST__
void AdvStepFlashKernelEntry(uint32_t blockDims, void *l2ctrl, void *aclStream, uint8_t *sampledTokenIds,
                             uint8_t *blockTables, uint8_t *seqLensInput, uint8_t *inputTokens,
                             uint8_t *inputPositions, uint8_t *seqLensOut, uint8_t *slotMapping, int32_t num_seqs,
                             int32_t block_size, int32_t block_tables_stride) {
  adv_step_flash_impl<<<blockDims, l2ctrl, aclStream>>>(sampledTokenIds, blockTables, seqLensInput, inputTokens,
                                                        inputPositions, seqLensOut, slotMapping, num_seqs,
                                                        block_size, block_tables_stride);
}
#endif
```

### Operator Integration

Create an integration file in `csrc/module/`. Refer to [adv_step_flash.cpp](https://gitee.com/mindspore/vllm-mindspore/blob/master/csrc/module/adv_step_flash.cpp):

```cpp
#include "ms_extension/api.h"
#include "ascendc/adv_step_flash.h"
#include "module/module.h"

auto pyboost_adv_step_flash(int32_t num_seqs, int32_t num_queries, int32_t block_size, ms::Tensor input_tokens,
                            ms::Tensor sampled_token_ids, ms::Tensor input_positions, ms::Tensor seq_lens,
                            ms::Tensor slot_mapping, ms::Tensor block_tables) {
  // Use ms::PyboostRunner to call your operator
}

VLLM_MS_EXTENSION_MODULE(m) {
  m.def("advance_step_flashattn", &pyboost_adv_step_flash, "advance_step_flashattn", pybind11::arg("num_seqs"),
        pybind11::arg("num_queries"), pybind11::arg("block_size"), pybind11::arg("input_tokens"),
        pybind11::arg("sampled_token_ids"), pybind11::arg("input_positions"), pybind11::arg("seq_lens"),
        pybind11::arg("slot_mapping"), pybind11::arg("block_tables"));
}
```

In the above, the first parameter `"advance_step_flashattn"` passed to `m.def()` is the Python interface name of the operator.

The `module.h` and `module.cpp` files create the pybind11-based Python module for the operators. Since only one `PYBIND11_MODULE` is allowed per dynamic library, and to let users complete operator integration in a single file, vLLM-MindSpore Plugin provides the registration macro `VLLM_MS_EXTENSION_MODULE`. When the custom operator dynamic library is loaded, all operator interfaces are automatically registered into the same Python module.
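The exact contents of `module.h` and `module.cpp` are not reproduced here, but the deferred-registration pattern behind such a macro can be sketched with plain pybind11. The following is an illustrative sketch only; the registry, `ModuleRegistrar`, and function names are placeholders rather than the project's actual implementation:

```cpp
// Illustrative sketch of a deferred-registration macro (not the actual module.h/module.cpp).
#include <functional>
#include <utility>
#include <vector>

#include <pybind11/pybind11.h>

// Global registry of registration callbacks; one entry per VLLM_MS_EXTENSION_MODULE block.
using ModuleRegisterFn = std::function<void(pybind11::module_ &)>;
inline std::vector<ModuleRegisterFn> &GetModuleRegistry() {
  static std::vector<ModuleRegisterFn> registry;
  return registry;
}

// Records a callback during static initialization, i.e. when the dynamic library is loaded.
struct ModuleRegistrar {
  explicit ModuleRegistrar(ModuleRegisterFn fn) { GetModuleRegistry().emplace_back(std::move(fn)); }
};

// Expands to a forward declaration, a static registrar that records it, and the header of the
// function whose body follows the macro (at most one use per source file with these fixed names).
#define VLLM_MS_EXTENSION_MODULE(m)                              \
  static void VllmMsRegisterOps(pybind11::module_ &m);           \
  static ModuleRegistrar g_vllm_ms_registrar{VllmMsRegisterOps}; \
  static void VllmMsRegisterOps(pybind11::module_ &m)

// The single PYBIND11_MODULE (in module.cpp) replays every recorded callback, so operators
// defined across different source files all end up in the same Python module.
PYBIND11_MODULE(_C_ops, m) {
  for (const auto &fn : GetModuleRegistry()) {
    fn(m);
  }
}
```

With this pattern, each operator source file only needs the macro block shown in `adv_step_flash.cpp` above; no central file has to be edited when a new operator is added.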
### Operator Interface

The custom operators in vLLM-MindSpore Plugin are compiled into `_C_ops.so`. For convenient calling, you can add a call interface in `vllm_mindspore/_custom_ops.py`. Any extra adaptation needed before or after the operator call can also be implemented in this interface:

```python
import torch


def advance_step_flashattn(num_seqs: int, num_queries: int, block_size: int,
                           input_tokens: torch.Tensor,
                           sampled_token_ids: torch.Tensor,
                           input_positions: torch.Tensor,
                           seq_lens: torch.Tensor,
                           slot_mapping: torch.Tensor,
                           block_tables: torch.Tensor) -> None:
    """Advance a step on Ascend for existing inputs for a multi-step runner."""
    from vllm_mindspore import _C_ops as c_ops
    c_ops.advance_step_flashattn(num_seqs=num_seqs,
                                 num_queries=num_queries,
                                 block_size=block_size,
                                 input_tokens=input_tokens,
                                 sampled_token_ids=sampled_token_ids,
                                 input_positions=input_positions,
                                 seq_lens=seq_lens,
                                 slot_mapping=slot_mapping,
                                 block_tables=block_tables)
```

Here, importing `_C_ops` gives access to the Python module that contains the custom operators. It is recommended to import it right before the call so that it is not imported unnecessarily.

### Operator Compilation and Testing

1. **Code Integration**: Merge the code into the vLLM-MindSpore Plugin project.
2. **Project Compilation**: Run `pip install .` in the vllm-mindspore directory to build and install vLLM-MindSpore Plugin.
3. **Operator Testing**: Call the operator interface via `_custom_ops`. Refer to the test case [test_custom_advstepflash.py](https://gitee.com/mindspore/vllm-mindspore/blob/master/tests/st/python/test_custom_advstepflash.py):

   ```python
   from vllm_mindspore import _custom_ops as custom_ops
   custom_ops.advance_step_flashattn(...)
   ```

## Custom Operator Compilation Project

Currently, MindSpore provides only the [CustomOpBuilder](https://www.mindspore.cn/docs/en/master/api_python/ops/mindspore.ops.CustomOpBuilder.html) interface for online compilation of custom operators, with default compilation and linking options built in. vLLM-MindSpore Plugin integrates operators based on MindSpore's custom operator feature and compiles them into a dynamic library for package release. The build process is introduced below.

### Extension Module

In `setup.py`, vLLM-MindSpore Plugin adds a `vllm_mindspore._C_ops` extension and the corresponding build module:

```python
ext_modules = [Extension("vllm_mindspore._C_ops", sources=[])],
cmdclass = {"build_ext": CustomBuildExt},
```

There is no need to specify `sources` here because vLLM-MindSpore Plugin triggers the operator build via CMake, which collects the source files automatically.

### Building Process

1. `CustomBuildExt` calls CMake to execute `csrc/CMakeLists.txt`, passing the necessary environment variables to trigger the operator build.
2. Through `ascendc/CMakeLists.txt`, the AscendC compiler is invoked to compile the source code in the `ascendc` directory into the static library `ascendc_kernels_npu.a`.
3. The list of C++ source files `SRC_FILES` is collected recursively.
4. A temporary script `build_custom_with_ms.py` is generated, which calls `mindspore.ops.CustomOpBuilder` to build the operator interfaces; the script also contains the source file list, header search paths, static library paths, and so on (a sketch of such a script is shown after this list).
5. CMake's `add_custom_target` command calls the Python script to build the custom operators, generating `_C_ops.so`.
6. `_C_ops.so` is renamed to the standard Python extension module name, such as `_C_ops.cpython-39-aarch64-linux-gnu.so`.
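As a reference for step 4, a generated build script has roughly the following shape. This is a minimal hypothetical sketch: the source list, paths, flags, and the exact `CustomOpBuilder` arguments and build entry point are placeholders chosen for illustration, not what CMake actually emits; consult the [CustomOpBuilder](https://www.mindspore.cn/docs/en/master/api_python/ops/mindspore.ops.CustomOpBuilder.html) API documentation for the precise interface.

```python
# Hypothetical sketch of a generated build_custom_with_ms.py; the real script is produced
# by CMake, and its file lists, paths, and flags will differ.
import mindspore as ms

# C++ sources of the integration layer, collected recursively by CMake (placeholders).
SRC_FILES = [
    "csrc/module/module.cpp",
    "csrc/module/adv_step_flash.cpp",
]

builder = ms.ops.CustomOpBuilder(
    "_C_ops",        # name of the extension module to generate
    SRC_FILES,       # source file list passed in by CMake
    backend="Ascend",
    # Header search paths and the AscendC static library are assumed here to be forwarded
    # through compiler/linker flags; check the CustomOpBuilder API for the exact parameter names.
    cflags="-I./csrc",
    ldflags="-L./build/ascendc -l:ascendc_kernels_npu.a",
)
builder.load()  # builds the shared library; CMake then renames it to the standard extension name
```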