CustomOpBuilder: Integrating ATB Operators Using AtbOpRunner
Overview
The Ascend Transformer Boost (ATB) operator acceleration library is an operator library based on Huawei's Ascend AI processors and designed specifically for the training and inference of Transformer models.
When users need operators from the ATB acceleration library that MindSpore does not provide, they can quickly integrate and use them as custom operators.
In Custom Operators Based on CustomOpBuilder, MindSpore provides the PyboostRunner tool to allow users to integrate custom operators in dynamic graphs. For ATB operators, MindSpore additionally provides the AtbOpRunner tool, which encapsulates both the ATB operator workflow and the dynamic graph's multi-stage pipeline.
In the complete ATB operator workflow, users need to construct a Param, create an Operation and a Context, set the variantPack (the operator's input and output tensors), call Setup, call Execute, and finally destroy the Context and Operation. However, for a single operator, the Operation depends only on the operator attributes (Param) and the Context depends only on the stream, so both can be reused. MindSpore therefore provides a cache for these data structures, avoiding the unnecessary overhead of repeatedly creating and destroying them.
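For reference, the bare ATB call sequence that AtbOpRunner wraps looks roughly like the sketch below. This is only an illustrative outline under the assumption of the standard ATB C++ API; signatures are simplified, and error handling, tensor packing, and device-memory allocation are omitted (consult the ATB documentation for the authoritative interfaces).
#include <atb/atb_infer.h>  // ATB header; the include path may differ by installation

// Illustrative sketch of the raw ATB workflow that AtbOpRunner encapsulates.
void RunSwigluWithRawAtb(const atb::VariantPack &variantPack, void *stream) {
  // 1. Construct Param and create the Operation from it.
  atb::infer::ActivationParam param;
  param.activationType = atb::infer::ActivationType::ACTIVATION_SWIGLU_FORWARD;
  atb::Operation *op = nullptr;
  atb::CreateOperation(param, &op);

  // 2. Create the Context and bind it to the execution stream.
  atb::Context *context = nullptr;
  atb::CreateContext(&context);
  context->SetExecuteStream(stream);

  // 3. Setup: query the workspace size required by this launch.
  uint64_t workspaceSize = 0;
  op->Setup(variantPack, workspaceSize, context);

  // 4. Execute with a workspace buffer of the reported size
  //    (device-memory allocation is omitted here).
  uint8_t *workspace = nullptr;
  op->Execute(variantPack, workspace, workspaceSize, context);

  // 5. Destroy the Context and Operation.
  atb::DestroyContext(context);
  atb::DestroyOperation(op);
}
Because the Operation depends only on Param and the Context only on the stream, steps 1, 2, and 5 are exactly what MindSpore's cache avoids repeating.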
When integrating ATB operators using the AtbOpRunner class, users only need to provide a hash function for the corresponding Param (used as the key for caching the Operation), call the Init interface for initialization (constructing the Operation), and then call the Run interface to execute the ATB operator. Alternatively, users can call the RunAtbOp function for one-step execution; it internally calls both the Init and Run interfaces.
This guide uses SwiGLU as an example to demonstrate the ATB operator integration process. The complete code can be found in the code repository.
Installing the ATB Acceleration Library
Click here for installation tutorial
Since MindSpore is built with the "ABI=0" standard, the ATB set_env.sh script must also be sourced with the "ABI=0" configuration. For example:
source /usr/local/Ascend/nnal/atb/set_env.sh --cxx_abi=0 &> /dev/null
Integrating the SwiGLU Operator
Here we use ms::pynative::RunAtbOp to integrate the operator and call the function interface through ms::pynative::PyboostRunner::Call:
#include "ms_extension/api.h"
namespace atb {
template <>
struct HashOpParam<atb::infer::ActivationParam> {
void operator()(const atb::infer::ActivationParam ¶m) const {
add_param_to_buf("activationType", param.activationType);
add_param_to_buf("scale", param.scale);
add_param_to_buf("dim", param.dim);
add_param_to_buf("geluMode", param.geluMode);
}
};
} // namespace atb
ms::Tensor InferSwigluForward(const ms::Tensor &x, int32_t dim) {
  ShapeVector out_tensor_shape(x.shape());
  int64_t split_dim = dim;
  if (split_dim < 0) {
    split_dim += out_tensor_shape.size();
  }
  const int64_t split_num = 2;
  out_tensor_shape[split_dim] /= split_num;
  return ms::Tensor(x.data_type(), out_tensor_shape);
}
ms::Tensor npu_swiglu(const ms::Tensor &x, int32_t dim) {
  auto y = InferSwigluForward(x, dim);
  atb::infer::ActivationParam param;
  param.activationType = atb::infer::ActivationType::ACTIVATION_SWIGLU_FORWARD;
  param.dim = dim;
  ms::pynative::RunAtbOp("SwiGLU", param, {x}, {y});
  return y;
}
auto pyboost_npu_swiglu(const ms::Tensor &x, int32_t dim) {
  return ms::pynative::PyboostRunner::Call<1>(npu_swiglu, x, dim);
}

PYBIND11_MODULE(MS_EXTENSION_NAME, m) {
  m.def("swiglu", &pyboost_npu_swiglu, "swiglu realization", pybind11::arg("x"), pybind11::arg("dim") = -1);
}
1. Provide the Hash Function for Param
namespace atb {
template <>
struct HashOpParam<atb::infer::ActivationParam> {
  void operator()(const atb::infer::ActivationParam &param) const {
    add_param_to_buf("activationType", param.activationType);
    add_param_to_buf("scale", param.scale);
    add_param_to_buf("dim", param.dim);
    add_param_to_buf("geluMode", param.geluMode);
  }
};
}  // namespace atb
As described in the ATB Acceleration Library API documentation, the ATB SwiGLU operator uses the atb::infer::ActivationParam parameter.
The hash function is defined as an operator() method within the HashOpParam template class. Users specialize this class for the actual Param type, and the specialization must be placed inside the atb namespace. Within the hash function, the add_param_to_buf interface is used to add the member variables of Param one by one, and the framework then computes an integer hash value from the contents of the buffer.
In general, if a member of the operator parameter is unused or always takes a fixed value, it may be excluded from the hash function. However, for maintainability and extensibility, it is recommended to include all member variables of Param in the hash: a field left out of the hash cannot distinguish two different parameter values, so they would map to the same cached Operation, which can lead to accuracy issues when the operator's functionality is extended in the future.
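As a purely hypothetical illustration of that risk (the members below are the real ActivationParam fields, but the omission of dim from the hash is assumed only for the sake of the example):
// Hypothetical scenario: suppose "dim" had NOT been added to the hash buffer above.
// The two parameters below would then produce the same cache key, so the second
// call would silently reuse the Operation created with dim = -1 and give wrong
// results for dim = 1.
atb::infer::ActivationParam p1;
p1.activationType = atb::infer::ActivationType::ACTIVATION_SWIGLU_FORWARD;
p1.dim = -1;

atb::infer::ActivationParam p2 = p1;
p2.dim = 1;  // differs only in the field assumed to be missing from the hash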
2. Infer the Output Information of the Operator
ms::Tensor InferSwigluForward(const ms::Tensor &x, int32_t dim) {
  ShapeVector out_tensor_shape(x.shape());
  int64_t split_dim = dim;
  if (split_dim < 0) {
    split_dim += out_tensor_shape.size();
  }
  const int64_t split_num = 2;
  out_tensor_shape[split_dim] /= split_num;
  return ms::Tensor(x.data_type(), out_tensor_shape);
}
For the SwiGLU operator, the output tensor has the same data type as the input tensor, and its shape differs only in the dim dimension, whose length is half that of the input; all other dimensions remain the same. For example, for the (2, 32) float16 input used later in this guide with dim=-1, the inferred output shape is (2, 16). After inferring the output shape, an empty tensor is constructed using the ms::Tensor constructor.
Here, the output tensor is defined as y:
auto y = InferSwigluForward(x, dim);
3. Create and Set the Operator Attribute Structure
atb::infer::ActivationParam param;
param.activationType = atb::infer::ActivationType::ACTIVATION_SWIGLU_FORWARD;
param.dim = dim;
4. Execute the Operator via the RunAtbOp Interface
ms::pynative::RunAtbOp("SwiGLU", param, {x}, {y});
This is a template interface, equivalent to:
auto runner = std::make_shared<AtbOpRunner>("SwiGLU");
runner->Init(param);
runner->Run({x}, {y});
By passing in the operator name, attributes, input tensor list, and output tensor list, the corresponding ATB operator can be invoked. This interface supports multi-stage pipeline execution in dynamic graphs.
5. Bind the C++ Function to a Python Function via pybind11
auto pyboost_npu_swiglu(const ms::Tensor &x, int32_t dim) {
  return ms::pynative::PyboostRunner::Call<1>(npu_swiglu, x, dim);
}

PYBIND11_MODULE(MS_EXTENSION_NAME, m) {
  m.def("swiglu", &pyboost_npu_swiglu, "swiglu realization", pybind11::arg("x"), pybind11::arg("dim") = -1);
}
6. Compile the Custom Operator Using CustomOpBuilder
Save the above C++ code as a file named atb_activation.cpp, and then compile it using the Python interface CustomOpBuilder.
import mindspore
import numpy as np
x = mindspore.Tensor(np.random.rand(2, 32).astype(np.float16))
my_ops = mindspore.ops.CustomOpBuilder("atb_activation", "atb_activation.cpp", enable_atb=True).load()
y = my_ops.swiglu(x, -1)
print(y)
Here, the parameter enable_atb=True is passed to CustomOpBuilder, and MindSpore automatically adds the compilation and linking options related to the ATB acceleration library. Users only need to ensure that the ATB library's set_env.sh script has been executed correctly, so that the environment contains the ATB_HOME_PATH variable.