# List of Hardware Backends Supported by MindSpore Lite [![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r2.7.1/resource/_static/logo_source_en.svg)](https://gitee.com/mindspore/docs/blob/r2.7.1/docs/lite/docs/source_en/reference/operator_list_lite.md)

| Operator Names | Operator Functions | CPU | Kirin NPU | GPU (Mali/Adreno) | Ascend |
| -------------- | ------------------ | --- | --------- | ----------------- | ------ |
| Abs | Element-wise absolute value | FP16<br/>FP32<br/>Int32<br/>Int8<br/>UInt8 | FP16 | FP16<br/>FP32 | FP16 |
| AbsGrad | Compute the gradient of the absolute value function | FP32 | - | - | |
| Activation | Activation functions | FP16<br/>FP32<br/>Int32<br/>Int8<br/>UInt8 | FP16 | FP16<br/>FP32 | FP16 |
| ActivationGrad | Compute the gradient of the given activation function | FP16<br/>FP32 | - | - | |
| Adam | Execute a single parameter-update step of the Adam optimizer | FP32 | - | - | |
| AddFusion | Element-wise addition | FP16<br/>FP32<br/>Int32<br/>Int8<br/>UInt8<br/>Bool | FP16 | FP16<br/>FP32<br/>Int8 | FP16 |
| AdderFusion | Addition-based convolution operation | FP32 | - | - | |
| AddGrad | Compute the gradient of the addition operation | FP32 | - | - | |
| AddN | Perform element-wise addition on N input tensors of identical shape and data type | FP16<br/>FP32 | - | - | |
| Affine | Perform an affine transformation on the input tensor | FP32 | - | - | FP16 |
| All | Determine whether all elements along the specified dimension are True (non-zero) | FP32 | - | - | |
| AllGather | Distributed collective communication: gather tensors from all devices | FP32 | - | - | |
| ApplyMomentum | Execute a single parameter-update step of SGD with momentum | FP32 | - | - | FP16 |
| Assert | Assertion | FP16<br/>FP32<br/>Bool | - | - | |
| Assign | Assign a value to a variable | FP32 | - | - | FP16 |
| ArgmaxFusion | Return the index of the maximum value along a given dimension | FP16<br/>FP32<br/>Int8<br/>UInt8 | FP16 | FP16<br/>FP32 | FP16 |
| ArgminFusion | Return the index of the minimum value along a given dimension | FP16<br/>FP32<br/>Int8<br/>UInt8 | - | FP16<br/>FP32 | FP16 |
| AvgPoolFusion | Average pooling | FP16<br/>FP32<br/>Int8<br/>UInt8 | FP16 | FP16<br/>FP32 | FP16 |
| AvgPoolGrad | Compute the gradient of the average-pooling layer | FP16<br/>FP32 | - | - | |
| BatchNorm | Batch normalization | FP16<br/>FP32<br/>Int8<br/>UInt8 | - | FP16<br/>FP32 | FP16 |
| BatchNormGrad | Compute the gradient of the batch-normalization layer | FP16<br/>FP32 | - | - | |
| BatchToSpace | Inverse of the space-to-batch transformation | FP32<br/>Int8<br/>UInt8 | - | FP16<br/>FP32 | |
| BatchToSpaceND | N-dimensional generalization of BatchToSpace | FP16<br/>FP32<br/>Int8<br/>UInt8 | - | FP16<br/>FP32 | |
| BiasAdd | Add a bias vector to the input tensor | FP16<br/>FP32<br/>Int8<br/>UInt8 | - | FP16<br/>FP32 | FP16 |
| BiasAddGrad | Compute the gradient of the BiasAdd operation | FP16<br/>FP32 | - | - | |
| BinaryCrossEntropy | Compute the binary cross-entropy loss | FP32 | - | - | FP16 |
| BinaryCrossEntropyGrad | Compute the gradient of the binary cross-entropy loss | FP32 | - | - | |
| BroadcastTo | Broadcast the input tensor to a specified shape | FP16<br/>FP32<br/>Int32<br/>Bool | - | - | |
| Call | Call a subgraph or function | FP16<br/>FP32<br/>Int32<br/>Bool | - | - | FP16 |
| Cast | Data type conversion | FP16<br/>FP32<br/>Int32<br/>Int8<br/>UInt8<br/>Bool | FP16 | FP16<br/>FP32 | FP16 |
| Ceil | Round up to the nearest integer | FP16<br/>FP32<br/>Int8<br/>UInt8 | FP16 | FP16<br/>FP32 | FP16 |
| Clip | Clamp elements to a specified range | FP32<br/>Int32 | - | - | FP16 |
| Concat | Concatenate tensors along a specified axis | FP16<br/>FP32<br/>Int32<br/>Int8<br/>UInt8<br/>Bool | FP16 | FP16<br/>FP32<br/>Int32 | FP16 |
| ConstantOfShape | Generate a tensor whose shape is given by the input, filled with the specified constant | FP16<br/>FP32<br/>Int32 | - | - | |
| Conv2DFusion | 2D convolution | FP16<br/>FP32<br/>Int8<br/>UInt8 | FP16 | FP16<br/>FP32 | FP16 |
| Conv2DBackpropFilterFusion | Compute the gradient of a standard convolution with respect to its kernel | FP16<br/>FP32 | - | - | |
| Conv2DBackpropInputFusion | Compute the gradient of a standard convolution with respect to its input | FP16<br/>FP32 | - | - | |
| Conv2dTransposeFusion | Transposed convolution | FP16<br/>FP32<br/>Int8<br/>UInt8 | FP16 | FP16<br/>FP32 | FP16 |
| Cos | Element-wise cosine | FP16<br/>FP32<br/>Int8<br/>UInt8 | FP16 | FP16<br/>FP32 | FP16 |
| Crop | Crop a specified region from an input image or feature map | FP16<br/>FP32<br/>Int32<br/>Int8<br/>UInt8 | - | - | |
| CropAndResize | Crop regions from the input image based on a set of bounding boxes, then resize each region to a uniform size | FP32 | FP16 | - | |
| CumSum | Cumulative sum of elements | FP32<br/>Int32 | - | - | FP16 |
| CustomExtractFeatures | Custom feature-extraction operator | FP32 | - | - | |
| CustomNormalize | Custom normalization operator | FP32 | - | - | |
| CustomPredict | Custom prediction operator | FP32<br/>Int32 | - | - | |
| DEConv2DGradFilter | Compute the gradient of a transposed convolution with respect to its kernel | FP32 | - | - | |
| DepthToSpace | Rearrange depth (channel) data into spatial dimensions | FP16<br/>FP32<br/>Int8<br/>UInt8 | - | FP16<br/>FP32 | |
| DetectionPostProcess | Post-processing for object detection | FP32<br/>Int8<br/>UInt8 | - | - | |
| DivFusion | Element-wise division | FP16<br/>FP32<br/>Int32<br/>Int8<br/>UInt8 | FP16 | FP16<br/>FP32 | FP16 |
| DivGrad | Compute the gradient of the division operation | FP32 | - | - | |
| Dropout | Randomly set some elements of the input tensor to zero | FP16<br/>FP32 | - | - | FP16 |
| DropoutGrad | Compute the gradient of the Dropout operation | FP16<br/>FP32 | - | - | |
| DynamicQuant | Dynamically quantize floating-point tensors to uint8 | FP32 | - | - | |
| Eltwise | Element-wise operations (e.g., sum, product, maximum) | FP16<br/>FP32<br/>Int8<br/>UInt8 | FP16 | FP16<br/>FP32 | FP16 |
| Elu | ELU activation function, which applies an exponential transform to negative inputs | FP16<br/>FP32 | - | - | FP16 |
| Equal | Perform element-wise comparison between two tensors, returning a logical result (True/False) indicating whether A = B | FP16<br/>FP32<br/>Int32<br/>Int8<br/>UInt8 | FP16 | FP16<br/>FP32 | FP16 |
| EmbeddingLookupFusion | Optimized word-embedding lookup, mapping integer indices to dense vectors | FP32 | - | - | |
| Erf | Error function | FP16<br/>FP32 | - | - | FP16 |
| ExpFusion | Element-wise natural exponential (e^x) | FP16<br/>FP32 | - | FP16<br/>FP32 | FP16 |
| ExpandDims | Insert a dimension of length 1 at the specified position | FP16<br/>FP32<br/>Int32<br/>Int8<br/>UInt8<br/>Bool | FP16 | FP16<br/>FP32<br/>Int32 | FP16 |
| Fill | Generate a tensor filled with the specified constant | FP16<br/>FP32<br/>Int32<br/>Bool | - | FP16<br/>FP32 | FP16 |
| Flatten | Flatten the input tensor | FP16<br/>FP32<br/>Int32 | - | - | FP16 |
| FlattenGrad | Compute the gradient of the Flatten operation | FP16<br/>FP32 | - | - | |
| Floor | Round down to the nearest integer | FP16<br/>FP32<br/>Int8<br/>UInt8 | FP16 | FP16<br/>FP32 | FP16 |
| FloorDiv | Element-wise division, rounding the result down to the nearest integer | FP16<br/>FP32<br/>Int32 | FP16 | FP16<br/>FP32 | |
| FloorMod | Element-wise modulo operation; the sign of the result matches that of the divisor | FP16<br/>FP32<br/>Int32 | FP16 | FP16<br/>FP32 | |
| FullConnection | Fully-connected layer | FP16<br/>FP32<br/>Int8<br/>UInt8 | FP16 | FP16<br/>FP32 | FP16 |
| FusedBatchNorm | Fused batch normalization | FP16<br/>FP32<br/>Int8<br/>UInt8 | FP16 | - | FP16 |
| GatherNd | Gather elements from the input tensor at positions given by the index tensor | FP16<br/>FP32<br/>Int32<br/>Int8<br/>UInt8<br/>Bool | - | FP16<br/>FP32 | FP16 |
| Gather | Gather elements at specified index positions along a single dimension | FP16<br/>FP32<br/>Int32<br/>Int8<br/>UInt8<br/>Bool | FP16 | FP16<br/>FP32<br/>Int32 | FP16 |
| GatherD | Gather elements from the input tensor based on the index tensor | FP16<br/>FP32<br/>Int32<br/>Bool | - | - | FP16 |
| GLU | Gated linear unit activation: split the input into two parts and multiply them element-wise | FP32 | - | - | |
| Greater | Perform element-wise comparison between two tensors, returning a logical result (True/False) indicating whether A > B | FP16<br/>FP32<br/>Int32<br/>Int8<br/>UInt8 | FP16 | FP16<br/>FP32 | FP16 |
| GreaterEqual | Perform element-wise comparison between two tensors, returning a logical result (True/False) indicating whether A ≥ B | FP16<br/>FP32<br/>Int32<br/>Int8<br/>UInt8 | FP16 | FP16<br/>FP32 | FP16 |
| GroupNormFusion | Fused group normalization | FP32 | - | - | |
| GRU | Gated recurrent unit, a simplified LSTM | FP16<br/>FP32 | - | - | |
| HashtableLookup | Hash table lookup | FP32<br/>Int32 | - | - | |
| InstanceNorm | Instance normalization | FP16<br/>FP32 | FP16 | - | FP16 |
| InvertPermutation | Compute the inverse of a permutation | FP16<br/>FP32<br/>Int32 | - | - | |
| IsFinite | Check whether each element in the tensor is finite (not Inf or NaN) | FP32 | - | - | FP16 |
| L2NormalizeFusion | Fused L2 normalization | FP32<br/>Int8<br/>UInt8 | - | - | |
| LayerNormFusion | Fused layer normalization | FP16<br/>FP32<br/>Int8 | - | FP16<br/>FP32 | FP16 |
| LayerNormGrad | Compute the gradient of layer normalization | FP16<br/>FP32 | - | - | |
| LeakyReLU | Leaky ReLU activation function, which assigns a small slope to negative inputs | FP16<br/>FP32<br/>Int8<br/>UInt8 | FP16 | FP16<br/>FP32 | FP16 |
| Less | Perform element-wise comparison between two tensors, returning a logical result (True/False) indicating whether A < B | FP16<br/>FP32<br/>Int32<br/>Int8<br/>UInt8 | FP16 | FP16<br/>FP32 | FP16 |
| LessEqual | Perform element-wise comparison between two tensors, returning a logical result (True/False) indicating whether A ≤ B | FP16<br/>FP32<br/>Int32<br/>Int8<br/>UInt8 | FP16 | FP16<br/>FP32 | FP16 |
| LRN | Local response normalization | FP32 | - | - | FP16 |
| Log | Element-wise natural logarithm | FP16<br/>FP32<br/>Int8<br/>UInt8 | FP16 | FP16<br/>FP32 | FP16 |
| Log1p | Element-wise compute log(1 + x) | FP32 | - | - | FP16 |
| LogGrad | Compute the gradient of the logarithm function | FP16<br/>FP32 | - | - | |
| LogicalAnd | Element-wise logical AND | FP16<br/>FP32<br/>Int32<br/>Bool | FP16 | FP16<br/>FP32 | |
| LogicalNot | Element-wise logical NOT | FP16<br/>FP32<br/>Int8<br/>UInt8<br/>Bool | FP16 | FP16<br/>FP32 | |
| LogicalOr | Element-wise logical OR | FP16<br/>FP32<br/>Bool | FP16 | FP16<br/>FP32 | |
| LogSoftmax | Apply softmax to the input, then take the logarithm of the result | FP16<br/>FP32 | - | - | FP16 |
| LshProjection | Locality-sensitive hash projection | FP32 | - | - | |
| LSTM | Long short-term memory (LSTM) network unit | FP16<br/>FP32 | - | - | |
| LSTMGrad | Compute the LSTM backpropagation gradient with respect to the hidden state | FP32 | - | - | |
| LSTMGradData | Compute the LSTM backpropagation gradient with respect to the input data | FP32 | - | - | |
| LSTMGradWeight | Compute the LSTM backpropagation gradient with respect to the weights | FP32 | - | - | |
| MatMulFusion | Matrix multiplication of two input tensors | FP16<br/>FP32<br/>Int8<br/>UInt8 | FP16 | FP16<br/>FP32 | FP16 |
| Maximum | Element-wise maximum | FP16<br/>FP32<br/>Int32 | FP16 | FP16<br/>FP32 | FP16 |
| MaximumGrad | Compute the gradient of the element-wise maximum | FP16<br/>FP32 | - | - | |
| MaxPoolFusion | Max pooling | FP16<br/>FP32<br/>Int8<br/>UInt8 | FP16 | FP16<br/>FP32 | FP16 |
| MaxPoolGrad | Compute the gradient of the max-pooling layer | FP16<br/>FP32 | - | - | |
| Merge | Control-flow merge: forward whichever input becomes available | FP16<br/>FP32 | - | - | |
| Minimum | Element-wise minimum | FP16<br/>FP32<br/>Int32 | FP16 | FP16<br/>FP32 | FP16 |
| MinimumGrad | Compute the gradient of the element-wise minimum | FP16<br/>FP32 | - | - | |
| Mod | Element-wise remainder of division | FP32<br/>Int32 | - | - | FP16 |
| MulFusion | Element-wise multiplication | FP16<br/>FP32<br/>Int32<br/>Int8<br/>UInt8 | FP16 | FP16<br/>FP32 | FP16 |
| MulGrad | Compute the gradient of the multiplication operation | FP32 | - | - | |
| Neg | Element-wise negation | FP16<br/>FP32<br/>Int32 | FP16 | FP16<br/>FP32 | FP16 |
| NegGrad | Compute the gradient of the negation operation | FP16<br/>FP32 | - | - | |
| NLLLoss | Compute the negative log-likelihood loss | FP32 | - | - | FP16 |
| NLLLossGrad | Compute the gradient of NLLLoss | FP32 | - | - | |
| NotEqual | Perform element-wise comparison between two tensors, returning a logical result (True/False) indicating whether A ≠ B | FP16<br/>FP32<br/>Int32<br/>Int8<br/>UInt8 | FP16 | FP16<br/>FP32 | |
| NonMaxSuppression | Non-maximum suppression | FP32 | - | - | FP16 |
| NonZero | Return the indices of all non-zero elements in the input tensor | Bool | - | - | FP16 |
| OneHot | Convert an integer index tensor to a one-hot representation | FP16<br/>FP32<br/>Int32 | - | FP16<br/>FP32<br/>Int32 | |
| OnesLike | Create a tensor with the same shape as the input, with all elements set to 1 | FP16<br/>FP32<br/>Int32 | - | - | FP16 |
| PadFusion | Pad the input tensor to the desired size | FP16<br/>FP32<br/>Int8<br/>UInt8 | FP16 | FP16<br/>FP32 | FP16 |
| PartialFusion | Partial application of a subgraph (used with control-flow operators) | FP16<br/>FP32<br/>Int32<br/>Bool | - | - | |
| PowFusion | Element-wise power | FP16<br/>FP32<br/>Int8<br/>UInt8 | - | FP16<br/>FP32 | FP16 |
| PowerGrad | Compute the gradient of the power operation | FP32 | - | - | |
| PriorBox | Generate prior (anchor) boxes | FP32<br/>Int8<br/>UInt8 | - | - | FP16 |
| PReLUFusion | PReLU activation function | FP16<br/>FP32 | - | FP16<br/>FP32 | FP16 |
| QuantDTypeCast | Convert between quantized and floating-point data types | FP16<br/>FP32<br/>Int8<br/>UInt8 | - | - | |
| RaggedRange | Generate sequences with non-uniform intervals | FP16<br/>FP32<br/>Int32 | - | - | |
| RandomNormal | Generate a tensor whose values are randomly sampled from a normal distribution | FP16<br/>FP32 | - | - | |
| RandomStandardNormal | Generate a random tensor following a standard normal distribution | FP16<br/>FP32 | - | - | |
| Range | Generate a sequence of values within a specified range | FP16<br/>FP32<br/>Int32 | - | - | FP16 |
| Rank | Return the number of dimensions of the input tensor | FP16<br/>FP32 | - | - | |
| RealDiv | Element-wise division | FP16<br/>FP32 | - | - | FP16 |
| Reciprocal | Element-wise reciprocal | FP16<br/>FP32<br/>Int8 | FP16 | - | FP16 |
| ReduceFusion | Reduction operation | FP16<br/>FP32<br/>Int32<br/>Int8<br/>UInt8<br/>Bool | FP16 | FP16<br/>FP32 | FP16 |
| ReduceScatter | Distributed collective communication: the input tensor is split across devices, with each device keeping one segment of the reduced result | FP32 | - | - | |
| Reshape | Change the shape of a tensor while keeping the total number of elements unchanged | FP16<br/>FP32<br/>Int32<br/>Int8<br/>UInt8<br/>Bool | FP16 | FP16<br/>FP32<br/>Int32 | FP16 |
| Resize | Upsample or resize the input tensor | FP16<br/>FP32<br/>Int8<br/>UInt8 | FP16 | FP16<br/>FP32 | |
| ResizeGrad | Compute the gradient of the Resize operation | FP16<br/>FP32 | - | - | |
| ReverseV2 | Reverse the tensor along the specified axis | FP32<br/>Int32 | - | - | |
| ReverseSequence | Reverse variable-length sequences of the input tensor | FP32 | - | - | FP16 |
| ROIPooling | Region-of-interest (ROI) pooling | FP32 | - | - | FP16 |
| Round | Round to the nearest integer | FP16<br/>FP32<br/>Int8<br/>UInt8 | FP16 | FP16<br/>FP32 | FP16 |
| Rsqrt | Element-wise reciprocal of the square root | FP16<br/>FP32<br/>Int8<br/>UInt8 | FP16 | FP16<br/>FP32 | |
| RsqrtGrad | Compute the gradient of the reciprocal square root | FP32 | - | - | |
| Select | Select elements from two tensors based on a condition | FP32<br/>Bool | - | - | |
| Selu | Scaled exponential linear unit (SELU) activation function | - | - | - | |
| ScaleFusion | Fused scaling operation | FP16<br/>FP32<br/>Int8<br/>UInt8 | FP16 | FP16<br/>FP32 | FP16 |
| ScatterNd | Scatter values from the input tensor to specified positions in the output tensor based on the index | FP16<br/>FP32<br/>Int32 | - | - | FP16 |
| ScatterNdUpdate | Update the input tensor at positions given by the index with the provided values | FP16<br/>FP32<br/>Int32 | - | - | |
| SGD | Stochastic gradient descent optimizer | FP32 | - | - | FP16 |
| Shape | Obtain the tensor shape | FP16<br/>FP32<br/>Int32<br/>Int8<br/>UInt8<br/>Bool | - | FP16<br/>FP32 | FP16 |
| SigmoidCrossEntropyWithLogits | Combine sigmoid activation and cross-entropy loss | FP32 | - | - | FP16 |
| SigmoidCrossEntropyWithLogitsGrad | Compute the gradient of the sigmoid cross-entropy loss | FP32 | - | - | FP16 |
| Sin | Element-wise sine | FP16<br/>FP32<br/>Int8<br/>UInt8 | FP16 | FP16<br/>FP32 | FP16 |
| Size | Return the number of elements in the tensor | FP16<br/>FP32<br/>Int32 | - | - | FP16 |
| SliceFusion | Tensor slicing | FP16<br/>FP32<br/>Int32<br/>Int8<br/>UInt8 | FP16 | FP16<br/>FP32 | FP16 |
| SkipGram | Core operation of the Skip-gram model, used for training word vectors | FP32 | - | - | |
| SmoothL1Loss | Smooth L1 loss | FP32 | - | - | FP16 |
| SmoothL1LossGrad | Compute the gradient of the smooth L1 loss | FP32 | - | - | |
| Softmax | Softmax normalization | FP16<br/>FP32<br/>Int8<br/>UInt8 | FP16 | FP16<br/>FP32 | FP16 |
| SoftmaxGrad | Compute the gradient of Softmax | FP32 | - | - | |
| Softplus | Softplus activation, a smooth variant of ReLU | FP16<br/>FP32 | - | - | FP16 |
| SpaceToBatch | Move blocks of spatial data into the batch dimension | FP16<br/>FP32<br/>Int8<br/>UInt8 | - | FP16<br/>FP32 | FP16 |
| SpaceToBatchND | Split spatial data blocks into the batch dimension (N-dimensional) | FP16<br/>FP32<br/>Int8<br/>UInt8 | - | FP16<br/>FP32 | |
| SpaceToDepth | Reorganize spatial data into depth channels | FP16<br/>FP32 | - | FP16<br/>FP32 | |
| SparseToDense | Convert sparse representations to dense tensors | FP16<br/>FP32<br/>Int32 | - | FP16<br/>FP32<br/>Int32 | |
| SparseSoftmaxCrossEntropyWithLogits | Softmax cross-entropy for sparse labels | FP32 | - | - | FP16 |
| Splice | Connect multiple slices or ranges of the input tensor along the specified axis | FP16<br/>FP32 | - | - | |
| Split | Split the input tensor into multiple smaller output tensors along the specified axis | FP16<br/>FP32<br/>Int32<br/>Int8<br/>UInt8 | FP16 | FP16<br/>FP32 | FP16 |
| SplitWithOverlap | Split a tensor into overlapping segments | FP16<br/>FP32 | - | - | |
| Sqrt | Element-wise square root | FP16<br/>FP32<br/>Int8<br/>UInt8 | FP16 | FP16<br/>FP32 | FP16 |
| SqrtGrad | Compute the gradient of the square root | FP32 | - | - | |
| Square | Element-wise square | FP16<br/>FP32<br/>Int8<br/>UInt8 | FP16 | FP16<br/>FP32 | FP16 |
| SquaredDifference | Element-wise compute (A − B)² | FP16<br/>FP32 | - | FP16<br/>FP32 | |
| Squeeze | Remove dimensions of size 1 | FP16<br/>FP32<br/>Int32<br/>Int8<br/>UInt8<br/>Bool | - | FP16<br/>FP32<br/>Int32 | |
| StridedSlice | Strided tensor slicing | FP16<br/>FP32<br/>Int32<br/>Int8<br/>UInt8 | FP16 | FP16<br/>FP32 | FP16 |
| StridedSliceGrad | Compute the gradient of the strided-slice operation | FP16<br/>FP32 | - | - | |
| Stack | Stack multiple tensors along a new axis | FP16<br/>FP32<br/>Int32 | - | FP16<br/>FP32 | FP16 |
| SubFusion | Element-wise subtraction | FP16<br/>FP32<br/>Int32<br/>Int8<br/>UInt8 | FP16 | FP16<br/>FP32 | FP16 |
| SubGrad | Compute the gradient of the subtraction operation | FP32 | - | - | |
| Switch | Select an output branch based on a Boolean condition | FP16<br/>FP32<br/>Int32<br/>Bool | - | - | |
| SwitchLayer | Select among subnetwork branches for execution within the model | FP16<br/>FP32<br/>Int32<br/>Bool | - | - | |
| TensorListFromTensor | Convert a regular tensor into a tensor list, splitting along the specified axis | FP16<br/>FP32<br/>Int32 | - | - | |
| TensorListGetItem | Retrieve the tensor at the specified index position from a tensor list | FP16<br/>FP32<br/>Int32 | - | - | |
| TensorListReserve | Preallocate an empty tensor list with the specified element data type and initial capacity | FP16<br/>FP32<br/>Int32 | - | - | |
| TensorListSetItem | Set the tensor at the specified position in a tensor list | FP16<br/>FP32<br/>Int32 | - | - | |
| TensorListStack | Stack a tensor list into a single regular tensor | FP16<br/>FP32<br/>Int32 | - | - | |
| TensorScatterAdd | Add update values to the target tensor at positions specified by the index | FP32<br/>Int32 | - | - | |
| TileFusion | Repeat (tile) the input tensor along each dimension | FP16<br/>FP32<br/>Int32<br/>Bool | FP16 | - | FP16 |
| TopKFusion | Return the largest K elements of the input tensor | FP16<br/>FP32<br/>Int32<br/>Int8<br/>UInt8 | - | - | FP16 |
| Transpose | Tensor transpose | FP16<br/>FP32<br/>Int32<br/>Int8<br/>Bool | FP16 | FP16<br/>FP32 | FP16 |
| UniformReal | Generate a random tensor following a uniform distribution | FP32<br/>Int32 | - | - | |
| Unique | Return the unique values of the input tensor, along with their indices and counts | FP16<br/>FP32<br/>Int32 | - | - | |
| UnsortedSegmentSum | Compute segment sums of the tensor without requiring sorted segment indices | FP16<br/>FP32<br/>Int32 | - | - | |
| Unsqueeze | Insert a new dimension of size 1 into the input tensor | FP16<br/>FP32<br/>Int32<br/>Int8<br/>UInt8<br/>Bool | FP16 | FP16<br/>FP32<br/>Int32 | |
| Unstack | Split a tensor into multiple sub-tensors along a specified axis | FP16<br/>FP32<br/>Int32 | - | - | |
| Where | Select elements based on a condition | FP16<br/>FP32<br/>Int32<br/>Bool | - | - | |
| ZerosLike | Create a tensor with the same shape as the input, with all elements set to 0 | FP16<br/>FP32<br/>Int32 | - | - | |
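
Which column of the table applies at runtime is decided by the device context a model is built with: devices are tried in the order they are added, and operators the preferred backend does not support generally fall back to the next device in the list. The following is a minimal sketch, assuming the MindSpore Lite 2.x C++ API (`Context`, `MutableDeviceInfo`, `GPUDeviceInfo`, `CPUDeviceInfo`, and `Model::Build`); the model path `model.ms` is a placeholder.

```cpp
#include <iostream>
#include <memory>

#include "include/api/context.h"
#include "include/api/model.h"

int main() {
  // Build a context whose device list is tried in priority order.
  auto context = std::make_shared<mindspore::Context>();
  auto &device_list = context->MutableDeviceInfo();

  // Prefer the GPU backend; enabling FP16 lets operators from the
  // "GPU (Mali/Adreno)" column above run in half precision.
  auto gpu_device = std::make_shared<mindspore::GPUDeviceInfo>();
  gpu_device->SetEnableFP16(true);
  device_list.push_back(gpu_device);

  // CPU fallback: operators the GPU backend does not support run here.
  auto cpu_device = std::make_shared<mindspore::CPUDeviceInfo>();
  cpu_device->SetEnableFP16(true);
  device_list.push_back(cpu_device);

  // Load and compile the model ("model.ms" is a placeholder path).
  mindspore::Model model;
  if (model.Build("model.ms", mindspore::kMindIR, context) != mindspore::kSuccess) {
    std::cerr << "Failed to build model" << std::endl;
    return 1;
  }
  return 0;
}
```

For the Kirin NPU or Ascend columns, a `KirinNPUDeviceInfo` or `AscendDeviceInfo` would be pushed onto the same device list ahead of the CPU entry in the same way.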