mindspore.nn
Neural Network Cell
Predefined building blocks and computational units for constructing neural networks.
For the dynamic shape support status, refer to Dynamic Shape Support Status of nn Interface.
For the mindspore.nn operators that were added, deleted, or changed their supported platforms compared with the previous version, refer to mindspore.nn API Interface Change.
Basic Block
API Name 
Description 
Supported Platforms 
The basic building block of neural networks in MindSpore. 


Base class for running the graph loaded from a MindIR. 


Base class for other losses. 


Base class for updating parameters. 

Container
API Name 
Description 
Supported Platforms 
Holds Cells in a dictionary. 


Holds Cells in a list. 


Sequential Cell container. 

Wrapper Layer
API Name 
Description 
Supported Platforms 
A distributed optimizer. 


Dynamic Loss scale update cell. 


Update cell with fixed loss scaling value. 


Encapsulate training network. 


Cell to run for getting the next operation. 


Wrap the network with Micro Batch to enable the grad accumulation in semi_auto_parallel/auto_parallel mode. 


This function splits the input along the 0th dimension into interleave_num pieces and then performs the computation of the wrapped cell. 


Cell that updates parameter. 


Slice MiniBatch into finer-grained MicroBatch for use in pipeline-parallel training. 


PipelineGradReducer is a gradient reducer for pipeline parallelism. 


The time distributed layer. 


Network training package class. 


Network training with loss scaling. 


Wraps the forward network with the loss function. 


Cell with loss function. 

Convolutional Layer
API Name 
Description 
Supported Platforms 
1D convolution layer. 


Calculates a 1D transposed convolution, which can be regarded as Conv1d for the gradient of the input, also called deconvolution (although it is not an actual deconvolution). 


2D convolution layer. 


Calculates a 2D transposed convolution, which can be regarded as Conv2d for the gradient of the input, also called deconvolution (although it is not an actual deconvolution). 


3D convolution layer. 


Calculates a 3D transposed convolution, which can be regarded as Conv3d for the gradient of the input. 


Extracts patches from images. 

Recurrent Layer
API Name 
Description 
Supported Platforms 
Stacked Elman RNN layers, applying RNN layer with \(\tanh\) or \(\text{ReLU}\) nonlinearity to the input. 


An Elman RNN cell with tanh or ReLU nonlinearity. 


Stacked GRU (Gated Recurrent Unit) layers. 


A GRU (Gated Recurrent Unit) cell. 


Stacked LSTM (Long Short-Term Memory) layers. 


An LSTM (Long Short-Term Memory) cell. 

Transformer Layer
API Name 
Description 
Supported Platforms 
An implementation of multi-head attention from the paper "Attention Is All You Need". 


Transformer Encoder Layer. 


Transformer Decoder Layer. 


Transformer Encoder module with multilayer stacked of 


Transformer Decoder module with multilayer stacked of 


Transformer module including encoder and decoder. 

Embedding Layer
API Name 
Description 
Supported Platforms 
A simple lookup table that stores embeddings of a fixed dictionary and size. 


EmbeddingLookup layer. 


Returns a slice of input tensor based on the specified indices and the field ids. 

Nonlinear Activation Layer
API Name 
Description 
Supported Platforms 
CELU Activation Operator. 


Applies the exponential linear unit function elementwise. 


Applies FastGelu function to each element of the input. 


Applies GELU function to each element of the input. 


The gated linear unit function. 


Gets the activation function. 


Applies the Hardtanh function elementwise. 


Applies Hard Shrink activation function elementwise. 


Applies Hard Sigmoid activation function elementwise. 


Applies Hard Swish activation function elementwise. 


Leaky ReLU activation function. 


Applies logsigmoid activation elementwise. 


Applies the LogSoftmax function to an n-dimensional input tensor elementwise. 


Local Response Normalization. 


Computes MISH (A Self Regularized Non-Monotonic Neural Activation Function) of input tensors elementwise. 


Applies softsign activation function elementwise. 


Applies PReLU activation function elementwise. 


Applies ReLU (Rectified Linear Unit activation function) elementwise. 


Compute ReLU6 activation function elementwise. 


Applies RReLU (Randomized Leaky ReLU activation function) elementwise. 


Applies activation function SeLU (Scaled exponential Linear Unit) elementwise. 


Applies the SiLU (Sigmoid Linear Unit) function elementwise. 


Applies sigmoid activation function elementwise. 


Softmin activation function, which is a two-category function 


Softmax activation function, which is a two-category function 


Softmax function applied to 2D features data. 


Applies the SoftShrink function elementwise. 


Applies the Tanh function elementwise and returns a new tensor with the hyperbolic tangent of the elements of the input. The input is a Tensor with any valid shape. 


Applies Tanhshrink activation function elementwise and returns a new tensor. 


Thresholds each element of the input Tensor. 

Linear Layer
API Name 
Description 
Supported Platforms 
The densely connected layer. 


The bilinear densely connected layer. 

Dropout Layer
API Name 
Description 
Supported Platforms 
Dropout layer for the input. 


During training, randomly zeroes entire channels of the input tensor with probability p from a Bernoulli distribution. For a 3-dimensional tensor with a shape of \((N, C, L)\), the channel feature map refers to a 1-dimensional feature map with the shape of \(L\). 


During training, randomly zeroes some channels of the input tensor with probability p from a Bernoulli distribution. For a 4-dimensional tensor with a shape of \(NCHW\), the channel feature map refers to a 2-dimensional feature map with the shape of \(HW\). 


During training, randomly zeroes some channels of the input tensor with probability p from a Bernoulli distribution. For a 5-dimensional tensor with a shape of \(NCDHW\), the channel feature map refers to a 3-dimensional feature map with the shape of \(DHW\). 

Normalization Layer
API Name 
Description 
Supported Platforms 
This layer applies Batch Normalization over a 2D or 3D input (a minibatch of 1D or 2D inputs) to reduce internal covariate shift. 


Batch Normalization is widely used in convolutional networks. 


Batch Normalization is widely used in convolutional networks. 


Group Normalization over a minibatch of inputs. 


This layer applies Instance Normalization over a 3D input (a minibatch of 1D inputs with additional channel dimension). 


This layer applies Instance Normalization over a 4D input (a minibatch of 2D inputs with additional channel dimension). 


This layer applies Instance Normalization over a 5D input (a minibatch of 3D inputs with additional channel dimension). 


Applies Layer Normalization over a minibatch of inputs. 


Sync Batch Normalization layer over an N-dimension input. 

Pooling Layer
API Name 
Description 
Supported Platforms 
Applies a 1D adaptive average pooling over an input Tensor which can be regarded as a composition of 1D input planes. 


This operator applies a 2D adaptive average pooling to an input signal composed of multiple input planes. 


This operator applies a 3D adaptive average pooling to an input signal composed of multiple input planes. 


Applies a 1D adaptive maximum pooling over an input Tensor which can be regarded as a composition of 1D input planes. 


This operator applies a 2D adaptive max pooling to an input signal composed of multiple input planes. 


Calculates the 3D adaptive max pooling for an input Tensor. 


Applies a 1D average pooling over an input Tensor which can be regarded as a composition of 1D input planes. 


Applies a 2D average pooling over an input Tensor which can be regarded as a composition of 2D input planes. 


Applies a 3D average pooling over an input Tensor which can be regarded as a composition of 3D input planes. 


Applies the 3D FractionalMaxPool operation over input. 


Applies a 1D LPPooling operation over an input Tensor which can be regarded as a composition of 1D input planes. 


Applies a 2D LPPooling operation over an input Tensor which can be regarded as a composition of 2D input planes. 


Applies a 1D max pooling over an input Tensor which can be regarded as a composition of 1D planes. 


Applies a 2D max pooling over an input Tensor which can be regarded as a composition of 2D planes. 


3D max pooling operation. 


Computes the inverse of 


Computes the inverse of 


Computes the inverse of 

Padding Layer
API Name 
Description 
Supported Platforms 
Pads the input tensor according to the paddings and mode. 


Pads the last dimension of the input tensor with a given constant value. 


Pads the last two dimensions of the input tensor with a given constant value. 


Pads the last three dimensions of the input tensor with a given constant value. 


Applies reflection padding to the given tensor using the given padding. 


Applies reflection padding to the given tensor using the given padding. 


Pad the given tensor in a reflecting way using the input boundaries as the axis of symmetry. 


Pad on W dimension of input x according to padding. 


Pad on HW dimension of input x according to padding. 


Pad on DHW dimension of input x according to padding. 


Pads the last two dimensions of input tensor with zero. 

Loss Function
API Name 
Description 
Supported Platforms 
BCELoss creates a criterion to measure the binary cross entropy between the true labels and predicted labels. 


Applies a sigmoid activation function to the input logits, then computes the binary cross entropy between the result and the target. 


CosineEmbeddingLoss creates a criterion to measure the similarity between two tensors using cosine distance. 


The cross entropy loss between input and target. 


Calculates the CTC (Connectionist Temporal Classification) loss. 


The Dice coefficient is a set similarity loss, which is used to calculate the similarity between two samples. 


A loss function that addresses class imbalance and differences in classification difficulty. 


Gaussian negative log likelihood loss. 


Calculates the Hinge Embedding Loss value based on the input 'logits' and 'labels' (only including 1 or -1). 


HuberLoss calculates the error between the predicted value and the target value. 


Computes the Kullback-Leibler divergence between the logits and the labels. 


L1Loss is used to calculate the mean absolute error between the predicted value and the target value. 


MarginRankingLoss creates a criterion that measures the loss. 


MAELoss creates a criterion to measure the average absolute error between \(x\) and \(y\) elementwise, where \(x\) is the input and \(y\) is the labels. 


Calculates the mean squared error between the predicted value and the label value. 


For multi-class classification, the label is transformed into multiple binary classifications by one-hot encoding. 


Creates a loss criterion that minimizes the hinge loss for multiclass classification tasks. 


Calculates the MultiLabelSoftMarginLoss. 


Creates a criterion that optimizes a multi-class classification hinge loss (margin-based loss) between input \(x\) (a 2D mini-batch Tensor) and output \(y\) (a 1D tensor of target class indices, \(0 \leq y \leq \text{x.size}(1) - 1\)). 


Gets the negative log likelihood loss between logits and labels. 


Poisson negative log likelihood loss. 


RMSELoss creates a criterion to measure the root mean square error between \(x\) and \(y\) elementwise, where \(x\) is the input and \(y\) is the labels. 


Computes the sampled softmax training loss. 


SmoothL1 loss function: if the elementwise absolute error between the predicted value and the target value is less than the set threshold beta, the squared term is used; otherwise the absolute error term is used. 


A loss class for two-class classification problems. 


Computes softmax cross entropy between logits and labels. 


TripletMarginLoss operation. 
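The SmoothL1 description in this table switches between a squared term and an absolute term at the threshold beta. The plain-Python sketch below illustrates that rule for a single element. The function name `smooth_l1` and the constants (`0.5 * diff**2 / beta` below the threshold, `diff - 0.5 * beta` above it) follow the commonly used definition and are assumptions here, not taken from this page; check the SmoothL1Loss API page for the exact formula and reduction behavior.

```python
def smooth_l1(pred, target, beta=1.0):
    """Elementwise SmoothL1 sketch (assumed constants, no reduction)."""
    diff = abs(pred - target)
    if diff < beta:
        # Small errors: squared term, scaled so both branches meet at diff == beta.
        return 0.5 * diff * diff / beta
    # Large errors: absolute error term, shifted to stay continuous at beta.
    return diff - 0.5 * beta


small_error = smooth_l1(1.0, 1.5)   # |diff| = 0.5 < beta, squared branch
large_error = smooth_l1(0.0, 3.0)   # |diff| = 3.0 >= beta, absolute branch
```

With these assumed constants the two branches agree at `diff == beta`, which is what keeps the loss continuous across the threshold.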

Optimizer
API Name 
Description 
Supported Platforms 
Implements the Adadelta algorithm. 


Implements the Adagrad algorithm. 


Implements the Adaptive Moment Estimation (Adam) algorithm. 


Implements the AdaMax algorithm, a variant of Adaptive Moment Estimation (Adam) based on the infinity norm. 


This optimizer will offload Adam optimizer to host CPU and keep parameters being updated on the device, to minimize the memory cost. 


Implements the Adam algorithm with weight decay. 


Enable the adasum in "auto_parallel/semi_auto_parallel" mode. 


Enable the adasum in "auto_parallel/semi_auto_parallel" mode. 


Implements Average Stochastic Gradient Descent. 


Implements the FTRL algorithm. 


Implements the Lamb (Layer-wise Adaptive Moments optimizer for Batching training) algorithm. 


Implements the LARS algorithm. 


Implements the Adaptive Moment Estimation (Adam) algorithm. 


Implements the Momentum algorithm. 


Implements the TFT optimizer wrapper, which reports status to MindIO TFT before the optimizer updates. 


Implements the ProximalAdagrad algorithm, an online learning and stochastic optimization method. 


Implements Root Mean Squared Propagation (RMSProp) algorithm. 


Implements Resilient backpropagation. 


Implements stochastic gradient descent. 


Updates gradients using the second-order algorithm THOR. 

Dynamic Learning Rate
LearningRateSchedule Class
The dynamic learning rates in this module are all subclasses of LearningRateSchedule. Pass the instance of LearningRateSchedule to an optimizer. During the training process, the optimizer calls the instance taking current step as input to get the current learning rate.
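import mindspore.nn as nn
min_lr = 0.01
max_lr = 0.1
decay_steps = 4
# A LearningRateSchedule instance; the optimizer calls it with the current step.
cosine_decay_lr = nn.CosineDecayLR(min_lr, max_lr, decay_steps)
# Any nn.Cell works here; nn.Dense stands in for a user-defined network.
net = nn.Dense(4, 2)
optim = nn.Momentum(net.trainable_params(), learning_rate=cosine_decay_lr, momentum=0.9)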
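To make concrete what "calls the instance taking current step as input" means, the sketch below evaluates a cosine decay schedule step by step in plain Python, without MindSpore. The helper name `cosine_decay_at_step` is hypothetical, and the formula (a half cosine from `max_lr` down to `min_lr`, with the step clamped at `decay_steps`) is an assumption based on the usual cosine decay definition; confirm it against the CosineDecayLR API page.

```python
import math


def cosine_decay_at_step(min_lr, max_lr, decay_steps, global_step):
    """Plain-Python sketch of a cosine decay schedule (assumed formula)."""
    # Clamp the step so the rate stays at min_lr once decay_steps is reached.
    step = min(global_step, decay_steps)
    cosine = math.cos(math.pi * step / decay_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + cosine)


# Same hyperparameters as the example above: one rate per training step.
rates = [cosine_decay_at_step(0.01, 0.1, 4, step) for step in range(6)]
```

Under this assumed formula the rate starts at `max_lr`, passes the midpoint halfway through `decay_steps`, and stays at `min_lr` afterwards.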
API Name 
Description 
Supported Platforms 
Calculates learning rate based on cosine decay function. 


Calculates learning rate based on exponential decay function. 


Calculates learning rate based on inverse-time decay function. 


Calculates learning rate based on natural exponential decay function. 


Calculates learning rate based on polynomial decay function. 


Gets learning rate warming up. 

Dynamic LR Function
The dynamic learning rates in this module are all functions. Call the function and pass the result to an optimizer. During the training process, the optimizer takes result[current step] as current learning rate.
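import mindspore.nn as nn
min_lr = 0.01
max_lr = 0.1
total_step = 6
step_per_epoch = 1
decay_epoch = 4
# The function returns the learning rates for all steps up front.
lr = nn.cosine_decay_lr(min_lr, max_lr, total_step, step_per_epoch, decay_epoch)
# Any nn.Cell works here; nn.Dense stands in for a user-defined network.
net = nn.Dense(4, 2)
optim = nn.Momentum(net.trainable_params(), learning_rate=lr, momentum=0.9)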
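Unlike the schedule classes, the function form produces the whole sequence of rates before training, and the optimizer then indexes it by step. The plain-Python sketch below mimics that behavior without MindSpore. The helper name `cosine_decay_list` is hypothetical, and the per-epoch formula (epoch index `i // step_per_epoch`, clamped at `decay_epoch`) is an assumption; confirm it against the cosine_decay_lr API page.

```python
import math


def cosine_decay_list(min_lr, max_lr, total_step, step_per_epoch, decay_epoch):
    """Precompute one learning rate per step (assumed cosine decay formula)."""
    rates = []
    for i in range(total_step):
        # All steps within the same epoch share one rate; clamp after decay_epoch.
        epoch = min(i // step_per_epoch, decay_epoch)
        cosine = math.cos(math.pi * epoch / decay_epoch)
        rates.append(min_lr + 0.5 * (max_lr - min_lr) * (1 + cosine))
    return rates


# Same hyperparameters as the example above.
lr = cosine_decay_list(0.01, 0.1, 6, 1, 4)
current_step = 2
current_lr = lr[current_step]  # what the optimizer would read at step 2
```

The returned list has exactly `total_step` entries, so the value passed as `total_step` should cover every step the optimizer will take.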
API Name 
Description 
Supported Platforms 
Calculates learning rate based on cosine decay function. 


Calculates learning rate based on exponential decay function. 


Calculates learning rate based on inverse-time decay function. 


Calculates learning rate based on natural exponential decay function. 


Gets piecewise constant learning rate. 


Calculates learning rate based on polynomial decay function. 


Gets learning rate warming up. 
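As an illustration of the piecewise constant entry in this table, the sketch below builds a per-step list of learning rates from milestones in plain Python. The helper name `piecewise_constant` and its semantics (each milestone is the step index at which the paired rate stops applying, and the last milestone sets the list length) are assumptions for illustration; check the piecewise_constant_lr API page for the actual parameter names and validation rules.

```python
def piecewise_constant(milestones, learning_rates):
    """Expand (milestone, rate) pairs into one learning rate per step."""
    if len(milestones) != len(learning_rates):
        raise ValueError("milestones and learning_rates must have the same length")
    rates = []
    start = 0
    for end, rate in zip(milestones, learning_rates):
        if end <= start:
            raise ValueError("milestones must be positive and strictly increasing")
        # Steps in [start, end) all use the same rate.
        rates.extend([rate] * (end - start))
        start = end
    return rates


schedule = piecewise_constant([2, 5, 10], [0.1, 0.05, 0.01])
```

Here the first two steps use 0.1, the next three use 0.05, and the remaining five use 0.01.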

Image Processing Layer
API Name 
Description 
Supported Platforms 
Applies the PixelShuffle operation over input which implements subpixel convolutions with stride \(1/r\). 


Applies the PixelUnshuffle operation over input which is the inverse of PixelShuffle. 


For details, please refer to 

Tools
API Name 
Description 
Supported Platforms 
Divide the channels of Tensor whose shape is \((*, C, H, W)\) into \(g\) groups to obtain a Tensor with shape \((*, \frac{C}{g}, g, H, W)\), and transpose along the corresponding axis of \(C\), \(\frac{C}{g}\) and \(g\) to restore Tensor to the original shape. 


Flatten the input Tensor along dimensions from start_dim to end_dim. 


A placeholder identity operator that returns the same as input. 


Unflattens a Tensor dim according to axis and unflattened_size. 
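The channel shuffle entry in this table is easier to follow on channel indices alone. The plain-Python sketch below computes the order in which the original channels appear after shuffling, ignoring the batch and spatial dimensions. The helper name `channel_shuffle_order` is hypothetical, and the grouping convention (view the channels as `groups` rows of `C / groups` entries, transpose, then flatten) is an assumption based on the usual channel shuffle definition; confirm it against the API page before relying on it.

```python
def channel_shuffle_order(num_channels, groups):
    """Return the source channel index for each output channel position."""
    if num_channels % groups != 0:
        raise ValueError("num_channels must be divisible by groups")
    per_group = num_channels // groups
    # Reading the (groups, per_group) grid column by column interleaves the groups.
    return [g * per_group + i for i in range(per_group) for g in range(groups)]


order = channel_shuffle_order(6, 2)
```

For 6 channels and 2 groups, the groups `[0, 1, 2]` and `[3, 4, 5]` are interleaved, and the output keeps the same number of channels as the input.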
