mindspore.nn¶
Neural Networks Cells.
Predefined building blocks or computing units to construct Neural Networks.

class mindspore.nn.Accuracy(eval_type='classification')[source]¶
Calculates the accuracy for classification and multilabel data.
The Accuracy class creates two local variables, correct number and total number, that are used to compute the frequency with which predictions match labels. This frequency is ultimately returned as the accuracy: an idempotent operation that simply divides correct number by total number.
\[\text{accuracy} = \frac{\text{true_positive} + \text{true_negative}} {\text{true_positive} + \text{true_negative} + \text{false_positive} + \text{false_negative}}\]
 Parameters
eval_type (str) – Metric to calculate the accuracy over a dataset: ‘classification’ for single-label classification, or ‘multilabel’ for multilabel classification. Default: ‘classification’.
Examples
>>> x = Tensor(np.array([[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]]), mindspore.float32)
>>> y = Tensor(np.array([1, 0, 1]), mindspore.float32)
>>> metric = nn.Accuracy('classification')
>>> metric.clear()
>>> metric.update(x, y)
>>> accuracy = metric.eval()

eval()[source]¶
Computes the accuracy.
 Returns
Float, the computed result.
 Raises
RuntimeError – If the sample size is 0.

update(*inputs)[source]¶
Updates the internal evaluation result \(y_{pred}\) and \(y\).
 Parameters
inputs – Input y_pred and y. y_pred and y are a Tensor, a list or an array. For the ‘classification’ evaluation type, y_pred is in most cases (not strictly) a list of floating-point numbers in range \([0, 1]\) and the shape is \((N, C)\), where \(N\) is the number of cases and \(C\) is the number of categories. The shape of y can be \((N, C)\) with values 0 and 1 if one-hot encoding is used, or \((N,)\) with integer values if the category index is used. For the ‘multilabel’ evaluation type, y_pred and y can only be one-hot encoded with values 0 or 1. Indices with 1 indicate a positive category. The shapes of y_pred and y are both \((N, C)\).
 Raises
ValueError – If the number of the input is not 2.
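The arithmetic for the ‘classification’ case can be sketched outside MindSpore with plain NumPy; this is an illustrative re-implementation of the metric's math, not the MindSpore API.

```python
import numpy as np

def classification_accuracy(y_pred, y):
    """Illustrative sketch of the 'classification' accuracy math:
    take the argmax over the C categories and compare with the labels."""
    predicted = np.argmax(y_pred, axis=1)   # shape (N,)
    correct = np.sum(predicted == y)        # correct number
    total = y.shape[0]                      # total number
    if total == 0:
        raise RuntimeError("sample size is 0")
    return correct / total

y_pred = np.array([[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]])
y = np.array([1, 0, 1])
print(classification_accuracy(y_pred, y))   # 2 of 3 predictions correct
```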

class mindspore.nn.Adam(params, learning_rate=0.001, beta1=0.9, beta2=0.999, eps=1e-08, use_locking=False, use_nesterov=False, weight_decay=0.0, loss_scale=1.0, decay_filter=<function Adam.<lambda>>)[source]¶
Updates gradients by the Adaptive Moment Estimation (Adam) algorithm.
The Adam algorithm is proposed in Adam: A Method for Stochastic Optimization.
The updating formulas are as follows,
\[\begin{split}\begin{array}{ll} \\ m = \beta_1 * m + (1 - \beta_1) * g \\ v = \beta_2 * v + (1 - \beta_2) * g * g \\ l = \alpha * \frac{\sqrt{1 - \beta_2^t}}{1 - \beta_1^t} \\ w = w - l * \frac{m}{\sqrt{v} + \epsilon} \end{array}\end{split}\]
\(m\) represents the 1st moment vector moment1, \(v\) represents the 2nd moment vector moment2, \(g\) represents gradients, \(l\) represents scaling factor lr, \(\beta_1, \beta_2\) represent beta1 and beta2, \(t\) represents the updating step while \(\beta_1^t\) and \(\beta_2^t\) represent beta1_power and beta2_power, \(\alpha\) represents learning_rate, \(w\) represents params, \(\epsilon\) represents eps.
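One step of the update formulas above can be sketched in plain NumPy (an illustrative sketch of the math, not the MindSpore implementation):

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update step following the formulas above."""
    m = beta1 * m + (1 - beta1) * g                      # 1st moment estimate
    v = beta2 * v + (1 - beta2) * g * g                  # 2nd moment estimate
    l = lr * np.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)  # bias-corrected scaling factor
    w = w - l * m / (np.sqrt(v) + eps)                   # parameter update
    return w, m, v

w = np.array([1.0, 2.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
g = np.array([0.1, -0.2])          # a positive and a negative gradient component
w, m, v = adam_step(w, g, m, v, t=1)
```

With zero-initialized moments, the first step moves each parameter against its gradient by roughly the learning rate, as expected from the bias correction.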
 Parameters
params (list[Parameter]) – A list of parameters to be updated. The elements of params should be of class mindspore.Parameter.
learning_rate (float) – The learning rate. Default: 0.001.
beta1 (float) – The exponential decay rate for the 1st moment estimates. Should be in range (0.0, 1.0).
beta2 (float) – The exponential decay rate for the 2nd moment estimates. Should be in range (0.0, 1.0).
eps (float) – Term added to the denominator to improve numerical stability. Should be greater than 0.
use_locking (bool) – Whether to enable a lock to protect updating variable tensors. If True, updating of the var, m, and v tensors will be protected by a lock. If False, the result is unpredictable. Default: False.
use_nesterov (bool) – Whether to use Nesterov Accelerated Gradient (NAG) algorithm to update the gradients. If True, updates the gradients using NAG. If False, updates the gradients without using NAG. Default: False.
weight_decay (float) – Weight decay (L2 penalty). Default: 0.0.
loss_scale (float) – A floating point value for the loss scale. Default: 1.0. Should be equal to or greater than 1.
 Inputs:
gradients (tuple[Tensor]) – The gradients of params, the shape is the same as params.
 Outputs:
Tensor[bool], the value is True.
Examples
>>> net = Net()
>>> loss = nn.SoftmaxCrossEntropyWithLogits()
>>> optim = nn.Adam(params=net.trainable_params())
>>> model = Model(net, loss_fn=loss, optimizer=optim, metrics=None)

class mindspore.nn.AdamWeightDecay(params, learning_rate=0.001, beta1=0.9, beta2=0.999, eps=1e-06, weight_decay=0.0)[source]¶
Implements the Adam algorithm with weight decay fix.
 Parameters
params (list[Parameter]) – A list of parameters to be updated. The elements of params should be of class mindspore.Parameter.
learning_rate (float) – A floating point value for the learning rate. Default: 1e-3.
beta1 (float) – The exponential decay rate for the 1st moment estimates. Default: 0.9. Should be in range (0.0, 1.0).
beta2 (float) – The exponential decay rate for the 2nd moment estimates. Default: 0.999. Should be in range (0.0, 1.0).
eps (float) – Term added to the denominator to improve numerical stability. Default: 1e-6. Should be greater than 0.
weight_decay (float) – Weight decay (L2 penalty). Default: 0.0.
 Inputs:
gradients (tuple[Tensor]) – The gradients of params, the shape is the same as params.
 Outputs:
tuple[Parameter], the updated velocity value, the shape is the same as params.
Examples
>>> net = Net()
>>> loss = nn.SoftmaxCrossEntropyWithLogits()
>>> optim = nn.AdamWeightDecay(params=net.trainable_params())
>>> model = Model(net, loss_fn=loss, optimizer=optim, metrics=None)

class mindspore.nn.AdamWeightDecayDynamicLR(params, decay_steps, learning_rate=0.001, end_learning_rate=0.0001, power=10.0, beta1=0.9, beta2=0.999, eps=1e-06, weight_decay=0.0)[source]¶
Adam Weight Decay Dynamic Learning Rate (LR).
 Parameters
params (list[Parameter]) – A list of parameters to be updated. The elements of params should be of class mindspore.Parameter.
decay_steps (int) – The steps of the decay.
learning_rate (float) – A floating point value for the learning rate. Default: 0.001.
end_learning_rate (float) – A floating point value for the end learning rate. Default: 0.0001.
power (float) – The power of the decay. Default: 10.0.
beta1 (float) – The exponential decay rate for the 1st moment estimates. Default: 0.9. Should be in range (0.0, 1.0).
beta2 (float) – The exponential decay rate for the 2nd moment estimates. Default: 0.999. Should be in range (0.0, 1.0).
eps (float) – Term added to the denominator to improve numerical stability. Default: 1e-6. Should be greater than 0.
weight_decay (float) – Weight decay (L2 penalty). Default: 0.0.
 Inputs:
gradients (tuple[Tensor]) – The gradients of params, the shape is the same as params.
 Outputs:
tuple[Parameter], the updated velocity value, the shape is the same as params.
Examples
>>> net = Net()
>>> loss = nn.SoftmaxCrossEntropyWithLogits()
>>> optim = nn.AdamWeightDecayDynamicLR(params=net.trainable_params(), decay_steps=10)
>>> model = Model(net, loss_fn=loss, optimizer=optim, metrics=None)
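The parameter names above (decay_steps, end_learning_rate, power) suggest a polynomial learning-rate decay; under that assumption, the schedule can be sketched in plain Python (an illustrative sketch, not the MindSpore implementation):

```python
def polynomial_decay_lr(step, decay_steps, learning_rate=0.001,
                        end_learning_rate=0.0001, power=10.0):
    """Assumed schedule: interpolate from learning_rate down to
    end_learning_rate over decay_steps with the given power."""
    p = min(step, decay_steps) / decay_steps
    return (learning_rate - end_learning_rate) * (1 - p) ** power + end_learning_rate

print(polynomial_decay_lr(0, 10))   # start of training: the initial learning rate
print(polynomial_decay_lr(10, 10))  # fully decayed: the end learning rate
```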

class mindspore.nn.AvgPool1d(kernel_size=1, stride=1, pad_mode='valid')[source]¶
Average pooling for temporal data.
Applies a 1D average pooling over an input Tensor which can be regarded as a composition of 1D input planes.
Typically the input is of shape \((N_{in}, C_{in}, H_{in}, W_{in})\), AvgPool1d outputs regional averages in the \(W_{in}\) dimension. Given kernel size \(ks = w_{ker}\) and stride \(s = s_0\), the operation is as follows.
\[\text{output}(N_i, C_j, h_k, w) = \frac{1}{w_{ker}} \sum_{n=0}^{w_{ker}-1} \text{input}(N_i, C_j, h_k, s_0 \times w + n)\]
Note
pad_mode for training only supports “same” and “valid”.
 Parameters
kernel_size (int) – The size of the kernel window used to take the average value. Default: 1.
stride (int) – The distance of kernel moving, an int number that represents the width of movement. Default: 1.
pad_mode (str) –
The optional values for pad_mode are “same” or “valid”, not case sensitive. Default: “valid”.
same: Adopts the way of completion. Output height and width will be the same as the input. The total number of padding will be calculated in the horizontal and vertical directions and evenly distributed to top and bottom, left and right if possible. Otherwise, the last extra padding will be done from the bottom and the right side.
valid: Adopts the way of discarding. The possibly largest height and width of the output will be returned without padding. Extra pixels will be discarded.
 Inputs:
input (Tensor) – Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).
 Outputs:
Tensor of shape \((N, C_{out}, H_{out}, W_{out})\).
Examples
>>> pool = nn.AvgPool1d(kernel_size=3, stride=1)
>>> x = Tensor(np.random.randint(0, 10, [1, 2, 4, 4]), mindspore.float32)
>>> output = pool(x)
>>> output.shape()
(1, 2, 4, 2)

class mindspore.nn.AvgPool2d(kernel_size=1, stride=1, pad_mode='valid')[source]¶
Average pooling for spatial data.
Applies a 2D average pooling over an input Tensor which can be regarded as a composition of 2D input planes.
Typically the input is of shape \((N_{in}, C_{in}, H_{in}, W_{in})\), AvgPool2d outputs regional averages in the \((H_{in}, W_{in})\) dimensions. Given kernel size \(ks = (h_{ker}, w_{ker})\) and stride \(s = (s_0, s_1)\), the operation is as follows.
\[\text{output}(N_i, C_j, h, w) = \frac{1}{h_{ker} * w_{ker}} \sum_{m=0}^{h_{ker}-1} \sum_{n=0}^{w_{ker}-1} \text{input}(N_i, C_j, s_0 \times h + m, s_1 \times w + n)\]
Note
pad_mode for training only supports “same” and “valid”.
 Parameters
kernel_size (Union[int, tuple[int]]) – The size of kernel used to take the average value, is an int number that represents height and width are both kernel_size, or a tuple of two int numbers that represent height and width respectively. Default: 1.
stride (Union[int, tuple[int]]) – The distance of kernel moving, an int number that represents the height and width of movement are both strides, or a tuple of two int numbers that represent height and width of movement respectively. Default: 1.
pad_mode (str) –
The optional values for pad_mode are “same” or “valid”, not case sensitive. Default: “valid”.
same: Adopts the way of completion. Output height and width will be the same as the input. The total number of padding will be calculated in the horizontal and vertical directions and evenly distributed to top and bottom, left and right if possible. Otherwise, the last extra padding will be done from the bottom and the right side.
valid: Adopts the way of discarding. The possibly largest height and width of the output will be returned without padding. Extra pixels will be discarded.
 Inputs:
input (Tensor) – Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).
 Outputs:
Tensor of shape \((N, C_{out}, H_{out}, W_{out})\).
Examples
>>> pool = nn.AvgPool2d(kernel_size=3, stride=1)
>>> x = Tensor(np.random.randint(0, 10, [1, 2, 4, 4]), mindspore.float32)
>>> x
[[[[5. 5. 9. 9.]
   [8. 4. 3. 0.]
   [2. 7. 1. 2.]
   [1. 8. 3. 3.]]
  [[6. 8. 2. 4.]
   [3. 0. 2. 1.]
   [0. 8. 9. 7.]
   [2. 1. 4. 9.]]]]
>>> output = pool(x)
>>> output.shape()
(1, 2, 2, 2)
>>> output
[[[[4.888889  4.4444447]
   [4.111111  3.4444444]]
  [[4.2222223 4.5555553]
   [3.2222223 4.5555553]]]]
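The regional-average formula above can be checked with a plain NumPy sketch of ‘valid’-mode 2D average pooling (illustrative only, not the MindSpore operator):

```python
import numpy as np

def avg_pool2d(x, kernel_size, stride):
    """'valid'-mode 2D average pooling over an (N, C, H, W) tensor,
    per the formula above (illustrative NumPy sketch)."""
    n, c, h, w = x.shape
    kh, kw = kernel_size
    sh, sw = stride
    out_h = (h - kh) // sh + 1
    out_w = (w - kw) // sw + 1
    out = np.zeros((n, c, out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # average over the (kh, kw) window starting at (i*sh, j*sw)
            window = x[:, :, i * sh:i * sh + kh, j * sw:j * sw + kw]
            out[:, :, i, j] = window.mean(axis=(2, 3))
    return out

x = np.arange(16, dtype=np.float64).reshape(1, 1, 4, 4)
y = avg_pool2d(x, (3, 3), (1, 1))   # a 4x4 input pooled to 2x2
```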

class mindspore.nn.BatchNorm1d(num_features, eps=1e-05, momentum=0.9, affine=True, gamma_init='ones', beta_init='zeros', moving_mean_init='zeros', moving_var_init='ones', use_batch_statistics=True)[source]¶
Batch normalization layer over a 2D input.
Batch Normalization is widely used in convolutional networks. This layer applies Batch Normalization over a 2D input (a mini-batch of 1D inputs) to reduce internal covariate shift as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. It rescales and recenters the feature using a mini-batch of data and the learned parameters, which can be described by the following formula.
\[y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]
 Parameters
num_features (int) – C from an expected input of size (N, C).
eps (float) – A value added to the denominator for numerical stability. Default: 1e-5.
momentum (float) – A floating hyperparameter of the momentum for the running_mean and running_var computation. Default: 0.9.
affine (bool) – A bool value when set to True, gamma and beta can be learnable. Default: True.
gamma_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the gamma weight. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘ones’.
beta_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the beta weight. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘zeros’.
moving_mean_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the moving mean. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘zeros’.
moving_var_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the moving variance. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘ones’.
use_batch_statistics (bool) – If true, use the mean and variance values of the current batch data; otherwise, use the stored moving mean and moving variance values. Default: True.
 Inputs:
input (Tensor) – Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).
 Outputs:
Tensor, the normalized, scaled, offset tensor, of shape \((N, C_{out}, H_{out}, W_{out})\).
Examples
>>> net = nn.BatchNorm1d(num_features=16)
>>> input = Tensor(np.random.randint(0, 255, [3, 16]), mindspore.float32)
>>> net(input)

class mindspore.nn.BatchNorm2d(num_features, eps=1e-05, momentum=0.9, affine=True, gamma_init='ones', beta_init='zeros', moving_mean_init='zeros', moving_var_init='ones', use_batch_statistics=True)[source]¶
Batch normalization layer over a 4D input.
Batch Normalization is widely used in convolutional networks. This layer applies Batch Normalization over a 4D input (a mini-batch of 2D inputs with an additional channel dimension) to reduce internal covariate shift as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. It rescales and recenters the feature using a mini-batch of data and the learned parameters, which can be described by the following formula.
\[y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]
 Parameters
num_features (int) – C from an expected input of size (N, C, H, W).
eps (float) – A value added to the denominator for numerical stability. Default: 1e-5.
momentum (float) – A floating hyperparameter of the momentum for the running_mean and running_var computation. Default: 0.9.
affine (bool) – A bool value when set to True, gamma and beta can be learnable. Default: True.
gamma_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the gamma weight. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘ones’.
beta_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the beta weight. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘zeros’.
moving_mean_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the moving mean. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘zeros’.
moving_var_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the moving variance. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘ones’.
use_batch_statistics (bool) – If true, use the mean and variance values of the current batch data; otherwise, use the stored moving mean and moving variance values. Default: True.
 Inputs:
input (Tensor) – Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).
 Outputs:
Tensor, the normalized, scaled, offset tensor, of shape \((N, C_{out}, H_{out}, W_{out})\).
Examples
>>> net = nn.BatchNorm2d(num_features=3)
>>> input = Tensor(np.random.randint(0, 255, [1, 3, 224, 224]), mindspore.float32)
>>> net(input)

class mindspore.nn.Cell(auto_prefix=True, flags=None)[source]¶
Base class for all neural networks.
A ‘Cell’ could be a single neural network cell, such as conv2d, relu, batch_norm, etc., or a composition of cells used to construct a network.
Note
In general, the autograd algorithm will automatically generate the implementation of the gradient function, but if the bprop method is implemented, the gradient function will be replaced by bprop. The bprop implementation will receive a Tensor dout containing the gradient of the loss w.r.t. the output, and a Tensor out containing the forward result. bprop needs to compute the gradient of the loss w.r.t. the inputs; computing the gradient of the loss w.r.t. the Parameter variables is not currently supported.
 Parameters
auto_prefix (bool) – Recursively generate namespaces. Default: True.
Examples
>>> class MyCell(Cell):
>>>     def __init__(self):
>>>         super(MyCell, self).__init__()
>>>         self.relu = P.ReLU()
>>>
>>>     def construct(self, x):
>>>         return self.relu(x)

cells_and_names(cells=None, name_prefix='')[source]¶
Returns an iterator over all cells in the network.
Includes the cell’s name and itself.
 Parameters
cells – Cells to iterate over. Default: None.
name_prefix (str) – Name prefix prepended to cell names. Default: ''.
Examples
>>> n = Net()
>>> names = []
>>> for m in n.cells_and_names():
>>>     if m[0]:
>>>         names.append(m[0])

compile_and_run(*inputs)[source]¶
Compiles and runs cell.
 Parameters
inputs (tuple) – Input parameters.
 Returns
Object, the result of execution.

construct(*inputs)[source]¶
Defines the computation to be performed.
This method should be overridden by all subclasses.
Note
The inputs of the top cell only allow Tensor. Other types (tuple, list, int etc.) are forbidden.
 Returns
Tensor, returns the computed result.

extend_repr()[source]¶
Sets the extended representation of the Cell.
To print customized extended information, reimplement this method in your own cells.

get_parameters(expand=True)[source]¶
Returns an iterator over cell parameters.
Yields parameters of this cell. If expand is True, yield parameters of this cell and all subcells.
 Parameters
expand (bool) – If True, yields parameters of this cell and all subcells. Otherwise, yields only parameters that are direct members of this cell. Default: True.
Examples
>>> net = Net()
>>> for item in net.get_parameters():
>>>     print(item)

insert_child_to_cell(child_name, child)[source]¶
Adds a child cell to the current cell.
Inserts a subcell with the given name into the current cell.

insert_param_to_cell(param_name, param, check_name=True)[source]¶
Adds a parameter to the current cell.
Inserts a parameter with the given name into the cell. Please refer to the usage in the source code of mindspore.nn.Cell.__setattr__.
 Parameters
param_name (str) – The name of the parameter.
param (Parameter) – The parameter to be inserted into the cell.
check_name (bool) – Whether to check the validity of the name. Default: True.
 Raises
KeyError – If the name of parameter is null or contains dot.
AttributeError – If __init__() was not called first.
TypeError – If the type of parameter is not Parameter.

load_parameter_slice(params)[source]¶
Replaces parameters with sliced tensors by parallel strategies.
Please refer to the usage in source code of mindspore.common._Executor.compile.
 Parameters
params (dict) – The parameters dictionary used for init data graph.

name_cells()[source]¶
Returns an iterator over all cells in the network.
Includes the name of the cell and the cell itself.

parameters_and_names(name_prefix='', expand=True)[source]¶
Returns an iterator over cell parameters.
Includes the parameter’s name and itself.
 Parameters
name_prefix (str) – Prefix prepended to parameter names. Default: ''.
expand (bool) – If True, yields parameters of this cell and all subcells. Otherwise, yields only parameters that are direct members of this cell. Default: True.
Examples
>>> n = Net()
>>> names = []
>>> for m in n.parameters_and_names():
>>>     if m[0]:
>>>         names.append(m[0])

parameters_dict(recurse=True)[source]¶
Gets the parameters dictionary.
Gets the parameters dictionary of this cell.
 Parameters
recurse (bool) – Whether to include the parameters of subcells. Default: True.
 Returns
OrderedDict, the parameters dictionary.

set_broadcast_flag(mode=True)[source]¶
Sets the cell to data_parallel mode.
 Parameters
mode (bool) – Specifies whether the model is data_parallel. Default: True.

set_train(mode=True)[source]¶
Sets the cell to training mode.
The cell itself and all children cells will be set to training mode.
 Parameters
mode (bool) – Specifies whether the model is training. Default: True.

to_float(dst_type)[source]¶
Adds a cast on all inputs of the cell and child cells so they run with a certain float type.
If dst_type is mindspore.dtype.float16, all the inputs of the Cell, including input, Parameter and Tensor as const, will be cast to float16. Please refer to the usage in the source code of mindspore.train.amp.build_train_network.
Note
Calling this method multiple times will overwrite the previous setting.
 Parameters
dst_type (mindspore.dtype) – Transfers the Cell to run with dst_type. dst_type can be mindspore.dtype.float16 or mindspore.dtype.float32.
 Raises
ValueError – If dst_type is not float32 or float16.

trainable_params(recurse=True)[source]¶
Returns all trainable parameters.
Returns a list of all trainable parameters.
 Parameters
recurse (bool) – Whether to include the trainable parameters of subcells. Default: True.
 Returns
List, the list of trainable parameters.

class mindspore.nn.CellList(*args)[source]¶
Holds Cells in a list.
CellList can be indexed like a regular Python list, but cells it contains are properly registered, and will be visible by all Cell methods.
 Parameters
args (list, optional) – List of subclass of Cell.
Examples
>>> conv = nn.Conv2d(100, 20, 3)
>>> bn = nn.BatchNorm2d(20)
>>> relu = nn.ReLU()
>>> cell_ls = nn.CellList([bn])
>>> cell_ls.insert(0, conv)
>>> cell_ls.append(relu)
>>> x = Tensor(np.random.random((1, 3, 4, 4)), dtype=mindspore.float32)
>>> # not same as nn.SequentialCell, `cell_ls(x)` is not correct
>>> cell_ls
CellList<
  (0): Conv2d<input_channels=100, ..., bias_init=None>
  (1): BatchNorm2d<num_features=20, ..., moving_variance=Parameter (name=variance)>
  (2): ReLU<>
  >

class mindspore.nn.ClipByNorm[source]¶
Clips tensor values to a maximum \(L_2\)-norm.
The output of this layer remains the same if the \(L_2\)-norm of the input tensor is not greater than the argument clip_norm. Otherwise the tensor will be normalized as:
\[\text{output}(X) = \frac{\text{clip_norm} * X}{L_2(X)},\]
where \(L_2(X)\) is the \(L_2\)-norm of \(X\).
 Inputs:
input (Tensor) – Tensor of shape N-D.
clip_norm (Tensor) – A scalar Tensor of shape \(()\) or \((1)\) and of the same type as the input Tensor.
 Outputs:
Tensor, clipped tensor with the same shape as the input.
Examples
>>> net = nn.ClipByNorm()
>>> input = Tensor(np.random.randint(0, 10, [4, 16]), mindspore.float32)
>>> clip_norm = Tensor(np.array([100]).astype(np.float32))
>>> net(input, clip_norm)
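The clipping rule above can be reproduced in plain NumPy (an illustrative sketch of the math, not the MindSpore Cell):

```python
import numpy as np

def clip_by_norm(x, clip_norm):
    """Rescale x only when its L2-norm exceeds clip_norm,
    per the formula above."""
    l2 = np.sqrt(np.sum(x ** 2))
    if l2 <= clip_norm:
        return x                      # norm within bound: unchanged
    return clip_norm * x / l2         # norm too large: rescale to clip_norm

x = np.array([3.0, 4.0])              # L2-norm = 5
print(clip_by_norm(x, 10.0))          # unchanged: norm 5 <= 10
print(clip_by_norm(x, 1.0))           # rescaled to unit norm: [0.6 0.8]
```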

class mindspore.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, pad_mode='same', padding=0, dilation=1, group=1, has_bias=False, weight_init='normal', bias_init='zeros')[source]¶
2D convolution layer.
Applies a 2D convolution over an input tensor which is typically of shape \((N, C_{in}, H_{in}, W_{in})\), where \(N\) is batch size and \(C_{in}\) is channel number. For each batch of shape \((C_{in}, H_{in}, W_{in})\), the formula is defined as:
\[out_j = \sum_{i=0}^{C_{in} - 1} ccor(W_{ij}, X_i) + b_j,\]
where \(ccor\) is the cross-correlation operator, \(C_{in}\) is the input channel number, \(j\) ranges from \(0\) to \(C_{out} - 1\), \(W_{ij}\) corresponds to the \(i\)-th channel of the \(j\)-th filter and \(out_{j}\) corresponds to the \(j\)-th channel of the output. \(W_{ij}\) is a slice of the kernel and it has shape \((\text{ks_h}, \text{ks_w})\), where \(\text{ks_h}\) and \(\text{ks_w}\) are the height and width of the convolution kernel. The full kernel has shape \((C_{out}, C_{in} // \text{group}, \text{ks_h}, \text{ks_w})\), where group is the group number to split the input in the channel dimension.
If ‘pad_mode’ is set to “valid”, the output height and width will be \(\left \lfloor{1 + \frac{H_{in} + 2 \times \text{padding} - \text{ks_h} - (\text{ks_h} - 1) \times (\text{dilation} - 1) }{\text{stride}}} \right \rfloor\) and \(\left \lfloor{1 + \frac{W_{in} + 2 \times \text{padding} - \text{ks_w} - (\text{ks_w} - 1) \times (\text{dilation} - 1) }{\text{stride}}} \right \rfloor\) respectively.
The first introduction can be found in paper Gradient Based Learning Applied to Document Recognition.
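The ‘valid’-mode output-size formula above can be sketched as a small helper (an illustrative function, not part of the MindSpore API):

```python
import math

def conv2d_valid_out(size, kernel, stride=1, dilation=1, padding=0):
    """Output height or width for pad_mode 'valid', per the formula above.
    The dilated kernel spans kernel + (kernel - 1) * (dilation - 1) pixels."""
    effective_kernel = kernel + (kernel - 1) * (dilation - 1)
    return math.floor(1 + (size + 2 * padding - effective_kernel) / stride)

print(conv2d_valid_out(32, 3))              # kernel 3, stride 1 -> 30
print(conv2d_valid_out(28, 5, stride=2))    # strided convolution
print(conv2d_valid_out(32, 3, dilation=2))  # dilated convolution
```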
 Parameters
in_channels (int) – The number of input channel \(C_{in}\).
out_channels (int) – The number of output channel \(C_{out}\).
kernel_size (Union[int, tuple[int]]) – The data type is int or tuple with 2 integers. Specifies the height and width of the 2D convolution window. A single int means the value is for both the height and the width of the kernel. A tuple of 2 ints means the first value is for the height and the other is for the width of the kernel.
stride (Union[int, tuple[int]]) – The distance of kernel moving, an int number that represents the height and width of movement are both strides, or a tuple of two int numbers that represent height and width of movement respectively. Default: 1.
pad_mode (str) –
Specifies padding mode. The optional values are “same”, “valid”, “pad”. Default: “same”.
same: Adopts the way of completion. Output height and width will be the same as the input. Total number of padding will be calculated for horizontal and vertical direction and evenly distributed to top and bottom, left and right if possible. Otherwise, the last extra padding will be done from the bottom and the right side. If this mode is set, padding must be 0.
valid: Adopts the way of discarding. The possibly largest height and width of the output will be returned without padding. Extra pixels will be discarded. If this mode is set, padding must be 0.
pad: Implicit paddings on both sides of the input. The number of padding will be padded to the input Tensor borders. padding should be greater than or equal to 0.
padding (int) – Implicit paddings on both sides of the input. Default: 0.
dilation (Union[int, tuple[int]]) – The data type is int or tuple with 2 integers. Specifies the dilation rate to use for dilated convolution. If set to be \(k > 1\), there will be \(k  1\) pixels skipped for each sampling location. Its value should be greater or equal to 1 and bounded by the height and width of the input. Default: 1.
group (int) – Splits filter into groups, in_channels and out_channels should be divisible by the number of groups. Default: 1.
has_bias (bool) – Specifies whether the layer uses a bias vector. Default: False.
weight_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the convolution kernel. It can be a Tensor, a string, an Initializer or a numbers.Number. When a string is specified, values from ‘TruncatedNormal’, ‘Normal’, ‘Uniform’, ‘HeUniform’ and ‘XavierUniform’ distributions as well as constant ‘One’ and ‘Zero’ distributions are possible. Alias ‘xavier_uniform’, ‘he_uniform’, ‘ones’ and ‘zeros’ are acceptable. Uppercase and lowercase are both acceptable. Refer to the values of Initializer for more details. Default: ‘normal’.
bias_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the bias vector. Possible Initializer and string are the same as ‘weight_init’. Refer to the values of Initializer for more details. Default: ‘zeros’.
 Inputs:
input (Tensor) – Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).
 Outputs:
Tensor of shape \((N, C_{out}, H_{out}, W_{out})\).
Examples
>>> net = nn.Conv2d(120, 240, 4, has_bias=False, weight_init='normal')
>>> input = Tensor(np.ones([1, 120, 1024, 640]), mindspore.float32)
>>> net(input).shape()
(1, 240, 1024, 640)

class mindspore.nn.Conv2dTranspose(in_channels, out_channels, kernel_size, stride=1, pad_mode='same', padding=0, dilation=1, group=1, has_bias=False, weight_init='normal', bias_init='zeros')[source]¶
2D transposed convolution layer.
Computes a 2D transposed convolution, which is also known as a deconvolution (although it is not an actual deconvolution).
Input is typically of shape \((N, C, H, W)\), where \(N\) is batch size and \(C\) is channel number.
 Parameters
in_channels (int) – The number of channels in the input space.
out_channels (int) – The number of channels in the output space.
kernel_size (Union[int, tuple]) – int or tuple with 2 integers, which specifies the height and width of the 2D convolution window. Single int means the value is for both height and width of the kernel. A tuple of 2 ints means the first value is for the height and the other is for the width of the kernel.
stride (Union[int, tuple[int]]) – The distance of kernel moving, an int number that represents the height and width of movement are both strides, or a tuple of two int numbers that represent height and width of movement respectively. Default: 1.
pad_mode (str) –
Select the mode of the pad. The optional values are “pad”, “same”, “valid”. Default: “same”.
pad: Implicit paddings on both sides of the input.
same: Adopts the way of completion.
valid: Adopts the way of discarding.
padding (int) – Implicit paddings on both sides of the input. Default: 0.
dilation (Union[int, tuple[int]]) – The data type is int or tuple with 2 integers. Specifies the dilation rate to use for dilated convolution. If set to be \(k > 1\), there will be \(k  1\) pixels skipped for each sampling location. Its value should be greater or equal to 1 and bounded by the height and width of the input. Default: 1.
group (int) – Split filter into groups, in_channels and out_channels should be divisible by the number of groups. Default: 1.
has_bias (bool) – Specifies whether the layer uses a bias vector. Default: False.
weight_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the convolution kernel. It can be a Tensor, a string, an Initializer or a numbers.Number. When a string is specified, values from ‘TruncatedNormal’, ‘Normal’, ‘Uniform’, ‘HeUniform’ and ‘XavierUniform’ distributions as well as constant ‘One’ and ‘Zero’ distributions are possible. Alias ‘xavier_uniform’, ‘he_uniform’, ‘ones’ and ‘zeros’ are acceptable. Uppercase and lowercase are both acceptable. Refer to the values of Initializer for more details. Default: ‘normal’.
bias_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the bias vector. Possible Initializer and string are the same as ‘weight_init’. Refer to the values of Initializer for more details. Default: ‘zeros’.
 Inputs:
input (Tensor) – Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).
 Outputs:
Tensor of shape \((N, C_{out}, H_{out}, W_{out})\).
Examples
>>> net = nn.Conv2dTranspose(3, 64, 4, has_bias=False, weight_init='normal')
>>> input = Tensor(np.ones([1, 3, 16, 50]), mindspore.float32)
>>> net(input)

class mindspore.nn.DataWrapper(network, dataset_types, dataset_shapes, queue_name)[source]¶
Network training package class for dataset.
DataWrapper wraps the input network with a dataset which automatically fetches data with ‘GetNext’ function from the dataset channel ‘queue_name’ and does forward computation in the construct function.
 Parameters
network (Cell) – The training network for dataset.
dataset_types (list) – The types of the dataset. The list describes the types of the inputs.
dataset_shapes (list) – The shapes of the dataset. The list contains multiple sublists that describe the shapes of the inputs.
queue_name (str) – The identification of dataset channel which specifies the dataset channel to supply data for the network.
 Outputs:
Tensor, network output whose shape depends on the network.
Examples
>>> # call create_dataset function to create a regular dataset, refer to mindspore.dataset
>>> train_dataset = create_dataset()
>>> dataset_helper = mindspore.DatasetHelper(train_dataset)
>>> net = Net()
>>> net = DataWrapper(net, *(dataset_helper.types_shapes()), train_dataset.queue_name)

class
mindspore.nn.
Dense
(in_channels, out_channels, weight_init='normal', bias_init='zeros', has_bias=True, activation=None)[source]¶ The fully connected layer.
Applies a dense-connected layer for the input. This layer implements the operation as:
\[\text{outputs} = \text{activation}(\text{inputs} * \text{kernel} + \text{bias}),\]where \(\text{activation}\) is the activation function passed as the activation argument (if passed in), \(\text{kernel}\) is a weight matrix with the same data type as the inputs created by the layer, and \(\text{bias}\) is a bias vector with the same data type as the inputs created by the layer (only if has_bias is True).
 Parameters
in_channels (int) – The number of channels in the input space.
out_channels (int) – The number of channels in the output space.
weight_init (Union[Tensor, str, Initializer, numbers.Number]) – The trainable weight_init parameter. The dtype is same as input x. The values of str refer to the function initializer. Default: ‘normal’.
bias_init (Union[Tensor, str, Initializer, numbers.Number]) – The trainable bias_init parameter. The dtype is same as input x. The values of str refer to the function initializer. Default: ‘zeros’.
has_bias (bool) – Specifies whether the layer uses a bias vector. Default: True.
activation (str) – Activation function applied to the output of the layer, e.g. ‘relu’. Default: None.
 Raises
ValueError – If weight_init or bias_init shape is incorrect.
 Inputs:
input (Tensor)  Tensor of shape \((N, in\_channels)\).
 Outputs:
Tensor of shape \((N, out\_channels)\).
Examples
>>> net = nn.Dense(3, 4) >>> input = Tensor(np.random.randint(0, 255, [2, 3]), mindspore.float32) >>> net(input) [[ 2.5246444 2.2738023 0.5711005 3.9399147 ] [ 1.0739875 4.0155234 0.94188046 5.459526 ]]

class
mindspore.nn.
DistributedGradReducer
(parameters, mean=True, degree=None)[source]¶ A distributed gradient reducer.
Constructs a gradient reducer Cell, which applies communication and averaging operations on single-process gradient values.
 Parameters
parameters (list) – The parameters to be updated.
mean (bool) – When mean is true, the mean coefficient (degree) will be applied on gradients. Default: True.
degree (int) – The mean coefficient. Usually it equals the device number. Default: None.
 Raises
ValueError – If degree is not an int or is less than 0.
Examples
>>> import os
>>> import numpy as np
>>> from mindspore import Tensor
>>> from mindspore.communication import init, get_group_size
>>> from mindspore.ops import composite as C
>>> from mindspore.ops import operations as P
>>> from mindspore.ops import functional as F
>>> from mindspore import context
>>> from mindspore import nn
>>> from mindspore import ParallelMode, ParameterTuple
>>>
>>> device_id = int(os.environ["DEVICE_ID"])
>>> context.set_context(mode=context.GRAPH_MODE, device_target="Ascend", save_graphs=True,
>>>                     device_id=int(device_id), enable_hccl=True)
>>> init()
>>> context.reset_auto_parallel_context()
>>> context.set_auto_parallel_context(parallel_mode=ParallelMode.DATA_PARALLEL)
>>>
>>>
>>> class TrainingWrapper(nn.Cell):
>>>     def __init__(self, network, optimizer, sens=1.0):
>>>         super(TrainingWrapper, self).__init__(auto_prefix=False)
>>>         self.network = network
>>>         self.network.add_flags(defer_inline=True)
>>>         self.weights = ParameterTuple(network.trainable_params())
>>>         self.optimizer = optimizer
>>>         self.grad = C.GradOperation('grad', get_by_list=True, sens_param=True)
>>>         self.sens = sens
>>>         self.reducer_flag = False
>>>         self.grad_reducer = None
>>>         self.parallel_mode = context.get_auto_parallel_context("parallel_mode")
>>>         if self.parallel_mode in [ParallelMode.DATA_PARALLEL,
>>>                                   ParallelMode.HYBRID_PARALLEL]:
>>>             self.reducer_flag = True
>>>         if self.reducer_flag:
>>>             mean = context.get_auto_parallel_context("mirror_mean")
>>>             degree = get_group_size()
>>>             self.grad_reducer = nn.DistributedGradReducer(optimizer.parameters, mean, degree)
>>>
>>>     def construct(self, *args):
>>>         weights = self.weights
>>>         loss = self.network(*args)
>>>         sens = P.Fill()(P.DType()(loss), P.Shape()(loss), self.sens)
>>>         grads = self.grad(self.network, weights)(*args, sens)
>>>         if self.reducer_flag:
>>>             # apply grad reducer on grads
>>>             grads = self.grad_reducer(grads)
>>>         return F.depend(loss, self.optimizer(grads))
>>>
>>> network = Net()
>>> optimizer = nn.Momentum(network.trainable_params(), learning_rate=0.1, momentum=0.9)
>>> train_cell = TrainingWrapper(network, optimizer)
>>> inputs = Tensor(np.ones([16, 16]).astype(np.float32))
>>> label = Tensor(np.zeros([16, 16]).astype(np.float32))
>>> grads = train_cell(inputs, label)

class
mindspore.nn.
Dropout
(keep_prob=0.5, seed0=0, seed1=0, dtype=mindspore.float32)[source]¶ Dropout layer for the input.
Randomly sets some elements of the input tensor to zero with probability \(1 - keep\_prob\) during training, using samples from a Bernoulli distribution.
Note
Each channel will be zeroed out independently on every construct call.
The outputs are scaled by a factor of \(\frac{1}{keep\_prob}\) during training so that the output layer remains at a similar scale. During inference, this layer returns the same tensor as the input.
This technique is proposed in the paper Dropout: A Simple Way to Prevent Neural Networks from Overfitting and proved to be effective for reducing overfitting and preventing neurons from co-adaptation. See more details in Improving neural networks by preventing co-adaptation of feature detectors.
 Parameters
keep_prob (float) – The keep rate, greater than 0 and less than or equal to 1, e.g. keep_prob=0.9 drops out 10% of input units. Default: 0.5.
seed0 (int) – The first random seed. Default: 0.
seed1 (int) – The second random seed. Default: 0.
dtype (
mindspore.dtype
) – Data type of input. Default: mindspore.float32.
 Raises
ValueError – If keep_prob is not in range (0, 1].
 Inputs:
input (Tensor)  An ND Tensor.
 Outputs:
Tensor, output tensor with the same shape as the input.
Examples
>>> x = Tensor(np.ones([20, 16, 50]), mindspore.float32) >>> net = nn.Dropout(keep_prob=0.8) >>> net(x)
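The scaling behavior described in the note above can be sketched with NumPy. This is an illustrative sketch of inverted dropout under the stated semantics, not the MindSpore implementation:

```python
import numpy as np

def inverted_dropout(x, keep_prob, training=True, rng=None):
    # Zero each element with probability 1 - keep_prob, then scale the
    # survivors by 1 / keep_prob so the expected value is unchanged.
    if not training:
        return x  # inference: identity, as described above
    rng = rng or np.random.default_rng(0)
    mask = rng.random(x.shape) < keep_prob  # Bernoulli(keep_prob) mask
    return x * mask / keep_prob

x = np.ones([20, 16, 50], dtype=np.float32)
y = inverted_dropout(x, keep_prob=0.8)  # survivors become 1 / 0.8 = 1.25
```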

class
mindspore.nn.
DynamicLossScaleUpdateCell
(loss_scale_value, scale_factor, scale_window)[source]¶ Dynamic Loss scale update cell.
For loss scaling training, the initial loss scaling value is set to loss_scale_value. In every training step, the loss scaling value is divided by scale_factor when overflow occurs, and multiplied by scale_factor when there is no overflow for scale_window consecutive steps. This cell is used for graph-mode training, in which all logic is executed on the device side (the other training mode is normal (non-sink) mode, in which some logic is executed on the host).
 Parameters
loss_scale_value (float) – Init loss scale.
scale_factor (int) – Coefficient of increase and decrease.
scale_window (int) – Maximum continuous training steps that do not have overflow.
 Inputs:
inputs (Tensor)  Tensor of shape \((N, \ldots)\).
label (Tensor)  Tensor of shape \((N, \ldots)\).
 Outputs:
Tensor, a scalar Tensor with shape \(()\).
Examples
>>> net_with_loss = Net() >>> optimizer = nn.Momentum(net_with_loss.trainable_params(), learning_rate=0.1, momentum=0.9) >>> manager = nn.DynamicLossScaleUpdateCell(loss_scale_value=2**12, scale_factor=2, scale_window=1000) >>> train_network = nn.TrainOneStepWithLossScaleCell(net_with_loss, optimizer, scale_update_cell=manager) >>> train_network.set_train() >>> >>> inputs = Tensor(np.ones([16, 16]).astype(np.float32)) >>> label = Tensor(np.zeros([16, 16]).astype(np.float32)) >>> scaling_sens = Tensor(np.full((1), np.finfo(np.float32).max), dtype=mindspore.float32) >>> output = train_network(inputs, label, scaling_sens)
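The update rule described above can be sketched in plain Python. This is an illustrative sketch of the described behavior, not the MindSpore cell:

```python
def update_loss_scale(scale, overflow, good_steps, scale_factor=2, scale_window=1000):
    # One step of the dynamic rule described above: divide the scale by
    # scale_factor on overflow, multiply it by scale_factor after
    # scale_window consecutive overflow-free steps.
    if overflow:
        return scale / scale_factor, 0  # shrink and reset the counter
    good_steps += 1
    if good_steps >= scale_window:
        return scale * scale_factor, 0  # grow and reset the counter
    return scale, good_steps

scale, good = update_loss_scale(2.0 ** 12, overflow=True, good_steps=0)  # -> (2048.0, 0)
```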

class
mindspore.nn.
ELU
(alpha=1.0)[source]¶ Exponential Linear Unit activation function.
Applies the exponential linear unit function elementwise. The activation function is defined as:
\[E_{i} = \begin{cases} x_i, &\text{if } x_i \geq 0; \cr \text{alpha} * (\exp(x_i) - 1), &\text{otherwise.} \end{cases}\] Parameters
alpha (float) – The coefficient of negative factor whose type is float. Default: 1.0.
 Inputs:
input_data (Tensor)  The input of ELU.
 Outputs:
Tensor, with the same type and shape as the input_data.
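Since this entry has no runnable example, here is a NumPy sketch of the formula above (an illustration, not the MindSpore API):

```python
import numpy as np

def elu(x, alpha=1.0):
    # Identity for x >= 0; alpha * (exp(x) - 1) otherwise, as in the
    # formula above. np.expm1 computes exp(x) - 1 accurately.
    x = np.asarray(x, dtype=np.float64)
    return np.where(x >= 0, x, alpha * np.expm1(x))

out = elu([-1.0, 0.0, 2.0])  # approximately [-0.632, 0.0, 2.0]
```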

class
mindspore.nn.
Embedding
(vocab_size, embedding_size, use_one_hot=False, embedding_table='normal', dtype=mindspore.float32)[source]¶ A simple lookup table that stores embeddings of a fixed dictionary and size.
This module is often used to store word embeddings and retrieve them using indices. The input to the module is a list of indices, and the output is the corresponding word embeddings.
Note
When ‘use_one_hot’ is set to True, the input should be of type mindspore.int32.
 Parameters
vocab_size (int) – Size of the dictionary of embeddings.
embedding_size (int) – The size of each embedding vector.
use_one_hot (bool) – Specifies whether to apply one_hot encoding form. Default: False.
embedding_table (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the embedding_table. Refer to class initializer for the values of string when a string is specified. Default: ‘normal’.
dtype (
mindspore.dtype
) – Data type of input. Default: mindspore.float32.
 Inputs:
input (Tensor) - Tensor of indices of shape \((\text{batch_size}, \text{seq_len})\).
 Outputs:
Tensor of shape \((\text{batch_size}, \text{seq_len}, \text{embedding_size})\).
Examples
>>> net = nn.Embedding(20000, 768, True) >>> input_data = Tensor(np.ones([8, 128]), mindspore.int32) >>> >>> # Maps the input word IDs to word embedding. >>> output = net(input_data) >>> output.shape() (8, 128, 768)

class
mindspore.nn.
EvaluationBase
(eval_type)[source]¶ Base class of evaluation.
Note
Please refer to the definition of class Accuracy.
 Parameters
eval_type (str) – Type of evaluation must be in {‘classification’, ‘multilabel’}.
 Raises
TypeError – If the input eval_type is not ‘classification’ or ‘multilabel’.

clear
()[source]¶ An interface that describes the behavior of clearing the internal evaluation result.
Note
All subclasses should override this interface.

class
mindspore.nn.
F1
[source]¶ Calculates the F1 score. F1 is a special case of Fbeta when beta is 1. Refer to class Fbeta for more details.
\[F_1=\frac{2\cdot true\_positive}{2\cdot true\_positive + false\_negative + false\_positive}\]Examples
>>> x = Tensor(np.array([[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]])) >>> y = Tensor(np.array([1, 0, 1])) >>> metric = nn.F1() >>> metric.update(x, y) >>> fbeta = metric.eval()

class
mindspore.nn.
FTRL
(params, initial_accum=0.1, learning_rate=0.001, lr_power=-0.5, l1=0.0, l2=0.0, use_locking=False, loss_scale=1.0, weight_decay=0.0)[source]¶ Implements the FTRL algorithm with the ApplyFtrl operator.
FTRL is an online convex optimization algorithm that adaptively chooses its regularization function based on the loss functions. Refer to the paper Adaptive Bound Optimization for Online Convex Optimization. For engineering details, refer to the paper Ad Click Prediction: a View from the Trenches.
 Parameters
params (list[Parameter]) – A list of parameters, which will be updated. The elements in params should be Parameter.
initial_accum (float) – The starting value for accumulators, must be zero or positive values. Default: 0.1.
learning_rate (float) – The learning rate value, should be positive. Default: 0.001.
lr_power (float) – Learning rate power controls how the learning rate decreases during training, must be less than or equal to zero. Use fixed learning rate if lr_power is zero. Default: -0.5.
l1 (float) – l1 regularization strength, must be greater than or equal to zero. Default: 0.0.
l2 (float) – l2 regularization strength, must be greater than or equal to zero. Default: 0.0.
use_locking (bool) – If True use locks for update operation. Default: False.
loss_scale (float) – Value for the loss scale. It should be equal to or greater than 1.0. Default: 1.0.
weight_decay (float) – Weight decay value to multiply weight, must be zero or a positive value. Default: 0.0.
 Inputs:
grads (tuple[Tensor])  The gradients of params in optimizer, the shape is as same as the params in optimizer.
 Outputs:
tuple[Parameter], the updated parameters, the shape is the same as params.
Examples
>>> net = Net() >>> loss = nn.SoftmaxCrossEntropyWithLogits() >>> opt = nn.FTRL(net.trainable_params()) >>> model = Model(net, loss_fn=loss, optimizer=opt, metrics=None)

class
mindspore.nn.
Fbeta
(beta)[source]¶ Calculates the fbeta score.
Fbeta score is a weighted mean of precision and recall.
\[F_\beta=\frac{(1+\beta^2) \cdot true\_positive} {(1+\beta^2) \cdot true\_positive +\beta^2 \cdot false\_negative + false\_positive}\] Parameters
beta (float) – The weight of precision.
Examples
>>> x = Tensor(np.array([[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]])) >>> y = Tensor(np.array([1, 0, 1])) >>> metric = nn.Fbeta(1) >>> metric.update(x, y) >>> fbeta = metric.eval()

eval
(average=False)[source]¶ Computes the fbeta.
 Parameters
average (bool) – Whether to calculate the average fbeta. Default: False.
 Returns
Float, computed result.

update
(*inputs)[source]¶ Updates the internal evaluation result y_pred and y.
 Parameters
inputs – Input y_pred and y. y_pred and y are Tensor, list or numpy.ndarray. y_pred is in most cases (not strictly) a list of floating numbers in range \([0, 1]\) and the shape is \((N, C)\), where \(N\) is the number of cases and \(C\) is the number of categories. y contains values of integers. The shape is \((N, C)\) if onehot encoding is used. Shape can also be \((N,)\) if category index is used.

class
mindspore.nn.
FixedLossScaleUpdateCell
(loss_scale_value)[source]¶ Static scale update cell, the loss scaling value will not be updated.
For usage please refer to DynamicLossScaleUpdateCell.
 Parameters
loss_scale_value (float) – Init loss scale.
Examples
>>> net_with_loss = Net() >>> optimizer = nn.Momentum(net_with_loss.trainable_params(), learning_rate=0.1, momentum=0.9) >>> manager = nn.FixedLossScaleUpdateCell(loss_scale_value=2**12) >>> train_network = nn.TrainOneStepWithLossScaleCell(net_with_loss, optimizer, scale_update_cell=manager) >>> train_network.set_train() >>> >>> inputs = Tensor(np.ones([16, 16]).astype(np.float32)) >>> label = Tensor(np.zeros([16, 16]).astype(np.float32)) >>> scaling_sens = Tensor(np.full((1), np.finfo(np.float32).max), dtype=mindspore.float32) >>> output = train_network(inputs, label, scaling_sens)

class
mindspore.nn.
Flatten
[source]¶ Flatten layer for the input.
Flattens a tensor without changing dimension of batch size on the 0th axis.
 Inputs:
input (Tensor)  Tensor of shape \((N, \ldots)\) to be flattened.
 Outputs:
Tensor, the shape of the output tensor is \((N, X)\), where \(X\) is the product of the remaining dimensions.
Examples
>>> net = nn.Flatten() >>> input = Tensor(np.array([[[1.2, 1.2], [2.1, 2.1]], [[2.2, 2.2], [3.2, 3.2]]]), mindspore.float32) >>> input.shape() (2, 2, 2) >>> net(input) [[1.2 1.2 2.1 2.1] [2.2 2.2 3.2 3.2]]

class
mindspore.nn.
GELU
[source]¶ Gaussian error linear unit activation function.
Applies GELU function to each element of the input. The input is a Tensor with any valid shape.
GELU is defined as: \(GELU(x_i) = x_i*P(X < x_i)\), where \(P\) is the cumulative distribution function of standard Gaussian distribution and \(x_i\) is the element of the input.
 Inputs:
input_data (Tensor) - The input of GELU.
 Outputs:
Tensor, with the same type and shape as the input_data.
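The exact definition above can be sketched in plain Python via the error function (an illustration, not the MindSpore API):

```python
import math

def gelu(x):
    # Exact GELU from the definition above: x * P(X < x), X ~ N(0, 1).
    # The standard normal CDF is 0.5 * (1 + erf(x / sqrt(2))).
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

val = gelu(1.0)  # approximately 0.8413
```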

class
mindspore.nn.
GetNextSingleOp
(dataset_types, dataset_shapes, queue_name)[source]¶ Cell to run get next operation.
 Parameters
dataset_types (list[
mindspore.dtype
]) – The types of dataset.
dataset_shapes (list[tuple[int]]) – The shapes of dataset.
queue_name (str) – Queue name to fetch the data.
For detailed information, please refer to ops.operations.GetNext.

class
mindspore.nn.
GlobalBatchNorm
(num_features, eps=1e-05, momentum=0.9, affine=True, gamma_init='ones', beta_init='zeros', moving_mean_init='zeros', moving_var_init='ones', use_batch_statistics=True, group=1)[source]¶ Global normalization layer over an N-dimension input.
Global normalization is cross-device synchronized batch normalization. The standard batch normalization implementation only normalizes the data within each device, while global normalization normalizes the input within the group. It has been described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. It rescales and recenters the features using a mini-batch of data and the learned parameters, as described in the following formula.
\[y = \frac{x  \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\] Parameters
num_features (int) – C from an expected input of size (N, C, H, W).
group (int) – The number of devices in each group.
eps (float) – A value added to the denominator for numerical stability. Default: 1e-5.
momentum (float) – A floating hyperparameter of the momentum for the running_mean and running_var computation. Default: 0.9.
gamma_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the gamma weight. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘ones’.
beta_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the beta weight. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘zeros’.
moving_mean_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the moving mean. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘zeros’.
moving_var_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the moving variance. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘ones’.
use_batch_statistics (bool) – If true, use the mean and variance values of the current batch of data; otherwise, use the stored moving mean and moving variance. Default: True.
 Inputs:
input (Tensor)  Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).
 Outputs:
Tensor, the normalized, scaled, offset tensor, of shape \((N, C_{out}, H_{out}, W_{out})\).
Examples
>>> global_bn_op = nn.GlobalBatchNorm(num_features=3, group=4) >>> input = Tensor(np.random.randint(0, 255, [1, 3, 224, 224]), mindspore.float32) >>> global_bn_op(input)

class
mindspore.nn.
GroupNorm
(num_groups, num_channels, eps=1e-05, affine=True)[source]¶ Group Normalization over a mini-batch of inputs.
Group normalization applies normalization over a mini-batch of inputs for each single training case as described in the paper Group Normalization. Group normalization divides the channels into groups and computes within each group the mean and variance for normalization, and it performs very stably over a wide range of batch sizes. It can be described using the following formula.
\[y = \frac{x  \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\] Parameters
num_groups (int) – The number of groups to be divided along the channel dimension.
num_channels (int) – The number of input channels.
eps (float) – A value added to the denominator for numerical stability. Default: 1e-5.
affine (bool) – A bool value, this layer will have learnable affine parameters when set to true. Default: True.
 Inputs:
input_x (Tensor)  The input feature with shape [N, C, H, W].
 Outputs:
Tensor, the normalized and scaled offset tensor, has the same shape and data type as the input_x.
Examples
>>> group_norm_op = nn.GroupNorm(16, 64) >>> x = Tensor(np.ones([1, 64, 256, 256], np.float32)) >>> group_norm_op(x)
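The group-wise normalization step can also be sketched with NumPy (affine gamma/beta omitted; an illustration of the formula above, not the MindSpore implementation):

```python
import numpy as np

def group_norm(x, num_groups, eps=1e-5):
    # Split the C channels into num_groups groups and normalize each group
    # with its own mean and variance (affine gamma/beta omitted).
    n, c, h, w = x.shape
    g = x.reshape(n, num_groups, c // num_groups, h, w)
    mean = g.mean(axis=(2, 3, 4), keepdims=True)
    var = g.var(axis=(2, 3, 4), keepdims=True)
    return ((g - mean) / np.sqrt(var + eps)).reshape(n, c, h, w)

x = np.random.default_rng(0).normal(size=(1, 64, 8, 8))
y = group_norm(x, num_groups=16)  # each group of 4 channels ~ zero mean, unit variance
```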

class
mindspore.nn.
HSigmoid
[source]¶ Hard sigmoid activation function.
Applies hard sigmoid activation elementwise. The input is a Tensor with any valid shape.
Hard sigmoid is defined as:
\[\text{hsigmoid}(x_{i}) = max(0, min(1, \frac{2 * x_{i} + 5}{10})),\]where \(x_{i}\) is the \(i\)th slice along the given dim of the input Tensor.
 Inputs:
input_data (Tensor)  The input of HSigmoid.
 Outputs:
Tensor, with the same type and shape as the input_data.
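Since this entry has no runnable example, here is a NumPy sketch of the formula above (an illustration, not the MindSpore API):

```python
import numpy as np

def hsigmoid(x):
    # Piecewise-linear approximation of sigmoid from the formula above:
    # clamp (2x + 5) / 10 into [0, 1].
    x = np.asarray(x, dtype=np.float64)
    return np.clip((2.0 * x + 5.0) / 10.0, 0.0, 1.0)

out = hsigmoid([-3.0, 0.0, 3.0])  # -> [0.0, 0.5, 1.0]
```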

class
mindspore.nn.
HSwish
[source]¶ Hard swish activation function.
Applies hswish-type activation elementwise. The input is a Tensor with any valid shape.
Hard swish is defined as:
\[\text{hswish}(x_{i}) = x_{i} * \frac{ReLU6(x_{i} + 3)}{6},\]where \(x_{i}\) is the \(i\)th slice along the given dim of the input Tensor.
 Inputs:
input_data (Tensor)  The input of HSwish.
 Outputs:
Tensor, with the same type and shape as the input_data.
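Since this entry has no runnable example, here is a NumPy sketch of the formula above (an illustration, not the MindSpore API):

```python
import numpy as np

def hswish(x):
    # x * ReLU6(x + 3) / 6, as in the formula above;
    # ReLU6(v) = min(max(v, 0), 6).
    x = np.asarray(x, dtype=np.float64)
    return x * np.clip(x + 3.0, 0.0, 6.0) / 6.0

out = hswish([-4.0, 0.0, 4.0])
```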

class
mindspore.nn.
ImageGradients
[source]¶ Returns two tensors, the first is along the height dimension and the second is along the width dimension.
Assume an image shape is \(h*w\). The gradients along the height and the width are \(dy\) and \(dx\), respectively.
\[ \begin{align}\begin{aligned}dy[i] = \begin{cases} image[i+1, :]-image[i, :], &if\ 0<=i<h-1 \cr 0, &if\ i==h-1\end{cases}\\dx[i] = \begin{cases} image[:, i+1]-image[:, i], &if\ 0<=i<w-1 \cr 0, &if\ i==w-1\end{cases}\end{aligned}\end{align} \] Inputs:
images (Tensor)  The input image data, with format ‘NCHW’.
 Outputs:
dy (Tensor)  vertical image gradients, the same type and shape as input.
dx (Tensor)  horizontal image gradients, the same type and shape as input.
Examples
>>> net = nn.ImageGradients() >>> image = Tensor(np.array([[[[1,2],[3,4]]]]), dtype=mstype.int32) >>> net(image) [[[[2,2] [0,0]]]] [[[[1,0] [1,0]]]]

class
mindspore.nn.
L1Loss
(reduction='mean')[source]¶ L1Loss creates a criterion to measure the mean absolute error (MAE) between \(x\) and \(y\) by element, where \(x\) is the input Tensor and \(y\) is the target Tensor.
For simplicity, let \(x\) and \(y\) be 1-dimensional Tensors with length \(N\), the unreduced loss (i.e. with argument reduction set to ‘none’) of \(x\) and \(y\) is given as:
\[L(x, y) = \{l_1,\dots,l_N\}, \quad \text{with } l_n = \left| x_n - y_n \right|\]When argument reduction is ‘mean’, the mean value of \(L(x, y)\) will be returned. When argument reduction is ‘sum’, the sum of \(L(x, y)\) will be returned. \(N\) is the batch size.
 Parameters
reduction (str) – Type of reduction to apply to loss. The optional values are “mean”, “sum”, “none”. Default: “mean”.
 Inputs:
input_data (Tensor)  Tensor of shape \((x_1, x_2, ..., x_R)\).
target_data (Tensor)  Tensor of shape \((y_1, y_2, ..., y_S)\).
 Outputs:
Tensor, loss float tensor.
Examples
>>> loss = nn.L1Loss() >>> input_data = Tensor(np.array([1, 2, 3]), mindspore.float32) >>> target_data = Tensor(np.array([1, 2, 2]), mindspore.float32) >>> loss(input_data, target_data)
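The definition and reduction modes above can be sketched with NumPy (an illustration, not the MindSpore API):

```python
import numpy as np

def l1_loss(x, y, reduction="mean"):
    # Element-wise |x - y| followed by the chosen reduction,
    # matching the definition above.
    l = np.abs(np.asarray(x, dtype=np.float64) - np.asarray(y, dtype=np.float64))
    if reduction == "mean":
        return l.mean()
    if reduction == "sum":
        return l.sum()
    return l  # 'none': the unreduced loss

x, y = [1.0, 2.0, 3.0], [1.0, 2.0, 2.0]
m = l1_loss(x, y)         # |x - y| = [0, 0, 1], so the mean is 1/3
s = l1_loss(x, y, "sum")  # -> 1.0
```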

class
mindspore.nn.
LARS
(optimizer, epsilon=1e-05, hyperpara=0.001, weight_decay=0.0, use_clip=False, decay_filter=<function LARS.<lambda>>, lars_filter=<function LARS.<lambda>>, loss_scale=1.0)[source]¶ Implements the LARS algorithm with the LARSUpdate operator.
LARS is an optimization algorithm employing a large batch optimization technique. Refer to paper LARGE BATCH TRAINING OF CONVOLUTIONAL NETWORKS.
 Parameters
optimizer (Optimizer) – MindSpore optimizer for which to wrap and modify gradients.
epsilon (float) – Term added to the denominator to improve numerical stability. Default: 1e-05.
hyperpara (float) – Trust coefficient for calculating the local learning rate. Default: 0.001.
weight_decay (float) – Weight decay (L2 penalty). Default: 0.0.
use_clip (bool) – Whether to use clip operation for calculating the local learning rate. Default: False.
decay_filter (Function) – A function to determine whether to apply weight decay on parameters. Default: lambda x: ‘LayerNorm’ not in x.name and ‘bias’ not in x.name.
lars_filter (Function) – A function to determine whether to apply the lars algorithm. Default: lambda x: ‘LayerNorm’ not in x.name and ‘bias’ not in x.name.
loss_scale (float) – A floating point value for the loss scale. Default: 1.0.
 Inputs:
gradients (tuple[Tensor])  The gradients of params in optimizer, the shape is as same as the params in optimizer.
 Outputs:
Union[Tensor[bool], tuple[Parameter]], it depends on the output of optimizer.
Examples
>>> net = Net() >>> loss = nn.SoftmaxCrossEntropyWithLogits() >>> opt = nn.Momentum(net.trainable_params(), 0.1, 0.9) >>> opt_lars = nn.LARS(opt, epsilon=1e-08, hyperpara=0.02) >>> model = Model(net, loss_fn=loss, optimizer=opt_lars, metrics=None)

class
mindspore.nn.
LSTM
(input_size, hidden_size, num_layers=1, has_bias=True, batch_first=False, dropout=0, bidirectional=False)[source]¶ LSTM (Long Short-Term Memory) layer.
Applies an LSTM to the input.
There are two pipelines connecting two consecutive cells in an LSTM model; one is the cell state pipeline and the other is the hidden state pipeline. Denote two consecutive time nodes as \(t-1\) and \(t\). Given an input \(x_t\) at time \(t\), a hidden state \(h_{t-1}\) and a cell state \(c_{t-1}\) of the layer at time \(t-1\), the cell state and hidden state at time \(t\) are computed using a gating mechanism. Input gate \(i_t\) is designed to protect the cell from perturbation by irrelevant inputs. Forget gate \(f_t\) affords protection of the cell by forgetting some information in the past, which is stored in \(h_{t-1}\). Output gate \(o_t\) protects other units from perturbation by currently irrelevant memory contents. Candidate cell state \(\tilde{c}_t\) is calculated with the current input, on which the input gate will be applied. Finally, current cell state \(c_{t}\) and hidden state \(h_{t}\) are computed with the calculated gates and cell states. The complete formulation is as follows.
\[\begin{split}\begin{array}{ll} \\ i_t = \sigma(W_{ix} x_t + b_{ix} + W_{ih} h_{(t-1)} + b_{ih}) \\ f_t = \sigma(W_{fx} x_t + b_{fx} + W_{fh} h_{(t-1)} + b_{fh}) \\ \tilde{c}_t = \tanh(W_{cx} x_t + b_{cx} + W_{ch} h_{(t-1)} + b_{ch}) \\ o_t = \sigma(W_{ox} x_t + b_{ox} + W_{oh} h_{(t-1)} + b_{oh}) \\ c_t = f_t * c_{(t-1)} + i_t * \tilde{c}_t \\ h_t = o_t * \tanh(c_t) \\ \end{array}\end{split}\]Here \(\sigma\) is the sigmoid function, and \(*\) is the Hadamard product. \(W, b\) are learnable weights between the output and the input in the formula. For instance, \(W_{ix}, b_{ix}\) are the weight and bias used to transform from input \(x\) to \(i\). Details can be found in paper LONG SHORT-TERM MEMORY and Long Short-Term Memory Recurrent Neural Network Architectures for Large Scale Acoustic Modeling.
 Parameters
input_size (int) – Number of features of input.
hidden_size (int) – Number of features of hidden layer.
num_layers (int) – Number of layers of stacked LSTM. Default: 1.
has_bias (bool) – Specifies whether has bias b_ih and b_hh. Default: True.
batch_first (bool) – Specifies whether the first dimension of input is batch_size. Default: False.
dropout (float) – If not 0, append Dropout layer on the outputs of each LSTM layer except the last layer. Default 0. The range of dropout is [0.0, 1.0].
bidirectional (bool) – Specifies whether this is a bidirectional LSTM. If set True, number of directions will be 2 otherwise number of directions is 1. Default: False.
 Inputs:
input (Tensor)  Tensor of shape (seq_len, batch_size, input_size).
hx (tuple) - A tuple of two Tensors (h_0, c_0) both of data type mindspore.float32 or mindspore.float16 and shape (num_directions * num_layers, batch_size, hidden_size). The data type of hx should be the same as input.
 Outputs:
Tuple, a tuple containing (output, (h_n, c_n)).
output (Tensor)  Tensor of shape (seq_len, batch_size, num_directions * hidden_size).
hx_n (tuple)  A tuple of two Tensor (h_n, c_n) both of shape (num_directions * num_layers, batch_size, hidden_size).
Examples
>>> class LstmNet(nn.Cell): >>> def __init__(self, input_size, hidden_size, num_layers, has_bias, batch_first, bidirectional): >>> super(LstmNet, self).__init__() >>> self.lstm = nn.LSTM(input_size=input_size, >>> hidden_size=hidden_size, >>> num_layers=num_layers, >>> has_bias=has_bias, >>> batch_first=batch_first, >>> bidirectional=bidirectional, >>> dropout=0.0) >>> >>> def construct(self, inp, h0, c0): >>> return self.lstm(inp, (h0, c0)) >>> >>> net = LstmNet(10, 12, 2, has_bias=True, batch_first=True, bidirectional=False) >>> input = Tensor(np.ones([3, 5, 10]).astype(np.float32)) >>> h0 = Tensor(np.ones([1 * 2, 3, 12]).astype(np.float32)) >>> c0 = Tensor(np.ones([1 * 2, 3, 12]).astype(np.float32)) >>> output, (hn, cn) = net(input, h0, c0)

class
mindspore.nn.
Lamb
(params, decay_steps, warmup_steps=0, start_learning_rate=0.1, end_learning_rate=0.0001, power=1.0, beta1=0.9, beta2=0.999, eps=1e-06, weight_decay=0.0, decay_filter=<function Lamb.<lambda>>)[source]¶ Lamb optimizer with dynamic learning rate.
LAMB is an optimization algorithm employing a layerwise adaptive large batch optimization technique. Refer to the paper LARGE BATCH OPTIMIZATION FOR DEEP LEARNING: TRAINING BERT IN 76 MINUTES.
 Parameters
params (list[Parameter]) – A list of parameters, which will be updated. The elements in params should be class mindspore.Parameter.
decay_steps (int) – The steps of the lr decay. Should be equal to or greater than 1.
warmup_steps (int) – The steps of lr warm up. Default: 0.
start_learning_rate (float) – A floating point value for the learning rate. Default: 0.1.
end_learning_rate (float) – A floating point value for the end learning rate. Default: 0.0001.
power (float) – The power of the polynomial. Default: 1.0.
beta1 (float) – The exponential decay rate for the 1st moment estimates. Default: 0.9. Should be in range (0.0, 1.0).
beta2 (float) – The exponential decay rate for the 2nd moment estimates. Default: 0.999. Should be in range (0.0, 1.0).
eps (float) – Term added to the denominator to improve numerical stability. Default: 1e-6. Should be greater than 0.
weight_decay (float) – Weight decay (L2 penalty). Default: 0.0. Should be equal to or greater than 0.
decay_filter (Function) – A function to determine whether to apply weight decay on parameters. Default: lambda x: ‘LayerNorm’ not in x.name and ‘bias’ not in x.name.
 Inputs:
gradients (tuple[Tensor])  The gradients of params, the shape is the same as params.
 Outputs:
tuple[Parameter], the updated velocity value, the shape is the same as params.
Examples
>>> net = Net() >>> loss = nn.SoftmaxCrossEntropyWithLogits() >>> optim = nn.Lamb(params=net.trainable_params(), decay_steps=10) >>> model = Model(net, loss_fn=loss, optimizer=optim, metrics=None)

class
mindspore.nn.
LayerNorm
(normalized_shape, begin_norm_axis=-1, begin_params_axis=-1, gamma_init='ones', beta_init='zeros')[source]¶ Applies Layer Normalization over a mini-batch of inputs.
Layer normalization is widely used in recurrent neural networks. It applies normalization over a mini-batch of inputs for each single training case as described in the paper Layer Normalization. Unlike batch normalization, layer normalization performs exactly the same computation at training and testing times. It is applied across all channels and pixels of each sample, rather than across the batch dimension.
\[y = \frac{x  \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\] Parameters
normalized_shape (Union[tuple[int], list[int]]) – The normalization is performed over axes begin_norm_axis … R - 1 and the centering and scaling parameters are calculated over begin_params_axis … R - 1.
begin_norm_axis (int) – The first normalization dimension: normalization will be performed along dimensions begin_norm_axis: rank(inputs), the value should be in [-1, rank(input)). Default: -1.
begin_params_axis (int) – The first parameter (beta, gamma) dimension: scale and centering parameters will have dimensions begin_params_axis: rank(inputs) and will be broadcast with the normalized inputs accordingly, the value should be in [-1, rank(input)). Default: -1.
gamma_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the gamma weight. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘ones’.
beta_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the beta weight. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘zeros’.
 Inputs:
input_x (Tensor)  The shape of ‘input_x’ is \((x_1, x_2, ..., x_R)\), and input_shape[begin_norm_axis:] is equal to normalized_shape.
 Outputs:
Tensor, the normalized and scaled offset tensor, has the same shape and data type as the input_x.
Examples
>>> x = Tensor(np.ones([20, 5, 10, 10]), mindspore.float32) >>> shape1 = x.shape()[1:] >>> m = nn.LayerNorm(shape1, begin_norm_axis=1, begin_params_axis=1) >>> m(x)

class
mindspore.nn.
LeakyReLU
(alpha=0.2)[source]¶ Leaky ReLU activation function.
LeakyReLU is similar to ReLU, but LeakyReLU has a slope that makes it not equal to 0 at x < 0. The activation function is defined as:
\[\text{leaky_relu}(x) = \begin{cases}x, &\text{if } x \geq 0; \cr \text{alpha} * x, &\text{otherwise.}\end{cases}\]See https://ai.stanford.edu/~amaas/papers/relu_hybrid_icml2013_final.pdf
 Parameters
alpha (float) – Slope of the activation function at x < 0. Default: 0.2.
 Inputs:
input_x (Tensor)  The input of LeakyReLU.
 Outputs:
Tensor, has the same type and shape with the input_x.
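The piecewise definition above maps directly to NumPy (an illustrative sketch, not the MindSpore kernel):

```python
import numpy as np

def leaky_relu(x, alpha=0.2):
    # x where x >= 0, alpha * x where x < 0
    return np.where(x >= 0, x, alpha * x)

leaky_relu(np.array([-1.0, 0.0, 2.0]))  # array([-0.2,  0. ,  2. ])
```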

class
mindspore.nn.
LogSoftmax
(axis=-1)[source]¶ LogSoftmax activation function.
Applies the LogSoftmax function to ndimensional input tensor.
The input is transformed with the Softmax function and then with the log function to lie in the range [-inf, 0).
Log Softmax is defined as: \(\text{logsoftmax}(x_i) = \log \left(\frac{\exp(x_i)}{\sum_{j=0}^{n-1} \exp(x_j)}\right)\), where \(x_{i}\) is the \(i\)-th slice along the given dim of the input Tensor.
 Parameters
axis (int) – The axis to apply LogSoftmax operation, -1 means the last dimension. Default: -1.
 Inputs:
x (Tensor)  The input of LogSoftmax.
 Outputs:
Tensor, which has the same type and shape as x, with values in the range [-inf, 0).
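The definition above is usually computed with a max-shift for numerical stability; a NumPy sketch (illustrative only):

```python
import numpy as np

def log_softmax(x, axis=-1):
    # Subtracting the max does not change the result but avoids overflow in exp.
    shifted = x - x.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

out = log_softmax(np.array([[1.0, 2.0, 3.0]]))
```

Exponentiating the output recovers the Softmax probabilities, which sum to 1 along the chosen axis.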

class
mindspore.nn.
Loss
[source]¶ Calculates the average of the loss. If method ‘update’ is called every \(n\) iterations, the result of evaluation will be:
\[loss = \frac{\sum_{k=1}^{n}loss_k}{n}\]Examples
>>> x = Tensor(np.array(0.2), mindspore.float32) >>> loss = nn.Loss() >>> loss.clear() >>> loss.update(x) >>> result = loss.eval()

eval
()[source]¶ Calculates the average of the loss.
 Returns
Float, the average of the loss.
 Raises
RuntimeError – If the total number is 0.

update
(*inputs)[source]¶ Updates the internal evaluation result.
 Parameters
inputs – Inputs contain only one element: the loss. The dimension of loss should be 0 or 1.
 Raises
ValueError – If the length of inputs is not 1.
ValueError – If the dimension of loss is neither 0 nor 1.


class
mindspore.nn.
MAE
[source]¶ Calculates the mean absolute error.
Creates a criterion that measures the mean absolute error (MAE) between each element in the input: \(x\) and the target: \(y\).
\[\text{MAE} = \frac{\sum_{i=1}^n |y_i - x_i|}{n}\]Here \(y_i\) is the prediction and \(x_i\) is the true value.
Note
The method update must be called with the form update(y_pred, y).
Examples
>>> x = Tensor(np.array([0.1, 0.2, 0.6, 0.9]), mindspore.float32) >>> y = Tensor(np.array([0.1, 0.25, 0.7, 0.9]), mindspore.float32) >>> error = nn.MAE() >>> error.clear() >>> error.update(x, y) >>> result = error.eval()

eval
()[source]¶ Computes the mean absolute error.
 Returns
Float, the computed result.
 Raises
RuntimeError – If the number of the total samples is 0.

update
(*inputs)[source]¶ Updates the internal evaluation result \(y_{pred}\) and \(y\).
 Parameters
inputs – Input y_pred and y for calculating the mean absolute error, where y_pred and y have the same N-D shape.
 Raises
ValueError – If the number of the input is not 2.
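The clear/update/eval cycle of the metric can be sketched in plain NumPy (a simplified stand-in for the class above; error handling beyond the zero-sample check is omitted):

```python
import numpy as np

class MAE:
    """Accumulates |y_pred - y| over update() calls, like the metric above."""
    def __init__(self):
        self.clear()
    def clear(self):
        self._abs_error_sum = 0.0
        self._samples_num = 0
    def update(self, y_pred, y):
        y_pred, y = np.asarray(y_pred), np.asarray(y)
        self._abs_error_sum += np.abs(y - y_pred).sum()
        self._samples_num += y.shape[0]
    def eval(self):
        if self._samples_num == 0:
            raise RuntimeError("total number of samples is 0")
        return self._abs_error_sum / self._samples_num

error = MAE()
error.update([0.1, 0.2, 0.6, 0.9], [0.1, 0.25, 0.7, 0.9])
result = error.eval()  # (0 + 0.05 + 0.1 + 0) / 4 = 0.0375
```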


class
mindspore.nn.
MSE
[source]¶ Measures the mean squared error.
Creates a criterion that measures the mean squared error (squared L2 norm) between each element in the input: \(x\) and the target: \(y\).
\[\text{MSE}(x,\ y) = \frac{\sum_{i=1}^n(y_i - x_i)^2}{n},\]where \(n\) is batch size.
Examples
>>> x = Tensor(np.array([0.1, 0.2, 0.6, 0.9]), mindspore.float32) >>> y = Tensor(np.array([0.1, 0.25, 0.5, 0.9]), mindspore.float32) >>> error = nn.MSE() >>> error.clear() >>> error.update(x, y) >>> result = error.eval()

eval
()[source]¶ Compute the mean squared error.
 Returns
Float, the computed result.
 Raises
RuntimeError – If the number of samples is 0.

update
(*inputs)[source]¶ Updates the internal evaluation result \(y_{pred}\) and \(y\).
 Parameters
inputs – Input y_pred and y for calculating the mean squared error, where y_pred and y have the same N-D shape.
 Raises
ValueError – If the number of input is not 2.


class
mindspore.nn.
MSELoss
(reduction='mean')[source]¶ MSELoss creates a criterion that measures the mean squared error (squared L2 norm) between \(x\) and \(y\) element-wise, where \(x\) is the input and \(y\) is the target.
For simplicity, let \(x\) and \(y\) be 1dimensional Tensor with length \(N\), the unreduced loss (i.e. with argument reduction set to ‘none’) of \(x\) and \(y\) is given as:
\[L(x, y) = \{l_1,\dots,l_N\}, \quad \text{with} \quad l_n = (x_n - y_n)^2.\]When argument reduction is ‘mean’, the mean value of \(L(x, y)\) will be returned. When argument reduction is ‘sum’, the sum of \(L(x, y)\) will be returned. \(N\) is the batch size.
 Parameters
reduction (str) – Type of reduction to apply to loss. The optional values are “mean”, “sum”, “none”. Default: “mean”.
 Inputs:
input_data (Tensor)  Tensor of shape \((x_1, x_2, ..., x_R)\).
target_data (Tensor)  Tensor of shape \((y_1, y_2, ..., y_S)\).
 Outputs:
Tensor, weighted loss float tensor.
Examples
>>> loss = nn.MSELoss() >>> input_data = Tensor(np.array([1, 2, 3]), mindspore.float32) >>> target_data = Tensor(np.array([1, 2, 2]), mindspore.float32) >>> loss(input_data, target_data)
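The three reduction modes can be illustrated with a small NumPy sketch (an assumed helper, not the MindSpore cell):

```python
import numpy as np

def mse_loss(x, y, reduction="mean"):
    l = (x - y) ** 2            # unreduced loss, one value per element
    if reduction == "mean":
        return l.mean()
    if reduction == "sum":
        return l.sum()
    return l                    # reduction == "none"

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 2.0])
mse_loss(x, y)  # ≈ 0.3333
```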

class
mindspore.nn.
MaxPool2d
(kernel_size=1, stride=1, pad_mode='valid')[source]¶ Max pooling operation for spatial data.
Applies a 2D max pooling over an input Tensor which can be regarded as a composition of 2D planes.
Typically the input is of shape \((N_{in}, C_{in}, H_{in}, W_{in})\), MaxPool2d outputs regional maximum in the \((H_{in}, W_{in})\)dimension. Given kernel size \(ks = (h_{ker}, w_{ker})\) and stride \(s = (s_0, s_1)\), the operation is as follows.
\[\text{output}(N_i, C_j, h, w) = \max_{m=0, \ldots, h_{ker}-1} \max_{n=0, \ldots, w_{ker}-1} \text{input}(N_i, C_j, s_0 \times h + m, s_1 \times w + n)\]Note
pad_mode for training only supports “same” and “valid”.
 Parameters
kernel_size (Union[int, tuple[int]]) – The size of kernel used to take the max value, is an int number that represents height and width are both kernel_size, or a tuple of two int numbers that represent height and width respectively. Default: 1.
stride (Union[int, tuple[int]]) – The distance of kernel moving, an int number that represents the height and width of movement are both strides, or a tuple of two int numbers that represent height and width of movement respectively. Default: 1.
pad_mode (str) –
The optional values for pad mode, is “same” or “valid”, not case sensitive. Default: “valid”.
same: Adopts the way of completion. Output height and width will be the same as the input. Total number of padding will be calculated for horizontal and vertical direction and evenly distributed to top and bottom, left and right if possible. Otherwise, the last extra padding will be done from the bottom and the right side.
valid: Adopts the way of discarding. The possibly largest height and width of output will be returned without padding. Extra pixels will be discarded.
 Inputs:
input (Tensor)  Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).
 Outputs:
Tensor of shape \((N, C_{out}, H_{out}, W_{out})\).
Examples
>>> pool = nn.MaxPool2d(kernel_size=3, stride=1) >>> x = Tensor(np.random.randint(0, 10, [1, 2, 4, 4]), mindspore.float32) [[[[1. 5. 5. 1.] [0. 3. 4. 8.] [4. 2. 7. 6.] [4. 9. 0. 1.]] [[3. 6. 2. 6.] [4. 4. 7. 8.] [0. 0. 4. 0.] [1. 8. 7. 0.]]]] >>> output = pool(x) >>> output.shape() (1, 2, 2, 2) >>> output [[[[7. 8.] [9. 9.]] [[7. 8.] [8. 8.]]]]
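For ‘valid’ mode, the sliding-window maximum in the formula above can be sketched as follows (a naive NumPy loop, only to illustrate shapes and semantics):

```python
import numpy as np

def max_pool2d(x, kernel_size, stride):
    # x: (N, C, H, W); 'valid' mode keeps only windows that fit entirely.
    n, c, h, w = x.shape
    h_out = (h - kernel_size) // stride + 1
    w_out = (w - kernel_size) // stride + 1
    out = np.empty((n, c, h_out, w_out), dtype=x.dtype)
    for i in range(h_out):
        for j in range(w_out):
            window = x[:, :, i * stride:i * stride + kernel_size,
                             j * stride:j * stride + kernel_size]
            out[:, :, i, j] = window.max(axis=(2, 3))
    return out

x = np.random.rand(1, 2, 4, 4)
out = max_pool2d(x, kernel_size=3, stride=1)
out.shape  # (1, 2, 2, 2)
```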

class
mindspore.nn.
Metric
[source]¶ Base class of metric.
Note
For examples of subclasses, please refer to the definitions of classes such as MAE and Recall.

abstract
clear
()[source]¶ An interface that describes the behavior of clearing the internal evaluation result.
Note
All subclasses should override this interface.


class
mindspore.nn.
Momentum
(params, learning_rate, momentum, weight_decay=0.0, loss_scale=1.0, decay_filter=<function Momentum.<lambda>>)[source]¶ Implements the Momentum algorithm.
Refer to the paper on the importance of initialization and momentum in deep learning for more details.
 Parameters
params (list[Parameter]) – A list of parameter, which will be updated. The element in parameters should be class mindspore.Parameter.
learning_rate (Union[float, Tensor, Iterable]) – A value for the learning rate. If learning_rate is an Iterable or a Tensor with one dimension, a dynamic learning rate is used: the i-th step takes the i-th value as the learning rate. If learning_rate is a float or a Tensor with zero dimensions, a fixed learning rate is used. Other cases are not supported.
momentum (float) – Hyperparameter of type float, means momentum for the moving average.
weight_decay (float) – Weight decay (L2 penalty). Default: 0.0.
loss_scale (float) – A floating point value for the loss scale. Default: 1.0.
decay_filter (Function) – A function to determine whether to apply weight decay on parameters. Default: lambda x: ‘beta’ not in x.name and ‘gamma’ not in x.name.
 Inputs:
gradients (tuple[Tensor])  The gradients of params, the shape is the same as params.
 Outputs:
tuple[bool], all elements are True.
 Raises
ValueError – If the momentum is less than 0.0.
Examples
>>> net = Net() >>> loss = nn.SoftmaxCrossEntropyWithLogits() >>> optim = nn.Momentum(params=net.trainable_params(), learning_rate=0.1, momentum=0.9) >>> model = Model(net, loss_fn=loss, optimizer=optim, metrics=None)
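The classic momentum rule the optimizer implements can be written out in a few NumPy lines (a conceptual sketch; MindSpore's fused ApplyMomentum op may differ in details such as loss scaling and weight decay):

```python
import numpy as np

def momentum_step(w, grad, v, lr, momentum):
    """One Momentum update: v <- momentum * v + grad; w <- w - lr * v."""
    v = momentum * v + grad
    w = w - lr * v
    return w, v

w, v = np.array([1.0]), np.zeros(1)
w, v = momentum_step(w, np.array([0.5]), v, lr=0.1, momentum=0.9)
```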

class
mindspore.nn.
Norm
(axis=(), keep_dims=False)[source]¶ Computes the norm of vectors, currently including Euclidean norm, i.e., \(L_2\)norm.
 Parameters
axis (Union[int, tuple[int]]) – The axis over which to compute vector norms. Default: ().
keep_dims (bool) – If True, the axes which are reduced are left in the result as dimensions with size one. Otherwise, the dimensions in axis are removed from the output shape. Default: False.
 Inputs:
input (Tensor)  Tensor which is not empty.
 Outputs:
Tensor, output tensor with dimensions in ‘axis’ reduced to 1 will be returned if ‘keep_dims’ is True; otherwise a Tensor with dimensions in ‘axis’ removed is returned.
Examples
>>> net = nn.Norm(axis=0) >>> input = Tensor(np.random.randint(0, 10, [4, 16]), mindspore.float32) >>> net(input)

class
mindspore.nn.
OneHot
(axis=-1, depth=1, on_value=1.0, off_value=0.0, dtype=mindspore.float32)[source]¶ Returns a onehot tensor.
The locations represented by indices in argument ‘indices’ take value on_value, while all other locations take value off_value.
Note
If the input indices is rank \(N\), the output will have rank \(N+1\). The new axis is created at dimension axis.
 Parameters
axis (int) – Features x depth if axis == -1, depth x features if axis == 0. Default: -1.
depth (int) – A scalar defining the depth of the one hot dimension. Default: 1.
on_value (float) – A scalar defining the value to fill in output[i][j] when indices[j] = i. Default: 1.0.
off_value (float) – A scalar defining the value to fill in output[i][j] when indices[j] != i. Default: 0.0.
dtype (
mindspore.dtype
) – Data type of ‘on_value’ and ‘off_value’, not the data type of indices. Default: mindspore.float32.
 Inputs:
indices (Tensor)  A tensor of indices of data type mindspore.int32 and arbitrary shape.
 Outputs:
Tensor, the onehot tensor of data type ‘dtype’ with dimension at ‘axis’ expanded to ‘depth’ and filled with on_value and off_value.
Examples
>>> net = nn.OneHot(depth=4, axis=1) >>> indices = Tensor([[1, 3], [0, 2]], dtype=mindspore.int32) >>> net(indices) [[[0. 0.] [1. 0.] [0. 0.] [0. 1.]] [[1. 0.] [0. 0.] [0. 1.] [0. 0.]]]
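The expansion behavior can be illustrated with NumPy (an assumed helper, not the MindSpore op):

```python
import numpy as np

def one_hot(indices, depth, on_value=1.0, off_value=0.0, axis=-1):
    # Build the one-hot encoding on a new trailing axis, then move it into place.
    out = np.full(indices.shape + (depth,), off_value, dtype=np.float32)
    np.put_along_axis(out, indices[..., None], on_value, axis=-1)
    if axis != -1:
        out = np.moveaxis(out, -1, axis)
    return out

indices = np.array([[1, 3], [0, 2]])
one_hot(indices, depth=4, axis=1).shape  # (2, 4, 2), matching the example above
```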

class
mindspore.nn.
Optimizer
(learning_rate, parameters, weight_decay=0.0, loss_scale=1.0, decay_filter=<function Optimizer.<lambda>>)[source]¶ Base class for all optimizers.
This class defines the API to add Ops to train a model.
Note
Never use this class directly; instead, instantiate one of its subclasses.
 Parameters
learning_rate (float) – A floating point value for the learning rate. Should be greater than 0.
parameters (list) – A list of parameter, which will be updated. The element in parameters should be class mindspore.Parameter.
weight_decay (float) – A floating point value for the weight decay. If the type of weight_decay input is int, it will be converted to float. Default: 0.0.
loss_scale (float) – A floating point value for the loss scale. It should be greater than 0. If the type of loss_scale input is int, it will be converted to float. Default: 1.0.
decay_filter (Function) – A function to determine whether to apply weight decay on parameters. Default: lambda x: ‘beta’ not in x.name and ‘gamma’ not in x.name.
 Raises
ValueError – If learning_rate is a Tensor with more than one dimension.
TypeError – If the learning_rate is not any of the three types: float, Tensor, Iterable.

decay_weight
(gradients)[source]¶ Weight decay.
An approach to reduce the overfitting of a deep learning neural network model.

get_lr
()[source]¶ Get the learning rate of current step.
 Returns
float, the learning rate of current step.

class
mindspore.nn.
PReLU
(channel=1, w=0.25)[source]¶ PReLU activation function.
Applies the PReLU function elementwise.
PReLU is defined as: \(prelu(x_i)= \max(0, x_i) + w * \min(0, x_i)\), where \(x_i\) is an element of a channel of the input.
Here \(w\) is a learnable parameter with default initial value 0.25. Parameter \(w\) has the dimensionality of the argument channel. If called without the argument channel, a single parameter \(w\) will be shared across all channels.
 Parameters
channel (int) – The dimension of the input. Default: 1.
w (float) – The initial value of \(w\). Default: 0.25.
 Inputs:
input_data (Tensor)  The input of PReLU.
 Outputs:
Tensor, with the same type and shape as the input_data.

class
mindspore.nn.
PSNR
(max_val=1.0)[source]¶ Returns Peak SignaltoNoise Ratio of two image batches.
It produces a PSNR value for each image in batch. Assume inputs are \(I\) and \(K\), both with shape \(h*w\). \(MAX\) represents the dynamic range of pixel values.
\[\begin{split}MSE&=\frac{1}{hw}\sum\limits_{i=0}^{h-1}\sum\limits_{j=0}^{w-1}[I(i,j)-K(i,j)]^2\\ PSNR&=10*\log_{10}\left(\frac{MAX^2}{MSE}\right)\end{split}\] Parameters
max_val (Union[int, float]) – The dynamic range of the pixel values (255 for 8bit grayscale images). Default: 1.0.
 Inputs:
img1 (Tensor)  The first image batch with format ‘NCHW’. It should be the same shape and dtype as img2.
img2 (Tensor)  The second image batch with format ‘NCHW’. It should be the same shape and dtype as img1.
 Outputs:
Tensor, with dtype mindspore.float32. It is a 1D tensor with shape N, where N is the batch num of img1.
Examples
>>> net = nn.PSNR() >>> img1 = Tensor(np.random.random((1,3,16,16))) >>> img2 = Tensor(np.random.random((1,3,16,16))) >>> psnr = net(img1, img2)
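The two equations above translate directly to NumPy (an illustrative sketch producing one value per image in the batch):

```python
import numpy as np

def psnr(img1, img2, max_val=1.0):
    # One PSNR value per image in the batch (NCHW): 10 * log10(MAX^2 / MSE).
    mse = ((img1 - img2) ** 2).mean(axis=(1, 2, 3))
    return 10 * np.log10(max_val ** 2 / mse)

a = np.random.random((1, 3, 16, 16))
b = np.random.random((1, 3, 16, 16))
psnr(a, b)  # shape (1,)
```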

class
mindspore.nn.
Pad
(paddings, mode='CONSTANT')[source]¶ Pads the input tensor according to the paddings and mode.
 Parameters
paddings (tuple) – The shape of parameter paddings is (N, 2). N is the rank of the input data. All elements of paddings are of int type. For the D-th dimension of the input, paddings[D, 0] indicates how many values to pad before the D-th dimension of the input tensor, and paddings[D, 1] indicates how many values to pad after it.
mode (string) – Specifies padding mode. The optional values are “CONSTANT”, “REFLECT”, “SYMMETRIC”. Default: “CONSTANT”.
 Inputs:
input_x (Tensor)  The input tensor.
 Outputs:
Tensor, the tensor after padding.
If mode is “CONSTANT”, it fills the edges with 0, regardless of the values of input_x. If input_x is [[1,2,3],[4,5,6],[7,8,9]] and paddings is [[1,1],[2,2]], then the output is [[0,0,0,0,0,0,0],[0,0,1,2,3,0,0],[0,0,4,5,6,0,0],[0,0,7,8,9,0,0],[0,0,0,0,0,0,0]].
If mode is “REFLECT”, it fills by symmetrically copying the input across the axis of symmetry, excluding the edge values themselves. If input_x is [[1,2,3],[4,5,6],[7,8,9]] and paddings is [[1,1],[2,2]], then the output is [[6,5,4,5,6,5,4],[3,2,1,2,3,2,1],[6,5,4,5,6,5,4],[9,8,7,8,9,8,7],[6,5,4,5,6,5,4]].
If mode is “SYMMETRIC”, the filling method is similar to “REFLECT”: values are copied according to the symmetry axis, except that the axis values themselves are included. If input_x is [[1,2,3],[4,5,6],[7,8,9]] and paddings is [[1,1],[2,2]], then the output is [[2,1,1,2,3,3,2],[2,1,1,2,3,3,2],[5,4,4,5,6,6,5],[8,7,7,8,9,9,8],[8,7,7,8,9,9,8]].
Examples
>>> from mindspore import Tensor >>> from mindspore.ops import operations as P >>> import mindspore.nn as nn >>> import numpy as np >>> class Net(nn.Cell): >>> def __init__(self): >>> super(Net, self).__init__() >>> self.pad = nn.Pad(paddings=((1,1),(2,2)), mode="CONSTANT") >>> def construct(self, x): >>> return self.pad(x) >>> x = np.random.random(size=(2, 3)).astype(np.float32) >>> pad = Net() >>> ms_output = pad(Tensor(x))
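NumPy's np.pad supports the same three modes under lowercase names, which makes the CONSTANT/REFLECT/SYMMETRIC distinction easy to verify against the examples above:

```python
import numpy as np

x = np.array([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]])
pads = ((1, 1), (2, 2))

np.pad(x, pads, mode="constant")   # zeros around the edge
np.pad(x, pads, mode="reflect")    # mirror, axis of symmetry excluded
np.pad(x, pads, mode="symmetric")  # mirror, axis of symmetry included
```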

class
mindspore.nn.
ParameterUpdate
(param)[source]¶ Cell that updates parameters.
With this Cell, one can manually update param with the input Tensor.
 Parameters
param (Parameter) – The parameter to be updated manually.
 Raises
KeyError – If a parameter with the specified name does not exist.
Examples
>>> network = Net() >>> param = network.parameters_dict()['learning_rate'] >>> update = nn.ParameterUpdate(param) >>> update.phase = "update_param" >>> lr = Tensor(0.001, mindspore.float32) >>> update(lr)

class
mindspore.nn.
Precision
(eval_type='classification')[source]¶ Calculates precision for classification and multilabel data.
The precision function creates two local variables, \(\text{true_positive}\) and \(\text{false_positive}\), that are used to compute the precision. This value is ultimately returned as the precision, an idempotent operation that simply divides \(\text{true_positive}\) by the sum of \(\text{true_positive}\) and \(\text{false_positive}\).
\[\text{precision} = \frac{\text{true_positive}}{\text{true_positive} + \text{false_positive}}\]Note
In the multilabel cases, the elements of \(y\) and \(y_{pred}\) should be 0 or 1.
 Parameters
eval_type (str) – Metric to calculate accuracy over a dataset, for classification or multilabel. Default: ‘classification’.
Examples
>>> x = Tensor(np.array([[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]])) >>> y = Tensor(np.array([1, 0, 1])) >>> metric = nn.Precision('classification') >>> metric.clear() >>> metric.update(x, y) >>> precision = metric.eval()

eval
(average=False)[source]¶ Computes the precision.
 Parameters
average (bool) – Specify whether calculate the average precision. Default value is False.
 Returns
Float, the computed result.

update
(*inputs)[source]¶ Updates the internal evaluation result with y_pred and y.
 Parameters
inputs – Input y_pred and y. y_pred and y are Tensor, list or numpy.ndarray. For ‘classification’ evaluation type, y_pred is in most cases (not strictly) a list of floating numbers in range \([0, 1]\) and the shape is \((N, C)\), where \(N\) is the number of cases and \(C\) is the number of categories. Shape of y can be \((N, C)\) with values 0 and 1 if onehot encoding is used or the shape is \((N,)\) with integer values if index of category is used. For ‘multilabel’ evaluation type, y_pred and y can only be onehot encoding with values 0 or 1. Indices with 1 indicate positive category. The shape of y_pred and y are both \((N, C)\).
 Raises
ValueError – If the number of input is not 2.
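For the ‘classification’ case, the TP/(TP + FP) computation can be sketched per class in NumPy (a simplified illustration of the metric; the helper name is an assumption):

```python
import numpy as np

def precision(y_pred, y, num_classes):
    """Per-class precision = TP / (TP + FP); y_pred is (N, C) scores,
    y is (N,) integer labels."""
    pred = y_pred.argmax(axis=1)
    result = np.zeros(num_classes)
    for c in range(num_classes):
        tp = np.sum((pred == c) & (y == c))
        fp = np.sum((pred == c) & (y != c))
        result[c] = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    return result

x = np.array([[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]])
y = np.array([1, 0, 1])
precision(x, y, num_classes=2)  # array([0.5, 1. ])
```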

class
mindspore.nn.
RMSProp
(params, learning_rate=0.1, decay=0.9, momentum=0.0, epsilon=1e-10, use_locking=False, centered=False, loss_scale=1.0, weight_decay=0.0, decay_filter=<function RMSProp.<lambda>>)[source]¶ Implements Root Mean Squared Propagation (RMSProp) algorithm.
Note
Update params according to the RMSProp algorithm.
The equation is as follows:
\[s_{t} = \rho s_{t-1} + (1 - \rho)(\nabla Q_{i}(w))^2\]\[m_{t} = \beta m_{t-1} + \frac{\eta} {\sqrt{s_{t} + \epsilon}} \nabla Q_{i}(w)\]\[w = w - m_{t}\]The first equation calculates a moving average of the squared gradient for each weight; the gradient is then divided by \(\sqrt{s_{t} + \epsilon}\).
if centered is True:
\[g_{t} = \rho g_{t-1} + (1 - \rho)\nabla Q_{i}(w)\]\[s_{t} = \rho s_{t-1} + (1 - \rho)(\nabla Q_{i}(w))^2\]\[m_{t} = \beta m_{t-1} + \frac{\eta} {\sqrt{s_{t} - g_{t}^2 + \epsilon}} \nabla Q_{i}(w)\]\[w = w - m_{t}\]where \(w\) represents params, which will be updated. \(g_{t}\) is the mean of the gradients and \(g_{t-1}\) is its previous value. \(s_{t}\) is the mean of the squared gradients and \(s_{t-1}\) is its previous value. \(m_{t}\) is the moment (the delta of \(w\)) and \(m_{t-1}\) is its previous value. \(\rho\) represents decay, \(\beta\) is the momentum term (momentum), \(\epsilon\) is a smoothing term to avoid division by zero (epsilon), \(\eta\) is the learning rate (learning_rate), and \(\nabla Q_{i}(w)\) represents the gradients.
 Parameters
params (list[Parameter]) – A list of parameter, which will be updated. The element in parameters should be class mindspore.Parameter.
learning_rate (Union[float, Tensor, Iterable]) – A value for the learning rate. If learning_rate is an Iterable or a Tensor with one dimension, a dynamic learning rate is used: the i-th step takes the i-th value as the learning rate. If learning_rate is a float or a Tensor with zero dimensions, a fixed learning rate is used. Other cases are not supported.
decay (float) – Decay rate.
momentum (float) – Hyperparameter of type float, means momentum for the moving average.
epsilon (float) – Term added to the denominator to improve numerical stability. Should be greater than 0.
use_locking (bool) – Enable a lock to protect the update of variable and accumlation tensors. Default: False.
centered (bool) – If True, gradients are normalized by the estimated variance of the gradient. Default: False
loss_scale (float) – A floating point value for the loss scale. Default: 1.0.
weight_decay (float) – Weight decay (L2 penalty). Default: 0.0.
decay_filter (Function) – A function to determine whether to apply weight decay on parameters. Default: lambda x: ‘beta’ not in x.name and ‘gamma’ not in x.name.
 Inputs:
gradients (tuple[Tensor])  The gradients of params, the shape is the same as params.
 Outputs:
Tensor[bool], the value is True.
Examples
>>> net = Net() >>> loss = nn.SoftmaxCrossEntropyWithLogits() >>> opt = nn.RMSProp(params=net.trainable_params(), learning_rate=lr) >>> model = Model(net, loss, opt)
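One step of the (non-centered) update equations above, written out in NumPy (a conceptual sketch; the fused MindSpore op also handles loss scale and weight decay):

```python
import numpy as np

def rmsprop_step(w, grad, s, m, lr, decay=0.9, momentum=0.0, eps=1e-10):
    """One non-centered RMSProp update following the equations above."""
    s = decay * s + (1 - decay) * grad ** 2
    m = momentum * m + lr * grad / np.sqrt(s + eps)
    w = w - m
    return w, s, m

w = np.array([1.0]); s = np.zeros(1); m = np.zeros(1)
w, s, m = rmsprop_step(w, np.array([0.5]), s, m, lr=0.1)
```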

class
mindspore.nn.
ReLU
[source]¶ Rectified Linear Unit activation function.
Applies the rectified linear unit function element-wise. It returns \(\max(0, x)\) element-wise; in particular, neurons with negative output are suppressed while active neurons stay unchanged.
 Inputs:
input_data (Tensor)  The input of ReLU.
 Outputs:
Tensor, with the same type and shape as the input_data.

class
mindspore.nn.
ReLU6
[source]¶ Computes the ReLU6 activation function.
ReLU6 is similar to ReLU but with an upper limit of 6: if the inputs are greater than 6, the outputs will be clipped to 6. It computes element-wise as \(\min(\max(0, x), 6)\). The input is a Tensor of any valid shape.
 Inputs:
input_data (Tensor)  The input of ReLU6.
 Outputs:
Tensor, which has the same type with input_data.

class
mindspore.nn.
Recall
(eval_type='classification')[source]¶ Calculate recall for classification and multilabel data.
The recall class creates two local variables, \(\text{true_positive}\) and \(\text{false_negative}\), that are used to compute the recall. This value is ultimately returned as the recall, an idempotent operation that simply divides \(\text{true_positive}\) by the sum of \(\text{true_positive}\) and \(\text{false_negative}\).
\[\text{recall} = \frac{\text{true_positive}}{\text{true_positive} + \text{false_negative}}\]Note
In the multilabel cases, the elements of \(y\) and \(y_{pred}\) should be 0 or 1.
 Parameters
eval_type (str) – Metric to calculate the recall over a dataset, for classification or multilabel. Default: ‘classification’.
Examples
>>> x = Tensor(np.array([[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]])) >>> y = Tensor(np.array([1, 0, 1])) >>> metric = nn.Recall('classification') >>> metric.clear() >>> metric.update(x, y) >>> recall = metric.eval()

eval
(average=False)[source]¶ Computes the recall.
 Parameters
average (bool) – Specify whether calculate the average recall. Default value is False.
 Returns
Float, the computed result.

update
(*inputs)[source]¶ Updates the internal evaluation result with y_pred and y.
 Parameters
inputs – Input y_pred and y. y_pred and y are a Tensor, a list or an array. For ‘classification’ evaluation type, y_pred is in most cases (not strictly) a list of floating numbers in range \([0, 1]\) and the shape is \((N, C)\), where \(N\) is the number of cases and \(C\) is the number of categories. Shape of y can be \((N, C)\) with values 0 and 1 if onehot encoding is used or the shape is \((N,)\) with integer values if index of category is used. For ‘multilabel’ evaluation type, y_pred and y can only be onehot encoding with values 0 or 1. Indices with 1 indicate positive category. The shape of y_pred and y are both \((N, C)\).
 Raises
ValueError – If the number of input is not 2.

class
mindspore.nn.
SGD
(params, learning_rate=0.1, momentum=0.0, dampening=0.0, weight_decay=0.0, nesterov=False, loss_scale=1.0)[source]¶ Implements stochastic gradient descent (optionally with momentum).
Introduction to SGD can be found at https://en.wikipedia.org/wiki/Stochastic_gradient_descent. Nesterov momentum is based on the formula from paper On the importance of initialization and momentum in deep learning.
 Parameters
params (list[Parameter]) – A list of parameter, which will be updated. The element in params should be class mindspore.Parameter.
learning_rate (float) – A floating point value for the learning rate. Default: 0.1.
momentum (float) – A floating point value for the momentum. Default: 0.
dampening (float) – A floating point value of dampening for momentum. Default: 0.
weight_decay (float) – Weight decay (L2 penalty). Default: 0.
nesterov (bool) – Enables the Nesterov momentum. Default: False.
loss_scale (float) – A floating point value for the loss scale, which should be larger than 0.0. Default: 1.0.
 Inputs:
gradients (tuple[Tensor])  The gradients of params, the shape is the same as params.
 Outputs:
Tensor[bool], the value is True.
 Raises
ValueError – If the momentum, dampening or weight_decay value is less than 0.0.
Examples
>>> net = Net() >>> loss = nn.SoftmaxCrossEntropyWithLogits() >>> optim = nn.SGD(params=net.trainable_params()) >>> model = Model(net, loss_fn=loss, optimizer=optim, metrics=None)

class
mindspore.nn.
SSIM
(max_val=1.0, filter_size=11, filter_sigma=1.5, k1=0.01, k2=0.03)[source]¶ Returns SSIM index between img1 and img2.
Its implementation is based on Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing.
\[\begin{split}l(x,y)&=\frac{2\mu_x\mu_y+C_1}{\mu_x^2+\mu_y^2+C_1}, C_1=(K_1L)^2.\\ c(x,y)&=\frac{2\sigma_x\sigma_y+C_2}{\sigma_x^2+\sigma_y^2+C_2}, C_2=(K_2L)^2.\\ s(x,y)&=\frac{\sigma_{xy}+C_3}{\sigma_x\sigma_y+C_3}, C_3=C_2/2.\\ SSIM(x,y)&=l*c*s\\&=\frac{(2\mu_x\mu_y+C_1)(2\sigma_{xy}+C_2)}{(\mu_x^2+\mu_y^2+C_1)(\sigma_x^2+\sigma_y^2+C_2)}.\end{split}\] Parameters
max_val (Union[int, float]) – The dynamic range of the pixel values (255 for 8bit grayscale images). Default: 1.0.
filter_size (int) – The size of the Gaussian filter. Default: 11.
filter_sigma (float) – The standard deviation of Gaussian kernel. Default: 1.5.
k1 (float) – The constant used to generate c1 in the luminance comparison function. Default: 0.01.
k2 (float) – The constant used to generate c2 in the contrast comparison function. Default: 0.03.
 Inputs:
img1 (Tensor)  The first image batch with format ‘NCHW’. It should be the same shape and dtype as img2.
img2 (Tensor)  The second image batch with format ‘NCHW’. It should be the same shape and dtype as img1.
 Outputs:
Tensor, has the same dtype as img1. It is a 1D tensor with shape N, where N is the batch num of img1.
Examples
>>> net = nn.SSIM() >>> img1 = Tensor(np.random.random((1,3,16,16))) >>> img2 = Tensor(np.random.random((1,3,16,16))) >>> ssim = net(img1, img2)

class
mindspore.nn.
SequentialCell
(*args)[source]¶ Sequential cell container.
A list of Cells will be added to it in the order they are passed in the constructor. Alternatively, an ordered dict of cells can also be passed in.
 Parameters
args (list, optional) – List of subclass of Cell.
 Raises
TypeError – If arg is not of type list or OrderedDict.
 Inputs:
input (Tensor)  Tensor with shape according to the first Cell in the sequence.
 Outputs:
Tensor, the output Tensor with shape depending on the input and defined sequence of Cells.
Examples
>>> conv = nn.Conv2d(3, 2, 3, pad_mode='valid') >>> bn = nn.BatchNorm2d(2) >>> relu = nn.ReLU() >>> seq = nn.SequentialCell([conv, bn, relu]) >>> >>> x = Tensor(np.random.random((1, 3, 4, 4)), dtype=mindspore.float32) >>> seq(x) [[[[0.02531557 0. ] [0.04933941 0.04880078]] [[0. 0. ] [0. 0. ]]]]

class
mindspore.nn.
Sigmoid
[source]¶ Sigmoid activation function.
Applies sigmoidtype activation elementwise.
Sigmoid function is defined as: \(\text{sigmoid}(x_i) = \frac{1}{1 + \exp(-x_i)}\), where \(x_i\) is the element of the input.
 Inputs:
input_data (Tensor)  The input of Sigmoid.
 Outputs:
Tensor, with the same type and shape as the input_data.

class
mindspore.nn.
SmoothL1Loss
(sigma=1.0)[source]¶ A loss class for learning region proposals.
SmoothL1Loss can be regarded as a modified version of L1Loss, or a combination of L1Loss and L2Loss. L1Loss computes the element-wise absolute difference between two input Tensors while L2Loss computes the squared difference between them. L2Loss often leads to faster convergence but is less robust to outliers.
Given two input \(x,\ y\) of length \(N\), the unreduced SmoothL1Loss can be described as follows:
\[\begin{split}L_{i} = \begin{cases} 0.5 (x_i - y_i)^2, & \text{if } |x_i - y_i| < \text{sigma}; \\ |x_i - y_i| - 0.5, & \text{otherwise. } \end{cases}\end{split}\]Here \(\text{sigma}\) controls the point where the loss function changes from quadratic to linear. Its default value is 1.0. \(N\) is the batch size. This function returns an unreduced loss Tensor.
 Parameters
sigma (float) – A parameter used to control the point where the function will change from quadratic to linear. Default: 1.0.
 Inputs:
input_data (Tensor)  Tensor of shape \((x_1, x_2, ..., x_R)\).
target_data (Tensor)  Tensor of shape \((y_1, y_2, ..., y_S)\).
 Outputs:
Tensor, loss float tensor.
Examples
>>> loss = nn.SmoothL1Loss() >>> input_data = Tensor(np.array([1, 2, 3]), mindspore.float32) >>> target_data = Tensor(np.array([1, 2, 2]), mindspore.float32) >>> loss(input_data, target_data)
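The piecewise formula above maps directly to np.where (an unreduced, illustrative sketch):

```python
import numpy as np

def smooth_l1_loss(x, y, sigma=1.0):
    diff = np.abs(x - y)
    # Quadratic below sigma, linear above; unreduced (element-wise) loss.
    return np.where(diff < sigma, 0.5 * diff ** 2, diff - 0.5)

x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 2.0])
smooth_l1_loss(x, y)  # array([0. , 0. , 0.5])
```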

class
mindspore.nn.
Softmax
(axis=-1)[source]¶ Softmax activation function.
Applies the Softmax function to an ndimensional input Tensor.
The input is a Tensor of logits transformed with exponential function and then normalized to lie in range [0, 1] and sum up to 1.
Softmax is defined as:
\[\text{softmax}(x_{i}) = \frac{\exp(x_i)}{\sum_{j=0}^{n-1}\exp(x_j)},\]where \(x_{i}\) is the \(i\)-th slice along the given dim of the input Tensor.
 Parameters
axis (Union[int, tuple[int]]) – The axis to apply Softmax operation, -1 means the last dimension. Default: -1.
 Inputs:
x (Tensor)  The input of Softmax.
 Outputs:
Tensor, which has the same type and shape as x, with values in the range [0, 1].

class
mindspore.nn.
SoftmaxCrossEntropyExpand
(sparse=False)[source]¶ Computes softmax cross entropy between logits and labels. Implemented by expanded formula.
This is a wrapper of several functions.
\[\ell(x_i, t_i) = -\log\left(\frac{\exp(x_{t_i})}{\sum_j \exp(x_j)}\right),\]where \(x_i\) is a 1D score Tensor, \(t_i\) is the target class.
Note
When argument sparse is set to True, the format of label is the index range from \(0\) to \(C  1\) instead of onehot vectors.
 Parameters
sparse (bool) – Specifies whether labels use sparse format or not. Default: False.
 Inputs:
input_data (Tensor)  Tensor of shape \((x_1, x_2, ..., x_R)\).
label (Tensor)  Tensor of shape \((y_1, y_2, ..., y_S)\).
 Outputs:
Tensor, a scalar tensor including the mean loss.
Examples
>>> loss = nn.SoftmaxCrossEntropyExpand(sparse=True)
>>> input_data = Tensor(np.ones([64, 512]), dtype=mindspore.float32)
>>> label = Tensor(np.ones([64]), dtype=mindspore.int32)
>>> loss(input_data, label)
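The expanded formula can be sanity-checked in NumPy (an illustrative sketch, not the MindSpore kernel): for a score row \(x\) and target class \(t\), the loss equals \(-x_t + \log\sum_j \exp(x_j)\).

```python
import numpy as np

def softmax_xent(x, t):
    """Cross entropy of one score row x against integer target t."""
    # log-sum-exp with max subtraction for numerical stability
    m = x.max()
    lse = m + np.log(np.exp(x - m).sum())
    return -x[t] + lse

x = np.array([1.0, 2.0, 3.0])
loss = softmax_xent(x, 2)
# Equal to -log(softmax(x)[2])
```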

class
mindspore.nn.
SoftmaxCrossEntropyWithLogits
(is_grad=True, sparse=False, reduction=None)[source]¶ Computes softmax cross entropy between logits and labels.
Measures the distribution error between the probabilities of the input (computed with softmax function) and the target where the classes are mutually exclusive (only one class is positive) using cross entropy loss.
Typical inputs to this function are unnormalized scores and the target of each class. The score Tensor \(x\) is of shape \((N, C)\) and the target Tensor \(t\) is of shape \((N, C)\), containing one-hot labels of length \(C\).
For each batch \(N_i\), the loss is given as:
\[\ell(x_i, t_i) = -w_{t_i} \log\left(\frac{\exp(x_{t_i})}{\sum_j \exp(x_j)}\right) = w_{t_i} \left(-x_{t_i} + \log\left(\sum_j \exp(x_j)\right)\right),\]where \(x_i\) is a 1-D score Tensor, \(t_i\) is the target class, and \(w\) is a weight Tensor that generates a weighted loss for each class. When not specified, the weight Tensor is None and the weight is the same (\(1\)) for all classes.
Note
While the target classes are mutually exclusive, i.e., only one class is positive in the target, the predicted probabilities need not be exclusive. All that is required is that the predicted probability distribution of each entry is a valid one.
 Parameters
 Inputs:
logits (Tensor) - Tensor of shape \((x_1, x_2, ..., x_R)\).
labels (Tensor) - Tensor of shape \((y_1, y_2, ..., y_S)\). If sparse is True, the type of labels is mindspore.int32. If sparse is False, the type of labels is the same as the type of logits.
 Outputs:
Tensor, a tensor of the same shape as logits with the component-wise logistic losses.
Examples
>>> loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True)
>>> logits = Tensor(np.random.randint(0, 9, [1, 10]), mindspore.float32)
>>> labels_np = np.ones([1,]).astype(np.int32)
>>> labels = Tensor(labels_np)
>>> loss(logits, labels)

class
mindspore.nn.
Tanh
[source]¶ Tanh activation function.
Applies the Tanh function element-wise and returns a new Tensor with the hyperbolic tangent of the elements of the input, which can be a Tensor of any valid shape.
Tanh function is defined as:
\[\tanh(x_i) = \frac{\exp(x_i) - \exp(-x_i)}{\exp(x_i) + \exp(-x_i)} = \frac{\exp(2x_i) - 1}{\exp(2x_i) + 1},\]where \(x_i\) is an element of the input Tensor.
 Inputs:
input_data (Tensor) - The input of Tanh.
 Outputs:
Tensor, with the same type and shape as the input_data.
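The identity between the two forms of the definition can be verified numerically (a NumPy sketch, independent of MindSpore):

```python
import numpy as np

x = np.linspace(-3.0, 3.0, 7)
lhs = (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))
rhs = (np.exp(2 * x) - 1) / (np.exp(2 * x) + 1)
# Both forms match NumPy's built-in hyperbolic tangent.
assert np.allclose(lhs, np.tanh(x)) and np.allclose(rhs, np.tanh(x))
```

The second form follows from the first by multiplying numerator and denominator by \(\exp(x_i)\).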

class
mindspore.nn.
Top1CategoricalAccuracy
[source]¶ Calculates the top-1 categorical accuracy. This class is a specialization of TopKCategoricalAccuracy. Refer to TopKCategoricalAccuracy for more details.
Examples
>>> x = Tensor(np.array([[0.2, 0.5, 0.3, 0.6, 0.2], [0.1, 0.35, 0.5, 0.2, 0.],
...                      [0.9, 0.6, 0.2, 0.01, 0.3]]), mindspore.float32)
>>> y = Tensor(np.array([2, 0, 1]), mindspore.float32)
>>> topk = nn.Top1CategoricalAccuracy()
>>> topk.clear()
>>> topk.update(x, y)
>>> result = topk.eval()

class
mindspore.nn.
Top5CategoricalAccuracy
[source]¶ Calculates the top-5 categorical accuracy. This class is a specialization of TopKCategoricalAccuracy. Refer to TopKCategoricalAccuracy for more details.
Examples
>>> x = Tensor(np.array([[0.2, 0.5, 0.3, 0.6, 0.2], [0.1, 0.35, 0.5, 0.2, 0.],
...                      [0.9, 0.6, 0.2, 0.01, 0.3]]), mindspore.float32)
>>> y = Tensor(np.array([2, 0, 1]), mindspore.float32)
>>> topk = nn.Top5CategoricalAccuracy()
>>> topk.clear()
>>> topk.update(x, y)
>>> result = topk.eval()

class
mindspore.nn.
TopKCategoricalAccuracy
(k)[source]¶ Calculates the top-k categorical accuracy.
Note
The method update must receive input of the form \((y_{pred}, y)\). If some samples have the same accuracy, the first sample will be chosen.
 Parameters
k (int) – Specifies the top-k categorical accuracy to compute.
 Raises
TypeError – If k is not int.
ValueError – If k is less than 1.
Examples
>>> x = Tensor(np.array([[0.2, 0.5, 0.3, 0.6, 0.2], [0.1, 0.35, 0.5, 0.2, 0.],
...                      [0.9, 0.6, 0.2, 0.01, 0.3]]), mindspore.float32)
>>> y = Tensor(np.array([2, 0, 1]), mindspore.float32)
>>> topk = nn.TopKCategoricalAccuracy(3)
>>> topk.clear()
>>> topk.update(x, y)
>>> result = topk.eval()

update
(*inputs)[source]¶ Updates the internal evaluation result y_pred and y.
 Parameters
inputs – Input y_pred and y. y_pred and y are a Tensor, a list or a numpy.ndarray. y_pred is in most cases (not strictly) a list of floating-point numbers in the range \([0, 1]\) with shape \((N, C)\), where \(N\) is the number of cases and \(C\) is the number of categories. y contains integer values. Its shape is \((N, C)\) if one-hot encoding is used, or \((N,)\) if the category index is used.
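For the index-label case, the top-k semantics can be sketched in NumPy (an illustration of what the metric computes, not its implementation):

```python
import numpy as np

def topk_accuracy(y_pred, y, k):
    """Fraction of rows whose true class index is among the k largest scores."""
    topk_idx = np.argsort(y_pred, axis=1)[:, -k:]  # indices of the k largest scores per row
    hits = [y[i] in topk_idx[i] for i in range(len(y))]
    return np.mean(hits)

x = np.array([[0.2, 0.5, 0.3, 0.6, 0.2],
              [0.1, 0.35, 0.5, 0.2, 0.0],
              [0.9, 0.6, 0.2, 0.01, 0.3]])
y = np.array([2, 0, 1])
print(topk_accuracy(x, y, 3))  # 0.666... (2 of the 3 labels are in the top 3)
```

Here only the second row misses: its three largest scores sit at indices 2, 1 and 3, while the label is 0.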

class
mindspore.nn.
TrainOneStepCell
(network, optimizer, sens=1.0)[source]¶ Network training package class.
Wraps the network with an optimizer. The resulting Cell can be trained with input data and label. The backward graph is created in the construct function to update the parameters. Different parallel modes are available for training.
 Parameters
 Inputs:
data (Tensor) - Tensor of shape \((N, \ldots)\).
label (Tensor) - Tensor of shape \((N, \ldots)\).
 Outputs:
Tensor, a scalar Tensor with shape \(()\).
Examples
>>> net = Net()
>>> loss_fn = nn.SoftmaxCrossEntropyWithLogits()
>>> optim = nn.Momentum(net.trainable_params(), learning_rate=0.1, momentum=0.9)
>>> loss_net = nn.WithLossCell(net, loss_fn)
>>> train_net = nn.TrainOneStepCell(loss_net, optim)

class
mindspore.nn.
TrainOneStepWithLossScaleCell
(network, optimizer, scale_update_cell=None)[source]¶ Network training with loss scaling.
This is a training step with loss scaling. It takes a network, an optimizer and, optionally, a scale update Cell as arguments. The loss scale value can be updated on either the host side or the device side. TrainOneStepWithLossScaleCell is compiled into a graph that takes data, label and sens as inputs, where sens acts as the loss scaling value. To update the value on the host side, it must be provided as an input. If sens is not given, the loss scale update logic must be provided by scale_update_cell. If scale_update_cell is not None and sens is provided, scale_update_cell is ignored.
 Parameters
 Inputs:
inputs (Tensor) - Tensor of shape \((N, \ldots)\).
label (Tensor) - Tensor of shape \((N, \ldots)\).
scaling_sens (Tensor) - Tensor of shape \(()\).
 Outputs:
Tuple of 3 Tensor, the loss, overflow flag and current loss scaling value.
loss (Tensor) - Tensor with shape \(()\).
overflow (Tensor) - Tensor with shape \(()\), type is bool.
loss_scale (Tensor) - Tensor with shape \(()\).
Examples
>>> net_with_loss = Net()
>>> optimizer = nn.Momentum(net_with_loss.trainable_params(), learning_rate=0.1, momentum=0.9)
>>> manager = nn.DynamicLossScaleUpdateCell(init_loss_scale=2**12, scale_factor=2, scale_window=1000)
>>> train_network = nn.TrainOneStepWithLossScaleCell(net_with_loss, optimizer, scale_update_cell=manager)
>>> train_network.set_train()
>>>
>>> inputs = Tensor(np.ones([16, 16]).astype(np.float32))
>>> label = Tensor(np.zeros([16, 16]).astype(np.float32))
>>> scaling_sens = Tensor(np.full((1), np.finfo(np.float32).max), dtype=mindspore.float32)
>>> output = train_network(inputs, label, scaling_sens)
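The host-side update logic described above can be sketched in plain Python/NumPy (a conceptual illustration of dynamic loss scaling; the function and parameter names are hypothetical, not MindSpore's API):

```python
import numpy as np

def update_loss_scale(grads, loss_scale, scale_factor=2.0, scale_window=1000, good_steps=0):
    """One dynamic-loss-scale step: shrink on overflow, grow after a window of good steps."""
    overflow = not all(np.isfinite(g).all() for g in grads)
    if overflow:
        loss_scale = max(loss_scale / scale_factor, 1.0)  # shrink the scale; skip this update
        good_steps = 0
    else:
        good_steps += 1
        if good_steps >= scale_window:                    # stable long enough: grow the scale
            loss_scale *= scale_factor
            good_steps = 0
    return overflow, loss_scale, good_steps

grads = [np.array([0.1, np.inf])]                         # a gradient that overflowed
overflow, scale, _ = update_loss_scale(grads, loss_scale=2.0 ** 12)
# overflow is True and the scale is halved to 2048.0
```

This mirrors the general pattern a scale update cell implements: overflow detection decides whether to apply the step, and the scale itself adapts over time.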

class
mindspore.nn.
Unfold
(ksizes, strides, rates, padding='valid')[source]¶ Extract patches from images. The input tensor must be a 4D tensor and the data format is NCHW.
 Parameters
ksizes (Union[tuple[int], list[int]]) – The size of sliding window, should be a tuple or list of int, and the format is [1, ksize_row, ksize_col, 1].
strides (Union[tuple[int], list[int]]) – Distance between the centers of the two consecutive patches, should be a tuple or list of int, and the format is [1, stride_row, stride_col, 1].
rates (Union[tuple[int], list[int]]) – In each extracted patch, the gap between the corresponding dim pixel positions, should be a tuple or list of int, and the format is [1, rate_row, rate_col, 1].
padding (str) – The type of padding algorithm, a string whose value is "same" or "valid" (not case sensitive). Default: "valid".
same: the patch may extend beyond the original image, and the part outside is filled with 0.
valid: the extracted patch area must be completely contained in the original image.
 Inputs:
input_x (Tensor) - A 4D tensor whose shape is [in_batch, in_depth, in_row, in_col] and whose data type is int8, float16 or uint8.
 Outputs:
Tensor, a 4D tensor whose data type is the same as input_x and whose shape is [out_batch, out_depth, out_row, out_col], where out_batch is the same as in_batch.
Examples
>>> net = Unfold(ksizes=[1, 2, 2, 1], strides=[1, 1, 1, 1], rates=[1, 1, 1, 1])
>>> image = Tensor(np.ones([1, 1, 3, 3]), dtype=mstype.float16)
>>> net(image)
Tensor([[[[1, 1],
          [1, 1]],
         [[1, 1],
          [1, 1]],
         [[1, 1],
          [1, 1]],
         [[1, 1],
          [1, 1]]]], shape=(1, 4, 2, 2), dtype=mstype.float16)
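For "valid" padding, the patch extraction can be reproduced with a small NumPy sketch (illustrative only; it ignores rates > 1 and the exact patch ordering of the operator):

```python
import numpy as np

def extract_patches_valid(x, ksize, stride):
    """x: (N, C, H, W) -> (N, C*kh*kw, out_h, out_w) patches, 'valid' padding."""
    n, c, h, w = x.shape
    kh, kw = ksize
    out_h = (h - kh) // stride + 1
    out_w = (w - kw) // stride + 1
    out = np.empty((n, c * kh * kw, out_h, out_w), x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Each sliding window is flattened into the depth dimension.
            patch = x[:, :, i * stride:i * stride + kh, j * stride:j * stride + kw]
            out[:, :, i, j] = patch.reshape(n, -1)
    return out

image = np.ones((1, 1, 3, 3), np.float16)
patches = extract_patches_valid(image, (2, 2), 1)
# A 3x3 image with 2x2 windows and stride 1 yields shape (1, 4, 2, 2)
```

This matches the doc example: four window positions fit in a 3x3 image, and each patch contributes in_depth * ksize_row * ksize_col = 4 values to out_depth.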

class
mindspore.nn.
WithEvalCell
(network, loss_fn, add_cast_fp32=False)[source]¶ Cell that returns loss, output and label for evaluation.
This Cell accepts a network and a loss function as arguments and computes the loss for the model. It returns the loss, the network output and the label for calculating metrics.
 Inputs:
data (Tensor) - Tensor of shape \((N, \ldots)\).
label (Tensor) - Tensor of shape \((N, \ldots)\).
 Outputs:
Tuple, containing a scalar loss Tensor, a network output Tensor of shape \((N, \ldots)\) and a label Tensor of shape \((N, \ldots)\).
Examples
>>> # For a defined network Net without loss function
>>> net = Net()
>>> loss_fn = nn.SoftmaxCrossEntropyWithLogits()
>>> eval_net = nn.WithEvalCell(net, loss_fn)

class
mindspore.nn.
WithGradCell
(network, loss_fn=None, sens=None)[source]¶ Cell that returns the gradients.
Wraps the network with a backward cell to compute gradients. A network with a loss function is necessary as an argument. If loss_fn is None, the network must be a wrapper of a network and a loss function. This Cell accepts data and label as inputs and returns gradients for each trainable parameter.
Note
Run in PyNative mode.
 Parameters
network (Cell) – The target network to wrap.
loss_fn (Cell) – Primitive loss function used to compute gradients. Default: None.
sens (Union[None, Tensor, Scalar, Tuple ...]) – The sensitivity (gradient with respect to the output) for backpropagation; its type and shape should be the same as the network output. If None, a value of ones with the same type and shape as the output is used. Default: None.
 Inputs:
data (Tensor) - Tensor of shape \((N, \ldots)\).
label (Tensor) - Tensor of shape \((N, \ldots)\).
 Outputs:
List, a list of Tensors with the same shapes as the trainable parameters.
Examples
>>> # For a defined network Net without loss function
>>> net = Net()
>>> loss_fn = nn.SoftmaxCrossEntropyWithLogits()
>>> grad_net = nn.WithGradCell(net, loss_fn)
>>>
>>> # For a network wrapped with loss function
>>> net = Net()
>>> net_with_criterion = nn.WithLossCell(net, loss_fn)
>>> grad_net = nn.WithGradCell(net_with_criterion)

class
mindspore.nn.
WithLossCell
(backbone, loss_fn)[source]¶ Cell with loss function.
Wraps the network with loss function. This Cell accepts data and label as inputs and the computed loss will be returned.
 Parameters
 Inputs:
data (Tensor) - Tensor of shape \((N, \ldots)\).
label (Tensor) - Tensor of shape \((N, \ldots)\).
 Outputs:
Tensor, a scalar tensor with shape \(()\).
Examples
>>> net = Net()
>>> loss_fn = nn.SoftmaxCrossEntropyWithLogits(is_grad=False, sparse=True)
>>> net_with_criterion = nn.WithLossCell(net, loss_fn)
>>>
>>> batch_size = 2
>>> data = Tensor(np.ones([batch_size, 3, 64, 64]).astype(np.float32) * 0.01)
>>> label = Tensor(np.ones([batch_size, 1, 1, 1]).astype(np.int32))
>>>
>>> net_with_criterion(data, label)

property
backbone_network
¶ Returns the backbone network.
 Returns
Cell, the backbone network.

mindspore.nn.
get_activation
(name)[source]¶ Gets the activation function.
 Parameters
name (str) – The name of the activation function.
 Returns
Function, the activation function.
Examples
>>> sigmoid = nn.get_activation('sigmoid')

mindspore.nn.
get_metric_fn
(name, *args, **kwargs)[source]¶ Gets the metric method based on the input name.
 Parameters
name (str) – The name of metric method. Refer to the ‘__factory__’ object for the currently supported metrics.
args – Arguments for the metric function.
kwargs – Keyword arguments for the metric function.
 Returns
Metric object, class instance of the metric method.
Examples
>>> metric = nn.get_metric_fn('precision', eval_type='classification')