mindspore.nn

Neural Network Cells.

Pre-defined building blocks or computing units for constructing neural networks.

class mindspore.nn.Accuracy(eval_type='classification')[source]

Calculates the accuracy for classification and multilabel data.

The accuracy class creates two local variables, the correct number and the total number, that are used to compute the frequency with which predictions match labels. This frequency is ultimately returned as the accuracy: an idempotent operation that simply divides the correct number by the total number.

\[\text{accuracy} = \frac{\text{true_positive} + \text{true_negative}}{\text{true_positive} + \text{true_negative} + \text{false_positive} + \text{false_negative}}\]
Parameters

eval_type (str) – The metric type used to calculate the accuracy over a dataset: ‘classification’ for single-label classification or ‘multilabel’ for multilabel classification. Default: ‘classification’.

Examples

>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor, nn
>>> x = Tensor(np.array([[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]]), mindspore.float32)
>>> y = Tensor(np.array([1, 0, 1]), mindspore.float32)
>>> metric = nn.Accuracy('classification')
>>> metric.clear()
>>> metric.update(x, y)
>>> accuracy = metric.eval()
clear()[source]

Clears the internal evaluation result.

eval()[source]

Computes the accuracy.

Returns

Float, the computed result.

Raises

RuntimeError – If the sample size is 0.

update(*inputs)[source]

Updates the internal evaluation result \(y_{pred}\) and \(y\).

Parameters

inputs – Input y_pred and y. y_pred and y are each a Tensor, a list or an array. For the ‘classification’ evaluation type, y_pred is in most cases (not strictly) a list of floating-point numbers in the range \([0, 1]\) with shape \((N, C)\), where \(N\) is the number of cases and \(C\) is the number of categories. The shape of y can be \((N, C)\) with values 0 and 1 if one-hot encoding is used, or \((N,)\) with integer values if category indices are used. For the ‘multilabel’ evaluation type, y_pred and y can only be one-hot encodings with values 0 or 1. Indices with 1 indicate the positive category. The shapes of y_pred and y are both \((N, C)\).

Raises

ValueError – If the number of the inputs is not 2.
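For the ‘multilabel’ evaluation type, both y_pred and y are one-hot matrices of shape \((N, C)\), as described in update above. A minimal sketch (the values are chosen purely for illustration):

>>> x = Tensor(np.array([[0, 1, 1], [1, 0, 1]]), mindspore.float32)
>>> y = Tensor(np.array([[0, 1, 1], [1, 0, 1]]), mindspore.float32)
>>> metric = nn.Accuracy('multilabel')
>>> metric.clear()
>>> metric.update(x, y)
>>> accuracy = metric.eval()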

class mindspore.nn.ActQuant(activation, ema_decay=0.999, per_channel=False, num_bits=8, symmetric=False, narrow_range=False, quant_delay=0)[source]

Quantization aware training activation function.

Adds the fake quantization op to the end of the activation op, by which the output of the activation op will be truncated. Please check FakeQuantWithMinMax for more details.

Parameters
  • activation (Cell) – Activation cell class.

  • ema_decay (float) – Exponential Moving Average algorithm parameter. Default: 0.999.

  • per_channel (bool) – Quantization granularity based on layer or on channel. Default: False.

  • num_bits (int) – The bit number of quantization, supporting 4 and 8 bits. Default: 8.

  • symmetric (bool) – The quantization algorithm is symmetric or not. Default: False.

  • narrow_range (bool) – The quantization algorithm uses narrow range or not. Default: False.

  • quant_delay (int) – Quantization delay parameter, in terms of global training steps. Default: 0.

Inputs:
  • x (Tensor) - The input of ActQuant.

Outputs:

Tensor, with the same type and shape as the x.

Examples

>>> act_quant = nn.ActQuant(nn.ReLU())
>>> input_x = Tensor(np.array([[1, 2, -1], [-2, 0, -1]]), mindspore.float32)
>>> result = act_quant(input_x)
class mindspore.nn.Adam(params, learning_rate=0.001, beta1=0.9, beta2=0.999, eps=1e-08, use_locking=False, use_nesterov=False, weight_decay=0.0, loss_scale=1.0)[source]

Updates gradients by the Adaptive Moment Estimation (Adam) algorithm.

The Adam algorithm is proposed in Adam: A Method for Stochastic Optimization.

The updating formulas are as follows,

\[\begin{split}\begin{array}{ll} \\ m = \beta_1 * m + (1 - \beta_1) * g \\ v = \beta_2 * v + (1 - \beta_2) * g * g \\ l = \alpha * \frac{\sqrt{1-\beta_2^t}}{1-\beta_1^t} \\ w = w - l * \frac{m}{\sqrt{v} + \epsilon} \end{array}\end{split}\]

\(m\) represents the 1st moment vector moment1, \(v\) represents the 2nd moment vector moment2, \(g\) represents gradients, \(l\) represents scaling factor lr, \(\beta_1, \beta_2\) represent beta1 and beta2, \(t\) represents updating step while \(beta_1^t\) and \(beta_2^t\) represent beta1_power and beta2_power, \(\alpha\) represents learning_rate, \(w\) represents params, \(\epsilon\) represents eps.

Note

When separating parameter groups, the weight decay in each group will be applied on the parameters if the weight decay is positive. When not separating parameter groups, the weight_decay in the API will be applied on the parameters without ‘beta’ or ‘gamma’ in their names if weight_decay is positive.

To improve performance when using parameter groups, a customized order of parameters is supported.

The sparse strategy is applied when the SparseGatherV2 operator is used in the forward network. The sparse feature is under continuous development. The sparse behavior is currently performed on the CPU.

Parameters
  • params (Union[list[Parameter], list[dict]]) –

    When the params is a list of Parameter which will be updated, the element in params must be class Parameter. When the params is a list of dict, the “params”, “lr”, “weight_decay” and “order_params” are the keys that can be parsed.

    • params: Required. The value must be a list of Parameter.

    • lr: Optional. If “lr” is in the keys, the value of the corresponding learning rate will be used. If not, the learning_rate in the API will be used.

    • weight_decay: Optional. If “weight_decay” is in the keys, the value of the corresponding weight decay will be used. If not, the weight_decay in the API will be used.

    • order_params: Optional. If “order_params” is in the keys, the value must be the order of parameters and this order will be followed in the optimizer. The dict must contain no other keys, and the parameters listed in “order_params” must be included in one of the parameter groups.

  • learning_rate (Union[float, Tensor, Iterable, LearningRateSchedule]) – A value or a graph for the learning rate. When the learning_rate is an Iterable or a one-dimensional Tensor, the dynamic learning rate is used: the i-th step takes the i-th value as the learning rate. When the learning_rate is a LearningRateSchedule, the dynamic learning rate is also used: the i-th learning rate is calculated during training according to the formula of the LearningRateSchedule. When the learning_rate is a float or a zero-dimensional Tensor, a fixed learning rate is used. Other cases are not supported. The float learning rate must be equal to or greater than 0. If the type of learning_rate is int, it will be converted to float. Both dynamic options are sketched after the Examples below. Default: 1e-3.

  • beta1 (float) – The exponential decay rate for the 1st moment estimations. Should be in range (0.0, 1.0). Default: 0.9.

  • beta2 (float) – The exponential decay rate for the 2nd moment estimations. Should be in range (0.0, 1.0). Default: 0.999.

  • eps (float) – Term added to the denominator to improve numerical stability. Should be greater than 0. Default: 1e-8.

  • use_locking (bool) – Whether to enable a lock to protect variable tensors from being updated. If true, updates of the var, m, and v tensors will be protected by a lock. If false, the result is unpredictable. Default: False.

  • use_nesterov (bool) – Whether to use Nesterov Accelerated Gradient (NAG) algorithm to update the gradients. If true, update the gradients using NAG. If false, update the gradients without using NAG. Default: False.

  • weight_decay (float) – Weight decay (L2 penalty). It must be equal to or greater than 0. Default: 0.0.

  • loss_scale (float) – A floating point value for the loss scale. Should be greater than 0. Default: 1.0.

Inputs:
  • gradients (tuple[Tensor]) - The gradients of params, the shape is the same as params.

Outputs:

Tensor[bool], the value is True.

Examples

>>> net = Net()
>>> #1) All parameters use the same learning rate and weight decay
>>> optim = nn.Adam(params=net.trainable_params())
>>>
>>> #2) Use parameter groups and set different values
>>> conv_params = list(filter(lambda x: 'conv' in x.name, net.trainable_params()))
>>> no_conv_params = list(filter(lambda x: 'conv' not in x.name, net.trainable_params()))
>>> group_params = [{'params': conv_params, 'weight_decay': 0.01},
>>>                 {'params': no_conv_params, 'lr': 0.01},
>>>                 {'order_params': net.trainable_params()}]
>>> optim = nn.Adam(group_params, learning_rate=0.1, weight_decay=0.0)
>>> # The parameters in conv_params will use the default learning rate of 0.1 and a weight decay of 0.01.
>>> # The parameters in no_conv_params will use a learning rate of 0.01 and the default weight decay of 0.0.
>>> # The optimizer follows the parameter order given by the value of 'order_params'.
>>>
>>> loss = nn.SoftmaxCrossEntropyWithLogits()
>>> model = Model(net, loss_fn=loss, optimizer=optim)
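As described under learning_rate above, dynamic learning rates are also supported. A minimal sketch; nn.ExponentialDecayLR is assumed to be available as a LearningRateSchedule:

>>> # The i-th training step takes the i-th value of the list as its learning rate.
>>> optim = nn.Adam(net.trainable_params(), learning_rate=[0.1, 0.05, 0.025, 0.0125])
>>>
>>> # A LearningRateSchedule computes the learning rate from the current global step.
>>> lr_schedule = nn.ExponentialDecayLR(learning_rate=0.1, decay_rate=0.9, decay_steps=100)
>>> optim = nn.Adam(net.trainable_params(), learning_rate=lr_schedule)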
class mindspore.nn.AdamWeightDecay(params, learning_rate=0.001, beta1=0.9, beta2=0.999, eps=1e-06, weight_decay=0.0)[source]

Implements the Adam algorithm with weight decay (AdamWeightDecay).

Note

When separating parameter groups, the weight decay in each group will be applied on the parameters if the weight decay is positive. When not separating parameter groups, the weight_decay in the API will be applied on the parameters without ‘beta’ or ‘gamma’ in their names if weight_decay is positive.

To improve performance when using parameter groups, a customized order of parameters is supported.

Parameters
  • params (Union[list[Parameter], list[dict]]) –

    When the params is a list of Parameter which will be updated, the element in params must be class Parameter. When the params is a list of dict, the “params”, “lr”, “weight_decay” and “order_params” are the keys that can be parsed.

    • params: Required. The value must be a list of Parameter.

    • lr: Optional. If “lr” is in the keys, the value of the corresponding learning rate will be used. If not, the learning_rate in the API will be used.

    • weight_decay: Optional. If “weight_decay” is in the keys, the value of the corresponding weight decay will be used. If not, the weight_decay in the API will be used.

    • order_params: Optional. If “order_params” is in the keys, the value must be the order of parameters and this order will be followed in the optimizer. The dict must contain no other keys, and the parameters listed in “order_params” must be included in one of the parameter groups.

  • learning_rate (Union[float, Tensor, Iterable, LearningRateSchedule]) – A value or a graph for the learning rate. When the learning_rate is an Iterable or a one-dimensional Tensor, the dynamic learning rate is used: the i-th step takes the i-th value as the learning rate. When the learning_rate is a LearningRateSchedule, the dynamic learning rate is also used: the i-th learning rate is calculated during training according to the formula of the LearningRateSchedule. When the learning_rate is a float or a zero-dimensional Tensor, a fixed learning rate is used. Other cases are not supported. The float learning rate must be equal to or greater than 0. If the type of learning_rate is int, it will be converted to float. Default: 1e-3.

  • beta1 (float) – The exponential decay rate for the 1st moment estimations. Default: 0.9. Should be in range (0.0, 1.0).

  • beta2 (float) – The exponential decay rate for the 2nd moment estimations. Default: 0.999. Should be in range (0.0, 1.0).

  • eps (float) – Term added to the denominator to improve numerical stability. Default: 1e-6. Should be greater than 0.

  • weight_decay (float) – Weight decay (L2 penalty). It must be equal to or greater than 0. Default: 0.0.

Inputs:
  • gradients (tuple[Tensor]) - The gradients of params, the shape is the same as params.

Outputs:

tuple[bool], all elements are True.

Examples

>>> net = Net()
>>> #1) All parameters use the same learning rate and weight decay
>>> optim = nn.AdamWeightDecay(params=net.trainable_params())
>>>
>>> #2) Use parameter groups and set different values
>>> conv_params = list(filter(lambda x: 'conv' in x.name, net.trainable_params()))
>>> no_conv_params = list(filter(lambda x: 'conv' not in x.name, net.trainable_params()))
>>> group_params = [{'params': conv_params, 'weight_decay': 0.01},
>>>                 {'params': no_conv_params, 'lr': 0.01},
>>>                 {'order_params': net.trainable_params()}]
>>> optim = nn.AdamWeightDecay(group_params, learning_rate=0.1, weight_decay=0.0)
>>> # The parameters in conv_params will use the default learning rate of 0.1 and a weight decay of 0.01.
>>> # The parameters in no_conv_params will use a learning rate of 0.01 and the default weight decay of 0.0.
>>> # The optimizer follows the parameter order given by the value of 'order_params'.
>>>
>>> loss = nn.SoftmaxCrossEntropyWithLogits()
>>> model = Model(net, loss_fn=loss, optimizer=optim)
class mindspore.nn.AvgPool1d(kernel_size=1, stride=1, pad_mode='valid')[source]

Average pooling for temporal data.

Applies a 1D average pooling over an input Tensor which can be regarded as a composition of 1D input planes.

Typically the input is of shape \((N_{in}, C_{in}, L_{in})\), AvgPool1d outputs regional average in the \((L_{in})\)-dimension. Given kernel size \(ks = l_{ker}\) and stride \(s = s_0\), the operation is as follows.

\[\text{output}(N_i, C_j, l) = \frac{1}{l_{ker}} \sum_{n=0}^{l_{ker}-1} \text{input}(N_i, C_j, s_0 \times l + n)\]

Note

pad_mode for training only supports “same” and “valid”.

Parameters
  • kernel_size (int) – The size of the kernel window used to take the average value. Default: 1.

  • stride (int) – The distance of kernel moving, an int number that represents the width of movement. Default: 1.

  • pad_mode (str) –

    The optional value for pad mode is “same” or “valid”, and it is not case sensitive. Default: “valid”.

    • same: Adopts the way of completion. The height and width of the output will be the same as the input. The total number of padding will be calculated in horizontal and vertical directions and evenly distributed to top and bottom, left and right if possible. Otherwise, the last extra padding will be done from the bottom and the right side.

    • valid: Adopts the way of discarding. The possible largest height and width of output will be returned without padding. Extra pixels will be discarded.

Inputs:
  • input (Tensor) - Tensor of shape \((N, C_{in}, L_{in})\).

Outputs:

Tensor of shape \((N, C_{out}, L_{out})\).

Examples

>>> pool = nn.AvgPool1d(kernel_size=6, stride=1)
>>> x = Tensor(np.random.randint(0, 10, [1, 3, 6]), mindspore.float32)
>>> output = pool(x)
>>> output.shape
(1, 3, 1)
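In this example the ‘valid’ mode determines the output length: with \(L_{in} = 6\), \(l_{ker} = 6\) and \(s_0 = 1\), \(L_{out} = \left \lfloor \frac{L_{in} - l_{ker}}{s_0} \right \rfloor + 1 = 1\), which matches the shape above.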
class mindspore.nn.AvgPool2d(kernel_size=1, stride=1, pad_mode='valid')[source]

Average pooling for spatial data.

Applies a 2D average pooling over an input Tensor which can be regarded as a composition of 2D input planes.

Typically the input is of shape \((N_{in}, C_{in}, H_{in}, W_{in})\), AvgPool2d outputs regional average in the \((H_{in}, W_{in})\)-dimension. Given kernel size \(ks = (h_{ker}, w_{ker})\) and stride \(s = (s_0, s_1)\), the operation is as follows.

\[\text{output}(N_i, C_j, h, w) = \frac{1}{h_{ker} * w_{ker}} \sum_{m=0}^{h_{ker}-1} \sum_{n=0}^{w_{ker}-1} \text{input}(N_i, C_j, s_0 \times h + m, s_1 \times w + n)\]

Note

pad_mode for training only supports “same” and “valid”.

Parameters
  • kernel_size (Union[int, tuple[int]]) – The size of kernel used to take the average value. The data type of kernel_size must be int and the value represents the height and width, or a tuple of two int numbers that represent height and width respectively. Default: 1.

  • stride (Union[int, tuple[int]]) – The distance of kernel moving, an int number that represents the height and width of movement are both strides, or a tuple of two int numbers that represent height and width of movement respectively. Default: 1.

  • pad_mode (str) –

    The optional value for pad mode is “same” or “valid”, and it is not case sensitive. Default: “valid”.

    • same: Adopts the way of completion. The height and width of the output will be the same as the input. The total number of padding will be calculated in horizontal and vertical directions and evenly distributed to top and bottom, left and right if possible. Otherwise, the last extra padding will be done from the bottom and the right side.

    • valid: Adopts the way of discarding. The possible largest height and width of output will be returned without padding. Extra pixels will be discarded.

Inputs:
  • input (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).

Outputs:

Tensor of shape \((N, C_{out}, H_{out}, W_{out})\).

Examples

>>> pool = nn.AvgPool2d(kernel_size=3, stride=1)
>>> x = Tensor(np.random.randint(0, 10, [1, 2, 4, 4]), mindspore.float32)
>>> # x holds, for example:
>>> # [[[[5. 5. 9. 9.]
>>> #    [8. 4. 3. 0.]
>>> #    [2. 7. 1. 2.]
>>> #    [1. 8. 3. 3.]]
>>> #   [[6. 8. 2. 4.]
>>> #    [3. 0. 2. 1.]
>>> #    [0. 8. 9. 7.]
>>> #    [2. 1. 4. 9.]]]]
>>> output = pool(x)
>>> output.shape
(1, 2, 2, 2)
>>> output
[[[[4.888889  4.4444447]
   [4.111111  3.4444444]]
  [[4.2222223 4.5555553]
   [3.2222223 4.5555553]]]]
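The pad_mode determines the output size. A short sketch contrasting the two modes on the same input (the shapes follow from the descriptions above):

>>> x = Tensor(np.random.randint(0, 10, [1, 2, 4, 4]), mindspore.float32)
>>> nn.AvgPool2d(kernel_size=3, stride=1, pad_mode='valid')(x).shape
(1, 2, 2, 2)
>>> nn.AvgPool2d(kernel_size=3, stride=1, pad_mode='same')(x).shape
(1, 2, 4, 4)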
class mindspore.nn.BatchNorm1d(num_features, eps=1e-05, momentum=0.9, affine=True, gamma_init='ones', beta_init='zeros', moving_mean_init='zeros', moving_var_init='ones', use_batch_statistics=None)[source]

Batch normalization layer over a 2D input.

Batch Normalization is widely used in convolutional networks. This layer applies Batch Normalization over a 2D input (a mini-batch of 1D inputs) to reduce internal covariate shift as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. It rescales and recenters the feature using a mini-batch of data and the learned parameters which can be described in the following formula.

\[y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]

Note

The implementation of BatchNorm is different in graph mode and PyNative mode, therefore it is not recommended to change the mode after the network has been initialized.

Parameters
  • num_features (int) – C from an expected input of size (N, C).

  • eps (float) – A value added to the denominator for numerical stability. Default: 1e-5.

  • momentum (float) – A floating hyperparameter of the momentum for the running_mean and running_var computation. Default: 0.9.

  • affine (bool) – A bool value. When set to True, gamma and beta can be learned. Default: True.

  • gamma_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the gamma weight. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘ones’.

  • beta_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the beta weight. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘zeros’.

  • moving_mean_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the moving mean. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘zeros’.

  • moving_var_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the moving variance. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘ones’.

  • use_batch_statistics (bool) – If true, use the mean and variance values of the current batch data. If false, use the specified moving mean and variance values. If None, the training process uses the mean and variance of the current batch data and tracks the running mean and variance, while the evaluation process uses the running mean and variance. Default: None.

Inputs:
  • input (Tensor) - Tensor of shape \((N, C_{in})\).

Outputs:

Tensor, the normalized, scaled, offset tensor, of shape \((N, C_{out})\).

Examples

>>> net = nn.BatchNorm1d(num_features=16)
>>> input = Tensor(np.random.randint(0, 255, [3, 16]), mindspore.float32)
>>> net(input)
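With the default use_batch_statistics=None, which statistics are used depends on the training flag of the cell, toggled through set_train. A minimal sketch:

>>> net = nn.BatchNorm1d(num_features=16)
>>> net.set_train(True)   # use current batch statistics and track running statistics
>>> output = net(input)
>>> net.set_train(False)  # use the tracked running mean and variance
>>> output = net(input)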
class mindspore.nn.BatchNorm2d(num_features, eps=1e-05, momentum=0.9, affine=True, gamma_init='ones', beta_init='zeros', moving_mean_init='zeros', moving_var_init='ones', use_batch_statistics=None)[source]

Batch normalization layer over a 4D input.

Batch Normalization is widely used in convolutional networks. This layer applies Batch Normalization over a 4D input (a mini-batch of 2D inputs with additional channel dimension) to avoid internal covariate shift as described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. It rescales and recenters the feature using a mini-batch of data and the learned parameters which can be described in the following formula.

\[y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]

Note

The implementation of BatchNorm is different in graph mode and PyNative mode, therefore the mode cannot be changed after the network has been initialized.

Parameters
  • num_features (int) – C from an expected input of size (N, C, H, W).

  • eps (float) – A value added to the denominator for numerical stability. Default: 1e-5.

  • momentum (float) – A floating hyperparameter of the momentum for the running_mean and running_var computation. Default: 0.9.

  • affine (bool) – A bool value. When set to True, gamma and beta can be learned. Default: True.

  • gamma_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the gamma weight. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘ones’.

  • beta_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the beta weight. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘zeros’.

  • moving_mean_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the moving mean. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘zeros’.

  • moving_var_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the moving variance. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘ones’.

  • use_batch_statistics (bool) – If true, use the mean and variance values of the current batch data. If false, use the specified moving mean and variance values. If None, the training process uses the mean and variance of the current batch data and tracks the running mean and variance, while the evaluation process uses the running mean and variance. Default: None.

Inputs:
  • input (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).

Outputs:

Tensor, the normalized, scaled, offset tensor, of shape \((N, C_{out}, H_{out}, W_{out})\).

Examples

>>> net = nn.BatchNorm2d(num_features=3)
>>> input = Tensor(np.random.randint(0, 255, [1, 3, 224, 224]), mindspore.float32)
>>> net(input)
class mindspore.nn.Cell(auto_prefix=True, flags=None)[source]

Base class for all neural networks.

A ‘Cell’ could be a single neural network cell, such as conv2d, relu, batch_norm, etc., or a composition of cells that constructs a network.

Note

In general, the autograd algorithm will automatically generate the implementation of the gradient function, but if the bprop method is implemented, the gradient function will be replaced by bprop. The bprop implementation will receive a Tensor dout containing the gradient of the loss w.r.t. the output, and a Tensor out containing the forward result. The bprop needs to compute the gradient of the loss w.r.t. the inputs; gradients of the loss w.r.t. Parameter variables are not currently supported.

Parameters

auto_prefix (bool) – Recursively generate namespaces. Default: True.

Examples

>>> class MyCell(Cell):
>>>    def __init__(self):
>>>        super(MyCell, self).__init__()
>>>        self.relu = P.ReLU()
>>>
>>>    def construct(self, x):
>>>        return self.relu(x)
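As described in the note above, a cell may implement a bprop method to replace the autograd-generated gradient. A minimal sketch (the pass-through gradient is purely illustrative):

>>> class MyCellWithBprop(Cell):
>>>    def __init__(self):
>>>        super(MyCellWithBprop, self).__init__()
>>>        self.relu = P.ReLU()
>>>
>>>    def construct(self, x):
>>>        return self.relu(x)
>>>
>>>    def bprop(self, x, out, dout):
>>>        # receives the input x, the forward result out and the gradient dout,
>>>        # and returns the gradient of the loss w.r.t. each input
>>>        return (dout,)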
property bprop_debug

Gets whether custom bprop debugging is enabled for the cell.

cast_param(param)[source]

Casts the parameter according to the auto mixed precision level in PyNative mode.

Parameters

param (Parameter) – The parameter to cast.

cells()[source]

Returns an iterator over immediate cells.

cells_and_names(cells=None, name_prefix='')[source]

Returns an iterator over all cells in the network.

Includes the cell’s name and itself.

Parameters
  • cells (str) – Cells to iterate over. Default: None.

  • name_prefix (str) – Namespace. Default: ‘’.

Examples

>>> n = Net()
>>> names = []
>>> for m in n.cells_and_names():
>>>     if m[0]:
>>>         names.append(m[0])
compile(*inputs)[source]

Compiles cell.

Parameters

inputs (tuple) – Input parameters.

compile_and_run(*inputs)[source]

Compiles and runs cell.

Parameters

inputs (tuple) – Input parameters.

Returns

Object, the result of execution.

construct(*inputs, **kwargs)[source]

Defines the computation to be performed.

This method must be overridden by all subclasses.

Note

The inputs of the top cell only allow Tensor. Other types (tuple, list, int etc.) are forbidden.

Returns

Tensor, returns the computed result.

exec_checkpoint_graph()[source]

Executes the operation of saving the checkpoint graph.

extend_repr()[source]

Sets the extended representation of the Cell.

To print customized extended information, re-implement this method in your own cells.

generate_scope()[source]

Generate the scope for each cell object in the network.

get_func_graph_proto()[source]

Returns the graph binary proto.

get_parameters(expand=True)[source]

Returns an iterator over cell parameters.

Yields parameters of this cell. If expand is True, yields parameters of this cell and all subcells.

Parameters

expand (bool) – If true, yields parameters of this cell and all subcells. Otherwise, only yields parameters that are direct members of this cell. Default: True.

Examples

>>> net = Net()
>>> for item in net.get_parameters():
>>>     print(item)
get_scope()[source]

Returns the scope of a cell object in one network.

init_parameters_data(auto_parallel_mode=False)[source]

Initializes all parameters and replaces the original saved parameters in the cell.

Note

trainable_params() and other similar interfaces may return different parameter instances after init_parameters_data; do not save the returned results.

Parameters

auto_parallel_mode (bool) – Whether the cell runs in auto parallel mode. Default: False.

Returns

Dict[Parameter, Parameter], a dict mapping each original parameter to its replaced parameter.

insert_child_to_cell(child_name, child)[source]

Adds a child cell to the current cell.

Inserts a subcell with a given name to the current cell.

Parameters
  • child_name (str) – Name of the child cell.

  • child (Cell) – The child cell to be inserted.

Raises
  • KeyError – Child Cell’s name is incorrect or duplicated with another child cell’s name.

  • TypeError – Child Cell’s type is incorrect.

insert_param_to_cell(param_name, param, check_name=True)[source]

Adds a parameter to the current cell.

Inserts a parameter with given name to the cell. Please refer to the usage in source code of mindspore.nn.Cell.__setattr__.

Parameters
  • param_name (str) – Name of the parameter.

  • param (Parameter) – Parameter to be inserted to the cell.

  • check_name (bool) – Determines whether the name input is compatible. Default: True.

Raises
  • KeyError – If the name of the parameter is empty or contains a dot.

  • AttributeError – If user did not call init() first.

  • TypeError – If the type of parameter is not Parameter.

load_parameter_slice(params)[source]

Replaces parameters with sliced tensors according to parallel strategies.

Please refer to the usage in source code of mindspore.common._Executor.compile.

Parameters

params (dict) – The parameters dictionary used for initializing the data graph.

name_cells()[source]

Returns an iterator over all cells in the network.

Includes the name of the cell and the cell itself.

property param_prefix

Param prefix is the prefix of the current cell’s direct child parameters.

parameters_and_names(name_prefix='', expand=True)[source]

Returns an iterator over cell parameters.

Includes the parameter’s name and itself.

Parameters
  • name_prefix (str) – Namespace. Default: ‘’.

  • expand (bool) – If true, yields parameters of this cell and all subcells. Otherwise, only yields parameters that are direct members of this cell. Default: True.

Examples

>>> n = Net()
>>> names = []
>>> for m in n.parameters_and_names():
>>>     if m[0]:
>>>         names.append(m[0])
parameters_dict(recurse=True)[source]

Gets parameters dictionary.

Gets the parameters dictionary of this cell.

Parameters

recurse (bool) – Whether to include the parameters of subcells. Default: True.

Returns

OrderedDict, the parameters dictionary.

register_backward_hook(fn)[source]

Sets the cell backward hook function. Note that this function is only supported in PyNative mode.

Note

fn must be defined as the following code. cell_name is the name of the registered cell. grad_input is the gradient passed to the cell. grad_output is the gradient computed and passed to the next cell or primitive, which may be modified and returned. >>> hook_fn(cell_name, grad_input, grad_output) -> Tensor or None

Parameters

fn (function) – Specifies the hook function with grad as input.
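A minimal usage sketch, assuming PyNative mode (the printing hook is illustrative only):

>>> def hook_fn(cell_name, grad_input, grad_output):
>>>     print(cell_name)
>>>     return None  # returning None leaves the gradient unchanged
>>>
>>> relu = nn.ReLU()
>>> relu.register_backward_hook(hook_fn)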

set_auto_parallel()[source]

Sets the cell to auto parallel mode.

Note

If a cell needs to use the auto parallel or semi auto parallel mode for training, evaluation or prediction, this interface needs to be called by the cell.

set_broadcast_flag(mode=True)[source]

Sets the cell to data_parallel mode.

The cell can be accessed as an attribute using the given name.

Parameters

mode (bool) – Specifies whether the model is data_parallel. Default: True.

set_grad(requires_grad=True)[source]

Sets the cell flag for gradient.

Parameters

requires_grad (bool) – Specifies whether the net needs gradients. If True, the cell will construct the backward network in PyNative mode. Default: True.

set_parallel_input_with_inputs(*inputs)[source]

Slices input tensors according to parallel strategies, and sets the sliced inputs to _parallel_input_run.

Parameters

inputs (tuple) – Inputs of the construct method.

set_param_ps(recurse=True, init_in_server=False)[source]

Sets whether the trainable parameters are updated by the parameter server.

Note

It only works when a running task is in the parameter server mode.

Parameters

recurse (bool) – Whether to also set the trainable parameters of subcells. Default: True.

set_train(mode=True)[source]

Sets the cell to training mode.

The cell itself and all children cells will be set to training mode.

Parameters

mode (bool) – Specifies whether the model is training. Default: True.

to_float(dst_type)[source]

Adds a cast on all inputs of the cell and its child cells so that they run with a certain float type.

If dst_type is mindspore.dtype.float16, all the inputs of Cell including input, Parameter, Tensor as const will be cast to float16. Please refer to the usage in source code of mindspore.train.amp.build_train_network.

Note

Multiple calls will overwrite the previous setting.

Parameters

dst_type (mindspore.dtype) – The float type the Cell is transferred to run with. dst_type can be mindspore.dtype.float16 or mindspore.dtype.float32.

Raises

ValueError – If dst_type is neither float32 nor float16.
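A minimal usage sketch:

>>> net = nn.Conv2d(120, 240, 4, has_bias=False)
>>> net.to_float(mindspore.dtype.float16)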

trainable_params(recurse=True)[source]

Returns all trainable parameters.

Returns a list of all trainable parameters.

Parameters

recurse (bool) – Whether to include the trainable parameters of subcells. Default: True.

Returns

List, the list of trainable parameters.

untrainable_params(recurse=True)[source]

Returns all untrainable parameters.

Returns a list of all untrainable parameters.

Parameters

recurse (bool) – Whether to include the untrainable parameters of subcells. Default: True.

Returns

List, the list of untrainable parameters.

update_cell_prefix()[source]

Updates all child cells’ self.param_prefix.

After being invoked, the name prefix of every child cell can be obtained through ‘_param_prefix’.

update_cell_type(cell_type)[source]

Updates the current cell type when a quantization aware training network is encountered.

After being invoked, the cell type is set to cell_type.

update_parameters_name(prefix='', recurse=True)[source]

Updates the names of parameters with given prefix string.

Adds the given prefix to the names of parameters.

Parameters
  • prefix (str) – The prefix string.

  • recurse (bool) – Whether to include the parameters of subcells. Default: True.

class mindspore.nn.CellList(*args)[source]

Holds Cells in a list.

CellList can be used like a regular Python list and supports ‘__getitem__’, ‘__setitem__’, ‘__delitem__’, ‘__len__’, ‘__iter__’ and ‘__iadd__’, but the cells it contains are properly registered and will be visible to all Cell methods.

Parameters

args (list, optional) – List of subclass of Cell.

Examples

>>> conv = nn.Conv2d(100, 20, 3)
>>> bn = nn.BatchNorm2d(20)
>>> relu = nn.ReLU()
>>> cell_ls = nn.CellList([bn])
>>> cell_ls.insert(0, conv)
>>> cell_ls.append(relu)
>>> x = Tensor(np.random.random((1, 3, 4, 4)), dtype=mindspore.float32)
>>> # unlike nn.SequentialCell, calling `cell_ls(x)` directly is not supported
>>> cell_ls
CellList< (0): Conv2d<input_channels=100, ..., bias_init=None>
          (1): BatchNorm2d<num_features=20, ..., moving_variance=Parameter (name=variance)>
          (2): ReLU<> >
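Because a CellList is not callable (unlike nn.SequentialCell), it is typically consumed by iterating over it inside construct. A minimal sketch with a hypothetical MyNet:

>>> class MyNet(nn.Cell):
>>>    def __init__(self):
>>>        super(MyNet, self).__init__()
>>>        self.layers = nn.CellList([nn.Conv2d(3, 20, 3), nn.ReLU()])
>>>
>>>    def construct(self, x):
>>>        for layer in self.layers:
>>>            x = layer(x)
>>>        return x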
append(cell)[source]

Appends a given cell to the end of the list.

extend(cells)[source]

Appends cells from a Python iterable to the end of the list.

Raises

TypeError – If the cells are not a list of Cells.

insert(index, cell)[source]

Inserts a given cell before a given index in the list.

class mindspore.nn.CentralCrop(central_fraction)[source]

Crops the central region of the images based on central_fraction.

Parameters

central_fraction (float) – Fraction of size to crop. It must be float and in range (0.0, 1.0].

Inputs:
  • image (Tensor) - A 3-D tensor of shape [C, H, W], or a 4-D tensor of shape [N, C, H, W].

Outputs:

Tensor, 3-D or 4-D float tensor, according to the input.

Examples

>>> net = nn.CentralCrop(central_fraction=0.5)
>>> image = Tensor(np.random.random((4, 3, 4, 4)), mindspore.float32)
>>> output = net(image)
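Here central_fraction=0.5 keeps the central half of each spatial dimension, so the 4-D input of shape (4, 3, 4, 4) above is cropped to shape (4, 3, 2, 2).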
class mindspore.nn.ClipByNorm[source]

Clips tensor values to a maximum \(L_2\)-norm.

The output of this layer remains the same if the \(L_2\)-norm of the input tensor is not greater than the argument clip_norm. Otherwise the tensor will be normalized as:

\[\text{output}(X) = \frac{\text{clip_norm} * X}{L_2(X)},\]

where \(L_2(X)\) is the \(L_2\)-norm of \(X\).

Inputs:
  • input (Tensor) - Tensor of shape N-D. The type must be float32 or float16.

  • clip_norm (Tensor) - A scalar Tensor of shape \(()\) or \((1)\).

Outputs:

Tensor, clipped tensor with the same shape as the input, whose type is float32.

Examples

>>> net = nn.ClipByNorm()
>>> input = Tensor(np.random.randint(0, 10, [4, 16]), mindspore.float32)
>>> clip_norm = Tensor(np.array([100]).astype(np.float32))
>>> net(input, clip_norm)
construct(x, clip_norm)[source]

Adds the ms_function decorator to support PyNative mode.

class mindspore.nn.Conv1d(in_channels, out_channels, kernel_size, stride=1, pad_mode='same', padding=0, dilation=1, group=1, has_bias=False, weight_init='normal', bias_init='zeros')[source]

1D convolution layer.

Applies a 1D convolution over an input tensor which is typically of shape \((N, C_{in}, W_{in})\), where \(N\) is batch size and \(C_{in}\) is channel number. For each batch of shape \((C_{in}, W_{in})\), the formula is defined as:

\[out_j = \sum_{i=0}^{C_{in} - 1} ccor(W_{ij}, X_i) + b_j,\]

where \(ccor\) is the cross correlation operator, \(C_{in}\) is the input channel number, \(j\) ranges from \(0\) to \(C_{out} - 1\), \(W_{ij}\) corresponds to the \(i\)-th channel of the \(j\)-th filter and \(out_{j}\) corresponds to the \(j\)-th channel of the output. \(W_{ij}\) is a slice of kernel and it has shape \((\text{ks_w})\), where \(\text{ks_w}\) is the width of the convolution kernel. The full kernel has shape \((C_{out}, C_{in} // \text{group}, \text{ks_w})\), where group is the group number to split the input in the channel dimension.

If the ‘pad_mode’ is set to be “valid”, the output width will be \(\left \lfloor{1 + \frac{W_{in} + 2 \times \text{padding} - \text{ks_w} - (\text{ks_w} - 1) \times (\text{dilation} - 1) }{\text{stride}}} \right \rfloor\).

The first introduction of convolution layer can be found in paper Gradient Based Learning Applied to Document Recognition.

Parameters
  • in_channels (int) – The number of input channel \(C_{in}\).

  • out_channels (int) – The number of output channel \(C_{out}\).

  • kernel_size (int) – The data type is int. Specifies the width of the 1D convolution window.

  • stride (int) – The distance of kernel moving, an int number that represents the width of movement. Default: 1.

  • pad_mode (str) –

    Specifies padding mode. The optional values are “same”, “valid”, “pad”. Default: “same”.

    • same: Adopts the way of completion. The output width will be the same as the input. The total number of padding will be calculated in the horizontal direction and evenly distributed to left and right if possible. Otherwise, the last extra padding will be done from the right side. If this mode is set, padding must be 0.

    • valid: Adopts the way of discarding. The possible largest width of the output will be returned without padding. Extra pixels will be discarded. If this mode is set, padding must be 0.

    • pad: Implicit paddings on both sides of the input. The number of padding will be padded to the input Tensor borders. padding must be greater than or equal to 0.

  • padding (int) – Implicit paddings on both sides of the input. Default: 0.

  • dilation (int) – The data type is int. Specifies the dilation rate to use for dilated convolution. If set to be \(k > 1\), there will be \(k - 1\) pixels skipped for each sampling location. Its value must be greater or equal to 1 and bounded by the width of the input. Default: 1.

  • group (int) – Splits filter into groups, in_channels and out_channels must be divisible by the number of groups. Default: 1.

  • has_bias (bool) – Specifies whether the layer uses a bias vector. Default: False.

  • weight_init (Union[Tensor, str, Initializer, numbers.Number]) – An initializer for the convolution kernel. It can be a Tensor, a string, an Initializer or a number. When a string is specified, values from ‘TruncatedNormal’, ‘Normal’, ‘Uniform’, ‘HeUniform’ and ‘XavierUniform’ distributions as well as constant ‘One’ and ‘Zero’ distributions are possible. Alias ‘xavier_uniform’, ‘he_uniform’, ‘ones’ and ‘zeros’ are acceptable. Uppercase and lowercase are both acceptable. Refer to the values of Initializer for more details. Default: ‘normal’.

  • bias_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the bias vector. Possible Initializer and string are the same as ‘weight_init’. Refer to the values of Initializer for more details. Default: ‘zeros’.

Inputs:
  • input (Tensor) - Tensor of shape \((N, C_{in}, W_{in})\).

Outputs:

Tensor of shape \((N, C_{out}, W_{out})\).

Examples

>>> net = nn.Conv1d(120, 240, 4, has_bias=False, weight_init='normal')
>>> input = Tensor(np.ones([1, 120, 640]), mindspore.float32)
>>> net(input).shape
(1, 240, 640)
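For comparison, had pad_mode been ‘valid’, the formula above would give \(W_{out} = \left \lfloor 1 + \frac{640 - 4}{1} \right \rfloor = 637\); the default ‘same’ mode preserves the width of 640, as shown.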
class mindspore.nn.Conv1dTranspose(in_channels, out_channels, kernel_size, stride=1, pad_mode='same', padding=0, dilation=1, group=1, has_bias=False, weight_init='normal', bias_init='zeros')[source]

1D transposed convolution layer.

Compute a 1D transposed convolution, which is also known as a deconvolution (although it is not an actual deconvolution).

Input is typically of shape \((N, C, W)\), where \(N\) is batch size and \(C\) is channel number.

Parameters
  • in_channels (int) – The number of channels in the input space.

  • out_channels (int) – The number of channels in the output space.

  • kernel_size (int) – Specifies the width of the 1D convolution window.

  • stride (int) – The distance of kernel moving, an int number that represents the width of movement. Default: 1.

  • pad_mode (str) –

    Select the mode of the pad. The optional values are “pad”, “same”, “valid”. Default: “same”.

    • pad: Implicit paddings on both sides of the input.

    • same: Adopts the way of completion.

    • valid: Adopts the way of discarding.

  • padding (int) – Implicit paddings on both sides of the input. Default: 0.

  • dilation (int) – The data type is int. Specifies the dilation rate to use for dilated convolution. If set to be \(k > 1\), there will be \(k - 1\) pixels skipped for each sampling location. Its value must be greater or equal to 1 and bounded by the width of the input. Default: 1.

  • group (int) – Splits filter into groups, in_channels and out_channels must be divisible by the number of groups. This is not supported on Davinci devices when group > 1. Default: 1.

  • has_bias (bool) – Specifies whether the layer uses a bias vector. Default: False.

  • weight_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the convolution kernel. It can be a Tensor, a string, an Initializer or a numbers.Number. When a string is specified, values from ‘TruncatedNormal’, ‘Normal’, ‘Uniform’, ‘HeUniform’ and ‘XavierUniform’ distributions as well as constant ‘One’ and ‘Zero’ distributions are possible. Alias ‘xavier_uniform’, ‘he_uniform’, ‘ones’ and ‘zeros’ are acceptable. Uppercase and lowercase are both acceptable. Refer to the values of Initializer for more details. Default: ‘normal’.

  • bias_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the bias vector. Possible Initializer and string are the same as ‘weight_init’. Refer to the values of Initializer for more details. Default: ‘zeros’.

Inputs:
  • input (Tensor) - Tensor of shape \((N, C_{in}, W_{in})\).

Outputs:

Tensor of shape \((N, C_{out}, W_{out})\).

Examples

>>> net = nn.Conv1dTranspose(3, 64, 4, has_bias=False, weight_init='normal')
>>> input = Tensor(np.ones([1, 3, 50]), mindspore.float32)
>>> net(input)
class mindspore.nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, pad_mode='same', padding=0, dilation=1, group=1, has_bias=False, weight_init='normal', bias_init='zeros')[source]

2D convolution layer.

Applies a 2D convolution over an input tensor which is typically of shape \((N, C_{in}, H_{in}, W_{in})\), where \(N\) is batch size, \(C_{in}\) is channel number, and \(H_{in}\), \(W_{in}\) are the height and width. For each batch of shape \((C_{in}, H_{in}, W_{in})\), the formula is defined as:

\[out_j = \sum_{i=0}^{C_{in} - 1} ccor(W_{ij}, X_i) + b_j,\]

where \(ccor\) is the cross-correlation operator, \(C_{in}\) is the input channel number, \(j\) ranges from \(0\) to \(C_{out} - 1\), \(W_{ij}\) corresponds to the \(i\)-th channel of the \(j\)-th filter and \(out_{j}\) corresponds to the \(j\)-th channel of the output. \(W_{ij}\) is a slice of kernel and it has shape \((\text{ks_h}, \text{ks_w})\), where \(\text{ks_h}\) and \(\text{ks_w}\) are the height and width of the convolution kernel. The full kernel has shape \((C_{out}, C_{in} // \text{group}, \text{ks_h}, \text{ks_w})\), where group is the group number to split the input in the channel dimension.

If the ‘pad_mode’ is set to be “valid”, the output height and width will be \(\left \lfloor{1 + \frac{H_{in} + 2 \times \text{padding} - \text{ks_h} - (\text{ks_h} - 1) \times (\text{dilation} - 1) }{\text{stride}}} \right \rfloor\) and \(\left \lfloor{1 + \frac{W_{in} + 2 \times \text{padding} - \text{ks_w} - (\text{ks_w} - 1) \times (\text{dilation} - 1) }{\text{stride}}} \right \rfloor\) respectively.

The first introduction can be found in paper Gradient Based Learning Applied to Document Recognition.

Parameters
  • in_channels (int) – The number of input channel \(C_{in}\).

  • out_channels (int) – The number of output channel \(C_{out}\).

  • kernel_size (Union[int, tuple[int]]) – The data type is int or a tuple of 2 integers. Specifies the height and width of the 2D convolution window. Single int means the value is for both the height and the width of the kernel. A tuple of 2 ints means the first value is for the height and the other is for the width of the kernel.

  • stride (Union[int, tuple[int]]) – The distance of kernel moving, an int number that represents the height and width of movement are both strides, or a tuple of two int numbers that represent height and width of movement respectively. Default: 1.

  • pad_mode (str) –

    Specifies padding mode. The optional values are “same”, “valid”, “pad”. Default: “same”.

    • same: Adopts the way of completion. The height and width of the output will be the same as the input. The total number of padding will be calculated in horizontal and vertical directions and evenly distributed to top and bottom, left and right if possible. Otherwise, the last extra padding will be done from the bottom and the right side. If this mode is set, padding must be 0.

    • valid: Adopts the way of discarding. The possible largest height and width of output will be returned without padding. Extra pixels will be discarded. If this mode is set, padding must be 0.

    • pad: Implicit paddings on both sides of the input. The number of padding will be padded to the input Tensor borders. padding must be greater than or equal to 0.

  • padding (Union[int, tuple[int]]) – Implicit paddings on both sides of the input. If padding is one integer, the paddings of top, bottom, left and right are the same, equal to padding. If padding is a tuple with four integers, the paddings of top, bottom, left and right will be equal to padding[0], padding[1], padding[2], and padding[3] accordingly. Default: 0.

  • dilation (Union[int, tuple[int]]) – The data type is int or a tuple of 2 integers. Specifies the dilation rate to use for dilated convolution. If set to be \(k > 1\), there will be \(k - 1\) pixels skipped for each sampling location. Its value must be greater or equal to 1 and bounded by the height and width of the input. Default: 1.

  • group (int) – Splits filter into groups, in_channels and out_channels must be divisible by the number of groups. If the group is equal to in_channels and out_channels, this 2D convolution layer can also be called a 2D depthwise convolution layer. Default: 1.

  • has_bias (bool) – Specifies whether the layer uses a bias vector. Default: False.

  • weight_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the convolution kernel. It can be a Tensor, a string, an Initializer or a number. When a string is specified, values from ‘TruncatedNormal’, ‘Normal’, ‘Uniform’, ‘HeUniform’ and ‘XavierUniform’ distributions as well as constant ‘One’ and ‘Zero’ distributions are possible. Alias ‘xavier_uniform’, ‘he_uniform’, ‘ones’ and ‘zeros’ are acceptable. Uppercase and lowercase are both acceptable. Refer to the values of Initializer for more details. Default: ‘normal’.

  • bias_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the bias vector. Possible Initializer and string are the same as ‘weight_init’. Refer to the values of Initializer for more details. Default: ‘zeros’.

Inputs:
  • input (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).

Outputs:

Tensor of shape \((N, C_{out}, H_{out}, W_{out})\).

Examples

>>> net = nn.Conv2d(120, 240, 4, has_bias=False, weight_init='normal')
>>> input = Tensor(np.ones([1, 120, 1024, 640]), mindspore.float32)
>>> net(input).shape
(1, 240, 1024, 640)
class mindspore.nn.Conv2dBnAct(in_channels, out_channels, kernel_size, stride=1, pad_mode='same', padding=0, dilation=1, group=1, has_bias=False, weight_init='normal', bias_init='zeros', has_bn=False, momentum=0.9, eps=1e-05, activation=None, alpha=0.2, after_fake=True)[source]

A combination of convolution, batch normalization, and activation layers.

See Conv2d for a more detailed overview of the convolution arguments.

Parameters
  • in_channels (int) – The number of input channel \(C_{in}\).

  • out_channels (int) – The number of output channel \(C_{out}\).

  • kernel_size (Union[int, tuple]) – The data type is int or a tuple of 2 integers. Specifies the height and width of the 2D convolution window. Single int means the value is for both height and width of the kernel. A tuple of 2 ints means the first value is for the height and the other is for the width of the kernel.

  • stride (int) – Specifies stride for all spatial dimensions with the same value. The value of stride must be greater than or equal to 1 and lower than any one of the height and width of the input. Default: 1.

  • pad_mode (str) – Specifies padding mode. The optional values are “same”, “valid”, “pad”. Default: “same”.

  • padding (int) – Implicit paddings on both sides of the input. Default: 0.

  • dilation (int) – Specifies the dilation rate to use for dilated convolution. If set to be \(k > 1\), there will be \(k - 1\) pixels skipped for each sampling location. Its value must be greater than or equal to 1 and lower than any one of the height and width of the input. Default: 1.

  • group (int) – Splits filter into groups, in_channels and out_channels must be divisible by the number of groups. Default: 1.

  • has_bias (bool) – Specifies whether the layer uses a bias vector. Default: False.

  • weight_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the convolution kernel. It can be a Tensor, a string, an Initializer or a number. When a string is specified, values from ‘TruncatedNormal’, ‘Normal’, ‘Uniform’, ‘HeUniform’ and ‘XavierUniform’ distributions as well as constant ‘One’ and ‘Zero’ distributions are possible. Alias ‘xavier_uniform’, ‘he_uniform’, ‘ones’ and ‘zeros’ are acceptable. Uppercase and lowercase are both acceptable. Refer to the values of Initializer for more details. Default: ‘normal’.

  • bias_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the bias vector. Possible Initializer and string are the same as ‘weight_init’. Refer to the values of Initializer for more details. Default: ‘zeros’.

  • has_bn (bool) – Specifies whether to use batch normalization. Default: False.

  • momentum (float) – Momentum for the moving average. The momentum value must be in the range [0, 1]. Default: 0.9.

  • eps (float) – Term added to the denominator to improve numerical stability. Should be greater than 0. Default: 1e-5.

  • activation (Cell) – Specifies activation type. The optional values are as following: ‘softmax’, ‘logsoftmax’, ‘relu’, ‘relu6’, ‘tanh’, ‘gelu’, ‘sigmoid’, ‘prelu’, ‘leakyrelu’, ‘hswish’, ‘hsigmoid’. Default: None.

  • alpha (float) – Slope of the activation function at x < 0. Default: 0.2.

  • after_fake (bool) – Determines whether there must be a fake quantization operation after Conv2dBnAct. Default: True.

Inputs:
  • input (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).

Outputs:

Tensor of shape \((N, C_{out}, H_{out}, W_{out})\).

Examples

>>> net = nn.Conv2dBnAct(120, 240, 4, has_bn=True, activation='relu')
>>> input = Tensor(np.ones([1, 120, 1024, 640]), mindspore.float32)
>>> net(input).shape
(1, 240, 1024, 640)
class mindspore.nn.Conv2dBnFoldQuant(in_channels, out_channels, kernel_size, stride=1, pad_mode='same', padding=0, dilation=1, group=1, eps=1e-05, momentum=0.997, has_bias=False, weight_init='normal', bias_init='zeros', beta_init='zeros', gamma_init='ones', mean_init='zeros', var_init='ones', fake=True, per_channel=False, num_bits=8, symmetric=False, narrow_range=False, quant_delay=0, freeze_bn=100000)[source]

2D convolution with a folded BatchNorm op.

See Conv2d for a more detailed overview of the convolution arguments.

Parameters
  • in_channels (int) – The number of input channel \(C_{in}\).

  • out_channels (int) – The number of output channel \(C_{out}\).

  • kernel_size (Union[int, tuple]) – Specifies the height and width of the 2D convolution window.

  • stride (int) – Specifies stride for all spatial dimensions with the same value. Default: 1.

  • pad_mode (str) – Specifies padding mode. The optional values are “same”, “valid”, “pad”. Default: “same”.

  • padding (int) – Implicit paddings on both sides of the input. Default: 0.

  • eps (float) – The eps parameter of the BatchNorm op. Default: 1e-5.

  • momentum (float) – The momentum parameter of the BatchNorm op. Default: 0.997.

  • has_bias (bool) – Specifies whether the layer uses a bias vector. Default: False.

  • weight_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the convolution kernel. Default: ‘normal’.

  • bias_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the bias vector. Default: ‘zeros’.

  • beta_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the beta vector. Default: ‘zeros’.

  • gamma_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the gamma vector. Default: ‘ones’.

  • mean_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the mean vector. Default: ‘zeros’.

  • var_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the variance vector. Default: ‘ones’.

  • fake (bool) – Whether Conv2dBnFoldQuant Cell adds FakeQuantWithMinMax op. Default: True.

  • per_channel (bool) – A FakeQuantWithMinMax parameter; quantization granularity based on layer or on channel. Default: False.

  • num_bits (int) – The bit number of quantization, supporting 4 and 8 bits. Default: 8.

  • symmetric (bool) – The quantization algorithm is symmetric or not. Default: False.

  • narrow_range (bool) – The quantization algorithm uses narrow range or not. Default: False.

  • quant_delay (int) – Quantization delay parameter, in terms of global training steps. Default: 0.

  • freeze_bn (int) – The BatchNorm op is frozen for quantization after this number of global steps. Default: 100000.

Inputs:
  • x (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).

Outputs:

Tensor of shape \((N, C_{out}, H_{out}, W_{out})\).

Examples

>>> conv2d_bn = nn.Conv2dBnFoldQuant(1, 6, kernel_size=(2, 2), stride=(1, 1), pad_mode="valid")
>>> x = Tensor(np.random.randint(-2, 2, (2, 1, 1, 3)), mindspore.float32)
>>> y = conv2d_bn(x)
class mindspore.nn.Conv2dBnWithoutFoldQuant(in_channels, out_channels, kernel_size, stride=1, pad_mode='same', padding=0, dilation=1, group=1, has_bias=False, eps=1e-05, momentum=0.997, weight_init='normal', bias_init='zeros', per_channel=False, num_bits=8, symmetric=False, narrow_range=False, quant_delay=0)[source]

2D convolution and BatchNorm without fold, with a fake quantization op.

See Conv2d for a more detailed overview of the convolution arguments.

Parameters
  • in_channels (int) – The number of input channel \(C_{in}\).

  • out_channels (int) – The number of output channel \(C_{out}\).

  • kernel_size (Union[int, tuple]) – Specifies the height and width of the 2D convolution window.

  • stride (int) – Specifies stride for all spatial dimensions with the same value. Default: 1.

  • pad_mode (str) – Specifies padding mode. The optional values are “same”, “valid”, “pad”. Default: “same”.

  • padding (int) – Implicit paddings on both sides of the input. Default: 0.

  • dilation (int) – Specifies the dilation rate to use for dilated convolution. Default: 1.

  • group (int) – Splits filter into groups, in_channels and out_channels must be divisible by the number of groups. Default: 1.

  • has_bias (bool) – Specifies whether the layer uses a bias vector. Default: False.

  • eps (float) – The eps parameter of the BatchNorm op. Default: 1e-5.

  • momentum (float) – The momentum parameter of the BatchNorm op. Default: 0.997.

  • weight_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the convolution kernel. Default: ‘normal’.

  • bias_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the bias vector. Default: ‘zeros’.

  • per_channel (bool) – A FakeQuantWithMinMax parameter; quantization granularity based on layer or on channel. Default: False.

  • num_bits (int) – The bit number of quantization, supporting 4 and 8 bits. Default: 8.

  • symmetric (bool) – The quantization algorithm is symmetric or not. Default: False.

  • narrow_range (bool) – The quantization algorithm uses narrow range or not. Default: False.

  • quant_delay (int) – Quantization delay parameter, in terms of global training steps. Default: 0.

Inputs:
  • x (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).

Outputs:

Tensor of shape \((N, C_{out}, H_{out}, W_{out})\).

Examples

>>> conv2d_quant = nn.Conv2dBnWithoutFoldQuant(1, 6, kernel_size=(2, 2), stride=(1, 1), pad_mode="valid")
>>> x = Tensor(np.random.randint(-2, 2, (2, 1, 1, 3)), mindspore.float32)
>>> y = conv2d_quant(x)
class mindspore.nn.Conv2dQuant(in_channels, out_channels, kernel_size, stride=1, pad_mode='same', padding=0, dilation=1, group=1, has_bias=False, weight_init='normal', bias_init='zeros', per_channel=False, num_bits=8, symmetric=False, narrow_range=False, quant_delay=0)[source]

2D convolution with fake quant op layer.

This part is a more detailed overview of Conv2d op.

Parameters
  • in_channels (int) – The number of input channel \(C_{in}\).

  • out_channels (int) – The number of output channel \(C_{out}\).

  • kernel_size (Union[int, tuple]) – Specifies the height and width of the 2D convolution window.

  • stride (int) – Specifies stride for all spatial dimensions with the same value. Default: 1.

  • pad_mode (str) – Specifies padding mode. The optional values are “same”, “valid”, “pad”. Default: “same”.

  • padding (int) – Implicit paddings on both sides of the input. Default: 0.

  • dilation (int) – Specifies the dilation rate to use for dilated convolution. Default: 1.

  • group (int) – Splits filter into groups; in_channels and out_channels must be divisible by the number of groups. Default: 1.

  • has_bias (bool) – Specifies whether the layer uses a bias vector. Default: False.

  • weight_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the convolution kernel. Default: ‘normal’.

  • bias_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the bias vector. Default: ‘zeros’.

  • per_channel (bool) – Quantization granularity: per channel if True, per layer otherwise (passed to FakeQuantWithMinMax). Default: False.

  • num_bits (int) – The number of bits used for quantization; 4 and 8 bits are supported. Default: 8.

  • symmetric (bool) – Whether the quantization algorithm is symmetric or not. Default: False.

  • narrow_range (bool) – Whether the quantization algorithm uses narrow range or not. Default: False.

  • quant_delay (int) – Quantization delay, in global steps, before fake quantization takes effect. Default: 0.

Inputs:
  • x (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).

Outputs:

Tensor of shape \((N, C_{out}, H_{out}, W_{out})\).

Examples

>>> conv2d_quant = nn.Conv2dQuant(1, 6, kernel_size=(2, 2), stride=(1, 1), pad_mode="valid")
>>> x = Tensor(np.random.randint(-2, 2, (2, 1, 3, 3)), mindspore.float32)
>>> y = conv2d_quant(x)
class mindspore.nn.Conv2dTranspose(in_channels, out_channels, kernel_size, stride=1, pad_mode='same', padding=0, dilation=1, group=1, has_bias=False, weight_init='normal', bias_init='zeros')[source]

2D transposed convolution layer.

Compute a 2D transposed convolution, which is also known as a deconvolution (although it is not an actual deconvolution).

Input is typically of shape \((N, C, H, W)\), where \(N\) is batch size and \(C\) is channel number.

Parameters
  • in_channels (int) – The number of channels in the input space.

  • out_channels (int) – The number of channels in the output space.

  • kernel_size (Union[int, tuple]) – int or a tuple of 2 integers, which specifies the height and width of the 2D convolution window. Single int means the value is for both the height and the width of the kernel. A tuple of 2 ints means the first value is for the height and the other is for the width of the kernel.

  • stride (Union[int, tuple[int]]) – The distance of kernel moving, an int number that represents the height and width of movement are both strides, or a tuple of two int numbers that represent height and width of movement respectively. Its value must be equal to or greater than 1. Default: 1.

  • pad_mode (str) –

    Select the mode of the pad. The optional values are “pad”, “same”, “valid”. Default: “same”.

    • pad: Implicit paddings on both sides of the input.

    • same: Adopts the way of completion.

    • valid: Adopts the way of discarding.

  • padding (Union[int, tuple[int]]) – Implicit paddings on both sides of the input. If padding is one integer, the paddings of top, bottom, left and right are the same, equal to padding. If padding is a tuple with four integers, the paddings of top, bottom, left and right will be equal to padding[0], padding[1], padding[2], and padding[3] accordingly. Default: 0.

  • dilation (Union[int, tuple[int]]) – The data type is int or a tuple of 2 integers. Specifies the dilation rate to use for dilated convolution. If set to be \(k > 1\), there will be \(k - 1\) pixels skipped for each sampling location. Its value must be greater than or equal to 1 and bounded by the height and width of the input. Default: 1.

  • group (int) – Splits filter into groups; in_channels and out_channels must be divisible by the number of groups. Davinci devices do not support group > 1. Default: 1.

  • has_bias (bool) – Specifies whether the layer uses a bias vector. Default: False.

  • weight_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the convolution kernel. It can be a Tensor, a string, an Initializer or a number. When a string is specified, values from ‘TruncatedNormal’, ‘Normal’, ‘Uniform’, ‘HeUniform’ and ‘XavierUniform’ distributions as well as constant ‘One’ and ‘Zero’ distributions are possible. Alias ‘xavier_uniform’, ‘he_uniform’, ‘ones’ and ‘zeros’ are acceptable. Uppercase and lowercase are both acceptable. Refer to the values of Initializer for more details. Default: ‘normal’.

  • bias_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the bias vector. Possible Initializer and string are the same as ‘weight_init’. Refer to the values of Initializer for more details. Default: ‘zeros’.

Inputs:
  • input (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).

Outputs:

Tensor of shape \((N, C_{out}, H_{out}, W_{out})\).

Examples

>>> net = nn.Conv2dTranspose(3, 64, 4, has_bias=False, weight_init='normal')
>>> input = Tensor(np.ones([1, 3, 16, 50]), mindspore.float32)
>>> net(input)
class mindspore.nn.CosineDecayLR(min_lr, max_lr, decay_steps)[source]

Calculates the learning rate based on the cosine decay function.

For the i-th step, the formula of computing decayed_learning_rate[i] is:

\[decayed\_learning\_rate[i] = min\_learning\_rate + 0.5 * (max\_learning\_rate - min\_learning\_rate) * (1 + cos(\frac{current\_step}{decay\_steps}\pi))\]
Parameters
  • min_lr (float) – The minimum value of learning rate.

  • max_lr (float) – The maximum value of learning rate.

  • decay_steps (int) – A value used to calculate decayed learning rate.

Inputs:

Tensor. The current step number.

Returns

Tensor. The learning rate value for the current step.

Examples

>>> min_lr = 0.01
>>> max_lr = 0.1
>>> decay_steps = 4
>>> global_step = Tensor(2, mstype.int32)
>>> cosine_decay_lr = CosineDecayLR(min_lr, max_lr, decay_steps)
>>> cosine_decay_lr(global_step)
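
As a sanity check, the formula above can be evaluated directly with NumPy using the example values (min_lr=0.01, max_lr=0.1, decay_steps=4, current_step=2):

>>> import numpy as np
>>> min_lr, max_lr, decay_steps, current_step = 0.01, 0.1, 4, 2
>>> decayed = min_lr + 0.5 * (max_lr - min_lr) * (1 + np.cos(current_step / decay_steps * np.pi))
>>> # decayed is about 0.055: halfway through the schedule, halfway between max_lr and min_lr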
class mindspore.nn.CosineEmbeddingLoss(margin=0.0, reduction='mean')[source]

Computes the similarity between two tensors using cosine distance.

Given two tensors x1, x2, and a Tensor label y with values 1 or -1:

\[\begin{split}loss(x_1, x_2, y) = \begin{cases} 1-cos(x_1, x_2), & \text{if } y = 1\\ max(0, cos(x_1, x_2)-margin), & \text{if } y = -1\\ \end{cases}\end{split}\]
Parameters
  • margin (float) – Should be in [-1.0, 1.0]. Default 0.0.

  • reduction (str) – Specifies which reduction to be applied to the output. It must be one of “none”, “mean”, and “sum”, meaning no reduction, reduce mean and sum on output, respectively. Default “mean”.

Inputs:
  • input_x1 (Tensor) - Input tensor.

  • input_x2 (Tensor) - Its shape and data type must be the same as input_x1’s shape and data type.

  • y (Tensor) - Contains value 1 or -1. Suppose the shape of input_x1 is \((x_1, x_2, x_3,..., x_R)\), then the shape of target must be \((x_1, x_3, x_4, ..., x_R)\).

Outputs:
  • loss (Tensor) - If reduction is “none”, its shape is the same as y’s shape, otherwise a scalar value will be returned.

Examples

>>> x1 = Tensor(np.array([[0.3, 0.8], [0.4, 0.3]]), mindspore.float32)
>>> x2 = Tensor(np.array([[0.4, 1.2], [-0.4, -0.9]]), mindspore.float32)
>>> y = Tensor(np.array([1,-1]), mindspore.int32)
>>> cosine_embedding_loss = nn.CosineEmbeddingLoss()
>>> cosine_embedding_loss(x1, x2, y)
[0.0003426671]
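
The reduced value above can be reproduced by hand with NumPy (a sketch of the formula, not of the operator itself):

>>> import numpy as np
>>> a1, a2 = np.array([0.3, 0.8]), np.array([0.4, 1.2])    # label y = 1
>>> b1, b2 = np.array([0.4, 0.3]), np.array([-0.4, -0.9])  # label y = -1
>>> cos = lambda u, v: u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
>>> (1 - cos(a1, a2) + max(0, cos(b1, b2) - 0.0)) / 2  # mean loss ≈ 0.00034, matching the output above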
class mindspore.nn.Dense(in_channels, out_channels, weight_init='normal', bias_init='zeros', has_bias=True, activation=None)[source]

The fully connected layer.

Applies a dense-connected layer to the input. This layer implements the operation as:

\[\text{outputs} = \text{activation}(\text{inputs} * \text{kernel} + \text{bias}),\]

where \(\text{activation}\) is the activation function passed as the activation argument (if passed in), \(\text{kernel}\) is a weight matrix with the same data type as the inputs created by the layer, and \(\text{bias}\) is a bias vector with the same data type as the inputs created by the layer (only if has_bias is True).

Parameters
  • in_channels (int) – The number of channels in the input space.

  • out_channels (int) – The number of channels in the output space.

  • weight_init (Union[Tensor, str, Initializer, numbers.Number]) – The trainable weight_init parameter. The dtype is the same as the input x. The values of str refer to the function initializer. Default: ‘normal’.

  • bias_init (Union[Tensor, str, Initializer, numbers.Number]) – The trainable bias_init parameter. The dtype is the same as the input x. The values of str refer to the function initializer. Default: ‘zeros’.

  • has_bias (bool) – Specifies whether the layer uses a bias vector. Default: True.

  • activation (str) – The activation function applied to the output of the fully connected layer, e.g. ‘relu’. Default: None.

Raises

ValueError – If weight_init or bias_init shape is incorrect.

Inputs:
  • input (Tensor) - Tensor of shape \((N, in\_channels)\).

Outputs:

Tensor of shape \((N, out\_channels)\).

Examples

>>> net = nn.Dense(3, 4)
>>> input = Tensor(np.random.randint(0, 255, [2, 3]), mindspore.float32)
>>> net(input)
[[ 2.5246444   2.2738023   0.5711005  -3.9399147 ]
 [ 1.0739875   4.0155234   0.94188046 -5.459526  ]]
class mindspore.nn.DenseBnAct(in_channels, out_channels, weight_init='normal', bias_init='zeros', has_bias=True, has_bn=False, activation=None, after_fake=True)[source]

A combination of Dense, Batchnorm, and the activation layer.

This part is a more detailed overview of Dense op.

Parameters
  • in_channels (int) – The number of channels in the input space.

  • out_channels (int) – The number of channels in the output space.

  • weight_init (Union[Tensor, str, Initializer, numbers.Number]) – The trainable weight_init parameter. The dtype is same as input x. The values of str refer to the function initializer. Default: ‘normal’.

  • bias_init (Union[Tensor, str, Initializer, numbers.Number]) – The trainable bias_init parameter. The dtype is same as input x. The values of str refer to the function initializer. Default: ‘zeros’.

  • has_bias (bool) – Specifies whether the layer uses a bias vector. Default: True.

  • has_bn (bool) – Specifies whether to use batchnorm. Default: False.

  • activation (str) – Specifies the activation type applied to the output of the layer. The optional values are ‘Softmax’, ‘LogSoftmax’, ‘ReLU’, ‘ReLU6’, ‘Tanh’, ‘GELU’, ‘Sigmoid’, ‘PReLU’, ‘LeakyReLU’, ‘h-Swish’, and ‘h-Sigmoid’. Default: None.

  • after_fake (bool) – Determines whether there must be a fake quantization operation after DenseBnAct. Default: True.

Inputs:
  • input (Tensor) - Tensor of shape \((N, in\_channels)\).

Outputs:

Tensor of shape \((N, out\_channels)\).

Examples

>>> net = nn.DenseBnAct(3, 4)
>>> input = Tensor(np.random.randint(0, 255, [2, 3]), mindspore.float32)
>>> net(input)
class mindspore.nn.DenseQuant(in_channels, out_channels, weight_init='normal', bias_init='zeros', has_bias=True, activation=None, per_channel=False, num_bits=8, symmetric=False, narrow_range=False, quant_delay=0)[source]

The fully connected layer with fake quant op.

This part is a more detailed overview of Dense op.

Parameters
  • in_channels (int) – The dimension of the input space.

  • out_channels (int) – The dimension of the output space.

  • weight_init (Union[Tensor, str, Initializer, numbers.Number]) – The trainable weight_init parameter. The dtype is same as input x. The values of str refer to the function initializer. Default: ‘normal’.

  • bias_init (Union[Tensor, str, Initializer, numbers.Number]) – The trainable bias_init parameter. The dtype is same as input x. The values of str refer to the function initializer. Default: ‘zeros’.

  • has_bias (bool) – Specifies whether the layer uses a bias vector. Default: True.

  • activation (str) – The activation function applied to the output of the layer, e.g. ‘relu’. Default: None.

  • per_channel (bool) – Quantization granularity: per channel if True, per layer otherwise (passed to FakeQuantWithMinMax). Default: False.

  • num_bits (int) – The number of bits used for quantization; 4 and 8 bits are supported. Default: 8.

  • symmetric (bool) – Whether the quantization algorithm is symmetric or not. Default: False.

  • narrow_range (bool) – Whether the quantization algorithm uses narrow range or not. Default: False.

  • quant_delay (int) – Quantization delay, in global steps, before fake quantization takes effect. Default: 0.

Inputs:
  • x (Tensor) - Tensor of shape \((N, in\_channels)\).

Outputs:

Tensor of shape \((N, out\_channels)\).

Examples

>>> dense_quant = nn.DenseQuant(3, 6)
>>> input_x = Tensor(np.random.randint(-2, 2, (2, 3)), mindspore.float32)
>>> result = dense_quant(input_x)
construct(x)[source]

Use operators to construct the Dense layer.

extend_repr()[source]

A pretty print for Dense layer.

class mindspore.nn.DistributedGradReducer(parameters, mean=True, degree=None)[source]

A distributed gradient reducer.

Constructs a gradient reducer Cell, which applies communication and average operations on single-process gradient values.

Parameters
  • parameters (list) – the parameters to be updated.

  • mean (bool) – When mean is true, the mean coefficient (degree) would be applied on gradients. Default: True.

  • degree (int) – The mean coefficient. Usually it equals to device number. Default: None.

Raises

ValueError – If degree is not an int or is less than 0.

Examples

>>> from mindspore.communication import init, get_group_size
>>> from mindspore.ops import composite as C
>>> from mindspore.ops import operations as P
>>> from mindspore.ops import functional as F
>>> from mindspore import context
>>> from mindspore.context import ParallelMode
>>> from mindspore import nn
>>> from mindspore import ParameterTuple
>>> from mindspore import Tensor
>>> import numpy as np
>>> import os
>>>
>>> device_id = int(os.environ["DEVICE_ID"])
>>> context.set_context(mode=context.GRAPH_MODE, device_target="Ascend", save_graphs=True,
>>>                     device_id=int(device_id))
>>> init()
>>> context.reset_auto_parallel_context()
>>> context.set_auto_parallel_context(parallel_mode=ParallelMode.DATA_PARALLEL)
>>>
>>>
>>> class TrainingWrapper(nn.Cell):
>>>     def __init__(self, network, optimizer, sens=1.0):
>>>         super(TrainingWrapper, self).__init__(auto_prefix=False)
>>>         self.network = network
>>>         self.network.add_flags(defer_inline=True)
>>>         self.weights = optimizer.parameters
>>>         self.optimizer = optimizer
>>>         self.grad = C.GradOperation(get_by_list=True, sens_param=True)
>>>         self.sens = sens
>>>         self.reducer_flag = False
>>>         self.grad_reducer = None
>>>         self.parallel_mode = context.get_auto_parallel_context("parallel_mode")
>>>         if self.parallel_mode in [ParallelMode.DATA_PARALLEL,
>>>                                            ParallelMode.HYBRID_PARALLEL]:
>>>             self.reducer_flag = True
>>>         if self.reducer_flag:
>>>             mean = context.get_auto_parallel_context("gradients_mean")
>>>             degree = get_group_size()
>>>             self.grad_reducer = nn.DistributedGradReducer(optimizer.parameters, mean, degree)
>>>
>>>     def construct(self, *args):
>>>         weights = self.weights
>>>         loss = self.network(*args)
>>>         sens = P.Fill()(P.DType()(loss), P.Shape()(loss), self.sens)
>>>         grads = self.grad(self.network, weights)(*args, sens)
>>>         if self.reducer_flag:
>>>             # apply grad reducer on grads
>>>             grads = self.grad_reducer(grads)
>>>         return F.depend(loss, self.optimizer(grads))
>>>
>>> network = Net()
>>> optimizer = nn.Momentum(network.trainable_params(), learning_rate=0.1, momentum=0.9)
>>> train_cell = TrainingWrapper(network, optimizer)
>>> inputs = Tensor(np.ones([16, 16]).astype(np.float32))
>>> label = Tensor(np.zeros([16, 16]).astype(np.float32))
>>> grads = train_cell(inputs, label)
construct(grads)[source]

Under certain circumstances, the data precision of grads could be mixed with float16 and float32. Thus, the result of AllReduce is unreliable. To solve the problem, grads must be cast to float32 before AllReduce, and cast back after the operation.

Parameters

grads (Union[Tensor, tuple[Tensor]]) – The gradient tensor or tuple before operation.

Returns

new_grads (Union[Tensor, tuple[Tensor]]), the gradient tensor or tuple after operation.

class mindspore.nn.Dropout(keep_prob=0.5, dtype=mindspore.float32)[source]

Dropout layer for the input.

Randomly sets some elements of the input tensor to zero with probability \(1 - keep\_prob\) during training, using samples from a Bernoulli distribution.

Note

Each channel will be zeroed out independently on every construct call.

The outputs are scaled by a factor of \(\frac{1}{keep\_prob}\) during training so that the output layer remains at a similar scale. During inference, this layer returns the same tensor as the input.

This technique is proposed in paper Dropout: A Simple Way to Prevent Neural Networks from Overfitting and proved to be effective to reduce over-fitting and prevents neurons from co-adaptation. See more details in Improving neural networks by preventing co-adaptation of feature detectors.

Parameters
  • keep_prob (float) – The keep rate, greater than 0 and less than or equal to 1. E.g. keep_prob=0.9 drops out 10% of input units. Default: 0.5.

  • dtype (mindspore.dtype) – Data type of input. Default: mindspore.float32.

Raises

ValueError – If keep_prob is not in range (0, 1].

Inputs:
  • input (Tensor) - The input tensor.

Outputs:

Tensor, output tensor with the same shape as the input.

Examples

>>> x = Tensor(np.ones([2, 2, 3]), mindspore.float32)
>>> net = nn.Dropout(keep_prob=0.8)
>>> net(x)
[[[1.0, 1.0, 1.0],
  [1.0, 1.0, 1.0]],
 [[1.0, 1.0, 1.0],
  [1.0, 1.0, 1.0]]]
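
A plain-NumPy sketch of the inverted-dropout behaviour described above (an illustration of the masking and scaling, not the kernel MindSpore uses):

>>> import numpy as np
>>> keep_prob = 0.8
>>> x = np.ones([2, 2, 3], np.float32)
>>> mask = np.random.binomial(1, keep_prob, x.shape)  # Bernoulli(keep_prob) per element
>>> y = x * mask / keep_prob  # kept elements are scaled by 1/keep_prob, dropped ones are zero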
class mindspore.nn.DynamicLossScaleUpdateCell(loss_scale_value, scale_factor, scale_window)[source]

Dynamic Loss scale update cell.

For loss scaling training, the initial loss scaling value is set to loss_scale_value. In each training step, the loss scaling value is divided by scale_factor when an overflow occurs, and multiplied by scale_factor when there has been no overflow for scale_window consecutive steps. This cell is used for graph mode training, in which all logic is executed on the device side (the other training mode is the normal, non-sink mode, in which some logic is executed on the host).

Parameters
  • loss_scale_value (float) – Initializes loss scale.

  • scale_factor (int) – Coefficient of increase and decrease.

  • scale_window (int) – Maximum continuous training steps that do not have overflow.

Inputs:
  • inputs (Tensor) - Tensor of shape \((N, \ldots)\).

  • label (Tensor) - Tensor of shape \((N, \ldots)\).

Outputs:

Tensor, a scalar Tensor with shape \(()\).

Examples

>>> net_with_loss = Net()
>>> optimizer = nn.Momentum(net_with_loss.trainable_params(), learning_rate=0.1, momentum=0.9)
>>> manager = nn.DynamicLossScaleUpdateCell(loss_scale_value=2**12, scale_factor=2, scale_window=1000)
>>> train_network = nn.TrainOneStepWithLossScaleCell(net_with_loss, optimizer, scale_update_cell=manager)
>>> train_network.set_train()
>>>
>>> inputs = Tensor(np.ones([16, 16]).astype(np.float32))
>>> label = Tensor(np.zeros([16, 16]).astype(np.float32))
>>> scaling_sens = Tensor(np.full((1), np.finfo(np.float32).max), dtype=mindspore.float32)
>>> output = train_network(inputs, label, scaling_sens)
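
The update rule described above can be sketched in plain Python (a simplified host-side model for illustration; the actual cell keeps its counter and scale as device-side Parameters):

>>> def update_scale(loss_scale, overflow, good_steps, scale_factor=2, scale_window=1000):
>>>     if overflow:
>>>         return loss_scale / scale_factor, 0   # shrink on overflow, reset the counter
>>>     good_steps += 1
>>>     if good_steps >= scale_window:
>>>         return loss_scale * scale_factor, 0   # grow after scale_window clean steps
>>>     return loss_scale, good_steps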
class mindspore.nn.ELU(alpha=1.0)[source]

Exponential Linear Unit activation function.

Applies the exponential linear unit function element-wise. The activation function is defined as:

\[E_{i} = \begin{cases} x_i, &\text{if } x_i \geq 0; \cr \text{alpha} * (\exp(x_i) - 1), &\text{otherwise.} \end{cases}\]
Parameters

alpha (float) – The coefficient of negative factor whose type is float. Default: 1.0.

Inputs:
  • input_data (Tensor) - The input of ELU.

Outputs:

Tensor, with the same type and shape as the input_data.

Examples

>>> input_x = Tensor(np.array([-1, -2, 0, 2, 1]), mindspore.float32)
>>> elu = nn.ELU()
>>> elu(input_x)
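
The definition above is easy to check with NumPy (alpha = 1.0, same input as the example):

>>> import numpy as np
>>> x = np.array([-1, -2, 0, 2, 1], np.float32)
>>> np.where(x >= 0, x, 1.0 * (np.exp(x) - 1))  # [-0.632, -0.865, 0., 2., 1.]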
class mindspore.nn.Embedding(vocab_size, embedding_size, use_one_hot=False, embedding_table='normal', dtype=mindspore.float32)[source]

A simple lookup table that stores embeddings of a fixed dictionary and size.

This module is often used to store word embeddings and retrieve them using indices. The input to the module is a list of indices, and the output is the corresponding word embeddings.

Note

When ‘use_one_hot’ is set to True, the type of the input must be mindspore.int32.

Parameters
  • vocab_size (int) – Size of the dictionary of embeddings.

  • embedding_size (int) – The size of each embedding vector.

  • use_one_hot (bool) – Specifies whether to apply one_hot encoding form. Default: False.

  • embedding_table (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the embedding_table. Refer to class initializer for the values of string when a string is specified. Default: ‘normal’.

  • dtype (mindspore.dtype) – Data type of input. Default: mindspore.float32.

Inputs:
  • input (Tensor) - Tensor of shape \((\text{batch_size}, \text{input_length})\). The elements of the Tensor must be integers and not larger than vocab_size; otherwise, the corresponding embedding vector will be zero.

Outputs:

Tensor of shape \((\text{batch_size}, \text{input_length}, \text{embedding_size})\).

Examples

>>> net = nn.Embedding(20000, 768, True)
>>> input_data = Tensor(np.ones([8, 128]), mindspore.int32)
>>>
>>> # Maps the input word IDs to word embedding.
>>> output = net(input_data)
>>> output.shape
(8, 128, 768)
class mindspore.nn.EmbeddingLookup(vocab_size, embedding_size, param_init='normal', target='CPU', slice_mode='batch_slice', manual_shapes=None)[source]

Returns a slice of input tensor based on the specified indices.

Note

When ‘target’ is set to ‘CPU’, this module will use P.EmbeddingLookup().add_prim_attr(‘primitive_target’, ‘CPU’), which specifies ‘offset = 0’, to look up the table. When ‘target’ is set to ‘DEVICE’, this module will use P.GatherV2(), which specifies ‘axis = 0’, to look up the table. In field slice mode, manual_shapes must be given. It is a tuple, where the element vocab[i] is the number of rows for the i-th part.

Parameters
  • vocab_size (int) – Size of the dictionary of embeddings.

  • embedding_size (int) – The size of each embedding vector.

  • param_init (str) – The initialize way of embedding table. Default: ‘normal’.

  • target (str) – Specifies the target where the op is executed. The value must be in [‘DEVICE’, ‘CPU’]. Default: ‘CPU’.

  • slice_mode (str) – The slicing way in semi_auto_parallel/auto_parallel. The value must be obtained through nn.EmbeddingLookup. Default: nn.EmbeddingLookup.BATCH_SLICE.

  • manual_shapes (tuple) – The accompaniment array in field slice mode.

Inputs:
  • input_indices (Tensor) - The shape of tensor is \((y_1, y_2, ..., y_S)\). Specifies the indices of elements of the original Tensor. Values can be out of range of embedding_table, and the exceeding part will be filled with 0 in the output. Input_indices must only be a 2d tensor in this interface.

Outputs:

Tensor, the shape of tensor is \((z_1, z_2, ..., z_N)\).

Examples

>>> input_indices = Tensor(np.array([[1, 0], [3, 2]]), mindspore.int32)
>>> out = nn.EmbeddingLookup(4,2)(input_indices)
class mindspore.nn.ExponentialDecayLR(learning_rate, decay_rate, decay_steps, is_stair=False)[source]

Calculates the learning rate based on the exponential decay function.

For the i-th step, the formula of computing decayed_learning_rate[i] is:

\[decayed\_learning\_rate[i] = learning\_rate * decay\_rate^{p}\]

Where:

\[p = \frac{current\_step}{decay\_steps}\]

If is_stair is True, the formula is:

\[p = floor(\frac{current\_step}{decay\_steps})\]
Parameters
  • learning_rate (float) – The initial value of learning rate.

  • decay_rate (float) – The decay rate.

  • decay_steps (int) – A value used to calculate decayed learning rate.

  • is_stair (bool) – If true, the learning rate is decayed once every decay_steps steps. Default: False.

Inputs:

Tensor. The current step number.

Returns

Tensor. The learning rate value for the current step.

Examples

>>> learning_rate = 0.1
>>> decay_rate = 0.9
>>> decay_steps = 4
>>> global_step = Tensor(2, mstype.int32)
>>> exponential_decay_lr = ExponentialDecayLR(learning_rate, decay_rate, decay_steps)
>>> exponential_decay_lr(global_step)
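
For these values the formula gives 0.1 * 0.9^(2/4), which can be checked by hand:

>>> 0.1 * 0.9 ** (2 / 4)  # ≈ 0.0949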
class mindspore.nn.F1[source]

Calculates the F1 score. F1 is a special case of Fbeta when beta is 1. Refer to class Fbeta for more details.

\[F_1=\frac{2\cdot true\_positive}{2\cdot true\_positive + false\_negative + false\_positive}\]

Examples

>>> x = Tensor(np.array([[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]]))
>>> y = Tensor(np.array([1, 0, 1]))
>>> metric = nn.F1()
>>> metric.update(x, y)
>>> f1 = metric.eval()
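
For the example above, the argmax predictions are [1, 0, 0] against labels [1, 0, 1], so class 1 has one true positive, one false negative, and no false positives:

>>> 2 * 1 / (2 * 1 + 1 + 0)  # per-class F1 for class 1
0.6666666666666666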
class mindspore.nn.FTRL(params, initial_accum=0.1, learning_rate=0.001, lr_power=- 0.5, l1=0.0, l2=0.0, use_locking=False, loss_scale=1.0, weight_decay=0.0)[source]

Implements the FTRL algorithm with the ApplyFtrl operator.

FTRL is an online convex optimization algorithm that adaptively chooses its regularization function based on the loss functions. Refer to the paper Adaptive Bound Optimization for Online Convex Optimization. Refer to the paper Ad Click Prediction: a View from the Trenches for engineering details.

Note

When separating parameter groups, the weight decay in each group will be applied on the parameters if the weight decay is positive. When not separating parameter groups, the weight_decay in the API will be applied on all of the parameters.

To improve parameter groups performance, the customized order of parameters can be supported.

The sparse strategy is applied when the SparseGatherV2 operator is used in the forward network. The sparse feature is under continuous development. The sparse behavior is currently performed on the CPU.

Parameters
  • params (Union[list[Parameter], list[dict]]) –

    When the params is a list of Parameter which will be updated, the element in params must be class Parameter. When the params is a list of dict, the “params”, “lr”, “weight_decay” and “order_params” are the keys can be parsed.

    • params: Required. The value must be a list of Parameter.

    • lr: Using different learning rate by separating parameters is currently not supported.

    • weight_decay: Optional. If “weight_decay” in the keys, the value of corresponding weight decay will be used. If not, the weight_decay in the API will be used.

    • order_params: Optional. If “order_params” in the keys, the value must be the order of parameters and the order will be followed in optimizer. There are no other keys in the dict and the parameters which in the value of ‘order_params’ must be in one of group parameters.

  • initial_accum (float) – The starting value for accumulators, must be zero or positive values. Default: 0.1.

  • learning_rate (float) – The learning rate value, must be zero or positive, dynamic learning rate is currently not supported. Default: 0.001.

  • lr_power (float) – Learning rate power controls how the learning rate decreases during training, must be less than or equal to zero. Use fixed learning rate if lr_power is zero. Default: -0.5.

  • l1 (float) – l1 regularization strength, must be greater than or equal to zero. Default: 0.0.

  • l2 (float) – l2 regularization strength, must be greater than or equal to zero. Default: 0.0.

  • use_locking (bool) – If true, use locks for updating operation. Default: False.

  • loss_scale (float) – Value for the loss scale. It must be equal to or greater than 1.0. Default: 1.0.

  • weight_decay (float) – Weight decay value to multiply weight, must be zero or positive value. Default: 0.0.

Inputs:
  • grads (tuple[Tensor]) - The gradients of params in the optimizer, the shape is the same as the params in optimizer.

Outputs:

tuple[Parameter], the updated parameters, the shape is the same as params.

Examples

>>> net = Net()
>>> #1) All parameters use the same learning rate and weight decay
>>> optim = nn.FTRL(params=net.trainable_params())
>>>
>>> #2) Use parameter groups and set different values
>>> conv_params = list(filter(lambda x: 'conv' in x.name, net.trainable_params()))
>>> no_conv_params = list(filter(lambda x: 'conv' not in x.name, net.trainable_params()))
>>> group_params = [{'params': conv_params, 'weight_decay': 0.01},
>>>                 {'params': no_conv_params},
>>>                 {'order_params': net.trainable_params()}]
>>> optim = nn.FTRL(group_params, learning_rate=0.1, weight_decay=0.0)
>>> # The conv_params's parameters will use weight decay of 0.01.
>>> # The no_conv_params's parameters will use default weight decay of 0.0.
>>> # The final parameters order in which the optimizer will be followed is the value of 'order_params'.
>>>
>>> loss = nn.SoftmaxCrossEntropyWithLogits()
>>> model = Model(net, loss_fn=loss, optimizer=optim)
class mindspore.nn.FakeQuantWithMinMax(min_init=- 6, max_init=6, ema=False, ema_decay=0.999, per_channel=False, channel_axis=1, num_channels=1, num_bits=8, symmetric=False, narrow_range=False, quant_delay=0)[source]

Quantization aware op. This OP provides the fake quantization observer function on data with min and max.

Parameters
  • min_init (int, float) – The initialized min value. Default: -6.

  • max_init (int, float) – The initialized max value. Default: 6.

  • ema (bool) – Whether the Exponential Moving Average algorithm updates min and max. Default: False.

  • ema_decay (float) – Exponential Moving Average algorithm parameter. Default: 0.999.

  • per_channel (bool) – Quantization granularity based on layer or on channel. Default: False.

  • channel_axis (int) – Quantization by channel axis. Default: 1.

  • num_channels (int) – Declares the channel size for the min and max parameters. Default: 1.

  • num_bits (int) – The number of bits used for quantization; 4 and 8 bits are supported. Default: 8.

  • symmetric (bool) – Whether the quantization algorithm is symmetric or not. Default: False.

  • narrow_range (bool) – Whether the quantization algorithm uses narrow range or not. Default: False.

  • quant_delay (int) – Quantization delay, in global steps, before fake quantization takes effect. Default: 0.

Inputs:
  • x (Tensor) - The input of FakeQuantWithMinMax.

Outputs:

Tensor, with the same type and shape as the x.

Examples

>>> fake_quant = nn.FakeQuantWithMinMax()
>>> input_x = Tensor(np.array([[1, 2, 1], [-2, 0, -1]]), mindspore.float32)
>>> result = fake_quant(input_x)
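
A rough NumPy sketch of what the fake quantization observer computes (a simplified asymmetric 8-bit case with the default min/max of -6 and 6; the operator additionally handles EMA updates, per-channel statistics, and other corner cases):

>>> import numpy as np
>>> x = np.array([[1, 2, 1], [-2, 0, -1]], np.float32)
>>> mn, mx, bits = -6.0, 6.0, 8
>>> scale = (mx - mn) / (2 ** bits - 1)                      # quantization step size
>>> q = np.clip(np.round((x - mn) / scale), 0, 2 ** bits - 1)
>>> fake_quantized = q * scale + mn                          # back to float, now on the quantization grid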
class mindspore.nn.Fbeta(beta)[source]

Calculates the fbeta score.

Fbeta score is a weighted mean of precision and recall.

\[F_\beta=\frac{(1+\beta^2) \cdot true\_positive} {(1+\beta^2) \cdot true\_positive +\beta^2 \cdot false\_negative + false\_positive}\]
Parameters

beta (Union[float, int]) – The weight of precision.

Examples

>>> x = Tensor(np.array([[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]]))
>>> y = Tensor(np.array([1, 0, 1]))
>>> metric = nn.Fbeta(1)
>>> metric.update(x, y)
>>> fbeta = metric.eval()
clear()[source]

Clears the internal evaluation result.

eval(average=False)[source]

Computes the fbeta.

Parameters

average (bool) – Whether to calculate the average fbeta. Default value is False.

Returns

Float, computed result.

update(*inputs)[source]

Updates the internal evaluation result y_pred and y.

Parameters

inputs – Input y_pred and y. y_pred and y are Tensor, list or numpy.ndarray. y_pred is in most cases (not strictly) a list of floating numbers in range \([0, 1]\) and the shape is \((N, C)\), where \(N\) is the number of cases and \(C\) is the number of categories. y contains values of integers. The shape is \((N, C)\) if one-hot encoding is used. Shape can also be \((N,)\) if category index is used.

class mindspore.nn.FixedLossScaleUpdateCell(loss_scale_value)[source]

Static loss scale update cell; the loss scaling value will not be updated.

For usage, refer to DynamicLossScaleUpdateCell.

Parameters

loss_scale_value (float) – Initializes loss scale.

Examples

>>> net_with_loss = Net()
>>> optimizer = nn.Momentum(net_with_loss.trainable_params(), learning_rate=0.1, momentum=0.9)
>>> manager = nn.FixedLossScaleUpdateCell(loss_scale_value=2**12)
>>> train_network = nn.TrainOneStepWithLossScaleCell(net_with_loss, optimizer, scale_update_cell=manager)
>>> train_network.set_train()
>>>
>>> inputs = Tensor(np.ones([16, 16]).astype(np.float32))
>>> label = Tensor(np.zeros([16, 16]).astype(np.float32))
>>> scaling_sens = Tensor(np.full((1), np.finfo(np.float32).max), dtype=mindspore.float32)
>>> output = train_network(inputs, label, scaling_sens)
class mindspore.nn.Flatten[source]

Flatten layer for the input.

Flattens a tensor without changing the batch size on the 0-th axis.

Inputs:
  • input (Tensor) - Tensor of shape \((N, \ldots)\) to be flattened.

Outputs:

Tensor, the shape of the output tensor is \((N, X)\), where \(X\) is the product of the remaining dimensions.

Examples

>>> net = nn.Flatten()
>>> input = Tensor(np.array([[[1.2, 1.2], [2.1, 2.1]], [[2.2, 2.2], [3.2, 3.2]]]), mindspore.float32)
>>> input.shape
(2, 2, 2)
>>> net(input)
[[1.2 1.2 2.1 2.1]
 [2.2 2.2 3.2 3.2]]
class mindspore.nn.GELU[source]

Gaussian error linear unit activation function.

Applies GELU function to each element of the input. The input is a Tensor with any valid shape.

GELU is defined as: \(GELU(x_i) = x_i*P(X < x_i)\), where \(P\) is the cumulative distribution function of standard Gaussian distribution and \(x_i\) is the element of the input.

Inputs:
  • input_data (Tensor) - The input of GELU.

Outputs:

Tensor, with the same type and shape as the input_data.

Examples

>>> input_x = Tensor(np.array([[-1.0, 4.0, -8.0], [2.0, -5.0, 9.0]]), mindspore.float32)
>>> gelu = nn.GELU()
>>> gelu(input_x)
[[-1.5880802e-01  3.9999299e+00 -3.1077917e-21]
 [ 1.9545976e+00 -2.2918017e-07  9.0000000e+00]]
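
Since \(GELU(x_i) = x_i * P(X < x_i)\) with a standard Gaussian CDF, the output above can be cross-checked with SciPy (assuming scipy is available):

>>> import numpy as np
>>> from scipy.stats import norm
>>> x = np.array([[-1.0, 4.0, -8.0], [2.0, -5.0, 9.0]])
>>> x * norm.cdf(x)  # matches the nn.GELU output up to float32 precision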
class mindspore.nn.GetNextSingleOp(dataset_types, dataset_shapes, queue_name)[source]

Cell to run the GetNext op for fetching the next data from a dataset queue.

Parameters
  • dataset_types (list[mindspore.dtype]) – The types of dataset.

  • dataset_shapes (list[tuple[int]]) – The shapes of dataset.

  • queue_name (str) – Queue name to fetch the data.

For detailed information, refer to ops.operations.GetNext.

class mindspore.nn.GlobalBatchNorm(num_features, eps=1e-05, momentum=0.9, affine=True, gamma_init='ones', beta_init='zeros', moving_mean_init='zeros', moving_var_init='ones', use_batch_statistics=None, device_num_each_group=2)[source]

Global normalization layer over an N-dimensional input.

Global Normalization is cross device synchronized batch normalization. The implementation of Batch Normalization only normalizes the data within each device. Global normalization will normalize the input within the group. It has been described in the paper Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. It rescales and recenters the feature using a mini-batch of data and the learned parameters which can be described in the following formula.

\[y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]

Note

Currently, GlobalBatchNorm only supports 2D and 4D inputs.

Parameters
  • num_features (int) – C from an expected input of size (N, C, H, W).

  • device_num_each_group (int) – The number of devices in each group. Default: 2.

  • eps (float) – A value added to the denominator for numerical stability. Default: 1e-5.

  • momentum (float) – A floating hyperparameter of the momentum for the running_mean and running_var computation. Default: 0.9.

  • gamma_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the gamma weight. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘ones’.

  • beta_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the beta weight. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘zeros’.

  • moving_mean_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the moving mean. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘zeros’.

  • moving_var_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the moving variance. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘ones’.

  • use_batch_statistics (bool) – If true, use the mean and variance values of the current batch data. If false, use the specified mean and variance values. If None, the training process will use the mean and variance of the current batch data and track the running mean and variance, and the eval process will use the running mean and variance. Default: None.

Inputs:
  • input (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).

Outputs:

Tensor, the normalized, scaled, offset tensor, of shape \((N, C_{out}, H_{out}, W_{out})\).

Examples

>>> global_bn_op = nn.GlobalBatchNorm(num_features=3, device_num_each_group=4)
>>> input = Tensor(np.random.randint(0, 255, [1, 3, 224, 224]), mindspore.float32)
>>> global_bn_op(input)
class mindspore.nn.GraphKernel(auto_prefix=True, pips=None)[source]

Base class for GraphKernel.

A GraphKernel is a composite of basic primitives and can be compiled into a fused kernel automatically when enable_graph_kernel in context is set to True.

Examples

>>> class Relu(GraphKernel):
>>>    def __init__(self):
>>>        super(Relu, self).__init__()
>>>        self.max = P.Maximum()
>>>
>>>    def construct(self, x):
>>>        return self.max(P.Fill()(P.DType()(x), P.Shape()(x), 0.0), x)
class mindspore.nn.GroupNorm(num_groups, num_channels, eps=1e-05, affine=True, gamma_init='ones', beta_init='zeros')[source]

Group Normalization over a mini-batch of inputs.

Group normalization applies normalization on a mini-batch of inputs for each single training case as described in the paper Group Normalization. Group normalization divides the channels into groups and computes within each group the mean and variance for normalization, and it performs very stably over a wide range of batch sizes. It can be described using the following formula.

\[y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]
Parameters
  • num_groups (int) – The number of groups to be divided along the channel dimension.

  • num_channels (int) – The number of input channels, which must be divisible by num_groups.

  • eps (float) – A value added to the denominator for numerical stability. Default: 1e-5.

  • affine (bool) – A bool value, this layer will have learnable affine parameters when set to true. Default: True.

  • gamma_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the gamma weight. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘ones’. If gamma_init is a Tensor, the shape must be [num_channels].

  • beta_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the beta weight. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘zeros’. If beta_init is a Tensor, the shape must be [num_channels].

Inputs:
  • input_x (Tensor) - The input feature with shape [N, C, H, W].

Outputs:

Tensor, the normalized and scaled offset tensor, has the same shape and data type as the input_x.

Examples

>>> group_norm_op = nn.GroupNorm(2, 2)
>>> x = Tensor(np.ones([1, 2, 4, 4], np.float32))
>>> group_norm_op(x)
[[[[0. 0. 0. 0.]
   [0. 0. 0. 0.]
   [0. 0. 0. 0.]
   [0. 0. 0. 0.]]
  [[0. 0. 0. 0.]
   [0. 0. 0. 0.]
   [0. 0. 0. 0.]
   [0. 0. 0. 0.]]]]
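
The all-zero output follows from the formula: a constant input has zero variance within each group. A minimal NumPy sketch of the per-group normalization (eps only, affine transform omitted):

>>> import numpy as np
>>> x = np.ones([1, 2, 4, 4], np.float32)
>>> g = x.reshape(1, 2, -1)  # num_groups=2 -> one channel per group here
>>> y = (g - g.mean(-1, keepdims=True)) / np.sqrt(g.var(-1, keepdims=True) + 1e-5)
>>> y.reshape(1, 2, 4, 4)    # all zeros, as above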

extend_repr()[source]

Display instance object as string.

class mindspore.nn.HSigmoid[source]

Hard sigmoid activation function.

Applies hard sigmoid activation element-wise. The input is a Tensor with any valid shape.

Hard sigmoid is defined as:

\[\text{hsigmoid}(x_{i}) = max(0, min(1, \frac{x_{i} + 3}{6})),\]

where \(x_{i}\) is the \(i\)-th slice in the given dimension of the input Tensor.

Inputs:
  • input_data (Tensor) - The input of HSigmoid.

Outputs:

Tensor, with the same type and shape as the input_data.

Examples

>>> input_x = Tensor(np.array([-1, -2, 0, 2, 1]), mindspore.float16)
>>> hsigmoid = nn.HSigmoid()
>>> hsigmoid(input_x)
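
The formula above can be checked directly in NumPy:

>>> import numpy as np
>>> x = np.array([-1, -2, 0, 2, 1], np.float16)
>>> np.maximum(0, np.minimum(1, (x + 3) / 6))  # [0.3333, 0.1666, 0.5, 0.8335, 0.6665]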
class mindspore.nn.HSigmoidQuant(activation, ema_decay=0.999, per_channel=False, num_bits=8, symmetric=False, narrow_range=False, quant_delay=0)[source]

HSigmoidQuant activation function. Add Fake Quant OP before and after HSigmoid OP.

This part is a more detailed overview of HSigmoid op.

Parameters
  • activation (Cell) – Activation cell class.

  • ema_decay (float) – Exponential Moving Average algorithm parameter. Default: 0.999.

  • per_channel (bool) – Quantization granularity based on layer or on channel. Default: False.

  • num_bits (int) – The number of bits used for quantization; 4 and 8 bits are supported. Default: 8.

  • symmetric (bool) – Whether the quantization algorithm is symmetric or not. Default: False.

  • narrow_range (bool) – Whether the quantization algorithm uses narrow range or not. Default: False.

  • quant_delay (int) – Quantization delay, in global steps, before fake quantization takes effect. Default: 0.

Inputs:
  • x (Tensor) - The input of HSigmoidQuant.

Outputs:

Tensor, with the same type and shape as the x.

Examples

>>> activation = nn.HSigmoidQuant(nn.HSigmoid())
>>> input = Tensor(np.array([[1, 2, 1], [-2, 0, -1]]), mindspore.float32)
>>> result = activation(input)
class mindspore.nn.HSwish[source]

Hard swish activation function.

Applies hswish-type activation element-wise. The input is a Tensor with any valid shape.

Hard swish is defined as:

\[\text{hswish}(x_{i}) = x_{i} * \frac{ReLU6(x_{i} + 3)}{6},\]

where \(x_{i}\) is the \(i\)-th slice in the given dimension of the input Tensor.

Inputs:
  • input_data (Tensor) - The input of HSwish.

Outputs:

Tensor, with the same type and shape as the input_data.

Examples

>>> input_x = Tensor(np.array([-1, -2, 0, 2, 1]), mindspore.float16)
>>> hswish = nn.HSwish()
>>> hswish(input_x)
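
Likewise, hard swish can be checked in NumPy, writing ReLU6 as a clip:

>>> import numpy as np
>>> x = np.array([-1, -2, 0, 2, 1], np.float16)
>>> x * np.clip(x + 3, 0, 6) / 6  # [-0.3333, -0.3333, 0., 1.6665, 0.6665]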
class mindspore.nn.HSwishQuant(activation, ema_decay=0.999, per_channel=False, num_bits=8, symmetric=False, narrow_range=False, quant_delay=0)[source]

HSwishQuant activation function. Add Fake Quant OP after HSwish OP.

This part is a more detailed overview of HSwish op.

Parameters
  • activation (Cell) – Activation cell class.

  • ema_decay (float) – Exponential Moving Average algorithm parameter. Default: 0.999.

  • per_channel (bool) – Quantization granularity based on layer or on channel. Default: False.

  • num_bits (int) – The number of bits used for quantization; 4 and 8 bits are supported. Default: 8.

  • symmetric (bool) – Whether the quantization algorithm is symmetric or not. Default: False.

  • narrow_range (bool) – Whether the quantization algorithm uses narrow range or not. Default: False.

  • quant_delay (int) – Quantization delay, in global steps, before fake quantization takes effect. Default: 0.

Inputs:
  • x (Tensor) - The input of HSwishQuant.

Outputs:

Tensor, with the same type and shape as the x.

Examples

>>> activation = nn.HSwishQuant(nn.HSwish())
>>> input = Tensor(np.array([[1, 2, 1], [-2, 0, -1]]), mindspore.float32)
>>> result = activation(input)
class mindspore.nn.ImageGradients[source]

Returns two tensors, the first is along the height dimension and the second is along the width dimension.

Assume an image shape is \(h*w\). The gradients along the height and the width are \(dy\) and \(dx\), respectively.

\[\begin{split}dy[i] = \begin{cases} image[i+1, :]-image[i, :], & \text{if } 0 \le i < h-1 \cr 0, & \text{if } i = h-1 \end{cases}\end{split}\]

\[\begin{split}dx[i] = \begin{cases} image[:, i+1]-image[:, i], & \text{if } 0 \le i < w-1 \cr 0, & \text{if } i = w-1 \end{cases}\end{split}\]
Inputs:
  • images (Tensor) - The input image data, with format ‘NCHW’.

Outputs:
  • dy (Tensor) - vertical image gradients, the same type and shape as input.

  • dx (Tensor) - horizontal image gradients, the same type and shape as input.

Examples

>>> net = nn.ImageGradients()
>>> image = Tensor(np.array([[[[1,2],[3,4]]]]), dtype=mstype.int32)
>>> net(image)
[[[[2,2]
   [0,0]]]]
[[[[1,0]
   [1,0]]]]
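
The output above can be reproduced with a small NumPy sketch of the definition (same 2x2 image, batch and channel dimensions dropped):

>>> import numpy as np
>>> img = np.array([[1, 2], [3, 4]])
>>> dy = np.vstack([img[1:] - img[:-1], np.zeros((1, 2), img.dtype)])  # [[2, 2], [0, 0]]
>>> dx = np.hstack([img[:, 1:] - img[:, :-1], np.zeros((2, 1), img.dtype)])  # [[1, 0], [1, 0]]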
class mindspore.nn.InverseDecayLR(learning_rate, decay_rate, decay_steps, is_stair=False)[source]

Calculates the learning rate based on the inverse-time decay function.

For the i-th step, the formula of computing decayed_learning_rate[i] is:

\[decayed\_learning\_rate[i] = learning\_rate / (1 + decay\_rate * p)\]

Where:

\[p = \frac{current\_step}{decay\_steps}\]

If is_stair is True, the formula is:

\[p = floor(\frac{current\_step}{decay\_steps})\]
Parameters
  • learning_rate (float) – The initial value of learning rate.

  • decay_rate (float) – The decay rate.

  • decay_steps (int) – A value used to calculate decayed learning rate.

  • is_stair (bool) – If true, the learning rate is decayed once every decay_steps steps. Default: False.

Inputs:

Tensor. The current step number.

Returns

Tensor. The learning rate value for the current step.

Examples

>>> learning_rate = 0.1
>>> decay_rate = 0.9
>>> decay_steps = 4
>>> global_step = Tensor(2, mstype.int32)
>>> inverse_decay_lr = InverseDecayLR(learning_rate, decay_rate, decay_steps, True)
>>> inverse_decay_lr(global_step)
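
With is_stair=True and these values, p = floor(2/4) = 0, so the decayed learning rate is unchanged:

>>> import math
>>> 0.1 / (1 + 0.9 * math.floor(2 / 4))
0.1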
class mindspore.nn.L1Loss(reduction='mean')[source]

L1Loss creates a criterion to measure the mean absolute error (MAE) between \(x\) and \(y\) by element, where \(x\) is the input Tensor and \(y\) is the target Tensor.

For simplicity, let \(x\) and \(y\) be 1-dimensional Tensor with length \(N\), the unreduced loss (i.e. with argument reduction set to ‘none’) of \(x\) and \(y\) is given as:

\[L(x, y) = \{l_1,\dots,l_N\}, \quad \text{with } l_n = \left| x_n - y_n \right|\]

When argument reduction is ‘mean’, the mean value of \(L(x, y)\) will be returned. When argument reduction is ‘sum’, the sum of \(L(x, y)\) will be returned. \(N\) is the batch size.

Parameters

reduction (str) – Type of reduction to be applied to loss. The optional values are “mean”, “sum”, and “none”. Default: “mean”.

Inputs:
  • input_data (Tensor) - Tensor of shape \((x_1, x_2, ..., x_R)\). The data type must be float16 or float32.

  • target_data (Tensor) - Tensor of shape \((y_1, y_2, ..., y_S)\). The data type must be float16 or float32.

Outputs:

Tensor, loss float tensor.

Examples

>>> loss = nn.L1Loss()
>>> input_data = Tensor(np.array([1, 2, 3]), mindspore.float32)
>>> target_data = Tensor(np.array([1, 2, 2]), mindspore.float32)
>>> loss(input_data, target_data)
0.33333334
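
The reduced value can be reproduced by hand: the element-wise absolute errors are [0, 0, 1], and their mean is 1/3:

>>> import numpy as np
>>> np.mean(np.abs(np.array([1, 2, 3]) - np.array([1, 2, 2])))  # ≈ 0.3333, matching the output above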
class mindspore.nn.LARS(optimizer, epsilon=1e-05, coefficient=0.001, use_clip=False, lars_filter=<function LARS.<lambda>>)[source]

Implements the LARS algorithm with the LARSUpdate operator.

LARS is an optimization algorithm employing a large batch optimization technique. Refer to paper LARGE BATCH TRAINING OF CONVOLUTIONAL NETWORKS.

Parameters
  • optimizer (Optimizer) – MindSpore optimizer for which to wrap and modify gradients.

  • epsilon (float) – Term added to the denominator to improve numerical stability. Default: 1e-05.

  • coefficient (float) – Trust coefficient for calculating the local learning rate. Default: 0.001.

  • use_clip (bool) – Whether to use clip operation for calculating the local learning rate. Default: False.

  • lars_filter (Function) – A function to determine whether apply the LARS algorithm. Default: lambda x: ‘LayerNorm’ not in x.name and ‘bias’ not in x.name.

Inputs:
  • gradients (tuple[Tensor]) - The gradients of params in the optimizer, the shape is the same as the params in the optimizer.

Outputs:

Union[Tensor[bool], tuple[Parameter]], it depends on the output of optimizer.

Examples

>>> net = Net()
>>> loss = nn.SoftmaxCrossEntropyWithLogits()
>>> opt = nn.Momentum(net.trainable_params(), 0.1, 0.9)
>>> opt_lars = nn.LARS(opt, epsilon=1e-08, coefficient=0.02)
>>> model = Model(net, loss_fn=loss, optimizer=opt_lars, metrics=None)
class mindspore.nn.LGamma[source]

Calculates LGamma using Lanczos’ approximation, referring to “A Precision Approximation of the Gamma Function”. The algorithm is:

\[ \begin{align}\begin{aligned}lgamma(z + 1) = \frac{(\log(2) + \log(pi))}{2} + (z + 1/2) * log(t(z)) - t(z) + A(z)\\t(z) = z + kLanczosGamma + 1/2\\A(z) = kBaseLanczosCoeff + \sum_{k=1}^n \frac{kLanczosCoefficients[k]}{z + k}\end{aligned}\end{align} \]

However, if the input is less than 0.5 use Euler’s reflection formula:

\[lgamma(x) = \log(pi) - lgamma(1-x) - \log(abs(sin(pi * x)))\]

And please note that

\[lgamma(+/-inf) = +inf\]

Thus, the behaviour of LGamma follows:

  • when x > 0.5, return log(Gamma(x));

  • when x < 0.5 and x is not an integer, return the real part of Log(Gamma(x)), where Log is the complex logarithm;

  • when x is an integer less than or equal to 0, return +inf;

  • when x = +inf or -inf, return +inf.

Inputs:
  • input_x (Tensor[Number]) - The input tensor. Only float16, float32 are supported.

Outputs:

Tensor, has the same shape and dtype as the input_x.

Examples

>>> input_x = Tensor(np.array([2, 3, 4]).astype(np.float32))
>>> op = nn.LGamma()
>>> output = op(input_x)
[3.5762787e-07 6.9314754e-01 1.7917603e+00]
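
These values can be cross-checked against SciPy's log-gamma (assuming scipy is available); note lgamma(2) = log(1!) = 0 up to float32 approximation error:

>>> import numpy as np
>>> from scipy.special import gammaln
>>> gammaln(np.array([2.0, 3.0, 4.0]))  # [0., 0.6931, 1.7918]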
class mindspore.nn.LSTM(input_size, hidden_size, num_layers=1, has_bias=True, batch_first=False, dropout=0, bidirectional=False)[source]

LSTM (Long Short-Term Memory) layer.

Applies an LSTM to the input.

There are two pipelines connecting two consecutive cells in an LSTM model; one is the cell state pipeline and the other is the hidden state pipeline. Denote two consecutive time nodes as \(t-1\) and \(t\). Given an input \(x_t\) at time \(t\), a hidden state \(h_{t-1}\) and a cell state \(c_{t-1}\) of the layer at time \({t-1}\), the cell state and hidden state at time \(t\) are computed using a gating mechanism. The input gate \(i_t\) is designed to protect the cell from perturbation by irrelevant inputs. The forget gate \(f_t\) affords protection of the cell by forgetting some information in the past, which is stored in \(h_{t-1}\). The output gate \(o_t\) protects other units from perturbation by currently irrelevant memory contents. The candidate cell state \(\tilde{c}_t\) is calculated with the current input, on which the input gate will be applied. Finally, the current cell state \(c_{t}\) and hidden state \(h_{t}\) are computed with the calculated gates and cell states. The complete formulation is as follows.

\[\begin{split}\begin{array}{ll} \\ i_t = \sigma(W_{ix} x_t + b_{ix} + W_{ih} h_{(t-1)} + b_{ih}) \\ f_t = \sigma(W_{fx} x_t + b_{fx} + W_{fh} h_{(t-1)} + b_{fh}) \\ \tilde{c}_t = \tanh(W_{cx} x_t + b_{cx} + W_{ch} h_{(t-1)} + b_{ch}) \\ o_t = \sigma(W_{ox} x_t + b_{ox} + W_{oh} h_{(t-1)} + b_{oh}) \\ c_t = f_t * c_{(t-1)} + i_t * \tilde{c}_t \\ h_t = o_t * \tanh(c_t) \\ \end{array}\end{split}\]

Here \(\sigma\) is the sigmoid function, and \(*\) is the Hadamard product. \(W, b\) are learnable weights between the output and the input in the formula. For instance, \(W_{ix}, b_{ix}\) are the weight and bias used to transform from input \(x\) to \(i\). Details can be found in paper LONG SHORT-TERM MEMORY and Long Short-Term Memory Recurrent Neural Network Architectures for Large Scale Acoustic Modeling.

Parameters
  • input_size (int) – Number of features of input.

  • hidden_size (int) – Number of features of hidden layer.

  • num_layers (int) – Number of layers of stacked LSTM . Default: 1.

  • has_bias (bool) – Whether the cell has bias b_ih and b_hh. Default: True.

  • batch_first (bool) – Specifies whether the first dimension of input is batch_size. Default: False.

  • dropout (float, int) – If not 0, appends a Dropout layer on the outputs of each LSTM layer except the last layer. Default: 0. The range of dropout is [0.0, 1.0].

  • bidirectional (bool) – Specifies whether it is a bidirectional LSTM. Default: False.

Inputs:
  • input (Tensor) - Tensor of shape (seq_len, batch_size, input_size).

  • hx (tuple) - A tuple of two Tensors (h_0, c_0) both of data type mindspore.float32 or mindspore.float16 and shape (num_directions * num_layers, batch_size, hidden_size). Data type of hx must be the same as input.

Outputs:

Tuple, a tuple containing (output, (h_n, c_n)).

  • output (Tensor) - Tensor of shape (seq_len, batch_size, num_directions * hidden_size).

  • hx_n (tuple) - A tuple of two Tensor (h_n, c_n) both of shape (num_directions * num_layers, batch_size, hidden_size).

Examples

>>> class LstmNet(nn.Cell):
>>>     def __init__(self, input_size, hidden_size, num_layers, has_bias, batch_first, bidirectional):
>>>         super(LstmNet, self).__init__()
>>>         self.lstm = nn.LSTM(input_size=input_size,
>>>                             hidden_size=hidden_size,
>>>                             num_layers=num_layers,
>>>                             has_bias=has_bias,
>>>                             batch_first=batch_first,
>>>                             bidirectional=bidirectional,
>>>                             dropout=0.0)
>>>
>>>     def construct(self, inp, h0, c0):
>>>         return self.lstm(inp, (h0, c0))
>>>
>>> net = LstmNet(10, 12, 2, has_bias=True, batch_first=True, bidirectional=False)
>>> input = Tensor(np.ones([3, 5, 10]).astype(np.float32))
>>> h0 = Tensor(np.ones([1 * 2, 3, 12]).astype(np.float32))
>>> c0 = Tensor(np.ones([1 * 2, 3, 12]).astype(np.float32))
>>> output, (hn, cn) = net(input, h0, c0)
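
For reference, one timestep of the gating equations above can be sketched in NumPy (hypothetical random weights, biases omitted; the nn.LSTM layer itself runs a fused multi-layer kernel):

>>> import numpy as np
>>> def sigmoid(z):
>>>     return 1 / (1 + np.exp(-z))
>>> n, d, h = 3, 10, 12                                    # batch, input_size, hidden_size
>>> x_t = np.random.rand(n, d)
>>> h_prev, c_prev = np.zeros((n, h)), np.zeros((n, h))
>>> W = {g: np.random.rand(h, d) * 0.1 for g in "ifco"}    # input-to-hidden weights
>>> U = {g: np.random.rand(h, h) * 0.1 for g in "ifco"}    # hidden-to-hidden weights
>>> i_t = sigmoid(x_t @ W["i"].T + h_prev @ U["i"].T)      # input gate
>>> f_t = sigmoid(x_t @ W["f"].T + h_prev @ U["f"].T)      # forget gate
>>> c_tilde = np.tanh(x_t @ W["c"].T + h_prev @ U["c"].T)  # candidate cell state
>>> o_t = sigmoid(x_t @ W["o"].T + h_prev @ U["o"].T)      # output gate
>>> c_t = f_t * c_prev + i_t * c_tilde
>>> h_t = o_t * np.tanh(c_t)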
class mindspore.nn.LSTMCell(input_size, hidden_size, has_bias=True, batch_first=False, dropout=0, bidirectional=False)[source]

LSTM (Long Short-Term Memory) layer.

Applies an LSTM layer to the input.

There are two pipelines connecting two consecutive cells in an LSTM model; one is the cell state pipeline and the other is the hidden state pipeline. Denote two consecutive time nodes as \(t-1\) and \(t\). Given an input \(x_t\) at time \(t\), a hidden state \(h_{t-1}\) and a cell state \(c_{t-1}\) of the layer at time \({t-1}\), the cell state and hidden state at time \(t\) are computed using a gating mechanism. The input gate \(i_t\) is designed to protect the cell from perturbation by irrelevant inputs. The forget gate \(f_t\) affords protection of the cell by forgetting some information in the past, which is stored in \(h_{t-1}\). The output gate \(o_t\) protects other units from perturbation by currently irrelevant memory contents. The candidate cell state \(\tilde{c}_t\) is calculated with the current input, on which the input gate will be applied. Finally, the current cell state \(c_{t}\) and hidden state \(h_{t}\) are computed with the calculated gates and cell states. The complete formulation is as follows.

\[\begin{split}\begin{array}{ll} \\ i_t = \sigma(W_{ix} x_t + b_{ix} + W_{ih} h_{(t-1)} + b_{ih}) \\ f_t = \sigma(W_{fx} x_t + b_{fx} + W_{fh} h_{(t-1)} + b_{fh}) \\ \tilde{c}_t = \tanh(W_{cx} x_t + b_{cx} + W_{ch} h_{(t-1)} + b_{ch}) \\ o_t = \sigma(W_{ox} x_t + b_{ox} + W_{oh} h_{(t-1)} + b_{oh}) \\ c_t = f_t * c_{(t-1)} + i_t * \tilde{c}_t \\ h_t = o_t * \tanh(c_t) \\ \end{array}\end{split}\]

Here \(\sigma\) is the sigmoid function, and \(*\) is the Hadamard product. \(W, b\) are learnable weights between the output and the input in the formula. For instance, \(W_{ix}, b_{ix}\) are the weight and bias used to transform from input \(x\) to \(i\). Details can be found in paper LONG SHORT-TERM MEMORY and Long Short-Term Memory Recurrent Neural Network Architectures for Large Scale Acoustic Modeling.

Parameters
  • input_size (int) – Number of features of input.

  • hidden_size (int) – Number of features of hidden layer.

  • layer_index (int) – index of current layer of stacked LSTM . Default: 0.

  • has_bias (bool) – Whether the cell has bias b_ih and b_hh. Default: True.

  • batch_first (bool) – Specifies whether the first dimension of input is batch_size. Default: False.

  • dropout (float, int) – If not 0, appends a Dropout layer on the outputs of each LSTM layer except the last layer. Default: 0. The range of dropout is [0.0, 1.0].

  • bidirectional (bool) – Specifies whether this is a bidirectional LSTM. If set True, number of directions will be 2 otherwise number of directions is 1. Default: False.

Inputs:
  • input (Tensor) - Tensor of shape (seq_len, batch_size, input_size).

  • h (Tensor) - Data type mindspore.float32 or mindspore.float16 and shape (num_directions * num_layers, batch_size, hidden_size).

  • c (Tensor) - Data type mindspore.float32 or mindspore.float16 and shape (num_directions * num_layers, batch_size, hidden_size). The data type of h and c must be the same as that of input.

Outputs:

output, h_n, c_n, reserve, state.

  • output (Tensor) - Tensor of shape (seq_len, batch_size, num_directions * hidden_size).

  • h_n (Tensor) - A Tensor with shape (num_directions * num_layers, batch_size, hidden_size).

  • c_n (Tensor) - A Tensor with shape (num_directions * num_layers, batch_size, hidden_size).

  • reserve - reserved

  • state - reserved

Examples

>>> class LstmNet(nn.Cell):
>>>     def __init__(self, input_size, hidden_size, layer_index, has_bias, batch_first, bidirectional):
>>>         super(LstmNet, self).__init__()
>>>         self.lstm = nn.LSTMCell(input_size=input_size,
>>>                                 hidden_size=hidden_size,
>>>                                 layer_index=layer_index,
>>>                                 has_bias=has_bias,
>>>                                 batch_first=batch_first,
>>>                                 bidirectional=bidirectional,
>>>                                 dropout=0.0)
>>>
>>>     def construct(self, inp, h0, c0):
>>>         return self.lstm(inp, (h0, c0))
>>>
>>> net = LstmNet(10, 12, 2, has_bias=True, batch_first=True, bidirectional=False)
>>> input = Tensor(np.ones([3, 5, 10]).astype(np.float32))
>>> h0 = Tensor(np.ones([1 * 2, 3, 12]).astype(np.float32))
>>> c0 = Tensor(np.ones([1 * 2, 3, 12]).astype(np.float32))
>>> output, hn, cn, _, _ = net(input, h0, c0)
class mindspore.nn.Lamb(params, learning_rate, beta1=0.9, beta2=0.999, eps=1e-06, weight_decay=0.0)[source]

Lamb Dynamic Learning Rate.

LAMB is an optimization algorithm employing a layerwise adaptive large batch optimization technique. Refer to the paper LARGE BATCH OPTIMIZATION FOR DEEP LEARNING: TRAINING BERT IN 76 MINUTES.

Note

When separating parameter groups, the weight decay in each group will be applied on the parameters if the weight decay is positive. When not separating parameter groups, the weight_decay in the API will be applied on the parameters without ‘beta’ or ‘gamma’ in their names if weight_decay is positive.

To improve the performance of parameter groups, a customized order of parameters is supported.

Parameters
  • params (Union[list[Parameter], list[dict]]) –

    When the params is a list of Parameter which will be updated, the element in params must be class Parameter. When the params is a list of dict, the “params”, “lr”, “weight_decay” and “order_params” are the keys that can be parsed.

    • params: Required. The value must be a list of Parameter.

    • lr: Optional. If “lr” in the keys, the value of corresponding learning rate will be used. If not, the learning_rate in the API will be used.

    • weight_decay: Optional. If “weight_decay” in the keys, the value of corresponding weight decay will be used. If not, the weight_decay in the API will be used.

    • order_params: Optional. If “order_params” is in the keys, the value must be the order of parameters, and this order will be followed in the optimizer. There are no other keys in this dict, and the parameters in the value of ‘order_params’ must be included in one of the group parameters.

  • learning_rate (Union[float, Tensor, Iterable, LearningRateSchedule]) – A value or a graph for the learning rate. When the learning_rate is an Iterable or a 1-D Tensor, a dynamic learning rate is used: the i-th step takes the i-th value as the learning rate. When the learning_rate is a LearningRateSchedule, a dynamic learning rate is used: the i-th learning rate is calculated during training according to the formula of the LearningRateSchedule. When the learning_rate is a float or a 0-D Tensor, a fixed learning rate is used. Other cases are not supported. The float learning rate must be equal to or greater than 0. If the type of learning_rate is int, it will be converted to float.

  • beta1 (float) – The exponential decay rate for the 1st moment estimations. Default: 0.9. Should be in range (0.0, 1.0).

  • beta2 (float) – The exponential decay rate for the 2nd moment estimations. Default: 0.999. Should be in range (0.0, 1.0).

  • eps (float) – Term added to the denominator to improve numerical stability. Default: 1e-6. Should be greater than 0.

  • weight_decay (float) – Weight decay (L2 penalty). Default: 0.0. Should be equal to or greater than 0.

Inputs:
  • gradients (tuple[Tensor]) - The gradients of params, the shape is the same as params.

Outputs:

tuple[bool], all elements are True.

Examples

>>> net = Net()
>>> #1) All parameters use the same learning rate and weight decay
>>> optim = nn.Lamb(params=net.trainable_params())
>>>
>>> #2) Use parameter groups and set different values
>>> poly_decay_lr = learning_rate_schedule.PolynomialDecayLR(0.01, 0.0001, 10, 0.5)
>>> conv_params = list(filter(lambda x: 'conv' in x.name, net.trainable_params()))
>>> no_conv_params = list(filter(lambda x: 'conv' not in x.name, net.trainable_params()))
>>> group_params = [{'params': conv_params, 'weight_decay': 0.01},
>>>                 {'params': no_conv_params, 'lr': poly_decay_lr},
>>>                 {'order_params': net.trainable_params()}]
>>> optim = nn.Lamb(group_params, learning_rate=0.1, weight_decay=0.0)
>>> # The conv_params's parameters will use default learning rate of 0.1 and weight decay of 0.01.
>>> # The no_conv_params's parameters will use dynamic learning rate of poly decay learning rate and default
>>> # weight decay of 0.0.
>>> # The final parameters order in which the optimizer will be followed is the value of 'order_params'.
>>>
>>> loss = nn.SoftmaxCrossEntropyWithLogits()
>>> model = Model(net, loss_fn=loss, optimizer=optim)
class mindspore.nn.LayerNorm(normalized_shape, begin_norm_axis=- 1, begin_params_axis=- 1, gamma_init='ones', beta_init='zeros', epsilon=1e-07)[source]

Applies Layer Normalization over a mini-batch of inputs.

Layer normalization is widely used in recurrent neural networks. It applies normalization on a mini-batch of inputs for each single training case as described in the paper Layer Normalization. Unlike batch normalization, layer normalization performs exactly the same computation at training and testing time. It can be described using the following formula, which is applied over all channels and pixels of each single sample rather than across the batch.

\[y = \frac{x - \mathrm{E}[x]}{\sqrt{\mathrm{Var}[x] + \epsilon}} * \gamma + \beta\]
Parameters
  • normalized_shape (Union[tuple[int], list[int]]) – The normalization is performed over axes begin_norm_axis … R - 1.

  • begin_norm_axis (int) – The first normalization dimension: normalization will be performed along dimensions begin_norm_axis: rank(inputs), the value should be in [-1, rank(input)). Default: -1.

  • begin_params_axis (int) – The first parameter (beta, gamma) dimension: scale and centering parameters will have dimensions begin_params_axis: rank(inputs) and will be broadcast with the normalized inputs accordingly, the value should be in [-1, rank(input)). Default: -1.

  • gamma_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the gamma weight. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘ones’.

  • beta_init (Union[Tensor, str, Initializer, numbers.Number]) – Initializer for the beta weight. The values of str refer to the function initializer including ‘zeros’, ‘ones’, ‘xavier_uniform’, ‘he_uniform’, etc. Default: ‘zeros’.

  • epsilon (float) – A value added to the denominator for numerical stability. Default: 1e-7.

Inputs:
  • input_x (Tensor) - The shape of ‘input_x’ is \((x_1, x_2, ..., x_R)\), and input_shape[begin_norm_axis:] is equal to normalized_shape.

Outputs:

Tensor, the normalized and scaled offset tensor, has the same shape and data type as the input_x.

Examples

>>> x = Tensor(np.ones([20, 5, 10, 10]), mindspore.float32)
>>> shape1 = x.shape[1:]
>>> m = nn.LayerNorm(shape1, begin_norm_axis=1, begin_params_axis=1)
>>> m(x).shape
(20, 5, 10, 10)
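The same normalization can be cross-checked directly in NumPy. A minimal sketch (with the default gamma_init='ones' and beta_init='zeros', the affine step is the identity; np is assumed to be imported as in the examples above):

>>> x_np = np.random.random((20, 5, 10, 10)).astype(np.float32)
>>> # begin_norm_axis=1: statistics are computed over axes 1..3 of each sample
>>> mean = x_np.mean(axis=(1, 2, 3), keepdims=True)
>>> var = x_np.var(axis=(1, 2, 3), keepdims=True)
>>> y_np = (x_np - mean) / np.sqrt(var + 1e-7)
>>> y_np.shape
(20, 5, 10, 10)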
extend_repr()[source]

Display instance object as string.

class mindspore.nn.LazyAdam(params, learning_rate=0.001, beta1=0.9, beta2=0.999, eps=1e-08, use_locking=False, use_nesterov=False, weight_decay=0.0, loss_scale=1.0)[source]

Updates gradients by Adaptive Moment Estimation (Adam) algorithm.

The Adam algorithm is proposed in Adam: A Method for Stochastic Optimization.

The updating formulas are as follows,

\[\begin{split}\begin{array}{ll} \\ m = \beta_1 * m + (1 - \beta_1) * g \\ v = \beta_2 * v + (1 - \beta_2) * g * g \\ l = \alpha * \frac{\sqrt{1-\beta_2^t}}{1-\beta_1^t} \\ w = w - l * \frac{m}{\sqrt{v} + \epsilon} \end{array}\end{split}\]

\(m\) represents the 1st moment vector moment1, \(v\) represents the 2nd moment vector moment2, \(g\) represents gradients, \(l\) represents scaling factor lr, \(\beta_1, \beta_2\) represent beta1 and beta2, \(t\) represents the updating step while \(\beta_1^t\) and \(\beta_2^t\) represent beta1_power and beta2_power, \(\alpha\) represents learning_rate, \(w\) represents params, \(\epsilon\) represents eps.

Note

When separating parameter groups, the weight decay in each group will be applied on the parameters if the weight decay is positive. When not separating parameter groups, the weight_decay in the API will be applied on the parameters without ‘beta’ or ‘gamma’ in their names if weight_decay is positive.

To improve the performance of parameter groups, a customized order of parameters is supported.

The sparse strategy is applied while the SparseGatherV2 operator is used for the forward network. Note that the sparse behavior is not equivalent to the original Adam algorithm, as only the parameters at the current indices will be updated. The sparse feature is under continuous development. The sparse behavior is currently performed on the CPU.

Parameters
  • params (Union[list[Parameter], list[dict]]) –

    When the params is a list of Parameter which will be updated, the element in params must be class Parameter. When the params is a list of dict, the “params”, “lr” and “weight_decay” are the keys that can be parsed.

    • params: Required. The value must be a list of Parameter.

    • lr: Optional. If “lr” in the keys, the value of corresponding learning rate will be used. If not, the learning_rate in the API will be used.

    • weight_decay: Optional. If “weight_decay” in the keys, the value of corresponding weight decay will be used. If not, the weight_decay in the API will be used.

    • order_params: Optional. If “order_params” is in the keys, the value must be the order of parameters, and this order will be followed in the optimizer. There are no other keys in this dict, and the parameters in the value of ‘order_params’ must be included in one of the group parameters.

  • learning_rate (Union[float, Tensor, Iterable, LearningRateSchedule]) – A value or a graph for the learning rate. When the learning_rate is an Iterable or a 1-D Tensor, a dynamic learning rate is used: the i-th step takes the i-th value as the learning rate. When the learning_rate is a LearningRateSchedule, a dynamic learning rate is used: the i-th learning rate is calculated during training according to the formula of the LearningRateSchedule. When the learning_rate is a float or a 0-D Tensor, a fixed learning rate is used. Other cases are not supported. The float learning rate must be equal to or greater than 0. If the type of learning_rate is int, it will be converted to float. Default: 1e-3.

  • beta1 (float) – The exponential decay rate for the 1st moment estimations. Should be in range (0.0, 1.0). Default: 0.9.

  • beta2 (float) – The exponential decay rate for the 2nd moment estimations. Should be in range (0.0, 1.0). Default: 0.999.

  • eps (float) – Term added to the denominator to improve numerical stability. Should be greater than 0. Default: 1e-8.

  • use_locking (bool) – Whether to enable a lock to protect variable tensors from being updated. If true, updates of the var, m, and v tensors will be protected by a lock. If false, the result is unpredictable. Default: False.

  • use_nesterov (bool) – Whether to use the Nesterov Accelerated Gradient (NAG) algorithm to update the gradients. If true, update the gradients using NAG. If false, update the gradients without using NAG. Default: False.

  • weight_decay (float) – Weight decay (L2 penalty). Default: 0.0.

  • loss_scale (float) – A floating point value for the loss scale. Should be equal to or greater than 1. Default: 1.0.

Inputs:
  • gradients (tuple[Tensor]) - The gradients of params, the shape is the same as params.

Outputs:

Tensor[bool], the value is True.

Examples

>>> net = Net()
>>> #1) All parameters use the same learning rate and weight decay
>>> optim = nn.LazyAdam(params=net.trainable_params())
>>>
>>> #2) Use parameter groups and set different values
>>> conv_params = list(filter(lambda x: 'conv' in x.name, net.trainable_params()))
>>> no_conv_params = list(filter(lambda x: 'conv' not in x.name, net.trainable_params()))
>>> group_params = [{'params': conv_params, 'weight_decay': 0.01},
>>>                 {'params': no_conv_params, 'lr': 0.01},
>>>                 {'order_params': net.trainable_params()}]
>>> optim = nn.LazyAdam(group_params, learning_rate=0.1, weight_decay=0.0)
>>> # The conv_params's parameters will use default learning rate of 0.1 and weight decay of 0.01.
>>> # The no_conv_params's parameters will use learning rate of 0.01 and default weight decay of 0.0.
>>> # The final parameters order in which the optimizer will be followed is the value of 'order_params'.
>>>
>>> loss = nn.SoftmaxCrossEntropyWithLogits()
>>> model = Model(net, loss_fn=loss, optimizer=optim)
class mindspore.nn.LeakyReLU(alpha=0.2)[source]

Leaky ReLU activation function.

LeakyReLU is similar to ReLU, but it has a small nonzero slope for x < 0 instead of being 0. The activation function is defined as:

\[\text{leaky_relu}(x) = \begin{cases}x, &\text{if } x \geq 0; \cr \text{alpha} * x, &\text{otherwise.}\end{cases}\]

See https://ai.stanford.edu/~amaas/papers/relu_hybrid_icml2013_final.pdf

Parameters

alpha (Union[int, float]) – Slope of the activation function at x < 0. Default: 0.2.

Inputs:
  • input_x (Tensor) - The input of LeakyReLU.

Outputs:

Tensor, has the same type and shape as the input_x.

Examples

>>> input_x = Tensor(np.array([[-1.0, 4.0, -8.0], [2.0, -5.0, 9.0]]), mindspore.float32)
>>> leaky_relu = nn.LeakyReLU()
>>> leaky_relu(input_x)
[[-0.2  4.  -1.6]
 [ 2.  -1.   9. ]]
class mindspore.nn.LeakyReLUQuant(activation, ema_decay=0.999, per_channel=False, num_bits=8, symmetric=False, narrow_range=False, quant_delay=0)[source]

LeakyReLUQuant activation function. Add Fake Quant OP after LeakyReLU OP.

This part is a more detailed overview of the LeakyReLU op.

Parameters
  • activation (Cell) – Activation cell class.

  • ema_decay (float) – Exponential Moving Average algorithm parameter. Default: 0.999.

  • per_channel (bool) – Quantization granularity based on layer or on channel. Default: False.

  • num_bits (int) – The bit number of quantization, supporting 4 and 8bits. Default: 8.

  • symmetric (bool) – The quantization algorithm is symmetric or not. Default: False.

  • narrow_range (bool) – The quantization algorithm uses narrow range or not. Default: False.

  • quant_delay (int) – Quantization delay parameters according to the global step. Default: 0.

Inputs:
  • x (Tensor) - The input of LeakyReLUQuant.

Outputs:

Tensor, with the same type and shape as the x.

Examples

>>> activation = nn.LeakyReLUQuant(nn.LeakyReLU())
>>> input = Tensor(np.array([[1, 2, 1], [-2, 0, -1]]), mindspore.float32)
>>> result = activation(input)
class mindspore.nn.LinSpace(start, stop, num)[source]

Generates values in an interval.

Parameters
  • start (Union[int, float]) – The start of the interval, a 0-D scalar.

  • stop (Union[int, float]) – The end of the interval, a 0-D scalar.

  • num (int) – The number of ticks in the interval, inclusive of the start and stop values. A 0-D scalar.

Outputs:

Tensor, with the same type as start. The shape is 1-D with length num.

Examples

>>> linspace = nn.LinSpace(1, 10, 5)
>>> output = linspace()
[1, 3.25, 5.5, 7.75, 10]
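With num=5 ticks over the interval [1, 10], the spacing is (10 - 1) / (5 - 1) = 2.25, which yields the values above.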
class mindspore.nn.LogSigmoid[source]

Logsigmoid activation function.

Applies logsigmoid activation element-wise. The input is a Tensor with any valid shape.

Logsigmoid is defined as:

\[\text{logsigmoid}(x_{i}) = log(\frac{1}{1 + \exp(-x_i)}),\]

where \(x_{i}\) is the element of the input.

Inputs:
  • input_data (Tensor) - The input of LogSigmoid.

Outputs:

Tensor, with the same type and shape as the input_data.

Examples

>>> net = nn.LogSigmoid()
>>> input_x = Tensor(np.array([1.0, 2.0, 3.0]), mindspore.float32)
>>> logsigmoid = net(input_x)
[-3.1326166e-01, -1.2692806e-01, -4.8587345e-02]
class mindspore.nn.LogSoftmax(axis=- 1)[source]

LogSoftmax activation function.

Applies the LogSoftmax function to n-dimensional input tensor.

The input is transformed by the Softmax function and then by the log function to lie in the range [-inf, 0).

Logsoftmax is defined as: \(\text{logsoftmax}(x_i) = \log \left(\frac{\exp(x_i)}{\sum_{j=0}^{n-1} \exp(x_j)}\right)\), where \(x_{i}\) is the \(i\)-th slice in the given dimension of the input Tensor.

Parameters

axis (int) – The axis to apply LogSoftmax operation, -1 means the last dimension. Default: -1.

Inputs:
  • x (Tensor) - The input of LogSoftmax.

Outputs:

Tensor, which has the same type and shape as x, with values in the range [-inf, 0).

Examples

>>> input_x = Tensor(np.array([[-1.0, 4.0, -8.0], [2.0, -5.0, 9.0]]), mindspore.float32)
>>> log_softmax = nn.LogSoftmax()
>>> log_softmax(input_x)
[[-5.00672150e+00 -6.72150636e-03 -1.20067215e+01]
 [-7.00091219e+00 -1.40009127e+01 -9.12250078e-04]]
class mindspore.nn.Loss[source]

Calculates the average of the loss. If the method update is called \(n\) times, the result of evaluation will be:

\[loss = \frac{\sum_{k=1}^{n}loss_k}{n}\]

Examples

>>> x = Tensor(np.array(0.2), mindspore.float32)
>>> loss = nn.Loss()
>>> loss.clear()
>>> loss.update(x)
>>> result = loss.eval()
clear()[source]

Clears the internal evaluation result.

eval()[source]

Calculates the average of the loss.

Returns

Float, the average of the loss.

Raises

RuntimeError – If the total number is 0.

update(*inputs)[source]

Updates the internal evaluation result.

Parameters

inputs – Inputs contain only one element, the element is loss. The dimension of loss must be 0 or 1.

Raises
  • ValueError – If the length of inputs is not 1.

  • ValueError – If the dimension of loss is neither 0 nor 1.

class mindspore.nn.MAE[source]

Calculates the mean absolute error.

Creates a criterion that measures the mean absolute error (MAE) between each element in the input: \(x\) and the target: \(y\).

\[\text{MAE} = \frac{\sum_{i=1}^n \|y_i - x_i\|}{n}\]

Here \(x_i\) is the prediction and \(y_i\) is the true value.

Note

The method update must be called with the form update(y_pred, y).

Examples

>>> x = Tensor(np.array([0.1, 0.2, 0.6, 0.9]), mindspore.float32)
>>> y = Tensor(np.array([0.1, 0.25, 0.7, 0.9]), mindspore.float32)
>>> error = nn.MAE()
>>> error.clear()
>>> error.update(x, y)
>>> result = error.eval()
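>>> # result == (0 + 0.05 + 0.1 + 0) / 4 = 0.0375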
clear()[source]

Clears the internal evaluation result.

eval()[source]

Computes the mean absolute error.

Returns

Float, the computed result.

Raises

RuntimeError – If the number of the total samples is 0.

update(*inputs)[source]

Updates the internal evaluation result \(y_{pred}\) and \(y\).

Parameters

inputs – Input y_pred and y for calculating the mean absolute error, where y_pred and y are both N-D Tensors with the same shape.

Raises

ValueError – If the number of inputs is not 2.

class mindspore.nn.MSE[source]

Measures the mean squared error.

Creates a criterion that measures the mean squared error (squared L2 norm) between each element in the input: \(x\) and the target: \(y\).

\[\text{MSE}(x,\ y) = \frac{\sum_{i=1}^n(y_i - x_i)^2}{n},\]

where \(n\) is batch size.

Examples

>>> x = Tensor(np.array([0.1, 0.2, 0.6, 0.9]), mindspore.float32)
>>> y = Tensor(np.array([0.1, 0.25, 0.5, 0.9]), mindspore.float32)
>>> error = nn.MSE()
>>> error.clear()
>>> error.update(x, y)
>>> result = error.eval()
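>>> # result == (0 + 0.05**2 + 0.1**2 + 0) / 4 = 0.003125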
clear()[source]

Clears the internal evaluation result.

eval()[source]

Computes the mean squared error.

Returns

Float, the computed result.

Raises

RuntimeError – If the number of samples is 0.

update(*inputs)[source]

Updates the internal evaluation result \(y_{pred}\) and \(y\).

Parameters

inputs – Input y_pred and y for calculating the mean squared error, where y_pred and y are both N-D Tensors with the same shape.

Raises

ValueError – If the number of inputs is not 2.

class mindspore.nn.MSELoss(reduction='mean')[source]

MSELoss creates a criterion to measure the mean squared error (squared L2-norm) between \(x\) and \(y\) by element, where \(x\) is the input and \(y\) is the target.

For simplicity, let \(x\) and \(y\) be 1-dimensional Tensors with length \(N\); the unreduced loss (i.e. with the argument reduction set to ‘none’) of \(x\) and \(y\) is given as:

\[L(x, y) = \{l_1,\dots,l_N\}, \quad \text{with} \quad l_n = (x_n - y_n)^2.\]

When argument reduction is ‘mean’, the mean value of \(L(x, y)\) will be returned. When argument reduction is ‘sum’, the sum of \(L(x, y)\) will be returned. \(N\) is the batch size.

Parameters

reduction (str) – Type of reduction to be applied to loss. The optional values are “mean”, “sum”, and “none”. Default: “mean”.

Inputs:
  • input_data (Tensor) - Tensor of shape \((x_1, x_2, ..., x_R)\).

  • target_data (Tensor) - Tensor of shape \((y_1, y_2, ..., y_S)\).

Outputs:

Tensor, weighted loss float tensor.

Examples

>>> loss = nn.MSELoss()
>>> input_data = Tensor(np.array([1, 2, 3]), mindspore.float32)
>>> target_data = Tensor(np.array([1, 2, 2]), mindspore.float32)
>>> loss(input_data, target_data)
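>>> # with reduction='mean' (the default): ((1-1)**2 + (2-2)**2 + (3-2)**2) / 3 = 1/3 ≈ 0.3333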
class mindspore.nn.MSSSIM(max_val=1.0, power_factors=(0.0448, 0.2856, 0.3001, 0.2363, 0.1333), filter_size=11, filter_sigma=1.5, k1=0.01, k2=0.03)[source]

Returns MS-SSIM index between img1 and img2.

Its implementation is based on Wang, Zhou, Eero P. Simoncelli, and Alan C. Bovik. Multiscale structural similarity for image quality assessment. Signals, Systems and Computers, 2004.

\[\begin{split}l(x,y)&=\frac{2\mu_x\mu_y+C_1}{\mu_x^2+\mu_y^2+C_1}, \quad C_1=(K_1L)^2.\\ c(x,y)&=\frac{2\sigma_x\sigma_y+C_2}{\sigma_x^2+\sigma_y^2+C_2}, \quad C_2=(K_2L)^2.\\ s(x,y)&=\frac{\sigma_{xy}+C_3}{\sigma_x\sigma_y+C_3}, \quad C_3=C_2/2.\\ MSSSIM(x,y)&=l^{\alpha_M}_M \prod_{1\leq j\leq M}\left(c^{\beta_j}_j \cdot s^{\gamma_j}_j\right).\end{split}\]
Parameters
  • max_val (Union[int, float]) – The dynamic range of the pixel values (255 for 8-bit grayscale images). Default: 1.0.

  • power_factors (Union[tuple, list]) – Iterable of weights for each scale. Default: (0.0448, 0.2856, 0.3001, 0.2363, 0.1333). The default values are those obtained by Wang et al.

  • filter_size (int) – The size of the Gaussian filter. Default: 11.

  • filter_sigma (float) – The standard deviation of Gaussian kernel. Default: 1.5.

  • k1 (float) – The constant used to generate c1 in the luminance comparison function. Default: 0.01.

  • k2 (float) – The constant used to generate c2 in the contrast comparison function. Default: 0.03.

Inputs:
  • img1 (Tensor) - The first image batch with format ‘NCHW’. It must be the same shape and dtype as img2.

  • img2 (Tensor) - The second image batch with format ‘NCHW’. It must be the same shape and dtype as img1.

Outputs:

Tensor, the value is in range [0, 1]. It is a 1-D tensor with shape N, where N is the batch num of img1.

Examples

>>> net = nn.MSSSIM(power_factors=(0.033, 0.033, 0.033))
>>> img1 = Tensor(np.random.random((1,3,128,128)))
>>> img2 = Tensor(np.random.random((1,3,128,128)))
>>> msssim = net(img1, img2)
class mindspore.nn.MatMul(transpose_x1=False, transpose_x2=False)[source]

Multiplies matrix x1 by matrix x2.

The rank of the input tensors must be at least 2. The non-matrix (batch) dimensions of the inputs will be broadcast and must be broadcastable.

Parameters
  • transpose_x1 (bool) – If true, x1 is transposed before multiplication. Default: False.

  • transpose_x2 (bool) – If true, x2 is transposed before multiplication. Default: False.

Inputs:
  • input_x1 (Tensor) - The first tensor to be multiplied. The shape of the tensor is \((*A, N, C)\), where \(*A\) represents the batch size of x1 which can be multidimensional. If transpose_x1 is True, its shape must be \((*A, N, C)\) after transposing.

  • input_x2 (Tensor) - The second tensor to be multiplied. The shape of the tensor is \((*B, C, M)\), where \(*B\) represents the batch size of x2 which can be multidimensional. If transpose_x2 is True, its shape must be \((*B, C, M)\) after transposing.

Outputs:

Tensor, the shape of the output tensor is \((*L, N, M)\). \(*L\) is the batch size after broadcasting.

Examples

>>> net = nn.MatMul()
>>> input_x1 = Tensor(np.ones(shape=[3, 2, 3]), mindspore.float32)
>>> input_x2 = Tensor(np.ones(shape=[3, 4]), mindspore.float32)
>>> output = net(input_x1, input_x2)
>>> print(output.shape)
(3, 2, 4)
class mindspore.nn.MatrixDiag[source]

Returns a batched diagonal tensor with given batched diagonal values.

Inputs:
  • x (Tensor) - The diagonal values. It can be one of the following data types: float32, float16, int32, int8, and uint8.

Outputs:

Tensor, has the same type as input x. The shape must be x.shape + (x.shape[-1], ).

Examples

>>> x = Tensor(np.array([1, -1]), mstype.float32)
>>> matrix_diag = nn.MatrixDiag()
>>> result = matrix_diag(x)
[[1.   0.]
 [0.  -1.]]
class mindspore.nn.MatrixDiagPart[source]

Returns the batched diagonal part of a batched tensor.

Inputs:
  • x (Tensor) - The batched tensor. It can be one of the following data types: float32, float16, int32, int8, and uint8.

Outputs:

Tensor, has the same type as input x. The shape must be x.shape[:-2] + [min(x.shape[-2:])].

Examples

>>> x = Tensor([[[-1, 0], [0, 1]], [[-1, 0], [0, 1]], [[-1, 0], [0, 1]]], mindspore.float32)
>>> matrix_diag_part = nn.MatrixDiagPart()
>>> result = matrix_diag_part(x)
[[-1., 1.], [-1., 1.], [-1., 1.]]
class mindspore.nn.MatrixSetDiag[source]

Modifies the batched diagonal part of a batched tensor.

Inputs:
  • x (Tensor) - The batched tensor. It can be one of the following data types: float32, float16, int32, int8, and uint8.

  • diagonal (Tensor) - The diagonal values.

Outputs:

Tensor, has the same type and shape as input x.

Examples

>>> x = Tensor([[[-1, 0], [0, 1]], [[-1, 0], [0, 1]], [[-1, 0], [0, 1]]], mindspore.float32)
>>> diagonal = Tensor([[-1., 2.], [-1., 1.], [-1., 1.]], mindspore.float32)
>>> matrix_set_diag = nn.MatrixSetDiag()
>>> result = matrix_set_diag(x, diagonal)
[[[-1, 0], [0, 2]], [[-1, 0], [0, 1]], [[-1, 0], [0, 1]]]
class mindspore.nn.MaxPool2d(kernel_size=1, stride=1, pad_mode='valid')[source]

Max pooling operation for spatial data.

Applies a 2D max pooling over an input Tensor which can be regarded as a composition of 2D planes.

Typically the input is of shape \((N_{in}, C_{in}, H_{in}, W_{in})\), MaxPool2d outputs regional maximum in the \((H_{in}, W_{in})\)-dimension. Given kernel size \(ks = (h_{ker}, w_{ker})\) and stride \(s = (s_0, s_1)\), the operation is as follows.

\[\text{output}(N_i, C_j, h, w) = \max_{m=0, \ldots, h_{ker}-1} \max_{n=0, \ldots, w_{ker}-1} \text{input}(N_i, C_j, s_0 \times h + m, s_1 \times w + n)\]

Note

pad_mode for training only supports “same” and “valid”.

Parameters
  • kernel_size (Union[int, tuple[int]]) – The size of the kernel used to take the max value. It is an int that sets both the height and width of the kernel to kernel_size, or a tuple of two ints that represent height and width respectively. Default: 1.

  • stride (Union[int, tuple[int]]) – The distance the kernel moves. It is an int that sets both the height and width of movement to stride, or a tuple of two ints that represent the height and width of movement respectively. Default: 1.

  • pad_mode (str) –

    The optional value for pad mode, is “same” or “valid”, not case sensitive. Default: “valid”.

    • same: Adopts the way of completion. The height and width of the output will be the same as the input. The total number of padding will be calculated in horizontal and vertical directions and evenly distributed to top and bottom, left and right if possible. Otherwise, the last extra padding will be done from the bottom and the right side.

    • valid: Adopts the way of discarding. The possible largest height and width of output will be returned without padding. Extra pixels will be discarded.
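    For the “valid” mode, the output spatial size can be computed as \(H_{out} = \lfloor (H_{in} - h_{ker}) / s_0 \rfloor + 1\) and \(W_{out} = \lfloor (W_{in} - w_{ker}) / s_1 \rfloor + 1\); the 4x4 input with kernel_size=3 and stride=1 in the example below therefore yields a 2x2 output.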

Inputs:
  • input (Tensor) - Tensor of shape \((N, C_{in}, H_{in}, W_{in})\).

Outputs:

Tensor of shape \((N, C_{out}, H_{out}, W_{out})\).

Examples

>>> pool = nn.MaxPool2d(kernel_size=3, stride=1)
>>> x = Tensor(np.random.randint(0, 10, [1, 2, 4, 4]), mindspore.float32)
>>> print(x)  # sample values:
[[[[1. 5. 5. 1.]
   [0. 3. 4. 8.]
   [4. 2. 7. 6.]
   [4. 9. 0. 1.]]
  [[3. 6. 2. 6.]
   [4. 4. 7. 8.]
   [0. 0. 4. 0.]
   [1. 8. 7. 0.]]]]
>>> output = pool(x)
>>> output.shape
(1, 2, 2, 2)
>>> output
[[[[7. 8.]
   [9. 9.]]
  [[7. 8.]
   [8. 8.]]]]
class mindspore.nn.Metric[source]

Base class of metric.

Note

For examples of subclasses, please refer to the definitions of classes such as MAE and Recall.

abstract clear()[source]

An interface describing the behavior of clearing the internal evaluation result.

Note

All subclasses must override this interface.

abstract eval()[source]

An interface describing the behavior of computing the evaluation result.

Note

All subclasses must override this interface.

abstract update(*inputs)[source]

An interface describing the behavior of updating the internal evaluation result.

Note

All subclasses must override this interface.

Parameters

inputs – A variable-length input argument list.
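As an illustration of how the three interfaces fit together, a minimal (hypothetical) subclass that averages every value passed to it could be sketched as follows. This is only a sketch, assuming the inputs are array-like (convertible with numpy.asarray) and that np and nn are imported as in the other examples:

>>> class MeanValue(nn.Metric):
>>>     def __init__(self):
>>>         super(MeanValue, self).__init__()
>>>         self.clear()
>>>
>>>     def clear(self):
>>>         self._total = 0.0
>>>         self._count = 0
>>>
>>>     def update(self, *inputs):
>>>         if len(inputs) != 1:
>>>             raise ValueError('MeanValue needs 1 input, got {}'.format(len(inputs)))
>>>         value = np.asarray(inputs[0])  # assumes an array-like input
>>>         self._total += value.sum()
>>>         self._count += value.size
>>>
>>>     def eval(self):
>>>         if self._count == 0:
>>>             raise RuntimeError('MeanValue must be updated at least once before eval.')
>>>         return self._total / self._count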

class mindspore.nn.Momentum(params, learning_rate, momentum, weight_decay=0.0, loss_scale=1.0, use_nesterov=False)[source]

Implements the Momentum algorithm.

Refer to the paper on the importance of initialization and momentum in deep learning for more details.

Note

When separating parameter groups, the weight decay in each group will be applied on the parameters if the weight decay is positive. When not separating parameter groups, the weight_decay in the API will be applied on the parameters without ‘beta’ or ‘gamma’ in their names if weight_decay is positive.

To improve the performance of parameter groups, a customized order of parameters is supported.

\[v_{t} = v_{t-1} \ast u + gradients\]
If use_nesterov is True:
\[p_{t} = p_{t-1} - (grad \ast lr + v_{t} \ast u \ast lr)\]
If use_nesterov is False:
\[p_{t} = p_{t-1} - lr \ast v_{t}\]

Here, grad, lr, p, v and u denote the gradients, learning_rate, params, moments, and momentum, respectively.
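As a concrete illustration, one scalar update step under both settings can be sketched in plain Python (the values of lr, u, p, v, and grad below are purely illustrative):

>>> lr, u = 0.1, 0.9                              # learning_rate and momentum
>>> p, v, grad = 1.0, 0.0, 0.5                    # param, moment, gradient
>>> v = v * u + grad                              # moment update, shared by both variants
>>> p_plain = p - lr * v                          # use_nesterov=False
>>> p_nesterov = p - (grad * lr + v * u * lr)     # use_nesterov=True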

Parameters
  • params (Union[list[Parameter], list[dict]]) –

    When the params is a list of Parameter which will be updated, the element in params must be class Parameter. When the params is a list of dict, the “params”, “lr”, “weight_decay” and “order_params” are the keys that can be parsed.

    • params: Required. The value must be a list of Parameter.

    • lr: Optional. If “lr” in the keys, the value of corresponding learning rate will be used. If not, the learning_rate in the API will be used.

    • weight_decay: Optional. If “weight_decay” in the keys, the value of corresponding weight decay will be used. If not, the weight_decay in the API will be used.

    • order_params: Optional. If “order_params” is in the keys, the value must be the order of parameters, and this order will be followed in the optimizer. There are no other keys in this dict, and the parameters in the value of ‘order_params’ must be included in one of the group parameters.

  • learning_rate (Union[float, Tensor, Iterable, LearningRateSchedule]) – A value or a graph for the learning rate. When the learning_rate is an Iterable or a 1-D Tensor, a dynamic learning rate is used: the i-th step takes the i-th value as the learning rate. When the learning_rate is a LearningRateSchedule, a dynamic learning rate is used: the i-th learning rate is calculated during training according to the formula of the LearningRateSchedule. When the learning_rate is a float or a 0-D Tensor, a fixed learning rate is used. Other cases are not supported. The float learning rate must be equal to or greater than 0. If the type of learning_rate is int, it will be converted to float.

  • momentum (float) – Hyperparameter of type float, means momentum for the moving average. It must be at least 0.0.

  • weight_decay (int, float) – Weight decay (L2 penalty). It must be equal to or greater than 0.0. Default: 0.0.

  • loss_scale (int, float) – A floating point value for the loss scale. It must be greater than 0.0. Default: 1.0.

  • use_nesterov (bool) – Enable Nesterov momentum. Default: False.

Inputs:
  • gradients (tuple[Tensor]) - The gradients of params, the shape is the same as params.

Outputs:

tuple[bool], all elements are True.

Raises

ValueError – If the momentum is less than 0.0.

Examples

>>> net = Net()
>>> #1) All parameters use the same learning rate and weight decay
>>> optim = nn.Momentum(params=net.trainable_params(), learning_rate=0.1, momentum=0.9)
>>>
>>> #2) Use parameter groups and set different values
>>> conv_params = list(filter(lambda x: 'conv' in x.name, net.trainable_params()))
>>> no_conv_params = list(filter(lambda x: 'conv' not in x.name, net.trainable_params()))
>>> group_params = [{'params': conv_params, 'weight_decay': 0.01},
>>>                 {'params': no_conv_params, 'lr': 0.01},
>>>                 {'order_params': net.trainable_params()}]
>>> optim = nn.Momentum(group_params, learning_rate=0.1, momentum=0.9, weight_decay=0.0)
>>> # The conv_params's parameters will use a learning rate of default value 0.1 and a weight decay of 0.01.
>>> # The no_conv_params's parameters will use a learning rate of 0.01 and a weight decay of default value 0.0.
>>> # The final parameters order in which the optimizer will be followed is the value of 'order_params'.
>>>
>>> loss = nn.SoftmaxCrossEntropyWithLogits()
>>> model = Model(net, loss_fn=loss, optimizer=optim, metrics=None)
class mindspore.nn.MulQuant(ema_decay=0.999, per_channel=False, num_bits=8, symmetric=False, narrow_range=False, quant_delay=0)[source]

Add Fake Quant OP after Mul OP.

This part is a more detailed overview of Mul op.

Parameters
  • ema_decay (float) – Exponential Moving Average algorithm parameter. Default: 0.999.

  • per_channel (bool) – Quantization granularity based on layer or on channel. Default: False.

  • num_bits (int) – The bit number of quantization, supporting 4 and 8bits. Default: 8.

  • symmetric (bool) – The quantization algorithm is symmetric or not. Default: False.

  • narrow_range (bool) – The quantization algorithm uses narrow range or not. Default: False.

  • quant_delay (int) – Quantization delay parameters according to the global step. Default: 0.

Inputs:
  • x (Tensor) - The input of MulQuant.

Outputs:

Tensor, with the same type and shape as the x.

class mindspore.nn.NaturalExpDecayLR(learning_rate, decay_rate, decay_steps, is_stair=False)[source]

Calculates the learning rate based on the natural exponential decay function.

For the i-th step, the formula of computing decayed_learning_rate[i] is:

\[decayed\_learning\_rate[i] = learning\_rate * e^{-decay\_rate * p}\]

Where :

\[p = \frac{current\_step}{decay\_steps}\]

If is_stair is True, the formula is :

\[p = floor(\frac{current\_step}{decay\_steps})\]
Parameters
  • learning_rate (float) – The initial value of learning rate.

  • decay_rate (float) – The decay rate.

  • decay_steps (int) – A value used to calculate decayed learning rate.

  • is_stair (bool) – If true, the learning rate is decayed once every decay_steps steps. Default: False.

Inputs:

Tensor. The current step number.

Returns

Tensor. The learning rate value for the current step.

Examples

>>> learning_rate = 0.1
>>> decay_rate = 0.9
>>> decay_steps = 4
>>> global_step = Tensor(2, mstype.int32)
>>> natural_exp_decay_lr = NaturalExpDecayLR(learning_rate, decay_rate, decay_steps, True)
>>> natural_exp_decay_lr(global_step)
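Here is_stair is True and current_step = 2, so \(p = floor(2/4) = 0\) and the returned learning rate is \(0.1 * e^{-0.9 * 0} = 0.1\).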
class mindspore.nn.Norm(axis=(), keep_dims=False)[source]

Computes the norm of vectors, currently including Euclidean norm, i.e., \(L_2\)-norm.

Parameters
  • axis (Union[tuple, int]) – The axis over which to compute vector norms. Default: ().

  • keep_dims (bool) – If true, the axes indicated by axis are kept with size 1. Otherwise, the dimensions in axis are removed from the output shape. Default: False.

Inputs:
  • input (Tensor) - Tensor which is not empty.

Outputs:

Tensor, output tensor with dimensions in ‘axis’ reduced to 1 will be returned if ‘keep_dims’ is True; otherwise a Tensor with dimensions in ‘axis’ removed is returned.

Examples

>>> net = nn.Norm(axis=0)
>>> input = Tensor(np.random.randint(0, 10, [2, 4]), mindspore.float32)
>>> net(input)
[2.236068 9.848858 4. 5.656854]
class mindspore.nn.OneHot(axis=- 1, depth=1, on_value=1.0, off_value=0.0, dtype=mindspore.float32)[source]

Returns a one-hot tensor.

The locations represented by indices in argument ‘indices’ take value on_value, while all other locations take value off_value.

Note

If the input indices is rank \(N\), the output will have rank \(N+1\). The new axis is created at dimension axis.

Parameters
  • axis (int) – Features x depth if axis is -1, depth x features if axis is 0. Default: -1.

  • depth (int) – A scalar defining the depth of the one hot dimension. Default: 1.

  • on_value (float) – A scalar defining the value to fill in output[i][j] when indices[j] = i. Default: 1.0.

  • off_value (float) – A scalar defining the value to fill in output[i][j] when indices[j] != i. Default: 0.0.

  • dtype (mindspore.dtype) – Data type of ‘on_value’ and ‘off_value’, not the data type of indices. Default: mindspore.float32.

Inputs:
  • indices (Tensor) - A tensor of indices of data type mindspore.int32 and arbitrary shape.

Outputs:

Tensor, the one-hot tensor of data type ‘dtype’ with dimension at ‘axis’ expanded to ‘depth’ and filled with on_value and off_value.

Examples

>>> net = nn.OneHot(depth=4, axis=1)
>>> indices = Tensor([[1, 3], [0, 2]], dtype=mindspore.int32)
>>> net(indices)
[[[0. 0.]
  [1. 0.]
  [0. 0.]
  [0. 1.]]
 [[1. 0.]
  [0. 0.]
  [0. 1.]
  [0. 0.]]]
class mindspore.nn.Optimizer(learning_rate, parameters, weight_decay=0.0, loss_scale=1.0)[source]

Base class for all optimizers.

Note

This class defines the API to add Ops to train a model. Never use this class directly, but instead instantiate one of its subclasses.

Different parameter groups can set different learning_rate and weight_decay.

When separating parameter groups, the weight decay in each group will be applied on the parameters if the weight_decay is positive. For most optimizers, when not separating parameters, the weight_decay in the API will be applied on the parameters without ‘beta’ or ‘gamma’ in their names if weight_decay is positive.

To improve the performance of parameter groups, a customized order of parameters is supported.

Parameters
  • learning_rate (Union[float, Tensor, Iterable, LearningRateSchedule]) – A value or a graph for the learning rate. When the learning_rate is an Iterable or a 1-D Tensor, a dynamic learning rate is used: the i-th step takes the i-th value as the learning rate. When the learning_rate is a LearningRateSchedule, a dynamic learning rate is used: the i-th learning rate is calculated during training according to the formula of the LearningRateSchedule. When the learning_rate is a float or a 0-D Tensor, a fixed learning rate is used. Other cases are not supported. The float learning rate must be equal to or greater than 0. If the type of learning_rate is int, it will be converted to float.

  • parameters (Union[list[Parameter], list[dict]]) –

    When the parameters is a list of Parameter which will be updated, the element in parameters must be class Parameter. When the parameters is a list of dict, the “params”, “lr”, “weight_decay” and “order_params” are the keys that can be parsed.

    • params: Required. The value must be a list of Parameter.

    • lr: Optional. If “lr” in the keys, the value of corresponding learning rate will be used. If not, the learning_rate in the API will be used.

    • weight_decay: Optional. If “weight_decay” in the keys, the value of corresponding weight decay will be used. If not, the weight_decay in the API will be used.

    • order_params: Optional. If “order_params” is in the keys, the value must be the order of parameters, and this order will be followed in the optimizer. There are no other keys in this dict, and the parameters in the value of ‘order_params’ must be included in one of the group parameters.

  • weight_decay (float) – A floating point value for the weight decay. It must be equal to or greater than 0. If the type of weight_decay input is int, it will be converted to float. Default: 0.0.

  • loss_scale (float) – A floating point value for the loss scale. It must be greater than 0. If the type of loss_scale input is int, it will be converted to float. Default: 1.0.

Raises
  • ValueError – If the learning_rate is a Tensor, but the dimension of tensor is greater than 1.

  • TypeError – If the learning_rate is not any of the three types: float, Tensor, or Iterable.

broadcast_params(optim_result)[source]

Apply Broadcast operations in the sequential order of parameter groups.

Returns

bool, the status flag.

decay_weight(gradients)[source]

Weight decay.

An approach to reduce the overfitting of a deep learning neural network model.

Parameters

gradients (tuple[Tensor]) – The gradients of self.parameters, with the same shape as self.parameters.

Returns

tuple[Tensor], the gradients after weight decay.

get_lr()[source]

Get the learning rate of current step.

Returns

float, the learning rate of current step.

get_lr_parameter(param)[source]

Get the learning rate of parameter.

Parameters

param (Union[Parameter, list[Parameter]]) – The Parameter or list of Parameter.

Returns

Parameter, single Parameter or list[Parameter] according to the input type.

scale_grad(gradients)[source]

Loss scale for mixed precision.

An approach of mixed precision training to improve the speed and energy efficiency of training deep neural network.

Parameters

gradients (tuple[Tensor]) – The gradients of self.parameters, with the same shape as self.parameters.

Returns

tuple[Tensor], the gradients after loss scale.

class mindspore.nn.PReLU(channel=1, w=0.25)[source]

PReLU activation function.

Applies the PReLU function element-wise.

PReLU is defined as: \(prelu(x_i)= \max(0, x_i) + w * \min(0, x_i)\), where \(x_i\) is an element of a channel of the input.

Here \(w\) is a learnable parameter with a default initial value of 0.25. The parameter \(w\) has the dimensionality of the argument channel. If called without the argument channel, a single parameter \(w\) will be shared across all channels.

Parameters
  • channel (int) – The dimension of input. Default: 1.

  • w (float) – The initial value of w. Default: 0.25.

Inputs:
  • input_data (Tensor) - The input of PReLU.

Outputs:

Tensor, with the same type and shape as the input_data.

Examples

>>> input_x = Tensor(np.random.rand(1, 10, 4, 4), mindspore.float32)
>>> prelu = nn.PReLU()
>>> prelu(input_x)
class mindspore.nn.PSNR(max_val=1.0)[source]

Returns Peak Signal-to-Noise Ratio of two image batches.

It produces a PSNR value for each image in batch. Assume inputs are \(I\) and \(K\), both with shape \(h*w\). \(MAX\) represents the dynamic range of pixel values.

\[\begin{split}MSE&=\frac{1}{hw}\sum\limits_{i=0}^{h-1}\sum\limits_{j=0}^{w-1}[I(i,j)-K(i,j)]^2\\ PSNR&=10*log_{10}(\frac{MAX^2}{MSE})\end{split}\]
Parameters

max_val (Union[int, float]) – The dynamic range of the pixel values (255 for 8-bit grayscale images). The value must be greater than 0. Default: 1.0.

Inputs:
  • img1 (Tensor) - The first image batch with format ‘NCHW’. It must be the same shape and dtype as img2.

  • img2 (Tensor) - The second image batch with format ‘NCHW’. It must be the same shape and dtype as img1.

Outputs:

Tensor, with dtype mindspore.float32. It is a 1-D tensor with shape N, where N is the batch num of img1.

Examples

>>> net = nn.PSNR()
>>> img1 = Tensor(np.random.random((1,3,16,16)))
>>> img2 = Tensor(np.random.random((1,3,16,16)))
>>> psnr = net(img1, img2)
class mindspore.nn.Pad(paddings, mode='CONSTANT')[source]

Pads the input tensor according to the paddings and mode.

Parameters
  • paddings (tuple) – The shape of the parameter paddings is (N, 2). N is the rank of the input data. All elements of paddings are of int type. For the D-th dimension of the input, paddings[D, 0] indicates how many values to pad before the D-th dimension of the input tensor, and paddings[D, 1] indicates how many values to pad after it.

  • mode (str) – Specifies padding mode. The optional values are “CONSTANT”, “REFLECT”, “SYMMETRIC”. Default: “CONSTANT”.

Inputs:
  • input_x (Tensor) - The input tensor.

Outputs:

Tensor, the tensor after padding.

  • If mode is “CONSTANT”, it fills the edge with 0, regardless of the values of the input_x. If the input_x is [[1,2,3],[4,5,6],[7,8,9]] and paddings is [[1,1],[2,2]], then the Outputs is [[0,0,0,0,0,0,0],[0,0,1,2,3,0,0],[0,0,4,5,6,0,0],[0,0,7,8,9,0,0],[0,0,0,0,0,0,0]].

  • If mode is “REFLECT”, it uses a way of symmetrical copying through the axis of symmetry to fill in. If the input_x is [[1,2,3],[4,5,6],[7,8,9]] and paddings is [[1,1],[2,2]], then the Outputs is [[6,5,4,5,6,5,4],[3,2,1,2,3,2,1],[6,5,4,5,6,5,4],[9,8,7,8,9,8,7],[6,5,4,5,6,5,4]].

  • If mode is “SYMMETRIC”, the filling method is similar to the “REFLECT”. It is also copied according to the symmetry axis, except that it includes the symmetry axis. If the input_x is [[1,2,3],[4,5,6],[7,8,9]] and paddings is [[1,1],[2,2]], then the Outputs is [[2,1,1,2,3,3,2],[2,1,1,2,3,3,2],[5,4,4,5,6,6,5],[8,7,7,8,9,9,8],[8,7,7,8,9,9,8]].

Examples

>>> from mindspore import Tensor
>>> import mindspore.nn as nn
>>> import numpy as np
>>> class Net(nn.Cell):
>>>     def __init__(self):
>>>         super(Net, self).__init__()
>>>         self.pad = nn.Pad(paddings=((1,1),(2,2)), mode="CONSTANT")
>>>     def construct(self, x):
>>>         return self.pad(x)
>>> x = np.random.random(size=(2, 3)).astype(np.float32)
>>> pad = Net()
>>> ms_output = pad(Tensor(x))
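The “REFLECT” and “SYMMETRIC” modes can be checked against the 3x3 example given in the mode descriptions above. A minimal sketch reusing the imports from the example:

>>> x = Tensor(np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]).astype(np.float32))
>>> reflect_pad = nn.Pad(paddings=((1, 1), (2, 2)), mode="REFLECT")
>>> reflect_pad(x)    # first row: [6, 5, 4, 5, 6, 5, 4]
>>> symmetric_pad = nn.Pad(paddings=((1, 1), (2, 2)), mode="SYMMETRIC")
>>> symmetric_pad(x)  # first row: [2, 1, 1, 2, 3, 3, 2]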
class mindspore.nn.ParameterUpdate(param)[source]

Cell that updates parameters.

With this Cell, one can manually update param with the input Tensor.

Parameters

param (Parameter) – The parameter to be updated manually.

Raises

KeyError – If parameter with the specified name does not exist.

Examples

>>> network = Net()
>>> param = network.parameters_dict()['learning_rate']
>>> update = nn.ParameterUpdate(param)
>>> update.phase = "update_param"
>>> lr = Tensor(0.001, mindspore.float32)
>>> update(lr)
class mindspore.nn.PolynomialDecayLR(learning_rate, end_learning_rate, decay_steps, power, update_decay_steps=False)[source]

Calculates the learning rate based on the polynomial decay function.

For the i-th step, the formula of computing decayed_learning_rate[i] is:

\[decayed\_learning\_rate[i] = (learning\_rate - end\_learning\_rate) * (1 - tmp\_step / tmp\_decay\_steps)^{power} + end\_learning\_rate\]

Where :

\[tmp\_step=min(current\_step, decay\_steps)\]

If update_decay_steps is true, the value of tmp_decay_steps is updated every decay_steps steps. The formula is:

\[tmp\_decay\_steps = decay\_steps * ceil(current\_step / decay\_steps)\]
Parameters
  • learning_rate (float) – The initial value of learning rate.

  • end_learning_rate (float) – The end value of learning rate.

  • decay_steps (int) – A value used to calculate decayed learning rate.

  • power (float) – A value used to calculate decayed learning rate. This parameter must be greater than 0.

  • update_decay_steps (bool) – If true, the learning rate is decayed once every decay_steps steps. Default: False.

Inputs:

Tensor. The current step number.

Returns

Tensor. The learning rate value for the current step.

Examples

>>> learning_rate = 0.1
>>> end_learning_rate = 0.01
>>> decay_steps = 4
>>> power = 0.5
>>> global_step = Tensor(2, mstype.int32)
>>> polynomial_decay_lr = PolynomialDecayLR(learning_rate, end_learning_rate, decay_steps, power)
>>> polynomial_decay_lr(global_step)
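Here \(tmp\_step = min(2, 4) = 2\), so the returned learning rate is \((0.1 - 0.01) * (1 - 2/4)^{0.5} + 0.01 \approx 0.0736\).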
class mindspore.nn.Precision(eval_type='classification')[source]

Calculates precision for classification and multilabel data.

The precision function creates two local variables, \(\text{true_positive}\) and \(\text{false_positive}\), that are used to compute the precision. This value is ultimately returned as the precision, an idempotent operation that simply divides \(\text{true_positive}\) by the sum of \(\text{true_positive}\) and \(\text{false_positive}\).

\[\text{precision} = \frac{\text{true_positive}}{\text{true_positive} + \text{false_positive}}\]

Note

In the multi-label cases, the elements of \(y\) and \(y_{pred}\) must be 0 or 1.

Parameters

eval_type (str) – Metric to calculate accuracy over a dataset, for classification or multilabel. Default: ‘classification’.

Examples

>>> x = Tensor(np.array([[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]]))
>>> y = Tensor(np.array([1, 0, 1]))
>>> metric = nn.Precision('classification')
>>> metric.clear()
>>> metric.update(x, y)
>>> precision = metric.eval()
clear()[source]

Clears the internal evaluation result.

eval(average=False)[source]

Computes the precision.

Parameters

average (bool) – Specifies whether to calculate the average precision. Default: False.

Returns

Float, the computed result.

update(*inputs)[source]

Updates the internal evaluation result with y_pred and y.

Parameters

inputs – Input y_pred and y. y_pred and y are Tensor, list or numpy.ndarray. For ‘classification’ evaluation type, y_pred is in most cases (not strictly) a list of floating numbers in range \([0, 1]\) and the shape is \((N, C)\), where \(N\) is the number of cases and \(C\) is the number of categories. Shape of y can be \((N, C)\) with values 0 and 1 if one-hot encoding is used or the shape is \((N,)\) with integer values if index of category is used. For ‘multilabel’ evaluation type, y_pred and y can only be one-hot encoding with values 0 or 1. Indices with 1 indicate positive category. The shape of y_pred and y are both \((N, C)\).

Raises

ValueError – If the number of inputs is not 2.

class mindspore.nn.ProximalAdagrad(params, accum=0.1, learning_rate=0.001, l1=0.0, l2=0.0, use_locking=False, loss_scale=1.0, weight_decay=0.0)[source]

Implements the ProximalAdagrad algorithm with the ApplyProximalAdagrad operator.

ProximalAdagrad is an algorithm for online learning and stochastic optimization. Refer to the paper Efficient Learning using Forward-Backward Splitting.

Note

When separating parameter groups, the weight decay in each group will be applied on the parameters if the weight decay is positive. When not separating parameter groups, the weight_decay in the API will be applied on the parameters without ‘beta’ or ‘gamma’ in their names if weight_decay is positive.

To improve the performance of parameter groups, a customized order of parameters is supported.

The sparse strategy is applied while the SparseGatherV2 operator is used for the forward network. The sparse feature is under continuous development. The sparse behavior is currently performed on the CPU.

Parameters
  • params (Union[list[Parameter], list[dict]]) –

    When the params is a list of Parameter which will be updated, the element in params must be class Parameter. When the params is a list of dict, the “params”, “lr”, “weight_decay” and “order_params” are the keys that can be parsed.

    • params: Required. The value must be a list of Parameter.

    • lr: Optional. If “lr” in the keys, the value of corresponding learning rate will be used. If not, the learning_rate in the API will be used.

    • weight_decay: Optional. If “weight_decay” in the keys, the value of corresponding weight decay will be used. If not, the weight_decay in the API will be used.

    • order_params: Optional. If “order_params” is in the keys, the value must be the order of parameters, and this order will be followed in the optimizer. There are no other keys in this dict, and the parameters in the value of ‘order_params’ must be included in one of the group parameters.

  • accum (float) – The starting value for accumulators, must be zero or positive values. Default: 0.1.

  • learning_rate (Union[float, Tensor, Iterable, LearningRateSchedule]) – A value or a graph for the learning rate. When the learning_rate is an Iterable or a 1-D Tensor, a dynamic learning rate is used: the i-th step takes the i-th value as the learning rate. When the learning_rate is a LearningRateSchedule, a dynamic learning rate is used: the i-th learning rate is calculated during training according to the formula of the LearningRateSchedule. When the learning_rate is a float or a 0-D Tensor, a fixed learning rate is used. Other cases are not supported. The float learning rate must be equal to or greater than 0. If the type of learning_rate is int, it will be converted to float. Default: 0.001.

  • l1 (float) – l1 regularization strength, must be greater than or equal to zero. Default: 0.0.

  • l2 (float) – l2 regularization strength, must be greater than or equal to zero. Default: 0.0.

  • use_locking (bool) – If true, use locks for updating operation. Default: False.

  • loss_scale (float) – Value for the loss scale. It must be greater than 0.0. Default: 1.0.

  • weight_decay (float) – Weight decay value to multiply weight, must be zero or positive value. Default: 0.0.

Inputs:
  • grads (tuple[Tensor]) - The gradients of params in the optimizer, the shape is the same as the params in optimizer.

Outputs:

Tensor[bool], the value is True.

Examples

>>> net = Net()
>>> #1) All parameters use the same learning rate and weight decay
>>> optim = nn.ProximalAdagrad(params=net.trainable_params())
>>>
>>> #2) Use parameter groups and set different values
>>> conv_params = list(filter(lambda x: 'conv' in x.name, net.trainable_params()))
>>> no_conv_params = list(filter(lambda x: 'conv' not in x.name, net.trainable_params()))
>>> group_params = [{'params': conv_params, 'weight_decay': 0.01},
>>>                 {'params': no_conv_params, 'lr': 0.01},
>>>                 {'order_params': net.trainable_params()}]
>>> optim = nn.ProximalAdagrad(group_params, learning_rate=0.1, weight_decay=0.0)
>>> # The conv_params's parameters will use default learning rate of 0.1 and weight decay of 0.01.
>>> # The no_conv_params's parameters will use learning rate of 0.01 and default weight decay of 0.0.
>>> # The final parameters order in which the optimizer will be followed is the value of 'order_params'.
>>>
>>> loss = nn.SoftmaxCrossEntropyWithLogits()
>>> model = Model(net, loss_fn=loss, optimizer=optim)
class mindspore.nn.RMSProp(params, learning_rate=0.1, decay=0.9, momentum=0.0, epsilon=1e-10, use_locking=False, centered=False, loss_scale=1.0, weight_decay=0.0)[source]

Implements Root Mean Squared Propagation (RMSProp) algorithm.

Note

When separating parameter groups, the weight decay in each group will be applied on the parameters if the weight decay is positive. When not separating parameter groups, the weight_decay in the API will be applied on the parameters without ‘beta’ or ‘gamma’ in their names if weight_decay is positive.

To improve the performance of parameter groups, a customized order of parameters is supported.

Update params according to the RMSProp algorithm.

The equation is as follows:

\[s_{t} = \rho s_{t-1} + (1 - \rho)(\nabla Q_{i}(w))^2\]
\[m_{t} = \beta m_{t-1} + \frac{\eta} {\sqrt{s_{t} + \epsilon}} \nabla Q_{i}(w)\]
\[w = w - m_{t}\]

The first equation calculates the moving average of the squared gradient for each weight. The gradient is then divided by \(\sqrt{s_{t} + \epsilon}\).

if centered is True:

\[g_{t} = \rho g_{t-1} + (1 - \rho)\nabla Q_{i}(w)\]
\[s_{t} = \rho s_{t-1} + (1 - \rho)(\nabla Q_{i}(w))^2\]
\[m_{t} = \beta m_{t-1} + \frac{\eta} {\sqrt{s_{t} - g_{t}^2 + \epsilon}} \nabla Q_{i}(w)\]
\[w = w - m_{t}\]

where \(w\) represents params, which will be updated. \(g_{t}\) is the mean gradients and \(g_{t-1}\) is its last moment. \(s_{t}\) is the mean squared gradients and \(s_{t-1}\) is its last moment. \(m_{t}\) is the moment, the delta of \(w\), and \(m_{t-1}\) is its last moment. \(\rho\) represents decay. \(\beta\) is the momentum term, representing momentum. \(\epsilon\) is a smoothing term to avoid division by zero, representing epsilon. \(\eta\) is the learning rate, representing learning_rate. \(\nabla Q_{i}(w)\) represents the gradients.

Parameters
  • params (Union[list[Parameter], list[dict]]) –

    When the params is a list of Parameter which will be updated, the elements in params must be of class Parameter. When the params is a list of dict, “params”, “lr”, “weight_decay” and “order_params” are the keys that can be parsed.

    • params: Required. The value must be a list of Parameter.

    • lr: Optional. If “lr” is in the keys, the corresponding learning rate value will be used. If not, the learning_rate in the API will be used.

    • weight_decay: Optional. If “weight_decay” is in the keys, the corresponding weight decay value will be used. If not, the weight_decay in the API will be used.

    • order_params: Optional. If “order_params” is in the keys, the value must be the order of parameters, and this order will be followed in the optimizer. The dict must contain no other keys, and the parameters in the value of ‘order_params’ must be in one of the group parameters.

  • learning_rate (Union[float, Tensor, Iterable, LearningRateSchedule]) – A value or a graph for the learning rate. When the learning_rate is an Iterable or a one-dimensional Tensor, a dynamic learning rate is used: the i-th step takes the i-th value as the learning rate. When the learning_rate is a LearningRateSchedule, a dynamic learning rate is used: the i-th learning rate is calculated during training according to the formula of the LearningRateSchedule. When the learning_rate is a float or a zero-dimensional Tensor, a fixed learning rate is used. Other cases are not supported. The float learning rate must be equal to or greater than 0. If the type of learning_rate is int, it will be converted to float. Default: 0.1.

  • decay (float) – Decay rate. Should be equal to or greater than 0. Default: 0.9.

  • momentum (float) – Hyperparameter of type float, means momentum for the moving average. Should be equal to or greater than 0. Default: 0.0.

  • epsilon (float) – Term added to the denominator to improve numerical stability. Should be greater than 0. Default: 1e-10.

  • use_locking (bool) – Whether to enable a lock to protect the variable and accumulation tensors from being updated. Default: False.

  • centered (bool) – If true, gradients are normalized by the estimated variance of the gradient. Default: False.

  • loss_scale (float) – A floating point value for the loss scale. Should be greater than 0. Default: 1.0.

  • weight_decay (float) – Weight decay (L2 penalty). Should be equal to or greater than 0. Default: 0.0.

Inputs:
  • gradients (tuple[Tensor]) - The gradients of params, the shape is the same as params.

Outputs:

Tensor[bool], the value is True.

Examples

>>> net = Net()
>>> #1) All parameters use the same learning rate and weight decay
>>> optim = nn.RMSProp(params=net.trainable_params(), learning_rate=lr)
>>>
>>> #2) Use parameter groups and set different values
>>> conv_params = list(filter(lambda x: 'conv' in x.name, net.trainable_params()))
>>> no_conv_params = list(filter(lambda x: 'conv' not in x.name, net.trainable_params()))
>>> group_params = [{'params': conv_params, 'weight_decay': 0.01},
>>>                 {'params': no_conv_params, 'lr': 0.01},
>>>                 {'order_params': net.trainable_params()}]
>>> optim = nn.RMSProp(group_params, learning_rate=0.1, weight_decay=0.0)
>>> # The conv_params's parameters will use a learning rate of default value 0.1 and a weight decay of 0.01.
>>> # The no_conv_params's parameters will use a learning rate of 0.01 and a weight decay of default value 0.0.
>>> # The final order of parameters that the optimizer follows is the value of 'order_params'.
>>>
>>> loss = nn.SoftmaxCrossEntropyWithLogits()
>>> model = Model(net, loss_fn=loss, optimizer=optim)
class mindspore.nn.Range(start, limit=None, delta=1)[source]

Creates a sequence of numbers.

Parameters
  • start (Union[int, float]) – If limit is None, this value acts as the upper limit of the range and the first entry defaults to 0. Otherwise, it acts as the first entry of the range.

  • limit (Union[int, float]) – Acts as the upper limit of the sequence. If None, it defaults to the value of start and the first entry of the range is set to 0. It cannot be equal to start. Default: None.

  • delta (Union[int, float]) – Increment of the range. It cannot be zero. Default: 1.

Outputs:

Tensor, the dtype is int if the dtypes of start, limit and delta are all int. Otherwise, the dtype is float.

Examples

>>> net = nn.Range(1, 8, 2)
>>> out = net()
[1, 3, 5, 7]
class mindspore.nn.ReLU[source]

Rectified Linear Unit activation function.

Applies the rectified linear unit function element-wise. It returns element-wise \(\max(0, x)\); in particular, neurons with negative outputs are suppressed while active neurons stay unchanged.

Inputs:
  • input_data (Tensor) - The input of ReLU.

Outputs:

Tensor, with the same type and shape as the input_data.

Examples

>>> input_x = Tensor(np.array([-1, 2, -3, 2, -1]), mindspore.float16)
>>> relu = nn.ReLU()
>>> relu(input_x)
[0.  2.  0.  2.  0.]
class mindspore.nn.ReLU6[source]

Compute ReLU6 activation function.

ReLU6 is similar to ReLU but with an upper limit of 6: inputs greater than 6 are clipped to 6. It computes element-wise as \(\min(\max(0, x), 6)\). The input is a Tensor of any valid shape.

Inputs:
  • input_data (Tensor) - The input of ReLU6.

Outputs:

Tensor, which has the same type as input_data.

Examples

>>> input_x = Tensor(np.array([-1, -2, 0, 2, 1]), mindspore.float16)
>>> relu6 = nn.ReLU6()
>>> relu6(input_x)
[0.  0.  0.  2.  1.]
class mindspore.nn.Recall(eval_type='classification')[source]

Calculates recall for classification and multilabel data.

The recall class creates two local variables, \(\text{true_positive}\) and \(\text{false_negative}\), that are used to compute the recall. This value is ultimately returned as the recall, an idempotent operation that simply divides \(\text{true_positive}\) by the sum of \(\text{true_positive}\) and \(\text{false_negative}\).

\[\text{recall} = \frac{\text{true_positive}}{\text{true_positive} + \text{false_negative}}\]

Note

In the multi-label cases, the elements of \(y\) and \(y_{pred}\) must be 0 or 1.
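
For intuition, the recall formula above can be reproduced with plain NumPy for a single class of the ‘classification’ case (a sketch of the definition only, not of the class internals):

>>> import numpy as np
>>> y_pred = np.array([[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]])
>>> y = np.array([1, 0, 1])
>>> pred = y_pred.argmax(axis=1)         # predicted category indices: [1, 0, 0]
>>> tp = np.sum((pred == 1) & (y == 1))  # true positives for class 1
>>> fn = np.sum((pred != 1) & (y == 1))  # false negatives for class 1
>>> tp / (tp + fn)                       # recall of class 1 -> 0.5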

Parameters

eval_type (str) – Metric to calculate the recall over a dataset, for classification or multilabel. Default: ‘classification’.

Examples

>>> x = Tensor(np.array([[0.2, 0.5], [0.3, 0.1], [0.9, 0.6]]))
>>> y = Tensor(np.array([1, 0, 1]))
>>> metric = nn.Recall('classification')
>>> metric.clear()
>>> metric.update(x, y)
>>> recall = metric.eval()
clear()[source]

Clears the internal evaluation result.

eval(average=False)[source]

Computes the recall.

Parameters

average (bool) – Specifies whether to calculate the average recall. Default: False.

Returns

Float, the computed result.

update(*inputs)[source]

Updates the internal evaluation result with y_pred and y.

Parameters

inputs – Input y_pred and y. y_pred and y are a Tensor, a list or an array. For ‘classification’ evaluation type, y_pred is in most cases (not strictly) a list of floating numbers in range \([0, 1]\) and the shape is \((N, C)\), where \(N\) is the number of cases and \(C\) is the number of categories. Shape of y can be \((N, C)\) with values 0 and 1 if one-hot encoding is used or the shape is \((N,)\) with integer values if index of category is used. For ‘multilabel’ evaluation type, y_pred and y can only be one-hot encoding with values 0 or 1. Indices with 1 indicate positive category. The shape of y_pred and y are both \((N, C)\).

Raises

ValueError – If the number of inputs is not 2.

class mindspore.nn.ReduceLogSumExp(axis, keep_dims=False)[source]

Reduces a dimension of a tensor by taking the exponential of all elements along that dimension and then computing the logarithm of their sum.

The dtype of the tensor to be reduced is number.
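
Conceptually, the operation is equivalent to the following NumPy expression (an illustrative sketch, not the actual kernel):

>>> import numpy as np
>>> x = np.random.randn(3, 4, 5, 6).astype(np.float32)
>>> out = np.log(np.exp(x).sum(axis=1, keepdims=True))  # axis=1, keep_dims=True
>>> out.shape
(3, 1, 5, 6)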

Parameters
  • axis (Union[int, tuple(int), list(int)]) – The dimensions to reduce. If set to (), all dimensions are reduced. Only constant value is allowed.

  • keep_dims (bool) – If True, keep these reduced dimensions and the length is 1. If False, don’t keep these dimensions. Default: False.

Inputs:
  • input_x (Tensor[Number]) - The input tensor. With float16 or float32 data type.

Outputs:

Tensor, has the same dtype as the input_x.

  • If axis is (), and keep_dims is False, the output is a 0-D tensor representing the sum of all elements in the input tensor.

  • If axis is int, set as 2, and keep_dims is False, the shape of output is \((x_1, x_3, ..., x_R)\).

  • If axis is tuple(int), set as (2, 3), and keep_dims is False, the shape of output is \((x_1, x_4, ..., x_R)\).

Examples

>>> input_x = Tensor(np.random.randn(3, 4, 5, 6).astype(np.float32))
>>> op = nn.ReduceLogSumExp(1, keep_dims=True)
>>> output = op(input_x)
>>> output.shape
(3, 1, 5, 6)
class mindspore.nn.SGD(params, learning_rate=0.1, momentum=0.0, dampening=0.0, weight_decay=0.0, nesterov=False, loss_scale=1.0)[source]

Implements stochastic gradient descent (optionally with momentum).

Introduction to SGD can be found at https://en.wikipedia.org/wiki/Stochastic_gradient_descent. Nesterov momentum is based on the formula from paper On the importance of initialization and momentum in deep learning.

Note

When separating parameter groups, the weight decay in each group will be applied on the parameters if the weight decay is positive. When not separating parameter groups, the weight_decay in the API will be applied on the parameters without ‘beta’ or ‘gamma’ in their names if weight_decay is positive.

To improve the performance of parameter groups, a customized order of parameters is supported.

\[v_{t+1} = u \ast v_{t} + gradient \ast (1-dampening)\]
If nesterov is True:
\[p_{t+1} = p_{t} - lr \ast (gradient + u \ast v_{t+1})\]
If nesterov is False:
\[p_{t+1} = p_{t} - lr \ast v_{t+1}\]

Note that for the first step, \(v_{t+1} = gradient\).

Here \(p\), \(v\) and \(u\) denote the parameters, accum and momentum, respectively.
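
The update rule can be mirrored in a few lines of NumPy (an illustrative sketch under the notation above; all names are ours, not the API's):

>>> import numpy as np
>>> def sgd_step(p, grad, v, lr=0.1, u=0.0, dampening=0.0, nesterov=False, first_step=False):
>>>     v = grad if first_step else u * v + grad * (1 - dampening)  # v_{t+1}
>>>     if nesterov:
>>>         return p - lr * (grad + u * v), v  # p_{t+1} = p_t - lr*(grad + u*v_{t+1})
>>>     return p - lr * v, v                   # p_{t+1} = p_t - lr*v_{t+1}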

Parameters
  • params (Union[list[Parameter], list[dict]]) –

    When the params is a list of Parameter which will be updated, the elements in params must be of class Parameter. When the params is a list of dict, “params”, “lr”, “weight_decay” and “order_params” are the keys that can be parsed.

    • params: Required. The value must be a list of Parameter.

    • lr: Optional. If “lr” is in the keys, the corresponding learning rate value will be used. If not, the learning_rate in the API will be used.

    • weight_decay: Optional. If “weight_decay” is in the keys, the corresponding weight decay value will be used. If not, the weight_decay in the API will be used.

    • order_params: Optional. If “order_params” is in the keys, the value must be the order of parameters, and this order will be followed in the optimizer. The dict must contain no other keys, and the parameters in the value of ‘order_params’ must be in one of the group parameters.

  • learning_rate (Union[float, Tensor, Iterable, LearningRateSchedule]) – A value or a graph for the learning rate. When the learning_rate is an Iterable or a one-dimensional Tensor, a dynamic learning rate is used: the i-th step takes the i-th value as the learning rate. When the learning_rate is a LearningRateSchedule, a dynamic learning rate is used: the i-th learning rate is calculated during training according to the formula of the LearningRateSchedule. When the learning_rate is a float or a zero-dimensional Tensor, a fixed learning rate is used. Other cases are not supported. The float learning rate must be equal to or greater than 0. If the type of learning_rate is int, it will be converted to float. Default: 0.1.

  • momentum (float) – A floating point value for the momentum. It must be at least 0.0. Default: 0.0.

  • dampening (float) – A floating point value of dampening for momentum. It must be at least 0.0. Default: 0.0.

  • weight_decay (float) – Weight decay (L2 penalty). It must be equal to or greater than 0. Default: 0.0.

  • nesterov (bool) – Enables the Nesterov momentum. If nesterov is used, momentum must be positive and dampening must be 0.0. Default: False.

  • loss_scale (float) – A floating point value for the loss scale, which must be larger than 0.0. Default: 1.0.

Inputs:
  • gradients (tuple[Tensor]) - The gradients of params, the shape is the same as params.

Outputs:

Tensor[bool], the value is True.

Raises

ValueError – If the momentum, dampening or weight_decay value is less than 0.0.

Examples

>>> net = Net()
>>> #1) All parameters use the same learning rate and weight decay
>>> optim = nn.SGD(params=net.trainable_params())
>>>
>>> #2) Use parameter groups and set different values
>>> conv_params = list(filter(lambda x: 'conv' in x.name, net.trainable_params()))
>>> no_conv_params = list(filter(lambda x: 'conv' not in x.name, net.trainable_params()))
>>> group_params = [{'params': conv_params, 'weight_decay': 0.01},
>>>                 {'params': no_conv_params, 'lr': 0.01},
>>>                 {'order_params': net.trainable_params()}]
>>> optim = nn.SGD(group_params, learning_rate=0.1, weight_decay=0.0)
>>> # The conv_params's parameters will use a learning rate of default value 0.1 and a weight decay of 0.01.
>>> # The no_conv_params's parameters will use a learning rate of 0.01 and a weight decay of default value 0.0.
>>> # The final order of parameters that the optimizer follows is the value of 'order_params'.
>>>
>>> loss = nn.SoftmaxCrossEntropyWithLogits()
>>> model = Model(net, loss_fn=loss, optimizer=optim)
class mindspore.nn.SSIM(max_val=1.0, filter_size=11, filter_sigma=1.5, k1=0.01, k2=0.03)[source]

Returns SSIM index between img1 and img2.

Its implementation is based on Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality assessment: from error visibility to structural similarity. IEEE transactions on image processing.

\[\begin{split}l(x,y)&=\frac{2\mu_x\mu_y+C_1}{\mu_x^2+\mu_y^2+C_1}, C_1=(K_1L)^2.\\ c(x,y)&=\frac{2\sigma_x\sigma_y+C_2}{\sigma_x^2+\sigma_y^2+C_2}, C_2=(K_2L)^2.\\ s(x,y)&=\frac{\sigma_{xy}+C_3}{\sigma_x\sigma_y+C_3}, C_3=C_2/2.\\ SSIM(x,y)&=l*c*s\\&=\frac{(2\mu_x\mu_y+C_1)(2\sigma_{xy}+C_2)}{(\mu_x^2+\mu_y^2+C_1)(\sigma_x^2+\sigma_y^2+C_2)}.\end{split}\]
Parameters
  • max_val (Union[int, float]) – The dynamic range of the pixel values (255 for 8-bit grayscale images). Default: 1.0.

  • filter_size (int) – The size of the Gaussian filter. Default: 11.

  • filter_sigma (float) – The standard deviation of Gaussian kernel. Default: 1.5.

  • k1 (float) – The constant used to generate c1 in the luminance comparison function. Default: 0.01.

  • k2 (float) – The constant used to generate c2 in the contrast comparison function. Default: 0.03.

Inputs:
  • img1 (Tensor) - The first image batch with format ‘NCHW’. It must be the same shape and dtype as img2.

  • img2 (Tensor) - The second image batch with format ‘NCHW’. It must be the same shape and dtype as img1.

Outputs:

Tensor, has the same dtype as img1. It is a 1-D tensor with shape \((N,)\), where \(N\) is the batch number of img1.

Examples

>>> net = nn.SSIM()
>>> img1 = Tensor(np.random.random((1,3,16,16)))
>>> img2 = Tensor(np.random.random((1,3,16,16)))
>>> ssim = net(img1, img2)
class mindspore.nn.SequentialCell(*args)[source]

Sequential cell container.

A list of Cells will be added to it in the order they are passed in the constructor. Alternatively, an ordered dict of cells can also be passed in.

Parameters

args (list, OrderedDict) – A list of Cell instances, or an OrderedDict of Cell instances.

Raises

TypeError – If the type of the argument is not list or OrderedDict.

Inputs:
  • input (Tensor) - Tensor with shape according to the first Cell in the sequence.

Outputs:

Tensor, the output Tensor with shape depending on the input and defined sequence of Cells.

Examples

>>> conv = nn.Conv2d(3, 2, 3, pad_mode='valid')
>>> bn = nn.BatchNorm2d(2)
>>> relu = nn.ReLU()
>>> seq = nn.SequentialCell([conv, bn, relu])
>>>
>>> x = Tensor(np.random.random((1, 3, 4, 4)), dtype=mindspore.float32)
>>> seq(x)
[[[[0.02531557 0.        ]
   [0.04933941 0.04880078]]
  [[0.         0.        ]
   [0.         0.        ]]]]
append(cell)[source]

Appends a given cell to the end of the list.

Examples

>>> conv = nn.Conv2d(3, 2, 3, pad_mode='valid')
>>> bn = nn.BatchNorm2d(2)
>>> relu = nn.ReLU()
>>> seq = nn.SequentialCell([conv, bn])
>>> seq.append(relu)
>>> x = Tensor(np.ones([1, 3, 4, 4]), dtype=mindspore.float32)
>>> seq(x)
[[[[0.12445523 0.12445523]
   [0.12445523 0.12445523]]
  [[0.         0.        ]
   [0.         0.        ]]]]
class mindspore.nn.Sigmoid[source]

Sigmoid activation function.

Applies sigmoid-type activation element-wise.

Sigmoid function is defined as: \(\text{sigmoid}(x_i) = \frac{1}{1 + \exp(-x_i)}\), where \(x_i\) is the element of the input.

Inputs:
  • input_data (Tensor) - The input of Sigmoid.

Outputs:

Tensor, with the same type and shape as the input_data.

Examples

>>> input_x = Tensor(np.array([-1, -2, 0, 2, 1]), mindspore.float16)
>>> sigmoid = nn.Sigmoid()
>>> sigmoid(input_x)
[0.2688  0.11914  0.5  0.881  0.7305]
class mindspore.nn.SmoothL1Loss(beta=1.0)[source]

A loss class for learning region proposals.

SmoothL1Loss can be regarded as a modified version of L1Loss, or as a combination of L1Loss and L2Loss. L1Loss computes the element-wise absolute difference between two input Tensors, while L2Loss computes the element-wise squared difference. L2Loss often leads to faster convergence but is less robust to outliers.

Given two inputs \(x,\ y\) of length \(N\), the unreduced SmoothL1Loss can be described as follows:

\[\begin{split}L_{i} = \begin{cases} \frac{0.5 (x_i - y_i)^{2}}{\text{beta}}, & \text{if } |x_i - y_i| < \text{beta} \\ |x_i - y_i| - 0.5 \text{beta}, & \text{otherwise. } \end{cases}\end{split}\]

Here \(\text{beta}\) controls the point where the loss function changes from quadratic to linear. Its default value is 1.0. \(N\) is the batch size. This function returns an unreduced loss Tensor.
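
The piecewise definition translates directly to NumPy; this sketch (ours, for illustration) reproduces the unreduced loss:

>>> import numpy as np
>>> def smooth_l1(x, y, beta=1.0):
>>>     diff = np.abs(x - y)
>>>     # quadratic branch inside |x - y| < beta, linear branch outside
>>>     return np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)
>>> smooth_l1(np.array([1., 2., 3.]), np.array([1., 2., 2.]))  # -> [0., 0., 0.5]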

Parameters

beta (float) – A parameter used to control the point where the function will change from quadratic to linear. Default: 1.0.

Inputs:
  • input_data (Tensor) - Tensor of shape \((x_1, x_2, ..., x_R)\).

  • target_data (Tensor) - Tensor of shape \((y_1, y_2, ..., y_S)\).

Outputs:

Tensor, loss float tensor.

Examples

>>> loss = nn.SmoothL1Loss()
>>> input_data = Tensor(np.array([1, 2, 3]), mindspore.float32)
>>> target_data = Tensor(np.array([1, 2, 2]), mindspore.float32)
>>> loss(input_data, target_data)
class mindspore.nn.Softmax(axis=- 1)[source]

Softmax activation function.

Applies the Softmax function to an n-dimensional input Tensor.

The input logits are transformed with the exponential function and then normalized so that the outputs lie in the range [0, 1] and sum to 1.

Softmax is defined as:

\[\text{softmax}(x_{i}) = \frac{\exp(x_i)}{\sum_{j=0}^{n-1}\exp(x_j)},\]

where \(x_{i}\) is the \(i\)-th slice in the given dimension of the input Tensor.
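
A numerically stable NumPy equivalent of this formula over the last axis (an illustrative sketch, not the operator implementation):

>>> import numpy as np
>>> x = np.array([-1, -2, 0, 2, 1], np.float32)
>>> e = np.exp(x - x.max())  # subtracting the max does not change the result but avoids overflow
>>> e / e.sum()              # -> approx. [0.0317, 0.0117, 0.0861, 0.6364, 0.2341]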

Parameters

axis (Union[int, tuple[int]]) – The axis to apply Softmax operation, -1 means the last dimension. Default: -1.

Inputs:
  • x (Tensor) - The input of Softmax.

Outputs:

Tensor, which has the same type and shape as x, with values in the range [0, 1].

Examples

>>> input_x = Tensor(np.array([-1, -2, 0, 2, 1]), mindspore.float16)
>>> softmax = nn.Softmax()
>>> softmax(input_x)
[0.03168  0.01166  0.0861  0.636  0.2341]
class mindspore.nn.SoftmaxCrossEntropyWithLogits(sparse=False, reduction='none')[source]

Computes softmax cross entropy between logits and labels.

Measures the distribution error between the probabilities of the input (computed with softmax function) and the target where the classes are mutually exclusive (only one class is positive) using cross entropy loss.

Typical input into this function is unnormalized scores and target of each class. Scores Tensor \(x\) is of shape \((N, C)\) and target Tensor \(t\) is a Tensor of shape \((N, C)\) which contains one-hot labels of length \(C\).

For each instance \(N_i\), the loss is given as:

\[\ell(x_i, t_i) = - \log\left(\frac{\exp(x_{t_i})}{\sum_j \exp(x_j)}\right) = -x_{t_i} + \log\left(\sum_j \exp(x_j)\right),\]

where \(x_i\) is a 1D score Tensor, \(t_i\) is a scalar.
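
As a NumPy illustration of this per-instance loss with a sparse integer label \(t\) (a sketch, not the fused operator):

>>> import numpy as np
>>> def softmax_ce(x, t):
>>>     # -x_t + log(sum_j exp(x_j)), stabilized with the log-sum-exp trick
>>>     return -x[t] + np.log(np.exp(x - x.max()).sum()) + x.max()
>>> softmax_ce(np.array([2.0, 1.0, 0.1]), 0)  # -> approx. 0.417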

Note

While the target classes are mutually exclusive, i.e., only one class is positive in the target, the predicted probabilities need not be exclusive. It is only required that the predicted probability distribution of each entry is a valid one.

Parameters
  • sparse (bool) – Specifies whether labels use sparse format or not. Default: False.

  • reduction (str) – Type of reduction to be applied to loss. The optional values are “mean”, “sum”, and “none”. If “none”, do not perform reduction. Default: “none”.

Inputs:
  • logits (Tensor) - Tensor of shape (N, C).

  • labels (Tensor) - Tensor of shape (N, ). If sparse is True, the type of labels is mindspore.int32. If sparse is False, the type of labels is the same as the type of logits.

Outputs:

Tensor, a tensor of the same shape as logits with the component-wise logistic losses.

Examples

>>> loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True)
>>> logits = Tensor(np.random.randint(0, 9, [1, 10]), mindspore.float32)
>>> labels_np = np.ones([1,]).astype(np.int32)
>>> labels = Tensor(labels_np)
>>> loss(logits, labels)
class mindspore.nn.SparseToDense[source]

Converts a sparse tensor into a dense tensor.

Not yet supported by any backend at the moment.

Parameters

sparse_tensor (SparseTensor) – the sparse tensor to convert.

Returns

Tensor, the tensor converted.

Examples

>>> class SparseToDenseCell(nn.Cell):
>>>     def __init__(self, dense_shape):
>>>         super(SparseToDenseCell, self).__init__()
>>>         self.dense_shape = dense_shape
>>>         self.sparse_to_dense = nn.SparseToDense()
>>>     def construct(self, indices, values):
>>>         sparse = SparseTensor(indices, values, self.dense_shape)
>>>         return self.sparse_to_dense(sparse)
>>>
>>> indices = Tensor([[0, 1], [1, 2]])
>>> values = Tensor([1, 2], dtype=mindspore.float32)
>>> dense_shape = (3, 4)
>>> SparseToDenseCell(dense_shape)(indices, values)
class mindspore.nn.Tanh[source]

Tanh activation function.

Applies the Tanh function element-wise and returns a new tensor with the hyperbolic tangent of the input elements. The input is a Tensor of any valid shape.

Tanh function is defined as:

\[tanh(x_i) = \frac{\exp(x_i) - \exp(-x_i)}{\exp(x_i) + \exp(-x_i)} = \frac{\exp(2x_i) - 1}{\exp(2x_i) + 1},\]

where \(x_i\) is an element of the input Tensor.

Inputs:
  • input_data (Tensor) - The input of Tanh.

Outputs:

Tensor, with the same type and shape as the input_data.

Examples

>>> input_x = Tensor(np.array([1, 2, 3, 2, 1]), mindspore.float16)
>>> tanh = nn.Tanh()
>>> tanh(input_x)
[0.7617  0.964  0.995  0.964 0.7617]
class mindspore.nn.TensorAddQuant(ema_decay=0.999, per_channel=False, num_bits=8, symmetric=False, narrow_range=False, quant_delay=0)[source]

Add Fake Quant OP after TensorAdd OP.

Adds the fake quant op after the TensorAdd op, by which the output of TensorAdd will be truncated. Please check FakeQuantWithMinMax for more details.

Parameters
  • ema_decay (float) – Exponential Moving Average algorithm parameter. Default: 0.999.

  • per_channel (bool) – Quantization granularity based on layer or on channel. Default: False.

  • num_bits (int) – The bit number of quantization, supporting 4 and 8bits. Default: 8.

  • symmetric (bool) – The quantization algorithm is symmetric or not. Default: False.

  • narrow_range (bool) – The quantization algorithm uses narrow range or not. Default: False.

  • quant_delay (int) – Quantization delay parameters according to the global step. Default: 0.

Inputs:
  • x (Tensor) - The input of TensorAddQuant.

Outputs:

Tensor, with the same type and shape as the x.

Examples

>>> add_quant = nn.TensorAddQuant()
>>> input_x = Tensor(np.array([[1, 2, 1], [-2, 0, -1]]), mindspore.float32)
>>> input_y = Tensor(np.random.randint(-2, 2, (2, 3)), mindspore.float32)
>>> result = add_quant(input_x, input_y)
class mindspore.nn.Top1CategoricalAccuracy[source]

Calculates the top-1 categorical accuracy. This class is a specialized class for TopKCategoricalAccuracy. Refer to class ‘TopKCategoricalAccuracy’ for more details.

Examples

>>> x = Tensor(np.array([[0.2, 0.5, 0.3, 0.6, 0.2], [0.1, 0.35, 0.5, 0.2, 0.],
>>>         [0.9, 0.6, 0.2, 0.01, 0.3]]), mindspore.float32)
>>> y = Tensor(np.array([2, 0, 1]), mindspore.float32)
>>> topk = nn.Top1CategoricalAccuracy()
>>> topk.clear()
>>> topk.update(x, y)
>>> result = topk.eval()
class mindspore.nn.Top5CategoricalAccuracy[source]

Calculates the top-5 categorical accuracy. This class is a specialized class for TopKCategoricalAccuracy. Refer to class ‘TopKCategoricalAccuracy’ for more details.

Examples

>>> x = Tensor(np.array([[0.2, 0.5, 0.3, 0.6, 0.2], [0.1, 0.35, 0.5, 0.2, 0.],
>>>            [0.9, 0.6, 0.2, 0.01, 0.3]]), mindspore.float32)
>>> y = Tensor(np.array([2, 0, 1]), mindspore.float32)
>>> topk = nn.Top5CategoricalAccuracy()
>>> topk.clear()
>>> topk.update(x, y)
>>> result = topk.eval()
class mindspore.nn.TopKCategoricalAccuracy(k)[source]

Calculates the top-k categorical accuracy.

Note

The method update must receive input of the form \((y_{pred}, y)\). If some samples have the same accuracy, the first sample will be chosen.

Parameters

k (int) – Specifies the top-k categorical accuracy to compute.

Examples

>>> x = Tensor(np.array([[0.2, 0.5, 0.3, 0.6, 0.2], [0.1, 0.35, 0.5, 0.2, 0.],
>>>         [0.9, 0.6, 0.2, 0.01, 0.3]]), mindspore.float32)
>>> y = Tensor(np.array([2, 0, 1]), mindspore.float32)
>>> topk = nn.TopKCategoricalAccuracy(3)
>>> topk.clear()
>>> topk.update(x, y)
>>> result = topk.eval()
clear()[source]

Clear the internal evaluation result.

eval()[source]

Computes the top-k categorical accuracy.

Returns

Float, computed result.

update(*inputs)[source]

Updates the internal evaluation result y_pred and y.

Parameters

inputs – Input y_pred and y. y_pred and y are Tensor, list or numpy.ndarray. y_pred is in most cases (not strictly) a list of floating numbers in range \([0, 1]\) and the shape is \((N, C)\), where \(N\) is the number of cases and \(C\) is the number of categories. y contains values of integers. The shape is \((N, C)\) if one-hot encoding is used. Shape can also be \((N,)\) if category index is used.
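
The definition of the metric can be sketched with NumPy (illustrative only): a sample counts as correct when its label index appears among the k largest predictions.

>>> import numpy as np
>>> x = np.array([[0.2, 0.5, 0.3, 0.6, 0.2], [0.1, 0.35, 0.5, 0.2, 0.], [0.9, 0.6, 0.2, 0.01, 0.3]])
>>> y = np.array([2, 0, 1])
>>> topk = np.argsort(x, axis=1)[:, -3:]     # indices of the 3 largest scores per row
>>> (topk == y[:, None]).any(axis=1).mean()  # -> 2 of 3 labels fall in the top 3, approx. 0.667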

class mindspore.nn.TrainOneStepCell(network, optimizer, sens=1.0)[source]

Network training package class.

Wraps the network with an optimizer. The resulting Cell is trained with input *inputs. The backward graph will be created in the construct function to update the parameter. Different parallel modes are available for training.

Parameters
  • network (Cell) – The training network. The network only supports single output.

  • optimizer (Cell) – Optimizer for updating the weights.

  • sens (Number) – The scaling number to be filled as the input of backpropagation. Default value is 1.0.

Inputs:
  • (*inputs) (Tuple(Tensor)) - Tuple of input tensors with shape \((N, \ldots)\).

Outputs:

Tensor, a scalar Tensor with shape \(()\).

Examples

>>> net = Net()
>>> loss_fn = nn.SoftmaxCrossEntropyWithLogits()
>>> optim = nn.Momentum(net.trainable_params(), learning_rate=0.1, momentum=0.9)
>>> #1) Using the provided WithLossCell
>>> loss_net = nn.WithLossCell(net, loss_fn)
>>> train_net = nn.TrainOneStepCell(loss_net, optim)
>>>
>>> #2) Using user-defined WithLossCell
>>> class MyWithLossCell(nn.Cell):
>>>     def __init__(self, backbone, loss_fn):
>>>         super(MyWithLossCell, self).__init__(auto_prefix=False)
>>>         self._backbone = backbone
>>>         self._loss_fn = loss_fn
>>>
>>>     def construct(self, x, y, label):
>>>         out = self._backbone(x, y)
>>>         return self._loss_fn(out, label)
>>>
>>> loss_net = MyWithLossCell(net, loss_fn)
>>> train_net = nn.TrainOneStepCell(loss_net, optim)
class mindspore.nn.TrainOneStepWithLossScaleCell(network, optimizer, scale_sense)[source]

Network training with loss scaling.

This is a training step with loss scaling. It takes a network, an optimizer and possibly a scale update Cell as arguments. The loss scale value can be updated on either the host side or the device side. TrainOneStepWithLossScaleCell will be compiled into a graph that takes *inputs as input data. A Tensor-type scale_sense acts as the loss scaling value; if you want to update it on the host side, the value must be provided. If a Tensor-type scale_sense is not given, the loss scale update logic must be provided by a Cell-type scale_sense.

Parameters
  • network (Cell) – The training network. The network only supports single output.

  • optimizer (Cell) – Optimizer for updating the weights.

  • scale_sense (Union[Tensor, Cell]) – If this value is of Cell type, it is the loss scaling update logic cell. If this value is of Tensor type, it is a Tensor with shape \(()\) or \((1,)\).

Inputs:
  • (*inputs) (Tuple(Tensor)) - Tuple of input tensors with shape \((N, \ldots)\).

Outputs:

Tuple of 3 Tensor, the loss, overflow flag and current loss scaling value.

  • loss (Tensor) - Tensor with shape \(()\).

  • overflow (Tensor) - Tensor with shape \(()\), type is bool.

  • loss scaling value (Tensor) - Tensor with shape \(()\).

Examples

>>> net_with_loss = Net()
>>> optimizer = nn.Momentum(net_with_loss.trainable_params(), learning_rate=0.1, momentum=0.9)
>>> manager = nn.DynamicLossScaleUpdateCell(loss_scale_value=2**12, scale_factor=2, scale_window=1000)
>>> train_network = nn.TrainOneStepWithLossScaleCell(net_with_loss, optimizer, scale_sense=manager)
>>> train_network.set_train()
>>>
>>> inputs = Tensor(np.ones([16, 16]).astype(np.float32))
>>> label = Tensor(np.zeros([16, 16]).astype(np.float32))
>>> scaling_sens = Tensor(np.full((1), np.finfo(np.float32).max), dtype=mindspore.float32)
>>> output = train_network(inputs, label, scaling_sens)
set_sense_scale(sens)[source]

If the user has set sens in the training process and wants to reassign the value, this function can be called again to modify it. The sens must be of type Tensor.

class mindspore.nn.Unfold(ksizes, strides, rates, padding='valid')[source]

Extracts patches from images. The input tensor must be a 4-D tensor and the data format is NCHW.

Parameters
  • ksizes (Union[tuple[int], list[int]]) – The size of sliding window, must be a tuple or a list of integers, and the format is [1, ksize_row, ksize_col, 1].

  • strides (Union[tuple[int], list[int]]) – Distance between the centers of the two consecutive patches, must be a tuple or list of int, and the format is [1, stride_row, stride_col, 1].

  • rates (Union[tuple[int], list[int]]) – In each extracted patch, the gap between the corresponding dimension pixel positions, must be a tuple or a list of integers, and the format is [1, rate_row, rate_col, 1].

  • padding (str) –

    The type of padding algorithm, is a string whose value is “same” or “valid”, not case sensitive. Default: “valid”.

    • same: Means that the patch can take the part beyond the original image, and this part is filled with 0.

    • valid: Means that the taken patch area must be completely covered in the original image.

Inputs:
  • input_x (Tensor) - A 4-D tensor whose shape is [in_batch, in_depth, in_row, in_col] and data type is number.

Outputs:

Tensor, a 4-D tensor whose data type is the same as input_x, with shape [out_batch, out_depth, out_row, out_col], where out_batch is the same as in_batch.

Examples

>>> net = Unfold(ksizes=[1, 2, 2, 1], strides=[1, 1, 1, 1], rates=[1, 1, 1, 1])
>>> image = Tensor(np.ones([1, 1, 3, 3]), dtype=mstype.float16)
>>> net(image)
Tensor ([[[[1, 1], [1, 1]], [[1, 1], [1, 1]], [[1, 1], [1, 1]], [[1, 1], [1, 1]]]],
        shape=(1, 4, 2, 2), dtype=mstype.float16)
class mindspore.nn.VirtualDatasetCellTriple(backbone)[source]

Wrap the network with virtual dataset to convert data parallel layout to model parallel layout.

VirtualDatasetCellTriple is a virtual Primitive; it does not exist in the final executing graph. Inputs and outputs of VirtualDatasetCellTriple are distributed in data parallel pattern, and tensor redistribution Primitives are inserted dynamically during the graph compile process.

Note

Only used in semi auto parallel and auto parallel mode. There are three inputs, in contrast to the two inputs of _VirtualDatasetCell.

Parameters

backbone (Cell) – The target network to wrap.

Examples

>>> net = Net()
>>> net = VirtualDatasetCellTriple(net)
class mindspore.nn.WarmUpLR(learning_rate, warmup_steps)[source]

Gets the warm-up learning rate.

For the i-th step, the formula of computing warmup_learning_rate[i] is:

\[warmup\_learning\_rate[i] = learning\_rate * tmp\_step / warmup\_steps\]

where \(tmp\_step = \min(current\_step, warmup\_steps)\).
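
As a plain-Python check of the formula (illustrative only):

>>> def warmup(lr, warmup_steps, step):
>>>     return lr * min(step, warmup_steps) / warmup_steps
>>> warmup(0.1, 2, 1)  # -> 0.05
>>> warmup(0.1, 2, 5)  # -> 0.1 (capped after warmup_steps)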

Parameters
  • learning_rate (float) – The initial value of learning rate.

  • warmup_steps (int) – The warm up steps of learning rate.

Inputs:

Tensor. The current step number.

Returns

Tensor. The learning rate value for the current step.

Examples

>>> learning_rate = 0.1
>>> warmup_steps = 2
>>> global_step = Tensor(2, mstype.int32)
>>> warmup_lr = WarmUpLR(learning_rate, warmup_steps)
>>> warmup_lr(global_step)
class mindspore.nn.WithEvalCell(network, loss_fn, add_cast_fp32=False)[source]

Cell that returns loss, output and label for evaluation.

This Cell accepts a network and a loss function as arguments and computes the loss for the model. It returns loss, output and label to calculate the metrics.

Parameters
  • network (Cell) – The network Cell.

  • loss_fn (Cell) – The loss Cell.

Inputs:
  • data (Tensor) - Tensor of shape \((N, \ldots)\).

  • label (Tensor) - Tensor of shape \((N, \ldots)\).

Outputs:

Tuple, containing a scalar loss Tensor, a network output Tensor of shape \((N, \ldots)\) and a label Tensor of shape \((N, \ldots)\).

Examples

>>> # For a defined network Net without loss function
>>> net = Net()
>>> loss_fn = nn.SoftmaxCrossEntropyWithLogits()
>>> eval_net = nn.WithEvalCell(net, loss_fn)
class mindspore.nn.WithGradCell(network, loss_fn=None, sens=None)[source]

Cell that returns the gradients.

Wraps the network with a backward cell to compute gradients. A network with a loss function is necessary as the argument. If the loss function is None, the network must be a wrapper of a network and a loss function. This Cell accepts *inputs as inputs and returns gradients for each trainable parameter.

Note

Run in PyNative mode.

Parameters
  • network (Cell) – The target network to wrap. The network only supports single output.

  • loss_fn (Cell) – Primitive loss function used to compute gradients. Default: None.

  • sens (Union[None, Tensor, Scalar, Tuple ...]) – The sensitivity for backpropagation; the type and shape must be the same as the network output. If None, a value of one with the same type and shape as the output will be filled in. Default: None.

Inputs:
  • (*inputs) (Tuple(Tensor)) - Tuple of input tensors with shape \((N, \ldots)\).

Outputs:

list, a list of Tensors with identical shapes as trainable weights.

Examples

>>> # For a defined network Net without loss function
>>> net = Net()
>>> loss_fn = nn.SoftmaxCrossEntropyWithLogits()
>>> grad_net = nn.WithGradCell(net, loss_fn)
>>>
>>> # For a network wrapped with loss function
>>> net = Net()
>>> net_with_criterion = nn.WithLossCell(net, loss_fn)
>>> grad_net = nn.WithGradCell(net_with_criterion)
class mindspore.nn.WithLossCell(backbone, loss_fn)[source]

Cell with loss function.

Wraps the network with loss function. This Cell accepts data and label as inputs and the computed loss will be returned.

Parameters
  • backbone (Cell) – The target network to wrap.

  • loss_fn (Cell) – The loss function used to compute loss.

Inputs:
  • data (Tensor) - Tensor of shape \((N, \ldots)\).

  • label (Tensor) - Tensor of shape \((N, \ldots)\).

Outputs:

Tensor, a scalar tensor with shape \(()\).

Examples

>>> net = Net()
>>> loss_fn = nn.SoftmaxCrossEntropyWithLogits(is_grad=False, sparse=True)
>>> net_with_criterion = nn.WithLossCell(net, loss_fn)
>>>
>>> batch_size = 2
>>> data = Tensor(np.ones([batch_size, 3, 64, 64]).astype(np.float32) * 0.01)
>>> label = Tensor(np.ones([batch_size, 1, 1, 1]).astype(np.int32))
>>>
>>> net_with_criterion(data, label)
property backbone_network

Returns the backbone network.

Returns

Cell, the backbone network.

mindspore.nn.get_activation(name)[source]

Gets the activation function.

Parameters

name (str) – The name of the activation function.

Returns

Function, the activation function.

Examples

>>> sigmoid = nn.get_activation('sigmoid')
mindspore.nn.get_metric_fn(name, *args, **kwargs)[source]

Gets the metric method based on the input name.

Parameters
  • name (str) – The name of metric method. Refer to the ‘__factory__’ object for the currently supported metrics.

  • args – Arguments for the metric function.

  • kwargs – Keyword arguments for the metric function.

Returns

Metric object, class instance of the metric method.

Examples

>>> metric = nn.get_metric_fn('precision', eval_type='classification')
mindspore.nn.names()[source]

Get the names of the metric methods.

Returns

List, the name list of metric methods.