mindspore.nn.Optimizer

class mindspore.nn.Optimizer(learning_rate, parameters, weight_decay=0.0, loss_scale=1.0)[source]

Base class for optimizers that update network parameters. Never use this class directly; instantiate one of its subclasses instead.

Grouping parameters is supported. If parameters are grouped, a different learning_rate, weight_decay and grad_centralization strategy can be applied to each group.

Note

If parameters are not grouped, the weight_decay in the optimizer is applied to the network parameters whose names do not contain ‘beta’ or ‘gamma’. Users can group parameters to change the weight decay strategy. When parameters are grouped, each group can set its own weight_decay; if it is not set, the weight_decay in the optimizer will be applied.

Parameters
  • learning_rate (Union[float, int, Tensor, Iterable, LearningRateSchedule]) –

    • float: The fixed learning rate value. Must be equal to or greater than 0.

    • int: The fixed learning rate value. Must be equal to or greater than 0. It will be converted to float.

    • Tensor: Its value should be a scalar or a 1-D vector. For a scalar, a fixed learning rate is applied. For a vector, the learning rate is dynamic and the i-th step takes the i-th value as the learning rate.

    • Iterable: Learning rate is dynamic. The i-th step will take the i-th value as the learning rate.

    • LearningRateSchedule: Learning rate is dynamic. During training, the optimizer calls the instance of LearningRateSchedule with the step as input to get the learning rate of the current step.

  • parameters (Union[list[Parameter], list[dict]]) –

    Must be a list of Parameter or a list of dict. When parameters is a list of dict, the keys that can be parsed are “params”, “lr”, “weight_decay”, “grad_centralization” and “order_params” (see the grouped-parameters example under Examples below).

    • params: Required. Parameters in current group. The value must be a list of Parameter.

    • lr: Optional. If “lr” is in the keys, its value will be used as the learning rate. If not, the learning_rate in the optimizer will be used. Fixed and dynamic learning rates are supported.

    • weight_decay: Optional. If “weight_decay” is in the keys, its value will be used as the weight decay. If not, the weight_decay in the optimizer will be used.

    • grad_centralization: Optional. Must be a Boolean. If “grad_centralization” is in the keys, the set value will be used. If not, grad_centralization defaults to False. This configuration only works on convolution layers.

    • order_params: Optional. When parameters are grouped, this is usually used to maintain the order in which parameters appear in the network, which can improve performance. The value should be the parameters whose order will be followed in the optimizer. If “order_params” is in the keys, the other keys will be ignored and the elements of “order_params” must be included in one of the groups of params.

  • weight_decay (Union[float, int]) – An int or a floating point value for the weight decay. It must be equal to or greater than 0. If the type of weight_decay input is int, it will be converted to float. Default: 0.0.

  • loss_scale (float) – A floating point value for the loss scale. It must be greater than 0. If the type of loss_scale input is int, it will be converted to float. In general, use the default value. This value needs to be the same as the loss_scale in FixedLossScaleManager only when FixedLossScaleManager is used for training and its drop_overflow_update is set to False. Refer to class mindspore.FixedLossScaleManager for more details. Default: 1.0.

Raises
  • TypeError – If learning_rate is not one of int, float, Tensor, Iterable, LearningRateSchedule.

  • TypeError – If an element of parameters is neither a Parameter nor a dict.

  • TypeError – If loss_scale is not a float.

  • TypeError – If weight_decay is neither float nor int.

  • ValueError – If loss_scale is less than or equal to 0.

  • ValueError – If weight_decay is less than 0.

  • ValueError – If learning_rate is a Tensor, but the dimension of tensor is greater than 1.

Supported Platforms:

Ascend GPU CPU
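
Examples

The following is a hedged sketch of grouping parameters. Net() stands for a user-defined network that is assumed to contain convolution layers, and the hyper-parameter values are illustrative only.

>>> from mindspore import nn
>>>
>>> # Net() is a user-defined network assumed to contain convolution layers.
>>> net = Net()
>>> conv_params = list(filter(lambda x: 'conv' in x.name, net.trainable_params()))
>>> no_conv_params = list(filter(lambda x: 'conv' not in x.name, net.trainable_params()))
>>> group_params = [{'params': conv_params, 'weight_decay': 0.01, 'grad_centralization': True},
...                 {'params': no_conv_params, 'lr': 0.01},
...                 {'order_params': net.trainable_params()}]
>>> # conv_params use weight_decay 0.01, gradient centralization and the default learning rate 0.1;
>>> # no_conv_params use learning rate 0.01 and the default weight_decay 0.0;
>>> # order_params keeps the original parameter order of the network.
>>> optim = nn.Momentum(group_params, learning_rate=0.1, momentum=0.9, weight_decay=0.0)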

broadcast_params(optim_result)[source]

Apply Broadcast operations in the sequential order of parameter groups.

Parameters

optim_result (bool) – The results of updating parameters. This input is used to ensure that the parameters are updated before they are broadcast.

Returns

bool, the status flag.

decay_weight(gradients)[source]

Weight decay.

An approach to reduce the overfitting of a deep learning neural network model. User-defined optimizers based on mindspore.nn.Optimizer can also call this interface to apply weight decay.

Parameters

gradients (tuple[Tensor]) – The gradients of the network parameters, with the same shape as the parameters.

Returns

tuple[Tensor], The gradients after weight decay.
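
For illustration, the following is a minimal, hypothetical sketch of a user-defined optimizer (MySGD is not part of MindSpore) whose construct applies decay_weight to the incoming gradients, together with gradients_centralization, scale_grad and get_lr described below, before a plain SGD update.

>>> import mindspore.ops as ops
>>> from mindspore import nn
>>>
>>> class MySGD(nn.Optimizer):   # hypothetical user-defined optimizer, shown as a sketch
...     def __init__(self, params, learning_rate=0.01, weight_decay=0.0, loss_scale=1.0):
...         super(MySGD, self).__init__(learning_rate, params, weight_decay, loss_scale)
...         self.assign_sub = ops.AssignSub()
...
...     def construct(self, gradients):
...         gradients = self.decay_weight(gradients)               # apply weight decay
...         gradients = self.gradients_centralization(gradients)   # centralize conv-layer gradients
...         gradients = self.scale_grad(gradients)                 # restore gradients from loss scale
...         lr = self.get_lr()                                     # learning rate of the current step
...         for param, gradient in zip(self.parameters, gradients):
...             self.assign_sub(param, lr * gradient)              # plain SGD update
...         return True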

get_lr()[source]

The optimizer calls this interface to get the learning rate for the current step. User-defined optimizers based on mindspore.nn.Optimizer can also call this interface before updating the parameters.

Returns

float, the learning rate of current step.

get_lr_parameter(param)[source]

When parameters are grouped and the learning rate differs between groups, get the learning rate of the specified param.

Parameters

param (Union[Parameter, list[Parameter]]) – The Parameter or list of Parameter.

Returns

Parameter, a single Parameter or a list[Parameter] according to the input type. If the learning rate is dynamic, the LearningRateSchedule or list[LearningRateSchedule] used to calculate the learning rate will be returned.

Examples

>>> from mindspore import nn
>>> net = Net()
>>> conv_params = list(filter(lambda x: 'conv' in x.name, net.trainable_params()))
>>> no_conv_params = list(filter(lambda x: 'conv' not in x.name, net.trainable_params()))
>>> group_params = [{'params': conv_params, 'lr': 0.05},
...                 {'params': no_conv_params, 'lr': 0.01}]
>>> optim = nn.Momentum(group_params, learning_rate=0.1, momentum=0.9, weight_decay=0.0)
>>> conv_lr = optim.get_lr_parameter(conv_params)
>>> print(conv_lr[0].asnumpy())
0.05

gradients_centralization(gradients)[source]

Gradients centralization.

A method for optimizing convolutional layer parameters to improve the training speed of a deep learning neural network model. User-defined optimizers based on mindspore.nn.Optimizer can also call this interface to centralize gradients.

Parameters

gradients (tuple[Tensor]) – The gradients of the network parameters, with the same shape as the parameters.

Returns

tuple[Tensor], The gradients after gradients centralization.

scale_grad(gradients)[source]

Restore gradients for mixed precision.

User-defined optimizers based on mindspore.nn.Optimizer can also call this interface to restore gradients.

Parameters

gradients (tuple[Tensor]) – The gradients of the network parameters, with the same shape as the parameters.

Returns

tuple[Tensor], The gradients after the loss scale is restored.
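
As a hedged end-to-end sketch of where loss scaling comes from (net and loss_fn are assumed to be defined elsewhere): when FixedLossScaleManager is used with drop_overflow_update=False, the optimizer's loss_scale should match the manager's, and scale_grad then restores the gradients by that value during the update.

>>> from mindspore import nn, Model, FixedLossScaleManager
>>>
>>> loss_scale = 1024.0
>>> loss_scale_manager = FixedLossScaleManager(loss_scale, drop_overflow_update=False)
>>> optim = nn.Momentum(net.trainable_params(), learning_rate=0.1, momentum=0.9,
...                     loss_scale=loss_scale)
>>> model = Model(net, loss_fn=loss_fn, optimizer=optim, loss_scale_manager=loss_scale_manager)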

property target

The property is used to determine whether the parameters are updated on host or device. The input type is str and can only be ‘CPU’, ‘Ascend’ or ‘GPU’.

property unique

Whether to make the gradients unique in the optimizer. Generally, it is used in sparse networks. Set it to True if the gradients of the optimizer are sparse. Set it to False if the forward network has made the parameters unique, that is, the gradients of the optimizer are no longer sparse. The default value is True when it is not set.
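
As a hedged sketch of the target and unique properties for a sparse network (net is assumed to be an embedding network that produces sparse gradients), a sparse optimizer such as nn.LazyAdam could be configured as follows; whether these settings are appropriate depends on the network.

>>> from mindspore import nn
>>>
>>> optim = nn.LazyAdam(net.trainable_params(), learning_rate=0.1)
>>> optim.target = 'CPU'    # update the sparse parameters on host rather than on device
>>> optim.unique = False    # assume the forward network already made the gradient indices unique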