mindspore.ops.ApplyMomentum

class mindspore.ops.ApplyMomentum(use_nesterov=False, use_locking=False, gradient_scale=1.0)

Optimizer that implements the Momentum algorithm.

Refer to the paper "On the importance of initialization and momentum in deep learning" for more details.

\[v_{t+1} = v_{t} \times u + grad\]

If use_nesterov is True:

\[p_{t+1} = p_{t} - (grad \times lr + v_{t+1} \times u \times lr)\]

If use_nesterov is False:

\[p_{t+1} = p_{t} - lr \times v_{t+1}\]

where grad, lr, p, v and u denote the gradient, learning rate, parameters, moments and momentum, respectively.
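As a worked check of the two update rules, take p_t = 1.0, v_t = 0.5, grad = 0.2, lr = 0.1 and u = 0.9 (values chosen only for illustration). The moment update is

\[v_{t+1} = 0.5 \times 0.9 + 0.2 = 0.65\]

Without Nesterov:

\[p_{t+1} = 1.0 - 0.1 \times 0.65 = 0.935\]

With Nesterov:

\[p_{t+1} = 1.0 - (0.2 \times 0.1 + 0.65 \times 0.9 \times 0.1) = 1.0 - 0.0785 = 0.9215\]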

The inputs variable, accumulation and gradient follow the implicit type conversion rules so that their data types are consistent. If their data types differ, the lower-priority data type is converted to the highest-priority data type among them.
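The conversion rule can be pictured with the pure-Python sketch below. It is not MindSpore code: the priority table (restricted to float types) and the helper `resolve_common_dtype` are hypothetical, and the check that a Parameter may not be converted is modelled on the RuntimeError listed under Raises.

```python
from typing import List, Tuple

# Hypothetical priority table, restricted to float types;
# a higher number means a higher priority.
FLOAT_PRIORITY = {"float16": 0, "float32": 1, "float64": 2}


def resolve_common_dtype(inputs: List[Tuple[str, str, bool]]) -> str:
    """Return the dtype all inputs are converted to.

    Each entry is (input_name, dtype, is_parameter).
    """
    # The target is the highest-priority dtype among the inputs.
    target = max((dtype for _, dtype, _ in inputs), key=FLOAT_PRIORITY.__getitem__)
    for name, dtype, is_parameter in inputs:
        # Assumption: a Parameter is never cast implicitly, so needing to
        # convert one is treated as an error.
        if is_parameter and dtype != target:
            raise RuntimeError(
                f"Data type conversion of Parameter '{name}' "
                f"from {dtype} to {target} is not supported"
            )
    return target
```

With float32 Parameters and a float16 gradient, only the gradient is promoted to float32. A float64 gradient would instead require converting the float32 Parameters, so the sketch raises.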

Refer to mindspore.nn.Momentum for more details about the formula and usage.

Note

These notes describe the parameter-group options of mindspore.nn.Momentum (weight_decay, grad_centralization); ApplyMomentum itself takes no such arguments. When parameter groups are used, each group's weight decay is applied to its parameters if it is positive. When parameter groups are not used, a positive weight_decay is applied only to parameters whose names contain neither ‘beta’ nor ‘gamma’.

When parameter groups are used, gradient centralization can be enabled by setting grad_centralization to True, but it can only be applied to the parameters of convolution layers. Setting it to True for a group that contains non-convolution parameters raises an error.

To improve the performance of parameter groups, a custom order of parameters is supported.
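The weight-decay selection rule for the case without parameter groups can be sketched in plain Python. The function names are hypothetical; the sketch only filters parameter names the way the note describes and does not call any MindSpore API.

```python
from typing import List


def default_decay_filter(param_name: str) -> bool:
    # Parameters whose names contain 'beta' or 'gamma' (typically the shift
    # and scale of normalization layers) are excluded from weight decay.
    return "beta" not in param_name and "gamma" not in param_name


def params_receiving_decay(param_names: List[str], weight_decay: float) -> List[str]:
    # Weight decay is applied only when it is positive.
    if weight_decay <= 0:
        return []
    return [name for name in param_names if default_decay_filter(name)]
```

For example, with names ["conv1.weight", "bn1.gamma", "bn1.beta", "fc.bias"] and weight_decay=1e-4, only "conv1.weight" and "fc.bias" receive weight decay.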

Parameters
  • use_nesterov (bool) – Whether to enable Nesterov momentum. Default: False.

  • use_locking (bool) – Whether to protect the updates of the variable and accumulation tensors with a lock. Default: False.

  • gradient_scale (float) – The scale factor of the gradient. Default: 1.0.

Inputs:
  • variable (Parameter) - Weights to be updated. Data type must be float.

  • accumulation (Parameter) - Momentum-weighted accumulation of past gradients (v in the formulas above). Has the same data type as variable.

  • learning_rate (Union[Number, Tensor]) - The learning rate value, must be a float number or a scalar tensor with float data type.

  • gradient (Tensor) - Gradient, has the same data type as variable.

  • momentum (Union[Number, Tensor]) - Momentum, must be a float number or a scalar tensor with float data type.

Outputs:

Tensor, the updated parameters (variable).
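To make the input order and update semantics concrete, here is a pure-Python reference sketch over flat lists. The name `apply_momentum_reference` is hypothetical and the code is not the MindSpore kernel: it applies the formulas above element-wise, updates variable and accumulation in place, and returns variable. It ignores use_locking and gradient_scale.

```python
from typing import List


def apply_momentum_reference(
    variable: List[float],
    accumulation: List[float],
    learning_rate: float,
    gradient: List[float],
    momentum: float,
    use_nesterov: bool = False,
) -> List[float]:
    """Element-wise momentum update that mutates variable and accumulation."""
    if not len(variable) == len(accumulation) == len(gradient):
        raise ValueError("variable, accumulation and gradient must have the same length")
    for i, grad in enumerate(gradient):
        # v_{t+1} = v_t * u + grad
        accumulation[i] = accumulation[i] * momentum + grad
        if use_nesterov:
            # p_{t+1} = p_t - (grad * lr + v_{t+1} * u * lr)
            variable[i] -= grad * learning_rate + accumulation[i] * momentum * learning_rate
        else:
            # p_{t+1} = p_t - lr * v_{t+1}
            variable[i] -= learning_rate * accumulation[i]
    return variable
```

With variable [0.6, 0.4], accumulation [0.6, 0.5], learning_rate 0.1, gradient [0.3, 0.7] and momentum 0.9, accumulation becomes approximately [0.84, 1.15] and variable approximately [0.516, 0.285].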

Raises
  • TypeError – If use_locking or use_nesterov is not a bool, or gradient_scale is not a float.

  • RuntimeError – If making the data types of variable, accumulation and gradient consistent would require converting the data type of a Parameter, which is not supported.
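The TypeError conditions can be expressed as a small validation sketch. The function `check_apply_momentum_attrs` is hypothetical and reads the Raises entry literally, so an int such as 1 is rejected for gradient_scale; the actual validator may differ.

```python
def check_apply_momentum_attrs(use_nesterov=False, use_locking=False, gradient_scale=1.0) -> None:
    # Both flags must be genuine bools (an int such as 1 is rejected).
    for name, value in (("use_nesterov", use_nesterov), ("use_locking", use_locking)):
        if not isinstance(value, bool):
            raise TypeError(f"'{name}' must be a bool, but got {type(value).__name__}")
    # gradient_scale must be a float.
    if not isinstance(gradient_scale, float):
        raise TypeError(
            f"'gradient_scale' must be a float, but got {type(gradient_scale).__name__}"
        )
```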

Supported Platforms:

Ascend GPU CPU

Examples

Please refer to the usage in mindspore.nn.Momentum.