mindspore.ops.AdamWeightDecay

class mindspore.ops.AdamWeightDecay(use_locking=False)[source]

Updates gradients by the Adaptive Moment Estimation (AdamWeightDecay) algorithm with weight decay.

The Adam algorithm is proposed in Adam: A Method for Stochastic Optimization. The AdamWeightDecay variant was proposed in Decoupled Weight Decay Regularization.

The updating formulas are as follows,

\[\begin{split}\begin{array}{ll} \\ m = \beta_1 * m + (1 - \beta_1) * g \\ v = \beta_2 * v + (1 - \beta_2) * g * g \\ update = \frac{m}{\sqrt{v} + eps} \\ update = \begin{cases} update + weight\_decay * w & \text{ if } weight\_decay > 0 \\ update & \text{ otherwise } \end{cases} \\ w = w - lr * update \end{array}\end{split}\]

\(m\) represents the 1st moment vector, \(v\) represents the 2nd moment vector, \(g\) represents gradient, \(\beta_1, \beta_2\) represent beta1 and beta2, \(lr\) represents learning_rate, \(w\) represents var, \(decay\) represents weight_decay, \(\epsilon\) represents epsilon.

Parameters: use_locking (bool) – Whether to enable a lock to protect variable tensors from being updated. If true, updates of the var, m, and v tensors will be protected by a lock. If false, the result is unpredictable. Default: False.

Inputs:

var (Tensor) - Weights to be updated. The shape is \((N, *)\) where \(*\) means, any number of additional dimensions. The data type can be float16 or float32.
m (Tensor) - The 1st moment vector in the updating formula, the shape and data type value should be the same as var.
v (Tensor) - the 2nd moment vector in the updating formula, the shape and data type value should be the same as var. Mean square gradients with the same type as var.
lr (float) - \(l\) in the updating formula. The paper suggested value is \(10^{-8}\), the data type value should be the same as var.
beta1 (float) - The exponential decay rate for the 1st moment estimations, the data type value should be the same as var. The paper suggested value is \(0.9\)
beta2 (float) - The exponential decay rate for the 2nd moment estimations, the data type value should be the same as var. The paper suggested value is \(0.999\)
epsilon (float) - Term added to the denominator to improve numerical stability.
decay (float) - The weight decay value, must be a scalar tensor with float data type. Default: 0.0.
gradient (Tensor) - Gradient, has the same shape and data type as var.

Outputs:

Tuple of 3 Tensor, the updated parameters.

var (Tensor) - The same shape and data type as var.
m (Tensor) - The same shape and data type as m.
v (Tensor) - The same shape and data type as v.

Supported Platforms:

GPU CPU

Examples

>>> import numpy as np
>>> import mindspore.nn as nn
>>> from mindspore import Tensor, Parameter, ops
>>> class Net(nn.Cell):
...     def __init__(self):
...         super(Net, self).__init__()
...         self.adam_weight_decay = ops.AdamWeightDecay()
...         self.var = Parameter(Tensor(np.ones([2, 2]).astype(np.float32)), name="var")
...         self.m = Parameter(Tensor(np.ones([2, 2]).astype(np.float32)), name="m")
...         self.v = Parameter(Tensor(np.ones([2, 2]).astype(np.float32)), name="v")
...     def construct(self, lr, beta1, beta2, epsilon, decay, grad):
...         out = self.adam_weight_decay(self.var, self.m, self.v, lr, beta1, beta2,
...                               epsilon, decay, grad)
...         return out
>>> net = Net()
>>> gradient = Tensor(np.ones([2, 2]).astype(np.float32))
>>> output = net(0.001, 0.9, 0.999, 1e-8, 0.0, gradient)
>>> print(net.var.asnumpy())