mindscience.common.AdaHessian
class mindscience.common.AdaHessian(params, learning_rate=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, use_locking=False, use_nesterov=False, weight_decay=0.0, loss_scale=1.0, use_amsgrad=False, **kwargs)
The AdaHessian optimizer performs optimization using second-order information from the diagonal elements of the Hessian matrix. It was proposed in ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning. The Hessian power here is fixed to 1, and the Hessian traces are spatially averaged as follows (see the sketch after this list):
for 1D: no spatial average.
for 2D: use the entire row as the spatial average.
for 3D (assumes 1D Conv; can be customized): use the last dimension as the spatial average.
for 4D (assumes 2D Conv; can be customized): use the last 2 dimensions as the spatial average.
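As a minimal illustration of these reduction rules, the NumPy sketch below averages a per-element Hessian-diagonal estimate over the spatial axes named above. This is not the MindScience implementation: the function spatial_average and its input array are hypothetical, and taking absolute values before averaging follows the reference AdaHessian implementation.

>>> import numpy as np
>>> def spatial_average(hess_diag):
...     """Reduce a Hessian-diagonal estimate over spatial dimensions."""
...     if hess_diag.ndim <= 1:
...         # 1D (e.g. bias vectors): no spatial average
...         return hess_diag
...     if hess_diag.ndim == 2:
...         # 2D (e.g. dense weights): average each entire row
...         return np.mean(np.abs(hess_diag), axis=1, keepdims=True)
...     if hess_diag.ndim == 3:
...         # 3D (assumed 1D Conv kernels): average over the last dimension
...         return np.mean(np.abs(hess_diag), axis=-1, keepdims=True)
...     # 4D (assumed 2D Conv kernels): average over the last two dimensions
...     return np.mean(np.abs(hess_diag), axis=(-2, -1), keepdims=True)
>>> # A Conv2d-kernel-shaped estimate is averaged over its 3x3 window
>>> print(spatial_average(np.ones([4, 2, 3, 3])).shape)
(4, 2, 1, 1)

Averaging over the convolution window keeps a single curvature scale per filter channel, which reduces the variance of the stochastic Hessian-trace estimate.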
Args: For the meanings of the parameters, see mindspore.nn.Adam.
Examples
>>> import numpy as np
>>> import mindspore as ms
>>> from mindspore import ops, nn
>>> from mindscience.common import AdaHessian
>>> ms.set_context(device_target="Ascend", mode=ms.GRAPH_MODE)
>>> net = nn.Conv2d(in_channels=2, out_channels=4, kernel_size=3)
>>> def forward(a):
...     return ops.mean(net(a)**2)**.5
>>> grad_fn = ms.grad(forward, grad_position=None, weights=net.trainable_params())
>>> optimizer = AdaHessian(net.trainable_params())
>>> inputs = ms.Tensor(np.reshape(range(100), [2, 2, 5, 5]), dtype=ms.float32)
>>> optimizer(grad_fn, inputs)
>>> print(optimizer.moment2[0].shape)
(4, 2, 3, 3)
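Note that, unlike MindSpore's first-order optimizers, which are called with precomputed gradients, AdaHessian is called with the gradient function itself together with the forward inputs, presumably so that it can differentiate through the gradients again to estimate the Hessian diagonal.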