# Gradient Computation

## Automatic Differentiation Interfaces

MindSpore currently provides three interfaces for computing gradients. Taking `mindspore.grad` as an example, its parameters are:

• fn (Union[Cell, Function]) - the function or network (Cell) to differentiate.

• grad_position (Union[NoneType, int, tuple[int]]) - the index or indices of the inputs to differentiate with respect to. Default: 0.

• weights (Union[ParameterTuple, Parameter, list[Parameter]]) - the network parameters whose gradients should be returned. Default: None.

• has_aux (bool) - whether to return auxiliary outputs. If True, fn must have more than one output; only the first output of fn participates in differentiation, and the other outputs are returned directly. Default: False.

The output of `mindspore.grad` for the different combinations of grad_position and weights:

| grad_position | weights | output |
| --- | --- | --- |
| 0 | None | (gradient of the first input) |
| 1 | None | (gradient of the second input) |
| (0, 1) | None | (gradient of the first input, gradient of the second input) |
| None | weights | (gradients of weights) |
| 0 | weights | (gradient of the first input), (gradients of weights) |
| (0, 1) | weights | (gradient of the first input, gradient of the second input), (gradients of weights) |
| None | None | raises ValueError (grad_position and weights cannot both be None) |

[1]:

import mindspore as ms
from mindspore import nn

class Net(nn.Cell):
    def __init__(self, in_channel, out_channel):
        super(Net, self).__init__()
        self.fc = nn.Dense(in_channel, out_channel, has_bias=False)
        self.loss = nn.MSELoss()

    def construct(self, x, y):
        logits = self.fc(x).squeeze()
        loss = self.loss(logits, y)
        return loss, logits

net = Net(3, 1)
net.fc.weight.set_data(ms.Tensor([[2, 3, 4]], ms.float32))   # set a fixed value for the fully connected weight

print("=== weight ===")
for param in net.trainable_params():
    print("name:", param.name, "data:", param.data.asnumpy())

x = ms.Tensor([[1, 2, 3]], ms.float32)
y = ms.Tensor(19, ms.float32)

loss, logits = net(x, y)
print("=== output ===")
print(loss, logits)

=== weight ===
name: fc.weight data: [[2. 3. 4.]]
=== output ===
1.0 20.0
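Because the weight is fixed, the gradients printed in the cells below can be verified by hand. The network computes loss = (w·x − y)², so by the chain rule ∂loss/∂x = 2(w·x − y)·w, ∂loss/∂y = −2(w·x − y), and ∂loss/∂w = 2(w·x − y)·x. A minimal pure-Python check (no MindSpore needed):

```python
# Hand-derived gradients for the fixed network above:
# w = [2, 3, 4], x = [1, 2, 3], y = 19, loss = (w . x - y)^2
w = [2.0, 3.0, 4.0]
x = [1.0, 2.0, 3.0]
y = 19.0

logits = sum(wi * xi for wi, xi in zip(w, x))   # w . x = 20
loss = (logits - y) ** 2                        # (20 - 19)^2 = 1

# Chain rule:
#   dloss/dx_i = 2 * (logits - y) * w_i
#   dloss/dy   = -2 * (logits - y)
#   dloss/dw_i = 2 * (logits - y) * x_i
dloss_dx = [2 * (logits - y) * wi for wi in w]  # [4.0, 6.0, 8.0]
dloss_dy = -2 * (logits - y)                    # -2.0
dloss_dw = [2 * (logits - y) * xi for xi in x]  # [2.0, 4.0, 6.0]
print(dloss_dx, dloss_dy, dloss_dw)
```

These are exactly the values [[4, 6, 8]], -2, and [[2, 4, 6]] that appear in the gradient outputs below.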

[2]:

# Gradient with respect to the first input
grad_fn = ms.grad(net, grad_position=0, weights=None, has_aux=True)
grads, logit = grad_fn(x, y)
print("=== grads 1 ===")
print("grads", grads)
print("logit", logit)

=== grads 1 ===
logit (Tensor(shape=[], dtype=Float32, value= 20),)

[3]:

# Gradient with respect to the second input
grad_fn = ms.grad(net, grad_position=1, weights=None, has_aux=True)
grads, logit = grad_fn(x, y)
print("=== grads 2 ===")
print("grads", grads)
print("logit", logit)

=== grads 2 ===
logit (Tensor(shape=[], dtype=Float32, value= 20),)

[4]:

# Gradients with respect to multiple inputs
grad_fn = ms.grad(net, grad_position=(0, 1), weights=None, has_aux=True)
grads, logit = grad_fn(x, y)
print("=== grads 3 ===")
print("grads", grads)
print("logit", logit)

=== grads 3 ===
[[4.00000000e+000, 6.00000000e+000, 8.00000000e+000]]), Tensor(shape=[], dtype=Float32, value= -2))
logit (Tensor(shape=[], dtype=Float32, value= 20),)

[5]:

# Gradients with respect to the weights
grad_fn = ms.grad(net, grad_position=None, weights=net.trainable_params(), has_aux=True)
grads, logit = grad_fn(x, y)
print("=== grads 4 ===")
print("grads", grads)
print("logits", logit)

=== grads 4 ===
[[2.00000000e+000, 4.00000000e+000, 6.00000000e+000]]),)
logits (Tensor(shape=[], dtype=Float32, value= 20),)

[6]:

# Gradients with respect to the first input and the weights
grad_fn = ms.grad(net, grad_position=0, weights=net.trainable_params(), has_aux=True)
grads, logit = grad_fn(x, y)
print("=== grads 5 ===")
print("grads", grads)
print("logit", logit)

=== grads 5 ===
[[4.00000000e+000, 6.00000000e+000, 8.00000000e+000]]), (Tensor(shape=[1, 3], dtype=Float32, value=
[[2.00000000e+000, 4.00000000e+000, 6.00000000e+000]]),))
logit (Tensor(shape=[], dtype=Float32, value= 20),)

[7]:

# Gradients with respect to multiple inputs and the weights
grad_fn = ms.grad(net, grad_position=(0, 1), weights=net.trainable_params(), has_aux=True)
grads, logit = grad_fn(x, y)
print("=== grads 6 ===")
print("grads", grads)
print("logit", logit)

=== grads 6 ===
[[4.00000000e+000, 6.00000000e+000, 8.00000000e+000]]), Tensor(shape=[], dtype=Float32, value= -2)), (Tensor(shape=[1, 3], dtype=Float32, value=
[[2.00000000e+000, 4.00000000e+000, 6.00000000e+000]]),))
logit (Tensor(shape=[], dtype=Float32, value= 20),)

[8]:

# The has_aux=False case
grad_fn = ms.grad(net, grad_position=0, weights=None, has_aux=False)
grads = grad_fn(x, y)
print("=== grads 7 ===")
print("grads", grads)
=== grads 7 ===


The has_aux=False case is in fact equivalent to differentiating the sum of the two outputs:

[9]:

class Net2(nn.Cell):
    def __init__(self, in_channel, out_channel):
        super().__init__()
        self.fc = nn.Dense(in_channel, out_channel, has_bias=False)
        self.loss = nn.MSELoss()

    def construct(self, x, y):
        logits = self.fc(x).squeeze()
        loss = self.loss(logits, y)
        return loss + logits

net2 = Net2(3, 1)
net2.fc.weight.set_data(ms.Tensor([[2, 3, 4]], ms.float32))   # set a fixed value for the fully connected weight

grad_fn = ms.grad(net2, grad_position=0)
grad = grad_fn(x, y)
print("grad", grad)

grad [[ 6.  9. 12.]]
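The value [[6. 9. 12.]] can be reproduced with plain arithmetic: the gradient of loss + logits with respect to x is the sum of the two individual gradients, 2(w·x − y)·w + w.

```python
# Pure-Python check of the has_aux=False equivalence: differentiating
# (loss, logits) jointly is the same as differentiating loss + logits.
w = [2.0, 3.0, 4.0]
x = [1.0, 2.0, 3.0]
y = 19.0
logits = sum(wi * xi for wi, xi in zip(w, x))        # w . x = 20
# d(loss)/dx_i = 2*(logits - y)*w_i ; d(logits)/dx_i = w_i
grad_sum = [2 * (logits - y) * wi + wi for wi in w]  # [6.0, 9.0, 12.0]
print(grad_sum)
```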

[10]:

# grad_position=None, weights=None
grad_fn = ms.grad(net, grad_position=None, weights=None, has_aux=True)
grads, logit = grad_fn(x, y)
print("logit", logit)

# ValueError: grad_position and weight can not be None at the same time.


When the network output is needed alongside the gradients, `mindspore.value_and_grad` returns both. For the same combinations of grad_position and weights, its output is:

| grad_position | weights | output |
| --- | --- | --- |
| 0 | None | (output of the network, gradient of the first input) |
| 1 | None | (output of the network, gradient of the second input) |
| (0, 1) | None | (output of the network, (gradient of the first input, gradient of the second input)) |
| None | weights | (output of the network, (gradients of weights)) |
| 0 | weights | (output of the network, ((gradient of the first input), (gradients of weights))) |
| (0, 1) | weights | (output of the network, ((gradient of the first input, gradient of the second input), (gradients of weights))) |
| None | None | raises ValueError (grad_position and weights cannot both be None) |

[11]:

grad_fn = ms.value_and_grad(net, grad_position=(0, 1), weights=net.trainable_params(), has_aux=True)
value, grads = grad_fn(x, y)
print("=== value and grad ===")
print("value", value)
print("grads", grads)

=== value and grad ===
value (Tensor(shape=[], dtype=Float32, value= 1), Tensor(shape=[], dtype=Float32, value= 20))
[[4.00000000e+000, 6.00000000e+000, 8.00000000e+000]]), Tensor(shape=[], dtype=Float32, value= -2)), (Tensor(shape=[1, 3], dtype=Float32, value=
[[2.00000000e+000, 4.00000000e+000, 6.00000000e+000]]),))


## Loss Scale

[12]:

from mindspore.amp import StaticLossScaler, all_finite

loss_scale = StaticLossScaler(1024.)  # static loss scale

def forward_fn(x, y):
    loss, logits = net(x, y)
    print("loss", loss)
    loss = loss_scale.scale(loss)
    return loss, logits

grad_fn = ms.value_and_grad(forward_fn, grad_position=None, weights=net.trainable_params(), has_aux=True)
(loss, _), grads = grad_fn(x, y)
print("=== loss scale ===")
print("loss", loss)
print("grads", grads)

print("=== unscale ===")
loss = loss_scale.unscale(loss)
grads = loss_scale.unscale(grads)
print("loss", loss)
print("grads", grads)

# Check for overflow; returns True if there is no overflow
state = all_finite(grads)
print(state)

loss 1.0
=== loss scale ===
loss 1024.0
[[2.04800000e+003, 4.09600000e+003, 6.14400000e+003]]),)
=== unscale ===
loss 1.0
[[2.00000000e+000, 4.00000000e+000, 6.00000000e+000]]),)
True


The principle behind loss scale is very simple: multiply the loss by a relatively large value, and via the chain rule that same factor multiplies every gradient along the backward path, preventing the precision problems that arise when gradients become too small during backpropagation.
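This chain-rule behavior can be sketched with the same numbers as the cell above (scale 1024, true weight gradient [2, 4, 6]): scaling the loss scales every gradient by the same factor, and dividing by the factor recovers the true gradient.

```python
# Minimal pure-Python sketch of loss scaling.
scale = 1024.0
w = [2.0, 3.0, 4.0]
x = [1.0, 2.0, 3.0]
y = 19.0

logits = sum(wi * xi for wi, xi in zip(w, x))   # w . x = 20
# True (unscaled) weight gradient of loss = (w . x - y)^2:
grad_w = [2 * (logits - y) * xi for xi in x]    # [2.0, 4.0, 6.0]
# Scaling the loss by `scale` scales every gradient by `scale` (chain rule):
scaled_grad_w = [scale * g for g in grad_w]     # [2048.0, 4096.0, 6144.0]
# Unscaling divides the factor back out, recovering the true gradient:
unscaled = [g / scale for g in scaled_grad_w]   # [2.0, 4.0, 6.0]
print(scaled_grad_w, unscaled)
```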

## Gradient Clipping

[13]:

from mindspore import ops