{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Automatic Differentiation\n", "\n", "[![](https://gitee.com/mindspore/docs/raw/r1.2/resource/_static/logo_source.png)](https://gitee.com/mindspore/docs/blob/r1.2/tutorials/source_zh_cn/autograd.ipynb) [![](https://gitee.com/mindspore/docs/raw/r1.2/resource/_static/logo_notebook.png)](https://obs.dualstack.cn-north-4.myhuaweicloud.com/mindspore-website/notebook/r1.2/quick_start/mindspore_autograd.ipynb) [![](https://gitee.com/mindspore/docs/raw/r1.2/tutorials/training/source_zh_cn/_static/logo_modelarts.png)](https://console.huaweicloud.com/modelarts/?region=cn-north-4#/notebook/loading?share-url-b64=aHR0cHM6Ly9vYnMuZHVhbHN0YWNrLmNuLW5vcnRoLTQubXlodWF3ZWljbG91ZC5jb20vbWluZHNwb3JlLXdlYnNpdGUvbm90ZWJvb2svbW9kZWxhcnRzL3F1aWNrX3N0YXJ0L21pbmRzcG9yZV9hdXRvZ3JhZC5pcHluYg==&image_id=65f636a0-56cf-49df-b941-7d2a07ba8c8c)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Backpropagation is the most commonly used algorithm for training neural networks. It adjusts the parameters (model weights) according to the gradient of the loss function with respect to each parameter.\n", "\n", "MindSpore computes first-order derivatives with `mindspore.ops.GradOperation(get_all=False, get_by_list=False, sens_param=False)`. When `get_all` is `False`, the derivative is taken only with respect to the first input; when it is `True`, derivatives are taken with respect to all inputs. When `get_by_list` is `False`, no derivatives are taken with respect to the weights; when it is `True`, they are. `sens_param` scales the network output, which changes the final gradient. The sections below examine these options in detail, using the derivative of the MatMul operator as the example.\n", "\n", "First, import the modules and interfaces used in this document:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import mindspore.nn as nn\n", "import mindspore.ops as ops\n", "from mindspore import Tensor\n", "from mindspore import ParameterTuple, Parameter\n", "from mindspore import dtype as mstype" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## First-Order Derivative with Respect to the Input\n", "\n", "To differentiate with respect to the input, first define the network to be differentiated. This example uses a network built from the MatMul operator, $f(x,y)=z * x * y$, in which the weight `z` scales `x` before the matrix multiplication with `y`.\n", "\n", "The network is defined as follows:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "class Net(nn.Cell):\n", "    def __init__(self):\n", "        super(Net, self).__init__()\n", "        self.matmul = ops.MatMul()\n", "        self.z = 
Parameter(Tensor(np.array([1.0], np.float32)), name='z')\n", "\n", "    def construct(self, x, y):\n", "        x = x * self.z\n", "        out = self.matmul(x, y)\n", "        return out" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, define the gradient network. The `__init__` function stores the network to be differentiated as `self.net` and creates the `ops.GradOperation` operation; the `construct` function differentiates `self.net`.\n", "\n", "The gradient network is defined as follows:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "class GradNetWrtX(nn.Cell):\n", "    def __init__(self, net):\n", "        super(GradNetWrtX, self).__init__()\n", "        self.net = net\n", "        self.grad_op = ops.GradOperation()\n", "\n", "    def construct(self, x, y):\n", "        gradient_function = self.grad_op(self.net)\n", "        return gradient_function(x, y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "Define the inputs and print the output:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[4.5099998 2.7 3.6000001]\n", " [4.5099998 2.7 3.6000001]]\n" ] } ], "source": [ "x = Tensor([[0.8, 0.6, 0.2], [1.8, 1.3, 1.1]], dtype=mstype.float32)\n", "y = Tensor([[0.11, 3.3, 1.1], [1.1, 0.2, 1.4], [1.1, 2.2, 0.3]], dtype=mstype.float32)\n", "output = GradNetWrtX(Net())(x, y)\n", "print(output)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "To differentiate with respect to both inputs `x` and `y`, set `self.grad_op = ops.GradOperation(get_all=True)` in `GradNetWrtX`." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## First-Order Derivative with Respect to the Weights\n", "\n", "To differentiate with respect to the weights, set `get_by_list` to `True` in `ops.GradOperation`.\n", "\n", "`GradNetWrtX` then becomes:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "class GradNetWrtX(nn.Cell):\n", "    def __init__(self, net):\n", "        super(GradNetWrtX, self).__init__()\n", "        self.net = net\n", "        self.params = ParameterTuple(net.trainable_params())\n", "        self.grad_op = ops.GradOperation(get_by_list=True)\n", "\n", "    def construct(self, x, y):\n", "        gradient_function = self.grad_op(self.net, self.params)\n", 
"        return gradient_function(x, y)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Run it and print the output:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "(Tensor(shape=[1], dtype=Float32, value= [ 2.15359993e+01]),)\n" ] } ], "source": [ "output = GradNetWrtX(Net())(x, y)\n", "print(output)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "To exclude a weight from differentiation, set `requires_grad` to `False` on that weight when defining the network.\n", "\n", "```python\n", "self.z = Parameter(Tensor(np.array([1.0], np.float32)), name='z', requires_grad=False)\n", "```\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Gradient Value Scaling\n", "\n", "The `sens_param` parameter scales the network output and thereby changes the final gradient. Set `sens_param` to `True` in `ops.GradOperation`, then choose a scaling factor whose shape matches that of the output.\n", "\n", "The scaling factor `self.grad_wrt_output` can be written as:\n", "\n", "```python\n", "self.grad_wrt_output = Tensor([[s1, s2, s3], [s4, s5, s6]])\n", "```\n", "\n", "`GradNetWrtX` then becomes:" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[[2.211 0.51 1.49 ]\n", " [5.588 2.68 4.07 ]]\n" ] } ], "source": [ "class GradNetWrtX(nn.Cell):\n", "    def __init__(self, net):\n", "        super(GradNetWrtX, self).__init__()\n", "        self.net = net\n", "        self.grad_op = ops.GradOperation(sens_param=True)\n", "        self.grad_wrt_output = Tensor([[0.1, 0.6, 0.2], [0.8, 1.3, 1.1]], dtype=mstype.float32)\n", "\n", "    def construct(self, x, y):\n", "        gradient_function = self.grad_op(self.net)\n", "        return gradient_function(x, y, self.grad_wrt_output)\n", "\n", "output = GradNetWrtX(Net())(x, y)\n", "print(output)" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", 
"version": "3.7.0" } }, "nbformat": 4, "nbformat_minor": 4 }