{
 "cells": [
  {
   "cell_type": "markdown",
   "source": [
    "# 自动微分\n",
    "\n",
    "`Ascend` `GPU` `CPU` `入门` `模型开发`\n",
    "\n",
    "[![在线运行](https://gitee.com/mindspore/docs/raw/r1.6/resource/_static/logo_modelarts.png)](https://authoring-modelarts-cnnorth4.huaweicloud.com/console/lab?share-url-b64=aHR0cHM6Ly9taW5kc3BvcmUtd2Vic2l0ZS5vYnMuY24tbm9ydGgtNC5teWh1YXdlaWNsb3VkLmNvbS9ub3RlYm9vay9tb2RlbGFydHMvcXVpY2tfc3RhcnQvbWluZHNwb3JlX2F1dG9ncmFkLmlweW5i&imageid=65f636a0-56cf-49df-b941-7d2a07ba8c8c)&emsp;[![下载Notebook](https://gitee.com/mindspore/docs/raw/r1.6/resource/_static/logo_notebook.png)](https://obs.dualstack.cn-north-4.myhuaweicloud.com/mindspore-website/notebook/r1.6/tutorials/zh_cn/mindspore_autograd.ipynb)&emsp;[![下载样例代码](https://gitee.com/mindspore/docs/raw/r1.6/resource/_static/logo_download_code.png)](https://obs.dualstack.cn-north-4.myhuaweicloud.com/mindspore-website/notebook/r1.6/tutorials/zh_cn/mindspore_autograd.py)&emsp;[![查看源文件](https://gitee.com/mindspore/docs/raw/r1.6/resource/_static/logo_source.png)](https://gitee.com/mindspore/docs/blob/r1.6/tutorials/source_zh_cn/autograd.ipynb)"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "自动微分是网络训练中常用的反向传播算法的一般化，利用该算法用户可以将多层复合函数分解为一系列简单的基本运算，该功能让用户可以跳过复杂的求导过程的编程，从而大大降低框架的使用门槛。\n",
    "\n",
    "MindSpore计算一阶导数方法`mindspore.ops.GradOperation (get_all=False, get_by_list=False, sens_param=False)`，其中`get_all`为`False`时，只会对第一个输入求导，为`True`时，会对所有输入求导；`get_by_list`为`False`时，不会对权重求导，为`True`时，会对权重求导；`sens_param`对网络的输出值做缩放以改变最终梯度。下面用MatMul算子的求导做深入分析。\n",
    "\n",
    "首先导入本文档需要的模块和接口，如下所示："
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "source": [
    "import numpy as np\n",
    "import mindspore.nn as nn\n",
    "import mindspore.ops as ops\n",
    "from mindspore import Tensor\n",
    "from mindspore import ParameterTuple, Parameter\n",
    "from mindspore import dtype as mstype"
   ],
   "outputs": [],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "## 对输入求一阶导\n",
    "\n",
    "如果需要对输入进行求导，首先需要定义一个需要求导的网络，以一个由MatMul算子构成的网络$f(x,y)=z*x*y$为例。\n",
    "\n",
    "定义网络结构如下："
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "source": [
    "class Net(nn.Cell):\n",
    "    def __init__(self):\n",
    "        super(Net, self).__init__()\n",
    "        self.matmul = ops.MatMul()\n",
    "        self.z = Parameter(Tensor(np.array([1.0], np.float32)), name='z')\n",
    "\n",
    "    def construct(self, x, y):\n",
    "        x = x * self.z\n",
    "        out = self.matmul(x, y)\n",
    "        return out"
   ],
   "outputs": [],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "接着定义求导网络，`__init__`函数中定义需要求导的网络`self.net`和`ops.GradOperation`操作，`construct`函数中对`self.net`进行求导。\n",
    "\n",
    "求导网络结构如下："
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "source": [
    "class GradNetWrtX(nn.Cell):\n",
    "    def __init__(self, net):\n",
    "        super(GradNetWrtX, self).__init__()\n",
    "        self.net = net\n",
    "        self.grad_op = ops.GradOperation()\n",
    "\n",
    "    def construct(self, x, y):\n",
    "        gradient_function = self.grad_op(self.net)\n",
    "        return gradient_function(x, y)"
   ],
   "outputs": [],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "定义输入并且打印输出："
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "source": [
    "x = Tensor([[0.8, 0.6, 0.2], [1.8, 1.3, 1.1]], dtype=mstype.float32)\n",
    "y = Tensor([[0.11, 3.3, 1.1], [1.1, 0.2, 1.4], [1.1, 2.2, 0.3]], dtype=mstype.float32)\n",
    "output = GradNetWrtX(Net())(x, y)\n",
    "print(output)"
   ],
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "[[4.5099998 2.7       3.6000001]\n",
      " [4.5099998 2.7       3.6000001]]\n"
     ]
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "若考虑对`x`、`y`输入求导，只需在`GradNetWrtX`中设置`self.grad_op = GradOperation(get_all=True)`。"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "## 对权重求一阶导\n",
    "\n",
    "若需要对权重的求导，将`ops.GradOperation`中的`get_by_list`设置为`True`：\n",
    "\n",
    "则`GradNetWrtX`结构为："
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "source": [
    "class GradNetWrtX(nn.Cell):\n",
    "    def __init__(self, net):\n",
    "        super(GradNetWrtX, self).__init__()\n",
    "        self.net = net\n",
    "        self.params = ParameterTuple(net.trainable_params())\n",
    "        self.grad_op = ops.GradOperation(get_by_list=True)\n",
    "\n",
    "    def construct(self, x, y):\n",
    "        gradient_function = self.grad_op(self.net, self.params)\n",
    "        return gradient_function(x, y)"
   ],
   "outputs": [],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "运行并打印输出："
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "source": [
    "output = GradNetWrtX(Net())(x, y)\n",
    "print(output)"
   ],
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "(Tensor(shape=[1], dtype=Float32, value= [ 2.15359993e+01]),)\n"
     ]
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "若需要对某些权重不进行求导，则在定义求导网络时，对相应的权重中`requires_grad`设置为`False`。\n",
    "\n",
    "```Python\n",
    "self.z = Parameter(Tensor(np.array([1.0], np.float32)), name='z', requires_grad=False)\n",
    "```"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "## 梯度值缩放\n",
    "\n",
    "可以通过`sens_param`参数对网络的输出值做缩放以改变最终梯度。首先将`ops.GradOperation`中的`sens_param`设置为`True`，并确定缩放指数，其维度与输出维度保持一致。\n",
    "\n",
    "缩放指数`self.grad_wrt_output`可以记作如下形式：\n",
    "\n",
    "```python\n",
    "self.grad_wrt_output = Tensor([[s1, s2, s3], [s4, s5, s6]])\n",
    "```\n",
    "\n",
    "则`GradNetWrtX`结构为："
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "source": [
    "class GradNetWrtX(nn.Cell):\n",
    "    def __init__(self, net):\n",
    "        super(GradNetWrtX, self).__init__()\n",
    "        self.net = net\n",
    "        self.grad_op = ops.GradOperation(sens_param=True)\n",
    "        self.grad_wrt_output = Tensor([[0.1, 0.6, 0.2], [0.8, 1.3, 1.1]], dtype=mstype.float32)\n",
    "\n",
    "    def construct(self, x, y):\n",
    "        gradient_function = self.grad_op(self.net)\n",
    "        return gradient_function(x, y, self.grad_wrt_output)\n",
    "\n",
    "output = GradNetWrtX(Net())(x, y)\n",
    "print(output)"
   ],
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "[[2.211 0.51  1.49 ]\n",
      " [5.588 2.68  4.07 ]]\n"
     ]
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "## 停止计算梯度\n",
    "\n",
    "我们可以使用`stop_gradient`来禁止网络内的算子对梯度的影响，例如："
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "import numpy as np\n",
    "import mindspore.nn as nn\n",
    "import mindspore.ops as ops\n",
    "from mindspore import Tensor\n",
    "from mindspore import ParameterTuple, Parameter\n",
    "from mindspore import dtype as mstype\n",
    "from mindspore.ops import stop_gradient\n",
    "\n",
    "class Net(nn.Cell):\n",
    "    def __init__(self):\n",
    "        super(Net, self).__init__()\n",
    "        self.matmul = ops.MatMul()\n",
    "\n",
    "    def construct(self, x, y):\n",
    "        out1 = self.matmul(x, y)\n",
    "        out2 = self.matmul(x, y)\n",
    "        out2 = stop_gradient(out2)\n",
    "        out = out1 + out2\n",
    "        return out\n",
    "\n",
    "class GradNetWrtX(nn.Cell):\n",
    "    def __init__(self, net):\n",
    "        super(GradNetWrtX, self).__init__()\n",
    "        self.net = net\n",
    "        self.grad_op = ops.GradOperation()\n",
    "\n",
    "    def construct(self, x, y):\n",
    "        gradient_function = self.grad_op(self.net)\n",
    "        return gradient_function(x, y)\n",
    "\n",
    "x = Tensor([[0.8, 0.6, 0.2], [1.8, 1.3, 1.1]], dtype=mstype.float32)\n",
    "y = Tensor([[0.11, 3.3, 1.1], [1.1, 0.2, 1.4], [1.1, 2.2, 0.3]], dtype=mstype.float32)\n",
    "output = GradNetWrtX(Net())(x, y)\n",
    "print(output)"
   ],
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "[[4.5 2.7 3.6]\n",
      " [4.5 2.7 3.6]]\n"
     ]
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "在这里我们对`out2`设置了`stop_gradient`, 所以`out2`没有对梯度计算有任何的贡献。 如果我们删除`out2 = stop_gradient(out2)`，那么输出值会变为："
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "output = GradNetWrtX(Net())(x, y)\n",
    "print(output)"
   ],
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      "[[9.0 5.4 7.2]\n",
      " [9.0 5.4 7.2]]\n"
     ]
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "在我们不对`out2`设置`stop_gradient`后， `out2`和`out1`会对梯度产生相同的贡献。 所以我们可以看到，结果中每一项的值都变为了原来的两倍。"
   ],
   "metadata": {}
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "MindSpore",
   "language": "python",
   "name": "mindspore"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}