{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 自动求导\n",
    "\n",
    "[![在线运行](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r1.7/resource/_static/logo_modelarts.png)](https://authoring-modelarts-cnnorth4.huaweicloud.com/console/lab?share-url-b64=aHR0cHM6Ly9vYnMuZHVhbHN0YWNrLmNuLW5vcnRoLTQubXlodWF3ZWljbG91ZC5jb20vbWluZHNwb3JlLXdlYnNpdGUvbm90ZWJvb2svcjEuNy90dXRvcmlhbHMvemhfY24vYWR2YW5jZWQvbmV0d29yay9taW5kc3BvcmVfZGVyaXZhdGlvbi5pcHluYg==&imageid=9d63f4d1-dc09-4873-b669-3483cea777c0)&emsp;[![下载Notebook](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r1.7/resource/_static/logo_notebook.png)](https://obs.dualstack.cn-north-4.myhuaweicloud.com/mindspore-website/notebook/r1.7/tutorials/zh_cn/advanced/network/mindspore_derivation.ipynb)&emsp;\n",
    "[![下载样例代码](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r1.7/resource/_static/logo_download_code.png)](https://obs.dualstack.cn-north-4.myhuaweicloud.com/mindspore-website/notebook/r1.7/tutorials/zh_cn/advanced/network/mindspore_derivation.py)&emsp;\n",
    "[![查看源文件](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r1.7/resource/_static/logo_source.png)](https://gitee.com/mindspore/docs/blob/r1.7/tutorials/source_zh_cn/advanced/network/derivation.ipynb)\n",
    "\n",
    "`mindspore.ops`模块提供的`GradOperation`接口可以生成网络模型的梯度。本文主要介绍如何使用`GradOperation`接口进行一阶、二阶求导，以及如何停止计算梯度。\n",
    "\n",
    "> 更多求导接口相关信息可参考[API文档](https://mindspore.cn/docs/zh-CN/r1.7/api_python/ops/mindspore.ops.GradOperation.html#mindspore.ops.GradOperation)。\n",
    "\n",
    "## 一阶求导\n",
    "\n",
    "计算一阶导数方法：`mindspore.ops.GradOperation()`，其中参数使用方式为：\n",
    "\n",
    "- `get_all`：为`False`时，只会对第一个输入求导；为`True`时，会对所有输入求导。\n",
    "- `get_by_list：`为`False`时，不会对权重求导；为`True`时，会对权重求导。\n",
    "- `sens_param`：对网络的输出值做缩放以改变最终梯度，故其维度与输出维度保持一致；\n",
    "\n",
    "下面我们先使用[MatMul](https://mindspore.cn/docs/zh-CN/r1.7/api_python/ops/mindspore.ops.MatMul.html#mindspore.ops.MatMul)算子构建自定义网络模型`Net`，再对其进行一阶求导，通过这样一个例子对`GradOperation`接口的使用方式做简单介绍，即公式：\n",
    "\n",
    "$$f(x, y)=(x * z) * y \\tag{1}$$\n",
    "\n",
    "首先我们要定义网络模型`Net`、输入`x`和输入`y`："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import mindspore.nn as nn\n",
    "import mindspore.ops as ops\n",
    "from mindspore import Tensor\n",
    "from mindspore import ParameterTuple, Parameter\n",
    "from mindspore import dtype as mstype\n",
    "\n",
    "# 定义输入x和y\n",
    "x = Tensor([[0.8, 0.6, 0.2], [1.8, 1.3, 1.1]], dtype=mstype.float32)\n",
    "y = Tensor([[0.11, 3.3, 1.1], [1.1, 0.2, 1.4], [1.1, 2.2, 0.3]], dtype=mstype.float32)\n",
    "\n",
    "class Net(nn.Cell):\n",
    "    \"\"\"定义矩阵相乘网络Net\"\"\"\n",
    "\n",
    "    def __init__(self):\n",
    "        super(Net, self).__init__()\n",
    "        self.matmul = ops.MatMul()\n",
    "        self.z = Parameter(Tensor(np.array([1.0], np.float32)), name='z')\n",
    "\n",
    "    def construct(self, x, y):\n",
    "        x = x * self.z\n",
    "        out = self.matmul(x, y)\n",
    "        return out"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### 对输入进行求导\n",
    "\n",
    "对输入值进行求导，代码如下："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[4.5099998 2.7       3.6000001]\n",
      " [4.5099998 2.7       3.6000001]]\n"
     ]
    }
   ],
   "source": [
    "class GradNetWrtX(nn.Cell):\n",
    "    \"\"\"定义网络输入的一阶求导\"\"\"\n",
    "\n",
    "    def __init__(self, net):\n",
    "        super(GradNetWrtX, self).__init__()\n",
    "        self.net = net\n",
    "        self.grad_op = ops.GradOperation()\n",
    "\n",
    "    def construct(self, x, y):\n",
    "        gradient_function = self.grad_op(self.net)\n",
    "        return gradient_function(x, y)\n",
    "\n",
    "output = GradNetWrtX(Net())(x, y)\n",
    "print(output)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "接下来我们对上面的结果做一个解释。为便于分析，我们把上面的输入`x`、`y`以及权重`z`表示成如下形式:\n",
    "\n",
    "```text\n",
    "x = Tensor([[x1, x2, x3], [x4, x5, x6]])\n",
    "y = Tensor([[y1, y2, y3], [y4, y5, y6], [y7, y8, y9]])\n",
    "z = Tensor([z])\n",
    "```\n",
    "\n",
    "根据MatMul算子定义可得前向结果：\n",
    "\n",
    "$$output = [[(x_1 \\cdot y_1 + x_2 \\cdot y_4 + x_3 \\cdot y_7) \\cdot z, (x_1 \\cdot y_2 + x_2 \\cdot y_5 + x_3 \\cdot y_8) \\cdot z, (x_1 \\cdot y_3 + x_2 \\cdot y_6 + x_3 \\cdot y_9) \\cdot z],$$\n",
    "\n",
    "$$[(x_4 \\cdot y_1 + x_5 \\cdot y_4 + x_6 \\cdot y_7) \\cdot z, (x_4 \\cdot y_2 + x_5 \\cdot y_5 + x_6 \\cdot y_8) \\cdot z, (x_4 \\cdot y_3 + x_5 \\cdot y_6 + x_6 \\cdot y_9) \\cdot z]] \\tag{2}$$\n",
    "\n",
    "梯度计算时由于MindSpore采用的是Reverse自动微分机制，会对输出结果求和后再对输入`x`求导：\n",
    "\n",
    "1. 求和公式：\n",
    "\n",
    "$$\\sum{output} = [(x_1 \\cdot y_1 + x_2 \\cdot y_4 + x_3 \\cdot y_7) + (x_1 \\cdot y_2 + x_2 \\cdot y_5 + x_3 \\cdot y_8) + (x_1 \\cdot y_3 + x_2 \\cdot y_6 + x_3 \\cdot y_9)$$\n",
    "\n",
    "$$+ (x_4 \\cdot y_1 + x_5 \\cdot y_4 + x_6 \\cdot y_7) + (x_4 \\cdot y_2 + x_5 \\cdot y_5 + x_6 \\cdot y_8) + (x_4 \\cdot y_3 + x_5 \\cdot y_6 + x_6 \\cdot y_9)] \\cdot z \\tag{3}$$\n",
    "\n",
    "2. 求导公式：\n",
    "\n",
    "$$\\frac{\\mathrm{d}(\\sum{output})}{\\mathrm{d}x} = [[(y_1 + y_2 + y_3) \\cdot z, (y_4 + y_5 + y_6) \\cdot z, (y_7 + y_8 + y_9) \\cdot z],$$\n",
    "\n",
    "$$[(y_1 + y_2 + y_3) \\cdot z, (y_4 + y_5 + y_6) \\cdot z, (y_7 + y_8 + y_9) \\cdot z]] \\tag{4}$$\n",
    "\n",
    "3. 计算结果：\n",
    "\n",
    "$$\\frac{\\mathrm{d}(\\sum{output})}{\\mathrm{d}x} = [[4.51 \\quad 2.7 \\quad 3.6] [4.51 \\quad 2.7 \\quad 3.6]] \\tag{5}$$\n",
    "\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> 若考虑对`x`、`y`输入求导，只需在`GradNetWrtX`中设置`self.grad_op = GradOperation(get_all=True)`。\n",
    "\n",
    "### 对权重进行求导\n",
    "\n",
    "对权重进行求导，示例代码如下：\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[21.536]\n"
     ]
    }
   ],
   "source": [
    "class GradNetWrtZ(nn.Cell):\n",
    "    \"\"\"定义网络权重的一阶求导\"\"\"\n",
    "\n",
    "    def __init__(self, net):\n",
    "        super(GradNetWrtZ, self).__init__()\n",
    "        self.net = net\n",
    "        self.params = ParameterTuple(net.trainable_params())\n",
    "        self.grad_op = ops.GradOperation(get_by_list=True)\n",
    "\n",
    "    def construct(self, x, y):\n",
    "        gradient_function = self.grad_op(self.net, self.params)\n",
    "        return gradient_function(x, y)\n",
    "\n",
    "output = GradNetWrtZ(Net())(x, y)\n",
    "print(output[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "下面我们通过公式对上面的结果做一个解释。对权重的求导公式为：\n",
    "\n",
    "$$\\frac{\\mathrm{d}(\\sum{output})}{\\mathrm{d}z} = (x_1 \\cdot y_1 + x_2 \\cdot y_4 + x_3 \\cdot y_7) + (x_1 \\cdot y_2 + x_2 \\cdot y_5 + x_3 \\cdot y_8) + (x_1 \\cdot y_3 + x_2 \\cdot y_6 + x_3 \\cdot y_9)$$\n",
    "\n",
    "$$+ (x_4 \\cdot y_1 + x_5 \\cdot y_4 + x_6 \\cdot y_7) + (x_4 \\cdot y_2 + x_5 \\cdot y_5 + x_6 \\cdot y_8) + (x_4 \\cdot y_3 + x_5 \\cdot y_6 + x_6 \\cdot y_9) \\tag{6}$$\n",
    "\n",
    "计算结果：\n",
    "\n",
    "$$\\frac{\\mathrm{d}(\\sum{output})}{\\mathrm{d}z} = [2.1536e+01] \\tag{7}$$\n",
    "\n",
    "### 梯度值缩放\n",
    "\n",
    "可以通过`sens_param`参数控制梯度值的缩放："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[2.211 0.51  1.49 ]\n",
      " [5.588 2.68  4.07 ]]\n"
     ]
    }
   ],
   "source": [
    "class GradNetWrtN(nn.Cell):\n",
    "    \"\"\"定义网络的一阶求导，控制梯度值缩放\"\"\"\n",
    "    def __init__(self, net):\n",
    "        super(GradNetWrtN, self).__init__()\n",
    "        self.net = net\n",
    "        self.grad_op = ops.GradOperation(sens_param=True)\n",
    "\n",
    "        # 定义梯度值缩放\n",
    "        self.grad_wrt_output = Tensor([[0.1, 0.6, 0.2], [0.8, 1.3, 1.1]], dtype=mstype.float32)\n",
    "\n",
    "    def construct(self, x, y):\n",
    "        gradient_function = self.grad_op(self.net)\n",
    "        return gradient_function(x, y, self.grad_wrt_output)\n",
    "\n",
    "output = GradNetWrtN(Net())(x, y)\n",
    "print(output)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "为了方便对上面的结果进行解释，我们把`self.grad_wrt_output`记作如下形式：\n",
    "\n",
    "```text\n",
    "self.grad_wrt_output = Tensor([[s1, s2, s3], [s4, s5, s6]])\n",
    "```\n",
    "\n",
    "缩放后的输出值为原输出值与`self.grad_wrt_output`对应元素的乘积，公式为：\n",
    "\n",
    "$$output = [[(x_1 \\cdot y_1 + x_2 \\cdot y_4 + x_3 \\cdot y_7) \\cdot z \\cdot s_1, (x_1 \\cdot y_2 + x_2 \\cdot y_5 + x_3 \\cdot y_8) \\cdot z \\cdot s_2, (x_1 \\cdot y_3 + x_2 \\cdot y_6 + x_3 \\cdot y_9) \\cdot z \\cdot s_3], $$\n",
    "\n",
    "$$[(x_4 \\cdot y_1 + x_5 \\cdot y_4 + x_6 \\cdot y_7) \\cdot z \\cdot s_4, (x_4 \\cdot y_2 + x_5 \\cdot y_5 + x_6 \\cdot y_8) \\cdot z \\cdot s_5, (x_4 \\cdot y_3 + x_5 \\cdot y_6 + x_6 \\cdot y_9) \\cdot z \\cdot s_6]] \\tag{8}$$\n",
    "\n",
    "求导公式变为输出值总和对`x`的每个元素求导：\n",
    "\n",
    "$$\\frac{\\mathrm{d}(\\sum{output})}{\\mathrm{d}x} = [[(s_1 \\cdot y_1 + s_2 \\cdot y_2 + s_3 \\cdot y_3) \\cdot z, (s_1 \\cdot y_4 + s_2 \\cdot y_5 + s_3 \\cdot y_6) \\cdot z, (s_1 \\cdot y_7 + s_2 \\cdot y_8 + s_3 \\cdot y_9) \\cdot z],$$\n",
    "\n",
    "$$[(s_4 \\cdot y_1 + s_5 \\cdot y_2 + s_6 \\cdot y_3) \\cdot z, (s_4 \\cdot y_4 + s_5 \\cdot y_5 + s_6 \\cdot y_6) \\cdot z, (s_4 \\cdot y_7 + s_5 \\cdot y_8 + s_6 \\cdot y_9) \\cdot z]] \\tag{9}$$\n",
    "\n",
    "计算结果：\n",
    "\n",
    "$$\\frac{\\mathrm{d}(\\sum{output})}{\\mathrm{d}x} = [[2.211 \\quad 0.51 \\quad 1.49][5.588 \\quad 2.68 \\quad 4.07]] \\tag{10}$$\n",
    "\n",
    "### 停止计算梯度\n",
    "\n",
    "我们可以使用`stop_gradient`来停止计算指定算子的梯度，从而消除该算子对梯度的影响。\n",
    "\n",
    "在上面一阶求导使用的矩阵相乘网络模型的基础上，我们再增加一个算子`out2`并禁止计算其梯度，得到自定义网络`Net2`，然后看一下对输入的求导结果情况。\n",
    "\n",
    "示例代码如下："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[4.5099998 2.7       3.6000001]\n",
      " [4.5099998 2.7       3.6000001]]\n"
     ]
    }
   ],
   "source": [
    "class Net(nn.Cell):\n",
    "\n",
    "    def __init__(self):\n",
    "        super(Net, self).__init__()\n",
    "        self.matmul = ops.MatMul()\n",
    "\n",
    "    def construct(self, x, y):\n",
    "        out1 = self.matmul(x, y)\n",
    "        out2 = self.matmul(x, y)\n",
    "        out2 = ops.stop_gradient(out2)  # 停止计算out2算子的梯度\n",
    "        out = out1 + out2\n",
    "        return out\n",
    "\n",
    "class GradNetWrtX(nn.Cell):\n",
    "\n",
    "    def __init__(self, net):\n",
    "        super(GradNetWrtX, self).__init__()\n",
    "        self.net = net\n",
    "        self.grad_op = ops.GradOperation()\n",
    "\n",
    "    def construct(self, x, y):\n",
    "        gradient_function = self.grad_op(self.net)\n",
    "        return gradient_function(x, y)\n",
    "\n",
    "output = GradNetWrtX(Net())(x, y)\n",
    "print(output)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "从上面的打印可以看出，由于对`out2`设置了`stop_gradient`, 所以`out2`没有对梯度计算有任何的贡献，其输出结果与未加`out2`算子时一致。\n",
    "\n",
    "下面我们删除`out2 = stop_gradient(out2)`，再来看一下输出结果。示例代码为："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[9.0199995 5.4       7.2000003]\n",
      " [9.0199995 5.4       7.2000003]]\n"
     ]
    }
   ],
   "source": [
    "class Net(nn.Cell):\n",
    "    def __init__(self):\n",
    "        super(Net, self).__init__()\n",
    "        self.matmul = ops.MatMul()\n",
    "\n",
    "    def construct(self, x, y):\n",
    "        out1 = self.matmul(x, y)\n",
    "        out2 = self.matmul(x, y)\n",
    "        # out2 = stop_gradient(out2)\n",
    "        out = out1 + out2\n",
    "        return out\n",
    "\n",
    "class GradNetWrtX(nn.Cell):\n",
    "    def __init__(self, net):\n",
    "        super(GradNetWrtX, self).__init__()\n",
    "        self.net = net\n",
    "        self.grad_op = ops.GradOperation()\n",
    "\n",
    "    def construct(self, x, y):\n",
    "        gradient_function = self.grad_op(self.net)\n",
    "        return gradient_function(x, y)\n",
    "\n",
    "output = GradNetWrtX(Net())(x, y)\n",
    "print(output)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "打印结果可以看出，在我们把`out2`算子的梯度也计算进去之后，由于`out2`和`out1`算子完全相同，因此它们产生的梯度也完全相同，所以我们可以看到，结果中每一项的值都变为了原来的两倍（存在精度误差）。\n",
    "\n",
    "## 高阶求导\n",
    "\n",
    "高阶微分在AI支持科学计算、二阶优化等领域均有应用。如分子动力学模拟中，利用神经网络训练势能时，损失函数中需计算神经网络输出对输入的导数，则反向传播便存在损失函数对输入、权重的二阶交叉导数。\n",
    "\n",
    "此外，AI求解微分方程（如PINNs方法）还会存在输出对输入的二阶导数。又如二阶优化中，为了能够让神经网络快速收敛，牛顿法等需计算损失函数对权重的二阶导数。\n",
    "\n",
    "MindSpore可通过多次求导的方式支持高阶导数，下面通过几类例子展开阐述。\n",
    "\n",
    "### 单输入单输出高阶导数\n",
    "\n",
    "例如Sin算子，其公式为：\n",
    "\n",
    "$$f(x) = sin(x) \\tag{1}$$\n",
    "\n",
    "其一阶导数是：\n",
    "\n",
    "$$f'(x) = cos(x) \\tag{2}$$\n",
    "\n",
    "其二阶导数为：\n",
    "\n",
    "$$f''(x) = cos'(x) = -sin(x) \\tag{3}$$\n",
    "\n",
    "其二阶导数（-Sin）实现如下："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[-0.]\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "import mindspore.nn as nn\n",
    "import mindspore.ops as ops\n",
    "from mindspore import Tensor\n",
    "\n",
    "class Net(nn.Cell):\n",
    "    \"\"\"定义基于Sin算子的网络模型\"\"\"\n",
    "    def __init__(self):\n",
    "        super(Net, self).__init__()\n",
    "        self.sin = ops.Sin()\n",
    "\n",
    "    def construct(self, x):\n",
    "        out = self.sin(x)\n",
    "        return out\n",
    "\n",
    "class Grad(nn.Cell):\n",
    "    \"\"\"一阶求导\"\"\"\n",
    "    def __init__(self, network):\n",
    "        super(Grad, self).__init__()\n",
    "        self.grad = ops.GradOperation()\n",
    "        self.network = network\n",
    "\n",
    "    def construct(self, x):\n",
    "        gout = self.grad(self.network)(x)\n",
    "        return gout\n",
    "\n",
    "class GradSec(nn.Cell):\n",
    "    \"\"\"二阶求导\"\"\"\n",
    "    def __init__(self, network):\n",
    "        super(GradSec, self).__init__()\n",
    "        self.grad = ops.GradOperation()\n",
    "        self.network = network\n",
    "\n",
    "    def construct(self, x):\n",
    "        gout = self.grad(self.network)(x)\n",
    "        return gout\n",
    "\n",
    "x_train = Tensor(np.array([3.1415926]), dtype=mstype.float32)\n",
    "\n",
    "net = Net()\n",
    "firstgrad = Grad(net)\n",
    "secondgrad = GradSec(firstgrad)\n",
    "output = secondgrad(x_train)\n",
    "\n",
    "# 打印结果\n",
    "result = np.around(output.asnumpy(), decimals=2)\n",
    "print(result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "从上面的打印结果可以看出，`-sin(3.1415926)`的值接近于`0`。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "> 由于不同计算平台的精度可能存在差异，因此本章节中的代码在不同平台上的执行结果会存在微小的差别。"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "MindSpore",
   "language": "python",
   "name": "mindspore"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}