{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 自动微分\n",
    "\n",
    "[![](https://gitee.com/mindspore/docs/raw/r1.2/resource/_static/logo_source.png)](https://gitee.com/mindspore/docs/blob/r1.2/tutorials/source_zh_cn/autograd.ipynb)&emsp;[![](https://gitee.com/mindspore/docs/raw/r1.2/resource/_static/logo_notebook.png)](https://obs.dualstack.cn-north-4.myhuaweicloud.com/mindspore-website/notebook/r1.2/quick_start/mindspore_autograd.ipynb)&emsp;[![](https://gitee.com/mindspore/docs/raw/r1.2/tutorials/training/source_zh_cn/_static/logo_modelarts.png)](https://console.huaweicloud.com/modelarts/?region=cn-north-4#/notebook/loading?share-url-b64=aHR0cHM6Ly9vYnMuZHVhbHN0YWNrLmNuLW5vcnRoLTQubXlodWF3ZWljbG91ZC5jb20vbWluZHNwb3JlLXdlYnNpdGUvbm90ZWJvb2svbW9kZWxhcnRzL3F1aWNrX3N0YXJ0L21pbmRzcG9yZV9hdXRvZ3JhZC5pcHluYg==&image_id=65f636a0-56cf-49df-b941-7d2a07ba8c8c)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "在训练神经网络时，最常用的算法是反向传播，在该算法中，根据损失函数对于给定参数的梯度来调整参数（模型权重）。\n",
    "\n",
    "MindSpore计算一阶导数方法`mindspore.ops.GradOperation (get_all=False, get_by_list=False, sens_param=False)`，其中`get_all`为`False`时，只会对第一个输入求导，为`True`时，会对所有输入求导；`get_by_list`为`False`时，不会对权重求导，为`True`时，会对权重求导；`sens_param`对网络的输出值做缩放以改变最终梯度。下面用MatMul算子的求导做深入分析。\n",
    "\n",
    "首先导入本文档需要的模块和接口，如下所示："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import mindspore.nn as nn\n",
    "import mindspore.ops as ops\n",
    "from mindspore import Tensor\n",
    "from mindspore import ParameterTuple, Parameter\n",
    "from mindspore import dtype as mstype"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 对输入求一阶导\n",
    "\n",
    "如果需要对输入进行求导，首先需要定义一个需要求导的网络，以一个由MatMul算子构成的网络$f(x,y)=z * x * y$为例。\n",
    "\n",
    "定义网络结构如下："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "class Net(nn.Cell):\n",
    "    def __init__(self):\n",
    "        super(Net, self).__init__()\n",
    "        self.matmul = ops.MatMul()\n",
    "        self.z = Parameter(Tensor(np.array([1.0], np.float32)), name='z')\n",
    "\n",
    "    def construct(self, x, y):\n",
    "        x = x * self.z\n",
    "        out = self.matmul(x, y)\n",
    "        return out"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "接着定义求导网络，`__init__`函数中定义需要求导的网络`self.net`和`ops.GradOperation`操作，`construct`函数中对`self.net`进行求导。\n",
    "\n",
    "求导网络结构如下："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "class GradNetWrtX(nn.Cell):\n",
    "    def __init__(self, net):\n",
    "        super(GradNetWrtX, self).__init__()\n",
    "        self.net = net\n",
    "        self.grad_op = ops.GradOperation()\n",
    "\n",
    "    def construct(self, x, y):\n",
    "        gradient_function = self.grad_op(self.net)\n",
    "        return gradient_function(x, y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "定义输入并且打印输出："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[4.5099998 2.7       3.6000001]\n",
      " [4.5099998 2.7       3.6000001]]\n"
     ]
    }
   ],
   "source": [
    "x = Tensor([[0.8, 0.6, 0.2], [1.8, 1.3, 1.1]], dtype=mstype.float32)\n",
    "y = Tensor([[0.11, 3.3, 1.1], [1.1, 0.2, 1.4], [1.1, 2.2, 0.3]], dtype=mstype.float32)\n",
    "output = GradNetWrtX(Net())(x, y)\n",
    "print(output)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "若考虑对`x`、`y`输入求导，只需在`GradNetWrtX`中设置`self.grad_op = GradOperation(get_all=True)`。"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 对权重求一阶导\n",
    "\n",
    "若需要对权重的求导，将`ops.GradOperation`中的`get_by_list`设置为`True`：\n",
    "\n",
    "则`GradNetWrtX`结构为："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [],
   "source": [
    "class GradNetWrtX(nn.Cell):\n",
    "    def __init__(self, net):\n",
    "        super(GradNetWrtX, self).__init__()\n",
    "        self.net = net\n",
    "        self.params = ParameterTuple(net.trainable_params())\n",
    "        self.grad_op = ops.GradOperation(get_by_list=True)\n",
    "\n",
    "    def construct(self, x, y):\n",
    "        gradient_function = self.grad_op(self.net, self.params)\n",
    "        return gradient_function(x, y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "运行并打印输出："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(Tensor(shape=[1], dtype=Float32, value= [ 2.15359993e+01]),)\n"
     ]
    }
   ],
   "source": [
    "output = GradNetWrtX(Net())(x, y)\n",
    "print(output)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n",
    "若需要对某些权重不进行求导，则在定义求导网络时，对相应的权重中`requires_grad`设置为`False`。\n",
    "\n",
    "```Python\n",
    "self.z = Parameter(Tensor(np.array([1.0], np.float32)), name='z', requires_grad=False)\n",
    "```\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 梯度值缩放\n",
    "\n",
    "可以通过`sens_param`参数对网络的输出值做缩放以改变最终梯度。首先将`ops.GradOperation`中的`sens_param`设置为`True`，并确定缩放指数，其维度与输出维度保持一致。\n",
    "\n",
    "缩放指数`self.grad_wrt_output`可以记作如下形式：\n",
    "\n",
    "```python\n",
    "self.grad_wrt_output = Tensor([[s1, s2, s3], [s4, s5, s6]])\n",
    "```\n",
    "\n",
    "则`GradNetWrtX`结构为："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[[2.211 0.51  1.49 ]\n",
      " [5.588 2.68  4.07 ]]\n"
     ]
    }
   ],
   "source": [
    "class GradNetWrtX(nn.Cell):\n",
    "    def __init__(self, net):\n",
    "        super(GradNetWrtX, self).__init__()\n",
    "        self.net = net\n",
    "        self.grad_op = ops.GradOperation(sens_param=True)\n",
    "        self.grad_wrt_output = Tensor([[0.1, 0.6, 0.2], [0.8, 1.3, 1.1]], dtype=mstype.float32)\n",
    "\n",
    "    def construct(self, x, y):\n",
    "        gradient_function = self.grad_op(self.net)\n",
    "        return gradient_function(x, y, self.grad_wrt_output)\n",
    "\n",
    "output = GradNetWrtX(Net())(x, y)  \n",
    "print(output)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}