# Automatic Differentiation

[![View Source On Gitee](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r1.8/resource/_static/logo_source_en.png)](https://gitee.com/mindspore/docs/blob/r1.8/tutorials/source_en/beginner/autograd.md)

Automatic differentiation computes the value of a function's derivative at a given point and is a generalization of the backpropagation algorithm. It works by decomposing a complex computation into a series of simple basic operations whose derivatives are known. Because the framework hides these derivation details from users, the barrier to using it is greatly reduced.

MindSpore uses `ops.GradOperation` to calculate first-order derivatives. The `ops.GradOperation` attributes are as follows:

+ `get_all`: determines which input gradients are computed. If `False`, only the gradient of the first input is returned; if `True`, the gradients of all inputs are returned. The default value is `False`.
+ `get_by_list`: determines whether to compute gradients with respect to the weight parameters. The default value is `False`.
+ `sens_param`: determines whether to scale the output value of the network to change the final gradient. The default value is `False`.

This chapter uses `ops.GradOperation` in MindSpore to find first-order derivatives of the function $f(x)=wx+b$.

## First-order Derivative of the Input

Define the formula before deriving with respect to the input:

$$f(x)=wx+b \tag {1}$$

The example code below is an expression of Equation (1). Since MindSpore adopts a functional programming style, all computational formulas are expressed as functions.

```python
import numpy as np
import mindspore.nn as nn
import mindspore as ms

class Net(nn.Cell):
    def __init__(self):
        super(Net, self).__init__()
        self.w = ms.Parameter(ms.Tensor(np.array([6.0], np.float32)), name='w')
        self.b = ms.Parameter(ms.Tensor(np.array([1.0], np.float32)), name='b')

    def construct(self, x):
        f = self.w * x + self.b
        return f
```

Define the derivative class `GradNet`. In the `__init__` function, define the `self.net` and `ops.GradOperation` networks. In the `construct` function, compute the derivative of `self.net`. This corresponds to the following formula (2):

$$f^{'}(x)=w\tag {2}$$

```python
import mindspore as ms
import mindspore.ops as ops

class GradNet(nn.Cell):
    def __init__(self, net):
        super(GradNet, self).__init__()
        self.net = net
        self.grad_op = ops.GradOperation()

    def construct(self, x):
        gradient_function = self.grad_op(self.net)
        return gradient_function(x)
```

Finally, find the first-order derivative of formula (1) with respect to the input parameter x. Because the weight parameter w in formula (1) is 6, we have:

$$f(x)=wx+b=6*x+1 \tag {3}$$

Deriving the above equation gives:

$$f^{'}(x)=w=6 \tag {4}$$

```python
x = ms.Tensor([100], dtype=ms.float32)
output = GradNet(Net())(x)
print(output)
```

```text
[6.]
```

MindSpore calculates the first-order derivative using `ops.GradOperation(get_all=False, get_by_list=False, sens_param=False)`. If `get_all` is set to `False`, the derivative of only the first input is calculated. If `get_all` is set to `True`, the derivatives of all inputs are calculated.
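As an illustration of `get_all=True`, the following minimal sketch (not part of the original example) computes the gradients of every input of a two-input network; the network `MulNet` and its inputs are assumed here purely for demonstration.

```python
import mindspore as ms
import mindspore.nn as nn
import mindspore.ops as ops

class MulNet(nn.Cell):
    """Hypothetical two-input network: f(x, y) = x * y."""
    def construct(self, x, y):
        return x * y

class AllInputsGradNet(nn.Cell):
    """Returns the gradients of the wrapped network with respect to all inputs."""
    def __init__(self, net):
        super(AllInputsGradNet, self).__init__()
        self.net = net
        self.grad_op = ops.GradOperation(get_all=True)

    def construct(self, x, y):
        gradient_function = self.grad_op(self.net)
        return gradient_function(x, y)

x = ms.Tensor([2.0], dtype=ms.float32)
y = ms.Tensor([3.0], dtype=ms.float32)
# Expected: df/dx = y = [3.], df/dy = x = [2.]
print(AllInputsGradNet(MulNet())(x, y))
```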
## First-order Derivative of the Weight

To compute gradients with respect to the weights, set `get_by_list` in `ops.GradOperation` to `True`.

```python
import mindspore as ms

class GradNet(nn.Cell):
    def __init__(self, net):
        super(GradNet, self).__init__()
        self.net = net
        self.params = ms.ParameterTuple(net.trainable_params())
        # Set the first-order derivative of the weight parameters.
        self.grad_op = ops.GradOperation(get_by_list=True)

    def construct(self, x):
        gradient_function = self.grad_op(self.net, self.params)
        return gradient_function(x)
```

Next, derive the function:

```python
# Perform a derivative calculation on the function.
x = ms.Tensor([100], dtype=ms.float32)
fx = GradNet(Net())(x)

# Print the result.
print(f"wgrad: {fx[0]}\nbgrad: {fx[1]}")
```

```text
wgrad: [100.]
bgrad: [1.]
```

If the derivative of some weights is not required, set `requires_grad=False` when declaring the corresponding weight parameters in the network to be differentiated.

```python
import mindspore as ms
from mindspore import ops

class Net(nn.Cell):
    def __init__(self):
        super(Net, self).__init__()
        self.w = ms.Parameter(ms.Tensor(np.array([6], np.float32)), name='w')
        self.b = ms.Parameter(ms.Tensor(np.array([1.0], np.float32)), name='b', requires_grad=False)

    def construct(self, x):
        out = x * self.w + self.b
        return out

class GradNet(nn.Cell):
    def __init__(self, net):
        super(GradNet, self).__init__()
        self.net = net
        self.params = ms.ParameterTuple(net.trainable_params())
        self.grad_op = ops.GradOperation(get_by_list=True)

    def construct(self, x):
        gradient_function = self.grad_op(self.net, self.params)
        return gradient_function(x)

# Construct a derivative network.
x = ms.Tensor([5], dtype=ms.float32)
fw = GradNet(Net())(x)
print(fw)
```

```text
(Tensor(shape=[1], dtype=Float32, value= [ 5.00000000e+00]),)
```

## Gradient Value Scaling

You can use the `sens_param` parameter to scale the output value of the network and thereby change the final gradient. Set `sens_param` in `ops.GradOperation` to `True` and provide a scaling value whose shape matches that of the network output.

```python
class GradNet(nn.Cell):
    def __init__(self, net):
        super(GradNet, self).__init__()
        self.net = net
        # Derivative operation.
        self.grad_op = ops.GradOperation(sens_param=True)
        # Scaling value.
        self.grad_wrt_output = ms.Tensor([0.1], dtype=ms.float32)

    def construct(self, x):
        gradient_function = self.grad_op(self.net)
        return gradient_function(x, self.grad_wrt_output)

x = ms.Tensor([6], dtype=ms.float32)
output = GradNet(Net())(x)
print(output)
```

```text
[0.6]
```

## Stopping Gradient Calculation

You can use `ops.stop_gradient` to stop calculating gradients. The following is an example:

```python
from mindspore.ops import stop_gradient

class Net(nn.Cell):
    def __init__(self):
        super(Net, self).__init__()
        self.w = ms.Parameter(ms.Tensor(np.array([6], np.float32)), name='w')
        self.b = ms.Parameter(ms.Tensor(np.array([1.0], np.float32)), name='b')

    def construct(self, x):
        out = x * self.w + self.b
        # Stop the gradient here: out no longer contributes to gradient calculations.
        out = stop_gradient(out)
        return out

class GradNet(nn.Cell):
    def __init__(self, net):
        super(GradNet, self).__init__()
        self.net = net
        self.params = ms.ParameterTuple(net.trainable_params())
        self.grad_op = ops.GradOperation(get_by_list=True)

    def construct(self, x):
        gradient_function = self.grad_op(self.net, self.params)
        return gradient_function(x)

x = ms.Tensor([100], dtype=ms.float32)
output = GradNet(Net())(x)
print(f"wgrad: {output[0]}\nbgrad: {output[1]}")
```

```text
wgrad: [0.]
bgrad: [0.]
```
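`stop_gradient` can also be applied to an intermediate value rather than the final output, so that only the computation paths flowing through that value stop contributing to the gradients. The following sketch is not from the original tutorial; the network `PartialStopNet` is assumed here for demonstration and reuses the `GradNet` defined above.

```python
from mindspore.ops import stop_gradient

class PartialStopNet(nn.Cell):
    """Hypothetical network in which only the x * w branch is cut off from the gradient."""
    def __init__(self):
        super(PartialStopNet, self).__init__()
        self.w = ms.Parameter(ms.Tensor(np.array([6], np.float32)), name='w')
        self.b = ms.Parameter(ms.Tensor(np.array([1.0], np.float32)), name='b')

    def construct(self, x):
        wx = stop_gradient(x * self.w)  # gradients do not flow back through this term
        return wx + self.b              # gradients still flow through self.b

x = ms.Tensor([100], dtype=ms.float32)
output = GradNet(PartialStopNet())(x)
# Expected: wgrad is [0.] because of stop_gradient, while bgrad remains [1.].
print(f"wgrad: {output[0]}\nbgrad: {output[1]}")
```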