Network Parameters

Overview

Parameter is a variable tensor, indicating the parameters that need to be updated during network training. The following describes the Parameter initialization, attributes, methods, ParameterTuple, and dependency control.

Parameters

Parameter is a variable tensor, indicating the parameters that need to be updated during network training.

Declaration

mindspore.Parameter(default_input, name=None, requires_grad=True, layerwise_parallel=False)
  • default_input: the data used to initialize the Parameter object. The input data supports the Tensor, Initializer, int, and float types. The initializer API can be called to generate an Initializer object. When init is used to initialize a Tensor, the Tensor records only the shape and type of the data, not the actual data, and therefore occupies no memory. You can call the init_data API to convert the Tensor saved in the Parameter to the actual data (see the short sketch below this list).

  • name: You can specify a name for each Parameter to facilitate subsequent operations and updates. It is recommended to use the default value of name when initializing a Parameter as an attribute of a Cell; otherwise, the parameter name may differ from what is expected.

  • requires_grad: to update a parameter during training, set requires_grad to True.

  • layerwise_parallel: When layerwise_parallel is set to True, this parameter will be filtered out during parameter broadcast and parameter gradient aggregation.

For details about the configuration of distributed parallelism, see https://www.mindspore.cn/docs/programming_guide/en/r1.6/auto_parallel.html.
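For example, a Parameter built from an Initializer records only the shape and data type until init_data is called. The following minimal sketch illustrates this; it assumes the behavior described above, namely that inited_param stays None until init_data generates the actual data:

from mindspore import Parameter
from mindspore import dtype as mstype
from mindspore.common.initializer import initializer

# Built from an Initializer: only shape and dtype are recorded, no actual data yet.
w = Parameter(default_input=initializer('ones', [2, 3], mstype.float32), name='w')
print(w.inited_param)        # None: the data has not been generated

# init_data generates the actual data and returns the initialized Parameter.
w_inited = w.init_data()
print(w.inited_param)        # now the Parameter that stores the actual data
print(w_inited.asnumpy())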

In the following example, Parameter objects are built using three different data types. All three Parameter objects need to be updated, and layerwise parallelism is not used.

The code sample is as follows:

import numpy as np
from mindspore import Tensor, Parameter
from mindspore import dtype as mstype
from mindspore.common.initializer import initializer

x = Parameter(default_input=Tensor(np.arange(2*3).reshape((2, 3))), name='x')
y = Parameter(default_input=initializer('ones', [1, 2, 3], mstype.float32), name='y')
z = Parameter(default_input=2.0, name='z')

print(x, "\n\n", y, "\n\n", z)

The output is as follows:

 Parameter (name=x, shape=(2, 3), dtype=Int32, requires_grad=True)

 Parameter (name=y, shape=(1, 2, 3), dtype=Float32, requires_grad=True)

 Parameter (name=z, shape=(), dtype=Float32, requires_grad=True)

Attributes

  • inited_param: returns the Parameter that stores the actual data.

  • name: specifies a name for an instantiated Parameter.

  • sliced: specifies whether the data stored in Parameter is sharded data in the automatic parallel scenario. If yes, the data is not sharded again; otherwise, whether to shard the data is determined according to the network parallel strategy.

  • is_init: initialization status of the Parameter. At the GE backend, an init graph is required to synchronize data from the host to the device, and this attribute indicates whether the data has been synchronized to the device. It takes effect only at the GE backend and is set to False at other backends.

  • layerwise_parallel: specifies whether Parameter supports layerwise parallelism. If yes, parameters are not broadcasted and gradient aggregation is not performed. Otherwise, parameters need to be broadcasted and gradient aggregation is performed.

  • requires_grad: specifies whether to compute the parameter gradient. If a parameter needs to be trained, the parameter gradient needs to be computed. Otherwise, the parameter gradient does not need to be computed.

  • data: returns the data held by the Parameter (the Parameter object itself).

In the following example, Parameter is initialized through Tensor to obtain its attributes.

import numpy as np

from mindspore import Tensor, Parameter

x = Parameter(default_input=Tensor(np.arange(2*3).reshape((2, 3))))

print("name: ", x.name, "\n",
      "sliced: ", x.sliced, "\n",
      "is_init: ", x.is_init, "\n",
      "inited_param: ", x.inited_param, "\n",
      "requires_grad: ", x.requires_grad, "\n",
      "layerwise_parallel: ", x.layerwise_parallel, "\n",
      "data: ", x.data)

The output is as follows:

name:  Parameter
sliced:  False
is_init:  False
inited_param:  None
requires_grad:  True
layerwise_parallel:  False

data:  Parameter (name=Parameter, shape=(2, 3), dtype=Int64, requires_grad=True)

Methods

  • init_data: when the network uses the semi-automatic or automatic parallel strategy and the data input during Parameter initialization is an Initializer, this API can be called to convert the data saved by the Parameter into a Tensor.

  • set_data: sets the data saved by the Parameter; a Tensor, Initializer, int, or float can be passed. When the method's input parameter slice_shape is set to True, the shape of the Parameter can be changed; otherwise, the configured shape must be the same as the original shape of the Parameter.

  • set_param_ps: controls whether training parameters are trained by using the Parameter Server.

  • clone: clones Parameter. You can specify the parameter name after cloning.

In the following example, Initializer is used to initialize Tensor, and methods related to Parameter are called.

import numpy as np

from mindspore import Tensor, Parameter
from mindspore import dtype as mstype
from mindspore.common.initializer import initializer

x = Parameter(default_input=initializer('ones', [1, 2, 3], mstype.float32))

print(x)

x_clone = x.clone()
x_clone.name = "x_clone"
print(x_clone)

print(x.init_data())
print(x.set_data(data=Tensor(np.arange(2*3).reshape((1, 2, 3)))))

The output is as follows:

Parameter (name=Parameter, shape=(1, 2, 3), dtype=Float32, requires_grad=True)
Parameter (name=x_clone, shape=(1, 2, 3), dtype=Float32, requires_grad=True)
Parameter (name=Parameter, shape=(1, 2, 3), dtype=Float32, requires_grad=True)
Parameter (name=Parameter, shape=(1, 2, 3), dtype=Float32, requires_grad=True)

ParameterTuple

ParameterTuple inherits from tuple and is used to store multiple Parameter objects. It is built by passing an iterable of Parameter objects to __new__(cls, iterable), and it provides a clone API for cloning.

The following example builds a ParameterTuple object and clones it.

import numpy as np
from mindspore import Tensor, Parameter, ParameterTuple
from mindspore import dtype as mstype
from mindspore.common.initializer import initializer

x = Parameter(default_input=Tensor(np.arange(2*3).reshape((2, 3))), name='x')
y = Parameter(default_input=initializer('ones', [1, 2, 3], mstype.float32), name='y')
z = Parameter(default_input=2.0, name='z')
params = ParameterTuple((x, y, z))
params_copy = params.clone("params_copy")
print(params, "\n")
print(params_copy)

The output is as follows:

(Parameter (name=x, shape=(2, 3), dtype=Int32, requires_grad=True), Parameter (name=y, shape=(1, 2, 3), dtype=Float32, requires_grad=True), Parameter (name=z, shape=(), dtype=Float32, requires_grad=True))

(Parameter (name=params_copy.x, shape=(2, 3), dtype=Int32, requires_grad=True), Parameter (name=params_copy.y, shape=(1, 2, 3), dtype=Float32, requires_grad=True), Parameter (name=params_copy.z, shape=(), dtype=Float32, requires_grad=True))
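In practice, ParameterTuple is often used to collect a network's weights so that a gradient can be computed for each of them, for example with ops.GradOperation(get_by_list=True). The following minimal sketch illustrates this; the nn.Dense network and input shape are chosen arbitrarily for illustration:

import numpy as np
import mindspore.nn as nn
from mindspore import Tensor, ParameterTuple, ops

net = nn.Dense(3, 2)                                  # any small network would do
weights = ParameterTuple(net.trainable_params())      # collect its weight and bias Parameters

# GradOperation(get_by_list=True) returns one gradient per Parameter in the tuple.
grad_op = ops.GradOperation(get_by_list=True)
x = Tensor(np.ones([1, 3]).astype(np.float32))
grads = grad_op(net, weights)(x)
print(len(grads))                                     # 2: gradients for weight and bias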

Using Encapsulation Operator to Initialize Parameters

MindSpore provides a variety of parameter initialization methods and encapsulates parameter initialization functionality in some operators. This section introduces how parameters are initialized by operators that carry this functionality. Taking the Conv2d operator as an example, it describes initializing parameters in the network with a string, an Initializer subclass, and a custom Tensor. Normal, a subclass of Initializer, is used in the following code examples and can be replaced with any other Initializer subclass.

Character String

Network parameters can be initialized with a string. The content of the string must match the name of an Initializer subclass (case-insensitive). Initializing with a string uses the default arguments of that Initializer subclass; for example, using the string Normal is equivalent to using the Initializer subclass Normal(). The code sample is as follows:

import numpy as np
import mindspore.nn as nn
from mindspore import Tensor
from mindspore import set_seed

set_seed(1)

input_data = Tensor(np.ones([1, 3, 16, 50], dtype=np.float32))
net = nn.Conv2d(3, 64, 3, weight_init='Normal')
output = net(input_data)
print(output)

The output is as follows:

[[[[ 3.10382620e-02  4.38603461e-02  4.38603461e-02 ...  4.38603461e-02
     4.38603461e-02  1.38719045e-02]
   [ 3.26051228e-02  3.54298912e-02  3.54298912e-02 ...  3.54298912e-02
     3.54298912e-02 -5.54019120e-03]
   [ 3.26051228e-02  3.54298912e-02  3.54298912e-02 ...  3.54298912e-02
     3.54298912e-02 -5.54019120e-03]
   ...
   [ 3.26051228e-02  3.54298912e-02  3.54298912e-02 ...  3.54298912e-02
     3.54298912e-02 -5.54019120e-03]
   [ 3.26051228e-02  3.54298912e-02  3.54298912e-02 ...  3.54298912e-02
     3.54298912e-02 -5.54019120e-03]
   [ 9.66199022e-03  1.24104535e-02  1.24104535e-02 ...  1.24104535e-02
     1.24104535e-02 -1.38977719e-02]]

  ...

  [[ 3.98553275e-02 -1.35465711e-03 -1.35465711e-03 ... -1.35465711e-03
    -1.35465711e-03 -1.00310734e-02]
   [ 4.38403059e-03 -3.60766202e-02 -3.60766202e-02 ... -3.60766202e-02
    -3.60766202e-02 -2.95619294e-02]
   [ 4.38403059e-03 -3.60766202e-02 -3.60766202e-02 ... -3.60766202e-02
    -3.60766202e-02 -2.95619294e-02]
   ...
   [ 4.38403059e-03 -3.60766202e-02 -3.60766202e-02 ... -3.60766202e-02
    -3.60766202e-02 -2.95619294e-02]
   [ 4.38403059e-03 -3.60766202e-02 -3.60766202e-02 ... -3.60766202e-02
    -3.60766202e-02 -2.95619294e-02]
   [ 1.33139016e-02  6.74417242e-05  6.74417242e-05 ...  6.74417242e-05
     6.74417242e-05 -2.27325838e-02]]]]

Initializer Subclass

An Initializer subclass can also be used to initialize network parameters, with an effect similar to initializing with a string. The difference is that a string uses the default arguments of the Initializer subclass; to specify the arguments of the Initializer subclass, the subclass itself must be used. Taking Normal(0.2) as an example, the code sample is as follows:

import numpy as np
import mindspore.nn as nn
from mindspore import Tensor
from mindspore import set_seed
from mindspore.common.initializer import Normal

set_seed(1)

input_data = Tensor(np.ones([1, 3, 16, 50], dtype=np.float32))
net = nn.Conv2d(3, 64, 3, weight_init=Normal(0.2))
output = net(input_data)
print(output)

The output is as follows:

[[[[ 6.2076533e-01  8.7720710e-01  8.7720710e-01 ...  8.7720710e-01
     8.7720710e-01  2.7743810e-01]
   [ 6.5210247e-01  7.0859784e-01  7.0859784e-01 ...  7.0859784e-01
     7.0859784e-01 -1.1080378e-01]
   [ 6.5210247e-01  7.0859784e-01  7.0859784e-01 ...  7.0859784e-01
     7.0859784e-01 -1.1080378e-01]
   ...
   [ 6.5210247e-01  7.0859784e-01  7.0859784e-01 ...  7.0859784e-01
     7.0859784e-01 -1.1080378e-01]
   [ 6.5210247e-01  7.0859784e-01  7.0859784e-01 ...  7.0859784e-01
     7.0859784e-01 -1.1080378e-01]
   [ 1.9323981e-01  2.4820906e-01  2.4820906e-01 ...  2.4820906e-01
     2.4820906e-01 -2.7795550e-01]]

  ...

  [[ 7.9710668e-01 -2.7093157e-02 -2.7093157e-02 ... -2.7093157e-02
    -2.7093157e-02 -2.0062150e-01]
   [ 8.7680638e-02 -7.2153252e-01 -7.2153252e-01 ... -7.2153252e-01
    -7.2153252e-01 -5.9123868e-01]
   [ 8.7680638e-02 -7.2153252e-01 -7.2153252e-01 ... -7.2153252e-01
    -7.2153252e-01 -5.9123868e-01]
   ...
   [ 8.7680638e-02 -7.2153252e-01 -7.2153252e-01 ... -7.2153252e-01
    -7.2153252e-01 -5.9123868e-01]
   [ 8.7680638e-02 -7.2153252e-01 -7.2153252e-01 ... -7.2153252e-01
    -7.2153252e-01 -5.9123868e-01]
   [ 2.6627803e-01  1.3488382e-03  1.3488382e-03 ...  1.3488382e-03
     1.3488382e-03 -4.5465171e-01]]]]

Custom Tensor

In addition to the above two initialization methods, when the network needs to use data that MindSpore does not provide, users can customize a Tensor to initialize the parameters. The code sample is as follows:

import numpy as np
import mindspore.nn as nn
from mindspore import Tensor
from mindspore import dtype as mstype

weight = Tensor(np.ones([64, 3, 3, 3]), dtype=mstype.float32)
input_data = Tensor(np.ones([1, 3, 16, 50], dtype=np.float32))
net = nn.Conv2d(3, 64, 3, weight_init=weight)
output = net(input_data)
print(output)

The output is as follows:

[[[[12. 18. 18. ... 18. 18. 12.]
   [18. 27. 27. ... 27. 27. 18.]
   [18. 27. 27. ... 27. 27. 18.]
   ...
   [18. 27. 27. ... 27. 27. 18.]
   [18. 27. 27. ... 27. 27. 18.]
   [12. 18. 18. ... 18. 18. 12.]]

  ...

  [[12. 18. 18. ... 18. 18. 12.]
   [18. 27. 27. ... 27. 27. 18.]
   [18. 27. 27. ... 27. 27. 18.]
   ...
   [18. 27. 27. ... 27. 27. 18.]
   [18. 27. 27. ... 27. 27. 18.]
   [12. 18. 18. ... 18. 18. 12.]]]]

Dependency Control

If the result of a function depends on or affects an external state, the function is considered to have side effects; for example, it changes an external global variable, or its result depends on the value of a global variable. Likewise, if an operator changes the value of an input parameter or its output depends on the value of a global parameter, the operator is considered to have side effects.

Side effects are classified into memory side effects and IO side effects, based on memory properties and IO status. Currently, memory side effects mainly come from operators such as Assign and the optimizer operators, while IO side effects mainly come from the Print operator. You can check the operator definitions for details: an operator with a memory side effect has the side_effect_mem property in its definition, and an operator with an IO side effect has the side_effect_io property.
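As a minimal illustration (the Parameter name and shapes are arbitrary), Assign is an operator with the side_effect_mem property: it overwrites the value of its first input in place rather than returning a newly computed value. Print, correspondingly, has side_effect_io because its effect is writing to standard output.

import numpy as np
from mindspore import Tensor, Parameter, ops

# Assign carries side_effect_mem: it overwrites its first input in place.
param = Parameter(Tensor(np.zeros([2]).astype(np.float32)), name="param")
assign = ops.Assign()
assign(param, Tensor(np.ones([2]).astype(np.float32)))
print(param.asnumpy())   # the Parameter now holds the new value: [1. 1.]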

Depend is used to handle dependency operations. In most cases, if operators have IO or memory side effects, they are executed according to the user's semantics, and there is no need to use the Depend operator to guarantee the execution order. In some cases, if two operators A and B have no sequential dependency but A must be executed before B, we recommend using Depend to specify their execution order. Here is how to use it:

a = A(x)                --->        a = A(x)
b = B(y)                --->        y = Depend(y, a)
                        --->        b = B(y)
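The following is a minimal runnable sketch of this pattern, assuming graph mode and an arbitrary small network: operator A writes a Parameter with Assign, and ops.depend makes the read in operator B wait for that write, mirroring the a = A(x) / y = Depend(y, a) / b = B(y) transformation above.

import numpy as np
import mindspore.nn as nn
from mindspore import Tensor, Parameter, ops, context

context.set_context(mode=context.GRAPH_MODE)

class Net(nn.Cell):
    def __init__(self):
        super().__init__()
        self.param = Parameter(Tensor(np.zeros([1]).astype(np.float32)), name="p")
        self.assign = ops.Assign()

    def construct(self, x, y):
        a = self.assign(self.param, x)   # A: write the parameter
        y = ops.depend(y, a)             # make B depend on A
        b = self.param + y               # B: read the parameter
        return b

net = Net()
x = Tensor(np.ones([1]).astype(np.float32))
y = Tensor(np.ones([1]).astype(np.float32))
print(net(x, y))                         # expected: [2.]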

Please note that a special set of operators for floating-point overflow state detection have hidden side effects that are neither IO nor memory side effects. In addition, they have strict ordering requirements: before using the NPUClearFloatStatus operator, ensure that NPUAllocFloatStatus has been executed, and before using the NPUGetFloatStatus operator, ensure that NPUClearFloatStatus has been executed. Because these operators are rarely used, the current approach is to keep them defined as side-effect-free and to ensure their execution order with Depend. These operators are only supported on Ascend. Examples are as follows:

import numpy as np
from mindspore import Tensor
from mindspore import ops, context

context.set_context(device_target="Ascend")

npu_alloc_status = ops.NPUAllocFloatStatus()
npu_get_status = ops.NPUGetFloatStatus()
npu_clear_status = ops.NPUClearFloatStatus()
x = Tensor(np.ones([3, 3]).astype(np.float32))
y = Tensor(np.ones([3, 3]).astype(np.float32))
init = npu_alloc_status()
sum_ = ops.Add()(x, y)
product = ops.MatMul()(x, y)
init = ops.depend(init, sum_)
init = ops.depend(init, product)
get_status = npu_get_status(init)
sum_ = ops.depend(sum_, get_status)
product = ops.depend(product, get_status)
out = ops.Add()(sum_, product)
init = ops.depend(init, out)
clear = npu_clear_status(init)
out = ops.depend(out, clear)
print(out)

The output is as follows:

[[5. 5. 5.]
 [5. 5. 5.]
 [5. 5. 5.]]

For specific usage, refer to the implementation of the start_overflow_check function in the overflow detection logic.