Initialization of Network Parameters

Translator: Karlos Ma


Overview

MindSpore provides a weight initialization module that lets users initialize network parameters in two ways: through operators that encapsulate parameter initialization, and through the initializer method. In both cases the initial value can be given as a string, an Initializer subclass, or a custom Tensor. The Initializer class is the basic data structure used for initialization in MindSpore. Its subclasses provide several data distributions (Zero, One, XavierUniform, HeUniform, HeNormal, Constant, Uniform, Normal, TruncatedNormal). The following sections describe both initialization modes, encapsulated operators and the initializer method, in detail.
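The subclasses listed above differ mainly in how they scale their random values. As a rough illustration, the sketch below computes the textbook Glorot (Xavier) uniform bound and the He (Kaiming) uniform bound and normal standard deviation for the nn.Conv2d weight shape (64, 3, 3, 3) used in this section. It assumes that MindSpore's XavierUniform, HeUniform, and HeNormal follow these standard formulas with their default arguments (gain 1, fan_in mode, negative slope 0), and that fan_in and fan_out are in_channels or out_channels multiplied by the kernel size; the helper names are hypothetical, so check the API reference for the exact definitions.

```python
import math


def fan_in_and_fan_out(shape):
    """Return (fan_in, fan_out) for a weight shaped (out_channels, in_channels, *kernel)."""
    if len(shape) < 2:
        raise ValueError("weight needs at least 2 dimensions")
    receptive_field = math.prod(shape[2:])  # product of kernel dims (1 for a 2-D weight)
    return shape[1] * receptive_field, shape[0] * receptive_field


def xavier_uniform_bound(shape, gain=1.0):
    """Glorot uniform: values are drawn from U(-bound, bound)."""
    fan_in, fan_out = fan_in_and_fan_out(shape)
    return gain * math.sqrt(6.0 / (fan_in + fan_out))


def he_gain(negative_slope=0.0):
    """Gain for a (leaky) ReLU nonlinearity."""
    return math.sqrt(2.0 / (1.0 + negative_slope ** 2))


def he_uniform_bound(shape, negative_slope=0.0):
    """He uniform in fan_in mode: values are drawn from U(-bound, bound)."""
    fan_in, _ = fan_in_and_fan_out(shape)
    return he_gain(negative_slope) * math.sqrt(3.0 / fan_in)


def he_normal_std(shape, negative_slope=0.0):
    """He normal in fan_in mode: values are drawn from N(0, std**2)."""
    fan_in, _ = fan_in_and_fan_out(shape)
    return he_gain(negative_slope) / math.sqrt(fan_in)


weight_shape = (64, 3, 3, 3)  # weight of nn.Conv2d(3, 64, 3)
print(fan_in_and_fan_out(weight_shape))              # (27, 576)
print(round(xavier_uniform_bound(weight_shape), 4))  # 0.0998
print(round(he_uniform_bound(weight_shape), 4))      # 0.4714
print(round(he_normal_std(weight_shape), 4))         # 0.2722
```

The He variants depend only on fan_in, so wider inputs shrink the initial weights, while Xavier balances fan_in and fan_out.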

Using Encapsulated Operators to Initialize Parameters

MindSpore provides a variety of ways to initialize parameters and encapsulates parameter initialization in some operators. This section describes how to initialize parameters through operators that have this built-in capability. Taking the nn.Conv2d layer as an example, it shows how to initialize network parameters with a string, an Initializer subclass, or a custom Tensor. The code examples use Normal, a subclass of Initializer, which can be replaced with any other Initializer subclass.

String

Network parameters can be initialized with a string. The string must match the name of an Initializer subclass, and initializing with a string uses the default arguments of that subclass. For example, the string 'Normal' is equivalent to the Initializer subclass instance Normal(). The code sample is as follows:

import numpy as np
import mindspore.nn as nn
from mindspore import Tensor
from mindspore.common import set_seed

set_seed(1)

input_data = Tensor(np.ones([1, 3, 16, 50], dtype=np.float32))
net = nn.Conv2d(3, 64, 3, weight_init='Normal')
output = net(input_data)
print(output)
The output is as follows:
[[[[ 3.10382620e-02  4.38603461e-02  4.38603461e-02 ...  4.38603461e-02
     4.38603461e-02  1.38719045e-02]
   [ 3.26051228e-02  3.54298912e-02  3.54298912e-02 ...  3.54298912e-02
     3.54298912e-02 -5.54019120e-03]
   [ 3.26051228e-02  3.54298912e-02  3.54298912e-02 ...  3.54298912e-02
     3.54298912e-02 -5.54019120e-03]
   ...
   [ 3.26051228e-02  3.54298912e-02  3.54298912e-02 ...  3.54298912e-02
     3.54298912e-02 -5.54019120e-03]
   [ 3.26051228e-02  3.54298912e-02  3.54298912e-02 ...  3.54298912e-02
     3.54298912e-02 -5.54019120e-03]
   [ 9.66199022e-03  1.24104535e-02  1.24104535e-02 ...  1.24104535e-02
     1.24104535e-02 -1.38977719e-02]]

  ...

  [[ 3.98553275e-02 -1.35465711e-03 -1.35465711e-03 ... -1.35465711e-03
    -1.35465711e-03 -1.00310734e-02]
   [ 4.38403059e-03 -3.60766202e-02 -3.60766202e-02 ... -3.60766202e-02
    -3.60766202e-02 -2.95619294e-02]
   [ 4.38403059e-03 -3.60766202e-02 -3.60766202e-02 ... -3.60766202e-02
    -3.60766202e-02 -2.95619294e-02]
   ...
   [ 4.38403059e-03 -3.60766202e-02 -3.60766202e-02 ... -3.60766202e-02
    -3.60766202e-02 -2.95619294e-02]
   [ 4.38403059e-03 -3.60766202e-02 -3.60766202e-02 ... -3.60766202e-02
    -3.60766202e-02 -2.95619294e-02]
   [ 1.33139016e-02  6.74417242e-05  6.74417242e-05 ...  6.74417242e-05
     6.74417242e-05 -2.27325838e-02]]]]
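To make the string-to-subclass equivalence concrete, the sketch below shows one way a name could be resolved into a default-constructed initializer while instances are passed through unchanged. Everything here is hypothetical: the classes are plain-Python stand-ins, the lower-cased lookup table is an assumption, and this is not MindSpore's actual implementation. The default sigma of 0.01 is assumed to mirror the default of MindSpore's Normal.

```python
class Initializer:
    """Hypothetical stand-in for an initializer base class."""


class Normal(Initializer):
    # Defaults assumed to mirror mindspore.common.initializer.Normal
    def __init__(self, sigma=0.01, mean=0.0):
        self.sigma = sigma
        self.mean = mean


class Zero(Initializer):
    pass


# Hypothetical name -> class table; keys are lower-cased so lookup ignores case
_REGISTRY = {cls.__name__.lower(): cls for cls in (Normal, Zero)}


def resolve_init(init):
    """Turn a string into a default-constructed Initializer; pass instances through."""
    if isinstance(init, str):
        try:
            return _REGISTRY[init.lower()]()  # string => subclass with default arguments
        except KeyError:
            raise ValueError(f"unknown initializer name: {init!r}") from None
    if isinstance(init, Initializer):
        return init  # caller already chose the arguments
    raise TypeError(f"unsupported init type: {type(init).__name__}")


default = resolve_init("Normal")
custom = resolve_init(Normal(0.2))
print(type(default).__name__, default.sigma)  # Normal 0.01
print(custom.sigma)                           # 0.2
```

This is why a string can only ever select the default arguments: the name carries no room for sigma or mean.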

Initializer Subclass

Initializing network parameters with an Initializer subclass has a similar effect to initializing them with a string. The difference is that a string always uses the default arguments of the corresponding Initializer subclass; to pass your own arguments, you must use the Initializer subclass directly. Taking Normal(0.2), which sets the standard deviation sigma to 0.2, as an example, the code sample is as follows:

import numpy as np
import mindspore.nn as nn
from mindspore import Tensor
from mindspore.common import set_seed
from mindspore.common.initializer import Normal

set_seed(1)

input_data = Tensor(np.ones([1, 3, 16, 50], dtype=np.float32))
net = nn.Conv2d(3, 64, 3, weight_init=Normal(0.2))
output = net(input_data)
print(output)
The output is as follows:
[[[[ 6.2076533e-01  8.7720710e-01  8.7720710e-01 ...  8.7720710e-01
     8.7720710e-01  2.7743810e-01]
   [ 6.5210247e-01  7.0859784e-01  7.0859784e-01 ...  7.0859784e-01
     7.0859784e-01 -1.1080378e-01]
   [ 6.5210247e-01  7.0859784e-01  7.0859784e-01 ...  7.0859784e-01
     7.0859784e-01 -1.1080378e-01]
   ...
   [ 6.5210247e-01  7.0859784e-01  7.0859784e-01 ...  7.0859784e-01
     7.0859784e-01 -1.1080378e-01]
   [ 6.5210247e-01  7.0859784e-01  7.0859784e-01 ...  7.0859784e-01
     7.0859784e-01 -1.1080378e-01]
   [ 1.9323981e-01  2.4820906e-01  2.4820906e-01 ...  2.4820906e-01
     2.4820906e-01 -2.7795550e-01]]

  ...

  [[ 7.9710668e-01 -2.7093157e-02 -2.7093157e-02 ... -2.7093157e-02
    -2.7093157e-02 -2.0062150e-01]
   [ 8.7680638e-02 -7.2153252e-01 -7.2153252e-01 ... -7.2153252e-01
    -7.2153252e-01 -5.9123868e-01]
   [ 8.7680638e-02 -7.2153252e-01 -7.2153252e-01 ... -7.2153252e-01
    -7.2153252e-01 -5.9123868e-01]
   ...
   [ 8.7680638e-02 -7.2153252e-01 -7.2153252e-01 ... -7.2153252e-01
    -7.2153252e-01 -5.9123868e-01]
   [ 8.7680638e-02 -7.2153252e-01 -7.2153252e-01 ... -7.2153252e-01
    -7.2153252e-01 -5.9123868e-01]
   [ 2.6627803e-01  1.3488382e-03  1.3488382e-03 ...  1.3488382e-03
     1.3488382e-03 -4.5465171e-01]]]]
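The two outputs are closely related. nn.Conv2d has no bias by default, so its output is linear in the weight. Assuming the default sigma of Normal is 0.01 with zero mean, and that set_seed(1) produces the same underlying standard-normal draws in both runs, switching to Normal(0.2) should scale every output value by 0.2 / 0.01 = 20. The short check below divides matching entries copied from the two printed outputs.

```python
# Matching entries copied from the 'Normal' output and the Normal(0.2) output above
default_normal = [3.10382620e-02, 4.38603461e-02, 1.38719045e-02,
                  -5.54019120e-03, 1.24104535e-02, 3.98553275e-02,
                  -3.60766202e-02, 6.74417242e-05]
normal_0_2 = [6.2076533e-01, 8.7720710e-01, 2.7743810e-01,
              -1.1080378e-01, 2.4820906e-01, 7.9710668e-01,
              -7.2153252e-01, 1.3488382e-03]

# Ratio of each Normal(0.2) entry to the corresponding default entry
ratios = [b / a for a, b in zip(default_normal, normal_0_2)]
print([round(r, 3) for r in ratios])  # eight values, each 20.0
```

Every ratio agrees with 20 to within float32 printing precision, which is consistent with the string 'Normal' using a default sigma of 0.01.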

Custom Tensor

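In addition to the two methods above, when the network needs initial values that the built-in Initializer subclasses cannot generate, users can pass a custom Tensor to initialize the parameters. The Tensor should have the same shape as the parameter it initializes; here that is the Conv2d weight, [64, 3, 3, 3]. The code sample is as follows: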

import numpy as np
import mindspore.nn as nn
from mindspore import Tensor
from mindspore import dtype as mstype

weight = Tensor(np.ones([64, 3, 3, 3]), dtype=mstype.float32)
input_data = Tensor(np.ones([1, 3, 16, 50], dtype=np.float32))
net = nn.Conv2d(3, 64, 3, weight_init=weight)
output = net(input_data)
print(output)
The output is as follows:
[[[[12. 18. 18. ... 18. 18. 12.]
   [18. 27. 27. ... 27. 27. 18.]
   [18. 27. 27. ... 27. 27. 18.]
   ...
   [18. 27. 27. ... 27. 27. 18.]
   [18. 27. 27. ... 27. 27. 18.]
   [12. 18. 18. ... 18. 18. 12.]]

  ...

  [[12. 18. 18. ... 18. 18. 12.]
   [18. 27. 27. ... 27. 27. 18.]
   [18. 27. 27. ... 27. 27. 18.]
   ...
   [18. 27. 27. ... 27. 27. 18.]
   [18. 27. 27. ... 27. 27. 18.]
   [12. 18. 18. ... 18. 18. 12.]]]]
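The 12/18/27 pattern in this output can be reproduced by hand. The sketch below is a plain-Python 2D cross-correlation with zero padding, assuming nn.Conv2d's defaults of pad_mode='same', stride 1, and no bias. Because all 64 output channels use identical all-ones weights, computing one channel is enough: an interior output sums 3 × 3 × 3 = 27 ones, an edge output loses one kernel row or column (18), and a corner output loses both (12).

```python
def conv2d_same(x, w):
    """Stride-1 2D cross-correlation with zero 'same' padding (odd kernel sizes).

    x: nested list [C_in][H][W]; w: nested list [C_in][kH][kW] for one output channel.
    Returns a nested list [H][W].
    """
    c_in, h, wd = len(x), len(x[0]), len(x[0][0])
    kh, kw = len(w[0]), len(w[0][0])
    ph, pw = kh // 2, kw // 2  # 'same' padding on each side
    out = [[0.0] * wd for _ in range(h)]
    for i in range(h):
        for j in range(wd):
            acc = 0.0
            for c in range(c_in):
                for a in range(kh):
                    for b in range(kw):
                        r, s = i + a - ph, j + b - pw
                        if 0 <= r < h and 0 <= s < wd:  # positions outside are zero padding
                            acc += x[c][r][s] * w[c][a][b]
            out[i][j] = acc
    return out


x = [[[1.0] * 50 for _ in range(16)] for _ in range(3)]  # one 3x16x50 all-ones input
w = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]    # one 3x3x3 all-ones filter
y = conv2d_same(x, w)
print(y[0][:3], y[1][:3])  # [12.0, 18.0, 18.0] [18.0, 27.0, 27.0]
```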

Using the Initializer Method to Initialize Parameters

The code samples above show how parameters are initialized inside a network layer. For example, the nn.Conv2d layer encapsulates the Conv2D operator, and its weight_init argument specifies how the weight should be initialized. When the layer is constructed, it creates the weight as a Parameter whose data is generated by the initializer method. However, not every operator encapsulates parameter initialization internally the way nn.Conv2d does. For example, the Conv3D operator takes its weight as an input argument, so the weight must be initialized manually.
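When an operator such as Conv3D takes its weight as an input, you have to work out the weight shape yourself before calling initializer. The helper below is a hypothetical sketch that assumes the (out_channels, in_channels // group, *kernel_size) layout. This layout matches the shapes used in this document: [64, 3, 3, 3] for nn.Conv2d(3, 64, 3) and [32, 3, 4, 3, 3] for Conv3D with out_channel=32 and kernel_size=(4, 3, 3) on a 3-channel input.

```python
def conv_weight_shape(out_channels, in_channels, kernel_size, ndim, group=1):
    """Hypothetical helper: weight shape (out_channels, in_channels // group, *kernel)."""
    if isinstance(kernel_size, int):
        kernel_size = (kernel_size,) * ndim  # e.g. 3 -> (3, 3) for a 2D convolution
    if len(kernel_size) != ndim:
        raise ValueError(f"expected {ndim} kernel dims, got {len(kernel_size)}")
    if in_channels % group or out_channels % group:
        raise ValueError("channels must be divisible by group")
    return (out_channels, in_channels // group, *kernel_size)


print(conv_weight_shape(64, 3, 3, ndim=2))          # (64, 3, 3, 3)
print(conv_weight_shape(32, 3, (4, 3, 3), ndim=3))  # (32, 3, 4, 3, 3)
```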

When initializing a parameter manually, you can call the initializer method with different Initializer subclasses to generate data from different distributions.

The initializer method takes three arguments, init, shape and dtype:
- init: a Tensor, a string, or an instance of an Initializer subclass.
- shape: a list, a tuple, or an int.
- dtype: a mindspore.dtype, such as mstype.float32.
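The accepted shape forms can be illustrated with a small validator. The sketch below is hypothetical plain Python, not MindSpore's code: it normalizes an int, list, or tuple into a tuple of positive ints and, for the case where init is already a tensor, checks that the tensor's shape agrees with the requested shape. That a mismatch is rejected is an assumption of this sketch; MindSpore's exact checks may differ.

```python
from collections.abc import Sequence


def normalize_shape(shape):
    """Accept an int, list, or tuple and return a tuple of positive ints."""
    if isinstance(shape, int):
        shape = (shape,)
    elif isinstance(shape, Sequence) and not isinstance(shape, str):
        shape = tuple(shape)
    else:
        raise TypeError(f"shape must be int, list or tuple, got {type(shape).__name__}")
    if not all(isinstance(d, int) and d > 0 for d in shape):
        raise ValueError(f"all dimensions must be positive ints: {shape}")
    return shape


def check_tensor_init(tensor_shape, shape):
    """When init is already a tensor, its shape must agree with the requested shape."""
    shape = normalize_shape(shape)
    if tuple(tensor_shape) != shape:
        raise ValueError(f"init tensor shape {tuple(tensor_shape)} != requested shape {shape}")
    return shape


print(normalize_shape(5))       # (5,)
print(normalize_shape([5, 4]))  # (5, 4)
print(check_tensor_init((32, 3, 4, 3, 3), [32, 3, 4, 3, 3]))  # (32, 3, 4, 3, 3)
```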

The init Argument Is a Tensor

The code sample is shown below:

import numpy as np
from mindspore import Tensor
from mindspore import dtype as mstype
from mindspore.common import set_seed
from mindspore.common.initializer import initializer
from mindspore.ops.operations import nn_ops as nps

set_seed(1)

input_data = Tensor(np.ones([16, 3, 10, 32, 32]), dtype=mstype.float32)
weight_init = Tensor(np.ones([32, 3, 4, 3, 3]), dtype=mstype.float32)
weight = initializer(weight_init, shape=[32, 3, 4, 3, 3])
conv3d = nps.Conv3D(out_channel=32, kernel_size=(4, 3, 3))
output = conv3d(input_data, weight)
print(output)
The output is as follows:
[[[[[108 108 108 ... 108 108 108]
    [108 108 108 ... 108 108 108]
    [108 108 108 ... 108 108 108]
    ...
    [108 108 108 ... 108 108 108]
    [108 108 108 ... 108 108 108]
    [108 108 108 ... 108 108 108]]
    ...
   [[108 108 108 ... 108 108 108]
    [108 108 108 ... 108 108 108]
    [108 108 108 ... 108 108 108]
    ...
    [108 108 108 ... 108 108 108]
    [108 108 108 ... 108 108 108]
    [108 108 108 ... 108 108 108]]]]]
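The value 108 and the output size follow from simple arithmetic. The sketch below assumes Conv3D's defaults of pad_mode='valid' and stride 1. Under these assumptions each spatial dimension shrinks by kernel size minus one, and with all-ones input and weight every output element sums in_channels × kD × kH × kW = 3 × 4 × 3 × 3 products of 1 × 1. The function name is illustrative.

```python
import math


def conv_valid_output_shape(in_shape, out_channels, kernel_size, stride=1):
    """Output shape of a 'valid' (unpadded) convolution on an (N, C, *spatial) input."""
    n, _, *spatial = in_shape
    if len(spatial) != len(kernel_size):
        raise ValueError("kernel_size must have one entry per spatial dimension")
    out_spatial = tuple((s - k) // stride + 1 for s, k in zip(spatial, kernel_size))
    return (n, out_channels) + out_spatial


in_shape = (16, 3, 10, 32, 32)  # shape of input_data in the example above
kernel = (4, 3, 3)
print(conv_valid_output_shape(in_shape, 32, kernel))  # (16, 32, 7, 30, 30)

# All-ones input and weight: each output element sums in_channels * kD * kH * kW ones
value = in_shape[1] * math.prod(kernel)
print(value)  # 108
```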

The init Argument Is a String

The code sample is as follows:

import numpy as np
from mindspore import Tensor
from mindspore import dtype as mstype
from mindspore.common import set_seed
from mindspore.common.initializer import initializer
from mindspore.ops.operations import nn_ops as nps

set_seed(1)

input_data = Tensor(np.ones([16, 3, 10, 32, 32]), dtype=mstype.float32)
weight = initializer('Normal', shape=[32, 3, 4, 3, 3], dtype=mstype.float32)
conv3d = nps.Conv3D(out_channel=32, kernel_size=(4, 3, 3))
output = conv3d(input_data, weight)
print(output)
The output is as follows:
[[[[[0 0 0 ... 0 0 0]
    [0 0 0 ... 0 0 0]
    [0 0 0 ... 0 0 0]
    ...
    [0 0 0 ... 0 0 0]
    [0 0 0 ... 0 0 0]
    [0 0 0 ... 0 0 0]]
    ...
   [[0 0 0 ... 0 0 0]
    [0 0 0 ... 0 0 0]
    [0 0 0 ... 0 0 0]
    ...
    [0 0 0 ... 0 0 0]
    [0 0 0 ... 0 0 0]
    [0 0 0 ... 0 0 0]]]]]

The init Argument Is an Initializer Subclass

The code sample is as follows:

import numpy as np
from mindspore import Tensor
from mindspore import dtype as mstype
from mindspore.common import set_seed
from mindspore.ops.operations import nn_ops as nps
from mindspore.common.initializer import Normal, initializer

set_seed(1)

input_data = Tensor(np.ones([16, 3, 10, 32, 32]), dtype=mstype.float32)
weight = initializer(Normal(0.2), shape=[32, 3, 4, 3, 3], dtype=mstype.float32)
conv3d = nps.Conv3D(out_channel=32, kernel_size=(4, 3, 3))
output = conv3d(input_data, weight)
print(output)
The output is as follows:
[[[[[0 0 0 ... 0 0 0]
    [0 0 0 ... 0 0 0]
    [0 0 0 ... 0 0 0]
    ...
    [0 0 0 ... 0 0 0]
    [0 0 0 ... 0 0 0]
    [0 0 0 ... 0 0 0]]
    ...
   [[0 0 0 ... 0 0 0]
    [0 0 0 ... 0 0 0]
    [0 0 0 ... 0 0 0]
    ...
    [0 0 0 ... 0 0 0]
    [0 0 0 ... 0 0 0]
    [0 0 0 ... 0 0 0]]]]]

Using initializer with Parameter

The initializer method can also generate the data of a Parameter object directly. The code sample is as follows:

import numpy as np
from mindspore import dtype as mstype
from mindspore.common import set_seed
from mindspore.ops import operations as ops
from mindspore import Tensor, Parameter
from mindspore.common.initializer import Normal, initializer

set_seed(1)

weight1 = Parameter(initializer('Normal', [5, 4], mstype.float32), name="w1")
weight2 = Parameter(initializer(Normal(0.2), [5, 4], mstype.float32), name="w2")
input_data = Tensor(np.arange(20).reshape(5, 4), dtype=mstype.float32)
net = ops.Add()
output = net(input_data, weight1)
output = net(output, weight2)
print(output)
The output is as follows:
[[-0.3305102  1.0412874  2.0412874  3.0412874]
 [ 4.0412874  4.9479127  5.9479127  6.9479127]
 [ 7.947912   9.063009  10.063009  11.063009 ]
 [12.063009  13.536987  14.536987  14.857441 ]
 [15.751231  17.073082  17.808317  19.364822 ]]