Updating Network Parameters
Overview
Parameter is a variable tensor, indicating the parameters that need to be updated during network training. The following describes the Parameter initialization, attributes, methods, and ParameterTuple.
Initialization
mindspore.Parameter(default_input, name=None, requires_grad=True, layerwise_parallel=False)
Initialize a Parameter object. The input data supports the Tensor, Initializer, int, and float types.
The initializer API can be called to generate the Initializer object.
When init is used to initialize Tensor, the Tensor only stores the shape and type of the tensor, not the actual data. Therefore, Tensor does not occupy any memory, you can call the init_data API to convert Tensor saved in Parameter to the actual data.
You can specify a name for each Parameter to facilitate subsequent operations and updates. It is recommended to use the default value of name when initialize a parameter as one attribute of a cell, otherwise, the parameter name may be different than expected.
To update a parameter, set requires_grad to True.
When layerwise_parallel is set to True, this parameter will be filtered out during parameter broadcast and parameter gradient aggregation.
For details about the configuration of distributed parallelism, see https://www.mindspore.cn/docs/programming_guide/en/r1.3/auto_parallel.html.
In the following example, Parameter objects are built using three different data types. All the three Parameter objects need to be updated, and layerwise parallelism is not used.
import numpy as np
from mindspore import Tensor, Parameter
from mindspore import dtype as mstype
from mindspore.common.initializer import initializer
x = Parameter(default_input=Tensor(np.arange(2*3).reshape((2, 3))), name='x')
y = Parameter(default_input=initializer('ones', [1, 2, 3], mstype.float32), name='y')
z = Parameter(default_input=2.0, name='z')
print(x, "\n\n", y, "\n\n", z)
The following information is displayed:
Parameter (name=x, shape=(2, 3), dtype=Int32, requires_grad=True)
Parameter (name=y, shape=(1, 2, 3), dtype=Float32, requires_grad=True)
Parameter (name=z, shape=(), dtype=Float32, requires_grad=True)
Attributes
inited_param: returnsParameterthat stores the actual data.name: specifies a name for an instantiatedParameter.sliced: specifies whether the data stored inParameteris sharded data in the automatic parallel scenario.
If yes, do not shard the data. Otherwise, determine whether to shard the data based on the network parallel strategy.
is_init: initialization status ofParameter. At the GE backend, aninit graphis required to synchronize data from the host to the device. This parameter specifies whether the data has been synchronized to the device. This parameter takes effect only at the GE backend. This parameter is set to False at other backends.layerwise_parallel: specifies whetherParametersupports layerwise parallelism. If yes, parameters are not broadcasted and gradient aggregation is not performed. Otherwise, parameters need to be broadcasted and gradient aggregation is performed.requires_grad: specifies whether to compute the parameter gradient. If a parameter needs to be trained, the parameter gradient needs to be computed. Otherwise, the parameter gradient does not need to be computed.data:Parameter.
In the following example, Parameter is initialized through Tensor to obtain its attributes.
import numpy as np
from mindspore import Tensor, Parameter
x = Parameter(default_input=Tensor(np.arange(2*3).reshape((2, 3))))
print("name: ", x.name, "\n",
"sliced: ", x.sliced, "\n",
"is_init: ", x.is_init, "\n",
"inited_param: ", x.inited_param, "\n",
"requires_grad: ", x.requires_grad, "\n",
"layerwise_parallel: ", x.layerwise_parallel, "\n",
"data: ", x.data)
The following information is displayed:
name: Parameter
sliced: False
is_init: False
inited_param: None
requires_grad: True
layerwise_parallel: False
data: Parameter (name=Parameter, shape=(2, 3), dtype=Int64, requires_grad=True)
Methods
init_data: When the network uses the semi-automatic or automatic parallel strategy, and the data input duringParameterinitialization isInitializer, this API can be called to convert the data saved byParametertoTensor.set_data: sets the data saved byParameter.Tensor,Initializer,int, andfloatcan be input for setting. When the input parameterslice_shapeof the method is set to True, the shape ofParametercan be changed. Otherwise, the configured shape must be the same as the original shape ofParameter.set_param_ps: controls whether training parameters are trained by using the Parameter Server.clone: clonesParameter. You can specify the parameter name after cloning.
In the following example, Initializer is used to initialize Tensor, and methods related to Parameter are called.
import numpy as np
from mindspore import Tensor, Parameter
from mindspore import dtype as mstype
from mindspore.common.initializer import initializer
x = Parameter(default_input=initializer('ones', [1, 2, 3], mstype.float32))
print(x)
x_clone = x.clone()
x_clone.name = "x_clone"
print(x_clone)
print(x.init_data())
print(x.set_data(data=Tensor(np.arange(2*3).reshape((1, 2, 3)))))
The following information is displayed:
Parameter (name=Parameter, shape=(1, 2, 3), dtype=Float32, requires_grad=True)
Parameter (name=x_clone, shape=(1, 2, 3), dtype=Float32, requires_grad=True)
Parameter (name=Parameter, shape=(1, 2, 3), dtype=Float32, requires_grad=True)
Parameter (name=Parameter, shape=(1, 2, 3), dtype=Float32, requires_grad=True)
ParameterTuple
Inherited from tuple, ParameterTuple is used to store multiple Parameter objects. __new__(cls, iterable) is used to transfer an iterator for storing Parameter for building, and the clone API is provided for cloning.
The following example builds a ParameterTuple object and clones it.
import numpy as np
from mindspore import Tensor, Parameter, ParameterTuple
from mindspore import dtype as mstype
from mindspore.common.initializer import initializer
x = Parameter(default_input=Tensor(np.arange(2*3).reshape((2, 3))), name='x')
y = Parameter(default_input=initializer('ones', [1, 2, 3], mstype.float32), name='y')
z = Parameter(default_input=2.0, name='z')
params = ParameterTuple((x, y, z))
params_copy = params.clone("params_copy")
print(params, "\n")
print(params_copy)
The following information is displayed:
(Parameter (name=x, shape=(2, 3), dtype=Int32, requires_grad=True), Parameter (name=y, shape=(1, 2, 3), dtype=Float32, requires_grad=True), Parameter (name=z, shape=(), dtype=Float32, requires_grad=True))
(Parameter (name=params_copy.x, shape=(2, 3), dtype=Int32, requires_grad=True), Parameter (name=params_copy.y, shape=(1, 2, 3), dtype=Float32, requires_grad=True), Parameter (name=params_copy.z, shape=(), dtype=Float32, requires_grad=True))
