# Loss Function

`Ascend` `GPU` `CPU` `Model Development`

Translator: [Misaka19998](https://gitee.com/Misaka19998)

[![View Source On Gitee](https://gitee.com/mindspore/docs/raw/r1.6/resource/_static/logo_source_en.png)](https://gitee.com/mindspore/docs/blob/r1.6/docs/mindspore/programming_guide/source_en/loss.md)

## Overview

A loss function, also known as an objective function, measures the difference between the predicted value and the true value. In deep learning, training a model is the process of iteratively decreasing the loss value, so choosing a suitable loss function matters: a well-chosen loss function can noticeably improve a model's performance. MindSpore provides many general-purpose loss functions, but they do not fit every situation, and in some cases users need to define their own. This document therefore also introduces how to define loss functions.

Currently, MindSpore supports the following loss functions: `L1Loss`, `MSELoss`, `SmoothL1Loss`, `SoftmaxCrossEntropyWithLogits`, `SampledSoftmaxLoss`, `BCELoss`, and `CosineEmbeddingLoss`.

All loss functions of MindSpore are implemented as subclasses of `Cell`, so customized loss functions are also supported. For details about how to build a loss function, see "Building a Customized Network."

### Built-in Loss Functions

- L1Loss

    Computes the absolute error between two inputs and is used for regression models. The default value of `reduction` is `mean`. If `reduction` is `sum`, the accumulated loss is returned. If `reduction` is `none`, the loss of each element is returned.

- MSELoss

    Computes the squared error between two inputs and is used for regression models. The `reduction` parameter has the same meaning as in `L1Loss`.

- SmoothL1Loss

    `SmoothL1Loss` is the smooth L1 loss function, which is used for regression models. The default value of the `beta` threshold is 1.

- SoftmaxCrossEntropyWithLogits

    Cross entropy loss function, which is used for classification models. If the label data is not one-hot encoded, set `sparse` to True. The default value of `reduction` is `none`. The meaning of this parameter is the same as in `L1Loss`.

- CosineEmbeddingLoss

    `CosineEmbeddingLoss` measures the similarity between two inputs and is used for classification models. The default value of `margin` is 0.0. The `reduction` parameter has the same meaning as in `L1Loss`.

- BCELoss

    Binary cross entropy loss, which is used for binary classification. `weight` is a rescaling weight applied to the loss of each batch element. The default value of `weight` is None, which means that all weights are 1. The default value of `reduction` is `none`. The `reduction` parameter has the same meaning as in `L1Loss`.

- SampledSoftmaxLoss

    Sampled softmax loss function, which is used for classification models when the number of classes is large. `num_sampled` is the number of classes to randomly sample. `num_classes` is the number of possible classes. `num_true` is the number of target classes per training example. `sampled_values` is the sampled candidates. The default value of `sampled_values` is None, which means that `UniformCandidateSampler` is applied. `remove_accidental_hits` controls whether "accidental hits" are removed. The default value of `remove_accidental_hits` is True. `seed` is the random seed for candidate sampling, with a default value of 0. The default value of `reduction` is `none`. The `reduction` parameter has the same meaning as in `L1Loss`.

### Built-in Loss Functions Application Cases

All loss functions of MindSpore are stored in `mindspore.nn`.
The usage method is as follows:

```python
import numpy as np
import mindspore.nn as nn
from mindspore import Tensor

loss = nn.L1Loss()
input_data = Tensor(np.array([[1, 2, 3], [2, 3, 4]]).astype(np.float32))
target_data = Tensor(np.array([[0, 2, 5], [3, 1, 1]]).astype(np.float32))
print(loss(input_data, target_data))
```

```text
1.5
```

In this case, two tensors are built. The `nn.L1Loss` API is used to define the loss, `input_data` and `target_data` are passed to it, and the L1 loss is computed. The result is 1.5. If `loss` is set to `nn.L1Loss(reduction='sum')`, the result is 9.0. If `loss` is set to `nn.L1Loss(reduction='none')`, the result is `[[1. 0. 2.] [1. 2. 3.]]`.

## Defining Loss Function

`Cell` is the basic network module of MindSpore and can be used both to construct networks and to define loss functions. A loss function is defined in the same way as a network. The difference is that its execution logic calculates the error between the output of the forward network and the true value.

Taking the MindSpore loss function L1 loss as an example, the loss function is defined as follows:

```python
import mindspore.nn as nn
import mindspore.ops as ops

class L1Loss(nn.Cell):
    def __init__(self):
        super(L1Loss, self).__init__()
        # instantiate the operators used in construct
        self.abs = ops.Abs()
        self.reduce_mean = ops.ReduceMean()

    def construct(self, base, target):
        # element-wise absolute error, averaged to a scalar
        x = self.abs(base - target)
        return self.reduce_mean(x)
```

The required operators are instantiated in the `__init__` method and used in `construct`. This completes the definition of an L1 loss function.
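
As a framework-free check of the arithmetic, the following plain-Python sketch reproduces what an absolute error followed by a reduction computes, using the same numbers as the built-in `nn.L1Loss` example. The helper name `l1_loss` is hypothetical and exists only for this illustration; it is not a MindSpore API.

```python
def l1_loss(pred, target, reduction="mean"):
    """Reference L1 loss over two equally shaped 2-D lists (illustrative helper)."""
    # element-wise absolute error
    diffs = [[abs(p - t) for p, t in zip(p_row, t_row)]
             for p_row, t_row in zip(pred, target)]
    if reduction == "none":
        return diffs
    total = sum(sum(row) for row in diffs)
    if reduction == "sum":
        return total
    if reduction == "mean":
        count = sum(len(row) for row in diffs)
        return total / count
    raise ValueError("reduction must be 'mean', 'sum' or 'none'")


pred = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]
target = [[0.0, 2.0, 5.0], [3.0, 1.0, 1.0]]

print(l1_loss(pred, target))          # 1.5
print(l1_loss(pred, target, "sum"))   # 9.0
print(l1_loss(pred, target, "none"))  # [[1.0, 0.0, 2.0], [1.0, 2.0, 3.0]]
```

The six absolute errors are 1, 0, 2, 1, 2 and 3, so their sum is 9.0 and their mean is 1.5, which matches the results reported for the built-in loss.
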
Given a set of predicted values and true values, users can call the loss function to obtain the difference between them, as follows:

```python
import numpy as np
from mindspore import Tensor

loss = L1Loss()
input_data = Tensor(np.array([0.1, 0.2, 0.3]).astype(np.float32))
target_data = Tensor(np.array([0.1, 0.2, 0.2]).astype(np.float32))

output = loss(input_data, target_data)
print(output)
```

Taking the `Ascend` backend as an example, the output is as follows:

```text
0.03333334
```

A loss function can also be defined by inheriting the loss base class `LossBase`. `LossBase` provides the `get_loss` method, which sums or averages the loss values and outputs a scalar. The definition of `L1Loss` using `LossBase` as the base class is as follows:

```python
import mindspore.ops as ops
from mindspore.nn import LossBase

class L1Loss(LossBase):
    def __init__(self, reduction="mean"):
        # pass the reduction mode to the base class
        super(L1Loss, self).__init__(reduction)
        self.abs = ops.Abs()

    def construct(self, base, target):
        x = self.abs(base - target)
        # get_loss applies the reduction configured above
        return self.get_loss(x)
```

First, `LossBase` is used as the base class of `L1Loss`. A `reduction` parameter is then added to `__init__` and passed to the base class through `super`. Finally, the `get_loss` method is called in `construct`. `reduction` accepts three values, `mean`, `sum` and `none`, which return the average, the sum and the unreduced values respectively.

## Loss Function and Model Training

Now we train a model with the `L1Loss` defined above.

### Defining Dataset and Network

Taking simple linear function fitting as an example, the dataset and network structure are defined as follows:

> For a detailed introduction of linear fitting, please refer to the tutorial [Implementing Simple Linear Function Fitting](https://www.mindspore.cn/tutorials/en/r1.6/linear_regression.html).

1. Defining the Dataset

    ```python
    import numpy as np
    from mindspore import dataset as ds

    def get_data(num, w=2.0, b=3.0):
        for _ in range(num):
            x = np.random.uniform(-10.0, 10.0)
            noise = np.random.normal(0, 1)
            y = x * w + b + noise
            yield np.array([x]).astype(np.float32), np.array([y]).astype(np.float32)

    def create_dataset(num_data, batch_size=16):
        dataset = ds.GeneratorDataset(list(get_data(num_data)), column_names=['data', 'label'])
        dataset = dataset.batch(batch_size)
        return dataset
    ```

2. Defining the Network

    ```python
    from mindspore.common.initializer import Normal
    import mindspore.nn as nn

    class LinearNet(nn.Cell):
        def __init__(self):
            super(LinearNet, self).__init__()
            self.fc = nn.Dense(1, 1, Normal(0.02), Normal(0.02))

        def construct(self, x):
            return self.fc(x)
    ```

### Training Model

`Model` is a MindSpore high-level API for training, evaluating and running inference on a model. After creating a dataset and defining a `Model`, we can train the model through the `train` API. Below, we train the model with `Model` and use the `L1Loss` defined above as the loss function.

1. Defining forward network, loss function and optimizer

    We use the `LinearNet` and `L1Loss` defined above as the forward network and the loss function, and choose MindSpore's `Momentum` as the optimizer.

    ```python
    # define network
    net = LinearNet()
    # define loss function
    loss = L1Loss()
    # define optimizer
    opt = nn.Momentum(net.trainable_params(), learning_rate=0.005, momentum=0.9)
    ```

2. Defining `Model`

    When `Model` is defined, the forward network, loss function and optimizer are specified. `Model` associates them internally to form a training network.

    ```python
    from mindspore import Model

    # define Model
    model = Model(net, loss, opt)
    ```

3. Creating dataset, and calling `train` to train the model

    When calling the `train` interface, you must specify the number of epochs `epoch` and the training dataset `train_dataset`. We set `epoch` to 1 and use the dataset created by `create_dataset` as the training set.
    `callbacks` is an optional parameter of the `train` interface. `LossMonitor` can be used in `callbacks` to monitor how the loss value changes during training. `dataset_sink_mode` is also an optional parameter. Here it is set to False, which means that non-sink mode is used for training.

    ```python
    from mindspore.train.callback import LossMonitor

    # create dataset
    ds_train = create_dataset(num_data=160)
    # training
    model.train(epoch=1, train_dataset=ds_train, callbacks=[LossMonitor()], dataset_sink_mode=False)
    ```

The complete code is as follows:

> In the following example, parameters are initialized with random values, so the output of a specific run may differ from the results shown here. If you need reproducible output, set a fixed random seed. For the setting method, please refer to [mindspore.set_seed()](https://www.mindspore.cn/docs/api/en/r1.6/api_python/mindspore/mindspore.set_seed.html).

```python
import numpy as np

import mindspore.nn as nn
import mindspore.ops as ops
from mindspore import Model
from mindspore import dataset as ds
from mindspore.nn import LossBase
from mindspore.common.initializer import Normal
from mindspore.train.callback import LossMonitor

class LinearNet(nn.Cell):
    def __init__(self):
        super(LinearNet, self).__init__()
        self.fc = nn.Dense(1, 1, Normal(0.02), Normal(0.02))

    def construct(self, x):
        return self.fc(x)

class L1Loss(LossBase):
    def __init__(self, reduction="mean"):
        super(L1Loss, self).__init__(reduction)
        self.abs = ops.Abs()

    def construct(self, base, target):
        x = self.abs(base - target)
        return self.get_loss(x)

def get_data(num, w=2.0, b=3.0):
    for _ in range(num):
        x = np.random.uniform(-10.0, 10.0)
        noise = np.random.normal(0, 1)
        y = x * w + b + noise
        yield np.array([x]).astype(np.float32), np.array([y]).astype(np.float32)

def create_dataset(num_data, batch_size=16):
    dataset = ds.GeneratorDataset(list(get_data(num_data)), column_names=['data', 'label'])
    dataset = dataset.batch(batch_size)
    return dataset

# define network
net = LinearNet()
# define loss function
loss = L1Loss()
# define optimizer
opt = nn.Momentum(net.trainable_params(), learning_rate=0.005, momentum=0.9)
# define Model
model = Model(net, loss, opt)
# create dataset
ds_train = create_dataset(num_data=160)
# training
model.train(epoch=1, train_dataset=ds_train, callbacks=[LossMonitor()], dataset_sink_mode=False)
```

The output is as follows:

```text
epoch: 1 step: 1, loss is 8.328788
epoch: 1 step: 2, loss is 8.594973
epoch: 1 step: 3, loss is 13.299595
epoch: 1 step: 4, loss is 9.04059
epoch: 1 step: 5, loss is 8.991402
epoch: 1 step: 6, loss is 6.5928526
epoch: 1 step: 7, loss is 8.239887
epoch: 1 step: 8, loss is 7.3984795
epoch: 1 step: 9, loss is 7.33724
epoch: 1 step: 10, loss is 4.3588376
```

## Multilabel Loss Function and Model Training

In the previous section, we defined a simple loss function `L1Loss`. Other loss functions can be written in a similar way. However, some deep learning datasets are more complex. For example, the dataset of the object detection network Faster R-CNN has several labels rather than a single data and label pair. In this situation, the loss function is defined and used differently.

The structure of Faster R-CNN is too complex to describe in detail here. Instead, this section extends the linear function fitting example by creating a multilabel dataset, and then introduces how to define the loss function and train with `Model`.

### Defining Multilabel Dataset

First, we define the dataset with two slight modifications:

1. `get_multilabel_data` outputs two labels, `y1` and `y2`.
2. The `column_names` parameter of `GeneratorDataset` is set to ['data', 'label1', 'label2'].

`create_multilabel_dataset` then creates a dataset that has one data column `data` and two label columns, `label1` and `label2`.
```python
import numpy as np
from mindspore import dataset as ds

def get_multilabel_data(num, w=2.0, b=3.0):
    for _ in range(num):
        x = np.random.uniform(-10.0, 10.0)
        noise1 = np.random.normal(0, 1)
        noise2 = np.random.normal(-1, 1)
        y1 = x * w + b + noise1
        y2 = x * w + b + noise2
        yield np.array([x]).astype(np.float32), np.array([y1]).astype(np.float32), np.array([y2]).astype(np.float32)

def create_multilabel_dataset(num_data, batch_size=16):
    dataset = ds.GeneratorDataset(list(get_multilabel_data(num_data)), column_names=['data', 'label1', 'label2'])
    dataset = dataset.batch(batch_size)
    return dataset
```

### Defining Multilabel Loss Function

We define a loss function `L1LossForMultiLabel` that matches the multilabel dataset. The inputs of the loss function's `construct` are the predicted value `base` and the true values `target1` and `target2`. We calculate the error between the predicted value and `target1` and `target2` respectively, and take the average of the two errors as the final loss. The code is as follows:

```python
import mindspore.ops as ops
from mindspore.nn import LossBase

class L1LossForMultiLabel(LossBase):
    def __init__(self, reduction="mean"):
        super(L1LossForMultiLabel, self).__init__(reduction)
        self.abs = ops.Abs()

    def construct(self, base, target1, target2):
        x1 = self.abs(base - target1)
        x2 = self.abs(base - target2)
        # average of the two reduced losses
        return self.get_loss(x1)/2 + self.get_loss(x2)/2
```

### Training Multilabel Model

`Model` internally links the forward network, loss function and optimizer.
The forward network is connected to the loss function by `nn.WithLossCell`, which is defined as follows:

```python
import mindspore.nn as nn

class WithLossCell(nn.Cell):
    def __init__(self, backbone, loss_fn):
        super(WithLossCell, self).__init__(auto_prefix=False)
        self._backbone = backbone
        self._loss_fn = loss_fn

    def construct(self, data, label):
        output = self._backbone(data)
        return self._loss_fn(output, label)
```

Note that the default `nn.WithLossCell` used by `Model` has only two inputs, `data` and `label`, which is not suitable for the multilabel case. To train with `Model`, users need to connect the forward network and the loss function themselves, as follows.

1. Defining the suitable `CustomWithLossCell` in this case

    We can copy the definition of `nn.WithLossCell` and change the inputs of `construct` to three parameters, passing the data to `backbone`, and the predicted value and the two true values to `loss_fn`.

    ```python
    import mindspore.nn as nn

    class CustomWithLossCell(nn.Cell):
        def __init__(self, backbone, loss_fn):
            super(CustomWithLossCell, self).__init__(auto_prefix=False)
            self._backbone = backbone
            self._loss_fn = loss_fn

        def construct(self, data, label1, label2):
            output = self._backbone(data)
            return self._loss_fn(output, label1, label2)
    ```

2. Connecting the forward network and loss function by `CustomWithLossCell`

    We use the forward network `LinearNet` defined in the previous section and the loss function `L1LossForMultiLabel`, and connect them with `CustomWithLossCell` as follows:

    ```python
    net = LinearNet()
    loss = L1LossForMultiLabel()
    loss_net = CustomWithLossCell(net, loss)
    ```

    `loss_net` contains the logic of both the forward network and the loss function.

3. Defining Model and Training

    The `network` of `Model` is set to `loss_net`. `loss_fn` is not specified, and the optimizer is still `Momentum`. Because the user does not specify `loss_fn`, `Model` assumes that `network` already contains its own loss function logic.
    It therefore does not wrap the forward network and the loss function with `nn.WithLossCell`.

    We create the multilabel dataset with `create_multilabel_dataset` and train:

    ```python
    from mindspore.train.callback import LossMonitor
    from mindspore import Model

    opt = nn.Momentum(net.trainable_params(), learning_rate=0.005, momentum=0.9)
    model = Model(network=loss_net, optimizer=opt)
    ds_train = create_multilabel_dataset(num_data=160)
    model.train(epoch=1, train_dataset=ds_train, callbacks=[LossMonitor()], dataset_sink_mode=False)
    ```

The complete code is as follows:

> In the following example, parameters are initialized with random values, so the output of a specific run may differ from the results shown here. If you need reproducible output, set a fixed random seed. For the setting method, please refer to [mindspore.set_seed()](https://www.mindspore.cn/docs/api/en/r1.6/api_python/mindspore/mindspore.set_seed.html).

```python
import numpy as np

import mindspore.nn as nn
import mindspore.ops as ops
from mindspore import Model
from mindspore import dataset as ds
from mindspore.nn import LossBase
from mindspore.common.initializer import Normal
from mindspore.train.callback import LossMonitor

class LinearNet(nn.Cell):
    def __init__(self):
        super(LinearNet, self).__init__()
        self.fc = nn.Dense(1, 1, Normal(0.02), Normal(0.02))

    def construct(self, x):
        return self.fc(x)

class L1LossForMultiLabel(LossBase):
    def __init__(self, reduction="mean"):
        super(L1LossForMultiLabel, self).__init__(reduction)
        self.abs = ops.Abs()

    def construct(self, base, target1, target2):
        x1 = self.abs(base - target1)
        x2 = self.abs(base - target2)
        return self.get_loss(x1)/2 + self.get_loss(x2)/2

class CustomWithLossCell(nn.Cell):
    def __init__(self, backbone, loss_fn):
        super(CustomWithLossCell, self).__init__(auto_prefix=False)
        self._backbone = backbone
        self._loss_fn = loss_fn

    def construct(self, data, label1, label2):
        output = self._backbone(data)
        return self._loss_fn(output, label1, label2)

def get_multilabel_data(num, w=2.0, b=3.0):
    for _ in range(num):
        x = np.random.uniform(-10.0, 10.0)
        noise1 = np.random.normal(0, 1)
        noise2 = np.random.normal(-1, 1)
        y1 = x * w + b + noise1
        y2 = x * w + b + noise2
        yield np.array([x]).astype(np.float32), np.array([y1]).astype(np.float32), np.array([y2]).astype(np.float32)

def create_multilabel_dataset(num_data, batch_size=16):
    dataset = ds.GeneratorDataset(list(get_multilabel_data(num_data)), column_names=['data', 'label1', 'label2'])
    dataset = dataset.batch(batch_size)
    return dataset

net = LinearNet()
loss = L1LossForMultiLabel()
# build loss network
loss_net = CustomWithLossCell(net, loss)

opt = nn.Momentum(net.trainable_params(), learning_rate=0.005, momentum=0.9)
model = Model(network=loss_net, optimizer=opt)
ds_train = create_multilabel_dataset(num_data=160)
model.train(epoch=1, train_dataset=ds_train, callbacks=[LossMonitor()], dataset_sink_mode=False)
```

The output is as follows:

```text
epoch: 1 step: 1, loss is 11.039986
epoch: 1 step: 2, loss is 7.7847576
epoch: 1 step: 3, loss is 9.236277
epoch: 1 step: 4, loss is 8.3316345
epoch: 1 step: 5, loss is 6.957058
epoch: 1 step: 6, loss is 9.231144
epoch: 1 step: 7, loss is 9.1072
epoch: 1 step: 8, loss is 6.7703295
epoch: 1 step: 9, loss is 6.363703
epoch: 1 step: 10, loss is 5.014839
```

This section explained how to define a loss function and train with `Model` in the multilabel case. Other cases with additional inputs can be handled in a similar way.
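
To make the multilabel loss arithmetic concrete, the following plain-Python sketch computes the same quantity as `L1LossForMultiLabel` with `reduction="mean"`: the mean absolute error against each label, averaged over the two labels. The helper names `mean_abs_error` and `l1_loss_multilabel` and the sample values are hypothetical and chosen only for illustration; they are not part of MindSpore.

```python
def mean_abs_error(pred, target):
    """Mean absolute error between two equally long lists (illustrative helper)."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def l1_loss_multilabel(pred, target1, target2):
    """Average of the two per-label mean absolute errors."""
    return mean_abs_error(pred, target1) / 2 + mean_abs_error(pred, target2) / 2


# hypothetical sample values, not taken from the training run above
pred = [1.0, 2.0, 3.0]
target1 = [1.5, 2.0, 2.0]
target2 = [0.0, 2.0, 5.0]

print(mean_abs_error(pred, target1))              # 0.5
print(mean_abs_error(pred, target2))              # 1.0
print(l1_loss_multilabel(pred, target1, target2)) # 0.75
```

Here the error against the first label is (0.5 + 0 + 1) / 3 = 0.5 and against the second label is (1 + 0 + 2) / 3 = 1.0, so the combined loss is 0.5 / 2 + 1.0 / 2 = 0.75.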