{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 构建训练与评估网络\n", "\n", "`Ascend` `GPU` `CPU` `模型开发` `模型运行` `模型评估`\n", "\n", "[![在线运行](https://gitee.com/mindspore/docs/raw/r1.6/resource/_static/logo_modelarts.png)](https://authoring-modelarts-cnnorth4.huaweicloud.com/console/lab?share-url-b64=aHR0cHM6Ly9taW5kc3BvcmUtd2Vic2l0ZS5vYnMuY24tbm9ydGgtNC5teWh1YXdlaWNsb3VkLmNvbS9ub3RlYm9vay9tYXN0ZXIvcHJvZ3JhbW1pbmdfZ3VpZGUvemhfY24vbWluZHNwb3JlX3RyYWluX2FuZF9ldmFsLmlweW5i&imageid=65f636a0-56cf-49df-b941-7d2a07ba8c8c) [![下载Notebook](https://gitee.com/mindspore/docs/raw/r1.6/resource/_static/logo_notebook.png)](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/r1.6/programming_guide/zh_cn/mindspore_train_and_eval.ipynb) [![下载样例代码](https://gitee.com/mindspore/docs/raw/r1.6/resource/_static/logo_download_code.png)](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/r1.6/programming_guide/zh_cn/mindspore_train_and_eval.py) [![查看源文件](https://gitee.com/mindspore/docs/raw/r1.6/resource/_static/logo_source.png)](https://gitee.com/mindspore/docs/blob/r1.6/docs/mindspore/programming_guide/source_zh_cn/train_and_eval.ipynb)\n", "\n", "## 概述\n", "\n", "前面章节讲解了MindSpore构建网络所使用的基本元素,如MindSpore的网络基本单元、损失函数、优化器等。本文档重点介绍如何使用这些元素组成训练和评估网络。\n", "\n", "## 构建前向网络\n", "\n", "使用Cell构建前向网络,这里定义一个简单的线性回归LinearNet:\n", "\n", "> Cell的用法详见" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2021-12-31T07:16:16.256615Z", "start_time": "2021-12-31T07:16:14.276838Z" } }, "outputs": [], "source": [ "import numpy as np\n", "import mindspore.nn as nn\n", "from mindspore.common.initializer import Normal\n", "\n", "class LinearNet(nn.Cell):\n", " def __init__(self):\n", " super().__init__()\n", " self.fc = nn.Dense(1, 1, Normal(0.02), Normal(0.02))\n", "\n", " def construct(self, x):\n", " return self.fc(x)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 构建训练网络\n", "\n", "构建训练网络需要在前向网络的基础上叠加损失函数、反向传播和优化器。\n", "\n", "### 使用训练网络包装函数\n", "\n", "MindSpore的nn模块提供了训练网络封装函数`TrainOneStepCell`,下面使用`nn.TrainOneStepCell`将前面定义的LinearNet封装成一个训练网络。具体过程如下:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2021-12-31T07:16:16.275751Z", "start_time": "2021-12-31T07:16:16.256615Z" } }, "outputs": [], "source": [ "# 实例化前向网络\n", "net = LinearNet()\n", "# 设定损失函数并连接前向网络与损失函数\n", "crit = nn.MSELoss()\n", "net_with_criterion = nn.WithLossCell(net, crit)\n", "# 设定优化器\n", "opt = nn.Momentum(net.trainable_params(), learning_rate=0.005, momentum=0.9)\n", "# 定义训练网络\n", "train_net = nn.TrainOneStepCell(net_with_criterion, opt)\n", "# 设置网络为训练模式\n", "train_net.set_train()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`set_train`递归地配置了`Cell`的`training`属性,在实现训练和推理结构不同的网络时可以通过`training`属性区分训练和推理场景,例如`BatchNorm`、`Dropout`。\n", "\n", "前面的[损失函数](https://www.mindspore.cn/docs/programming_guide/zh-CN/r1.6/loss.html)章节已经介绍了如何定义损失函数,以及如何使用`WithLossCell`将前向网络与损失函数连接起来,这里介绍如何获取梯度和更新权重,构成一个完整的训练网络。MindSpore提供的`nn.TrainOneStepCell`具体实现如下:" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "ExecuteTime": { "end_time": "2021-12-31T07:16:16.289390Z", "start_time": "2021-12-31T07:16:16.277762Z" } }, "outputs": [], "source": [ "import mindspore.ops as ops\n", "from mindspore.context import get_auto_parallel_context, ParallelMode\n", "from mindspore.communication import get_group_size\n", "\n", "def get_device_num():\n", " \"\"\"Get the device num.\"\"\"\n", " parallel_mode = 
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The earlier [Loss Function](https://www.mindspore.cn/docs/programming_guide/zh-CN/r1.6/loss.html) chapter explained how to define a loss function and how to connect it to the forward network with `WithLossCell`. This section explains how gradients are computed and weights updated to form a complete training network. MindSpore's `nn.TrainOneStepCell` is implemented essentially as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-12-31T07:16:16.289390Z",
     "start_time": "2021-12-31T07:16:16.277762Z"
    }
   },
   "outputs": [],
   "source": [
    "import mindspore.ops as ops\n",
    "from mindspore.ops import functional as F\n",
    "from mindspore.context import get_auto_parallel_context, ParallelMode\n",
    "from mindspore.communication import get_group_size\n",
    "from mindspore.parallel._auto_parallel_context import auto_parallel_context\n",
    "\n",
    "def get_device_num():\n",
    "    \"\"\"Get the device num.\"\"\"\n",
    "    parallel_mode = auto_parallel_context().get_parallel_mode()\n",
    "    if parallel_mode == \"stand_alone\":\n",
    "        device_num = 1\n",
    "        return device_num\n",
    "\n",
    "    if auto_parallel_context().get_device_num_is_set() is False:\n",
    "        device_num = get_group_size()\n",
    "    else:\n",
    "        device_num = auto_parallel_context().get_device_num()\n",
    "    return device_num\n",
    "\n",
    "class TrainOneStepCell(nn.Cell):\n",
    "    def __init__(self, network, optimizer, sens=1.0):\n",
    "        super(TrainOneStepCell, self).__init__(auto_prefix=False)\n",
    "        self.network = network\n",
    "        self.network.set_grad()\n",
    "        self.optimizer = optimizer\n",
    "        self.weights = self.optimizer.parameters\n",
    "        self.grad = ops.GradOperation(get_by_list=True, sens_param=True)\n",
    "        self.sens = sens\n",
    "        self.reducer_flag = False\n",
    "        self.grad_reducer = ops.identity\n",
    "        self.parallel_mode = auto_parallel_context().get_parallel_mode()\n",
    "        self.reducer_flag = self.parallel_mode in (ParallelMode.DATA_PARALLEL, ParallelMode.HYBRID_PARALLEL)\n",
    "        if self.reducer_flag:\n",
    "            self.mean = auto_parallel_context().get_gradients_mean()\n",
    "            self.degree = get_device_num()\n",
    "            self.grad_reducer = nn.DistributedGradReducer(self.weights, self.mean, self.degree)\n",
    "\n",
    "    def construct(self, *inputs):\n",
    "        loss = self.network(*inputs)\n",
    "        sens = F.fill(loss.dtype, loss.shape, self.sens)\n",
    "        grads = self.grad(self.network, self.weights)(*inputs, sens)\n",
    "        grads = self.grad_reducer(grads)\n",
    "        loss = F.depend(loss, self.optimizer(grads))\n",
    "        return loss"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`TrainOneStepCell` takes the following arguments:\n",
    "\n",
    "- network (Cell): the network to train. It contains the forward computation and the loss computation; it takes data and labels as input and outputs the loss value.\n",
    "\n",
    "- optimizer (Cell): the optimizer to use.\n",
    "\n",
    "- sens (float): the scaling factor applied in backpropagation.\n",
    "\n",
    "`TrainOneStepCell` also defines the following at initialization:\n",
    "\n",
    "- GradOperation: the backpropagation function used to compute gradients.\n",
    "\n",
    "- DistributedGradReducer: aggregates gradients across devices in distributed scenarios; it is not needed on a single device.\n",
    "\n",
    "The training step defined in `construct` consists of four parts:\n",
    "\n",
    "- `loss = self.network(*inputs)`: run the forward network and compute the loss for the current input.\n",
    "- `grads = self.grad(self.network, self.weights)(*inputs, sens)`: run backpropagation and compute the gradients.\n",
    "- `grads = self.grad_reducer(grads)`: aggregate gradients in the distributed case; on a single device the gradients are returned unchanged.\n",
    "- `self.optimizer(grads)`: update the weights with the optimizer.\n",
    "\n",
    "### Creating the Dataset and Running Training\n",
    "\n",
    "Generate the dataset and preprocess it:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-12-31T07:16:16.433585Z",
     "start_time": "2021-12-31T07:16:16.291411Z"
    }
   },
   "outputs": [],
   "source": [
    "import mindspore.dataset as ds\n",
    "import numpy as np\n",
    "\n",
    "def get_data(num, w=2.0, b=3.0):\n",
    "    # Sample x uniformly and build noisy targets y = w * x + b + noise\n",
    "    for _ in range(num):\n",
    "        x = np.random.uniform(-10.0, 10.0)\n",
    "        noise = np.random.normal(0, 1)\n",
    "        y = x * w + b + noise\n",
    "        yield np.array([x]).astype(np.float32), np.array([y]).astype(np.float32)\n",
    "\n",
    "def create_dataset(num_data, batch_size=16):\n",
    "    dataset = ds.GeneratorDataset(list(get_data(num_data)), column_names=['data', 'label'])\n",
    "    dataset = dataset.batch(batch_size)\n",
    "    return dataset\n",
    "\n",
    "train_dataset = create_dataset(num_data=160)"
   ]
  },
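  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Before training, it can help to sanity-check the pipeline (a quick check added here, not part of the original tutorial). With `batch_size=16` and one feature and one label per sample, both columns of each batch should have shape (16, 1):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Peek at the first batch to confirm the column shapes\n",
    "for d in train_dataset.create_dict_iterator(output_numpy=True):\n",
    "    print(d[\"data\"].shape, d[\"label\"].shape)  # expected: (16, 1) (16, 1)\n",
    "    break"
   ]
  },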
"38.416348\n", "11.631884\n", "5.031072\n", "6.032862\n", "18.471722\n", "19.095896\n", "5.288643\n", "4.173666\n", "1.0577652\n" ] } ], "source": [ "# 获取训练过程数据\n", "epochs = 2\n", "for epoch in range(epochs):\n", " for d in train_dataset.create_dict_iterator():\n", " result = train_net(d[\"data\"], d[\"label\"])\n", " print(result)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 自定义训练网络包装函数\n", "\n", "一般情况下,用户可以使用框架提供的`nn.TrainOneStepCell`封装训练网络,在`nn.TrainOneStepCell`不能满足需求时,则需要自定义符合实际场景的`TrainOneStepCell`。例如:\n", "\n", "1、ModelZoo中的Bert就在`nn.TrainOneStepCell`的基础上,加入了梯度截断操作,以获得更好的训练效果,Bert定义的训练包装函数代码片段如下:\n", "\n", "> Bert网络详见:https://gitee.com/mindspore/models/tree/r1.6/official/nlp/bert" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "ExecuteTime": { "end_time": "2021-12-31T07:16:16.769749Z", "start_time": "2021-12-31T07:16:16.758936Z" } }, "outputs": [], "source": [ "GRADIENT_CLIP_TYPE = 1\n", "GRADIENT_CLIP_VALUE = 1.0\n", "\n", "clip_grad = ops.MultitypeFuncGraph(\"clip_grad\")\n", "\n", "@clip_grad.register(\"Number\", \"Number\", \"Tensor\")\n", "def _clip_grad(clip_type, clip_value, grad):\n", " if clip_type not in (0, 1):\n", " return grad\n", " dt = ops.dtype(grad)\n", " if clip_type == 0:\n", " new_grad = ops.clip_by_value(grad, ops.cast(ops.tuple_to_array((-clip_value,)), dt),\n", " ops.cast(ops.tuple_to_array((clip_value,)), dt))\n", " else:\n", " new_grad = nn.ClipByNorm()(grad, ops.cast(ops.tuple_to_array((clip_value,)), dt))\n", " return new_grad\n", "\n", "class BertTrainOneStepCell(nn.TrainOneStepCell):\n", " def __init__(self, network, optimizer, sens=1.0, enable_clip_grad=True):\n", " super(BertTrainOneStepCell, self).__init__(network, optimizer, sens)\n", " self.cast = ops.Cast()\n", " self.hyper_map = ops.HyperMap()\n", " self.enable_clip_grad = enable_clip_grad\n", "\n", " def construct(self, *inputs):\n", " weights = self.weights\n", " loss = self.network(*inputs)\n", " grads = self.grad(self.network, weights)(*inputs, self.cast(ops.tuple_to_array((self.sens,)), mstype.float32))\n", " if self.enable_clip_grad:\n", " # 进行梯度截断\n", " grads = self.hyper_map(ops.partial(clip_grad, GRADIENT_CLIP_TYPE, GRADIENT_CLIP_VALUE), grads)\n", " grads = self.grad_reducer(grads)\n", " self.optimizer(grads)\n", " return loss" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "2、Wide&Deep输出两个损失函数值,并对网络的Wide和Deep两部分分别进行反向传播和参数更新,而`nn.TrainOneStep`仅适用于一个损失函数值的场景,因此ModelZoo中Wide&Deep自定义了训练封装函数,代码片段如下:\n", "\n", "> Wide&Deep网络详见:。" ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "ExecuteTime": { "end_time": "2021-12-31T07:16:16.789155Z", "start_time": "2021-12-31T07:16:16.771795Z" } }, "outputs": [], "source": [ "class IthOutputCell(nn.Cell):\n", " \"\"\"\n", " IthOutputCell\n", " \"\"\"\n", " def __init__(self, network, output_index):\n", " super(IthOutputCell, self).__init__()\n", " self.network = network\n", " self.output_index = output_index\n", "\n", " def construct(self, *inputs):\n", " \"\"\"\n", " IthOutputCell construct\n", " \"\"\"\n", " predict = self.network(*inputs)[self.output_index]\n", " return predict\n", "\n", "class TrainStepWrap(nn.Cell):\n", " def __init__(self, network, config, sens=1000.0):\n", " super(TrainStepWrap, self).__init__()\n", " self.network = network\n", " self.network.set_train()\n", " self.trainable_params = network.trainable_params()\n", " weights_w = []\n", " weights_d = []\n", " for params in self.trainable_params:\n", " if 'wide' in params.name:\n", " weights_w.append(params)\n", " 
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "2. Wide&Deep outputs two loss values and performs backpropagation and parameter updates separately for the Wide and Deep parts of the network, whereas `nn.TrainOneStepCell` only supports a single loss value. Wide&Deep in ModelZoo therefore defines its own training wrapper. A snippet:\n",
    "\n",
    "> For the Wide&Deep network, see:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-12-31T07:16:16.789155Z",
     "start_time": "2021-12-31T07:16:16.771795Z"
    }
   },
   "outputs": [],
   "source": [
    "from mindspore import ParameterTuple\n",
    "from mindspore.nn import DistributedGradReducer\n",
    "\n",
    "class IthOutputCell(nn.Cell):\n",
    "    \"\"\"Expose the output_index-th output of a multi-output network.\"\"\"\n",
    "    def __init__(self, network, output_index):\n",
    "        super(IthOutputCell, self).__init__()\n",
    "        self.network = network\n",
    "        self.output_index = output_index\n",
    "\n",
    "    def construct(self, *inputs):\n",
    "        predict = self.network(*inputs)[self.output_index]\n",
    "        return predict\n",
    "\n",
    "class TrainStepWrap(nn.Cell):\n",
    "    def __init__(self, network, config, sens=1000.0):\n",
    "        super(TrainStepWrap, self).__init__()\n",
    "        self.network = network\n",
    "        self.network.set_train()\n",
    "        self.trainable_params = network.trainable_params()\n",
    "        # Split the parameters into the Wide part and the Deep part\n",
    "        weights_w = []\n",
    "        weights_d = []\n",
    "        for params in self.trainable_params:\n",
    "            if 'wide' in params.name:\n",
    "                weights_w.append(params)\n",
    "            else:\n",
    "                weights_d.append(params)\n",
    "\n",
    "        self.weights_w = ParameterTuple(weights_w)\n",
    "        self.weights_d = ParameterTuple(weights_d)\n",
    "        # One optimizer per part: FTRL for Wide, Adam for Deep\n",
    "        self.optimizer_w = nn.FTRL(learning_rate=config.ftrl_lr,\n",
    "                                   params=self.weights_w,\n",
    "                                   l1=5e-4,\n",
    "                                   l2=5e-4,\n",
    "                                   initial_accum=0.1,\n",
    "                                   loss_scale=sens)\n",
    "\n",
    "        self.optimizer_d = nn.Adam(self.weights_d,\n",
    "                                   learning_rate=config.adam_lr,\n",
    "                                   eps=1e-6,\n",
    "                                   loss_scale=sens)\n",
    "\n",
    "        self.hyper_map = ops.HyperMap()\n",
    "\n",
    "        self.grad_w = ops.GradOperation(get_by_list=True, sens_param=True)\n",
    "        self.grad_d = ops.GradOperation(get_by_list=True, sens_param=True)\n",
    "\n",
    "        self.sens = sens\n",
    "        self.loss_net_w = IthOutputCell(network, output_index=0)\n",
    "        self.loss_net_d = IthOutputCell(network, output_index=1)\n",
    "        self.loss_net_w.set_grad()\n",
    "        self.loss_net_d.set_grad()\n",
    "\n",
    "        self.reducer_flag = False\n",
    "        self.grad_reducer_w = None\n",
    "        self.grad_reducer_d = None\n",
    "        parallel_mode = get_auto_parallel_context(\"parallel_mode\")\n",
    "        if parallel_mode in (ParallelMode.DATA_PARALLEL,\n",
    "                             ParallelMode.HYBRID_PARALLEL):\n",
    "            self.reducer_flag = True\n",
    "        if self.reducer_flag:\n",
    "            mean = get_auto_parallel_context(\"gradients_mean\")\n",
    "            degree = get_auto_parallel_context(\"device_num\")\n",
    "            self.grad_reducer_w = DistributedGradReducer(\n",
    "                self.optimizer_w.parameters, mean, degree)\n",
    "            self.grad_reducer_d = DistributedGradReducer(\n",
    "                self.optimizer_d.parameters, mean, degree)\n",
    "\n",
    "    def construct(self, *inputs):\n",
    "        weights_w = self.weights_w\n",
    "        weights_d = self.weights_d\n",
    "        loss_w, loss_d = self.network(*inputs)\n",
    "\n",
    "        # Backpropagate each loss only through its own parameter set\n",
    "        sens_w = ops.Fill()(ops.DType()(loss_w), ops.Shape()(loss_w), self.sens)\n",
    "        sens_d = ops.Fill()(ops.DType()(loss_d), ops.Shape()(loss_d), self.sens)\n",
    "        grads_w = self.grad_w(self.loss_net_w, weights_w)(*inputs, sens_w)\n",
    "        grads_d = self.grad_d(self.loss_net_d, weights_d)(*inputs, sens_d)\n",
    "        if self.reducer_flag:\n",
    "            # Apply the grad reducers on the grads\n",
    "            grads_w = self.grad_reducer_w(grads_w)\n",
    "            grads_d = self.grad_reducer_d(grads_d)\n",
    "        return ops.depend(loss_w, self.optimizer_w(grads_w)), ops.depend(\n",
    "            loss_d, self.optimizer_d(grads_d))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Building the Evaluation Network\n",
    "\n",
    "The evaluation network outputs predictions and ground-truth labels so that the effect of training can be assessed on a validation set. MindSpore likewise provides an evaluation network wrapper, `nn.WithEvalCell`.\n",
    "\n",
    "### Using the Evaluation Network Wrapper\n",
    "\n",
    "Build an evaluation network from the forward network and loss function defined earlier:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-12-31T07:16:16.821621Z",
     "start_time": "2021-12-31T07:16:16.792286Z"
    }
   },
   "outputs": [],
   "source": [
    "# Build the evaluation network\n",
    "eval_net = nn.WithEvalCell(net, crit)\n",
    "eval_net.set_train(False)"
   ]
  },
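  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Running `eval_net` on a single batch shows the `(loss, outputs, label)` triple it returns (a quick check added here; it reuses a batch from `train_dataset`):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Inspect the evaluation network's outputs on one batch\n",
    "for d in train_dataset.create_dict_iterator():\n",
    "    loss_value, outputs, label = eval_net(d[\"data\"], d[\"label\"])\n",
    "    print(loss_value.shape, outputs.shape, label.shape)  # expected: () (16, 1) (16, 1)\n",
    "    break"
   ]
  },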
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Executing `eval_net` outputs predictions and labels, which can then be fed into evaluation metrics to obtain the evaluation result. The definition of `nn.WithEvalCell` is essentially as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-12-31T07:16:16.860587Z",
     "start_time": "2021-12-31T07:16:16.823210Z"
    }
   },
   "outputs": [],
   "source": [
    "class WithEvalCell(nn.Cell):\n",
    "    def __init__(self, network, loss_fn, add_cast_fp32=False):\n",
    "        super(WithEvalCell, self).__init__(auto_prefix=False)\n",
    "        self._network = network\n",
    "        self._loss_fn = loss_fn\n",
    "        self.add_cast_fp32 = add_cast_fp32\n",
    "\n",
    "    def construct(self, data, label):\n",
    "        outputs = self._network(data)\n",
    "        if self.add_cast_fp32:\n",
    "            # Optionally compute the loss in float32\n",
    "            label = F.mixed_precision_cast(mstype.float32, label)\n",
    "            outputs = F.cast(outputs, mstype.float32)\n",
    "        loss = self._loss_fn(outputs, label)\n",
    "        return loss, outputs, label"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`WithEvalCell` takes the following arguments:\n",
    "\n",
    "- network (Cell): the forward network; it takes data and labels as input and outputs predictions.\n",
    "\n",
    "- loss_fn (Cell): the loss function to use. MindSpore's `WithEvalCell` outputs `loss` so that the loss can also serve as an evaluation metric; in practice `loss` is not a mandatory output.\n",
    "\n",
    "- add_cast_fp32 (bool): whether to compute the loss in float32. Currently this argument only takes effect when `Model` uses `nn.WithEvalCell` to build the evaluation network.\n",
    "\n",
    "The evaluation step defined in `construct` mainly does the following:\n",
    "\n",
    "- `outputs = self._network(data)`: run the forward network and compute predictions for the current input.\n",
    "\n",
    "- `loss = self._loss_fn(outputs, label)`: compute the loss from the predictions and labels.\n",
    "\n",
    "- `return loss, outputs, label`: output the loss, predictions, and labels for the current input.\n",
    "\n",
    "### Creating the Dataset and Running Evaluation\n",
    "\n",
    "Define the evaluation metrics:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-12-31T07:16:16.873312Z",
     "start_time": "2021-12-31T07:16:16.862602Z"
    }
   },
   "outputs": [],
   "source": [
    "mae = nn.MAE()\n",
    "loss = nn.Loss()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Create a validation dataset with the `create_dataset` function defined earlier:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-12-31T07:16:16.909292Z",
     "start_time": "2021-12-31T07:16:16.875364Z"
    }
   },
   "outputs": [],
   "source": [
    "eval_dataset = create_dataset(num_data=160)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Iterate over the dataset, run `eval_net`, and use its outputs to compute the metrics:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-12-31T07:16:16.982492Z",
     "start_time": "2021-12-31T07:16:16.910299Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "mae: 1.8630892157554626\n",
      "loss: 4.745016288757324\n"
     ]
    }
   ],
   "source": [
    "mae.clear()\n",
    "loss.clear()\n",
    "for d in eval_dataset.create_dict_iterator():\n",
    "    outputs = eval_net(d[\"data\"], d[\"label\"])\n",
    "    mae.update(outputs[1], outputs[2])\n",
    "    loss.update(outputs[0])\n",
    "\n",
    "mae_result = mae.eval()\n",
    "loss_result = loss.eval()\n",
    "print(\"mae: \", mae_result)\n",
    "print(\"loss: \", loss_result)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`nn.WithEvalCell` outputs the loss value so that the `Loss` metric can be computed; ignore this output if you do not need it.\n",
    "\n",
    "Because the data and the initial weights are random, your results will differ from those shown.\n",
    "\n",
    "### Customizing the Evaluation Network Wrapper\n",
    "\n",
    "As shown above, `nn.WithEvalCell` takes exactly two inputs, `data` and `label`, so it does not fit scenarios with multiple data columns or multiple labels; in those cases you need a custom `WithEvalCell`. The reason is that the evaluation network must use the data to compute predictions and pass the labels through, and when more than two inputs are passed to `WithEvalCell`, the framework cannot tell which of them are data and which are labels. When customizing, you can omit `loss_fn` if you do not need the loss as an evaluation metric.\n",
    "\n",
    "Taking three inputs `data`, `label1`, and `label2` as an example, a custom `WithEvalCell` can be defined as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {
    "ExecuteTime": {
     "end_time": "2021-12-31T07:16:16.990257Z",
     "start_time": "2021-12-31T07:16:16.984560Z"
    }
   },
   "outputs": [],
   "source": [
    "class CustomWithEvalCell(nn.Cell):\n",
    "    def __init__(self, network):\n",
    "        super(CustomWithEvalCell, self).__init__(auto_prefix=False)\n",
    "        self._network = network\n",
    "\n",
    "    def construct(self, data, label1, label2):\n",
    "        outputs = self._network(data)\n",
    "        return outputs, label1, label2\n",
    "\n",
    "eval_net = CustomWithEvalCell(net)\n",
    "eval_net.set_train(False)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "MindSpore's built-in metrics accept exactly two inputs, logits and label. When the evaluation network outputs multiple labels or multiple predictions, call `set_indexes` to specify which outputs are used to compute the metric, as sketched below. If a metric needs more than two of the outputs, the built-in metrics cannot satisfy the requirement and a custom metric is needed.\n",
    "\n",
    "For how to use and customize Metric, see."
   ]
  },
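  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "For instance (a sketch added here, assuming an MAE metric that should compare the predictions with `label2` from the three-output `eval_net` above):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Use predictions (tuple index 0) and label2 (tuple index 2) of the eval_net outputs\n",
    "mae2 = nn.MAE().set_indexes([0, 2])\n",
    "\n",
    "# During evaluation one would then call:\n",
    "#   outputs = eval_net(data, label1, label2)\n",
    "#   mae2.update(*outputs)   # only outputs[0] and outputs[2] are used"
   ]
  },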
"但在网络调测过程中,或使用边训练边验证方式进行模型调优时,往往在同一Python脚本中完成模型训练,评估或推理,此时MindSpore的权重共享机制可确保不同网络间的权重一致性。\n", "\n", "使用MindSpore构建不同网络结构时,只要这些网络结构是在一个实例的基础上封装的,那这个实例中的所有权重便是共享的,一个网络中的权重发生变化,意味着其他网络中的权重同步发生了变化。\n", "\n", "在本文档中,定义训练和评估网络时便使用了权重共享机制:" ] }, { "cell_type": "code", "execution_count": 14, "metadata": { "ExecuteTime": { "end_time": "2021-12-31T07:16:17.017589Z", "start_time": "2021-12-31T07:16:16.991264Z" } }, "outputs": [], "source": [ "# 实例化前向网络\n", "net = LinearNet()\n", "# 设定损失函数并连接前向网络与损失函数\n", "crit = nn.MSELoss()\n", "net_with_criterion = nn.WithLossCell(net, crit)\n", "# 设定优化器\n", "opt = nn.Adam(params=net.trainable_params())\n", "# 定义训练网络\n", "train_net = nn.TrainOneStepCell(net_with_criterion, opt)\n", "train_net.set_train()\n", "# 构建评估网络\n", "eval_net = nn.WithEvalCell(net, crit)\n", "eval_net.set_train(False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`train_net`和`eval_net`均在`net`实例的基础上封装,因此在进行模型评估时,不需要加载`train_net`的权重。\n", "\n", "若在构建`eval_net`时重新的定义前向网络,那`train_net`和`eval_net`之间便没有共享权重,如下:" ] }, { "cell_type": "code", "execution_count": 15, "metadata": { "ExecuteTime": { "end_time": "2021-12-31T07:16:17.046932Z", "start_time": "2021-12-31T07:16:17.018611Z" } }, "outputs": [], "source": [ "# 实例化前向网络\n", "net = LinearNet()\n", "# 设定损失函数并连接前向网络与损失函数\n", "crit = nn.MSELoss()\n", "net_with_criterion = nn.WithLossCell(net, crit)\n", "# 设定优化器\n", "opt = nn.Adam(params=net.trainable_params())\n", "# 定义训练网络\n", "train_net = nn.TrainOneStepCell(net_with_criterion, opt)\n", "train_net.set_train()\n", "\n", "# 再次实例化前向网络\n", "net2 = LinearNet()\n", "# 构建评估网络\n", "eval_net = nn.WithEvalCell(net2, crit)\n", "eval_net.set_train(False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "此时,若要在模型训练后进行评估,就需要将`train_net`中的权重加载到`eval_net`中。在同一脚本中进行模型训练、评估和推理时,利用好权重共享机制不失为一种更简便的方式。需要注意的是,如果前向网络结构中构建了训练和推理两种场景,同样需要确保满足权重共享的条件,如果分支语句中使用了同样的权重,该权重相关的网络结构只实例化一次。\n", "\n", "这里讲解了如何构建和执行网络模型,后续章节会进一步讲解如何通过高阶API`Model`进行模型训练和评估。" ] } ], "metadata": { "kernelspec": { "display_name": "MindSpore", "language": "python", "name": "mindspore" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.5" } }, "nbformat": 4, "nbformat_minor": 4 }