{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "a882ffc9",
   "metadata": {},
   "source": [
    "# Network Training and Gradient Derivation\n",
    "\n",
    "[![Download Notebook](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r2.0.0-alpha/resource/_static/logo_notebook.png)](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/r2.0.0-alpha/zh_cn/migration_guide/model_development/mindspore_training_and_gradient.ipynb)\n",
    "[![Download Sample Code](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r2.0.0-alpha/resource/_static/logo_download_code.png)](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/r2.0.0-alpha/zh_cn/migration_guide/model_development/mindspore_training_and_gradient.py)\n",
    "[![View Source](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/r2.0.0-alpha/resource/_static/logo_source.png)](https://gitee.com/mindspore/docs/blob/r2.0.0-alpha/docs/mindspore/source_zh_cn/migration_guide/model_development/training_and_gradient.ipynb)\n",
    "\n",
    "## Automatic Differentiation\n",
    "\n",
    "Once the forward network has been built, MindSpore provides [automatic differentiation](https://mindspore.cn/tutorials/zh-CN/r2.0.0-alpha/beginner/autograd.html) interfaces for computing the gradients of the model.\n",
    "The [automatic derivation](https://mindspore.cn/tutorials/zh-CN/r2.0.0-alpha/advanced/derivation.html) tutorial introduces a range of gradient computation scenarios.\n",
    "\n",
    "## Training the Network\n",
    "\n",
    "A full training step consists of the forward network (the network plus the loss function), automatic gradient derivation, and the optimizer update. MindSpore provides three ways to implement this process:\n",
    "\n",
    "1. Wrap the network with `Model` and run training through the `model.train` or `model.fit` method, as in [Model Training](https://mindspore.cn/tutorials/zh-CN/r2.0.0-alpha/beginner/train.html); a minimal sketch follows this list.\n",
    "\n",
    "2. Use the prebuilt [TrainOneStepCell](https://www.mindspore.cn/docs/zh-CN/r2.0.0-alpha/api_python/nn/mindspore.nn.TrainOneStepCell.html) and [TrainOneStepWithLossScaleCell](https://www.mindspore.cn/docs/zh-CN/r2.0.0-alpha/api_python/nn/mindspore.nn.TrainOneStepWithLossScaleCell.html), which cover the plain training flow and the training flow with [loss scaling](https://www.mindspore.cn/tutorials/zh-CN/r2.0.0-alpha/advanced/mixed_precision.html) respectively, as in the [Quick Start](https://mindspore.cn/tutorials/zh-CN/r2.0.0-alpha/beginner/quick_start.html).\n",
    "\n",
    "3. Define a custom training Cell.\n"
   ]
  },
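  {
   "cell_type": "markdown",
   "id": "model-train-sketch-md",
   "metadata": {},
   "source": [
    "The next cell is a minimal sketch, not part of the original tutorial, of the first approach: the toy `nn.Dense` network, `MSELoss`, `SGD` optimizer and the small random dataset are all made up for illustration; `Model` wires them together and `model.train` runs the training loop.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "model-train-sketch",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from mindspore import nn\n",
    "import mindspore.dataset as ds\n",
    "from mindspore.train import Model\n",
    "\n",
    "# Toy network, loss and optimizer (hypothetical, for illustration only)\n",
    "net = nn.Dense(4, 1)\n",
    "loss_fn = nn.MSELoss()\n",
    "optimizer = nn.SGD(net.trainable_params(), learning_rate=0.01)\n",
    "\n",
    "# A small random dataset with two columns, data and label\n",
    "def generator():\n",
    "    for _ in range(16):\n",
    "        yield (np.random.randn(4).astype(np.float32),\n",
    "               np.random.randn(1).astype(np.float32))\n",
    "\n",
    "dataset = ds.GeneratorDataset(list(generator()), column_names=[\"data\", \"label\"]).batch(8)\n",
    "\n",
    "# Model combines the forward network, loss and optimizer;\n",
    "# model.train runs the whole training loop for the given number of epochs\n",
    "model = Model(net, loss_fn=loss_fn, optimizer=optimizer)\n",
    "model.train(1, dataset, dataset_sink_mode=False)\n"
   ]
  },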
  {
   "cell_type": "markdown",
   "id": "a882ffc9-1",
   "metadata": {},
   "source": [
    "### Custom Training Cell\n",
    "\n",
    "The first two approaches are illustrated by the examples on the official website linked above. For a custom training Cell, let us first review what `TrainOneStepCell` does internally:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "12825411",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2022-09-08T09:03:04.381779Z",
     "start_time": "2022-09-08T09:03:01.036969Z"
    }
   },
   "outputs": [],
   "source": [
    "import mindspore as ms\n",
    "from mindspore import ops, nn\n",
    "from mindspore.parallel._utils import (_get_device_num, _get_gradients_mean,\n",
    "                                       _get_parallel_mode)\n",
    "\n",
    "class TrainOneStepCell(nn.Cell):\n",
    "    def __init__(self, network, optimizer):\n",
    "        super(TrainOneStepCell, self).__init__(auto_prefix=False)\n",
    "        self.network = network  # forward network that includes the loss\n",
    "        self.network.set_grad()  # needed in PYNATIVE mode: if True, the backward network is generated while the forward network runs\n",
    "        self.optimizer = optimizer  # optimizer used to update the parameters\n",
    "        self.weights = self.optimizer.parameters  # parameters managed by the optimizer\n",
    "\n",
    "        # logic related to parallel training\n",
    "        self.reducer_flag = False\n",
    "        self.grad_reducer = ops.identity\n",
    "        self.parallel_mode = _get_parallel_mode()\n",
    "        self.reducer_flag = self.parallel_mode in (ms.ParallelMode.DATA_PARALLEL, ms.ParallelMode.HYBRID_PARALLEL)\n",
    "        if self.reducer_flag:\n",
    "            self.mean = _get_gradients_mean()\n",
    "            self.degree = _get_device_num()\n",
    "            self.grad_reducer = nn.DistributedGradReducer(self.weights, self.mean, self.degree)\n",
    "\n",
    "    def construct(self, *inputs):\n",
    "        # get the loss and the gradients of all Parameter free variables\n",
    "        loss, grads = ms.value_and_grad(self.network, grad_position=None, weights=self.weights)(*inputs)\n",
    "        # grads = grad_op(grads)  # extra gradient processing, such as gradient clipping, can be added here\n",
    "        grads = self.grad_reducer(grads)  # gradient aggregation across devices\n",
    "        self.optimizer(grads)  # update the parameters with the optimizer\n",
    "        return loss\n"
   ]
  },
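  {
   "cell_type": "markdown",
   "id": "train-cell-usage-md",
   "metadata": {},
   "source": [
    "As a minimal usage sketch, not part of the original tutorial, the cell below drives the `TrainOneStepCell` defined above for a single step; the toy `nn.Dense` network, `MSELoss`, `Momentum` optimizer and random tensors are all made up for illustration.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "train-cell-usage",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "from mindspore import nn, Tensor\n",
    "\n",
    "# Toy network, loss and optimizer (hypothetical, for illustration only)\n",
    "net = nn.Dense(4, 1)\n",
    "loss_fn = nn.MSELoss()\n",
    "net_with_loss = nn.WithLossCell(net, loss_fn)  # forward network that returns the loss\n",
    "optimizer = nn.Momentum(net.trainable_params(), learning_rate=0.01, momentum=0.9)\n",
    "\n",
    "train_cell = TrainOneStepCell(net_with_loss, optimizer)  # the custom cell defined above\n",
    "train_cell.set_train()\n",
    "\n",
    "x = Tensor(np.random.randn(8, 4).astype(np.float32))\n",
    "y = Tensor(np.random.randn(8, 1).astype(np.float32))\n",
    "loss = train_cell(x, y)  # one full step: forward, backward and parameter update\n",
    "print(loss)\n"
   ]
  },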
  {
   "cell_type": "markdown",
   "id": "a9e8b21e",
   "metadata": {},
   "source": [
    "The whole training flow can in fact be wrapped into a single Cell that implements the forward computation, the backward gradient derivation and the parameter update. Once the gradients have been obtained, we can apply some special processing to them before the update.\n",
    "\n",
    "#### Gradient Clipping\n",
    "\n",
    "When training becomes unstable because of exploding or unusually large gradients, consider adding gradient clipping. The example below illustrates the common case of clipping by global norm:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "e6ac2d24",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2022-09-08T09:03:04.395177Z",
     "start_time": "2022-09-08T09:03:04.384312Z"
    }
   },
   "outputs": [],
   "source": [
    "import mindspore as ms\n",
    "from mindspore import nn, ops\n",
    "\n",
    "_grad_scale = ops.MultitypeFuncGraph(\"grad_scale\")\n",
    "\n",
    "@_grad_scale.register(\"Tensor\", \"Tensor\")\n",
    "def tensor_grad_scale(scale, grad):\n",
    "    return grad * ops.cast(ops.Reciprocal()(scale), ops.dtype(grad))\n",
    "\n",
    "class MyTrainOneStepCell(nn.TrainOneStepWithLossScaleCell):\n",
    "    \"\"\"\n",
    "    Network training wrapper class with gradient clipping.\n",
    "\n",
    "    Appends an optimizer to the training network, after which the construct\n",
    "    function can be called to create the backward graph.\n",
    "\n",
    "    Args:\n",
    "        network (Cell): The training network.\n",
    "        optimizer (Cell): Optimizer for updating the weights.\n",
    "        scale_sense (Union[int, float, Cell]): The loss scaling sense. Default: 1.\n",
    "        grad_clip (bool): Whether to clip gradients. Default: False.\n",
    "    \"\"\"\n",
    "\n",
    "    def __init__(self, network, optimizer, scale_sense=1, grad_clip=False):\n",
    "        if isinstance(scale_sense, (int, float)):\n",
    "            scale_sense = nn.FixedLossScaleUpdateCell(scale_sense)\n",
    "        super(MyTrainOneStepCell, self).__init__(network, optimizer, scale_sense)\n",
    "        self.grad_clip = grad_clip\n",
    "\n",
    "    def construct(self, x, img_shape, gt_bboxe, gt_label, gt_num):\n",
    "        # most of the attributes and methods used here come from the base class; see its API documentation for details\n",
    "        weights = self.weights\n",
    "        loss = self.network(x, img_shape, gt_bboxe, gt_label, gt_num)\n",
    "        scaling_sens = self.scale_sense\n",
    "        # start floating-point overflow detection: create and clear the overflow status\n",
    "        status, scaling_sens = self.start_overflow_check(loss, scaling_sens)\n",
    "        # multiply by the loss scale so that the gradients do not underflow\n",
    "        scaling_sens_filled = ops.ones_like(loss) * ops.cast(scaling_sens, ops.dtype(loss))\n",
    "        grads = self.grad(self.network, weights)(x, img_shape, gt_bboxe, gt_label, gt_num, scaling_sens_filled)\n",
    "        # divide the computed gradients by the scale to recover the real gradient values\n",
    "        grads = self.hyper_map(ops.partial(_grad_scale, scaling_sens), grads)\n",
    "        # gradient clipping by global norm\n",
    "        if self.grad_clip:\n",
    "            grads = ops.clip_by_global_norm(grads)\n",
    "        # gradient aggregation across devices\n",
    "        grads = self.grad_reducer(grads)\n",
    "\n",
    "        # get the floating-point overflow status\n",
    "        cond = self.get_overflow_status(status, grads)\n",
    "        # with a dynamic loss scale, update the loss scaling factor according to the overflow status\n",
    "        overflow = self.process_loss_scale(cond)\n",
    "        # update the parameters with the optimizer only if no overflow occurred\n",
    "        if not overflow:\n",
    "            self.optimizer(grads)\n",
    "        return loss, cond, scaling_sens.value()\n"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "702bfe13",
   "metadata": {},
   "source": [
    "#### Gradient Accumulation\n",
    "\n",
    "Gradient accumulation splits a batch of training samples into several smaller micro-batches that are computed in sequence. It is used to work around OOM (Out Of Memory) problems where, because of insufficient memory, a large batch size makes the network impossible to train or a large model cannot be loaded.\n",
    "\n",
    "For details, see [Gradient Accumulation](https://www.mindspore.cn/tutorials/experts/zh-CN/r2.0.0-alpha/optimize/gradient_accumulation.html).\n"
   ]
  },
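  {
   "cell_type": "markdown",
   "id": "grad-accumulation-sketch-md",
   "metadata": {},
   "source": [
    "As a minimal sketch, not part of the original tutorial or the linked one, the cell below accumulates gradients over several micro-batches and then applies a single optimizer update; the toy `nn.Dense` network, `MSELoss`, `SGD` optimizer, random data and the `accumulate_steps` value are all made up for illustration.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "grad-accumulation-sketch",
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "import mindspore as ms\n",
    "from mindspore import nn, ops, Tensor\n",
    "\n",
    "# Toy network, loss and optimizer (hypothetical, for illustration only)\n",
    "net = nn.Dense(4, 1)\n",
    "loss_fn = nn.MSELoss()\n",
    "optimizer = nn.SGD(net.trainable_params(), learning_rate=0.01)\n",
    "\n",
    "def forward_fn(x, y):\n",
    "    return loss_fn(net(x), y)\n",
    "\n",
    "# value_and_grad returns the loss and the gradients of the trainable parameters\n",
    "grad_fn = ms.value_and_grad(forward_fn, grad_position=None, weights=net.trainable_params())\n",
    "\n",
    "accumulate_steps = 4  # number of micro-batches accumulated per optimizer update\n",
    "accumulated = [ops.zeros_like(p) for p in net.trainable_params()]\n",
    "\n",
    "for _ in range(accumulate_steps):\n",
    "    x = Tensor(np.random.randn(8, 4).astype(np.float32))\n",
    "    y = Tensor(np.random.randn(8, 1).astype(np.float32))\n",
    "    loss, grads = grad_fn(x, y)\n",
    "    # add the gradients of this micro-batch to the accumulators\n",
    "    accumulated = [acc + g for acc, g in zip(accumulated, grads)]\n",
    "\n",
    "# average the accumulated gradients and apply one parameter update\n",
    "averaged = tuple(acc / accumulate_steps for acc in accumulated)\n",
    "optimizer(averaged)\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "MindSpore",
   "language": "python",
   "name": "mindspore"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.5"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}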