{ "cells": [ { "cell_type": "markdown", "id": "487f7a6d-506a-4824-94b1-a96b4866a447", "metadata": {}, "source": [ "[![下载Notebook](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_notebook.svg)](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/master/tutorials/zh_cn/advanced/modules/mindspore_optimizer.ipynb) [![下载样例代码](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_download_code.svg)](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/master/tutorials/zh_cn/advanced/modules/mindspore_optimizer.py) [![查看源文件](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/resource/_static/logo_source.svg)](https://gitee.com/mindspore/docs/blob/master/tutorials/source_zh_cn/advanced/modules/optimizer.ipynb)" ] }, { "cell_type": "markdown", "source": [ "# 优化器\n", "\n", "模型训练过程中,使用优化器更新网络参数,合适的优化器可以有效减少训练时间,提高模型性能。\n", "\n", "最基本的优化器是随机梯度下降算法(SGD),很多优化器在SGD的基础上进行了改进,以实现目标函数能更快速更有效地收敛到全局最优点。MindSpore中的`nn`模块提供了常用的优化器,如`nn.SGD`、`nn.Adam`、`nn.Momentum`等。本章主要介绍如何配置MindSpore提供的优化器以及如何自定义优化器。\n", "\n", "![learningrate.png](https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/website-images/master/tutorials/source_zh_cn/advanced/modules/images/learning_rate.png)\n", "\n", "> MindSpore提供的优化器详细内容参见[优化器API](https://www.mindspore.cn/docs/zh-CN/master/api_python/mindspore.nn.html#优化器)。\n", "\n", "## nn.optim\n", "\n", "### 配置优化器\n", "\n", "使用MindSpore提供的优化器时,首先需要指定待优化的网络参数`params`,然后设置优化器的其他主要参数,如学习率`learning_rate`和权重衰减`weight_decay`等。\n", "\n", "若要为不同网络参数单独设置选项,如对卷积参数和非卷积参数设置不同的学习率,则可使用参数分组的方法来设置优化器。\n", "\n", "#### 参数配置\n", "\n", "在构建优化器实例时,需要通过优化器参数`params`配置模型网络中要训练和更新的权重。`Parameter`中包含了一个`requires_grad`的布尔型的类属性,用于表示模型中的网络参数是否需要进行更新。\n", "\n", "网络中大部分参数的`requires_grad`默认值为True,少部分默认值为False,例如BatchNorm中的`moving_mean`和`moving_variance`。\n", "\n", "MindSpore中的`trainable_params`方法会屏蔽掉`Parameter`中`requires_grad`为False的属性,在为优化器配置 `params` 入参时,可使用`net.trainable_params()`方法来指定需要优化和更新的网络参数。" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": 48, "id": "d21d5a57", "metadata": { "ExecuteTime": { "end_time": "2023-08-03T08:36:36.236910200Z", "start_time": "2023-08-03T08:36:36.173591700Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[Parameter (name=param, shape=(1,), dtype=Float32, requires_grad=True), Parameter (name=conv.weight, shape=(6, 1, 5, 5), dtype=Float32, requires_grad=True)]\n" ] } ], "source": [ "import numpy as np\n", "import mindspore\n", "from mindspore import nn, ops\n", "from mindspore import Tensor, Parameter\n", "\n", "class Net(nn.Cell):\n", " def __init__(self):\n", " super().__init__()\n", " self.conv = nn.Conv2d(1, 6, 5, pad_mode=\"valid\")\n", " self.param = Parameter(Tensor(np.array([1.0], np.float32)), 'param')\n", "\n", " def construct(self, x):\n", " x = self.conv(x)\n", " x = x * self.param\n", " out = ops.matmul(x, x)\n", " return out\n", "\n", "net = Net()\n", "\n", "# 配置优化器需要更新的参数\n", "optim = nn.Adam(params=net.trainable_params())\n", "print(net.trainable_params())" ] }, { "cell_type": "markdown", "source": [ "用户可以手动修改网络权重中 `Parameter` 的 `requires_grad` 属性的默认值,来决定哪些参数需要更新。\n", "\n", "如下例所示,使用 `net.get_parameters()` 方法获取网络中所有参数,并手动修改巻积参数的 `requires_grad` 属性为False,训练过程中将只对非卷积参数进行更新。" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": 49, "id": "61282381", "metadata": { "ExecuteTime": { "end_time": 
"2023-08-03T08:36:56.238579600Z", "start_time": "2023-08-03T08:36:56.171731800Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[Parameter (name=param, shape=(1,), dtype=Float32, requires_grad=True)]\n" ] } ], "source": [ "conv_params = [param for param in net.get_parameters() if 'conv' in param.name]\n", "for conv_param in conv_params:\n", " conv_param.requires_grad = False\n", "print(net.trainable_params())\n", "optim = nn.Adam(params=net.trainable_params())" ] }, { "cell_type": "markdown", "source": [ "#### 学习率\n", "\n", "学习率作为机器学习及深度学习中常见的超参,对目标函数能否收敛到局部最小值及何时收敛到最小值有重要影响。学习率过大容易导致目标函数波动较大,难以收敛到最优值,太小则会导致收敛过程耗时过长。除了设置固定学习率,MindSpore还支持设置动态学习率,这些方法在深度学习网络中能明显提升收敛效率。\n", "\n", "**固定学习率**:\n", "\n", "使用固定学习率时,优化器传入的`learning_rate`为浮点类型或标量Tensor。\n", "\n", "以`nn.Momentum`为例,固定学习率为0.01,示例如下:" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": 50, "id": "b3906fa2", "metadata": { "ExecuteTime": { "end_time": "2023-08-03T08:37:16.472983Z", "start_time": "2023-08-03T08:37:16.461731100Z" } }, "outputs": [], "source": [ "# 设置学习率为0.01\n", "optim = nn.Momentum(params=net.trainable_params(), learning_rate=0.01, momentum=0.9)" ] }, { "cell_type": "markdown", "source": [ "**动态学习率**:\n", "\n", "`mindspore.nn`提供了动态学习率的模块,分为Dynamic LR函数和LearningRateSchedule类。其中Dynamic LR函数会预先生成长度为`total_step`的学习率列表,将列表传入优化器中使用,训练过程中,第i步使用第i个学习率的值作为当前step的学习率,其中`total_step`的设置值不能小于训练的总步数;LearningRateSchedule类将实例传递给优化器,优化器根据当前step计算得到当前的学习率。\n", "\n", "- Dynamic LR函数\n", "\n", " [Dynamic LR](https://www.mindspore.cn/docs/zh-CN/master/api_python/mindspore.nn.html#dynamic-lr%E5%87%BD%E6%95%B0)函数目前有基于余弦衰减函数计算学习率(`nn.cosine_decay_lr`)、基于指数衰减函数计算学习率(`nn.exponential_decay_lr`)、基于逆时衰减函数计算学习率(`nn.inverse_decay_lr`)、基于自然指数衰减函数计算学习率(`nn.natural_exp_decay_lr`)、获取分段常量学习率(`nn.piecewise_constant_lr`)、基于多项式衰减函数计算学习率(`nn.polynomial_decay_lr`)和预热学习率(`nn.warmup_lr`)。\n", "\n", "下例以分段常量学习率`nn.piecewise_constant_lr`为例:" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": 51, "id": "59c06ad2", "metadata": { "ExecuteTime": { "end_time": "2023-08-03T08:37:33.109488200Z", "start_time": "2023-08-03T08:37:33.029909300Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[0.1, 0.05, 0.05, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01, 0.01]\n" ] } ], "source": [ "milestone = [1, 3, 10]\n", "learning_rates = [0.1, 0.05, 0.01]\n", "lr = nn.piecewise_constant_lr(milestone, learning_rates)\n", "\n", "# 打印学习率\n", "print(lr)\n", "\n", "net = Net()\n", "# 优化器设置待优化的网络参数和分段常量学习率\n", "optim = nn.SGD(net.trainable_params(), learning_rate=lr)" ] }, { "cell_type": "markdown", "source": [ "- LearningRateSchedule类\n", "\n", " [LearningRateSchedule类](https://www.mindspore.cn/docs/zh-CN/master/api_python/mindspore.nn.html#learningrateschedule%E7%B1%BB)目前有基于余弦衰减函数计算学习率(`nn.CosineDecayLR`)、基于指数衰减函数计算学习率(`nn.ExponentialDecayLR`)、基于逆时衰减函数计算学习率(`nn.InverseDecayLR`)、基于自然指数衰减函数计算学习率(`nn.NaturalExpDecayLR`)、基于多项式衰减函数计算学习率(`nn.PolynomialDecayLR`)和预热学习率(`nn.WarmUpLR`)。\n", "\n", "下例基于指数衰减函数计算学习率`nn.ExponentialDecayLR`为例:" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": 52, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "step1, lr:0.1\n", "step2, lr:0.097400375\n", "step3, lr:0.094868325\n", "step4, lr:0.09240211\n" ] } ], "source": [ "learning_rate = 0.1 # 学习率的初始值\n", "decay_rate = 0.9 # 衰减率\n", "decay_steps = 4 # 衰减的step数\n", "step_per_epoch = 2\n", "\n", "exponential_decay_lr = nn.ExponentialDecayLR(learning_rate, 
, { "cell_type": "markdown", "source": [ "- LearningRateSchedule classes\n", "\n", "    The [LearningRateSchedule classes](https://www.mindspore.cn/docs/zh-CN/master/api_python/mindspore.nn.html#learningrateschedule%E7%B1%BB) currently include the cosine decay learning rate (`nn.CosineDecayLR`), the exponential decay learning rate (`nn.ExponentialDecayLR`), the inverse time decay learning rate (`nn.InverseDecayLR`), the natural exponential decay learning rate (`nn.NaturalExpDecayLR`), the polynomial decay learning rate (`nn.PolynomialDecayLR`) and the warm-up learning rate (`nn.WarmUpLR`).\n", "\n", "The following example uses `nn.ExponentialDecayLR`, which computes the learning rate with an exponential decay function:" ], "metadata": { "collapsed": false } },
{ "cell_type": "code", "execution_count": 52, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "step1, lr:0.1\n", "step2, lr:0.097400375\n", "step3, lr:0.094868325\n", "step4, lr:0.09240211\n" ] } ], "source": [ "learning_rate = 0.1  # initial learning rate\n", "decay_rate = 0.9  # decay rate\n", "decay_steps = 4  # number of decay steps\n", "step_per_epoch = 2\n", "\n", "exponential_decay_lr = nn.ExponentialDecayLR(learning_rate, decay_rate, decay_steps)\n", "\n", "for i in range(decay_steps):\n", "    step = Tensor(i, mindspore.int32)\n", "    result = exponential_decay_lr(step)\n", "    print(f\"step{i+1}, lr:{result}\")\n", "\n", "net = Net()\n", "\n", "# Configure the optimizer with the exponential-decay learning rate schedule\n", "optim = nn.Momentum(net.trainable_params(), learning_rate=exponential_decay_lr, momentum=0.9)" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2023-08-03T08:37:50.911740800Z", "start_time": "2023-08-03T08:37:50.852026100Z" } } },
{ "cell_type": "markdown", "source": [ "#### Weight Decay\n", "\n", "Weight decay, also known as L2 regularization, is a way to mitigate overfitting in deep neural networks.\n", "\n", "In general, `weight_decay` takes a value in $[0, 1)$; its default value is 0.0, which means no weight decay is applied." ], "metadata": { "collapsed": false } },
{ "cell_type": "code", "execution_count": 53, "id": "e0a8d738", "metadata": { "ExecuteTime": { "end_time": "2023-08-03T08:38:06.817467600Z", "start_time": "2023-08-03T08:38:06.812287800Z" } }, "outputs": [], "source": [ "net = Net()\n", "optimizer = nn.Momentum(net.trainable_params(), learning_rate=0.01,\n", "                        momentum=0.9, weight_decay=0.9)" ] },
{ "cell_type": "markdown", "source": [ "In addition, MindSpore supports dynamic weight decay. In this case `weight_decay` is a user-defined Cell, called `weight_decay_schedule`. During training, the optimizer calls the instance of this Cell and passes in `global_step` to compute the weight decay value of the current step. `global_step` is an internally maintained variable that is incremented by 1 after each training step. Note that the `construct` of the custom `weight_decay_schedule` accepts exactly one input. The following is an example in which the weight decay decays exponentially during training." ], "metadata": { "collapsed": false } },
{ "cell_type": "code", "execution_count": 54, "id": "639bef5b", "metadata": { "ExecuteTime": { "end_time": "2023-08-03T08:38:24.692025100Z", "start_time": "2023-08-03T08:38:24.634537200Z" } }, "outputs": [], "source": [ "from mindspore.nn import Cell\n", "from mindspore import ops, nn\n", "import mindspore as ms\n", "\n", "class ExponentialWeightDecay(Cell):\n", "\n", "    def __init__(self, weight_decay, decay_rate, decay_steps):\n", "        super(ExponentialWeightDecay, self).__init__()\n", "        self.weight_decay = weight_decay\n", "        self.decay_rate = decay_rate\n", "        self.decay_steps = decay_steps\n", "\n", "    def construct(self, global_step):\n", "        # construct accepts exactly one input; during training, global_step is passed in automatically\n", "        p = global_step / self.decay_steps\n", "        return self.weight_decay * ops.pow(self.decay_rate, p)\n", "\n", "net = Net()\n", "\n", "weight_decay = ExponentialWeightDecay(weight_decay=0.0001, decay_rate=0.1, decay_steps=10000)\n", "optimizer = nn.Momentum(net.trainable_params(), learning_rate=0.01,\n", "                        momentum=0.9, weight_decay=weight_decay)" ] },
{ "cell_type": "markdown", "source": [ "#### Hyperparameter Grouping\n", "\n", "The optimizer also supports setting options separately for different parameters. In this case, instead of passing the parameters directly, a list of dictionaries is passed in; each dictionary corresponds to a group of parameters, where the available keys are `params`, `lr`, `weight_decay` and `grad_centralization`, and the values are the corresponding settings.\n", "\n", "`params` is mandatory, while the other keys are optional; for keys that are not configured, the values set when the optimizer was defined are used. When grouping, the learning rate can be either a fixed or a dynamic learning rate, and `weight_decay` can be a fixed value.\n", "\n", "The following example sets different learning rates and weight decay values for the convolutional and non-convolutional parameters." ], "metadata": { "collapsed": false } }
, { "cell_type": "code", "execution_count": 55, "id": "616fd8a2", "metadata": { "ExecuteTime": { "end_time": "2023-08-03T08:38:42.296051700Z", "start_time": "2023-08-03T08:38:42.252895300Z" } }, "outputs": [], "source": [ "net = Net()\n", "\n", "# Convolutional parameters\n", "conv_params = list(filter(lambda x: 'conv' in x.name, net.trainable_params()))\n", "# Non-convolutional parameters\n", "no_conv_params = list(filter(lambda x: 'conv' not in x.name, net.trainable_params()))\n", "\n", "# Fixed learning rate\n", "fix_lr = 0.01\n", "\n", "# Learning rate based on the polynomial decay function\n", "polynomial_decay_lr = nn.PolynomialDecayLR(learning_rate=0.1,       # initial learning rate\n", "                                           end_learning_rate=0.01,  # final learning rate\n", "                                           decay_steps=4,           # number of decay steps\n", "                                           power=0.5)               # power of the polynomial\n", "\n", "# Convolutional parameters use the fixed learning rate 0.01 and a weight decay of 0.01\n", "# Non-convolutional parameters use the dynamic learning rate and a weight decay of 0.0\n", "group_params = [{'params': conv_params, 'weight_decay': 0.01, 'lr': fix_lr},\n", "                {'params': no_conv_params, 'lr': polynomial_decay_lr}]\n", "\n", "optim = nn.Momentum(group_params, learning_rate=0.1, momentum=0.9, weight_decay=0.0)" ] },
{ "cell_type": "markdown", "source": [ "> Currently, all MindSpore optimizers except a few (for example AdaFactor and FTRL) support grouping the learning rate; for details, see the [optimizer API](https://www.mindspore.cn/docs/zh-CN/master/api_python/mindspore.nn.html#优化器).\n", "\n", "### Customizing an Optimizer\n", "\n", "Besides the optimizers provided by MindSpore, users can also define their own.\n", "\n", "A custom optimizer inherits from the optimizer base class [nn.Optimizer](https://www.mindspore.cn/docs/zh-CN/master/api_python/nn/mindspore.nn.Optimizer.html#mindspore.nn.Optimizer) and overrides the `__init__` and `construct` methods to implement its own parameter update strategy.\n", "\n", "The following example implements a custom Momentum optimizer (SGD with momentum):\n", "\n", "$$ v_{t+1} = v_t \\times u + grad \\tag{1} $$\n", "\n", "$$ p_{t+1} = p_t - lr \\times v_{t+1} \\tag{2} $$\n", "\n", "where $grad$, $lr$, $p$, $v$ and $u$ denote the gradient, the learning rate, the weight parameter, the moment (velocity) and the momentum coefficient, respectively." ], "metadata": { "collapsed": false } },
{ "cell_type": "code", "execution_count": 56, "id": "c40a31dd", "metadata": { "ExecuteTime": { "end_time": "2023-08-03T08:38:58.316547Z", "start_time": "2023-08-03T08:38:58.250408500Z" } }, "outputs": [], "source": [ "class Momentum(nn.Optimizer):\n", "    \"\"\"Custom optimizer definition\"\"\"\n", "    def __init__(self, params, learning_rate, momentum=0.9):\n", "        super(Momentum, self).__init__(learning_rate, params)\n", "        self.momentum = Parameter(Tensor(momentum, ms.float32), name=\"momentum\")\n", "        self.moments = self.parameters.clone(prefix=\"moments\", init=\"zeros\")\n", "\n", "    def construct(self, gradients):\n", "        \"\"\"construct takes the gradients as input; they are passed in automatically during training\"\"\"\n", "        lr = self.get_lr()\n", "        params = self.parameters  # weight parameters to be updated\n", "\n", "        for i in range(len(params)):\n", "            # update the moments\n", "            ops.assign(self.moments[i], self.moments[i] * self.momentum + gradients[i])\n", "            update = params[i] - self.moments[i] * lr  # SGD with momentum\n", "            ops.assign(params[i], update)\n", "        return params\n", "\n", "net = Net()\n", "# Configure the parameters to optimize and a learning rate of 0.01\n", "opt = Momentum(net.trainable_params(), 0.01)" ] },
{ "cell_type": "markdown", "source": [ "`mindspore.ops` also wraps optimizer operators for building custom optimizers, such as `ops.ApplyCenteredRMSProp`, `ops.ApplyMomentum` and `ops.ApplyRMSProp`. The following example uses the `ApplyMomentum` operator to implement the custom Momentum optimizer; a short sketch of driving such a custom optimizer by hand follows it:" ], "metadata": { "collapsed": false } },
{ "cell_type": "code", "execution_count": 57, "id": "82602027", "metadata": { "ExecuteTime": { "end_time": "2023-08-03T08:39:14.689402300Z", "start_time": "2023-08-03T08:39:14.636999400Z" } }, "outputs": [], "source": [ "class Momentum(nn.Optimizer):\n", "    \"\"\"Custom optimizer definition\"\"\"\n", "    def __init__(self, params, learning_rate, momentum=0.9):\n", "        super(Momentum, self).__init__(learning_rate, params)\n", "        self.moments = self.parameters.clone(prefix=\"moments\", init=\"zeros\")\n", "        self.momentum = momentum\n", "        self.opt = ops.ApplyMomentum()\n", "\n", "    def construct(self, gradients):\n", "        # weight parameters to be updated\n", "        params = self.parameters\n", "        success = None\n", "        for param, mom, grad in zip(params, self.moments, gradients):\n", "            success = self.opt(param, mom, self.learning_rate, grad, self.momentum)\n", "        return success\n", "\n", "net = Net()\n", "# Configure the parameters to optimize and a learning rate of 0.01\n", "opt = Momentum(net.trainable_params(), 0.01)" ] }
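, { "cell_type": "markdown", "source": [ "The two cells above define custom optimizers but do not run them. The following cell is a rough sketch added to this tutorial of driving one update step by hand, in the same functional style used later on this page; the input shape `(1, 1, 8, 8)` and the `sum()` toy loss are arbitrary choices for illustration." ], "metadata": { "collapsed": false } },
{ "cell_type": "code", "execution_count": null, "outputs": [], "source": [ "# Sketch: one manual update step with the custom Momentum optimizer defined above.\n", "# The data shape and the toy loss are assumptions made for this example.\n", "import numpy as np\n", "import mindspore as ms\n", "\n", "data = ms.Tensor(np.random.randn(1, 1, 8, 8), ms.float32)\n", "\n", "def forward_fn(x):\n", "    # toy scalar loss: the sum of the network output\n", "    return net(x).sum()\n", "\n", "# Differentiate with respect to the trainable parameters managed by the custom optimizer\n", "grad_fn = ms.value_and_grad(forward_fn, None, net.trainable_params())\n", "\n", "loss, grads = grad_fn(data)\n", "opt(grads)  # apply one parameter update\n", "print(loss)" ], "metadata": { "collapsed": false } }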
, { "cell_type": "markdown", "source": [ "## experimental.optim\n", "\n", "In addition to the optimizers in the `mindspore.nn.optim` module above, MindSpore also provides the experimental optimizer module `mindspore.experimental.optim`, which is intended to extend optimizer functionality.\n", "\n", "> The `mindspore.experimental.optim` module is still under development. At present its optimizers are only applicable to functional programming scenarios and only work with the dynamic learning rate classes under [mindspore.experimental.optim.lr_scheduler](https://www.mindspore.cn/docs/zh-CN/master/api_python/mindspore.experimental.html#lrscheduler类).\n", "\n", "Usage differences:\n", "\n", "| Item | nn.optim | experimental.optim | Behavior |\n", "|-------|----------|------------|-------------------------|\n", "| Parameter configuration (no hyperparameter grouping) | configured via `params` | configured via `params` | Identical configuration and behavior in the common case; simply pass `net.trainable_params` |\n", "| Learning rate | configured via `learning_rate` | configured via `lr` | Configuration and behavior differ for dynamic learning rates; see [Learning Rate](#learning-rate-1) |\n", "| Weight decay | configured via `weight_decay` | configured via `weight_decay` | Configuration differs for dynamic weight decay; see [Weight Decay](#weight-decay-1) |\n", "| Hyperparameter grouping | pass a list of parameter-group dicts via `params` | pass a list of parameter-group dicts via `params` | Behavior differs in the grouping case, i.e. when `params` is a dict; see [Hyperparameter Grouping](#hyperparameter-grouping-1) |\n", "\n", "Beyond these differences, the optimizers under `mindspore.experimental.optim` also support [viewing the parameter groups](#viewing-the-optimizer-configuration) and [modifying optimizer parameters at runtime](#modifying-optimizer-parameters-at-runtime); see below.\n", "\n", "### Configuring an Optimizer\n", "\n", "#### Parameter Configuration\n", "\n", "In the common case, parameters are configured in the same way as for `mindspore.nn.optim`: pass in `net.trainable_params`.\n", "\n", "#### Learning Rate\n", "\n", "**Fixed learning rate**:\n", "\n", "Configured in the same way as the fixed learning rate of `mindspore.nn.optim`.\n", "\n", "**Dynamic learning rate**:\n", "\n", "`mindspore.experimental.optim.lr_scheduler` provides dynamic learning rate modules to be used together with `mindspore.experimental.optim`, and the usage differs from `mindspore.nn.optim`:\n", "\n", "`mindspore.nn.optim`: pass the dynamic learning-rate list or instance to the optimizer argument `learning_rate`; see the [Dynamic LR functions](https://www.mindspore.cn/docs/zh-CN/master/api_python/mindspore.nn.html#dynamic-lr%E5%87%BD%E6%95%B0) and [LearningRateSchedule classes](https://www.mindspore.cn/docs/zh-CN/master/api_python/mindspore.nn.html#learningrateschedule%E7%B1%BB).\n", "\n", "`mindspore.experimental.optim`: pass the optimizer instance to the `optimizer` argument of the dynamic learning-rate class; see the [LRScheduler classes](https://www.mindspore.cn/docs/zh-CN/master/api_python/mindspore.experimental.html#lrscheduler类).\n", "\n", "Reading the learning rate from an `LRScheduler`:\n", "\n", "the `get_last_lr` method. Taking `StepLR` as an example, `scheduler.get_last_lr()` can be called directly during training to obtain the current learning rate, as in the next cell; a sketch of stepping the scheduler inside a training-style loop follows it." ], "metadata": { "collapsed": false } },
{ "cell_type": "code", "execution_count": 58, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[Tensor(shape=[], dtype=Float32, value= 0.1)]\n" ] } ], "source": [ "from mindspore.experimental import optim\n", "net = Net()\n", "optimizer = optim.Adam(net.trainable_params(), lr=0.1)\n", "scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)\n", "print(scheduler.get_last_lr())" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2023-08-03T08:39:34.246598500Z", "start_time": "2023-08-03T08:39:34.233932700Z" } } }
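, { "cell_type": "markdown", "source": [ "The learning rate managed by an `LRScheduler` is advanced by calling its `step` method. The next cell is a training-style sketch added to this tutorial: it only steps the scheduler and prints the resulting learning rate, with `step_size=2`, `gamma=0.5` and a 6-iteration loop chosen arbitrarily; in a real training loop, `optimizer(grads)` would be called before `scheduler.step()`." ], "metadata": { "collapsed": false } },
{ "cell_type": "code", "execution_count": null, "outputs": [], "source": [ "# Sketch: advance the scheduler a few times and watch the learning rate decay.\n", "# In real training, optimizer(grads) would be called with actual gradients before each step().\n", "optimizer = optim.Adam(net.trainable_params(), lr=0.1)\n", "scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=2, gamma=0.5)\n", "\n", "for step in range(6):\n", "    scheduler.step()\n", "    print(f\"step {step + 1}, lr: {scheduler.get_last_lr()}\")" ], "metadata": { "collapsed": false } }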
"2023-08-03T08:40:06.034906700Z" } } }, { "cell_type": "markdown", "source": [ "#### 查看优化器配置\n", "\n", "**使用 `param_group` 属性查看参数组**:\n", "\n", "代码样例:" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": 61, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[{'params': [Parameter (name=conv.weight, shape=(6, 1, 5, 5), dtype=Float32, requires_grad=True)], 'weight_decay': 0.01, 'lr': Parameter (name=learning_rate_group_0, shape=(), dtype=Float32, requires_grad=True), 'amsgrad': True, 'betas': (0.9, 0.999), 'eps': 1e-08, 'maximize': False}, {'params': [Parameter (name=param, shape=(1,), dtype=Float32, requires_grad=True)], 'lr': Parameter (name=learning_rate_group_1, shape=(), dtype=Float32, requires_grad=True), 'eps': 1e-06, 'betas': (0.8, 0.88), 'weight_decay': 0.0, 'amsgrad': False, 'maximize': False}]\n" ] } ], "source": [ "print(optimizer.param_groups)" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2023-08-03T08:40:24.357926400Z", "start_time": "2023-08-03T08:40:24.302510500Z" } } }, { "cell_type": "markdown", "source": [ "从上述输出中可以发现,优化器参数组中学习率为 `Parameter`,mindspore中的 `Parameter` 不显示参数值,可以通过使用 `.value()` 的方式查看参数值。也可以使用上述动态学习率模块 `mindspore.experimental.optim.lr_scheduler.LRScheduler` 的 `get_last_lr` 方法获取。" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": 62, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "0.66\n" ] } ], "source": [ "print(optimizer.param_groups[1][\"lr\"].value())" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2023-08-03T08:40:42.010457100Z", "start_time": "2023-08-03T08:40:41.999532Z" } } }, { "cell_type": "markdown", "source": [ "**直接打印优化器实例查看参数组**:" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": 63, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Adam (\n", "Parameter Group 0\n", " amsgrad: True\n", " betas: (0.9, 0.999)\n", " eps: 1e-08\n", " lr: 0.9\n", " maximize: False\n", " weight_decay: 0.01\n", "\n", "Parameter Group 1\n", " amsgrad: False\n", " betas: (0.8, 0.88)\n", " eps: 1e-06\n", " lr: 0.66\n", " maximize: False\n", " weight_decay: 0.0\n", ")\n" ] } ], "source": [ "print(optimizer)" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2023-08-03T08:41:06.592709700Z", "start_time": "2023-08-03T08:41:06.498062400Z" } } }, { "cell_type": "markdown", "source": [ "### 运行中修改优化器参数\n", "\n", "#### 运行中修改学习率\n", "\n", "`mindspore.experimental.optim.Optimizer` 中学习率为 `Parameter`,除通过上述动态学习率模块 `mindspore.experimental.optim.lr_scheduler` 动态修改学习率,也支持使用 `assign` 赋值的方式修改学习率。\n", "\n", "例如下述样例,在训练step中,设置如果损失值相比上一个step变化小于0.1,将优化器第1个参数组的学习率调整至0.01:" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": 64, "outputs": [], "source": [ "net = Net()\n", "loss_fn = nn.MAELoss()\n", "optimizer = optim.Adam(net.trainable_params(), lr=0.1)\n", "scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)\n", "last_step_loss = 0.1\n", "\n", "def forward_fn(data, label):\n", " logits = net(data)\n", " loss = loss_fn(logits, label)\n", " return loss\n", "\n", "grad_fn = mindspore.value_and_grad(forward_fn, None, optimizer.parameters, has_aux=True)\n", "\n", "def train_step(data, label):\n", " (loss, _), grads = grad_fn(data, label)\n", " optimizer(grads)\n", " if ops.abs(loss - last_step_loss) < 0.1:\n", " ops.assign(optimizer.param_groups[1][\"lr\"], Tensor(0.01))\n", " return loss" ], "metadata": { "collapsed": false, 
"ExecuteTime": { "end_time": "2023-08-03T08:41:20.268627700Z", "start_time": "2023-08-03T08:41:20.237688800Z" } } }, { "cell_type": "markdown", "source": [ "#### 运行中修改除lr以外的优化器参数\n", "\n", "> 目前仅PyNative模式下支持运行中修改其他优化器参数,Graph模式下的修改将不生效或报错。\n", "\n", "下述样例,在训练step中,设置如果损失值相比上一个step变化小于0.1,将优化器第1个参数组的 `weight_decay` 调整至0.02:" ], "metadata": { "collapsed": false } }, { "cell_type": "code", "execution_count": 65, "outputs": [], "source": [ "net = Net()\n", "loss_fn = nn.MAELoss()\n", "optimizer = optim.Adam(net.trainable_params(), lr=0.1)\n", "scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=10, gamma=0.5)\n", "last_step_loss = 0.1\n", "\n", "def forward_fn(data, label):\n", " logits = net(data)\n", " loss = loss_fn(logits, label)\n", " return loss\n", "\n", "grad_fn = mindspore.value_and_grad(forward_fn, None, optimizer.parameters, has_aux=True)\n", "\n", "def train_step(data, label):\n", " (loss, _), grads = grad_fn(data, label)\n", " optimizer(grads)\n", " if ops.abs(loss - last_step_loss) < 0.1:\n", " optimizer.param_groups[1][\"weight_decay\"] = 0.02\n", " return loss" ], "metadata": { "collapsed": false, "ExecuteTime": { "end_time": "2023-08-03T08:41:32.669897800Z", "start_time": "2023-08-03T08:41:32.644117800Z" } } }, { "cell_type": "markdown", "source": [ "### 自定义优化器\n", "\n", "与上述[自定义优化器](#自定义优化器)方式相同,自定义优化器时也可以继承优化器基类[experimental.optim.Optimizer](https://www.mindspore.cn/docs/zh-CN/master/api_python/experimental/optim/mindspore.experimental.optim.Optimizer.html),并重写`__init__`方法和`construct`方法以自行设定参数更新策略。" ], "metadata": { "collapsed": false } } ], "metadata": { "kernelspec": { "display_name": "MindSpore", "language": "python", "name": "mindspore" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.5" } }, "nbformat": 4, "nbformat_minor": 5 }