[{"data":1,"prerenderedAt":182},["ShallowReactive",2],{"content-query-X1CCE4RPY2":3},{"_path":4,"_dir":5,"_draft":6,"_partial":6,"_locale":7,"title":8,"description":9,"date":10,"cover":11,"type":12,"category":13,"body":14,"_type":176,"_id":177,"_source":178,"_file":179,"_stem":180,"_extension":181},"/technology-blogs/zh/1889","zh",false,"","【MindSpore易点通】静态LossScale和动态LossScale的区别","一般情况下LossScale功能不需要和优化器配合使用，但如果drop_overflow_update为False，那么优化器需设置loss_scale的值，且loss_scale的值要与静态LossScale算子相同。","2022-09-29","https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2022/10/25/5e7f6e6f0afd4753b4b04512bc6b764f.png","technology-blogs","调试调优",{"type":15,"children":16,"toc":170},"root",[17,25,31,36,41,48,53,58,62,67,72,82,87,95,100,108,113,121,127,132,137,142,147,152,157,162],{"type":18,"tag":19,"props":20,"children":22},"element","h1",{"id":21},"mindspore易点通静态lossscale和动态lossscale的区别",[23],{"type":24,"value":8},"text",{"type":18,"tag":26,"props":27,"children":28},"p",{},[29],{"type":24,"value":30},"背景信息",{"type":18,"tag":26,"props":32,"children":33},{},[34],{"type":24,"value":35},"在混合精度中，使用float16类型来替代float32类型存储数据，从而达到减少内存和提高计算速度的效果。但是由于float16类型要比float32类型表示的范围小很多，所以当某些参数（比如说梯度）在训练过程中变得很小时，就会发生数据下溢的情况，进而影响网络精度。而loss scale正是为了解决float16类型数据下溢问题的，LossScale的主要思想是在计算loss时，将loss扩大一定的倍数，由于链式法则的存在，梯度也会相应扩大，然后在优化器更新权重时再缩小相应的倍数，从而避免了数据下溢的情况又不影响计算结果；而LossScale又可分为静态LossScale与动态LossScale。",{"type":18,"tag":26,"props":37,"children":38},{},[39],{"type":24,"value":40},"MindSpore中提供了两种Loss Scale的方式，分别是FixedLossScaleManager和DynamicLossScaleManager，需要和Model配合使用。在使用Model构建模型时，可配置混合精度策略amp_level和Loss Scale方式loss_scale_manager。",{"type":18,"tag":42,"props":43,"children":45},"h3",{"id":44},"_1静态lossscale",[46],{"type":24,"value":47},"1、静态LossScale",{"type":18,"tag":26,"props":49,"children":50},{},[51],{"type":24,"value":52},"LossScale在训练过程中使用固定scale的值，且在训练过程中不会改变scale的值，scale的值由入参loss_scale控制，可以由用户指定，不指定则取默认值。",{"type":18,"tag":26,"props":54,"children":55},{},[56],{"type":24,"value":57},"静态LossScale算子的另一个参数是drop_overflow_update，用来控制发生溢出时是否更新参数。",{"type":18,"tag":26,"props":59,"children":60},{},[61],{"type":24,"value":9},{"type":18,"tag":26,"props":63,"children":64},{},[65],{"type":24,"value":66},"FixedLossScaleManager具体用法如下：",{"type":18,"tag":26,"props":68,"children":69},{},[70],{"type":24,"value":71},"1.import必要的库，并声明使用图模式下执行。",{"type":18,"tag":73,"props":74,"children":76},"pre",{"code":75},"import numpy as npimport mindspore as msimport mindspore.nn as nnfrom mindspore.nn import Accuracyfrom mindspore.common.initializer import Normalfrom mindspore import dataset as ds\n\n#ms.set_seed(0)ms.set_context(mode=ms.GRAPH_MODE, device_target=\"GPU\")\n",[77],{"type":18,"tag":78,"props":79,"children":80},"code",{"__ignoreMap":7},[81],{"type":24,"value":75},{"type":18,"tag":26,"props":83,"children":84},{},[85],{"type":24,"value":86},"2.定义LeNet5网络模型，任何网络模型都可以使用Loss Scale机制。",{"type":18,"tag":73,"props":88,"children":90},{"code":89},"class LeNet5(nn.Cell):\n\n    \"\"\"        Lenet network\n\n        Args:            num_class (int): Number of classes. Default: 10.            num_channel (int): Number of channels. Default: 1.\n\n        Returns:            Tensor, output tensor    \"\"\"\n\n    def __init__(self, num_class=10, num_channel=1):\n\n        super(LeNet5, self).__init__()\n\n        self.conv1 = nn.Conv2d(num_channel, 6, 5, pad_mode='valid')\n\n        self.conv2 = nn.Conv2d(6, 16, 5, pad_mode='valid')\n\n        self.fc1 = nn.Dense(16 * 5 * 5, 120, weight_init=Normal(0.02))\n\n        self.fc2 = nn.Dense(120, 84, weight_init=Normal(0.02))\n\n        self.fc3 = nn.Dense(84, num_class, weight_init=Normal(0.02))\n\n        self.relu = nn.ReLU()\n\n        self.max_pool2d = nn.MaxPool2d(kernel_size=2, stride=2)\n\n        self.flatten = nn.Flatten()\n\n\n\n    def construct(self, x):\n\n        x = self.max_pool2d(self.relu(self.conv1(x)))\n\n        x = self.max_pool2d(self.relu(self.conv2(x)))\n\n        x = self.flatten(x)\n\n        x = self.relu(self.fc1(x))\n\n        x = self.relu(self.fc2(x))\n\n        x = self.fc3(x)\n\n        return x\n",[91],{"type":18,"tag":78,"props":92,"children":93},{"__ignoreMap":7},[94],{"type":24,"value":89},{"type":18,"tag":26,"props":96,"children":97},{},[98],{"type":24,"value":99},"3.定义数据集和训练流程中常用的接口。",{"type":18,"tag":73,"props":101,"children":103},{"code":102},"# create datasetdef get_data(num, img_size=(1, 32, 32), num_classes=10, is_onehot=True):\n\n    for _ in range(num):\n\n        img = np.random.randn(*img_size)\n\n        target = np.random.randint(0, num_classes)\n\n        target_ret = np.array([target]).astype(np.float32)\n\n        if is_onehot:\n\n            target_onehot = np.zeros(shape=(num_classes,))\n\n            target_onehot[target] = 1\n\n            target_ret = target_onehot.astype(np.float32)\n\n        yield img.astype(np.float32), target_ret\n\ndef create_dataset(num_data=1024, batch_size=32, repeat_size=1):\n\n    input_data = ds.GeneratorDataset(list(get_data(num_data)), column_names=['data', 'label'])\n\n    input_data = input_data.batch(batch_size, drop_remainder=True)\n\n    input_data = input_data.repeat(repeat_size)\n\n    return input_data\n\nds_train = create_dataset()\n\n# Initialize networknetwork = LeNet5(10)\n\n# Define Loss and Optimizernet_loss = nn.SoftmaxCrossEntropyWithLogits(reduction=\"mean\")\n",[104],{"type":18,"tag":78,"props":105,"children":106},{"__ignoreMap":7},[107],{"type":24,"value":102},{"type":18,"tag":26,"props":109,"children":110},{},[111],{"type":24,"value":112},"4.真正使用Loss Scale的API接口，作用于优化器和模型中。",{"type":18,"tag":73,"props":114,"children":116},{"code":115},"# Define Loss Scale, optimizer and model\n\n#1) Drop the parameter update if there is an overflow\n\nloss_scale_manager = ms.FixedLossScaleManager()\n\nnet_opt = nn.Momentum(network.trainable_params(), learning_rate=0.01, momentum=0.9)\n\nmodel = ms.Model(network, net_loss, net_opt, metrics={\"Accuracy\": Accuracy()}, amp_level=\"O0\", loss_scale_manager=loss_scale_manager)\n\n\n\n#2) Execute parameter update even if overflow occurs\n\nloss_scale = 1024.0\n\nloss_scale_manager = ms.FixedLossScaleManager(loss_scale, False)\n\nnet_opt = nn.Momentum(network.trainable_params(), learning_rate=0.01, momentum=0.9, loss_scale=loss_scale)\n\nmodel = ms.Model(network, net_loss, net_opt, metrics={\"Accuracy\": Accuracy()}, amp_level=\"O0\", loss_scale_manager=loss_scale_manager)\n\n\n\n# Run training\n\nmodel.train(epoch=10, train_dataset=ds_train, callbacks=[ms.LossMonitor()])\n\n\n\nTotalTime = 0.639526, [19]\n\n[parse]: 0.00137553\n\n[symbol_resolve]: 0.0630952, [1]\n\n    [Cycle 1]: 0.0626392, [1]\n\n        [resolve]: 0.0626309\n\n[combine_like_graphs]: 0.00123225\n\n[inference_opt_prepare]: 0.000407694\n\n[elininate_unused_parameter]: 0.000182617\n\n[abstract_specialize]: 0.0623443\n\n[auto_monad]: 0.00156646\n\n[inline]: 1.93249e-06\n\n[py_pre_ad]: 9.71369e-07\n\n[pipeline_split]: 2.88989e-06\n\n[optimize]: 0.477379, [14]\n\n    [simplify_data_structures]: 0.00029841\n\n    [opt_a]: 0.473795, [3]\n\n        [Cycle 1]: 0.450451, [25]\n\n            [expand_dump_flag]: 1.45491e-05\n\n            [switch_simplify]: 0.000713205\n\n            [a_1]: 0.00994622\n\n            [recompute_prepare]: 8.40612e-05\n\n            [updatestate_depend_eliminate]: 0.000666355\n\n            [updatestate_assign_eliminate]: 6.59283e-05\n\n            [updatestate_loads_eliminate]: 0.000232871\n\n            [parameter_eliminate]: 2.64868e-06\n\n...\n\n 0.16% :     0.000156s :      1: opt.transforms.opt_after_cconv\n\n 0.04% :     0.000044s :      1: opt.transforms.opt_b\n\n 0.24% :     0.000240s :      1: opt.transforms.opt_trans_graph\n\n 0.02% :     0.000018s :      1: opt.transforms.stop_gradient_special_op\n\nepoch: 1 step: 32, loss is 2.301311731338501\n\nepoch: 2 step: 32, loss is 2.2960867881774902\n\nepoch: 3 step: 32, loss is 2.292977809906006\n\nepoch: 4 step: 32, loss is 2.2935791015625\n\nepoch: 5 step: 32, loss is 2.3079633712768555\n\nepoch: 6 step: 32, loss is 2.3191769123077393\n\nepoch: 7 step: 32, loss is 2.3069581985473633\n\nepoch: 8 step: 32, loss is 2.2944517135620117\n\nepoch: 9 step: 32, loss is 2.3150291442871094\n\nepoch: 10 step: 32, loss is 2.3247132301330566\n",[117],{"type":18,"tag":78,"props":118,"children":119},{"__ignoreMap":7},[120],{"type":24,"value":115},{"type":18,"tag":42,"props":122,"children":124},{"id":123},"_2动态lossscale",[125],{"type":24,"value":126},"2、动态LossScale",{"type":18,"tag":26,"props":128,"children":129},{},[130],{"type":24,"value":131},"LossScale在训练过程中可以动态改变scale值的大小，在没有发生溢出（此处为上溢出）的情况下，要尽可能保持较大的scale的值。如果发生了溢出（上溢出），将缩小scale的值。",{"type":18,"tag":26,"props":133,"children":134},{},[135],{"type":24,"value":136},"名词解释：",{"type":18,"tag":26,"props":138,"children":139},{},[140],{"type":24,"value":141},"**上溢出：**数值超出最大表示范围。如表示区间为0~65504，那么当进行50000 + 50000的操作时，则会发生上溢出，输出结果可能是65500。",{"type":18,"tag":26,"props":143,"children":144},{},[145],{"type":24,"value":146},"**下溢出：**数值小于最小表示精度。如最小表示精度为6e-8，那么当进行8e-8 / 10的操作时，则会发生下溢出，输出结果可能是0。",{"type":18,"tag":26,"props":148,"children":149},{},[150],{"type":24,"value":151},"DynamicLossScaleManager会首先将scale设置为一个初始值，该值由入参init_loss_scale控制。",{"type":18,"tag":26,"props":153,"children":154},{},[155],{"type":24,"value":156},"在训练过程中，如果不发生溢出，在更新scale_window次参数后，会尝试扩大scale的值，如果发生了溢出，则跳过参数更新，并缩小scale的值，入参scale_factor是控制扩大或缩小的步数，scale_window控制没有发生溢出时，最大的连续更新步数。",{"type":18,"tag":26,"props":158,"children":159},{},[160],{"type":24,"value":161},"具体用法如下，仅需将FixedLossScaleManager样例中定义LossScale，优化器和模型部分的代码改成如下代码：",{"type":18,"tag":73,"props":163,"children":165},{"code":164},"# Define Loss Scale, optimizer and model\n\nscale_factor = 4\n\nscale_window = 3000\n\nloss_scale_manager = ms.DynamicLossScaleManager(scale_factor, scale_window)\n\nnet_opt = nn.Momentum(network.trainable_params(), learning_rate=0.01, momentum=0.9)\n\nmodel = ms.Model(network, net_loss, net_opt, metrics={\"Accuracy\": Accuracy()}, amp_level=\"O0\", loss_scale_manager=loss_scale_manager)\n\n",[166],{"type":18,"tag":78,"props":167,"children":168},{"__ignoreMap":7},[169],{"type":24,"value":164},{"title":7,"searchDepth":171,"depth":171,"links":172},4,[173,175],{"id":44,"depth":174,"text":47},3,{"id":123,"depth":174,"text":126},"markdown","content:technology-blogs:zh:1889.md","content","technology-blogs/zh/1889.md","technology-blogs/zh/1889","md",1776506116765]