[{"data":1,"prerenderedAt":110},["ShallowReactive",2],{"content-query-BtupPYobow":3},{"_path":4,"_dir":5,"_draft":6,"_partial":6,"_locale":7,"title":8,"description":10,"date":11,"cover":12,"type":13,"category":14,"body":15,"_type":104,"_id":105,"_source":106,"_file":107,"_stem":108,"_extension":109},"/technology-blogs/en/1821","en",false,"",[9],"MindSpore Made Easy","This blog describes how to convert the PyTorch source code into MindSpore low-level API code and implement single-server single-device training on Ascend processors.","2022-08-12","https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2024/11/28/992a262b47c1446385d113662f7c829d.png","technology-blogs","Developer Sharing",{"type":16,"children":17,"toc":101},"root",[18,32,43,48,58,63,71,76,84,89,96],{"type":19,"tag":20,"props":21,"children":23},"element","h1",{"id":22},"mindspore-made-easy-conversion-from-pytorch-source-code-into-mindspore-low-level-api-code-and-single-server-single-device-training-on-ascend-processors",[24,30],{"type":19,"tag":25,"props":26,"children":27},"span",{},[28],{"type":29,"value":9},"text",{"type":29,"value":31}," Conversion from PyTorch Source Code into MindSpore Low-Level API Code and Single-Server Single-Device Training on Ascend Processors",{"type":19,"tag":33,"props":34,"children":35},"p",{},[36,38],{"type":29,"value":37},"August 12, 2022 1. Overview This blog describes how to convert the PyTorch source code into MindSpore low-level API code and implement single-server single-device training on Ascend processors. The following figure shows the differences between the training processes of MindSpore high-level APIs, low-level APIs, and PyTorch. 
",{"type":19,"tag":39,"props":40,"children":42},"img",{"alt":7,"src":41},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2022/09/23/1d1764cb43d3481994ede5a8efa14653.png",[],{"type":19,"tag":33,"props":44,"children":45},{},[46],{"type":29,"value":47},"Similar to MindSpore high-level APIs, low-level API training also requires run configuration, data reading and preprocessing, network definition, loss function definition and optimizers. 2. Model Construction (Low-Level APIs) During model construction, the network prototype and loss function are encapsulated first. Then the combined model is encapsulated with an optimizer to form a network that can be used for training. Training and validation require the accuracy on the training set. Therefore, the return value must contain the output value of the network.",{"type":19,"tag":49,"props":50,"children":52},"pre",{"code":51},"import mindsporefrom mindspore import Modelimport mindspore.nn as nnfrom mindspore.ops import functional as Ffrom mindspore.ops import operations as P\nclass BuildTrainNetwork(nn.Cell):\n'''Build train network.'''\ndef __init__(self, my_network, my_criterion, train_batch_size, class_num):\nsuper(BuildTrainNetwork, self).__init__()\nself.network = my_network\nself.criterion = my_criterion\nself.print = P.Print()\n# Initialize self.output\nself.output = mindspore.Parameter(Tensor(np.ones((train_batch_size,\nclass_num)), mindspore.float32), requires_grad=False)\n\ndef construct(self, input_data, label):\noutput = self.network(input_data)\n# Get the network output and assign it to self.output\nself.output = output\nloss0 = self.criterion(output, label)\nreturn loss0\nclass TrainOneStepCellV2(TrainOneStepCell):\n'''Build train network.'''\ndef __init__(self, network, optimizer, sens=1.0):\nsuper(TrainOneStepCellV2, self).__init__(network, optimizer, sens=1.0)\n\ndef construct(self, *inputs):\nweights = self.weights\nloss = self.network(*inputs)\n# Obtain self.network from 
BuildTrainNetwork\noutput = self.network.output\nsens = P.Fill()(P.DType()(loss), P.Shape()(loss), self.sens)\n# Get the gradient of the network parameters\ngrads = self.grad(self.network, weights)(*inputs, sens)\ngrads = self.grad_reducer(grads)\n# Optimize model parameters\nloss = F.depend(loss, self.optimizer(grads))\nreturn loss, output\n# Construct model\nmodel_constructed = BuildTrainNetwork(net, loss_function, TRAIN_BATCH_SIZE, CLASS_NUM)\nmodel_constructed = TrainOneStepCellV2(model_constructed, opt)\n",[53],{"type":19,"tag":54,"props":55,"children":56},"code",{"__ignoreMap":7},[57],{"type":29,"value":51},{"type":19,"tag":33,"props":59,"children":60},{},[61],{"type":29,"value":62},"3 Training and Validation (Low-Level APIs) Similar to PyTorch, network training and validation are performed with low-level APIs.",{"type":19,"tag":49,"props":64,"children":66},{"code":65},"class CorrectLabelNum(nn.Cell):\n\ndef __init__(self):\n\nsuper(CorrectLabelNum, self).__init__()\n\nself.print = P.Print()\n\nself.argmax = mindspore.ops.Argmax(axis=1)\n\nself.sum = mindspore.ops.ReduceSum()\n\n\n\ndef construct(self, output, target):\n\noutput = self.argmax(output)\n\ncorrect = self.sum((output == target).astype(mindspore.dtype.float32))\n\nreturn correct\n\ndef train_net(model, network, criterion,\n\nepoch_max, train_path, val_path,\n\ntrain_batch_size, val_batch_size,\n\nrepeat_size):\n\n\n\n\"\"\"define the training method\"\"\"\n\n# Create dataset\n\nds_train, steps_per_epoch_train = create_dataset(train_path,\n\ndo_train=True, batch_size=train_batch_size, repeat_num=repeat_size)\n\nds_val, steps_per_epoch_val = create_dataset(val_path, do_train=False,\n\nbatch_size=val_batch_size, repeat_num=repeat_size)\n\n\n\n# CheckPoint CallBack definition\n\nconfig_ck = CheckpointConfig(save_checkpoint_steps=steps_per_epoch_train,\n\nkeep_checkpoint_max=epoch_max)\n\nckpoint_cb = ModelCheckpoint(prefix=\"train_resnet_cifar10\",\n\ndirectory=\"./\", config=config_ck)\n\n\n\n# 
    # Create a dict to save the internal callback object's parameters\n    cb_params = _InternalCallbackParam()\n    cb_params.train_network = model\n    cb_params.epoch_num = epoch_max\n    cb_params.batch_num = steps_per_epoch_train\n    cb_params.cur_epoch_num = 0\n    cb_params.cur_step_num = 0\n    run_context = RunContext(cb_params)\n    ckpoint_cb.begin(run_context)\n\n    print(\"============== Starting Training ==============\")\n    correct_num = CorrectLabelNum()\n    correct_num.set_train(False)\n\n    for epoch in range(epoch_max):\n        print(\"\\nEpoch:\", epoch + 1, \"/\", epoch_max)\n        train_loss = 0\n        train_correct = 0\n        train_total = 0\n        for _, (data, gt_classes) in enumerate(ds_train):\n            model.set_train()\n            loss, output = model(data, gt_classes)\n            train_loss += loss\n            correct = correct_num(output, gt_classes)\n            correct = correct.asnumpy()\n            train_correct += correct.sum()\n            # Update the current step number\n            cb_params.cur_step_num += 1\n            # Check whether to save a checkpoint\n            ckpoint_cb.step_end(run_context)\n\n        cb_params.cur_epoch_num += 1\n        my_train_loss = train_loss / steps_per_epoch_train\n        my_train_accuracy = 100 * train_correct / (train_batch_size * steps_per_epoch_train)\n        print('Train Loss:', my_train_loss)\n        print('Train Accuracy:', my_train_accuracy, '%')\n\n        print('evaluating {}/{} ...'.format(epoch + 1, epoch_max))\n        val_loss = 0\n        val_correct = 0\n        for _, (data, gt_classes) in enumerate(ds_val):\n            network.set_train(False)\n            output = network(data)\n            loss = criterion(output, gt_classes)\n            val_loss += loss\n            correct = correct_num(output, gt_classes)\n            correct = correct.asnumpy()\n            val_correct += correct.sum()\n\n        my_val_loss = val_loss / steps_per_epoch_val\n        my_val_accuracy = 100 * val_correct / (val_batch_size * steps_per_epoch_val)\n        print('Validation Loss:', my_val_loss)\n        print('Validation Accuracy:', my_val_accuracy, '%')\n\n    print(\"--------- training finished ---------\")\n",[67],{"type":19,"tag":54,"props":68,"children":69},{"__ignoreMap":7},[70],{"type":29,"value":65},
{"type":19,"tag":33,"props":72,"children":73},{},[74],{"type":29,"value":75},"4. Script Running. Run the following command:",{"type":19,"tag":49,"props":77,"children":79},{"code":78},"python MindSpore_1P_low_API.py --data_path=xxx --epoch_num=xxx\n",[80],{"type":19,"tag":54,"props":81,"children":82},{"__ignoreMap":7},[83],{"type":29,"value":78},{"type":19,"tag":33,"props":85,"children":86},{},[87],{"type":29,"value":88},"Run the script in a terminal in the development environment; the network output is displayed.",
{"type":19,"tag":33,"props":90,"children":91},{},[92],{"type":19,"tag":39,"props":93,"children":95},{"alt":7,"src":94},"https://obs-mindspore-file.obs.cn-north-4.myhuaweicloud.com/file/2022/09/23/f04937a7fb29456687e3eb83342e3b8e.png",[],{"type":19,"tag":33,"props":97,"children":98},{},[99],{"type":29,"value":100},"Note: High-level APIs support model training in data offloading mode, which low-level APIs do not. Therefore, model training with high-level APIs is faster than with low-level APIs. Performance comparison: low-level APIs, 2000 imgs/sec; high-level APIs, 2200 imgs/sec.",{"title":7,"searchDepth":102,"depth":102,"links":103},4,[],"markdown","content:technology-blogs:en:1821.md","content","technology-blogs/en/1821.md","technology-blogs/en/1821","md",1776506105116]