{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# 运行管理\n", "\n", "[![](https://gitee.com/mindspore/docs/raw/r1.2/docs/programming_guide/source_zh_cn/_static/logo_source.png)](https://gitee.com/mindspore/docs/blob/r1.2/docs/programming_guide/source_zh_cn/context.ipynb) [![](https://gitee.com/mindspore/docs/raw/r1.2/docs/programming_guide/source_zh_cn/_static/logo_notebook.png)](https://obs.dualstack.cn-north-4.myhuaweicloud.com/mindspore-website/notebook/r1.2/programming_guide/mindspore_context.ipynb) [![](https://gitee.com/mindspore/docs/raw/r1.2/docs/programming_guide/source_zh_cn/_static/logo_modelarts.png)](https://console.huaweicloud.com/modelarts/?region=cn-north-4#/notebook/loading?share-url-b64=aHR0cHM6Ly9vYnMuZHVhbHN0YWNrLmNuLW5vcnRoLTQubXlodWF3ZWljbG91ZC5jb20vbWluZHNwb3JlLXdlYnNpdGUvbm90ZWJvb2svbW9kZWxhcnRzL3Byb2dyYW1taW5nX2d1aWRlL21pbmRzcG9yZV9jb250ZXh0LmlweW5i&image_id=65f636a0-56cf-49df-b941-7d2a07ba8c8c)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 概述\n", "\n", "初始化网络之前要配置context参数,用于控制程序执行的策略。比如选择执行模式、选择执行后端、配置分布式相关参数等。按照context参数设置实现的不同功能,可以将其分为执行模式管理、硬件管理、分布式管理和维测管理等。\n", "\n", "## 执行模式管理\n", "\n", "MindSpore支持PyNative和Graph这两种运行模式:\n", "\n", "- `PYNATIVE_MODE`:动态图模式,将神经网络中的各个算子逐一下发执行,方便用户编写和调试神经网络模型。\n", "\n", "- `GRAPH_MODE`:静态图模式或者图模式,将神经网络模型编译成一整张图,然后下发执行。该模式利用图优化等技术提高运行性能,同时有助于规模部署和跨平台运行。\n", "\n", "### 模式选择\n", "\n", "通过设置可以控制程序运行的模式,默认情况下,MindSpore处于PyNative模式。\n", "\n", "代码样例如下:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2021-02-10T02:10:02.859770Z", "start_time": "2021-02-10T02:10:02.856678Z" } }, "outputs": [], "source": [ "from mindspore import context\n", "context.set_context(mode=context.GRAPH_MODE)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 模式切换\n", "\n", "实现两种模式之间的切换。\n", "\n", "MindSpore处于PYNATIVE模式时,可以通过`context.set_context(mode=context.GRAPH_MODE)`切换为Graph模式;同样地,MindSpore处于Graph模式时,可以通过 `context.set_context(mode=context.PYNATIVE_MODE)`切换为PyNative模式。\n", "\n", "代码样例如下:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "Tensor(shape=[1, 4, 5, 5], dtype=Float32, value=\n", "[[[[ 1.64782144e-02, 5.31007685e-02, 5.31007685e-02, 5.31007685e-02, 5.11828624e-02],\n", " [ 3.00714076e-02, 6.57572001e-02, 6.57572001e-02, 6.57572001e-02, 4.35083285e-02],\n", " [ 3.00714076e-02, 6.57572001e-02, 6.57572001e-02, 6.57572001e-02, 4.35083285e-02]\n", " [ 3.00714076e-02, 6.57572001e-02, 6.57572001e-02, 6.57572001e-02, 4.35083285e-02],\n", " [ 1.84759758e-02, 4.71352898e-02, 4.71352898e-02, 4.71352898e-02, 3.72093469e-02]],\n", " [[-3.36203352e-02, -6.12429380e-02, -6.12429380e-02, -6.12429380e-02, -4.33492810e-02],\n", " [-2.67659649e-02, -8.04031491e-02, -8.04031491e-02, -8.04031491e-02, -6.84653893e-02],\n", " [-2.67659649e-02, -8.04031491e-02, -8.04031491e-02, -8.04031491e-02, -6.84653893e-02]\n", " [-2.67659649e-02, -8.04031491e-02, -8.04031491e-02, -8.04031491e-02, -6.84653893e-02],\n", " [-5.57974726e-03, -6.80863336e-02, -6.80863336e-02, -6.80863336e-02, -8.38923305e-02]],\n", " [[-1.60222687e-02, 2.26615220e-02, 2.26615220e-02, 2.26615220e-02, 6.03060052e-02],\n", " [-6.76476881e-02, -2.96694487e-02, -2.96694487e-02, -2.96694487e-02, 4.86185402e-02],\n", " [-6.76476881e-02, -2.96694487e-02, -2.96694487e-02, -2.96694487e-02, 4.86185402e-02]\n", " [-6.76476881e-02, -2.96694487e-02, -2.96694487e-02, -2.96694487e-02, 4.86185402e-02],\n", " [-6.52819276e-02, -3.50066647e-02, -3.50066647e-02, -3.50066647e-02, 2.85858363e-02]]\n", " [[-3.10218725e-02, -3.84682454e-02, -3.84682454e-02, -3.84682454e-02, -8.58424231e-03],\n", " [-4.27014455e-02, -7.07850009e-02, -7.07850009e-02, -7.07850009e-02, -5.36267459e-02],\n", " [-4.27014455e-02, -7.07850009e-02, -7.07850009e-02, -7.07850009e-02, -5.36267459e-02]\n", " [-4.27014455e-02, -7.07850009e-02, -7.07850009e-02, -7.07850009e-02, -5.36267459e-02],\n", " [-1.23060495e-02, -4.99926135e-02, -4.99926135e-02, -4.99926135e-02, -4.71802950e-02]]]])" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import numpy as np\n", "import mindspore.nn as nn\n", "from mindspore import context, Tensor\n", "\n", "context.set_context(mode=context.GRAPH_MODE, device_target=\"GPU\")\n", "\n", "conv = nn.Conv2d(3, 4, 3, bias_init='zeros')\n", "input_data = Tensor(np.ones([1, 3, 5, 5]).astype(np.float32))\n", "conv(input_data)\n", "context.set_context(mode=context.PYNATIVE_MODE)\n", "\n", "conv(input_data)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "上面的例子先将运行模式设置为`GRAPH_MODE`模式,然后将模式切换为`PYNATIVE_MODE`模式,实现了模式的切换。\n", "\n", "> 本示例代码运行于GPU环境。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 硬件管理\n", "\n", "硬件管理部分主要包括`device_target`和`device_id`两个参数。\n", "\n", "- `device_target`: 用于设置目标设备,支持Ascend、GPU和CPU,可以根据实际环境情况设置。\n", "\n", "- `device_id`: 表示卡物理序号,即卡所在机器中的实际序号。如果目标设备为Ascend,且规格为N*Ascend(其中N>1,如8*Ascend),在非分布式模式执行的情况下,为了避免设备的使用冲突,可以通过设置`device_id`决定程序执行的device编号,该编号范围为:0 ~ 服务器总设备数量-1,服务器总设备数量不能超过4096,默认为设备0。\n", "\n", "> 在GPU和CPU上,设置`device_id`参数无效。\n", "\n", "代码样例如下:\n", "```python\n", "from mindspore import context\n", "context.set_context(device_target=\"Ascend\", device_id=6)\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 分布式管理\n", "\n", "context中有专门用于配置并行训练参数的接口:`context.set_auto_parallel_context`,该接口必须在初始化网络之前调用。\n", "\n", "> 分布式并行训练详细介绍可以查看[分布式并行训练](https://www.mindspore.cn/tutorial/training/zh-CN/r1.2/advanced_use/distributed_training_tutorials.html)。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 维测管理\n", "\n", "为了方便维护和定位问题,context提供了大量维测相关的参数配置,如采集profiling数据、异步数据dump功能和print算子落盘等。\n", "\n", "### 采集profiling数据\n", "\n", "系统支持在训练过程中采集profiling数据,然后通过profiling工具进行性能分析。当前支持采集的profiling数据包括:\n", "\n", "- `enable_profiling`:是否开启profiling功能。设置为True,表示开启profiling功能,从`enable_options`读取profiling的采集选项;设置为False,表示关闭profiling功能,仅采集`training_trace`。\n", "\n", "- `profiling_options`:profiling采集选项,取值如下,支持采集多项数据。 \n", " - `result_path`:Profiling采集结果文件保存路径。该参数指定的目录需要在启动训练的环境上(容器或Host侧)提前创建且确保安装时配置的运行用户具有读写权限,支持配置绝对路径或相对路径(相对执行命令时的当前路径)。 \n", " - `training_trace`:采集迭代轨迹数据,即训练任务及AI软件栈的软件信息,实现对训练任务的性能分析,重点关注数据增强、前后向计算、梯度聚合更新等相关数据,取值on/off。 \n", " - `task_trace`:采集任务轨迹数据,即昇腾910处理器HWTS/AICore的硬件信息,分析任务开始、结束等信息,取值on/off。 \n", " - `aicpu_trace`:采集aicpu数据增强的profiling数据。取值on/off。 \n", " - `fp_point`:`training_trace`为on时需要配置。指定训练网络迭代轨迹正向算子的开始位置,用于记录前向算子开始时间戳。配置值为指定的正向第一个算子名字。当该值为空时,系统自动获取正向第一个算子名字。 \n", " - `bp_point`:`training_trace`为on时需要配置。指定训练网络迭代轨迹反向算子的结束位置,用于记录反向算子结束时间戳。配置值为指定的反向最后一个算子名字。当该值为空时,系统自动获取反向最后一个算子名字。 \n", " - `ai_core_metrics`取值如下:\n", "\n", " - ArithmeticUtilization:各种计算类指标占比统计。\n", "\n", " - PipeUtilization:计算单元和搬运单元耗时占比,该项为默认值。\n", "\n", " - Memory:外部内存读写类指令占比。\n", "\n", " - MemoryL0:内部内存读写类指令占比。\n", "\n", " - ResourceConflictRatio:流水线队列类指令占比。\n", "\n", "代码样例如下:\n", "\n", "```python\n", "from mindspore import context\n", "context.set_context(enable_profiling=True, profiling_options= '{\"result_path\":\"/home/data/output\",\"training_trace\":\"on\"}')\n", "\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### 保存MindIR\n", "\n", "通过`context.set_context(save_graphs=True)`来保存各个编译阶段的中间代码。\n", "\n", "被保存的中间代码有两种格式:一个是后缀名为`.ir`的文本格式,一个是后缀名为`.dot`的图形化格式。\n", "\n", "当网络规模较大时建议使用更高效的文本格式来查看,当网络规模不大时,建议使用更直观的图形化格式来查看。\n", "\n", "代码样例如下:\n", "\n", "```python\n", "from mindspore import context\n", "context.set_context(save_graphs=True)\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> MindIR详细介绍可以查看[MindSpore IR(MindIR)](https://www.mindspore.cn/doc/note/zh-CN/r1.2/design/mindspore/mindir.html)。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### print算子落盘\n", "\n", "默认情况下,MindSpore的自研print算子可以将用户输入的Tensor或字符串信息打印出来,支持多字符串输入,多Tensor输入和字符串与Tensor的混合输入,输入参数以逗号隔开。\n", "\n", "> Print打印功能可以查看[Print算子功能介绍](https://www.mindspore.cn/tutorial/training/zh-CN/r1.2/advanced_use/custom_debugging_info.html#print)。\n", "\n", "- `print_file_path`:可以将print算子数据保存到文件,同时关闭屏幕打印功能。如果保存的文件已经存在,则会给文件添加时间戳后缀。数据保存到文件可以解决数据量较大时屏幕打印数据丢失的问题。\n", "\n", "代码样例如下:\n", "\n", "```python\n", "from mindspore import context\n", "context.set_context(print_file_path=\"print.pb\")\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "> context接口详细介绍可以查看[mindspore.context](https://www.mindspore.cn/doc/api_python/zh-CN/r1.2/mindspore/mindspore.context.html)。" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.5" } }, "nbformat": 4, "nbformat_minor": 4 }