mindspore.profiler._ExperimentalConfig

class mindspore.profiler._ExperimentalConfig(profiler_level: ProfilerLevel = ProfilerLevel.Level0, aic_metrics: AicoreMetrics = AicoreMetrics.AiCoreNone, l2_cache: bool = False, mstx: bool = False, data_simplification: bool = True, export_type: list = None, mstx_domain_include: list = None, mstx_domain_exclude: list = None, sys_io: bool = False, sys_interconnection: bool = False, host_sys: list = None)[source]

This class is used to configure extensible parameters when using the profiler to collect model performance data.

Parameters
  • profiler_level (ProfilerLevel, optional) –

    (Ascend only) The level of profiling. Default: ProfilerLevel.Level0.

    • ProfilerLevel.LevelNone: This setting takes effect only when mstx is enabled, indicating that no operator data is collected on the device side.

    • ProfilerLevel.Level0: The leanest level of profiling data collection; collects the elapsed time of computational operators on the NPU and information about large communication operators.

    • ProfilerLevel.Level1: In addition to Level0, collects CANN-layer AscendCL data, AICore performance metrics, and information about small communication operators.

    • ProfilerLevel.Level2: In addition to Level1, collects GE and Runtime information in the CANN layer.
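
    A minimal sketch (assumed usage, not from the original documentation): ProfilerLevel.LevelNone only takes effect together with mstx=True, so a marker-only configuration could look like this.

    >>> import mindspore
    >>> from mindspore.profiler import ProfilerLevel
    >>> # Collect no device-side operator data, only lightweight mstx markers.
    >>> experimental_config = mindspore.profiler._ExperimentalConfig(
    ...                             profiler_level=ProfilerLevel.LevelNone,
    ...                             mstx=True)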

  • aic_metrics (AicoreMetrics, optional) –

    (Ascend only) Types of AICORE performance data to collect. When using this parameter, activities must include ProfilerActivity.NPU, and the value must be a member of AicoreMetrics. When profiler_level is Level0, the default value is AicoreMetrics.AiCoreNone; when profiler_level is Level1 or Level2, the default value is AicoreMetrics.PipeUtilization (see the sketch after this list). The data items contained in each metric are as follows:

    • AicoreMetrics.AiCoreNone: Does not collect AICORE data.

    • AicoreMetrics.ArithmeticUtilization: ArithmeticUtilization contains mac_fp16/int8_ratio, vec_fp32/fp16/int32_ratio, vec_misc_ratio etc.

    • AicoreMetrics.PipeUtilization: PipeUtilization contains vec_ratio, mac_ratio, scalar_ratio, mte1/mte2/mte3_ratio, icache_miss_rate etc.

    • AicoreMetrics.Memory: Memory contains ub_read/write_bw, l1_read/write_bw, l2_read/write_bw, main_mem_read/write_bw etc.

    • AicoreMetrics.MemoryL0: MemoryL0 contains l0a_read/write_bw, l0b_read/write_bw, l0c_read/write_bw etc.

    • AicoreMetrics.ResourceConflictRatio: ResourceConflictRatio contains vec_bankgroup/bank/resc_cflt_ratio etc.

    • AicoreMetrics.MemoryUB: MemoryUB contains ub_read/write_bw_mte, ub_read/write_bw_vector, ub_read/write_bw_scalar etc.

    • AicoreMetrics.L2Cache: L2Cache contains write_cache_hit, write_cache_miss_allocate, r0_read_cache_hit, r1_read_cache_hit etc. This function only supports Atlas A2 training series products.

    • AicoreMetrics.MemoryAccess: Statistics on storage access bandwidth and storage capacity of main storage and l2 cache etc.
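
    A minimal sketch (assumed usage, not from the original documentation): once profiler_level is raised to Level1 or Level2, PipeUtilization becomes the default metric; it is set explicitly here for clarity, and the configuration is then passed to a mindspore.profiler.profile whose activities include ProfilerActivity.NPU, as in the Examples section below.

    >>> import mindspore
    >>> from mindspore.profiler import ProfilerLevel, AicoreMetrics
    >>> experimental_config = mindspore.profiler._ExperimentalConfig(
    ...                             profiler_level=ProfilerLevel.Level1,
    ...                             aic_metrics=AicoreMetrics.PipeUtilization)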

  • l2_cache (bool, optional) – (Ascend only) Whether to collect L2 cache data; collected when True. Default: False. The l2_cache.csv file is generated in the ASCEND_PROFILER_OUTPUT folder. With the GE backend, this is only supported when the wait and skip_first parameters of the mindspore.profiler.schedule configuration are both 0, as in the sketch below.
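
    A minimal sketch (assumed usage, not from the original documentation) of the GE-backend constraint: l2_cache=True combined with a schedule whose wait and skip_first are both 0.

    >>> import mindspore
    >>> experimental_config = mindspore.profiler._ExperimentalConfig(l2_cache=True)
    >>> # wait=0 and skip_first=0 satisfy the GE-backend constraint described above.
    >>> sched = mindspore.profiler.schedule(wait=0, warmup=0, active=1, repeat=1, skip_first=0)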

  • mstx (bool, optional) – (Ascend only) Whether to collect lightweight profiling data; collected when True. Default: False.

  • data_simplification (bool, optional) – (Ascend only) Whether to remove FRAMEWORK data and other redundant data. If set to True, only the profiler deliverables and the raw performance data under the PROF_XXX directory are kept to save space. Default: True.

  • export_type (list, optional) –

    (Ascend only) The data types to export. The db and text formats can be exported at the same time. Default: None, indicating that text data is exported.

    • ExportType.Text: Export text type data.

    • ExportType.Db: Export db type data.
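
    A minimal sketch (assumed usage, not from the original documentation) exporting both formats at the same time.

    >>> import mindspore
    >>> from mindspore.profiler import ExportType
    >>> experimental_config = mindspore.profiler._ExperimentalConfig(
    ...                             export_type=[ExportType.Text, ExportType.Db])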

  • mstx_domain_include (list, optional) – (Ascend only) Set the collection of domain names to enable when the mstx switch is turned on. The names must be of str type. Default: [], indicating that this parameter is not used to filter domains. This parameter is mutually exclusive with the mstx_domain_exclude parameter and the two cannot be set simultaneously; if both are set, only the mstx_domain_include parameter takes effect.

  • mstx_domain_exclude (list, optional) – (Ascend only) Set the collection of domain names to disable when the mstx switch is turned on. The names must be of str type. Default: [], indicating that this parameter is not used to filter domains.
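
    A minimal sketch (the domain names below are hypothetical placeholders, not from the original documentation): enable mstx and restrict collection to an explicit set of domains.

    >>> import mindspore
    >>> experimental_config = mindspore.profiler._ExperimentalConfig(
    ...                             mstx=True,
    ...                             mstx_domain_include=["domain_a", "domain_b"])  # hypothetical domain names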

  • sys_io (bool, optional) – (Ascend only) Whether to collect NIC and RoCE data. Default: False.

  • sys_interconnection (bool, optional) – (Ascend only) Whether to collect system interconnection data, including HCCS data, PCIe data, and Stars Chip Trans. Default: False.

  • host_sys (list, optional) –

    Collect host-side system data. Default: [], indicating that host-side system data is not collected. You need to set start_profile of mindspore.profiler.profile to False; see the sketch after this list. When collecting DISK or OSRT data, the iotop, perf, and ltrace third-party tools must be installed in advance. For detailed steps, please refer to Installing Third-party Tools. After the third-party tools are successfully installed, user permissions need to be configured. For detailed steps, please refer to Configure User Permissions. Note that in step 3 of configuring user permissions, the content of the msprof_data_collection.sh script needs to be replaced with msprof_data_collection.sh.

    The final deliverables are visualized and analyzed with the MindStudio Insight tool. For detailed analysis, please refer to host-side CPU data analysis, host-side MEM data analysis, host-side DISK data analysis, host-side NETWORK data analysis, and host-side OSRT data analysis.

    • HostSystem.CPU: Collect the CPU utilization at the process level.

    • HostSystem.MEM: Collect the memory utilization at the process level.

    • HostSystem.DISK: Collect the disk I/O utilization at the process level.

    • HostSystem.NETWORK: Collect the network I/O utilization at the system level.

    • HostSystem.OSRT: Collect system call stack data at the system level.
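
    A minimal sketch (assumed usage; the HostSystem import path follows the other mindspore.profiler enums and is an assumption): collect host-side CPU and MEM data, with start_profile of mindspore.profiler.profile set to False as required above.

    >>> import mindspore
    >>> from mindspore.profiler import ProfilerActivity, HostSystem  # HostSystem import path is assumed
    >>> experimental_config = mindspore.profiler._ExperimentalConfig(
    ...                             host_sys=[HostSystem.CPU, HostSystem.MEM])
    >>> prof = mindspore.profiler.profile(activities=[ProfilerActivity.CPU, ProfilerActivity.NPU],
    ...                                   start_profile=False,
    ...                                   experimental_config=experimental_config)
    >>> # With start_profile=False, collection is started and stopped manually later, e.g. prof.start() ... prof.stop().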

Raises

RuntimeError – If the CANN version does not match the MindSpore version, MindSpore cannot parse the generated ascend_job_id directory structure.

Supported Platforms:

Ascend GPU

Examples

>>> import numpy as np
>>> import mindspore
>>> from mindspore import nn, context
>>> import mindspore.dataset as ds
>>> from mindspore.profiler import ProfilerLevel, ProfilerActivity, AicoreMetrics, ExportType
>>>
>>> class Net(nn.Cell):
...     def __init__(self):
...         super(Net, self).__init__()
...         self.fc = nn.Dense(2,2)
...     def construct(self, x):
...         return self.fc(x)
>>>
>>> def generator():
...     for i in range(2):
...         yield np.ones([2, 2]).astype(np.float32), np.ones([2]).astype(np.int32)
>>>
>>> def train(net):
...     optimizer = nn.Momentum(net.trainable_params(), 1, 0.9)
...     loss = nn.SoftmaxCrossEntropyWithLogits(sparse=True)
...     data = ds.GeneratorDataset(generator, ["data", "label"])
...     model = mindspore.train.Model(net, loss, optimizer)
...     model.train(1, data)
>>>
>>> if __name__ == '__main__':
...     # If the device_target is GPU, set the device_target to "GPU"
...     context.set_context(mode=mindspore.GRAPH_MODE)
...     mindspore.set_device("Ascend")
...
...     # Init Profiler
...     experimental_config = mindspore.profiler._ExperimentalConfig(
...                                 profiler_level=ProfilerLevel.Level0,
...                                 aic_metrics=AicoreMetrics.AiCoreNone,
...                                 l2_cache=False,
...                                 mstx=False,
...                                 data_simplification=False,
...                                 export_type=[ExportType.Text])
...     steps = 10
...     net = Net()
...     # Note that the Profiler should be initialized before model.train
...     with mindspore.profiler.profile(activities=[ProfilerActivity.CPU, ProfilerActivity.NPU],
...                                     schedule=mindspore.profiler.schedule(wait=0, warmup=0, active=1,
...                                           repeat=1, skip_first=0),
...                                     on_trace_ready=mindspore.profiler.tensorboard_trace_handler("./data"),
...                                     profile_memory=False,
...                                     experimental_config=experimental_config) as prof:
...
...         # Train Model
...         for step in range(steps):
...             train(net)
...             prof.step()