[ "MindSpore Made Easy" ]

MindSpore Made Easy Collecting Profile Data in a ModelArts Development Environment

August 12, 2022

This blog describes how to collect profile data in a ModelArts development environment and how to enable the profiler on demand for performance debugging.

1. Collecting Profile Data by Using the Training Script in the Development Environment

To collect the profile data of a neural network, you need to add MindSpore Profiler APIs to the training script.

(1) Initialize the MindSpore Profiler object after set_context is executed and before the network and HCCL are initialized.

(2) After the training is complete, call Profiler.analyse() to stop profile data collection and generate the profiling results.

The sample code is as follows:

from mindspore import context, Model
from mindspore.profiler import Profiler

context.set_context(mode=context.GRAPH_MODE,
                    device_target=args.device_target)

SAVE_PATH = "./profile"

# Init Profiler. The data directory is placed under SAVE_PATH.
profiler_output_path = SAVE_PATH + "/mindspore_profile"
profiler = Profiler(output_path=profiler_output_path)

# Train the model (model is a mindspore.Model instance).
model.train(...)

# Profiler end: stop collection and generate the profiling results.
profiler.analyse()

Note: output_path indicates the path where the profile data is generated. If this path is not specified, the profile data is automatically saved in the data folder (automatically generated) under the current directory.
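The default-path behavior described in the note can be sketched in plain Python. resolve_output_path below is a hypothetical helper for illustration only, not part of the MindSpore API:

```python
import os

def resolve_output_path(output_path=None):
    # Hypothetical helper: mirrors the fallback described in the note.
    # With no output_path, profile data goes to a "data" folder under
    # the current working directory; otherwise the given path is used.
    if output_path is None:
        return os.path.join(os.getcwd(), "data")
    return os.path.abspath(output_path)
```

For example, resolve_output_path("./profile/mindspore_profile") resolves to the configured path, while resolve_output_path() falls back to the automatically generated data folder.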

2. Collecting Profile Data for Performance Debugging as Required

Enable profile data collection as required.

(1) The sample code for collecting profile data by step is as follows:

from mindspore.profiler import Profiler
from mindspore.train.callback import Callback

class StopAtStep(Callback):
    def __init__(self, start_step, stop_step):
        super(StopAtStep, self).__init__()
        self.start_step = start_step
        self.stop_step = stop_step
        # Defer collection until start() is called.
        self.profiler = Profiler(start_profile=False)

    def step_begin(self, run_context):
        cb_params = run_context.original_args()
        step_num = cb_params.cur_step_num
        if step_num == self.start_step:
            self.profiler.start()  # Enable profile data collection.

    def step_end(self, run_context):
        cb_params = run_context.original_args()
        step_num = cb_params.cur_step_num
        if step_num == self.stop_step:
            self.profiler.stop()  # Disable profile data collection.

    def end(self, run_context):
        self.profiler.analyse()

...
...
start_step = 2
stop_step = 5
profiler_data = StopAtStep(start_step, stop_step)
model.train(..., callbacks=[..., profiler_data])
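To check the collection window this callback produces without launching a training job, the start/stop gating can be simulated with plain Python stubs. StubProfiler and StopAtStepSketch below are illustrative stand-ins, not MindSpore APIs:

```python
class StubProfiler:
    # Minimal stand-in for mindspore.profiler.Profiler: records which
    # steps fall inside the collection window.
    def __init__(self):
        self.collecting = False
        self.profiled_steps = []

    def start(self):
        self.collecting = True

    def stop(self):
        self.collecting = False

class StopAtStepSketch:
    # Same gating logic as StopAtStep above, with plain step numbers
    # instead of run_context.
    def __init__(self, start_step, stop_step):
        self.start_step = start_step
        self.stop_step = stop_step
        self.profiler = StubProfiler()

    def step_begin(self, step_num):
        if step_num == self.start_step:
            self.profiler.start()
        if self.profiler.collecting:
            self.profiler.profiled_steps.append(step_num)

    def step_end(self, step_num):
        if step_num == self.stop_step:
            self.profiler.stop()

cb = StopAtStepSketch(start_step=2, stop_step=5)
for step in range(1, 8):  # 7 training steps
    cb.step_begin(step)
    cb.step_end(step)
# cb.profiler.profiled_steps is now [2, 3, 4, 5]: the window is
# inclusive of both start_step and stop_step.
```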

(2) The sample code for collecting profile data by epoch is as follows:

class StopAtEpoch(Callback):
    def __init__(self, start_epoch, stop_epoch):
        super(StopAtEpoch, self).__init__()
        self.start_epoch = start_epoch
        self.stop_epoch = stop_epoch
        # Defer collection until start() is called.
        self.profiler = Profiler(start_profile=False)

    def epoch_begin(self, run_context):
        cb_params = run_context.original_args()
        epoch_num = cb_params.cur_epoch_num
        if epoch_num == self.start_epoch:
            self.profiler.start()  # Enable profile data collection.

    def epoch_end(self, run_context):
        cb_params = run_context.original_args()
        epoch_num = cb_params.cur_epoch_num
        if epoch_num == self.stop_epoch:
            self.profiler.stop()  # Disable profile data collection.

    def end(self, run_context):
        self.profiler.analyse()

...
...
start_epoch = 2
stop_epoch = 5
profiler_data = StopAtEpoch(start_epoch, stop_epoch)
model.train(..., callbacks=[..., profiler_data])
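One lifecycle detail worth noting: end runs when training finishes, so analyse() is called even if training stops before stop_epoch is reached. A plain-Python sketch of that behavior (StubProfiler and StopAtEpochSketch are illustrative stand-ins, not MindSpore APIs):

```python
class StubProfiler:
    # Minimal stand-in tracking the profiler lifecycle.
    def __init__(self):
        self.running = False
        self.analysed = False

    def start(self):
        self.running = True

    def stop(self):
        self.running = False

    def analyse(self):
        # analyse() finalizes collection regardless of whether stop()
        # was ever reached.
        self.running = False
        self.analysed = True

class StopAtEpochSketch:
    # Same gating logic as StopAtEpoch above, with plain epoch numbers.
    def __init__(self, start_epoch, stop_epoch):
        self.start_epoch = start_epoch
        self.stop_epoch = stop_epoch
        self.profiler = StubProfiler()

    def epoch_begin(self, epoch_num):
        if epoch_num == self.start_epoch:
            self.profiler.start()

    def epoch_end(self, epoch_num):
        if epoch_num == self.stop_epoch:
            self.profiler.stop()

    def end(self):
        self.profiler.analyse()

# stop_epoch = 10 is past the last epoch, so stop() never fires,
# but end() still finalizes the profile data via analyse().
cb = StopAtEpochSketch(start_epoch=2, stop_epoch=10)
for epoch in range(1, 6):  # 5 training epochs
    cb.epoch_begin(epoch)
    cb.epoch_end(epoch)
cb.end()
```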

3. Running the Script

Startup command:

python MindSpore_1P_profiler.py data_path=xxx

Run the script in the terminal of the development environment. After the script finishes, the generated profile data is stored in SAVE_PATH.

Note: On-demand profiler performance debugging does not support user-defined data storage paths. Therefore, after the program completes, the profile data is saved in the data folder under the current directory by default.

Precautions:

1. Currently, performance debugging is not supported when training and inference run simultaneously; it is supported for training alone or inference alone.

2. Ascend performance debugging does not support dynamic shape, multi-subgraph, or control flow scenarios.