mindspore_lite.Context

class mindspore_lite.Context

The Context class is used to transfer environment variables during execution.

The context should be configured before running the program. If it is not configured, the target will be set to cpu, and the cpu attributes will be set automatically by default.
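
For reference, here is a minimal sketch of how a configured context is typically passed to a model at build time via Model.build_from_file (the model path is a hypothetical placeholder):

>>> import mindspore_lite as mslite
>>> # configure the execution context before building the model
>>> context = mslite.Context()
>>> context.target = ["cpu"]
>>> context.cpu.thread_num = 2
>>> # build a model from a MindIR file with the configured context
>>> model = mslite.Model()
>>> model.build_from_file("/home/user/model_graph.mindir", mslite.ModelType.MINDIR, context)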

Context.parallel defines the context and configuration of the ModelParallelRunner class.

Context.parallel properties:
  • workers_num (int) - the number of workers. A ModelParallelRunner contains multiple workers, which are the units that actually perform parallel inference. Setting workers_num to 0 means that workers_num will be adjusted automatically based on computer performance and the number of cores.

  • config_info (dict{str, dict{str, str}}) - Nested map for passing user-defined options while building the ModelParallelRunner online. For more configurable options, refer to config_path . For example, {"model_file": {"mindir_path": "/home/user/model_graph.mindir"}}: the section is "model_file", one of its keys is "mindir_path", and the corresponding value in the map is "/home/user/model_graph.mindir".

  • config_path (str) - Set the config file path. The config file is used to pass user-defined options while building the ModelParallelRunner . Users may need to set this parameter in the following scenarios. For example, "/home/user/config.txt". A sketch showing how such a file can be prepared and attached to the context appears after the usage list below.

    • Usage 1: Set mixed precision inference. The content and description of the configuration file are as follows:

      [execution_plan]
      [op_name1]=data_type: float16 (The operator named op_name1 sets the data type as float16)
      [op_name2]=data_type: float32 (The operator named op_name2 sets the data type as float32)
      
    • Usage 2: For GPU inference, set the TensorRT configuration. The content and description of the configuration file are as follows:

      [ms_cache]
      serialize_path=[serialization model path](storage path of the serialized model)
      [gpu_context]
      input_shape=input_name: [input_dim] (model input dimensions, for dynamic shape)
      dynamic_dims=[min_dim~max_dim] (dynamic dimension range of the model input, for dynamic shape)
      opt_dims=[opt_dim] (the optimal input dimensions of the model, for dynamic shape)
      
    • Usage 3: For a large model, when the model buffer is used to load and compile, the path of the weight file needs to be set separately by passing the path of the large model, and the large model file and the folder containing the weight files must be in the same directory. For example, when the directory is as follows:

      .
      └── /home/user/
           ├── model_graph.mindir
           └── model_variables
                └── data_0
      

      The content and description of the configuration file are as follows:

      [model_file]
      mindir_path=[/home/user/model_graph.mindir](storage path of the large model)
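
    The three usages above all flow through the same mechanism. As a hedged illustration (paths, operator names, and file contents are hypothetical placeholders), the following sketch prepares the Usage 1 mixed-precision config file and attaches it to the context through config_path:

    >>> from pathlib import Path
    >>> import mindspore_lite as mslite
    >>> # write the execution plan described in Usage 1 (the operator names are placeholders)
    >>> _ = Path("/home/user/config.txt").write_text(
    ...     "[execution_plan]\n"
    ...     "[op_name1]=data_type: float16\n"
    ...     "[op_name2]=data_type: float32\n")
    >>> context = mslite.Context()
    >>> context.target = ["cpu"]
    >>> # the file is consumed when the ModelParallelRunner is built with this context
    >>> context.parallel.config_path = "/home/user/config.txt"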
      

Examples

>>> # create default context, whose target is cpu by default.
>>> import mindspore_lite as mslite
>>> context = mslite.Context()
>>> print(context)
target: ['cpu'].
>>> # testcase 2: context's parallel attribute, based on the server inference package
>>> # (export MSLITE_ENABLE_SERVER_INFERENCE=on before compiling lite, or use the cloud inference package)
>>> import mindspore_lite as mslite
>>> context = mslite.Context()
>>> context.target = ["cpu"]
>>> context.parallel.workers_num = 4
>>> context.parallel.config_info = {"model_file": {"mindir_path": "/home/user/model_graph.mindir"}}
>>> context.parallel.config_path = "/home/user/config.txt"
>>> print(context.parallel)
workers num: 4,
config info: model_file: mindir_path /home/user/model_graph.mindir,
config path: /home/user/config.txt.
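
Building on the example above, the following is a minimal sketch of how the parallel-configured context might be consumed by ModelParallelRunner, assuming the server/cloud inference package is available (the model path and the random input data are hypothetical placeholders):

>>> import numpy as np
>>> import mindspore_lite as mslite
>>> context = mslite.Context()
>>> context.target = ["cpu"]
>>> context.parallel.workers_num = 4
>>> # build the parallel runner from a MindIR file with the configured context
>>> runner = mslite.ModelParallelRunner()
>>> runner.build_from_file(model_path="/home/user/model_graph.mindir", context=context)
>>> # fill the first input with random data of matching shape (float32 assumed) and run inference
>>> inputs = runner.get_inputs()
>>> inputs[0].set_data_from_numpy(np.random.rand(*inputs[0].shape).astype(np.float32))
>>> outputs = runner.predict(inputs)
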
property group_info_file

Get or set communication group info file for distributed inference.

In the pipeline parallel scenario, device nodes in different stages belong to different communication groups. When exporting the model, set the group_ckpt_save_file parameter of the interface [mindspore.set_auto_parallel_context](https://www.mindspore.cn/docs/zh-CN/r2.3/api_python/mindspore/mindspore.set_auto_parallel_context.html) to export the group information file. In addition, in non-pipeline-parallel scenarios, if there are communication operators that involve local communication groups, the group information file also needs to be exported through the group_ckpt_save_file parameter.

Examples

>>> # export communication group information file when export mindir
>>> import mindspore
>>> mindspore.set_auto_parallel_context(group_ckpt_save_file=f"{export_dir}/group_config_{rank_id}.pb")
>>>
>>> # use communication group information file when load mindir
>>> import mindspore_lite as mslite
>>> context = mslite.Context()
>>> context.group_info_file = f"{export_dir}/group_config_{rank_id}.pb"
property target

Get or set the target device information of the context.

Currently supported targets: "cpu" , "gpu" , "ascend".

Note

After gpu is added to target, cpu will be added automatically as the backup target, because when an op is not supported on gpu, the system will check whether cpu supports it and, if so, switch to the context with the cpu target.

After ascend is added, cpu will be added automatically as the backup target. When the input format of the original model is inconsistent with that of the model generated by the Converter, the model generated by the Converter on the Ascend device will contain a 'Transpose' node, which currently needs to be executed on the cpu device, so the framework needs to switch to the context with the cpu target.

cpu properties:
  • inter_op_parallel_num (int) - Set the number of operators that are executed in parallel at runtime. inter_op_parallel_num cannot be greater than thread_num . Setting inter_op_parallel_num to 0 means that inter_op_parallel_num will be adjusted automatically based on computer performance and the number of cores.

  • precision_mode (str) - Set the mixed-precision mode. Options are "preferred_fp16" , "enforce_fp32".

    • "preferred_fp16" : prefer to use fp16.

    • "enforce_fp32" : force use fp32.

  • thread_num (int) - Set the number of threads at runtime. thread_num cannot be less than inter_op_parallel_num . Setting thread_num to 0 means that thread_num will be adjusted automatically based on computer performance and the number of cores.

  • thread_affinity_mode (int) - Set the mode of the CPU core binding policy at runtime. The following values of thread_affinity_mode are supported.

    • 0 : no core binding.

    • 1 : binding big cores first.

    • 2 : binding middle cores first.

  • thread_affinity_core_list (list[int]) - Set the list of CPU core binding policies at runtime. For example, [0,1] represents the specified binding of CPU0 and CPU1.

gpu properties:
  • device_id (int) - The device id.

  • group_size (int) - the size of the cluster, i.e., the number of devices. Get only, not settable.

  • precision_mode (str) - Set the mixed-precision mode. Options are "preferred_fp16" , "enforce_fp32".

    • "preferred_fp16": prefer to use fp16.

    • "enforce_fp32": force use fp32.

  • rank_id (int) - the ID of the current device in the cluster, which starts from 0. Get only, not settable.

ascend properties:
  • device_id (int) - The device id.

  • precision_mode (str) - Set the mixed-precision mode. Options are "enforce_fp32" , "preferred_fp32" , "enforce_fp16" , "enforce_origin" , "preferred_optimal".

    • "enforce_fp32": ACL option is force_fp32, force use fp32.

    • "preferred_fp32": ACL option is allow_fp32_to_fp16, prefer to use fp32.

    • "enforce_fp16": ACL option is force_fp16, force use fp16.

    • "enforce_origin": ACL option is must_keep_origin_dtype, force use original type.

    • "preferred_optimal": ACL option is allow_mix_precision, prefer to use the fp16 mixed-precision mode.

  • provider (str) - The provider that supports the inference capability of the target device, can be "" or "ge". The default is "".

  • rank_id (int) - The ID of the current device in the cluster, which starts from 0.

Returns

list[str], the target device information of context.

Examples

>>> # create default context, whose target is cpu by default.
>>> import mindspore_lite as mslite
>>> context = mslite.Context()
>>> # set context with cpu target.
>>> context.target = ["cpu"]
>>> print(context.target)
['cpu']
>>> context.cpu.precision_mode = "preferred_fp16"
>>> context.cpu.thread_num = 2
>>> context.cpu.inter_op_parallel_num = 2
>>> context.cpu.thread_affinity_mode = 1
>>> context.cpu.thread_affinity_core_list = [0,1]
>>> print(context.cpu)
device_type: DeviceType.kCPU,
precision_mode: preferred_fp16,
thread_num: 2,
inter_op_parallel_num: 2,
thread_affinity_mode: 1,
thread_affinity_core_list: [0, 1].
>>> # set context with gpu target.
>>> context.target = ["gpu"]
>>> print(context.target)
['gpu']
>>> context.gpu.precision_mode = "preferred_fp16"
>>> context.gpu.device_id = 2
>>> print(context.gpu.rank_id)
0
>>> print(context.gpu.group_size)
1
>>> print(context.gpu)
device_type: DeviceType.kGPU,
precision_mode: preferred_fp16,
device_id: 2,
rank_id: 0,
group_size: 1.
>>> # set context with ascend target.
>>> context.target = ["ascend"]
>>> print(context.target)
['ascend']
>>> context.ascend.precision_mode = "enforce_fp32"
>>> context.ascend.device_id = 2
>>> context.ascend.provider = "ge"
>>> context.ascend.rank_id = 0
>>> print(context.ascend)
device_type: DeviceType.kAscend,
precision_mode: enforce_fp32,
device_id: 2,
provider: ge,
rank_id: 0.