mindspore.mint.distributed.init_process_group

mindspore.mint.distributed.init_process_group(backend='hccl', init_method=None, timeout=None, world_size=-1, rank=-1, store=None, group_name='', pg_options=None, device_id=None)

Initialize the collective communication library and create a default collective communication group.

Note

  • This method is not supported on the GPU and CPU versions of MindSpore.

  • On Ascend hardware platforms, this API should be called before the definition of any Tensor and Parameter, and before the instantiation and execution of any operation and net.

  • Only PyNative mode is supported; Graph mode is not currently supported.

Parameters
  • backend (str, optional) – The backend to use. Currently only "hccl" is supported. Default is "hccl".

  • init_method (str, optional) – URL specifying how to initialize the collective communication group. Default is None.

  • timeout (timedelta, optional) – Timeout for API execution. Default is None. Currently, this parameter is only supported for host-side cluster network configuration using init_method or store.

  • world_size (int, optional) – Number of the processes participating in the job. Default is -1.

  • rank (int, optional) – Rank of the current process. Default is -1.

  • store (Store, optional) – An object that stores key/value data, facilitating the exchange of inter-process communication addresses and connection information. Default is None. Currently, only the TCPStore type is supported.

  • group_name (str, optional) – Set the default global communication group name. Default is "".

  • pg_options (ProcessGroupOptions, invalid) – Process group options specifying additional options to pass in during the construction of a specific process group. This is a reserved parameter; any value passed currently has no effect.

  • device_id (int, invalid) – The device ID to execute on. This is a reserved parameter; any value passed currently has no effect.
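
When initializing via init_method, the rendezvous URL and the world_size/rank values are typically assembled from launcher-provided environment variables. A minimal sketch follows; the variable names MASTER_ADDR, MASTER_PORT, RANK, and WORLD_SIZE are assumptions mirroring common launcher conventions, not names fixed by this API, and the final call requires a configured Ascend cluster, so it is shown commented out:

```python
import os
from datetime import timedelta

# Assemble a TCP rendezvous URL from launcher-provided environment
# variables. MASTER_ADDR/MASTER_PORT/RANK/WORLD_SIZE are assumed
# conventions, not names defined by init_process_group itself.
addr = os.environ.get("MASTER_ADDR", "127.0.0.1")
port = os.environ.get("MASTER_PORT", "29500")
init_method = f"tcp://{addr}:{port}"

rank = int(os.environ.get("RANK", "0"))              # must be >= 0 with init_method
world_size = int(os.environ.get("WORLD_SIZE", "1"))  # must be > 0 with init_method

# On a configured Ascend cluster one would then call:
# init_process_group(backend="hccl", init_method=init_method,
#                    world_size=world_size, rank=rank,
#                    timeout=timedelta(seconds=300))
```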

Raises
  • ValueError – If backend is not hccl.

  • ValueError – If world_size is neither -1 nor the number of processes in the group.

  • ValueError – If both init_method and store are set.

  • ValueError – If world_size is not set to a positive integer when initializing via init_method or store.

  • ValueError – If rank is not set to a non-negative integer when initializing via init_method or store.

  • RuntimeError – If device target is invalid, or backend is invalid, or distributed initialization fails, or the environment variables RANK_ID/MINDSPORE_HCCL_CONFIG_PATH have not been exported when backend is HCCL.
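The ValueError entries above imply argument constraints that can be checked client-side before calling the API. The helper below is a hypothetical illustration of those rules, not part of MindSpore:

```python
def preflight_check(backend="hccl", init_method=None, store=None,
                    world_size=-1, rank=-1):
    """Hypothetical client-side mirror of the ValueError rules listed
    under Raises; not a MindSpore API."""
    if backend != "hccl":
        raise ValueError("backend must be 'hccl'")
    if init_method is not None and store is not None:
        raise ValueError("init_method and store are mutually exclusive")
    if init_method is not None or store is not None:
        # Host-side rendezvous requires explicit, valid sizes.
        if world_size <= 0:
            raise ValueError("world_size must be a positive integer")
        if rank < 0:
            raise ValueError("rank must be a non-negative integer")
```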

Supported Platforms:

Ascend

Examples

Note

Before running the following examples, you need to configure the communication environment variables.

For Ascend devices, it is recommended to use the msrun startup method, which has no third-party or configuration-file dependencies. See msrun startup for more details.

>>> import mindspore as ms
>>> from mindspore.mint.distributed import init_process_group, destroy_process_group
>>> ms.set_device(device_target="Ascend")
>>> init_process_group()
>>> destroy_process_group()