mindspore.communication.comm_func.all_to_all_v_c

mindspore.communication.comm_func.all_to_all_v_c(output, input, send_count_matrix, group=None, async_op=False)

Splits the input tensor according to the user-specified split sizes, sends the chunks to the corresponding devices, and concatenates the chunks received from the other devices into a single output tensor.

Note

Only PyNative mode is supported; Graph mode is not currently supported.

Parameters
  • output (Tensor) – the output tensor, gathered and concatenated from the chunks received from remote ranks.

  • input (Tensor) – the tensor to be scattered to remote ranks.

  • send_count_matrix (list[int]) – The send/receive counts of all ranks: \(\text{send_count_matrix}[i*\text{rank_size}+j]\) is the amount of data that rank i sends to rank j, counted in slices along the first dimension. Here, rank_size is the size of the communication group. See the short indexing sketch after this parameter list.

  • group (str, optional) – The communication group to work on. If None, it means "hccl_world_group" on Ascend. Default: None.

  • async_op (bool, optional) – Whether this operator should be an async operator. Default: False.

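For illustration, the following is a minimal sketch (plain Python, no communication involved) of how the flattened send_count_matrix is indexed; the 2-rank group and the counts used here are hypothetical:

>>> rank_size = 2
>>> # send_count_matrix[i * rank_size + j] = number of first-dimension slices rank i sends to rank j
>>> send_count_matrix = [0, 3,   # rank 0 sends 0 slices to rank 0 and 3 slices to rank 1
...                      3, 0]   # rank 1 sends 3 slices to rank 0 and 0 slices to rank 1
>>> send_count_matrix[0 * rank_size + 1]   # slices rank 0 sends to rank 1
3
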
Returns

CommHandle. If async_op is True, CommHandle is an async work handle; if async_op is False, CommHandle is None.

Raises

TypeError – If input or output is not a Tensor, group is not a str, or async_op is not a bool.

Supported Platforms:

Ascend

Examples

Note

Before running the following examples, you need to configure the communication environment variables.

For Ascend devices, it is recommended to use the msrun startup method, which has no third-party or configuration file dependencies. Please see the msrun startup method for more details.

This example should be run with 2 devices.

>>> import numpy as np
>>> import mindspore
>>> from mindspore.mint.distributed import init_process_group, get_rank
>>> from mindspore.communication.comm_func import all_to_all_v_c
>>> from mindspore import Tensor
>>>
>>> init_process_group()
>>> this_rank = get_rank()
>>> if this_rank == 0:
...     output = Tensor(np.zeros([3]).astype(np.float32))
...     tensor = Tensor([0, 1, 2.]) * this_rank
...     result = all_to_all_v_c(output, tensor, [0, 3, 3, 0])
...     print(output)
>>> if this_rank == 1:
...     output = Tensor(np.zeros([3]).astype(np.float32))
...     tensor = Tensor([0, 1, 2.]) * this_rank
...     result = all_to_all_v_c(output, tensor, [0, 3, 3, 0])
...     print(output)
rank 0:
[0. 1. 2.]
rank 1:
[0. 0. 0.]
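
The example below sketches asynchronous use under the same 2-device setup. It assumes, as with the other mindspore.communication.comm_func interfaces, that the CommHandle returned when async_op is True provides a wait() method to block until the communication completes:

>>> output = Tensor(np.zeros([3]).astype(np.float32))
>>> tensor = Tensor([0, 1, 2.]) * get_rank()
>>> handle = all_to_all_v_c(output, tensor, [0, 3, 3, 0], async_op=True)
>>> handle.wait()   # block until the exchange finishes before reading output
>>> print(output)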