mindspore.ops.communication.all_to_all_single
- mindspore.ops.communication.all_to_all_single(output, input, output_split_sizes=None, input_split_sizes=None, group=None, async_op=False)[source]
Scatters the input tensor to all ranks and gathers the tensors received from them into a single output tensor, split along dim 0 according to the given split sizes.
Note
Only PyNative mode is supported; Graph mode is not currently supported.
- Parameters
output (Union(Tensor, Tuple(int))) – If the function operates in-place, the tensor into which the data gathered and concatenated from the remote ranks is written. Otherwise, a tensor or shape indicating the shape of the tensor gathered and concatenated from the remote ranks.
input (Tensor) – Tensor to be scattered to the remote ranks.
output_split_sizes (Union(Tuple(int), List(int)), optional) – Output split sizes along dim 0 (see the sketch after this parameter list). Default: None, indicating uniform segmentation.
input_split_sizes (Union(Tuple(int), List(int)), optional) – Input split sizes along dim 0. Default: None, indicating uniform segmentation.
group (str, optional) – The communication group to work on. Default: None, which means "hccl_world_group" on Ascend.
async_op (bool, optional) – Whether this operator should be an async operator. Default: False.
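To make the split-size semantics concrete, the following is a minimal single-process sketch that emulates the dim-0 behavior with plain NumPy; the variable names and the emulation itself are illustrative assumptions, not part of the API. Each rank splits its input along dim 0 by input_split_sizes and sends chunk p to rank p; each rank's output is the concatenation of the chunks it receives, sized by output_split_sizes.
>>> # Hypothetical single-process emulation of the dim-0 split semantics
>>> # (pure NumPy, illustrative only; not part of the API).
>>> import numpy as np
>>> x0 = np.arange(9.0).reshape(3, 3)        # rank 0 input, split [2, 1]
>>> x1 = np.arange(9.0, 15.0).reshape(2, 3)  # rank 1 input, split [1, 1]
>>> send0 = np.split(x0, np.cumsum([2, 1])[:-1], axis=0)  # chunks rank 0 sends
>>> send1 = np.split(x1, np.cumsum([1, 1])[:-1], axis=0)  # chunks rank 1 sends
>>> out0 = np.concatenate([send0[0], send1[0]], axis=0)   # what rank 0 gathers
>>> out1 = np.concatenate([send0[1], send1[1]], axis=0)   # what rank 1 gathers
>>> print(out0)
[[ 0.  1.  2.]
 [ 3.  4.  5.]
 [ 9. 10. 11.]]
>>> print(out1)
[[ 6.  7.  8.]
 [12. 13. 14.]]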
- Returns
If the function operates in-place, returns CommHandle.
If the function operates out-of-place, returns Tuple(Tensor, CommHandle), where the first element stores the output result and the second element is the CommHandle.
In both cases, when async_op is True, CommHandle is an asynchronous work handle (a usage sketch follows the Raises list); when async_op is False, CommHandle is None.
- Raises
TypeError – If input or output is not a Tensor, group is not a str, or async_op is not a bool.
ValueError – If input_split_sizes is empty and the size of dim 0 of input is not divisible by the number of devices in the communication group.
ValueError – If output_split_sizes is empty and the size of dim 0 of output is not divisible by the number of devices in the communication group.
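The asynchronous path is not exercised in the Examples below, so here is a hedged sketch of the intended usage; it assumes the returned CommHandle exposes a wait() method that blocks until the transfer completes, which should be verified against the installed version.
>>> # Hypothetical async usage (assumes CommHandle.wait() exists):
>>> handle = all_to_all_single(output, tensor, [2, 1], [2, 1], async_op=True)
>>> # ... overlap independent computation with the communication here ...
>>> handle.wait()  # block until the in-place output tensor is ready
>>> print(output)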
- Supported Platforms:
Ascend
Examples
Note
Before running the following examples, you need to configure the communication environment variables.
For Ascend devices, it is recommended to use the msrun startup method, which has no third-party or configuration-file dependencies. Please see the msrun startup documentation for more details.
This example should be run with 2 devices.
>>> import numpy as np
>>> import mindspore
>>> from mindspore import Tensor
>>> from mindspore.ops.communication import init_process_group, get_rank
>>> from mindspore.ops.communication import all_to_all_single
>>>
>>> init_process_group()
>>> this_rank = get_rank()
>>> if this_rank == 0:
...     output = Tensor(np.zeros([3, 3]).astype(np.float32))
...     tensor = Tensor([[0, 1, 2.], [3, 4, 5], [6, 7, 8]])
...     result = all_to_all_single(output, tensor, [2, 1], [2, 1])
...     print(output)
>>> if this_rank == 1:
...     output = Tensor(np.zeros([2, 3]).astype(np.float32))
...     tensor = Tensor([[9, 10., 11], [12, 13, 14]])
...     result = all_to_all_single(output, tensor, [1, 1], [1, 1])
...     print(output)
rank 0:
[[ 0.  1.  2.]
 [ 3.  4.  5.]
 [ 9. 10. 11.]]
rank 1:
[[ 6.  7.  8.]
 [12. 13. 14.]]
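The example above only exercises the in-place form. Based on the signature, a hedged sketch of the out-of-place form described under the output parameter, where a shape tuple stands in for a preallocated output tensor, might look as follows; verify the behavior against the installed version before relying on it.
>>> # Hypothetical out-of-place usage on rank 0 (shape instead of a tensor):
>>> tensor = Tensor([[0, 1, 2.], [3, 4, 5], [6, 7, 8]])
>>> out, handle = all_to_all_single((3, 3), tensor, [2, 1], [2, 1])
>>> print(out)  # same gathered result as the in-place rank-0 example above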