mindspore.mint.distributed.reduce_scatter_tensor_uneven

mindspore.mint.distributed.reduce_scatter_tensor_uneven(output, input, input_split_sizes=None, op=ReduceOp.SUM, group=None, async_op=False)

Reduce the input tensors element-wise across all ranks in the specified communication group, then scatter the result to each rank's output tensor, splitting along the first dimension according to input_split_sizes.

Note

The input tensor must have identical shape and format across all processes.
The first dimension of the input tensor must equal the sum of input_split_sizes.
Only PyNative mode is supported; Graph mode is not currently supported.
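
The semantics described above can be sketched on a single host with NumPy. This is only an illustrative simulation, not the MindSpore implementation: the helper name simulate_reduce_scatter_uneven and its list-of-arrays interface are hypothetical, and no real communication takes place.

```python
import numpy as np

def simulate_reduce_scatter_uneven(inputs, input_split_sizes, op="sum"):
    """Hypothetical single-host simulation; `inputs` holds one array per rank."""
    reducers = {"sum": np.sum, "min": np.min, "max": np.max}
    stacked = np.stack(inputs)  # shape: (world_size, N, *)
    if len(input_split_sizes) != len(inputs):
        raise ValueError("input_split_sizes needs one entry per rank")
    if stacked.shape[1] != sum(input_split_sizes):
        raise ValueError("first dimension must equal sum(input_split_sizes)")
    # Step 1: element-wise reduction across ranks.
    reduced = reducers[op](stacked, axis=0)
    # Step 2: split the first dimension unevenly; chunk i goes to rank i.
    boundaries = np.cumsum(input_split_sizes)[:-1]
    return np.split(reduced, boundaries, axis=0)

# Two simulated ranks, each contributing a (5, 8) tensor of ones.
inputs = [np.ones((5, 8), dtype=np.float32) for _ in range(2)]
outputs = simulate_reduce_scatter_uneven(inputs, [2, 3])
for rank, out in enumerate(outputs):
    print(f"rank {rank}: shape={out.shape}")
```

With two ranks and input_split_sizes of [2, 3], rank 0 receives the first two rows of the reduced tensor and rank 1 receives the remaining three, which mirrors the shapes used in the example at the end of this page.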

Parameters

output (Tensor) – The output tensor. It has the same dtype as input and a shape of \((input\_split\_sizes[rank], *)\), where rank is the local rank id of the device.
input (Tensor) – The input tensor to be reduced and scattered. Expected shape \((N, *)\), where * means any number of additional dimensions. N must equal the sum of input_split_sizes across ranks.
input_split_sizes (list[int], optional) – List specifying how to split the first dimension of input tensor. If None, splits evenly according to group size. Default: None.
op (str, optional) – The element-wise reduction operation to apply. One of the ReduceOp values: 'SUM', 'MIN', 'MAX'. Default: ReduceOp.SUM.
group (str, optional) – The communication group to work on. If None, the default group is used, which is "hccl_world_group" on Ascend. Default: None.
async_op (bool, optional) – Whether this operator should be an async operator. Default: False.
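
As a rough aid for sizing the output tensor, the shape rule above can be written as a small helper. This is a sketch in plain Python: expected_output_shape is a hypothetical name, not part of MindSpore, and the divisibility check for the None case is an assumption about what "splits evenly" requires.

```python
def expected_output_shape(input_shape, rank, world_size, input_split_sizes=None):
    """Hypothetical helper: shape the output tensor should have on `rank`."""
    n = input_shape[0]
    if input_split_sizes is None:
        # Assumption: an even split needs N to be divisible by the group size.
        if n % world_size != 0:
            raise ValueError("N is not divisible by the group size")
        input_split_sizes = [n // world_size] * world_size
    if len(input_split_sizes) != world_size:
        raise ValueError("input_split_sizes needs one entry per rank")
    if sum(input_split_sizes) != n:
        raise ValueError("sum(input_split_sizes) must equal N")
    # Only the first dimension changes; trailing dimensions are kept.
    return (input_split_sizes[rank],) + tuple(input_shape[1:])

print(expected_output_shape((5, 8), rank=0, world_size=2, input_split_sizes=[2, 3]))
print(expected_output_shape((5, 8), rank=1, world_size=2, input_split_sizes=[2, 3]))
```

For a (5, 8) input split as [2, 3] over two ranks, the helper yields (2, 8) on rank 0 and (3, 8) on rank 1.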

Returns

CommHandle. If async_op is True, an async work handle is returned; if async_op is False, None is returned.

Raises

ValueError – If the shape of output does not match the constraints of input_split_sizes.
RuntimeError – If the device target is invalid, the backend is invalid, or distributed initialization fails.

Supported Platforms: Ascend

Examples

Note

Before running the following examples, you need to configure the communication environment variables.

For Ascend devices, it is recommended to use the msrun startup method, which has no third-party or configuration file dependencies. See the msrun startup documentation for more details.

This example should be run with 2 devices.

>>> import mindspore as ms
>>> from mindspore import Tensor
>>> from mindspore.mint.distributed import init_process_group, get_rank
>>> from mindspore.mint.distributed import reduce_scatter_tensor_uneven
>>> import numpy as np
>>>
>>> ms.set_device(device_target="Ascend")
>>> init_process_group()
>>> input_tensor = Tensor(np.ones([5, 8]).astype(np.float32))
>>> if get_rank() == 0:
...     output_tensor = Tensor(np.ones([2, 8]).astype(np.float32))
... else:
...     output_tensor = Tensor(np.ones([3, 8]).astype(np.float32))
>>> input_split_sizes = [2, 3]
>>> output = reduce_scatter_tensor_uneven(output_tensor, input_tensor, input_split_sizes)
>>> print(output_tensor)
rank 0:
[[2. 2. 2. 2. 2. 2. 2. 2.]
 [2. 2. 2. 2. 2. 2. 2. 2.]]
rank 1:
[[2. 2. 2. 2. 2. 2. 2. 2.]
 [2. 2. 2. 2. 2. 2. 2. 2.]
 [2. 2. 2. 2. 2. 2. 2. 2.]]