mindscience.distributed.manager.initialize_parallel
- mindscience.distributed.manager.initialize_parallel(tensor_parallel_size=1, context_parallel_size=1, order='tp-cp-dp')[source]
Initialize parallel communication groups for distributed training.
This function creates and initializes orthogonal communication groups used by the different model parallelisms (tensor, context, and data) in distributed training. It sets up the backend communication groups so that code can query group sizes, ranks, and names for each parallelism. The distributed backends required by MindSpore communication services must be initialized before calling this function.
- Parameters
  - tensor_parallel_size (int, optional) – Size of tensor parallelism. Default: 1.
  - context_parallel_size (int, optional) – Size of context parallelism. Default: 1.
  - order (str, optional) – A dash-separated string specifying the ordering of dimensions when computing orthogonal partitions, e.g. "tp-cp-dp". The order determines how the world ranks are decomposed into multi-dimensional indices used to form groups. Default: "tp-cp-dp".
- Raises
RuntimeError – If the world size is not divisible by the product of the parallel sizes (tensor_parallel_size × context_parallel_size).
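
A minimal usage sketch is shown below. It assumes the module path mindscience.distributed.manager documented above and an 8-device job; the specific parallel sizes are illustrative and should be adjusted to your own world size.

```python
# Initialize the MindSpore communication backend first (e.g. HCCL/NCCL),
# as required before calling initialize_parallel.
from mindspore.communication import init
from mindscience.distributed.manager import initialize_parallel

init()

# Illustrative split of 8 ranks: tensor-parallel groups of 2 and
# context-parallel groups of 2; the remaining factor (8 / (2 * 2) = 2)
# becomes the data-parallel size.
initialize_parallel(
    tensor_parallel_size=2,
    context_parallel_size=2,
    order="tp-cp-dp",
)
```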