Distributed Parallel Startup Methods
Startup Method
Currently, the GPU, Ascend, and CPU platforms each support multiple startup methods. The three main ones are msrun, dynamic cluster, and mpirun:
msrun: msrun is an encapsulation of dynamic cluster. It lets users launch a distributed job with a single command on each node, and it is available as soon as MindSpore is installed. This method does not rely on third-party libraries or configuration files, supports disaster recovery, offers good security, and supports all three hardware platforms. Users are recommended to prefer this startup method.
Dynamic cluster: dynamic cluster requires the user to spawn multiple processes and export environment variables. It is the underlying implementation of msrun. Use this method when running in Parameter Server training mode; for other distributed jobs, msrun is recommended.
mpirun: this method relies on the open-source library OpenMPI, and its startup command is simple. Multi-machine jobs require password-free login between every pair of machines. It is recommended for users who already have experience with OpenMPI.
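To make the difference between msrun and dynamic cluster concrete, the following is a minimal sketch rather than an official launcher. It assumes the environment variable names `MS_WORKER_NUM`, `MS_SCHED_HOST`, `MS_SCHED_PORT`, and `MS_ROLE` (with the roles `MS_SCHED` and `MS_WORKER`) for dynamic cluster, and the msrun options `--worker_num`, `--local_worker_num`, and `--master_port`; verify them against the documentation of the installed MindSpore version. The helper names, the script name `net.py`, and the port number are hypothetical.

```python
def dynamic_cluster_envs(worker_num, sched_host="127.0.0.1", sched_port=8118):
    """Return one environment dict per process: one scheduler plus the workers.

    The variable names are assumptions taken from the dynamic cluster method;
    check them against your MindSpore version before relying on them.
    """
    common = {
        "MS_WORKER_NUM": str(worker_num),
        "MS_SCHED_HOST": sched_host,
        "MS_SCHED_PORT": str(sched_port),
    }
    roles = ["MS_SCHED"] + ["MS_WORKER"] * worker_num
    return [dict(common, MS_ROLE=role) for role in roles]


def msrun_command(worker_num, script="net.py", master_port=8118):
    """Build the single per-node msrun command that replaces manual spawning."""
    return [
        "msrun",
        f"--worker_num={worker_num}",
        f"--local_worker_num={worker_num}",
        f"--master_port={master_port}",
        script,  # hypothetical training script
    ]


if __name__ == "__main__":
    # Dynamic cluster: every process is started separately with its own
    # environment, for example via subprocess.Popen(..., env=...).
    for env in dynamic_cluster_envs(worker_num=2):
        print(env["MS_ROLE"], env["MS_SCHED_HOST"], env["MS_SCHED_PORT"])

    # msrun: one command on each node covers the same setup.
    print(" ".join(msrun_command(worker_num=2)))
```

The sketch only builds the environments and the command line; it does not start any process, so it can be inspected safely before being wired into a real launcher.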
Warning
The rank_table method has been deprecated in MindSpore 2.4.
The hardware support for the three startup methods is shown in the table below:
| | GPU | Ascend | CPU |
|---|---|---|---|
| msrun | Support | Support | Support |
| Dynamic cluster | Support | Support | Support |
| mpirun | Support | Support | Not support |
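The support matrix can also be expressed as a small lookup, which is convenient when a launch script has to pick a method per platform. This is an illustrative sketch only: the row names follow the method descriptions earlier in this section (msrun supports all three platforms, and mpirun is the remaining row), and the names `SUPPORT` and `methods_for` are hypothetical.

```python
# Support matrix transcribed from the table above (True means "Support").
SUPPORT = {
    "msrun": {"GPU": True, "Ascend": True, "CPU": True},
    "dynamic cluster": {"GPU": True, "Ascend": True, "CPU": True},
    "mpirun": {"GPU": True, "Ascend": True, "CPU": False},
}


def methods_for(platform):
    """Return the startup methods listed as supported on the given platform."""
    return [method for method, row in SUPPORT.items() if row[platform]]


if __name__ == "__main__":
    for platform in ("GPU", "Ascend", "CPU"):
        print(platform, methods_for(platform))
```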