mindspore.transform_checkpoints

mindspore.transform_checkpoints(src_checkpoints_dir, dst_checkpoints_dir, ckpt_prefix, src_strategy_file=None, dst_strategy_file=None)[source]

Transform the distributed checkpoints in a directory from a source sharding strategy to a destination sharding strategy. For more details about converting distributed checkpoints, please refer to Distributed Resilience Training and Inference.

Note

The src_checkpoints_dir directory structure should be organized like “src_checkpoints_dir/rank_0/a.ckpt”: each rank number maps to a subdirectory, and that rank's checkpoint file is stored inside it. If multiple checkpoint files exist in a rank subdirectory, the last file in lexicographic order is selected. An example layout is sketched below.
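For instance, with two source ranks the directory might be laid out as follows (file names are illustrative):

src_checkpoints_dir
├── rank_0
│   └── net_rank_0.ckpt
└── rank_1
    └── net_rank_1.ckpt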

Parameters
  • src_checkpoints_dir (str) – The source checkpoints directory.

  • dst_checkpoints_dir (str) – The destination checkpoints directory to save the converted checkpoints.

  • ckpt_prefix (str) – The destination checkpoint name prefix.

  • src_strategy_file (str) – Name of the source sharding strategy file, which is saved by mindspore.set_auto_parallel_context(strategy_ckpt_save_file). When src_strategy_file is None, the source sharding strategy is treated as unsharded for every parameter. See the sketch after this list for how such a file is produced. Default: None.

  • dst_strategy_file (str) – Name of the destination sharding strategy file, which is saved by mindspore.set_auto_parallel_context(strategy_ckpt_save_file). When dst_strategy_file is None, the destination sharding strategy is treated as unsharded for every parameter. Default: None.
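As a rough illustration of where these strategy files come from, the sketch below records the sharding strategy while the parallel context is configured; the file path is a placeholder:

>>> import mindspore as ms
>>> # Save the sharding strategy of the current training configuration.
>>> # Run this under the source configuration to obtain src_strategy.ckpt,
>>> # and under the destination configuration to obtain dst_strategy.ckpt.
>>> ms.set_auto_parallel_context(strategy_ckpt_save_file="./src_strategy.ckpt")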

Raises
  • ValueError – src_strategy_file or dst_strategy_file is incorrect.

  • NotADirectoryError – src_checkpoints_dir or dst_checkpoints_dir is not a directory.

  • ValueError – The checkpoint file is missing in src_checkpoints_dir.

  • TypeError – src_strategy_file or dst_strategy_file is not a string.

Examples

>>> import mindspore as ms
>>> ms.transform_checkpoints("./src_checkpoints", "./dst_checkpoints", "dst_checkpoint",
...                          "./src_strategy.ckpt", "./dst_strategy.ckpt")
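After the transformation, each rank loads its converted checkpoint as usual. A minimal follow-up sketch, assuming the converted file for rank 0 is written to a rank_0 subdirectory and named from ckpt_prefix (the exact output file name here is an assumption):

>>> import mindspore as ms
>>> # Assumed output layout: dst_checkpoints_dir/rank_0/dst_checkpoint0.ckpt
>>> param_dict = ms.load_checkpoint("./dst_checkpoints/rank_0/dst_checkpoint0.ckpt")
>>> # ms.load_param_into_net(net, param_dict)  # 'net' is the user's Cell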