Safetensors Weights
Overview
Safetensors is a reliable and portable machine-learning model storage format from Hugging Face, designed to store tensors safely and load them quickly (zero-copy). This article describes how MindSpore Transformers saves and loads weights in this format, helping users work with weights more easily and efficiently.
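As a quick illustration of the format itself, here is a minimal sketch using the Hugging Face safetensors Python package (the file name and tensor names are made up for the example):

from safetensors.numpy import save_file, load_file
import numpy as np

# Save two named tensors into a single .safetensors file
tensors = {
    "embedding.weight": np.zeros((8, 4), dtype=np.float32),
    "lm_head.weight": np.ones((4, 8), dtype=np.float32),
}
save_file(tensors, "example.safetensors")

# Load them back into a dict of numpy arrays keyed by tensor name
restored = load_file("example.safetensors")
print(restored["lm_head.weight"].shape)  # (4, 8)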
Safetensors Weights Samples
Safetensors files fall into two main types: complete weight files and distributed weight files. The following shows how each is obtained and what the corresponding files look like.
Complete Weights
Safetensors complete weights can be obtained in two ways:
Download them directly from Hugging Face.
Generate them with the merge script after MindSpore Transformers distributed training.
An example Hugging Face Safetensors directory structure is as follows:
qwen2_7b
└── hf_unified_safetensors
    ├── model-00001-of-00004.safetensors
    ├── model-00002-of-00004.safetensors
    ├── model-00003-of-00004.safetensors
    ├── model-00004-of-00004.safetensors
    └── model.safetensors.index.json  # JSON file mapping Hugging Face weight parameters to the files that store them
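For illustration, the index file maps each parameter name to the shard that stores it. Below is a minimal sketch of inspecting it with plain Python, assuming the standard Hugging Face index layout with a weight_map field (the path reuses the example directory above):

import json

# Open the Hugging Face index file from the example directory above
with open("qwen2_7b/hf_unified_safetensors/model.safetensors.index.json") as f:
    index = json.load(f)

# weight_map maps a parameter name to the shard file that contains it,
# e.g. "model.embed_tokens.weight" -> "model-00001-of-00004.safetensors"
for name, shard in list(index["weight_map"].items())[:5]:
    print(name, "->", shard)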
An example MindSpore Safetensors directory structure is as follows:
qwen2_7b
└── ms_unified_safetensors
    ├── model-00001-of-00004.safetensors
    ├── model-00002-of-00004.safetensors
    ├── model-00003-of-00004.safetensors
    ├── model-00004-of-00004.safetensors
    ├── hyper_param.safetensors  # Hyperparameter file recorded by the training task
    └── param_name_map.json  # JSON file mapping MindSpore weight parameters to the files that store them
Distributed Weights
Safetensors distributed weights can be obtained in two ways:
Generate them with MindSpore Transformers distributed training.
Convert existing distributed ckpt weights to the Safetensors format with the format conversion script.
An example distributed Safetensors directory structure is as follows:
qwen2_7b
└── distributed_safetensors
    ├── rank_0
    │   └── qwen2_7b_rank_0.safetensors
    ├── rank_1
    │   └── qwen2_7b_rank_1.safetensors
    ...
    └── rank_x
        └── qwen2_7b_rank_x.safetensors
Weight Saving
Overview
Saving model weights is a crucial step when training deep-learning models. The weight-saving function stores the model parameters at any stage of training, so users can resume training, evaluate, or deploy the model after training is interrupted or completed. Saved weights also allow experimental results to be reproduced in different environments.
Currently, MindSpore Transformers supports reading and saving weight files in the safetensors format.
Directory Structure
During training, MindSpore Transformers generates a weight-saving folder named checkpoint in the output directory (the same directory as the training logs, ./output by default).
If save_network_params: True is set in the yaml configuration file, an additional weight-saving folder checkpoint_network is also generated.
| Folder | Description |
|---|---|
| checkpoint | Saves the model weights, optimizer state, step, and epoch in safetensors files; can be used to resume training from a breakpoint. |
| checkpoint_network | Saves only the model weight parameters in safetensors files; suitable for subsequent fine-tuning, inference, and evaluation. Does not support resuming training from a breakpoint. |
checkpoint Directory Structure
Taking an 8-rank task as an example, the weight files in the output folder are saved in the following format:
output
├── checkpoint
│   ├── rank_0
│   │   ├── meta.json
│   │   └── {prefix}-{epoch}_{step}.safetensors
│   ...
│   └── rank_7
│       ├── meta.json
│       └── {prefix}-{epoch}_{step}.safetensors
└── checkpoint_network
    ├── rank_0
    │   └── {prefix}-{epoch}_{step}.safetensors
    ...
    └── rank_7
        └── {prefix}-{epoch}_{step}.safetensors
Weight-related File Description
| File | Description |
|---|---|
| meta.json | Records information about the most recently saved weights (such as the epoch, step, and weight file name), used when resuming training from a breakpoint. |
| {prefix}-{epoch}_{step}.safetensors | The saved weight file, named with the configured prefix and the epoch and step at which it was saved. |
Configuration and Usage
YAML Parameter Configuration
Users can control the weight-saving behavior by modifying the fields under CheckpointMonitor in the yaml configuration file.
Taking the DeepSeek-V3 pre-training yaml as an example, the following configuration can be used:
# callbacks
callbacks:
...
- type: CheckpointMonitor
prefix: "deepseekv3"
save_checkpoint_steps: 1000
keep_checkpoint_max: 5
save_network_params: False
integrated_save: False
async_save: False
checkpoint_format: "safetensors"
...
This configuration means: save safetensors weights every 1000 steps, keep at most 5 weight files at the same time, do not merge split tensors when saving in parallel scenarios, and do not save the weight files asynchronously.
The main parameters related to weight saving are listed in the following table:
| Parameter name | Description | Value description |
|---|---|---|
| prefix | The prefix of the model weight file name, which can be used to indicate the model name. | (str, optional) |
| save_checkpoint_steps | The number of training steps between weight saves. | (int, optional) |
| keep_checkpoint_max | The maximum number of weight files kept at the same time; when the limit is reached, the oldest weight file is deleted when new weights are saved. | (int, optional) |
| integrated_save | Whether to merge and save split Tensors in parallel scenarios. Merged saving is only supported in automatic parallel scenarios, not in manual parallel scenarios. | (bool, optional) |
| async_save | Whether to save safetensors files asynchronously. | (bool, optional) - When set to True, an asynchronous thread is used by default. |
| checkpoint_format | The format of the output weight file; needs to be configured as safetensors to save weights in the safetensors format. | (str, optional) - Format in which model weights are saved. Supports ckpt and safetensors. |
| remove_redundancy | Whether redundancy is removed when saving model weights. | (bool, optional) |
| save_network_params | Whether to additionally save only the network parameters. | (bool, optional) |
For more information about CheckpointMonitor, refer to the CheckpointMonitor API documentation.
Weight Loading
Overview
MindSpore Transformers supports weight loading for training, inference, and resumable training in both single-card and multi-card scenarios, covering complete weights and distributed weights. Refer to the following instructions to adjust the configuration for each scenario.
Configuration Description
| Parameter name | Description |
|---|---|
| load_checkpoint | The path to the folder containing the weights to preload. |
| load_ckpt_format | The format of the loaded model weights; either ckpt or safetensors. |
| use_parallel | Whether to load in parallel. |
| auto_trans_ckpt | Whether to enable the online slicing function. |
Complete Weight Loading
Single-card Loading
# configuration file
load_checkpoint: '/qwen2_7b/unified_safetensors' # Load full weights file path
load_ckpt_format: 'safetensors' # Load weight file format
auto_trans_ckpt: False # Full weights + single card loading requires this configuration item to be turned off
use_parallel: False # single card loading
parallel_config: # Configure the target distributed strategy
data_parallel: 1
model_parallel: 1
pipeline_stage: 1
Multi-card Loading
# configuration file
load_checkpoint: '/qwen2_7b/unified_safetensors' # Load full weights file path
load_ckpt_format: 'safetensors' # Load weight file format
auto_trans_ckpt: True # Full weights + distributed loading requires this configuration item to be turned on to enable online slicing
use_parallel: True # Multi-card loading
parallel_config: # Configure the target distributed strategy
data_parallel: 1
model_parallel: 4
pipeline_stage: 1
Distributed Weight Loading
Multi-card Loading - Original Slicing Strategy
# configuration file
load_checkpoint: '/output/distributed_safetensors' # Load source distributed weights file paths
load_ckpt_format: 'safetensors' # Load weight file format
auto_trans_ckpt: False # Disable the online slicing function
parallel_config: # Configure the target distributed strategy
data_parallel: 2
model_parallel: 4
pipeline_stage: 1
Multi-card Loading - Changing the Slicing Strategy
# configuration file
load_checkpoint: '/output/distributed_safetensors' # Load source distributed weights file paths
src_strategy_path_or_dir: '/output/src_strategy' # Load source strategy file for merging source distributed weights into full weights
load_ckpt_format: 'safetensors' # Load weight file format
auto_trans_ckpt: True # Enable the online slicing function
parallel_config: # Configure the target distributed strategy
data_parallel: 4
model_parallel: 2
pipeline_stage: 1
In large-cluster scenarios, to avoid the online merging process taking too long and occupying training resources, it is recommended to merge the original distributed weights into complete weights offline and pass those in instead; in that case there is no need to pass in the path of the source slicing strategy file.
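As a sketch of that offline flow, the merge script described in the Weight Merging section below can be run once beforehand (all paths here are placeholders):

# Merge the source distributed weights into complete weights offline
python toolkit/safetensors/unified_safetensors.py \
  --src_strategy_dirs /output/src_strategy \
  --mindspore_ckpt_dir /output/distributed_safetensors \
  --output_dir /output/unified_safetensors \
  --file_suffix "1_1"

load_checkpoint can then point to the merged output directory with auto_trans_ckpt: True, and src_strategy_path_or_dir no longer needs to be configured.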
Special Scenarios
Physical Machine Multi-machine Multi-card Training
Large-scale models usually need to be trained on clusters of multiple servers. Weight slicing conversion relies on the target slicing strategy file generated after compilation. In this multi-machine, multi-card scenario, if the servers share a disk and the generated strategy files are in the same directory, the automatic conversion function can be used; if there is no shared disk between servers, the strategy files must be copied manually before the conversion is carried out. The following uses training on two servers with 16 cards as an example.
Scenario 1: There are shared disks between servers
When the servers share a disk, you can use the MindSpore Transformers automatic weight conversion feature to perform weight conversion before multi-machine, multi-card training. Assume that /data is a shared disk on the servers and the MindSpore Transformers project code is located under the /data/mindformers path.
Parameter Configuration:
output_dir: './output' # The strategy file is generated under ./output/strategy, which is used to slice the weights online.
load_checkpoint: '/qwen2_7b/unified_safetensors' # Load full weights file path
load_ckpt_format: 'safetensors' # Load weight file format
auto_trans_ckpt: True # Full weights + distributed loading requires this configuration item to be turned on to enable online slicing
train_dataset: &train_dataset
data_loader:
type: MindDataset
dataset_dir: "/worker/dataset/wiki103/"
shuffle: True
parallel_config: # Configure the 16-card distributed strategy (for reference only)
data_parallel: 2
model_parallel: 4
pipeline_stage: 2
micro_batch_num: 2
vocab_emb_dp: True
gradient_aggregation_group: 4
micro_batch_interleave_num: 1
Initiating tasks:
Use mindformers/scripts/msrun_launcher.sh to initiate tasks.
# The first server (master node)
bash scripts/msrun_launcher.sh "run_mindformer.py \
--config {CONFIG_PATH} \
--run_mode train" \
16 8 ${ip} ${port} 0 output/msrun_log False 300
# The second server (sub-node)
bash scripts/msrun_launcher.sh "run_mindformer.py \
--config {CONFIG_PATH} \
--run_mode train" \
16 8 ${ip} ${port} 1 output/msrun_log False 300
Scenario 2: No shared disks between servers
In the case where there is no shared disk between servers, you need to merge the generated strategy files offline and distribute them before enabling the online slicing function. The following steps describe how to perform this operation and start a multi-machine, multi-card training task.
1. Getting Distributed Strategies
Before performing offline weight conversion, you first need to obtain the distributed strategy files of each node.
# Set only_save_strategy to True to generate the distributed strategy files; the task exits automatically after they are generated
only_save_strategy: True
# Configure dataset paths
train_dataset: &train_dataset
data_loader:
type: MindDataset
dataset_dir: "/worker/dataset/wikitext_2048/"
shuffle: True
# Configure the 16-card distributed strategy (for reference only)
parallel_config:
data_parallel: 2
model_parallel: 4
pipeline_stage: 2
micro_batch_num: 2
vocab_emb_dp: True
gradient_aggregation_group: 4
micro_batch_interleave_num: 1
The strategy files of each node are saved separately in that node's output/strategy directory. For example, node 0 saves only the ckpt_strategy_rank_0.ckpt to ckpt_strategy_rank_7.ckpt files, and node 1 saves only the ckpt_strategy_rank_8.ckpt to ckpt_strategy_rank_15.ckpt files. The strategy files of all nodes then need to be gathered on the same server for the subsequent operations; the directory and files after gathering are as follows.
output
└── strategy
    ├── ckpt_strategy_rank_0.ckpt
    ...
    ├── ckpt_strategy_rank_7.ckpt
    ├── ckpt_strategy_rank_8.ckpt
    ...
    └── ckpt_strategy_rank_15.ckpt
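A minimal sketch of gathering the files onto node 0 (the host name and paths are placeholders; any file transfer tool works):

# Run on node 0: copy node 1's strategy files into the same strategy directory
scp "node1:/path/to/mindformers/output/strategy/ckpt_strategy_rank_*.ckpt" ./output/strategy/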
2. Merging Distributed Strategies
Call the strategy merging interface to merge all the gathered strategy files into one file for subsequent weight slicing.
import mindspore as ms
ms.parallel.merge_pipeline_strategys("/output/strategy", "/output/merged_strategy/dst_strategy.ckpt")
3. Weight Slice Loading
Distribute strategy files + online slicing (recommended):
Distribute the merged strategy file dst_strategy.ckpt to the ./output/merged_strategy/ directory of each node, turn on automatic slicing, and start the training task again. The configuration file of each node needs to be modified.
output_dir: './output' # Make sure the ./output/merged_strategy/ directory on each node contains the merged strategy file
load_checkpoint: '/qwen2_7b/unified_safetensors' # Load full weights file path
load_ckpt_format: 'safetensors' # Load weight file format
auto_trans_ckpt: True # Full weights + distributed loading requires this configuration item to be turned on to enable online slicing
Offline slicing + distributing distributed weights:
Following the weight slicing guide, first slice the full weights offline into distributed weight files, distribute them to each machine, turn off automatic slicing, and configure load_checkpoint as the distributed weights path. The configuration file of each node needs to be modified.
Because distributed weight files are generally larger than strategy files and distributing them takes more time, the first approach is recommended.
load_checkpoint: '/output/distributed_safetensors' # Load distributed weights file path
load_ckpt_format: 'safetensors' # Load weight file format
auto_trans_ckpt: False # Distributed weight loading with online slicing turned off
4. Initiating tasks:
Use mindformers/scripts/msrun_launcher.sh to initiate tasks.
# The first server (master node)
bash scripts/msrun_launcher.sh "run_mindformer.py \
--config {CONFIG_PATH} \
--run_mode train" \
16 8 ${ip} ${port} 0 output/msrun_log False 300
# The second server (sub-node)
bash scripts/msrun_launcher.sh "run_mindformer.py \
--config {CONFIG_PATH} \
--run_mode train" \
16 8 ${ip} ${port} 1 output/msrun_log False 300
Weight Features
De-redundant Saving and Loading
Currently, when MindSpore Transformers saves weights, it by default saves multiple identical copies of weights within the data-parallel (dp) and optimizer-parallel (opt) domains, resulting in extra storage overhead. The following configuration and usage can enable dp/opt de-redundant saving and loading, effectively reducing storage pressure on large-scale clusters of a thousand cards or more. This feature only takes effect for distributed weights; complete weights do not involve de-redundancy.
Enable the following configuration when saving:
callbacks:
- type: CheckpointMonitor
checkpoint_format: safetensors # Save weights file format
remove_redundancy: True # Turn on de-redundancy when saving weights
The saved distributed weight files differ in size between ranks, and their total size is smaller than before the de-redundancy feature is turned on:
output
└── checkpoint
    ├── rank_0
    │   └── example-1_1.safetensors  # file size: 5.2G
    ├── rank_1
    │   └── example-1_1.safetensors  # file size: 5.2G
    ...
    ├── rank_6
    │   └── example-1_1.safetensors  # file size: 4.1G
    └── rank_7
        └── example-1_1.safetensors  # file size: 4.1G
Turn on the following configuration when loading:
load_ckpt_format: 'safetensors' # Load weight file format
remove_redundancy: True # Turn on de-redundancy when loading weights
In MindSpore Transformers version 1.5.0 and below, accuracy anomalies may occur if the de-redundancy configuration used for saving differs from the one used for loading, so make sure the configuration is correct. Version 1.5.0 and above automatically identifies whether the weights are de-redundant and loads them accordingly, so the loading configuration needs no special attention.
Loading Hugging Face safetensors
Add the pretrained_model_dir field in the configuration file to specify a folder that stores all the model files downloaded from Hugging Face (including config.json, the tokenizer, weight files, etc.); the model configuration and tokenizer are then instantiated directly and the Hugging Face weights are loaded.
Taking Qwen3 as an example, the fields configured in the YAML file have the following meaning: the folder specified by pretrained_model_dir stores the Qwen3 model configuration file, tokenizer files, and weight files from Hugging Face.
use_legacy: False
load_checkpoint: ''
pretrained_model_dir: "/path/qwen3"
model:
model_config:
compute_dtype: "bfloat16"
layernorm_compute_dtype: "float32"
softmax_compute_dtype: "float32"
rotary_dtype: "bfloat16"
params_dtype: "bfloat16"
generation:
max_length: 30
Parameter Descriptions:
use_legacy - Set this parameter to False to enable Hugging Face loading.
load_checkpoint - User-defined weight loading path; high priority.
pretrained_model_dir - Hugging Face weight path; low priority.
load_checkpoint has the higher priority when selecting the weight path. When this parameter is configured, the weight files under the pretrained_model_dir path are not loaded.
When load_checkpoint is not configured, safetensors weight files under the pretrained_model_dir path are loaded if they exist; otherwise, the weights are randomly initialized.
This feature currently only supports Qwen3 series and DeepSeek V3 series models in fine-tuning/inference scenarios, and is being continuously updated.
Weight Slicing and Merging
Overview
In a distributed training or inference environment, when users need to change the distributed strategy, the existing distributed weights must first be merged into complete weights before weight loading is completed by online or offline slicing. To meet weight conversion needs in different scenarios, the following scripts and interfaces can be used to merge multi-card weights into single-card weights and to slice single-card weights into multi-card weights.
Weight Merging
Usage Directions
Use the safetensors weight merging script provided by MindSpore Transformers to merge safetensors weights as follows. The merged weights are in the complete-weights format.
python toolkit/safetensors/unified_safetensors.py \
--src_strategy_dirs src_strategy_path_or_dir \
--mindspore_ckpt_dir mindspore_ckpt_dir \
--output_dir output_dir \
--file_suffix "1_1" \
--has_redundancy has_redundancy
Parameter Descriptions
src_strategy_dirs: The path to the distributed strategy files corresponding to the source weights, usually saved by default in the output/strategy/ directory after a training task is started. For distributed weights, fill it in as follows:
Source weights with pipeline parallelism enabled: the weight conversion is based on the merged strategy file, so fill in the path of the distributed strategy folder. The script automatically merges all ckpt_strategy_rank_x.ckpt files in the folder and generates merged_ckpt_strategy.ckpt in it. If merged_ckpt_strategy.ckpt already exists, you can fill in the path of that file directly.
Source weights with pipeline parallelism disabled: the weight conversion can be based on any one of the strategy files, so fill in the path of any ckpt_strategy_rank_x.ckpt file.
Note: If merged_ckpt_strategy.ckpt already exists in the strategy folder and the folder path is still passed in, the script first deletes the old merged_ckpt_strategy.ckpt and then merges a new merged_ckpt_strategy.ckpt for weight conversion. Make sure that the folder has sufficient write permissions, otherwise the operation will report an error.
mindspore_ckpt_dir: The distributed weights path. Fill in the path of the folder where the source weights are located; the source weights should be stored in the model_dir/rank_x/xxx.safetensors format, and the folder path filled in is model_dir.
output_dir: The path where the target weights are saved. The default value is "/new_llm_data/******/ckpt/nbg3_31b/tmp", i.e., the target weights are placed in the /new_llm_data/******/ckpt/nbg3_31b/tmp directory.
file_suffix: The naming suffix of the target weight files. The default value is "1_1", i.e., the target weights are looked up in the *1_1.safetensors format.
has_redundancy: Whether the source weights to be merged are redundant weights. Defaults to True.
filter_out_param_prefix: Parameters to be filtered out when merging weights can be customized; the filtering rule matches by prefix name, for example, the optimizer parameter prefix "adam_".
max_process_num: The maximum number of merging processes. Default: 64.
Samples
Scenario one:
If you are merging safetensors weights that were saved with redundancy removed, you can fill in the parameters as follows:
python toolkit/safetensors/unified_safetensors.py \
--src_strategy_dirs src_strategy_path_or_dir \
--mindspore_ckpt_dir mindspore_ckpt_dir \
--output_dir output_dir \
--file_suffix "1_1" \
--has_redundancy False
Scenario two:
If you want to filter out the Adam optimizer parameters when merging safetensors weights, you can fill in the parameters as follows:
python toolkit/safetensors/unified_safetensors.py \
--src_strategy_dirs src_strategy_path_or_dir \
--mindspore_ckpt_dir mindspore_ckpt_dir \
--output_dir output_dir \
--file_suffix "1_1" \
--filter_out_param_prefix "adam_"
Weight Slicing
Usage Directions
Use the strategy merging interface and the slice-saving interface provided by MindSpore to slice and save safetensors weights offline, as follows. The sliced weights are in the distributed-weights format.
import mindspore as ms
# step1: Merge target slicing strategy document
ms.parallel.merge_pipeline_strategys("output/strategy", "output/merged_strategy/dst_strategy.ckpt")
# step2: Based on the merged target slicing strategy and the complete weights, the weights are sliced and saved as distributed weights
ms.load_distributed_checkpoint(
network=None,
predict_strategy='output/merged_strategy/dst_strategy.ckpt',
unified_safetensors_dir='/path/unified_safetensors',
dst_safetensors_dir='/path/distributed_safetensors',
format='safetensors',
max_process_num=64
)
Parameter Descriptions
network (Cell) - The distributed inference network. When format is safetensors, network is passed as None, in which case the interface runs in save mode.
predict_strategy (Union[dict, str]) - The target slicing strategy file. Default: None.
unified_safetensors_dir (str) - The directory of the complete weight files. Default: None.
dst_safetensors_dir (str) - The save directory for the weights in the save-mode scenario.
max_process_num (int) - The maximum number of processes. Default: 64.
Weights Format Conversion
Converting Ckpt to Safetensors
The existing MindSpore Transformers weight files are in ckpt format; they can be converted to safetensors files in the following two ways.
Interface Calling
Call the MindSpore format conversion interface:
import mindspore as ms
ms.ckpt_to_safetensors("./ckpt_save_path/rank0/checkpoint_0.ckpt", "./output/safetensors_path/")
#Parameter descriptions
#file_path (str) - Path to directory containing checkpoint files or path to individual checkpoint files (.ckpt)
#save_path (str, optional) - Path to the directory where safetensors files are stored. Default: None
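Since MindSpore Transformers distributed ckpt weights are stored one folder per rank, a hedged sketch for converting every rank folder in one pass (the directory paths are placeholders) could look like this:

import os
import mindspore as ms

src_root = "./ckpt_save_path"            # contains rank_0, rank_1, ... folders with .ckpt files
dst_root = "./output/safetensors_path"   # destination for the converted .safetensors files

for rank_dir in sorted(os.listdir(src_root)):
    if not rank_dir.startswith("rank_"):
        continue
    # Convert all checkpoint files under this rank folder, keeping the rank_x layout
    ms.ckpt_to_safetensors(
        file_path=os.path.join(src_root, rank_dir),
        save_path=os.path.join(dst_root, rank_dir),
    )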
Training Tasks
Start a MindSpore Transformers training task after adjusting the configuration file; the conversion is achieved by loading weights in ckpt format and saving them in safetensors format.
load_checkpoint: 'output/checkpoint/' # Load weights file path
load_ckpt_format: 'ckpt' # The loaded weight file format is ckpt
callbacks:
- type: CheckpointMonitor
checkpoint_format: 'safetensors' # Save the weight files in safetensors format
Usage Example
Examples of Training Tasks
For multi-card online fine-tuning with complete weights, taking the Qwen2.5-7B model as an example, modify the configuration file finetune_qwen2_5_7b_8k.yaml:
# Modified configuration
load_checkpoint: '/qwen2.5_7b/hf_unified_safetensors' # Load weights file path
load_ckpt_format: 'safetensors' # Load weights file format
auto_trans_ckpt: True # This configuration item needs to be turned on for complete weights to enable the online slicing feature
parallel_config: # Configure the target distributed strategy
data_parallel: 2
model_parallel: 4
pipeline_stage: 1
callbacks:
- type: CheckpointMonitor
checkpoint_format: safetensors # Save weights file format
For multi-card online fine-tuning with distributed weights, taking the Qwen2.5-7B model as an example, modify the configuration file finetune_qwen2_5_7b_8k.yaml:
# Modified configuration
load_checkpoint: '/qwen2.5_7b/distributed_safetensors' # Load weights file path
load_ckpt_format: 'safetensors' # Load weights file format
parallel_config: # Configure the target distributed strategy
data_parallel: 2
model_parallel: 4
pipeline_stage: 1
callbacks:
- type: CheckpointMonitor
checkpoint_format: safetensors # Save weights file format
After the modification is complete, execute the command:
bash scripts/msrun_launcher.sh "run_mindformer.py \
--config research/qwen2_5/finetune_qwen2_5_7b_8k.yaml \
--train_dataset_dir /{path}/alpaca-data.mindrecord \
--register_path research/qwen2_5 \
--use_parallel True \
--run_mode finetune" 8
After the task is executed, a checkpoint folder is generated under the mindformers/output directory, and the model weight files are saved in that folder.
For more details, please refer to Introduction to SFT fine-tuning and Introduction to Pre-training.
Example of an Inference Task
For multi-card online inference with complete weights, taking the Qwen2.5-7B model as an example, modify the configuration file predict_qwen2_5_7b_instruct.yaml:
# Modified configuration
load_checkpoint: '/qwen2.5_7b/hf_unified_safetensors' # Load weights file path
load_ckpt_format: 'safetensors' # Load weights file format
auto_trans_ckpt: True # This configuration item needs to be turned on for complete weights to enable the online slicing function
parallel_config:
data_parallel: 1
model_parallel: 2
pipeline_stage: 1
For multi-card online inference with distributed weights, taking the Qwen2.5-7B model as an example, modify the configuration file predict_qwen2_5_7b_instruct.yaml:
# Modified configuration
load_checkpoint: '/qwen2.5_7b/distributed_safetensors' # Load weights file path
load_ckpt_format: 'safetensors' # Load weights file format
parallel_config:
data_parallel: 1
model_parallel: 2
pipeline_stage: 1
After the modification is complete, execute the command:
bash scripts/msrun_launcher.sh "python run_mindformer.py \
--config research/qwen2_5/predict_qwen2_5_7b_instruct.yaml \
--run_mode predict \
--use_parallel True \
--register_path research/qwen2_5 \
--predict_data 'I love Beijing, because'" \
2
The result of executing the above inference command is as follows:
'text_generation_text': [I love Beijing, because it is a city with a long history and culture.......]
For more details, please refer to: Introduction to Inference
Examples of Resumable Training Tasks after Breakpoints
MindSpore Transformers supports step-level resumable training after a breakpoint: the model's checkpoints are saved during training, and after an interruption the saved checkpoints can be loaded to restore the previous state and continue training.
For multi-card resumable training with distributed weights without changing the slicing strategy, modify the configuration items and restart the original training task:
# Modified configuration
load_checkpoint: '/output/checkpoint' # Load source distributed weights file path
load_ckpt_format: 'safetensors' # Load weights file format
resume_training: True # Switch to enable resumable training after a breakpoint
callbacks:
- type: CheckpointMonitor
checkpoint_format: safetensors # Save weights file format
For multi-card resumable training with distributed weights where the slicing strategy is changed, the path of the source slicing strategy file must be passed in; modify the configuration items and restart the original training task:
# Modified configuration
load_checkpoint: '/output/checkpoint' # Load source distributed weights file path
src_strategy_path_or_dir: '/output/src_strategy' # Load source strategy file for merging source distributed weights into full weights
load_ckpt_format: 'safetensors' # Load weights file format
auto_trans_ckpt: True # Enable online slicing
resume_training: True # Switch to enable resumable training after a breakpoint
parallel_config: # Configure the target distributed strategy
data_parallel: 2
model_parallel: 4
pipeline_stage: 1
callbacks:
- type: CheckpointMonitor
checkpoint_format: safetensors # Save weights file format
For more details, please refer to: Introduction to Breakpoints.