Start Tasks
Overview
MindSpore Transformers provides a one-click startup script `run_mindformer.py` and a distributed task launch script `msrun_launcher.sh`.

The `run_mindformer.py` script is used to start tasks on a single device, providing one-click capabilities for pre-training, fine-tuning, and inference tasks. The `msrun_launcher.sh` script is used to start distributed tasks on multiple devices within a single node or across multiple nodes, launching the task on each device through the msrun tool.
Run_mindformer One-click Start Script
In the root directory of the MindSpore Transformers code, execute the `run_mindformer.py` script with Python to start a task. The parameters supported by the script are listed below. When an optional parameter is not set or is set to `None`, the value of the configuration item with the same name in the YAML configuration file is used.
Basic Parameters
| Parameters | Parameter Descriptions | Value Description | Applicable Scenarios |
|---|---|---|---|
| `--config` | YAML config files. | str, required | pre-train/finetune/predict |
| `--mode` | Set the backend execution mode. | int, optional, `0` for GRAPH_MODE and `1` for PYNATIVE_MODE | pre-train/finetune/predict |
| `--device_id` | Set the execution device ID. The value must be within the range of available devices. | int, optional | pre-train/finetune/predict |
| `--device_target` | Set the backend execution device. MindSpore Transformers is only supported on `Ascend`. | str, optional | pre-train/finetune/predict |
| `--run_mode` | Set the running mode of the model: `train`, `finetune`, or `predict`. | str, optional | pre-train/finetune/predict |
| `--load_checkpoint` | File or folder paths for loading weights. For detailed usage, please refer to Weight Conversion Function. | str, optional | pre-train/finetune/predict |
| `--use_parallel` | Whether to use parallel mode. | bool, optional | pre-train/finetune/predict |
| | Override some settings in the used config; key-value pairs in xxx=yyy format are merged into the config file. This parameter has been deprecated and will be removed in the next version. | str, optional | pre-train/finetune/predict |
| `--output_dir` | Set the paths for saving logs, weights, sharding strategies, and other files. | str, optional | pre-train/finetune/predict |
| `--register_path` | The absolute path of the directory where the external code is located, for example, a model directory under the research directory. | str, optional | pre-train/finetune/predict |
| `--remote_save_url` | Remote save URL to which all output files are transferred and stored. This parameter has been deprecated and will be removed in the next version. | str, optional | pre-train/finetune/predict |
| `--seed` | Set the global seed. For details, refer to mindspore.set_seed. | int, optional | pre-train/finetune/predict |
| `--trust_remote_code` | Whether Hugging Face AutoTokenizer trusts remote code. | bool, optional | pre-train/finetune/predict |
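As a minimal sketch of how these options interact with the YAML file (the config path below is a placeholder, and the flag names follow the table above), any parameter passed on the command line overrides the item with the same name in the configuration file:

```shell
# Illustrative sketch: --run_mode, --device_id and --use_parallel override the YAML values;
# every option not given here is read from the YAML file. The config path is a placeholder.
python run_mindformer.py \
 --config /{path}/finetune_xxx.yaml \
 --run_mode finetune \
 --device_id 0 \
 --use_parallel False
```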
Weight Slicing
| Parameters | Parameter Descriptions | Value Description | Applicable Scenarios |
|---|---|---|---|
| `--src_strategy_path_or_dir` | The strategy of load_checkpoint. | str, optional | pre-train/finetune/predict |
| `--auto_trans_ckpt` | Enable online automatic weight conversion. Refer to Weight Conversion Function. | bool, optional | pre-train/finetune/predict |
| `--transform_process_num` | The number of processes responsible for checkpoint transformation. | int, optional | pre-train/finetune/predict |
| `--only_save_strategy` | Whether to only save the strategy files. | bool, optional; when it is `True`, the task exits directly after the strategy files are saved | pre-train/finetune/predict |
| | The path to the distributed strategy file to be loaded. This parameter has been deprecated and will be removed in the next version. | str, optional | pre-train/finetune/predict |
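As an illustration of the options above, the following sketch loads a complete (non-sharded) checkpoint into a distributed task and lets it be sliced online; the paths are placeholders and the flag names follow the assumptions in the table:

```shell
# Illustrative sketch: load a complete checkpoint and enable online automatic weight
# conversion so it is sliced to match the current distributed strategy.
bash scripts/msrun_launcher.sh "run_mindformer.py \
 --config /{path}/finetune_xxx.yaml \
 --run_mode finetune \
 --load_checkpoint /{path}/complete_model.ckpt \
 --auto_trans_ckpt True \
 --train_dataset_dir /{path}/dataset.mindrecord" 8
```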
Training
| Parameters | Parameter Descriptions | Value Description | Applicable Scenarios |
|---|---|---|---|
| `--do_eval` | Whether to evaluate during the training process. This parameter has been deprecated and will be removed in the next version. | bool, optional | pre-train/finetune |
| `--eval_dataset_dir` | Dataset directory of the data loader used for evaluation. This parameter has been deprecated and will be removed in the next version. | str, optional | pre-train/finetune |
| `--train_dataset_dir` | Dataset directory of the data loader used for pre-training/fine-tuning. | str, optional | pre-train/finetune |
| `--resume_training` | Enable resumable training after breakpoint. For details, refer to Resumable Training After Breakpoint. | bool, optional | pre-train/finetune |
| `--profile` | Whether to use profile analysis. This parameter has been deprecated and will be removed in the next version. | bool, optional | pre-train/finetune |
| `--epochs` | Number of training epochs. | int, optional | pre-train/finetune |
| `--batch_size` | The sample size of the batch data. | int, optional | pre-train/finetune |
| `--gradient_accumulation_steps` | The number of gradient accumulation steps. | int, optional | pre-train/finetune |
| `--sink_mode` | Whether to use data sink mode. This parameter has been deprecated and will be removed in the next version. | bool, optional | pre-train/finetune |
| `--num_samples` | Number of dataset samples used. | int, optional | pre-train/finetune |
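For example, a hedged sketch of resuming an interrupted fine-tuning run, assuming the `--resume_training` and `--load_checkpoint` names listed above (paths are placeholders):

```shell
# Illustrative sketch: resume fine-tuning from a previously saved checkpoint directory.
python run_mindformer.py \
 --config /{path}/finetune_xxx.yaml \
 --run_mode finetune \
 --use_parallel False \
 --train_dataset_dir /{path}/dataset.mindrecord \
 --load_checkpoint /{path}/output/checkpoint \
 --resume_training True
```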
Inference
| Parameters | Parameter Descriptions | Value Description | Applicable Scenarios |
|---|---|---|---|
| `--predict_data` | Input data for inference. | str, optional; it can be the input text for prediction (single-batch inference) or the path of a txt file containing multiple lines of text (multi-batch inference) | predict |
| `--modal_type` | Modal type of the input data for prediction. This parameter has been deprecated and will be removed in the next version. | str, optional | predict |
| `--adapter_id` | LoRA ID for prediction. This parameter has been deprecated and will be removed in the next version. | str, optional | predict |
| `--predict_batch_size` | The batch size for multi-batch inference. | int, optional | predict |
| `--do_sample` | Whether to use random sampling when selecting tokens for inference. | bool, optional; `True` uses random sampling, `False` uses greedy search | predict |
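For example, a sketch of multi-batch inference that reads one prompt per line from a txt file, assuming the parameter names above (paths are placeholders):

```shell
# Illustrative sketch: multi-batch inference; input_prompts.txt contains one prompt per line.
python run_mindformer.py \
 --config /{path}/predict_xxx.yaml \
 --run_mode predict \
 --use_parallel False \
 --predict_data /{path}/input_prompts.txt \
 --predict_batch_size 4 \
 --do_sample False
```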
Distributed Task Launch Script
The distributed task launch script msrun_launcher.sh is located in the scripts/ directory. Based on the input parameters, it automatically starts distributed multi-process tasks using the msrun command. The script can be used in the following ways:
For Default 8 Devices In Single Machine:

```shell
bash msrun_launcher.sh [EXECUTE_ORDER]
```

For Quick Start On Multiple Devices In Single Machine:

```shell
bash msrun_launcher.sh [EXECUTE_ORDER] [WORKER_NUM]
```

For Multiple Devices In Single Machine:

```shell
bash msrun_launcher.sh [EXECUTE_ORDER] [WORKER_NUM] [MASTER_PORT] [LOG_DIR] [JOIN] [CLUSTER_TIME_OUT]
```

For Multiple Devices In Multiple Machines:

```shell
bash msrun_launcher.sh [EXECUTE_ORDER] [WORKER_NUM] [LOCAL_WORKER] [MASTER_ADDR] [MASTER_PORT] [NODE_RANK] [LOG_DIR] [JOIN] [CLUSTER_TIME_OUT]
```
The parameter descriptions of the script are as follows:
| Parameters | Parameter Descriptions | Value Description |
|---|---|---|
| EXECUTE_ORDER | The parameters of the Python script command to be executed in a distributed manner. | str, required, set it to a string containing the Python script to be executed and its script parameters |
| WORKER_NUM | The total number of Worker processes participating in the distributed task. | int, optional, default: `8` |
| LOCAL_WORKER | The number of Worker processes started on the current node. | int, optional, default: `8` |
| MASTER_ADDR | Specifies the IP address or hostname of the Scheduler. | str, optional, default: `"127.0.0.1"` |
| MASTER_PORT | Specifies the port number bound by the Scheduler. | int, optional, default: `8118` |
| NODE_RANK | The index of the current node. | int, optional, default: `0` |
| LOG_DIR | Worker and Scheduler log output paths. | str, optional, default: `output/msrun_log` |
| JOIN | Whether msrun waits for the Worker and Scheduler processes to exit. | bool, optional, default: `False` |
| CLUSTER_TIME_OUT | Cluster networking timeout in seconds. | int, optional, default: `7200` |
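For example, the single-node multi-device form above can be filled in as follows (an illustrative sketch; the config path and device count are placeholders):

```shell
# Illustrative sketch: launch a 4-device task on a single node with an explicit port,
# log directory, non-blocking JOIN, and a 300-second cluster networking timeout.
bash scripts/msrun_launcher.sh "run_mindformer.py \
 --config /{path}/task_config.yaml \
 --run_mode finetune" \
 4 8118 output/msrun_log False 300
```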
Task Startup Tutorial
Next, taking the fine-tuning of Qwen2.5-0.5B as an example, we will explain the usage of single-device, single-node, and multi-node tasks.
Single-Device
Execute the Python script in the root directory of the MindSpore Transformers code to perform single-device fine-tuning. The path in the command needs to be replaced with the real path.
```shell
python run_mindformer.py \
 --register_path research/qwen2_5 \
 --config research/qwen2_5/finetune_qwen2_5_0_5b_8k.yaml \
 --use_parallel False \
 --run_mode finetune \
 --train_dataset_dir ./path/alpaca-data.mindrecord
```
Single-Node
Execute the msrun startup script in the root directory of the MindSpore Transformers code to perform single-node fine-tuning. The path in the command needs to be replaced with the real path.
bash scripts/msrun_launcher.sh "run_mindformer.py \
--register_path research/qwen2_5 \
--config research/qwen2_5/finetune_qwen2_5_0_5b_8k.yaml \
--run_mode finetune \
--train_dataset_dir ./path/alpaca-data.mindrecord "
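If fewer than 8 devices are to be used on the node, the quick start form can pass the number of Worker processes explicitly, for example with 4 devices (an illustrative sketch; this also requires the parallel configuration in the YAML file to match 4 devices):

```shell
# Illustrative sketch: launch the same fine-tuning task on 4 devices of a single node.
bash scripts/msrun_launcher.sh "run_mindformer.py \
 --register_path research/qwen2_5 \
 --config research/qwen2_5/finetune_qwen2_5_0_5b_8k.yaml \
 --run_mode finetune \
 --train_dataset_dir ./path/alpaca-data.mindrecord" 4
```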
Multi-Node
Take Qwen2.5-0.5B as an example to perform 2-node 16-device fine-tuning.
1. Modify the corresponding config file `research/qwen2_5/finetune_qwen2_5_0_5b_8k.yaml` based on information such as the number of nodes used:

   ```yaml
   parallel_config:
     data_parallel: 16
     ...
   ```

   If the number of nodes or the number of devices changes, `data_parallel`, `model_parallel`, and `pipeline_stage` need to be modified to match the actual number of running devices: `device_num = data_parallel × model_parallel × pipeline_stage`. For example, 2 nodes with 16 devices in total can be configured as `data_parallel: 16`, `model_parallel: 1`, `pipeline_stage: 1`, since 16 = 16 × 1 × 1. Meanwhile, `micro_batch_num >= pipeline_stage` must hold.

2. Execute the msrun startup script:

   For distributed tasks started on multiple nodes and multiple devices, the script must be run on each node separately, with the parameter `MASTER_ADDR` set to the IP address of the primary node. The address is the same on all nodes; only the parameter `NODE_RANK` differs between nodes.

   ```shell
   # Node 0. Set the IP address of node 0 as {master_addr}; it serves as the primary node.
   # There are 16 devices in total, with 8 devices on each node.
   bash scripts/msrun_launcher.sh "run_mindformer.py \
    --register_path research/qwen2_5 \
    --config research/qwen2_5/finetune_qwen2_5_0_5b_8k.yaml \
    --train_dataset_dir /{path}/wiki4096.mindrecord \
    --run_mode finetune" \
    16 8 {master_addr} 8118 0 output/msrun_log False 300

   # Node 1. Set the IP address of node 0 as {master_addr}; it serves as the primary node.
   # The startup commands of node 0 and node 1 differ only in the parameter NODE_RANK.
   bash scripts/msrun_launcher.sh "run_mindformer.py \
    --register_path research/qwen2_5 \
    --config research/qwen2_5/finetune_qwen2_5_0_5b_8k.yaml \
    --train_dataset_dir /{path}/wiki4096.mindrecord \
    --run_mode finetune" \
    16 8 {master_addr} 8118 1 output/msrun_log False 300
   ```