Supervised Fine-Tuning (SFT)
Overview
SFT (Supervised Fine-Tuning) applies the idea of supervised learning to a pre-trained model: some or all of the model's parameters are adjusted on labeled task data so that the model better adapts to a specific task or dataset.
MindSpore Transformers supports two SFT methods: full-parameter fine-tuning and LoRA fine-tuning. Full-parameter fine-tuning updates all parameters during training; it suits adaptation on large amounts of data and offers the best task adaptability, but requires substantial computational resources. LoRA fine-tuning updates only a small set of injected low-rank parameters, so it consumes less memory and trains faster than full-parameter fine-tuning, though its quality may fall short on some tasks.
Basic Process of SFT Fine-Tuning
In practice, SFT fine-tuning can be broken down into the following steps:
1. Weight Preparation
Before fine-tuning, the weight files of the pre-trained model need to be prepared. MindSpore Transformers supports loading safetensors weights, enabling direct loading of model weights downloaded from the Hugging Face model hub.
2. Dataset Preparation
MindSpore Transformers currently supports datasets in Hugging Face format and MindRecord format for the fine-tuning phase. Users can prepare data according to task requirements.
3. Configuration File Preparation
Fine-tuning tasks are uniformly controlled through configuration files, allowing users to flexibly adjust model training hyperparameters. Additionally, fine-tuning performance can be optimized with distributed parallelism, memory optimizations, and other training features.
4. Launching the Training Task
MindSpore Transformers provides a one-click startup script to initiate fine-tuning tasks. During training, logs and visualization tools can be used to monitor the training process.
5. Model Saving
Checkpoints are saved periodically during training, and model weights are saved to a specified path upon completion. Weights can currently be saved in Safetensors or Ckpt format and used for resumed training or further fine-tuning (a quick way to inspect Safetensors weights is sketched after this list).
6. Fault Recovery
To handle exceptions such as training interruptions, MindSpore Transformers offers high-availability features such as last-state saving and automatic recovery, as well as checkpoint-based resumed training, improving training stability.
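As a quick companion to the weight-preparation and saving steps above, the following is a minimal sketch of inspecting a Safetensors weight file with the `safetensors` Python package (the file name is a placeholder; the snippet is illustrative rather than part of MindSpore Transformers):

```python
from safetensors import safe_open

# Inspect a Safetensors weight file without loading every tensor at once.
# "model.safetensors" is a placeholder path; point it at a real weight file.
with safe_open("model.safetensors", framework="np") as f:
    for name in f.keys():            # parameter names stored in the file
        tensor = f.get_tensor(name)  # loads only this tensor into memory
        print(name, tensor.shape, tensor.dtype)
```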
Full-Parameter Fine-Tuning with MindSpore Transformers
Selecting a Pre-Trained Model
MindSpore Transformers currently supports mainstream large-scale models in the industry. This guide uses the Qwen2.5-7B model as an example.
Downloading Model Weights
MindSpore Transformers supports loading Hugging Face model weights, enabling direct loading of weights downloaded from the Hugging Face model hub. For details, refer to MindSpore Transformers-Safetensors Weights.
| Model Name | Hugging Face Weight Download Link |
|---|---|
| Qwen2.5-7B | [Qwen/Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B) |
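The weights can also be fetched ahead of time with the `huggingface_hub` package; this sketch assumes `huggingface_hub` is installed and that the local directory is the one later referenced in the YAML configuration:

```python
from huggingface_hub import snapshot_download

# Download the Qwen2.5-7B weights and tokenizer files from the
# Hugging Face model hub into a local directory.
snapshot_download(
    repo_id="Qwen/Qwen2.5-7B",
    local_dir="/path/to/Qwen2.5-7B",  # reuse this path in the config file
)
```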
Dataset Preparation
MindSpore Transformers supports online loading of Hugging Face datasets. For details, refer to MindSpore Transformers-Dataset-Hugging Face Dataset.
This guide uses llm-wizard/alpaca-gpt4-data as the fine-tuning dataset.
| Dataset Name | Applicable Phase | Download Link |
|---|---|---|
| llm-wizard/alpaca-gpt4-data | Fine-Tuning | [llm-wizard/alpaca-gpt4-data](https://huggingface.co/datasets/llm-wizard/alpaca-gpt4-data) |
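To preview the data before launching a job, the dataset can be loaded directly with the `datasets` library (an illustrative sketch; assumes `datasets` is installed and the Hugging Face hub is reachable):

```python
from datasets import load_dataset

# Load the alpaca-gpt4 dataset and print one record to check its structure.
ds = load_dataset("llm-wizard/alpaca-gpt4-data", split="train")
print(ds[0])  # records typically carry "instruction", "input", and "output" fields
```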
Executing the Fine-Tuning Task
Single-NPU Training
First, prepare the configuration file. This guide provides a fine-tuning configuration file for the Qwen2.5-7B model, `finetune_qwen2_5_7b_8k_1p.yaml`, available for download from the Gitee repository.

Due to limited single-NPU memory, `num_layers` in the configuration file is set to 4; this serves as an example only.
Then, modify the parameters in the configuration file based on actual conditions, mainly including:
```yaml
load_checkpoint: '/path/to/Qwen2.5-7B/'  # Path to the pre-trained model weight folder
...
train_dataset: &train_dataset
  ...
  data_loader:
    ...
    handler:
      - type: AlpacaInstructDataHandler
        tokenizer:
          vocab_file: "/path/to/Qwen2.5-7B/vocab.json"   # Path to the vocabulary file
          merges_file: "/path/to/Qwen2.5-7B/merges.txt"  # Path to the merges file
```
Run `run_mindformer.py` to start the single-NPU fine-tuning task. The command is as follows:
```shell
python run_mindformer.py \
 --config /path/to/finetune_qwen2_5_7b_8k_1p.yaml \
 --register_path research/qwen2_5 \
 --use_parallel False \
 --run_mode finetune
```
Parameter descriptions:

- `config`: Path to the model configuration file
- `use_parallel`: Whether to enable parallel training
- `run_mode`: Running mode: `train` (training), `finetune` (fine-tuning), or `predict` (inference)
Single-Node Training
First, prepare the configuration file. This guide provides a fine-tuning configuration file for the Qwen2.5-7B model, `finetune_qwen2_5_7b_8k.yaml`, available for download from the Gitee repository.
Then, modify the parameters in the configuration file based on actual conditions, mainly including:
```yaml
load_checkpoint: '/path/to/Qwen2.5-7B/'  # Path to the pre-trained model weight folder
...
train_dataset: &train_dataset
  ...
  data_loader:
    ...
    handler:
      - type: AlpacaInstructDataHandler
        tokenizer:
          vocab_file: "/path/to/Qwen2.5-7B/vocab.json"   # Path to the vocabulary file
          merges_file: "/path/to/Qwen2.5-7B/merges.txt"  # Path to the merges file
```
Run the following msrun startup script for 8-NPU distributed training:
```shell
bash scripts/msrun_launcher.sh "run_mindformer.py \
 --register_path research/qwen2_5 \
 --config /path/to/finetune_qwen2_5_7b_8k.yaml \
 --use_parallel True \
 --run_mode finetune" 8
```
Parameter descriptions:

- `config`: Path to the model configuration file
- `use_parallel`: Whether to enable parallel training
- `run_mode`: Running mode: `train` (training), `finetune` (fine-tuning), or `predict` (inference)
After task completion, a `checkpoint` folder will be generated in the `mindformers/output` directory, and the model files will be saved in this folder.
Multi-Node Training
Multi-node, multi-NPU fine-tuning tasks are launched in much the same way as pre-training; refer to the multi-node, multi-NPU pre-training commands.
First, modify the configuration file, adjusting the parallel settings to the number of nodes and NPUs; the product of the parallel dimensions must equal the total number of NPUs (see the sanity-check sketch after this snippet):
```yaml
parallel_config:
  data_parallel: ...
  model_parallel: ...
  pipeline_stage: ...
  context_parallel: ...
```
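As an illustration of that constraint, here is a small hypothetical helper (not part of MindSpore Transformers) that checks a candidate split for two 8-NPU nodes:

```python
# Hypothetical helper: verify that a parallel split matches the NPU count.
def check_parallel_config(data_parallel: int, model_parallel: int,
                          pipeline_stage: int, context_parallel: int,
                          num_npus: int) -> None:
    product = data_parallel * model_parallel * pipeline_stage * context_parallel
    assert product == num_npus, (
        f"parallel dimensions multiply to {product}, expected {num_npus}"
    )

# Example split for 2 nodes x 8 NPUs = 16 NPUs in total.
check_parallel_config(data_parallel=2, model_parallel=4,
                      pipeline_stage=2, context_parallel=1, num_npus=16)
```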
Modify the command as follows:

- Add the startup script parameter `--config /path/to/finetune_qwen2_5_7b_8k.yaml`; the pre-trained weights are loaded via the `load_checkpoint` path set in this file.
- Set `--run_mode finetune` in the startup script, where `run_mode` indicates the running mode: `train` (training), `finetune` (fine-tuning), or `predict` (inference).
After task completion, a `checkpoint` folder will be generated in the `mindformers/output` directory, and the model files will be saved in this folder.
LoRA Fine-Tuning with MindSpore Transformers
MindSpore Transformers supports configuration-driven LoRA fine-tuning, eliminating the need for per-model code adaptation. LoRA fine-tuning tasks can be run by modifying the model configuration in the full-parameter fine-tuning YAML file and adding a `pet_config` parameter-efficient fine-tuning section. Below is an example of the model configuration section of a YAML file for LoRA fine-tuning of the Qwen2.5-7B model, with detailed explanations of the `pet_config` parameters.
Introduction to LoRA Principles
LoRA significantly reduces the number of trainable parameters by expressing the update to the original model's weight matrix as a product of two low-rank matrices. For example, suppose a weight matrix \(W\) has dimensions \(m \times n\). With LoRA, its update is decomposed into two low-rank matrices \(A\) and \(B\), where \(A\) has dimensions \(m \times r\) and \(B\) has dimensions \(r \times n\) (\(r\) is much smaller than \(m\) and \(n\)). During fine-tuning, only these two low-rank matrices are updated, leaving the rest of the original model unchanged.
This approach not only drastically reduces the computational cost of fine-tuning but also preserves the model’s original performance, making it particularly suitable for model optimization in environments with limited data or computational resources. For detailed principles, refer to the paper LoRA: Low-Rank Adaptation of Large Language Models.
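To make the shapes concrete, here is a framework-agnostic NumPy sketch of a LoRA-adapted linear layer following the notation above (the dimensions, initialization, and \(\alpha/r\) scaling convention are illustrative, not MindSpore Transformers internals):

```python
import numpy as np

m, n, r = 1024, 1024, 16  # W is m x n; the adapter rank r is much smaller
rng = np.random.default_rng(0)

W = rng.normal(size=(m, n))              # frozen pre-trained weight
A = rng.normal(scale=0.01, size=(m, r))  # trainable low-rank factor, m x r
B = np.zeros((r, n))                     # zero-init so the adapter starts as a no-op
alpha = 16                               # scaling factor (cf. lora_alpha below)

def lora_forward(x):
    # Equivalent to x @ (W + (alpha / r) * A @ B), computed without forming A @ B.
    return x @ W + (alpha / r) * (x @ A) @ B

x = rng.normal(size=(2, m))
print(lora_forward(x).shape)     # (2, n)
print(m * n, "vs", r * (m + n))  # 1048576 vs 32768 trainable parameters
```

Here only \(A\) and \(B\) would receive gradients, which is where the parameter savings shown on the last line come from.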
Modifying the Configuration File
Based on the full-parameter fine-tuning configuration file, add the LoRA-related parameters to the model configuration and save it as `finetune_qwen2_5_7b_8k_lora.yaml`. Below is an example configuration snippet showing how to add LoRA fine-tuning parameters for the Qwen2.5-7B model:
```yaml
# model config
model:
  model_config:
    ...
    pet_config:
      pet_type: lora
      lora_rank: 16
      lora_alpha: 16
      lora_dropout: 0.05
      target_modules: '.*wq|.*wk|.*wv|.*wo'
```
Detailed Explanation of pet_config Parameters
In `model_config`, `pet_config` is the core configuration section for LoRA fine-tuning, used to specify LoRA-related parameters. The parameters are explained as follows:
- `pet_type`: Specifies the type of Parameter-Efficient Tuning (PET) as LoRA. This means LoRA modules will be inserted into key layers of the model to reduce the number of parameters required for fine-tuning.
- `lora_rank`: Defines the rank of the low-rank matrices. A smaller rank results in fewer parameters to update, reducing computational resource usage. Setting it to 16 is a common balance point, significantly reducing the parameter count while maintaining model performance.
- `lora_alpha`: Controls the scaling factor for weight updates in the LoRA module. This value determines the magnitude and impact of weight updates during fine-tuning. Setting it to 16 indicates a moderate scaling factor, helping to stabilize the training process.
- `lora_dropout`: Sets the dropout probability in the LoRA module. Dropout is a regularization technique used to reduce the risk of overfitting. A value of 0.05 means there is a 5% chance of randomly "disabling" certain neural connections during training, which is particularly important when data is limited.
- `target_modules`: Specifies which weight matrices in the model LoRA will be applied to, using regular expressions. Here, the pattern applies LoRA to the Query (`wq`), Key (`wk`), Value (`wv`), and Output (`wo`) projection matrices in the self-attention mechanism. These matrices play critical roles in the Transformer architecture, and applying LoRA to them maintains model performance while reducing the parameter count.
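For intuition, the `target_modules` regular expression is matched against module names; the short check below uses hypothetical layer names purely to show which ones the pattern selects:

```python
import re

# The same pattern as target_modules in the YAML above.
pattern = re.compile(r".*wq|.*wk|.*wv|.*wo")

# Hypothetical layer names, only to illustrate the matching behavior.
names = [
    "model.layers.0.attention.wq",
    "model.layers.0.attention.wo",
    "model.layers.0.feed_forward.w1",
]
print([n for n in names if pattern.match(n)])
# -> ['model.layers.0.attention.wq', 'model.layers.0.attention.wo']
```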
LoRA Fine-Tuning Example for Qwen2.5-7B
The dataset used for LoRA fine-tuning can be prepared as described in the Dataset Preparation section of the full-parameter fine-tuning process.
For the Qwen2.5-7B model, the following msrun startup command can be executed for 8-NPU distributed fine-tuning:
```shell
bash scripts/msrun_launcher.sh "run_mindformer.py \
 --register_path research/qwen2_5 \
 --config /path/to/finetune_qwen2_5_7b_8k_lora.yaml \
 --use_parallel True \
 --run_mode finetune" 8
```