# Features Overview
MindSpore Transformers provides a wide range of features across the full workflow of pre-training, fine-tuning, inference, and deployment, enabling configurable development and optimization. This section groups all features into three categories for quick reference and navigation: General Features, Training Features, and Inference Features.
## General Features
Foundational capabilities shared by pre-training, fine-tuning, and inference, providing a consistent setup that can be reused across tasks.
| Feature Description | Architecture Support |
|---|---|
| One-click start for single-device, single-node, and multi-node tasks. | Mcore/Legacy |
| [Checkpoint 1.0] Supports conversion, slicing, and merging of weight files in ckpt format. | Legacy |
| [Checkpoint 1.0] Supports saving and loading weight files in safetensors format (a minimal save/load sketch follows this table). | Mcore/Legacy |
| Use YAML files to centrally manage and adjust configurable items in tasks (an illustrative configuration sketch follows this table). | Mcore/Legacy |
| Plug-and-play loading of Hugging Face community model configurations. | Mcore |
| Introduction to logs, including log structure and log saving. | Mcore/Legacy |
| Introduction to tokenizers; supports Hugging Face tokenizers in inference and datasets. | Mcore |
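The safetensors weight format mentioned above can be exercised on its own with the standalone `safetensors` package. The following is a minimal, self-contained sketch of saving and loading tensors in that format; the tensor names and shapes are illustrative assumptions, not taken from any MindSpore Transformers checkpoint.

```python
# Minimal sketch: saving and loading weights in safetensors format.
# Tensor names and shapes below are illustrative assumptions only.
import numpy as np
from safetensors.numpy import save_file, load_file

weights = {
    "embedding.weight": np.random.rand(1000, 64).astype(np.float32),
    "lm_head.weight": np.random.rand(64, 1000).astype(np.float32),
}

# Save all tensors into a single .safetensors file.
save_file(weights, "model.safetensors")

# Load them back; the result is a dict mapping tensor names to arrays.
restored = load_file("model.safetensors")
print({name: tensor.shape for name, tensor in restored.items()})
```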
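The YAML-based configuration entry centralizes a task's adjustable items in one file. Below is a small, runnable sketch of that idea using PyYAML; the keys shown (run_mode, model, training, parallel) are illustrative assumptions rather than the exact schema used by MindSpore Transformers.

```python
# Sketch of centralizing task settings in a YAML document parsed with PyYAML.
# The keys below are illustrative assumptions, not the real configuration schema.
import yaml

config_text = """
run_mode: train            # e.g. train, finetune, or predict
seed: 42
model:
  hidden_size: 4096
  num_layers: 32
training:
  learning_rate: 1.0e-4
  optimizer: adamw
parallel:
  data_parallel: 8
  model_parallel: 2
"""

config = yaml.safe_load(config_text)

# All configurable items live in one place and can be adjusted
# without touching the training code.
print(config["training"]["learning_rate"])
print(config["parallel"])
```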
## Training Features
Supports reliable, large-scale training and tuning of large models.
| Feature Description | Architecture Support |
|---|---|
| Supports multiple types and formats of datasets (Megatron, Hugging Face, MindRecord, etc.). | Mcore/Legacy |
| Flexibly configure hyperparameters (learning rate, optimizer, etc.) for large model training. | Mcore/Legacy |
| Visualization of the training phase to monitor and analyze metrics and information. | Mcore/Legacy |
| [Checkpoint 1.0] Step-level resumable training to reduce waste from unexpected interruptions. | Mcore/Legacy |
| [Checkpoint 2.0] Checkpoint saving and loading. | Mcore |
| [Checkpoint 2.0] Step-level resumable training, including scaling and incremental scenarios. | Mcore |
| End-of-life CKPT, UCE fault-tolerant recovery, and process-level rescheduling recovery. | Mcore |
| One-click multi-dimensional hybrid distributed parallelism for efficient training at scale. | Mcore/Legacy |
| Recomputation and fine-grained activation SWAP to reduce peak memory usage. | Mcore/Legacy |
| Data skip and checkpoint health monitoring for more robust training. | Mcore/Legacy |
| Merging of multiple checkpoints (PMA) and fused checkpoint saving. | Mcore |
| Gradient accumulation, gradient clipping, CPU affinity, MoE droprate, RoPE/SwiGLU fusion, etc. (a generic gradient accumulation sketch follows this table). | Mcore/Legacy |
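Gradient accumulation, listed in the last row above, updates the model only after summing gradients over several micro-batches, emulating a larger effective batch size. The following framework-agnostic NumPy sketch illustrates the idea only; it is not MindSpore Transformers' implementation, and the toy least-squares model and all names are assumptions.

```python
# Framework-agnostic sketch of gradient accumulation (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
w = np.zeros(4)                  # model parameters of a toy linear model
target = np.array([1.0, -2.0, 0.5, 3.0])
lr = 0.1                         # learning rate
accumulation_steps = 4           # micro-batches per optimizer update

def grad(w, x, y):
    """Gradient of 0.5 * (x @ w - y)**2 with respect to w."""
    return (x @ w - y) * x

accumulated = np.zeros_like(w)
for step in range(1, 401):
    x = rng.normal(size=4)
    y = x @ target               # synthetic, noise-free target
    accumulated += grad(w, x, y)

    # Apply the optimizer update only every `accumulation_steps` micro-batches,
    # using the averaged gradient to emulate a larger batch size.
    if step % accumulation_steps == 0:
        w -= lr * accumulated / accumulation_steps
        accumulated[:] = 0.0

print("learned parameters:", np.round(w, 2))
```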
## Inference Features
Covers inference and deployment scenarios, enabling trained models to be served efficiently in production.
| Feature Description | Architecture Support |
|---|---|
| Integrates MindSpore Golden Stick for a unified quantization inference workflow (a concept-level quantization sketch follows this table). | Mcore/Legacy |
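Quantization for inference stores weights at low precision (for example int8) together with scaling factors, trading a small accuracy loss for lower memory use and faster deployment. The NumPy sketch below shows only the underlying idea of symmetric per-tensor int8 quantization; it is not the MindSpore Golden Stick API, and all names are assumptions.

```python
# Concept-level sketch of symmetric per-tensor int8 weight quantization.
# Illustrative only; not the MindSpore Golden Stick API.
import numpy as np

def quantize_int8(w):
    """Quantize so that w is approximated by q * scale, with q in int8."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)
q, scale = quantize_int8(w)

# The int8 tensor needs about 4x less memory than float32,
# and the reconstruction error is bounded by roughly scale / 2.
print("max abs error:", np.abs(dequantize(q, scale) - w).max())
print("scale:", scale)
```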