Deprecated

This page belongs to the Static Graph (GRAPH_MODE) Implementation section and has been marked as deprecated. New features are primarily being developed in the "r2.0.0 Dynamic Graph Implementation" section. Please refer to the dynamic graph documentation first.

Features Overview

MindSpore Transformers provides features that cover the full workflow of pre-training, fine-tuning, inference, and deployment, and most of them can be configured rather than coded. This section groups the features into three categories (General Features, Training Features, and Inference Features) for quick reference and navigation.

General Features

Foundational capabilities shared by pre-training, fine-tuning, and inference, so that every task is set up in a consistent way.

Feature	Description	Architecture Support
Start Tasks	One-click start for single-device, single-node and multi-node tasks.	Mcore/Legacy
Ckpt Weights	[Checkpoint 1.0] Supports conversion, slicing and merging of weight files in ckpt format.	Legacy
Safetensors Weights	[Checkpoint 1.0] Supports saving and loading weight files in safetensors format.	Mcore/Legacy
Configuration File Descriptions	Use YAML files to centrally manage and adjust configurable items in tasks.	Mcore/Legacy
Loading Hugging Face Model Configuration	Plug-and-play loading of Hugging Face community model configurations.	Mcore
Logs	Introduction to logs, including log structure and log saving.	Mcore/Legacy
Using Tokenizer	Introduction to the tokenizer; supports using Hugging Face tokenizers in inference and in datasets.	Mcore

Training Features

Supports reliable training and tuning of large models at scale.

Feature	Description	Architecture Support
Dataset	Supports multiple types and formats of datasets (Megatron, Hugging Face, MindRecord, etc.).	Mcore/Legacy
Training Hyperparameters	Flexibly configure hyperparameters (learning rate, optimizer, etc.) for large model training.	Mcore/Legacy
Training Metrics Monitoring	Visualization tools for monitoring and analyzing metrics and related information during training.	Mcore/Legacy
Resumable Training After Breakpoint	[Checkpoint 1.0] Step-level resumable training to reduce waste from unexpected interruptions.	Mcore/Legacy
Checkpoint Saving and Loading	[Checkpoint 2.0] Checkpoint saving and loading.	Mcore
Resumable Training After Breakpoint 2.0	[Checkpoint 2.0] Step-level resumable training with scaling and incremental scenarios.	Mcore
Training High-Availability (Beta)	End-of-life CKPT, UCE fault-tolerant recovery, and process-level rescheduling recovery.	Mcore
Distributed Parallel Training	One-click multi-dimensional hybrid distributed parallelism for efficient training at scale.	Mcore/Legacy
Training Memory Optimization	Recomputation and fine-grained activation SWAP to reduce peak memory.	Mcore/Legacy
Data Skip and Health Monitoring	Data skip and checkpoint health monitoring for more robust training.	Mcore/Legacy
Pre-trained Model Average (PMA) Weight Merge	Merges multiple checkpoints (PMA) and saves the fused checkpoint.	Mcore
Other Training Features	Gradient accumulation, gradient clipping, CPU affinity, MoE DropRate, RoPE/SwiGLU fusion, etc.	Mcore/Legacy
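
As a rough illustration of two of the items listed under Other Training Features, the sketch below shows what gradient accumulation and gradient clipping compute. It is written in plain Python on lists of floats and does not use the MindSpore Transformers API; the function names `accumulate_gradients` and `clip_by_global_norm` are hypothetical and chosen only for this example. In practice these features are presumably enabled through the task configuration rather than written by hand.

```python
import math


def accumulate_gradients(micro_batch_grads):
    """Average per-parameter gradients over several micro-batches.

    Hypothetical helper: each element of micro_batch_grads is a flat list
    of gradients from one micro-batch.
    """
    count = len(micro_batch_grads)
    summed = [0.0] * len(micro_batch_grads[0])
    for grads in micro_batch_grads:
        for i, g in enumerate(grads):
            summed[i] += g
    return [s / count for s in summed]


def clip_by_global_norm(grads, max_norm):
    """Rescale gradients so that their overall L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm or norm == 0.0:
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]


# Two micro-batches are accumulated into one update, then clipped.
micro_batch_grads = [[3.0, 4.0], [9.0, 12.0]]
accumulated = accumulate_gradients(micro_batch_grads)
clipped = clip_by_global_norm(accumulated, max_norm=5.0)
print(accumulated, clipped)
```

Accumulation lets a small per-step batch emulate a larger effective batch, while clipping bounds the size of each update; the two are commonly applied in this order.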

Inference Features

Targets inference and deployment scenarios, enabling trained models to be deployed efficiently for production use.

Feature	Description	Architecture Support
Quantization	Integrates MindSpore Golden Stick for a unified quantization inference workflow.	Mcore/Legacy
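
For readers unfamiliar with quantization, the following is a minimal sketch of symmetric per-tensor int8 weight quantization in plain Python. It only illustrates the general idea of storing weights as small integers plus a scale factor; it is not the MindSpore Golden Stick API, and the function names `quantize_int8` and `dequantize_int8` are assumptions made for this example.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(w) for w in weights)
    # Fall back to a scale of 1.0 when every weight is zero.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Recover approximate float weights from the integer codes."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.0, 1.0]
quantized, scale = quantize_int8(weights)
restored = dequantize_int8(quantized, scale)
print(quantized, scale, restored)
```

The restored values differ from the originals by at most about half the scale, which is the precision traded for lower memory use and faster inference.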