Deprecated
This page belongs to the Static Graph (GRAPH_MODE) Implementation section and has been marked as deprecated. New features are primarily being developed in the "r2.0.0 Dynamic Graph Implementation" section. Please refer to the dynamic graph documentation first.
Features Overview
MindSpore Transformers provides a wide range of features across the full workflow of pre-training, fine-tuning, inference, and deployment, enabling configurable development and optimization. This section groups the features into three categories (General Features, Training Features, and Inference Features) for quick reference and navigation.
General Features
Foundational capabilities shared across pre-training, fine-tuning, and inference, providing a consistent setup for every task type.
| Feature | Description | Architecture Support |
|---|---|---|
| | One-click start for single-device, single-node and multi-node tasks. | Mcore/Legacy |
| | [Checkpoint 1.0] Supports conversion, slicing and merging of weight files in ckpt format. | Legacy |
| | [Checkpoint 1.0] Supports saving and loading weight files in safetensors format. | Mcore/Legacy |
| | Use YAML files to centrally manage and adjust configurable items in tasks. | Mcore/Legacy |
| | Plug-and-play loading of Hugging Face community model configurations. | Mcore |
| | Introduction to logs, including log structure and log saving. | Mcore/Legacy |
| | Introduction to tokenizer; supports Hugging Face Tokenizer in inference and datasets. | Mcore |
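The idea of managing configurable items centrally, as in the YAML configuration row above, can be sketched in a few lines. This is a minimal illustration only, not the MindSpore Transformers implementation: the keys (`run_mode`, `optimizer`, `parallel`) and the `merge_config` helper are hypothetical, and a plain dictionary stands in for a parsed YAML file.

```python
from copy import deepcopy


def merge_config(base, overrides):
    """Return a new dict with `overrides` recursively merged into `base`."""
    merged = deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge them key by key.
            merged[key] = merge_config(merged[key], value)
        else:
            # Scalars, lists, and new keys replace the base value outright.
            merged[key] = deepcopy(value)
    return merged


# Hypothetical keys standing in for a parsed YAML task configuration.
base_config = {
    "run_mode": "train",
    "optimizer": {"type": "AdamW", "learning_rate": 1e-4},
    "parallel": {"data_parallel": 8, "model_parallel": 1},
}

# Per-task adjustments, e.g. supplied from a second file or the command line.
overrides = {
    "optimizer": {"learning_rate": 5e-5},
    "parallel": {"model_parallel": 2},
}

config = merge_config(base_config, overrides)
```

Keeping defaults in one place and applying small overrides per task means that a change to a shared setting is made once, while the base configuration itself stays untouched.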
Training Features
Capabilities for reliable, large-scale training and tuning of large models.
| Feature | Description | Architecture Support |
|---|---|---|
| | Supports multiple types and formats of datasets (Megatron, Hugging Face, MindRecord, etc.). | Mcore/Legacy |
| | Flexibly configure hyperparameters (learning rate, optimizer, etc.) for large model training. | Mcore/Legacy |
| | Visualization for the training phase to monitor and analyze metrics and information. | Mcore/Legacy |
| | [Checkpoint 1.0] Step-level resumable training to reduce waste from unexpected interruptions. | Mcore/Legacy |
| | [Checkpoint 2.0] Checkpoint saving and loading. | Mcore |
| | [Checkpoint 2.0] Step-level resumable training with scaling and incremental scenarios. | Mcore |
| | End-of-life CKPT, UCE fault-tolerant recovery, and process-level rescheduling recovery. | Mcore |
| | One-click multi-dimensional hybrid distributed parallel for efficient training at scale. | Mcore/Legacy |
| | Recomputation and fine-grained activation SWAP to reduce peak memory. | Mcore/Legacy |
| | Data skip and checkpoint health monitoring for more robust training. | Mcore/Legacy |
| | Merge multiple checkpoints (PMA) and fused checkpoint saving. | Mcore |
| | Gradient accumulation, gradient clipping, CPU affinity, MoE DropRate, RoPE/SwiGLU fusion, etc. | Mcore/Legacy |
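Two of the techniques named in the last row, gradient accumulation and gradient clipping, can be sketched in plain Python. This is a framework-independent illustration under simplifying assumptions (gradients as flat lists of floats, clipping by global L2 norm); the function names are hypothetical and do not correspond to MindSpore Transformers APIs or configuration keys.

```python
import math


def accumulate(micro_batch_grads):
    """Average gradients over micro-batches (gradient accumulation)."""
    count = len(micro_batch_grads)
    total = [0.0] * len(micro_batch_grads[0])
    for grads in micro_batch_grads:
        for i, g in enumerate(grads):
            total[i] += g
    return [t / count for t in total]


def clip_by_global_norm(grads, max_norm):
    """Scale grads so that their global L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grads))
    if norm <= max_norm:
        # Already within the limit (this also covers the all-zero case).
        return list(grads)
    scale = max_norm / norm
    return [g * scale for g in grads]


# Two micro-batches, each producing a gradient for the same two parameters.
micro_batch_grads = [[3.0, 0.0], [3.0, 8.0]]

averaged = accumulate(micro_batch_grads)      # one update for both micro-batches
clipped = clip_by_global_norm(averaged, 1.0)  # direction kept, norm capped at 1.0
```

Accumulation lets several small micro-batches stand in for one large batch when memory is limited, and clipping bounds the size of each update while preserving its direction.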
Inference Features
Capabilities for inference and deployment scenarios, enabling trained models to be deployed efficiently for production use.
| Feature | Description | Architecture Support |
|---|---|---|
| | Integrates MindSpore Golden Stick for a unified quantization inference workflow. | Mcore/Legacy |