# Features Overview
MindSpore Transformers provides a wide range of features across the full workflow of pre-training, fine-tuning, inference, and deployment, enabling configurable development and optimization. This section groups all features into three categories for quick reference and navigation: General Features, Training Features, and Inference Features.
## General Features
Foundational capabilities shared by pre-training, fine-tuning, and inference, providing a consistent setup that can be reused across tasks.
| Feature Description | Architecture Support |
|---|---|
| One-click start for single-device, single-node, and multi-node tasks. | Mcore/Legacy |
| [Checkpoint 1.0] Supports conversion, slicing, and merging of weight files in ckpt format. | Legacy |
| [Checkpoint 1.0] Supports saving and loading weight files in safetensors format (a minimal save/load sketch follows this table). | Mcore/Legacy |
| Use YAML files to centrally manage and adjust configurable items in tasks (an illustrative configuration sketch follows this table). | Mcore/Legacy |
| Plug-and-play loading of Hugging Face community model configurations. | Mcore |
| Introduction to logs, including log structure and log saving. | Mcore/Legacy |
| Introduction to tokenizers; supports Hugging Face tokenizers in inference and datasets. | Mcore |
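The safetensors weight format mentioned above can be exercised on its own with the standalone `safetensors` package. The following is a minimal, self-contained sketch of saving and loading tensors in that format; the tensor names and shapes are illustrative assumptions, not taken from any MindSpore Transformers checkpoint.

```python
# Minimal sketch: saving and loading weights in safetensors format.
# Tensor names and shapes below are illustrative assumptions only.
import numpy as np
from safetensors.numpy import save_file, load_file

weights = {
    "embedding.weight": np.random.rand(1000, 64).astype(np.float32),
    "lm_head.weight": np.random.rand(64, 1000).astype(np.float32),
}

# Save all tensors into a single .safetensors file.
save_file(weights, "model.safetensors")

# Load them back; the result is a dict mapping tensor names to arrays.
restored = load_file("model.safetensors")
print({name: tensor.shape for name, tensor in restored.items()})
```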
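The YAML-based configuration entry centralizes a task's adjustable items in one file. Below is a small, runnable sketch of that idea using PyYAML; the keys shown (run_mode, model, training, parallel) are illustrative assumptions rather than the exact schema used by MindSpore Transformers.

```python
# Sketch of centralizing task settings in a YAML document parsed with PyYAML.
# The keys below are illustrative assumptions, not the real configuration schema.
import yaml

config_text = """
run_mode: train            # e.g. train, finetune, or predict
seed: 42
model:
  hidden_size: 4096
  num_layers: 32
training:
  learning_rate: 1.0e-4
  optimizer: adamw
parallel:
  data_parallel: 8
  model_parallel: 2
"""

config = yaml.safe_load(config_text)

# All configurable items live in one place and can be adjusted
# without touching the training code.
print(config["training"]["learning_rate"])
print(config["parallel"])
```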
## Training Features
Supports reliable, large-scale training and tuning of large models.
| Feature Description | Architecture Support |
|---|---|
| Supports multiple types and formats of datasets (Megatron, Hugging Face, MindRecord, etc.). | Mcore/Legacy |
| Flexibly configure hyperparameters (learning rate, optimizer, etc.) for large model training. | Mcore/Legacy |
| Visualization of the training phase to monitor and analyze metrics and information. | Mcore/Legacy |
| [Checkpoint 1.0] Step-level resumable training to reduce waste from unexpected interruptions. | Mcore/Legacy |
| [Checkpoint 2.0] Checkpoint saving and loading. | Mcore |
| [Checkpoint 2.0] Step-level resumable training, including scaling and incremental scenarios. | Mcore |
| End-of-life CKPT, UCE fault-tolerant recovery, and process-level rescheduling recovery. | Mcore |
| One-click multi-dimensional hybrid distributed parallelism for efficient training at scale. | Mcore/Legacy |
| Recomputation and fine-grained activation SWAP to reduce peak memory usage. | Mcore/Legacy |
| Data skip and checkpoint health monitoring for more robust training. | Mcore/Legacy |
| Merging of multiple checkpoints (PMA) and fused checkpoint saving. | Mcore |
| Gradient accumulation, gradient clipping, CPU affinity, MoE droprate, RoPE/SwiGLU fusion, etc. (a generic gradient accumulation sketch follows this table). | Mcore/Legacy |
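Gradient accumulation, listed in the last row above, updates the model only after summing gradients over several micro-batches, emulating a larger effective batch size. The following framework-agnostic NumPy sketch illustrates the idea only; it is not MindSpore Transformers' implementation, and the toy least-squares model and all names are assumptions.

```python
# Framework-agnostic sketch of gradient accumulation (illustrative only).
import numpy as np

rng = np.random.default_rng(0)
w = np.zeros(4)                  # model parameters of a toy linear model
target = np.array([1.0, -2.0, 0.5, 3.0])
lr = 0.1                         # learning rate
accumulation_steps = 4           # micro-batches per optimizer update

def grad(w, x, y):
    """Gradient of 0.5 * (x @ w - y)**2 with respect to w."""
    return (x @ w - y) * x

accumulated = np.zeros_like(w)
for step in range(1, 401):
    x = rng.normal(size=4)
    y = x @ target               # synthetic, noise-free target
    accumulated += grad(w, x, y)

    # Apply the optimizer update only every `accumulation_steps` micro-batches,
    # using the averaged gradient to emulate a larger batch size.
    if step % accumulation_steps == 0:
        w -= lr * accumulated / accumulation_steps
        accumulated[:] = 0.0

print("learned parameters:", np.round(w, 2))
```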
## Inference Features
Covers inference and deployment scenarios, enabling trained models to be served efficiently in production.
| Feature Description | Architecture Support |
|---|---|
| Integrates MindSpore Golden Stick for a unified quantization inference workflow (a concept-level quantization sketch follows this table). | Mcore/Legacy |
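Quantization for inference stores weights at low precision (for example int8) together with scaling factors, trading a small accuracy loss for lower memory use and faster deployment. The NumPy sketch below shows only the underlying idea of symmetric per-tensor int8 quantization; it is not the MindSpore Golden Stick API, and all names are assumptions.

```python
# Concept-level sketch of symmetric per-tensor int8 weight quantization.
# Illustrative only; not the MindSpore Golden Stick API.
import numpy as np

def quantize_int8(w):
    """Quantize so that w is approximated by q * scale, with q in int8."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)
q, scale = quantize_int8(w)

# The int8 tensor needs about 4x less memory than float32,
# and the reconstruction error is bounded by roughly scale / 2.
print("max abs error:", np.abs(dequantize(q, scale) - w).max())
print("scale:", scale)
```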