Features Overview

MindSpore Transformers provides a wide range of features across the full process of pre-training, fine-tuning, inference, and deployment, enabling configurable development and optimization. This section groups all features into three categories for quick reference and navigation: General Features, Training Features, and Inference Features.

General Features

Foundational capabilities shared across pre-training, fine-tuning, and inference for consistent setup and reuse.

| Feature | Description | Architecture Support |
| --- | --- | --- |
| Start Tasks | One-click start for single-device, single-node, and multi-node tasks. | Mcore/Legacy |
| Ckpt Weights | [Checkpoint 1.0] Supports conversion, slicing, and merging of weight files in ckpt format. | Legacy |
| Safetensors Weights | [Checkpoint 1.0] Supports saving and loading weight files in safetensors format. | Mcore/Legacy |
| Configuration File Descriptions | Use YAML files to centrally manage and adjust configurable items in tasks. | Mcore/Legacy |
| Loading Hugging Face Model Configuration | Plug-and-play loading of Hugging Face community model configurations. | Mcore |
| Logs | Introduction to logs, including log structure and log saving. | Mcore/Legacy |
| Using Tokenizer | Introduction to the tokenizer; supports Hugging Face tokenizers in inference and datasets. | Mcore |
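For illustration, the centralized YAML configuration described under "Configuration File Descriptions" above might look like the following sketch. The key names shown (`seed`, `run_mode`, `trainer`, `train_dataset`, `optimizer`) follow common MindSpore Transformers conventions, but they are assumptions here and should be checked against the configuration reference for your version.

```yaml
# Illustrative sketch only: key names follow common MindSpore Transformers
# conventions but may differ across versions; consult the configuration
# reference before use.
seed: 42
run_mode: 'finetune'        # e.g. 'train', 'finetune', or 'predict'

trainer:
  type: CausalLanguageModelingTrainer
  model_name: 'llama2_7b'   # hypothetical model name

train_dataset:
  batch_size: 4

optimizer:
  type: AdamW
  learning_rate: 1.e-5
```

Keeping every configurable item in one YAML file means a task can be reproduced or adjusted without touching code.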

Training Features

Capabilities that support large-scale, reliable training and tuning of large models.

| Feature | Description | Architecture Support |
| --- | --- | --- |
| Dataset | Supports multiple dataset types and formats (Megatron, Hugging Face, MindRecord, etc.). | Mcore/Legacy |
| Training Hyperparameters | Flexibly configure hyperparameters (learning rate, optimizer, etc.) for large model training. | Mcore/Legacy |
| Training Metrics Monitoring | Visualization of the training phase to monitor and analyze metrics and other information. | Mcore/Legacy |
| Resumable Training After Breakpoint | [Checkpoint 1.0] Step-level resumable training to reduce the waste caused by unexpected interruptions. | Mcore/Legacy |
| Checkpoint Saving and Loading | [Checkpoint 2.0] Checkpoint saving and loading. | Mcore |
| Resumable Training After Breakpoint 2.0 | [Checkpoint 2.0] Step-level resumable training, including scaling and incremental scenarios. | Mcore |
| Training High-Availability (Beta) | End-of-life CKPT, UCE fault-tolerant recovery, and process-level rescheduling recovery. | Mcore |
| Distributed Parallel Training | One-click multi-dimensional hybrid distributed parallelism for efficient training at scale. | Mcore/Legacy |
| Training Memory Optimization | Recomputation and fine-grained activation SWAP to reduce peak memory usage. | Mcore/Legacy |
| Data Skip and Health Monitoring | Data skipping and checkpoint health monitoring for more robust training. | Mcore/Legacy |
| Pre-trained Model Average (PMA) Weight Merge | Merges multiple checkpoints (PMA) and saves the fused checkpoint. | Mcore |
| Other Training Features | Gradient accumulation, gradient clipping, CPU affinity, MoE droprate, RoPE/SwiGLU fusion, etc. | Mcore/Legacy |
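As a sketch of the "Distributed Parallel Training" entry above, multi-dimensional hybrid parallelism is typically declared in the same YAML configuration. The `parallel_config` and `recompute_config` keys below match common MindSpore Transformers conventions, but are assumptions here and should be verified against your version's documentation.

```yaml
# Illustrative sketch only: verify key names against your version's docs.
# Devices used per replica group = data_parallel * model_parallel * pipeline_stage.
parallel_config:
  data_parallel: 2        # replicas of the model over the data dimension
  model_parallel: 4       # tensor-parallel slicing within each layer
  pipeline_stage: 2       # layers split into pipeline stages
  micro_batch_num: 4      # micro-batches interleaved per pipeline step

recompute_config:
  recompute: True         # trade extra compute for lower peak activation memory
```

Combining these dimensions lets one configuration scale from a single node to a large cluster without code changes.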

Inference Features

Targets inference and deployment scenarios, enabling trained models to be deployed efficiently for production use.

| Feature | Description | Architecture Support |
| --- | --- | --- |
| Quantization | Integrates MindSpore Golden Stick for a unified quantized inference workflow. | Mcore/Legacy |