Deprecated

This page belongs to the Static Graph (GRAPH_MODE) Implementation section and has been marked as deprecated. New features are primarily being developed in the "r2.0.0 Dynamic Graph Implementation" section. Please refer to the dynamic graph documentation first.

Features Overview

MindSpore Transformers provides configurable features that cover the full workflow of pre-training, fine-tuning, inference, and deployment. This section groups them into three categories (General Features, Training Features, and Inference Features) for quick reference and navigation.

General Features

Foundational capabilities shared by pre-training, fine-tuning, and inference, so that setup stays consistent across tasks.

| Feature | Description | Architecture Support |
| --- | --- | --- |
| Start Tasks | One-click start for single-device, single-node and multi-node tasks. | Mcore/Legacy |
| Ckpt Weights | [Checkpoint 1.0] Supports conversion, slicing and merging of weight files in ckpt format. | Legacy |
| Safetensors Weights | [Checkpoint 1.0] Supports saving and loading weight files in safetensors format. | Mcore/Legacy |
| Configuration File Descriptions | Use YAML files to centrally manage and adjust configurable items in tasks. | Mcore/Legacy |
| Loading Hugging Face Model Configuration | Plug-and-play loading of Hugging Face community model configurations. | Mcore |
| Logs | Introduction to logs, including log structure and log saving. | Mcore/Legacy |
| Using Tokenizer | Introduction to tokenizer; supports Hugging Face Tokenizer in inference and datasets. | Mcore |

Training Features

Capabilities for reliable, large-scale training and tuning of large models.

| Feature | Description | Architecture Support |
| --- | --- | --- |
| Dataset | Supports multiple types and formats of datasets (Megatron, Hugging Face, MindRecord, etc.). | Mcore/Legacy |
| Training Hyperparameters | Flexibly configure hyperparameters (learning rate, optimizer, etc.) for large model training. | Mcore/Legacy |
| Training Metrics Monitoring | Visualization of the training phase for monitoring and analyzing metrics and other information. | Mcore/Legacy |
| Resumable Training After Breakpoint | [Checkpoint 1.0] Step-level resumable training to reduce waste from unexpected interruptions. | Mcore/Legacy |
| Checkpoint Saving and Loading | [Checkpoint 2.0] Checkpoint saving and loading. | Mcore |
| Resumable Training After Breakpoint 2.0 | [Checkpoint 2.0] Step-level resumable training with scaling and incremental scenarios. | Mcore |
| Training High-Availability (Beta) | End-of-life CKPT, UCE fault-tolerant recovery, and process-level rescheduling recovery. | Mcore |
| Distributed Parallel Training | One-click multi-dimensional hybrid distributed parallelism for efficient training at scale. | Mcore/Legacy |
| Training Memory Optimization | Recomputation and fine-grained activation SWAP to reduce peak memory. | Mcore/Legacy |
| Data Skip and Health Monitoring | Data skip and checkpoint health monitoring for more robust training. | Mcore/Legacy |
| Pre-trained Model Average (PMA) Weight Merge | Merging of multiple checkpoints (PMA) and fused checkpoint saving. | Mcore |
| Other Training Features | Gradient accumulation, gradient clipping, CPU affinity, MoE DropRate, RoPE/SwiGLU fusion, etc. | Mcore/Legacy |
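
To make the idea behind the PMA weight merge entry concrete, the following is a minimal sketch of merging several checkpoints by uniform averaging. It is an illustration only and not the MindSpore Transformers API: the function name `average_checkpoints` and the representation of a checkpoint as a dictionary of plain Python lists are assumptions made for this example, and the actual feature may use a different weighting scheme.

```python
from typing import Dict, List

# Hypothetical stand-in for a checkpoint: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def average_checkpoints(checkpoints: List[Weights]) -> Weights:
    """Merge checkpoints by taking the element-wise mean of each parameter."""
    if not checkpoints:
        raise ValueError("at least one checkpoint is required")

    reference = checkpoints[0]
    for ckpt in checkpoints[1:]:
        # All checkpoints must describe the same parameters with the same sizes.
        if ckpt.keys() != reference.keys():
            raise ValueError("checkpoints have different parameter names")
        for name in reference:
            if len(ckpt[name]) != len(reference[name]):
                raise ValueError(f"size mismatch for parameter {name!r}")

    count = len(checkpoints)
    merged: Weights = {}
    for name in reference:
        # zip(*...) groups the i-th value of this parameter across checkpoints.
        columns = zip(*(ckpt[name] for ckpt in checkpoints))
        merged[name] = [sum(column) / count for column in columns]
    return merged


if __name__ == "__main__":
    step_100 = {"dense.weight": [1.0, 3.0], "dense.bias": [0.0]}
    step_200 = {"dense.weight": [3.0, 5.0], "dense.bias": [2.0]}
    print(average_checkpoints([step_100, step_200]))
```

In practice the same loop would run over tensors loaded from saved weight files rather than Python lists; the sketch only shows the averaging step itself.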

Inference Features

Targets inference and deployment scenarios, enabling trained models to be deployed efficiently for production use.

| Feature | Description | Architecture Support |
| --- | --- | --- |
| Quantization | Integrates MindSpore Golden Stick for a unified quantization inference workflow. | Mcore/Legacy |