MindSpore Transformers Documentation

MindSpore Transformers aims to provide a full-process development suite for large model pre-training, fine-tuning, inference, and deployment. It offers mainstream Transformer-based Large Language Models (LLMs) and Multimodal Models (MMs), helping users easily carry out the full process of large model development.

Based on MindSpore's built-in parallel technology and component-based design, the MindSpore Transformers suite has the following features:

  • One-click initiation of single or multi-card pre-training, fine-tuning, inference, and deployment processes for large models;

  • Rich multi-dimensional hybrid parallel capabilities with flexible, easy-to-use, and personalized configuration;

  • System-level deep optimization of large model training and inference, with native support for efficient training and inference on ultra-large-scale clusters and rapid fault recovery;

  • Configurable development of task components: any module, including the model network, optimizer, and learning rate policy, can be enabled through unified configuration;

  • Real-time visualization of training accuracy and performance monitoring metrics.

Users can refer to Overall Architecture and Model Library for a quick overview of the MindSpore Transformers system architecture and the list of supported foundation models.

The open-source code repository for MindSpore Transformers is hosted on Gitee at MindSpore/mindformers.

If you have any suggestions for MindSpore Transformers, please contact us by submitting an issue, and we will address them promptly.

Full-process Development with MindSpore Transformers

MindSpore Transformers supports one-click launch of single-card or multi-card training, fine-tuning, and inference for any task, making deep learning workflows more efficient and user-friendly through simplified operation, flexible configuration, and process automation. Users can refer to the corresponding explanatory documents for each stage; a minimal usage sketch is also shown below.

Code repository address: <https://gitee.com/mindspore/mindformers>
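
The sketch below illustrates what this one-click flow can look like through the high-level `Trainer` API. It is a minimal sketch only: the `llama2_7b` model identifier and the dataset path are placeholder assumptions, and the names and options supported by your version are listed in the Model Library and the start-task documentation.

```python
# Minimal sketch of the one-click flow via the high-level Trainer API.
# Assumptions: the "llama2_7b" model identifier and the dataset path are
# placeholders; check the Model Library for the names your version supports.
from mindformers import Trainer

# One Trainer object drives training, fine-tuning, and inference for a task.
trainer = Trainer(
    task="text_generation",            # task type
    model="llama2_7b",                 # placeholder model identifier
    train_dataset="/path/to/dataset",  # placeholder dataset path
)

trainer.train()        # start training with the task's default YAML settings
# trainer.finetune()   # fine-tuning entry point (weights as set in the task YAML)
print(trainer.predict(input_data="An increasing sequence: one, two,"))  # inference
```

For multi-card and multi-node runs, the distributed launch method described under Start Tasks is typically used instead of a direct Python call.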

Feature Descriptions of MindSpore Transformers

MindSpore Transformers provides a wealth of features throughout the full process of large model development. Users can learn about these features via the following links:

  • General Features:

    • Start Tasks

      One-click start for single-device, single-node and multi-node tasks.

    • Ckpt Weights

      Supports conversion, slicing, and merging of weight files in ckpt format.

    • Safetensors Weights

      Supports saving and loading weight files in safetensors format.

    • Configuration File Descriptions

      Supports the use of YAML files to centrally manage and adjust configurable items in tasks; a configuration sketch is provided after this feature list.

    • Loading Hugging Face Model Configuration

      Supports plug-and-play loading of Hugging Face community model configurations for seamless integration.

    • Logs

      Introduction of logs, including log structure, log saving, and so on.

    • Using Tokenizer

      Introduction of the tokenizer, which supports Hugging Face tokenizers for use in inference and datasets; a tokenizer sketch is provided after this feature list.

  • Training Features:

    • Dataset

      Supports multiple types and formats of datasets.

    • Training Hyperparameters

      Flexibly configure hyperparameter settings for large model training.

    • Training Metrics Monitoring

      Provides visualization services for the training phase of large models, enabling monitoring and analysis of various metrics and information during training.

    • Resumable Training After Breakpoint

      Supports step-level resumable training after breakpoint, effectively reducing the waste of time and resources caused by unexpected interruptions during large-scale training.

    • Training High-Availability (Beta)

      Provides high-availability capabilities for the training phase of large models, including end-of-life CKPT preservation, UCE fault-tolerant recovery, and process-level rescheduling recovery (Beta feature).

    • Distributed Parallel Training

      One-click configuration of multi-dimensional hybrid distributed parallelism allows models to run efficiently on clusters of up to 10,000 cards; see the configuration sketch after this feature list.

    • Training Memory Optimization

      Supports fine-grained recomputation and activation swapping to reduce peak memory overhead during model training.

    • Other Training Features

      Supports additional training features such as gradient accumulation and gradient clipping.

  • Inference Features:

    • Quantization

      Integrates the MindSpore Golden Stick toolkit to provide a unified quantization inference process.
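
To make the configuration-driven features above more concrete (YAML configuration files, distributed parallel training, memory optimization, and training hyperparameters), the following minimal sketch loads a task YAML and inspects a few commonly used sections. The config path and the key names shown are assumptions based on typical MindSpore Transformers YAML layouts and may differ across versions and models.

```python
# Minimal sketch: reading a task YAML with MindFormerConfig and inspecting the
# sections configured by the features above. Assumptions: the config path and
# the key names (parallel_config, recompute_config, lr_schedule) follow the
# usual YAML layout and may differ between versions and models.
from mindformers import MindFormerConfig

config = MindFormerConfig("configs/llama2/pretrain_llama2_7b.yaml")  # placeholder path

# Multi-dimensional hybrid parallelism is declared in the same file.
print(config.parallel_config.data_parallel,
      config.parallel_config.model_parallel,
      config.parallel_config.pipeline_stage)

# Memory optimization (fine-grained recomputation) and training hyperparameters
# such as the learning-rate schedule live alongside it.
print(config.recompute_config)
print(config.lr_schedule)

# Values can also be adjusted in Python before the config is handed to a task.
config.parallel_config.model_parallel = 2
```

In practice, most users edit the YAML file directly and pass it to the startup entry point, keeping all configurable items of a task in one place.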
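
For the tokenizer feature listed under General Features, a minimal sketch is shown below. The `llama2_7b` identifier is a placeholder assumption; the Using Tokenizer document describes the actual loading options, including Hugging Face tokenizer files.

```python
# Minimal sketch of tokenizer usage. Assumption: "llama2_7b" is a placeholder
# identifier; see Using Tokenizer for loading Hugging Face tokenizer files.
from mindformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("llama2_7b")

encoded = tokenizer("An increasing sequence: one, two, three,")
print(encoded["input_ids"])                    # token ids used by datasets and inference
print(tokenizer.decode(encoded["input_ids"]))  # round-trip back to text
```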

Advanced Development with MindSpore Transformers

Environment Variables

Contribution Guide

FAQ