MindSpore Transformers Documentation

The MindSpore Transformers suite aims to provide a full-process development suite for large model pre-training, fine-tuning, inference, and deployment. It provides mainstream Transformer-based Large Language Models (LLMs) and Multimodal Models (MMs), helping users easily realize the full process of large model development.

Based on MindSpore's built-in parallel technology and component-based design, the MindSpore Transformers suite has the following features:

  • One-click initiation of single- or multi-card pre-training, fine-tuning, inference, and deployment processes for large models;

  • Rich multi-dimensional hybrid parallel capabilities with flexible, easy-to-use personalized configuration;

  • System-level deep optimization of large model training and inference, with native support for efficient training and inference on ultra-large-scale clusters and rapid fault recovery;

  • Configurable development of task components: any module, including the model network, optimizer, and learning rate policy, can be enabled through unified configuration;

  • Real-time visualization of training accuracy and performance monitoring indicators.
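The unified-configuration approach in the list above can be sketched with a minimal YAML fragment. The field names and values here are illustrative assumptions, not the suite's exact schema; consult the configuration documentation for the real keys:

```yaml
# Hypothetical sketch of a unified task configuration.
# Field names are illustrative and may differ from the actual schema.
run_mode: finetune            # e.g. train / finetune / predict
model:
  arch: llama2_7b             # model network selected by name
optimizer:
  type: AdamW
  learning_rate: 1.e-5
lr_schedule:
  type: cosine                # learning rate policy
  warmup_steps: 500
parallel:
  data_parallel: 2            # multi-dimensional hybrid parallelism
  model_parallel: 4
  pipeline_stage: 1
```

The point of this design is that swapping an optimizer or parallel strategy changes only the configuration file, not the training code.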

Users can refer to Overall Architecture and Model Library for a quick overview of the MindSpore Transformers system architecture and the list of supported foundation models.

The open-source code repository for MindSpore Transformers is located at AtomGit | MindSpore/mindformers.

If you have any suggestions for MindSpore Transformers, please file an issue and we will address it promptly.

Full-process Developing with MindSpore Transformers

MindSpore Transformers provides a unified one-click start for single- and multi-card training, fine-tuning, and inference. From getting started to going live, refer as needed to the Training Guide, Pretraining, Supervised Fine-Tuning, Inference, Service Deployment, and Evaluation.
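The one-click start described above typically reduces to a single launch command. The script name and flags below follow the pattern of the open-source mindformers repository, but treat them as an assumption and check the current launch documentation; the config path is purely illustrative:

```shell
# Assumed launch commands; verify script names and flags against the repo.
# Single-card fine-tuning driven entirely by a YAML config:
python run_mindformer.py \
  --config configs/llama2/finetune_llama2_7b.yaml \
  --run_mode finetune

# Multi-card launch via the msrun helper script (8 cards on one node):
bash scripts/msrun_launcher.sh \
  "run_mindformer.py --config configs/llama2/finetune_llama2_7b.yaml --run_mode finetune" 8
```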

Features description of MindSpore Transformers

General capabilities, training capabilities (such as datasets, parallelism, resumable training, and memory optimization), and inference and quantization are summarized by category in the Features Overview. Use it to quickly find and jump to the right documentation.

Advanced developing with MindSpore Transformers

Once basic training and inference are in place, see the Advanced Development Overview for model migration, precision and performance tuning, or accuracy comparison against a reference implementation. It organizes all advanced development docs by diagnostics and optimization, model development and configuration, accuracy comparison, and API reference.

Environment variables, contribution, and FAQ