MindSpore Transformers Documentation

MindSpore Transformers aims to provide a full-process development suite for large model pre-training, fine-tuning, inference, and deployment. It offers mainstream Transformer-based Large Language Models (LLMs) and Multimodal Models (MMs), helping users easily carry out the full process of large model development.

Based on MindSpore's built-in parallel technology and component-based design, the MindSpore Transformers suite has the following features:

  • One-click initiation of single or multi-card pre-training, fine-tuning, inference, and deployment processes for large models;

  • Rich multi-dimensional hybrid parallel capabilities with flexible, easy-to-use, and personalized configuration;

  • System-level deep optimization of large model training and inference, with native support for efficient training and inference on ultra-large-scale clusters and rapid fault recovery;

  • Configurable development of task components: any module, including the model network, optimizer, and learning rate policy, can be enabled through unified configuration;

  • Real-time visualization of training accuracy and performance monitoring metrics.

Users can refer to Overall Architecture and Model Library for a quick overview of the MindSpore Transformers system architecture and the list of supported foundation models.

The open-source code repository for MindSpore Transformers is hosted on Gitee at MindSpore/mindformers.

If you have any suggestions for MindSpore Transformers, please contact us by submitting an issue, and we will address them promptly.

Full-process Development with MindSpore Transformers

MindSpore Transformers supports one-click launch of single-card or multi-card training, fine-tuning, and inference for any task, making deep learning workflows more efficient and user-friendly through simplified operation, flexible configuration, and process automation. Users can refer to the corresponding explanatory documents for each stage; a minimal usage sketch is also shown below.

Code repository address: <https://gitee.com/mindspore/mindformers>
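
The sketch below illustrates what this one-click flow can look like through the high-level `Trainer` API. It is a minimal sketch only: the `llama2_7b` model identifier and the dataset path are placeholder assumptions, and the names and options supported by your version are listed in the Model Library and the start-task documentation.

```python
# Minimal sketch of the one-click flow via the high-level Trainer API.
# Assumptions: the "llama2_7b" model identifier and the dataset path are
# placeholders; check the Model Library for the names your version supports.
from mindformers import Trainer

# One Trainer object drives training, fine-tuning, and inference for a task.
trainer = Trainer(
    task="text_generation",            # task type
    model="llama2_7b",                 # placeholder model identifier
    train_dataset="/path/to/dataset",  # placeholder dataset path
)

trainer.train()        # start training with the task's default YAML settings
# trainer.finetune()   # fine-tuning entry point (weights as set in the task YAML)
print(trainer.predict(input_data="An increasing sequence: one, two,"))  # inference
```

For multi-card and multi-node runs, the distributed launch method described under Start Tasks is typically used instead of a direct Python call.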

Feature Descriptions of MindSpore Transformers

MindSpore Transformers provides a wealth of features throughout the full process of large model development. Users can learn about these features via the following links:

  • General Features:

    • Start Tasks

      One-click start for single-device, single-node and multi-node tasks.

    • Ckpt Weights

      Supports conversion, slicing, and merging of weight files in ckpt format.

    • Safetensors Weights

      Supports saving and loading weight files in safetensors format.

    • Configuration File Descriptions

      Supports the use of YAML files to centrally manage and adjust configurable items in tasks; a configuration sketch is provided after this feature list.

    • Loading Hugging Face Model Configuration

      Supports plug-and-play loading of Hugging Face community model configurations for seamless integration.

    • Logs

      Introduction of logs, including log structure, log saving, and so on.

    • Using Tokenizer

      Introduction of the tokenizer, which supports Hugging Face tokenizers for use in inference and datasets; a tokenizer sketch is provided after this feature list.

  • Training Features:

    • Dataset

      Supports multiple types and formats of datasets.

    • Training Hyperparameters

      Flexibly configure hyperparameter settings for large model training.

    • Training Metrics Monitoring

      Provides visualization services for the training phase of large models, enabling monitoring and analysis of various metrics and information during training.

    • Resumable Training After Breakpoint

      Supports step-level resumable training after breakpoint, effectively reducing the waste of time and resources caused by unexpected interruptions during large-scale training.

    • Training High-Availability (Beta)

      Provides high-availability capabilities for the training phase of large models, including end-of-life CKPT preservation, UCE fault-tolerant recovery, and process-level rescheduling recovery (Beta feature).

    • Distributed Parallel Training

      One-click configuration of multi-dimensional hybrid distributed parallelism allows models to run efficiently on clusters of up to 10,000 cards; see the configuration sketch after this feature list.

    • Training Memory Optimization

      Supports fine-grained recomputation and activation swapping to reduce peak memory overhead during model training.

    • Other Training Features

      Supports additional training features such as gradient accumulation and gradient clipping.

  • Inference Features:

    • Quantization

      Integrates the MindSpore Golden Stick toolkit to provide a unified quantization inference process.
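
To make the configuration-driven features above more concrete (YAML configuration files, distributed parallel training, memory optimization, and training hyperparameters), the following minimal sketch loads a task YAML and inspects a few commonly used sections. The config path and the key names shown are assumptions based on typical MindSpore Transformers YAML layouts and may differ across versions and models.

```python
# Minimal sketch: reading a task YAML with MindFormerConfig and inspecting the
# sections configured by the features above. Assumptions: the config path and
# the key names (parallel_config, recompute_config, lr_schedule) follow the
# usual YAML layout and may differ between versions and models.
from mindformers import MindFormerConfig

config = MindFormerConfig("configs/llama2/pretrain_llama2_7b.yaml")  # placeholder path

# Multi-dimensional hybrid parallelism is declared in the same file.
print(config.parallel_config.data_parallel,
      config.parallel_config.model_parallel,
      config.parallel_config.pipeline_stage)

# Memory optimization (fine-grained recomputation) and training hyperparameters
# such as the learning-rate schedule live alongside it.
print(config.recompute_config)
print(config.lr_schedule)

# Values can also be adjusted in Python before the config is handed to a task.
config.parallel_config.model_parallel = 2
```

In practice, most users edit the YAML file directly and pass it to the startup entry point, keeping all configurable items of a task in one place.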
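
For the tokenizer feature listed under General Features, a minimal sketch is shown below. The `llama2_7b` identifier is a placeholder assumption; the Using Tokenizer document describes the actual loading options, including Hugging Face tokenizer files.

```python
# Minimal sketch of tokenizer usage. Assumption: "llama2_7b" is a placeholder
# identifier; see Using Tokenizer for loading Hugging Face tokenizer files.
from mindformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("llama2_7b")

encoded = tokenizer("An increasing sequence: one, two, three,")
print(encoded["input_ids"])                    # token ids used by datasets and inference
print(tokenizer.decode(encoded["input_ids"]))  # round-trip back to text
```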

Advanced Development with MindSpore Transformers

Environment Variables

Contribution Guide

FAQ