MindSpore Transformers

Introduction

  • Overall Structure
  • Models

Installation

  • Installation Guidelines

Full-Process Guide to Large Models

  • Training Guide
  • Pretraining
  • Supervised Fine-Tuning (SFT)
  • Inference
  • Service Deployment
  • Evaluation

Features

  • Features Overview
  • Start Tasks
  • Ckpt Weights
  • Safetensors Weights
  • Configuration File Descriptions
  • Loading Hugging Face Model Configuration
  • Logs
  • Using Tokenizer
  • Dataset
  • Training Hyperparameters
  • Training Metrics Monitoring
  • Resumable Training After Breakpoint
  • Checkpoint Saving and Loading
  • Resume Training 2.0
  • Distributed Parallelism Training
  • Training High Availability
  • Memory Optimization
  • Data Skip and Checkpoint Health Monitor
  • Pre-trained Model Average Weight Consolidation
  • Other Training Features
  • Quantization

Advanced Development

  • Advanced Development Overview
  • Large Model Precision Optimization Guide
  • Large Model Performance Optimization Guide
  • Development Migration
  • Guide to Using the Inference Configuration Template
  • Comparison of Inference Precision
  • Comparing the Model Precision with that of Megatron-LM
  • Training Configuration Template Instruction
  • Weight Conversion Development Adaptation
  • API

Best Practices

  • Practice Case of Using DeepSeek-R1 for Model Distillation

Environment Variables

  • Environment Variable Descriptions

Contribution Guide

  • MindSpore Transformers Contribution Guidelines
  • Modelers Contribution Guidelines

FAQ

  • Model-Related FAQ
  • Feature-Related FAQ