MindSpore Transformers

Introduction

  • Overall Structure
  • Models

Installation

  • Installation Guidelines

Full-Process Guide to Large Models

  • Training Guide
  • Pretraining
  • Supervised Fine-Tuning (SFT)
  • Inference
  • Service Deployment
  • Evaluation

Features

  • Features Overview
  • Start Tasks
  • Ckpt Weights
  • Safetensors Weights
  • Configuration File Descriptions
  • Loading Hugging Face Model Configuration
  • Logs
  • Using Tokenizer
  • Dataset
  • Training Hyperparameters
  • Training Metrics Monitoring
  • Resumable Training After Breakpoint
  • Checkpoint Saving and Loading
  • Resume Training 2.0
  • Distributed Parallelism Training
  • Training High Availability
  • Memory Optimization
  • Data Skip and Checkpoint Health Monitor
  • Pre-trained Model Average Weight Consolidation
  • Other Training Features
  • Quantization

Advanced Development

  • Advanced Development Overview
  • Large Model Precision Optimization Guide
  • Large Model Performance Optimization Guide
  • Development Migration
  • Guide to Using the Inference Configuration Template
  • Comparison of Inference Precision
  • Comparing the Model Precision with that of Megatron-LM
  • Training Configuration Template Instruction
  • Weight Conversion Development Adaptation
  • API

Best Practices

  • Practice Case of Using DeepSeek-R1 for Model Distillation

Environment Variables

  • Environment Variable Descriptions

Contribution Guide

  • MindSpore Transformers Contribution Guidelines
  • Modelers Contribution Guidelines

FAQ

  • Model-Related FAQ
  • Feature-Related FAQ