MindSpore Transformers Documentation
=====================================

The goal of the MindSpore Transformers suite is to build a full-process development suite for large model pre-training, fine-tuning, inference, and deployment. It provides mainstream Transformer-based Large Language Models (LLMs) and Multimodal Models (MMs), and is expected to help users easily realize the full process of large model development.

Based on MindSpore's built-in parallel technology and component-based design, the MindSpore Transformers suite has the following features:

- One-click launch of single-card or multi-card pre-training, fine-tuning, inference, and deployment for large models;
- Rich multi-dimensional hybrid parallel capabilities for flexible and easy-to-use personalized configuration;
- System-level deep optimization of large model training and inference, with native support for efficient training and inference on ultra-large-scale clusters and rapid fault recovery;
- Configurable development of task components: any module, including the model network, optimizer, and learning rate policy, can be enabled through unified configuration;
- Real-time visualization of training accuracy and performance monitoring indicators.

Users can refer to `Overall Architecture `_ and `Model Library `_ for a quick overview of the MindSpore Transformers system architecture and the list of supported foundation models.

If you have any suggestions for MindSpore Transformers, please contact us via `issue `_, and we will handle them promptly.

Full-process Developing with MindSpore Transformers
-------------------------------------------------------------------------------------------

MindSpore Transformers supports one-click start of single-card or multi-card training, fine-tuning, and inference for any task, making the execution of deep learning tasks more efficient and user-friendly by simplifying operations, providing flexibility, and automating processes. Users can learn more from the following documents:

- `Pretraining `_
- `Supervised Fine-Tuning `_
- `Inference `_
- `Service Deployment `_

Code repository address:

Features Description of MindSpore Transformers
-------------------------------------------------------------------------------------------

MindSpore Transformers provides a wealth of features throughout the full process of large model development. Users can learn about these features via the following links; an illustrative configuration sketch follows the list.

- General Features:

  - `Start Tasks `_

    One-click start for single-device, single-node, and multi-node tasks.

  - `Ckpt Weights `_

    Supports conversion, slicing, and merging of weight files in ckpt format.

  - `Safetensors Weights `_

    Supports saving and loading weight files in safetensors format.

  - `Configuration File Descriptions `_

    Supports the use of `YAML` files to centrally manage and adjust configurable items in tasks.

  - `Loading Hugging Face Model Configuration `_

    Supports plug-and-play loading of Hugging Face community model configurations for seamless integration.

  - `Logs `_

    Introduction to logs, including the log structure, log saving, and so on.

  - `Using Tokenizer `_

    Introduction to the tokenizer; supports Hugging Face tokenizers for use in inference and datasets.

- Training Features:

  - `Dataset `_

    Supports multiple types and formats of datasets.

  - `Training Hyperparameters `_

    Flexibly configures hyperparameter settings for large model training.

  - `Training Metrics Monitoring `_

    Provides visualization services for the training phase of large models, for monitoring and analyzing indicators and information during the training process.

  - `Resumable Training After Breakpoint `_

    Supports step-level resumable training after breakpoints, effectively reducing the waste of time and resources caused by unexpected interruptions during large-scale training.

  - `Training High Availability (Beta) `_

    Provides high-availability capabilities for the training phase of large models, including end-of-life CKPT preservation, UCE fault-tolerant recovery, and process-level rescheduling recovery (Beta feature).

  - `Distributed Parallel Training `_

    One-click configuration of multi-dimensional hybrid distributed parallelism allows models to run efficiently on clusters of up to 10,000 cards.

  - `Training Memory Optimization `_

    Supports fine-grained recomputation and activation swapping to reduce peak memory overhead during model training.

  - `Other Training Features `_

    Supports gradient accumulation, gradient clipping, and more.

- Inference Features:

  - `Evaluation `_

    Supports the use of third-party open-source evaluation frameworks and datasets for benchmark evaluations of large models.

  - `Quantization `_

    Integrates the MindSpore Golden Stick toolkit to provide a unified quantization inference process.
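The unified `YAML` configuration mentioned above is the main way tasks are assembled: the model network, optimizer, learning rate policy, and parallel strategy are all selected and tuned from a single file. The sketch below is only a minimal illustration of that idea; the field names are assumptions made for this example, so refer to the Configuration File Descriptions document linked above and the configuration files shipped with each model for the actual schema.

.. code-block:: yaml

   # Illustrative sketch only: these keys follow the general shape of a task
   # configuration and are not an exact schema.
   run_mode: 'finetune'            # which process to launch, e.g. train / finetune / predict
   model:
     model_config:
       seq_length: 4096            # model network settings
   optimizer:
     type: AdamW                   # optimizer chosen through unified configuration
     learning_rate: 1.0e-5
   lr_schedule:
     type: CosineWithWarmUpLR      # learning rate policy
     warmup_ratio: 0.01
   parallel_config:
     data_parallel: 2              # multi-dimensional hybrid parallelism
     model_parallel: 4
     pipeline_stage: 1

Because every module is driven by such a file, switching optimizers, learning rate policies, or parallel strategies is a configuration change rather than a code change.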
Advanced Developing with MindSpore Transformers
-------------------------------------------------

- Diagnostics and Optimization

  - `Precision Optimization `_
  - `Performance Optimization `_

- Model Development

  - `Development Migration `_
  - `Guide to Using the Inference Configuration Template `_

- Accuracy Comparison

  - `Compare Training Accuracy with Megatron-LM `_

Environment Variables
------------------------------------

- `Environment Variables Description `_

Contribution Guide
------------------------------------

- `MindSpore Transformers Contribution Guide `_
- `Modelers Contribution Guide `_

FAQ
------------------------------------

- `Model-Related `_
- `Function-Related `_

.. toctree::
   :glob:
   :maxdepth: 1
   :caption: Introduction
   :hidden:

   introduction/overview
   introduction/models

.. toctree::
   :glob:
   :maxdepth: 1
   :caption: Installation
   :hidden:

   installation

.. toctree::
   :glob:
   :maxdepth: 1
   :caption: Full-process Guide to Large Models
   :hidden:

   guide/pre_training
   guide/supervised_fine_tuning
   guide/inference
   guide/deployment

.. toctree::
   :glob:
   :maxdepth: 1
   :caption: Features
   :hidden:

   feature/start_tasks
   feature/ckpt
   feature/safetensors
   feature/configuration
   feature/load_huggingface_config
   feature/logging
   feature/training_function
   feature/infer_function
   feature/tokenizer

.. toctree::
   :glob:
   :maxdepth: 1
   :caption: Advanced Development
   :hidden:

   advanced_development/precision_optimization
   advanced_development/performance_optimization
   advanced_development/dev_migration
   advanced_development/yaml_config_inference
   advanced_development/accuracy_comparison
   advanced_development/api

.. toctree::
   :glob:
   :maxdepth: 1
   :caption: Excellent Practice
   :hidden:

   example/distilled/distilled

.. toctree::
   :glob:
   :maxdepth: 1
   :caption: Environment Variables
   :hidden:

   env_variables

.. toctree::
   :glob:
   :maxdepth: 1
   :caption: Contribution Guide
   :hidden:

   contribution/mindformers_contribution
   contribution/modelers_contribution

.. toctree::
   :glob:
   :maxdepth: 1
   :caption: FAQ
   :hidden:

   faq/model_related
   faq/feature_related