Release Notes

MindSpore Transformers 1.9.0 Release Notes

The following is the changelog for MindSpore Transformers 1.9.0 compared with 1.8.0, including key new features and bug fixes.

Training: Supports forward inference during training; when pipeline parallelism is enabled for training jobs, parameter loading information for the corresponding rank can be printed.
Model support: Added inference and pre-training for TeleChat3-36B; added pre-training for TeleChat3-105B-A4.7B.
Performance monitoring: Extended the Profile performance monitoring module with timing tracking for the cluster’s first startup phase.
Checkpoint solution: Adapted Checkpoint 2.0 for fast recovery from failures; optimized Hugging Face weight loading performance [1].
PyNative capability (experimental): Supports launching the training process via Trainer; supports the construction of Qwen3 dense models.

Newly supported models:

Model	Variants
TeleChat3	TeleChat3-36B (pre-training, inference), TeleChat3-105B-A4.7B (pre-training)

During this release cycle we fixed issues across models, features, usability, documentation, and more. Key fixes include:

!8006: Fixed incorrect TFLOPs printing for MoE models.
!7874: Fixed pad_token_id not taking effect in MCore networks.
!7818: Fixed hostname retrieval failures in some environments.
!7793, !7713: Fixed Hugging Face dataset-related issues.
!7630: Fixed safetensors weight conversion and loading when changing parallel strategies.
!7620: Fixed accuracy issues caused by communication for VocabEmbedding under certain configurations.
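
For context on the TFLOPs fix (!8006), the following is a minimal Python sketch of how a training TFLOPs figure can differ for a MoE model. It is not the MindSpore Transformers implementation and does not describe the actual bug. It assumes the common approximation of 6 FLOPs per parameter per token for training, and that only the parameters active for each token are counted. The function name, the throughput value, and the reading of "A4.7B" as roughly 4.7 billion active parameters are assumptions made for illustration.

```python
def estimate_train_tflops(params: float, tokens_per_second: float,
                          num_devices: int = 1) -> float:
    """Rough per-device training TFLOPs using the 6 * params * tokens rule.

    This ignores attention FLOPs and recomputation, so it is only an estimate.
    """
    if num_devices <= 0:
        raise ValueError("num_devices must be positive")
    flops_per_token = 6.0 * params  # forward + backward, approximate
    return flops_per_token * tokens_per_second / num_devices / 1e12


# Hypothetical numbers, chosen only to show the size of the gap.
TOTAL_PARAMS = 105e9      # assumed total parameter count
ACTIVE_PARAMS = 4.7e9     # assumed parameters active per token
THROUGHPUT = 1000.0       # hypothetical tokens per second

tflops_if_total = estimate_train_tflops(TOTAL_PARAMS, THROUGHPUT)
tflops_if_active = estimate_train_tflops(ACTIVE_PARAMS, THROUGHPUT)

print(f"counting total params:  {tflops_if_total:.1f} TFLOPs")
print(f"counting active params: {tflops_if_active:.1f} TFLOPs")
```

Under these assumptions, counting total rather than active parameters inflates the estimate by the ratio of the two parameter counts, which is why MoE models usually need a separate code path from dense models when reporting TFLOPs.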

This release includes changes to some historically deprecated models, code, and materials. Details:

Change	Description
None	No change notes for this version

Thanks to everyone who contributed during this release cycle:

Contributions in any form are welcome!

[1] Experimental tests show that the loading time for a hundred-billion-parameter model on a hundred-NPU cluster was reduced by 80%.
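
To put the 80% figure in more familiar terms, here is a small Python sketch that converts a fractional reduction in loading time into a speedup factor. The helper names and the 600-second baseline are hypothetical and used only for illustration. The release notes do not state absolute loading times.

```python
def speedup_from_reduction(reduction: float) -> float:
    """Convert a fractional time reduction (0.8 for 80%) into a speedup factor."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in the range [0, 1)")
    return 1.0 / (1.0 - reduction)


def time_after_reduction(baseline_seconds: float, reduction: float) -> float:
    """Return the elapsed time that remains after applying the reduction."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in the range [0, 1)")
    return baseline_seconds * (1.0 - reduction)


REDUCTION = 0.8            # the 80% reduction reported in the note above
BASELINE_SECONDS = 600.0   # hypothetical baseline, not a measured value

print(f"speedup: {speedup_from_reduction(REDUCTION):.1f}x")
print(f"new time: {time_after_reduction(BASELINE_SECONDS, REDUCTION):.0f} s")
```

An 80% reduction therefore corresponds to loading about five times faster than before, whatever the baseline is.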