# MindSpore Transformers Release Notes

## MindSpore Transformers 1.5.0 Release Notes

The following is the changelog for MindSpore Transformers suite version 1.5.0, covering the key new features and bugfixes relative to version 1.3.2.

### New Features

* [Distributed Parallelism](https://www.mindspore.cn/mindformers/docs/en/r1.5.0/function/distributed_parallel.html): Added the Seq Pipe and hybrid sequence parallelism features.
* [Weights](https://www.mindspore.cn/mindformers/docs/en/r1.5.0/function/safetensors.html): Added support for weights in the Safetensors format, including Safetensors remove-redundancy saving (see the short format sketch after the Bugfix section below).
* [Datasets](https://www.mindspore.cn/mindformers/docs/en/r1.5.0/function/dataset.html): Added support for packing with Hugging Face datasets and for EOD mask compression with Megatron multi-source mixed datasets.
* [Training Monitor](https://www.mindspore.cn/mindformers/docs/en/r1.5.0/function/monitor.html): Added support for real-time visual monitoring of training metrics with TensorBoard.
* [High Availability](https://www.mindspore.cn/mindformers/docs/en/r1.5.0/function/high_availability.html): Added the end-of-life CKPT, UCE fault-tolerance recovery, and process-level rescheduling recovery functions.
* [Heterogeneous Storage](https://www.mindspore.cn/mindformers/docs/en/r1.5.0/function/fine_grained_activations_swap.html): Added fine-grained swapping (SWAP) of activation values during training.

### New Models

The following new models are supported:

| Models | Specifications |
|--------|----------------|
| [DeepSeek-V3/R1](https://gitee.com/mindspore/mindformers/tree/r1.5.0/research/deepseek3) | DeepSeek-V3-671B (pre-training, fine-tuning, inference), DeepSeek-R1-671B (inference) |
| [Llama3.2](https://gitee.com/mindspore/mindformers/blob/r1.5.0/docs/model_cards/llama3_2.md) | Llama3.2-3B (inference), Llama3.2-Vision-11B (fine-tuning, inference) |
| [Qwen2.5](https://gitee.com/mindspore/mindformers/tree/r1.5.0/research/qwen2_5) | Qwen2.5-0.5B/1.5B (inference), Qwen2.5-7B/14B/32B/72B (fine-tuning, inference) |
| [TeleChat2](https://gitee.com/mindspore/mindformers/tree/r1.5.0/research/telechat2) | TeleChat2-7B/35B/115B (fine-tuning, inference) |
| [YiZhao](https://gitee.com/mindspore/mindformers/tree/r1.5.0/research/yizhao) | YiZhao-12B (pre-training, fine-tuning) |

### Bugfix

During this release cycle, we fixed bugs across models, functionality, usability, and documentation. Some of the key fixes are listed below:

* [!6013](https://gitee.com/mindspore/mindformers/pulls/6013): Fixed the incompatibility between context parallelism (cp) and sequence parallelism (use_seq_parallel).
* [!6007](https://gitee.com/mindspore/mindformers/pulls/6007): Fixed the maximum number of checkpoints to keep during training (keep_checkpoint_max) not taking effect for checkpoints containing only model parameters.
* [!83880](https://gitee.com/mindspore/mindspore/pulls/83880): Fixed overflow detection failing when gradients overflow on large clusters.
* [!80845](https://gitee.com/mindspore/mindspore/pulls/80845), [!80861](https://gitee.com/mindspore/mindspore/pulls/80861): Fixed an error reported by Llama models when ConstantWarmUpLR is enabled with the compilation cache turned on.
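For readers unfamiliar with the Safetensors weight format mentioned under New Features, below is a minimal sketch of inspecting a Safetensors weight file with the generic `safetensors` Python package. This is not MindSpore Transformers' own weight-loading API (see the linked Weights documentation for that), and the file name `model.safetensors` is only a placeholder.

```python
# Minimal sketch: inspect a Safetensors weight file with the generic
# `safetensors` package (pip install safetensors). This is NOT the
# MindSpore Transformers loading API; "model.safetensors" is a placeholder.
from safetensors import safe_open

with safe_open("model.safetensors", framework="np") as f:
    for name in f.keys():                # tensor names stored in the file
        tensor = f.get_tensor(name)      # load the named tensor as a NumPy array
        print(name, tensor.shape, tensor.dtype)
```

Because Safetensors stores a JSON header plus raw tensor bytes, individual tensors can be read lazily as above without deserializing the entire checkpoint.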
### Change Description

In this version, some previously deprecated models, code, and documentation have been removed. The details are as follows:

| Change Content | Change Description |
|----------------|--------------------|
| Removed the code, configuration files, and materials of deprecated models | The models involved include Bloom, BaiChuan, BaiChuan2, CodeGeeX, CodeGeeX2, GLM, GLM2, VisualGLM, InternLM, PanguAlpha, SAM, SkyWork, WizardCoder, Qwen, Ziya, and Llama |
| Removed the code of deprecated interfaces | The interfaces involved include CompareLoss, FusedCastAdamWeightDecay, MultiImgCapDataLoader, MultiImgCapDataset, ImageToTextRetrievalTrainer, auto_augment, group_ic_params, group_mim_parameters, and TokenClassificationTrainer |
| Removed the old version of the official documentation | Removed the files of the old documentation from the repository. The official documentation is now available at [MindSpore Transformers Official Documentation](https://www.mindspore.cn/mindformers/docs/en/r1.5.0/index.html) |

### Contributors

Thanks to the following people for their contributions:

chengxianbin, Chong Li, ehaleva, hangangqiang, huangshengshuai, huangzhuo, leida, lilei, limengyuan, liubuyu, lizhihao, moran, wangpingan, wangshaocong, wudawei, wutiancheng, wuweikang, yangminghai, yao_yf, zhanzhan, ZhouJingfeng, zhouyaqiang, 常少中, 陈心锐, 陈昱坤, 程泽睿志, 樊瑞, 范益, 封霆谚, 冯浩, 葛煜洪, 郭儒辰, 何泽泉, 胡安东, 胡思超, 胡志坤, 宦晓玲, 黄靖伟, 黄磊, 黄新元, 黄勇, 黄志超, 黄子灵, 季文尚, 金仁操, 孔紫怡, 蓝翔, 李嘉坤, 李俊标, 李子垠, 林盈来, 刘晨晖, 刘烙彬, 刘力力, 刘言伟, 马成贵, 倪钰鑫, 牛君豪, 彭竞由, 秦思莼, 任峪瑾, 赛尧, 苏海波, 孙宇轩, 谭纬城, 唐德志, 汪家傲, 王浩然, 王振邦, 魏琢艺, 吴昊天, 吴治锋, 吴致远, 肖尧, 尤日帆, 俞涵, 张丹阳, 张浩, 张敏利, 张森镇, 张奕晖, 张又文, 赵奕舜, 周声煦, 周小琪, 祝建伟, 邹文祥

Contributions to the project in any form are welcome!