Release Notes

MindSpore 2.7.1 Release Notes

Major Features and Improvements

Dataset

  • [STABLE] The parallel acceleration of the dataset .map and .batch operations has been restructured. The original data transfer path, C++ thread (with GIL) -> Python process pool -> worker[i], is now C++ thread -> msg & shm -> worker[i]. This shortens the data transfer chain and removes the dependency on the GIL, which significantly improves the efficiency of custom data processing.
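
The "msg & shm" hand-off can be pictured with a small sketch: the payload is written to shared memory, and only a short message describing where to find it is passed to the worker. This is not MindSpore's implementation. The function names, the message fields, and the use of a queue are all hypothetical, and both sides run in one process here for simplicity, whereas a real worker would be a separate process.

```python
import queue
from multiprocessing import shared_memory


def produce(payload: bytes, channel: queue.Queue) -> shared_memory.SharedMemory:
    """Copy the payload into a new shared-memory block and post a small message."""
    shm = shared_memory.SharedMemory(create=True, size=len(payload))
    shm.buf[:len(payload)] = payload
    # The message carries only metadata (hypothetical field names), not the data.
    channel.put({"shm_name": shm.name, "nbytes": len(payload)})
    return shm


def consume(channel: queue.Queue) -> bytes:
    """Read the message, attach to the named block, and copy the payload out."""
    msg = channel.get()
    view = shared_memory.SharedMemory(name=msg["shm_name"])
    # The block may be larger than requested, so slice by the advertised size.
    data = bytes(view.buf[:msg["nbytes"]])
    view.close()
    return data


channel = queue.Queue()
owner = produce(b"sample-row-0", channel)
received = consume(channel)

# The creator is responsible for releasing the block.
owner.close()
owner.unlink()
```

The point of the design is that the message stays small regardless of the payload size, so the bulk data is never serialized through the channel.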

Parallel

  • [STABLE] In static graph mode, the AlltoAllVC operator is supported for both the forward and backward passes through the mindspore.ops.AlltoAllVC interface. In dynamic graph mode, the same functionality is available through the mindspore.communication.comm_func.all_to_all_v_c communication interface.

  • [STABLE] Dynamic graph mode supports TCPStore. Users can access this functionality through the communication class mindspore.mint.distributed.TCPStore.

  • [STABLE] The mindspore.communication.create_group interface now supports the hccl_comm parameter, allowing the reuse of externally created communication groups. Users can create communication groups through other means and pass them to MindSpore for network construction.

  • [BETA] The mindspore.mint.distributed.init_process_group interface now supports the init_method and store parameters. Users can use either one to initialize communication without depending on the scheduler process.
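
To make the AlltoAllVC data movement concrete, the following pure-Python sketch simulates a variable-size all-to-all on plain lists. It does not call MindSpore. It assumes that the count matrix is flattened row by row, that entry [src * n + dst] is the number of elements rank src sends to rank dst, and that each receiver concatenates what it gets in source-rank order. The function name is hypothetical.

```python
def simulate_all_to_all_v_c(inputs, send_count_matrix):
    """Simulate a variable-size all-to-all exchange across len(inputs) ranks.

    inputs[src] is the send buffer of rank src.
    send_count_matrix[src * n + dst] is assumed to be the number of elements
    that rank src sends to rank dst.
    """
    n = len(inputs)
    if len(send_count_matrix) != n * n:
        raise ValueError("send_count_matrix must hold n * n entries")

    outputs = [[] for _ in range(n)]
    # Visit senders in rank order so every receive buffer is ordered by source.
    for src in range(n):
        offset = 0
        for dst in range(n):
            count = send_count_matrix[src * n + dst]
            outputs[dst].extend(inputs[src][offset:offset + count])
            offset += count
    return outputs


inputs = [[0, 1, 2], [10, 11, 12, 13]]
matrix = [1, 2,   # rank 0 keeps 1 element and sends 2 to rank 1
          3, 1]   # rank 1 sends 3 elements to rank 0 and keeps 1
outputs = simulate_all_to_all_v_c(inputs, matrix)
# outputs == [[0, 10, 11, 12], [1, 2, 13]]
```

Because every rank sees the same full matrix, each rank can derive both its send and its receive sizes locally without an extra size-exchange step.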

Ascend

  • [BETA] Operator automatic fusion is supported in PyNative asynchronous execution mode. Users can enable it by setting the environment variable: export MS_DEV_PYNATIVE_FUSION_FLAGS="--opt_level=1".

  • [BETA] The static graph now supports custom operator integration via the CustomOpBuilder method, and exposes the MS_CUSTOM_OPS_REGISTER macro for registering custom operator function classes.

  • [BETA] Dynamic graph mode supports the use of custom memory pools via mindspore.runtime.use_mem_pool to implement custom memory allocation within the context.
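
For launcher scripts written in Python, the fusion flag quoted above can be set from code rather than with export. This is a minimal sketch that assumes the variable is read when the framework initializes, so it is set before MindSpore would be imported. That ordering requirement is an assumption, not something stated in these notes.

```python
import os

# Assumption: the flag must be in the environment before MindSpore is imported.
os.environ["MS_DEV_PYNATIVE_FUSION_FLAGS"] = "--opt_level=1"

# Read the value back so a launcher can log the effective setting.
fusion_flags = os.environ["MS_DEV_PYNATIVE_FUSION_FLAGS"]
```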

PyNative

  • [STABLE] The mindspore.mint.empty_like and mindspore.mint.empty functions support the pin_memory parameter. If pin_memory is set to True, the returned Tensor is allocated in pinned memory. This parameter works only for CPU Tensors.

  • [STABLE] View operator performance is optimized. Streamlined execution workflows and refined critical implementations significantly improve the execution efficiency of View operators in eager mode.

  • [STABLE] The mindspore.Tensor.to interface is extended to support copying Tensor data between devices.

  • [STABLE] Tensor storage supports CPU, GPU, and Ascend. Before 2.7.1, Tensor storage only supported Ascend hardware.

Tools

  • [STABLE] The static graph supports the joint detection scheme for feature values. Users can enable this feature by configuring the environment variable MS_NPU_ASD_CONFIG.

  • [STABLE] HCCS link fault detection is supported. The stress_detect interface now includes an HCCS detection type.

API Change

  • [STABLE] The mindspore.mint API provides new functional and Tensor interfaces. Most mint interfaces are still experimental. They perform better than the ops interfaces with jit_level="O0"/"O1" and in PyNative mode. The graph sinking mode and the CPU/GPU backends are not yet supported; support will be improved gradually.

    mindspore.mint

    mindspore.mint.real

    mindspore.mint.imag

    mindspore.Tensor

    mindspore.Tensor.sign_

    mindspore.Tensor.masked_scatter_

    mindspore.Tensor.index_copy_

    mindspore.Tensor.index_fill_

    mindspore.Tensor.sigmoid_

  • [STABLE] mindspore.mint.nn.functional.conv1d and mindspore.mint.nn.Conv1d interfaces have transitioned from demo to stable.

  • [STABLE] mindspore.mint.stack and mindspore.mint.concat interfaces now support Tensor inputs of different data types.

  • [BETA] The mindspore.ops.Morph operator primitive now supports custom backward propagation functions (bprop), allowing users to define corresponding gradient computation logic for the custom forward function.

  • [BETA] mindspore.recompute can now be used in static graph mode: users can call the API inside a function decorated with @jit.

  • [BETA] mindspore.enable_dynamic supports symbolic deduction, meaning that the input shapes are allowed to use mindspore.Symbol.

Contributors

Bellatan,caifubi,chaijinwei,changzherui,chengbin,chujinjin,DavidFFFan,DeshiChen,Dring,ehaleva,fary86,gaoyong10,guangpengz,GuoZhibin,guozhijian,haozhang,hedongdong,Henry Shi,hhz886,huangbingjian,huangzhuo,huangziling,huda,Huilan Li,huoxinyou,jiangshanfeng,jiaorui,jiaxueyu,jizewei,kairui_kou,kingxian,kisnwang,leida,liangchenghui,lichen,limingqi107,Linhai,LiNuohang,linux,liubuyu,liudongxu,liuluobin,liuyanwei,lizhitong,looop5,luochao60,machenggui,maoyuanpeng1,Margaret_wangrui,mengxian,MengXiangyu,mengyuanli,Metaqiang,NaCN,nepdada,panzhihui,Qiao_Fu,qiuleilei,qqqhhhbbb,qujianwei,shaoshengqi,shen_haochen,shenwei41,shuqian0,tanghuikang,uuhuu,wang_ziqi,wangjialin,wujueying,XianglongZeng,Xiaoda,xiaopeng,xiaotianci,xiaoyao,XinDu,xuzhen,yanghaoran,yao_yf,yefeng,yide12,yiguangzheng,YijieChen,yuanqi,yuchaojie,YuJianfeng,YukioZzz,yuliangbin,yyyyrf,zhangbuxue,zhangdanyang,zhanghanLeo,zhangyinxia,ZhangZGC,zhanzhan,zhaochenjie,zhengzuohe,zhunaipan,ZPaC,zyli2020,范吉斌,龚昊宇,胡彬,宦晓玲,李栋,李良灿,李林杰,刘崇鸣,刘飞扬,刘力力,刘子涵,宋佳琪,孙昊辰,王泓皓,王振邦,俞涵,张栩浩,张学同,周一航

MindSpore 2.7.0 Release Notes

Major Features and Improvements

Dataset

Parallel

  • [STABLE] MindSpore now supports gather and reduce operations for tensors of non-uniform sizes. Users can use this functionality through the mint.distributed.all_gather_into_tensor_uneven and mint.distributed.reduce_scatter_tensor_uneven interfaces. mint.distributed.all_gather and mint.distributed.reduce_scatter also support gather and reduce operations for non-uniform sizes, respectively.

  • [BETA] Pipeline parallelism supports ZeroBubbleV scheduling, which reduces pipeline bubbles. It can be combined with forward/backward computation-communication overlap to increase the proportion of communication that overlaps with computation.

  • [STABLE] MindSpore optimizes the pipeline parallel (PP) communication domain and significantly reduces HCCL_BUFFER_SIZE.

  • [BETA] MindSpore supports fine-grained optimization of the HCCL_BUFFER_SIZE, which can be set through the environment variable MS_DEV_HCCL_CONF. Refer to Environment Variables for details.
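
The semantics of gathering and reducing tensors of non-uniform sizes can be illustrated on plain Python lists. The sketch below does not call MindSpore. It assumes that gathering concatenates per-rank inputs in rank order, that the reduction is a sum, and that the reduced result is split by a list of per-rank sizes. The function names are hypothetical.

```python
def all_gather_uneven(per_rank_inputs):
    """Concatenate inputs of different lengths in rank order."""
    gathered = []
    for chunk in per_rank_inputs:
        gathered.extend(chunk)
    return gathered


def reduce_scatter_uneven(per_rank_inputs, split_sizes):
    """Sum equally sized inputs element-wise, then hand out uneven slices."""
    length = len(per_rank_inputs[0])
    if any(len(x) != length for x in per_rank_inputs):
        raise ValueError("every rank must contribute the same number of elements")
    if sum(split_sizes) != length:
        raise ValueError("split_sizes must add up to the input length")

    reduced = [sum(values) for values in zip(*per_rank_inputs)]
    outputs, offset = [], 0
    for size in split_sizes:
        outputs.append(reduced[offset:offset + size])
        offset += size
    return outputs


# Rank 0 contributes 3 elements, rank 1 contributes 1.
gathered = all_gather_uneven([[1, 2, 3], [4]])
# gathered == [1, 2, 3, 4]

# Rank 0 receives the first 3 reduced elements, rank 1 the last one.
scattered = reduce_scatter_uneven([[1, 2, 3, 4], [10, 20, 30, 40]], [3, 1])
# scattered == [[11, 22, 33], [44]]
```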

Compiler

  • [BETA] mindspore.nn.Cell supports registering forward hooks and backward hooks in graph mode.

Runtime

  • [STABLE] MindSpore supports reserving huge page memory. Users can enable this feature by passing the huge_page_reserve_size parameter in the mindspore.runtime.set_memory API.

Lite

MindSpore Lite delivers lightweight AI inference acceleration capabilities for diverse hardware devices, empowering smart applications. It provides developers with an end-to-end solution and offers algorithm engineers and data scientists a user-friendly development experience characterized by efficient execution and flexible deployment.

To better foster the development of the AI software and hardware application ecosystem, MindSpore Lite has established an independent code repository to drive ecosystem growth. In the future, MindSpore Lite will work together with the MindSpore AI community to enrich the AI software and hardware application ecosystem.

For further details, please visit the MindSpore Lite Code Repository.

API Change

  • [STABLE] As part of the mindspore.mint API integration task, the interface definitions and functionalities of several Tensor APIs have been aligned and optimized.

    mindspore.Tensor

    mindspore.Tensor.masked_scatter

    mindspore.Tensor.bernoulli_

    mindspore.Tensor.zero_

    mindspore.Tensor.copy_

  • [STABLE] mindspore.ops API provides a new interface mindspore.ops.ring_attention_update. Currently, it is only supported on Atlas A2 Training Series Products.

  • [STABLE] A new interface, mindspore.enable_dynamic, is provided to specify whether a parameter has a dynamic shape or a dynamic rank.

Contributors

Bellatan,caifubi,ccsszz,chaijinwei,chengbin,chenweifeng,chujinjin,DavidFFFan,DeshiChen,dingjinshan,fary86,fuchao,gaoyong10,GuoZhibin,guozhijian,haozhang,hedongdong,Henry Shi,hhz886,huangbingjian,huangziling,huda,Huilan Li,jiangchao_j,jianghui58,jiangshanfeng,jiaorui,jiaxueyu,jizewei,leida,lichen,limingqi107,LiNuohang,linux,liubuyu,liuluobin,looop5,luochao60,luoyang,maoyuanpeng1,Margaret_wangrui,mengxian,MengXiangyu,NaCN,One_East,panzhihui,Qiao_Fu,qiuleilei,qiuyufeng,r1chardf1d0,SaiYao,shaoshengqi,shen_haochen,shenwei41,shuqian0,St.Universe,suteng,TAJh,tanghuikang,tianxiaodong,wang_ziqi,wangyibo,wujueying,wusimin,XianglongZeng,xiaopeng,xiaotianci,xiaoyao,XinDu,xuzhen,yanghaoran,yangyingchun,yide12,yonibaehr,yuanqi,yuchaojie,YuJianfeng,YukioZzz,yuliangbin,zhangbuxue,zhangdanyang,zhanghanLeo,zhangyinxia,ZhangZGC,zhaochenjie,Zhi Feng Wu,zhuguodong,ZPaC,zyli2020,程超,胡犇,胡彬,宦晓玲,黄勇,李良灿,李林杰,刘飞扬,刘勇琪,刘子涵,王振邦,熊攀,杨卉,俞涵,云骑士,张栩浩,周一航

MindSpore 2.7.0-rc1 Release Notes

Major Features and Improvements

Ascend

  • [STABLE] Added GroupedMatmul optimization in the O1 scenario, which supports fusion of element-wise operators to significantly reduce data movement overhead and improve computational efficiency. Users can turn it on by setting the environment variable MS_DEV_GRAPH_KERNEL_FLAGS to "--enable_cluster_ops=GroupedMatmul".

  • [STABLE] Improved ease of use of the memory tracker. Users can import tracker data for model memory analysis via mindspore.runtime.memory_replay(file_path); set the tracker data storage path with export MS_ALLOC_CONF=memory_tracker_path:file_path; and reduce the size of saved data with export MS_ALLOC_CONF=simple_tracker:true, which saves only the last user of each memory block.

  • [STABLE] Optimized the custom operator function of the ops.Custom primitive for aclnn types in graph mode, with full support for inputs of non-Tensor type and automatic loading of RegInfo information, which significantly improves the ease of use and flexibility of aclnn custom operators.
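
The environment variables mentioned in this list can also be prepared from a Python launcher. The sketch below builds an MS_ALLOC_CONF value from key:value pairs. Joining several options with a comma is an assumption, as is the need to set the variables before MindSpore is imported. The tracker path and the helper name are hypothetical.

```python
import os


def build_alloc_conf(options):
    """Render options as key:value items (comma-joined, by assumption)."""
    return ",".join(f"{key}:{value}" for key, value in options.items())


alloc_conf = build_alloc_conf({
    "memory_tracker_path": "/tmp/ms_tracker",  # hypothetical storage path
    "simple_tracker": "true",                  # keep only the last user per block
})

# Assumption: both variables are read when the framework initializes.
os.environ["MS_ALLOC_CONF"] = alloc_conf
os.environ["MS_DEV_GRAPH_KERNEL_FLAGS"] = "--enable_cluster_ops=GroupedMatmul"
```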

PyNative

  • [STABLE] mindspore.nn.Cell.register_forward_hook and mindspore.nn.Cell.register_forward_pre_hook added the with_kwargs argument (default: False) to support passing keyword arguments from the construct call to hook_fn.

  • [STABLE] mindspore.Tensor.register_hook now supports registering hooks on output tensors of operators with multiple outputs.

  • [STABLE] Enhanced Cell custom bprop: the data type and shape of each return value are verified against the corresponding input and converted automatically where needed.

  • [STABLE] Added Storage API for Tensor: Supports memory operations through Storage and related interfaces to achieve memory optimization.

  • [STABLE] Added the C++ ms::pynative::PyboostRunner interface, which makes it easier for custom operators to support PyNative's multi-stage pipelined runtime.

Parallel

Training

  • [STABLE] Recomputation communication overlap: Supports mutual overlap of the communication between two cells for full recomputation, improving model performance in recomputation scenarios.

  • [STABLE] Quick recovery from accuracy failure without reboot: Supports resuming training by loading a checkpoint file, without restarting the process, when a training result exception occurs.

  • [STABLE] Silent data corruption detection: Supports validating MatMul results during forward and backward computation. Users can enable it by setting the environment variable MS_SDC_DETECT_ENABLE to 1, and use the interfaces in the mindspore.utils.sdc_detect module to start or stop detection and to get the detection result.

Inference

  • [STABLE] The vLLM MindSpore plugin has now been adapted to vLLM v0.8.3 and supports foundational features of the vLLM V1 new architecture, including inference capabilities such as Chunked Prefill and Automatic Prefix Caching. For service-oriented deployment, vLLM MindSpore adds support for hybrid DP/TP/EP parallel inference capabilities on DeepSeek-V3/R1, effectively improving both full and incremental inference efficiency while reducing device memory overhead.

Tool

MindInsight will no longer be updated or released after version 2.3, and the related documents have been removed. The original system optimization data visualization has been integrated into MindStudio Insight, and scalar visualization, parameter distribution visualization, and computational graph visualization have been integrated into the MindStudio Insight plugins. For details, see the MindStudio Insight User Guide.

  • [STABLE] MindSpore Profiler supports msMonitor, enabling users to collect performance data through online monitoring tools.

  • [STABLE] MindSpore Profiler adds the record_shapes parameter, allowing users to collect the shapes of operators launched by the framework.

  • [STABLE] MindSpore Profiler adds sys resource parameters for collecting sys resource class information.

  • [STABLE] MindSpore Profiler adds the host_sys parameter for collecting host information such as system call class, storage class, and CPU information.

  • [STABLE] The MindSpore Profiler mstx module provides domain functionality, allowing users to control mstx data at a fine granularity.

API Change

  • [STABLE] Some of the functional, nn, and Tensor interfaces of the mindspore.mint API have been promoted from DEMO to STABLE. Most mint interfaces are still experimental. They perform better than the ops interfaces in graph compilation modes O0/O1 and in PyNative mode. The O2 compilation mode (graph sink) and the CPU and GPU backends are not yet supported; support will be improved gradually.

    mindspore.mint

    mindspore.mint.randomperm

    mindspore.mint.randn

    mindspore.mint.randint

    mindspore.mint.triu

    mindspore.mint.empty_like

    mindspore.mint.empty

    mindspore.mint.floor_divide

    mindspore.mint.nn

    mindspore.mint.nn.BatchNorm1d

    mindspore.mint.nn.BatchNorm2d

    mindspore.mint.nn.BatchNorm3d

    mindspore.mint.nn.PixelShuffle

    mindspore.mint.nn.Threshold

    mindspore.mint.nn.functional

    mindspore.mint.nn.functional.threshold

    mindspore.mint.nn.functional.threshold_

    mindspore.mint.nn.functional.pixel_shuffle

    mindspore.Tensor

    mindspore.Tensor.new_full

    mindspore.Tensor.new_empty

    mindspore.Tensor.floor_divide

    mindspore.Tensor.exponential_

  • [STABLE] The mindspore.ops API provides a new interface, mindspore.ops.swiglu. Currently, only the Ascend backend is supported.

  • [STABLE] mindspore.ops.svd now also supports the Ascend backend.

  • [STABLE] mindspore.mint.nn.functional.silu and mindspore.mint.nn.SiLU now support input argument inplace.

  • [STABLE] communication.create_group now supports additional configuration options for communication domains. With the HCCL backend, setting hccl_config in options configures the cache size of the HCCL communication domain.

  • [STABLE] mindspore.runtime API adds implementation of mindspore.runtime.empty_cache.

  • [STABLE] mindspore.runtime.set_memory now supports input argument huge_page_reserve_size.

  • [STABLE] mindspore.runtime.set_cpu_affinity now supports input argument module_to_cpu_dict.

  • [STABLE] mindspore.nn.Cell adds the ability to view and save a model's state_dict. The new interfaces are as follows:

    mindspore.nn.Cell

    cell.register_state_dict_post_hook

    cell.register_state_dict_pre_hook

    cell.state_dict

    cell.register_load_state_dict_pre_hook

    cell.register_load_state_dict_post_hook

    cell.load_state_dict

  • [STABLE] mindspore.nn.Cell adds the ability to view and register a model's buffers. The new interfaces are as follows:

    mindspore.nn.Cell

    cell.register_buffer

    cell.get_buffer

    cell.get_sub_cell

    cell.named_buffer

    cell.buffers

Backwards Incompatible Change

  • runtime.set_cpu_affinity

    The type of affinity_cpu_list changed from dict to list; it now customizes the affinity CPU range segments for a single process only. A new parameter, module_to_cpu_dict, supports customized CPU affinity policies for hot module threads.

    2.6:

    >>> from mindspore.runtime import set_cpu_affinity
    >>> set_cpu_affinity(True, {"device0": ["10-19", "23-40"]})

    2.7:

    >>> from mindspore.runtime import set_cpu_affinity
    >>> set_cpu_affinity(True, ["10-19", "23-40"],
    ...                  {"main": [0, 1, 2, 3],
    ...                   "runtime": [4, 5, 6],
    ...                   "pynative": [7, 8, 9]})
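
When migrating configurations, it can help to see which CPU ids a list of range segments such as ["10-19", "23-40"] covers. The helper below is a hypothetical, pure-Python sketch that does not call MindSpore. It assumes each segment has the form "start-end" with both bounds inclusive.

```python
def expand_cpu_ranges(segments):
    """Expand "start-end" segments into a flat list of CPU ids (inclusive bounds assumed)."""
    cpus = []
    for segment in segments:
        start, end = (int(part) for part in segment.split("-"))
        if start > end:
            raise ValueError(f"invalid CPU range segment: {segment!r}")
        cpus.extend(range(start, end + 1))
    return cpus


cpu_ids = expand_cpu_ranges(["10-19", "23-40"])
# 10 ids from the first segment plus 18 from the second: 28 in total
```

A check like this makes it easy to confirm that the ids given in module_to_cpu_dict do not unintentionally collide with the process-level ranges.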

Contributors

baochong,Bellatan,BJ-WANG,caifubi,caiji_zhang,Carey,chaijinwei,changzherui,chengbin,chujinjin,DavidFFFan,DeshiChen,dingjinshan,Dring,ehaleva,Erpim,fary86,fengtingyan,fengyixing,fuchao,gaoyong10,gengdongjie,guangpengz,GuoZhibin,gupengcheng0401,haozhang,hedongdong,hhz886,huandong1,huangbingjian,huangziling,huda,HuilanLi,hujiahui8,jiangchao_j,jianghui58,jiangshanfeng,jiaorui,jiaxueyu,jizewei,jjfeing,jshawjc,kairui_kou,kingxian,kisnwang,lanzhineng,leida,LiangZhibo,lichen,limingqi107,LiNuohang,linux,liubuyu,liuchengji,liuluobin,liuyanwei,lkp,looop5,lujiale,luochao60,luoyang,maoyuanpeng1,Margaret_wangrui,mengxian,MengXiangyu,mengyuanli,NaCN,One_East,panshaowu,panzhihui,pengqi,Qiao_Fu,qiuleilei,qiuyufeng,rainyhorse,SaiYao,shaoshengqi,shen_haochen,shenhaojing,shenwei41,shuqian0,tanghuikang,tangmengcheng,tan-wei-cheng,tianxiaodong,uuhuu,wang_ziqi,WangChengzhao,wangshaocong,wangyibo,wujueying,XianglongZeng,xiaopeng,xiaotianci,xiaoyao,XinDu,xuzhen,yangguodong,yanghaoran,yangyingchun,Yanzhi_YI,yide12,yihangchen,YijieChen,yuanqi,yuchaojie,yuezenglin,YuJianfeng,YukioZzz,yuliangbin,yyuse,zhangbuxue,zhangdanyang,zhanghanLeo,zhangyinxia,ZhangZGC,zhanzhan,zhaochenjie,zhengzuohe,zhouyaqiang0,zhunaipan,zichun_ye,ZPaC,zyli2020,程超,范吉斌,胡犇,胡彬,宦晓玲,黄勇,李栋,李良灿,李林杰,李寅杰3,刘飞扬,刘力力,刘勇琪,刘子涵,梅飞要,宋佳琪,王泓皓,王禹程,王振邦,熊攀,徐安越,杨卉,杨明海,俞涵,虞良斌,云骑士,张栩浩,周一航