Inference Model Overview

MindSpore can execute inference tasks on different hardware platforms based on trained models.

Model Files

MindSpore can save two types of data: training parameters and network models that contain parameter information.

  • Training parameters are stored in the checkpoint format.

  • Network models are stored in the MindIR, AIR, or ONNX format.

Basic concepts and application scenarios of these formats are as follows (a short save-and-export sketch follows this list):

  • Checkpoint

    • Checkpoint uses the Protocol Buffers format and stores all network parameter values.

    • It is generally used to resume training after a training task is interrupted, or to execute a fine-tuning task after training.

  • MindSpore IR (MindIR)

    • MindIR is a graph-based, function-style IR of MindSpore that defines scalable graph structures and operator IRs.

    • It eliminates model differences between different backends and is generally used to perform inference tasks across hardware platforms.

  • Open Neural Network Exchange (ONNX)

    • ONNX is an open format built to represent machine learning models.

    • It is generally used to transfer models between different frameworks or to run them on inference engines such as TensorRT.

    • At present, MindSpore supports only exporting ONNX models; it does not support loading ONNX models for inference. The models currently verified for ONNX export are ResNet50, YOLOv3-DarkNet53, YOLOv4, and BERT. The exported models can be used on ONNX Runtime.

  • Ascend Intermediate Representation (AIR)

    • AIR is an open file format defined by Huawei for machine learning.

    • It adapts to Huawei AI processors well and is generally used to execute inference tasks on Ascend 310.
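
The following is a minimal sketch of how these files are produced, not a complete recipe: it assumes a trivial stand-in network (the SimpleNet class, the file names, and the input shape are illustrative assumptions), saves the training parameters as a checkpoint, and exports a network model file whose format is selected by file_format.

```python
import numpy as np
import mindspore as ms
from mindspore import nn


class SimpleNet(nn.Cell):
    """Tiny stand-in network; replace it with your own trained model."""

    def __init__(self):
        super().__init__()
        self.flatten = nn.Flatten()
        self.fc = nn.Dense(1 * 32 * 32, 10)

    def construct(self, x):
        return self.fc(self.flatten(x))


net = SimpleNet()

# Checkpoint: stores all network parameter values (Protocol Buffers format);
# typically used to resume an interrupted training task or to fine-tune.
ms.save_checkpoint(net, "simple_net.ckpt")

# Network model: stores the network structure together with the parameter values.
# file_format can be "MINDIR", "ONNX", or "AIR" (AIR only on Ascend AI Processors).
dummy_input = ms.Tensor(np.ones([1, 1, 32, 32]).astype(np.float32))
ms.export(net, dummy_input, file_name="simple_net", file_format="MINDIR")
```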

Inference Execution

Inference can be classified into the following two modes based on the application environment:

  1. Local inference

    Load a checkpoint file generated during network training and call the model.predict API for inference and validation. For details, see Online Inference with Checkpoint.

  2. Cross-platform inference

    Use a network definition and a checkpoint file, call the export API to export a model file, and perform inference on different platforms. Currently, MindIR, ONNX, and AIR (only on Ascend AI Processors) models can be exported. For details, see Saving Models. Both modes are illustrated in the sketch after this list.
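
Continuing from the SimpleNet sketch above (the class and the file names are illustrative assumptions), a minimal example of both modes might look as follows: mode 1 loads the checkpoint back into the network and calls model.predict, and mode 2 exports a model file for use on another platform.

```python
import numpy as np
import mindspore as ms
from mindspore import Model

# Assumes the SimpleNet class and the "simple_net.ckpt" file from the previous sketch.
net = SimpleNet()

# 1. Local inference: load the checkpoint back into the network and call model.predict.
param_dict = ms.load_checkpoint("simple_net.ckpt")
ms.load_param_into_net(net, param_dict)
model = Model(net)
data = ms.Tensor(np.ones([1, 1, 32, 32]).astype(np.float32))
print(model.predict(data).shape)  # (1, 10)

# 2. Cross-platform inference: export a model file and run it on another platform.
# file_format can be "MINDIR", "ONNX", or "AIR" (AIR only on Ascend AI Processors).
ms.export(net, data, file_name="simple_net", file_format="MINDIR")
```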

Introduction to MindIR

MindSpore defines logical network structures and operator attributes through a unified IR, and decouples model files in MindIR format from hardware platforms so that a model trained once can be deployed multiple times.

  1. Overview

    As a unified model file of MindSpore, MindIR stores network structures and weight parameter values. In addition, it can be deployed on the on-cloud Serving and the on-device Lite platforms to execute inference tasks.

    A single MindIR file supports deployment on multiple types of hardware.

    • On-cloud deployment and inference on Serving: after MindSpore trains and generates a MindIR model file, the file can be directly sent to MindSpore Serving for loading and inference, with no additional model conversion required, so a single model format is used across different hardware such as Ascend, GPU, and CPU.

    • On-device inference and deployment on Lite: MindIR can be directly used for Lite deployment. In addition, to meet the lightweight requirements on devices, model miniaturization and conversion functions are provided. An original MindIR model file can be converted from the Protocol Buffers format to the FlatBuffers format for storage, and the network structure is made more lightweight to better meet the performance and memory requirements on devices.

  2. Application Scenarios

    Use a network definition and a checkpoint file to export a MindIR model file, and then execute inference based on different requirements, for example, Inference Using the MindIR Model on Ascend 310 AI Processors, MindSpore Serving-based Inference Service Deployment, and Inference on Devices. A minimal loading sketch follows this list.
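
As an example of direct execution, a MindIR file exported as above can be loaded on the current backend with mindspore.load and wrapped in nn.GraphCell; this is a minimal sketch assuming the simple_net.mindir file and the input shape from the earlier sketches.

```python
import numpy as np
import mindspore as ms
from mindspore import nn

# Load the exported MindIR file and wrap the graph for direct execution
# on the current backend (Ascend, GPU, or CPU).
graph = ms.load("simple_net.mindir")
net = nn.GraphCell(graph)

inputs = ms.Tensor(np.ones([1, 1, 32, 32]).astype(np.float32))
output = net(inputs)
print(output.shape)  # (1, 10)
```

For on-device deployment, the same MindIR file would instead go through the MindSpore Lite conversion described above to obtain a FlatBuffers-format model.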

Networks Supported by MindIR

AdvancedEast AlexNet AutoDis BERT BGCF CenterFace
CNN CNN&CTC CRNN CSPDarkNet53 CTPN DeepFM
DeepLabV3 DeepText DenseNet121 DPN DS-CNN FaceAttribute
FaceDetection FaceQualityAssessment FaceRecognition FaceRecognitionForTracking Faster R-CNN FasterRcnn-ResNet50
FasterRcnn-ResNet101 FasterRcnn-ResNet152 FCN FCN-4 GAT GCN
GoogLeNet GRU hardnet InceptionV3 InceptionV4 LeNet
LSTM-SentimentNet Mask R-CNN MaskRCNN_MobileNetV1 MASS MobileNetV1 MobileNetV2
NCF PSENet ResNet18 ResNet50 ResNet101 ResNet152
ResNetV2-50 ResNetV2-101 ResNetV2-152 SE-Net SSD-MobileNetV2 ResNext50
ResNext101 RetinaNet Seq2Seq(Attention) SE-ResNet50 ShuffleNetV1 SimplePoseNet
SqueezeNet SSD SSD-GhostNet SSD-MobileNetV1-FPN SSD-MobileNetV2-FPNlite SSD-ResNet50
SSD-ResNet50-FPN SSD-VGG16 TextCNN TextRCNN TinyBert TinyDarknet
Transformer UNet++ UNet2D VGG16 WarpCTC Wide&Deep
WGAN Xception YOLOv3-DarkNet53 YOLOv3-ResNet18 YOLOv4 YOLOv5

In addition to the networks in the preceding table, if all operators used in a user-defined network can be exported to the MindIR format, the exported MindIR model file can also be used to execute inference tasks.