Quick Start to Cloud-side Inference
Overview
This article introduces you to the basic functions and usage of MindSpore Lite by using MindSpore Lite to perform cloud-side inference as an example.
MindSpore Lite cloud-side inference runs only on Linux. The supported hardware backends are Atlas 200/300/500 inference products, the Atlas inference series (with the Ascend 310P AI processor), the Atlas training series, Nvidia GPU, and CPU.
Before using MindSpore Lite as described in this chapter, prepare a Linux environment (e.g. Ubuntu, CentOS, or EulerOS) in which to run the verification.
To experience the MindSpore Lite device-side inference process, please refer to the document Quick Start to Device-Side Inference.
We will demonstrate how to use the MindSpore Lite distribution for integrated development and how to write your own inference program, taking the MindSpore Lite C++ interface as an example. For detailed usage of the MindSpore Lite C++ interface, users can refer to Cloud-Side inference with C++ Interface.
In addition, users can use Python interface and Java interface of MindSpore Lite for integration. For details, please refer to Cloud-side inference by using Python interface and Cloud-side inference by using Java interface.
Preparation
- Environment requirements
  - System environment: Linux x86_64; Ubuntu 18.04.02 LTS is recommended.
 
- Download distributions
  Users can download the MindSpore Lite cloud-side inference package mindspore-lite-{version}-linux-{arch}.tar.gz from the download page of the MindSpore official website, where {arch} is x64 or aarch64. The x64 version supports three hardware backends (Ascend, Nvidia GPU, and CPU), while the aarch64 version supports only the Ascend and CPU hardware backends.
  The following is the content of the x64 tar package.
  mindspore-lite-{version}-linux-x64
  ├── runtime
  │   ├── include                          # API header files for MindSpore Lite integrated development
  │   ├── lib
  │   │   ├── libascend_ge_plugin.so       # Ascend Hardware Backend Remote Mode Plugin
  │   │   ├── libascend_kernel_plugin.so   # Ascend Hardware Backend Plugin
  │   │   ├── libdvpp_utils.so             # Ascend Hardware Backend DVPP Plugin
  │   │   ├── libminddata-lite.a           # Image processing static library
  │   │   ├── libminddata-lite.so          # Image processing dynamic library
  │   │   ├── libmindspore_core.so         # Dynamic library for MindSpore Lite inference framework
  │   │   ├── libmindspore_glog.so.0       # MindSpore Lite Logging Dynamic Library
  │   │   ├── libmindspore-lite-jni.so     # JNI dynamic library for MindSpore Lite inference framework
  │   │   ├── libmindspore-lite.so         # Dynamic library for MindSpore Lite inference framework
  │   │   ├── libmsplugin-ge-litert.so     # CPU Hardware Backend Plugin
  │   │   ├── libruntime_convert_plugin.so # Online Converter Plugin
  │   │   ├── libtensorrt_plugin.so        # Nvidia GPU Hardware Backend Plugin
  │   │   ├── libtransformer-shared.so     # Transformer Dynamic Library
  │   │   └── mindspore-lite-java.jar      # MindSpore Lite inference framework jar package
  │   └── third_party
  └── tools
      ├── benchmark                        # Benchmark Test Tools Catalogue
      └── converter                        # Model Converter Catalogue
- Obtain model
  MindSpore Lite cloud-side inference currently supports only the MindSpore MindIR model format. You can export a MindIR model from MindSpore, or use the model converter to convert TensorFlow, ONNX, or Caffe models to MindIR.
  The model file mobilenetv2.mindir can be downloaded as a sample model.
- Obtain sample
  The sample code for this section is in the directory mindspore/lite/examples/cloud_infer/quick_start_cpp.
  quick_start_cpp
  ├── CMakeLists.txt
  ├── main.cc
  ├── build                  # Temporary build directory
  └── model
      └── mobilenetv2.mindir # Model files
Environment Variables
To ensure that the script will work properly, environment variables need to be set before building and executing the inference.
MindSpore Lite Environment Variables
After unpacking the MindSpore Lite cloud-side inference package, set the LITE_HOME environment variable to the unpacked directory, e.g.
export LITE_HOME=$some_path/mindspore-lite-2.0.0-linux-x64
Set the environment variable LD_LIBRARY_PATH:
export LD_LIBRARY_PATH=$LITE_HOME/runtime/lib:$LITE_HOME/runtime/third_party/dnnl:$LITE_HOME/tools/converter/lib:$LD_LIBRARY_PATH
If you need to use the converter_lite or benchmark tools, also add their directories to the PATH environment variable:
export PATH=$LITE_HOME/tools/converter/converter:$LITE_HOME/tools/benchmark:$PATH
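Because a missing variable usually shows up only later as a build or library-loading error, it can help to check the settings above from the program itself. The following is a minimal sketch using only the C++ standard library; GetEnvOrEmpty and CheckLiteEnvironment are hypothetical helpers, not part of MindSpore Lite, and the check is a plain substring match on LD_LIBRARY_PATH, so it assumes LITE_HOME was written without a trailing slash.

```cpp
#include <cstdlib>
#include <iostream>
#include <string>

// Hypothetical helper: returns the value of an environment variable,
// or an empty string if the variable is not set.
std::string GetEnvOrEmpty(const char *name) {
  const char *value = std::getenv(name);
  return value == nullptr ? std::string() : std::string(value);
}

// Hypothetical pre-flight check: verifies that LITE_HOME is set and that
// LD_LIBRARY_PATH mentions $LITE_HOME/runtime/lib (simple substring match).
bool CheckLiteEnvironment() {
  const std::string lite_home = GetEnvOrEmpty("LITE_HOME");
  if (lite_home.empty()) {
    std::cerr << "LITE_HOME is not set." << std::endl;
    return false;
  }
  const std::string runtime_lib = lite_home + "/runtime/lib";
  const std::string ld_path = GetEnvOrEmpty("LD_LIBRARY_PATH");
  if (ld_path.find(runtime_lib) == std::string::npos) {
    std::cerr << "LD_LIBRARY_PATH does not contain " << runtime_lib << std::endl;
    return false;
  }
  return true;
}
```

Such a check could be called at the start of main, before the model is built, to print a clearer message than a loader error.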
Ascend Hardware Backend Environment Variables
- Verify the run package installation path
  If the run package was installed as the root user, the default path is ‘/usr/local/Ascend’; for non-root users, the default installation path is ‘/home/HwHiAiUser/Ascend’.
  Taking the root user's path as an example, set the environment variable as follows:
  export ASCEND_HOME=/usr/local/Ascend # the root directory of run package
- Distinguish run package versions
  The run package comes in two versions, distinguished by whether the ‘ascend-toolkit’ folder exists in the installation directory.
  If the ‘ascend-toolkit’ folder exists, set the environment variables as follows:
  export ASCEND_HOME=/usr/local/Ascend
  export PATH=${ASCEND_HOME}/ascend-toolkit/latest/compiler/bin:${ASCEND_HOME}/ascend-toolkit/latest/compiler/ccec_compiler/bin/:${PATH}
  export LD_LIBRARY_PATH=${ASCEND_HOME}/driver/lib64:${ASCEND_HOME}/ascend-toolkit/latest/lib64:${LD_LIBRARY_PATH}
  export ASCEND_OPP_PATH=${ASCEND_HOME}/ascend-toolkit/latest/opp
  export ASCEND_AICPU_PATH=${ASCEND_HOME}/ascend-toolkit/latest/
  export PYTHONPATH=${ASCEND_HOME}/ascend-toolkit/latest/compiler/python/site-packages:${PYTHONPATH}
  export TOOLCHAIN_HOME=${ASCEND_HOME}/ascend-toolkit/latest/toolkit
  If it does not exist, set the environment variables as follows:
  export ASCEND_HOME=/usr/local/Ascend
  export PATH=${ASCEND_HOME}/latest/compiler/bin:${ASCEND_HOME}/latest/compiler/ccec_compiler/bin:${PATH}
  export LD_LIBRARY_PATH=${ASCEND_HOME}/driver/lib64:${ASCEND_HOME}/latest/lib64:${LD_LIBRARY_PATH}
  export ASCEND_OPP_PATH=${ASCEND_HOME}/latest/opp
  export ASCEND_AICPU_PATH=${ASCEND_HOME}/latest
  export PYTHONPATH=${ASCEND_HOME}/latest/compiler/python/site-packages:${PYTHONPATH}
  export TOOLCHAIN_HOME=${ASCEND_HOME}/latest/toolkit
Nvidia GPU Hardware Backend Environment Variables
When the hardware backend is an Nvidia GPU, inference relies on CUDA and TensorRT, so users need to install both first.
The following example uses CUDA 11.1 and TensorRT 8.5.1.7. Users need to set the environment variables according to their actual installation paths.
export CUDA_HOME=/usr/local/cuda-11.1
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
export TENSORRT_PATH=/usr/local/TensorRT-8.5.1.7
export PATH=$TENSORRT_PATH/bin:$PATH
export LD_LIBRARY_PATH=$TENSORRT_PATH/lib:$LD_LIBRARY_PATH
Setting Host-side Logging Level
The Host logging level defaults to WARNING.
export GLOG_v=2 # 0-DEBUG, 1-INFO, 2-WARNING, 3-ERROR, 4-CRITICAL, default level is WARNING.
Integration Inference
We will demonstrate how to use MindSpore Lite distributions for integrated development and write your own inference programs, using MindSpore Lite C++ interface for integration as an example.
Before integration, users can also run inference tests directly with the benchmark tool (benchmark) shipped in the distribution.
Configuring CMake
Users need to integrate the mindspore-lite library file inside the distribution and perform model inference through the API interface declared in the MindSpore Lite header file.
The following sample code integrates the libmindspore-lite.so dynamic library via CMake. It reads the environment variable LITE_HOME to locate the header and library directories of the unpacked MindSpore Lite tar package.
cmake_minimum_required(VERSION 3.14)
project(QuickStartCpp)
if(CMAKE_CXX_COMPILER_ID STREQUAL "GNU" AND CMAKE_CXX_COMPILER_VERSION VERSION_LESS 7.3.0)
    message(FATAL_ERROR "GCC version ${CMAKE_CXX_COMPILER_VERSION} must not be less than 7.3.0")
endif()
if(DEFINED ENV{LITE_HOME})
    set(LITE_HOME $ENV{LITE_HOME})
endif()
# Add directory to include search path
include_directories(${LITE_HOME}/runtime)
# Add directory to linker search path
link_directories(${LITE_HOME}/runtime/lib)
link_directories(${LITE_HOME}/tools/converter/lib)
file(GLOB_RECURSE QUICK_START_CXX ${CMAKE_CURRENT_SOURCE_DIR}/*.cc)
add_executable(mindspore_quick_start_cpp ${QUICK_START_CXX})
target_link_libraries(mindspore_quick_start_cpp mindspore-lite pthread dl)
Writing Code
The code in main.cc is shown below:
#include <algorithm>
#include <random>
#include <iostream>
#include <fstream>
#include <cstring>
#include <memory>
#include <string>
#include <vector>
#include "include/api/model.h"
#include "include/api/context.h"
#include "include/api/status.h"
#include "include/api/types.h"
template <typename T, typename Distribution>
void GenerateRandomData(int size, void *data, Distribution distribution) {
  std::mt19937 random_engine;
  int elements_num = size / sizeof(T);
  (void)std::generate_n(static_cast<T *>(data), elements_num,
                        [&distribution, &random_engine]() { return static_cast<T>(distribution(random_engine)); });
}
int GenerateInputDataWithRandom(std::vector<mindspore::MSTensor> inputs) {
  for (auto tensor : inputs) {
    auto input_data = tensor.MutableData();
    if (input_data == nullptr) {
      std::cerr << "MallocData for inTensor failed." << std::endl;
      return -1;
    }
    GenerateRandomData<float>(tensor.DataSize(), input_data, std::uniform_real_distribution<float>(0.1f, 1.0f));
  }
  return 0;
}
int QuickStart(int argc, const char **argv) {
  if (argc < 2) {
    std::cerr << "Model file must be provided.\n";
    return -1;
  }
  // Read model file.
  std::string model_path = argv[1];
  if (model_path.empty()) {
    std::cerr << "Model path " << model_path << " is invalid.";
    return -1;
  }
  // Create and init context, add CPU device info
  auto context = std::make_shared<mindspore::Context>();
  if (context == nullptr) {
    std::cerr << "New context failed." << std::endl;
    return -1;
  }
  auto &device_list = context->MutableDeviceInfo();
  auto device_info = std::make_shared<mindspore::CPUDeviceInfo>();
  if (device_info == nullptr) {
    std::cerr << "New CPUDeviceInfo failed." << std::endl;
    return -1;
  }
  device_list.push_back(device_info);
  mindspore::Model model;
  // Build model
  auto build_ret = model.Build(model_path, mindspore::kMindIR, context);
  if (build_ret != mindspore::kSuccess) {
    std::cerr << "Build model error " << build_ret << std::endl;
    return -1;
  }
  // Get Input
  auto inputs = model.GetInputs();
  // Generate random data as input data.
  if (GenerateInputDataWithRandom(inputs) != 0) {
    std::cerr << "Generate Random Input Data failed." << std::endl;
    return -1;
  }
  // Model Predict
  std::vector<mindspore::MSTensor> outputs;
  auto predict_ret = model.Predict(inputs, &outputs);
  if (predict_ret != mindspore::kSuccess) {
    std::cerr << "Predict error " << predict_ret << std::endl;
    return -1;
  }
  // Print Output Tensor Data.
  constexpr int kNumPrintOfOutData = 50;
  for (auto &tensor : outputs) {
    std::cout << "tensor name is:" << tensor.Name() << " tensor size is:" << tensor.DataSize()
              << " tensor elements num is:" << tensor.ElementNum() << std::endl;
    auto out_data = reinterpret_cast<const float *>(tensor.Data().get());
    std::cout << "output data is:";
    for (int i = 0; i < tensor.ElementNum() && i <= kNumPrintOfOutData; i++) {
      std::cout << out_data[i] << " ";
    }
    std::cout << std::endl;
  }
  return 0;
}
int main(int argc, const char **argv) { return QuickStart(argc, argv); }
The code works as follows:
- Initialize the Context configuration
  Context holds the configuration needed for model inference, including operator preferences, number of threads, automatic concurrency, and other settings related to the inference processor. For more details about Context, please refer to the API interface description of Context. When MindSpore Lite loads a model, an object of class Context must be provided, so this example first creates an object context of class Context.
  auto context = std::make_shared<mindspore::Context>();
  Next, get the device management list of the context object through the Context::MutableDeviceInfo interface.
  auto &device_list = context->MutableDeviceInfo();
  In this example, since the CPU is used for inference, an object device_info of class CPUDeviceInfo needs to be created.
  auto device_info = std::make_shared<mindspore::CPUDeviceInfo>();
  Since the default CPU settings are used, the device_info object needs no further configuration and is added directly to the device management list of context.
  device_list.push_back(device_info);
- Load models
  First create an object model of class Model. The Model class defines the model in MindSpore for computational graph management. For a detailed description of the Model class, please refer to the API documentation.
  mindspore::Model model;
  Then call the Build interface to pass in the model and compile it to a runnable state on the device.
  auto build_ret = model.Build(model_path, mindspore::kMindIR, context);
- Pass in data
  Before performing model inference, you need to set the input data for inference. In this example, all input tensors of the model are obtained through the Model.GetInputs interface. Each tensor is of type MSTensor. For a detailed description of MSTensor, please refer to the API description of MSTensor.
  auto inputs = model.GetInputs();
  The MutableData interface of the tensor returns the data memory pointer of the tensor, and the DataSize interface returns the data length of the tensor in bytes. The data type of the tensor can be obtained through the DataType interface, so users can process the data differently according to the data format of their models.
  auto input_data = tensor.MutableData();
  Next, the data on which we want to perform inference is written into the tensor through the data pointer. In this example we pass in floating point data drawn from a uniform distribution between 0.1 and 1. In practical inference, after reading actual data such as images or audio, the user needs to perform algorithm-specific pre-processing and pass the processed data into the model.
  template <typename T, typename Distribution>
  void GenerateRandomData(int size, void *data, Distribution distribution) {
    std::mt19937 random_engine;
    int elements_num = size / sizeof(T);
    (void)std::generate_n(static_cast<T *>(data), elements_num,
                          [&distribution, &random_engine]() { return static_cast<T>(distribution(random_engine)); });
  }
  int GenerateInputDataWithRandom(std::vector<mindspore::MSTensor> inputs) {
    for (auto tensor : inputs) {
      auto input_data = tensor.MutableData();
      if (input_data == nullptr) {
        std::cerr << "MallocData for inTensor failed." << std::endl;
        return -1;
      }
      GenerateRandomData<float>(tensor.DataSize(), input_data, std::uniform_real_distribution<float>(0.1f, 1.0f));
    }
    return 0;
  }
  // Get Input
  auto inputs = model.GetInputs();
  // Generate random data as input data.
  if (GenerateInputDataWithRandom(inputs) != 0) {
    std::cerr << "Generate Random Input Data failed." << std::endl;
    return -1;
  }
- Execute inference
  First, a vector outputs is created to hold the output tensors of the model inference, and then the model inference interface Predict is called with the input tensors and output tensors as its parameters. After a successful inference, the output tensors are stored in outputs.
  std::vector<mindspore::MSTensor> outputs;
  auto predict_ret = model.Predict(inputs, &outputs);
- Obtain inference results
  The data pointer of the output tensor is obtained via Data. In this example it is cast to a float pointer; users can cast to the type that matches their model, or query the data type through the DataType interface of the tensor.
  auto out_data = reinterpret_cast<const float *>(tensor.Data().get());
  In this example, the inference output is printed directly.
  for (int i = 0; i < tensor.ElementNum() && i <= kNumPrintOfOutData; i++) {
    std::cout << out_data[i] << " ";
  }
  std::cout << std::endl;
- Release the model object
  The Model destructor releases the model-related resources.
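In practical inference, the random data from the "Pass in data" step is replaced by real, pre-processed input. As a minimal sketch using only the C++ standard library, the hypothetical helper LoadRawInput below (not part of the MindSpore Lite API) copies a raw binary file into a buffer such as the one returned by MutableData, and rejects files whose size differs from the expected byte length, for example the value returned by DataSize. The file layout, a headerless dump of the tensor bytes, is an assumption.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical helper (not part of MindSpore Lite): copies a pre-processed
// raw binary file into dst. Returns false if dst is null, the file cannot be
// opened, or the file size is not exactly dst_size bytes.
bool LoadRawInput(const std::string &path, void *dst, std::size_t dst_size) {
  if (dst == nullptr) {
    return false;
  }
  // Open at the end of the file so that tellg() reports the file size.
  std::ifstream file(path, std::ios::binary | std::ios::ate);
  if (!file.is_open()) {
    return false;
  }
  const std::streamoff file_size = file.tellg();
  if (file_size < 0 || static_cast<std::size_t>(file_size) != dst_size) {
    return false;
  }
  file.seekg(0, std::ios::beg);
  file.read(static_cast<char *>(dst), static_cast<std::streamsize>(dst_size));
  // Succeed only if every expected byte was actually read.
  return static_cast<std::size_t>(file.gcount()) == dst_size;
}
```

Under these assumptions, a call such as LoadRawInput(path, tensor.MutableData(), tensor.DataSize()) could take the place of GenerateRandomData for each input tensor.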
Compiling
Set the environment variables as described in the Environment Variables section. Then compile the program as follows.
mkdir build && cd build
cmake ../
make
After successful compilation, the mindspore_quick_start_cpp executable is generated in the build directory.
Running the Inference Program
./mindspore_quick_start_cpp ../model/mobilenetv2.mindir
After execution, the following results are printed: the name of the output tensor, its size in bytes, its number of elements, and the first 51 values (the loop condition i <= kNumPrintOfOutData is inclusive, so one value more than kNumPrintOfOutData is printed):
tensor name is:Default/head-MobileNetV2Head/Softmax-op204 tensor size is:4000 tensor elements num is:1000
output data is:5.07155e-05 0.00048712 0.000312549 0.00035624 0.0002022 8.58958e-05 0.000187147 0.000365937 0.000281044 0.000255672 0.00108948 0.00390996 0.00230398 0.00128984 0.00307477 0.00147607 0.00106759 0.000589853 0.000848115 0.00143693 0.000685777 0.00219331 0.00160639 0.00215123 0.000444315 0.000151986 0.000317552 0.00053971 0.00018703 0.000643944 0.000218269 0.000931556 0.000127084 0.000544278 0.000887942 0.000303909 0.000273875 0.00035335 0.00229062 0.000453207 0.0011987 0.000621194 0.000628335 0.000838564 0.000611029 0.000372603 0.00147742 0.000270685 8.29869e-05 0.000116974 0.000876237
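The output above is a 1000-element Softmax vector, so a common next step is to pick the most probable classes rather than print raw values. The following is a minimal sketch using only the C++ standard library; TopKIndices is a hypothetical helper, not part of the MindSpore Lite API, and it assumes the output holds float data as in this example.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: returns the indices of the k largest values in
// data[0..count), ordered from largest to smallest value.
// If k exceeds count, all indices are returned.
std::vector<std::size_t> TopKIndices(const float *data, std::size_t count, std::size_t k) {
  std::vector<std::size_t> indices(count);
  std::iota(indices.begin(), indices.end(), std::size_t{0});
  k = std::min(k, count);
  // Sort only the first k positions, comparing by the referenced score.
  std::partial_sort(indices.begin(), indices.begin() + static_cast<std::ptrdiff_t>(k), indices.end(),
                    [data](std::size_t a, std::size_t b) { return data[a] > data[b]; });
  indices.resize(k);
  return indices;
}
```

With out_data and tensor.ElementNum() from the example program, TopKIndices(out_data, static_cast<std::size_t>(tensor.ElementNum()), 5) would give the five highest-scoring class indices; mapping them to label names depends on the model's label file, which this sketch does not cover.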