Quick Start to Cloud-side Inference

Overview

This article introduces the basic functions and usage of MindSpore Lite, using cloud-side inference as an example.

MindSpore Lite cloud-side inference can be deployed and run only in Linux environments. The supported hardware backends are Atlas 200/300/500 inference products, the Atlas inference series (with Ascend 310P AI processor), the Atlas training series, Nvidia GPU, and CPU.

Before using MindSpore Lite as described in this chapter, prepare a Linux environment (e.g. Ubuntu, CentOS, or EulerOS) for verification.

To experience the MindSpore Lite device-side inference process, please refer to the document Quick Start to Device-Side Inference.

This article demonstrates how to use the MindSpore Lite distribution for integrated development and how to write your own inference program, taking the MindSpore Lite C++ interface as an example. For detailed usage of the C++ interface, refer to Cloud-Side inference with C++ Interface.

In addition, users can use the Python and Java interfaces of MindSpore Lite for integration. For details, please refer to Cloud-side inference by using Python interface and Cloud-side inference by using Java interface.

Preparation

Environment requirements
- System environment: Linux x86_64, Ubuntu 18.04.02 LTS recommended

Download distributions

Users can download the MindSpore Lite cloud-side inference package mindspore-lite-{version}-linux-{arch}.tar.gz from the download page of the MindSpore official website, where {arch} is x64 or aarch64. The x64 version supports three hardware backends (Ascend, Nvidia GPU, and CPU), while the aarch64 version supports only the Ascend and CPU hardware backends.
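
The naming rule and the backend support described above can be restated as a small C++ sketch. The helper names PackageFileName and SupportedBackends are hypothetical and are not part of MindSpore Lite; they only encode what the paragraph above says.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: builds the package file name from the naming rule above.
// Returns an empty string for an architecture that the text does not list.
std::string PackageFileName(const std::string &version, const std::string &arch) {
  if (arch != "x64" && arch != "aarch64") {
    return "";
  }
  return "mindspore-lite-" + version + "-linux-" + arch + ".tar.gz";
}

// Hypothetical helper: hardware backends listed above for each architecture.
std::vector<std::string> SupportedBackends(const std::string &arch) {
  if (arch == "x64") {
    return {"Ascend", "Nvidia GPU", "CPU"};
  }
  if (arch == "aarch64") {
    return {"Ascend", "CPU"};
  }
  return {};
}
```

For example, PackageFileName("2.0.0", "x64") yields mindspore-lite-2.0.0-linux-x64.tar.gz.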

The contents of the x64 tar package are as follows.

mindspore-lite-{version}-linux-x64
├── runtime
│   ├── include                          # API header files for MindSpore Lite integrated development
│   ├── lib
│   │   ├── libascend_ge_plugin.so       # Ascend Hardware Backend Remote Mode Plugin
│   │   ├── libascend_kernel_plugin.so   # Ascend Hardware Backend Plugin
│   │   ├── libdvpp_utils.so             # Ascend Hardware Backend DVPP Plugin
│   │   ├── libminddata-lite.a           # Image processing static library
│   │   ├── libminddata-lite.so          # Image processing dynamic library
│   │   ├── libmindspore_core.so         # Dynamic library for MindSpore Lite inference framework
│   │   ├── libmindspore_glog.so.0       # MindSpore Lite Logging Dynamic Library
│   │   ├── libmindspore-lite-jni.so     # JNI dynamic library for MindSpore Lite inference framework
│   │   ├── libmindspore-lite.so         # Dynamic library for MindSpore Lite inference framework
│   │   ├── libmsplugin-ge-litert.so     # CPU Hardware Backend Plugin
│   │   ├── libruntime_convert_plugin.so # Online Converter Plugin
│   │   ├── libtensorrt_plugin.so        # Nvidia GPU Hardware Backend Plugin
│   │   ├── libtransformer-shared.so     # Transformer Dynamic Library
│   │   └── mindspore-lite-java.jar      # MindSpore Lite inference framework jar package
│   └── third_party
└── tools
    ├── benchmark       # Benchmark tool directory
    └── converter       # Model converter directory

Obtain model

MindSpore Lite cloud-side inference currently supports only the MindIR model format of MindSpore. You can export a MindIR model from MindSpore, or use the model converter to convert TensorFlow, ONNX, or Caffe models to MindIR.

The model file mobilenetv2.mindir can be downloaded as a sample model.

Obtain sample

The sample code of this section is located in the directory mindspore/lite/examples/cloud_infer/quick_start_cpp.

quick_start_cpp
├── CMakeLists.txt
├── main.cc
├── build                           # Temporary build directory
└── model
    └── mobilenetv2.mindir          # Model files

Environment Variables

To ensure that the script will work properly, environment variables need to be set before building and executing the inference.

MindSpore Lite Environment Variables

After decompressing the MindSpore Lite cloud-side inference package, set the LITE_HOME environment variable to the decompressed path, e.g.

export LITE_HOME=$some_path/mindspore-lite-2.0.0-linux-x64

Set the environment variable LD_LIBRARY_PATH:

export LD_LIBRARY_PATH=$LITE_HOME/runtime/lib:$LITE_HOME/runtime/third_party/dnnl:$LITE_HOME/tools/converter/lib:$LD_LIBRARY_PATH

If you need to use the converter_lite or benchmark tools, also set the environment variable PATH.

export PATH=$LITE_HOME/tools/converter/converter:$LITE_HOME/tools/benchmark:$PATH
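
A program can check these settings before it tries to load the library. The following is a minimal sketch under that assumption; GetEnvOrEmpty, RuntimeLibDir, and PathListContains are hypothetical helpers written for this illustration, not MindSpore Lite APIs.

```cpp
#include <cstdlib>
#include <sstream>
#include <string>

// Hypothetical helper: value of an environment variable, or "" when unset.
std::string GetEnvOrEmpty(const char *name) {
  const char *value = std::getenv(name);
  return value == nullptr ? std::string() : std::string(value);
}

// Hypothetical helper: runtime library directory derived from LITE_HOME,
// i.e. the first entry added to LD_LIBRARY_PATH above.
std::string RuntimeLibDir(const std::string &lite_home) {
  return lite_home.empty() ? std::string() : lite_home + "/runtime/lib";
}

// Hypothetical helper: true if dir is one of the ':'-separated entries.
bool PathListContains(const std::string &path_list, const std::string &dir) {
  std::stringstream stream(path_list);
  std::string entry;
  while (std::getline(stream, entry, ':')) {
    if (entry == dir) {
      return true;
    }
  }
  return false;
}
```

A pre-flight check could then call PathListContains(GetEnvOrEmpty("LD_LIBRARY_PATH"), RuntimeLibDir(GetEnvOrEmpty("LITE_HOME"))) and print a hint when it returns false.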

Ascend Hardware Backend Environment Variables

Verify the run package installation path

If you use the root user to complete the run package installation, the default path is ‘/usr/local/Ascend’, and the default installation path for non-root users is ‘/home/HwHiAiUser/Ascend’.

Taking the path of the root user as an example, set the environment variables as follows:
```
export ASCEND_HOME=/usr/local/Ascend  # the root directory of run package
```

Distinguish run package versions

The run package comes in two versions, distinguished by whether the ‘ascend-toolkit’ folder exists in the installation directory.

If the ‘ascend-toolkit’ folder exists, set the environment variables as follows:

export ASCEND_HOME=/usr/local/Ascend
export PATH=${ASCEND_HOME}/ascend-toolkit/latest/compiler/bin:${ASCEND_HOME}/ascend-toolkit/latest/compiler/ccec_compiler/bin/:${PATH}
export LD_LIBRARY_PATH=${ASCEND_HOME}/driver/lib64:${ASCEND_HOME}/ascend-toolkit/latest/lib64:${LD_LIBRARY_PATH}
export ASCEND_OPP_PATH=${ASCEND_HOME}/ascend-toolkit/latest/opp
export ASCEND_AICPU_PATH=${ASCEND_HOME}/ascend-toolkit/latest/
export PYTHONPATH=${ASCEND_HOME}/ascend-toolkit/latest/compiler/python/site-packages:${PYTHONPATH}
export TOOLCHAIN_HOME=${ASCEND_HOME}/ascend-toolkit/latest/toolkit

If it does not exist, set the environment variables as follows:

export ASCEND_HOME=/usr/local/Ascend
export PATH=${ASCEND_HOME}/latest/compiler/bin:${ASCEND_HOME}/latest/compiler/ccec_compiler/bin:${PATH}
export LD_LIBRARY_PATH=${ASCEND_HOME}/driver/lib64:${ASCEND_HOME}/latest/lib64:${LD_LIBRARY_PATH}
export ASCEND_OPP_PATH=${ASCEND_HOME}/latest/opp
export ASCEND_AICPU_PATH=${ASCEND_HOME}/latest
export PYTHONPATH=${ASCEND_HOME}/latest/compiler/python/site-packages:${PYTHONPATH}
export TOOLCHAIN_HOME=${ASCEND_HOME}/latest/toolkit
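
The two sets of exports differ only in the prefix placed before latest. As a sketch of that rule, the hypothetical helper below (not part of any Ascend or MindSpore Lite API) returns the prefix for either run package version; the caller is assumed to have checked whether the ‘ascend-toolkit’ folder exists.

```cpp
#include <string>

// Hypothetical helper: returns the "latest" directory used in the exports above.
// has_toolkit_dir states whether <ascend_home>/ascend-toolkit exists.
std::string AscendLatestDir(const std::string &ascend_home, bool has_toolkit_dir) {
  return has_toolkit_dir ? ascend_home + "/ascend-toolkit/latest" : ascend_home + "/latest";
}
```

With this helper, the value of ASCEND_OPP_PATH corresponds to AscendLatestDir(ascend_home, has_toolkit_dir) + "/opp" in both cases.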

Nvidia GPU Hardware Backend Environment Variables

When the hardware backend is an Nvidia GPU, inference relies on CUDA and TensorRT, which must be installed first.

The following example uses CUDA 11.1 and TensorRT 8.5.1.7. Set the environment variables according to your actual installation paths.

export CUDA_HOME=/usr/local/cuda-11.1
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH

export TENSORRT_PATH=/usr/local/TensorRT-8.5.1.7
export PATH=$TENSORRT_PATH/bin:$PATH
export LD_LIBRARY_PATH=$TENSORRT_PATH/lib:$LD_LIBRARY_PATH

Setting Host-side Logging Level

The Host logging level defaults to WARNING.

export GLOG_v=2 # 0-DEBUG, 1-INFO, 2-WARNING, 3-ERROR, 4-CRITICAL, default level is WARNING.
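
To report the active level from a program, the mapping in the comment above can be written as a lookup. This is only a sketch: LogLevelName is a hypothetical helper, and treating an unset or unrecognized value as WARNING is an assumption of this sketch based on the stated default.

```cpp
#include <string>

// Hypothetical helper: maps a GLOG_v value to the level name listed above.
// Assumption of this sketch: unset or unrecognized values map to the default, WARNING.
std::string LogLevelName(const char *glog_v) {
  const std::string value = glog_v == nullptr ? "" : glog_v;
  if (value == "0") return "DEBUG";
  if (value == "1") return "INFO";
  if (value == "2") return "WARNING";
  if (value == "3") return "ERROR";
  if (value == "4") return "CRITICAL";
  return "WARNING";
}
```

A typical call is LogLevelName(std::getenv("GLOG_v")), with std::getenv taken from the cstdlib header.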

Integration Inference

The following demonstrates how to use the MindSpore Lite distribution for integrated development and write your own inference program, using the MindSpore Lite C++ interface as an example.

Before integration, users can also run inference tests directly with the benchmark tool (benchmark) shipped in the distribution.

Configuring CMake

Users need to link the mindspore-lite library from the distribution and perform model inference through the APIs declared in the MindSpore Lite header files.

The following sample code integrates the libmindspore-lite.so dynamic library via CMake. It reads the environment variable LITE_HOME to locate the header and library directories of the unpacked MindSpore Lite tar package.

cmake_minimum_required(VERSION 3.14)
project(QuickStartCpp)

if(CMAKE_CXX_COMPILER_ID STREQUAL "GNU" AND CMAKE_CXX_COMPILER_VERSION VERSION_LESS 7.3.0)
    message(FATAL_ERROR "GCC version ${CMAKE_CXX_COMPILER_VERSION} must not be less than 7.3.0")
endif()

if(DEFINED ENV{LITE_HOME})
    set(LITE_HOME $ENV{LITE_HOME})
endif()

# Add directory to include search path
include_directories(${LITE_HOME}/runtime)
# Add directory to linker search path
link_directories(${LITE_HOME}/runtime/lib)
link_directories(${LITE_HOME}/tools/converter/lib)

file(GLOB_RECURSE QUICK_START_CXX ${CMAKE_CURRENT_SOURCE_DIR}/*.cc)
add_executable(mindspore_quick_start_cpp ${QUICK_START_CXX})

target_link_libraries(mindspore_quick_start_cpp mindspore-lite pthread dl)

Writing Code

The code in main.cc is shown below:

#include <algorithm>
#include <random>
#include <iostream>
#include <fstream>
#include <cstring>
#include <memory>
#include <string>
#include <vector>
#include "include/api/model.h"
#include "include/api/context.h"
#include "include/api/status.h"
#include "include/api/types.h"

template <typename T, typename Distribution>
void GenerateRandomData(int size, void *data, Distribution distribution) {
  std::mt19937 random_engine;
  int elements_num = size / sizeof(T);
  (void)std::generate_n(static_cast<T *>(data), elements_num,
                        [&distribution, &random_engine]() { return static_cast<T>(distribution(random_engine)); });
}

int GenerateInputDataWithRandom(std::vector<mindspore::MSTensor> inputs) {
  for (auto tensor : inputs) {
    auto input_data = tensor.MutableData();
    if (input_data == nullptr) {
      std::cerr << "MallocData for inTensor failed." << std::endl;
      return -1;
    }
    GenerateRandomData<float>(tensor.DataSize(), input_data, std::uniform_real_distribution<float>(0.1f, 1.0f));
  }
  return 0;
}

int QuickStart(int argc, const char **argv) {
  if (argc < 2) {
    std::cerr << "Model file must be provided.\n";
    return -1;
  }
  // Read model file.
  std::string model_path = argv[1];
  if (model_path.empty()) {
    std::cerr << "Model path " << model_path << " is invalid.";
    return -1;
  }

  // Create and init context, add CPU device info
  auto context = std::make_shared<mindspore::Context>();
  if (context == nullptr) {
    std::cerr << "New context failed." << std::endl;
    return -1;
  }
  auto &device_list = context->MutableDeviceInfo();
  auto device_info = std::make_shared<mindspore::CPUDeviceInfo>();
  if (device_info == nullptr) {
    std::cerr << "New CPUDeviceInfo failed." << std::endl;
    return -1;
  }
  device_list.push_back(device_info);

  mindspore::Model model;
  // Build model
  auto build_ret = model.Build(model_path, mindspore::kMindIR, context);
  if (build_ret != mindspore::kSuccess) {
    std::cerr << "Build model error " << build_ret << std::endl;
    return -1;
  }

  // Get Input
  auto inputs = model.GetInputs();
  // Generate random data as input data.
  if (GenerateInputDataWithRandom(inputs) != 0) {
    std::cerr << "Generate Random Input Data failed." << std::endl;
    return -1;
  }

  // Model Predict
  std::vector<mindspore::MSTensor> outputs;
  auto predict_ret = model.Predict(inputs, &outputs);
  if (predict_ret != mindspore::kSuccess) {
    std::cerr << "Predict error " << predict_ret << std::endl;
    return -1;
  }

  // Print Output Tensor Data.
  constexpr int kNumPrintOfOutData = 50;
  for (auto &tensor : outputs) {
    std::cout << "tensor name is:" << tensor.Name() << " tensor size is:" << tensor.DataSize()
              << " tensor elements num is:" << tensor.ElementNum() << std::endl;
    auto out_data = reinterpret_cast<const float *>(tensor.Data().get());
    std::cout << "output data is:";
    for (int i = 0; i < tensor.ElementNum() && i <= kNumPrintOfOutData; i++) {
      std::cout << out_data[i] << " ";
    }
    std::cout << std::endl;
  }
  return 0;
}

int main(int argc, const char **argv) { return QuickStart(argc, argv); }

The code works as follows:

Initialize the Context configuration

Context holds the configuration needed for model inference, including operator preferences, number of threads, automatic concurrency, and other settings related to the inference processor. For more details about Context, please refer to the API description of Context. A Context object must be provided when MindSpore Lite loads a model, so this example first creates an object context of class Context.
```
auto context = std::make_shared<mindspore::Context>();
```
Next, get the device management list of the context object through the Context::MutableDeviceInfo interface.
```
auto &device_list = context->MutableDeviceInfo();
```
In this example, since the CPU is used for inference, an object device_info of class CPUDeviceInfo is created.
```
auto device_info = std::make_shared<mindspore::CPUDeviceInfo>();
```
Since the default CPU settings are used, device_info needs no further configuration and is added directly to the device management list of context.
```
device_list.push_back(device_info);
```
Load models

First create an object model of class Model. The Model class defines the model in MindSpore for computational graph management. For a detailed description of the Model class, please refer to the API documentation.
```
mindspore::Model model;
```
Then call the Build interface to pass in the model and compile it to a running state on the device.
```
auto build_ret = model.Build(model_path, mindspore::kMindIR, context);
```
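
Build reports failure through its return value, but a wrong path is easier to diagnose if the file is checked first. The following is a minimal sketch of such a check; ModelFileSize is a hypothetical helper and not part of the MindSpore Lite API.

```cpp
#include <fstream>
#include <ios>
#include <string>

// Hypothetical helper: size of the file in bytes, or -1 if it cannot be opened.
long long ModelFileSize(const std::string &path) {
  // Open at the end so that tellg() reports the file size.
  std::ifstream file(path, std::ios::binary | std::ios::ate);
  if (!file.is_open()) {
    return -1;
  }
  const std::streamoff size = file.tellg();
  return static_cast<long long>(size);
}
```

A caller could reject the model path when ModelFileSize(model_path) is not greater than 0 and only then call Build.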

Pass in data

Before performing model inference, you need to set the input data. In this example, all input tensors of the model are obtained through the Model::GetInputs interface. Each tensor is of type MSTensor. For a detailed description of MSTensor, please refer to the API description of MSTensor.

auto inputs = model.GetInputs();

The MutableData interface of the tensor returns the data memory pointer of the tensor, and the DataSize interface returns its data length in bytes. The data type of the tensor can be obtained through the DataType interface, so users can process the data differently according to the data type of their models.

auto input_data = tensor.MutableData();
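
The relation between the byte length and the number of elements depends on the element type. The sketch below illustrates it with a hypothetical enumeration ElemType that stands in for the value reported by DataType; the real enumeration is defined in the MindSpore Lite headers and has more members.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the tensor data type; not the MindSpore Lite enum.
enum class ElemType { kFloat32, kInt32, kUInt8 };

// Size in bytes of one element of the given type.
std::size_t ElemSize(ElemType type) {
  switch (type) {
    case ElemType::kFloat32:
      return sizeof(float);
    case ElemType::kInt32:
      return sizeof(std::int32_t);
    case ElemType::kUInt8:
      return sizeof(std::uint8_t);
  }
  return 0;
}

// Number of elements that fit in a buffer of byte_size bytes.
std::size_t ElemCount(std::size_t byte_size, ElemType type) {
  const std::size_t elem_size = ElemSize(type);
  return elem_size == 0 ? 0 : byte_size / elem_size;
}
```

The sample code applies the same division with sizeof(T) inside GenerateRandomData to turn the byte length into an element count.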

Next, the data for inference is written into the tensor through the data pointer. In this example, the input is random floating-point data uniformly distributed between 0.1 and 1. In practical inference, after reading actual data such as images or audio, the user needs to perform algorithm-specific pre-processing and pass the processed data into the model.

template <typename T, typename Distribution>
void GenerateRandomData(int size, void *data, Distribution distribution) {
  std::mt19937 random_engine;
  int elements_num = size / sizeof(T);
  (void)std::generate_n(static_cast<T *>(data), elements_num,
                        [&distribution, &random_engine]() { return static_cast<T>(distribution(random_engine)); });
}

int GenerateInputDataWithRandom(std::vector<mindspore::MSTensor> inputs) {
  for (auto tensor : inputs) {
    auto input_data = tensor.MutableData();
    if (input_data == nullptr) {
      std::cerr << "MallocData for inTensor failed." << std::endl;
      return -1;
    }
    GenerateRandomData<float>(tensor.DataSize(), input_data, std::uniform_real_distribution<float>(0.1f, 1.0f));
  }
  return 0;
}

  // Get Input
  auto inputs = model.GetInputs();
  // Generate random data as input data.
  if (GenerateInputDataWithRandom(inputs) != 0) {
    std::cerr << "Generate Random Input Data failed." << std::endl;
    return -1;
  }
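
The generation step can be tried without a model by filling an ordinary buffer. The following sketch uses the same std::generate_n pattern; MakeRandomBuffer and its seed parameter are hypothetical additions for this illustration. The sample code itself uses a default-constructed std::mt19937, which produces the same sequence on every run.

```cpp
#include <algorithm>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: returns count floats drawn uniformly from [low, high).
// The explicit seed is an addition of this sketch to make runs reproducible
// or varied on demand.
std::vector<float> MakeRandomBuffer(std::size_t count, float low, float high, unsigned seed) {
  std::mt19937 engine(seed);
  std::uniform_real_distribution<float> distribution(low, high);
  std::vector<float> buffer(count);
  std::generate_n(buffer.begin(), count, [&distribution, &engine]() { return distribution(engine); });
  return buffer;
}
```

Calling MakeRandomBuffer(1000, 0.1f, 1.0f, 42) twice returns identical data, which can help when comparing inference results between runs.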

Execute inference

First, a vector outputs is created to hold the output tensors of the model inference. Then the inference interface Predict is called with the input tensors and the output vector as its parameters. After a successful inference, the output tensors are stored in outputs.
```
std::vector<mindspore::MSTensor> outputs;
auto predict_ret = model.Predict(inputs, &outputs);
```
Obtain inference results

The data pointer of the output tensor is obtained via Data. In this example it is cast to a floating-point pointer. Users can cast to the type that matches the data type of their model, which is available through the DataType interface of the tensor.
```
auto out_data = reinterpret_cast<const float *>(tensor.Data().get());
```
In this example, the inference output is printed directly.
```
for (int i = 0; i < tensor.ElementNum() && i <= kNumPrintOfOutData; i++) {
  std::cout << out_data[i] << " ";
}
std::cout << std::endl;
```
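
Beyond printing, a common next step for a classification output is to find the entry with the highest score. The sketch below assumes the output is a buffer of float scores, as in the Softmax output of the sample model; ArgMax is a hypothetical helper and not part of the MindSpore Lite API.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>

// Hypothetical helper: index of the largest value in a float buffer.
// Returns -1 for a null or empty buffer.
long ArgMax(const float *data, std::size_t count) {
  if (data == nullptr || count == 0) {
    return -1;
  }
  const float *largest = std::max_element(data, data + count);
  return static_cast<long>(std::distance(data, largest));
}
```

In the sample, such a helper would be called with out_data and the element count of the tensor.
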
Release the model object

Model-related resources are released when the Model object is destroyed.

Compiling

Set the environment variables as described in the Environment Variables section. Then compile the program as follows.

mkdir build && cd build
cmake ../
make

After successful compilation, the mindspore_quick_start_cpp executable is generated in the build directory.

Running the Inference Program

./mindspore_quick_start_cpp ../model/mobilenetv2.mindir

After execution, the following results are obtained. The program prints the name of the output tensor, its size in bytes, its number of elements, and the first 51 values (the loop condition i <= kNumPrintOfOutData is inclusive, so 51 values are printed):

tensor name is:Default/head-MobileNetV2Head/Softmax-op204 tensor size is:4000 tensor elements num is:1000
output data is:5.07155e-05 0.00048712 0.000312549 0.00035624 0.0002022 8.58958e-05 0.000187147 0.000365937 0.000281044 0.000255672 0.00108948 0.00390996 0.00230398 0.00128984 0.00307477 0.00147607 0.00106759 0.000589853 0.000848115 0.00143693 0.000685777 0.00219331 0.00160639 0.00215123 0.000444315 0.000151986 0.000317552 0.00053971 0.00018703 0.000643944 0.000218269 0.000931556 0.000127084 0.000544278 0.000887942 0.000303909 0.000273875 0.00035335 0.00229062 0.000453207 0.0011987 0.000621194 0.000628335 0.000838564 0.000611029 0.000372603 0.00147742 0.000270685 8.29869e-05 0.000116974 0.000876237