# MindSpore Serving-based Inference Service Deployment

[![View Source On Gitee](https://gitee.com/mindspore/docs/raw/r1.5/resource/_static/logo_source_en.png)](https://gitee.com/mindspore/docs/blob/r1.5/docs/serving/docs/source_en/serving_example.md)

## Overview

MindSpore Serving is a lightweight and high-performance service module that helps MindSpore developers efficiently deploy online inference services in the production environment. After completing model training on MindSpore, you can export the MindSpore model and use MindSpore Serving to create an inference service for the model.

The following uses a simple `Add` network as an example to describe how to use MindSpore Serving.

## Preparing the Environment

Before running the sample network, ensure that MindSpore Serving has been properly installed and the environment variables are configured. To install and configure MindSpore Serving on your PC, go to the [MindSpore Serving installation page](https://www.mindspore.cn/serving/docs/en/r1.5/serving_install.html).

## Downloading the Example

Please download the [add example](https://gitee.com/mindspore/serving/blob/r1.5/example/tensor_add/) first.

## Exporting the Model

In the directory `export_model`, use [add_model.py](https://gitee.com/mindspore/serving/blob/r1.5/example/tensor_add/export_model/add_model.py) to build a network with only the Add operator and export the MindSpore inference deployment model.

```python
import os
from shutil import copyfile
import numpy as np

import mindspore.context as context
import mindspore.nn as nn
import mindspore.ops as ops
import mindspore as ms

context.set_context(mode=context.GRAPH_MODE, device_target="Ascend")


class Net(nn.Cell):
    """Define Net of add"""

    def __init__(self):
        super(Net, self).__init__()
        self.add = ops.Add()

    def construct(self, x_, y_):
        """construct add net"""
        return self.add(x_, y_)


def export_net():
    """Export add net of 2x2 + 2x2, and copy output model `tensor_add.mindir` to directory ../add/1"""
    x = np.ones([2, 2]).astype(np.float32)
    y = np.ones([2, 2]).astype(np.float32)
    add = Net()
    ms.export(add, ms.Tensor(x), ms.Tensor(y), file_name='tensor_add', file_format='MINDIR')
    dst_dir = '../add/1'
    try:
        os.mkdir(dst_dir)
    except OSError:
        pass

    dst_file = os.path.join(dst_dir, 'tensor_add.mindir')
    copyfile('tensor_add.mindir', dst_file)
    print("copy tensor_add.mindir to " + dst_dir + " success")


if __name__ == "__main__":
    export_net()
```

To use MindSpore for neural network definition, inherit `mindspore.nn.Cell`. (A `Cell` is a base class of all neural networks.) Define each layer of a neural network in the `__init__` method in advance, and then define the `construct` method to complete the forward construction of the neural network. Use `export` of the `mindspore` module to export the model file.
For more detailed examples, see [Quick Start for Beginners](https://www.mindspore.cn/tutorials/en/r1.5/quick_start.html).

Execute the `add_model.py` script to generate the `tensor_add.mindir` file. The input of the model is two 2D tensors with shape [2,2], and the output is the sum of the two input tensors.

## Deploying the Serving Inference Service

### Configuring the Service

Start Serving with the following files:

```text
tensor_add
├── add/
│   │── servable_config.py
│   └── 1/
│       └── tensor_add.mindir
└── serving_server.py
```

- `serving_server.py`: Script file for starting the service.
- `add`: Model folder, which is named after the model name.
- `tensor_add.mindir`: Model file generated by the network in the previous step, which is stored in folder 1 (the number indicates the version number). Different versions are stored in different folders. The version number must be a string of digits. By default, the latest model file is started.
- [servable_config.py](https://gitee.com/mindspore/serving/blob/r1.5/example/tensor_add/add/servable_config.py): [Model configuration file](https://www.mindspore.cn/serving/docs/en/r1.5/serving_model.html), which defines the model processing functions, including the `add_common` and `add_cast` methods. `add_common` defines an addition operation whose input is two pieces of float32 data, and `add_cast` defines an addition operation whose input is data with its type converted to float32.

Content of the configuration file:

```python
import numpy as np
from mindspore_serving.server import register


def add_trans_datatype(x1, x2):
    """define preprocess, this example has two inputs and two outputs"""
    return x1.astype(np.float32), x2.astype(np.float32)


# when with_batch_dim is set to False, only 2x2 add is supported
# when with_batch_dim is set to True(default), Nx2 add is supported, while N is viewed as batch
# float32 inputs/outputs
model = register.declare_model(model_file="tensor_add.mindir", model_format="MindIR", with_batch_dim=False)


# register add_common method in add
@register.register_method(output_names=["y"])
def add_common(x1, x2):  # only support float32 inputs
    """method add_common data flow definition, only call model"""
    y = register.add_stage(model, x1, x2, outputs_count=1)
    return y


# register add_cast method in add
@register.register_method(output_names=["y"])
def add_cast(x1, x2):
    """method add_cast data flow definition, only preprocessing and call model"""
    x1, x2 = register.add_stage(add_trans_datatype, x1, x2, outputs_count=2)  # cast input to float32
    y = register.add_stage(model, x1, x2, outputs_count=1)
    return y
```

### Starting the Service

Run the [serving_server.py](https://gitee.com/mindspore/serving/blob/r1.5/example/tensor_add/serving_server.py) script to start the Serving server:

```python
import os
import sys
from mindspore_serving import server


def start():
    servable_dir = os.path.dirname(os.path.realpath(sys.argv[0]))

    servable_config = server.ServableStartConfig(servable_directory=servable_dir, servable_name="add",
                                                 device_ids=(0, 1))
    server.start_servables(servable_configs=servable_config)

    server.start_grpc_server(address="127.0.0.1:5500")
    server.start_restful_server(address="127.0.0.1:1500")


if __name__ == "__main__":
    start()
```

The `start_servables` in the above startup script will load and run two inference copies of `add` on devices 0 and 1, and the inference requests from the client will be split to the two inference copies.

If the following log information is displayed on the server, the gRPC and RESTful services are started successfully.

```text
Serving gRPC server start success, listening on 127.0.0.1:5500
Serving RESTful server start success, listening on 127.0.0.1:1500
```

## Inference Execution

The client can access the inference service through either [gRPC](https://www.mindspore.cn/serving/docs/en/r1.5/serving_grpc.html) or [RESTful](https://www.mindspore.cn/serving/docs/en/r1.5/serving_restful.html). The following uses gRPC as an example.
Execute [serving_client.py](https://gitee.com/mindspore/serving/blob/r1.5/example/tensor_add/serving_client.py) to start the Python client.

```python
import numpy as np
from mindspore_serving.client import Client


def run_add_common():
    """invoke servable add method add_common"""
    client = Client("127.0.0.1:5500", "add", "add_common")
    instances = []

    # instance 1
    x1 = np.asarray([[1, 1], [1, 1]]).astype(np.float32)
    x2 = np.asarray([[1, 1], [1, 1]]).astype(np.float32)
    instances.append({"x1": x1, "x2": x2})

    # instance 2
    x1 = np.asarray([[2, 2], [2, 2]]).astype(np.float32)
    x2 = np.asarray([[2, 2], [2, 2]]).astype(np.float32)
    instances.append({"x1": x1, "x2": x2})

    # instance 3
    x1 = np.asarray([[3, 3], [3, 3]]).astype(np.float32)
    x2 = np.asarray([[3, 3], [3, 3]]).astype(np.float32)
    instances.append({"x1": x1, "x2": x2})

    result = client.infer(instances)
    print(result)


def run_add_cast():
    """invoke servable add method add_cast"""
    client = Client("127.0.0.1:5500", "add", "add_cast")
    instances = []
    x1 = np.ones((2, 2), np.int32)
    x2 = np.ones((2, 2), np.int32)
    instances.append({"x1": x1, "x2": x2})
    result = client.infer(instances)
    print(result)


if __name__ == '__main__':
    run_add_common()
    run_add_cast()
```

Use the `Client` class defined by `mindspore_serving.client`. The client defines two cases to call two model methods. In the `run_add_common` case with three pairs of float32 arrays, each pair of arrays are added up. In the `run_add_cast` case, two int32 arrays are added up. If the results of the two cases are displayed as follows, the Serving has properly executed the `Add` network inference.

```text
[{'y': array([[2. , 2.],
        [2.,  2.]], dtype=float32)},{'y': array([[4. , 4.],
        [4.,  4.]], dtype=float32)},{'y': array([[6. , 6.],
        [6.,  6.]], dtype=float32)}]
[{'y': array([[2. , 2.],
        [2.,  2.]], dtype=float32)}]
```