mindspore_serving

MindSpore Serving.

mindspore_serving.master

MindSpore Serving Master

mindspore_serving.master.start_grpc_server(ip='0.0.0.0', grpc_port=5500, max_msg_mb_size=100)[source]

Start gRPC server for the communication between client and serving.

Parameters
  • ip (str) – gRPC server ip.

  • grpc_port (int) – gRPC server port, default 5500, port range [1, 65535].

  • max_msg_mb_size (int) – The maximum acceptable gRPC message size in megabytes(MB), default 100, value range [1, 512].

Raises

RuntimeError – Failed to start the gRPC server.

Examples

>>> from mindspore_serving import master
>>>
>>> master.start_grpc_server("0.0.0.0", 5500)
>>> master.start_restful_server("0.0.0.0", 1500)
mindspore_serving.master.start_master_server(ip='127.0.0.1', master_port=6100)[source]

Start the gRPC server for the communication between workers and the master.

Note

The ip is expected to be accessed only by workers, not clients.

Parameters
  • ip (str) – gRPC ip for workers to communicate with, default ‘127.0.0.1’.

  • master_port (int) – gRPC server port, default 6100, port range [1, 65535].

Raises

RuntimeError – Failed to start the master server.

Examples

>>> from mindspore_serving import master
>>>
>>> master.start_grpc_server("0.0.0.0", 5500)
>>> master.start_restful_server("0.0.0.0", 1500)
>>> master.start_master_server("127.0.0.1", 6100)
mindspore_serving.master.start_restful_server(ip='0.0.0.0', restful_port=5900, max_msg_mb_size=100)[source]

Start RESTful server for the communication between client and serving.

Parameters
  • ip (str) – RESTful server ip.

  • restful_port (int) – RESTful server port, default 5900, port range [1, 65535].

  • max_msg_mb_size (int) – The maximum acceptable RESTful message size in megabytes(MB), default 100, value range [1, 512].

Raises

RuntimeError – Failed to start the RESTful server.

Examples

>>> from mindspore_serving import master
>>>
>>> master.start_restful_server("0.0.0.0", 1500)
mindspore_serving.master.stop()[source]

Stop the running of master.

Examples

>>> from mindspore_serving import master
>>>
>>> master.start_grpc_server("0.0.0.0", 5500)
>>> master.start_restful_server("0.0.0.0", 1500)
>>> ...
>>> master.stop()

mindspore_serving.worker

MindSpore Serving Worker.

mindspore_serving.worker.start_servable(servable_directory, servable_name, version_number=0, device_type=None, device_id=0, master_ip='0.0.0.0', master_port=6100, worker_ip='0.0.0.0', worker_port=6200)[source]

Start up the servable named ‘servable_name’ defined in ‘servable_directory’, and link the worker to the master through gRPC (master_ip, master_port).

Serving has two running modes. One is running in a single process, providing the Serving service of a single model. The other includes a master and multiple workers. This interface is for the second scenario.

The master is responsible for providing the Serving access interface for clients, while the worker is responsible for providing the inference service of the specific model. The master and workers communicate through gRPC, with the addresses defined as (master_ip, master_port) and (worker_ip, worker_port).

Parameters
  • servable_directory (str) – The directory where the servable is located. It is expected to contain a directory named servable_name. For more detail: How to config Servable .

  • servable_name (str) – The servable name.

  • version_number (int) – Servable version number to be loaded. The version number should be a positive integer, starting from 1, and 0 means to load the latest version. Default: 0.

  • device_type (str) – Currently only supports “Ascend”, “Davinci” and None, Default: None. “Ascend” means the device type can be Ascend910 or Ascend310, etc. “Davinci” has the same meaning as “Ascend”. None means the device type is determined by the MindSpore environment.

  • device_id (int) – The id of the device that the model is loaded into and runs on.

  • master_ip (str) – The master ip that the worker links to.

  • master_port (int) – The master port that the worker links to.

  • worker_ip (str) – The worker ip that the master links to.

  • worker_port (int) – The worker port that the master links to.

Examples

>>> import os
>>> from mindspore_serving import worker
>>>
>>> servable_dir = os.path.abspath(".")
>>> worker.start_servable(servable_dir, "lenet", device_id=0, \
...                       master_ip="127.0.0.1", master_port=6500, \
...                       worker_ip="127.0.0.1", worker_port=6600)
mindspore_serving.worker.start_servable_in_master(servable_directory, servable_name, version_number=0, device_type=None, device_id=0)[source]

Start up the servable named ‘servable_name’ defined in ‘servable_directory’, and the worker will run in the process of the master.

Serving has two running modes. One is running in a single process, providing the Serving service of a single model. The other includes a master and multiple workers. This interface is for the first scenario.

Parameters
  • servable_directory (str) –

    The directory where the servable is located. It is expected to contain a directory named servable_name. For more detail: How to config Servable .

  • servable_name (str) – The servable name.

  • version_number (int) – Servable version number to be loaded. The version number should be a positive integer, starting from 1, and 0 means to load the latest version. Default: 0.

  • device_type (str) – Currently only supports “Ascend”, “Davinci” and None, Default: None. “Ascend” means the device type can be Ascend910 or Ascend310, etc. “Davinci” has the same meaning as “Ascend”. None means the device type is determined by the MindSpore environment.

  • device_id (int) – The id of the device that the model is loaded into and runs on.

Examples

>>> import os
>>> from mindspore_serving import worker
>>> from mindspore_serving import master
>>>
>>> servable_dir = os.path.abspath(".")
>>> worker.start_servable_in_master(servable_dir, "lenet", device_id=0)
>>>
>>> master.start_grpc_server("0.0.0.0", 5500)
>>> master.start_restful_server("0.0.0.0", 1500)
mindspore_serving.worker.stop()[source]

Stop the running of worker.

Examples

>>> import os
>>> from mindspore_serving import worker
>>>
>>> servable_dir = os.path.abspath(".")
>>> worker.start_servable(servable_dir, "lenet", device_id=0, \
...                       master_ip="127.0.0.1", master_port=6500, \
...                       worker_ip="127.0.0.1", worker_port=6600)
>>> ...
>>> worker.stop()

mindspore_serving.worker.register

MindSpore Serving Worker, for servable config.

class mindspore_serving.worker.register.AclOptions(**kwargs)[source]

Helper class to set acl options.

Parameters
  • insert_op_cfg_path (str) – Path of aipp config file.

  • input_format (str) – Manually specify the model input format, the value can be “ND”, “NCHW”, “NHWC”, “CHWN”, “NC1HWC0”, or “NHWC1C0”.

  • input_shape (str) – Manually specify the model input shape, such as “input_op_name1: n1,c2,h3,w4;input_op_name2: n4,c3,h2,w1”.

  • output_type (str) – Manually specify the model output type, the value can be “FP16”, “UINT8”, or “FP32”, default “FP32”.

  • precision_mode (str) – Model precision mode, the value can be “force_fp16”, “allow_fp32_to_fp16”, “must_keep_origin_dtype” or “allow_mix_precision”, default “force_fp16”.

  • op_select_impl_mode (str) – The operator selection mode, the value can be “high_performance” or “high_precision”, default “high_performance”.

Raises

RuntimeError – Acl option is invalid, or value is not str.

Examples

>>> from mindspore_serving.worker import register
>>> options = register.AclOptions(op_select_impl_mode="high_precision", precision_mode="allow_fp32_to_fp16")
>>> register.declare_servable(servable_file="deeptext.mindir", model_format="MindIR", options=options)
set_input_format(val)[source]

Set option ‘input_format’, manually specify the model input format, and the value can be “ND”, “NCHW”, “NHWC”, “CHWN”, “NC1HWC0”, or “NHWC1C0”.

Parameters

val (str) – Value of option ‘input_format’, and the value can be “ND”, “NCHW”, “NHWC”, “CHWN”, “NC1HWC0”, or “NHWC1C0”.

Raises

RuntimeError – The type of value is not str, or the value is invalid.
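
Examples

A minimal sketch of this setter, assuming AclOptions can be constructed without arguments:

>>> from mindspore_serving.worker import register
>>> options = register.AclOptions()
>>> options.set_input_format("NCHW")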

set_input_shape(val)[source]

Set option ‘input_shape’, manually specify the model input shape, such as “input_op_name1: n1,c2,h3,w4;input_op_name2: n4,c3,h2,w1”.

Parameters

val (str) – Value of option ‘input_shape’.

Raises

RuntimeError – The type of value is not str, or the value is invalid.
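
Examples

A minimal sketch of this setter; the operator name input_op_name1 and the shape are illustrative only:

>>> from mindspore_serving.worker import register
>>> options = register.AclOptions()
>>> options.set_input_shape("input_op_name1: 1,3,224,224")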

set_insert_op_cfg_path(val)[source]

Set option ‘insert_op_cfg_path’, the path of the aipp config file.

Parameters

val (str) – Value of option ‘insert_op_cfg_path’.

Raises

RuntimeError – The type of value is not str.
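
Examples

A minimal sketch of this setter; the file name aipp.cfg is illustrative only:

>>> from mindspore_serving.worker import register
>>> options = register.AclOptions()
>>> options.set_insert_op_cfg_path("aipp.cfg")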

set_op_select_impl_mode(val)[source]

Set option ‘op_select_impl_mode’, which means the operator selection mode, and the value can be “high_performance” or “high_precision”, default “high_performance”.

Parameters

val (str) – Value of option ‘op_select_impl_mode’, which can be “high_performance” or “high_precision”, default “high_performance”.

Raises

RuntimeError – The type of value is not str, or the value is invalid.
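
Examples

A minimal sketch of this setter, using one of the two documented values:

>>> from mindspore_serving.worker import register
>>> options = register.AclOptions()
>>> options.set_op_select_impl_mode("high_precision")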

set_output_type(val)[source]

Set option ‘output_type’, manually specify the model output type, and the value can be “FP16”, “UINT8”, or “FP32”, default “FP32”.

Parameters

val (str) – Value of option ‘output_type’, and the value can be “FP16”, “UINT8”, or “FP32”, default “FP32”.

Raises

RuntimeError – The type of value is not str, or the value is invalid.
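
Examples

A minimal sketch of this setter, using one of the documented output types:

>>> from mindspore_serving.worker import register
>>> options = register.AclOptions()
>>> options.set_output_type("FP16")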

set_precision_mode(val)[source]

Set option ‘precision_mode’, which means the model precision mode, and the value can be “force_fp16”, “allow_fp32_to_fp16”, “must_keep_origin_dtype”, or “allow_mix_precision”, default “force_fp16”.

Parameters

val (str) – Value of option ‘precision_mode’, and the value can be “force_fp16”, “allow_fp32_to_fp16”, “must_keep_origin_dtype”, or “allow_mix_precision”, default “force_fp16”.

Raises

RuntimeError – The type of value is not str, or the value is invalid.
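
Examples

A minimal sketch of this setter, using one of the documented precision modes:

>>> from mindspore_serving.worker import register
>>> options = register.AclOptions()
>>> options.set_precision_mode("allow_fp32_to_fp16")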

class mindspore_serving.worker.register.GpuOptions(**kwargs)[source]

Helper class to set gpu options.

Parameters

enable_trt_infer (bool) – Whether to enable inference with TensorRT.

Raises

RuntimeError – Gpu option is invalid, or value is not bool.

Examples

>>> from mindspore_serving.worker import register
>>> options = register.GpuOptions(enable_trt_infer=True)
>>> register.declare_servable(servable_file="deeptext.mindir", model_format="MindIR", options=options)
set_trt_infer_mode(val)[source]

Set option ‘enable_trt_infer’, whether to enable inference with TensorRT.

Parameters

val (bool) – Value of option ‘enable_trt_infer’.

Raises

RuntimeError – The type of value is not bool.
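
Examples

A minimal sketch of this setter, assuming GpuOptions can be constructed without arguments:

>>> from mindspore_serving.worker import register
>>> options = register.GpuOptions()
>>> options.set_trt_infer_mode(True)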

mindspore_serving.worker.register.call_postprocess(postprocess_fun, *args)[source]

For method registration, define the postprocessing function and its parameters.

Parameters
  • postprocess_fun (function) – Python function for postprocess.

  • args – Postprocess inputs. The length of ‘args’ should be equal to the number of input parameters of the implemented Python function.

Raises

RuntimeError – The type or value of the parameters is invalid, or other error happened.
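
Examples

A minimal sketch, mirroring the call_preprocess example later in this section; the postprocess function add_trans_output and the method name add_cast_out are illustrative assumptions:

>>> from mindspore_serving.worker import register
>>> import numpy as np
>>> def add_trans_output(y):
...     return y.astype(np.int32)
>>>
>>> register.declare_servable(servable_file="tensor_add.mindir", model_format="MindIR", with_batch_dim=False)
>>>
>>> @register.register_method(output_names=["y"]) # register add_cast_out method in add
... def add_cast_out(x1, x2):
...     y = register.call_servable(x1, x2)
...     y = register.call_postprocess(add_trans_output, y)  # cast output to int32
...     return y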

mindspore_serving.worker.register.call_postprocess_pipeline(postprocess_fun, *args)[source]

For method registration, define the postprocessing pipeline function and its parameters.

A single request can include multiple instances, and multiple queued requests will also have multiple instances. If you need to process multiple instances through multithreading or other parallel processing capabilities in preprocess or postprocess, such as using the MindData concurrency ability to process multiple input images in preprocess, MindSpore Serving provides ‘call_preprocess_pipeline’ and ‘call_postprocess_pipeline’ to register such preprocessing and postprocessing. For more detail, please refer to the Resnet50 model configuration example.

Parameters
  • postprocess_fun (function) – Python pipeline function for postprocess.

  • args – Postprocess inputs. The length of ‘args’ should be equal to the number of input parameters of the implemented Python function.

Raises

RuntimeError – The type or value of the parameters is invalid, or other error happened.
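
Examples

A minimal sketch, mirroring the call_preprocess_pipeline example later in this section; the pipeline function add_trans_output and the method name add_cast_out are illustrative assumptions:

>>> from mindspore_serving.worker import register
>>> import numpy as np
>>> def add_trans_output(instances):
...     for instance in instances:
...         y = instance[0]
...         yield y.astype(np.int32)
>>>
>>> register.declare_servable(servable_file="tensor_add.mindir", model_format="MindIR", with_batch_dim=False)
>>>
>>> @register.register_method(output_names=["y"]) # register add_cast_out method in add
... def add_cast_out(x1, x2):
...     y = register.call_servable(x1, x2)
...     y = register.call_postprocess_pipeline(add_trans_output, y)  # cast output to int32
...     return y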

mindspore_serving.worker.register.call_preprocess(preprocess_fun, *args)[source]

For method registration, define the preprocessing function and its parameters.

Parameters
  • preprocess_fun (function) – Python function for preprocess.

  • args – Preprocess inputs. The length of ‘args’ should be equal to the number of input parameters of the implemented Python function.

Raises

RuntimeError – The type or value of the parameters is invalid, or other error happened.

Examples

>>> from mindspore_serving.worker import register
>>> import numpy as np
>>> def add_trans_datatype(x1, x2):
...     return x1.astype(np.float32), x2.astype(np.float32)
>>>
>>> register.declare_servable(servable_file="tensor_add.mindir", model_format="MindIR", with_batch_dim=False)
>>>
>>> @register.register_method(output_names=["y"]) # register add_cast method in add
... def add_cast(x1, x2):
...     x1, x2 = register.call_preprocess(add_trans_datatype, x1, x2)  # cast input to float32
...     y = register.call_servable(x1, x2)
...     return y
mindspore_serving.worker.register.call_preprocess_pipeline(preprocess_fun, *args)[source]

For method registration, define the preprocessing pipeline function and its parameters.

A single request can include multiple instances, and multiple queued requests will also have multiple instances. If you need to process multiple instances through multithreading or other parallel processing capabilities in preprocess or postprocess, such as using the MindData concurrency ability to process multiple input images in preprocess, MindSpore Serving provides ‘call_preprocess_pipeline’ and ‘call_postprocess_pipeline’ to register such preprocessing and postprocessing. For more detail, please refer to the Resnet50 model configuration example.

Parameters
  • preprocess_fun (function) – Python pipeline function for preprocess.

  • args – Preprocess inputs. The length of ‘args’ should be equal to the number of input parameters of the implemented Python function.

Raises

RuntimeError – The type or value of the parameters is invalid, or other error happened.

Examples

>>> from mindspore_serving.worker import register
>>> import numpy as np
>>> def add_trans_datatype(instances):
...     for instance in instances:
...         x1 = instance[0]
...         x2 = instance[1]
...         yield x1.astype(np.float32), x2.astype(np.float32)
>>>
>>> register.declare_servable(servable_file="tensor_add.mindir", model_format="MindIR", with_batch_dim=False)
>>>
>>> @register.register_method(output_names=["y"]) # register add_cast method in add
... def add_cast(x1, x2):
...     x1, x2 = register.call_preprocess_pipeline(add_trans_datatype, x1, x2)  # cast input to float32
...     y = register.call_servable(x1, x2)
...     return y
mindspore_serving.worker.register.call_servable(*args)[source]

For method registration, define the input data of model inference.

Note

The length of ‘args’ should be equal to the number of inputs of the model.

Parameters

args – Model’s inputs; the length of ‘args’ should be equal to the number of inputs of the model.

Raises

RuntimeError – The type or value of the parameters is invalid, or other error happened.

Examples

>>> from mindspore_serving.worker import register
>>> register.declare_servable(servable_file="tensor_add.mindir", model_format="MindIR", with_batch_dim=False)
>>>
>>> @register.register_method(output_names=["y"]) # register add_common method in add
... def add_common(x1, x2):
...     y = register.call_servable(x1, x2)
...     return y
mindspore_serving.worker.register.declare_servable(servable_file, model_format, with_batch_dim=True, options=None, without_batch_dim_inputs=None)[source]

Declare the servable info.

Parameters
  • servable_file (str) – Model file name.

  • model_format (str) – Model format, “OM” or “MindIR”, case ignored.

  • with_batch_dim (bool) – Whether the first shape dim of the model’s inputs and outputs is the batch dim, default True.

  • options (None, AclOptions, GpuOptions, map) – Options of the model; currently AclOptions and GpuOptions are supported.

  • without_batch_dim_inputs (None, int, tuple or list of int) – Index of inputs without the batch dim when with_batch_dim is True.

Raises

RuntimeError – The type or value of the parameters is invalid.
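
Examples

A minimal sketch, using the same tensor_add servable that appears in the register_method example below:

>>> from mindspore_serving.worker import register
>>> register.declare_servable(servable_file="tensor_add.mindir", model_format="MindIR", with_batch_dim=False)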

mindspore_serving.worker.register.register_method(output_names)[source]

Register a method for the servable.

Define the data flow of preprocess, model inference and postprocess in the method. Preprocess and postprocess are optional.

Parameters

output_names (str, tuple or list of str) – The output names of the method. The input names are the argument names of the registered function.

Raises

RuntimeError – The type or value of the parameters is invalid, or other error happened.

Examples

>>> from mindspore_serving.worker import register
>>> import numpy as np
>>> def add_trans_datatype(x1, x2):
...      return x1.astype(np.float32), x2.astype(np.float32)
>>>
>>> register.declare_servable(servable_file="tensor_add.mindir", model_format="MindIR", with_batch_dim=False)
>>>
>>> @register.register_method(output_names=["y"]) # register add_cast method in add
... def add_cast(x1, x2):
...     x1, x2 = register.call_preprocess(add_trans_datatype, x1, x2)  # cast input to float32
...     y = register.call_servable(x1, x2)
...     return y

mindspore_serving.worker.distributed

MindSpore Serving Worker.

mindspore_serving.worker.distributed.declare_distributed_servable(rank_size, stage_size, with_batch_dim=True, without_batch_dim_inputs=None)[source]

Declare the distributed servable in servable_config.py.

Parameters
  • rank_size (int) – The rank size of the distributed model.

  • stage_size (int) – The stage size of the distributed model.

  • with_batch_dim (bool) – Whether the first shape dim of the model’s inputs and outputs is the batch dim, default True.

  • without_batch_dim_inputs (None, int, tuple or list of int) – Index of inputs without the batch dim when with_batch_dim is True.

Examples

>>> from mindspore_serving.worker import distributed
>>> distributed.declare_distributed_servable(rank_size=8, stage_size=1)
mindspore_serving.worker.distributed.start_distributed_servable(servable_directory, servable_name, rank_table_json_file, version_number=1, worker_ip='0.0.0.0', worker_port=6200, master_ip='0.0.0.0', master_port=6100, wait_agents_time_in_seconds=0)[source]

Start up the servable named ‘servable_name’ defined in ‘servable_directory’, and link the worker to the master through gRPC (master_ip, master_port).

Serving has two running modes. One is running in a single process, providing the Serving service of a single model. The other includes a master and multiple workers. This interface is for the second scenario.

The master is responsible for providing the Serving access interface for clients, while the worker is responsible for providing the inference service of the specific model. The master and workers communicate through gRPC, with the addresses defined as (master_ip, master_port) and (worker_ip, worker_port).

Parameters
  • servable_directory (str) –

    The directory where the servable is located. It is expected to contain a directory named servable_name. For more detail: How to config Servable .

  • servable_name (str) – The servable name.

  • version_number (int) – Servable version number to be loaded. The version number should be a positive integer, starting from 1, and 0 means to load the latest version. Default: 1.

  • rank_table_json_file (str) – The rank table json file name.

  • master_ip (str) – The master ip that the worker links to.

  • master_port (int) – The master port that the worker links to.

  • worker_ip (str) – The worker ip that the master and agents link to.

  • worker_port (int) – The worker port that the master and agents link to.

  • wait_agents_time_in_seconds (int) – The maximum time in seconds that the worker waits for all agents to be ready, 0 means unlimited time, default 0.

Examples

>>> import os
>>> from mindspore_serving.worker import distributed
>>>
>>> servable_dir = os.path.abspath(".")
>>> distributed.start_distributed_servable(servable_dir, "matmul", rank_table_json_file="hccl_8p.json", \
...                                        worker_ip="127.0.0.1", worker_port=6200,   \
...                                        master_ip="127.0.0.1", master_port=6500)
mindspore_serving.worker.distributed.start_distributed_servable_in_master(servable_directory, servable_name, rank_table_json_file, version_number=1, worker_ip='0.0.0.0', worker_port=6200, wait_agents_time_in_seconds=0)[source]

Start up the servable named ‘servable_name’ defined in ‘servable_directory’, and the worker will run in the process of the master.

Serving has two running modes. One is running in a single process, providing the Serving service of a single model. The other includes a master and multiple workers. This interface is for the first scenario.

Parameters
  • servable_directory (str) –

    The directory where the servable is located. It is expected to contain a directory named servable_name. For more detail: How to config Servable .

  • servable_name (str) – The servable name.

  • version_number (int) – Servable version number to be loaded. The version number should be a positive integer, starting from 1, and 0 means to load the latest version. Default: 1.

  • rank_table_json_file (str) – The rank table json file name.

  • worker_ip (str) – The worker ip that the agents link to.

  • worker_port (int) – The worker port that the agents link to.

  • wait_agents_time_in_seconds (int) – The maximum time in seconds that the worker waits for all agents to be ready, 0 means unlimited time, default 0.

Examples

>>> import os
>>> from mindspore_serving.worker import distributed
>>> from mindspore_serving import master
>>>
>>> servable_dir = os.path.abspath(".")
>>> distributed.start_distributed_servable_in_master(servable_dir, "matmul", \
...                                                  rank_table_json_file="hccl_8p.json", \
...                                                  worker_ip="127.0.0.1", worker_port=6200)
>>>
>>> master.start_grpc_server("0.0.0.0", 5500)
>>> master.start_restful_server("0.0.0.0", 1500)
mindspore_serving.worker.distributed.startup_worker_agents(worker_ip, worker_port, model_files, group_config_files=None, agent_start_port=7000, agent_ip=None, rank_start=None)[source]

Start up all needed worker agents on the current machine.

Serving has two running modes. One is running in a single process, providing the Serving service of a single model. The other includes a master and multiple workers. This interface is for the second scenario.

The master is responsible for providing the Serving access interface for clients, while the worker is responsible for providing the inference service of the specific model. The master and workers communicate through gRPC, with the addresses defined as (master_ip, master_port) and (worker_ip, worker_port).

Parameters
  • worker_ip (str) – The worker ip that the agents link to.

  • worker_port (int) – The worker port that the agents link to.

  • model_files (list or tuple of str) – All model files needed on the current machine, as absolute paths or paths relative to this startup Python script.

  • group_config_files (None, list or tuple of str) – All group config files needed on the current machine, as absolute paths or paths relative to this startup Python script, default None, which means there are no configuration files.

  • agent_start_port (int) – The starting port of the agents that link to the worker, default 7000.

  • agent_ip (str or None) – The local agent ip; if it’s None, the agent ip will be obtained from the rank table file. Default None. Parameters agent_ip and rank_start must either both have values or both be None.

  • rank_start (int or None) – The starting rank id of this machine; if it’s None, the rank id will be obtained from the rank table file. Default None. Parameters agent_ip and rank_start must either both have values or both be None.

Examples

>>> import os
>>> from mindspore_serving.worker import distributed
>>> model_files = []
>>> for i in range(8):
...     model_files.append(f"models/device{i}/matmul.mindir")
>>> distributed.startup_worker_agents(worker_ip="127.0.0.1", worker_port=6200, model_files=model_files)

mindspore_serving.client

MindSpore Serving Client

class mindspore_serving.client.Client(ip, port, servable_name, method_name, version_number=0)[source]

The Client encapsulates the serving gRPC API, which can be used to create requests, access serving, and parse results.

Parameters
  • ip (str) – Serving ip.

  • port (int) – Serving port.

  • servable_name (str) – The name of servable supplied by Serving.

  • method_name (str) – The name of method supplied by servable.

  • version_number (int) – The version number of servable, default 0, which means the maximum version number in all running versions.

Raises

RuntimeError – The type or value of the parameters is invalid, or other errors happened.

Examples

>>> from mindspore_serving.client import Client
>>> import numpy as np
>>> client = Client("localhost", 5500, "add", "add_cast")
>>> instances = []
>>> x1 = np.ones((2, 2), np.int32)
>>> x2 = np.ones((2, 2), np.int32)
>>> instances.append({"x1": x1, "x2": x2})
>>> result = client.infer(instances)
>>> print(result)
infer(instances)[source]

Used to create requests, access the serving service, and parse and return results.

Parameters

instances (map, tuple of map) – Instance or tuple of instances, where every instance item is an inputs map. The map key is the input name, and the value is the input value; the type of the value can be Python int, float, bool, str, bytes, numpy number, or numpy array object.

Raises

RuntimeError – The type or value of the parameters is invalid, or other errors happened.

Examples

>>> from mindspore_serving.client import Client
>>> import numpy as np
>>> client = Client("localhost", 5500, "add", "add_cast")
>>> instances = []
>>> x1 = np.ones((2, 2), np.int32)
>>> x2 = np.ones((2, 2), np.int32)
>>> instances.append({"x1": x1, "x2": x2})
>>> result = client.infer(instances)
>>> print(result)
infer_async(instances)[source]

Used to create requests and access serving asynchronously.

Parameters

instances (map, tuple of map) – Instance or tuple of instances, where every instance item is an inputs map. The map key is the input name, and the value is the input value.

Raises

RuntimeError – The type or value of the parameters is invalid, or other errors happened.

Examples

>>> from mindspore_serving.client import Client
>>> import numpy as np
>>> client = Client("localhost", 5500, "add", "add_cast")
>>> instances = []
>>> x1 = np.ones((2, 2), np.int32)
>>> x2 = np.ones((2, 2), np.int32)
>>> instances.append({"x1": x1, "x2": x2})
>>> result_future = client.infer_async(instances)
>>> result = result_future.result()
>>> print(result)