mindpandas.channel

class mindpandas.channel.DataSender(address, namespace='default', num_shards=1, dataset_name='dataset', full_batch=False, max_queue_size=10)

The sender (input side) of the channel. It can be used for sending new object through the channel.

Parameters
  • address (str) - The ip address of the node current sender runs on.

  • namespace (str, optional) - The namespace that the channel belongs to. By default the value is 'default' and the sender will be running in namespace default. DataSender and DataReceiver in different namespaces cannot connect to each other.

  • num_shards (int, optional) - Specifies how many shards the data will be divided into. By default the value is 1.

  • dataset_name (str, optional) - The name of the dataset. By default the value is 'dataset'.

  • full_batch (bool, optional) - If True, each shard will get complete data sent by the sender. Otherwise each shard only gets part of the data. By default the value is False.

  • max_queue_size (int, optional) - The maximum number of data that can be cached in the queue. By default the value is 10.

Raises
  • ValueError - When num_shards is an invalid value.

Note

Distributed executor has to be started in advance.

send(obj)

Send object through the channel.

Parameters
  • obj (Union[numpy.ndarray, list, mindpandas.DataFrame]) - The object to send.

Raises
  • TypeError - If the type of the obj is invalid.

  • ValueError - If the length of the obj is not a positive integer or cannot be evenly divided by the number of shards.

property num_shards

Returns the num_shards of current channel.

property full_batch

Returns the value of full_batch.

get_queue(shard_id=None)

Returns the object references that haven’t been consumed in the shard specified by shard_id.

Parameters
  • shard_id (int, optional) - The id of the requested shard. By default the value is None and it will return all shards.

Returns

List,stores references of the data in the shard.

class mindpandas.channel.DataReceiver(address, namespace='default', shard_id=0, dataset_name='dataset')

The receiver (output side) of the channel. It can be used for receiving new object from the channel.

Parameters
  • address (str) - The ip address of the node current receiver runs on.

  • namespace (str, optional) - he namespace that the channel belongs to. By default the value is 'default' and the receiver will be running in namespace default. DataSender and DataReceiver in different namespaces cannot connect to each other.

  • shard_id (int, optional) - Specifies the shard of data that is received by current receiver. By default the value is 0 and the receiver will get data from the shard with id 0.

  • dataset_name (str, optional) - The name of the dataset. By default the value is 'dataset'.

Note

Distributed executor has to be started and a DataSender has to be initialized in advance. To pair with the correct DataSender, the namespace and dataset_name have to be identical to the DataSender.

recv()

Get data from the channel.

Returns

object,the least recent object in the shard that haven’t been consumed.

Raises
  • ValueError - When the shard_id of current receiver is invalid.

property shard_id

Returns the shard_id of current receiver.

property num_shards

Returns the num_shards of current channel.