mindpandas.channel
- class mindpandas.channel.DataSender(address, namespace='default', num_shards=1, dataset_name='dataset', full_batch=False, max_queue_size=10)
The sender (input side) of the channel. It can be used for sending new object through the channel.
- Parameters
address (str) - The ip address of the node current sender runs on.
namespace (str, optional) - The namespace that the channel belongs to. By default the value is default and the sender will be running in namespace default. DataSender and DataReceiver in different namespaces cannot connect to each other.
num_shards (int, optional) - Specifies how many shards the data will be divided into. By default the value is 1.
dataset_name (str, optional) - The name of the dataset. By default the value is dataset.
full_batch (bool, optional) - If true, each shard will get complete data sent by the sender. Otherwise each shard only gets part of the data. By default the value is False.
max_queue_size (int, optional) - The maximum number of data that can be cached in the queue. By default the value is 10.
- Raises
ValueError - When num_shards is an invalid value.
Note
Distributed executor has to be started in advance.
- send(obj)
Send object through the channel.
- Parameters
obj (Union[numpy.ndarray, list, mindpandas.DataFrame]) - The object to send.
- Raises
TypeError - If the type of the obj is invalid.
ValueError - If the length of the obj is not a positive integer or cannot be evenly divided by the number of shards.
- property num_shards
Returns the num_shards of current channel.
- property full_batch
Returns the value of full_batch.
- get_queue(shard_id=None)
Returns the object references that haven’t been consumed in the shard specified by shard_id.
- Parameters
shard_id (int, optional) - The id of the requested shard. By default the value is None and it will return all shards.
- Returns
List,stores references of the data in the shard.
- class mindpandas.channel.DataReceiver(address, namespace='default', shard_id=0, dataset_name='dataset')
The receiver (output side) of the channel. It can be used for receiving new object from the channel.
- Parameters
address (str) - The ip address of the node current receiver runs on.
namespace (str, optional) - he namespace that the channel belongs to. By default the value is default and the receiver will be running in namespace default. DataSender and DataReceiver in different namespaces cannot connect to each other.
shard_id (int, optional) - Specifies the shard of data that is received by current receiver. By default the value is 0 and the receiver will get data from the shard with id 0.
dataset_name (str, optional) - The name of the dataset. By default the value is dataset.
Note
Distributed executor has to be started and a DataSender has to be initialized in advance. To pair with the correct DataSender, the namespace and dataset_name have to be identical to the DataSender.
- recv()
Get data from the channel.
- Returns
object,the least recent object in the shard that haven’t been consumed.
- Raises
ValueError - When the shard_id of current receiver is invalid.
- property shard_id
Returns the shard_id of current receiver.
- property num_shards
Returns the num_shards of current channel.