mindspore.DatasetHelper

class mindspore.DatasetHelper(dataset, dataset_sink_mode=True, sink_size=-1, epoch_num=1)

DatasetHelper is a class that processes a MindData dataset and provides information about it.

It adapts how the dataset is iterated to the current context, so the same for loop can be used in every context.

Note

Each iteration over a DatasetHelper provides one epoch of data.

Parameters
  • dataset (Dataset) – The dataset iterator. The dataset can be generated by a dataset generator API in mindspore.dataset, such as mindspore.dataset.ImageFolderDataset.

  • dataset_sink_mode (bool) – If True, GetNext is used to fetch the data on the device through the dataset pipeline; otherwise, the data is fetched on the host by iterating through the dataset. Default: True.

  • sink_size (int) – Controls the amount of data in each sink. If sink_size=-1, the complete dataset is sunk for each epoch. If sink_size>0, sink_size data is sunk for each epoch. Default: -1.

  • epoch_num (int) – The number of passes of the entire dataset to be sent. Default: 1.

Examples

>>> import numpy as np
>>> from mindspore import DatasetHelper, nn
>>> from mindspore import dataset as ds
>>>
>>> data = {"x": np.float32(np.random.rand(64, 10)), "y": np.random.randint(0, 5, (64,))}
>>> train_dataset = ds.NumpySlicesDataset(data=data).batch(32)
>>> set_helper = DatasetHelper(train_dataset, dataset_sink_mode=False)
>>>
>>> net = nn.Dense(10, 5)
>>> # A DatasetHelper object is iterable
>>> for next_element in set_helper:
...     # `next_element` holds the data and the label; run the net on the data
...     data = next_element[0]
...     net(data)
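
Because each pass over a DatasetHelper provides one epoch of data, training for several epochs needs an outer loop around the iteration. The pure-Python sketch below illustrates that pattern only. `OneEpochIterable` and `run_epochs` are hypothetical stand-ins, not MindSpore APIs, and the batches are placeholder tuples rather than real tensors.

```python
class OneEpochIterable:
    """Hypothetical stand-in (not a MindSpore class): each pass yields one epoch."""

    def __init__(self, batches):
        self._batches = list(batches)

    def __iter__(self):
        # Return a fresh iterator per pass, so every `for` loop sees one full epoch.
        return iter(self._batches)


def run_epochs(helper, epoch_num, step_fn):
    """Call step_fn on every batch for epoch_num passes and return the step count."""
    steps = 0
    for _ in range(epoch_num):
        for batch in helper:
            step_fn(batch)
            steps += 1
    return steps


seen = []
helper = OneEpochIterable([("x0", "y0"), ("x1", "y1")])
total_steps = run_epochs(helper, epoch_num=3, step_fn=seen.append)
print(total_steps)
```

In real code, the stand-in would be replaced by a DatasetHelper instance and `step_fn` by a call to the network or training step.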

continue_send()

Continue sending data to the device at the beginning of an epoch.

dynamic_min_max_shapes()

Return the minimum and maximum data lengths of a dynamic source dataset.

Examples

>>> from mindspore import DatasetHelper
>>>
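>>> # create_custom_dataset() is a placeholder for user code that builds a dataset
>>> train_dataset = create_custom_dataset()
>>> # Configure the dynamic shape: None marks a dimension whose length varies
>>> train_dataset.set_dynamic_columns(columns={"data1": [16, None, 83], "data2": [None]})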
>>> dataset_helper = DatasetHelper(train_dataset, dataset_sink_mode=True)
>>>
>>> min_shapes, max_shapes = dataset_helper.dynamic_min_max_shapes()
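
To make the idea of minimum and maximum shapes concrete, here is a small pure-Python sketch. It assumes the bounds are taken per dimension over the shapes seen in the dataset. That is an assumption about the semantics, not a description of MindSpore's implementation. `min_max_shapes` is a hypothetical helper and the sample shapes are made up for illustration.

```python
def min_max_shapes(sample_shapes):
    """Return per-dimension (min_shape, max_shape) over shapes of equal rank."""
    shapes = [tuple(s) for s in sample_shapes]
    if not shapes:
        raise ValueError("at least one shape is required")
    rank = len(shapes[0])
    if any(len(s) != rank for s in shapes):
        raise ValueError("all shapes must have the same rank")
    # zip(*shapes) groups the sizes of each dimension across all samples.
    min_shape = [min(dims) for dims in zip(*shapes)]
    max_shape = [max(dims) for dims in zip(*shapes)]
    return min_shape, max_shape


# Illustrative shapes for a column declared as [16, None, 83]:
# only the middle dimension varies between samples.
observed = [(16, 5, 83), (16, 12, 83), (16, 8, 83)]
min_shape, max_shape = min_max_shapes(observed)
print(min_shape, max_shape)
```

Under this assumption, the fixed dimensions come out equal in both results, and only the dimension declared as None differs.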

get_data_info()

In sink mode, return the types and shapes of the current data. This is generally used in dynamic shape scenarios.

release()

Free up the resources used by the data sink.

sink_size()

Get the sink_size for each iteration.

Examples

>>> from mindspore import DatasetHelper
>>>
>>> train_dataset = create_custom_dataset()
>>> dataset_helper = DatasetHelper(train_dataset, dataset_sink_mode=True, sink_size=-1)
>>>
>>> # If sink_size == -1, the full size of the source dataset is returned.
>>> sink_size = dataset_helper.sink_size()
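
The rule stated for sink_size can be written as a few lines of plain Python. This is a minimal sketch based only on the parameter description and the comment above. `effective_sink_size` is a hypothetical helper, not a MindSpore function, and the value MindSpore actually returns may also depend on the device and context.

```python
def effective_sink_size(dataset_size, sink_size=-1):
    """Hypothetical helper: resolve sink_size as the text describes."""
    if sink_size == -1:
        # -1 means the complete dataset is sunk for each epoch.
        return dataset_size
    if sink_size > 0:
        # A positive value is used as given.
        return sink_size
    raise ValueError("sink_size must be -1 or a positive integer")


full = effective_sink_size(dataset_size=100)
partial = effective_sink_size(dataset_size=100, sink_size=25)
print(full, partial)
```

Rejecting 0 and other negative values is a choice made for this sketch, since the text only defines the -1 and positive cases.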

stop_send()

Stop sending data to the data sink.

types_shapes()

Get the types and shapes of the dataset under the current configuration.

Examples

>>> from mindspore import DatasetHelper
>>>
>>> train_dataset = create_custom_dataset()
>>> dataset_helper = DatasetHelper(train_dataset, dataset_sink_mode=True)
>>>
>>> types, shapes = dataset_helper.types_shapes()