mindspore.dataset.dataloader.DataLoader

class mindspore.dataset.dataloader.DataLoader(dataset, batch_size=1, shuffle=None, sampler=None, batch_sampler=None, num_workers=0, collate_fn=None, pin_memory=False, drop_last=False, timeout=0.0, worker_init_fn=None, multiprocessing_context=None, generator=None, *, prefetch_factor=None, persistent_workers=False, in_order=True)

The data loader provides an iterator over the given dataset.

It supports map-style and iterable-style datasets, with single-process or multi-process loading.

Parameters:

dataset (Dataset) – The dataset to load data from.
batch_size (Union[int, None], optional) – The number of samples per mini-batch. If None, samples are not batched. Default: 1.
shuffle (Union[bool, None], optional) – Whether to shuffle the dataset. Default: None, no shuffling.
sampler (Union[Sampler, Iterable, None], optional) – The sampler to use. Default: None, use SequentialSampler when shuffling is disabled, otherwise RandomSampler.
batch_sampler (Union[Sampler[List], Iterable[List], None], optional) – The batch sampler to use. Default: None, an internal BatchSampler is created if batch_size is not None.
num_workers (int, optional) – The number of worker processes used for loading. Default: 0, load in the main process.
collate_fn (Union[_CollateFnType, None], optional) – The collate function to use. Default: None, use the default collate function.
pin_memory (bool, optional) – Whether to copy data into pinned memory. Default: False.
drop_last (bool, optional) – Whether to drop the last incomplete batch. Default: False.
timeout (float, optional) – The timeout for waiting for a worker to process the data. Default: 0.0, wait forever.
worker_init_fn (Union[Callable[[int], None], None], optional) – The worker init function to use. Default: None, do nothing.
multiprocessing_context (Union[multiprocessing.context.BaseContext, str, None], optional) – The multiprocessing context to use. Default: None, use mindspore.multiprocessing.
generator (Union[numpy.random.Generator, None], optional) – The generator to use. Default: None, use the default generator.

Keyword Arguments:

prefetch_factor (Union[int, None], optional) – The prefetch factor. Default: None, use 2 when num_workers is greater than 0.
persistent_workers (bool, optional) – Whether to keep the workers alive after an iteration finishes. Default: False.
in_order (bool, optional) – Whether to preserve the order of the data in multi-process loading. Default: True.
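
As a rough illustration of how batch_size and drop_last interact, the following pure-Python sketch groups sampled indices into batches. The helper batch_indices is hypothetical and is not part of MindSpore. It only assumes the behavior described above: the final batch may be smaller than batch_size, and it is discarded when drop_last is True.

```python
from typing import Iterable, Iterator, List


def batch_indices(sampler: Iterable[int], batch_size: int,
                  drop_last: bool) -> Iterator[List[int]]:
    """Hypothetical helper (not a MindSpore API): group indices into batches."""
    batch: List[int] = []
    for index in sampler:
        batch.append(index)
        if len(batch) == batch_size:
            yield batch
            batch = []
    # A trailing partial batch is kept unless drop_last is set.
    if batch and not drop_last:
        yield batch


data = list(range(10, 15))   # five samples: 10, 11, 12, 13, 14
sampler = range(len(data))   # sequential order, as when shuffling is off

kept = [[data[i] for i in idx]
        for idx in batch_indices(sampler, 2, drop_last=False)]
dropped = [[data[i] for i in idx]
           for idx in batch_indices(sampler, 2, drop_last=True)]
print(kept)     # [[10, 11], [12, 13], [14]]
print(dropped)  # [[10, 11], [12, 13]]
```

With five samples and a batch size of 2, the sketch yields three batches by default and two when the incomplete batch is dropped.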

Examples

>>> from mindspore.dataset.dataloader import DataLoader, Dataset, IterableDataset
>>>
>>> # 1. Load from a map-style dataset
>>> class MapStyleDataset(Dataset):
...     def __init__(self, data):
...         self.data = data
...
...     def __getitem__(self, index):
...         return self.data[index]
...
...     def __len__(self):
...         return len(self.data)
>>>
>>> dataset = MapStyleDataset(range(2))
>>> dataloader = DataLoader(dataset)
>>> print(list(dataloader))
[Tensor(shape=[1], dtype=Int64, value= [0]), Tensor(shape=[1], dtype=Int64, value= [1])]
>>>
>>> # 2. Load from an iterable-style dataset
>>> class IterableStyleDataset(IterableDataset):
...     def __init__(self, num_samples):
...         self.start = 0
...         self.end = num_samples
...
...     def __iter__(self):
...         return iter(range(self.start, self.end))
>>>
>>> dataset = IterableStyleDataset(2)
>>> dataloader = DataLoader(dataset)
>>> print(list(dataloader))
[Tensor(shape=[1], dtype=Int64, value= [0]), Tensor(shape=[1], dtype=Int64, value= [1])]