mindspore.dataset.dataloader.DataLoader
- class mindspore.dataset.dataloader.DataLoader(dataset, batch_size=1, shuffle=None, sampler=None, batch_sampler=None, num_workers=0, collate_fn=None, pin_memory=False, drop_last=False, timeout=0., worker_init_fn=None, multiprocessing_context=None, generator=None, *, prefetch_factor=None, persistent_workers=False, in_order=True)
The data loader provides an iterator over the given dataset.
It supports map-style and iterable-style datasets, with single-process or multi-process loading.
- Parameters
dataset (Dataset) – The dataset to load data from.
batch_size (Union[int, None], optional) – The number of samples per mini-batch. If None, the data is not batched. Default: 1.
shuffle (Union[bool, None], optional) – Whether to shuffle the dataset. Default: None, no shuffling.
sampler (Union[Sampler, Iterable, None], optional) – The sampler to use. Default: None, use SequentialSampler if shuffle is False, or use RandomSampler otherwise.
batch_sampler (Union[Sampler[List], Iterable[List], None], optional) – The batch sampler to use. Default: None, generate an internal BatchSampler if batch_size is not None.
num_workers (int, optional) – The number of workers for loading. Default: 0, load in the main process.
collate_fn (Union[_CollateFnType, None], optional) – The collate function to use. Default: None, use the default collate function.
pin_memory (bool, optional) – Whether to copy data into pinned memory. Default: False.
drop_last (bool, optional) – Whether to drop the last incomplete batch. Default: False.
timeout (float, optional) – The timeout for waiting for a worker to process the data. Default: 0., wait forever.
worker_init_fn (Union[Callable[[int], None], None], optional) – The worker init function to use. Default: None, do nothing.
multiprocessing_context (Union[multiprocessing.context.BaseContext, str, None], optional) – The multiprocessing context to use. Default: None, use mindspore.multiprocessing.
generator (Union[numpy.random.Generator, None], optional) – The generator to use. Default: None, use the default generator.
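The interaction between sampler, batch_size and drop_last can be illustrated with a small pure-Python sketch. The helper batch_indices below is hypothetical and is not part of MindSpore; it only mirrors the behaviour described above, where indices produced by a sampler are grouped into mini-batches and the last incomplete batch is either kept or dropped.

```python
from typing import Iterable, Iterator, List


def batch_indices(sampler: Iterable[int], batch_size: int, drop_last: bool) -> Iterator[List[int]]:
    """Hypothetical helper (not a MindSpore API): group sampler indices into mini-batches."""
    batch: List[int] = []
    for index in sampler:
        batch.append(index)
        if len(batch) == batch_size:
            yield batch
            batch = []
    # A trailing incomplete batch is kept unless drop_last is set.
    if batch and not drop_last:
        yield batch


# range(5) stands in for a sequential sampler over a 5-sample dataset.
print(list(batch_indices(range(5), batch_size=2, drop_last=False)))  # [[0, 1], [2, 3], [4]]
print(list(batch_indices(range(5), batch_size=2, drop_last=True)))   # [[0, 1], [2, 3]]
```

With drop_last=True every batch has exactly batch_size samples, at the cost of discarding the remainder.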
- Keyword Arguments
prefetch_factor (Union[int, None], optional) – The prefetch factor. Default: None, use 2 when num_workers is greater than 0.
persistent_workers (bool, optional) – Whether to keep the workers alive after iteration. Default: False.
in_order (bool, optional) – Whether to keep the order of the data in multi-process loading. Default: True.
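The default rule for prefetch_factor can be written out as a short sketch. The function resolve_prefetch_factor is a hypothetical name used only for illustration, and the behaviour for num_workers equal to 0 (leaving the value as None) is an assumption, since the description above only covers the case where num_workers is greater than 0.

```python
from typing import Optional


def resolve_prefetch_factor(prefetch_factor: Optional[int], num_workers: int) -> Optional[int]:
    """Hypothetical helper (not a MindSpore API): apply the documented default."""
    if prefetch_factor is not None:
        # An explicit value is used as given.
        return prefetch_factor
    # Documented default: 2 when num_workers > 0.
    # Assumption: with no workers there is nothing to prefetch, so None is kept.
    return 2 if num_workers > 0 else None


print(resolve_prefetch_factor(None, num_workers=4))  # 2
print(resolve_prefetch_factor(8, num_workers=4))     # 8
print(resolve_prefetch_factor(None, num_workers=0))  # None
```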
Examples
>>> from mindspore.dataset.dataloader import DataLoader, Dataset, IterableDataset
>>>
>>> # 1. Load from map style dataset
>>> class MapStyleDataset(Dataset):
...     def __init__(self, data):
...         self.data = data
...
...     def __getitem__(self, index):
...         return self.data[index]
...
...     def __len__(self):
...         return len(self.data)
>>>
>>> dataset = MapStyleDataset(range(2))
>>> dataloader = DataLoader(dataset)
>>> print(list(dataloader))
[Tensor(shape=[1], dtype=Int64, value= [0]), Tensor(shape=[1], dtype=Int64, value= [1])]
>>>
>>> # 2. Load from iterable style dataset
>>> class IterableStyleDataset(IterableDataset):
...     def __init__(self, num_samples):
...         self.start = 0
...         self.end = num_samples
...
...     def __iter__(self):
...         return iter(range(self.start, self.end))
>>>
>>> dataset = IterableStyleDataset(2)
>>> dataloader = DataLoader(dataset)
>>> print(list(dataloader))
[Tensor(shape=[1], dtype=Int64, value= [0]), Tensor(shape=[1], dtype=Int64, value= [1])]