mindspore.dataset.DistributedSampler
- class mindspore.dataset.DistributedSampler(num_shards, shard_id, shuffle=True, num_samples=None, offset=-1)
- A sampler that accesses one shard of the dataset. It divides the dataset into multiple subsets for distributed training.
- Parameters
- num_shards (int) – Number of shards to divide the dataset into. 
- shard_id (int) – Shard ID of the current shard, which should be within the range [0, num_shards-1]. 
- shuffle (bool, optional) – If True, the indices are shuffled; otherwise they are not shuffled (default=True). 
- num_samples (int, optional) – The number of samples to draw (default=None, which means sample all elements). 
- offset (int, optional) – The starting shard ID to which the elements of the dataset are sent, which should be no more than num_shards. This parameter is only valid when a ConcatDataset takes a DistributedSampler as its sampler. It affects the number of samples per shard (default=-1, which means each shard has the same number of samples). 
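As a rough illustration of how num_shards and shard_id partition a dataset, the sketch below assigns row indices to shards in a strided pattern and wraps around so that every shard gets the same number of samples. This is an assumption made for illustration, not the algorithm taken from the MindSpore source; `shard_indices`, `num_rows`, and `seed` are hypothetical names.

```python
import math
import random


def shard_indices(num_rows, num_shards, shard_id, shuffle=False, seed=0):
    """Hypothetical helper: row indices that one shard would read.

    Assumption: strided assignment with wrap-around padding, so each
    shard receives ceil(num_rows / num_shards) indices.
    """
    per_shard = math.ceil(num_rows / num_shards)
    order = list(range(num_rows))
    if shuffle:
        # A fixed seed keeps every shard working from the same permutation.
        random.Random(seed).shuffle(order)
    return [order[(i * num_shards + shard_id) % num_rows]
            for i in range(per_shard)]


# Ten rows split across four shards: three indices per shard.
shards = [shard_indices(10, 4, shard_id) for shard_id in range(4)]
```

With ten rows and four shards, the last two shards wrap around and reuse indices from the start of the dataset, which is one way to keep shard sizes equal.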
 
- Raises
- TypeError – If num_shards is not of type int. 
- TypeError – If shard_id is not of type int. 
- TypeError – If shuffle is not of type bool. 
- TypeError – If num_samples is not of type int. 
- TypeError – If offset is not of type int. 
- ValueError – If num_samples is a negative value. 
- RuntimeError – If num_shards is not a positive value. 
- RuntimeError – If shard_id is less than 0, or greater than or equal to num_shards. 
- RuntimeError – If offset is greater than num_shards. 
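The conditions listed above can be summarized as a plain-Python check. This is a minimal sketch that only mirrors the documented rules; `check_distributed_sampler_args` is a hypothetical helper, not part of MindSpore, and it says nothing about when the library itself raises each error. Rejecting bool for the int parameters is an assumption.

```python
def _require_int(name, value):
    # Assumption: bool is rejected even though it subclasses int in Python.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"{name} must be of type int, got {type(value).__name__}")


def check_distributed_sampler_args(num_shards, shard_id, shuffle=True,
                                   num_samples=None, offset=-1):
    """Hypothetical validator mirroring the documented Raises section."""
    _require_int("num_shards", num_shards)
    _require_int("shard_id", shard_id)
    if not isinstance(shuffle, bool):
        raise TypeError("shuffle must be of type bool")
    if num_samples is not None:
        _require_int("num_samples", num_samples)
        if num_samples < 0:
            raise ValueError("num_samples must not be negative")
    _require_int("offset", offset)
    if num_shards <= 0:
        raise RuntimeError("num_shards must be a positive value")
    if shard_id < 0 or shard_id >= num_shards:
        raise RuntimeError("shard_id must be within [0, num_shards-1]")
    if offset > num_shards:
        raise RuntimeError("offset must not be greater than num_shards")


# Valid arguments pass silently: 10 shards, this one is shard 5.
check_distributed_sampler_args(10, 5)
```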
 
 - Examples
>>> import mindspore.dataset as ds
>>> # Create a distributed sampler with 10 shards in total. This shard is shard 5.
>>> sampler = ds.DistributedSampler(10, 5)
>>> dataset = ds.ImageFolderDataset(image_folder_dataset_dir,
...                                 num_parallel_workers=8,
...                                 sampler=sampler)
 - add_child(sampler)
- Add a sub-sampler to the given sampler. The parent sampler receives all the samples output by the sub-sampler and applies its own sampling logic to them to return new samples.
- Parameters
- sampler (Sampler) – Object used to choose samples from the dataset. Only built-in samplers (DistributedSampler, PKSampler, RandomSampler, SequentialSampler, SubsetRandomSampler, WeightedRandomSampler) are supported. 
 - Examples
>>> import mindspore.dataset as ds
>>> sampler = ds.SequentialSampler(start_index=0, num_samples=3)
>>> sampler.add_child(ds.RandomSampler(num_samples=4))
>>> dataset = ds.Cifar10Dataset(cifar10_dataset_dir, sampler=sampler)
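One way to picture the parent/child relationship is with two plain-Python functions: the child picks indices from the dataset, and the parent then applies its own logic to the child's output. The sketch below is a conceptual model under that assumption, not MindSpore code; `random_child` and `sequential_parent` are hypothetical stand-ins for a RandomSampler child and a SequentialSampler parent.

```python
import random


def random_child(num_rows, num_samples, seed=0):
    """Hypothetical child: draw num_samples distinct row indices at random."""
    rng = random.Random(seed)
    return rng.sample(range(num_rows), num_samples)


def sequential_parent(child_output, start_index=0, num_samples=None):
    """Hypothetical parent: walk the child's output in order."""
    selected = child_output[start_index:]
    return selected if num_samples is None else selected[:num_samples]


# The child yields 4 random indices; the parent keeps the first 3 of them.
child_ids = random_child(num_rows=100, num_samples=4)
parent_ids = sequential_parent(child_ids, start_index=0, num_samples=3)
```

In this model the parent never sees the dataset directly, only what the child produced, which is why the final sample count cannot exceed the child's.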
 - get_child()
- Get the child sampler of the given sampler.
- Returns
- Sampler, the child sampler of the given sampler. 
 - Examples
>>> import mindspore.dataset as ds
>>> sampler = ds.SequentialSampler(start_index=0, num_samples=3)
>>> sampler.add_child(ds.RandomSampler(num_samples=2))
>>> child_sampler = sampler.get_child()
 - get_num_samples()
- All samplers can contain a numeric num_samples value (or it can be set to None). A child sampler can exist or be None. If a child sampler exists, then the child sampler count can be a numeric value or None. These conditions impact the resultant sampler count that is used. The following table shows the possible results from calling this function.

  child sampler   num_samples   child_samples   result
  T               x             y               min(x, y)
  T               x             None            x
  T               None          y               y
  T               None          None            None
  None            x             n/a             x
  None            None          n/a             None

- Returns
- int, the number of samples, or None. 
 - Examples
>>> import mindspore.dataset as ds
>>> sampler = ds.SequentialSampler(start_index=0, num_samples=3)
>>> num_samples = sampler.get_num_samples()
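The result table for get_num_samples() can be written as a small function. This is a sketch of the documented rules only; `resolve_num_samples` and its parameters are hypothetical names, and the real method reads these values from the sampler objects rather than taking them as arguments.

```python
def resolve_num_samples(num_samples, has_child=False, child_samples=None):
    """Hypothetical helper reproducing the get_num_samples() result table.

    num_samples   -- this sampler's own count, or None
    has_child     -- whether a child sampler exists (column "child sampler")
    child_samples -- the child's count, or None (ignored without a child)
    """
    if not has_child:
        return num_samples
    if num_samples is None:
        return child_samples
    if child_samples is None:
        return num_samples
    return min(num_samples, child_samples)


# Parent asks for 3 samples, child provides 4: the smaller value is used.
result = resolve_num_samples(3, has_child=True, child_samples=4)
```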