mindspore.dataset.Dataset.shuffle

Dataset.shuffle(buffer_size)

Shuffle the dataset by creating a cache with the size of buffer_size.

  1. Make a shuffle buffer that contains the first buffer_size rows.

  2. Randomly select an element from the shuffle buffer to be the next row propagated to the child node.

  3. Get the next row (if any) from the parent node and put it in the shuffle buffer.

  4. Repeat steps 2 and 3 until there are no more rows left in the shuffle buffer.
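
The four steps above can be sketched in plain Python. This is only an illustration of the buffer logic, not MindSpore's actual implementation: the function name buffered_shuffle, its seed argument, and the use of Python's random module are assumptions made for this example.

```python
import random
from typing import Iterable, Iterator, List, Optional, TypeVar

T = TypeVar("T")


def buffered_shuffle(
    rows: Iterable[T], buffer_size: int, seed: Optional[int] = None
) -> Iterator[T]:
    """Hypothetical helper that mimics the shuffle-buffer steps described above."""
    if buffer_size <= 1:
        raise ValueError("buffer_size must be larger than 1")

    rng = random.Random(seed)  # assumption: a local RNG stands in for the global seed
    source = iter(rows)

    # Step 1: fill the buffer with the first buffer_size rows (or fewer if the
    # source is shorter than the buffer).
    buffer: List[T] = []
    for row in source:
        buffer.append(row)
        if len(buffer) == buffer_size:
            break

    # Step 4: repeat until the buffer is empty.
    while buffer:
        # Step 2: randomly select a row from the buffer to emit next.
        index = rng.randrange(len(buffer))
        picked = buffer[index]
        try:
            # Step 3: refill the freed slot with the next source row, if any.
            buffer[index] = next(source)
        except StopIteration:
            # Source exhausted: shrink the buffer by moving the last row
            # into the freed slot and dropping the tail.
            buffer[index] = buffer[-1]
            buffer.pop()
        yield picked


shuffled = list(buffered_shuffle(range(10), buffer_size=4, seed=58))
```

In this sketch a row can only be emitted once it has entered the buffer, so the row at output position i always comes from the first i + buffer_size input rows. That is why a small buffer only shuffles locally, while a buffer as large as the dataset can produce any ordering.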

A random seed for the first epoch can be provided via dataset.config.set_seed. In every subsequent epoch, the seed is changed to a new, randomly generated value.

Parameters

buffer_size (int) – The size of the buffer (must be larger than 1) for shuffling. Setting buffer_size equal to the number of rows in the entire dataset will result in a global shuffle.

Returns

Dataset, a new dataset with the above operation applied.

Raises

RuntimeError – If sync operations exist before the shuffle.

Examples

>>> import mindspore.dataset as ds
>>> dataset = ds.GeneratorDataset([i for i in range(10)], "column1")
>>>
>>> # Optionally set the seed for fixed randomness
>>> ds.config.set_seed(58)
>>>
>>> # Create a shuffled dataset using a shuffle buffer of size 4
>>> dataset = dataset.shuffle(4)