mindspore.dataset.Dataset.concat

Dataset.concat(datasets)

Concatenate the dataset objects in the input list. Applying the “+” operator to dataset objects has the same effect.

A dataset built by concatenating several dataset objects returns its data in the order in which those datasets were passed in. To change the data order (for example, to select randomly from each dataset instead of reading them in sequence), call the use_sampler method on the concatenated dataset object. use_sampler currently supports dataset.DistributedSampler for sharded selection from each dataset and dataset.RandomSampler for random selection from each dataset; see the examples below.
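The ordering described above can be pictured with a small plain-Python model. This is only a sketch and does not use MindSpore: concat_in_order and select_shard are hypothetical helpers, and the round-robin sharding rule is an assumption inferred from the DistributedSampler example output on this page (shard 1 of 2 yields 2, 4, 6), not a statement about the library's implementation.

```python
from itertools import chain


def concat_in_order(*datasets):
    """Hypothetical model of concatenation: rows come out in the order
    in which the datasets were passed in."""
    return list(chain.from_iterable(datasets))


def select_shard(rows, num_shards, shard_id):
    """Assumed round-robin sharding: keep every num_shards-th row,
    starting at index shard_id."""
    return rows[shard_id::num_shards]


rows = concat_in_order([1, 2, 3], [4, 5, 6])
print(rows)                                          # [1, 2, 3, 4, 5, 6]
print(select_shard(rows, num_shards=2, shard_id=1))  # [2, 4, 6]
```

Under the same assumption, shard 0 would receive the remaining rows, so the two shards together cover the concatenated data exactly once.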

Note

The column names, and the rank and type of the column data, must be the same across all input datasets.

Parameters

datasets (Union[list, Dataset]) – A list of datasets, or a single Dataset object, to be concatenated with this dataset.

Returns

Dataset, a new dataset with the concatenation applied.

Examples

>>> import mindspore.dataset as ds
>>> dataset_1 = ds.GeneratorDataset([1, 2, 3], "column1", shuffle=False)
>>> dataset_2 = ds.GeneratorDataset([4, 5, 6], "column1", shuffle=False)
>>>
>>> # Create a dataset by concatenating dataset_1 and dataset_2 with the "+" operator
>>> dataset = dataset_1 + dataset_2
>>> # Create a dataset by concatenating dataset_1 and dataset_2 with the concat method
>>> dataset = dataset_1.concat(dataset_2)
>>>
>>> # Check the data order of the concatenated dataset
>>> dataset_1 = ds.GeneratorDataset([1, 2, 3], "column1", shuffle=False)
>>> dataset_2 = ds.GeneratorDataset([4, 5, 6], "column1", shuffle=False)
>>> dataset = dataset_1 + dataset_2
>>> result = list(dataset)
>>> # [[Tensor(shape=[], dtype=Int64, value= 1)], [Tensor(shape=[], dtype=Int64, value= 2)],
>>> #  [Tensor(shape=[], dtype=Int64, value= 3)], [Tensor(shape=[], dtype=Int64, value= 4)],
>>> #  [Tensor(shape=[], dtype=Int64, value= 5)], [Tensor(shape=[], dtype=Int64, value= 6)]]
>>>
>>> # Change the data order of the concatenated dataset with sharded selection
>>> dataset_1 = ds.GeneratorDataset([1, 2, 3], "column1", shuffle=False)
>>> dataset_2 = ds.GeneratorDataset([4, 5, 6], "column1", shuffle=False)
>>> dataset = dataset_1.concat(dataset_2)
>>> dataset.use_sampler(ds.DistributedSampler(num_shards=2, shard_id=1, shuffle=False))
>>> result = list(dataset)
>>> # [[Tensor(shape=[], dtype=Int64, value= 2)], [Tensor(shape=[], dtype=Int64, value= 4)],
>>> #  [Tensor(shape=[], dtype=Int64, value= 6)]]
>>>
>>> # Change the data order of the concatenated dataset with random selection
>>> dataset_1 = ds.GeneratorDataset([1, 2, 3], "column1", shuffle=False)
>>> dataset_2 = ds.GeneratorDataset([4, 5, 6], "column1", shuffle=False)
>>> dataset = dataset_1.concat(dataset_2)
>>> dataset.use_sampler(ds.RandomSampler())
>>> result = list(dataset)
>>> # [[Tensor(shape=[], dtype=Int64, value= 1)], [Tensor(shape=[], dtype=Int64, value= 4)],
>>> #  [Tensor(shape=[], dtype=Int64, value= 2)], [Tensor(shape=[], dtype=Int64, value= 5)],
>>> #  [Tensor(shape=[], dtype=Int64, value= 6)], [Tensor(shape=[], dtype=Int64, value= 3)]]