mindspore.dataset.transforms.Unique

class mindspore.dataset.transforms.Unique

Perform the unique operation on the input tensor. Only one column can be transformed at a time.

Returns three tensors: the unique output tensor, the index tensor, and the count tensor.

  • The output tensor contains all the unique elements of the input tensor, in the order in which they first occur in the input tensor.

  • The index tensor contains, for each element of the input tensor, its index in the unique output tensor.

  • The count tensor contains, for each element of the output tensor, the number of times it occurs in the input tensor.
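The relationship between the three tensors can be illustrated with a small NumPy sketch. This is not MindSpore's implementation; the helper name `unique_first_occurrence` is hypothetical, and the flattening of the input is an assumption inferred from the shapes shown in the Examples section.

```python
import numpy as np

def unique_first_occurrence(values):
    """NumPy-only illustration of the three outputs (hypothetical helper, not MindSpore code)."""
    # Assumption: the input is treated as a flat sequence of elements.
    flat = np.asarray(values).ravel()
    # np.unique sorts its result; first_idx records where each value first appears in flat.
    sorted_vals, first_idx, inverse, counts = np.unique(
        flat, return_index=True, return_inverse=True, return_counts=True)
    inverse = inverse.ravel()
    # Reorder the sorted unique values into first-occurrence order.
    order = np.argsort(first_idx)
    # rank maps a position in the sorted result to its position in the reordered result.
    rank = np.empty_like(order)
    rank[order] = np.arange(order.size)
    return sorted_vals[order], rank[inverse], counts[order]

output, index, count = unique_first_occurrence([[0, 1, 2], [1, 2, 3]])
print(output.tolist())  # [0, 1, 2, 3]
print(index.tolist())   # [0, 1, 2, 1, 2, 3]
print(count.tolist())   # [1, 2, 2, 1]
```

Indexing the output tensor with the index tensor (`output[index]`) reproduces the flattened input, which is a convenient way to check that the three tensors are consistent.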

Note

Call the batch operation on the dataset before applying this transform.

Raises

RuntimeError – If the given Tensor has two columns.

Supported Platforms:

CPU

Examples

>>> import numpy as np
>>> import mindspore.dataset as ds
>>> import mindspore.dataset.transforms as transforms
>>>
>>> # Use the transform in dataset pipeline mode
>>> # Data before
>>> # |  x                 |
>>> # +--------------------+
>>> # | [[0,1,2], [1,2,3]] |
>>> # +--------------------+
>>> data = [[[0,1,2], [1,2,3]]]
>>> numpy_slices_dataset = ds.NumpySlicesDataset(data, ["x"])
>>> numpy_slices_dataset = numpy_slices_dataset.map(operations=transforms.Unique(),
...                                                 input_columns=["x"],
...                                                 output_columns=["x", "y", "z"])
>>> for item in numpy_slices_dataset.create_dict_iterator(num_epochs=1, output_numpy=True):
...     print(item["x"].shape, item["y"].shape, item["z"].shape)
...     print(item["x"].dtype, item["y"].dtype, item["z"].dtype)
(4,) (6,) (4,)
int64 int32 int32
>>> # Data after
>>> # |  x        |  y              |  z        |
>>> # +-----------+-----------------+-----------+
>>> # | [0,1,2,3] | [0,1,2,1,2,3]   | [1,2,2,1] |
>>> # +-----------+-----------------+-----------+
>>>
>>> # Use the transform in eager mode
>>> data = [[0, -1, -2, -1, 2], [2, -0, 2, 1, -3]]
>>> output = transforms.Unique()(data)
>>> print(output[0].shape, output[1].shape, output[2].shape)
(6,) (10,) (6,)
>>> print(output[0].dtype, output[1].dtype, output[2].dtype)
int64 int32 int32