mindspore.dataset.serialize

mindspore.dataset.serialize(dataset, json_filepath='')

Serialize dataset pipeline into a JSON file.

Note

Complete serialization of Python objects is not currently supported. Unsupported scenarios include data pipelines that use GeneratorDataset, and map / batch operations that contain custom Python functions. For Python objects, serialization does not capture the full object content, so deserializing the resulting JSON file may fail. For example, serializing a data pipeline that contains Python user-defined functions emits a warning, and the resulting JSON file cannot be deserialized into a usable data pipeline.

Parameters:
  • dataset (Dataset) – The starting node of the dataset pipeline to serialize.

  • json_filepath (str) – The path at which the serialized JSON file will be written. Default: ''.

Returns:

Dict, the dictionary containing the serialized dataset graph.

Raises:

OSError – If the file at json_filepath cannot be opened.
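One ordinary reason a file cannot be opened for writing is that its parent directory does not exist. The following is a minimal sketch of a guard that could be applied to the path before it is passed as json_filepath. The helper name prepare_json_path is hypothetical and not part of MindSpore, and the sketch makes no claim about how serialize itself handles missing directories.

```python
import os


def prepare_json_path(json_filepath: str) -> str:
    """Hypothetical helper: make sure the parent directory of the target exists.

    An empty string is returned unchanged so that the documented default
    (json_filepath='') can still be passed through.
    """
    if not json_filepath:
        return json_filepath
    path = os.path.abspath(json_filepath)
    # Create any missing parent directories; do nothing if they already exist.
    os.makedirs(os.path.dirname(path), exist_ok=True)
    return path
```

The returned path can then be used as the json_filepath argument, for example ds.serialize(dataset, json_filepath=prepare_json_path("/path/to/mnist_dataset_pipeline.json")). Other causes of OSError, such as missing write permission, are not addressed by this sketch.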

Examples

>>> import mindspore.dataset as ds
>>> import mindspore.dataset.transforms as transforms
>>>
>>> mnist_dataset_dir = "/path/to/mnist_dataset_directory"
>>> dataset = ds.MnistDataset(mnist_dataset_dir, num_samples=100)
>>> one_hot_encode = transforms.OneHot(10)  # 10 is the num_classes argument
>>> dataset = dataset.map(operations=one_hot_encode, input_columns="label")
>>> dataset = dataset.batch(batch_size=10, drop_remainder=True)
>>> # Serialize the pipeline to a JSON file
>>> serialized_data = ds.serialize(dataset, json_filepath="/path/to/mnist_dataset_pipeline.json")
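Because the output is an ordinary JSON file, it can be read back with the standard json module for inspection. The sketch below is illustrative only: load_serialized and rebuild_pipeline are hypothetical helper names, not MindSpore functions. rebuild_pipeline assumes mindspore.dataset.deserialize is available and imports MindSpore lazily, so the first helper works without it. As the note above states, a pipeline that contains custom Python functions may not deserialize into a usable pipeline.

```python
import json


def load_serialized(json_filepath):
    """Hypothetical helper: read the serialized pipeline file back as a dict."""
    with open(json_filepath, "r", encoding="utf-8") as f:
        return json.load(f)


def rebuild_pipeline(json_filepath):
    """Hypothetical helper: rebuild a dataset pipeline from the JSON file.

    MindSpore is imported here rather than at module level so that
    load_serialized can be used on machines without MindSpore installed.
    """
    import mindspore.dataset as ds

    return ds.deserialize(json_filepath=json_filepath)
```

For the example above, load_serialized("/path/to/mnist_dataset_pipeline.json") would return the file contents as a dictionary, and rebuild_pipeline with the same path would attempt to reconstruct the MNIST pipeline.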