mindspore.dataset.Dataset.save

Dataset.save(file_name, num_files=1, file_type='mindrecord')

Save the data processed by the dataset pipeline in a common dataset format. Currently 'mindrecord' is the only supported format. Use the mindspore.dataset.MindDataset API to read the saved file(s).

Data types are cast implicitly when data is saved as 'mindrecord'. The following table shows how each type is converted.

Implicit Type Casting when Saving as mindrecord

Type in dataset    Type in mindrecord    Details
---------------    ------------------    -------
bool               int32                 Converted to int32
int8               int32
uint8              int32
int16              int32
uint16             int32
int32              int32
uint32             int64
int64              int64
uint64             int64                 Sign may be reversed for values outside the int64 range
float16            float32
float32            float32
float64            float64
string             string                Multi-dimensional string not supported
bytes              bytes                 Multi-dimensional bytes not supported
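The casting rules can be expressed as a plain lookup. The following is a minimal sketch in pure Python, not part of the MindSpore API: the names MINDRECORD_CAST, stored_dtype and uint64_as_int64 are hypothetical. The second helper assumes that the remark on the uint64 row refers to ordinary two's-complement wraparound, which is an interpretation rather than something the table states.

```python
# Mapping transcribed from the casting table: dataset type -> mindrecord type.
MINDRECORD_CAST = {
    "bool": "int32", "int8": "int32", "uint8": "int32",
    "int16": "int32", "uint16": "int32", "int32": "int32",
    "uint32": "int64", "int64": "int64", "uint64": "int64",
    "float16": "float32", "float32": "float32", "float64": "float64",
    "string": "string", "bytes": "bytes",
}


def stored_dtype(dataset_dtype: str) -> str:
    """Hypothetical helper: the type a column is expected to have after saving."""
    return MINDRECORD_CAST[dataset_dtype]


def uint64_as_int64(value: int) -> int:
    """Reinterpret an unsigned 64-bit value as a signed 64-bit value.

    Assumption: this two's-complement wraparound is what the remark on the
    uint64 row describes.
    """
    if not 0 <= value < 2**64:
        raise ValueError("value does not fit in uint64")
    # Values at or above 2**63 do not fit in int64 and wrap to negatives.
    return value - 2**64 if value >= 2**63 else value
```

Under that assumption, uint64 values below 2**63 are stored unchanged, while larger values come back negative, so columns holding such values may need to be rescaled or split before saving.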

Note

  1. To save the samples in order, set the dataset's shuffle to False and num_files to 1.

  2. Before calling this function, do not use the batch operation, the repeat operation, or map operations that apply random data augmentation.

  3. When array dimensions are variable, one-dimensional arrays and multi-dimensional arrays whose variable dimension is dimension 0 are supported.

  4. MindRecord does not support multi-dimensional string or multi-dimensional bytes.
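The variable-shape rule in item 3 can be checked before saving. The sketch below is a hypothetical pre-check in pure Python, not a MindSpore function. It assumes the rule means that all samples of a column share the same rank and differ, if at all, only in the size of dimension 0.

```python
def varies_only_in_dim0(shapes):
    """Return True if the shapes agree in rank and in every dimension but 0.

    `shapes` is an iterable of per-sample shapes for one column,
    for example [(2, 4), (7, 4)].
    """
    shapes = [tuple(s) for s in shapes]
    if not shapes:
        return True
    first = shapes[0]
    # Compare rank and all trailing dimensions; dimension 0 is free to vary.
    return all(len(s) == len(first) and s[1:] == first[1:] for s in shapes)
```

For example, shapes (3,) and (5,) pass, as do (2, 4) and (7, 4), while (2, 4) and (2, 5) do not because they differ in dimension 1.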

Parameters
  • file_name (str) – Path to dataset file.

  • num_files (int, optional) – Number of dataset files. Default: 1.

  • file_type (str, optional) – Dataset format. Default: 'mindrecord'.

Examples

>>> import mindspore.dataset as ds
>>> import numpy as np
>>>
>>> def generator_1d():
...     for i in range(10):
...         yield (np.array([i]),)
>>>
>>> # Build the pipeline without shuffling, then save it as a MindRecord file
>>> d1 = ds.GeneratorDataset(generator_1d, ["data"], shuffle=False)
>>> d1.save('/path/to/save_file')
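The saved file can be read back with mindspore.dataset.MindDataset. The following is a minimal round-trip sketch, assuming MindSpore and NumPy are installed. The function name save_and_reload, the file name demo.mindrecord and the temporary directory are hypothetical choices made for illustration.

```python
import os
import tempfile


def save_and_reload():
    """Save ten one-element samples as MindRecord, then read them back."""
    # Imported inside the function so the sketch can be loaded without MindSpore.
    import numpy as np
    import mindspore.dataset as ds

    def generator_1d():
        for i in range(10):
            yield (np.array([i]),)

    # Hypothetical output location inside a fresh temporary directory.
    path = os.path.join(tempfile.mkdtemp(), "demo.mindrecord")

    # shuffle=False and the default num_files=1 follow the ordering note.
    d1 = ds.GeneratorDataset(generator_1d, ["data"], shuffle=False)
    d1.save(path)

    # Read the saved file back without shuffling.
    d2 = ds.MindDataset(dataset_files=path, shuffle=False)
    return [int(row["data"][0])
            for row in d2.create_dict_iterator(num_epochs=1, output_numpy=True)]
```

The function returns the values read from the "data" column as a Python list, which makes it easy to compare them with what the generator produced.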