mindspore.mindrecord

Introduction to mindrecord:

MindRecord is a module for reading, writing, searching and converting datasets in the MindSpore format. Users can load MindRecord data with FileReader and write or modify it with FileWriter. Users can also convert datasets in other formats to MindRecord data with the corresponding sub-module.

class mindspore.mindrecord.FileWriter(file_name, shard_num=1)[source]

Class to write user defined raw data into MindRecord File series.

Parameters
  • file_name (str) – File name of MindRecord File.

  • shard_num (int, optional) – Number of MindRecord files (default=1). It must be in the range [1, 1000].

Raises

ParamValueError – If file_name or shard_num is invalid.

add_index(index_fields)[source]

Select index fields from schema to accelerate reading.

Parameters

index_fields (list[str]) – Fields to use as indexes. Each field must be of a primitive type.

Returns

MSRStatus, SUCCESS or FAILED.

Raises
  • ParamTypeError – If index field is invalid.

  • MRMDefineIndexError – If index field is not primitive type.

  • MRMAddIndexError – If failed to add index field.

  • MRMGetMetaError – If the schema is not set or get meta failed.

add_schema(content, desc=None)[source]

Add a schema and return its id. Raise an exception if the schema cannot be added.

Parameters
  • content (dict) – Dict of user defined schema.

  • desc (str, optional) – String of schema description (default=None).

Returns

int, schema id.

Raises
  • MRMInvalidSchemaError – If schema is invalid.

  • MRMBuildSchemaError – If failed to build schema.

  • MRMAddSchemaError – If failed to add schema.

commit()[source]

Flush data to disk and generate the corresponding db files.

Returns

MSRStatus, SUCCESS or FAILED.

Raises
  • MRMOpenError – If failed to open MindRecord File.

  • MRMSetHeaderError – If failed to set header.

  • MRMIndexGeneratorError – If failed to create index generator.

  • MRMGenerateIndexError – If failed to write to database.

  • MRMCommitError – If failed to flush data to disk.

open_and_set_header()[source]

Open the writer and set the header.

classmethod open_for_append(file_name)[source]

Open MindRecord file and get ready to append data.

Parameters

file_name (str) – String of MindRecord file name.

Returns

Instance of FileWriter.

Raises
  • ParamValueError – If file_name is invalid.

  • FileNameError – If path contains invalid character.

  • MRMOpenError – If failed to open MindRecord File.

  • MRMOpenForAppendError – If failed to open file for appending data.

set_header_size(header_size)[source]

Set the size of header.

Parameters

header_size (int) – Size of header, between 16KB and 128MB.

Returns

MSRStatus, SUCCESS or FAILED.

Raises

MRMInvalidHeaderSizeError – If failed to set header size.

set_page_size(page_size)[source]

Set the size of Page.

Parameters

page_size (int) – Size of page, between 32KB and 256MB.

Returns

MSRStatus, SUCCESS or FAILED.

Raises

MRMInvalidPageSizeError – If failed to set page size.
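The documented limits for set_header_size and set_page_size can be checked before the writer is called. The following is a minimal sketch, assuming that sizes are given in bytes, that 1KB means 1024 bytes, and that both bounds are inclusive. None of these details are stated above, and the helper names are hypothetical.

```python
KB = 1024
MB = 1024 * KB

# Documented ranges. Inclusive bounds and 1024-byte kilobytes are assumptions.
HEADER_SIZE_RANGE = (16 * KB, 128 * MB)
PAGE_SIZE_RANGE = (32 * KB, 256 * MB)


def is_valid_size(size, size_range):
    """Return True if size is an int inside the inclusive range."""
    low, high = size_range
    if isinstance(size, bool) or not isinstance(size, int):
        return False
    return low <= size <= high


def is_valid_header_size(header_size):
    """Check a candidate value for set_header_size (hypothetical helper)."""
    return is_valid_size(header_size, HEADER_SIZE_RANGE)


def is_valid_page_size(page_size):
    """Check a candidate value for set_page_size (hypothetical helper)."""
    return is_valid_size(page_size, PAGE_SIZE_RANGE)
```

A value that passes these checks may still be rejected by the library if its actual rules differ from the assumptions listed here.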

write_raw_data(raw_data, parallel_writer=False)[source]

Write raw data and generate the sequential pair of MindRecord files. By default, the data is validated against the predefined schema.

Parameters
  • raw_data (list[dict]) – List of raw data.

  • parallel_writer (bool, optional) – Write data in parallel if True (default=False).

Raises
  • ParamTypeError – If index field is invalid.

  • MRMOpenError – If failed to open MindRecord File.

  • MRMValidateDataError – If data does not match blob fields.

  • MRMSetHeaderError – If failed to set header.

  • MRMWriteDatasetError – If failed to write dataset.
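The FileWriter methods above are typically called in a fixed order: the schema is added first (add_index raises MRMGetMetaError if no schema is set), then the index fields, then the raw data, and finally commit. The sketch below illustrates that order. The helper write_dataset is hypothetical and works on any object that offers these four methods. The schema layout (field name mapped to a dict with a "type" key) and the sample records are assumptions, since the schema format is not described in this section.

```python
def write_dataset(writer, schema, records, index_fields=None, desc=None):
    """Drive a FileWriter-like object through one complete write.

    `writer` must offer add_schema, add_index, write_raw_data and commit
    as documented above. This helper itself is hypothetical.
    """
    schema_id = writer.add_schema(schema, desc)  # schema must come first
    if index_fields:
        writer.add_index(index_fields)           # optional index fields
    writer.write_raw_data(records)               # records is a list[dict]
    writer.commit()                              # flush to disk
    return schema_id


# Assumed schema layout: field name -> {"type": ...}. Illustrative only.
EXAMPLE_SCHEMA = {
    "file_name": {"type": "string"},
    "label": {"type": "int32"},
}

# Made-up records whose keys match the schema fields.
EXAMPLE_RECORDS = [
    {"file_name": "001.jpg", "label": 3},
    {"file_name": "002.jpg", "label": 7},
]


def write_example(file_name):
    """Write EXAMPLE_RECORDS with the real FileWriter (requires MindSpore)."""
    from mindspore.mindrecord import FileWriter  # imported lazily

    writer = FileWriter(file_name, shard_num=1)
    return write_dataset(writer, EXAMPLE_SCHEMA, EXAMPLE_RECORDS,
                         index_fields=["label"], desc="example schema")
```

Keeping the call order in a helper that accepts any writer-like object makes the sequence easy to exercise with a stub, without MindSpore installed.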

class mindspore.mindrecord.FileReader(file_name, num_consumer=4, columns=None, operator=None)[source]

Class to read MindRecord File series.

Parameters
  • file_name (str, list[str]) – Name of one MindRecord file, or a list of file names.

  • num_consumer (int, optional) – Number of consumer threads which load data into memory (default=4). It must be at least 1 and no larger than the number of CPU cores.

  • columns (list[str], optional) – List of fields whose data will be read (default=None).

  • operator (int, optional) – Reserved parameter for operators (default=None).

Raises

ParamValueError – If file_name, num_consumer or columns is invalid.

close()[source]

Stop reader worker and close File.

finish()[source]

Stop reader worker.

Raises

MRMFinishError – If failed to finish worker threads.

get_next()[source]

Yield a batch of data according to columns at a time.

Yields

dict – the keys are the same as columns.

Raises

MRMUnsupportedSchemaError – If schema is invalid.
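Since get_next yields one dict at a time, a reader is usually consumed in a loop and closed afterwards. The following is a minimal sketch of that pattern. The helper names read_all and read_file are hypothetical, and read_all accepts any object with get_next and close methods, so it can be tried without MindSpore installed.

```python
def read_all(reader):
    """Collect every item yielded by reader.get_next(), then close the reader."""
    items = []
    try:
        for item in reader.get_next():
            items.append(item)
    finally:
        # Close the reader even if iteration raised an exception.
        reader.close()
    return items


def read_file(file_name, columns=None):
    """Read a MindRecord file with the real FileReader (requires MindSpore)."""
    from mindspore.mindrecord import FileReader  # imported lazily

    reader = FileReader(file_name, num_consumer=4, columns=columns)
    return read_all(reader)
```

For large datasets, iterating over get_next directly instead of collecting everything into a list avoids holding all records in memory.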

class mindspore.mindrecord.MindPage(file_name, num_consumer=4)[source]

Class to read MindRecord File series in pagination.

Parameters
  • file_name (str) – Name of one MindRecord file, or a list of file names.

  • num_consumer (int, optional) – Number of consumer threads which load data into memory (default=4). It must be at least 1 and no larger than the number of CPU cores.

Raises
  • ParamValueError – If file_name, num_consumer or columns is invalid.

  • MRMInitSegmentError – If failed to initialize ShardSegment.

property candidate_fields

Return candidate category fields.

Returns

list[str], by which data could be grouped.

property category_field

Getter function for the category field.

get_category_fields()[source]

Return candidate category fields.

read_at_page_by_id(category_id, page, num_row)[source]

Query by category id in pagination.

Parameters
  • category_id (int) – Category id, as returned by read_category_info.

  • page (int) – Index of page.

  • num_row (int) – Number of rows in a page.

Returns

list[dict], the rows of the requested page.

Raises
  • ParamValueError – If any parameter is invalid.

  • MRMFetchDataError – If failed to fetch data by category.

  • MRMUnsupportedSchemaError – If schema is invalid.

read_at_page_by_name(category_name, page, num_row)[source]

Query by category name in pagination.

Parameters
  • category_name (str) – Value of the category field, as returned by read_category_info.

  • page (int) – Index of page.

  • num_row (int) – Number of rows in a page.

Returns

str, read at page.

read_category_info()[source]

Return category information when data is grouped by indicated category field.

Returns

str, description of group information.

Raises

MRMReadCategoryInfoError – If failed to read category information.

set_category_field(category_field)[source]

Set category field for reading.

Note

The field must be one of the candidate category fields.

Parameters

category_field (str) – String of category field name.

Returns

MSRStatus, SUCCESS or FAILED.
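Paginated reading with MindPage combines set_category_field with repeated calls to read_at_page_by_id. The sketch below walks through all pages of one category. It assumes that page indexes start at 0 and that a page past the end comes back empty. Neither detail is stated above, so both should be verified against the library. The helper read_category is hypothetical and accepts any object with the two methods it calls.

```python
def read_category(pager, category_field, category_id, num_row):
    """Return all rows of one category, fetched page by page.

    Assumptions: the first page has index 0, and an empty result
    marks the end of the category.
    """
    pager.set_category_field(category_field)
    rows = []
    page = 0
    while True:
        batch = pager.read_at_page_by_id(category_id, page, num_row)
        if not batch:
            break  # no more rows in this category
        rows.extend(batch)
        page += 1
    return rows


def read_category_from_file(file_name, category_field, category_id, num_row=10):
    """Run read_category on the real MindPage (requires MindSpore)."""
    from mindspore.mindrecord import MindPage  # imported lazily

    return read_category(MindPage(file_name), category_field, category_id, num_row)
```

If the library raises an error for an out-of-range page instead of returning an empty result, the loop would need to stop based on the row counts reported by read_category_info.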

class mindspore.mindrecord.Cifar10ToMR(source, destination)[source]

Class to transform the cifar10 dataset into MindRecord.

Parameters
  • source (str) – The cifar10 directory to be transformed.

  • destination (str) – The MindRecord file path to transform into.

Raises

ValueError – If source or destination is invalid.

transform(fields=None)[source]

Executes transformation from cifar10 to MindRecord.

Parameters

fields (list[str], optional) – List of index fields, e.g. ["label"] (default=None).

Returns

SUCCESS/FAILED, whether successfully written into MindRecord.

class mindspore.mindrecord.Cifar100ToMR(source, destination)[source]

Class to transform the cifar100 dataset into MindRecord.

Parameters
  • source (str) – The cifar100 directory to be transformed.

  • destination (str) – The MindRecord file path to transform into.

Raises

ValueError – If source or destination is invalid.

transform(fields=None)[source]

Executes transformation from cifar100 to MindRecord.

Parameters

fields (list[str], optional) – List of index fields, e.g. ["fine_label", "coarse_label"] (default=None).

Returns

SUCCESS/FAILED, whether successfully written into MindRecord.

class mindspore.mindrecord.ImageNetToMR(map_file, image_dir, destination, partition_number=1)[source]

Class to transform the imagenet dataset into MindRecord.

Parameters
  • map_file (str) –

    The map file which indicates the labels. Its content should look like this:

    n02119789 1 pen
    n02100735 2 notebook
    n02110185 3 mouse
    n02096294 4 orange
    

  • image_dir (str) – Image directory which contains the n02119789, n02100735, n02110185 and n02096294 directories.

  • destination (str) – The MindRecord file path to transform into.

  • partition_number (int, optional) – Partition size (default=1).

Raises

ValueError – If map_file, image_dir or destination is invalid.

transform()[source]

Executes transformation from imagenet to MindRecord.

Returns

SUCCESS/FAILED, whether successfully written into MindRecord.
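The map_file layout shown for ImageNetToMR can be checked before conversion with a small parser. This is a sketch only: it assumes each line holds three whitespace-separated fields (directory name, integer label, human-readable name), which matches the sample above but is not otherwise specified. The function name parse_map_file is hypothetical and is not part of mindspore.mindrecord.

```python
def parse_map_file(text):
    """Parse map-file text into {directory_name: (label, name)}.

    Assumes three whitespace-separated fields per line, as in the
    sample shown in the map_file parameter description.
    """
    mapping = {}
    for line_number, line in enumerate(text.splitlines(), start=1):
        parts = line.split(None, 2)  # at most three fields
        if not parts:
            continue  # skip blank lines
        if len(parts) != 3:
            raise ValueError(
                f"line {line_number}: expected 3 fields, got {len(parts)}")
        directory, label, name = parts
        mapping[directory] = (int(label), name.strip())
    return mapping


# The sample content from the parameter description.
SAMPLE_MAP = """\
n02119789 1 pen
n02100735 2 notebook
n02110185 3 mouse
n02096294 4 orange
"""

LABELS = parse_map_file(SAMPLE_MAP)
```

The keys of the resulting dict can then be compared with the sub-directory names in image_dir to catch missing or misspelled entries early.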

class mindspore.mindrecord.MnistToMR(source, destination, partition_number=1)[source]

Class to transform the Mnist dataset into MindRecord.

Parameters
  • source (str) – Directory which contains t10k-images-idx3-ubyte.gz, train-images-idx3-ubyte.gz, t10k-labels-idx1-ubyte.gz and train-labels-idx1-ubyte.gz.

  • destination (str) – The MindRecord file directory to transform into.

  • partition_number (int, optional) – Partition size (default=1).

Raises

ValueError – If source/destination/partition_number is invalid.

transform()[source]

Executes transformation from Mnist to MindRecord.

Returns

SUCCESS/FAILED, whether successfully written into MindRecord.