mindspore.mindrecord
Introduction of MindRecord.
MindRecord is a module for reading, writing, searching and converting datasets in the MindSpore format. Users can use the FileWriter API to generate MindRecord data and the MindDataset API to load it. Users can also convert datasets in other formats into MindRecord data through the corresponding submodules.
- class mindspore.mindrecord.Cifar100ToMR(source, destination)
- A class to transform CIFAR-100 data into MindRecord format.
- Note
- For examples, please refer to Converting the CIFAR-10 Dataset.
- Parameters
- Raises
- ValueError – If source or destination is invalid. 
 
- class mindspore.mindrecord.Cifar10ToMR(source, destination)
- A class to transform CIFAR-10 data into MindRecord format.
- Note
- For examples, please refer to Converting the CIFAR-10 Dataset.
- Parameters
- Raises
- ValueError – If source or destination is invalid. 
 
- class mindspore.mindrecord.CsvToMR(source, destination, columns_list=None, partition_number=1)
- A class to transform CSV data into MindRecord format.
- Note
- For examples, please refer to Converting CSV Dataset.
- Parameters
- Raises
- ValueError – If source, destination or partition_number is invalid.
- RuntimeError – If columns_list is invalid. 
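CsvToMR raises RuntimeError when columns_list is invalid. As a rough illustration, the sketch below assumes that columns_list=None selects every column in the CSV header and that any requested name missing from the header makes the list invalid. select_csv_columns is a hypothetical helper, not part of mindspore.mindrecord, and the real converter may apply further checks.

```python
import csv
import io


def select_csv_columns(csv_text, columns_list=None):
    """Hypothetical helper: return the CSV columns to convert.

    Assumptions: None selects every column in the header, and any
    requested name that is not in the header makes columns_list invalid.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)  # first row holds the column names
    if columns_list is None:
        return header
    missing = [name for name in columns_list if name not in header]
    if missing:
        raise RuntimeError("columns not found in CSV header: {}".format(missing))
    return list(columns_list)


sample = "id,name,score\n1,a,0.5\n2,b,0.7\n"
print(select_csv_columns(sample))                   # ['id', 'name', 'score']
print(select_csv_columns(sample, ["id", "score"]))  # ['id', 'score']
```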
 
 
- class mindspore.mindrecord.FileReader(file_name, num_consumer=4, columns=None, operator=None)
- Class to read MindRecord files.
- Note
- If file_name is a single file name string, the reader tries to load all MindRecord files generated in the same conversion, and raises an exception if any of them is missing. If file_name is a list of file names, only the MindRecord files in the list are loaded.
- Parameters
- file_name (str, list[str]) – One of the MindRecord files generated in a conversion, or a list of MindRecord files.
- num_consumer (int, optional) – Number of reader workers that load data. Default: 4. It must be at least 1 and no larger than the number of processor cores.
- columns (list[str], optional) – A list of fields whose data is to be read. Default: None.
- operator (int, optional) – Reserved parameter for operators. Default: None. 
 
- Raises
- ParamValueError – If file_name, num_consumer or columns is invalid. 
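The num_consumer constraint above can be checked before constructing a reader. This is a minimal sketch in plain Python: check_num_consumer is a hypothetical helper, and it raises ValueError as a stand-in for ParamValueError, which is not imported here. It also shows that the default of 4 can exceed the core count on a small machine, so capping the value with os.cpu_count() is a reasonable precaution.

```python
import os


def check_num_consumer(num_consumer):
    """Hypothetical pre-check for the documented num_consumer rule.

    Raises ValueError (a stand-in for ParamValueError) unless
    1 <= num_consumer <= number of processor cores.
    """
    cores = os.cpu_count() or 1  # os.cpu_count() can return None
    if isinstance(num_consumer, bool) or not isinstance(num_consumer, int):
        raise ValueError("num_consumer must be an int, got {!r}".format(num_consumer))
    if not 1 <= num_consumer <= cores:
        raise ValueError(
            "num_consumer must be between 1 and {}, got {}".format(cores, num_consumer))
    return num_consumer


# Cap the documented default of 4 at the core count of this machine.
workers = check_num_consumer(min(4, os.cpu_count() or 1))
print("using", workers, "reader workers")
```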
 
- class mindspore.mindrecord.FileWriter(file_name, shard_num=1, overwrite=False)
- Class to write user-defined raw data into MindRecord files.
- Note
- After a MindRecord file is generated, renaming it may cause it to fail to be read.
- Parameters
- Raises
- ParamValueError – If file_name, shard_num or overwrite is invalid.
 - Examples
>>> from mindspore.mindrecord import FileWriter
>>> schema_json = {"file_name": {"type": "string"}, "label": {"type": "int32"}, "data": {"type": "bytes"}}
>>> indexes = ["file_name", "label"]
>>> data = [{"file_name": "1.jpg", "label": 0,
...          "data": b"\x10c\xb3w\xa8\xee$o&<q\x8c\x8e(\xa2\x90\x90\x96\xbc\xb1\x1e\xd4QER\x13?\xff"},
...         {"file_name": "2.jpg", "label": 56,
...          "data": b"\xe6\xda\xd1\xae\x07\xb8>\xd4\x00\xf8\x129\x15\xd9\xf2q\xc0\xa2\x91YFUO\x1dsE1"},
...         {"file_name": "3.jpg", "label": 99,
...          "data": b"\xaf\xafU<\xb8|6\xbd}\xc1\x99[\xeaj+\x8f\x84\xd3\xcc\xa0,i\xbb\xb9-\xcdz\xecp{T\xb1"}]
>>> writer = FileWriter(file_name="test.mindrecord", shard_num=1, overwrite=True)
>>> writer.add_schema(schema_json, "test_schema")
0
>>> writer.add_index(indexes)
MSRStatus.SUCCESS
>>> writer.write_raw_data(data)
MSRStatus.SUCCESS
>>> writer.commit()
MSRStatus.SUCCESS
 - add_index(index_fields)
- Select index fields from the schema to accelerate reading.
- Note
- Index fields must be of a primitive type, e.g. int, float or str. If this method is not called, all primitive-type fields in the schema are used as indexes by default.
- Please refer to the Examples of class mindspore.mindrecord.FileWriter.
- Parameters
- Returns
- MSRStatus, SUCCESS or FAILED. 
- Raises
- ParamTypeError – If an index field is invalid.
- MRMDefineIndexError – If an index field is not of a primitive type.
- MRMAddIndexError – If adding an index field fails.
- MRMGetMetaError – If the schema is not set or getting the meta information fails.
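The default-index rule in the note above can be pictured as a filter over the schema. This sketch assumes that MindRecord's primitive types are the scalar type names int32, int64, float32, float64 and string, and that a field with a "shape" key is an array rather than a primitive. default_index_fields is hypothetical and only mirrors the documented behavior, not the library's code. Applied to the schema from the FileWriter example, it selects the same two fields that the example passes to add_index.

```python
# Assumed set of primitive MindRecord type names; "bytes" is not primitive.
PRIMITIVE_TYPES = {"int32", "int64", "float32", "float64", "string"}


def default_index_fields(schema):
    """Hypothetical sketch: schema fields that would be indexed by default.

    A field counts as primitive when its type is in PRIMITIVE_TYPES and it
    has no "shape" key (fields with a shape are treated as arrays).
    """
    return [name for name, spec in schema.items()
            if spec.get("type") in PRIMITIVE_TYPES and "shape" not in spec]


schema_json = {"file_name": {"type": "string"},
               "label": {"type": "int32"},
               "data": {"type": "bytes"}}
print(default_index_fields(schema_json))  # ['file_name', 'label']
```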
 
 
 - add_schema(content, desc=None)
- Add a schema that describes the raw data to be written.
- Note
- Please refer to the Examples of class mindspore.mindrecord.FileWriter.
 - commit()
- Flush data in memory to disk and generate the corresponding database files.
- Note
- Please refer to the Examples of class mindspore.mindrecord.FileWriter.
- Returns
- MSRStatus, SUCCESS or FAILED. 
- Raises
- MRMOpenError – If opening the MindRecord file fails.
- MRMSetHeaderError – If setting the header fails.
- MRMIndexGeneratorError – If creating the index generator fails.
- MRMGenerateIndexError – If writing to the database fails.
- MRMCommitError – If flushing data to disk fails.
 
 
 - open_and_set_header()
- Open the writer and set the header. This method is only used for parallel writing and should be called before write_raw_data.
- Returns
- MSRStatus, SUCCESS or FAILED. 
- Raises
- MRMOpenError – If opening the MindRecord file fails.
- MRMSetHeaderError – If setting the header fails.
 
 
 - classmethod open_for_append(file_name)
- Open an existing MindRecord file and prepare it for appending data.
- Parameters
- file_name (str) – Name of the MindRecord file.
- Returns
- FileWriter, file writer object for the opened MindRecord file. 
- Raises
- ParamValueError – If file_name is invalid. 
- FileNameError – If the path contains invalid characters.
- MRMOpenError – If opening the MindRecord file fails.
- MRMOpenForAppendError – If opening the file for appending data fails.
 
 - Examples
>>> from mindspore.mindrecord import FileWriter
>>> schema_json = {"file_name": {"type": "string"}, "label": {"type": "int32"}, "data": {"type": "bytes"}}
>>> data = [{"file_name": "1.jpg", "label": 0,
...          "data": b"\x10c\xb3w\xa8\xee$o&<q\x8c\x8e(\xa2\x90\x90\x96\xbc\xb1\x1e\xd4QER\x13?\xff"}]
>>> writer = FileWriter(file_name="test.mindrecord", shard_num=1, overwrite=True)
>>> writer.add_schema(schema_json, "test_schema")
0
>>> writer.write_raw_data(data)
MSRStatus.SUCCESS
>>> writer.commit()
MSRStatus.SUCCESS
>>> write_append = FileWriter.open_for_append("test.mindrecord")
>>> write_append.write_raw_data(data)
MSRStatus.SUCCESS
>>> write_append.commit()
MSRStatus.SUCCESS
 - set_header_size(header_size)
- Set the size of the header, which contains shard information, schema information, page metadata, etc. The larger the header, the more data the MindRecord file can store. If the header needs to be larger than the default size (16MB), call this method to set a proper size.
- Parameters
- header_size (int) – Size of the header in bytes, between 16*1024 (16KB) and 128*1024*1024 (128MB).
- Returns
- MSRStatus, SUCCESS or FAILED. 
- Raises
- MRMInvalidHeaderSizeError – If setting the header size fails.
 - Examples
>>> from mindspore.mindrecord import FileWriter
>>> writer = FileWriter(file_name="test.mindrecord", shard_num=1)
>>> writer.set_header_size(1 << 25)  # 32MB
MSRStatus.SUCCESS
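The header-size range above is easy to check up front, and the bit shift in the example is easy to misread. This sketch, in plain Python with the bounds assumed inclusive since the documentation only says "between", confirms that 1 << 25 bytes is 32MB and falls inside the allowed range. header_size_in_range is a hypothetical helper.

```python
# Documented bounds for FileWriter.set_header_size, in bytes.
MIN_HEADER_SIZE = 16 * 1024          # 16KB
MAX_HEADER_SIZE = 128 * 1024 * 1024  # 128MB


def header_size_in_range(header_size):
    """Return True if header_size lies in the documented range (bounds assumed inclusive)."""
    return MIN_HEADER_SIZE <= header_size <= MAX_HEADER_SIZE


print(1 << 25, (1 << 25) // (1024 * 1024))  # 33554432 32
print(header_size_in_range(1 << 25))        # True
print(header_size_in_range(1 << 28))        # False: 256MB exceeds the 128MB maximum
```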
 - set_page_size(page_size)
- Set the size of a page, the area where data is stored. Pages are of two types: raw pages and blob pages. The larger a page, the more data it can store. If a single sample is larger than the default page size (32MB), call this method to set a proper size.
- Parameters
- page_size (int) – Size of a page in bytes, between 32*1024 (32KB) and 256*1024*1024 (256MB).
- Returns
- MSRStatus, SUCCESS or FAILED. 
- Raises
- MRMInvalidPageSizeError – If setting the page size fails.
 - Examples
>>> from mindspore.mindrecord import FileWriter
>>> writer = FileWriter(file_name="test.mindrecord", shard_num=1)
>>> writer.set_page_size(1 << 26)  # 64MB
MSRStatus.SUCCESS
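The page-size guidance above reduces to a simple sizing rule. This is a hypothetical heuristic, not something FileWriter does for you: keep the documented 32MB default when the largest sample fits, otherwise round up to the next power of two and refuse anything past the documented 256MB maximum. Rounding to a power of two matches the 1 << 26 style used in the example; the documentation itself only requires a value inside the stated range. choose_page_size is a hypothetical name.

```python
DEFAULT_PAGE_SIZE = 32 * 1024 * 1024  # 32MB, documented default
MAX_PAGE_SIZE = 256 * 1024 * 1024     # 256MB, documented upper bound


def choose_page_size(largest_sample_bytes):
    """Hypothetical heuristic: smallest power of two that holds the largest sample.

    Keeps the default when the sample already fits, and raises if even the
    documented maximum page size would be too small.
    """
    if largest_sample_bytes <= DEFAULT_PAGE_SIZE:
        return DEFAULT_PAGE_SIZE
    size = 1 << (largest_sample_bytes - 1).bit_length()  # next power of two
    if size > MAX_PAGE_SIZE:
        raise ValueError(
            "sample of {} bytes exceeds the 256MB page limit".format(largest_sample_bytes))
    return size


print(choose_page_size(10 * 1024 * 1024))  # 33554432 (default 32MB kept)
print(choose_page_size(40 * 1024 * 1024))  # 67108864 (64MB, i.e. 1 << 26)
```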
 - write_raw_data(raw_data, parallel_writer=False)
- Convert raw data into a series of consecutive MindRecord files after the raw data is verified against the schema.
- Note
- Please refer to the Examples of class mindspore.mindrecord.FileWriter.
- Parameters
- Returns
- MSRStatus, SUCCESS or FAILED. 
- Raises
- ParamTypeError – If an index field is invalid.
- MRMOpenError – If opening the MindRecord file fails.
- MRMValidateDataError – If the data does not match the blob fields.
- MRMSetHeaderError – If setting the header fails.
- MRMWriteDatasetError – If writing the dataset fails.
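write_raw_data verifies the raw data against the schema before writing. A lightweight pre-check in plain Python can catch obvious mismatches earlier. The sketch assumes a mapping from MindRecord type names to Python types (TYPE_MAP) and skips array fields that declare a "shape". validate_rows is hypothetical and is not the library's validation logic, which may be stricter.

```python
# Assumed mapping from MindRecord schema type names to Python types.
TYPE_MAP = {"int32": int, "int64": int, "float32": float, "float64": float,
            "string": str, "bytes": bytes}


def validate_rows(rows, schema):
    """Hypothetical pre-check: list (row_index, field, reason) for every mismatch."""
    problems = []
    for i, row in enumerate(rows):
        for name, spec in schema.items():
            if name not in row:
                problems.append((i, name, "missing"))
            elif "shape" not in spec and not isinstance(row[name], TYPE_MAP[spec["type"]]):
                problems.append((i, name, "wrong type"))
    return problems


schema_json = {"file_name": {"type": "string"},
               "label": {"type": "int32"},
               "data": {"type": "bytes"}}
rows = [{"file_name": "1.jpg", "label": 0, "data": b"\x10c"},
        {"file_name": "2.jpg", "label": "56", "data": b"\xe6"},  # label is a str
        {"file_name": "3.jpg", "label": 99}]                     # data is missing
print(validate_rows(rows, schema_json))
# [(1, 'label', 'wrong type'), (2, 'data', 'missing')]
```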
 
 
 
- class mindspore.mindrecord.ImageNetToMR(map_file, image_dir, destination, partition_number=1)
- A class to transform ImageNet data into MindRecord format.
- Note
- For examples, please refer to Converting the ImageNet Dataset.
- Parameters
- map_file (str) – the map file that assigns a label to each class directory. Its content should look like this:
    n02119789 0
    n02100735 1
    n02110185 2
    n02096294 3
- image_dir (str) – the image directory, which contains the n02119789, n02100735, n02110185 and n02096294 directories.
- destination (str) – the MindRecord file path to transform into. 
- partition_number (int, optional) – partition size. Default: 1. 
 
- Raises
- ValueError – If map_file, image_dir or destination is invalid. 
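The map_file format shown above pairs each class directory in image_dir with an integer label, one pair per line. A minimal parser for that format might look like this. parse_map_file is a hypothetical helper; it assumes whitespace-separated fields and tolerates blank lines, which ImageNetToMR itself may or may not do.

```python
def parse_map_file(text):
    """Hypothetical parser for '<class_dir> <label>' lines; returns {class_dir: label}."""
    labels = {}
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        parts = line.split()
        if len(parts) != 2:
            raise ValueError("line {}: expected '<dir> <label>', got {!r}".format(line_no, line))
        directory, label = parts
        labels[directory] = int(label)
    return labels


map_text = "n02119789 0\nn02100735 1\nn02110185 2\nn02096294 3\n"
labels = parse_map_file(map_text)
print(labels["n02110185"])  # 2
```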
 
- class mindspore.mindrecord.MindPage(file_name, num_consumer=4)
- Class to read MindRecord files page by page.
- Parameters
- Raises
- ParamValueError – If file_name or num_consumer is invalid.
- MRMInitSegmentError – If initializing ShardSegment fails.
 
 - property candidate_fields
- Return candidate category fields. - Returns
- list[str], by which data could be grouped. 
 
 - property category_field
- Getter function for category fields. - Returns
- list[str], by which data could be grouped. 
 
 - get_category_fields()
- Return candidate category fields. - Returns
- list[str], by which data could be grouped. 
 
 - read_at_page_by_id(category_id, page, num_row)
- Query data by category ID, page by page.
- Parameters
- Returns
- list[dict], data queried by category id. 
- Raises
- ParamValueError – If any parameter is invalid. 
- MRMFetchDataError – If fetching data by category fails.
- MRMUnsupportedSchemaError – If the schema is invalid.
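read_at_page_by_id returns one page of rows for a category. To make the page and num_row parameters concrete, the sketch below slices an in-memory list the same way, assuming page is a 0-based page index and every page except possibly the last holds num_row rows. page_slice is hypothetical; MindPage performs the query inside the MindRecord files rather than on a Python list.

```python
def page_slice(rows, page, num_row):
    """Hypothetical illustration of page/num_row semantics (page assumed 0-based)."""
    if page < 0 or num_row < 1:
        raise ValueError("page must be >= 0 and num_row >= 1")
    start = page * num_row  # index of the first row on this page
    return rows[start:start + num_row]


rows = ["row{}".format(i) for i in range(7)]
print(page_slice(rows, 0, 3))  # ['row0', 'row1', 'row2']
print(page_slice(rows, 2, 3))  # ['row6'], the last, partial page
```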
 
 
 
- class mindspore.mindrecord.MnistToMR(source, destination, partition_number=1)
- A class to transform MNIST data into MindRecord format.
- Parameters
- Raises
- ValueError – If source, destination or partition_number is invalid.
 
- class mindspore.mindrecord.TFRecordToMR(source, destination, feature_dict, bytes_fields=None)
- A class to transform TFRecord data into MindRecord format.
- Note
- For examples, please refer to Converting TFRecord Dataset.
- Parameters
- source (str) – the TFRecord file to be transformed. 
- destination (str) – the MindRecord file path to transform into. 
- feature_dict (dict) – a dictionary that specifies the type of each feature. VarLenFeature is not supported.
- bytes_fields (list, optional) – the bytes fields in feature_dict, such as image bytes. Default: None.
 
- Raises
- ValueError – If a parameter is invalid.
- Exception – If the tensorflow module is not found or its version is incorrect.
 
 - run()
- Execute the transformation from TFRecord to MindRecord.
- Returns
- MSRStatus, whether TFRecord is successfully transformed to MindRecord. 
 
 - tfrecord_iterator()
- Yield a dictionary whose keys are the fields in the schema.
- Yields
- dict, data dictionary whose keys are the same as columns.