Function mindspore::dataset::MindData

Function Documentation

inline std::shared_ptr<MindDataDataset> mindspore::dataset::MindData(const std::vector<std::string> &dataset_files, const std::vector<std::string> &columns_list = {}, const std::shared_ptr<Sampler> &sampler = std::make_shared<RandomSampler>(), nlohmann::json *padded_sample = nullptr, int64_t num_padded = 0, ShuffleMode shuffle_mode = ShuffleMode::kGlobal, const std::shared_ptr<DatasetCache> &cache = nullptr)

Function to create a MindDataDataset.

Parameters
  • dataset_files[in] List of dataset files to be read directly.

  • columns_list[in] List of columns to be read (default={}).

  • sampler[in] Shared pointer to a sampler object used to choose samples from the dataset. If no sampler is given, a RandomSampler is used to iterate over the entire dataset in random order (default = RandomSampler()). Supported samplers: SubsetRandomSampler, PKSampler, RandomSampler, SequentialSampler, DistributedSampler.

  • padded_sample[in] Sample to be appended to the dataset as padding, whose keys are the same as columns_list (default=nullptr).

  • num_padded[in] Number of padding samples to append (default=0). The dataset size plus num_padded should be divisible by num_shards (the number of shards used for distributed sampling).

  • shuffle_mode[in] The mode for shuffling data every epoch (default=ShuffleMode::kGlobal). Can be any of:

      • ShuffleMode::kFalse - No shuffling is performed.

      • ShuffleMode::kFiles - Shuffle files only.

      • ShuffleMode::kGlobal - Shuffle both the files and samples.

      • ShuffleMode::kInfile - Shuffle samples in file.

  • cache[in] Tensor cache to use (default=nullptr, which means no cache is used).
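
The divisibility requirement on num_padded can be illustrated with a small arithmetic sketch. The helper below is hypothetical and not part of the MindSpore API; it assumes that num_shards is the shard count used for distributed sampling, and that the dataset size is already known to the caller. It returns the smallest non-negative padding count that makes the padded dataset size divisible by num_shards.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of MindSpore): returns the smallest
// non-negative num_padded such that (dataset_size + num_padded) is
// divisible by num_shards.
inline int64_t ComputeNumPadded(int64_t dataset_size, int64_t num_shards) {
  assert(dataset_size >= 0 && "dataset_size must be non-negative");
  assert(num_shards > 0 && "num_shards must be positive");
  const int64_t remainder = dataset_size % num_shards;
  // If the size already divides evenly, no padding is needed.
  return remainder == 0 ? 0 : num_shards - remainder;
}
```

For instance, a dataset of 10 samples split across 4 shards would need 2 padded samples, since 10 + 2 = 12 is divisible by 4. The resulting value could then be passed as num_padded together with a matching padded_sample.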

Returns

Shared pointer to the MindDataDataset.

Example
/* Define dataset path and MindData object */
std::string file_path1 = "/path/to/mindrecord_file1";
std::string file_path2 = "/path/to/mindrecord_file2";
std::vector<std::string> file_list = {file_path1, file_path2};
std::vector<std::string> column_names = {"data", "file_name", "label"};
std::shared_ptr<Dataset> ds = MindData(file_list, column_names);

/* Create iterator to read dataset */
std::shared_ptr<Iterator> iter = ds->CreateIterator();
std::unordered_map<std::string, mindspore::MSTensor> row;
iter->GetNextRow(&row);

/* Note: As defined above, each row contains the keys "data", "file_name" and "label" */
auto data = row["data"];