Function mindspore::dataset::Tedlium

Function Documentation

inline std::shared_ptr<TedliumDataset> mindspore::dataset::Tedlium(const std::string &dataset_dir, const std::string &release, const std::string &usage = "all", const std::string &extensions = ".sph", const std::shared_ptr<Sampler> &sampler = std::make_shared<RandomSampler>(), const std::shared_ptr<DatasetCache> &cache = nullptr)

Function to create a TedliumDataset.

Note

The generated dataset has six columns [“waveform”, “sample_rate”, “transcript”, “talk_id”, “speaker_id”, “identifier”].

Parameters
  • dataset_dir[in] Path to the root directory that contains the dataset.

  • release[in] Release of the dataset. Must be “release1”, “release2” or “release3”.

  • usage[in] Subset of the TEDLIUM dataset to load. For release3, only “all” is supported; for release1 and release2, it can be “train”, “test” or “all” (default = “all”).

  • extensions[in] Extension of the audio files. Only “.sph” is currently supported (default = “.sph”).

  • sampler[in] Shared pointer to a sampler object used to choose samples from the dataset. If sampler is not given, a RandomSampler is used to iterate over the entire dataset in random order (default = RandomSampler()).

  • cache[in] Tensor cache to use (default=nullptr, which means no cache is used).

Returns

Shared pointer to the TedliumDataset.
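
Example

The constraints on release, usage and extensions described under Parameters can be checked before the dataset is created. The following is a minimal sketch of such a check. The helpers IsValidTedliumUsage and IsValidTedliumExtension are hypothetical names introduced here for illustration and are not part of the MindSpore API; how the library itself validates these arguments is not stated on this page.

```cpp
#include <string>

// Hypothetical helper (not part of MindSpore): returns true if the
// (release, usage) pair matches the combinations listed under Parameters.
bool IsValidTedliumUsage(const std::string &release, const std::string &usage) {
  if (release == "release3") {
    // release3 only supports the full dataset.
    return usage == "all";
  }
  if (release == "release1" || release == "release2") {
    return usage == "train" || usage == "test" || usage == "all";
  }
  // Any other release string is not documented as supported.
  return false;
}

// Hypothetical helper (not part of MindSpore): only ".sph" is documented
// as a supported audio file extension.
bool IsValidTedliumExtension(const std::string &extensions) {
  return extensions == ".sph";
}
```

A caller could run both checks on its arguments and only then pass them to Tedlium(dataset_dir, release, usage, extensions), which makes an unsupported combination such as release3 with “train” fail early in application code.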