Class SentencePieceTokenizer

Inheritance Relationships

Base Type
  • public mindspore::dataset::TensorTransform

Class Documentation

class SentencePieceTokenizer : public mindspore::dataset::TensorTransform

Tokenizes a scalar token or a 1-D tensor of tokens into tokens using SentencePiece.

Public Functions

SentencePieceTokenizer(const std::shared_ptr<SentencePieceVocab> &vocab, mindspore::dataset::SPieceTokenizerOutType out_type)

Constructor that takes a SentencePieceVocab object.

Parameters
  • vocab[in] A SentencePieceVocab object.

  • out_type[in] The type of the output, given as an SPieceTokenizerOutType value.

inline SentencePieceTokenizer(const std::string &vocab_path, mindspore::dataset::SPieceTokenizerOutType out_type)

Constructor that takes the path to a vocab model file.

Parameters
  • vocab_path[in] Path to the vocab model file.

  • out_type[in] The type of the output, given as an SPieceTokenizerOutType value.

SentencePieceTokenizer(const std::vector<char> &vocab_path, mindspore::dataset::SPieceTokenizerOutType out_type)

Constructor that takes the path to a vocab model file as a std::vector<char>.

Parameters
  • vocab_path[in] Path to the vocab model file. The type must be std::vector<char>.

  • out_type[in] The type of the output, given as an SPieceTokenizerOutType value.

~SentencePieceTokenizer() = default

Destructor.
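
The three constructor overloads above can be illustrated with a small, self-contained sketch. It does not use MindSpore itself: every name in the `sketch` namespace (`Vocab`, `OutType`, `Tokenizer`, `StringToChar`, and the accessors) is a hypothetical stand-in. The enumerator names `kString` and `kInt`, and the idea that the inline `std::string` overload simply forwards to the `std::vector<char>` overload, are assumptions suggested by the signatures rather than facts stated on this page.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for illustration only; these are NOT the MindSpore types.
namespace sketch {

// Assumed to mirror SPieceTokenizerOutType (enumerator names are an assumption).
enum class OutType { kString, kInt };

// Placeholder for SentencePieceVocab.
struct Vocab {};

// Hypothetical helper: copies a std::string into a std::vector<char>.
inline std::vector<char> StringToChar(const std::string &s) {
  return std::vector<char>(s.begin(), s.end());
}

class Tokenizer {
 public:
  // Mirrors the first overload: an already-built vocab object.
  Tokenizer(const std::shared_ptr<Vocab> &vocab, OutType out_type)
      : vocab_(vocab), out_type_(out_type) {}

  // Mirrors the inline overload: accepts a std::string path and
  // (by assumption) forwards it to the std::vector<char> overload.
  inline Tokenizer(const std::string &vocab_path, OutType out_type)
      : Tokenizer(StringToChar(vocab_path), out_type) {}

  // Mirrors the third overload: the path arrives as a std::vector<char>.
  Tokenizer(const std::vector<char> &vocab_path, OutType out_type)
      : vocab_path_(vocab_path.begin(), vocab_path.end()), out_type_(out_type) {}

  // True when no vocab object was supplied, so the model would be read from a file.
  bool LoadsFromFile() const { return vocab_ == nullptr; }
  const std::string &VocabPath() const { return vocab_path_; }
  OutType GetOutType() const { return out_type_; }

 private:
  std::shared_ptr<Vocab> vocab_;
  std::string vocab_path_;
  OutType out_type_;
};

}  // namespace sketch
```

With these stand-ins, `sketch::Tokenizer(std::string("m.model"), sketch::OutType::kInt)` stores the path `m.model` and reports `LoadsFromFile() == true`, while constructing from a `std::shared_ptr<sketch::Vocab>` leaves the path empty. Callers of the real class would normally pick the `std::shared_ptr<SentencePieceVocab>` or `std::string` overload; the `std::vector<char>` form mainly matters when a string type cannot be passed directly.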