Class SentencePieceTokenizer
- Defined in File text.h 
Inheritance Relationships
Base Type
- public mindspore::dataset::TensorTransform (Class TensorTransform)
Class Documentation
class SentencePieceTokenizer : public mindspore::dataset::TensorTransform
- Tokenizes a scalar token or a 1-D sequence of tokens into tokens using SentencePiece.
Public Functions
SentencePieceTokenizer(const std::shared_ptr<SentencePieceVocab> &vocab, mindspore::dataset::SPieceTokenizerOutType out_type)
- Constructor.
Parameters
- vocab – [in] A SentencePieceVocab object.
- out_type – [in] The type of the output.
inline SentencePieceTokenizer(const std::string &vocab_path, mindspore::dataset::SPieceTokenizerOutType out_type)
- Constructor.
Parameters
- vocab_path – [in] Path to the vocab model file.
- out_type – [in] The type of the output.
SentencePieceTokenizer(const std::vector<char> &vocab_path, mindspore::dataset::SPieceTokenizerOutType out_type)
- Constructor.
Parameters
- vocab_path – [in] Path to the vocab model file, passed as a vector of char.
- out_type – [in] The type of the output.
~SentencePieceTokenizer() = default
- Destructor. 
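The two vocab_path constructors differ only in how the path is passed: one is an inline overload taking std::string, the other takes std::vector<char>. The sketch below shows one way such a pair can be wired together, with the inline overload converting the string and delegating to the vector-of-char overload. It is a self-contained illustration and not the MindSpore implementation: DemoTokenizer, DemoOutType, and their members are hypothetical stand-ins, and the assumption that the real inline overload delegates in this way is not confirmed by this page.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for mindspore::dataset::SPieceTokenizerOutType.
enum class DemoOutType { kString, kInt };

// Hypothetical stand-in class that only mirrors the two path-based
// constructor signatures; it performs no tokenization.
class DemoTokenizer {
 public:
  // Overload that receives the path as a vector of char.
  DemoTokenizer(const std::vector<char> &vocab_path, DemoOutType out_type)
      : vocab_path_(vocab_path.begin(), vocab_path.end()), out_type_(out_type) {}

  // Convenience overload: copies the string into a vector of char
  // and delegates to the overload above.
  DemoTokenizer(const std::string &vocab_path, DemoOutType out_type)
      : DemoTokenizer(std::vector<char>(vocab_path.begin(), vocab_path.end()), out_type) {}

  // Accessors used only to inspect what the constructors stored.
  const std::string &vocab_path() const { return vocab_path_; }
  DemoOutType out_type() const { return out_type_; }

 private:
  std::string vocab_path_;
  DemoOutType out_type_;
};
```

With this arrangement, constructing a DemoTokenizer from std::string("m.model") (a made-up file name) and from the equivalent std::vector<char> stores the same path, so callers can use whichever form is more convenient.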