Class SentencePieceVocab
- Defined in File text.h 
Class Documentation
- 
class SentencePieceVocab
- SentencePiece object that is used to perform word segmentation.
- Public Static Functions
- Build a SentencePiece object from a file.
- Parameters
- path_list – [in] Path to the file which contains the SentencePiece list. 
- vocab_size – [in] Vocabulary size. 
- character_coverage – [in] Fraction of characters covered by the model. Good defaults are 0.9995 for languages with a rich character set, such as Japanese or Chinese, and 1.0 for languages with a small character set.
- model_type – [in] One of [SentencePieceModel::kUnigram, SentencePieceModel::kBpe, SentencePieceModel::kChar, SentencePieceModel::kWord]; the default is SentencePieceModel::kUnigram. The input sentence must be pre-tokenized when using SentencePieceModel::kWord.
- SentencePieceModel::kUnigram: Unigram Language Model, which assumes that the next word in the sentence is independent of the previous words generated by the model.
- SentencePieceModel::kBpe: the byte pair encoding algorithm, which replaces the most frequent pair of bytes in a sentence with a single, unused byte.
- SentencePieceModel::kChar: character-based SentencePiece model type.
- SentencePieceModel::kWord: word-based SentencePiece model type.
 
- params – [in] A dictionary with no incoming parameters (the parameters are derived from the SentencePiece library).
- vocab – [out] A SentencePieceVocab object. 
 
- Returns
- SentencePieceVocab, vocab built from the file.
Example
- std::string dataset_path;
  dataset_path = datasets_root_path_ + "/test_sentencepiece/vocab.txt";
  std::vector<std::string> path_list;
  path_list.emplace_back(dataset_path);
  std::unordered_map<std::string, std::string> param_map;
  std::shared_ptr<SentencePieceVocab> spm = std::make_shared<SentencePieceVocab>();
  Status rc = SentencePieceVocab::BuildFromFile(path_list, 5000, 0.9995, SentencePieceModel::kUnigram, param_map, &spm);
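The kBpe description above (replace the most frequent adjacent pair with a single new symbol) can be illustrated with a minimal, standalone sketch of one merge step. This is not the SentencePiece or MindSpore implementation: it works on a vector of string symbols rather than raw bytes, and the names `Symbols`, `MostFrequentPair` and `MergePair` are hypothetical helpers introduced only for illustration.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical alias: a sentence split into symbols (single characters at first).
using Symbols = std::vector<std::string>;
using SymbolPair = std::pair<std::string, std::string>;

// Returns the most frequent adjacent pair of symbols. If the sequence has
// fewer than two symbols, an empty pair is returned. On a tie, the pair that
// sorts first lexicographically wins (std::map iterates in sorted order).
SymbolPair MostFrequentPair(const Symbols &symbols) {
  std::map<SymbolPair, std::size_t> counts;
  for (std::size_t i = 0; i + 1 < symbols.size(); ++i) {
    ++counts[std::make_pair(symbols[i], symbols[i + 1])];
  }
  SymbolPair best;
  std::size_t best_count = 0;
  for (const auto &entry : counts) {
    if (entry.second > best_count) {
      best = entry.first;
      best_count = entry.second;
    }
  }
  return best;
}

// Scans left to right and replaces each non-overlapping occurrence of `pair`
// with one merged symbol (the concatenation of its two halves).
Symbols MergePair(const Symbols &symbols, const SymbolPair &pair) {
  Symbols merged;
  for (std::size_t i = 0; i < symbols.size();) {
    const bool match = i + 1 < symbols.size() &&
                       symbols[i] == pair.first &&
                       symbols[i + 1] == pair.second;
    if (match) {
      merged.push_back(pair.first + pair.second);
      i += 2;
    } else {
      merged.push_back(symbols[i]);
      i += 1;
    }
  }
  return merged;
}
```

Calling `MostFrequentPair` and then `MergePair` repeatedly, until the desired number of symbols is reached, is roughly how a BPE vocabulary grows; a real trainer counts pairs across the whole corpus rather than a single sentence.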
 
- Save the SentencePiece model to the given file path.
- Parameters
- vocab – [in] A SentencePiece object to be saved. 
- path – [in] Path to store the model. 
- filename – [in] File name under which the model is saved.
Example
- // Save the vocab model to a local file
  vocab->SaveModel(&vocab, datasets_root_path_ + "/test_sentencepiece", "m.model");
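SaveModel takes the directory and the file name as separate arguments. Assuming the model is written to `path/filename` (the documentation above does not state this explicitly), a small helper can compute the expected location, for example to check afterwards that the file exists. `JoinModelPath` is a hypothetical name and is not part of the MindSpore API.

```cpp
#include <string>

// Hypothetical helper (not part of MindSpore): joins a directory and a file
// name with exactly one '/' between them. Assumes POSIX-style separators.
std::string JoinModelPath(const std::string &dir, const std::string &filename) {
  if (dir.empty()) {
    return filename;  // No directory given: the name is relative to the current directory.
  }
  if (dir.back() == '/') {
    return dir + filename;  // Directory already ends with a separator.
  }
  return dir + "/" + filename;
}
```

With the arguments from the example above, `JoinModelPath(datasets_root_path_ + "/test_sentencepiece", "m.model")` would give the presumed location of the saved model.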