mindspore.dataset.text.SentencePieceModel

class mindspore.dataset.text.SentencePieceModel

An enumeration of SentencePiece model types.

Possible enumeration values are: SentencePieceModel.UNIGRAM, SentencePieceModel.BPE, SentencePieceModel.CHAR, SentencePieceModel.WORD.

  • SentencePieceModel.UNIGRAM: the unigram language model, which assumes that each subword in a sentence occurs independently of the subwords that precede it.

  • SentencePieceModel.BPE: the byte pair encoding algorithm, which repeatedly replaces the most frequent pair of adjacent symbols with a single new symbol.

  • SentencePieceModel.CHAR: the character-based SentencePiece model type.

  • SentencePieceModel.WORD: the word-based SentencePiece model type.
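
To make the UNIGRAM and BPE descriptions above concrete, here is a minimal pure-Python sketch of the two ideas: a single BPE merge step, and the independence assumption of a unigram model. It is an illustration only, not the SentencePiece or MindSpore implementation. The function names (most_frequent_pair, merge_pair, unigram_probability) and the toy probabilities are hypothetical.

```python
from collections import Counter
from math import prod


def most_frequent_pair(tokens):
    """Return the most frequent adjacent symbol pair, or None if there is none."""
    pairs = Counter(zip(tokens, tokens[1:]))
    return pairs.most_common(1)[0][0] if pairs else None


def merge_pair(tokens, pair):
    """Replace each non-overlapping occurrence of `pair` with one merged symbol."""
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            merged.append(tokens[i] + tokens[i + 1])
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def unigram_probability(pieces, piece_probs):
    """Score a segmentation as the product of independent piece probabilities."""
    return prod(piece_probs[p] for p in pieces)


# One BPE merge step on a toy character sequence.
symbols = list("abababc")
pair = most_frequent_pair(symbols)   # ("a", "b")
merged = merge_pair(symbols, pair)   # ["ab", "ab", "ab", "c"]

# Unigram scoring with made-up (hypothetical) piece probabilities.
score = unigram_probability(["ab", "c"], {"ab": 0.5, "c": 0.25})  # 0.125
```

A real BPE trainer repeats the merge step over a whole corpus until the target vocabulary size is reached, and a real unigram trainer estimates the piece probabilities from data; both are omitted here for brevity.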