mindspore.dataset.text.CharNGram

class mindspore.dataset.text.CharNGram

CharNGram pre-trained word embeddings.

A word or sentence is represented using a character n-gram count vector, followed by a single nonlinear transformation to yield a low-dimensional embedding.
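The description above can be illustrated with a minimal pure-Python sketch. It is not MindSpore's implementation: the n-gram sizes, the `#` boundary marker, the choice of `tanh` as the nonlinearity, and the names `char_ngrams` and `embed` are all assumptions made for illustration only.

```python
import math
from collections import Counter


def char_ngrams(text, n_values=(2, 3, 4)):
    """Count character n-grams of `text` (hypothetical helper).

    The '#' boundary marker and the default n-gram sizes are assumptions.
    """
    padded = "#" + text + "#"
    counts = Counter()
    for n in n_values:
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    return counts


def embed(text, ngram_vectors, bias, n_values=(2, 3, 4)):
    """Map `text` to a low-dimensional vector (hypothetical helper).

    Computes tanh(bias + sum over n-grams of count * vector). N-grams that
    are missing from `ngram_vectors` are skipped. tanh is an assumed choice
    for the single nonlinear transformation.
    """
    total = list(bias)
    for gram, count in char_ngrams(text, n_values).items():
        vec = ngram_vectors.get(gram)
        if vec is None:
            continue
        for d in range(len(total)):
            total[d] += count * vec[d]
    return [math.tanh(x) for x in total]


# Toy 2-dimensional table of n-gram vectors (made-up values).
toy_vectors = {"#c": [0.5, 0.0], "ca": [0.0, 0.5], "at": [0.25, 0.25]}
embedding = embed("cat", toy_vectors, bias=[0.0, 0.0])
```

Because the representation is built from character n-grams rather than whole words, a token that never appeared as a word can still receive a non-trivial vector as long as some of its n-grams are in the table.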

classmethod from_file(file_path, max_vectors=None)

Load a CharNGram pre-trained vector set from a file.

Parameters
  • file_path (str) – Path to the CharNGram pre-trained vector set file.

  • max_vectors (int, optional) – The upper limit on the number of pre-trained vectors to load. Most pre-trained vector sets are sorted in descending order of word frequency, so when the entire set does not fit in memory, or is not needed for another reason, this value can be used to load only the first, most frequent entries. Default: None, no upper limit.

Returns

CharNGram, the loaded CharNGram pre-trained vectors.

Raises
  • TypeError – If file_path is not of type str.

  • RuntimeError – If file_path does not exist or is not accessible.

  • TypeError – If max_vectors is not of type int.

  • ValueError – If max_vectors is negative.
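The documented error conditions can be summarized as a small pure-Python sketch. The helper `check_from_file_args` is hypothetical and is not part of MindSpore; the order in which the real implementation performs these checks is an assumption, and the sketch only mirrors the exception types listed above.

```python
import os


def check_from_file_args(file_path, max_vectors=None):
    """Hypothetical validator mirroring the documented Raises section.

    This is NOT MindSpore code; the check order is an assumption.
    """
    if not isinstance(file_path, str):
        raise TypeError("file_path must be of type str")
    if max_vectors is not None:
        if not isinstance(max_vectors, int):
            raise TypeError("max_vectors must be of type int")
        if max_vectors < 0:
            raise ValueError("max_vectors must not be negative")
    # A missing or unreadable path is reported as RuntimeError.
    if not (os.path.exists(file_path) and os.access(file_path, os.R_OK)):
        raise RuntimeError("file_path does not exist or is not accessible")
```

In calling code, the same exception types can be caught around `CharNGram.from_file` to distinguish a bad argument (TypeError, ValueError) from a missing or unreadable file (RuntimeError).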

Examples

>>> import mindspore.dataset.text as text
>>>
>>> char_n_gram = text.CharNGram.from_file("/path/to/char_n_gram/file", max_vectors=None)
>>> to_vectors = text.ToVectors(char_n_gram)
>>> # Look up tokens and map them to vectors using the CharNGram model.
>>> word_vector = to_vectors(["word1", "word2"])