mindspore.dataset.text.FastText

class mindspore.dataset.text.FastText

FastText pre-trained word embeddings.

FastText lets you build unsupervised or supervised learning algorithms that produce vector representations of words.

classmethod from_file(file_path, max_vectors=None)

Load a FastText pre-trained vector set file.

Parameters
  • file_path (str) – Path to the FastText pre-trained vector set file. File suffix should be *.vec.

  • max_vectors (int, optional) – The upper limit on the number of pre-trained vectors to load. Most pre-trained vector sets are sorted in the descending order of word frequency. Thus, in situations where the entire set doesn’t fit in memory, or is not needed for another reason, this value can limit the size of the loaded set. Default: None, no upper limit.
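
To make the effect of max_vectors concrete, here is a minimal pure-Python sketch. It is not MindSpore's implementation. It assumes the common fastText *.vec text layout (a first line holding the vector count and dimension, then one token followed by its space-separated values per line). The helper name load_vec_sketch and the sample data are hypothetical.

```python
import os
import tempfile


def load_vec_sketch(file_path, max_vectors=None):
    """Hypothetical helper: read up to max_vectors entries from a *.vec text file."""
    vectors = {}
    with open(file_path, encoding="utf-8") as f:
        for index, line in enumerate(f):
            parts = line.rstrip().split(" ")
            # Assumption: a first line made of exactly two integers is the
            # "<count> <dim>" header and carries no vector.
            if index == 0 and len(parts) == 2 and all(p.isdigit() for p in parts):
                continue
            # Stop once the cap is reached; None means no upper limit.
            if max_vectors is not None and len(vectors) >= max_vectors:
                break
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


# Sample file with entries ordered from most to least frequent (made-up data).
sample = "3 2\nthe 0.1 0.2\nof 0.3 0.4\nzebra 0.5 0.6\n"

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.vec")
    with open(path, "w", encoding="utf-8") as f:
        f.write(sample)
    all_vectors = load_vec_sketch(path)
    top_two = load_vec_sketch(path, max_vectors=2)

print(len(all_vectors))
print(list(top_two))
```

Because the entries are read in file order, a cap of 2 keeps only the first two tokens, which are the most frequent ones when the file is sorted by descending frequency.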

Returns

FastText, FastText pre-trained vectors.

Raises
  • TypeError – If file_path is not of type str.

  • RuntimeError – If file_path does not exist or is not accessible.

  • TypeError – If max_vectors is not of type int.

  • ValueError – If max_vectors is negative.

Examples

>>> import mindspore.dataset.text as text
>>> fast_text = text.FastText.from_file("/path/to/fast_text/file", max_vectors=None)
>>> to_vectors = text.ToVectors(fast_text)
>>> # Look up the vector of each token according to the FastText model.
>>> word_vector = to_vectors(["word1", "word2"])
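
Conceptually, the lookup in the example maps each token to its stored vector. The sketch below is a plain-Python illustration, not the ToVectors implementation. The table, the helper name lookup_sketch, and the choice to return a zero vector for tokens missing from the table are assumptions made for illustration.

```python
def lookup_sketch(tokens, table, dim):
    """Hypothetical helper: map each token to its vector, or to zeros if absent."""
    zero = [0.0] * dim
    # Copy each vector so callers cannot mutate the table or the shared zero list.
    return [list(table.get(token, zero)) for token in tokens]


# Made-up two-dimensional vectors standing in for a loaded vector set.
table = {"word1": [0.1, 0.2], "word2": [0.3, 0.4]}

result = lookup_sketch(["word1", "unseen"], table, dim=2)
print(result)
```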