mindspore.dataset.text.FastText

class mindspore.dataset.text.FastText

FastText pre-trained word embeddings.

FastText creates vector representations of words using unsupervised or supervised learning algorithms.

classmethod from_file(file_path, max_vectors=None)

Load a FastText pre-trained vector set file.

Parameters

file_path (str) – Path to the FastText pre-trained vector set file. The file name should end with .vec.
max_vectors (int, optional) – Upper limit on the number of pre-trained vectors to load. Most pre-trained vector sets are sorted in descending order of word frequency, so when the entire set does not fit in memory or is not needed, this value restricts loading to the most frequent words. Default: None, no upper limit.
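
To illustrate what max_vectors does, here is a minimal pure-Python sketch. It is not MindSpore's implementation: load_vec is a hypothetical helper, and it assumes the usual fastText text layout (an optional "count dimension" header line, then one "word v1 v2 ..." line per word, most frequent words first).

```python
import io


def load_vec(stream, max_vectors=None):
    """Hypothetical helper: parse fastText-style text vectors into a dict.

    Assumes an optional "<count> <dim>" header on the first line and one
    "<word> <v1> ... <vd>" entry on every following line.
    """
    vectors = {}
    for index, line in enumerate(stream):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if index == 0 and len(fields) == 2:
            continue  # skip the "<count> <dim>" header
        if max_vectors is not None and len(vectors) >= max_vectors:
            break  # limit reached; the rarer words that follow are ignored
        vectors[fields[0]] = [float(x) for x in fields[1:]]
    return vectors


# Three entries, ordered from most to least frequent.
sample = io.StringIO("3 2\nthe 0.1 0.2\nof 0.3 0.4\nzygote 0.5 0.6\n")
top_two = load_vec(sample, max_vectors=2)
print(list(top_two))  # ['the', 'of']
```

Because the entries are ordered by frequency, truncating the file this way keeps the words most likely to appear in the input text.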

Returns

FastText, the pre-trained vectors.

Raises

TypeError – If file_path is not of type str.
RuntimeError – If file_path does not exist or is not accessible.
TypeError – If max_vectors is not of type int.
ValueError – If max_vectors is negative.
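
The error contract above can be mirrored in a small pre-flight check, for example to validate user-supplied arguments before a long-running job starts. This is only a sketch: check_from_file_args and error_name are hypothetical helpers, and the order in which MindSpore itself performs these checks is an assumption.

```python
import os


def check_from_file_args(file_path, max_vectors=None):
    """Hypothetical pre-flight check mirroring the documented exceptions."""
    if not isinstance(file_path, str):
        raise TypeError("file_path must be of type str")
    if max_vectors is not None:
        if not isinstance(max_vectors, int):
            raise TypeError("max_vectors must be of type int")
        if max_vectors < 0:
            raise ValueError("max_vectors must not be negative")
    # Assumption: the argument checks run before the file system is touched.
    if not os.path.isfile(file_path) or not os.access(file_path, os.R_OK):
        raise RuntimeError("file_path does not exist or is not accessible")


def error_name(*args, **kwargs):
    """Return the name of the exception raised by the check, or None."""
    try:
        check_from_file_args(*args, **kwargs)
    except (TypeError, ValueError, RuntimeError) as err:
        return type(err).__name__
    return None


print(error_name(123))                                  # TypeError
print(error_name("vectors.vec", max_vectors="10"))      # TypeError
print(error_name("vectors.vec", max_vectors=-1))        # ValueError
print(error_name("/nonexistent/dir/fast_text.vec"))     # RuntimeError
```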

Examples

>>> import mindspore.dataset.text as text
>>> fast_text = text.FastText.from_file("/path/to/fast_text/file.vec", max_vectors=None)
>>> to_vectors = text.ToVectors(fast_text)
>>> # Look up tokens and get vectors according to the FastText model.
>>> word_vector = to_vectors(["word1", "word2"])