mindspore.dataset.text.GloVe

class mindspore.dataset.text.GloVe

Global Vectors (GloVe) pre-trained word embeddings.

GloVe is an unsupervised learning algorithm for obtaining vector representations of words.

classmethod from_file(file_path, max_vectors=None)

Load a file containing a set of pre-trained GloVe vectors.

Parameters
  • file_path (str) – Path to the file containing the pre-trained GloVe vectors. The file name is expected to look like glove.*.txt.

  • max_vectors (int, optional) – The upper limit on the number of pre-trained vectors to load. Most pre-trained vector sets are sorted in descending order of word frequency, so when the entire set doesn’t fit in memory, or is not needed for another reason, this value can be used to load only the most frequent words. Default: None, no upper limit.
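
To make the effect of max_vectors concrete, the following is a minimal pure-Python sketch of reading a GloVe-style text file. It is not how mindspore.dataset.text.GloVe is implemented; the helper read_glove_text and the sample data are hypothetical, and the sketch assumes the usual text layout of one token per line followed by space-separated float components.

```python
import os
import tempfile


def read_glove_text(file_path, max_vectors=None):
    """Parse a GloVe-style text file into a {token: [float, ...]} dict.

    Hypothetical helper for illustration only; it is not part of MindSpore.
    Assumes each line is "<token> <v1> <v2> ... <vN>".
    """
    vectors = {}
    with open(file_path, encoding="utf-8") as f:
        for line in f:
            # Stop once the requested number of vectors has been read.
            if max_vectors is not None and len(vectors) >= max_vectors:
                break
            parts = line.rstrip().split(" ")
            if len(parts) < 2:
                continue  # skip blank or malformed lines
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


# Build a tiny stand-in file; real GloVe files are far larger.
sample = "the 0.1 0.2 0.3\nof 0.4 0.5 0.6\ncat 0.7 0.8 0.9\n"
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write(sample)
    sample_path = tmp.name

all_vectors = read_glove_text(sample_path)             # no upper limit
top_two = read_glove_text(sample_path, max_vectors=2)  # first two lines only
os.remove(sample_path)
```

Because the limit is applied while reading from the top of the file, a frequency-sorted file yields the most frequent words first, which is why truncating with max_vectors tends to keep the most useful entries.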

Returns

GloVe, the loaded pre-trained GloVe vectors.

Raises
  • TypeError – If file_path is not of type str.

  • RuntimeError – If file_path does not exist or is not accessible.

  • TypeError – If max_vectors is not of type int.

  • ValueError – If max_vectors is negative.
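
The error conditions listed above can be read as a sequence of argument checks. The sketch below mirrors them in plain Python; check_glove_args is a hypothetical helper, and the order of the checks and the error messages are assumptions rather than the library's actual behavior.

```python
import os
import tempfile


def check_glove_args(file_path, max_vectors=None):
    """Hypothetical pre-check mirroring the documented error conditions."""
    if not isinstance(file_path, str):
        raise TypeError("file_path must be a str")
    if not os.access(file_path, os.R_OK):
        raise RuntimeError("file_path does not exist or is not accessible")
    if max_vectors is not None:
        if not isinstance(max_vectors, int):
            raise TypeError("max_vectors must be an int")
        if max_vectors < 0:
            raise ValueError("max_vectors must not be negative")


errors = []
with tempfile.NamedTemporaryFile(suffix=".txt") as tmp:
    cases = [
        (123, None),                   # file_path is not a str
        ("/no/such/glove.txt", None),  # path assumed not to exist
        (tmp.name, "10"),              # max_vectors is not an int
        (tmp.name, -1),                # max_vectors is negative
    ]
    for file_path, max_vectors in cases:
        try:
            check_glove_args(file_path, max_vectors)
        except (TypeError, RuntimeError, ValueError) as exc:
            errors.append(type(exc).__name__)
    # Valid arguments pass without raising.
    check_glove_args(tmp.name, 5)
```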

Examples

>>> import mindspore.dataset.text as text
>>> glove = text.GloVe.from_file("/path/to/glove/file", max_vectors=None)
>>> to_vectors = text.ToVectors(glove)
>>> # Look up the vector for each token according to the GloVe model.
>>> word_vector = to_vectors(["word1", "word2"])