mindspore.dataset.text.transforms.Lookup

class mindspore.dataset.text.transforms.Lookup(vocab, unknown_token=None, data_type=mstype.int32)

Look up each input word in the given vocabulary and map it to its id.

Parameters
  • vocab (Vocab) – A vocabulary object.

  • unknown_token (str, optional) – Word to look up in place of any input word that is out-of-vocabulary (OOV). If unknown_token is itself OOV, a runtime error is raised (default=None).

  • data_type (mindspore.dtype, optional) – Data type of the ids that Lookup maps strings to (default=mindspore.int32).
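
The fallback rule described for unknown_token can be illustrated with a small plain-Python sketch. This is not MindSpore's implementation: lookup_ids, the dict-based vocabulary, the '<unk>' token, and the position-based ids are all hypothetical names and choices made for illustration. The sketch also assumes that an OOV word with no unknown_token is an error, and it checks unknown_token up front; the real operator may report errors at a different point. data_type is left out because plain Python integers carry no dtype.

```python
from typing import Dict, List, Optional


def lookup_ids(tokens: List[str],
               vocab: Dict[str, int],
               unknown_token: Optional[str] = None) -> List[int]:
    """Map each token to its id, falling back to unknown_token for OOV words."""
    if unknown_token is not None and unknown_token not in vocab:
        # Mirrors the documented rule: an OOV unknown_token is an error.
        raise RuntimeError(
            f"unknown_token {unknown_token!r} is not in the vocabulary")

    ids = []
    for token in tokens:
        if token in vocab:
            ids.append(vocab[token])
        elif unknown_token is not None:
            # OOV word: use the id of the fallback token instead.
            ids.append(vocab[unknown_token])
        else:
            # Assumption: with no fallback configured, an OOV word is an error.
            raise RuntimeError(f"token {token!r} is not in the vocabulary")
    return ids


# Hypothetical vocabulary: ids are assigned by list position (an assumption).
words = ['深', '圳', '欢', '迎', '您', '<unk>']
vocab = {word: idx for idx, word in enumerate(words)}

known = lookup_ids(['欢', '迎'], vocab)
with_oov = lookup_ids(['深', '圳', '好'], vocab, unknown_token='<unk>')
print(known)
print(with_oov)
```

In this sketch, '好' is not in the vocabulary, so it receives the id of '<unk>' rather than an id of its own.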

Examples

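>>> import mindspore.dataset.text as text
>>> # Load vocabulary from list
>>> vocab = text.Vocab.from_list(['深', '圳', '欢', '迎', '您'])
>>> # Use Lookup operator to map tokens to ids
>>> lookup = text.Lookup(vocab)
>>> # text_file_dataset is assumed to be an existing dataset whose rows are single tokens
>>> text_file_dataset = text_file_dataset.map(operations=[lookup])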