mindspore.dataset.text.AddToken

class mindspore.dataset.text.AddToken(token, begin=True)

Add a token to the beginning or end of a sequence.

Parameters
  • token (str) – The token to be added.

  • begin (bool, optional) – Choose the position where the token is inserted. If True, the token will be inserted at the beginning of the sequence. Otherwise, it will be inserted at the end of the sequence. Default: True.

Raises
  • TypeError – If token is not of type str.

  • TypeError – If begin is not of type bool.

Supported Platforms:

CPU

Examples

>>> import mindspore.dataset as ds
>>> import mindspore.dataset.text as text
>>>
>>> # Use the transform in dataset pipeline mode
>>> numpy_slices_dataset = ds.NumpySlicesDataset(data=[['a', 'b', 'c', 'd', 'e']], column_names=["text"])
>>> # Data before
>>> # |           text            |
>>> # +---------------------------+
>>> # | ['a', 'b', 'c', 'd', 'e'] |
>>> # +---------------------------+
>>> add_token_op = text.AddToken(token='TOKEN', begin=True)
>>> numpy_slices_dataset = numpy_slices_dataset.map(operations=add_token_op)
>>> for item in numpy_slices_dataset.create_dict_iterator(num_epochs=1, output_numpy=True):
...     print(item["text"])
['TOKEN' 'a' 'b' 'c' 'd' 'e']
>>> # Data after
>>> # |                text                |
>>> # +------------------------------------+
>>> # | ['TOKEN', 'a', 'b', 'c', 'd', 'e'] |
>>> # +------------------------------------+
>>>
>>> # Use the transform in eager mode
>>> data = ["happy", "birthday", "to", "you"]
>>> output = text.AddToken(token='TOKEN', begin=True)(data)
>>> print(output)
['TOKEN' 'happy' 'birthday' 'to' 'you']
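The `begin=False` case appends the token instead. Conceptually, the transform behaves like the following pure-Python sketch (an illustration of the documented semantics, not the actual implementation, which runs in the dataset backend):

```python
def add_token(sequence, token, begin=True):
    # Mirror AddToken's documented checks: token must be str, begin must be bool
    if not isinstance(token, str):
        raise TypeError("token is not of type str.")
    if not isinstance(begin, bool):
        raise TypeError("begin is not of type bool.")
    # Insert the token at the beginning or the end of the sequence
    return [token] + list(sequence) if begin else list(sequence) + [token]

print(add_token(["happy", "birthday", "to", "you"], "TOKEN", begin=False))
# ['happy', 'birthday', 'to', 'you', 'TOKEN']
```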