mindspore.dataset.text.transforms.PythonTokenizer

class mindspore.dataset.text.transforms.PythonTokenizer(tokenizer)

Callable class that applies a user-defined Python function as a string tokenizer.

Parameters

tokenizer (Callable) – Python function that takes a str and returns a list of str as tokens.

Examples

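>>> import mindspore.dataset.text as text
>>> # text_file_dataset is assumed to be an existing dataset of strings
>>> def my_tokenizer(line):
...     # Split the input string on runs of whitespace
...     return line.split()
>>> text_file_dataset = text_file_dataset.map(operations=text.PythonTokenizer(my_tokenizer))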
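
The tokenizer parameter only needs to be a function that takes a str and returns a list of str, so it can be developed and checked in plain Python before it is wrapped. The following is a minimal sketch under that assumption; word_punct_tokenizer and its regular expression are hypothetical names chosen for illustration and are not part of MindSpore.

```python
import re

# Hypothetical pattern (not part of MindSpore): a run of word characters,
# or a single character that is neither a word character nor whitespace.
_TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def word_punct_tokenizer(line):
    """Take a str and return a list of str tokens, as the parameter requires."""
    return _TOKEN_PATTERN.findall(line)


print(word_punct_tokenizer("Hello, world!"))  # ['Hello', ',', 'world', '!']
```

A function like this could then be passed to PythonTokenizer in the same way as my_tokenizer in the example above.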