mindspore.dataset.text.RegexReplace

class mindspore.dataset.text.RegexReplace(pattern, replace, replace_all=True)

Replace the parts of an input UTF-8 string that match a regular expression with a different text string.

Note

RegexReplace is not yet supported on the Windows platform.

Parameters

pattern (str) – The regular expression used to find the parts of the input string to replace.
replace (str) – The string used to replace the matched elements.
replace_all (bool, optional) – Whether to replace all matched elements. If False, only the first matched element will be replaced; otherwise, all matched elements will be replaced. Default: True.
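The sketch below illustrates the effect of replace_all. It mirrors the documented semantics with Python's standard re.sub, whose count argument replaces every match when set to 0 and only the first match when set to 1. The helper name regex_replace is hypothetical and not part of MindSpore. It assumes a simple literal pattern, where Python's regex dialect and the engine behind RegexReplace should agree.

```python
import re


def regex_replace(text: str, pattern: str, replace: str, replace_all: bool = True) -> str:
    """Hypothetical pure-Python stand-in for RegexReplace (not the MindSpore API).

    count=0 replaces every match (replace_all=True);
    count=1 replaces only the first match (replace_all=False).
    """
    return re.sub(pattern, replace, text, count=0 if replace_all else 1)


data = "onetwoonetwoone"
all_replaced = regex_replace(data, "one", "two", replace_all=True)
first_replaced = regex_replace(data, "one", "two", replace_all=False)
print(all_replaced)    # twotwotwotwotwo
print(first_replaced)  # twotwoonetwoone
```

With replace_all=False only the leading "one" changes, so the later occurrences are left intact.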

Raises

TypeError – If pattern is not of type str.
TypeError – If replace is not of type str.
TypeError – If replace_all is not of type bool.
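The three TypeError conditions correspond to plain type checks on the constructor arguments. The following is a minimal sketch of equivalent validation. The function check_regex_replace_args and its error messages are hypothetical and are not MindSpore's actual implementation or wording.

```python
def check_regex_replace_args(pattern, replace, replace_all=True):
    """Hypothetical validator mirroring the documented TypeError conditions."""
    if not isinstance(pattern, str):
        raise TypeError(f"pattern must be of type str, got {type(pattern).__name__}")
    if not isinstance(replace, str):
        raise TypeError(f"replace must be of type str, got {type(replace).__name__}")
    if not isinstance(replace_all, bool):
        raise TypeError(f"replace_all must be of type bool, got {type(replace_all).__name__}")


# Valid arguments pass silently.
check_regex_replace_args("apple", "orange", replace_all=False)

# An int is rejected for replace_all, even though 1 is truthy.
try:
    check_regex_replace_args("apple", "orange", replace_all=1)
except TypeError as err:
    print(err)  # replace_all must be of type bool, got int
```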

Supported Platforms: CPU

Examples

>>> import mindspore.dataset as ds
>>> import mindspore.dataset.text as text
>>>
>>> # Use the transform in dataset pipeline mode
>>> numpy_slices_dataset = ds.NumpySlicesDataset(data=['apple orange apple orange apple'],
...                                              column_names=["text"])
>>> regex_replace = text.RegexReplace('apple', 'orange')
>>> numpy_slices_dataset = numpy_slices_dataset.map(operations=regex_replace)
>>> for item in numpy_slices_dataset.create_dict_iterator(num_epochs=1, output_numpy=True):
...     print(item["text"])
orange orange orange orange orange
>>>
>>> # Use the transform in eager mode
>>> data = 'onetwoonetwoone'
>>> output = text.RegexReplace(pattern="one", replace="two", replace_all=True)(data)
>>> print(output)
twotwotwotwotwo
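Both examples above use literal patterns, but pattern accepts a full regular expression. The sketch below shows common uses: collapsing whitespace runs and anchoring a match to the start of the string. It uses Python's re.sub as a stand-in and applies the substitution to a list of strings to mimic mapping over a text column. This is an assumption-laden illustration. Python's regex dialect and replacement-string escapes may differ from those of the engine behind RegexReplace for advanced syntax, so verify such patterns against RegexReplace itself.

```python
import re

# Collapse any run of whitespace (spaces, tabs, newlines) into a single space.
raw = "apple   orange\tapple\n orange"
collapsed = re.sub(r"\s+", " ", raw)
print(collapsed)  # apple orange apple orange

# '^' anchors the match to the start of the string, so only the first word changes.
anchored = re.sub(r"^apple", "pear", "apple orange apple")
print(anchored)  # pear orange apple

# Applying the same substitution element-wise, similar to mapping over a column.
column = ["a  b", "c\t\td"]
normalized = [re.sub(r"\s+", " ", s) for s in column]
print(normalized)  # ['a b', 'c d']
```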

Tutorial Examples:

Illustration of text transforms