Class UnicodeCharTokenizer

Inheritance Relationships

Base Type

public mindspore::dataset::TensorTransform

Class Documentation

class UnicodeCharTokenizer : public mindspore::dataset::TensorTransform

Tokenizes a scalar tensor containing a UTF-8 string into individual Unicode characters.

Public Functions

explicit UnicodeCharTokenizer(bool with_offsets = false)

Constructor.

Parameters

with_offsets [in]: Whether to output the offsets of tokens (default=false).
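
To make the effect of with_offsets more concrete, the following is a minimal standalone sketch of per-character tokenization of a UTF-8 string. It does not call MindSpore and is not the library's implementation. The names CharToken, Utf8SequenceLength, and SplitUtf8Chars are hypothetical, and the sketch assumes that an offset is the byte range [start, limit) a character occupies in the input string, which may differ from what the operator actually emits.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper type (not part of MindSpore): one Unicode character
// plus the byte range [start, limit) it occupies in the input string.
struct CharToken {
  std::string text;
  std::size_t start;
  std::size_t limit;
};

// Returns the byte length of the UTF-8 sequence introduced by `lead`.
// Assumption for this sketch: an invalid lead byte counts as one character.
std::size_t Utf8SequenceLength(unsigned char lead) {
  if ((lead & 0x80) == 0x00) return 1;  // 0xxxxxxx: ASCII
  if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
  if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
  if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
  return 1;
}

// Splits `input` into one token per UTF-8 encoded character and records
// the byte offsets of each token.
std::vector<CharToken> SplitUtf8Chars(const std::string &input) {
  std::vector<CharToken> tokens;
  std::size_t pos = 0;
  while (pos < input.size()) {
    std::size_t len = Utf8SequenceLength(static_cast<unsigned char>(input[pos]));
    if (pos + len > input.size()) {
      len = input.size() - pos;  // clamp a truncated trailing sequence
    }
    tokens.push_back(CharToken{input.substr(pos, len), pos, pos + len});
    pos += len;
  }
  return tokens;
}
```

Under these assumptions, a caller that only needs the characters would read the text field of each token, while a caller that also wants positions would read start and limit, which mirrors the choice that with_offsets expresses.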

~UnicodeCharTokenizer() = default

Destructor.