Class UnicodeScriptTokenizer
- Defined in File text.h 
Inheritance Relationships
Base Type
- public mindspore::dataset::TensorTransform (Class TensorTransform)
Class Documentation
class UnicodeScriptTokenizer : public mindspore::dataset::TensorTransform
- Tokenizes a scalar tensor containing a UTF-8 string on Unicode script boundaries.
- Public Functions
explicit UnicodeScriptTokenizer(bool keep_whitespace = false, bool with_offsets = false)
- Constructor.
- Example
    /* Define operations */
    auto tokenizer_op = text::UnicodeScriptTokenizer(false, true);

    /* dataset is an instance of Dataset object */
    dataset = dataset->Map({tokenizer_op},  // operations
                           {"text"});       // input columns
- Parameters
- keep_whitespace – [in] whether to emit whitespace tokens (default=false). 
- with_offsets – [in] whether to output offsets of tokens (default=false). 
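To make the two flags concrete, here is a minimal, self-contained C++ sketch that uses neither MindSpore nor ICU. It is an illustration under stated assumptions, not the operator's implementation: the `Script` enum and its code-point ranges are a hypothetical, coarse stand-in for the Unicode Script property; `DecodeUtf8`, `Classify`, `ScriptTokenize`, and `Token` are illustrative names only; and offsets are assumed to be byte positions into the UTF-8 input, with start inclusive and limit exclusive. The sketch splits wherever the script class changes, drops whitespace runs unless `keep_whitespace` is true, and records the offsets that `with_offsets` would expose.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Coarse, HYPOTHETICAL script classes for this sketch only. A real tokenizer
// would look up the Unicode Script property (for example via ICU).
enum class Script { Whitespace, Latin, Cyrillic, Han, Other };

// One token plus its byte offsets into the input. ASSUMPTION: start is
// inclusive and limit is exclusive.
struct Token {
  std::string text;
  uint32_t start;
  uint32_t limit;
};

// Decodes the UTF-8 code point starting at s[i] and advances i past it.
// Assumes well-formed UTF-8; no validation is performed.
char32_t DecodeUtf8(const std::string& s, std::size_t& i) {
  const auto byte = [&](std::size_t k) { return static_cast<unsigned char>(s[k]); };
  unsigned char c = byte(i);
  int len = c < 0x80 ? 1 : (c >> 5) == 0x6 ? 2 : (c >> 4) == 0xE ? 3 : 4;
  char32_t cp = len == 1 ? c : len == 2 ? (c & 0x1F) : len == 3 ? (c & 0x0F) : (c & 0x07);
  for (int k = 1; k < len; ++k) cp = (cp << 6) | (byte(i + k) & 0x3F);
  i += len;
  return cp;
}

// Maps a code point to a coarse script class (simplified ranges only).
Script Classify(char32_t cp) {
  if (cp == U' ' || (cp >= U'\t' && cp <= U'\r') || cp == 0x3000) return Script::Whitespace;
  if ((cp >= U'A' && cp <= U'Z') || (cp >= U'a' && cp <= U'z') ||
      (cp >= 0x00C0 && cp <= 0x024F && cp != 0x00D7 && cp != 0x00F7)) {
    return Script::Latin;
  }
  if (cp >= 0x0400 && cp <= 0x04FF) return Script::Cyrillic;
  if (cp >= 0x4E00 && cp <= 0x9FFF) return Script::Han;
  return Script::Other;  // digits, punctuation, and everything else
}

// Splits s into runs of code points that share a script class.
std::vector<Token> ScriptTokenize(const std::string& s, bool keep_whitespace) {
  std::vector<Token> tokens;
  std::size_t tok_start = 0;
  std::size_t i = 0;
  Script cur = Script::Other;
  // Emits s[tok_start, end) unless it is a whitespace run being dropped.
  auto flush = [&](std::size_t end) {
    if (end > tok_start && (cur != Script::Whitespace || keep_whitespace)) {
      tokens.push_back({s.substr(tok_start, end - tok_start),
                        static_cast<uint32_t>(tok_start), static_cast<uint32_t>(end)});
    }
  };
  while (i < s.size()) {
    std::size_t cp_start = i;
    Script sc = Classify(DecodeUtf8(s, i));
    if (cp_start == 0) {
      cur = sc;  // the first code point opens the first run
    } else if (sc != cur) {
      flush(cp_start);  // script boundary: close the current run
      tok_start = cp_start;
      cur = sc;
    }
  }
  flush(s.size());  // close the final run
  return tokens;
}
```

With a UTF-8 source file, `ScriptTokenize("Hello, 世界", false)` yields `Hello` [0, 5), `,` [5, 6), and `世界` [7, 13); passing `true` for `keep_whitespace` also keeps the space token [6, 7). The real operator may classify digits, punctuation, and other scripts differently, so treat this only as a model of what the parameters control.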
 
~UnicodeScriptTokenizer() = default
- Destructor. 
 