Class Ngram

Inheritance Relationships

Base Type

  • public mindspore::dataset::TensorTransform

Class Documentation

class Ngram : public mindspore::dataset::TensorTransform

Generates n-grams from a 1-D string Tensor.

Public Functions

inline explicit Ngram(const std::vector<int32_t> &ngrams, const std::pair<std::string, int32_t> &left_pad = {"", 0}, const std::pair<std::string, int32_t> &right_pad = {"", 0}, const std::string &separator = " ")

Constructor.

Parameters
  • ngrams[in] A vector of positive integers giving the n-gram sizes to generate. For example, if ngrams={4, 3}, the result contains the 4-grams followed by the 3-grams in the same tensor. If there are not enough words to make up an n-gram, an empty string is returned.

  • left_pad[in] {"pad_token", pad_width}. Padding performed on the left side of the sequence. pad_width is capped at n-1. For example, left_pad={"_", 2} pads the left side of the sequence with "__" (default={"", 0}).

  • right_pad[in] {"pad_token", pad_width}. Padding performed on the right side of the sequence. pad_width is capped at n-1. For example, right_pad={"-", 2} pads the right side of the sequence with "--" (default={"", 0}).

  • separator[in] Symbol used to join strings together (default=" ").

Example
/* Define operations */
auto ngram_op = text::Ngram({2, 3}, {"&", 2}, {"&", 2}, "-");

/* dataset is an instance of Dataset object */
dataset = dataset->Map({ngram_op},   // operations
                       {"text"});    // input columns
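
The parameter descriptions above can be illustrated with a small standalone sketch. This is not MindSpore's implementation: MakeNgrams is a hypothetical helper written only to show how the n-gram sizes, the pad width cap of n-1, and the separator interact. It assumes that padding tokens count toward the sequence length, and that a single empty string is emitted for a size n when the padded sequence is shorter than n; the library's exact behavior in these edge cases may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of MindSpore): builds n-grams from a list of
// tokens following the rules described in the parameter list above.
std::vector<std::string> MakeNgrams(
    const std::vector<std::string> &tokens, const std::vector<int32_t> &ngrams,
    const std::pair<std::string, int32_t> &left_pad = {"", 0},
    const std::pair<std::string, int32_t> &right_pad = {"", 0},
    const std::string &separator = " ") {
  std::vector<std::string> result;
  for (int32_t n : ngrams) {
    assert(n > 0);  // ngrams must contain positive integers only

    // Cap each pad width at n-1 (and never let it go below zero).
    const int32_t left = std::max<int32_t>(0, std::min<int32_t>(left_pad.second, n - 1));
    const int32_t right = std::max<int32_t>(0, std::min<int32_t>(right_pad.second, n - 1));

    // Padded sequence: left pad tokens, the input tokens, right pad tokens.
    std::vector<std::string> padded;
    padded.insert(padded.end(), static_cast<std::size_t>(left), left_pad.first);
    padded.insert(padded.end(), tokens.begin(), tokens.end());
    padded.insert(padded.end(), static_cast<std::size_t>(right), right_pad.first);

    // Assumption: too few items for an n-gram yields one empty string.
    if (padded.size() < static_cast<std::size_t>(n)) {
      result.emplace_back("");
      continue;
    }

    // Slide a window of width n over the padded sequence.
    for (std::size_t start = 0; start + n <= padded.size(); ++start) {
      std::string gram = padded[start];
      for (int32_t k = 1; k < n; ++k) {
        gram += separator + padded[start + k];
      }
      result.push_back(gram);
    }
  }
  return result;
}
```

Under these assumptions, MakeNgrams({"a", "b", "c"}, {2}, {"_", 2}, {"", 0}, "-") caps the left pad width at 1 and returns {"_-a", "a-b", "b-c"}.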
Ngram(const std::vector<int32_t> &ngrams, const std::pair<std::vector<char>, int32_t> &left_pad, const std::pair<std::vector<char>, int32_t> &right_pad, const std::vector<char> &separator)

Constructor.

Parameters
  • ngrams[in] A vector of positive integers giving the n-gram sizes to generate. For example, if ngrams={4, 3}, the result contains the 4-grams followed by the 3-grams in the same tensor. If there are not enough words to make up an n-gram, an empty string is returned.

  • left_pad[in] {"pad_token", pad_width}. Padding performed on the left side of the sequence. pad_width is capped at n-1. For example, left_pad={"_", 2} pads the left side of the sequence with "__" (default={"", 0}).

  • right_pad[in] {"pad_token", pad_width}. Padding performed on the right side of the sequence. pad_width is capped at n-1. For example, right_pad={"-", 2} pads the right side of the sequence with "--" (default={"", 0}).

  • separator[in] Symbol used to join strings together (default=" ").

~Ngram() = default

Destructor.