比较与torchtext.data.functional.load_sp_model的差异

查看源文件

torchtext.data.functional.load_sp_model

torchtext.data.functional.load_sp_model(
    spm
)

更多内容详见torchtext.data.functional.load_sp_model

mindspore.dataset.text.SentencePieceTokenizer

class mindspore.dataset.text.SentencePieceTokenizer(mode, out_type)

更多内容详见mindspore.dataset.text.SentencePieceTokenizer

使用方式

PyTorch:加载SentencePiece分词模型。

MindSpore:构造一个SentencePiece分词器,包含加载SentencePiece模型功能。

分类

子类

PyTorch

MindSpore

差异

参数

参数1

spm

mode

MindSpore支持SentencePiece词汇表或SentencePiece模型地址

参数2

-

out_type

分词器输出的类型

代码示例

from download import download

url = "https://mindspore-website.obs.cn-north-4.myhuaweicloud.com/notebook/datasets/sentencepiece.bpe.model"
download(url, './sentencepiece.bpe.model', replace=True)

# PyTorch
from torchtext.data.functional import load_sp_model
model = load_sp_model("sentencepiece.bpe.model")

# MindSpore
import mindspore.dataset.text as text
model = text.SentencePieceTokenizer("sentencepiece.bpe.model", out_type=text.SPieceTokenizerOutType.STRING)