Deprecated
This page belongs to the Static Graph (GRAPH_MODE) Implementation section and has been marked as deprecated. New features are primarily being developed in the "r2.0.0 Dynamic Graph Implementation" section. Please refer to the dynamic graph documentation first.
Quantization
Overview
Quantization is an important technique for compressing foundation models. It converts a model's floating-point parameters into low-precision integer parameters, which shrinks the space they occupy. As models grow in parameter count and size, quantization effectively reduces storage space and loading time during deployment and improves inference performance.
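To make the float-to-integer conversion concrete, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It is an illustration only and is not the algorithm that MindSpore Golden Stick applies; the function names `quantize_int8` and `dequantize_int8` and the sample weights are hypothetical.

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization of a list of floats to int8 values.

    Hypothetical helper for illustration; not part of any MindSpore API.
    """
    max_abs = max(abs(w) for w in weights)
    # The scale maps the largest magnitude onto 127; guard against all zeros.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the symmetric int8 range.
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_int8(quantized, scale):
    """Recover approximate float values from int8 values and the scale."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.02, 1.0]  # made-up sample values
quantized, scale = quantize_int8(weights)
restored = dequantize_int8(quantized, scale)

# Each restored value differs from the original by at most half a step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(quantized, scale, max_error)
```

Each int8 value takes 1 byte, compared with 4 bytes for a float32 value or 2 bytes for float16, which is where the storage and loading-time savings come from. The cost is a rounding error bounded by half of `scale` per parameter.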
MindSpore Transformers integrates the MindSpore Golden Stick tool component to provide a unified quantization inference process that works out of the box. Please refer to MindSpore Golden Stick Installation Tutorial for installation, and to MindSpore Golden Stick Application PTQ algorithm to quantize the models in MindSpore Transformers.
Model Support
Currently, only the following models are supported. Support for more models is being added continuously.
| Supported Model |
|---|