Quantization Aware Training, ERNIE and Kurtosis Regularizer: a short empirical study
Andrea Zanetti

TL;DR
This paper empirically investigates quantization aware training (QAT) for pre-trained models like Ernie, highlighting challenges and proposing a simple regularizer approach to improve INT8 accuracy in low-precision inference.
Contribution
It identifies why existing regularizers do not work out of the box for Ernie and proposes a basic method to enhance quantization robustness for pre-trained models.
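For context, the kurtosis regularizer named in the title is commonly formulated as a penalty that pushes the kurtosis of each weight tensor toward a target value. The usual target is 1.8, the kurtosis of a uniform distribution, because uniformly spread weights make good use of a uniform INT8 grid. The sketch below assumes that common form. It is not the author's proposed variant, which is not described here. The function names `kurtosis` and `kurtosis_penalty` are hypothetical, and it uses plain Python lists instead of a deep-learning framework.

```python
from typing import Sequence


def kurtosis(weights: Sequence[float]) -> float:
    """Kurtosis E[((w - mu) / sigma)^4] of a flat weight list (population moments)."""
    n = len(weights)
    mu = sum(weights) / n
    var = sum((w - mu) ** 2 for w in weights) / n
    if var == 0.0:
        raise ValueError("kurtosis is undefined for constant weights")
    fourth_moment = sum((w - mu) ** 4 for w in weights) / n
    return fourth_moment / var ** 2


def kurtosis_penalty(layers: Sequence[Sequence[float]], target: float = 1.8) -> float:
    """Mean squared gap between each layer's kurtosis and the target value.

    Assumption: this follows the common kurtosis-regularizer form; the default
    target 1.8 is the kurtosis of a uniform distribution.
    """
    return sum((kurtosis(w) - target) ** 2 for w in layers) / len(layers)


# Hypothetical use in training: total_loss = task_loss + lam * kurtosis_penalty(layer_weights)
```

Heavy-tailed weight tensors (a few large outliers) have high kurtosis and are penalized most, since the outliers stretch the quantization range and waste INT8 levels on rarely used values.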
Findings
Initial results show increased INT8 accuracy with the proposed regularizer.
Analysis explains the incompatibility of existing regularizers with Ernie.
Provides practical insights for low-precision deployment of pre-trained models.
Abstract
Pre-trained language models such as Ernie or Bert are currently used in many applications. These models come with a set of pre-trained weights, typically obtained through unsupervised/self-supervised training on a huge amount of data, and are then fine-tuned on a specific task. Applications use the fine-tuned models for inference, often under additional constraints such as a low power budget or low latency between input and output. The main avenue to meet these inference requirements is low-precision computation (e.g. INT8 rather than FP32), but this comes at the cost of degrading the functional performance (e.g. accuracy) of the model. Several approaches have been developed to tackle the problem and go beyond the limitations of PTQ (Post-Training Quantization); more specifically, QAT (Quantization Aware Training, see [4]) is a procedure that…
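To make the INT8 trade-off concrete, here is a minimal sketch of symmetric per-tensor INT8 "fake quantization" (quantize, then dequantize back to float). This is the kind of operation QAT inserts into the forward pass so that the model is trained to tolerate rounding error. It is written under stated assumptions: the scale maps max |x| to 127, Python's round-half-to-even is used for rounding, and the helpers `int8_scale`, `quantize`, `dequantize` and `fake_quant` are hypothetical names. Real QAT frameworks also pass gradients through the rounding step with a straight-through estimator, which this forward-only sketch does not model.

```python
from typing import List, Sequence


def int8_scale(x: Sequence[float]) -> float:
    """Symmetric per-tensor scale: the largest magnitude maps to 127."""
    amax = max(abs(v) for v in x)
    return amax / 127.0 if amax > 0 else 1.0


def quantize(x: Sequence[float], scale: float) -> List[int]:
    """Round to the nearest INT8 level and clamp to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in x]


def dequantize(q: Sequence[int], scale: float) -> List[float]:
    """Map INT8 levels back to floating point."""
    return [v * scale for v in q]


def fake_quant(x: Sequence[float]) -> List[float]:
    """Quantize-dequantize round trip, as simulated during QAT's forward pass."""
    s = int8_scale(x)
    return dequantize(quantize(x, s), s)


# A single outlier stretches the scale, so the small values collapse to zero:
print(fake_quant([0.01, -0.02, 0.03, 100.0]))
```

The outlier example shows why the weight distribution matters for INT8 accuracy. With max |x| = 100 the step size is about 0.79, so every value below roughly 0.39 in magnitude is rounded to zero.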
Taxonomy
Topics: Neural Networks and Applications · Image Retrieval and Classification Techniques · Generative Adversarial Networks and Image Synthesis
Methods: Multi-Head Attention · Attention Is All You Need · ERNIE · Linear Layer · Attentive Walk-Aggregating Graph Neural Network · Weight Decay · Adam · Layer Normalization · WordPiece · Dropout
