Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models

Xiuying Wei; Yunchen Zhang; Xiangguo Zhang; Ruihao Gong; Shanghang Zhang; Qi Zhang; Fengwei Yu; Xianglong Liu

arXiv:2209.13325·cs.LG·February 22, 2023·28 cites

TL;DR

This paper introduces a novel outlier suppression framework for low-bit transformer models, significantly improving quantization performance and enabling 6-bit BERT quantization to reach full-precision accuracy.

Contribution

It reveals the role of LayerNorm gamma as an outlier amplifier and proposes Gamma Migration and Token-Wise Clipping to effectively suppress outliers without extra computational burden.
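One way to picture Gamma Migration is as an equivalent rewrite: drop the per-channel scale from LayerNorm and fold it into the weights of the linear layer that follows, so the tensor handed to the quantizer is no longer amplified by gamma. The NumPy sketch below is illustrative only and is not taken from the authors' repository. The function names (`non_scaling_layer_norm`, `migrate_gamma`), the tensor shapes, and the choice to fold gamma into a single following linear layer are assumptions, and it requires every gamma entry to be nonzero.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """Standard LayerNorm over the last axis."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def non_scaling_layer_norm(x, gamma, beta, eps=1e-5):
    """Hypothetical LayerNorm with gamma removed; equals layer_norm(x) / gamma."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps) + beta / gamma

def migrate_gamma(weight, gamma):
    """Fold gamma into the input channels of a (out_features, in_features) weight."""
    return weight * gamma[None, :]

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))              # 4 tokens, hidden size 8
gamma = rng.uniform(0.5, 2.0, size=8)
gamma[3] = 20.0                          # one channel with an unusually large scale
beta = rng.normal(size=8)
weight = rng.normal(size=(16, 8))        # following linear layer, no bias

# Original path: LayerNorm with gamma, then the linear layer.
reference = layer_norm(x, gamma, beta) @ weight.T
# Migrated path: gamma-free LayerNorm, then the linear layer with gamma folded in.
migrated = non_scaling_layer_norm(x, gamma, beta) @ migrate_gamma(weight, gamma).T

max_abs_error = np.abs(reference - migrated).max()
print("max abs difference between the two paths:", max_abs_error)
```

Because the two paths agree up to floating-point error, the rewrite adds no work at inference time; only the tensor that gets quantized changes.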

Findings

01

Surpasses existing methods in outlier suppression

02

Enables 6-bit BERT quantization to match full-precision performance

03

Provides a plug-and-play framework for low-bit transformer quantization

Abstract

Transformer architecture has become the fundamental element of the widespread natural language processing (NLP) models. With the trends of large NLP models, the increasing memory and computation costs hinder their efficient deployment on resource-limited devices. Therefore, transformer quantization attracts wide research interest. Recent work recognizes that structured outliers are the critical bottleneck for quantization performance. However, their proposed methods increase the computation overhead and still leave the outliers there. To fundamentally address this problem, this paper delves into the inherent inducement and importance of the outliers. We discover that $γ$ in LayerNorm (LN) acts as a sinful amplifier for the outliers, and the importance of outliers varies greatly where some outliers provided by a few tokens cover a large area but can be clipped sharply…
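The observation that a few tokens contribute outliers spanning a large range suggests choosing the clipping range from per-token statistics rather than from the global minimum and maximum. The sketch below is a rough illustration under stated assumptions, not the paper's exact procedure. The quantile-based rule, the `alpha` parameter, the function names, and the synthetic data are all hypothetical choices made for this example.

```python
import numpy as np

def token_wise_clip_range(activations, alpha=0.9):
    """Pick a clipping range from per-token extrema (assumed quantile rule).

    activations: array of shape (num_tokens, hidden_size).
    """
    token_max = activations.max(axis=-1)   # one maximum per token
    token_min = activations.min(axis=-1)   # one minimum per token
    upper = np.quantile(token_max, alpha)
    lower = np.quantile(token_min, 1.0 - alpha)
    return lower, upper

def fake_quantize(x, lower, upper, bits=6):
    """Clip to [lower, upper], then round onto a uniform grid with 2**bits levels."""
    levels = 2 ** bits - 1
    scale = (upper - lower) / levels
    clipped = np.clip(x, lower, upper)
    return np.round((clipped - lower) / scale) * scale + lower

rng = np.random.default_rng(0)
activations = rng.normal(size=(16, 8))   # 16 tokens, hidden size 8
activations[0, 2] = 50.0                 # a single token carries extreme values
activations[0, 5] = -50.0

lower, upper = token_wise_clip_range(activations, alpha=0.9)
quantized = fake_quantize(activations, lower, upper, bits=6)

print("min-max range:   ", activations.min(), activations.max())
print("token-wise range:", lower, upper)
```

With only one token carrying the extreme values, the quantile over token maxima ignores it, so the quantization grid is spent on the range where most activations lie.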

Code & Models

Repositories

wimh966/outlier_suppression · PyTorch · Official

Videos

Outlier Suppression: Pushing the Limit of Low-bit Transformer Language Models · SlidesLive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Dropout · Dense Connections · Residual Connection · Weight Decay · Attention Dropout