BinaryBERT: Pushing the Limit of BERT Quantization

Haoli Bai; Wei Zhang; Lu Hou; Lifeng Shang; Jing Jin; Xin Jiang; Qun Liu; Michael Lyu; Irwin King

arXiv:2012.15701·cs.CL·July 23, 2021·45 cites

TL;DR

BinaryBERT binarizes the weights of BERT. Because a binary model is hard to train directly, it is initialized from a half-sized ternary network via ternary weight splitting and then fine-tuned, yielding a model 24x smaller than full-precision BERT with only a slight performance drop.

Contribution

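The paper observes that a binary BERT is harder to train directly than a ternary one because of its complex and irregular loss landscape. It proposes ternary weight splitting, which initializes the binary model by equivalently splitting a half-sized ternary network, so that the binary model inherits the ternary model's performance before further fine-tuning.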

Findings

01 · BinaryBERT is 24x smaller than full-precision BERT.

02 · It achieves state-of-the-art compression on GLUE and SQuAD.

03 · Performance drop is minimal compared to full-precision models.

Abstract

The rapid development of large pre-trained language models has greatly increased the demand for model compression techniques, among which quantization is a popular solution. In this paper, we propose BinaryBERT, which pushes BERT quantization to the limit by weight binarization. We find that a binary BERT is harder to train directly than a ternary counterpart due to its complex and irregular loss landscape. Therefore, we propose ternary weight splitting, which initializes BinaryBERT by equivalently splitting from a half-sized ternary network. The binary model thus inherits the good performance of the ternary one, and can be further enhanced by fine-tuning the new architecture after splitting. Empirical results show that our BinaryBERT has only a slight performance drop compared with the full-precision model while being 24x smaller, achieving the state-of-the-art compression results…
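The idea of "equivalently splitting" a ternary network can be illustrated with a small NumPy sketch. This is a simplified illustration and not the paper's full operator: it assumes a ternary weight matrix whose entries are already quantized to {-alpha, 0, +alpha}, and splits it into two binary matrices with entries in {-alpha/2, +alpha/2} whose sum reproduces the ternary one. The function name `split_ternary` and the scale `alpha` are hypothetical names chosen for this example; how latent full-precision weights are handled is not covered here.

```python
import numpy as np


def split_ternary(w_t, alpha):
    """Split ternary weights into two binary halves that sum to w_t.

    Assumes every entry of w_t is exactly -alpha, 0, or +alpha.
    Returns two arrays whose entries are -alpha/2 or +alpha/2.
    """
    half = alpha / 2.0
    # Nonzero entries: both halves share the sign of w_t, so they add to +/-alpha.
    # Zero entries: the halves take opposite signs and cancel.
    b1 = np.where(w_t < 0, -half, half)
    b2 = np.where(w_t > 0, half, -half)
    return b1, b2


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    alpha = 0.5
    # Random ternary weight matrix with entries in {-alpha, 0, +alpha}.
    w_t = rng.integers(-1, 2, size=(4, 3)) * alpha
    b1, b2 = split_ternary(w_t, alpha)

    x = rng.standard_normal((2, 4))
    # Stacking the two binary halves doubles the layer width while
    # leaving the layer output unchanged.
    wide_out = np.concatenate([x, x], axis=1) @ np.concatenate([b1, b2], axis=0)
    print(np.allclose(x @ w_t, wide_out))
```

Because the split preserves the layer's output, the wider binary layer starts from the same function as the ternary one, which matches the abstract's description of the binary model inheriting the ternary model's performance before fine-tuning.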

Code & Models

Repositories

huawei-noah/Pretrained-Language-Model
tf · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Linear Layer · TernaryBERT · Ternary Weight Splitting · BinaryBERT · Dropout · Softmax · Linear Warmup With Linear Decay · Dense Connections · Attention Dropout · Attention Is All You Need