BiBERT: Accurate Fully Binarized BERT

Haotong Qin; Yifu Ding; Mingyuan Zhang; Qinghua Yan; Aishan Liu; Qingqing Dang; Ziwei Liu; Xianglong Liu

arXiv:2203.06390 · cs.CL · March 15, 2022 · 33 cites

TL;DR

BiBERT is a fully binarized BERT model that significantly reduces computation and memory costs while maintaining high performance, achieved through novel attention and distillation techniques.

Contribution

This paper introduces BiBERT, the first fully binarized BERT, with new attention and distillation methods to overcome performance drops in binarization.

Findings

01. Outperforms existing quantized BERTs on NLP benchmarks
02. Reduces FLOPs by 56.3× and model size by 31.2×
03. Maintains competitive accuracy with ultra-low-bit activations

Abstract

The large pre-trained BERT has achieved remarkable performance on Natural Language Processing (NLP) tasks, but it is also expensive in computation and memory. As a powerful compression approach, binarization drastically reduces computation and memory consumption by using 1-bit parameters and bitwise operations. Unfortunately, the full binarization of BERT (i.e., 1-bit weights, embeddings, and activations) usually suffers a significant performance drop, and few studies have addressed this problem. In this paper, with theoretical justification and empirical analysis, we identify that the severe performance drop can be mainly attributed to information degradation in forward propagation and optimization direction mismatch in backward propagation, and we propose BiBERT, an accurate fully binarized BERT, to eliminate these performance bottlenecks. Specifically, BiBERT…
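To make the phrase "1-bit parameters and bitwise operations" concrete, the following is a minimal sketch of generic sign binarization and an XNOR-plus-popcount dot product. It is an illustration of the general idea only, not the paper's attention or distillation method; the function names `binarize`, `pack_bits`, and `binary_dot` are hypothetical and do not come from the paper or its repository.

```python
def binarize(values):
    """Map each real value to +1 or -1 by its sign (assumption: 0 maps to +1)."""
    return [1 if v >= 0 else -1 for v in values]


def pack_bits(signs):
    """Pack a list of +1/-1 values into one integer; bit i is set when signs[i] == +1."""
    word = 0
    for i, s in enumerate(signs):
        if s == 1:
            word |= 1 << i
    return word


def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed +1/-1 vectors of length n.

    XNOR marks the positions where the two signs agree. Each agreement
    contributes +1 and each disagreement contributes -1, so the result
    is 2 * (number of agreements) - n.
    """
    mask = (1 << n) - 1                 # keep only the n valid bit positions
    agree = ~(a_bits ^ b_bits) & mask   # XNOR restricted to n bits
    matches = bin(agree).count("1")     # popcount
    return 2 * matches - n


# Illustrative values only (not taken from the paper).
weights = binarize([0.7, -1.2, 0.1, -0.3])
activations = binarize([-0.5, -0.9, 2.0, 0.4])
result = binary_dot(pack_bits(weights), pack_bits(activations), len(weights))
```

Here `result` equals the ordinary dot product of the two sign vectors, but it is computed with one XOR, one NOT, and one bit count instead of per-element multiplications. That substitution is the general source of the compute savings binarization aims for.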

Code & Models

Repositories

htqin/bibert (PyTorch, official)

Videos

BiBERT: Accurate Fully Binarized BERT · SlidesLive

Taxonomy

Topics: Multimodal Machine Learning Applications · Speech Recognition and Synthesis · Natural Language Processing Techniques

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Dense Connections · Residual Connection · Weight Decay · Layer Normalization · Bilinear Attention · Linear Warmup With Linear Decay