BEBERT: Efficient and Robust Binary Ensemble BERT

Jiayi Tian; Chao Fang; Haonan Wang; Zhongfeng Wang

arXiv:2210.15976 · cs.CL · May 10, 2023

TL;DR

BEBERT is an ensemble of binary BERT models that improves accuracy and robustness over existing binary BERTs while retaining computational efficiency. It trains faster by dropping knowledge distillation and is far smaller than full-precision BERT.

Contribution

To the authors' knowledge, this is the first work to apply ensemble techniques to binary BERT models. It achieves superior accuracy and robustness without knowledge distillation during ensembling and demonstrates practical efficiency gains.

Findings

01. BEBERT outperforms existing binary BERT models in accuracy and robustness.

02. BEBERT achieves a 2x training speedup by removing knowledge distillation from the ensemble procedure.

03. BEBERT reduces model size by 13x and FLOPs by 15x relative to full-precision BERT, with minimal accuracy loss.

Abstract

Pre-trained BERT models have achieved impressive accuracy on natural language processing (NLP) tasks. However, their excessive number of parameters hinders efficient deployment on edge devices. Binarizing BERT models can significantly alleviate this issue, but it causes a severe accuracy drop compared with the full-precision counterparts. In this paper, we propose an efficient and robust binary ensemble BERT (BEBERT) to bridge the accuracy gap. To the best of our knowledge, this is the first work employing ensemble techniques on binary BERTs, yielding BEBERT, which achieves superior accuracy while retaining computational efficiency. Furthermore, we remove the knowledge distillation procedures during ensemble to speed up the training process without compromising accuracy. Experimental results on the GLUE benchmark show that the proposed BEBERT significantly…
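The abstract names two ingredients, weight binarization and ensembling, but this excerpt does not say which binarization scheme or which ensemble rule BEBERT uses. The sketch below is therefore only an illustration under stated assumptions: weights are binarized to a sign times a per-row scale (the mean absolute value), each ensemble member is a toy linear classifier rather than a BERT, and members are combined by averaging their logits. The function names `binarize`, `linear_logits`, and `ensemble_predict` are hypothetical and are not taken from the paper or its repository.

```python
from typing import List, Sequence, Tuple


def binarize(weights: Sequence[float]) -> List[float]:
    """Map each weight to +alpha or -alpha.

    Assumption: alpha is the mean absolute value of the row, a common
    scaling choice for binary networks. The excerpt does not confirm
    that BEBERT uses this scheme.
    """
    alpha = sum(abs(w) for w in weights) / len(weights)
    return [alpha if w >= 0 else -alpha for w in weights]


def linear_logits(weight_rows: Sequence[Sequence[float]],
                  features: Sequence[float]) -> List[float]:
    """Compute one logit per class as a dot product with that class's row."""
    return [sum(w * x for w, x in zip(row, features)) for row in weight_rows]


def ensemble_predict(members: Sequence[Sequence[Sequence[float]]],
                     features: Sequence[float]) -> Tuple[int, List[float]]:
    """Average the logits of all members and return (argmax class, averages).

    Assumption: plain logit averaging stands in for whatever ensemble
    rule the paper actually applies.
    """
    n_classes = len(members[0])
    totals = [0.0] * n_classes
    for rows in members:
        for i, logit in enumerate(linear_logits(rows, features)):
            totals[i] += logit
    averages = [t / len(members) for t in totals]
    label = max(range(n_classes), key=averages.__getitem__)
    return label, averages


if __name__ == "__main__":
    # Two toy members, each with two classes and two input features.
    member_a = [binarize([0.5, -1.5]), binarize([-2.0, 2.0])]
    member_b = [binarize([3.0, 1.0]), binarize([-1.0, -1.0])]
    print(ensemble_predict([member_a, member_b], [1.0, 0.0]))
```

Each binarized row stores only signs plus one scale, which is where the storage savings of binary models come from. The ensemble then pays for several such members in exchange for recovering accuracy.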

Code & Models

Repositories

ttttttris/bebert (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Machine Learning and Data Classification

Methods: Multi-Head Attention · Attention Is All You Need · Linear Warmup With Linear Decay · Attention Dropout · Weight Decay · Dense Connections · Linear Layer · Layer Normalization · Residual Connection