LadaBERT: Lightweight Adaptation of BERT through Hybrid Model Compression

Yihuan Mao; Yujing Wang; Chufan Wu; Chen Zhang; Yang Wang; Yaming Yang; Quanlu Zhang; Yunhai Tong; Jing Bai

arXiv:2004.04124 · cs.CL · October 22, 2020 · 19 cites

TL;DR

LadaBERT introduces a hybrid model compression approach combining pruning, factorization, and distillation to make BERT more efficient for online applications, maintaining high accuracy with significantly reduced training costs.

Contribution

The paper presents LadaBERT, a novel hybrid compression method that reduces training overhead while preserving BERT's performance, addressing limitations of existing distillation techniques.

Findings

1. Achieves state-of-the-art accuracy on multiple datasets.
2. Reduces training overhead by an order of magnitude.
3. Combines multiple compression techniques effectively.

Abstract

BERT is a cutting-edge language representation model pre-trained on a large corpus, which achieves superior performance on various natural language understanding tasks. However, a major obstacle to applying BERT in online services is that it is memory-intensive and leads to unsatisfactory latency for user requests, which makes model compression necessary. Existing solutions leverage the knowledge distillation framework to learn a smaller model that imitates the behavior of BERT. However, the training procedure of knowledge distillation is itself expensive, as it requires sufficient training data to imitate the teacher model. In this paper, we address this issue by proposing a hybrid solution named LadaBERT (Lightweight adaptation of BERT through hybrid model compression), which combines the advantages of different model compression methods, including weight pruning, matrix…
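
To make the three ingredients named in this summary concrete (weight pruning, matrix factorization, and knowledge distillation), the following is a minimal NumPy sketch of each one in isolation. It is an illustration under stated assumptions, not the authors' implementation: the function names, the magnitude-based pruning criterion, the truncated-SVD factorization, and the temperature-scaled soft-label loss are generic textbook choices picked for this example.

```python
import numpy as np


def magnitude_prune(w, sparsity):
    """Zero out roughly the `sparsity` fraction of entries with smallest |value|.

    Assumption: unstructured magnitude pruning, a common generic criterion.
    """
    k = int(round(sparsity * w.size))
    if k == 0:
        return w.copy()
    # The k-th smallest absolute value serves as the pruning threshold.
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    pruned = w.copy()
    pruned[np.abs(w) <= threshold] = 0.0
    return pruned


def low_rank_factorize(w, rank):
    """Approximate w (m x n) as a @ b with a: m x rank and b: rank x n.

    Assumption: truncated SVD, which gives the best rank-`rank`
    approximation in the Frobenius norm.
    """
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    a = u[:, :rank] * s[:rank]  # fold singular values into the left factor
    b = vt[:rank, :]
    return a, b


def _log_softmax(z):
    """Numerically stable log-softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Cross-entropy of the student against the teacher's softened outputs.

    Assumption: the usual temperature-scaled soft-label objective;
    `temperature` is a hypothetical hyperparameter for this sketch.
    """
    teacher_probs = np.exp(_log_softmax(teacher_logits / temperature))
    student_log_probs = _log_softmax(student_logits / temperature)
    return float(-(teacher_probs * student_log_probs).sum(axis=-1).mean())
```

For a weight matrix of shape m x n, the two factors store rank * (m + n) values instead of m * n, so factorization only saves parameters when the rank is below m * n / (m + n). How LadaBERT orders and schedules these steps is not stated in the excerpt above, so the sketch deliberately leaves the combination open.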

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Linear Layer · Knowledge Distillation · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Dense Connections · Adam · WordPiece