LadaBERT: Lightweight Adaptation of BERT through Hybrid Model Compression
Yihuan Mao, Yujing Wang, Chufan Wu, Chen Zhang, Yang Wang, Yaming Yang, Quanlu Zhang, Yunhai Tong, Jing Bai

TL;DR
LadaBERT introduces a hybrid model compression approach combining pruning, factorization, and distillation to make BERT more efficient for online applications, maintaining high accuracy with significantly reduced training costs.
Contribution
The paper presents LadaBERT, a novel hybrid compression method that reduces training overhead while preserving BERT's performance, addressing limitations of existing distillation techniques.
Findings
Achieves state-of-the-art accuracy on multiple datasets.
Reduces training overhead by an order of magnitude.
Combines multiple compression techniques effectively.
Abstract
BERT is a cutting-edge language representation model pre-trained on a large corpus, which achieves superior performance on various natural language understanding tasks. However, a major blocking issue of applying BERT to online services is that it is memory-intensive and leads to unsatisfactory latency of user requests, raising the necessity of model compression. Existing solutions leverage the knowledge distillation framework to learn a smaller model that imitates the behaviors of BERT. However, the training procedure of knowledge distillation is itself expensive, as it requires sufficient training data to imitate the teacher model. In this paper, we address this issue by proposing a hybrid solution named LadaBERT (Lightweight adaptation of BERT through hybrid model compression), which combines the advantages of different model compression methods, including weight pruning, matrix…
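As a rough illustration of two of the techniques named in the abstract, the sketch below applies truncated-SVD factorization and then magnitude-based weight pruning to a single random matrix. It is not LadaBERT's actual procedure: the function names, the 768 x 768 shape, the rank of 64, and the 50% sparsity are all assumptions chosen for illustration, and the distillation step is omitted.

```python
import numpy as np

def low_rank_factorize(weight, rank):
    """Approximate an (m x n) matrix as A @ B via truncated SVD.

    A has shape (m, rank) and B has shape (rank, n).
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    a = u[:, :rank] * s[:rank]  # fold singular values into the left factor
    b = vt[:rank, :]
    return a, b

def magnitude_prune(weight, sparsity):
    """Zero out the `sparsity` fraction of entries with smallest magnitude."""
    k = int(round(sparsity * weight.size))
    if k == 0:
        return weight.copy()
    # The k-th smallest absolute value serves as the pruning threshold.
    threshold = np.partition(np.abs(weight).ravel(), k - 1)[k - 1]
    pruned = weight.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

# Hypothetical stand-in for one dense layer's weights (not real BERT weights).
rng = np.random.default_rng(0)
w = rng.standard_normal((768, 768))

a, b = low_rank_factorize(w, rank=64)   # assumed rank
a_pruned = magnitude_prune(a, 0.5)      # assumed sparsity
b_pruned = magnitude_prune(b, 0.5)

original_params = w.size
remaining_params = np.count_nonzero(a_pruned) + np.count_nonzero(b_pruned)
print(f"parameters: {original_params} -> {remaining_params}")
```

In a hybrid scheme of this kind, the compressed factors would then be fine-tuned, for example with a distillation loss against the original model, to recover accuracy lost in the two steps above.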
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Linear Layer · Knowledge Distillation · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Dense Connections · Adam · WordPiece
