DynaBERT: Dynamic BERT with Adaptive Width and Depth
Lu Hou, Zhiqi Huang, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu

TL;DR
DynaBERT is a flexible, dynamic version of BERT that can adapt its size and latency for different hardware constraints, maintaining high performance across various configurations.
Contribution
The paper introduces DynaBERT, a novel method for training BERT models that can dynamically adjust width and depth, enabling efficient deployment on diverse edge devices.
Findings
DynaBERT achieves comparable performance to BERT-base at its largest size.
It outperforms existing BERT compression methods at smaller sizes.
The model maintains high accuracy across various efficiency constraints.
Abstract
The pre-trained language models like BERT, though powerful in many natural language processing tasks, are both computation and memory expensive. To alleviate this problem, one approach is to compress them for specific tasks before deployment. However, recent works on BERT compression usually compress the large BERT model to a fixed smaller size. They can not fully satisfy the requirements of different edge devices with various hardware performances. In this paper, we propose a novel dynamic BERT model (abbreviated as DynaBERT), which can flexibly adjust the size and latency by selecting adaptive width and depth. The training process of DynaBERT includes first training a width-adaptive BERT and then allowing both adaptive width and depth, by distilling knowledge from the full-sized model to small sub-networks. Network rewiring is also used to keep the more important attention heads and…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
Taxonomy
TopicsTopic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
MethodsLinear Layer · DynaBERT · RoBERTa · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Refunds@Expedia|||How do I get a full refund from Expedia? · Dense Connections · Adam
