MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices
Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, Denny Zhou

TL;DR
MobileBERT is a compact, task-agnostic version of BERT designed for resource-limited devices, achieving significant size and speed reductions while maintaining competitive NLP performance.
Contribution
The paper introduces MobileBERT, a smaller and faster BERT variant with a novel training method using knowledge transfer from a specially designed teacher model.
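The summary above only says "knowledge transfer" and does not spell out the objectives. As a rough illustration of what layer-wise transfer from a teacher to a student can look like, the sketch below computes two commonly used terms: a mean squared error between hidden feature maps and a KL divergence between attention distributions. The function names, the toy tensors, and the choice of these two losses are assumptions made for illustration, not a statement of the authors' exact training recipe.

```python
import math


def feature_map_mse(teacher, student):
    """Mean squared error between two [tokens][hidden] feature maps."""
    total = 0.0
    count = 0
    for t_row, s_row in zip(teacher, student):
        for t, s in zip(t_row, s_row):
            total += (t - s) ** 2
            count += 1
    return total / count


def attention_kl(teacher_probs, student_probs, eps=1e-12):
    """KL(teacher || student) for one head, averaged over query positions."""
    total = 0.0
    for t_row, s_row in zip(teacher_probs, student_probs):
        total += sum(
            t * math.log((t + eps) / (s + eps)) for t, s in zip(t_row, s_row)
        )
    return total / len(teacher_probs)


# Toy values (hypothetical): 2 tokens, hidden size 2.
teacher_hidden = [[1.0, 2.0], [3.0, 4.0]]
student_hidden = [[1.0, 2.5], [2.0, 4.0]]
hidden_loss = feature_map_mse(teacher_hidden, student_hidden)

# Toy attention rows (hypothetical): each row sums to 1.
teacher_attn = [[0.7, 0.3], [0.4, 0.6]]
student_attn = [[0.5, 0.5], [0.4, 0.6]]
attn_loss = attention_kl(teacher_attn, student_attn)

print(hidden_loss, attn_loss)
```

Both terms are zero when the student reproduces the teacher exactly and grow as the two diverge, which is what makes them usable as per-layer training signals.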
Findings
MobileBERT is 4.3x smaller than BERT_BASE.
MobileBERT is 5.5x faster than BERT_BASE.
MobileBERT achieves competitive results on NLP benchmarks.
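To make the two ratios above concrete, the arithmetic below converts them into absolute figures. The BERT_BASE baseline values (about 109M parameters and about 342 ms latency on a phone) are assumptions recalled from commonly cited numbers, not values stated in this summary, so the outputs are only indicative.

```python
# Baseline figures for BERT_BASE: ASSUMED, not taken from this page.
BERT_BASE_PARAMS_M = 109.0    # parameters, in millions (assumption)
BERT_BASE_LATENCY_MS = 342.0  # on-device latency in ms (assumption)

# Ratios reported in the Findings list.
SIZE_FACTOR = 4.3
SPEED_FACTOR = 5.5

implied_params_m = BERT_BASE_PARAMS_M / SIZE_FACTOR
implied_latency_ms = BERT_BASE_LATENCY_MS / SPEED_FACTOR

print(f"implied size:    ~{implied_params_m:.1f}M parameters")
print(f"implied latency: ~{implied_latency_ms:.0f} ms")
```

Under those assumed baselines, the ratios imply a model of roughly 25M parameters running in roughly 60 ms.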
Abstract
Natural Language Processing (NLP) has recently achieved great success by using huge pre-trained models with hundreds of millions of parameters. However, these models suffer from heavy model sizes and high latency such that they cannot be deployed to resource-limited mobile devices. In this paper, we propose MobileBERT for compressing and accelerating the popular BERT model. Like the original BERT, MobileBERT is task-agnostic, that is, it can be generically applied to various downstream NLP tasks via simple fine-tuning. Basically, MobileBERT is a thin version of BERT_LARGE, while equipped with bottleneck structures and a carefully designed balance between self-attentions and feed-forward networks. To train MobileBERT, we first train a specially designed teacher model, an inverted-bottleneck incorporated BERT_LARGE model. Then, we conduct knowledge transfer from this teacher to…
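The abstract describes MobileBERT as a thin model with bottleneck structures and a rebalanced ratio of self-attention to feed-forward capacity. The sketch below counts dense-layer parameters for a standard Transformer block and for a simplified bottleneck block, to show why narrowing the inner width shrinks each layer so sharply. The widths used (768/3072 for the standard block; 512 outer, 128 inner, 512 feed-forward, 4 stacked feed-forward networks for the bottleneck block) are assumptions for illustration, and the block layout is deliberately simplified: layer normalization, embeddings, and any extra input projections are ignored.

```python
def linear_params(d_in, d_out):
    """Weights plus biases of one dense layer."""
    return d_in * d_out + d_out


def standard_block_params(hidden, ffn):
    """Q, K, V, output projections plus one two-layer feed-forward network."""
    attention = 4 * linear_params(hidden, hidden)
    feed_forward = linear_params(hidden, ffn) + linear_params(ffn, hidden)
    return attention + feed_forward


def bottleneck_block_params(inter, intra, ffn, n_ffn):
    """Squeeze to a narrow width, attend and feed forward there, then expand."""
    squeeze = linear_params(inter, intra)
    attention = 4 * linear_params(intra, intra)
    feed_forward = n_ffn * (
        linear_params(intra, ffn) + linear_params(ffn, intra)
    )
    expand = linear_params(intra, inter)
    return squeeze + attention + feed_forward + expand


# Assumed dimensions (hypothetical, for illustration only).
standard = standard_block_params(hidden=768, ffn=3072)
thin = bottleneck_block_params(inter=512, intra=128, ffn=512, n_ffn=4)

print(standard, thin, round(standard / thin, 1))
```

Stacking several small feed-forward networks in the narrow block is one way to restore the balance between attention and feed-forward parameters that the bottleneck would otherwise skew toward attention.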
Code & Models
- 🤗 csarron/mobilebert-uncased-squad-v1 · model · 1.2k dl · ♡ 1
- 🤗 csarron/mobilebert-uncased-squad-v2 · model · 75k dl · ♡ 8
- 🤗 lordtt13/emo-mobilebert · model · 2.1k dl · ♡ 4
- 🤗 mrm8488/mobilebert-uncased-finetuned-squadv1 · model · 6 dl · ♡ 1
- 🤗 mrm8488/mobilebert-uncased-finetuned-squadv2 · model · 7 dl · ♡ 2
- 🤗 mrm8488/squeezebert-finetuned-squadv1 · model · 20 dl
- 🤗 mrm8488/squeezebert-finetuned-squadv2 · model · 7 dl
- 🤗 ysakuramoto/mobilebert-ja · model · 63 dl · ♡ 1
- 🤗 cambridgeltl/sst_mobilebert-uncased · model · 7 dl · ♡ 1
- 🤗 lifeweb-ai/shiraz · model · 14 dl · ♡ 8
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques
Methods: Linear Layer · Inverted Bottleneck BERT · MobileBERT · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Dense Connections · Adam
