LightMBERT: A Simple Yet Effective Method for Multilingual BERT Distillation
Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu

TL;DR
LightMBERT is a straightforward distillation approach that effectively transfers multilingual BERT's capabilities to smaller models, enabling efficient deployment without significant performance loss.
Contribution
The paper introduces LightMBERT, a simple distillation method that transfers the cross-lingual generalization ability of multilingual BERT (mBERT) to a small student model.
Findings
LightMBERT performs significantly better than the baseline distillation methods.
The distilled student model performs comparably to its teacher, mBERT.
The smaller student is cheaper to run, which makes it better suited to resource-restricted devices.
Abstract
Multilingual pre-trained language models (e.g., mBERT, XLM and XLM-R) have shown impressive performance on cross-lingual natural language understanding tasks. However, these models are computationally intensive and difficult to deploy on resource-restricted devices. In this paper, we propose a simple yet effective distillation method (LightMBERT) for transferring the cross-lingual generalization ability of multilingual BERT to a small student model. The experimental results empirically demonstrate the efficiency and effectiveness of LightMBERT, which is significantly better than the baselines and performs comparably to the teacher mBERT.
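The abstract does not spell out the training objective, so the following is only a minimal sketch of a generic teacher-student distillation loss (soft-label matching with a temperature), included to illustrate what transferring a teacher's behaviour to a small student means in practice. It is not LightMBERT's actual objective. The function names, the temperature value, and the example logits are all assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) between temperature-softened distributions.

    Generic soft-label distillation, NOT the specific loss used by LightMBERT.
    The T^2 factor is the usual rescaling that keeps gradient magnitudes
    comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)  # teacher: the target distribution
    q = softmax(student_logits, temperature)  # student: the distribution being trained
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Hypothetical logits for one 3-class example.
teacher = [1.0, 2.0, 3.0]
matching_student = [1.0, 2.0, 3.0]
disagreeing_student = [3.0, 2.0, 1.0]

print(distillation_loss(matching_student, teacher))
print(distillation_loss(disagreeing_student, teacher))
```

The loss is zero when the student reproduces the teacher's distribution and grows as the two diverge, so minimizing it pulls the student toward the teacher's predictions. In a real setup the logits would come from the teacher and student networks and the loss would be minimized with an autodiff framework.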
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Linear Layer · mBERT · Residual Connection · Byte Pair Encoding · Linear Warmup With Linear Decay · Weight Decay · Multi-Head Attention · Dense Connections · Softmax
