Intriguing Properties of Compression on Multilingual Models
Kelechi Ogueji, Orevaoghene Ahia, Gbemileke Onilude, Sebastian Gehrmann, Sara Hooker, Julia Kreutzer

TL;DR
This paper investigates how compression techniques affect multilingual models during fine-tuning, revealing that compression can improve robustness and support low-resource languages, challenging prior assumptions.
Contribution
It introduces an experimental framework to analyze sparsification effects on multilingual models, uncovering new benefits of compression for robustness and low-resource language performance.
Findings
Compression can enhance model robustness.
Sparsification may support low-resource languages.
Contrary to prior beliefs, compression does not always harm performance.
Abstract
Multilingual models are often particularly dependent on scaling to generalize to a growing number of languages. Compression techniques are widely relied upon to reconcile the growth in model size with real world resource constraints, but compression can have a disparate effect on model performance for low-resource languages. It is thus crucial to understand the trade-offs between scale, multilingualism, and compression. In this work, we propose an experimental framework to characterize the impact of sparsifying multilingual pre-trained language models during fine-tuning. Applying this framework to mBERT named entity recognition models across 40 languages, we find that compression confers several intriguing and previously unknown generalization properties. In contrast to prior findings, we find that compression may improve model robustness over dense models. We additionally observe that…
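As a rough illustration of what "sparsifying during fine-tuning" can mean, the sketch below zeroes out the smallest-magnitude weights and ramps the target sparsity up over training steps. This is an assumption for illustration only: the abstract does not say which pruning criterion or schedule the authors use, and the names `magnitude_prune` and `sparsity_schedule` are hypothetical, not taken from the paper or from any library.

```python
from typing import List


def magnitude_prune(weights: List[float], sparsity: float) -> List[float]:
    """Zero out the fraction `sparsity` of weights with the smallest magnitude.

    Assumption: magnitude-based pruning is used here as one common
    sparsification criterion; the paper's exact method is not stated above.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")
    n_prune = int(len(weights) * sparsity)
    # Indices ordered by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_idx = set(order[:n_prune])
    return [0.0 if i in pruned_idx else w for i, w in enumerate(weights)]


def sparsity_schedule(step: int, total_steps: int, final_sparsity: float) -> float:
    """Hypothetical linear ramp from 0 to `final_sparsity` over fine-tuning."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    frac = min(max(step / total_steps, 0.0), 1.0)
    return final_sparsity * frac


# Toy weight vector standing in for one layer of a fine-tuned model.
weights = [0.8, -0.05, 0.3, -0.6, 0.01, 0.2]
pruned = magnitude_prune(weights, sparsity=0.5)
print(pruned)
```

In an actual fine-tuning loop, the pruning step would be applied to the model's weight tensors every few updates, with the target taken from the schedule at the current step; dense and sparse models could then be compared per language.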
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: mBERT
