On Multilingual Encoder Language Model Compression for Low-Resource Languages
Daniil Gurgurov, Michal Gregor, Josef van Genabith, Simon Ostermann

TL;DR
This paper presents an approach to compressing multilingual encoder language models for low-resource languages, combining knowledge distillation, structured pruning, truncation, and vocabulary trimming to substantially reduce model size while keeping performance competitive across several NLP tasks.
Contribution
It introduces a novel combination of knowledge distillation, pruning, truncation, and vocabulary trimming for extreme model compression tailored to low-resource languages.
Findings
Achieved up to 92% compression, with average performance drops of 2-10% at moderate compression and 8-13% at maximum compression
Smaller performance degradation with more language-specific data
Ablation studies identify best practices for compression
Abstract
In this paper, we combine two-step knowledge distillation, structured pruning, truncation, and vocabulary trimming for extremely compressing multilingual encoder-only language models for low-resource languages. Our novel approach systematically combines existing techniques and takes them to the extreme, reducing layer depth, feed-forward hidden size, and intermediate layer embedding size to create significantly smaller monolingual models while retaining essential language-specific knowledge. We achieve compression rates of up to 92% while maintaining competitive performance, with average drops of 2-10% for moderate compression and 8-13% at maximum compression in four downstream tasks, including sentiment analysis, topic classification, named entity recognition, and part-of-speech tagging, across three low-resource languages. Notably, the performance degradation correlates with the…
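Of the techniques named in the abstract, vocabulary trimming is the simplest to illustrate. The block below is a minimal sketch, not the authors' implementation: it assumes a toy token-to-id vocabulary and an embedding table stored as a list of rows, and it keeps only the rows for tokens observed in a target-language corpus plus a few special tokens. The function name `trim_vocabulary`, the special-token names, and the toy data are all hypothetical.

```python
def trim_vocabulary(vocab, embeddings, corpus_tokens,
                    special_tokens=("[PAD]", "[UNK]")):
    """Keep only the embedding rows for tokens seen in the corpus.

    vocab:          dict mapping token -> row index in `embeddings`
    embeddings:     list of embedding rows (one list of floats per token)
    corpus_tokens:  iterable of tokens from the target-language corpus
    special_tokens: tokens that are always kept (hypothetical names)
    """
    seen = set(corpus_tokens) | set(special_tokens)
    # Walk the vocabulary in original id order so the new ids are deterministic.
    ordered = sorted(vocab.items(), key=lambda item: item[1])
    kept = [token for token, _ in ordered if token in seen]
    new_vocab = {token: new_id for new_id, token in enumerate(kept)}
    new_embeddings = [embeddings[vocab[token]] for token in kept]
    return new_vocab, new_embeddings


# Toy data, purely illustrative.
vocab = {"[PAD]": 0, "[UNK]": 1, "tok_a": 2, "tok_b": 3, "tok_c": 4, "tok_d": 5}
embeddings = [[float(i), float(i) + 0.5] for i in range(len(vocab))]
corpus_tokens = ["tok_c", "tok_a", "tok_c"]

new_vocab, new_embeddings = trim_vocabulary(vocab, embeddings, corpus_tokens)
rows_removed = len(embeddings) - len(new_embeddings)
```

In a real multilingual encoder the embedding matrix accounts for a large share of the parameters, which is why dropping rows for tokens unused in the target language can shrink the model considerably. Any tokenizer would also need to be remapped to the new ids, which this sketch omits.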
Taxonomy
Topics: Natural Language Processing Techniques · ICT in Developing Communities · Speech Recognition and Synthesis
