On Multilingual Encoder Language Model Compression for Low-Resource Languages

Daniil Gurgurov; Michal Gregor; Josef van Genabith; Simon Ostermann

arXiv:2505.16956·cs.CL·November 7, 2025

Open Access

TL;DR

This paper combines knowledge distillation, structured pruning, truncation, and vocabulary trimming to compress multilingual encoder-only language models into much smaller monolingual models for low-resource languages, reaching up to 92% compression while staying competitive on four downstream NLP tasks.

Contribution

It introduces a novel combination of knowledge distillation, pruning, truncation, and vocabulary trimming for extreme model compression tailored to low-resource languages.

Findings

01

Achieved up to 92% compression, with average performance drops of 2-10% at moderate compression and 8-13% at maximum compression

02

Smaller performance degradation with more language-specific data

03

Ablation studies identify best practices for compression

Abstract

In this paper, we combine two-step knowledge distillation, structured pruning, truncation, and vocabulary trimming for extremely compressing multilingual encoder-only language models for low-resource languages. Our novel approach systematically combines existing techniques and takes them to the extreme, reducing layer depth, feed-forward hidden size, and intermediate layer embedding size to create significantly smaller monolingual models while retaining essential language-specific knowledge. We achieve compression rates of up to 92% while maintaining competitive performance, with average drops of 2-10% for moderate compression and 8-13% at maximum compression in four downstream tasks, including sentiment analysis, topic classification, named entity recognition, and part-of-speech tagging, across three low-resource languages. Notably, the performance degradation correlates with the…
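To make the abstract's compression levers concrete (layer depth, feed-forward hidden size, embedding size, and vocabulary size), here is a minimal sketch that counts the parameters of a generic BERT-style encoder. The formula is the standard approximate count for such an architecture, and both configurations are illustrative assumptions, not the models or settings used in the paper.

```python
def encoder_params(vocab_size, hidden, layers, ffn, max_pos=512, type_vocab=2):
    """Approximate parameter count of a BERT-style encoder (no task head)."""
    # Embeddings: token, position and segment tables, plus one LayerNorm (weight + bias).
    embeddings = (vocab_size + max_pos + type_vocab) * hidden + 2 * hidden
    # Self-attention: Q, K, V and output projections, each hidden x hidden with a bias.
    attention = 4 * (hidden * hidden + hidden)
    # Feed-forward block: hidden -> ffn -> hidden, with biases.
    feed_forward = (hidden * ffn + ffn) + (ffn * hidden + hidden)
    # Two LayerNorms per layer, each with a weight and a bias vector.
    layer_norms = 2 * 2 * hidden
    return embeddings + layers * (attention + feed_forward + layer_norms)


def compression_rate(original, compressed):
    """Fraction of parameters removed, between 0 and 1."""
    return 1 - compressed / original


# Hypothetical configurations, chosen only for illustration.
teacher = encoder_params(vocab_size=250_000, hidden=768, layers=12, ffn=3072)
student = encoder_params(vocab_size=30_000, hidden=384, layers=6, ffn=1536)

print(f"teacher parameters: {teacher:,}")
print(f"student parameters: {student:,}")
print(f"compression rate:   {compression_rate(teacher, student):.1%}")
```

In this toy setup the token embedding table dominates the multilingual model's size, which suggests why trimming the vocabulary to a single language can account for a large share of the savings, with depth and width reductions shrinking the remaining transformer layers.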

Taxonomy

Topics: Natural Language Processing Techniques · ICT in Developing Communities · Speech Recognition and Synthesis