Towards Inclusive NLP: Assessing Compressed Multilingual Transformers across Diverse Language Benchmarks

Maitha Alshehhi; Ahmed Sharshar; Mohsen Guizani

arXiv:2507.19699·cs.CL·July 29, 2025

TL;DR

This paper evaluates the performance of compressed multilingual and monolingual large language models across diverse languages, highlighting the impact of compression techniques and linguistic diversity on model effectiveness and fairness.

Contribution

It provides a comprehensive benchmark of multilingual LLMs with various compression strategies across multiple languages, revealing insights into cross-lingual transfer and model efficiency.

Findings

01. Multilingual models outperform language-specific models in diverse languages.

02. Quantization preserves accuracy while improving efficiency.

03. Aggressive pruning reduces performance, especially in larger models.
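
To make the two compression strategies named in the findings concrete, here is a minimal sketch of symmetric uniform quantization and magnitude pruning applied to a small list of weights. It is not the paper's pipeline: the function names, the toy weight values, and the choice of symmetric per-tensor quantization and unstructured magnitude pruning are all assumptions made for illustration. It only shows why 8-bit rounding tends to perturb weights far less than 4-bit rounding, and how pruning discards the smallest weights outright.

```python
def quantize_dequantize(weights, bits):
    """Symmetric uniform quantization (illustrative, hypothetical helper).

    Weights are rounded to signed `bits`-bit integers using one shared
    scale, then mapped back to floats so the error can be inspected.
    """
    qmax = 2 ** (bits - 1) - 1          # 127 for 8-bit, 7 for 4-bit
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0:
        return list(weights)
    scale = max_abs / qmax
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]


def magnitude_prune(weights, sparsity):
    """Zero out the `sparsity` fraction of weights with smallest magnitude."""
    k = int(len(weights) * sparsity)
    if k == 0:
        return list(weights)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    dropped = set(order[:k])
    return [0.0 if i in dropped else w for i, w in enumerate(weights)]


def mean_abs_error(original, compressed):
    """Average absolute difference between two equally sized weight lists."""
    return sum(abs(a - b) for a, b in zip(original, compressed)) / len(original)


# Toy weights, assumed purely for illustration (not taken from any model).
weights = [0.42, -0.17, 0.03, -0.88, 0.56, -0.05, 0.29, -0.61]

err_8bit = mean_abs_error(weights, quantize_dequantize(weights, 8))
err_4bit = mean_abs_error(weights, quantize_dequantize(weights, 4))
pruned_half = magnitude_prune(weights, 0.5)

print(f"mean abs error, 8-bit: {err_8bit:.5f}")
print(f"mean abs error, 4-bit: {err_4bit:.5f}")
print(f"50% magnitude-pruned:  {pruned_half}")
```

In this sketch, quantization keeps every weight and only adds a small rounding error, whereas pruning removes a share of the weights entirely. That difference is one plausible intuition for the contrast the findings draw between the two strategies, though the toy example says nothing about downstream task accuracy.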

Abstract

Although LLMs have attained significant success in high-resource languages, their capacity in low-resource linguistic environments like Kannada and Arabic is not yet fully understood. This work benchmarks the performance of multilingual and monolingual Large Language Models (LLMs) across Arabic, English, and Indic languages, with particular emphasis on the effects of model compression strategies such as pruning and quantization. Findings show significant performance differences driven by linguistic diversity and resource availability on SOTA LLMs such as BLOOMZ, AceGPT, Jais, LLaMA-2, XGLM, and AraGPT2. We find that multilingual versions of the model outperform their language-specific counterparts across the board, indicating substantial cross-lingual transfer benefits. Quantization (4-bit and 8-bit) is effective in maintaining model accuracy while promoting efficiency, but aggressive…
