M-Wanda: Improving One-Shot Pruning for Multilingual LLMs
Rochelle Choenni, Ivan Titov

TL;DR
This paper introduces M-Wanda, a multilingual pruning method that improves one-shot pruning by accounting for cross-lingual variation. The goal is to preserve multilingual capabilities while reducing model size.
Contribution
M-Wanda is the first pruning approach explicitly designed to optimize for multilingual performance. It does this by incorporating language-aware activation statistics into its pruning criterion and by dynamically adjusting layerwise sparsity.
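For context, the original Wanda criterion scores each weight as |W_ij| · ‖X_j‖₂, which is the weight magnitude times the L2 norm of the matching input feature over calibration data. It then prunes the lowest-scoring weights within each output row. The sketch below shows one hypothetical way to make that score language-aware. Feature statistics are computed separately per language as RMS values (the L2 norm divided by √n_tokens) and then averaged, so each calibration language gets equal weight. The averaging rule and the names `feature_rms`, `language_aware_norms`, and `wanda_prune` are illustrative assumptions, not M-Wanda's published formulation. The toy activations and weights are made up.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]  # rows = tokens (activations) or output units (weights)


def feature_rms(activations: Matrix) -> List[float]:
    """Per-feature RMS over calibration tokens: ||X_j||_2 / sqrt(n_tokens)."""
    n_tokens = len(activations)
    n_features = len(activations[0])
    return [
        math.sqrt(sum(tok[j] ** 2 for tok in activations) / n_tokens)
        for j in range(n_features)
    ]


def language_aware_norms(acts_by_lang: Dict[str, Matrix]) -> List[float]:
    """HYPOTHETICAL aggregation: average the per-language feature RMS values so that
    each language gets equal weight regardless of its number of calibration tokens."""
    per_lang = [feature_rms(acts) for acts in acts_by_lang.values()]
    n_features = len(per_lang[0])
    return [sum(norms[j] for norms in per_lang) / len(per_lang) for j in range(n_features)]


def wanda_prune(weight: Matrix, norms: List[float], sparsity: float) -> Matrix:
    """Wanda-style unstructured pruning: score each weight as |W_ij| * norm_j and
    zero out the lowest-scoring `sparsity` fraction within every output row."""
    pruned = []
    for row in weight:
        scores = [abs(w) * n for w, n in zip(row, norms)]
        k = int(len(row) * sparsity)  # number of weights to remove from this row
        drop = set(sorted(range(len(row)), key=lambda j: scores[j])[:k])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned


# Toy calibration data: 2 tokens per language, 4 input features (made-up numbers).
acts_by_lang = {
    "en": [[1.0, 0.0, 2.0, 0.0], [1.0, 0.0, 2.0, 0.0]],
    "sw": [[0.0, 3.0, 0.0, 0.1], [0.0, 4.0, 0.0, 0.1]],
}
W = [[0.5, 0.5, 0.5, 0.5], [1.0, -0.2, 0.1, 3.0]]  # 2 output units x 4 inputs
pruned = wanda_prune(W, language_aware_norms(acts_by_lang), sparsity=0.5)
print(pruned)  # [[0.0, 0.5, 0.5, 0.0], [1.0, -0.2, 0.0, 0.0]]
```

Wanda compares scores only within a row, so rescaling every feature norm by a common factor does not change which weights are pruned. In this sketch, the per-language RMS matters only because it changes how languages are weighted relative to one another.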
Findings
Moderate sparsity ratios significantly harm multilingual performance.
M-Wanda consistently improves performance with minimal additional costs.
It is the first pruning method explicitly optimized for multilingual performance.
Abstract
Multilingual LLM performance is often critically dependent on model size. With an eye on efficiency, this has led to a surge in interest in one-shot pruning methods that retain the benefits of large-scale pretraining while shrinking the model size. However, as pruning tends to come with performance loss, it is important to understand the trade-offs between multilinguality and sparsification. In this work, we study multilingual performance under different sparsity constraints and show that moderate ratios already substantially harm performance. To help bridge this gap, we propose M-Wanda, a pruning method that models cross-lingual variation by incorporating language-aware activation statistics into its pruning criterion and dynamically adjusts layerwise sparsity based on cross-lingual importance. We show that M-Wanda consistently improves performance at minimal additional costs. We are…
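The abstract also says that M-Wanda dynamically adjusts layerwise sparsity based on cross-lingual importance. The sketch below illustrates the general idea under a simple assumed rule. It takes one importance score per layer as input, and how that score is computed is left open here. Each layer's sparsity is then shifted linearly away from the global target: more important layers get lower sparsity and less important layers get higher sparsity. The shifts sum to zero, so the average sparsity stays at the target, assuming all layers have the same number of prunable weights. The function name `allocate_layer_sparsity`, the `max_shift` parameter, and the linear mapping are hypothetical.

```python
from typing import List


def allocate_layer_sparsity(importance: List[float], target: float, max_shift: float) -> List[float]:
    """HYPOTHETICAL linear rule mapping per-layer importance scores to per-layer sparsity.

    More important layers are pruned less and less important layers more. The largest
    deviation from `target` is `max_shift`. Because the deviations sum to zero, the mean
    sparsity equals `target` (assuming equally sized layers).
    """
    if target - max_shift < 0.0 or target + max_shift > 1.0:
        raise ValueError("target +/- max_shift must stay within [0, 1]")
    mean_imp = sum(importance) / len(importance)
    # Largest absolute deviation from the mean; fall back to 1.0 if all layers tie.
    spread = max(abs(i - mean_imp) for i in importance) or 1.0
    return [target - max_shift * (i - mean_imp) / spread for i in importance]


# Toy example: four equally sized layers with made-up importance scores.
sparsities = allocate_layer_sparsity([0.2, 0.8, 0.5, 0.5], target=0.5, max_shift=0.1)
print([round(s, 3) for s in sparsities])  # [0.6, 0.4, 0.5, 0.5]
```

A linear shift is simply the easiest rule that keeps the global sparsity budget fixed. An actual allocation scheme might use a different mapping from importance to sparsity.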
Taxonomy
Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Translation Studies and Practices
