CULL-MT: Compression Using Language and Layer pruning for Machine Translation
Pedram Rostami, Mohammad Javad Dousti

TL;DR
This paper introduces CULL-MT, a compression technique for multilingual machine translation models. It prunes unimportant layers for a selected set of translation directions, then recovers quality with knowledge distillation and parameter-efficient fine-tuning, reducing inference cost with little loss in translation quality.
Contribution
The paper proposes a layer pruning method, paired with knowledge distillation, that is tailored to multilingual translation models and enables significant compression with limited degradation in translation quality.
Findings
Pruning 25% of the layers in NLLB-3.3B results in a drop of only 0.9 spBLEU.
LLaMA3.1-8B-Instruct is more sensitive, with a 2.0 spBLEU drop after pruning 5 layers.
CULL-MT effectively reduces model size while preserving translation quality.
Abstract
Multilingual machine translation models often outperform traditional bilingual models by leveraging translation knowledge transfer. Recent advancements have led to these models supporting hundreds of languages and achieving state-of-the-art results across various translation directions. However, as these models grow larger, their inference operations become increasingly costly. In many use cases, there is no need to support such a wide range of language pairs, as translation is typically needed in only a few selected directions. In this paper, we present CULL-MT, a compression method for machine translation models based on structural layer pruning and selected language directions. Our approach identifies and prunes unimportant layers using a greedy strategy, then mitigates the impact by applying knowledge distillation from the original model along with parameter-efficient fine-tuning.…
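The greedy pruning step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the importance criterion or the stopping rule, so everything below is an assumption made for illustration: the function name `greedy_layer_pruning`, the `evaluate` callback (standing in for a quality metric such as spBLEU measured on the selected directions), and the `max_drop` budget are all hypothetical, and the toy scoring function replaces a real model evaluation.

```python
from typing import Callable, FrozenSet, List, Tuple


def greedy_layer_pruning(
    num_layers: int,
    evaluate: Callable[[FrozenSet[int]], float],
    max_drop: float,
) -> Tuple[List[int], float]:
    """Greedily remove layers while quality stays within max_drop of the baseline.

    `evaluate` is a hypothetical callback that scores a model made of the
    given layer indices (higher is better). This is an assumed setup, not
    the paper's confirmed procedure.
    """
    kept = frozenset(range(num_layers))
    baseline = evaluate(kept)
    removed: List[int] = []
    score = baseline

    while len(kept) > 1:
        # Try removing each remaining layer and keep the least harmful choice.
        best_layer, best_score = -1, float("-inf")
        for layer in sorted(kept):
            candidate_score = evaluate(kept - {layer})
            if candidate_score > best_score:
                best_layer, best_score = layer, candidate_score

        # Stop once even the least harmful removal exceeds the quality budget.
        if baseline - best_score > max_drop:
            break

        kept = kept - {best_layer}
        removed.append(best_layer)
        score = best_score

    return removed, score


# Toy stand-in for a real evaluation: each layer contributes a fixed amount
# to the score. These numbers are invented purely for illustration.
importance = [5.0, 0.1, 0.2, 4.0, 0.3, 3.0]


def toy_evaluate(layers: FrozenSet[int]) -> float:
    return sum(importance[i] for i in layers)


removed, final_score = greedy_layer_pruning(len(importance), toy_evaluate, max_drop=1.0)
print("removed layers:", removed)
print("final score:", round(final_score, 2))
```

In a real setting, each call to `evaluate` would run translation on a held-out set for the chosen directions, so the search costs on the order of one evaluation per remaining layer per pruning step. The distillation and fine-tuning stage mentioned in the abstract would follow the pruning and is not shown here.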
Taxonomy
Topics: Natural Language Processing Techniques
Methods: Pruning · Knowledge Distillation
