CULL-MT: Compression Using Language and Layer pruning for Machine Translation

Pedram Rostami; Mohammad Javad Dousti

arXiv:2411.06506 · cs.CL · November 12, 2024

TL;DR

This paper introduces CULL-MT, a compression technique for multilingual machine translation models. It prunes layers that are unimportant for a chosen set of translation directions, which reduces inference cost while largely maintaining translation quality. The method is demonstrated on large models with small performance losses.

Contribution

The paper proposes a layer pruning method for multilingual translation models, tailored to selected translation directions and followed by knowledge distillation and parameter-efficient fine-tuning, which allows substantial compression with limited accuracy loss.
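The abstract states that the pruned model is trained with knowledge distillation from the original model, but this summary does not say which form of distillation is used. Purely as an illustration, the sketch below shows one common variant, word-level distillation, in which the pruned student matches the unpruned teacher's per-token output distribution through a temperature-scaled KL divergence. The function names, the temperature value, and the choice of word-level (rather than sequence-level) distillation are assumptions, not details taken from the paper.

```python
import math
from typing import List


def softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with a temperature (assumed > 0)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max before exp to avoid overflow
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(teacher_logits: List[float], student_logits: List[float],
            temperature: float = 2.0) -> float:
    """KL(teacher || student) for one target token (hypothetical helper).

    The teacher is the original, unpruned model; the student is the pruned
    model. The result is scaled by temperature**2, a common convention that
    keeps gradient magnitudes comparable across temperatures.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Example: a student whose logits differ slightly from the teacher's.
print(kd_loss([2.0, 1.0, 0.1], [1.5, 1.2, 0.3]))
```

Identical teacher and student logits give zero loss. In a full training loop this per-token loss would typically be averaged over all target tokens and could be combined with ordinary cross-entropy on reference translations.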

Findings

1. Pruning 25% of the layers in NLLB-3.3B costs only 0.9 spBLEU.
2. LLaMA3.1-8B-Instruct is more sensitive, losing 2.0 spBLEU after 5 layers are pruned.
3. CULL-MT reduces model size while largely preserving translation quality.
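To put the first two findings on a common scale, the snippet below converts the LLaMA result into a fraction of layers pruned. It assumes LLaMA3.1-8B-Instruct has 32 decoder layers, as in its public model configuration; that count is not stated in this summary, and the paper may count layers differently.

```python
def pruned_fraction(num_pruned: int, total_layers: int) -> float:
    """Share of a model's layers removed by pruning."""
    if not 0 <= num_pruned <= total_layers:
        raise ValueError("num_pruned must be between 0 and total_layers")
    return num_pruned / total_layers


# Assumption: LLaMA3.1-8B-Instruct has 32 decoder layers (public model config).
LLAMA_TOTAL_LAYERS = 32

# Reported results from the findings: (fraction of layers pruned, spBLEU drop).
results = {
    "NLLB-3.3B": (0.25, 0.9),
    "LLaMA3.1-8B-Instruct": (pruned_fraction(5, LLAMA_TOTAL_LAYERS), 2.0),
}

for model, (fraction, drop) in results.items():
    print(f"{model}: {fraction:.1%} of layers pruned, {drop} spBLEU drop")
```

Under that assumption, LLaMA3.1-8B-Instruct loses more spBLEU (2.0 vs. 0.9) while removing a smaller share of its layers (about 15.6% vs. 25%), which is consistent with the finding that it is more sensitive to pruning.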

Abstract

Multilingual machine translation models often outperform traditional bilingual models by leveraging translation knowledge transfer. Recent advancements have led to these models supporting hundreds of languages and achieving state-of-the-art results across various translation directions. However, as these models grow larger, their inference operations become increasingly costly. In many use cases, there is no need to support such a wide range of language pairs, as translation is typically needed in only a few selected directions. In this paper, we present CULL-MT, a compression method for machine translation models based on structural layer pruning and selected language directions. Our approach identifies and prunes unimportant layers using a greedy strategy, then mitigates the impact by applying knowledge distillation from the original model along with parameter-efficient fine-tuning.…
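
The abstract describes identifying and pruning unimportant layers with a greedy strategy. A minimal sketch of one such greedy loop follows. It assumes a hypothetical `score` callback that evaluates a model keeping only the given layers (for example, spBLEU on a development set for the selected translation directions), and assumes that each step drops the layer whose removal hurts the score least. The paper's actual importance criterion and stopping rule may differ.

```python
from typing import Callable, List, Sequence


def greedy_layer_prune(
    layers: Sequence[int],
    score: Callable[[List[int]], float],
    num_to_prune: int,
) -> List[int]:
    """Remove `num_to_prune` layers one at a time, greedily.

    `score(kept)` is a hypothetical callback returning a quality metric
    (higher is better) for a model that keeps only the layers in `kept`.
    At each step, the layer whose removal leaves the highest score is dropped.
    """
    kept = list(layers)
    if not 0 <= num_to_prune <= len(kept):
        raise ValueError("num_to_prune must be between 0 and len(layers)")
    for _ in range(num_to_prune):
        best_layer, best_score = None, float("-inf")
        for layer in kept:
            candidate = [other for other in kept if other != layer]
            s = score(candidate)  # re-evaluate with this layer removed
            if s > best_score:
                best_layer, best_score = layer, s
        kept.remove(best_layer)
    return kept


# Toy stand-in for a real dev-set evaluation: each layer contributes a
# fixed, made-up amount of "quality" (illustrative numbers only).
importance = {0: 5.0, 1: 0.1, 2: 3.0, 3: 0.2}


def toy_score(kept: List[int]) -> float:
    return sum(importance[i] for i in kept)


print(greedy_layer_prune(range(4), toy_score, num_to_prune=2))  # [0, 2]
```

With the toy additive score, the loop removes the two lowest-importance layers. Because every remaining layer is re-evaluated at each step, pruning k of n layers costs roughly n·k evaluations; that is the price of a search that accounts for interactions with layers already removed.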

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Pruning · Knowledge Distillation