KD4MT: A Survey of Knowledge Distillation for Machine Translation
Ona de Gibert, Joseph Attieh, Timothee Mickus, Yves Scherrer, Jörg Tiedemann

TL;DR
This survey reviews the development and application of Knowledge Distillation in Machine Translation, analyzing 105 papers to identify trends, challenges, and future directions in this evolving field.
Contribution
It provides a comprehensive categorization, analysis, and practical guidelines for KD in MT, highlighting research gaps and the impact of large language models.
Findings
Common trends and research gaps identified in KD4MT
Lack of unified evaluation practices in the field
Potential risks like hallucination and bias amplification
Abstract
Knowledge Distillation (KD) as a research area has gained a lot of traction in recent years as a compression tool to address challenges related to ever-larger models in NLP. Remarkably, Machine Translation (MT) offers a much more nuanced take on this narrative: in MT, KD also functions as a general-purpose knowledge transfer mechanism that shapes supervision and translation quality as well as efficiency. This survey synthesizes KD for MT (KD4MT) across 105 papers (through October 1, 2025). We begin by introducing both MT and KD for non-experts, followed by an overview of the standard KD approaches relevant to MT applications. Subsequently, we categorize advances in the KD4MT literature based on (i) their methodological contributions and (ii) their practical applications. Our qualitative and quantitative analyses identify common trends in the field and highlight key research gaps as…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Biomedical Text Mining and Ontologies
