Evolving Knowledge Distillation for Lightweight Neural Machine Translation

Xuewen Zhang; Haixiao Zhang; Xinlong Huang

arXiv:2605.09924·cs.CL·May 12, 2026

TL;DR

This paper introduces Evolving Knowledge Distillation, a progressive training method that enables lightweight neural machine translation models to approach the performance of larger models by learning from a sequence of increasingly capable teachers.

Contribution

The paper proposes EKD, a progressive training framework that addresses the teacher-student capacity gap in knowledge distillation for NMT models.

Findings

01

EKD improves translation quality at each training stage on the IWSLT-14, WMT-17, and WMT-23 benchmarks.

02

On IWSLT-14, the final student model reaches 34.24 BLEU, within 0.08 BLEU of the strongest teacher (34.32 BLEU).

03

EKD consistently narrows the performance gap between small and large models.

Abstract

Recent advancements in Neural Machine Translation (NMT) have significantly improved translation quality. However, the increasing size and complexity of state-of-the-art models present significant challenges for deployment on resource-limited devices. Knowledge distillation (KD) is a promising approach for compressing models, but its effectiveness diminishes when there is a large capacity gap between teacher and student models. To address this issue, we propose Evolving Knowledge Distillation (EKD), a progressive training framework in which the student model learns from a sequence of teachers with gradually increasing capacities. Experiments on IWSLT-14, WMT-17, and WMT-23 benchmarks show that EKD leads to consistent improvements at each stage. On IWSLT-14, the final student achieves a BLEU score of 34.24, narrowing the gap to the strongest teacher (34.32 BLEU) to just 0.08 BLEU. Similar…
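The staged schedule described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the token-level KL objective, the temperature, the learning rate, the three-entry "vocabulary", and every name below (`kd_loss`, `distill_stage`, `evolving_distillation`, the teacher labels and capacities) are assumptions made for illustration. Here the "student" is a single logit vector that is distilled against each teacher in turn, smallest capacity first.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) between softened distributions (assumed objective)."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * (math.log(pi) - math.log(qi)) for pi, qi in zip(p, q))


def distill_stage(student_logits, teacher_logits, steps=200, lr=4.0, temperature=2.0):
    """One stage: plain gradient descent on kd_loss against a single teacher."""
    student = list(student_logits)
    p = softmax(teacher_logits, temperature)
    for _ in range(steps):
        q = softmax(student, temperature)
        # Gradient of KL(p || q) w.r.t. the student logits is (q - p) / temperature.
        student = [s - lr * (qi - pi) / temperature for s, qi, pi in zip(student, q, p)]
    return student


def evolving_distillation(student_logits, teachers):
    """Distill from each teacher in order of increasing capacity.

    `teachers` is a list of (name, capacity, logits) tuples; all hypothetical.
    Returns the final student and the per-stage loss against that stage's teacher.
    """
    student = list(student_logits)
    history = []
    for name, _capacity, teacher_logits in sorted(teachers, key=lambda t: t[1]):
        student = distill_stage(student, teacher_logits)
        history.append((name, kd_loss(student, teacher_logits)))
    return student, history


# Toy usage: three made-up teachers, deliberately listed out of order.
teachers = [
    ("large", 300, [3.0, 0.5, -1.0]),
    ("small", 50, [1.0, 0.5, 0.0]),
    ("medium", 120, [2.0, 0.5, -0.5]),
]
initial_student = [0.0, 0.0, 0.0]
final_student, history = evolving_distillation(initial_student, teachers)
```

In a real NMT setting each stage would train a Transformer student over a parallel corpus rather than fit one logit vector, and the distillation signal might be sequence-level rather than token-level; the sketch only shows the ordering of teachers and the hand-off of the student from one stage to the next.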

Code & Models

Repositories

agi-content-generation/EKD
github
