Self-Evolution Knowledge Distillation for LLM-based Machine Translation
Yuncheng Song, Liang Ding, Changtong Zan, Shujian Huang

TL;DR
This paper introduces Self-Evolution Knowledge Distillation, a dynamic method for large language model-based machine translation that adapts how much knowledge is transferred to each token according to its learning difficulty, yielding an average gain of approximately 1.4 SacreBLEU points across four translation directions.
Contribution
It proposes a novel self-evolution distillation strategy that adaptively combines teacher and ground truth knowledge based on token difficulty, enhancing translation quality.
Findings
Achieves an average improvement of approximately 1.4 SacreBLEU points across four translation directions.
Effectively leverages teacher models for better knowledge transfer.
Demonstrates the importance of adaptive distillation based on token difficulty.
Abstract
Knowledge distillation (KD) has shown great promise in transferring knowledge from larger teacher models to smaller student models. However, existing KD strategies for large language models often minimize the divergence between the output distributions of the student and teacher models indiscriminately for each token. This overlooks the imbalanced nature of tokens and their varying transfer difficulties. In response, we propose a distillation strategy called Self-Evolution KD. The core of this approach involves dynamically integrating the teacher distribution and the one-hot distribution of the ground truth into the student distribution as prior knowledge, which promotes the distillation process. It adjusts the ratio of prior knowledge based on token learning difficulty, fully leveraging the teacher model's potential. Experimental results show our method brings an average improvement of approximately 1.4 SacreBLEU points…
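The mixing idea described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract gives no formula, so the difficulty measure (per-token KL divergence from teacher to student), the equal weighting of the teacher and one-hot distributions inside the prior, the linear scaling of the ratio, and the names `self_evolution_targets` and `max_ratio` are all assumptions made for illustration.

```python
import math

def softmax(logits):
    # Numerically stable softmax over one token's vocabulary logits.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q), clamped at zero to absorb floating-point noise.
    kl = sum(pi * (math.log(pi + eps) - math.log(qi + eps)) for pi, qi in zip(p, q))
    return max(0.0, kl)

def self_evolution_targets(student_logits, teacher_logits, gold_ids, max_ratio=0.5):
    """Hypothetical sketch: blend prior knowledge into the student distribution.

    Assumption: difficulty is KL(teacher || student) per token, and harder
    tokens receive a larger share of prior knowledge, up to max_ratio.
    """
    students = [softmax(row) for row in student_logits]
    teachers = [softmax(row) for row in teacher_logits]
    difficulty = [kl_divergence(t, s) for t, s in zip(teachers, students)]
    hardest = max(difficulty)
    targets, ratios = [], []
    for s, t, gold, d in zip(students, teachers, gold_ids, difficulty):
        # Scale the prior ratio linearly with relative difficulty (assumed).
        ratio = max_ratio * d / hardest if hardest > 0 else 0.0
        one_hot = [1.0 if i == gold else 0.0 for i in range(len(s))]
        # Prior = equal mix of teacher and ground-truth distributions (assumed).
        prior = [0.5 * (ti + oi) for ti, oi in zip(t, one_hot)]
        targets.append([(1 - ratio) * si + ratio * pi for si, pi in zip(s, prior)])
        ratios.append(ratio)
    return targets, ratios

# Two target tokens over a 3-word vocabulary: the student already matches
# the teacher on the first token and disagrees strongly on the second.
student_logits = [[2.0, 0.5, 0.1], [0.1, 0.2, 3.0]]
teacher_logits = [[2.0, 0.5, 0.1], [2.5, 0.3, 0.2]]
targets, ratios = self_evolution_targets(student_logits, teacher_logits, gold_ids=[0, 0])
```

In this sketch, a token on which the student already agrees with the teacher keeps its own distribution as the target, while the hardest token has its target pulled toward the teacher and the ground truth by `max_ratio`. A training loop would then minimize a divergence between the student distribution and these targets; that step is omitted here because the abstract does not specify it.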
Taxonomy
Topics: Natural Language Processing Techniques · Semantic Web and Ontologies
