Memorization Inheritance in Sequence-Level Knowledge Distillation for Neural Machine Translation

Verna Dankers; Vikas Raunak

arXiv:2502.01491 · cs.CL · July 18, 2025

TL;DR

This paper investigates how sequence-level knowledge distillation in neural machine translation causes student models to inherit memorization and hallucination tendencies from teacher models, proposing an intervention to mitigate these issues.
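In sequence-level knowledge distillation, as commonly formulated (Kim and Rush, 2016), the student is never trained on the original reference targets. Each reference is replaced by the teacher's translation of the same source. The minimal sketch below shows only that data-construction step. `build_seqkd_corpus` and the toy teacher are hypothetical illustrations, not code from the paper.

```python
from typing import Callable, List, Tuple

# A parallel corpus: (source sentence, target sentence) pairs.
Corpus = List[Tuple[str, str]]


def build_seqkd_corpus(corpus: Corpus,
                       teacher_translate: Callable[[str], str]) -> Corpus:
    """Build the student's training data for sequence-level KD.

    Each reference target is discarded and replaced with the teacher's
    translation of the same source (in practice, usually its beam-search
    output). The student is then trained only on these pairs, so it never
    sees the original references directly.
    """
    return [(src, teacher_translate(src)) for src, _reference in corpus]


if __name__ == "__main__":
    # Hypothetical stand-in for a trained teacher's decoder.
    toy_teacher = {"guten Morgen": "good morning", "danke": "thanks"}
    corpus = [("guten Morgen", "good morning!"), ("danke", "thank you")]
    distilled = build_seqkd_corpus(corpus, lambda src: toy_teacher[src])
    print(distilled)  # [('guten Morgen', 'good morning'), ('danke', 'thanks')]
```

Because the student only ever sees teacher outputs, anything the teacher reproduces verbatim from its own training data can end up in the student's training data. This is one intuitive route by which memorization could be inherited.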

Contribution

It quantifies how much memorization students inherit from teachers in SeqKD for NMT and introduces Adaptive-SeqKD, a modification of SeqKD that reduces memorization and hallucinations in student models.

Findings

01

Students memorize more than same-size baseline models, despite never seeing the original training data.

02

SeqKD increases hallucination rates, and students show amplified denoising on low-quality training subgroups.

03

Adaptive-SeqKD reduces memorization and hallucinations in student models.
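The memorization comparison in these findings is reported as exact-match and extractive memorization rates. Below is a minimal sketch under stated assumptions. Exact-match memorization is taken to mean that the model reproduces a training target verbatim from its full training source. Extractive memorization is taken to mean that the model reproduces the full target from only a truncated prefix of the source. The `prefix_frac` value and whitespace tokenization are hypothetical choices and not the paper's protocol.

```python
from typing import Callable, List, Tuple

Corpus = List[Tuple[str, str]]


def exact_match_rate(corpus: Corpus, translate: Callable[[str], str]) -> float:
    """Fraction of training pairs whose target the model reproduces verbatim
    when given the full training source."""
    hits = sum(translate(src) == tgt for src, tgt in corpus)
    return hits / len(corpus)


def extractive_rate(corpus: Corpus, translate: Callable[[str], str],
                    prefix_frac: float = 0.75) -> float:
    """Fraction of training pairs whose full target the model reproduces from
    only a word-level prefix of the source (prefix_frac is a hypothetical
    setting). Sources too short to be truncated are never counted."""
    hits = 0
    for src, tgt in corpus:
        words = src.split()
        keep = max(1, int(len(words) * prefix_frac))
        # Only a genuinely truncated source counts as extractive evidence.
        if keep < len(words) and translate(" ".join(words[:keep])) == tgt:
            hits += 1
    return hits / len(corpus)


if __name__ == "__main__":
    corpus = [("die Katze saß hier", "the cat sat here"), ("ein Hund", "a dog")]

    # Hypothetical model that memorized the first pair and completes it from a prefix.
    def toy_model(src: str) -> str:
        if src.startswith("die Katze"):
            return "the cat sat here"
        return "one dog"

    print(exact_match_rate(corpus, toy_model))  # 0.5
    print(extractive_rate(corpus, toy_model))   # 0.5
```

Comparing these rates for a student and a baseline trained on the original data gives the kind of relative difference the abstract reports. The exact definitions and thresholds used in the paper may differ.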

Abstract

In this work, we explore how instance-level memorization in the teacher Neural Machine Translation (NMT) model is inherited by the student model in sequence-level knowledge distillation (SeqKD). We find that, despite not directly seeing the original training data, students memorize more than baseline models (models of the same size, trained on the original data), with increases of 3.4% for exact matches and 57% for extractive memorization, and show increased hallucination rates. Further, in this SeqKD setting, we characterize how students behave on specific training data subgroups, such as subgroups with low quality and specific counterfactual memorization (CM) scores, and find that students exhibit amplified denoising on low-quality subgroups. Finally, we propose a modification to SeqKD named Adaptive-SeqKD, which intervenes in SeqKD to reduce memorization and hallucinations. Overall, we…
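The abstract groups training examples by counterfactual memorization (CM) scores. A common definition (Zhang et al., 2021) takes a pool of models, each trained on a random subset of the data. CM(x) is then the mean performance on x of models whose subset contained x, minus the mean performance on x of models whose subset excluded it. The sketch below estimates that quantity from a precomputed score matrix. The choice of per-example score (for example, a likelihood or quality metric) and the function names are assumptions, and the paper's instantiation may differ.

```python
from statistics import mean
from typing import List, Sequence


def counterfactual_memorization(in_scores: Sequence[float],
                                out_scores: Sequence[float]) -> float:
    """CM for one example: mean score of models trained WITH the example
    minus mean score of models trained WITHOUT it. Both must be non-empty."""
    return mean(in_scores) - mean(out_scores)


def cm_scores(scores: List[List[float]],
              membership: List[List[bool]]) -> List[float]:
    """Estimate CM for every example from a pool of models.

    scores[m][i]: score of model m on training example i.
    membership[m][i]: True if example i was in model m's training subset.
    """
    n_examples = len(scores[0])
    result = []
    for i in range(n_examples):
        ins = [s[i] for s, mem in zip(scores, membership) if mem[i]]
        outs = [s[i] for s, mem in zip(scores, membership) if not mem[i]]
        result.append(counterfactual_memorization(ins, outs))
    return result


if __name__ == "__main__":
    # Hypothetical scores from 4 models on 2 training examples.
    scores = [[0.9, 0.5], [0.8, 0.6], [0.3, 0.5], [0.2, 0.4]]
    membership = [[True, True], [True, False], [False, True], [False, False]]
    print([round(c, 3) for c in cm_scores(scores, membership)])  # [0.6, 0.0]
```

In this toy pool, the first example is scored far better only by models that trained on it (high CM). The second example is scored the same either way (CM near zero), which is the pattern of a generalizable rather than memorized example.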

Taxonomy

Topics: Natural Language Processing Techniques