Memorization Dynamics in Knowledge Distillation for Language Models

Jaydeep Borkar; Karan Chadha; Niloofar Mireshghallah; Yuchen Zhang; Irina-Elena Veliche; Archi Mitra; David A. Smith; Zheng Xu; Diego Garcia-Olano

arXiv:2601.15394·cs.CL·January 23, 2026

TL;DR

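This paper investigates how knowledge distillation affects training-data memorization in large language models. It finds that distillation reduces memorization relative to standard fine-tuning and that the extent of the reduction varies with the distillation type, which in turn improves privacy and generalization.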

Contribution

The paper provides the first comprehensive analysis of memorization dynamics in knowledge distillation for language models, comparing different distillation methods and datasets.

Findings

01. Distilled models memorize over 50% less data than fine-tuned models.

02. Certain examples are inherently easier to memorize, dominating memorization.

03. Hard distillation inherits 2.7 times more teacher-specific examples than soft distillation.
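
The hard versus soft distinction in the last finding can be illustrated with a minimal per-token sketch. It assumes the common textbook formulation, which this page does not spell out: soft distillation matches the teacher's full next-token distribution with a KL divergence, while hard distillation trains the student with cross-entropy on the single token the teacher prefers. The function names, the toy logits, and the `temperature` parameter are hypothetical, and this is not claimed to be the paper's exact objective.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """Assumed soft objective: KL(teacher || student) over the whole vocabulary."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(
        pi * (math.log(pi) - math.log(qi))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def hard_distillation_loss(teacher_logits, student_logits):
    """Assumed hard objective: cross-entropy on the teacher's top token only."""
    target = max(range(len(teacher_logits)), key=teacher_logits.__getitem__)
    q = softmax(student_logits)
    return -math.log(q[target])


# Toy three-token vocabulary (illustrative values only).
teacher = [2.0, 1.0, 0.1]
student = [1.5, 1.2, 0.3]

soft = soft_distillation_loss(teacher, student)
hard = hard_distillation_loss(teacher, student)
print(f"soft KL loss: {soft:.4f}, hard CE loss: {hard:.4f}")
```

Under these assumptions, the hard objective passes only the teacher's chosen token to the student, whereas the soft objective passes the full probability mass, so the two signals can transfer different information about what the teacher has memorized.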

Abstract

Knowledge Distillation (KD) is increasingly adopted to transfer capabilities from large language models to smaller ones, offering significant improvements in efficiency and utility while often surpassing standard fine-tuning. Beyond performance, KD is also explored as a privacy-preserving mechanism to mitigate the risk of training data leakage. While training data memorization has been extensively studied in standard pre-training and fine-tuning settings, its dynamics in a knowledge distillation setup remain poorly understood. In this work, we study memorization across the KD pipeline using three large language model (LLM) families (Pythia, OLMo-2, Qwen-3) and three datasets (FineWeb, Wikitext, Nemotron-CC-v2). We find: (1) distilled models memorize significantly less training data than standard fine-tuning (reducing memorization by more than 50%); (2) some examples are inherently…

Taxonomy

Topics: Topic Modeling · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification