Distillation Traps and Guards: A Calibration Knob for LLM Distillability

Weixiao Zhan; Yongcheng Jing; Leszek Rutkowski; Dacheng Tao

arXiv:2604.18963 · cs.LG · April 22, 2026

TL;DR

This paper identifies failure modes ("distillation traps") in knowledge distillation (KD) for LLMs, introduces a post-hoc calibration method that controls how distillable a teacher is, and shows both improved student performance and protection against unwanted distillation across tasks.

Contribution

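A post-hoc calibration technique, described by the authors as the first of its kind, that uses reinforcement fine-tuning (RFT) to regulate a teacher's distillability, improving transfer quality when distillation is intended and supporting safety when it is not.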

Findings

01. Calibrated teachers improve student performance over baselines.

02. Undistillable teachers retain their own task performance while causing students distilled from them to collapse.

03. Calibration offers a practical safety and intellectual-property (IP) protection mechanism.

Abstract

Knowledge distillation (KD) transfers capabilities from large language models (LLMs) to smaller students, yet it can fail unpredictably and also underpins model leakage risks. Our analysis reveals several distillation traps that distort training signals: tail noise, off-policy instability, and, most fundamentally, the teacher-student gap. These traps manifest as overconfident hallucinations, self-correction collapse, and local decoding degradation, causing distillation to fail. Motivated by these findings, we propose a post-hoc calibration method that, to the best of our knowledge, for the first time enables control over a teacher's distillability via reinforcement fine-tuning (RFT). Our objective combines task utility, a KL anchor, and a cross-tokenizer calibration reward. This makes distillability a practical safety lever for foundation models, connecting robust teacher-student…
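
The abstract names the three terms of the RFT objective but not how they are computed or weighted. The sketch below is a minimal Python illustration, assuming a common RLHF-style shape: a per-sequence task reward, minus β times a sample-based estimate of KL(π_θ ‖ π_ref) between the fine-tuned teacher and the frozen original teacher (the "KL anchor"), plus λ times a calibration reward. Everything beyond the three term names is an assumption. `RewardWeights`, `kl_anchor`, and `combined_reward` are hypothetical helpers, the default β and λ are illustrative, the cross-tokenizer calibration reward is treated as a precomputed scalar because this excerpt does not define it, and using the sign of λ to push distillability up or down is a guess rather than the paper's stated mechanism.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights for illustration; not values from the paper.
    beta: float = 0.05  # strength of the KL anchor to the original teacher
    # Assumed sign convention: lam > 0 rewards distillability, lam < 0 penalizes it.
    lam: float = 1.0


def kl_anchor(policy_logprobs: Sequence[float],
              ref_logprobs: Sequence[float]) -> float:
    """Single-sample estimate of KL(pi_theta || pi_ref) for one sequence.

    Sums log pi_theta(y_t) - log pi_ref(y_t) over tokens y_t sampled from
    pi_theta, the simple estimator common in RLHF-style fine-tuning.
    """
    if len(policy_logprobs) != len(ref_logprobs):
        raise ValueError("log-prob sequences must have the same length")
    return sum(p - r for p, r in zip(policy_logprobs, ref_logprobs))


def combined_reward(task_reward: float,
                    policy_logprobs: Sequence[float],
                    ref_logprobs: Sequence[float],
                    calibration_reward: float,
                    weights: RewardWeights = RewardWeights()) -> float:
    """Hypothetical per-sequence RFT reward.

    task utility - beta * KL anchor + lam * calibration reward, where the
    calibration reward (cross-tokenizer in the paper) is assumed to be a
    precomputed scalar supplied by the caller.
    """
    kl = kl_anchor(policy_logprobs, ref_logprobs)
    return task_reward - weights.beta * kl + weights.lam * calibration_reward


if __name__ == "__main__":
    r = combined_reward(task_reward=1.0,
                        policy_logprobs=[-0.5, -1.0],
                        ref_logprobs=[-0.7, -1.2],
                        calibration_reward=0.3)
    # KL estimate = 0.2 + 0.2 = 0.4, so r = 1.0 - 0.05 * 0.4 + 0.3 = 1.28
    print(round(r, 4))
```

The KL anchor penalizes drift from the original teacher, which fits the finding that calibrated teachers keep their task performance; how the paper actually defines and balances these terms is not recoverable from this excerpt.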
