Adaptive Teacher Exposure for Self-Distillation in LLM Reasoning

Zihao Han; Tiangang Zhang; Huaibin Wang; Yilun Sun

arXiv:2605.11458 · cs.AI · May 13, 2026

TL;DR

This paper introduces ATESD, a method that adaptively controls teacher exposure during self-distillation in large language models, leading to improved reasoning performance.

Contribution

It proposes a learnable exposure controller for self-distillation that decides how much of the privileged reference reasoning the teacher sees, and optimizes that choice according to the student's future improvement.
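This page does not describe how the controller is built, so the following is only a rough illustration of the general idea, not ATESD's design. It assumes exposure is chosen from a few discrete fractions of the reference reasoning and treats the choice as a multi-armed bandit, using the standard UCB1 rule as a swapped-in technique. The reward is a stand-in for "future student improvement". `ExposureBandit`, the exposure levels, and `toy_improvement` are all hypothetical. The toy reward's peak at partial exposure is invented purely to exercise the code and says nothing about the paper's results.

```python
import math
import random


class ExposureBandit:
    """Hypothetical UCB1 bandit over discrete teacher-exposure fractions.

    Assumption: each arm is the fraction of reference reasoning shown to the
    teacher, and the reward is a measured student improvement. This is an
    illustrative sketch, NOT the ATESD controller.
    """

    def __init__(self, levels):
        self.levels = list(levels)
        self.counts = [0] * len(self.levels)    # times each arm was chosen
        self.values = [0.0] * len(self.levels)  # running mean reward per arm

    def select(self):
        # Try every exposure level once before applying the UCB score.
        for i, c in enumerate(self.counts):
            if c == 0:
                return i
        total = sum(self.counts)
        scores = [
            v + math.sqrt(2.0 * math.log(total) / c)
            for v, c in zip(self.values, self.counts)
        ]
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, reward):
        # Incremental update of the arm's mean observed reward.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def toy_improvement(fraction, rng):
    # Hypothetical stand-in for "future student improvement"; invented values.
    return 1.0 - abs(fraction - 0.5) + rng.gauss(0.0, 0.05)


rng = random.Random(0)
bandit = ExposureBandit([0.25, 0.5, 0.75, 1.0])
for _ in range(2000):
    arm = bandit.select()
    bandit.update(arm, toy_improvement(bandit.levels[arm], rng))

best = max(range(len(bandit.levels)), key=bandit.counts.__getitem__)
print(bandit.levels[best], bandit.counts)
```

A bandit is used here only because it is the simplest learnable controller driven by a delayed scalar signal. A real controller could instead condition on the student's state or output continuous exposure values.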

Findings

01

ATESD outperforms existing self-distillation methods on multiple benchmarks.

02

Adaptive exposure control improves reasoning accuracy over fixed exposure strategies.

03

The learned controller effectively balances teacher guidance and student learning progress.

Abstract

On-policy self-distillation has become a strong recipe for LLM reasoning, where a privileged teacher supervises the student's own rollouts while conditioning on the reference solution. A design choice shared by nearly all such methods, however, has gone unquestioned: the teacher always sees the full reference reasoning. We argue that this default itself is part of the problem and identify a teacher-side exposure mismatch: when the teacher conditions on reasoning far beyond the student's current competence, the resulting token targets become too strong to absorb. A controlled fixed-exposure sweep makes this concrete on two fronts: 1) full exposure is not reliably the best choice, and 2) student-teacher mismatch grows monotonically as the teacher sees more privileged reasoning. This motivates treating teacher exposure not as a fixed hyperparameter but as a learnable training-time control…
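The abstract does not spell out how exposure or mismatch are measured. One simple reading, which is an assumption here, is that at exposure fraction ρ the teacher conditions only on the first ⌈ρ·n⌉ tokens of an n-token reference solution. Under the same assumption, student-teacher mismatch would be the average per-token KL divergence between the teacher's and the student's next-token distributions along the student's rollout. The helpers `exposed_reference`, `kl`, and `mean_token_kl` are hypothetical, and the direction KL(teacher ‖ student) is an illustrative choice.

```python
import math


def exposed_reference(reference_tokens, rho):
    """Keep the first ceil(rho * n) reference tokens (hypothetical exposure rule)."""
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must be in [0, 1]")
    k = math.ceil(rho * len(reference_tokens))
    return reference_tokens[:k]


def kl(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q) if pi > 0)


def mean_token_kl(teacher_dists, student_dists):
    """Average per-position KL(teacher || student) over one student rollout."""
    pairs = list(zip(teacher_dists, student_dists))
    if not pairs:
        raise ValueError("rollout must contain at least one position")
    return sum(kl(t, s) for t, s in pairs) / len(pairs)


# Toy usage: a sharper teacher distribution sits farther from a flat student.
student = [[0.5, 0.5]]
mild_teacher = [[0.6, 0.4]]
sharp_teacher = [[0.9, 0.1]]
print(exposed_reference(list("abcd"), 0.5))     # ['a', 'b']
print(mean_token_kl(mild_teacher, student))     # about 0.020
print(mean_token_kl(sharp_teacher, student))    # about 0.368
```

The toy distributions only show how the metric responds to a more confident teacher. They are not measurements from the paper's sweep.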

Code & Models

Datasets

aoiandroid/papers
dataset · 28 downloads
