From Generic Correlation to Input-Specific Credit in On-Policy Self Distillation
Guobin Shen, Lei Huang, Xiang Cheng, Chenxiao Zhao, Jindong Li, Dongcheng Zhao, Xing Yu

TL;DR
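This paper analyzes the token-level reward used in on-policy self-distillation of language models. It shows that the reward's trajectory sum equals the pointwise mutual information (pMI) between the response and the feedback given the input, and it proposes CREDIT to isolate input-specific credit, improving performance across benchmarks.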
Contribution
The paper gives a Bayesian filtering interpretation of self-distillation rewards and introduces CREDIT, a method that isolates the input-specific component of the reward to improve model performance.
Findings
CREDIT isolates input-specific reward components effectively.
CREDIT achieves strong performance across multiple benchmarks.
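Under a posterior-compatibility interpretation of feedback conditioning, the token reward is a Bayesian filtering increment whose trajectory sum equals the pointwise mutual information between the response and the feedback given the input.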
Abstract
On-policy self-distillation has emerged as a promising paradigm for post-training language models, in which the model conditions on environment feedback to serve as its own teacher, providing dense token-level rewards without external teacher models or step-level annotations. Despite its empirical success, what this reward actually measures and what kind of credit it assigns remain unclear. Under a posterior-compatibility interpretation of feedback conditioning, standard in the implicit-reward literature, we show that the self-distillation token reward is a Bayesian filtering increment whose trajectory sum is exactly the pointwise mutual information between the response and the feedback given the input. This pMI can be raised by input-specific reasoning or by input-generic shortcuts, so we further decompose the teacher log-probability along the input axis. Based on this analysis, we…
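The filtering and pMI claims in the abstract can be checked numerically on a toy example. The sketch below is not the paper's code. It assumes the token reward takes the common teacher-minus-student form, r_t = log p(y_t | x, f, y_<t) - log p(y_t | x, y_<t), and it enforces posterior compatibility by construction, since both conditionals are computed from one joint table. The table `JOINT`, its numbers, and all function names are hypothetical and chosen only for illustration.

```python
import math
from itertools import product

# Hypothetical joint p(y1, y2, f | x) for one fixed input x.
# Two binary response tokens and one binary feedback; values are illustrative.
JOINT = {
    (0, 0, 0): 0.10, (0, 0, 1): 0.05,
    (0, 1, 0): 0.05, (0, 1, 1): 0.20,
    (1, 0, 0): 0.15, (1, 0, 1): 0.05,
    (1, 1, 0): 0.10, (1, 1, 1): 0.30,
}


def prob(prefix=(), f=None):
    """Marginal probability of a response prefix, optionally jointly with feedback f."""
    total = 0.0
    for (y1, y2, fb), p in JOINT.items():
        y = (y1, y2)
        if y[:len(prefix)] == tuple(prefix) and (f is None or fb == f):
            total += p
    return total


def token_rewards(y, f):
    """Assumed reward form: log teacher(y_t | f, y_<t) - log student(y_t | y_<t)."""
    rewards = []
    for t in range(len(y)):
        teacher = prob(y[:t + 1], f) / prob(y[:t], f)
        student = prob(y[:t + 1]) / prob(y[:t])
        rewards.append(math.log(teacher) - math.log(student))
    return rewards


def filtering_increments(y, f):
    """log p(f | y_<=t) - log p(f | y_<t): the belief update caused by token t."""
    posterior = [prob(y[:t], f) / prob(y[:t]) for t in range(len(y) + 1)]
    return [math.log(posterior[t + 1]) - math.log(posterior[t]) for t in range(len(y))]


def pmi(y, f):
    """Pointwise mutual information between the full response y and feedback f."""
    return math.log(prob(y, f) / (prob(y) * prob(f=f)))


if __name__ == "__main__":
    for y in product((0, 1), repeat=2):
        for f in (0, 1):
            r = token_rewards(y, f)
            print(y, f, [round(v, 4) for v in r], round(sum(r), 4), round(pmi(y, f), 4))
```

For every (response, feedback) pair in this toy setting, each token reward matches the corresponding filtering increment, and the rewards telescope to the pMI. The sketch covers only that identity. It does not implement CREDIT or the input-axis decomposition, which the truncated abstract does not specify.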
