TL;DR
This paper introduces Anti-Self-Distillation (AntiSD), a novel method that improves reasoning in language models by flipping the direction of self-distillation. The student ascends, rather than descends, the divergence to its context-conditioned teacher, and this makes training both faster and more accurate.
Contribution
AntiSD offers a simple, effective modification to self-distillation that enhances reasoning capabilities and training efficiency in large language models.
Findings
AntiSD matches baseline accuracy in fewer training steps.
AntiSD improves final accuracy by up to 11.5 points.
AntiSD reduces training time by a factor of 2 to 10 on math reasoning benchmarks.
Abstract
On-policy self-distillation, where a student is pulled toward a copy of itself conditioned on privileged context (e.g., a verified solution or feedback), offers a promising direction for advancing reasoning capability without a stronger external teacher. Yet in math reasoning the gains are inconsistent, even when the same approach succeeds elsewhere. A pointwise mutual information analysis traces the failure to the privileged context itself: it inflates the teacher's confidence on tokens already implied by the solution (structural connectives, verifiable claims) and deflates it on deliberation tokens ("Wait", "Let", "Maybe") that drive multi-step search. We propose Anti-Self-Distillation (AntiSD), which ascends a divergence between student and teacher rather than descending it: this reverses the per-token sign and yields a naturally bounded advantage in one step. An entropy-triggered…
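The abstract links two quantities: the per-token pointwise mutual information (PMI) between a sampled token and the privileged context, and the per-token sign that AntiSD reverses. The sketch below is a minimal illustration of that link. It rests on two assumptions that do not come from the source. First, the teacher is the same model conditioned on the privileged context c. Second, standard self-distillation uses the per-token reverse-KL reward log p(y_t | x, c, y_<t) − log p(y_t | x, y_<t), which equals the token's PMI with c. Under these assumptions, AntiSD's sign reversal reduces to negating that quantity. The function names and example probabilities are hypothetical. The paper's bounded advantage and its entropy-triggered component are not reproduced here, and the raw log-ratios used below are unbounded.

```python
import math
from typing import List


def per_token_pmi(teacher_logp: List[float], student_logp: List[float]) -> List[float]:
    """PMI of each sampled token with the privileged context c (assumed form):
    log p(y_t | x, c, y_<t) - log p(y_t | x, y_<t).
    The teacher is assumed to be the same model, conditioned on c."""
    if len(teacher_logp) != len(student_logp):
        raise ValueError("teacher and student log-probs must align token by token")
    return [t - s for t, s in zip(teacher_logp, student_logp)]


def self_distill_advantage(teacher_logp: List[float], student_logp: List[float]) -> List[float]:
    # Assumed standard self-distillation: descending reverse KL(student || teacher)
    # rewards tokens whose probability the privileged context inflates (PMI > 0).
    return per_token_pmi(teacher_logp, student_logp)


def anti_self_distill_advantage(teacher_logp: List[float], student_logp: List[float]) -> List[float]:
    # AntiSD-style sign reversal (sketch): ascending the divergence negates the
    # per-token term, so tokens the context deflates (e.g. "Wait") are rewarded.
    # NOTE: no bounding is applied here; the paper's bounded form is not reproduced.
    return [-a for a in per_token_pmi(teacher_logp, student_logp)]


if __name__ == "__main__":
    # Hypothetical per-token probabilities for three sampled tokens.
    tokens = ["Wait", "Therefore", "="]
    student_p = [0.20, 0.30, 0.50]  # p(y_t | x, y_<t), no privileged context
    teacher_p = [0.05, 0.60, 0.50]  # p(y_t | x, c, y_<t), with privileged context
    s_logp = [math.log(p) for p in student_p]
    t_logp = [math.log(p) for p in teacher_p]

    pmi = per_token_pmi(t_logp, s_logp)
    anti = anti_self_distill_advantage(t_logp, s_logp)
    for tok, m, a in zip(tokens, pmi, anti):
        # "Wait" has negative PMI (deflated by c), so it gets a positive AntiSD advantage.
        print(f"{tok:>10}  PMI={m:+.3f}  AntiSD advantage={a:+.3f}")
```

In this toy example, the deliberation token "Wait" is less likely once the solution is visible. It therefore has negative PMI and receives a positive advantage under the sign-reversed objective. "Therefore", which the context makes more likely, is penalized, and a token the context leaves unchanged receives zero advantage.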
