Project Aletheia: Verifier-Guided Distillation of Backtracking for Small Language Models
Aradhya Dixit, Tianxi Liang, Jai Telang

TL;DR
This paper introduces Verifier-Guided Distillation, a training method for small language models that teaches error detection and correction during reasoning, improving their ability to handle constraint-satisfaction problems.
Contribution
It presents a novel training protocol that transfers error repair processes to small models, enabling them to perform explicit conflict detection and backtracking.
Findings
Small models can learn to detect contradictions.
Models can revise earlier reasoning steps.
Performance on constraint-satisfaction problems improves.
Abstract
Small Language Models (SLMs, under 10B parameters) are attractive for private, on-device deployment, yet they frequently fail on strict constraint-satisfaction problems because their linear, overconfident reasoning traces do not recover from early mistakes. We introduce Verifier-Guided Distillation, a training protocol that transfers the process of error repair (explicit conflict detection and backtracking) rather than only correct final answers. By training a 7B model on verified reasoning traces that include mistakes and self-corrections, we show that latent verification behavior can emerge in small models, enabling them to occasionally stop, detect contradictions, and revise earlier assumptions.
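To make the idea of a reasoning trace "that includes mistakes and self-corrections" concrete, here is a minimal sketch of how such a trace might be generated for a toy constraint-satisfaction problem. It is not the authors' pipeline: the abstract does not specify the problem domain, the verifier, or the trace format, so the graph-coloring task, the function names (`adjacent`, `solve_with_trace`), and the `ASSIGN` / `CONFLICT` / `BACKTRACK` tags are all assumptions chosen for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[str, str]


def adjacent(a: str, b: str, edges: Set[Edge]) -> bool:
    """Edges are undirected, so check both orientations."""
    return (a, b) in edges or (b, a) in edges


def solve_with_trace(
    variables: List[str], colors: List[str], edges: Set[Edge]
) -> Tuple[Optional[Dict[str, str]], List[str]]:
    """Backtracking graph coloring that logs every step, including dead ends.

    The trace tags are hypothetical; they stand in for whatever format a
    real training corpus would use.
    """
    assignment: Dict[str, str] = {}
    trace: List[str] = []

    def backtrack(i: int) -> bool:
        if i == len(variables):
            return True  # every variable has a consistent value
        var = variables[i]
        for color in colors:
            # Conflict detection: a neighbor already holds this color.
            clash = [n for n, c in assignment.items()
                     if c == color and adjacent(var, n, edges)]
            if clash:
                trace.append(f"TRY {var}={color}: CONFLICT with {clash[0]}={color}")
                continue
            assignment[var] = color
            trace.append(f"ASSIGN {var}={color}")
            if backtrack(i + 1):
                return True
            # Self-correction: the earlier choice led to a dead end, undo it.
            del assignment[var]
            trace.append(f"BACKTRACK undo {var}={color}")
        return False

    solved = backtrack(0)
    return (dict(assignment) if solved else None), trace


# Path graph A-C-D-B with two colors; visiting A, B, C, D in that order
# forces an early wrong guess (B=red) that must later be revised.
edges = {("A", "C"), ("C", "D"), ("D", "B")}
solution, trace = solve_with_trace(["A", "B", "C", "D"], ["red", "green"], edges)
print("\n".join(trace))
print(solution)
```

In this sketch the solver itself plays the role of the verifier, since every logged step is checked against the constraints before it is written. The resulting trace contains the wrong guess, the detected contradiction, and the repair, which is the kind of process signal the abstract contrasts with training on final answers alone.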
Taxonomy
Topics: Topic Modeling · Software System Performance and Reliability · Natural Language Processing Techniques
