ThinkTuning: Instilling Cognitive Reflections without Distillation
Aswin RRV, Jacob Dineen, Divij Handa, Md Nayem Uddin, Mihir Parmar, Chitta Baral, Ben Zhou

TL;DR
ThinkTuning introduces a novel interactive training method that uses teacher feedback to enhance reasoning in language models, outperforming baseline approaches without relying on distillation.
Contribution
We propose ThinkTuning, a GRPO-based interactive training approach that improves reasoning by using teacher feedback, without distillation, to instill thinking behaviors in language models.
Findings
Achieves a 3.85% average improvement over zero-shot baselines.
Outperforms vanilla-GRPO on MATH-500, AIME, and GPQA-Diamond datasets.
Enhances reasoning capabilities without model distillation.
Abstract
Recent advances in test-time scaling have led to the emergence of thinking LLMs that exhibit self-reflective behaviors and multi-step reasoning. While RL drives this self-improvement paradigm, a recent study (Gandhi et al., 2025) shows that RL alone does not truly instill these new reasoning abilities; it merely draws out behaviors already present in the base models. This raises a question: how can we train models that do not exhibit such thinking behavior to develop it in the first place? To this end, we propose ThinkTuning, a GRPO-based interactive training approach in which we augment the rollouts of a student model with guidance from a teacher model. A simple idea from classroom practice inspires our method: a teacher poses a problem, lets the student try an answer, then gives corrective feedback, enough to point the mind in the right direction and then show the solution.…
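As a rough illustration of the two ingredients the abstract names (GRPO-style training and teacher-augmented rollouts), the sketch below shows the usual group-relative advantage normalization used in GRPO, plus a hypothetical way to append teacher feedback to part of a rollout group. The abstract excerpt does not specify how guidance is injected or how guided tokens are weighted, so `augment_with_feedback`, its `fraction` parameter, and the string-concatenation format are assumptions made for illustration only, not the authors' implementation.

```python
from statistics import mean, pstdev
from typing import Callable, List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: center and scale rewards within one rollout group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation over the group
    return [(r - mu) / (sigma + eps) for r in rewards]


def augment_with_feedback(
    rollouts: List[str],
    teacher_feedback: Callable[[str], str],
    fraction: float = 0.5,
) -> List[str]:
    """Hypothetical augmentation: append teacher feedback to the first
    `fraction` of the student's rollouts and leave the rest untouched."""
    k = int(len(rollouts) * fraction)
    return [
        rollout + "\n" + teacher_feedback(rollout) if i < k else rollout
        for i, rollout in enumerate(rollouts)
    ]


if __name__ == "__main__":
    # Toy group of four student attempts with binary correctness rewards.
    student_rollouts = ["attempt A", "attempt B", "attempt C", "attempt D"]
    guided = augment_with_feedback(
        student_rollouts, lambda r: "Teacher: re-check the last step."
    )
    rewards = [1.0, 0.0, 0.0, 1.0]
    for text, adv in zip(guided, group_relative_advantages(rewards)):
        print(f"{adv:+.3f}  {text!r}")
```

Normalizing within the group means a rollout is rewarded only relative to its siblings for the same prompt, which is why mixing guided and unguided attempts in one group can shift credit toward the guided behavior when it leads to better answers.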
Taxonomy
Topics: Educational Games and Gamification · Innovative Teaching and Learning Methods
