ThinkTuning: Instilling Cognitive Reflections without Distillation

Aswin RRV; Jacob Dineen; Divij Handa; Md Nayem Uddin; Mihir Parmar; Chitta Baral; Ben Zhou

arXiv:2508.07616 · cs.AI · August 22, 2025

TL;DR

ThinkTuning introduces a novel interactive training method that uses teacher feedback to enhance reasoning in language models, outperforming baseline approaches without relying on distillation.

Contribution

We propose ThinkTuning, a GRPO-based interactive training approach that improves reasoning by using teacher feedback, without distillation, to instill thinking behaviors in language models.

Findings

01 Achieves 3.85% average improvement over zero-shot baselines.

02 Outperforms vanilla-GRPO on MATH-500, AIME, and GPQA-Diamond datasets.

03 Enhances reasoning capabilities without model distillation.

Abstract

Recent advances in test-time scaling have led to the emergence of thinking LLMs that exhibit self-reflective behaviors and multi-step reasoning. While RL drives this self-improvement paradigm, a recent study (Gandhi et al., 2025) shows that RL alone does not truly instill these new reasoning abilities; it merely draws out behaviors already present in the base models. This raises a question: how can we train models that do not exhibit such thinking behavior to develop it in the first place? To this end, we propose ThinkTuning, a GRPO-based interactive training approach in which we augment the rollouts of a student model with guidance from a teacher model. A simple idea from classroom practice inspires our method: a teacher poses a problem, lets the student try an answer, then gives corrective feedback: enough to point the mind in the right direction and then show the solution.…
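To make "GRPO-based" concrete, here is a minimal sketch of the group-relative advantage computation that GRPO is generally built around: each rollout's reward is normalized against the other rollouts sampled for the same prompt. It is an illustration, not the authors' implementation. The function name `group_relative_advantages`, the `eps` term, the use of the population standard deviation, and the binary rewards are all assumptions made for this example. How teacher guidance is inserted into the student rollouts is not specified in the abstract, so it is left out here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each rollout's reward against its own group (GRPO-style).

    `rewards` holds one scalar reward per rollout sampled for a single prompt.
    The name, `eps`, and the choice of population std are assumptions for
    this sketch; implementations differ on these details.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division defined when every rollout gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical group of four rollouts for one prompt, scored 1.0 if the
# final answer is correct and 0.0 otherwise.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# A group with identical rewards carries no learning signal.
flat = group_relative_advantages([1.0, 1.0, 1.0])
```

Correct rollouts receive positive advantages and incorrect ones negative, so the policy update favors whichever rollouts in the group did better than average, without needing a separate value model.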

Taxonomy

Topics: Educational Games and Gamification · Innovative Teaching and Learning Methods