Rewarding How Models Think Pedagogically: Integrating Pedagogical Reasoning and Thinking Rewards for LLMs in Education

Unggi Lee; Jiyeong Bae; Jaehyeon Park; Haeun Park; Taejun Park; Younghoon Jeon; Sungmin Cho; Junbo Koh; Yeil Jeong; Gyeonggeon Lee

arXiv:2601.14560 · cs.CL · January 22, 2026

Open Access

TL;DR

This paper introduces PedagogicalRL-Thinking, a framework that enhances LLMs for education by guiding their internal reasoning with domain-specific prompts and reinforcing pedagogical quality through specialized rewards, leading to improved educational performance.

Contribution

The paper presents two novel methods, Pedagogical Reasoning Prompting and Thinking Reward, for aligning LLMs' internal reasoning with pedagogical principles.

Findings

01 Domain-specific prompting outperforms generic instructions.

02 Thinking reward combined with pedagogical prompting yields the best results.

03 Models show improved reasoning and instructional decision-making.

Abstract

Large language models (LLMs) are increasingly deployed as intelligent tutoring systems, yet research on optimizing LLMs specifically for educational contexts remains limited. Recent works have proposed reinforcement learning approaches for training LLM tutors, but these methods focus solely on optimizing visible responses while neglecting the model's internal thinking process. We introduce PedagogicalRL-Thinking, a framework that extends pedagogical alignment to reasoning LLMs in education through two novel approaches: (1) Pedagogical Reasoning Prompting, which guides internal reasoning using domain-specific educational theory rather than generic instructions; and (2) Thinking Reward, which explicitly evaluates and reinforces the pedagogical quality of the model's reasoning traces. Our experiments reveal that domain-specific, theory-grounded prompting outperforms generic prompting, and…
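
To make the idea of rewarding the reasoning trace as well as the visible response concrete, here is a minimal Python sketch. It is not the paper's implementation. Everything in it is an assumption made for illustration: the `<think>...</think>` delimiter, the additive combination, the `thinking_weight` parameter, and both toy scorers are hypothetical. An actual training setup would use its own scoring of pedagogical quality, which this page does not describe.

```python
import re
from typing import Callable, Tuple

# Assumption: the reasoning trace is wrapped in <think>...</think> tags.
THINK_PATTERN = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_thinking(output: str) -> Tuple[str, str]:
    """Separate the reasoning trace from the visible response."""
    match = THINK_PATTERN.search(output)
    if match is None:
        # No trace found: treat the whole output as the visible response.
        return "", output.strip()
    thinking = match.group(1).strip()
    response = (output[:match.start()] + output[match.end():]).strip()
    return thinking, response


def combined_reward(
    output: str,
    response_scorer: Callable[[str], float],
    thinking_scorer: Callable[[str], float],
    thinking_weight: float = 0.5,  # hypothetical weighting, not from the paper
) -> float:
    """Score the visible response and the reasoning trace, then add them."""
    thinking, response = split_thinking(output)
    return response_scorer(response) + thinking_weight * thinking_scorer(thinking)


def toy_thinking_scorer(thinking: str) -> float:
    """Placeholder: fraction of hypothetical pedagogical cue words present."""
    cues = ("misconception", "hint", "scaffold")
    text = thinking.lower()
    return sum(cue in text for cue in cues) / len(cues)


def toy_response_scorer(response: str) -> float:
    """Placeholder: reward responses that end with a guiding question."""
    return 1.0 if response.rstrip().endswith("?") else 0.0


if __name__ == "__main__":
    sample = (
        "<think>The student has a misconception; give a hint.</think>"
        "What happens if you add 2 to both sides?"
    )
    print(combined_reward(sample, toy_response_scorer, toy_thinking_scorer))
```

Keeping the two scorers as separate callables means a response-only reward (the setting the abstract attributes to earlier work) corresponds to `thinking_weight=0.0` in this sketch, which makes the two setups easy to compare.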

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Topic Modeling · Text Readability and Simplification