From Problem-Solving to Teaching Problem-Solving: Aligning LLMs with Pedagogy using Reinforcement Learning

David Dinucu-Jianu; Jakub Macina; Nico Daheim; Ido Hakimi; Iryna Gurevych; Mrinmaya Sachan

arXiv:2505.15607·cs.CL·October 14, 2025

TL;DR

This paper presents an online RL-based framework that aligns large language models with pedagogical principles, so that they act as tutors who guide students through a problem instead of giving away the answer, while the tutor model keeps its own reasoning capabilities.

Contribution

It introduces a reinforcement learning approach that trains LLMs as pedagogical tutors without human annotations, emphasizing the strategic withholding of answers and interpretability.

Findings

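01. A 7B-parameter tutor model trained with this method reaches tutoring performance comparable to larger proprietary models such as LearnLM.

02. A controllable reward weighting balances pedagogical support against student solving accuracy and makes it possible to trace the Pareto frontier between the two.

03. The resulting models preserve reasoning capabilities better than single-turn SFT baselines.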

Abstract

Large language models (LLMs) can transform education, but their optimization for direct question-answering often undermines effective pedagogy, which requires strategically withholding answers. To mitigate this, we propose an online reinforcement learning (RL)-based alignment framework that can quickly adapt LLMs into effective tutors using simulated student-tutor interactions by emphasizing pedagogical quality and guided problem-solving over simply giving away answers. We use our method to train a 7B-parameter tutor model without human annotations, which reaches similar performance to larger proprietary models like LearnLM. We introduce a controllable reward weighting to balance pedagogical support and student solving accuracy, allowing us to trace the Pareto frontier between these two objectives. Our models better preserve reasoning capabilities than single-turn SFT baselines and can…
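The controllable reward weighting mentioned in the abstract can be illustrated with a small sketch. This is not the paper's reward function: the linear form, the `Rollout` record, the field names, and the weight `lam` are all assumptions made here for illustration. The sketch only shows how a single weight can trade student solving accuracy against pedagogical quality, and how the non-dominated (Pareto) points can be picked out of a set of evaluated models.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rollout:
    """One simulated tutoring conversation (hypothetical record, not from the paper)."""
    solve_rate: float      # share of post-dialogue student attempts that are correct, in [0, 1]
    pedagogy_score: float  # judged pedagogical quality in [0, 1]; 1.0 means no answer leakage


def combined_reward(rollout: Rollout, lam: float) -> float:
    """Scalarize both objectives with an assumed linear penalty.

    lam = 0 rewards only student accuracy; larger lam penalizes
    unpedagogical behaviour (for example, leaking the answer) more strongly.
    """
    if lam < 0:
        raise ValueError("lam must be non-negative")
    return rollout.solve_rate - lam * (1.0 - rollout.pedagogy_score)


def pareto_front(points):
    """Return the (solve_rate, pedagogy_score) pairs not dominated by any other pair."""
    front = []
    for p in points:
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and (q[0] > p[0] or q[1] > p[1])
            for q in points
        )
        if not dominated:
            front.append(p)
    return sorted(front)


if __name__ == "__main__":
    leaky = Rollout(solve_rate=1.0, pedagogy_score=0.0)   # tutor gives the answer away
    guided = Rollout(solve_rate=0.8, pedagogy_score=1.0)  # tutor withholds the answer
    for lam in (0.0, 0.5):
        print(lam, combined_reward(leaky, lam), combined_reward(guided, lam))
    print(pareto_front([(0.9, 0.2), (0.8, 0.9), (0.7, 0.5), (0.6, 0.95)]))
```

Under these assumptions, `lam = 0` prefers the answer-leaking rollout, while `lam = 0.5` prefers the guided one. Training one model per value of `lam` and plotting the resulting pairs is one way such a frontier could be traced.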

Code & Models

Repositories

eth-lre/pedagogicalrl (PyTorch, official)

Taxonomy

Topics: Artificial Intelligence in Law

Methods: Shrink and Fine-Tune