Efficient RL for optimizing conversation level outcomes with an LLM-based tutor

Hyunji Nam; Omer Gottesman; Amy Zhang; Dean Foster; Emma Brunskill; Lyle Ungar

arXiv:2507.16252·cs.CL·July 23, 2025

Open Access

TL;DR

This paper introduces a lightweight reinforcement learning approach for LLM-based tutors that optimizes long-term student outcomes in multi-turn math tutoring by using a latent state representation of dialogue history.

Contribution

It presents a novel method to incorporate long-term planning in LLM tutors using a low-dimensional latent state, improving over turn-level optimization.

Findings

01. Enhanced long-term student outcomes in simulated tutoring tasks.

02. Reduced computational resources compared to end-to-end training.

03. Better alignment with long-term educational goals.

Abstract

Large language models (LLMs) built on existing reinforcement learning from human feedback (RLHF) frameworks typically optimize responses based on immediate turn-level human preferences. However, this approach falls short in multi-turn dialogue settings, such as online math tutoring. We propose a method to enhance LLM-based tutors by representing the dialogue history with a lower-dimensional latent state representation of a student and optimizing a long-term policy that selects high-level actions based on the latent state. The goal is to better align the tutor's behavior with the long-term objective of guiding the student towards solving a target math problem on their own. Our model is lightweight, requiring fewer computational resources than prior work that trains the tutor policy end-to-end to directly output the tutor's next utterance. Our experimental results demonstrate that these…
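To make the idea in the abstract concrete, here is a minimal sketch of learning a policy over a small latent student state and a handful of high-level tutor actions. Everything specific in it is an assumption for illustration, not the paper's method: the action names, the four-level progress state, the toy student simulator, and the choice of tabular Q-learning are all hypothetical. The only parts taken from the abstract are the low-dimensional state, the high-level actions, and the reward tied to the conversation-level outcome rather than to each turn.

```python
import random

# Hypothetical high-level tutor actions; the abstract does not list the real set.
ACTIONS = ["instruct", "ask_question", "encourage"]
# Hypothetical latent state: student progress level 0..3, where 3 means solved.
N_LEVELS = 4


def simulate_step(state, action, rng):
    """Toy student simulator (an assumption, not the paper's environment)."""
    # Assumed probability that the student moves up one progress level.
    p_up = {"instruct": 0.5, "ask_question": 0.4, "encourage": 0.2}[action]
    # Assumed: questions help more once the student has made some progress.
    if action == "ask_question" and state >= 1:
        p_up = 0.7
    next_state = min(state + 1, N_LEVELS - 1) if rng.random() < p_up else state
    done = next_state == N_LEVELS - 1
    # Reward arrives only when the conversation-level goal is reached.
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train_q(episodes=2000, max_turns=10, alpha=0.1, gamma=0.95, eps=0.2, seed=0):
    """Tabular Q-learning over (latent state, high-level action) pairs."""
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N_LEVELS)]
    for _ in range(episodes):
        state = 0
        for _ in range(max_turns):
            # Epsilon-greedy exploration over the high-level actions.
            if rng.random() < eps:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[state][i])
            nxt, r, done = simulate_step(state, ACTIONS[a], rng)
            # Bootstrapped target; no future value after the episode ends.
            target = r if done else r + gamma * max(q[nxt])
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
            if done:
                break
    return q


def greedy_action(q, state):
    """Return the name of the highest-valued action in the given latent state."""
    return ACTIONS[max(range(len(ACTIONS)), key=lambda i: q[state][i])]


if __name__ == "__main__":
    q_table = train_q()
    for level in range(N_LEVELS - 1):
        print(level, greedy_action(q_table, level))
```

Because the policy is a small table rather than a language model, training touches no LLM weights in this sketch. In a full system the chosen high-level action would presumably be passed to an LLM that generates the tutor's actual utterance, but that step is outside what the abstract specifies and is not modeled here.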

Taxonomy

Topics: Speech and dialogue systems · Natural Language Processing Techniques · Intelligent Tutoring Systems and Adaptive Learning