Simulated Students in Tutoring Dialogues: Substance or Illusion?

Alexander Scarlatos; Jaewook Lee; Simon Woodhead; Andrew Lan

arXiv:2601.04025 · cs.CL · May 6, 2026

TL;DR

This paper assesses the quality of simulated students in tutoring dialogues. It proposes evaluation metrics, benchmarks a range of simulation methods, and finds that current approaches remain limited, which calls for further research.

Contribution

It formally defines the student simulation task, introduces evaluation metrics spanning linguistic, behavioral, and cognitive aspects, and benchmarks multiple simulation methods on a real-world math tutoring dialogue dataset.

Findings

01 Prompting strategies perform poorly at simulating students.

02 Supervised fine-tuning and preference optimization improve simulation quality.

03 Current methods are still limited in effectiveness, indicating that student simulation remains a challenging task.

Abstract

Advances in large language models (LLMs) enable many new innovations in education. However, evaluating the effectiveness of new technology requires real students, which is time-consuming and hard to scale up. Therefore, many recent works on LLM-powered tutoring solutions have used simulated students for both training and evaluation, often via simple prompting. Surprisingly, little work has been done to ensure or even measure the quality of simulated students. In this work, we formally define the student simulation task, propose a set of evaluation metrics that span linguistic, behavioral, and cognitive aspects, and benchmark a wide range of student simulation methods on these metrics. We experiment on a real-world math tutoring dialogue dataset, where both automated and human evaluation results show that prompting strategies for student simulation perform poorly; supervised fine-tuning…
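The abstract frames student simulation as a task that can be scored against real tutoring dialogues. As a rough illustration only, the Python sketch below assumes one common framing: at each real student turn, a simulator receives the preceding dialogue history and generates a reply, which is then compared with what the real student actually said. The turn format, the function names `evaluate_simulator` and `token_f1`, and the unigram-overlap F1 score are all hypothetical stand-ins chosen for this example; they are not the paper's task definition or its linguistic, behavioral, and cognitive metrics.

```python
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical turn format (an assumption for this sketch):
# {"role": "tutor" | "student", "text": str}
Turn = Dict[str, str]


def token_f1(predicted: str, reference: str) -> float:
    """Unigram-overlap F1 between two utterances (a crude lexical stand-in metric)."""
    pred = predicted.lower().split()
    ref = reference.lower().split()
    if not pred or not ref:
        # Two empty utterances count as a match; one empty utterance scores 0.
        return float(pred == ref)
    common = Counter(pred) & Counter(ref)  # multiset intersection of tokens
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)


def evaluate_simulator(
    dialogue: List[Turn], simulate: Callable[[List[Turn]], str]
) -> float:
    """Mean token F1 of simulated vs. real student turns.

    Each simulated turn is conditioned on the *real* history preceding it,
    so every prediction is aligned with exactly one real student turn.
    """
    scores = []
    for i, turn in enumerate(dialogue):
        if turn["role"] != "student":
            continue
        history = dialogue[:i]  # real context up to (not including) this turn
        scores.append(token_f1(simulate(history), turn["text"]))
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    dialogue = [
        {"role": "tutor", "text": "What is 3 + 4?"},
        {"role": "student", "text": "It is 7"},
        {"role": "tutor", "text": "Good. And 3 + 5?"},
        {"role": "student", "text": "It is 9"},
    ]

    def fixed_simulator(history: List[Turn]) -> str:
        # Hypothetical baseline: gives the same reply regardless of context.
        return "it is 7"

    print(evaluate_simulator(dialogue, fixed_simulator))
```

Conditioning each simulated turn on the real history, rather than on the simulator's own earlier outputs, keeps the comparison turn-aligned. A lexical score like this one captures only surface wording: the context-blind baseline above still scores well because its phrasing matches, which illustrates why a fuller evaluation also needs behavioral and cognitive measures.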
