Leveraging LLMs to Assess Tutor Moves in Real-Life Dialogues: A Feasibility Study

Danielle R. Thomas; Conrad Borchers; Jionghao Lin; Sanjit Kakarla; Shambhavi Bhushan; Erin Gatz; Shivang Gupta; Ralph Abboud; Kenneth R. Koedinger

arXiv:2506.17410 · cs.CL · June 24, 2025

TL;DR

This study explores the use of advanced language models to automatically identify and evaluate key tutoring actions in real-world math tutoring dialogues, demonstrating high accuracy and practical feasibility.

Contribution

It introduces a scalable, AI-based approach for assessing tutor moves in authentic settings, with a novel prompting strategy and reproducible LLM prompts.

Findings

01

Models reliably detected praise and math errors with at least 82% accuracy (94-98% for praise, 82-88% for math errors).

02

AI assessments closely matched human judgments, with 73-89% agreement.

03

The study proposes cost-effective prompting methods for analyzing real-world tutoring dialogues.
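The agreement figures above are proportions of items on which a model's label matches a human rater's label. The following minimal Python sketch shows how such percent agreement could be computed, assuming each rater assigns one categorical label per tutoring excerpt. The function names and example labels are hypothetical; Cohen's kappa is included only as a common chance-corrected companion statistic and is not a figure this summary reports.

```python
from collections import Counter
from typing import Hashable, Sequence


def percent_agreement(model_labels: Sequence[Hashable], human_labels: Sequence[Hashable]) -> float:
    """Fraction of items on which the model's label equals the human label."""
    if not model_labels or len(model_labels) != len(human_labels):
        raise ValueError("label sequences must be non-empty and of equal length")
    matches = sum(m == h for m, h in zip(model_labels, human_labels))
    return matches / len(model_labels)


def cohens_kappa(model_labels: Sequence[Hashable], human_labels: Sequence[Hashable]) -> float:
    """Cohen's kappa: observed agreement corrected for agreement expected by chance."""
    n = len(model_labels)
    p_observed = percent_agreement(model_labels, human_labels)
    model_counts = Counter(model_labels)
    human_counts = Counter(human_labels)
    # Chance agreement: probability that both raters independently pick the same category.
    categories = model_counts.keys() | human_counts.keys()
    p_expected = sum(model_counts[c] * human_counts[c] for c in categories) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical labels for ten excerpts: did the tutor give praise?
human = ["praise", "praise", "none", "praise", "none", "none", "praise", "praise", "none", "praise"]
model = ["praise", "praise", "none", "praise", "praise", "none", "praise", "none", "none", "praise"]
print(percent_agreement(model, human))  # 0.8
print(round(cohens_kappa(model, human), 3))  # 0.583
```

Percent agreement is easy to interpret but can look high when one label dominates; kappa discounts the matches two raters would reach by chance given their label frequencies.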

Abstract

Tutoring improves student achievement, but identifying, at scale and from audio transcriptions, which tutoring actions are most associated with student learning is an open research problem. The present study investigates the feasibility and scalability of using generative AI to identify and evaluate specific tutor moves in real-life math tutoring. We analyze 50 randomly selected transcripts of college-student remote tutors assisting middle school students in mathematics. Using GPT-4, GPT-4o, GPT-4-turbo, Gemini-1.5-pro, and LearnLM, we assess tutors' application of two tutor skills: delivering effective praise and responding to student math errors. All models reliably detected relevant situations, such as tutors praising students (94-98% accuracy) and students making math errors (82-88% accuracy), and effectively evaluated the tutors' adherence to tutoring…
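The abstract describes two tasks per tutor skill: detecting a situation (praise given, a math error made) and judging whether the tutor's response follows best practice. A minimal Python sketch of one way to structure that as a detect-then-evaluate prompting loop follows. The prompt wording, the example criterion, and the names `assess_excerpt`, `call_model`, and `fake_model` are hypothetical and are not the authors' published prompts; `call_model` stands in for whatever LLM client is used (e.g., GPT-4o or Gemini-1.5-pro) and is assumed to return a JSON string.

```python
import json
from typing import Callable

# Hypothetical prompt wording for illustration; NOT the authors' published prompts.
DETECT_TEMPLATE = (
    "You are analyzing an excerpt from a math tutoring transcript.\n"
    "Does the following situation occur in the excerpt: {situation}?\n"
    'Reply only with JSON: {{"present": true}} or {{"present": false}}.\n\n'
    "Excerpt:\n{excerpt}"
)

EVALUATE_TEMPLATE = (
    "You are analyzing an excerpt from a math tutoring transcript.\n"
    "The following situation occurs in the excerpt: {situation}.\n"
    "Criterion: {criterion}\n"
    'Reply only with JSON: {{"meets_criterion": true or false, "rationale": "<one sentence>"}}.\n\n'
    "Excerpt:\n{excerpt}"
)


def assess_excerpt(
    excerpt: str,
    situation: str,
    criterion: str,
    call_model: Callable[[str], str],
) -> dict:
    """Stage 1 detects the situation; stage 2 evaluates the tutor only if it was detected."""
    detect_prompt = DETECT_TEMPLATE.format(situation=situation, excerpt=excerpt)
    present = bool(json.loads(call_model(detect_prompt))["present"])
    if not present:
        # No second model call when the skill never came up in the excerpt.
        return {"present": False, "meets_criterion": None}
    eval_prompt = EVALUATE_TEMPLATE.format(
        situation=situation, criterion=criterion, excerpt=excerpt
    )
    verdict = json.loads(call_model(eval_prompt))
    return {"present": True, "meets_criterion": bool(verdict["meets_criterion"])}


# Usage with a canned stand-in for a real LLM client (hypothetical).
def fake_model(prompt: str) -> str:
    if "Criterion:" in prompt:
        return '{"meets_criterion": true, "rationale": "Praise names the effort."}'
    return '{"present": true}'


result = assess_excerpt(
    excerpt="Tutor: Nice work checking each step before moving on!",
    situation="the tutor praises the student",
    criterion="Praise is specific and focuses on the student's effort or process.",
    call_model=fake_model,
)
print(result)  # {'present': True, 'meets_criterion': True}
```

Requesting JSON keeps each answer machine-parseable, and skipping the evaluation call when nothing is detected avoids a second request on excerpts where the skill never came up.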

Taxonomy

Methods: Dropout · Label Smoothing · Byte Pair Encoding · Absolute Position Encodings · Layer Normalization · Dense Connections · Softmax · Transformer · GPT-4