Stepwise Verification and Remediation of Student Reasoning Errors with Large Language Model Tutors

Nico Daheim; Jakub Macina; Manu Kapur; Iryna Gurevych; Mrinmaya Sachan

arXiv:2407.09136 · cs.CL · July 15, 2024

TL;DR

This paper introduces a stepwise verification approach for LLM-based math tutors. Verifying a student's solution first detects errors more accurately, and grounding the tutor's response in that verification yields more targeted and reliable feedback.

Contribution

It presents a novel error verification method grounded in real student solutions, improving LLM tutor responses by accurately identifying mistakes and reducing hallucinations.

Findings

01. Verifiers improve error detection accuracy.

02. Targeted responses are more often correct and contain fewer hallucinations.

03. Grounding response generation in verification improves tutor response quality.

Abstract

Large language models (LLMs) present an opportunity to scale high-quality personalized education to all. A promising approach toward this end is to build dialog tutoring models that scaffold students' problem-solving. However, even though existing LLMs perform well at solving reasoning questions, they struggle to precisely detect students' errors and tailor their feedback to these errors. Inspired by real-world teaching practice, where teachers identify student errors and customize their responses accordingly, we focus on verifying student solutions and show how grounding to such verification improves the overall quality of tutor response generation. We collect a dataset of 1K stepwise math reasoning chains with the first error step annotated by teachers. We show empirically that finding the mistake in a student solution is challenging for current models. We propose and evaluate…
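To make the verify-then-generate idea in the abstract concrete, here is a minimal Python sketch. It is an illustration only, not the authors' implementation: the `StudentSolution` record, the `build_tutor_prompt` wording, and the `first_error_accuracy` metric are all hypothetical names and formats, and the verifier is assumed to be any callable that returns a 0-based index of the first erroneous step (or `None` if it finds no error).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class StudentSolution:
    """A stepwise student solution (hypothetical record layout)."""
    problem: str
    steps: List[str]
    # 0-based index of the first erroneous step as annotated by a teacher,
    # or None if the solution contains no error.
    first_error_step: Optional[int]


# Assumed verifier interface: maps a solution to a predicted first-error index.
Verifier = Callable[[StudentSolution], Optional[int]]


def build_tutor_prompt(solution: StudentSolution, predicted_error: Optional[int]) -> str:
    """Build a tutor prompt grounded in the verifier output (wording is illustrative)."""
    lines = [f"Problem: {solution.problem}", "Student solution:"]
    lines += [f"  Step {i + 1}: {step}" for i, step in enumerate(solution.steps)]
    if predicted_error is None:
        lines.append("Verification: no error found.")
    else:
        lines.append(f"Verification: first error at step {predicted_error + 1}.")
    lines.append("Tutor response:")
    return "\n".join(lines)


def first_error_accuracy(solutions: List[StudentSolution], verifier: Verifier) -> float:
    """Fraction of solutions whose predicted first-error step matches the annotation."""
    if not solutions:
        raise ValueError("need at least one solution")
    hits = sum(verifier(s) == s.first_error_step for s in solutions)
    return hits / len(solutions)
```

In this sketch the verifier's output is placed in the prompt before the tutor response is generated, so the response can address the specific step flagged as wrong rather than the solution as a whole.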

Code & Models

Repositories

eth-lre/verify-then-generate
Official

Datasets

eth-nlped/stepverify
dataset · 238 downloads

Taxonomy

Topics: Intelligent Tutoring Systems and Adaptive Learning · Educational Technology and Assessment
