A Dialogue-Based Framework for Correcting Multimodal Errors in AI-Assisted STEM Education

Akshay Syal; Lawrence Swaminathan Xavier Prince; Evin Gultepe; Nik Bear Brown; Srinivas Sridhar

arXiv:2605.04131 · physics.ed-ph · May 7, 2026

TL;DR

This paper evaluates LLMs on multimodal physics problems, identifies specific failure modes, and demonstrates a dialogue-based intervention that corrects most of the resulting errors without retraining the models.

Contribution

The paper introduces a structured multimodal dialogue framework that corrects errors LLMs make when handling multimodal STEM content, addressing a key limitation of AI tutoring.

Findings

1. All three LLMs achieved near-ceiling accuracy (96%) on text-only physics problems.
2. Performance drops on multimodal problems, and the drop can be traced to specific failure modes.
3. The dialogue intervention corrected 82% of errors and was particularly effective on visual processing errors.

Abstract

Large Language Models (LLMs) are democratizing access to personalized tutoring; however, their effectiveness is hindered by challenges in processing multimodal content, which limits AI's potential to provide equitable, high-quality STEM support. This study evaluates LLM performance on multimodal physics problems, identifies specific failure modes through an empirical error taxonomy, and tests practical interventions designed to overcome multimodal processing limitations. We assessed three publicly available LLMs (Claude, Gemini, and ChatGPT) on multimodal physics problems from the OpenStax database and compared the results with text-only performance. An empirically derived error taxonomy was developed through pilot testing, followed by evaluation of a structured multimodal dialogue intervention. All three models achieved near-ceiling accuracy (96%) on text-only physics problems.…
