A Dialogue-Based Framework for Correcting Multimodal Errors in AI-Assisted STEM Education
Akshay Syal, Lawrence Swaminathan Xavier Prince, Evin Gultepe, Nik Bear Brown, Srinivas Sridhar

TL;DR
This paper evaluates LLMs on multimodal physics problems, identifies specific failure modes, and demonstrates a dialogue-based intervention that significantly improves error correction without retraining models.
Contribution
The paper introduces a structured multimodal dialogue framework that corrects errors LLMs make when handling multimodal STEM content, addressing a key limitation in AI tutoring.
Findings
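All three LLMs (Claude, Gemini, and ChatGPT) reach near-ceiling accuracy (96%) on text-only physics problems.
Performance drops on multimodal problems, with errors falling into specific failure modes captured in an empirically derived error taxonomy.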
The dialogue intervention corrected 82% of errors, especially visual processing errors.
Abstract
Large Language Models (LLMs) are democratizing access to personalized tutoring; however, their effectiveness is hindered by challenges in processing multimodal content, which limits AI's potential to provide equitable, high-quality STEM support. This study evaluates LLM performance on multimodal physics problems, identifies specific failure modes through an empirical error taxonomy, and tests practical interventions designed to overcome multimodal processing limitations. We assessed three publicly available LLMs (Claude, Gemini, and ChatGPT) on multimodal physics problems from the OpenStax database and compared the results with text-only performance. An empirically derived error taxonomy was developed through pilot testing, followed by evaluation of a structured multimodal dialogue intervention. All three models achieved near-ceiling accuracy (96%) on text-only physics problems.…
