Can Large Language Models Correctly Interpret Equations with Errors?
Lachlan McGinness, Peter Baumgartner

TL;DR
This study evaluates the ability of Large Language Models to accurately interpret and translate student-written equations with errors into a standard format, aiming to facilitate automated grading in physics education.
Contribution
The paper introduces two novel frameworks—consensus verification and neuro-symbolic feedback—for improving equation translation accuracy by LLMs.
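As a rough illustration of the neuro-symbolic feedback idea, the sketch below loops between a translator and a symbolic checker until the checker has no complaint. Everything here is a hypothetical stand-in rather than the authors' implementation: `toy_translate` replaces the language model, `check_symbols` replaces the automated reasoning tool, and the allowed symbol set and round limit are assumptions.

```python
import ast
from typing import Callable, Optional

# Assumed vocabulary of the "standard format"; purely illustrative.
ALLOWED = {"v", "u", "a", "t"}

def check_symbols(equation: str) -> Optional[str]:
    """Toy symbolic checker: return a feedback message, or None if the equation passes."""
    lhs, _, rhs = equation.partition("=")
    for side in (lhs, rhs):
        try:
            tree = ast.parse(side.strip(), mode="eval")
        except SyntaxError as exc:
            return f"cannot parse '{side.strip()}': {exc.msg}"
        names = {n.id for n in ast.walk(tree) if isinstance(n, ast.Name)}
        unknown = sorted(names - ALLOWED)
        if unknown:
            return f"unknown symbols: {unknown}"
    return None

def toy_translate(response: str, feedback: Optional[str]) -> str:
    """Stand-in for a language model: fixes its output once it receives feedback."""
    return "v = u + a*t" if feedback else "v = u + at"

def translate_with_feedback(
    response: str,
    translate: Callable[[str, Optional[str]], str],
    check: Callable[[str], Optional[str]],
    max_rounds: int = 3,
) -> Optional[str]:
    """Re-prompt the translator with checker feedback until the check passes."""
    feedback = None
    for _ in range(max_rounds):
        candidate = translate(response, feedback)
        feedback = check(candidate)
        if feedback is None:
            return candidate
    return None  # no acceptable translation within the round limit
```

Calling `translate_with_feedback("final velocity is u plus at", toy_translate, check_symbols)` first produces `v = u + at`, is told that `at` is an unknown symbol, and then returns the corrected form.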
Findings
No open-source model achieved the desired translation accuracy.
Automated reasoning feedback did not significantly improve translation performance.
The authors suggest that future work break the task into smaller steps and extend it to handwritten responses.
Abstract
This paper explores the potential of Large Language Models to accurately extract and translate equations from typed student responses into a standard format. This is a useful task, as standardized equations can be graded reliably using a Computer Algebra System or a Satisfiability Modulo Theories solver. Physics instructors interested in automated grading would therefore not need to rely on the mathematical reasoning capabilities of Language Models. We used two novel frameworks to improve the translations. The first is consensus, where a pair of models verify the correctness of the translations. The second is a neuro-symbolic LLM-modulo approach, where models receive feedback from an automated reasoning tool. We performed experiments using responses to the Australian Physics Olympiad exam. We report on results, finding that no open-source model was able to translate the student responses…
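To make the grading step concrete, here is a minimal sketch of how a Computer Algebra System could compare a standardized student equation with a reference equation. It uses SymPy as an assumed choice of CAS; the function name `equations_equivalent`, the `lhs = rhs` string format, and the example equations are illustrative and not taken from the paper. The check only recognises rearrangements and sign flips, not rescaled equations.

```python
import sympy as sp

def residual(equation: str) -> sp.Expr:
    """Turn 'lhs = rhs' into the expression lhs - rhs."""
    lhs, rhs = equation.split("=")
    # Caveat: sympify treats some names (e.g. E, I) as built-in constants,
    # so this sketch sticks to symbols such as v, u, a, t.
    return sp.sympify(lhs) - sp.sympify(rhs)

def equations_equivalent(student: str, reference: str) -> bool:
    """True if the two equations agree up to rearranging terms or swapping sides."""
    r_student = residual(student)
    r_reference = residual(reference)
    same = sp.expand(r_student - r_reference) == 0
    flipped = sp.expand(r_student + r_reference) == 0
    return bool(same or flipped)

if __name__ == "__main__":
    print(equations_equivalent("v - u = a*t", "v = u + a*t"))
    print(equations_equivalent("v = u + a*t**2", "v = u + a*t"))
```

Because the comparison is done symbolically, the grade does not depend on a language model's own algebra, which is the motivation the abstract gives for translating to a standard format first.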
Taxonomy
Topics: Topic Modeling
