Using AI Large Language Models for Grading in Education: A Hands-On Test for Physics
Ryan Mok, Faraaz Akhtar, Louis Clare, Christine Li, Jun Ida, Lewis Ross, and Mario Campanelli

TL;DR
This study evaluates the effectiveness of large language models in grading undergraduate physics assessments, highlighting their potential and limitations, and proposes a method to improve AI grading accuracy using mark schemes.
Contribution
It introduces an empirical procedure to assess AI grading in physics, demonstrating how providing mark schemes enhances grading quality and analyzing topic-specific differences.
Findings
AI grading is prone to errors and hallucinations
Providing mark schemes improves grading accuracy
Grading performance correlates with problem-solving ability
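As a rough illustration of how findings like these could be quantified, the sketch below computes two simple statistics: the mean absolute difference between human and AI marks, and a Pearson correlation such as one might use to relate grading agreement to problem-solving scores. This is a minimal sketch under stated assumptions, not the authors' procedure: the function names (mean_abs_diff, pearson) and the example marks are hypothetical placeholders and do not come from the study.

```python
from math import sqrt


def mean_abs_diff(human, ai):
    """Mean absolute difference between human and AI marks for the same solutions."""
    if len(human) != len(ai) or not human:
        raise ValueError("mark lists must be non-empty and of equal length")
    return sum(abs(h - a) for h, a in zip(human, ai)) / len(human)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Placeholder marks for four solutions. These are NOT data from the study.
human_marks = [8, 5, 10, 3]
ai_marks = [7, 6, 10, 5]

print("mean absolute difference:", mean_abs_diff(human_marks, ai_marks))
print("correlation:", pearson(human_marks, ai_marks))
```

A lower mean absolute difference indicates closer agreement with the human grader. The same pearson helper could be applied to per-model pairs of (problem-solving score, grading agreement) if such scores were available.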
Abstract
Grading assessments is time-consuming and prone to human bias. Students may experience delays in receiving feedback that may not be tailored to their expectations or needs. Harnessing AI in education can be effective for grading undergraduate physics problems, enhancing the efficiency of undergraduate-level physics learning and teaching, and helping students understand concepts with the help of a constantly available tutor. This report devises a simple empirical procedure to investigate and quantify how well large language model (LLM) based AI chatbots can grade solutions to undergraduate physics problems in Classical Mechanics, Electromagnetic Theory and Quantum Mechanics, comparing human grading against AI grading. The following LLMs were tested: Gemini 1.5 Pro, GPT-4, GPT-4o and Claude 3.5 Sonnet. The results show AI grading is prone to mathematical errors and hallucinations, which render…
Taxonomy
Topics: Online Learning and Analytics · Advanced Data Processing Techniques · Topic Modeling
