Using an LLM to Investigate Students' Explanations on Conceptual Physics Questions
Sean Savage, N. Sanjay Rebello

TL;DR
This study demonstrates that GPT-4o can classify and categorize students' written physics explanations at scale, revealing misconceptions that multiple-choice assessments do not capture.
Contribution
The paper introduces a validated LLM approach to assess and categorize students' written explanations in physics, providing insights into their conceptual understanding.
Findings
GPT-4o classifies student explanations as correct or incorrect with a 0-3% discrepancy from human graders.
Categories of incorrect explanations differ from the multiple-choice (MC) distractors, revealing deeper misconceptions.
The LLM enables scalable, detailed analysis of student understanding in large classes.
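The discrepancy figure in the first finding can be illustrated with a small sketch. The summary does not say how the paper computes discrepancy, so this assumes it is the fraction of responses on which the LLM's correct/incorrect label differs from the human grader's label. The function name `discrepancy_rate` and the sample labels are hypothetical and are not taken from the study.

```python
def discrepancy_rate(llm_labels, human_labels):
    """Return the fraction of responses where the two label lists disagree.

    Assumption: discrepancy means per-response disagreement on a
    correct/incorrect label. The paper may define it differently.
    """
    if len(llm_labels) != len(human_labels):
        raise ValueError("Label lists must have the same length.")
    if not llm_labels:
        raise ValueError("Label lists must not be empty.")
    # Count positions where the LLM and the human grader disagree.
    disagreements = sum(
        1 for llm, human in zip(llm_labels, human_labels) if llm != human
    )
    return disagreements / len(llm_labels)


# Hypothetical labels for four student explanations (not real study data).
llm = ["correct", "incorrect", "correct", "correct"]
human = ["correct", "incorrect", "incorrect", "correct"]

rate = discrepancy_rate(llm, human)
print(f"Discrepancy: {rate:.0%}")
```

Under this definition, a reported 0-3% discrepancy would mean the two graders disagree on at most about 3 of every 100 explanations.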
Abstract
Analyzing students' written solutions to physics questions is a major area in physics education research (PER). However, gauging student understanding in college courses is bottlenecked by large class sizes, which limits assessments to a multiple-choice (MC) format for ease of grading. Although sufficient in quantifying scientifically correct conceptions, MC assessments do not uncover students' deeper ways of understanding physics. Large language models (LLMs) offer a promising approach for assessing students' written responses at scale. Our study used an LLM, validated by human graders, to classify students' written explanations to three questions on the Energy and Momentum Conceptual Survey as correct or incorrect, and organized students' incorrect explanations into emergent categories. We found that the LLM (GPT-4o) can fairly assess students' explanations, comparable to human graders (0-3% discrepancy).…
