How Well Do AI Systems Solve AP Physics? A Comparative Evaluation of Large Language Models on Algebra-Based Free Response Questions

Bilas Paul, Jashandeep Kaur, Shantanu Chakraborty, and Shruti Shrestha

arXiv:2603.07457 · physics.ed-ph · March 10, 2026

TL;DR

This study systematically evaluates four large language models on AP Physics 1 and 2 free-response questions from 2015 to 2025. The models show strong problem-solving ability (mean scores of 82-92%) but have limitations in spatial reasoning and diagram interpretation, and their performance varies from year to year.

Contribution

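The paper provides a comparative analysis of four AI models on algebra-based physics free-response problems, scored by three independent physics experts against official College Board guidelines, and highlights the models' strengths and limitations in educational assessment contexts.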

Findings

01

All four models achieved mean scores of 82-92%.

02

Performance varied substantially from year to year, particularly for AP Physics 1.

03

Common errors include diagram misinterpretation and reasoning mistakes.

Abstract

The rapid advancement of LLMs has generated growing interest in their potential role in physics education and assessment, yet their performance on multi-faceted, free-response physics problems remains underexplored. In this study, we systematically evaluate four widely accessible AI systems (ChatGPT 4.1 mini, Gemini 2.5 Flash, Claude 4.0 Sonnet, and DeepSeek R1) on AP Physics 1 and 2 free-response questions administered between 2015 and 2025. Model-generated solutions were produced under standardized exam-style prompting and evaluated by three independent physics experts using official College Board scoring guidelines. All models achieved relatively high mean scores (82-92%), indicating strong capability in structured algebraic problem solving. However, substantial year-to-year variability was observed, particularly for AP Physics 1, where…
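
The abstract describes scoring by three independent experts and reports mean percentages and year-to-year variability. One way such numbers could be aggregated is sketched below. This is a minimal illustration, not the authors' procedure: the data values, the `SCORES` layout, the `percent_by_year` helper, and the choice to average raters per question before pooling points by year are all assumptions made for the example.

```python
from statistics import mean, pstdev

# Hypothetical data, invented for illustration (NOT from the paper):
# (exam year, points awarded by each of three raters, maximum points).
SCORES = [
    (2015, [10, 11, 10], 12),
    (2015, [6, 7, 7], 7),
    (2016, [8, 8, 9], 12),
    (2016, [7, 7, 7], 7),
]


def percent_by_year(rows):
    """Average the raters per question, then pool points within each year."""
    earned, possible = {}, {}
    for year, rater_points, max_points in rows:
        earned[year] = earned.get(year, 0.0) + mean(rater_points)
        possible[year] = possible.get(year, 0) + max_points
    return {year: 100.0 * earned[year] / possible[year] for year in earned}


yearly = percent_by_year(SCORES)
values = list(yearly.values())
overall_mean = mean(values)   # mean percentage across years
year_spread = pstdev(values)  # population standard deviation across years

for year in sorted(yearly):
    print(f"{year}: {yearly[year]:.1f}%")
print(f"mean = {overall_mean:.1f}%, spread = {year_spread:.1f} points")
```

Pooling points within a year weights each question by its maximum score. Averaging per-question percentages instead would weight all questions equally, and the two conventions can give different yearly figures.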

Taxonomy

Topics: Science Education and Pedagogy · Machine Learning in Materials Science · Model Reduction and Neural Networks