Performance of ChatGPT on tasks involving physics visual representations: the case of the Brief Electricity and Magnetism Assessment
Giulia Polverini, Jakob Melin, Elias Onerud, and Bor Gregorcic

TL;DR
This study assesses the performance of ChatGPT-4 and ChatGPT-4o on physics tasks involving visual representations. It reveals strengths alongside persistent challenges, especially in visual interpretation and spatial reasoning, with implications for education and assessment design.
Contribution
It provides a comprehensive evaluation of multimodal ChatGPT models on physics assessments with visual content, highlighting their capabilities and limitations in interpreting physics visuals.
Findings
ChatGPT-4o outperforms both ChatGPT-4 and a large sample of university students on BEMA.
ChatGPT-4o has improved vision interpretation over ChatGPT-4.
ChatGPT-4o still shows persistent difficulties with visual interpretation and spatial reasoning.
Abstract
Artificial intelligence-based chatbots are increasingly influencing physics education due to their ability to interpret and respond to textual and visual inputs. This study evaluates the performance of two large multimodal model-based chatbots, ChatGPT-4 and ChatGPT-4o, on the Brief Electricity and Magnetism Assessment (BEMA), a conceptual physics inventory rich in visual representations such as vector fields, circuit diagrams, and graphs. Quantitative analysis shows that ChatGPT-4o outperforms both ChatGPT-4 and a large sample of university students, and demonstrates improvements in ChatGPT-4o's vision interpretation ability over its predecessor ChatGPT-4. However, qualitative analysis of ChatGPT-4o's responses reveals persistent challenges. We identified three types of difficulties in the chatbot's responses to tasks on BEMA: (1) difficulties with visual interpretation, (2)…
