Performance of ChatGPT on tasks involving physics visual representations: the case of the Brief Electricity and Magnetism Assessment

Giulia Polverini, Jakob Melin, Elias Onerud, and Bor Gregorcic

arXiv:2412.10019·physics.ed-ph·May 29, 2025

TL;DR

This study assesses the performance of ChatGPT-4 and ChatGPT-4o on physics tasks involving visual representations. It reveals both strengths and persistent challenges, especially in visual interpretation and spatial reasoning, with implications for education and assessment design.

Contribution

It provides a comprehensive evaluation of multimodal ChatGPT models on physics assessments with visual content, highlighting their capabilities and limitations in interpreting physics visuals.

Findings

01

ChatGPT-4o outperforms ChatGPT-4 and students on BEMA tasks.

02

ChatGPT-4o has improved vision interpretation over ChatGPT-4.

03

ChatGPT-4o still shows persistent difficulties with visual interpretation and spatial reasoning.

Abstract

Artificial intelligence-based chatbots are increasingly influencing physics education due to their ability to interpret and respond to textual and visual inputs. This study evaluates the performance of two large multimodal model-based chatbots, ChatGPT-4 and ChatGPT-4o, on the Brief Electricity and Magnetism Assessment (BEMA), a conceptual physics inventory rich in visual representations such as vector fields, circuit diagrams, and graphs. Quantitative analysis shows that ChatGPT-4o outperforms both ChatGPT-4 and a large sample of university students, and demonstrates improvements in ChatGPT-4o's vision interpretation ability over its predecessor ChatGPT-4. However, qualitative analysis of ChatGPT-4o's responses reveals persistent challenges. We identified three types of difficulties in the chatbot's responses to tasks on BEMA: (1) difficulties with visual interpretation, (2)…
