Multilingual Performance of a Multimodal Artificial Intelligence System on Multisubject Physics Concept Inventories
Gerd Kortemeyer, Marina Babayeva, Giulia Polverini, Ralf Widenhorn, Bor Gregorcic

TL;DR
This study evaluates GPT-4o's multilingual and multimodal capabilities on physics concept inventories uploaded as images. Performance varies by subject and language, with weaknesses in laboratory skills and visual interpretation, which has implications for physics education.
Contribution
The study is the first to assess GPT-4o's performance on physics concept inventories across multiple languages and modalities, highlighting both its potential and its limitations in physics education.
Findings
GPT-4o outperforms undergraduate students in most physics subjects.
Performance varies across languages and is strongest for English and European languages.
Laboratory skills are the weakest area for GPT-4o.
Abstract
We investigate the multilingual and multimodal performance of a large language model-based artificial intelligence (AI) system, GPT-4o, using a diverse set of physics concept inventories spanning multiple languages and subject categories. The inventories, sourced from the PhysPort website, cover classical physics topics such as mechanics, electromagnetism, optics, and thermodynamics, as well as relativity, quantum mechanics, astronomy, mathematics, and laboratory skills. Unlike previous text-only studies, we uploaded the inventories as images to reflect what a student would see on paper, thereby assessing the system's multimodal functionality. Our results indicate variation in performance across subjects, with laboratory skills standing out as the weakest. We also observe differences across languages, with English and European languages showing the strongest performance. Notably, the…
Taxonomy
Topics: Natural Language Processing Techniques · Speech and dialogue systems · Educational Technology and Assessment
Methods: Sparse Evolutionary Training
