Evaluating vision-capable chatbots in interpreting kinematics graphs: a comparative study of free and subscription-based models

Giulia Polverini; Bor Gregorcic

arXiv:2406.14685·physics.ed-ph·October 25, 2024

Open Access

TL;DR

This study evaluates the visual interpretation abilities of eight large multimodal chatbots on kinematics graphs, revealing that OpenAI's models outperform others and highlighting the influence of task type on chatbot performance.

Contribution

It provides a comparative analysis of free and subscription-based multimodal chatbots' performance on graph interpretation tasks in STEM, an area with limited prior research.

Findings

01 OpenAI's chatbots outperform others in graph interpretation.

02 ChatGPT-4o achieves the best overall performance.

03 Tasks with linguistic input are easier than visual interpretation tasks.

Abstract

This study investigates the performance of eight large multimodal model (LMM)-based chatbots on the Test of Understanding Graphs in Kinematics (TUG-K), a research-based concept inventory. Graphs are a widely used representation in STEM and medical fields, making them a relevant topic for exploring LMM-based chatbots' visual interpretation abilities. We evaluated both freely available chatbots (Gemini 1.0 Pro, Claude 3 Sonnet, Microsoft Copilot, and ChatGPT-4o) and subscription-based ones (Gemini 1.0 Ultra, Gemini 1.5 Pro API, Claude 3 Opus, and ChatGPT-4). We found that OpenAI's chatbots outperform all the others, with ChatGPT-4o showing the overall best performance. Contrary to expectations, we found no notable differences in the overall performance between freely available and subscription-based versions of Gemini and Claude 3 chatbots, with the exception of Gemini 1.5 Pro, available…

Taxonomy

Topics: AI in Service Interactions