Realizing Visual Question Answering for Education: GPT-4V as a   Multimodal AI

Gyeong-Geon Lee; and Xiaoming Zhai

arXiv:2405.07163·physics.ed-ph·May 14, 2024·2 cites

Realizing Visual Question Answering for Education: GPT-4V as a Multimodal AI

Gyeong-Geon Lee, and Xiaoming Zhai

PDF

Open Access

TL;DR

This paper explores how GPT-4V can enable accessible and useful visual question answering (VQA) techniques for educational research, bridging AI and education without technical barriers.

Contribution

It introduces the application of GPT-4V-based VQA in education, demonstrating its potential to transform educational research methodologies.

Findings

01

GPT-4V enables accessible VQA for educators.

02

VQA can analyze classroom images and drawings.

03

Potential to enhance educational research methods.

Abstract

Educational scholars have analyzed various image data acquired from teaching and learning situations, such as photos that shows classroom dynamics, students' drawings with regard to the learning content, textbook illustrations, etc. Unquestioningly, most qualitative analysis of and explanation on image data have been conducted by human researchers, without machine-based automation. It was partially because most image processing artificial intelligence models were not accessible to general educational scholars or explainable due to their complex deep neural network architecture. However, the recent development of Visual Question Answering (VQA) techniques is accomplishing usable visual language models, which receive from the user a question about the given image and returns an answer, both in natural language. Particularly, GPT-4V released by OpenAI, has wide opened the state-of-the-art…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsMultimodal Machine Learning Applications · Intelligent Tutoring Systems and Adaptive Learning · Natural Language Processing Techniques

Methodstravel james