Realizing Visual Question Answering for Education: GPT-4V as a Multimodal AI
Gyeong-Geon Lee, and Xiaoming Zhai

TL;DR
This paper explores how GPT-4V can enable accessible and useful visual question answering (VQA) techniques for educational research, bridging AI and education without technical barriers.
Contribution
It introduces the application of GPT-4V-based VQA in education, demonstrating its potential to transform educational research methodologies.
Findings
GPT-4V enables accessible VQA for educators.
VQA can analyze classroom images and drawings.
Potential to enhance educational research methods.
Abstract
Educational scholars have analyzed various image data acquired from teaching and learning situations, such as photos that shows classroom dynamics, students' drawings with regard to the learning content, textbook illustrations, etc. Unquestioningly, most qualitative analysis of and explanation on image data have been conducted by human researchers, without machine-based automation. It was partially because most image processing artificial intelligence models were not accessible to general educational scholars or explainable due to their complex deep neural network architecture. However, the recent development of Visual Question Answering (VQA) techniques is accomplishing usable visual language models, which receive from the user a question about the given image and returns an answer, both in natural language. Particularly, GPT-4V released by OpenAI, has wide opened the state-of-the-art…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMultimodal Machine Learning Applications · Intelligent Tutoring Systems and Adaptive Learning · Natural Language Processing Techniques
Methodstravel james
