Evaluating ChatGPT-4 Vision on Brazil's National Undergraduate Computer Science Exam
Nabor C. Mendonça

TL;DR
This study evaluates ChatGPT-4 Vision's performance on Brazil's 2021 National Undergraduate Computer Science Exam. It shows the model's strength on questions that contain visual elements and highlights the importance of question quality and human oversight.
Contribution
First large-scale assessment of ChatGPT-4 Vision on a national exam, revealing its capabilities and limitations in academic evaluation contexts.
Findings
ChatGPT-4 Vision outperformed the average exam participant and scored within the top 10 percent.
The model performed well on questions containing visual elements but struggled with question interpretation and logical reasoning.
Some exam questions were poorly constructed, affecting evaluation accuracy.
Abstract
The recent integration of visual capabilities into Large Language Models (LLMs) has the potential to play a pivotal role in science and technology education, where visual elements such as diagrams, charts, and tables are commonly used to improve the learning experience. This study investigates the performance of ChatGPT-4 Vision, OpenAI's most advanced visual model at the time the study was conducted, on the Bachelor in Computer Science section of Brazil's 2021 National Undergraduate Exam (ENADE). By presenting the model with the exam's open and multiple-choice questions in their original image format and allowing for reassessment in response to differing answer keys, we were able to evaluate the model's reasoning and self-reflecting capabilities in a large-scale academic assessment involving textual and visual content. ChatGPT-4 Vision significantly outperformed the average exam…
