Evaluating GPT-4's Vision Capabilities on Brazilian University Admission Exams
Ramon Pires, Thales Sales Almeida, Hugo Abonizio, Rodrigo Nogueira

TL;DR
This paper evaluates GPT-4's ability to handle complex Brazilian university entrance exam questions involving both text and visuals, highlighting strengths in multidisciplinary questions and challenges in mathematical reasoning.
Contribution
It introduces a comprehensive multimodal evaluation framework for language models on real-world exams, focusing on Portuguese-language tests and visual comprehension.
Findings
GPT-4 excels in multidisciplinary questions
Text captions outperform direct image use
Mathematical questions remain challenging
Abstract
Recent advancements in language models have showcased human-comparable performance in academic entrance exams. However, existing studies often overlook questions that require the integration of visual comprehension, thus compromising the full spectrum and complexity inherent in real-world scenarios. To address this gap, we present a comprehensive framework to evaluate language models on entrance exams, which incorporates both textual and visual elements. We evaluate the two most recent editions of Exame Nacional do Ensino Médio (ENEM), the main standardized entrance examination adopted by Brazilian universities. Our study not only reaffirms the capabilities of GPT-4 as the state of the art for handling complex multidisciplinary questions, but also pioneers in offering a realistic assessment of multimodal language models on Portuguese examinations. One of the highlights is that text…
Taxonomy
Topics: Topic Modeling · Text Readability and Simplification · Natural Language Processing Techniques
Methods: Attention Is All You Need · Byte Pair Encoding · Dense Connections · Dropout · Softmax · Absolute Position Encodings · Layer Normalization · Linear Layer · Position-Wise Feed-Forward Layer · Label Smoothing
