Evaluating GPT-3.5 and GPT-4 Models on Brazilian University Admission Exams
Desnes Nunes, Ricardo Primi, Ramon Pires, Roberto Lotufo, and Rodrigo Nogueira

TL;DR
This study evaluates the ability of GPT-3.5 and GPT-4 to answer questions from Brazilian high-stakes admission exams. GPT-4 with Chain-of-Thought (CoT) prompts achieves 87% accuracy on the 2022 exam, surpassing GPT-3.5.
Contribution
The study demonstrates the high performance of GPT-4 on complex, multidisciplinary exam questions and explores prompting strategies, such as Chain-of-Thought, for improving accuracy.
Findings
GPT-4 with CoT achieved 87% accuracy on 2022 ENEM questions.
GPT-4 outperformed GPT-3.5 by 11 percentage points.
Prompt strategies significantly improved model performance.
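Both headline figures are plain proportions: accuracy is the share of questions answered correctly, and the gap between two models is the difference of their accuracies expressed in percentage points. A minimal Python sketch, assuming predictions and the answer key are stored as option letters; the function names and the toy data are hypothetical and not taken from the paper:

```python
def accuracy(predictions: list[str], gold: list[str]) -> float:
    """Fraction of questions whose predicted letter matches the answer key."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("no questions to score")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)


def percentage_point_gap(acc_a: float, acc_b: float) -> float:
    """Absolute difference between two accuracies, in percentage points."""
    return (acc_a - acc_b) * 100


# Hypothetical toy data (not from the paper): 4 questions, options A-E.
gold = ["A", "C", "E", "B"]
model_x = ["A", "C", "E", "D"]  # 3 of 4 correct
model_y = ["A", "B", "E", "D"]  # 2 of 4 correct

acc_x = accuracy(model_x, gold)
acc_y = accuracy(model_y, gold)
# Prints: 75% vs 50%: gap = 25 pp
print(f"{acc_x:.0%} vs {acc_y:.0%}: gap = {percentage_point_gap(acc_x, acc_y):.0f} pp")
```

A percentage-point gap is absolute: in this toy case the 25 pp gap corresponds to a 50% relative improvement over the weaker model (0.25 / 0.50), so the two measures should not be confused.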
Abstract
The present study aims to explore the capabilities of Language Models (LMs) in tackling high-stakes multiple-choice tests, represented here by the Exame Nacional do Ensino Médio (ENEM), a multidisciplinary entrance examination widely adopted by Brazilian universities. This exam poses challenging tasks for LMs, since its questions may span multiple fields of knowledge, requiring understanding of information from diverse domains. For instance, solving a question may require comprehension of both statistics and biology. This work analyzed responses generated by GPT-3.5 and GPT-4 models for questions presented in the 2009-2017 exams, as well as for questions of the 2022 exam, which were made public after the training of the models was completed. Furthermore, different prompt strategies were tested, including the use of Chain-of-Thought (CoT) prompts to generate explanations for…
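The abstract mentions Chain-of-Thought prompts, which ask the model to write out its reasoning before committing to an answer. The paper's actual prompts are not reproduced here; the following is only a minimal sketch, assuming a few-shot prompt made of fully worked exemplars and a final "Answer: X" line that can be parsed from the model's free-text reply. The prompt wording, the "Answer:" marker, and the function names are hypothetical, and the model call itself is omitted. ENEM questions offer five alternatives, A to E.

```python
import re
from typing import Optional, Sequence

LETTERS = "ABCDE"  # ENEM multiple-choice questions have five alternatives


def build_cot_prompt(question: str, options: Sequence[str],
                     exemplars: Sequence[str] = ()) -> str:
    """Assemble a hypothetical few-shot CoT prompt.

    Each exemplar is assumed to be a fully worked question: statement,
    options, step-by-step explanation, and a final 'Answer: X' line.
    """
    if len(options) != len(LETTERS):
        raise ValueError("expected exactly five options (A-E)")
    lines = list(exemplars)
    lines.append(question)
    lines += [f"{letter}) {text}" for letter, text in zip(LETTERS, options)]
    # Ask for reasoning first, then a fixed, machine-readable answer line.
    lines.append("Explain your reasoning step by step, then finish with 'Answer: X'.")
    return "\n".join(lines)


# Matches 'Answer: B' or 'answer: (b)'; \b rejects words such as 'Both'.
ANSWER_RE = re.compile(r"Answer:\s*\(?([A-E])\b", re.IGNORECASE)


def extract_answer(completion: str) -> Optional[str]:
    """Return the last answer letter stated in the completion, or None."""
    matches = ANSWER_RE.findall(completion)
    return matches[-1].upper() if matches else None


prompt = build_cot_prompt("What is 2 + 3?", ["4", "5", "6", "7", "8"])
print(prompt)
reply = "2 plus 3 equals 5, which is option B. Answer: B"  # hypothetical model output
print(extract_answer(reply))  # B
```

Taking the last match rather than the first is a design choice for CoT outputs, where the model may mention several options while reasoning before stating its final answer.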
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Topic Modeling
Methods: Multi-Head Attention · Attention Is All You Need · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Label Smoothing · Transformer · Dropout · Cosine Annealing · Dense Connections
