Evaluating GPT-3.5 and GPT-4 Models on Brazilian University Admission Exams

Desnes Nunes, Ricardo Primi, Ramon Pires, Roberto Lotufo, and Rodrigo Nogueira

arXiv:2303.17003·cs.CL·March 31, 2023·24 cites

Open Access 1 Repo 2 Datasets

TL;DR

This study evaluates how well GPT-3.5 and GPT-4 answer questions from the ENEM, a high-stakes Brazilian university admission exam. GPT-4 with Chain-of-Thought prompts reaches 87% accuracy on the 2022 exam, 11 percentage points above GPT-3.5.

Contribution

The paper demonstrates GPT-4's high performance on complex, multidisciplinary exam questions and examines how prompt strategies such as Chain-of-Thought affect accuracy.

Findings

01

GPT-4 with CoT achieved 87% accuracy on 2022 ENEM questions.

02

GPT-4 outperformed GPT-3.5 by 11 percentage points.

03

Prompt strategy mattered: the best-performing configuration used Chain-of-Thought prompts.

Abstract

The present study aims to explore the capabilities of Language Models (LMs) in tackling high-stakes multiple-choice tests, represented here by the Exame Nacional do Ensino Médio (ENEM), a multidisciplinary entrance examination widely adopted by Brazilian universities. This exam poses challenging tasks for LMs, since its questions may span multiple fields of knowledge and require understanding information from diverse domains. For instance, a question may require comprehension of both statistics and biology to be solved. This work analyzed responses generated by GPT-3.5 and GPT-4 models for questions presented in the 2009-2017 exams, as well as for questions from the 2022 exam, which were made public after the training of the models was completed. Furthermore, different prompt strategies were tested, including the use of Chain-of-Thought (CoT) prompts to generate explanations for…
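
The evaluation described in the abstract (prompting a model with a multiple-choice question, asking for an explanation, and scoring the chosen alternative) can be sketched roughly as follows. This is a minimal illustration and not the authors' code: the prompt wording, the `Answer: <letter>` convention, and the function names `build_cot_prompt`, `extract_answer`, and `accuracy` are all assumptions made for this example. It makes no model calls; completions are passed in as plain strings.

```python
import re
from typing import List, Optional

# ENEM questions offer five alternatives, labelled A to E.
OPTIONS = "ABCDE"


def build_cot_prompt(question: str, alternatives: List[str]) -> str:
    """Format one question as a CoT-style prompt (hypothetical wording)."""
    lines = [f"Question: {question}"]
    lines += [f"{letter}. {text}" for letter, text in zip(OPTIONS, alternatives)]
    lines.append(
        "Explain your reasoning step by step, then finish with 'Answer: <letter>'."
    )
    return "\n".join(lines)


def extract_answer(completion: str) -> Optional[str]:
    """Return the last 'Answer: X' letter in a completion, or None if absent."""
    matches = re.findall(r"Answer:\s*\(?([A-E])\)?", completion)
    return matches[-1] if matches else None


def accuracy(completions: List[str], gold: List[str]) -> float:
    """Fraction of completions whose extracted letter matches the gold letter.

    A completion with no parseable answer counts as wrong.
    """
    if not gold:
        raise ValueError("gold must not be empty")
    correct = sum(extract_answer(c) == g for c, g in zip(completions, gold))
    return correct / len(gold)
```

Taking the last matching letter, rather than the first, is a design choice for this sketch: a step-by-step explanation may mention other alternatives before settling on a final one.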

Code & Models

Repositories

piresramon/gpt-4-enem (Official)

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Topic Modeling

Methods: Multi-Head Attention · Attention Is All You Need · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Label Smoothing · Transformer · Dropout · Cosine Annealing · Dense Connections