Assessing the Capability of LLMs in Solving POSCOMP Questions

Cayo Viegas; Rohit Gheyi; Márcio Ribeiro

arXiv:2505.20338 · cs.CL · November 19, 2025

Open Access

TL;DR

This study evaluates the performance of various large language models on the POSCOMP computer science exam, demonstrating that recent models like ChatGPT-4 and Gemini 2.5 Pro outperform human participants in text-based questions.

Contribution

It provides a comprehensive assessment of LLM capabilities on a specialized, challenging computer science exam and shows that recent models outperform human test-takers on text-based questions.

Findings

01. ChatGPT-4 outperforms all human participants on the 2023 exam.

02. Recent models show continuous improvement across exam years.

03. LLMs excel at text-based questions but struggle with image interpretation.

Abstract

Recent advancements in Large Language Models (LLMs) have significantly expanded the capabilities of artificial intelligence in natural language processing tasks. Despite this progress, their performance in specialized domains such as computer science remains relatively unexplored. Understanding the proficiency of LLMs in these domains is critical for evaluating their practical utility and guiding future developments. The POSCOMP, a prestigious Brazilian examination for admission to graduate programs in computer science administered by the Brazilian Computer Society (SBC), provides a challenging benchmark. This study investigates whether LLMs can match or surpass human performance on the POSCOMP exam. Four LLMs (ChatGPT-4, Gemini 1.0 Advanced, Claude 3 Sonnet, and Le Chat Mistral Large) were initially evaluated on the 2022 and 2023 POSCOMP exams. The assessments measured the models' proficiency…
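The abstract describes comparing LLM scores with those of human test-takers, and the findings separate text-based from image-based questions. The sketch below is a hypothetical illustration of that kind of tally, not the authors' evaluation code. It assumes each model answer has already been graded as correct or incorrect against the official answer key; the record fields (`model`, `year`, `has_image`, `correct`), the helper names `accuracy_by_type` and `surpasses_all`, and the sample data are all invented for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class GradedAnswer:
    # Hypothetical record: one model's graded answer to one exam question.
    model: str
    year: int
    has_image: bool  # True if the question depends on a figure or diagram
    correct: bool    # assumed already checked against the official key


def accuracy_by_type(answers):
    """Return {(model, year, 'text' | 'image'): accuracy} over graded answers."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for a in answers:
        key = (a.model, a.year, "image" if a.has_image else "text")
        totals[key] += 1
        hits[key] += int(a.correct)
    return {k: hits[k] / totals[k] for k in totals}


def surpasses_all(model_correct, human_correct):
    """True if the model's number of correct answers beats every human score."""
    return model_correct > max(human_correct)


# Illustrative records only; these are not results from the paper.
answers = [
    GradedAnswer("model-a", 2023, False, True),
    GradedAnswer("model-a", 2023, False, True),
    GradedAnswer("model-a", 2023, False, False),
    GradedAnswer("model-a", 2023, True, False),
    GradedAnswer("model-a", 2023, True, True),
]
scores = accuracy_by_type(answers)
print(scores[("model-a", 2023, "text")])   # 2 of 3 text questions correct
print(scores[("model-a", 2023, "image")])  # 1 of 2 image questions correct
print(surpasses_all(40, [35, 39]))         # hypothetical human scores
```

Splitting the key by question type keeps text-only and image-dependent accuracy separate, so a model that is strong on text but weak on figures does not have the two effects averaged away into one number.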
