ChatGPT as a Solver and Grader of Programming Exams written in Spanish

Pablo Saborido-Fernández; Marcos Fernández-Pichel; David E. Losada

arXiv:2409.15112·cs.AI·March 21, 2025

Open Access

TL;DR

This study evaluates ChatGPT's ability to solve and grade Spanish programming exams, finding it effective only for simple tasks and less so for complex problems or grading, while providing a new dataset for future research.

Contribution

It introduces a new corpus of programming tasks and prompts, and assesses ChatGPT's capabilities in solving and grading Spanish programming exams.

Findings

01 Effective for simple coding tasks
02 Limited ability to solve complex problems
03 Not effective at grading solutions

Abstract

Evaluating the capabilities of Large Language Models (LLMs) to assist teachers and students in educational tasks is receiving increasing attention. In this paper, we assess ChatGPT's capacities to solve and grade real programming exams, from an accredited BSc degree in Computer Science, written in Spanish. Our findings suggest that this AI model is only effective for solving simple coding tasks. It is far from proficient at tackling complex problems or evaluating solutions authored by others. As part of this research, we also release a new corpus of programming tasks and the corresponding prompts for solving the problems or grading the solutions. This resource can be further exploited by other research teams.

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Online Learning and Analytics