ChatGPT as a Solver and Grader of Programming Exams written in Spanish
Pablo Saborido-Fernández, Marcos Fernández-Pichel, David E. Losada

TL;DR
This study evaluates ChatGPT's ability to solve and grade programming exams written in Spanish. It finds the model effective only for simple coding tasks and far from effective for complex problems or for grading, and it releases a new dataset for future research.
Contribution
It introduces a new corpus of programming tasks and prompts, and assesses ChatGPT's capabilities in solving and grading Spanish programming exams.
Findings
Effective for simple coding tasks
Limited ability to solve complex problems
Not effective at grading solutions
Abstract
Evaluating the capabilities of Large Language Models (LLMs) to assist teachers and students in educational tasks is receiving increasing attention. In this paper, we assess ChatGPT's capacities to solve and grade real programming exams, from an accredited BSc degree in Computer Science, written in Spanish. Our findings suggest that this AI model is only effective for solving simple coding tasks. Its proficiency in tackling complex problems or evaluating solutions authored by others is far from adequate. As part of this research, we also release a new corpus of programming tasks and the corresponding prompts for solving the problems or grading the solutions. This resource can be further exploited by other research teams.
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Online Learning and Analytics
