Evaluating Generative AI for CS1 Code Grading: Direct vs Reverse Methods

Ahmad Memon; Abdallah Mohamed

arXiv:2511.14798 · cs.SE · November 20, 2025

Open Access

TL;DR

This paper compares two AI-based methods for grading introductory programming assignments, evaluating their accuracy, efficiency, and consistency against human graders, and explores their potential for scalable, fair assessment.

Contribution

It introduces a novel 'Reverse' grading approach using LLMs, compares it with the traditional 'Direct' method, and assesses their effectiveness in automated CS1 code grading.

Findings

01. The Reverse method offers a more fine-grained assessment.

02. The Direct method is faster and simpler to implement.

03. Both methods benefit from careful prompt engineering.

Abstract

Manual grading of programming assignments in introductory computer science courses can be time-consuming and prone to inconsistencies. While unit testing is commonly used for automatic evaluation, it typically follows a binary pass/fail model and does not give partial marks. Recent advances in large language models (LLMs) offer the potential for automated, scalable, and more objective grading. This paper compares two AI-based grading techniques: Direct, where the AI model applies a rubric directly to student code, and Reverse (a newly proposed approach), where the AI first fixes errors, then deduces a grade based on the nature and number of fixes. Each method was evaluated on both the instructor's original grading scale and a tenfold expanded scale to assess the impact of range on AI grading accuracy. To assess their effectiveness, AI-assigned scores were evaluated…
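To make the Reverse idea concrete, here is a minimal Python sketch of the final step only: turning a list of fixes into a mark. It assumes the fixes have already been produced and categorised (for example by an LLM), and it does not call any model. The fix categories, the penalty weights, and the names `Fix`, `PENALTY_PERCENT`, and `reverse_grade` are all hypothetical illustrations, not the rubric or implementation used in the paper.

```python
from dataclasses import dataclass

# Hypothetical penalty per fix category, in percent of the maximum mark.
# These categories and weights are illustrative assumptions, not taken from the paper.
PENALTY_PERCENT = {"syntax": 5, "logic": 15, "missing_feature": 25}


@dataclass
class Fix:
    category: str      # nature of the fix, a key of PENALTY_PERCENT
    description: str   # short note on what was changed


def reverse_grade(fixes, max_mark):
    """Deduce a mark from the nature and number of fixes a submission needed."""
    deduction = sum(PENALTY_PERCENT[f.category] for f in fixes)
    remaining = max(0, 100 - deduction)  # never drop below zero
    return max_mark * remaining / 100


# Example: one syntax fix and one logic fix, graded on a 10-point scale
# and on a tenfold expanded 100-point scale.
fixes = [
    Fix("syntax", "added a missing colon"),
    Fix("logic", "corrected an off-by-one loop bound"),
]
print(reverse_grade(fixes, 10))
print(reverse_grade(fixes, 100))
```

In this sketch the expanded scale changes only `max_mark`. The Direct method would instead ask the model for rubric scores without an intermediate fix list, so it is not shown here.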

Taxonomy

Topics: Teaching and Learning Programming · Intelligent Tutoring Systems and Adaptive Learning · Online Learning and Analytics