Language Models are Few-Shot Graders

Chenyan Zhao; Mariana Silva; Seth Poulsen

arXiv:2502.13337 · cs.CL · February 20, 2025

Open Access

TL;DR

This paper introduces an LLM-based automatic short answer grading system that outperforms existing models, analyzes different OpenAI models for grading, and explores prompt strategies like RAG and rubrics to improve accuracy.

Contribution

The paper presents a novel LLM-based ASAG pipeline that surpasses previous models and systematically evaluates prompt strategies and model choices for effective automated grading.

Findings

01. GPT-4o balances accuracy and cost-effectiveness.

02. RAG-based selection improves grading accuracy.

03. Rubrics enhance evaluation consistency.
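
To make the second and third findings concrete, here is a minimal sketch of how retrieved examples and a rubric might be combined into a grading prompt. It assumes that "RAG-based selection" means retrieving previously graded answers that resemble the new answer and using them as few-shot examples; the summary above does not spell this out. Everything in the block is hypothetical: the function names, the prompt layout, and the toy data are illustrations, and the word-overlap cosine similarity is a simple stand-in for whatever retrieval method the paper uses. No LLM call is made; the sketch only builds the prompt text.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_examples(answer, graded_pool, k=2):
    """Return the k graded examples most similar to the new answer.

    Assumption: word overlap stands in for an embedding-based retriever.
    """
    query = Counter(tokenize(answer))
    ranked = sorted(
        graded_pool,
        key=lambda ex: cosine(query, Counter(tokenize(ex["answer"]))),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, rubric, examples, answer):
    """Assemble a few-shot grading prompt (hypothetical layout)."""
    lines = [f"Question: {question}", f"Rubric: {rubric}", ""]
    for ex in examples:
        lines.append(f"Student answer: {ex['answer']}")
        lines.append(f"Grade: {ex['grade']}")
        lines.append("")
    lines.append(f"Student answer: {answer}")
    lines.append("Grade:")
    return "\n".join(lines)


# Toy data, invented for illustration only.
pool = [
    {"answer": "Plants convert light energy into chemical energy stored in glucose.", "grade": 2},
    {"answer": "Plants breathe in oxygen at night.", "grade": 0},
    {"answer": "Light energy is used to make sugar.", "grade": 1},
]
question = "What does photosynthesis do?"
rubric = "2 points: mentions light energy and chemical energy. 1 point: mentions only one."
new_answer = "It turns light energy into chemical energy."

selected = select_examples(new_answer, pool, k=2)
prompt = build_prompt(question, rubric, selected, new_answer)
print(prompt)
```

The resulting string would be sent to the grading model as the prompt. Swapping the similarity function for an embedding model changes only `select_examples`; the prompt assembly stays the same.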

Abstract

Providing evaluations of student work is a critical component of effective student learning, and automating this process can significantly reduce the workload on human graders. Automatic Short Answer Grading (ASAG) systems, enabled by advancements in Large Language Models (LLMs), offer a promising solution for assessing open-ended student responses and providing instant feedback on them. In this paper, we present an ASAG pipeline leveraging state-of-the-art LLMs. Our new LLM-based ASAG pipeline achieves better performance than existing custom-built models on the same datasets. We also compare the grading performance of three OpenAI models: GPT-4, GPT-4o, and o1-preview. Our results demonstrate that GPT-4o achieves the best balance between accuracy and cost-effectiveness. On the other hand, o1-preview, despite higher accuracy, exhibits a larger variance in error that makes it less practical…

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Attention Is All You Need · Absolute Position Encodings · Linear Layer · Byte Pair Encoding · Layer Normalization · Residual Connection · Dense Connections · Label Smoothing · Multi-Head Attention · Position-Wise Feed-Forward Layer