LCES: Zero-shot Automated Essay Scoring via Pairwise Comparisons Using Large Language Models
Takumi Shibata, Yuichi Miyamura

TL;DR
This paper introduces LCES, a zero-shot essay scoring method that has large language models compare essays in pairs and uses RankNet to convert those comparisons into scores, yielding accurate, scalable, and robust automated essay scoring without training on labeled data.
Contribution
The paper proposes a novel pairwise comparison approach for zero-shot AES using LLMs and RankNet, improving accuracy and scalability over existing methods.
Findings
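LCES is more accurate than existing zero-shot methods, most of which prompt an LLM to generate absolute scores directly.
LCES stays efficient on large essay sets: RankNet turns LLM preferences into scalar scores even though the number of possible pairs grows quadratically with the number of essays.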
LCES is robust across different LLM backbones.
Abstract
Recent advances in large language models (LLMs) have enabled zero-shot automated essay scoring (AES), providing a promising way to reduce the cost and effort of essay scoring in comparison with manual grading. However, most existing zero-shot approaches rely on LLMs to directly generate absolute scores, which often diverge from human evaluations owing to model biases and inconsistent scoring. To address these limitations, we propose LLM-based Comparative Essay Scoring (LCES), a method that formulates AES as a pairwise comparison task. Specifically, we instruct LLMs to judge which of two essays is better, collect many such comparisons, and convert them into continuous scores. Considering that the number of possible comparisons grows quadratically with the number of essays, we improve scalability by employing RankNet to efficiently transform LLM preferences into scalar scores. Experiments…
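The step from pairwise preferences to scalar scores can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each essay already has a feature vector, replaces RankNet's neural scorer with a linear one, and simulates the LLM's judgments from a hidden quality value. The names `train_pairwise_scorer` and `make_toy_data` and all hyperparameters are hypothetical. What it keeps is the RankNet-style objective, a cross-entropy loss on sigmoid(s_i - s_j), trained on a sample of pairs rather than all n(n-1)/2 of them.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(w, x):
    # Linear scorer s(x) = w . x (assumption: stands in for RankNet's network).
    return sum(wk * xk for wk, xk in zip(w, x))


def train_pairwise_scorer(features, comparisons, lr=0.5, epochs=300):
    """Fit w by gradient descent on the RankNet-style pairwise loss.

    comparisons: list of (i, j, first_wins), where first_wins is 1.0 if
    essay i was judged better than essay j, else 0.0.
    """
    dim = len(features[0])
    w = [0.0] * dim
    for _ in range(epochs):
        grad = [0.0] * dim
        for i, j, first_wins in comparisons:
            # Modeled probability that essay i beats essay j.
            p = sigmoid(score(w, features[i]) - score(w, features[j]))
            err = p - first_wins  # derivative of cross-entropy w.r.t. s_i - s_j
            for k in range(dim):
                grad[k] += err * (features[i][k] - features[j][k])
        w = [wk - lr * gk / len(comparisons) for wk, gk in zip(w, grad)]
    return w


def make_toy_data(n_essays=40, dim=4, n_pairs=200, seed=0):
    """Hypothetical data: random features and simulated 'LLM' preferences."""
    rng = random.Random(seed)
    features = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_essays)]
    hidden_w = [rng.gauss(0, 1) for _ in range(dim)]
    quality = [score(hidden_w, x) for x in features]
    comparisons = []
    while len(comparisons) < n_pairs:
        i, j = rng.randrange(n_essays), rng.randrange(n_essays)
        if i != j:
            # Stand-in for asking an LLM which of the two essays is better.
            comparisons.append((i, j, 1.0 if quality[i] > quality[j] else 0.0))
    return features, quality, comparisons


features, quality, comparisons = make_toy_data()
w = train_pairwise_scorer(features, comparisons)
scores = [score(w, x) for x in features]  # one continuous score per essay
```

Here 200 sampled comparisons are used for 40 essays, fewer than the 780 possible pairs. In a real pipeline the simulated preference would be replaced by an actual LLM judgment, and the resulting scores would still need to be mapped onto the target scoring range.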
