Do We Need a Detailed Rubric for Automated Essay Scoring using Large Language Models?
Lui Yoshida

TL;DR
This paper evaluates whether simplified rubrics can replace detailed ones in automated essay scoring with large language models. Three of the four models tested scored about as accurately with a simplified rubric as with a detailed one, which reduces both rubric-writing effort and token usage.
Contribution
The study demonstrates that simplified rubrics can maintain scoring accuracy across multiple LLMs, challenging the necessity of detailed rubrics in AES systems.
Findings
Three of the four LLMs tested score with similar accuracy under simplified and detailed rubrics.
Simplified rubrics significantly reduce token usage.
Performance varies across different LLMs, requiring model-specific evaluation.
Abstract
This study investigates the necessity and impact of a detailed rubric in automated essay scoring (AES) using large language models (LLMs). While rubrics are standard in LLM-based AES, creating detailed rubrics requires substantial effort and increases token usage. We examined how different levels of rubric detail affect scoring accuracy across multiple LLMs using the TOEFL11 dataset. Our experiments compared three conditions: a full rubric, a simplified rubric, and no rubric, using four different LLMs (Claude 3.5 Haiku, Gemini 1.5 Flash, GPT-4o-mini, and Llama 3 70B Instruct). Results showed that three out of four models maintained similar scoring accuracy with the simplified rubric compared to the detailed one, while significantly reducing token usage. However, one model (Gemini 1.5 Flash) showed decreased performance with more detailed rubrics. The findings suggest that…
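The experimental design described in the abstract (three rubric conditions crossed with four models, compared on prompt size) can be sketched roughly as follows. This is an illustrative sketch only, not the author's code: the rubric texts, the prompt wording, and the helper names `build_prompt`, `rough_token_count`, and `experiment_grid` are all assumptions, and the whitespace word count is a crude stand-in for a real tokenizer.

```python
from itertools import product

# Models and rubric conditions as listed in the abstract.
MODELS = ("Claude 3.5 Haiku", "Gemini 1.5 Flash", "GPT-4o-mini", "Llama 3 70B Instruct")
CONDITIONS = ("full", "simplified", "none")

# Hypothetical placeholder rubrics; the study's actual rubric texts are not reproduced here.
RUBRICS = {
    "full": (
        "High: the essay addresses the task, is well organized, develops ideas "
        "with clear examples, and shows consistent control of language. "
        "Medium: the essay addresses the task but development or language control is uneven. "
        "Low: the essay is underdeveloped, disorganized, or has frequent language errors."
    ),
    "simplified": "Score the essay on task response, organization, and language use.",
    "none": "",
}


def build_prompt(essay: str, condition: str) -> str:
    """Assemble a scoring prompt for one rubric condition (hypothetical wording)."""
    parts = ["Score the following essay."]
    if RUBRICS[condition]:
        parts.append("Rubric: " + RUBRICS[condition])
    parts.append("Essay: " + essay)
    return "\n".join(parts)


def rough_token_count(text: str) -> int:
    """Whitespace word count, used here only as a crude proxy for token usage."""
    return len(text.split())


def experiment_grid():
    """Every (model, condition) pair that would need to be evaluated."""
    return list(product(MODELS, CONDITIONS))


if __name__ == "__main__":
    essay = "Many people believe that learning by doing is better than learning from books."
    for condition in CONDITIONS:
        prompt = build_prompt(essay, condition)
        print(condition, rough_token_count(prompt))
    print(len(experiment_grid()), "model-condition runs")
```

In a real replication, each prompt would be sent to the corresponding model and the returned scores compared against the human labels; the sketch stops at prompt construction because the scoring metric and API details are not given in this summary.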
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
Methods: LLaMA
