Teach-to-Reason with Scoring: Self-Explainable Rationale-Driven Multi-Trait Essay Scoring

Heejin Do; Sangwon Ryu; Gary Geunbae Lee

arXiv:2502.20748 · cs.CL · March 3, 2025

TL;DR

This paper introduces RaDME, a self-explainable multi-trait essay scoring system that distills the reasoning of large language models (LLMs) into a smaller, efficient scorer that outputs each trait score together with a rationale.

Contribution

The paper presents a framework that distills LLM reasoning into a smaller model that produces both scores and rationales, improving the transparency of automated essay scoring (AES).

Findings

01. RaDME achieves accurate multi-trait scoring.

02. LLMs excel at rationale generation when given precise scores.

03. The framework enhances transparency and reasoning in AES.

Abstract

Multi-trait automated essay scoring (AES) systems provide a fine-grained evaluation of an essay's diverse aspects. While they excel in scoring, prior systems fail to explain why specific trait scores are assigned. This lack of transparency leaves instructors and learners unconvinced of the AES outputs, hindering their practical use. To address this, we propose a self-explainable Rationale-Driven Multi-trait automated Essay scoring (RaDME) framework. RaDME leverages the reasoning capabilities of large language models (LLMs) by distilling them into a smaller yet effective scorer. This more manageable student model is optimized to sequentially generate a trait score followed by the corresponding rationale, thereby inherently learning to select a more justifiable score by considering the subsequent rationale during training. Our findings indicate that while LLMs underperform in direct AES…
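The score-then-rationale output order described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration, not the authors' implementation: the prompt wording, the `score: ... | rationale: ...` target layout, and the helper names `build_pair` and `parse_output` are all hypothetical. The sketch only shows how a training target might place the trait score first and the teacher-written rationale second, so that a student model trained on such targets emits the score before its justification.

```python
# Hypothetical target layout: "score: <int> | rationale: <text>".
# The separator and field names are assumptions, not taken from the paper.
SEPARATOR = " | rationale: "


def build_pair(essay, trait, score, rationale):
    """Build one (source, target) text pair for a text-to-text student model.

    `rationale` is assumed to come from a teacher LLM; `score` is the
    trait score the rationale is meant to justify.
    """
    source = f"Score the trait '{trait}' for this essay: {essay}"
    # The score comes first, the rationale second, mirroring the
    # sequential generation order described in the abstract.
    target = f"score: {score}{SEPARATOR}{rationale}"
    return source, target


def parse_output(text):
    """Split a generated string back into (score, rationale)."""
    score_part, _, rationale = text.partition(SEPARATOR)
    score = int(score_part.split(":", 1)[1].strip())
    return score, rationale


source, target = build_pair(
    essay="Libraries should not remove books...",
    trait="organization",
    score=3,
    rationale="Ideas are ordered logically but transitions are weak.",
)
predicted_score, predicted_rationale = parse_output(target)
```

Because the score precedes the rationale in the target, the numeric prediction can be read off the first field even if rationale generation is cut short at inference time.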
