Beyond Agreement: Diagnosing the Rationale Alignment of Automated Essay Scoring Methods based on Linguistically-informed Counterfactuals
Yupei Wang, Renfen Hu, Zhe Zhao

TL;DR
This paper introduces a counterfactual intervention method using LLMs to diagnose and compare the rationale alignment of different AES models, revealing their focus areas and improving transparency.
Contribution
It presents a novel approach employing linguistically-informed counterfactuals with LLMs to analyze and enhance understanding of AES decision mechanisms.
Findings
BERT-like models focus mainly on sentence-level features
LLMs such as GPT-3.5, GPT-4, and Llama-3 are sensitive to conventions & accuracy, language complexity, and organization
LLMs can discern counterfactual interventions when giving feedback on essays
Abstract
While current Automated Essay Scoring (AES) methods demonstrate high scoring agreement with human raters, their decision-making mechanisms are not fully understood. Our proposed method, using counterfactual intervention assisted by Large Language Models (LLMs), reveals that BERT-like models primarily focus on sentence-level features, whereas LLMs such as GPT-3.5, GPT-4 and Llama-3 are sensitive to conventions & accuracy, language complexity, and organization, indicating a more comprehensive rationale alignment with scoring rubrics. Moreover, LLMs can discern counterfactual interventions when giving feedback on essays. Our approach improves understanding of neural AES methods and can also apply to other domains seeking transparency in model-driven decisions.
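The counterfactual diagnosis described in the abstract can be illustrated with a minimal sketch: apply a targeted intervention to each essay, re-score it, and compare the scores. Everything named here is a hypothetical stand-in and not the authors' implementation. The paper generates its counterfactuals with LLM assistance, whereas this sketch uses two simple rule-based edits (`reverse_sentences` for organization, `lowercase_all` for conventions), and `toy_scorer` is an invented scorer that only checks capitalization.

```python
from statistics import mean
from typing import Callable, List

# Hypothetical type aliases for this sketch.
Scorer = Callable[[str], float]
Intervention = Callable[[str], str]


def split_sentences(essay: str) -> List[str]:
    """Naive sentence splitter on periods (an assumption for brevity)."""
    return [s.strip() for s in essay.split(".") if s.strip()]


def reverse_sentences(essay: str) -> str:
    """Organization intervention: reverse sentence order, keep content."""
    sentences = split_sentences(essay)
    return ". ".join(reversed(sentences)) + "."


def lowercase_all(essay: str) -> str:
    """Conventions intervention: remove all capitalization."""
    return essay.lower()


def toy_scorer(essay: str) -> float:
    """Invented scorer on a 0-6 scale: rewards capitalized sentence starts.

    It ignores sentence order, so it is blind to organization by design.
    """
    sentences = split_sentences(essay)
    if not sentences:
        return 0.0
    capitalized = sum(1 for s in sentences if s[0].isupper())
    return 6.0 * capitalized / len(sentences)


def mean_score_shift(scorer: Scorer, essays: List[str],
                     intervention: Intervention) -> float:
    """Average of score(counterfactual) - score(original) over essays."""
    shifts = [scorer(intervention(e)) - scorer(e) for e in essays]
    return mean(shifts)


essays = ["First we plan. Then we write. Finally we revise."]
organization_shift = mean_score_shift(toy_scorer, essays, reverse_sentences)
conventions_shift = mean_score_shift(toy_scorer, essays, lowercase_all)
```

On this single essay, `organization_shift` is 0.0 and `conventions_shift` is -6.0: the toy scorer reacts to the conventions edit but not to the organization edit. A shift near zero for a rubric-relevant intervention is the kind of signal that would suggest a scorer is not using that trait.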
Taxonomy
Topics: Software Engineering Techniques and Practices · Multi-Agent Systems and Negotiation · Software Engineering Research
Methods: Attention Is All You Need · Linear Layer · Residual Connection · Weight Decay · Position-Wise Feed-Forward Layer · Label Smoothing · Cosine Annealing · Dropout
