Towards Self-Referential Analytic Assessment: A Profile-Based Approach to L2 Writing Evaluation with LLMs
Stefano Bannò, Kate Knill, Mark Gales

TL;DR
This paper introduces a self-referential, profile-based evaluation framework for L2 writing assessment using LLMs, emphasizing intra-learner analysis over inter-learner ranking to better diagnose strengths and weaknesses.
Contribution
It proposes a novel self-referential evaluation framework that targets intra-learner strengths and weaknesses rather than inter-learner rankings, and compares LLMs with human raters in a zero-shot setting on ICNALE GRA, a densely annotated L2 writing dataset.
Findings
LLMs outperform single human raters in identifying weaknesses.
Human raters are better at recognizing strengths.
Rank-based metrics may obscure true diagnostic performance.
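The contrast between inter-learner ranking and intra-learner diagnosis in the findings above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the dimension names, the toy scores, and the helper functions (`strongest`, `weakest`, `profile_agreement`, `same_learner_ranking`) are all hypothetical and chosen only for illustration. The sketch assumes that a learner's "strength" is the highest-scoring analytic dimension in their own profile and their "weakness" the lowest-scoring one.

```python
from typing import Dict, List

Profile = Dict[str, float]  # analytic dimension -> score for one learner


def strongest(profile: Profile) -> str:
    """Dimension with the highest score within one learner's profile."""
    return max(profile, key=profile.get)


def weakest(profile: Profile) -> str:
    """Dimension with the lowest score within one learner's profile."""
    return min(profile, key=profile.get)


def profile_agreement(reference: List[Profile], predicted: List[Profile]) -> Dict[str, float]:
    """Share of learners whose strongest / weakest dimension is matched."""
    pairs = list(zip(reference, predicted))
    n = len(pairs)
    return {
        "strength_acc": sum(strongest(r) == strongest(p) for r, p in pairs) / n,
        "weakness_acc": sum(weakest(r) == weakest(p) for r, p in pairs) / n,
    }


def same_learner_ranking(reference: List[Profile], predicted: List[Profile], dim: str) -> bool:
    """True if learners are ordered identically on one dimension (no ties assumed)."""
    def order(profiles: List[Profile]) -> List[int]:
        return sorted(range(len(profiles)), key=lambda i: profiles[i][dim])
    return order(reference) == order(predicted)


# Hypothetical reference scores for three learners (toy data, not from ICNALE GRA).
reference = [
    {"content": 2.0, "organisation": 3.0, "grammar": 4.0},
    {"content": 5.0, "organisation": 4.0, "grammar": 6.0},
    {"content": 7.0, "organisation": 8.0, "grammar": 7.5},
]

# A hypothetical system with a constant per-dimension bias:
# +3 on content, 0 on organisation, -3 on grammar.
bias = {"content": 3.0, "organisation": 0.0, "grammar": -3.0}
predicted = [{d: s + bias[d] for d, s in p.items()} for p in reference]

# Learner rankings are preserved on every dimension, so a rank-based
# correlation computed per dimension would be perfect.
rank_preserved = {d: same_learner_ranking(reference, predicted, d) for d in bias}

# Yet within each learner, the strongest and weakest dimensions are all wrong.
agreement = profile_agreement(reference, predicted)

print(rank_preserved)  # every dimension maps to True
print(agreement)       # {'strength_acc': 0.0, 'weakness_acc': 0.0}
```

In this toy case the biased system orders learners perfectly on each dimension while misidentifying every learner's strength and weakness, which is the kind of behaviour a purely rank-based evaluation would not reveal.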
Abstract
Automated essay scoring (AES) research often relies on rank-based correlation metrics to validate analytic assessment. However, such metrics obscure both intrinsic intercorrelations among analytic dimensions that arise from the structure of writing proficiency itself and halo effects, whereby holistic impressions bleed into fine-grained component scores. As a result, high correlations may mask a system's true diagnostic behaviour. In this study, we propose a novel self-referential assessment evaluation framework that focuses on identifying intra-learner strengths and weaknesses rather than assessing inter-learner rankings. We conduct experiments on the publicly available ICNALE GRA, a uniquely dense second-language writing dataset annotated holistically and analytically by up to 80 trained raters. To obtain reliable reference scores, we apply two-facet Rasch modelling to calibrate rater…
