Towards Self-Referential Analytic Assessment: A Profile-Based Approach to L2 Writing Evaluation with LLMs

Stefano Bannò; Kate Knill; Mark Gales

arXiv:2605.04298 · cs.CL · May 7, 2026


TL;DR

This paper introduces a self-referential, profile-based evaluation framework for L2 writing assessment using LLMs, emphasizing intra-learner analysis over inter-learner ranking to better diagnose strengths and weaknesses.

Contribution

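It proposes a self-referential evaluation framework that measures how well a system identifies each learner's own strengths and weaknesses, rather than how well it ranks learners against one another, and uses it to compare zero-shot LLMs with human raters on ICNALE GRA, a densely annotated L2 writing dataset.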

Findings

01. LLMs outperform single human raters in identifying weaknesses.

02. Human raters are better at recognizing strengths.

03. Rank-based metrics may obscure true diagnostic performance.

Abstract

Automated essay scoring (AES) research often relies on rank-based correlation metrics to validate analytic assessment. However, such metrics obscure both intrinsic intercorrelations among analytic dimensions that arise from the structure of writing proficiency itself and halo effects, whereby holistic impressions bleed into fine-grained component scores. As a result, high correlations may mask a system's true diagnostic behaviour. In this study, we propose a novel self-referential assessment evaluation framework that focuses on identifying intra-learner strengths and weaknesses rather than assessing inter-learner rankings. We conduct experiments on the publicly available ICNALE GRA, a uniquely dense second-language writing dataset annotated holistically and analytically by up to 80 trained raters. To obtain reliable reference scores, we apply two-facet Rasch modelling to calibrate rater…
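The intra-learner idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the dimension names, the toy scores, the shared 1-5 scale, and the helper functions `profile`, `weakest`, `strongest`, and `agreement` are all assumptions made for illustration. Each learner's analytic scores are centred on that learner's own mean, and a system is then judged by whether it picks the same weakest (or strongest) dimension as the reference scores, rather than by how well it ranks learners against each other.

```python
from statistics import mean

# Toy data, invented for illustration only. All dimensions are assumed
# to share one scale; the dimension names are hypothetical placeholders.
reference = {
    "learner_a": {"content": 4.0, "organisation": 3.5, "vocabulary": 3.0, "grammar": 2.0},
    "learner_b": {"content": 2.0, "organisation": 2.5, "vocabulary": 3.5, "grammar": 3.0},
    "learner_c": {"content": 4.5, "organisation": 4.0, "vocabulary": 3.0, "grammar": 4.2},
}
system = {
    "learner_a": {"content": 3.8, "organisation": 3.0, "vocabulary": 3.2, "grammar": 2.5},
    "learner_b": {"content": 2.4, "organisation": 2.2, "vocabulary": 3.0, "grammar": 3.1},
    "learner_c": {"content": 4.4, "organisation": 4.1, "vocabulary": 3.3, "grammar": 4.0},
}


def profile(scores):
    """Centre one learner's analytic scores on that learner's own mean."""
    own_mean = mean(scores.values())
    return {dim: score - own_mean for dim, score in scores.items()}


def weakest(scores):
    """Dimension furthest below the learner's own mean (first one wins on ties)."""
    centred = profile(scores)
    return min(centred, key=centred.get)


def strongest(scores):
    """Dimension furthest above the learner's own mean (first one wins on ties)."""
    centred = profile(scores)
    return max(centred, key=centred.get)


def agreement(ref, sys_scores, pick):
    """Share of learners for whom `pick` selects the same dimension in both score sets."""
    hits = sum(pick(ref[learner]) == pick(sys_scores[learner]) for learner in ref)
    return hits / len(ref)


if __name__ == "__main__":
    print("weakness agreement:", agreement(reference, system, weakest))
    print("strength agreement:", agreement(reference, system, strongest))
```

Centring does not change which dimension is lowest or highest, but it makes the self-referential reading explicit: a score only counts as a strength or weakness relative to the same learner's other scores. A real evaluation would also need a rule for ties and for dimensions scored on different scales.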
