CLEME2.0: Towards Interpretable Evaluation by Disentangling Edits for Grammatical Error Correction

Jingheng Ye; Zishan Xu; Yinghui Li; Linlin Song; Qingyu Zhou; Hai-Tao Zheng; Ying Shen; Wenhao Jiang; Hong-Gee Kim; Ruitong Liu; Xin Su; Zifei Shan

arXiv:2407.00934·cs.CL·May 30, 2025

TL;DR

CLEME2.0 introduces an interpretable, aspect-based evaluation metric for grammatical error correction that improves human consistency and outperforms existing metrics.

Contribution

The paper presents CLEME2.0, a novel reference-based GEC evaluation metric that disentangles correction aspects for better interpretability and robustness.

Findings

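1. Achieves state-of-the-art results on two human judgment datasets.

2. Agrees more closely with human judgments than existing reference-based and reference-less metrics.

3. Exposes the qualities and drawbacks of GEC systems through four aspects: hit-correction, wrong-correction, under-correction, and over-correction.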

Abstract

The paper focuses on the interpretability of Grammatical Error Correction (GEC) evaluation metrics, which has received little attention in previous studies. To bridge the gap, we introduce **CLEME2.0**, a reference-based metric that describes four fundamental aspects of GEC systems: hit-correction, wrong-correction, under-correction, and over-correction. Together, these aspects expose critical qualities of GEC systems and locate their drawbacks. Evaluating systems by combining these aspects also leads to superior human consistency over other reference-based and reference-less metrics. Extensive experiments on two human judgment datasets and six reference datasets demonstrate the effectiveness and robustness of our method, which achieves a new state-of-the-art result. Our code is released at https://github.com/THUKElab/CLEME.
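
To make the four aspects concrete, here is a minimal Python sketch of how hypothesis edits might be sorted into hit-, wrong-, under-, and over-correction against a single reference. It is an illustration only, not the CLEME2.0 implementation: the `Edit` type, the `overlaps` rule, and the `classify_edits` function are hypothetical names, and the matching rule (exact match is a hit, span overlap with a different edit is wrong) is an assumption, not taken from the paper or its repository.

```python
from typing import Dict, List, NamedTuple


class Edit(NamedTuple):
    """Hypothetical edit representation over source-token indices."""
    start: int        # index of the first source token affected
    end: int          # index one past the last source token affected
    replacement: str  # corrected text ("" for a deletion)


def overlaps(a: Edit, b: Edit) -> bool:
    """Assumed rule: identical spans, or spans that share a source token."""
    same_span = (a.start, a.end) == (b.start, b.end)
    return same_span or max(a.start, b.start) < min(a.end, b.end)


def classify_edits(hyp: List[Edit], ref: List[Edit]) -> Dict[str, int]:
    """Count the four aspects for one sentence and one reference."""
    counts = {"hit": 0, "wrong": 0, "under": 0, "over": 0}
    for h in hyp:
        if h in ref:
            counts["hit"] += 1      # same span and same replacement
        elif any(overlaps(h, r) for r in ref):
            counts["wrong"] += 1    # right place, different correction
        else:
            counts["over"] += 1     # edit where the reference has none
    for r in ref:
        if not any(overlaps(h, r) for h in hyp):
            counts["under"] += 1    # reference edit the system missed
    return counts


# Toy example with made-up edits (not data from the paper).
reference = [Edit(1, 2, "went"), Edit(5, 6, ""), Edit(7, 8, "friends")]
hypothesis = [Edit(1, 2, "went"), Edit(7, 8, "friend's"), Edit(3, 4, "a")]
print(classify_edits(hypothesis, reference))
# {'hit': 1, 'wrong': 1, 'under': 1, 'over': 1}
```

Reporting the four counts separately, rather than folding them into a single score up front, is what lets such a breakdown show whether a system is too conservative (many under-corrections) or too aggressive (many over-corrections).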

Code & Models

Repositories

thukelab/cleme
Videos

CLEME2.0: Towards Interpretable Evaluation by Disentangling Edits for Grammatical Error Correction · underline

Taxonomy

Topics: Natural Language Processing Techniques · Text Readability and Simplification · Mathematics, Computing, and Information Processing

Methods: Softmax · Attention Is All You Need