JELV: A Judge of Edit-Level Validity for Evaluation and Automated Reference Expansion in Grammatical Error Correction

Yuhao Zhan; Yuqing Zhang; Jing Yuan; Qixiang Ma; Zhiqi Yang; Yu Gu; Zemin Liu; Fei Wu

arXiv:2511.21700 · cs.CL · December 9, 2025

TL;DR

JELV is an automated framework that improves grammatical error correction evaluation and reference expansion by validating edits for grammaticality, faithfulness, and fluency, leading to more accurate assessments and better model training.

Contribution

The paper introduces JELV, a novel automated validity judge for GEC edits, with two implementations achieving high agreement with humans, and demonstrates its effectiveness in evaluation and dataset expansion.

Findings

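01 JELV's multi-turn LLM-as-Judges pipeline reaches 90% agreement with human annotators, and its distilled DeBERTa classifier reaches 85% precision on valid edits.

02 Expanding datasets with JELV improves GEC system performance.

03 JELV improves evaluation accuracy and reference diversity in GEC.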

Abstract

Existing Grammatical Error Correction (GEC) systems suffer from limited reference diversity, leading to underestimated evaluation and restricted model generalization. To address this issue, we introduce the Judge of Edit-Level Validity (JELV), an automated framework that validates correction edits in terms of grammaticality, faithfulness, and fluency. Using our proposed human-annotated Pair-wise Edit-level Validity Dataset (PEVData) as a benchmark, JELV offers two implementations: a multi-turn LLM-as-Judges pipeline achieving 90% agreement with human annotators, and a distilled DeBERTa classifier with 85% precision on valid edits. We then apply JELV to reclassify misjudged false positives in evaluation and derive a comprehensive evaluation metric by integrating false positive decoupling and fluency scoring, resulting in state-of-the-art correlation with human judgments. We also apply JELV to filter…
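To make the idea of reclassifying misjudged false positives concrete, the following is a minimal sketch, not the paper's metric. It assumes edits are represented as hypothetical `(start, end, replacement)` tuples, that a validity judge is available as a plain callable `is_valid` (a stand-in for JELV), and that a false positive judged valid is simply counted as a true positive. The paper's actual metric also integrates fluency scoring, which is not modeled here.

```python
from typing import Callable, NamedTuple

# Hypothetical edit representation: (start token, end token, replacement).
Edit = tuple[int, int, str]


class Counts(NamedTuple):
    tp: int
    fp: int
    fn: int


def count_edits(hyp: set[Edit], ref: set[Edit]) -> Counts:
    """Standard reference-based counts: an edit is correct only if it is in ref."""
    return Counts(tp=len(hyp & ref), fp=len(hyp - ref), fn=len(ref - hyp))


def reclassify(hyp: set[Edit], ref: set[Edit],
               is_valid: Callable[[Edit], bool]) -> Counts:
    """Move false positives that the judge accepts into the true-positive count.

    Assumption: a valid unmatched edit is credited as a true positive.
    """
    base = count_edits(hyp, ref)
    rescued = sum(1 for edit in hyp - ref if is_valid(edit))
    return Counts(tp=base.tp + rescued, fp=base.fp - rescued, fn=base.fn)


def f_beta(c: Counts, beta: float = 0.5) -> float:
    """F-beta over edit counts; returns 0.0 when precision and recall are both 0."""
    p = c.tp / (c.tp + c.fp) if c.tp + c.fp else 0.0
    r = c.tp / (c.tp + c.fn) if c.tp + c.fn else 0.0
    if p + r == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)


# Toy data: one hypothesis edit matches the reference, one is a valid
# alternative the reference does not list, and one is a genuine error.
ref = {(1, 2, "went"), (4, 5, "the")}
hyp = {(1, 2, "went"), (6, 7, "quickly"), (8, 9, "teh")}
judged_valid = {(6, 7, "quickly")}  # stand-in for the judge's decisions

before = f_beta(count_edits(hyp, ref))
after = f_beta(reclassify(hyp, ref, lambda e: e in judged_valid))
print(f"{before:.3f} {after:.3f}")  # prints "0.357 0.667"
```

In this toy case the score rises because the system is no longer penalized for a correction that is valid but absent from the reference, which is the underestimation the abstract describes.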

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification