Rethinking Evaluation Metrics for Grammatical Error Correction: Why Use a Different Evaluation Process than Human?
Takumi Goto, Yusuke Sakai, Taro Watanabe

TL;DR
This paper proposes an aggregation method for automatic GEC evaluation metrics that aligns with human evaluation procedures, improving correlation with human preferences and enhancing metric performance on benchmarks.
Contribution
It introduces a new aggregation approach for automatic metrics that better mimics human evaluation, bridging the gap between automatic and human assessments in GEC.
Findings
Improved correlation with human rankings across various metrics.
Resolves the evaluation gap by aligning automatic scoring with human preferences.
BERT-based metrics sometimes outperform GPT-4 metrics in GEC evaluation.
Abstract
One of the goals of automatic evaluation metrics in grammatical error correction (GEC) is to rank GEC systems so that the ranking matches human preferences. However, current automatic evaluations are based on procedures that diverge from human evaluation. Specifically, human evaluation derives rankings by aggregating sentence-level relative evaluation results, e.g., pairwise comparisons, using a rating algorithm, whereas automatic evaluation averages sentence-level absolute scores to obtain corpus-level scores, which are then sorted to determine rankings. In this study, we propose an aggregation method for existing automatic evaluation metrics that aligns with human evaluation methods to bridge this gap. We conducted experiments using various metrics, including edit-based metrics, n-gram-based metrics, and sentence-level metrics, and show that resolving the gap improves results for the most…
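The gap described in the abstract can be illustrated with a minimal Python sketch that contrasts the two aggregation procedures on the same sentence-level scores. It rests on several assumptions that are not taken from the paper. The scores could come from any sentence-level metric. Pairwise outcomes are derived by comparing two systems' scores on the same source sentence, with ties skipped. The rating algorithm is a simple Expected Wins-style average win ratio, of the kind used in WMT human evaluation, chosen here only for brevity; it is not necessarily the rating algorithm the authors use. The function names and toy data are hypothetical.

```python
from itertools import combinations
from statistics import mean

# Hypothetical input: scores[system][i] is a metric's absolute score
# for that system's correction of source sentence i.
Scores = dict[str, list[float]]


def rank_by_average(scores: Scores) -> list[str]:
    """Conventional automatic procedure: average sentence-level scores
    into a corpus-level score per system, then sort descending."""
    corpus = {sys: mean(vals) for sys, vals in scores.items()}
    return sorted(corpus, key=corpus.get, reverse=True)


def rank_by_expected_wins(scores: Scores) -> list[str]:
    """Human-style procedure (sketch): turn sentence-level scores into
    pairwise comparisons, then aggregate them with an Expected Wins-style
    rating (mean win ratio against each opponent)."""
    systems = list(scores)
    n_sent = len(next(iter(scores.values())))
    # wins[(a, b)] counts sentences on which system a beat system b.
    wins = {(a, b): 0 for a in systems for b in systems if a != b}
    for i in range(n_sent):
        for a, b in combinations(systems, 2):
            sa, sb = scores[a][i], scores[b][i]
            if sa > sb:
                wins[(a, b)] += 1
            elif sb > sa:
                wins[(b, a)] += 1
            # Equal scores count as a tie and are ignored.
    rating = {}
    for a in systems:
        ratios = []
        for b in systems:
            if a == b:
                continue
            decided = wins[(a, b)] + wins[(b, a)]
            if decided:  # skip opponents with no decisive comparison
                ratios.append(wins[(a, b)] / decided)
        rating[a] = mean(ratios) if ratios else 0.0
    return sorted(rating, key=rating.get, reverse=True)


# Toy scores: system A has one very high score but loses most sentences.
toy_scores: Scores = {
    "A": [0.9, 0.1, 0.1],
    "B": [0.2, 0.2, 0.2],
    "C": [0.15, 0.15, 0.05],
}

if __name__ == "__main__":
    print(rank_by_average(toy_scores))        # ['A', 'B', 'C']
    print(rank_by_expected_wins(toy_scores))  # ['B', 'A', 'C']
```

On these toy scores, averaging ranks A first because a single high score lifts its mean. The pairwise aggregation ranks B first because B wins more sentence-level comparisons. This is the kind of disagreement between the two procedures that the abstract points to.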
Taxonomy
Topics: Text Readability and Simplification · Intelligent Tutoring Systems and Adaptive Learning · Natural Language Processing Techniques
Methods: Attention Is All You Need · Linear Layer · Dense Connections · Multi-Head Attention · Position-Wise Feed-Forward Layer · Adam · Softmax · Dropout · Label Smoothing · Byte Pair Encoding
