GLEU Without Tuning
Courtney Napoles, Keisuke Sakaguchi, Matt Post, and Joel Tetreault

TL;DR
This paper introduces an improved version of GLEU, a metric for evaluating grammatical error correction. The new version needs no tuning and avoids problems that arise as the number of reference sets increases.
Contribution
The paper presents a modified GLEU metric that fixes problems arising when an increasing number of reference sets is used and that, unlike the original, does not require tuning.
Findings
The new GLEU metric remains well behaved as more reference sets are added.
It does not require parameter tuning, simplifying its application.
The modified metric is recommended over the original for grammatical error correction evaluation.
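The findings above concern how the metric behaves as reference sets are added. The paper text here does not spell out a procedure for combining them. One way to avoid favouring any single reference set, and the one assumed in this sketch, is to draw one reference per sentence at random, score the corpus, and average over many draws. Check the paper for the exact procedure. The function `sampled_reference_score`, its 500-iteration default, and the toy scorer `exact_match_rate` are hypothetical illustrations, not the authors' code.

```python
import random
from statistics import mean


def sampled_reference_score(hypotheses, reference_sets, corpus_score,
                            iterations=500, seed=0):
    """Average a corpus-level score over random draws of one reference per sentence.

    Assumed procedure (hypothetical, see lead-in):
    hypotheses     -- list of system outputs, one per sentence
    reference_sets -- list of the same length; each item lists alternative references
    corpus_score   -- callable (hypotheses, chosen_references) -> float
    """
    if len(hypotheses) != len(reference_sets):
        raise ValueError("need one list of references per hypothesis")
    rng = random.Random(seed)  # fixed seed so the average is reproducible
    scores = []
    for _ in range(iterations):
        # Independently pick one reference for each sentence, then score the corpus.
        chosen = [rng.choice(refs) for refs in reference_sets]
        scores.append(corpus_score(hypotheses, chosen))
    return mean(scores)


def exact_match_rate(hypotheses, references):
    """Toy stand-in for a real corpus metric: fraction of exact matches."""
    return sum(h == r for h, r in zip(hypotheses, references)) / len(hypotheses)


if __name__ == "__main__":
    hyps = ["he goes to school", "she likes cats"]
    refs = [["he goes to school", "he went to school"], ["she likes cats"]]
    print(sampled_reference_score(hyps, refs, exact_match_rate))
```

Averaging over draws means that adding an extra reference set changes the expected score smoothly rather than letting whichever reference happens to match best dominate. The toy scorer can be swapped for any corpus-level metric with the same call signature.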
Abstract
The GLEU metric was proposed for evaluating grammatical error corrections using n-gram overlap with a set of reference sentences, as opposed to precision/recall of specific annotated errors (Napoles et al., 2015). This paper describes improvements made to the GLEU metric that address problems that arise when using an increasing number of reference sets. Unlike the originally presented metric, the modified metric does not require tuning. We recommend that this version be used instead of the original version.
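The abstract describes GLEU as n-gram overlap with references rather than precision/recall over annotated edits. The following is a simplified, sentence-level sketch in that spirit. It counts clipped n-gram matches between the hypothesis and the reference, subtracts a penalty for hypothesis n-grams shared with the source beyond what the reference supports, and applies a BLEU-style brevity penalty. It uses no tunable weight. The exact formula is our assumption of how such a tuning-free variant can be written, so consult the paper for the authoritative definition. The names `ngrams` and `gleu_sketch` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def gleu_sketch(source, hypothesis, reference, max_n=4):
    """Illustrative GLEU-style score for one sentence (assumed formula, no smoothing).

    All inputs are token lists. Returns a value in [0, 1].
    """
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp, ref, src = ngrams(hypothesis, n), ngrams(reference, n), ngrams(source, n)
        total = sum(hyp.values())
        if total == 0:
            return 0.0  # hypothesis too short for this n-gram order
        # Clipped overlap with the reference: min(count in hyp, count in ref).
        matches = sum(min(c, ref[g]) for g, c in hyp.items())
        # Penalty: n-grams kept from the source more often than the reference has them.
        penalty = sum(max(0, min(c, src[g]) - min(c, ref[g])) for g, c in hyp.items())
        numerator = matches - penalty
        if numerator <= 0:
            return 0.0  # no smoothing: a zero precision zeroes the geometric mean
        log_precision_sum += math.log(numerator / total)
    # BLEU-style brevity penalty for hypotheses not longer than the reference.
    c, r = len(hypothesis), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(log_precision_sum / max_n)
```

For example, consider the source `he go to school` and the reference `he goes to school`. The sketch returns 1.0 for the corrected sentence. It returns 0.0 for an unchanged copy of the source, because the copied bigrams `he go` and `go to` are penalized. A practical implementation would aggregate counts over the whole corpus rather than scoring sentences in isolation.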
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
