A Self-Refinement Strategy for Noise Reduction in Grammatical Error Correction

Masato Mita; Shun Kiyono; Masahiro Kaneko; Jun Suzuki; Kentaro Inui

arXiv:2010.03155·cs.CL·October 8, 2020

TL;DR

This paper introduces a self-refinement strategy for denoising GEC datasets. Training on the denoised data improves model performance and, combined with task-specific techniques, yields state-of-the-art results on the CoNLL-2014, JFLEG, and BEA-2019 benchmarks.

Contribution

The paper proposes a novel self-refinement method leveraging model prediction consistency to denoise GEC datasets, significantly improving correction coverage and fluency.

Findings

01. Achieved state-of-the-art results on CoNLL-2014, JFLEG, and BEA-2019 benchmarks.

02. Denoising improves correction recall and overall GEC performance.

03. Model prediction consistency effectively identifies and reduces dataset noise.

Abstract

Existing approaches for grammatical error correction (GEC) largely rely on supervised learning with manually created GEC datasets. However, there has been little focus on verifying and ensuring the quality of the datasets, and on how lower-quality data might affect GEC performance. We indeed found that there is a non-negligible amount of "noise" where errors were inappropriately edited or left uncorrected. To address this, we designed a self-refinement method where the key idea is to denoise these datasets by leveraging the prediction consistency of existing models, and outperformed strong denoising baseline methods. We further applied task-specific techniques and achieved state-of-the-art performance on the CoNLL-2014, JFLEG, and BEA-2019 benchmarks. We then analyzed the effect of the proposed denoising method, and found that our approach leads to improved coverage of corrections and…
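
The denoising idea in the abstract can be illustrated with a minimal sketch. Everything below is an assumption for illustration: the abstract only says that the method leverages the prediction consistency of existing models, so the names `self_refine`, `predict`, and `fluency_cost`, and the rule of replacing a target only when the model's prediction scores strictly better, are hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Tuple

# A training pair: (source sentence, reference correction).
Pair = Tuple[str, str]


def self_refine(
    pairs: List[Pair],
    predict: Callable[[str], str],
    fluency_cost: Callable[[str], float],
) -> List[Pair]:
    """Return a denoised copy of `pairs`.

    Assumption (not confirmed by the abstract): a target is replaced by the
    model's prediction only when the prediction has a strictly lower cost,
    so the original target is kept whenever the model does not clearly help.
    """
    refined: List[Pair] = []
    for source, target in pairs:
        prediction = predict(source)
        if fluency_cost(prediction) < fluency_cost(target):
            refined.append((source, prediction))
        else:
            refined.append((source, target))
    return refined


# Toy stand-ins for a trained GEC model and a language-model score.
# Both are hypothetical; a real setup would plug in actual models here.
KNOWN_ERROR = "go to school yesterday"


def toy_predict(source: str) -> str:
    return source.replace(KNOWN_ERROR, "went to school yesterday")


def toy_cost(sentence: str) -> float:
    return float(sentence.count(KNOWN_ERROR))


pairs = [
    # Noisy pair: the error was left uncorrected in the target.
    ("I go to school yesterday .", "I go to school yesterday ."),
    # Clean pair: nothing to change.
    ("She likes cats .", "She likes cats ."),
]
refined = self_refine(pairs, toy_predict, toy_cost)
```

In this toy run the first pair, whose target leaves the error uncorrected, receives the model's prediction as its new target, while the second pair is left untouched. Keeping the original target on ties is one conservative design choice among several possible ones.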

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification