Reaching Human-level Performance in Automatic Grammatical Error Correction: An Empirical Study

Tao Ge; Furu Wei; Ming Zhou

arXiv:1807.01270 · cs.CL · July 12, 2018 · 97 citations

TL;DR

This paper introduces a fluency boost learning and inference mechanism for neural seq2seq models in grammatical error correction (GEC), achieving state-of-the-art results that reach human-level performance on the CoNLL-2014 and JFLEG benchmarks.

Contribution

It proposes fluency boost learning, which generates diverse error-corrected sentence pairs as additional training data, and fluency boost inference, which corrects a sentence incrementally over multiple inference steps.
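The learning side can be pictured as data augmentation. The sketch below is a minimal illustration under stated assumptions, not the authors' code: it assumes a language-model fluency score f(x) = 1 / (1 + H(x)), where H(x) is the average per-token negative log-probability of x, and it assumes that the model's n-best outputs that are less fluent than the gold correction are paired with that correction as extra training instances. The names `fluency`, `boost_pairs`, `toy_logprob`, `toy_score`, and the toy unigram probabilities are all hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Sentence = Tuple[str, ...]

def fluency(sentence: Sentence, token_logprob: Callable[[Sentence, int], float]) -> float:
    """Assumed fluency score f(x) = 1 / (1 + H(x)), where H(x) is the average
    negative log-probability of the tokens of x under a language model."""
    if not sentence:
        return 0.0
    h = -sum(token_logprob(sentence, i) for i in range(len(sentence))) / len(sentence)
    return 1.0 / (1.0 + h)

def boost_pairs(
    source: Sentence,
    target: Sentence,
    nbest: Sequence[Sentence],
    score: Callable[[Sentence], float],
) -> List[Tuple[Sentence, Sentence]]:
    """Keep the original pair, then pair the gold target with every n-best
    hypothesis that differs from both sides and is less fluent than the target."""
    pairs = [(source, target)]
    target_score = score(target)
    for hyp in nbest:
        if hyp not in (source, target) and score(hyp) < target_score:
            pairs.append((hyp, target))
    return pairs

# Toy unigram "language model" with made-up probabilities (hypothetical).
UNIGRAM = {"she": 0.2, "goes": 0.2, "go": 0.05, "to": 0.3, "school": 0.2}

def toy_logprob(sentence: Sentence, i: int) -> float:
    # Unseen tokens get a small floor probability.
    return math.log(UNIGRAM.get(sentence[i], 1e-4))

def toy_score(sentence: Sentence) -> float:
    return fluency(sentence, toy_logprob)

src = ("she", "go", "to", "school")
tgt = ("she", "goes", "to", "school")
nbest = [src, ("she", "go", "school"), ("she", "go", "to", "the", "school"), tgt]
pairs = boost_pairs(src, tgt, nbest, toy_score)
for pair_source, _ in pairs:
    print(" ".join(pair_source), "->", " ".join(tgt))
```

Hypotheses identical to the source or the target are skipped so the original pair is not duplicated; every added pair shares the same gold target, so the model sees several disfluent variants that should all map to one fluent sentence.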

Findings

1. Achieves 75.72 F_{0.5} on the CoNLL-2014 10-annotation dataset
2. Achieves 62.42 GLEU on the JFLEG test set
3. First GEC system to reach human-level performance on both benchmarks (human scores: 72.58 on CoNLL-2014, 62.37 on JFLEG)
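F_{0.5} is the F-beta measure with beta = 0.5. On CoNLL-2014 it is computed by the MaxMatch (M2) scorer, which first matches system edits against the gold annotations to obtain precision P and recall R; the sketch below shows only the final combination, F_beta = (1 + beta^2) P R / (beta^2 P + R), using made-up P and R values rather than numbers from the paper. GLEU, the JFLEG metric, is a different n-gram-based measure and is not shown.

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """F-beta score. With beta = 0.5 (as in F_{0.5}), precision is
    weighted more heavily than recall."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Made-up precision/recall values, not results from the paper.
print(round(f_beta(0.8, 0.4), 4))  # 0.6667
print(round(f_beta(0.4, 0.8), 4))  # 0.4444
```

Swapping precision and recall changes the score, which is the point of beta = 0.5: a conservative system with high precision is rewarded over one that proposes many edits with the same harmonic balance.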

Abstract

Neural sequence-to-sequence (seq2seq) approaches have proven successful in grammatical error correction (GEC). Building on the seq2seq framework, we propose a novel fluency boost learning and inference mechanism. Fluency boost learning generates diverse error-corrected sentence pairs during training, enabling the error correction model to learn how to improve a sentence's fluency from more instances, while fluency boost inference allows the model to correct a sentence incrementally over multiple inference steps. Combining fluency boost learning and inference with convolutional seq2seq models, our approach achieves state-of-the-art performance of 75.72 (F_{0.5}) on the CoNLL-2014 10-annotation dataset and 62.42 (GLEU) on the JFLEG test set, becoming the first GEC system to reach human-level performance (72.58 on CoNLL and 62.37 on JFLEG) on both benchmarks.
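Incremental correction can be read as a simple loop: feed the model's output back in as the next input, and stop once a round no longer improves fluency. The sketch below assumes exactly that stopping rule plus a hypothetical `max_rounds` cap; `toy_correct` (a fixer that repairs one error per call) and `toy_score` are hypothetical stand-ins for the convolutional seq2seq model and a language-model fluency score, not the authors' implementation.

```python
from typing import Callable, List, Tuple

Sentence = Tuple[str, ...]

def fluency_boost_inference(
    sentence: Sentence,
    correct: Callable[[Sentence], Sentence],
    score: Callable[[Sentence], float],
    max_rounds: int = 5,
) -> Tuple[Sentence, List[Sentence]]:
    """Apply the correction model repeatedly, accepting each output only while
    it raises the fluency score; return the final sentence and the trace."""
    current = sentence
    trace = [current]
    for _ in range(max_rounds):
        candidate = correct(current)
        if score(candidate) <= score(current):
            break  # no fluency gain: keep the current sentence and stop
        current = candidate
        trace.append(current)
    return current, trace

FIXES = {"go": "goes", "schol": "school"}  # hypothetical one-edit "model"

def toy_correct(sentence: Sentence) -> Sentence:
    """Fix only the first known error, mimicking a model that under-corrects."""
    for i, tok in enumerate(sentence):
        if tok in FIXES:
            return sentence[:i] + (FIXES[tok],) + sentence[i + 1:]
    return sentence

def toy_score(sentence: Sentence) -> float:
    """Stand-in fluency: fewer known errors means a more fluent sentence."""
    return -float(sum(tok in FIXES for tok in sentence))

final, trace = fluency_boost_inference(("she", "go", "to", "schol"), toy_correct, toy_score)
print(final)  # ('she', 'goes', 'to', 'school')
```

Because the toy model fixes one error per call, a single pass would leave "schol" uncorrected; the second round repairs it, and the third round, which changes nothing, triggers the stop.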

Code & Models

Repositories

getao/human-performance-gec (official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory · Sequence to Sequence