Improving Grammatical Error Correction via Pre-Training a Copy-Augmented Architecture with Unlabeled Data
Wei Zhao, Liang Wang, Kewei Shen, Ruoyu Jia, Jingming Liu

TL;DR
This paper introduces a copy-augmented neural architecture for grammatical error correction (GEC), pre-trained on unlabeled data with a denoising auto-encoder, and reports results on the CoNLL-2014 test set that surpass recent state-of-the-art systems.
Contribution
It is the first work to apply both a copying mechanism and full pre-training of a sequence-to-sequence model on unlabeled data to GEC, and it adds token-level and sentence-level multi-task learning on top.
Findings
Outperforms recent state-of-the-art results on CoNLL-2014.
Demonstrates effectiveness of pre-training with unlabeled data.
Shows benefits of combining copying with multi-task learning.
Abstract
Neural machine translation systems have become the state-of-the-art approach for the Grammatical Error Correction (GEC) task. In this paper, we propose a copy-augmented architecture for the GEC task that copies the unchanged words from the source sentence to the target sentence. Since GEC suffers from not having enough labeled training data to achieve high accuracy, we pre-train the copy-augmented architecture with a denoising auto-encoder using the unlabeled One Billion Benchmark and compare the fully pre-trained model with a partially pre-trained model. This is the first time that copying words from the source context and fully pre-training a sequence-to-sequence model have been tried on the GEC task. Moreover, we add token-level and sentence-level multi-task learning for the GEC task. The evaluation results on the CoNLL-2014 test set show that our approach outperforms all…
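The two ideas named in the abstract, copying unchanged source words and denoising pre-training, can be illustrated with a small sketch. This is not the authors' implementation: the function names, the scalar `alpha` copy weight, and the noise rates are all assumptions chosen for illustration, and a real model would compute these quantities with neural network layers rather than plain dictionaries.

```python
import random


def mix_copy_and_generation(p_gen, copy_attn, src_tokens, alpha):
    """Blend a generation distribution with a copy distribution.

    p_gen      -- dict mapping vocabulary token -> probability (sums to 1)
    copy_attn  -- attention weights over source positions (sums to 1)
    src_tokens -- source tokens aligned with copy_attn
    alpha      -- hypothetical copy weight in [0, 1]; 0 disables copying
    """
    if len(copy_attn) != len(src_tokens):
        raise ValueError("copy_attn and src_tokens must have equal length")
    # Scale the generation distribution by the non-copy share.
    mixed = {tok: (1.0 - alpha) * p for tok, p in p_gen.items()}
    # Add the copy share onto whichever tokens appear in the source,
    # including source tokens that are missing from the vocabulary.
    for tok, weight in zip(src_tokens, copy_attn):
        mixed[tok] = mixed.get(tok, 0.0) + alpha * weight
    return mixed


def add_noise(tokens, vocab, rng, p_delete=0.1, p_insert=0.1, p_replace=0.1):
    """Corrupt a clean sentence so a model can learn to restore it.

    The default rates are illustrative assumptions, not values taken
    from the text above.
    """
    noisy = []
    for tok in tokens:
        r = rng.random()
        if r < p_delete:
            continue  # drop the token
        if r < p_delete + p_replace:
            noisy.append(rng.choice(vocab))  # swap in a random word
        else:
            noisy.append(tok)  # keep the token unchanged
        if rng.random() < p_insert:
            noisy.append(rng.choice(vocab))  # insert a spurious word
    return noisy


if __name__ == "__main__":
    p_gen = {"he": 0.2, "goes": 0.5, "go": 0.3}
    print(mix_copy_and_generation(p_gen, [0.5, 0.5], ["he", "go"], alpha=0.4))

    rng = random.Random(0)
    clean = "she goes to school every day".split()
    print(add_noise(clean, ["the", "a", "of"], rng))
```

In a denoising setup of this kind, the pair (noisy sentence, clean sentence) plays the role that (erroneous sentence, corrected sentence) plays in labeled GEC data, which is why unlabeled text can be used for pre-training.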
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
