Improving Grammatical Error Correction via Contextual Data Augmentation
Yixuan Wang, Baoxin Wang, Yijun Liu, Qingfu Zhu, Dayong Wu, Wanxiang Che

TL;DR
This paper introduces a novel contextual data augmentation method for grammatical error correction that combines rule-based and model-based techniques, improving performance with minimal synthetic data.
Contribution
It proposes a new synthetic data construction approach using contextual augmentation and relabeling-based cleaning to enhance GEC models, especially in data-limited scenarios.
Findings
Outperforms strong baselines on CoNLL14 and BEA19-Test datasets.
Achieves state-of-the-art results with limited synthetic data.
Demonstrates effective mitigation of noisy labels in augmented data.
Abstract
Nowadays, data augmentation through synthetic data has been widely used in the field of Grammatical Error Correction (GEC) to alleviate the problem of data scarcity. However, these synthetic data are mainly used in the pre-training phase rather than the data-limited fine-tuning phase due to inconsistent error distribution and noisy labels. In this paper, we propose a synthetic data construction method based on contextual augmentation, which can ensure an efficient augmentation of the original data with a more consistent error distribution. Specifically, we combine rule-based substitution with model-based generation, using the generative model to generate a richer context for the extracted error patterns. Besides, we also propose a relabeling-based data cleaning method to mitigate the effects of noisy labels in synthetic data. Experiments on CoNLL14 and BEA19-Test show that our proposed…
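The rule-based half of the approach described in the abstract (extracting error patterns from the original data, then substituting them into new context) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `extract_error_patterns` and `inject_errors` are hypothetical, token alignment via Python's `difflib` is an assumption made for illustration, and the clean sentence passed to `inject_errors` merely stands in for context that the paper obtains from a generative model.

```python
import random
from collections import defaultdict
from difflib import SequenceMatcher


def extract_error_patterns(pairs):
    """Collect correct -> erroneous patterns from (erroneous, corrected) sentence pairs.

    Hypothetical helper: alignment is done with difflib on whitespace tokens,
    which is an assumption of this sketch, not the paper's procedure.
    """
    patterns = defaultdict(list)
    for erroneous, corrected in pairs:
        src, tgt = erroneous.split(), corrected.split()
        matcher = SequenceMatcher(a=src, b=tgt, autojunk=False)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            # Keep only replacements; insertions and deletions are ignored here.
            if tag == "replace":
                patterns[" ".join(tgt[j1:j2])].append(" ".join(src[i1:i2]))
    return dict(patterns)


def inject_errors(clean_sentence, patterns, rng):
    """Rule-based substitution: swap known correct tokens for a sampled erroneous form.

    Only single-token keys can match in this sketch; multi-token patterns are skipped.
    """
    out = []
    for token in clean_sentence.split():
        if token in patterns:
            out.append(rng.choice(patterns[token]))
        else:
            out.append(token)
    return " ".join(out)


# Toy parallel data; the clean sentence below stands in for model-generated context.
pairs = [("She go to school", "She goes to school")]
patterns = extract_error_patterns(pairs)
synthetic_source = inject_errors("He goes home", patterns, random.Random(0))
```

Here `synthetic_source` paired with the clean sentence "He goes home" would form one synthetic training example. Reusing error patterns taken from the original data is what keeps the synthetic error distribution close to the real one, which is the motivation the abstract gives for contextual augmentation.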
Taxonomy
Topics: Educational Technology and Assessment · Natural Language Processing Techniques
