Grammatical Error Generation Based on Translated Fragments

Eetu Sjöblom; Mathias Creutz; Teemu Vahtola

arXiv:2104.09933 · cs.CL · April 21, 2021 · 1 citation

Open Access

TL;DR

This paper introduces a neural machine translation-based method that generates diverse non-native style errors as training data for grammatical error correction. A model trained on this data outperforms a baseline on test data with a high proportion of errors.

Contribution

The paper translates sentence fragments with neural machine translation to produce varied grammatical and lexical errors, which serve as training data for English grammatical error correction.

Findings

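01 A model trained on the generated data outperforms a baseline model on test data with a high proportion of errors.

02 The generated data covers a wider range of non-native style language, including lexical errors, than state-of-the-art synthetic data creation methods.

03 The correction experiments are evaluated both quantitatively and qualitatively.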

Abstract

We perform neural machine translation of sentence fragments in order to create large amounts of training data for English grammatical error correction. Our method aims at simulating mistakes made by second language learners, and produces a wider range of non-native style language in comparison to state-of-the-art synthetic data creation methods. In addition to purely grammatical errors, our approach generates other types of errors, such as lexical errors. We perform grammatical error correction experiments using neural sequence-to-sequence models, and carry out quantitative and qualitative evaluation. A model trained on data created using our proposed method is shown to outperform a baseline model on test data with a high proportion of errors.
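The abstract does not say how fragments are chosen, which languages are involved, or which translation system is used. The sketch below is therefore only one possible reading of "translation of sentence fragments", not the authors' pipeline: it cuts a source sentence into fixed-size chunks, translates each chunk in isolation, and pairs the concatenated result with a whole-sentence translation. The names `split_into_fragments`, `make_training_pair`, `toy_translate`, and `TOY_TABLE` are hypothetical, and the lookup table stands in for a real neural translation model.

```python
from typing import Callable, List, Tuple


def split_into_fragments(tokens: List[str], size: int) -> List[List[str]]:
    """Cut a token list into consecutive chunks of at most `size` tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


def make_training_pair(
    source: str,
    translate: Callable[[str], str],
    fragment_size: int = 2,
) -> Tuple[str, str]:
    """Return (noisy, clean) English for one source sentence.

    Assumption: translating fragments without their context yields
    learner-like errors, while translating the full sentence yields
    the correction target.
    """
    fragments = split_into_fragments(source.split(), fragment_size)
    noisy = " ".join(translate(" ".join(fragment)) for fragment in fragments)
    clean = translate(source)
    return noisy, clean


# Hypothetical stand-in for a translation model: a tiny lookup table.
TOY_TABLE = {
    "ich habe hunger": "I am hungry",
    "ich habe": "I have",
    "hunger": "hunger",
}


def toy_translate(text: str) -> str:
    """Look the text up in the toy table; return it unchanged if unknown."""
    return TOY_TABLE.get(text.lower(), text)


if __name__ == "__main__":
    print(make_training_pair("ich habe hunger", toy_translate))
```

In this toy case the fragment-wise output "I have hunger" carries a lexical error of the kind the abstract mentions, while the whole-sentence output "I am hungry" serves as the correction target.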

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification