Grammatical Error Correction in Low-Resource Scenarios

Jakub Náplava; Milan Straka

arXiv:1910.00353·cs.CL·October 17, 2019

TL;DR

This paper introduces a new Czech grammatical error correction dataset and demonstrates that Transformer models trained on synthetic data achieve state-of-the-art results in low-resource language scenarios.

Contribution

The paper presents AKCES-GEC, a new dataset for Czech GEC, and shows that synthetic data with Transformer models improves GEC performance in low-resource languages.

Findings

01. Transformer models with synthetic data outperform previous methods

02. State-of-the-art results achieved on Czech, German, and Russian datasets

03. AKCES-GEC dataset is publicly available for research

Abstract

Grammatical error correction in English is a long-studied problem with many existing systems and datasets. However, there has been only limited research on error correction for other languages. In this paper, we present AKCES-GEC, a new dataset for grammatical error correction in Czech. We then run experiments on Czech, German and Russian and show that, when trained with a synthetic parallel corpus, a Transformer neural machine translation model reaches new state-of-the-art results on these datasets. AKCES-GEC is published under the CC BY-NC-SA 4.0 license at https://hdl.handle.net/11234/1-3057 and the source code of the GEC model is available at https://github.com/ufal/low-resource-gec-wnut2019.
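
As a rough illustration of what a synthetic parallel corpus for GEC can look like, the sketch below corrupts clean sentences with random token-level and character-level edits to produce (noisy, clean) training pairs. It is not the authors' pipeline: the function names (`corrupt_chars`, `corrupt`, `make_pairs`), the set of operations, and the `p_token` rate are all assumptions chosen for illustration. The official repository linked above holds the actual implementation.

```python
import random


def corrupt_chars(token, rng):
    """Drop one random character from a token (hypothetical char-level noise)."""
    if len(token) < 2:
        return token  # too short to corrupt without emptying it
    pos = rng.randrange(len(token))
    return token[:pos] + token[pos + 1:]


def corrupt(tokens, rng, p_token=0.15):
    """Return a noisy copy of `tokens`; each token is edited with probability p_token."""
    noisy = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if rng.random() >= p_token:
            noisy.append(tok)  # keep the token unchanged
            i += 1
            continue
        op = rng.choice(["delete", "swap", "duplicate", "char"])
        if op == "delete":
            pass  # skip the token entirely
        elif op == "swap" and i + 1 < len(tokens):
            noisy.extend([tokens[i + 1], tok])  # swap with the next token
            i += 1  # the next token has already been emitted
        elif op == "duplicate":
            noisy.extend([tok, tok])
        else:
            # "char", or a "swap" requested on the last token
            noisy.append(corrupt_chars(tok, rng))
        i += 1
    # Never return an empty sentence: fall back to the clean tokens.
    return noisy or list(tokens)


def make_pairs(sentences, seed=0, p_token=0.15):
    """Build (noisy_source, clean_target) pairs from whitespace-tokenized sentences."""
    rng = random.Random(seed)  # seeded so the corpus is reproducible
    pairs = []
    for sentence in sentences:
        tokens = sentence.split()
        pairs.append((" ".join(corrupt(tokens, rng, p_token)), sentence))
    return pairs


if __name__ == "__main__":
    for noisy, clean in make_pairs(["Dnes je hezky den .", "Mam rad knihy ."], seed=1):
        print(f"{noisy}\t{clean}")
```

Pairs of this shape can be fed to a sequence-to-sequence model with the noisy sentence as the source and the clean sentence as the target, which is what lets a translation-style Transformer be trained for correction when little annotated data exists.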

Code & Models

Repositories

ufal/low-resource-gec-wnut2019
Official

Taxonomy

Methods

Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam · Softmax