Neural Machine Translation of Text from Non-Native Speakers

Antonios Anastasopoulos, Alison Lui, Toan Nguyen, and David Chiang

arXiv:1808.06267·cs.CL·March 13, 2019

TL;DR

This paper makes neural machine translation more robust to grammatical errors by augmenting training data with artificially introduced errors and pairing the system with automatic grammar error correction, recovering 1.5 of the 2.4 BLEU lost to such errors.

Contribution

The paper introduces a data augmentation method that injects artificial grammatical errors into the training data, and combines it with automatic grammar error correction to improve NMT robustness to real grammatical errors.

Findings

01

Augmenting training data with artificial errors improves robustness.

02

Combining augmentation with automatic grammar error correction recovers 1.5 of the 2.4 BLEU points lost to grammatical errors.

03

Provides Spanish translations of the JFLEG grammar error correction corpus for testing NMT robustness to real grammatical errors.

Abstract

Neural Machine Translation (NMT) systems are known to degrade when confronted with noisy data, especially when the system is trained only on clean data. In this paper, we show that augmenting training data with sentences containing artificially-introduced grammatical errors can make the system more robust to such errors. In combination with an automatic grammar error correction system, we can recover 1.5 BLEU out of 2.4 BLEU lost due to grammatical errors. We also present a set of Spanish translations of the JFLEG grammar error correction corpus, which allows for testing NMT robustness to real grammatical errors.
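The augmentation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which error types are introduced, so article errors (dropping or swapping "a", "an", "the") are used here purely as an assumed example, and the names `inject_article_errors`, `augment`, and the probability `p` are hypothetical.

```python
import random

# Assumption: article errors stand in for "artificially-introduced
# grammatical errors"; the abstract does not name the error types.
ARTICLES = {"a", "an", "the"}


def inject_article_errors(tokens, rng, p=0.3):
    """With probability p, drop each article or swap it for a different one."""
    noisy = []
    for tok in tokens:
        if tok.lower() in ARTICLES and rng.random() < p:
            if rng.random() < 0.5:
                continue  # drop the article entirely
            # otherwise substitute one of the other two articles
            noisy.append(rng.choice(sorted(ARTICLES - {tok.lower()})))
        else:
            noisy.append(tok)
    return noisy


def augment(parallel, rng, p=0.3):
    """Keep every clean pair and add a noisy-source copy with the same target."""
    augmented = []
    for src, tgt in parallel:
        augmented.append((src, tgt))
        noisy = " ".join(inject_article_errors(src.split(), rng, p))
        if noisy != src:  # only add the copy if an error was introduced
            augmented.append((noisy, tgt))
    return augmented


if __name__ == "__main__":
    data = [("the cat sat on a mat", "el gato se sentó en una alfombra")]
    for src, tgt in augment(data, random.Random(0), p=1.0):
        print(src, "|||", tgt)
```

The target side is left untouched, so the model sees the same clean translation for both the clean and the noisy source sentence, which is what lets it learn to translate through the error.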
