Synthetic and Natural Noise Both Break Neural Machine Translation

Yonatan Belinkov; Yonatan Bisk

arXiv:1711.02173 · cs.CL · February 27, 2018 · 402 cites

TL;DR

Character-based neural machine translation models break down on both synthetic and natural noise. The paper explores two remedies, structure-invariant word representations and training on noisy text, and finds that a character convolutional neural network can learn representations robust to several kinds of noise at once.

Contribution

The paper shows that a model based on a character convolutional neural network can learn representations robust to multiple kinds of noise at once, addressing the brittleness of character-based NMT on noisy input.

Findings

01. State-of-the-art NMT models fail on noisy texts

02. Robust training improves translation of noisy data

03. Character CNNs learn noise-invariant representations
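
As a toy illustration of what a noise-invariant representation means, the sketch below represents a word by its character counts, which do not change when letters are reordered. This encoding is an assumption made for illustration and is not the character CNN from the paper; the names `bag_of_chars` and `same_representation` are hypothetical.

```python
from collections import Counter


def bag_of_chars(word):
    """Represent a word by how often each character occurs, ignoring order."""
    return Counter(word.lower())


def same_representation(a, b):
    """True when two words map to the same character-count representation."""
    return bag_of_chars(a) == bag_of_chars(b)


# Reordering letters leaves the representation unchanged.
print(same_representation("noise", "nosie"))
# Replacing a letter changes it, so this toy encoding only covers scrambling.
print(same_representation("noise", "noize"))
```

An encoding like this is robust to scrambled letters by construction, but for the same reason it cannot tell anagrams such as "form" and "from" apart.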

Abstract

Character-based neural machine translation (NMT) models alleviate out-of-vocabulary issues, learn morphology, and move us closer to completely end-to-end translation systems. Unfortunately, they are also very brittle and easily falter when presented with noisy data. In this paper, we confront NMT models with synthetic and natural sources of noise. We find that state-of-the-art models fail to translate even moderately noisy texts that humans have no trouble comprehending. We explore two approaches to increase model robustness: structure-invariant word representations and robust training on noisy texts. We find that a model based on a character convolutional neural network is able to simultaneously learn representations robust to multiple kinds of noise.
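
The abstract does not list the specific noise types, so the following is a minimal sketch under stated assumptions. It uses two hypothetical synthetic perturbations (swapping one pair of adjacent interior letters, and shuffling all interior letters) and applies them to the source side of a parallel corpus, which is one plausible way to prepare data for robust training. All function names and the `prob` parameter are illustrative.

```python
import random


def swap_adjacent(word, rng):
    """Swap one random pair of adjacent interior characters (assumed noise type)."""
    if len(word) < 4:
        return word  # too short to have two interior characters
    i = rng.randrange(1, len(word) - 2)
    chars = list(word)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def scramble_middle(word, rng):
    """Shuffle all interior characters, keeping the first and last in place."""
    if len(word) < 4:
        return word
    middle = list(word[1:-1])
    rng.shuffle(middle)
    return word[0] + "".join(middle) + word[-1]


def add_noise(sentence, noise_fn, prob, seed=0):
    """Apply noise_fn to each whitespace-separated token with probability prob."""
    rng = random.Random(seed)
    return " ".join(
        noise_fn(tok, rng) if rng.random() < prob else tok
        for tok in sentence.split()
    )


def make_noisy_corpus(pairs, noise_fn, prob, seed=0):
    """Perturb only the source sentences; targets stay clean (an assumption)."""
    return [
        (add_noise(src, noise_fn, prob, seed + i), tgt)
        for i, (src, tgt) in enumerate(pairs)
    ]


pairs = [("neural machine translation is brittle", "(target sentence)")]
print(make_noisy_corpus(pairs, scramble_middle, prob=0.5))
```

Both perturbations keep word length and the first and last letters intact, which keeps the text readable to humans while changing the character sequence the model sees.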

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Software Testing and Debugging Techniques