On the Impact of Various Types of Noise on Neural Machine Translation

Huda Khayrallah; Philipp Koehn

arXiv:1805.12282·cs.CL·September 14, 2020

TL;DR

This paper investigates how different types of artificial noise in parallel training data affect neural machine translation quality. It finds that neural models are more sensitive to noise than statistical models and that, for one especially severe noise type, they learn to copy the input sentence.

Contribution

The study creates five types of artificial noise and compares how each degrades neural and statistical machine translation, showing that neural models are generally the more vulnerable of the two.

Findings

01 Neural models are generally harmed more by noise than statistical models.

02 For one especially egregious type of noise, neural models learn to simply copy the input sentence.

03 Different noise types degrade translation quality to varying degrees.

Abstract

We examine how various types of noise in the parallel training data impact the quality of neural machine translation systems. We create five types of artificial noise and analyze how they degrade performance in neural and statistical machine translation. We find that neural models are generally more harmed by noise than statistical models. For one especially egregious type of noise they learn to just copy the input sentence.
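
As a rough illustration of what injecting artificial noise into parallel training data can look like, the sketch below corrupts a fraction of sentence pairs by replacing the target side with a copy of the source. The abstract does not name the five noise types or describe the injection procedure, so the choice of this "copied source" noise, the function name `inject_copy_noise`, and the replace-in-place strategy are all assumptions made for illustration, not the authors' method.

```python
import random


def inject_copy_noise(pairs, fraction, seed=0):
    """Return a copy of `pairs` in which a fraction of targets are replaced
    by their source sentence.

    Hypothetical helper for illustration only; it is not taken from the paper.
    `pairs` is a list of (source, target) string tuples.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)  # local RNG so the result is reproducible
    n_noisy = round(fraction * len(pairs))
    noisy_idx = set(rng.sample(range(len(pairs)), n_noisy))
    # Noisy pairs keep the source and use it as the "translation" as well.
    return [
        (src, src) if i in noisy_idx else (src, tgt)
        for i, (src, tgt) in enumerate(pairs)
    ]


# Toy German-English-style corpus with placeholder tokens.
clean = [(f"quelle {i}", f"source {i}") for i in range(10)]
noisy = inject_copy_noise(clean, fraction=0.3)
n_copied = sum(src == tgt for src, tgt in noisy)
```

A model trained on data like `noisy` sees examples where the correct output is the input itself, which is one plausible route to the copying behavior the abstract reports.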

Code & Models

Repositories

danielinux7/Multilingual-Parallel-Corpus
none

Datasets

Kylan12/Synthetic-AI-ML-Dataset
dataset · 42 dl