Improving Robustness of Machine Translation with Synthetic Noise

Vaibhav Vaibhav; Sumeet Singh; Craig Stewart; Graham Neubig

arXiv:1902.09508 · cs.CL · April 12, 2019 · 6 cites

TL;DR

This paper improves machine translation robustness by synthesizing realistic noise in clean training data, making systems more resilient to noisy, social-media-style text and partially recovering the accuracy lost on it.

Contribution

It introduces a noise synthesis method based on the MTNT dataset to improve the resilience of MT systems against naturally occurring noisy text.

Findings

01 Synthesized noise improves translation robustness.

02 Resilience to noisy social media text increases.

03 Partial mitigation of accuracy loss in noisy conditions.

Abstract

Modern Machine Translation (MT) systems perform consistently well on clean, in-domain text. However, most human-generated text, particularly on social media, is full of typos, slang, dialect, idiolect, and other noise, which can have a disastrous impact on translation accuracy. In this paper, we leverage the Machine Translation of Noisy Text (MTNT) dataset to enhance the robustness of MT systems by emulating naturally occurring noise in otherwise clean data. By synthesizing noise in this manner, we are ultimately able to make a vanilla MT system resilient to naturally occurring noise and partially mitigate the resulting loss in accuracy.
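The idea of emulating noise in otherwise clean data can be illustrated with a minimal Python sketch. This is not the authors' method: the abstract does not specify the noise operations, so the two typo-style perturbations here (swapping adjacent characters, dropping a character), the function names, and the per-word probability `p` are all assumptions chosen for illustration.

```python
import random


def swap_adjacent(word, rng):
    """Swap two adjacent interior characters (hypothetical typo model)."""
    if len(word) < 4:
        return word  # leave short words untouched
    i = rng.randrange(1, len(word) - 2)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]


def drop_char(word, rng):
    """Delete one interior character (hypothetical typo model)."""
    if len(word) < 4:
        return word  # leave short words untouched
    i = rng.randrange(1, len(word) - 1)
    return word[:i] + word[i + 1:]


def add_noise(sentence, p=0.1, seed=None):
    """Perturb each whitespace-separated word with probability p.

    Assumption: noise is applied independently per word, and only
    to the source side of a parallel corpus.
    """
    rng = random.Random(seed)
    ops = [swap_adjacent, drop_char]
    out = []
    for word in sentence.split():
        if rng.random() < p:
            word = rng.choice(ops)(word, rng)
        out.append(word)
    return " ".join(out)


if __name__ == "__main__":
    clean = "translation systems handle noise"
    print(add_noise(clean, p=0.5, seed=0))
```

In a setup like this, the perturbed source sentences would be paired with their original clean target translations, so that the model sees noisy input alongside correct output during training.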

Code & Models

Repositories

MysteryVaibhav/robust_mtnt (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis