Improving Robustness of Machine Translation with Synthetic Noise
Vaibhav Vaibhav, Sumeet Singh, Craig Stewart, Graham Neubig

TL;DR
This paper makes machine translation more robust by injecting synthetic but realistic noise into clean training data, so that systems handle noisy, social-media-style text better and recover part of the accuracy lost on such input.
Contribution
It introduces a noise synthesis method based on the MTNT dataset to improve the resilience of MT systems against naturally occurring noisy text.
Findings
Synthesized noise improves translation robustness.
Resilience to noisy social media text increases.
Accuracy loss on noisy input is partially mitigated.
Abstract
Modern Machine Translation (MT) systems perform consistently well on clean, in-domain text. However, most human-generated text, particularly in the realm of social media, is full of typos, slang, dialect, idiolect, and other noise, which can have a disastrous impact on the accuracy of the output translation. In this paper we leverage the Machine Translation of Noisy Text (MTNT) dataset to enhance the robustness of MT systems by emulating naturally occurring noise in otherwise clean data. By synthesizing noise in this manner, we are ultimately able to make a vanilla MT system resilient to naturally occurring noise and partially mitigate the resulting loss in accuracy.
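To make the idea of "emulating naturally occurring noise in otherwise clean data" concrete, here is a minimal sketch that injects character-level typos into the source side of a parallel corpus. It is an illustration only, not the authors' noise model: the function names `add_typos` and `noise_parallel`, the three typo operations, and the `rate` parameter are all assumptions made for this example.

```python
import random


def add_typos(sentence, rate=0.1, seed=None):
    """Return `sentence` with simple typos injected into some words.

    Hypothetical helper: each word longer than 3 characters is corrupted
    with probability `rate` by swapping, dropping, or repeating one
    interior character. The first and last characters are left intact.
    """
    rng = random.Random(seed)
    noisy_words = []
    for word in sentence.split():
        if len(word) > 3 and rng.random() < rate:
            # Pick an interior position so that word[i + 1] is also interior.
            i = rng.randrange(1, len(word) - 2)
            op = rng.choice(("swap", "drop", "repeat"))
            if op == "swap":
                word = word[:i] + word[i + 1] + word[i] + word[i + 2:]
            elif op == "drop":
                word = word[:i] + word[i + 1:]
            else:  # repeat
                word = word[:i] + word[i] + word[i:]
        noisy_words.append(word)
    return " ".join(noisy_words)


def noise_parallel(pairs, rate=0.1, seed=0):
    """Corrupt only the source side of (source, target) sentence pairs.

    The target side is kept clean, so a model trained on the result
    sees noisy input paired with a clean reference translation.
    """
    return [
        (add_typos(src, rate=rate, seed=seed + k), tgt)
        for k, (src, tgt) in enumerate(pairs)
    ]


if __name__ == "__main__":
    corpus = [("the translation system works well", "le système de traduction fonctionne bien")]
    for src, tgt in noise_parallel(corpus, rate=0.5, seed=42):
        print(src, "|||", tgt)
```

Keeping the target side clean is the design choice that matters here: the model is pushed to map noisy input to a clean translation. Slang, dialect, and the other noise types named in the abstract would need their own, data-driven generators.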
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
