When Does Unsupervised Machine Translation Work?
Kelly Marchisio, Kevin Duh, and Philipp Koehn

TL;DR
This paper evaluates the conditions affecting the success of unsupervised machine translation, revealing its limitations across different languages, domains, and resource levels, and emphasizing the need for thorough empirical testing.
Contribution
It provides a comprehensive empirical analysis of unsupervised MT under challenging conditions (dissimilar language pairs, mismatched domains, diverse datasets, and authentic low-resource languages), identifying key failure points to guide future research.
Findings
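Performance deteriorates rapidly when source and target corpora come from different domains
Random word embedding initialization can dramatically affect downstream translation performance
Performance declines when source and target languages use different scripts, and is very poor on authentic low-resource language pairs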
Abstract
Despite the reported success of unsupervised machine translation (MT), the field has yet to examine the conditions under which these methods succeed, and where they fail. We conduct an extensive empirical evaluation of unsupervised MT using dissimilar language pairs, dissimilar domains, diverse datasets, and authentic low-resource languages. We find that performance rapidly deteriorates when source and target corpora are from different domains, and that random word embedding initialization can dramatically affect downstream translation performance. We additionally find that unsupervised MT performance declines when source and target languages use different scripts, and observe very poor performance on authentic low-resource language pairs. We advocate for extensive empirical evaluation of unsupervised MT systems to highlight failure points and encourage continued research on the most…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
