Unsupervised Statistical Machine Translation

Mikel Artetxe; Gorka Labaka; Eneko Agirre

arXiv:1809.01272·cs.CL·December 28, 2021

TL;DR

This paper introduces an unsupervised phrase-based SMT approach that leverages monolingual data and cross-lingual embeddings, significantly narrowing the performance gap with supervised translation systems.

Contribution

It presents a novel unsupervised SMT method that induces phrase tables from monolingual data and improves translation quality through iterative backtranslation.

Findings

1. Achieved 14.08 BLEU on WMT 2014 English-German translation.

2. Achieved 26.22 BLEU on WMT 2014 English-French translation.

3. Reduced the gap with supervised SMT to 2-5 BLEU points.

Abstract

While modern machine translation has relied on large parallel corpora, a recent line of work has managed to train Neural Machine Translation (NMT) systems from monolingual corpora only (Artetxe et al., 2018c; Lample et al., 2018). Despite the potential of this approach for low-resource settings, existing systems are far behind their supervised counterparts, limiting their practical interest. In this paper, we propose an alternative approach based on phrase-based Statistical Machine Translation (SMT) that significantly closes the gap with supervised systems. Our method profits from the modular architecture of SMT: we first induce a phrase table from monolingual corpora through cross-lingual embedding mappings, combine it with an n-gram language model, and fine-tune hyperparameters through an unsupervised MERT variant. In addition, iterative backtranslation improves results further,…
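
The phrase-table induction step described in the abstract can be illustrated with a small sketch. It assumes that source and target words already live in a shared cross-lingual embedding space, and it turns cosine similarities into translation scores with a temperature softmax over the nearest candidates. The abstract does not spell out the scoring function, so the softmax, the `temperature` and `top_k` parameters, the function names, and the toy vectors are all assumptions made for illustration, not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def induce_phrase_table(src_emb, tgt_emb, temperature=0.1, top_k=2):
    """Build a toy phrase table from embeddings in a shared space.

    For each source entry, keep the top_k most similar target entries
    and normalise their similarities with a temperature softmax.
    (Hypothetical helper; parameter values are illustrative.)
    """
    table = {}
    for src, u in src_emb.items():
        sims = {tgt: cosine(u, v) for tgt, v in tgt_emb.items()}
        best = sorted(sims, key=sims.get, reverse=True)[:top_k]
        # Subtract the maximum before exponentiating for numerical stability.
        top = max(sims[t] for t in best)
        weights = {t: math.exp((sims[t] - top) / temperature) for t in best}
        total = sum(weights.values())
        table[src] = {t: w / total for t, w in weights.items()}
    return table


# Toy 2-dimensional "cross-lingual" embeddings (made-up values).
src_emb = {"house": [0.9, 0.1], "cat": [0.1, 0.9]}
tgt_emb = {"maison": [0.8, 0.2], "chat": [0.2, 0.8], "chien": [0.3, 0.7]}

table = induce_phrase_table(src_emb, tgt_emb)
for src, candidates in table.items():
    print(src, candidates)
```

In a full pipeline, scores like these would be one feature among several, combined with an n-gram language model and tuned weights as the abstract outlines. A lower `temperature` concentrates probability mass on the nearest neighbour, while a higher one spreads it across candidates.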

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification