# Improving Unsupervised Word-by-Word Translation with Language Model and Denoising Autoencoder

**Authors:** Yunsu Kim, Jiahui Geng, Hermann Ney

arXiv: 1901.01590 · 2019-01-08

## TL;DR

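This paper improves word-by-word sentence translation with unsupervised cross-lingual word embeddings. It integrates a target language model for context-aware search and a denoising autoencoder for reordering, and the resulting system outperforms state-of-the-art unsupervised neural translation systems without back-translation or iterative training.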

## Contribution

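The paper combines a language model, used for context-aware search over word translation candidates, with a novel denoising autoencoder that handles word reordering. Together they improve word-by-word translation from cross-lingual embeddings while using only monolingual data.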
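To make "context-aware search" concrete, here is a minimal Python sketch, not the authors' implementation. It assumes (1) lexical translation scores obtained from a softmax over the top-`k` cosine similarities in a shared cross-lingual embedding space, (2) a toy bigram language model, and (3) a log-linear combination with hypothetical weights `lam_lex` and `lam_lm`, searched left to right with a beam. All function names, weights, and toy vectors are illustrative assumptions.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def lexical_candidates(src_vec: Vector, tgt_emb: Dict[str, Vector], k: int,
                       temperature: float) -> List[Tuple[str, float]]:
    """Top-k target words with log-probabilities from a softmax over scaled cosines (assumed model)."""
    top = sorted(((cosine(src_vec, vec), word) for word, vec in tgt_emb.items()),
                 reverse=True)[:k]
    scaled = [sim / temperature for sim, _ in top]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))  # stable log-sum-exp
    return [(word, x - log_z) for (_, word), x in zip(top, scaled)]


def bigram_logprob(lm: Dict[Tuple[str, str], float], prev: str, word: str,
                   floor: float = 1e-4) -> float:
    """Log P(word | prev) from a toy bigram table; unseen bigrams get a floor probability."""
    return math.log(lm.get((prev, word), floor))


def translate(src_words: List[str], src_emb: Dict[str, Vector], tgt_emb: Dict[str, Vector],
              lm: Dict[Tuple[str, str], float], k: int = 2, beam_size: int = 2,
              lam_lex: float = 1.0, lam_lm: float = 1.0,
              temperature: float = 0.1) -> List[str]:
    """Monotone word-by-word beam search scoring lam_lex * lexical + lam_lm * LM log-probs."""
    beams: List[Tuple[float, List[str]]] = [(0.0, [])]
    for src in src_words:
        cands = lexical_candidates(src_emb[src], tgt_emb, k, temperature)
        expanded = []
        for score, hyp in beams:
            prev = hyp[-1] if hyp else "<s>"  # sentence-start context for the first word
            for word, lex_lp in cands:
                total = score + lam_lex * lex_lp + lam_lm * bigram_logprob(lm, prev, word)
                expanded.append((total, hyp + [word]))
        expanded.sort(key=lambda item: item[0], reverse=True)
        beams = expanded[:beam_size]  # keep only the best beam_size hypotheses
    return beams[0][1]


# Hypothetical toy data: source and target embeddings already mapped into one space.
TGT_EMB = {"wooden": [1, 0, 0, 0], "wood": [0, 1, 0, 0],
           "bank": [0, 0, 1, 0], "bench": [0, 0, 0, 1]}
SRC_EMB = {"Holz": [0.9, 0.8, 0, 0], "Bank": [0, 0, 0.9, 0.8]}
BIGRAMS = {("<s>", "wooden"): 0.5, ("<s>", "wood"): 0.5, ("wooden", "bench"): 0.6,
           ("wood", "bench"): 0.1, ("wood", "bank"): 0.1}

print(translate(["Holz", "Bank"], SRC_EMB, TGT_EMB, BIGRAMS, lam_lm=0.0))  # ['wooden', 'bank']
print(translate(["Holz", "Bank"], SRC_EMB, TGT_EMB, BIGRAMS))              # ['wooden', 'bench']
```

With the LM weight set to zero, the search reduces to nearest-neighbor lookup and picks "bank"; with the LM, the bigram "wooden bench" outweighs the slightly higher embedding similarity of "bank". The search is monotone, so it cannot repair word order by itself, which is the role the paper assigns to the denoising autoencoder.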

## Key findings

- Outperforms state-of-the-art unsupervised neural translation systems
- Analyzes how vocabulary size and the type of denoising affect translation quality
- Achieves these improvements without back-translation or costly iterative training
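The vocabulary-size analysis concerns how many words take part in embedding learning and translation. As a rough illustration only, the sketch below restricts nearest-neighbor word translation to the `vocab_size` most frequent target words. The cutoff mechanism, the pass-through rule for words without an embedding, the frequency ranks, and the toy vectors are all assumptions, not details taken from the paper.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def nearest_neighbor_translate(src_words: List[str], src_emb: Dict[str, Vector],
                               tgt_emb: Dict[str, Vector], tgt_rank: Dict[str, int],
                               vocab_size: int) -> List[str]:
    """Map each word to its nearest neighbor among the vocab_size most frequent target words."""
    allowed = sorted(tgt_emb, key=lambda w: tgt_rank[w])[:vocab_size]
    out = []
    for word in src_words:
        if word not in src_emb:
            out.append(word)  # assumption: words without an embedding are copied through
            continue
        out.append(max(allowed, key=lambda t: cosine(src_emb[word], tgt_emb[t])))
    return out


# Hypothetical toy data; rank 0 means most frequent.
TGT_EMB = {"the": [1, 0, 0], "house": [0, 1, 0], "cottage": [0, 0.6, 0.8]}
TGT_RANK = {"the": 0, "house": 1, "cottage": 2}
SRC_EMB = {"das": [1, 0.1, 0], "Hütte": [0, 0.5, 0.9]}

print(nearest_neighbor_translate(["das", "Hütte"], SRC_EMB, TGT_EMB, TGT_RANK, 3))  # ['the', 'cottage']
print(nearest_neighbor_translate(["das", "Hütte"], SRC_EMB, TGT_EMB, TGT_RANK, 2))  # ['the', 'house']
```

In this toy case, shrinking the candidate set from three words to two removes "cottage", so "Hütte" falls back to the less precise "house".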

## Abstract

Unsupervised learning of cross-lingual word embedding offers elegant matching of words across languages, but has fundamental limitations in translating sentences. In this paper, we propose simple yet effective methods to improve word-by-word translation of cross-lingual embeddings, using only monolingual corpora but without any back-translation. We integrate a language model for context-aware search, and use a novel denoising autoencoder to handle reordering. Our system surpasses state-of-the-art unsupervised neural translation systems without costly iterative training. We also analyze the effect of vocabulary size and denoising type on the translation performance, which provides better understanding of learning the cross-lingual word embedding and its usage in translation.
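A denoising autoencoder for reordering can be trained on target-side monolingual text alone: corrupt clean sentences and learn to restore them. The sketch below only generates such (noisy, clean) training pairs. The three noise types (local permutation, random deletion, random insertion of filler words), their rates, and the filler list are assumptions for illustration, and the sequence-to-sequence model that would learn the mapping is omitted.

```python
import random
from typing import List, Sequence, Tuple


def permute_locally(tokens: Sequence[str], k: int, rng: random.Random) -> List[str]:
    """Shuffle tokens so that no token moves more than k positions from its original index."""
    keys = [i + rng.random() * (k + 1) for i in range(len(tokens))]  # jittered positions
    order = sorted(range(len(tokens)), key=lambda i: keys[i])
    return [tokens[i] for i in order]


def delete_words(tokens: Sequence[str], p: float, rng: random.Random) -> List[str]:
    """Drop each token independently with probability p."""
    return [t for t in tokens if rng.random() >= p]


def insert_words(tokens: Sequence[str], p: float, fillers: Sequence[str],
                 rng: random.Random) -> List[str]:
    """After each token, insert a random filler word with probability p."""
    out: List[str] = []
    for t in tokens:
        out.append(t)
        if rng.random() < p:
            out.append(rng.choice(fillers))
    return out


def make_pair(sentence: str, rng: random.Random, k: int = 3, p_del: float = 0.1,
              p_ins: float = 0.1,
              fillers: Sequence[str] = ("the", "a", "of")) -> Tuple[List[str], List[str]]:
    """Return (noisy input, clean target) for denoising-autoencoder training (hypothetical settings)."""
    clean = sentence.split()
    noisy = permute_locally(clean, k, rng)
    noisy = delete_words(noisy, p_del, rng)
    noisy = insert_words(noisy, p_ins, fillers, rng)
    return noisy, clean


rng = random.Random(0)
noisy, clean = make_pair("the cat sat on the mat near the door", rng)
print(noisy)
print(clean)
```

Adding a jitter in `[0, k + 1)` to each index and sorting guarantees that no token moves more than `k` positions, so the corruption is meant to resemble the local word-order errors of word-by-word output rather than an arbitrary shuffle.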

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1901.01590/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/1901.01590/full.md

## References

18 references — full list in the complete paper: https://tomesphere.com/paper/1901.01590/full.md

---
Source: https://tomesphere.com/paper/1901.01590