Semi-Supervised Learning for Neural Machine Translation
Yong Cheng, Wei Xu, Zhongjun He, Wei He, Hua Wu, Maosong Sun, Yang Liu

TL;DR
This paper introduces a semi-supervised training method for neural machine translation that reconstructs monolingual data with an autoencoder built from the source-to-target and target-to-source translation models. Monolingual corpora can then supplement parallel corpora, which are often scarce, especially for low-resource languages.
Contribution
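It presents a semi-supervised training approach for NMT in which an autoencoder, with the source-to-target and target-to-source translation models as encoder and decoder, reconstructs monolingual corpora so they can be used alongside parallel data.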
Findings
Achieves significant improvements over state-of-the-art SMT and NMT systems on a Chinese-English dataset.
Exploits monolingual corpora of both the source language and the target language.
Abstract
While end-to-end neural machine translation (NMT) has made remarkable progress recently, NMT systems only rely on parallel corpora for parameter estimation. Since parallel corpora are usually limited in quantity, quality, and coverage, especially for low-resource languages, it is appealing to exploit monolingual corpora to improve NMT. We propose a semi-supervised approach for training NMT models on the concatenation of labeled (parallel corpora) and unlabeled (monolingual corpora) data. The central idea is to reconstruct the monolingual corpora using an autoencoder, in which the source-to-target and target-to-source translation models serve as the encoder and decoder, respectively. Our approach can not only exploit the monolingual corpora of the target language, but also of the source language. Experiments on the Chinese-English dataset show that our approach achieves significant…
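The central idea in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the probability tables, the romanized tokens, the function names, the top-k approximation of the sum over latent translations, and the weights lam_src and lam_tgt are all assumptions made for illustration. The sketch also assumes that for target-language monolingual text the two models swap roles, which the abstract does not spell out. A real system would score whole sentences with NMT models rather than look up single tokens.

```python
import math

# Hypothetical toy tables standing in for the two translation models.
# SRC2TGT[s][t] plays the role of P(t | s); TGT2SRC[t][s] plays the role of P(s | t).
SRC2TGT = {
    "mao": {"cat": 0.9, "dog": 0.1},
    "gou": {"cat": 0.2, "dog": 0.8},
}
TGT2SRC = {
    "cat": {"mao": 0.85, "gou": 0.15},
    "dog": {"mao": 0.1, "gou": 0.9},
}


def reconstruction_prob(sentence, encoder, decoder, k=2):
    """Approximate P(sentence | sentence) through the autoencoder.

    The encoder translates the sentence into a latent sentence in the other
    language, and the decoder translates it back. The sum over all latent
    sentences is approximated by the k most probable candidates (an assumption).
    """
    candidates = sorted(encoder[sentence].items(), key=lambda kv: kv[1], reverse=True)[:k]
    return sum(p_latent * decoder[latent].get(sentence, 0.0) for latent, p_latent in candidates)


def semi_supervised_objective(parallel, mono_src, mono_tgt, lam_src=1.0, lam_tgt=1.0):
    """Log-likelihood of parallel pairs plus weighted reconstruction terms."""
    total = 0.0
    # Supervised part: both translation directions on the labeled (parallel) pairs.
    for src, tgt in parallel:
        total += math.log(SRC2TGT[src][tgt]) + math.log(TGT2SRC[tgt][src])
    # Unsupervised part, source side: source-to-target encodes, target-to-source decodes.
    for src in mono_src:
        total += lam_src * math.log(reconstruction_prob(src, SRC2TGT, TGT2SRC))
    # Unsupervised part, target side: the roles are swapped (assumed here).
    for tgt in mono_tgt:
        total += lam_tgt * math.log(reconstruction_prob(tgt, TGT2SRC, SRC2TGT))
    return total


if __name__ == "__main__":
    print(reconstruction_prob("mao", SRC2TGT, TGT2SRC))
    print(semi_supervised_objective([("mao", "cat")], ["mao"], ["cat"]))
```

In training, the parameters of both translation models would be updated to increase this objective, so monolingual sentences contribute a learning signal even though they have no reference translation.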
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Handwritten Text Recognition Techniques
