On Using Very Large Target Vocabulary for Neural Machine Translation
Sébastien Jean, Kyunghyun Cho, Roland Memisevic, Yoshua Bengio

TL;DR
This paper introduces a method using importance sampling to enable neural machine translation models to handle very large target vocabularies efficiently, improving translation quality and achieving state-of-the-art results.
Contribution
It presents an importance sampling approach that lets the target vocabulary grow very large without increasing training complexity, and it keeps decoding efficient by scoring only a small subset of the target vocabulary, leading to improved translation performance.
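To make the training-side idea concrete, here is a minimal sketch in Python of a loss normalised over a small candidate set instead of the full vocabulary. It is an illustration under stated assumptions, not the authors' implementation: `score` is a hypothetical function returning the unnormalised score of a word id, the candidate set is assumed to contain the target word, and the proposal distribution is assumed uniform over the candidates, in which case the self-normalised importance weights reduce to a softmax over the candidates.

```python
import math


def log_softmax_at(scores, pos):
    """Log-probability of entry `pos` under a softmax over `scores`."""
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return scores[pos] - log_z


def full_nll(score, vocab_size, target):
    """Exact loss: normalises over every word id, so cost is O(vocab_size)."""
    scores = [score(w) for w in range(vocab_size)]
    return -log_softmax_at(scores, target)


def sampled_nll(score, candidates, target):
    """Approximate loss: normalises over `candidates` only.

    Assumption: `target` is in `candidates` and the proposal is uniform over
    them, so the importance weights cancel and a plain softmax remains.
    """
    cands = list(candidates)
    scores = [score(w) for w in cands]
    return -log_softmax_at(scores, cands.index(target))
```

The cost of `sampled_nll` depends on the number of candidates rather than on the vocabulary size. Because the restricted normaliser sums over fewer terms, the approximate loss is never larger than the exact one, and the two coincide when the candidate set is the whole vocabulary.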
Findings
Outperforms baseline models with small vocabularies
Achieves state-of-the-art BLEU scores on English->German translation
Almost matches top systems on English->French translation
Abstract
Neural machine translation, a recently proposed approach to machine translation based purely on neural networks, has shown promising results compared to the existing approaches such as phrase-based statistical machine translation. Despite its recent success, neural machine translation has its limitation in handling a larger vocabulary, as training complexity as well as decoding complexity increase proportionally to the number of target words. In this paper, we propose a method that allows us to use a very large target vocabulary without increasing training complexity, based on importance sampling. We show that decoding can be efficiently done even with the model having a very large target vocabulary by selecting only a small subset of the whole target vocabulary. The models trained by the proposed approach are empirically found to outperform the baseline models with a small vocabulary…
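The decoding claim in the abstract, that only a small subset of the target vocabulary needs to be considered, can be sketched as follows. This is a hedged illustration rather than the paper's procedure: `score` is a hypothetical function giving the unnormalised score of a word id at the current decoding step, and how the candidate subset is chosen is not specified in the text above, so it is simply passed in.

```python
import math


def restricted_log_probs(score, candidates):
    """Log-probabilities normalised over `candidates` only (hypothetical helper)."""
    cands = list(candidates)
    scores = [score(w) for w in cands]
    m = max(scores)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return {w: s - log_z for w, s in zip(cands, scores)}


def greedy_step(score, candidates):
    """Pick the most probable next word among the candidates."""
    log_probs = restricted_log_probs(score, candidates)
    best = max(log_probs, key=log_probs.get)
    return best, log_probs[best]
```

Words outside the candidate set are never scored, so the per-step cost scales with the subset size. The same restricted log-probabilities could feed a beam search in place of the greedy choice shown here.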
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
