On Using Very Large Target Vocabulary for Neural Machine Translation

Sébastien Jean; Kyunghyun Cho; Roland Memisevic; Yoshua Bengio

arXiv:1412.2007 · cs.CL · March 19, 2015 · 56 cites



TL;DR

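This paper introduces an importance-sampling method that lets neural machine translation models use very large target vocabularies without a proportional increase in training cost, together with a way to keep decoding efficient by restricting the output to a small subset of the vocabulary. The resulting models improve translation quality, reaching state-of-the-art BLEU scores on English→German translation.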

Contribution

It presents a novel importance-sampling approach that keeps the per-update training cost from growing with the size of the target vocabulary, and it enables efficient decoding by scoring only a small subset of target words, leading to improved translation performance.
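The training idea can be illustrated with a minimal Python sketch; it is not the authors' implementation. The gradient of log p(y_t) with respect to the output score of word k is 1[k = t] − p(k), and the expensive part, an expectation over the whole vocabulary, is replaced by a self-normalized importance-sampling estimate over a subset V′ with weights proportional to exp(score_k − log Q(k)) for a proposal distribution Q. The helper names (`log_softmax_subset`, `importance_weights`, `approx_score_grad`) and the toy scores are hypothetical. When Q is uniform over V′, the log Q term cancels and the estimate reduces to a softmax over V′ alone, so its cost depends on |V′| rather than on the full vocabulary size.

```python
import math

def log_softmax_subset(scores, subset, target):
    """log p(target) under a softmax normalized over `subset` only.

    Assumes `target` is a member of `subset`."""
    m = max(scores[k] for k in subset)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(scores[k] - m) for k in subset))
    return scores[target] - log_z

def importance_weights(scores, subset, log_q):
    """Self-normalized importance weights: w_k proportional to exp(score_k - log Q(k))."""
    raw = {k: scores[k] - log_q[k] for k in subset}
    m = max(raw.values())
    unnorm = {k: math.exp(v - m) for k, v in raw.items()}
    total = sum(unnorm.values())
    return {k: u / total for k, u in unnorm.items()}

def approx_score_grad(scores, subset, target, log_q):
    """Approximate d log p(target) / d score_k for every word k.

    Exact form: 1[k == target] - p(k). The expectation term is estimated with
    importance weights over `subset`; non-target words outside it get no gradient."""
    weights = importance_weights(scores, subset, log_q)
    grad = [0.0] * len(scores)
    grad[target] += 1.0
    for k, w in weights.items():
        grad[k] -= w
    return grad

# Toy example: 6-word vocabulary, subset V' = {0, 1, 5}, target word 0.
scores = [2.0, 1.0, 0.5, -1.0, 0.0, 3.0]
subset = [0, 1, 5]
uniform_log_q = {k: -math.log(len(subset)) for k in subset}  # uniform Q over V'
print(log_softmax_subset(scores, subset, 0))
print(approx_score_grad(scores, subset, 0, uniform_log_q))
```

Passing the full vocabulary as `subset` together with a uniform `log_q` recovers the exact softmax gradient; shrinking `subset` trades accuracy of the normalizer for speed.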

Findings

01. Outperforms baseline models that use a small target vocabulary
02. Achieves state-of-the-art BLEU scores on English→German translation
03. Nearly matches the best systems on English→French translation

Abstract

Neural machine translation, a recently proposed approach to machine translation based purely on neural networks, has shown promising results compared to existing approaches such as phrase-based statistical machine translation. Despite its recent success, neural machine translation is limited in its ability to handle a large vocabulary, because both training and decoding complexity increase proportionally to the number of target words. In this paper, we propose a method based on importance sampling that allows us to use a very large target vocabulary without increasing training complexity. We show that decoding can be done efficiently even with a model that has a very large target vocabulary, by selecting only a small subset of the whole target vocabulary. The models trained by the proposed approach are empirically found to outperform the baseline models with a small vocabulary…
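The abstract states only that decoding selects a small subset of the target vocabulary. The sketch below assumes one plausible way to build that subset per source sentence (an assumption here, not a quote of the paper): the union of the K most frequent target words and the top-K′ translations of each source word from a word-alignment dictionary. The vocabulary, dictionary, scores and the helpers `candidate_list` and `restricted_softmax` are hypothetical toys.

```python
import math

def candidate_list(source_tokens, freq_ranked_vocab, align_dict, k, k_prime):
    """Target-word candidates for one source sentence (hypothetical construction):
    the k most frequent target words plus, for each source token, its top
    k_prime translations from a word-alignment dictionary."""
    candidates = set(freq_ranked_vocab[:k])
    for tok in source_tokens:
        candidates.update(align_dict.get(tok, [])[:k_prime])
    return candidates

def restricted_softmax(score_fn, candidates):
    """Softmax over the candidate set only; score_fn is called once per
    candidate, so no other vocabulary word is ever scored."""
    scores = {w: score_fn(w) for w in candidates}
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {w: math.exp(s - m) for w, s in scores.items()}
    z = sum(exps.values())
    return {w: e / z for w, e in exps.items()}

# Toy English->French setting; all words and scores are made up for illustration.
freq_ranked_vocab = ["le", "la", "de", "et", "un", "chat"]
align_dict = {
    "the": ["le", "la"],
    "blue": ["bleu", "bleue"],
    "house": ["maison", "domicile"],
}
toy_scores = {"le": 1.0, "la": 0.5, "de": 0.2, "et": 0.1, "un": 0.3, "chat": -1.0,
              "bleu": 0.8, "bleue": 0.7, "maison": 2.0, "domicile": 0.4}

cands = candidate_list(["the", "blue", "house"], freq_ranked_vocab, align_dict, k=3, k_prime=1)
probs = restricted_softmax(lambda w: toy_scores[w], cands)
best = max(probs, key=probs.get)  # greedy choice among candidates
print(sorted(cands), best)
```

In a real decoder each per-candidate score would be a dot product with one row of the output weight matrix, so restricting the softmax to a small candidate set is where the time is saved; the toy numbers here only show the mechanics.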


Code & Models

Repositories

HIT-SCIR/ELMoForManyLangs (PyTorch)


Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling