Neural Machine Translation by Generating Multiple Linguistic Factors

Mercedes García-Martínez; Loïc Barrault; Fethi Bougares

arXiv:1712.01821 · cs.CL · December 7, 2017

TL;DR

This paper introduces a factored neural machine translation model that decomposes words into linguistic factors, improving vocabulary handling and grammatical correctness in translation tasks.

Contribution

The paper presents a novel factored NMT architecture that eases the limit on target vocabulary size and can produce grammatically correct words that are not in the vocabulary.

Findings

01 Improved BLEU and METEOR scores over baseline systems

02 Reduced unknown token production

03 Effective handling of larger vocabularies

Abstract

Factored neural machine translation (FNMT) is founded on the idea of using the morphological and grammatical decomposition of words (factors) on the output side of the neural network. This architecture addresses two well-known problems in MT: the size of the target language vocabulary and the number of unknown tokens produced in the translation. The FNMT system is designed to manage a larger vocabulary and to reduce training time (for systems with equivalent target language vocabulary size). Moreover, we can produce grammatically correct words that are not part of the vocabulary. The FNMT model is evaluated on the IWSLT'15 English-to-French task and compared to baseline word-based and BPE-based NMT systems. Promising qualitative and quantitative results (in terms of BLEU and METEOR) are reported.
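To make the idea of output-side factors concrete, here is a minimal sketch, not the authors' implementation. It assumes each target word is split into a lemma and a single bundle of grammatical tags, and that a lookup table maps such a pair back to a surface word. The toy lexicon, the tag names, and the helpers `decompose`, `recombine`, and `output_vocab_sizes` are all invented for illustration; a real system would rely on a morphological analyser and generator instead of a hand-written table.

```python
from typing import Dict, List, Tuple

Factored = Tuple[str, str]  # (lemma, grammatical tag bundle)

# Hypothetical toy lexicon: surface word -> (lemma, factors).
# Stands in for a real morphological analyser (an assumption of this sketch).
ANALYSES: Dict[str, Factored] = {
    "grand": ("grand", "adj-masc-sing"),
    "grande": ("grand", "adj-fem-sing"),
    "grands": ("grand", "adj-masc-plur"),
    "grandes": ("grand", "adj-fem-plur"),
    "petit": ("petit", "adj-masc-sing"),
    "petite": ("petit", "adj-fem-sing"),
    "petits": ("petit", "adj-masc-plur"),
    "petites": ("petit", "adj-fem-plur"),
}

# Inverse table: (lemma, factors) -> surface word, used at generation time.
GENERATE: Dict[Factored, str] = {pair: word for word, pair in ANALYSES.items()}


def decompose(words: List[str]) -> List[Factored]:
    """Split each surface word into its (lemma, factors) pair."""
    return [ANALYSES[w] for w in words]


def recombine(lemma: str, factors: str) -> str:
    """Build the surface word from a predicted lemma and predicted factors."""
    return GENERATE[(lemma, factors)]


def output_vocab_sizes(words: List[str]) -> Tuple[int, int, int]:
    """Return (#distinct words, #distinct lemmas, #distinct factor bundles)."""
    pairs = decompose(words)
    lemmas = {lemma for lemma, _ in pairs}
    factors = {tags for _, tags in pairs}
    return len(set(words)), len(lemmas), len(factors)


# Words assumed to appear in the training data of this toy example.
seen = ["grand", "grande", "grands", "grandes", "petit"]
n_words, n_lemmas, n_factors = output_vocab_sizes(seen)

# "petites" was never seen as a word, but its lemma and its factor bundle
# were each seen separately, so a factored output can still produce it.
unseen_word = recombine("petit", "adj-fem-plur")
```

In this toy setting a word-level output layer would need one entry per surface form, while a factored output needs only the lemma list plus the tag list, and the pair `("petit", "adj-fem-plur")` yields `petites` even though that word is absent from `seen`.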
