Linguistically Motivated Vocabulary Reduction for Neural Machine Translation from Turkish to English

Duygu Ataman; Matteo Negri; Marco Turchi; Marcello Federico

arXiv:1707.09879·cs.CL·August 1, 2017

TL;DR

This paper introduces a linguistically motivated vocabulary reduction method for neural machine translation that leverages morphological analysis to improve translation accuracy for morphologically rich languages like Turkish.

Contribution

The paper proposes a vocabulary reduction approach for NMT based on unsupervised morphology learning and supervised analysis, aimed at improving translation quality for morphologically rich languages.

Findings

01. Achieved a 2.3 BLEU point improvement over conventional methods.

02. Effectively reduces vocabulary size while preserving morphological and semantic integrity.

03. Demonstrated better translation accuracy for Turkish-to-English NMT.

Abstract

The need for a fixed-size word vocabulary to control model complexity in state-of-the-art neural machine translation (NMT) systems is an important bottleneck on performance, especially for morphologically rich languages. Conventional methods that aim to overcome this problem with sub-word or character-level representations rely solely on statistics and disregard the linguistic properties of words, which disrupts word structure and causes semantic and syntactic losses. In this paper, we propose a new vocabulary reduction method for NMT, which can reduce the vocabulary of a given input corpus at any rate while also considering the morphological properties of the language. Our method is based on unsupervised morphology learning and can, in principle, be used for pre-processing any language pair. We also present an alternative word…
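The fixed-vocabulary bottleneck described in the abstract can be illustrated with a small sketch. This is not the authors' method: the toy corpus, the hand-written `SEGMENTS` table, and the helper names `build_vocab`, `unk_rate`, and `segment` are all assumptions made for illustration. It only shows why splitting inflected Turkish words into a stem plus suffixes lets the same vocabulary budget cover more of the text.

```python
from collections import Counter

# Toy Turkish corpus (illustrative only; not taken from the paper's data).
corpus = (
    "ev evler evde evlerde "
    "kitap kitaplar kitapta kitaplarda "
    "okul okullar okulda okullarda"
).split()

# Hand-written stem/suffix splits, assumed for illustration.
# A real pipeline would obtain these from a morphological segmenter.
SEGMENTS = {
    "ev": ["ev"],
    "evler": ["ev", "+ler"],
    "evde": ["ev", "+de"],
    "evlerde": ["ev", "+ler", "+de"],
    "kitap": ["kitap"],
    "kitaplar": ["kitap", "+lar"],
    "kitapta": ["kitap", "+ta"],
    "kitaplarda": ["kitap", "+lar", "+da"],
    "okul": ["okul"],
    "okullar": ["okul", "+lar"],
    "okulda": ["okul", "+da"],
    "okullarda": ["okul", "+lar", "+da"],
}


def build_vocab(tokens, max_size):
    """Keep the max_size most frequent token types; the rest become unknown."""
    return {tok for tok, _ in Counter(tokens).most_common(max_size)}


def unk_rate(tokens, vocab):
    """Fraction of running tokens that fall outside the vocabulary."""
    return sum(tok not in vocab for tok in tokens) / len(tokens)


def segment(tokens):
    """Replace each word by its stem and suffix pieces."""
    return [piece for tok in tokens for piece in SEGMENTS[tok]]


BUDGET = 8  # hypothetical vocabulary size limit

segment_tokens = segment(corpus)
word_unk = unk_rate(corpus, build_vocab(corpus, BUDGET))
segment_unk = unk_rate(segment_tokens, build_vocab(segment_tokens, BUDGET))

print(f"word types: {len(set(corpus))}, unknown rate: {word_unk:.2f}")
print(f"segment types: {len(set(segment_tokens))}, unknown rate: {segment_unk:.2f}")
```

In this toy setting the twelve word forms do not fit into a budget of eight entries, while the eight stem and suffix types do. The trade-off is a longer input sequence (24 pieces instead of 12 words).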
