A Hybrid Morpheme-Word Representation for Machine Translation of Morphologically Rich Languages
Minh-Thang Luong, Preslav Nakov, Min-Yen Kan

TL;DR
This paper introduces a hybrid morpheme-word representation for statistical machine translation of morphologically rich languages. Morphemes are the basic unit of translation, word boundaries are respected throughout, and morpheme- and word-level models are combined to improve translation quality.
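To make the idea of "morphemes as units, word boundaries respected" concrete, here is a minimal sketch under stated assumptions. It assumes a common convention in which every non-final morpheme of a word carries a trailing `+` marker, so the morpheme stream can always be glued back into words. The segmentation table (`SEGMENTS`), the split of Finnish *taloissa* ("in houses") into `talo+ i+ ssa`, and the helper names are hypothetical and illustrative. A real system would obtain segments from an unsupervised segmenter such as Morfessor. The `respects_word_boundaries` check is one plausible reading of "word boundary-aware" extraction: a morpheme span is allowed only if it starts at a word start and ends at a word end. It is not necessarily the paper's exact rule.

```python
from typing import Dict, List

# Hypothetical segmentation table standing in for an unsupervised segmenter
# (e.g. Morfessor); the splits are illustrative only.
SEGMENTS: Dict[str, List[str]] = {
    "taloissa": ["talo", "i", "ssa"],
    "on": ["on"],
}


def segment(words: List[str], table: Dict[str, List[str]]) -> List[str]:
    """Split each word into morphemes; every non-final morpheme gets a
    trailing '+' so the word boundary survives in the morpheme stream."""
    out: List[str] = []
    for word in words:
        pieces = table.get(word, [word])  # unknown words stay whole
        out.extend(p + "+" for p in pieces[:-1])
        out.append(pieces[-1])
    return out


def desegment(morphemes: List[str]) -> List[str]:
    """Rebuild words: a trailing '+' means 'glue to the next morpheme'."""
    words: List[str] = []
    buffer = ""
    for m in morphemes:
        if m.endswith("+"):
            buffer += m[:-1]
        else:
            words.append(buffer + m)
            buffer = ""
    if buffer:  # dangling marker at the very end: keep the partial word
        words.append(buffer)
    return words


def respects_word_boundaries(morphemes: List[str], i: int, j: int) -> bool:
    """True if the span morphemes[i:j] starts at a word start and ends at a
    word end, i.e. it never cuts a word in half (assumed constraint)."""
    if not 0 <= i < j <= len(morphemes):
        return False
    starts_at_word = i == 0 or not morphemes[i - 1].endswith("+")
    ends_at_word = not morphemes[j - 1].endswith("+")
    return starts_at_word and ends_at_word
```

For example, `segment(["taloissa", "on"], SEGMENTS)` yields `["talo+", "i+", "ssa", "on"]`, and `desegment` maps that back to `["taloissa", "on"]`. The span covering `talo+ i+ ssa` passes the boundary check, but the span `i+ ssa` does not, because it begins mid-word.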
Contribution
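It presents a novel language-independent approach that extends the classic phrase-based model with word boundary-aware morpheme-level phrase extraction, minimum error-rate training of the morpheme-level translation model against word-level BLEU, and joint scoring with morpheme- and word-level language models, improving translation accuracy for morphologically complex languages.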
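Joint scoring with morpheme- and word-level language models can be illustrated as a log-linear combination: the morpheme-LM scores the hypothesis as produced, and the word-LM scores the same hypothesis after its morphemes are glued back into words. The following is a minimal sketch under loud assumptions. `BigramLM` is a toy add-one-smoothed bigram model, not the paper's LM. The weights `w_morph` and `w_word` are hypothetical placeholders for weights that would be tuned alongside the other features. Only complete hypotheses are scored, as in n-best reranking, so the partial words a real decoder must handle mid-search are ignored here.

```python
import math
from collections import Counter
from typing import Iterable, List


def desegment(morphemes: List[str]) -> List[str]:
    """Glue morphemes into words; a trailing '+' attaches to the next piece."""
    words, buffer = [], ""
    for m in morphemes:
        if m.endswith("+"):
            buffer += m[:-1]
        else:
            words.append(buffer + m)
            buffer = ""
    if buffer:
        words.append(buffer)
    return words


class BigramLM:
    """Toy add-one-smoothed bigram LM (a stand-in for a real n-gram LM)."""

    def __init__(self, sentences: Iterable[List[str]]):
        self.history: Counter = Counter()
        self.bigrams: Counter = Counter()
        vocab = set()
        for sent in sentences:
            toks = ["<s>"] + sent + ["</s>"]
            vocab.update(toks)
            self.history.update(toks[:-1])  # counts of each token as context
            self.bigrams.update(zip(toks[:-1], toks[1:]))
        self.vocab_size = len(vocab)

    def logprob(self, sent: List[str]) -> float:
        toks = ["<s>"] + sent + ["</s>"]
        total = 0.0
        for prev, cur in zip(toks[:-1], toks[1:]):
            num = self.bigrams[(prev, cur)] + 1
            den = self.history[prev] + self.vocab_size
            total += math.log(num / den)
        return total


def joint_score(morphemes: List[str], morph_lm: BigramLM, word_lm: BigramLM,
                w_morph: float = 0.5, w_word: float = 0.5) -> float:
    """Log-linear mix of a morpheme-LM score and a word-LM score computed on
    the desegmented hypothesis (weights are hypothetical placeholders)."""
    return (w_morph * morph_lm.logprob(morphemes)
            + w_word * word_lm.logprob(desegment(morphemes)))
```

The point of the word-LM term is that a morpheme sequence that looks locally plausible can still glue into a non-word. For instance, reordering `talo+ i+` into `i+ talo+` produces *italossa*, which the word-level model penalizes even when individual morpheme bigrams look acceptable.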
Findings
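Statistically significant BLEU improvements over the classic phrase-based model on English-to-Finnish translation (Europarl).
Improved translation quality confirmed by human judgments.
Further gains from combining the hybrid morpheme-word model with the classic word-level model.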
Abstract
We propose a language-independent approach for improving statistical machine translation for morphologically rich languages using a hybrid morpheme-word representation where the basic unit of translation is the morpheme, but word boundaries are respected at all stages of the translation process. Our model extends the classic phrase-based model by means of (1) word boundary-aware morpheme-level phrase extraction, (2) minimum error-rate training for a morpheme-level translation model using word-level BLEU, and (3) joint scoring with morpheme- and word-level language models. Further improvements are achieved by combining our model with the classic one. The evaluation on English to Finnish using Europarl (714K sentence pairs; 15.5M English words) shows statistically significant improvements over the classic model based on BLEU and human judgments.
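Point (2), tuning a morpheme-level model with word-level BLEU, means that hypotheses are produced as morpheme sequences but scored only after they are rebuilt into words, so the tuning objective matches the final word-level evaluation. Below is a minimal sketch of that scoring step, assuming the trailing-`+` morpheme convention, a single reference per sentence, and a plain (unsmoothed) corpus BLEU with clipped n-gram precisions up to 4-grams and a brevity penalty. `word_level_bleu` is a hypothetical helper name. The surrounding minimum error-rate training search over feature weights is not shown.

```python
import math
from collections import Counter
from typing import List, Sequence


def desegment(morphemes: List[str]) -> List[str]:
    """Glue morphemes into words; a trailing '+' attaches to the next piece."""
    words, buffer = [], ""
    for m in morphemes:
        if m.endswith("+"):
            buffer += m[:-1]
        else:
            words.append(buffer + m)
            buffer = ""
    if buffer:
        words.append(buffer)
    return words


def ngrams(tokens: List[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hyps: Sequence[List[str]], refs: Sequence[List[str]],
                max_n: int = 4) -> float:
    """Unsmoothed corpus BLEU with one reference per hypothesis."""
    matches, totals = [0] * max_n, [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            h, r = ngrams(hyp, n), ngrams(ref, n)
            matches[n - 1] += sum(min(c, r[g]) for g, c in h.items())  # clipped
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if min(matches) == 0:  # some n-gram order has no match: BLEU is zero
        return 0.0
    log_prec = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_prec)


def word_level_bleu(morph_hyps: Sequence[List[str]],
                    word_refs: Sequence[List[str]]) -> float:
    """Score morpheme-level hypotheses with BLEU computed over words."""
    return corpus_bleu([desegment(h) for h in morph_hyps], word_refs)
```

Scoring at the word level means a hypothesis gets credit only for correctly assembled words. A hypothesis with the right stems but a wrong or missing suffix earns no unigram match for that word, whereas morpheme-level BLEU would still reward the matching stem.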
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification
