Guided Alignment Training for Topic-Aware Neural Machine Translation
Wenhu Chen, Evgeny Matusov, Shahram Khadivi, Jan-Thorsten Peter

TL;DR
This paper introduces guided alignment training and metadata integration to improve neural machine translation quality on e-commerce texts, raising the BLEU score on a product title set from 18.6 to 21.3%.
Contribution
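The paper presents guided alignment training, a method that biases the NMT attention mechanism towards statistical word alignments, and shows that metadata such as topic or category information improves domain-specific translation when fed to the decoder as an additional signal.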
Findings
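With both features, the BLEU score on a product title set improved from 18.6 to 21.3%
Metadata such as topic or category information, used as an additional decoder signal, significantly improves translation quality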
Ensemble systems outperform phrase-based baselines by 2.1% BLEU
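
The metadata finding can be pictured with a minimal sketch. It assumes the category is encoded as a one-hot vector and concatenated to the embedding of the previous target word before it enters the decoder. The function names, and the choice of concatenation as the injection point, are illustrative assumptions and are not taken from the paper.

```python
def one_hot(index, size):
    """Return a one-hot list of length `size` with a 1.0 at position `index`."""
    if not 0 <= index < size:
        raise ValueError("topic index out of range")
    return [1.0 if i == index else 0.0 for i in range(size)]


def decoder_input(prev_word_embedding, topic_id, num_topics):
    """Append a one-hot topic vector to the previous-word embedding.

    Hypothetical helper: a real system might instead learn a dense topic
    embedding or inject the signal at a different decoder layer.
    """
    return list(prev_word_embedding) + one_hot(topic_id, num_topics)


# Example: a 2-dimensional word embedding and 3 product categories.
step_input = decoder_input([0.1, 0.2], topic_id=1, num_topics=3)
```

Because the topic vector is the same at every decoding step, the decoder sees the category alongside each word it generates, which is one plausible way for the signal to disambiguate word choices.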
Abstract
In this paper, we propose an effective way for biasing the attention mechanism of a sequence-to-sequence neural machine translation (NMT) model towards the well-studied statistical word alignment models. We show that our novel guided alignment training approach improves translation quality on real-life e-commerce texts consisting of product titles and descriptions, overcoming the problems posed by many unknown words and a large type/token ratio. We also show that meta-data associated with input texts such as topic or category information can significantly improve translation quality when used as an additional signal to the decoder part of the network. With both novel features, the BLEU score of the NMT system on a product title set improves from 18.6 to 21.3%. Even larger MT quality gains are obtained through domain adaptation of a general domain NMT system to e-commerce data. The…
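
To make the idea of biasing attention towards word alignments concrete, here is a minimal sketch of one way such a guided alignment loss could look. It assumes a cross-entropy penalty between a reference alignment matrix (for example, from a statistical aligner) and the attention weights, added to the usual translation loss with fixed weights. The function names, the weights, and the choice of cross-entropy are assumptions for illustration, not a statement of the authors' exact formulation.

```python
import math


def normalize_rows(alignment):
    """Turn a 0/1 alignment matrix (target x source) into row-wise distributions.

    Rows with no aligned source word stay all zeros and add no penalty.
    """
    normalized = []
    for row in alignment:
        total = sum(row)
        normalized.append([v / total if total else 0.0 for v in row])
    return normalized


def guided_alignment_loss(attention, alignment, eps=1e-9):
    """Cross-entropy between reference alignments and attention weights,
    averaged over target positions."""
    reference = normalize_rows(alignment)
    loss = 0.0
    for ref_row, att_row in zip(reference, attention):
        loss -= sum(r * math.log(a + eps) for r, a in zip(ref_row, att_row))
    return loss / len(attention)


def combined_loss(translation_loss, attention, alignment,
                  w_align=0.5, w_trans=0.5):
    """Weighted sum of translation loss and alignment loss (weights are assumed)."""
    return (w_trans * translation_loss
            + w_align * guided_alignment_loss(attention, alignment))


# Two target words, three source words; target word t aligns to source word t.
alignment = [[1, 0, 0], [0, 1, 0]]
sharp = [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05]]
uniform = [[1 / 3, 1 / 3, 1 / 3], [1 / 3, 1 / 3, 1 / 3]]
```

In this toy setup, attention that concentrates on the aligned source word incurs a smaller penalty than uniform attention, so minimizing the combined loss pushes the attention weights towards the reference alignment.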
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Biomedical Text Mining and Ontologies
