Monolingually Derived Phrase Scores for Phrase Based SMT Using Neural Networks Vector Representations
Amir Pouya Aghasadeghi, Mohadeseh Bastan

TL;DR
This paper introduces two phrase scoring features for phrase-based machine translation, derived mainly from monolingual data using neural network vector representations of words and sentences. The features recover most of the BLEU lost when phrase table probabilities are removed, and they raise BLEU further when combined with those probabilities.
Contribution
According to the authors, it is the first work to use these two neural network vector representation models (one for words, one for sentences) in an end-to-end phrase-based SMT system.
Findings
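Recovered more than 80% of the BLEU loss caused by removing phrase table probabilities
Combined with the phrase table probabilities, the new features improved BLEU by an absolute 0.74 points
Showed that neural vector representations of words and sentences can serve as phrase scoring features in phrase-based SMT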
Abstract
In this paper, we propose two new features for estimating phrase-based machine translation parameters from mainly monolingual data. Our method is based on two recently introduced neural network vector representation models for words and sentences. It is the first time that these models have been used in an end-to-end phrase-based machine translation system. Scores obtained from our method can recover more than 80% of the BLEU loss caused by removing phrase table probabilities. We also show that our features, combined with the phrase table probabilities, improve the BLEU score by an absolute 0.74 points.
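The abstract does not spell out how a phrase score is computed from the vector representations. As a rough illustration only, and not the authors' actual method, the sketch below scores a phrase pair by averaging word vectors on each side and taking the cosine similarity. It assumes that source and target embeddings already live in a shared space (for example after some learned mapping); the function names and the toy two-dimensional vectors are hypothetical.

```python
import math


def average_vector(words, embeddings):
    """Average the vectors of the words that have an embedding; None if none do."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def phrase_pair_score(src_phrase, tgt_phrase, src_emb, tgt_emb):
    """Hypothetical phrase-pair feature: cosine of the averaged word vectors.

    Assumes src_emb and tgt_emb map words into the same vector space.
    Returns 0.0 when either phrase has no known words.
    """
    src_vec = average_vector(src_phrase.split(), src_emb)
    tgt_vec = average_vector(tgt_phrase.split(), tgt_emb)
    if src_vec is None or tgt_vec is None:
        return 0.0
    return cosine(src_vec, tgt_vec)


# Toy embeddings, invented purely for illustration.
SRC_EMB = {"maison": [1.0, 0.0], "bleue": [0.0, 1.0]}
TGT_EMB = {"house": [1.0, 0.0], "blue": [0.0, 1.0], "car": [1.0, -1.0]}

good = phrase_pair_score("maison bleue", "blue house", SRC_EMB, TGT_EMB)
bad = phrase_pair_score("maison bleue", "car", SRC_EMB, TGT_EMB)
```

In a phrase-based system, a score of this kind would be attached to each phrase table entry as an additional feature and weighted during tuning alongside, or in place of, the usual phrase table probabilities.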
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
