Resolving Out-of-Vocabulary Words with Bilingual Embeddings in Machine Translation

Pranava Swaroop Madhyastha; Cristina España-Bonet

arXiv:1608.01910 · cs.CL · August 8, 2016 · 1 citation

TL;DR

This paper introduces a bilingual embedding-based model that generates a probabilistic list of candidate translations for out-of-vocabulary source words, improving translation quality, especially on out-of-domain data.

Contribution

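It presents a log-bilinear softmax-based vocabulary expansion method that relies only on word embeddings trained on monolingual corpora and a small word-to-word bilingual dictionary, and feeds its output into a standard phrase-based statistical machine translation system.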

Findings

01. Achieved a 3.9 BLEU point improvement on out-of-domain data.

02. Demonstrated effectiveness of bilingual embeddings in reducing OOV errors.

03. Improved translation quality with minimal bilingual resources.

Abstract

Out-of-vocabulary words account for a large proportion of errors in machine translation systems, especially when the system is used on a different domain from the one on which it was trained. To alleviate the problem, we propose a log-bilinear softmax-based model for vocabulary expansion: given an out-of-vocabulary source word, the model generates a probabilistic list of possible translations in the target language. Our model uses only word embeddings trained on large unlabelled monolingual corpora and is trained on a fairly small word-to-word bilingual dictionary. We feed this probabilistic list into a standard phrase-based statistical machine translation system and obtain consistent improvements in translation quality on the English-Spanish language pair. In particular, we obtain an improvement of 3.9 BLEU points when tested over an out-of-domain test…
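As a rough illustration of what a log-bilinear softmax over a target vocabulary can look like, the sketch below scores each target word t against a source word s as s^T W t and normalizes the scores with a softmax, fitting W on dictionary pairs by plain gradient descent. This is not the authors' implementation: the function names (`translation_probs`, `train`, `top_k`), the learning rate, the number of epochs, and the assumption of unit-normalized embeddings are all hypothetical choices made for this example, and the paper's actual training procedure may differ.

```python
import numpy as np

def translation_probs(W, src_vec, tgt_matrix):
    """Softmax over the target vocabulary of the bilinear scores s^T W t."""
    scores = tgt_matrix @ (W.T @ src_vec)  # shape: (target vocabulary size,)
    scores = scores - scores.max()         # shift for numerical stability
    exp = np.exp(scores)
    return exp / exp.sum()

def train(src_vecs, tgt_matrix, pairs, epochs=200, lr=0.5):
    """Fit W on dictionary pairs (source_index, target_index).

    Assumption for this sketch: batch gradient descent on the average
    negative log-likelihood, with unit-normalized embeddings.
    """
    W = np.zeros((src_vecs.shape[1], tgt_matrix.shape[1]))
    for _ in range(epochs):
        grad = np.zeros_like(W)
        for s_idx, t_idx in pairs:
            s = src_vecs[s_idx]
            p = translation_probs(W, s, tgt_matrix)
            expected_t = p @ tgt_matrix  # model's expected target embedding
            # Gradient of -log p(t_gold | s) with respect to W.
            grad += np.outer(s, expected_t - tgt_matrix[t_idx])
        W -= lr * grad / len(pairs)
    return W

def top_k(W, src_vec, tgt_matrix, tgt_words, k=3):
    """Return the k most probable target words with their probabilities."""
    p = translation_probs(W, src_vec, tgt_matrix)
    best = np.argsort(-p)[:k]
    return [(tgt_words[i], float(p[i])) for i in best]

if __name__ == "__main__":
    # Toy random embeddings stand in for real monolingual embeddings.
    rng = np.random.default_rng(0)
    src = rng.normal(size=(6, 8))
    tgt = rng.normal(size=(6, 8))
    src /= np.linalg.norm(src, axis=1, keepdims=True)
    tgt /= np.linalg.norm(tgt, axis=1, keepdims=True)
    dictionary = [(i, i) for i in range(5)]  # word 5 is held out as the "OOV"
    W = train(src, tgt, dictionary)
    print(top_k(W, src[5], tgt, ["t0", "t1", "t2", "t3", "t4", "t5"]))
```

In a pipeline like the one the abstract describes, the ranked list returned by `top_k` would play the role of the probabilistic translation list handed to the phrase-based system; how that list is integrated is not covered by this sketch.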

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification