Incorporating Global Visual Features into Attention-Based Neural Machine Translation

Iacer Calixto; Qun Liu; Nick Campbell

arXiv:1701.06521 · cs.CL · January 24, 2017 · 28 cites

TL;DR

This paper presents multi-modal attention-based neural machine translation models that incorporate global visual features in various ways, achieving state-of-the-art results and surpassing traditional statistical models on the Multi30k dataset.

Contribution

It introduces novel strategies for integrating global image features into neural translation models and demonstrates their effectiveness with new state-of-the-art results.

Findings

01. Global visual features improve translation quality.

02. Multi-modal models outperform phrase-based statistical models.

03. Synthetic multi-modal data enhances model performance.

Abstract

We introduce multi-modal, attention-based neural machine translation (NMT) models which incorporate visual features into different parts of both the encoder and the decoder. We utilise global image features extracted using a pre-trained convolutional neural network and incorporate them (i) as words in the source sentence, (ii) to initialise the encoder hidden state, and (iii) as additional data to initialise the decoder hidden state. In our experiments, we evaluate how these different strategies to incorporate global image features compare and which ones perform best. We also study the impact that adding synthetic multi-modal, multilingual data brings and find that the additional data have a positive impact on multi-modal models. We report new state-of-the-art results and our best models also significantly improve on a comparable phrase-based Statistical MT (PBSMT) model trained on the…
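
The three integration points named in the abstract can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names, the tanh projection, the choice to prepend the image "word", and the mean-pooling of encoder states are all assumptions made here for illustration, and the dimensions are arbitrary.

```python
import math
import random


def init_params(out_dim, in_dim, rng):
    """Small random weight matrix (list of rows) and zero bias; illustrative only."""
    W = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    b = [0.0] * out_dim
    return W, b


def project(x, W, b):
    """Compute tanh(W x + b), mapping an image feature vector into the model's space."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]


def image_as_source_word(src_embeddings, img_feat, W, b):
    """(i) Treat the projected image vector as one extra source 'word'.

    Prepending (rather than appending) is an assumption of this sketch.
    """
    return [project(img_feat, W, b)] + list(src_embeddings)


def init_encoder_state(img_feat, W, b):
    """(ii) Use the projected image vector as the encoder's initial hidden state."""
    return project(img_feat, W, b)


def init_decoder_state(enc_states, img_feat, W_enc, W_img, b):
    """(iii) Combine a summary of the encoder states with the image vector
    to form the decoder's initial hidden state.

    Mean-pooling the encoder states is an assumption of this sketch.
    """
    n = len(enc_states)
    mean_enc = [sum(col) / n for col in zip(*enc_states)]
    pre = [sum(w * h for w, h in zip(row_e, mean_enc))
           + sum(w * v for w, v in zip(row_i, img_feat)) + bi
           for row_e, row_i, bi in zip(W_enc, W_img, b)]
    return [math.tanh(p) for p in pre]
```

In each case the global image vector passes through a learned projection so that it matches the dimensionality of the slot it fills: the word-embedding size for (i), and the encoder or decoder hidden size for (ii) and (iii).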

Taxonomy

Topics: Natural Language Processing Techniques · Multimodal Machine Learning Applications · Topic Modeling