Multimodal Attention for Neural Machine Translation

Ozan Caglayan; Loïc Barrault; Fethi Bougares

arXiv:1609.03976·cs.CL·September 14, 2016·37 cites

TL;DR

This paper explores a multimodal attention mechanism that attends jointly to an image and its source-language description for neural machine translation, reporting gains of up to 1.6 BLEU and METEOR points over a text-only baseline.

Contribution

It introduces a novel multimodal attention mechanism for NMT that leverages both visual and textual information simultaneously.

Findings

01

Gains of up to 1.6 points in BLEU and METEOR over a text-only NMT baseline

02

Dedicated attention per modality enhances translation quality

03

Effective integration of image and text modalities in NMT

Abstract

The attention mechanism is an important part of neural machine translation (NMT), where it has been reported to produce richer source representations than sequence-to-sequence models with fixed-length encoding. Recently, the effectiveness of attention has also been explored in the context of image captioning. In this work, we assess the feasibility of a multimodal attention mechanism that simultaneously focuses on an image and its natural language description to generate a description in another language. We train several variants of our proposed attention mechanism on the Multi30k multilingual image captioning dataset. We show that a dedicated attention for each modality achieves gains of up to 1.6 points in BLEU and METEOR over a textual NMT baseline.
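The idea of a dedicated attention for each modality can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes additive (Bahdanau-style) attention for both modalities, assumes the two context vectors are fused by summation, and assumes text and image annotations have already been projected to the same dimensionality. All names, shapes, and the number of image regions are hypothetical.

```python
import numpy as np

def softmax(x):
    # Numerically stable softmax over a 1-D score vector.
    e = np.exp(x - x.max())
    return e / e.sum()

def additive_attention(query, annotations, W_q, W_a, v):
    """Additive attention (assumed form, not taken from the paper).

    query:       (d_q,)      decoder state
    annotations: (n, d_ann)  one row per source word or image region
    Returns the context vector (d_ann,) and the attention weights (n,).
    """
    # score[i] = v . tanh(query W_q + annotations[i] W_a)
    scores = np.tanh(query @ W_q + annotations @ W_a) @ v
    weights = softmax(scores)
    context = weights @ annotations
    return context, weights

def multimodal_context(query, text_ann, img_ann, text_params, img_params):
    # Each modality has its own attention parameters (dedicated attention).
    c_txt, a_txt = additive_attention(query, text_ann, *text_params)
    c_img, a_img = additive_attention(query, img_ann, *img_params)
    # Assumption: fuse by summation, which needs equal context sizes.
    return c_txt + c_img, a_txt, a_img

# Hypothetical sizes: 5 source words, 196 image regions.
rng = np.random.default_rng(0)
d_q, d_att, d_ann = 8, 6, 10

def init_params():
    # Random parameters stand in for trained weights.
    return (rng.normal(size=(d_q, d_att)),
            rng.normal(size=(d_ann, d_att)),
            rng.normal(size=d_att))

query = rng.normal(size=d_q)
text_ann = rng.normal(size=(5, d_ann))
img_ann = rng.normal(size=(196, d_ann))
context, a_txt, a_img = multimodal_context(
    query, text_ann, img_ann, init_params(), init_params())
```

In this sketch, `a_txt` and `a_img` are separate probability distributions over source words and image regions, so the decoder can weight the two modalities independently at every step. Concatenating the two context vectors instead of summing them would be an equally plausible fusion choice under these assumptions.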

Code & Models

Repositories

lium-lst/nmtpy

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling