A Visual Attention Grounding Neural Model for Multimodal Machine Translation

Mingyang Zhou; Runxiang Cheng; Yong Jae Lee; Zhou Yu

arXiv:1808.08266·cs.CL·August 29, 2018

TL;DR

This paper presents a multimodal machine translation model that couples a visual attention grounding mechanism with a text translator. It reports competitive state-of-the-art results on the Multi30K and Ambiguous COCO datasets and introduces a new multilingual multimodal product description dataset.

Contribution

The paper introduces a visual attention grounding mechanism within a multimodal translation model and a new multilingual dataset for real-world applications.

Findings

01 Achieves competitive state-of-the-art results on the Multi30K and Ambiguous COCO datasets.

02 Outperforms other methods by a large margin on the new product description dataset.

03 Demonstrates that visual grounding improves translation quality.

Abstract

We introduce a novel multimodal machine translation model that utilizes parallel visual and textual information. Our model jointly optimizes the learning of a shared visual-language embedding and a translator. The model leverages a visual attention grounding mechanism that links the visual semantics with the corresponding textual semantics. Our approach achieves competitive state-of-the-art results on the Multi30K and the Ambiguous COCO datasets. We also collected a new multilingual multimodal product description dataset to simulate a real-world international online shopping scenario. On this dataset, our visual attention grounding model outperforms other methods by a large margin.
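The joint optimization described in the abstract can be pictured with a small sketch. This is not the authors' implementation: the abstract does not specify the attention form, the embedding loss, or how the two objectives are combined, so the dot-product attention, the hinge ranking loss, and the names `attend`, `grounding_loss`, `joint_loss`, `margin`, and `alpha` below are all illustrative assumptions.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cosine(a, b):
    na, nb = math.sqrt(dot(a, a)), math.sqrt(dot(b, b))
    return dot(a, b) / (na * nb) if na and nb else 0.0

def attend(image_vec, token_vecs):
    """Assumed grounding step: the image vector scores each source-token
    vector, and the weighted sum serves as the sentence embedding."""
    weights = softmax([dot(image_vec, t) for t in token_vecs])
    dim = len(token_vecs[0])
    sentence_vec = [sum(w * t[i] for w, t in zip(weights, token_vecs))
                    for i in range(dim)]
    return sentence_vec, weights

def grounding_loss(sentence_vecs, image_vecs, margin=0.1):
    """Assumed hinge ranking loss: a matched sentence/image pair should
    score higher than every mismatched pair by at least `margin`."""
    n = len(sentence_vecs)
    loss = 0.0
    for i in range(n):
        pos = cosine(sentence_vecs[i], image_vecs[i])
        for j in range(n):
            if j != i:
                loss += max(0.0, margin - pos + cosine(sentence_vecs[i], image_vecs[j]))
                loss += max(0.0, margin - pos + cosine(sentence_vecs[j], image_vecs[i]))
    return loss / n

def joint_loss(translation_loss, grounding, alpha=0.5):
    # Assumed convex combination of the translator and embedding objectives.
    return alpha * translation_loss + (1.0 - alpha) * grounding
```

In this sketch, `alpha` trades off translation quality against alignment in the shared embedding space; a value of 1.0 reduces the objective to text-only translation.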

Code & Models

Repositories

Eurus-Holmes/VAG-NMT (PyTorch)

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling