Imagination improves Multimodal Translation

Desmond Elliott; Ákos Kádár

arXiv:1705.04350 · cs.CL · July 10, 2017 · 77 cites

Open Access

TL;DR

This paper introduces a multitask learning approach that splits multimodal translation into learning to translate and learning visually grounded representations, improving translation performance over the state of the art on the Multi30K dataset.

Contribution

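The paper presents a multitask framework that pairs an attention-based encoder-decoder for translation with an image representation prediction task for visual grounding. The approach remains effective when either task is trained on external data: MS COCO for image prediction, and News Commentary parallel text for translation.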

Findings

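01 Improved translation performance over the state of the art on the Multi30K dataset

02 Image prediction is equally effective when trained on the external MS COCO dataset

03 Translation performance improves when the translation model is trained on the external News Commentary parallel text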

Abstract

We decompose multimodal translation into two sub-tasks: learning to translate and learning visually grounded representations. In a multitask learning framework, translations are learned in an attention-based encoder-decoder, and grounded representations are learned through image representation prediction. Our approach improves translation performance compared to the state of the art on the Multi30K dataset. Furthermore, it is equally effective if we train the image prediction task on the external MS COCO dataset, and we find improvements if we train the translation model on the external News Commentary parallel text.
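
A minimal sketch of how the two sub-tasks described in the abstract might be combined into one training objective. The abstract does not specify these details, so everything below is an illustrative assumption rather than the authors' implementation: mean-pooling the shared encoder states, the linear projection into image-feature space, the cosine-based margin loss, the interpolation weight `w`, and all function names are hypothetical.

```python
import math


def mean_pool(states):
    """Average a list of equal-length encoder state vectors (assumed pooling)."""
    n = len(states)
    return [sum(column) / n for column in zip(*states)]


def project(vec, weights):
    """Linear map into image-feature space; `weights` holds one row per output dim."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def image_prediction_loss(predicted, target, contrast, margin=0.1):
    """Margin loss (assumed form): the predicted vector should be more similar
    to the true image vector than to each contrasting image vector."""
    positive = cosine(predicted, target)
    return sum(max(0.0, margin - positive + cosine(predicted, c)) for c in contrast)


def translation_loss(token_probs):
    """Negative log-likelihood of the reference tokens, given the probability
    the decoder assigned to each of them."""
    return -sum(math.log(p) for p in token_probs)


def multitask_loss(trans_loss, image_loss, w=0.5):
    """Interpolate the two task losses; `w` is a hypothetical weighting."""
    return w * trans_loss + (1.0 - w) * image_loss


# Toy usage: two encoder states, an identity projection, one contrasting image.
encoder_states = [[1.0, 0.0], [0.0, 1.0]]
predicted_image = project(mean_pool(encoder_states), [[1.0, 0.0], [0.0, 1.0]])
grounding = image_prediction_loss(predicted_image, [1.0, 1.0], [[1.0, -1.0]])
translation = translation_loss([0.5, 0.5])
total = multitask_loss(translation, grounding)
```

Because both losses depend on the same encoder states, gradients from the image prediction task would shape the representations the translation decoder attends over. Note that in this sketch images are needed only to compute the grounding loss, which is one way the image prediction task could be trained on a separate corpus such as MS COCO.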

Taxonomy

Topics: Natural Language Processing Techniques · Multimodal Machine Learning Applications · Topic Modeling