Does Multimodality Help Human and Machine for Translation and Image Captioning?

Ozan Caglayan; Walid Aransa; Yaxing Wang; Marc Masana; Mercedes García-Martínez; Fethi Bougares; Loïc Barrault; Joost van de Weijer

arXiv:1605.09186 · cs.CL · August 17, 2016

TL;DR

This paper describes the LIUM and CVC systems for the WMT16 Multimodal Machine Translation challenge. It compares phrase-based and attentional neural systems trained on monomodal or multimodal data, using both automatic metrics (BLEU and METEOR) and a human evaluation of how useful multimodal data is for translation and image description generation.

Contribution

It introduces and compares multimodal and monomodal systems for translation and captioning, showing the benefits of multimodal data through comprehensive evaluation.

Findings

01

Multimodal systems outperform monomodal ones in BLEU and METEOR scores.

02

Human evaluation indicates multimodal data enhances translation and captioning quality.

03

The best results were achieved by systems using multimodal data.

Abstract

This paper presents the systems developed by LIUM and CVC for the WMT16 Multimodal Machine Translation challenge. We explored various comparative methods, namely phrase-based systems and attentional recurrent neural network models trained using monomodal or multimodal data. We also performed a human evaluation in order to estimate the usefulness of multimodal data for human machine translation and image description generation. Our systems obtained the best results for both tasks according to the automatic evaluation metrics BLEU and METEOR.

Code & Models

Repositories

lium-lst/nmtpy

Taxonomy

Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Generative Adversarial Networks and Image Synthesis