Multilingual Image Description with Neural Sequence Models
Desmond Elliott, Stella Frank, Eva Hasler

TL;DR
This paper introduces neural sequence models that generate an image description in a target language by conditioning on image features and on a description of the same image in a source language. Models trained over multiple languages substantially improve BLEU4 and Meteor scores over a monolingual baseline.
Contribution
It presents a multi-language image description model that combines techniques from neural machine translation and neural image description.
Findings
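Significant and substantial BLEU4 and Meteor improvements for models trained over multiple languages, compared to a monolingual baseline
Target-language descriptions can be conditioned on image features, on a source-language description, or on a multimodal vector computed over both
Experiments use the IAPR-TC12 dataset of images aligned with English and German sentences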
Abstract
In this paper we present an approach to multi-language image description bringing together insights from neural machine translation and neural image description. To create a description of an image for a given target language, our sequence generation models condition on feature vectors from the image, the description from the source language, and/or a multimodal vector computed over the image and a description in the source language. In image description experiments on the IAPR-TC12 dataset of images aligned with English and German sentences, we find significant and substantial improvements in BLEU4 and Meteor scores for models trained over multiple languages, compared to a monolingual baseline.
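To make the conditioning idea in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' architecture: the class name `ConditionedDecoder`, the choice to inject the conditioning vectors into the initial hidden state, the layer sizes, and the random untrained weights are all assumptions made for illustration. The image and source-description vectors are placeholders for features that real image and source-language models would produce.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def rand_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; a real model would learn these."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

class ConditionedDecoder:
    """Toy RNN decoder over target-language words (hypothetical, untrained)."""

    def __init__(self, vocab, hidden=8, img_dim=4, src_dim=6, seed=0):
        rng = random.Random(seed)
        self.vocab = vocab
        self.hidden = hidden
        self.W_img = rand_matrix(hidden, img_dim, rng)    # image features -> hidden
        self.W_src = rand_matrix(hidden, src_dim, rng)    # source description -> hidden
        self.W_hh = rand_matrix(hidden, hidden, rng)      # recurrent weights
        self.W_xh = rand_matrix(hidden, len(vocab), rng)  # input word -> hidden
        self.W_hy = rand_matrix(len(vocab), hidden, rng)  # hidden -> word scores

    def initial_state(self, image_vec=None, source_vec=None):
        """Sum the projections of whichever conditioning vectors are supplied."""
        h = [0.0] * self.hidden
        if image_vec is not None:
            h = [a + b for a, b in zip(h, matvec(self.W_img, image_vec))]
        if source_vec is not None:
            h = [a + b for a, b in zip(h, matvec(self.W_src, source_vec))]
        return [math.tanh(v) for v in h]

    def step(self, h, word_index):
        """One recurrent step: returns the next state and a word distribution."""
        one_hot = [1.0 if i == word_index else 0.0 for i in range(len(self.vocab))]
        pre = [a + b for a, b in zip(matvec(self.W_hh, h), matvec(self.W_xh, one_hot))]
        h_next = [math.tanh(v) for v in pre]
        return h_next, softmax(matvec(self.W_hy, h_next))

    def greedy_decode(self, image_vec=None, source_vec=None, max_len=5):
        """Pick the most probable word at each step until <eos> or max_len."""
        h = self.initial_state(image_vec, source_vec)
        word = self.vocab.index("<bos>")
        out = []
        for _ in range(max_len):
            h, probs = self.step(h, word)
            word = max(range(len(probs)), key=probs.__getitem__)
            if self.vocab[word] == "<eos>":
                break
            out.append(self.vocab[word])
        return out

# Placeholder inputs standing in for real image and source-language features.
vocab = ["<bos>", "<eos>", "ein", "Hund", "rennt"]
decoder = ConditionedDecoder(vocab)
image_vec = [0.2, -0.1, 0.5, 0.3]
source_vec = [0.1, 0.4, -0.2, 0.0, 0.3, -0.5]

h_img = decoder.initial_state(image_vec=image_vec)    # image only
h_both = decoder.initial_state(image_vec, source_vec)  # image + source description
words = decoder.greedy_decode(image_vec, source_vec)
```

Passing only `image_vec` corresponds loosely to a monolingual image description setup, while passing both vectors corresponds to the multi-language setting the abstract describes. Because the weights are random, the decoded words are meaningless; the sketch only shows where the conditioning information could enter the generator.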
Taxonomy
Topics: Multimodal Machine Learning Applications · Image Retrieval and Classification Techniques · Natural Language Processing Techniques
