Unpaired Cross-lingual Image Caption Generation with Self-Supervised Rewards

Yuqing Song; Shizhe Chen; Yida Zhao; Qin Jin

arXiv:1908.05407 · cs.CV · August 16, 2019 · 5 cites

Open Access

TL;DR

This paper introduces a self-supervised reinforcement learning approach for unpaired cross-lingual image captioning that improves fluency and relevance without relying on pivot language translation.

Contribution

It proposes a novel self-supervised reward mechanism using monolingual data and visual semantic matching to enhance caption quality in unpaired cross-lingual settings.

Findings

01. Significant performance improvements over state-of-the-art methods.

02. Effective fluency and relevance enhancement without pivot translation.

03. Successful application to English and Chinese captioning tasks.

Abstract

Generating image descriptions in different languages is essential to satisfy users worldwide. However, it is prohibitively expensive to collect large-scale paired image-caption datasets for every target language, and such datasets are critical for training decent image captioning models. Previous works tackle the unpaired cross-lingual image captioning problem through a pivot language, relying on paired image-caption data in the pivot language and on pivot-to-target machine translation models. However, such language-pivoted approaches suffer from inaccuracies introduced by the pivot-to-target translation, including disfluency and visual irrelevancy errors. In this paper, we propose to generate cross-lingual image captions with self-supervised rewards in the reinforcement learning framework to alleviate these two types of errors. We employ self-supervision from mono-lingual corpus in the target…
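
The abstract is truncated before the rewards are defined, so the following is only a minimal sketch of how a fluency score and a visual relevance score might be combined into one reward and used in a REINFORCE-style update. The function names, the equal weights, and the scalar baseline are all assumptions made for illustration; they are not taken from the paper.

```python
import math


def combined_reward(fluency, relevance, w_fluency=0.5, w_relevance=0.5):
    """Weighted sum of two scalar rewards.

    Assumption: both scores are already computed elsewhere (for example by a
    target-language language model and an image-text matching model). The
    weights are hypothetical, not values from the paper.
    """
    return w_fluency * fluency + w_relevance * relevance


def reinforce_loss(token_log_probs, reward, baseline=0.0):
    """REINFORCE loss for one sampled caption.

    token_log_probs: log-probabilities of the sampled tokens under the model.
    reward: scalar reward for the whole caption.
    baseline: scalar subtracted from the reward to reduce variance
              (a hypothetical choice here).
    """
    advantage = reward - baseline
    # Minimizing this loss raises the likelihood of captions whose reward
    # exceeds the baseline and lowers it for captions below the baseline.
    return -advantage * sum(token_log_probs)


# Usage: a two-token caption, each token sampled with probability 0.5.
log_probs = [math.log(0.5), math.log(0.5)]
reward = combined_reward(fluency=1.0, relevance=1.0)
loss = reinforce_loss(log_probs, reward, baseline=0.5)
```

In this sketch a caption that is fluent but visually irrelevant (or the reverse) receives only part of the maximum reward, which is one simple way to penalize both error types named in the abstract.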

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning