UNISON: Unpaired Cross-lingual Image Captioning

Jiahui Gao; Yi Zhou; Philip L. H. Yu; Shafiq Joty; Jiuxiang Gu

arXiv:2010.01288 · cs.CL · February 8, 2022


TL;DR

This paper introduces UNISON, a novel unpaired cross-lingual image captioning approach that generates captions in a target language without requiring paired datasets, leveraging scene graph encoding and cross-modal feature mapping.

Contribution

The work presents a new unpaired cross-lingual image captioning method that does not rely on any caption corpus in either the source or the target language, which makes captioning easier to extend to new languages.

Findings

01. Effective on the Chinese image captioning task

02. Outperforms existing methods in experiments

03. Uses scene graph encoding and cross-modal feature mapping

Abstract

Image captioning has emerged as an interesting research field in recent years due to its broad application scenarios. The traditional paradigm of image captioning relies on paired image-caption datasets to train the model in a supervised manner. However, creating such paired datasets for every target language is prohibitively expensive, which hinders the extensibility of captioning technology and deprives a large part of the world population of its benefit. In this work, we present a novel unpaired cross-lingual method to generate image captions without relying on any caption corpus in the source or the target language. Specifically, our method consists of two phases: (i) a cross-lingual auto-encoding process, which utilizes a sentence parallel (bitext) corpus to learn the mapping from the source to the target language in the scene graph encoding space and decode sentences in the…
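Phase (i) learns a source-to-target mapping in a scene graph encoding space from parallel sentences. The toy sketch below illustrates only that general idea and is not UNISON's architecture: the mean-of-label-embeddings encoder, the 2-d token embeddings, the pinyin tokens, the linear map fitted by gradient descent on squared error, and the nearest-neighbour "decoding" step are all hypothetical stand-ins chosen for illustration.

```python
# Toy sketch, NOT the UNISON model: the encoder, embeddings, and linear map
# are hypothetical stand-ins illustrating how paired (bitext) scene graphs
# can be used to fit a mapping between two languages' encoding spaces.
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)
Vector = List[float]

# Hypothetical 2-d label embeddings. Each pinyin token's vector is its English
# counterpart with the two coordinates swapped, so an exact linear map exists.
EN: Dict[str, Vector] = {
    "man": [1.0, 0.0], "horse": [0.0, 1.0], "rides": [1.0, 1.0],
    "dog": [2.0, 0.0], "on": [0.0, 2.0], "grass": [1.0, 2.0],
}
ZH: Dict[str, Vector] = {
    "nanren": [0.0, 1.0], "ma": [1.0, 0.0], "qi": [1.0, 1.0],
    "gou": [0.0, 2.0], "zai": [2.0, 0.0], "caodi": [2.0, 1.0],
}


def encode_graph(triples: List[Triple], embed: Dict[str, Vector]) -> Vector:
    """Encode a scene graph as the mean embedding of all node and edge labels."""
    labels = [lab for triple in triples for lab in triple]
    dim = len(embed[labels[0]])
    return [sum(embed[lab][i] for lab in labels) / len(labels) for i in range(dim)]


def apply_map(W: List[Vector], v: Vector) -> Vector:
    """Matrix-vector product W @ v."""
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]


def fit_linear_map(src: List[Vector], tgt: List[Vector],
                   lr: float = 0.3, epochs: int = 1000) -> List[Vector]:
    """Fit W minimising mean ||W s - t||^2 over paired encodings (gradient descent)."""
    dim_in, dim_out, n = len(src[0]), len(tgt[0]), len(src)
    W = [[0.0] * dim_in for _ in range(dim_out)]
    for _ in range(epochs):
        grad = [[0.0] * dim_in for _ in range(dim_out)]
        for s, t in zip(src, tgt):
            pred = apply_map(W, s)
            for i in range(dim_out):
                err = pred[i] - t[i]
                for j in range(dim_in):
                    grad[i][j] += 2.0 * err * s[j] / n
        for i in range(dim_out):
            for j in range(dim_in):
                W[i][j] -= lr * grad[i][j]
    return W


def sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two encodings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


# Paired scene graphs, standing in for graphs parsed from a bitext corpus.
bitext = [
    ([("man", "rides", "horse")], [("nanren", "qi", "ma")]),
    ([("man", "on", "grass")], [("nanren", "zai", "caodi")]),
    ([("dog", "rides", "horse")], [("gou", "qi", "ma")]),
]
W = fit_linear_map([encode_graph(g, EN) for g, _ in bitext],
                   [encode_graph(g, ZH) for _, g in bitext])

# Map an English graph absent from the training pairs, then pick the closest
# candidate target-language graph (a crude stand-in for decoding).
query = encode_graph([("dog", "on", "grass")], EN)
mapped = apply_map(W, query)
candidates = [[("gou", "zai", "caodi")], [("nanren", "qi", "ma")], [("gou", "qi", "ma")]]
best = min(candidates, key=lambda g: sq_dist(mapped, encode_graph(g, ZH)))
print(best)  # [('gou', 'zai', 'caodi')]
```

Because averaging is linear, the target encodings here are an exact linear function of the source encodings, so the fitted W converges to the coordinate swap and generalizes to the unseen graph. Real sentence encoders offer no such guarantee, which is one reason a learned, non-linear encoder and mapping would be needed in practice.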


Videos

UNISON: Unpaired Cross-Lingual Image Captioning · Underline

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning