Semi-Supervised Image Captioning by Adversarially Propagating Labeled Data

Dong-Jin Kim; Tae-Hyun Oh; Jinsoo Choi; In So Kweon

arXiv:2301.11174·cs.CV·January 27, 2023

TL;DR

This paper introduces a semi-supervised image captioning framework that leverages large unpaired image and caption datasets through adversarial learning to improve captioning performance, especially when paired data is scarce.

Contribution

The paper proposes a novel adversarial semi-supervised learning approach that associates unpaired image and caption data, enhancing image captioning models' generalization capabilities.

Findings

01. Significant performance improvements on image captioning benchmarks.

02. Effective handling of out-of-task and web-crawled unpaired data.

03. Theoretically well-founded, with favorable global optimality properties.

Abstract

We present a novel data-efficient semi-supervised framework to improve the generalization of image captioning models. Constructing a large-scale labeled image captioning dataset is expensive in terms of labor, time, and cost. In contrast to manually annotating all the training samples, separately collecting uni-modal datasets is immensely easier, e.g., a large-scale image dataset and a sentence dataset. We leverage such massive unpaired image and caption data on top of standard paired data by learning to associate them. To this end, our proposed semi-supervised learning method assigns pseudo-labels to unpaired samples in an adversarial learning fashion, where the joint distribution of image and caption is learned. Our method trains a captioner to learn from paired data and to progressively associate unpaired data. This approach shows noticeable performance improvement even in…
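
The pseudo-labeling step described in the abstract can be illustrated with a deliberately small sketch. This is not the paper's architecture: `dot_score` is a hypothetical stand-in for a learned scorer of image-caption pairs, the feature vectors are made up, and the `threshold` parameter is an assumption introduced here. In an adversarial setup such a scorer would be trained jointly with the captioner; the sketch keeps it fixed and shows only how unpaired images could be matched to unpaired captions once some scorer is available.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]


def dot_score(image: Vector, caption: Vector) -> float:
    """Hypothetical scorer: higher means the pair looks more like a real pair.

    A trained discriminator would go here; a dot product is only a placeholder.
    """
    return sum(i * c for i, c in zip(image, caption))


def assign_pseudo_labels(
    unpaired_images: List[Vector],
    unpaired_captions: List[Vector],
    score: Callable[[Vector, Vector], float] = dot_score,
    threshold: float = 0.0,  # assumed confidence cut-off, not from the paper
) -> List[Tuple[int, int, float]]:
    """Match each unpaired image to its best-scoring unpaired caption.

    Returns (image_index, caption_index, score) triples, skipping images
    whose best score does not exceed the threshold.
    """
    pseudo_pairs = []
    for img_idx, image in enumerate(unpaired_images):
        scores = [score(image, caption) for caption in unpaired_captions]
        best = max(range(len(scores)), key=scores.__getitem__)
        if scores[best] > threshold:
            pseudo_pairs.append((img_idx, best, scores[best]))
    return pseudo_pairs


# Toy features: two informative images and one all-zero image.
images = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
captions = [[0.1, 0.9], [0.8, 0.2]]
pairs = assign_pseudo_labels(images, captions)
for img_idx, cap_idx, value in pairs:
    print(f"image {img_idx} -> caption {cap_idx} (score {value:.2f})")
```

In this toy run the all-zero image receives no pseudo-label because no caption scores above the threshold, which mirrors the general idea of only propagating labels to unpaired samples the scorer is confident about.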

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques