Show, Tell and Discriminate: Image Captioning by Self-retrieval with Partially Labeled Data

Xihui Liu; Hongsheng Li; Jing Shao; Dapeng Chen; Xiaogang Wang

arXiv:1803.08314·cs.CV·July 24, 2018·31 cites

TL;DR

This paper introduces a self-retrieval guided image captioning framework that enhances discriminativeness and leverages unlabeled data, improving caption quality without requiring additional annotations.

Contribution

It proposes a novel retrieval-guided training method that promotes discriminative caption generation and utilizes unlabeled images for improved performance.

Findings

01. Outperforms existing methods on COCO and Flickr30k datasets.

02. Generates more discriminative and unique captions.

03. Effectively leverages unlabeled images without extra annotations.

Abstract

The aim of image captioning is to generate captions by machine to describe image contents. Despite many efforts, generating discriminative captions for images remains non-trivial. Most traditional approaches imitate language structure patterns and thus tend to fall into the stereotype of replicating frequent phrases or sentences, neglecting the unique aspects of each image. In this work, we propose an image captioning framework with a self-retrieval module as training guidance, which encourages generating discriminative captions. It brings unique advantages: (1) The self-retrieval guidance can act as a metric and an evaluator of caption discriminativeness to assure the quality of generated captions. (2) The correspondence between generated captions and images is naturally incorporated in the generation process without human annotations, and hence our approach could utilize a large amount of…
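The idea of self-retrieval as a discriminativeness signal can be illustrated with a small sketch. The abstract does not specify the loss, so everything here is an assumption made for illustration: the function names `cosine` and `self_retrieval_reward`, the cosine similarity, the hinge on the hardest distractor, and the `margin` value are hypothetical and are not taken from the paper. The sketch only shows why a generic caption, which matches many images equally well, would score lower than a caption that retrieves its own image.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def self_retrieval_reward(caption_vec, image_vecs, target_idx, margin=0.2):
    """Hypothetical reward: 0.0 when the caption matches its own image better
    than every distractor image by at least `margin`, negative otherwise.

    Assumes caption and image embeddings already live in a shared space and
    that `image_vecs` contains at least one distractor besides the target.
    """
    pos = cosine(caption_vec, image_vecs[target_idx])
    hardest_neg = max(
        cosine(caption_vec, v)
        for i, v in enumerate(image_vecs)
        if i != target_idx
    )
    # Hinge on the hardest distractor (an assumed choice, not the paper's stated loss).
    return min(0.0, pos - hardest_neg - margin)


# Toy embeddings: two images along orthogonal directions.
images = [[1.0, 0.0], [0.0, 1.0]]
generic_caption = [1.0, 1.0]   # equally similar to both images
specific_caption = [1.0, 0.0]  # aligned with image 0 only

generic_reward = self_retrieval_reward(generic_caption, images, target_idx=0)
specific_reward = self_retrieval_reward(specific_caption, images, target_idx=0)
```

In this toy setting the generic caption is penalized because it cannot separate its own image from the distractor, while the specific caption incurs no penalty. Because the score needs only an image and a generated caption, it can in principle be computed for images that have no human-written captions, which is the property the abstract points to.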

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization