Generating Diverse and Accurate Visual Captions by Comparative Adversarial Learning

Dianqi Li; Qiuyuan Huang; Xiaodong He; Lei Zhang; Ming-Ting Sun

arXiv:1804.00861 · cs.CV · March 12, 2019 · 43 cites

TL;DR

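This paper introduces a comparative adversarial learning framework, built on a conditional generative adversarial network, that generates diverse, accurate, and discriminative image captions. Rather than scoring each caption against its image alone, it compares a set of captions within the image-caption joint space.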

Contribution

It proposes a comparative adversarial learning approach for caption generation that enhances diversity and discriminativeness compared to existing models.

Findings

01. Produces more discriminative captions across images

02. Generates diverse captions that better reflect image differences

03. Outperforms baseline models in accuracy and diversity

Abstract

We study how to generate captions that are not only accurate in describing an image but also discriminative across different images. The problem is both fundamental and interesting, as most machine-generated captions, despite phenomenal research progress in the past several years, are expressed in a very monotonic and featureless format. While such captions are normally accurate, they often lack important characteristics of human language: distinctiveness for each caption and diversity for different images. To address this problem, we propose a novel conditional generative adversarial network for generating diverse captions across images. Instead of estimating the quality of a caption solely on one image, the proposed comparative adversarial learning framework better assesses the quality of captions by comparing a set of captions within the image-caption joint space. By contrasting…
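
The abstract as shown here is truncated and does not spell out the scoring rule, so the following is only a minimal sketch of the comparative idea, not the authors' implementation. It assumes the image and each candidate caption are already embedded as vectors in a shared space, that relevance is measured by cosine similarity, and that scores are normalized with a softmax over the compared set. The function names, the `temperature` parameter, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def comparative_scores(image_vec, caption_vecs, temperature=5.0):
    """Score each caption relative to the others in the same set.

    Assumption: a softmax over scaled cosine similarities stands in for
    "comparing a set of captions"; the paper's exact formula may differ.
    """
    sims = [temperature * cosine(image_vec, c) for c in caption_vecs]
    peak = max(sims)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]


# Toy embeddings (hypothetical): one image and three candidate captions.
image = [1.0, 0.0]
captions = [[0.9, 0.1], [0.5, 0.5], [0.0, 1.0]]
scores = comparative_scores(image, captions)
print(scores)
```

Because the scores sum to one across the set, a caption can only score highly by matching the image better than its competitors, which is the intuition behind judging captions comparatively rather than one image-caption pair at a time.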

Code & Models

Repositories

Anjaney1999/image-captioning-seqgan
pytorch

Taxonomy

Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Human Pose and Action Recognition