Generating Diverse and Accurate Visual Captions by Comparative Adversarial Learning
Dianqi Li, Qiuyuan Huang, Xiaodong He, Lei Zhang, Ming-Ting Sun

TL;DR
This paper introduces an adversarial learning framework that generates diverse, accurate, and discriminative image captions. It judges each caption by comparing a set of captions within an image-caption joint space, rather than scoring each caption against its image in isolation as conventional methods do.
Contribution
It proposes a comparative adversarial learning approach for caption generation that enhances diversity and discriminativeness compared to existing models.
Findings
Produces more discriminative captions across images
Generates diverse captions that better reflect image differences
Outperforms baseline models in accuracy and diversity
Abstract
We study how to generate captions that are not only accurate in describing an image but also discriminative across different images. The problem is both fundamental and interesting. Despite phenomenal research progress in the past several years, most machine-generated captions are expressed in a very monotonous and featureless format. While such captions are normally accurate, they often lack important characteristics of human language: distinctiveness for each caption and diversity across different images. To address this problem, we propose a novel conditional generative adversarial network for generating diverse captions across images. Instead of estimating the quality of a caption solely with respect to one image, the proposed comparative adversarial learning framework better assesses the quality of captions by comparing a set of captions within the image-caption joint space. By contrasting…
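The abstract's central idea is to score captions relative to one another in a shared image-caption space, not in isolation. The sketch below illustrates that idea under stated assumptions. It assumes the image and captions are already embedded as vectors in a common joint space. It also assumes the comparison is a softmax over scaled cosine similarities. The function names and the scaling factor `gamma` are hypothetical illustrations, not the paper's published implementation.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors in the (assumed) joint space."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def comparative_scores(
    image_vec: Sequence[float],
    caption_vecs: Sequence[Sequence[float]],
    gamma: float = 10.0,  # hypothetical temperature-like scale; larger = sharper
) -> List[float]:
    """Score every caption relative to the others in the set.

    Assumption: comparison is a softmax over scaled cosine similarities
    between the image embedding and each caption embedding, so the
    scores are non-negative and sum to 1 across the set.
    """
    sims = [gamma * cosine_similarity(image_vec, c) for c in caption_vecs]
    m = max(sims)  # subtract the max before exponentiating for numerical stability
    exps = [math.exp(s - m) for s in sims]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    image = [1.0, 0.0]
    captions = [[0.9, 0.1], [0.1, 0.9], [-1.0, 0.0]]  # toy embeddings
    print(comparative_scores(image, captions))
```

Because the scores are normalized over the set, a caption earns a high score only by being closer to the image than the other captions it is compared with. A high absolute similarity alone is not enough. That is the property a comparative assessment relies on to penalize generic captions.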
Taxonomy
Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Human Pose and Action Recognition
