Towards Diverse and Natural Image Descriptions via a Conditional GAN

Bo Dai; Sanja Fidler; Raquel Urtasun; Dahua Lin

arXiv:1703.06029 · cs.CV · August 14, 2017 · 108 cites

TL;DR

This paper introduces a Conditional GAN framework for image captioning that enhances the diversity and naturalness of generated descriptions, overcoming limitations of traditional likelihood-based models.

Contribution

It proposes a novel CGAN-based approach with reinforcement learning to generate more diverse and natural image descriptions, addressing the rigidity of existing methods.

Findings

01. Outperforms existing methods on large datasets

02. Achieves human-level performance in user studies

03. Produces more diverse and natural captions

Abstract

Despite the substantial progress in recent years, the image captioning techniques are still far from being perfect. Sentences produced by existing methods, e.g. those based on RNNs, are often overly rigid and lacking in variability. This issue is related to a learning principle widely used in practice, that is, to maximize the likelihood of training samples. This principle encourages high resemblance to the "ground-truth" captions while suppressing other reasonable descriptions. Conventional evaluation metrics, e.g. BLEU and METEOR, also favor such restrictive methods. In this paper, we explore an alternative approach, with the aim to improve the naturalness and diversity -- two essential properties of human expression. Specifically, we propose a new framework based on Conditional Generative Adversarial Networks (CGAN), which jointly learns a generator to produce descriptions conditioned…

Code & Models

Repositories

doubledaibo/gancaption_iccv2017

Videos

Towards Diverse and Natural Image Descriptions via a Conditional GAN · YouTube

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Human Pose and Action Recognition