Adversarial Semantic Alignment for Improved Image Captions

Pierre L. Dognin, Igor Melnyk, Youssef Mroueh, Jarret Ross, and Tom Sercu (IBM Research, USA)

arXiv:1805.00063 · cs.LG · June 10, 2019

TL;DR

This paper introduces a semantic alignment approach using conditional GANs for image captioning, proposing new training methods, a semantic evaluation score, and a novel Out of Context test set to improve captioning quality and generalization.

Contribution

It presents a context-aware LSTM captioner with a co-attentive discriminator, compares training methods, and introduces new evaluation tools including a semantic score and Out of Context dataset.

Findings

01

SCST training yields more stable gradients and better results than Gumbel ST.

02

The semantic score correlates well with human judgment.

03

SCST-based training improves performance on OOC and MS-COCO benchmarks.

Abstract

In this paper we study image captioning as a conditional GAN training, proposing both a context-aware LSTM captioner and co-attentive discriminator, which enforces semantic alignment between images and captions. We empirically focus on the viability of two training methods: Self-critical Sequence Training (SCST) and Gumbel Straight-Through (ST) and demonstrate that SCST shows more stable gradient behavior and improved results over Gumbel ST, even without accessing discriminator gradients directly. We also address the problem of automatic evaluation for captioning models and introduce a new semantic score, and show its correlation to human judgement. As an evaluation paradigm, we argue that an important criterion for a captioner is the ability to generalize to compositions of objects that do not usually co-occur together. To this end, we introduce a small captioned Out of Context (OOC)…
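The remark that SCST works "even without accessing discriminator gradients directly" can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a toy captioner whose per-step token logits are independent of earlier tokens, and `toy_reward` is a hypothetical stand-in for a discriminator score. The estimator only needs the scalar reward of a sampled caption and of the greedy caption (the baseline), never a gradient through the reward.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def scst_logit_gradient(logits, reward_fn, rng):
    """Self-critical gradient of the loss with respect to the logits.

    logits: array of shape (T, V), one row of token logits per time step
            (toy assumption: steps do not depend on previous tokens).
    reward_fn: maps a token sequence to a scalar; treated as a black box.
    Returns (grad, advantage, sampled, greedy).
    """
    probs = softmax(logits)
    T, V = probs.shape
    # Sampled caption (exploration) and greedy caption (baseline).
    sampled = np.array([rng.choice(V, p=probs[t]) for t in range(T)])
    greedy = probs.argmax(axis=-1)
    # Only scalar rewards are used; reward_fn is never differentiated.
    advantage = reward_fn(sampled) - reward_fn(greedy)
    # loss = -advantage * sum_t log p_t(sampled_t)
    # d loss / d logits = advantage * (probs - onehot(sampled))
    onehot = np.eye(V)[sampled]
    grad = advantage * (probs - onehot)
    return grad, advantage, sampled, greedy


def toy_reward(tokens, target=np.array([1, 3, 0, 2, 4])):
    """Hypothetical reward: fraction of tokens matching a fixed target."""
    return float(np.mean(tokens == target))


rng = np.random.default_rng(0)
logits = rng.normal(size=(5, 7))
grad, advantage, sampled, greedy = scst_logit_gradient(logits, toy_reward, rng)
```

When the sampled caption scores higher than the greedy one, the advantage is positive and a gradient descent step raises the probability of the sampled tokens; when it scores lower, the step pushes them down. A Gumbel Straight-Through estimator would instead backpropagate through the reward model itself, which this sketch does not attempt.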

Taxonomy

Methods: Self-critical Sequence Training · Sigmoid Activation · Tanh Activation · Convolution · Long Short-Term Memory