Adversarial Semantic Alignment for Improved Image Captions
Pierre L. Dognin, Igor Melnyk, Youssef Mroueh, Jarret Ross, and Tom Sercu (IBM Research, USA)

TL;DR
This paper frames image captioning as conditional GAN training that enforces semantic alignment between images and captions. It compares two training methods, proposes a semantic evaluation score, and introduces a novel Out of Context (OOC) test set to assess captioning quality and generalization.
Contribution
It presents a context-aware LSTM captioner paired with a co-attentive discriminator, compares two training methods (SCST and Gumbel ST), and introduces new evaluation tools: a semantic score and the Out of Context (OOC) test set.
Findings
SCST training yields more stable gradients and better results than Gumbel ST.
The semantic score correlates well with human judgment.
SCST-based training improves performance on the OOC and MS-COCO benchmarks.
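SCST is a policy-gradient method: the reward of a sampled caption is compared against the reward of the model's own greedy decode, and that difference scales the log-probability of the sample. A minimal sketch of the per-caption surrogate loss follows, assuming the reward is a scalar discriminator score; the helper name scst_loss and its arguments are hypothetical, and this is not the authors' code. Because the reward enters only as a number, the discriminator never needs to be differentiated, which is consistent with the abstract's remark that SCST works without direct access to discriminator gradients.

```python
import math
from typing import Sequence


def scst_loss(sample_logprobs: Sequence[float],
              sample_reward: float,
              greedy_reward: float) -> float:
    """Self-critical surrogate loss for one sampled caption (hypothetical helper).

    sample_logprobs: log p(w_t | w_<t, image) for each token of the sampled caption.
    sample_reward:   scalar reward of the sampled caption (assumed: a discriminator score).
    greedy_reward:   scalar reward of the greedy-decoded caption, used as the baseline.
    """
    # Advantage: how much better the sample scored than the model's own greedy output.
    advantage = sample_reward - greedy_reward
    # REINFORCE surrogate: in an autodiff framework, differentiating this with the
    # advantage held constant yields -advantage * grad(sum_t log p(w_t)).
    # Only reward values are used, so no discriminator gradient is required.
    return -advantage * sum(sample_logprobs)


if __name__ == "__main__":
    logps = [math.log(0.5), math.log(0.25)]
    print(scst_loss(logps, sample_reward=0.8, greedy_reward=0.6))  # about 0.4159
```

Minimizing this loss raises the log-probability of samples that beat the greedy baseline and lowers it for samples that score worse; a sample that ties the baseline contributes no gradient.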
Abstract
In this paper we study image captioning as conditional GAN training, proposing both a context-aware LSTM captioner and a co-attentive discriminator that enforces semantic alignment between images and captions. We empirically examine the viability of two training methods, Self-critical Sequence Training (SCST) and Gumbel Straight-Through (ST), and demonstrate that SCST shows more stable gradient behavior and improved results over Gumbel ST, even without directly accessing discriminator gradients. We also address the problem of automatic evaluation for captioning models, introducing a new semantic score and showing its correlation with human judgement. As an evaluation paradigm, we argue that an important criterion for a captioner is the ability to generalize to compositions of objects that do not usually co-occur. To this end, we introduce a small captioned Out of Context (OOC)…
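Gumbel Straight-Through, the other method compared above, instead makes the discrete token choice differentiable: the forward pass emits a one-hot token, while the backward pass uses the gradient of a Gumbel-softmax relaxation, so the discriminator's gradient has to flow back into the captioner. The sketch below shows only the forward computation in plain Python (no autodiff library); the helper names, the temperature tau, and the idea of feeding the one-hot output to a discriminator are illustrative assumptions, not details taken from the paper.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def softmax(xs: Sequence[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def gumbel_softmax_st(logits: Sequence[float],
                      tau: float = 1.0,
                      rng: Optional[random.Random] = None) -> Tuple[List[float], List[float]]:
    """Forward pass of a Gumbel straight-through token sample (hypothetical helper).

    Returns (hard, soft): `hard` is the one-hot token passed forward (e.g. to a
    discriminator); `soft` is the relaxed sample whose gradient the backward pass uses.
    """
    rng = rng or random.Random()
    # Gumbel(0, 1) noise via -log(-log(U)); clamp U away from 0 to avoid log(0).
    gumbels = [-math.log(-math.log(max(rng.random(), 1e-12))) for _ in logits]
    soft = softmax([(l + g) / tau for l, g in zip(logits, gumbels)])
    k = max(range(len(soft)), key=soft.__getitem__)
    hard = [1.0 if i == k else 0.0 for i in range(len(soft))]
    # With autodiff one would return hard + soft - stop_gradient(soft): its value
    # equals `hard`, while gradients flow through `soft` (the straight-through trick).
    return hard, soft


if __name__ == "__main__":
    hard, soft = gumbel_softmax_st([2.0, 0.5, -1.0], tau=0.5, rng=random.Random(0))
    print(hard, [round(p, 3) for p in soft])
```

Lowering tau pushes `soft` toward the one-hot `hard` sample; the usual trade-off is that small temperatures yield higher-variance gradients.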
Taxonomy
Methods: Self-critical Sequence Training · Sigmoid Activation · Tanh Activation · Convolution · Long Short-Term Memory
