Discriminability objective for training descriptive captions

Ruotian Luo; Brian Price; Scott Cohen; Gregory Shakhnarovich

arXiv:1803.04376 · cs.CV · June 12, 2018 · 72 cites

TL;DR

This paper introduces a discriminability-focused training objective for image captioning models, significantly enhancing their ability to produce captions that distinguish between images, while also improving traditional caption quality metrics.

Contribution

It proposes a novel loss component that directly optimizes for caption discriminability, applicable across various captioning models and loss functions.

Findings

01

Humans find the generated captions more discriminative.

02

Standard caption quality scores like BLEU and SPICE improve.

03

The method is modular and broadly applicable.

Abstract

One property that remains lacking in image captions generated by contemporary methods is discriminability: being able to tell two images apart given the caption for one of them. We propose a way to improve this aspect of caption generation. By incorporating into the captioning training objective a loss component directly related to the ability (of a machine) to disambiguate image/caption matches, we obtain systems that produce much more discriminative captions, according to human evaluation. Remarkably, our approach also leads to improvement in other aspects of generated captions, as reflected by a battery of standard scores such as BLEU, SPICE, etc. Our approach is modular and can be applied to a variety of model/loss combinations commonly proposed for image captioning.
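The abstract describes adding a discriminability term to the captioning objective without giving its exact form here. The following is a minimal sketch of how such a combined objective could look, assuming a hinge-based (max-margin) retrieval loss over image/caption similarity scores. The function names, the `margin` and `weight` parameters, and the choice of hinge loss are all illustrative assumptions, not the authors' implementation.

```python
from typing import List


def contrastive_loss(sim: List[List[float]], margin: float = 0.1) -> float:
    """Hinge-based retrieval loss over a square similarity matrix.

    Assumption: sim[i][j] is a similarity score between image i and the
    caption generated for image j, so matched pairs lie on the diagonal.
    """
    n = len(sim)
    if n < 2:
        # With a single pair there is nothing to discriminate against.
        return 0.0
    total = 0.0
    for i in range(n):
        pos = sim[i][i]
        # Hardest mismatched caption for image i (row i).
        hardest_caption = max(sim[i][j] for j in range(n) if j != i)
        # Hardest mismatched image for caption i (column i).
        hardest_image = max(sim[j][i] for j in range(n) if j != i)
        total += max(0.0, margin + hardest_caption - pos)
        total += max(0.0, margin + hardest_image - pos)
    return total / n


def combined_objective(
    caption_loss: float,
    sim: List[List[float]],
    weight: float = 1.0,
    margin: float = 0.1,
) -> float:
    """Captioning loss plus a weighted discriminability term (hypothetical)."""
    return caption_loss + weight * contrastive_loss(sim, margin)


if __name__ == "__main__":
    # Captions that match only their own image incur no penalty.
    distinct = [[1.0, 0.0], [0.0, 1.0]]
    # Captions that fit both images equally well are penalized.
    ambiguous = [[0.5, 0.5], [0.5, 0.5]]
    print(contrastive_loss(distinct))
    print(contrastive_loss(ambiguous))
    print(combined_objective(2.0, ambiguous, weight=0.5))
```

In this sketch the discriminability term is zero when every caption scores higher with its own image than with any other by at least the margin, and grows as captions become interchangeable across images. The `weight` parameter reflects the modularity the abstract mentions: the same term could in principle be added to whichever captioning loss a model already uses.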

Code & Models

Repositories

ruotianluo/DiscCaptioning (PyTorch, official)

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization