Context-aware Captions from Context-agnostic Supervision

Ramakrishna Vedantam; Samy Bengio; Kevin Murphy; Devi Parikh; Gal Chechik

arXiv:1701.02870 · cs.CV · August 2, 2017 · 25 cites

TL;DR

This paper presents a novel inference method to generate context-aware image captions that distinguish similar concepts using only generic, context-agnostic training data, improving discriminative captioning performance.

Contribution

It introduces a joint inference technique combining a context-agnostic language model with a listener for discriminative captioning, applicable without specialized training data.

Findings

1. Outperforms baseline methods in discriminative captioning tasks
2. Effective in distinguishing closely related visual concepts
3. Improves justification accuracy for fine-grained categories

Abstract

We introduce an inference technique to produce discriminative context-aware image captions (captions that describe differences between images or visual concepts) using only generic context-agnostic training data (captions that describe a concept or an image in isolation). For example, given images and captions of "siamese cat" and "tiger cat", we generate language that describes the "siamese cat" in a way that distinguishes it from "tiger cat". Our key novelty is that we show how to do joint inference over a language model that is context-agnostic and a listener which distinguishes closely-related concepts. We first apply our technique to a justification task, namely to describe why an image contains a particular fine-grained category as opposed to another closely-related category of the CUB-200-2011 dataset. We then study discriminative image captioning to generate language that…
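
The joint inference idea in the abstract can be illustrated with a toy re-ranking sketch. It assumes a single context-agnostic model that assigns a log-probability to a caption given a concept, and it approximates the listener with a log-likelihood ratio between the target and the distractor concept. The function names, the weight `lam`, and all probabilities below are hypothetical and chosen for illustration only; this is not necessarily the exact formulation used in the paper.

```python
import math

def joint_score(logp_target, logp_distractor, lam=0.5):
    """Combine fluency and discriminativeness for one candidate caption.

    logp_target:     log p(caption | target concept), context-agnostic model
    logp_distractor: log p(caption | distractor concept), same model
    lam:             hypothetical trade-off weight in [0, 1] (an assumption)
    """
    speaker = logp_target                      # how well the caption fits the target
    listener = logp_target - logp_distractor   # how much it favors target over distractor
    return lam * speaker + (1.0 - lam) * listener

def pick_caption(candidates, lam=0.5):
    """Return the caption text with the highest joint score."""
    best = max(candidates, key=lambda c: joint_score(c[1], c[2], lam))
    return best[0]

# Made-up probabilities for a "siamese cat" target and a "tiger cat" distractor.
candidates = [
    # (caption, log p(caption | siamese cat), log p(caption | tiger cat))
    ("a cat with fur", math.log(0.30), math.log(0.30)),
    ("a cat with blue eyes and a dark face", math.log(0.10), math.log(0.01)),
]

generic = pick_caption(candidates, lam=1.0)         # speaker only
discriminative = pick_caption(candidates, lam=0.5)  # speaker plus listener
```

With `lam=1.0` only the speaker term counts, so the generic but likely caption wins. Lowering `lam` rewards captions that are much more probable under the target than under the distractor, which is the behavior the abstract describes for "siamese cat" versus "tiger cat".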

Code & Models

Repositories

ruotianluo/DiscCaptioning (PyTorch)

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Domain Adaptation and Few-Shot Learning