Learning to Describe Differences Between Pairs of Similar Images

Harsh Jhamtani; Taylor Berg-Kirkpatrick

arXiv:1808.10584·cs.CL·September 3, 2018

TL;DR

This paper introduces the task of automatically describing, in text, the differences between two similar images, together with a crowd-sourced dataset and a model for the task.

Contribution

The paper presents a new dataset of difference descriptions for pairs of video-surveillance frames, and a model that captures visual salience by using a latent variable to align clusters of differing pixels with output sentences.

Findings

01

Proposed model outperforms attention-only models in single-sentence generation.

02

The dataset supports work on models that align language and vision, and may serve as a benchmark for coherent multi-sentence generation.

03

A first-pass visual analysis exposes clusters of differing pixels, which serve as a proxy for object-level differences.

Abstract

In this paper, we introduce the task of automatically generating text to describe the differences between two similar images. We collect a new dataset by crowd-sourcing difference descriptions for pairs of image frames extracted from video-surveillance footage. Annotators were asked to succinctly describe all the differences in a short paragraph. As a result, our novel dataset provides an opportunity to explore models that align language and vision, and capture visual salience. The dataset may also be a useful benchmark for coherent multi-sentence generation. We perform a first-pass visual analysis that exposes clusters of differing pixels as a proxy for object-level differences. We propose a model that captures visual salience by using a latent variable to align clusters of differing pixels with output sentences. We find that, for both single-sentence generation and as well as…
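The first-pass visual analysis mentioned in the abstract can be pictured with a small sketch. The abstract does not specify how pixels are compared or grouped, so everything here is an assumption made for illustration: the grayscale input, the `threshold` value, the use of 4-connected components as the grouping rule, and the helper names `diff_mask` and `pixel_clusters`. It is not the authors' pipeline.

```python
from collections import deque


def diff_mask(img_a, img_b, threshold=30):
    """Mark pixels whose absolute grayscale difference exceeds `threshold`.

    `threshold` is an illustrative value, not one taken from the paper.
    """
    return [
        [abs(a - b) > threshold for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(img_a, img_b)
    ]


def pixel_clusters(mask):
    """Group differing pixels into 4-connected clusters with a BFS.

    Connected components stand in for whatever clustering the authors use.
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    clusters = []
    for r in range(height):
        for c in range(width):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue = deque([(r, c)])
            cluster = []
            while queue:
                y, x = queue.popleft()
                cluster.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < height and 0 <= nx < width
                    if inside and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            clusters.append(cluster)
    return clusters


# Toy 4x5 grayscale "frames": two separate regions change between them.
before = [[10] * 5 for _ in range(4)]
after = [row[:] for row in before]
after[0][0] = 200
after[0][1] = 200
after[3][4] = 120

clusters = pixel_clusters(diff_mask(before, after))
print(len(clusters))  # 2
```

Each cluster is a candidate object-level change; in the proposed model, a latent variable would then decide which cluster a given output sentence describes.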

Code & Models

Repositories

harsh19/spot-the-diff
Official
