VISALOGY: Answering Visual Analogy Questions

Fereshteh Sadeghi; C. Lawrence Zitnick; Ali Farhadi

arXiv:1510.08973 · cs.CV · November 2, 2015 · 21 cites

TL;DR

This paper introduces a neural network approach to solve visual analogy questions by learning embeddings that capture image transformations, along with a new dataset for natural image analogies.

Contribution

It proposes a quadruple Siamese CNN architecture for learning visual analogies, introduces the first dataset of visual analogy questions on natural images, and reports initial results on it.
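
A minimal sketch of what a quadruple Siamese objective could look like. It rests on assumptions that this page does not confirm: the transformation between two images is represented as the difference of their embeddings, and training uses a margin-based contrastive loss. All four branches share one embedding function, stood in for here by a hypothetical `embed` callable in place of the CNN; `transformation` and `quadruple_loss` are illustrative names, not the authors' code.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def transformation(embed: Callable[[Sequence[float]], Vector], a, b) -> Vector:
    # Assumption: the A->B transformation is the embedding difference f(B) - f(A).
    ea, eb = embed(a), embed(b)
    return [y - x for x, y in zip(ea, eb)]


def quadruple_loss(embed, a, b, c, d, analogous: bool, margin: float = 1.0) -> float:
    # The same `embed` is applied to all four inputs: this weight sharing is
    # what makes the four branches "Siamese".
    t_ab = transformation(embed, a, b)
    t_cd = transformation(embed, c, d)
    dist = math.dist(t_ab, t_cd)
    if analogous:
        # Analogous quadruples: pull the two transformations together.
        return dist ** 2
    # Non-analogous quadruples: push them apart until they are `margin` away.
    return max(0.0, margin - dist) ** 2
```

With an identity `embed` on 2-D points, the quadruple (0,0):(1,0)::(2,2):(3,2) shares the same transformation and incurs zero loss.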

Findings

01

Successful learning of image transformations for analogical reasoning

02

First dataset of visual analogy questions on natural images

03

Initial results demonstrate feasibility of the approach

Abstract

In this paper, we study the problem of answering visual analogy questions. These questions take the form "image A is to image B as image C is to what?" Answering them entails discovering the mapping from image A to image B, extending that mapping to image C, and searching for the image D such that the relation from A to B also holds from C to D. We pose this problem as learning an embedding, using convolutional neural networks with a quadruple Siamese architecture, that encourages pairs of analogous images with similar transformations to be close together. We introduce a dataset of visual analogy questions in natural images and show the first results of their kind on solving analogy questions on natural images.
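
The answering procedure described in the abstract (find the A-to-B mapping, extend it to C, search for D) can be sketched as a ranking over a candidate pool. As above, this assumes a transformation is the embedding difference f(B) - f(A) and that candidates are compared by Euclidean distance between transformations; `embed`, `rank_candidates`, and `answer_analogy` are hypothetical names, not the paper's implementation.

```python
import math
from typing import Callable, List, Sequence, TypeVar

Image = TypeVar("Image")


def rank_candidates(
    embed: Callable[[Image], Sequence[float]],
    a: Image,
    b: Image,
    c: Image,
    candidates: List[Image],
) -> List[Image]:
    ea, eb, ec = embed(a), embed(b), embed(c)
    # Assumed representation of the A->B mapping: an embedding difference.
    t_ab = [y - x for x, y in zip(ea, eb)]

    def score(d: Image) -> float:
        # How far the C->D transformation is from the A->B transformation.
        ed = embed(d)
        t_cd = [y - x for x, y in zip(ec, ed)]
        return math.dist(t_ab, t_cd)

    # Best answer first: the candidate whose transformation from C matches A->B most closely.
    return sorted(candidates, key=score)


def answer_analogy(embed, a, b, c, candidates):
    return rank_candidates(embed, a, b, c, candidates)[0]
```

For example, with an identity `embed` on 2-D points, (0,0) is to (1,0) as (5,5) is to (6,5): among the candidates (5,6), (6,5), and (4,5), only (6,5) reproduces the shift of +1 along the first axis. Returning a full ranking rather than a single answer also makes it easy to report top-k retrieval accuracy.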

Taxonomy

Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization