VISALOGY: Answering Visual Analogy Questions
Fereshteh Sadeghi, C. Lawrence Zitnick, Ali Farhadi

TL;DR
This paper introduces a neural network approach for solving visual analogy questions by learning embeddings that capture image transformations, and presents a new dataset of analogy questions in natural images.
Contribution
It proposes a quadruple Siamese CNN architecture for learning visual analogies and provides the first dataset and initial results on natural images.
Findings
Successful learning of image transformations for analogy reasoning
First dataset of natural image analogy questions created
Initial results demonstrate feasibility of the approach
Abstract
In this paper, we study the problem of answering visual analogy questions. These questions take the form of image A is to image B as image C is to what. Answering these questions entails discovering the mapping from image A to image B and then extending the mapping to image C and searching for the image D such that the relation from A to B holds for C to D. We pose this problem as learning an embedding that encourages pairs of analogous images with similar transformations to be close together using convolutional neural networks with a quadruple Siamese architecture. We introduce a dataset of visual analogy questions in natural images, and show first results of its kind on solving analogy questions on natural images.
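The search step described in the abstract (find the image D whose relation to C matches the relation from A to B) can be illustrated with a minimal sketch. It assumes each image has already been mapped to an embedding vector by some trained network, and it assumes a transformation is represented as the difference between two embeddings and compared by cosine similarity. Neither assumption is stated in the abstract. The function names and the toy vectors are hypothetical and chosen only for illustration.

```python
import math


def transformation(src, dst):
    """Assumed representation of a transformation: the embedding difference dst - src."""
    return [d - s for s, d in zip(src, dst)]


def cosine_similarity(u, v):
    """Cosine similarity of two vectors; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def answer_analogy(emb_a, emb_b, emb_c, candidates):
    """Return the index of the candidate D whose C-to-D transformation
    is most similar to the A-to-B transformation."""
    target = transformation(emb_a, emb_b)
    scores = [
        cosine_similarity(target, transformation(emb_c, emb_d))
        for emb_d in candidates
    ]
    return max(range(len(candidates)), key=scores.__getitem__)


# Toy 2-D embeddings (made up): A -> B moves one unit along the first axis.
emb_a = [0.0, 0.0]
emb_b = [1.0, 0.0]
emb_c = [0.0, 2.0]
candidates = [
    [0.0, 3.0],   # moves along the second axis instead
    [1.0, 2.0],   # same shift as A -> B
    [-1.0, 2.0],  # opposite shift
]
best = answer_analogy(emb_a, emb_b, emb_c, candidates)
```

In this toy case the second candidate (index 1) is selected, because its offset from C matches the offset from A to B exactly. In the setting the abstract describes, the embeddings would come from the shared-weight branches of the quadruple Siamese network rather than from hand-written vectors.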
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Image and Video Retrieval Techniques · Video Analysis and Summarization
