Understanding the computational demands underlying visual reasoning

Mohit Vaishnav; Remi Cadene; Andrea Alamia; Drew Linsley; Rufin VanRullen; Thomas Serre

arXiv:2108.03603·cs.CV·March 3, 2022

Open Access

TL;DR

This paper investigates the computational complexity of visual reasoning by testing CNNs on a set of visual reasoning tasks, extending models with attention mechanisms, and proposing a taxonomy based on relation types and complexity.

Contribution

It introduces a taxonomy of visual reasoning tasks based on relation types and complexity, and demonstrates that attention-augmented CNNs improve performance on these tasks.

Findings

01 Attention mechanisms significantly improve CNN performance on complex visual reasoning tasks.

02 The taxonomy links task difficulty to relation type and number of relations.

03 Attention networks partially explain the proposed taxonomy.
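As a rough illustration of the two axes the taxonomy uses, relation type and number of relations, the toy sketch below contrasts a same-different judgment with a spatial-relation judgment and then composes them into a two-relation rule. Everything in it is an assumption made for illustration: the pixel-set shape representation and the names `normalize`, `same`, `left_of`, and `same_and_left_of` are hypothetical and are not taken from the paper or from the SVRT generator.

```python
from typing import FrozenSet, Tuple

Pixel = Tuple[int, int]        # (row, col); hypothetical toy representation
Shape = FrozenSet[Pixel]       # a shape is the set of pixels it covers


def normalize(shape: Shape) -> Shape:
    """Translate a shape so that its bounding box starts at (0, 0)."""
    min_row = min(r for r, _ in shape)
    min_col = min(c for _, c in shape)
    return frozenset((r - min_row, c - min_col) for r, c in shape)


def same(a: Shape, b: Shape) -> bool:
    """Same-different judgment: are the shapes identical up to translation?"""
    return normalize(a) == normalize(b)


def left_of(a: Shape, b: Shape) -> bool:
    """Spatial-relation judgment: does a lie entirely to the left of b?"""
    return max(c for _, c in a) < min(c for _, c in b)


def same_and_left_of(a: Shape, b: Shape) -> bool:
    """A rule composed of two relations, one of each type."""
    return same(a, b) and left_of(a, b)


# Toy shapes: an L-shape, a translated copy of it, and a horizontal bar.
l_shape = frozenset({(0, 0), (1, 0), (1, 1)})
shifted_l = frozenset({(5, 4), (6, 4), (6, 5)})
bar = frozenset({(0, 7), (0, 8), (0, 9)})

print(same(l_shape, shifted_l))          # same-different only
print(left_of(l_shape, bar))             # spatial relation only
print(same_and_left_of(l_shape, bar))    # two relations composed
```

The sketch only shows what the rules ask of a solver. It says nothing about how hard each rule is for a CNN to learn, which is the question the paper addresses.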

Abstract

Visual understanding requires comprehending complex visual relations between objects within a scene. Here, we seek to characterize the computational demands for abstract visual reasoning. We do this by systematically assessing the ability of modern deep convolutional neural networks (CNNs) to learn to solve the "Synthetic Visual Reasoning Test" (SVRT) challenge, a collection of twenty-three visual reasoning problems. Our analysis reveals a novel taxonomy of visual reasoning tasks, which can be primarily explained by both the type of relations (same-different vs. spatial-relation judgments) and the number of relations used to compose the underlying rules. Prior cognitive neuroscience work suggests that attention plays a key role in humans' visual reasoning ability. To test this hypothesis, we extended the CNNs with spatial and feature-based attention mechanisms. In a second series of…

Taxonomy

Topics: Multimodal Machine Learning Applications · Visual Attention and Saliency Detection · Explainable Artificial Intelligence (XAI)