Understanding the computational demands underlying visual reasoning
Mohit Vaishnav, Remi Cadene, Andrea Alamia, Drew Linsley, Rufin VanRullen, Thomas Serre

TL;DR
This paper investigates the computational complexity of visual reasoning by testing CNNs on a set of visual reasoning tasks, extending models with attention mechanisms, and proposing a taxonomy based on relation types and complexity.
Contribution
It introduces a taxonomy of visual reasoning tasks based on relation types and complexity, and demonstrates that attention-augmented CNNs improve performance on these tasks.
Findings
Attention mechanisms significantly improve CNN performance on complex visual reasoning tasks.
The taxonomy links task difficulty to relation type and number of relations.
Attention networks partially explain the proposed taxonomy.
Abstract
Visual understanding requires comprehending complex visual relations between objects within a scene. Here, we seek to characterize the computational demands for abstract visual reasoning. We do this by systematically assessing the ability of modern deep convolutional neural networks (CNNs) to learn to solve the "Synthetic Visual Reasoning Test" (SVRT) challenge, a collection of twenty-three visual reasoning problems. Our analysis reveals a novel taxonomy of visual reasoning tasks, which can be primarily explained by both the type of relations (same-different vs. spatial-relation judgments) and the number of relations used to compose the underlying rules. Prior cognitive neuroscience work suggests that attention plays a key role in humans' visual reasoning ability. To test this hypothesis, we extended the CNNs with spatial and feature-based attention mechanisms. In a second series of…
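As a rough illustration of the distinction the abstract draws between spatial and feature-based attention, the following sketch reweights a small feature map in two ways: a per-location mask (spatial) and a per-channel gate (feature-based). It is not the paper's architecture. The function names, the sigmoid gating, and the use of plain averages in place of learned weights are all assumptions made to keep the example self-contained.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def spatial_attention(fmap):
    """Hypothetical spatial attention: one weight per (row, col) location.

    fmap is a list of C channels, each an H x W list of lists.
    The mask is the sigmoid of the channel-wise mean at each location
    (an assumption; a real model would learn this mapping).
    """
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    mask = [[sigmoid(sum(fmap[k][i][j] for k in range(c)) / c)
             for j in range(w)] for i in range(h)]
    out = [[[fmap[k][i][j] * mask[i][j] for j in range(w)]
            for i in range(h)] for k in range(c)]
    return out, mask


def feature_attention(fmap):
    """Hypothetical feature-based attention: one gate per channel.

    Each gate is the sigmoid of that channel's spatial mean
    (again an assumption standing in for learned parameters).
    """
    c, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    gates = [sigmoid(sum(sum(row) for row in fmap[k]) / (h * w))
             for k in range(c)]
    out = [[[fmap[k][i][j] * gates[k] for j in range(w)]
            for i in range(h)] for k in range(c)]
    return out, gates


# Toy 2-channel, 2x2 feature map: only channel 1 responds, at location (0, 1).
toy = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[0.0, 4.0], [0.0, 0.0]],
]
spatial_out, mask = spatial_attention(toy)
feature_out, gates = feature_attention(toy)
```

In this toy case the spatial mask is largest at location (0, 1), where the response sits, while the feature gates favor channel 1 over channel 0 regardless of location. Spatial attention selects where to look, and feature-based attention selects which features to emphasize.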
Taxonomy
Topics: Multimodal Machine Learning Applications · Visual Attention and Saliency Detection · Explainable Artificial Intelligence (XAI)
