Mind the GAP: Glimpse-based Active Perception improves generalization and sample efficiency of visual reasoning
Oleh Kolner, Thomas Ortner, Stanisław Woźniak, Angeliki Pantazi

TL;DR
This paper introduces a novel glimpse-based active perception system that enhances visual reasoning by sequentially focusing on salient image regions, leading to improved generalization and sample efficiency over previous models.
Contribution
The paper proposes a new active perception method inspired by human eye movements, which significantly improves visual relation understanding in AI systems.
Findings
Achieves state-of-the-art results on visual reasoning tasks.
Demonstrates better generalization to out-of-distribution inputs.
Shows increased sample efficiency compared to prior models.
Abstract
Human capabilities in understanding visual relations are far superior to those of AI systems, especially for previously unseen objects. For example, while AI systems struggle to determine whether two such objects are visually the same or different, humans can do so with ease. Active vision theories postulate that the learning of visual relations is grounded in actions that we take to fixate objects and their parts by moving our eyes. In particular, the low-dimensional spatial information about the corresponding eye movements is hypothesized to facilitate the representation of relations between different image parts. Inspired by these theories, we develop a system equipped with a novel Glimpse-based Active Perception (GAP) that sequentially glimpses at the most salient regions of the input image and processes them at high resolution. Importantly, our system leverages the locations…
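The glimpsing loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the saliency measure (absolute deviation from mean intensity), the greedy argmax selection, the suppression of already-visited regions, and the names `select_glimpses`, `num_glimpses`, and `patch` are all assumptions made for illustration. The sketch only shows the general idea of returning both high-resolution patches and their low-dimensional locations.

```python
import numpy as np


def select_glimpses(image, num_glimpses=3, patch=5):
    """Greedily pick glimpse locations from a 2D grayscale image.

    Hypothetical sketch: the saliency definition and the suppression
    step are illustrative assumptions, not the method from the paper.
    Returns (locations, patches), where locations has shape
    (num_glimpses, 2) and patches has shape (num_glimpses, patch, patch).
    """
    if patch % 2 == 0:
        raise ValueError("patch must be odd so each glimpse has a center pixel")

    img = np.asarray(image, dtype=float)
    height, width = img.shape
    half = patch // 2

    # Assumed saliency: how far each pixel deviates from the mean intensity.
    saliency = np.abs(img - img.mean())

    # Zero-pad so that patches centered near the border keep a fixed size.
    padded = np.pad(img, half, mode="constant")

    locations, patches = [], []
    for _ in range(num_glimpses):
        # Most salient remaining location.
        r, c = np.unravel_index(np.argmax(saliency), saliency.shape)
        locations.append((int(r), int(c)))

        # In padded coordinates this slice is centered on (r, c).
        patches.append(padded[r:r + patch, c:c + patch])

        # Suppress the visited neighborhood so the next glimpse moves elsewhere.
        r0, r1 = max(r - half, 0), min(r + half + 1, height)
        c0, c1 = max(c - half, 0), min(c + half + 1, width)
        saliency[r0:r1, c0:c1] = -np.inf

    return np.array(locations), np.stack(patches)
```

A downstream model would consume the `(row, column)` locations alongside the patches, which is the part of the design the abstract emphasizes: the locations carry spatial information about where each glimpse was taken.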
Taxonomy
Topics: Neural Networks and Applications
