Mind the GAP: Glimpse-based Active Perception improves generalization and sample efficiency of visual reasoning

Oleh Kolner; Thomas Ortner; Stanisław Woźniak; Angeliki Pantazi

arXiv:2409.20213·cs.CV·April 2, 2025

TL;DR

This paper introduces a novel glimpse-based active perception system that enhances visual reasoning by sequentially focusing on salient image regions, leading to improved generalization and sample efficiency over previous models.

Contribution

The paper proposes a new active perception method, inspired by human eye movements, that improves how AI systems learn and represent relations between different parts of an image.

Findings

01

Achieves state-of-the-art results on several visual reasoning tasks.

02

Demonstrates better generalization to out-of-distribution visual inputs than prior models.

03

Shows increased sample efficiency compared to prior models.

Abstract

Human capabilities in understanding visual relations are far superior to those of AI systems, especially for previously unseen objects. For example, while AI systems struggle to determine whether two such objects are visually the same or different, humans can do so with ease. Active vision theories postulate that the learning of visual relations is grounded in actions that we take to fixate objects and their parts by moving our eyes. In particular, the low-dimensional spatial information about the corresponding eye movements is hypothesized to facilitate the representation of relations between different image parts. Inspired by these theories, we develop a system equipped with a novel Glimpse-based Active Perception (GAP) that sequentially glimpses at the most salient regions of the input image and processes them at high resolution. Importantly, our system leverages the locations…
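The abstract describes glimpse selection only at a high level: fixate the most salient regions in sequence, process each at high resolution, and keep the low-dimensional locations of those fixations. The pure-Python sketch below illustrates that general idea under loudly stated assumptions. Saliency is approximated as each pixel's absolute deviation from the image mean, and an inhibition-of-return mask stops the same region from being fixated twice. These choices, and every function name here (`saliency_map`, `extract_patch`, `glimpse_sequence`, `relative_moves`), are hypothetical and are not taken from the GAP paper or the ibm/glimpse-based-active-perception repository.

```python
from typing import List, Tuple

Image = List[List[float]]
Glimpse = Tuple[Tuple[int, int], Image]  # ((row, col), high-resolution patch)


def saliency_map(img: Image) -> Image:
    """Hypothetical saliency: absolute deviation of each pixel from the image mean."""
    h, w = len(img), len(img[0])
    mean = sum(sum(row) for row in img) / (h * w)
    return [[abs(v - mean) for v in row] for row in img]


def extract_patch(img: Image, r: int, c: int, radius: int) -> Image:
    """Crop a (2*radius+1) x (2*radius+1) patch centred on (r, c), zero-padding outside the image."""
    h, w = len(img), len(img[0])
    return [
        [img[i][j] if 0 <= i < h and 0 <= j < w else 0.0
         for j in range(c - radius, c + radius + 1)]
        for i in range(r - radius, r + radius + 1)
    ]


def glimpse_sequence(img: Image, n_glimpses: int, radius: int = 1) -> List[Glimpse]:
    """Greedily fixate the most salient location, then suppress its neighbourhood
    (assumed inhibition of return) so that the next glimpse lands elsewhere."""
    sal = saliency_map(img)
    h, w = len(img), len(img[0])
    glimpses: List[Glimpse] = []
    for _ in range(n_glimpses):
        # Argmax over the current saliency map; max() keeps the first
        # row-major position when several locations tie.
        r, c = max(((i, j) for i in range(h) for j in range(w)),
                   key=lambda p: sal[p[0]][p[1]])
        if sal[r][c] == float("-inf"):
            break  # every location has already been suppressed
        glimpses.append(((r, c), extract_patch(img, r, c, radius)))
        # Inhibition of return: mask the fixated neighbourhood.
        for i in range(max(0, r - radius), min(h, r + radius + 1)):
            for j in range(max(0, c - radius), min(w, c + radius + 1)):
                sal[i][j] = float("-inf")
    return glimpses


def relative_moves(glimpses: List[Glimpse]) -> List[Tuple[int, int]]:
    """Low-dimensional 'eye movement' signal: displacement between consecutive fixations."""
    locs = [loc for loc, _ in glimpses]
    return [(b[0] - a[0], b[1] - a[1]) for a, b in zip(locs, locs[1:])]


if __name__ == "__main__":
    # Toy 5x5 image with two bright pixels.
    img = [[0.0] * 5 for _ in range(5)]
    img[1][1] = 1.0
    img[3][4] = 1.0
    g = glimpse_sequence(img, n_glimpses=2)
    print([loc for loc, _ in g])  # fixation locations
    print(relative_moves(g))      # displacement between them
```

On the toy image, the two glimpses land on the bright pixels at (1, 1) and (3, 4), and the single displacement (2, 3) is the kind of compact spatial signal the abstract says can help represent relations between image parts. A learned model would replace the hand-crafted saliency and feed the patches and locations into a trainable network. This sketch only shows the data flow.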

Code & Models

Repositories

ibm/glimpse-based-active-perception
PyTorch · Official

Videos

Mind the GAP: Glimpse-based Active Perception improves generalization and sample efficiency of visual reasoning · SlidesLive

Taxonomy

Topics: Neural Networks and Applications