Predicting upcoming visual features during eye movements yields scene representations aligned with human visual cortex

Sushrut Thorat; Adrien Doerig; Alexander Kroner; Carmen Amme; Tim C. Kietzmann

arXiv:2511.12715 · q-bio.NC · November 18, 2025

TL;DR

This paper introduces Glimpse Prediction Networks (GPNs), which learn to predict upcoming visual features during eye movements. Trained self-supervised on natural viewing behavior, they form scene representations that align with activity in human visual cortex.

Contribution

The study presents a self-supervised model that predicts upcoming visual features during active vision. Its representations align with human brain responses and outperform those of models trained with explicit semantic objectives.

Findings

01. GPNs successfully learn scene co-occurrence and spatial structure.

02. Scene representations from GPNs align with human fMRI responses.

03. GPNs outperform models trained with explicit semantic objectives.

Abstract

Scenes are complex yet structured collections of parts, including objects and surfaces, that exhibit spatial and semantic relations to one another. An effective visual system therefore needs unified scene representations that relate scene parts to their locations and to their co-occurrence. We hypothesize that this structure can be learned in a self-supervised manner from natural experience by exploiting the temporal regularities of active vision: each fixation reveals a locally detailed glimpse that is statistically related to the previous one through co-occurrence and saccade-conditioned spatial regularities. We instantiate this idea with Glimpse Prediction Networks (GPNs), recurrent models trained to predict the feature embedding of the next glimpse along human-like scanpaths over natural scenes. GPNs successfully learn co-occurrence structure and, when given relative saccade location vectors, show…
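
The next-glimpse objective described in the abstract can be sketched in a few lines. The toy below is a hypothetical illustration, not the authors' GPN: the class name ToyGlimpsePredictor, the plain Elman-style recurrence, the random weights, and the mean-squared-error loss are all assumptions made here, and the glimpse embeddings are random stand-ins for the features a vision encoder would produce at each fixation. What it tries to capture is the structure of the task: at each step the network reads the current glimpse embedding together with the relative saccade vector (dx, dy) to the next fixation, and is scored on how well it predicts the embedding of the glimpse that fixation reveals. Only the forward pass and objective are shown; training would minimize this loss by gradient descent.

```python
import math
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def rand_matrix(rows, cols, rng, scale=0.1):
    """Small Gaussian-initialised weight matrix."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class ToyGlimpsePredictor:
    """Hypothetical Elman-style RNN (not the authors' GPN architecture).

    At each step it reads the current glimpse embedding and the relative
    saccade vector (dx, dy) to the next fixation, updates its hidden state,
    and outputs a prediction of the next glimpse embedding.
    """

    def __init__(self, feat_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.W_g = rand_matrix(hidden_dim, feat_dim, rng)    # glimpse -> hidden
        self.W_s = rand_matrix(hidden_dim, 2, rng)           # saccade (dx, dy) -> hidden
        self.W_h = rand_matrix(hidden_dim, hidden_dim, rng)  # hidden -> hidden
        self.W_o = rand_matrix(feat_dim, hidden_dim, rng)    # hidden -> predicted embedding

    def step(self, h, glimpse, saccade):
        pre = [a + b + c for a, b, c in zip(matvec(self.W_h, h),
                                            matvec(self.W_g, glimpse),
                                            matvec(self.W_s, saccade))]
        h_new = [math.tanh(x) for x in pre]
        return h_new, matvec(self.W_o, h_new)

    def scanpath_loss(self, glimpses, fixations):
        """Average per-step MSE between predicted and actual next-glimpse
        embeddings; glimpses[t] is the embedding seen at fixations[t] = (x, y)."""
        h = [0.0] * self.hidden_dim
        total = 0.0
        steps = len(glimpses) - 1
        for t in range(steps):
            # Relative saccade vector from the current to the next fixation.
            saccade = (fixations[t + 1][0] - fixations[t][0],
                       fixations[t + 1][1] - fixations[t][1])
            h, pred = self.step(h, glimpses[t], saccade)
            target = glimpses[t + 1]
            total += sum((p - y) ** 2 for p, y in zip(pred, target)) / len(target)
        return total / steps


# Usage with random stand-in embeddings and a made-up 5-fixation scanpath.
rng = random.Random(1)
feat_dim = 8
glimpses = [[rng.gauss(0.0, 1.0) for _ in range(feat_dim)] for _ in range(5)]
fixations = [(0.1, 0.2), (0.4, 0.25), (0.35, 0.6), (0.7, 0.55), (0.5, 0.9)]
model = ToyGlimpsePredictor(feat_dim=feat_dim, hidden_dim=16)
print(f"next-glimpse loss: {model.scanpath_loss(glimpses, fixations):.4f}")
```

Always passing a zero saccade vector would reduce this toy to a purely co-occurrence-based predictor, which loosely mirrors the abstract's distinction between learning co-occurrence structure and what GPNs learn when relative saccade location vectors are provided.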

Code & Models

Models

🤗 novelmartis/GPN (Hugging Face model)

Taxonomy

Topics: Face Recognition and Perception · Visual Attention and Saliency Detection · Visual perception and processing mechanisms