Predicting upcoming visual features during eye movements yields scene representations aligned with human visual cortex
Sushrut Thorat, Adrien Doerig, Alexander Kroner, Carmen Amme, Tim C. Kietzmann

TL;DR
This paper introduces Glimpse Prediction Networks (GPNs), which learn, self-supervised from natural viewing behavior, to predict the visual features of the next glimpse during eye movements. The resulting scene representations align with human visual cortex activity.
Contribution
The study presents a novel self-supervised model that predicts upcoming visual features during active vision. Its scene representations align with human brain responses and outperform those of models trained with explicit semantic objectives.
Findings
GPNs successfully learn scene co-occurrence and spatial structure.
Scene representations from GPNs align with human fMRI responses.
GPNs outperform models trained with explicit semantic objectives.
Abstract
Scenes are complex, yet structured collections of parts, including objects and surfaces, that exhibit spatial and semantic relations to one another. An effective visual system therefore needs unified scene representations that relate scene parts to their location and their co-occurrence. We hypothesize that this structure can be learned self-supervised from natural experience by exploiting the temporal regularities of active vision: each fixation reveals a locally-detailed glimpse that is statistically related to the previous one via co-occurrence and saccade-conditioned spatial regularities. We instantiate this idea with Glimpse Prediction Networks (GPNs) -- recurrent models trained to predict the feature embedding of the next glimpse along human-like scanpaths over natural scenes. GPNs successfully learn co-occurrence structure and, when given relative saccade location vectors, show…
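To make the training setup in the abstract concrete, here is a minimal sketch of next-glimpse prediction along a scanpath. It is an illustration under stated assumptions, not the authors' implementation: the class `TinyGlimpsePredictor`, the Elman-style recurrent cell, the mean-squared-error objective, the embedding size, and the random stand-in data are all hypothetical. The only elements taken from the abstract are that the model is recurrent, that it receives the current glimpse embedding together with a relative saccade vector, and that it predicts the embedding of the next glimpse.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def init_matrix(rows, cols, rng, scale=0.1):
    """Small random weights; the initialization scheme is an assumption."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class TinyGlimpsePredictor:
    """Hypothetical Elman-style recurrent predictor (not the paper's architecture)."""

    def __init__(self, embed_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        in_dim = embed_dim + 2  # glimpse embedding plus a (dx, dy) saccade vector
        self.W_in = init_matrix(hidden_dim, in_dim, rng)
        self.W_rec = init_matrix(hidden_dim, hidden_dim, rng)
        self.W_out = init_matrix(embed_dim, hidden_dim, rng)
        self.hidden_dim = hidden_dim

    def step(self, h, glimpse, saccade):
        """Update the hidden state and predict the next glimpse embedding."""
        x = list(glimpse) + list(saccade)
        pre = [a + b for a, b in zip(matvec(self.W_in, x), matvec(self.W_rec, h))]
        h_new = [math.tanh(p) for p in pre]
        return h_new, matvec(self.W_out, h_new)


def scanpath_loss(model, glimpses, fixations):
    """Mean squared error between predicted and actual next-glimpse embeddings.

    The MSE objective is an assumption made for this sketch.
    """
    h = [0.0] * model.hidden_dim
    total, count = 0.0, 0
    for t in range(len(glimpses) - 1):
        # Relative saccade vector from the current fixation to the next one.
        saccade = (fixations[t + 1][0] - fixations[t][0],
                   fixations[t + 1][1] - fixations[t][1])
        h, pred = model.step(h, glimpses[t], saccade)
        total += sum((p - g) ** 2 for p, g in zip(pred, glimpses[t + 1]))
        count += len(pred)
    return total / count


# Random stand-in data: 5 fixations, each with a 4-dimensional glimpse embedding.
rng = random.Random(1)
glimpses = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(5)]
fixations = [(rng.random(), rng.random()) for _ in range(5)]

model = TinyGlimpsePredictor(embed_dim=4, hidden_dim=8)
loss = scanpath_loss(model, glimpses, fixations)
print(f"next-glimpse prediction loss: {loss:.4f}")
```

In a real setting the glimpse embeddings would come from a visual encoder applied to image crops at each fixation, and the weights would be trained by gradient descent on this loss. Feeding the saccade vector as an input is what lets the recurrent state condition its prediction on where the eye moves next.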
Taxonomy
Topics: Face Recognition and Perception · Visual Attention and Saliency Detection · Visual perception and processing mechanisms
