A Saccade-inspired Approach to Image Classification using Vision Transformer Attention Maps
Matthis Dallain, Laurent Rodriguez, Laurent Udo Perrinet, Benoît Miramond

TL;DR
This paper introduces a biologically inspired saccade-like method for image classification using Vision Transformer attention maps, improving efficiency and performance by focusing on key image regions.
Contribution
The work uses DINO's attention maps to guide selective image processing, demonstrating a saccade-inspired approach that improves classification efficiency and outperforms traditional saliency models.
Findings
Selective processing preserves most classification accuracy.
DINO's attention maps outperform traditional saliency models.
The approach suggests a new direction for neuromorphic visual systems.
Abstract
Human vision achieves remarkable perceptual performance while operating under strict metabolic constraints. A key ingredient is the selective attention mechanism, driven by rapid saccadic eye movements that constantly reposition the high-resolution fovea onto task-relevant locations, unlike conventional AI systems that process entire images with equal emphasis. Our work aims to draw inspiration from the human visual system to create smarter, more efficient image processing models. Using DINO, a self-supervised Vision Transformer that produces attention maps strikingly similar to human gaze patterns, we explore a saccade-inspired method to focus the processing of information on key regions in visual space. To do so, we use the ImageNet dataset in a standard classification task and measure how each successive saccade affects the model's class scores. This selective-processing strategy…
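The idea of successive saccades over an attention map can be illustrated with a small sketch. This is not the authors' implementation: the function names `saccade_sequence` and `visible_mask`, the `radius` parameter, and the inhibition-of-return rule used to avoid revisiting a region are all assumptions made for illustration, and the toy grid stands in for a real DINO attention map over image patches.

```python
def _window(r, c, radius, n_rows, n_cols):
    """Yield the in-bounds patch coordinates within `radius` of (r, c)."""
    for i in range(max(0, r - radius), min(n_rows, r + radius + 1)):
        for j in range(max(0, c - radius), min(n_cols, c + radius + 1)):
            yield i, j


def saccade_sequence(attention, n_saccades, radius=1):
    """Greedily pick fixations on a 2D patch-attention grid.

    Hypothetical helper: each fixation is the most attended patch that has
    not been suppressed yet (inhibition of return, an assumed rule).
    """
    remaining = [list(row) for row in attention]  # copy; caller's map is untouched
    n_rows, n_cols = len(remaining), len(remaining[0])
    fixations = []
    for _ in range(n_saccades):
        r, c = max(
            ((i, j) for i in range(n_rows) for j in range(n_cols)),
            key=lambda rc: remaining[rc[0]][rc[1]],
        )
        if remaining[r][c] == float("-inf"):
            break  # every patch has already been visited
        fixations.append((r, c))
        # Suppress the neighbourhood just visited so the next saccade moves on.
        for i, j in _window(r, c, radius, n_rows, n_cols):
            remaining[i][j] = float("-inf")
    return fixations


def visible_mask(shape, fixations, radius=1):
    """Boolean grid marking the patches revealed by the given fixations."""
    n_rows, n_cols = shape
    mask = [[False] * n_cols for _ in range(n_rows)]
    for r, c in fixations:
        for i, j in _window(r, c, radius, n_rows, n_cols):
            mask[i][j] = True
    return mask


# Toy 4x4 attention grid (made-up values, not model output).
toy_attention = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.3, 0.1],
    [0.1, 0.3, 0.2, 0.4],
    [0.0, 0.1, 0.4, 0.8],
]
```

In a setup like the one the abstract describes, the mask after each saccade would determine which patches reach the classifier, so the class scores can be recorded as a function of the number of saccades.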
Taxonomy
Topics: Visual Attention and Saliency Detection · Gaze Tracking and Assistive Technology · Face Recognition and Perception
