Emergence of Fixational and Saccadic Movements in a Multi-Level Recurrent Attention Model for Vision
Pengcheng Pan, Yonekura Shogo, Yasuo Kuniyoshi

TL;DR
This paper introduces MRAM, a multi-level recurrent attention model that mimics the hierarchy of human vision, producing more natural eye-movement patterns and outperforming previous models on image classification tasks.
Contribution
The paper presents a novel hierarchical hard attention model that explicitly models the hierarchy of human visual processing and balances fixational and saccadic movements, improving both interpretability and performance.
Findings
MRAM produces more human-like eye movement patterns.
MRAM outperforms CNN, RAM, and DRAM on standard benchmarks.
Decoupling glimpse location generation from task execution improves attention dynamics.
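One way to make the fixational/saccadic distinction concrete is to label each move between consecutive glimpse locations by its size. The Python sketch below is a hypothetical illustration and not the paper's evaluation metric. Several details are assumptions: the helper names `classify_movements` and `saccade_ratio`, the use of normalized [-1, 1] coordinates, and the 0.2 displacement threshold.

```python
import math
from typing import List, Tuple

Location = Tuple[float, float]  # assumed normalized (x, y) in [-1, 1]


def classify_movements(locations: List[Location], saccade_threshold: float = 0.2) -> List[str]:
    """Label each move between consecutive glimpses as 'fixational' or 'saccadic'.

    saccade_threshold is a hypothetical cutoff on Euclidean displacement.
    """
    labels = []
    for (x0, y0), (x1, y1) in zip(locations, locations[1:]):
        step = math.hypot(x1 - x0, y1 - y0)
        labels.append("saccadic" if step > saccade_threshold else "fixational")
    return labels


def saccade_ratio(locations: List[Location], saccade_threshold: float = 0.2) -> float:
    """Fraction of moves labeled saccadic (0.0 if there are fewer than two glimpses)."""
    labels = classify_movements(locations, saccade_threshold)
    if not labels:
        return 0.0
    return labels.count("saccadic") / len(labels)


if __name__ == "__main__":
    path = [(0.0, 0.0), (0.05, 0.0), (0.6, 0.4), (0.62, 0.41)]
    print(classify_movements(path))  # ['fixational', 'saccadic', 'fixational']
    print(saccade_ratio(path))
```

Under this crude measure, a model that is "overly fixational" would have a ratio near 0 and an "excessively saccadic" one a ratio near 1. Human-like scanpaths would sit somewhere in between, depending on the chosen threshold.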
Abstract
Inspired by foveal vision, hard attention models promise interpretability and parameter economy. However, existing models such as the Recurrent Model of Visual Attention (RAM) and the Deep Recurrent Attention Model (DRAM) fail to model the hierarchy of the human visual system, which compromises their visual exploration dynamics. As a result, they tend to produce attention patterns that are either overly fixational or excessively saccadic, diverging from human eye-movement behavior. In this paper, we propose the Multi-Level Recurrent Attention Model (MRAM), a novel hard attention framework that explicitly models the neural hierarchy of human visual processing. By decoupling glimpse location generation and task execution into two recurrent layers, MRAM exhibits an emergent balance between fixational and saccadic movements. Our results show that MRAM not only achieves more human-like attention…
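The abstract does not give MRAM's equations. The following NumPy sketch therefore only illustrates the general idea of splitting glimpse location generation and task execution across two recurrent layers. Everything in it is an assumption:

- a lower layer that integrates glimpses for classification;
- an upper layer that emits the next location (a DRAM-style arrangement);
- a hypothetical two-scale foveated `glimpse` sensor;
- the layer sizes;
- Gaussian location sampling.

Weights are random, and training (for example, REINFORCE for the location policy) is omitted.

```python
import numpy as np


def glimpse(image, loc, size=8, scales=2):
    """Hypothetical foveated sensor: `scales` square crops centred at `loc`,
    each twice as wide as the last, all downsampled to size x size."""
    height, width = image.shape
    # Map normalized coordinates in [-1, 1] to a pixel centre.
    cy = int(round((loc[0] + 1) / 2 * (height - 1)))
    cx = int(round((loc[1] + 1) / 2 * (width - 1)))
    pad = size * 2 ** (scales - 1)
    padded = np.pad(image, pad)  # zero padding keeps border crops in bounds
    patches = []
    for s in range(scales):
        half = size * 2 ** s // 2
        crop = padded[cy + pad - half:cy + pad + half, cx + pad - half:cx + pad + half]
        patches.append(crop[::2 ** s, ::2 ** s].ravel())  # coarser at wider scales
    return np.concatenate(patches)


class TwoLevelRecurrentAttention:
    """Hypothetical two-layer hard-attention core with untrained random weights."""

    def __init__(self, glimpse_dim=128, hidden=32, n_classes=10, seed=0):
        rng = np.random.default_rng(seed)

        def init(*shape):
            return rng.normal(0.0, 0.1, shape)

        self.hidden = hidden
        self.W_g = init(hidden, glimpse_dim + 2)  # glimpse + location -> features
        self.W_lo = init(hidden, hidden)          # lower layer: task execution
        self.W_up_in = init(hidden, hidden)       # lower state -> upper layer
        self.W_up = init(hidden, hidden)          # upper layer: location generation
        self.W_loc = init(2, hidden)              # upper state -> next location mean
        self.W_cls = init(n_classes, hidden)      # lower state -> class logits

    def run(self, image, n_glimpses=6, loc_std=0.1, seed=0):
        rng = np.random.default_rng(seed)
        h_lo = np.zeros(self.hidden)
        h_up = np.zeros(self.hidden)
        loc = np.zeros(2)  # first glimpse at the image centre
        locations = []
        for _ in range(n_glimpses):
            locations.append(loc.copy())
            g = np.tanh(self.W_g @ np.concatenate([glimpse(image, loc), loc]))
            h_lo = np.tanh(g + self.W_lo @ h_lo)
            h_up = np.tanh(self.W_up_in @ h_lo + self.W_up @ h_up)
            mean = np.tanh(self.W_loc @ h_up)
            # Stochastic hard attention: sample the next location around the mean.
            loc = np.clip(mean + rng.normal(0.0, loc_std, 2), -1.0, 1.0)
        logits = self.W_cls @ h_lo
        return int(np.argmax(logits)), locations


if __name__ == "__main__":
    image = np.random.default_rng(1).random((28, 28))
    label, path = TwoLevelRecurrentAttention().run(image)
    print(label, [tuple(np.round(p, 2)) for p in path])
```

In this sketch the location head reads only the upper recurrent state. The dynamics that decide where to look are therefore separate from the dynamics that accumulate evidence for the label. That is the kind of decoupling the abstract describes, although the paper's actual wiring may differ.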
Taxonomy
Methods: Softmax · Attention Is All You Need
