Emergence of Fixational and Saccadic Movements in a Multi-Level Recurrent Attention Model for Vision

Pengcheng Pan; Yonekura Shogo; Yasuo Kuniyoshi

arXiv:2505.13191·cs.CV·November 18, 2025

TL;DR

This paper introduces MRAM, a multi-level recurrent attention model that mimics human visual hierarchy, producing more natural eye movement patterns and outperforming previous models on image classification tasks.

Contribution

The paper presents a novel hierarchical attention model that explicitly captures human visual processing, balancing fixational and saccadic movements for improved interpretability and performance.

Findings

01. MRAM produces more human-like eye movement patterns.

02. MRAM outperforms CNN, RAM, and DRAM on standard benchmarks.

03. Decoupling glimpse generation and task execution improves attention dynamics.

Abstract

Inspired by foveal vision, hard attention models promise interpretability and parameter economy. However, existing models such as the Recurrent Model of Visual Attention (RAM) and the Deep Recurrent Attention Model (DRAM) fail to model the hierarchy of the human visual system, which compromises their visual exploration dynamics. As a result, they tend to produce attention that is either overly fixational or excessively saccadic, diverging from human eye movement behavior. In this paper, we propose the Multi-Level Recurrent Attention Model (MRAM), a novel hard attention framework that explicitly models the neural hierarchy of human visual processing. By decoupling glimpse location generation and task execution into two recurrent layers, MRAM exhibits an emergent balance between fixational and saccadic movements. Our results show that MRAM not only achieves more human-like attention…
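
To make the decoupling idea concrete, here is a minimal NumPy sketch of a hard attention loop with two recurrent layers, one driving glimpse locations and one driving the classification output. It is an illustration only, not the authors' implementation: the class name `TwoLevelAttentionSketch`, the helper `extract_glimpse`, the layer sizes, the wiring between the two layers, and the Gaussian location noise are all assumptions, and the weights are random and untrained (the reinforcement-style training used by hard attention models is omitted).

```python
import numpy as np


def extract_glimpse(image, center, size):
    """Crop a size x size patch around `center` (normalized to [-1, 1]), zero-padded."""
    h, w = image.shape
    center = np.clip(center, -1.0, 1.0)
    # Map normalized (row, col) coordinates to pixel indices.
    cy = int(np.rint((center[0] + 1.0) / 2.0 * (h - 1)))
    cx = int(np.rint((center[1] + 1.0) / 2.0 * (w - 1)))
    half = size // 2
    padded = np.pad(image, half, mode="constant")
    # Pixel (cy, cx) sits at (cy + half, cx + half) in the padded image,
    # so the patch centered on it starts at (cy, cx).
    return padded[cy:cy + size, cx:cx + size]


class TwoLevelAttentionSketch:
    """Hypothetical two-layer recurrent hard attention model (untrained)."""

    def __init__(self, glimpse_size=8, hidden=32, n_classes=10, seed=0):
        self.rng = np.random.default_rng(seed)
        self.glimpse_size = glimpse_size
        self.hidden = hidden

        def init(*shape):
            return self.rng.normal(0.0, 0.1, size=shape)

        g = glimpse_size * glimpse_size
        self.W_g = init(g + 2, hidden)        # glimpse pixels + location -> feature
        self.W_loc_in = init(hidden, hidden)  # location layer (assumed wiring)
        self.W_loc_h = init(hidden, hidden)
        self.W_task_in = init(hidden, hidden)  # task layer (assumed wiring)
        self.W_task_h = init(hidden, hidden)
        self.W_loc_out = init(hidden, 2)      # location layer -> next (row, col)
        self.W_cls = init(hidden, n_classes)  # task layer -> class logits

    def forward(self, image, n_glimpses=6, loc_std=0.1):
        h_loc = np.zeros(self.hidden)
        h_task = np.zeros(self.hidden)
        loc = np.zeros(2)  # start at the image center
        trajectory = [loc.copy()]
        for _ in range(n_glimpses):
            patch = extract_glimpse(image, loc, self.glimpse_size)
            feat = np.tanh(np.concatenate([patch.ravel(), loc]) @ self.W_g)
            # Layer 1: recurrent state used only to choose where to look next.
            h_loc = np.tanh(feat @ self.W_loc_in + h_loc @ self.W_loc_h)
            # Layer 2: recurrent state used only for the task output.
            h_task = np.tanh(h_loc @ self.W_task_in + h_task @ self.W_task_h)
            # Hard attention: sample the next location around a predicted mean.
            mean = np.tanh(h_loc @ self.W_loc_out)
            loc = np.clip(mean + self.rng.normal(0.0, loc_std, size=2), -1.0, 1.0)
            trajectory.append(loc.copy())
        logits = h_task @ self.W_cls
        exp = np.exp(logits - logits.max())
        return exp / exp.sum(), np.array(trajectory)


model = TwoLevelAttentionSketch(seed=0)
probs, traj = model.forward(np.zeros((28, 28)), n_glimpses=6)
# Distance moved between consecutive glimpses, in normalized coordinates.
step_sizes = np.linalg.norm(np.diff(traj, axis=0), axis=1)
```

In a trained model of this kind, `step_sizes` is one simple way to inspect the gaze trajectory: small steps correspond to fixation-like behavior and large steps to saccade-like jumps. With the random weights used here the values carry no meaning.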

Taxonomy

Methods: Softmax · Attention Is All You Need