Visual Attention Exploration in Vision-Based Mamba Models
Junpeng Wang, Chin-Chia Michael Yeh, Uday Singh Saini, Mahashweta Das

TL;DR
This paper introduces a visual analytics tool to explore and understand attention mechanisms in vision-based Mamba models, revealing how patch interactions and ordering strategies influence attention distribution.
Contribution
It provides a novel visualization method to analyze attention patterns in vision-based SSMs, specifically Mamba, and investigates the effects of patch arrangement strategies.
Findings
Attention distribution varies with patch order
Attention patterns evolve across Mamba blocks
Patch interactions are influenced by spatial arrangements
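To make the role of patch order concrete, here is a minimal sketch of how a 2D image can be cut into patches and flattened into a 1D sequence under different scan orders. It is an illustration only, not the authors' tool or any specific vision Mamba implementation; the function names (`patchify`, `scan_order`, `to_sequence`) and the three orders shown (row-major, column-major, snake) are assumptions chosen for the example.

```python
import numpy as np

def patchify(image, patch):
    """Split an (H, W) image into a (H//patch, W//patch, patch, patch) grid of patches."""
    h, w = image.shape
    gh, gw = h // patch, w // patch
    cropped = image[:gh * patch, :gw * patch]
    # reshape to (gh, patch, gw, patch), then move the grid axes to the front
    return cropped.reshape(gh, patch, gw, patch).swapaxes(1, 2)

def scan_order(gh, gw, mode="row"):
    """Return row-major patch indices in the order they are visited (hypothetical helper)."""
    idx = np.arange(gh * gw).reshape(gh, gw)
    if mode == "row":        # left to right, top to bottom
        return idx.ravel()
    if mode == "col":        # top to bottom, left to right
        return idx.T.ravel()
    if mode == "snake":      # row-major, reversing every other row
        snake = idx.copy()
        snake[1::2] = snake[1::2, ::-1]
        return snake.ravel()
    raise ValueError(f"unknown scan mode: {mode}")

def to_sequence(image, patch, mode="row"):
    """Flatten an image into a (num_patches, patch*patch) sequence in the given order."""
    grid = patchify(image, patch)
    gh, gw = grid.shape[:2]
    flat = grid.reshape(gh * gw, patch * patch)
    return flat[scan_order(gh, gw, mode)]

if __name__ == "__main__":
    img = np.arange(16, dtype=float).reshape(4, 4)
    for mode in ("row", "col", "snake"):
        print(mode, scan_order(2, 3, mode))
```

Under row-major order, two vertically adjacent patches sit a full grid row apart in the sequence, while under column-major order they are neighbors. This gap between 2D proximity and 1D position is why the choice of order can change which patches end up interacting.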
Abstract
State space models (SSMs) have emerged as an efficient alternative to transformer-based models, offering linear complexity that scales better than transformers. One of the latest advances in SSMs, Mamba, introduces a selective scan mechanism that assigns trainable weights to input tokens, effectively mimicking the attention mechanism. Mamba has also been successfully extended to the vision domain by decomposing 2D images into smaller patches and arranging them as 1D sequences. However, it remains unclear how these patches interact with (or attend to) each other in relation to their original 2D spatial location. Additionally, the order used to arrange the patches into a sequence also significantly impacts their attention distribution. To better understand the attention between patches and explore the attention patterns, we introduce a visual analytics tool specifically designed for…
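One way to see why a selective scan can be read as attention is to unroll a simplified recurrence. The sketch below assumes a scalar state with per-token coefficients `a`, `b`, `c` (a deliberate simplification; Mamba itself uses a multi-dimensional state and a discretization step, and this is not the paper's method). Unrolling h_t = a_t h_{t-1} + b_t x_t, y_t = c_t h_t gives y_t as a weighted sum over earlier inputs, and those weights form a lower-triangular matrix that plays the role of an attention map.

```python
import numpy as np

def selective_scan(x, a, b, c):
    """Run the scalar recurrence h_t = a_t*h_{t-1} + b_t*x_t, y_t = c_t*h_t."""
    x = np.asarray(x, dtype=float)
    y = np.empty_like(x)
    h = 0.0
    for t in range(len(x)):
        h = a[t] * h + b[t] * x[t]
        y[t] = c[t] * h
    return y

def implicit_attention(a, b, c):
    """Build the matrix with entries attn[t, s] = c_t * (a_{s+1} * ... * a_t) * b_s for s <= t."""
    n = len(a)
    attn = np.zeros((n, n))
    for t in range(n):
        decay = 1.0                      # empty product when s == t
        for s in range(t, -1, -1):
            attn[t, s] = c[t] * decay * b[s]
            decay *= a[s]                # extend the product down to a_s for the next s
    return attn

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 6
    x = rng.normal(size=n)
    a = rng.uniform(0.5, 1.0, size=n)    # input-dependent in a real model; random here
    b = rng.normal(size=n)
    c = rng.normal(size=n)
    attn = implicit_attention(a, b, c)
    # The matrix form and the recurrence should produce the same outputs.
    print(np.allclose(attn @ x, selective_scan(x, a, b, c)))
```

Row t of the matrix shows how strongly token t draws on each earlier token. Because the matrix is lower triangular, a patch can only draw on patches that precede it in the sequence, which ties the resulting pattern directly to the chosen patch order.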
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Model Reduction and Neural Networks · Neural Networks and Reservoir Computing
