X-VMamba: Explainable Vision Mamba
Mohamed A. Mabrok, Yalda Zafari

TL;DR
This paper introduces a controllability-based interpretability framework for Vision State Space Models, enabling transparent analysis of how input parts influence internal states with linear complexity, validated on medical imaging data.
Contribution
It proposes an efficient interpretability method for SSMs in two formulations: a Jacobian-based one applicable to any SSM architecture, and a Gramian-based one for diagonal SSMs. Neither requires modifying the model.
Findings
Revealed hierarchical feature refinement in medical imaging SSMs
Identified domain-specific controllability signatures
Showed influence of scanning strategies on attention patterns
Abstract
State Space Models (SSMs), particularly the Mamba architecture, have recently emerged as powerful alternatives to Transformers for sequence modeling, offering linear computational complexity while achieving competitive performance. Yet, despite their effectiveness, understanding how these Vision SSMs process spatial information remains challenging due to the lack of transparent, attention-like mechanisms. To address this gap, we introduce a controllability-based interpretability framework that quantifies how different parts of the input sequence (tokens or patches) influence the internal state dynamics of SSMs. We propose two complementary formulations: a Jacobian-based method applicable to any SSM architecture that measures influence through the full chain of state propagation, and a Gramian-based approach for diagonal SSMs that achieves superior speed through closed-form analytical…
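To make the idea of measuring influence "through the full chain of state propagation" concrete, here is a minimal sketch in Python. It is not the paper's formulation: it assumes a toy diagonal linear recurrence h_t = a_t * h_{t-1} + b_t * x_t with a scalar input per token, and the names `state_jacobian`, `influence_scores`, and `final_state` are hypothetical. Under those assumptions, the Jacobian of the final state with respect to token t has a closed form, b_t times the elementwise product of all later a_s, and one backward pass yields every token's score in time linear in sequence length.

```python
import numpy as np


def state_jacobian(a, b):
    """Closed-form d h_final / d x_t for a toy diagonal linear SSM.

    Assumed recurrence (illustrative only): h_t = a_t * h_{t-1} + b_t * x_t,
    with a, b of shape (T, N) and a scalar input x_t per token.
    Returns an array of shape (T, N).
    """
    T, N = a.shape
    # suffix[t] = elementwise product of a[t+1], ..., a[T-1]; suffix[T-1] = 1
    suffix = np.ones((T, N))
    for t in range(T - 2, -1, -1):
        suffix[t] = suffix[t + 1] * a[t + 1]
    return suffix * b


def influence_scores(a, b):
    """Per-token influence: L2 norm of the Jacobian row for each token."""
    return np.linalg.norm(state_jacobian(a, b), axis=1)


def final_state(a, b, x):
    """Run the toy recurrence and return the last hidden state."""
    h = np.zeros(a.shape[1])
    for t in range(len(x)):
        h = a[t] * h + b[t] * x[t]
    return h


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, N = 16, 4  # e.g. 16 image patches, 4 state channels (arbitrary)
    a = rng.uniform(0.5, 1.0, size=(T, N))  # decay factors
    b = rng.normal(size=(T, N))             # input projections
    print(influence_scores(a, b))
```

For an image model, the resulting length-T score vector could be reshaped to the patch grid along the scan order and viewed as a heatmap. Because the toy recurrence is linear in the input, perturbing one token by 1 changes the final state by exactly the corresponding Jacobian row, which gives a simple sanity check.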
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Machine Learning in Healthcare
