RNN as Linear Transformer: A Closer Investigation into Representational Potentials of Visual Mamba Models
Timing Yang, Guoyizhe Wei, Alan Yuille, Feng Wang

TL;DR
This paper investigates Mamba's representational properties in vision tasks. It relates Mamba theoretically to Softmax and Linear Attention, introduces a binary segmentation metric for evaluating activation maps, and reports clearer activation maps with DINO self-supervised pretraining, along with 78.5% linear probing accuracy on ImageNet.
Contribution
The paper provides a theoretical analysis linking Mamba to Softmax and Linear Attention, introduces a binary segmentation metric for activation maps, and shows that DINO self-supervised pretraining yields clearer activation maps than standard supervised training.
Findings
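Mamba can be viewed as a low-rank approximation of Softmax Attention, which places it between the Softmax and Linear forms.
The new binary segmentation metric turns qualitative inspection of activation maps into a quantitative measure of how well a model captures long-range dependencies.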
Mamba achieves 78.5% linear probing accuracy on ImageNet.
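The connection between recurrent models and Linear Attention named in the title can be illustrated with the generic, well-known identity that causal linear attention (no softmax, no normalization) can be computed either as a masked attention sum or as an RNN with a matrix-valued state. The sketch below is not the paper's derivation and does not model Mamba's selective gating; the function names and the toy inputs are assumptions chosen for illustration, and plain Python lists are used to keep it dependency-free.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def causal_linear_attention_parallel(q, k, v):
    """Attention form: out[t] = sum over s <= t of (q[t] . k[s]) * v[s]."""
    out = []
    for t in range(len(q)):
        row = [0.0] * len(v[0])
        for s in range(t + 1):  # causal: only positions up to t contribute
            w = dot(q[t], k[s])
            row = [r + w * x for r, x in zip(row, v[s])]
        out.append(row)
    return out

def causal_linear_attention_recurrent(q, k, v):
    """RNN form: a (d_k x d_v) state accumulates outer(k[t], v[t])."""
    d_k, d_v = len(k[0]), len(v[0])
    state = [[0.0] * d_v for _ in range(d_k)]
    out = []
    for q_t, k_t, v_t in zip(q, k, v):
        # State update: S_t = S_{t-1} + k_t v_t^T
        for i in range(d_k):
            for j in range(d_v):
                state[i][j] += k_t[i] * v_t[j]
        # Readout: o_t = q_t^T S_t
        out.append([sum(q_t[i] * state[i][j] for i in range(d_k))
                    for j in range(d_v)])
    return out

# Toy inputs (hypothetical): 3 tokens, d_k = d_v = 2.
q = [[1.0, 0.0], [0.5, 1.0], [0.0, 2.0]]
k = [[1.0, 1.0], [0.0, 1.0], [2.0, 0.5]]
v = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

par = causal_linear_attention_parallel(q, k, v)
rec = causal_linear_attention_recurrent(q, k, v)
print(par)  # [[1.0, 2.0], [4.5, 7.0], [13.0, 18.0]]
print(rec)  # same values, computed with a fixed-size state
```

The recurrent form carries only a d_k x d_v state regardless of sequence length, and the unmasked score matrix it implicitly uses, Q K^T, has rank at most d_k. That bounded rank is one informal way to read the "low-rank" language in the findings; the paper's own argument may differ.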
Abstract
Mamba has recently garnered attention as an effective backbone for vision tasks. However, its underlying mechanism in visual domains remains poorly understood. In this work, we systematically investigate Mamba's representational properties and make three primary contributions. First, we theoretically analyze Mamba's relationship to Softmax and Linear Attention, confirming that it can be viewed as a low-rank approximation of Softmax Attention and thereby bridging the representational gap between Softmax and Linear forms. Second, we introduce a novel binary segmentation metric for activation map evaluation, extending qualitative assessments to a quantitative measure that demonstrates Mamba's capacity to model long-range dependencies. Third, by leveraging DINO for self-supervised pretraining, we obtain clearer activation maps than those produced by standard supervised approaches,…
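The abstract does not spell out how the binary segmentation metric is computed. One plausible reading, shown here purely as an assumption, is to threshold an activation map into a foreground mask and score its overlap with a ground-truth mask using intersection over union (IoU). The threshold value, the use of IoU, and the helper names `binarize` and `iou` are all hypothetical and are not taken from the paper.

```python
def binarize(activation, threshold):
    """Turn a 2D activation map into a 0/1 mask (assumed thresholding rule)."""
    return [[1 if a >= threshold else 0 for a in row] for row in activation]

def iou(pred, target):
    """Intersection over union of two equally sized 0/1 masks."""
    pairs = [(p, t) for pr, tr in zip(pred, target) for p, t in zip(pr, tr)]
    inter = sum(p & t for p, t in pairs)
    union = sum(p | t for p, t in pairs)
    # Convention assumed here: two empty masks count as a perfect match.
    return inter / union if union else 1.0

# Toy 3x3 activation map and ground-truth foreground mask (made-up values).
activation = [
    [0.9, 0.8, 0.1],
    [0.7, 0.2, 0.0],
    [0.1, 0.0, 0.3],
]
mask = [
    [1, 1, 0],
    [1, 1, 0],
    [0, 0, 0],
]

pred = binarize(activation, threshold=0.5)
score = iou(pred, mask)
print(score)  # 0.75: three of the four foreground cells are recovered
```

A score of this kind rewards activation maps that cover a whole object rather than a few isolated patches, which is one way a segmentation-style measure could reflect long-range dependency modeling.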
Taxonomy
Topics: Face Recognition and Perception · Visual perception and processing mechanisms · Advanced Neural Network Applications
