The Hidden Attention of Mamba Models

Ameen Ali; Itamar Zimerman; Lior Wolf

arXiv:2403.01590 · cs.LG · April 2, 2024 · 5 citations

TL;DR

The paper shows that Mamba layers, which are efficient selective state space models (SSMs), can also be read as attention-driven models. This reading allows a direct comparison with transformer self-attention and opens the models to explainability methods.

Contribution

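The paper adds a third view of selective SSMs, alongside the parallel-scan training view and the autoregressive deployment view: Mamba models as attention-driven models. This view enables comparison with transformers and improves interpretability.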

Findings

01 Mamba models can be interpreted as attention-driven models.

02 This view supports a theoretical and empirical comparison with the self-attention layers in transformers.

03 Explainability methods can be applied to inspect Mamba's inner workings.

Abstract

The Mamba layer offers an efficient selective state space model (SSM) that is highly effective in modeling multiple domains, including NLP, long-range sequence processing, and computer vision. Selective SSMs are viewed as dual models, in which one trains in parallel on the entire sequence via an IO-aware parallel scan, and deploys in an autoregressive manner. We add a third view and show that such models can be viewed as attention-driven models. This new perspective enables us to empirically and theoretically compare the underlying mechanisms to those of the self-attention layers in transformers and allows us to peer inside the inner workings of the Mamba model with explainability methods. Our code is publicly available.
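One way to make the attention view concrete is to unroll the selective SSM recurrence into an explicit token-to-token weight matrix. The sketch below is illustrative only and is not the authors' implementation. It assumes a single channel with a diagonal, already discretized state matrix, and the names `A_bar`, `B_bar`, `C`, `recurrent_scan`, and `hidden_attention` are hypothetical. It checks that the autoregressive recurrence and the unrolled, attention-like form produce the same outputs.

```python
import random


def recurrent_scan(A_bar, B_bar, C, x):
    """Autoregressive view: h_t = A_bar[t] * h_{t-1} + B_bar[t] * x[t], y_t = <C[t], h_t>."""
    n = len(A_bar[0])
    h = [0.0] * n
    ys = []
    for t in range(len(x)):
        h = [A_bar[t][s] * h[s] + B_bar[t][s] * x[t] for s in range(n)]
        ys.append(sum(C[t][s] * h[s] for s in range(n)))
    return ys


def hidden_attention(A_bar, B_bar, C):
    """Unrolled view: alpha[i][j] = sum_s C[i][s] * prod_{k=j+1..i} A_bar[k][s] * B_bar[j][s]."""
    L, n = len(A_bar), len(A_bar[0])
    alpha = [[0.0] * L for _ in range(L)]  # entries with j > i stay zero (causal)
    for j in range(L):
        decay = [1.0] * n  # empty product when i == j
        for i in range(j, L):
            if i > j:
                decay = [decay[s] * A_bar[i][s] for s in range(n)]
            alpha[i][j] = sum(C[i][s] * decay[s] * B_bar[j][s] for s in range(n))
    return alpha


def apply_attention(alpha, x):
    """Mix inputs with the weight matrix: y_i = sum_j alpha[i][j] * x[j]."""
    return [sum(alpha[i][j] * x[j] for j in range(len(x))) for i in range(len(x))]


# Random input-dependent parameters stand in for what a trained layer would produce.
rng = random.Random(0)
L, n = 6, 4  # sequence length and state size, chosen arbitrarily
A_bar = [[rng.uniform(0.1, 0.9) for _ in range(n)] for _ in range(L)]
B_bar = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(L)]
C = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(L)]
x = [rng.gauss(0.0, 1.0) for _ in range(L)]

y_scan = recurrent_scan(A_bar, B_bar, C, x)
alpha = hidden_attention(A_bar, B_bar, C)
y_attn = apply_attention(alpha, x)
max_gap = max(abs(a - b) for a, b in zip(y_scan, y_attn))
print("max difference between the two views:", max_gap)
```

Unlike softmax attention, the rows of `alpha` in this sketch are not normalized and may contain negative entries; the matrix is lower triangular because each output depends only on the current and earlier tokens.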

Code & Models

Repositories

ameenali/hiddenmambaattn (PyTorch, official)

Videos

The Hidden Attention of Mamba Models · Underline

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications · Generative Adversarial Networks and Image Synthesis