Understanding Input Selectivity in Mamba: Impact on Approximation Power, Memorization, and Associative Recall Capacity
Ningyuan Huang, Miguel Sarabia, Abhinav Moudgil, Pau Rodriguez, Luca Zappella, Federico Danieli

TL;DR
This paper investigates how input selectivity in Mamba State-Space Models enhances their approximation, memorization, and recall capabilities, providing theoretical insights and empirical validation of these effects.
Contribution
The paper offers a theoretical analysis of input selectivity in Mamba, showing how it benefits function approximation, long-term memorization, and associative recall relative to earlier SSMs.
Findings
The S6 layer can represent projections onto Haar wavelets
Input selectivity helps counteract memory decay
Analytical solutions are given for associative recall tasks
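To make the first finding concrete, the following minimal sketch shows what a projection onto a Haar wavelet computes. It illustrates the wavelet itself, not the paper's S6 construction; the function names `haar_wavelet` and `haar_coefficient` and the dyadic indexing by `scale` and `shift` are assumptions made for this example, and the signal length is assumed to be a power of two.

```python
import numpy as np

def haar_wavelet(T, scale, shift):
    """Discrete Haar wavelet on T samples (hypothetical helper).

    The support has width T / 2**scale and starts at shift * width.
    The wavelet is +1 on the first half of its support, -1 on the
    second half, and is scaled to unit Euclidean norm.
    """
    width = T // (2 ** scale)
    half = width // 2
    start = shift * width
    psi = np.zeros(T)
    psi[start:start + half] = 1.0
    psi[start + half:start + width] = -1.0
    return psi / np.sqrt(width)

def haar_coefficient(signal, scale, shift):
    """Inner product of the signal with one Haar wavelet."""
    return float(signal @ haar_wavelet(len(signal), scale, shift))

# A discontinuous target: a unit step at the midpoint of 8 samples.
step = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])

coarse = haar_coefficient(step, scale=0, shift=0)      # about -1.414
fine_left = haar_coefficient(step, scale=1, shift=0)   # 0.0
fine_right = haar_coefficient(step, scale=1, shift=1)  # 0.0
```

For a step located at the midpoint, only the coarsest wavelet responds. The finer wavelets see a locally constant signal and return zero, which is why a piecewise-constant basis is a natural fit for discontinuous targets.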
Abstract
State-Space Models (SSMs), and particularly Mamba, have recently emerged as a promising alternative to Transformers. Mamba introduces input selectivity to its SSM layer (S6) and incorporates convolution and gating into its block definition. While these modifications do improve Mamba's performance over its SSM predecessors, it remains largely unclear how Mamba leverages the additional functionalities provided by input selectivity, and how these interact with the other operations in the Mamba architecture. In this work, we demystify the role of input selectivity in Mamba, investigating its impact on function approximation power, long-term memorization, and associative recall capabilities. In particular: (i) we prove that the S6 layer of Mamba can represent projections onto Haar wavelets, providing an edge over its Diagonal SSM (S4D) predecessor in approximating discontinuous functions…
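The effect of input selectivity on long-term memory can be sketched with a toy diagonal SSM. This is a simplified illustration under stated assumptions, not the paper's analysis or the Mamba implementation: the recurrence h_t = exp(Δ_t A) h_{t-1} + Δ_t B x_t uses a simplified discretization, B is held fixed, and the gate that sets Δ_t to exactly zero on filler tokens is a hypothetical idealization (a softplus-parameterized step is strictly positive and can only approach zero). The name `diagonal_ssm_scan` is introduced here for the example.

```python
import numpy as np

def diagonal_ssm_scan(x, deltas, A, B):
    """Run h_t = exp(delta_t * A) * h_{t-1} + delta_t * B * x_t.

    x: (T,) scalar inputs; deltas: (T,) step sizes;
    A: (N,) negative diagonal entries; B: (N,) input weights.
    Returns the (T, N) array of hidden states.
    """
    h = np.zeros_like(A, dtype=float)
    states = []
    for x_t, d_t in zip(x, deltas):
        h = np.exp(d_t * A) * h + d_t * B * x_t
        states.append(h.copy())
    return np.stack(states)

T = 50
x = np.zeros(T)
x[0] = 1.0                      # one informative token followed by filler
A = np.array([-0.5, -1.0])      # stable diagonal dynamics
B = np.ones(2)

# Input-independent step size: every token advances the decay.
fixed_deltas = np.full(T, 1.0)

# Hypothetical input-dependent step size: advance only on non-zero tokens.
selective_deltas = np.where(x != 0, 1.0, 0.0)

h_fixed = diagonal_ssm_scan(x, fixed_deltas, A, B)
h_selective = diagonal_ssm_scan(x, selective_deltas, A, B)
```

With the fixed step, the state written at the first token shrinks by a factor exp(A) at every subsequent position and is negligible by the end of the sequence. With the input-dependent step, the filler tokens leave the state untouched, so `h_selective[-1]` equals `h_selective[0]`.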
Taxonomy
Topics: Formal Methods in Verification · Parallel Computing and Optimization Techniques · Ferroelectric and Negative Capacitance Devices
