Characterizing Mamba's Selective Memory using Auto-Encoders
Tamanna Hossain, Robert L. Logan IV, Ganesh Jagadeesan, Sameer Singh, Joel Tetreault, Alejandro Jaimes

TL;DR
This paper investigates which types of information state space model (SSM) language models, specifically Mamba, tend to forget during inference. It finds a bias toward forgetting less prevalent tokens, such as math symbols and entity mentions, and proposes a method for analyzing this loss.
Contribution
The paper introduces a novel auto-encoder approach for characterizing information loss in SSM LMs, identifying the token types and sequences that are forgotten most often.
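The comparison at the core of this approach, checking which input tokens fail to come back out of a reconstruction decoded from the SSM hidden state, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes a position-aligned exact-match criterion (a token counts as forgotten if the reconstruction does not reproduce it at the same position) and hand-assigned token-type labels. The helper names `forgotten_flags` and `forget_rate_by_type` are hypothetical, and the reconstruction is invented rather than produced by a trained decoder.

```python
from collections import defaultdict


def forgotten_flags(inputs: list[str], reconstruction: list[str]) -> list[bool]:
    """Assumed criterion: an input token is 'forgotten' if the reconstruction
    does not contain the same token at the same position."""
    flags = []
    for i, token in enumerate(inputs):
        recovered = i < len(reconstruction) and reconstruction[i] == token
        flags.append(not recovered)
    return flags


def forget_rate_by_type(
    inputs: list[str], reconstruction: list[str], types: list[str]
) -> dict[str, float]:
    """Fraction of tokens of each (hypothetical) type that were not recovered."""
    totals: dict[str, int] = defaultdict(int)
    lost: dict[str, int] = defaultdict(int)
    for token_type, forgotten in zip(types, forgotten_flags(inputs, reconstruction)):
        totals[token_type] += 1
        lost[token_type] += forgotten  # bool adds as 0 or 1
    return {t: lost[t] / totals[t] for t in totals}


# Toy data: a math-flavoured input and an invented reconstruction.
inputs = ["Let", "x", "=", "42", "and", "y", "=", "7"]
reconstruction = ["Let", "x", "=", "12", "and", "z", "=", "7"]
types = ["word", "math", "math", "math", "word", "math", "math", "math"]

print(forget_rate_by_type(inputs, reconstruction, types))
# {'word': 0.0, 'math': 0.3333333333333333}
```

Exact match is the simplest possible criterion; a real analysis might instead align sequences or use the decoder's token probabilities, which this sketch does not attempt.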
Findings
Math-related tokens are more often forgotten.
Less prevalent tokens tend to be lost.
Mamba shows higher information loss on certain entity mentions.
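A finding such as "less prevalent tokens tend to be lost" could be checked by bucketing tokens by how often they occur in a reference corpus and comparing forget rates across buckets. The sketch below assumes per-token forgotten flags are already available, uses an arbitrary two-bucket threshold, and runs on invented counts; the function name `forget_rate_by_frequency` is hypothetical, and the sketch does not reproduce the paper's analysis.

```python
from collections import defaultdict


def forget_rate_by_frequency(
    observations: list[tuple[str, bool]],
    corpus_counts: dict[str, int],
    threshold: int,
) -> dict[str, float]:
    """Split observed tokens into 'rare' / 'frequent' buckets by corpus count
    (unseen tokens count as 0) and return the fraction forgotten per bucket."""
    totals: dict[str, int] = defaultdict(int)
    lost: dict[str, int] = defaultdict(int)
    for token, forgotten in observations:
        bucket = "rare" if corpus_counts.get(token, 0) < threshold else "frequent"
        totals[bucket] += 1
        lost[bucket] += forgotten
    return {b: lost[b] / totals[b] for b in totals}


# Invented (token, was_forgotten) pairs and made-up pretraining counts.
observations = [("the", False), ("of", False), ("∫", True),
                ("Zanzibar", True), ("and", False), ("λ", False)]
counts = {"the": 1_000_000, "of": 800_000, "and": 900_000,
          "∫": 50, "λ": 300, "Zanzibar": 20}

print(forget_rate_by_frequency(observations, counts, threshold=1_000))
# {'frequent': 0.0, 'rare': 0.6666666666666666}
```

Two buckets keep the example readable; with real data, finer frequency bins or a rank correlation between frequency and forget rate would give a less threshold-dependent picture.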
Abstract
State space models (SSMs) are a promising alternative to transformers for language modeling because they use fixed memory during inference. However, this fixed memory usage requires some information loss in the hidden state when processing long sequences. While prior work has studied the sequence length at which this information loss occurs, it does not characterize the types of information SSM language models (LMs) tend to forget. In this paper, we address this knowledge gap by identifying the types of tokens (e.g., parts of speech, named entities) and sequences (e.g., code, math problems) that are more frequently forgotten by SSM LMs. We achieve this by training an auto-encoder to reconstruct sequences from the SSM's hidden state, and measure information loss by comparing inputs with their reconstructions. We perform experiments using the Mamba family of SSM LMs (130M–1.4B) on…
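The premise that fixed memory forces information loss can be seen in a toy recurrence. The sketch below is a deliberately simplified, scalar-input, diagonal linear recurrence with an input-dependent step size, loosely in the spirit of a selective SSM. It is not Mamba's actual parameterization: it omits the input-dependent B and C projections and the block's convolution and gating, and the decay rates and inputs are arbitrary assumptions. It shows two things: the state keeps the same size however long the input is, and the first token's influence on the final state fades as the sequence grows.

```python
import math


def softplus(z: float) -> float:
    return math.log1p(math.exp(z))


def run_ssm(xs: list[float], decay_rates: list[float], w_delta: float = 1.0) -> list[float]:
    """Toy recurrence: h_t[i] = exp(-delta_t * lam_i) * h_{t-1}[i] + delta_t * x_t,
    with input-dependent step size delta_t = softplus(w_delta * x_t).
    The state always has len(decay_rates) entries, regardless of len(xs)."""
    h = [0.0] * len(decay_rates)
    for x in xs:
        delta = softplus(w_delta * x)
        h = [math.exp(-delta * lam) * h_i + delta * x for h_i, lam in zip(h, decay_rates)]
    return h


decay_rates = [0.1, 0.5, 1.0]  # arbitrary per-channel decay rates
for length in (1, 10, 100):
    tail = [0.5] * (length - 1)
    # Two sequences that differ only in their first token.
    h_a = run_ssm([1.0] + tail, decay_rates)
    h_b = run_ssm([-1.0] + tail, decay_rates)
    gap = max(abs(a - b) for a, b in zip(h_a, h_b))
    print(f"length={length:3d} state_size={len(h_a)} first-token gap={gap:.2e}")
```

The state size printed stays at 3 while the gap between the two final states shrinks with length, so after enough tokens the state barely distinguishes the two first tokens. Recovering what survives in such a state is the role the auto-encoder plays in the paper.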
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling
