Lost in State Space: Probing Frozen Mamba Representations
Bhagyashree Wagh, Akash Singh

TL;DR
This paper tests whether token-level outputs read from a frozen Mamba backbone at fixed patch boundaries can serve as semantic sentence summaries. They do not consistently outperform simple mean pooling, and the authors identify two structural pathologies in the representations: severe anisotropy and representational collapse.
Contribution
An empirical evaluation of frozen sentence representations from a pretrained Mamba-130M backbone across five benchmarks (SST-2, CoLA, MRPC, STS-B, IMDb), which reveals structural pathologies and proposes a modified recurrence method.
Findings
Patch boundary readouts do not consistently outperform simple mean pooling.
Severe anisotropy observed in representations (mean pairwise cosine similarity 0.9999, std 0.000044).
Representational collapse confirmed in final states.
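The anisotropy figure above is a mean pairwise cosine similarity over sentence embeddings. A minimal sketch of how such a statistic could be computed follows. The function names, the toy vectors, and the use of the population standard deviation are assumptions made for illustration, not the authors' code.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mean_pairwise_cosine(embeddings):
    """Mean and population std of cosine similarity over all unordered pairs.

    Assumption: the paper may use a different estimator (e.g. sample std).
    """
    sims = [cosine(u, v) for u, v in combinations(embeddings, 2)]
    mean = sum(sims) / len(sims)
    var = sum((s - mean) ** 2 for s in sims) / len(sims)
    return mean, math.sqrt(var)


# Hypothetical toy embeddings that share one dominant direction,
# which is the situation a value close to 1.0 indicates.
toy = [[10.0, 10.0, 0.1], [10.0, 10.0, -0.1], [10.0, 10.1, 0.0]]
mean_sim, std_sim = mean_pairwise_cosine(toy)
```

A mean close to 1 with a tiny spread means nearly all embeddings point the same way, so raw cosine distances carry little information for telling sentences apart.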
Abstract
Mamba's recurrent state h_t is, by construction, a compressed summary of every token seen so far. This raises a tempting hypothesis: if we extract token-level outputs y_t at fixed patch boundaries, we obtain semantic sentence summaries for free, with no pooling head, no fine-tuning, and no [CLS] token. We test this hypothesis carefully. Across five benchmarks (SST-2, CoLA, MRPC, STS-B, IMDb), we compare four strategies for extracting frozen sentence representations from a pretrained Mamba-130M backbone under a strict frozen-feature probing protocol, using three random seeds where computationally feasible. The results do not support the hypothesis: patch boundary readouts do not consistently outperform simple mean pooling. We identify and quantify two structural pathologies: severe anisotropy (mean pairwise cosine similarity 0.9999, std 0.000044) and representational collapse in the raw…
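To make the compared readouts concrete, here is a small sketch of three ways to turn a sequence of token-level outputs y_t into one sentence vector: mean pooling, a last-token readout, and a patch boundary readout. The patch size, the averaging of boundary outputs, and the handling of a ragged final patch are assumptions chosen for illustration. The excerpt does not specify the paper's four strategies in this detail.

```python
def mean_pool(outputs):
    """Average the token outputs dimension-wise."""
    n = len(outputs)
    return [sum(col) / n for col in zip(*outputs)]


def last_token(outputs):
    """Use only the final token's output as the sentence vector."""
    return list(outputs[-1])


def patch_boundary_readout(outputs, patch_size):
    """Read y_t at the end of each fixed-size patch, then average the readouts.

    Assumptions: boundaries fall at t = patch_size - 1, 2 * patch_size - 1, ...
    and the final token is also read when the last patch is incomplete.
    """
    idx = list(range(patch_size - 1, len(outputs), patch_size))
    if not idx or idx[-1] != len(outputs) - 1:
        idx.append(len(outputs) - 1)
    return mean_pool([outputs[i] for i in idx])


# Hypothetical token outputs for a 5-token sentence with hidden size 2.
y = [[1.0, 0.0], [2.0, 0.0], [3.0, 2.0], [4.0, 2.0], [5.0, 6.0]]
pooled = mean_pool(y)
final = last_token(y)
boundary = patch_boundary_readout(y, patch_size=2)
```

In a frozen-feature probing setup, each of these vectors would be fed unchanged to a lightweight classifier or regressor, so any difference in downstream scores reflects the readout strategy rather than backbone updates.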
