Revealing and Mitigating the Local Pattern Shortcuts of Mamba
Wangjie You, Zecheng Tang, Juntao Li, Lili Yao, Min Zhang

TL;DR
This paper identifies local pattern shortcuts in Mamba, a linear-complexity model based on State Space Models (SSMs), and introduces a global selection module that improves its handling of distributed key information, substantially boosting performance on tasks that depend on such information.
Contribution
The paper reveals Mamba's reliance on local pattern shortcuts and proposes a global selection module to enhance its ability to process distributed key information.
Findings
Mamba performs well on local tasks but struggles with distributed information.
Adding a global selection module improves Mamba's performance on complex tasks.
The proposed method increases performance from 0 to 80.54 points with only 4M extra parameters.
Abstract
Large language models (LLMs) have advanced significantly due to the attention mechanism, but their quadratic complexity and linear memory demands limit their performance on long-context tasks. Recently, researchers introduced Mamba, an advanced model built upon State Space Models (SSMs) that offers linear complexity and constant memory. Although Mamba is reported to match or surpass the performance of attention-based models, our analysis reveals a performance gap: Mamba excels in tasks that involve localized key information but faces challenges with tasks that require handling distributed key information. Our controlled experiments suggest that this inconsistency arises from Mamba's reliance on local pattern shortcuts, which enable the model to remember local key information within its limited memory but hinder its ability to retain more dispersed information. Therefore, we introduce a…
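To make the "linear complexity and constant memory" property concrete, here is a minimal pure-Python sketch of a diagonal selective state-space recurrence in the spirit of Mamba. It is a simplification and not the authors' code, nor their global selection module: it assumes a single input channel, a zero-order-hold decay exp(Δ·A) for A, a simplified Euler step Δ·B for B, and the hypothetical scalar weights `w_B`, `w_C`, `w_delta`, `b_delta` stand in for learned linear projections.

```python
import math
from typing import List, Sequence, Tuple


def softplus(z: float) -> float:
    """Numerically stable softplus: log(1 + e^z)."""
    return math.log1p(math.exp(-abs(z))) + max(z, 0.0)


def selective_scan(
    xs: Sequence[float],
    A: Sequence[float],    # diagonal state matrix (negative entries give decay)
    w_B: Sequence[float],  # hypothetical weights producing input-dependent B_t
    w_C: Sequence[float],  # hypothetical weights producing input-dependent C_t
    w_delta: float,        # hypothetical weight producing the step size Delta_t
    b_delta: float,        # hypothetical bias for Delta_t
) -> Tuple[List[float], List[float]]:
    """Run a simplified selective SSM over xs; return outputs and final state."""
    n = len(A)
    h = [0.0] * n  # fixed-size hidden state: memory does not grow with len(xs)
    ys: List[float] = []
    for x in xs:  # single pass over the sequence: O(L * N) time
        # Selectivity: the step size and B_t, C_t all depend on the current input.
        delta = softplus(w_delta * x + b_delta)
        B = [wb * x for wb in w_B]
        C = [wc * x for wc in w_C]
        for i in range(n):
            # Decay the old state, then write the current input into it.
            h[i] = math.exp(delta * A[i]) * h[i] + delta * B[i] * x
        ys.append(sum(c * hi for c, hi in zip(C, h)))
    return ys, h


if __name__ == "__main__":
    ys, state = selective_scan(
        [0.5, -1.0, 2.0, 0.0], A=[-1.0, -0.5],
        w_B=[1.0, 0.3], w_C=[0.2, 1.0], w_delta=1.0, b_delta=0.0,
    )
    print(len(ys), len(state))  # prints "4 2"
```

However long the input, everything the model can recall about earlier tokens must fit in the `n` numbers of `h`, whereas causal attention keeps a key/value cache that grows linearly with sequence length. This fixed budget is the "limited memory" the abstract refers to.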
Taxonomy
Methods: Softmax · Attention Is All You Need · Mamba: Linear-Time Sequence Modeling with Selective State Spaces
