Exploring the Limitations of Mamba in COPY and CoT Reasoning
Ruifeng Ren, Zhicong Li, Yong Liu

TL;DR
This paper critically examines Mamba's capabilities in sequence modeling. It shows that although Mamba can match Transformers on some tasks, it faces limitations in the COPY operation and Chain of Thought (CoT) reasoning, particularly when its size is held constant.
Contribution
The paper provides a detailed analysis of Mamba's expressive power, identifying where it falls short and the conditions under which it can or cannot match Transformers.
Findings
Mamba struggles with COPY operations at constant size.
Letting Mamba's size grow linearly with the sequence length enables COPY but gives up its efficiency advantage.
Mamba's performance on CoT tasks is limited compared to Transformers.
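
The intuition behind the first two findings can be illustrated with a toy experiment. The sketch below is not Mamba and not the paper's construction or proof: it assumes a linear-attention-style memory that writes key-value pairs into a matrix whose size is fixed, and compares it with a cache that keeps every pair and reads it back with sharp softmax attention. The helper names (build_keys, build_values, fixed_state_copy_error, growing_cache_copy_error), the key construction, and the temperature beta are all hypothetical choices made for illustration.

```python
import numpy as np


def build_keys(n, d):
    """Return n distinct unit-norm keys in R^d (hypothetical toy construction).

    The first d keys are the standard basis vectors, so they are mutually
    orthogonal. Any further keys are normalized sums of adjacent basis
    vectors and therefore overlap with earlier keys.
    """
    if d < 3 or n > 2 * d:
        raise ValueError("this toy construction needs d >= 3 and n <= 2*d")
    basis = np.eye(d)
    extras = np.array(
        [(basis[i] + basis[(i + 1) % d]) / np.sqrt(2.0) for i in range(d)]
    )
    return np.vstack([basis, extras])[:n]


def build_values(n, d_v=2):
    """Deterministic, distinct value vectors standing in for the tokens to copy."""
    return np.arange(1, n * d_v + 1, dtype=float).reshape(n, d_v)


def fixed_state_copy_error(n, d):
    """Write n (key, value) pairs into a d_v x d state, then read each one back."""
    K, V = build_keys(n, d), build_values(n)
    S = np.zeros((V.shape[1], d))  # state size does not depend on n
    for k, v in zip(K, V):
        S += np.outer(v, k)  # assumed linear-attention-style update
    recalled = K @ S.T  # query the state with every stored key
    return float(np.max(np.abs(recalled - V)))


def growing_cache_copy_error(n, d, beta=100.0):
    """Keep all n pairs (memory grows with n); read back with sharp softmax attention."""
    K, V = build_keys(n, d), build_values(n)
    scores = beta * (K @ K.T)
    scores -= scores.max(axis=1, keepdims=True)  # subtract row max for stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return float(np.max(np.abs(weights @ V - V)))


if __name__ == "__main__":
    d = 4
    for n in (4, 8):
        print(n, fixed_state_copy_error(n, d), growing_cache_copy_error(n, d))
```

With d = 4, the fixed state recalls 4 pairs without error because the keys are orthogonal, but 8 pairs interfere with one another since more than d keys cannot all be orthogonal in d dimensions. The growing cache recalls both cases almost exactly, at the cost of memory that scales with the number of stored pairs.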
Abstract
Transformers have become the backbone of modern Large Language Models (LLMs); however, their inference overhead grows linearly with the sequence length, posing challenges for modeling long sequences. In light of this, Mamba has attracted attention for maintaining a constant inference size, with empirical evidence demonstrating that it can match Transformer performance in sequence modeling while significantly reducing computational costs. However, an open question remains: can Mamba always bring savings while achieving performance comparable to Transformers? In this paper, we focus on analyzing the expressive ability of Mamba to perform our defined COPY operation and Chain of Thought (CoT) reasoning. First, inspired by the connection between Mamba and linear attention, we show that constant-sized Mamba may struggle to perform COPY operations while Transformers can handle them more…
Taxonomy
Topics: African history and culture studies
Methods: Softmax · Attention Is All You Need · Mamba: Linear-Time Sequence Modeling with Selective State Spaces · Focus
