Exploring the Limitations of Mamba in COPY and CoT Reasoning

Ruifeng Ren; Zhicong Li; Yong Liu

arXiv:2410.03810·cs.LG·May 30, 2025

TL;DR

This paper critically examines Mamba's capabilities in sequence modeling, showing that although Mamba can match Transformers on some tasks, it is limited in COPY operations and Chain of Thought (CoT) reasoning, particularly when its size is held constant.

Contribution

The paper provides a detailed analysis of Mamba's expressive power, identifying its limitations and the conditions under which it can or cannot match Transformers.

Findings

1. Constant-sized Mamba struggles to perform COPY operations.

2. Letting Mamba's size grow linearly with the sequence length enables COPY, but forfeits its efficiency advantage.

3. Mamba's performance on CoT tasks is limited compared to Transformers.

Abstract

Transformers have become the backbone of modern Large Language Models (LLMs); however, their inference overhead grows linearly with the sequence length, posing challenges for modeling long sequences. In light of this, Mamba has attracted attention for maintaining a constant inference size, with empirical evidence demonstrating that it can match Transformer performance in sequence modeling while significantly reducing computational costs. However, an open question remains: can Mamba always bring savings while achieving performance comparable to Transformers? In this paper, we focus on analyzing the expressive ability of Mamba to perform our defined COPY operation and Chain of Thought (CoT) reasoning. First, inspired by the connection between Mamba and linear attention, we show that constant-sized Mamba may struggle to perform COPY operations while Transformers can handle them more…
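The contrast between a constant-sized state and a cache that grows with the sequence can be illustrated with a toy experiment. The sketch below is not the paper's construction and is not Mamba itself: it assumes a linear-attention-style memory (a fixed `vocab x d` state that sums keys per token) and models COPY as position-keyed recall, both of which are simplifications chosen here for illustration. All function names, the key scheme in `make_keys`, and the parameters `d`, `vocab`, and `beta` are hypothetical.

```python
import math
import random


def make_keys(n, d, rng):
    """Return n unit-norm keys of dimension d (an illustrative key scheme).

    If n <= d the keys are distinct standard basis vectors, so they are
    exactly orthogonal. Otherwise they are random unit vectors, which
    cannot all be mutually orthogonal in d dimensions.
    """
    if n <= d:
        return [[1.0 if c == i else 0.0 for c in range(d)] for i in range(n)]
    keys = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(x * x for x in v))
        keys.append([x / norm for x in v])
    return keys


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def argmax(scores):
    return max(range(len(scores)), key=scores.__getitem__)


def linear_state_recall(keys, tokens, vocab):
    """Recall tokens from a fixed-size state (assumed linear-attention form).

    The state has vocab * d entries no matter how long the sequence is:
    state[t] accumulates the keys of every position holding token t.
    """
    d = len(keys[0])
    state = [[0.0] * d for _ in range(vocab)]
    for k, t in zip(keys, tokens):
        for c in range(d):
            state[t][c] += k[c]
    # Querying with a position's key returns signal plus interference
    # from every other stored position.
    return [argmax([dot(row, q) for row in state]) for q in keys]


def softmax_cache_recall(keys, tokens, vocab, beta=200.0):
    """Recall tokens by sharp softmax attention over a cache of all keys.

    The cache grows linearly with the sequence length.
    """
    out = []
    for q in keys:
        logits = [beta * dot(k, q) for k in keys]
        top = max(logits)
        weights = [math.exp(l - top) for l in logits]
        scores = [0.0] * vocab
        for w, t in zip(weights, tokens):
            scores[t] += w
        out.append(argmax(scores))
    return out


def accuracy(pred, target):
    return sum(p == t for p, t in zip(pred, target)) / len(target)


def run(n, d=16, vocab=8, seed=0):
    rng = random.Random(seed)
    tokens = [rng.randrange(vocab) for _ in range(n)]
    keys = make_keys(n, d, rng)
    linear_acc = accuracy(linear_state_recall(keys, tokens, vocab), tokens)
    softmax_acc = accuracy(softmax_cache_recall(keys, tokens, vocab), tokens)
    return linear_acc, softmax_acc


short_linear, short_softmax = run(n=8)    # sequence fits within d
long_linear, long_softmax = run(n=256)    # sequence far exceeds d
print("n=8:   linear state", short_linear, "| softmax cache", short_softmax)
print("n=256: linear state", long_linear, "| softmax cache", long_softmax)
```

In this toy setting the fixed state recalls every token while the sequence length stays within `d`, and loses tokens to interference once it does not, whereas the cache-based lookup stays exact at the cost of memory that grows with the sequence. This is only an intuition for why a constant-sized model may struggle with COPY; the paper's formal statements should be taken from the paper itself.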

Taxonomy

Methods: Softmax · Attention Is All You Need · Mamba: Linear-Time Sequence Modeling with Selective State Spaces · Focus