Understanding and Mitigating Bottlenecks of State Space Models through the Lens of Recency and Over-smoothing

Peihao Wang; Ruisi Cai; Yuehao Wang; Jiajun Zhu; Pragya Srivastava; Zhangyang Wang; Pan Li

arXiv:2501.00658·cs.LG·March 12, 2025

TL;DR

This paper investigates the inherent limitations of Structured State Space Models (SSMs) related to recency bias and over-smoothing, proposing a polarization method to improve long-range dependency learning and model scalability.

Contribution

The paper reveals the fundamental trade-off between recency bias and over-smoothing in SSMs and introduces a polarization technique to mitigate these issues, enabling deeper and more effective models.

Findings

01 Polarization improves long-range token recall accuracy.

02 Deeper SSMs benefit from the proposed polarization technique.

03 Theoretical analysis links depth, over-smoothing, and recency bias in SSMs.
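To make the link between depth and over-smoothing concrete, here is a minimal Python sketch under simplifying assumptions. It is not the paper's analysis or code: each "layer" is a hypothetical scalar causal moving average (`ema_layer`), and `sharpness` is an illustrative measure defined here as the largest gap between neighbouring token values. Stacking such smoothing layers can only shrink that gap, so tokens become harder to tell apart as depth grows.

```python
def ema_layer(xs, a):
    """One causal smoothing layer: y_t = a * y_{t-1} + (1 - a) * x_t, with y_0 = x_0.

    Hypothetical stand-in for an SSM layer; 0 < a < 1 is an assumed decay rate.
    """
    ys = [xs[0]]
    for x in xs[1:]:
        ys.append(a * ys[-1] + (1.0 - a) * x)
    return ys


def sharpness(xs):
    """Largest absolute gap between neighbouring token values (illustrative metric)."""
    return max(abs(u - v) for u, v in zip(xs[1:], xs[:-1]))


# A maximally "sharp" toy input: token values alternate between +1 and -1.
tokens = [1.0 if t % 2 == 0 else -1.0 for t in range(16)]

# Track sharpness after 0, 1, ..., 6 stacked smoothing layers.
sharpness_per_depth = [sharpness(tokens)]
layer_out = tokens
for _ in range(6):
    layer_out = ema_layer(layer_out, a=0.5)
    sharpness_per_depth.append(sharpness(layer_out))

for depth, value in enumerate(sharpness_per_depth):
    print(f"depth {depth}: sharpness {value:.4f}")
```

In this toy setting the printed sharpness never increases with depth, which mirrors the qualitative claim that deeper stacks push token representations toward one another.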

Abstract

Structured State Space Models (SSMs) have emerged as alternatives to transformers. While SSMs are often regarded as effective in capturing long-sequence dependencies, we rigorously demonstrate that they are inherently limited by strong recency bias. Our empirical studies also reveal that this bias impairs the models' ability to recall distant information and introduces robustness issues. Our scaling experiments then show that deeper structures in SSMs can facilitate the learning of long contexts. However, subsequent theoretical analysis reveals that as SSMs increase in depth, they exhibit another inevitable tendency toward over-smoothing, i.e., token representations become increasingly indistinguishable. This fundamental dilemma between recency and over-smoothing hinders the scalability of existing SSMs. Inspired by our theoretical findings, we propose to polarize two channels…
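The recency bias described in the abstract can be illustrated with a small Python sketch. It assumes a one-dimensional linear recurrence h_t = a * h_{t-1} + b * x_t with a fixed decay 0 < a < 1. This is a textbook simplification, not the paper's model, and the names `influence_weights` and `run_recurrence` are hypothetical. Unrolling the recurrence shows that input x_s contributes to the final state with weight b * a^(T-1-s), so older tokens are discounted exponentially.

```python
def influence_weights(a, b, seq_len):
    """Weight of each input x_s in the final state of h_t = a * h_{t-1} + b * x_t.

    Unrolling from h = 0 gives h_final = sum_s b * a**(seq_len - 1 - s) * x_s.
    """
    return [b * a ** (seq_len - 1 - s) for s in range(seq_len)]


def run_recurrence(a, b, xs):
    """Run the scalar recurrence step by step and return the final state."""
    h = 0.0
    for x in xs:
        h = a * h + b * x
    return h


# Assumed toy parameters: decay 0.9, input gain 0.1, 50 tokens.
weights = influence_weights(a=0.9, b=0.1, seq_len=50)

# The closed-form weights agree with the step-by-step recurrence.
xs = [float(s % 7) for s in range(50)]
closed_form = sum(w * x for w, x in zip(weights, xs))
stepwise = run_recurrence(0.9, 0.1, xs)

print(f"weight of oldest token: {weights[0]:.6f}")
print(f"weight of newest token: {weights[-1]:.6f}")
print(f"closed form vs. stepwise: {closed_form:.6f} vs. {stepwise:.6f}")
```

The weights grow monotonically toward the most recent token, which is the sense in which such a recurrence favours recent inputs over distant ones.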

Code & Models

Repositories

vita-group/ssm-bottleneck (PyTorch, official)

Models

peihaowang/ssm-bottleneck-imgcls-attack (Hugging Face model)

Taxonomy

Topics: Bayesian Modeling and Causal Inference