Block-Biased Mamba for Long-Range Sequence Processing

Annan Yu; N. Benjamin Erichson

arXiv:2505.09022 · cs.LG · May 15, 2025

TL;DR

This paper identifies limitations of Mamba in long-range sequence tasks, analyzes the causes, and proposes B2S6, an extension that improves its expressiveness, stability, and performance on long-range benchmarks.

Contribution

The paper provides a theoretical analysis of Mamba's shortcomings in expressiveness, inductive bias, and training stability, and introduces B2S6, an extension of Mamba's S6 unit that improves long-range sequence processing.

Findings

01. B2S6 outperforms S4 and S4D on Long-Range Arena tasks.

02. B2S6 maintains Mamba's performance on language modeling.

03. Theoretical analysis reveals Mamba's limitations in expressiveness, inductive bias, and stability.

Abstract

Mamba extends earlier state space models (SSMs) by introducing input-dependent dynamics, and has demonstrated strong empirical performance across a range of domains, including language modeling, computer vision, and foundation models. However, a surprising weakness remains: despite being built on architectures designed for long-range dependencies, Mamba performs poorly on long-range sequential tasks. Understanding and addressing this gap is important for improving Mamba's universality and versatility. In this work, we analyze Mamba's limitations through three perspectives: expressiveness, inductive bias, and training stability. Our theoretical results show how Mamba falls short in each of these aspects compared to earlier SSMs such as S4D. To address these issues, we propose B2S6, a simple extension of Mamba's S6 unit that combines block-wise selective dynamics with a…
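The abstract's central contrast is between S4D, whose dynamics are fixed, and Mamba's S6 unit, whose dynamics depend on the input. The following is a minimal NumPy sketch of that contrast under simplifying assumptions: a single input channel, a real negative diagonal state matrix A, a zero-order-hold decay exp(Δ·A) paired with the simpler Euler approximation Δ·B for the input term, and per-step projections `W_B`, `W_C`, `w_dt`, `b_dt` that are hypothetical stand-ins for Mamba's learned linear layers. It is not the paper's B2S6 unit and not Mamba's hardware-aware parallel scan.

```python
import numpy as np

# Sketch under simplifying assumptions; not the paper's B2S6 or Mamba's fused kernel.


def softplus(z):
    # Numerically stable softplus keeps the step size positive.
    return np.logaddexp(0.0, z)


def lti_scan(u, A, B, C, dt):
    """S4D-style diagonal SSM: the dynamics are fixed, independent of u.

    u: (L,) input sequence; A, B, C: (N,) diagonal parameters with A < 0;
    dt: scalar step size.
    """
    A_bar = np.exp(dt * A)  # same decay factor at every step
    B_bar = dt * B          # Euler approximation of the input matrix
    x = np.zeros_like(A)
    y = np.empty(len(u))
    for t, u_t in enumerate(u):
        x = A_bar * x + B_bar * u_t
        y[t] = C @ x
    return y


def selective_scan(u, A, W_B, W_C, w_dt, b_dt):
    """Simplified S6-style scan: step size, B and C depend on the input.

    W_B, W_C (N,) and w_dt, b_dt (scalars) are hypothetical projection
    weights standing in for Mamba's learned linear layers.
    """
    x = np.zeros_like(A)
    y = np.empty(len(u))
    for t, u_t in enumerate(u):
        dt = softplus(w_dt * u_t + b_dt)  # input-dependent step size
        B_t = W_B * u_t                   # input-dependent input matrix
        C_t = W_C * u_t                   # input-dependent output matrix
        x = np.exp(dt * A) * x + dt * B_t * u_t
        y[t] = C_t @ x
    return y
```

Because `lti_scan` applies the same decay and input matrix at every step, it is a linear time-invariant filter: doubling the input doubles the output. `selective_scan` recomputes Δ, B and C from each input value, so that scaling property no longer holds. This input dependence is what the abstract refers to as Mamba's input-dependent dynamics.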

Taxonomy

Topics: Blind Source Separation Techniques

Methods: Mamba: Linear-Time Sequence Modeling with Selective State Spaces