2Mamba2Furious: Linear in Complexity, Competitive in Accuracy

Gabriel Mongaras; Eric C. Larson

arXiv:2602.17363·cs.LG·May 18, 2026

TL;DR

This paper introduces 2Mamba, a simplified linear attention method that nearly matches softmax attention accuracy while being more memory-efficient for long sequences.

Contribution

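The paper first reduces Mamba-2 to its essential components (Mamba-2S), then improves the A-mask and raises the order of the hidden state to obtain 2Mamba. The result narrows the accuracy gap with softmax attention while using much less memory at long context lengths.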

Findings

01 2Mamba nearly matches softmax attention in accuracy

02 Simplifying Mamba-2 identifies which components matter most for accuracy

03 Code for all experiments is publicly available

Abstract

Linear attention transformers have become a strong alternative to softmax attention due to their efficiency. However, linear attention tends to be less expressive and results in reduced accuracy compared to softmax attention. To bridge the accuracy gap between softmax attention and linear attention, we manipulate Mamba-2, a very strong linear attention variant. We first simplify Mamba-2 down to its most fundamental and important components, evaluating which specific choices make it most accurate. From this simplified Mamba variant (Mamba-2S), we improve the A-mask and increase the order of the hidden state, resulting in a method, which we call 2Mamba, that is nearly as accurate as softmax attention, yet much more memory-efficient for long context lengths. We also investigate elements of Mamba-2 that help it surpass softmax attention accuracy. Code is provided for all our experiments.
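To make the "linear in complexity" claim concrete, the following is a minimal sketch of generic decay-masked linear attention, the family Mamba-2 belongs to. It is not the authors' implementation and does not reproduce 2Mamba's improved A-mask or higher-order hidden state. The function names, the scalar per-step decay `a`, and the toy shapes are all assumptions made for illustration. The sketch compares a parallel form, which weights every query-key pair by an accumulated decay, against a recurrent form that carries a single fixed-size state.

```python
import math
import random


def masked_quadratic(q, k, v, a):
    """Parallel form: visits every (t, s) pair, so cost grows quadratically in T."""
    T, d_v = len(q), len(v[0])
    out = []
    for t in range(T):
        y = [0.0] * d_v
        for s in range(t + 1):
            # Decay accumulated between step s and step t: a[s+1] * ... * a[t].
            # math.prod of an empty slice is 1, which covers the s == t case.
            decay = math.prod(a[s + 1 : t + 1])
            score = sum(qi * ki for qi, ki in zip(q[t], k[s]))
            for j in range(d_v):
                y[j] += decay * score * v[s][j]
        out.append(y)
    return out


def recurrent_linear(q, k, v, a):
    """Recurrent form: one d_k x d_v state updated once per step, linear in T."""
    d_k, d_v = len(k[0]), len(v[0])
    H = [[0.0] * d_v for _ in range(d_k)]  # state size is independent of T
    out = []
    for q_t, k_t, v_t, a_t in zip(q, k, v, a):
        for i in range(d_k):
            for j in range(d_v):
                # H_t = a_t * H_{t-1} + k_t v_t^T
                H[i][j] = a_t * H[i][j] + k_t[i] * v_t[j]
        # y_t = q_t^T H_t
        out.append([sum(q_t[i] * H[i][j] for i in range(d_k)) for j in range(d_v)])
    return out


# Toy inputs (hypothetical sizes, random values).
random.seed(0)
T, d_k, d_v = 6, 3, 2
q = [[random.gauss(0, 1) for _ in range(d_k)] for _ in range(T)]
k = [[random.gauss(0, 1) for _ in range(d_k)] for _ in range(T)]
v = [[random.gauss(0, 1) for _ in range(d_v)] for _ in range(T)]
a = [random.uniform(0.5, 1.0) for _ in range(T)]  # per-step decay in (0, 1]

y_quad = masked_quadratic(q, k, v, a)
y_rec = recurrent_linear(q, k, v, a)
max_diff = max(abs(x - y) for r1, r2 in zip(y_quad, y_rec) for x, y in zip(r1, r2))
```

Both forms compute the same outputs up to floating-point error, but the recurrent form only ever stores the `d_k` by `d_v` state, which is why memory stays flat as the context grows. Softmax attention has no equivalent fixed-size state and must keep all past keys and values.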
