Differential Mamba

Nadav Schneider; Itamar Zimerman; Eliya Nachmani

arXiv:2507.06204 · cs.LG · October 30, 2025

Open Access · 1 Repo

TL;DR

This paper introduces a differential mechanism for the Mamba architecture that mitigates the overallocation of attention to irrelevant context and improves language modeling performance, particularly on retrieval tasks.

Contribution

We adapt differential design techniques, originally developed for Transformers, to Mamba, developing a new mechanism that improves Mamba's efficiency and effectiveness on language modeling tasks.

Findings

01 Enhanced retrieval capabilities in Mamba-based models

02 Superior performance over vanilla Mamba on benchmarks

03 Effective mitigation of attention overallocation issues

Abstract

Sequence models like Transformers and RNNs often overallocate attention to irrelevant context, leading to noisy intermediate representations. This degrades LLM capabilities by promoting hallucinations, weakening long-range and retrieval abilities, and reducing robustness. Recent work has shown that differential design can mitigate this issue in Transformers, improving their effectiveness across various applications. In this paper, we explore whether these techniques, originally developed for Transformers, can be applied to Mamba, a recent architecture based on selective state-space layers that achieves Transformer-level performance with greater efficiency. We show that a naive adaptation of differential design to Mamba is insufficient and requires careful architectural modifications. To address this, we introduce a novel differential mechanism for Mamba, empirically validated on…
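To make the "differential design" mentioned in the abstract concrete, here is a minimal sketch of the Transformer-side idea from prior work: two attention maps are computed and one is subtracted from the other, scaled by a factor λ, so that weight assigned to irrelevant context by both maps tends to cancel. This is not the Mamba mechanism the paper introduces, which the truncated abstract does not describe. The function names, the toy keys and queries, and the fixed value `lam=0.2` are all assumptions chosen for illustration (in the Transformer setting λ is typically learned).

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def attention_weights(query, keys):
    """Scaled dot-product attention weights of one query over all keys."""
    scale = 1.0 / math.sqrt(len(query))
    return softmax([dot(query, k) * scale for k in keys])

def differential_weights(q1, q2, keys1, keys2, lam):
    """Difference of two attention maps: softmax(q1 K1^T) - lam * softmax(q2 K2^T)."""
    a1 = attention_weights(q1, keys1)
    a2 = attention_weights(q2, keys2)
    return [w1 - lam * w2 for w1, w2 in zip(a1, a2)]

# Hypothetical toy data: four context positions, of which index 2 is relevant.
keys1 = [[1.0, 0.0], [1.0, 0.0], [3.0, 0.0], [1.0, 0.0]]
# Assumption: the second map responds only to the uniform background.
keys2 = [[1.0, 0.0], [1.0, 0.0], [1.0, 0.0], [1.0, 0.0]]
q1 = [2.0, 0.0]
q2 = [2.0, 0.0]

plain = attention_weights(q1, keys1)                         # ordinary attention
diff = differential_weights(q1, q2, keys1, keys2, lam=0.2)   # differential attention

for i, (p, d) in enumerate(zip(plain, diff)):
    print(f"position {i}: plain={p:.4f} differential={d:.4f}")
```

In this toy setting the weight on the three irrelevant positions shrinks toward zero while the relevant position keeps most of its weight. Note that the differential weights no longer sum to 1 and can be negative in general, which is one reason such designs usually pair the subtraction with a normalization step.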

Code & Models

Repositories

nadavsc/diff-mamba (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Generative Adversarial Networks and Image Synthesis

Methods: Mamba: Linear-Time Sequence Modeling with Selective State Spaces