InfoMamba: An Attention-Free Hybrid Mamba-Transformer Model
Youjin Wang, Jiaqiao Zhao, Rong Fu, Run Zhou, Ruizhe Zhang, Jiani Liang, Suisuai Cao, Feng Zhou

TL;DR
InfoMamba introduces an attention-free hybrid model combining linear filtering and recurrent streams, effectively capturing local and global dependencies with improved efficiency and accuracy across various tasks.
Contribution
This work presents the first hybrid architecture that integrates SSMs with a global filtering layer, guided by a mutual information objective, to enhance sequence modeling.
Findings
Outperforms Transformer and SSM baselines in multiple tasks
Achieves competitive accuracy-efficiency trade-offs
Maintains near-linear computational scaling with sequence length
Abstract
Balancing fine-grained local modeling with long-range dependency capture under computational constraints remains a central challenge in sequence modeling. While Transformers provide strong token mixing, they suffer from quadratic complexity, whereas Mamba-style selective state-space models (SSMs) scale linearly but often struggle to capture high-rank and synchronous global interactions. We present a consistency boundary analysis that characterizes when diagonal short-memory SSMs can approximate causal attention and identifies structural gaps that remain. Motivated by this analysis, we propose InfoMamba, an attention-free hybrid architecture. InfoMamba replaces token-level self-attention with a concept bottleneck linear filtering layer that serves as a minimal-bandwidth global interface and integrates it with a selective recurrent stream through information-maximizing fusion (IMF). IMF…
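The complexity contrast drawn in the abstract can be illustrated with a minimal sketch. This is not the InfoMamba implementation: the function names `diagonal_ssm` and `causal_attention` are hypothetical, the parameters `a`, `b`, `c` are fixed rather than input-dependent (so the recurrence is not "selective" in the Mamba sense), and the attention operates on scalar queries, keys, and values for brevity. It only shows why a diagonal recurrence costs one pass over the sequence while causal attention revisits every earlier position.

```python
import math


def diagonal_ssm(xs, a, b, c):
    """Diagonal linear recurrence: h_t = a * h_{t-1} + b * x_t, y_t = <c, h_t>.

    Each state channel evolves independently (diagonal transition), and the
    sequence is visited once, so cost grows linearly with len(xs).
    With |a_i| < 1 the influence of past inputs decays geometrically,
    which is the "short-memory" regime.
    """
    h = [0.0] * len(a)
    ys = []
    for x in xs:
        h = [ai * hi + bi * x for ai, hi, bi in zip(a, h, b)]
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys


def causal_attention(qs, ks, vs):
    """Causal softmax attention over scalar queries, keys, and values.

    Position t scores all positions 0..t, so total cost grows
    quadratically with the sequence length.
    """
    ys = []
    for t, q in enumerate(qs):
        scores = [q * k for k in ks[: t + 1]]
        m = max(scores)  # subtract the max for numerical stability
        ws = [math.exp(s - m) for s in scores]
        z = sum(ws)
        ys.append(sum(w * v for w, v in zip(ws, vs[: t + 1])) / z)
    return ys


# Impulse response of a one-channel recurrence with decay 0.5.
print(diagonal_ssm([1.0, 0.0, 0.0, 0.0], a=[0.5], b=[1.0], c=[1.0]))
# Zero queries give uniform weights, i.e. a running mean of the values.
print(causal_attention([0.0, 0.0], ks=[1.0, 2.0], vs=[1.0, 3.0]))
```

In this toy setting the recurrence can only mix the past through a fixed geometric decay per channel, whereas attention recomputes its weights at every position. That difference is one informal way to read the "structural gaps" the abstract refers to.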
Taxonomy
Topics: Time Series Analysis and Forecasting · Generative Adversarial Networks and Image Synthesis · Data Stream Mining Techniques
