LaMamba-Diff: Linear-Time High-Fidelity Diffusion Models Based on Local Attention and Mamba
Yunxiang Fu, Chaoqi Chen, Yizhou Yu

TL;DR
LaMamba-Diff introduces a linear-time diffusion model that combines local attention and Mamba to capture both global and local features efficiently, outperforming DiT baselines in scalability and sample quality across model scales on ImageNet.
Contribution
The paper proposes LaMamba (Local Attentional Mamba) blocks that integrate local self-attention and Mamba, enabling high-fidelity diffusion models for visual generation whose cost grows linearly with the number of input tokens.
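To make the idea of one block carrying a global branch and a local branch concrete, here is a minimal NumPy sketch. It is not the authors' implementation: the names local_attention, selective_scan and lamamba_block_sketch are hypothetical, attention uses identity Q/K/V projections, the input-dependent decay is a simple sigmoid gate rather than Mamba's discretized state matrix, and combining the two branches by residual summation is an assumption made for illustration.

```python
import numpy as np

def local_attention(x, window):
    """Softmax self-attention restricted to non-overlapping windows (identity Q/K/V, an assumption).
    x: (N, d). Cost is O(N * window * d), linear in N for a fixed window."""
    n, d = x.shape
    assert n % window == 0, "sequence length must be a multiple of the window size"
    out = np.empty_like(x)
    for start in range(0, n, window):
        blk = x[start:start + window]                  # (w, d) tokens in this window
        scores = blk @ blk.T / np.sqrt(d)              # (w, w) all-pair scores, but only inside the window
        scores -= scores.max(axis=1, keepdims=True)    # numerical stability before exp
        weights = np.exp(scores)
        weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
        out[start:start + window] = weights @ blk
    return out

def selective_scan(x, a, b, c):
    """Simplified diagonal selective scan: h_t = a_t * h_{t-1} + outer(x_t, b_t), y_t = h_t @ c_t.
    x: (N, d); a, b, c: (N, s) per-token (input-dependent) parameters. Cost is O(N * d * s)."""
    n, d = x.shape
    h = np.zeros((d, a.shape[1]))        # fixed-size hidden state: the compressed global context
    y = np.empty_like(x)
    for t in range(n):
        h = a[t] * h + np.outer(x[t], b[t])  # decay the old state, write the current token
        y[t] = h @ c[t]                      # read out through the input-dependent c_t
    return y

def lamamba_block_sketch(x, window, params):
    """Hypothetical block: residual sum of a global (scan) branch and a local (window attention) branch."""
    a = 1.0 / (1.0 + np.exp(-(x @ params["W_a"])))  # (N, s) input-dependent decay in (0, 1)
    b = x @ params["W_b"]                            # (N, s) input-dependent write weights
    c = x @ params["W_c"]                            # (N, s) input-dependent read weights
    return x + selective_scan(x, a, b, c) + local_attention(x, window)

# Usage on random tokens: 64 tokens of width 16, state size 8, window of 8 tokens.
rng = np.random.default_rng(0)
x = rng.standard_normal((64, 16))
params = {k: 0.1 * rng.standard_normal((16, 8)) for k in ("W_a", "W_b", "W_c")}
y = lamamba_block_sketch(x, window=8, params=params)
print(y.shape)  # (64, 16)
```

Both branches touch each token a bounded number of times, so the block's cost stays linear in sequence length; the scan carries context across the whole sequence through its fixed-size state, while the windowed attention keeps exact pairwise interactions among nearby tokens.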
Findings
Outperforms DiT models across various scales on ImageNet.
Reduces GFLOPs by up to 62% compared to DiT-XL/2.
Achieves superior performance with fewer parameters.
Abstract
Recent Transformer-based diffusion models have shown remarkable performance, largely attributed to the ability of the self-attention mechanism to accurately capture both global and local contexts by computing all-pair interactions among input tokens. However, their quadratic complexity poses significant computational challenges for long-sequence inputs. Conversely, a recent state space model called Mamba offers linear complexity by compressing a filtered global context into a hidden state. Despite its efficiency, compression inevitably leads to information loss of fine-grained local dependencies among tokens, which are crucial for effective visual generative modeling. Motivated by these observations, we introduce Local Attentional Mamba (LaMamba) blocks that combine the strengths of self-attention and Mamba, capturing both global contexts and local details with linear complexity.…
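The quadratic-versus-linear contrast in the abstract can be illustrated by counting interactions per channel as the token count grows. The sketch below assumes square latents split into 2×2 patches, and uses a window of 64 tokens and a state size of 16; these are illustrative values, not the paper's settings, and the counts do not reproduce the paper's GFLOPs figures, which depend on the full architecture.

```python
def num_tokens(latent_side, patch_size):
    """Tokens produced by splitting a square latent into non-overlapping patches."""
    return (latent_side // patch_size) ** 2

def full_attention_pairs(n):
    return n * n              # every token scores every token: O(N^2)

def local_attention_pairs(n, window):
    return n * window         # each token scores only the tokens in its window: O(N * w)

def scan_state_updates(n, state_size):
    return n * state_size     # one fixed-size state update per token: O(N * s)

WINDOW, STATE = 64, 16        # illustrative assumptions, not values from the paper
for side in (32, 64, 128):
    n = num_tokens(side, patch_size=2)
    print(f"latent {side}x{side}: N={n:6d}  full={full_attention_pairs(n):>12,}  "
          f"local={local_attention_pairs(n, WINDOW):>10,}  scan={scan_state_updates(n, STATE):>9,}")
```

Doubling the latent side quadruples N, which multiplies the full-attention pair count by 16 but the local-attention and scan counts only by 4; this gap is what motivates replacing global self-attention with Mamba plus windowed attention.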
Taxonomy
Topics: Advanced Adaptive Filtering Techniques · Statistical Methods and Inference
Methods: Concatenated Skip Connection · Convolution · Max Pooling · U-Net · Diffusion · Mamba: Linear-Time Sequence Modeling with Selective State Spaces
