Scaling Diffusion Mamba with Bidirectional SSMs for Efficient Image and Video Generation

Shentong Mo; Yapeng Tian

arXiv:2405.15881 · cs.CV · May 28, 2024 · 1 citation

TL;DR

This paper introduces Diffusion Mamba, a scalable diffusion architecture that replaces traditional attention with Mamba's efficient sequence modeling, enabling faster and more resource-efficient image and video generation.

Contribution

The paper presents Diffusion Mamba (DiM), a diffusion model that uses the Mamba architecture in place of self-attention, so that its cost grows linearly with sequence length. The authors report that it outperforms existing diffusion transformers in image and video generation.

Findings

01. Achieves linear complexity with respect to sequence length.

02. Outperforms existing diffusion transformers in quality and efficiency.

03. Establishes new benchmarks for scalable image and video generation.
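
To make the linear-versus-quadratic claim concrete, the following back-of-the-envelope sketch compares how a rough operation count grows with resolution for self-attention and for a linear recurrence. The function names, the cost formulas (leading terms only), and the values of `dim`, `state`, and `patch` are illustrative assumptions, not figures from the paper.

```python
def num_tokens(resolution: int, patch_size: int) -> int:
    """Patch tokens for a square grid; assumes resolution is divisible by patch_size."""
    side = resolution // patch_size
    return side * side


def attention_cost(n: int, dim: int) -> int:
    """Rough multiply-add count for the two n-by-n products in self-attention."""
    return 2 * n * n * dim


def scan_cost(n: int, dim: int, state: int) -> int:
    """Rough multiply-add count for one linear recurrence pass over n tokens."""
    return n * dim * state


if __name__ == "__main__":
    # Hypothetical settings chosen only for illustration.
    dim, state, patch = 512, 16, 16
    for res in (256, 512, 1024):
        n = num_tokens(res, patch)
        print(res, n, attention_cost(n, dim), scan_cost(n, dim, state))
```

Under these assumptions, doubling the resolution quadruples the token count, which multiplies the attention estimate by 16 but the scan estimate by only 4.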

Abstract

In recent developments, the Mamba architecture, known for its selective state space approach, has shown potential in the efficient modeling of long sequences. However, its application in image generation remains underexplored. Traditional diffusion transformers (DiT), which utilize self-attention blocks, are effective but their computational complexity scales quadratically with the input length, limiting their use for high-resolution images. To address this challenge, we introduce a novel diffusion architecture, Diffusion Mamba (DiM), which foregoes traditional attention mechanisms in favor of a scalable alternative. By harnessing the inherent efficiency of the Mamba architecture, DiM achieves rapid inference times and reduced computational load, maintaining linear complexity with respect to sequence length. Our architecture not only scales effectively but also outperforms existing…
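
As a rough illustration of the bidirectional state space idea named in the title, the sketch below runs a scalar linear recurrence over a sequence in both directions and sums the two outputs. It is a deliberately simplified toy, not the authors' architecture: the parameters `a`, `b`, `c` are fixed scalars (Mamba makes them input-dependent), the names `scan` and `bidirectional_scan` are hypothetical, and combining the two directions by addition is an assumption.

```python
from typing import List


def scan(x: List[float], a: float, b: float, c: float) -> List[float]:
    """Causal linear recurrence: h_t = a * h_{t-1} + b * x_t, y_t = c * h_t."""
    h = 0.0
    y = []
    for x_t in x:
        h = a * h + b * x_t  # one constant-cost state update per token
        y.append(c * h)
    return y


def bidirectional_scan(x: List[float], a: float, b: float, c: float) -> List[float]:
    """Sum of a left-to-right pass and a right-to-left pass (assumed combination rule)."""
    forward = scan(x, a, b, c)
    backward = scan(x[::-1], a, b, c)[::-1]
    return [f + g for f, g in zip(forward, backward)]


if __name__ == "__main__":
    # An impulse in the middle now influences tokens on both sides.
    print(bidirectional_scan([0.0, 1.0, 0.0], a=0.5, b=1.0, c=1.0))
```

Each pass visits every token once, so the total work stays linear in sequence length, while the backward pass lets every position see context from both sides, which a single causal scan over image patches cannot provide.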

Taxonomy

Topics: Face recognition and analysis · Advanced Image and Video Retrieval Techniques · Advanced Vision and Imaging

Methods: Diffusion