DiM: Diffusion Mamba for Efficient High-Resolution Image Synthesis
Yao Teng, Yue Wu, Han Shi, Xuefei Ning, Guohao Dai, Yu Wang, Zhenguo Li, Xihui Liu

TL;DR
This paper introduces Diffusion Mamba (DiM), a high-resolution image synthesis method that combines the efficiency of Mamba sequence models with diffusion models, achieving faster inference and more efficient training on high-resolution images.
Contribution
We propose DiM, a new architecture that integrates Mamba with diffusion models, including design innovations for 2D signals and training strategies for high-resolution image generation.
Findings
DiM achieves inference-time efficiency for high-resolution images.
Pretraining on low-resolution images and then finetuning on high-resolution images improves training efficiency.
Training-free upsampling enables higher-resolution image generation.
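To make the resolution argument concrete, this short Python sketch counts tokens per image and compares how a linear-time scan and quadratic self-attention scale with that count. The latent downsampling factor of 8 and patch size of 2 are illustrative assumptions (common in latent diffusion backbones), not settings stated on this page, and num_tokens is a hypothetical helper.

```python
def num_tokens(image_size: int, vae_factor: int = 8, patch_size: int = 2) -> int:
    """Sequence length for a square image.

    Assumes (hypothetically) a latent autoencoder that downsamples by
    `vae_factor`, followed by non-overlapping `patch_size` x `patch_size`
    patches. These defaults are illustrative, not taken from the paper.
    """
    side = image_size // vae_factor // patch_size
    return side * side


for size in (256, 512, 1024, 2048):
    n = num_tokens(size)
    # A linear-time scan touches each token a constant number of times,
    # while full self-attention compares every pair of tokens (n * n).
    print(f"{size}x{size}: {n} tokens, linear ~{n}, quadratic ~{n * n}")
```

Under these assumptions, going from 256x256 to 512x512 quadruples the sequence length (256 to 1024 tokens): for the sequence-mixing step alone, a linear-time model does about 4x more work per image while full attention does about 16x more. This is one way to read why pretraining at low resolution is cheaper, though the real savings depend on details not given in this summary.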
Abstract
Diffusion models have achieved great success in image generation, with the backbone evolving from U-Net to Vision Transformers. However, the computational cost of Transformers is quadratic in the number of tokens, which poses significant challenges when dealing with high-resolution images. In this work, we propose Diffusion Mamba (DiM), which combines the efficiency of Mamba, a sequence model based on State Space Models (SSM), with the expressive power of diffusion models for efficient high-resolution image synthesis. To address the challenge that Mamba cannot generalize to 2D signals, we introduce several architectural designs, including multi-directional scans, learnable padding tokens at the end of each row and column, and lightweight local feature enhancement. Our DiM architecture achieves inference-time efficiency for high-resolution images. In addition, to further improve training…
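One way to picture the multi-directional scans and the end-of-row/column padding tokens is as several 1D orderings of the same 2D token grid. The Python sketch below is an assumption-laden illustration, not the DiM implementation: PAD is a hypothetical string standing in for a learnable padding embedding, and the four directions (forward and reversed row scans, forward and reversed column scans) are one plausible choice. Presumably the padding token tells the sequence model where a row or column ends, since in a plain row-major flattening the last patch of one row sits right next to the first patch of the next row even though the two are far apart in the image.

```python
from typing import Dict, List, Sequence

# Hypothetical stand-in for a learnable padding embedding.
PAD = "<pad>"


def row_scan(grid: Sequence[Sequence[str]]) -> List[str]:
    """Row-major order with a padding token appended after every row."""
    seq: List[str] = []
    for row in grid:
        seq.extend(row)
        seq.append(PAD)
    return seq


def col_scan(grid: Sequence[Sequence[str]]) -> List[str]:
    """Column-major order with a padding token appended after every column."""
    seq: List[str] = []
    for c in range(len(grid[0])):
        seq.extend(grid[r][c] for r in range(len(grid)))
        seq.append(PAD)
    return seq


def multi_directional_scans(grid: Sequence[Sequence[str]]) -> Dict[str, List[str]]:
    """Four 1D orderings of one 2D token grid (an assumed set of directions).

    The reversed scans simply flip the padded sequence, so in those
    orderings each padding token precedes its row or column instead.
    """
    rows = row_scan(grid)
    cols = col_scan(grid)
    return {
        "row": rows,
        "row_reversed": rows[::-1],
        "col": cols,
        "col_reversed": cols[::-1],
    }


grid = [["a", "b", "c"], ["d", "e", "f"]]
for name, seq in multi_directional_scans(grid).items():
    print(name, seq)
```

For the 2x3 grid in the example, the row scan yields a, b, c, <pad>, d, e, f, <pad>, and the column scan yields a, d, <pad>, b, e, <pad>, c, f, <pad>. Each ordering is still a single pass over the tokens, so scanning in several directions multiplies the work by a small constant rather than changing its linear dependence on sequence length.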
Taxonomy
Topics: Medical Image Segmentation Techniques · Generative Adversarial Networks and Image Synthesis · Image Retrieval and Classification Techniques
Methods: Convolution · Concatenated Skip Connection · Max Pooling · U-Net · Diffusion
