Inf-DiT: Upsampling Any-Resolution Image with Memory-Efficient Diffusion Transformer
Zhuoyi Yang, Heyang Jiang, Wenyi Hong, Jiayan Teng, Wendi Zheng, Yuxiao Dong, Ming Ding, and Jie Tang

TL;DR
This paper introduces Inf-DiT, a memory-efficient diffusion transformer for upsampling images to ultra-high resolutions such as 4096x4096. It adaptively adjusts memory overhead during inference and achieves state-of-the-art results in both machine and human evaluation.
Contribution
The paper proposes a unidirectional block attention mechanism and an infinite super-resolution model based on DiT, enabling efficient ultra-high-resolution image generation with significantly reduced memory usage.
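To make the idea of unidirectional block attention more concrete, here is a minimal sketch of a block-level visibility pattern. The summary does not say which blocks each block may attend to, so the pattern below (itself plus its left, top, and top-left neighbours, in raster order) is an assumption made for illustration, and the function name and `OFFSETS` constant are hypothetical.

```python
# Assumed dependency pattern (not stated in this summary): each block sees
# itself and the blocks to its left, top, and top-left.
OFFSETS = ((0, 0), (0, -1), (-1, 0), (-1, -1))


def unidirectional_block_neighbors(rows, cols):
    """Map each block index (raster order) to the set of block indices it may attend to."""
    visible = {}
    for r in range(rows):
        for c in range(cols):
            q = r * cols + c
            keys = set()
            for dr, dc in OFFSETS:
                rr, cc = r + dr, c + dc
                # Skip neighbours that fall outside the top or left image border.
                if rr >= 0 and cc >= 0:
                    keys.add(rr * cols + cc)
            visible[q] = keys
    return visible


visible = unidirectional_block_neighbors(3, 3)
for q in sorted(visible):
    print(q, sorted(visible[q]))
```

Under this assumed pattern, every block depends only on blocks that come earlier in raster order, so blocks can be generated one after another while only a bounded set of earlier blocks needs to stay in memory.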
Findings
Achieves state-of-the-art ultra-high-resolution image generation.
Saves more than 5x memory compared to commonly used UNet structures when generating 4096*4096 images.
Successfully handles images of various shapes and resolutions.
Abstract
Diffusion models have shown remarkable performance in image generation in recent years. However, because memory grows quadratically when generating ultra-high-resolution images (e.g., 4096*4096), the resolution of generated images is often limited to 1024*1024. In this work, we propose a unidirectional block attention mechanism that can adaptively adjust the memory overhead during the inference process and handle global dependencies. Building on this module, we adopt the DiT structure for upsampling and develop an infinite super-resolution model capable of upsampling images of various shapes and resolutions. Comprehensive experiments show that our model achieves SOTA performance in generating ultra-high-resolution images in both machine and human evaluation. Compared to commonly used UNet structures, our model can save more than 5x memory when generating 4096*4096 images. The…
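The quadratic growth mentioned in the abstract can be illustrated with some rough arithmetic. The sketch below is not the paper's own memory accounting: the patch size, block size, and number of context blocks are hypothetical values chosen only to show the scaling. It compares token counts at 1024 and 4096 pixels per side, then contrasts a full attention matrix with a fixed-size block attending to a small, constant context.

```python
PATCH = 4            # hypothetical patch size in pixels
BLOCK = 128          # hypothetical block side length in pixels
CONTEXT_BLOCKS = 4   # hypothetical: the block itself plus three neighbours


def num_tokens(side, patch=PATCH):
    """Number of patch tokens in a square image with the given side length."""
    return (side // patch) ** 2


def full_attention_entries(side, patch=PATCH):
    """Entries in one full self-attention matrix over every token in the image."""
    n = num_tokens(side, patch)
    return n * n


def per_block_attention_entries(block=BLOCK, patch=PATCH, context=CONTEXT_BLOCKS):
    """Entries when one block attends to a fixed number of same-sized blocks."""
    t = num_tokens(block, patch)
    return t * (t * context)


token_ratio = num_tokens(4096) // num_tokens(1024)                              # 16
attention_ratio = full_attention_entries(4096) // full_attention_entries(1024)  # 256
print("token growth, 1024 -> 4096:", token_ratio)
print("full attention growth, 1024 -> 4096:", attention_ratio)
print("per-block attention entries (independent of image size):",
      per_block_attention_entries())
```

Quadrupling the side length multiplies the token count by 16 and the size of a full attention matrix by 256, while the per-block figure does not depend on the image size at all. That is the intuition behind processing an image block by block, under the assumptions stated above.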
Taxonomy
Topics: Photoacoustic and Ultrasonic Imaging · Advanced Image Processing Techniques · Image Processing Techniques and Applications
