HiDiffusion: Unlocking Higher-Resolution Creativity and Efficiency in Pretrained Diffusion Models

Shen Zhang; Zhaowei Chen; Zhenyu Zhao; Yuhao Chen; Yao Tang; Jiajun Liang

arXiv:2311.17528·cs.CV·April 30, 2024·1 cites

TL;DR

HiDiffusion is a tuning-free framework for higher-resolution image synthesis with pretrained diffusion models. It dynamically adjusts the U-Net feature map size to avoid object duplication and reduces self-attention redundancy to cut generation time.

Contribution

The paper introduces HiDiffusion, a novel framework with Resolution-Aware U-Net and optimized self-attention, enabling higher-resolution image generation without additional training.

Findings

01 Supports resolutions up to 4096x4096

02 Achieves 1.5-6x inference speedup

03 Outperforms previous methods in quality and efficiency

Abstract

Diffusion models have become a mainstream approach for high-resolution image synthesis. However, directly generating higher-resolution images from pretrained diffusion models will encounter unreasonable object duplication and exponentially increase the generation time. In this paper, we discover that object duplication arises from feature duplication in the deep blocks of the U-Net. Concurrently, we trace the extended generation times to self-attention redundancy in the U-Net's top blocks. To address these issues, we propose a tuning-free higher-resolution framework named HiDiffusion. Specifically, HiDiffusion contains a Resolution-Aware U-Net (RAU-Net) that dynamically adjusts the feature map size to resolve object duplication, and it engages Modified Shifted Window Multi-head Self-Attention (MSW-MSA), which uses optimized window attention to reduce computation. We can integrate…
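
To make the cost argument in the abstract concrete, here is a minimal sketch that counts query-key pairs for global self-attention versus attention restricted to non-overlapping windows. It is not the paper's MSW-MSA: the function name `attention_pairs`, the 128x128 feature map, and the 64x64 window are illustrative assumptions, and the windows here are plain, unshifted tiles.

```python
def attention_pairs(height, width, window=None):
    """Count query-key pairs for self-attention over a height x width feature map.

    window=None means global attention; otherwise window=(wh, ww) restricts
    each token to attend only within its own non-overlapping tile.
    All sizes are hypothetical and chosen for illustration only.
    """
    tokens = height * width
    if window is None:
        # Every token attends to every token.
        return tokens * tokens
    wh, ww = window
    if height % wh or width % ww:
        raise ValueError("window must tile the feature map exactly")
    num_windows = (height // wh) * (width // ww)
    per_window = wh * ww
    # Each window runs its own small global attention.
    return num_windows * per_window * per_window


global_pairs = attention_pairs(128, 128)
window_pairs = attention_pairs(128, 128, window=(64, 64))
print(global_pairs // window_pairs)  # reduction factor from windowing

# Doubling the side length quadruples the tokens and multiplies pairs by 16.
print(attention_pairs(256, 256) // attention_pairs(128, 128))
```

Under these assumed sizes, splitting the map into four windows cuts the pair count by a factor of four, while doubling the feature map side length multiplies the global pair count by sixteen. That growth in the highest-resolution blocks is where windowed attention has the most to save.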

Code & Models

Models

erikayurika/brushnet (Hugging Face model)

Taxonomy

Topics: Advanced Neuroimaging Techniques and Applications · Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis

Methods: Concatenated Skip Connection · Convolution · Diffusion · Max Pooling · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · U-Net