Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models and Time-Dependent Layer Normalization
Qihao Liu, Zhanpeng Zeng, Ju He, Qihang Yu, Xiaohui Shen, Liang-Chieh Chen

TL;DR
This paper enhances diffusion-based image generation by introducing a multi-resolution network and time-dependent layer normalization, significantly reducing image distortion and achieving state-of-the-art results on ImageNet benchmarks.
Contribution
It proposes DiMR, a multi-resolution framework, and TD-LN, a novel time-dependent normalization method, improving image fidelity and efficiency in diffusion models.
Findings
Achieved a new state-of-the-art FID score of 1.70 on ImageNet 256x256
Outperformed prior diffusion models on high-resolution image generation
Demonstrated effectiveness of multi-resolution and time-dependent normalization techniques
Abstract
This paper presents innovative enhancements to diffusion models by integrating a novel multi-resolution network and time-dependent layer normalization. Diffusion models have gained prominence for their effectiveness in high-fidelity image generation. While conventional approaches rely on convolutional U-Net architectures, recent Transformer-based designs have demonstrated superior performance and scalability. However, Transformer architectures, which tokenize input data (via "patchification"), face a trade-off between visual fidelity and computational complexity due to the quadratic nature of self-attention operations concerning token length. While larger patch sizes enable attention computation efficiency, they struggle to capture fine-grained visual details, leading to image distortions. To address this challenge, we propose augmenting the Diffusion model with the Multi-Resolution…
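The patch-size trade-off described in the abstract can be illustrated with a little arithmetic. The sketch below is not taken from the paper: the 64x64 input grid and the helper names `num_tokens` and `attention_pairs` are assumptions chosen for illustration. It only counts tokens and pairwise attention interactions for a few patch sizes, ignoring heads, channels, and constant factors.

```python
# Illustrative only: the grid size and function names are hypothetical,
# not values or APIs from the paper.

def num_tokens(image_size: int, patch_size: int) -> int:
    """Number of tokens after splitting a square input into square patches."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    patches_per_side = image_size // patch_size
    return patches_per_side * patches_per_side


def attention_pairs(image_size: int, patch_size: int) -> int:
    """Token-to-token interactions in one self-attention layer (N squared)."""
    n = num_tokens(image_size, patch_size)
    return n * n


if __name__ == "__main__":
    size = 64  # assumed side length of the input grid
    for patch in (2, 4, 8):
        n = num_tokens(size, patch)
        pairs = attention_pairs(size, patch)
        print(f"patch={patch}: tokens={n}, attention pairs={pairs}")
```

Under these assumptions, doubling the patch size cuts the token count by 4x and the pairwise attention work by 16x, while each token has to summarize 4x as many input positions. That is the fidelity-versus-compute tension the abstract refers to.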
Taxonomy
Topics: Optical measurement and interference techniques · Advanced Vision and Imaging · Image Processing Techniques and Applications
Methods: Concatenated Skip Connection · Convolution · Residual Connection · Softmax · Max Pooling · Layer Normalization · U-Net · Byte Pair Encoding · Label Smoothing
