Does Diffusion Beat GAN in Image Super Resolution?
Denis Kuznedelev, Valerii Startsev, Daniil Shlenskii, Sergey Kastryulin

TL;DR
This paper rigorously compares diffusion-based and GAN-based image super resolution models under controlled conditions, revealing that GANs can match or outperform diffusion models when scaled equally, and examines the effects of design choices.
Contribution
It provides a fair comparison between diffusion and GAN models for ISR, showing GANs can be competitive, and analyzes how design choices influence performance.
Findings
GAN models can achieve comparable or better results than diffusion models when scaled equally.
Design choices like text conditioning and augmentation significantly impact ISR performance.
Controlled experiments reveal the true capabilities of each approach independent of model size.
Abstract
There is a prevalent opinion that diffusion-based models outperform GAN-based counterparts in the Image Super Resolution (ISR) problem. However, in most studies, diffusion-based ISR models employ larger networks and are trained longer than the GAN baselines. This raises the question of whether the high performance stems from the superiority of the diffusion paradigm or if it is a consequence of the increased scale and the greater computational resources of the contemporary studies. In our work, we thoroughly compare diffusion-based and GAN-based Super Resolution models under controlled settings, with both approaches having matched architecture, model and dataset sizes, and computational budget. We show that a GAN-based model can achieve results comparable or superior to a diffusion-based model. Additionally, we explore the impact of popular design choices, such as text conditioning and…
Peer Reviews
Decision: Submitted to ICLR 2025
Overall, this type of work should be appreciated as it probes deeper into what the differences in paradigms are when it comes to the SR task. The authors ensure that the setup for both paradigms is as comparable as possible through the architecture, datasets, etc. In general, the writing is good but some parts were confusing.
Because their experimental results were sometimes in favor of diffusion, sometimes in favor of GAN, and sometimes no difference was found, I have two suggestions that would significantly improve this paper. First, the authors should taxonomize and explain to the reader when one paradigm should be preferred and in which scenarios. Otherwise, this type of work does not help us improve actual application performance. Second, for the authors to actually be able to make such claims, they need to use
+ Conducting a comparison of GAN and Diffusion-based approaches for Super-Resolution with the same computational resources can provide good insight for the community.
+ The finding that, given the same model size, a GAN matches the quality of the diffusion-based method is interesting.
**Concerns** I believe that the paper needs thorough clarification. Specifically:
+ Claiming that GAN-based and Diffusion-based approaches give comparable results when using the same number of parameters might be relatively strong. It needs very careful investigation and evaluation, because if they give comparable performance, the community has no reason to use diffusion, which costs much more in both training (much longer) and inference (many more sampling steps). A comparison with the sam
1. The main contribution of this paper is a fair comparison between GAN and diffusion-based ISR models, controlling for architecture, dataset size, and computational resources.
2. The authors perform detailed ablation studies, particularly focusing on the effects of pretraining, augmentations, and training with full-resolution images.
3. The paper explores not only the overall performance of the models but also the impact of various design choices such as text conditioning and augmentation, wh
1. The contribution is a bit limited, as there are no really new and impactful insights presented in this work. Additionally, I'm not sure ICLR is a suitable venue for this work, because it leans toward the more empirical side. Maybe MLSys is a better venue? Again, I'm not sure.
2. Even though the experimental setting is quite fair, GAN-based models actually have one additional pretraining stage, whereas the diffusion model has to be trained from scratch, which could be a reason why diffusion
Taxonomy
Topics: Advanced Image Processing Techniques · Image and Signal Denoising Methods · Image Processing Techniques and Applications
Methods: Diffusion
