Diffusion Models as Cartoonists: The Curious Case of High Density Regions
Rafał Karczewski, Markus Heinonen, Vikas Garg

TL;DR
This paper explores the high-density regions of diffusion models, introducing a mode-tracking process and a high-density sampler that generate higher likelihood images, often cartoon-like or blurry, revealing new insights into model behavior.
Contribution
The paper presents a theoretical mode-tracking method and a practical high-density sampler for diffusion models, enabling the generation of higher likelihood images and revealing novel patterns.
Findings
Higher likelihood samples often appear as cartoons or blurry images.
These patterns emerge even in datasets without such examples.
Likelihood tracking in diffusion SDEs incurs no extra computational cost.
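To make the idea of tracking a sample's likelihood concrete, here is a minimal sketch of the standard ODE case, using the instantaneous change-of-variables formula along the probability flow ODE. It is not the paper's SDE method, which is not spelled out in this summary. The setup is a hypothetical 1D toy: data distributed as N(0, s0^2) and a variance-exploding forward process with g^2 = 1, so that p_t = N(0, s0^2 + t) and the score is known in closed form. The function name and the parameters `s0`, `T`, and `n_steps` are assumptions chosen for illustration.

```python
import math

def track_logp_along_pf_ode(x_T, s0=1.0, T=4.0, n_steps=10000):
    """Euler-integrate the probability flow ODE from t=T down to t=0 for a
    1D toy model with p_t = N(0, s0^2 + t), accumulating log p_t(x_t).

    Toy assumption: the score is analytic, score_t(x) = -x / (s0^2 + t).
    """
    dt = T / n_steps
    x = x_T
    var_T = s0**2 + T
    # Exact log-density of the starting point under p_T.
    logp = -0.5 * math.log(2.0 * math.pi * var_T) - x_T**2 / (2.0 * var_T)
    t = T
    for _ in range(n_steps):
        var_t = s0**2 + t
        velocity = x / (2.0 * var_t)      # dx/dt = -0.5 * g^2 * score, with g^2 = 1
        divergence = 1.0 / (2.0 * var_t)  # derivative of the velocity w.r.t. x
        x -= dt * velocity                # Euler step backwards in time
        logp += dt * divergence           # d log p / dt = -divergence, integrated backwards
        t -= dt
    return x, logp

x0, tracked_logp = track_logp_along_pf_ode(x_T=1.5)
# In this toy, p_0 = N(0, 1), so the tracked value can be checked exactly.
exact_logp = -0.5 * math.log(2.0 * math.pi) - x0**2 / 2.0
print(abs(tracked_logp - exact_logp))  # small discretization error
```

In this toy the divergence is available in closed form. With a neural network score in high dimensions, the divergence term of the ODE formula generally has to be computed or estimated separately, which is the kind of overhead the finding above says the SDE tracking approach avoids.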
Abstract
We investigate what kind of images lie in the high-density regions of diffusion models. We introduce a theoretical mode-tracking process capable of pinpointing the exact mode of the denoising distribution, and we propose a practical high-density sampler that consistently generates images of higher likelihood than usual samplers. Our empirical findings reveal the existence of significantly higher likelihood samples that typical samplers do not produce, often manifesting as cartoon-like drawings or blurry images depending on the noise level. Curiously, these patterns emerge in datasets devoid of such examples. We also present a novel approach to track sample likelihoods in diffusion SDEs, which remarkably incurs no additional computational cost. Code is available at https://github.com/Aalto-QuML/high-density-diffusion.
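The claim that much higher likelihood points exist than a typical sampler ever produces can be illustrated with a simple analogy, sketched below. It uses a standard Gaussian in d dimensions rather than a diffusion model, so it is an analogy under stated assumptions and not the paper's experiment. The dimension d = 3072 (the size of a 32x32 RGB image) and the helper names are illustrative choices.

```python
import numpy as np

def gaussian_logpdf(x):
    """Log-density of a standard normal N(0, I_d) at each row of x."""
    d = x.shape[-1]
    return -0.5 * d * np.log(2.0 * np.pi) - 0.5 * np.sum(x**2, axis=-1)

def mode_vs_typical_gap(d, n_samples=1000, seed=0):
    """Compare the log-density at the mode (the origin) with that of random draws.

    Returns (gap to the best sample, gap to the average sample), in nats.
    """
    rng = np.random.default_rng(seed)
    samples = rng.standard_normal((n_samples, d))
    logp_samples = gaussian_logpdf(samples)
    logp_mode = gaussian_logpdf(np.zeros((1, d)))[0]
    return logp_mode - logp_samples.max(), logp_mode - logp_samples.mean()

gap_to_best, gap_to_mean = mode_vs_typical_gap(d=3072)
print(gap_to_best, gap_to_mean)  # the mean gap is close to d / 2 nats
```

Even the highest-density draw out of a thousand stays far below the mode, because in high dimensions samples concentrate on a shell away from the origin. The mode itself is a featureless point that an ordinary sampler essentially never returns, which is loosely analogous to the blurry or cartoon-like high-density images described in the abstract.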
Peer Reviews
Decision·ICLR 2025 Poster
- The paper is well written and develops new theoretical insights. In particular, the augmented SDE for tracking likelihood evolution is innovative and practical, since it introduces no additional computational cost.
- The proposed high-probability sampler is a useful tool for generating high-likelihood samples that traditional sampling techniques do not discover.
- The paper analyzes the diffusion probability landscape, examining the different images found in high-likelihood regions,
- The proposed mode-tracking approach has a very high computational cost, and this is not clearly discussed. How the approach scales, and what its computational limitations are, should be discussed and quantified.
- The high-likelihood samples discovered by the proposed analysis do not seem to have a real practical advantage, being cartoon drawings or blurry images. It is not discussed how these insights can be leveraged to improve sample generation strategies or how these h
The paper introduces a novel framework for estimating likelihoods within diffusion models using augmented stochastic differential equations (SDEs) and high-probability samplers. This is an important advancement because it allows the exploration of high-likelihood regions without increasing computational costs. By deriving density estimates through augmented SDEs, the authors provide a theoretically efficient approach to analyzing model outputs across noise levels, setting a foundation for future
1. Although the paper sheds light on high-density regions and likelihood estimation, it lacks a clear discussion on how these findings could practically inform the design or improvement of diffusion models in applied settings. Without recommendations on balancing high-likelihood sampling with image quality, it’s challenging to draw valuable insights from the findings, particularly for practitioners focused on real-world applications.
2. The paper describes the emergence of cartoon-like images i
* The authors develop a novel theory of augmented SDEs. They provide a clear and detailed derivation and couple it with bias estimation.
* The landscape analysis gives valuable insight into the structure of high-probability samples and provides a theoretical justification for the known fact that distorted images tend to have a higher likelihood.
* The paper is well-structured and easy to read.
1. It isn't clear whether the analysis from Section 5 can lead to the creation of better stochastic samplers or improve the quality of image generation. Overall, the practical implications of this work are quite poor.
2. There is no intuition behind the observations from Figure 4. More precisely, what does it mean that the model optimized for sample quality yields a smaller difference between $p_{0}^{ODE}$ and $p_{0}^{SDE}$?
3. Experiments were conducted on rather small and outdated diffusion models. It woul
Taxonomy
Topics: Cultural Industries and Urban Development · Regional Development and Policy · Political Systems and Governance
Methods: Diffusion
