On the Relation Between Linear Diffusion and Power Iteration

Dana Weitzner; Mauricio Delbracio; Peyman Milanfar; Raja Giryes

arXiv:2410.14730·cs.LG·October 22, 2024


Open Access · 3 Reviews

TL;DR

This paper explores the connection between linear diffusion models and power iteration, revealing how diffusion processes converge to dominant data features and extending insights to non-linear denoisers in image generation.

Contribution

It establishes a theoretical link between diffusion models and power iteration, providing analytical insights into the denoising process and its convergence properties.

Findings

01 · Linear diffusion converges to the leading eigenvector of data.

02 · Low-frequency components emerge earlier during generation.

03 · Results extend to non-linear denoisers in image tasks.
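For readers unfamiliar with the power iteration mentioned in the first finding, the following is a minimal sketch of the textbook algorithm, not the paper's diffusion procedure. The toy covariance, its eigenvalues, and the names `power_iteration`, `cov`, and `alignment` are assumptions made for illustration only.

```python
import numpy as np

def power_iteration(cov, num_steps=200, seed=0):
    """Estimate the leading eigenvector of a symmetric PSD matrix."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(cov.shape[0])  # start from random noise
    v /= np.linalg.norm(v)
    for _ in range(num_steps):
        v = cov @ v                # amplify directions with large eigenvalues
        v /= np.linalg.norm(v)     # renormalize to keep the iterate bounded
    return v

# Hypothetical toy covariance with a known eigenbasis Q (illustration only).
rng = np.random.default_rng(1)
d = 8
eigvals = np.array([5.0, 2.0, 1.0, 0.5, 0.3, 0.2, 0.1, 0.05])
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
cov = Q @ np.diag(eigvals) @ Q.T

v = power_iteration(cov)
# Absolute cosine between the iterate and the true leading eigenvector.
alignment = abs(v @ Q[:, 0])
print(f"alignment with leading eigenvector: {alignment:.6f}")
```

Each step shrinks the relative weight of the non-leading directions by at least the ratio of the second to the first eigenvalue, which is why a random start ends up aligned with the dominant direction.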

Abstract

Recently, diffusion models have gained popularity due to their impressive generative abilities. These models learn the implicit distribution given by the training dataset, and sample new data by transforming random noise through the reverse process, which can be thought of as gradual denoising. In this work, we examine the generation process as a "correlation machine", where random noise is repeatedly enhanced in correlation with the implicit given distribution. To this end, we explore the linear case, where the optimal denoiser in the MSE sense is known to be the PCA projection. This enables us to connect the theory of diffusion models to the spiked covariance model, where the dependence of the denoiser on the noise level and the amount of training data can be expressed analytically, in the rank-1 case. In a series of numerical experiments, we extend this result to general low rank…
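The rank-1 spiked covariance setting named in the abstract can be illustrated numerically. The sketch below is an assumption-laden toy, not the paper's experiment: it draws samples with one planted direction plus isotropic noise and measures how well the top eigenvector of the sample covariance (the rank-1 PCA direction) recovers that direction as the number of training samples changes. The function name `top_eigvec_alignment` and all parameter values (`dim`, `spike`, `noise_std`) are hypothetical.

```python
import numpy as np

def top_eigvec_alignment(n_samples, dim=50, spike=4.0, noise_std=1.0, seed=0):
    """Squared cosine between the planted direction and the top sample eigenvector."""
    rng = np.random.default_rng(seed)
    u = np.zeros(dim)
    u[0] = 1.0  # planted (true) direction, chosen arbitrarily for this toy
    # Rank-1 signal along u plus isotropic Gaussian noise.
    coeffs = np.sqrt(spike) * rng.standard_normal(n_samples)
    x = np.outer(coeffs, u) + noise_std * rng.standard_normal((n_samples, dim))
    sample_cov = x.T @ x / n_samples
    # eigh returns eigenvalues in ascending order, so the last column is the top one.
    _, eigvecs = np.linalg.eigh(sample_cov)
    u_hat = eigvecs[:, -1]
    return float((u_hat @ u) ** 2)

few = top_eigvec_alignment(n_samples=20)
many = top_eigvec_alignment(n_samples=5000)
print(f"alignment with 20 samples:   {few:.3f}")
print(f"alignment with 5000 samples: {many:.3f}")
```

Varying `noise_std` or `spike` in the same call gives a rough feel for how the estimated direction depends on the noise level as well as on the amount of data.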

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01 · Rating 3 · Confidence 4

Strengths

The authors aim to provide an analysis of the diffusion process, an important question for diffusion models. This is achieved using a simple model. Overall, the writing is easy to follow.

Weaknesses

Providing theoretical studies for complex problems in deep learning and deep generative models is often very challenging. Thus, it is common practice to study a simpler problem that can shed light on the underlying mechanisms of more complex ones. While this work falls into this category, I found that the results for the linear diffusion model diverge significantly from the phenomena observed in real cases. (1) Theorem 4.3 shows that the diffusion process only converges to the dominant eigenvec

Reviewer 02 · Rating 3 · Confidence 5

Strengths

The paper has a clear structure and is easy to read. The topic is also interesting, since studying the evolution of the denoisers along the diffusion trajectory is important but less well explored in the literature.

Weaknesses

I have doubts about both the theoretical and practical aspects of this work. The theoretical results do not seem correct (correct me if I am wrong) and the connection to practice is weak. I list my questions below. Major questions: (i) Can the authors explain how they conclude Theorem 4.3? The authors claim that the final projection operator is a diagonal matrix with a spectrum concentrated around the first eigenvalue (lines 409-412), and they support this claim qualitatively in Figure 5. H

Reviewer 03 · Rating 6 · Confidence 3

Strengths

The paper provides a novel perspective on diffusion models for understanding the generation process by drawing parallels with power iteration. The work opens up new research directions for improving the interpretability and performance of generative models by highlighting the eigenvector alignment properties in diffusion models. The paper is well-written with most claims justified theoretically and backed by empirical evidence.

Weaknesses

The paper does a good job of conveying the idea and intuition from the simulations; however, the proof of the main result (Theorem 4.3) could be refined further. In particular, I cannot fully understand the proof in the high noise regime, i.e., $\tau \leq t \leq T$. A more explicit explanation of the following statement would enhance the rigor: how exactly does Assumption 4.2 guarantee that $U_\tau^\top U_{\tau+1}$ is diagonal just enough not to spoil the diagonality of the next partial operator $


Taxonomy

Topics: Matrix Theory and Algorithms

Methods: Diffusion · Principal Components Analysis