Physics Informed Distillation for Diffusion Models

Joshua Tian Jin Tee; Kang Zhang; Hee Suk Yoon; Dhananjaya Nagaraja Gowda; Chanwoo Kim; Chang D. Yoo

arXiv:2411.08378 · cs.LG · November 14, 2024

TL;DR

This paper introduces Physics Informed Distillation (PID), a method that exploits the connection between diffusion models and probability flow ODEs. Following PINN principles, a student model is trained to represent the solution of the teacher's ODE system, with no synthetic dataset required, and its single-step generation results are comparable to recent distillation methods.

Contribution

The paper proposes PID, a new distillation approach for diffusion models that incorporates physics-informed principles, reducing complexity and improving usability compared to existing methods.

Findings

1. PID achieves performance comparable to recent distillation methods.
2. PID demonstrates predictable hyperparameter trends.
3. PID eliminates the need for synthetic dataset generation during distillation.

Abstract

Diffusion models have recently emerged as a potent tool in generative modeling. However, their inherent iterative nature often results in sluggish image generation due to the requirement for multiple model evaluations. Recent progress has unveiled the intrinsic link between diffusion models and Probability Flow Ordinary Differential Equations (ODEs), thus enabling us to conceptualize diffusion models as ODE systems. Simultaneously, Physics Informed Neural Networks (PINNs) have substantiated their effectiveness in solving intricate differential equations through implicit modeling of their solutions. Building upon these foundational insights, we introduce Physics Informed Distillation (PID), which employs a student model to represent the solution of the ODE system corresponding to the teacher diffusion model, akin to the principles employed in PINNs. Through experiments on CIFAR 10 and…
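To make the PINN analogy concrete, here is a minimal Python sketch of a residual loss for an ODE solution represented as a function of the starting noise and time. It is a toy illustration under stated assumptions, not the paper's implementation: `teacher_field` is a hypothetical stand-in (`dx/dt = -x`) for the teacher's probability flow ODE, `T`, `exact_trajectory`, `linear_trajectory` and `residual_loss` are illustrative names, and the time derivative is approximated by a first-order backward finite difference.

```python
import math

T = 1.0  # hypothetical terminal time at which the trajectory starts from noise z


def teacher_field(x, t):
    # Toy stand-in (assumption) for the teacher's ODE vector field dx/dt = f(x, t).
    return -x


def exact_trajectory(z, t):
    # Closed-form solution of dx/dt = -x with boundary condition x(z, T) = z.
    return z * math.exp(T - t)


def linear_trajectory(z, t):
    # A deliberately wrong candidate that still satisfies x(z, T) = z.
    return z * (1.0 + (T - t))


def residual_loss(trajectory, zs, ts, dt=1e-3):
    # Mean squared ODE residual; the time derivative is replaced by a
    # first-order backward finite difference.
    total = 0.0
    for z in zs:
        for t in ts:
            x_t = trajectory(z, t)
            dx_dt = (x_t - trajectory(z, t - dt)) / dt
            total += (dx_dt - teacher_field(x_t, t)) ** 2
    return total / (len(zs) * len(ts))


zs = [-1.0, 0.5, 2.0]
ts = [0.25, 0.5, 0.75, 1.0]
loss_exact = residual_loss(exact_trajectory, zs, ts)
loss_linear = residual_loss(linear_trajectory, zs, ts)
```

In this toy setting the true solution leaves only a small discretization residual, while the wrong candidate is penalized wherever its slope disagrees with the field. A student trained to drive such a residual toward zero is pushed toward the ODE solution without ever sampling from the teacher's trajectories.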

Peer Reviews

Decision · Submitted to ICLR 2024

Reviewer 01 · Rating 6 · marginally above the acceptance threshold · Confidence 4

Strengths

1. The single-step generation ability of PID is competitive on CIFAR10.
2. The training cost per step of PID is smaller than that of PD and CD.
3. The training of PID does not require any extra data.

Weaknesses

1. PID cannot further improve sample quality by investing more NFEs. It is limited to single-step generation, where the performance is not that impressive.
2. Equations 8 and 9 are equivalent up to a scaling factor for the L2 metric, but not for an arbitrary distance metric such as LPIPS, which is used for the main results. Changing $L_{PINN}$ from Equation 8 to Equation 9 changes the loss landscape. However, this step is not justified or explained in the paper. Why not use the original PINN loss g…
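The scaling argument in the second point can be written out explicitly. The equations themselves are not reproduced on this page, so the following is a sketch under an assumption: that Equation 8 compares a backward finite-difference derivative of the student $x_\theta$ with the teacher's ODE field $f$, while Equation 9 compares the student output with a one-step target. The symbols $x_\theta$, $f$, $z$, $\Delta t$ and $d$ are illustrative. Under a squared L2 distance the two forms differ only by the constant $1/\Delta t^2$:

```latex
\left\| \frac{x_\theta(z,t) - x_\theta(z,t-\Delta t)}{\Delta t} - f \right\|_2^2
= \frac{1}{\Delta t^2}
  \left\| x_\theta(z,t) - \bigl( x_\theta(z,t-\Delta t) + \Delta t\, f \bigr) \right\|_2^2
```

The identity relies on the L2 norm being translation-invariant and homogeneous. A learned perceptual metric such as LPIPS is generally neither, so $d\bigl(\tfrac{a-b}{\Delta t}, f\bigr)$ and $d(a,\, b + \Delta t\, f)$ need not be proportional, which is the concern raised here.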

Reviewer 02 · Rating 6 · marginally above the acceptance threshold · Confidence 4

Strengths

- The proposed method and the corresponding numerical methods achieve results comparable to other distillation methods such as consistency distillation.
- The presentation is easy to follow and the algorithms are quite neat.

Weaknesses

- Major:
  - **Lack of an important related work: BOOT[1]**. The proposed method seems **almost exactly the same as BOOT**, because both distill the integral from time $T$ to time $t$, with the same integral and numerical differentiation method. Please compare with BOOT in detail and discuss your own contributions further.
- Minor:
  - The comparison in Table 1 is unfair. Some of the results are based on the checkpoint of the VPSDE in ScoreSDE[2], but some of the results are based on…

Reviewer 03 · Rating 5 · marginally below the acceptance threshold · Confidence 4

Strengths

- The authors make an interesting connection between distillation of diffusion models and PINNs via enforcement of the probability flow ODE.
- The authors propose PID, a relatively simple method for distillation, which shows results comparable to state-of-the-art single-step image generation for CIFAR10 and ImageNet64.
- The paper is generally well-written and clear.
- The PID distillation method achieves results comparable to current state-of-the-art single-step image generation methods (1) usi…

Weaknesses

- The specific parameterization of the PID model (Eqn. 7) seems somewhat undermotivated to me. The authors mention they take inspiration from the two common approaches to enforcing boundary conditions with PINNs, soft and strict conditions. However, beyond this high-level explanation, the parameterization is not justified and no ablations are performed.
- Similarly, a first-order numerical approximation of the residual loss is proposed for the sake of efficient training, but no ablations are per…
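For context on the soft versus strict distinction mentioned in this review, here is a minimal Python sketch of the two generic PINN approaches to a boundary condition $x(z, T) = z$. It is not the paper's Eqn. 7, which is not reproduced on this page: `raw_network`, `strict_trajectory`, `soft_boundary_penalty` and the `(T - t)` factor are hypothetical illustrations of the general idea.

```python
T = 1.0  # hypothetical boundary time


def raw_network(z, t):
    # Hypothetical stand-in for an unconstrained network output.
    return 0.3 * z + 0.7 * t - 0.2


def strict_trajectory(z, t):
    # Strict condition: the (T - t) factor vanishes at t = T, so
    # x(z, T) = z holds by construction for any raw_network.
    return z + (T - t) * raw_network(z, t)


def soft_boundary_penalty(trajectory, zs):
    # Soft condition: a penalty term that measures how far x(z, T) is
    # from z, meant to be added to the training loss.
    return sum((trajectory(z, T) - z) ** 2 for z in zs) / len(zs)


zs = [-1.0, 0.0, 2.0]
penalty_strict = soft_boundary_penalty(strict_trajectory, zs)
penalty_raw = soft_boundary_penalty(raw_network, zs)
```

With the strict form the penalty is zero for every input, so no extra loss term or weighting is needed. With the soft form the unconstrained output violates the boundary until training reduces the penalty, which introduces a weighting hyperparameter.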

Code & Models

Repositories

pantheon5100/pid_diffusion · PyTorch · Official

Taxonomy

Topics: Process Optimization and Integration · Advanced Control Systems Optimization

Methods: Diffusion