Physics Informed Distillation for Diffusion Models
Joshua Tian Jin Tee, Kang Zhang, Hee Suk Yoon, Dhananjaya Nagaraja Gowda, Chanwoo Kim, Chang D. Yoo

TL;DR
This paper introduces Physics Informed Distillation (PID), a method that leverages the connection between diffusion models and ODEs, applying principles from Physics Informed Neural Networks (PINNs) to distill diffusion models efficiently without synthetic data while achieving competitive performance.
Contribution
The paper proposes PID, a new distillation approach for diffusion models that incorporates physics-informed principles, reducing complexity and improving usability compared to existing methods.
Findings
PID achieves performance comparable to recent distillation methods.
PID demonstrates predictable hyperparameter trends.
PID eliminates the need for synthetic dataset generation during distillation.
Abstract
Diffusion models have recently emerged as a potent tool in generative modeling. However, their inherent iterative nature often results in sluggish image generation due to the requirement for multiple model evaluations. Recent progress has unveiled the intrinsic link between diffusion models and Probability Flow Ordinary Differential Equations (ODEs), thus enabling us to conceptualize diffusion models as ODE systems. Simultaneously, Physics Informed Neural Networks (PINNs) have substantiated their effectiveness in solving intricate differential equations through implicit modeling of their solutions. Building upon these foundational insights, we introduce Physics Informed Distillation (PID), which employs a student model to represent the solution of the ODE system corresponding to the teacher diffusion model, akin to the principles employed in PINNs. Through experiments on CIFAR 10 and…
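The PINN-style idea described in the abstract can be illustrated on a toy problem. The sketch below is not the paper's parameterization or loss; it assumes a hypothetical one-dimensional "teacher" ODE dx/dt = A * x, a hypothetical one-parameter student family that satisfies the boundary condition x(T) = z by construction, and a first-order forward difference for the time derivative. All names (`teacher_drift`, `student`, `residual_loss`) and constants are illustrative assumptions.

```python
import numpy as np

A = -0.5  # drift coefficient of the toy teacher ODE dx/dt = A * x (assumed)
T = 1.0   # time at which the boundary condition x(T) = z is imposed (assumed)

def teacher_drift(x, t):
    # Stand-in for the drift that a teacher diffusion model would define.
    return A * x

def student(theta, z, t):
    # Hypothetical student family: equals z at t = T for every theta,
    # so the boundary condition holds by construction.
    return z * np.exp(theta * (t - T))

def residual_loss(theta, z, t, h=1e-3):
    # PINN-style residual: the student's time derivative should match the drift.
    x = student(theta, z, t)
    dx_dt = (student(theta, z, t + h) - x) / h  # first-order forward difference
    return float(np.mean((dx_dt - teacher_drift(x, t)) ** 2))

# Sample noise inputs and times directly; no trajectories are generated.
rng = np.random.default_rng(0)
z = rng.standard_normal(256)
t = rng.uniform(0.0, T, size=256)

# A grid search over theta stands in for gradient-based training.
thetas = np.linspace(-1.5, 0.5, 41)
losses = [residual_loss(th, z, t) for th in thetas]
best_theta = float(thetas[int(np.argmin(losses))])
print(best_theta)
```

In this toy setting the residual is smallest when the student's parameter matches the teacher's drift coefficient, and the loss is evaluated only at sampled (z, t) pairs, without solving the ODE to produce target samples.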
Peer Reviews
Decision: Submitted to ICLR 2024
1. The single-step generation ability of PID is competitive on CIFAR10. 2. The training cost per step of PID is smaller than that of PD and CD. 3. The training of PID does not require any extra data.
1. PID cannot further improve sample quality by spending more NFEs. It is limited to single-step generation, where the performance is not that impressive. 2. Equations 8 and 9 are equivalent up to a scaling factor for the L2 metric, but not for an arbitrary distance metric such as LPIPS, which is used for the main results. Changing $L_{PINN}$ from Equation 8 to Equation 9 will change the loss landscape. However, this step is not justified or explained in the paper. Why not use the original PINN loss g
- The proposed method and the corresponding numerical methods can achieve results comparable to other distillation methods such as consistency distillation. - The presentation is easy to follow and the algorithms are quite neat.
- Major: - **Missing important related work: BOOT[1]**. The proposed method seems **almost exactly the same as BOOT**, because both distill the integral from time $T$ to time $t$, with the same integral and numerical differentiation method. Please compare with BOOT in detail and discuss the paper's own contributions further. - Minor: - The results in Table 1 are unfair. Some of the results are based on the checkpoint of the VPSDE in ScoreSDE[2], but some of the results are based on
- The authors make an interesting connection between distillation of diffusion models and PINNs via enforcement of the probability flow ODE. - The authors propose PID, a relatively simple method for distillation, which shows results comparable to state-of-the-art single-step image generation for CIFAR10 and ImageNet64. - The paper is generally well-written and clear. - The PID distillation method achieves results comparable to current state-of-the-art single-step image generation methods (1) usi
- The specific parameterization of the PID model (Eqn. 7) seems somewhat undermotivated to me. The authors mention they take inspiration from the two common approaches to enforcing boundary conditions with PINNs, soft and strict conditions. However, beyond this high-level explanation, the parameterization is not justified and no ablations are performed. - Similarly, a first-order numerical approximation of the residual loss is proposed for the sake of efficient training, but no ablations are per
Taxonomy
Topics: Process Optimization and Integration · Advanced Control Systems Optimization
Methods: Diffusion
