Physics-Informed Distillation of Diffusion Models for PDE-Constrained Generation
Yi Zhang, Difan Zou

TL;DR
This paper introduces PIDDM, a post-hoc distillation method that enforces PDE constraints on diffusion models, improving physical fidelity and efficiency in generative modeling of physical systems.
Contribution
Proposes a novel post-hoc distillation approach to incorporate PDE constraints into diffusion models, enhancing physical accuracy without compromising generative quality.
Findings
PIDDM improves PDE satisfaction over baselines.
Supports both forward and inverse PDE problems.
Achieves better physical fidelity with less computation.
Abstract
Modeling physical systems in a generative manner offers several advantages, including the ability to handle partial observations, generate diverse solutions, and address both forward and inverse problems. Recently, diffusion models have gained increasing attention in the modeling of physical systems, particularly those governed by partial differential equations (PDEs). However, diffusion models only access noisy data at intermediate steps, making it infeasible to directly enforce constraints on the clean sample at each noise level. As a workaround, constraints are typically applied to the expectation of clean samples, which is estimated using the learned score network. However, imposing PDE constraints on the expectation is not strictly equivalent to imposing them on the true clean data, a discrepancy known as Jensen's Gap.…
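The clean-sample expectation mentioned in the abstract is usually obtained from the score through Tweedie's formula. The following is a minimal sketch, not the paper's setup. It assumes a variance-preserving forward process $x_t=\sqrt{\bar\alpha}\,x_0+\sqrt{1-\bar\alpha}\,\epsilon$ and a 1D Gaussian data distribution, so that the exact score is available in place of a trained network. The values of `mu`, `s`, and `abar` are illustrative.

```python
import math

# Assumed VP forward process: x_t = sqrt(abar) * x0 + sqrt(1 - abar) * noise.
# Assumed data distribution x0 ~ N(mu, s^2), so the score is known exactly;
# in practice a learned score network would approximate it.
mu, s, abar = 2.0, 0.5, 0.3

def score(x_t):
    """Exact score d/dx log p_t(x_t) of the Gaussian marginal at this noise level."""
    var_t = abar * s**2 + (1.0 - abar)
    return -(x_t - math.sqrt(abar) * mu) / var_t

def tweedie_mean(x_t):
    """Tweedie's formula: E[x0 | x_t] = (x_t + (1 - abar) * score(x_t)) / sqrt(abar)."""
    return (x_t + (1.0 - abar) * score(x_t)) / math.sqrt(abar)

def exact_posterior_mean(x_t):
    """Closed-form conjugate-Gaussian posterior mean, used as a cross-check."""
    var_t = abar * s**2 + (1.0 - abar)
    return mu + math.sqrt(abar) * s**2 * (x_t - math.sqrt(abar) * mu) / var_t

x_t = 1.7
print(tweedie_mean(x_t), exact_posterior_mean(x_t))  # the two estimates agree
```

This estimate is a posterior *mean*. Any constraint evaluated on it therefore acts on an average of plausible clean samples, not on any single one of them.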
Peer Reviews
Decision: ICLR 2026 Conference Withdrawn Submission
**S1.** The paper presents a clear and well-motivated solution to the Jensen’s gap problem in physics-informed diffusion models through a principled post-hoc distillation framework that enforces constraints directly on generated samples. **S2.** The approach is conceptually simple yet effective, showing strong and consistent performance across diverse PDE benchmarks and tasks such as forward, inverse, and reconstruction problems, outperforming competitive baselines. **S3.** The experiments inc
**W1.** The theoretical and methodological novelty of the work is incremental. While the paper presents a clear and convincing empirical illustration of the Jensen’s gap, the idea of distillation in constrained generative modeling has already been explored in frameworks such as rectified flows and consistency models. The proposed post-hoc distillation strategy largely extends existing one-step distillation approaches rather than introducing a fundamentally new formulation. It also remains unclea
1) Originality — A post-hoc distillation that directly penalizes the PDE residual on final samples $x_0$ via $\|R(x_0)\|^2$ reduces the mismatch from enforcing $R\!\left(\mathbb{E}[x_0\mid x_t]\right)$; a single distilled model supports forward, inverse, and reconstruction using masked latent optimization $x_{\mathrm{mix}}=x'\odot M+d_{\theta'}(\epsilon)\odot(1-M)$, a creative combination that removes multi-stage guidance limits. 2) Quality — Experiments across multiple PDEs (e.g., Darcy/Poisso
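The masked latent optimization quoted above can be sketched as follows. Several assumptions are made, none of them taken from the paper:

- The objective is only the squared residual of $x_{\mathrm{mix}}$; the review does not state the full objective.
- A linear map stands in for the distilled generator $d_{\theta'}$.
- The "PDE" is the toy equation $u''=0$ on an 8-point grid, with both boundary values observed.

The names `generate`, `mix`, and `loss_and_grad` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8  # number of grid points in a 1D field (illustrative)

# Hypothetical stand-in for the one-step generator d_theta': d(eps) = W @ eps + b.
# W is orthogonal so this toy problem is well conditioned for gradient descent.
W, _ = np.linalg.qr(rng.normal(size=(n, n)))
b = np.zeros(n)

def generate(eps):
    return W @ eps + b

# Hypothetical linear residual R(x) = A @ x for u'' = 0 at interior points,
# using the second-difference stencil [1, -2, 1].
A = np.zeros((n - 2, n))
for i in range(n - 2):
    A[i, i:i + 3] = [1.0, -2.0, 1.0]

# Observations x' and mask M (1 = observed): here only the two boundary values.
x_obs = np.zeros(n)
x_obs[-1] = 1.0
M = np.zeros(n)
M[[0, -1]] = 1.0

def mix(eps):
    # x_mix = x' * M + d(eps) * (1 - M): keep observed entries, generate the rest.
    return x_obs * M + generate(eps) * (1.0 - M)

def loss_and_grad(eps):
    r = A @ mix(eps)                              # residual R(x_mix)
    loss = float(r @ r)                           # ||R(x_mix)||^2
    # Chain rule: d x_mix / d eps = diag(1 - M) @ W.
    grad = 2.0 * W.T @ ((1.0 - M) * (A.T @ r))
    return loss, grad

eps = rng.normal(size=n)                          # random initial latent
initial_loss, _ = loss_and_grad(eps)
lr = 0.03                                         # below 2 / (largest Hessian eigenvalue)
for _ in range(5000):
    _, grad = loss_and_grad(eps)
    eps -= lr * grad                              # optimize only the latent

x_mix = mix(eps)
final_loss, _ = loss_and_grad(eps)
print(np.round(x_mix, 3))  # approaches the linear ramp i / (n - 1)
```

Only the latent is optimized; the generator weights stay frozen. The observed entries are copied in verbatim rather than fitted.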
1. No theoretical guarantee for the “Jensen’s Gap” claim: the paper asserts a mismatch $R(\mathbb{E}[x_0\mid x_t]) \neq \mathbb{E}[R(x_0)\mid x_t]$ (Eq. 4) and says the method “bypasses” it, but provides no proposition/bound/consistency theorem; the Limitations section does not add theory. 2. Navier–Stokes residual use is not ablated: the NS setup is described, but it is unclear whether the residual is enforced over full space–time versus endpoints/boundaries, and no ablation is reported to ass
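The mismatch in Eq. 4 can be made concrete with a minimal toy example. It assumes a scalar nonlinear "residual" $R(x)=x^2-1$ and a two-point posterior; neither is from the paper. Every clean sample satisfies the constraint exactly, yet the posterior mean does not.

```python
def residual(x: float) -> float:
    """Hypothetical nonlinear constraint: zero exactly when x = +1 or x = -1."""
    return x * x - 1.0

# Assume p(x0 | xt) puts equal mass on two clean samples that both satisfy R = 0.
posterior_samples = [-1.0, 1.0]
weights = [0.5, 0.5]

# E[x0 | xt]: the quantity a learned denoiser estimates.
posterior_mean = sum(w * x for w, x in zip(weights, posterior_samples))

residual_of_mean = residual(posterior_mean)  # R(E[x0 | xt])
mean_of_residual = sum(w * residual(x) for w, x in zip(weights, posterior_samples))  # E[R(x0) | xt]

print(f"R(E[x0|xt]) = {residual_of_mean}")  # R(E[x0|xt]) = -1.0
print(f"E[R(x0)|xt] = {mean_of_residual}")  # E[R(x0)|xt] = 0.0
```

If $R$ were affine, the two quantities would coincide by linearity of expectation. The gap is therefore a consequence of the residual's nonlinearity.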
1. It proposes a new perspective to solve the challenge of PDE constraint in generation. This perspective is quite different from previous approaches that suffer from the inaccuracy caused by Jensen's gap. 2. The presentation is clear and easy to follow. 3. The experiments are convincing. The compared methods are complete and the results of the proposed method are strong.
1. For the contributions claimed in the Introduction section, the "empirical confirmation of Jensen’s gap" can hardly be listed as one of the main contributions. Jensen’s gap arises from using $\mathbb{E}[x_0\mid x_t]$ to estimate clean samples from $x_t$ as a replacement for the real, unknown $x_0$ when optimizing a certain objective or satisfying certain constraints. This is a widely adopted practice and a recognized issue in the diffusion-model community. Thus, the empirical confirmation of this gap in generative
- Originality. This paper takes a simple but fresh angle: stop enforcing physics on noisy states or posterior means, and put the PDE loss on the actual final samples. Doing this via teacher-to-one-step-student distillation with a PDE loss is a clean combination. It is not a brand-new primitive but a well-targeted rethink of where the constraint belongs.
- Quality. The paper supports its claims across some PDE benchmarks, and comparisons are made against some of the relevant and competitive baselines.
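The teacher-to-one-step-student objective described in these reviews can be sketched under heavy simplifying assumptions:

- A fixed linear map stands in for the teacher's full multi-step sampler.
- The student is linear.
- The "PDE" is the toy equation $u''=0$.

The loss pairs a distillation term on shared noise with the $\|R(x_0)\|^2$ penalty on the student's clean outputs. The weight `lam`, the learning rate, and all shapes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, batch, lam, lr = 16, 256, 10.0, 1e-3   # illustrative sizes and weights

# Toy residual R(x) = A @ x: second differences for the equation u'' = 0.
A = np.zeros((n - 2, n))
for i in range(n - 2):
    A[i, i:i + 3] = [1.0, -2.0, 1.0]

# Hypothetical frozen teacher: a fixed linear map from noise to a sample,
# standing in for running the multi-step diffusion sampler from eps.
G = rng.normal(size=(n, n)) / np.sqrt(n)
S = G.copy()                        # linear one-step student, initialised at the teacher
eps = rng.normal(size=(batch, n))   # noise batch shared by teacher and student

def residual_energy(x):
    """Mean squared PDE residual ||R(x_0)||^2 over a batch of samples."""
    return float(np.mean(np.sum((x @ A.T) ** 2, axis=1)))

x_teacher = eps @ G.T
for _ in range(3000):
    x_student = eps @ S.T
    # Gradient of mean ||x_s - x_t||^2 + lam * mean ||A x_s||^2 with respect to S.
    g_distill = 2.0 * (x_student - x_teacher).T @ eps / batch
    g_physics = 2.0 * A.T @ (x_student @ A.T).T @ eps / batch
    S -= lr * (g_distill + lam * g_physics)

x_student = eps @ S.T
print("teacher residual:", residual_energy(x_teacher))
print("student residual:", residual_energy(x_student))
print("distillation gap:", float(np.mean(np.sum((x_student - x_teacher) ** 2, axis=1))))
```

The weight `lam` trades fidelity to the teacher against physical consistency. Because the penalty acts on actual outputs rather than posterior means, it is not subject to the gap in Eq. 4.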
### Unjustified Complexity for Downstream Tasks Compared to Standard Inverse Problem Methods
The paper's approach to downstream tasks uses a complex, iterative optimization that appears potentially redundant, relatively fragile, and poorly justified given the problem's inherent structure and existing solution paradigms.
- By modeling the joint field $x=(u,a)$, all downstream tasks inherently become inverse problems: estimating the unknown components of $x$ given partial observations defined by m
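Under the joint-field view described in this review, the three downstream tasks differ only in which entries of $x=(u,a)$ are observed. The sketch below makes several assumptions:

- The field lives on a 1D grid, with $u$ stacked as channel 0 and $a$ as channel 1.
- Reconstruction observes entries uniformly at random, at a rate set by `frac`.

The function name `task_mask` and the parameters `frac` and `seed` are hypothetical.

```python
import numpy as np

def task_mask(task: str, n: int, frac: float = 0.1, seed: int = 0) -> np.ndarray:
    """Observation mask over a stacked joint field x = (u, a) of shape (2, n).

    Channel 0 holds the solution u, channel 1 the coefficient a; 1 = observed.
    """
    M = np.zeros((2, n))
    if task == "forward":             # coefficient a given, solve for u
        M[1] = 1.0
    elif task == "inverse":           # solution u given, recover a
        M[0] = 1.0
    elif task == "reconstruction":    # sparse random observations of both fields
        rng = np.random.default_rng(seed)
        M = (rng.random((2, n)) < frac).astype(float)
    else:
        raise ValueError(f"unknown task: {task}")
    return M

n = 64
for task in ("forward", "inverse", "reconstruction"):
    M = task_mask(task, n)
    print(task, "observed fraction per channel (u, a):", M.mean(axis=1))
```

Each mask can be passed unchanged into the $x_{\mathrm{mix}}=x'\odot M+d_{\theta'}(\epsilon)\odot(1-M)$ construction quoted in the reviews. That is the sense in which a single distilled generator serves all three tasks.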
Taxonomy
Topics: Model Reduction and Neural Networks · Generative Adversarial Networks and Image Synthesis · Block Copolymer Self-Assembly
Methods: Softmax · Attention Is All You Need · Diffusion
