Beyond Loss Guidance: Using PDE Residuals as Spectral Attention in Diffusion Neural Operators
Medha Sawhney, Abhilash Neog, Mridul Khurana, Anuj Karpatne

TL;DR
PRISMA introduces a spectral attention mechanism in diffusion neural operators that embeds PDE residuals directly into the model, enabling fast, robust, and hyperparameter-free inference for solving PDEs, especially with noisy data.
Contribution
It proposes a novel spectral attention approach that incorporates PDE residuals into the neural operator architecture, eliminating the need for slow gradient-based optimization.
Findings
Achieves comparable accuracy with significantly fewer denoising steps.
Provides faster inference, up to 250 times quicker than previous methods.
Demonstrates robustness to noisy PDE residuals across multiple benchmarks.
Abstract
Diffusion-based solvers for partial differential equations (PDEs) are often bottlenecked by slow gradient-based test-time optimization routines that use PDE residuals for loss guidance. They additionally suffer from optimization instabilities and are unable to dynamically adapt their inference scheme in the presence of noisy PDE residuals. To address these limitations, we introduce PRISMA (PDE Residual Informed Spectral Modulation with Attention), a conditional diffusion neural operator that embeds PDE residuals directly into the model's architecture via attention mechanisms in the spectral domain, enabling gradient-descent-free inference. In contrast to previous methods that use PDE loss solely as external optimization targets, PRISMA integrates PDE residuals as integral architectural features, making it inherently fast, robust, accurate, and free from sensitive hyperparameter tuning.…
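The idea of feeding a PDE residual into a spectral-domain forward pass, rather than backpropagating through it at test time, can be illustrated with a toy sketch. This is not PRISMA's architecture: the function names `residual_gate` and `modulate`, the `scale` parameter, and the hand-set exponential gate are all assumptions made for illustration, and a plain frequency-wise gate is much simpler than the learned attention the abstract describes. The sketch only shows that a residual's spectrum can reweight a field's Fourier modes without any gradient step.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * math.pi * k * j / n) for j in range(n))
            for k in range(n)]


def idft(X):
    """Inverse of dft; returns a complex sequence."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * j / n) for k in range(n)) / n
            for j in range(n)]


def residual_gate(residual, scale=1.0):
    """Per-frequency weights in (0, 1]: modes with large residual energy get small weights.

    Hypothetical hand-set gate; a trained model would learn this mapping instead.
    """
    mags = [abs(r) for r in dft(residual)]
    peak = max(mags) or 1.0  # fall back to 1.0 when the residual vanishes everywhere
    return [math.exp(-scale * m / peak) for m in mags]


def modulate(field, residual, scale=1.0):
    """Reweight the field's Fourier modes by the residual-derived gate (forward pass only)."""
    U = dft(field)
    gate = residual_gate(residual, scale)
    # For a real residual the gate is symmetric in frequency, so the result is real
    # up to rounding and the imaginary part can be dropped.
    return [v.real for v in idft([g * u for g, u in zip(gate, U)])]


if __name__ == "__main__":
    n = 16
    field = [math.sin(2 * math.pi * j / n) + math.sin(6 * math.pi * j / n) for j in range(n)]
    residual = [math.sin(6 * math.pi * j / n) for j in range(n)]  # error concentrated in mode 3
    print(modulate(field, residual, scale=2.0))
```

In this toy setting, a zero residual leaves the field unchanged, while a residual concentrated in one frequency damps only that mode of the field.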
Peer Reviews
Decision: Submitted to ICLR 2026
1. The spectral residual attention block seems to be a novel module for conditioning diffusion models/operators on residuals from sparse and noisy observations. This can lead to improved performance, in particular for noisy observations (potentially due to the learned gating mechanism). Since no guidance is needed during inference, this further reduces the required number of steps and inference time compared to DiffusionPDE and FunDPS.
2. The paper includes a comprehensive set of experiments (fr
1. A major concern is that PRISMA considers a different setting than the two main baselines. The baselines only require paired data, but no prior knowledge of the PDE or the corruptions (type of masking/noise/etc.) during training. In other words, these guidance-based methods are *agnostic* to the corruptions and just rely on inference-time control. On the other hand, PRISMA assumes knowledge of the type of masks, noise, and PDE equations *during training*. This is a more restrictive setting and
### Novel PDE-informed guidance mechanism
Using complex-valued attention in Fourier space to modulate frequencies based on PDE residuals is genuinely novel. Through gating and normalization, the authors established a well-rounded method to perform this task.
### Impressive speedup of the inference (for diffusion models)
Reaching comparable accuracy in some tasks with only 20 inference steps is impressive. Having a speedup of 15-250x is the strongest contribution of this work. This brings the neural
### Method is only decent on full observations, compared to other models (Table 6).
In this case PINO and the other models seem to outperform the proposed method (in speed and accuracy).
### Noise robustness claims (Table 3).
Overall this table shows that diffusion models are better suited for noisy data. You should compare PINO/FNO trained with the same data augmentation (noise injection during training). The current comparison falls flat, as this setting would be out-of-distribution data for P
- (Originality) This work introduces a new framework that can incorporate physical information other than direct calculation and backpropagation of physical loss, which seems to reduce the training time and inference steps for diffusion-based frameworks.
- (Clarity) Besides some minor issues with mathematical symbols (see weaknesses part), the general presentation of this work is easy to follow.
- (Wrong Physical Residual Calculation of NS) One of the most serious problems with this paper is that the calculation of the physical residual for non-bounded NS equations, which was adopted from DiffusionPDE, is actually wrong. The vorticity $\vec{\omega}(x,y)=\vec{\nabla} \times \vec{v}(x, y)$ is an (axial) vector which only has a $z$ component and is a function of $x, y$. Therefore, its zero divergence, $\vec{\nabla} \cdot \vec{\omega}(x,y) = \frac{\partial \omega}{\partial z} = 0$ cannot be regar
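
The vorticity argument in the last review can be spelled out in a short derivation. It assumes a planar velocity field $\vec{v} = (v_x(x,y),\, v_y(x,y),\, 0)$; the component names $v_x, v_y$ are notation introduced here for illustration, not taken from the paper or the review.

$$
\vec{\omega} = \vec{\nabla} \times \vec{v}
= \left(0,\; 0,\; \frac{\partial v_y}{\partial x} - \frac{\partial v_x}{\partial y}\right)
= \omega(x,y)\,\hat{z},
\qquad
\vec{\nabla} \cdot \vec{\omega} = \frac{\partial \omega(x,y)}{\partial z} = 0 .
$$

Since $\omega$ does not depend on $z$, this divergence vanishes identically for every planar velocity field, so it places no constraint on the solution. That appears to be the basis of the reviewer's objection to using it as a residual.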
Taxonomy
Topics: Model Reduction and Neural Networks · Generative Adversarial Networks and Image Synthesis · Neural Networks and Reservoir Computing
