Softly Constrained Denoisers for Diffusion Models

Victor M. Yeom-Song; Severi Rissanen; Arno Solin; Samuel Kaski; Mingfei Sun

arXiv:2512.14980 · cs.LG · February 6, 2026

TL;DR

This paper introduces softly constrained denoisers for diffusion models that incorporate constraint guidance directly into the denoising process, improving compliance without biasing the entire model.

Contribution

It proposes a novel method to embed constraint guidance into the denoising step, avoiding bias from loss-based regularization and allowing flexibility in case of constraint misspecification.

Findings

1. Improved constraint compliance over standard denoisers
2. Maintains flexibility to deviate from constraints when misspecified
3. Enhances diffusion model applicability in scientific contexts

Abstract

Diffusion models struggle to produce samples that respect constraints, a common requirement in scientific applications. Recent approaches have introduced regularization terms in the loss or guidance methods during sampling to enforce such constraints, but they bias the generative model away from the true data distribution. This is a problem when the constraint is misspecified, which is a common issue in scientific applications where constraint formulation is challenging. We propose to integrate guidance-inspired adjustments to the denoiser, instead of the loss or sampling loop. This achieves a soft inductive bias towards constraint-compliant samples. We show that these softly constrained denoisers exploit constraint knowledge to improve compliance over standard denoisers, while maintaining enough flexibility to deviate from it in case of misspecification with observed data.
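A minimal sketch of the idea follows. Several assumptions are not taken from the paper: the constraint is a unit-circle residual r(x) = ||x||^2 - 1, the adjustment is a single gradient step on 0.5 * r^2 taken at the denoiser output, and the step size lam(sigma) stands in for a learnable, noise-level-dependent scale. The names softly_constrained_denoiser and residual_penalty_grad, as well as the shrinkage stand-in for a trained denoiser, are hypothetical. This is an illustration, not the authors' implementation.

```python
import numpy as np

def circle_residual(x, radius=1.0):
    """Constraint residual r(x) = ||x||^2 - radius^2 (zero on the circle)."""
    return np.sum(x**2, axis=-1) - radius**2

def residual_penalty_grad(x, radius=1.0):
    """Gradient of 0.5 * r(x)^2 with respect to x, i.e. r(x) * 2x."""
    r = circle_residual(x, radius)
    return 2.0 * r[..., None] * x

def softly_constrained_denoiser(base_denoiser, x_t, sigma, lam, radius=1.0):
    """Hypothetical soft correction: D~(x_t) = D(x_t) - lam(sigma) * grad 0.5 r(D)^2.

    The gradient is evaluated at the denoiser output D(x_t), not at x_t, so no
    vector-Jacobian product through base_denoiser is needed. lam plays the role
    of a learnable, noise-level-dependent scale; lam = 0 gives back D(x_t).
    """
    x0_hat = base_denoiser(x_t, sigma)
    return x0_hat - lam(sigma) * residual_penalty_grad(x0_hat, radius)

# Stand-in denoiser (hypothetical): shrink the noisy input toward the origin.
shrink = lambda x_t, sigma: x_t / (1.0 + sigma**2)

x_t = np.array([[1.8, 0.0], [0.0, -1.6]])
sigma = 0.5
plain = softly_constrained_denoiser(shrink, x_t, sigma, lam=lambda s: 0.0)
soft = softly_constrained_denoiser(shrink, x_t, sigma, lam=lambda s: 0.1)
# For these two points the soft correction reduces the constraint residual.
print(np.abs(circle_residual(plain)), np.abs(circle_residual(soft)))
```

Setting the scale to zero returns the base denoiser unchanged. In this sketch, that is the sense in which the bias toward the constraint is soft: a scale near zero lets the data, rather than the constraint, determine the output.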

Peer Reviews

Decision · Submitted to ICLR 2026

Reviewer 01 · Rating 2 · Confidence 4

Strengths

The paper offers a new perspective on constrained distribution estimation by introducing a learnable scaling factor to replace the covariance term. This substitution effectively reduces computational complexity while preserving the structure of standard diffusion training and sampling schemes, requiring no architectural or procedural modifications.
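One standard way to see where a covariance term arises is sketched below. It assumes a variance-exploding forward process x_t = x_0 + sigma * eps and a constraint likelihood approximated as log p(c | x_t) ≈ -l(x̂_0(x_t)) for a constraint loss l. This is a generic guidance derivation written for illustration. It is not a reproduction of the paper's Eqs. (4) or (6), and lambda_theta(sigma) is a hypothetical name for the learnable scale.

```latex
\begin{aligned}
\hat{x}_0(x_t) &= x_t + \sigma^2 \nabla_{x_t} \log p(x_t), \qquad
\frac{\partial \hat{x}_0}{\partial x_t} = \frac{\operatorname{Cov}[x_0 \mid x_t]}{\sigma^2}, \\
\hat{x}_0^{c}(x_t) &= x_t + \sigma^2 \nabla_{x_t} \log p(x_t \mid c)
  = \hat{x}_0(x_t) + \sigma^2 \nabla_{x_t} \log p(c \mid x_t) \\
&\approx \hat{x}_0(x_t) - \sigma^2 \Big(\frac{\partial \hat{x}_0}{\partial x_t}\Big)^{\top} \nabla \ell\big(\hat{x}_0(x_t)\big)
  = \hat{x}_0(x_t) - \operatorname{Cov}[x_0 \mid x_t]\, \nabla \ell\big(\hat{x}_0(x_t)\big), \\
\hat{x}_0^{c}(x_t) &\approx \hat{x}_0(x_t) - \lambda_\theta(\sigma)\, \nabla \ell\big(\hat{x}_0(x_t)\big)
  \quad \text{(covariance replaced by } \lambda_\theta(\sigma) I \text{)}.
\end{aligned}
```

The covariance-weighted term requires a vector-Jacobian product through the denoiser. The scalar surrogate avoids that product but treats the posterior covariance as isotropic.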

Weaknesses

1. The method builds heavily on prior work, particularly on the approximation from Eq. (6) to Eq. (4), and the main contribution lies in substituting the covariance with a learnable scaling factor. While this reduces computation, it also diminishes interpretability and controllability. There is no analysis of the relationship between the scaling factor and the covariance.
2. The experimental section is weak. The experiments in Figs. 1 and 2 are performed only on toy datasets. The quantitative

Reviewer 02 · Rating 2 · Confidence 3

Strengths

On a toy example of data generation on a circle with misspecified constraints, SCD maintains a low Wasserstein-1 distance (i.e., better data fidelity) while PIDM degrades as misspecification increases, indicating that SCD is robust to wrong constraints. On the Darcy Flow PDE benchmark, SCD avoids the strong distributional bias observed with PIDM.
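The fidelity metric mentioned here can be illustrated with a simplified proxy. The sketch assumes a 2-D circle dataset and compares only the distributions of radii. It uses the closed form of the 1-D Wasserstein-1 distance between two equal-size empirical samples, which is the mean absolute difference of the sorted values. The paper's actual evaluation protocol (dimension, sample sizes, solver) is not stated here, and sample_circle and radial_w1 are hypothetical helpers.

```python
import numpy as np

def sample_circle(n, radius, rng):
    """Points with uniformly random angle on a circle of the given radius."""
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    return radius * np.stack([np.cos(theta), np.sin(theta)], axis=1)

def radial_w1(a, b):
    """1-D Wasserstein-1 distance between the radius distributions of two
    equal-size point clouds: mean |sorted(r_a) - sorted(r_b)|."""
    ra = np.sort(np.linalg.norm(a, axis=1))
    rb = np.sort(np.linalg.norm(b, axis=1))
    if ra.shape != rb.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.mean(np.abs(ra - rb)))

rng = np.random.default_rng(0)
data = sample_circle(2000, radius=1.0, rng=rng)
# A sampler that followed a misspecified constraint exactly (radius 1.2 here)
# is off by the size of the misspecification on this proxy.
biased = sample_circle(2000, radius=1.2, rng=rng)
print(radial_w1(data, biased))  # about 0.2
```

A sampler that tracks the data stays near zero on this proxy. A full 2-D W1 would need an optimal-transport solver rather than sorting.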

Weaknesses

The theoretical idea is not sufficiently novel. The experiments are limited to a toy setup of circles and a single PDE benchmark, so external validity across other constraint types (hard equalities/inequalities, manifold constraints) is unclear, and distributional fidelity is mostly argued via histograms and qualitative plots rather than likelihood or FID scores. Methodologically, the denoiser correction hinges on aggressive approximations that avoid VJPs but offer no guarantee against bias or consis
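The sketch below makes the VJP point concrete with a toy Gaussian prior. For this prior, the posterior-mean denoiser is linear, and its Jacobian is available in closed form. The code compares the guidance term that needs a vector-Jacobian product with a VJP-free surrogate that replaces the covariance by a single scalar. The prior, the unit-circle loss, and all function names are assumptions for illustration, not the paper's setup.

```python
import numpy as np

def gaussian_denoiser_matrix(Sigma, sigma):
    """Posterior-mean denoiser for x0 ~ N(0, Sigma), x_t = x0 + sigma * eps.

    D(x_t) = A @ x_t with A = Sigma (Sigma + sigma^2 I)^{-1}. Its Jacobian is A,
    which equals Cov[x0 | x_t] / sigma^2.
    """
    d = Sigma.shape[0]
    return Sigma @ np.linalg.inv(Sigma + sigma**2 * np.eye(d))

def loss_grad(x0):
    """Gradient of l(x) = 0.5 * (||x||^2 - 1)^2, a unit-circle constraint loss."""
    r = x0 @ x0 - 1.0
    return 2.0 * r * x0

def vjp_guided(A, x_t, sigma):
    """Guided estimate D(x_t) - sigma^2 * J^T grad l(D(x_t)), with J = A (needs a VJP)."""
    x0_hat = A @ x_t
    return x0_hat - sigma**2 * (A.T @ loss_grad(x0_hat))

def scalar_guided(A, x_t, lam):
    """VJP-free surrogate D(x_t) - lam * grad l(D(x_t)), covariance replaced by lam * I."""
    x0_hat = A @ x_t
    return x0_hat - lam * loss_grad(x0_hat)

sigma = 0.5
x_t = np.array([1.2, -0.4])

# Isotropic prior: Cov[x0 | x_t] is exactly lam_iso * I, so the surrogate matches.
s = 2.0
A_iso = gaussian_denoiser_matrix(s * np.eye(2), sigma)
lam_iso = sigma**2 * s / (s + sigma**2)
iso_gap = np.linalg.norm(vjp_guided(A_iso, x_t, sigma) - scalar_guided(A_iso, x_t, lam_iso))

# Anisotropic prior: no single scalar matches the covariance in every direction.
A_aniso = gaussian_denoiser_matrix(np.diag([4.0, 0.25]), sigma)
lam_avg = sigma**2 * np.trace(A_aniso) / 2.0
aniso_gap = np.linalg.norm(vjp_guided(A_aniso, x_t, sigma) - scalar_guided(A_aniso, x_t, lam_avg))
print(iso_gap, aniso_gap)
```

In this toy, the surrogate is exact when the posterior covariance is isotropic and deviates otherwise. This is one concrete form that the bias raised in this review could take.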

Reviewer 03 · Rating 4 · Confidence 3

Strengths

1. The idea of integrating constraints through the architecture rather than the loss is interesting.
2. The method demonstrates robustness to constraint misspecification, which it handles adaptively.
3. The computational cost is minimal, and the method is drop-in compatible with standard denoisers.

Weaknesses

1. Although the proposed method is more flexible than traditional physics-informed regularization, it still relies on the correctness of the constraint function (for example, the residual term that encodes a physical law). If that constraint is inaccurate or incomplete, the model can still be guided in the wrong direction. The paper argues that the learnable scaling factor can help the model “ignore” bad constraints, but there is no formal guarantee that this will always happen. In highly misspe
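A toy setting can illustrate the mechanism described here: a learnable scale that can switch off a bad constraint. The base denoiser is a hypothetical stand-in (clean data plus Gaussian error). The scale is fit by least squares subject to lam >= 0, and the constraint is a circle of either the correct or a wrong radius. This is a single illustrative case, consistent with the reviewer's point that there is no general guarantee, and best_nonneg_scale is a hypothetical helper.

```python
import numpy as np

def loss_grad(x, radius):
    """Gradient of 0.5 * (||x||^2 - radius^2)^2 for each row of x."""
    r = np.sum(x**2, axis=1) - radius**2
    return 2.0 * r[:, None] * x

def best_nonneg_scale(x0, x0_hat, radius):
    """Least-squares scale lam >= 0 minimising mean ||x0_hat - lam * g - x0||^2.

    The unconstrained minimiser is mean(e . g) / mean(g . g) with e = x0_hat - x0;
    for a 1-D convex quadratic, clipping at zero gives the constrained optimum.
    """
    g = loss_grad(x0_hat, radius)
    e = x0_hat - x0
    lam_star = np.mean(np.sum(e * g, axis=1)) / np.mean(np.sum(g * g, axis=1))
    return max(0.0, float(lam_star))

rng = np.random.default_rng(0)
n = 20000
theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
x0 = np.stack([np.cos(theta), np.sin(theta)], axis=1)  # data on the unit circle
# Hypothetical imperfect base denoiser: clean data plus Gaussian error.
x0_hat = x0 + 0.1 * rng.standard_normal((n, 2))

lam_correct = best_nonneg_scale(x0, x0_hat, radius=1.0)  # constraint matches the data
lam_wrong = best_nonneg_scale(x0, x0_hat, radius=2.0)    # misspecified constraint
print(lam_correct, lam_wrong)
```

With the correct radius, the fitted scale is positive. With radius 2.0, the correction pushes predictions outward, away from the data, so the nonnegative least-squares scale is clipped to zero. Whether the same happens for milder or different misspecifications depends on the setup.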

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Markov Chains and Monte Carlo Methods · Gaussian Processes and Bayesian Inference