Why Heuristic Weighting Works: A Theoretical Analysis of Denoising Score Matching
Juyan Zhang, Rhys Newbury, Xinyang Zhang, Tin Tran, Dana Kulic, Michael Burke

TL;DR
This paper provides a theoretical foundation for the heuristic weighting used in denoising score matching, showing it approximates the optimal weighting and can lead to more stable training in diffusion models.
Contribution
It formally derives optimal weighting functions for denoising score matching, explaining the heuristic's effectiveness and its relation to heteroskedasticity.
Findings
Heuristic weighting arises as a first-order Taylor approximation to the trace of the expected optimal weighting.
Optimal weighting can have lower variance than heuristic weighting.
Theoretical and empirical analysis supports the use of heuristic weighting in practice.
Abstract
Score matching enables the estimation of the gradient of a data distribution, a key component in denoising diffusion models used to recover clean data from corrupted inputs. In prior work, a heuristic weighting function has been used for the denoising score matching loss without formal justification. In this work, we demonstrate that heteroskedasticity is an inherent property of the denoising score matching objective. This insight leads to a principled derivation of optimal weighting functions for generalized, arbitrary-order denoising score matching losses, without requiring assumptions about the noise distribution. Among these, the first-order formulation is especially relevant to diffusion models. We show that the widely used heuristic weighting function arises as a first-order Taylor approximation to the trace of the expected optimal weighting. We further provide theoretical and…
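To make the heteroskedasticity claim concrete, here is a minimal NumPy sketch, not the authors' code. It assumes Gaussian corruption and takes the heuristic weighting to be λ(σ) = σ², the choice commonly used in noise-conditional score models; the abstract itself does not state the formula. The names `dsm_losses` and `zero_score` are hypothetical, and the all-zero score model is only a stand-in for an untrained network.

```python
import numpy as np

def dsm_losses(x, sigma, score_fn, rng):
    """Per-sample denoising score matching losses at one noise level.

    Assumes Gaussian corruption x_noisy = x + sigma * eps, for which the
    regression target is the conditional score -(x_noisy - x) / sigma**2,
    which simplifies to -eps / sigma.
    """
    eps = rng.standard_normal(x.shape)
    x_noisy = x + sigma * eps
    target = -eps / sigma
    residual = score_fn(x_noisy, sigma) - target
    unweighted = np.sum(residual ** 2, axis=1)
    # Assumed heuristic weighting: lambda(sigma) = sigma**2.
    weighted = sigma ** 2 * unweighted
    return unweighted, weighted

def zero_score(x_noisy, sigma):
    # Hypothetical placeholder model that always predicts a zero score.
    return np.zeros_like(x_noisy)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4096, 2))
    for sigma in (0.01, 0.1, 1.0):
        u, w = dsm_losses(x, sigma, zero_score, rng)
        print(f"sigma={sigma}: unweighted mean={u.mean():.3g}, "
              f"weighted mean={w.mean():.3g}")
```

With this placeholder model the unweighted loss scales as 1/σ², so small noise levels dominate the objective, while the σ²-weighted loss stays near the data dimension at every noise level. That scale mismatch across noise levels is the kind of heteroskedasticity the abstract refers to.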
Taxonomy
Topics: Advanced Neuroimaging Techniques and Applications · Generative Adversarial Networks and Image Synthesis · Model Reduction and Neural Networks
