Feature-guided score diffusion for sampling conditional densities
Zahra Kadkhodaie, Stéphane Mallat, Eero P. Simoncelli

TL;DR
This paper introduces a novel feature-guided score diffusion method that improves conditional density sampling by guiding the diffusion process with projected scores based on class feature vectors, resulting in high-quality, diverse, and generalizable samples.
Contribution
The authors propose a new algorithm that guides score diffusion with projected scores learned jointly with feature vectors, enabling better conditional density estimation and out-of-distribution generalization.
Findings
Generated high-quality, diverse samples from conditioned classes.
Feature vectors form a low-dimensional Euclidean embedding of class densities.
Interpolation of feature vectors enables out-of-distribution generation.
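The interpolation finding can be illustrated with a small sketch. It assumes each class is summarized by a centroid feature vector and that intermediate targets are obtained by linear interpolation between two centroids; the function name `interpolate_centroids` and the toy vectors are hypothetical, and the diffusion sampler that such a target would guide is not shown.

```python
def interpolate_centroids(centroid_a, centroid_b, t):
    """Linearly interpolate between two class centroids.

    t = 0 returns centroid_a, t = 1 returns centroid_b, and values in
    between give a target feature vector that belongs to neither class.
    Hypothetical helper for illustration, not the authors' code.
    """
    if len(centroid_a) != len(centroid_b):
        raise ValueError("centroids must have the same dimension")
    return [(1 - t) * a + t * b for a, b in zip(centroid_a, centroid_b)]


# Toy centroids (made-up numbers) for two classes in a 2-D feature space.
class_a = [0.0, 2.0]
class_b = [4.0, 6.0]
midpoint = interpolate_centroids(class_a, class_b, 0.5)
```

Sweeping `t` from 0 to 1 yields a sequence of guidance targets that moves gradually from one class to the other.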
Abstract
Score diffusion methods can learn probability densities from samples. The score of the noise-corrupted density is estimated using a deep neural network, which is then used to iteratively transport a Gaussian white noise density to a target density. Variants for conditional densities have been developed, but correct estimation of the corresponding scores is difficult. We avoid these difficulties by introducing an algorithm that guides the diffusion with a projected score. The projection pushes the image feature vector towards the feature vector centroid of the target class. The projected score and the feature vectors are learned by the same network. Specifically, the image feature vector is defined as the spatial averages of the channel activations in select layers of the network. Optimizing the projected score for denoising loss encourages image feature vectors of each class to cluster…
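As a rough illustration of the feature definition in the abstract, the sketch below computes a feature vector as the spatial average of each channel in a set of selected layers, and a class centroid as the mean of such vectors. Nested Python lists stand in for network activations, and the names `channel_means`, `feature_vector`, and `class_centroid` are hypothetical. The projected score and the training procedure are not implemented here.

```python
from statistics import fmean


def channel_means(layer):
    """Spatial average of every channel in one layer.

    `layer` is a list of channels; each channel is a 2-D list of floats
    (rows by columns). This stands in for a real activation tensor.
    """
    return [fmean(v for row in channel for v in row) for channel in layer]


def feature_vector(layers):
    """Concatenate the channel means of all selected layers."""
    return [m for layer in layers for m in channel_means(layer)]


def class_centroid(feature_vectors):
    """Element-wise mean of the feature vectors of one class."""
    return [fmean(column) for column in zip(*feature_vectors)]


# Toy activations (made-up numbers): one layer with two 2x2 channels,
# and one layer with a single 1x1 channel.
layer_a = [[[1.0, 3.0], [5.0, 7.0]], [[0.0, 0.0], [2.0, 2.0]]]
layer_b = [[[2.0]]]
phi = feature_vector([layer_a, layer_b])  # one entry per channel
centroid = class_centroid([phi, [2.0, 3.0, 4.0]])
```

The feature dimension equals the total number of channels across the selected layers, so the choice of layers fixes the size of the embedding in this sketch.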
Peer Reviews
Decision: Submitted to ICLR 2025
1. The paper is well-structured, easy to read, and highly innovative, introducing a projected score embedded in feature space. This embedding is not only straightforward to obtain (directly extracted from the score estimation model) but also adheres to Euclidean interpolation properties. 2. The model successfully achieves conditional generation in a mixture of Gaussian distributions, demonstrating that the feature-guided score diffusion model can accurately capture conditional density—an abilit
The dataset used in experiments is overly simple. The training dataset is derived by cropping 1700 images into 234k patches. Although the patches are non-overlapping, the data distribution for each class lacks sufficient diversity. Experiments on a more complex dataset, like ImageNet, would strengthen the paper’s validity.
This is a well-written paper. Both score and feature vectors are represented with the same network. The learned feature vectors cluster around their centroids, which enhances the accuracy of sampling from the conditional probability density. The method enables gradual transitions of the images between classes through linear interpolation of mean feature vectors. The experimental results show that a diffusion algorithm based on the projected score provides an accurate sampling of conditional prob
The authors provided a way to build the feature vectors that share the same network weights as the score function. It is not clear how to determine the feature vector dimension.
- Classifier-free guidance (CFG) is the dominant approach for guiding diffusion models today, even though it is known to lead to biased densities. Several recent papers analyzed the drawbacks of the approach from a theoretical standpoint. However, the topic of designing good practical alternatives to CFG is still under-explored. This paper attempts to fill this gap, which is undoubtedly an important goal. - The paper presents clear intuition and empirically validates that the assumptions und
- The whole motivation of the paper is to propose an alternative to existing guidance methods. However, it does not provide theoretical guarantees that the approach samples from the conditional distribution. And it also does not provide any empirical evidence that the proposed approach outperforms the standard way of conditioning diffusion models. Specifically, it does not compare the sampling quality to that obtained with a conditional denoiser (with the common conditioning mechanism for U-Nets
Taxonomy
Topics: Statistical Methods and Inference · Gaussian Processes and Bayesian Inference · Neural Networks and Applications
Methods: Diffusion
