Training-Free Distribution Adaptation for Diffusion Models via Maximum Mean Discrepancy Guidance
Matina Mahdizadeh Sani, Nima Jamali, Mohammad Jalali, Farzan Farnia

TL;DR
This paper introduces MMD Guidance, a training-free method that uses Maximum Mean Discrepancy (MMD) to efficiently adapt pre-trained diffusion models to new target data distributions. It is especially useful for domain adaptation with limited data.
Contribution
It proposes a novel, training-free MMD-based guidance mechanism for diffusion models that directly aligns generated samples with target distributions without retraining.
Findings
Achieves effective distributional alignment with limited data
Preserves sample quality while adapting to new distributions
Applicable to latent diffusion models for computational efficiency
Abstract
Pre-trained diffusion models have emerged as powerful generative priors for both unconditional and conditional sample generation, yet their outputs often deviate from the characteristics of user-specific target data. Such mismatches are especially problematic in domain adaptation tasks, where only a few reference examples are available and retraining the diffusion model is infeasible. Existing inference-time guidance methods can adjust sampling trajectories, but they typically optimize surrogate objectives such as classifier likelihoods rather than directly aligning with the target distribution. We propose MMD Guidance, a training-free mechanism that augments the reverse diffusion process with gradients of the Maximum Mean Discrepancy (MMD) between generated samples and a reference dataset. MMD provides reliable distributional estimates from limited data, exhibits low variance in…
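The abstract describes augmenting the reverse diffusion process with gradients of the MMD between generated samples and a reference dataset. The NumPy sketch below computes that core quantity. It assumes a Gaussian (RBF) kernel with a fixed bandwidth and the biased (V-statistic) MMD² estimator, and all function names are hypothetical. It does not show how the paper injects the gradient into the sampler (for example, whether it is applied to the noisy latent or to a predicted clean sample, or with what schedule). The toy loop stands in for guidance by taking plain gradient-descent steps on the generated points.

```python
import numpy as np


def gaussian_kernel(a, b, bandwidth):
    """Pairwise Gaussian kernel k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 h^2))."""
    sq_dists = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def mmd2(x, y, bandwidth):
    """Biased (V-statistic) estimate of MMD^2 between x of shape (n, d) and y of shape (m, d)."""
    return (gaussian_kernel(x, x, bandwidth).mean()
            + gaussian_kernel(y, y, bandwidth).mean()
            - 2.0 * gaussian_kernel(x, y, bandwidth).mean())


def mmd2_grad_x(x, y, bandwidth):
    """Analytic gradient of mmd2(x, y) with respect to the generated samples x."""
    n, m = x.shape[0], y.shape[0]
    diff_xx = x[:, None, :] - x[None, :, :]          # (n, n, d)
    diff_xy = x[:, None, :] - y[None, :, :]          # (n, m, d)
    kxx = gaussian_kernel(x, x, bandwidth)[..., None]
    kxy = gaussian_kernel(x, y, bandwidth)[..., None]
    # Gradient of a Gaussian kernel in its first argument: -k(a, b) * (a - b) / h^2.
    grad_xx = -(kxx * diff_xx).sum(axis=1) / bandwidth ** 2
    grad_xy = -(kxy * diff_xy).sum(axis=1) / bandwidth ** 2
    # x_i appears in both slots of k(x_i, x_j), hence 2 / n^2; the y-y term does not depend on x.
    return 2.0 / n ** 2 * grad_xx - 2.0 / (n * m) * grad_xy


# Toy 2-D example: nudge "generated" points toward a shifted reference set.
rng = np.random.default_rng(0)
y_ref = rng.normal(loc=1.5, size=(200, 2))   # reference samples (target domain)
x_gen = rng.normal(loc=0.0, size=(64, 2))    # stand-in for generated samples
h, step = 1.0, 10.0                          # kernel bandwidth and guidance-like step size

before = mmd2(x_gen, y_ref, h)
for _ in range(300):
    x_gen = x_gen - step * mmd2_grad_x(x_gen, y_ref, h)
after = mmd2(x_gen, y_ref, h)
print(f"MMD^2 before: {before:.4f}, after: {after:.4f}")
```

In a descent step, the x-x term pushes generated points away from each other, while the x-y cross term pulls them toward the reference samples. The step size plays the role of a guidance strength and is a toy choice for this 2-D example.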
Peer Reviews
Decision · Submitted to ICLR 2026
- The authors use the maximum mean discrepancy (MMD) between generated samples and a reference dataset to address the distribution mismatch problem. This is a natural approach to guiding generation toward the desired distribution.
- MMD is a practical metric for measuring the discrepancy between the generated and reference distributions, since it avoids the curse of dimensionality that affects other metrics.
- The authors provide an explicit evaluation of the MMD gradient and illustrate the necessity o
- MMD is used to measure the distribution mismatch at every step, which might not be the best choice. Initially the mismatch should be large, and it should shrink gradually as generation proceeds, so it may suffice to ensure that the mismatch is small in the final few steps.
- As shown in Theorems 1 and 2, a large amount of reference data is needed to ensure that the empirical cross term is close to the ideal one. Since this is required for all time
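The second weakness concerns how many reference samples the empirical cross term needs before it approaches its ideal value. The sketch below does not reproduce the paper's Theorems 1 and 2. It only illustrates the generic Monte Carlo behavior behind the concern, under assumed settings: a single fixed generated point `x0`, a Gaussian kernel, and Gaussian reference data y ~ N(mu, sigma² I), for which E_y[k(x0, y)] has a closed form. The function names are hypothetical.

```python
import numpy as np


def empirical_cross_term(x0, y, bandwidth):
    """(1/m) * sum_j k(x0, y_j) for a Gaussian kernel: a Monte Carlo estimate of E_y[k(x0, y)]."""
    sq_dists = np.sum((y - x0) ** 2, axis=1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2)).mean()


def ideal_cross_term(x0, mu, sigma, bandwidth):
    """Closed form of E_y[k(x0, y)] when y ~ N(mu, sigma^2 I) and k is a Gaussian kernel."""
    d = x0.shape[0]
    var = bandwidth ** 2 + sigma ** 2
    return (bandwidth ** 2 / var) ** (d / 2) * np.exp(-np.sum((x0 - mu) ** 2) / (2.0 * var))


def rms_error(m, trials=200, d=2, bandwidth=1.0, seed=0):
    """RMS gap between the empirical and ideal cross terms over independent size-m reference sets."""
    rng = np.random.default_rng(seed)
    x0, mu, sigma = np.zeros(d), np.ones(d), 1.0
    target = ideal_cross_term(x0, mu, sigma, bandwidth)
    errors = [empirical_cross_term(x0, rng.normal(mu, sigma, size=(m, d)), bandwidth) - target
              for _ in range(trials)]
    return float(np.sqrt(np.mean(np.square(errors))))


for m in (10, 100, 1000):
    print(f"m = {m:5d}  RMS error = {rms_error(m):.4f}")
```

Because the estimate is an average of i.i.d. kernel values, its RMS error should shrink roughly like 1/sqrt(m). Each tenfold increase in reference data therefore buys only about a threefold reduction in error.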
* MMD Guidance is intuitive: it leverages MMD to guide transfer tasks in a training-free manner.
* Theoretical derivations and toy examples are sensible and support the effectiveness of MMD Guidance.
* The experiments show that MMD Guidance outperforms the CG and fine-tuning baselines on the FFHQ benchmarks.
* An important reference is missing [1], which also presents a guidance-based framework for adapting a pre-trained diffusion model. Reference [1] can be regarded as the classifier-free variant of the DomainCG baseline in Table 1, and is also related to the fine-tuning baseline in Table 5. The authors should carefully discuss [1] and include it as a baseline.
* As a training-free approach, MMD Guidance may struggle to adapt to domains that are quite different from the pre-training domain. Could t
a) No retraining is required.
b) It is easy to implement and understand.
c) It has low computational cost.
d) It is based on a strong theoretical foundation.
e) The method is clearly described.
f) It produces very good visual and quantitative results (FD, KD, Coverage).
g) The experiments are reliable, with results averaged over five random tests.
a) It cannot learn new styles that the diffusion model does not already know.
b) Hyperparameter values (guidance strength, kernel bandwidth) are selected empirically, which can hinder correct image generation.
c) The experiments mainly use simple prompts, such as "person with sunglasses," so the method's effectiveness on more complex prompts is unclear.
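The last review notes that the kernel bandwidth is chosen empirically. One widely used data-driven default is the median heuristic, which sets the Gaussian kernel bandwidth to the median pairwise distance among the reference samples. This is a common default from the kernel literature, not something the source says the paper uses. The function name below is hypothetical, and conventions vary: some variants use the median squared distance, or rescale it by a constant.

```python
import numpy as np


def median_heuristic_bandwidth(y):
    """Median Euclidean distance over distinct pairs of reference samples y of shape (m, d)."""
    diffs = y[:, None, :] - y[None, :, :]
    dists = np.sqrt(np.sum(diffs ** 2, axis=-1))
    upper = np.triu_indices(y.shape[0], k=1)   # distinct pairs only, so zero self-distances are excluded
    return float(np.median(dists[upper]))


# Usage: the result can be passed as the bandwidth of a Gaussian kernel.
ref = np.array([[0.0], [1.0], [3.0]])
print(median_heuristic_bandwidth(ref))   # pairwise distances 1, 3, 2 -> median 2.0
```

This removes one free parameter, but the guidance strength still has to be tuned.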
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Model Reduction and Neural Networks
