Deep MMD Gradient Flow without adversarial training

Alexandre Galashov; Valentin de Bortoli; Arthur Gretton

arXiv:2405.06780 · cs.LG · May 14, 2024 · 1 citation


TL;DR

This paper introduces Diffusion-MMD-Gradient Flow, a noise-adaptive Wasserstein gradient method for generative modeling that avoids adversarial training, showing competitive results on standard image datasets.

Contribution

It proposes a novel diffusion-based MMD gradient flow method for generative modeling that does not require adversarial training, generalizing the existing MMD Gradient Flow approach.
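To make the underlying divergence concrete, the squared MMD between two distributions can be estimated directly from samples. The following is a minimal sketch that assumes a fixed Gaussian kernel with a hand-picked `bandwidth`. DMMD instead trains its kernel features on noisy data, so `gaussian_kernel` and `mmd2_unbiased` are illustrative names, not the authors' code.

```python
import numpy as np

def gaussian_kernel(a, b, bandwidth=1.0):
    # Pairwise Gram matrix k(a_i, b_j) = exp(-||a_i - b_j||^2 / (2 * bandwidth^2)).
    sq_dists = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def mmd2_unbiased(x, y, bandwidth=1.0):
    """Unbiased estimate of MMD^2 between samples x ~ P and y ~ Q."""
    n, m = len(x), len(y)
    kxx = gaussian_kernel(x, x, bandwidth)
    kyy = gaussian_kernel(y, y, bandwidth)
    kxy = gaussian_kernel(x, y, bandwidth)
    # Exclude the diagonal k(x_i, x_i) terms so the within-sample means are unbiased.
    term_xx = (kxx.sum() - np.trace(kxx)) / (n * (n - 1))
    term_yy = (kyy.sum() - np.trace(kyy)) / (m * (m - 1))
    return term_xx + term_yy - 2.0 * kxy.mean()

rng = np.random.default_rng(0)
same = mmd2_unbiased(rng.normal(size=(500, 2)), rng.normal(size=(500, 2)))
shifted = mmd2_unbiased(rng.normal(size=(500, 2)), rng.normal(loc=2.0, size=(500, 2)))
print(f"same distribution: {same:.4f}, shifted: {shifted:.4f}")
```

The estimate stays near zero when both samples come from the same distribution. It is clearly positive when they differ.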

Findings

1. Achieves competitive image generation results on CIFAR10, MNIST, CELEB-A, and LSUN Church.
2. Shows that the approach remains effective when the MMD is replaced with a lower bound on the KL divergence.
3. Provides a non-adversarial alternative to GAN training.
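Finding 2 refers to replacing the MMD with a lower bound on the KL divergence. For any function h, KL(P || Q) >= 1 + E_P[h] - E_Q[exp(h)], with equality at h = log(dP/dQ). As we understand it, KALE maximizes a regularized version of this bound over a restricted function class. The sketch below omits that regularizer. It only checks the unregularized bound by Monte Carlo on two Gaussians whose true KL is known (0.5). `kl_dual_lower_bound` is a hypothetical helper, not part of the paper.

```python
import numpy as np

def kl_dual_lower_bound(h, p_samples, q_samples):
    """Monte Carlo estimate of 1 + E_P[h] - E_Q[exp(h)], a lower bound on KL(P || Q) for any h."""
    return 1.0 + h(p_samples).mean() - np.exp(h(q_samples)).mean()

rng = np.random.default_rng(0)
p = rng.normal(loc=1.0, size=200_000)  # P = N(1, 1)
q = rng.normal(loc=0.0, size=200_000)  # Q = N(0, 1); true KL(P || Q) = 0.5

# h = log(dP/dQ)(x) = x - 0.5 attains the bound.
optimal = kl_dual_lower_bound(lambda x: x - 0.5, p, q)
# A sub-optimal h gives a looser bound (0.375 in expectation).
restricted = kl_dual_lower_bound(lambda x: 0.5 * x - 0.125, p, q)
print(f"optimal h: {optimal:.3f}, restricted h: {restricted:.3f}")
```

Restricting or regularizing the class of h trades some tightness for a quantity that is easier to estimate and differentiate. This is why such bounds can stand in for the MMD as a divergence.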

Abstract

We propose a gradient flow procedure for generative modeling by transporting particles from an initial source distribution to a target distribution, where the gradient field on the particles is given by a noise-adaptive Wasserstein Gradient of the Maximum Mean Discrepancy (MMD). The noise-adaptive MMD is trained on data distributions corrupted by increasing levels of noise, obtained via a forward diffusion process, as commonly used in denoising diffusion probabilistic models. The result is a generalization of MMD Gradient Flow, which we call Diffusion-MMD-Gradient Flow or DMMD. The divergence training procedure is related to discriminator training in Generative Adversarial Networks (GAN), but does not require adversarial training. We obtain competitive empirical performance in unconditional image generation on CIFAR10, MNIST, CELEB-A (64 × 64) and LSUN Church (64 × 64). Furthermore, we…
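The sampling procedure described in the abstract can be sketched on toy 2-D data. Particles start from N(0, I). For a sequence of decreasing noise levels, they descend the witness function of the MMD between themselves and a data batch corrupted by the forward diffusion. This is a minimal sketch under stated assumptions, not the authors' implementation. The fixed Gaussian kernel, the bandwidth rule `h = 0.5 + sigma`, the step size `eta * h**2`, and the linear `alpha_bars` schedule are hypothetical stand-ins for the learned, noise-conditioned discriminator in DMMD.

```python
import numpy as np

def witness_grad(x, y, h):
    """Gradient at each particle x_i of the MMD witness
    f(z) = mean_j k(z, x_j) - mean_j k(z, y_j), with a Gaussian kernel of bandwidth h.
    Uses grad_z k(z, w) = -(z - w) / h**2 * k(z, w)."""
    def mean_kernel_grad(z, w):
        diff = z[:, None, :] - w[None, :, :]                 # (n, m, d)
        k = np.exp(-(diff ** 2).sum(-1) / (2.0 * h ** 2))    # (n, m)
        return (-diff / h ** 2 * k[..., None]).mean(axis=1)  # (n, d)
    return mean_kernel_grad(x, x) - mean_kernel_grad(x, y)

def noise_adaptive_mmd_flow(data, n_particles=300, n_levels=10,
                            steps_per_level=50, eta=0.5, seed=0):
    rng = np.random.default_rng(seed)
    d = data.shape[1]
    # Hypothetical variance-preserving schedule: from almost pure noise to almost clean data.
    alpha_bars = np.linspace(0.05, 0.999, n_levels)
    x = rng.normal(size=(n_particles, d))  # source distribution N(0, I)
    for a in alpha_bars:
        sigma = np.sqrt(1.0 - a)
        # Forward diffusion: corrupt a data batch to this noise level.
        batch = data[rng.integers(0, len(data), size=n_particles)]
        y = np.sqrt(a) * batch + sigma * rng.normal(size=batch.shape)
        h = 0.5 + sigma    # assumption: wider kernel at higher noise
        lr = eta * h ** 2  # assumption: step size scaled with h**2 for stability
        for _ in range(steps_per_level):
            # Descending the witness pulls particles toward y and pushes them apart.
            x = x - lr * witness_grad(x, y, h)
    return x

rng = np.random.default_rng(1)
data = rng.normal(loc=[2.0, -1.0], scale=0.3, size=(1000, 2))
samples = noise_adaptive_mmd_flow(data)
print(samples.mean(axis=0))
```

One plausible reading of the design: at high noise the corrupted target overlaps the Gaussian source. The kernel gradients are therefore informative from the start, and lowering the noise moves the target gradually toward the clean data.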

Peer Reviews

Decision · ICLR 2025 Poster

Reviewer 01 · Rating 6 · Confidence 2

Strengths

1. The paper is clearly written and well-organized, tackling a genuine problem and effectively presenting its contributions and findings.
2. The paper establishes a solid mathematical foundation, rigorously linking each section to prior research.
3. The results presented are sufficient to validate the theoretical findings and showcase the effectiveness of the proposed approach.

Weaknesses

1. While the paper shows promising results, it is still outperformed by standard diffusion models, especially in terms of FID scores. Further work might be necessary to reach SOTA performance on larger datasets like ImageNet.
2. Related to the previous point, the experimental results are primarily limited to smaller datasets (CIFAR10, MNIST, CELEB-A, and LSUN Church), which may not reflect the potential scalability of DMMD to more complex, high-resolution datasets.
3. Although the method avoid…

Reviewer 02 · Rating 5 · Confidence 2

Strengths

S1 - The paper provides a well-formulated background and problem statement, with a theoretically motivated and well-grounded DMMD framework.
S2 - The idea of using an adversarial training-free discriminator based on the diffusion forward process could offer valuable insights to the community.

Weaknesses

W1 - The framework's absolute performance is a concern, as DMMD shows a significant performance gap compared to DDPM and more modern methods on the selected image generation benchmarks.
W2 - Its broader application potential is limited, with empirical evaluation restricted to small datasets like MNIST and CIFAR.
W3 - The sampling method appears restrictive, requiring reference features from the ground truth dataset to formulate the witness function.

Reviewer 03 · Rating 6 · Confidence 3

Strengths

The article integrates a diffusion process into models based on Maximum Mean Discrepancy (MMD) gradient flow, offering a theoretical foundation for both the discriminator and the sampling process of MMD-GAN under this diffusion approach. Additionally, it employs a linear kernel in a scalable MMD-GAN variant to reduce computational complexity. The authors also conduct experiments with KALE, a lower bound on the KL divergence, to demonstrate the method's effectiveness.
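The complexity reduction from a linear kernel follows from a standard identity. With k(x, y) = <phi(x), phi(y)>, the biased MMD² estimate equals the squared distance between the mean feature embeddings of the two samples. It therefore costs O(n) feature evaluations rather than O(n²) kernel evaluations. The following is a minimal check that assumes a hypothetical random-feature map `features` in place of a trained network.

```python
import numpy as np

def features(x, w):
    # Hypothetical fixed random-feature map standing in for a learned network phi(x).
    return np.tanh(x @ w)

def mmd2_quadratic(fx, fy):
    # Biased (V-statistic) MMD^2 with linear kernel k(a, b) = <a, b>: O(n^2) pairwise Gram sums.
    return (fx @ fx.T).mean() + (fy @ fy.T).mean() - 2.0 * (fx @ fy.T).mean()

def mmd2_linear_time(fx, fy):
    # Same quantity via mean embeddings: ||mean phi(X) - mean phi(Y)||^2, linear in sample size.
    delta = fx.mean(axis=0) - fy.mean(axis=0)
    return float(delta @ delta)

rng = np.random.default_rng(0)
w = rng.normal(size=(3, 16))
fx = features(rng.normal(size=(400, 3)), w)
fy = features(rng.normal(loc=0.5, size=(400, 3)), w)
print(mmd2_quadratic(fx, fy), mmd2_linear_time(fx, fy))
```

The same identity gives the witness in closed form, f(z) = <phi(z), mean phi(X) - mean phi(Y)>. The data-side mean embedding can then be computed once and cached.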

Weaknesses

1. I believe the contribution of this article is inadequate. Previous research has utilized the diffusion process in the discriminator, as noted in this work [1]. However, this article does not provide theoretical proof demonstrating that the MMD GAN can converge to more optimal points when using the diffusion process.
2. Additionally, the effectiveness of MMD Gradient Flow has only been tested on low-resolution datasets, which does not provide sufficient evidence to confirm its overall efficac…
