Generative Sliced MMD Flows with Riesz Kernels
Johannes Hertrich, Christian Wald, Fabian Altekrüger, Paul Hagemann

TL;DR
This paper introduces an efficient method for computing MMD flows with Riesz kernels, enabling scalable training of generative models for images by leveraging sliced MMD and Riesz kernel properties.
Contribution
The paper proves that the MMD with Riesz kernels coincides with the MMD of their sliced versions, so that gradients can be computed from one-dimensional projections. This reduces computational complexity and enables scalable generative modeling.
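The slicing idea can be checked numerically for the Riesz kernel with exponent 1, K(x, y) = -||x - y||. The sketch below is illustrative and not taken from the paper's code: the names `slicing_constant` and `sliced_riesz_kernel` are hypothetical, and the constant used is the standard value of 1 / E|<xi, e>| for a direction xi drawn uniformly from the unit sphere. It compares the kernel in d dimensions with a Monte Carlo average of its one-dimensional counterpart over random projections.

```python
import math
import numpy as np


def slicing_constant(d: int) -> float:
    """1 / E|<xi, e>| for xi uniform on the unit sphere in R^d and a unit vector e."""
    return math.sqrt(math.pi) * math.gamma((d + 1) / 2) / math.gamma(d / 2)


def sliced_riesz_kernel(x, y, num_slices=20000, seed=0):
    """Monte Carlo estimate of -||x - y|| from one-dimensional projections."""
    rng = np.random.default_rng(seed)
    d = x.shape[0]
    # Normalised Gaussian vectors are uniformly distributed on the sphere.
    xi = rng.standard_normal((num_slices, d))
    xi /= np.linalg.norm(xi, axis=1, keepdims=True)
    # One-dimensional Riesz kernel (exponent 1) of the projected points.
    one_dim = -np.abs(xi @ (x - y))
    return slicing_constant(d) * one_dim.mean()


x = np.array([1.0, -2.0, 0.5, 3.0])
y = np.array([0.0, 1.0, 1.5, -1.0])
exact = -np.linalg.norm(x - y)
approx = sliced_riesz_kernel(x, y)
print(exact, approx)
```

The two printed values should agree up to Monte Carlo error, which shrinks as `num_slices` grows.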
Findings
Efficient gradient computation for MMD flows with Riesz kernels.
Reduction of complexity from quadratic to near-linear in the number of support points in the one-dimensional setting, using a sorting algorithm.
Successful application to image generation on MNIST, FashionMNIST, and CIFAR10.
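The complexity reduction listed above can be illustrated for the energy distance between two one-dimensional samples. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names are hypothetical, and the normalisation 2 E|X - Y| - E|X - X'| - E|Y - Y'| is one common convention that may differ from the paper's by a constant factor. The sorted version uses only sorts and dot products, and is compared against the quadratic computation from all pairwise distances.

```python
import numpy as np


def pairwise_abs_sum(v):
    """Sum of |v_a - v_b| over unordered pairs a < b, in O(n log n) via sorting."""
    s = np.sort(v)
    n = s.shape[0]
    # The k-th smallest value enters with sign + against the k smaller values
    # and with sign - against the n - 1 - k larger ones.
    return float(np.dot(2 * np.arange(n) - n + 1, s))


def energy_distance_sorted(x, y):
    """2 E|X - Y| - E|X - X'| - E|Y - Y'| for 1D samples, using sorting only."""
    n, m = x.shape[0], y.shape[0]
    within_x = pairwise_abs_sum(x)
    within_y = pairwise_abs_sum(y)
    # Pairs in the merged sample split into within-x, within-y and cross pairs.
    cross = pairwise_abs_sum(np.concatenate([x, y])) - within_x - within_y
    return 2 * cross / (n * m) - 2 * within_x / n**2 - 2 * within_y / m**2


def energy_distance_naive(x, y):
    """Same quantity from all pairwise distances, quadratic in the sample sizes."""
    xx = np.abs(x[:, None] - x[None, :]).mean()
    yy = np.abs(y[:, None] - y[None, :]).mean()
    xy = np.abs(x[:, None] - y[None, :]).mean()
    return 2 * xy - xx - yy


rng = np.random.default_rng(0)
x = rng.standard_normal(300)
y = rng.standard_normal(200) + 0.5
print(energy_distance_sorted(x, y), energy_distance_naive(x, y))
```

Both functions should print the same value up to floating-point rounding; only the sorted one avoids forming the full distance matrices.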
Abstract
Maximum mean discrepancy (MMD) flows suffer from high computational costs in large scale computations. In this paper, we show that MMD flows with Riesz kernels $K(x,y) = -\|x-y\|^r$, $r \in (0,2)$, have exceptional properties which allow their efficient computation. We prove that the MMD of Riesz kernels, which is also known as energy distance, coincides with the MMD of their sliced version. As a consequence, the computation of gradients of MMDs can be performed in the one-dimensional setting. Here, for $r=1$, a simple sorting algorithm can be applied to reduce the complexity from $O(MN+N^2)$ to $O((M+N)\log(M+N))$ for two measures with $M$ and $N$ support points. As another interesting follow-up result, the MMD of compactly supported measures can be estimated from above and below by the Wasserstein-1 distance. For the implementations we approximate the gradient of the sliced MMD by…
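The one-dimensional gradient computation by sorting can be sketched as follows for the kernel -|x - y|. This is an illustration under stated assumptions rather than the paper's algorithm: the function names are hypothetical, the functional F(x) = (1/(NM)) sum_{i,j} |x_i - y_j| - (1/(2N^2)) sum_{i,j} |x_i - x_j| is one common way to write the MMD objective for N particles x and M fixed target points y, and all points are assumed distinct. Because the derivative of each absolute value is a sign, the gradient with respect to x_i only depends on how many points lie below x_i, which sorting provides.

```python
import numpy as np


def mmd_grad_1d_sorted(x, y):
    """Gradient of F with respect to each x_i from ranks (assumes distinct points)."""
    n, m = x.shape[0], y.shape[0]
    # rank[i] = number of x_j strictly below x_i.
    order = np.argsort(x)
    rank = np.empty(n, dtype=np.int64)
    rank[order] = np.arange(n)
    # below[i] = number of y_j strictly below x_i.
    below = np.searchsorted(np.sort(y), x, side="left")
    # sum_j sign(x_i - y_j) = 2 * below - m, sum_j sign(x_i - x_j) = 2 * rank - n + 1.
    return (2 * below - m) / (n * m) - (2 * rank - n + 1) / n**2


def mmd_grad_1d_naive(x, y):
    """Same gradient from all pairwise signs, quadratic in the sample sizes."""
    n, m = x.shape[0], y.shape[0]
    attract = np.sign(x[:, None] - y[None, :]).sum(axis=1) / (n * m)
    repel = np.sign(x[:, None] - x[None, :]).sum(axis=1) / n**2
    return attract - repel


rng = np.random.default_rng(0)
x = rng.standard_normal(50)
y = rng.standard_normal(70) + 1.0
print(np.allclose(mmd_grad_1d_sorted(x, y), mmd_grad_1d_naive(x, y)))
```

A particle flow would move each x_i against this gradient; in higher dimensions the same routine would be applied to random one-dimensional projections of the points.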
Peer Reviews
Decision: ICLR 2024 poster
The paper is easy to follow and read. The proposed method is simple and computationally efficient. The experimental results show an improvement in FID on the MNIST and FashionMNIST data sets.
The theory part is quite simple, especially the important Theorem 1, which proves that the sliced Riesz kernel is an equivalent form of the Riesz kernel. I have the same impression of the sorting algorithm in the 1-D case and of the error bound for the stochastic MMD gradient in Theorem 4. The experimental part is very limited, with few experiments. The method is shown to work with simple data sets like MNIST and FashionMNIST, when they considered a data set with much more complicated structure like CI
Major theoretical claims are correct, and proofs seem convincing, though I have not checked all of them.
The paper is dedicated to accelerating the computation of the gradient of the sliced MMD with the Riesz kernel. Experiments are dedicated to a new algorithm for generative modeling (Algorithm 3 described in Appendix). A natural question appears: what is responsible for good results on MNIST/FashionMNIST/CIFAR10? Is it the sequential approach to train MMD flows, or the fact that gradients are estimated better, or the fact that Riesz kernel defines such a special MMD, or maybe specifics of archite
I find this article to be well-written and its contributions to be interesting. Efficient MMD computation is indeed an important point, not only for MMD flows. The article addresses an important problem and offers an elegant solution for Riesz kernels. However, there are several points that appear to need correction or, at the very least, further elaboration.
- Concerning Theorem 2: Theorem 2 establishes bounds between MMD and Wasserstein distance of order 1. In my opinion, these results are not very sharp, and there appears to be an important missing reference here. Under the same assumptions of compact support, the article [1, Theorem 1] demonstrates that the Wasserstein distance $W_1$ is bounded by an MMD with the Coulomb kernel $k(x, y) = -|x - y|^{2-d}$ but without the power dependency of $1/(d+1)$. Since the measures are bounded, MMD with the
