Generative Sliced MMD Flows with Riesz Kernels

Johannes Hertrich; Christian Wald; Fabian Altekrüger; Paul Hagemann

arXiv:2305.11463 · cs.LG · February 21, 2024 · 2 cites

TL;DR

This paper introduces an efficient method for computing MMD flows with Riesz kernels, enabling scalable training of generative models for images by leveraging sliced MMD and Riesz kernel properties.

Contribution

The paper demonstrates that MMD with Riesz kernels can be computed efficiently using sliced versions, reducing computational complexity and enabling scalable generative modeling.
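The slicing idea can be illustrated numerically. The sketch below is not taken from the paper's code: it assumes the standard moment identity $\mathbb{E}_{\xi}\,|\langle \xi, z \rangle|^{r} = c_{d,r} \|z\|^{r}$ for $\xi$ uniform on the unit sphere, with $c_{d,r} = \Gamma(d/2)\,\Gamma((r+1)/2) / (\sqrt{\pi}\,\Gamma((d+r)/2))$. The function names and the normalisation convention are assumptions and may differ from the paper's.

```python
import math
import numpy as np


def slicing_constant(d, r):
    """c_{d,r} = E|<xi, e>|^r for xi uniform on the unit sphere in R^d.

    Hypothetical helper; the paper may use the reciprocal convention.
    """
    return (math.gamma(d / 2) * math.gamma((r + 1) / 2)
            / (math.sqrt(math.pi) * math.gamma((d + r) / 2)))


def sliced_riesz_kernel(x, y, r=1.0, num_projections=200_000, seed=0):
    """Monte Carlo estimate of -||x - y||^r using only 1D projections."""
    d = x.shape[0]
    rng = np.random.default_rng(seed)
    # Normalised Gaussian vectors are uniformly distributed on the sphere.
    xi = rng.standard_normal((num_projections, d))
    xi /= np.linalg.norm(xi, axis=1, keepdims=True)
    # One-dimensional Riesz kernel evaluated on the projected points.
    one_d = -np.abs(xi @ (x - y)) ** r
    # Averaging over directions and rescaling recovers the d-dimensional kernel.
    return one_d.mean() / slicing_constant(d, r)
```

Comparing `sliced_riesz_kernel(x, y)` with `-np.linalg.norm(x - y)` for two random vectors should show agreement up to Monte Carlo error, which shrinks as `num_projections` grows.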

Findings

01

Efficient gradient computation for MMD flows with Riesz kernels.

02

Reduction of the gradient cost from $O(MN + N^{2})$ to $O((M + N) \log(M + N))$ for $r = 1$, using sorting in the one-dimensional setting.

03

Successful application to image generation on MNIST, FashionMNIST, and CIFAR10.

Abstract

Maximum mean discrepancy (MMD) flows suffer from high computational costs in large scale computations. In this paper, we show that MMD flows with Riesz kernels $K(x, y) = -\|x - y\|^{r}$, $r \in (0, 2)$ have exceptional properties which allow their efficient computation. We prove that the MMD of Riesz kernels, which is also known as energy distance, coincides with the MMD of their sliced version. As a consequence, the computation of gradients of MMDs can be performed in the one-dimensional setting. Here, for $r = 1$, a simple sorting algorithm can be applied to reduce the complexity from $O(MN + N^{2})$ to $O((M + N) \log(M + N))$ for two measures with $M$ and $N$ support points. As another interesting follow-up result, the MMD of compactly supported measures can be estimated from above and below by the Wasserstein-1 distance. For the implementations we approximate the gradient of the sliced MMD by…
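The sorting argument for $r = 1$ can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes the one-dimensional functional $F(x) = -\frac{1}{2N^{2}} \sum_{i,j} |x_i - x_j| + \frac{1}{MN} \sum_{i,j} |x_i - y_j|$, with $N$ points $x$ and $M$ points $y$; this normalisation and all function names are assumptions. The gradient of each absolute value is a sign, so every partial derivative reduces to counting how many points lie below and above $x_i$, which a sort plus binary search provides.

```python
import numpy as np


def sign_sums(points, queries):
    """For each q in queries, return sum_j sign(q - points[j]) via sorting."""
    s = np.sort(points)
    below = np.searchsorted(s, queries, side="left")         # points < q
    at_or_below = np.searchsorted(s, queries, side="right")  # points <= q
    # (#points below q) - (#points above q); ties contribute zero.
    return below + at_or_below - len(s)


def grad_sorted(x, y):
    """Gradient of F with respect to x using sorting and binary search."""
    n, m = len(x), len(y)
    return -sign_sums(x, x) / n**2 + sign_sums(y, x) / (m * n)


def grad_naive(x, y):
    """Reference gradient that forms all pairwise signs explicitly."""
    n, m = len(x), len(y)
    interaction = np.sign(x[:, None] - x[None, :]).sum(axis=1)
    potential = np.sign(x[:, None] - y[None, :]).sum(axis=1)
    return -interaction / n**2 + potential / (m * n)
```

On random inputs, `grad_sorted(x, y)` and `grad_naive(x, y)` should agree. The naive version touches every pair of points, while the sorted version needs two sorts and $N$ binary searches into each sorted array.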

Peer Reviews

Decision · ICLR 2024 poster

Reviewer 01 · Rating 5 (marginally below the acceptance threshold) · Confidence 3

Strengths

The paper is easy to follow and read. The proposed method is simple and computationally efficient. The experimental results show an improvement in FID on the MNIST and FashionMNIST data sets.

Weaknesses

The theory is quite simple throughout, especially the important Theorem 1, which proves that the sliced Riesz kernel is an equivalent form of the Riesz kernel. I have the same impression of the sorting algorithm in the 1-D case and of the error bound for the stochastic MMD gradient in Theorem 4. The experimental part is very limited, with few experiments. The method is shown to work on simple data sets like MNIST and FashionMNIST, when they considered a much more complicated-structure data set like CI

Reviewer 02 · Rating 5 (marginally below the acceptance threshold) · Confidence 2

Strengths

Major theoretical claims are correct, and proofs seem convincing, though I have not checked all of them.

Weaknesses

The paper is dedicated to accelerating the computation of the gradient of the sliced MMD with the Riesz kernel. The experiments are dedicated to a new algorithm for generative modeling (Algorithm 3, described in the Appendix). A natural question arises: what is responsible for the good results on MNIST/FashionMNIST/CIFAR10? Is it the sequential approach to training MMD flows, or the fact that gradients are estimated better, or the fact that the Riesz kernel defines such a special MMD, or maybe specifics of archite

Reviewer 03 · Rating 8 (accept, good paper) · Confidence 4

Strengths

I find this article to be well-written and its contributions to be interesting. Efficient MMD computation is indeed an important point, not only for MMD flows. The article addresses an important problem and offers an elegant solution for Riesz kernels. However, there are several points that appear to need correction or, at the very least, further elaboration.

Weaknesses

- Concerning Theorem 2: Theorem 2 establishes bounds between MMD and Wasserstein distance of order 1. In my opinion, these results are not very sharp, and there appears to be an important missing reference here. Under the same assumptions of compact support, the article [1, Theorem 1] demonstrates that the Wasserstein distance $W_1$ is bounded by an MMD with the Coulomb kernel $k(x, y) = -|x - y|^{2-d}$ but without the power dependency of $1/(d+1)$. Since the measures are bounded, MMD with the

Code & Models

Repositories

johertrich/sliced_mmd_flows
PyTorch · Official

Videos

Generative Sliced MMD Flows with Riesz Kernels · SlidesLive

Taxonomy

Topics: Stochastic processes and financial applications · Lattice Boltzmann Simulation Studies