PHom-GeM: Persistent Homology for Generative Models

Jeremy Charlier; Radu State; Jean Hilger

arXiv:1905.09894·cs.LG·May 27, 2019

TL;DR

This paper introduces PHom-GeM, a novel topological approach using persistent homology to evaluate and compare the distributions of generative models, addressing limitations of traditional distance measures.

Contribution

It proposes the bottleneck distance between persistence diagrams as a topological distance measure for assessing generative models, especially on complex real-world data sets.

Findings

01 Persistent homology effectively compares true and generated distributions.

02 PHom-GeM outperforms traditional measures on challenging datasets.

03 First application of topological distance in credit card transaction analysis.

Tables

Table I: Bottleneck distance (smaller is better), with 95% confidence interval, between the samples $X$ of the original manifold $\mathcal{X}$ and the generated samples $\widetilde{X}$ of the manifold $\mathcal{\widetilde{X}}$. Because of the Wasserstein distance and gradient penalty, GP-WGAN achieves better performance.

Gen. Model	Mean Value	Lower Bound	Upper Bound
GP-WGAN	0.0711	0.0683	0.0738
WGAN	0.0744	0.0716	0.0772
WAE	0.0821	0.0791	0.0852
VAE	0.0857	0.0833	0.0881

Equations

$$W_{c}(P_{X},P_{G})=\sup_{f\in\mathcal{F}_{L}}\;\mathbb{E}_{X\sim P_{X}}[f(X)]-\mathbb{E}_{Y\sim P_{G}}[f(Y)]\tag{1}$$

$$L=\mathbb{E}_{\widetilde{X}\sim P_{G}}[f(\widetilde{X})]-\mathbb{E}_{X\sim P_{X}}[f(X)]+\lambda\,\mathbb{E}_{\widehat{X}\sim P_{\widehat{X}}}\big[(\|\nabla_{\widehat{X}}f(\widehat{X})\|_{2}-1)^{2}\big]\tag{2}$$

$$D_{\text{WAE}}(P_{X},P_{G}):=\inf_{Q(Z|X)\in\mathcal{Q}}\mathbb{E}_{P_{X}}\mathbb{E}_{Q(Z|X)}[c(X,G(Z))]+\lambda\,\mathcal{D}_{Z}(Q_{Z},P_{Z})\tag{3}$$

$$\mathcal{D}_{Z}(P_{Z},Q_{Z}):=\text{MMD}_{k}(P_{Z},Q_{Z})=\Big\|\int_{\mathcal{Z}}k(z,\cdot)\,dP_{Z}(z)-\int_{\mathcal{Z}}k(z,\cdot)\,dQ_{Z}(z)\Big\|_{\mathcal{H}_{k}}\tag{4}$$

$$\mathcal{K}_{1}\overset{\iota}{\hookrightarrow}\mathcal{K}_{2}\overset{\iota}{\hookrightarrow}\mathcal{K}_{3}\overset{\iota}{\hookrightarrow}\cdots\overset{\iota}{\hookrightarrow}\mathcal{K}_{N-1}\overset{\iota}{\hookrightarrow}\mathcal{K}_{N}.\tag{5}$$

$$H_{k}(\mathcal{K}_{1},F)\xrightarrow{t_{1}}H_{k}(\mathcal{K}_{2},F)\xrightarrow{t_{2}}\cdots\xrightarrow{t_{N-2}}H_{k}(\mathcal{K}_{N-1},F)\xrightarrow{t_{N-1}}H_{k}(\mathcal{K}_{N},F).\tag{6}$$

$$d_{b}(A,B)=\inf_{\phi:A^{\prime}\to B^{\prime}}\;\sup_{x\in A^{\prime}}\|x-\phi(x)\|\tag{7}$$

Full text

PHom-GeM: Persistent Homology for Generative Models

Jeremy Charlier

University of Luxembourg

Luxembourg, Luxembourg

Radu State

University of Luxembourg

Luxembourg, Luxembourg

Jean Hilger

BCEE

Luxembourg, Luxembourg

Abstract

Generative neural network models, including Generative Adversarial Networks (GAN) and Auto-Encoders (AE), are among the most popular neural network models for generating adversarial data. A GAN is composed of a generator that produces synthetic data and a discriminator that discriminates between the generator's output and the true data. An AE consists of an encoder, which maps the model distribution to a latent manifold, and a decoder, which maps the latent manifold to a reconstructed distribution. However, generative models are known to produce chaotically scattered reconstructed distributions during training and, consequently, incomplete generated adversarial distributions. Current distance measures fail to address this problem because they cannot account for the shape of the data manifold, i.e. its topological features, or for the scale at which the manifold should be analyzed. We propose Persistent Homology for Generative Models, PHom-GeM, a new methodology to assess and measure the distribution of a generative model. PHom-GeM minimizes an objective function between the true and the reconstructed distributions and uses persistent homology, the study of the topological features of a space at different spatial resolutions, to compare the nature of the true and the generated distributions. Our experiments underline the potential of persistent homology for Wasserstein GAN in comparison to Wasserstein AE and Variational AE. The experiments are conducted on a real-world data set that is particularly challenging for traditional distance measures and generative neural network models. PHom-GeM is the first methodology to propose a topological distance measure, the bottleneck distance, for generative models used to compare adversarial samples in the context of credit card transactions.

Index Terms:

Neural Networks, Optimal Transport, Algebraic Topology

I Motivation

The field of unsupervised learning has evolved significantly over the past few years thanks to publications on adversarial networks. In [1], Goodfellow et al. introduced the Generative Adversarial Network (GAN) framework. It is a class of generative models in which two networks play a competitive game: the generator network must compete against an adversary according to a game-theoretic scenario [2]. The generator network produces samples from a noise distribution, and its adversary, the discriminator network, tries to distinguish real samples, inherited from the training data, from generated samples, produced by the generator. Meanwhile, Variational Auto-Encoders (VAE), presented by Kingma et al. in [3], have emerged as a well-established approach for synthetic data generation. Nevertheless, they might generate a poor target distribution because of the KL divergence [2]. We recall that an AE is a neural network trained to copy its input manifold to its output manifold through a hidden layer. The encoder function sends the input space to the hidden space and the decoder function brings the hidden space back to the input space. By applying some of the Optimal Transport (OT) concepts gathered in [4], notably the Wasserstein distance, Arjovsky et al. introduced the Wasserstein GAN (WGAN) in [5]. It tries to avoid mode collapse, a typical training convergence issue occurring between the generator and the discriminator. Gulrajani et al. further optimized the concept in [6] by proposing a Gradient Penalty for Wasserstein GAN (GP-WGAN), capable of generating adversarial samples of higher quality. Similarly, Tolstikhin et al. in [7] applied the same OT concepts to AE and thereby introduced the Wasserstein AE (WAE), a new type of AE generative model that avoids the use of the KL divergence.

Nonetheless, describing the distribution $P_{G}$ of a generative model, which involves describing the generated scattered data points [8] based on the distribution $P_{X}$ of the original manifold $\mathcal{X}$ , is very difficult with traditional distance measures such as the Fréchet Inception Distance [7]. We highlight the distribution and the manifold notations in figure 1 for GAN and in figure 2 for AE. Indeed, traditional distance measures are not able to account for the shapes of the data manifolds or for the scale at which the manifold should be analyzed. Persistent homology [9, 10], however, is specifically designed to highlight the topological features of the data [11]. Therefore, building upon persistent homology, the Wasserstein distance [12] and generative models [7], our main contribution is to propose qualitative and quantitative ways to evaluate the scattered generated distributions and the performance of the generative models.

In this work we describe the persistent homology features of the generated model $G$ while minimizing the OT function $W_{c}(P_{X},P_{G})$ for a squared cost $c(x,y)=||x-y||_{2}^{2}$ where $P_{X}$ is the model distribution of the data contained in the manifold $\mathcal{X}$ , and $P_{G}$ the distribution of the generative model capable of generating adversarial samples. Our contributions are summarized below:

•

A persistent homology procedure for generative models, including GP-WGAN, WGAN, WAE and VAE, which we call PHom-GeM to highlight the topological properties of the generated distributions of the data for different spatial resolutions. The objective is a persistent homology description of the generated data distribution $P_{G}$ following the generative model $G$ .

•

A distance measure for persistence diagrams, the bottleneck distance, applied to generative models to compare quantitatively the true and the target distributions on any data set. We measure the shortest distance for which there exists a perfect matching between the points of the two persistence diagrams. A persistence diagram is a stable summary representation of the topological features of a simplicial complex, a collection of simplices built on the vertices associated with the data set.

•

Finally, we propose the first application of algebraic topology and generative models on a public data set containing credit card transactions, particularly challenging for this type of models and traditional distance measures.

The paper is structured as follows. In section II, we review the optimized GP-WGAN and WAE formulations using OT derived by Gulrajani et al. in [6] and Tolstikhin et al. in [7], respectively. By using persistent homology, we are able to compare the topological properties of the original distribution $P_{X}$ and the generated distribution $P_{G}$ . We highlight experimental results in section III and we conclude in section IV by addressing promising directions for future work.

II Proposed Method

Our method computes the persistent homology of both the true manifold $X\in\mathcal{X}$ and the generated manifold $\widetilde{X}\in\mathcal{\widetilde{X}}$ following the generative model $G$ based on the minimization of the optimal transport cost $W_{c}(P_{X},P_{G})$ . In the resulting topological problem, the points of the manifolds are transformed into a metric space set to which a Vietoris-Rips simplicial complex filtration is applied (see definition 2). PHom-GeM simultaneously achieves two main goals: it computes the birth-death of the pairing generators of the iterated inclusions, and it measures the bottleneck distance between the persistence diagrams of the manifolds of the generative models.

II-A Optimal Transport and Dual Formulation

Following the description of the optimal transport problem [4] and relying on the Kantorovich-Rubinstein duality, the Wasserstein distance is computed as

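$$W_{c}(P_{X},P_{G})=\sup_{f\in\mathcal{F}_{L}}\;\mathbb{E}_{X\sim P_{X}}[f(X)]-\mathbb{E}_{Y\sim P_{G}}[f(Y)]\tag{1}$$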

where $(\mathcal{X},d)$ is a metric space, $\mathcal{P}(X\sim P_{X},Y\sim P_{G})$ is a set of all joint distributions $(X,Y)$ with marginals $P_{X}$ and $P_{G}$ respectively and $\mathcal{F}_{L}$ is the class of all bounded 1-Lipschitz functions on $(\mathcal{X},d)$ .
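
As a hedged illustration of the dual formulation, the sketch below works on the real line with two equal-sized empirical samples and cost $c(x,y)=|x-y|$, a simpler setting than the one used in the paper. In that setting the optimal transport cost has a closed form (match sorted samples), and any 1-Lipschitz function $f$ yields a lower bound through the dual expression. The function names `wasserstein1_1d` and `dual_value` are illustrative assumptions, not part of the PHom-GeM code.

```python
def wasserstein1_1d(xs, ys):
    """Wasserstein-1 distance between two equal-sized 1D empirical samples.

    On the real line, the optimal coupling matches the i-th smallest point
    of xs with the i-th smallest point of ys.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes samples of equal size")
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in pairs) / len(xs)


def dual_value(f, xs, ys):
    """Value of E[f(X)] - E[f(Y)] for a candidate function f.

    If f is 1-Lipschitz, this value never exceeds the Wasserstein-1 distance.
    """
    mean_fx = sum(f(x) for x in xs) / len(xs)
    mean_fy = sum(f(y) for y in ys) / len(ys)
    return mean_fx - mean_fy


if __name__ == "__main__":
    true_samples = [0.0, 1.0, 2.0]
    generated_samples = [3.0, 1.0, 2.0]
    print(wasserstein1_1d(true_samples, generated_samples))
    # f(x) = -x is 1-Lipschitz and attains the supremum for this shifted pair.
    print(dual_value(lambda x: -x, true_samples, generated_samples))
```

Here the generated sample is the true sample shifted by one unit, so the 1-Lipschitz function $f(x)=-x$ attains the supremum, while $f(x)=x$ only gives a (negative) lower bound.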

II-B Gradient Penalty Wasserstein GAN (GP-WGAN)

As described in [6], the GP-WGAN objective loss function with gradient penalty is expressed such that

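$$L=\mathbb{E}_{\widetilde{X}\sim P_{G}}[f(\widetilde{X})]-\mathbb{E}_{X\sim P_{X}}[f(X)]+\lambda\,\mathbb{E}_{\widehat{X}\sim P_{\widehat{X}}}\big[(\|\nabla_{\widehat{X}}f(\widehat{X})\|_{2}-1)^{2}\big]\tag{2}$$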

where $f$ ranges over the 1-Lipschitz functions on $(\mathcal{X},d)$ , $P_{X}$ is the original data distribution, and $P_{G}$ is the generative model distribution implicitly defined by $\widetilde{X}=G(Z),Z\sim p(Z)$ . The input $Z$ to the generator is sampled from a noise distribution such as a uniform distribution. $P_{\widehat{X}}$ is defined by sampling uniformly along straight lines between pairs of points sampled from the data distribution $P_{X}$ and the generative distribution $P_{G}$ . A penalty on the gradient norm is enforced for random samples $\widehat{X}\sim P_{\widehat{X}}$ . For further details, we refer to [6] and [5].
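
The gradient penalty term can be sketched without a deep learning framework. The minimal example below, a toy under stated assumptions rather than the authors' implementation, samples $\widehat{X}$ uniformly on the segment between a real point and a generated point, estimates the gradient of a critic $f$ by central finite differences (an assumption made to keep the sketch dependency-free; a real implementation would use automatic differentiation), and averages $(\|\nabla f(\widehat{X})\|_{2}-1)^{2}$ . The names `interpolate`, `grad_norm` and `gradient_penalty` are hypothetical.

```python
import math
import random


def interpolate(x, x_tilde, t):
    """Point on the straight line between x (real) and x_tilde (generated)."""
    return [t * a + (1.0 - t) * b for a, b in zip(x, x_tilde)]


def grad_norm(f, x, h=1e-5):
    """Euclidean norm of the gradient of f at x, by central finite differences."""
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2.0 * h))
    return math.sqrt(sum(g * g for g in grad))


def gradient_penalty(f, real, fake, lam=10.0, rng=random):
    """lam * mean over pairs of (||grad f(x_hat)||_2 - 1)^2."""
    total = 0.0
    for x, x_tilde in zip(real, fake):
        x_hat = interpolate(x, x_tilde, rng.random())
        total += (grad_norm(f, x_hat) - 1.0) ** 2
    return lam * total / len(real)


if __name__ == "__main__":
    # Toy linear critic whose gradient is (3, 4) everywhere, so its norm is 5.
    def critic(x):
        return 3.0 * x[0] + 4.0 * x[1]

    real_batch = [[0.0, 0.0], [1.0, 1.0]]
    fake_batch = [[2.0, 4.0], [0.5, -1.0]]
    print(gradient_penalty(critic, real_batch, fake_batch, lam=10.0))
```

With $\lambda=10$ and a critic whose gradient norm is 5 everywhere, the penalty is close to $10\cdot(5-1)^{2}=160$ ; a critic with unit gradient norm incurs a penalty close to zero.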

II-C Wasserstein Auto-Encoders

As described in [7], the WAE objective function is expressed such that

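$$D_{\text{WAE}}(P_{X},P_{G}):=\inf_{Q(Z|X)\in\mathcal{Q}}\mathbb{E}_{P_{X}}\mathbb{E}_{Q(Z|X)}[c(X,G(Z))]+\lambda\,\mathcal{D}_{Z}(Q_{Z},P_{Z})\tag{3}$$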

where $c(X,G(Z)):\mathcal{X}\times\mathcal{X}\rightarrow\mathbb{R}_{+}$ is any measurable cost function. In our experiments, we use a square cost function $c(x,y)=||x-y||_{2}^{2}$ for data points $x,y\in\mathcal{X}$ . $G(Z)$ denotes the image of $Z$ in $\mathcal{X}$ under a given map $G:\mathcal{Z}\rightarrow\mathcal{X}$ . $Q$ and $G$ are any nonparametric sets of probabilistic encoders and decoders, respectively.

We use the Maximum Mean Discrepancy (MMD) for the penalty $\mathcal{D}_{Z}(Q_{Z},P_{Z})$ for a positive-definite reproducing kernel $k:\mathcal{Z}\times\mathcal{Z}\rightarrow\mathbb{R}$

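$$\mathcal{D}_{Z}(P_{Z},Q_{Z}):=\text{MMD}_{k}(P_{Z},Q_{Z})=\Big\|\int_{\mathcal{Z}}k(z,\cdot)\,dP_{Z}(z)-\int_{\mathcal{Z}}k(z,\cdot)\,dQ_{Z}(z)\Big\|_{\mathcal{H}_{k}}\tag{4}$$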

where $\mathcal{H}_{k}$ is the reproducing kernel Hilbert space of real-valued functions mapping on $\mathcal{Z}$ . For details on the MMD implementation, we refer to [7].
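
For intuition, the squared RKHS norm above can be estimated from two finite samples by expanding it into three averages of kernel evaluations. The sketch below is a minimal, biased (V-statistic) estimate. The Gaussian kernel and its bandwidth `sigma` are assumptions chosen for illustration; the text does not state which kernel is used, and [7] should be consulted for the actual implementation.

```python
import math


def rbf_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel; a positive-definite kernel chosen for this sketch."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd_squared(p_samples, q_samples, kernel=rbf_kernel):
    """Biased estimate of MMD_k(P, Q)^2 from samples of P and Q.

    ||mu_P - mu_Q||^2 = E k(p, p') + E k(q, q') - 2 E k(p, q).
    """
    def mean_kernel(us, vs):
        total = sum(kernel(u, v) for u in us for v in vs)
        return total / (len(us) * len(vs))

    return (mean_kernel(p_samples, p_samples)
            + mean_kernel(q_samples, q_samples)
            - 2.0 * mean_kernel(p_samples, q_samples))


if __name__ == "__main__":
    prior_codes = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    encoded_codes = [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]
    print(mmd_squared(prior_codes, prior_codes))    # identical samples
    print(mmd_squared(prior_codes, encoded_codes))  # well-separated samples
```

The estimate is zero (up to rounding) when both samples coincide and grows as the encoded codes drift away from the prior samples; its square root corresponds to the norm in equation (4).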

II-D Persistence Diagram and Vietoris-Rips Complex

Definition 1 Let $V=\{1,\cdots,|V|\}$ be a set of vertices. A simplex $\sigma$ is a subset of vertices $\sigma\subseteq V$ . A simplicial complex K on V is a collection of simplices $\{\sigma\}\>,\>\sigma\subseteq V$ , such that $\tau\subseteq\sigma\in K\Rightarrow\tau\in K$ . The dimension $n=|\sigma|-1$ of $\sigma$ is its number of elements minus 1. Examples of simplicial complexes are shown in figure 3.

Definition 2 Let $(X,d)$ be a metric space. The Vietoris-Rips complex $\text{VR}(X,\epsilon)$ at scale $\epsilon$ associated to $X$ is the abstract simplicial complex whose vertex set is $X$ , and where $\left\{x_{0},x_{1},...,x_{k}\right\}$ is a $k$ -simplex if and only if $d(x_{i},x_{j})\leq\epsilon$ for all $0\leq i,j\leq k$ .
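
Definition 2 translates almost directly into code. The following minimal sketch, assuming Euclidean distance and a small point cloud, enumerates every vertex subset up to a maximum dimension and keeps those whose pairwise distances are all at most $\epsilon$ . The function name `vietoris_rips` and the parameter `max_dim` are illustrative; brute-force enumeration is only practical for a handful of points.

```python
import math
from itertools import combinations


def vietoris_rips(points, eps, max_dim=2):
    """Simplices of VR(points, eps) up to dimension max_dim.

    A set of k+1 vertices is a k-simplex if and only if all pairwise
    distances between its vertices are at most eps.
    """
    n = len(points)

    def close(i, j):
        return math.dist(points[i], points[j]) <= eps

    # Every point is a 0-simplex.
    simplices = [(i,) for i in range(n)]
    for k in range(1, max_dim + 1):
        for candidate in combinations(range(n), k + 1):
            if all(close(i, j) for i, j in combinations(candidate, 2)):
                simplices.append(candidate)
    return simplices


if __name__ == "__main__":
    cloud = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
    for scale in (1.0, 1.5):
        print(scale, vietoris_rips(cloud, scale))
```

At scale 1.0 the first three points are joined by two edges only; at scale 1.5 the third edge appears and, with it, the triangle, which illustrates how the complex grows with $\epsilon$ .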

We obtain an increasing sequence of Vietoris-Rips complexes by considering $\text{VR}(\mathcal{C},\epsilon)$ for an increasing sequence $(\epsilon_{i})_{1\leq i\leq N}$ of values of the scale parameter $\epsilon$

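$$\mathcal{K}_{1}\overset{\iota}{\hookrightarrow}\mathcal{K}_{2}\overset{\iota}{\hookrightarrow}\mathcal{K}_{3}\overset{\iota}{\hookrightarrow}\cdots\overset{\iota}{\hookrightarrow}\mathcal{K}_{N-1}\overset{\iota}{\hookrightarrow}\mathcal{K}_{N}.\tag{5}$$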

Applying the k-th singular homology functor $H_{k}(-,F)$ with coefficient in the field $F$ [13] to (5), we obtain a sequence of $F$ -vector spaces, called the k-th persistence module of $(\mathcal{K}_{i})_{1\leq i\leq N}$

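$$H_{k}(\mathcal{K}_{1},F)\xrightarrow{t_{1}}H_{k}(\mathcal{K}_{2},F)\xrightarrow{t_{2}}\cdots\xrightarrow{t_{N-2}}H_{k}(\mathcal{K}_{N-1},F)\xrightarrow{t_{N-1}}H_{k}(\mathcal{K}_{N},F).\tag{6}$$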

Definition 3 For all $i<j$ , the $(i,j)$ -persistent $k$ -homology group with coefficients in $F$ of $\mathcal{K}=(\mathcal{K}_{i})_{1\leq i\leq N}$ , denoted $HP_{k}^{i\rightarrow j}(\mathcal{K},F)$ , is defined to be the image of the homomorphism $t_{j-1}\circ\ldots\circ t_{i}:H_{k}(\mathcal{K}_{i},F)\rightarrow H_{k}(\mathcal{K}_{j},F)$ .

Using the interval decomposition theorem [14], we extract a finite family of intervals of $\mathbb{R}_{+}$ called the persistence diagram. Each interval can be considered as a point in the set $D=\left\{(x,y)\in\mathbb{R}_{+}^{2}|x\leq y\right\}$ . Hence, we obtain a finite subset of the set $D$ . This space of finite subsets is endowed with a matching distance called the bottleneck distance and defined as follows

$$d_{b}(A,B)=\inf_{\phi:A^{\prime}\to B^{\prime}}\;\sup_{x\in A^{\prime}}\|x-\phi(x)\|\tag{7}$$

where $A^{\prime}=A\cup\Delta$ , $B^{\prime}=B\cup\Delta$ , $\Delta=\{(x,y)\in\mathbb{R}^{2}_{+}|x=y\}$ and the $\inf$ is over all the bijections from $A^{\prime}$ to $B^{\prime}$ .
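
For very small diagrams, the bottleneck distance can be computed by brute force, which makes the role of the diagonal $\Delta$ concrete. The sketch below is a toy under two assumptions that the text does not state: the norm is taken to be the $\ell_{\infty}$ norm (the usual convention), and the infinite diagonal is replaced by one diagonal slot per off-diagonal point of the other diagram, which suffices because a point is never matched to the diagonal more cheaply than to its own projection. The function names are hypothetical, and the factorial cost restricts this to a few points.

```python
import math
from itertools import permutations


def linf(p, q):
    """l-infinity distance between two (birth, death) points."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


def to_diagonal(p):
    """l-infinity distance from (birth, death) to the diagonal birth = death."""
    return (p[1] - p[0]) / 2.0


def bottleneck_bruteforce(diagram_a, diagram_b):
    """Bottleneck distance by enumerating all matchings (tiny diagrams only)."""
    n, m = len(diagram_a), len(diagram_b)
    size = n + m  # n points of A plus m diagonal slots, and symmetrically for B
    if size == 0:
        return 0.0

    def cost(i, j):
        if i < n and j < m:
            return linf(diagram_a[i], diagram_b[j])
        if i < n:            # point of A sent to the diagonal
            return to_diagonal(diagram_a[i])
        if j < m:            # point of B sent to the diagonal
            return to_diagonal(diagram_b[j])
        return 0.0           # diagonal matched with diagonal

    best = math.inf
    for perm in permutations(range(size)):
        worst = max(cost(i, perm[i]) for i in range(size))
        best = min(best, worst)
    return best


if __name__ == "__main__":
    dgm_true = [(0.0, 4.0), (1.0, 1.2)]
    dgm_generated = [(0.0, 5.0)]
    print(bottleneck_bruteforce(dgm_true, dgm_generated))
```

In this usage example the long-lived features $(0,4)$ and $(0,5)$ are matched with each other, while the short-lived feature $(1,1.2)$ is matched with the diagonal, so low-persistence noise contributes little to the distance.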

II-E Application: PHom-GeM, Persistent Homology for Generative Models

Bridging the gap between persistent homology and generative models, PHom-GeM uses a two-step procedure. First, the minimization problem is solved for the generator $G$ and the discriminator $D$ when considering GP-WGAN and WGAN. The gradient penalty $\lambda$ in equation (2) is fixed to 10 for GP-WGAN and to 0 for WGAN. For auto-encoders, the minimization problem is solved for the encoder $Q$ and the decoder $G$ . We use the RMSProp optimizer [15] for the optimization procedure.

Second, the samples of the original and generated distributions, $P_{X}$ and $P_{G}$ , are mapped to persistent homology for the description of their respective manifolds. The points contained in the manifold $\mathcal{X}$ inherited from $P_{X}$ and the points contained in the manifold $\mathcal{\tilde{X}}$ generated with $P_{G}$ are randomly selected into respective batches. Two samples, $Y_{1}$ from $\mathcal{X}$ following $P_{X}$ and $Y_{2}$ from $\mathcal{\tilde{X}}$ following $P_{G}$ , are selected to differentiate the topological features of the original manifold $\mathcal{X}$ and the generated manifold $\mathcal{\tilde{X}}$ . The samples $Y_{1}$ and $Y_{2}$ are contained in the spaces $\mathcal{Y}_{1}$ and $\mathcal{Y}_{2}$ , respectively. The spaces $\mathcal{Y}_{1}$ and $\mathcal{Y}_{2}$ are then transformed into metric space sets $\mathcal{\widehat{Y}}_{1}$ and $\mathcal{\widehat{Y}}_{2}$ for computational purposes, and we filter these metric space sets using the Vietoris-Rips simplicial complex filtration. Given a scale $\epsilon$ , an edge is created between every pair of data points separated by a distance of at most $\epsilon$ . This leads to the construction of a collection of simplices, resulting in the Vietoris-Rips simplicial complex filtration VR $(\mathcal{C},\epsilon)$ . We use the Vietoris-Rips simplicial complex because it offers the best compromise between filtration accuracy and memory requirement [11].

Subsequently, the persistence diagrams, $\text{dgm}_{Y_{1}}$ and $\text{dgm}_{Y_{2}}$ , are constructed. We recall that a persistence diagram is a stable summary representation of the topological features of a simplicial complex. The persistence diagrams allow the computation of the bottleneck distance $d_{b}(\text{dgm}_{Y_{1}},\text{dgm}_{Y_{2}})$ . Finally, the barcodes represent in a simple way the birth-death of the pairing generators of the iterated inclusions detected by the persistence diagrams.
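
To make the filtration step concrete, the sketch below computes the $H_{0}$ barcode (connected components) of a Vietoris-Rips filtration on a small point cloud. It relies on a standard fact: every component is born at scale 0 and dies when an edge first merges it into another component, which can be tracked with a union-find structure over edges sorted by length. This is a simplified illustration under the assumption of Euclidean distance; it does not cover $H_{1}$ , which requires a boundary-matrix reduction usually delegated to a dedicated library, and the function name `h0_barcode` is hypothetical.

```python
import math
from itertools import combinations


def h0_barcode(points):
    """(birth, death) bars of H_0 for the Vietoris-Rips filtration of points."""
    n = len(points)
    if n == 0:
        return []
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Edges enter the filtration in order of increasing length.
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    bars = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j      # two components merge at this scale
            bars.append((0.0, length))   # one of them dies here
    bars.append((0.0, math.inf))         # the last component never dies
    return bars


if __name__ == "__main__":
    sample = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
    print(h0_barcode(sample))
```

For three collinear points at 0, 1 and 5, the bars end at scales 1 and 4, plus one infinite bar; long finite bars indicate well-separated clusters in the sample.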

III Experiments

We empirically evaluate the proposed methodology PHom-GeM. We assess on a highly challenging data set for generative models whether PHom-GeM can simultaneously achieve (i) precise persistent homology mapping of the generated data points and (ii) accurate persistent homology distance measurement with the bottleneck distance.

Data Availability and Data Description We train PHom-GeM on one real-world open data set: the credit card transactions data set from the Kaggle database (available at https://www.kaggle.com/mlg-ulb/creditcardfraud), containing 284,807 transactions, including 492 frauds. This data set is particularly interesting because it reflects the scattered point distribution of the reconstructed manifold found during the training of generative models, which subsequently impacts the generated adversarial samples. Furthermore, this data set is challenging because of the strong imbalance between normal and fraudulent transactions, while being of high interest for the banking industry. To preserve the confidentiality of the transactions, each transaction is composed of 28 components obtained with PCA, provided without any description, and two additional features, Time and Amount, that remain unchanged. Each transaction is labeled as fraudulent or normal in a feature called Class, which takes a value of 1 or 0, respectively.

Experimental Setup and Code Availability In our experiments, we use the Euclidean latent space $\mathcal{Z}=\mathbb{R}^{2}$ and the square cost function $c$ previously defined as $c(x,y)=||x-y||_{2}^{2}$ for the data points $x\in\mathcal{X},\widetilde{x}\in\mathcal{\widetilde{X}}$ . The true data set lies in $\mathbb{R}^{29}$ : we kept the 28 components obtained with PCA and the amount, resulting in a space of dimension 29. For the error minimization process, we used RMSProp gradient descent [15] with the parameters $\text{lr}=0.001,\rho=0.9,\epsilon=10^{-6}$ and a batch size of 64. Different values of $\lambda$ for the gradient penalty have been tested. We empirically obtained the lowest reconstruction error with $\lambda=10$ for both GP-WGAN and WAE. The coefficients of persistent homology are evaluated within the field $\mathbb{Z}/2\mathbb{Z}$ . We only consider the homology groups $H_{0}$ and $H_{1}$ , which represent the connected components and the loops, respectively. Higher dimensional homology groups did not noticeably improve the quality of the results while leading to longer computational time. The simulations were performed on a computer with 16GB of RAM, an Intel i7 CPU and a Tesla K80 GPU accelerator. To ensure the reproducibility of the experiments, the code is available at https://github.com/dagrate/phomgem.
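
As a reading aid for the optimizer settings, the sketch below writes out one common form of the RMSProp update with the stated values $\text{lr}=0.001$ , $\rho=0.9$ and $\epsilon=10^{-6}$ . The exact placement of $\epsilon$ (inside or outside the square root) differs between frameworks, so the form used here is an assumption, as are the function name `rmsprop_step` and the toy quadratic objective.

```python
import math


def rmsprop_step(theta, grad, cache, lr=0.001, rho=0.9, eps=1e-6):
    """One RMSProp update.

    cache is a running average of squared gradients; each parameter is
    moved by lr * grad / (sqrt(cache) + eps).
    """
    new_cache = [rho * c + (1.0 - rho) * g * g for c, g in zip(cache, grad)]
    new_theta = [
        t - lr * g / (math.sqrt(c) + eps)
        for t, g, c in zip(theta, grad, new_cache)
    ]
    return new_theta, new_cache


if __name__ == "__main__":
    # Toy objective f(theta) = theta_1^2 + theta_2^2, with gradient 2 * theta.
    theta, cache = [1.0, -2.0], [0.0, 0.0]
    for _ in range(200):
        grad = [2.0 * t for t in theta]
        theta, cache = rmsprop_step(theta, grad, cache)
    print(theta, sum(t * t for t in theta))
```

Because the gradient is divided by its running root mean square, the step length stays on the order of the learning rate regardless of the gradient scale, which is one reason a small value such as 0.001 is typical.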

Results and Discussions about PHom-GeM We test PHom-GeM, Persistent Homology for Generative Models, on four different generative models: GP-WGAN, WGAN, WAE and VAE. We compare the models on two aspects: first, a qualitative visualization of the persistence diagrams and barcodes and, second, a quantitative estimation of the persistent homology closeness using the bottleneck distance between the generated manifolds $\mathcal{\widetilde{X}}$ of the generative models and the original manifold $\mathcal{X}$ .

At the top of figure 4, the rotated persistence diagram and the barcode diagram of the original sample $\mathcal{X}$ are highlighted. In the persistence diagram, black points represent the 0-dimensional homology groups $H_{0}$ , the connected components of the complex. The red triangles represent the 1-dimensional homology group $H_{1}$ , the 1-dimensional features known as cycles or loops. The barcode diagram is a simple way of representing the information contained in the persistence diagram. For the sake of simplicity, we represent only the barcode diagram of the generative models to compare qualitatively the generated distribution $P_{G}$ of each model with respect to the distribution $P_{X}$ of the original sample. The generated distribution $P_{G}$ of GP-WGAN is the closest to the distribution $P_{X}$ , followed by WGAN, WAE and VAE. Indeed, the spectrum of the barcodes of GP-WGAN is very similar to the original sample's spectrum, and is also denser on the right. In contrast, the distributions $P_{G}$ of WAE and VAE are not able to reproduce all of the features contained in the original distribution, which explains their narrower barcode spectrum.

In order to quantitatively assess the quality of the generated distributions, we use the bottleneck distance between the persistence diagram of $\mathcal{X}$ and the persistence diagram of the generated data points $G(Z)$ . In table I, we report the mean value of the bottleneck distance together with the lower and upper bounds of the 95% confidence interval for each generative model. Confirming the visual observations, the smallest bottleneck distance, and therefore the best result, is obtained with GP-WGAN, followed by WGAN, WAE and VAE. This means GP-WGAN is capable of generating a data distribution that shares the most topological features with the original data distribution, including the nearness measurements and the overall shape. It confirms topologically, on a real-world data set, the claims made in [6] of superior performance of GP-WGAN over WGAN. Furthermore, the performance of the AE cannot match the generative performance achieved by the GANs. However, the WAE, which relies on optimal transport theory, achieves a better generative distribution than the popular VAE.
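
The text does not say how the 95% confidence intervals of table I were obtained. As one plausible reading, the sketch below computes a normal-approximation interval (mean plus or minus 1.96 standard errors) from repeated bottleneck distance measurements, and checks whether two reported intervals overlap. The list of repeated measurements is made up for illustration; only the interval bounds in `TABLE_I` are taken from table I.

```python
import math
import statistics


def mean_confidence_interval(values, z=1.96):
    """Mean and normal-approximation confidence bounds (z = 1.96 for 95%)."""
    mean = statistics.mean(values)
    half_width = z * statistics.stdev(values) / math.sqrt(len(values))
    return mean, mean - half_width, mean + half_width


def intervals_overlap(a, b):
    """True if the closed intervals a = (low, high) and b = (low, high) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


# Lower and upper bounds as reported in table I.
TABLE_I = {
    "GP-WGAN": (0.0683, 0.0738),
    "WGAN": (0.0716, 0.0772),
    "WAE": (0.0791, 0.0852),
    "VAE": (0.0833, 0.0881),
}

if __name__ == "__main__":
    # Hypothetical repeated measurements, not data from the paper.
    repeated_distances = [0.070, 0.072, 0.071, 0.073, 0.069]
    print(mean_confidence_interval(repeated_distances))
    print(intervals_overlap(TABLE_I["GP-WGAN"], TABLE_I["WGAN"]))
    print(intervals_overlap(TABLE_I["WGAN"], TABLE_I["WAE"]))
```

Applied to the reported bounds, the GP-WGAN and WGAN intervals overlap, as do the WAE and VAE intervals, whereas the WGAN and WAE intervals do not; the ranking of the means is unchanged.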

IV Conclusion

Building upon optimal transport and unsupervised learning, we introduced PHom-GeM, Persistent Homology for Generative Models, a new characterization of the generative manifolds that uses topology and persistent homology to highlight manifold features and scattered generated distributions. We discussed the relations of GP-WGAN, WGAN, WAE and VAE in the context of unsupervised learning. Furthermore, relying on persistent homology, the bottleneck distance has been introduced to estimate quantitatively the similarity of the topological features of the original distribution and of the generated distributions of the generative models, a property that current traditional distance measures fail to capture. We conducted experiments showing the performance of PHom-GeM on the four generative models GP-WGAN, WGAN, WAE and VAE. We used a challenging imbalanced real-world open data set containing credit card transactions, capable of illustrating the scattered generated data distributions of the generative models and particularly relevant for the banking industry. We showed the superior topological performance of GP-WGAN in comparison to the other generative models, as well as the superior performance of WAE over VAE. Future work will include further exploration of the topological features, such as the influence of the simplicial complex and the possibility of integrating a topological optimization function as a regularization term.

Bibliography

[1] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.
[2] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning, volume 1. MIT Press, Cambridge, 2016.
[3] Diederik P Kingma and Max Welling. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013.
[4] Cédric Villani. Topics in Optimal Transportation. Number 58. American Mathematical Soc., 2003.
[5] Martin Arjovsky, Soumith Chintala, and Léon Bottou. Wasserstein generative adversarial networks. In International Conference on Machine Learning, pages 214–223, 2017.
[6] Ishaan Gulrajani, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron C Courville. Improved training of Wasserstein GANs. In Advances in Neural Information Processing Systems, pages 5767–5777, 2017.
[7] Ilya Tolstikhin, Olivier Bousquet, Sylvain Gelly, and Bernhard Schoelkopf. Wasserstein auto-encoders. arXiv preprint arXiv:1711.01558, 2017.
[8] Yoshua Bengio, Li Yao, Guillaume Alain, and Pascal Vincent. Generalized denoising auto-encoders as generative models. In Advances in Neural Information Processing Systems, pages 899–907, 2013.