Visualization of AE's Training on Credit Card Transactions with Persistent Homology

Jeremy Charlier; Francois Petit; Gaston Ormazabal; Radu State; Jean Hilger

arXiv:1905.13020·cs.LG·August 13, 2019


TL;DR

This paper introduces PHom-WAE, a novel topological method using persistent homology to evaluate and improve the data distribution quality of Wasserstein Auto-Encoders, especially on complex real-world credit card transaction data.

Contribution

It proposes a new topological distance measure, the bottleneck distance, for Wasserstein Auto-Encoders to better assess latent space quality compared to traditional methods.

Findings

01

Persistent homology effectively captures topological features of data manifolds.

02

PHom-WAE outperforms traditional distance measures on real-world datasets.

03

The methodology provides a new way to evaluate generative models' data distribution quality.


Tables

Table 1: Bottleneck distance (smaller is better) for PHom-WAE and PHom-VAE between the samples $X$ of the original manifold $\mathcal{X}$ and the reconstructed manifold $G(Z|X)$ for $Z\in\mathcal{Z}$. Because of OT, PHom-WAE achieves better performance.

PHom-WAE	PHom-VAE	Difference (%)
0.0788	0.0878	10.25

Table 2: Bottleneck distance (smaller is better) for PHom-WAE and PHom-VAE between the samples $Z_{i}$ of the latent manifold $\mathcal{Z}$ following $P_{Z}$ to detect scattered distribution. PHom-WAE better preserves the topological features during the encoding than PHom-VAE, resulting in a less chaotically scattered manifold $\mathcal{Z}$, as highlighted by the smaller bottleneck distance.

PHom-WAE	PHom-VAE	Difference (%)
0.0984	0.1372	28.28

Equations

$$W_{c}(P_{X},P_{G})=\sup_{f\in\mathcal{F}_{L}}\mathbb{E}_{X\sim P_{X}}[f(X)]-\mathbb{E}_{Y\sim P_{G}}[f(Y)]\tag{1}$$

$$D_{WAE}(P_{X},P_{G}):=\inf_{Q(Z|X)\in\mathcal{Q}}\mathbb{E}_{P_{X}}\mathbb{E}_{Q(Z|X)}[c(X,G(Z))]+\lambda\,\mathcal{D}_{Z}(Q_{Z},P_{Z})\tag{2}$$

$$\mathcal{D}_{Z}(P_{Z},Q_{Z}):=\text{MMD}_{k}(P_{Z},Q_{Z})=\left\|\int_{\mathcal{Z}}k(z,\cdot)\,dP_{Z}(z)-\int_{\mathcal{Z}}k(z,\cdot)\,dQ_{Z}(z)\right\|_{\mathcal{H}_{k}}\tag{3}$$

$$\mathcal{K}_{1}\overset{\iota}{\hookrightarrow}\mathcal{K}_{2}\overset{\iota}{\hookrightarrow}\mathcal{K}_{3}\overset{\iota}{\hookrightarrow}\dots\overset{\iota}{\hookrightarrow}\mathcal{K}_{N-1}\overset{\iota}{\hookrightarrow}\mathcal{K}_{N}\tag{4}$$

$$H_{k}(\mathcal{K}_{1},F)\xrightarrow{t_{1}}H_{k}(\mathcal{K}_{2},F)\xrightarrow{t_{2}}\dots\xrightarrow{t_{N-2}}H_{k}(\mathcal{K}_{N-1},F)\xrightarrow{t_{N-1}}H_{k}(\mathcal{K}_{N},F)\tag{5}$$

$$d_{b}(A,B)=\inf_{\phi:A^{\prime}\to B^{\prime}}\sup_{x\in A^{\prime}}\|x-\phi(x)\|\tag{6}$$


Full text

Visualization of AE’s Training on Credit Card Transactions with Persistent Homology

Jeremy Charlier (1, 2), Francois Petit (1), Gaston Ormazabal (2), Radu State (1), Jean Hilger (3)

(1) University of Luxembourg, L-1855 Luxembourg, Luxembourg, {name.surname}@uni.lu
(2) Columbia University, New York NY 10027, USA, {jjc2292,gso7}@columbia.edu
(3) BCEE, L-1160 Luxembourg, Luxembourg, [email protected]

Abstract

Auto-encoders are among the most popular neural network architectures for dimension reduction. They are composed of two parts: the encoder, which maps the model distribution to a latent manifold, and the decoder, which maps the latent manifold to a reconstructed distribution. However, auto-encoders are known to produce a chaotically scattered data distribution in the latent manifold, resulting in an incomplete reconstructed distribution. Current distance measures fail to detect this problem because they are not able to acknowledge the shape of the data manifolds, i.e. their topological features, and the scale at which the manifolds should be analyzed. We propose Persistent Homology for Wasserstein Auto-Encoders, called PHom-WAE, a new methodology to assess and measure the data distribution of a generative model. PHom-WAE minimizes the Wasserstein distance between the true distribution and the reconstructed distribution and uses persistent homology, the study of the topological features of a space at different spatial resolutions, to compare the nature of the latent manifold and the reconstructed distribution. Our experiments underline the potential of persistent homology for Wasserstein Auto-Encoders in comparison to Variational Auto-Encoders, another type of generative model. The experiments are conducted on a real-world data set that is particularly challenging for traditional distance measures and auto-encoders. PHom-WAE is the first methodology to propose a topological distance measure, the bottleneck distance, for Wasserstein Auto-Encoders used to compare decoded samples of high quality in the context of credit card transactions.

Keywords: Barcodes · Encoding-Decoding · Persistence Diagram.

1 Motivation

Dimension reduction techniques were initially driven by linear algebra, with second-order matrix decompositions such as the Singular Value Decomposition (SVD) [1] and, ultimately, with tensor decompositions [2, 3], a higher-order analogue of matrix decompositions. Although SVD is fast enough to be applied to large data sets, it is limited to second order because it relies on matrices. While tensors are suitable for third-order experiments and above, the complexity of the decomposition and the time required for the computations become a limitation as soon as the size of the data set grows [4, 5]. Hence, in recent years, several neural network architectures for dimension reduction have been proposed. Dense Auto-Encoders (AE) [6, 7] are one of the most well-established approaches. More recently, the Variational Auto-Encoders (VAE) presented by Kingma et al. in [8] have become a well-known approach, but they might generate a poor target distribution because of the KL divergence.

Recall that an AE is a neural network trained to copy its input manifold to its output manifold through a hidden layer. The encoder function sends the input space to the hidden space and the decoder function brings the hidden space back to the input space. As explained in [9], the points of the hidden space are chaotically scattered for most encoders, including the popular VAE. Even after proper training, groups of points of various sizes gather and cluster randomly in the hidden layer. Therefore, some features are missing in the reconstructed distribution $G(Z),Z\in\mathcal{Z}$. Describing the scattered points with traditional distance measures, such as the Euclidean distance, is very difficult because these measures are not able to acknowledge the shapes of the data manifolds. Persistent homology, however, is specifically designed to highlight the topological features of the data [10]. Therefore, building upon persistent homology, the Wasserstein distance [11] and Wasserstein Auto-Encoders (WAE) [12], our main contribution is to propose qualitative and quantitative ways to evaluate the scattered hidden space and the overall performance of AE.

In this work we describe the persistent homology features of the encoded and decoded model distribution while minimizing the Optimal Transport (OT) function $W_{c}(P_{X},P_{G})$ for a squared cost $c(x,y)=||x-y||_{2}^{2}$ where $P_{X}$ is the model distribution of the data, and $P_{G}$ the latent variable model specified by the prior distribution $P_{Z}$ of latent manifold $Z\in\mathcal{Z}$ and the generative model $P_{G}(X|Z)$ of the initial distribution $X\in\mathcal{X}$ given $Z$ . The method is highlighted in figure 1. Our contributions are summarized below:

• A persistent homology procedure for WAE, which we call PHom-WAE, to highlight the topological properties of the encoded and decoded distribution of the data for different spatial resolutions. The objective is twofold: a persistent homology description of the encoded latent space $Q_{z}:=\mathbb{E}_{P_{X}}[Q(Z|X)]$, and a persistent homology description of the decoded latent space following the generative model $P_{G}(X|Z)$.

• A distance measure for persistence diagrams, the bottleneck distance, applied to WAE to compare quantitatively the true and the target distributions on any data set. We measure the shortest distance for which there exists a perfect matching between the points of the two persistence diagrams. A persistence diagram is a stable summary representation of the topological features of a simplicial complex, a collection of simplices (subsets of vertices, see definition 1), associated with the data set.

• A persistent homology procedure to highlight the scattered latent distribution produced by Variational Auto-Encoders (VAE) in comparison to WAE on a real-world data set. We use the concepts introduced for PHom-WAE to define PHom-VAE, Persistent Homology for VAE. We highlight the scattered distribution of the VAE’s hidden layer using persistent homology, confirming the original statement on the VAE’s scattered distribution introduced in [13].

• Finally, we propose the first application of algebraic topology and WAE on a public data set containing credit card transactions, which is particularly challenging for traditional distance measures and auto-encoders.

The paper is structured as follows. We discuss related work in section 2. In section 3, we review the WAE formulation using OT derived by Tolstikhin et al. in [12]. By using persistent homology, we are able to compare the topological properties of the distributions $P_{X},P_{Z}$ and $P_{G}$, as illustrated in figure 1. We highlight the experimental results in section 4 and we conclude in section 5 by addressing promising directions for future work.

2 Related Work

Literature on Persistent Homology and Topology A major trend in modern data analysis is to consider that the data has a shape, and more precisely, a topological structure. Topological Data Analysis (TDA) is a set of tools relying on computational algebraic topology that yield precise information about the data structure. Two of the most important techniques are persistent homology and the Mapper algorithm [10].

Data sets are usually represented as point clouds in Euclidean spaces. Hence, the shape of a data set depends on the scale at which it is observed. Instead of trying to find an optimal scale, persistent homology, a method of TDA, studies how the topological features (number of connected components, number of holes, …) of a data set change with the scale. The foundations of persistent homology were established in the early 2000s in [14], [15], [16] and [17]. They provide a computable algebraic invariant of filtered topological spaces (nested sequences of topological spaces which encode how the scale changes) called the persistence module. This module can be decomposed into a family of intervals called a persistence diagram or barcode. This family records how the topology of the space changes when going through the filtration [18]. The space of barcodes is equipped with the bottleneck distance. Moreover, the space of persistence modules is also endowed with a metric, and under mild assumptions these two spaces are isometric [19]. Additionally, the Mapper algorithm first appeared in [20]. It is a visualization algorithm which aims at producing a low-dimensional representation of high-dimensional data sets in the form of a graph, thereby capturing the topological features of the point cloud.

Meanwhile, efficient and fast algorithms have emerged to compute persistent homology [15], [17] as well as to construct filtered topological spaces using, for instance, the Vietoris-Rips complex [21]. As a result, persistent homology has already found numerous successful applications. For instance, Nicolau et al. in [22] detected subgroups of cancers with unique mutational profiles. In [23], it has been shown that computational topology could be used in medicine, social sciences or sport analysis. More recently, Bendich et al. improved statistical analyses of brain arteries of a population [24] while Xia et al. were able to extract molecular topological fingerprints for proteins in biological science [25].

Literature on Auto-Encoders and Optimal Transport A large variety of AE have appeared in the last few years [13]. Although promising results were achieved, most of the solutions did not address the representation of the samples of the encoded and decoded manifolds. As outlined by Bengio et al. in [9], the points in the encoded manifold $\mathcal{Z}$ are chaotically scattered for the majority of encoders. Therefore, some features are missing in the reconstructed distribution $G(Z),Z\in\mathcal{Z}$. Thus, sampling data points for the reconstruction with traditional AE is difficult. The constraint added by the Variational Auto-Encoder (VAE) in [8] by means of a KL divergence, composed of a reconstruction cost and a regularization term, provides a finer solution for generating adversarial data by reducing the impact of the chaotically scattered distribution $P_{Z}$ of $\mathcal{Z}$.

Concurrently with the emergence of AE, Goodfellow et al. introduced the Generative Adversarial Network (GAN) model in [26] in 2014. Although the GAN does not have an encoder, it consists of two parts, a generator to generate adversarial samples and a discriminator to fit the generated data points to the true data distribution. However, the GAN suffers from mode collapse between the generator and the discriminator [13]. As a solution, optimal transport [27] was applied to GAN in [28] with the Wasserstein distance, thereby introducing the Wasserstein GAN (WGAN). By adding a gradient penalty to the Wasserstein distance, Gulrajani et al. in [29] introduced a new training procedure for GANs that avoids mode collapse. As described in [30, 31] in the context of unbalanced optimal transport, Tolstikhin et al. applied these concepts to AE in [12]. They proposed to add one extra divergence to the objective minimization function in the context of generative modeling, leading to Wasserstein Auto-Encoders (WAE).

In this paper, using persistent homology and the bottleneck distance, we propose qualitative and quantitative ways to evaluate the performance of the compression of AE. We build upon the work on WAE and unbalanced OT with persistent homology. We show that the barcodes, inherited from the persistence diagrams, are capable of representing the encoded manifold $\mathcal{Z}$ generated by the WAE. Furthermore, we show that the bottleneck distance allows a quantitative comparison of the topological features between the samples $Z\in\mathcal{Z}$ of the reconstructed distribution $G(Z)$ and the samples $X\in\mathcal{X}$ of the true distribution.

3 Proposed Method

Our method computes the persistent homology of both the latent manifold $Z\in\mathcal{Z}$ and the reconstructed manifold following the generative model $P_{G}(X|Z)$ based on the minimization of the optimal transport cost $W_{c}(P_{X},P_{G})$ . In the resulting topological problem, the points of the manifolds are transformed to a metric space set for which a Vietoris-Rips simplicial complex filtration is applied (see definition 2). PHom-WAE achieves simultaneously two main goals: it computes the lifespan of the persistent homological features while measuring the bottleneck distance between the persistence diagrams of the WAE’s manifolds.

3.1 Preliminaries and Notations

We follow the notations used by Tolstikhin et al. in [12]. Sets are denoted by calligraphic letters such as $\mathcal{X}$ , random variables by capital letters $X$ , and their values by lower case letters $x$ . Probability distributions are denoted by capital letters $P(X)$ and their corresponding densities by lower case letters $p(x)$ .

3.2 Optimal Transport and Dual Formulation

Following the description of the optimal transport problem [27] and relying on the Kantorovich-Rubinstein duality, the Wasserstein distance is computed as

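$$W_{c}(P_{X},P_{G})=\sup_{f\in\mathcal{F}_{L}}\mathbb{E}_{X\sim P_{X}}[f(X)]-\mathbb{E}_{Y\sim P_{G}}[f(Y)]\tag{1}$$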

where $(\mathcal{X},d)$ is a metric space, $\mathcal{P}(X\sim P_{X},Y\sim P_{G})$ is a set of all joint distributions $(X,Y)$ with marginals $P_{X}$ and $P_{G}$ respectively and $\mathcal{F}_{L}$ is the class of all bounded 1-Lipschitz functions on $(\mathcal{X},d)$ .
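As a rough intuition for the dual form, the following sketch works in one dimension with equally sized samples and the cost $c(x,y)=|x-y|$, where matching sorted samples is optimal. The function names `wasserstein_1d` and `dual_value` are hypothetical and are not taken from the paper's code; the example is only a toy illustration, not the training objective used in the experiments.

```python
def wasserstein_1d(xs, ys):
    """Empirical Wasserstein-1 distance between two equally sized 1-D samples.

    In one dimension with cost |x - y|, an optimal transport plan pairs the
    i-th smallest point of xs with the i-th smallest point of ys.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("samples must be non-empty and of equal size")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def dual_value(f, xs, ys):
    """Value E[f(X)] - E[f(Y)] of the dual objective for a candidate f."""
    return sum(f(x) for x in xs) / len(xs) - sum(f(y) for y in ys) / len(ys)


if __name__ == "__main__":
    xs = [0.0, 1.0, 3.0]
    ys = [5.0, 6.0, 8.0]  # xs shifted by +5
    # f(t) = -t is 1-Lipschitz; for a pure shift it attains the supremum.
    print(wasserstein_1d(xs, ys), dual_value(lambda t: -t, xs, ys))
```

Any 1-Lipschitz function gives a lower bound on the distance through `dual_value`; the supremum over such functions recovers it.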

3.3 Wasserstein Auto-Encoders

As described in [12], the WAE objective function is expressed such that

$$D_{WAE}(P_{X},P_{G}):=\inf_{Q(Z|X)\in\mathcal{Q}}\mathbb{E}_{P_{X}}\mathbb{E}_{Q(Z|X)}[c(X,G(Z))]+\lambda\,\mathcal{D}_{Z}(Q_{Z},P_{Z})\tag{2}$$

where $c(X,G(Z)):\mathcal{X}\times\mathcal{X}\rightarrow\mathcal{R}_{+}$ is any measurable cost function. In our experiments, we use a square cost function $c(x,y)=||x-y||_{2}^{2}$ for data points $x,y\in\mathcal{X}$. $G(Z)$ denotes the sending of $Z$ to $X$ for a given map $G:\mathcal{Z}\rightarrow\mathcal{X}$. $Q$ and $G$ are any nonparametric sets of probabilistic encoders and decoders, respectively.

We use the Maximum Mean Discrepancy (MMD) for the penalty $\mathcal{D}_{Z}(Q_{Z},P_{Z})$ for a positive-definite reproducing kernel $k:\mathcal{Z}\times\mathcal{Z}\rightarrow\mathcal{R}$

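$$\mathcal{D}_{Z}(P_{Z},Q_{Z}):=\text{MMD}_{k}(P_{Z},Q_{Z})=\left\|\int_{\mathcal{Z}}k(z,\cdot)\,dP_{Z}(z)-\int_{\mathcal{Z}}k(z,\cdot)\,dQ_{Z}(z)\right\|_{\mathcal{H}_{k}}\tag{3}$$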

where $\mathcal{H}_{k}$ is the reproducing kernel Hilbert space of real-valued functions mapping on $\mathcal{Z}$ . For details on the MMD implementation, we refer to [12].
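The penalized objective can be sketched on finite samples as follows. This is a minimal illustration under stated assumptions: the Gaussian kernel and its bandwidth are chosen here only for simplicity (the paper defers the kernel choice to [12]), the MMD is estimated with the simple biased plug-in estimator, and the names `gaussian_kernel`, `mmd` and `wae_objective` are hypothetical.

```python
import math


def gaussian_kernel(a, b, bandwidth=1.0):
    """Positive-definite Gaussian kernel (an assumption made for this sketch)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd(ps, qs, kernel=gaussian_kernel):
    """Biased plug-in estimate of MMD_k between two samples of latent points."""
    def mean_kernel(us, vs):
        return sum(kernel(u, v) for u in us for v in vs) / (len(us) * len(vs))

    squared = mean_kernel(ps, ps) + mean_kernel(qs, qs) - 2.0 * mean_kernel(ps, qs)
    return math.sqrt(max(0.0, squared))  # guard against tiny negative round-off


def wae_objective(xs, reconstructions, z_encoded, z_prior, lam=15.0):
    """Mean squared reconstruction cost plus lam times the MMD penalty."""
    cost = sum(
        sum((a - b) ** 2 for a, b in zip(x, r))
        for x, r in zip(xs, reconstructions)
    ) / len(xs)
    return cost + lam * mmd(z_encoded, z_prior)
```

Here `z_encoded` plays the role of samples from $Q_{Z}$ and `z_prior` of samples from $P_{Z}$; the default `lam=15.0` mirrors the value reported in the experimental setup.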

3.4 Vietoris-Rips Complex, Persistence Diagram and Barcodes

We explain the construction of the persistence module associated with a sample of a fixed distribution on a space. First, two manifold distributions are sampled from the WAE’s training. Then, we construct the persistence modules associated with each sample of the point manifolds. We refer to the first subsection of section 2 for pointers to references on persistent homology.

We associate to our points manifold $\mathcal{C}\subset\mathbb{R}^{n}$ , considered as a finite metric space, a sequence of simplicial complexes. For that aim, we use the Vietoris-Rips complex.

Definition 1 Let $V=\{1,\cdots,|V|\}$ be a set of vertices. A simplex $\sigma$ is a subset of vertices $\sigma\subseteq V$ . A simplicial complex K on V is a collection of simplices $\{\sigma\}\>,\>\sigma\subseteq V$ , such that $\tau\subseteq\sigma\in K\Rightarrow\tau\in K$ . The dimension $n=|\sigma|-1$ of $\sigma$ is its number of elements minus 1. Simplicial complexes examples are represented in figure 2.

Definition 2 Let $(X,d)$ be a metric space. The Vietoris-Rips complex at scale $\epsilon$ associated to $X$ , denoted by $\text{VR}(X,\epsilon)$ , is the abstract simplicial complex whose vertex set is $X$ , and where $\left\{x_{0},x_{1},...,x_{k}\right\}$ is a $k$ -simplex if and only if $d(x_{i},x_{j})\leq\epsilon$ for all $0\leq i,j\leq k$ .
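Definition 2 can be illustrated with a brute-force sketch that enumerates the simplices of $\text{VR}(X,\epsilon)$ up to a chosen dimension for a small Euclidean point cloud. The function name `vietoris_rips` and the `max_dim` cut-off are assumptions made for this illustration; the enumeration is exponential in the simplex dimension and is not meant for data sets of realistic size.

```python
import math
from itertools import combinations


def vietoris_rips(points, eps, max_dim=2):
    """List the simplices of VR(points, eps) up to dimension max_dim.

    A set of k+1 points forms a k-simplex if and only if all pairwise
    Euclidean distances are at most eps.
    """
    n = len(points)
    close = {
        (i, j)
        for i, j in combinations(range(n), 2)
        if math.dist(points[i], points[j]) <= eps
    }
    simplices = [(i,) for i in range(n)]  # every point is a 0-simplex
    for k in range(1, max_dim + 1):
        for candidate in combinations(range(n), k + 1):
            if all(pair in close for pair in combinations(candidate, 2)):
                simplices.append(candidate)
    return simplices


if __name__ == "__main__":
    cloud = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
    for eps in (1.0, 1.5):
        print(eps, vietoris_rips(cloud, eps))
```

Increasing `eps` can only add simplices, which is what makes the family of complexes a filtration.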

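We obtain an increasing sequence of Vietoris-Rips complexes by considering $\text{VR}(\mathcal{C},\epsilon)$ for an increasing sequence $(\epsilon_{i})_{1\leq i\leq N}$ of values of the scale parameter $\epsilon$. Writing $\mathcal{K}_{i}=\text{VR}(\mathcal{C},\epsilon_{i})$, this gives

$$\mathcal{K}_{1}\overset{\iota}{\hookrightarrow}\mathcal{K}_{2}\overset{\iota}{\hookrightarrow}\mathcal{K}_{3}\overset{\iota}{\hookrightarrow}\dots\overset{\iota}{\hookrightarrow}\mathcal{K}_{N-1}\overset{\iota}{\hookrightarrow}\mathcal{K}_{N}.\tag{4}$$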

Applying the k-th singular homology functor $H_{k}(-,F)$ with coefficient in the field $F$ [32] to (4), we obtain a sequence of $F$ -vector spaces, called the k-th persistence module of $(\mathcal{K}_{i})_{1\leq i\leq N}$

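$$H_{k}(\mathcal{K}_{1},F)\xrightarrow{t_{1}}H_{k}(\mathcal{K}_{2},F)\xrightarrow{t_{2}}\dots\xrightarrow{t_{N-2}}H_{k}(\mathcal{K}_{N-1},F)\xrightarrow{t_{N-1}}H_{k}(\mathcal{K}_{N},F).\tag{5}$$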

Definition 3 $\forall\leavevmode\nobreak\ i<j$ , the (i,j)-persistent $k$ -homology group with coefficient in $F$ of $\mathcal{K}=(\mathcal{K}_{i})_{1\leq i\leq N}$ denoted $HP_{k}^{i\rightarrow j}(\mathcal{K},F)$ is defined to be the image of the homomorphism $t_{j-1}\circ\ldots\circ t_{i}:H_{k}(\mathcal{K}_{i},F)\rightarrow H_{k}(\mathcal{K}_{j},F)$ .

Using the interval decomposition theorem [33], we extract a finite family of intervals of $\mathbb{R}_{+}$ called the persistence diagram. Each interval can be considered as a point in the set $D=\left\{(x,y)\in\mathbb{R}_{+}^{2}|x\leq y\right\}$. Hence, we obtain a finite subset of the set $D$. This space of finite subsets is endowed with a matching distance called the bottleneck distance and defined as follows

$$d_{b}(A,B)=\inf_{\phi:A^{\prime}\to B^{\prime}}\sup_{x\in A^{\prime}}\|x-\phi(x)\|\tag{6}$$

where $A^{\prime}=A\cup\Delta$ , $B^{\prime}=B\cup\Delta$ , $\Delta=\{(x,y)\in\mathbb{R}^{2}_{+}|x=y\}$ and the $\inf$ is over all the bijections from $A^{\prime}$ to $B^{\prime}$ .
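For very small diagrams, the bottleneck distance can be computed by exhaustive search over matchings, as in the sketch below. Two assumptions are made here: the norm is taken to be the $L_{\infty}$ norm, which is the usual convention but is not specified in the text, and only finite intervals are handled. The function name `bottleneck_distance` is hypothetical, and the factorial cost makes this suitable for illustration only.

```python
from itertools import permutations


def bottleneck_distance(dgm_a, dgm_b):
    """Brute-force bottleneck distance between two small persistence diagrams.

    Each diagram is a list of finite (birth, death) pairs. Both diagrams are
    padded with diagonal slots so that any point may be matched either to a
    point of the other diagram or to the diagonal.
    """
    n_a, n_b = len(dgm_a), len(dgm_b)
    size = n_a + n_b
    if size == 0:
        return 0.0

    def cost(i, j):
        a = dgm_a[i] if i < n_a else None  # None stands for a diagonal slot
        b = dgm_b[j] if j < n_b else None
        if a is None and b is None:
            return 0.0
        if a is None:
            return (b[1] - b[0]) / 2.0  # L-infinity distance to the diagonal
        if b is None:
            return (a[1] - a[0]) / 2.0
        return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

    return min(
        max(cost(i, perm[i]) for i in range(size))
        for perm in permutations(range(size))
    )
```

Matching a point to the diagonal costs half its lifespan, so short-lived features contribute little to the distance, while long-lived features must be matched to comparable features of the other diagram.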

Application We illustrate the construction of the barcodes and persistence diagram with the filtration parameter $\varepsilon$ in figure 3, according to the previous definitions. For every data point, the size of the point is continuously and artificially increased using a filtration parameter $\varepsilon$. The points are, therefore, transformed into geometrical disks as the filtration parameter keeps growing. When two disks intersect, a line is drawn between the two corresponding original data points, creating a 1-simplex that merges their connected components, while a barcode is drawn in a separate diagram. The barcodes highlight the birth-death cycles of each homology group, $H_{0}$ for the connected components and $H_{1}$ for the loops. At the end of the filtration procedure, a persistence diagram is drawn to recapitulate the birth-death events observed with the barcodes, as shown in figure 4. The persistence diagram is used to describe the topological properties of the original data point cloud using quantitative measures, such as the bottleneck distance.
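The birth-death bookkeeping for $H_{0}$ can be sketched with a union-find structure: every point is born at scale 0, and one connected component dies each time an edge of the Vietoris-Rips filtration merges two components. The sketch follows the convention of definition 2, where an edge appears when the distance between two points is at most $\epsilon$. The function name `h0_barcode` is hypothetical, and $H_{1}$ is deliberately left out since it needs a full boundary-matrix reduction.

```python
import math
from itertools import combinations


def h0_barcode(points):
    """H0 barcode of the Vietoris-Rips filtration of a Euclidean point cloud."""
    n = len(points)
    if n == 0:
        return []
    parent = list(range(n))

    def find(i):
        # Follow parent links to the component representative (path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    bars = []
    for length, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:  # this edge merges two components: one bar ends
            parent[root_i] = root_j
            bars.append((0.0, length))
    bars.append((0.0, math.inf))  # the last component never dies
    return bars


if __name__ == "__main__":
    print(h0_barcode([(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]))
```

Long bars correspond to clusters that stay separated over a wide range of scales, which is the kind of structure the barcodes in figure 3 are meant to expose.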

3.5 PHom-WAE, Persistent Homology for Wasserstein Auto-Encoder

Bridging the gap between topology and neural networks, PHom-WAE uses a two-step procedure. First, the minimization problem of WAE is solved for the encoder $Q$ and the decoder $G$. We use the Adam optimizer [34] for the optimization procedure. Then, persistent homology is computed on the samples of the encoded and decoded distributions, $P_{Z}$ and $P_{G}$, to describe their respective manifolds.

We highlight the topological features of the WAE’s manifolds based on their respective distributions with persistent homology. First, the points contained in the manifold $\mathcal{Z}$ inherited from $P_{Z}$, and in the manifold $\mathcal{X}$ from $P_{X}$ and $P_{G}(X|Z)$, are randomly selected into respective batches. Two samples, $Y_{1}$ from $\mathcal{X}$ following $P_{X}$ and $Y_{2}$ from $\mathcal{X}$ following $P_{G}(X|Z)$, are selected to differentiate the topological features of the manifold $\mathcal{X}$ before and after the encoding-decoding. Similarly, two other samples $Y_{1},Y_{2}$ are randomly selected from $\mathcal{Z}$ following $P_{Z}$ to detect the scattered distribution of the manifold $\mathcal{Z}$ after the encoding. The samples $Y_{1}$ and $Y_{2}$ are contained in the spaces $\mathcal{Y}_{1}$ and $\mathcal{Y}_{2}$, respectively. Then, the spaces $\mathcal{Y}_{1}$ and $\mathcal{Y}_{2}$ are transformed into metric space sets $\mathcal{\widehat{Y}}_{1}$ and $\mathcal{\widehat{Y}}_{2}$ for computational purposes. Then, we filter the metric space sets $\mathcal{\widehat{Y}}_{1}$ and $\mathcal{\widehat{Y}}_{2}$ using the Vietoris-Rips simplicial complex filtration. Given a line segment of length $\epsilon$, edges are created between pairs of data points separated by a distance smaller than $\epsilon$. This leads to the construction of a collection of simplices resulting in the Vietoris-Rips simplicial complex $\text{VR}(\mathcal{C},\epsilon)$ filtration. In our case, we decide to use the Vietoris-Rips simplicial complex as it offers the best compromise between the filtration accuracy and the memory requirement [10]. Subsequently, the persistence diagrams, $\text{dgm}_{Y_{1}}$ and $\text{dgm}_{Y_{2}}$, are constructed. We recall that a persistence diagram is a stable summary representation of the topological features of a simplicial complex.

The persistence diagrams allow the computation of the bottleneck distance $d_{b}(\text{dgm}_{Y_{1}},\text{dgm}_{Y_{2}})$. The barcodes represent the lifespan of the homological features detected by the persistence diagrams, for instance the holes. The barcodes are a collection of interval modules. Furthermore, the lifespan of a homological feature is defined by the boundaries of its interval module, namely the birth time and the death time. Therefore, the barcodes illustrate in a simple way the birth and death of the pairing generators of the iterated inclusions. PHom-WAE is described in Algorithm 1.

4 Experiments

In this section, we empirically evaluate the proposed methodology PHom-WAE. We assess on a challenging data set for auto-encoders whether PHom-WAE can simultaneously achieve (i) accurate topological reconstruction of the data points and (ii) appropriate persistent homology mapping of the latent manifold.

Data Availability and Data Description We train PHom-WAE on one real-world open data set: the credit card transactions data set from the Kaggle database (available at https://www.kaggle.com/mlg-ulb/creditcardfraud), containing 284 807 transactions including 492 frauds. This data set is particularly interesting because it underlines the chaotically scattered points of the encoded manifold that are found during the AE training. Furthermore, this data set is challenging because of the strong imbalance between normal and fraudulent transactions while being of high interest for the banking industry. To preserve the confidentiality of the transactions, each transaction is composed of 28 components obtained with PCA, provided without any description, and two additional features, Time and Amount, that remained unchanged. Each transaction is labeled as fraudulent or normal in a feature called Class, which takes a value of 1 in the case of fraud and 0 otherwise.

Experimental Setup and Code Availability In our experiments, we use the Euclidean latent space $\mathcal{Z}=\mathcal{R}^{2}$ and the square cost function $c$ previously defined as $c(x,y)=||x-y||_{2}^{2}$ for the data points $x,y\in\mathcal{X}$. The true data set lies in $\mathcal{R}^{29}$: we kept the 28 components obtained with PCA and the amount, resulting in a space of dimension 29. For the error minimization process, we used the Adam optimizer [34] with the parameters $\text{lr}=0.001,\beta_{1}=0.9,\beta_{2}=0.999$ and a batch size of 64. Different values of $\lambda$ for the Wasserstein penalty have been tested; we empirically obtained the lowest reconstruction error with $\lambda=15$. The coefficients of persistent homology are evaluated within the field $\mathbb{Z}/2\mathbb{Z}$. We only consider the homology groups $H_{0}$ and $H_{1}$, which represent the connected components and the loops, respectively. Higher-dimensional homology groups did not noticeably improve the quality of the results while leading to longer computation times. The simulations were performed on a computer with 16GB of RAM, an Intel i7 CPU and a Tesla K80 GPU accelerator. To ensure the reproducibility of the experiments, the code is available at https://github.com/dagrate/PHom-WAE.
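To make the reported optimizer settings concrete, the sketch below applies one Adam update to a single scalar parameter with $\text{lr}=0.001$, $\beta_{1}=0.9$ and $\beta_{2}=0.999$. The stabilizing constant `eps=1e-8` is an assumption (the common default, not stated in the text), and `adam_step` is a hypothetical helper written for illustration rather than the optimizer used in the authors' code.

```python
import math


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                # bias corrections
    v_hat = v / (1.0 - beta2 ** t)
    param = param - lr * m_hat / (math.sqrt(v_hat) + eps)
    return param, m, v


if __name__ == "__main__":
    param, m, v = 1.0, 0.0, 0.0
    param, m, v = adam_step(param, grad=10.0, m=m, v=v, t=1)
    print(param)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly `lr` regardless of the gradient's magnitude.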

Results and Discussions on PHom-WAE against PHom-VAE We test PHom-WAE against PHom-VAE, Persistent Homology for Variational Auto-Encoder. We recall a Variational Auto-Encoder [8] uses a KL divergence, denoted by $D_{KL}(P_{X},P_{G})$ , composed of a reconstruction cost and a regularization term instead of an OT cost function. We use the same concepts introduced for PHom-WAE to define PHom-VAE. We compare the performance of PHom-WAE and PHom-VAE on two specificities: the latent manifold $\mathcal{Z}$ and the reconstructed data distribution $G(Z)$ following the generative model distribution $P_{G}$ for the samples $Z\in\mathcal{Z}$ .

As pictured in figures 5 and 6, both the persistence diagram and the barcodes between the original and the reconstructed distributions, respectively $P_{X}$ and $P_{G}(X|Z)$ , of the manifold $\mathcal{X}$ are more widely distributed for PHom-WAE than for PHom-VAE. Additionally, the persistence diagram and the barcodes of PHom-WAE are qualitatively closer to those associated with the original data manifold $\mathcal{X}$ . It means the topological features are better preserved for PHom-WAE than for PHom-VAE. It highlights a better encoding-decoding process thanks to the use of an optimal transport cost function. Furthermore, in figure 7, a topological representation of the original and the reconstructed distributions is highlighted. We observe the iterated inclusion chains are more similar for PHom-WAE than for PHom-VAE. For PHom-VAE, the inclusions of the reconstructed distribution are randomly scattered through the space without connected vertices.

In order to quantitatively assess the quality of the encoding-decoding process, we use the bottleneck distance between the persistence diagram of $\mathcal{X}$ and the persistence diagram of $G(Z)$ of the reconstructed data points. We recall that the strength of the bottleneck distance is to measure quantitatively the topological changes in the data, either the true or the reconstructed data, while being insensitive to the scale of the analysis. Traditional distance measures fail to acknowledge this as they do not rely on persistent homology and, therefore, can only reflect a measurement of the nearness relations of the data points without considering the overall shape of the data distribution. In table 1, we notice that the smallest bottleneck distance, and therefore the best result, is obtained with PHom-WAE. It means PHom-WAE is better able than PHom-VAE to preserve the topological features of the original data distribution, including the nearness measurements and the overall shape.
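The text does not define the Difference (%) column of the tables. The reported values appear consistent with a relative difference taken with respect to the PHom-VAE distance, which is an inference from the numbers rather than a statement from the authors. For table 1:

$$\frac{d_{b}^{\text{VAE}}-d_{b}^{\text{WAE}}}{d_{b}^{\text{VAE}}}\times 100=\frac{0.0878-0.0788}{0.0878}\times 100\approx 10.25\,\%$$

The same formula applied to table 2 gives $(0.1372-0.0984)/0.1372\times 100\approx 28.28\,\%$.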

Last but not least, persistent homology and the bottleneck distance are used to highlight the scattering of the distribution $P_{Z}$ of the latent manifold $\mathcal{Z}$. Using the bootstrapping technique of [35], we successively randomly select data samples $Z_{i}$ contained in the manifold $\mathcal{Z}$. The total number of selected samples $Z_{i}$ is at least 50% of the total number of points contained in the manifold $\mathcal{Z}$ to ensure a reliable statistical representation. If the data is not scattered, the bottleneck distance between the persistence diagrams of the samples $Z_{i}$ is small. Conversely, if the data is chaotically scattered in $\mathcal{Z}$, then the topological features of the samples $Z_{i}$ are significantly different, and consequently, the bottleneck distance is large. In table 2, the bottleneck distance is significantly lower for PHom-WAE than for PHom-VAE. Therefore, the level of scattered chaos for PHom-WAE is lower than for PHom-VAE. It also means the distribution $P_{Z}$ of the latent manifold $\mathcal{Z}$ is better topologically preserved for PHom-WAE than for PHom-VAE. The reconstructed distribution $P_{G}(X|Z)$ of $\mathcal{X}$ is, thus, less altered for PHom-WAE.
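The resampling step can be sketched as repeated random subsampling of the latent points followed by a pairwise comparison of the subsamples. Several choices below are assumptions and not details given in the text: subsamples are drawn without replacement, two subsamples are compared per repetition, and the comparison is delegated to a caller-supplied `distance_fn` (for PHom-WAE this would compute the bottleneck distance between the two persistence diagrams). The name `subsample_distances` is hypothetical.

```python
import math
import random


def subsample_distances(latent_points, distance_fn, n_pairs=10, fraction=0.5, seed=0):
    """Distances between pairs of random subsamples of the latent points.

    Each subsample holds at least `fraction` of the points, with fraction
    restricted to [0.5, 1.0] to mirror the 50% threshold used in the text.
    """
    if not 0.5 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0.5, 1.0]")
    rng = random.Random(seed)
    size = max(1, math.ceil(fraction * len(latent_points)))
    distances = []
    for _ in range(n_pairs):
        first = rng.sample(latent_points, size)
        second = rng.sample(latent_points, size)
        distances.append(distance_fn(first, second))
    return distances
```

Consistently small values across repetitions suggest that the subsamples share the same topological features, while large or highly variable values point to a scattered latent manifold.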

5 Conclusion

Building upon WAE and VAE, we introduce PHom-WAE and PHom-VAE, a new characterization of the manifolds of the WAE and VAE, respectively, that uses topology and persistent homology to highlight the manifold properties and the scattered points of the hidden space. We discussed their relations with other AE modeling techniques. Furthermore, relying on persistent homology, the bottleneck distance has been introduced to estimate quantitatively the alteration of the topological features occurring during the encoding-decoding process, a specificity that current traditional distance measures fail to acknowledge. We conducted experiments showing the performance of PHom-WAE in comparison to PHom-VAE using a challenging imbalanced real-world open data set containing credit card transactions, which is of particular interest to the banking industry. We showed the superior performance of PHom-WAE in comparison to PHom-VAE. Future work will include further exploration of the topological features, such as the influence of the simplicial complex and the possibility of integrating a topological optimization function as a regularization term.

Bibliography

[1] Golub, G.H., Van Loan, C.F.: Matrix Computations. Baltimore, MD, USA (1989)
[2] Carroll, J.D., Chang, J.J.: Analysis of individual differences in multidimensional scaling via an n-way generalization of “Eckart-Young” decomposition. Psychometrika 35 (3) (1970)
[3] Kolda, T.G., Bader, B.W.: Tensor decompositions and applications. SIAM Review 51 (3) (2009)
[4] Paatero, P.: A weighted non-negative least squares algorithm for three-way ‘PARAFAC’ factor analysis. Chemometrics and Intelligent Laboratory Systems 38 (2) (1997)
[5] Bader, B.W., Harshman, R.A., Kolda, T.G.: Temporal analysis of semantic graphs using ASALSAN. In: Data Mining, 2007. ICDM 2007. Seventh IEEE International Conference on. IEEE (2007)
[6] LeCun, Y., Boser, B., Denker, J.S., Henderson, D., Howard, R.E., Hubbard, W., Jackel, L.D.: Backpropagation applied to handwritten zip code recognition. Neural Computation 1 (4), 541–551 (1989)
[7] Hinton, G.E., Zemel, R.S.: Autoencoders, minimum description length and Helmholtz free energy. In: Advances in Neural Information Processing Systems. pp. 3–10 (1994)
[8] Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114 (2013)