Persistence Homology of Networks: Methods and Applications
Mehmet Emin Aktas, Esra Akbas, Ahmed El Fatmaoui

TL;DR
This paper reviews the use of persistent homology, a topological data analysis tool, for extracting and analyzing the complex topological features of various types of networks, highlighting recent methods and applications.
Contribution
It provides a comprehensive conceptual review and unified framework of recent advances in applying persistent homology to complex network analysis.
Findings
Summarizes key methods for applying PH to networks
Highlights applications in biological and social networks
Identifies future research directions
Abstract
Information networks are becoming increasingly popular to capture complex relationships across various disciplines, such as social networks, citation networks, and biological networks. The primary challenge in this domain is measuring similarity or distance between networks based on topology. However, classical graph-theoretic measures are usually local and mainly based on differences between either node or edge measurements or correlations without considering the topology of networks such as the connected components or holes. In recent years, mathematical tools and deep learning based methods have become popular to extract the topological features of networks. Persistent homology (PH) is a mathematical tool in computational topology that measures the topological features of data that persist across multiple scales with applications ranging from biological networks to social networks.…
| Paper | Filtration | Topological Summary | Data |
| [36] | POW | 0-1 dim Betti numbers | PPI networks |
| [40] | VR | 1-2 dim PD | Brain networks |
| [41] | VR | 0 dim PD | Brain networks |
| [39] | WS | 0-2 dim PD | Brain networks |
| [42] | VR | 0 dim PD | Brain networks |
| [44] | VR | 0-2 dim PD | Migration and remittance networks |
| [37] | VR | 1 dim PB | Simulated idiotypic networks |
| [43] | TMP | 1-2 dim PD | Co-occurrence networks |
| [45] | POW | 0 dim PB | Co-occurrence networks |
| [24] | VBCL, kCL | 0 dim PD | Co-occurrence, brain and collaboration networks |
| Paper | Filtration | Topological Summary | Data |
| [21, 22] | DSS | 1 dim PD | Brain networks |
| [64] | ZSF | 1 dim zigzag PD | Brain networks |
| [62] | VR | 0 dim Betti plots | Brain networks |
| [63] | VR | 0-1 dim PD | Brain networks |
| [67] | VR | 1 dim PD | Brain networks |
| [65] | WS | 0 dim persistence vineyards | Brain networks |
| [25] | WS | 0-2 dim PD | Collaboration networks |
| [57] | VR | 0-2 dim Betti numbers | Collaboration networks |
| [28] | TMP | 1 dim Betti numbers | Collaboration networks |
| [50] | VR | 0-3 dim Betti numbers | PPI, brain and simulated weighted networks |
| [58] | VR | 0-2 dim PD | Economy networks |
| [59] | VR | 0-1 dim PD | Finance networks |
| [23] | CL | 0-11 dim PD | Random, email and scale-free networks |
| [27] | FMG | 0 dim PD | Road networks |
| [70] | VSF | 0 dim zigzag PD | Dynamic biological networks |
| [30] | PPH | 1 dim PD | Cycle networks |
| [68] | POW | 0-1 dim PD | Dynamic communication networks |
| [60] | VFB | 0-1 dim PD | Attributed social networks |
| [69] | POW | 0-1 dim PD | Online social networks |
| [55] | VFB | 0-1 dim PD | Social, medical and biological networks |
| [20, 51, 52] | VR | 1 dim PB | Social, infrastructural and biological networks |
Department of Mathematics and Statistics, University of Central Oklahoma, Edmond, OK, USA
Department of Computer Science, Oklahoma State University, Stillwater, OK, USA
Abstract
Information networks are becoming increasingly popular to capture complex relationships across various disciplines, such as social networks, citation networks, and biological networks. The primary challenge in this domain is measuring similarity or distance between networks based on topology. However, classical graph-theoretic measures are usually local and mainly based on differences between either node or edge measurements or correlations without considering the topology of networks such as the connected components or holes. In recent years, mathematical tools and deep learning based methods have become popular to extract the topological features of networks. Persistent homology (PH) is a mathematical tool in computational topology that measures the topological features of data that persist across multiple scales with applications ranging from biological networks to social networks.
In this paper, we provide a conceptual review of key advancements in this area of using PH on complex network science. We give a brief mathematical background on PH, review different methods (i.e. filtrations) to define PH on networks and highlight different algorithms and applications where PH is used in solving network mining problems. In doing so, we develop a unified framework to describe these recent approaches and emphasize major conceptual distinctions. We conclude with directions for future work. We focus our review on recent approaches that get significant attention in the mathematics and data mining communities working on network data. We believe our summary of the analysis of PH on networks will provide important insights to researchers in applied network science.
Keywords: persistent homology, networks, simplicial complex, filtration
1 Introduction
Information networks are important tools to model the relationship between complex data. They exist in multiple disciplines such as social networks, biological networks, the World Wide Web and so on. Analysis of such networks includes many applications such as node classification [1, 2], community detection [3, 4], and link prediction [5, 6].
The primary challenge in applied network science is measuring similarity or distance between networks without knowing node correspondences. Since comparing graphs via graph isomorphism is computationally expensive [7], many statistically oriented graph similarity measures have been proposed in the literature [8, 9]. While some of these methods embed the graphs into a feature space and then define distances on that space, other methods define kernel functions on graphs to build similarity measures [10]. Moreover, in graph-theoretic approaches, similarity measures are defined based on the difference in graph-theoretic features such as assortativity, betweenness centrality, small-worldness, and network homogeneity. However, such classical graph-theoretic measures are usually local and mainly based on differences between either node or edge measurements, or on correlations, without considering the network topology. Therefore, they may lose information about topological structures, such as the connected components or holes in networks. Structural holes, in contrast, can give important information about network topology [11]. For instance, node importance can be measured based on structural holes, since nodes located at structural holes have distinctive characteristics that help to separate them from other nodes. Moreover, the existence and distribution of structural holes in networks can be used as important topological features for network comparison and classification [12].
In recent years, mathematical tools and deep learning based methods have become popular for extracting the topological features of networks. Persistent homology (PH) is a mathematical tool in computational topology that measures the topological features of data that persist across multiple scales. Its applications range from biological systems [13] to computer vision [14]. The basic idea in PH is to replace the data points with a parametrized family of simplicial complexes, which can roughly be considered as unions of points, edges, triangles, tetrahedra and higher-dimensional polytopes, and to encode the change of the topological features (such as the number of connected components, holes, voids) of the simplicial complexes across different parameters for data analysis [15]. For an extensive and rigorous introduction to the computation of persistent homology, we refer readers to the survey papers [16, 17].
Nowadays, PH is widely applied to the study of complex networks as a feature extractor, since persistent homology gives a multi-scale summary of the graph, unlike traditional metrics that each describe the graph from one specific angle. In this paper, we provide a conceptual review of key advancements in the area of using PH on complex network science.
The paper is structured as follows: In Section 2, we define and give the background on networks, simplicial complex, simplicial homology, and persistent homology. In Section 3, we list and compare the filtrations defined for networks. In Section 4, we highlight different algorithms and applications where PH is used in solving network mining problems. Lastly, we conclude the paper with directions for future work in Section 5.
2 Preliminaries
While a network can be represented as a graph, it can also be represented as other topological objects. Topology is a branch of mathematics that studies the properties of shapes that are invariant under continuous deformation such as stretching, twisting, and bending, but not tearing or gluing. For example, a donut and a coffee mug are topologically equivalent since one can transform one to the other continuously. Topological invariants, which are properties of shapes that do not change under continuous transformation, are useful to detect whether two given shapes are topologically equivalent. The number of connected components and the existence of holes or voids are examples of topological invariants. Algebraic topology is the area in topology that extracts these invariants of an object by simply counting them or associating algebraic structures, such as vector spaces, to them. For example, for a given topological object $X$, homology associates vector spaces $H_i(X)$ for $i = 0, 1, 2, \dots$ where the dimension of $H_0(X)$ gives the number of connected components, the dimension of $H_1(X)$ gives the number of holes, the dimension of $H_2(X)$ gives the number of voids and so on.
For a finite set of points, e.g. point cloud data, homology does not give interesting information. The dimension of $H_0$ gives the number of points, and the dimensions of the higher dimensional homology are zero. The network setting is similar. The dimension of $H_0$ gives the number of connected components, the dimension of $H_1$ gives the number of loops, and the dimensions of the higher dimensional homology are zero since a graph does not have simplices of dimension 2 or higher. Hence, instead of just looking at the homology of the finite set of points itself, using (1) a distance function, e.g. a correlation or a measure of dissimilarity between points, and (2) a parameter value, one can add simplices and check how homology changes across different scales. Persistent homology then tracks the change in homology as the parameter value increases and detects which topological features “persist” across different scales.
In general, it is very difficult to compute the homology of an arbitrary topological object. Hence, instead of doing this, we can approximate a topological object with a simplicial complex and then compute its homology, which is called simplicial homology.
In this section, we explain how to define persistent homology on a finite set of points and on networks. We first give a formal definition of graphs and explain their characteristics. We then define the simplicial complex and how to compute the simplicial homology. Finally, we briefly explain persistent homology and two special metrics which are very useful for using persistent homology in data analysis applications.
2.1 Graphs
As a formal definition, a graph $G$ is a pair of sets $G = (V, E)$ where $V$ is the set of vertices and $E \subseteq V \times V$ is the set of edges of the graph. Networks can be represented via graphs where vertices represent the objects and edges represent the relations between objects. There are different types of graphs to represent different relations between vertices. While in an undirected graph, edges link two vertices symmetrically, in a directed graph (also called digraph in the literature), edges link two vertices asymmetrically. If there is a score for the relationship between vertices which could represent the strength of interactions, we can represent this type of relations or interactions by a weighted network. In a weighted graph, a weight function $W: E \to \mathbb{R}$ is defined to assign a weight to each edge. Weights could come from the Euclidean space or other spaces.
A graph with $n$ vertices can be represented by an $n \times n$ adjacency matrix. The entry $a_{ij}$ of the matrix will be $1$ for an unweighted graph and $w_{ij}$ for a weighted graph if there is an edge from vertex $i$ to vertex $j$. If there is no edge between vertex $i$ and vertex $j$, it will be $0$.
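The adjacency-matrix convention above can be sketched in a few lines of Python. This is only an illustration: the helper name `adjacency_matrix` and the edge-list input format are assumptions made here, not notation from the paper.

```python
def adjacency_matrix(n, edges, directed=False):
    """Build the n x n adjacency matrix of a graph on vertices 0..n-1.

    `edges` holds (i, j) pairs for an unweighted graph or (i, j, w)
    triples for a weighted one (hypothetical input format).
    """
    A = [[0] * n for _ in range(n)]
    for edge in edges:
        i, j = edge[0], edge[1]
        w = edge[2] if len(edge) == 3 else 1   # unweighted edges get entry 1
        A[i][j] = w
        if not directed:                       # undirected edges are symmetric
            A[j][i] = w
    return A


unweighted = adjacency_matrix(3, [(0, 1), (1, 2)])
weighted_digraph = adjacency_matrix(3, [(0, 1, 0.5), (2, 1, 2.0)], directed=True)
```

For the directed example, the entry for the edge from vertex 0 to vertex 1 is 0.5 while the reverse entry stays 0, which is the asymmetry described above.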
Furthermore, there are two different graph types we study in this paper. Firstly, a graph is called a metric graph if each edge is assigned a positive length and if the graph is equipped with a natural metric where the distance between any two points of the graph (not necessarily vertices) is defined to be the minimum length of all paths from one to the other. Secondly, a graph is called a dynamic (time-varying) graph if the graph varies over time, i.e. it can have vertex and edge deletions and additions.
2.2 Simplicial complex
Informally, a simplicial complex is a topological object which is built as a union of points, edges, triangles, tetrahedra, and higher-dimensional polytopes. The building blocks of a simplicial complex are called simplices (plural of simplex). Simplices, such as a tetrahedron, are higher dimensional analogs of points, line segments, and triangles. We start this section with a formal definition of a simplex.
Definition 1.
An $n$-simplex is the convex hull of $n+1$ affinely independent points $v_0, v_1, \dots, v_n$, i.e. the set of all convex combinations $\lambda_0 v_0 + \lambda_1 v_1 + \dots + \lambda_n v_n$ where $\lambda_0 + \lambda_1 + \dots + \lambda_n = 1$ and $0 \le \lambda_i \le 1$ for all $0 \le i \le n$.
A 0-simplex is just a point, a 1-simplex is two points connected with a line segment, and a 2-simplex is a filled triangle (see Figure 1). We call a 0-simplex a vertex, a 1-simplex an edge, a 2-simplex a triangle, and a 3-simplex a tetrahedron.
We can now define a simplicial complex roughly as a union of simplices, but these simplices need to be glued in a certain way. Here is the formal definition.
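Definition 2.
A simplicial complex $K$ is a finite collection of simplices such that
1. Every face of a simplex in $K$ also belongs to $K$.
2. For any two simplices $\sigma_1$ and $\sigma_2$ in $K$, if $\sigma_1 \cap \sigma_2 \neq \emptyset$, then $\sigma_1 \cap \sigma_2$ is a common face of both $\sigma_1$ and $\sigma_2$.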
The first condition says that if a simplex, e.g. a triangle, is in $K$, then its faces, such as its edges and vertices, also need to be in $K$. The second condition says that we can only glue simplices by their common faces. For example, we can glue two triangles by a common vertex or a common edge but cannot glue a vertex of a triangle onto one of the edges of the other triangle. Figure 2-a is an example of a simplicial complex, whereas Figure 2-b and Figure 2-c are not simplicial complexes since they violate the first and second condition in Definition 2 respectively.
Before explaining how to compute the homology of a simplicial complex, we define the clique complex of a graph, a crucial concept for defining most of the filtrations in Section 3.
Definition 3.
The clique complex $Cl(G)$ of an undirected graph $G = (V, E)$ is a simplicial complex where the vertices of $G$ are its vertices and each $k$-clique, i.e. each complete subgraph with $k$ vertices, in $G$ corresponds to a $(k-1)$-simplex in $Cl(G)$.
For example, in Figure 3-a, there is a graph with a 4-clique on the left, 2-clique in the middle and 3-clique on the right. Hence, its clique complex, Figure 3-b, has a 3-simplex (tetrahedron), a 1-simplex (edge) and a 2-simplex (triangle).
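Definition 3 can be turned into a brute-force enumeration for small graphs. The sketch below is an illustration only: the function name `clique_complex`, the `max_dim` cutoff, and the example edge list (a 4-clique, a single edge and a 3-clique, loosely modelled on the description of Figure 3-a rather than copied from it) are assumptions made here.

```python
from itertools import combinations


def clique_complex(vertices, edges, max_dim=3):
    """Return the simplices of the clique complex up to dimension max_dim.

    Each simplex is a sorted tuple of vertices; a k-clique of the graph
    becomes a (k-1)-simplex. Brute force, so only suitable for small graphs.
    """
    edge_set = {frozenset(e) for e in edges}
    ordered = sorted(vertices)
    simplices = [(v,) for v in ordered]          # 0-simplices
    for size in range(2, max_dim + 2):           # size = number of vertices
        for cand in combinations(ordered, size):
            # cand is a clique if every pair of its vertices is an edge
            if all(frozenset(p) in edge_set for p in combinations(cand, 2)):
                simplices.append(cand)
    return simplices


# Hypothetical graph: 4-clique on {0,1,2,3}, edge {3,4}, 3-clique on {4,5,6}.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (3, 4),
         (4, 5), (4, 6), (5, 6)]
complex_ = clique_complex(range(7), edges)
counts = {}
for s in complex_:
    counts[len(s) - 1] = counts.get(len(s) - 1, 0) + 1
```

Here `counts` maps each dimension to the number of simplices of that dimension; the single 3-simplex comes from the 4-clique, matching the correspondence in the definition.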
2.3 Simplicial homology
In a simplicial complex, we can consider the holes as voids bounded by simplices of different dimensions. In dimension 0, they are connected components; in dimension 1, they are loops bounded by edges (1-simplices); in dimension 2, they are holes bounded by triangles (2-simplices); and in general, in dimension $n$, they are the holes bounded by $n$-simplices.
The simplicial homology is the way to find the holes in a simplicial complex. To understand what simplicial homology is, we need to define the chains, and two special types of chains, namely cycles and boundaries.
Definition 4.
Fix a dimension $n$ and assume we use integer coefficients. An $n$-chain is a formal sum $\sum_i a_i \sigma_i$ of $n$-simplices $\sigma_i$ of a simplicial complex $K$ with integer coefficients $a_i$, where the sum is taken over the possible $n$-simplices. The set of all $n$-chains of $K$ is denoted with $C_n(K)$.
For example, are 0-chains for the simplicial complex in Figure 4. One can add two $n$-chains by simply adding the corresponding integer coefficients, e.g. , and multiply by scalars, e.g. . Hence, $C_n(K)$ is a module over the integers (and, more generally, a vector space when the coefficients are taken from a field such as the real numbers). For simplicity, we assume the coefficient field is the binary field $\mathbb{Z}_2$ from now on.
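To map an $n$-simplex to an $(n-1)$-simplex, we define the boundary of an $n$-simplex as the sum of its $(n-1)$-dimensional faces. Formally speaking, for an $n$-simplex $\sigma = [v_0, \dots, v_n]$, its boundary is
$$\partial_n \sigma = \sum_{i=0}^{n} [v_0, \dots, \hat{v}_i, \dots, v_n]$$
where the hat indicates that $v_i$ is omitted. We can expand this definition to $n$-chains. For an $n$-chain $c = \sum_i a_i \sigma_i$, $\partial_n(c) = \sum_i a_i \partial_n(\sigma_i)$. For example, in Figure 4, and .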
We should also note here that the boundary of a boundary is empty, i.e. $\partial_{n-1} \circ \partial_n = 0$. For example, in Figure 4, since in the binary field $1 + 1 = 0$.
We can now distinguish two special types of chains using the boundary map that will be useful to define homology. The first one is an $n$-cycle, which is defined as an $n$-chain with empty boundary. In other words, an $n$-chain $c$ is an $n$-cycle if and only if $\partial_n(c) = 0$, i.e. $c \in \ker(\partial_n)$. For example, the 1-chain in Figure 4 is a 1-cycle since . The set of all such $n$-cycles forms a subspace in $C_n(K)$, which we denote $Z_n(K)$.
The second special type of $n$-chain is an $n$-boundary: an $n$-chain $c$ is an $n$-boundary if there exists an $(n+1)$-chain $d$ such that $c = \partial_{n+1}(d)$, i.e. $c \in \operatorname{im}(\partial_{n+1})$. For example, the 1-chain is a 1-boundary since . The set of all such $n$-boundaries forms a subspace in $C_n(K)$, which we denote $B_n(K)$.
After defining these two special subspaces of $C_n(K)$, the $n$-cycles $Z_n(K)$ and the $n$-boundaries $B_n(K)$, we now take the quotient of $Z_n(K)$ by $B_n(K)$, which is well defined because $B_n(K)$ is a subspace of $Z_n(K)$. In this quotient space, only the $n$-cycles that do not bound an $(n+1)$-chain are left. These are actually the $n$-voids of $K$. We call this quotient space the $n$-th homology of the simplicial complex $K$
$$H_n(K) = Z_n(K) / B_n(K) = \ker(\partial_n) / \operatorname{im}(\partial_{n+1}).$$
The dimension of the $n$-th homology is called the $n$-th Betti number of $K$, $\beta_n(K)$, where $\beta_n(K) = \dim H_n(K)$. Basically, the $n$-th Betti number is the number of $n$-dimensional voids in the simplicial complex. For example, $\beta_0(K)$ gives the number of connected components and $\beta_1(K)$ gives the number of loops. In Figure 4, .
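As a concrete illustration of the quotient construction, the Betti numbers over the binary field can be computed from ranks of boundary maps, using $\beta_n = \dim C_n - \operatorname{rank} \partial_n - \operatorname{rank} \partial_{n+1}$. The sketch below is a minimal version under stated assumptions: the helper names `rank_gf2` and `betti_numbers` are chosen here, the input must be a list of simplices closed under taking faces, and the two test complexes (a hollow and a filled triangle) are illustrative and are not the complex of Figure 4.

```python
from itertools import combinations


def rank_gf2(rows):
    """Rank over the binary field of a matrix whose rows are int bitmasks."""
    pivots = {}                      # leading bit -> reduced row
    for row in rows:
        while row:
            top = row.bit_length() - 1
            if top not in pivots:
                pivots[top] = row
                break
            row ^= pivots[top]       # eliminate the leading bit
    return len(pivots)


def betti_numbers(simplices):
    """Betti numbers over Z_2 of a face-closed list of simplices (tuples)."""
    by_dim = {}
    for s in simplices:
        by_dim.setdefault(len(s) - 1, []).append(tuple(sorted(s)))
    top = max(by_dim)
    ranks = {0: 0, top + 1: 0}       # boundary_0 and boundary_{top+1} are zero
    for n in range(1, top + 1):
        index = {s: i for i, s in enumerate(by_dim[n - 1])}
        rows = []
        for s in by_dim[n]:
            mask = 0
            for face in combinations(s, n):   # the (n-1)-dimensional faces
                mask |= 1 << index[face]
            rows.append(mask)
        ranks[n] = rank_gf2(rows)
    return [len(by_dim[n]) - ranks[n] - ranks[n + 1] for n in range(top + 1)]


hollow_triangle = [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2)]
filled_triangle = hollow_triangle + [(0, 1, 2)]
betti_hollow = betti_numbers(hollow_triangle)
betti_filled = betti_numbers(filled_triangle)
```

The hollow triangle has one connected component and one loop; adding the 2-simplex turns the loop into a boundary, so the first Betti number drops to zero.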
2.4 Persistent homology
For a finite set of points $X$, e.g. point cloud data, homology does not give interesting information. $\beta_0$ gives the number of connected components, which is just the number of points, and all other Betti numbers are zero since there are no higher dimensional holes in the set. Hence, instead of working with the set of points, one can induce a family of simplicial complexes $K_t$ for a range of values of $t \in \mathbb{R}$ out of the set so that the complex at step $m$ is embedded in the complex at $n$ for $m \le n$, i.e. $K_m \subseteq K_n$. This nested family of simplicial complexes is called a filtration (see Figure 5 for an example). During this construction, some holes may appear and then disappear, and the persistence of these homological features can be considered as features of the dataset. In a filtration, one can record the birth (the time a hole appears) and the death (the time a hole disappears) of each hole. The essence of persistent homology is to track the birth and death of these homological features in $K_t$ for different values of $t$. The lifespan of each homological feature can be represented as an interval, where the start and end points of the interval correspond to the birth and the death of the homological feature respectively. For a given dataset and a filtration, one can record all these intervals by a persistence barcode (PB) as a multiset of intervals bounded below [18]. Equivalently, a persistence barcode can be represented via a persistence diagram (PD) that records the birth and death times of each feature as a point (birth, death) in the extended real plane [19]. The longer bars in PBs and the points far from the diagonal in PDs are considered the real features of the dataset.
Example 1.
Figure 6 has the 0- and 1-dimensional persistence barcodes (Figure 6-(a)) and the 0- and 1-dimensional persistence diagrams (Figure 6-(b)) of the filtration in Figure 5.
We first investigate the 0-dimensional PB and PD. As we see in the filtration, when $t = 0$, there are five disconnected vertices, which means there are five connected components in the simplicial complex. That is why five bars are born at the beginning of the 0-dimensional PB. When $t = 1$, two edges are added, which decreases the total number of connected components to three, hence two bars die at $t = 1$. When $t = 2$, three more edges are added and this makes the simplicial complex a single connected component, thus two more bars die at $t = 2$. After this point, the number of connected components does not change, so the top bar lives forever (the arrowhead at the right of that bar indicates this). Following the same reasoning, the 0-dimensional PD has the point $(0, 1)$ two times, corresponding to the two bars spanning from 0 to 1 in the 0-dimensional PB, the point $(0, 2)$ again two times, corresponding to the two bars spanning from 0 to 2 in the 0-dimensional PB, and the point $(0, \infty)$, corresponding to the top bar that lives forever in the 0-dimensional PB.
For the 1-dimensional PB and PD, since the first 1-dimensional hole (loop) appears for $t = 2$, there is a bar born at this value in the 1-dimensional barcode. When $t = 3$, this loop splits into two loops, hence the number of loops increases to two, and as a result, a new bar is born at $t = 3$. When $t = 4$, one of the two loops also splits into two loops, so there is another bar born at $t = 4$. When $t = 5$, the top triangle is filled in (a 2-simplex is added), so the number of loops decreases by one and this results in the death of the bar born at $t = 4$. Similarly, the other two bars die at $t = 6$ and $t = 7$. In the 1-dimensional PD, there are the points (2,7), (3,6) and (4,5) that correspond to the three bars in the 1-dimensional PB.
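The 0-dimensional part of this example can be reproduced with a union-find pass over the edges in order of appearance: every vertex starts a bar at time 0, and each edge that merges two components ends one bar. The sketch below assumes a hypothetical edge list with two edges entering at time 1 and three at time 2, chosen only to match the counts in the example; it is not read off Figure 5. The function name `zero_dim_barcode` is also an assumption.

```python
import math


def zero_dim_barcode(num_vertices, timed_edges):
    """0-dimensional barcode of a filtration in which all vertices are
    present at time 0 and edge (u, v, t) enters at time t."""
    parent = list(range(num_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    bars = []
    for u, v, t in sorted(timed_edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:                        # the edge merges two components
            parent[ru] = rv
            bars.append((0, t))             # one component dies at time t
    survivors = len({find(v) for v in range(num_vertices)})
    bars.extend([(0, math.inf)] * survivors)
    return bars


# Hypothetical filtration: five vertices, two edges at t=1, three at t=2.
edges = [(0, 1, 1), (2, 3, 1), (1, 2, 2), (3, 4, 2), (0, 3, 2)]
bars = zero_dim_barcode(5, edges)
```

The last edge closes a loop instead of merging components, so it contributes nothing in dimension 0; in a full computation it would give birth to a 1-dimensional bar instead.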
2.5 Two metrics for persistence diagrams
One may want to employ persistence diagrams to compare the corresponding datasets. For example, in the network matching problem, we can create a persistence diagram for each network and compare the persistence diagrams to obtain the network similarity. For such a comparison, we need to measure the distance between persistence diagrams using stable metrics. A metric is stable if a small perturbation of a dataset creates only a small change in the persistence diagram with respect to that metric. There are two metrics, which can be stable depending on how simplices are defined, that have been commonly used to measure the distance between diagrams: the bottleneck distance and the Wasserstein distance. We first define the bottleneck distance.
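Definition 5.
Let $X$ and $Y$ be two persistence diagrams. The bottleneck distance $W_\infty(X, Y)$ between $X$ and $Y$ is defined as
$$W_\infty(X, Y) = \inf_{\phi: X \to Y} \sup_{x \in X} \|x - \phi(x)\|_\infty$$
where $\phi$ ranges over all matchings from $X$ to $Y$ and $\|(a, b)\|_\infty = \max\{|a|, |b|\}$ for $(a, b) \in \mathbb{R}^2$ with $\infty - \infty = 0$.
In other words, the bottleneck distance measures the distance between two persistence diagrams $X$ and $Y$ by the largest distance between two matched points, minimized over all matchings from $X$ to $Y$. Hence, the bottleneck distance only reflects the single worst-matched pair, rather than the distances between all pairs of matched points.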
As an answer to this concern, the Wasserstein distance can be used.
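Definition 6.
Let $X$ and $Y$ be two persistence diagrams. The $p$-th Wasserstein distance $W_p(X, Y)$ between $X$ and $Y$ is defined as
$$W_p(X, Y) = \inf_{\phi: X \to Y} \left( \sum_{x \in X} \|x - \phi(x)\|_\infty^p \right)^{1/p}$$
where $\phi$ ranges over all matchings from $X$ to $Y$ and $\|(a, b)\|_\infty = \max\{|a|, |b|\}$ for $(a, b) \in \mathbb{R}^2$ with $\infty - \infty = 0$.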
In other words, the Wasserstein distance considers the total distance between the matched pair of points, hence provides an overall quantification for the similarity between persistence diagrams.
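For very small diagrams, both distances can be evaluated by brute force over all matchings. The sketch below makes two assumptions that go beyond the definitions as stated here: diagrams contain only finite points, and, following the usual convention, a point may also be matched to the diagonal at cost $(\text{death} - \text{birth})/2$, which lets diagrams of different sizes be compared. The function names are hypothetical, and the running time grows factorially, so this is an illustration rather than a practical implementation.

```python
from itertools import permutations


def _cost(a, b):
    """L-infinity cost of matching a to b; None stands for the diagonal."""
    if a is None and b is None:
        return 0.0
    if a is None:
        return (b[1] - b[0]) / 2      # distance from b to the diagonal
    if b is None:
        return (a[1] - a[0]) / 2
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def _augment(X, Y):
    # Pad each diagram with one diagonal slot per point of the other one.
    return list(X) + [None] * len(Y), list(Y) + [None] * len(X)


def bottleneck(X, Y):
    A, B = _augment(X, Y)
    return min(max((_cost(a, b) for a, b in zip(A, perm)), default=0.0)
               for perm in permutations(B))


def wasserstein(X, Y, p=2):
    A, B = _augment(X, Y)
    best = min(sum(_cost(a, b) ** p for a, b in zip(A, perm))
               for perm in permutations(B))
    return best ** (1 / p)


X = [(2, 7), (3, 6), (4, 5)]          # the 1-dimensional PD from Example 1
Y = [(2, 7), (3, 5)]                  # a hypothetical second diagram
d_bottleneck = bottleneck(X, Y)
d_wasserstein = wasserstein(X, Y, p=2)
```

In this example the best matching pairs (2,7) with (2,7), (3,6) with (3,5), and sends (4,5) to the diagonal. The bottleneck distance sees only the largest of these three costs, whereas the Wasserstein distance aggregates all of them.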
3 Filtrations
In this section, we review the filtrations defined for networks in the literature. We compare the filtrations according to their properties such as sensitivity to different network types (e.g. directed/undirected, weighted/unweighted). We also provide a comparison table, Table 1, at the end of the section.
Throughout this section, we use the notations defined for graphs in Section 2.1.
3.1 Vietoris-Rips filtration (VR)
Let $G = (V, E, W)$ be an undirected weighted graph with the weight function $W: E \to \mathbb{R}$ defined on $E$. For any $\delta \in \mathbb{R}$, the 1-skeleton $G_\delta = (V_\delta, E_\delta)$ is defined as the subgraph of $G$ where $V_\delta = V$ and its edge set $E_\delta \subseteq E$ only includes the edges whose weight is less than or equal to $\delta$. Then, for any $\delta$, we define the Vietoris-Rips complex as the clique complex of the 1-skeleton $G_\delta$, $Cl(G_\delta)$, and the Vietoris-Rips filtration is then defined as
$$\{Cl(G_\delta) \hookrightarrow Cl(G_{\delta'})\}_{\delta \le \delta'}.$$
In other words, in this filtration, we first start with the vertex set. We then rank the edge weights from the minimum weight, $w_{\min}$, to the maximum weight, $w_{\max}$, and let the parameter $\delta$ increase from $w_{\min}$ to $w_{\max}$. At each step, we add the corresponding edges and take the clique complex of the thresholded subgraph $G_\delta$. This construction yields the Vietoris-Rips filtration on networks.
In applications, we may prefer to add edges with larger weights before the ones with smaller weights to stress the importance of the weights. In other words, after adding the vertex set, we rank the edge weights from $w_{\max}$ to $w_{\min}$ and for any $\delta$, we add the edges whose weight is greater than or equal to $\delta$. This yields a similar but distinct filtration, which is called the weight rank clique filtration in [20]. However, to be more concise, we prefer to call it the inverse Vietoris-Rips filtration.
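The thresholding procedure for both the Vietoris-Rips filtration and its inverse variant can be sketched as follows. The function name `vietoris_rips_filtration`, the `(u, v, weight)` edge format, and the `max_dim` cutoff are assumptions made for this illustration; cliques are found by brute force, so the sketch only suits small graphs.

```python
from itertools import combinations


def vietoris_rips_filtration(vertices, weighted_edges, max_dim=2, inverse=False):
    """Return a list of (delta, simplices) pairs, one per distinct edge weight.

    With inverse=False, edges of weight <= delta are kept and delta increases;
    with inverse=True, edges of weight >= delta are kept and delta decreases.
    """
    ordered = sorted(vertices)
    thresholds = sorted({w for _, _, w in weighted_edges}, reverse=inverse)
    steps = []
    for delta in thresholds:
        kept = {frozenset((u, v)) for u, v, w in weighted_edges
                if (w >= delta if inverse else w <= delta)}
        simplices = [(v,) for v in ordered]
        for size in range(2, max_dim + 2):
            for cand in combinations(ordered, size):
                # clique complex of the thresholded subgraph
                if all(frozenset(p) in kept for p in combinations(cand, 2)):
                    simplices.append(cand)
        steps.append((delta, simplices))
    return steps


# Hypothetical weighted triangle.
edges = [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 3.0)]
forward = vietoris_rips_filtration(range(3), edges)
backward = vietoris_rips_filtration(range(3), edges, inverse=True)
```

In the forward direction the edge of weight 1.0 enters first and the triangle appears only at the last threshold; in the inverse direction the edge of weight 3.0 enters first. Both sequences are nested, as required of a filtration.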
3.2 Dowker sink and source filtration (DSS)
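Using the idea of the Vietoris-Rips filtration, the authors in [21, 22] define the Dowker $\delta$-sink and $\delta$-source simplicial complexes on directed weighted networks, which are sensitive to edge directions. For a directed graph $G = (V, E, W)$ with edge weights $W: E \to \mathbb{R}$, the Dowker $\delta$-sink simplicial complex associated to $G$ is defined as
$$\mathfrak{D}^{si}_{\delta} = \{\sigma = [v_0, \dots, v_n] : \text{there exists } v' \in V \text{ such that } W(v_i, v') \le \delta \text{ for each } v_i \in \sigma\}.$$
In other words, there is a sink vertex $v'$ such that there are edges from each $v_i$ to $v'$ with weights less than or equal to the threshold $\delta$. Using this simplicial complex, they define the Dowker sink filtration as follows
$$\{\mathfrak{D}^{si}_{\delta} \hookrightarrow \mathfrak{D}^{si}_{\delta'}\}_{\delta \le \delta'}.$$
They similarly define a dual construction, namely the Dowker $\delta$-source simplicial complex associated to a directed weighted network $G$, as follows
$$\mathfrak{D}^{so}_{\delta} = \{\sigma = [v_0, \dots, v_n] : \text{there exists } v' \in V \text{ such that } W(v', v_i) \le \delta \text{ for each } v_i \in \sigma\}.$$
The only difference here is the edge directions: there is a source vertex $v'$ such that there are edges from $v'$ to each $v_i$ with weights less than or equal to the threshold $\delta$. Similarly, they define the Dowker source filtration as
$$\{\mathfrak{D}^{so}_{\delta} \hookrightarrow \mathfrak{D}^{so}_{\delta'}\}_{\delta \le \delta'}.$$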
Dowker sink and source (DSS) filtrations are formed with respect to a central authority $v'$. Hence, they could be preferred on networks, such as small-world networks, where it is desirable for simplices to be formed with respect to particular hub nodes.
The authors also prove that both filtrations generate the same persistence diagram.
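A brute-force sketch of the sink and source complexes at a single threshold is given below. Several choices here are assumptions made for the illustration and are not taken from [21, 22]: the function name `dowker_complex`, the dictionary encoding of edge weights, treating a missing edge as having infinite weight, and treating the weight from a vertex to itself as 0 so that a vertex can act as its own sink or source.

```python
from itertools import combinations


def dowker_complex(vertices, weights, delta, kind="sink", max_dim=2):
    """Simplices of the Dowker delta-sink or delta-source complex.

    weights maps a directed pair (u, v) to the weight of the edge u -> v.
    Assumptions: missing edges weigh infinity and W(v, v) = 0.
    """
    def w(u, v):
        return 0.0 if u == v else weights.get((u, v), float("inf"))

    simplices = []
    for size in range(1, max_dim + 2):
        for sigma in combinations(sorted(vertices), size):
            for hub in vertices:
                if kind == "sink":      # edges point from sigma into the hub
                    ok = all(w(v, hub) <= delta for v in sigma)
                else:                   # edges point from the hub into sigma
                    ok = all(w(hub, v) <= delta for v in sigma)
                if ok:
                    simplices.append(sigma)
                    break
    return simplices


# Hypothetical digraph: 0 -> 2 and 1 -> 2, both with weight 1.
weights = {(0, 2): 1.0, (1, 2): 1.0}
sink = dowker_complex([0, 1, 2], weights, delta=1.0, kind="sink")
source = dowker_complex([0, 1, 2], weights, delta=1.0, kind="source")
```

Vertex 2 is a common sink for all three vertices, so the sink complex contains the full triangle, while no vertex is a common source for 0 and 1, so the source complex is a path. The two complexes differ as sets of simplices, but both are contractible, in line with the statement that the two filtrations yield the same persistence diagram.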
3.3 Clique complex filtration (CCL)
For a graph $G$ with $n$ vertices and its clique complex $Cl(G)$, the clique complex filtration is defined as
$$Cl_0(G) \subseteq Cl_1(G) \subseteq \dots \subseteq Cl_{n-1}(G)$$
such that $Cl_{n-1}(G) = Cl(G)$, where the $i$-th complex in the filtration is given by $Cl_i(G) = Cl(G)^{(i)}$ and $Cl(G)^{(i)}$ is the $i$th skeleton of the clique complex, i.e. the set of simplices of dimension less than or equal to $i$ [23]. In other words, in this filtration, we add the vertices at $i = 0$, add the edges at $i = 1$, add the triangles at $i = 2$ and so on.
3.4 Vertex-based clique filtration (VBCL)
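This filtration was originally defined in [24] for dimension 0 only; however, it can be extended to higher dimensions as well. Let $G = (V, E)$ be a graph with a vertex weight function $f: V \to \mathbb{R}$. For this filtration, we use vertex weights, instead of edge weights, as threshold values. For any $\delta \in \mathbb{R}$, the 1-skeleton $G_\delta = (V_\delta, E_\delta)$ is defined as the subgraph of $G$ where $V_\delta = \{v \in V : f(v) \le \delta\}$ and the edges are $E_\delta = \{(u, v) \in E : u, v \in V_\delta\}$. Then, for any $\delta$, using the clique complex of the 1-skeleton $G_\delta$, $Cl(G_\delta)$, the filtration is defined as
$$\{Cl(G_\delta) \hookrightarrow Cl(G_{\delta'})\}_{\delta \le \delta'}.$$
Furthermore, we can also define the inverse vertex-based clique filtration by filtering from the maximum vertex weight down to the minimum vertex weight, as we do in the inverse Vietoris-Rips filtration.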
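3.5 $k$-clique filtration (kCL)
This filtration is used in [24] to detect the evolution of $k$-clique communities for a fixed $k$. In this filtration, we assume the graph $G$ has a vertex weight function $w: V \to \mathbb{R}$. First, using the vertex weights, we assign a weight to an arbitrary clique $\sigma$ inductively as
$$w(\sigma) = \max_{\tau \subset \sigma} w(\tau),$$
i.e., the maximum weight of its vertices. Second, for a fixed $k$, we detect all $k$-clique communities in $G$ and create the $k$-clique connectivity graph $G^k = (V^k, E^k)$ where there is a vertex for every $k$-clique of $G$ and its edges are defined by
$$E^k = \{(\sigma, \sigma') : |\sigma \cap \sigma'| = k - 1\},$$
i.e., $\sigma$ and $\sigma'$ intersect in a $(k-1)$-clique; in other words, they share $k - 1$ vertices. We then extend the weight function to the edges of $G^k$ by setting
$$w((\sigma, \sigma')) = \max\{w(\sigma), w(\sigma')\}.$$
Next, in a similar way, for any $\delta \in \mathbb{R}$, the 1-skeleton $G^k_\delta = (V^k_\delta, E^k_\delta)$ is defined as the subgraph of $G^k$ where $V^k_\delta = \{\sigma \in V^k : w(\sigma) \le \delta\}$ and the edges are $E^k_\delta = \{(\sigma, \sigma') \in E^k : w((\sigma, \sigma')) \le \delta\}$. Then, for any $\delta$, using the clique complex of the 1-skeleton $G^k_\delta$, the filtration is defined as
$$\{Cl(G^k_\delta) \hookrightarrow Cl(G^k_{\delta'})\}_{\delta \le \delta'}.$$
This filtration is unique in the sense that it focuses only on the evolution of the $k$-clique communities in the original graph.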
3.6 Weighted simplex filtration (WS)
In the previous filtration, we assign weights to arbitrary cliques, i.e. simplices, using the vertex weights. Alternatively, one may assign weights to simplices in another way and use these weights to create a filtration. For example, Huang et al. [25] assign weights to each simplex in a simplicial complex based on relationship functions in a given dissimilarity network. For any $\delta$, they define $K_\delta$ to be the collection of simplices appearing before or on $\delta$. Then, this construction yields the filtration
$$\{K_\delta \hookrightarrow K_{\delta'}\}_{\delta \le \delta'}.$$
For this to be a well-defined filtration, all faces of each simplex in $K_\delta$, and the intersections of any simplices in $K_\delta$, must also appear before or on $\delta$. They prove that this filtration from a given dissimilarity network is well defined, i.e. it satisfies both conditions.
3.7 Vertex function based filtration (VFB)
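Let $G = (V, E)$ be an undirected graph and $f: V \to \mathbb{R}$ be a function defined on its vertices. We construct the sublevel graphs $G_a = (V_a, E_a)$ for $a \in \mathbb{R}$ where $V_a = \{v \in V : f(v) \le a\}$ and $E_a = \{(u, v) \in E : u, v \in V_a\}$. Hence, increasing $a$ from $-\infty$ to $\infty$ provides a nested sequence of increasing subgraphs. The sublevel vertex function based (VFB) filtration is given by taking the clique complex of each sublevel graph
$$\{Cl(G_a) \hookrightarrow Cl(G_{a'})\}_{a \le a'}.$$
Similarly, we can construct the superlevel graphs $G^a = (V^a, E^a)$ for $a \in \mathbb{R}$ where $V^a = \{v \in V : f(v) \ge a\}$ and $E^a = \{(u, v) \in E : u, v \in V^a\}$. This time, decreasing $a$ from $\infty$ to $-\infty$ provides a nested sequence of increasing subgraphs, which yields the superlevel vertex function based (VFB) filtration as follows
$$\{Cl(G^a) \hookrightarrow Cl(G^{a'})\}_{a \ge a'}.$$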
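The sublevel and superlevel graphs that drive the VFB filtration can be sketched as below; taking the clique complex of each returned graph then gives one step of the filtration. The function names and the choice of vertex degree as the vertex function in the example are assumptions made for illustration only.

```python
def sublevel_graph(vertices, edges, f, a):
    """Vertices with f(v) <= a and the edges with both endpoints among them."""
    V_a = {v for v in vertices if f[v] <= a}
    E_a = [(u, v) for u, v in edges if u in V_a and v in V_a]
    return V_a, E_a


def superlevel_graph(vertices, edges, f, a):
    """Vertices with f(v) >= a and the edges with both endpoints among them."""
    V_a = {v for v in vertices if f[v] >= a}
    E_a = [(u, v) for u, v in edges if u in V_a and v in V_a]
    return V_a, E_a


# Hypothetical example: a path 0 - 1 - 2 with vertex degree as the function f.
vertices = [0, 1, 2]
edges = [(0, 1), (1, 2)]
degree = {0: 1, 1: 2, 2: 1}

sub_steps = [sublevel_graph(vertices, edges, degree, a) for a in (1, 2)]
super_steps = [superlevel_graph(vertices, edges, degree, a) for a in (2, 1)]
```

With the degree function, the sublevel direction adds the two end vertices first and the middle vertex last, while the superlevel direction starts from the middle vertex. The two filtrations therefore emphasize low-value and high-value vertices respectively.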
3.8 Intrinsic Čech filtration (IC)
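This filtration is defined only for metric graphs in [26]. Let $(G, d_G)$ be a metric graph with geometric realization $|G|$. For any point $x \in |G|$, we define the set $B(x, \epsilon) = \{y \in |G| : d_G(x, y) < \epsilon\}$, and we let $\mathcal{U}_\epsilon = \{B(x, \epsilon) : x \in |G|\}$ be an open cover. Since $|G|$ has all the vertices and every point along the edges, it has uncountably many points. Hence, $\mathcal{U}_\epsilon$ is also an uncountable cover. We let $\mathcal{C}_\epsilon$ denote the nerve of $\mathcal{U}_\epsilon$, where the nerve of a family of sets $\{U_i\}_{i \in I}$ is the abstract simplicial complex defined on the vertex set $I$ in which a family $\{i_0, \dots, i_k\}$ with $i_j \in I$ for all $j$ spans a $k$-simplex if and only if $U_{i_0} \cap \dots \cap U_{i_k} \neq \emptyset$. The associated intrinsic Čech filtration is defined as the set of inclusion maps
$$\{\mathcal{C}_\epsilon \hookrightarrow \mathcal{C}_{\epsilon'}\}_{0 \le \epsilon \le \epsilon'}.$$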
3.9 Functional metric graph filtration (FMG)
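This filtration is also defined for metric graphs only [27]. Let $(G, d_G)$ be a metric graph with geometric realization $|G|$ and take a fixed point $s \in |G|$. They consider $f: |G| \to \mathbb{R}$ where $f(x) = d_G(s, x)$, i.e. $f(x)$ is the geodesic distance from $s$ to $x$. Let $G^{\alpha} = \{x \in |G| : f(x) \ge \alpha\}$ denote the super-level set of $f$ with respect to $\alpha$. Clearly $G^{\alpha} \subseteq G^{\beta}$ for $\alpha \ge \beta$. Then the filtration is given by
$$\{G^{\alpha} \hookrightarrow G^{\beta}\}_{\alpha \ge \beta}.$$
Similarly, instead of using the super-level set of $f$, one can use the sub-level set of $f$ for each $\alpha$, $G_{\alpha} = \{x \in |G| : f(x) \le \alpha\}$. This yields another filtration
$$\{G_{\alpha} \hookrightarrow G_{\beta}\}_{\alpha \le \beta}.$$
Depending on the problem, one may choose either of the filtrations.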
3.10 Power filtration (POW)
Let $G = (V, E)$ be a graph. A walk is an alternating sequence of vertices and edges $v_0, e_1, v_1, \dots, e_k, v_k$, beginning with $v_0$ and ending with $v_k$, such that every edge joins the vertices immediately preceding and following it. A path is a walk in which no vertex is repeated, and the number of edges it contains is its length. The graph distance $d_G(u, v)$ between $u, v \in V$ is the minimum length of all paths between them. One can also take edge weights into account when computing the graph distance. The $n$th power $G^n$ of $G$ is the graph with vertex set $V$ in which distinct vertices $u$ and $v$ are adjacent if, and only if, the distance between $u$ and $v$ in $G$ is at most $n$. The power filtration consists of the clique complexes of the powers $G^n$. In other words, for an appropriate range of distances $n_1 \le n_2 \le \dots \le n_k$, the power filtration is given by
$$\mathrm{Cl}(G^{n_1}) \subseteq \mathrm{Cl}(G^{n_2}) \subseteq \dots \subseteq \mathrm{Cl}(G^{n_k}),$$
where $\mathrm{Cl}$ denotes the clique complex.
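The power filtration can be sketched in a few lines of Python. This is a minimal illustration for unweighted graphs using hop distances, not the implementation used in any of the cited works. The helper names (`hop_distances`, `power_graph_edges`, `clique_complex`) and the 5-cycle example are hypothetical, and cliques are enumerated by brute force only up to triangles.

```python
from collections import deque
from itertools import combinations


def hop_distances(vertices, edges):
    """All-pairs shortest path lengths (number of edges) via BFS."""
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    dist = {}
    for s in vertices:
        seen = {s: 0}
        queue = deque([s])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in seen:
                    seen[y] = seen[x] + 1
                    queue.append(y)
        for t, d in seen.items():
            dist[(s, t)] = d
    return dist


def power_graph_edges(vertices, dist, n):
    """Edges of the n-th power: pairs of distinct vertices at distance <= n."""
    return {frozenset((u, v)) for u, v in combinations(vertices, 2)
            if dist.get((u, v), float("inf")) <= n}


def clique_complex(vertices, edge_set, max_dim=2):
    """All cliques with at most max_dim + 1 vertices, as frozensets."""
    simplices = {frozenset([v]) for v in vertices}
    for size in range(2, max_dim + 2):
        for cand in combinations(vertices, size):
            if all(frozenset(p) in edge_set for p in combinations(cand, 2)):
                simplices.add(frozenset(cand))
    return simplices


# Toy example (hypothetical data): a 5-cycle, whose square is the complete graph.
vertices = [0, 1, 2, 3, 4]
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
dist = hop_distances(vertices, edges)
power_filtration = [clique_complex(vertices, power_graph_edges(vertices, dist, n))
                    for n in (1, 2)]
print([len(K) for K in power_filtration])
```

Because the edge set of each power contains that of the previous one, the clique complexes are nested, so they can be passed to any persistent homology routine as a filtration.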
3.11 Temporal filtration (TMP)
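This filtration is defined in [28] for dynamic (time-varying) networks. If the network grows in time $t$, this yields a sequence of networks $G_{t_1} \subseteq G_{t_2} \subseteq \dots \subseteq G_{t_n}$, where $G_{t_i}$ represents the network formed up to time $t_i$. This network sequence results in the temporal filtration given by the clique complex $\mathrm{Cl}(\cdot)$ of each network
$$\mathrm{Cl}(G_{t_1}) \subseteq \mathrm{Cl}(G_{t_2}) \subseteq \dots \subseteq \mathrm{Cl}(G_{t_n}).$$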
3.12 Zigzag simplicial filtration (ZSF)
This filtration is also defined for dynamic networks. While TMP only considers vertex and edge insertions into a dynamic graph, which add simplices to the simplicial complexes, this method also allows vertex and edge deletions, which remove simplices from the simplicial complexes. In a standard filtration $\{K_\delta\}_{\delta \in \mathbb{R}}$ on a graph $G$, $K_\delta \subseteq K_{\delta'}$ whenever $\delta \le \delta'$. The zigzag filtration generalizes standard filtrations by allowing the simplicial complexes to sometimes become smaller. A zigzag simplicial filtration on a graph $G$ is a family of simplicial complexes $\{K_\delta\}_{\delta \in \mathbb{R}}$ satisfying two conditions: (1) the set of points of discontinuity of the zigzag simplicial filtration is locally finite, i.e., each point in the set has a neighborhood that includes only finitely many of the points in the set, and (2) for any scale parameter value $\delta$, it holds that $K_{\delta - \epsilon} \subseteq K_\delta \supseteq K_{\delta + \epsilon}$ for all sufficiently small $\epsilon > 0$. Zigzag persistent homology is then used to obtain the persistence barcodes/diagrams [29]. The same basic idea as in ordinary persistent homology applies. For example, for the 0-dimensional zigzag persistence barcodes, we track the number of connected components in the filtration.
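To make the 0-dimensional case concrete, here is a small Python sketch that counts connected components along a sequence of graph snapshots with both insertions and deletions. It only produces the Betti-0 curve, which is a simplification: a zigzag barcode also records which component persists from one snapshot to the next. The function `count_components` and the snapshot data are hypothetical.

```python
def count_components(vertices, edges):
    """Number of connected components, computed with a union-find structure."""
    parent = {v: v for v in vertices}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(v) for v in vertices})


# Hypothetical dynamic graph: an edge is added, then an edge and a vertex are removed.
snapshots = [
    ({1, 2, 3}, {(1, 2)}),
    ({1, 2, 3}, {(1, 2), (2, 3)}),  # edge (2, 3) inserted
    ({1, 2, 3}, {(2, 3)}),          # edge (1, 2) deleted
    ({2, 3}, {(2, 3)}),             # vertex 1 deleted
]
betti0_curve = [count_components(V, E) for V, E in snapshots]
print(betti0_curve)
```

Unlike in a standard filtration, the component count here can go up as well as down, which is what the deletions in a zigzag filtration allow.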
3.13 Digraph filtration using Persistent Path Homology (PPH)
In [30], the authors define a new way to construct homology on networks: persistent path homology (PPH) which is sensitive to the edge directions. We summarize the construction in 4 steps.
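Step 1: Let $G = (X, E, w)$ be a directed weighted graph. Given any integer $p \ge 0$, an elementary $p$-path over $X$ is a sequence $[x_0, \dots, x_p]$ of $p + 1$ vertices of $X$. For each $p \ge 0$, the free vector space consisting of all formal linear combinations of elementary $p$-paths over $X$ with coefficients in a field $\mathbb{K}$ is denoted $\Lambda_p$. One also defines $\Lambda_{-1} = \mathbb{K}$ and $\Lambda_{-2} = \{0\}$. Next, for any $p \ge 0$, one defines the non-regular boundary map $\partial_p^{\mathrm{nr}} : \Lambda_p \to \Lambda_{p-1}$ as
$$\partial_p^{\mathrm{nr}}([x_0, \dots, x_p]) = \sum_{i=0}^{p} (-1)^i [x_0, \dots, \hat{x}_i, \dots, x_p]$$
for each elementary $p$-path $[x_0, \dots, x_p]$, where $\hat{x}_i$ denotes omission of $x_i$. $\partial_{-1}^{\mathrm{nr}} : \Lambda_{-1} \to \Lambda_{-2}$ is the zero map. Observe that $\partial_p^{\mathrm{nr}} \circ \partial_{p+1}^{\mathrm{nr}} = 0$ for all $p \ge -1$, so $(\Lambda_p, \partial_p^{\mathrm{nr}})$ is a chain complex.
Step 2: For each $p \ge 0$, an elementary $p$-path $[x_0, \dots, x_p]$ is called regular if $x_i \neq x_{i+1}$ for each $0 \le i \le p - 1$ and irregular otherwise. Let $\mathcal{R}_p$ denote the free vector space on the regular elementary $p$-paths and $\mathcal{I}_p$ the free vector space on the irregular ones. We have $\partial_p^{\mathrm{nr}}(\mathcal{I}_p) \subseteq \mathcal{I}_{p-1}$, so $\partial_p^{\mathrm{nr}}$ is well defined on $\Lambda_p / \mathcal{I}_p$. Since $\mathcal{R}_p \cong \Lambda_p / \mathcal{I}_p$ via a natural isomorphism, one can define $\partial_p : \mathcal{R}_p \to \mathcal{R}_{p-1}$ as the pullback of $\partial_p^{\mathrm{nr}}$ via this isomorphism. $\partial_p$ is called the regular boundary map and now we have a chain complex $(\mathcal{R}_p, \partial_p)$.
Step 3: For each $p \ge 0$, one defines an elementary $p$-path $[x_0, \dots, x_p]$ on $X$ to be allowed if $(x_i, x_{i+1}) \in E$ for each $0 \le i \le p - 1$. For each $p \ge 0$, the free vector space on the allowed $p$-paths on $(X, E)$ is denoted $\mathcal{A}_p$ and is called the space of allowed $p$-paths. Furthermore, $\mathcal{A}_{-1} = \mathbb{K}$, $\mathcal{A}_{-2} = \{0\}$.
Step 4: The allowed paths do not form a chain complex, since the image of an allowed path under $\partial$ need not be allowed. To handle this problem, the authors define the space of $\partial$-invariant $p$-paths on $G$ as the following subspace of $\mathcal{A}_p$:
$$\Omega_p = \{c \in \mathcal{A}_p : \partial_p(c) \in \mathcal{A}_{p-1}\}.$$
One further defines $\Omega_{-1} = \mathcal{A}_{-1} \cong \mathbb{K}$ and $\Omega_{-2} = \mathcal{A}_{-2} = \{0\}$. Now we have a chain complex $(\Omega_p, \partial_p)$, and this yields the path homology groups.
Filtration: For any $\delta \in \mathbb{R}$ and a directed weighted graph $G = (X, E, w)$ with the edge weight function $w : E \to \mathbb{R}$, we define the directed subgraphs $G^\delta = (X, E^\delta)$ where $E^\delta = \{e \in E : w(e) \le \delta\}$. We then define the digraph filtration as
$$\{G^\delta \hookrightarrow G^{\delta'}\}_{\delta \le \delta'}.$$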
After obtaining the filtration, instead of persistent homology, the authors apply persistent path homology (PPH). They show that in an undirected graph, PPH and the Dowker filtration agree in dimension 1 if a certain local condition is satisfied (the graph needs to be square-free). The authors also prove a stability result for PPH.
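The boundary map of Step 1 and the allowed-path condition of Step 3 can be checked on small examples. The Python sketch below represents a formal linear combination of elementary paths as a dictionary from vertex tuples to integer coefficients. That representation, the function names `boundary` and `is_allowed`, and the toy digraph are assumptions made for illustration. The sketch implements only the non-regular boundary map, not the full path homology computation.

```python
from collections import defaultdict


def boundary(chain):
    """Non-regular boundary of a formal combination of elementary paths.

    `chain` maps tuples of vertices to integer coefficients. The empty tuple
    plays the role of the generator of the coefficient field in degree -1,
    and its boundary is zero.
    """
    out = defaultdict(int)
    for path, coeff in chain.items():
        for i in range(len(path)):
            face = path[:i] + path[i + 1:]  # omit the i-th vertex
            out[face] += (-1) ** i * coeff
    return {p: c for p, c in out.items() if c != 0}


def is_allowed(path, arcs):
    """A path is allowed if every consecutive pair of vertices is a directed edge."""
    return all((path[i], path[i + 1]) in arcs for i in range(len(path) - 1))


# Toy digraph (hypothetical): a -> b -> c, with no edge a -> c.
arcs = {("a", "b"), ("b", "c")}
path = ("a", "b", "c")

print(boundary({path: 1}))
print(boundary(boundary({path: 1})))  # empty dict: the boundary squares to zero
print(is_allowed(path, arcs), is_allowed(("a", "c"), arcs))
```

In this example the allowed path `("a", "b", "c")` has the non-allowed path `("a", "c")` in its boundary, which is the obstruction that Step 4 resolves by restricting to $\partial$-invariant paths.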
3.14 Generalizations of Vietoris-Rips filtration (GVR)
The author in [31] uses ordered-tuple complexes instead of simplicial complexes to increase flexibility regarding order. An ordered-tuple complex (OT-complex) is a collection $K$ of ordered tuples such that if $(v_0, \dots, v_p) \in K$ then $(v_0, \dots, \hat{v}_i, \dots, v_p) \in K$ for all $i$ (where $(v_0, \dots, \hat{v}_i, \dots, v_p)$ is the ordered tuple with $v_i$ removed). For example, the tuples $(v_0, v_1)$ and $(v_1, v_0)$ are distinct. One can define a chain complex, homology, and $k$-th dimensional ordered-tuple persistent homology of OT-complexes as for simplicial complexes. Then, the author defines four generalizations of the Vietoris-Rips filtration. She also proves a stability theorem for each case; hence, we do not state the theorem for each case separately. For the following filtrations, let $V$ be the vertex set and $f : V \times V \to \mathbb{R}$ a function ($f$ can be considered as an edge weight function).
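3.14.1 Vietoris-Rips filtration under sym$_a$
For any $a \in [0, 1]$ we can define a symmetric function $\mathrm{sym}_a(f) : V \times V \to \mathbb{R}$ by
$$\mathrm{sym}_a(f)(x, y) = a \min\{f(x, y), f(y, x)\} + (1 - a) \max\{f(x, y), f(y, x)\}.$$
Let $\mathcal{R}^{\mathrm{sym}_a}(f)_t$ be the simplicial complex containing $[v_0, \dots, v_p]$ whenever $\mathrm{sym}_a(f)(v_i, v_j) \le t$ for all $i, j$. This filtration is called the Vietoris-Rips filtration under sym$_a$ of $f$.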
3.14.2 Directed Vietoris-Rips filtration
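Set $\{\mathcal{R}^{\mathrm{dir}}(f)_t\}$ to be the filtration of OT-complexes where $(v_0, \dots, v_p) \in \mathcal{R}^{\mathrm{dir}}(f)_t$ when $f(v_i, v_j) \le t$ for all $i \le j$. This filtration is called the directed Vietoris-Rips filtration of $f$.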
3.14.3 Associated filtration of directed graphs
There is a natural filtration of directed graphs associated to by setting to be the directed graph with vertices and including the directed edges whenever max. This filtration is called the associated filtration of directed graphs of .
3.14.4 Preorder filtration
Given a preorder , let be the OT-complex containing when . Let be the filtration of OT-complexes corresponding to the filtration of posets . This filtration is called the preorder filtration of
4 Algorithms and Applications
In recent years, persistent homology has found applications in data analysis, including neuroscience [32], time series data [33], text mining [34] and shape analysis [35]. In the complex network setting, some studies analyze the evolution of a single graph, while others analyze multiple graphs for graph matching and classification or characterize the temporal changes in the topological features of a network. Moreover, some studies use Betti numbers, while others use persistence diagrams to extract statistical features of the network. In this section, we categorize the persistent homology enabled applications as single graph and multiple graph analysis. We explain the algorithms and applications of each study in their corresponding sections. We also provide comparison tables, Table 2 and Table 3, for algorithms with datasets after each section.
4.1 Analysis on Single Graph
In some applications, persistent homology is used to detect global structural features of a single network, such as complexity and the distribution of strongly connected regions. While some applications analyze the evolution of a single graph according to edge weights, others analyze the evolution of the graph over time.
In many studies, Betti numbers are used as a complexity measure for different networks. Benzekry et al. [36] propose that cancer therapy can be guided by changes in the complexity of protein-protein interaction (PPI) networks. They analyze 11 cancer interaction networks and find a correlation between the 1-dimensional Betti number and the survival of cancer patients. They compute Betti numbers using the power filtration (POW). To examine the effect of a node on the network complexity, each node in the network is removed in turn and the change in Betti number is recorded. They treat a drop in the Betti number as a drop in complexity. Therefore, the node whose removal results in the largest drop in Betti number also causes the largest drop in complexity and is potentially a good drug target.
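The node-removal experiment can be sketched as follows. This is a simplified illustration rather than the procedure of [36]. It computes Betti numbers of the clique complex of the graph itself (a single step of a filtration, not the full power filtration), uses coefficients in GF(2), and enumerates cliques by brute force, so it is only suitable for small graphs. All function names and the toy graph are hypothetical.

```python
from itertools import combinations


def clique_simplices(vertices, edges, max_dim):
    """Cliques grouped by dimension, each stored as a sorted tuple."""
    edge_set = {frozenset(e) for e in edges}
    return [[c for c in combinations(sorted(vertices), k + 1)
             if all(frozenset(p) in edge_set for p in combinations(c, 2))]
            for k in range(max_dim + 1)]


def rank_gf2(rows):
    """Rank over GF(2) of a matrix whose rows are integer bitmasks."""
    rows, rank = list(rows), 0
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        low = pivot & -pivot  # lowest set bit of the pivot row
        rows = [r ^ pivot if r & low else r for r in rows]
    return rank


def betti_numbers(vertices, edges, max_dim=1):
    """Betti numbers 0..max_dim of the clique complex, over GF(2)."""
    simplices = clique_simplices(vertices, edges, max_dim + 1)
    ranks = [0]  # the boundary map in degree 0 has rank 0
    for k in range(1, max_dim + 2):
        index = {s: i for i, s in enumerate(simplices[k - 1])}
        rows = []
        for s in simplices[k]:
            mask = 0
            for face in combinations(s, k):
                mask |= 1 << index[face]
            rows.append(mask)
        ranks.append(rank_gf2(rows))
    return [len(simplices[k]) - ranks[k] - ranks[k + 1] for k in range(max_dim + 1)]


def betti_drop_per_node(vertices, edges, dim=1):
    """Drop in the dim-th Betti number caused by deleting each node."""
    base = betti_numbers(vertices, edges, dim)[dim]
    drops = {}
    for v in vertices:
        rest = [u for u in vertices if u != v]
        kept = [e for e in edges if v not in e]
        drops[v] = base - betti_numbers(rest, kept, dim)[dim]
    return drops


# Toy graph (hypothetical): two 4-cycles glued at node 0.
vertices = list(range(7))
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 4), (4, 5), (5, 6), (6, 0)]
drops = betti_drop_per_node(vertices, edges)
print(betti_numbers(vertices, edges), drops)
```

In the toy graph, removing the shared node destroys both cycles, so it shows the largest drop. This is the sense in which the text calls such a node a candidate target.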
Similarly, Rucco et al. [37] use persistent entropy, computed from persistence barcodes, to measure graph complexity. They study the behavior of the idiotypic network of the mammalian immune system. Their main goal is to detect the reaction of the immune system to an external stimulus in terms of phase transitions. In addition to the persistent entropy, they use two other graph complexity measures: the connectivity entropy and the approximate von Neumann entropy [38]. While connectivity entropy is used to analyze the structural properties and to identify the set of key players of the idiotypic network, approximate von Neumann entropy is used to distinguish graphs corresponding to the same system under different conditions. For persistent entropy, they use persistence barcodes constructed with the Vietoris-Rips filtration (VR). In their experiment, they simulate the idiotypic network and obtain a weighted idiotypic network using the coexistence function as a weight function between antibodies. After computing the three entropy measures on this network, they find that a peak in entropy corresponds to the activation of the immune response. While the connectivity entropy does not distinguish between the activation and the immune memory states, both the approximate von Neumann entropy and the persistent entropy are able to recognize the activation of the immune system. The analysis of the Betti numbers reveals that there is a subset of antibodies arranged in a 1-dimensional hole that is present both in the activation state and in the memory state.
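The text above does not spell out the formula for persistent entropy. The sketch below uses the commonly cited definition, which we assume here: the Shannon entropy of the bar lengths of a barcode, normalized to sum to one. The function name `persistent_entropy` and the example barcodes are hypothetical, and only finite bars are handled.

```python
import math
from typing import Iterable, Tuple


def persistent_entropy(barcode: Iterable[Tuple[float, float]]) -> float:
    """Shannon entropy of the normalized lengths of the finite bars."""
    lengths = [death - birth for birth, death in barcode if death > birth]
    total = sum(lengths)
    if total == 0:
        return 0.0
    return -sum((l / total) * math.log(l / total) for l in lengths)


# Two bars of equal length give the maximum entropy log(2) for two bars;
# one dominant bar gives a smaller value.
print(persistent_entropy([(0.0, 1.0), (0.0, 1.0)]))
print(persistent_entropy([(0.0, 3.0), (1.0, 2.0)]))
```

Under this definition the entropy is largest when all bars have the same length and decreases as a few long bars come to dominate the barcode.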
Cliques and cycles are important structural features of complex networks that describe their cohesive structures. Rieck et al. [24] use persistent homology to detect clique communities and their evolution in weighted networks. Persistence diagrams are created using the vertex-based clique filtration (VBCL) and the $k$-clique filtration (kCL). They analyze the connectivity relations for all clique degrees and all weight thresholds. Various networks are studied, including a co-occurrence network, a brain network, and a collaboration network. An interactive visualization tool is created that is capable of detecting and tracking the evolution of networks’ clique communities for different thresholds and clique degrees.
Persistent homology is also used to analyze the brain networks by computing distributions of cliques (brain regions) and cycles (strongly connected regions) in them. In [39], the authors review the underlying mathematical background of using simplicial complex in neural data, specifically brain networks. They list different types of simplicial complexes for encoding neural data such as networks, clique complex, independence complex, and concurrence complex. They also elaborate on using persistent homology to measure the global structure of simplicial complexes and the strength of neural connections using the weighted simplex filtration (WS) to generate persistent diagrams.
In [40], the authors test the hypothesis that the spatial distributions of cliques and cycles will differ in their anatomical locations. They construct 1- and 2-dimensional persistence diagrams of brain networks using the Vietoris-Rips filtration (VR). The structural brain networks of eight volunteers are extracted using diffusion spectrum imaging. Each undirected, weighted network consists of 83 nodes representing different brain regions and edges that refer to the density of white matter between the nodes. Weak and strong connections between cliques are assessed by observing the difference between the birth and death times of $k$-cliques in the persistence diagrams.
Additionally, in [41], persistent homology is also used to analyze the brain networks with the aim of examining the abnormal white matter in maltreated children. Networks are obtained by thresholding (based on the sample covariance) sparse correlations for the Jacobian determinant from magnetic resonance imaging (MRI) and fractional anisotropy from diffusion tensor imaging (DTI) at different threshold values. The collection of the thresholded graphs forms a Vietoris-Rips filtration (VR).
Moreover, in [42], the authors demonstrate that persistent homology is useful in analyzing functional brain connectivity. The application involves electroencephalography (EEG) data from eight cortical regions of mouse models of corticosterone (CORT)-induced depression and of control models. After the EEG measurements are obtained, the distance $\sqrt{1 - \text{correlation}}$ is used to create a binary network. Next, the Vietoris-Rips filtration (VR) is applied and the topological changes are visualized by 0-dimensional barcodes, which are then used to construct single-linkage dendrograms (SLD). Finally, the single-linkage distance is computed using the generated SLDs. The results show that the CORT model is characterized by increased local connectivity and decreased global connectivity.
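The link between 0-dimensional barcodes and single-linkage dendrograms can be illustrated with a short Python sketch. In a Vietoris-Rips filtration every component is born at scale 0, and a component dies when an edge first merges it into another one, so the finite death times are the merge heights of the single-linkage dendrogram. The code processes edges in order of increasing distance with a union-find structure. The function names and the toy correlation matrix are hypothetical and not taken from [42].

```python
import math


def correlation_distance(corr):
    """Entrywise sqrt(1 - correlation), clipped at zero for numerical safety."""
    return [[math.sqrt(max(0.0, 1.0 - c)) for c in row] for row in corr]


def zero_dim_barcode(dist):
    """0-dimensional Vietoris-Rips barcode of a symmetric distance matrix."""
    n = len(dist)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    pairs = sorted((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    bars = []
    for w, i, j in pairs:
        ri, rj = find(i), find(j)
        if ri != rj:  # this edge merges two components: one bar ends here
            parent[ri] = rj
            bars.append((0.0, w))
    bars.append((0.0, math.inf))  # the component that never dies
    return bars


# Toy correlation matrix (hypothetical): regions 0 and 1 are strongly correlated.
corr = [[1.0, 0.75, 0.0],
        [0.75, 1.0, 0.0],
        [0.0, 0.0, 1.0]]
print(zero_dim_barcode(correlation_distance(corr)))
```

Sorting the finite death times reproduces the heights at which a single-linkage clustering of the same distance matrix would merge clusters.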
Besides its utility on brain networks, persistent homology is also used to analyze word co-occurrence, remittance, and migration networks. In [43], the authors study word co-occurrence networks to explore the conceptual landscape of mathematical research. They first create the network using 54177 articles in arXiv from 01/1994 to 03/2007. Then they parse a concept list from Wikipedia that includes 1612 equations, theorems, and lemmas. Next, they combine these two datasets by checking the appearance of the 1612 concepts in the articles and find that 1067 of them appear in at least one article and that 35018 articles contain at least one of the concepts. They take the 1067 concepts as nodes and include a $(k-1)$-simplex for each article containing $k$ concepts. Furthermore, whenever the concept sets of two articles intersect in $m$ concepts, their corresponding simplices share a face of dimension $m-1$. In total, this construction results in 32707 unweighted edges. They apply the temporal filtration (TMP) using article dates. They create the 1- and 2-dimensional persistence diagrams, i.e. they just look at the 2-dimensional holes bounded by edges and 3-dimensional holes bounded by triangles, respectively. They interpret these holes to explain the intrinsic characteristics of how research evolves in mathematics. They also explore the authors’ conceptual profile using the holes and their attributes to the holes.
Ignacio et al. [44] analyze the patterns and shapes in remittance and migration networks, modeled as directed weighted networks, via persistent homology to identify flow patterns between multiple countries. They detect both local and global patterns that highlight simultaneous interactions among multiple nodes. They extend the Vietoris-Rips filtration (VR) to detect topological features such as persistent cycles in directed networks using the weights of the edges and create persistence barcodes. They use 0-, 1- and 2-dimensional barcodes to analyze the cycles in the networks. As a modification of the 1- and 2-dimensional barcodes, to encode additional information, they color the bars according to the standard deviation of the weights in the cycles they represent. They create the 2015 Asian net migration and remittance networks, which include 50 countries and states, to perform their analysis on. For remittance networks, they define the weight of a directed edge $(i, j)$ as the profit country $i$ gains from exchanging remittances with country $j$, and they define it in a similar manner for net migration networks.
One of the challenges for most graph drawing methods is the inability to visualize the global structure of graphs, owing to the absence of interactive exploration mechanisms. Persistent homology is used to address this challenge in [45]. The authors use 0-dimensional PH features to control and modify force-directed layouts of a graph. The 0-dimensional barcode, obtained by the power filtration (POW), enables the visualization of contraction and repulsion events in the network. More forces are added to the graph layout based on the selected number of bars. They present three case studies to show the effectiveness of their method on three different real-world networks. One of the networks is “Les Miserables”, which contains 77 nodes (characters) and 254 edges, weighted by how many scenes two characters share during any chapter of the novel. Some of the key characters featured in the book can be identified on the force-directed layout modified with PH features. Using their method, they are also able to extract the major nodes in the Madrid Train Bombing network and the US Senate 2007 and 2008 Co- and Anti-voting network.
4.2 Multiple Graphs Analysis
Graph comparison is an important task for many graph applications such as classification and matching. However, it is a computationally hard problem, since we need to compute the similarity between two networks [46]. It has been studied for many years and defined either through exact matches (e.g. graph isomorphism [47]) or through measures of structural similarity (e.g. graph edit distance [48]). Graph kernels are also used to capture graph similarity [49]. In recent years, persistent homology has been used to extract topological features of networks in order to compare them.
While most existing metrics for network structure rely on local features of vertices, such as node degrees and correlations of neighboring nodes, they do not capture the precise mesoscopic structure of complex networks. Sizemore et al. [50] extract mesoscale homological features as 0-3 dimensional Betti numbers. They use the Vietoris-Rips (VR) filtration to compute the homology and record the maximal clique distribution and Betti sequence. The extracted features are used to classify 14 commonly studied weighted network models into four groups with agglomerative hierarchical clustering. Betti values and parameters from the maximal clique distribution are used to determine the structural similarities between networks. After classifying the networks into groups, they analyze the structural patterns in each group of networks.
In [20, 51, 52], persistent homology is used to detect particular non-local structural features of networks. After creating the barcodes with the inverse Vietoris-Rips (VR) filtration based on edge weights, statistical distributions of 1-dimensional barcodes are computed. The authors classify real-world networks into two classes according to the similarity of their cycle distributions to those of randomized versions. In Class I, cycle distributions are markedly different from the randomized versions, and in Class II, cycle distributions are very close to their random versions. The authors study different network datasets, such as US air passenger networks, the C. elegans neuronal network [53], an online messages network [54], a gene network, a network of mentions and retweets between Twitter users, a school face-to-face contact network, and co-authorship networks. While the gene network and the airport network are in Class I, the co-authorship networks and the Twitter network are in Class II.
In [55], the authors propose to use persistence diagrams for the graph classification problem on undirected weighted graphs. They first define a graph kernel function, namely heat kernel signatures [56], on networks and use the sublevel and superlevel VFB filtrations on each network to generate PDs. Then they employ a two-layer neural network architecture to process the PDs and classify the graphs. They evaluate their classification model on social, medical and biological networks. They also compare their results with four different state-of-the-art graph classification methods and show that their method achieves comparable results despite being much simpler than the other methods.
Moreover, persistent homology is used to analyze the structure of weighted networks. In [57], the authors consider the collaboration networks as weighted network. They use the Vietoris-Rips (VR) filtration to generate the persistence barcodes of networks. They employ the Betti numbers of 0, 1, and 2 dimensions and use them to distinguish collaboration networks from random networks. They conclude that the first and second Betti numbers give us richer information about weighted networks.
Siddharth et al. [28] study growing collaboration networks with a temporal parametrization and characterize the temporal changes in their topological features. In a collaboration network, each person in a paper or a movie is represented as a vertex, and each collaborative act (and each of its subsets) is represented as a simplex on the vertices comprising it. They define a temporal filtration (TMP) from growing collaboration networks by adding the new collaborations that occurred in each year. In addition, they introduce a new distance measure between growing networks, which captures the difference in the rate of growth of cycles in the networks being compared. They use the DBLP (Digital Bibliography & Library Project) and IMDB (Internet Movie Database) data sets from 1950-2008, considering 10-year windows. They study the topological properties of the networks, namely the growth in cyclicity with respect to time over the 10-year windows and the size of the largest connected component.
In [58], the authors consider the national input-output networks of domestic products as a weighted network and use persistent homology to identify dissimilarities between them. The nodes are available sectors in an economy and edge weights are the monetary flow measuring the magnitude of the economic relationship between two sectors. They generate persistence diagrams for dimensions 0, 1, and 2 with the Vietoris-Rips (VR) filtration. Using 0-dimensional diagrams, they distinguish economies with high GDP, large population, and small import/export percentages of GDP from those with lower GDP, small population, and larger import/export percentages. They also discuss the potential for applying higher-dimensional persistent homology to study these networks.
Similarly, financial networks are considered as weighted networks and persistent homology is used to detect early signs of critical transitions of financial crisis in [59]. The vertices correspond to the stocks, each pair of distinct nodes is connected by an edge and each edge is assigned a weight using the Pearson correlation coefficient. For each time frame, they generate 0- and 1-dimensional persistent diagrams of the network using the Vietoris-Rips (VR) filtration. Then, the distance between them is measured via Wasserstein distance. They show that the persistent diagrams and the distances between them have significant changes prior to the 2007-2008 financial crisis.
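For very small diagrams, the Wasserstein distance between persistence diagrams can be computed by brute force, which makes the definition easy to inspect. The sketch below is a toy illustration and not the method of [59]. It pads each diagram with one diagonal slot per point of the other diagram, uses the $L_\infty$ ground metric, and tries every matching, so its running time is factorial in the number of points. The function name, the default order `p = 1`, and the choice of ground metric are assumptions.

```python
from itertools import permutations


def wasserstein_distance(diagram_a, diagram_b, p=1):
    """p-Wasserstein distance between two small persistence diagrams.

    Points are (birth, death) pairs. `None` stands for "matched to the
    diagonal". Only suitable for a handful of points.
    """
    a = list(diagram_a) + [None] * len(diagram_b)
    b = list(diagram_b) + [None] * len(diagram_a)

    def cost(x, y):
        if x is None and y is None:
            return 0.0
        if x is None:
            return (y[1] - y[0]) / 2.0  # distance from y to the diagonal
        if y is None:
            return (x[1] - x[0]) / 2.0  # distance from x to the diagonal
        return max(abs(x[0] - y[0]), abs(x[1] - y[1]))

    best = min(sum(cost(x, b[j]) ** p for x, j in zip(a, perm))
               for perm in permutations(range(len(b))))
    return best ** (1.0 / p)


# Diagrams of two hypothetical consecutive time windows.
window_1 = [(0.0, 2.0)]
window_2 = [(0.0, 3.0)]
print(wasserstein_distance(window_1, window_2))
```

For diagrams of realistic size one would replace the exhaustive search with an optimal assignment solver. The brute-force version is kept here only because it mirrors the definition directly.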
Furthermore, in [60], undirected attributed networks are considered as weighted networks. The authors first assign weights to edges using the vertex attributes. Then, they extract the ego-network of each vertex and define a graph kernel function, namely the diffusion Fréchet function [61], on each ego network that takes both the network topology and edge weights into consideration. Next, they generate the persistence diagrams of each ego network using the sublevel and the superlevel VFB filtration and obtain the distance matrix between vertices by computing the Wasserstein distance between their persistence diagrams. Finally, they cluster the network using the $k$-means clustering algorithm.
Besides the weighted networks above, brain networks are considered as sparse weighted networks, and persistent homology is also used to analyze them [62]. The authors obtain the topological structure of a graph induced by sparse correlation. They first transform MRI and DTI data into weighted networks, employing the sparse Pearson correlation to obtain the edge weights. They generate the 0-dimensional Betti plots for the brain networks using the Vietoris-Rips filtration (VR). They also generate Betti plots using sparse covariance. They show that the sparse correlation method yields a large, visually apparent group separation between normal and stress-exposed children. This method is also less computationally expensive than the sparse covariance method.
In [63], the authors study dynamical connectome state analysis on brain networks using three different methods: $k$-means clustering, modularity based clustering and topological feature based clustering. They consider brain networks as weighted networks. In topological feature based clustering, they use the Vietoris-Rips (VR) filtration. They first split the correlation matrix into two matrices with positive and negative correlations. Then, they create VR filtrations for both matrices. In their clustering, different types of connections describe different processes in the brain, so they compute persistent homology with an annotated collection of intervals. After getting the intervals, they compute different statistics for each homology group and for each type of interaction. Finally, they perform hierarchical clustering based on these topological features. They show that topological feature based clustering is more informative than the other two clustering methods.
In addition, in [21, 22], the authors classify brain (hippocampal) networks using persistence diagrams. They consider five different environments with 4 holes, 3 holes, 2 holes, 1 hole and no hole, and for each environment, 20 simulated brain networks are created. Persistence diagrams of these 100 networks are computed with the Dowker filtration (DSS). They use the bottleneck distance between the 1-dimensional diagrams of networks to compare them. Finally, they classify the networks using the single linkage dendrogram algorithm and show that the Dowker filtration is successful in capturing the differences between the five classes of networks. The authors also work on the same problem and dataset using the zigzag simplicial filtration (ZSF) in [64]. They create 1-dimensional zigzag persistence diagrams to perform persistent homology computations on the dynamic simplicial complexes resulting from these brain networks.
In [65], the authors show that persistent homology, or more precisely the persistence vineyard, is a robust approach to estimating functional connectivity in the resting and gaming stages of brain networks. They conduct an experiment with 26 male college students aged 19-29 years from two universities located in Seoul, Republic of Korea. All 26 healthy subjects undergo resting and gaming experiments. Each stage is recorded separately for five minutes. They segment their data using a 30 s window length and a 2 s step size. For each window, they compute the persistence diagram using the Pearson correlation between brain channels, employing the weighted simplex filtration (WS). Then, they compute the 0-dimensional persistence vineyard to analyze the dynamic brain connectivity. In brief, a persistence vineyard is a $k$-dimensional persistence diagram with a time dimension added, tracking the births and deaths in the $k$-dimensional diagrams of a time-varying topological space [66]. Their results show that the persistence vineyard succeeds in determining the temporal dynamic properties of the brain in a robust and threshold-free way. They also show that the persistence vineyard is more effective than principal component analysis (PCA) and standard graph theoretical methods.
In [67], the authors compare resting state functional brain activity in 15 healthy volunteers after intravenous infusion of placebo and psilocybin using persistent homology and other statistical methods (density function). First, the raw data from the fMRI (functional magnetic resonance imaging) dataset are transformed into a functional network. They create the 1-dimensional persistence diagram using the Vietoris-Rips (VR) filtration. Later, they define two different homological scaffolds depending on how frequently edges are part of the generators of the persistent homology groups and on how persistent the generators to which they belong are. The results show that the homological structure of the brain’s functional patterns undergoes a dramatic change post-psilocybin, characterized by the appearance of many transient structures of low stability and of a small number of persistent ones that are not observed in the case of placebo.
In [23], the authors first study random graphs using the clique (CCL) filtration. Using different probabilities, they generate random networks and compute their barcodes. They show that the results on these barcodes agree with the theoretical studies on these complexes. As another application, they study an email network. They create barcodes and show that higher dimensional bars, which do not exist for random networks, correspond to denser communication among certain groups. They also apply their methods to scale-free networks with a modular structure. They use three different parameters to generate three types of scale-free networks: clustered modular networks, clustered non-modular networks and non-clustered modular networks. They show that both clustered modular and clustered non-modular networks have more bars in their 3-dimensional and 4-dimensional barcodes than the non-clustered modular network.
Moreover, persistent homology is used for metric graph comparison. In [27], the authors first introduce the functional metric graph filtration (FMG) on metric graphs. Then, they define the persistence distortion distance between two finite metric graphs using the persistence diagrams from the FMG filtration. In their experiment, they show the stability of the proposed distance measure on the Athens road network as a metric graph, generating a noisy sample of it at a given noise level. The results show that the persistence distortion distance between the original graph and its noisy sample grows roughly proportionally to the noise level. They also use the proposed persistence distortion distance to compare surface meshes of different geometric models. Models from the same group have much smaller persistence distortion distances among them than models from dissimilar groups, which shows that the proposed distance is able to differentiate surface models.
In [68], persistent homology is employed to quantify structural changes in time-varying (dynamic) graphs. Their objective is to transform each instance of the time-varying graph into a metric space, extract topological features using persistent homology, and compare those features over time by means of bottleneck or Wasserstein distance between their corresponding persistence diagrams. Finally, several case studies on real-world networks, such as high school communication network, show how this method can find cyclic patterns, deviations from those patterns, and one-time events in time-varying graphs. In particular, 0- and 1-dimensional PH are utilized to detect the components and tunnels respectively. Each graph constitutes a distinct metric space for which the Vietoris-Rips (VR) filtration is implemented to compute the corresponding Betti numbers.
High order networks are weighted complete hypergraphs collecting relationships between elements of tuples. Computing the distance between high order networks is difficult when the number of nodes is large. In [25], the authors use persistent homology to derive distance approximations of networks. They compute the bottleneck distance between persistence diagrams of networks to evaluate the differences between networks. They first define a relationship function on a set of nodes to represent a measure of similarity or dissimilarity for members of the group. They use this function to assign a weight to each simplex. Using these weights and the weighted simplex filtration (WS), they generate 0-, 1- and 2-dimensional persistence diagrams. They show that the distance between two high order networks, which is in general computationally expensive, can be lower bounded by a computationally less expensive distance between their persistence diagrams. They apply their method to coauthorship networks. They first create the networks using 5 journals from the mathematics community and 6 journals from the engineering community. They use the lower bounds to classify the networks, to distinguish the collaboration patterns of the engineering and mathematics communities, and to discriminate engineering communities with different research interests.
To answer the question of whether the existing anonymization mechanisms for preserving privacy truly keep the graph utility, Gao et al. [69] employ persistent homology to analyze and evaluate four anonymization mechanisms. They study online social networks (OSN). They define the distance between two nodes as the number of hops on the shortest path between these nodes and create 0-, 1- and 2-dimensional persistence barcodes using this distance in the power filtration (POW). They analyze the original and anonymized OSNs using the barcodes. The results show that the original OSN graphs have stable structures. Furthermore, the 0-dimensional barcodes they obtain show that most anonymized OSNs are more closely connected than the original graph. None of the anonymized graphs is as stable as the original graph, because they have more 2-dimensional holes or larger holes. They also compare their results with traditional graph metrics.
In [70], the authors study flocking/swarming behaviors in animals. They first create dynamic graphs and build simplicial complexes from them using the Vietoris-Rips complex for a fixed scale parameter. Then, they construct the zigzag simplicial filtration (ZSF) and obtain 0-dimensional zigzag persistence diagrams to classify four different types of flocking behavior in animals. Finally, using the bottleneck distance, single linkage hierarchical clustering, and MDS, they distinguish the four behaviors very well.
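The clustering step of such a pipeline can be illustrated with a small sketch of single linkage clustering on a precomputed distance matrix, for example a matrix of pairwise bottleneck distances. The function name `single_linkage`, the stopping rule (a target number of clusters `k`), and the distance values below are assumptions for illustration and are not taken from [70].

```python
def single_linkage(dist, k):
    """Single linkage clustering: repeatedly merge the two closest clusters until k remain.

    `dist` is a symmetric matrix (list of lists) of pairwise distances.
    Returns the clusters as sorted lists of item indices.
    """
    n = len(dist)
    parent = list(range(n))  # union-find forest over the items

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Processing pairs by increasing distance is exactly the single linkage merge order.
    pairs = sorted((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    clusters = n
    for _, i, j in pairs:
        if clusters <= k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[rj] = ri
            clusters -= 1

    groups = {}
    for x in range(n):
        groups.setdefault(find(x), []).append(x)
    return sorted(groups.values())


# Made-up distances between four diagrams: items 0, 1 are close, and so are items 2, 3.
toy = [
    [0.0, 0.1, 1.0, 1.0],
    [0.1, 0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0, 0.2],
    [1.0, 1.0, 0.2, 0.0],
]
print(single_linkage(toy, 2))
```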
In [30], the authors characterize directed cycle networks by a digraph filtration using persistent path homology (PPH). They prove that the persistence diagram of a cycle network with $n$ nodes, for $n \geq 3$, depends solely on $n$.
5 Conclusion
In this paper, we provide a conceptual review of key advancements in applying PH to network science. We look into research studies that use PH on networks and highlight the different algorithms used to extract topological features of networks. We also review applications in which PH is used to solve network mining problems. We believe our summary of the analysis of PH on networks will provide important insights to researchers in applied network science.
At the moment, the implicit goal of most studies is to extract the topological features of networks that persist across multiple scales. However, these studies have some limitations. Firstly, scalability may be a concern for future progress. The networks used in these studies are mostly small (fewer than 1000 vertices), and significant work is still needed to scale PH approaches to larger networks. Secondly, the stability of some filtrations has not been proven yet.
Furthermore, although many filtration methods have been proposed, they are mainly designed for static networks, whereas many real-world networks evolve over time. For example, in the Facebook network, friendships between users change continuously, so new edges are added to the social network while other edges may be deleted. Most of the existing methods cannot be directly applied to large-scale evolving networks. New filtration algorithms that can handle the dynamic nature of evolving networks are highly desirable in persistent homology.
As another future research direction, PH can also be used for network and sub-network embedding problems.
Abbreviations
PH, Persistent Homology; PB, Persistence Barcode; PD, Persistence Diagram; VR, Vietoris-Rips Filtration; DSS, Dowker Sink and Source Filtration; CCL, Clique Complex Filtration; WRCL, Weight Rank Clique Filtration; VBCL, Vertex-Based Clique Filtration; kCL, $k$-Clique Filtration; WS, Weighted Simplex Filtration; VFB, Vertex Function Based Filtration; IC, Intrinsic Čech Filtration; FMG, Functional Metric Graph Filtration; POW, Power Filtration; TMP, Temporal Filtration; ZSF, Zigzag Simplicial Filtration; PPH, Persistent Path Homology; GVR, Generalizations of Vietoris-Rips Filtration; PPI, Protein-protein interaction networks; MRI, Magnetic Resonance Imaging; fMRI, Functional Magnetic Resonance Imaging; DTI, Diffusion Tensor Imaging; EEG, Electroencephalography; CORT, Cortical Regions of Corticosterone; SLD, Single-linkage Dendrograms; US, United States; GDP, Gross Domestic Product; OSN, Online Social Networks; DBLP, Digital Bibliography & Library Project; IMDB, Internet Movie Database; PCA, principal component analysis; MDS, multidimensional scaling; dim, dimension.
Availability of data and material
Not applicable
Competing interests
The authors declare that they have no competing interests.
Funding
Not applicable
Author’s contributions
MEA and EA designed the project. MEA studied the filtrations defined on networks. MEA, EA and AEF studied the algorithm and applications of the persistent homology in network settings. MEA, EA and AEF wrote the manuscript.
Acknowledgements
Not applicable
References
- [1] Bhagat, S., Cormode, G., Muthukrishnan, S.: Node classification in social networks. In: Social Network Data Analytics, pp. 115–148. Springer, US (2011)
- [2] Akbas, E., Aktas, M.: Network embedding: on compression and learning. arXiv preprint arXiv:1907.02811 (2019)
- [3] Akbas, E., Zhao, P.: Truss-based community search: a truss-equivalence based indexing approach. Proceedings of the VLDB Endowment 10(11), 1298–1309 (2017)
- [4] Akbas, E., Zhao, P.: Attributed graph clustering: An attribute-aware graph embedding approach. In: Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017. ASONAM ’17, pp. 305–308. ACM, New York, NY, USA (2017)
- [5] Lopes, G.R., Moro, M.M., Wives, L.K., De Oliveira, J.P.M.: Collaboration recommendation on academic social networks. In: International Conference on Conceptual Modeling, pp. 190–199. Springer (2010)
- [6] Sharan, R., Ulitsky, I., Shamir, R.: Network-based prediction of protein function. Molecular Systems Biology 3(1), 88 (2007)
- [7] Babai, L.: Graph isomorphism in quasipolynomial time. In: Proceedings of the Forty-eighth Annual ACM Symposium on Theory of Computing, pp. 684–697. ACM (2016)
- [8] Baur, M., Benkert, M.: Network comparison. In: Network Analysis, pp. 318–340. Springer, Berlin, Heidelberg (2005)
