Analysis of contagion maps on a class of networks that are spatially embedded in a torus
Barbara I. Mahler

TL;DR
This paper investigates how contagion spreads on networks embedded in a torus, analyzing wavefront versus jump propagation, and introduces contagion maps as a tool for understanding spreading dynamics and manifold learning.
Contribution
It extends previous work by analyzing contagion on noisy geometric networks embedded in a torus, and demonstrates the use of contagion maps for manifold learning and network analysis.
Findings
Identifies parameter regions with wavefront propagation.
Shows how nongeometric edges influence spreading behavior.
Demonstrates contagion maps as a tool for manifold learning.
Submitted to the editors 28/12/2018.
Mathematical Institute, University of Oxford, Oxford OX2 6GG, UK. The author acknowledges a studentship from the EPSRC under grant EP/G03706X/1.
Abstract
A spreading process on a network is influenced by the network’s underlying spatial structure, and it is insightful to study the extent to which a spreading process follows such structure. We consider a threshold contagion model on a network whose nodes are embedded in a manifold and which has both ‘geometric edges’, which respect the geometry of the underlying manifold, and ‘nongeometric edges’ that are not constrained by that geometry. Building on ideas from Taylor et al. [33], we examine when a contagion propagates as a wave along a network whose nodes are embedded in a torus and when it jumps via long nongeometric edges to remote areas of the network. We build a ‘contagion map’ for a contagion spreading on such a ‘noisy geometric network’ to produce a point cloud; and we study the dimensionality, geometry, and topology of this point cloud to examine qualitative properties of this spreading process. We identify a region in parameter space in which the contagion propagates predominantly via wavefront propagation. We consider different probability distributions for constructing nongeometric edges — reflecting different decay rates with respect to the distance between nodes in the underlying manifold — and examine the effect of such choices on the qualitative properties of the spreading dynamics. Our work generalizes the analysis in Taylor et al. and consolidates contagion maps both as a tool for investigating spreading behavior on spatial networks and as a technique for manifold learning.
Keywords: spreading dynamics, contagions, spatial networks, manifold learning, topological data analysis
AMS subject classifications: 55N31, 05C82, 91D30, 82B43
1 Introduction
Spreading dynamics are ubiquitous in many situations, including social settings and biological processes. The spreading of a contagious disease or of an idea between people are two obvious examples, and various other phenomena also give rise to spreading processes on networks [22, 24, 26].
The spreading of real-world contagions is often guided by the geometry of some underlying domain [18, 37, 30, 9]. One example is the spread of contagious diseases. Historically, such diseases spread gradually along part of the earth’s surface. Similarly, when means of transportation and communication are limited, information typically disseminates via entities that are physically close. In such cases, contagions often propagate as a wavefront, passing between geometrically close entities. However, with modern transportation and communication technology, there are now many scenarios where — even in the presence of a well-defined underlying geometry, such as the spherical surface of the earth — a contagion can also spread via connections that are not intrinsically geometric [1, 6]. Examples of such scenarios include the spreading of an infectious disease via passengers traveling on a long-distance flight and the dissemination of information via social media. In these examples, a contagion jumps across space to distant locations, rather than following the geometry of an underlying domain.
One way to study such phenomena is to consider contagion models on networks that are embedded in some underlying geometric space [28]. In particular, one can consider networks that have both geometric edges that respect the geometry of the underlying space, in the sense that they can only connect nodes that are close to each other according to the space’s metric, and nongeometric edges, which are not constrained by the underlying geometry and can connect nodes that are far from each other. Following terminology from [33], we refer to such networks as noisy geometric networks. It is interesting and important to ask [22, 26, 29] what propagation pattern(s) a contagion follows and how much the structure of the underlying space influences such patterns. Two fundamental spreading mechanisms that can occur on a noisy geometric network are wavefront propagation (WFP) and the appearance of new clusters (ANC). Wavefront propagation is the spreading of a contagion along the structure of the underlying space via geometric edges. The appearance of new clusters occurs when long-range, nongeometric edges connect activated nodes with nodes in a region of the network that has been unaffected by the contagion and thereby lead to a new cluster of activated nodes in this previously unaffected region. We can view the nongeometric edges as ‘bridges’ that can accelerate the spreading process considerably, especially if they are ‘long’. In a threshold model (a type of ‘complex contagion’ [7, 22]), in which a sufficient fraction or number of nodes in a focal node’s neighborhood need to be active to activate that node, bridges also need to be sufficiently ‘wide’ to encourage ANC.
Taylor et al. [33] used methods from topological data analysis and nonlinear dimensionality reduction to study spreading behavior of a threshold contagion model on noisy geometric networks. They explored the occurrence of WFP and ANC on a noisy ring lattice, and they examined the extent to which the spreading process follows the ring structure. To investigate the extent to which a complex contagion on a network adheres to the structure of the underlying space, they introduced the notion of a contagion map, which maps each node of a network to a point in a high-dimensional Euclidean space based on its activation times in different realizations of a contagion process. It thereby produces a point cloud that one can view as a distortion of the network that reflects the contagion’s spreading behavior. To see if they could identify the structure of the underlying space, Taylor et al. examined the geometry, dimensionality, and topology of such point clouds. They compared their results with a bifurcation analysis of the contagion on noisy ring lattices and found that the contagion map successfully recovers the geometry, dimensionality, and topology of the underlying space exactly when the contagion propagates predominantly by WFP. This illustrates that one can use contagion maps to illuminate propagation patterns of spreading processes on noisy geometric networks whose underlying space is known. Moreover, they found that on noisy ring lattices WFP occurs for a wide range of the network and contagion parameters, suggesting that contagion maps are a viable tool for inferring the structure of the underlying space of a noisy geometric network from contagion dynamics on it. That is, one may be able to use contagion maps as a technique for manifold learning.
We follow the approach of [33], and we build on their ideas through a study of a new example. We still use a threshold contagion model, but we consider a more complicated family of noisy geometric networks. Our networks, which can be interpreted as geometrically embedded in a flat torus, are similar to the Kleinberg small-world model [21]. We use a contagion map to construct a point cloud that represents the dynamics of the contagion from a set of different initial conditions. We then examine the structure of this point cloud in three different ways: topologically (via the homology of a space that we build on the point cloud), geometrically (via distances between pairs of points), and with respect to dimensionality (via the approximate embedding dimension). We compare our findings to the topological and geometric structure, as well as the embedding dimension, of the torus. If the point cloud’s structure resembles that of the torus geometrically, topologically, and in terms of its embedding dimension, this suggests that the contagion propagates via wavefront propagation along the structure of the underlying torus.
The motivation for our choice of network model is fourfold: (1) one can view it as a two-dimensional (2D) analogue of the noisy ring lattice that was studied in [33]; (2) the Kleinberg small-world model includes a nontrivial and adjustable spatial scaling in its probability distribution for constructing nongeometric edges; (3) the flat torus has locally Euclidean geometry and is entirely homogeneous (the local geometry at one point is the same as that at any other); and (4) the embedding of the network in the torus entails nontrivial topological features to take into account when comparing the topological structure of a contagion map to that of the underlying space. In our analysis, we find for a certain region of the parameter space that the structure of the contagion map resembles that of a torus in terms of topology, geometry and dimensionality. Further, this region corresponds to scenarios in which we can predict analytically that the contagion spreads predominantly via WFP, rather than via ANC. This consolidates the approach of [33] both as a way of determining spreading behavior of contagions on noisy geometric networks, and as a manifold-learning technique that is robust to noise.
Our paper proceeds as follows. In sections 2 and 3, we define the network model and contagion model that we study, and we give some background on noisy geometric networks and contagions on networks in general. In section 4, we describe the employed methodology. We present a series of numerical experiments in section 5, and we perform a bifurcation analysis in section 6. In section 7, we discuss our findings. We give background mathematical details on persistent homology in our supplementary material.
2 Network model
A geometric network is a network whose nodes are embedded in some metric space and whose edges, called geometric edges, can occur only between pairs of nodes that are sufficiently close in this space [2, 3].
One can build a noisy geometric network from a geometric network by adding so-called nongeometric (i.e., ‘noisy’) edges between pairs of nodes that can be distant from each other in the underlying metric space. For example, in synthetic noisy geometric networks, the nodes may be located on a manifold that is embedded in an ambient Euclidean space. One can place geometric edges between all or some of the node pairs that are at distances from each other that are below some fixed threshold, and one can then add nongeometric edges uniformly at random (see Figure 1(a)) or following some other probabilistic or deterministic rule.
As another example (see Figure 1(b)), one can add noise to node locations in the ambient space and place a nongeometric edge between any two nodes that are close in the ambient space but not close with respect to geodesic distance along the manifold. Many nonlinear dimensionality-reduction techniques, such as diffusion maps and Isomap [4, 34, 8, 13, 32], start by inferring a proximity network from point-cloud data, such as by connecting each point to its nearest neighbors, with the goal of finding underlying low-dimensional structure of the point cloud. Such proximity networks can be viewed as noisy geometric networks, as it is possible for nodes to be adjacent even when they are not close on the underlying manifold, and nonlinear dimensionality-reduction techniques seek to find purely geometric structures in such networks.
We consider a family of noisy geometric networks that are embedded geometrically in a 2D manifold, with the nodes spread evenly on the surface of a flat torus. The torus has Betti numbers $b_0 = 1$, $b_1 = 2$, and $b_2 = 1$ (we define Betti numbers in Definition 8.3 in the supplementary material), so there are multiple nontrivial topological features to take into account when comparing the structure of a contagion map to that of the underlying space.
Our noisy geometric network is a variant of the Kleinberg small-world model [21] (see Figure 1(c,d)). We start with a periodic square lattice of nodes:
$$V = \{(i, j) : i, j \in \{0, 1, \ldots, n-1\}\},$$
so
$$N := |V| = n^2.$$
We define the periodic lattice distance between nodes $u = (i_1, j_1)$ and $v = (i_2, j_2)$ to be
$$d(u, v) = |i_1 - i_2|_n + |j_1 - j_2|_n, \tag{1}$$
where
$$|a|_n = \min\{a \bmod n,\ (-a) \bmod n\},$$
and the sum of the two residues in (1) is taken in $\mathbb{Z}$. The periodic lattice distance is the regular lattice distance with opposite sides of the lattice considered to be close to each other. We can thus think of the lattice as being ‘wrapped up’ into a 2D torus, which has no boundary. In other words, we are using periodic boundary conditions.
We fix $w > 0$ and place a geometric edge between any two nodes whose (periodic) Euclidean distance from each other is at most $w$. That is, we place a geometric edge between nodes $u = (i_1, j_1)$ and $v = (i_2, j_2)$ if and only if
$$\sqrt{|i_1 - i_2|_n^2 + |j_1 - j_2|_n^2} \leq w.$$
We call the number of geometric edges that are incident to a node its geometric degree, which we denote by $d_G$.
Each node also has the same number of ‘nongeometric stubs’, and we connect pairs of stubs to build nongeometric edges as follows. We connect a nongeometric stub from node $u$ to a stub from node $v$ with a probability that is proportional to $d(u, v)^{-\alpha}$, where $\alpha \geq 0$ is a fixed parameter. We call the number of nongeometric edges that are incident to a node its nongeometric degree, which we denote by $d_{NG}$ (by definition, it equals the node’s number of nongeometric stubs). The degree of a node is $d = d_G + d_{NG}$, and the class of networks that we just defined consists of regular networks, in which every node has degree $d$. When $\alpha = 0$, we match the nongeometric stubs uniformly at random. For $\alpha > 0$, nongeometric edges tend to connect nodes that are close with respect to the periodic lattice distance, and this tendency becomes more pronounced for progressively larger $\alpha$.
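The network construction above can be sketched in Python as follows. This is a minimal illustration under stated assumptions. Nodes are pairs `(i, j)` with coordinates in `range(n)`, and the helper names (`residue`, `lattice_distance`, `geometric_edges`, `nongeometric_edges`) are hypothetical. The stub-matching loop draws a random stub and then picks its partner among the remaining stubs on other nodes with weight $d(u, v)^{-\alpha}$. That loop is one plausible reading of "probability proportional to" and not necessarily the procedure used in the paper. It excludes self-loops but does not remove multi-edges or overlaps with geometric edges.

```python
import math
import random
from itertools import combinations


def residue(a, b, n):
    """Wrap-around separation |a - b|_n of two coordinates on a cycle of length n."""
    r = (a - b) % n
    return min(r, n - r)


def lattice_distance(u, v, n):
    """Periodic lattice distance (1) between nodes u = (i1, j1) and v = (i2, j2)."""
    return residue(u[0], v[0], n) + residue(u[1], v[1], n)


def geometric_edges(n, w):
    """All node pairs whose periodic Euclidean distance is at most w."""
    nodes = [(i, j) for i in range(n) for j in range(n)]
    return {(u, v) for u, v in combinations(nodes, 2)
            if math.hypot(residue(u[0], v[0], n), residue(u[1], v[1], n)) <= w}


def nongeometric_edges(n, d_ng, alpha, seed=None):
    """Match d_ng stubs per node; a partner on node v is chosen with weight d(u, v)**(-alpha).

    Hypothetical sequential matching scheme (an assumption). alpha = 0 gives
    uniform matching. Self-loops are excluded; multi-edges are not removed.
    """
    rng = random.Random(seed)
    # One list entry per stub, labelled by the node it emanates from.
    stubs = [(i, j) for i in range(n) for j in range(n) for _ in range(d_ng)]
    rng.shuffle(stubs)
    edges = []
    while len(stubs) > 1:
        u = stubs.pop()
        candidates = [k for k, v in enumerate(stubs) if v != u]
        if not candidates:  # only stubs of u itself are left
            break
        weights = [lattice_distance(u, stubs[k], n) ** (-alpha) for k in candidates]
        k = rng.choices(candidates, weights=weights)[0]
        edges.append((u, stubs.pop(k)))
    return edges
```

With `n = 10` and `w = 1`, every node has geometric degree 4 (200 edges in total). With `w = 2`, it has geometric degree 12. The lattice should satisfy `n > 2w` so that distinct offsets do not wrap onto the same neighbor.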
3 Contagion model
A contagion on a network is a dynamical process in which nodes become successively ‘activated’, starting from some initial condition. The most common type of initial condition is that a set of ‘seed’ nodes are active at time $t = 0$ [27, 19]. We examine the Watts threshold model (WTM) [36] (see also [15, 35]), one of the simplest and best-studied models of a contagion on a network.
Let $V$ (with $|V| = N$) denote the set of nodes of a network, and let $A = (A_{ij})$ be the adjacency matrix of the network. In our contagion, each node can be either active or inactive, and we denote the state of node $i$ at time $t$ by $x_i(t)$, which takes the value 1 if it is active and the value 0 if it is inactive. We call the set of nodes that are active at time $t = 0$ a contagion seed, and we denote the seed set by $S$. That is, $x_i(0) = 1$ for all $i \in S$ and $x_i(0) = 0$ for all $i \notin S$. If $S$ consists of a single node, the initial condition is called ‘node seeding’; if $S$ consists of a node together with its neighbors, the initial condition is called ‘cluster seeding’. For a given homogeneous threshold $T$, we update node states synchronously in discrete time steps according to the following rule. If $x_i(t) = 1$, then $x_i(t+1) = 1$. If $x_i(t) = 0$, then
$$x_i(t+1) = \begin{cases} 1, & \text{if } \dfrac{1}{d_i} \sum_{j \in V} A_{ij}\, x_j(t) > T, \\ 0, & \text{otherwise}, \end{cases}$$
where $d_i = \sum_{j \in V} A_{ij}$ is the degree of node $i$.
In other words, a node activates at time $t+1$ if the fraction of its neighbors that are active at time $t$ is larger than $T$. Once a node is active, it stays active forever. For a fixed homogeneous threshold $T$ and a given seed set $S$, this contagion is a deterministic and monotonic process (‘monotonic’ in the sense that a node that activates stays active forever), which eventually reaches a stable state in which either all of the nodes are active or some nodes are inactive and will never activate. For a given network and a given seed set $S$, this deterministic process is one ‘realization’ of the contagion. Formally, a realization is the nested sequence $S = V_0 \subseteq V_1 \subseteq V_2 \subseteq \cdots$ of subsets of $V$ that are active at successive times, where $V_t = \{i \in V : x_i(t) = 1\}$. Note that, due to the deterministic nature of the process, $V_0 = S$ determines $V_t$ for all $t \geq 0$.
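A direct transcription of the update rule can be sketched as follows. It assumes that `A` is a dense 0/1 adjacency matrix given as a list of lists, and the function names are hypothetical. The sketch returns the realization as the nested list of active sets $V_0 \subseteq V_1 \subseteq \cdots$. It stops at the first time step at which no node changes state.

```python
def wtm_realization(A, seed, T):
    """Synchronous Watts threshold model on a 0/1 adjacency matrix A.

    Node i activates at t + 1 when (1 / d_i) * sum_j A[i][j] * x_j(t) > T.
    Returns the realization V_0, V_1, ... as a list of frozensets.
    """
    N = len(A)
    deg = [sum(row) for row in A]
    x = [1 if i in seed else 0 for i in range(N)]
    realization = [frozenset(seed)]
    while True:
        new_x = [
            1 if x[i] == 1
            or (deg[i] > 0 and sum(A[i][j] * x[j] for j in range(N)) / deg[i] > T)
            else 0
            for i in range(N)
        ]
        if new_x == x:  # stable state reached
            return realization
        x = new_x
        realization.append(frozenset(i for i in range(N) if x[i]))


def activation_times(realization, N):
    """First t with node i in V_t (None if the node never activates)."""
    times = [None] * N
    for t, active in enumerate(realization):
        for i in active:
            if times[i] is None:
                times[i] = t
    return times
```

Consider a 6-node ring seeded at node 0 with $T = 0.4$. Each inactive node next to the active region sees exactly half of its neighbors active, so the contagion advances one node per step in each direction. With $T = 0.5$, the strict inequality is never satisfied and the seed never spreads.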
4 Methods
We construct a point cloud by mapping the nodes of a network to points in a high-dimensional Euclidean space based on their activation times in different realizations of the contagion dynamics. This so-called contagion map was first studied in [33] and is inspired by approaches from nonlinear dimensionality reduction, such as diffusion maps and Isomap [4, 34, 8, 13, 32]. A point cloud that is the image of a contagion map can be interpreted as a distortion of an underlying network structure that reflects the contagion dynamics. We analyze the structure of this point cloud from three different perspectives (topologically, geometrically, and with respect to dimensionality) and compare it to the structure of the underlying network. We expect the structure of the point cloud to resemble the structure of the underlying network when the contagion spreads predominantly via WFP. We perform a bifurcation analysis¹ to identify regions in parameter space for which WFP is the predominant mode of propagation, and we use the results of this analysis to validate that the structure of the underlying network is recovered in the point cloud whenever the contagion spreads predominantly via WFP.
¹ The traditional use of the term ‘bifurcation’ [16] is to describe situations in dynamical-systems theory in which a system’s qualitative behavior changes in a mathematically quantifiable way (e.g., as expressed using a normal form) as a function of one or more parameters, such as the onset of a limit cycle at a critical value of a parameter. The notion of bifurcation that we examine in the present paper is somewhat different in flavor from classical bifurcations, but we still examine qualitative changes in dynamics as we adjust parameters in a model.
To compare the topology, geometry, and dimensionality of a point cloud to the structure of the network on which it is based, we need to specify this structure precisely. Specifically, we need to choose a metric space associated with the network and we need to specify the locations of the network’s nodes in this metric space.
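We consider the torus as the Cartesian product of two circles:
$$\mathbb{T} = S^1 \times S^1 \subset \mathbb{R}^2 \times \mathbb{R}^2 = \mathbb{R}^4. \tag{2}$$
We evenly distribute the nodes of our network on this torus $\mathbb{T}$. The node $(i, j) \in V$ is the point on $\mathbb{T}$ with angular coordinates
$$(\phi_1, \phi_2) = \left(\frac{2\pi i}{n}, \frac{2\pi j}{n}\right), \qquad i, j \in \{0, 1, \ldots, n-1\}. \tag{3}$$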
4.1 Contagion maps
Consider our WTM contagion, with homogeneous threshold $T$, on one instantiation of our Kleinberg-like network for some fixed parameter values $n$, $d_G$, $d_{NG}$, and $\alpha$. For a given seed set, the contagion dynamics is a deterministic process, and we can record the activation times of the nodes. We consider several realizations of the contagion dynamics initialized with different seeds. We denote the set of realizations by $\mathcal{Q}$, and we denote the activation time of node $i$ in realization $q \in \mathcal{Q}$ by $t_q(i)$. If node $i$ is never activated in realization $q$, we set $t_q(i)$ to a value that is larger than any actual activation time.
The regular contagion map associated with the set $\mathcal{Q}$ of realizations is a function from the set $V$ of nodes to $\mathbb{R}^{|\mathcal{Q}|}$. It is defined by
$$\mathbf{x} \colon V \to \mathbb{R}^{|\mathcal{Q}|}, \qquad \mathbf{x}(i) = \big(t_q(i)\big)_{q \in \mathcal{Q}}.$$
The regular contagion map associated with $\mathcal{Q}$ thus maps each node in $V$ to a vector in $\mathbb{R}^{|\mathcal{Q}|}$ that records its activation times in each of the realizations.
We take $\mathcal{Q}$ to be the same size as $V$ and choose the seed sets to be the clusters around the $N$ different nodes, such that the seed that initializes realization $q_j$ is the cluster that consists of node $j$ and its neighbors. In this case, the activation time $t_{q_j}(i)$ of node $i$ in realization $q_j$ is a proxy for a distance between nodes $i$ and $j$. To see this, consider a realization initialized with the single seed node $j$, with a homogeneous threshold that is low enough for a single active neighbor to activate a node. The activation time of node $i$ in this realization is exactly the length of a shortest path between $j$ and $i$. For cluster seeding of our contagion, $t_{q_j}(i)$ may not be precisely the shortest-path distance between $j$ and $i$; it depends on how the contagion spreads. Moreover, $t_{q_j}(i) \neq t_{q_i}(j)$ in general. With this in mind, we define two further maps. The reflected contagion map sends node $i$ to the vector $\big(t_{q_i}(j)\big)_{j \in V}$ of activation times of all nodes in the realization that is seeded at node $i$. The symmetric contagion map combines the regular and reflected contagion maps so that the resulting matrix of activation times is symmetric.
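The three contagion maps can be assembled from the activation times of the $N$ cluster-seeded realizations, as in the following sketch. It assumes an adjacency list `adj` that maps each node to a list of its neighbors. Infinite activation times are replaced by the number of nodes, which is a hypothetical choice: the text only requires a value larger than any actual activation time. The symmetric map is taken to be the average of the regular and reflected maps. That is one possible normalization, not a confirmed detail of the paper, and all function names are illustrative.

```python
import math


def wtm_times(adj, seed_set, T):
    """Synchronous WTM activation times (fraction of active neighbours > T); inf if never."""
    times = {i: (0 if i in seed_set else math.inf) for i in adj}
    active, t = set(seed_set), 0
    while True:
        newly = {i for i in adj if i not in active and adj[i]
                 and sum(j in active for j in adj[i]) / len(adj[i]) > T}
        if not newly:
            return times
        t += 1
        for i in newly:
            times[i] = t
        active |= newly


def contagion_maps(adj, T, never=None):
    """Regular, reflected, and symmetric contagion maps under cluster seeding.

    Realization q_j is seeded with node j plus its neighbours. Each map is a
    dict node -> list of coordinates ordered like sorted(adj).
    """
    nodes = sorted(adj)
    big = len(nodes) if never is None else never  # stand-in for "never activated"
    times = {}  # times[j][i]: activation time of node i in realization q_j
    for j in nodes:
        tj = wtm_times(adj, {j, *adj[j]}, T)
        times[j] = {i: (big if math.isinf(tj[i]) else tj[i]) for i in nodes}
    regular = {i: [times[j][i] for j in nodes] for i in nodes}
    reflected = {i: [times[i][j] for j in nodes] for i in nodes}
    # Assumed normalisation: average of the regular and reflected coordinates.
    symmetric = {i: [(a + b) / 2 for a, b in zip(regular[i], reflected[i])]
                 for i in nodes}
    return regular, reflected, symmetric
```

On a path graph with threshold 0, the regular coordinates of an end node are its distances to the seed clusters: 0, 0, 1, 2, 3.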
4.2 Geometry
To quantify the similarity of the geometric structure of a contagion map to that of the network on which it is based, we calculate the Pearson correlation coefficient of pairwise distances between points of the point cloud and pairwise distances between corresponding nodes of the network. We use Euclidean distance in $\mathbb{R}^4$ for the nodes and Euclidean distance in $\mathbb{R}^{|\mathcal{Q}|}$ for points in the point cloud.
Recall that the nodes lie on the torus (2) at points with coordinates (3). Let
$$\mathbf{y}_i \in \mathbb{T} \subset \mathbb{R}^4$$
denote the point in $\mathbb{R}^4$ that is associated with node $i$. The distance between two points, $\mathbf{y}_i$ and $\mathbf{y}_j$, in $\mathbb{R}^4$ is
$$d_{ij} = \|\mathbf{y}_i - \mathbf{y}_j\|_2,$$
and the distance between the corresponding points, $\mathbf{x}(i)$ and $\mathbf{x}(j)$, in the point cloud is
$$d'_{ij} = \|\mathbf{x}(i) - \mathbf{x}(j)\|_2.$$
Given ordered sets, $D = (d_{ij})_{i<j}$ and $D' = (d'_{ij})_{i<j}$, of pairwise distances between nodes of the network and points in the point cloud, respectively, we compute the Pearson correlation coefficient between these sets:
$$\rho(D, D') = \frac{\sum_{i<j} (d_{ij} - \bar{d})(d'_{ij} - \bar{d}')}{\sqrt{\sum_{i<j} (d_{ij} - \bar{d})^2}\,\sqrt{\sum_{i<j} (d'_{ij} - \bar{d}')^2}},$$
where
$$\bar{d} = \frac{2}{N(N-1)} \sum_{i<j} d_{ij}$$
denotes the mean pairwise distance between nodes and $\bar{d}'$ (defined analogously) denotes the mean pairwise distance between points in the contagion map. Progressively larger Pearson correlation coefficients indicate progressively more similar geometric structures between a contagion map and its associated network.
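The geometric comparison can be sketched as follows. The node embedding assumes unit circles, with node $(i, j)$ at angles $(2\pi i/n, 2\pi j/n)$. The paper’s circle size may differ, but $\rho$ is invariant to rescaling either set of distances, so the radius does not affect the result. The helper names are hypothetical.

```python
import math
from itertools import combinations


def torus_point(i, j, n):
    """Point y_i in R^4 for lattice node (i, j), assuming unit circles in S^1 x S^1."""
    a, b = 2 * math.pi * i / n, 2 * math.pi * j / n
    return (math.cos(a), math.sin(a), math.cos(b), math.sin(b))


def pairwise(points):
    """Euclidean distances for all pairs i < j, in a fixed order."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def geometric_similarity(node_points, cloud_points):
    """rho(D, D') between torus distances and point-cloud distances (same node order)."""
    return pearson(pairwise(node_points), pairwise(cloud_points))
```

A uniformly rescaled copy of the torus points gives $\rho = 1$. A cloud that keeps only the first circle gives a clearly smaller $\rho$, because pairs of nodes that differ only in their second coordinate collapse to distance 0.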
4.3 Topology
We examine the topology of a contagion map by studying the persistent homology (PH) of the Vietoris–Rips (VR) filtration (see Definition 8.12 in the supplementary material) on its associated point cloud. We calculate PH using the software package Ripser (publicly available at https://github.com/Ripser/ripser). We seek to quantify the extent to which topological features of a torus appear in the barcode that represents PH in a given dimension. To do this, we calculate the Wasserstein distance (see Definition 8.9 in the supplementary material) between this barcode and a ‘model barcode’ that represents topological features of a torus in the given dimension. As a model barcode, we choose the one that corresponds to the PH of the VR filtration on the regular point cloud on the torus in formula (3). Smaller Wasserstein distances correspond to more ‘torus-like’ point clouds, which recover the topology of the manifold in which the network’s nodes are embedded. Roughly speaking, a barcode exhibits the topological features of a torus when it has two dominant bars in dimension 1 and one dominant bar in dimension 2 (as well as one bar that never dies in dimension 0). We work with networks of $N = n^2$ nodes (see section 2). Computing 2D persistent homology of a VR filtration on $N$ points is computationally expensive, because it involves building up to $\binom{N}{4}$ simplices. We therefore compute PH only up to dimension 1, which requires building only up to $\binom{N}{3}$ simplices for a given point cloud.
The Wasserstein distance between barcodes is sensitive to scaling. Consider, for instance, two barcodes that have the same number of bars, such that the relative lengths of the bars within each barcode are the same. Although these two barcodes represent exactly the same topological features (albeit of different sizes), the Wasserstein distance between them is nonzero. Similarly, two barcodes that represent very similar topological features but are at very different scales may have a larger Wasserstein distance between each other than two barcodes that represent different features but are close in ‘scale’.² See Figure 2 for an illustration of this phenomenon.
² Our use of the term ‘scale’ differs from existing uses in topological data analysis (TDA). Two example uses of ‘scale’ in TDA are for the persistence of a topological feature in a filtration and for the point in a filtration at which a feature appears.
In the present application, this sensitivity to scaling can manifest as follows. The model-torus barcode, which corresponds to the regularly spaced points (3) on the torus (2), has relatively short bars. For our contagion, larger values of $T$ entail slower spreading. Suppose that two contagion maps both arise from spreading via WFP without ANC, one with large $T$ (entailing slow propagation) and the other with small $T$ (entailing fast propagation). Then both contagion maps have the same (torus-like) shape, but the former is much ‘larger’ than the latter. In turn, the former’s barcode is farther than the latter’s from the model-torus barcode. Similarly, when there are many nongeometric edges, there is fast spreading via ANC. In this case, the contagion map should not look torus-like; it should instead look like a cluster of points. Even so, the Wasserstein distance from its barcode to the model-torus barcode may still be small, simply by virtue of the size of the point cloud rather than because of its shape.
To counteract the above scaling issue, we ‘calibrate’ all barcodes before calculating the Wasserstein distance (see Figure 3). We find the longest bar in each barcode and divide the birth and death times of all bars by its length. This yields barcodes whose longest bar has length exactly 1, so they can be considered to be at the same ‘scale’. Consequently, the Wasserstein distance between these calibrated barcodes can serve as a measure for comparing the topological features of the corresponding point clouds without taking absolute distances (i.e., geometry) into account (see Figure 2(d)–(f)).
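The following toy computation illustrates why calibration matters. It uses a brute-force $q$-Wasserstein distance with the $L^\infty$ ground metric, in which bars may also be matched to the diagonal. This brute-force approach is only practical for a handful of bars; the paper uses Hera for this step. The function names are hypothetical. Only finite bars are handled, so an infinite bar in dimension 0 would need to be removed first.

```python
from itertools import permutations


def calibrate(barcode):
    """Rescale finite (birth, death) bars so that the longest bar has length 1."""
    longest = max(death - birth for birth, death in barcode)
    return [(birth / longest, death / longest) for birth, death in barcode]


def wasserstein(D1, D2, q=1):
    """Brute-force q-Wasserstein distance between two small barcodes of finite bars.

    Bars are matched to each other (L-infinity cost) or to the diagonal
    (cost = half the bar length). Factorial running time: toy sizes only.
    """
    def to_diag(bar):
        return (bar[1] - bar[0]) / 2

    def cost(a, b):
        if a is None and b is None:  # diagonal slot matched to diagonal slot
            return 0.0
        if a is None:
            return to_diag(b) ** q
        if b is None:
            return to_diag(a) ** q
        return max(abs(a[0] - b[0]), abs(a[1] - b[1])) ** q

    # Pad each side with diagonal slots (None) so that a perfect matching exists.
    A = list(D1) + [None] * len(D2)
    B = list(D2) + [None] * len(D1)
    best = min(sum(cost(A[i], B[p[i]]) for i in range(len(A)))
               for p in permutations(range(len(B))))
    return best ** (1 / q)
```

Consider the barcodes `[(0, 1), (0.2, 0.5)]` and `[(0, 2), (0.4, 1)]`; the second is the first with every time doubled. Their 1-Wasserstein distance is 1.45. After calibration, the two barcodes coincide and their distance is 0.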
To compute Wasserstein distances, we use the software package Hera (publicly available at https://bitbucket.org/grey_narn/hera). At the time of writing, Hera provides the fastest available algorithm for computing Wasserstein distances.
4.4 Dimensionality
We determine the approximate embedding dimension of a point cloud by finding the smallest dimension such that we lose less than 5% of the variance when projecting to that dimension using principal component analysis (PCA) [32]. That is, for each $k \in \{1, 2, \ldots\}$, we project the point cloud $X$ to $\mathbb{R}^k$ using PCA, resulting in a point cloud $X_k \subset \mathbb{R}^k$.
We estimate the extent to which this projection preserves the original point cloud by calculating the residual variance [34, 10]
$$R_k = 1 - \rho_k^2,$$
where $\rho_k$ is the Pearson correlation coefficient between the pairwise Euclidean distances of points in $X_k$ and the corresponding pairwise Euclidean distances between points in $X$ (see section 4.2). The approximate embedding dimension is the smallest dimension for which the residual variance is less than 5%; that is, $\hat{d} = \min\{k : R_k < 0.05\}$.
In practice, we cap $\hat{d}$: if the approximate embedding dimension reaches or exceeds the cap, we record it as the cap value. Because we consider the torus to be embedded in $\mathbb{R}^4$, an approximate embedding dimension of 4 indicates that the contagion map recovers the dimensionality of the torus.
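The embedding-dimension procedure can be sketched with NumPy. PCA is carried out through a singular value decomposition of the centered point cloud, and the residual variance is $1 - \rho_k^2$. The default cap of 10 and the function names are illustrative assumptions; the text does not state the cap that was used.

```python
import numpy as np


def pairwise_distances(X):
    """Euclidean distances between rows of X for all pairs i < j."""
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    return D[np.triu_indices(len(X), k=1)]


def pca_project(X, k):
    """Coordinates of the rows of X in the top-k principal directions."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T


def approximate_embedding_dimension(X, cap=10, tol=0.05):
    """Smallest k with residual variance 1 - rho_k**2 < tol; returns `cap` otherwise."""
    d = pairwise_distances(X)
    for k in range(1, cap):
        rho = np.corrcoef(d, pairwise_distances(pca_project(X, k)))[0, 1]
        if 1 - rho ** 2 < tol:
            return k
    return cap
```

Points on a straight line give 1. Points on a circle in a plane of $\mathbb{R}^5$ give 2, because projecting a circle to one dimension distorts its pairwise distances substantially.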
5 Numerical Experiments
5.1 Experiments in parameter space
We construct Kleinberg-like small-world networks (as detailed in section 2) for a set of values of the number $N = n^2$ of nodes, the geometric degree $d_G$ (each value corresponding to a radius $w$), the nongeometric degree $d_{NG}$, and the distance-decay parameter $\alpha$. For each of these networks and for each threshold value $T$, we examine a WTM contagion (see section 3) with cluster seeding around each of its nodes and record the activation times of each node. We use the activation times of each node in each of these realizations as coordinates. In this way, we map the nodes of each network to a point cloud in $\mathbb{R}^N$ via the symmetric contagion map (see section 4.1).
We compute quantitative measures of the similarity of these point clouds to the underlying torus in terms of geometry (see section 4.2), topology (see section 4.3), and dimensionality (see section 4.4) when we place nongeometric edges uniformly at random (i.e., when $\alpha = 0$). We illustrate our results by separately displaying the values of the Pearson correlation coefficient $\rho$, the Wasserstein distance $W$, and the approximate embedding dimension $\hat{d}$ in parameter space (see Figure 4). When examining topological similarity, we only cover a single case, as computing PH of the VR filtration on a point cloud of 2500 points is extremely time-consuming because of the large number of simplices involved. Brighter regions in our plots signify larger Pearson correlation coefficients $\rho$ (in the geometry computation), smaller Wasserstein distances $W$ (in the topology computation), and lower approximate embedding dimensions $\hat{d}$. In each plot, we can identify a region in the parameter space for which $\rho$ is large and $W$ and $\hat{d}$ are small, indicating that WFP dominates for these parameter values.
The first column of each plot in Figure 4 (e.g., see the yellow bar in panel (a)) shows our results for $d_{NG} = 0$, which corresponds to a purely geometric network. In this case, network formation is deterministic, and we can determine the WTM dynamics analytically, in particular the presence versus absence of WFP (see section 6 for details). We see in the first column of each plot that $\rho$, $W$, and $\hat{d}$ take only extreme values for $d_{NG} = 0$ and that the transition between extreme values occurs at the same threshold for all three quantities. Below this threshold, the Pearson correlation coefficients are large and the Wasserstein distances and approximate embedding dimensions are small. Above this threshold, the Pearson correlation coefficients are small and $\hat{d}$ is large (at the cap). These values of $T$ also yield ‘infinite activation times’ of nodes, which are marked as such in the plot for the Wasserstein distance. The observations described in this paragraph are consistent with our analytical considerations, which demonstrate that spreading (by WFP) can occur only below this threshold.
There is a band along the transition between the region in which we expect WFP (see Figures 8–10) and the region in which we do not. This band is dark in the plots of the geometric and the topological structure, and it is bright in the plot of dimensionality. This implies that the point cloud is low-dimensional for the corresponding parameter combinations, but that it does not exhibit torus-like structure in terms of geometry or topology. Although this was not discussed in [33], one can also observe such a band for WTM contagions on the noisy ring lattices in that study.
Some irregularities and outliers in our figures are likely due to the probabilistic placement of nongeometric edges in our network construction. One example is the nonwhite spots in the white region of Figure 4(e). These correspond to parameter combinations for which our bifurcation analysis (see section 6) suggests that we should expect infinite activation times, but for which all nodes have finite activation times in practice in all realizations.
5.2 Effect of the distance-decay parameter on contagion maps
In our Kleinberg-like small-world networks (see section 2), recall that we regulate the range of nongeometric edges using a decay parameter. Each node has a fixed number of nongeometric stubs, and we connect two stubs that emanate from two nodes to form a nongeometric edge with a probability that is proportional to the periodic lattice distance between these nodes raised to the power of the negative of the decay parameter. When the decay parameter is 0, we match the nongeometric stubs uniformly at random, regardless of the distance between the corresponding nodes, so every node is equally likely to be the other endpoint of a nongeometric edge. When the decay parameter is positive, nongeometric edges have a bias to connect nodes that are close to each other with respect to the periodic lattice distance. This bias becomes more pronounced as the decay parameter increases, so larger values of the decay parameter tend to yield shorter nongeometric edges.
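The distance-dependent matching rule above can be sketched as follows. This is a simplified illustration, not the paper's construction: it draws a single partner for one stub independently, whereas the actual procedure pairs stubs and avoids double edges. It also assumes that the "periodic lattice distance" is the Euclidean distance with wraparound. The names `alpha`, `torus_distance`, `partner_weights`, and `sample_partner` are hypothetical.

```python
import math
import random


def torus_distance(a, b, L):
    """Euclidean distance between lattice sites a and b on an L x L torus with periodic boundaries."""
    dx = abs(a[0] - b[0])
    dy = abs(a[1] - b[1])
    return math.hypot(min(dx, L - dx), min(dy, L - dy))


def partner_weights(node, candidates, L, alpha):
    """Unnormalized weights distance**(-alpha); alpha = 0 gives every candidate weight 1."""
    return [torus_distance(node, c, L) ** (-alpha) for c in candidates]


def sample_partner(node, candidates, L, alpha, rng=random):
    """Draw one partner for a nongeometric stub of `node` with probability proportional to its weight."""
    return rng.choices(candidates, weights=partner_weights(node, candidates, L, alpha), k=1)[0]


L = 10
others = [(x, y) for x in range(L) for y in range(L) if (x, y) != (0, 0)]
print(set(partner_weights((0, 0), others, L, 0.0)))        # {1.0}: uniform matching
print(partner_weights((0, 0), [(1, 0), (5, 5)], L, 2.0))  # the nearby site gets the larger weight
```

With `alpha = 0` all candidates are equally likely, which matches the uniformly random case; positive values shift probability mass toward nearby sites and hence toward shorter nongeometric edges.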
The speed of a WTM contagion on a Kleinberg-like network depends significantly on the decay parameter [14, 11]. We examine the effect of the decay parameter on the shape of a contagion map, and we show our results for geometry in Figure 5, for topology in Figure 6, and for dimensionality in Figure 7. For fixed values of the geometric degree and nongeometric degree of our networks and of the threshold of our contagion, we vary the decay parameter. Specifically, we choose a geometric degree of , a nongeometric degree of , and one value of the contagion threshold for each predicted spreading regime when the decay parameter is 0. We use the value for the regime in which we expect both WFP and ANC, the value for the regime in which we expect WFP but no ANC, and the value for the regime in which we expect neither WFP nor ANC (in each case for a decay parameter of 0). See Figure 13 to locate these values in parameter space and thereby identify their associated spreading regimes. For each of these three threshold values, we let the decay parameter vary from 0 to in increments of . For each value of the decay parameter, we map the nodes of the associated network via the contagion map using the given threshold and analyze the resulting point cloud as described in section 4.
For the threshold in the regime with both WFP and ANC, the Pearson correlation coefficient increases substantially and almost linearly as we increase the decay parameter, while the Wasserstein distance and the approximate embedding dimension both decrease. This is because the nongeometric edges change in function from drivers of ANC to contributors to WFP. When the decay parameter is 0, we expect fast spreading of the contagion that is dominated by ANC. This leads to a contagion map whose image is a cluster of tightly bunched points that are distributed fairly evenly in the region that they occupy. In particular, the pairwise distances between the points are not influenced much by the pairwise distances between their corresponding nodes. Such a cluster of points has a high approximate embedding dimension because its points are distributed with roughly constant density across the region that they occupy, so the point cloud does not have a lower intrinsic dimension than its ambient space. As the decay parameter increases, the nongeometric edges tend to become shorter and contribute increasingly to WFP instead of facilitating spreading across large distances in a network. They thereby produce a point cloud that is still contained in a small volume (because the spreading is still fast with such a low threshold), but with pairwise distances between points that become increasingly faithful to the pairwise distances between their corresponding nodes.
For the threshold in the regime with WFP but no ANC, the Pearson correlation coefficient starts out large and increases further as the decay parameter increases. The Wasserstein distance is small throughout the range of the decay parameter, with a slight decrease at the lower end of the range. The approximate embedding dimension is for all values of the decay parameter that we considered. This relative stability of all three measures stems from the fact that, for this threshold, WFP dominates over ANC even when the decay parameter is 0, as the nongeometric edges are not sufficiently numerous to drive ANC. As the decay parameter increases, the gradually shortening nongeometric edges only contribute increasingly to WFP.
For the threshold in the regime with neither WFP nor ANC, the Pearson correlation coefficient remains fairly small and the approximate embedding dimension remains fairly large for all values of the decay parameter. The Wasserstein distance decreases steadily as we increase the decay parameter.
6 Bifurcation analysis
We conduct a bifurcation analysis of the spreading behavior of the WTM contagion (see section 3 for its definition), which we initialize with cluster seeding on the family of Kleinberg-like small-world networks that we described in section 2. The results of this bifurcation analysis provide a guideline for interpreting our numerical computations. We want to determine analytically which combinations of the network parameters (the geometric and nongeometric degrees; see section 2) and the threshold (see section 3) allow the contagion to spread by WFP and which allow it to spread by ANC. That is, we want to identify regions of parameter space in which the spreading dynamics follow specific regimes that are characterized by the presence or absence of WFP and ANC. We are especially interested in the region of parameter space in which there is WFP but no ANC, as this region should comprise the parameter combinations for which the contagion map exhibits structural features of a torus.
We consider exclusively because, at least locally and sufficiently early in the contagion process, the total size of a network should not affect contagion behavior. At later stages, a contagion that saturates a network speeds up earlier in smaller networks, as the active region of the network is then proportionately larger relative to the total network size. Additionally, we restrict our analysis to the case in which the decay parameter is 0 (i.e., in which we place nongeometric edges uniformly at random).
We fix the geometric degree and examine the spreading behavior as we vary the nongeometric degree and the threshold. The possible values of the geometric degree are constrained by the number of nodes that lie within a given radius of a node (see section 2). For a given radius, the geometric degree is one less than the number of integer lattice points that lie in the closed disk of that radius centered at the origin, because a node is not its own neighbor. This number is approximately equal to the area of the disk, and the problem of determining it is known as “Gauss’ Circle Problem” [17]. The three smallest values of the geometric degree are 4, 8, and 12, which correspond to radii of 1, √2, and 2, respectively, in the definition of our Kleinberg-like small-world network.
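The lattice-point count behind the admissible geometric degrees can be checked directly. A minimal sketch, assuming Euclidean distance on the square lattice (consistent with the radii 1, √2, and 2 above); the squared radius `r2` is an integer because squared lattice distances are integers, and the helper names are hypothetical.

```python
import math


def lattice_points_in_disk(r2):
    """Number of integer points (x, y) with x^2 + y^2 <= r2 (Gauss's circle problem)."""
    R = math.isqrt(r2)
    return sum(1 for x in range(-R, R + 1) for y in range(-R, R + 1) if x * x + y * y <= r2)


def geometric_degree(r2):
    """Geometric degree for radius sqrt(r2): lattice points in the closed disk, minus the node itself."""
    return lattice_points_in_disk(r2) - 1


def smallest_geometric_degrees(count):
    """The `count` smallest distinct geometric degrees, each paired with the smallest r2 producing it."""
    found, r2 = [], 1
    while len(found) < count:
        d = geometric_degree(r2)
        if not found or d > found[-1][0]:
            found.append((d, r2))
        r2 += 1
    return found


print(smallest_geometric_degrees(5))  # [(4, 1), (8, 2), (12, 4), (20, 5), (24, 8)]
print(lattice_points_in_disk(100), math.pi * 100)  # the count is close to the disk's area
```

Squared radii such as 3, 6, or 7 add no new lattice points, which is why the admissible geometric degrees form a sparse sequence.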
6.1 Wavefront Propagation (WFP)
We consider the geometric degrees 4, 8, and 12 individually, and for each we determine the maximum threshold for which a Kleinberg-like network with that geometric degree can support sustained spreading via only geometric edges.
If the geometric degree is 4 (see Figure 8), then for WFP to occur, the threshold needs to be small enough to allow spreading via a single edge. Therefore, for a variable nongeometric degree $d_{\mathrm{NG}}$, WFP can occur only if the threshold is smaller than
$$\frac{1}{4 + d_{\mathrm{NG}}}.$$
If the geometric degree is 8 (see Figure 9), then for WFP to occur, the threshold needs to be small enough to allow spreading via three edges. Therefore, for a variable nongeometric degree $d_{\mathrm{NG}}$, WFP can occur only if the threshold is smaller than
$$\frac{3}{8 + d_{\mathrm{NG}}}.$$
If the geometric degree is 12 (see Figure 10), then for WFP to occur, the threshold needs to be small enough to allow spreading via four edges. Therefore, for a variable nongeometric degree $d_{\mathrm{NG}}$, WFP can occur only if the threshold is smaller than
$$\frac{4}{12 + d_{\mathrm{NG}}}.$$
There does not seem to be a closed form for the maximum WFP threshold that holds for general values of the geometric degree. One needs to find the maximum threshold that allows spreading by WFP for each geometric degree individually by identifying the edges that can support spreading from the contagion seed.
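One way to start that case-by-case search is to count, for a node just outside a long straight edge of the active region, how many of its geometric neighbors are already active. The following is a heuristic sketch under that straight-front assumption (hypothetical names). It reproduces the counts of 1, 3, and 4 edges stated above for geometric degrees 4, 8, and 12. For larger radii, corners of the seed cluster may impose a different binding constraint, so any value that it produces should be checked, for example by simulation.

```python
import math


def front_support(r2):
    """Active geometric neighbors of a node just outside a long straight edge of an active region.

    The node sits at (0, 0); every site with x <= -1 is active.
    """
    R = math.isqrt(r2)
    return sum(1 for dx in range(-R, 0) for dy in range(-R, R + 1) if dx * dx + dy * dy <= r2)


def wfp_threshold_heuristic(r2, d_ng):
    """Heuristic WFP bound: front support divided by the total degree d_G + d_NG."""
    R = math.isqrt(r2)
    d_g = sum(1 for dx in range(-R, R + 1) for dy in range(-R, R + 1)
              if 0 < dx * dx + dy * dy <= r2)
    return front_support(r2) / (d_g + d_ng)


for r2 in (1, 2, 4):  # geometric degrees 4, 8, and 12
    print(r2, front_support(r2), wfp_threshold_heuristic(r2, d_ng=2))
```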
6.2 Appearance of new clusters (ANC)
The activation of an inactive node by ANC occurs, by definition, exclusively via nongeometric edges. That is, a node is activated via ANC if it is adjacent via nongeometric edges to enough active nodes to meet its activation condition while all of its geometric neighbors are inactive. Consequently, if the threshold is larger than or equal to the ratio of the nongeometric degree to the total degree, then ANC is impossible. If the threshold is smaller than this ratio, then ANC is possible. When the threshold is only slightly smaller than this ratio, ANC can occur in principle, but only if all of the nongeometric edges of an inactive node that has no active geometric neighbors ‘reach into’ contagion clusters. This is very unlikely in practice, so the fact that ANC is possible in principle for a given threshold is not a good practical indicator of the presence of ANC. We explore this issue below.
We define the horizon of ANC,
$$\frac{d_{\mathrm{NG}}}{d_{\mathrm{G}} + d_{\mathrm{NG}}},$$
where $d_{\mathrm{G}}$ and $d_{\mathrm{NG}}$ denote the geometric and nongeometric degrees, respectively, to be the boundary between thresholds for which ANC is possible in theory and thresholds for which ANC is impossible. Using the horizon as a boundary curve for ANC generates an ‘idealized’ bifurcation diagram that tends to overestimate the size of the region of parameter space in which ANC occurs.
In practice, one needs to consider the probability that a given number of a node's nongeometric edges reach into clusters of active nodes. If nongeometric edges are placed uniformly at random (which occurs when the decay parameter is 0 in the construction of our Kleinberg-like networks), the probability that a nongeometric edge of an inactive node is incident to an active node at time $t$ is approximately $n(t)/N$, where $n(t)$ is the number of active nodes at time $t$ and $N$ is the number of nodes in the network. Consequently, if the nongeometric degree is $d_{\mathrm{NG}}$, the expected number of nongeometric edges of an inactive node that are incident to active nodes is $d_{\mathrm{NG}}\, n(t)/N$. It follows that, with $d_{\mathrm{G}}$ denoting the geometric degree, the maximum threshold for which one can expect every node that is inactive before time $t$ to activate via ANC at time $t$ is
$$\frac{d_{\mathrm{NG}}\, n(t)}{(d_{\mathrm{G}} + d_{\mathrm{NG}})\, N}. \tag{4}$$
However, for ANC to occur at a certain time, it is not necessary for every inactive node to activate via ANC at that time. It suffices for any single inactive node that is sufficiently far away from the contagion to activate via ANC, and this can happen at thresholds that are larger than (4). Consequently, we expect (4) to be a lower bound for the critical threshold for ANC to occur.
The numerator in (4) depends linearly on the (time-dependent) number of active nodes. This raises the question of what is a sensible choice for the time (and hence for the number of active nodes). Intuitively, if ANC occurs towards the end of a spreading process, when large parts of the network are already active, then its contribution to the spreading of the contagion is minor and it has only a negligible distortive effect on the contagion map. Nodes that are infected via ANC late in a contagion process activate only slightly earlier than they would under spreading purely via WFP, so the corresponding points in the image of the contagion map are perturbed only slightly. To make (4) a meaningful bound for the critical threshold, we thus seek the latest point in the spreading process up to which the occurrence of ANC plays a significant role in the overall spreading behavior and accordingly has a noticeable effect on the contagion map. The later this point occurs, the larger the number of active nodes at that point and the larger we expect the critical threshold (i.e., the bifurcation point) to be. We have
[TABLE]
for some , where the parameter is determined by how late in the spreading process the occurrence of ANC plays a significant role. If, for instance, the occurrence of ANC plays a significant role in overall spreading behavior and thus has a noticeable distortive effect on a contagion map only if it takes place by the time that three fifths of the nodes in the network are active, then the bifurcation curve for ANC is bounded below as follows:
[TABLE]
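The resulting bracket can be tabulated quickly. The sketch below reads the lower estimate as the expected fraction of active neighbors of a far-away inactive node when a given fraction of all nodes is active (three fifths, as in the example above), and it uses the horizon of ANC as the upper estimate. This reading of the displayed bounds is an assumption, and the function name is hypothetical.

```python
def anc_bracket(d_g, d_ng, active_fraction=0.6):
    """Lower and upper estimates for the ANC bifurcation point with uniformly random nongeometric edges.

    lower: expected fraction of active neighbors of a far-away inactive node when `active_fraction`
           of all nodes are active (its expected number of active neighbors is d_ng * active_fraction).
    upper: the horizon of ANC, d_NG / (d_G + d_NG).
    """
    total = d_g + d_ng
    return d_ng * active_fraction / total, d_ng / total


for d_ng in range(1, 6):
    lower, upper = anc_bracket(8, d_ng)
    print(f"d_NG = {d_ng}: {lower:.3f} <= critical threshold <= {upper:.3f}")
```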
We compare the idealized bifurcation diagram (which uses the horizon of ANC as its bifurcation curve) and the diagram that we obtain from (4) when three fifths of the nodes in the network are active (see Figure 11) with our numerical results for geometry and dimensionality.
The above argument does not depend on the particular geometry that underlies a network, as long as the nongeometric edges are placed uniformly at random. In particular, the inequalities (5) should also hold for the ring lattices in the computations of Taylor et al. [33]. Indeed, in their results (see Figure 6 in [33]), their ANC curve (the dotted curve) seems to lie slightly above the bifurcation that they observed in their numerical computations, suggesting that the critical threshold for ANC is indeed bounded above by the idealized bifurcation curve.
If three fifths of the total number of nodes is indeed the right cutoff for the number of active nodes up to which the occurrence of ANC is ‘significant’, then one should expect the actual bifurcation curve to lie somewhere between the red curves in Figure 11.
To find the actual bifurcation curve for ANC, we need to find, for a given nongeometric degree, the largest threshold that realistically allows ANC to arise before the active region of the network is so large that the occurrence of ANC no longer has a significant impact on the spreading dynamics. This amounts to finding a threshold that is as large as possible but is still small enough that we obtain a value of at least for the expected number of inactive nodes with sufficiently many nongeometric edges that reach into the active region.
To avoid counting nodes that are activated via ANC on the boundary of the active region or very close to it, one can count only inactive nodes that lie ‘sufficiently far away’ from active nodes. We use the term neighborhood (our use of the term ‘neighborhood’ differs from its usual use in graph theory) for the set of nodes outside which we count nodes that are activated via ANC. The neighborhood can consist either of the active nodes only, in which case it coincides with the set of active nodes, or it can also include some additional nodes around the active region, in which case it strictly contains the set of active nodes.
At a given time, suppose that the nodes that are active at that time were activated at earlier time steps by WFP from the cluster seed and that they form an active cluster of roughly square shape. We denote this set of nodes by
[TABLE]
so . We define the neighborhood of this active cluster to be the cluster itself together with the nodes in its periphery of a certain ‘width’. That is, the neighborhood consists of the active cluster itself, its boundary, and (depending on the width) some more nodes around it. Given a width , we approximate the number of nodes in the neighborhood as
[TABLE]
We can choose any natural number for the width, and one plausible choice is . If the width is 0, the neighborhood is just the active cluster itself. To make a sensible choice for the size of the active region of a network at the latest point in the spreading process at which we consider ANC to be significant, we estimate the largest number of active nodes such that the active region together with a periphery of inactive nodes of the chosen width constitutes at most of the nodes in the network. We make this estimate by choosing to be the largest integer such that
[TABLE]
Consequently,
[TABLE]
For and , this gives .
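The estimate above can be computed numerically. This sketch assumes that approximation (6) takes the form of a square cluster of n nodes widened by the given width on every side, so that the neighborhood has about (√n + 2·width)² nodes. That form is our reading of the square-shaped construction, and the network size and widths below are purely illustrative.

```python
import math


def neighborhood_size(n_active, width):
    """Approximate node count of a square active cluster plus a periphery of the given width.

    Assumption: the cluster is a sqrt(n) x sqrt(n) square, widened by `width` nodes on every side.
    """
    return (math.sqrt(n_active) + 2 * width) ** 2


def max_active_nodes(N, width, fraction=0.6):
    """Largest integer n with neighborhood_size(n, width) <= fraction * N (0 if there is none)."""
    limit = fraction * N
    n = 0
    while neighborhood_size(n + 1, width) <= limit:
        n += 1
    return n


# Illustrative (hypothetical) values: a 50 x 50 torus and widths 0, 1, and 2.
for width in (0, 1, 2):
    print(width, max_active_nodes(2500, width))
```

A wider periphery leaves less room for the active cluster, so the admissible number of active nodes decreases as the width grows.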
If we consider ANC to be significant up to time , the corrected bifurcation curve for ANC is
[TABLE]
where is the largest integer such that, at time , the expected number of nodes outside the neighborhood of with more than active neighbors is at least .
Let be the number of nodes outside with more than active neighbors. It follows the binomial distribution
[TABLE]
so
[TABLE]
where is the number of edges of a node outside that are incident to an active node. Therefore, the expected number of nodes outside with more than active neighbors is
[TABLE]
As we argue in a remark at the end of this subsection, it is approximately the case that
[TABLE]
That is, approximately follows a binomial distribution. Its associated (approximate) cumulative distribution function is
[TABLE]
To determine the numerator of (see formula (9)), we seek the largest integer such that . That is, using the binomial approximation above, we seek the largest integer such that
[TABLE]
We can find for each value of and deduce from that value for a given . For , this yields the plots in Figure 12 for for various choices of the width of the neighborhood and the maximum value of at which we consider ANC to be significant.
Observe that the curves are increasing overall, but in an oscillatory manner, resulting in a staircase-like shape. This shape arises because, as the nongeometric degree increases from 0 in integer increments, it affects both the largest integer such that inequality (13) is satisfied and the denominator of formula (9). The larger the nongeometric degree, the larger the value of and the smaller the value of in the sum on the right-hand side of inequality (13), with the latter effect being dominant overall. This explains the overall increasing tendency of the curve. For a fixed numerator of formula (9), an increase of the nongeometric degree increases the denominator and thus steadily decreases the threshold, which explains the small intervals of decrease of the curve.
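The search for the numerator of formula (9) can be sketched as follows, which also makes the staircase behavior easy to reproduce. This sketch makes several labeled assumptions. First, each node outside the neighborhood has only nongeometric active neighbors, modeled as Binomial(d_NG, n/N); this holds when the width is at least the geometric radius. Second, the neighborhood size follows the square form (√n + 2·width)². Third, the cutoff on the expected count (`min_expected`, default 1) is a placeholder, because the cutoff value is not legible in the text. Following the text, the returned threshold is the integer found, divided by the total degree. All names and parameter values are hypothetical.

```python
import math


def binomial_tail(m, p, k):
    """P(X > k) for X ~ Binomial(m, p)."""
    return sum(math.comb(m, i) * p ** i * (1 - p) ** (m - i) for i in range(k + 1, m + 1))


def corrected_anc_threshold(N, d_g, d_ng, n_active, width, min_expected=1.0):
    """Sketch of a corrected ANC bifurcation point for one nongeometric degree (hypothetical helper).

    Finds the largest k such that the expected number of nodes outside the neighborhood with more
    than k active neighbors is at least `min_expected`, modeling each such node's active neighbors
    as X ~ Binomial(d_NG, n_active / N). Returns (k, k / (d_G + d_NG)), or (None, None) if no k works.
    """
    outside = N - (math.sqrt(n_active) + 2 * width) ** 2  # nodes outside the square neighborhood
    p = n_active / N  # probability that a nongeometric edge reaches an active node
    for k in range(d_ng, -1, -1):  # the expected count decreases in k, so scan downward
        if outside * binomial_tail(d_ng, p, k) >= min_expected:
            return k, k / (d_g + d_ng)
    return None, None


# Illustrative (hypothetical) values: N = 2500, d_G = 8, width 1, 900 active nodes.
for d_ng in range(1, 7):
    print(d_ng, corrected_anc_threshold(2500, 8, d_ng, 900, 1))
```

Printing the thresholds for consecutive nongeometric degrees shows the same mechanism as in the paragraph above: the numerator jumps occasionally, while the growing denominator lowers the threshold in between.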
Note again that our central reasoning in the above argument does not depend on the geometry that underlies our noisy geometric network, provided that we place nongeometric edges uniformly at random. The particular geometry of the 2D torus enters only through the approximation (6), which we obtain by assuming that the contagion cluster is roughly square-shaped and that its neighborhood forms a larger square-shaped area of some width around it. If we take this width to be 0 (i.e., if the neighborhood consists only of the active cluster), the formula for the expectation (10) is the same for any noisy geometric network.
We summarize our results in the bifurcation diagram in Figure 13. We overlay our results for geometry, topology, and dimensionality on this diagram in Figure 14.
Remark: The random variable follows a binomial distribution only approximately because, as our network model avoids double edges, the event that one nongeometric edge of a node is incident to an active node is not entirely independent of the event that another of this node’s nongeometric edges is incident to an active node. Consequently, if we want to determine the probability that a node outside has nongeometric edges that reach into the contagion cluster (i.e., ), we have to pick of the node’s nongeometric edges and calculate consecutively, for each , the probability that is incident to an active (if ) or inactive (if ) node, given the states of the incident nodes of the with .
The precise probability of a node having nongeometric edges that are incident to an active node is thus
[TABLE]
However, for large networks, it is reasonable to approximate the probability that a given nongeometric edge of an inactive node is incident to an active node by the number of active nodes divided by the total number of nodes. Consequently, the number of a node’s nongeometric edges that are incident to active nodes asymptotically follows a binomial distribution, and the binomial approximation above is asymptotically correct.
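The convergence claimed in this remark can be illustrated numerically. The precise formula is not legible here, so this sketch substitutes a simplified model: a node’s nongeometric partners are drawn uniformly without replacement from the other N − 1 nodes, n of which are active. That model gives a hypergeometric count, which this sketch compares with the Binomial(d_NG, n/N) approximation. The model ignores details such as the exclusion of geometric neighbors, and the helper names are hypothetical.

```python
import math


def hypergeometric_pmf(n_other, n_active, d, k):
    """P(k of d distinct partners are active) when partners are drawn without replacement."""
    return math.comb(n_active, k) * math.comb(n_other - n_active, d - k) / math.comb(n_other, d)


def binomial_pmf(d, p, k):
    """P(k successes) for Binomial(d, p)."""
    return math.comb(d, k) * p ** k * (1 - p) ** (d - k)


def max_pmf_gap(N, active_fraction, d_ng):
    """Largest pointwise gap between the without-replacement model and Binomial(d_NG, n/N)."""
    n = round(active_fraction * N)
    return max(abs(hypergeometric_pmf(N - 1, n, d_ng, k) - binomial_pmf(d_ng, n / N, k))
               for k in range(d_ng + 1))


for N in (100, 1_000, 10_000):
    print(N, max_pmf_gap(N, active_fraction=0.36, d_ng=4))
```

The gap shrinks as N grows with the active fraction held fixed, consistent with the asymptotic statement above.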
7 Conclusions and Discussion
Networks that have some underlying geometry and include both geometric edges (which are short according to that geometry) and nongeometric edges (which can occur between nodes regardless of their distance from each other) arise in many applications [3], including modeling of human communication and transportation. The spread of a contagion on such a network can be influenced heavily by the underlying geometry, and it is useful to investigate the strength of such influence.
To study this problem, we considered a family of networks whose nodes can be considered to lie on a 2D torus and whose edges include both geometric edges (which are deterministic and connect nodes that are close to each other in the underlying geometry) and nongeometric edges (which are formed randomly and can connect nodes that are far from each other in the underlying geometry). Using the Watts threshold model, we investigated the spreading behavior of contagions on this family of networks. We did so by mapping network nodes to a high-dimensional point cloud via a contagion map (following Taylor et al. [33]) and analyzing the structure of this point cloud from three perspectives: geometrically, topologically, and in terms of dimensionality. To examine the point cloud’s geometry and dimensionality, we calculated a Pearson correlation coefficient and the embedding dimension, which are well-established measures and easy to compute. To study the topology of the point cloud, we computed persistent homology of the Vietoris–Rips filtration and then calculated the Wasserstein distance between the corresponding barcode and a reference barcode. This was the most challenging and time-consuming part of our work, as algorithms for computing PH are computationally expensive and software for PH is still relatively young and evolving. We therefore restricted ourselves to computing PH in dimension , although PH in dimension 2 may also be insightful for our problem. In our analysis of the topological structure of the point clouds, we also illustrated the sensitivity of the Wasserstein distance to the overall scale of barcodes and the ensuing need to correct for geometric factors when using the Wasserstein distance as a measure of purely topological similarity. We proposed a way of calibrating barcodes to eliminate geometric factors and found that our method was effective at quantifying how much a point cloud resembles a torus.
Our main finding is that the computational analysis of the point clouds that are the images of contagion maps aligns with the bifurcation analysis of spreading behaviors in the combined parameter space of network and contagion (see Figure 14). We also found that the nature of the nongeometric edges affects contagion maps in the expected way: the shorter these edges are likely to be, the more they appear to contribute to the wave-like spreading of a contagion along the underlying torus and the more torus-like the structure of the point cloud is. This provides empirical evidence to support the effectiveness of contagion maps as a tool for estimating spreading behavior. It also suggests the potential of contagion maps as a manifold-learning technique that has some robustness to noise, a direction that is explored in [23].
The topological analysis of contagion maps was the most challenging part of our work, and there is room for further investigation of this aspect with respect to both computational experiments and mathematical theory. First, for our focal network family (namely, Kleinberg-like small-world networks), it may be insightful to examine PH in dimension 2 of the VR filtration, as there are nontrivial topological features to consider in two dimensions. Second, one might challenge our methodology for quantitatively measuring the topology of the contagion map itself: both the VR complex and the Wasserstein distance, despite being used as topological tools, are intrinsically geometric in nature. For the Wasserstein distance, we have addressed this issue by calibrating barcodes in a preprocessing step. Regarding the VR filtration itself, we note that its construction is based purely on pairwise distances, just like the Pearson correlation coefficient that we used as a measure of geometry. Although the PH of the VR filtration provides a richer, more nuanced summary of the set of pairwise distances than the Pearson correlation coefficient, this information is condensed into a single number when we calculate the Wasserstein distance from the associated barcode to a reference barcode. One may therefore question how much additional information one gains from our topological measure, especially in light of its computational cost compared with that of the easily computed Pearson correlation coefficient. To understand this issue, it will likely be useful to explore how our geometric and topological measures relate to each other and whether such a relation can be exploited to speed up computations of our topological measure.
In future work, it will be insightful to use the approach of the present paper to study various other monotonic spreading processes, including those with stochastic update rules (such as compartmental models for diseases), on networks that have some underlying geometric structure (including ones that are more complicated than a torus).
Acknowledgements
The author would like to thank Ulrike Tillmann for her advice on this project and for many useful comments on various versions of this paper. In addition, the author would like to thank Florian Klimm and Dane Taylor for helpful discussions at an early stage of this project.
Supplementary Materials
We present the mathematical background for the methodology that we used in the main text to construct a topological measure of how closely the point clouds that we obtain from contagion maps resemble a torus (see section 4.3 of the main text). For proofs and further discussion of this theory, see [12]. For a condensed and accessible introduction, see [25].
8 Simplicial Homology
Definition 8.1.
An abstract finite simplicial complex $K$ is a finite collection of finite ordered sets that is closed under inclusion: whenever $\sigma \in K$ and $\emptyset \neq \tau \subseteq \sigma$, it follows that $\tau \in K$.
The elements of $K$ are called simplices. A face of a simplex $\sigma$ is a nonempty proper subset $\tau \subsetneq \sigma$. The dimension of a simplex $\sigma$ is $\dim \sigma = |\sigma| - 1$. The 0-dimensional simplices are called vertices, and we denote the set of vertices by $V(K)$. The dimension of a simplicial complex is the maximum of the dimensions of the simplices that it contains. A simplicial subcomplex of a simplicial complex $K$ is a subcollection of simplices of $K$ that is itself a simplicial complex. For $k \geq 0$, the $k$-skeleton of a simplicial complex is the union of its simplices of dimension at most $k$.
To each simplex, we can assign a polytope, which is called its geometric realization. A 0-simplex corresponds to a vertex, a 1-simplex corresponds to an edge, a 2-simplex corresponds to a triangle, a 3-simplex corresponds to a tetrahedron, and so on. A simplicial complex can thereby be represented as a subset of the simplex that is spanned by its vertices. See Figure 15 for examples of geometric realizations.
Let $K$ be a simplicial complex, let $k \geq 0$ be an integer, and let $\mathbb{F}$ be a field. A $k$-chain is a linear combination of $k$-simplices in $K$ with coefficients in $\mathbb{F}$. We can turn the set $C_k(K)$ of $k$-chains into a vector space by defining addition and scalar multiplication to be component-wise. In topological data analysis, the most common field is $\mathbb{Z}/2\mathbb{Z}$; in this case, a $k$-chain can be interpreted as a collection of $k$-simplices in $K$. When working over $\mathbb{Z}/2\mathbb{Z}$, addition is equivalent to taking the symmetric difference. It is straightforward to check that $C_k(K)$ satisfies the axioms of a vector space with this definition of addition and scalar multiplication and that the zero vector is the empty set.
The boundary of a $k$-simplex is the alternating sum of its $(k-1)$-dimensional faces. The boundary of a $k$-chain is the sum of the boundaries of its simplices. The boundary of a $k$-chain is a $(k-1)$-chain, so taking boundaries defines a function $\partial_k\colon C_k(K) \to C_{k-1}(K)$. This function commutes with vector addition and scalar multiplication on $C_k(K)$. That is, $\partial_k$ is a linear map; it is called the boundary operator. We thus have a sequence of vector spaces that are connected by boundary operators:
$$\cdots \xrightarrow{\partial_{k+2}} C_{k+1}(K) \xrightarrow{\partial_{k+1}} C_k(K) \xrightarrow{\partial_k} C_{k-1}(K) \xrightarrow{\partial_{k-1}} \cdots \xrightarrow{\partial_1} C_0(K) \xrightarrow{\partial_0} 0. \tag{14}$$
A $k$-chain in the image of $\partial_{k+1}$ is called a $k$-boundary. A $k$-chain in the kernel of $\partial_k$ is called a $k$-cycle (see Figure 16). A fundamental property of the boundary operator is that the boundary of a boundary is the zero vector. Consequently, the sequence (14) is a chain complex.
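Lemma 8.2.
(Fundamental Lemma of Homology) For any integer $k \geq 0$ and any $(k+1)$-chain $c$, we have that $\partial_k \partial_{k+1} c = 0$.
That is, the $k$th boundary space $B_k(K) = \operatorname{im} \partial_{k+1}$ is a subspace of the $k$th cycle space $Z_k(K) = \ker \partial_k$, so the following definition makes sense.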
Definition 8.3.
Given a simplicial complex $K$ and an integer $k \geq 0$, the $k$th homology $H_k(K)$ is the quotient vector space of the $k$th cycle space $Z_k(K)$ by the $k$th boundary space $B_k(K)$:
$$H_k(K) = Z_k(K) / B_k(K).$$
The $k$th Betti number $\beta_k(K)$ is the dimension of the $k$th homology of $K$:
$$\beta_k(K) = \dim H_k(K).$$
Two $k$-cycles represent the same element of the $k$th homology if they differ only by a $k$-boundary. Roughly speaking, $\beta_k(K)$ is the number of $k$-dimensional “holes” of the space $K$. For example, $\beta_0(K)$ is the number of connected components, $\beta_1(K)$ is the number of “tunnels,” and $\beta_2(K)$ is the number of “voids” of the geometric realization of $K$.
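Because $\dim Z_k = \dim C_k - \operatorname{rank} \partial_k$ and $\dim B_k = \operatorname{rank} \partial_{k+1}$, Definition 8.3 gives $\beta_k = \dim C_k - \operatorname{rank} \partial_k - \operatorname{rank} \partial_{k+1}$. The sketch below computes Betti numbers over $\mathbb{Z}/2\mathbb{Z}$ in that way, with boundary matrices stored as integer bitmasks. It is a small illustrative implementation with hypothetical names, not the software that was used in the main text.

```python
from itertools import combinations


def closure(maximal_simplices):
    """All nonempty faces of the given simplices, so the result is closed under inclusion."""
    faces = set()
    for s in maximal_simplices:
        s = tuple(sorted(s))
        for k in range(1, len(s) + 1):
            faces.update(combinations(s, k))
    return faces


def rank_mod2(rows):
    """Rank over Z/2 of a matrix whose rows are encoded as integer bitmasks."""
    rows = [r for r in rows if r]
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot == 0:
            continue
        rank += 1
        low = pivot & -pivot  # lowest set bit of the pivot row
        rows = [r ^ pivot if r & low else r for r in rows]  # eliminate that bit from the other rows
    return rank


def betti_numbers(maximal_simplices):
    """Betti numbers over Z/2: beta_k = dim C_k - rank(d_k) - rank(d_{k+1})."""
    simplices = closure(maximal_simplices)
    top = max(len(s) for s in simplices) - 1
    by_dim = {k: sorted(s for s in simplices if len(s) == k + 1) for k in range(top + 2)}
    index = {k: {s: i for i, s in enumerate(by_dim[k])} for k in by_dim}

    def boundary_rank(k):
        # rank of d_k: C_k -> C_{k-1}; each k-simplex contributes one bitmask row of its faces
        if k == 0 or not by_dim.get(k):
            return 0
        rows = []
        for s in by_dim[k]:
            mask = 0
            for face in combinations(s, k):
                mask |= 1 << index[k - 1][face]
            rows.append(mask)
        return rank_mod2(rows)

    return [len(by_dim[k]) - boundary_rank(k) - boundary_rank(k + 1) for k in range(top + 1)]


print(betti_numbers([(0, 1), (1, 2), (0, 2)]))  # [1, 1]: one component, one loop
print(betti_numbers([(0, 1, 2)]))               # [1, 0, 0]: filling the triangle kills the loop
```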
Persistent Homology
Definition 8.4.
A finite filtration of a finite simplicial complex $K$ is a nested sequence of simplicial subcomplexes of $K$ such that the 0th member of the sequence is the empty complex and the last member of it is all of $K$. That is,
$$\emptyset = K_0 \subseteq K_1 \subseteq \cdots \subseteq K_m = K.$$
A filtration of a simplicial complex $K$ can be viewed as the construction of $K$ from the empty set by sequentially adding collections of simplices.
Given a filtration of a simplicial complex $K$, for every $0 \leq i \leq j \leq m$ and dimension $k$, the inclusion map from $K_i$ to $K_j$ induces a linear map
$$f_k^{i,j}\colon H_k(K_i) \to H_k(K_j).$$
Therefore, there is a sequence of homologies that are related via these linear maps:
$$0 = H_k(K_0) \to H_k(K_1) \to \cdots \to H_k(K_m) = H_k(K).$$
One can track the evolution of $K$ along the filtration through the algebraic structures of the homologies in this sequence.
We can generalize the notion of homology to the setting of a filtration of a simplicial complex.
Definition 8.5.
For an integer $k \geq 0$, the $k$th persistent homologies (PH) are the images of the linear maps induced by inclusion:
$$H_k^{i,j} = \operatorname{im} f_k^{i,j} \quad \text{for } 0 \leq i \leq j \leq m.$$
The $k$th persistent Betti numbers are the dimensions of these spaces: $\beta_k^{i,j} = \dim H_k^{i,j}$.
Using Definition 8.5, we can formalize the notion of birth and death of a homology class.
Definition 8.6.
A nonzero homology class $\gamma \in H_k(K_i)$ is born at $K_i$ if $\gamma \notin H_k^{i-1,i}$, and it dies at $K_j$ if $f_k^{i,j-1}(\gamma) \notin H_k^{i-1,j-1}$ but $f_k^{i,j}(\gamma) \in H_k^{i-1,j}$.
For $i \leq j$, the $k$th PH $H_k^{i,j}$ can be interpreted as the space that consists of all homology classes that are born at or before $K_i$ and are still alive at $K_j$.
We can now give a mathematically rigorous definition of persistence.
Definition 8.7.
Let $\emptyset = K_0 \subseteq K_1 \subseteq \cdots \subseteq K_m = K$ be a filtration of a simplicial complex $K$, and let $k \geq 0$ be an integer. If $\gamma$ is a $k$-dimensional homology class that is born at $K_i$ and dies at $K_j$, then its persistence (also known as its “lifespan”) is the difference $\operatorname{pers}(\gamma) = j - i$. If $\gamma$ never dies, we set its death time to be infinite (i.e., $j = \infty$), which makes its persistence infinite as well (i.e., $\operatorname{pers}(\gamma) = \infty$). The interval $[i, j)$ is called a persistence interval.
Barcodes, Persistence Diagrams, and Wasserstein Distance
It is a fundamental theorem [38] that one can find homology classes $\gamma_1, \ldots, \gamma_r$ (with $\gamma_l \in H_k(K_{b_l})$, where $b_l$ is the birth time of $\gamma_l$ and $d_l$ is the death time of $\gamma_l$) such that, for each $i \leq j$, the set
$$\{ f_k^{b_l, j}(\gamma_l) : b_l \leq i \text{ and } j < d_l \}$$
forms a basis for $H_k^{i,j}$. If the number of births (respectively, deaths) of homology classes at $K_i$ exceeds the number of deaths (respectively, births) at $K_i$, then $\beta_k(K_i)$ increases (respectively, decreases) relative to $\beta_k(K_{i-1})$. Tracking the change of Betti numbers during a filtration is thus useful for monitoring the topological evolution of the growing complex. A topological feature that emerges with the birth of a homology class $\gamma$ and disappears with its death is said to “correspond to” $\gamma$ and has persistence $\operatorname{pers}(\gamma)$. The features that persist for a long time interval are usually considered to be the defining features of the complex, although this is not always the case.
The collection of persistence intervals corresponding to elements in $H_k$ can be viewed as the filtered analogue of the Betti number $\beta_k$. Such persistence intervals can be represented as barcodes or as persistence diagrams (see Figure 17).
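Definition 8.8.
The persistence diagram of the $k$th PH of a filtered simplicial complex $K$ is the multiset
$$\operatorname{Dgm}_k(K) = \{ (b_l, d_l) \}_{l} \cup \Delta,$$
where the pairs $(b_l, d_l)$ are the birth and death times of the $k$-dimensional persistence intervals and $\Delta = \{ (x, x) : x \in \mathbb{R} \}$ is the diagonal, each point of which is counted with infinite multiplicity.
Note that all persistence diagrams have equal cardinality. For dimension $k$, the collection of persistence intervals corresponding to elements in $H_k$ is called a barcode (see Figure 17).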
One can turn the space of persistence diagrams (equivalently, the space of barcodes) into an extended metric space by defining the following notion of distance between two persistence diagrams.
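Definition 8.9.
Given two persistence diagrams $D_1$ and $D_2$, an extended metric $d$ on $\overline{\mathbb{R}}^2$, and a number $p \geq 1$, the $p$th Wasserstein distance between $D_1$ and $D_2$ is
$$W_p[d](D_1, D_2) = \inf_{\eta\colon D_1 \to D_2} \Big( \sum_{x \in D_1} d\big(x, \eta(x)\big)^p \Big)^{1/p},$$
where $\eta$ ranges over all bijections from $D_1$ to $D_2$.
If $d$ is the $L^\infty$ metric and $p = \infty$, where $W_\infty[d](D_1, D_2) = \inf_{\eta} \sup_{x \in D_1} d\big(x, \eta(x)\big)$, the Wasserstein distance is called the bottleneck distance.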
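For very small diagrams, Definition 8.9 can be evaluated by brute force. The sketch uses the standard reduction of the infinite-multiplicity diagonal to a finite problem: each diagram is augmented with the diagonal projections of the other diagram's points, and matching two diagonal points costs nothing. It uses the $L^\infty$ ground metric (one common choice; the definition allows any extended metric), handles finite points only, and enumerates all bijections, so it is suitable only for a handful of points. It is not the software that was used in the main text, and the names are hypothetical.

```python
import math
from itertools import permutations


def linf(a, b):
    """L-infinity distance between two points in the plane."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def diagonal_projection(pt):
    """Closest diagonal point to pt in the L-infinity metric."""
    m = (pt[0] + pt[1]) / 2
    return (m, m)


def wasserstein(D1, D2, p=2.0):
    """p-Wasserstein distance between two small, finite persistence diagrams (brute force)."""
    A = [(pt, False) for pt in D1] + [(diagonal_projection(q), True) for q in D2]
    B = [(pt, False) for pt in D2] + [(diagonal_projection(q), True) for q in D1]

    def cost(a, b):
        (pa, a_on_diag), (pb, b_on_diag) = a, b
        return 0.0 if (a_on_diag and b_on_diag) else linf(pa, pb)

    best = math.inf
    for perm in permutations(range(len(B))):  # every bijection between the augmented diagrams
        best = min(best, sum(cost(A[i], B[j]) ** p for i, j in enumerate(perm)))
    return best ** (1 / p)


print(wasserstein([(0, 2)], [(0, 3)]))  # 1.0: matching the two points beats sending both to the diagonal
```

Replacing the sum by a maximum in the same loop gives the bottleneck distance.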
One property of PH that is central to its utility in applications is that it is stable and therefore robust to noise: A small perturbation of input data induces only a small perturbation of the corresponding persistence diagram with respect to the bottleneck distance.
Čech Complex and Vietoris–Rips Complex
Persistent homology is a useful tool for analyzing point-cloud data. A point cloud is a finite set of points,
$$X = \{x_0, x_1, \dots, x_n\},$$
in a metric space $(M, d)$. One can view $X$ as a sample from some subspace $Y$ of $M$. There are various ways to construct a simplicial complex from a point cloud. Two of the most common constructions are the Čech complex and the Vietoris–Rips complex, which we now describe.
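Given a nonnegative number $\varepsilon$, recall that the closed ball of radius $\varepsilon$ around a point $x \in M$ is the set of points within distance $\varepsilon$ from $x$; that is, $\bar{B}_\varepsilon(x) = \{y \in M \mid d(x, y) \le \varepsilon\}$.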
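Definition 8.10.
*Let $(M, d)$ be a metric space, and let $X$ be a point cloud in $M$. For $\varepsilon \ge 0$, the Čech complex $\check{C}_\varepsilon(X)$ at $\varepsilon$ associated with $X$ is the simplicial complex whose simplices are sets of points in $X$ whose closed $(\varepsilon/2)$-balls have nonempty intersection. The set $\sigma = \{x_{i_0}, \dots, x_{i_k}\}$ is a $k$-simplex of $\check{C}_\varepsilon(X)$ if and only if $\bigcap_{j=0}^{k} \bar{B}_{\varepsilon/2}(x_{i_j}) \neq \emptyset$.*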
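This set of simplices is closed under taking subsets, so the conditions for a simplicial complex are indeed satisfied. The $0$-simplices of $\check{C}_\varepsilon(X)$ correspond precisely to the points $x_i \in X$; the $1$-simplices are the pairs of points that are within distance $\varepsilon$ of each other; the $2$-simplices are the triples of points whose $(\varepsilon/2)$-balls have nonempty intersection; and so on. The Čech complex $\check{C}_0(X)$ is the set of points in $X$. For sufficiently large $\varepsilon$, the Čech complex $\check{C}_\varepsilon(X)$ is the $n$-dimensional simplex together with all of its faces. (It suffices to take $\varepsilon \ge 2\operatorname{diam}(X)$, because then the point $x_0$ lies in every closed $(\varepsilon/2)$-ball around the points of $X$. The weaker condition $\varepsilon \ge \operatorname{diam}(X)$ does not suffice in general: for three points with pairwise distance $1$ in the Euclidean plane, the three closed $(1/2)$-balls have no common point, so $\check{C}_1(X)$ has no $2$-simplex.) If $\varepsilon \le \varepsilon'$, then $\check{C}_\varepsilon(X) \subseteq \check{C}_{\varepsilon'}(X)$, and increasing $\varepsilon$ incrementally from $0$ to a large enough value gives a filtration of Čech complexes associated with the point cloud $X$:
$$\check{C}_0(X) \subseteq \check{C}_{\varepsilon_1}(X) \subseteq \cdots \subseteq \check{C}_{\varepsilon_m}(X), \qquad 0 < \varepsilon_1 < \cdots < \varepsilon_m.$$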
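In the Euclidean plane, the intersection test in Definition 8.10 has a simple geometric form: closed balls of a common radius $r$ share a point if and only if the smallest disk that encloses their centers has radius at most $r$. The sketch below uses that fact to build the simplices of dimension at most $2$ of $\check{C}_\varepsilon(X)$. It assumes $M = \mathbb{R}^2$ with the Euclidean metric; the function names are hypothetical, and this is an illustration rather than the construction used in the paper.

```python
import itertools
import math
from typing import List, Sequence, Tuple

Pt = Tuple[float, float]


def _min_enclosing_radius(pts: Sequence[Pt]) -> float:
    """Radius of the smallest closed disk containing 1, 2, or 3 points of the plane."""
    if len(pts) == 1:
        return 0.0
    if len(pts) == 2:
        return math.dist(pts[0], pts[1]) / 2
    p, q, s = pts
    a, b, c = sorted([math.dist(q, s), math.dist(p, s), math.dist(p, q)])
    if c * c >= a * a + b * b:
        # Right, obtuse, or degenerate triangle: the longest side is a diameter.
        return c / 2
    # Acute triangle: the circumcircle is the smallest enclosing circle.
    area = abs((q[0] - p[0]) * (s[1] - p[1]) - (s[0] - p[0]) * (q[1] - p[1])) / 2
    return a * b * c / (4 * area)


def cech_2skeleton(X: List[Pt], eps: float) -> List[Tuple[int, ...]]:
    """Simplices of dimension <= 2 of the Čech complex at eps (closed balls of radius eps/2)."""
    simplices = []
    for k in (1, 2, 3):  # vertices, edges, triangles
        for idx in itertools.combinations(range(len(X)), k):
            if _min_enclosing_radius([X[i] for i in idx]) <= eps / 2:
                simplices.append(idx)
    return simplices


# Equilateral triangle with side 1, so diam(X) = 1.
X = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]
print(cech_2skeleton(X, 1.01))  # all three edges, but no 2-simplex yet
print(cech_2skeleton(X, 1.2))   # now includes the 2-simplex (0, 1, 2)
```

For this triangle, the $2$-simplex enters only when $\varepsilon/2$ reaches the circumradius $1/\sqrt{3}$, that is, at $\varepsilon = 2/\sqrt{3} \approx 1.155$, which is larger than $\operatorname{diam}(X) = 1$.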
For $M = \mathbb{R}^N$ with the Euclidean metric, the following result, known as the Nerve Theorem, states that the Čech complex associated with a point cloud $X$ is topologically faithful to the union of the closed $(\varepsilon/2)$-balls around the points in $X$, in the sense that it has the same homotopy type. Intuitively, two spaces have the same homotopy type if they can be transformed into each other by bending, compressing, and expanding them (without having to do any cutting or gluing).
Theorem 8.11 (Nerve Theorem).
*For a point cloud $X \subset \mathbb{R}^N$ and $\varepsilon \ge 0$, the Čech complex $\check{C}_\varepsilon(X)$ is homotopy equivalent to the union of the closed $(\varepsilon/2)$-balls around the points in $X$.*
The Nerve Theorem explains why, when a point cloud $X$ is a sample of some subspace $Y$ of $\mathbb{R}^N$, the Čech filtration can reveal features of this subspace. Features that have a long lifespan in the Čech filtration are likely to correspond to features of the underlying space $Y$.
Whether the balls of a certain radius around a set of points in $X$ have a point of common intersection depends on the entire metric space $M$ and the position of $X$ in it. Checking for a point of common intersection is computationally intensive, so it can be impractical (or even infeasible) to construct the Čech filtration associated with a point cloud.
The following construction of a simplicial complex from a point cloud depends only on the pairwise distances of the points. Therefore, it is computationally more efficient than constructing a Čech filtration and hence more useful in practice.
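Definition 8.12.
*Let $(M, d)$ be a metric space, and let $X$ be a point cloud in $M$. For $\varepsilon \ge 0$, the Vietoris–Rips (VR) complex $\mathrm{VR}_\varepsilon(X)$ at $\varepsilon$ associated with $X$ is the simplicial complex whose simplices are sets of points in $X$ that are pairwise within distance $\varepsilon$. The set $\sigma = \{x_{i_0}, \dots, x_{i_k}\}$ is a $k$-simplex of $\mathrm{VR}_\varepsilon(X)$ if and only if $d(x_{i_j}, x_{i_l}) \le \varepsilon$ for all $j, l \in \{0, \dots, k\}$.*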
As with $\check{C}_\varepsilon(X)$, the $0$-simplices of $\mathrm{VR}_\varepsilon(X)$ correspond precisely to the points $x_i \in X$, and the $1$-simplices of $\mathrm{VR}_\varepsilon(X)$ are the pairs of points that are within distance $\varepsilon$ of each other. Therefore, the $1$-skeleton of $\mathrm{VR}_\varepsilon(X)$ is the same as that of $\check{C}_\varepsilon(X)$. In the definition of higher-dimensional simplices, only the pairwise distances of points play a role. The simplicial complex $\mathrm{VR}_\varepsilon(X)$ is the maximal simplicial complex that can be built on its $1$-skeleton; its $k$-simplices are the $(k+1)$-cliques of its $1$-skeleton. Consequently, the $1$-skeleton of $\mathrm{VR}_\varepsilon(X)$ completely determines the entire simplicial complex. This is an attractive quality from a computational point of view, because it implies that one can store a VR complex as a graph (its $1$-skeleton).
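The clique description can be turned directly into code: store only the $1$-skeleton as a graph, then read off each $k$-simplex as a $(k+1)$-clique. The following is a minimal sketch, assuming Euclidean coordinates (via `math.dist`) and naive enumeration of vertex subsets; `vr_complex` and `max_dim` are hypothetical names, and practical software uses far more efficient clique expansion.

```python
import itertools
import math
from typing import Dict, List, Sequence, Set, Tuple


def vr_complex(X: Sequence[Sequence[float]], eps: float, max_dim: int = 2) -> List[Tuple[int, ...]]:
    """Vietoris–Rips complex at eps, up to dimension max_dim, built from its 1-skeleton."""
    n = len(X)
    # 1-skeleton stored as adjacency sets: the graph that determines the whole complex.
    nbrs: Dict[int, Set[int]] = {i: set() for i in range(n)}
    for i, j in itertools.combinations(range(n), 2):
        if math.dist(X[i], X[j]) <= eps:
            nbrs[i].add(j)
            nbrs[j].add(i)
    simplices: List[Tuple[int, ...]] = [(i,) for i in range(n)]
    for k in range(1, max_dim + 1):
        for idx in itertools.combinations(range(n), k + 1):
            # idx spans a k-simplex iff it is a (k+1)-clique of the 1-skeleton.
            if all(b in nbrs[a] for a, b in itertools.combinations(idx, 2)):
                simplices.append(idx)
    return simplices


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(len(vr_complex(square, 1.01)))  # 4 vertices + 4 sides; the diagonals are too long
```

Note that no geometric intersection test appears anywhere: unlike the Čech construction, only pairwise distances are evaluated.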
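If $\varepsilon \le \varepsilon'$, then $\mathrm{VR}_\varepsilon(X) \subseteq \mathrm{VR}_{\varepsilon'}(X)$. Increasing $\varepsilon$ incrementally from $0$ to a value larger than the maximum distance between any pair of points in $X$ gives a filtration of VR complexes whose $0$th member is the collection of $0$-simplices and whose final member is the $n$-dimensional simplex together with its faces. That is,
$$\mathrm{VR}_0(X) \subseteq \mathrm{VR}_{\varepsilon_1}(X) \subseteq \cdots \subseteq \mathrm{VR}_{\varepsilon_m}(X), \qquad 0 < \varepsilon_1 < \cdots < \varepsilon_m.$$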
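Because a simplex belongs to $\mathrm{VR}_\varepsilon(X)$ exactly when $\varepsilon$ is at least its diameter, each simplex enters the VR filtration at its diameter. The sketch below (again assuming Euclidean coordinates, with the hypothetical name `vr_filtration`) lists every simplex up to a chosen dimension together with that entry value; sorting by value and then by dimension guarantees that each face appears before any simplex that contains it.

```python
import itertools
import math
from typing import List, Sequence, Tuple


def vr_filtration(X: Sequence[Sequence[float]], max_dim: int = 2) -> List[Tuple[float, Tuple[int, ...]]]:
    """(entry value, simplex) pairs in filtration order; the entry value is the simplex diameter."""
    n = len(X)
    out: List[Tuple[float, Tuple[int, ...]]] = []
    for k in range(max_dim + 1):
        for idx in itertools.combinations(range(n), k + 1):
            diam = max(
                (math.dist(X[a], X[b]) for a, b in itertools.combinations(idx, 2)),
                default=0.0,  # a single vertex is present from eps = 0
            )
            out.append((diam, idx))
    # A face never has a larger diameter than its cofaces; ties are broken by dimension.
    out.sort(key=lambda t: (t[0], len(t[1])))
    return out


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
for value, simplex in vr_filtration(square):
    print(f"{value:.3f}", simplex)
```

The distinct entry values in this list play the role of $\varepsilon_1 < \cdots < \varepsilon_m$. Taking `max_dim = len(X) - 1` makes the last member the full $n$-dimensional simplex, as stated above.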
We now return to the special case $M = \mathbb{R}^N$ with the Euclidean metric. Although a VR complex associated with a point cloud $X$ is not a faithful representation of the union of balls around the points in $X$ and may not even be topologically equivalent to a subspace of $\mathbb{R}^N$, VR complexes provide a good approximation in light of persistence, as the following lemma, due to de Silva and Ghrist [31], shows.
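Lemma 8.13.
*For $X \subset \mathbb{R}^N$ and any $\varepsilon > 0$, we have that*
$$\mathrm{VR}_\varepsilon(X) \subseteq \check{C}_{\sqrt{2}\varepsilon}(X) \subseteq \mathrm{VR}_{\sqrt{2}\varepsilon}(X).$$
Consequently, because the map on homology induced by the inclusion $\mathrm{VR}_\varepsilon(X) \hookrightarrow \mathrm{VR}_{\sqrt{2}\varepsilon}(X)$ factors through the homology of $\check{C}_{\sqrt{2}\varepsilon}(X)$, any topological feature that persists between $\varepsilon$ and $\sqrt{2}\varepsilon$ in the VR filtration is also a feature of the Čech complex $\check{C}_{\sqrt{2}\varepsilon}(X)$.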
References
- [1] D. Balcan, V. Colizza, B. Gonçalves, H. Hu, J. J. Ramasco, and A. Vespignani, Multiscale mobility networks and the spatial spreading of infectious diseases, Proceedings of the National Academy of Sciences of the United States of America, 106 (2009), pp. 21484–21489.
- [2] M. Barthelemy, Spatial networks, Physics Reports, 499 (2011), pp. 1–101.
- [3] M. Barthelemy, Morphogenesis of Spatial Networks, Springer International Publishing, Cham, Switzerland, 2018.
- [4] M. Belkin and P. Niyogi, Laplacian eigenmaps for dimensionality reduction and data representation, Neural Computation, 15 (2002), pp. 1373–1396.
- [5] M. Boguñá, F. Papadopoulos, and D. Krioukov, Sustaining the Internet with hyperbolic mapping, Nature Communications, 1 (2010), 62.
- [6] D. Brockmann and D. Helbing, The hidden geometry of complex, network-driven contagion phenomena, Science, 342 (2013), pp. 1337–1342.
- [7] D. Centola, M. W. Macy, and V. M. Eguíluz, Cascade dynamics of complex propagation, Physica A, 374 (2007), pp. 449–456.
- [8] R. R. Coifman, S. Lafon, A. B. Lee, M. Maggioni, B. Nadler, F. Warner, and S. W. Zucker, Geometric diffusions as a tool for harmonic analysis and structure definition of data: Diffusion maps, Proceedings of the National Academy of Sciences of the United States of America, 102 (2005), pp. 7426–7431.
