Computing Minimal Persistent Cycles: Polynomial and Hard Cases
Tamal K. Dey, Tao Hou, Sayan Mandal

TL;DR
This paper investigates the computational complexity of finding minimal persistent cycles across different dimensions, proving NP-hardness in general but identifying specific tractable cases related to weak pseudomanifolds.
Contribution
It extends the understanding of minimal persistent cycle computation to higher dimensions and introduces polynomial algorithms for certain cases involving weak pseudomanifolds.
Findings
NP-hardness for d>1 persistent cycles in general complexes
Polynomial algorithms for finite intervals in weak pseudomanifolds
Experiments show minimal cycles capture significant data features
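Table 1: Hardness results for the problems of interest.

| Problem | Restriction on $K$ | $d$ | Hardness |
|---|---|---|---|
| PCYC-FIN$_d$ | none | $\geq 1$ | NP-hard |
| WPCYC-FIN$_d$ | $K$ a weak $(d+1)$-pseudomanifold | $\geq 1$ | Polynomial |
| PCYC-INF$_d$ | none | $= 1$ | Polynomial |
| WPCYC-INF$_d$ | $K$ a weak $(d+1)$-pseudomanifold | $\geq 2$ | NP-hard |
| WEPCYC-INF$_d$ | $K$ a weak $(d+1)$-pseudomanifold in $\mathbb{R}^{d+1}$ | $\geq 2$ | Polynomial |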
Supported by NSF grants CCF-1740761 and CCF-1839252.
Tamal K. Dey, Department of Computer Science and Engineering, The Ohio State University.
Tao Hou, Department of Computer Science and Engineering, The Ohio State University.
Sayan Mandal, Department of Computer Science and Engineering, The Ohio State University.
Abstract
Persistent cycles, especially the minimal ones, are useful geometric features functioning as augmentations for the intervals in a purely topological persistence diagram (also called a barcode). In our earlier work, we showed that computing minimal 1-dimensional persistent cycles (persistent 1-cycles) for finite intervals is NP-hard while the same for infinite intervals is polynomially tractable. In this paper, we address this problem for general dimensions with $\mathbb{Z}_2$ coefficients. In addition to proving that it is NP-hard to compute minimal persistent $d$-cycles ($d>1$) for both types of intervals given arbitrary simplicial complexes, we identify two interesting cases which are polynomially tractable. These two cases assume the complex to be a certain generalization of manifolds which we term weak pseudomanifolds. For finite intervals from the $d$-th persistence diagram of a weak $(d+1)$-pseudomanifold, we utilize the fact that persistent cycles of such intervals are null-homologous and reduce the problem to a minimal cut problem. Since the same problem for infinite intervals is NP-hard, we further assume the weak $(d+1)$-pseudomanifold to be embedded in $\mathbb{R}^{d+1}$ so that the complex has a natural dual graph structure and the problem reduces to a minimal cut problem. Experiments with both algorithms on scientific data indicate that the minimal persistent cycles capture various significant features of the data.
1 Introduction
Persistent homology [15], which captures essential topological features of data, has proven to be a useful stable descriptor since Edelsbrunner et al. [16] first proposed the algorithm for its computation. The understanding of topological persistence was later expanded by several works [5, 9, 11, 31] in terms of both theory and computation. To make use of persistent homology, one typically computes a persistence diagram (also called barcode) which is a set of intervals with birth and death points. Besides just utilizing the set of intervals, some applications [13, 30] need persistence diagrams augmented with representative cycles for the intervals for gaining more insight into the data. These representative cycles, termed as persistent cycles [13], have been studied by Wu et al. [30], Obayashi [24], and Dey et al. [13] recently from the view-point of optimality.
Although the original persistence algorithm of Edelsbrunner et al. [16] implicitly computes persistent cycles, it does not necessarily provide minimal ones. In an earlier work [13], we showed that it is NP-hard to compute minimal persistent 1-cycles (cycles for 1-dimensional homology groups) when the given interval is finite. Interestingly, the same for infinite intervals turned out to be computable in polynomial time [13]. This naturally leads to the following questions: Are there other interesting cases beyond dimension 1 for which minimal persistent cycles can be computed in polynomial time? Also, what are the cases that are NP-hard? In this paper, we settle the complexity question for computing minimal persistent cycles with $\mathbb{Z}_2$ coefficients in general dimensions. We first show that when $d\geq 2$, computing minimal persistent $d$-cycles for both finite and infinite intervals is NP-hard in general. We then identify a special but important class of simplicial complexes, which we term weak $(d+1)$-pseudomanifolds, whose minimal persistent $d$-cycles can be computed in polynomial time. A weak $(d+1)$-pseudomanifold (the name is adapted from the commonly accepted name pseudomanifold; see Definition A.1) is a generalization of a $(d+1)$-manifold and is defined as follows:
Definition 1.1.
A simplicial complex $K$ is a weak $(d+1)$-pseudomanifold if each $d$-simplex is a face of no more than two $(d+1)$-simplices in $K$.
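The condition in Definition 1.1 can be checked by counting, for every $d$-simplex, how many $(d+1)$-simplices contain it. The following is a minimal sketch rather than code from the paper; the function name and the input encoding (top-dimensional simplices given as tuples of integer vertex labels) are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def is_weak_pseudomanifold(top_simplices, d):
    """Return True if every d-simplex is a face of at most two (d+1)-simplices.

    top_simplices: iterable of (d+1)-simplices, each an iterable of d+2
    vertex labels (hypothetical encoding chosen for this sketch).
    """
    coface_count = Counter()
    for simplex in top_simplices:
        vertices = sorted(set(simplex))
        if len(vertices) != d + 2:
            raise ValueError("expected a (d+1)-simplex with d+2 vertices")
        # Each d-face is obtained by dropping exactly one vertex.
        for face in combinations(vertices, d + 1):
            coface_count[face] += 1
    return all(count <= 2 for count in coface_count.values())


# Boundary of a tetrahedron: every edge lies in exactly two triangles.
sphere = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
# Three triangles glued along the edge (0, 1): that edge has three cofaces.
book = [(0, 1, 2), (0, 1, 3), (0, 1, 4)]
print(is_weak_pseudomanifold(sphere, 1), is_weak_pseudomanifold(book, 1))
```

Only the top-dimensional simplices matter for the check, since a $d$-simplex with no $(d+1)$-coface trivially satisfies the condition.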
Specifically, we find that if the given complex is a weak $(d+1)$-pseudomanifold, the problem of computing minimal persistent $d$-cycles for finite intervals can be cast into a minimal cut problem (see Section 3) due to the fact that persistent cycles of this kind are null-homologous in the complex. However, when $d\geq 2$ and intervals are infinite, the computation of the same becomes NP-hard (see Section 5). Nonetheless, for infinite intervals, if we assume that the weak $(d+1)$-pseudomanifold is embedded in $\mathbb{R}^{d+1}$, the minimal persistent cycle problem reduces to a minimal cut problem (see Section 4) and hence belongs to P. Note that a simplicial complex embedded in $\mathbb{R}^{d+1}$ is automatically a weak $(d+1)$-pseudomanifold. Also note that while there is an algorithm [8] in the non-persistence setting which computes minimal $d$-cycles by minimal cuts, the non-persistence algorithm assumes the $(d+1)$-complex to be embedded in $\mathbb{R}^{d+1}$. Our algorithm for finite intervals, in contrast, does not need the embedding assumption.
In order to make our statements about the hardness results precise, we let PCYC-FIN$_d$ denote the problem of computing minimal persistent $d$-cycles for finite intervals when the given simplicial complex is arbitrary, and let PCYC-INF$_d$ denote the same problem for infinite intervals (see the definitions of Problems 2.1 and 2.2). We also let WPCYC-FIN$_d$ denote a subproblem of PCYC-FIN$_d$ and let WPCYC-INF$_d$ and WEPCYC-INF$_d$ denote two subproblems of PCYC-INF$_d$, with the subproblems requiring additional constraints on the given simplicial complex. (For two problems $P_1$ and $P_2$, $P_1$ is a subproblem of $P_2$ if any instance of $P_1$ is an instance of $P_2$ and $P_1$ asks for computing the same solutions as $P_2$.) Table 1 lists the hardness results for all problems of interest, where the column “Restriction on $K$” specifies the additional constraints subproblems require on the given simplicial complex $K$. Note that WPCYC-INF$_d$ being NP-hard trivially implies that PCYC-INF$_d$ is NP-hard.
Main contributions.
We summarize our contributions as follows:
- We prove the NP-hardness of PCYC-FIN$_d$ and WPCYC-INF$_d$ for all $d\geq 2$.
- We present two polynomial time algorithms for WPCYC-FIN$_d$ and WEPCYC-INF$_d$, based on the duality of minimal persistent cycles and minimal cuts. Other than the minimal cut computation, steps in both algorithms run in linear or almost linear time.
1.1 Related works
In the context of computing optimal cycles, most works have been done in the non-persistence setting. These works compute minimal cycles for homology groups of a given simplicial complex. Only very few works address the problem while taking into account the persistence. We review some of the relevant works below.
Minimal cycles for homology groups.
In terms of computing minimal cycles for homology groups, two problems are of most interest: the localization problem and the minimal basis problem. The localization problem asks for computing a minimal cycle in a homology class and the minimal basis problem asks for computing a set of generating cycles for a homology group whose sum of weights is minimal. With $\mathbb{Z}_2$ coefficients, these two problems are in general hard. Specifically, Chambers et al. [4] proved that the localization problem over dimension one is NP-hard when the given simplicial complex is a 2-manifold. Chen and Freedman [8] proved that the localization problem is NP-hard to approximate with fixed ratio over arbitrary dimension. They also showed that the minimal basis problem is NP-hard to approximate with fixed ratio over dimension greater than one. For one-dimensional homology, Dey et al. [14] proposed a polynomial time algorithm for the minimal basis problem. Several other works [3, 7, 12, 18] address variants of the two problems while considering special input classes, alternative cycle measures, or coefficients for homology other than $\mathbb{Z}_2$.
In this work, we use graph cuts and their duality extensively. The duality of cuts on a planar graph and separating cycles on the dual graph has long been utilized to efficiently compute maximal flows and minimal cuts on planar graphs, a topic for which Chambers et al. [4] provide a comprehensive review. In their paper [4], Chambers et al. discover the duality between minimal cuts of a surface-embedded graph and minimal homologous cycles in a dual complex, and then devise algorithms for both problems assuming the genus of the surface to be fixed. Chen and Freedman [8] proposed an algorithm which computes a minimal non-bounding $d$-cycle given a $(d+1)$-complex embedded in $\mathbb{R}^{d+1}$, utilizing a natural duality of $d$-cycles in the complex and cuts in the dual graph. The minimal non-bounding cycle algorithm can be further extended to solve the localization problem and the minimal basis problem over dimension $d$ given a $(d+1)$-complex embedded in $\mathbb{R}^{d+1}$.
Persistent cycle.
As pointed out earlier, our main focus is the optimality of representative cycles in the persistence framework. Some early works [17, 19] address the representative cycle problem for persistence by computing minimal cycles at the birth points of intervals without considering what actually dies at the death points. Wu et al. [30] proposed an algorithm computing minimal persistent 1-cycles for finite intervals using an annotation technique and heuristic search. However, the time complexity of the algorithm is exponential in the worst case. Obayashi [24] casts the minimal persistent cycle problem for finite intervals into an integer program, but the rounded result of the relaxed linear program is not guaranteed to be optimal. Dey et al. [13] formalized the definition of persistent cycles for both finite and infinite intervals. They also proved the NP-hardness of computing minimal persistent 1-cycles for finite intervals and proposed a polynomial time algorithm for computing non-optimal ones which are still good in practice.
2 Preliminaries
In this section we present some concepts necessary for presenting the results in this paper.
Simplicial complex.
A simplicial complex $K$ is a collection of simplices which are abstractly defined as subsets of a ground set called the vertex set of $K$. If a simplex $\sigma$ is in $K$, then all its subsets, called its faces, are also in $K$. The simplex $\sigma$ is also referred to as a $q$-simplex if the cardinality of the vertex set of $\sigma$ is $q+1$. A $q'$-face of $\sigma$ is a $q'$-simplex being a face of $\sigma$ and a $q'$-coface of $\sigma$ is a $q'$-simplex having $\sigma$ as a face. We call a $q$-simplex of $K$ a boundary $q$-simplex if it has fewer than two $(q+1)$-cofaces in $K$. A simplicial set $\Sigma$ is a set of simplices and the closure of a simplicial set $\Sigma$ is the simplicial complex consisting of all the faces of the simplices in $\Sigma$. A simplicial complex is finite if it contains finitely many simplices. In this paper, we only consider finite simplicial complexes.
If each vertex of a simplicial complex $K$ is a point in a Euclidean space, then each simplex of $K$ can be interpreted as the convex hull of its vertices. The simplicial complex $K$ is said to be embedded in the Euclidean space if the interiors of all its simplices are disjoint. The underlying space of $K$, denoted by $|K|$, is the point-wise union of all the simplices of $K$.
Definition 2.1 (Oriented simplex [23]).
A $q$-simplex with an ordering of its vertices is an oriented $q$-simplex. For each $q$-simplex $\sigma$ ($q\geq 1$), there are exactly two equivalence classes of vertex orderings, resulting in two oriented $q$-simplices of $\sigma$. We refer to them as the oppositely oriented $q$-simplices.
Remark 2.1.
Any simplex by default is unoriented. We denote an unoriented $q$-simplex spanned by vertices $v_0,\ldots,v_q$ as $\{v_0,\ldots,v_q\}$ and an oriented $q$-simplex as $[v_0,\ldots,v_q]$, where $v_0,\ldots,v_q$ specify the ordering of the spanning vertices.
Filtration.
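A filtration $\mathcal{F}$ of a simplicial complex $K$ is a filtered sequence of subcomplexes of $K$, $\mathcal{F}: \emptyset = K_0 \subset K_1 \subset \cdots \subset K_n = K$, such that $K_{i-1}$ and $K_i$ differ by one simplex denoted by $\sigma_i$. We let $i$ be the index of $\sigma_i$ in $\mathcal{F}$ and denote it as $\mathrm{ind}(\sigma_i) = i$. A subcomplex $K_i$ in the filtered sequence of $\mathcal{F}$ is also referred to as a partial complex.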
Simplicial homology.
We provide a brief overview of simplicial homology used in this paper. See any standard book on the topic, e.g. [23]. Let $q\geq 0$, $K$ be a simplicial complex, and $G$ be an abelian group. The $q$-th chain group $C_q(K;G)$ is defined to be the abelian group containing all finite sums of the form $\sum_i n_i\sigma_i$, where $n_i\in G$ and $\sigma_i$ is an oriented $q$-simplex of $K$. Each element in $C_q(K;G)$ is called a $q$-chain of $K$. Note that for two oppositely oriented $q$-simplices $\sigma$ and $\sigma'$, we have that $n\sigma = (-n)\sigma'$ for any $n\in G$. Therefore, $C_q(K;G)$ can be interpreted as a direct sum of $n_q$ copies of $G$ where $n_q$ is the number of $q$-simplices of $K$ and each copy of $G$ corresponds to a $q$-simplex of $K$. The $q$-th boundary operator $\partial_q: C_q(K;G)\to C_{q-1}(K;G)$ is a group homomorphism such that for any oriented $q$-simplex $[v_0,\ldots,v_q]$,
$$\partial_q\big([v_0,\ldots,v_q]\big) = \sum_{i=0}^{q} (-1)^i\,[v_0,\ldots,\hat{v}_i,\ldots,v_q],$$
where the notation $\hat{v}_i$ means that $v_i$ is deleted from the simplex. For brevity, we often omit the subscript of the boundary operator and denote it as $\partial$ when this does not cause any confusion. The kernel of $\partial_q$ is called the $q$-th cycle group of $K$ and is denoted as $Z_q(K;G)$. The image of $\partial_{q+1}$ is called the $q$-th boundary group of $K$ and is denoted as $B_q(K;G)$. A $q$-chain in $Z_q(K;G)$ is called a $q$-cycle and a $q$-chain in $B_q(K;G)$ is called a $q$-boundary. For a $q$-chain $A$, the $(q-1)$-chain $\partial_q(A)$ is also called the boundary of $A$.
A fundamental fact in homology theory is that $\partial_q\circ\partial_{q+1} = 0$ for any $q\geq 0$. This implies that $B_q(K;G)\subseteq Z_q(K;G)$. The $q$-th homology group of $K$, denoted by $H_q(K;G)$, is defined as the quotient $Z_q(K;G)/B_q(K;G)$. Each coset in $H_q(K;G)$ is called a homology class and a cycle is said to be homologous to another cycle if they belong to the same homology class. As any boundary cycle represents the zero homology class in $H_q(K;G)$, a boundary is also said to be null-homologous.
The abelian group $G$ in the above definitions is called the coefficient group for the homology groups. Sometimes, when the coefficient group is clear, we simply drop it and denote a chain group as $C_q(K)$. This applies to other groups defined in simplicial homology. In this paper, two coefficient groups $\mathbb{Z}_2$ and $\mathbb{Z}$ are used for simplicial homology. When not explicitly stated, the coefficients are assumed to be in $\mathbb{Z}_2$. With $\mathbb{Z}_2$ coefficients, the orientations of simplices no longer matter and a $q$-chain can be interpreted as a set of $q$-simplices with summation of two $q$-chains being the symmetric difference. A $q$-cycle is then a set of $q$-simplices where every $(q-1)$-face of these simplices adjoins an even number of $q$-simplices. Also note that because $\mathbb{Z}_2$ is a field, all groups defined in simplicial homology with $\mathbb{Z}_2$ coefficients become vector spaces and homomorphisms between these groups (such as $\partial$) become linear maps.
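The set interpretation of chains over a two-element coefficient field translates directly into code: a chain is a set of simplices, addition is symmetric difference, and a chain is a cycle exactly when its boundary is empty. The sketch below illustrates this under the assumption that simplices are encoded as frozensets of integer vertex labels; the function names are hypothetical.

```python
from itertools import combinations


def boundary_z2(chain):
    """Boundary of a chain with mod-2 coefficients.

    chain: a set of simplices, each a frozenset of vertex labels
    (an encoding assumed for this sketch).
    """
    result = set()
    for simplex in chain:
        q = len(simplex) - 1
        if q == 0:
            continue  # the boundary of a vertex is zero
        for face in combinations(sorted(simplex), q):
            # Symmetric difference implements addition mod 2.
            result ^= {frozenset(face)}
    return result


def is_cycle_z2(chain):
    """A chain is a cycle if and only if its boundary is empty."""
    return not boundary_z2(chain)


triangle_edges = {frozenset(e) for e in [(0, 1), (1, 2), (0, 2)]}
path = {frozenset(e) for e in [(0, 1), (1, 2)]}
filled = {frozenset((0, 1, 2))}
print(is_cycle_z2(triangle_edges), is_cycle_z2(path))
```

In this example the three edges of a triangle form a 1-cycle, the two-edge path does not, and the boundary of the filled triangle is the edge cycle, whose own boundary vanishes.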
Definition 2.2 ($q$-weighted).
A simplicial complex $K$ is $q$-weighted if each $q$-simplex $\sigma$ of $K$ has a non-negative finite weight $w(\sigma)$. The weight of a $q$-chain $A$ of $K$ is then defined as $w(A) = \sum_{\sigma\in A} w(\sigma)$.
Definition 2.3 ($q$-connected).
Let $K$ be a simplicial complex. For $q\geq 1$, two $q$-simplices $\sigma$ and $\sigma'$ of $K$ are $q$-connected in $K$ if there is a sequence of $q$-simplices of $K$, $(\sigma_0,\ldots,\sigma_l)$, such that $\sigma_0=\sigma$, $\sigma_l=\sigma'$, and for all $0\leq i<l$, $\sigma_i$ and $\sigma_{i+1}$ share a $(q-1)$-face. The property of $q$-connectedness defines an equivalence relation on $q$-simplices of $K$. Each set in the partition induced by the equivalence relation constitutes a $q$-connected component of $K$. We say $K$ is $q$-connected if any two $q$-simplices of $K$ are $q$-connected in $K$.
Remark 2.2.
See Figure 2a for an example of 1-connected components and 2-connected components.
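Because $q$-connectedness is an equivalence relation generated by sharing a $(q-1)$-face, the components can be computed with a union-find structure. The following is a minimal sketch under that observation, not the paper's implementation; the function name and the tuple encoding of simplices are assumptions.

```python
from itertools import combinations


def q_connected_components(q_simplices, q):
    """Partition q-simplices (q >= 1) into q-connected components.

    Two q-simplices are merged whenever they share a (q-1)-face.
    Simplices are tuples of vertex labels (assumed encoding).
    """
    simplices = [tuple(sorted(s)) for s in q_simplices]
    parent = list(range(len(simplices)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_with_face = {}
    for i, s in enumerate(simplices):
        for face in combinations(s, q):  # a (q-1)-face has q vertices
            j = first_with_face.setdefault(face, i)
            parent[find(i)] = find(j)  # no-op when j == i

    groups = {}
    for i, s in enumerate(simplices):
        groups.setdefault(find(i), set()).add(s)
    return list(groups.values())


# The first two triangles share the edge (1, 2); the third touches
# the second only at vertex 3, so it is a separate 2-connected component.
triangles = [(0, 1, 2), (1, 2, 3), (3, 4, 5)]
components = q_connected_components(triangles, 2)
print(sorted(len(c) for c in components))
```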
Definition 2.4 ($q$-connected cycle).
A $q$-cycle $\zeta$ (with $\mathbb{Z}_2$ coefficients) is $q$-connected if the complex derived by taking the closure of the simplicial set $\zeta$ is $q$-connected.
Persistent homology.
We provide a brief description of persistent homology. We recommend the book by Edelsbrunner and Harer [15] for a detailed explanation of this topic and the book by Chazal et al. [6] for its underlying mathematical structure, the persistence module. Note that persistent homology in this paper is always assumed to be with $\mathbb{Z}_2$ coefficients. The persistence algorithm starts with a filtration $\mathcal{F}: \emptyset = K_0\subset K_1\subset\cdots\subset K_n = K$ of a simplicial complex $K$, and for each simplex $\sigma_i$, inspects whether $\partial(\sigma_i)$ is a boundary in $K_{i-1}$. If $\partial(\sigma_i)$ is a boundary in $K_{i-1}$, $\sigma_i$ is called positive; otherwise, it is called negative. The $q$-chains (or $q$-cycles) in $K_i$ that are not in $K_{i-1}$ are said to be born in $K_i$ or created by $\sigma_i$. A positive $q$-simplex creates some $q$-cycles and a negative $q$-simplex makes some $(q-1)$-cycles become boundaries. In the latter case, we also say that the negative $q$-simplex kills or destroys those $(q-1)$-cycles. What is central to the persistence algorithm is a notion called pairing: A positive simplex is initially unpaired when introduced; when a negative $q$-simplex $\sigma_j$ comes, the algorithm finds a $(q-1)$-cycle created by an unpaired positive $(q-1)$-simplex $\sigma_i$ which is homologous to $\partial(\sigma_j)$ and pairs $\sigma_i$ with $\sigma_j$. Alongside the pairing, a finite interval $[i,j)$ is added to the $(q-1)$-th persistence diagram, which is denoted by $\mathsf{D}_{q-1}(\mathcal{F})$. After all simplices are processed, some positive simplices may still be unpaired. For each of these unpaired simplices $\sigma_i$, an infinite interval $[i,+\infty)$ is added to $\mathsf{D}_q(\mathcal{F})$, where $q$ is the dimension of $\sigma_i$.
Note that the pairing in the persistence algorithm for a given filtration is unique. Also note that in this paper, we assume a filtration of a complex is given and the persistence intervals start and end with indices of the paired simplices. However, in real-life applications, one is often given a function on a simplicial complex. To produce the persistence intervals, a filtration needs to be derived and the endpoints of the intervals are taken as function values on the paired simplices. In such cases, we can associate a given interval to its simplex pair, take the indices of the paired simplices, and get an interval which can serve as an input to our algorithms.
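The pairing described above can be illustrated with the standard boundary-matrix column reduction over a two-element field. The sketch below is a textbook-style reduction written for illustration, not the authors' code; the function name, the 1-based indexing, and the tuple encoding of simplices are assumptions.

```python
from itertools import combinations


def persistence_pairs(filtration):
    """Pair simplices of a filtration by column reduction mod 2.

    filtration: list of simplices (tuples of vertex labels) in insertion
    order, every face appearing before its cofaces. Indices are 1-based.
    Returns (finite, infinite): finite holds (birth, death, dim) triples,
    infinite holds (birth, dim) pairs for unpaired positive simplices.
    """
    index = {frozenset(s): i for i, s in enumerate(filtration, start=1)}
    reduced = {}     # index of a negative simplex -> its reduced column
    low_to_col = {}  # largest index in a reduced column -> column index
    finite, infinite = [], []
    for j, s in enumerate(filtration, start=1):
        q = len(s) - 1
        col = set()
        if q > 0:
            col = {index[frozenset(f)] for f in combinations(s, q)}
        # Add earlier reduced columns until the lowest entry is unclaimed.
        while col and max(col) in low_to_col:
            col ^= reduced[low_to_col[max(col)]]
        if col:  # negative simplex: it kills the cycle created at max(col)
            i = max(col)
            low_to_col[i] = j
            reduced[j] = col
            finite.append((i, j, q - 1))
    paired = set(low_to_col) | set(low_to_col.values())
    for j, s in enumerate(filtration, start=1):
        if j not in paired:
            infinite.append((j, len(s) - 1))
    return finite, infinite


# Three vertices, three edges, then the triangle that fills the loop.
filtration = [(0,), (1,), (2,), (0, 1), (1, 2), (0, 2), (0, 1, 2)]
finite, infinite = persistence_pairs(filtration)
print(finite, infinite)
```

In this example the edge with index 6 is positive and creates the 1-cycle around the triangle, which the triangle with index 7 kills, giving the finite interval $[6,7)$ in dimension 1; the first vertex remains unpaired and yields the only infinite interval.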
The persistent cycle problems.
We can now formally define the minimal persistent cycle problems:
Problem 2.1 (PCYC-FIN$_d$).
Given a finite $d$-weighted simplicial complex $K$, a filtration $\mathcal{F}: \emptyset = K_0\subset K_1\subset\cdots\subset K_n = K$, and a finite interval $[\beta,\delta)\in\mathsf{D}_d(\mathcal{F})$, this problem asks for computing a $d$-cycle with the minimal weight which is born in $K_\beta$ and becomes a boundary in $K_\delta$.
Problem 2.2 (PCYC-INF$_d$).
Given a finite $d$-weighted simplicial complex $K$, a filtration $\mathcal{F}: \emptyset = K_0\subset K_1\subset\cdots\subset K_n = K$, and an infinite interval $[\beta,+\infty)\in\mathsf{D}_d(\mathcal{F})$, this problem asks for computing a $d$-cycle with the minimal weight which is born in $K_\beta$.
Remark 2.3.
The definitions of the above two problems are derived directly from the definition of persistent cycles [13].
Undirected flow network.
An undirected flow network $(G,s_1,s_2)$ consists of an undirected graph $G$ with vertex set $V(G)$ and edge set $E(G)$, a capacity function $c: E(G)\to[0,+\infty]$, and two non-empty disjoint subsets $s_1$ and $s_2$ of $V(G)$. Vertices in $s_1$ are referred to as sources and vertices in $s_2$ are referred to as sinks. A cut $(S,T)$ of $(G,s_1,s_2)$ consists of two disjoint subsets $S$ and $T$ of $V(G)$ such that $S\cup T = V(G)$, $s_1\subseteq S$, and $s_2\subseteq T$. The set of edges that connect a vertex in $S$ and a vertex in $T$ are referred to as the edges across the cut and is denoted as $\xi(S,T)$. The capacity of a cut $(S,T)$ is defined as $c(S,T) = \sum_{e\in\xi(S,T)} c(e)$. A minimal cut of $(G,s_1,s_2)$ is a cut with the minimal capacity. Note that we allow parallel edges in $G$ (see Figure 2a) to ease the presentation. These parallel edges can be merged into one edge during computation.
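A minimal cut of such a network can be obtained from any maximum-flow routine. The sketch below uses a plain Edmonds-Karp search as a stand-in (the paper's complexity analysis refers to Orlin's algorithm [25], which is not implemented here); the function name, the super-source and super-sink labels, and the edge-list encoding are assumptions made for illustration. Parallel edges are merged by summing capacities, and infinite capacities are allowed.

```python
from collections import defaultdict, deque

INF = float("inf")


def min_cut(edges, sources, sinks):
    """Minimal cut of an undirected flow network via Edmonds-Karp.

    edges: list of (u, v, capacity) triples; parallel edges are merged.
    Returns (capacity, S, crossing), where S is the source side and
    crossing lists the input edges across the cut. If sources and sinks
    are joined by infinite-capacity edges only, returns (INF, None, None).
    """
    cap = defaultdict(float)
    adj = defaultdict(set)
    for u, v, c in edges:
        cap[(u, v)] += c  # undirected: capacity in both directions
        cap[(v, u)] += c
        adj[u].add(v)
        adj[v].add(u)
    s, t = ("super", "source"), ("super", "sink")  # hypothetical labels
    for x in sources:
        cap[(s, x)] = INF
        adj[s].add(x)
    for x in sinks:
        cap[(x, t)] = INF
        adj[x].add(t)

    def reachable():
        parent = {s: None}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        return parent

    total = 0
    while True:
        parent = reachable()
        if t not in parent:
            break
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[e] for e in path)
        if push == INF:
            return INF, None, None
        for u, v in path:
            cap[(u, v)] -= push
            cap[(v, u)] += push
        total += push
    S = set(parent) - {s}
    crossing = [(u, v, c) for u, v, c in edges if (u in S) != (v in S)]
    return total, S, crossing


edges = [("a", "b", 3), ("b", "d", 1), ("a", "c", 1), ("c", "d", 3), ("b", "c", 1)]
capacity, S, crossing = min_cut(edges, ["a"], ["d"])
print(capacity, sorted(S))
```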
3 Minimal persistent $d$-cycles of finite intervals for weak $(d+1)$-pseudomanifolds
In this section, we present an algorithm which computes minimal persistent $d$-cycles for finite intervals given a filtration of a weak $(d+1)$-pseudomanifold when $d\geq 1$. The general process is as follows: Suppose that the input weak $(d+1)$-pseudomanifold is $K$ associated with a filtration $\mathcal{F}: K_0\subset K_1\subset\cdots\subset K_n$ and the task is to compute the minimal persistent cycle of a finite interval $[\beta,\delta)\in\mathsf{D}_d(\mathcal{F})$. We first construct an undirected dual graph $G$ for $K$ where vertices of $G$ are dual to $(d+1)$-simplices of $K$ and edges of $G$ are dual to $d$-simplices of $K$. One dummy vertex, termed the infinite vertex, which does not correspond to any $(d+1)$-simplex, is added to $G$ for graph edges dual to the boundary $d$-simplices. We then build an undirected flow network on top of $G$ where the source is the vertex dual to $\sigma_\delta$ and the sink is the infinite vertex along with the set of vertices dual to those $(d+1)$-simplices which are added to $\mathcal{F}$ after $\sigma_\delta$. If a $d$-simplex is $\sigma_\beta$ or added to $\mathcal{F}$ before $\sigma_\beta$, we let the capacity of its dual graph edge be its weight; otherwise, we let the capacity of its dual graph edge be $+\infty$. Finally, we calculate a minimal cut of this flow network and return the $d$-chain dual to the edges across the minimal cut as a minimal persistent cycle of the interval.
The intuition of the above algorithm is best explained by an example in Figure 1, where $d=1$. The key to the algorithm is the duality between persistent cycles of the input interval and cuts of the dual flow network having finite capacity. To see this duality, first consider a persistent $d$-cycle $\zeta$ of the input interval $[\beta,\delta)$. There exists a $(d+1)$-chain $A$ in $K_\delta$ created by $\sigma_\delta$ whose boundary equals $\zeta$, making $\zeta$ killed. We can let $S$ be the set of graph vertices dual to the simplices in $A$ and let $T$ be the set of the remaining graph vertices; then $(S,T)$ is a cut. Furthermore, $(S,T)$ must have finite capacity as the edges across it are exactly dual to the $d$-simplices in $\zeta$ and the $d$-simplices in $\zeta$ have indices in $\mathcal{F}$ less than or equal to $\beta$. On the other hand, let $(S,T)$ be a cut with finite capacity; then the $(d+1)$-chain whose simplices are dual to the vertices in $S$ is created by $\sigma_\delta$. Taking the boundary of this $(d+1)$-chain, we get a $d$-cycle $\zeta$. Because the $d$-simplices of $\zeta$ are exactly dual to the edges across $(S,T)$ and each edge across $(S,T)$ has finite capacity, $\zeta$ must reside in $K_\beta$. We only need to ensure that $\zeta$ contains $\sigma_\beta$ in order to show that $\zeta$ is a persistent cycle of $[\beta,\delta)$. In Section 3.2, we argue that $\zeta$ actually contains $\sigma_\beta$, so $\zeta$ is indeed a persistent cycle. Note that while the above explanation introduces the general idea, the rigorous statement and proof of the duality are articulated by Propositions 3.2 and 3.3.
We list the pseudo-code in Algorithm 1 and it works as follows: Lines 3 and 4 set up a complex $\widetilde{K}$ that the algorithm mainly works on, where $\widetilde{K}$ is taken as the closure of the $(d+1)$-connected component of $K$ containing $\sigma_\delta$. The reason for working on $\widetilde{K}$ instead of the entire complex is explained later in this section. Line 6 constructs the dual graph $G$ from $\widetilde{K}$ and lines 8 to 18 build the flow network on top of $G$. Note that we denote the infinite vertex by $\phi$. Line 19 computes a minimal cut for the flow network and line 20 returns the $d$-chain dual to the edges across the minimal cut. In the pseudo-codes of this paper, to ease the exposition, we treat a mathematical function as a computer program object. For example, the function $\theta$ returned by DualGraphFin in Algorithm 1 denotes the bijection between the simplices of $\widetilde{K}$ and their dual vertices or edges (see Section 3.1 for details). In practice, these constructs can be easily implemented in any computer programming language.
To see the reason why we work on $\widetilde{K}$, we first note that the dual graph constructed directly from $K$ may be disconnected. (For example, take $K$ to be two disjoint triangulated 2-spheres; its dual graph consists of two connected components.) While cuts are still well-defined for a disconnected flow network, one may prefer a connected one as the minimal cut computation only concerns the graph component containing the source. By constructing the dual graph from $\widetilde{K}$, it can be ensured that the graph is connected. In order for Algorithm 1 to work, one has to further show that the sink is non-empty so that the computed persistent cycle is non-empty. This is verified in Proposition 3.1. An intuitive reason why the computation from $\widetilde{K}$ is still correct is as follows: Each persistent $d$-cycle $\zeta$ of the given interval corresponds to a $(d+1)$-chain $A$ which kills $\zeta$, i.e., $\zeta=\partial(A)$. Suppose that $A$ is not entirely contained in $\widetilde{K}$. Notice that the part of $A$ inside $\widetilde{K}$, which we denote by $A'$, contains at least the killer simplex $\sigma_\delta$. Then $\partial(A')$ must be a persistent cycle of the interval residing in $\widetilde{K}$ which has a smaller weight. Hence, a minimal persistent cycle must reside in $\widetilde{K}$. In Section 3.2, we formally verify the construction.
Complexity.
The time complexity of Algorithm 1 depends on the encoding scheme of the input and the data structure used for representing a simplicial complex. For encodings of the input, we assume $K$ and $\mathcal{F}$ to be represented by a sequence of all the simplices of $K$ ordered by their indices in $\mathcal{F}$, where each simplex is denoted by its set of vertices. We also assume a simple yet reasonable simplicial complex data structure as follows: In each dimension, simplices are mapped to integral identifiers ranging from 0 to the number of simplices in that dimension minus 1; each $d$-simplex has an array (or linked list) storing all the ids of its $(d+1)$-cofaces; a hash map for each dimension is maintained for the query of the integral id of each simplex in that dimension based on the spanning vertices of the simplex. We further assume $d$ to be constant. By the above assumptions, let $n$ be the size (number of bits) of the encoded input; then there are $O(n)$ elementary operations in lines 3 and 4. So, the time complexity of lines 3 and 4 is $O(n)$. It is not hard to verify that the flow network construction also takes $O(n)$ time, so the time complexity of Algorithm 1 is determined by the minimal cut algorithm. Using the max-flow algorithm by Orlin [25], the time complexity of Algorithm 1 becomes $O(n^2)$.
In the rest of this section, we first explain the bijection returned by DualGraphFin, then prove the correctness of the algorithm.
3.1 The bijection
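The vertex set $V(G)$ of $G$ contains vertices which correspond to the $(d+1)$-simplices of $\widetilde{K}$. The set $V(G)$ may also contain an infinite vertex $\phi$ if $\widetilde{K}$ contains any boundary $d$-simplex. We define a bijection
$$\theta: \{(d+1)\text{-simplices of } \widetilde{K}\} \to V(G)\setminus\{\phi\}$$
such that for any $(d+1)$-simplex $\sigma$ of $\widetilde{K}$, $\theta(\sigma)$ is the vertex that $\sigma$ is dual to. Similarly, we define another bijection
$$\theta: \{d\text{-simplices of } \widetilde{K}\} \to E(G)$$
using the same notation $\theta$.
Note that we can take the image of a subset of the domain under a function. Therefore, if $(S,T)$ is a cut for a flow network built on $G$, then $\theta^{-1}(\xi(S,T))$ denotes the set of $d$-simplices dual to the edges across the cut. Also note that since simplicial chains with $\mathbb{Z}_2$ coefficients can be interpreted as sets, $\theta^{-1}(\xi(S,T))$ is also a $d$-chain.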
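The dual graph and the flow network built on it can be sketched as follows. This is an illustrative reading of the construction described in this section, not the authors' Algorithm 1: the function name, the label used for the infinite vertex, and the dictionary-based inputs (filtration indices and weights keyed by sorted vertex tuples) are all assumptions. The inverse of the edge bijection is kept as a map from edge position to the dual $d$-simplex, so the edges across a cut can be translated back into a chain.

```python
from itertools import combinations

INF = float("inf")
PHI = "phi"  # label chosen in this sketch for the infinite vertex


def dual_flow_network(top_simplices, index, weight, beta, delta):
    """Build the dual flow network for a finite interval [beta, delta).

    top_simplices: the (d+1)-simplices of the working complex, as sorted
    vertex tuples. index: simplex -> filtration index. weight: d-simplex
    -> weight. Returns (edges, sources, sinks, edge_to_simplex).
    """
    cofaces = {}
    for tau in top_simplices:
        for face in combinations(tau, len(tau) - 1):
            cofaces.setdefault(face, []).append(tau)
    edges, edge_to_simplex = [], {}
    for face, taus in cofaces.items():
        if len(taus) > 2:
            raise ValueError("input is not a weak pseudomanifold")
        u = taus[0]
        v = taus[1] if len(taus) == 2 else PHI  # boundary d-simplex
        # Finite capacity only for d-simplices present by index beta.
        capacity = weight[face] if index[face] <= beta else INF
        edge_to_simplex[len(edges)] = face
        edges.append((u, v, capacity))
    sources = [tau for tau in top_simplices if index[tau] == delta]
    sinks = [tau for tau in top_simplices if index[tau] > delta]
    if any(v == PHI for _, v, _ in edges):
        sinks.append(PHI)
    return edges, sources, sinks, edge_to_simplex


# Two triangles sharing the edge (0, 2); indices follow a filtration in
# which the outer 4-cycle is born at 8 and killed by the triangle at 11.
index = {(0, 1): 5, (1, 2): 6, (2, 3): 7, (0, 3): 8, (0, 2): 9,
         (0, 1, 2): 10, (0, 2, 3): 11}
weight = {e: 1 for e in index if len(e) == 2}
edges, sources, sinks, edge_to_simplex = dual_flow_network(
    [(0, 1, 2), (0, 2, 3)], index, weight, beta=8, delta=11)
print(sources, sinks)
```

In this example the shared edge is added after index 8 and therefore receives infinite capacity, so any finite cut must keep both triangles on the source side, and the edges across it are dual to the outer 4-cycle.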
3.2 Algorithm correctness
In this subsection, we prove the correctness of Algorithm 1. Some of the symbols we use refer to Algorithm 1.
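Proposition 3.1.
In Algorithm 1, the sink $s_2$ is not an empty set.
Proof.
For contradiction, suppose that $s_2$ is an empty set. Then, $\phi\notin V(G)$ and $\sigma_\delta$ is the $(d+1)$-simplex of $\widetilde{K}$ with the greatest index in $\mathcal{F}$. Because $\phi\notin V(G)$, any $d$-simplex of $\widetilde{K}$ must be a face of two $(d+1)$-simplices of $\widetilde{K}$, so the set of $(d+1)$-simplices of $\widetilde{K}$ forms a $(d+1)$-cycle created by $\sigma_\delta$. Then $\sigma_\delta$ must be a positive simplex in $\mathcal{F}$, which is a contradiction. ∎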
The following two propositions specify the duality mentioned at the beginning of this section:
Proposition 3.2.
For any cut $(S,T)$ of $(G,s_1,s_2)$ with finite capacity, the $d$-chain $\zeta=\theta^{-1}(\xi(S,T))$ is a persistent $d$-cycle of $[\beta,\delta)$ and $w(\zeta)=c(S,T)$.
Proof.
Let , we first want to prove , so that is a cycle. Let be any -simplex of , then connects a vertex and a vertex . If , then cannot be a face of another -simplex in other than . So, is a face of exactly one -simplex of . If , then is also a face of exactly one -simplex of . Therefore, . On the other hand, let be any -simplex of , then is a face of exactly one -simplex of . If is a face of another -simplex in , then and . So, connects the vertex and the vertex in the graph . If is a face of exactly one -simplex in , must connect and in . So we have , i.e., .
We then show that is created by . By Proposition 3.1, cannot be empty. Therefore, for contradiction, we can suppose that is created by a -simplex . Because has finite capacity, we have that . We can let be a persistent cycle of and where is a -chain of . Then we have . Since and are both created by , then is created by a -simplex with an index less than in . So is a -cycle created by which becomes a boundary before is added. This means that is already paired when is added, contradicting the fact that is paired with . Similarly, we can prove that is not a boundary until is added, so is a persistent cycle of . Since has finite capacity, we must have
[TABLE]
Proposition 3.3.
For any persistent $d$-cycle $\zeta$ of $[\beta,\delta)$, there exists a cut $(S,T)$ of $(G,s_1,s_2)$ such that $c(S,T)\leq w(\zeta)$.
Proof.
Let be a -chain in such that . Note that is created by and is the set of -simplices which are face of exactly one -simplex of . Let and , we claim that . To prove this, first let be any -simplex of , then is a face of exactly one -simplex of . Since , it is also true that , so . Then is a face of exactly one -simplex of , so . On the other hand, let be any -simplex of , then is a face of exactly one -simplex of . Note that and we then want to prove that is a face of exactly one -simplex of . Suppose that is a face of another -simplex of , then because . So we have , contradicting the fact that is a face of exactly one -simplex of . Then we have . Since , we have , which means that .
Let and , then it is true that is a cut of because is created by . We claim that . The proof of the equality is similar to the one in the proof of Proposition 3.2. It follows that . We then have that
[TABLE]
because each -simplex of has an index less than or equal to in .
Finally, because is a subchain of , we must have . ∎
Combining the above facts, we can conclude:
Theorem 3.1.
Algorithm 1 computes a minimal persistent $d$-cycle for the given interval $[\beta,\delta)$.
Proof.
First, the flow network $(G,s_1,s_2)$ constructed by Algorithm 1 must be valid by Proposition 3.1. Next, because the interval $[\beta,\delta)$ must have a persistent cycle, by Proposition 3.3, the flow network has a cut with finite capacity. This means that the capacity $c(S^*,T^*)$ of the computed minimal cut $(S^*,T^*)$ is finite. By Proposition 3.2, the chain $\zeta^*=\theta^{-1}(\xi(S^*,T^*))$ is a persistent cycle of $[\beta,\delta)$. Assume that $\zeta^*$ is not a minimal persistent cycle of $[\beta,\delta)$ and instead let $\zeta'$ be a minimal persistent cycle of $[\beta,\delta)$. Then there exists a cut $(S',T')$ such that $c(S',T')\leq w(\zeta')<w(\zeta^*)=c(S^*,T^*)$ by Propositions 3.2 and 3.3, contradicting the fact that $(S^*,T^*)$ is a minimal cut. ∎
4 Minimal persistent -cycles of infinite intervals for weak -pseudomanifolds embedded in d+1
We already mentioned that computing minimal persistent -cycles () for infinite intervals is NP-hard even if we restrict to weak -pseudomanifolds (see Section 5.3 for a proof). However, when the complex is embedded in d+1, the problem becomes polynomially tractable. In this section, we present an algorithm for this problem in 444 As mentioned earlier, when , this problem is polynomially tractable for arbitrary complexes.. The algorithm uses a similar duality described in Section 3. However, a direct use of the approach in Section 3 does not work. For example, in Figure 2a, 1-simplices that do not have any 2-cofaces cannot reside in any -connected component of the given complex. Hence, no cut in the flow network may correspond to a persistent cycle of the infinite interval created by such a -simplex. Furthermore, unlike the finite interval case, we do not have a negative simplex whose dual can act as a source in the flow network.
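Let $(K,\mathcal{F},[\beta,+\infty))$ be an input to the problem, where $K$ is a weak $(d+1)$-pseudomanifold embedded in $\mathbb{R}^{d+1}$, $\mathcal{F}$ is a filtration of $K$, and $[\beta,+\infty)$ is an infinite interval of the $d$-th persistence diagram of $\mathcal{F}$. By the definition of the problem, the task boils down to computing a minimal $d$-cycle containing $\sigma_\beta$ (the simplex added at index $\beta$) in the subcomplex $K_\beta$. Note that $K_\beta$ is also a weak $(d+1)$-pseudomanifold embedded in $\mathbb{R}^{d+1}$.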
Generically, assume $\widetilde{K}$ is an arbitrary weak $(d+1)$-pseudomanifold embedded in $\mathbb{R}^{d+1}$ and we want to compute a minimal $d$-cycle containing a $d$-simplex $\widetilde{\sigma}$ for $\widetilde{K}$. By the embedding assumption, the connected components of $\mathbb{R}^{d+1}\setminus|\widetilde{K}|$ are well defined and we call them the voids of $\widetilde{K}$. The complex $\widetilde{K}$ has a natural (undirected) dual graph structure as exemplified by Figure 2a for $d=1$, where the graph vertices are dual to the $(d+1)$-simplices as well as the voids and the graph edges are dual to the $d$-simplices. The duality between cycles and cuts is as follows: Since the ambient space $\mathbb{R}^{d+1}$ is contractible (homotopy equivalent to a point), every $d$-cycle in $\widetilde{K}$ is the boundary of a $(d+1)$-dimensional region obtained by point-wise union of certain $(d+1)$-simplices and/or voids. We can derive a cut of the dual graph by putting all vertices contained in the $(d+1)$-dimensional region into one vertex set and putting the rest into the other vertex set. (The cut here is defined on a graph without sources and sinks, so the cut is simply a partition of the vertex set into two sets.) On the other hand, for every cut of the graph, we can take the point-wise union of all the $(d+1)$-simplices and voids dual to the graph vertices in one set of the cut and derive a $(d+1)$-dimensional region. The boundary of the derived $(d+1)$-dimensional region is then a $d$-cycle in $\widetilde{K}$. We observe that by making the source and sink dual to the two $(d+1)$-simplices or voids that $\widetilde{\sigma}$ adjoins, we can build a flow network where a minimal cut produces a minimal $d$-cycle in $\widetilde{K}$ containing $\widetilde{\sigma}$.
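The cut and cycle duality above can be illustrated on a toy planar example with $d=1$. The following is a minimal sketch, not the paper's implementation: the function name `min_cut`, the dual-graph encoding, and the example complex (a square split by a diagonal, with dual vertices `T1`, `T2` for the two triangles and `V` for the outer void) are all assumptions made for illustration. It uses a plain Edmonds-Karp augmenting-path search rather than Orlin's algorithm.

```python
from collections import defaultdict, deque

def min_cut(dual_edges, source, sink):
    """Minimal s-t cut of an undirected dual graph (Edmonds-Karp sketch).

    dual_edges maps the name of a d-simplex to (u, v, weight), where u and v
    are the dual vertices (cells or voids) on its two sides.
    Returns (cut weight, sorted names of the d-simplices dual to cut edges).
    """
    cap = defaultdict(lambda: defaultdict(int))
    for u, v, w in dual_edges.values():
        cap[u][v] += w  # an undirected edge gives capacity in both directions
        cap[v][u] += w

    def reachable():
        # BFS in the residual graph; returns parent pointers of reached vertices.
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    value = 0
    parent = reachable()
    while sink in parent:
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] += push
        value += push
        parent = reachable()

    source_side = set(parent)
    cut = sorted(name for name, (u, v, _) in dual_edges.items()
                 if (u in source_side) != (v in source_side))
    return value, cut

# Hypothetical example: square a-b-c-d with diagonal ac.
# The simplex of interest is edge "ab", which adjoins triangle T1 and void V.
square = {
    "ab": ("T1", "V", 1), "bc": ("T1", "V", 1), "ac": ("T1", "T2", 3),
    "cd": ("T2", "V", 1), "da": ("T2", "V", 1),
}
value, cycle = min_cut(square, "T1", "V")
```

With the heavy diagonal, the cheapest cut separates both triangles from the void, so the returned edges form the outer square; lowering the weight of `ac` to 1 makes the triangle `ab, ac, bc` the minimal cycle instead.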
The efficiency of the above algorithm is in part determined by the efficiency of the dual graph construction. This step requires identifying the voids that the boundary $d$-simplices are incident on. A straightforward approach would be to first group the boundary $d$-simplices into $d$-cycles by local geometry, and then build the nesting structure of these $d$-cycles to correctly reconstruct the boundaries of the voids. This approach has a quadratic worst-case complexity. To make the void boundary reconstruction faster, we assume that the simplicial complex being worked on is $d$-connected so that building the nesting structure is not needed. Our reconstruction then runs in almost linear time. To satisfy the $d$-connected assumption, we begin our algorithm by taking $\widetilde{K}$ as a $d$-connected subcomplex of $K_\beta$ containing $\sigma_\beta$ and continue only with this $\widetilde{K}$. The computed output is still correct because the minimal cycle in $\widetilde{K}$ is again a minimal cycle in $K_\beta$ as shown in Section 4.2.
We list the pseudo-code in Algorithm 1 of this section, which works as follows: Lines 3–6 set up the complex that the algorithm works on. Line 3 prunes the input complex to produce a pruned complex. Given a complex, the Prune subroutine iteratively deletes a $d$-simplex $\sigma$ such that some $(d-1)$-face of $\sigma$ has $\sigma$ as its only $d$-coface (i.e., $\sigma$ is a dangling $d$-simplex), until no such $d$-simplex can be found. It is not hard to verify that Prune only deletes $d$-simplices not residing in any $d$-cycle, so a minimal $d$-cycle containing the given $d$-simplex is never deleted. We perform the pruning because it can reduce the graph size for the minimal cut computation, which is more time consuming. In lines 4–6, we take the $d$-connected component of the pruned complex containing the given $d$-simplex and add a set of $(d+1)$-simplices to the closure of this component to form the working complex. The added set contains all $(d+1)$-simplices of the pruned complex whose $d$-faces all reside in the component. The reason for adding this set is to reduce the number of voids for the complex and in turn reduce the running time of the subsequent void boundary reconstruction. For example, in Figure 3b, we could treat the entire complex as the pruned complex, all 1-simplices as the component, and all 2-simplices as the added set. If we do not add the 2-simplices to the closure of the component, there will be seven more voids corresponding to the seven 2-simplices. Line 8 reconstructs the void boundaries for the working complex. Each returned item denotes a set of $d$-simplices forming the boundary of a void. As indicated in Section 4.1, the $d$-simplices in a void boundary are oriented. Line 9 constructs the dual graph based on the reconstructed void boundaries. Similar to the algorithm of Section 3, the function returned by DualGraphInf denotes the bijection from the $d$-simplices of the working complex to the edges of the dual graph. Lines 11–17 build the flow network on top of the dual graph. The capacity of each edge is equal to the weight of its dual $d$-simplex and the source and sink are selected as previously described. Line 18 computes a minimal cut for the flow network and line 19 returns the $d$-chain dual to the edges across the minimal cut.
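The pruning step described above can be sketched as follows. This is a hedged illustration, not the authors' Prune subroutine: the function name `prune`, the representation of simplices as vertex tuples, and the example are assumptions. It removes all currently dangling $d$-simplices in each round, which yields the same final result as deleting them one at a time, and it assumes $d\geq 1$.

```python
from itertools import combinations

def prune(simplices, d):
    """Repeatedly delete d-simplices that have a (d-1)-face with no other d-coface.

    simplices: iterable of vertex tuples; only those with d+1 vertices are used.
    Returns the surviving d-simplices as a set of frozensets. Assumes d >= 1.
    """
    top = {frozenset(s) for s in simplices if len(s) == d + 1}
    while True:
        # Map each (d-1)-face to the set of d-simplices that contain it.
        cofaces = {}
        for s in top:
            for face in combinations(sorted(s), d):
                cofaces.setdefault(frozenset(face), set()).add(s)
        # A d-simplex is dangling if it is the only coface of one of its faces.
        dangling = {next(iter(cs)) for cs in cofaces.values() if len(cs) == 1}
        if not dangling:
            return top
        top -= dangling

# Hypothetical example for d = 1: a triangle a-b-c with a tail c-d-e attached.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")]
pruned = prune(edges, 1)
```

In the example, the tail edges `de` and then `cd` are removed, while the three triangle edges, which are the only ones lying on a 1-cycle, survive.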
Complexity.
We make the same assumptions as in the complexity analysis for the algorithm of Section 3. Since the void boundary reconstruction needs to sort the $d$-cofaces of certain $(d-1)$-simplices, its worst-case time complexity is $O(n\log n)$. Then, all operations other than the minimal cut computation take $O(n\log n)$ time. Therefore, similar to the algorithm of Section 3, the algorithm of this section achieves a complexity of $O(n^2)$ by using Orlin’s max-flow algorithm [25].
In the rest of this section, we first describe the subroutine VoidBoundary invoked by Algorithm 1 and then prove the correctness of the algorithm.
4.1 Void boundary reconstruction
As previously stated, the objective of the reconstruction is to identify which voids a boundary $d$-simplex of the complex is incident on. The task becomes complicated because a void may have disconnected boundaries and a $d$-simplex may bound more than one void. This is exemplified in Figure 3a. To address this issue, we orient the boundary $d$-simplices and determine the orientations consistently from the voids they bound. This is possible because an orientation of a $d$-simplex in $\mathbb{R}^{d+1}$ associates exactly one of its two sides to the $d$-simplex. To reconstruct the boundaries, we first inspect the neighborhood of each $(d-1)$-simplex that is a face of a boundary $d$-simplex and pair the oriented boundary $d$-simplices in the neighborhood that locally bound the same void. Figure 2b gives an example of the pairing of oriented boundary $d$-simplices for $d=1$. In Figure 2b, there are three local voids each colored differently. The oriented 1-simplices with the same color bound the same void and are paired.
After pairing the oriented boundary -simplices, we group them by putting paired ones into the same group. Each group then forms a -cycle (with coefficients). This is exemplified by Figure 3 for . Note that in general, the above grouping does not fully reconstruct the void boundaries. This can be seen from Figure 3a where the complex has four voids but the grouping produces six 1-cycles. In order to fully reconstruct the boundaries, one has to retrieve the nesting structure of these -cycles, which may take time in the worst-case. However, as we work on a complex that is -connected, we cannot have voids with disconnected boundaries. Therefore, the grouping of oriented -simplices can fully recover the void boundaries. Figure 3b gives an example for this when , where we add two 1-simplices to make the complex 1-connected. The four 1-cycles produced by the grouping are exactly the boundaries of the four voids.
In the rest of this subsection, we formalize the above ideas for reconstructing void boundaries and provide a proof for the correctness. Throughout this subsection, and are as defined in Algorithm 1. We first introduce the definition of the natural orientation of a -simplex in q. We use its induced orientation to canonically orient the boundary simplices.
Definition 4.1** (Natural orientation [22]).**
Let and be a -simplex in q, an oriented simplex of is naturally oriented if . For each face of , the natural orientation of induces an orientation of which we term as the induced orientation.
We now formally define the boundary of a void as follows:
Definition 4.2** (Boundary of void).**
Let be a simplicial complex embedded in q where , an oriented -simplex of is said to bound a void of if the following conditions are satisfied:
- The simplex is contained in the closure of .
- Let be an interior point of , be a point in such that the line segment is contained in and is orthogonal to the hyperplane spanned by . Furthermore, let be the naturally oriented simplex of . Then, has the induced orientation from .
The boundary of a void is then defined as the set of oriented -simplices of bounding .
Remark 4.1**.**
We can also interpret the boundary of a void as a sum of oriented -simplices, then the boundary defines a -cycle (with coefficients).
We now describe the pairing algorithm of the oriented boundary -simplices for . From now on, we denote the set of boundary -simplices of as . Let be a -simplex which is a face of a -simplex in , we first take a 2D plane which contains an interior point of and is orthogonal to the hyperplane spanned by . We then take the intersection of the plane with each boundary -simplex in the neighborhood of to get a set of line segments that we order circularly starting from an arbitrary one. For each two consecutive line segments in this order which enclose a void, we pick a point on the plane which resides in the void. Suppose that one of the two line segments is derived from a boundary -simplex . We take the -simplex and the induced oriented simplex of derived from the naturally oriented simplex of . For the other line segment, we similarly derive an induced oriented simplex and pair the two oriented -simplices and . Figure 2b can be reused to exemplify the pairing. The union of the shaded regions in the figure is the plane and , , , and are the line segments derived from intersecting the plane with four boundary -simplices. Taking the circular order , we see that the consecutive ones which enclose a void are , , and . For , we can pick as an interior point in the blue region and the two oriented -simplices corresponding to and can be induced and paired.
In summary, the steps of the VoidBoundary subroutine are the following:
1. For each $(d-1)$-simplex that is a face of a boundary $d$-simplex, pair all oriented boundary $d$-simplices in its neighborhood.
2. After gathering all the pairings, group the oriented boundary $d$-simplices by putting all paired ones into the same group.
3. Return the resulting groups, each of which is a group of oriented boundary $d$-simplices.
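The grouping in step 2 amounts to computing connected components of the pairing relation, which a union-find structure handles in almost linear time. The sketch below assumes the geometric pairing of step 1 is already available as a list of pairs; the function name `group_paired`, the string labels for oriented simplices, and the hollow-triangle example are hypothetical.

```python
def group_paired(oriented_simplices, pairs):
    """Group oriented boundary simplices so that paired ones share a group.

    oriented_simplices: hashable, sortable labels of oriented simplices.
    pairs: iterable of (a, b) pairs produced by the local pairing step.
    Returns a sorted list of sorted groups.
    """
    parent = {s: s for s in oriented_simplices}

    def find(x):
        # Find the representative of x, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    groups = {}
    for s in oriented_simplices:
        groups.setdefault(find(s), []).append(s)
    return sorted(sorted(g) for g in groups.values())

# Hypothetical example for d = 1: a hollow triangle in the plane.
# "+" marks the orientation facing the inner void, "-" the outer void.
oriented = ["ab+", "bc+", "ca+", "ab-", "bc-", "ca-"]
pairings = [("ab+", "bc+"), ("bc+", "ca+"), ("ca+", "ab+"),
            ("ab-", "bc-"), ("bc-", "ca-"), ("ca-", "ab-")]
void_boundaries = group_paired(oriented, pairings)
```

The two resulting groups correspond to the boundaries of the inner and the outer void of the triangle.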
The following theorem concludes the correctness of the reconstruction:
Theorem 4.1**.**
Any returned by VoidBoundary is the boundary of a void of .
Proof.
See Appendix A. ∎
4.2 Algorithm correctness
To prove the correctness of Algorithm 1, we need two conclusions about cycles with coefficients. Specifically, Proposition 4.1 says that an embedded -cycle in separates the space and hence the two oriented simplices of a -simplex in the cycle bound different voids. Proposition 4.2 says that a -simplex in a -cycle belongs to a -connected sub-cycle of the -cycle.
Proposition 4.1**.**
Let , be a -cycle (with coefficients) of a simplicial complex embedded in q, and be the closure of the simplicial set . Then for any -simplex of , the two oriented simplices of must bound different voids of .
Proof.
Consider a closed topological -ball such that and equals the boundary of . Let and be the two open half balls of separated by . Then it is true that the two oriented simplices of bound different voids of if and only if and are not connected in . So we only need to show that and are not connected in . Consider a filtration of where is the last simplex added. Because is a positive simplex in the filtration, by adding , the dimension of must increase by 1. By Alexander duality, the dimension of of the complement space also increases by 1. Then and cannot be connected in . ∎
Proposition 4.2**.**
Let be a -cycle (with coefficients) of a simplicial complex where , then for any -simplex of , there must be a -cycle (with coefficients) containing such that and is -connected.
Proof.
We can construct an undirected graph for , with vertices of corresponding to the -simplices in . For each -simplex which is a face of a -simplex of , let be the set of -simplices in having as a face, then must be even. We can pair -simplices of arbitrarily, and make each pair of -simplices form an edge in . Let be the connected component of containing the corresponding vertex of and be the -chain corresponding to , then must be a cycle. This is because we can pair the -faces of all -simplices in according to the edges in , so . Furthermore, contains , , and is -connected. ∎
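The construction in the proof of Proposition 4.2 can be traced on a small example. The following is a sketch under the assumption of $\mathbb{Z}_2$ coefficients, with a cycle stored as a list of vertex tuples; the function name `connected_subcycle` and the bowtie example are illustrative only. Since the pairing around each face is arbitrary, the sketch simply pairs cofaces in the order in which they are encountered.

```python
from itertools import combinations

def connected_subcycle(cycle, sigma):
    """Extract a sub-cycle containing sigma by pairing cofaces around each face.

    cycle: list of q-simplices (vertex tuples) forming a cycle mod 2.
    sigma: one simplex of the cycle.
    Returns the set of simplices (frozensets) in the component of sigma.
    """
    cycle = [frozenset(s) for s in cycle]
    sigma = frozenset(sigma)
    q = len(sigma) - 1

    # Collect, for every (q-1)-face, the q-simplices of the cycle containing it.
    by_face = {}
    for s in cycle:
        for face in combinations(sorted(s), q):
            by_face.setdefault(frozenset(face), []).append(s)

    # Pair the cofaces of each face arbitrarily; each pair becomes a graph edge.
    adjacent = {s: set() for s in cycle}
    for cofaces in by_face.values():
        assert len(cofaces) % 2 == 0, "input is not a cycle mod 2"
        for a, b in zip(cofaces[0::2], cofaces[1::2]):
            adjacent[a].add(b)
            adjacent[b].add(a)

    # Depth-first search for the connected component of sigma.
    seen, stack = {sigma}, [sigma]
    while stack:
        u = stack.pop()
        for v in adjacent[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

# Hypothetical example for q = 1: two triangles sharing the vertex c.
bowtie = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d"), ("d", "e"), ("e", "c")]
sub = connected_subcycle(bowtie, ("a", "b"))
```

With this particular pairing order at the shared vertex `c`, the component of `ab` is the triangle `ab, bc, ca`; a different pairing could return the whole bowtie, which is also a valid connected sub-cycle.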
Throughout the rest of this subsection, some of the symbols we use refer to Algorithm 1. We endow the ambient space d+1 with a “cellular complex” structure by treating voids of as -dimensional “cells”. This cellular complex of d+1 is denoted as and . For , most terminologies from algebraic topology for simplicial complexes are inherited with the exception that -dimensional elements of are called -cells. Then, we can also let denote the bijection from -cells of to . To derive for a void of , we map oriented -simplices in the boundary of (Definition 4.2) to their corresponding unoriented -simplices. Then is defined as the sum (with coefficients) of these unoriented -simplices. It is not hard to see that is a -cycle (with coefficients) because each void boundary is a -cycle (with coefficients).
Proposition 4.3**.**
For any cut of , the -chain is a persistent -cycle of and .
Proof.
We have three things to show: (i) contains ; (ii) ; (iii) is a cycle. Claim (i) and (ii) are not hard to verify and we prove claim (iii) by showing that , so that as a sum of cycles, is a cycle. The detail for the equality of the two chains is omitted as it is similar to the one in the proof of Proposition 3.2. ∎
Proposition 4.4**.**
For any persistent -cycle of , there exists a cut of such that .
Proof.
Because of the nature of the pruning, must reside in . By Proposition 4.2, there must be a -cycle such that is -connected and contains . Hence, resides in . Let be the closure of the simplicial set , we can run the void boundary reconstruction algorithm of Section 4.1 on and take a void boundary containing an oriented simplex of . We can map each oriented simplex of to its unoriented simplex and let be the sum of these unoriented simplices, then is a -cycle (with coefficients) and . By Proposition 4.1, the oppositely oriented simplex of must not be in , so contains . Let bound a void of , we can let be the -chain of consisting of all the -cells residing in and let be the -chain consisting of all the other -cells, then . Let be the two end vertices of . Because the oppositely oriented simplex of does not bound in , it must be true that one of is in and the other is in . We can let or based on which set contains the source of the flow network, then is a cut of the flow network constructed in Algorithm 1. Furthermore, we have and . ∎
The following theorem concludes the correctness of Algorithm 1:
Theorem 4.2**.**
Algorithm 1 computes a minimal persistent -cycle for the given interval .
Proof.
First, the flow network constructed by Algorithm 1 is valid. The reason is that, by Proposition 4.1, it cannot happen that the two oriented simplices of bound the same void of . So must correspond to an edge of . Then by Proposition 4.3 and 4.4, we can reach the conclusion. ∎
5 Hardness for general complexes
Similar to the work [8], the NP-hardness proofs in this section accomplish the reduction with the help of a suspension operator. While Hatcher [21] defines this operator for general topological spaces, we need a definition of the operator for simplicial complexes and observe some of its properties that are useful for the proofs.
5.1 Suspension operator
Definition 5.1** (Suspension [20]).**
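The suspension $\mathcal{S}K$ of a simplicial complex $K$ is defined as the simplicial complex
$$\mathcal{S}K = K\cup\big\{\{\omega_1\},\{\omega_2\}\big\}\cup\bigcup_{\sigma\in K}\big\{\sigma\cup\{\omega_1\},\ \sigma\cup\{\omega_2\}\big\}$$
where $\omega_1$, $\omega_2$ are two extra vertices.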
Remark 5.1**.**
In the above definition, we denote a simplex by its set of vertices.
In the rest of this subsection, we let be an arbitrary simplicial complex. Any simplex of the form in is called a suspended simplex. The symbol is also used to denote a linear map , where for any -simplex of . Note that since is injective, the map defines an isomorphism from to the image . For any chain , we abuse the notation slightly by letting denote the chain in mapped to under .
Proposition 5.1**.**
For any , the following diagram commutes:
$$\begin{array}{ccc}
\mathsf{C}_{q}(K) & \xrightarrow{\ \partial\ } & \mathsf{C}_{q-1}(K)\\
{\scriptstyle\mathcal{S}}\big\downarrow{\scriptstyle\approx} & & {\scriptstyle\mathcal{S}}\big\downarrow{\scriptstyle\approx}\\
\mathcal{S}(\mathsf{C}_{q}(K)) & \xrightarrow{\ \partial\ } & \mathcal{S}(\mathsf{C}_{q-1}(K))
\end{array}$$
Proof.
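For any $q$-simplex $\sigma=\{v_0,\ldots,v_q\}$ of $K$, we have
$$\begin{aligned}
\partial\big(\mathcal{S}(\sigma)\big) &= \partial\big(\sigma\cup\{\omega_1\}\big)+\partial\big(\sigma\cup\{\omega_2\}\big)\\
&= \sigma+\sum_{i=0}^{q}\{v_0,\ldots,\widehat{v_i},\ldots,v_q,\omega_1\}+\sigma+\sum_{i=0}^{q}\{v_0,\ldots,\widehat{v_i},\ldots,v_q,\omega_2\}\\
&= \sum_{i=0}^{q}\mathcal{S}\big(\{v_0,\ldots,\widehat{v_i},\ldots,v_q\}\big)=\mathcal{S}(\partial\sigma).
\end{aligned}$$
In the above equations, the notation $\widehat{v_i}$ means that $v_i$ is deleted from the simplex. ∎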
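The commutativity stated in Proposition 5.1 can be checked numerically on small chains. The sketch below assumes $\mathbb{Z}_2$ coefficients and assumes that the chain map sends a simplex $\sigma$ to $\sigma\cup\{\omega_1\}+\sigma\cup\{\omega_2\}$; the names `W1`, `W2`, `boundary`, and `suspend_chain` are hypothetical. A chain is stored as a set of simplices, so addition mod 2 is symmetric difference.

```python
W1, W2 = "w1", "w2"  # the two extra suspension vertices (labels are assumptions)

def boundary(chain):
    """Boundary mod 2 of a chain given as a set of frozenset simplices."""
    out = set()
    for s in chain:
        if len(s) == 1:
            continue  # the boundary of a vertex is taken to be zero
        for v in s:
            out ^= {s - {v}}  # symmetric difference is addition mod 2
    return out

def suspend_chain(chain):
    """Map each simplex s to s + {w1} and s + {w2}, extended linearly mod 2."""
    out = set()
    for s in chain:
        out ^= {s | {W1}, s | {W2}}
    return out

# Check the commuting square on a 2-simplex and on a 1-simplex.
triangle = {frozenset({"a", "b", "c"})}
edge = {frozenset({"a", "b"})}
commutes_on_triangle = boundary(suspend_chain(triangle)) == suspend_chain(boundary(triangle))
commutes_on_edge = boundary(suspend_chain(edge)) == suspend_chain(boundary(edge))
```

In both checks the two copies of the original simplex produced by the boundary cancel mod 2, which is exactly the step used in the proof.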
Proposition 5.2**.**
For and any -cycle of containing only suspended simplices, one has .
Proof.
For any suspended -simplex of , if , then must also belong to because no other suspended -simplices of have in the boundary. If , the same argument follows. ∎
Proposition 5.3**.**
If is the top dimension of and , then for any such that contains only suspended simplices, one has .
Proof.
Because is the top dimension of , contains only suspended simplices. For any , we have $\sigma\in\partial\big(\sigma\cup\{\omega_i\}\big)$. If , to make cancelled in , must also belong to because no other -simplices in have in the boundary. If , the same argument follows. ∎
5.2 Hardness for finite intervals
The following proposition helps to prove our conclusion of the hardness:
Proposition 5.4**.**
PCYC-FINd-1 reduces to PCYC-FINd for .
Proof.
Given an instance of PCYC-FINd-1, where the complex of is denoted as , we can assume the top dimension of to be . The reason is that if it were not, we can restrict to the -skeleton of without affecting and the persistent -cycles. Then, we let be the simplicial complex for the instance of PCYC-FINd we are going to construct. For any suspended -simplex of , let the weight of be half of the weight of in . Furthermore, let the weight of any non-suspended -simplex of be the sum of all the weights of -simplices in plus . We endow with a filtration , where is the number of simplices of . Denoting the simplex added in as and the simplex added in as , we let , , and for any , , , .
We observe the following facts:
- (i) For any , is positive and pairs with in .
- (ii) For any and , if there is a -cycle created by which is a boundary in , then there is a -cycle created by which is a boundary in .
- (iii) For any and , if there is a -cycle created by which is a boundary in , then there is a -cycle created by which is a boundary in .
The correctness of (i) is not hard to verify. To verify (ii), we can suspend the -cycle and use Proposition 5.1 to reach the claim. The argument for (iii) is as follows: Consider a -cycle created by which is a boundary in . For any non-suspended -simplex $\sigma$ of , we add $\partial\big(\sigma\cup\{\omega_1\}\big)$ to the cycle so that $\sigma$ is canceled and only suspended simplices are added. Note that the adding process only adds -simplices in and never cancels . After all non-suspended simplices of are canceled, we derive a -cycle which is created by and contains only suspended simplices. By Proposition 5.2, is well defined. Since is homologous to in , is also a boundary in . Let be the boundary of a -chain in . Because , by Proposition 5.3, . Furthermore, by Proposition 5.1, we have . So is a -cycle created by which is a boundary in .
From the above facts, it is immediate that is a positive simplex in and pairs with so that is an interval in . It is also true that there is a bijection from the persistent -cycles of to the persistent -cycles of containing only suspended simplices. Furthermore, the bijection preserves the weights of the cycles. From the weight assigning policy, the minimal persistent -cycle of must contain only suspended simplices, so this minimal persistent -cycle of induces a minimal persistent -cycle of . Now we have reduced PCYC-FINd-1 to PCYC-FINd. Furthermore, the reduction is in polynomial time and the size of is a polynomial function of the size of . ∎
We have the following result from [13]:
Proposition 5.5**.**
PCYC-FIN1 is NP-hard.
Combining Proposition 5.4 and 5.5, we obtain the following theorem:
Theorem 5.1**.**
PCYC-FINd is NP-hard for .
5.3 Hardness for infinite intervals
In this subsection, we prove that it is NP-hard to approximate WPCYC-INFd with any fixed ratio. Let PROB be a minimization problem with solutions having positive costs. Given an instance of PROB, let be the cost of the minimal solution of . For , a solution of with cost is said to have an approximation ratio if [10]. We let PROB denote the problem that asks for an approximate solution with ratio given an instance of PROB. Moreover, in order to make approximation ratios well-defined for WPCYC-INFd, we let WPCYC-INF denote a subproblem of WPCYC-INFd where all -simplices are positively weighted.
Before proving the hardness result, we first recall the definition of the nearest codeword problem, which is NP-hard to approximate with any fixed ratio [8]:
Problem 5.1** (NR-CODE).**
Given an $m\times k$ full-rank matrix $A$ over $\mathbb{Z}_2$ for $k<m$ and a vector $y_0\in\mathbb{Z}_2^m$, find a vector in $y_0+\operatorname{img}A$ with the minimal Hamming weight.
Remark 5.2**.**
The Hamming weight of a vector , denoted as , is the number of non-zero components in .
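To make the problem statement concrete, the following brute-force sketch enumerates all candidate vectors for a tiny instance. It assumes the usual reading of the nearest codeword problem: the input is an $m\times k$ matrix `A` over $\mathbb{Z}_2$ and a vector `y0`, and the goal is a minimum-weight vector of the form `A x + y0`. The function names and the instance are hypothetical, and the enumeration takes exponential time in $k$, in line with the hardness of the problem.

```python
from itertools import product

def hamming_weight(v):
    """Number of non-zero components of a vector over Z_2."""
    return sum(1 for x in v if x % 2)

def nearest_codeword_bruteforce(A, y0):
    """Return a minimum-weight vector of the form A x + y0 over Z_2.

    A: list of m rows, each a list of k entries in {0, 1}.
    y0: sequence of m entries in {0, 1}.
    Tries all 2^k coefficient vectors x, so this only suits tiny inputs.
    """
    m, k = len(A), len(A[0])
    best = None
    for x in product((0, 1), repeat=k):
        y = tuple((sum(A[i][j] * x[j] for j in range(k)) + y0[i]) % 2
                  for i in range(m))
        if best is None or hamming_weight(y) < hamming_weight(best):
            best = y
    return best

# Hypothetical 3 x 2 instance.
A = [[1, 0],
     [0, 1],
     [1, 1]]
y0 = (1, 1, 1)
closest = nearest_codeword_bruteforce(A, y0)
```

For this instance the zero coefficient vector gives weight 3, while every non-zero choice of `x` gives weight 1, so the minimum weight is 1.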
Theorem 5.2**.**
WPCYC-INF is NP-hard to approximate with any fixed ratio.
Similar to the NP-hardness proof of homology localization in [8], our proof of Theorem 5.2 conducts the reduction from the NR-CODE problem. One may think that a direct reduction from homology localization may be more straightforward. However, such a reduction is not immediately evident. The two problems appear to be of different nature: While the homology localization problem asks for a minimal cycle in a given homology class, WPCYC-INF asks for a minimal cycle in a complex containing a given simplex without referring to any particular homology class.
Proof.
For any , we reduce the NP-hard problem NR-CODE to WPCYC-INF. Given an instance of NR-CODE, we first compute the parity check matrix [8], which is a matrix such that . Similar to the proof of Lemma 4.3.1 in [8], we then build a “tube complex” with 1-cells each of which is a 1-sphere and 2-cells each of which is a 2-sphere with holes. The 2-cells of are attached to the 1-cells along the holes such that the boundary matrix of this tube complex equals . The “-chains” and “-cycles” for a tube complex are analogously defined as for a simplicial complex. We also assign a weight of 1 to each 2-cell of . By this construction, there is a straightforward bijection , such that the Hamming weight of a vector equals the weight of the corresponding 2-chain. Note that . Let , we then add a 2-cell whose boundary equals to and get a new tube complex . We call the 2-cycles in which are not in as the new 2-cycles in . Then is a new 2-cycle in and the set of new 2-cycles in is . We let the weight of also be 1. Note that there is a bijection , where for any , such that .
We then construct an instance of WPCYC-INF by first triangulating to get a simplicial complex . We make 2-weighted such that the sum of the weights of all triangles in any 2-cell of equals the weight of the 2-cell. It is not hard to make the size of a polynomial function of the number of cells of . Let be a 2-simplex in the triangulation of the 2-cell . We build a filtration of with being the last simplex added. Let the index of in be . Then, is an infinite interval of . Note that there is a bijection between the new 2-cycles in and the persistent 2-cycles of , where the weights of the cycles are preserved. Therefore, from the solution of WPCYC-INF with the input , we can derive a new 2-cycle of , where and is an -approximation of the minimal new 2-cycle. Let be a minimal new 2-cycle of , we have
[TABLE]
We also have
[TABLE]
Therefore
[TABLE]
Since is a minimal solution of , then is a -approximation of the minimal solution of . Hence, we have reduced NR-CODE to WPCYC-INF. Furthermore, the reduction is in polynomial time and the sizes of the instances are related by a polynomial function, so WPCYC-INF is NP-hard. ∎
Theorem 5.3**.**
WPCYC-INF is NP-hard to approximate with any fixed ratio for .
Proof.
For any and , we reduce WPCYC-INF to WPCYC-INF. Given an instance of WPCYC-INF, where the complex of is denoted as , let where is the -skeleton of . We make -weighted such that any -simplex of has half of the weight of in . The complex is endowed with a filtration such that is the last simplex added to . Let be the index of in , then . It is true that restricts to a bijection from to preserving the weights of the cycles. Furthermore, for any , is a persistent -cycle of if and only if is a persistent -cycle of . Suppose that is a solution for the instance of WPCYC-INF, i.e., is an -approximation of the minimal solution. Then, is an -approximation for the instance of WPCYC-INF. Therefore, the reduction is done. ∎
6 Experimental results
We experiment with our algorithms for WPCYC-FIN2 and WEPCYC-INF2 on several volume datasets. Since volume data have a natural cubical complex structure, we adapt our implementation slightly in order to work on cubical complexes. The cubical complex for volume data consists of cells in dimensions from 0 to 3 with the underlying space homeomorphic to a 3-dimensional ball. Note that a filtration built from a volume dataset does not produce any infinite intervals. Hence, in order to test our algorithm for WEPCYC-INF2, we take a finite interval and compute the minimal 2-cycle born at the birth time, which is exactly what WEPCYC-INF2 computes. We use the Gudhi [29] library to build the filtrations and compute the persistence intervals. From the experiments, we can see that the minimal persistent 2-cycles computed by our algorithms capture various features of the data which originate from different fields. Note that the combustion, hurricane, and medical datasets are time-varying and we chose a single time frame to compute the persistent intervals and cycles.
Cosmology.
The simulation data shown in Figure 4a from computational cosmology [2] consist of dark matter represented as particles. The thread-like structures in deep purple shown in Figure 4a correspond to sites of large scale structure formation. Galaxy clusters/superclusters are contained in such large scale structures. Figure 4b shows the minimal persistent 2-cycles of the top five longest intervals computed by our algorithms and these cycles precisely represent the top five galaxy clusters/superclusters in volume.
Combustion.
The data shown in Figure 4c correspond to the physical variable $\chi$ from a model of a turbulent combustion process (a physical variable defines a scalar value of a certain kind on each point). The variable $\chi$ represents scalar dissipation rate and provides a measure of the maximum possible chemical reaction rate. The minimal persistent 2-cycles shown in Figure 4d represent areas with high values of $\chi$.
Hurricane.
This dataset with physical variables corresponds to the devastating hurricane named Isabel (the Hurricane Isabel data is produced by the Weather Research and Forecast (WRF) model, courtesy of NCAR, and the U.S. National Science Foundation (NSF)). We down-sampled the data into a resolution of and worked with two physical variables. The minimal persistent 2-cycle colored blue in Figure 5a is computed on the cloud-volume variable and extracts the eye of the hurricane. The minimal persistent 2-cycle colored green in Figure 5b is computed on the pressure variable and captures the jagged shape of the pressure variation around the hurricane.
Medical imaging.
This dataset from the ADNI [26] project contains the MRI scan of a healthy human skull. The minimal persistent 2-cycles corresponding to the larger intervals as shown in Figure 5c are computed from two time frames. They extract significant features such as eyes, cartilages, nerves, and muscles.
Material science.
We consider the atomic configuration of $\mathrm{BaTiO_3}$, which is a ferroelectric material used for making capacitors, transducers, and microphones. Figure 6a shows the atomic configuration of the molecule, where the red, grey, and green balls denote the Oxygen, Titanium, and Barium atoms respectively and the radii of the balls equal the radii of the corresponding atoms. Volume data are built by uniformly sampling a lattice structure similar to the one shown in Figure 6a, with the step width equal to one angstrom (note that Figure 6a only shows a lattice structure). The scalar value at a point of the volume is determined as follows: For each atom, let the distance from the point to the atom’s center be , then the scalar value of the point contributed by the atom is , where is the radius of the atom and is the atomic weight. The scalar value at the point is then equal to the sum of the above values contributed by all atoms. For the purpose of this experiment, we computed minimal persistent 2-cycles on both the original scalar function and its negation. Figure 6b shows a portion of the minimal persistent 2-cycles computed on the original function, where the purple, red, and green cycles correspond to atoms of Barium, Titanium, and Oxygen respectively. In our experiment, every atom corresponds to such a minimal persistent 2-cycle of a long interval. Figure 6c shows a portion of the minimal persistent 2-cycles computed on the negated function, where the cycles complement the Barium atoms. Figure 6d shows the output on the negated function from a tetragonal lattice structure [27], where the atomic bonds are not straight (see Figure 6d inlay). The stretch on the lattice structure leads to minimal persistent 2-cycles with non-trivial genus.
7 Conclusions
In this paper, we inspect the computational complexity of several problems concerning minimal persistent cycles. We expand the hardness results found in [13] and identify the cases that are NP-hard and others that are solvable in polynomial time. For general complexes, we conclude that the computation is NP-hard over all dimensions for finite intervals and NP-hard over dimensions greater than one for infinite intervals. Besides, we find the problems to be tractable in dimension $d$ if the given complex is a weak $(d+1)$-pseudomanifold and, for infinite intervals, if the weak $(d+1)$-pseudomanifold is embedded in $\mathbb{R}^{d+1}$.
This research leads to some open questions concerning persistent cycles:
i. In our experiments, some persistent cycles correspond to important features of the data (see Section 6). However, we also ran into some intervals whose persistent cycles do not have obvious meanings. If there are ways to design filtrations for data such that persistent cycles are related to the important features, then the prospect for the application of persistent cycles or persistence in general would be more extensive.
ii. As found in [13], persistent cycles are not stable in general even when only the weights of the cycles are considered. It will be helpful to figure out assumptions that are still relevant in practice, but under which the persistent cycles remain stable.
iii. We have presented -time algorithms for computing a minimal persistent cycle for a given interval. A natural question is whether this time complexity can be improved. Furthermore, can we devise a better algorithm to compute minimal persistent cycles for all intervals (i.e., the minimal persistent basis [13]), improving upon the obvious -time algorithm that runs our algorithms on each interval?
Acknowledgments:
This research was conducted with the support of the NSF grants CCF-1740761 and CCF-1839252. We thank the anonymous reviewers for insightful comments.
Appendix A Proof of Theorem 4.1
We first define some symbols used in this section. The interior of a set is denoted by . The boundary of a topological ball is denoted by . The set of -cofaces of a simplex in a -complex [21] is denoted by .
The proof of Theorem 4.1 is based on the extended Jordan–Brouwer separation theorem (Theorem A.1) by Alexander [1]. The statement of the theorem depends on the following definition:
**Definition A.1 (Pseudomanifold).**
A simplicial complex $K$ is a $q$-pseudomanifold if $K$ is a pure $q$-complex and each $(q-1)$-simplex is a face of exactly two $q$-simplices in $K$.
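As a concrete illustration of Definition A.1, the following Python sketch checks the two conditions directly. It is not part of the paper: the function name `is_pseudomanifold`, the convention that a complex is given by the vertex sets of its maximal simplices, and the use of $q$ for the dimension are assumptions made here for illustration only.

```python
from collections import Counter
from itertools import combinations


def is_pseudomanifold(maximal_simplices, q):
    """Check Definition A.1 for a complex given by its maximal simplices.

    Hypothetical helper: each simplex is an iterable of vertex labels, and
    the list is assumed to contain exactly the maximal simplices.
    """
    simplices = {frozenset(s) for s in maximal_simplices}
    # Purity: every maximal simplex must be a q-simplex (q + 1 vertices).
    if not simplices or any(len(s) != q + 1 for s in simplices):
        return False
    # For each (q-1)-face, count the q-simplices that contain it.
    coface_count = Counter(
        frozenset(face)
        for s in simplices
        for face in combinations(sorted(s), q)
    )
    # Every (q-1)-simplex must be a face of exactly two q-simplices.
    return all(count == 2 for count in coface_count.values())


# Boundary of a tetrahedron: a triangulated 2-sphere.
tetra_boundary = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
# A single triangle: its edges have only one coface each.
single_triangle = [(0, 1, 2)]
# Two hollow tetrahedra glued at the single vertex 0.
pinched = tetra_boundary + [(0, 4, 5), (0, 4, 6), (0, 5, 6), (4, 5, 6)]

results = [
    is_pseudomanifold(tetra_boundary, 2),
    is_pseudomanifold(single_triangle, 2),
    is_pseudomanifold(pinched, 2),
]
```

In this sketch the two hollow tetrahedra glued at a vertex still satisfy the definition, because the definition as stated constrains only the $(q-1)$-simplices and says nothing about connectedness.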
**Remark A.1.**
Note that definitions of $q$-pseudomanifolds, such as the one in [28], typically also assume the complex to be $q$-connected.
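The connectedness assumption mentioned in the remark can be tested separately. The sketch below assumes the usual meaning of $q$-connected, namely that any two $q$-simplices are joined by a sequence of $q$-simplices in which consecutive ones share a $(q-1)$-face (the same kind of sequence used later in the proof). The function name `is_q_connected` and the input format are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict, deque
from itertools import combinations


def is_q_connected(q_simplices, q):
    """Return True if the q-simplices form one class under face adjacency.

    Hypothetical helper: two q-simplices are adjacent when they share a
    (q-1)-face; connectivity is decided by breadth-first search.
    """
    simplices = [frozenset(s) for s in q_simplices]
    if not simplices:
        return True
    # Map each (q-1)-face to the indices of the q-simplices containing it.
    face_to_cofaces = defaultdict(list)
    for idx, s in enumerate(simplices):
        for face in combinations(sorted(s), q):
            face_to_cofaces[face].append(idx)
    # Breadth-first search starting from the first q-simplex.
    seen = {0}
    queue = deque([0])
    while queue:
        cur = queue.popleft()
        for face in combinations(sorted(simplices[cur]), q):
            for nxt in face_to_cofaces[face]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return len(seen) == len(simplices)


# Boundary of a tetrahedron: any two triangles share an edge.
sphere = [(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
# Two hollow tetrahedra meeting only at vertex 0: no shared edge across them.
pinched = sphere + [(0, 4, 5), (0, 4, 6), (0, 5, 6), (4, 5, 6)]

sphere_connected = is_q_connected(sphere, 2)
pinched_connected = is_q_connected(pinched, 2)
```

Under these assumptions, the pinched example is connected as a topological space but not $2$-connected, since no edge is shared between the two tetrahedra.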
**Theorem A.1.**
Let $q \geq 2$ and let $K$ be a finite $(q-1)$-connected $(q-1)$-pseudomanifold embedded in $\mathbb{R}^{q}$. Then $\mathbb{R}^{q} \setminus |K|$ has exactly two connected components.
Now we can finish our proof:
Proof of Theorem 4.1.
The general idea of the proof is as follows. Using a trick that we call “de-contracting”, we first create a $\Delta$-complex where each oriented simplex of uniquely corresponds to an unoriented simplex. Then, using a trick that we call “de-pinching”, we show that is the boundary of a region . Finally, we use this fact in a proof by contradiction to reach the conclusion. Figure 7b gives an example of “de-contracting” and “de-pinching”.
First, let be the set of $d$-simplices of whose two oriented simplices are both in . For a $d$-simplex of , we can let be a topological $(d+1)$-ball residing in $\mathbb{R}^{d+1}$ such that equals two $d$-simplices with boundaries glued together. We then homeomorphically map points of to . By taking care of the mapping near the boundary of , we can get a new ambient $\mathbb{R}^{d+1}$ and a new $\Delta$-complex where all simplices of are untouched except that now corresponds to the two $d$-simplices bounding . We can also think of the above process as “de-contracting” the topological $d$-ball into the topological $(d+1)$-ball so that turns into two separate $d$-simplices with identical $(d-1)$-faces (see Figure 7a for an example). After doing the “de-contraction” for all $d$-simplices in , we get a $\Delta$-complex . It is true that an oriented boundary $d$-simplex in can be naturally identified as an oriented boundary $d$-simplex in . It is also true that the groups of oriented boundary $d$-simplices in are still groups of oriented boundary $d$-simplices in under the natural identification. So we can let denote the same group of oriented $d$-simplices in . The construction guarantees that if is the boundary of a void of , then is also the boundary of a void of . So we only need to show that is the boundary of a void of (see Figure 7b for an example). From now on, we always treat as a set of oriented $d$-simplices as well as a $d$-cycle (with coefficients) in .
Since different oriented simplices of correspond to different unoriented simplices in , we define a bijection . The bijection maps each oriented simplex of to its corresponding unoriented simplex and is the image of this mapping. We then let be the closure of the simplicial set . Note that is a $d$-cycle (with coefficients) of and is a subcomplex of . Therefore, each $(d-1)$-simplex is a face of an even number of $d$-simplices in . We first pick a $(d-1)$-simplex $\sigma^{d-1}$ of such that $\big|\mathrm{cof}_{d}^{\mathcal{M}}(\sigma^{d-1})\big| > 2$, then pick two $d$-simplices and from such that and are paired in the void boundary reconstruction for . It is then true that forms a topological $d$-ball containing . Forming the topological $d$-balls for all such pairs of $d$-simplices in , we get a set of $d$-balls for $\kappa = \big|\mathrm{cof}_{d}^{\mathcal{M}}(\sigma^{d-1})\big| \big/ 2$. For each , we slightly move while keeping untouched. We then take the closure of each to get a new $\Delta$-complex in which the ’s have their interiors disjoint. Note that in , now corresponds to different $(d-1)$-simplices sharing the boundary. We can repeat the above “de-pinching” process for each $(d-1)$-simplex having more than two $d$-cofaces in and then get a sequence of $\Delta$-complexes . In the sequence, and is derived from by doing the “de-pinching” on a $(d-1)$-simplex. It is then true that is a pure $d$-dimensional $d$-connected $\Delta$-complex where each $(d-1)$-simplex is a face of exactly two $d$-simplices. Since we can subdivide to make it a simplicial complex, by Theorem A.1, must separate $\mathbb{R}^{d+1}$ into two connected components. Note that for each , we can treat as a subset of because to deform back to , we only need to contract some points in to points in . Then the connected components of are still connected in . Since all oriented $d$-simplices of bound the same void of , we can let this void be . The void is still connected in because . Therefore, is still connected in . We can let be the connected component of containing and let be the other connected component.
The $d$-simplices in and can be identified because going from each to the interior of each $d$-simplex is never touched. Therefore, is still a $d$-cycle (with coefficients) in . We then have that the two $d$-cycles (with coefficients) in , which are derived from the two consistent orientations of simplices of , bound and . Then, as one of the two $d$-cycles (with coefficients) derived from , must be the boundary of or in . We have that bounds because does not contain points from . A fact about our construction is that to deform each back into , we only need to contract points in . This implies that is still a void of with boundary (see Figure 7b for an example).
To prove that is the boundary of a void of , we only need to show that there are no oriented $d$-simplices which are in the boundary of but do not belong to . For contradiction, suppose that there is such an oriented $d$-simplex . Then must not be oppositely oriented to any oriented simplex of because otherwise would bound another connected component of and thus bound another connected component of . Let be the unoriented $d$-simplex of , then because otherwise would be oppositely oriented to an oriented simplex of . Since , the interior of must reside in . From now on, we always treat as a void of . Then among all voids of , the interior of resides in . This is because is the void of containing . If resides in a void other than , points to either side of cannot be from . Since is $d$-connected, there must be a sequence of $d$-simplices of such that , , and , share a $(d-1)$-face for each such that . Because the interior of is not in , we can let be the first $d$-simplex in the sequence whose interior is not in , then and the interior of is in . Let be the $(d-1)$-face shared by and , we claim that . If , then it is obvious that . If , then it is also true that because otherwise the interiors of and would be connected in . Around the neighborhood of during the void boundary reconstruction for , any two paired oriented simplices from enclose a region residing in . Because of the nature of the pairing, cannot be contained in any of the regions enclosed by the paired oriented simplices from . Since is the boundary of the void of , all other regions in the neighborhood of must not be in . This implies that is not in , which is a contradiction. ∎
References
- [1] James W. Alexander. A proof and extension of the Jordan-Brouwer separation theorem. Transactions of the American Mathematical Society, 23(4):333–349, 1922.
- [2] Ann S. Almgren, John B. Bell, Mike J. Lijewski, Zarija Lukić, and Ethan Van Andel. Nyx: A massively parallel AMR code for computational cosmology. The Astrophysical Journal, 765(1):39, Feb 2013.
- [3] Glencora Borradaile, Erin Wolf Chambers, Kyle Fox, and Amir Nayyeri. Minimum cycle and homology bases of surface-embedded graphs. Journal of Computational Geometry, 8(2), 2017.
- [4] Erin W. Chambers, Jeff Erickson, and Amir Nayyeri. Minimum cuts and shortest homologous cycles. In Proceedings of the Twenty-Fifth Annual Symposium on Computational Geometry, pages 377–385. ACM, 2009.
- [5] Frédéric Chazal, David Cohen-Steiner, Marc Glisse, Leonidas J. Guibas, and Steve Y. Oudot. Proximity of persistence modules and their diagrams. In Proceedings of the Twenty-Fifth Annual Symposium on Computational Geometry, pages 237–246. ACM, 2009.
- [6] Frédéric Chazal, Vin De Silva, Marc Glisse, and Steve Oudot. The structure and stability of persistence modules. Springer, 2016.
- [7] Chao Chen and Daniel Freedman. Measuring and computing natural generators for homology groups. Computational Geometry, 43(2):169–181, 2010.
- [8] Chao Chen and Daniel Freedman. Hardness results for homology localization. Discrete & Computational Geometry, 45(3):425–448, 2011.
