Computing Minimal Persistent Cycles: Polynomial and Hard Cases

Tamal K. Dey; Tao Hou; Sayan Mandal

arXiv:1907.04889·cs.CG·February 18, 2020


TL;DR

This paper investigates the computational complexity of finding minimal persistent cycles across different dimensions, proving NP-hardness in general but identifying specific tractable cases related to weak pseudomanifolds.

Contribution

It extends the understanding of minimal persistent cycle computation to higher dimensions and introduces polynomial algorithms for certain cases involving weak pseudomanifolds.

Findings

01

NP-hardness for d>1 persistent cycles in general complexes

02

Polynomial algorithms for finite intervals in weak pseudomanifolds

03

Experiments show minimal cycles capture significant data features

Tables

Table 1: Hardness results for minimal persistent cycle problems, with bold results denoting new findings.

Problem	Restriction on $K$	$d$	Hardness
PCYC-FIN_d	$-$	$\geq 1$	NP-hard
WPCYC-FIN_d	$K$ a weak $(d+1)$-pseudomanifold	$\geq 1$	Polynomial
PCYC-INF_d	$-$	$= 1$	Polynomial
WPCYC-INF_d	$K$ a weak $(d+1)$-pseudomanifold	$\geq 2$	NP-hard
WEPCYC-INF_d	$K$ a weak $(d+1)$-pseudomanifold in $\mathbb{R}^{d+1}$	$\geq 2$	Polynomial

Equations

\partial_{q}\big{(}[v_{0},\ldots,v_{q}]\big{)}=\sum_{i=0}^{q}(-1)^{i}[v_{0},\ldots,\widehat{v}_{i},\ldots,v_{q}]

\theta:\{(d+1)\text{-simplices of }K\}\to V(G)\setminus\{\phi\}

\theta:\{d\text{-simplices of }K\}\to E(G)

c(S,T)=\sum_{e\in\theta(\zeta)}c(e)=\sum_{\theta^{-1}(e)\in\zeta}w\big(\theta^{-1}(e)\big)=w(\zeta)

c(S,T)=\sum_{e\in\theta(\zeta^{\prime})}c(e)=\sum_{\theta^{-1}(e)\in\zeta^{\prime}}w\big(\theta^{-1}(e)\big)=w(\zeta^{\prime})

\begin{array}[]{l}\mathcal{S}K=\big{\{}\{\omega_{1}\},\{\omega_{2}\}\big{\}}\cup K\cup\Big{(}\bigcup_{\sigma\in K}\big{\{}\sigma\cup\{\omega_{1}\},\sigma\cup\{\omega_{2}\}\big{\}}\Big{)}\end{array}

\begin{aligned}\partial(\mathcal{S}\sigma)&=\sum_{i=0}^{q}\{v_{0},\ldots,\widehat{v_{i}},\ldots,v_{q},\omega_{1}\}+\{v_{0},\ldots,v_{q}\}+\sum_{i=0}^{q}\{v_{0},\ldots,\widehat{v_{i}},\ldots,v_{q},\omega_{2}\}+\{v_{0},\ldots,v_{q}\}\\
&=\sum_{i=0}^{q}\big(\{v_{0},\ldots,\widehat{v_{i}},\ldots,v_{q},\omega_{1}\}+\{v_{0},\ldots,\widehat{v_{i}},\ldots,v_{q},\omega_{2}\}\big)\\
&=\sum_{i=0}^{q}\mathcal{S}\big(\{v_{0},\ldots,\widehat{v_{i}},\ldots,v_{q}\}\big)=\mathcal{S}\Big(\sum_{i=0}^{q}\{v_{0},\ldots,\widehat{v_{i}},\ldots,v_{q}\}\Big)=\mathcal{S}\partial(\sigma)\end{aligned}

\frac{w(t+y_{0}+\zeta)}{w(t+y_{0}+\zeta^{*})}\leq r\implies\frac{w(t)+w(y_{0}+\zeta)}{w(t)+w(y_{0}+\zeta^{*})}\leq r\implies w(y_{0}+\zeta)\leq r-1+r\,w(y_{0}+\zeta^{*})

1\leq\frac{r}{r-1}\,w(y_{0}+\zeta^{*})\implies r-1\leq r\,w(y_{0}+\zeta^{*})

w(y_{0}+\zeta)\leq 2r\,w(y_{0}+\zeta^{*})\implies\|y_{0}+\phi^{-1}(\zeta)\|_{H}\leq 2r\,\|y_{0}+\phi^{-1}(\zeta^{*})\|_{H}

Full text

Computing Minimal Persistent Cycles: Polynomial and Hard Cases

Supported by NSF grants CCF-1740761 and CCF-1839252.

Tamal K. Dey, Department of Computer Science and Engineering, The Ohio State University.

Tao Hou, Department of Computer Science and Engineering, The Ohio State University.

Sayan Mandal, Department of Computer Science and Engineering, The Ohio State University.

Abstract

Persistent cycles, especially the minimal ones, are useful geometric features functioning as augmentations for the intervals in a purely topological persistence diagram (also termed a barcode). In our earlier work, we showed that computing minimal 1-dimensional persistent cycles (persistent 1-cycles) for finite intervals is NP-hard, while the same problem for infinite intervals is polynomially tractable. In this paper, we address this problem for general dimensions with $\mathbb{Z}_{2}$ coefficients. In addition to proving that it is NP-hard to compute minimal persistent $d$-cycles ($d>1$) for both types of intervals given arbitrary simplicial complexes, we identify two interesting cases which are polynomially tractable. These two cases assume the complex to be a certain generalization of manifolds, which we term weak pseudomanifolds. For finite intervals from the $d$-th persistence diagram of a weak $(d+1)$-pseudomanifold, we utilize the fact that persistent cycles of such intervals are null-homologous and reduce the problem to a minimal cut problem. Since the same problem for infinite intervals is NP-hard, we further assume the weak $(d+1)$-pseudomanifold to be embedded in $\mathbb{R}^{d+1}$, so that the complex has a natural dual graph structure and the problem again reduces to a minimal cut problem. Experiments with both algorithms on scientific data indicate that the minimal persistent cycles capture various significant features of the data.

1 Introduction

Persistent homology [15], which captures essential topological features of data, has proven to be a useful stable descriptor since Edelsbrunner et al. [16] first proposed the algorithm for its computation. The understanding of topological persistence was later expanded by several works [5, 9, 11, 31] in terms of both theory and computation. To make use of persistent homology, one typically computes a persistence diagram (also called a barcode), which is a set of intervals with birth and death points. Beyond the set of intervals itself, some applications [13, 30] need persistence diagrams augmented with representative cycles for the intervals to gain more insight into the data. These representative cycles, termed persistent cycles [13], have recently been studied from the viewpoint of optimality by Wu et al. [30], Obayashi [24], and Dey et al. [13].

Although the original persistence algorithm of Edelsbrunner et al. [16] implicitly computes persistent cycles, it does not necessarily provide minimal ones. In an earlier work [13], we showed that it is NP-hard to compute minimal persistent 1-cycles (cycles for 1-dimensional homology groups) when the given interval is finite. Interestingly, the same problem for infinite intervals turned out to be solvable in polynomial time [13]. This naturally leads to the following questions: Are there other interesting cases beyond dimension one for which minimal persistent cycles can be computed in polynomial time? And which cases are NP-hard? In this paper, we settle the complexity question for computing minimal persistent cycles with $\mathbb{Z}_{2}$ coefficients in general dimensions. We first show that when $d\geq 2$, computing minimal persistent $d$-cycles for both finite and infinite intervals is NP-hard in general. We then identify a special but important class of simplicial complexes, which we term weak $(d+1)$-pseudomanifolds, whose minimal persistent $d$-cycles can be computed in polynomial time. A weak $(d+1)$-pseudomanifold (the name is adapted from the commonly accepted term pseudomanifold; see Definition A.1) is a generalization of a $(d+1)$-manifold and is defined as follows:

Definition 1.1.

A simplicial complex $K$ is a weak $(d+1)$-pseudomanifold if each $d$-simplex is a face of no more than two $(d+1)$-simplices in $K$.
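As a concrete reading of Definition 1.1, the sketch below counts, for every $d$-simplex, how many $(d+1)$-simplices contain it. It is a minimal Python sketch under our own assumptions: simplices are given as tuples of integer vertex labels, and the function name `is_weak_pseudomanifold` is hypothetical rather than taken from the paper.

```python
from collections import Counter
from itertools import combinations


def is_weak_pseudomanifold(simplices, d):
    """Definition 1.1: every d-simplex is a face of at most two (d+1)-simplices.

    simplices: iterable of vertex tuples. Only the (d+1)-simplices are inspected,
    because a d-simplex with no (d+1)-coface trivially satisfies the condition.
    """
    coface_count = Counter()
    for s in {frozenset(s) for s in simplices}:
        if len(s) == d + 2:                            # s is a (d+1)-simplex
            for f in combinations(sorted(s), d + 1):   # its d-faces
                coface_count[frozenset(f)] += 1
    return all(c <= 2 for c in coface_count.values())
```

For instance, the boundary of a tetrahedron (a triangulated 2-sphere) passes the check for $d=1$, while three triangles sharing a single edge do not.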

Specifically, we find that if the given complex is a weak $(d+1)$-pseudomanifold, the problem of computing minimal persistent $d$-cycles for finite intervals can be cast as a minimal cut problem (see Section 3), because persistent cycles of such intervals are null-homologous in the complex. However, when $d\geq 2$ and the intervals are infinite, the same computation becomes NP-hard (see Section 5). Nonetheless, for infinite intervals, if we assume that the weak $(d+1)$-pseudomanifold is embedded in $\mathbb{R}^{d+1}$, the minimal persistent cycle problem reduces to a minimal cut problem (see Section 4) and hence belongs to P. Note that a simplicial complex embedded in $\mathbb{R}^{d+1}$ is automatically a weak $(d+1)$-pseudomanifold. Also note that there is an algorithm [8] in the non-persistence setting which computes minimal $d$-cycles by minimal cuts, but that algorithm assumes the $(d+1)$-complex to be embedded in $\mathbb{R}^{d+1}$. Our algorithm for finite intervals, in contrast, does not need the embedding assumption.

In order to make our statements about the hardness results precise, we let PCYC-FIN$_d$ denote the problem of computing minimal persistent $d$-cycles for finite intervals when the given simplicial complex is arbitrary, and let PCYC-INF$_d$ denote the same problem for infinite intervals (see Problems 2.1 and 2.2). We also let WPCYC-FIN$_d$ denote a subproblem of PCYC-FIN$_d$, and let WPCYC-INF$_d$ and WEPCYC-INF$_d$ denote two subproblems of PCYC-INF$_d$, where each subproblem imposes additional constraints on the given simplicial complex. (For two problems $P_{1}$ and $P_{2}$, $P_{2}$ is a subproblem of $P_{1}$ if every instance of $P_{2}$ is an instance of $P_{1}$ and $P_{2}$ asks for the same solutions as $P_{1}$.) Table 1 lists the hardness results for all problems of interest, where the column “Restriction on $K$” specifies the additional constraints that the subproblems impose on the given simplicial complex $K$. Note that WPCYC-INF$_d$ being NP-hard trivially implies that PCYC-INF$_d$ is NP-hard for $d\geq 2$.

Main contributions.

We summarize our contributions as follows:

•

We prove the NP-hardness of PCYC-FIN$_d$ and WPCYC-INF$_d$ for all $d\geq 2$.

•

We present two polynomial time algorithms for WPCYC-FIN$_d$ and WEPCYC-INF$_d$ when $d\geq 1$, based on the duality between minimal persistent cycles and minimal cuts. Apart from the minimal cut computation, all steps in both algorithms run in linear or almost linear time.

1.1 Related works

In the context of computing optimal cycles, most work has been done in the non-persistence setting. These works compute minimal cycles for homology groups of a given simplicial complex. Only a few works address the problem while taking persistence into account. We review some of the relevant works below.

Minimal cycles for homology groups.

In terms of computing minimal cycles for homology groups, two problems are of particular interest: the localization problem and the minimal basis problem. The localization problem asks for a minimal cycle in a given homology class, and the minimal basis problem asks for a set of generating cycles for a homology group whose sum of weights is minimal. With $\mathbb{Z}_{2}$ coefficients, both problems are hard in general. Specifically, Chambers et al. [4] proved that the localization problem in dimension one is NP-hard when the given simplicial complex is a 2-manifold. Chen and Freedman [8] proved that the localization problem is NP-hard to approximate within a fixed ratio in arbitrary dimension. They also showed that the minimal basis problem is NP-hard to approximate within a fixed ratio in dimensions greater than one. For one-dimensional homology, Dey et al. [14] proposed a polynomial time algorithm for the minimal basis problem. Several other works [3, 7, 12, 18] address variants of the two problems while considering special input classes, alternative cycle measures, or coefficients for homology other than $\mathbb{Z}_{2}$.

In this work, we use graph cuts and their duality extensively. The duality between cuts on a planar graph and separating cycles on the dual graph has long been utilized to efficiently compute maximal flows and minimal cuts on planar graphs, a topic for which Chambers et al. [4] provide a comprehensive review. In the same paper [4], Chambers et al. discovered the duality between minimal cuts of a surface-embedded graph and minimal homologous cycles in a dual complex, and then devised $O(n\log n)$ algorithms for both problems assuming the genus of the surface to be fixed. Chen and Freedman [8] proposed an algorithm which computes a minimal non-bounding $d$-cycle given a $(d+1)$-complex embedded in $\mathbb{R}^{d+1}$, utilizing a natural duality between $d$-cycles in the complex and cuts in the dual graph. The minimal non-bounding cycle algorithm can be further extended to solve the localization problem and the minimal basis problem in dimension $d$ given a $(d+1)$-complex embedded in $\mathbb{R}^{d+1}$.

Persistent cycle.

As pointed out earlier, our main focus is the optimality of representative cycles in the persistence framework. Some early works [17, 19] address the representative cycle problem for persistence by computing minimal cycles at the birth points of intervals without considering what actually dies at the death points. Wu et al. [30] proposed an algorithm computing minimal persistent 1-cycles for finite intervals using an annotation technique and heuristic search; however, its time complexity is exponential in the worst case. Obayashi [24] casts the minimal persistent cycle problem for finite intervals as an integer program, but the rounded result of the relaxed linear program is not guaranteed to be optimal. Dey et al. [13] formalized the definition of persistent cycles for both finite and infinite intervals. They also proved the NP-hardness of computing minimal persistent 1-cycles for finite intervals and proposed a polynomial time algorithm for computing non-optimal ones which are still good in practice.

2 Preliminaries

In this section, we introduce the concepts needed to present the results of this paper.

Simplicial complex.

A simplicial complex $K$ is a collection of simplices which are abstractly defined as subsets of a ground set called the vertex set of $K$. If a simplex $\sigma$ is in $K$, then all its non-empty subsets, called its faces, are also in $K$. The simplex $\sigma$ is also referred to as a $q$-simplex if its vertex set has cardinality $q+1$. A $q$-face of $\sigma$ is a $q$-simplex that is a face of $\sigma$, and a $q$-coface of $\sigma$ is a $q$-simplex having $\sigma$ as a face. We call a $q$-simplex of $K$ a boundary $q$-simplex if it has fewer than two $(q+1)$-cofaces in $K$. A simplicial set is a set of simplices, and the closure of a simplicial set $\Sigma$ is the simplicial complex consisting of all the faces of the simplices in $\Sigma$. A simplicial complex is finite if it contains finitely many simplices. In this paper, we only consider finite simplicial complexes.

If each vertex of a simplicial complex $K$ is a point in a Euclidean space, then each simplex of $K$ can be interpreted as the convex hull of its vertices. The simplicial complex $K$ is said to be embedded in the Euclidean space if the interiors of all its simplices are disjoint. The underlying space of $K$ , denoted by $|K|$ , is the point-wise union of all the simplices of $K$ .

Definition 2.1 (Oriented simplex [23]).

A $q$-simplex with an ordering of its vertices is an oriented $q$-simplex. For each $q$-simplex $\sigma$ ($q>0$), there are exactly two equivalence classes of vertex orderings (two orderings are equivalent if they differ by an even permutation), resulting in two oriented $q$-simplices of $\sigma$. We refer to them as the oppositely oriented $q$-simplices.

Remark 2.1.

Any simplex by default is unoriented. We denote an unoriented $q$ -simplex $\sigma$ spanned by vertices $v_{0},\ldots,v_{q}$ as $\sigma=\{v_{0},\ldots,v_{q}\}$ and an oriented $q$ -simplex $\vec{\sigma}$ as $\vec{\sigma}=[v_{0},\ldots,v_{q}]$ , where $v_{0},\ldots,v_{q}$ specify the ordering of the spanning vertices.

Filtration.

A filtration $\mathcal{F}$ of a simplicial complex $K$ is a filtered sequence of subcomplexes of $K$ , $\mathcal{F}:\varnothing=K_{0}\subseteq K_{1}\subseteq\ldots\subseteq K_{n}=K$ , such that $K_{i}$ and $K_{i-1}$ differ by one simplex denoted by $\sigma^{\mathcal{F}}_{i}$ . We let $i$ be the index of $\sigma^{\mathcal{F}}_{i}$ in $\mathcal{F}$ and denote it as $\mathrm{ind}(\sigma^{\mathcal{F}}_{i})=i$ . A subcomplex $K_{i}$ in the filtered sequence of $\mathcal{F}$ is also referred to as a partial complex.
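A filtration can be encoded as the list of simplices in the order they are added, with $K_i$ consisting of the first $i$ simplices. The hypothetical helper below is a sketch under that encoding assumption. It computes $\mathrm{ind}(\cdot)$ and rejects sequences in which some partial complex $K_i$ would fail to be a simplicial complex.

```python
from itertools import combinations


def index_map(filtration):
    """Return ind(sigma) for a filtration given as simplices in insertion order.

    K_i consists of the first i simplices, so indices start at 1. Raises ValueError
    if a simplex repeats or appears before one of its faces (then some K_i would
    not be a simplicial complex).
    """
    ind = {}
    for i, s in enumerate(filtration, start=1):
        s = frozenset(s)
        if s in ind:
            raise ValueError(f"simplex {sorted(s)} is added twice")
        for k in range(1, len(s)):                 # sizes of proper non-empty faces
            for f in combinations(sorted(s), k):
                if frozenset(f) not in ind:
                    raise ValueError(f"face {f} of {sorted(s)} is missing from K_{i - 1}")
        ind[s] = i
    return ind
```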

Simplicial homology.

We provide a brief overview of the simplicial homology used in this paper; see any standard book on the topic, e.g., [23]. Let $q\geq 0$, let $K$ be a simplicial complex, and let $\mathbb{G}$ be an abelian group. The $q$-th chain group $\text{\sf C}_{q}(K;\mathbb{G})$ is defined to be the abelian group containing all finite sums of the form $\sum_{i}n_{i}\vec{\sigma}_{i}$, where $n_{i}\in\mathbb{G}$ and $\vec{\sigma}_{i}$ is an oriented $q$-simplex of $K$. Each element in $\text{\sf C}_{q}(K;\mathbb{G})$ is called a $q$-chain of $K$. Note that for two oppositely oriented $q$-simplices $\vec{\sigma}$ and $\vec{\sigma}^{\prime}$, we have $n\vec{\sigma}=(-n)\vec{\sigma}^{\prime}$ for any $n\in\mathbb{G}$. Therefore, $\text{\sf C}_{q}(K;\mathbb{G})$ can be interpreted as a direct sum of $N_{q}$ copies of $\mathbb{G}$, where $N_{q}$ is the number of $q$-simplices of $K$ and each copy of $\mathbb{G}$ corresponds to a $q$-simplex of $K$. The $q$-th boundary operator $\partial_{q}:\text{\sf C}_{q}(K;\mathbb{G})\to\text{\sf C}_{q-1}(K;\mathbb{G})$ is a group homomorphism such that for any oriented $q$-simplex $[v_{0},\ldots,v_{q}]$,

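$$\partial_{q}\big([v_{0},\ldots,v_{q}]\big)=\sum_{i=0}^{q}(-1)^{i}[v_{0},\ldots,\widehat{v}_{i},\ldots,v_{q}],$$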

where the notation $[v_{0},\ldots,\widehat{v}_{i},\ldots,v_{q}]$ means that $v_{i}$ is deleted from the simplex. For brevity, we often omit the subscript of the boundary operator $\partial_{q}$ and denote it as $\partial$ when this does not cause confusion. The kernel of $\partial_{q}$ is called the $q$-th cycle group of $K$ and is denoted as $\text{\sf Z}_{q}(K;\mathbb{G})$. The image of $\partial_{q+1}$ is called the $q$-th boundary group of $K$ and is denoted as $\text{\sf B}_{q}(K;\mathbb{G})$. A $q$-chain in $\text{\sf Z}_{q}(K;\mathbb{G})$ is called a $q$-cycle, and a $q$-chain in $\text{\sf B}_{q}(K;\mathbb{G})$ is called a $q$-boundary. For a $q$-chain $A$, the $(q-1)$-chain $\partial(A)$ is also called the boundary of $A$.

A fundamental fact in homology theory is that $\partial_{q}\partial_{q+1}=0$ for any $q$. This implies that $\text{\sf B}_{q}(K;\mathbb{G})\subseteq\text{\sf Z}_{q}(K;\mathbb{G})$. The $q$-th homology group of $K$, denoted by $\text{\sf H}_{q}(K;\mathbb{G})$, is defined as the quotient $\text{\sf Z}_{q}(K;\mathbb{G})/\text{\sf B}_{q}(K;\mathbb{G})$. Each coset in $\text{\sf H}_{q}(K;\mathbb{G})$ is called a homology class, and a cycle is said to be homologous to another cycle if they belong to the same homology class. As any boundary represents the zero homology class $0$ in $\text{\sf H}_{q}(K;\mathbb{G})$, a boundary is also said to be null-homologous.

The abelian group $\mathbb{G}$ in the above definitions is called the coefficient group of the homology groups. When the coefficient group $\mathbb{G}$ is clear, we sometimes drop it and denote a chain group as $\text{\sf C}_{q}(K)$; the same applies to the other groups defined in simplicial homology. In this paper, two coefficient groups, $\mathbb{Z}_{2}$ and $\mathbb{Z}$, are used for simplicial homology. When not explicitly stated, the coefficients are assumed to be in $\mathbb{Z}_{2}$. With $\mathbb{Z}_{2}$ coefficients, the orientations of simplices no longer matter, and a $q$-chain can be interpreted as a set of $q$-simplices, with the sum of two $q$-chains being their symmetric difference. A $q$-cycle is then a set of $q$-simplices in which every $(q-1)$-face of these simplices is a face of an even number of simplices in the set. Also note that because $\mathbb{Z}_{2}$ is a field, all groups defined in simplicial homology with $\mathbb{Z}_{2}$ coefficients become vector spaces, and homomorphisms between these groups (such as $\partial$) become linear maps.
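With $\mathbb{Z}_2$ coefficients, the set interpretation above translates directly into code: a chain is a Python set of frozensets, and summation is symmetric difference. The following sketch computes the boundary operator this way and can be used to check $\partial\partial=0$ on small examples. The function names are our own.

```python
from itertools import combinations


def boundary(chain):
    """Z2 boundary of a q-chain given as a set of q-simplices (frozensets of vertices).

    Every (q-1)-face is toggled once per simplex containing it, so faces shared by an
    even number of simplices cancel: summation of chains is symmetric difference.
    """
    result = set()
    for s in chain:
        if len(s) == 1:              # the boundary of a 0-simplex is the zero chain
            continue
        for f in combinations(sorted(s), len(s) - 1):
            result ^= {frozenset(f)}
    return result


def is_cycle(chain):
    """A q-chain is a cycle iff its boundary is zero (the empty set over Z2)."""
    return not boundary(chain)
```

For two triangles glued along an edge, the shared edge appears twice and cancels, so the boundary is the outer square of four edges.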

Definition 2.2 ($q$-weighted).

A simplicial complex $K$ is $q$-weighted if each $q$-simplex $\sigma$ of $K$ has a non-negative finite weight $w(\sigma)$. The weight of a $q$-chain $A$ of $K$ is then defined as $w(A)=\sum_{\sigma\in A}w(\sigma)$.

Definition 2.3 ($q$-connected).

Let $K$ be a simplicial complex and let $q\geq 1$. Two $q$-simplices $\sigma$ and $\sigma^{\prime}$ of $K$ are $q$-connected in $K$ if there is a sequence of $q$-simplices of $K$, $(\sigma_{0},\ldots,\sigma_{l})$, such that $\sigma_{0}=\sigma$, $\sigma_{l}=\sigma^{\prime}$, and for all $0\leq i<l$, $\sigma_{i}$ and $\sigma_{i+1}$ share a $(q-1)$-face. The property of $q$-connectedness defines an equivalence relation on the $q$-simplices of $K$. Each set in the partition induced by this equivalence relation constitutes a $q$-connected component of $K$. We say $K$ is $q$-connected if any two $q$-simplices of $K$ are $q$-connected in $K$.
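The $q$-connected components of Definition 2.3 can be computed with a union-find structure that merges any two $q$-simplices sharing a $(q-1)$-face. This is a hedged sketch that assumes simplices are given as tuples of vertex labels; `q_connected_components` is a hypothetical name.

```python
from itertools import combinations


def q_connected_components(simplices, q):
    """Partition the q-simplices of a complex into q-connected components (Definition 2.3).

    Two q-simplices are united whenever they share a (q-1)-face; union-find then
    yields the transitive closure of this relation.
    """
    if q < 1:
        raise ValueError("q-connectedness is defined for q >= 1")
    qs = sorted({frozenset(s) for s in simplices if len(s) == q + 1}, key=sorted)
    parent = {s: s for s in qs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    owner = {}   # (q-1)-face -> first q-simplex seen that contains it
    for s in qs:
        for f in combinations(sorted(s), q):
            f = frozenset(f)
            if f in owner:
                parent[find(s)] = find(owner[f])
            else:
                owner[f] = s
    groups = {}
    for s in qs:
        groups.setdefault(find(s), []).append(s)
    return list(groups.values())
```

Two triangles that meet only at a vertex form two 2-connected components, while all of their edges form a single 1-connected component.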

Remark 2.2.

See Figure 2a for an example of 1-connected components and 2-connected components.

Definition 2.4 ($q$-connected cycle).

A $q$-cycle $\zeta$ (with $\mathbb{Z}_{2}$ coefficients) is $q$-connected if the complex obtained by taking the closure of the simplicial set $\zeta$ is $q$-connected.

Persistent homology.

We provide a brief description of persistent homology. We recommend the book by Edelsbrunner and Harer [15] for a detailed explanation of this topic and the book by Chazal et al. [6] for its underlying mathematical structure, the persistence module. Note that persistent homology in this paper is always assumed to be with $\mathbb{Z}_{2}$ coefficients. The persistence algorithm starts with a filtration $\mathcal{F}:\varnothing=K_{0}\subseteq K_{1}\subseteq\ldots\subseteq K_{n}=K$ of a simplicial complex $K$, and for each simplex $\sigma_{i}^{\mathcal{F}}$, inspects whether $\partial(\sigma_{i}^{\mathcal{F}})$ is a boundary in $K_{i-1}$. If $\partial(\sigma_{i}^{\mathcal{F}})$ is a boundary in $K_{i-1}$, $\sigma_{i}^{\mathcal{F}}$ is called positive; otherwise, it is called negative. The $d$-chains (or $d$-cycles) in $K_{i}$ that are not in $K_{i-1}$ are said to be born in $K_{i}$ or created by $\sigma_{i}^{\mathcal{F}}$. A positive $d$-simplex creates some $d$-cycles, and a negative $d$-simplex makes some $(d-1)$-cycles become boundaries. In the latter case, we also say that the negative $d$-simplex kills or destroys those $(d-1)$-cycles. Central to the persistence algorithm is a notion called pairing: a positive simplex is initially unpaired when introduced; when a negative $d$-simplex $\sigma_{i}^{\mathcal{F}}$ comes, the algorithm finds a $(d-1)$-cycle created by an unpaired positive $(d-1)$-simplex $\sigma_{j}^{\mathcal{F}}$ which is homologous to $\partial(\sigma_{i}^{\mathcal{F}})$, and pairs $\sigma_{j}^{\mathcal{F}}$ with $\sigma_{i}^{\mathcal{F}}$. Alongside the pairing, a finite interval $[j,i)$ is added to the $(d-1)$-th persistence diagram, which is denoted by $\mathsf{D}_{d-1}(\mathcal{F})$. After all simplices are processed, some positive simplices may still be unpaired. For each such unpaired simplex $\sigma_{i}^{\mathcal{F}}$, an infinite interval $[i,+\infty)$ is added to $\mathsf{D}_{d}(\mathcal{F})$, where $d$ is the dimension of $\sigma_{i}^{\mathcal{F}}$.
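The pairing described above can be realized by the standard column reduction over $\mathbb{Z}_2$, which is one common way to implement the persistence algorithm. Because the pairing for a given filtration is unique, any correct implementation yields the same intervals. The sketch below is illustrative rather than the authors' implementation. It assumes the filtration is a list of vertex tuples in insertion order, with indices starting at 1 as in the text, and it reports each interval as `(dim, birth, death)`, or `(dim, birth)` for infinite ones.

```python
from itertools import combinations


def persistence_pairs(filtration):
    """Standard Z2 column reduction.

    filtration: simplices (vertex tuples) in insertion order; the i-th simplex
    (counting from 1) is sigma_i. Returns (finite, infinite): finite holds
    (dim, birth, death) for each interval [birth, death) of D_dim, and infinite
    holds (dim, birth) for each interval [birth, +inf).
    """
    ind = {frozenset(s): i for i, s in enumerate(filtration, start=1)}
    reduced = {}        # column index -> reduced boundary column (set of row indices)
    owner = {}          # pivot row (largest index in a column) -> column owning it
    paired, finite = set(), []
    for j, s in enumerate(filtration, start=1):
        s = frozenset(s)
        col = set()
        if len(s) > 1:  # the boundary of a vertex is zero
            col = {ind[frozenset(f)] for f in combinations(sorted(s), len(s) - 1)}
        # add earlier reduced columns until the pivot is new or the column vanishes
        while col and max(col) in owner:
            col ^= reduced[owner[max(col)]]
        reduced[j] = col
        if col:         # negative: sigma_j kills the class born at the pivot
            i = max(col)
            owner[i] = j
            paired.update((i, j))
            finite.append((len(s) - 2, i, j))
    infinite = [(len(filtration[i - 1]) - 1, i)
                for i in range(1, len(filtration) + 1) if i not in paired]
    return finite, infinite
```

Consider the filtration of a single filled triangle: vertices 0, 1, 2, then edges {0,1}, {1,2}, {0,2}, then the triangle. On this input the sketch reports the 1-dimensional interval [6, 7) and the infinite 0-dimensional interval [1, +∞), along with two short 0-dimensional intervals.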

Note that the pairing computed by the persistence algorithm for a given filtration is unique. Also note that in this paper, we assume that a filtration of a complex is given and that the persistence intervals start and end with the indices of the paired simplices. In real-life applications, however, one is often given a function on a simplicial complex. To produce the persistence intervals, a filtration needs to be derived, and the endpoints of the intervals are taken as the function values on the paired simplices. In such cases, we can associate a given interval with its simplex pair and take the indices of the paired simplices to obtain an interval that can serve as an input to our algorithms.

The persistent cycle problems.

We can now formally define the minimal persistent cycle problems:

Problem 2.1 (PCYC-FIN$_d$).

Given a finite $d$-weighted simplicial complex $K$, a filtration $\mathcal{F}:\varnothing=K_{0}\subseteq K_{1}\subseteq\ldots\subseteq K_{n}=K$, and a finite interval $[{\beta},{\delta})\in\mathsf{D}_{d}(\mathcal{F})$, this problem asks for computing a $d$-cycle with the minimal weight which is born in $K_{{\beta}}$ and becomes a boundary in $K_{{\delta}}$.

Problem 2.2 (PCYC-INF$_d$).

Given a finite $d$-weighted simplicial complex $K$, a filtration $\mathcal{F}:\varnothing=K_{0}\subseteq K_{1}\subseteq\ldots\subseteq K_{n}=K$, and an infinite interval $[{\beta},+\infty)\in\mathsf{D}_{d}(\mathcal{F})$, this problem asks for computing a $d$-cycle with the minimal weight which is born in $K_{\beta}$.

Remark 2.3.

The definitions of the above two problems are derived directly from the definition of persistent $d$-cycles [13].

Undirected flow network.

An undirected flow network $(G,s_{1},s_{2})$ consists of an undirected graph $G$ with vertex set $V(G)$ and edge set $E(G)$, a capacity function $c:E(G)\to[0,+\infty]$, and two non-empty disjoint subsets $s_{1}$ and $s_{2}$ of $V(G)$. Vertices in $s_{1}$ are referred to as sources, and vertices in $s_{2}$ are referred to as sinks. A cut $(S,T)$ of $(G,s_{1},s_{2})$ consists of two disjoint subsets $S$ and $T$ of $V(G)$ such that $S\cup T=V(G)$, $s_{1}\subseteq S$, and $s_{2}\subseteq T$. The set of edges that connect a vertex in $S$ and a vertex in $T$ is referred to as the set of edges across the cut $(S,T)$ and is denoted as $\xi(S,T)$. The capacity of a cut $(S,T)$ is defined as $c(S,T)=\sum_{e\in\xi(S,T)}c(e)$. A minimal cut of $(G,s_{1},s_{2})$ is a cut with the minimal capacity. Note that we allow parallel edges in $G$ (see Figure 2a) to ease the presentation; during computation, parallel edges can be merged into one edge whose capacity is the sum of theirs.
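The cut capacity $c(S,T)$ is straightforward to evaluate once the edges across the cut are identified. The following sketch uses a hypothetical function `cut_capacity`. It keeps parallel edges as repeated entries and uses `math.inf` for $+\infty$ capacities.

```python
import math


def cut_capacity(vertices, edges, S, sources, sinks):
    """Capacity c(S, T) of a cut in an undirected flow network (G, s1, s2).

    edges: list of (u, v, capacity) triples; parallel edges are repeated entries and
    capacities may be math.inf. T is taken to be every vertex not in S.
    """
    S = set(S)
    T = set(vertices) - S
    if not (set(sources) <= S and set(sinks) <= T):
        raise ValueError("not a cut: sources must lie in S and sinks in T")
    # an edge is across the cut iff exactly one endpoint lies in S
    return sum(c for u, v, c in edges if (u in S) != (v in S))
```

Summing over repeated entries gives the same value as first merging parallel edges into one edge with the summed capacity.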

3 Minimal persistent $d$-cycles of finite intervals for weak $(d+1)$-pseudomanifolds

In this section, we present an algorithm which computes minimal persistent $d$-cycles for finite intervals given a filtration of a weak $(d+1)$-pseudomanifold when $d\geq 1$. The general process is as follows. Suppose that the input weak $(d+1)$-pseudomanifold is $K$, associated with a filtration $\mathcal{F}:K_{0}\subseteq K_{1}\subseteq\ldots\subseteq K_{n}$, and the task is to compute a minimal persistent cycle of a finite interval $[{\beta},{\delta})\in\mathsf{D}_{d}(\mathcal{F})$. We first construct an undirected dual graph $G$ for $K$, where vertices of $G$ are dual to $(d+1)$-simplices of $K$ and edges of $G$ are dual to $d$-simplices of $K$. One dummy vertex, termed the infinite vertex, which does not correspond to any $(d+1)$-simplex, is added to $G$ to serve as an endpoint of the graph edges dual to boundary $d$-simplices. We then build an undirected flow network on top of $G$. The source is the vertex dual to $\sigma_{\delta}^{\mathcal{F}}$, and the sinks are the infinite vertex along with the vertices dual to those $(d+1)$-simplices which are added to $\mathcal{F}$ after $\sigma_{\delta}^{\mathcal{F}}$. If a $d$-simplex is $\sigma_{\beta}^{\mathcal{F}}$ or is added to $\mathcal{F}$ before $\sigma_{\beta}^{\mathcal{F}}$, we let the capacity of its dual graph edge be its weight; otherwise, we let the capacity of its dual graph edge be $+\infty$. Finally, we compute a minimal cut of this flow network and return the $d$-chain dual to the edges across the minimal cut as a minimal persistent cycle of the interval.
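The steps above can be assembled into a compact, self-contained sketch. It is not the paper's Algorithm 1 verbatim. We assume simplices are vertex tuples and weights are a dict keyed by frozensets. To keep the code short, we also substitute the simple Edmonds-Karp max-flow algorithm for Orlin's algorithm [25], so the running-time bound stated for Algorithm 1 does not carry over. Names such as `min_persistent_cycle_fin`, `PHI`, and `SUPER_SINK` are hypothetical. The sketch restricts attention to the $(d+1)$-connected component containing $\sigma_\delta^{\mathcal{F}}$, as Algorithm 1 does. It reads off the minimal cut as the set of vertices reachable from the source in the final residual graph.

```python
from collections import defaultdict, deque
from itertools import combinations
import math

PHI = "phi"            # hypothetical label for the infinite (dummy) vertex
SUPER_SINK = "sink*"   # hypothetical label for a super-sink joining all sinks


def min_persistent_cycle_fin(filtration, weight, d, beta, delta):
    """Sketch: dual graph + flow network + min cut for a finite interval [beta, delta) of D_d.

    filtration: simplices (vertex tuples) in order, indices starting at 1.
    weight: dict mapping each d-simplex (frozenset) to its non-negative weight.
    Returns the d-chain (set of frozensets) dual to the edges across a minimal cut.
    """
    simplices = [frozenset(s) for s in filtration]
    ind = {s: i for i, s in enumerate(simplices, start=1)}
    killer = simplices[delta - 1]
    if len(killer) != d + 2:
        raise ValueError("sigma_delta must be a (d+1)-simplex")

    def dfaces(t):
        return [frozenset(f) for f in combinations(sorted(t), d + 1)]

    cofaces = defaultdict(list)          # d-simplex -> its (d+1)-cofaces in K
    for s in simplices:
        if len(s) == d + 2:
            for f in dfaces(s):
                cofaces[f].append(s)

    # (d+1)-connected component of the killer: BFS over shared d-faces.
    comp, queue = {killer}, deque([killer])
    while queue:
        for f in dfaces(queue.popleft()):
            for u in cofaces[f]:
                if u not in comp:
                    comp.add(u)
                    queue.append(u)

    # Dual graph; each undirected edge becomes two residual arcs, parallel edges are summed.
    cap = defaultdict(lambda: defaultdict(float))
    dual_edge = {}
    for f in {f for t in comp for f in dfaces(t)}:
        if len(cofaces[f]) > 2:
            raise ValueError("not a weak (d+1)-pseudomanifold")
        u, v = cofaces[f] if len(cofaces[f]) == 2 else (cofaces[f][0], PHI)
        c = weight[f] if ind[f] <= beta else math.inf
        cap[u][v] += c
        cap[v][u] += c
        dual_edge[f] = (u, v)

    # Sinks: PHI and every (d+1)-simplex added after sigma_delta.
    cap[PHI][SUPER_SINK] = math.inf
    for t in comp:
        if ind[t] > delta:
            cap[t][SUPER_SINK] = math.inf

    def bfs():
        parent, q = {killer: None}, deque([killer])
        while q:
            x = q.popleft()
            for y, r in cap[x].items():
                if r > 0 and y not in parent:
                    parent[y] = x
                    if y == SUPER_SINK:
                        return parent
                    q.append(y)
        return parent

    while True:                          # Edmonds-Karp: shortest augmenting paths
        parent = bfs()
        if SUPER_SINK not in parent:
            break
        path, y = [], SUPER_SINK
        while parent[y] is not None:
            path.append((parent[y], y))
            y = parent[y]
        b = min(cap[x][y] for x, y in path)
        if math.isinf(b):
            raise ValueError("no finite cut; check the interval")
        for x, y in path:
            cap[x][y] -= b
            cap[y][x] += b

    source_side = set(parent)            # residual reachability gives the min cut (S, T)
    return {f for f, (u, v) in dual_edge.items() if (u in source_side) != (v in source_side)}
```

As a usage example, take triangles $\{0,1,2\}$ and $\{0,2,3\}$ glued along $\{0,2\}$ with unit edge weights. Use the filtration that adds vertices 0 to 3, then the edges $\{0,1\},\{1,2\},\{0,2\},\{2,3\},\{0,3\}$, then the two triangles (indices 1 to 11), so that $[9,11)\in\mathsf{D}_1$. For this interval the sketch returns $\{\{0,2\},\{2,3\},\{0,3\}\}$. Raising the weight of $\{0,2\}$ to 10 makes it return $\{\{0,1\},\{1,2\},\{2,3\},\{0,3\}\}$ instead.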

The intuition behind the above algorithm is best explained by the example in Figure 1, where $d=1$. The key to the algorithm is the duality between persistent cycles of the input interval and cuts of finite capacity in the dual flow network. To see this duality, first consider a persistent $d$-cycle $\zeta$ of the input interval $[{\beta},{\delta})$. There exists a $(d+1)$-chain $A$ in $K_{\delta}$ created by $\sigma_{\delta}^{\mathcal{F}}$ whose boundary equals $\zeta$, which is how $\zeta$ gets killed. We can let $S$ be the set of graph vertices dual to the simplices in $A$ and let $T$ be the set of the remaining graph vertices; then $(S,T)$ is a cut. Furthermore, $(S,T)$ must have finite capacity, as the edges across it are exactly dual to the $d$-simplices in $\zeta$, and the $d$-simplices in $\zeta$ have indices in $\mathcal{F}$ less than or equal to ${\beta}$. Conversely, let $(S,T)$ be a cut with finite capacity. Then the $(d+1)$-chain whose simplices are dual to the vertices in $S$ is created by $\sigma_{\delta}^{\mathcal{F}}$: it contains $\sigma_{\delta}^{\mathcal{F}}$ because $S$ contains the source, and it lies in $K_{\delta}$ because $S$ contains no sink. Taking the boundary of this $(d+1)$-chain, we get a $d$-cycle $\zeta$. Because the $d$-simplices of $\zeta$ are exactly dual to the edges across $(S,T)$ and each edge across $(S,T)$ has finite capacity, $\zeta$ must reside in $K_{\beta}$. It remains to ensure that $\zeta$ contains $\sigma_{\beta}^{\mathcal{F}}$ in order to show that $\zeta$ is a persistent cycle of $[{\beta},{\delta})$. In Section 3.2, we argue that $\zeta$ indeed contains $\sigma_{\beta}^{\mathcal{F}}$, so $\zeta$ is a persistent cycle. Note that while the above explanation conveys the general idea, the rigorous statement and proof of the duality are given in Propositions 3.2 and 3.3.
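The duality used above can be checked mechanically on small examples. Let $S$ be a set of $(d+1)$-simplices in a weak $(d+1)$-pseudomanifold, and let $T$ contain the remaining dual vertices plus the infinite vertex. Then the $\mathbb{Z}_2$ boundary of $S$ coincides with the $d$-simplices whose dual edges cross the cut $(S,T)$. In the sketch below (hypothetical helper names), a $d$-simplex crosses exactly when one of its at most two cofaces lies in $S$. That is also exactly when it appears an odd number of times in the boundary.

```python
from itertools import combinations


def boundary_d(chain, d):
    """Z2 boundary of a set of (d+1)-simplices: the d-faces met an odd number of times."""
    out = set()
    for t in chain:
        for f in combinations(sorted(t), d + 1):
            out ^= {frozenset(f)}
    return out


def cut_edges(top_simplices, S, d):
    """d-simplices whose dual edges cross the cut (S, T).

    T holds the remaining (d+1)-simplices plus the infinite vertex, which is the
    other endpoint of every boundary d-simplex. Assumes each d-simplex has at
    most two (d+1)-cofaces (a weak (d+1)-pseudomanifold).
    """
    S = {frozenset(t) for t in S}
    inside = {}     # d-simplex -> number of its cofaces lying in S
    for t in map(frozenset, top_simplices):
        for f in combinations(sorted(t), d + 1):
            f = frozenset(f)
            inside[f] = inside.get(f, 0) + (t in S)
    # exactly one endpoint in S: one of two cofaces, or the only coface (other end is phi)
    return {f for f, k in inside.items() if k == 1}
```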

We list the pseudocode in Algorithm 1, which works as follows. Lines 3 and 4 set up the complex $\widetilde{K}$ that the algorithm mainly works on, where $\widetilde{K}$ is taken as the closure of the $(d+1)$-connected component of $K$ containing $\sigma_{\delta}^{\mathcal{F}}$. The reason for working on $\widetilde{K}$ instead of the entire complex is explained later in this section. Line 6 constructs the dual graph $G$ from $\widetilde{K}$, and lines 8 to 18 build the flow network on top of $G$. Note that we denote the infinite vertex by $\phi$. Line 19 computes a minimal cut of the flow network, and line 20 returns the $d$-chain dual to the edges across the minimal cut. To ease the exposition, the pseudocode in this paper treats a mathematical function as a program object. For example, the function $\theta$ returned by DualGraphFin in Algorithm 1 denotes the bijection between the simplices of $\widetilde{K}$ and their dual vertices or edges (see Section 3.1 for details). In practice, these constructs can be easily implemented in any programming language.

To see why we work on $\widetilde{K}$, first note that the dual graph constructed directly from $K$ may be disconnected (for example, in $d=1$, if $K$ consists of two disjoint triangulated 2-spheres, its dual graph has two connected components). While cuts are still well defined for a disconnected flow network, a connected one is preferable, because the minimal cut computation only concerns the graph component containing the source. Constructing the dual graph from $\widetilde{K}$ ensures that the graph is connected. For Algorithm 1 to work, one must further show that the sink is non-empty, so that the computed persistent cycle is non-empty; this is verified in Proposition 3.1. An intuitive reason why the computation on $\widetilde{K}$ is still correct is as follows. Each persistent $d$-cycle $\zeta$ of the given interval corresponds to a $({d+1})$-chain $A$ which kills $\zeta$, i.e., $\partial(A)=\zeta$. Suppose that $A$ is not entirely contained in $\widetilde{K}$. Note that $A\cap\widetilde{K}\neq\varnothing$, since it contains at least the killer simplex $\sigma_{\delta}^{\mathcal{F}}$. Then $\partial(A\cap\widetilde{K})$ is a persistent cycle of the interval residing in $\widetilde{K}$ whose weight is no greater than that of $\zeta$. Hence, a minimal persistent cycle can always be found in $\widetilde{K}$. Section 3.2 verifies the construction formally.
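A minimal sketch of computing $\widetilde{K}$, assuming the $({d+1})$-simplices of $K$ are given as frozensets of vertex ids: two $({d+1})$-simplices are treated as adjacent when they share a $d$-face, a breadth-first search from $\sigma_{\delta}^{\mathcal{F}}$ collects the $({d+1})$-connected component, and the closure then adds all faces. The function name `component_closure` is hypothetical.

```python
from collections import deque
from itertools import combinations


def faces(simplex, dim):
    """All dim-dimensional faces of a simplex (frozenset of vertex ids)."""
    return [frozenset(c) for c in combinations(sorted(simplex), dim + 1)]


def component_closure(top_simplices, start, d):
    """(d+1)-simplices of the (d+1)-connected component containing `start`,
    together with the closure of that component."""
    by_face = {}
    for tau in top_simplices:
        for f in faces(tau, d):
            by_face.setdefault(f, []).append(tau)
    seen, queue = {start}, deque([start])
    while queue:
        tau = queue.popleft()
        for f in faces(tau, d):
            for nb in by_face[f]:  # (d+1)-simplices sharing the d-face f
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)
    closure = {frozenset(c) for tau in seen
               for k in range(1, len(tau) + 1)
               for c in combinations(sorted(tau), k)}
    return seen, closure
```

A triangle that touches the component only at a vertex is not $2$-adjacent to it and is therefore excluded, as in the two-sphere example above.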

Complexity.

The time complexity of Algorithm 1 depends on the encoding of the input and on the data structure used to represent a simplicial complex. For the input, we assume that $K$ and $\mathcal{F}$ are represented by the sequence of all simplices of $K$ ordered by their indices in $\mathcal{F}$, where each simplex is given by its set of vertices. We also assume the following simple yet reasonable simplicial complex data structure: in each dimension, simplices are mapped to integer identifiers ranging from 0 to the number of simplices in that dimension minus 1; each $q$-simplex has an array (or linked list) storing the ids of all its $(q+1)$-cofaces; and a hash map for each dimension returns the integer id of a simplex in that dimension given its spanning vertices. We further assume $d$ to be a constant. Under these assumptions, let $n$ be the size (number of bits) of the encoded input; then lines 3 and 4 perform no more than $n$ elementary $O(1)$ operations, so their time complexity is $O(n)$. It is not hard to verify that the flow network construction also takes $O(n)$ time, so the time complexity of Algorithm 1 is dominated by the minimal cut computation. Using the max-flow algorithm by Orlin [25], Algorithm 1 runs in $O(n^{2})$ time.
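The data structure assumed above can be sketched as follows. This is one possible realization, not the authors' code: the class name `SimplexTable` is hypothetical, and Python dictionaries play the role of the per-dimension hash maps. Because a filtration lists every face before its cofaces, each simplex can be registered as a coface of its $(q-1)$-faces at the moment it is read.

```python
from itertools import combinations


class SimplexTable:
    """Per-dimension integer ids, vertex-set hash maps, and coface arrays."""

    def __init__(self, filtration):
        # filtration: simplices (iterables of vertex ids) in order of index
        self.ids = {}      # dim -> {frozenset(vertices): id}
        self.index = {}    # frozenset(vertices) -> filtration index
        self.cofaces = {}  # dim -> list indexed by id of (dim+1)-coface ids
        for i, s in enumerate(filtration):
            s = frozenset(s)
            q = len(s) - 1
            table = self.ids.setdefault(q, {})
            table[s] = len(table)  # next free id in dimension q
            self.index[s] = i
            self.cofaces.setdefault(q, []).append([])
            if q > 0:
                # register s as a coface of each of its (q-1)-faces
                for f in combinations(sorted(s), q):
                    fid = self.ids[q - 1][frozenset(f)]
                    self.cofaces[q - 1][fid].append(table[s])

    def id_of(self, vertices):
        """Integer id of a simplex within its dimension, by spanning vertices."""
        s = frozenset(vertices)
        return self.ids[len(s) - 1][s]
```

Reading the filtration once performs a constant number of dictionary operations per face of each simplex, which matches the linear-time bound stated above for constant $d$.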

In the rest of this section, we first explain the bijection $\theta$ returned by DualGraphFin, then prove the correctness of the algorithm.

3.1 The bijection $\bm{\theta}$

The vertex set $V(G)$ of $G$ contains vertices which correspond to the $({d+1})$ -simplices of $\widetilde{K}$ . The set $V(G)$ may also contain an infinite vertex $\phi$ if $\widetilde{K}$ contains any boundary $d$ -simplex. We define a bijection

$\theta:\{({d+1})\text{-simplices of }\widetilde{K}\}\to V(G)\smallsetminus\{\phi\}$

such that for any $({d+1})$ -simplex $\sigma^{d+1}$ of $\widetilde{K}$ , $\theta(\sigma^{d+1})$ is the vertex that $\sigma^{d+1}$ is dual to. Similarly, we define another bijection

$\theta:\{d\text{-simplices of }\widetilde{K}\}\to E(G)$

using the same notation $\theta$ .

Note that applying a function to a subset of its domain yields the image of that subset. Therefore, if $(S,T)$ is a cut of a flow network built on $G$, then $\theta^{-1}(\xi(S,T))$ denotes the set of $d$-simplices dual to the edges across the cut. Also note that since simplicial chains with $\mathbb{Z}_{2}$ coefficients can be interpreted as sets, $\theta^{-1}(\xi(S,T))$ is also a $d$-chain.
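To make the set interpretation concrete, the sketch below computes $\theta^{-1}(\xi(S,T))$ from a dual-edge map and compares it with the $\mathbb{Z}_{2}$ boundary of the $({d+1})$-chain $\theta^{-1}(S)$; Proposition 3.2 establishes that the two coincide for the cuts considered there (where $\phi\in T$). The encoding (a dictionary `theta` from each $d$-simplex to the pair of its dual end vertices, with `"phi"` for $\phi$) and the function names are assumptions for illustration.

```python
from itertools import combinations

PHI = "phi"  # stands for the infinite dual vertex


def faces(simplex, dim):
    """All dim-dimensional faces of a simplex (frozenset of vertex ids)."""
    return [frozenset(c) for c in combinations(sorted(simplex), dim + 1)]


def cut_chain(theta, S):
    """theta^{-1}(xi(S, T)): d-simplices whose dual edge crosses the cut."""
    return {f for f, (u, v) in theta.items() if (u in S) != (v in S)}


def boundary_z2(chain, d):
    """Z2 boundary of a set of (d+1)-simplices: d-faces hit an odd number of times."""
    result = set()
    for tau in chain:
        result ^= set(faces(tau, d))  # symmetric difference = addition mod 2
    return result
```

Because chains are sets, adding two chains is a symmetric difference, so a $d$-face shared by two simplices of $S$ cancels, exactly as its dual edge fails to cross the cut.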

3.2 Algorithm correctness

In this subsection, we prove the correctness of Algorithm 1. Some of the symbols we use refer to Algorithm 1.

Proposition 3.1.

In Algorithm 1, the sink $s_{2}$ is not an empty set.

Proof.

For contradiction, suppose that $s_{2}$ is empty. Then $\phi\not\in V(G)$ and $\sigma_{\delta}^{\mathcal{F}}$ is the $({d+1})$-simplex of $\widetilde{K}$ with the greatest index in $\mathcal{F}$. Because $\phi\not\in V(G)$, every $d$-simplex of $\widetilde{K}$ is a face of exactly two $({d+1})$-simplices of $\widetilde{K}$, so the set of all $({d+1})$-simplices of $\widetilde{K}$ forms a $({d+1})$-cycle created by ${\sigma_{\delta}^{\mathcal{F}}}$. Then ${\sigma_{\delta}^{\mathcal{F}}}$ must be a positive simplex in $\mathcal{F}$, contradicting the fact that it is the negative simplex paired with $\sigma_{\beta}^{\mathcal{F}}$. ∎

The following two propositions specify the duality mentioned at the beginning of this section:

Proposition 3.2.

For any cut $(S,T)$ of $(G,s_{1},s_{2})$ with finite capacity, the $d$ -chain $\zeta=\theta^{-1}(\xi({S},{T}))$ is a persistent $d$ -cycle of $[{\beta},{\delta})$ and $w(\zeta)=c(S,T)$ .

Proof.

Let $A=\theta^{-1}(S)$. We first prove that $\zeta=\partial(A)$, so that $\zeta$ is a cycle. Let $\sigma^{d}$ be any $d$-simplex of $\zeta$; then $\theta(\sigma^{d})$ connects a vertex $u\in S$ and a vertex $v\in T$. If $v=\phi$, then $\sigma^{d}$ cannot be a face of any $({d+1})$-simplex in $K$ other than $\theta^{-1}(u)$, so $\sigma^{d}$ is a face of exactly one $({d+1})$-simplex of $A$. If $v\neq\phi$, then $\sigma^{d}$ is a face of exactly the two $({d+1})$-simplices $\theta^{-1}(u)\in A$ and $\theta^{-1}(v)\not\in A$, so again $\sigma^{d}$ is a face of exactly one $({d+1})$-simplex of $A$. Therefore, $\sigma^{d}\in\partial(A)$. Conversely, let $\sigma^{d}$ be any $d$-simplex of $\partial(A)$; then $\sigma^{d}$ is a face of exactly one $({d+1})$-simplex $\sigma_{0}^{d+1}$ of $A$. If $\sigma^{d}$ is a face of another $({d+1})$-simplex $\sigma_{1}^{d+1}$ in $K$, then $\sigma^{d+1}_{1}\in\widetilde{K}$ and $\sigma^{d+1}_{1}\not\in A$, so $\theta(\sigma^{d})$ connects the vertex $\theta(\sigma^{d+1}_{0})\in S$ and the vertex $\theta(\sigma^{d+1}_{1})\in T$ in the graph $G$. If $\sigma^{d}$ is a face of exactly one $({d+1})$-simplex in $K$, then $\theta(\sigma^{d})$ connects $\theta(\sigma^{d+1}_{0})\in S$ and $\phi\in T$ in $G$. In both cases $\theta(\sigma^{d})\in\xi(S,T)$, i.e., $\sigma^{d}\in\theta^{-1}(\xi(S,T))$.

We then show that $\zeta$ is created by $\sigma_{\beta}^{\mathcal{F}}$. By Proposition 3.1, $\zeta$ cannot be empty. Therefore, for contradiction, we can suppose that $\zeta$ is created by a $d$-simplex $\sigma^{d}\neq\sigma_{\beta}^{\mathcal{F}}$. Because $(S,T)$ has finite capacity, every $d$-simplex of $\zeta$ has index at most ${\beta}$, so $\mathrm{ind}(\sigma^{d})<{\beta}$. Let $\zeta^{\prime}$ be a persistent cycle of $[{\beta},{\delta})$ with $\zeta^{\prime}=\partial(A^{\prime})$, where $A^{\prime}$ is a $({d+1})$-chain of $K_{\delta}$. Then $\zeta+\zeta^{\prime}=\partial(A+A^{\prime})$. Since $A$ and $A^{\prime}$ are both created by $\sigma_{\delta}^{\mathcal{F}}$, the chain $A+A^{\prime}$ is created by a $({d+1})$-simplex with an index less than ${\delta}$ in $\mathcal{F}$. So $\zeta+\zeta^{\prime}$ is a $d$-cycle created by $\sigma_{\beta}^{\mathcal{F}}$ which becomes a boundary before $\sigma_{\delta}^{\mathcal{F}}$ is added. This means that $\sigma_{\beta}^{\mathcal{F}}$ is already paired when $\sigma_{\delta}^{\mathcal{F}}$ is added, contradicting the fact that $\sigma_{\beta}^{\mathcal{F}}$ is paired with $\sigma_{\delta}^{\mathcal{F}}$. Similarly, one can prove that $\zeta$ is not a boundary until $\sigma_{\delta}^{\mathcal{F}}$ is added, so $\zeta$ is a persistent cycle of $[{\beta},{\delta})$. Since $(S,T)$ has finite capacity, we must have

$w(\zeta)=\sum_{\sigma^{d}\in\zeta}w(\sigma^{d})=\sum_{e\in\xi(S,T)}c(e)=c(S,T).$ ∎

Proposition 3.3.

For any persistent $d$ -cycle $\zeta$ of $[{\beta},{\delta})$ , there exists a cut $(S,T)$ of $(G,s_{1},s_{2})$ such that $c(S,T)\leq w(\zeta)$ .

Proof.

Let $A$ be a $({d+1})$-chain in $K_{\delta}$ such that $\zeta=\partial(A)$. Note that $A$ is created by $\sigma_{\delta}^{\mathcal{F}}$ and that $\zeta$ is the set of $d$-simplices which are faces of exactly one $({d+1})$-simplex of $A$. Let $\zeta^{\prime}=\zeta\cap\widetilde{K}$ and $A^{\prime}=A\cap\widetilde{K}$; we claim that $\zeta^{\prime}=\partial(A^{\prime})$. To prove this, first let $\sigma^{d}$ be any $d$-simplex of $\zeta^{\prime}$; then $\sigma^{d}$ is a face of exactly one $({d+1})$-simplex $\sigma^{d+1}$ of $A$. Since $\sigma^{d}\in\widetilde{K}$ and $\widetilde{K}$ is the closure of a $({d+1})$-connected component, every $({d+1})$-coface of $\sigma^{d}$ lies in $\widetilde{K}$; in particular $\sigma^{d+1}\in\widetilde{K}$, so $\sigma^{d+1}\in A^{\prime}$. Then $\sigma^{d}$ is a face of exactly one $({d+1})$-simplex of $A^{\prime}$, so $\sigma^{d}\in\partial(A^{\prime})$. Conversely, let $\sigma^{d}$ be any $d$-simplex of $\partial(A^{\prime})$; then $\sigma^{d}$ is a face of exactly one $({d+1})$-simplex $\sigma_{0}^{d+1}$ of $A^{\prime}$. Note that $\sigma_{0}^{d+1}\in A$, and we want to prove that $\sigma_{0}^{d+1}$ is the only $({d+1})$-simplex of $A$ having $\sigma^{d}$ as a face. Suppose that $\sigma^{d}$ is a face of another $({d+1})$-simplex $\sigma^{d+1}_{1}$ of $A$; then $\sigma^{d+1}_{1}\in\widetilde{K}$ because $\sigma_{0}^{d+1}\in\widetilde{K}$ and the two share the $d$-face $\sigma^{d}$. So $\sigma^{d+1}_{1}\in A\cap\widetilde{K}=A^{\prime}$, contradicting the fact that $\sigma^{d}$ is a face of exactly one $({d+1})$-simplex of $A^{\prime}$. Hence $\sigma^{d}\in\partial(A)$. Since $\sigma_{0}^{d+1}\in\widetilde{K}$, we have $\sigma^{d}\in\widetilde{K}$, which means that $\sigma^{d}\in\zeta^{\prime}$.

Let $S=\theta(A^{\prime})$ and $T=V(G)\smallsetminus S$. Then $(S,T)$ is a cut of $(G,s_{1},s_{2})$ because $A^{\prime}$ is created by $\sigma_{\delta}^{\mathcal{F}}$. We claim that $\theta^{-1}(\xi(S,T))=\partial(A^{\prime})$; the proof of this equality is similar to the corresponding argument in the proof of Proposition 3.2. It follows that $\xi(S,T)=\theta(\zeta^{\prime})$. We then have

$c(S,T)=\sum_{e\in\xi(S,T)}c(e)=\sum_{\sigma^{d}\in\zeta^{\prime}}w(\sigma^{d})=w(\zeta^{\prime})$

because each $d$ -simplex of $\zeta^{\prime}$ has an index less than or equal to ${\beta}$ in $\mathcal{F}$ .

Finally, because $\zeta^{\prime}$ is a subchain of $\zeta$ , we must have $c(S,T)=w(\zeta^{\prime})\leq w(\zeta)$ . ∎

Combining the above facts, we can conclude:

Theorem 3.1.

Algorithm 1 computes a minimal persistent $d$ -cycle for the given interval $[{\beta},{\delta})$ .

Proof.

First, the flow network $(G,s_{1},s_{2})$ constructed by Algorithm 1 is valid by Proposition 3.1. Let $(S^{*},T^{*})$ be the minimal cut computed in line 19. Because the interval $[{\beta},{\delta})$ has a persistent cycle, Proposition 3.3 implies that $(G,s_{1},s_{2})$ has a cut with finite capacity, so $c(S^{*},T^{*})$ is finite. By Proposition 3.2, the chain $\zeta^{*}=\theta^{-1}(\xi(S^{*},T^{*}))$ is a persistent cycle of $[{\beta},{\delta})$. Suppose, for contradiction, that $\zeta^{*}$ is not a minimal persistent cycle of $[{\beta},{\delta})$, and let $\zeta^{\prime}$ be a minimal persistent cycle of $[{\beta},{\delta})$. Then by Propositions 3.2 and 3.3 there exists a cut $(S^{\prime},T^{\prime})$ such that $c(S^{\prime},T^{\prime})\leq w(\zeta^{\prime})<w(\zeta^{*})=c(S^{*},T^{*})$, contradicting the fact that $(S^{*},T^{*})$ is a minimal cut. ∎
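Line 19 of Algorithm 1 only requires some minimal cut algorithm. As a simple stand-in for Orlin's algorithm [25], the sketch below uses the Edmonds-Karp method (shortest augmenting paths), which is correct but asymptotically slower; the function name `min_cut` and the edge encoding are assumptions. Multiple sink vertices are merged through a hypothetical super-sink joined by infinite-capacity arcs, and each undirected dual edge is modeled by two opposite arcs. The source side $S^{*}$ is read off as the set of vertices reachable from the source in the final residual graph.

```python
import math
from collections import defaultdict, deque


def min_cut(edges, source, sinks):
    """Minimal source-sink cut of an undirected network via Edmonds-Karp.

    edges: dict label -> (u, v, capacity); sinks: set of sink vertices.
    Returns (source side S, labels of edges across the cut, cut capacity).
    """
    sink = ("super-sink",)           # hypothetical merged sink vertex
    cap = defaultdict(float)         # residual capacity of each arc
    adj = defaultdict(set)

    def add_arc(u, v, c):
        cap[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)

    for u, v, c in edges.values():   # an undirected edge = two opposite arcs
        add_arc(u, v, c)
        add_arc(v, u, c)
    for t in sinks:
        add_arc(t, sink, math.inf)

    while True:
        parent = {source: None}      # BFS tree of the residual graph
        queue = deque([source])
        while queue and sink not in parent:
            x = queue.popleft()
            for y in adj[x]:
                if y not in parent and cap[(x, y)] > 0:
                    parent[y] = x
                    queue.append(y)
        if sink not in parent:
            break                    # no augmenting path: the flow is maximal
        path, y = [], sink
        while parent[y] is not None:
            path.append((parent[y], y))
            y = parent[y]
        bottleneck = min(cap[arc] for arc in path)
        if bottleneck == math.inf:
            raise ValueError("every cut has infinite capacity")
        for x, y in path:
            cap[(x, y)] -= bottleneck
            cap[(y, x)] += bottleneck

    S = set(parent)                  # vertices reachable from the source
    crossing = {lab for lab, (u, v, _) in edges.items() if (u in S) != (v in S)}
    return S, crossing, sum(edges[lab][2] for lab in crossing)
```

On the two-triangle example with $[{\beta},{\delta})$ created by the shared edge and killed by the first triangle, the minimal cut isolates that triangle, and the crossing labels are exactly its three edges, i.e., its boundary cycle.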

4 Minimal persistent $d$-cycles of infinite intervals for weak $({d+1})$-pseudomanifolds embedded in $\mathbb{R}^{d+1}$

We have already mentioned that computing minimal persistent $d$-cycles ($d\geq 2$) for infinite intervals is NP-hard even if we restrict to weak $(d+1)$-pseudomanifolds (see Section 5.3 for a proof). However, when the complex is embedded in $\mathbb{R}^{d+1}$, the problem becomes polynomially tractable. In this section, we present an algorithm for this problem for $d\geq 1$ (as mentioned earlier, when $d=1$, this problem is polynomially tractable for arbitrary complexes). The algorithm uses a duality similar to the one described in Section 3. However, a direct use of the approach in Section 3 does not work. For example, in Figure 2a, 1-simplices that do not have any 2-cofaces cannot reside in any $2$-connected component of the given complex; hence, no cut in the flow network can correspond to a persistent cycle of the infinite interval created by such a $1$-simplex. Furthermore, unlike the finite interval case, there is no negative simplex whose dual can act as the source of the flow network.

Let $(K,\mathcal{F},[{\beta},+\infty))$ be an input to the problem, where $K$ is a weak $({d+1})$-pseudomanifold embedded in $\mathbb{R}^{d+1}$, $\mathcal{F}:K_{0}\subseteq K_{1}\subseteq\ldots\subseteq K_{n}$ is a filtration of $K$, and $[{\beta},+\infty)$ is an infinite interval of $\mathsf{D}_{d}(\mathcal{F})$. By the definition of the problem, the task boils down to computing a minimal $d$-cycle containing $\sigma_{\beta}^{\mathcal{F}}$ in $K_{\beta}$. Note that $K_{\beta}$ is also a weak $({d+1})$-pseudomanifold embedded in $\mathbb{R}^{d+1}$.

In general, assume that $\widetilde{K}$ is an arbitrary weak $({d+1})$-pseudomanifold embedded in $\mathbb{R}^{d+1}$ and that we want to compute a minimal $d$-cycle of $\widetilde{K}$ containing a $d$-simplex $\widetilde{\sigma}$. By the embedding assumption, the connected components of $\mathbb{R}^{d+1}\smallsetminus|\widetilde{K}|$ are well defined, and we call them the voids of $\mathbb{R}^{d+1}\smallsetminus|\widetilde{K}|$. The complex $\widetilde{K}$ has a natural (undirected) dual graph structure, as exemplified by Figure 2a for $d=1$, where the graph vertices are dual to the $({d+1})$-simplices as well as to the voids, and the graph edges are dual to the $d$-simplices. The duality between cycles and cuts is as follows. Since the ambient space $\mathbb{R}^{d+1}$ is contractible (homotopy equivalent to a point), every $d$-cycle in $\widetilde{K}$ is the boundary of a $({d+1})$-dimensional region obtained as the point-wise union of certain $({d+1})$-simplices and/or voids. We can derive a cut of the dual graph by putting all vertices contained in this $({d+1})$-dimensional region into one vertex set and the rest into the other (the graph here has no sources or sinks, so a cut is simply a partition of the vertex set into two sets). Conversely, for every cut of the graph, we can take the point-wise union of all the $({d+1})$-simplices and voids dual to the graph vertices in one set of the cut to obtain a $({d+1})$-dimensional region, whose boundary is a $d$-cycle in $\widetilde{K}$. By making the source and sink dual to the two $({d+1})$-simplices or voids that $\widetilde{\sigma}$ adjoins, we can thus build a flow network in which a minimal cut produces a minimal $d$-cycle in $\widetilde{K}$ containing $\widetilde{\sigma}$.

The efficiency of the above algorithm is in part determined by the efficiency of the dual graph construction. This step requires identifying the voids that the boundary $d$ -simplices are incident on. A straightforward approach would be to first group the boundary $d$ -simplices into $d$ -cycles by local geometry, and then build the nesting structure of these $d$ -cycles to correctly reconstruct the boundaries of the voids. This approach has a quadratic worst-case complexity. To make the void boundary reconstruction faster, we assume that the simplicial complex being worked on is $d$ -connected so that building the nesting structure is not needed. Our reconstruction then runs in almost linear time. To satisfy the $d$ -connected assumption, we begin our algorithm by taking $\widetilde{K}$ as a $d$ -connected subcomplex of $K_{\beta}$ containing $\sigma_{\beta}^{\mathcal{F}}$ and continue only with this $\widetilde{K}$ . The computed output is still correct because the minimal cycle in $\widetilde{K}$ is again a minimal cycle in $K_{\beta}$ as shown in Section 4.2.

We list the pseudo-code in Algorithm 2; it works as follows. Lines 3 to 6 set up the complex $\widetilde{K}$ that the algorithm works on. Line 3 prunes $K_{\beta}$ to produce a complex $K_{\beta}^{\prime}$. Given $(K_{\beta},d)$, the Prune subroutine iteratively deletes a $d$-simplex $\sigma^{d}$ of $K_{\beta}$ having a $(d-1)$-face whose only $d$-coface is $\sigma^{d}$ (i.e., $\sigma^{d}$ is a dangling $d$-simplex), until no such $d$-simplex can be found. It is not hard to verify that Prune only deletes $d$-simplices that do not reside in any $d$-cycle, so a minimal $d$-cycle containing $\sigma_{\beta}^{\mathcal{F}}$ is never deleted. We perform the pruning because it can reduce the size of the graph for the minimal cut computation, which is the more time-consuming step. In lines 4 to 6, we take the $d$-connected component $C_{\beta}$ of $K_{\beta}^{\prime}$ containing $\sigma_{\beta}^{\mathcal{F}}$ and add a set $\Sigma^{d+1}$ of $({d+1})$-simplices to the closure of $C_{\beta}$ to form $\widetilde{K}$. The set $\Sigma^{d+1}$ contains all $({d+1})$-simplices of $K_{\beta}^{\prime}$ whose $d$-faces reside in $C_{\beta}$. The reason for adding $\Sigma^{d+1}$ is to reduce the number of voids of $\widetilde{K}$ and in turn the running time of the subsequent void boundary reconstruction. For example, in Figure 3b, we could treat the entire complex as $K_{\beta}^{\prime}$, all 1-simplices as $C_{\beta}$, and all 2-simplices as $\Sigma^{d+1}$. If we did not add $\Sigma^{d+1}$ to the closure of $C_{\beta}$, there would be seven more voids corresponding to the seven 2-simplices. Line 8 reconstructs the void boundaries of $\widetilde{K}$. Each returned $\vec{\zeta}_{j}$ denotes a set of $d$-simplices forming the boundary of a void. As indicated in Section 4.1, the $d$-simplices in a void boundary are oriented. Line 9 constructs the dual graph $G$ based on the reconstructed void boundaries. 
Similar to Algorithm 1, the function $\theta$ returned by DualGraphInf denotes the bijection from the $d$-simplices of $\widetilde{K}$ to $E(G)$. Lines 11 to 17 build the flow network on top of $G$. The capacity of each edge equals the weight of its dual $d$-simplex, and the source and sink are selected as previously described. Line 18 computes a minimal cut of the flow network, and line 19 returns the $d$-chain dual to the edges across the minimal cut.
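The Prune subroutine described above can be sketched with a work queue, assuming the $d$-simplices are given as frozensets of vertex ids; the function name `prune` is hypothetical. A dangling $d$-simplex cannot have a $({d+1})$-coface (such a coface would give its free $(d-1)$-face a second $d$-coface), so tracking only the $d$-simplices suffices in this sketch.

```python
from collections import deque
from itertools import combinations


def prune(d_simplices, d):
    """Iteratively delete dangling d-simplices: those having a (d-1)-face
    whose only remaining d-coface is the simplex itself."""
    alive = set(d_simplices)
    cof = {}  # (d-1)-face -> set of remaining d-cofaces
    for s in alive:
        for f in combinations(sorted(s), d):
            cof.setdefault(frozenset(f), set()).add(s)
    queue = deque(f for f, c in cof.items() if len(c) == 1)
    while queue:
        f = queue.popleft()
        if len(cof[f]) != 1:
            continue  # its coface was already deleted
        (s,) = cof[f]
        alive.discard(s)
        for g in combinations(sorted(s), d):
            g = frozenset(g)
            cof[g].discard(s)
            if len(cof[g]) == 1:  # g's last coface is now dangling
                queue.append(g)
    return alive
```

For $d=1$, a triangle boundary with a two-edge tail attached keeps the triangle and loses the tail, since the tail edges lie on no 1-cycle. Each $d$-simplex is deleted at most once, so the loop does work proportional to the number of face incidences.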

Complexity.

We make the same assumptions as in the complexity analysis of Algorithm 1. Since the void boundary reconstruction needs to sort the $d$-cofaces of certain $({d-1})$-simplices, its worst-case time complexity is $O(n\log n)$. All operations other than the minimal cut computation thus take $O(n\log n)$ time. Therefore, similar to Algorithm 1, Algorithm 2 achieves a complexity of $O(n^{2})$ by using Orlin's max-flow algorithm [25].

In the rest of this section, we first describe the subroutine VoidBoundary invoked by Algorithm 2 and then prove the correctness of the algorithm.

4.1 Void boundary reconstruction

As previously stated, the goal of the reconstruction is to identify the voids on which each boundary $d$-simplex of $\widetilde{K}$ is incident. The task is complicated by the fact that a void may have disconnected boundaries and a $d$-simplex may bound more than one void, as exemplified in Figure 3a. To address this issue, we orient the boundary $d$-simplices, determining the orientations consistently from the voids they bound. This is possible because an orientation of a $d$-simplex in $\mathbb{R}^{d+1}$ associates exactly one of its two sides with the $d$-simplex. To reconstruct the boundaries, we first inspect the neighborhood of each $({d-1})$-simplex that is a face of a boundary $d$-simplex, and pair the oriented boundary $d$-simplices in that neighborhood which locally bound the same void. Figure 2b gives an example of this pairing for $d=1$: there are three local voids, each colored differently, and the oriented 1-simplices with the same color bound the same void and are paired.

After pairing the oriented boundary $d$-simplices, we group them by putting paired ones into the same group. Each group then forms a $d$-cycle (with $\mathbb{Z}$ coefficients), as exemplified by Figure 3 for $d=1$. Note that in general, this grouping does not fully reconstruct the void boundaries: in Figure 3a, the complex has four voids but the grouping produces six 1-cycles. To fully reconstruct the boundaries, one has to retrieve the nesting structure of these $d$-cycles, which may take $\Omega(n^{2})$ time in the worst case. However, since we work on a complex $\widetilde{K}$ that is $d$-connected, no void can have disconnected boundaries, so the grouping of oriented $d$-simplices fully recovers the void boundaries. Figure 3b gives an example for $d=1$, where we add two 1-simplices to make the complex 1-connected; the four 1-cycles produced by the grouping are exactly the boundaries of the four voids.

In the rest of this subsection, we formalize the above ideas for reconstructing void boundaries and prove their correctness. Throughout this subsection, $\widetilde{K}$ and $d$ are as defined in Algorithm 2. We first introduce the natural orientation of a $q$-simplex in $\mathbb{R}^{q}$; its induced orientations are used to canonically orient the boundary simplices.

Definition 4.1 (Natural orientation [22]).

Let $q>1$ and ${\sigma}=\{v_{0},\ldots,v_{q}\}$ be a $q$-simplex in $\mathbb{R}^{q}$. An oriented simplex $\vec{\sigma}=[v^{\prime}_{0},\ldots,v_{q}^{\prime}]$ of ${\sigma}$ is naturally oriented if $\det(v^{\prime}_{1}-v^{\prime}_{0},\ldots,v^{\prime}_{q}-v^{\prime}_{0})>0$. For each face $\sigma^{\prime}$ of $\sigma$, the natural orientation of $\sigma$ induces an orientation of $\sigma^{\prime}$, which we term the induced orientation.
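Definition 4.1 can be checked numerically. The sketch below, whose function names are hypothetical, evaluates the determinant exactly with rational arithmetic, turns a negatively oriented vertex order into the natural one by swapping its first two vertices (an odd permutation reverses orientation), and derives the induced orientation of the face opposite position $i$ using the sign $(-1)^{i}$ from the simplicial boundary formula.

```python
from fractions import Fraction


def det(rows):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    n, sign, result = len(m), 1, Fraction(1)
    for c in range(n):
        p = next((r for r in range(c, n) if m[r][c] != 0), None)
        if p is None:
            return Fraction(0)
        if p != c:
            m[c], m[p] = m[p], m[c]
            sign = -sign
        result *= m[c][c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n):
                m[r][k] -= factor * m[c][k]
    return sign * result


def natural_orientation(vertices, coords):
    """Order the q+1 vertices of a q-simplex in R^q so that
    det(v1 - v0, ..., vq - v0) > 0."""
    v0 = coords[vertices[0]]
    rows = [[a - b for a, b in zip(coords[v], v0)] for v in vertices[1:]]
    D = det(rows)
    if D == 0:
        raise ValueError("degenerate simplex")
    if D > 0:
        return list(vertices)
    return [vertices[1], vertices[0]] + list(vertices[2:])  # one transposition


def induced_orientation(oriented, i):
    """Orientation induced on the face opposite position i: sign (-1)^i."""
    face = list(oriented[:i]) + list(oriented[i + 1:])
    if i % 2 == 1:
        face[0], face[1] = face[1], face[0]
    return face
```

In the plane, the natural orientation of a triangle is its counterclockwise order, and the induced orientations of its three edges traverse its boundary counterclockwise.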

We now formally define the boundary of a void as follows:

Definition 4.2 (Boundary of void).

Let $K$ be a simplicial complex embedded in $\mathbb{R}^{q}$ where $q\geq 2$. An oriented $(q-1)$-simplex $\vec{\sigma}^{q-1}=[v_{0},\ldots,v_{q-1}]$ of ${K}$ is said to bound a void $\mathcal{V}$ of $\mathbb{R}^{q}\smallsetminus|{K}|$ if the following conditions are satisfied:

•

The simplex ${\sigma}^{q-1}=\{v_{0},\ldots,v_{q-1}\}$ is contained in the closure of $\mathcal{V}$ .

•

Let $u$ be an interior point of ${\sigma}^{q-1}=\{v_{0},\ldots,v_{q-1}\}$ , $v$ be a point in $\mathcal{V}$ such that the line segment $\overline{uv}$ is contained in $\mathcal{V}$ and $\overline{uv}$ is orthogonal to the hyperplane spanned by ${\sigma}^{q-1}$ . Furthermore, let $\vec{\sigma}^{q}$ be the naturally oriented simplex of $\{v,v_{0},\ldots,v_{q-1}\}$ . Then, $\vec{\sigma}^{q-1}$ has the induced orientation from $\vec{\sigma}^{q}$ .

The boundary of a void $\mathcal{V}$ is then defined as the set of oriented $(q-1)$ -simplices of ${K}$ bounding $\mathcal{V}$ .

Remark 4.1.

We can also interpret the boundary of a void as a sum of oriented $(q-1)$ -simplices, then the boundary defines a $(q-1)$ -cycle (with $\mathbb{Z}$ coefficients).

We now describe the pairing algorithm for the oriented boundary $d$-simplices of $\widetilde{K}$. From now on, we denote the set of boundary $d$-simplices of $\widetilde{K}$ by $\mathrm{bd}(\widetilde{K})$. Let $\sigma^{{d-1}}$ be a $({d-1})$-simplex which is a face of a $d$-simplex in $\mathrm{bd}(\widetilde{K})$. We first take a 2D plane $\Delta$ which contains an interior point of $\sigma^{d-1}$ and is orthogonal to the $({d-1})$-dimensional affine subspace spanned by $\sigma^{{d-1}}$. We then intersect $\Delta$ with each boundary $d$-simplex in the neighborhood of $\sigma^{d-1}$ to get a set of line segments, which we order circularly starting from an arbitrary one. For each two consecutive line segments in this order which enclose a void, we pick a point $p$ on the plane $\Delta$ which resides in the void. Suppose that one of the two line segments is derived from a boundary $d$-simplex $\sigma^{d}_{0}=\{v_{0},\ldots,v_{d}\}$. We take the $({d+1})$-simplex $\sigma^{d+1}=\{p,v_{0},\ldots,v_{d}\}$ and let $\vec{\sigma}^{d}_{0}$ be the oriented simplex of $\sigma^{d}_{0}$ induced by the naturally oriented simplex of $\sigma^{d+1}$. For the other line segment, we similarly derive an induced oriented simplex $\vec{\sigma}^{d}_{1}$ and pair the two oriented $d$-simplices $\vec{\sigma}^{d}_{0}$ and $\vec{\sigma}^{d}_{1}$. Figure 2b can be reused to exemplify the pairing. The union of the shaded regions in the figure is the plane $\Delta$, and $a$, $b$, $c$, and $d$ are the line segments derived from intersecting the plane with four boundary $d$-simplices. Taking the circular order $a,b,c,d$, the consecutive pairs which enclose a void are $(a,b)$, $(c,d)$, and $(d,a)$. For $(a,b)$, we can pick $p$ as an interior point of the blue region, and the two oriented $d$-simplices corresponding to $a$ and $b$ are induced and paired.
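For $d=1$, the plane $\Delta$ is the ambient plane itself and the line segments are the edges incident to the vertex $\sigma^{d-1}$, so the pairing reduces to sorting those edges by angle. The sketch below assumes that a wedge between two angularly consecutive edges is non-void exactly when a 2-simplex of the complex fills it (detected by the triangle's presence together with a wedge angle below $\pi$). It considers all incident edges, since an edge lying between two filled wedges is never paired. Each reported oriented edge has the void on its left, which is the orientation induced by the naturally oriented triangle $[p,\cdot,\cdot]$. Function names are hypothetical.

```python
import math


def ccw_angle(u, v):
    """Counterclockwise angle in (0, 2*pi] from direction u to direction v."""
    ang = (math.atan2(v[1], v[0]) - math.atan2(u[1], u[0])) % (2 * math.pi)
    return ang if ang > 0 else 2 * math.pi


def pair_around_vertex(c, coords, neighbors, triangles):
    """Pair oriented edges at vertex c (d = 1, complex embedded in R^2).

    neighbors: vertices a with an edge {c, a}; triangles: set of frozensets.
    Returns pairs of oriented edges (tuples) bounding the same local void.
    """
    cx, cy = coords[c]

    def direction(a):
        return (coords[a][0] - cx, coords[a][1] - cy)

    ring = sorted(neighbors, key=lambda a: math.atan2(direction(a)[1], direction(a)[0]))
    pairs = []
    for i, a in enumerate(ring):
        b = ring[(i + 1) % len(ring)]  # next edge counterclockwise
        wedge = ccw_angle(direction(a), direction(b))
        if wedge < math.pi and frozenset((c, a, b)) in triangles:
            continue  # the wedge is covered by a 2-simplex, not a void
        # a point p in the wedge lies left of c->a and left of b->c
        pairs.append(((c, a), (b, c)))
    return pairs
```

An isolated edge at $c$ yields a single pair consisting of its two orientations, reflecting that such an edge bounds the same local void on both sides.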

In summary, the steps of the VoidBoundary subroutine are the following:

1. For each $({d-1})$-simplex $\sigma^{{d-1}}$ that is a face of a $d$-simplex in $\mathrm{bd}(\widetilde{K})$, pair all oriented boundary $d$-simplices in its neighborhood.

2. After gathering all the pairs, group the oriented boundary $d$-simplices so that paired ones always fall into the same group.

3. Return $(\vec{\zeta}_{1},\ldots,\vec{\zeta}_{k})$, each of which is a group of oriented boundary $d$-simplices.
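Step 2 amounts to computing connected components of the pairing relation. A union-find sketch, assuming oriented simplices are encoded as vertex tuples (the function name `group_pairs` is hypothetical):

```python
def group_pairs(pairs):
    """Union-find over oriented simplices; each group is one void boundary."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
    groups = {}
    for x in parent:
        groups.setdefault(find(x), set()).add(x)
    return list(groups.values())
```

For a hollow triangle with vertices $0,1,2$ in counterclockwise order, the six pairs formed at its three vertices yield two groups: the counterclockwise boundary of the inner void and the clockwise boundary of the outer void.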

The following theorem concludes the correctness of the reconstruction:

Theorem 4.1.

Any $\vec{\zeta}_{j}$ returned by VoidBoundary is the boundary of a void of $\mathbb{R}^{d+1}\smallsetminus|\widetilde{K}|$.

Proof.

See Appendix A. ∎

4.2 Algorithm correctness

To prove the correctness of Algorithm 2, we need two conclusions about cycles with $\mathbb{Z}_{2}$ coefficients. Specifically, Proposition 4.1 says that an embedded $(q-1)$-cycle in $\mathbb{R}^{q}$ separates the space, and hence the two oriented simplices of a $(q-1)$-simplex in the cycle bound different voids. Proposition 4.2 says that a $q$-simplex in a $q$-cycle belongs to a $q$-connected sub-cycle of that $q$-cycle.

Proposition 4.1.

Let $q\geq 2$, $\zeta$ be a $(q-1)$-cycle (with $\mathbb{Z}_{2}$ coefficients) of a simplicial complex embedded in $\mathbb{R}^{q}$, and $\mathcal{Z}$ be the closure of the simplicial set $\zeta$. Then for any $(q-1)$-simplex $\sigma$ of $\zeta$, the two oriented simplices of $\sigma$ must bound different voids of $\mathbb{R}^{q}\smallsetminus|\mathcal{Z}|$.

Proof.

Consider a closed topological $q$-ball $\mathbb{B}$ such that $\sigma\subseteq\mathbb{B}$ and $\mathbb{B}\cap|\mathcal{Z}\smallsetminus{\sigma}|$ equals the boundary of $\sigma$. Let $\mathbb{B}_{1}$ and $\mathbb{B}_{2}$ be the two open half balls of $\mathbb{B}$ separated by $\sigma$. Then the two oriented simplices of $\sigma$ bound different voids of $\mathbb{R}^{q}\smallsetminus|\mathcal{Z}|$ if and only if $\mathbb{B}_{1}$ and $\mathbb{B}_{2}$ are not connected in $\mathbb{R}^{q}\smallsetminus|\mathcal{Z}|$, so it suffices to show the latter. Consider a filtration of $\mathcal{Z}$ in which $\sigma$ is the last simplex added. Because $\sigma$ belongs to the cycle $\zeta$, all of whose simplices lie in $\mathcal{Z}$, $\sigma$ is a positive simplex in this filtration, so adding $\sigma$ increases the dimension of $\text{\sf H}_{q-1}$ by 1. By Alexander duality, the dimension of $\text{\sf H}_{0}$ of the complement space also increases by 1. Hence $\mathbb{B}_{1}$ and $\mathbb{B}_{2}$ cannot be connected in $\mathbb{R}^{q}\smallsetminus|\mathcal{Z}|$. ∎

Proposition 4.2.

Let $\zeta$ be a $q$ -cycle (with $\mathbb{Z}_{2}$ coefficients) of a simplicial complex where $q>0$ , then for any $q$ -simplex $\sigma$ of $\zeta$ , there must be a $q$ -cycle $\zeta^{\prime}$ (with $\mathbb{Z}_{2}$ coefficients) containing $\sigma$ such that $\zeta^{\prime}\subseteq\zeta$ and $\zeta^{\prime}$ is $q$ -connected.

Proof.

We can construct an undirected graph $L$ for $\zeta$ , with vertices of $L$ corresponding to the $q$ -simplices in $\zeta$ . For each $(q-1)$ -simplex $\sigma^{q-1}$ which is a face of a $q$ -simplex of $\zeta$ , let $\mathcal{N}$ be the set of $q$ -simplices in $\zeta$ having $\sigma^{q-1}$ as a face, then $|\mathcal{N}|$ must be even. We can pair $q$ -simplices of $\mathcal{N}$ arbitrarily, and make each pair of $q$ -simplices form an edge in $L$ . Let $C$ be the connected component of $L$ containing the corresponding vertex of $\sigma$ and $\zeta^{\prime}$ be the $q$ -chain corresponding to $C$ , then $\zeta^{\prime}$ must be a cycle. This is because we can pair the $(q-1)$ -faces of all $q$ -simplices in $\zeta^{\prime}$ according to the edges in $L$ , so $\partial(\zeta^{\prime})=0$ . Furthermore, $\zeta^{\prime}$ contains $\sigma$ , $\zeta^{\prime}\subseteq\zeta$ , and $\zeta^{\prime}$ is $q$ -connected. ∎
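The construction in the proof of Proposition 4.2 translates directly into code. In the sketch below (hypothetical function name `connected_subcycle`; simplices as frozensets of vertex ids), the $q$-simplices sharing each $(q-1)$-face are paired in the order they happen to be listed, which is one instance of the arbitrary pairing allowed by the proof, and a breadth-first search then extracts the component of $L$ containing $\sigma$.

```python
from collections import deque
from itertools import combinations


def connected_subcycle(cycle, sigma, q):
    """cycle: set of q-simplices forming a Z2 q-cycle. Returns a q-connected
    sub-cycle containing sigma, following the proof of Proposition 4.2."""
    around = {}  # (q-1)-face -> q-simplices of the cycle having it as a face
    for s in cycle:
        for f in combinations(sorted(s), q):
            around.setdefault(frozenset(f), []).append(s)
    adj = {s: [] for s in cycle}  # the graph L
    for group in around.values():
        assert len(group) % 2 == 0, "input is not a Z2 cycle"
        for a, b in zip(group[0::2], group[1::2]):  # an arbitrary pairing
            adj[a].append(b)
            adj[b].append(a)
    seen, queue = {sigma}, deque([sigma])
    while queue:
        s = queue.popleft()
        for t in adj[s]:
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return seen
```

For a figure-eight made of two triangles sharing a vertex, the result is either one triangle or the whole figure-eight, depending on how the four edges at the shared vertex are paired; both are 1-connected 1-cycles containing $\sigma$.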

Throughout the rest of this subsection, some of the symbols we use refer to Algorithm 2. We endow the ambient space $\mathbb{R}^{d+1}$ with a “cellular complex” structure by treating the voids of $\mathbb{R}^{d+1}\smallsetminus|\widetilde{K}|$ as $({d+1})$-dimensional “cells”. This cellular complex of $\mathbb{R}^{d+1}$ is denoted by $\mathcal{R}^{d+1}$, and $\mathcal{R}^{d+1}=\widetilde{K}\cup\{\text{voids of }\mathbb{R}^{d+1}\smallsetminus|\widetilde{K}|\}$. For $\mathcal{R}^{d+1}$, most terminology from algebraic topology for simplicial complexes is inherited, except that the $({d+1})$-dimensional elements of $\mathcal{R}^{d+1}$ are called $({d+1})$-cells. We can then also let $\theta$ denote the bijection from the $({d+1})$-cells of $\mathcal{R}^{d+1}$ to $V(G)$. To derive $\partial(\mathcal{V})$ for a void $\mathcal{V}$ of $\mathbb{R}^{d+1}\smallsetminus|\widetilde{K}|$, we map the oriented $d$-simplices in the boundary of $\mathcal{V}$ (Definition 4.2) to their corresponding unoriented $d$-simplices. Then $\partial(\mathcal{V})$ is defined as the sum (with $\mathbb{Z}_{2}$ coefficients) of these unoriented $d$-simplices. It is not hard to see that $\partial(\mathcal{V})$ is a $d$-cycle (with $\mathbb{Z}_{2}$ coefficients) because each void boundary is a $d$-cycle (with $\mathbb{Z}$ coefficients).

Proposition 4.3.

For any cut $(S,T)$ of $(G,s_{1},s_{2})$ , the $d$ -chain $\zeta=\theta^{-1}(\xi({S},{T}))$ is a persistent $d$ -cycle of $[{\beta},+\infty)$ and $w(\zeta)=c(S,T)$ .

Proof.

We have three things to show: (i) $\zeta$ contains $\sigma_{\beta}^{\mathcal{F}}$; (ii) $w(\zeta)=c(S,T)$; (iii) $\zeta$ is a cycle. Claims (i) and (ii) are not hard to verify; we prove claim (iii) by showing that $\zeta=\sum_{\alpha\in\theta^{-1}(S)}\partial(\alpha)$, so that, as a sum of cycles, $\zeta$ is a cycle. The details of the equality of the two chains are omitted as they are similar to those in the proof of Proposition 3.2. ∎

Proposition 4.4.

For any persistent $d$ -cycle $\zeta$ of $[{\beta},+\infty)$ , there exists a cut $(S,T)$ of $(G,s_{1},s_{2})$ such that $c(S,T)\leq w(\zeta)$ .

Proof.

Because of the nature of the pruning, $\zeta$ must reside in $K_{\beta}^{\prime}$. By Proposition 4.2, there must be a $d$-cycle $\zeta^{\prime}\subseteq\zeta$ such that $\zeta^{\prime}$ is $d$-connected and contains $\sigma_{\beta}^{\mathcal{F}}$. Hence, $\zeta^{\prime}$ resides in $\widetilde{K}$. Let $\mathcal{Z}^{\prime}$ be the closure of the simplicial set $\zeta^{\prime}$. We can run the void boundary reconstruction algorithm of Section 4.1 on $\mathcal{Z}^{\prime}$ and take a void boundary $\vec{\zeta}$ containing an oriented simplex $\vec{\sigma}_{\beta}^{\mathcal{F}}$ of $\sigma_{\beta}^{\mathcal{F}}$. Mapping each oriented simplex of $\vec{\zeta}$ to its unoriented simplex and letting $\zeta_{0}$ be the sum of these unoriented simplices, we obtain a $d$-cycle $\zeta_{0}$ (with $\mathbb{Z}_{2}$ coefficients) with $\zeta_{0}\subseteq\zeta^{\prime}$. By Proposition 4.1, the oppositely oriented simplex of $\vec{\sigma}_{\beta}^{\mathcal{F}}$ is not in $\vec{\zeta}$, so $\zeta_{0}$ contains $\sigma_{\beta}^{\mathcal{F}}$. Let $\vec{\zeta}$ bound a void $\mathcal{V}$ of $\mathbb{R}^{d+1}\smallsetminus|\mathcal{Z}^{\prime}|$. Let $\mathcal{A}$ be the $({d+1})$-chain of $\mathcal{R}^{d+1}$ consisting of all the $({d+1})$-cells residing in $\mathcal{V}$, and let $\mathcal{B}$ be the $({d+1})$-chain consisting of all the other $({d+1})$-cells; then $\partial(\mathcal{A})=\partial(\mathcal{B})=\zeta_{0}$. Let $v_{1},v_{2}$ be the two end vertices of $\theta(\sigma_{\beta}^{\mathcal{F}})$. Because the oppositely oriented simplex of $\vec{\sigma}_{\beta}^{\mathcal{F}}$ does not bound $\mathcal{V}$ in $\mathcal{Z}^{\prime}$, one of $v_{1},v_{2}$ must be in $\theta(\mathcal{A})$ and the other in $\theta(\mathcal{B})$. 
We can let $(S,T)=(\theta(\mathcal{A}),\theta(\mathcal{B}))$ or $(\theta(\mathcal{B}),\theta(\mathcal{A}))$ based on which set contains the source of the flow network, then $(S,T)$ is a cut of the flow network constructed in Algorithm 1. Furthermore, we have $\zeta_{0}=\theta^{-1}(\xi(S,T))$ and $c(S,T)=w(\zeta_{0})\leq w(\zeta)$ . ∎

The following theorem establishes the correctness of Algorithm 1:

Theorem 4.2.

Algorithm 1 computes a minimal persistent $d$ -cycle for the given interval $[{\beta},+\infty)$ .

Proof.

First, the flow network $(G,s_{1},s_{2})$ constructed by Algorithm 1 is valid: by Proposition 4.1, the two oriented simplices of $\sigma_{\beta}^{\mathcal{F}}$ cannot bound the same void of $\mathbb{R}^{d+1}\smallsetminus|\widetilde{K}|$, so $\sigma_{\beta}^{\mathcal{F}}$ corresponds to an edge of $G$ with two distinct end vertices. The conclusion then follows from Propositions 4.3 and 4.4. ∎
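Algorithm 1 itself is not reproduced in this section. As a rough illustration of the minimum-cut step that Propositions 4.3 and 4.4 justify, the Python sketch below assumes the flow network has already been built: each $d$-simplex becomes an undirected edge between the two voids it separates, weighted by the simplex's weight, and the source and sink are assumed to be the two voids incident to $\sigma_{\beta}^{\mathcal{F}}$. The tuple encoding, the function name `min_cut_cycle`, and the choice of the Edmonds-Karp max-flow algorithm are assumptions of this sketch, not necessarily what the authors implemented. The simplices dual to the cut edges form the cycle $\theta^{-1}(\xi(S,T))$ of weight $c(S,T)$.

```python
from collections import defaultdict, deque


def min_cut_cycle(edges, source, sink):
    """Minimum s-t cut of an undirected, weighted dual graph (Edmonds-Karp).

    edges: list of (void_a, void_b, simplex_id, weight) tuples, each edge being
           dual to one d-simplex (a hypothetical encoding for this sketch).
    Returns (cut_weight, source_side, cut_simplices); the simplices dual to
    the cut edges form the candidate persistent cycle.
    """
    # Residual capacities; parallel edges between the same two voids are merged.
    cap = defaultdict(lambda: defaultdict(float))
    for a, b, _, w in edges:
        if a != b:  # a simplex with the same void on both sides is never cut
            cap[a][b] += w
            cap[b][a] += w

    def reachable():
        # BFS tree of the vertices reachable from the source in the residual graph.
        parent = {source: None}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return parent

    while True:
        parent = reachable()
        if sink not in parent:
            break  # no augmenting path left: the flow is maximum
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(cap[u][v] for u, v in path)  # bottleneck capacity
        for u, v in path:
            cap[u][v] -= push
            cap[v][u] += push

    # Vertices still reachable from the source form S; the rest form T.
    source_side = set(reachable())
    crossing = [(s, w) for a, b, s, w in edges
                if (a in source_side) != (b in source_side)]
    return sum(w for _, w in crossing), source_side, sorted(s for s, _ in crossing)
```

For instance, with edges `("s", "t", "sigma_beta", 1.0)`, `("s", "x", "e1", 2.0)`, `("x", "t", "e2", 5.0)`, and `("s", "x", "e3", 1.0)`, the minimum cut has weight 4.0 and consists of `e1`, `e3`, and `sigma_beta`. The edge dual to $\sigma_{\beta}^{\mathcal{F}}$ always crosses the cut because its end vertices are the source and the sink.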

5 Hardness for general complexes

As in [8], the NP-hardness proofs in this section carry out the reductions with the help of a suspension operator. While Hatcher [21] defines this operator for general topological spaces, we need a version of it for simplicial complexes, together with some of its properties that are useful in the proofs.

5.1 Suspension operator

Definition 5.1 (Suspension [20]).

The suspension $\mathcal{S}K$ of a simplicial complex $K$ is defined as a simplicial complex

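$$\mathcal{S}K=K\cup\big\{\{\omega_{1}\},\{\omega_{2}\}\big\}\cup\big\{\sigma\cup\{\omega_{1}\},\,\sigma\cup\{\omega_{2}\}\;\big|\;\sigma\in K\big\},$$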

where $\omega_{1}$ , $\omega_{2}$ are two extra vertices.

Remark 5.1.

In the above definition, we denote a simplex by its set of vertices.
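Because Remark 5.1 stores simplices as vertex sets, Definition 5.1 translates directly into code. The Python sketch below is a minimal illustration. It assumes a complex is given as a collection of vertex sets closed under taking faces. The function name `suspend` and the default apex labels `"w1"`, `"w2"` (standing for $\omega_1,\omega_2$) are hypothetical and must not clash with vertices of $K$.

```python
def suspend(K, w1="w1", w2="w2"):
    """Return the suspension SK of a simplicial complex K (Definition 5.1).

    K is an iterable of simplices, each an iterable of vertex labels; the
    complex is assumed to be closed under taking faces. Simplices are
    represented by frozensets of vertices, as in Remark 5.1. The apex labels
    w1, w2 (playing the roles of omega_1, omega_2) are hypothetical and must
    not already be vertices of K.
    """
    K = {frozenset(s) for s in K}
    SK = set(K) | {frozenset({w1}), frozenset({w2})}
    for sigma in K:
        # The two suspended simplices: sigma union {omega_1}, sigma union {omega_2}.
        SK.add(sigma | {w1})
        SK.add(sigma | {w2})
    return SK
```

For example, suspending the boundary of a triangle (3 vertices and 3 edges, a circle) gives 5 vertices, 9 edges, and 6 triangles, i.e., the 20 simplices of a triangular bipyramid, which is a 2-sphere.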

In the rest of this subsection, we let $K$ be an arbitrary simplicial complex. Any simplex of the form $\sigma\cup\{\omega_{i}\}$ in $\mathcal{S}K$ is called a suspended simplex. The symbol $\mathcal{S}$ is also used to denote a linear map $\mathcal{S}:\text{\sf C}_{q}(K)\to\text{\sf C}_{q+1}(\mathcal{S}K)$ , where $\mathcal{S}\sigma=\sigma\cup\{\omega_{1}\}+\sigma\cup\{\omega_{2}\}$ for any $q$ -simplex $\sigma$ of $K$ . Note that since $\mathcal{S}$ is injective, the map $\mathcal{S}$ defines an isomorphism from $\text{\sf C}_{q}(K)$ to the image $\mathcal{S}(\text{\sf C}_{q}(K))$ . For any chain $A\in\mathcal{S}(\text{\sf C}_{q}(K))$ , we abuse the notation slightly by letting $\mathcal{S}^{-1}A$ denote the chain in $\text{\sf C}_{q}(K)$ mapped to $A$ under $\mathcal{S}$ .

Proposition 5.1.

For any $q\geq 1$ , the following diagram commutes:

$$\begin{array}{ccc}\text{\sf C}_{q}(K)&\xrightarrow{\;\partial\;}&\text{\sf C}_{q-1}(K)\\ {\scriptstyle\mathcal{S}\,\approx}\big\downarrow&&\big\downarrow{\scriptstyle\mathcal{S}\,\approx}\\ \mathcal{S}(\text{\sf C}_{q}(K))&\xrightarrow{\;\partial\;}&\mathcal{S}(\text{\sf C}_{q-1}(K))\end{array}$$

That is, $\partial(\mathcal{S}A)=\mathcal{S}(\partial A)$ for every chain $A\in\text{\sf C}_{q}(K)$.

Proof.

For any $q$ -simplex $\sigma=\{v_{0},\ldots,v_{q}\}$ of $K$ , we have

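$$\begin{aligned}\partial(\mathcal{S}\sigma)&=\partial\big(\sigma\cup\{\omega_{1}\}\big)+\partial\big(\sigma\cup\{\omega_{2}\}\big)\\&=\sigma+\sum_{i=0}^{q}\{v_{0},\ldots,\widehat{v_{i}},\ldots,v_{q},\omega_{1}\}+\sigma+\sum_{i=0}^{q}\{v_{0},\ldots,\widehat{v_{i}},\ldots,v_{q},\omega_{2}\}\\&=\sum_{i=0}^{q}\mathcal{S}\{v_{0},\ldots,\widehat{v_{i}},\ldots,v_{q}\}\\&=\mathcal{S}\Big(\sum_{i=0}^{q}\{v_{0},\ldots,\widehat{v_{i}},\ldots,v_{q}\}\Big)=\mathcal{S}(\partial\sigma).\end{aligned}$$

In the above equations, $\widehat{v_{i}}$ means that $v_{i}$ is deleted from the simplex, and the two copies of $\sigma$ cancel because the coefficients are in $\mathbb{Z}_{2}$. By linearity, the identity extends from simplices to all of $\text{\sf C}_{q}(K)$. ∎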
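Over $\mathbb{Z}_2$, a chain can be stored as a set of simplices, with chain addition given by symmetric difference. Under that encoding, the Python code below implements $\partial$ and the chain map $\mathcal{S}$, so that Proposition 5.1 can be spot-checked on examples. The encoding, the function names, and the apex labels `"w1"`, `"w2"` are assumptions of this sketch. Such checks illustrate the proposition but do not prove it.

```python
from itertools import combinations


def boundary(chain):
    """Z2 boundary of a chain stored as a set of frozenset simplices."""
    result = set()
    for sigma in chain:
        if len(sigma) < 2:
            continue  # vertices have zero boundary (non-reduced chains)
        for face in combinations(sigma, len(sigma) - 1):
            result ^= {frozenset(face)}  # Z2 addition is symmetric difference
    return result


def suspend_chain(chain, w1="w1", w2="w2"):
    """The linear map S: sigma -> (sigma union {w1}) + (sigma union {w2})."""
    result = set()
    for sigma in chain:
        result ^= {sigma | {w1}, sigma | {w2}}
    return result
```

For a single vertex ($q=0$) the two sides differ. Here $\partial(\mathcal{S}\{v\})=\{\omega_1\}+\{\omega_2\}$ while $\mathcal{S}(\partial\{v\})=0$, which is why the proposition requires $q\geq 1$.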

Proposition 5.2.

For $q\geq 1$ and any $q$ -cycle $\zeta$ of $\mathcal{S}K$ containing only suspended simplices, one has $\zeta\in\mathcal{S}(\text{\sf C}_{q-1}(K))$ .

Proof.

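Let $\sigma\cup\{\omega_{i}\}$ be any suspended $q$-simplex of $\zeta$, and suppose $\omega_{i}=\omega_{1}$. The $(q-1)$-simplex $\sigma$ lies in $\partial\big(\sigma\cup\{\omega_{1}\}\big)$, and the only other suspended $q$-simplex of $\mathcal{S}K$ having $\sigma$ in its boundary is $\sigma\cup\{\omega_{2}\}$. Since $\zeta$ is a cycle, $\sigma$ must cancel in $\partial(\zeta)$, so $\sigma\cup\{\omega_{2}\}$ also belongs to $\zeta$. The case $\omega_{i}=\omega_{2}$ is symmetric. Hence the simplices of $\zeta$ come in pairs $\sigma\cup\{\omega_{1}\},\sigma\cup\{\omega_{2}\}$, that is, $\zeta\in\mathcal{S}(\text{\sf C}_{q-1}(K))$. ∎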

Proposition 5.3.

If $q$ is the top dimension of $K$ and $q\geq 1$ , then for any $A\in\text{\sf C}_{q+1}(\mathcal{S}K)$ such that $\partial(A)$ contains only suspended simplices, one has $A\in\mathcal{S}(\text{\sf C}_{q}(K))$ .

Proof.

Because $q$ is the top dimension of $K$, every $(q+1)$-simplex of $\mathcal{S}K$ is suspended, so $A$ contains only suspended simplices. For any $\sigma\cup\{\omega_{i}\}\in A$, we have $\sigma\in\partial\big(\sigma\cup\{\omega_{i}\}\big)$. If $\omega_{i}=\omega_{1}$, then, for $\sigma$ to cancel in $\partial(A)$, $\sigma\cup\{\omega_{2}\}$ must also belong to $A$, because no other $(q+1)$-simplex of $\mathcal{S}K$ has $\sigma$ in its boundary. The case $\omega_{i}=\omega_{2}$ is symmetric. ∎
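Propositions 5.2 and 5.3 both rest on the observation that suspended simplices must come in pairs $\sigma\cup\{\omega_1\},\sigma\cup\{\omega_2\}$, which is what makes $\mathcal{S}^{-1}$ well defined. A minimal Python sketch of $\mathcal{S}^{-1}$ follows. It assumes $\mathbb{Z}_2$ chains are stored as sets of frozenset simplices and uses hypothetical apex labels `"w1"`, `"w2"`. It raises an error on any chain outside the image of $\mathcal{S}$.

```python
def desuspend(chain, w1="w1", w2="w2"):
    """Compute S^{-1}(chain) for a Z2 chain of suspended simplices.

    chain: set of frozenset simplices. Raises ValueError if the chain is not
    in the image of S, i.e. if some simplex is not suspended or appears
    without its partner.
    """
    base = set()
    for tau in chain:
        has1, has2 = w1 in tau, w2 in tau
        if has1 == has2:  # neither apex, or both apexes
            raise ValueError(f"{sorted(map(str, tau))} is not a suspended simplex")
        apex, other = (w1, w2) if has1 else (w2, w1)
        sigma = tau - {apex}
        if not sigma:
            raise ValueError("an apex vertex alone is not a suspended simplex")
        if sigma | {other} not in chain:
            raise ValueError(f"{sorted(map(str, tau))} appears without its partner")
        base.add(sigma)  # both partners map to the same sigma
    return base
```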

5.2 Hardness for finite intervals

The following proposition is the key step in the hardness proof:

Proposition 5.4.

PCYC-FIN$_{d-1}$ reduces to PCYC-FIN$_{d}$ for $d\geq 2$.

Proof.

Given an instance $(K,\mathcal{F},[\beta,\delta))$ of PCYC-FIN$_{d-1}$, where the $i$-th complex of $\mathcal{F}$ is denoted as $K_{i}$, we can assume without loss of generality that the top dimension of $K$ is $d$. Otherwise, we can restrict $\mathcal{F}$ to the $d$-skeleton of $K$ without affecting $\mathsf{D}_{d-1}(\mathcal{F})$ or the persistent $(d-1)$-cycles. We let $\mathcal{S}K$ be the simplicial complex of the instance of PCYC-FIN$_{d}$ to be constructed. Each suspended $d$-simplex $\sigma\cup\{\omega_{i}\}$ of $\mathcal{S}K$ gets half the weight of $\sigma$ in $K$. Each non-suspended $d$-simplex of $\mathcal{S}K$ gets the sum of the weights of all $(d-1)$-simplices in $K$, plus $1$. We endow $\mathcal{S}K$ with a filtration $\mathcal{S}\mathcal{F}:\varnothing=\widehat{K}_{0}\subseteq\widehat{K}_{1}\subseteq\cdots\subseteq\widehat{K}_{3n+2}=\mathcal{S}K$, where $n$ is the number of simplices of $K$. Denote the $i$-th simplex added in $\mathcal{F}$ as $\sigma_{i}$ and the $i$-th simplex added in $\mathcal{S}\mathcal{F}$ as $\widehat{\sigma}_{i}$. We let $\widehat{\sigma}_{1}=\{\omega_{1}\}$ and $\widehat{\sigma}_{2}=\{\omega_{2}\}$, and, for any $1\leq i\leq n$, $\widehat{\sigma}_{3i}=\sigma_{i}$, $\widehat{\sigma}_{3i+1}=\sigma_{i}\cup\{\omega_{1}\}$, and $\widehat{\sigma}_{3i+2}=\sigma_{i}\cup\{\omega_{2}\}$.

We observe the following facts:

(i) For any $i$, $\widehat{\sigma}_{3i}$ is positive and pairs with $\widehat{\sigma}_{3i+1}$ in $\mathcal{S}\mathcal{F}$.

(ii) For any $i$ and $j$, if there is a $(d-1)$-cycle created by $\sigma_{i}$ which is a boundary in $K_{j}$, then there is a $d$-cycle created by $\widehat{\sigma}_{3i+2}$ which is a boundary in $\widehat{K}_{3j+2}$.

(iii) For any $i$ and $j$, if there is a $d$-cycle created by $\widehat{\sigma}_{3i+2}$ which is a boundary in $\widehat{K}_{3j+2}$, then there is a $(d-1)$-cycle created by $\sigma_{i}$ which is a boundary in $K_{j}$.

Fact (i) is not hard to verify. To verify (ii), we can suspend the $(d-1)$-cycle and apply Proposition 5.1. The argument for (iii) is as follows. Consider a $d$-cycle $\widehat{\zeta}_{0}$ created by $\widehat{\sigma}_{3i+2}$ which is a boundary in $\widehat{K}_{3j+2}$. For each non-suspended $d$-simplex $\sigma$ of $\widehat{\zeta}_{0}$, we add $\partial\big(\sigma\cup\{\omega_{1}\}\big)$ to $\widehat{\zeta}_{0}$; this cancels $\sigma$ and adds only suspended simplices. The process only adds $d$-simplices of $\widehat{K}_{3i+2}$ and never cancels $\widehat{\sigma}_{3i+2}$. After all non-suspended simplices of $\widehat{\zeta}_{0}$ are canceled, we obtain a $d$-cycle $\widehat{\zeta}$ which is created by $\widehat{\sigma}_{3i+2}$ and contains only suspended simplices. By Proposition 5.2, $\mathcal{S}^{-1}\widehat{\zeta}$ is well defined. Since $\widehat{\zeta}$ is homologous to $\widehat{\zeta}_{0}$ in $\widehat{K}_{3i+2}$, $\widehat{\zeta}$ is also a boundary in $\widehat{K}_{3j+2}$; let $\widehat{\zeta}=\partial(\widehat{A})$ for a $(d+1)$-chain $\widehat{A}$ in $\widehat{K}_{3j+2}$. Because $\mathcal{S}K_{j}=\widehat{K}_{3j+2}$, Proposition 5.3 gives $\widehat{A}\in\mathcal{S}(\text{\sf C}_{d}(K_{j}))$. Furthermore, by Proposition 5.1, $\mathcal{S}^{-1}\widehat{\zeta}=\mathcal{S}^{-1}\partial(\widehat{A})=\partial(\mathcal{S}^{-1}\widehat{A})$. Since $\widehat{\zeta}$ contains $\sigma_{i}\cup\{\omega_{2}\}$ and resides in $\widehat{K}_{3i+2}=\mathcal{S}K_{i}$, the chain $\mathcal{S}^{-1}\widehat{\zeta}$ contains $\sigma_{i}$ and resides in $K_{i}$. So $\mathcal{S}^{-1}\widehat{\zeta}$ is a $(d-1)$-cycle created by $\sigma_{i}$ which is a boundary in $K_{j}$.

From the above facts, it is immediate that $\widehat{\sigma}_{3\beta+2}$ is a positive simplex in $\mathcal{S}\mathcal{F}$ that pairs with $\widehat{\sigma}_{3\delta+2}$, so that $[3\beta+2,3\delta+2)$ is an interval in $\mathsf{D}_{d}(\mathcal{S}\mathcal{F})$. There is also a bijection from the persistent $(d-1)$-cycles of $[\beta,\delta)$ to the persistent $d$-cycles of $[3\beta+2,3\delta+2)$ containing only suspended simplices, and this bijection preserves the weights of the cycles. By the weight assignment, every non-suspended $d$-simplex of $\mathcal{S}K$ is heavier than all suspended $d$-simplices combined. Since a persistent $d$-cycle of $[3\beta+2,3\delta+2)$ consisting only of suspended simplices exists, a minimal persistent $d$-cycle of $[3\beta+2,3\delta+2)$ must contain only suspended simplices. It therefore induces a minimal persistent $(d-1)$-cycle of $[\beta,\delta)$. This reduces PCYC-FIN$_{d-1}$ to PCYC-FIN$_{d}$. Furthermore, the reduction runs in polynomial time, and the size of $(\mathcal{S}K,\mathcal{S}\mathcal{F},[3\beta+2,3\delta+2))$ is a polynomial function of the size of $(K,\mathcal{F},[\beta,\delta))$. ∎
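Producing the instance built in the proof of Proposition 5.4 is a mechanical step. The Python sketch below assumes $\mathcal{F}$ is given as a list of simplices in insertion order (so that `filtration[i - 1]` is $\sigma_i$), and the weights of the $(d-1)$-simplices as a dictionary keyed by frozensets. The function name `suspended_instance` and the apex labels are hypothetical. The function returns $\mathcal{S}\mathcal{F}$ as a list, with `sf[j - 1]` being $\widehat{\sigma}_j$, together with the $d$-weights of $\mathcal{S}K$.

```python
def suspended_instance(filtration, weights, d, w1="w1", w2="w2"):
    """Build the filtration SF and the d-weights of SK (proof of Prop. 5.4).

    filtration: list of simplices (iterables of vertices) of K, where
                filtration[i - 1] is sigma_i.
    weights:    dict mapping each (d-1)-simplex (frozenset) of K to its weight.
    Returns (sf, d_weights), where sf[j - 1] is the simplex hat-sigma_j.
    """
    sf = [frozenset({w1}), frozenset({w2})]
    for s in filtration:
        sigma = frozenset(s)
        # hat-sigma_{3i} = sigma_i, hat-sigma_{3i+1} = sigma_i + w1,
        # hat-sigma_{3i+2} = sigma_i + w2
        sf += [sigma, sigma | {w1}, sigma | {w2}]
    # A non-suspended d-simplex outweighs all suspended d-simplices combined.
    heavy = sum(weights.values()) + 1
    d_weights = {}
    for tau in sf:
        if len(tau) != d + 1:
            continue  # only d-simplices carry weights
        if w1 in tau or w2 in tau:
            d_weights[tau] = weights[tau - {w1, w2}] / 2  # half the weight of sigma
        else:
            d_weights[tau] = heavy
    return sf, d_weights
```

With this indexing, an interval $[\beta,\delta)$ of $\mathsf{D}_{d-1}(\mathcal{F})$ is looked up as $[3\beta+2,3\delta+2)$ in $\mathsf{D}_{d}(\mathcal{S}\mathcal{F})$.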

We have the following result from [13]:

Proposition 5.5.

PCYC-FIN$_{1}$ is NP-hard.

Combining Propositions 5.4 and 5.5 by induction on $d$, we obtain the following theorem:

Theorem 5.1.

PCYC-FIN$_{d}$ is NP-hard for $d\geq 1$.

5.3 Hardness for infinite intervals

In this subsection, we prove that it is NP-hard to approximate WPCYC-INF$_{d}$ within any fixed ratio. Let PROB be a minimization problem whose solutions have positive costs. Given an instance $\mathcal{I}$ of PROB, let $C^{*}$ be the cost of a minimal solution of $\mathcal{I}$. For $r\geq 1$, a solution of $\mathcal{I}$ with cost $C$ is said to have approximation ratio $r$ if $C/C^{*}\leq r$ [10]. We let PROB$[r]$ denote the problem that, given an instance of PROB, asks for an approximate solution with ratio $r$. Moreover, to make approximation ratios well defined for WPCYC-INF$_{d}$, we let WPCYC-INF$^{+}_{d}$ denote the subproblem of WPCYC-INF$_{d}$ in which all $d$-simplices are positively weighted.

Before proving the hardness result, we first recall the definition of the nearest codeword problem, which is NP-hard to approximate with any fixed ratio [8]:

Problem 5.1 (NR-CODE).

Given an $l\times k$ full-rank matrix $\mathcal{A}$ over $\mathbb{Z}_{2}$ with $k<l$ and a vector $y_{0}\in(\mathbb{Z}_{2})^{l}\smallsetminus\mathrm{Img}(\mathcal{A})$, find a vector in $y_{0}+\mathrm{Img}(\mathcal{A})$ with minimal Hamming weight.

Remark 5.2.

The Hamming weight of a vector $y$ , denoted as $\|y\|_{H}$ , is the number of non-zero components in $y$ .
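To make Problem 5.1 concrete, tiny instances can be solved by enumerating all $2^k$ vectors $x$ and evaluating $y_0+\mathcal{A}x$ over $\mathbb{Z}_2$. The Python sketch below does exactly this. It is exponential in $k$ and meant only as an illustration. Matrices are encoded as lists of 0/1 rows, and the function name `nearest_codeword` is hypothetical.

```python
from itertools import product


def nearest_codeword(A, y0):
    """Brute-force NR-CODE: minimal Hamming weight over y0 + Img(A), mod 2.

    A is an l x k 0/1 matrix given as a list of rows; y0 is a length-l 0/1
    list. Returns (best_vector, hamming_weight). Runs in O(2^k * l * k) time.
    """
    l, k = len(A), len(A[0])
    best = None
    for x in product((0, 1), repeat=k):
        # y = y0 + A x over Z2
        y = [(y0[i] + sum(A[i][j] * x[j] for j in range(k))) % 2 for i in range(l)]
        weight = sum(y)  # Hamming weight ||y||_H
        if best is None or weight < best[1]:
            best = (y, weight)
    return best
```

For the length-3 repetition code $\mathcal{A}=(1,1,1)^{T}$ and $y_0=(1,1,0)^{T}$, the coset $y_0+\mathrm{Img}(\mathcal{A})$ is $\{(1,1,0)^{T},(0,0,1)^{T}\}$, so the solution is $(0,0,1)^{T}$ with Hamming weight 1.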

Theorem 5.2.

WPCYC-INF$^{+}_{2}$ is NP-hard to approximate within any fixed ratio.

As in the NP-hardness proof of homology localization in [8], our proof of Theorem 5.2 reduces from the NR-CODE problem. A direct reduction from homology localization may seem more straightforward, but no such reduction is evident, because the two problems appear to differ in nature. The homology localization problem asks for a minimal cycle in a given homology class, whereas WPCYC-INF$^{+}_{2}$ asks for a minimal cycle containing a given simplex, without referring to any particular homology class.

Proof.

For any $r>1$, we reduce the NP-hard problem NR-CODE$[2r]$ to WPCYC-INF$^{+}_{2}[r]$. Given an instance $(\mathcal{A},y_{0})$ of NR-CODE$[2r]$, we first compute the $(l-k)\times l$ parity check matrix $\mathcal{A}^{\perp}$ [8], i.e., a matrix such that $\mathrm{Ker}(\mathcal{A}^{\perp})=\mathrm{Img}(\mathcal{A})$. Following the proof of Lemma 4.3.1 in [8], we then build a “tube complex” $T_{1}$ with $(l-k)$ 1-cells, each of which is a 1-sphere, and $l$ 2-cells, each of which is a 2-sphere with holes. The 2-cells of $T_{1}$ are attached to the 1-cells along the holes so that the boundary matrix $\partial_{2}$ of this tube complex equals $\mathcal{A}^{\perp}$. The “$q$-chains” and “$q$-cycles” of a tube complex are defined analogously to those of a simplicial complex. We assign a weight of 1 to each 2-cell of $T_{1}$. By this construction, there is a straightforward bijection $\phi:(\mathbb{Z}_{2})^{l}\to\text{\sf C}_{2}(T_{1})$ such that the Hamming weight of a vector equals the weight of the corresponding 2-chain. Note that $\text{\sf Z}_{2}(T_{1})=\mathrm{Ker}(\partial_{2})=\phi(\mathrm{Ker}(\mathcal{A}^{\perp}))=\phi(\mathrm{Img}(\mathcal{A}))$. Letting $\widetilde{y}_{0}=\phi(y_{0})$, we add to $T_{1}$ a 2-cell $\widehat{t}$ whose boundary equals $\partial_{2}(\widetilde{y}_{0})$ and obtain a new tube complex $T_{2}$. We call the 2-cycles of $T_{2}$ that are not in $T_{1}$ the new 2-cycles of $T_{2}$. Then $\widehat{t}+\widetilde{y}_{0}$ is a new 2-cycle of $T_{2}$, and the set of new 2-cycles of $T_{2}$ is $\widehat{t}+\widetilde{y}_{0}+\text{\sf Z}_{2}(T_{1})$. We also let the weight of $\widehat{t}$ be 1. There is a bijection $\psi:y_{0}+\mathrm{Img}(\mathcal{A})\to\widehat{t}+\widetilde{y}_{0}+\text{\sf Z}_{2}(T_{1})$, where $\psi(y_{0}+z)=\widehat{t}+\widetilde{y}_{0}+\phi(z)$ for any $z\in\mathrm{Img}(\mathcal{A})$, such that $w(\psi(y_{0}+z))=\|y_{0}+z\|_{H}+w(\widehat{t})$.

We then construct an instance of WPCYC-INF$^{+}_{2}[r]$ by triangulating $T_{2}$ into a simplicial complex $K$. We make $K$ 2-weighted so that the weights of all triangles in any 2-cell of $T_{2}$ sum to the weight of that 2-cell. It is not hard to make the size of $K$ a polynomial function of the number of cells of $T_{2}$. Let $\sigma$ be a 2-simplex in the triangulation of the 2-cell $\widehat{t}$. We build a filtration $\mathcal{F}$ of $K$ with $\sigma$ as the last simplex added, and let $\beta$ be the index of $\sigma$ in $\mathcal{F}$. Then $[\beta,+\infty)$ is an infinite interval of $\mathsf{D}_{2}(\mathcal{F})$. There is a bijection between the new 2-cycles of $T_{2}$ and the persistent 2-cycles of $[\beta,+\infty)$ that preserves the weights of the cycles. Therefore, from a solution of WPCYC-INF$^{+}_{2}[r]$ with input $(K,\mathcal{F},[\beta,+\infty))$, we can derive a new 2-cycle $\widehat{t}+\widetilde{y}_{0}+\zeta$ of $T_{2}$, where $\zeta\in\text{\sf Z}_{2}(T_{1})$ and $\widehat{t}+\widetilde{y}_{0}+\zeta$ is an $r$-approximation of the minimal new 2-cycle. Letting $\widehat{t}+\widetilde{y}_{0}+\zeta^{*}$ be a minimal new 2-cycle of $T_{2}$, we have

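$$\frac{1+\|y_{0}+\phi^{-1}(\zeta)\|_{H}}{1+\|y_{0}+\phi^{-1}(\zeta^{*})\|_{H}}=\frac{w(\widehat{t}+\widetilde{y}_{0}+\zeta)}{w(\widehat{t}+\widetilde{y}_{0}+\zeta^{*})}\leq r.$$

We also have

$$\|y_{0}+\phi^{-1}(\zeta^{*})\|_{H}\geq 1,$$

because $\phi^{-1}(\zeta^{*})\in\mathrm{Img}(\mathcal{A})$ and $y_{0}\notin\mathrm{Img}(\mathcal{A})$, so $y_{0}+\phi^{-1}(\zeta^{*})\neq 0$. Therefore

$$\frac{\|y_{0}+\phi^{-1}(\zeta)\|_{H}}{\|y_{0}+\phi^{-1}(\zeta^{*})\|_{H}}<\frac{1+\|y_{0}+\phi^{-1}(\zeta)\|_{H}}{\|y_{0}+\phi^{-1}(\zeta^{*})\|_{H}}\leq\frac{r\big(1+\|y_{0}+\phi^{-1}(\zeta^{*})\|_{H}\big)}{\|y_{0}+\phi^{-1}(\zeta^{*})\|_{H}}\leq 2r.$$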

Since $y_{0}+\phi^{-1}(\zeta^{*})$ is a minimal solution of $(\mathcal{A},y_{0})$, it follows that $y_{0}+\phi^{-1}(\zeta)$ is a $2r$-approximation of the minimal solution of $(\mathcal{A},y_{0})$. Hence, we have reduced NR-CODE$[2r]$ to WPCYC-INF$^{+}_{2}[r]$. Furthermore, the reduction runs in polynomial time and the sizes of the instances are polynomially related, so WPCYC-INF$^{+}_{2}[r]$ is NP-hard. ∎
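The parity check matrix $\mathcal{A}^{\perp}$ used in the proof of Theorem 5.2 can be computed by Gaussian elimination over $\mathbb{Z}_2$. Its rows are taken as a basis of the left null space of $\mathcal{A}$, i.e., of $\mathrm{Ker}(\mathcal{A}^{T})$. Then $\mathcal{A}^{\perp}\mathcal{A}=0$, so $\mathrm{Img}(\mathcal{A})\subseteq\mathrm{Ker}(\mathcal{A}^{\perp})$. Both spaces have dimension $k$, so they are equal. The pure-Python sketch below is one standard way to do this and is not taken from [8]. The function names are hypothetical.

```python
def gf2_nullspace(M):
    """Basis of {v : M v = 0 over Z2} for a 0/1 matrix M given as a list of rows."""
    rows = [list(row) for row in M]
    n_cols = len(rows[0])
    pivots = []  # pivots[r] is the pivot column of reduced row r
    r = 0
    for c in range(n_cols):
        if r == len(rows):
            break  # every row already has a pivot
        pivot = next((i for i in range(r, len(rows)) if rows[i][c]), None)
        if pivot is None:
            continue  # no pivot in this column: it is a free column
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Clear column c in every other row (reduced row echelon form over Z2).
        for i in range(len(rows)):
            if i != r and rows[i][c]:
                rows[i] = [a ^ b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    free = [c for c in range(n_cols) if c not in pivots]
    basis = []
    for f in free:
        # Set free variable f to 1 and the other free variables to 0; each
        # pivot variable then equals its row's entry in column f (mod 2).
        v = [0] * n_cols
        v[f] = 1
        for row_idx, p in enumerate(pivots):
            v[p] = rows[row_idx][f]
        basis.append(v)
    return basis


def parity_check(A):
    """Rows spanning the left null space of A, so that Ker(result) = Img(A).

    A is an l x k full-rank 0/1 matrix (list of rows); the result is (l-k) x l.
    """
    A_transpose = [list(col) for col in zip(*A)]  # k x l
    return gf2_nullspace(A_transpose)
```

For the repetition code $\mathcal{A}=(1,1,1)^{T}$, this returns the rows $(1,1,0)$ and $(1,0,1)$, whose common kernel is $\{(0,0,0)^{T},(1,1,1)^{T}\}=\mathrm{Img}(\mathcal{A})$.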

Theorem 5.3.

WPCYC-INF$^{+}_{d}$ is NP-hard to approximate within any fixed ratio for $d\geq 2$.

Proof.

For any $d\geq 3$ and $r\geq 1$, we reduce WPCYC-INF$^{+}_{d-1}[r]$ to WPCYC-INF$^{+}_{d}[r]$. Together with Theorem 5.2, this proves the claim by induction on $d$. Given an instance $(K,\mathcal{F},[\beta,+\infty))$ of WPCYC-INF$^{+}_{d-1}[r]$, where the $i$-th complex of $\mathcal{F}$ is denoted as $K_{i}$, let $K^{\prime}=\mathcal{S}K_{\beta}^{d-1}$, where $K_{\beta}^{d-1}$ is the $(d-1)$-skeleton of $K_{\beta}$. We make $K^{\prime}$ $d$-weighted such that any $d$-simplex $\sigma\cup\{\omega_{i}\}$ of $K^{\prime}$ has half the weight of $\sigma$ in $K$. The complex $K^{\prime}$ is endowed with a filtration $\mathcal{F}^{\prime}$ such that $\sigma_{\beta}^{\mathcal{F}}\cup\{\omega_{2}\}$ is the last simplex added to $\mathcal{F}^{\prime}$. Let $\beta^{\prime}$ be the index of $\sigma_{\beta}^{\mathcal{F}}\cup\{\omega_{2}\}$ in $\mathcal{F}^{\prime}$; then $[\beta^{\prime},+\infty)\in\mathsf{D}_{d}(\mathcal{F}^{\prime})$. By Propositions 5.1 and 5.2, $\mathcal{S}$ restricts to a bijection from $\text{\sf Z}_{d-1}(K_{\beta})$ to $\text{\sf Z}_{d}(K^{\prime})$, and this bijection preserves the weights of the cycles. Furthermore, for any $\zeta\in\text{\sf Z}_{d-1}(K_{\beta})$, $\zeta$ is a persistent $(d-1)$-cycle of $[\beta,+\infty)\in\mathsf{D}_{d-1}(\mathcal{F})$ if and only if $\mathcal{S}\zeta$ is a persistent $d$-cycle of $[\beta^{\prime},+\infty)\in\mathsf{D}_{d}(\mathcal{F}^{\prime})$. Suppose that $\zeta^{\prime}$ is a solution for the instance $(K^{\prime},\mathcal{F}^{\prime},[\beta^{\prime},+\infty))$ of WPCYC-INF$^{+}_{d}[r]$, i.e., $\zeta^{\prime}$ is an $r$-approximation of the minimal solution. Then $\mathcal{S}^{-1}\zeta^{\prime}$ is an $r$-approximation for the instance $(K,\mathcal{F},[\beta,+\infty))$ of WPCYC-INF$^{+}_{d-1}[r]$, which completes the reduction. ∎

6 Experimental results

We experiment with our algorithms for WPCYC-FIN$_{2}$ and WEPCYC-INF$_{2}$ on several volume datasets. Since volume data have a natural cubical complex structure, we slightly adapt our implementation to work on cubical complexes. The cubical complex for volume data consists of cells in dimensions 0 to 3, and its underlying space is homeomorphic to a 3-dimensional ball. Note that a filtration built from a volume dataset does not produce any infinite intervals in dimension 2. Hence, to test our algorithm for WEPCYC-INF$_{2}$, we take a finite interval and compute a minimal 2-cycle born at its birth time, which is exactly what WEPCYC-INF$_{2}$ computes. We use the Gudhi [29] library to build the filtrations and compute the persistence intervals. The experiments show that the minimal persistent 2-cycles computed by our algorithms capture various features of data originating from different fields. The combustion, hurricane, and medical datasets are time-varying; for these, we chose a single time frame to compute the persistence intervals and cycles.

Cosmology.

The computational cosmology simulation data [2] shown in Figure 4a consist of dark matter represented as particles. The thread-like structures in deep purple in Figure 4a correspond to sites of large-scale structure formation, and galaxy clusters/superclusters are contained in such structures. Figure 4b shows the minimal persistent 2-cycles, computed by our algorithms, of the five longest intervals. These cycles precisely represent the five largest galaxy clusters/superclusters by volume.

Combustion.

The data shown in Figure 4c correspond to the physical variable $\chi$ from a model of a turbulent combustion process (a physical variable assigns a scalar value of a certain kind to each point). The variable $\chi$ represents the scalar dissipation rate and provides a measure of the maximum possible chemical reaction rate. The minimal persistent 2-cycles shown in Figure 4d represent areas with high values of $\chi$.

Hurricane.

This dataset, with $11$ physical variables, corresponds to the devastating hurricane named Isabel. The Hurricane Isabel data were produced by the Weather Research and Forecast (WRF) model, courtesy of NCAR and the U.S. National Science Foundation (NSF). We down-sampled the data to a resolution of $250\times 250\times 50$ and worked with two physical variables. The minimal persistent 2-cycle colored blue in Figure 5a, computed on the cloud-volume variable, extracts the eye of the hurricane. The minimal persistent 2-cycle colored green in Figure 5b, computed on the pressure variable, captures the jagged shape of the pressure variation around the hurricane.

Medical imaging.

This dataset from the ADNI [26] project contains an MRI scan of a healthy human skull. The minimal persistent 2-cycles of the larger intervals, shown in Figure 5c, are computed from two time frames. They extract significant features such as the eyes, cartilage, nerves, and muscles.

Material science.

We consider the atomic configuration of $\mathrm{BaTiO}_{3}$, a ferroelectric material used for making capacitors, transducers, and microphones. Figure 6a shows the atomic configuration of the molecule. The red, grey, and green balls denote the Oxygen, Titanium, and Barium atoms, respectively, and the radii of the balls equal the radii of the corresponding atoms. Volume data are built by uniformly sampling a $3\times 3\times 3$ lattice structure similar to the one shown in Figure 6a, with a step width of one angstrom (Figure 6a only shows a $2\times 2\times 2$ lattice structure). The scalar value at a point of the volume is determined as follows. For each atom, let $d$ be the distance from the point to the atom's center; the atom then contributes $\max\{w(r-d)/r,0\}$, where $r$ is the radius of the atom and $w$ is its atomic weight. The scalar value at the point is the sum of these contributions over all atoms. In this experiment, we computed minimal persistent 2-cycles on both the original scalar function and its negation. Figure 6b shows a portion of the minimal persistent 2-cycles computed on the original function. In it, the purple, red, and green cycles correspond to Barium, Titanium, and Oxygen atoms, respectively. In our experiment, every atom corresponds to such a minimal persistent 2-cycle of a long interval. Figure 6c shows a portion of the minimal persistent 2-cycles computed on the negated function, where the cycles complement the Barium atoms. Figure 6d shows the output on the negated function for a tetragonal lattice structure [27], in which the atomic bonds are not straight (see the inset of Figure 6d). The stretching of the lattice structure leads to minimal persistent 2-cycles with non-trivial genus.
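The scalar field of the material-science experiment is a sum of truncated linear bumps, one per atom. Each bump peaks at $w$ at the atom's center and vanishes at distance $r$. The Python sketch below evaluates that formula. The atom format `(center, radius, weight)` and the function names are illustrative assumptions, and no actual radii or atomic weights of $\mathrm{BaTiO}_{3}$ are supplied here.

```python
import math


def atomic_scalar(point, atoms):
    """Scalar value at `point`: sum over atoms of max{w (r - dist) / r, 0}.

    atoms: iterable of (center, radius, weight), where center is a 3-tuple,
    radius r is the atomic radius, and weight w is the atomic weight.
    """
    total = 0.0
    for center, r, w in atoms:
        dist = math.dist(point, center)  # Euclidean distance to the atom's center
        total += max(w * (r - dist) / r, 0.0)
    return total


def sample_grid(atoms, nx, ny, nz, step=1.0):
    """Sample the field on a uniform nx x ny x nz grid with the given step width."""
    return [[[atomic_scalar((i * step, j * step, k * step), atoms)
              for k in range(nz)] for j in range(ny)] for i in range(nx)]
```

The negated function used in the experiment is simply the pointwise negative of these values.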

7 Conclusions

In this paper, we study the computational complexity of several problems concerning minimal persistent cycles. We extend the hardness results of [13] and identify cases that are NP-hard as well as cases that are solvable in polynomial time. For general complexes, the computation is NP-hard in all dimensions for finite intervals and in all dimensions greater than one for infinite intervals. On the other hand, the problems are tractable in dimension $d$ if the given complex is a weak $(d+1)$-pseudomanifold and, for infinite intervals, if the weak $(d+1)$-pseudomanifold is embedded in $\mathbb{R}^{d+1}$.

This research leads to some open questions concerning persistent cycles:

i. In our experiments, some persistent cycles correspond to important features of the data (see Section 6). However, we also encountered intervals whose persistent cycles do not have obvious meanings. If filtrations could be designed so that persistent cycles relate to the important features of the data, the prospects for applying persistent cycles, and persistence in general, would be broader.

ii. As found in [13], persistent cycles are not stable in general, even when only the weights of the cycles are considered. It would be helpful to identify assumptions that remain relevant in practice but under which persistent cycles are stable.

iii. We have presented $O(n^{2})$ -time algorithms for computing a minimal persistent cycle for a given interval. A natural question is whether this time complexity can be improved. Furthermore, can we devise a better algorithm to compute minimal persistent cycles for all intervals (i.e., the minimal persistent basis [13]), improving upon the obvious $O(n^{3})$ -time algorithm that runs our algorithms on each interval?

Acknowledgments:

This research was conducted with the support of the NSF grants CCF-1740761 and CCF-1839252. We thank the anonymous reviewers for insightful comments.

Appendix A Proof of Theorem 4.1

We first define some symbols used in this section. The interior of a set $U$ is denoted by $\mathrm{Int}(U)$ . The boundary of a topological ball $\mathbb{B}$ is denoted by $\mathrm{bd}(\mathbb{B})$ . The set of $q$ -cofaces of a simplex $\sigma$ in a $\Delta$ -complex [21] $K$ is denoted by $\mathrm{cof}\,^{K}_{q}(\sigma)$ .

The proof of Theorem 4.1 is based on the extended Jordan–Brouwer separation theorem (Theorem A.1) by Alexander [1]. The statement of the theorem depends on the following definition:

Definition A.1 (Pseudomanifold).

A simplicial complex $K$ is a $q$ -pseudomanifold if $K$ is a pure $q$ -complex and each $(q-1)$ -simplex is a face of exactly two $q$ -simplices in $K$ .

Remark A.1.

Note that other definitions of $q$-pseudomanifolds, such as the one in [28], typically also require the complex to be $q$-connected.
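Definition A.1 and the extra $q$-connectivity condition mentioned in Remark A.1 are purely combinatorial. The Python sketch below checks both for a complex given by its maximal simplices, an input convention assumed here. The function names are hypothetical. $q$-connectivity is tested with union-find over the $q$-simplices, merging two of them whenever they share a $(q-1)$-face.

```python
from collections import defaultdict
from itertools import combinations


def is_pseudomanifold(facets, q):
    """Check Definition A.1 for a complex given by its maximal simplices."""
    facets = [frozenset(f) for f in facets]
    if any(len(f) != q + 1 for f in facets):
        return False  # not a pure q-complex
    cofaces = defaultdict(int)
    for f in facets:
        for face in combinations(f, q):
            cofaces[frozenset(face)] += 1
    # Each (q-1)-simplex must be a face of exactly two q-simplices.
    return all(count == 2 for count in cofaces.values())


def is_q_connected(facets, q):
    """True if any two q-simplices are linked by a chain sharing (q-1)-faces."""
    facets = [frozenset(f) for f in facets]
    parent = list(range(len(facets)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_owner = {}
    for idx, f in enumerate(facets):
        for face in combinations(f, q):
            key = frozenset(face)
            if key in first_owner:
                parent[find(idx)] = find(first_owner[key])
            else:
                first_owner[key] = idx
    return len({find(i) for i in range(len(facets))}) <= 1
```

For example, the union of two tetrahedron boundaries that share no simplex is a 2-pseudomanifold in the sense of Definition A.1, but it is not 2-connected.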

Theorem A.1.

Let $q>1$ and let $\mathcal{M}$ be a finite $(q-1)$-connected $(q-1)$-pseudomanifold embedded in $\mathbb{R}^{q}$. Then $\mathbb{R}^{q}\smallsetminus|\mathcal{M}|$ has exactly 2 connected components.

Now we can finish our proof:

Proof of Theorem 4.1.

The general idea of the proof is as follows. Using a trick that we call “de-contracting”, we first create a $\Delta$-complex $\widetilde{K}^{\prime}$ in which each oriented simplex of $\vec{\zeta}_{j}$ corresponds to a unique unoriented simplex. Then, using a trick that we call “de-pinching”, we show that $\vec{\zeta}_{j}$ is the boundary of a region $\mathcal{A}$. Finally, we use this fact to reach the conclusion by contradiction. Figure 7b gives an example of “de-contracting” and “de-pinching”.

First, let $\Sigma^{\prime}$ be the set of $d$-simplices of $\widetilde{K}$ both of whose oriented simplices are in $\vec{\zeta}_{j}$. For a $d$-simplex $\sigma^{d}$ of $\Sigma^{\prime}$, let $\mathbb{B}^{\prime}$ be a topological $(d+1)$-ball residing in $\mathbb{R}^{d+1}$ whose boundary $\mathrm{bd}(\mathbb{B}^{\prime})$ consists of two $d$-simplices glued along their boundaries. We then homeomorphically map the points of $\mathbb{R}^{d+1}\smallsetminus\sigma^{d}$ to $\mathbb{R}^{d+1}\smallsetminus\mathbb{B}^{\prime}$. By taking care of the mapping near the boundary of $\mathbb{B}^{\prime}$, we obtain a new ambient $\mathbb{R}^{d+1}$ and a new $\Delta$-complex in which all simplices of $\widetilde{K}$ are untouched except that $\sigma^{d}$ now corresponds to the two $d$-simplices bounding $\mathbb{B}^{\prime}$. We can think of this process as “de-contracting” the topological $d$-ball $\sigma^{d}$ into the topological $(d+1)$-ball $\mathbb{B}^{\prime}$, so that $\sigma^{d}$ turns into two separate $d$-simplices with identical $(d-1)$-faces (see Figure 7a for an example). After “de-contracting” all $d$-simplices in $\Sigma^{\prime}$, we obtain a $\Delta$-complex $\widetilde{K}^{\prime}$. Each oriented boundary $d$-simplex in $\widetilde{K}$ can be naturally identified with an oriented boundary $d$-simplex in $\widetilde{K}^{\prime}$. Under this identification, the groups of oriented boundary $d$-simplices in $\widetilde{K}$ are still groups of oriented boundary $d$-simplices in $\widetilde{K}^{\prime}$. So we can let $\vec{\zeta}_{j}$ denote the same group of oriented $d$-simplices in $\widetilde{K}^{\prime}$. The construction guarantees that if $\vec{\zeta}_{j}$ is the boundary of a void of $\mathbb{R}^{d+1}\smallsetminus|\widetilde{K}^{\prime}|$, then $\vec{\zeta}_{j}$ is also the boundary of a void of $\mathbb{R}^{d+1}\smallsetminus|\widetilde{K}|$.
So we only need to show that $\vec{\zeta}_{j}$ is the boundary of a void of $\mathbb{R}^{d+1}\smallsetminus|\widetilde{K}^{\prime}|$ (see Figure 7b for an example). From now on, we treat $\vec{\zeta}_{j}$ both as a set of oriented $d$-simplices and as a $d$-cycle (with $\mathbb{Z}$ coefficients) in $\widetilde{K}^{\prime}$.

Since different oriented simplices of $\vec{\zeta}_{j}$ correspond to different unoriented simplices in $\widetilde{K}^{\prime}$, we can define a bijection $\psi:\vec{\zeta}_{j}\to\zeta$ that maps each oriented simplex of $\vec{\zeta}_{j}$ to its unoriented simplex, where $\zeta$ is the image of this map. We then let $\mathcal{M}$ be the closure of the simplicial set $\zeta$. Note that $\zeta$ is a $d$-cycle (with $\mathbb{Z}_{2}$ coefficients) of $\widetilde{K}^{\prime}$ and $\mathcal{M}$ is a subcomplex of $\widetilde{K}^{\prime}$. Therefore, each $(d-1)$-simplex is a face of an even number of $d$-simplices in $\mathcal{M}$. We first pick a $(d-1)$-simplex $\sigma^{d-1}$ of $\mathcal{M}$ such that $\big|\mathrm{cof}^{\mathcal{M}}_{d}(\sigma^{d-1})\big|>2$. We then pick two $d$-simplices $\sigma^{d}_{0}$ and $\sigma^{d}_{1}$ from $\mathrm{cof}^{\mathcal{M}}_{d}(\sigma^{d-1})$ such that $\psi^{-1}(\sigma^{d}_{0})$ and $\psi^{-1}(\sigma^{d}_{1})$ are paired in the void boundary reconstruction for $\widetilde{K}^{\prime}$. Then $\sigma^{d}_{0}\cup\sigma^{d}_{1}$ forms a topological $d$-ball $\mathbb{B}^{d}_{1}$ containing $\sigma^{d-1}$. Forming such topological $d$-balls for all the pairs of $d$-simplices in $\mathrm{cof}^{\mathcal{M}}_{d}(\sigma^{d-1})$, we get a set of $d$-balls $\{\mathbb{B}^{d}_{1},\ldots,\mathbb{B}^{d}_{\kappa}\}$ with $\kappa=\big|\mathrm{cof}^{\mathcal{M}}_{d}(\sigma^{d-1})\big|/2$. For each $i$, we slightly move $\mathbb{B}^{d}_{i}\smallsetminus\mathrm{Int}(\sigma^{d-1})$ while keeping $\mathrm{bd}(\mathbb{B}^{d}_{i})$ untouched. We then take the closure of each $\mathbb{B}^{d}_{i}\smallsetminus\mathrm{Int}(\sigma^{d-1})$ to get a new $\Delta$-complex $\mathcal{M}_{1}$ in which the $\mathbb{B}^{d}_{i}$'s have pairwise disjoint interiors.
Note that in $\mathcal{M}_{1}$, $\sigma^{d-1}$ now corresponds to $\kappa$ different $(d-1)$-simplices sharing the same boundary. We repeat this “de-pinching” process for each $(d-1)$-simplex having more than two $d$-cofaces in $\mathcal{M}$ and obtain a sequence of $\Delta$-complexes $(\mathcal{M}_{0},\mathcal{M}_{1},\ldots,\mathcal{M}_{h})$. Here $\mathcal{M}_{0}=\mathcal{M}$, and $\mathcal{M}_{i}$ is derived from $\mathcal{M}_{i-1}$ by “de-pinching” a $(d-1)$-simplex. Then $\mathcal{M}_{h}$ is a pure $d$-dimensional, $d$-connected $\Delta$-complex in which each $(d-1)$-simplex is a face of exactly two $d$-simplices. Since we can subdivide $\mathcal{M}_{h}$ to make it a simplicial complex, Theorem A.1 implies that $|\mathcal{M}_{h}|$ separates $\mathbb{R}^{d+1}$ into two connected components. Note that for each $i$, we can treat $\mathbb{R}^{d+1}\smallsetminus|\mathcal{M}_{i}|$ as a subset of $\mathbb{R}^{d+1}\smallsetminus|\mathcal{M}_{i+1}|$. This is because, to deform $\mathcal{M}_{i+1}$ back to $\mathcal{M}_{i}$, we only need to contract some points in $\mathbb{R}^{d+1}\smallsetminus|\mathcal{M}_{i+1}|$ to points in $|\mathcal{M}_{i+1}|$. Hence each connected component of $\mathbb{R}^{d+1}\smallsetminus|\mathcal{M}|$ remains connected in $\mathbb{R}^{d+1}\smallsetminus|\mathcal{M}_{h}|$. Since all oriented $d$-simplices of $\vec{\zeta}_{j}$ bound the same void of $\mathbb{R}^{d+1}\smallsetminus|\widetilde{K}^{\prime}|$, we let $\mathcal{V}$ denote this void. The void $\mathcal{V}$ is still connected in $\mathbb{R}^{d+1}\smallsetminus|\mathcal{M}|$ because $\mathbb{R}^{d+1}\smallsetminus|\widetilde{K}^{\prime}|\subseteq\mathbb{R}^{d+1}\smallsetminus|\mathcal{M}|$. Therefore, $\mathcal{V}$ is still connected in $\mathbb{R}^{d+1}\smallsetminus|\mathcal{M}_{h}|$. Let $\mathcal{A}$ be the connected component of $\mathbb{R}^{d+1}\smallsetminus|\mathcal{M}_{h}|$ containing $\mathcal{V}$, and let $\mathcal{B}$ be the other connected component.
The $d$-simplices of $\mathcal{M}$ and $\mathcal{M}_{h}$ can be identified because, going from each $\mathcal{M}_{i}$ to $\mathcal{M}_{i+1}$, the interior of each $d$-simplex is never touched. Therefore, $\zeta$ is still a $d$-cycle (with $\mathbb{Z}_{2}$ coefficients) in $\mathcal{M}_{h}$. The two $d$-cycles (with $\mathbb{Z}$ coefficients) in $\mathcal{M}_{h}$ derived from the two consistent orientations of the simplices of $\zeta$ bound $\mathcal{A}$ and $\mathcal{B}$. As one of these two $d$-cycles, $\vec{\zeta}_{j}$ must be the boundary of $\mathcal{A}$ or of $\mathcal{B}$ in $\mathcal{M}_{h}$. It bounds $\mathcal{A}$ because $\mathcal{B}$ contains no points of $\mathcal{V}$. By our construction, to deform each $\mathcal{M}_{i}$ back into $\mathcal{M}_{i-1}$, we only need to contract points in $\mathcal{B}$. This implies that $\mathcal{A}$ is still a void of $\mathbb{R}^{d+1}\smallsetminus|\mathcal{M}|$ with boundary $\vec{\zeta}_{j}$ (see Figure 7b for an example).

To prove that $\vec{\zeta}_{j}$ is the boundary of a void of $\mathbb{R}^{d+1}\smallsetminus|\widetilde{K}^{\prime}|$, we only need to show that no oriented $d$-simplex is in the boundary of $\mathcal{V}$ without belonging to $\vec{\zeta}_{j}$. For contradiction, suppose that there is such an oriented $d$-simplex $\vec{\sigma}^{d}$. Then $\vec{\sigma}^{d}$ cannot be oppositely oriented to any oriented simplex of $\vec{\zeta}_{j}$, because otherwise $\vec{\sigma}^{d}$ would bound another connected component of $\mathbb{R}^{d+1}\smallsetminus|\mathcal{M}|$ and thus another connected component of $\mathbb{R}^{d+1}\smallsetminus|\widetilde{K}^{\prime}|$. Let $\sigma^{d}$ be the unoriented $d$-simplex underlying $\vec{\sigma}^{d}$. Then $\sigma^{d}\not\in\mathcal{M}$, because otherwise $\vec{\sigma}^{d}$ would be oppositely oriented to an oriented simplex of $\vec{\zeta}_{j}$. Since $\sigma^{d}\not\in\mathcal{M}$, the interior of $\sigma^{d}$ must lie in $\mathbb{R}^{d+1}\smallsetminus|\mathcal{M}|$. From now on, we always treat $\mathcal{A}$ as a void of $\mathbb{R}^{d+1}\smallsetminus|\mathcal{M}|$. Among all voids of $\mathbb{R}^{d+1}\smallsetminus|\mathcal{M}|$, the interior of $\sigma^{d}$ lies in $\mathcal{A}$, because $\mathcal{A}$ is the void of $\mathbb{R}^{d+1}\smallsetminus|\mathcal{M}|$ containing $\mathcal{V}$: if $\sigma^{d}$ lay in a void other than $\mathcal{A}$, the points on either side of $\sigma^{d}$ could not be from $\mathcal{V}$, contradicting that $\vec{\sigma}^{d}$ is in the boundary of $\mathcal{V}$. Since $\widetilde{K}^{\prime}$ is $d$-connected, there must be a sequence of $d$-simplices $(\sigma^{d}_{0},\ldots,\sigma^{d}_{l})$ of $\widetilde{K}^{\prime}$ such that $\sigma^{d}_{0}=\sigma^{d}$, $\sigma^{d}_{l}\in\mathcal{M}$, and $\sigma^{d}_{i}$ and $\sigma^{d}_{i+1}$ share a $(d-1)$-face for each $i$ with $0\leq i<l$.
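The face-adjacent sequence $(\sigma^{d}_{0},\ldots,\sigma^{d}_{l})$ is exactly what $d$-connectivity guarantees, and such a sequence can be found by breadth-first search on the graph whose nodes are $d$-simplices and whose edges join simplices sharing a $(d-1)$-face. A minimal sketch, assuming simplices are given as vertex tuples; `adjacent_path` is a hypothetical helper, not part of the paper:

```python
from collections import defaultdict, deque
from itertools import combinations

def facets_of(simplex):
    """All (d-1)-faces of a d-simplex, as frozensets of vertices."""
    return [frozenset(f) for f in combinations(sorted(simplex), len(simplex) - 1)]

def adjacent_path(simplices, source, targets):
    """Shortest sequence (sigma_0, ..., sigma_l) of d-simplices with
    sigma_0 = source, sigma_l in targets, and consecutive simplices sharing
    a (d-1)-face. Returns None if no target is reachable, i.e. source and
    targets lie in different d-connected components."""
    by_face = defaultdict(list)  # (d-1)-face -> d-simplices containing it
    for s in map(frozenset, simplices):
        for f in facets_of(s):
            by_face[f].append(s)
    source = frozenset(source)
    targets = {frozenset(t) for t in targets}
    parent = {source: None}
    queue = deque([source])
    while queue:
        s = queue.popleft()
        if s in targets:
            path = []
            while s is not None:  # walk parent pointers back to the source
                path.append(s)
                s = parent[s]
            return path[::-1]
        for f in facets_of(s):
            for t in by_face[f]:
                if t not in parent:
                    parent[t] = s
                    queue.append(t)
    return None

# A strip of four triangles (d = 2) plus one isolated triangle.
strip = [(0, 1, 2), (1, 2, 3), (2, 3, 4), (3, 4, 5), (6, 7, 8)]
path = adjacent_path(strip, (0, 1, 2), [(3, 4, 5)])
```

Breadth-first search returns a shortest such sequence; the proof only needs existence, so any search order would do.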
Because the interior of $\sigma^{d}_{l}$ lies in $|\mathcal{M}|$ and hence not in $\mathcal{A}$, we can let $\sigma^{d}_{l^{\prime}}$ be the first $d$-simplex in the sequence whose interior is not in $\mathcal{A}$; then $l^{\prime}\neq 0$ and the interior of $\sigma^{d}_{l^{\prime}-1}$ is in $\mathcal{A}$. Let $\sigma^{d-1}_{l^{\prime}-1}$ be the $(d-1)$-face shared by $\sigma^{d}_{l^{\prime}-1}$ and $\sigma^{d}_{l^{\prime}}$; we claim that $\sigma^{d-1}_{l^{\prime}-1}\in\mathcal{M}$. If $\sigma^{d}_{l^{\prime}}\in\mathcal{M}$, this is immediate. If $\sigma^{d}_{l^{\prime}}\not\in\mathcal{M}$, then $\sigma^{d-1}_{l^{\prime}-1}\in\mathcal{M}$ as well, because otherwise the interiors of $\sigma^{d}_{l^{\prime}-1}$ and $\sigma^{d}_{l^{\prime}}$ would be connected in $\mathbb{R}^{d+1}\smallsetminus|\mathcal{M}|$, placing the interior of $\sigma^{d}_{l^{\prime}}$ in $\mathcal{A}$. In the neighborhood of $\sigma^{d-1}_{l^{\prime}-1}$ during the void boundary reconstruction for $\widetilde{K}^{\prime}$, any two paired oriented simplices from $\vec{\zeta}_{j}$ enclose a region lying in $\mathcal{A}$. By the nature of the pairing, $\sigma^{d}_{l^{\prime}-1}$ cannot be contained in any of the regions enclosed by paired oriented simplices from $\vec{\zeta}_{j}$. Since $\vec{\zeta}_{j}$ is the boundary of the void $\mathcal{A}$ of $\mathbb{R}^{d+1}\smallsetminus|\mathcal{M}|$, none of the other regions in the neighborhood of $\sigma^{d-1}_{l^{\prime}-1}$ is in $\mathcal{A}$. Hence $\sigma^{d}_{l^{\prime}-1}$ is not in $\mathcal{A}$, a contradiction. ∎
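The last step of the proof reasons about the regions into which the $d$-cofaces of a $(d-1)$-simplex cut a small neighborhood of it in $\mathbb{R}^{d+1}$. For intuition only, the sketch below treats the case $d=1$ in $\mathbb{R}^{2}$, where the cofaces are the edges at a vertex and the regions are angular sectors, each bounded by two edges that are consecutive in counterclockwise order. It does not implement the paper's pairing rule for void boundary reconstruction, and `wedges_around` is a hypothetical name.

```python
import math

def wedges_around(center, neighbors):
    """d = 1 in R^2: the edges from `center` to each point in `neighbors`
    cut a small disk around `center` into sectors. Return each sector as the
    pair of angularly consecutive edges (counterclockwise) bounding it,
    identifying each edge by its far endpoint."""
    cx, cy = center
    order = sorted(neighbors, key=lambda p: math.atan2(p[1] - cy, p[0] - cx))
    k = len(order)
    return [(order[i], order[(i + 1) % k]) for i in range(k)]

# Four edges meeting at the origin split its neighborhood into four sectors.
star = wedges_around((0, 0), [(1, 0), (0, 1), (-1, 0), (0, -1)])
```

With $k$ edges at the vertex there are exactly $k$ sectors. In the proof's terms, the sectors bounded by paired simplices of $\vec{\zeta}_{j}$ lie in $\mathcal{A}$, and the argument shows that the sector containing $\sigma^{d}_{l^{\prime}-1}$ cannot be one of them.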
