On limit theorems for persistent Betti numbers from dependent data

Johannes Krebs

arXiv:1905.04045·math.PR·March 2, 2021


TL;DR

This paper establishes limit theorems for persistent Betti numbers derived from dependent time series and random fields, extending previous results beyond independent or stationary point process data.

Contribution

It provides the first general limit theorems for persistent Betti numbers in dependent data settings, broadening the applicability of topological data analysis.

Findings

01 Derived limit theorems for persistent Betti numbers under dependence

02 Extended convergence results to time series and random fields

03 Applicable to a wide range of dependence structures

Abstract

We study persistent Betti numbers and persistence diagrams obtained from time series and random fields. It is well known that the persistent Betti function is an efficient descriptor of the topology of a point cloud. So far, convergence results for the $(r,s)$-persistent Betti number of the $q$th homology group, $\beta^{r,s}_{q}$, were mainly considered for finite-dimensional point cloud data obtained from i.i.d. observations or stationary point processes such as a Poisson process. In this article, we extend these considerations. We derive limit theorems for the pointwise convergence of persistent Betti numbers $\beta^{r,s}_{q}$ in the critical regime under quite general dependence settings.

Equations

$$\lim_{n\to\infty}n^{-1}\,\mathbb{E}\left[\beta^{r,s}_{q}(\mathcal{K}(n^{1/p}\mathbb{X}_{n}))\right]=\mathbb{E}\left[\hat{b}_{q}\big(\kappa(X_{t})^{1/p}(r,s)\big)\right],\quad\forall\,0\leq q\leq p-1,\ \forall\,0\leq r\leq s<\infty,$$

$$\int_{E}f\,\mathrm{d}\nu_{n}\to\int_{E}f\,\mathrm{d}\nu,\quad\forall f\in C_{c}(E),$$

$$\mathcal{C}(\mathbb{X},r),\qquad\mathcal{R}(\mathbb{X},r)$$

$$\beta_{q}(K)=\dim\big(Z_{q}(K)/B_{q}(K)\big).$$

$$H_{q}(K(r))\hookrightarrow H_{q}(K(s)),\quad x+B_{q}(K(r))\mapsto x+B_{q}(K(s)).$$

$$H_{q}^{r,s}(\mathcal{K})=Z_{q}(K(r))\,/\,\big(B_{q}(K(s))\cap Z_{q}(K(r))\big),\quad r\leq s.$$

$$\beta_{q}^{r,s}(\mathcal{K})\coloneqq\dim H_{q}^{r,s}(\mathcal{K})$$

$$\xi_{q}(\mathcal{K})=\sum_{(b_{i},d_{i})\in\mathfrak{D}_{q}(\mathcal{K})}\delta_{(b_{i},d_{i})}.$$

$$\xi_{q}(\mathcal{K})([0,r]\times(s,\infty])=\beta^{r,s}_{q}(\mathcal{K}).$$

$$\kappa,\ f_{X_{t}\,|\,X_{1},\ldots,X_{t-1}}\leq f^{*}\quad\text{and}\quad f_{X_{v_{1}},\ldots,X_{v_{\ell}}\,|\,X_{t}}\leq f^{*}$$

(i) $${Z^{\prime}}_{j}^{(z_{1},\ldots,z_{i},z^{\prime}_{i})}=z_{j}\ \text{for all}\ j\in\{1,\ldots,i-1\}\quad\text{and}\quad{Z^{\prime}}_{i}^{(z_{1},\ldots,z_{i},z^{\prime}_{i})}=z^{\prime}_{i}.$$

(ii) $$\big({Z^{\prime}}_{i+1}^{(z_{1},\ldots,z_{i},z^{\prime}_{i})},\ldots,{Z^{\prime}}_{N}^{(z_{1},\ldots,z_{i},z^{\prime}_{i})}\big)\sim\mathcal{L}\big(Z_{i+1},\ldots,Z_{N}\,|\,Z_{1}=z_{1},\ldots,Z_{i-1}=z_{i-1},Z_{i}=z^{\prime}_{i}\big).$$

$$\mu_{i}(A)=\int_{\Lambda_{1}\times\ldots\times\Lambda_{i-1}}\mathbb{P}_{(Z_{1},\ldots,Z_{i-1})}(\mathrm{d}(z_{1},\ldots,z_{i-1}))\int_{\Lambda_{i}}\mathbbm{M}_{Z_{i}|(Z_{1},\ldots,Z_{i-1})}((z_{1},\ldots,z_{i-1}),\mathrm{d}z_{i})\int_{\Lambda_{i}}\mathbbm{M}_{Z_{i}|(Z_{1},\ldots,Z_{i-1})}((z_{1},\ldots,z_{i-1}),\mathrm{d}z^{\prime}_{i})\,\mathds{1}_{A}(z_{1},\ldots,z_{i-1},z_{i},z^{\prime}_{i}).$$

$$\Gamma_{j,i}=0,\qquad\Gamma_{i,j}=\operatorname{ess\,sup}\,\mathbb{P}\Big(Z_{j}^{(z_{1},\ldots,z_{i-1},z_{i},z^{\prime}_{i})}\neq{Z^{\prime}}_{j}^{(z_{1},\ldots,z_{i-1},z_{i},z^{\prime}_{i})}\Big),\quad 1\leq i<j\leq N,$$

$$\Gamma_{i,j}\leq\sup\,\mathbb{P}\Big(Z_{j}^{(z_{1},\ldots,z_{i-1},z_{i},z^{\prime}_{i})}\neq{Z^{\prime}}_{j}^{(z_{1},\ldots,z_{i-1},z_{i},z^{\prime}_{i})}\Big)$$

$$\sum_{j=1}^{n}\Gamma^{(n)}_{i,j}=\sum_{j=i}^{n}\Gamma^{(n)}_{i,j}=\sum_{j=1}^{n-i+1}\Gamma^{(n)}_{j,n-i+1}=\sum_{j=1}^{n}\Gamma^{(n)}_{j,n-i+1}.$$

$$\gamma_{\infty}\coloneqq\sup_{n\in\mathbb{N}}\|\Gamma^{(n)}\|_{\infty}<\infty.$$

$$\mathbb{P}\Big(X_{j}^{(x_{1},\ldots,x_{i},x^{\prime}_{i})}\neq{X^{\prime}}_{j}^{(x_{1},\ldots,x_{i},x^{\prime}_{i})}\Big)\leq d_{TV}\Big(\mathcal{L}\big((X^{(x_{1},\ldots,x_{i},x^{\prime}_{i})})_{j}^{n}\big),\mathcal{L}\big((X^{\prime(x_{1},\ldots,x_{i},x^{\prime}_{i})})_{j}^{n}\big)\Big)$$

$$=d_{TV}\Big(\mathcal{L}(X_{j}^{n}\,|\,X_{1}^{i-1}=x_{1}^{i-1},X_{i}=x_{i}),\mathcal{L}(X_{j}^{n}\,|\,X_{1}^{i-1}=x_{1}^{i-1},X_{i}=x^{\prime}_{i})\Big).$$

$$\mathbb{P}(X_{j}^{n}\in A\,|\,X_{1}^{i}=x_{1}^{i})$$

$$d_{TV}\Big(\mathcal{L}(Z_{j-\tau_{m-1}}\,|\,Z_{i}=z_{i}),\mathcal{L}(Z_{j-\tau_{m-1}}\,|\,Z_{i}=z^{\prime}_{i})\Big).$$

$$d_{TV}\Big(\mathcal{L}(Z_{t}\,|\,Z_{0}=z),\mathcal{L}(Z_{t})\Big)\leq R\rho^{t}.$$

$$\Gamma^{(n)}_{i,\cdot}\leq\big(1,\ldots,1,1,1\wedge(2R\rho),1\wedge(2R\rho^{2}),\ldots,1\wedge(2R\rho^{n-1-\tau_{m-1}})\big).$$

$$t_{mix}=\min\Big\{t:d_{TV}\big(\mathcal{L}(Z_{t}\,|\,Z_{0}=z),\mathcal{L}(Z_{t})\big)\leq\frac{1}{4}\Big\}.$$

$$\gamma_{\infty}\leq\tau_{m-1}+t_{mix}\cdot 1+t_{mix}\cdot\frac{1}{2}+t_{mix}\cdot\frac{1}{4}+\ldots=\tau_{m-1}+2t_{mix}.$$

$$n\,\sup_{w\in S}\mu\big(B(w,2\eta_{n}^{-1}r)\big)<C_{r}(\log n)^{\alpha_{1}},$$

$$\log\mathscr{N}(\eta_{n}^{-1}r,S,d)\leq C_{r}(\log n)^{\alpha_{2}}$$

$$\|x\|_{\alpha}=\max_{k\leq\underline{\alpha}}\,\sup_{t\in[0,1]}|D^{k}x(t)|+\sup_{s,t\in(0,1),\,s\neq t}\frac{|D^{\underline{\alpha}}x(t)-D^{\underline{\alpha}}x(s)|}{|s-t|^{\alpha-\underline{\alpha}}}.$$

$$\log\mathscr{N}(\varepsilon,C_{1}^{\alpha}([0,1]),\|\cdot\|_{\infty})\leq c\,\varepsilon^{-1/\alpha},$$


Full text



keywords: Critical regime, Dependent data, Functional data, Limit theorems, Markov chains, Marton coupling, Persistent Betti numbers, Persistence diagrams, Point processes, Time series, Topological data analysis, Random fields, Random geometric complexes.

MSC: [2010] Primary: 60D05, 60G55; Secondary: 60F10, 37M10, 60G60.

Affiliation: Institute for Applied Mathematics, Heidelberg University, Im Neuenheimer Feld 205, 69120 Heidelberg, Germany

Topological data analysis (TDA) is a comparatively young field in (applied) mathematics at the intersection of computational geometry, probability theory, mathematical statistics and machine learning. Seminal papers which popularized the ideas of TDA are Edelsbrunner et al. (2000), Zomorodian and Carlsson (2005) and Carlsson (2009). The monograph of Boissonnat et al. (2018) offers an introduction. Statistical aspects of TDA are discussed in the surveys of Chazal and Michel (2017) and Bobrowski and Kahle (2018).

In this article, we will focus on a special topic in persistent homology, which itself is the major branch in TDA: We study the large sample behavior of persistent Betti numbers and the corresponding persistence diagram obtained from time series or random fields.

So far, the literature has focused on point cloud data obtained from two major sources. On the one hand, there are various limit theorems for persistent Betti numbers obtained from stationary point processes as a rather general class; a prominent example here is the homogeneous Poisson process. On the other hand, the binomial process (a sample of i.i.d. data) is intensely studied, too.

In early contributions, Kahle (2011) investigates the asymptotic behavior of Betti numbers in the subcritical, supercritical and critical regimes. Extensions are given by Kahle and Meckes (2013) and Yogeshwaran and Adler (2015). Of the three asymptotic regimes mentioned above, the critical (or thermodynamic) regime certainly gets the most attention, and in the following we also limit the discussion in the introduction to this case.

One of the first major contributions which studies large deviation inequalities and central limit theorems for the Poisson and binomial sampling scheme in the critical regime is the work of Yogeshwaran et al. (2017). Extensions to persistent Betti numbers and persistence diagrams are given in Hiraoka et al. (2018). Trinh (2019) provides an abstract result for the asymptotic normality of Betti numbers. Krebs and Polonik (2019) study stabilizing properties of, and related central limit theorems for, Betti numbers built from nonhomogeneous Poisson or binomial processes. Strong laws of large numbers for Betti numbers obtained from the Poisson or the binomial process on general manifolds are considered in Goel et al. (2018). Other recent contributions which also discuss limiting results for Betti numbers are Owada (2018) and Owada and Thomas (2020). Divol and Polonik (2019) study the limiting behavior of the persistence diagram.

In the context of time series, the behavior of Betti numbers has been mainly investigated in applications. Islambekov et al. (2020) combine the TDA methodology with classical methods for change point detection. Classification problems for time series using methods from TDA are considered in Seversky et al. (2016) and in Umeda (2017). The applications of TDA to networks obtained from financial data are studied in Gidea (2017) and Gidea and Katz (2018); here the methods of TDA measure a type of high-dimensional and time-dependent correlation in the network.

The persistence landscape (Bubenik (2015)) is an efficient summary statistic of the persistence diagram and is quite popular in machine learning; we also refer to Chazal et al. (2014) and Kim et al. (2020) for related contributions.

The aim of this paper is to provide two advances in the study of persistent Betti numbers in the context of time series and random fields. On the one hand, we study the large sample behavior of the expectation of persistent Betti numbers obtained from time series and random fields. More precisely, for the time series case, let $X=(X_{t}:t\in\mathbb{Z})\subseteq[0,1]^{p}$ be a stationary Markov chain of order $m$ (w.r.t. its natural filtration) with a continuous and strictly positive joint density $g$ of $(X_{1},\ldots,X_{m+1})$. Write $\kappa$ for the marginal density of each $X_{t}$. It is well known that for an $n$-binomial process $\mathbb{X}^{*}_{n}$, which consists of $n$ i.i.d. observations $X^{*}_{t}$ with marginal density $\kappa$, the limit of $n^{-1}\mathbb{E}\left[\beta^{r,s}_{q}(\mathcal{K}(n^{1/p}\mathbb{X}^{*}_{n}))\right]$ exists. Using the nearly additive properties of persistent Betti numbers, we show that the analogous quantity for Markov chains converges to the same limit. In fact, denoting by $\mathcal{K}$ the Čech or Vietoris-Rips filtration, we have

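$$\lim_{n\to\infty}n^{-1}\,\mathbb{E}\left[\beta^{r,s}_{q}(\mathcal{K}(n^{1/p}\mathbb{X}_{n}))\right]=\mathbb{E}\left[\hat{b}_{q}\big(\kappa(X_{t})^{1/p}(r,s)\big)\right],\quad\forall\,0\leq q\leq p-1,\ \forall\,0\leq r\leq s<\infty,$$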

and where $\hat{b}_{q}(r,s)$ is the limit of $n^{-1}\mathbb{E}\left[\beta^{r,s}_{q}(\mathcal{K}(n^{1/p}\mathbb{Y}^{*}_{n}))\right]$ for an $n$ -binomial process $\mathbb{Y}^{*}_{n}$ on $[0,1]^{p}$ with uniform density $\kappa$ . We also prove a related strong law of large numbers. Doing so, we can also conclude convergence results for persistence diagrams. Moreover, we establish similar convergence results for stationary random fields.

On the other hand, we establish an exponential inequality and give strong laws of large numbers for persistent Betti numbers, which are not exclusively derived from point clouds on $\mathbb{R}^{p}$ . Instead, we also allow for functional data as a potential data source. The presented exponential inequality relies on the concept of the Marton coupling, see Marton (2003). Marton couplings have also been successfully used in the past to derive concentration inequalities of the McDiarmid-type, see also Samson (2000) and Paulin (2015).

The remainder of this paper is organized as follows. In Section 1, we give the notation used throughout the manuscript. Furthermore, we outline the basic concept of persistent homology. In Section 2, we describe the dependence structure assumed for our time series model and present our main results related to the time series case. In Section 3, we study the extension of our results to random fields. The proofs are contained in Section 4; further deferred calculations are contained in Appendix A.

1 Notation

The purpose of this section is not to make the paper self-contained, which is impossible. The aim is rather to allow readers from other areas to become familiar with the vocabulary and to understand the basic concepts of topological data analysis.

We begin with some general notation. We write $\mathbb{N}$ for the natural numbers starting at 1; if we include 0, we write $\mathbb{N}_{0}$. We write $\#A$ for the cardinality of a countable set $A$. We work on a separable Banach space $S$. We write $d$ for the metric which is obtained from the norm on $S$ and $\mathfrak{S}$ for the Borel-$\sigma$-field on $S$. $(S,\mathfrak{S})$ is equipped with the measure $\mu$. The measure $\mu$ is non-atomic and $\sigma$-finite. We write $B(x,r)=\{y\in S:d(x,y)\leq r\}$ for the closed $d$-ball around $x$. The diameter of a set $A\subseteq S$ is $\operatorname{diam}(A)=\sup\{d(x,y):x,y\in A\}$. Let $A\in\mathcal{B}(\mathbb{R}^{p})$ and write $|A|$ for its $p$-dimensional Lebesgue measure as well as $A^{(\varepsilon)}=\{x\in S:d(x,A)\leq\varepsilon\}$ for its $\varepsilon$-offset.

Write $\otimes_{i=1}^{\ell}\mu=\mu^{\otimes\ell}$ for the $\ell$ -fold product measure on the product space $(S^{\ell},\mathfrak{S}^{\otimes\ell})$ . The essential supremum of a real-valued function $f$ defined on $(S,\mathfrak{S},\mu)$ is abbreviated by $\|f\|_{\infty,\mu}$ . We write simply $\|f\|_{\infty}$ for the supremum norm of a continuous function on $\mathbb{R}^{p}$ .

Let $(\Omega,\mathcal{A},\mathbb{P})$ be a probability space and let $(T,\mathfrak{T}),(U,\mathfrak{U})$ be two state spaces. Consider two random variables $X\colon\Omega\to T$ and $Y\colon\Omega\to U$ . Assume that $X$ admits a conditional distribution given $Y$ . We write $\mathbb{M}_{X|Y}\colon U\times\mathfrak{T}\to[0,1]$ for this distribution.

In order to abbreviate a subset of an ordered sample $(x_{1},\ldots,x_{n})$ , say, we write $x_{a}^{b}$ for the subset $(x_{a},x_{a+1},\ldots,x_{b})$ , $1\leq a\leq b\leq n$ . Given a time series $X_{1}^{n}=(X_{1},\ldots,X_{n})\subseteq S$ , we write $\mathbb{X}_{n}=\{X_{1},\ldots,X_{n}\}$ for the associated point cloud which has no ordering.

Given a metric space $(E,d)$ and Radon measures $\nu,\nu_{1},\nu_{2},\ldots$ , we say that $(\nu_{n})_{n\in\mathbb{N}}$ converges vaguely to $\nu$ if

$$\int_{E}f\,\mathrm{d}\nu_{n}\to\int_{E}f\,\mathrm{d}\nu,\quad\forall f\in C_{c}(E),$$

where $C_{c}(E)$ is the class of all continuous functions on $E$ with compact support. We indicate this writing $\nu_{n}\overset{v}{\to}\nu$ .

We construct the filtration from the Čech or the Vietoris-Rips complex. If $\mathbb{X}$ is a finite subset of $S$ and $r\geq 0$ , these complexes are defined by

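$$\mathcal{C}(\mathbb{X},r)=\Big\{\sigma\subseteq\mathbb{X}:\bigcap_{x\in\sigma}B(x,r)\neq\emptyset\Big\},\qquad\mathcal{R}(\mathbb{X},r)=\big\{\sigma\subseteq\mathbb{X}:\operatorname{diam}(\sigma)\leq r\big\}.$$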

In the following, the symbol $K$ refers to both the Čech and the Vietoris-Rips complex. If we additionally want to specify the point cloud or the filtration parameter $r$, we write $K(\mathbb{X},r)$ or $K(r)$. The corresponding filtration is given by $\mathcal{K}=\mathcal{K}(\mathbb{X})=(K(\mathbb{X},r):0\leq r<\infty)$. It is a direct consequence of the homogeneity of $d$ that for $\eta>0$ the complexes $K(\eta\mathbb{X},r)$ and $K(\mathbb{X},\eta^{-1}r)$ are combinatorially isomorphic.

The dimension of a simplex $\sigma\in K$ is its cardinality minus 1. If $\sigma$ has dimension $q$, it is a $q$-simplex. Write $K_{q}$ for the set of $q$-simplices in a complex $K$. Moreover, for a measurable set $A\in\mathfrak{S}$ and a point cloud $\mathbb{X}\subseteq S$, we write $K_{q}(\mathbb{X},r;A)$ for the number of $q$-simplices in $K(\mathbb{X},r)$ with at least one vertex in $A$.
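The notions above can be made concrete with a minimal sketch. It assumes the Vietoris-Rips convention that a simplex is included when all pairwise distances are at most $r$ and uses the Euclidean distance on the plane; the function names, the `touches` argument and the example point cloud are hypothetical and serve only as an illustration.

```python
from itertools import combinations
from math import dist


def rips_complex(points, r, max_dim=2):
    """Simplices (sorted index tuples) of a Vietoris-Rips complex up to max_dim.

    Assumed convention: a simplex belongs to the complex when all pairwise
    distances between its vertices are at most r.
    """
    simplices = []
    for q in range(max_dim + 1):
        for sigma in combinations(range(len(points)), q + 1):
            if all(dist(points[i], points[j]) <= r for i, j in combinations(sigma, 2)):
                simplices.append(sigma)
    return simplices


def count_simplices(simplices, q, touches=None):
    """Number of q-simplices; with `touches`, only those with a vertex index in that set."""
    return sum(
        1
        for sigma in simplices
        if len(sigma) == q + 1 and (touches is None or any(v in touches for v in sigma))
    )


# Hypothetical point cloud: the corners of the unit square.
points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
eta = 4.0
scaled = [(eta * x, eta * y) for x, y in points]

# Scaling the points by eta and the radius by eta yields the same index sets,
# which mirrors the isomorphism between K(eta X, r) and K(X, r / eta).
print(rips_complex(scaled, eta * 1.0) == rips_complex(points, 1.0))
print(count_simplices(rips_complex(points, 1.0), 1))
```

Passing the indices of the points that lie in a set $A$ as `touches` gives a count in the spirit of $K_{q}(\mathbb{X},r;A)$.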

We use the field $\mathbb{F}_{2}$ to build the homology groups and the Betti numbers of a simplicial complex $K$ . Define for $q\in\mathbb{N}_{0}$ the space of $q$ -chains $C_{q}(K)$ to be the free Abelian group generated by the $q$ -simplices in $K$ . So the elements of $C_{q}(K)$ are formal sums (“ $q$ -chains”) $c=\sum_{i}a_{i}\sigma_{i}$ , $a_{i}\in\{0,1\}$ , $\sigma_{i}\in K$ a $q$ -simplex. The sum of two $q$ -chains $c_{1}+c_{2}$ is their symmetric difference because the coefficients $a_{i}$ are in $\mathbb{F}_{2}$ .

The boundary operator $\partial_{q}$ relates $C_{q}(K)$ and $C_{q-1}(K)$ by mapping a $q$ -simplex $\{x_{0},x_{1},\ldots,x_{q}\}$ to $\partial_{q}(\{x_{0},x_{1},\ldots,x_{q}\})\coloneqq\sum_{i=0}^{q}(-1)^{i+1}\{x_{0},\ldots,x_{i-1},x_{i+1},\ldots,x_{q}\}$ . For a general chain $c=\sum_{i}a_{i}\sigma_{i}\in C_{q}(K)$ , the boundary operator is then $\partial_{q}(c)=\sum_{i}a_{i}\partial_{q}(\sigma_{i})$ .

The boundary operator satisfies $\partial_{q}\circ\partial_{q+1}\equiv 0$ (“a boundary has no boundary”). This property enables the construction of homology groups of $K$ . Let $Z_{q}(K)=\operatorname{ker}(\partial_{q})$ be the subspace of $C_{q}(K)$ consisting of the $q$ -cycles, those elements whose boundary is 0 under $\partial_{q}$ . Let $B_{q}(K)=\operatorname{im}(\partial_{q+1})$ be the subspace of $C_{q}(K)$ that consists of the boundaries of elements in $C_{q+1}(K)$ (which lie in $C_{q}(K)$ ).

The homology groups are defined as $H_{q}(K)\coloneqq Z_{q}(K)/B_{q}(K)$, the cycles $Z_{q}$ modulo the boundaries $B_{q}$ in dimension $q$. Loosely speaking, the elements in $H_{q}(K)$ represent “holes” in the simplicial complex $K$. These are closed loops, voids or cavities, whose interior cannot be filled by other elements of the complex. Like $Z_{q}(K)$ and $B_{q}(K)$, $H_{q}(K)$ is a vector space.

The $q$ th Betti number of a simplicial complex $K$ is the dimension of $H_{q}(K)$ , viz.,

$$\beta_{q}(K)=\dim\big(Z_{q}(K)/B_{q}(K)\big).$$

So, $\beta_{q}(K)$ is the number of $q$ -dimensional holes in $K$ . $H_{q}(K)$ and $\beta_{q}(K)$ provide topological information from a single simplicial complex. Given a filtration $\mathcal{K}=(K(r):0\leq r<\infty)$ , the persistent homology provides more topological details. The natural inclusions $Z_{q}(K(r))\subseteq Z_{q}(K(s))$ and $B_{q}(K(r))\subseteq B_{q}(K(s))$ for $r\leq s$ , provide the inclusion map

$$H_{q}(K(r))\hookrightarrow H_{q}(K(s)),\quad x+B_{q}(K(r))\mapsto x+B_{q}(K(s)).$$

We define the persistent homology groups of the filtration $\mathcal{K}=(K(r):0\leq r<\infty)$ by

$$H_{q}^{r,s}(\mathcal{K})=Z_{q}(K(r))\,/\,\big(B_{q}(K(s))\cap Z_{q}(K(r))\big),\quad r\leq s.$$

Loosely speaking, nonzero elements in $H_{q}^{r,s}(\mathcal{K})$ represent topological features born before or at time $r$ and which persist until a time greater than $s$ . The dimension of $H_{q}^{r,s}(\mathcal{K})$ , i.e., the number of these features, is the persistent Betti number.

Definition 1.1 (Persistent Betti number).

Let $\mathcal{K}$ be a filtration and let $0\leq r\leq s<\infty$ . The persistent Betti number of dimension $q\in\mathbb{N}_{0}$ for the parameter pair $(r,s)$ is

$$\beta_{q}^{r,s}(\mathcal{K})\coloneqq\dim H_{q}^{r,s}(\mathcal{K}).$$

At this point there is an important difference between the Čech and Vietoris-Rips complex in the special case where we consider the Euclidean space $\mathbb{R}^{p}$ . While in the Čech complex the homology degree is bounded by $p-1$ , the Vietoris-Rips complex can have nontrivial cycles of every possible dimension (see also Bobrowski and Kahle (2018)).
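As an illustration of the definition, the following sketch computes $\beta_{q}^{r,s}$ for a small point cloud with linear algebra over $\mathbb{F}_{2}$. It is not taken from the paper: it assumes a Vietoris-Rips filtration with the convention that a simplex enters when its diameter is at most $r$, and it uses the observation that a boundary of $K(s)$ lies in $Z_{q}(K(r))$ exactly when it is supported on $q$-simplices of $K(r)$. All function names are hypothetical.

```python
from itertools import combinations
from math import dist


def rips_simplices(points, r, q):
    """q-simplices (sorted index tuples) with diameter <= r (assumed convention)."""
    return [s for s in combinations(range(len(points)), q + 1)
            if all(dist(points[i], points[j]) <= r for i, j in combinations(s, 2))]


def rank_f2(rows):
    """Rank over F_2 of a matrix whose rows are given as integer bitmasks."""
    basis = {}  # highest set bit -> reduced row with that leading bit
    for row in rows:
        while row:
            pivot = row.bit_length() - 1
            if pivot not in basis:
                basis[pivot] = row
                break
            row ^= basis[pivot]  # eliminate the leading bit
    return len(basis)


def boundary_rows(row_simplices, col_simplices):
    """Boundary matrix rows: bit k is set iff the row simplex is a face of column simplex k."""
    rows = []
    for face in row_simplices:
        mask = 0
        for k, sigma in enumerate(col_simplices):
            if set(face) <= set(sigma):
                mask |= 1 << k
        rows.append(mask)
    return rows


def persistent_betti(points, r, s, q):
    """beta_q^{r,s} = dim Z_q(K(r)) - dim(B_q(K(s)) intersected with Z_q(K(r))), r <= s."""
    kq_r = rips_simplices(points, r, q)
    kq_s = rips_simplices(points, s, q)
    # dim Z_q(K(r)) = #q-simplices - rank of the boundary map; the 0th boundary map is zero.
    rank_dq = 0 if q == 0 else rank_f2(boundary_rows(rips_simplices(points, r, q - 1), kq_r))
    dim_cycles = len(kq_r) - rank_dq
    # Boundaries of K(s) supported on K(r): rank(D) - rank(rows of D outside K(r)).
    d_all = boundary_rows(kq_s, rips_simplices(points, s, q + 1))
    in_r = set(kq_r)
    d_out = [row for face, row in zip(kq_s, d_all) if face not in in_r]
    return dim_cycles - (rank_f2(d_all) - rank_f2(d_out))


# Hypothetical example: the corners of the unit square.
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(persistent_betti(square, 1.0, 1.2, 1), persistent_betti(square, 1.0, 1.5, 1))
```

In this example the loop formed by the four sides appears at $r=1$ and is filled in once the diagonals (of length $\sqrt{2}$) enter the complex, so it is counted for $s=1.2$ but not for $s=1.5$.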

The $q$ th persistence diagram summarizes the evolution of the homology groups; it is a multiset of points in $\Delta=\{(b,d):0\leq b<d\leq\infty\}$ . Each point $(b,d)$ in the $q$ th persistence diagram corresponds to a $q$ -dimensional hole (feature) in the filtration $\mathcal{K}$ which is born (appears for the first time) at time $b$ and dies (disappears in the filtration) at time $d$ . The lifetime of this feature $d-b$ is called the persistence. $d=\infty$ means that the feature has an infinite lifetime. Persistence diagrams exist given mild assumptions on the filtration, see Chazal et al. (2016). Also in the case of a random point cloud, e.g., an i.i.d. sample, the persistence diagram can inherit certain smoothness properties from the point cloud, see Chazal and Divol (2018).

Let $\mathfrak{D}_{q}(\mathcal{K})=\{(b_{i},d_{i})\in\Delta:i=1,\ldots,n_{q}\}$ be the $q$ th persistence diagram given as a multiset of points. Then in the following we understand $\mathfrak{D}_{q}(\mathcal{K})$ as a counting measure on $\Delta$ defined as

$$\xi_{q}(\mathcal{K})=\sum_{(b_{i},d_{i})\in\mathfrak{D}_{q}(\mathcal{K})}\delta_{(b_{i},d_{i})}.$$

$\xi_{q}(\mathcal{K})$ is related to the $q$ th persistent Betti number as follows

$$\xi_{q}(\mathcal{K})([0,r]\times(s,\infty])=\beta^{r,s}_{q}(\mathcal{K}).$$

This means $\beta^{r,s}_{q}$ counts the number of $q$-dimensional features in the upper left rectangular area with vertex $(r,s)$ in the persistence diagram. So given $r\ll s$, the persistent Betti number $\beta^{r,s}_{q}$ represents the number of $q$-dimensional features with a high persistence. It is clear that the values of the $q$th persistence diagram $\xi_{q}(\mathcal{K})$ also describe the persistent Betti function $\{\beta^{r,s}_{q}(\mathcal{K}):0\leq r\leq s<\infty\}$ completely.
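The counting relation between the persistence diagram and the persistent Betti number can be sketched in a few lines. The diagram below is a made-up list of (birth, death) pairs and the function name is hypothetical.

```python
from math import inf


def persistent_betti_from_diagram(diagram, r, s):
    """Count diagram points (b, d) with b <= r and d > s, i.e. the mass of [0, r] x (s, inf]."""
    return sum(1 for b, d in diagram if b <= r and d > s)


# Hypothetical diagram: two finite features and one feature with infinite lifetime.
diagram = [(0.0, 1.0), (0.2, 0.5), (0.3, inf)]
print(persistent_betti_from_diagram(diagram, 0.25, 0.6))
```

A point on the boundary $d=s$ is not counted, in line with the half-open rectangle $[0,r]\times(s,\infty]$.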

2 Persistent Betti numbers obtained from time series

This section contains the main results of this paper. We derive an exponential inequality for persistent Betti numbers from a rather general class of stochastic processes, which also applies to functional data and to random fields after a renumbering of the coordinates, as we will see below. For the special case of an $\mathbb{R}^{p}$-valued time series, we also give the large sample behavior of the expectation and study the vague convergence of the corresponding persistence diagram.

2.1 The data generating process

Consider a stationary process $X=(X_{t}:t\in\mathbb{Z})$ defined on $(\Omega,\mathcal{A},\mathbb{P})$ and taking values in $S$ . (A special case would be $\mathbb{R}^{p}$ equipped with the Borel- $\sigma$ -field $\mathcal{B}(\mathbb{R}^{p})$ and the Lebesgue measure.)

The observations $X_{t}$ admit a density $\kappa$ w.r.t. $\mu$ . Furthermore, the observations admit conditional densities as follows. The distribution of $X_{t}$ conditional on $X_{1},\ldots,X_{t-1}$ , $\mathcal{L}(X_{t}|X_{1},\ldots,X_{t-1})$ , admits a density $f_{X_{t}\,|\,X_{1},\ldots,X_{t-1}}$ for each $t\in\mathbb{N}$ . Also $\mathcal{L}(X_{v_{1}},\ldots,X_{v_{\ell}}\,|\,X_{t})$ admits a density $f_{X_{v_{1}},\ldots,X_{v_{\ell}}\,|\,X_{t}}$ for all $t,\ell\in\mathbb{N}$ and all finite sets $\{v_{1},\ldots,v_{\ell}\}\subseteq\mathbb{N}$ , which do not contain $t$ . Moreover, there is a $f^{*}<\infty$ such that uniformly

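$$\kappa,\ f_{X_{t}\,|\,X_{1},\ldots,X_{t-1}}\leq f^{*}\quad\text{and}\quad f_{X_{v_{1}},\ldots,X_{v_{\ell}}\,|\,X_{t}}\leq f^{*}$$

for all $t,\ell\in\mathbb{N}$ and sets $\{v_{1},\ldots,v_{\ell}\}\subseteq\mathbb{N}$ which do not contain $t$. These requirements are not restrictive and are satisfied for a wide range of stochastic processes.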

2.2 Marton couplings as the concept of dependence

We use the concept of Marton couplings to quantify the dependence within the observed data. These couplings were first defined in Marton (2003) and measure the strength of dependence within a collection of random variables by a mixing (or coupling) matrix.

Definition 2.1 (Marton coupling).

Let $N\in\mathbb{N}$ and let $\Lambda_{1},\ldots,\Lambda_{N}$ be Polish. Let $Z=(Z_{1},\ldots,Z_{N})$ be a vector of random variables taking values in $\Lambda=\Lambda_{1}\times\ldots\times\Lambda_{N}$ . A Marton coupling of $Z$ is a set of couplings $(Z^{(z_{1},\ldots,z_{i},z^{\prime}_{i})},Z^{\prime(z_{1},\ldots,z_{i},z^{\prime}_{i})}$ ), for every $i\in\{1,\ldots,N\}$ and every $z_{1}\in\Lambda_{1},\ldots,z_{i},z^{\prime}_{i}\in\Lambda_{i}$ , which satisfies the conditions

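(i) $Z_{j}^{(z_{1},\ldots,z_{i},z^{\prime}_{i})}=z_{j}$ for all $j\in\{1,\ldots,i\}$, and ${Z^{\prime}}_{j}^{(z_{1},\ldots,z_{i},z^{\prime}_{i})}=z_{j}$ for all $j\in\{1,\ldots,i-1\}$ and ${Z^{\prime}}_{i}^{(z_{1},\ldots,z_{i},z^{\prime}_{i})}=z^{\prime}_{i}$.

(ii) $\big(Z_{i+1}^{(z_{1},\ldots,z_{i},z^{\prime}_{i})},\ldots,Z_{N}^{(z_{1},\ldots,z_{i},z^{\prime}_{i})}\big)\sim\mathcal{L}\big(Z_{i+1},\ldots,Z_{N}\,|\,Z_{1}=z_{1},\ldots,Z_{i}=z_{i}\big)$ and $\big({Z^{\prime}}_{i+1}^{(z_{1},\ldots,z_{i},z^{\prime}_{i})},\ldots,{Z^{\prime}}_{N}^{(z_{1},\ldots,z_{i},z^{\prime}_{i})}\big)\sim\mathcal{L}\big(Z_{i+1},\ldots,Z_{N}\,|\,Z_{1}=z_{1},\ldots,Z_{i-1}=z_{i-1},Z_{i}=z^{\prime}_{i}\big)$.

(iii) If $z_{i}=z^{\prime}_{i}$, then $Z^{(z_{1},\ldots,z_{i},z^{\prime}_{i})}=Z^{\prime(z_{1},\ldots,z_{i},z^{\prime}_{i})}$.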

Write $\mathbbm{M}_{Z_{i}|(Z_{1},\ldots,Z_{i-1})}$ for the conditional distribution of $Z_{i}$ given $(Z_{1},\ldots,Z_{i-1})$ for $1\leq i\leq N$. Construct a measure $\mu_{i}$ on the product space $\Lambda_{1}\times\ldots\times\Lambda_{i-1}\times\Lambda_{i}\times\Lambda_{i}$, which consists of the joint distribution of $(Z_{1},\ldots,Z_{i-1})$ and the product measure $\mathbbm{M}_{Z_{i}|(Z_{1},\ldots,Z_{i-1})}\otimes\mathbbm{M}_{Z_{i}|(Z_{1},\ldots,Z_{i-1})}$, for a Borel set $A$ of $\Lambda_{1}\times\ldots\times\Lambda_{i-1}\times\Lambda_{i}\times\Lambda_{i}$ as follows:

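$$\mu_{i}(A)=\int_{\Lambda_{1}\times\ldots\times\Lambda_{i-1}}\mathbb{P}_{(Z_{1},\ldots,Z_{i-1})}(\mathrm{d}(z_{1},\ldots,z_{i-1}))\int_{\Lambda_{i}}\mathbbm{M}_{Z_{i}|(Z_{1},\ldots,Z_{i-1})}((z_{1},\ldots,z_{i-1}),\mathrm{d}z_{i})\int_{\Lambda_{i}}\mathbbm{M}_{Z_{i}|(Z_{1},\ldots,Z_{i-1})}((z_{1},\ldots,z_{i-1}),\mathrm{d}z^{\prime}_{i})\,\mathds{1}_{A}(z_{1},\ldots,z_{i-1},z_{i},z^{\prime}_{i}).$$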

Then, we define the mixing matrix $\Gamma\coloneqq(\Gamma_{i,j})_{1\leq i,j\leq N}$ for a Marton coupling of $Z$ as an upper triangular matrix with $\Gamma_{i,i}=1$ and

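$$\Gamma_{j,i}=0,\qquad\Gamma_{i,j}=\operatorname{ess\,sup}\,\mathbb{P}\Big(Z_{j}^{(z_{1},\ldots,z_{i-1},z_{i},z^{\prime}_{i})}\neq{Z^{\prime}}_{j}^{(z_{1},\ldots,z_{i-1},z_{i},z^{\prime}_{i})}\Big),\quad 1\leq i<j\leq N,$$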

where we compute the essential supremum w.r.t. the measure $\mu_{i}$ . Note that for $1\leq i<j\leq N$ each entry in the mixing matrix is bounded above by

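$$\Gamma_{i,j}\leq\sup\,\mathbb{P}\Big(Z_{j}^{(z_{1},\ldots,z_{i-1},z_{i},z^{\prime}_{i})}\neq{Z^{\prime}}_{j}^{(z_{1},\ldots,z_{i-1},z_{i},z^{\prime}_{i})}\Big)$$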

where the supremum is taken over all $(z_{1},\ldots,z_{i-1},z_{i},z^{\prime}_{i})\in\Lambda_{1}\times\ldots\times\Lambda_{i}\times\Lambda_{i}$ .

We return to the data generating process $X$. Write $\Gamma^{(n)}$ for the mixing matrix of the sample $X_{1},\ldots,X_{n}$. As $X$ is stationary, $\Gamma^{(n)}_{i,j}=\Gamma^{(n)}_{i+k,j+k}$ (as long as all indices are between 1 and $n$). Consequently, $\Gamma^{(n)}_{i,j}=\Gamma^{(n)}_{n-j+1,n-i+1}$ for the choice $k=n-j-i+1$. So the summation over all elements in row $i$ is equivalent to the summation over all elements in column $n-i+1$ (and vice versa), viz.,

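$$\sum_{j=1}^{n}\Gamma^{(n)}_{i,j}=\sum_{j=i}^{n}\Gamma^{(n)}_{i,j}=\sum_{j=1}^{n-i+1}\Gamma^{(n)}_{j,n-i+1}=\sum_{j=1}^{n}\Gamma^{(n)}_{j,n-i+1}.$$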

In particular, the maximum absolute column sum $\|\Gamma^{(n)}\|_{1}$ equals the maximum absolute row sum $\|\Gamma^{(n)}\|_{\infty}$ . In what follows, we assume that the coefficients of the mixing matrix are at least summable in the sense that

$$\gamma_{\infty}\coloneqq\sup_{n\in\mathbb{N}}\|\Gamma^{(n)}\|_{\infty}<\infty.\tag{A2}$$

Consider the spectral norm $\|\Gamma^{(n)}\|$ of the mixing matrix of $X_{1},\ldots,X_{n}$ induced by the Euclidean norm $\|\cdot\|$ on $\mathbb{R}^{n}$. The inequality $\|\Gamma^{(n)}\|^{2}\leq\|\Gamma^{(n)}\|_{1}\|\Gamma^{(n)}\|_{\infty}$ implies that $\Gamma^{(n)}$ is also uniformly bounded in the spectral norm over all $n\in\mathbb{N}$, i.e., $\sup_{n\in\mathbb{N}}\|\Gamma^{(n)}\|<\infty$.

The condition on the mixing matrix in (A2) is satisfied for a wide range of stochastic processes. Consider, for instance, so-called delay embeddings for time series.
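The two facts used here, the equality of the maximal row and column sums under stationarity and the bound of the spectral norm by their product, can be checked numerically. The following is a minimal sketch in plain Python; the coefficient function `example_coeff` is a made-up stand-in for the entries $\Gamma^{(n)}_{i,j}$, which depend only on $j-i$ for a stationary process, and the spectral norm is approximated by power iteration.

```python
def toeplitz_mixing_matrix(n, coeff):
    """Upper triangular n x n matrix with entries coeff(j - i) for j >= i and 0 otherwise."""
    return [[coeff(j - i) if j >= i else 0.0 for j in range(n)] for i in range(n)]


def max_row_sum(m):
    return max(sum(abs(x) for x in row) for row in m)


def max_col_sum(m):
    return max(sum(abs(row[j]) for row in m) for j in range(len(m[0])))


def spectral_norm(m, iterations=200):
    """Approximate largest singular value via power iteration on m^T m.

    The returned value is ||m v|| for a unit vector v, hence never larger than
    the true spectral norm.
    """
    n = len(m[0])
    v = [1.0 / n ** 0.5] * n
    for _ in range(iterations):
        w = [sum(row[j] * v[j] for j in range(n)) for row in m]             # w = m v
        v = [sum(m[i][j] * w[i] for i in range(len(m))) for j in range(n)]  # v = m^T w
        norm = sum(x * x for x in v) ** 0.5
        v = [x / norm for x in v]
    w = [sum(row[j] * v[j] for j in range(n)) for row in m]
    return sum(x * x for x in w) ** 0.5


def example_coeff(k):
    """Hypothetical decay: 1 on the diagonal, then min(1, 2 * 0.5^k)."""
    return 1.0 if k == 0 else min(1.0, 2.0 * 0.5 ** k)


gamma = toeplitz_mixing_matrix(30, example_coeff)
print(max_row_sum(gamma), max_col_sum(gamma), spectral_norm(gamma))
```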

Example 2.2 (Delay embeddings from Markov chains).

Let $Z$ be a stationary, uniformly geometrically ergodic Markov chain in a Polish space $\mathcal{E}$ whose marginal distribution and transition kernel both admit a strictly positive density w.r.t. a reference measure $\mu$ . Construct a process $X$ from $Z$ via a delay embedding, that is, $X_{t}=(Z_{t},Z_{t-\tau_{1}},\ldots,Z_{t-\tau_{m-1}})\in\mathcal{E}^{m}$ , where $\tau_{1}<\ldots<\tau_{m-1}$ are natural numbers. We show that this process $X$ satisfies (A2). We construct a Marton coupling $(X^{(x_{1},\ldots,x_{i},x^{\prime}_{i})},X^{\prime(x_{1},\ldots,x_{i},x^{\prime}_{i})})$ , for every $i\in\{1,\ldots,n\}$ and every $x_{1},\ldots,x_{i-1},x_{i},x^{\prime}_{i}\in\mathcal{E}^{m}$ with Goldstein’s maximal coupling (Proposition A.4).

For every $i$ and all states, Goldstein’s maximal coupling yields two coupled random variables $X^{(x_{1},\ldots,x_{i},x^{\prime}_{i})}$ , $X^{\prime(x_{1},\ldots,x_{i},x^{\prime}_{i})}$ such that (i), (ii) and (iii) from Definition 2.1 are satisfied. By Proposition A.4, the marginals of each coupling satisfy

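$$\begin{aligned}\mathbb{P}\Big(X_{j}^{(x_{1},\ldots,x_{i},x^{\prime}_{i})}\neq{X^{\prime}}_{j}^{(x_{1},\ldots,x_{i},x^{\prime}_{i})}\Big)&\leq d_{TV}\Big(\mathcal{L}\big((X^{(x_{1},\ldots,x_{i},x^{\prime}_{i})})_{j}^{n}\big),\mathcal{L}\big((X^{\prime(x_{1},\ldots,x_{i},x^{\prime}_{i})})_{j}^{n}\big)\Big)\\&=d_{TV}\Big(\mathcal{L}(X_{j}^{n}\,|\,X_{1}^{i-1}=x_{1}^{i-1},X_{i}=x_{i}),\mathcal{L}(X_{j}^{n}\,|\,X_{1}^{i-1}=x_{1}^{i-1},X_{i}=x^{\prime}_{i})\Big).\end{aligned}\tag{2.2}$$

Note that the essential supremum of the left-hand side w.r.t. $\mu_{i}$ equals the coefficient $\Gamma^{(n)}_{i,j}$. Thus, we can easily bound the norm of the mixing matrix $\Gamma^{(n)}$ from above using the properties of the Markov chain $Z$. For simplicity, we use $\Gamma^{(n)}_{i,j}\leq 1$ for $0\leq j-i\leq\tau_{m-1}$ and only consider the asymptotic properties for $j-i>\tau_{m-1}$. Set $x_{i}=(z_{i},\ldots,z_{i-\tau_{m-1}})^{\prime}$. We can derive from the Markov property of $Z$ that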

[TABLE]

Next, we use the Markov property to see that the total variation distance in (2.2) is determined by the observation $X_{j}$ because $j$ is closest to $i$ . Consequently, if $j-\tau_{m-1}-i\geq 0$ , (2.2) equals

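$$d_{TV}\Big(\mathcal{L}(Z_{j-\tau_{m-1}}\,|\,Z_{i}=z_{i}),\mathcal{L}(Z_{j-\tau_{m-1}}\,|\,Z_{i}=z^{\prime}_{i})\Big).\tag{2.3}$$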

By assumption, $Z$ is uniformly geometrically ergodic. Hence, there are $R\geq 1$ and $\rho\in(0,1)$ such that uniformly in $z$ and for all $t\in\mathbb{N}$

$$d_{TV}\Big(\mathcal{L}(Z_{t}\,|\,Z_{0}=z),\mathcal{L}(Z_{t})\Big)\leq R\rho^{t}.$$
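Uniform geometric ergodicity can be illustrated on the simplest possible case, a two-state Markov chain, where the total variation distance to the stationary law decays exactly geometrically. The sketch below is only an illustration under made-up transition probabilities `a` and `b`; it is not part of the paper's argument.

```python
def step(law, kernel):
    """One step of a finite-state chain: row vector times transition matrix."""
    k = len(kernel)
    return [sum(law[i] * kernel[i][j] for i in range(k)) for j in range(k)]


def tv_distance(p, q):
    """Total variation distance between two distributions on a finite set."""
    return 0.5 * sum(abs(x - y) for x, y in zip(p, q))


def tv_to_stationary(kernel, stationary, start, t):
    """d_TV between the law of Z_t given Z_0 = start and the stationary law."""
    law = [1.0 if i == start else 0.0 for i in range(len(kernel))]
    for _ in range(t):
        law = step(law, kernel)
    return tv_distance(law, stationary)


# Hypothetical two-state chain with switching probabilities a and b.
a, b = 0.3, 0.2
kernel = [[1 - a, a], [b, 1 - b]]
stationary = [b / (a + b), a / (a + b)]
rho = abs(1 - a - b)  # modulus of the second eigenvalue of the kernel

for t in (1, 5, 10):
    print(t, tv_to_stationary(kernel, stationary, 0, t), rho ** t)
```

For this chain the bound holds with $R=1$ and $\rho=|1-a-b|$, since the distance started in state 0 equals the stationary mass of state 1 times $\rho^{t}$.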

So the quantity in (2.3) is at most $1\wedge(2R\rho^{j-\tau_{m-1}-i})$ . In particular, we have for a row of the mixing matrix of $X_{1},\ldots,X_{n}$ the following bound, which implies (A2):

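$$\Gamma^{(n)}_{i,\cdot}\leq\big(1,\ldots,1,1,1\wedge(2R\rho),1\wedge(2R\rho^{2}),\ldots,1\wedge(2R\rho^{n-1-\tau_{m-1}})\big).$$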

Consequently, $\gamma_{\infty}\leq(\tau_{m-1}+1-2R)+2R/(1-\rho)$ .
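The closed-form bound follows from summing $\tau_{m-1}+1$ ones and the geometric series $\sum_{k\geq 1}2R\rho^{k}=2R/(1-\rho)-2R$. A small numerical check of this step, with hypothetical values for $R$, $\rho$ and $\tau_{m-1}$, might look as follows.

```python
def row_sum_bound(n, tau, R, rho):
    """Sum of the row bound: tau + 1 ones, then min(1, 2 R rho^k) for k = 1, ..., n - 1 - tau."""
    ones = min(n, tau + 1)
    tail = sum(min(1.0, 2.0 * R * rho ** k) for k in range(1, max(0, n - 1 - tau) + 1))
    return ones + tail


def closed_form_bound(tau, R, rho):
    """(tau + 1 - 2R) + 2R / (1 - rho), obtained by dropping the minimum with 1."""
    return (tau + 1 - 2.0 * R) + 2.0 * R / (1.0 - rho)


# Hypothetical parameters; the row sum never exceeds the closed form.
for n in (5, 50, 500):
    print(n, row_sum_bound(n, tau=3, R=1.5, rho=0.8), closed_form_bound(3, 1.5, 0.8))
```

The closed form is attained in the limit $n\to\infty$ only when $2R\rho\leq 1$, so that the truncation at 1 never becomes active.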

The mixing time of a (uniformly ergodic) Markov chain is defined by

$$t_{mix}=\min\Big\{t:d_{TV}\big(\mathcal{L}(Z_{t}\,|\,Z_{0}=z),\mathcal{L}(Z_{t})\big)\leq\frac{1}{4}\Big\}.$$

Hence, using the Markov property, we can also give an upper bound on (2.3) in terms of the mixing time by simply writing $j-\tau_{m-1}-i=kt_{mix}+r$ for $k\in\mathbb{N}_{0}$ and $r\in\{0,\ldots,t_{mix}-1\}$. Then (2.3) is at most 1 if $k=0$ or if $j-\tau_{m-1}-i<0$, and at most $(2\cdot\frac{1}{4})^{k}=2^{-k}$ if $k\geq 1$. Consequently, one obtains the following upper bound for $\gamma_{\infty}$:

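$$\gamma_{\infty}\leq\tau_{m-1}+t_{mix}\cdot 1+t_{mix}\cdot\frac{1}{2}+t_{mix}\cdot\frac{1}{4}+\ldots=\tau_{m-1}+2t_{mix}.$$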
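The block structure behind this bound can be spelled out in a short sketch: the first $\tau_{m-1}$ entries of a row are bounded by 1, and each subsequent block of $t_{mix}$ entries is bounded by $2^{-k}$. The function name and the parameter values are hypothetical.

```python
def mixing_time_row_bound(n, tau, t_mix):
    """Row-sum bound: tau entries bounded by 1, then block k of length t_mix bounded by 2^-k."""
    total = 0.0
    for lag in range(n):       # lag = j - i
        shifted = lag - tau    # j - tau_{m-1} - i
        if shifted < 0:
            total += 1.0
        else:
            k = shifted // t_mix
            total += 0.5 ** k  # equals 1 for k = 0 and (2 * 1/4)^k for k >= 1
    return total


# Hypothetical values: the bound stays below tau + 2 * t_mix for every n.
for n in (10, 100, 1000):
    print(n, mixing_time_row_bound(n, tau=3, t_mix=7), 3 + 2 * 7)
```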

2.3 Covering the state space

As we consider general state spaces $S$ , we work with the following covering condition, which is satisfied in many examples.

Condition 2.3 (Covering condition).

The state space $(S,\mathfrak{S})$ is precompact. Write $\mathscr{N}=\mathscr{N}(r,S,d)$ for the $r$ -covering number of $S$ w.r.t. $d$ , i.e., for each $r>0$ , $S$ admits a covering $\{B(w_{j},r):1\leq j\leq\mathscr{N}\}$ with closed balls w.r.t. the metric $d$ of radius $r$ located at positions $w_{j}$ .

Moreover for all $r\in\mathbb{R}_{+}$ there is a sequence of scaling factors $(\eta_{n}:n\in\mathbb{N})\subseteq\mathbb{R}_{+}$ , $\eta_{n}\to\infty$ , such that

[TABLE]

for some $C_{r},\alpha_{1},\alpha_{2}\in\mathbb{R}$ .

Some remarks on the covering condition are in order. Condition (A4) is needed to bound the complexity of the underlying metric space $(S,d)$ from above. Condition (A3) is more delicate as it regulates the ratio between the number of points $n$ and the $\mu$ -volume of the $d$ -ball. Many spaces satisfy this condition. We give here two examples: the finite-dimensional case, i.e., $\mathbb{R}^{p}$ , and the functional case.

Example 2.4 (Coverings for finite-dimensional spaces).

Consider the unit cube $S=[0,1]^{p}$ which is endowed with the $\infty$ -norm and the Lebesgue measure $|\cdot|$ . For each $r>0$ , $[0,1]^{p}$ can be covered with disjoint cubes of side length at most $2r$ , i.e., balls w.r.t. the $\infty$ -norm $B(w_{j},r)=w_{j}+[-r,r]^{p}$ . In that case, one finds with geometric arguments that a ball of radius $r$ at some position $w$ can be covered with at most $2^{p}$ balls of radius $r$ at fixed positions $w_{j}$ .

In this Euclidean setting, three regimes are classically studied for random geometric complexes. In the subcritical regime $\eta_{n}^{-1}n^{1/p}\rightarrow 0$ , i.e., the scaling factors grow faster than $n^{1/p}$ . In the critical regime, this growth is balanced, so that $\eta_{n}^{-1}n^{1/p}\rightarrow\eta^{-1}\in\mathbb{R}_{+}$ . Moreover, $\eta_{n}^{-1}n^{1/p}\rightarrow\infty$ in the supercritical regime.

We study the situation for a point cloud of $n$ points $\mathbb{X}_{n}\subseteq[0,1]^{p}$ obtained from a stationary time series $X_{1},\ldots,X_{n}$ , whose marginals admit a density $\kappa$ w.r.t. the Lebesgue measure. In the subcritical regime, the points of the rescaled point cloud $\eta_{n}\mathbb{X}_{n}$ tend to become more and more isolated as the number of points per volume tends to zero. In the critical regime, the number of points per volume from $\eta_{n}\mathbb{X}_{n}$ tends to a constant. In the supercritical regime, the points of the point cloud $\eta_{n}\mathbb{X}_{n}$ become increasingly dense.

Clearly, scaling factors which achieve the critical (thermodynamic) regime (e.g. $\eta_{n}=n^{1/p}$ ) satisfy the condition from (A3) because $n|B(x,\eta_{n}^{-1}s)|\propto s^{p}$ . The covering number $\mathscr{N}(\eta_{n}^{-1}s,[0,1]^{p},\|\cdot\|_{\infty})$ is proportional to $n$ in this case.

Moreover, note that we can also allow for a slower increase in $\eta_{n}$ which then yields a supercritical regime. For instance, $\eta_{n}\propto n^{1/p}(\log n)^{-\alpha}$ still satisfies (A3) (and also (A4)) for each $\alpha>0$ .

Scaling factors which achieve a subcritical regime satisfy (A3). However, (A4) restricts the growth rate from above; for instance, any polynomial rate $\eta_{n}=n^{\alpha}$ is allowed for (A4).
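The scaling relations of Example 2.4 can be illustrated with a short computation. This is a sketch under simplifying assumptions: unit marginal density, sup-norm balls lying entirely inside the cube, and a regular grid of ball centers. The function names are hypothetical.

```python
import math


def covering_number_sup_norm(r, p):
    """Number of closed sup-norm balls of radius r, centered on a regular grid,
    that cover the unit cube [0,1]^p (each ball is a cube of side length 2r)."""
    per_axis = math.ceil(1.0 / (2.0 * r))
    return per_axis ** p


def expected_points_per_ball(n, eta_n, s, p):
    """n * |B(x, s / eta_n)| for a sup-norm ball inside the cube and unit density."""
    return n * (2.0 * s / eta_n) ** p
```

With $\eta_{n}=n^{1/p}$ the second quantity equals $(2s)^{p}$ for every $n$, and the covering number at radius $\eta_{n}^{-1}s$ grows proportionally to $n$, in line with the discussion of (A3) and (A4).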

Example 2.5 (Coverings of functional spaces).

Let $\alpha>0$ . We study the class of all functions $x$ on the unit interval that possess uniformly bounded derivatives on $(0,1)$ up to order $\underline{\alpha}$ (the greatest integer strictly smaller than $\alpha$ ) and whose highest derivatives are Hölder continuous of order $\alpha-\underline{\alpha}$ . Write $D^{k}x$ for the $k$ th derivative of a function $x$ . Define

[TABLE]

Write $C^{\alpha}_{M}([0,1])$ for the set of continuous real-valued functions on $[0,1]$ with $\left\lVert x\right\rVert_{\alpha}\leq M$ . We write $\|\cdot\|_{\infty}$ for the supremum-norm of a real-valued function on $[0,1]$ and denote the corresponding $r$ -neighborhood of $x$ by $B(x,r)$ (other norms are also possible). Then the covering number of $C^{\alpha}_{1}([0,1])$ w.r.t. $\|\cdot\|_{\infty}$ satisfies by Theorem 2.7.1 in van der Vaart and Wellner (1996)

[TABLE]

for a certain constant $c\in\mathbb{R}_{+}$ which is independent of $\varepsilon>0$ .

For each centered Gaussian measure $\mu$ on a real separable Banach space $E$ , there is a unique Hilbert space $H_{\mu}\subseteq E$ such that $\mu$ is determined by considering $(E,H_{\mu})$ as an abstract Wiener space (Cameron-Martin space, see Gross (1970)). Then $H_{\mu}$ is dense in $E$ and we have for $x\in H_{\mu}$

[TABLE]

where $C_{x}\geq 1$ , see Li and Shao (2001) Theorem 3.1. In the following, consider the centered Gaussian measure on the infinitely often differentiable functions determined by the covariance function $k(s,t)=\exp(-(s-t)^{2}/(2\ell^{2}))$ for a given characteristic length-scale $\ell>0$ . In this case, it follows from Li and Shao (2001) Theorem 4.1 and Theorem 4.5 as well as some calculations that

[TABLE]

for all $s$ sufficiently small for certain $K_{1},K_{2}\in\mathbb{R}_{+}$ .

For instance, define the scaling factor by $\eta_{n}=v(\log n)^{\beta}$ for $\beta,v>0$ . Then on the one hand, we obtain $\log\mathscr{N}(\eta_{n}^{-1}s)\leq c(v/s)^{1/\alpha}(\log n)^{\beta/\alpha}$ which shows (A4). On the other hand, relying on (2.5)

[TABLE]

If $\beta>1$ , (A3) is satisfied and we have a functional subcritical regime in that $n\mu(B(x,2\eta_{n}^{-1}s))\to 0$ ( $n\to\infty$ ).
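The entropy computation behind (A4) in this example is a direct substitution. The sketch below assumes that the covering bound has the form $\log\mathscr{N}(\varepsilon)\leq c\,\varepsilon^{-1/\alpha}$, which is the shape of Theorem 2.7.1 in van der Vaart and Wellner (1996) for functions on the unit interval, and inserts $\varepsilon=\eta_{n}^{-1}s$ with $\eta_{n}=v(\log n)^{\beta}$.

```latex
\log \mathscr{N}(\eta_{n}^{-1} s)
  \;\leq\; c \, (\eta_{n}^{-1} s)^{-1/\alpha}
  \;=\; c \, \Bigl( \frac{v (\log n)^{\beta}}{s} \Bigr)^{1/\alpha}
  \;=\; c \, (v/s)^{1/\alpha} \, (\log n)^{\beta/\alpha} .
```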

2.4 A concentration inequality for persistent Betti numbers from dependent data

We come to the first main result in this article which is an abstract exponential inequality for a certain class of functionals defined on the point clouds $\mathbb{X}_{n}$ , $n\in\mathbb{N}$ .

Theorem 2.6.

Let the stochastic process $X$ satisfy assumptions (A1) and (A2). Moreover, let (A3) and (A4) from Condition 2.3 be satisfied. Let $H$ be a functional defined on all finite subsets of $S$ . Set $H_{n}(\{x_{1},\ldots,x_{n}\})=H(\eta_{n}\{x_{1},\ldots,x_{n}\})$ . Assume that the functional satisfies the following two conditions.

(1)

Universal bound. There are $c_{1},q\in\mathbb{R}_{+}$ such that for all $n\in\mathbb{N}$ , $1\leq i\leq n$ and for all $x_{1},\ldots,x_{n},y\in S$

[TABLE]

(2)

Exchange-one costs are local. There are $c_{2},r,\widetilde{q}\in\mathbb{R}_{+}$ such that for all $n\in\mathbb{N}$ , $1\leq i\leq n$ and for all $x_{1},\ldots,x_{n},y\in S$

[TABLE]

Set $\gamma=(2a-1)/(2\widetilde{q}+1)$ and let $a>1/2$ . Then for all $n\in\mathbb{N}$ and for all $t>0$

[TABLE]

The persistent Betti function satisfies the conditions of Theorem 2.6; this follows from the Geometric Lemma (Lemma 4.3). More precisely, we have the following result.

Theorem 2.7.

Let the regularity conditions from Theorem 2.6 be satisfied. Let $q\in\mathbb{N}_{0}$ , $(r,s)$ with $r\leq s$ , $a>1/2$ and $\gamma=(2a-1)/(2q+3)$ . Then the persistent Betti number from Definition 1.1 satisfies for each $t>0$

[TABLE]

Hence, there are constants $C_{1},C_{2},C_{3}\in\mathbb{R}_{+}$ such that for each $t>0$ and $n\in\mathbb{N}$

[TABLE]

In particular, $n^{-1}(\beta_{q}^{r,s}(\mathcal{K}(\eta_{n}\mathbb{X}_{n}))-\mathbb{E}[\beta_{q}^{r,s}(\mathcal{K}(\eta_{n}\mathbb{X}_{n}))])\rightarrow 0$ $a.s.$

Yogeshwaran et al. (2017) obtain (among other results) an exponential inequality for Betti numbers computed from an $n$ -binomial process, whose marginals have a continuous and compactly supported density on $\mathbb{R}^{d}$ . Their result, and in particular their rate, is very similar to ours.

The abstract concentration inequality in Theorem 2.6 can be considered as a generalization of McDiarmid’s inequality for functionals of dependent random vectors whose martingale differences are not necessarily bounded. Technically, it relies on the Marton coupling and an abstract concentration inequality of Chalker et al. (1999) for martingale differences as well as on the observation that point cloud data from the data generating process tends to spread evenly across the state space $S$ . So exchanging one point in the argument of the functional as in condition (2) of Theorem 2.6 tends to have a much smaller impact than the worst case bound in (2.6).

Many important functionals in stochastic geometry do not possess a deterministic local exchange-one cost function as required in Theorem 2.6. Instead, these functionals are often stabilizing (at an exponential rate). We refer to Penrose and Yukich (2001) and Lachièze-Rey et al. (2019) for an introduction and examples of stabilizing functionals. Loosely speaking, stabilization implies that the exchange-one cost function is determined by the points in a “small” $r$ -neighborhood of the exchanged points with high probability. So the abstract result in Theorem 2.6 will also be relevant for such stabilizing functionals because usually these can be truncated in such a way that the error is negligible for large sample sizes and that the exchange-one cost of the truncated functional actually satisfies condition (2) of Theorem 2.6. Of course, this remark is only an outline of how to apply Theorem 2.6 to such functionals and we leave it to future research to prove this claim rigorously.

2.5 Convergence results in Euclidean space

It remains to show that the normalized expectation of the persistent Betti numbers converges to a limit. Here we restrict our considerations to point cloud data on $\mathbb{R}^{p}$ for the following reason: In order to obtain limit theorems for persistent Betti numbers in the critical regime from dependent data realized on a general measure space such as a manifold or a function space, a possible way would be to first derive the limit of $n^{-1}\mathbb{E}[\beta_{q}^{r,s}(\mathcal{K}(\eta_{n}\mathcal{P}_{n}))]$ for a certain underlying Poisson process $\mathcal{P}_{n}$ on this space (see Last and Penrose (2017) for the notion of a Poisson process on a general space). In this step, the topological properties of the underlying space are of course crucial. Second, one needs to apply a de-Poissonization argument to obtain a limit for the binomial process which treats the situation for an i.i.d. sample. Finally, as we will see from the applied techniques in the proofs, the nearly additive properties of persistent Betti numbers in the critical regime enable us to use certain continuity results and then allow us to conclude the case for dependent data. This entire procedure is quite comprehensive. So far, to the best of our knowledge, these extensions have only been considered for manifolds (Goel et al. (2018)) in the literature. For this reason we have decided to limit our considerations to $\mathbb{R}^{p}$ -valued data.

We study two kinds of processes $X=(X_{t}:t\in\mathbb{Z})\subseteq[0,1]^{p}$ in the critical regime, namely, (1) processes which can be coupled to a process with a discrete state space and (2) Markov chains of finite order.

First consider a process $X$ which is obtained from a stationary discrete process $Z=(Z_{t}:t\in\mathbb{Z})$ as follows. Let $\kappa$ be a blocked density w.r.t. the Lebesgue measure on $[0,1]^{p}$ , i.e., there is an $m\in\mathbb{N}$ such that

[TABLE]

and where the subcubes $A_{i}$ partition $[0,1]^{p}$ . Note that the $A_{i}$ may have different volumes.

We assume that the process $Z$ admits a Marton coupling which satisfies (A2) and takes values in a finite set $S=\{s_{1},\ldots,s_{m^{p}}\}$ , $s_{i}\neq s_{j}$ for $1\leq i\neq j\leq m^{p}$ , such that $\mathbb{P}(Z_{t}=s_{i})=\alpha_{i}|A_{i}|$ . Define $X_{t}$ with the help of $Z_{t}$ by

[TABLE]

where the $Y_{t,i}$ are independent and uniformly distributed on $A_{i}$ for $t\in\mathbb{Z}$ and $1\leq i\leq m^{p}$ . Then if $B\subseteq A_{i}$ , $\mathbb{P}(X_{t}\in B)=(|B|/|A_{i}|)\,(\alpha_{i}|A_{i}|)=\alpha_{i}|B|$ . Hence, each $X_{t}$ admits a marginal density $\kappa$ .

Then the conditional distribution of the process $X$ works as follows. In the first step and conditional on the past $(X_{1},\ldots,X_{t-1})$ , we choose a subcube $A_{i}$ , according to

[TABLE]

In the second step, we choose at random a point in the subcube $A_{i}$ as the realization of $X_{t}$ .
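The two-step scheme can be sketched in code. The example below is illustrative only: it takes the sequence of hidden cell indices as given (the first step, i.e., the law of $Z$, is not modeled here) and implements the second step, a uniform draw inside the selected subcube. Cells are represented as pairs of lower and upper corners; all names are hypothetical.

```python
import random


def cell_volume(cell):
    """Lebesgue volume of an axis-aligned cell given as (lower, upper) corners."""
    lower, upper = cell
    vol = 1.0
    for lo, hi in zip(lower, upper):
        vol *= hi - lo
    return vol


def sample_from_cell(cell, rng):
    """Draw one point uniformly from an axis-aligned cell."""
    lower, upper = cell
    return tuple(rng.uniform(lo, hi) for lo, hi in zip(lower, upper))


def emit_observations(hidden_states, cells, seed=0):
    """Second step of the scheme: given hidden indices z_t, draw X_t uniformly
    in the cell A_{z_t}. The hidden sequence itself is an input (assumption)."""
    rng = random.Random(seed)
    return [sample_from_cell(cells[z], rng) for z in hidden_states]
```

For a blocked density with levels $\alpha_{i}$ on cells $A_{i}$, the cell probabilities $\alpha_{i}|A_{i}|$ must sum to one, which can be checked with `cell_volume`.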

Consequently, $X=(X_{t}:t\in\mathbb{Z})$ admits a Marton coupling which satisfies (A2). The conditional distribution of $X$ is invariant in the sense that

[TABLE]

for all $x_{i},y_{i}\in A_{i}$ , $i=t-s,\ldots,t-1$ , $\ell,s\in\mathbb{N}$ , $t-s\geq 1$ . If we can only observe the process $X$ , then we can think of $Z$ as a hidden process. We have the following theorem.

Theorem 2.8.

Let $X=(X_{t}:t\in\mathbb{Z})$ be a $[0,1]^{p}$ -valued process which admits a Marton coupling that satisfies (A2). Each $X_{t}$ has a marginal density $\kappa$ as in (A5) and (A6) such that $0<\inf\kappa\leq\sup\kappa<\infty$ . Then for each $0\leq q\leq p-1$

[TABLE]

where $\hat{b}_{q}(r,s)$ is the limit of $n^{-1}\mathbb{E}\left[\beta^{r,s}_{q}(\mathcal{K}(n^{1/p}\mathbb{X}^{*}_{n}))\right]$ for an $n$ -binomial process $\mathbb{X}^{*}_{n}$ with unit density on $[0,1]^{p}$ .

So the expectation of the persistent Betti number obtained from this kind of time series has the same limit properties as the corresponding binomial process.

We extend Theorem 2.8 to general marginal density functions $\kappa\colon[0,1]^{p}\to\mathbb{R}_{+}$ which can be approximated by blocked density functions $\kappa_{\varepsilon}$ . To this end, we restrict ourselves to the case of uniformly ergodic Markov chains of order $m$ , viz., $X=(X_{t}:t\in\mathbb{Z})$ is a stationary process such that $\mathcal{L}(X_{t}|X_{u}:u<t)=\mathcal{L}(X_{t}|X_{t-1},\ldots,X_{t-m})$ , for some $m\in\mathbb{N}$ . For such a Markov chain all transition probabilities are determined by the joint density $g$ of $X_{1},\ldots,X_{m+1}$ which is assumed to be continuous and strictly positive on $[0,1]^{(m+1)p}$ in that $\inf\{g(z):z\in[0,1]^{(m+1)p}\}>0$ . It is known that this kind of aperiodic restriction ensures that the Markov chain $X$ is uniformly geometrically ergodic, see also Meyn and Tweedie (2012) Theorem 16.0.2.

Furthermore, the limit on the right-hand side in (2.9) is continuous: Indeed, Divol and Polonik (2019) show that the limit

[TABLE]

exists, where $\kappa_{\varepsilon}$ are blocked density functions on $[0,1]^{p}$ (from a regular grid) as in (A5) which converge to $\kappa$ in the $\|\cdot\|_{\infty}$ -norm and where the $Y_{\varepsilon}$ (resp. $Y$ ) have density $\kappa_{\varepsilon}$ (resp. $\kappa$ ).

For this kind of Markov chains $X$ we obtain from the previous Theorem 2.8 the following result.

Theorem 2.9.

Let $X$ be a homogeneous Markov chain of order $m$ taking values in $[0,1]^{p}$ such that the joint density $g$ of $X_{1},\ldots,X_{m+1}$ is continuous and satisfies $\inf\{g(z):z\in[0,1]^{(m+1)p}\}>0$ . The $X_{t}$ have marginal density $\kappa$ . Then for each $q=0,\ldots,p-1$ the convergence results from (2.9) and (2.10) in Theorem 2.8 are valid.

Consequently, we also obtain the well-known limit for this natural generalization of the binomial process. The generalization to arbitrary stationary processes $X$ which admit a Marton coupling is rather elaborate and complex. Actually, when following the current scheme of the proof, one first has to assume that this process $X$ can be coupled to a discrete process $\widetilde{X}$ which approximates $X$ sufficiently closely in terms of the conditional distribution functions. This would mainly result in a complex notation. For this reason, we have limited our considerations to processes whose conditional distributions only depend on $m$ lags of their past; this is sufficient for many applications and also serves as an approximation to the general case.

We conclude with an immediate result which follows from Theorem 2.8 and Theorem 2.9 and the work of Hiraoka et al. (2018) concerning the vague convergence of persistence diagrams.

Corollary 2.10 (Vague convergence of persistence diagrams obtained from dependent data).

Let the assumptions of Theorem 2.8 or Theorem 2.9 be satisfied. Then for each $0\leq q\leq p-1$ , there is a Radon measure $\xi_{q}$ depending on $\kappa$ such that $\mathbb{E}\left[\xi_{n,q}\right]\overset{v_{c}}{\to}\xi_{q}$ and $\xi_{n,q}\overset{v_{c}}{\to}\xi_{q}$ $a.s.$ as $n\to\infty$ .

3 Extensions to random fields

We extend the theory from above to random fields in two settings, which correspond to the situations discussed in Theorems 2.8 and 2.9 for the time series case.

The extension to random fields requires mainly notational changes. We consider stationary random fields indexed by the regular $d$ -dimensional lattice $\mathbb{Z}^{d}$ . The main difference is the ordering of the data which we assume to be located in the subset $\mathbb{N}^{d}$ . If $u,v\in\mathbb{N}^{d}$ are two positions on the lattice, we write $u\geq v$ ( $u\leq v$ ) if and only if $u_{i}\geq v_{i}$ ( $u_{i}\leq v_{i}$ ) for all $i\in\{1,\ldots,d\}$ . Moreover, we construct a total ordering on $\mathbb{N}^{d}$ with the $\ell^{1}$ -norm as follows. Let $u,v\in\mathbb{N}^{d}$ , then

[TABLE]

where $\|u_{1}^{k}\|_{1}=|u_{1}|+\ldots+|u_{k}|$ . The relations $>_{d},\leq_{d},\geq_{d}$ follow in the same spirit.

For a vector $N=(N_{1},\ldots,N_{d})\in\mathbb{N}^{d}$ , we denote the cardinality of the corresponding $d$ -cube $\prod_{i=1}^{d}\{1,\ldots,N_{i}\}$ by $\pi(N)$ . For a given random field $X=(X_{u}:u\in\mathbb{Z}^{d})$ and an $N\in\mathbb{N}^{d}$ , we write $\mathbb{X}_{N}$ for the associated point cloud $\{X_{u}:u\leq N\}$ , which represents the sample data. In the following, we will consider only such $N\in\mathbb{N}^{d}$ which satisfy

[TABLE]

for some constant $\bar{c}\in\mathbb{R}_{+}$ . We write $N\to\infty$ for a sequence $N(k)\subseteq\mathbb{N}^{d}$ which satisfies the relation (3.2) for each $k$ and also fulfills $\max\{N_{i}(k):i\in\{1,\ldots,d\}\}\to\infty$ as $k\to\infty$ .

Consider a Marton coupling $((X_{u},X^{\prime}_{u}):u\in\mathbb{N}^{d})$ of a stationary random field on $\mathbb{N}^{d}$ . We define the mixing matrix $\Gamma^{(\infty)}=(\Gamma^{(\infty)}_{u,v})_{u,v\in\mathbb{N}^{d}}$ w.r.t. the ordering $>_{d}$ . The row corresponding to location $u$ in the mixing matrix $\Gamma^{(\infty)}$ is given by

[TABLE]

where $v\geq_{d}u$ and where $\mu_{u}$ is defined in the same spirit as $\mu_{i}$ in (2.1).

We study the structure of the entries of the mixing matrix in a simple example. Consider a stationary random field $X$ on the lattice $\mathbb{Z}^{2}$ whose joint distribution can entirely be described by four (conditional) densities $f_{(0,0)}=\kappa,f_{(0,1)},f_{(1,0)}$ and $f_{(1,1)}$ . This means that for any $N\in\mathbb{N}^{2}$ the joint distribution of $\{X_{u}:u\leq N\}$ can be simulated with these four (conditional) densities and we can do this also using the ordering $<_{d}$ , beginning at the corner point (1,1). So we first simulate $X_{(1,1)}$ according to $\kappa=f_{(0,0)}$ . All observations $X_{(1,t)}$ for $1<t\leq N_{2}$ (resp. $X_{(t,1)}$ for $1<t\leq N_{1}$ ) are simulated with $f_{(0,1)}$ (resp. $f_{(1,0)}$ ). All remaining observations are simulated with the conditional density $f_{(1,1)}$ . Figure 1 illustrates the scheme.
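The bookkeeping of this simulation scheme, i.e., which (conditional) density is used at which lattice site, can be sketched as follows. The index rule ($s_{j}=1$ exactly when the $j$ th coordinate exceeds 1) is read off from the description above. The visiting order used here (sort by $\ell^{1}$ -norm, then lexicographically) is an assumption standing in for the ordering $<_{d}$; it only guarantees that componentwise smaller sites are visited first. Function names are hypothetical.

```python
from itertools import product


def density_index(u):
    """Return s in {0,1}^d with s_j = 1 exactly when the coordinate u_j exceeds 1.

    Site (1,...,1) gets s = (0,...,0), i.e. the marginal density kappa.
    """
    return tuple(1 if uj > 1 else 0 for uj in u)


def simulation_plan(N):
    """List (site, density index) for all sites u <= N.

    Sites are visited by increasing l1-norm with lexicographic tie-breaking
    (an illustrative stand-in for the total ordering used in the paper).
    """
    sites = product(*(range(1, Ni + 1) for Ni in N))
    ordered = sorted(sites, key=lambda u: (sum(u), u))
    return [(u, density_index(u)) for u in ordered]
```

The same rule applies verbatim on a $d$ -dimensional lattice, where it selects one of $2^{d}$ densities.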

Consider a location $u$ in the lattice and a configuration of the Marton coupling which agrees at all locations of the past of $u$ w.r.t. $>_{d}$ (all locations $v$ with $u>_{d}v$ ). Consider a point $v$ in the future of $u$ w.r.t. $>_{d}$ (all locations $v$ with $v>_{d}u$ ). Then the distributions of $X_{v}^{(x_{w}:w<_{d}u,x_{u},x^{\prime}_{u})}$ and ${X^{\prime}}_{v}^{(x_{w}:w<_{d}u,x_{u},x^{\prime}_{u})}$ are affected by the different configurations at location $u$ if and only if $v\geq u$ . Hence, (3.3) is only affected by the locations $v$ which satisfy $v\geq u$ , which is a strict subset of all those locations $v$ which satisfy $v\geq_{d}u$ .

We come to the description of the dependence patterns. First we consider again the blocked density function from (A5) and proceed as in the case for time series. Let $Z=(Z_{u}:u\in\mathbb{Z}^{d})$ be a stationary random field on the regular $d$ -dimensional lattice. The state space of $Z$ is discrete, i.e., $S=\{s_{1},\ldots,s_{m^{p}}\}$ , $s_{i}\neq s_{j}$ , for $1\leq i\neq j\leq m^{p}$ , such that $\mathbb{P}(Z_{u}=s_{i})=\alpha_{i}|A_{i}|$ . Also $Z$ admits a Marton coupling whose mixing matrix $\Gamma^{(\infty)}$ from (3.3) satisfies (similarly to (A2))

[TABLE]

Define a new random field $X=(X_{u}:u\in\mathbb{Z}^{d})$ with the help of $Z$ by

[TABLE]

where the $Y_{u,i}$ are independent and uniformly distributed on $A_{i}$ for $u\in\mathbb{Z}^{d}$ and $1\leq i\leq m^{p}$ . Then if $B\subseteq A_{i}$ , $\mathbb{P}(X_{u}\in B)=(|B|/|A_{i}|)\,(\alpha_{i}|A_{i}|)=\alpha_{i}|B|$ . In particular, each $X_{u}$ has a density $\kappa$ on $[0,1]^{p}$ . Also all other properties from the time series case are inherited. So, we have once more an invariance property as in (2.8): For $N\in\mathbb{N}^{d}$ and $u,z_{1},z_{2}\leq N$ such that $z_{2}<_{d}u<_{d}z_{1}$

[TABLE]

for all $x_{w},y_{w}\in A_{i_{w}}$ , $1\leq i_{w}\leq m^{p}$ , for all $w$ such that $z_{2}\leq_{d}w<_{d}u$ and $w\leq N$ .

Consequently, we obtain the following generalized variant of Theorem 2.8.

Theorem 3.1.

Let $X=(X_{u}:u\in\mathbb{Z}^{d})$ be a $[0,1]^{p}$ -valued random field on $\mathbb{Z}^{d}$ , which admits a Marton coupling that satisfies (A7) w.r.t. $>_{d}$ for each $N\in\mathbb{N}^{d}$ . Each $X_{u}$ has a marginal density $\kappa$ as in (A8) such that $0<\inf\kappa\leq\sup\kappa<\infty$ . Then for each $0\leq q\leq p-1$

[TABLE]

We refer to Chazottes et al. (2007); Külske (2003) who consider couplings for high-temperature Gibbs measures for the discrete random field $Z$ whose components take values in $\{-,+\}$ . Given certain upper bounds on the dependence within the random field, they obtain for the two-state Gibbs model a coupling $(Z,Z^{\prime})$ which satisfies

[TABLE]

for a certain constant $C\in\mathbb{R}_{+}$ . So the probability of an unsuccessful coupling decays exponentially fast in the $\ell^{1}$ -distance on the lattice, which is the minimal number of edges between $x$ and $y$ w.r.t. the standard $2^{d}$ -neighborhood structure. In particular, the Marton coupling $(Z,Z^{\prime})$ satisfies (A7).

For a generalization of Theorem 2.9 we need a decay assumption on the mixing matrices. In the case of a Markov chain of finite order, Theorem 16.0.2 in Meyn and Tweedie (2012) states that strictly positive and continuous conditional densities ensure uniform geometric ergodicity. So concerning the Marton coupling, we obtain a mixing matrix whose entries in each row decay at an exponential rate.

For random fields the situation is far more complex. To this end, we restrict ourselves to stationary Markov random fields $X$ of order 1 w.r.t. the $2^{d}$ neighborhood structure of the regular lattice $\mathbb{Z}^{d}$ whose joint distribution can be described with $2^{d}$ (conditional) density functions

[TABLE]

is the marginal density. More precisely, the distribution can be modeled with a scheme as in Figure 1, however, on a $d$ -dimensional lattice. The conditional density $f_{s}$ describes the transition within the set $\{z\in\mathbb{Z}^{d}:z_{j}=0\text{ for }j\in J(s)\}$ , where $J(s)=\{1\leq j\leq d:s_{j}=0\}$ .

We give an example for a cube $\mathcal{C}_{N}=\{u\in\mathbb{N}^{d}:u\leq N\}$ . First we can simulate the random variable $X_{(1,\ldots,1)}$ in the lower left corner according to $\kappa=f_{0}$ . Let $e_{k}$ be the standard basis elements of $\mathbb{R}^{d}$ for $k=1,\ldots,d$ , i.e., the vector whose $k$ th entry is 1 and 0 otherwise. Then the conditional densities $f_{e_{k}}$ describe the transition on the coordinate axes of the cube. Similarly, with the remaining functions $f_{s}$ , $s\neq(1,\ldots,1)$ , we can completely simulate the transition on the lower envelope of the cube, i.e., the locations which are zero in at least one coordinate. Finally, the conditional density $f_{(1,\ldots,1)}$ describes the transition to those locations $u$ which are nonzero in each entry.

It is an important fact that due to the Markov structure we can factorize the distribution of the random field on $\mathcal{C}_{N}$ with these conditional densities and use the ordering $>_{d}$ at the same time.

In contrast to the one-dimensional situation of a Markov chain, it is this time not enough that the conditional densities from (A9) are strictly positive in order to ensure a successful Marton coupling. To this end, we assume that the dependence within $X=\{X_{u}:u\in\mathbb{Z}^{d}\}$ decays at a polynomial rate in the sense that

[TABLE]

where $\delta>3(d-1)$ , $\|u\|_{\max}=\max\{|u_{i}|:i=1,\ldots,d\}$ and where $\Gamma^{(\infty)}$ is the mixing matrix of the entire random field. Note that a uniform exponential decay as in (3.6) is obviously sufficient for (A10). Note also that due to the factorization property of $X$ from (A9), the mixing matrix at position $(u,v)$ is nontrivial if and only if $v\geq u$ . Also due to stationarity, it is entirely determined by the entries $\Gamma^{(\infty)}_{(1,\ldots,1),v}$ , $v\geq(1,\ldots,1)$ . Using this last condition on the decay, we conclude with a generalized convergence result for persistent Betti numbers obtained from Markov random fields.

Theorem 3.2.

Let the stationary random field $X$ be given by the (conditional) density functions $f_{s}$ , $s\in\{0,1\}^{d}$ , from (A9), which are all continuous. Each $f_{s}$ is strictly positive on $[0,1]^{p}\times[0,1]^{\|s\|_{1}p}$ in that $\inf_{x,y}f_{s}(x|y)>0$ . Let the mixing matrix of $X$ satisfy (A10). Then $X$ fulfills the convergence results from (3.4) and (3.5).

4 Technical results

4.1 Helpful tools

Before we come to the proofs of the central results, we start with some auxiliary results.

Lemma 4.1 (Concentration inequality for bounded transition kernels).

Let $Z=\{Z_{i}:i\in\mathbb{N}\}$ be a sequence whose components $Z_{i}$ take values in the measure space $(S,\mathfrak{S},\mu)$ . Moreover assume that the conditional distributions $\mathcal{L}(Z_{i}|Z_{j}=z_{j},1\leq j<i)$ admit a conditional density $f_{i}$ . These densities are uniformly bounded in the sense that the first part of (A1) holds, i.e.,

[TABLE]

Let $(B_{n}:n\in\mathbb{N})\subseteq S$ be a sequence of measurable sets such that $\limsup_{n\rightarrow\infty}n\mu(B_{n})(\log n)^{-\alpha}\leq c^{*}$ , for certain $\alpha,c^{*}\in\mathbb{R}_{+}$ . Then there is a constant $c_{1}\in\mathbb{R}_{+}$ such that for all $n\in\mathbb{N}$ and $t\in\mathbb{R}_{+}$

[TABLE]

In particular, let $Z$ be an $\mathbb{R}^{p}$ -valued homogeneous Markov chain which admits uniformly bounded conditional densities. For each $n$ , let $B_{n}=B(x,r_{n})$ be the $r_{n}$ -neighborhood of a point $x\in\mathbb{R}^{p}$ w.r.t. the Euclidean distance such that $nr_{n}^{p}\rightarrow c^{*}$ . Then (4.1) holds with $\alpha=0$ .

Proof.

First, we bound the Laplace transform of $\,\mathds{1}\!\left\{Z_{i}\in B_{n}\right\}$ w.r.t. $\mathcal{F}_{i-1}$ , where $\{\mathcal{F}_{i}\}_{i}$ is the natural filtration of the process $Z$ with $\mathcal{F}_{0}$ being trivial. We have

[TABLE]

Thus, we obtain for the entire process $\mathbb{P}(\sum_{i=1}^{n}\,\mathds{1}\!\left\{Z_{i}\in B_{n}\right\}>t)\leq\exp(-t+(e-1)f^{*}n\mu(B_{n}))$ , using Markov’s inequality. This finishes the proof because $\limsup_{n\rightarrow\infty}n\mu(B_{n})(\log n)^{-\alpha}\leq c^{*}$ . ∎
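The exponential bound from the proof can be compared with an exact tail in the simplest special case. The sketch below assumes i.i.d. observations with density bounded by $f^{*}=1$, so that the count $\sum_{i}\mathds{1}\{Z_{i}\in B_{n}\}$ is binomial with success probability $\mu(B_{n})$; this independence assumption is for illustration only and is not required by the lemma. Function names are hypothetical.

```python
import math


def binomial_tail(n, p, t):
    """Exact P(Bin(n, p) > t) for a real threshold t >= 0."""
    k_min = math.floor(t) + 1
    return sum(
        math.comb(n, k) * p**k * (1.0 - p) ** (n - k)
        for k in range(k_min, n + 1)
    )


def laplace_bound(n, mass, t, f_star=1.0):
    """Bound exp(-t + (e - 1) * f_star * n * mass) obtained from the
    Laplace transform and Markov's inequality."""
    return math.exp(-t + (math.e - 1.0) * f_star * n * mass)
```

For instance, with $n=200$ and $\mu(B_{n})=0.01$ the exact tail stays below the bound for all thresholds, and the bound decays like $e^{-t}$ once $t$ exceeds $(e-1)f^{*}n\mu(B_{n})$.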

The next lemma is a generalization of Lemma 3.1 in Yogeshwaran et al. (2017).

Lemma 4.2.

Let $j\in\mathbb{N}$ and $X=\{X_{t}:t\in\mathbb{Z}\}$ be a process which takes values in a measure space $(S,\mathfrak{S},\mu)$ with a non-atomic measure $\mu$ . Let $\{v_{1},\ldots,v_{\ell}\}$ be a set of $\ell\leq j$ distinct natural numbers. Assume that the distribution of the vector $(X_{v_{1}},\ldots,X_{v_{\ell}})$ , when conditioned on another observation $X_{i}$ , $i\notin\{v_{1},\ldots,v_{\ell}\}$ , admits a density $f_{(X_{v_{1}},\ldots,X_{v_{\ell}})|X_{i}}$ w.r.t. $\mu^{\otimes\ell}$ . Assume that these densities are uniformly essentially bounded in the sense that the second part of (A1) holds, i.e.,

[TABLE]

Then for each $A\in\mathfrak{S}$ and for each $r>0$ it is true that

[TABLE]

In particular, if $S$ is a subset of $\mathbb{R}^{p}$ , $\eta_{n}=n^{1/p}$ and $\mu$ equals the Lebesgue measure, then (4.2) is of constant order $O(|A|r^{pj})$ and (4.3) is of order $O(r^{pj})$ .

Proof.

We only prove the statement in (4.2); the statement in (4.3) follows in the same fashion. The first inequality in (4.2) is obvious. Thus, we only show the second one. Observe that

[TABLE]

because the distance between any two points in a $j$ -simplex in the Čech or the Vietoris-Rips complex $K_{j}(\eta_{n}\mathbb{X}_{n},r;A)$ is at most $2r$ . On the one hand,

[TABLE]

and also $\#\{(u_{1},\ldots,u_{j})\in\{1,\ldots,n\}^{j}:i\neq u_{\ell}\neq u_{\ell^{\prime}}\neq i,\>1\leq\ell,\ell^{\prime}\leq j\}\leq n^{j}.$ On the other hand

[TABLE]

Combining (4.5), (4.6) with (4.4) yields the conclusion. ∎

The following result is well-known to topologists.

Lemma 4.3 (Geometric Lemma, Lemma 2.11 in Hiraoka et al. (2018)).

Let $\mathbb{X}\subseteq\mathbb{Y}$ be two finite point sets in $S$ . Then $|\beta^{r,s}_{q}(\mathcal{K}(\mathbb{Y}))-\beta^{r,s}_{q}(\mathcal{K}(\mathbb{X}))|\leq\sum_{j=q}^{q+1}|K_{j}(\mathbb{Y},s)\setminus K_{j}(\mathbb{X},s)|.$
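The Geometric Lemma can be illustrated in the simplest case $q=0$ and $r=s$, where the persistent Betti number reduces to the number of connected components of the complex at scale $s$. The sketch below uses the Vietoris-Rips complex with the convention that two points are joined when their distance is at most $2s$; the function names are hypothetical and the code is not meant as an efficient implementation.

```python
from itertools import combinations
import math


def rips_simplices(points, s, dim):
    """All dim-simplices of the Vietoris-Rips complex at scale s:
    vertex sets of size dim + 1 with pairwise distances at most 2s."""
    pts = sorted(set(points))
    return {
        simplex
        for simplex in combinations(pts, dim + 1)
        if all(math.dist(a, b) <= 2.0 * s for a, b in combinations(simplex, 2))
    }


def betti_0(points, s):
    """Number of connected components of the Rips graph at scale s (union-find)."""
    pts = sorted(set(points))
    parent = {x: x for x in pts}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in rips_simplices(pts, s, 1):
        parent[find(a)] = find(b)
    return len({find(x) for x in pts})


def geometric_lemma_rhs(small, large, s, q):
    """Number of q- and (q+1)-simplices present for `large` but not for `small`,
    where `small` is assumed to be a subset of `large`."""
    return sum(
        len(rips_simplices(large, s, j) - rips_simplices(small, s, j))
        for j in (q, q + 1)
    )
```

Adding a single point that bridges two components changes the Betti number by one, while the right-hand side counts the new vertex together with its new edges.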

4.2 Technical details on Section 2

We come to the proof of Theorem 2.6. Similarly to Yogeshwaran et al. (2017), we use a result of Chalker et al. (1999) to establish an exponential inequality without the need to bound the martingale differences in the supremum-norm.

Proof of Theorem 2.6.

Consider the natural filtration of the process $X$ , $\mathcal{F}_{i}=\sigma(X_{1},\ldots,X_{i})$ for $i=0,\ldots,n$ with the convention that $\mathcal{F}_{0}=\{\emptyset,\Omega\}$ . We rewrite $H_{n}(\mathbb{X}_{n})-\mathbb{E}\left[H_{n}(\mathbb{X}_{n})\right]$ in terms of martingale differences as follows

[TABLE]

where $V_{n,i}=\mathbb{E}\left[H_{n}(\mathbb{X}_{n})|\mathcal{F}_{i}\right]-\mathbb{E}\left[H_{n}(\mathbb{X}_{n})|\mathcal{F}_{i-1}\right]$ . An abstract result of Chalker et al. (1999) yields

[TABLE]

for any $b_{1},b_{2}\in\mathbb{R}_{+}$ . Hence, it remains to compute bounds of $V_{n,i}$ . In all cases, we have the universal bound from (2.6). So, $\left\lVert V_{n,i}\right\rVert_{\mathbb{P},\infty}\leq c_{1}n^{q}$ .

Next, we investigate the probabilities on the right-hand side of (4.8). Define for $a\in S$ and $i\in\{1,\ldots,n\}$

[TABLE]

Write $\nu_{i}$ for the conditional distribution of $X_{i}$ given $(X_{1},\ldots,X_{i-1})$ on $S$ , viz.,

[TABLE]

Then, it follows with elementary calculations that for each $1\leq i\leq n$

[TABLE]

Let $\varepsilon>0$ be arbitrary but fixed. Choose $a^{*},b^{*}\in S$ such that $I_{n,i}(a^{*})\geq\operatorname*{ess\,sup}_{\text{ w.r.t.\ }\nu_{i}}I_{n,i}(\cdot)-\varepsilon/2$ and $I_{n,i}(b^{*})\leq\operatorname*{ess\,inf}_{\text{ w.r.t.\ }\nu_{i}}I_{n,i}(\cdot)+\varepsilon/2$ . Consider the Marton coupling of $(X_{1},\ldots,X_{n})$ and write $\mathbb{X}_{n}^{(X_{1},\ldots,X_{i-1},a^{*},b^{*})}$ for the point cloud associated to the coupling element $X^{(X_{1},\ldots,X_{i-1},a^{*},b^{*})}$ . The notation ${\mathbb{X}^{\prime}}_{n}^{(X_{1},\ldots,X_{i-1},a^{*},b^{*})}$ is used for the point cloud obtained from the counterpart $X^{\prime(X_{1},\ldots,X_{i-1},a^{*},b^{*})}$ . Consequently,

[TABLE]

(by abusing the notation slightly). We write $\mathbb{Y}_{n,i}=\{X_{1}^{i-1},a^{*},y_{i+1}^{n}\}$ and $\mathbb{Y}^{\prime}_{n,i}=\{X_{1}^{i-1},b^{*},{y^{\prime}}_{i+1}^{n}\}$ and consider the difference of the functionals in (LABEL:Eq:AbstractExpIneq4). The point clouds $\mathbb{Y}_{n,i}$ and ${\mathbb{Y}^{\prime}}_{n,i}$ in (LABEL:Eq:AbstractExpIneq4) differ at most in $n-i+1$ entries for each $i$ . These entries are $\{a^{*},y_{i+1},\ldots,y_{n}\}$ and $\{b^{*},y^{\prime}_{i+1},\ldots,y^{\prime}_{n}\}$ . Thus, we can transform $\mathbb{Y}_{n,i}$ into ${\mathbb{Y}^{\prime}}_{n,i}$ in $n-i+1$ steps exchanging one entry in each step, i.e., we consider the transformations

[TABLE]

where $\mathbb{Y}_{n,i}^{(\ell)}=\{X_{1}^{i-1},b^{*},{y^{\prime}}_{i+1}^{i+\ell-2},y_{i+\ell-1}^{n}\}$ , for $\ell\in\{2,\ldots,n-i+2\}$ .
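The chain of intermediate point clouds can be written out explicitly. The following sketch represents point clouds as ordered tuples (rather than sets) so that positions can be compared, and builds $\mathbb{Y}_{n,i}^{(1)},\ldots,\mathbb{Y}_{n,i}^{(n-i+2)}$ from the common prefix, the points $a^{*},b^{*}$ and the two tails. The function name and argument names are hypothetical.

```python
def exchange_path(prefix, a_star, b_star, y, y_prime):
    """Return the clouds Y^(1), ..., Y^(m+2) as tuples, where m = len(y) = n - i.

    Y^(1) = (prefix, a*, y) and, for l >= 2,
    Y^(l) = (prefix, b*, first l-2 entries of y', remaining entries of y).
    """
    if len(y) != len(y_prime):
        raise ValueError("y and y_prime must have the same length")
    prefix, y, y_prime = tuple(prefix), tuple(y), tuple(y_prime)
    path = [prefix + (a_star,) + y]
    for l in range(2, len(y) + 3):
        swapped = l - 2  # number of tail entries already replaced by y'
        path.append(prefix + (b_star,) + y_prime[:swapped] + y[swapped:])
    return path
```

Consecutive clouds in the returned list differ in exactly one position, the first cloud equals $\mathbb{Y}_{n,i}$ and the last equals $\mathbb{Y}^{\prime}_{n,i}$, so the path has $n-i+1$ exchange steps.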

Using this definition, the difference of the functionals in (LABEL:Eq:AbstractExpIneq4) is bounded above by

[TABLE]

The symmetric difference $\mathbb{Y}_{n,i}^{(\ell+1)}\triangle\mathbb{Y}_{n,i}^{(\ell)}$ is at most $\{a^{*},b^{*}\}$ for $\ell=1$ and $\{y_{\ell+i-1},y^{\prime}_{\ell+i-1}\}$ for $\ell\in\{2,\ldots,n-i+1\}$ . Let $\ell\in\{2,\ldots,n-i+1\}$ ; clearly, if $y_{\ell+i-1}=y^{\prime}_{\ell+i-1}$ , then $|H_{n}(\mathbb{Y}_{n,i}^{(\ell+1)})-H_{n}(\mathbb{Y}_{n,i}^{(\ell)})|=0$ . If $y_{\ell+i-1}\neq y^{\prime}_{\ell+i-1}$ , we can use condition (2) of Theorem 2.6, which states that the exchange-one costs are local, to obtain

[TABLE]

A similar argument applies to $|H_{n}(\mathbb{Y}_{n,i}^{(2)})-H_{n}(\mathbb{Y}_{n,i}^{(1)})|$, which admits the same bound as in (4.12) using the points $a^{*},b^{*}$.

Write $\mathscr{N}(r)=\mathscr{N}(r,S,d)$ for the $r$ -covering number of $S$ w.r.t. $d$ from Condition 2.3. Use the family of coverings $\{\{B(w_{j},r):1\leq j\leq\mathscr{N}(r)\}:r>0\}$ to define for each $r>0$ and each $u>0$ the set

[TABLE]

Loosely speaking, when considering only sets in $\mathbb{A}_{n,u}(r)$, we can control the degree of accumulation within a point cloud on $S$ if we choose $u\ll n$ for some fixed $r$. Using the definition of the set $\mathbb{A}_{n,u}(r)$, (LABEL:Eq:AbstractExpIneq4) is at most

[TABLE]

Clearly, if both $\mathbb{Y}_{n,i}$ and ${\mathbb{Y}^{\prime}}_{n,i}$ are in $\mathbb{A}_{n,u}(\eta_{n}^{-1}r)$ , then each point cloud of the type $\mathbb{Y}_{n,i}^{(\ell)}$ and $\mathbb{Y}_{n,i}^{(\ell+1)}\cup\mathbb{Y}_{n,i}^{(\ell)}$ is in $\mathbb{A}_{n,2u}(\eta_{n}^{-1}r)$ .
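The doubling from $u$ to $2u$ can be seen in a toy computation. The sketch assumes that $\mathbb{A}_{n,u}(r)$ collects the point clouds that place at most $u$ points in each covering ball $B(w_{j},r)$; this reading of the definition, the one-dimensional state space and the helper names are assumptions made for illustration only.

```python
def max_occupancy(cloud, centers, r):
    """Largest number of cloud points found in a single ball B(w, r)."""
    return max(sum(abs(x - w) <= r for x in cloud) for w in centers)


def in_A(cloud, centers, r, u):
    """Assumed membership test: no covering ball holds more than u points."""
    return max_occupancy(cloud, centers, r) <= u


# Balls of radius 0.1 around these centers cover S = [0, 1].
centers = [0.1, 0.3, 0.5, 0.7, 0.9]
r, u = 0.1, 2

cloud_y = [0.05, 0.12, 0.50, 0.95]        # plays the role of Y_{n,i}
cloud_y_prime = [0.05, 0.33, 0.52, 0.90]  # plays the role of Y'_{n,i}

# An intermediate cloud takes each entry from one of the two clouds.
intermediate = cloud_y_prime[:2] + cloud_y[2:]
# Multiset union of both clouds; it contains every intermediate cloud.
union = cloud_y + cloud_y_prime
```

Since every intermediate cloud is contained in the multiset union of the two end clouds, and each end cloud contributes at most `u` points to a ball, no ball can hold more than `2 * u` points.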

By Condition 2.3 there is a covering $\{B(w_{j},\eta_{n}^{-1}r):1\leq j\leq\mathscr{N}(\eta_{n}^{-1}r)\}$ of $S$. Consequently, there is a $1\leq k\leq\mathscr{N}(\eta_{n}^{-1}r)$ such that $y_{\ell+i-1}\in B(w_{k},\eta_{n}^{-1}r)$ and, by the triangle inequality, the neighborhood $B(y_{\ell+i-1},\eta_{n}^{-1}r)$ is contained in $B(w_{k},2\eta_{n}^{-1}r)$. So, in the case where $\mathbb{Y}_{n,i}$ and ${\mathbb{Y}^{\prime}}_{n,i}$ are both in $\mathbb{A}_{n,u}(\eta_{n}^{-1}r)$, we have

[TABLE]

the same applies to the neighborhoods of $y^{\prime}_{\ell+i-1},a^{*},b^{*}$ . Thus, we obtain an upper bound of the following type for the integral in (LABEL:Eq:AbstractExpIneq4)

[TABLE]

The last term in (LABEL:Eq:AbstractExpIneq7) is at most $2c_{2}(2u)^{\widetilde{q}}\gamma_{\infty}$ uniformly in $n,i$ and $a^{*},b^{*}$ due to the condition from (A2), which implies

[TABLE]

Moreover, as $\varepsilon$ was arbitrary, this last bound from (LABEL:Eq:AbstractExpIneq7) is also true for the limit and we have

[TABLE]

where all constants are uniform in $n$ and $i$ . We fix the value of $u$ in the following as $u\coloneqq n^{\gamma}$ and return to (LABEL:Eq:AbstractExpIneq4). We choose $b_{2}=4c_{2}\gamma_{\infty}(2u)^{\widetilde{q}}$ . Thus, using (LABEL:Eq:AbstractExpIneq7b)

[TABLE]

where we use Markov’s inequality in the last step. We bound both expectations in (LABEL:Eq:AbstractExpIneq8) from above as follows. First, note that for $r\geq 0$ and $w\in S$

[TABLE]

In particular, we have for each $r>0$

[TABLE]

Additionally, we apply Lemma 4.1 to $\mathbb{E}\left[\exp(\sum_{k<i}\,\mathds{1}\!\left\{X_{k}\in B(w,r)\right\})\right]$ and use that $\,\mathds{1}\!\left\{X_{i}\in B(w,r)\right\}$ is at most 1. Then we obtain for a state $a\in S$

[TABLE]

Combining this last inequality with (LABEL:Eq:AbstractExpIneq8), we see that

[TABLE]

Moreover, inserting this result in (4.8) for the above choice of $b_{2}$ and $b_{1}=n^{a}t$ yields

[TABLE]

Finally, applying the definition of $\gamma$ completes the proof. ∎

Proof of Theorem 2.7.

It remains to verify that the persistent Betti function $\beta^{r,s}_{q}$ satisfies the condition in (2.6) and in (LABEL:E:ExchOneCost). It follows from the definition of Betti numbers that (2.6) is satisfied for $c_{1}=1$ and the exponent $q$ .

Next, we inspect the condition in (LABEL:E:ExchOneCost). Let $\mathbb{Y},\mathbb{Z}$ be two point clouds of $n$ points, which differ in exactly one point, viz., $\mathbb{Y}\Delta\mathbb{Z}=\{y,z\}$ . We can use the Geometric Lemma (Lemma 4.3) to obtain

[TABLE]

where we use for the last inequality the scaling relation $K_{j}(\eta(\mathbb{Y}\cup\mathbb{Z}),r)=K_{j}(\mathbb{Y}\cup\mathbb{Z},\eta^{-1}r)$ , which is valid for the Čech and the Vietoris-Rips complex for all $\eta>0$ because of the homogeneity of $d$ .

Observe that a $j$ -simplex in the filtration $K(\mathbb{Y}\cup\mathbb{Z},\eta_{n}^{-1}s)$ has a diameter of at most $2\eta_{n}^{-1}s$ . Thus, a $j$ -simplex with a node in a point $y$ , resp. $z$ , lies in the $(2\eta_{n}^{-1}s)$ -neighborhood of $y$ , resp. $z$ . Consequently, (4.17) is at most

[TABLE]

Hence, the condition in (LABEL:E:ExchOneCost) is satisfied with $c_{2}=2$ , $r=2s$ and exponent $\widetilde{q}=q+1$ . ∎
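Both geometric facts used in this proof, the scaling relation and the locality of the simplices attached to an exchanged point, can be checked by brute force on a small example. The sketch below is an illustration only: it uses a Vietoris-Rips-type rule in which a simplex is present when its vertices are pairwise within a diameter bound `diam` (playing the role of $2\eta_{n}^{-1}s$), and the function names are hypothetical.

```python
from itertools import combinations
import math


def rips_simplices(points, j, diam):
    """Index tuples of the j-simplices whose vertices are pairwise within `diam`."""
    return [s for s in combinations(range(len(points)), j + 1)
            if all(math.dist(points[a], points[b]) <= diam
                   for a, b in combinations(s, 2))]


def simplices_at(points, j, diam, v):
    """The j-simplices that have the point with index v as a vertex."""
    return [s for s in rips_simplices(points, j, diam) if v in s]


pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (3.0, 3.0), (1.0, 1.0)]
eta, diam = 2.0, 1.5
scaled = [(eta * x, eta * y) for x, y in pts]

# Scaling relation: the complex of the dilated cloud at level eta * diam
# has the same simplices as the complex of the original cloud at level diam.
same_complex = rips_simplices(scaled, 2, eta * diam) == rips_simplices(pts, 2, diam)

# Locality: every vertex of a triangle through pts[4] lies within `diam` of it.
local = all(math.dist(pts[4], pts[a]) <= diam
            for s in simplices_at(pts, 2, diam, 4) for a in s)
```

Exchanging the point with index 4 can therefore only create or destroy simplices inside its `diam`-neighborhood, which is why the exchange-one cost is controlled by the number of points in that neighborhood.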

Proof of Theorem 2.8.

We split the proof into three parts. We show in the first part that

[TABLE]

Define a filtration, which is the union of the single filtrations when restricted to the cubes $A_{i}$ , by the complexes

[TABLE]

Since this union is of disjoint complexes, we have $\beta_{q}^{r,s}(\mathring{\mathcal{K}}(n^{1/p}\mathbb{X}_{n}))=\sum_{i=1}^{m^{p}}\beta_{q}^{r,s}(\mathcal{K}(n^{1/p}(\mathbb{X}_{n}\cap A_{i})))$ . We use Lemma 4.3 and Lemma 4.2 to arrive at

[TABLE]

Then $R_{n}$ is of order $n^{-1/p}$ . So, we can consider the expectation on the blocks $A_{i}$ instead.

From now on, let $i\in\{1,\ldots,m^{p}\}$ be an arbitrary but fixed index. Write $\ell_{i,1},\ldots,\ell_{i,p}$ for the edge lengths of $A_{i}$, so that $|A_{i}|$ equals $\prod_{j=1}^{p}\ell_{i,j}$. Also write $M_{i}$ for the diagonal matrix $\operatorname{diag}(\ell_{i,j}:1\leq j\leq p)$. Note that $\operatorname{det}(M_{i})=|A_{i}|$. This completes the first part.

In the second part, we use McDiarmid’s inequality from Theorem A.2. Set $S_{n,i}=\sum_{t=1}^{n}\,\mathds{1}\!\left\{X_{t}\in A_{i}\right\}$ and $h(n)=n^{3/4}$ . Since $(X_{1},\ldots,X_{n})$ admits a Marton coupling which satisfies (A2), we can apply Theorem A.2 to arrive at

[TABLE]

Using the definition $I_{n,i}=[-h(n)+\mathbb{E}\left[S_{n,i}\right],h(n)+\mathbb{E}\left[S_{n,i}\right]]$ and the fact that the Betti numbers of dimension $q$ are polynomially bounded by $n^{q+1}$ , we obtain

[TABLE]

In particular, $n^{-1}\mathbb{E}[\beta_{q}^{r,s}(\mathcal{K}(n^{1/p}(\mathbb{X}_{n}\cap A_{i})))\,\mathds{1}\!\left\{S_{n,i}\notin I_{n,i}\right\}]$ is negligible and we can focus in the following on the restriction $\{S_{n,i}\in I_{n,i}\}$. For this purpose, write $\mu_{n,i}=\lfloor\mathbb{E}\left[S_{n,i}\right]\rfloor=\lfloor n\alpha_{i}|A_{i}|\rfloor$. Then it follows from Lemma 5.5 in Krebs and Polonik (2019) that for each $0\leq r\leq s$

[TABLE]

where the $X^{\prime}_{1},X^{\prime}_{2},\ldots$ are independent and uniformly distributed on $[0,1]^{p}$ . We will use (LABEL:E:UnifConvergenceBettiPoisson) later.
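To see why the event $\{S_{n,i}\notin I_{n,i}\}$ is negligible, one can evaluate the concentration bound from the second part numerically. The sketch assumes the usual form of the tail bound in Theorem A.2, namely $2\exp(-2t^{2}/(\lVert\Gamma\rVert^{2}\lVert c\rVert^{2}s(\hat{Z})))$ with $c_{t}=1$, $s(\hat{Z})=1$ and $t=h(n)=n^{3/4}$; the function names and the value used for $\lVert\Gamma\rVert$ are hypothetical.

```python
import math


def log_tail_bound(n: int, gamma_norm: float, h_exp: float = 0.75) -> float:
    """log of 2 * exp(-2 h(n)^2 / (gamma_norm^2 * n)) with h(n) = n ** h_exp."""
    t = n ** h_exp
    return math.log(2.0) - 2.0 * t ** 2 / (gamma_norm ** 2 * n)


def log_negligible_term(n: int, q: int, gamma_norm: float) -> float:
    """log of n^{-1} * n^{q+1} * tail bound: polynomial growth against the tail."""
    return q * math.log(n) + log_tail_bound(n, gamma_norm)


# Work on the log scale to avoid underflow for large n.
values = [log_negligible_term(n, q=2, gamma_norm=3.0) for n in (10**4, 10**5, 10**6)]
```

With $h(n)=n^{3/4}$ the exponent is of order $-\sqrt{n}$, so the polynomial factor $n^{q}$ cannot compensate and the product tends to zero.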

In the third part, we study the success runs of $(X_{t}:1\leq t\leq n)$ and the sum $S_{n,i}$. If an $X_{t}$ falls in $A_{i}$, we call this a success, and a failure otherwise. Consider a path with exactly $k$ successes $J=(\mathscr{F}_{1},\mathscr{S}_{1},\mathscr{F}_{2},\ldots,\mathscr{F}_{v},\mathscr{S}_{v},\mathscr{F}_{v+1})\in\{0,1\}^{n}$, where $v\leq k$ and where each $\mathscr{S}_{i}$, $i\in\{1,\ldots,v\}$, is a sequence of 1’s and each $\mathscr{F}_{i}$, $i\in\{1,\ldots,v+1\}$, is a sequence of 0’s (potentially $\mathscr{F}_{1}$ and $\mathscr{F}_{v+1}$ have length 0). So, on the path $J$, we have $S_{n,i}=k$.
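The run decomposition of a path can be made concrete with a few lines of code. This is a minimal sketch with hypothetical helper names; a path is a list of 0's and 1's, and positions are counted from 1 as in the text.

```python
from itertools import groupby


def run_decomposition(path):
    """Maximal runs of the 0/1 path as (symbol, length) pairs."""
    return [(symbol, len(list(group))) for symbol, group in groupby(path)]


def success_summary(path):
    """Number of successes k, number of success runs v, and success positions."""
    runs = run_decomposition(path)
    k = sum(path)
    v = sum(1 for symbol, _ in runs if symbol == 1)
    positions = [t for t, x in enumerate(path, start=1) if x == 1]
    return k, v, positions


J = [0, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1]
k, v, positions = success_summary(J)
last_success = positions[-1]
```

For this path the decomposition has `v = 3` success runs with `k = 6` successes in total, so `v <= k` as required, and the last success sits at position 11.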

Consider the expectation on this path $J$ . For this write $J^{*}$ for the index set which contains the positions in $J$ that mark a success. Write $\mathbb{M}_{t}$ for the conditional distribution of $X_{t}$ given the past $X_{1},\ldots,X_{t-1}$ . Then

[TABLE]

Consider the situation for the last success which is given at a position $t^{*}$ . Note that each $\mathbb{M}_{t}$ admits a conditional density $f_{t}$ because the distribution of $X_{t}$ on each block $A_{j}$ , $j\in\{1,\ldots,m^{p}\}$ , is uniform and independent of the past observations $X_{1},\ldots,X_{t-1}$ given that $X_{t}$ falls in the block $A_{j}$ . So this $f_{t}(x_{t}|x_{1},\ldots,x_{t-1})$ is constant for all $x_{t}$ from a block $A_{j}$ . Due to the blocked structure of the conditional densities of $X$ and the invariance property from (2.8), the contribution of the observations $X_{t^{*}},\ldots,X_{n}$ to the integral in (LABEL:E:ConvergenceExpectationDiscrete1) is then

[TABLE]

where $z_{t^{*}}$ is an arbitrary but fixed element of $A_{i}$ and where we use for the last equation that

[TABLE]

Using this conditional independence argument recursively, one obtains for (LABEL:E:ConvergenceExpectationDiscrete1)

[TABLE]

where the $X^{\prime}_{t}$ are independent and uniformly distributed on $[0,1]^{p}$ .

Moreover, using the uniform approximation result from (LABEL:E:UnifConvergenceBettiPoisson) shows that (4.20) equals

[TABLE]

where the remainder $o(1)$ is uniform in $J$ and $k\in I_{n,i}$ . Furthermore, using the dilatation rules of the expectation of persistent Betti numbers computed from the Čech or Vietoris-Rips filtration, we obtain for the main term of (4.21)

[TABLE]

where the remainder $o(1)$ is uniform in $J$ and where the last equality follows as in the proof of Lemma 10 in Divol and Polonik (2019). Summing over all paths $J$ with exactly $k$ successes, over all $k\in I_{n,i}$ and over all $i=1,\ldots,m^{p}$ yields then the conclusion, viz.,

[TABLE]

This proves the first assertion in (2.9). Combining this last statement with Theorem 2.7 and the Borel-Cantelli lemma shows the second assertion in (2.10). So, the proof is complete. ∎

Proof of Theorem 2.9.

In the proof, we sometimes abuse the notation slightly in order to keep formulas shorter. To be precise, we write $K(U_{1}^{n},r)$ for the simplicial complex $K(\{U_{1},\ldots,U_{n}\},r)$ of a vector $U_{1},\ldots,U_{n}$ at filtration time $r$ to save space. The related expressions are abbreviated in this way, too.

In the first step of the proof, we construct a discrete Markov chain of order $m$ , $\widetilde{X}$ , which approximates $X$ closely. To this end, let $\varepsilon>0$ be arbitrary but fixed. We use a discrete density function $g_{\varepsilon}$ , which is an approximation of the joint density $g$ of $(X_{1},\ldots,X_{m+1})$ . We write $f_{t}$ for the conditional density of $X_{t}$ given $(X_{t-1},\ldots,X_{1})$ with the convention that $f_{1}$ is the marginal density $\kappa$ .

Since we assume that the process $X$ is a Markov chain of order $m$ , we are actually dealing with the conditional densities $f_{1}\equiv\kappa,f_{2},\ldots,f_{m+1}$ only. Using the approximation $g_{\varepsilon}$ , we obtain approximations $f_{\varepsilon,1}=:\kappa_{\varepsilon},f_{\varepsilon,2},\ldots,f_{\varepsilon,m+1}$ , which are defined in the same spirit as the $f_{t}$ . We choose the precision between $g$ and $g_{\varepsilon}$ sufficiently high (in the $\|\cdot\|_{\infty}$ -norm) such that

[TABLE]

Thus, at each step of the evolution of the Markov chain, we can approximate each conditional density with a discrete conditional density at a precision of at least $\varepsilon/2$ (measured in the total variation distance). Note that this is possible because we assume that $\inf\{g(z):z\in[0,1]^{(m+1)p}\}>0$ , so that all conditional densities are well defined.

We write $\widetilde{X}$ for the Markov chain of order $m$ obtained from the above $\varepsilon$-approximation scheme. Note that we can also choose $g_{\varepsilon}$ to be strictly positive. In particular, this implies that $\widetilde{X}$ satisfies the assumptions of Theorem 2.8 because it is uniformly geometrically ergodic, see Meyn and Tweedie (2012) Theorem 16.0.2. Clearly, $X$ is also uniformly geometrically ergodic; hence, $X$ admits a Marton coupling (see also Example 2.2 and Paulin (2015) Proposition 2.4), whose mixing matrix $\Gamma^{(n)}$ (based on $X_{1},\ldots,X_{n}$) satisfies

[TABLE]

In the second step, we use the decomposition

[TABLE]

where the random variables $Y_{\varepsilon}$ (resp. $Y$ ) have density $\kappa_{\varepsilon}$ (resp. $\kappa$ ).

If $\varepsilon$ converges to 0, (4.26) converges to 0, too; see also (2.11) and Divol and Polonik (2019) for details. Moreover, from Theorem 2.8, we conclude that (4.25) converges to 0 as $n$ tends to $\infty$ for each $\varepsilon>0$ and corresponding approximation $g_{\varepsilon}$.

So, the term in (4.24) remains. Let $w\in\mathbb{N}$ and $\overline{w}$ be such that $1/w+1/\overline{w}=1$. In the remainder of this proof, we show that (4.24) is at most $C^{*}_{s,p,w}\varepsilon^{1/\overline{w}}+o(1)$ uniformly in $n$, where the constant $C^{*}_{s,p,w}$ depends neither on the choice of the approximation parameter $\varepsilon$ nor on $n$. It solely depends on $s,p$ and $w$. For this we rewrite the expectations in (4.24) as

[TABLE]

We transform (LABEL:EQ:ConvergenceExpectationContinuous6) into (LABEL:EQ:ConvergenceExpectationContinuous7) in $n$ steps using a specific coupling in each step. For this purpose we write the difference between (LABEL:EQ:ConvergenceExpectationContinuous6) and (LABEL:EQ:ConvergenceExpectationContinuous7) as a telescopic sum as follows (the exchanged factor is given in square brackets)

[TABLE]

Each integral in the sum can be interpreted as a difference between the expectations of two persistent Betti numbers of two coupled processes $(Z^{\prime}_{t,\cdot},\widetilde{Z}_{t,\cdot})$ for $t\in\{1,\ldots,n\}$. We explain this coupling in three steps and refer to the term in (LABEL:EQ:ConvergenceExpectationContinuous8), which shows the general situation. First, the $t$ th coupling starts with $\widetilde{Z}_{t,1}={Z^{\prime}}_{t,1},\ldots,\widetilde{Z}_{t,t-1}={Z^{\prime}}_{t,t-1}$; so $Z^{\prime}_{t,\cdot}$ and $\widetilde{Z}_{t,\cdot}$ have the same distribution as the stationary discrete Markov chain $\widetilde{X}$ (with the densities $f_{\varepsilon,\cdot}$) from time 1 to $t-1$.

Second, at time $t$ , we simulate a random variable $Z^{\prime}_{t,t}$ using the conditional density $f_{i}$ (where the index $i$ depends on the position of $t$ ). Also at time $t$ , we simulate a random variable $\widetilde{Z}_{t,t}$ using the conditional density $f_{\varepsilon,i}$ . Note that $Z^{\prime}_{t,t}$ and $\widetilde{Z}_{t,t}$ can be coupled such that

[TABLE]

because of the choices in (LABEL:EQ:ConvergenceExpectationContinuous1); we refer to Den Hollander (2012) for an abstract maximal coupling result on Polish spaces.
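The idea behind the maximal coupling at time $t$ can be sketched in a discrete toy setting: two distributions are coupled so that the probability of disagreement equals their total variation distance. The construction below is the standard one for finite state spaces and is meant only as an illustration; the paper works with conditional densities, and the names `maximal_coupling`, `p` and `q` are hypothetical.

```python
from fractions import Fraction


def maximal_coupling(p, q):
    """Joint pmf of (X, Y) with marginals p and q and P(X != Y) = TV(p, q)."""
    keys = sorted(set(p) | set(q))
    overlap = {x: min(p.get(x, 0), q.get(x, 0)) for x in keys}
    alpha = sum(overlap.values())  # mass on which X and Y agree
    joint = {(x, x): overlap[x] for x in keys if overlap[x] > 0}
    if alpha < 1:
        for x in keys:
            for y in keys:
                # Residual parts have disjoint supports, so x != y whenever mass > 0.
                mass = (p.get(x, 0) - overlap[x]) * (q.get(y, 0) - overlap[y]) / (1 - alpha)
                if mass > 0:
                    joint[(x, y)] = joint.get((x, y), 0) + mass
    return joint


def total_variation(p, q):
    keys = set(p) | set(q)
    return sum(abs(p.get(x, 0) - q.get(x, 0)) for x in keys) / 2


p = {0: Fraction(1, 2), 1: Fraction(1, 2)}
q = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
joint = maximal_coupling(p, q)
p_differ = sum(m for (x, y), m in joint.items() if x != y)
```

Exact rational arithmetic is used so that the identity between the disagreement probability and the total variation distance can be checked without rounding.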

Third, we find two chains $Z^{\prime}_{t,j}$ and $\widetilde{Z}_{t,j}$ , $j=t+1,\ldots$ , using the conditional densities $f_{i}$ such that the single elements at time $j\leq n$ satisfy $\mathbb{P}(Z^{\prime}_{t,j}\neq\widetilde{Z}_{t,j}|\widetilde{Z}_{t,1},\ldots,\widetilde{Z}_{t,t-1},\widetilde{Z}_{t,t},Z^{\prime}_{t,t})\leq\Gamma^{(n)}_{t,j}$ . This last inequality follows from the properties of the Marton coupling, see (4.23).

In the following, we will use the abbreviation $\widetilde{Z}^{t,j}_{t,i}$ for the vector $(\widetilde{Z}_{t,i},\ldots,\widetilde{Z}_{t,j})$ ; we use the notation in the same spirit for $Z^{\prime}_{t,\cdot}$ . Using the above coupling, we see that (4.24) is at most

[TABLE]

So for each $t$ the point clouds differ at most in the points ${Z^{\prime}}_{t,t}^{t,n}$ resp. ${\widetilde{Z}}_{t,t}^{t,n}$ and we can always transform one point cloud into the other in $n-t+1$ steps.

Regarding (4.31), we show that there is a $C\in\mathbb{R}_{+}$ such that for each $t\in\{1,\ldots,n\}$ and for each $n\in\mathbb{N}$ the coupling $(Z^{\prime}_{t,\cdot},\widetilde{Z}_{t,\cdot})$ satisfies

[TABLE]

Define the coupling time between $({Z^{\prime}}_{t,j})_{j}$ and $(\widetilde{Z}_{t,j})_{j}$ by

[TABLE]

i.e., for all $j\geq\tau_{c}(t)$ the chains evolve again in lockstep, viz., $Z^{\prime}_{t,j}=\widetilde{Z}_{t,j}$ for $j\geq\tau_{c}(t)$ . Note that given $Z^{\prime}_{t,t}\neq\widetilde{Z}_{t,t}$ , we have $\tau_{c}(t)\geq t+m$ .

The coupling times $\tau_{c}$ admit a tail bound which involves the coefficients from the Marton coupling as follows: If $u\geq t+m$ , then

[TABLE]

Since the coefficients of the Marton coupling satisfy (4.23), this shows that the moments of the coupling times (when conditioned on $\{Z^{\prime}_{t,t}\neq\widetilde{Z}_{t,t}\}$ ) are uniformly bounded: We have for $\delta\geq 0$

[TABLE]

We begin our considerations with the restriction to the event $\{\tau_{c}(t)\leq n\}$ for $t\in\{1,\ldots,n\}$ :

[TABLE]

where we use for the last equality that $\tau_{c}(t)$ can be at least $t+m$ conditional on the event $\{{Z^{\prime}}_{t,t}\neq\widetilde{Z}_{t,t}\}$ . We study the expectations in (LABEL:EQ:ConvergenceExpectationContinuous16). First we apply the Geometric Lemma to obtain

[TABLE]

Consider the $w$ th moment of a simplex count $K_{k}$ for some $\ell$ . For this purpose denote by $(Y_{0},\ldots,Y_{n})$ the data $(\widetilde{Z}_{t,1},\ldots,\widetilde{Z}_{t,t-1+\ell},{Z^{\prime}}_{t,t-1+\ell},\ldots,Z^{\prime}_{n})$ . Then one finds with elementary combinatorial arguments that the $w$ th moment of the simplex count in the first line in (LABEL:EQ:ConvergenceExpectationContinuous17) is at most

[TABLE]

for pairwise distinct indices $i_{\ell}$. The last inequality can be derived as follows: The number of different observations $v$ is in $[k,wk]$. Given $v$ pairwise distinct indices $i_{1},\ldots,i_{v}$ (observations $Y_{i_{1}},\ldots,Y_{i_{v}}$), each occurs with multiplicity $u_{i}$ such that $\sum_{i=1}^{v}u_{i}=wk$.

Applying finally the same reasoning as in the proof of Lemma 4.2 shows that (LABEL:EQ:ConvergenceExpectationContinuous18) is bounded above by a universal constant $C_{s,p,w}$ , which depends on $s,p,w$ but not on $n,\ell,k$ . Clearly, the $w$ th moment of the simplex count in the second line in (LABEL:EQ:ConvergenceExpectationContinuous17) is at most $C_{s,p,w}$ , too.

We can now return to (LABEL:EQ:ConvergenceExpectationContinuous16). Set $C^{\prime}_{w}=(\sum_{j=1}^{\infty}j^{-aw})^{1/w}$ for some $a>1/w$ . Then relying on the coupling result from (4.30), (LABEL:EQ:ConvergenceExpectationContinuous16) is at most

[TABLE]

Relying on (4.23), (4.33) and (4.34), this last term is of order $\varepsilon^{1/\overline{w}}$. This shows that (LABEL:EQ:ConvergenceExpectationContinuous16) is of order $\varepsilon^{1/\overline{w}}$ uniformly in $t\in\{1,\ldots,n\}$ and $n$.
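The exponent $1/\overline{w}$ is what Hölder's inequality with conjugate exponents $w$ and $\overline{w}$ produces when a quantity with bounded $w$ th moment is restricted to an event of small probability: $\mathbb{E}[|X|\mathds{1}_{A}]\leq\mathbb{E}[|X|^{w}]^{1/w}\,\mathbb{P}(A)^{1/\overline{w}}$. The following check on a toy sample with uniform weights is an illustration of this inequality only; presumably it is the mechanism behind the bound, and the names `values` and `event` are hypothetical.

```python
def holder_sides(values, event, w):
    """Return (E[|X| 1_A], E[|X|^w]^(1/w) * P(A)^(1/w_bar)) under uniform weights."""
    n = len(values)
    w_bar = w / (w - 1)  # conjugate exponent, 1/w + 1/w_bar = 1
    lhs = sum(abs(x) for x, in_a in zip(values, event) if in_a) / n
    moment = (sum(abs(x) ** w for x in values) / n) ** (1 / w)
    prob = sum(event) / n
    return lhs, moment * prob ** (1 / w_bar)


values = [1, 4, 2, 8, 3, 5, 7, 6]
event = [x >= 7 for x in values]  # a rare event carrying the largest values
lhs, rhs = holder_sides(values, event, w=3)
```

If the probability of the event is of order $\varepsilon$ and the $w$ th moment is bounded by a constant, the right-hand side is of order $\varepsilon^{1/\overline{w}}$.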

In order to complement the considerations following (LABEL:EQ:ConvergenceExpectationContinuous16), it remains to consider the restriction to the event $\{\tau_{c}(t)>n\}$ for $t\in\{1,\ldots,n\}$ . Here we need additionally to consider the average over all $t$

[TABLE]

Carrying out similar calculations as in (LABEL:EQ:ConvergenceExpectationContinuous17) and (LABEL:EQ:ConvergenceExpectationContinuous18), it is not difficult to see that (4.38) is at most

[TABLE]

Using once more the result in (4.33), (4.34) and (4.23) yields directly that this last upper bound vanishes.

Combining these results yields (4.32). This completes the proof. ∎

Proof of Corollary 2.10.

It is shown in Proposition 3.4 in Hiraoka et al. (2018) that the pointwise convergence of persistent Betti numbers implies the vague convergence of the corresponding sequence of persistence diagrams. ∎

4.3 Technical details on Section 3

Proof of Theorem 3.1.

The statement follows immediately from Theorem 2.8. ∎

Proof of Theorem 3.2.

The proof is very similar to that of Theorem 2.9 and we only study the main differences in detail. First, we construct an $\varepsilon$ -approximation $\widetilde{X}$ of $X$ . For this purpose we consider the joint distribution of $\{X_{u}:u\leq(1,\ldots,1)\}$ , which is completely determined by the joint density $g\colon[0,1]^{2^{d}p}\to(0,\infty)$ . Let $\varepsilon>0$ and choose a discrete approximation $g_{\varepsilon}\colon[0,1]^{2^{d}p}\to(0,\infty)$ of $g$ such that conditional densities $f_{\varepsilon,s}$ ( $s\in\{0,1\}^{d}$ ) are derived from $g_{\varepsilon}$ in the same spirit as in the proof of Theorem 2.9; we refer to Figure 1. These densities are strictly positive and satisfy

[TABLE]

as well as $\|f_{\varepsilon,(0,\ldots,0)}-f_{(0,\ldots,0)}\|_{\infty}\leq\varepsilon$. (This requirement is the analog of (LABEL:EQ:ConvergenceExpectationContinuous1).) Obviously, the discrete (conditional) densities $f_{\varepsilon,s}$ determine the random field $\widetilde{X}$ completely. Also, due to the blocked structure of the densities $f_{\varepsilon,s}$ and the condition from (A10), the random field $\widetilde{X}$ satisfies the requirements of Theorem 3.1. Reasoning as in (4.24) to (4.26), it is sufficient to study the difference

[TABLE]

for an arbitrary but fixed $N\in\mathbb{N}^{d}$ .

We use the same expansion for this difference as in the case of Markov chains, see (LABEL:EQ:ConvergenceExpectationContinuous6), (LABEL:EQ:ConvergenceExpectationContinuous7) and (LABEL:EQ:ConvergenceExpectationContinuous8). But this time we use the ordering $>_{d}$ for the expansion. We obtain for each $u\in\mathbb{N}^{d}$ a coupling $((Z^{\prime}_{u,v},\widetilde{Z}_{u,v}):v\in\mathbb{N}^{d})$ with the properties

[TABLE]

Consequently, using that $\widetilde{Z}_{u,v}={Z^{\prime}}_{u,v}$ , for all $v<_{d}u$ , we can write the difference in (4.39) as

[TABLE]

Given a location $u$ and a coupling $(Z^{\prime}_{u,\cdot},\widetilde{Z}_{u,\cdot})$ , we define the coupling time

[TABLE]

So $\tau_{c}(u)$ is determined by the causal dependence pattern which is derived from the factorization of the joint distribution according to the ordering $>_{d}$ . Note that both random fields $Z^{\prime}_{u,\cdot}$ and $\widetilde{Z}_{u,\cdot}$ move in lockstep after $\tau_{c}(u)$ .

Consider the tail of the coupling time $\tau_{c}$ at location $u$

[TABLE]

for a constant $c_{d}\in\mathbb{R}_{+}$ , which depends on $d$ but not on $u,k,N$ .

Choose $w\in\mathbb{N}$ such that we have with the abbreviation $\overline{w}=w/(w-1)$ that $(\delta/3+1)/\overline{w}-d>1/w>0$ , which is possible because by assumption $\delta>3(d-1)$ .
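The existence of such a $w$ follows from a short rearrangement: with $\overline{w}=w/(w-1)$, the inequality $(\delta/3+1)/\overline{w}-d>1/w$ is equivalent to $w(\delta/3+1-d)>\delta/3+2$, which can be satisfied for large $w$ exactly when $\delta>3(d-1)$. The following sketch searches for the smallest admissible $w$; the function name and the search cap `w_max` are hypothetical.

```python
def smallest_admissible_w(delta: float, d: int, w_max: int = 10**4):
    """Smallest integer w >= 2 with (delta/3 + 1)/w_bar - d > 1/w, or None."""
    for w in range(2, w_max + 1):
        w_bar = w / (w - 1)
        if (delta / 3 + 1) / w_bar - d > 1 / w:
            return w
    return None


w_example = smallest_admissible_w(delta=7, d=2)     # delta > 3 * (d - 1) holds
w_boundary = smallest_admissible_w(delta=3, d=2)    # delta = 3 * (d - 1): no solution
```

For $\delta=7$ and $d=2$ the threshold is $w>(\delta/3+2)/(\delta/3+1-d)=3.25$, so the search returns 4; at the boundary case the left-hand side stays negative and no $w$ is found.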

In the following, we will consider such a single difference in (LABEL:EQ:ConvExpMRF2) and show that it is of order $\varepsilon^{1/\overline{w}}$ uniformly in $N$ ; the calculations follow in a similar spirit as in the proof of Theorem 2.9, see (LABEL:EQ:ConvergenceExpectationContinuous16) to (LABEL:EQ:ConvergenceExpectationContinuous18), so we omit some details. Clearly, we can restrict our considerations to the event $\{Z^{\prime}_{u,u}\neq\widetilde{Z}_{u,u}\}$ . Again, use $C^{\prime}_{w}=(\sum_{j=1}^{\infty}j^{-aw})^{1/w}$ but this time for $a\in(1/w,(\delta/3+1)/\overline{w}-d)$ . Then using a similar bound on simplex counts, there is a constant ${\overline{C}}_{s,p,d,w}$ such that

[TABLE]

for all $u\leq N$ and for all $N$ . Regarding the sum in (LABEL:EQ:ConvExpMRF4), we use the definition of $C^{\prime}_{w}$ to see

[TABLE]

Applying the upper bound from (4.41) together with the condition from (A10) shows that (4.43) is uniformly bounded in $u$ and $N$ .

We use (LABEL:EQ:ConvExpMRF4) together with (4.43) as well as (4.41) to give a bound on (LABEL:EQ:ConvExpMRF2) (up to a universal multiplicative constant) as follows

[TABLE]

because $(\tau_{c}(u):u)$ is stationary. We can now repeat the calculations which lead to (4.43) to see that this last sum satisfies

[TABLE]

Relying once more on (4.41) and (A10) shows then that this integral is uniformly bounded in $u$ and $N$ . This shows that (LABEL:EQ:ConvExpMRF2) is of order $\varepsilon^{1/\overline{w}}$ . Consequently, (4.39) is of order $\varepsilon^{1/\overline{w}}$ , too. ∎

Appendix A McDiarmid inequalities for Marton couplings

In this section we study McDiarmid inequalities for Marton couplings. Notable contributions to this topic are Samson (2000), Chazottes et al. (2007), Kontorovich and Ramanan (2008), Redig and Chazottes (2009). We shall first state a result of Paulin (2015) who uses Marton couplings to characterize the dependence of the data.

Definition A.1 (Partition).

A partition of a random vector $Z=(Z_{1},\ldots,Z_{N})$ is a deterministic division of $Z$ into random variables $\hat{Z}_{i}$ , $i=1,\ldots,n$ for some $n\leq N$ such that the set $\{Z_{1},\ldots,Z_{N}\}$ is partitioned by $(\hat{Z}_{i})_{i=1,\ldots,n}$ . Denote the number of elements of $\hat{Z}_{i}$ by $s(\hat{Z}_{i})$ and write $s(\hat{Z})$ for the size of the partition which is $\max_{i=1,\ldots,n}s(\hat{Z}_{i})$ .

Theorem A.2 (McDiarmid’s inequality, Paulin (2015)).

Let $Z=(Z_{1},\ldots,Z_{N})$ be a random variable in $\Lambda=\Lambda_{1}\times\cdots\times\Lambda_{N}$ . Assume that $Z$ admits a partitioning $\hat{Z}=(\hat{Z}_{1},\ldots,\hat{Z}_{n})$ which allows a Marton coupling with mixing matrix $\Gamma\in\mathbb{R}^{n\times n}$ . Let $\varphi:\Lambda\rightarrow\mathbb{R}$ be Lipschitz continuous w.r.t. the Hamming distance, i.e., there is a $c=(c_{1},\ldots,c_{N})\in\mathbb{R}^{N}$ such that

[TABLE]

Set $\mathcal{I}_{i}\coloneqq\{j=1,\ldots,N:Z_{j}\in\hat{Z}_{i}\}$ and $C_{i}(c)\coloneqq\sum_{j\in\mathcal{I}_{i}}c_{j}$ for $i=1,\ldots,n$ . Then

[TABLE]

In particular,

[TABLE]

The proof uses the following lemma of Devroye and Lugosi (2012):

Lemma A.3.

Let $\mathcal{F}$ be a sub-$\sigma$-algebra and let $U,V,W$ be random variables which satisfy $U\leq V\leq W$ a.s. Moreover, assume that $U,W$ are $\mathcal{F}$-measurable and $\mathbb{E}\left[V|\mathcal{F}\right]=0$. Then

[TABLE]
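A lemma of this type is usually stated as the conditional Hoeffding bound $\mathbb{E}[\exp(sV)|\mathcal{F}]\leq\exp(s^{2}(W-U)^{2}/8)$ for all $s\in\mathbb{R}$. Assuming this standard form, the following sketch checks the unconditional special case for a centered two-point variable with constant bounds $U=a$ and $W=b$; the function names are hypothetical.

```python
import math


def two_point_mgf(a: float, b: float, s: float) -> float:
    """E[exp(sV)] for V with values in {a, b}, a < 0 < b, and E[V] = 0."""
    p_b = -a / (b - a)  # probability of the value b that centers V
    return (1 - p_b) * math.exp(s * a) + p_b * math.exp(s * b)


def hoeffding_bound(a: float, b: float, s: float) -> float:
    """exp(s^2 (b - a)^2 / 8), the bound for a centered variable in [a, b]."""
    return math.exp(s * s * (b - a) ** 2 / 8)


checks = [(s, two_point_mgf(-1.0, 3.0, s), hoeffding_bound(-1.0, 3.0, s))
          for s in (-2.0, -0.5, 0.5, 2.0)]
```

In the proof of Theorem A.2, a bound of this kind is applied to the martingale differences $V_{i}$ with $\mathcal{F}=\mathcal{F}_{i-1}$.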

Proof of Theorem A.2.

We consider the natural filtration of the random vector $\hat{Z}$, i.e., $\mathcal{F}_{i}=\sigma(\hat{Z}_{1},\ldots,\hat{Z}_{i})$ for $i=0,\ldots,n$, and define $\hat{\varphi}((\hat{x}_{i})_{i})\coloneqq\varphi(x)$ for $x\in\Lambda$. Then $\hat{\varphi}$ is also Lipschitz continuous w.r.t. the Hamming distance; more precisely,

[TABLE]

Set $V_{i}\coloneqq\mathbb{E}\left[\hat{\varphi}(\hat{Z})|\mathcal{F}_{i}\right]-\mathbb{E}\left[\hat{\varphi}(\hat{Z})|\mathcal{F}_{i-1}\right]$ for $i=1,\ldots,n$ . Moreover, define for $a\in\hat{\Lambda}_{i}=\prod_{j\in\mathcal{I}_{i}}\Lambda_{j}$

[TABLE]

And write $\nu_{i}$ for the conditional distribution of $\hat{Z}_{i}$ given $(\hat{Z}_{1},\ldots,\hat{Z}_{i-1})$ , i.e.,

[TABLE]

Then, it follows with elementary calculations that

[TABLE]

Now let $\varepsilon>0$ be arbitrary but fixed. Choose $a^{*},b^{*}\in\hat{\Lambda}_{i}$ such that $I_{i}(a^{*})\geq\operatorname*{ess\,sup}_{\text{ w.r.t.\ }\nu_{i}}I_{i}(\cdot)-\frac{\varepsilon}{2}$ and $I_{i}(b^{*})\leq\operatorname*{ess\,inf}_{\text{ w.r.t.\ }\nu_{i}}I_{i}(\cdot)+\frac{\varepsilon}{2}$ . Next, we use the Marton coupling of $\hat{Z}$ to obtain

[TABLE]

And as $\varepsilon>0$ was arbitrary,

[TABLE]

Moreover, we have

[TABLE]

and both the left- and the right-hand-side are $\mathcal{F}_{i-1}$ -measurable. Consequently, using the lemma of Devroye and Lugosi (2012), we find that

[TABLE]

This establishes the claim in (A.2). The final result in (A.3) follows from the inequalities $\left\lVert\Gamma C(c)\right\rVert^{2}\leq\left\lVert\Gamma\right\rVert^{2}\left\lVert C(c)\right\rVert^{2}\leq\left\lVert\Gamma\right\rVert^{2}\left\lVert c\right\rVert^{2}s(\hat{Z}).$

∎
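The second inequality in the last step of the proof, $\lVert C(c)\rVert^{2}\leq\lVert c\rVert^{2}s(\hat{Z})$, is the Cauchy-Schwarz inequality applied blockwise: $C_{i}(c)^{2}=(\sum_{j\in\mathcal{I}_{i}}c_{j})^{2}\leq|\mathcal{I}_{i}|\sum_{j\in\mathcal{I}_{i}}c_{j}^{2}\leq s(\hat{Z})\sum_{j\in\mathcal{I}_{i}}c_{j}^{2}$. A small numerical sketch of Definition A.1 and of this inequality follows; indices are zero-based and the helper names are hypothetical.

```python
def is_partition(blocks, n_total):
    """Check that the index blocks I_1, ..., I_n partition {0, ..., n_total - 1}."""
    flat = sorted(j for block in blocks for j in block)
    return flat == list(range(n_total))


def block_sums(c, blocks):
    """C_i(c): sum of the Lipschitz constants c_j over the block I_i."""
    return [sum(c[j] for j in block) for block in blocks]


def partition_inequality(c, blocks):
    """Return (||C(c)||^2, s(Z_hat) * ||c||^2) for squared Euclidean norms."""
    size = max(len(block) for block in blocks)  # s(Z_hat), the largest block
    lhs = sum(v * v for v in block_sums(c, blocks))
    rhs = size * sum(x * x for x in c)
    return lhs, rhs


c = [1, 2, 3, 4, 5]
blocks = [[0, 1], [2], [3, 4]]
lhs, rhs = partition_inequality(c, blocks)
```

For this example the block sums are 3, 3 and 9, so the left-hand side is 99 and the right-hand side is 2 times 55, that is 110.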

The next proposition is due to Fiebig (1993) and a consequence of Goldstein’s maximal coupling, Goldstein (1979). See also Paulin (2015) Proposition 2.6 and Samson (2000) Proposition 2.

Proposition A.4 (Fiebig (1993), p. 482, (2.1)).

Let $P$ and $Q$ be two probability distributions on some common Polish space $\Lambda_{1}\times\ldots\times\Lambda_{N}$, both admitting a strictly positive density w.r.t. a measure $\rho$. Then there is a coupling of random vectors $X=(X_{1},\ldots,X_{N})$, $Y=(Y_{1},\ldots,Y_{N})$ such that $\mathcal{L}(X)=P$, $\mathcal{L}(Y)=Q$ and

[TABLE]

Acknowledgments

The author thanks an anonymous referee whose careful reading and detailed reports improved the manuscript considerably. This research was supported by the German Research Foundation (DFG), Grant Number KR-4977/1-1.

Bibliography

Bobrowski and Kahle (2018) O. Bobrowski and M. Kahle. Topology of random geometric complexes: a survey. Journal of Applied and Computational Topology, 1(3):331–364, 2018.
Boissonnat et al. (2018) J.-D. Boissonnat, F. Chazal, and M. Yvinec. Geometric and Topological Inference, volume 57. Cambridge University Press, 2018.
Bubenik (2015) P. Bubenik. Statistical topological data analysis using persistence landscapes. J. Mach. Learn. Res., 16(1):77–102, 2015.
Carlsson (2009) G. Carlsson. Topology and data. Bulletin of the American Mathematical Society, 46(2):255–308, 2009.
Chalker et al. (1999) T. Chalker, A. Godbole, P. Hitczenko, J. Radcliff, and O. Ruehr. On the size of a random sphere of influence graph. Advances in Applied Probability, 31(3):596–609, 1999.
Chazal and Divol (2018) F. Chazal and V. Divol. The density of expected persistence diagrams and its kernel based estimation. In LIPIcs-Leibniz International Proceedings in Informatics, volume 99. Schloss Dagstuhl-Leibniz-Zentrum für Informatik, 2018.
Chazal and Michel (2017) F. Chazal and B. Michel. An introduction to topological data analysis: fundamental and practical aspects for data scientists. arXiv preprint arXiv:1710.04019, 2017.
Chazal et al. (2014) F. Chazal, B. T. Fasy, F. Lecci, A. Rinaldo, and L. Wasserman. Stochastic convergence of persistence landscapes and silhouettes. In Proceedings of the 30th Annual Symposium on Computational Geometry, pages 474–483, 2014.
