Random Fixed Boundary Flows

Zhigang Yao; Yuqing Xia; Zengyan Fan

arXiv:1904.11332·math.OC·March 3, 2023


TL;DR

This paper introduces the concept of fixed boundary flows on non-linear Riemannian manifolds, providing a method to analyze noisy multivariate data with fixed start and end points, and demonstrates its convergence and practical applications.

Contribution

It defines and analyzes random fixed boundary flows on manifolds, develops an algorithm for computation, and proves its convergence to the population flow.

Findings

01

The fixed boundary flow is a concatenation of three segments, one of which coincides with the usual principal flow when the manifold reduces to Euclidean space.

02

The random fixed boundary flow converges to the population flow with high probability.

03

The method is applicable to real data sets, demonstrating interpretability and utility.



Tables

Table 1: Comparison of the mean errors for the Hausdorff distances $d_{H}(\tilde{\gamma},\gamma^{*})$ with five different values of $h$. Standard deviations based on ten sets of randomly generated data are shown in parentheses.

Unit sphere						Right-circular unit cone
$h$	noisy “C” shape	$h$	noisy six-fold	$h$	noisy two-fold	$h$	noisy band	$h$	noisy “C” shape	$h$	noisy “S” shape
$0.06$	0.0074 (0.0017)	$0.04$	0.0093 (0.0022)	$0.06$	0.0122 (0.0025)	$0.12$	0.0095 (0.0021)	$0.06$	0.0080 (0.0018)	$0.06$	0.0101 (0.0025)
$0.08$	0.0063 (0.0010)	$0.06$	0.0090 (0.0009)	$0.08$	0.0110 (0.0026)	$0.14$	0.0088 (0.0012)	$0.08$	0.0066 (0.0013)	$0.08$	0.0095 (0.0013)
$0.10$	0.0063 (0.0014)	$0.08$	0.0121 (0.0012)	$0.10$	0.0133 (0.0011)	$0.16$	0.0094 (0.0009)	$0.10$	0.0067 (0.0012)	$0.10$	0.0126 (0.0014)
$0.12$	0.0073 (0.0011)	$0.10$	0.0180 (0.0059)	$0.12$	0.0164 (0.0009)	$0.18$	0.0103 (0.0009)	$0.12$	0.0077 (0.0009)	$0.12$	0.0167 (0.0013)
$0.14$	0.0078 (0.0009)	$0.12$	0.0257 (0.0071)	$0.14$	0.0204 (0.0011)	$0.20$	0.0122 (0.0007)	$0.14$	0.0099 (0.0009)	$0.14$	0.0215 (0.0019)

Equations

$$\int_{0}^{r}\lambda_{1}(\gamma(t))\,\langle\dot{\gamma}(t),W_{n,h}(\gamma(t))\rangle\,dt.$$

$$d\lambda_{1}(x)=2\langle W(x)W(x)^{T}(x-\bar{x}),dx\rangle,$$

$$\Gamma(\bar{x}_{1},\bar{x}_{2})$$

$$\gamma=\arg\sup_{\gamma\in\Gamma(\bar{x}_{1},\bar{x}_{2})}\int_{0}^{\ell(\gamma)}\langle\dot{\gamma},W(\gamma(t))\rangle\,dt,$$

$$x_{i}=\gamma^{*}(t_{i})+\xi_{i},\quad\text{for}\ i=1,\ldots,n$$

$$\Sigma_{h^{(k)}}(z_{2j+1}^{(k)})=\frac{1}{n_{2j+1,k}}\sum_{l=1}^{n_{2j+1,k}}(y_{l}-z_{2j+1}^{(k)})\otimes(y_{l}-z_{2j+1}^{(k)}),$$

$$d\leq 1/(2\sigma),\ \text{and}\ n\leq\exp\Big\{\frac{1}{4C\sigma}\Big\},$$

$$\max_{i}d(x_{i},\gamma^{*})\leq\sigma(d+\ln(n^{C}))\leq\sigma\Big(\frac{1}{2\sigma}+\frac{1}{2\sigma}\Big)\leq\sigma.$$

$$\tilde{\gamma}^{(k+1)}=$$

$$\cup\ s\big(\gamma_{{\rm proj},2N^{(k)}-1}^{(k)}(t_{2N^{(k)}}),\bar{x}_{2}\big)\ \cup\ \big\{s\big(\gamma_{{\rm proj},2j+1}^{(k)}(t_{2j}),\gamma_{{\rm proj},2j+1}^{(k)}(t_{2j+2})\big)\big\}_{j=0}^{N^{(k)}-1}$$

$$C_{\delta}h^{(0)}\leq C_{1}\rho,\quad\|\gamma^{*}(0)-\gamma^{*}(r^{*})\|>(4C_{1}+7)h^{(0)},\quad C_{2}>4C_{1}+7$$

$$d_{H}(\tilde{\gamma}^{(K)},\gamma^{*})=O\big({h^{(k)}}^{2}\big)=O((\sigma)^{2})=O(\sigma).$$

$$\Sigma_{n,h}(x)=\frac{1}{n}\sum_{i=1}^{n}(x_{i}-x)(x_{i}-x)^{T}.$$

$$\gamma_{*}(0)=\bar{x}_{1}\quad\text{and}\quad\dot{\gamma}_{*}(t)=W(\gamma_{*}(t))\quad\text{for}\ t\in(0,C\Delta].$$

$$\sup_{\gamma}L(W,\gamma)\leq\sup_{\gamma}\int_{0}^{C\Delta}\langle\dot{\gamma}(t),W(\gamma(t))\rangle\,dt\leq\int_{0}^{C\Delta}\|\dot{\gamma}(t)\|\,\|W(\gamma(t))\|\,dt=C\Delta,$$

$$\Gamma_{+}(\bar{x}_{1},\bar{x}_{2})=\{\gamma\in\Gamma(\bar{x}_{1},\bar{x}_{2}):\dot{\gamma}(t)\odot W(\gamma(t))\geq 0\ \text{for any}\ t\}.$$

$$\bar{\gamma}_{1}=\arg\sup_{\gamma\in\Gamma_{+}(\bar{x}_{1},p_{1},v_{1})}L(W,\gamma),\quad\bar{\gamma}_{2}=\arg\sup_{\gamma\in\Gamma_{+}(p_{2},\bar{x}_{2},v_{1})}L(W,\gamma).$$

$$\gamma_{s}(t)=\begin{cases}\bar{\gamma}_{1}(t)&0\leq t\leq\ell(\bar{\gamma}_{1})\\ \bar{\gamma}_{1}(\ell(\bar{\gamma}_{1}))+\bar{\gamma}(t)&\ell(\bar{\gamma}_{1})<t\leq\ell(\bar{\gamma}_{1})+\ell(\bar{\gamma})\\ \bar{\gamma}_{1}(\ell(\bar{\gamma}_{1}))+\bar{\gamma}(\ell(\bar{\gamma}))+\bar{\gamma}_{2}(t)&\ell(\bar{\gamma}_{1})+\ell(\bar{\gamma})<t\leq\ell(\bar{\gamma}_{1})+\ell(\bar{\gamma})+\ell(\bar{\gamma}_{2}),\end{cases}$$

$$L(W,\gamma_{s})=\sup_{\gamma\in\Gamma_{+}(\bar{x}_{1},\bar{x}_{2})}L(W,\gamma).$$

$$\sup_{\gamma\in\Gamma_{+}(\bar{x}_{1},\bar{x}_{2})}L(W,\gamma)\geq\sup_{\gamma\in\Gamma(\bar{x}_{1},\bar{x}_{2})/\Gamma_{+}(\bar{x}_{1},\bar{x}_{2})}L(W,\gamma),$$

$$\Big\{\Pi_{\mathcal{M}}z:(z-\tilde{\gamma}(t))^{T}V\,{\rm diag}\Big(\frac{\lambda_{1}}{\lambda_{2}h^{2}},\dots,\frac{\lambda_{1}}{\lambda_{m}h^{2}}\Big)V^{T}(z-\tilde{\gamma}(t))=1\Big\},$$

$$\mathcal{M}:=\{x\in\mathbb{R}^{d}:F(x)=0\}.$$

$$T_{x}\mathcal{M}:=\{y\in\mathbb{R}^{d}:DF\,y=0\},\quad x\in\mathcal{M},$$

$$\exp_{x}:T_{x}\mathcal{M}\to\mathcal{M},$$

$$\log_{x}:\mathcal{M}\to T_{x}\mathcal{M}.$$

$$\frac{1}{n}\sum_{i=1}^{n}g^{2}(\cdot,x_{i}).$$

$$\Sigma_{h}(\bar{x})=\frac{1}{\sum_{i}\kappa_{h}(x_{i},\bar{x})}\sum_{i=1}^{n}\big(\log_{\bar{x}}(x_{i})\otimes\log_{\bar{x}}(x_{i})\big)\kappa_{h}(x_{i},\bar{x}),$$

$$\Sigma_{h}(x)=\Sigma_{h}(\bar{x})+(x-\bar{x})(x-\bar{x})^{T}.$$

$$\begin{aligned}d\lambda_{1}(x)&=\langle(d\Sigma_{h}(x))W(x),W(x)\rangle+\langle\Sigma_{h}(x)\,dW(x),W(x)\rangle+\langle\Sigma_{h}(x)W(x),dW(x)\rangle\\&=\langle(d\Sigma_{h}(x))W(x),W(x)\rangle+\langle dW(x),\Sigma_{h}^{T}(x)W(x)\rangle+\langle\Sigma_{h}(x)W(x),dW(x)\rangle\end{aligned}$$

Full text

Abstract

We consider fixed boundary flow with canonical interpretability as principal components extended on non-linear Riemannian manifolds. We aim to find a flow with fixed starting and ending points for noisy multivariate data sets lying on an embedded non-linear Riemannian manifold. In geometric terms, the fixed boundary flow is defined as an optimal curve that moves in the data cloud with two fixed end points. At any point on the flow, we maximize the inner product of the vector field, which is calculated locally, and the tangent vector of the flow. The rigorous definition derives from an optimization problem using the intrinsic metric on the manifolds. For random data sets, we name the fixed boundary flow the random fixed boundary flow and analyze its limiting behavior under noisy observed samples. We construct a high-level algorithm to compute the random fixed boundary flow and establish its convergence. We show that the fixed boundary flow yields a concatenation of three segments, one of which coincides with the usual principal flow when the manifold is reduced to the Euclidean space. We further prove that the random fixed boundary flow converges largely to the population fixed boundary flow with high probability. We illustrate how the random fixed boundary flow can be used and interpreted, and showcase its application in real data sets.

Zhigang Yao

Department of Statistics and Data Science

21 Lower Kent Ridge Road

National University of Singapore, Singapore 117546

Center of Mathematical Sciences and Applications

20 Garden Street

Harvard University

Cambridge, USA 02138

email: [email protected] or [email protected]

Yuqing Xia

School of Data Science

Zhejiang University of Finance and Economics

Hangzhou, China 310018

email: [email protected]

Zengyan Fan

School of Science and Technology

463 Clementi Road

Singapore University of Social Sciences, Singapore 599494

email: [email protected]

Keywords: vector field, manifolds, curve, boundary condition, tangent space

1 Introduction

Most existing statistical methods assume a linear dependency between features. As the dimensionality of features increases, the representation of the features in a high-dimensional space becomes more complex and it thus becomes more challenging to understand the relationships between features. In many applications, modern data structures are often complex and not necessarily linear. Indeed it is often the case that there is a lower-dimensional structure, namely a manifold embedded in the high-dimensional ambient space (Fefferman et al., 2016; Fefferman et al., 2018), as in the examples of geometric shapes in the shape space (Turk and Levoy, 1994; Dryden and Mardia, 2016; Kilian et al., 2007; Bradley et al., 2013) and graphs in computer graphics (Phillips et al., 1997; Gross, 2005; Arjovsky et al., 2017).

A series of methods that aim to recover the underlying structure of the lower-dimensional manifold have been developed over the past two decades. These methods, usually called manifold learning, are focused mostly on mapping data in a $d$ -dimensional space into a set of points close to an $m$ -dimensional ( $m\ll d$ ) manifold. Among them is Principal Component Analysis (PCA), which is commonly used to reduce the feature dimension in the Euclidean space. To address features lying in a non-linear space (i.e., a manifold), methods such as LLE (Roweis and Saul, 2000), Isomap (Tenenbaum et al., 2000), MDS (Cox and Cox, 2000), and LTSA (Zhang and Zha, 2004), which determine a low-dimensional embedding that preserves local properties of the data, may be preferable. A comprehensive review of such work appears in Ma and Fu (2011).

Another line of research relating to statistics on manifolds is centered on the extension of existing methods defined in the Euclidean space to the manifold space. The manifold space can be the actual physical space that the data lies on or the learnt manifold created through the manifold learning methods. In recent decades, numerous non-linear approaches have been developed to analyze the data on the manifold directly (Jupp and Kent, 1987; Fletcher et al., 2004; Huckemann and Ziezold, 2006; Kume et al., 2007; Fletcher and Joshi, 2007; Kenobi et al., 2010; Jung et al., 2012; Eltzner et al., 2018). Throughout the paper, we focus on the known manifold, based on the assumption that the manifold embedding is known.

Next, we mainly review the “curve fitting” methods on manifolds. A geodesic is a generalization of the straight line in the standard Euclidean space to the manifold. The principal geodesic analysis (Fletcher et al., 2004), which extends PCA to the manifold, was proposed to describe the non-linear variability of data on a manifold. The principal curves, proposed in Hastie and Stuetzle (1989), are flexible one-dimensional curves that pass through the middle of data points. As such, principal curves are able to better capture the non-linear variation of data in comparison to regression lines in the Euclidean space. Ozertem and Erdogmus (2011) redefined principal curves and surfaces in terms of the gradient and the Hessian of the probability density estimate, based on the consideration that every point on the principal surface should be at the local maximum of the probability density in the local orthogonal subspace, and not the expected value as in Hastie and Stuetzle (1989). For applications in classification tasks, Ladicky and Torr (2011) proposed a new curve fitting method to find the smooth decision boundary with bounded curvature.

A recent piece of work on principal flows (Panaretos et al., 2014) works as an extension of the principal curves on Riemannian manifolds. Therefore, the principal flows are also flexible one-dimensional curves, which pass through the Fréchet mean of the data points. The principal flows are able to capture the non-geodesic pattern of variation both locally and globally. Instead of handling curves with an explicit parameterization, Liu et al. (2017) combine the level set method with the principal flow algorithm to obtain a fully implicit formulation, so that the obtained co-dimension one surface on the manifold fits the data set well.

When the data comes with multiple paths, it is natural to want to isolate one particular path: the one with a fixed direction. None of the methods outlined above can determine flows with fixed directions. Hence, we propose flows with a fixed direction, each determined by fixed data boundaries, namely their start and end points. For example, we consider seismological events that took place in the Sea of Japan between 1904 and 2015, with the epicentres plotted as green dots in Figure 1(b). From the information of tectonic plates shown in Figure 1(a), we observe that the seismological events in this analysis tended to occur around the tectonic plate boundaries (shown as black curves with triangles in Figure 1(a)). Specifically, we deduce that the seismological events occurred frequently along the boundaries of four tectonic plates: the North American plate, the Eurasian plate, the Philippine Sea plate and the Pacific plate. Given these seismological events, the principal flow passes through the Fréchet mean and captures local variations that depend on the value of the scale parameter, $h$ . Since there are more seismological events along the boundary of the Pacific plate, the resulting Fréchet mean appears around the Pacific plate and the principal flow starts moving from the Fréchet mean. In Figure 1(b), the red curve represents the principal flow of the earthquake data for a scale parameter of $400$ miles. We observe that the principal flow moves along the boundary of the Pacific plate (red curve in Figure 1(c)). When we focus on the seismological events caused by the Philippine Sea plate, the principal flow will not be of interest in terms of finding a boundary. In this sense, the trend along the boundary highlighted in blue in Figure 1(c) would be more appropriate. 
Although we could derive a flow similar to that shown in blue by selecting the data with latitudes and longitudes around the Philippine Sea plate, it is hard to accurately determine which data points to include in practice. Hence, we propose fixed boundary flows, where the flow will be automatically determined by using boundary points that are chosen by users manually. If we select start and end points around the Philippine Sea plate, the obtained fixed boundary flow for a scale parameter of $400$ miles is shown in blue in Figure 1(b). Furthermore, we observe that the fixed boundary flow starts from the fixed starting point, moves along the boundary of the Philippine Sea plate (blue curve in Figure 1(c)) and terminates at the fixed ending point.

In order to clarify the aforementioned concepts and parameters, we hereby review the technique proposed for principal flows in brief and demonstrate that this technique comes up short when considering boundary constraints. Throughout this paper we work within the context of a complete Riemannian manifold ${\cal M}$ of dimension $m$ , and ${\cal M}$ is isometrically embedded into the Euclidean space $(\mathbb{R}^{d},\|\cdot\|)$ with $m<d$ . The related preliminaries in Riemannian geometry can be found in the Supplementary Materials. Given data points $\{x_{i}\}_{i=1}^{n}$ on the Riemannian manifold, the methodology for the principal flow seeks to solve for a curve on the manifold that passes through the Fréchet mean of the data, such that the tangent vector along the curve locally follows the direction of maximal variation of the data in a local tangent space. As we will define later, the vector field characterizes the direction of maximal variation and the scale parameter characterizes how locally or globally we wish to describe a path of maximal variation. A flow with a large scale parameter captures the global trend while a flow with a reduced scale parameter describes the finer structure.

Mathematically, the principal flow finds a curve $\gamma:[0,r]\to\mathcal{M}$ starting at a Fréchet mean $\bar{x}$ and maximizing

$$\int_{0}^{r}\lambda_{1}(\gamma(t))\,\langle\dot{\gamma}(t),W_{n,h}(\gamma(t))\rangle\,dt,\tag{1.1}$$

where $\lambda_{1}(x)$ and $W_{n,h}(x)$ are the first eigenvalue and the first unit eigenvector of the local tangent covariance matrix $\Sigma_{n,h}(x)$ , respectively. The definition of $\Sigma_{n,h}(x)$ is reviewed in the Supplementary Materials, and we remark that $\Sigma_{n,h}(x)$ is computed with the projections of data points onto $T_{x}{\cal M}$ , which implies that the first eigenvector $W_{n,h}(x)$ also lies in the tangent space $T_{x}{\cal M}$ . The subscript $n$ of $\Sigma_{n,h}$ indicates that $\Sigma_{n,h}(x)$ is calculated from the data points of cardinality $n$ , while the subscript $h$ of $\Sigma_{n,h}(x)$ indicates the locality. Specifically, $\Sigma_{n,h}(x)$ is computed using the data points in $B_{d}(x,h)$ , the Euclidean ball centered at $x$ of radius $h$ . As $x$ varies, the eigenvectors form the vector field $W=\{W_{n,h}(x)\}$ . To simplify notation, we will omit the subscripts $n$ and $h$ of $W_{n,h}(x)$ hereafter. The first eigenvalue is assumed to be simple throughout, which guarantees the uniqueness of $W$ . We note that the projection onto the local tangent space might be impossible in practice, in the case that either ${\cal M}$ or the formula of the local tangent space is unavailable. Under these circumstances, we might omit the projection step in computing $\Sigma_{n,h}(x)$ and use the local covariance matrix in ambient space instead.
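To make the local quantities concrete, the following is a minimal sketch of the ambient-space variant mentioned at the end of the paragraph above, that is, without the tangent-space projection. The function name `local_first_eigenpair` is hypothetical, and the normalization by the number of points inside the ball is an assumption; it rescales $\lambda_{1}(x)$ but does not change the eigenvector $W(x)$.

```python
import numpy as np

def local_first_eigenpair(points, x, h):
    """First eigenvalue and unit eigenvector of a local covariance at x.

    Assumptions (not taken from the paper): the covariance is formed in the
    ambient space, centered at x, from the points inside the Euclidean ball
    B_d(x, h), and normalized by the number of points in that ball.
    """
    points = np.asarray(points, dtype=float)
    x = np.asarray(x, dtype=float)
    diffs = points - x                                # x_i - x for every data point
    inside = np.linalg.norm(diffs, axis=1) <= h       # membership in B_d(x, h)
    if not np.any(inside):
        raise ValueError("no data points within distance h of x")
    local = diffs[inside]
    sigma = local.T @ local / len(local)              # local covariance matrix
    eigvals, eigvecs = np.linalg.eigh(sigma)          # eigenvalues in ascending order
    return eigvals[-1], eigvecs[:, -1]                # largest eigenvalue and its eigenvector

# Usage: data scattered along the horizontal axis give W(x) close to +/- e_1.
rng = np.random.default_rng(0)
pts = np.column_stack([np.linspace(-1.0, 1.0, 201), 0.01 * rng.standard_normal(201)])
lam, w = local_first_eigenpair(pts, [0.0, 0.0], h=0.5)
```

The sign of the returned eigenvector is arbitrary, so any code that assembles a vector field from such calls has to orient neighbouring vectors consistently.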

Let us think of a simple example: noisy “C”-shaped data in ${\cal M}=\mathbb{R}^{2}$ as shown in Figure 2. Furthermore, by setting $h=\infty$ , we will use this example to demonstrate that determining fixed boundary flows is not a simple extension of the work of principal flows. The first eigenvalue $\lambda_{1}(x)$ in (1.1) varies with $x$ with Figure 2 visualizing its changes. One may see that the first eigenvalue reaches its trough at the Fréchet mean, which in turn implies that the first eigenvalue would have been increasing along any direction after its departure from $\bar{x}$ . By differentiating $\lambda_{1}(x)$ (see derivation given in the Supplementary Materials) we have

$$d\lambda_{1}(x)=2\langle W(x)W(x)^{T}(x-\bar{x}),dx\rangle,$$

and that $\lambda_{1}(x)$ increases most rapidly along its gradient, that is, $W(x)$ and $-W(x)$ . Therefore, maximizing either $\lambda_{1}(\gamma(t))$ or $\langle\dot{\gamma}(t),W(\gamma(t))\rangle$ at $\gamma(t)=\bar{x}$ locally leads to two half-lines along $W(\bar{x})$ and $-W(\bar{x})$ starting from $\bar{x}$ . Hence, maximizing the optimization problem (1.1), which is the product of $\lambda_{1}(\gamma(t))$ and $\langle\dot{\gamma}(t),W(\gamma(t))\rangle$ locally, leads to the principal flow along $W(\bar{x})$ through $\bar{x}$ , as represented by the dashed line on the left panel.
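The paper defers the derivation of the differential of $\lambda_{1}(x)$ to the Supplementary Materials. One way to see where the formula $d\lambda_{1}(x)=2\langle W(x)W(x)^{T}(x-\bar{x}),dx\rangle$ comes from is sketched below, under the assumptions of this example only: $h=\infty$, ${\cal M}=\mathbb{R}^{2}$, $\bar{x}$ the sample mean, $\|W(x)\|=1$ and a simple first eigenvalue.

```latex
% With h = \infty every data point is used, and \sum_i (x_i - \bar{x}) = 0 removes the cross terms:
\Sigma(x) = \frac{1}{n}\sum_{i=1}^{n}(x_i - x)(x_i - x)^{T}
          = \Sigma(\bar{x}) + (x-\bar{x})(x-\bar{x})^{T}.

% Differentiate \lambda_1(x) = \langle \Sigma(x) W(x), W(x) \rangle:
d\lambda_1 = \langle (d\Sigma) W, W\rangle
           + \langle \Sigma\, dW, W\rangle
           + \langle \Sigma W, dW\rangle .

% \Sigma is symmetric and \Sigma W = \lambda_1 W, while \|W\| = 1 gives \langle W, dW\rangle = 0,
% so the last two terms both equal \lambda_1 \langle W, dW\rangle = 0. With
d\Sigma = dx\,(x-\bar{x})^{T} + (x-\bar{x})\,dx^{T},
% the remaining term is
d\lambda_1 = 2\,\langle W, x-\bar{x}\rangle\,\langle W, dx\rangle
           = 2\,\langle W W^{T}(x-\bar{x}),\, dx\rangle .
```

In particular, the gradient of $\lambda_{1}$ is parallel to $W(x)$, with its sign set by $\langle W(x),x-\bar{x}\rangle$, and it vanishes at $x=\bar{x}$.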

Things are very different when one considers the fixed boundary flows, which begin from the fixed starting point $\bar{x}_{1}$ , move along the data points and end at the fixed ending point $\bar{x}_{2}$ . From the right panel of Figure 2, we observe that $\lambda_{1}(x)$ is large at the boundary and decreases when a curve moves towards the data cloud’s center from $\bar{x}_{1}$ . Furthermore, from the differential form, $\lambda_{1}(x)$ decreases most rapidly along $W(\bar{x}_{1})$ , as shown by the red arrow in the right panel of Figure 2, and increases most rapidly along $-W(\bar{x}_{1})$ , the dashed arrow in the right panel of Figure 2. Therefore, if one maximizes the inner product $\langle\dot{\gamma}(t),W(\gamma(t))\rangle$ at $\gamma(t)=\bar{x}_{1}$ in (1.1), in favor of the curve moving along the vector field $W(\gamma(t))$ , one should take $\dot{\gamma}(t)=W(\gamma(t))$ . This means the curve would move along the red arrow in the right panel of Figure 2, which makes $\lambda_{1}(\gamma(t))$ decrease most rapidly. Conversely, if one maximizes $\lambda_{1}(\gamma(t))$ , one should take $\dot{\gamma}(t)=-W(\gamma(t))$ , in favor of the curve moving along $-W(\gamma(t))$ , since this is the direction of the gradient of $\lambda_{1}(\gamma(t))$ . However, taking $\dot{\gamma}(t)=-W(\gamma(t))$ makes $\langle\dot{\gamma}(t),W(\gamma(t))\rangle$ decrease fastest. From this point of view, we conclude that maximizing $\lambda_{1}(\gamma(t))$ and maximizing the inner product $\langle\dot{\gamma}(t),W(\gamma(t))\rangle$ in (1.1) are mutually conflicting goals. This conflict sets the fixed boundary flows apart from the principal flows, meaning that one cannot simply extend the optimization problem of principal flows to fixed boundary flows.

We are now motivated to consider the fixed boundary flows that capture the manifold data variation in a way that differs from the principal flows. To achieve this, in Section 2 we formulate an optimization problem to capture a smooth flow for non-random data lying on the manifold that starts and ends at pre-defined points. At each point of the flow, its tangent vector is close to the vector field at that point. When noise is present, the data follows the underlying distribution of the population flow on the manifold and is thus non-deterministic, and so too are the fixed boundary flows. The random fixed boundary flows, generalizing the fixed boundary flows, are proposed in Section 3.1. An efficient algorithm to determine the random fixed boundary flow, together with its convergence, is outlined in Section 3.2. In Sections 4 and 5, we illustrate that the random fixed boundary flow is able to capture patterns of variation in synthetic, seismic and real-world image data. Several statistical properties and theories of the fixed boundary flow are examined in Section 6.

2 Fixed boundary flows

Fixing two boundary points produces an infinite number of flows. To begin with, we describe the class of curves that provide the candidates of the fixed boundary flows. Given $\bar{x}_{1}$ and $\bar{x}_{2}$ , we define the class as:

[TABLE]

where $W(\gamma(t))$ is the value of the vector field $W$ , calculated from local data $\{x_{i}\}_{i=1}^{n}$ at $\gamma(t)$ , and $\ell(\gamma[0,t])$ denotes the length of the parametric flow $\gamma[0,t]$ from $\gamma(0)$ to $\gamma(t)$ , for all $0<t\leq r$ . Here, $\Delta=d(\bar{x}_{1},\bar{x}_{2})$ denotes the geodesic distance between $\bar{x}_{1}$ and $\bar{x}_{2}$ and $C>1$ is a given constant. The choice of $C$ controls the size of $\Gamma(\bar{x}_{1},\bar{x}_{2})$ . Since $t\in[0,C\Delta]$ , the length of the flows in the class $\Gamma(\bar{x}_{1},\bar{x}_{2})$ is at most $C\Delta$ . By restricting the length of flows in the class $\Gamma(\bar{x}_{1},\bar{x}_{2})$ , a smaller $C$ filters out the flows that (1) are far away from the data cloud, and (2) overfit the data (overfitted flows tend to go through all data points, which increases their length). We assume $\Delta<1$ without loss of generality; otherwise the manifold ${\cal M}$ should be rescaled. For any flow $\gamma\in\Gamma(\bar{x}_{1},\bar{x}_{2})$ , we can determine its moving direction and vector field at every point. The moving directions and vector fields vary with different points and different flows. To follow the direction of highest variation, we aim to find a flow with a moving direction that matches the vector field as much as possible at any given point on the flow. From the classical mechanics perspective, we seek a flow with fixed starting and ending points that best approximates the vector field globally. Conventional local Euclidean approaches fail to achieve this, since they cannot accommodate the boundary conditions globally while forcing the flow to stay on the manifold. 
We term such an optimal flow the fixed boundary flow; that is, it is defined as a smooth flow $\gamma$ on the manifold $\mathcal{M}$ , starting and ending at the fixed points, with a derivative vector $\dot{\gamma}$ that is maximally compatible with the vector field $W$ , calculated from local data.

Definition 2.1.

(Fixed boundary flow at scale $h$ ) Let $\bar{x}_{1},\bar{x}_{2}\in B$ , where $B$ is the neighborhood that contains the data $\{x_{i}\}_{i=1}^{n}$ on the manifold. Assume that $\Sigma_{n,h}(x)$ has distinct first and second eigenvalues for any $x\in B$ . A fixed boundary flow of $\{x_{i}\}_{i=1}^{n}$ with given $\bar{x}_{1}$ and $\bar{x}_{2}$ is the curve satisfying

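$$\gamma=\arg\sup_{\gamma\in\Gamma(\bar{x}_{1},\bar{x}_{2})}\int_{0}^{\ell(\gamma)}\langle\dot{\gamma},W(\gamma(t))\rangle\,dt,\tag{2.2}$$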

where $W(\gamma(t))$ is the vector field over the neighborhood of $\gamma(t)$ for $0\leq t\leq C\Delta$ .

The fixed boundary flow is the solution of the optimization problem defined in (2.2).
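As an illustration of the objective in (2.2), the sketch below approximates $\int\langle\dot{\gamma}(t),W(\gamma(t))\rangle\,dt$ for a discretized curve by a midpoint Riemann sum. It is not the authors' algorithm: the function name `flow_objective`, the Euclidean setting and the analytically specified, consistently oriented vector field are all assumptions made for the example.

```python
import numpy as np

def flow_objective(curve, vector_field):
    """Midpoint Riemann sum for the integral of <gamma'(t), W(gamma(t))> dt.

    curve: (K, d) array of ordered points on a polyline in Euclidean space.
    vector_field: callable returning a unit vector W(p) at a point p
                  (assumed to be consistently oriented along the curve).
    """
    curve = np.asarray(curve, dtype=float)
    segments = np.diff(curve, axis=0)              # gamma(t_{k+1}) - gamma(t_k)
    midpoints = 0.5 * (curve[:-1] + curve[1:])     # where W is evaluated
    return float(sum(np.dot(seg, vector_field(mid))
                     for seg, mid in zip(segments, midpoints)))

def circle_tangent(p):
    """Hypothetical vector field: counter-clockwise unit tangent of circles about 0."""
    return np.array([-p[1], p[0]]) / np.linalg.norm(p)

# Two candidate curves from (1, 0) to (0, 1): the quarter arc and the chord.
theta = np.linspace(0.0, np.pi / 2, 200)
arc = np.column_stack([np.cos(theta), np.sin(theta)])
chord = np.array([[1.0, 0.0], [0.0, 1.0]])
arc_value = flow_objective(arc, circle_tangent)
chord_value = flow_objective(chord, circle_tangent)
```

By the Cauchy-Schwarz inequality the objective of any curve is bounded by its length when $\|W\|=1$, and the bound is attained only when the curve follows the field. In this example the arc, which follows the field, scores close to its length $\pi/2$, while the shorter chord scores $\sqrt{2}$.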

3 Random fixed boundary flows

Besides being high-dimensional, the data on the manifold is usually noisy, representing some underlying distribution. One accessible way to illustrate the noisy data is shown in the following assumption.

Definition 3.1.

(Population flow) Given boundary points $\bar{x}_{1}$ and $\bar{x}_{2}$ , there exists a population flow $\gamma^{\ast}\subset{\cal M}$ under unit speed parameterization, depending on the continuous vector field $W^{\ast}$ distribution. Assume that $\gamma^{\ast}$ passes through $\bar{x}_{1}$ and $\bar{x}_{2}$ , which means $\gamma^{\ast}(0)\neq\bar{x}_{1}$ and $\gamma^{\ast}(r^{\ast})\neq\bar{x}_{2}$ with $r^{\ast}=\ell(\gamma^{\ast})$ .

Assumption 3.1.

The data points $\{x_{i}\}_{i=1}^{n}\subset\mathcal{M}$ satisfy

$$x_{i}=\gamma^{*}(t_{i})+\xi_{i},\quad\text{for}\ i=1,\ldots,n,$$

where $t_{1}\leq\cdots\leq t_{n}$ are ordered indices sampled from the uniform distribution on $[0,r^{*}]$ along $\gamma^{*}$ and $\{\xi_{i}\}_{i=1}^{n}\sim N(0,\sigma^{2}I_{d})$ are i.i.d. Gaussian noise terms.
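The sampling model of Assumption 3.1, $x_{i}=\gamma^{*}(t_{i})+\xi_{i}$ with uniformly sampled ordered $t_{i}$ and Gaussian noise, can be simulated as in the following sketch. The population flow used here (a unit-speed arc of the unit circle in $\mathbb{R}^{2}$), the function names and the parameter values are hypothetical choices for illustration.

```python
import numpy as np

def sample_noisy_flow(population_flow, length, n, sigma, seed=0):
    """Draw x_i = gamma*(t_i) + xi_i with t_i ~ U[0, length] sorted, xi_i ~ N(0, sigma^2 I)."""
    rng = np.random.default_rng(seed)
    t = np.sort(rng.uniform(0.0, length, size=n))        # ordered indices t_1 <= ... <= t_n
    clean = np.array([population_flow(ti) for ti in t])   # points on the population flow
    noise = sigma * rng.standard_normal(clean.shape)      # isotropic Gaussian noise
    return t, clean + noise

def unit_circle_arc(t):
    """Hypothetical population flow: unit-speed arc of the unit circle in R^2."""
    return np.array([np.cos(t), np.sin(t)])

t, x = sample_noisy_flow(unit_circle_arc, length=np.pi, n=500, sigma=0.05)
```

Here the noise is added in the ambient space, which coincides with the manifold because ${\cal M}=\mathbb{R}^{2}$ in this toy example.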

Remark 3.1.

Also, $\gamma^{*}(t_{1})\neq\bar{x}_{1}$ and $\gamma^{*}(t_{n})\neq\bar{x}_{2}$ . As shown in Figure 3, $\bar{x}_{1}$ and $\bar{x}_{2}$ are chosen to be at the inner end of the data cloud so that there are enough samples in the neighborhood of $\bar{x}_{1}$ and $\bar{x}_{2}$ . Section 3.2 will further formulate the relationship between $\bar{x}_{1}$ ( $\bar{x}_{2}$ ) and the end points of the population flow.

Under Assumption 3.1, the relation between the fixed boundary flow and $\gamma^{\ast}$ is summarized in the following theorem.

Theorem 3.1.

Suppose Assumption 3.1 holds, the vector field $W$ is calculated at scale $h$ and $T=\{t:\|\gamma^{\ast}(t)-\gamma^{\ast}(0)\|>h/2\ \mbox{and}\ \|\gamma^{\ast}(t)-\gamma^{\ast}(r^{*})\|>h/2\}$ . For any $t\in T$ and given $\delta>0$ , there exist constants $C$ and $n_{0}$ such that if $n\geq n_{0}$ , then $\langle\dot{\gamma}^{\ast}(t),W(\gamma^{\ast}(t))\rangle\geq 1-\frac{C}{2}h^{2}$ with probability $1-\delta$ .

The proof of Theorem 3.1 is given in Appendix B in the Supplementary Materials. From Theorem 3.1, we observe that the inner product $\langle\dot{\gamma}^{\ast}(t),W(\gamma^{*}(t))\rangle$ is close to its maximum value of $1$ when $h$ is sufficiently small. This means that the integrand in the optimization problem (2.2) achieves a very large value along the main segment of the flow $\gamma^{*}$ . Note that we choose to work with $\gamma$ simply because there might not be enough samples at the two ends of $\gamma^{*}$ . Here, the boundary points $\bar{x}_{1}$ and $\bar{x}_{2}$ are at the inner end of the data cloud so that the main segment $\gamma^{*}(\bar{x}_{1},\bar{x}_{2})$ is $h/2$ away from $\gamma^{*}(0)$ and $\gamma^{*}(r^{*})$ . Hence, $\gamma^{*}(\bar{x}_{1},\bar{x}_{2})$ approximates the optimal solution to (2.2) well, as illustrated by Figure 3. The theoretical analysis will focus on the main segment of $\gamma^{*}$ . For convenience, we write $\gamma^{\ast}$ for $\gamma^{\ast}(\bar{x}_{1},\bar{x}_{2})$ in the rest of the paper.

Now, let us turn to the random fixed boundary flows. Under Assumption 3.1 and given fixed boundaries, a random fixed boundary flow is the empirical flow, $\tilde{\gamma}$ , computed from the data points with the fixed boundary. Our focus here is twofold. The first aim is to determine the random fixed boundary flows through an efficient algorithm without intensive computation. The second is to investigate the distance between the random fixed boundary flow and the population flow $\gamma^{\ast}$ , where a bound on the Hausdorff distance is derived from the geometric properties of the underlying manifold.
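The Hausdorff distance $d_{H}$ used to compare an empirical flow with the population flow can be approximated for two discretized curves as in the sketch below. It treats each curve as a finite point set and uses Euclidean distances, which is an assumption of this example; the function name `hausdorff_distance` is hypothetical.

```python
import numpy as np

def hausdorff_distance(a, b):
    """Symmetric Hausdorff distance between two finite point sets (rows are points)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # pairwise[i, j] = Euclidean distance between a[i] and b[j]
    pairwise = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    a_to_b = pairwise.min(axis=1).max()   # farthest point of a from the set b
    b_to_a = pairwise.min(axis=0).max()   # farthest point of b from the set a
    return float(max(a_to_b, b_to_a))

# Usage: every point of a lies in b, but b has one point at distance 2 from a.
a = [[0.0, 0.0], [1.0, 0.0]]
b = [[0.0, 0.0], [1.0, 0.0], [1.0, 2.0]]
d = hausdorff_distance(a, b)
```

For densely sampled curves the point-set value is close to the distance between the curves themselves; the pairwise matrix costs memory proportional to the product of the two sample sizes.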

3.1 Determination of random fixed boundary flows

The aim of the proposed approach is to determine the random fixed boundary flow via a discrete flow with the fixed boundary. Furthermore, each point of the discrete flow moves along the direction of the vector field, which captures the localized variation maximally. From this perspective, the proposed approach attains an approximate solution for the original optimization problem in (2.2).

Given the fixed boundary points $\bar{x}_{1}$ and $\bar{x}_{2}$, the implementation begins with a discrete flow $\tilde{\gamma}^{\scriptscriptstyle(0)}$ starting at $\bar{x}_{1}$ and ending at $\bar{x}_{2}$, with a user-defined resolution $N$. The initial flow $\tilde{\gamma}^{\scriptscriptstyle(0)}$ can be chosen as a geodesic on the manifold $\mathcal{M}$ or as a straight line from $\bar{x}_{1}$ to $\bar{x}_{2}$ in the ambient space; neither choice derails the convergence of the algorithm, as we will show. The initial flow is denoted by $\tilde{\gamma}^{\scriptscriptstyle(0)}(t_{i})$, with $0=t_{0}<t_{1}<\cdots<t_{2N}=1$, and satisfies $\tilde{\gamma}^{\scriptscriptstyle(0)}(t_{0})=\bar{x}_{1}$ and $\tilde{\gamma}^{\scriptscriptstyle(0)}(t_{2N})=\bar{x}_{2}$. Then, the proposed approach iteratively updates the flow $\tilde{\gamma}^{\scriptscriptstyle(k)}(t_{i})$ from $k=1$ until the convergence criterion is met. Gradually, at each point, the flow $\tilde{\gamma}^{\scriptscriptstyle(k)}(t_{i})$ is determined to maximize the localized variation of the data. Hence, user-defined values for the scale parameter $h$, the shrinkage constant $\rho$ and the stopping criterion constant $\epsilon$ are needed during the iterations.

During the iterations, we update the discrete flow $\tilde{\gamma}^{\scriptscriptstyle(k)}(t_{i})$, $i=0,1,\ldots,2N$, for $k=1,2,\ldots$ so as to approximately maximize the objective of the optimization problem (2.2). There are four core steps with this aim in mind: choosing the scale parameter, calculating the local covariance matrix, determining the vector field, and updating. Here, we elaborate on each of these core steps, as shown in Figure 4.

(1)

Choosing scale parameter: we choose an appropriate scale parameter $h^{\scriptscriptstyle(k)}=\rho^{k}h$ , where $h\leq 1$ and $\rho\in(0,1]$ is a shrinkage constant. In our study, we let $\rho=0.9$ . One may note that the shrinkage constant $\rho$ makes the scale parameter $h^{\scriptscriptstyle(k)}$ decrease during the iterations. Hence, the scale parameter $h^{\scriptscriptstyle(k)}$ guarantees the capture of the local variation.

(2)

Calculating local covariance matrix: the local covariance matrix is determined by using the discrete flow $\tilde{\gamma}^{\scriptscriptstyle(k-1)}$ that we have obtained from the previous iteration. Specifically, we use the points $\tilde{\gamma}^{\scriptscriptstyle(k-1)}(t_{2j+1})$ , $j=0,1,\ldots,N-1$ , with odd indices to calculate the local covariance matrix. Determining the local covariance matrix is a vital step for the following updating step of the discrete flow. We note that the points $\tilde{\gamma}^{\scriptscriptstyle(k-1)}(t_{2j+1})$ , $j=0,1,\ldots,N-1$ , may not lie inside the data cloud. To capture the local variation accurately, we propose to project these points back inside the data cloud. To this aim, we first project these points to the nearest data points. As the nearest data points might be outliers, we further select the local data points within the distance of $h^{\scriptscriptstyle(k)}$ from the nearest data points and obtain the mean points. Eventually, the projected points $\tilde{\gamma}_{\text{proj}}^{\scriptscriptstyle(k)}(t_{2j+1})$ , $j=0,\ldots,N-1$ are the nearest data points to the mean points. Then, the projected points $\tilde{\gamma}_{\text{proj}}^{\scriptscriptstyle(k)}(t_{2j+1})$ are used to select the local data points to further compute the local covariance matrix. Denote by $\{y_{l}\}_{l=1}^{n_{2j+1,k}}$ the data points in the neighborhood of the Euclidean ball $B_{d}(\tilde{\gamma}_{\rm proj}^{{\scriptscriptstyle(k)}}(t_{2j+1}),h^{{\scriptscriptstyle(k)}})$ with center $\tilde{\gamma}_{\rm proj}^{{\scriptscriptstyle(k)}}(t_{2j+1})$ and radius $h^{{\scriptscriptstyle(k)}}$ . Eventually, the local covariance matrix is computed at the mean $z^{{\scriptscriptstyle(k)}}_{2j+1}$ of the local data points $\{y_{l}\}_{l=1}^{n_{2j+1,k}}$ and can be calculated by

[TABLE]

where $a\otimes a=aa^{T}$ .

(3)

Determining vector field: following on from the local covariance matrix $\Sigma_{h^{\scriptscriptstyle(k)}}(z^{{\scriptscriptstyle(k)}}_{2j+1})$ that we obtained in the previous step, the vector field is determined at the mean points $z^{{\scriptscriptstyle(k)}}_{2j+1}$. Denote by $W(\tilde{\gamma}^{\scriptscriptstyle(k)}(t_{2j+1}))$ the vector field at point $\tilde{\gamma}^{\scriptscriptstyle(k)}(t_{2j+1})$, $j=0,1,\ldots,N-1$. By now, the local variation is captured by the local covariance matrix $\Sigma_{h^{\scriptscriptstyle(k)}}(z^{{\scriptscriptstyle(k)}}_{2j+1})$. Hence, the direction along the first eigenvector $e_{1}(z^{{\scriptscriptstyle(k)}}_{2j+1})$ shows the maximum variation. We let the vector field $W(\tilde{\gamma}^{\scriptscriptstyle(k)}(t_{2j+1}))=e_{1}(z^{{\scriptscriptstyle(k)}}_{2j+1})$.

(4)

Updating: as the boundary points are fixed, we first let $\tilde{\gamma}^{\scriptscriptstyle(k)}(t_{0})=\bar{x}_{1}$ and $\tilde{\gamma}^{\scriptscriptstyle(k)}(t_{2N})=\bar{x}_{2}$. For the points with odd indices, we let $\tilde{\gamma}^{\scriptscriptstyle(k+1)}(t_{2j+1})=z^{{\scriptscriptstyle(k)}}_{2j+1}$, $j=0,1,\ldots,N-1$. The vector field $W(\tilde{\gamma}^{\scriptscriptstyle(k)}(t_{2j+1}))$ is used to update the points with even indices. Specifically, we project the points $\tilde{\gamma}^{\scriptscriptstyle(k-1)}(t_{2j})$ onto the directions of the two adjacent vector fields $W(\tilde{\gamma}^{\scriptscriptstyle(k)}(t_{2j-1}))$ and $W(\tilde{\gamma}^{\scriptscriptstyle(k)}(t_{2j+1}))$ for $j=1,2,\ldots,N-1$. We denote the two projected points by $\tilde{\gamma}_{{\rm proj},2j-1}^{\scriptscriptstyle(k)}(t_{2j})$ and $\tilde{\gamma}_{{\rm proj},2j+1}^{\scriptscriptstyle(k)}(t_{2j})$. Hence, we update the points $\tilde{\gamma}^{\scriptscriptstyle(k)}(t_{2j})$ by using the mean point of $\tilde{\gamma}_{{\rm proj},2j-1}^{\scriptscriptstyle(k)}(t_{2j})$ and $\tilde{\gamma}_{{\rm proj},2j+1}^{\scriptscriptstyle(k)}(t_{2j})$.
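Steps (2) to (4) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' Algorithm 1: the local covariance is taken as an unweighted average over the samples in the ball (the displayed formula is not reproduced here, so any weighting is unknown), and "mapping a point to the direction of a vector field" is interpreted as orthogonal projection onto the line through the local mean along that direction. The function names `project_to_cloud`, `local_direction`, `project_onto_line` and `update_even_point` are hypothetical.

```python
import numpy as np

def project_to_cloud(p, X, h):
    """Step (2), projection: nearest sample -> local mean -> nearest sample to that mean."""
    nearest = X[np.argmin(np.linalg.norm(X - p, axis=1))]
    local = X[np.linalg.norm(X - nearest, axis=1) <= h]   # never empty: contains `nearest`
    mean = local.mean(axis=0)
    return X[np.argmin(np.linalg.norm(X - mean, axis=1))]

def local_direction(p, X, h):
    """Steps (2)-(3): return the local mean z and the first eigenvector e1 of the local covariance."""
    center = project_to_cloud(p, X, h)
    Y = X[np.linalg.norm(X - center, axis=1) <= h]        # samples in the ball B(center, h)
    z = Y.mean(axis=0)
    D = Y - z
    cov = D.T @ D / len(Y)      # assumption: unweighted average of (y - z)(y - z)^T
    _, eigvecs = np.linalg.eigh(cov)                      # eigenvalues in ascending order
    return z, eigvecs[:, -1]                              # sign of e1 is arbitrary

def project_onto_line(p, z, w):
    """Orthogonal projection of p onto the line {z + s * w}, with w a unit vector."""
    return z + np.dot(p - z, w) * w

def update_even_point(p, z_prev, w_prev, z_next, w_next):
    """Step (4): mean of the projections onto the two adjacent direction lines."""
    return 0.5 * (project_onto_line(p, z_prev, w_prev)
                  + project_onto_line(p, z_next, w_next))
```

Because the projection onto a line does not depend on the sign of `w`, the sign ambiguity of the eigenvector returned by `np.linalg.eigh` does not affect the update in this sketch.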

It is crucial to ensure that the random fixed boundary flow always moves along the direction that maximizes the vector field. Therefore, a stopping criterion is necessary for the implementation. Accordingly, we terminate the iteration process when the objective function $f(\tilde{\gamma}^{\scriptscriptstyle(k)})$ no longer changes appreciably. Lastly, interpolation and projection are implemented to ensure that the points $\tilde{\gamma}(t_{i})$ on the resulting random fixed boundary flow are equidistant and lie on the manifold. The detailed algorithm is summarized in Algorithm 1. The convergence of the random fixed boundary flow will be investigated in Section 3.2.

3.2 Convergence of the random fixed boundary flow

In the following statement, ${\cal I}(x,h)=\{i:x_{i}\in B_{d}(x,h)\}$ and $|{\cal I}(x,h)|$ denotes the cardinality of ${\cal I}(x,h)$. We use upper-case $C$, $C_{0},C_{1},\cdots$ to denote constants greater than 1 and lower-case $c,c_{0},c_{1},\cdots$ to denote constants less than 1. Here, a constant means a value independent of $h$, $h^{{\scriptscriptstyle(k)}}$ and $x$. Values of $C$ and $c$ with various subscripts may differ from line to line.

Recalling our Assumption 3.1, samples are blurred by Gaussian noise. Hence, by Gaussian concentration, the maximal distance between a point $x_{i}$ and $\gamma^{\ast}$ is bounded above by $\sigma(\sqrt{d}+\sqrt{\ln(n^{C})})$ with probability at least $1-n^{-C}$ . If $\sigma$ is sufficiently small such that

[TABLE]

we can further bound the maximal distance between a point $x_{i}$ and $\gamma^{\ast}$ above by $\sqrt{\sigma}$ , with probability $1-n^{-C}$ , since the following holds

[TABLE]

The inequality above shows that the samples mainly lie in the tube $T_{*}=\{x:d(x,\gamma^{\ast})\leq\sqrt{\sigma}\}$ surrounding $\gamma^{\ast}$. For a point $x$ satisfying $d(x,\gamma^{\ast})\leq\epsilon$ with $\epsilon>\sqrt{\sigma}$, the intersection $B_{d}(x,2\epsilon)\cap T_{*}$ cannot be ignored, which motivates the following assumption that $B_{d}(x,2\epsilon)\cap X\neq\emptyset$.
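The concentration bound $\sigma(\sqrt{d}+\sqrt{\ln(n^{C})})$ can be checked empirically. The sketch below assumes isotropic Gaussian noise in $\mathbb{R}^{d}$ with standard deviation $\sigma$ and uses $\ln(n^{C})=C\ln n$; the helper names `max_noise_norm` and `concentration_bound`, and the values of $n$, $d$, $\sigma$ and $C$ in the usage note, are illustrative choices rather than quantities from the paper.

```python
import numpy as np

def max_noise_norm(n, d, sigma, rng):
    """Largest Euclidean norm among n isotropic Gaussian noise vectors in R^d."""
    noise = sigma * rng.standard_normal((n, d))
    return np.linalg.norm(noise, axis=1).max()

def concentration_bound(n, d, sigma, C):
    """The bound sigma * (sqrt(d) + sqrt(ln(n^C))) quoted in the text."""
    return sigma * (np.sqrt(d) + np.sqrt(C * np.log(n)))
```

For instance, with `n=1000`, `d=3`, `sigma=0.01` and `C=4`, the bound is roughly `0.07`, which is also below `sqrt(sigma) = 0.1`, and a simulated maximum from `max_noise_norm` should fall under it with overwhelming probability.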

Assumption 3.2.

For any $\epsilon>\sqrt{\sigma}$ , if $x$ satisfies $d(x,\gamma^{\ast})\leq\epsilon$ , then $B_{d}(x,2\epsilon)\cap X\neq\emptyset$ .

Note that Step 3(a) of Algorithm 1 projects points to the data cloud by finding their nearest samples in $X$. Assumption 3.2 bounds the distance between the given point and the projected point from above, which essentially leads to the convergence. Algorithm 1 selects decreasing scales $h^{{\scriptscriptstyle(k)}}=\rho h^{{\scriptscriptstyle(k-1)}}$ with a given $\rho\in(0,1]$ in each iteration, until the scale is less than $4\sqrt{\sigma}$ or the objective function hardly changes. Each iteration takes the output discrete flow of the previous iteration as input, updates the vector field with a smaller scale and outputs a discrete flow using the updated vector field. Theorems 3.2-3.4, whose full proofs are given in Appendix C in the Supplementary Materials, together prove that the random fixed boundary flow converges to the population flow $\gamma^{\ast}$, given certain conditions on the initial discrete flow.
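The scale schedule described above can be illustrated as follows. This is a minimal sketch that only implements the scale-based part of the stopping rule (the objective-based part is omitted), and the exact form of the check in Algorithm 1 is an assumption here. The function name `scale_schedule` and the `max_iter` guard, which is needed when $\rho=1$, are hypothetical.

```python
import math

def scale_schedule(h, rho, sigma, max_iter=1000):
    """List the scales h^(k) = rho**k * h for k = 1, 2, ... that are at least 4*sqrt(sigma)."""
    floor = 4.0 * math.sqrt(sigma)
    scales = []
    for k in range(1, max_iter + 1):
        hk = rho**k * h
        if hk < floor:          # assumed stopping rule: scale dropped below 4*sqrt(sigma)
            break
        scales.append(hk)
    return scales
```

With `h=1.0`, `rho=0.9` and `sigma=1e-4`, the floor is `0.04` and the schedule contains 30 scales, so the number of iterations grows only logarithmically in $h/\sqrt{\sigma}$.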

Specifically, Theorem 3.2 exploits the $k$ -th iteration and bounds $d_{H}(\tilde{\gamma}^{\scriptscriptstyle(k+1)},\gamma^{\ast})$ above when (a) its input discrete flow is sufficiently close to $\gamma^{\ast}$ , (b) the points in the discrete flow are sufficiently dense, and (c) the points with odd indices are not too close to the two ends of the population flow. Note that (c) is needed since the vector field near the two ends does not follow the population flow. This means that the fixed boundaries $\bar{x}_{1}$ and $\bar{x}_{2}$ should be chosen not too close to the two ends, $\gamma^{\ast}(0)$ and $\gamma^{\ast}(r^{*})$ , in practice. Theorem 3.3 proves that imposing constraints on the initial discrete flow, that is the input discrete flow for $k=0$ , also leads to the upper bound of $d_{H}(\tilde{\gamma}^{\scriptscriptstyle(k+1)},\gamma^{\ast})$ . Theorem 3.4 proves the convergence of the random fixed boundary flow, as the projection of $\tilde{\gamma}^{\scriptscriptstyle(K)}$ onto ${\cal M}$ .

Theorem 3.2.

Suppose the discrete curve at the $k$ -th iteration satisfies the following conditions:

(a)

$d_{H}(\{\tilde{\gamma}^{{\scriptscriptstyle(k)}}(t_{i})\}_{i=0}^{2N^{\scriptscriptstyle(k)}},\gamma^{\ast})\leq C_{1}h^{{\scriptscriptstyle(k)}}$ ,

(b)

$\|\tilde{\gamma}^{{\scriptscriptstyle(k)}}(t_{i+1})-\tilde{\gamma}^{{\scriptscriptstyle(k)}}(t_{i})\|\leq C_{2}h^{{\scriptscriptstyle(k)}}$ for any $i<2N^{{\scriptscriptstyle(k)}}$,

(c)

$\|\tilde{\gamma}^{{\scriptscriptstyle(k)}}(t_{2j+1})-\gamma^{\ast}(0)\|\geq(2C_{1}+3.5)h^{{\scriptscriptstyle(k)}}$ and $\|\tilde{\gamma}^{{\scriptscriptstyle(k)}}(t_{2j+1})-\gamma^{\ast}(r^{*})\|\geq(2C_{1}+3.5)h^{{\scriptscriptstyle(k)}}$ for any $j=0,1,\cdots,N^{{\scriptscriptstyle(k)}}-1$.

For any given $\delta>0$, there exists $C_{\delta}$ such that any point in the polyline

[TABLE]

is also within Hausdorff distance $C_{\delta}{h^{{\scriptscriptstyle(k)}}}^{2}$ to $\gamma^{\ast}$ with probability $1-\delta$ .

We only sketch the proof of Theorem 3.2. Recalling Algorithm 1, the polyline $\tilde{\gamma}^{\scriptscriptstyle(k+1)}$ is composed of segments passing through $\{\tilde{\gamma}^{\scriptscriptstyle(k+1)}(t_{2j+1})\}_{j=0}^{N^{\scriptscriptstyle(k)}-1}$ and along $\{W(\tilde{\gamma}^{\scriptscriptstyle(k+1)}(t_{2j+1}))\}_{j=0}^{N^{\scriptscriptstyle(k)}-1}$. Lemma 3.1 in the Supplementary Materials proves that $\{\tilde{\gamma}^{\scriptscriptstyle(k+1)}(t_{2j+1})\}_{j=0}^{N^{\scriptscriptstyle(k)}-1}$ are within Hausdorff distance $O({h^{{\scriptscriptstyle(k)}}}^{2})$ of $\gamma^{\ast}$. Lemma 3.2 proves that the vector field at $\tilde{\gamma}^{\scriptscriptstyle(k+1)}(t_{2j+1})$ approximates the tangent vector of $\gamma^{\ast}$ to the order of $h^{\scriptscriptstyle(k)}$. Based on these, Lemma 3.3 in the Supplementary Materials proves that the segments which pass through $\{\tilde{\gamma}^{\scriptscriptstyle(k+1)}(t_{2j+1})\}_{j=0}^{N^{\scriptscriptstyle(k)}-1}$ along $\{W(\tilde{\gamma}^{\scriptscriptstyle(k+1)}(t_{2j+1}))\}_{j=0}^{N^{\scriptscriptstyle(k)}-1}$ approximate $\gamma^{\ast}$ to the order of ${h^{\scriptscriptstyle(k)}}^{2}$. Hence, the polyline $\tilde{\gamma}^{\scriptscriptstyle(k+1)}$, which is composed of these segments, is also within Hausdorff distance $O({h^{{\scriptscriptstyle(k)}}}^{2})$ of $\gamma^{\ast}$.

Theorem 3.3.

If the conditions (a)-(c) in Theorem 3.2 hold for $k=0$ and the constants in Theorem 3.2 further satisfy

[TABLE]

then for any given $k>0$ , $d_{H}(\tilde{\gamma}^{\scriptscriptstyle(k+1)},\gamma^{\ast})\leq C{h^{{\scriptscriptstyle(k)}}}^{2}$ with probability $(1-\delta)^{k+1}$ .

According to the stopping criteria, $h^{{\scriptscriptstyle(K)}}=O(\sqrt{\sigma})$ when Algorithm 1 stops. Hence, the final polyline $\tilde{\gamma}^{\scriptscriptstyle(K)}$ satisfies

[TABLE]

Note that the interpolation step generates a discrete curve containing $\tilde{\gamma}^{\scriptscriptstyle(K)}$, and the projection step does not change the order of the Hausdorff distance, as Theorem 3.4 shows. To be precise, the final discrete curve of Algorithm 1 is located in a tube along the population curve $\gamma^{\ast}$ with a radius of order $\sqrt{\sigma}$.

Theorem 3.4.

If $d(x,\gamma^{\ast})=O(h)$ , then $d(\tilde{x},\gamma^{\ast})=O(h)$ , where $\tilde{x}$ is the projection of $x$ onto ${\cal M}$ .

4 Simulations

To illustrate the performance of random fixed boundary flows, we studied several random data sets generated on two manifolds, a unit sphere and a right-circular unit cone. The two manifolds are in $\mathbb{R}^{3}$ with intrinsic dimension $d=2$ . In the simulation, the boundary points were selected manually from the given data set so that there are enough data points around the boundary points to calculate the local variation. To generate the random fixed boundary flows, we applied the proposed algorithm with different values of the scale parameter $h$ . Here, we note that the random fixed boundary flow is a discrete curve with derivatives that approximately capture the direction of the maximum local variation depending on $h$ . Throughout the numerical studies in sections 4 and 5, we use RFBFs to denote random fixed boundary flows.

In the first part of the simulation, we evaluate the performance of the RFBFs on the unit sphere. The noisy data sets are randomly generated from three population flows, which are plotted in purple in Figure 5 (a)-(c). Specifically, Gaussian noise is added to the points on the population flows with a constraint such that the perturbed points remain on the test manifold. In this manner, we generated three noisy data sets, each representing different types of variation on the unit sphere. The first data set is concentrated around a “C”-shaped curve on the unit sphere, thus presenting a variation pattern along the geodesic. After that, we considered two data sets from two non-convex closed flows. In this setting, the simulated data sets present local variation patterns along the non-convex flows. In particular, the second data set is generated from a quarter of the six-fold star-shaped flow, and the third data set is concentrated around a half of the two-fold star-shaped flow.
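One way to generate such constrained noisy data is sketched below. The paper does not state how the constraint is enforced, so the choices here are assumptions: the population flow is a half great circle standing in for a “C”-shaped curve, the Gaussian noise is added in the ambient space $\mathbb{R}^{3}$, and the perturbed points are returned to the unit sphere by normalization. The function name `noisy_arc_on_sphere` and its parameters are hypothetical.

```python
import numpy as np

def noisy_arc_on_sphere(n, noise_sd, rng):
    """Sample n noisy points around a half great circle, constrained to the unit sphere."""
    theta = rng.uniform(0.0, np.pi, n)                    # positions along the arc
    pts = np.column_stack([np.cos(theta), np.sin(theta), np.zeros(n)])
    pts = pts + noise_sd * rng.standard_normal(pts.shape) # ambient Gaussian perturbation
    # Assumption: normalization is used to put the perturbed points back on the sphere.
    return pts / np.linalg.norm(pts, axis=1, keepdims=True)
```

Other constructions, such as sampling the noise in the tangent space and applying the exponential map, would satisfy the same constraint and may be closer to what was actually used.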

To obtain RFBFs, the initial flows used in our analysis are straight lines connecting $\bar{x}_{1}$ and $\bar{x}_{2}$. One may use other initial flows, for example, the geodesic from $\bar{x}_{1}$ to $\bar{x}_{2}$. Given a set of randomly generated data, we obtained an RFBF with a specific $h$. For the data sets plotted in Figure 5 (a)-(c), the RFBFs obtained with a specific value of $h$ are illustrated in red in Figure 5 (d)-(f). To further investigate the performance, we obtained ten sets of random data for each population flow. The RFBFs are then obtained with a sequence of values of $h$ for the random data. The mean errors of the Hausdorff distances between the population flow and the RFBFs are summarized in Table 1. From the numerical results, we note that the RFBFs are able to capture the variation globally and locally. As we lower $h$, the accuracy of the RFBFs generally improves. On the other hand, overfitting may occur as we lower $h$ further.
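The error measure reported in Table 1 is the Hausdorff distance $d_{H}(\tilde{\gamma},\gamma^{*})$. A minimal sketch of how it can be evaluated for two discretized curves is given below; it assumes both curves are represented by finely sampled point arrays, which is an approximation of the distance between the continuous curves. The function name `hausdorff` is hypothetical.

```python
import numpy as np

def hausdorff(A, B):
    """Symmetric Hausdorff distance between point sets A (n x d) and B (m x d)."""
    D = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)  # pairwise distances
    a_to_b = D.min(axis=1).max()   # farthest point of A from the set B
    b_to_a = D.min(axis=0).max()   # farthest point of B from the set A
    return max(a_to_b, b_to_a)
```

For example, `hausdorff(np.array([[0., 0.], [1., 0.]]), np.array([[0., 0.], [1., 0.], [1., 2.]]))` equals `2.0`, because the point `(1, 2)` is at distance 2 from the first set even though every point of the first set also belongs to the second.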

For the two non-convex population flows in Figure 5 (b)-(c), we also generated noisy data sets from the whole closed flows. As boundary points are required to obtain RFBFs, we handled these noisy data sets part by part. For example, we fitted the noisy six-fold data set quarter by quarter and the noisy two-fold data set half by half. We specified the boundary points for each part of the whole data set and obtained the RFBFs with predetermined values of $h$. The obtained RFBFs are shown in red in Figure 6. To compare the accuracy, we further applied the level set methods in Liu et al. (2017) to the random data sets and plotted the obtained curves in blue in Figure 6. In contrast to the level set methods, the RFBFs are able to capture the local variation better, especially at the parts of the curves with high curvature. We also note that the level set methods reach locations outside the data cloud at some parts of the two-fold data.

In the second part of the simulation study, the testing manifold is extended to a right-circular unit cone, with apex at $(0,0,0)$, height $H=1$ and radius $R=1$. Three types of random data sets are generated to examine the performance of RFBFs on the right-circular unit cone. The first data set is concentrated around a band on the cone. The second and third data sets are generated from a “C”-shaped and an “S”-shaped population flow on the tested manifold, respectively. The RFBFs with a predetermined value of $h$ are illustrated in red in Figure 7. As the plots show, the RFBFs work well in capturing different types of variation on the cone. Similarly, we fitted RFBFs with a sequence of values of $h$ for ten randomly generated data sets. To examine the accuracy, the mean errors of the Hausdorff distances between the population flows and RFBFs are summarized in Table 1. As expected, the obtained RFBFs capture the variation on the cone more accurately as we lower $h$. For all three types of variation investigated, it becomes more challenging to capture the variation accurately when the variation pattern becomes more complicated.

5 Real Data Application

5.1 Seismological Data

Here we explain the full analysis of the previously mentioned seismological events. The data set was sourced from the International Seismological Center (ISC) and features significant earthquakes (magnitude $5.5$ and above on the Richter scale, including continental events of magnitude $5.0$) between 1904 and 2015. The earthquake epicentre data is plotted in black in Figure 8. Before we fit the RFBF for the earthquake data, we first investigate the distribution of the first eigenvalue for the data. This is shown in Figure 8, from which we observe that the first eigenvalue varies quite non-uniformly among the earthquake epicentres. Furthermore, we also observe that the first eigenvalue changes with different values of $h$, which changes the determination of local variation. Hence, the analysis of seismological events is an example with a varying first eigenvalue and we will investigate the performance of RFBF for this case.

We note that earthquakes tend to occur around the tectonic plate boundaries. As has been mentioned earlier, the shape of the plate boundaries shown in Figure 1(a) carries the global variation (from east to west, or north to south) and the localized variation along different plates. If we select $\bar{x}_{1}$ and $\bar{x}_{2}$ around the Philippine Sea plate manually, we expect the RFBFs would move along the plate boundary and mirror the blue curve shown in Figure 1(c). At the same time, the movement of the RFBFs will also reflect the local variation pattern of the data, which is captured by $h$ . In our analysis, we scaled the data onto the unit sphere and selected three different sets of $\bar{x}_{1}$ and $\bar{x}_{2}$ along the Philippine Sea plate manually. Figure 9 illustrates the earthquake data on a flat world atlas with the three sets of $\bar{x}_{1}$ and $\bar{x}_{2}$ , namely (a)-(c), (d)-(f) and (g)-(i). To visualise and compare the performance, we fit RFBFs using three values of $h$ . As we expected, the RFBFs move along the boundary of the Philippine Sea plate and capture the variation between the given boundary points. Furthermore, we let $h$ vary and visualize the RFBFs that reflect the various localized variation patterns. Given the boundary points, we note that the RFBFs work well in capturing the variation patterns of the data. As we lower $h$ , the RFBFs uncover the global and local variation pattern more accurately. For example, when we set $h=0.075$ , the RFBFs in Figure 9 (a), (d) and (g) move inside the data cloud and trace the global variation from south to north better than the RFBFs in the other plots of Figure 9. When we gradually increase the value of $h$ , more data points will be involved in the determination of the local variation and this also influences the trend of the RFBFs. In the last three plots of Figure 9, we select two sets of boundary points with opposite directions. 
Comparing the results in Figure 9 (a)-(c) and (g)-(i), we note that the direction of the boundary points does not inordinately affect the RFBFs with the same $h$ .

5.2 Labeled Faces in the Wild

In this section, we consider another concrete case – Labeled Faces in the Wild (LFW) in Huang et al. (2007). The data set comprising face photographs is designed to provide a system of face recognition with over $13,000$ images of faces collected from the web. Each face image is labeled with the name of the person in that image. Note also that among those face images are $1,680$ people who have two or more distinct photographs in the data set.

In our study, we downloaded $264$ images of $66$ people with four images of each person. To facilitate the analysis, the face region was cropped from the original image and resized to $50\times 37$ pixels. The images of the face region for the $66$ individuals can be found in the Supplementary Materials. As the analysis uses four different images for each individual, the data set can be written as $\{x_{i}\}_{i=1}^{264}$ , where $x_{i}$ are vectors in the ambient space $\mathbb{R}^{1850}$ . We assume that the data points $\{x_{i}\}_{i=1}^{264}$ lie on the unit sphere $\mathbb{S}^{1849}$ , which is embedded in $\mathbb{R}^{1850}$ . To begin with, we chose two images with the largest distance from one another in the ambient space, setting them as $\bar{x}_{1}$ and $\bar{x}_{2}$ . As shown in Figure 10, the image of Andy Roddick in Figure 10(a) is the starting image, and the image of Jack Straw in Figure 10(p) is the ending image in our analysis. Then, we fit RFBFs with various values of $h$ .
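The preprocessing described above can be sketched as follows. The sketch assumes that each cropped $50\times 37$ image is flattened to a vector in $\mathbb{R}^{1850}$ and scaled to unit norm so that it lies on $\mathbb{S}^{1849}$, and that the boundary images are chosen as the pair with the largest Euclidean distance after this scaling; the source does not say whether the distance was computed before or after normalization. The function names `to_sphere` and `farthest_pair` are hypothetical.

```python
import numpy as np

def to_sphere(images):
    """Flatten images of shape (n, 50, 37) to unit-norm vectors in R^1850."""
    X = images.reshape(len(images), -1).astype(float)
    # Assumption: no image is identically zero, so every norm is positive.
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def farthest_pair(X):
    """Indices (i, j) of the two rows of X with the largest Euclidean distance."""
    G = X @ X.T                                           # Gram matrix
    sq = np.diag(G)[:, None] + np.diag(G)[None, :] - 2.0 * G  # squared pairwise distances
    i, j = np.unravel_index(np.argmax(sq), sq.shape)
    return int(i), int(j)
```

Using the Gram matrix keeps the memory footprint at $n\times n$ rather than $n\times n\times 1850$, which matters little for $264$ images but scales better to larger subsets.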

The obtained RFBFs are discrete flows of face images which capture the variation of facial structure from the starting image to the ending image. Apart from the boundary images, the points on the RFBF form a sequence of fake faces, which are plotted in Figure 10 (b)-(o). The person shown in each fake face image is not a real person who can be identified in the given image set. Rather, the face is constructed using the characteristics extracted from the local and global variation pattern of the given images. The intermediate face images on the RFBF reflect the progressive change of the face from the starting image to the ending image. There are some noteworthy conclusions that we draw from the RFBFs. First, the skin tone of Andy Roddick shown in the starting image of Figure 10 (a) appears somewhat wheatish, while Jack Straw’s face, plotted in the ending image of Figure 10 (p), has a light skin tone. Through the fake faces constructed on the RFBF, we are able to observe the gradual change of skin tone from darker to lighter. Second, we note that Andy Roddick wears a cap in the starting image and Jack Straw’s hairstyle features a fringe in the ending image. For the first few images in Figure 10 (b)-(f), the fake faces on the RFBF are also wearing caps. In the last few images plotted in Figure 10 (m)-(o), the fake faces on the RFBF have fringes. Hence, we are also able to monitor the change of hairstyle through the intermediate fake faces on the RFBF. Although RFBFs are able to reveal some progressive face changes, the characteristics captured by the RFBFs are one-dimensional. Hence, the variation pattern analyzed by the RFBFs is limited when we are dealing with high-dimensional data with large $m$ values. We will consider the extension of RFBFs in the future.

6 Fixed Boundary Flow for Non-random Data in Euclidean Space

The aim of this section is to prove that fixed boundary flows for non-random data are canonical, in the sense that they will pass through the usual principal component, in the context of Euclidean spaces. Hereafter, we suppose $\mathcal{M}$ is a linear subspace of $\mathbb{R}^{d}$ , and $h=\infty$ , which implies

[TABLE]

Under this configuration, we will determine the supremum of $\mathcal{L}(W,\gamma)$ defined in (2.2) subject to the constraint $\gamma\in\Gamma(\bar{x}_{1},\bar{x}_{2})$ defined in (2.1).

Proposition 6.1.

Suppose $\gamma_{*}:[0,C\Delta]\to\mathcal{M}$ such that

[TABLE]

If $\gamma_{*}(C\Delta)=\bar{x}_{2}$, then $\gamma_{*}$ is the unique optimum of (2.2).

Proof.

Since $W(\gamma(t))$ and $\dot{\gamma}(t)$ are unit vectors for any $\gamma\in\Gamma(\bar{x}_{1},\bar{x}_{2})$, we have

[TABLE]

and equality holds only if $\dot{\gamma}(t)=W(\gamma(t))$ for any $t\in[0,C\Delta]$. Hence, $\gamma_{*}$ is the only curve for which equality holds, and is accordingly the unique optimum of (2.2). ∎

Proposition 6.1 analyzes the optimum of (2.2) under the strict condition that $\gamma_{*}(t)=\bar{x}_{2}$ with $t=C\Delta$. If the condition is relaxed to $t\leq C\Delta$, things are more difficult. For further analysis, we take the origin of $\mathcal{M}$ to be $\bar{x}=\sum_{i=1}^{n}x_{i}$, and $[v_{1},\cdots,v_{d}]$ to be the basis with $v_{1}=W(\bar{x})$. For convenience, we denote by $z_{i}=v_{i}^{T}z$ the $i$-th coordinate of any $z\in{\cal M}$ and let $V_{\bot}=[v_{2},\cdots,v_{d}]\in\mathbb{R}^{d\times(d-1)}$ hereafter.

Before giving our final proposition, we define some important sets and curves first. With $\odot$ representing Hadamard multiplication, we denote a subset of $\Gamma(\bar{x}_{1},\bar{x}_{2})$ , where the curves have the same direction with $W$ ,

[TABLE]

The red curves in Figure 11 (a) demonstrate flows satisfying $\dot{\gamma}(t)\odot W(\gamma(t))\geq 0$, that is, the curves have the same direction as $W$. Denote by $p_{1}=v_{1}v_{1}^{T}\bar{x}_{1}$ and $p_{2}=v_{1}v_{1}^{T}\bar{x}_{2}$ the projections of $\bar{x}_{1}$ and $\bar{x}_{2}$, respectively, onto the first axis, and by $\Gamma_{+}(\bar{x}_{1},p_{1},v_{1})$ (respectively, $\Gamma_{+}(\bar{x}_{2},p_{2},v_{1})$) the set of curves from $\bar{x}_{1}$ (respectively, $\bar{x}_{2}$) to $p_{1}$ (respectively, $p_{2}$) that are orthogonal to $v_{1}$ and satisfy $\dot{\gamma}(t)\odot W(\gamma(t))\geq 0$. We set

[TABLE]

And we also set $\bar{\gamma}:[0,\|p_{1}-p_{2}\|]\to\mathcal{M}$ as the straight line between $p_{1}$ and $p_{2}$ , that is $\bar{\gamma}(t)=p_{1}+\frac{t}{\|p_{2}-p_{1}\|}(p_{2}-p_{1})$ .

Let $\gamma_{s}$ be the concatenation of $\bar{\gamma}_{1},\bar{\gamma}$ and $\bar{\gamma}_{2}$ , that is $\gamma_{s}:[0,\ell(\bar{\gamma}_{1})+\ell(\bar{\gamma})+\ell(\bar{\gamma}_{2})]\to{\cal M}$ satisfying

[TABLE]

then $\gamma_{s}$ is continuous and in the closure of $\Gamma_{+}(\bar{x}_{1},\bar{x}_{2})$ by Proposition 4.1 in the Supplementary Materials. The yellow curve in Figure 11 (a) demonstrates $\gamma_{s}$ .

In Figure 11 (a), we use the blue arrows to demonstrate an example of the vector field satisfying Assumption 6.1. Generally speaking, this refers to the arrows at the left half plane pointing towards $\bar{x}$ and arrows at the right half plane pointing in the opposite direction. Moreover, the arrows straighten horizontally as they approach the second axis. We summarize the assumptions on the vector field in Assumption 6.1 (b) and (c).

Assumption 6.1.


(a)

$v^{T}\bar{x}_{1}<0$ and $v^{T}\bar{x}_{2}>0$.

(b)

For any $x\in\mathcal{M}$, $v_{1}^{T}W_{n,h}(x)\geq 0$ and $(v_{i}^{T}W(x))\cdot(v_{i}^{T}x)\cdot(v_{1}^{T}x)\geq 0$ for any $i\geq 2$.

(c)

Suppose $x$ and $x^{\prime}$ are in $\mathcal{M}$ . If $V_{\bot}^{T}x=V_{\bot}^{T}x^{\prime}$ and $|v_{1}^{T}x|\leq|v_{1}^{T}x^{\prime}|\leq\max\{|v_{1}^{T}\bar{x}_{1}|,|v_{1}^{T}\bar{x}_{2}|\}$ , then $|v_{i}^{T}W(x)|\leq|v_{i}^{T}W(x^{\prime})|$ for any $i\geq 2$ .

Assumption 6.1 is not strict. Figure 12 illustrates (b) and (c) of Assumption 6.1 with two data sets, as represented by black points that are concentrated around a “C”-shaped curve and an “S”-shaped curve in $\mathbb{R}^{2}$ . The two diagrams in the left-hand panel show the vector fields for the two data sets, both of which satisfy Assumption 6.1(b), while the diagrams in the right-hand panel show how $|v_{2}^{T}W(x)|$ varies at different points of $x$ . Specifically, $|v_{2}^{T}W|$ gets larger when the color transitions to yellow, and smaller when the color transitions to blue. One can conclude from the two diagrams in the right panel that the vector field between the two orange lines satisfies Assumption 6.1(c).

We can now set out the second proposition, which holds under the general condition $\gamma(t)=\bar{x}_{2}$ with $t\leq C\Delta$. This proposition shows that if we restrict $\gamma$ to $\Gamma_{+}(\bar{x}_{1},\bar{x}_{2})$, the fixed boundary flow will pass through the usual principal component.

Proposition 6.2.

If $\ell(\gamma_{s})\leq C\Delta$ , then

[TABLE]

The proof of Proposition 6.2 can be found in Appendix D in the Supplementary Materials. Also in Appendix D in the Supplementary Materials, we further explain the inequality

[TABLE]

Combining the inequality in Proposition 6.2 and the inequality (7.6), we conclude that the optimal solution of (2.2) always passes through the usual principal component. The scheme to show the inequality (7.6) is organized as follows. As shown in Figure 11(b) and (c), we construct $\gamma_{+}$ (the red curve) by any $\gamma$ (the blue curve), and illustrate $\mathcal{L}(W,\gamma_{s})\geq\mathcal{L}(W,\gamma)$ . In particular, if the dimension of the space is 2, the comparison between $\mathcal{L}(W,\gamma_{s})$ and $\mathcal{L}(W,\gamma)$ can be achieved by calculating the integration over the gray area using Green’s Theorem.

7 Discussion

The determination of a fixed boundary flow for data points on non-linear manifolds is a very different problem from the case of principal flow. We propose the notion of a fixed boundary flow to define a curve with fixed starting and ending points and a tangent velocity that matches the maximal variation of data in its neighborhood at each point. The local geometry of data variation is represented by the tangent space at the given point, which compels us to use the local vector fields. Based on this choice, we formulate an optimization framework to construct a smooth curve on the manifold, with a tangent vector that always matches the local vector fields. There is no doubt that the solution to the optimization problem, and equivalently, the fixed boundary flow, depends on how a neighborhood is defined at a certain point, just as with principal flow.

The choice of the neighborhood, which depends on the scale parameter $h$, determines how local or global covariation features are captured by the fixed boundary flow. Algorithm 1 provides a way to select a series of decreasing $h^{\scriptscriptstyle(k)}$ until $\rho h^{\scriptscriptstyle(k)}\leq 4\sqrt{\sigma}$, which obliges us to focus on the global trend of the curve first and the local trend second. Using this algorithm, we generate a curve represented by $\tilde{\gamma}$. We discuss below the construction of a “confidence band” for the resulting fixed boundary flow $\tilde{\gamma}\in{\cal M}$. Since the confidence band is defined for a flow on the manifold, it should be a confidence ellipsoid. Note that the samples in $B_{d}(x,h)$ roughly lie within an ellipsoid with principal axes of length $h$, $\frac{\sqrt{\lambda_{2}}}{\sqrt{\lambda_{1}}}h,\cdots,\frac{\sqrt{\lambda_{m}}}{\sqrt{\lambda_{1}}}h$, respectively. Thus, we use $\{\lambda_{i}\}_{i=1}^{m}$ to construct the “confidence band”. Specifically, for any point $x=\tilde{\gamma}(t)$ on the computed fixed boundary flow $\tilde{\gamma}$, we can define an ellipsoid of dimension $(m-1)$ in the intersection of $\mathcal{T}_{x}{\cal M}$ and the normal space at $x$, which could cover most samples in this intersection. Letting the columns of the orthonormal matrix $U(x)\in\mathbb{R}^{d\times m}$ form a basis of $\mathcal{T}_{x}{\cal M}$, the confidence ellipsoid is of dimension $(m-1)$ and obeys

[TABLE]

where $\Pi_{\mathcal{M}}z$ is the projection of $z$ onto $\mathcal{M}$ and $V=\big{(}U(\tilde{\gamma}(t))U(\tilde{\gamma}(t))^{T}-\dot{\tilde{\gamma}}(t)\dot{\tilde{\gamma}}(t)^{T}\big{)}(e_{2},\cdots,e_{m})$. Note that $U(x)$ can be estimated with certain theoretical guarantees (see Tyagi et al. (2013)). We remark that $\dot{\tilde{\gamma}}(t)$ usually approximates $W(\tilde{\gamma}(t))$, that is, $e_{1}$. This gives $V$ full column rank, and consequently, the dimension of the ellipsoid is $(m-1)$. If $\dot{\tilde{\gamma}}(t)$ happens to be orthogonal to $W(\tilde{\gamma}(t))$, the dimension of the ellipsoid would reduce to $(m-2)$. With certain covering ellipsoid conditions for the samples in the neighborhood, one might consider bounding $\dot{\tilde{\gamma}}$ and $\tilde{\gamma}$ under the current setting. Some of the results in Yao and Zhang (2020) will be helpful in this respect. As this is part of our ongoing work, we intend to investigate it further in the future.
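As a rough illustration of the axis lengths $h,\sqrt{\lambda_{2}/\lambda_{1}}\,h,\cdots,\sqrt{\lambda_{m}/\lambda_{1}}\,h$ quoted above, the following Python sketch computes them from a list of values $\lambda_{i}$. The function name `ellipsoid_axes` and the example numbers are hypothetical, and we assume the $\lambda_{i}$ are positive and sorted in decreasing order; only the ratio formula is taken from the text.

```python
import math

def ellipsoid_axes(lambdas, h):
    """Axis lengths h * sqrt(lambda_i / lambda_1) of the local ellipsoid.

    lambdas: positive values lambda_1, ..., lambda_m (sorted here in
             decreasing order; assumed to come from the local covariance)
    h:       scale parameter, i.e. the radius of the neighborhood
    """
    lam = sorted(lambdas, reverse=True)
    if lam[-1] <= 0:
        raise ValueError("all lambda_i must be positive")
    return [h * math.sqrt(l / lam[0]) for l in lam]

# Hypothetical example with m = 3.
axes = ellipsoid_axes([4.0, 1.0, 0.25], h=0.2)

# Assumption: the (m - 1)-dimensional band uses the axes associated with
# e_2, ..., e_m, so the first (flow) direction is dropped.
band_axes = axes[1:]
```

The first axis always equals $h$, and the remaining axes shrink with the ratio of the corresponding $\lambda_{i}$ to $\lambda_{1}$.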

Acknowledgements

ZY is grateful for the financial support from his Singapore Ministry of Education (MOE) Tier 1 funding (A-0004809-00-00) and Tier 2 funding (A-0008520-00-00) at the National University of Singapore. ZY thanks Professor Shing-Tung Yau for his comments and discussion and the support from the Center of Mathematical Sciences and Applications at Harvard University.

Appendix A: Preliminaries

In this section, we will introduce some preliminaries in Riemannian geometry and review the principal flows. We focus on studying a complete Riemannian manifold $\mathcal{M}$ of dimension $m$ , equipped with a metric $g$ . The smooth Riemannian manifold $\mathcal{M}$ can be isometrically embedded into the Euclidean space $(\mathbb{R}^{d},\|\cdot\|)$ , $m<d$ . Assuming that the embedding is known, there exists a known differentiable function $F:\mathbb{R}^{d}\to\mathbb{R}^{m}$ , and we have

[TABLE]

The Riemannian metric $g(\cdot,\cdot)$ on the Riemannian manifold $\mathcal{M}$ is induced by an inner product $\langle\cdot,\cdot\rangle$ defined in the tangent space $T_{x}\mathcal{M}$ at each point $x\in\mathcal{M}$ , and the tangent space $T_{x}\mathcal{M}$ is denoted by

[TABLE]

where $DF$ is the $m\times d$ derivative matrix of $F$ evaluated at $x$ . Since the tangent space $T_{x}\mathcal{M}$ is able to locally approximate the manifold, we define two mappings between the tangent space and the manifold. The exponential map at $x$ takes a tangent vector $v\in T_{x}\mathcal{M}$ denoted by

[TABLE]

and there exists a unique geodesic $\gamma_{v}$ satisfying $\gamma_{v}(0)=x$ with initial velocity $\dot{\gamma}_{v}(0)=v/\|v\|$ . Therefore, the exponential map is locally defined by $\exp_{x}(v)=\gamma_{v}(\|v\|)$ in the neighborhood of $x$ . The inverse of the exponential map, the logarithm map, is denoted by

[TABLE]
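To make the two maps concrete, here is a minimal Python sketch for the unit sphere $S^{2}\subset\mathbb{R}^{3}$, a standard example where both maps have closed forms. The sphere, the function names `sphere_exp` and `sphere_log`, and the test vectors are our own illustrative choices, not part of the paper.

```python
import numpy as np

def sphere_exp(x, v):
    """Exponential map on the unit sphere: walk the geodesic from x along v."""
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-12:
        return x.copy()
    # Geodesic with unit initial velocity v / ||v||, evaluated at time ||v||.
    return np.cos(norm_v) * x + np.sin(norm_v) * (v / norm_v)

def sphere_log(x, y):
    """Logarithm map on the unit sphere: tangent vector at x pointing to y.

    Returns the zero vector when y coincides with x; the map is not
    uniquely defined when y is antipodal to x (zero is returned there too).
    """
    cos_theta = np.clip(np.dot(x, y), -1.0, 1.0)
    theta = np.arccos(cos_theta)      # geodesic distance between x and y
    w = y - cos_theta * x             # part of y tangent to the sphere at x
    norm_w = np.linalg.norm(w)
    if norm_w < 1e-12:
        return np.zeros_like(x)
    return theta * w / norm_w

x = np.array([0.0, 0.0, 1.0])
v = np.array([0.3, -0.4, 0.0])        # tangent at x because <x, v> = 0
y = sphere_exp(x, v)                  # point on the sphere
v_back = sphere_log(x, y)             # recovers v
```

On this example the logarithm map undoes the exponential map, which is the local inverse relationship described above.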

Before we review principal flows, we recall the Fréchet mean and tangent space PCA, which are the basic elements used to construct principal flows. Let $\{x_{1},\dots,x_{n}\}$ be the data points lying on the manifold $\mathcal{M}$, and suppose there exists a connected open set $B\subset\mathcal{M}$ such that $B$ covers $\{x_{1},\dots,x_{n}\}$. The Fréchet mean $\bar{x}\in B$ is the point that minimizes the sum of squared distances under the Riemannian metric

[TABLE]
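The minimization above rarely has a closed form, so the Fréchet mean is typically computed iteratively. The sketch below uses a common fixed-point scheme (average the log-mapped data, then map back with the exponential map) on the unit sphere. The choice of the sphere, the function names, and the synthetic data are assumptions made for illustration; the paper does not prescribe this particular solver.

```python
import numpy as np

def sphere_exp(x, v):
    """Exponential map on the unit sphere."""
    n = np.linalg.norm(v)
    return x.copy() if n < 1e-12 else np.cos(n) * x + np.sin(n) * v / n

def sphere_log(x, y):
    """Logarithm map on the unit sphere (zero vector if y equals x)."""
    c = np.clip(np.dot(x, y), -1.0, 1.0)
    w = y - c * x
    n = np.linalg.norm(w)
    return np.zeros_like(x) if n < 1e-12 else np.arccos(c) * w / n

def frechet_mean(points, n_iter=100, tol=1e-12):
    """Fixed-point iteration for the minimizer of the sum of squared
    geodesic distances; assumes the data lie in a small geodesic ball."""
    mean = points[0] / np.linalg.norm(points[0])
    for _ in range(n_iter):
        # Average of the data in the tangent space at the current estimate.
        step = np.mean([sphere_log(mean, p) for p in points], axis=0)
        if np.linalg.norm(step) < tol:
            break
        mean = sphere_exp(mean, step)
    return mean

# Four points placed symmetrically around the north pole (synthetic data).
c, s = np.cos(0.3), np.sin(0.3)
points = np.array([[s, 0.0, c], [-s, 0.0, c], [0.0, s, c], [0.0, -s, c]])
mean = frechet_mean(points)
```

By symmetry the Fréchet mean of these four points is the north pole, which gives a simple sanity check for the iteration.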

The tangent components $e_{1}(\bar{x}),\ldots,e_{m}(\bar{x})$ at $\bar{x}$ form a basis for the tangent space $T_{\bar{x}}\mathcal{M}$, and they are given by the first $m$ eigenvectors of the scale $h$ local tangent covariance matrix

[TABLE]

where $y\otimes y=yy^{\rm T}$ , $\kappa_{h}(x,\bar{x})=K(h^{-1}\|\log_{\bar{x}}x-\bar{x}\|)$ and $h>0$ .

When $h=\infty$ and $\mathcal{M}=\mathbb{R}^{d}$, the local tangent covariance matrix reduces to

[TABLE]
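The reduction to the ordinary covariance can be checked numerically. The sketch below assumes a flat space (so that $\log_{\bar{x}}x=x-\bar{x}$), an indicator kernel $K(u)=\mathbf{1}\{u\leq 1\}$, and normalization by the total kernel weight; these choices, the function name `local_covariance`, and the synthetic data are ours and may differ from the exact definition used in the paper.

```python
import numpy as np

def local_covariance(data, center, h):
    """Kernel-weighted covariance of `data` around `center` at scale h.

    Assumptions: Euclidean space, indicator kernel, weights normalized
    by their sum.
    """
    diffs = data - center                        # log map in flat space
    dist = np.linalg.norm(diffs, axis=1)
    weights = (dist <= h).astype(float)          # h = inf keeps every point
    total = weights.sum()
    if total == 0:
        raise ValueError("no samples inside the neighborhood")
    return (diffs * weights[:, None]).T @ diffs / total

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 3)) * np.array([3.0, 1.0, 0.5])
center = data.mean(axis=0)

sigma_global = local_covariance(data, center, h=np.inf)
sigma_local = local_covariance(data, center, h=3.0)

# Leading eigenvector of the global matrix: the usual first principal direction.
eigvals, eigvecs = np.linalg.eigh(sigma_global)
e1 = eigvecs[:, -1]
```

With $h=\infty$ every sample receives weight one and the matrix coincides with the usual (biased) sample covariance, while a finite $h$ only sees the samples near the center.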

Noting that $W(x)$ has unit length, so that $\langle W(x),dW(x)\rangle=0$, we can calculate the derivative of $\lambda_{1}(x)$ as follows:

[TABLE]

Appendix B: Proof of Theorem 3.1

In the subsequent proof, we need a special case of Theorem 8 by Yao and Xia (2019), where the manifold degenerates into a curve, that is, the dimension of the manifold is $1$. We state this special case of Theorem 8 below, where we use $h$ instead of $r^{\prime}$ as the scale for consistency with the notation in this paper.

Theorem 7.1 (Slight deformation of Theorem 8 by Yao and Xia (2019)).

Let $z$ be a point off a curve $\gamma$, let $z^{*}$ be the projection of $z$ onto $\gamma$, and suppose $d(z,\gamma^{\ast})\leq h$. Then we have

[TABLE]

where $\Pi_{z^{*}}^{*}$ denotes the orthogonal projection onto the normal space at $z^{*}$ and $\Pi_{z}=v_{\bot}v_{\bot}^{T}$. Here $v_{\bot}$ is the orthogonal complement of $v$ and $v$ is the first eigenvector of $\Sigma_{h}(z)$.

To bound the summation of some power of $\|\xi_{i}\|$ above, we need Proposition 2.3 by Yao and Xia (2019) as follows.

Proposition 7.1 (Proposition 2.3 by Yao and Xia (2019)).

Suppose $\xi\sim N(0,\sigma^{2}I_{d})$ ; then we have, for any positive integer $k$ :

(1) $\mathbb{E}(\|\xi\|_{2}^{k})=C_{1}\sigma^{k}$,

(2) ${\rm Var}(\|\xi\|_{2}^{k})=C_{2}\sigma^{2k}$,

(3) $\mathbb{E}\big{(}\|\xi\|_{2}^{k}-\mathbb{E}(\|\xi\|_{2}^{k})\big{)}^{3}=C_{3}\sigma^{3k}$,

(4) $\|\xi_{i}\|_{2}^{k}$ and $\|\xi_{j}\|_{2}^{k}$ are independent if $\xi_{i}$ and $\xi_{j}$ are independent,

where $C_{1}$ , $C_{2}$ , and $C_{3}$ are three constants depending on $d$ and $k$ .
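For readers who want explicit values, the constants in items (1) and (2) follow from the moments of the chi distribution: since $\|\xi\|_{2}/\sigma$ is chi-distributed with $d$ degrees of freedom, $C_{1}=2^{k/2}\,\Gamma\big((d+k)/2\big)/\Gamma(d/2)$. The Python sketch below evaluates this formula; the function names are ours, and the closed form is a standard fact rather than something stated in the paper.

```python
import math

def norm_moment(d, k, sigma):
    """E ||xi||_2^k for xi ~ N(0, sigma^2 I_d), via chi-distribution moments."""
    c1 = 2.0 ** (k / 2.0) * math.gamma((d + k) / 2.0) / math.gamma(d / 2.0)
    return c1 * sigma ** k

def norm_variance(d, k, sigma):
    """Var ||xi||_2^k, computed as E||xi||^(2k) - (E||xi||^k)^2."""
    return norm_moment(d, 2 * k, sigma) - norm_moment(d, k, sigma) ** 2

# Example with d = 3: E||xi||^2 = d * sigma^2 and Var ||xi||^2 = 2 * d * sigma^4.
m2 = norm_moment(3, 2, 0.5)
var2 = norm_variance(3, 2, 1.0)
```

Both quantities scale as the stated powers of $\sigma$, with constants that depend only on $d$ and $k$.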

Based on the above proposition, we obtain the upper bound of the summation of each $\|\xi_{i}\|^{k}$ for points lying in a tube surrounding $\gamma^{\ast}$ .

Proposition 7.2.

For a given $\delta$ , there exists $C_{n}$ such that if $n\geq C_{n}\sqrt{\sigma}$ , then

[TABLE]

holds with probability $1-\delta$ for any $(x,h)$ satisfying $h>4\sqrt{\sigma}$ , $d(x,\gamma^{\ast})\leq\sqrt{\sigma}$ , $\|x-\gamma^{\ast}(0)\|>h/2$ and $\|x-\gamma^{\ast}(r^{*})\|>h/2$ .

Proof.

Noticing that $\{\xi_{i}\}$ are i.i.d. samples drawn from a Gaussian distribution, we can obtain the expectation $\mu_{k}=C_{1}\sigma^{k}$, variance $\sigma_{k}^{2}=C_{2}\sigma^{2k}$ and third moment $\rho_{k}=C_{3}\sigma^{3k}$ of $\|\xi_{i}\|^{k}$ according to Proposition 7.1. By the Berry-Esseen theorem, the cumulative distribution function of $\big{(}\sum_{i\in{\cal I}(x,h)}\|\xi_{i}\|^{k}-\mu_{k}|{\cal I}(x,h)|\big{)}/\big{(}\sigma_{k}\sqrt{|{\cal I}(x,h)|}\big{)}$, denoted by $F$, satisfies

[TABLE]

where $\Phi$ is the cumulative distribution function of the standard normal distribution. So, there exists $C$ depending on $d$, $k$ and $\delta$ such that

[TABLE]

with probability at least $1-\delta/3-C^{\prime}/\sqrt{|{\cal I}(x,h)|}$ .

To estimate $|{\cal I}(x,h)|$ , we calculate the probability of $i\in{\cal I}(x,h)$ based on $h>4\sqrt{\sigma}$ ,

[TABLE]

Let $x^{*}=\gamma^{\ast}(t^{*})$ be the projection of $x$ onto $\gamma^{\ast}$. Then $x^{*}\in B_{d}(x,h/2)$ since $\|x-x^{*}\|=d(x,\gamma^{\ast})\leq\sqrt{\sigma}<h/4$. Since $\|x-\gamma^{\ast}(0)\|>h/2$ and $\|x-\gamma^{\ast}(r^{*})\|>h/2$, there exist $0<t_{1}<t^{*}<t_{2}<r^{*}$ such that $x_{1}=\gamma^{\ast}(t_{1})$ and $x_{2}=\gamma^{\ast}(t_{2})$ satisfy $\|x-x_{1}\|=\|x-x_{2}\|=h/2$. Hence, $\ell(\gamma^{\ast}\cap B_{d}(x,h/2))\geq\|x_{1}-x^{*}\|+\|x^{*}-x_{2}\|\geq\big{(}\|x_{1}-x\|-\|x-x^{*}\|\big{)}+\big{(}\|x-x_{2}\|-\|x-x^{*}\|\big{)}\geq h/2$. Since each entry of $\xi_{i}$ follows a Gaussian distribution, $\|\xi_{i}\|^{2}/\sigma^{2}$ follows a chi-squared distribution. According to the cdf of the chi-squared distribution, we obtain $\mathbb{P}(\|\xi_{i}\|\leq 2\sqrt{\sigma})=O(1)$, and thereby $\mathbb{P}(i\in{\cal I}(x,h))\geq\frac{h/2}{r^{*}}\cdot O(1)\geq ch\geq 2c\sqrt{\sigma}$. Thus, whether $i\in{\cal I}(x,h)$ or not can be treated as a Bernoulli trial with expectation no less than $2c\sqrt{\sigma}$. Applying the Berry-Esseen theorem to the $n$ Bernoulli trials, there exists $c^{\prime}<1$ such that $|{\cal I}(x,h)|\geq c^{\prime}n\sqrt{\sigma}$ with probability $1-C^{\prime\prime}/\sqrt{n}$, which implies

[TABLE]

Setting

[TABLE]

one can verify $1-\delta/3-C^{\prime}/(c^{\prime}n\sqrt{\sigma})\geq(1-\delta)/(1-\delta/3)$ and $1-C^{\prime\prime}/\sqrt{n}\geq 1-\delta/3$, which implies $\mathbb{P}(\frac{1}{|{\cal I}(x,h)|}\sum_{i\in{\cal I}(x,h)}\|\xi_{i}\|^{k}\leq C\sigma^{k})\geq 1-\delta$. Hence, we can take

[TABLE]

to complete this proof. ∎

Proof of Theorem 3.1.

For any $t\in T$ , plugging $z=\gamma^{\ast}(t)$ into Theorem 7.1, we have

[TABLE]

where the second inequality holds by Proposition 7.2 with probability $1-\delta$ , and the last inequality holds since $h>4\sqrt{\sigma}$ .

Let $u=\dot{\gamma}^{*}(t)$ , the tangent vector of $\gamma^{\ast}$ at $\gamma^{\ast}(t)$ , and $v=W(\gamma^{\ast}(t))$ , the first eigenvector of $\Sigma_{h}(\gamma^{\ast}(t))$ . By the definition of $\Pi_{z}$ and $\Pi_{z^{*}}^{*}$ , we have $\Pi_{z}=v_{\bot}v_{\bot}^{T}$ and $\Pi_{z^{*}}^{*}=u_{\bot}u_{\bot}^{T}$ . Hence $\|\Pi_{z}-\Pi_{z^{*}}^{*}\|_{F}=\|v_{\bot}v_{\bot}^{T}-u_{\bot}u_{\bot}^{T}\|=\|uu^{T}-vv^{T}\|\leq Ch$ . Noting

[TABLE]

we have $1-\langle u,v\rangle\leq\frac{C^{2}}{2}h^{2}$ , that is, $\langle u,v\rangle=\langle\dot{\gamma}^{*}(t),W(\gamma^{\ast}(t))\rangle\geq 1-\frac{C^{2}}{2}h^{2}$ , which completes the proof. ∎

Appendix C: Proofs of Theorems 3.2–3.4

Lemma 7.1.

Suppose $h>4\sqrt{\sigma}$, $\|x-\gamma^{\ast}(0)\|>h/2$, $\|x-\gamma^{\ast}(r^{*})\|>h/2$ and $x\in X$. For any given $\delta$, there exists $C$ independent of $x$ and $h$ such that $d(\frac{1}{|{\cal I}(x,h)|}\sum_{i\in{\cal I}(x,h)}x_{i},\gamma^{\ast})\leq Ch^{2}$ with probability $1-\delta$.

Proof.

In this proof, we write $z$ for $\frac{1}{|{\cal I}(x,h)|}\sum_{i\in{\cal I}(x,h)}x_{i}$ for convenience. For $i\in{\cal I}(x,h)$, we use $x_{i}^{*}$ to represent the projection of $x_{i}$ onto $\gamma^{\ast}$, and similarly use $z^{*}$ to represent the projection of $z$ onto $\gamma^{\ast}$. Denoting the tangent space of $\gamma^{\ast}$ at $z^{*}$ by $T_{z^{*}}\gamma^{\ast}$, we have

[TABLE]

where $\|x_{i}-{x_{i}}^{*}\|\leq\sqrt{\sigma}<h/2$ by (3.3) in the main manuscript and

[TABLE]

To get a tight bound on $d(z,\gamma^{\ast})$ , we denote $\Pi_{z^{*}}^{*}$ to be the orthogonal projection onto the normal space of $\gamma^{\ast}$ at $z^{*}$ and obtain

[TABLE]

where the second term of the last inequality follows from Theorem 4.18 of Federer (1959). Noting

[TABLE]

and $\frac{1}{|{\cal I}(x,h)|}\sum_{i\in{\cal I}(x,h)}\|\xi_{i}\|\leq C\sigma$ with high probability by Proposition 7.2, we could bound

[TABLE]

which completes the proof. ∎

Lemma 7.2.

Let $z$ be a point off $\gamma^{\ast}$, $z^{*}$ be the projection of $z$ onto $\gamma^{\ast}$, and $v$ be the tangent vector of $\gamma^{\ast}$ at $z^{*}$. If $d(z,\gamma^{\ast})\leq C_{1}h^{2}$ and $\|z-\gamma^{\ast}(t)\|>h/2$ for $t=0,r^{*}$, then for any given $\delta$, there exists $C$ such that $\|vv^{T}-e_{1}(z)e_{1}(z)^{T}\|\leq Ch$ with probability $1-\delta$.

Proof.

Since $d(z,\gamma^{\ast})\leq C_{1}h^{2}$ , $\|z-z^{*}\|=d(z,\gamma^{\ast})\leq C_{1}h^{2}$ . Plugging $z$ into Theorem 7.1, we have

[TABLE]

where the second inequality holds by Proposition 7.2 with probability $1-\delta$ , and the last inequality holds since $4\sqrt{\sigma}<h\leq 1$ .

By the definition of $\Pi_{z}$ and $\Pi_{z^{*}}^{*}$ , we have $\Pi_{z}=I-e_{1}(z)e_{1}(z)^{T}$ and $\Pi_{z^{*}}^{*}=I-vv^{T}$ . Hence $\|\Pi_{z}-\Pi_{z^{*}}^{*}\|_{F}=\|vv^{T}-e_{1}(z)e_{1}(z)^{T}\|\leq Ch$ . ∎

Proposition 7.3.

Suppose $u$ and $v$ are unit vectors. Then $\|uu^{T}-vv^{T}\|=\sqrt{2}\|(I-vv^{T})u\|$, where the matrix norm on the left is the Frobenius norm.

Proof.

To prove this proposition, we calculate $\|uu^{T}-vv^{T}\|^{2}$ and $\|(I-vv^{T})u\|^{2}$ respectively: $\|uu^{T}-vv^{T}\|^{2}=\langle uu^{T},uu^{T}\rangle+\langle vv^{T},vv^{T}\rangle-2\langle uu^{T},vv^{T}\rangle=2-2\langle uu^{T},vv^{T}\rangle,$ and $\|(I-vv^{T})u\|^{2}=\langle(I-vv^{T})u,(I-vv^{T})u\rangle=\langle I,uu^{T}\rangle-\langle uu^{T},vv^{T}\rangle=1-\langle uu^{T},vv^{T}\rangle,$ which completes the proof. ∎
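The identity of Proposition 7.3 is easy to confirm numerically. The following Python sketch draws two random unit vectors (the dimension and seed are arbitrary choices of ours) and compares both sides, together with the closed form $\sqrt{2-2\langle u,v\rangle^{2}}$ that appears in the proof, using $\langle uu^{T},vv^{T}\rangle=\langle u,v\rangle^{2}$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Two random unit vectors in R^5.
u = rng.normal(size=5)
u /= np.linalg.norm(u)
v = rng.normal(size=5)
v /= np.linalg.norm(v)

# Left-hand side: Frobenius norm of the difference of rank-one projectors.
lhs = np.linalg.norm(np.outer(u, u) - np.outer(v, v), ord="fro")

# Right-hand side: sqrt(2) times the norm of u projected off v.
rhs = np.sqrt(2.0) * np.linalg.norm((np.eye(5) - np.outer(v, v)) @ u)

# Closed form from the proof.
closed_form = np.sqrt(2.0 - 2.0 * np.dot(u, v) ** 2)
```

All three quantities agree up to floating-point error.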

Proposition 7.4.

If $a\in\gamma^{\ast}$ and $b\in T_{a}\gamma^{*}$ , then $d(b,\gamma^{\ast})\leq\frac{1}{2\tau}\|a-b\|^{2}$ .

Proof.

Let $a=\gamma^{\ast}(t_{0})$ and $\Delta t=\langle b-a,{\gamma^{\ast}}^{\prime}(t_{0})\rangle$ . By Taylor’s expansion,

[TABLE]

where $t^{\prime}\in[t_{0},t_{0}+\Delta t]$ and $\dot{\gamma}^{\ast}(t_{0}){\dot{\gamma}^{\ast}(t_{0})}^{T}(b-a)=b-a$ since $b\in T_{a}\gamma^{\ast}$ and $\|\dot{\gamma}^{\ast}\|=1$. Hence, we obtain $d(b,\gamma^{\ast})\leq\|b-\gamma^{\ast}(t_{0}+\Delta t)\|\leq\frac{1}{2\tau}\|b-a\|^{2}$. ∎
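A circle gives a simple sanity check of Proposition 7.4. For a circle of radius $R$ and a point $b$ on the tangent line at $a$ with $\|a-b\|=s$, the distance from $b$ to the circle is $\sqrt{R^{2}+s^{2}}-R$. The Python sketch below compares this with the bound $s^{2}/(2\tau)$, assuming that $\tau$ plays the role of the reach of the curve, which equals $R$ for a circle; this reading of $\tau$ and the function name are our assumptions.

```python
import math

def dist_tangent_point_to_circle(R, s):
    """Distance to a circle of radius R from the point at distance s
    from the contact point along a tangent line."""
    return math.hypot(R, s) - R

R = 2.0  # radius of the circle, taken as tau in the bound (assumption)

# Each entry is (s, exact distance, bound s^2 / (2 * tau)).
checks = [
    (s, dist_tangent_point_to_circle(R, s), s * s / (2.0 * R))
    for s in (0.01, 0.1, 0.5, 1.0)
]
```

The exact distance equals $s^{2}/\big(\sqrt{R^{2}+s^{2}}+R\big)$, which is always below $s^{2}/(2R)$ and approaches it as $s\to 0$, so the quadratic bound is tight to leading order in this example.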

Proposition 7.5.

Suppose $x=z_{i}^{{\scriptscriptstyle(k)}}+\alpha e_{1}(z_{i}^{{\scriptscriptstyle(k)}},h^{{\scriptscriptstyle(k)}})$ , $d(z_{i}^{{\scriptscriptstyle(k)}},\gamma^{\ast})\leq C_{1}{h^{{\scriptscriptstyle(k)}}}^{2}$ , $\|z_{i}^{{\scriptscriptstyle(k)}}-\gamma^{\ast}(0)\|>h^{{\scriptscriptstyle(k)}}/2$ and $\|z_{i}^{{\scriptscriptstyle(k)}}-\gamma^{\ast}(r^{*})\|>h^{{\scriptscriptstyle(k)}}/2$ . If $|\alpha|\leq C_{2}h^{{\scriptscriptstyle(k)}}$ , then $d(x,\gamma^{\ast})\leq C{h^{{\scriptscriptstyle(k)}}}^{2}$ .

Proof.

This proof is conducted in two steps: first, we show that there is $\bar{x}\in T_{{z_{i}^{{\scriptscriptstyle(k)}}}^{*}}\gamma^{\ast}$ such that $d(x,T_{{z_{i}^{{\scriptscriptstyle(k)}}}^{*}}\gamma^{\ast})=\|x-\bar{x}\|\leq C{h^{{\scriptscriptstyle(k)}}}^{2}$, and then we bound $d(\bar{x},\gamma^{\ast})\leq C{h^{{\scriptscriptstyle(k)}}}^{2}$. Together, these two claims give $d(x,\gamma^{\ast})\leq\|x-\bar{x}\|+d(\bar{x},\gamma^{\ast})\leq C{h^{{\scriptscriptstyle(k)}}}^{2}$.

To begin with,

[TABLE]

where $\|vv^{T}-e_{1}(z_{i}^{{\scriptscriptstyle(k)}})e_{1}(z_{i}^{{\scriptscriptstyle(k)}})^{T}\|\leq Ch^{{\scriptscriptstyle(k)}}$ by Lemma 7.2. Denote the projection of $x$ onto $T_{{z_{i}^{{\scriptscriptstyle(k)}}}^{*}}\gamma^{\ast}$ to be $\bar{x}$ , then

[TABLE]

Taking $a={z_{i}^{{\scriptscriptstyle(k)}}}^{*}$ and $b=\bar{x}$ in Proposition 7.4, we have $d(\bar{x},\gamma^{\ast})\leq C{h^{{\scriptscriptstyle(k)}}}^{2}$ , which completes the proof. ∎

We impose constraints on $\tilde{\gamma}^{{\scriptscriptstyle(k)}}(t_{i})$ rather than on $z_{i}^{{\scriptscriptstyle(k)}}$ and $\alpha$, and obtain the following lemma. We use $s(a,b)$ to denote the segment between $a$ and $b$ hereafter, that is, $s(a,b)=\{\beta a+(1-\beta)b:\beta\in[0,1]\}$.

Lemma 7.3.

Suppose the discrete curve at the $k$ -th iteration satisfies the following conditions:

(a) $d(\tilde{\gamma}^{{\scriptscriptstyle(k)}}(t_{i}),\gamma^{\ast})\leq C_{1}h^{{\scriptscriptstyle(k)}}$,

(b) $\|\tilde{\gamma}^{{\scriptscriptstyle(k)}}(t_{i+1})-\tilde{\gamma}^{{\scriptscriptstyle(k)}}(t_{i})\|\leq C_{2}h^{{\scriptscriptstyle(k)}}$ for any $i<2N^{{\scriptscriptstyle(k)}}$,

(c) $\|\tilde{\gamma}^{{\scriptscriptstyle(k)}}(t_{2j+1})-\gamma^{\ast}(0)\|\geq(2C_{1}+3.5)h^{{\scriptscriptstyle(k)}}$ and $\|\tilde{\gamma}^{{\scriptscriptstyle(k)}}(t_{2j+1})-\gamma^{\ast}(r^{*})\|\geq(2C_{1}+3.5)h^{{\scriptscriptstyle(k)}}$ for any $j=0,1,\cdots,N^{{\scriptscriptstyle(k)}}-1$.

For any given $\delta$ , there exists $C$ such that the segments $s\big{(}\gamma_{{\rm proj},2j+1}^{{\scriptscriptstyle(k)}}(t_{2j}),\gamma_{{\rm proj},2j+1}^{{\scriptscriptstyle(k)}}(t_{2j+2})\big{)}$ for any $j=0,1,\cdots,N^{{\scriptscriptstyle(k)}}-1$ are within Hausdorff distance $C{h^{{\scriptscriptstyle(k)}}}^{2}$ to $\gamma^{\ast}$ with probability $1-\delta$ .

Proof.

Since $d(\tilde{\gamma}^{{\scriptscriptstyle(k)}}(t_{2j+1}),\gamma^{\ast})\leq C_{1}h^{{\scriptscriptstyle(k)}}$ for each $j=0,1,\cdots,N^{{\scriptscriptstyle(k)}}-1$ , $B_{d}(\tilde{\gamma}^{{\scriptscriptstyle(k)}}(t_{2j+1}),2C_{1}h^{{\scriptscriptstyle(k)}})\cap X$ is non-empty according to Assumption 3.2 in the main manuscript. As the closest point to $\tilde{\gamma}^{{\scriptscriptstyle(k)}}(t_{2j+1})$ , $x^{{\scriptscriptstyle(k)}}_{\rm proj}(t_{2j+1})$ exists and $\|\tilde{\gamma}^{{\scriptscriptstyle(k)}}(t_{2j+1})-x^{{\scriptscriptstyle(k)}}_{\rm proj}(t_{2j+1})\|\leq 2C_{1}h^{{\scriptscriptstyle(k)}}$ . Moreover, $\|\bar{x}^{{\scriptscriptstyle(k)}}_{\rm proj}(t_{2j+1})-x^{{\scriptscriptstyle(k)}}_{\rm proj}(t_{2j+1})\|\leq h^{\scriptscriptstyle(k)}$ since $\bar{x}^{{\scriptscriptstyle(k)}}_{\rm proj}(t_{2j+1})$ is the mean of $B_{d}(x^{{\scriptscriptstyle(k)}}_{\rm proj}(t_{2j+1}),h^{\scriptscriptstyle(k)})$ . Noting $x^{{\scriptscriptstyle(k)}}_{\rm proj}(t_{2j+1})\in B_{d}(\bar{x}^{{\scriptscriptstyle(k)}}_{\rm proj}(t_{2j+1}),h^{\scriptscriptstyle(k)})$ , the nearest sample in the data cloud to $\bar{x}^{{\scriptscriptstyle(k)}}_{\rm proj}(t_{2j+1})$ , which is the final projection of $\tilde{\gamma}^{{\scriptscriptstyle(k)}}(t_{2j+1})$ to the data cloud denoted by $\gamma^{{\scriptscriptstyle(k)}}_{\rm proj}(t_{2j+1})$ , satisfies

[TABLE]

Hence, the distance between $\gamma^{{\scriptscriptstyle(k)}}_{\rm proj}(t_{2j+1})$ and the ends of $\gamma^{*}$ can be bounded below as

[TABLE]

Plugging $x=\gamma^{{\scriptscriptstyle(k)}}_{\rm proj}(t_{2j+1})$ into Lemma 7.1, we conclude $d(z_{2j+1}^{{\scriptscriptstyle(k)}},\gamma^{\ast})=O({h^{{\scriptscriptstyle(k)}}}^{2})$ with probability $1-\delta$ . Moreover,

[TABLE]

for $t=0,r^{*}$ . Hence, the conditions on $z_{2j+1}^{{\scriptscriptstyle(k)}}$ in Proposition 7.5 are satisfied for $j=0,1,\cdots,N^{{\scriptscriptstyle(k)}}-1$ with probability $1-\delta$ . We utilize Proposition 7.5 to prove the conclusion for the segment between $\gamma_{{\rm proj},2j+1}^{{\scriptscriptstyle(k)}}(t_{2j})$ and $\gamma_{{\rm proj},2j+1}^{{\scriptscriptstyle(k)}}(t_{2j+2})$ . Since $\gamma_{{\rm proj},2j+1}^{{\scriptscriptstyle(k)}}(t_{2j})$ is the projection of $\tilde{\gamma}^{{\scriptscriptstyle(k)}}(t_{2j})$ onto the line passing $z_{2j+1}^{{\scriptscriptstyle(k)}}$ along direction $e_{1}(z_{2j+1}^{{\scriptscriptstyle(k)}},h^{{\scriptscriptstyle(k)}})$ , $\gamma_{{\rm proj},2j+1}^{{\scriptscriptstyle(k)}}(t_{2j})$ can be written as $\gamma_{{\rm proj},2j+1}^{{\scriptscriptstyle(k)}}(t_{2j})=z_{2j+1}^{{\scriptscriptstyle(k)}}+\alpha_{1}e_{1}(z_{2j+1}^{{\scriptscriptstyle(k)}},h^{{\scriptscriptstyle(k)}})$ with $|\alpha_{1}|=\|\gamma_{{\rm proj},2j+1}^{{\scriptscriptstyle(k)}}(t_{2j})-z_{2j+1}^{{\scriptscriptstyle(k)}}\|\leq\|\tilde{\gamma}^{{\scriptscriptstyle(k)}}(t_{2j})-z_{2j+1}^{{\scriptscriptstyle(k)}}\|$ . Thus,

[TABLE]

Analogously, $\gamma_{{\rm proj},2j+1}^{{\scriptscriptstyle(k)}}(t_{2j+2})$ can be written as $\gamma_{{\rm proj},2j+1}^{{\scriptscriptstyle(k)}}(t_{2j+2})=z_{2j+1}^{{\scriptscriptstyle(k)}}+\alpha_{2}e_{1}(z_{2j+1}^{{\scriptscriptstyle(k)}},h^{{\scriptscriptstyle(k)}})$ with

[TABLE]

Any point on the segment between $\gamma_{{\rm proj},2j+1}^{{\scriptscriptstyle(k)}}(t_{2j})$ and $\gamma_{{\rm proj},2j+1}^{{\scriptscriptstyle(k)}}(t_{2j+2})$ is a convex combination of $\gamma_{{\rm proj},2j+1}^{{\scriptscriptstyle(k)}}(t_{2j})$ and $\gamma_{{\rm proj},2j+1}^{{\scriptscriptstyle(k)}}(t_{2j+2})$ , that is, such a point equals

[TABLE]

with a certain $\beta\in[0,1]$ . Also, we can verify that $|\beta\alpha_{1}+(1-\beta)\alpha_{2}|\leq\beta|\alpha_{1}|+(1-\beta)|\alpha_{2}|\leq(C_{2}+2C_{1}+1)h^{{\scriptscriptstyle(k)}}$ . Hence, any point on the segment between $\gamma_{{\rm proj},2j+1}^{{\scriptscriptstyle(k)}}(t_{2j})$ and $\gamma_{{\rm proj},2j+1}^{{\scriptscriptstyle(k)}}(t_{2j+2})$ satisfies the condition of Proposition 7.5 and thereby it has a distance less than $C{h^{{\scriptscriptstyle(k)}}}^{2}$ to $\gamma^{\ast}$ , where $C$ is independent of $h^{{\scriptscriptstyle(k)}}$ . ∎

Proposition 7.6.

If $d(a,\gamma^{\ast})=O(h^{2})$ , $d(b,\gamma^{\ast})=O(h^{2})$ and $\|a-b\|=O(h)$ , then $d(c,\gamma^{\ast})=O(h^{2})$ for any $c\in s(a,b)$ .

Proof.

Letting $a^{*}$ and $b^{*}$ be the projections of $a$ and $b$ onto $\gamma^{\ast}$ respectively, then $\|a^{*}-b^{*}\|\leq\|a^{*}-a\|+\|a-b\|+\|b-b^{*}\|=O(h)$ . Letting $T_{a^{*}}\gamma^{\ast}$ be the tangent space of $\gamma^{\ast}$ at $a^{*}$ ,

[TABLE]

by Theorem 4.18 in Federer (1959).

Based on the above inequalities, we now bound $d(c,T_{a^{*}}\gamma^{\ast})$ for $c=\beta a+(1-\beta)b$. Denoting by $u$ the unit tangent vector of $\gamma^{\ast}$ at $a^{*}$, we have

[TABLE]

Let the projection of $c$ onto $T_{a^{*}}\gamma^{\ast}$ be $c^{*}$ . We could bound $\|a^{*}-c^{*}\|$ as

[TABLE]

and thereby we could bound $d(c^{*},\gamma^{\ast})=O(h^{2})$ by Proposition 7.4. Hence, we have

[TABLE]

∎
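Proposition 7.6 can be illustrated on the unit circle. In the Python sketch below, $a$ lies at distance $h^{2}$ inside the circle and $b$ at distance $h^{2}$ outside it, separated by an angle $h$, so that $\|a-b\|=O(h)$. The code scans the segment $s(a,b)$ and reports the worst distance to the circle divided by $h^{2}$. The circle, the placement of $a$ and $b$, and the function names are our illustrative assumptions.

```python
import math

def dist_to_unit_circle(p):
    """Distance from a point p in the plane to the unit circle."""
    return abs(math.hypot(p[0], p[1]) - 1.0)

def worst_segment_distance(h, n_grid=101):
    """Largest distance to the unit circle over a grid on the segment s(a, b)."""
    ra, rb = 1.0 - h * h, 1.0 + h * h          # a inside, b outside the circle
    a = (ra, 0.0)
    b = (rb * math.cos(h), rb * math.sin(h))   # angle h away from a
    worst = 0.0
    for i in range(n_grid):
        beta = i / (n_grid - 1)
        c = (beta * a[0] + (1.0 - beta) * b[0],
             beta * a[1] + (1.0 - beta) * b[1])
        worst = max(worst, dist_to_unit_circle(c))
    return worst

# Ratio of the worst distance to h^2 for a few shrinking scales.
ratios = {h: worst_segment_distance(h) / (h * h) for h in (0.2, 0.1, 0.05)}
```

In this example the ratio stays between 1 and 1.5 as $h$ shrinks, consistent with the $O(h^{2})$ conclusion.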

Proof of Theorem 3.2.

For any $j=1,\cdots,N^{{\scriptscriptstyle(k)}}$ , we will show that as the two projections of $\tilde{\gamma}^{{\scriptscriptstyle(k)}}(t_{2j})$ to $e_{1}(z_{2j-1}^{{\scriptscriptstyle(k)}},h^{{\scriptscriptstyle(k)}})$ and $e_{1}(z_{2j+1}^{{\scriptscriptstyle(k)}},h^{{\scriptscriptstyle(k)}})$ respectively, $\gamma_{{\rm proj},2j-1}^{{\scriptscriptstyle(k)}}(t_{2j})$ and $\gamma_{{\rm proj},2j+1}^{{\scriptscriptstyle(k)}}(t_{2j})$ are not far away from each other:

[TABLE]

where the last inequality holds by (7.2) and (7.3). According to Lemma 7.3, the two segments $s\big{(}\gamma_{{\rm proj},2j-1}^{{\scriptscriptstyle(k)}}(t_{2j-2}),\gamma_{{\rm proj},2j-1}^{{\scriptscriptstyle(k)}}(t_{2j})\big{)}$ and $s\big{(}\gamma_{{\rm proj},2j+1}^{{\scriptscriptstyle(k)}}(t_{2j}),\gamma_{{\rm proj},2j+1}^{{\scriptscriptstyle(k)}}(t_{2j+2})\big{)}$ are within Hausdorff distance $C{h^{\scriptscriptstyle(k)}}^{2}$ to $\gamma^{\ast}$ with probability $1-\delta$ . Consequently, the two points $\gamma_{{\rm proj},2j-1}^{{\scriptscriptstyle(k)}}(t_{2j})$ and $\gamma_{{\rm proj},2j+1}^{{\scriptscriptstyle(k)}}(t_{2j})$ are within distance $C{h^{\scriptscriptstyle(k)}}^{2}$ to $\gamma^{\ast}$ . Plugging $a=\gamma_{{\rm proj},2j-1}^{{\scriptscriptstyle(k)}}(t_{2j})$ and $b=\gamma_{{\rm proj},2j+1}^{{\scriptscriptstyle(k)}}(t_{2j})$ into Proposition 7.6, we can prove that the Hausdorff distance between $s\big{(}\gamma_{{\rm proj},2j-1}^{{\scriptscriptstyle(k)}}(t_{2j}),\gamma_{{\rm proj},2j+1}^{{\scriptscriptstyle(k)}}(t_{2j})\big{)}$ and $\gamma^{\ast}$ is $O({h^{{\scriptscriptstyle(k)}}}^{2})$ for any $j=1,\cdots,N^{{\scriptscriptstyle(k)}}$ . Similarly, we can also prove that $s\big{(}\bar{x}_{1},\gamma_{{\rm proj},1}^{{\scriptscriptstyle(k)}}(t_{0})\big{)}$ and $s\big{(}\gamma_{{\rm proj},2N^{{\scriptscriptstyle(k)}}-1}^{{\scriptscriptstyle(k)}}(t_{2N^{{\scriptscriptstyle(k)}}}),\bar{x}_{2}\big{)}$ are within the same Hausdorff distance. Since $\mathcal{S}$ is the union of all these segments, $d_{H}(\mathcal{S},\gamma^{\ast})=O({h^{{\scriptscriptstyle(k)}}}^{2})$ . ∎

Proposition 7.7.

Given the initial discrete curve $\{\tilde{\gamma}^{{\scriptscriptstyle(0)}}(t_{i})\}_{i=0}^{2N^{{\scriptscriptstyle(0)}}}$, if there exist constants $h^{{\scriptscriptstyle(0)}}$, $C_{1}$ and $C_{2}$ satisfying

(a) $Ch^{{\scriptscriptstyle(0)}}\leq C_{1}\rho$ and $(4C_{1}+7)h^{{\scriptscriptstyle(0)}}<\|\gamma^{\ast}(0)-\gamma^{\ast}(r^{*})\|$,

(b) $d(\tilde{\gamma}^{{\scriptscriptstyle(0)}}(t_{i}),\gamma^{\ast})\leq C_{1}h^{{\scriptscriptstyle(0)}}$ for any $i=1,\cdots,2N^{{\scriptscriptstyle(0)}}-1$,

(c) $\|\tilde{\gamma}^{{\scriptscriptstyle(0)}}(t_{i+1})-\tilde{\gamma}^{{\scriptscriptstyle(0)}}(t_{i})\|\leq C_{2}h^{{\scriptscriptstyle(0)}}$ for any $i<2N^{{\scriptscriptstyle(0)}}$,

(d) $\|\tilde{\gamma}^{{\scriptscriptstyle(0)}}(t_{2j+1})-\gamma^{\ast}(0)\|\geq(2C_{1}+3.5)h^{{\scriptscriptstyle(0)}}$ and $\|\tilde{\gamma}^{{\scriptscriptstyle(0)}}(t_{2j+1})-\gamma^{\ast}(r^{*})\|\geq(2C_{1}+3.5)h^{{\scriptscriptstyle(0)}}$ for any $j=0,1,\cdots,N^{{\scriptscriptstyle(0)}}-1$,

(e) $C_{2}>4C_{1}+7$,

then the three conditions of Lemma 7.3 hold with probability $(1-\delta)^{k}$ for any $k\geq 1$ .

Proof.

Since $\rho\leq 1$ , $h^{{\scriptscriptstyle(k)}}=\rho^{k}h^{\scriptscriptstyle(0)}<h^{\scriptscriptstyle(0)}$ , the conditions $(4C_{1}+7)h^{{\scriptscriptstyle(k)}}<\|\gamma^{\ast}(0)-\gamma^{\ast}(r^{*})\|$ and $Ch^{{\scriptscriptstyle(k)}}\leq C_{1}\rho$ hold for any $k$ . In order to prove the three conditions hold for $k\geq 1$ if they hold for $k=0$ , we only need to prove the three conditions hold for $k+1$ if they hold for $k$ . If the conditions hold for $k$ , then Lemma 7.3 holds. We obtain the segments $\{s\big{(}\gamma_{{\rm proj},2j+1}^{{\scriptscriptstyle(k)}}(t_{2j}),\gamma_{{\rm proj},2j+1}^{{\scriptscriptstyle(k)}}(t_{2j+2})\big{)}\}$ for any $j=0,1,\cdots,N^{{\scriptscriptstyle(k)}}$ are within Hausdorff distance $O({h^{{\scriptscriptstyle(k)}}}^{2})$ to $\gamma^{\ast}$ with probability $1-\delta$ . Hence, by Proposition 7.6, we obtain

[TABLE]

which is the first condition of Lemma 7.3 with probability $1-\delta$ .

$\tilde{\gamma}^{\scriptscriptstyle(k+1)}$ is a continuous polyline starting at $\bar{x}_{1}$ and ending at $\bar{x}_{2}$ . Removing the points in $B_{d}(\gamma^{\ast}(0),(2C_{1}+3.5)h^{\scriptscriptstyle(k+1)})$ and $B_{d}(\gamma^{\ast}(r^{*}),(2C_{1}+3.5)h^{\scriptscriptstyle(k+1)})$ from $\tilde{\gamma}^{\scriptscriptstyle(k+1)}$ , we obtain

[TABLE]

Since $\|\gamma^{\ast}(0)-\gamma^{\ast}(r^{*})\|>(4C_{1}+7)h^{\scriptscriptstyle(k+1)}$ , $\tilde{\gamma}^{\scriptscriptstyle(k+1)}_{0}$ is non-empty. Hence, we can select a discrete curve $\tilde{\gamma}^{\scriptscriptstyle(k+1)}(t_{i})$ , $i=1,\cdots,2N^{\scriptscriptstyle(k+1)}-1$ , from $\tilde{\gamma}_{0}^{{\scriptscriptstyle(k+1)}}$ , such that nearby points are within distance $C_{2}h^{\scriptscriptstyle(k+1)}$ , that is, $\|\tilde{\gamma}^{\scriptscriptstyle(k+1)}(t_{i+1})-\tilde{\gamma}^{\scriptscriptstyle(k+1)}(t_{i})\|\leq C_{2}h^{\scriptscriptstyle(k+1)}$ for any $i<2N^{\scriptscriptstyle(k+1)}$ , which is the second condition of Lemma 7.3 at the $(k+1)$ -th iteration. Moreover, since $\{\tilde{\gamma}^{\scriptscriptstyle(k+1)}(t_{i})\}\subset\tilde{\gamma}_{0}^{\scriptscriptstyle(k+1)}$ , we have $\|\tilde{\gamma}^{\scriptscriptstyle(k+1)}(t_{i})-\gamma^{\ast}(t)\|\geq(2C_{1}+3.5)h^{\scriptscriptstyle(k+1)}$ for $t=0,r^{*}$ , which is the third condition of Lemma 7.3. ∎
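The geometric scale sequence $h^{\scriptscriptstyle(k)}=\rho^{k}h^{\scriptscriptstyle(0)}$ used in the proof above can be sketched in a few lines of Python. The stopping rule $\rho h^{\scriptscriptstyle(k)}\leq 4\sqrt{\sigma}$ is our reading of the rule quoted for Algorithm 1 in the Discussion; the function name `bandwidth_schedule` and the numerical values are hypothetical.

```python
import math

def bandwidth_schedule(h0, rho, sigma):
    """Scales h^(k) = rho^k * h0, stopping once rho * h^(k) <= 4 * sqrt(sigma).

    h0:    initial scale h^(0) > 0
    rho:   shrink factor with 0 < rho < 1
    sigma: noise level sigma > 0
    """
    if not (0.0 < rho < 1.0) or h0 <= 0.0 or sigma <= 0.0:
        raise ValueError("need 0 < rho < 1, h0 > 0 and sigma > 0")
    floor = 4.0 * math.sqrt(sigma)
    schedule = [h0]
    # Keep shrinking while the next scale would still exceed the noise floor.
    while rho * schedule[-1] > floor:
        schedule.append(rho * schedule[-1])
    return schedule

hs = bandwidth_schedule(h0=1.0, rho=0.5, sigma=1e-4)
```

With this stopping rule every scale in the sequence stays above $4\sqrt{\sigma}$ whenever $h^{\scriptscriptstyle(0)}$ does, which matches the requirement $h>4\sqrt{\sigma}$ in Lemma 7.1.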

Proof of Theorem 3.3.

By Proposition 7.7, the three conditions of Lemma 7.3 hold with probability $(1-\delta)^{k}$ for any $k\geq 1$ . When the conditions hold, we have

[TABLE]

by Theorem 3.2. Hence,

[TABLE]

∎

Proof of Theorem 3.4.

Since $\gamma^{\ast}\subset{\cal M}$ , we have $d(x,{\cal M})\leq d(x,\gamma^{\ast})$ . Using this inequality, we could obtain the following inequalities:

[TABLE]

∎

Appendix D: Data set of Labelled Faces in the Wild in Section 5.2

In our study, we downloaded $264$ images of $66$ people with four images of each person. The images of the face region for the $66$ individuals are shown in Figure 13.

Appendix E: Proof of Proposition 6.2

Proposition 7.8.

If $\ell(\bar{\gamma}_{1})+\ell(\bar{\gamma})+\ell(\bar{\gamma}_{2})\leq C$, then $\gamma_{s}$ belongs to the closure of $\Gamma_{+}(\bar{x}_{1},\bar{x}_{2})$.

Proof.

For simplicity, we denote $\bar{S}$ to be the closure of a set $S$ . Let $P$ be the set of polynomials and $C$ be the set of continuous functions. Based on the Stone-Weierstrass Theorem, we have $P\subset C^{2}\subset\bar{P}=C$ , which implies that the closure of $C^{2}$ is $C$ . Based on this conclusion, we have

[TABLE]

and

[TABLE]

Since $\gamma_{s}$ is continuous and satisfies $\dot{\gamma}(t)\odot W(\gamma(t))\geq 0$ , we conclude that $\gamma_{s}\in\bar{\Gamma}_{+}(\bar{x}_{1},\bar{x}_{2})$ . ∎

Proof of Proposition 6.2.

For any $\gamma\in\Gamma_{+}(\bar{x}_{1},\bar{x}_{2})$ , there exists $t_{0}$ such that $v_{1}^{T}(\gamma(t_{0}))=0$ , since $v_{1}^{T}(\gamma(0))=v_{1}^{T}\bar{x}_{1}<0$ and $v_{1}^{T}(\gamma(r))=v_{1}^{T}\bar{x}_{2}>0$ . Define $\gamma_{1}:[0,t_{0}]\to\mathcal{M}$ by $\gamma_{1}(t)=\gamma(t)+v_{1}v_{1}^{T}(\gamma(0)-\gamma(t))$ and $\gamma_{2}:[0,r-t_{0}]\to\mathcal{M}$ by $\gamma_{2}(t)=\gamma(t-t_{0})+v_{1}v_{1}^{T}(\gamma(r)-\gamma(t-t_{0}))$ . It is easy to verify that $V_{\bot}^{T}\gamma(t)=V_{\bot}^{T}\gamma_{1}(t)$ , $V_{\bot}^{T}\dot{\gamma}(t)=V_{\bot}^{T}\dot{\gamma_{1}}(t)$ and thereby

[TABLE]

by Assumption 6.1 (b) and (c).

Since $v_{1}^{T}\gamma_{1}(t)$ and $v_{1}^{T}\gamma_{2}(t)$ are constant, we have $v_{1}^{T}\dot{\gamma_{1}}(t)=v_{1}^{T}\dot{\gamma_{2}}(t)=0$ and thereby

[TABLE]

where $\gamma_{0}=\arg\sup_{\gamma\in\Gamma_{+}(\gamma_{1}(t_{0}),p_{1},v_{1}^{T}p_{1})}\mathcal{L}(W,\gamma)$. Since we can always reparameterize $\gamma_{1}$ to have unit speed, say as $\tilde{\gamma}_{1}$, the concatenation of $\tilde{\gamma}_{1}$ and $\gamma_{0}$ belongs to $\bar{\Gamma}_{+}(\bar{x}_{1},p_{1},v_{1}^{T}p_{1})$. Hence,

[TABLE]

We can similarly verify that

[TABLE]

Moreover, by $\|W(\gamma(t))\|=1$ , we have $v_{1}^{T}W(\gamma(t))\leq 1$ and thereby

[TABLE]

Hence,

[TABLE]

Since $\gamma_{s}\in\bar{\Gamma}_{+}(\bar{x}_{1},\bar{x}_{2})$ , the supremum can be achieved, which completes the proof. ∎

Next, we will discuss the inequality

[TABLE]

In fact, if $\gamma\in\Gamma(\bar{x}_{1},\bar{x}_{2})\setminus\Gamma_{+}(\bar{x}_{1},\bar{x}_{2})$ satisfies $\langle v_{1}^{T}\dot{\gamma}(t),v_{1}^{T}W(\gamma(t))\rangle\geq 0$, then we define $\gamma_{+}$ through its coordinates $\{v_{i}^{T}\gamma_{+}(t)\}_{i=1}^{m}$. Specifically, we set $v_{1}^{T}\gamma_{+}(t)=v_{1}^{T}\gamma(t)$, and for $i\geq 2$ we set

[TABLE]

where $t_{0}$ is defined in the proof of Proposition 6.2. Using Assumption 6.1 (b), we can verify that $\dot{\gamma}_{+}(t)\odot W(\gamma_{+}(t))\geq 0$ .

In Figure 14, we display the cross sectional area of ${\cal M}$ along the first and $i$ -th axis for $i\geq 2$ . In the left panel, the blue curve is $\gamma$ and the red curve is $\gamma_{+}$ . Without loss of generality, we focus on $v_{i}^{T}\bar{x}_{1}\leq 0$ and $t\leq t_{0}$ . The other three cases in (7.7) can be similarly verified.

First, we compare the integrals over the orange curve $\mathcal{C}_{1}$ and the yellow curve $\mathcal{C}_{2}$ in Figure 14. The integral over $\mathcal{C}_{1}$, denoted by $I_{1}$, is

[TABLE]

and $I_{2}=\int_{\mathcal{C}_{2}}v_{i}^{T}W(z)dz_{i}$. Then, $I_{1}-I_{2}$ is the integral of $v_{i}^{T}W(z)$ over the closed anticlockwise curve consisting of $\mathcal{C}_{1}$ and the reverse of $\mathcal{C}_{2}$. When $d=2$, by Green’s theorem this integral equals an integral over the gray region $\mathcal{D}$ shown in the right panel of Figure 14, that is,

[TABLE]

since $\frac{\partial v_{2}^{T}W(z)}{\partial z_{1}}\leq 0$ for $z_{1}\leq 0$ based on Assumption 6.1(c). For $i\geq 2$, if $\frac{\partial v_{i}^{T}W(z)}{\partial z_{j}}\geq 0$ holds for any $j>i$ and $\frac{\partial v_{i}^{T}W(z)}{\partial z_{j}}<0$ holds for any $j<i$, the conclusion can be extended to higher dimensions by Stokes’ theorem.
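The sign argument based on Green’s theorem can be checked on a toy example. The Python sketch below uses a hypothetical stand-in $q(z)=z_{1}^{2}/2$ for $v_{2}^{T}W(z)$, whose partial derivative in $z_{1}$ is non-positive on $z_{1}\leq 0$, and the square $\mathcal{D}=[-1,0]\times[0,1]$ in place of the gray region. Both the function and the region are our assumptions, chosen only so that the integrals are easy to evaluate.

```python
import numpy as np

def q(z1, z2):
    """Hypothetical stand-in for v_2^T W(z): dq/dz1 = z1 <= 0 when z1 <= 0."""
    return 0.5 * z1 ** 2 + 0.0 * z2

n = 2000
t = (np.arange(n) + 0.5) / n          # midpoints of a uniform grid on [0, 1]

# Anticlockwise boundary of D = [-1, 0] x [0, 1]. Only the two vertical
# edges contribute to the integral of q dz2.
right_edge = np.sum(q(0.0 * t, t)) / n            # z1 = 0, z2 runs 0 -> 1
left_edge = -np.sum(q(-1.0 + 0.0 * t, t)) / n     # z1 = -1, z2 runs 1 -> 0
line_integral = right_edge + left_edge

# Area integral of dq/dz1 = z1 over D (midpoint rule in z1, unit height in z2).
z1_mid = -1.0 + t
area_integral = np.sum(z1_mid) / n
```

The boundary integral and the area integral agree, and both are non-positive, which mirrors the comparison $I_{1}-I_{2}\leq 0$ obtained in the text.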

Second, we compare the integrals over the purple and pink curves in Figure 14. By Assumption 6.1(b), the integral of $v_{i}^{T}W$ over the purple curve is negative, while the integral over the pink curve is zero. Hence the integral of $v_{i}^{T}W$ over the purple curve is less than that over the pink curve. In summary, $\int_{0}^{t_{0}}v_{i}^{T}\dot{\gamma}(t)\cdot v_{i}^{T}W(\gamma(t))dt\leq\int_{0}^{t_{0}}v_{i}^{T}\dot{\gamma}_{+}(t)\cdot v_{i}^{T}W(\gamma_{+}(t))dt$ for any $i\geq 2$. Moreover,

[TABLE]

where the last inequality can be verified by a proof similar to that of Proposition 6.2. Applying the same argument to $t\geq t_{0}$, we also have

[TABLE]

Together with (7.5), we conclude

[TABLE]

which supports the inequality (7.6).
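
The Green's theorem step used in the argument above can be checked numerically on a toy example. The sketch below is only an illustration under stated assumptions: the function `q` is a hypothetical stand-in for $v_{2}^{T}W(z)$, chosen so that its derivative in $z_{1}$ is non-positive for $z_{1}\leq 0$, and the region is taken to be the square $[-1,0]\times[0,1]$ rather than the gray region of Figure 14.

```python
def q(z1, z2):
    # Hypothetical stand-in for v_2^T W(z); not taken from the paper.
    return 0.5 * z1 * z1


def dq_dz1(z1, z2):
    # Partial derivative of q in z1; it is <= 0 whenever z1 <= 0.
    return z1


def boundary_integral(n=1000):
    """Integral of q dz2 along the anticlockwise boundary of [-1, 0] x [0, 1].

    The two horizontal edges contribute nothing because dz2 = 0 there.
    """
    h = 1.0 / n
    mids = [(k + 0.5) * h for k in range(n)]
    right = sum(q(0.0, z2) for z2 in mids) * h    # z1 = 0, z2 runs 0 -> 1
    left = -sum(q(-1.0, z2) for z2 in mids) * h   # z1 = -1, z2 runs 1 -> 0
    return right + left


def area_integral(n=200):
    """Midpoint-rule integral of dq/dz1 over the square [-1, 0] x [0, 1]."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        z1 = -1.0 + (i + 0.5) * h
        for j in range(n):
            z2 = (j + 0.5) * h
            total += dq_dz1(z1, z2)
    return total * h * h


line_value = boundary_integral()
area_value = area_integral()
print(line_value, area_value)
```

Both quantities agree up to rounding and are non-positive, which mirrors the sign argument: a non-positive derivative in $z_{1}$ over the enclosed region forces the closed line integral to be non-positive.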
