Non-Asymptotic Rates for Manifold, Tangent Space, and Curvature Estimation

Eddie Aamari (DATASHAPE; SELECT; LM-Orsay); Clément Levrard (UPD7)

arXiv:1705.00989·math.ST·February 6, 2018

TL;DR

This paper establishes optimal non-asymptotic rates for estimating a submanifold, its tangent spaces, and its curvature (second fundamental form) from a finite sample.

Contribution

It introduces a unified approach based on local polynomials that estimates the manifold, its tangent spaces, and its curvature simultaneously, and it derives minimax lower bounds.

Findings

01. Optimal rates for tangent space estimation

02. Optimal rates for second fundamental form estimation

03. Optimal rates for manifold estimation

Abstract

Given an $n$-sample drawn on a submanifold $M \subset \mathbb{R}^{D}$, we derive optimal rates for the estimation of tangent spaces $T_X M$, the second fundamental form $II_X^{M}$, and the submanifold $M$. After motivating their study, we introduce a quantitative class of $\mathcal{C}^{k}$-submanifolds in analogy with Hölder classes. The proposed estimators are based on local polynomials and make it possible to address the three problems simultaneously. Minimax lower bounds are derived using a conditional version of Assouad's lemma when the base point $X$ is random.

Equations

\inf_{\hat{T}}\sup_{\begin{subarray}{c}M\in\mathcal{C}^{k}\\ \tau_{M}\geq 0\end{subarray}}\mathbb{E}\,\angle\bigl(T_{y}M,\hat{T}\bigr)\geq 1/2,\qquad\inf_{\widehat{II}}\sup_{\begin{subarray}{c}M\in\mathcal{C}^{k}\\ \tau_{M}\geq 0\end{subarray}}\mathbb{E}\bigl\|II_{y}^{M}-\widehat{II}\bigr\|\geq c>0,

\operatorname{Med}(M)=\bigl\{z\in\mathbb{R}^{D}\,\big|\,\exists\,p\neq q\in M,\ \|z-p\|=\|z-q\|=d(z,M)\bigr\}.

\tau_{M}=\inf_{p\in M}d\bigl(p,\operatorname{Med}(M)\bigr)=\inf_{z\in\operatorname{Med}(M)}d(z,M).

\exp_{p}:\mathcal{B}_{T_{p}M}\left(0,\tau_{min}/4\right)\rightarrow M,\qquad v\mapsto p+v+\mathbf{N}_{p}(v),

\mathbf{N}_{p}(0)=0,\quad d_{0}\mathbf{N}_{p}=0,\quad\left\|d_{v}\mathbf{N}_{p}\right\|_{op}\leq L_{\perp}\left\|v\right\|,

y-p=\pi_{T_{p}M}(y-p)+R_{2}(y-p),

\mathbb{E}\max_{1\leq j\leq n}\angle\bigl(T_{Y_{j}}M,\hat{T}_{j}\bigr)\leq C\left(\frac{1}{n}\right)^{\frac{1}{d}}.

\Psi_{p}:\mathcal{B}_{T_{p}M}\left(0,r\right)\rightarrow M,\qquad v\mapsto p+v+\mathbf{N}_{p}(v),

\mathbf{N}_{p}(0)=0,\quad d_{0}\mathbf{N}_{p}=0,\quad\left\|d_{v}^{2}\mathbf{N}_{p}\right\|_{op}\leq L_{\perp},

\left\|d_{v}^{i}\mathbf{N}_{p}\right\|_{op}\leq L_{i}\quad\text{for all }3\leq i\leq k.

y-p=\pi^{*}(y-p)+\sum_{i=2}^{k-1}T_{i}^{*}\bigl(\pi^{*}(y-p)^{\otimes i}\bigr)+R_{k}(y-p),

0<f_{min}\leq f(y)\leq f_{max}<\infty.

\inf_{\hat{T}}\sup_{P\in\mathcal{P}_{(x)}^{k}}\mathbb{E}_{P^{\otimes n}}\angle\bigl(T_{x}M,\hat{T}\bigr)\geq\frac{1}{2}>0,

\inf_{\widehat{II}}\sup_{P\in\mathcal{P}_{(x)}^{k}}\mathbb{E}_{P^{\otimes n}}\bigl\|II_{x}^{M}\circ\pi_{T_{x}M}-\widehat{II}\bigr\|_{op}\geq\frac{L_{\perp}}{4}>0,

\underset{\pi,\ \sup_{2\leq i\leq k}\|T_{i}\|_{op}\leq t}{\arg\min}\ P_{n-1}^{(j)}\left[\Bigl\|x-\pi(x)-\sum_{i=2}^{k-1}T_{i}\bigl(\pi(x)^{\otimes i}\bigr)\Bigr\|^{2}\mathbbm{1}_{\mathcal{B}(0,h)}(x)\right],

h_{0}=\frac{\tau_{min}\wedge L_{\perp}^{-1}}{8}.

\max_{1\leq j\leq n}\angle\bigl(T_{Y_{j}}M,\hat{T}_{j}\bigr)\leq C_{d,k,\tau_{min},\mathbf{L}}\sqrt{\frac{f_{max}}{f_{min}}}\,(h^{k-1}\vee\sigma h^{-1})(1+th).

\sup_{P\in\mathcal{P}^{k}(\sigma)}\mathbb{E}_{P^{\otimes n}}\max_{1\leq j\leq n}\angle\bigl(T_{Y_{j}}M,\hat{T}_{j}\bigr)\leq C\left(\frac{\log n}{n-1}\right)^{\frac{k-1}{d}}\left\{1\vee\sigma\left(\frac{\log n}{n-1}\right)^{-\frac{k}{d}}\right\},

\inf_{\hat{T}}\sup_{P\in\mathcal{P}^{k}(\sigma)}\mathbb{E}_{P^{\otimes n}}\angle\bigl(T_{\pi_{M}(X_{1})}M,\hat{T}\bigr)\geq c_{d,k,\tau_{min}}\left\{\left(\frac{1}{n-1}\right)^{\frac{k-1}{d}}\vee\left(\frac{\sigma}{n-1}\right)^{\frac{k-1}{d+k}}\right\},

\max_{1\leq j\leq n}\bigl\|II_{Y_{j}}^{M}\circ\pi_{T_{Y_{j}}M}-\hat{T}_{2,j}\circ\hat{\pi}_{j}\bigr\|_{op}

\sup_{P\in\mathcal{P}^{k}(\sigma)}\mathbb{E}_{P^{\otimes n}}\max_{1\leq j\leq n}\bigl\|II_{Y_{j}}^{M}\circ\pi_{T_{Y_{j}}M}-\hat{T}_{2,j}\circ\hat{\pi}_{j}\bigr\|_{op}\leq C_{d,k,\tau_{min},\mathbf{L},f_{min},f_{max}}\left(\frac{\log n}{n-1}\right)^{\frac{k-2}{d}}\left\{1\vee\sigma\left(\frac{\log n}{n-1}\right)^{-\frac{k}{d}}\right\}.

Sc_{Y_{j}}^{M}=\frac{1}{d(d-1)}\sum_{r\neq s}\Bigl[\bigl\langle II_{Y_{j}}^{M}(e_{r},e_{r}),II_{Y_{j}}^{M}(e_{s},e_{s})\bigr\rangle-\bigl\|II_{Y_{j}}^{M}(e_{r},e_{s})\bigr\|^{2}\Bigr],

\widehat{Sc}_{j}=\frac{1}{d(d-1)}\sum_{r\neq s}\Bigl[\bigl\langle\hat{T}_{2,j}(\hat{e}_{r},\hat{e}_{r}),\hat{T}_{2,j}(\hat{e}_{s},\hat{e}_{s})\bigr\rangle-\bigl\|\hat{T}_{2,j}(\hat{e}_{r},\hat{e}_{s})\bigr\|^{2}\Bigr],

\mathbb{E}_{P^{\otimes n}}\max_{1\leq j\leq n}\bigl|\widehat{Sc}_{j}-Sc_{Y_{j}}^{M}\bigr|\leq C\left(\frac{\log n}{n-1}\right)^{\frac{k-2}{d}}\left\{1\vee\sigma\left(\frac{\log n}{n-1}\right)^{-\frac{k}{d}}\right\},

\inf_{\widehat{II}}\sup_{P\in\mathcal{P}^{k}(\sigma)}\mathbb{E}_{P^{\otimes n}}\bigl\|II_{\pi_{M}(X_{1})}^{M}\circ\pi_{T_{\pi_{M}(X_{1})}M}-\widehat{II}\bigr\|_{op}\geq c_{d,k,\tau_{min}}\left\{\left(\frac{1}{n-1}\right)^{\frac{k-2}{d}}\vee\left(\frac{\sigma}{n-1}\right)^{\frac{k-2}{d+k}}\right\},

\Psi_{j}(v)

\hat{M}

d_{H}\bigl(M,\hat{M}\bigr)\leq C_{d,k,\tau_{min},\mathbf{L},f_{min},f_{max}}(h^{k}\vee\sigma).

\sup_{P\in\mathcal{P}^{k}(\sigma)}\mathbb{E}_{P^{\otimes n}}d_{H}\bigl(M,\hat{M}\bigr)\leq C_{d,k,\tau_{min},\mathbf{L},f_{min},f_{max}}\left\{\left(\frac{\log n}{n-1}\right)^{\frac{k}{d}}\vee\sigma\right\}.

Full text

Non-asymptotic Rates for Manifold, Tangent Space and Curvature Estimation

Eddie Aamari (U.C. San Diego) and Clément Levrard (Université Paris-Diderot)

Department of Mathematics, University of California San Diego, 9500 Gilman Dr., La Jolla, CA 92093, United States. http://www.math.ucsd.edu/~eaamari/

Laboratoire de probabilités et modèles aléatoires, Bâtiment Sophie Germain, Université Paris-Diderot, 75013 Paris, France. http://www.normalesup.org/~levrard/

Research supported by ANR project TopData ANR-13-BS01-0008. Research supported by Advanced Grant of the European Research Council GUDHI. Supported by the Conseil régional d’Île-de-France program RDM-IdF.

Abstract

Given a noisy sample from a submanifold $M\subset\mathbb{R}^{D}$, we derive optimal rates for the estimation of tangent spaces $T_{X}M$, the second fundamental form $II_{X}^{M}$, and the submanifold $M$. After motivating their study, we introduce a quantitative class of $\mathcal{C}^{k}$-submanifolds in analogy with Hölder classes. The proposed estimators are based on local polynomials and make it possible to address the three problems simultaneously. Minimax lower bounds are derived using a conditional version of Assouad’s lemma when the base point $X$ is random.

MSC: 62G05, 62C20

Keywords: geometric inference, minimax, manifold learning

1 Introduction

A wide variety of data can be thought of as being generated on a shape of low dimensionality compared to a possibly high ambient dimension. This point of view led to the development of so-called topological data analysis, which has proved fruitful, for instance, when dealing with physical parameters subject to constraints, biomolecule conformations, or natural images [35]. This field aims to associate geometric quantities with data without regard to any specific coordinate system or parametrization. If the underlying structure is sufficiently smooth, one can model a point cloud $\mathbb{X}_{n}=\left\{X_{1},\ldots,X_{n}\right\}$ as being sampled on a $d$-dimensional submanifold $M\subset\mathbb{R}^{D}$. In such a case, intrinsic geometric and topological quantities include (but are not limited to) homology groups [28], persistent homology [15], volume [5], differential quantities [9], and the submanifold itself [20, 1, 26].

The present paper focuses on optimal rates for estimation of quantities up to order two: (0) the submanifold itself, (1) tangent spaces, and (2) second fundamental forms.

Among these three questions, special attention has been paid to the estimation of the submanifold. In particular, it is a central problem in manifold learning. Indeed, there exists a wide range of algorithms intended to reconstruct submanifolds from point clouds (Isomap [32], LLE [29], and restricted Delaunay complexes [6, 12], for instance), but few come with theoretical guarantees [20, 1, 26]. To our knowledge, minimax lower bounds were used to prove optimality in only one case [20]. Some of these reconstruction procedures are based on tangent space estimation [6, 1, 12]. Tangent space estimation itself also yields interesting applications in manifold clustering [19, 4]. Estimation of curvature-related quantities naturally arises in shape reconstruction, since curvature can drive the size of a meshing. As a consequence, most of the associated results deal with the case $d=2$ and $D=3$, though some of them may be extended to higher dimensions [27, 23]. Several algorithms have been proposed in that case [30, 9, 27, 23], but with no analysis of their performance from a statistical point of view.

To assess the quality of such a geometric estimator, the class of submanifolds over which the procedure is evaluated has to be specified. Up to now, the most commonly used model for submanifolds has relied on the reach $\tau_{M}$, a generalized convexity parameter. Assuming $\tau_{M}\geq\tau_{min}>0$ involves both local regularity (a bound on curvature) and global regularity (no arbitrarily pinched area). This $\mathcal{C}^{2}$-like assumption has been extensively used in the computational geometry and geometric inference fields [1, 28, 15, 5, 20]. One attempt at a specific investigation of higher orders of regularity $k\geq 3$ has been proposed in [9].

Many works suggest that the regularity of the submanifold has an important impact on convergence rates. This is particularly clear for tangent space estimation, where convergence rates of PCA-based estimators range from $(1/n)^{1/d}$ in the $\mathcal{C}^{2}$ case [1] to $(1/n)^{\alpha}$ with $1/d<\alpha<2/d$ in more regular settings [31, 33]. In addition, it seems that PCA-based estimators are outperformed by estimators taking into account higher orders of smoothness [11, 9], for regularities at least $\mathcal{C}^{3}$. For instance, fitting quadratic terms leads to a convergence rate of order $(1/n)^{2/d}$ in [11]. These remarks naturally led us to investigate the properties of local polynomial approximation for regular submanifolds, where “regular” has to be properly defined. Local polynomial fitting for geometric inference was studied in several frameworks such as [9]. In some sense, a part of our work extends these results, by investigating the dependency of convergence rates on the sample size $n$, but also on the order of regularity $k$ and the intrinsic and ambient dimensions $d$ and $D$.

1.1 Overview of the Main Results

In this paper, we build a collection of models for $\mathcal{C}^{k}$-submanifolds ($k\geq 3$) that naturally generalize the commonly used one for $k=2$ (Section 2). These models are defined by their local differential regularity $k$ in the usual sense, and by their minimum reach $\tau_{min}>0$, which may be thought of as a global regularity parameter (see Section 2.2). On these models, we study non-asymptotic rates for tangent space, curvature, and manifold estimation (Section 3). Roughly speaking, if $M$ is a $\mathcal{C}_{\tau_{min}}^{k}$ submanifold and if $Y_{1},\ldots,Y_{n}$ is an $n$-sample drawn on $M$ uniformly enough, then we can derive the following minimax bounds:

$\displaystyle\inf_{\hat{T}}\sup_{\begin{subarray}{c}M\in\mathcal{C}^{k}\\ \tau_{M}\geq\tau_{min}\end{subarray}}\mathbb{E}\max_{1\leq j\leq n}\angle\bigl(T_{Y_{j}}M,\hat{T}_{j}\bigr)\asymp\left(\frac{1}{n}\right)^{\frac{k-1}{d}},$

(upper and lower bound theorems for tangent spaces), where $T_{y}M$ denotes the tangent space of $M$ at $y$;

$\displaystyle\inf_{\widehat{II}}\sup_{\begin{subarray}{c}M\in\mathcal{C}^{k}\\ \tau_{M}\geq\tau_{min}\end{subarray}}\mathbb{E}\max_{1\leq j\leq n}\bigl\|II_{Y_{j}}^{M}-\widehat{II}_{j}\bigr\|\asymp\left(\frac{1}{n}\right)^{\frac{k-2}{d}},$

(upper and lower bound theorems for curvature), where $II_{y}^{M}$ denotes the second fundamental form of $M$ at $y$;

$\displaystyle\inf_{\hat{M}}\sup_{\begin{subarray}{c}M\in\mathcal{C}^{k}\\ \tau_{M}\geq\tau_{min}\end{subarray}}\mathbb{E}\,d_{H}\bigl(M,\hat{M}\bigr)\asymp\left(\frac{1}{n}\right)^{\frac{k}{d}},$

(upper and lower bound theorems for the Hausdorff distance), where $d_{H}$ denotes the Hausdorff distance.

These results shed light on the influence of $k$, $d$, and $n$ on these estimation problems, showing for instance that the ambient dimension $D$ plays no role. The estimators proposed for the upper bounds all rely on the analysis of local polynomials, and make it possible to handle the three estimation problems in a unified way (Section 5.1). Some of the lower bounds are derived using a new version of Assouad’s Lemma (Section 5.2.2).
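To make the role of $k$ and $d$ in the three rates concrete, the following minimal sketch evaluates $(1/n)^{(k-1)/d}$, $(1/n)^{(k-2)/d}$ and $(1/n)^{k/d}$ numerically. The function name `minimax_rates` is a hypothetical helper introduced for illustration only; constants and logarithmic factors are ignored, so the values only indicate orders of magnitude.

```python
def minimax_rates(n: int, k: int, d: int) -> dict:
    """Noise-free rates (1/n)^(a/d) for a = k-1, k-2, k (constants and log factors dropped)."""
    base = 1.0 / n
    return {
        "tangent_space": base ** ((k - 1) / d),            # angle between tangent spaces
        "second_fundamental_form": base ** ((k - 2) / d),  # operator norm error on II
        "manifold_hausdorff": base ** (k / d),             # Hausdorff distance
    }


# Illustrative values: a C^3 surface (d = 2) observed through n = 10,000 points.
rates = minimax_rates(n=10_000, k=3, d=2)
for name, value in rates.items():
    print(f"{name}: {value:.2e}")
```

Note that the ambient dimension $D$ does not appear among the arguments, in line with the remark above.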

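We also emphasize the influence of the reach $\tau_{M}$ of the manifold $M$ in Theorem 1. Indeed, we show that whatever the local regularity $k$ of $M$, if we only require $\tau_{M}\geq 0$, then for any fixed point $y\in M$,

$\displaystyle\inf_{\hat{T}}\sup_{\begin{subarray}{c}M\in\mathcal{C}^{k}\\ \tau_{M}\geq 0\end{subarray}}\mathbb{E}\,\angle\bigl(T_{y}M,\hat{T}\bigr)\geq 1/2,\qquad\inf_{\widehat{II}}\sup_{\begin{subarray}{c}M\in\mathcal{C}^{k}\\ \tau_{M}\geq 0\end{subarray}}\mathbb{E}\bigl\|II_{y}^{M}-\widehat{II}\bigr\|\geq c>0,$

showing that the global regularity parameter $\tau_{min}>0$ is crucial for estimation purposes.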

It is worth mentioning that our bounds also allow for perpendicular noise of amplitude $\sigma>0$. When $\sigma\lesssim(1/n)^{\alpha/d}$ for some $\alpha\geq 1$, our estimators behave as if the corrupted sample $X_{1},\ldots,X_{n}$ were drawn exactly on a manifold with regularity $\alpha$. Hence our estimators turn out to be optimal whenever $\alpha\geq k$. If $\alpha<k$, the lower bounds suggest that better rates could be obtained with different estimators, for instance by pre-processing the data as in [21].

For the sake of completeness, geometric background and proofs of technical lemmas are given in the Appendix.

2 $\mathcal{C}^{k}$ Models for Submanifolds

2.1 Notation

Throughout the paper, we consider $d$-dimensional compact submanifolds $M\subset\mathbb{R}^{D}$ without boundary. The submanifolds will always be assumed to be at least $\mathcal{C}^{2}$. For all $p\in M$, $T_{p}M$ stands for the tangent space of $M$ at $p$ [13, Chapter 0]. We let $II_{p}^{M}:T_{p}M\times T_{p}M\rightarrow T_{p}M^{\perp}$ denote the second fundamental form of $M$ at $p$ [13, p. 125]; $II_{p}^{M}$ characterizes the curvature of $M$ at $p$. The standard inner product in $\mathbb{R}^{D}$ is denoted by $\langle\cdot,\cdot\rangle$ and the Euclidean norm by $\left\|\cdot\right\|$. Given a linear subspace $T\subset\mathbb{R}^{D}$, write $T^{\perp}$ for its orthogonal complement. We write $\mathcal{B}(p,r)$ for the closed Euclidean ball of radius $r>0$ centered at $p\in\mathbb{R}^{D}$, and for short $\mathcal{B}_{T}(p,r)=\mathcal{B}(p,r)\cap T$. For a smooth function $\Phi:\mathbb{R}^{D}\rightarrow\mathbb{R}^{D}$ and $i\geq 1$, we let $d_{x}^{i}\Phi$ denote the $i$th order differential of $\Phi$ at $x\in\mathbb{R}^{D}$. For a linear map $A$ defined on $T\subset\mathbb{R}^{D}$, $\left\|A\right\|_{\mathrm{op}}={\sup_{v\in T}}\frac{\left\|Av\right\|}{\left\|v\right\|}$ stands for the operator norm. We adopt the same notation $\left\|\cdot\right\|_{op}$ for tensors, i.e. multilinear maps. Similarly, if $\left\{A_{x}\right\}_{x\in T^{\prime}}$ is a family of linear maps, its $L^{\infty}$ operator norm is denoted by $\left\|A\right\|_{op}=\sup_{x\in T^{\prime}}\left\|A_{x}\right\|_{op}$. When it is well defined, we will write $\pi_{B}(z)$ for the projection of $z\in\mathbb{R}^{D}$ onto the closed subset $B\subset\mathbb{R}^{D}$, that is, the nearest neighbor of $z$ in $B$. The distance between two linear subspaces $U,V\subset\mathbb{R}^{D}$ of the same dimension is measured by the principal angle $\angle(U,V)=\left\|\pi_{U}-\pi_{V}\right\|_{\mathrm{op}}$. The Hausdorff distance [20] in $\mathbb{R}^{D}$ is denoted by ${d}_{H}$. For a probability distribution $P$, $\mathbb{E}_{P}$ stands for the expectation with respect to $P$. We write $P^{\otimes n}$ for the $n$-times tensor product of $P$.

Throughout this paper, $C_{\alpha}$ will denote a generic constant depending on the parameter $\alpha$ . For clarity’s sake, $C^{\prime}_{\alpha}$ , $c_{\alpha}$ , or $c^{\prime}_{\alpha}$ may also be used when several constants are involved.
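As an illustration of the principal angle $\angle(U,V)=\|\pi_{U}-\pi_{V}\|_{op}$ defined above, the sketch below computes it with NumPy for two lines in the plane. The helper names `projector` and `principal_angle` are assumptions made for this example. For two lines at geometric angle $\theta$, the quantity equals $\sin\theta$, which the example checks numerically.

```python
import numpy as np


def projector(basis: np.ndarray) -> np.ndarray:
    """Orthogonal projector onto the column span of `basis` (shape D x d, full column rank)."""
    q, _ = np.linalg.qr(basis)  # orthonormalize the columns
    return q @ q.T


def principal_angle(basis_u: np.ndarray, basis_v: np.ndarray) -> float:
    """angle(U, V) = ||pi_U - pi_V||_op, computed as the spectral norm of the difference."""
    return float(np.linalg.norm(projector(basis_u) - projector(basis_v), ord=2))


# Two lines in R^2 separated by a geometric angle theta.
theta = 0.3
u = np.array([[1.0], [0.0]])
v = np.array([[np.cos(theta)], [np.sin(theta)]])
angle_uv = principal_angle(u, v)
print(angle_uv, np.sin(theta))
```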

2.2 Reach and Regularity of Submanifolds

As introduced in [16], the reach $\tau_{M}$ of a subset $M\subset\mathbb{R}^{D}$ is the maximal neighborhood radius for which the projection $\pi_{M}$ onto $M$ is well defined. More precisely, denoting by $d(\cdot,M)$ the distance to $M$, the medial axis of $M$ is defined to be the set of points which have at least two nearest neighbors on $M$, that is

$\displaystyle\operatorname{Med}(M)=\bigl\{z\in\mathbb{R}^{D}\,\big|\,\exists\,p\neq q\in M,\ \|z-p\|=\|z-q\|=d(z,M)\bigr\}.$

The reach is then defined by

$\displaystyle\tau_{M}=\inf_{p\in M}d\bigl(p,\operatorname{Med}(M)\bigr)=\inf_{z\in\operatorname{Med}(M)}d(z,M).$

It gives a minimal scale of geometric and topological features of $M$.

As a generalized convexity parameter, $\tau_{M}$ is a key parameter in reconstruction [1, 20] and in topological inference [28]. Having $\tau_{M}\geq\tau_{min}>0$ prevents $M$ from almost self-intersecting, and bounds its curvature in the sense that $\left\|II^{M}_{p}\right\|_{op}\leq\tau_{M}^{-1}\leq\tau_{min}^{-1}$ for all $p\in M$ [28, Proposition 6.1].
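For intuition, the reach of a circle of radius $R$ is $R$: its medial axis is reduced to its center. The sketch below checks this numerically. It does not use the medial axis definition above but a classical equivalent formula for submanifolds due to Federer (an outside fact, not stated in this text): $\tau_{M}=\inf_{p\neq q\in M}\|q-p\|^{2}/\bigl(2\,d(q-p,T_{p}M)\bigr)$. Restricting the infimum to sample points gives an upper estimate of the reach. The function name `reach_upper_estimate` is hypothetical, and the code is written for curves ($d=1$) with known unit tangents.

```python
import numpy as np


def reach_upper_estimate(points: np.ndarray, tangents: np.ndarray) -> float:
    """Min over sample pairs of ||q - p||^2 / (2 d(q - p, T_p M)) for a curve (d = 1).

    `tangents[i]` must be a unit tangent vector of the curve at `points[i]`.
    """
    best = np.inf
    for p, t in zip(points, tangents):
        diff = points - p                               # q - p for every sample point q
        sq_norm = np.sum(diff**2, axis=1)
        normal_part = diff - np.outer(diff @ t, t)      # component orthogonal to T_p M
        dist_to_tangent = np.linalg.norm(normal_part, axis=1)
        valid = dist_to_tangent > 1e-12                 # skips q = p
        if valid.any():
            ratio = sq_norm[valid] / (2.0 * dist_to_tangent[valid])
            best = min(best, float(np.min(ratio)))
    return best


# Circle of radius 2 in the plane, whose reach is 2.
radius = 2.0
angles = np.linspace(0.0, 2.0 * np.pi, 200, endpoint=False)
circle = radius * np.column_stack([np.cos(angles), np.sin(angles)])
unit_tangents = np.column_stack([-np.sin(angles), np.cos(angles)])
reach_estimate = reach_upper_estimate(circle, unit_tangents)
print(reach_estimate)
```

On a circle every pair of points attains the infimum, so the estimate matches $R$ up to rounding; on a general curve the sampled minimum can only overestimate the reach.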

For $\tau_{min}>0$ , we let $\mathcal{C}^{2}_{\tau_{min}}$ denote the set of $d$ -dimensional compact connected submanifolds $M$ of $\mathbb{R}^{D}$ such that $\tau_{M}\geq\tau_{min}>0$ . A key property of submanifolds $M\in\mathcal{C}^{2}_{\tau_{min}}$ is the existence of a parametrization closely related to the projection onto tangent spaces. We let $\exp_{p}:T_{p}M\rightarrow M$ denote the exponential map of $M$ [13, Chapter 3], that is defined by $\exp_{p}(v)=\gamma_{p,v}(1)$ , where $\gamma_{p,v}$ is the unique constant speed geodesic path of $M$ with initial value $p$ and velocity $v$ .

Lemma 1.

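If $M\in\mathcal{C}^{2}_{\tau_{min}}$, $\exp_{p}:\mathcal{B}_{T_{p}M}\left(0,\tau_{min}/4\right)\rightarrow M$ is one-to-one. Moreover, it can be written as

$\displaystyle\exp_{p}:\mathcal{B}_{T_{p}M}\left(0,\tau_{min}/4\right)\rightarrow M,\qquad v\mapsto p+v+\mathbf{N}_{p}(v),$

with $\mathbf{N}_{p}$ such that for all $v\in\mathcal{B}_{T_{p}M}\left(0,\tau_{min}/4\right)$,

$\displaystyle\mathbf{N}_{p}(0)=0,\quad d_{0}\mathbf{N}_{p}=0,\quad\left\|d_{v}\mathbf{N}_{p}\right\|_{op}\leq L_{\perp}\left\|v\right\|,$

where $L_{\perp}=5/(4\tau_{min})$. Furthermore, for all $p,y\in M$,

$\displaystyle y-p=\pi_{T_{p}M}(y-p)+R_{2}(y-p),$

where $\left\|R_{2}(y-p)\right\|\leq\frac{\left\|y-p\right\|^{2}}{2\tau_{min}}$.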

A proof of Lemma 1 is given in Section A.1 of the Appendix. In other words, elements of $\mathcal{C}^{2}_{\tau_{min}}$ have local parametrizations on top of their tangent spaces that are defined on neighborhoods with a minimal radius, and these parametrizations differ from the identity map by at most a quadratic term. The existence of such local parametrizations leads to the following convergence result:

if data $Y_{1},\ldots,Y_{n}$ are drawn uniformly enough on $M\in\mathcal{C}^{2}_{\tau_{min}}$, then it is shown in [1, Proposition 14] that a tangent space estimator $\hat{T}$ based on local PCA achieves

$\displaystyle\mathbb{E}\max_{1\leq j\leq n}\angle\bigl(T_{Y_{j}}M,\hat{T}_{j}\bigr)\leq C\left(\frac{1}{n}\right)^{\frac{1}{d}}.$

When $M$ is smoother, it has been proved in [11] that a convergence rate of order $n^{-2/d}$ might be achieved, based on the existence of a local order $3$ Taylor expansion of the submanifold on top of its tangent spaces.
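The idea of a local PCA tangent space estimator can be sketched as follows. This is a generic illustration under simplifying assumptions, not the exact estimator of [1]: points within distance `h` of a base point are kept, and the top $d$ eigenvectors of their uncentered second-moment matrix span the estimate. The function name `local_pca_tangent`, the bandwidth `h=0.2`, and the unit-circle test data are all choices made for this example.

```python
import numpy as np


def local_pca_tangent(points: np.ndarray, center: np.ndarray, h: float, d: int) -> np.ndarray:
    """Orthonormal basis (D x d) of the top-d principal directions of the points within h of center."""
    diffs = points - center
    local = diffs[np.linalg.norm(diffs, axis=1) <= h]
    second_moment = local.T @ local / len(local)   # uncentered: displacements from the base point
    _, eigvecs = np.linalg.eigh(second_moment)     # eigenvalues in ascending order
    return eigvecs[:, -d:]


# Noise-free sample on the unit circle; the tangent line at (1, 0) is spanned by (0, 1).
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 2000)
circle = np.column_stack([np.cos(theta), np.sin(theta)])
center = np.array([1.0, 0.0])

basis_hat = local_pca_tangent(circle, center, h=0.2, d=1)
true_basis = np.array([[0.0], [1.0]])
# Principal angle ||pi_U - pi_V||_op between the estimated and true tangent lines.
angle_error = np.linalg.norm(basis_hat @ basis_hat.T - true_basis @ true_basis.T, ord=2)
print(angle_error)
```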

Thus, a natural extension of the $\mathcal{C}^{2}_{\tau_{min}}$ model to $\mathcal{C}^{k}$ -submanifolds should ensure that such an expansion exists at order $k$ and satisfies some regularity constraints. To this aim, we introduce the following class of regularity $\mathcal{C}^{k}_{\tau_{min},\mathbf{L}}$ .

Definition 1.

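For $k\geq 3$, $\tau_{min}>0$, and $\mathbf{L}=(L_{\perp},L_{3},\ldots,L_{k})$, we let $\mathcal{C}^{k}_{\tau_{min},\mathbf{L}}$ denote the set of $d$-dimensional compact connected submanifolds $M$ of $\mathbb{R}^{D}$ with $\tau_{M}\geq\tau_{min}$ and such that, for all $p\in M$, there exists a local one-to-one parametrization $\Psi_{p}$ of the form:

$\displaystyle\Psi_{p}:\mathcal{B}_{T_{p}M}\left(0,r\right)\rightarrow M,\qquad v\mapsto p+v+\mathbf{N}_{p}(v),$

for some $r\geq\frac{1}{4L_{\perp}}$, with $\mathbf{N}_{p}\in\mathcal{C}^{k}\left(\mathcal{B}_{T_{p}M}\left(0,r\right),\mathbb{R}^{D}\right)$ such that

$\displaystyle\mathbf{N}_{p}(0)=0,\quad d_{0}\mathbf{N}_{p}=0,\quad\left\|d_{v}^{2}\mathbf{N}_{p}\right\|_{op}\leq L_{\perp},$

for all $\left\|v\right\|\leq\frac{1}{4L_{\perp}}$. Furthermore, we require that

$\displaystyle\left\|d_{v}^{i}\mathbf{N}_{p}\right\|_{op}\leq L_{i}\quad\text{for all }3\leq i\leq k.$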

It is important to note that such a family of $\Psi_{p}$ ’s exists for any compact $\mathcal{C}^{k}$ -submanifold, if one allows $\tau_{min}^{-1}$ , $L_{\perp}$ , $L_{3}$ , $\ldots$ , $L_{k}$ to be large enough. Note that the radius $1/(4L_{\perp})$ has been chosen for convenience. Other smaller scales would do and we could even parametrize this constant, but without substantial benefits in the results.

The $\Psi_{p}$’s can be seen as unit parametrizations of $M$. The conditions on $\mathbf{N}_{p}(0)$, $d_{0}\mathbf{N}_{p}$, and $d^{2}_{v}\mathbf{N}_{p}$ ensure that $\Psi_{p}^{-1}$ is close to the projection $\pi_{T_{p}M}$. The bounds on $d_{v}^{i}\mathbf{N}_{p}$ ($3\leq i\leq k$) make it possible to control the coefficients of the polynomial expansion we seek. Indeed, whenever $M\in\mathcal{C}^{k}_{\tau_{min},\mathbf{L}}$, Lemma 2 shows that for every $p$ in $M$, and $y$ in $\mathcal{B}\bigl(p,\frac{\tau_{min}\wedge L_{\perp}^{-1}}{4}\bigr)\cap M$,

$\displaystyle y-p=\pi^{*}(y-p)+\sum_{i=2}^{k-1}T_{i}^{*}\bigl(\pi^{*}(y-p)^{\otimes i}\bigr)+R_{k}(y-p),$

where $\pi^{*}$ denotes the orthogonal projection onto $T_{p}M$, the $T_{i}^{*}$ are $i$-linear maps from $T_{p}M$ to $\mathbb{R}^{D}$ with $\|T_{i}^{*}\|_{op}\leq L^{\prime}_{i}$, and $R_{k}$ satisfies $\|R_{k}(y-p)\|\leq C\|y-p\|^{k}$, where the constants $C$ and the $L^{\prime}_{i}$’s depend on the parameters $\tau_{min}$, $d$, $k$, $L_{\perp},\ldots,L_{k}$.

Note that for $k\geq 3$ the exponential map can happen to be only $\mathcal{C}^{k-2}$ for a $\mathcal{C}^{k}$ -submanifold [24]. Hence, it may not be a good choice of $\Psi_{p}$ . However, for $k=2$ , taking $\Psi_{p}=\exp_{p}$ is sufficient for our purpose. For ease of notation, we may write $\mathcal{C}^{2}_{\tau_{min},\mathbf{L}}$ although the specification of $\mathbf{L}$ is useless. In this case, we implicitly set by default $\Psi_{p}=\exp_{p}$ and $L_{\perp}=5/(4\tau_{min})$ . As will be shown in Theorem 1, the global assumption $\tau_{M}\geq\tau_{min}>0$ cannot be dropped, even when higher order regularity bounds $L_{i}$ ’s are fixed.

Let us now describe the statistical model. Every $d$ -dimensional submanifold $M\subset\mathbb{R}^{D}$ inherits a natural uniform volume measure by restriction of the ambient $d$ -dimensional Hausdorff measure $\mathcal{H}^{d}$ . In what follows, we will consider probability distributions that are almost uniform on some $M$ in $\mathcal{C}^{k}_{\tau_{min},\mathbf{L}}$ , with some bounded noise, as stated below.

Definition 2 (Noise-Free and Tubular Noise Models).

- (Noise-Free Model) For $k\geq 2$, $\tau_{min}>0$, $\mathbf{L}=(L_{\perp},L_{3},\ldots,L_{k})$ and $f_{min}\leq f_{max}$, we let $\mathcal{P}^{k}_{\tau_{min},\mathbf{L},f_{min},f_{max}}$ denote the set of distributions $P_{0}$ with support $M\in\mathcal{C}^{k}_{\tau_{min},\mathbf{L}}$ that have a density $f$ with respect to the volume measure on $M$, and such that for all $y\in M$,

$\displaystyle 0<f_{min}\leq f(y)\leq f_{max}<\infty.$

- (Tubular Noise Model) For $0\leq\sigma<\tau_{min}$, we denote by $\mathcal{P}^{k}_{\tau_{min},\mathbf{L},f_{min},f_{max}}\left(\sigma\right)$ the set of distributions of random variables $X=Y+Z$, where $Y$ has distribution $P_{0}\in\mathcal{P}^{k}_{\tau_{min},\mathbf{L},f_{min},f_{max}}$, and $Z\in T_{Y}M^{\perp}$ with $\|Z\|\leq\sigma$ and $\mathbb{E}(Z|Y)=0$.

For short, we write $\mathcal{P}^{k}$ and $\mathcal{P}^{k}(\sigma)$ when there is no ambiguity. We denote by $\mathbb{X}_{n}$ an i.i.d. $n$ -sample $\left\{X_{1},\ldots,X_{n}\right\}$ , that is, a sample with distribution $P^{\otimes n}$ for some $P\in\mathcal{P}^{k}(\sigma)$ , so that $X_{i}=Y_{i}+Z_{i}$ , where $Y$ has distribution $P_{0}\in\mathcal{P}^{k}$ , $Z\in\mathcal{B}_{T_{Y}M^{\perp}}(0,\sigma)$ with $\mathbb{E}(Z|Y)=0$ . It is immediate that for $\sigma<\tau_{min}$ , we have $Y=\pi_{M}(X)$ . Note that the tubular noise model $\mathcal{P}^{k}(\sigma)$ is a slight generalization of that in [21].
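The tubular noise model can be simulated on a simple example. The sketch below is an illustration under assumptions made here, not the paper's setup: $M$ is a circle of radius `radius` in $\mathbb{R}^{2}$ (so its reach is `radius`), $Y$ is uniform on it, and $Z$ is a symmetric uniform perturbation along the normal direction, which gives $\|Z\|\leq\sigma$ and $\mathbb{E}(Z|Y)=0$. It also checks the identity $Y=\pi_{M}(X)$ stated above. The function name `sample_tubular_circle` is hypothetical.

```python
import numpy as np


def sample_tubular_circle(n: int, radius: float, sigma: float, rng: np.random.Generator):
    """Draw n points X = Y + Z with Y uniform on a circle and Z normal to it, ||Z|| <= sigma."""
    if not 0.0 <= sigma < radius:
        raise ValueError("need 0 <= sigma < radius (the reach of the circle)")
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    unit_normal = np.column_stack([np.cos(theta), np.sin(theta)])  # outward normal at Y
    y = radius * unit_normal
    amplitude = rng.uniform(-sigma, sigma, size=(n, 1))            # symmetric, so E(Z | Y) = 0
    z = amplitude * unit_normal
    return y + z, y


rng = np.random.default_rng(1)
radius, sigma = 1.0, 0.1
x, y = sample_tubular_circle(500, radius, sigma, rng)

# Projection onto the circle: rescale each noisy point to norm `radius`.
projected = radius * x / np.linalg.norm(x, axis=1, keepdims=True)
print(np.max(np.linalg.norm(x - y, axis=1)), np.allclose(projected, y))
```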

In what follows, though $M$ is unknown, all the parameters of the model will be assumed to be known, including the intrinsic dimension $d$ and the order of regularity $k$ . We will also denote by $\mathcal{P}^{k}_{(x)}$ the subset of elements in $\mathcal{P}^{k}$ whose support contains a prescribed $x\in\mathbb{R}^{D}$ .

In view of our minimax study on $\mathcal{P}^{k}$, it is important to first ensure that $\mathcal{P}^{k}$ is stable with respect to deformations and dilations.

Proposition 1.

Let $\Phi:\mathbb{R}^{D}\rightarrow\mathbb{R}^{D}$ be a global $\mathcal{C}^{k}$ -diffeomorphism. If $\left\|d\Phi-I_{D}\right\|_{op}$ , $\left\|d^{2}\Phi\right\|_{op}$ , …, $\left\|d^{k}\Phi\right\|_{op}$ are small enough, then for all $P$ in $\mathcal{P}^{k}_{\tau_{min},\mathbf{L},f_{min},f_{max}}$ , the pushforward distribution $P^{\prime}=\Phi_{\ast}P$ belongs to $\mathcal{P}^{k}_{\tau_{min}/2,2\mathbf{L},f_{min}/2,2f_{max}}$ .

Moreover, if $\Phi=\lambda I_{D}$ ($\lambda>0$) is a homogeneous dilation, then $P^{\prime}\in\mathcal{P}^{k}_{\lambda\tau_{min},\mathbf{L}_{(\lambda)},f_{min}/\lambda^{d},f_{max}/\lambda^{d}}$, where $\mathbf{L}_{(\lambda)}=(L_{\perp}/\lambda,L_{3}/\lambda^{2},\ldots,L_{k}/\lambda^{k-1})$.

Proposition 1 follows from a geometric reparametrization argument (Proposition A.5 in Appendix A) and a change of variable result for the Hausdorff measure (Lemma A.6 in Appendix A).
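The scaling of the constants under a dilation can be checked by hand. The following short derivation is a sketch of this computation, not the appendix proof: it rescales a parametrization $\Psi_{p}(v)=p+v+\mathbf{N}_{p}(v)$ of $M$ into one of $\lambda M$. The primed symbols $\Psi^{\prime}$, $\mathbf{N}^{\prime}$ and $f^{\prime}$ are notation introduced here for illustration.

```latex
\begin{align*}
\Psi'_{\lambda p}(v) &= \lambda\,\Psi_{p}(v/\lambda)
   = \lambda p + v + \lambda\,\mathbf{N}_{p}(v/\lambda),
   \qquad\text{so}\qquad \mathbf{N}'_{\lambda p}(v) = \lambda\,\mathbf{N}_{p}(v/\lambda),\\
d^{i}_{v}\mathbf{N}'_{\lambda p} &= \lambda^{1-i}\, d^{i}_{v/\lambda}\mathbf{N}_{p},
   \qquad\text{hence}\qquad
   \bigl\|d^{2}_{v}\mathbf{N}'_{\lambda p}\bigr\|_{op}\leq \frac{L_{\perp}}{\lambda},
   \quad
   \bigl\|d^{i}_{v}\mathbf{N}'_{\lambda p}\bigr\|_{op}\leq \frac{L_{i}}{\lambda^{i-1}}
   \quad (3\leq i\leq k),\\
\tau_{\lambda M} &= \lambda\,\tau_{M},
   \qquad
   f'(\lambda y) = \frac{f(y)}{\lambda^{d}} .
\end{align*}
```

The domain radius scales consistently: $\lambda r\geq\lambda/(4L_{\perp})=1/\bigl(4(L_{\perp}/\lambda)\bigr)$. The density identity reflects the fact that the $d$-dimensional volume measure of $\lambda M$ is $\lambda^{d}$ times that of $M$.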

2.3 Necessity of a Global Assumption

In Section 2.2, we generalized $\mathcal{C}^{2}$-like models (stated in terms of reach) to $\mathcal{C}^{k}$, for $k\geq 3$, by imposing higher order differentiability bounds on the parametrizations $\Psi_{p}$. The following Theorem 1 shows that the global assumption $\tau_{M}\geq\tau_{min}>0$ is necessary for estimation purposes.

Theorem 1.

Assume that $\tau_{min}=0$. If $D\geq d+3$, then for all $k\geq 3$ and $L_{\perp}>0$, provided that $L_{3}/L_{\perp}^{2},\ldots,{L_{k}}/L_{\perp}^{k-1},{L_{\perp}^{d}}/{f_{min}}$ and ${f_{max}}/{L_{\perp}^{d}}$ are large enough (depending only on $d$ and $k$), for all $n\geq 1$,

$\displaystyle\inf_{\hat{T}}\sup_{P\in\mathcal{P}_{(x)}^{k}}\mathbb{E}_{P^{\otimes n}}\angle\bigl(T_{x}M,\hat{T}\bigr)\geq\frac{1}{2}>0,$

where the infimum is taken over all the estimators $\hat{T}=\hat{T}\bigl(X_{1},\ldots,X_{n}\bigr)$.

Moreover, for any $D\geq d+1$, provided that $L_{3}/L_{\perp}^{2},\ldots,{L_{k}}/L_{\perp}^{k-1},{L_{\perp}^{d}}/{f_{min}}$ and ${f_{max}}/{L_{\perp}^{d}}$ are large enough (depending only on $d$ and $k$), for all $n\geq 1$,

$\displaystyle\inf_{\widehat{II}}\sup_{P\in\mathcal{P}_{(x)}^{k}}\mathbb{E}_{P^{\otimes n}}\bigl\|II_{x}^{M}\circ\pi_{T_{x}M}-\widehat{II}\bigr\|_{op}\geq\frac{L_{\perp}}{4}>0,$

where the infimum is taken over all the estimators $\widehat{II}=\widehat{II}\bigl(X_{1},\ldots,X_{n}\bigr)$.

The proof of Theorem 1 can be found in Section C.5. In other words, if the class of submanifolds is allowed to have arbitrarily small reach, no estimator can perform uniformly well in estimating either $T_{x}M$ or $II_{x}^{M}$. This holds even though each of the underlying submanifolds has arbitrarily smooth parametrizations. Indeed, if two parts of $M$ can nearly intersect around $x$ at an arbitrarily small scale $\Lambda\rightarrow 0$, no estimator can decide whether the direction (resp. curvature) of $M$ at $x$ is that of the first part or the second part (see Figures 8 and 9).

3 Main Results

Let us now move to the statement of the main results. Given an i.i.d. $n$ -sample $\mathbb{X}_{n}=\left\{X_{1},\ldots,X_{n}\right\}$ with unknown common distribution $P\in\mathcal{P}^{k}(\sigma)$ , we detail non-asymptotic rates for the estimation of tangent spaces $T_{Y_{j}}M$ , second fundamental forms $II_{Y_{j}}^{M}$ , and $M$ itself.

For this, we need one more piece of notation. For $1\leq j\leq n$ , $P_{n-1}^{(j)}$ denotes integration with respect to $1/(n-1)\sum_{i\neq j}\delta_{(X_{i}-X_{j})}$ , and $z^{\otimes i}$ denotes the $D\times i$ -dimensional vector $(z,\ldots,z)$ . For a constant $t>0$ and a bandwidth $h>0$ to be chosen later, we define the local polynomial estimator $(\hat{\pi}_{j},\hat{T}_{2,j},\ldots,\hat{T}_{k-1,j})$ at $X_{j}$ to be any element of

[TABLE]

where $\pi$ ranges among all the orthogonal projectors on $d$ -dimensional subspaces, and $T_{i}:\left(\mathbb{R}^{D}\right)^{i}\rightarrow\mathbb{R}^{D}$ among the symmetric tensors of order $i$ such that $\left\|T_{i}\right\|_{op}\leq t$ . For $k=2$ , the sum over the tensors $T_{i}$ is empty, and the integrated term reduces to $\left\|x-\pi(x)\right\|^{2}\mathbbm{1}_{\mathcal{B}(0,h)}(x)$ . By compactness of the domain of minimization, such a minimizer exists almost surely. In what follows, we will work with a maximum scale $h\leq h_{0}$ , with

$$h_{0}=\frac{\tau_{min}\wedge L_{\perp}^{-1}}{8}.$$

The set of $d$ -dimensional orthogonal projectors is not convex, which leads to a more involved optimization problem than usual least squares. In practice, this problem may be solved using tools from optimization on Grassmann manifolds [34], or by adopting a two-stage procedure such as in [9]: from local PCA, a first $d$ -dimensional space is estimated at each sample point, along with an orthonormal basis of it. Then, the optimization problem (2) is expressed as a minimization problem in terms of the coefficients of $(\pi_{j},T_{2,j},\ldots,T_{k,j})$ in this basis under orthogonality constraints. It is worth mentioning that a similar problem is explicitly solved in [11], leading to an optimal tangent space estimation procedure in the case $k=3$ .
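For $k=2$ the criterion (2) only involves the projector, and its minimizer can be computed by local PCA. The following is a minimal NumPy sketch under that assumption; the function names `local_differences`, `local_criterion_k2` and `local_pca_projector` are illustrative and do not come from the paper or from [9]. It does not address the constrained problem for $k\geq 3$ .

```python
import numpy as np


def local_differences(X, j, h):
    """Return the differences X_i - X_j (i != j) that fall in the ball B(0, h)."""
    diffs = np.delete(X, j, axis=0) - X[j]
    return diffs[np.linalg.norm(diffs, axis=1) <= h]


def local_criterion_k2(X, j, P, h):
    """Empirical criterion for k = 2: mean over i != j of ||x - P x||^2 1_{B(0,h)}(x)."""
    local = local_differences(X, j, h)
    residuals = local - local @ P.T
    return float(np.sum(residuals ** 2) / (len(X) - 1))


def local_pca_projector(X, j, d, h):
    """Orthogonal projector onto the top-d eigenspace of the local second-moment matrix."""
    local = local_differences(X, j, h)
    second_moment = local.T @ local
    _, eigvecs = np.linalg.eigh(second_moment)  # eigenvalues in ascending order
    basis = eigvecs[:, -d:]                     # D x d matrix with orthonormal columns
    return basis @ basis.T
```

Minimizing the sum of $\|x-\pi(x)\|^{2}$ over rank- $d$ orthogonal projectors amounts to maximizing the trace of $\pi$ against the uncentered second-moment matrix of the local differences, which is why the top- $d$ eigenvectors are used here. The columns of `basis` can also serve as the orthonormal basis required by the first stage of a two-stage procedure.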

The constraint $\|T_{i}\|_{op}\leq t$ involves a parameter $t$ to be calibrated. As will be shown in the following section, it is enough to choose $t$ roughly smaller than $1/h$ , but still larger than the unknown norm of the optimal tensors $\|T_{i}^{*}\|_{op}$ . Hence, for $h\rightarrow 0$ , the choice $t=h^{-1}$ works to guarantee optimal convergence rates. Such a constraint on the higher order tensors might have been stated under the form of a $\|.\|_{op}$ -penalized least squares minimization — as in ridge regression — leading to the same results.

3.1 Tangent Spaces

By definition, the tangent space $T_{Y_{j}}M$ is the best linear approximation of $M$ near $Y_{j}$ . Thus, it is natural to take the range of the first order term minimizing (2) and write $\hat{T}_{j}=\operatorname{im}\hat{\pi}_{j}$ . The $\hat{T}_{j}$ ’s approximate simultaneously the $T_{Y_{j}}M$ ’s with high probability, as stated below.

Theorem 2.

Assume that $t\geq C_{k,d,\tau_{min},\mathbf{L}}\geq\sup_{2\leq i\leq k}\|T^{*}_{i}\|_{op}$ . Set $h=\left(C_{d,k}\frac{f_{max}^{2}\log n}{f_{min}^{3}(n-1)}\right)^{1/d}$ , for $C_{d,k}$ large enough, and assume that $\sigma\leq h/4$ . If $n$ is large enough so that $h\leq h_{0}$ , then with probability at least $1-\left(\frac{1}{n}\right)^{k/d}$ ,

[TABLE]

As a consequence, taking $t=h^{-1}$ , for $n$ large enough,

[TABLE]

where $C=C_{d,k,\tau_{min},\mathbf{L},f_{min},f_{max}}$ .

The proof of Theorem 2 is given in Section 5.1.2. The same bound holds for the estimation of $T_{y}M$ at a prescribed $y\in M$ in the model $\mathcal{P}^{k}_{(y)}(\sigma)$ . For that, simply take $P_{n}^{(y)}=1/n\sum_{i}\delta_{(X_{i}-y)}$ as integration in (2).
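As an illustration of how the tuning quantities of Theorem 2 fit together, here is a small Python sketch. The constant $C_{d,k}$ is only known to be "large enough", so the default `C=1.0` below is a placeholder and not a recommended value; the helper names are hypothetical. The scale $h_{0}=(\tau_{min}\wedge L_{\perp}^{-1})/8$ is the one set in Lemma 3.

```python
import math


def bandwidth(n, d, f_min, f_max, C=1.0):
    """h = (C * f_max^2 * log n / (f_min^3 * (n - 1)))^(1/d), as in Theorem 2.

    C stands for the unspecified constant C_{d,k}; its value here is a placeholder.
    """
    if n < 2:
        raise ValueError("need n >= 2")
    return (C * f_max ** 2 * math.log(n) / (f_min ** 3 * (n - 1))) ** (1.0 / d)


def tensor_norm_bound(h):
    """The calibration t = 1/h discussed in the text."""
    return 1.0 / h


def scale_conditions_hold(h, sigma, tau_min, L_perp):
    """Check h <= h_0 = min(tau_min, 1/L_perp) / 8 and sigma <= h / 4."""
    h0 = min(tau_min, 1.0 / L_perp) / 8.0
    return h <= h0 and sigma <= h / 4.0
```

In practice $f_{min}$ , $f_{max}$ , $\tau_{min}$ and $L_{\perp}$ are unknown, so such a check is only meaningful in simulations where the model parameters are controlled.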

In the noise-free setting, or when $\sigma\leq h^{k}$ , this result is in line with those of [9] in terms of the sample size dependency $(1/n)^{(k-1)/d}$ . Besides, it shows that the convergence rate of our estimator does not depend on the ambient dimension $D$ , even in codimension greater than $2$ . When $k=2$ , we recover the same rate as [1], where we used local PCA, which is a reformulation of (2). When $k\geq 3$ , the procedure (2) outperforms PCA-based estimators of [31] and [33], where convergence rates of the form $(1/n)^{\beta}$ with $1/d<\beta<2/d$ are obtained. This bound also recovers the result of [11] in the case $k=3$ , where a similar procedure is used. When the noise level $\sigma$ is of order $h^{\alpha}$ , with $1\leq\alpha\leq k$ , Theorem 2 yields a convergence rate in $h^{\alpha-1}$ . Since a polynomial decomposition up to order $k_{\alpha}=\lceil\alpha\rceil$ in (2) results in the same bound, the noise level $\sigma=h^{\alpha}$ may be thought of as an $\alpha$ -regularity threshold. Finally, it may be worth mentioning that the results of Theorem 2 also hold when the assumption $\mathbb{E}(Z|Y)=0$ is relaxed. Theorem 2 nearly matches the following lower bound.

Theorem 3.

If $\tau_{min}{L_{\perp}},\ldots,\tau_{min}^{k-1}{L_{k}},({\tau_{min}^{d}}f_{min})^{-1}$ and ${\tau_{min}^{d}}{f_{max}}$ are large enough (depending only on $d$ and $k$ ), then

[TABLE]

where the infimum is taken over all the estimators $\hat{T}=\hat{T}(X_{1},\ldots,X_{n})$ .

A proof of Theorem 3 can be found in Section 5.2.2. When $\sigma\lesssim(1/n)^{k/d}$ , the lower bound matches Theorem 2 in the noise-free case, up to a $\log n$ factor. Thus, the rate $(1/n)^{(k-1)/d}$ is optimal for tangent space estimation on the model $\mathcal{P}^{k}$ . The rate $(\log n/n)^{1/d}$ obtained in [1] for $k=2$ is therefore optimal, as well as the rate $(\log n/n)^{2/d}$ given in [11] for $k=3$ . The rate $(1/n)^{(k-1)/d}$ naturally appears on the model $\mathcal{P}^{k}$ , as the estimation rate of differential objects of order $1$ from $k$ -smooth submanifolds.

When $\sigma\asymp(1/n)^{\alpha/d}$ with $\alpha<k$ , the lower bound provided by Theorem 3 is of order $(1/n)^{(k-1)(\alpha+d)/[d(d+k)]}$ , hence smaller than the $(1/n)^{(\alpha-1)/d}$ rate that Theorem 2 yields at noise level $h^{\alpha}$ (up to $\log n$ factors). This suggests that the local polynomial estimator (2) is suboptimal whenever $\sigma\gg(1/n)^{k/d}$ on the model $\mathcal{P}^{k}(\sigma)$ .

Here again, the same lower bound holds for the estimation of $T_{y}M$ at a fixed point $y$ in the model $\mathcal{P}^{k}_{(y)}(\sigma)$ .

3.2 Curvature

The second fundamental form $II_{Y_{j}}^{M}:T_{Y_{j}}M\times T_{Y_{j}}M\rightarrow T_{Y_{j}}M^{\perp}\subset\mathbb{R}^{D}$ is a symmetric bilinear map that encodes completely the curvature of $M$ at $Y_{j}$ [13, Chap. 6, Proposition 3.1]. Estimating it only from a point cloud $\mathbb{X}_{n}$ does not trivially make sense, since $II_{Y_{j}}^{M}$ has domain $T_{Y_{j}}M$ which is unknown. To bypass this issue we extend $II_{Y_{j}}^{M}$ to $\mathbb{R}^{D}$ . That is, we consider the estimation of $II_{Y_{j}}^{M}\circ\pi_{T_{Y_{j}}M}$ which has full domain $\mathbb{R}^{D}$ . Following the same ideas as in the previous Section 3.1, we use the second order tensor $\hat{T}_{2,j}\circ\hat{\pi}_{j}$ obtained in (2) to estimate $II_{Y_{j}}^{M}\circ\pi_{T_{Y_{j}}M}$ .

Theorem 4.

Let $k\geq 3$ . Take $h$ as in Theorem 2, $\sigma\leq h/4$ , and $t=1/h$ . If $n$ is large enough so that $h\leq h_{0}$ and $h^{-1}\geq C^{-1}_{k,d,\tau_{min},\mathbf{L}}\geq(\sup_{2\leq i\leq k}\|T^{*}_{i}\|_{op})^{-1}$ , then with probability at least $1-\left(\frac{1}{n}\right)^{k/d}$ ,

[TABLE]

In particular, for $n$ large enough,

[TABLE]

The proof of Theorem 4 is given in Section 5.1.3. As in Theorem 2, the case $\sigma\leq h^{k}$ may be thought of as a noise-free setting, and provides an upper bound of the form $h^{k-2}$ . Interestingly, Theorems 2 and 4 are enough to provide estimators of various notions of curvature. For instance, consider the scalar curvature [13, Section 4.4] at a point $Y_{j}$ , defined by

[TABLE]

where $(e_{r})_{1\leq r\leq d}$ is an orthonormal basis of $T_{Y_{j}}M$ . A plugin estimator of $Sc_{Y_{j}}^{M}$ is

[TABLE]

where $(\hat{e}_{r})_{1\leq r\leq d}$ is an orthonormal basis of $\hat{T}_{j}$ . Theorems 2 and 4 yield

[TABLE]

where $C=C_{d,k,\tau_{min},\mathbf{L},f_{min},f_{max}}$ .
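To make the plug-in idea concrete, the sketch below computes a normalized scalar curvature from a symmetric bilinear map and an orthonormal basis. It assumes the usual Gauss-equation expression $\frac{1}{d(d-1)}\sum_{r\neq s}\bigl[\langle II(e_{r},e_{r}),II(e_{s},e_{s})\rangle-\|II(e_{r},e_{s})\|^{2}\bigr]$ ; both this normalization and the function name `scalar_curvature` are assumptions made for illustration. For the plug-in estimator one would pass the estimated second order tensor and an orthonormal basis of the estimated tangent space.

```python
def scalar_curvature(second_form, basis):
    """Normalized scalar curvature from a symmetric bilinear map and an orthonormal basis.

    second_form(u, v) must return a vector of R^D as a sequence of floats.
    The 1/(d(d-1)) normalization is an assumption of this sketch.
    """
    d = len(basis)
    if d < 2:
        raise ValueError("scalar curvature needs d >= 2")

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    total = 0.0
    for r in range(d):
        for s in range(d):
            if r == s:
                continue
            diagonal = dot(second_form(basis[r], basis[r]),
                           second_form(basis[s], basis[s]))
            cross = second_form(basis[r], basis[s])
            total += diagonal - dot(cross, cross)
    return total / (d * (d - 1))
```

As a sanity check under these assumptions, a sphere of radius $R$ in $\mathbb{R}^{3}$ has second fundamental form $II(u,v)=-\langle u,v\rangle\,\nu/R$ with $\nu$ the unit normal, and the function returns $1/R^{2}$ .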

The (near-)optimality of the bound stated in Theorem 4 is assessed by the following lower bound.

Theorem 5.

If $\tau_{min}{L_{\perp}},\ldots,\tau_{min}^{k-1}{L_{k}},({\tau_{min}^{d}}f_{min})^{-1}$ and ${\tau_{min}^{d}}{f_{max}}$ are large enough (depending only on $d$ and $k$ ), then

[TABLE]

where the infimum is taken over all the estimators $\widehat{II}=\widehat{II}(X_{1},\ldots,X_{n})$ .

The proof of Theorem 5 is given in Section 5.2.2. The same remarks as in Section 3.1 hold. If the estimation problem consists in approximating $II_{y}^{M}$ at a fixed point $y$ known to belong to $M$ beforehand, we obtain the same rates. The ambient dimension $D$ still plays no role. The shift $k-2$ in the rate of convergence on a $\mathcal{C}^{k}$ -model can be interpreted as the order of differentiation of the object of interest, that is $2$ for curvature.

Notice that the lower bound (Theorem 5) does not require $k\geq 3$ . Hence, we get that for $k=2$ , curvature cannot be estimated uniformly consistently on the $\mathcal{C}^{2}$ -model $\mathcal{P}^{2}$ . This seems natural, since the estimation of a second order quantity should require an additional degree of smoothness.

3.3 Support Estimation

For each $1\leq j\leq n$ , the minimization (2) outputs a series of tensors $(\hat{\pi}_{j},\hat{T}_{2,j},\ldots,\hat{T}_{k-1,j})$ . This collection of multidimensional monomials can be further exploited as follows. By construction, they fit $M$ at scale $h$ around $Y_{j}$ , so that

[TABLE]

is a good candidate for an approximate parametrization in a neighborhood of $Y_{j}$ . We do not know the domain $T_{Y_{j}}M$ of the initial parametrization, though we have at hand an approximation $\hat{T}_{j}=\operatorname{im}\hat{\pi}_{j}$ which was proved to be consistent in Section 3.1. As a consequence, we let the support estimator based on local polynomials $\hat{M}$ be

[TABLE]

The set $\hat{M}$ has no reason to be globally smooth, since it consists of a mere union of polynomial patches (Figure 4). However, $\hat{M}$ is provably close to $M$ for the Hausdorff distance.
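For readers who want to evaluate this loss numerically, here is a minimal Python sketch of the Hausdorff distance between two finite point sets. It assumes that $M$ and $\hat{M}$ have been replaced by finite discretizations, which is an approximation and not part of the paper's analysis; the function names are illustrative.

```python
import math


def directed_distance(A, B):
    """Largest distance from a point of A to the finite set B."""
    return max(min(math.dist(a, b) for b in B) for a in A)


def hausdorff_distance(A, B):
    """Hausdorff distance between two non-empty finite point sets."""
    return max(directed_distance(A, B), directed_distance(B, A))
```

The two directed terms correspond to the two halves of the argument: points of the estimator being close to the manifold, and the manifold being covered by the estimator.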

Theorem 6.

With the same assumptions as Theorem 4, with probability at least $1-2\left(\frac{1}{n}\right)^{\frac{k}{d}}$ , we have

[TABLE]

In particular, for $n$ large enough,

[TABLE]

A proof of Theorem 6 is given in Section 5.1.4. As in Theorem 2, for a noise level of order $h^{\alpha}$ , $\alpha\geq 1$ , Theorem 6 yields a convergence rate of order $(\log n/n)^{(k\wedge\alpha)/d}$ . Thus the noise level $\sigma$ may also be thought of as a regularity threshold. Unlike in [21, Theorem 2], the case $h/4<\sigma<\tau_{min}$ is not in the scope of Theorem 6. Moreover, for $1\leq\alpha<2d/(d+2)$ , [21, Theorem 2] provides a better convergence rate of $(\log n/n)^{2/(d+2)}$ . Note however that Theorem 6 is also valid whenever the assumption $\mathbb{E}(Z|Y)=0$ is relaxed. In this non-centered noise framework, Theorem 6 outperforms [26, Theorem 7] in the case $d\geq 3$ , $k=2$ , and $\sigma\leq h^{2}$ .

In the noise-free case or when $\sigma\leq h^{k}$ , for $k=2$ , we recover the rate $(\log n/n)^{2/d}$ obtained in [1, 20, 25] and improve the rate $(\log n/n)^{2/(d+2)}$ in [21, 26]. However, our estimator $\hat{M}$ is an unstructured union of $d$ -dimensional balls in $\mathbb{R}^{D}$ . Consequently, $\hat{M}$ does not recover the topology of $M$ as the estimator of [1] does.

When $k\geq 3$ , $\hat{M}$ outperforms reconstruction procedures based on a somewhat piecewise linear interpolation [1, 20, 26], and achieves the faster rate $(\log n/n)^{k/d}$ for the Hausdorff loss. This seems quite natural, since our procedure fits higher order terms. This is done at the price of a probably worse dependency on the dimension $d$ than in [1, 20]. Theorem 6 is now proved to be (almost) minimax optimal.

Theorem 7.

If $\tau_{min}{L_{\perp}},\ldots,\tau_{min}^{k-1}{L_{k}},({\tau_{min}^{d}}f_{min})^{-1}$ and ${\tau_{min}^{d}}{f_{max}}$ are large enough (depending only on $d$ and $k$ ), then for $n$ large enough,

[TABLE]

where the infimum is taken over all the estimators $\hat{M}=\hat{M}(X_{1},\ldots,X_{n})$ .

Theorem 7, whose proof is given in Section 5.2.1, is obtained from Le Cam’s Lemma (Theorem 8). Let us note that it is likely for the extra $\log n$ term appearing in Theorem 6 to actually be present in the minimax rate. Roughly, it is due to the fact that the Hausdorff distance $d_{H}$ is similar to an $L^{\infty}$ loss. The $\log n$ term may be obtained in Theorem 7 with the same combinatorial analysis as in [25] for $k=2$ .

As for the estimation of tangent spaces and curvature, Theorem 7 matches the upper bound in Theorem 6 in the noise-free case $\sigma\lesssim(1/n)^{k/d}$ . Moreover, for $\sigma<\tau_{min}$ , it also generalizes Theorem 1 in [21] to higher orders of regularity ( $k\geq 3$ ). Again, for $\sigma\gg(1/n)^{k/d}$ , the upper bound in Theorem 6 is larger than the lower bound stated in Theorem 7. However, our estimator $\hat{M}$ achieves the same convergence rate if the assumption $\mathbb{E}(Z|Y)=0$ is dropped.

4 Conclusion, Prospects

In this article, we derived non-asymptotic bounds for inference of geometric objects associated with smooth submanifolds $M\subset\mathbb{R}^{D}$ . We focused on tangent spaces, second fundamental forms, and the submanifold itself. We introduced new regularity classes $\mathcal{C}^{k}_{\tau_{min},\mathbf{L}}$ for submanifolds that extend the case $k=2$ . For each object of interest, the proposed estimator relies on local polynomials that can be computed through a least squares minimization. Minimax lower bounds were presented, matching the upper bounds up to $\log n$ factors in the regime of small noise.

The implementation of (2) needs to be investigated. The non-convexity of the criterion comes from the fact that we minimize over the space of orthogonal projectors, which is non-convex. However, that space is well understood, and it seems possible to implement gradient descent on it [34]. Another way to improve our procedure could be to fit orthogonal polynomials instead of monomials. Such a modification may also lead to improved dependency on the dimension $d$ and the regularity $k$ in the bounds for both tangent space and support estimation.

Though the stated lower bounds are valid for quite general tubular noise levels $\sigma$ , it seems that our estimators based on local polynomials are suboptimal whenever $\sigma$ is larger than the expected precision for $\mathcal{C}^{k}$ models in a $d$ -dimensional space (roughly $(1/n)^{k/d}$ ). In such a setting, it is likely that a preliminary centering procedure is needed, such as the one presented in [21]. Other pre-processings of the data might adapt our estimators to other types of noise. For instance, whenever outliers are allowed in the model $\mathcal{C}^{2}$ , [1] proposes an iterative denoising procedure based on tangent space estimation. It exploits the fact that tangent space estimation allows one to remove part of the outliers, and that removing outliers enhances tangent space estimation. An interesting question would be to study how this method can apply with local polynomials.

Another open question is that of exact topology recovery with fast rates for $k\geq 3$ . Indeed, $\hat{M}$ converges at rate $(\log n/n)^{k/d}$ but is unstructured. It would be nice to glue the patches of $\hat{M}$ together, for example using interpolation techniques, following the ideas of [18].

5 Proofs

5.1 Upper bounds

5.1.1 Preliminary results on polynomial expansions

To prove Theorems 2, 4 and 6, the following lemmas are needed. First, we relate the existence of the parametrizations $\Psi_{p}$ mentioned in Definition 1 to a local polynomial decomposition.

Lemma 2.

For any $M\in\mathcal{C}^{k}_{\tau_{min},\mathbf{L}}$ and $y\in M$ , the following holds.

(i)

For all $v_{1},v_{2}\in\mathcal{B}_{T_{y}M}\left(0,\frac{1}{4L_{\perp}}\right)$ ,

[TABLE]

(ii)

For all $h\leq\frac{1}{4L_{\perp}}\wedge\frac{2\tau_{min}}{5}$ ,

[TABLE]

(iii)

For all $h\leq\frac{\tau_{min}}{2}$ ,

[TABLE]

(iv)

Denoting by $\pi^{\ast}=\pi_{T_{y}M}$ the orthogonal projection onto $T_{y}M$ , for all $y\in M$ , there exist multilinear maps $T_{2}^{\ast},\ldots,T_{k-1}^{\ast}$ from $T_{y}M$ to $\mathbb{R}^{D}$ , and $R_{k}$ such that for all $y^{\prime}\in\mathcal{B}\left(y,\frac{\tau_{min}\wedge L_{\perp}^{-1}}{4}\right)\cap M$ ,

[TABLE]

with

[TABLE]

where $L^{\prime}_{i}$ depends on $d,k,\tau_{min},L_{\perp},\ldots,L_{i}$ , and $C$ on $d$ , $k$ , $\tau_{min}$ , $L_{\perp}$ , $\ldots$ , $L_{k}$ . Moreover, for $k\geq 3$ , $T_{2}^{\ast}=II^{M}_{y}$ .

(v)

For all $y\in M$ , $\left\|II^{M}_{y}\right\|_{op}\leq 1/\tau_{min}$ . In particular, the sectional curvatures of $M$ satisfy

[TABLE]

The proof of Lemma 2 can be found in Section A.2. A direct consequence of Lemma 2 is the following Lemma 3.

Lemma 3.

Set $h_{0}=(\tau_{min}\wedge L_{\perp}^{-1})/8$ and $h\leq h_{0}$ . Let $M\in\mathcal{C}^{k}_{\tau_{min},\mathbf{L}}$ , $x_{0}=y_{0}+z_{0}$ , with $y_{0}\in M$ and $\|z_{0}\|\leq\sigma\leq h/4$ . Denote by $\pi^{*}$ the orthogonal projection onto $T_{y_{0}}M$ , and by $T_{2}^{*},\ldots,T_{k-1}^{*}$ the multilinear maps given by Lemma 2 (iv).

Then, for any $x=y+z$ such that $y\in M$ , $\|z\|\leq\sigma\leq h/4$ and $x\in\mathcal{B}(x_{0},h)$ , for any orthogonal projection $\pi$ and multilinear maps $T_{2},\ldots,T_{k-1}$ , we have

[TABLE]

where $T^{\prime}_{j}$ are $j$ -linear maps, and $\|R_{k}(x-x_{0})\|\leq C\left(\sigma\vee h^{k}\right)(1+th)$ , with $t=\max_{j=2,\ldots,k-1}\|T_{j}\|_{op}$ and $C$ depending on $d$ , $k$ , $\tau_{min}$ , $L_{\perp}$ , $\ldots$ , $L_{k}$ . Moreover, we have

[TABLE]

and, if $\pi=\pi^{*}$ and $T_{i}=T_{i}^{*}$ , for $i=2,\ldots,k-1$ , then $T^{\prime}_{j}=0$ , for $j=1,\ldots,k$ .

Lemma 3 roughly states that, if $\pi$ , $T_{j}$ , $j\geq 2$ are designed to locally approximate $x=y+z$ around $x_{0}=y_{0}+z_{0}$ , then the approximation error may be expressed as a polynomial expansion in $\pi^{*}(y-y_{0})$ .

Proof of Lemma 3.

For brevity, assume that $y_{0}=0$ . In what follows $C$ will denote a constant depending on $d$ , $k$ , $\tau_{min}$ , $L_{\perp}$ , $\ldots$ , $L_{k}$ . We may write

[TABLE]

with $\|R^{\prime}_{k}(x-x_{0})\|\leq C\sigma(1+th)$ . Since $\sigma\leq h/4$ , $y\in\mathcal{B}(0,3h/2)$ , with $h\leq h_{0}$ . Hence Lemma 2 entails

[TABLE]

with $\|R^{\prime\prime}_{k}(y)\|\leq Ch^{k}$ . We deduce that

[TABLE]

with $\|R^{\prime\prime\prime}_{k}(y)\|\leq Cth^{k+1}$ , since only tensors of order greater than $2$ are involved in $R^{\prime\prime\prime}_{k}$ . Since $T_{2}^{*}=II^{M}_{0}$ , $\pi^{*}\circ T_{2}^{*}=0$ , hence the result. ∎

Finally, we need a result relating deviation in terms of polynomial norm and $L^{2}(P_{0,n-1}^{(j)})$ norm, where $P_{0}\in\mathcal{P}^{k}$ , for polynomials taking arguments in $\pi^{*,(j)}(y)$ . For clarity’s sake, the bounds are given for $j=1$ , and we denote $P_{0,n-1}^{(1)}$ by $P_{0,n-1}$ . Without loss of generality, we can assume that $Y_{1}=0$ .

Let $\mathbb{R}^{k}[y_{1:d}]$ denote the set of real-valued polynomial functions in $d$ variables with degree less than $k$ . For $S\in\mathbb{R}^{k}[y_{1:d}]$ , we denote by $\|S\|_{2}$ the Euclidean norm of its coefficients, and by $S_{h}$ the polynomial defined by $S_{h}(y_{1:d})=S(hy_{1:d})$ . With a slight abuse of notation, $S(\pi^{*}(y))$ will denote $S(e_{1}^{*}(\pi^{*}(y)),\ldots,e_{d}^{*}(\pi^{*}(y)))$ , where $e_{1}^{*},\ldots,e_{d}^{*}$ form an orthonormal coordinate system of $T_{0}M$ .
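The notation $S_{h}$ and $\|S\|_{2}$ can be made concrete with a short Python sketch. It assumes a polynomial is stored as a dictionary mapping exponent tuples to coefficients, which is an illustrative choice rather than anything prescribed by the paper.

```python
import math


def rescale(poly, h):
    """Coefficients of S_h(y) = S(h y): a monomial of total degree m is scaled by h^m."""
    return {alpha: c * h ** sum(alpha) for alpha, c in poly.items()}


def coefficient_norm(poly):
    """Euclidean norm of the coefficient vector, written ||S||_2 in the text."""
    return math.sqrt(sum(c * c for c in poly.values()))


def evaluate(poly, y):
    """Evaluate the polynomial at the point y = (y_1, ..., y_d)."""
    return sum(c * math.prod(yi ** a for yi, a in zip(y, alpha))
               for alpha, c in poly.items())
```

For instance, with $S(y_{1},y_{2})=1+2y_{1}+3y_{1}y_{2}$ and $h=1/2$ , the rescaled polynomial has coefficients $1$ , $1$ and $3/4$ , and `evaluate(rescale(S, 0.5), y)` agrees with `evaluate(S, (0.5 * y[0], 0.5 * y[1]))`.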

Proposition 2.

Set $h=\left(K\frac{\log n}{n-1}\right)^{\frac{1}{d}}$ . There exist constants $\kappa_{k,d}$ , $c_{k,d}$ and $C_{d}$ such that, if $K\geq(\kappa_{k,d}f_{max}^{2}/f_{min}^{3})$ and $n$ is large enough so that $h\leq h_{0}\leq\tau_{min}/8$ , then with probability at least $1-\left(\frac{1}{n}\right)^{\frac{k}{d}+1}$ , we have

[TABLE]

for every $S\in\mathbb{R}^{k}[y_{1:d}]$ , where $N(3h/2)=\sum_{j=2}^{n}\mathbbm{1}_{\mathcal{B}(0,3h/2)}(Y_{j})$ .

The proof of Proposition 2 is deferred to Section B.2.

5.1.2 Upper Bound for Tangent Space Estimation

Proof of Theorem 2.

We recall that for every $j=1,\ldots,n$ , $X_{j}=Y_{j}+Z_{j}$ , where $Y_{j}\in M$ is drawn from $P_{0}$ and $\|Z_{j}\|\leq\sigma\leq h/4$ , where $h\leq h_{0}$ as defined in Lemma 3. Without loss of generality we consider the case $j=1$ , $Y_{1}=0$ . From now on we assume that the probability event defined in Proposition 2 occurs, and denote by $\mathcal{R}_{n-1}(\pi,T_{2},\ldots,T_{k-1})$ the empirical criterion defined by (2). Note that $X_{j}\in\mathcal{B}(X_{1},h)$ entails $Y_{j}\in\mathcal{B}(0,3h/2)$ . Moreover, since for $t\geq\max_{i=2,\ldots,k-1}\|T^{*}_{i}\|_{op}$ , $\mathcal{R}_{n-1}(\hat{\pi},\hat{T}_{2},\ldots,\hat{T}_{k-1})\leq\mathcal{R}_{n-1}(\pi^{*},T^{*}_{2},\ldots,T_{k-1}^{*})$ , we deduce that

[TABLE]

according to Lemma 3. On the other hand, note that if $Y_{j}\in\mathcal{B}(0,h/2)$ , then $X_{j}\in\mathcal{B}(X_{1},h)$ . Lemma 3 then yields

[TABLE]

Using Proposition 2, we can decompose the right-hand side as

[TABLE]

where for any tensor $T$ , $T^{(r)}$ denotes the $r$ -th coordinate of $T$ and is considered as a real-valued $r$ -order polynomial. Then, applying Proposition 2 to each coordinate leads to

[TABLE]

It follows that, for $1\leq j\leq k$ ,

[TABLE]

Noting that, according to [22, Section 2.6.2],

[TABLE]

we deduce that

[TABLE]

Theorem 2 then follows from a straightforward union bound. ∎

5.1.3 Upper Bound for Curvature Estimation

Proof of Theorem 4.

Without loss of generality, the derivation is conducted in the same framework as in the previous Section 5.1.2. In accordance with assumptions of Theorem 4, we assume that $\max_{2\leq i\leq k}\|T_{i}^{*}\|_{op}\leq t\leq 1/h$ . Since, according to Lemma 3,

[TABLE]

we deduce that

[TABLE]

Using (3) with $j=1,2$ and $th\leq 1$ leads to

[TABLE]

Finally, Lemma 2 states that $II_{Y_{1}}^{M}=T_{2}^{*}$ . Theorem 4 follows from a union bound. ∎

5.1.4 Upper Bound for Manifold Estimation

Proof of Theorem 6.

Recall that we take $X_{i}=Y_{i}+Z_{i}$ , where $Y_{i}$ has distribution $P_{0}$ and $\|Z_{i}\|\leq\sigma\leq h/4$ . We also assume that the probability events of Proposition 2 occur simultaneously at each $Y_{i}$ , so that (3) holds for all $i$ , with probability larger than $1-(1/n)^{k/d}$ . Without loss of generality set $Y_{1}=0$ . Let $v\in\mathcal{B}_{\hat{T}_{1}M}(0,7h/8)$ be fixed. Notice that $\pi^{*}(v)\in\mathcal{B}_{T_{0}M}(0,7h/8)$ . Hence, according to Lemma 2, there exists $y\in\mathcal{B}(0,h)\cap M$ such that $\pi^{*}(v)=\pi^{*}(y)$ . According to (3), we may write

[TABLE]

where, since $\|\hat{T}_{j}\|_{op}\leq 1/h$ , $\|R_{k}(v)\|\leq C_{k,d,\tau_{min},\mathbf{L}}\sqrt{f_{max}/f_{min}}(h^{k}\vee\sigma)$ . Using (3) again leads to

[TABLE]

where $\|R^{\prime}(\pi^{*}(y))\|\leq C_{k,d,\tau_{min},\mathbf{L}}\sqrt{f_{max}/f_{min}}(h^{k}\vee\sigma)$ . According to Lemma 2, we deduce that $\|\widehat{\Psi}(v)-y\|\leq C_{k,d,\tau_{min},\mathbf{L}}\sqrt{f_{max}/f_{min}}(h^{k}\vee\sigma)$ , hence

[TABLE]

Now we focus on $\sup_{y\in M}d(y,\hat{M})$ . For this, we need a lemma ensuring that $\mathbb{Y}_{n}=\{Y_{1},\ldots,Y_{n}\}$ covers $M$ with high probability.

Lemma 4.

Let $h=\left(\frac{C^{\prime}_{d}k}{f_{min}}\frac{\log n}{n}\right)^{1/d}$ with $C^{\prime}_{d}$ large enough. Then for $n$ large enough so that $h\leq\tau_{min}/4$ , with probability at least $1-\left(\frac{1}{n}\right)^{k/d}$ ,

[TABLE]

The proof of Lemma 4 is given in Section B.1. Now we choose $h$ satisfying the conditions of Proposition 2 and Lemma 4. Let $y$ be in $M$ and assume that $\left\|y-Y_{j_{0}}\right\|\leq h/2$ . Then $y\in\mathcal{B}(X_{j_{0}},3h/4)$ . According to Lemma 3 and (3), we deduce that $\|\widehat{\Psi}_{j_{0}}(\hat{\pi}_{j_{0}}(y-X_{j_{0}}))-y\|\leq C_{k,d,\tau_{min},\mathbf{L}}\sqrt{f_{max}/f_{min}}(h^{k}\vee\sigma)$ . Hence, from Lemma 4,

[TABLE]

with probability at least $1-2\left(\frac{1}{n}\right)^{k/d}$ . Combining (4) and (5) gives Theorem 6. ∎

5.2 Minimax Lower Bounds

This section is devoted to describing the main ideas of the proofs of the minimax lower bounds. We prove Theorem 7 on one side, and Theorem 3 and Theorem 5 in a unified way on the other side. The methods used rely on hypothesis comparison [36].

5.2.1 Lower Bound for Manifold Estimation

We recall that for two distributions $Q$ and $Q^{\prime}$ defined on the same space, the $L^{1}$ test affinity $\left\|Q\wedge Q^{\prime}\right\|_{1}$ is given by

[TABLE]

where $dQ$ and $dQ^{\prime}$ denote densities of $Q$ and $Q^{\prime}$ with respect to any dominating measure.
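For distributions on a finite set the test affinity reduces to a sum. The sketch below assumes the standard definition $\left\|Q\wedge Q^{\prime}\right\|_{1}=\int dQ\wedge dQ^{\prime}$ and takes the counting measure as dominating measure; the function names are illustrative.

```python
def l1_affinity(p, q):
    """L1 test affinity sum_x min(p(x), q(x)) of two probability vectors on the same finite set."""
    if len(p) != len(q):
        raise ValueError("p and q must be defined on the same finite set")
    return sum(min(a, b) for a, b in zip(p, q))


def total_variation(p, q):
    """Total variation distance (1/2) * sum_x |p(x) - q(x)|."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))
```

For probability vectors the two quantities satisfy `l1_affinity(p, q) == 1 - total_variation(p, q)` up to rounding, so an affinity close to $1$ means that the two hypotheses are hard to tell apart.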

The first technique we use, involving only two hypotheses, is usually referred to as Le Cam’s Lemma [36]. Let $\mathcal{P}$ be a model and $\theta(P)$ be the parameter of interest. Assume that $\theta(P)$ belongs to a pseudo-metric space $(\mathcal{D},d)$ , that is $d(\cdot,\cdot)$ is symmetric and satisfies the triangle inequality. Le Cam’s Lemma can be adapted to our framework as follows.

Theorem 8 (Le Cam’s Lemma [36]).

For all pairs $P,P^{\prime}$ in $\mathcal{P}$ ,

[TABLE]

where the infimum is taken over all the estimators $\hat{\theta}=\hat{\theta}(X_{1},\ldots,X_{n})$ .

In this section, we are interested in $\mathcal{P}=\mathcal{P}^{k}(\sigma)$ and $\theta(P)=M$ , with $d=d_{H}$ . In order to derive Theorem 7, we build two different pairs $(P_{0},P_{1})$ , $(P_{0}^{\sigma},P_{1}^{\sigma})$ of hypotheses in the model $\mathcal{P}^{k}(\sigma)$ . Each pair will exploit a different property of the model $\mathcal{P}^{k}(\sigma)$ .

The first pair $(P_{0},P_{1})$ of hypotheses (Lemma 5) is built in the model $\mathcal{P}^{k}\subset\mathcal{P}^{k}(\sigma)$ , and exploits the geometric difficulty of manifold reconstruction, even if no noise is present. These hypotheses, depicted in Figure 5, consist of bumped versions of one another.

Lemma 5.

Under the assumptions of Theorem 7, there exist $P_{0},P_{1}\in\mathcal{P}^{k}$ with associated submanifolds $M_{0},M_{1}$ such that

[TABLE]

The proof of Lemma 5 is to be found in Section C.4.1.

The second pair $(P_{0}^{\sigma},P_{1}^{\sigma})$ of hypotheses (Lemma 6) has a construction similar to that of $(P_{0},P_{1})$ . Roughly speaking, they are the uniform distributions on the offsets of radii $\sigma/2$ of $M_{0}$ and $M_{1}$ of Figure 5. Here, the hypotheses are built in $\mathcal{P}^{k}(\sigma)$ , and fully exploit the statistical difficulty of manifold reconstruction induced by noise.

Lemma 6.

Under the assumptions of Theorem 7, there exist $P_{0}^{\sigma},P_{1}^{\sigma}\in\mathcal{P}^{k}(\sigma)$ with associated submanifolds $M_{0}^{\sigma},M_{1}^{\sigma}$ such that

[TABLE]

The proof of Lemma 6 is to be found in Section C.4.2. We are now in position to prove Theorem 7.

Proof of Theorem 7.

Let us apply Theorem 8 with $\mathcal{P}=\mathcal{P}^{k}(\sigma)$ , $\theta(P)=M$ and $d=d_{H}$ . Taking $P=P_{0}$ and $P^{\prime}=P_{1}$ of Lemma 5, these distributions both belong to $\mathcal{P}^{k}\subset\mathcal{P}^{k}(\sigma)$ , so that Theorem 8 yields

[TABLE]

Similarly, setting hypotheses $P=P_{0}^{\sigma}$ and $P^{\prime}=P_{1}^{\sigma}$ of Lemma 6 yields

[TABLE]

which concludes the proof. ∎

5.2.2 Lower Bounds for Tangent Space and Curvature Estimation

Let us now move to the proofs of Theorems 3 and 5, which consist of lower bounds for the estimation of $T_{X_{1}}M$ and $II_{X_{1}}^{M}$ with random base point $X_{1}$ . In both cases, the loss can be cast as

[TABLE]

where $\hat{\theta}=\hat{\theta}(X,X^{\prime})$ , with $X=X_{1}$ driving the parameter of interest, and $X^{\prime}=(X_{2},\ldots,X_{n})=X_{2:n}$ . Since $\|.\|_{L^{1}(P)}$ obviously depends on $P$ , the technique presented in the previous section does not apply anymore. However, a slight adaptation of Assouad’s Lemma [36] with an extra conditioning on $X=X_{1}$ serves our purpose. Let us now detail a general framework where the method applies.

We let $\mathcal{X},\mathcal{X}^{\prime}$ denote measured spaces. For a probability distribution $Q$ on $\mathcal{X}\times\mathcal{X}^{\prime}$ , we let $(X,X^{\prime})$ be a random variable with distribution $Q$ . The marginals of $Q$ on $\mathcal{X}$ and $\mathcal{X}^{\prime}$ are denoted by $\mu$ and $\nu$ respectively. Let $(\mathcal{D},d)$ be a pseudo-metric space. For $Q\in\mathcal{Q}$ , we let $\theta_{\cdot}(Q):\mathcal{X}\rightarrow\mathcal{D}$ be defined $\mu$ -almost surely, where $\mu$ is the marginal distribution of $Q$ on $\mathcal{X}$ . The parameter of interest is $\theta_{X}(Q)$ , and the associated minimax risk over $\mathcal{Q}$ is

[TABLE]

where the infimum is taken over all the estimators $\hat{\theta}:\mathcal{X}\times\mathcal{X}^{\prime}\rightarrow\mathcal{D}$ .

Given a set of probability distributions $\mathcal{Q}$ on $\mathcal{X}\times\mathcal{X}^{\prime}$ , write $\overline{Conv}(\mathcal{Q})$ for the set of mixture probability distributions with components in $\mathcal{Q}$ . For all $\tau=(\tau_{1},\ldots,\tau_{m})\in\left\{0,1\right\}^{m}$ , $\tau^{k}$ denotes the $m$ -tuple that differs from $\tau$ only at the $k$ th position. We are now in position to state the conditional version of Assouad’s Lemma that allows us to lower bound the minimax risk (6).

Lemma 7 (Conditional Assouad).

Let $m\geq 1$ be an integer and let $\left\{\mathcal{Q}_{\tau}\right\}_{\tau\in\{0,1\}^{m}}$ be a family of $2^{m}$ submodels $\mathcal{Q}_{\tau}\subset\mathcal{Q}$ . Let $\left\{U_{k}\times U^{\prime}_{k}\right\}_{1\leq k\leq m}$ be a family of pairwise disjoint subsets of $\mathcal{X}\times\mathcal{X}^{\prime}$ , and $\mathcal{D}_{\tau,k}$ be subsets of $\mathcal{D}$ . Assume that for all $\tau\in\left\{0,1\right\}^{m}$ and $1\leq k\leq m$ ,

• for all $Q_{\tau}\in\mathcal{Q}_{\tau}$ , $\theta_{X}(Q_{\tau})\in\mathcal{D}_{\tau,k}$ on the event $\left\{X\in U_{k}\right\}$ ;

• for all $\theta\in\mathcal{D}_{\tau,k}$ and $\theta^{\prime}\in\mathcal{D}_{\tau^{k},k}$ , $d(\theta,\theta^{\prime})\geq\Delta$ .

For all $\tau\in\left\{0,1\right\}^{m}$ , let $\overline{Q}_{\tau}\in\overline{Conv}(\mathcal{Q}_{\tau})$ , and write $\bar{\mu}_{\tau}$ and $\bar{\nu}_{\tau}$ for the marginal distributions of $\overline{Q}_{\tau}$ on $\mathcal{X}$ and $\mathcal{X}^{\prime}$ respectively. Assume that if $(X,X^{\prime})$ has distribution $\overline{Q}_{\tau}$ , $X$ and $X^{\prime}$ are independent conditionally on the event $\left\{(X,X^{\prime})\in U_{k}\times U^{\prime}_{k}\right\}$ , and that

[TABLE]

Then,

[TABLE]

where the infimum is taken over all the estimators $\hat{\theta}:\mathcal{X}\times\mathcal{X}^{\prime}\rightarrow\mathcal{D}$ .

Note that for a model of the form $\mathcal{Q}=\left\{\delta_{x_{0}}\otimes P,P\in\mathcal{P}\right\}$ with fixed $x_{0}\in\mathcal{X}$ , one recovers the classical Assouad’s Lemma [36] taking $U_{k}=\mathcal{X}$ and $U^{\prime}_{k}=\mathcal{X}^{\prime}$ . Indeed, when $X=x$ is deterministic, the parameter of interest $\theta_{X}(Q)=\theta(Q)$ can be seen as non-random.
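The index set $\{0,1\}^{m}$ and the flip $\tau\mapsto\tau^{k}$ used in Lemma 7 are simple to enumerate. The following Python sketch is only meant to fix the notation; the function names are hypothetical and positions are $1$ -indexed as in the text.

```python
from itertools import product


def hypercube(m):
    """All 2^m tuples tau in {0, 1}^m."""
    return list(product((0, 1), repeat=m))


def flip(tau, k):
    """Return tau^k, the tuple that differs from tau only at position k (1-indexed)."""
    if not 1 <= k <= len(tau):
        raise ValueError("k must lie between 1 and m")
    return tau[:k - 1] + (1 - tau[k - 1],) + tau[k:]


def hamming(tau, sigma):
    """Number of positions at which two tuples differ."""
    return sum(a != b for a, b in zip(tau, sigma))
```

Each pair $(\tau,\tau^{k})$ is at Hamming distance one, which is the structure the separation condition $d(\theta,\theta^{\prime})\geq\Delta$ is imposed on.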

In this section, we are interested in $\mathcal{Q}=\mathcal{P}^{k}(\sigma)^{\otimes n}$ , and $\theta_{X}(Q)=\theta_{X_{1}}(Q)$ being alternatively $T_{X_{1}}M$ and $II_{X_{1}}^{M}$ . Similarly to Section 5.2.1, we build two different families of submodels, each of which exploits a different kind of difficulty for tangent space and curvature estimation.

The first family, described in Lemma 8, highlights the geometric difficulty of the estimation problems, even when the noise level $\sigma$ is small, or even zero. Let us emphasize that the estimation error is integrated with respect to the distribution of $X_{1}$ . Hence, considering mixture hypotheses is natural, since building manifolds with different tangent spaces (or curvature) necessarily leads to distributions that are locally singular. Here, as in Section 5.2.1, the considered hypotheses are composed of bumped manifolds (see Figure 7). We defer the proof of Lemma 8 to Section C.3.1.

Lemma 8.

Assume that the conditions of Theorem 3 or 5 hold. Given $i\in\{1,2\}$ , there exists a family of $2^{m}$ submodels $\bigl{\{}\mathcal{P}^{(i)}_{\tau}\bigr{\}}_{\tau\in\{0,1\}^{m}}\subset\mathcal{P}^{k}$ , together with pairwise disjoint subsets $\{U_{k}\times U_{k}^{\prime}\}_{1\leq k\leq m}$ of $\mathbb{R}^{D}\times\bigl{(}\mathbb{R}^{D}\bigr{)}^{n-1}$ such that the following holds for all $\tau\in\{0,1\}^{m}$ and $1\leq k\leq m$ .

For any distribution $P^{(i)}_{\tau}\in\mathcal{P}^{(i)}_{\tau}$ with support $M^{(i)}_{\tau}=Supp\bigl{(}P^{(i)}_{\tau}\bigr{)}$ , if $\left(X_{1},\ldots,X_{n}\right)$ has distribution $\bigl{(}P^{(i)}_{\tau}\bigr{)}^{\otimes n}$ , then on the event $\left\{X_{1}\in U_{k}\right\}$ , we have:

•

if $\tau_{k}=0$ ,

[TABLE]

•

if $\tau_{k}=1$ ,

–

for $i=1$ : $\displaystyle\angle\left(T_{X_{1}}M^{(1)}_{\tau},\mathbb{R}^{d}\times\left\{0\right\}^{D-d}\right)\geq c_{k,d,\tau_{min}}\left(\frac{1}{n-1}\right)^{\frac{k-1}{d}}$ ,

–

for $i=2$ : $\displaystyle\left\|II_{X_{1}}^{M^{(2)}_{\tau}}\circ\pi_{T_{X_{1}}{M^{(2)}_{\tau}}}\right\|_{op}\geq c_{k,d,\tau_{min}}\left(\frac{1}{n-1}\right)^{\frac{k-2}{d}}.$

Furthermore, there exists $\bar{Q}^{(i)}_{\tau,n}\in\overline{Conv}\bigl{(}\bigl{(}\mathcal{P}^{(i)}_{\tau}\bigr{)}^{\otimes n}\bigr{)}$ such that if $\left(Z_{1},\ldots,Z_{n}\right)=\left(Z_{1},Z_{2:n}\right)$ has distribution $\bar{Q}^{(i)}_{\tau,n}$ , $Z_{1}$ and $Z_{2:n}$ are independent conditionally on the event $\left\{\left(Z_{1},Z_{2:n}\right)\in U_{k}\times U^{\prime}_{k}\right\}$ . The marginal distributions of $\bar{Q}^{(i)}_{\tau,n}$ on $\mathbb{R}^{D}\times\bigl{(}\mathbb{R}^{D}\bigr{)}^{n-1}$ are $\bar{Q}^{(i)}_{\tau,1}$ and $\bar{Q}^{(i)}_{\tau,n-1}$ , and we have

[TABLE]

The second family, described in Lemma 9, illustrates the statistical difficulty of the estimation problem when the noise level $\sigma$ is large enough. The construction is very similar to that of Lemma 8 (see Figure 7). In this case, however, the magnitude of the noise drives the statistical difficulty, as opposed to the sampling scale in Lemma 8. Note that considering mixture distributions is no longer necessary, since sufficiently large noise makes the distributions of the bumps absolutely continuous with respect to each other. The proof of Lemma 9 can be found in Section C.3.2.

Lemma 9.

Assume that the conditions of Theorem 3 or 5 hold, and that $\sigma\geq C_{k,d,\tau_{min}}\left(1/(n-1)\right)^{k/d}$ for $C_{k,d,\tau_{min}}>0$ large enough. Given $i\in\{1,2\}$ , there exists a collection of $2^{m}$ distributions $\bigl{\{}\mathbf{P}_{\tau}^{(i),\sigma}\bigr{\}}_{\tau\in\{0,1\}^{m}}\subset\mathcal{P}^{k}(\sigma)$ with associated submanifolds $\bigl{\{}M_{\tau}^{(i),\sigma}\bigr{\}}_{\tau\in\{0,1\}^{m}}$ , together with pairwise disjoint subsets $\{U^{\sigma}_{k}\}_{1\leq k\leq m}$ of $\mathbb{R}^{D}$ such that the following holds for all $\tau\in\{0,1\}^{m}$ and $1\leq k\leq m$ .

If $x\in U_{k}^{\sigma}$ and $y=\pi_{M_{\tau}^{(i),\sigma}}(x)$ , we have

•

if $\tau_{k}=0$ ,

[TABLE]

•

if $\tau_{k}=1$ ,

–

for $i=1$ : $\displaystyle\angle\left(T_{y}M_{\tau}^{(1),\sigma},\mathbb{R}^{d}\times\left\{0\right\}^{D-d}\right)\geq c_{k,d,\tau_{min}}\left(\frac{\sigma}{n-1}\right)^{\frac{k-1}{k+d}}$ ,

–

for $i=2$ : $\displaystyle\left\|II_{y}^{M_{\tau}^{(2),\sigma}}\circ\pi_{T_{y}M_{\tau}^{(2),\sigma}}\right\|_{op}\geq c^{\prime}_{k,d,\tau_{min}}\left(\frac{\sigma}{n-1}\right)^{\frac{k-2}{k+d}}$ .

Furthermore,

[TABLE]

Proof of Theorem 3.

Let us apply Lemma C.11 with $\mathcal{X}=\mathbb{R}^{D}$ , $\mathcal{X}^{\prime}=\bigl{(}\mathbb{R}^{D}\bigr{)}^{n-1}$ , $\mathcal{Q}=\bigl{(}\mathcal{P}^{k}(\sigma)\bigr{)}^{\otimes n}$ , $X=X_{1}$ , $X^{\prime}=(X_{2},\ldots,X_{n})=X_{2:n}$ , $\theta_{X}(Q)=T_{X}M$ , and the angle between linear subspaces as the distance $d$ .

If $\sigma<C_{k,d,\tau_{min}}\left(1/(n-1)\right)^{k/d}$ , for $C_{k,d,\tau_{min}}>0$ defined in Lemma 9, then, applying Lemma C.11 to the family $\bigl{\{}\bar{Q}^{(1)}_{\tau,n}\bigr{\}}_{\tau}$ together with the disjoint sets $U_{k}\times U_{k}^{\prime}$ of Lemma 8, we get

[TABLE]

where the second line uses that $\sigma<C_{k,d,\tau_{min}}\left(1/(n-1)\right)^{k/d}$ .

If $\sigma\geq C_{k,d,\tau_{min}}\left(1/(n-1)\right)^{k/d}$ , then Lemma 9 holds, and considering the family $\bigl{\{}\bigl{(}\mathbf{P}_{\tau}^{(1),\sigma}\bigr{)}^{\otimes n}\bigr{\}}_{\tau}$ , together with the disjoint sets $U_{k}^{\sigma}\times\bigl{(}\mathbb{R}^{D}\bigr{)}^{n-1}$ , Lemma C.11 gives

[TABLE]

hence the result.

∎
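The two regimes in the proof of Theorem 3 can be compared numerically. The following is a minimal sketch, assuming only the exponents stated in Lemmas 8 and 9; the function names and the choice $C=1$ for the threshold constant are illustrative and not taken from the paper. With $C=1$, the geometric rate and the noise-driven rate coincide exactly at the threshold noise level, which is consistent with the case split on $\sigma$.

```python
def geometric_rate(n, k, d, order=1):
    """Noise-free rate (1/(n-1))^((k-order)/d) from Lemma 8.

    order=1 corresponds to tangent spaces, order=2 to curvature.
    """
    return (1.0 / (n - 1)) ** ((k - order) / d)


def noise_rate(n, k, d, sigma, order=1):
    """Noise-driven rate (sigma/(n-1))^((k-order)/(k+d)) from Lemma 9."""
    return (sigma / (n - 1)) ** ((k - order) / (k + d))


def threshold_sigma(n, k, d, C=1.0):
    """Noise level C*(1/(n-1))^(k/d) separating the two regimes.

    C=1 is an illustrative assumption; the paper only requires C large enough.
    """
    return C * (1.0 / (n - 1)) ** (k / d)


n, k, d = 1001, 3, 2
sigma_star = threshold_sigma(n, k, d)
# At sigma = sigma_star the two rates agree; above it the noise rate dominates.
print(geometric_rate(n, k, d), noise_rate(n, k, d, sigma_star))
print(noise_rate(n, k, d, 10 * sigma_star))
```

Since the noise-driven rate is increasing in $\sigma$, it exceeds the geometric rate for every noise level above the threshold in this sketch.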

Proof of Theorem 5.

The proof follows the same lines as that of Theorem 3 just above. Namely, consider the same setting with $\theta_{X}(Q)=II_{\pi_{M}(X)}^{M}$ . If $\sigma<C_{k,d,\tau_{min}}\left(1/(n-1)\right)^{k/d}$ , apply Lemma C.11 to the family $\bigl{\{}\bar{Q}^{(2)}_{\tau,n}\bigr{\}}_{\tau}$ of Lemma 8. If $\sigma\geq C_{k,d,\tau_{min}}\left(1/(n-1)\right)^{k/d}$ , Lemma C.11 can be applied to the family $\bigl{\{}\bigl{(}\mathbf{P}_{\tau}^{(2),\sigma}\bigr{)}^{\otimes n}\bigr{\}}_{\tau}$ of Lemma 9. This yields the announced rate. ∎

Acknowledgements

We would like to thank Frédéric Chazal and Pascal Massart for their constant encouragement, suggestions and stimulating discussions. We also thank the anonymous reviewers for valuable comments and suggestions.

Appendix A: Properties and Stability of the Models

A.1 Property of the Exponential Map in $\mathcal{C}^{2}_{\tau_{min}}$

Here we prove Lemma 1, reproduced below as Lemma A.1.

Lemma A.1.

If $M\in\mathcal{C}^{2}_{\tau_{min}}$ , $\exp_{p}:\mathcal{B}_{T_{p}M}\left(0,\tau_{min}/4\right)\rightarrow M$ is one-to-one. Moreover, it can be written as

[TABLE]

with $\mathbf{N}_{p}$ such that for all $v\in\mathcal{B}_{T_{p}M}\left(0,\tau_{min}/4\right)$ ,

[TABLE]

where $L_{\perp}=5/(4\tau_{min})$ . Furthermore, for all $p,y\in M$ ,

[TABLE]

where $\left\|R_{2}(y-p)\right\|\leq\frac{\left\|y-p\right\|^{2}}{2\tau_{min}}$ .

Proof of Lemma A.1.

Proposition 6.1 in [28] states that for all $x\in M$ , $\left\|II^{M}_{x}\right\|_{op}\leq 1/\tau_{min}$ . In particular, Gauss equation ([13, Proposition 3.1 (a), p.135]) yields that the sectional curvatures of $M$ satisfy $-2/\tau_{min}^{2}\leq\kappa\leq 1/\tau_{min}^{2}$ . Using Corollary 1.4 of [3], we get that the injectivity radius of $M$ is at least $\pi\tau_{min}\geq\tau_{min}/4$ . Therefore, $\exp_{p}:\mathcal{B}_{T_{p}M}(0,\tau_{min}/4)\rightarrow M$ is one-to-one.

Let us write $\mathbf{N}_{p}(v)=\exp_{p}(v)-p-v$ . We clearly have $\mathbf{N}_{p}(0)=0$ and $d_{0}\mathbf{N}_{p}=0$ . Let now $v\in\mathcal{B}_{T_{p}M}(0,\tau_{min}/4)$ be fixed. We have $d_{v}\mathbf{N}_{p}=d_{v}\exp_{p}-Id_{T_{p}M}$ . For $0\leq t\leq\left\|v\right\|$ , we write $\gamma(t)=\exp_{p}(tv/\left\|v\right\|)$ for the arc-length parametrized geodesic from $p$ to $\exp_{p}(v)$ , and $P_{t}$ for the parallel translation along $\gamma$ . From Lemma 18 of [14],

[TABLE]

We now derive an upper bound for $\left\|P_{t}-Id_{T_{p}M}\right\|_{op}$ . For this, fix two unit vectors $u\in\mathbb{R}^{D}$ and $w\in T_{p}M$ , and write $g(t)=\langle P_{t}(w)-w,u\rangle$ . Letting $\bar{\nabla}$ denote the ambient derivative in $\mathbb{R}^{D}$ , by definition of parallel translation,

[TABLE]

Since $g(0)=0$ , we get $\left\|P_{t}-Id_{T_{p}M}\right\|_{op}\leq t/\tau_{min}$ . Finally, the triangle inequality leads to

[TABLE]

We conclude with the property of the projection $\pi^{\ast}=\pi_{T_{p}M}$ . Indeed, defining $R_{2}(y-p)=(y-p)-\pi^{\ast}(y-p)$ , Lemma 4.7 in [16] gives

[TABLE]

∎
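The remainder bound $\left\|R_{2}(y-p)\right\|\leq\left\|y-p\right\|^{2}/(2\tau_{min})$ can be checked by hand on a circle, whose reach equals its radius. The sketch below is an illustration only, with hypothetical function names: it takes the circle of radius $r$ centered at the origin, the base point $p=(r,0)$ (so that $T_{p}M$ is the vertical direction), and compares the normal component of $y-p$ with the quadratic bound.

```python
import math


def tangent_remainder_norm(r, theta):
    """||R_2(y-p)|| for p=(r,0) and y=(r cos(theta), r sin(theta)).

    T_pM is the vertical line, so R_2(y-p) is the horizontal component of y-p.
    """
    horizontal = r * (math.cos(theta) - 1.0)
    return abs(horizontal)


def quadratic_bound(r, theta):
    """||y-p||^2 / (2*tau) with tau = r, the reach of the circle."""
    dx = r * (math.cos(theta) - 1.0)
    dy = r * math.sin(theta)
    return (dx * dx + dy * dy) / (2.0 * r)


r = 2.0
for theta in (0.1, 0.5, 1.0, 3.0):
    # For a circle both quantities equal r*(1 - cos(theta)).
    print(theta, tangent_remainder_norm(r, theta), quadratic_bound(r, theta))
```

On the circle the two quantities coincide, since $\left\|y-p\right\|^{2}=2r^{2}(1-\cos\theta)$ , so the constant $1/(2\tau_{min})$ cannot be improved in this example.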

A.2 Geometric Properties of the Models $\mathcal{C}^{k}$

Lemma A.2.

For any $M\in\mathcal{C}^{k}_{\tau_{min},\mathbf{L}}$ and $x\in M$ , the following holds.

(i)

For all $v_{1},v_{2}\in\mathcal{B}_{T_{x}M}\left(0,\frac{1}{4L_{\perp}}\right)$ ,

[TABLE]

(ii)

For all $h\leq\frac{1}{4L_{\perp}}\wedge\frac{2\tau_{min}}{5}$ ,

[TABLE]

(iii)

For all $h\leq\frac{\tau_{min}}{2}$ ,

[TABLE]

(iv)

Denoting by $\pi^{\ast}=\pi_{T_{x}M}$ the orthogonal projection onto $T_{x}M$ , for all $x\in M$ , there exist multilinear maps $T_{2}^{\ast},\ldots,T_{k-1}^{\ast}$ from $T_{x}M$ to $\mathbb{R}^{D}$ , and $R_{k}$ such that for all $y\in\mathcal{B}\left(x,\frac{\tau_{min}\wedge L_{\perp}^{-1}}{4}\right)\cap M$ ,

[TABLE]

with

[TABLE]

where $L^{\prime}_{i}$ depends on $d,k,\tau_{min},L_{\perp},\ldots,L_{i}$ , and $C$ on $d$ , $k$ , $\tau_{min}$ , $L_{\perp}$ , $\ldots$ , $L_{k}$ . Moreover, for $k\geq 3$ , $T_{2}^{\ast}=II^{M}_{x}$ .

(v)

For all $x\in M$ , $\left\|II^{M}_{x}\right\|_{op}\leq 1/\tau_{min}$ . In particular, the sectional curvatures of $M$ satisfy

[TABLE]

Proof of Lemma A.2.

(i)

Simply notice that from the reverse triangle inequality,

[TABLE]

(ii)

The right-hand side inclusion follows straightforwardly from (i). Let us focus on the left-hand side inclusion. For this, consider the map defined by $G=\pi_{T_{x}M}\circ\Psi_{x}$ on the domain $\mathcal{B}_{T_{x}M}\left(0,h\right)$ . For all $v\in\mathcal{B}_{T_{x}M}\left(0,h\right)$ , we have

[TABLE]

Hence, $G$ is a diffeomorphism onto its image and it satisfies $\left\|G(v)\right\|\geq{3\left\|v\right\|}/{4}$ . It follows that

[TABLE]

Now, according to Lemma A.1, for all $y\in\mathcal{B}\left(x,\frac{3h}{5}\right)\cap M$ ,

[TABLE]

from which we deduce $\pi_{T_{x}M}\left(\mathcal{B}\left(x,\frac{3h}{5}\right)\cap M\right)\subset\mathcal{B}_{T_{x}M}\left(0,\frac{3h}{4}\right)$ . As a consequence,

[TABLE]

which yields the announced inclusion since $\pi_{T_{x}M}$ is one to one on $\mathcal{B}\left(x,\frac{5h}{4}\right)\cap M$ from Lemma 3 in [4], and

[TABLE]

(iii)

Straightforward application of Lemma 3 in [4].

(iv)

Notice that Lemma A.1 gives the existence of such an expansion for $k=2$ . Hence, we can assume $k\geq 3$ . Taking $h=\frac{\tau_{min}\wedge L_{\perp}^{-1}}{4}$ , we showed in the proof of (ii) that the map $G$ is a diffeomorphism onto its image, with $\left\|d_{v}G-Id_{T_{x}M}\right\|_{op}\leq\frac{1}{4}<1$ . Additionally, the chain rule yields $\left\|d^{i}_{v}G\right\|_{op}\leq\left\|d_{v}^{i}\Psi_{x}\right\|_{op}\leq L_{i}$ for all $2\leq i\leq k$ . Therefore, from Lemma A.3, the differentials of $G^{-1}$ up to order $k$ are uniformly bounded. As a consequence, we get the announced expansion writing

[TABLE]

and using the Taylor expansions of order $k$ of $\Psi_{x}$ and $G^{-1}$ .

Let us now check that $T^{\ast}_{2}=II^{M}_{x}$ . Since, by construction, $T_{2}^{\ast}$ is the second order term of the Taylor expansion of $\Psi_{x}\circ G^{-1}$ at zero, a straightforward computation yields

[TABLE]

Let $v\in T_{x}M$ be fixed. Letting $\gamma(t)=\Psi_{x}(tv)$ for $|t|$ small enough, it is clear that $\gamma^{\prime\prime}(0)=d^{2}_{0}\Psi_{x}(v^{\otimes 2})$ . Moreover, by definition of the second fundamental form [13, Proposition 2.1, p.127], since $\gamma(0)=x$ and $\gamma^{\prime}(0)=v$ , we have

[TABLE]

Hence

[TABLE]

which concludes the proof.

(v)

The first statement is a rephrasing of Proposition 6.1 in [28]. It yields the bound on sectional curvature, using the Gauss equation [13, Proposition 3.1 (a), p.135].

∎

In the proof of Lemma A.2 (iv), we used a technical lemma of differential calculus that we now prove. It states quantitatively that if $G$ is $\mathcal{C}^{k}$ -close to the identity map, then it is a diffeomorphism onto its image and the differentials of its inverse $G^{-1}$ are controlled.

Lemma A.3.

Let $k\geq 2$ and $U$ be an open subset of $\mathbb{R}^{d}$ . Let $G:U\rightarrow\mathbb{R}^{d}$ be $\mathcal{C}^{k}$ . Assume that $\left\|I_{d}-dG\right\|_{op}\leq\varepsilon<1$ , and that for all $2\leq i\leq k$ , $\left\|d^{i}G\right\|_{op}\leq L_{i}$ for some $L_{i}>0$ . Then $G$ is a $\mathcal{C}^{k}$ -diffeomorphism onto its image, and for all $2\leq i\leq k$ ,

[TABLE]

Proof of Lemma A.3.

For all $x\in U$ , $\left\|d_{x}G-I_{d}\right\|_{op}<1$ , so $G$ is one to one, and for all $y=G(x)\in G(U)$ ,

[TABLE]

For $2\leq i\leq k$ and $1\leq j\leq i$ , write $\Pi_{i}^{(j)}$ for the set of partitions of $\left\{1,\ldots,i\right\}$ with $j$ blocks. Differentiating $i$ times the identity $G\circ G^{-1}=Id_{G(U)}$ , Faà di Bruno’s formula yields that, for all $y=G(x)\in G(U)$ and all unit vectors $h_{1},\ldots,h_{i}\in\mathbb{R}^{d}$ ,

[TABLE]

Isolating the term for $j=1$ entails

[TABLE]

Using the first order Lipschitz bound on $G^{-1}$ , we get

[TABLE]

The result follows by induction on $i$ . ∎
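In dimension one, the mechanism of Lemma A.3 can be made fully explicit. The sketch below is an illustration under stated assumptions, not the lemma's general bound: it takes the hypothetical map $G(x)=x+\varepsilon\sin(x)$ , for which $|1-G^{\prime}|\leq\varepsilon$ and $|G^{\prime\prime}|\leq L_{2}=\varepsilon$ , and uses the one-dimensional inverse function rule $(G^{-1})^{\prime}=1/G^{\prime}$ and $(G^{-1})^{\prime\prime}=-G^{\prime\prime}/(G^{\prime})^{3}$ . The resulting quantities are compared with $1/(1-\varepsilon)$ and $L_{2}/(1-\varepsilon)^{3}$ , which is what the first two steps of the induction give in this special case.

```python
import math


def G_prime(x, eps):
    """Derivative of G(x) = x + eps*sin(x)."""
    return 1.0 + eps * math.cos(x)


def G_second(x, eps):
    """Second derivative of G(x) = x + eps*sin(x)."""
    return -eps * math.sin(x)


def inverse_derivatives(x, eps):
    """First and second derivatives of G^{-1} at y = G(x) (1D inverse function rule)."""
    g1 = G_prime(x, eps)
    g2 = G_second(x, eps)
    return 1.0 / g1, -g2 / g1 ** 3


eps = 0.25
L2 = eps  # sup of |G''| for this particular G
grid = [i * 0.01 for i in range(-700, 701)]
worst_first = max(abs(inverse_derivatives(x, eps)[0]) for x in grid)
worst_second = max(abs(inverse_derivatives(x, eps)[1]) for x in grid)
print(worst_first, 1.0 / (1.0 - eps))
print(worst_second, L2 / (1.0 - eps) ** 3)
```

The higher-order case follows the same pattern, with the sum over partitions in Faà di Bruno’s formula replacing the single term that appears here.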

A.3 Proof of Proposition 1

This section is devoted to proving Proposition 1 (reproduced below as Proposition A.4), which asserts the stability of the model with respect to ambient diffeomorphisms.

Proposition A.4.

Let $\Phi:\mathbb{R}^{D}\rightarrow\mathbb{R}^{D}$ be a global $\mathcal{C}^{k}$ -diffeomorphism. If $\left\|d\Phi-I_{D}\right\|_{op}$ , $\left\|d^{2}\Phi\right\|_{op}$ , …, $\left\|d^{k}\Phi\right\|_{op}$ are small enough, then for all $P$ in $\mathcal{P}^{k}_{\tau_{min},\mathbf{L},f_{min},f_{max}}$ , the pushforward distribution $P^{\prime}=\Phi_{\ast}P$ belongs to $\mathcal{P}^{k}_{\tau_{min}/2,2\mathbf{L},f_{min}/2,2f_{max}}$ .

Moreover, if $\Phi=\lambda I_{D}$ ( $\lambda>0$ ) is a homogeneous dilation, then $P^{\prime}\in\mathcal{P}^{k}_{\lambda\tau_{min},\mathbf{L}_{(\lambda)},f_{min}/\lambda^{d},f_{max}/\lambda^{d}}$ , where $\mathbf{L}_{(\lambda)}=(L_{\perp}/\lambda,L_{3}/\lambda^{2},\ldots,L_{k}/\lambda^{k-1})$ .

Proof of Proposition A.4.

The second part is straightforward since the dilation $\lambda M$ has reach $\tau_{\lambda M}=\lambda\tau_{M}$ , and can be parametrized locally by $\tilde{\Psi}_{\lambda p}(v)=\lambda\Psi_{p}(v/\lambda)=\lambda p+v+\lambda\mathbf{N}_{p}(v/\lambda)$ , yielding the differential bounds $\mathbf{L}_{(\lambda)}$ . Bounds on the density follow from homogeneity of the $d$ -dimensional Hausdorff measure.

The first part follows combining Proposition A.5 and Lemma A.6. ∎
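The effect of a dilation on the model parameters, as stated in the second part of Proposition A.4, amounts to simple bookkeeping. The sketch below is illustrative: the function name `dilate_model` and the numerical values are hypothetical, and `L` is assumed to be the tuple $(L_{\perp},L_{3},\ldots,L_{k})$ .

```python
def dilate_model(tau_min, L, f_min, f_max, lam, d):
    """Parameters of the model containing the pushforward under x -> lam * x.

    L is the tuple (L_perp, L_3, ..., L_k); its j-th entry (0-based)
    is divided by lam**(j+1), matching (L_perp/lam, L_3/lam^2, ..., L_k/lam^(k-1)).
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    L_scaled = tuple(Lj / lam ** (j + 1) for j, Lj in enumerate(L))
    return lam * tau_min, L_scaled, f_min / lam ** d, f_max / lam ** d


# Hypothetical base model: reach 1, L = (L_perp, L_3) = (2, 3), uniform density 1/4, d = 2.
print(dilate_model(1.0, (2.0, 3.0), 0.25, 0.25, lam=3.0, d=2))
```

Dilating by $\lambda>1$ increases the reach and relaxes the differential bounds, while the density bounds shrink by the factor $\lambda^{d}$ .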

Proposition A.5 asserts the stability of the geometric model, that is, the reach bound and the existence of a smooth parametrization when a submanifold is perturbed.

Proposition A.5.

Let $\Phi:\mathbb{R}^{D}\rightarrow\mathbb{R}^{D}$ be a global $\mathcal{C}^{k}$ -diffeomorphism. If $\left\|d\Phi-I_{D}\right\|_{op}$ , $\left\|d^{2}\Phi\right\|_{op}$ , …, $\left\|d^{k}\Phi\right\|_{op}$ are small enough, then for all $M$ in $\mathcal{C}^{k}_{\tau_{min},\mathbf{L}}$ , the image $M^{\prime}=\Phi\left(M\right)$ belongs to $\mathcal{C}^{k}_{\tau_{min}/2,2L_{\perp},2L_{3},\ldots,2L_{k}}$ .

Proof of Proposition A.5.

To bound $\tau_{M^{\prime}}$ from below, we use the stability of the reach with respect to $\mathcal{C}^{2}$ diffeomorphisms. Namely, from Theorem 4.19 in [16],

[TABLE]

for $\left\|I_{D}-d\Phi\right\|_{op}$ and $\left\|d^{2}\Phi\right\|_{op}$ small enough. This shows the stability for $k=2$ , as well as that of the reach assumption for $k\geq 3$ .

From now on, take $k\geq 3$ . We focus on the existence of a good parametrization of $M^{\prime}$ around a fixed point $p^{\prime}=\Phi(p)\in M^{\prime}$ . For $v^{\prime}\in T_{p^{\prime}}M^{\prime}=d_{p}\Phi\left(T_{p}M\right)$ , let us define

[TABLE]

where $\mathbf{N}^{\prime}_{p^{\prime}}(v^{\prime})=\left\{\Phi\left(\Psi_{p}\left(d_{p^{\prime}}\Phi^{-1}.v^{\prime}\right)\right)-p^{\prime}-v^{\prime}\right\}$ .

$\begin{array}{ccc}M&\xrightarrow{\ \Phi\ }&M^{\prime}\\ \big\uparrow{\scriptstyle\Psi_{p}}&&\big\uparrow{\scriptstyle\Psi^{\prime}_{p^{\prime}}}\\ T_{p}M&\xrightarrow{\ d_{p}\Phi\ }&T_{p^{\prime}}M^{\prime}\end{array}$

The maps $\Psi^{\prime}_{p^{\prime}}(v^{\prime})$ and $\mathbf{N}^{\prime}_{p^{\prime}}(v^{\prime})$ are well defined whenever $\left\|d_{p^{\prime}}\Phi^{-1}.v^{\prime}\right\|\leq\frac{1}{4L_{\perp}}$ , so in particular if $\left\|v^{\prime}\right\|\leq\frac{1}{4\left(2L_{\perp}\right)}\leq\frac{1-\left\|I_{D}-d\Phi\right\|_{op}}{4L_{\perp}}$ and $\left\|I_{D}-d\Phi\right\|_{op}\leq\frac{1}{2}$ . One easily checks that $\mathbf{N}^{\prime}_{p^{\prime}}(0)=0$ and $d_{0}\mathbf{N}^{\prime}_{p^{\prime}}=0$ . Moreover, writing $c(v^{\prime})=p+d_{p^{\prime}}\Phi^{-1}.v^{\prime}+\mathbf{N}_{p}\left(d_{p^{\prime}}\Phi^{-1}.v^{\prime}\right)$ , for every unit vector $w^{\prime}\in T_{p^{\prime}}M^{\prime}$ ,

[TABLE]

Writing further $\left\|d\Phi^{-1}\right\|_{op}\leq(1-\left\|I_{D}-d\Phi\right\|_{op})^{-1}\leq 1+2\left\|I_{D}-d\Phi\right\|_{op}$ for $\left\|I_{D}-d\Phi\right\|_{op}$ small enough depending only on $L_{\perp}$ , it is clear that the right-hand side of the latter inequality goes below $2L_{\perp}$ for $\left\|I_{D}-d\Phi\right\|_{op}$ and $\left\|d^{2}\Phi\right\|_{op}$ small enough. Hence, for $\left\|I_{D}-d\Phi\right\|_{op}$ and $\left\|d^{2}\Phi\right\|_{op}$ small enough depending only on $L_{\perp}$ , $\|{d^{2}_{v^{\prime}}\mathbf{N}^{\prime}_{p^{\prime}}}\|_{op}\leq 2L_{\perp}$ for all $\left\|v^{\prime}\right\|\leq\frac{1}{4(2L_{\perp})}$ . From the chain rule, the same argument applies to the differentials of $\mathbf{N}^{\prime}_{p^{\prime}}$ of order $3\leq i\leq k$ . ∎

Lemma A.6 deals with the condition on the density in the models $\mathcal{P}^{k}$ . It gives a change of variable formula for pushforwards of measures on submanifolds, ensuring control of the densities with respect to the intrinsic volume measure.

Lemma A.6 (Change of variable for the Hausdorff measure).

Let $P$ be a probability distribution on $M\subset\mathbb{R}^{D}$ with density $f$ with respect to the $d$ -dimensional Hausdorff measure $\mathcal{H}^{d}$ . Let $\Phi:\mathbb{R}^{D}\rightarrow\mathbb{R}^{D}$ be a global diffeomorphism such that $\left\|I_{D}-d\Phi\right\|_{\mathrm{op}}<1/3$ . Let $P^{\prime}=\Phi_{\ast}P$ be the pushforward of $P$ by $\Phi$ . Then $P^{\prime}$ has a density $g$ with respect to $\mathcal{H}^{d}$ . This density can be chosen to be, for all $z\in\Phi(M)$ ,

[TABLE]

In particular, if $f_{min}\leq f\leq f_{max}$ on $M$ , then for all $z\in\Phi(M)$ ,

[TABLE]

Proof of Lemma A.6.

Let $p\in M$ be fixed and $A\subset\mathcal{B}(p,r)\cap M$ for $r$ small enough. For a differentiable map $h:\mathbb{R}^{d}\rightarrow\mathbb{R}^{D}$ and for all $x\in\mathbb{R}^{d}$ , we let $J_{h}(x)$ denote the $d$ -dimensional Jacobian $J_{h}(x)=\sqrt{\det\left(d_{x}h^{T}\,d_{x}h\right)}$ . The area formula ([17, Theorem 3.2.5]) states that if $h$ is one-to-one,

[TABLE]

whenever $u:\mathbb{R}^{D}\rightarrow\mathbb{R}$ is Borel, where $\lambda^{d}$ is the Lebesgue measure on $\mathbb{R}^{d}$ . By definition of the pushforward, and since $dP=fd\mathcal{H}^{d}$ ,

[TABLE]

Writing $\Psi_{p}=\exp_{p}:T_{p}M\rightarrow\mathbb{R}^{D}$ for the exponential map of $M$ at $p$ , we have

[TABLE]

Rewriting the right hand term, we apply the area formula again with $h=\Phi\circ{\Psi_{p}}$ ,

[TABLE]

Since this is true for all $A\subset\mathcal{B}(p,r)\cap M$ , $P^{\prime}$ has a density $g$ with respect to $\mathcal{H}^{d}$ , with

[TABLE]

Writing $p=\Phi^{-1}(z)$ , it is clear that $\Psi_{\Phi^{-1}\left(z\right)}^{-1}\circ\Phi^{-1}\left(z\right)=\Psi_{p}^{-1}(p)=0\in T_{p}M$ . Since $d_{0}\exp_{p}:T_{p}M\rightarrow\mathbb{R}^{D}$ is the inclusion map, we get the first statement.

We now let $B$ and $\pi_{T}$ denote $d_{p}\Phi$ and $\pi_{T_{p}M}$ respectively. For any unit vector $v\in T_{p}M$ ,

[TABLE]

Therefore, $1-3\left\|I_{D}-B\right\|_{\mathrm{op}}\leq\left\|\pi_{T}B^{T}B|_{T_{p}M}\right\|_{\mathrm{op}}\leq 1+3\left\|I_{D}-B\right\|_{\mathrm{op}}$ . Hence,

[TABLE]

and

[TABLE]

which yields the result. ∎
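The control of the $d$ -dimensional Jacobian in the proof of Lemma A.6 can be checked numerically in a small case. The sketch below is an illustration under stated assumptions: it takes $D=3$ , $d=2$ , $T_{p}M$ spanned by the first two canonical vectors, and a hypothetical matrix $B$ close to the identity. It uses the Frobenius norm of $I_{D}-B$ as an upper bound $e$ on the operator norm, and compares $J=\sqrt{\det\left((B|_{T})^{T}B|_{T}\right)}$ with $(1\pm 3e)^{d/2}$ , the elementary consequence of the eigenvalue bound above. This is not necessarily the exact form of the displayed inequalities of the lemma.

```python
import math


def _dot(a, b):
    """Euclidean inner product of two vectors given as sequences."""
    return sum(x * y for x, y in zip(a, b))


def restricted_jacobian(B, basis):
    """sqrt(det(Gram)) of v -> B v restricted to span(basis), for a 2-dimensional span.

    B is a D x D matrix (list of rows); basis is a pair of orthonormal vectors of R^D.
    """
    images = [[_dot(row, v) for row in B] for v in basis]
    g11 = _dot(images[0], images[0])
    g12 = _dot(images[0], images[1])
    g22 = _dot(images[1], images[1])
    return math.sqrt(g11 * g22 - g12 * g12)


def frobenius_gap(B):
    """Frobenius norm of I - B, an upper bound on the operator norm of I - B."""
    D = len(B)
    return math.sqrt(sum(((1.0 if r == c else 0.0) - B[r][c]) ** 2
                         for r in range(D) for c in range(D)))


# Hypothetical differential d_p(Phi), close to the identity.
B = [[1.02, -0.01, 0.00],
     [0.03, 0.97, 0.02],
     [-0.02, 0.01, 1.01]]
basis = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
d = 2
e = frobenius_gap(B)
J = restricted_jacobian(B, basis)
print((1 - 3 * e) ** (d / 2), J, (1 + 3 * e) ** (d / 2))
```

Using the Frobenius norm only loosens the bounds, since it dominates the operator norm; the comparison remains valid as long as $e<1/3$ .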

Appendix B: Some Probabilistic Tools

B.1 Volume and Covering Rate

The first lemma of this section gives some details about the covering rate of a manifold with bounded reach.

Lemma B.7.

Let $P_{0}\in\mathcal{P}^{k}$ have support $M\subset\mathbb{R}^{D}$ . Then for all $r\leq\tau_{min}/4$ and $x$ in $M$ ,

[TABLE]

for some $c_{d},C_{d}>0$ , with $p_{x}(r)=P_{0}\bigl{(}\mathcal{B}(x,r)\bigr{)}$ .

Moreover, letting $h=\left(\frac{C^{\prime}_{d}k}{f_{min}}\frac{\log n}{n}\right)^{1/d}$ with $C^{\prime}_{d}$ large enough, the following holds. For $n$ large enough so that $h\leq\tau_{min}/4$ , with probability at least $1-\left(\frac{1}{n}\right)^{k/d}$ ,

[TABLE]

Proof of Lemma B.7.

Denoting by $\mathcal{B}_{M}(x,r)$ the geodesic ball of radius $r$ centered at $x$ , Proposition 25 of [1] yields

[TABLE]

Hence, the bounds on the Jacobian of the exponential map given by Proposition 27 of [1] yield

[TABLE]

for some $c_{d},C_{d}>0$ . Now, since $P_{0}$ has a density $f_{min}\leq f\leq f_{max}$ with respect to the volume measure of $M$ , we get the first result.

Now we notice that since $p_{x}(r)\geq c_{d}f_{min}r^{d}$ , Theorem 3.3 in [10] entails, for $s\leq\tau_{min}/8$ ,

[TABLE]

Hence, taking $s=h/2$ , and $h=\left(\frac{C^{\prime}_{d}k}{f_{min}}\frac{\log n}{n}\right)^{1/d}$ with $C^{\prime}_{d}$ so that $C^{\prime}_{d}\geq\frac{8^{d}}{c_{d}k}\vee\frac{2^{d}(1+k/d)}{c_{d}k}$ yields the result. Since $k\geq 1$ , taking $C^{\prime}_{d}=\frac{8^{d}}{c_{d}}$ is sufficient. ∎
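The last claim of the proof, that the $k$ -free choice $C^{\prime}_{d}=8^{d}/c_{d}$ dominates both constraints as soon as $k\geq 1$ , can be verified over a range of values. The sketch below is illustrative; the function names are hypothetical, and $c_{d}$ is set to $1$ since it appears as a common factor on both sides.

```python
def required_constant(d, k, c_d=1.0):
    """max(8^d/(c_d k), 2^d (1 + k/d)/(c_d k)), the lower bound imposed on C'_d."""
    return max(8 ** d / (c_d * k), 2 ** d * (1 + k / d) / (c_d * k))


def sufficient_constant(d, c_d=1.0):
    """The k-free choice C'_d = 8^d / c_d."""
    return 8 ** d / c_d


all_dominated = all(
    sufficient_constant(d) >= required_constant(d, k)
    for d in range(1, 11)
    for k in range(1, 51)
)
print(all_dominated)
```

The check reflects the inequality $2^{d}(1/k+1/d)\leq 2^{d+1}\leq 8^{d}$ , valid for all $d,k\geq 1$ .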

B.2 Concentration Bounds for Local Polynomials

This section is devoted to the proof of the following proposition.

Proposition B.8.

Set $h=\left(K\frac{\log n}{n-1}\right)^{\frac{1}{d}}$ . There exist constants $\kappa_{k,d}$ , $c_{k,d}$ and $C_{d}$ such that, if $K\geq(\kappa_{k,d}f_{max}^{2}/f_{min}^{3})$ and $n$ is large enough so that $3h/2\leq h_{0}\leq\tau_{min}/4$ , then with probability at least $1-\left(\frac{1}{n}\right)^{\frac{k}{d}+1}$ , we have

[TABLE]

for every $S\in\mathbb{R}^{k}[x_{1:d}]$ , where $N(h)=\sum_{j=2}^{n}\mathbbm{1}_{\mathcal{B}(0,h)}(Y_{j})$ .

A first step is to ensure that empirical expectations of order $k$ polynomials are close to their deterministic counterparts.

Proposition B.9.

Let $b\leq\tau_{min}/8$ . For any $y_{0}$ $\in$ $M$ , we have

[TABLE]

where $P_{0,n-1}$ denotes the empirical distribution of $n-1$ i.i.d. random variables $Y_{i}$ drawn from $P_{0}$ .

Proof of Proposition B.9.

Without loss of generality we choose $y_{0}=0$ and shorten notation to $\mathcal{B}(b)$ and $p(b)$ . Let $\mathcal{Z}$ denote the empirical process on the left-hand side of Proposition B.9. Denote also by $f_{u,\varepsilon}$ the map $\prod_{j=1}^{k}\left(\frac{\left\langle u_{j},y\right\rangle}{b}\right)^{\varepsilon_{j}}\mathbbm{1}_{\mathcal{B}(b)}(y)$ , and let $\mathcal{F}$ denote the set of such maps, for $u_{j}$ in $\mathcal{B}(1)$ and $\varepsilon$ in $\{0,1\}^{k}$ .

Since $\|f_{u,\varepsilon}\|_{\infty}\leq 1$ and $Pf_{u,\varepsilon}^{2}\leq p(b)$ , the Talagrand-Bousquet inequality ([8, Theorem 2.3]) yields

[TABLE]

with probability larger than $1-e^{-t}$ . It remains to bound $\mathbb{E}\mathcal{Z}$ from above.

Lemma B.10.

We may write

[TABLE]

Proof of Lemma B.10.

Let $\sigma_{i}$ and $g_{i}$ denote some independent Rademacher and Gaussian variables. For convenience, we denote by $\mathbb{E}_{A}$ the expectation with respect to the random variable $A$ . Using symmetrization inequalities we may write

[TABLE]

Now let $\mathcal{Y}_{u,\varepsilon}$ denote the Gaussian process $\sum_{i=1}^{n-1}{g_{i}\prod_{j=1}^{k}\left(\frac{\left\langle u_{j},Y_{i}\right\rangle}{b}\right)^{\varepsilon_{j}}}\mathbbm{1}_{\mathcal{B}(b)}(Y_{i})$ . Since, for any $y$ in $\mathcal{B}(b)$ , $u$ , $v$ in $\mathcal{B}(1)^{k}$ , and $\varepsilon$ , $\varepsilon^{\prime}$ in $\{0,1\}^{k}$ , we have

[TABLE]

we deduce that

[TABLE]

where $\Theta_{u,\varepsilon}=\sqrt{k}\sum_{i=1}^{n-1}\sum_{r=1}^{k}g_{i,r}\frac{\left\langle\varepsilon_{r}u_{r},Y_{i}\right\rangle}{b}\mathbbm{1}_{\mathcal{B}(b)}(Y_{i})$ . According to Slepian’s Lemma [7, Theorem 13.3], it follows that

[TABLE]

We deduce that

[TABLE]

Then we can deduce that $\mathbb{E}_{X}\mathbb{E}_{g}\sup_{u,\varepsilon}Y_{g}\leq k\sqrt{p(b)}$ . ∎

Combining Lemma B.10 with Talagrand-Bousquet’s inequality gives the result of Proposition B.9. ∎

We are now in position to prove Proposition B.8.

Proof of Proposition B.8.

If $h/2\leq\tau_{min}/4$ , then, according to Lemma B.7, $p(h/2)\geq c_{d}f_{min}h^{d}$ , hence, if $h=\left(K\frac{\log(n)}{n-1}\right)^{\frac{1}{d}}$ , $(n-1)p(h/2)\geq Kc_{d}f_{min}\log(n)$ . Choosing $b=h/2$ and $t=(k/d+1)\log(n)+\log(2)$ in Proposition B.9 and $K=K^{\prime}/f_{min}$ , with $K^{\prime}>1$ leads to

[TABLE]

On the complement of the probability event mentioned just above, for a polynomial $S=\sum_{\alpha\in[0,k]^{d},\,|\alpha|\leq k}a_{\alpha}y_{1:d}^{\alpha}$ , we have

[TABLE]

On the other hand, we may write, for all $r>0$ ,

[TABLE]

for some constant $C_{d,k}$ . It follows that

[TABLE]

according to Lemma A.2. Then we may choose $K^{\prime}=\kappa_{k,d}(f_{max}/f_{min})^{2}$ , with $\kappa_{k,d}$ large enough so that

[TABLE]

The second inequality of Proposition B.8 is derived the same way from Proposition B.9, choosing $\varepsilon=\left(0,\ldots,0\right)$ , $b=3h/2$ and $h\leq\tau_{min}/8$ so that $b\leq\tau_{min}/4$ . ∎
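To get a feeling for the sample sizes involved, one can compute the bandwidth $h=\left(K\frac{\log n}{n-1}\right)^{1/d}$ of Proposition B.8 and search for the smallest $n$ such that $3h/2\leq\tau_{min}/4$ . The sketch below is illustrative: the function names and the value of $K$ are hypothetical, and only the condition $3h/2\leq\tau_{min}/4$ implied by the proposition is checked, not the intermediate scale $h_{0}$ .

```python
import math


def bandwidth(n, K, d):
    """h = (K * log(n) / (n - 1))^(1/d), as in Proposition B.8."""
    return (K * math.log(n) / (n - 1)) ** (1.0 / d)


def smallest_admissible_n(K, d, tau_min):
    """Smallest n >= 2 with 3h/2 <= tau_min/4.

    Relies on log(n)/(n-1) being decreasing in n for n >= 2.
    """
    def admissible(n):
        return 1.5 * bandwidth(n, K, d) <= tau_min / 4.0

    hi = 2
    while not admissible(hi):  # doubling search for an admissible upper end
        hi *= 2
    lo = max(2, hi // 2)
    while lo < hi:  # bisection on the monotone condition
        mid = (lo + hi) // 2
        if admissible(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


n0 = smallest_admissible_n(K=5.0, d=2, tau_min=1.0)
print(n0, bandwidth(n0, 5.0, 2))
```

Since $K$ grows like $f_{max}^{2}/f_{min}^{3}$ , strongly non-uniform densities push this minimal sample size up accordingly.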

Appendix C: Minimax Lower Bounds

C.1 Conditional Assouad’s Lemma

This section is dedicated to the proof of Lemma 7, reproduced below as Lemma C.11.

Lemma C.11 (Conditional Assouad).

Let $m\geq 1$ be an integer and let $\left\{\mathcal{Q}_{\tau}\right\}_{\tau\in\{0,1\}^{m}}$ be a family of $2^{m}$ submodels $\mathcal{Q}_{\tau}\subset\mathcal{Q}$ . Let $\left\{U_{k}\times U^{\prime}_{k}\right\}_{1\leq k\leq m}$ be a family of pairwise disjoint subsets of $\mathcal{X}\times\mathcal{X}^{\prime}$ , and $\mathcal{D}_{\tau,k}$ be subsets of $\mathcal{D}$ . Assume that for all $\tau\in\left\{0,1\right\}^{m}$ and $1\leq k\leq m$ ,

•

for all $Q_{\tau}\in\mathcal{Q}_{\tau}$ , $\theta_{X}(Q_{\tau})\in\mathcal{D}_{\tau,k}$ on the event $\left\{X\in U_{k}\right\}$ ;

•

for all $\theta\in\mathcal{D}_{\tau,k}$ and $\theta^{\prime}\in\mathcal{D}_{\tau^{k},k}$ , $d(\theta,\theta^{\prime})\geq\Delta$ .

For all $\tau\in\left\{0,1\right\}^{m}$ , let $\overline{Q}_{\tau}\in\overline{Conv}(\mathcal{Q}_{\tau})$ , and write $\bar{\mu}_{\tau}$ and $\bar{\nu}_{\tau}$ for the marginal distributions of $\overline{Q}_{\tau}$ on $\mathcal{X}$ and $\mathcal{X}^{\prime}$ respectively. Assume that if $(X,X^{\prime})$ has distribution $\overline{Q}_{\tau}$ , $X$ and $X^{\prime}$ are independent conditionally on the event $\left\{(X,X^{\prime})\in U_{k}\times U^{\prime}_{k}\right\}$ , and that

[TABLE]

Then,

[TABLE]

where the infimum is taken over all the estimators $\hat{\theta}:\mathcal{X}\times\mathcal{X}^{\prime}\rightarrow\mathcal{D}$ .

Proof of Lemma C.11.

The proof follows that of Lemma 2 in [36]. Let $\hat{\theta}=\hat{\theta}(X,X^{\prime})$ be fixed. For any family of $2^{m}$ distributions $\left\{Q_{\tau}\right\}_{\tau}\in\left\{\mathcal{Q}_{\tau}\right\}_{\tau}$ , since the $U_{k}\times U^{\prime}_{k}$ ’s are pairwise disjoint,

[TABLE]

Since the previous inequality holds for all $Q_{\tau}\in\mathcal{Q}_{\tau}$ , it extends to $\overline{Q}_{\tau}\in\overline{Conv}(\mathcal{Q}_{\tau})$ by linearity. Let us now lower bound each of the terms of the sum for fixed $\tau\in\left\{0,1\right\}^{m}$ and $1\leq k\leq m$ . By assumption, if $(X,X^{\prime})$ has distribution $\overline{Q}_{\tau}$ , then conditionally on $\left\{(X,X^{\prime})\in U_{k}\times U^{\prime}_{k}\right\}$ , $X$ and $X^{\prime}$ are independent. Therefore,

[TABLE]

where we used that $d(\hat{\theta},\mathcal{D}_{\tau,k})+d(\hat{\theta},\mathcal{D}_{\tau^{k},k})\geq\Delta.$ The result follows by summing the above bound $\left|\left\{1,\ldots,m\right\}\times\left\{0,1\right\}^{m}\right|=m2^{m}$ times. ∎

C.2 Construction of Generic Hypotheses

Let $M_{0}^{(0)}$ be a $d$ -dimensional $\mathcal{C}^{\infty}$ -submanifold of $\mathbb{R}^{D}$ with reach greater than $1$ and such that it contains $\mathcal{B}_{\mathbb{R}^{d}\times\{0\}^{D-d}}(0,1/2).$ $M_{0}^{(0)}$ can be built for example by flattening smoothly a unit $d$ -sphere in $\mathbb{R}^{d+1}\times\{0\}^{D-d-1}$ . Since $M_{0}^{(0)}$ is $\mathcal{C}^{\infty}$ , the uniform probability distribution $P_{0}^{(0)}$ on $M_{0}^{(0)}$ belongs to $\mathcal{P}^{k}_{1,\mathbf{L}^{(0)},1/V_{0}^{(0)},1/V_{0}^{(0)}}$ , for some $\mathbf{L}^{(0)}$ and $V_{0}^{(0)}=Vol(M_{0}^{(0)})$ .

Let now $M_{0}=(2\tau_{min})M_{0}^{(0)}$ be the submanifold obtained from $M_{0}^{(0)}$ by homothecy. By construction, and from Proposition A.4, we have

[TABLE]

and the uniform probability distribution $P_{0}$ on $M_{0}$ satisfies

[TABLE]

whenever $L_{\perp}/2\geq L_{\perp}^{(0)}/(2\tau_{min})$ , $\ldots$ , $L_{k}/2\geq L_{k}^{(0)}/(2\tau_{min})^{k-1}$ , and provided that $2f_{min}\leq\bigl{(}(2\tau_{min})^{d}V_{0}^{(0)}\bigr{)}^{-1}\leq f_{max}/2$ . Note that $L_{\perp}^{(0)},$ $\ldots,L_{k}^{(0)}$ , $Vol(M_{0}^{(0)})$ depend only on $d$ and $k$ . For this reason, all the lower bounds will be valid for $\tau_{min}{L_{\perp}},\ldots,\tau_{min}^{k-1}{L_{k}},(\tau_{min}^{d}f_{min})^{-1}$ and ${\tau_{min}^{d}}{f_{max}}$ large enough to exceed the thresholds $L_{\perp}^{(0)}/2,\ldots,L_{k}^{(0)}/2^{k-1}$ , $2^{d}V_{0}^{(0)}$ and $(2^{d}V_{0}^{(0)})^{-1}$ respectively.

For $0<\delta\leq\tau_{min}/4$ , let $x_{1},\ldots,x_{m}\in M_{0}\cap\mathcal{B}(0,\tau_{min}/4)$ be a family of points such that

[TABLE]

For instance, considering the family $\left\{\bigl{(}l_{1}\delta,\ldots,l_{d}\delta,0,\ldots,0\bigr{)}\right\}_{l_{i}\in\mathbb{Z},|l_{i}|\leq\lfloor\tau_{min}/(4\delta)\rfloor}$ ,

[TABLE]

for some $c_{d}>0$ .
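The grid family above can be enumerated directly to see how the number of bump locations scales with $\delta$ . The sketch below is illustrative, with a hypothetical function name: it keeps the multi-indices $(l_{1},\ldots,l_{d})$ with $|l_{i}|\leq\lfloor\tau_{min}/(4\delta)\rfloor$ whose grid point lies in $\mathcal{B}(0,\tau_{min}/4)$ , and reports the ratio of their count to $(\tau_{min}/\delta)^{d}$ .

```python
import itertools
import math


def grid_points_in_ball(tau_min, delta, d):
    """Multi-indices idx with |idx_i| <= floor(tau_min/(4 delta)) and ||idx * delta|| <= tau_min/4."""
    radius_in_cells = tau_min / (4.0 * delta)
    bound = math.floor(radius_in_cells)
    return [
        idx
        for idx in itertools.product(range(-bound, bound + 1), repeat=d)
        if sum(c * c for c in idx) <= radius_in_cells ** 2
    ]


tau_min, d = 1.0, 2
for delta in (0.05, 0.02, 0.01):
    m = len(grid_points_in_ball(tau_min, delta, d))
    # The ratio m / (tau_min/delta)^d stabilizes as delta decreases.
    print(delta, m, m / (tau_min / delta) ** d)
```

For $d=2$ the ratio approaches $\pi/16$ , the area of a disc of radius $1/4$ , which is one admissible value of the constant in a lower bound of the form $m\geq c_{d}(\tau_{min}/\delta)^{d}$ .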

We let $e\in\mathbb{R}^{D}$ denote the $(d+1)$ th vector of the canonical basis. In particular, we have the orthogonal decomposition of the ambient space

[TABLE]

Let $\phi:\mathbb{R}^{D}\rightarrow[0,1]$ be a smooth scalar map such that $\left.\phi\right|_{\mathcal{B}\left(0,\frac{1}{2}\right)}=1\text{ and }\left.\phi\right|_{\mathcal{B}\left(0,1\right)^{c}}=0.$

Let $\Lambda_{+}>0$ and $1\geq A_{+}>A_{-}>0$ be real numbers to be chosen later. Let $\mathbf{\Lambda}=\left(\Lambda_{1},\ldots,\Lambda_{m}\right)$ with entries $-\Lambda_{+}\leq\Lambda_{k}\leq\Lambda_{+}$ , and $\mathbf{A}=\left(A_{1},\ldots,A_{m}\right)$ with entries $A_{-}\leq A_{k}\leq A_{+}$ . For $z\in\mathbb{R}^{D}$ , we write $z=(z_{1},\ldots,z_{D})$ for its coordinates in the canonical basis. For all $\tau=\left(\tau_{1},\ldots,\tau_{m}\right)\in\left\{0,1\right\}^{m}$ , define the bump map as

[TABLE]

An analogous deformation map was considered in [1]. We let $P^{\mathbf{\Lambda},\mathbf{A},(i)}_{\tau}$ denote the pushforward distribution of $P_{0}$ by $\Phi^{\mathbf{\Lambda},\mathbf{A},(i)}_{\tau}$ , and write $M^{\mathbf{\Lambda},\mathbf{A},(i)}_{\tau}$ for its support. Roughly speaking, $M^{\mathbf{\Lambda},\mathbf{A},i}_{\tau}$ consists of $m$ bumps at the $x_{k}$ ’s having different shapes (Figure 7). If $\tau_{k}=0$ , the bump at $x_{k}$ is a symmetric plateau function and has height $\Lambda_{k}$ . If $\tau_{k}=1$ , it fits the graph of the polynomial $A_{k}(x-x_{k})_{1}^{i}$ locally.

The following Lemma C.12 gives differential bounds and geometric properties of $\Phi^{\mathbf{\Lambda},\mathbf{A},i}_{\tau}$ .

Lemma C.12.

There exists $c_{\phi,i}<1$ such that if $A_{+}\leq c_{\phi,i}\delta^{i-1}$ and $\Lambda_{+}\leq c_{\phi,i}\delta$ , then $\Phi^{\mathbf{\Lambda},\mathbf{A},i}_{\tau}$ is a global $\mathcal{C}^{\infty}$ -diffeomorphism of $\mathbb{R}^{D}$ such that for all $1\leq k\leq m$ , $\Phi^{\mathbf{\Lambda},\mathbf{A},i}_{\tau}\left(\mathcal{B}(x_{k},\delta)\right)=\mathcal{B}(x_{k},\delta)$ . Moreover,

[TABLE]

and for $j\geq 2$ ,

[TABLE]

Proof of Lemma C.12.

Follows straightforwardly from the chain rule, similarly to Lemma 11 in [1]. ∎

Lemma C.13.

If $\tau_{min}{L_{\perp}},\ldots,\tau_{min}^{k-1}{L_{k}},({\tau_{min}^{d}}f_{min})^{-1}$ and ${\tau_{min}^{d}}{f_{max}}$ are large enough (depending only on $d$ and $k$ ), then provided that $\Lambda_{+}\vee A_{+}\delta^{i}\leq c_{k,d,\tau_{min}}\delta^{k}$ , for all $\tau\in\{0,1\}^{m}$ , $P^{\mathbf{\Lambda},\mathbf{A},i}_{\tau}\in\mathcal{P}^{k}_{\tau_{min},\mathbf{L},f_{min},f_{max}}$

Proof of Lemma C.13.

Follows from the stability of the model (Proposition A.4) applied to the distribution $P_{0}\in\mathcal{P}^{k}_{2\tau_{min},\mathbf{L}/2,2f_{min},f_{max}/2}$ and the map $\Phi^{\mathbf{\Lambda},\mathbf{A},i}_{\tau}$ , whose differential bounds are given by Lemma C.12. ∎
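The constraint $A_{+}\delta^{i}\leq c_{k,d,\tau_{min}}\delta^{k}$ of Lemma C.13 already suggests where the exponents of Lemma 8 come from. The sketch below is a heuristic illustration only: it assumes that $\delta$ is taken of order $(1/(n-1))^{1/d}$ (an assumption made here for illustration; the choice of $\delta$ is not stated in this part of the text), sets the constant to $1$ , and uses hypothetical function names.

```python
def max_bump_amplitude(delta, k, i, c=1.0):
    """Largest A_+ allowed by A_+ * delta**i <= c * delta**k."""
    return c * delta ** (k - i)


def amplitude_at_sampling_scale(n, k, d, i, c=1.0):
    """Amplitude obtained with delta = (1/(n-1))^(1/d), an assumed choice of delta."""
    delta = (1.0 / (n - 1)) ** (1.0 / d)
    return max_bump_amplitude(delta, k, i, c)


n, k, d = 1001, 4, 2
# i = 1 (tangent spaces) and i = 2 (curvature)
print(amplitude_at_sampling_scale(n, k, d, i=1), (1.0 / (n - 1)) ** ((k - 1) / d))
print(amplitude_at_sampling_scale(n, k, d, i=2), (1.0 / (n - 1)) ** ((k - 2) / d))
```

Under this assumption, the admissible amplitude scales as $(1/(n-1))^{(k-i)/d}$ , which matches the exponents $(k-1)/d$ and $(k-2)/d$ appearing in Lemma 8 for $i=1$ and $i=2$ .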

C.3 Hypotheses for Tangent Space and Curvature

C.3.1 Proof of Lemma 8

This section is devoted to the proof of Lemma 8, for which we first derive two slightly more general results, with parameters to be tuned later. The proof is split into two intermediate results Lemma C.14 and Lemma C.15.

Let us write $\bar{Q}^{(i)}_{\tau,n}$ for the mixture distribution on $(\mathbb{R}^{D})^{n}$ defined by

[TABLE]

Although the probability distribution $\bar{Q}^{(i)}_{\tau,n}$ depends on $A_{-},A_{+}$ and $\Lambda_{+}$ , we omit this dependency for the sake of compactness. Another way to define $\bar{Q}^{(i)}_{\tau,n}$ is the following: draw uniformly $\mathbf{\Lambda}$ in $[-\Lambda_{+},\Lambda_{+}]^{m}$ and $\mathbf{A}$ in $[A_{-},A_{+}]^{m}$ , and given $\left(\mathbf{\Lambda},\mathbf{A}\right)$ , take $Z_{i}=\Phi^{\mathbf{\Lambda},\mathbf{A},i}_{\tau}\left(Y_{i}\right)$ , where $Y_{1},\ldots,Y_{n}$ is an i.i.d. $n$ -sample with common distribution $P_{0}$ on $M_{0}$ . Then $\left(Z_{1},\ldots,Z_{n}\right)$ has distribution $\bar{Q}^{(i)}_{\tau,n}$ .
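The two-step description above can be simulated directly. The following is a minimal sketch under loudly stated assumptions, not the authors' construction: the deformation map (7) is defined in Section C.2 and is not reproduced here, so the code assumes it has the form $\Phi(x)=x+\sum_{k}\phi\bigl((x-x_{k})/\delta\bigr)\{(1-\tau_{k})\Lambda_{k}+\tau_{k}A_{k}(x-x_{k})_{1}^{i}\}e$ . It also takes $d=1$ , $D=2$ , replaces $M_{0}$ by the segment $[0,1]\times\{0\}$ with uniform $P_{0}$ , and uses a hypothetical plateau function `phi` equal to $1$ on $\mathcal{B}(0,1/2)$ and $0$ outside $\mathcal{B}(0,1)$ . All function names are illustrative.

```python
import numpy as np

def phi(r):
    """Smooth plateau (hypothetical choice): 1 for r <= 1/2, 0 for r >= 1."""
    t = np.clip(2.0 * np.asarray(r, dtype=float) - 1.0, 0.0, 1.0)
    # g(s) = exp(-1/s) for s > 0 and 0 otherwise; the floor avoids dividing by 0.
    g = lambda s: np.where(s > 0, np.exp(-1.0 / np.maximum(s, 1e-300)), 0.0)
    return g(1.0 - t) / (g(t) + g(1.0 - t))

def deform(x, centers, tau, lam, a, delta, i):
    """Assumed form of the bump map on R^2, pushing points along e = (0, 1)."""
    x = np.atleast_2d(x).astype(float)
    z = x.copy()
    for xk, tk, lk, ak in zip(centers, tau, lam, a):
        r = np.linalg.norm(x - xk, axis=1) / delta
        # tau_k = 0: plateau of height Lambda_k; tau_k = 1: polynomial A_k (x - x_k)_1^i.
        height = lk if tk == 0 else ak * (x[:, 0] - xk[0]) ** i
        z[:, 1] += phi(r) * height
    return z

def sample_mixture(n, centers, tau, lam_plus, a_minus, a_plus, delta, i, rng):
    """Draw (Lambda, A) uniformly, then push an i.i.d. sample of P_0 through the map."""
    m = len(centers)
    lam = rng.uniform(-lam_plus, lam_plus, size=m)
    a = rng.uniform(a_minus, a_plus, size=m)
    y = np.column_stack([rng.uniform(0.0, 1.0, size=n), np.zeros(n)])
    return deform(y, centers, tau, lam, a, delta, i), lam, a

rng = np.random.default_rng(0)
centers = np.array([[0.25, 0.0], [0.75, 0.0]])  # balls of radius delta are disjoint
z, lam, a = sample_mixture(1000, centers, [0, 1], 1e-3, 0.05, 0.1, 0.2, 1, rng)
```

Within distance $\delta/2$ of a center with $\tau_{k}=0$ the sample is a pure translation by $\Lambda_{k}e$ , and away from all the balls it is left untouched, which is the locality that the independence argument of Lemma C.14 relies on.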

Lemma C.14.

Assume that the conditions of Lemma C.12 hold, and let

[TABLE]

and

[TABLE]

Then the sets $U_{k}\times U^{\prime}_{k}$ are pairwise disjoint, $\bar{Q}^{(i)}_{\tau,n}\in\overline{Conv}\bigl{(}\bigl{(}\mathcal{P}^{(i)}_{\tau}\bigr{)}^{\otimes n}\bigr{)}$ , and if $\left(Z_{1},\ldots,Z_{n}\right)=\left(Z_{1},Z_{2:n}\right)$ has distribution $\bar{Q}^{(i)}_{\tau,n}$ , then $Z_{1}$ and $Z_{2:n}$ are independent conditionally on the event $\left\{\left(Z_{1},Z_{2:n}\right)\in U_{k}\times U^{\prime}_{k}\right\}$ .

Moreover, if $\left(X_{1},\ldots,X_{n}\right)$ has distribution $\bigl{(}{P}^{\mathbf{\Lambda},\mathbf{A},(i)}_{\tau}\bigr{)}^{\otimes n}$ (with fixed $\mathbf{A}$ and $\mathbf{\Lambda}$ ), then on the event $\left\{X_{1}\in U_{k}\right\}$ , we have:

•

if $\tau_{k}=0$ ,

[TABLE]

and $d_{H}\bigl{(}M_{0},{{M}^{\mathbf{\Lambda},\mathbf{A},(i)}_{\tau}}\bigr{)}\geq|\Lambda_{k}|$ .

•

if $\tau_{k}=1$ ,

–

for $i=1$ : $\angle\left(T_{X_{1}}{{M}^{\mathbf{\Lambda},\mathbf{A},(1)}_{\tau}},\mathbb{R}^{d}\times\left\{0\right\}^{D-d}\right)\geq A_{-}/2$ .

–

for $i=2$ : $\left\|II_{X_{1}}^{{{M}^{\mathbf{\Lambda},\mathbf{A},(2)}_{\tau}}}\circ\pi_{T_{X_{1}}{{M}^{\mathbf{\Lambda},\mathbf{A},(2)}_{\tau}}}\right\|_{op}\geq A_{-}/2$ .

Proof of Lemma C.14.

It is clear from the definition (8) that $\bar{Q}^{(i)}_{\tau,n}\in\overline{Conv}\bigl{(}\bigl{(}\mathcal{P}^{(i)}_{\tau}\bigr{)}^{\otimes n}\bigr{)}$ . By construction of the $\Phi^{\mathbf{\Lambda},\mathbf{A},i}_{\tau}$ ’s, these maps leave the sets

[TABLE]

unchanged for all $\mathbf{\Lambda},\mathbf{A}$ . Therefore, on the event $\left\{\left(Z_{1},Z_{2:n}\right)\in U_{k}\times U^{\prime}_{k}\right\}$ , one can write $Z_{1}$ only as a function of $X_{1},\Lambda_{k},A_{k}$ , and $Z_{2:n}$ as a function of the rest of the $X_{j}$ ’s, $\Lambda_{k}$ ’s and $A_{k}$ ’s. Therefore, $Z_{1}$ and $Z_{2:n}$ are independent conditionally on this event.

We now focus on the geometric statements. For this, we fix a deterministic point $z=\Phi^{\mathbf{\Lambda},\mathbf{A},(i)}_{\tau}(x_{0})\in U_{k}\cap{M}^{\mathbf{\Lambda},\mathbf{A},(i)}_{\tau}$ . By construction, one necessarily has $x_{0}\in M_{0}\cap\mathcal{B}(x_{k},\delta/2)$ .

•

If $\tau_{k}=0$ , locally around $x_{0}$ , $\Phi^{\mathbf{\Lambda},\mathbf{A},(1)}_{\tau}$ is the translation of vector $\Lambda_{k}e$ . Therefore, since $M_{0}$ satisfies $T_{x_{0}}M_{0}=\mathbb{R}^{d}\times\left\{0\right\}^{D-d}$ and $II_{x_{0}}^{M_{0}}=0$ , we have

[TABLE]

•

if $\tau_{k}=1$ ,

–

for $i=1$ : locally around $x_{0}$ , $\Phi^{\mathbf{\Lambda},\mathbf{A},(1)}_{\tau}$ can be written as $x\mapsto x+A_{k}(x-x_{k})_{1}e$ . Hence, $T_{z}{M}^{\mathbf{\Lambda},\mathbf{A},(1)}_{\tau}$ contains the direction $(1,A_{k})$ in the plane $span(e_{1},e)$ spanned by the first vector of the canonical basis and $e$ . As a consequence, since $e$ is orthogonal to $\mathbb{R}^{d}\times\left\{0\right\}^{D-d}$ ,

[TABLE]

–

for $i=2$ : locally around $x_{0}$ , $\Phi^{\mathbf{\Lambda},\mathbf{A},(2)}_{\tau}$ can be written as $x\mapsto x+A_{k}(x-x_{k})_{1}^{2}e$ . Hence, ${M}^{\mathbf{\Lambda},\mathbf{A},(2)}_{\tau}$ contains an arc of parabola of equation $y=A_{k}(x-x_{k})_{1}^{2}$ in the plane $span(e_{1},e)$ . As a consequence,

[TABLE]

∎
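The two geometric lower bounds in the case $\tau_{k}=1$ can be checked numerically in the plane $span(e_{1},e)$ . This is only a sketch under assumptions that are not restated in this section: the angle between two subspaces is taken to be the operator norm of the difference of their orthogonal projectors, the relevant abscissa satisfies $|t|\leq 1/2$ , and $A_{k}\leq 1$ . The function names are illustrative.

```python
import numpy as np

def projector(v):
    """Orthogonal projector onto the line spanned by v."""
    v = np.asarray(v, dtype=float)
    v = v / np.linalg.norm(v)
    return np.outer(v, v)

def angle(u, v):
    """Operator norm of the difference of the projectors onto span(u) and span(v)."""
    return np.linalg.norm(projector(u) - projector(v), ord=2)

def parabola_curvature(a, t):
    """Curvature of the planar curve y = a * t**2 at abscissa t."""
    return 2.0 * abs(a) / (1.0 + (2.0 * a * t) ** 2) ** 1.5

for a in np.linspace(0.01, 1.0, 25):
    # Case i = 1: the direction (1, a) makes an angle sin(theta) = a / sqrt(1 + a^2).
    print(a, angle([1.0, 0.0], [1.0, a]), parabola_curvature(a, 0.5))
```

For two lines in the plane this angle equals $\sin\theta=A_{k}/\sqrt{1+A_{k}^{2}}$ , which is at least $A_{k}/2$ when $A_{k}\leq 1$ . The curvature of the parabola is $2A_{k}$ at its vertex and stays above $A_{k}/2$ on the range tested.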

Lemma C.15.

Assume that the conditions of Lemma C.12 and Lemma C.14 hold. If in addition, $cA_{+}(\delta/4)^{i}\leq\Lambda_{+}\leq CA_{+}(\delta/4)^{i}$ for some absolute constants $C\geq c>3/4$ , and $A_{-}=A_{+}/2$ , then,

[TABLE]

and

[TABLE]

Proof of Lemma C.15.

First note that all the involved distributions have support in $\mathbb{R}^{d}\times span(e)\times\left\{0\right\}^{D-(d+1)}$ . Therefore, we use the canonical coordinate system of $\mathbb{R}^{d}\times span(e)$ , centered at $x_{k}$ , and we denote the components by $(x_{1},x_{2},\ldots,x_{d},y)=(x_{1},x_{2:d},y)$ . Without loss of generality, assume that $\tau_{k}=0$ (if not, flip $\tau$ and $\tau^{k}$ ). Recall that $\phi$ has been chosen to be constant and equal to $1$ on the ball $\mathcal{B}(0,1/2)$ .

By definition (8), on the event $\left\{Z\in U_{k}\right\}$ , a random variable $Z$ having distribution $\bar{Q}^{(i)}_{\tau,1}$ can be represented by $Z=X+\phi\left(\frac{X-x_{k}}{\delta}\right)\Lambda_{k}e=X+\Lambda_{k}e$ where $X$ and $\Lambda_{k}$ are independent and have respective distributions $P_{0}$ (the uniform distribution on $M_{0}$ ) and the uniform distribution on $[-\Lambda_{+},\Lambda_{+}]$ . Therefore, on $U_{k}$ , $\bar{Q}^{(i)}_{\tau,1}$ has a density with respect to the Lebesgue measure $\lambda_{d+1}$ on $\mathbb{R}^{d}\times span(e)$ that can be written as

[TABLE]

Analogously, nearby $x_{k}$ a random variable $Z$ having distribution $\bar{Q}^{(i)}_{\tau^{k},1}$ can be represented by $Z=X+A_{k}(X-x_{k})_{1}^{i}e$ where $A_{k}$ has uniform distribution on $[A_{-},A_{+}]$ . Therefore, a straightforward change of variable yields the density

[TABLE]

We recall that $Vol(M_{0})=(2\tau_{min})^{d}Vol\bigl{(}M_{0}^{(0)}\bigr{)}=c^{\prime}_{d}\tau_{min}^{d}$ . Let us now tackle the right-hand side inequality, writing

[TABLE]

It follows that

[TABLE]

For the integral on $U^{\prime}_{k}$ , notice that by definition, $\bar{Q}^{(i)}_{\tau,n-1}$ and $\bar{Q}^{(i)}_{\tau^{k},n-1}$ coincide on $U^{\prime}_{k}$ since they are respectively the image distributions of $P_{0}$ by functions that are equal on that set. Moreover, these two functions leave $\mathbb{R}^{D}\setminus\left\{\mathcal{B}_{\mathbb{R}^{d}\times\{0\}^{D-d}}\left(x_{k},\delta\right)+\mathcal{B}_{span(e)}(0,{\tau_{min}}/{2})\right\}$ unchanged. Therefore,

[TABLE]

hence the result. ∎

Proof of Lemma 8.

The properties of $\bigl{\{}\bar{Q}^{(i)}_{\tau,n}\bigr{\}}_{\tau}$ and $\left\{U_{k}\times U^{\prime}_{k}\right\}_{k}$ given by Lemma C.14 and Lemma C.15 yield the result, setting $\Lambda_{+}=A_{+}\delta^{i}/4$ , $A_{+}=2A_{-}=\varepsilon\delta^{k-i}$ for $\varepsilon=\varepsilon_{k,d,\tau_{min}}$ , and $\delta$ such that $c^{\prime}_{d}\left(\frac{\delta}{\tau_{min}}\right)^{d}=\frac{1}{n-1}$ . ∎
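The parameter choice in this proof can be made explicit. The sketch below solves $c^{\prime}_{d}(\delta/\tau_{min})^{d}=1/(n-1)$ for $\delta$ and then sets $A_{+}$ , $A_{-}$ and $\Lambda_{+}$ as stated. The arguments `eps` and `c_d` are placeholders for the unspecified constants $\varepsilon_{k,d,\tau_{min}}$ and $c^{\prime}_{d}$ , and the function name is hypothetical.

```python
def lemma8_parameters(n, k, d, i, tau_min, eps, c_d):
    """Tuning used in the proof of Lemma 8 (eps and c_d are placeholder constants)."""
    if n < 2:
        raise ValueError("n must be at least 2 so that n - 1 > 0")
    # Solve c_d * (delta / tau_min)**d = 1 / (n - 1) for delta.
    delta = tau_min * (c_d * (n - 1)) ** (-1.0 / d)
    a_plus = eps * delta ** (k - i)       # A_+ = eps * delta^(k - i)
    a_minus = a_plus / 2.0                # A_+ = 2 A_-
    lam_plus = a_plus * delta ** i / 4.0  # Lambda_+ = A_+ delta^i / 4
    return delta, a_minus, a_plus, lam_plus

delta, a_minus, a_plus, lam_plus = lemma8_parameters(
    n=101, k=3, d=2, i=1, tau_min=1.0, eps=0.1, c_d=1.0
)
```

With this tuning, $A_{-}$ scales as $(1/(n-1))^{(k-i)/d}$ and $\Lambda_{+}$ as $(1/(n-1))^{k/d}$ , up to constants depending on $k$ , $d$ and $\tau_{min}$ .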

C.3.2 Proof of Lemma 9

This section details the construction leading to Lemma 9 that we restate in Lemma C.16.

Lemma C.16.

Assume that $\tau_{min}{L_{\perp}}$ , $\ldots$ , $\tau_{min}^{k-1}{L_{k}}$ , $({\tau_{min}^{d}}f_{min})^{-1}$ , ${\tau_{min}^{d}}{f_{max}}$ are large enough (depending only on $d$ and $k$ ), and $\sigma\geq C_{k,d,\tau_{min}}\left(\frac{1}{n-1}\right)^{k/d}$ for $C_{k,d,\tau_{min}}>0$ large enough. Given $i\in\{1,2\}$ , there exists a collection of $2^{m}$ distributions $\bigl{\{}\mathbf{P}_{\tau}^{(i),\sigma}\bigr{\}}_{\tau\in\{0,1\}^{m}}\subset\mathcal{P}^{k}(\sigma)$ with associated submanifolds $\bigl{\{}M_{\tau}^{(i),\sigma}\bigr{\}}_{\tau\in\{0,1\}^{m}}$ , together with pairwise disjoint subsets $\{U^{\sigma}_{k}\}_{1\leq k\leq m}$ of $\mathbb{R}^{D}$ such that the following holds for all $\tau\in\{0,1\}^{m}$ and $1\leq k\leq m$ .

If $x\in U_{k}^{\sigma}$ and $y=\pi_{M_{\tau}^{(i),\sigma}}(x)$ , we have

•

if $\tau_{k}=0$ ,

[TABLE]

•

if $\tau_{k}=1$ ,

–

for $i=1$ : $\displaystyle\angle\left(T_{y}M_{\tau}^{(1),\sigma},\mathbb{R}^{d}\times\left\{0\right\}^{D-d}\right)\geq c_{k,d,\tau_{min}}\left(\frac{\sigma}{n-1}\right)^{\frac{k-1}{k+d}}$ ,

–

for $i=2$ : $\displaystyle\left\|II_{y}^{M_{\tau}^{(2),\sigma}}\circ\pi_{T_{y}M_{\tau}^{(2),\sigma}}\right\|_{op}\geq c^{\prime}_{k,d,\tau_{min}}\left(\frac{\sigma}{n-1}\right)^{\frac{k-2}{k+d}}$ .

Furthermore,

[TABLE]

Proof of Lemma C.16.

Following the notation of Section C.2, for $i\in\left\{1,2\right\}$ , $\tau\in\{0,1\}^{m}$ , $\delta\leq\tau_{min}/4$ and $A>0$ , consider

[TABLE]

Note that (9) is a particular case of (7). Clearly from the definition, $\Phi^{A,i}_{\tau}$ and $\Phi^{A,i}_{\tau^{k}}$ coincide outside $\mathcal{B}(x_{k},\delta)$ , $(\Phi(x)-x)\in span(e)$ for all $x\in\mathbb{R}^{D}$ , and $\left\|I_{D}-\Phi\right\|_{\infty}\leq A\delta^{i}$ . Let us define $M_{\tau}^{A,i}=\Phi_{\tau}^{A,i}(M_{0})$ . From Lemma C.13, we have $M_{\tau}^{A,i}\in\mathcal{C}^{k}_{\tau_{min},\mathbf{L}}$ provided that $\tau_{min}{L_{\perp}},\ldots,\tau_{min}^{k-1}{L_{k}}$ are large enough, and that $\delta\leq\tau_{min}/2$ , with $A/\delta^{k-i}\leq\varepsilon$ for $\varepsilon=\varepsilon_{k,d,\tau_{min},i}$ small enough.

Furthermore, let us write

[TABLE]

Then the family $\{{U^{\sigma}_{k}}\}_{1\leq k\leq m}$ is pairwise disjoint. Also, since $\tau_{k}=0$ implies that $M_{\tau}^{A,i}$ coincides with $M_{0}$ on $\mathcal{B}(x_{k},\delta)$ , we get that if $x\in{U^{\sigma}_{k}}$ and $y=\pi_{M_{\tau}^{A,i}}(x)$ ,

[TABLE]

Furthermore, by construction of the bump function $\Phi_{\tau}^{A,i}$ , if $x\in{U_{k}^{\sigma}}$ and $\tau_{k}=1$ , then

[TABLE]

and

[TABLE]

Now, let us write

[TABLE]

for the offset of $M_{\tau}^{A,i}$ of radius $\sigma/2$ . The sets $\bigl{\{}\mathcal{O}_{\tau}^{A,i}\bigr{\}}_{\tau}$ are closed subsets of $\mathbb{R}^{D}$ with non-empty interiors. Let $\mathbf{P}_{\tau}^{A,i}$ denote the uniform distribution on $\mathcal{O}_{\tau}^{A,i}$ . Finally, let us denote by $P_{\tau}^{A,i}=\bigl{(}\pi_{M_{\tau}^{A,i}}\bigr{)}_{\ast}\mathbf{P}_{\tau}^{A,i}$ the pushforward distributions of $\mathbf{P}_{\tau}^{A,i}$ by the projection maps $\pi_{M_{\tau}^{A,i}}$ . From Lemma 19 in [26], $P_{\tau}^{A,i}$ has a density $f_{\tau}^{A,i}$ with respect to the volume measure on $M_{\tau}^{A,i}$ , and this density satisfies

[TABLE]

and

[TABLE]

Since, by construction, $Vol(M_{0})=c_{d}\tau_{min}^{d}$ , and $c^{\prime}_{d}\leq Vol\bigl{(}M_{\tau}^{A,i}\bigr{)}/Vol(M_{0})\leq C^{\prime}_{d}$ whenever $A/\delta^{i-1}\leq\varepsilon^{\prime}_{d,\tau_{min},i}$ , we get that $P_{\tau}^{A,i}$ belongs to the model $\mathcal{P}^{k}$ provided that $({\tau_{min}^{d}}f_{min})^{-1}$ and ${\tau_{min}^{d}}{f_{max}}$ are large enough. This proves that under these conditions, the family $\bigl{\{}\mathbf{P}_{\tau}^{A,i}\bigr{\}}_{\tau\in\{0,1\}^{m}}$ is included in the model $\mathcal{P}^{k}({\sigma})$ .

Let us now focus on the bounds on the $L^{1}$ test affinities. Let $\tau\in\{0,1\}^{m}$ and $1\leq k\leq m$ be fixed, and assume, without loss of generality, that $\tau_{k}=0$ (if not, flip the role of $\tau$ and $\tau^{k}$ ). First, note that

[TABLE]

Furthermore, since $\mathbf{P}_{\tau}^{A,i}$ and $\mathbf{P}_{\tau^{k}}^{A,i}$ are the uniform distributions on $\mathcal{O}_{\tau}^{A,i}$ and $\mathcal{O}_{\tau^{k}}^{A,i}$ ,

[TABLE]

Furthermore,

[TABLE]

To get a lower bound on the denominator, note that for $\delta\leq\tau_{min}/2$ , $M_{\tau}^{A,i}$ and $M_{\tau^{k}}^{A,i}$ both contain

[TABLE]

so that $\mathcal{O}_{\tau}^{A,i}$ and $\mathcal{O}_{\tau^{k}}^{A,i}$ both contain

[TABLE]

As a consequence, $Vol\left(\mathcal{O}_{\tau}^{A,i}\right)\wedge Vol\left(\mathcal{O}_{\tau^{k}}^{A,i}\right)\geq c_{d}\omega_{d}\tau_{min}^{d}\omega_{D-d}(\sigma/2)^{D-d},$ where $\omega_{\ell}$ denotes the volume of an $\ell$ -dimensional unit Euclidean ball.

We now derive an upper bound on $Vol\bigl{(}\mathcal{O}_{\tau}^{A,i}\setminus\mathcal{O}_{\tau^{k}}^{A,i}\bigr{)}$ . To this end, let us consider $a_{0}=y+\xi\in\mathcal{O}_{\tau}^{A,i}\setminus\mathcal{O}_{\tau^{k}}^{A,i}$ , with $y\in M_{\tau}^{A,i}$ and $\xi\in\bigl{(}T_{y}M^{A,i}_{\tau}\bigr{)}^{\perp}$ . Since $\Phi^{A,i}_{\tau}$ and $\Phi^{A,i}_{\tau^{k}}$ coincide outside $\mathcal{B}(x_{k},\delta)$ , so do $M^{A,i}_{\tau}$ and $M^{A,i}_{\tau^{k}}$ . Hence, one necessarily has $y\in\mathcal{B}(x_{k},\delta)$ . Thus, $\bigl{(}T_{y}M^{A,i}_{\tau}\bigr{)}^{\perp}=T_{y}M_{0}^{\perp}=span(e)+\left\{0\right\}^{d+1}\times\mathbb{R}^{D-d-1}$ , so we can write $\xi=se+z$ with $s\in\mathbb{R}$ and $z\in\left\{0\right\}^{d+1}\times\mathbb{R}^{D-d-1}$ . By definition of $\mathcal{O}_{\tau}^{A,i}$ , $\left\|\xi\right\|=\sqrt{s^{2}+\left\|z\right\|^{2}}\leq\sigma/2$ , which yields $\left\|z\right\|\leq\sigma/2$ and $|s|\leq\sqrt{(\sigma/2)^{2}-\left\|z\right\|^{2}}$ . Furthermore, $a_{0}$ does not belong to $\mathcal{O}_{\tau^{k}}^{A,i}$ , which translates to

[TABLE]

from which we get $|s|\geq\sqrt{(\sigma/2)^{2}-\left\|z\right\|^{2}}-\left\|I_{D}-\Phi_{\tau^{k}}^{A,i}\right\|_{\infty}$ . We just proved that $\mathcal{O}_{\tau}^{A,i}\setminus\mathcal{O}_{\tau^{k}}^{A,i}$ is a subset of

[TABLE]

Hence,

[TABLE]

Similar arguments lead to

[TABLE]

Since $\left\|I_{D}-\Phi_{\tau}^{A,i}\right\|_{\infty}\vee\left\|I_{D}-\Phi_{\tau^{k}}^{A,i}\right\|_{\infty}\leq A\delta^{i}$ , summing up bounds (10) and (11) yields

[TABLE]

To derive the last bound, we notice that since $U_{k}^{\sigma}\subset\mathcal{O}_{\tau}^{A,i}=Supp\bigl{(}\mathbf{P}_{\tau}^{A,i}\bigr{)}$ , we have

[TABLE]

Hence, whenever $A\delta^{i}\leq c_{d}\sigma$ for $c_{d}$ small enough, we get

[TABLE]

Since $m$ can be chosen such that $m\geq c_{d}(\tau_{min}/\delta)^{d}$ , we get the last bound.

Finally, writing $\mathbf{P}_{\tau}^{(i),\sigma}=\mathbf{P}_{\tau}^{A,i}$ for the particular parameters $A=\varepsilon\delta^{k-i}$ , for $\varepsilon=\varepsilon_{k,d,\tau_{min}}$ small enough, and $\delta$ such that $\frac{3A\delta^{i}}{\sigma}\left(\frac{\delta}{\tau_{min}}\right)^{d}=\frac{1}{n-1}$ yields the result. Such a choice of parameter $\delta$ does meet the condition $A\delta^{i}=\varepsilon\delta^{k}\leq c_{d}\sigma$ , provided that $\sigma\geq\frac{c_{d}}{\varepsilon}\left(\frac{1}{n-1}\right)^{k/d}$ . ∎
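The choice of $\delta$ in this last step can be solved in closed form, which makes the exponents of Lemma C.16 visible. The following derivation is a sketch that uses only the relations $A=\varepsilon\delta^{k-i}$ and $\frac{3A\delta^{i}}{\sigma}\left(\frac{\delta}{\tau_{min}}\right)^{d}=\frac{1}{n-1}$ ; the constant it produces in the condition on $\sigma$ is worked out here for illustration and is not taken from the text.

```latex
\begin{align*}
\frac{3A\delta^{i}}{\sigma}\Bigl(\frac{\delta}{\tau_{min}}\Bigr)^{d}
  = \frac{3\varepsilon\,\delta^{k+d}}{\sigma\,\tau_{min}^{d}} = \frac{1}{n-1}
  &\iff \delta = \Bigl(\frac{\tau_{min}^{d}}{3\varepsilon}\cdot\frac{\sigma}{n-1}\Bigr)^{\frac{1}{k+d}},\\
A = \varepsilon\delta^{k-i}
  &= \varepsilon\Bigl(\frac{\tau_{min}^{d}}{3\varepsilon}\Bigr)^{\frac{k-i}{k+d}}
     \Bigl(\frac{\sigma}{n-1}\Bigr)^{\frac{k-i}{k+d}},\\
\varepsilon\delta^{k}\leq c_{d}\sigma
  &\iff \sigma \geq \Bigl(\frac{\varepsilon}{c_{d}}\Bigr)^{\frac{k+d}{d}}
     \Bigl(\frac{\tau_{min}^{d}}{3\varepsilon}\Bigr)^{\frac{k}{d}}
     \Bigl(\frac{1}{n-1}\Bigr)^{\frac{k}{d}}.
\end{align*}
```

The second line gives the rate $\left(\frac{\sigma}{n-1}\right)^{\frac{k-i}{k+d}}$ appearing in the two bounds for $\tau_{k}=1$ , and the third line shows that the constraint on $\sigma$ is of order $\left(\frac{1}{n-1}\right)^{k/d}$ .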

C.4 Hypotheses for Manifold Estimation

C.4.1 Proof of Lemma 5

Let us prove Lemma 5, stated here as Lemma C.17.

Lemma C.17.

If $\tau_{min}{L_{\perp}},\ldots,\tau_{min}^{k-1}{L_{k}},({\tau_{min}^{d}}f_{min})^{-1}$ and ${\tau_{min}^{d}}{f_{max}}$ are large enough (depending only on $d$ and $k$ ), there exist $P_{0},P_{1}\in\mathcal{P}^{k}$ with associated submanifolds $M_{0},M_{1}$ such that

[TABLE]

Proof of Lemma C.17.

Following the notation of Section C.2, for $\delta\leq\tau_{min}/4$ and $\Lambda>0$ , consider

[TABLE]

which is a particular case of (7). Define $M^{\Lambda}=\Phi^{\Lambda}(M_{0})$ , and $P^{\Lambda}=\Phi^{\Lambda}_{\ast}P_{0}$ . Under the conditions of Lemma C.13, $P_{0}$ and $P^{\Lambda}$ belong to $\mathcal{P}^{k}$ , and by construction, $d_{H}(M_{0},M^{\Lambda})=\Lambda$ . In addition, since $P_{0}$ and $P^{\Lambda}$ coincide outside $\mathcal{B}(0,\delta)$ ,

[TABLE]

Setting $P_{1}=P^{\Lambda}$ with $\omega_{d}\left(\frac{\delta}{\tau_{min}}\right)^{d}=\frac{1}{n}$ and $\Lambda=c_{k,d,\tau_{min}}\delta^{k}$ for $c_{k,d,\tau_{min}}>0$ small enough yields the result. ∎
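The tuning at the end of this proof can be written out as follows. This is a hedged sketch: `c` stands in for the unspecified constant $c_{k,d,\tau_{min}}$ , the function names are hypothetical, and the last helper only evaluates $(1-1/n)^{n}$ , which is how a ball of mass $1/n$ would enter a two-point bound if the test affinity of the $n$ -samples is bounded below by $(1-P_{0}(\mathcal{B}(0,\delta)))^{n}$ (an assumption, since the corresponding display is not reproduced here).

```python
import math

def unit_ball_volume(d):
    """Volume omega_d of the d-dimensional unit Euclidean ball."""
    return math.pi ** (d / 2.0) / math.gamma(d / 2.0 + 1.0)

def two_point_parameters(n, k, d, tau_min, c):
    """Solve omega_d * (delta / tau_min)**d = 1 / n and set Lambda = c * delta**k."""
    delta = tau_min * (n * unit_ball_volume(d)) ** (-1.0 / d)
    lam = c * delta ** k
    # The construction requires delta <= tau_min / 4; report whether n is large enough.
    return delta, lam, delta <= tau_min / 4.0

def affinity_lower_bound(n):
    """(1 - 1/n)^n, the assumed lower bound on the n-sample test affinity."""
    return (1.0 - 1.0 / n) ** n

delta, lam, admissible = two_point_parameters(n=1000, k=3, d=2, tau_min=1.0, c=0.1)
```

Under this choice the Hausdorff separation $\Lambda$ is of order $n^{-k/d}$ , while $(1-1/n)^{n}$ stays above $1/4$ for every $n\geq 2$ .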

C.4.2 Proof of Lemma 6

Here comes the proof of Lemma 6, stated here as Lemma C.18.

Lemma C.18.

If $\tau_{min}{L_{\perp}},\ldots,\tau_{min}^{k-1}{L_{k}},({\tau_{min}^{d}}f_{min})^{-1}$ and ${\tau_{min}^{d}}{f_{max}}$ are large enough (depending only on $d$ and $k$ ), there exist $P_{0}^{\sigma},P_{1}^{\sigma}\in\mathcal{P}^{k}(\sigma)$ with associated submanifolds $M_{0}^{\sigma},M_{1}^{\sigma}$ such that

[TABLE]

Proof of Lemma C.18.

The proof follows the lines of that of Lemma C.16. Indeed, with the notation of Section C.2, for $\delta\leq\tau_{min}/4$ and $0<\Lambda\leq c_{k,d,\tau_{min}}\delta^{k}$ for $c_{k,d,\tau_{min}}>0$ small enough, consider

[TABLE]

Define $M^{\Lambda}=\Phi^{\Lambda}(M_{0})$ . Write $\mathcal{O}_{0}$ , $\mathcal{O}^{\Lambda}$ for the offsets of radii $\sigma/2$ of $M_{0}$ , $M^{\Lambda}$ , and $\mathbf{P}_{0},\mathbf{P}^{\Lambda}$ for the uniform distributions on these sets.

By construction, we have $d_{H}(M_{0},M^{\Lambda})=\Lambda$ , and as in the proof of Lemma C.16, we get

[TABLE]

Setting ${P}_{0}^{\sigma}=\mathbf{P}_{0}$ and ${P}_{1}^{\sigma}=\mathbf{P}^{\Lambda}$ with $\Lambda=\varepsilon_{k,d,\tau_{min}}\delta^{k}$ and $\delta$ such that $3\frac{\Lambda}{\sigma}\left(\frac{\delta}{\tau_{min}}\right)^{d}=\frac{1}{n}$ yields the result.

∎

C.5 Minimax Inconsistency Results

This section is devoted to the proof of Theorem 1, reproduced here as Theorem C.19.

Theorem C.19.

Assume that $\tau_{min}=0$ . If $D\geq d+3$ , then, for all $k\geq 2$ and $L_{\perp}>0$ , provided that $L_{3}/L_{\perp}^{2},\ldots,{L_{k}}/L_{\perp}^{k-1},{L_{\perp}^{d}}/{f_{min}}$ and ${f_{max}}/{L_{\perp}^{d}}$ are large enough (depending only on $d$ and $k$ ), for all $n\geq 1$ ,

[TABLE]

where the infimum is taken over all the estimators $\hat{T}=\hat{T}\bigl{(}X_{1},\ldots,X_{n}\bigr{)}$ .

Moreover, for any $D\geq d+1$ , provided that $L_{3}/L_{\perp}^{2},\ldots,{L_{k}}/L_{\perp}^{k-1},{L_{\perp}^{d}}/{f_{min}}$ and ${f_{max}}/{L_{\perp}^{d}}$ are large enough (depending only on $d$ and $k$ ), for all $n\geq 1$ ,

[TABLE]

where the infimum is taken over all the estimators $\widehat{II}=\widehat{II}\bigl{(}X_{1},\ldots,X_{n}\bigr{)}$ .

We will make use of Le Cam’s Lemma, which we recall here.

Theorem C.20 (Le Cam’s Lemma [36]).

For all pairs $P,P^{\prime}$ in $\mathcal{P}$ ,

[TABLE]

where the infimum is taken over all the estimators $\hat{\theta}=\hat{\theta}(X_{1},\ldots,X_{n})$ .

Proof of Theorem C.19.

For $\delta\geq\Lambda>0$ , let $\mathcal{C},\mathcal{C}^{\prime}\subset\mathbb{R}^{3}$ be closed curves of the Euclidean space as in Figure 8, and such that outside the figure, $\mathcal{C}$ and $\mathcal{C}^{\prime}$ coincide and are $\mathcal{C}^{\infty}$ . The bumped parts are obtained with a smooth diffeomorphism similar to (7) and centered at $x$ . Here, $\delta$ and $\Lambda$ can be chosen arbitrarily small.

Let $\mathcal{S}^{d-1}\subset\mathbb{R}^{d}$ be a $(d-1)$ -sphere of radius $1/L_{\perp}$ . Consider the Cartesian products $M_{1}=\mathcal{C}\times\mathcal{S}^{d-1}$ and $M_{1}^{\prime}=\mathcal{C}^{\prime}\times\mathcal{S}^{d-1}$ . $M_{1}$ and $M_{1}^{\prime}$ are subsets of $\mathbb{R}^{d+3}\subset\mathbb{R}^{D}$ . Finally, let $P_{1}$ and $P_{1}^{\prime}$ denote the uniform distributions on $M_{1}$ and $M_{1}^{\prime}$ . Note that $M_{1}$ , $M_{1}^{\prime}$ can be built by homothecy of ratio $\lambda=1/L_{\perp}$ from some unit-scale $M_{1}^{(0)},{M^{\prime}}_{1}^{(0)}$ , similarly to Section 5.3.2 in [2], yielding, from Proposition A.4, that $P_{1},P^{\prime}_{1}$ belong to $\mathcal{P}^{k}_{(x)}$ provided that $L_{3}/L_{\perp}^{2},\ldots,{L_{k}}/L_{\perp}^{k-1},{L_{\perp}^{d}}/{f_{min}}$ and ${f_{max}}/{L_{\perp}^{d}}$ are large enough (depending only on $d$ and $k$ ), and that $\Lambda,\delta$ and $\Lambda^{k}/\delta$ are small enough. From Le Cam’s Lemma C.20, we have for all $n\geq 1$ ,

[TABLE]

By construction, $\angle\bigl{(}T_{x}M_{1},T_{x}M_{1}^{\prime}\bigr{)}=1$ , and since $\mathcal{C}$ and $\mathcal{C}^{\prime}$ coincide outside $\mathcal{B}_{\mathbb{R}^{3}}(0,\delta)$ ,

[TABLE]

Hence, at fixed $n\geq 1$ , letting $\Lambda,\delta$ go to $0$ with $\Lambda^{k}/\delta$ small enough, we get the announced bound.

We now tackle the lower bound on curvature estimation with the same strategy. Let $M_{2},M_{2}^{\prime}\subset\mathbb{R}^{D}$ be $d$ -dimensional submanifolds as in Figure 9: they both contain $x$ , the part on the top of $M_{2}$ is a half $d$ -sphere of radius $2/L_{\perp}$ , the bottom part of $M^{\prime}_{2}$ is a piece of a $d$ -plane, and the bumped parts are obtained with a smooth diffeomorphism similar to (7), centered at $x$ . Outside $\mathcal{B}(x,\delta)$ , $M_{2}$ and $M_{2}^{\prime}$ coincide and connect smoothly the upper and lower parts.

Let $P_{2},P^{\prime}_{2}$ be the probability distributions obtained by the pushforward given by the bump maps. Under the same conditions on the parameters as previously, $P_{2}$ and $P_{2}^{\prime}$ belong to $\mathcal{P}^{k}_{(x)}$ according to Proposition A.4. Hence from Le Cam’s Lemma C.20 we deduce

[TABLE]

But by construction, $\left\|II_{x}^{M_{2}}\circ\pi_{T_{x}M_{2}}\right\|_{op}=0$ , and since $M_{2}^{\prime}$ is a part of a sphere of radius $2/L_{\perp}$ nearby $x$ , $\left\|II_{x}^{M_{2}^{\prime}}\circ\pi_{T_{x}M_{2}^{\prime}}\right\|_{op}=L_{\perp}/2$ . Hence,

[TABLE]

Moreover, since $P_{2}$ and $P^{\prime}_{2}$ coincide on $\mathbb{R}^{D}\setminus\mathcal{B}(x,\delta)$ ,

[TABLE]

At $n\geq 1$ fixed, letting $\Lambda,\delta$ go to $0$ with $\Lambda^{k}/\delta$ small enough, we get the desired result.

∎

Bibliography

[1] Aamari, E. and Levrard, C. (2015). Stability and Minimax Optimality of Tangential Delaunay Complexes for Manifold Reconstruction. ArXiv e-prints.
[2] Aamari, E. and Levrard, C. (2017). Non-asymptotic rates for manifold, tangent space and curvature estimation.
[3] Alexander, S. B. and Bishop, R. L. (2006). Gauss equation and injectivity radii for subspaces in spaces of curvature bounded above. Geom. Dedicata 117, 65–84. doi:10.1007/s10711-005-9011-6. MR2231159 (2007c:53110).
[4] Arias-Castro, E., Lerman, G. and Zhang, T. (2013). Spectral Clustering Based on Local PCA. ArXiv e-prints.
[5] Arias-Castro, E., Pateiro-López, B. and Rodríguez-Casal, A. (2016). Minimax Estimation of the Volume of a Set with Smooth Boundary. ArXiv e-prints.
[6] Boissonnat, J.-D. and Ghosh, A. (2014). Manifold reconstruction using tangential Delaunay complexes. Discrete Comput. Geom. 51, 221–267. doi:10.1007/s00454-013-9557-2. MR3148657.
[7] Boucheron, S., Lugosi, G. and Massart, P. (2013). Concentration inequalities. A nonasymptotic theory of independence, with a foreword by Michel Ledoux. Oxford University Press, Oxford. doi:10.1093/acprof:oso/9780199535255.001.0001. MR3185193.
[8] Bousquet, O. (2002). A Bennett concentration inequality and its application to suprema of empirical processes. C. R. Math. Acad. Sci. Paris 334, 495–500. doi:10.1016/S1631-073X(02)02292-6. MR1890640 (2003f:60039).
