Asymptotic equivalence of non-parametric regression with spherical regressors and Gaussian white noise

Martin Kroll

arXiv:2508.21656 · math.ST · May 5, 2026


TL;DR

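This paper proves that non-parametric regression on the $d$-dimensional sphere, with sampling points forming either a spherical $t$-design or a random uniform design, is asymptotically equivalent in the sense of Le Cam to a Gaussian white noise model as the sample size grows, provided the regression function is sufficiently smooth.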

Contribution

It establishes asymptotic equivalence, in the sense of Le Cam, between regression experiments with spherical designs and Gaussian white noise experiments, extending earlier results for Euclidean regression domains to spherical regressors.

Findings

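01. Regression experiments with spherical $t$-designs or random uniform designs are asymptotically equivalent to Gaussian white noise models.

02. Equivalence holds over spherical Sobolev balls with smoothness $s>d/2$ (fixed and random uniform design) and over spherical Besov balls with $s>d/r$ (fixed design).

03. Matching non-equivalence results show that these smoothness assumptions are essentially sharp.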

Abstract

We study the asymptotic behaviour of both spherical $t$-designs and random uniform designs as the set of sampling points in non-parametric regression with spherical regressors of arbitrary dimension. We show that the corresponding regression experiments are asymptotically equivalent, in the sense of Le Cam, to the same sequence of Gaussian white noise experiments as the sample size tends to infinity. More precisely, global asymptotic equivalence is established over spherical Sobolev balls (for both the fixed and the random uniform design case) and over spherical Besov balls (for the fixed design case). Matching non-equivalence results demonstrate that the imposed smoothness assumptions are essentially sharp.

Equations

$$Z_{i} = f(X_{i}) + \sigma\varepsilon_{i}, \qquad i = 1,\dots,n,$$

$$\mathrm{d}Z(x) = f(x)\,\mathrm{d}\mu(x) + \widetilde{\sigma}\,\mathrm{d}W(x),$$

$$\lim_{n\to\infty}\Delta(\mathfrak{E}_{n},\mathfrak{F}_{n}) = 0.$$

$$\frac{1}{n}\sum_{i=1}^{n} p(x_{i}) = \int_{\mathbb{S}^{d}} p(x)\,\mathrm{d}\mu(x)$$

$$L^{2}(\mathbb{S}^{d}) = \Bigl\{ f\colon\mathbb{S}^{d}\to\mathbb{R} \;:\; \lVert f\rVert_{L^{2}(\mathbb{S}^{d})}^{2} := \int_{\mathbb{S}^{d}} f^{2}(x)\,\mathrm{d}\mu(x) < \infty \Bigr\}$$

$$L^{p}(\mathbb{S}^{d}) = \Bigl\{ f\colon\mathbb{S}^{d}\to\mathbb{R} \;:\; \lVert f\rVert_{L^{p}(\mathbb{S}^{d})}^{p} := \int_{\mathbb{S}^{d}} \lvert f(x)\rvert^{p}\,\mathrm{d}\mu(x) < \infty \Bigr\}.$$

$$L^{2}(\mathbb{S}^{d}) = \bigoplus_{\ell=0}^{\infty}\mathscr{H}^{d}_{\ell},$$

$$N^{d}_{\ell} := \dim\mathscr{H}^{d}_{\ell} = \binom{\ell+d}{d} - \binom{\ell+d-2}{d} \asymp \ell^{d-1}.$$

$$f = \sum_{\ell=0}^{\infty}\sum_{m=1}^{N^{d}_{\ell}} \theta_{\ell,m}Y_{\ell,m},$$

$$\theta_{\ell,m} = \langle f, Y_{\ell,m}\rangle_{L^{2}(\mathbb{S}^{d})} = \int_{\mathbb{S}^{d}} f(x)Y_{\ell,m}(x)\,\mathrm{d}\mu(x).$$

$$\iota\colon\{(\ell,m)\in\mathbb{N}_{0}\times\mathbb{N} : 1\leq m\leq N^{d}_{\ell}\}\to\mathbb{N}, \qquad (\ell,m)\mapsto\iota(\ell,m),$$

$$\iota(\ell,m)\leq\iota(\ell',m') \iff \ell<\ell' \text{ or } (\ell=\ell' \text{ and } m\leq m').$$

$$f = \sum_{j=1}^{\infty}\theta_{j}Y_{j}.$$

$$\mathscr{P}^{d}_{L} = \bigoplus_{\ell=0}^{L}\mathscr{H}^{d}_{\ell}$$

$$\dim\mathscr{P}^{d}_{L} = \sum_{\ell=0}^{L} N^{d}_{\ell} \asymp L^{d}.$$

$$\Pi_{\mathscr{H}^{d}_{\ell}}f = \sum_{m=1}^{N^{d}_{\ell}}\theta_{\ell,m}Y_{\ell,m} \quad\text{and}\quad \Pi_{\mathscr{P}^{d}_{L}}f = \sum_{\ell=0}^{L}\sum_{m=1}^{N^{d}_{\ell}}\theta_{\ell,m}Y_{\ell,m}.$$

$$\sum_{m=1}^{N^{d}_{\ell}} Y_{\ell,m}^{2}(x) = N^{d}_{\ell}, \qquad x\in\mathbb{S}^{d},$$

$$\mathsf{K}\mathbf{P}^{\mathfrak{E}}_{\theta}(A) = \int_{\mathsf{X}_{\mathfrak{E}}}\mathsf{K}(x,A)\,\mathrm{d}\mathbf{P}^{\mathfrak{E}}_{\theta}(x), \qquad A\in\mathcal{A}_{\mathfrak{F}}.$$

$$\Delta(\mathfrak{E},\mathfrak{F}) = \max\{\delta(\mathfrak{E},\mathfrak{F}),\,\delta(\mathfrak{F},\mathfrak{E})\}$$

$$\delta(\mathfrak{E},\mathfrak{F}) = \inf_{\mathsf{K}}\sup_{\theta\in\Theta} V(\mathsf{K}\mathbf{P}^{\mathfrak{E}}_{\theta},\mathbf{P}^{\mathfrak{F}}_{\theta}).$$

$$\lim_{n\to\infty}\Delta(\mathfrak{E}_{n},\mathfrak{F}_{n}) = 0.$$

$$\mathrm{d}Z(x) = f(x)\,\mathrm{d}\mu(x) + \widetilde{\sigma}\,\mathrm{d}W(x),$$

$$G_{g} = \int_{\mathbb{S}^{d}} f(x)g(x)\,\mathrm{d}\mu(x) + \widetilde{\sigma}\int_{\mathbb{S}^{d}} g(x)\,\mathrm{d}W(x)$$

$$g\mapsto\int_{\mathbb{S}^{d}} g(x)\,\mathrm{d}W(x)$$

$$\operatorname{Cov}\Bigl(\int g(x)\,\mathrm{d}W(x),\int h(x)\,\mathrm{d}W(x)\Bigr) = \langle g,h\rangle_{L^{2}(\mathbb{S}^{d})}.$$

$$y_{j} = G_{\varphi_{j}} = \theta_{j} + \widetilde{\sigma}\eta_{j}, \qquad j\in\mathbb{N},$$

$$V(\mathbf{P}_{f},\mathbf{P}_{g}) = 1 - 2\Phi\Bigl(-\frac{\lVert f-g\rVert_{L^{2}(\mathbb{S}^{d})}}{2\widetilde{\sigma}}\Bigr) \lesssim \widetilde{\sigma}^{-1}\lVert f-g\rVert_{L^{2}(\mathbb{S}^{d})},$$

$$Z_{i} = f(x_{i}) + \sigma\varepsilon_{i}, \qquad i = 1,\dots,n,$$

$$\mathrm{d}Z(x) = f(x)\,\mathrm{d}\mu(x) + \frac{\sigma}{\sqrt{n}}\,\mathrm{d}W(x).$$

$$f_{m} = \sum_{j=1}^{m}\beta_{j}\psi_{j}$$

Full text

Asymptotic equivalence of non-parametric regression with spherical regressors and Gaussian white noise

Martin Kroll

Universität Bayreuth

Keywords: non-parametric regression, spherical $t$-designs, random uniform design, Gaussian white noise, Le Cam distance, asymptotic equivalence of experiments

1 Introduction

Regression models with spherical regressors of arbitrary dimension and real-valued responses occur in a wide variety of scientific disciplines. Instances of such models appear in the Earth sciences, where physical processes on the Earth (which is approximately a two-dimensional sphere with radius $6371$ km) are considered. Specific examples are the global temperature field [OL04] and the Earth’s magnetic field [HCM03]. Other applications involving functions on the two-dimensional sphere can be found in biology, for instance, for the purpose of cell-shape modeling [RM18], or in texture analysis [SB03]. The special case of functions defined on the three-dimensional sphere arises in crystallography, where it can be used to describe probability distributions of crystalline orientations [MS08], or in medical imaging [Hos+13]. Besides, spherical harmonic expansions find various applications in quantum theory [Ave93].

Motivated by this wide range of applications, we consider the non-parametric regression model with observations $Z_{1},\ldots,Z_{n}$ obeying the model equations

$$Z_{i} = f(X_{i}) + \sigma\varepsilon_{i}, \qquad i = 1,\dots,n, \tag{1.1}$$

where $f\colon\mathbb{S}^{d}\to\mathbb{R}$ is the unknown regression function defined on the $d$-dimensional sphere $\mathbb{S}^{d}=\{x\in\mathbb{R}^{d+1}:\lVert x\rVert=1\}$, $\mathbb{X}=\{X_{1},\ldots,X_{n}\}\subset\mathbb{S}^{d}$ is the finite set of (deterministic or random) sampling points (following the standard convention, we will from now on denote deterministic sampling points by lowercase and random sampling points by uppercase letters), and $\varepsilon_{1},\ldots,\varepsilon_{n}$ are i.i.d. standard Gaussian random variables independent of the design. The noise level $\sigma>0$ is assumed to be known. Non-parametric estimators of the regression function $f$ in model (1.1) have already been studied: [Wah81] and [ANS96] consider spline interpolation and smoothing on the sphere. Local polynomial smoothing for circular data is treated in [DMPT09]. Series estimators in terms of spherical harmonics as well as wavelet-like series estimators are studied in [NPW06a, Wia+08, Mon11]. Let us also mention that, in the context of density estimation with spherical data, kernel methods [HWC87], Fourier expansions [Hen90], and methods based on spherical needlets [Bal+09] have already been considered.

The theoretical analysis of the regression model (1.1), notwithstanding its practical relevance, is hindered by its discrete nature, since measurements are taken only at isolated sampling points. Consequently, one might be tempted to replace the model (1.1) by a continuous Gaussian white noise model,

$$\mathrm{d}Z(x) = f(x)\,\mathrm{d}\mu(x) + \widetilde{\sigma}\,\mathrm{d}W(x), \tag{1.2}$$

where $\widetilde{\sigma}=\widetilde{\sigma}(\sigma,n)>0$ is a suitable noise level, depending on both the noise level $\sigma$ and the sample size $n$ in the discrete model (1.1), $\mu$ is the normalized surface area measure on $\mathbb{S}^{d}$, and $\mathrm{d}W$ is a standard Gaussian white noise process on $\mathbb{S}^{d}$. Indeed, model (1.2) allows for a rigorous and neat mathematical analysis, which is conducted in [Kle99], where the sharp asymptotic minimax risk for different function classes and loss functions is derived. To the best of our knowledge, no theoretical study has yet addressed the relationship between the more realistic observation model (1.1) and the more tractable model (1.2). The present paper aims to close this gap.

More precisely, the main purpose of the present work is to state (essentially sharp) conditions on the set $\mathbb{X}$ of sampling points and the class of admissible regression functions that allow one to replace the discrete model (1.1) with the continuous surrogate (1.2) (or vice versa). For this, we rely on Le Cam’s theory of asymptotic equivalence of experiments. Within this theory, the discrepancy between statistical experiments $\mathfrak{E}$ and $\mathfrak{F}$ sharing the same parameter space $\Theta$ is quantified using a pseudo-metric $\Delta$, commonly referred to as the Le Cam distance. Two sequences $(\mathfrak{E}_{n})$ and $(\mathfrak{F}_{n})$ of statistical experiments having the same parameter space $\Theta$ are said to be asymptotically equivalent, in the sense of Le Cam, if

$$\lim_{n\to\infty}\Delta(\mathfrak{E}_{n},\mathfrak{F}_{n}) = 0.$$

From an inferential point of view, asymptotically equivalent experiments are equally informative in the limit. For a more comprehensive account of asymptotic equivalence theory we refer the reader to Chapter 1 of [GN16], as well as the survey papers [Nus04] and [Mar16].

Since the seminal paper [BL96], asymptotic equivalence of many non-parametric experiments has been established. The articles [BL96, Roh04, Rei08] consider fixed design regression on the unit interval with equidistant sampling points, whereas [Bro+02] deals with the random design case. [Rei08] extends the results from [BL96] and [Roh04] to the multivariate and random design cases. In addition, the papers [BL96] and [Rei08] discuss minor deviations from the assumption of equidistant sampling points. [GN02] considers asymptotic equivalence for non-parametric regression with centered, but non-Gaussian, noise, whereas [MR13] considers non-regular errors. The contributions [Car09], [SH14], and [DK22] weaken the i.i.d. assumption on the noise. The paper [CZ09] develops asymptotic equivalence theory for robust non-parametric regression. The limits of asymptotic equivalence theory are discussed in [BZ98] and [ES96], which provide examples of asymptotically non-equivalent experiments. However, all the papers cited so far discuss asymptotic equivalence for regression experiments and a corresponding Gaussian white noise model only when the regression domain is a subset of some Euclidean space (admittedly, the case of periodic regression functions on $[0,1]^{d}$, considered also in [Rei08], can be interpreted as a first step towards a manifold setup due to the topological identification of $[0,1]^{d}$ and the $d$-dimensional torus $\mathbb{T}^{d}$).

Concerning the choice of design points, the asymptotic equivalence results cited above are obtained under the assumption that the sampling points are evenly spread over the whole regression domain. In the deterministic design case, this is achieved by choosing sampling points forming a regular grid; in the random design case, the natural approach is to sample from the uniform distribution on the regression domain. In the following, we briefly review these two cases, which will be considered in the main part of the paper.

In the one-dimensional case considered in [BL96] and [Roh04], the target parameter is a function defined on the unit interval, and the canonical deterministic design that yields asymptotic equivalence with the corresponding Gaussian white noise model is the equidistant grid with sampling points $x_{i}=i/n$ for $i=1,\ldots,n$. In the multivariate setup, studied in [Rei08], the regression domain is given by the $d$-dimensional unit cube $[0,1]^{d}$ and a regular grid of sampling points of the form $(i_{1}/m,\ldots,i_{d}/m)$ with $m=n^{1/d}$ and $i_{1},\ldots,i_{d}\in\{1,\ldots,m\}$ is assumed. For more complicated regression domains, possibly subsets of manifolds, a comparable notion of evenly spread deterministic point sets is not evident. For the special case of spheres, various measures to evaluate the distributional properties of finite point sets exist [BG15]. Besides its intrinsic mathematical motivation, the problem of finding evenly spread points on spheres is of fundamental relevance in fields like viral morphology, crystallography, molecular structure, and electrostatics. We refer the reader to [SK97] for further discussion.

In the following, we will build on the notion of spherical $t$-designs as originally introduced in [DGS77]. A finite, non-empty set $\mathbb{X}=\{x_{1},\ldots,x_{n}\}\subset\mathbb{S}^{d}$ is called a spherical $t$-design if the identity

$$\frac{1}{n}\sum_{i=1}^{n} p(x_{i}) = \int_{\mathbb{S}^{d}} p(x)\,\mathrm{d}\mu(x) \tag{1.3}$$

holds for all polynomials $p$ of total degree $\leq t$ in $d+1$ variables. Since the work of [SZ84], it is known that spherical $t$-designs exist for all combinations of $t$ and $d$, provided that $n$ is sufficiently large. In [BRV13] it has finally been proven that spherical $t$-designs in $\mathbb{S}^{d}$ exist for all $n\geq C_{d}t^{d}$ with a numerical constant $C_{d}>0$ depending only on the dimension of the sphere. This result is essentially optimal since for the minimal number $N(d,t)$ of points forming a spherical $t$-design the estimate $N(d,t)\gtrsim t^{d}$ has already been established in [DGS77].

The notion of spherical $t$-designs, originally introduced in the field of algebraic combinatorics, has connections to various fields of mathematics, and we refer to the survey article [BB09] for a comprehensive overview. In the area of numerical analysis, spherical $t$-designs have already attracted some interest, for instance, as cubature points for numerical integration of functions on spheres [HSW10]. Spherical $t$-designs of small cardinality and low dimension are explicitly known and correspond to highly symmetrical point configurations. For instance, spherical $t$-designs on the two-dimensional sphere are obtained as the vertex sets of the regular tetrahedron (for $t=2$), the cube and the regular octahedron (both for $t=3$), the regular dodecahedron and the regular icosahedron (both for $t=5$). For larger values of $t$, spherical $t$-designs possessing good geometric properties are also known to exist [Wom18]. Figure 1 illustrates spherical $t$-designs on the two-dimensional sphere for two larger values of $t$, highlighting their excellent distributional properties.
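The defining identity (1.3) can be checked numerically for the small polyhedral designs just listed. The Python sketch below is illustrative only; the names `sphere_moment` and `is_t_design` are our own. Since polynomials of degree $\leq t$ are spanned by monomials and (1.3) is linear in $p$, it suffices to test monomials, and their exact integrals against $\mu$ are given by the standard Gamma-function formula for monomial moments of the uniform distribution on the sphere (zero whenever some exponent is odd).

```python
import itertools
import math


def sphere_moment(a):
    """Exact integral of prod_i x_i**a_i against the normalized surface
    measure on S^d, where len(a) = d + 1 is the ambient dimension."""
    if any(k % 2 for k in a):
        return 0.0  # odd in some coordinate, so the integral vanishes by symmetry
    n = len(a)
    log_val = (math.lgamma(n / 2)
               + sum(math.lgamma((k + 1) / 2) for k in a)
               - (n / 2) * math.log(math.pi)
               - math.lgamma((sum(a) + n) / 2))
    return math.exp(log_val)


def is_t_design(points, t, tol=1e-9):
    """Check the equal-weight cubature identity (1.3) for every monomial
    of total degree <= t in d + 1 variables."""
    dim = len(points[0])
    for a in itertools.product(range(t + 1), repeat=dim):
        if sum(a) > t:
            continue
        avg = sum(math.prod(x ** k for x, k in zip(p, a)) for p in points) / len(points)
        if abs(avg - sphere_moment(a)) > tol:
            return False
    return True


octahedron = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
c = 1 / math.sqrt(3)
tetrahedron = [(c, c, c), (c, -c, -c), (-c, c, -c), (-c, -c, c)]

print(is_t_design(octahedron, 3), is_t_design(octahedron, 4))    # True False
print(is_t_design(tetrahedron, 2), is_t_design(tetrahedron, 3))  # True False
```

The octahedron fails at degree 4 because the average of $x_1^4$ over its vertices is $1/3$, whereas the exact spherical integral is $1/5$.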

In view of these properties, spherical $t$-designs seem to be a promising choice of sampling points in (non-parametric) regression with spherical regressors of arbitrary dimension. This heuristic will be supported in Section 3 by means of appropriate asymptotic equivalence results. The first main result of this paper provides an upper bound on the Le Cam distance between the models (1.1) and (1.2) for a general function class $\Theta$ when the fixed sampling points form a spherical $t$-design. In the sequel, this general result is applied to two special cases: First, we prove global asymptotic equivalence for regression functions from spherical Sobolev balls of smoothness $s>d/2$. Sobolev spaces on spheres can either be defined in terms of charts or via the decay of coefficients in spherical harmonic expansions. We rely on the latter characterization, which is sufficient for our purposes. As a second application, which extends the first one, we consider spherical Besov spaces $B^{s}_{r,q}(\mathbb{S}^{d})$, which can be defined in terms of a function’s coefficients with respect to a set of spherical needlets. For this more general setup, we derive asymptotic equivalence between the models (1.1) and (1.2) under the assumption $s>d/r$. This condition and the analogous condition $s>d/2$ in the Sobolev case guarantee that the considered function spaces can be continuously embedded in the Banach space $C(\mathbb{S}^{d})$ of continuous functions on the sphere.
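To see heuristically where $s>d/2$ comes from, suppose, as an assumption of this sketch (the exact normalization used in the paper is not reproduced here), that the Sobolev norm is $\lVert f\rVert_{H^{s}}^{2}=\sum_{\ell}(1+\lambda^{d}_{\ell})^{s}\sum_{m}\theta_{\ell,m}^{2}$, where $\lambda^{d}_{\ell}=\ell(\ell+d-1)$ are the eigenvalues of the Laplace-Beltrami operator and $N^{d}_{\ell}\asymp\ell^{d-1}$ is the number of spherical harmonics of degree $\ell$. Applying the Cauchy-Schwarz inequality twice, together with the identity $\sum_{m}Y_{\ell,m}^{2}(x)=N^{d}_{\ell}$, gives for every $x\in\mathbb{S}^{d}$

```latex
\lvert f(x)\rvert
  \le \sum_{\ell=0}^{\infty}\Bigl\lvert\sum_{m=1}^{N_\ell^d}\theta_{\ell,m}Y_{\ell,m}(x)\Bigr\rvert
  \le \sum_{\ell=0}^{\infty}\Bigl(\sum_{m=1}^{N_\ell^d}\theta_{\ell,m}^2\Bigr)^{1/2}\bigl(N_\ell^d\bigr)^{1/2}
  \le \lVert f\rVert_{H^s}\Bigl(\sum_{\ell=0}^{\infty} N_\ell^d\,(1+\lambda_\ell^d)^{-s}\Bigr)^{1/2}.
```

Since $N^{d}_{\ell}(1+\lambda^{d}_{\ell})^{-s}\asymp\ell^{d-1-2s}$, the last series converges exactly when $s>d/2$. In that case the harmonic expansion converges absolutely and uniformly, so $f$ is continuous with $\lVert f\rVert_{\infty}\lesssim\lVert f\rVert_{H^{s}}$, which is the embedding into $C(\mathbb{S}^{d})$ mentioned above.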

Proving global asymptotic equivalence over Sobolev and Besov balls builds on results from numerical analysis concerning approximation by a so-called hyperinterpolation [Slo95], which, in statistical terminology, coincides with the least-squares estimator in certain cases. The hyperinterpolation of the regression function is used to define an intermediate experiment between the experiments defined by (1.1) and (1.2), respectively. Since the sample size $n$ must in general be chosen strictly larger than the model dimension of the intermediate experiment, the hyperinterpolation usually does not interpolate the regression function at the design points. In contrast, in the cited papers for the Euclidean setting, for instance [Rei08], asymptotic equivalence is proven by means of an intermediate experiment defined in terms of a suitable interpolation of the regression function. Consequently, in the Euclidean case, the intermediate experiment and the regression experiment (1.1) are equivalent for every fixed sample size, not merely asymptotically. In the spherical case, equivalence in the sense of Le Cam between the intermediate experiment and the regression experiment holds only asymptotically, and additional estimates are necessary. In the general bound on the Le Cam distance between the experiments (1.1) and (1.2), this leads to an extra term that is not present in the Euclidean setting considered in [Rei08].
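To make the hyperinterpolation idea concrete, here is a small Python sketch for the simplest sphere, the circle $\mathbb{S}^{1}$ ($d=1$), chosen purely for illustration. Assumptions of the sketch: the degree-$\ell$ harmonics are taken as $\sqrt{2}\cos(\ell\theta)$ and $\sqrt{2}\sin(\ell\theta)$, which are orthonormal for the normalized measure $\mathrm{d}\theta/(2\pi)$; $n$ equally spaced nodes form a spherical $(n-1)$-design; and the hyperinterpolant of degree $L$ is $\sum_{Y}\langle f,Y\rangle_{n}Y$ over an orthonormal basis of $\mathscr{P}^{1}_{L}$, with $\langle g,h\rangle_{n}=\frac{1}{n}\sum_{i}g(x_{i})h(x_{i})$, following Sloan's construction. All helper names are hypothetical.

```python
import math


def circle_harmonics(L):
    """Orthonormal basis of P_L^1 on S^1 w.r.t. d(theta)/(2*pi):
    1, sqrt(2)cos(l*theta), sqrt(2)sin(l*theta) for 1 <= l <= L."""
    basis = [lambda th: 1.0]
    for l in range(1, L + 1):
        basis.append(lambda th, l=l: math.sqrt(2) * math.cos(l * th))
        basis.append(lambda th, l=l: math.sqrt(2) * math.sin(l * th))
    return basis


def hyperinterpolant(f, L, n):
    """Degree-L hyperinterpolant of f from n equally spaced nodes on S^1.

    The nodes form a spherical (n - 1)-design, so n >= 2L + 1 makes the
    equal-weight rule exact for products of two degree-L harmonics."""
    if n < 2 * L + 1:
        raise ValueError("need n >= 2L + 1 nodes")
    nodes = [2 * math.pi * i / n for i in range(n)]
    basis = circle_harmonics(L)
    # empirical coefficients <f, Y>_n = (1/n) * sum_i f(x_i) * Y(x_i)
    coef = [sum(f(x) * Y(x) for x in nodes) / n for Y in basis]

    def approx(th):
        return sum(c * Y(th) for c, Y in zip(coef, basis))

    return nodes, approx


def max_node_error(f, L, n):
    """Largest deviation between f and its hyperinterpolant at the nodes."""
    nodes, approx = hyperinterpolant(f, L, n)
    return max(abs(f(x) - approx(x)) for x in nodes)


def smooth(th):
    return math.exp(math.cos(th))  # smooth, but not a trigonometric polynomial


def trig_poly(th):
    return 1 + math.cos(th) - 0.5 * math.sin(2 * th)  # lies in P_2^1


print(max_node_error(trig_poly, 2, 9) < 1e-12)  # True: exact on P_L
print(max_node_error(smooth, 2, 5) < 1e-12)     # True: n = 2L + 1 interpolates
print(max_node_error(smooth, 2, 9) > 1e-3)      # True: n > 2L + 1 does not
```

When $n=2L+1=\dim\mathscr{P}^{1}_{L}$, the discrete kernel is a Dirichlet kernel that vanishes at all other nodes, so the hyperinterpolant interpolates; once $n$ exceeds the model dimension it is a discrete least-squares fit rather than an interpolant, mirroring the situation described above.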

In the second part of the paper, we will consider the regression model (1.1) where the design points are i.i.d. according to the uniform distribution $\mathcal{U}(\mathbb{S}^{d})$ , that is, the normalized surface area measure on the sphere. This model is more realistic in many applications where sampling points cannot be chosen by the experimenter but are themselves random. Two instances of random uniform designs, with sample sizes equal to the ones in Figure 1, are shown in Figure 2 for the case $d=2$ .

A comparison of Figures 1 and 2 suggests that a typical realization of a random design is much less regular than a spherical $t$-design with good geometric properties (there exist both data voids and exceptionally close sampling points within the random design). Furthermore, the exact cubature formula (1.3) for spherical $t$-designs holds only in expectation in the random uniform design case, which makes the analysis much more involved. This has already been noticed in [Bro+02] and [Rei08], where a separate investigation of low- and high-frequency coefficients of the regression function was necessary in order to establish asymptotic equivalence with a Gaussian white noise model analogous to (1.2). This separate treatment of low- and high-frequency coefficients appears also in our analysis, which in its overall structure follows the one from [Rei08]. However, some special properties related to the underlying spherical geometry are of importance and additional tools from numerical analysis and representation theory will be used. Using these tools, asymptotic equivalence of random uniform design regression and the Gaussian white noise model is proven over Sobolev balls on the sphere. As in the fixed design case, asymptotic equivalence holds for $s>d/2$. The non-equivalence of both fixed and random design regression and Gaussian white noise in the regime $s\leq d/2$ for Sobolev balls (or $s\leq d/r$ for Besov balls) is established in Section 5.
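The remark that (1.3) holds only in expectation for a random uniform design can be illustrated by simulation. The Python sketch below assumes the standard construction of $\mathcal{U}(\mathbb{S}^{d})$ by normalizing standard Gaussian vectors in $\mathbb{R}^{d+1}$ (their direction is uniform by rotation invariance); the helper names are hypothetical. For the degree-2 polynomial $p(x)=x_{1}^{2}$ on $\mathbb{S}^{2}$, whose exact integral is $1/3$, the equal-weight cubature error is random with mean zero and a spread of order $n^{-1/2}$, whereas a spherical $3$-design such as the octahedron integrates $p$ exactly.

```python
import math
import random


def uniform_sphere(n, d, rng):
    """Draw n i.i.d. points from U(S^d) by normalizing standard Gaussian
    vectors in R^(d+1); rotation invariance makes the direction uniform."""
    points = []
    for _ in range(n):
        g = [rng.gauss(0.0, 1.0) for _ in range(d + 1)]
        r = math.sqrt(sum(v * v for v in g))
        points.append([v / r for v in g])
    return points


def cubature_error(points, p, exact):
    """(1/n) * sum_i p(X_i) - int p dmu for one realization of the design."""
    return sum(p(x) for x in points) / len(points) - exact


def first_coord_squared(x):
    return x[0] ** 2  # degree-2 polynomial; its integral over S^2 is 1/3


rng = random.Random(0)
for n in (100, 2500):
    errors = [cubature_error(uniform_sphere(n, 2, rng), first_coord_squared, 1 / 3)
              for _ in range(20)]
    rms = math.sqrt(sum(e * e for e in errors) / len(errors))
    # mean-zero random error; its RMS shrinks roughly like 0.3 / sqrt(n)
    print(n, round(rms, 4))
```

The constant $0.3$ in the comment is the standard deviation $\sqrt{4/45}$ of $x_{1}^{2}$ under $\mathcal{U}(\mathbb{S}^{2})$.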

The rest of the paper is organized as follows. In the preliminary Section 2, we provide some background on spherical harmonic expansions, asymptotic equivalence theory, and the Gaussian white noise model. In Section 3.1, we derive the announced general bound on the Le Cam distance between the regression experiment defined by (1.1) on a spherical $t$-design and the Gaussian white noise model (1.2). This bound is then used to establish asymptotic equivalence between these models when the target functional parameter belongs to a Sobolev (Section 3.2.1) or Besov (Section 3.2.2) ball. Section 4 deals with asymptotic equivalence of random uniform design regression and Gaussian white noise over Sobolev balls. In Section 5, we prove matching non-equivalence results showing that the smoothness assumptions from Sections 3 and 4 cannot be improved. In Section 6, we conclude with a brief discussion indicating some connections to optimal design theory and open problems. All proofs are deferred to Section 7.

Notation

For sequences $(a_{n})$, $(b_{n})$ we write $a_{n}\lesssim b_{n}$ if there exists a universal (and irrelevant) numerical constant $C>0$ such that $a_{n}\leq Cb_{n}$ holds. The notation $\gtrsim$ is defined analogously, and $a_{n}\asymp b_{n}$ means that both $a_{n}\lesssim b_{n}$ and $b_{n}\lesssim a_{n}$ hold simultaneously. Throughout, we denote the identity operator on a space $S$ by $\mathrm{Id}_{S}$, and the $m\times m$ identity matrix by $I_{m}$. A norm $\lVert\,\boldsymbol{\cdot}\,\rVert$ without subscript refers to the usual Euclidean norm (where the dimension of the Euclidean space is suppressed in the notation).

2 Preliminaries

2.1 Spherical harmonics

Let $\mu$ be the normalized surface area measure on the $d$-dimensional sphere $\mathbb{S}^{d}=\{x\in\mathbb{R}^{d+1}:\lVert x\rVert=1\}$ in $(d+1)$-dimensional ambient Euclidean space. We denote by

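$$L^{2}(\mathbb{S}^{d}) = \Bigl\{ f\colon\mathbb{S}^{d}\to\mathbb{R} \;:\; \lVert f\rVert_{L^{2}(\mathbb{S}^{d})}^{2} := \int_{\mathbb{S}^{d}} f^{2}(x)\,\mathrm{d}\mu(x) < \infty \Bigr\}$$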

the Hilbert space of (equivalence classes of) square-integrable, real-valued functions on $\mathbb{S}^{d}$ . More generally, for any $p\in[1,\infty]$ , we consider the Banach spaces

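$$L^{p}(\mathbb{S}^{d}) = \Bigl\{ f\colon\mathbb{S}^{d}\to\mathbb{R} \;:\; \lVert f\rVert_{L^{p}(\mathbb{S}^{d})}^{p} := \int_{\mathbb{S}^{d}} \lvert f(x)\rvert^{p}\,\mathrm{d}\mu(x) < \infty \Bigr\},$$

with the usual modification $\lVert f\rVert_{L^{\infty}(\mathbb{S}^{d})} := \operatorname{ess\,sup}_{x\in\mathbb{S}^{d}}\lvert f(x)\rvert$ for $p=\infty$.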

The Laplace-Beltrami operator $\Delta=\mathrm{div}\circ\nabla$ on $\mathbb{S}^{d}$ gives rise to the decomposition

$$L^{2}(\mathbb{S}^{d}) = \bigoplus_{\ell=0}^{\infty}\mathscr{H}^{d}_{\ell},$$

where $\mathscr{H}^{d}_{\ell}$ , $\ell\in\mathbb{N}_{0}$ , denote the eigenspaces associated with the increasing sequence $(\lambda^{d}_{\ell})$ of non-negative eigenvalues of $-\Delta$ . It is known that $\lambda^{d}_{\ell}=\ell(\ell+d-1)$ and that the dimension of $\mathscr{H}^{d}_{\ell}$ equals

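$$N^{d}_{\ell} := \dim\mathscr{H}^{d}_{\ell} = \binom{\ell+d}{d} - \binom{\ell+d-2}{d} \asymp \ell^{d-1}.$$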
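The dimension formula $N^{d}_{\ell}=\binom{\ell+d}{d}-\binom{\ell+d-2}{d}$ is easy to evaluate. The short Python sketch below (function name hypothetical, $d\geq 1$ assumed) treats $\ell=0$ separately, since $\mathscr{H}^{d}_{0}$ consists of the constants, and reproduces the familiar counts $2$ on the circle, $2\ell+1$ on $\mathbb{S}^{2}$, and $(\ell+1)^{2}$ on $\mathbb{S}^{3}$.

```python
from math import comb


def harmonic_dim(l, d):
    """N_l^d = dim H_l^d = C(l+d, d) - C(l+d-2, d) on the sphere S^d in R^(d+1)."""
    if l == 0:
        return 1  # constants; also avoids C(d-2, d) with d - 2 < 0 when d = 1
    return comb(l + d, d) - comb(l + d - 2, d)  # comb(n, k) is 0 when k > n


print([harmonic_dim(l, 1) for l in range(5)])  # [1, 2, 2, 2, 2]
print([harmonic_dim(l, 2) for l in range(5)])  # [1, 3, 5, 7, 9]
print([harmonic_dim(l, 3) for l in range(5)])  # [1, 4, 9, 16, 25]
```

For fixed $d\geq 1$ the leading term is $2\ell^{d-1}/(d-1)!$, which is the growth $N^{d}_{\ell}\asymp\ell^{d-1}$ stated above.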

For any $\ell\in\mathbb{N}_{0}$ , we choose an orthonormal basis $\{Y_{\ell,m},m=1,\ldots,N^{d}_{\ell}\}$ of $\mathscr{H}^{d}_{\ell}$ consisting of real-valued eigenfunctions $Y_{\ell,m}$ of $-\Delta$ associated with the eigenvalue $\lambda^{d}_{\ell}$ . Any such set of orthonormal basis functions is referred to as spherical harmonics of degree $\ell$ . Any $f\in L^{2}(\mathbb{S}^{d})$ can be represented as an infinite series

$$f = \sum_{\ell=0}^{\infty}\sum_{m=1}^{N^{d}_{\ell}} \theta_{\ell,m}Y_{\ell,m}, \tag{2.1}$$

with coefficients

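$$\theta_{\ell,m} = \langle f, Y_{\ell,m}\rangle_{L^{2}(\mathbb{S}^{d})} = \int_{\mathbb{S}^{d}} f(x)Y_{\ell,m}(x)\,\mathrm{d}\mu(x).$$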

For convenience, we suppress the dependence of the spherical harmonics $Y_{\ell,m}$ and the corresponding coefficients $\theta_{\ell,m}$ on $d$ in our notation. On occasion, it will be convenient to switch from the double index notation to a single index. This is achieved by a one-to-one enumeration function

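$$\iota\colon\{(\ell,m)\in\mathbb{N}_{0}\times\mathbb{N} : 1\leq m\leq N^{d}_{\ell}\}\to\mathbb{N}, \qquad (\ell,m)\mapsto\iota(\ell,m),$$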

for which we additionally assume that

$$\iota(\ell,m)\leq\iota(\ell',m') \iff \ell<\ell' \text{ or } (\ell=\ell' \text{ and } m\leq m').$$

In this case, we set $Y_{j}=Y_{\ell,m}$ and $\theta_{j}=\theta_{\ell,m}$ if $j=\iota(\ell,m)$. For instance, the representation (2.1) can alternatively be written as

$$f = \sum_{j=1}^{\infty}\theta_{j}Y_{j}.$$
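If $\iota$ is, in addition, onto $\mathbb{N}$ (which the single-index series implicitly requires), the ordering condition leaves no freedom: all harmonics of degree below $\ell$ come first, so $\iota(\ell,m)=\dim\mathscr{P}^{d}_{\ell-1}+m$ with $\dim\mathscr{P}^{d}_{-1}:=0$. A Python sketch of this enumeration and its inverse, with hypothetical helper names and $d\geq 1$ assumed:

```python
from math import comb


def harmonic_dim(l, d):
    """N_l^d, the number of spherical harmonics of degree l on S^d (d >= 1)."""
    return 1 if l == 0 else comb(l + d, d) - comb(l + d - 2, d)


def iota(l, m, d):
    """Single index j = iota(l, m): every harmonic of degree k < l comes
    first, then Y_{l,1}, ..., Y_{l,N_l^d} in order of m."""
    if not 1 <= m <= harmonic_dim(l, d):
        raise ValueError("m must satisfy 1 <= m <= N_l^d")
    return sum(harmonic_dim(k, d) for k in range(l)) + m


def iota_inverse(j, d):
    """Recover (l, m) from j >= 1 by peeling off whole degree blocks."""
    l = 0
    while j > harmonic_dim(l, d):
        j -= harmonic_dim(l, d)
        l += 1
    return l, j


print([iota_inverse(j, 2) for j in range(1, 7)])
# [(0, 1), (1, 1), (1, 2), (1, 3), (2, 1), (2, 2)]
```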

We denote by

$$\mathscr{P}^{d}_{L} = \bigoplus_{\ell=0}^{L}\mathscr{H}^{d}_{\ell}$$

the finite-dimensional space spanned by all spherical harmonics up to degree $L\in\mathbb{N}_{0}$ which has dimension

$$\dim\mathscr{P}^{d}_{L} = \sum_{\ell=0}^{L} N^{d}_{\ell} \asymp L^{d}.$$
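The growth $\dim\mathscr{P}^{d}_{L}\asymp L^{d}$ can be made explicit. Inserting $N^{d}_{\ell}=\binom{\ell+d}{d}-\binom{\ell+d-2}{d}$ and using the convention $\binom{k}{d}:=0$ for integers $k<d$ (our convention, chosen so that the formula also returns $N^{d}_{0}=1$), the sum telescopes:

```latex
\dim\mathscr{P}_L^d
  = \sum_{\ell=0}^{L}\left[\binom{\ell+d}{d}-\binom{\ell+d-2}{d}\right]
  = \binom{L+d}{d}+\binom{L+d-1}{d}
  \sim \frac{2L^d}{d!} \qquad (L\to\infty).
```

For $d=2$ this gives $\binom{L+2}{2}+\binom{L+1}{2}=(L+1)^{2}$, the familiar number of spherical harmonics of degree at most $L$ on $\mathbb{S}^{2}$.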

We write $\Pi_{V}f$ for the $L^{2}(\mathbb{S}^{d})$-orthogonal projection of $f$ onto some subspace $V$ of $L^{2}(\mathbb{S}^{d})$. In particular, for $f$ in (2.1) we have

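$$\Pi_{\mathscr{H}^{d}_{\ell}}f = \sum_{m=1}^{N^{d}_{\ell}}\theta_{\ell,m}Y_{\ell,m} \quad\text{and}\quad \Pi_{\mathscr{P}^{d}_{L}}f = \sum_{\ell=0}^{L}\sum_{m=1}^{N^{d}_{\ell}}\theta_{\ell,m}Y_{\ell,m}.$$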

In the proofs of Section 7, we will on several occasions exploit the addition formula

$$\sum_{m=1}^{N^{d}_{\ell}} Y_{\ell,m}^{2}(x) = N^{d}_{\ell}, \qquad x\in\mathbb{S}^{d},$$

for spherical harmonics (see [DX13], Lemma 1.2.3 and Corollary 1.2.7).
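As a quick sanity check of the identity $\sum_{m}Y_{\ell,m}^{2}(x)=N^{d}_{\ell}$ (our own illustration), take $\ell=1$. Linear functions restricted to $\mathbb{S}^{d}$ are spherical harmonics of degree $1$, and $N^{d}_{1}=d+1$. By reflection symmetry and $x_{1}^{2}+\dots+x_{d+1}^{2}=1$ on the sphere, $\int x_{m}x_{m'}\,\mathrm{d}\mu(x)=\delta_{mm'}/(d+1)$, so $Y_{1,m}(x)=\sqrt{d+1}\,x_{m}$, $m=1,\ldots,d+1$, is an orthonormal basis of $\mathscr{H}^{d}_{1}$, and

```latex
\sum_{m=1}^{d+1} Y_{1,m}^2(x) = (d+1)\sum_{m=1}^{d+1} x_m^2 = d+1 = N_1^d,
\qquad x = (x_1,\dots,x_{d+1}) \in \mathbb{S}^d .
```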

2.2 Asymptotic equivalence of statistical experiments

We briefly recall the notion of Le Cam distance between statistical experiments and collect some properties that are essential for the remainder of the paper. Let $\mathfrak{E}=(\mathsf{X}_{\mathfrak{E}},\mathcal{A}_{\mathfrak{E}},(\mathbf{P}_{\theta}^{\mathfrak{E}})_{\theta\in\Theta})$ and $\mathfrak{F}=(\mathsf{X}_{\mathfrak{F}},\mathcal{A}_{\mathfrak{F}},(\mathbf{P}^{\mathfrak{F}}_{\theta})_{\theta\in\Theta})$ be two statistical experiments sharing the same parameter space $\Theta$ . Recall that a Markov kernel $\mathsf{K}\colon\mathsf{X}_{\mathfrak{E}}\times\mathcal{A}_{\mathfrak{F}}\to[0,1]$ induces a map that transports probability measures from $(\mathsf{X}_{\mathfrak{E}},\mathcal{A}_{\mathfrak{E}})$ to $(\mathsf{X}_{\mathfrak{F}},\mathcal{A}_{\mathfrak{F}})$ . Given a probability measure $\mathbf{P}_{\theta}^{\mathfrak{E}}$ on $(\mathsf{X}_{\mathfrak{E}},\mathcal{A}_{\mathfrak{E}})$ , the probability measure $\mathsf{K}\mathbf{P}_{\theta}^{\mathfrak{E}}$ on $(\mathsf{X}_{\mathfrak{F}},\mathcal{A}_{\mathfrak{F}})$ is defined by

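$$\mathsf{K}\mathbf{P}^{\mathfrak{E}}_{\theta}(A) = \int_{\mathsf{X}_{\mathfrak{E}}}\mathsf{K}(x,A)\,\mathrm{d}\mathbf{P}^{\mathfrak{E}}_{\theta}(x), \qquad A\in\mathcal{A}_{\mathfrak{F}}.$$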

The Le Cam distance between $\mathfrak{E}$ and $\mathfrak{F}$ is defined as the symmetrized quantity

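$$\Delta(\mathfrak{E},\mathfrak{F}) = \max\{\delta(\mathfrak{E},\mathfrak{F}),\,\delta(\mathfrak{F},\mathfrak{E})\},$$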

where the deficiency $\delta(\mathfrak{E},\mathfrak{F})$ is defined by

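$$\delta(\mathfrak{E},\mathfrak{F}) = \inf_{\mathsf{K}}\sup_{\theta\in\Theta} V(\mathsf{K}\mathbf{P}^{\mathfrak{E}}_{\theta},\mathbf{P}^{\mathfrak{F}}_{\theta}).$$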

The infimum in this definition is taken over all Markov kernels $\mathsf{K}\colon(\mathsf{X}_{\mathfrak{E}},\mathcal{A}_{\mathfrak{E}})\to(\mathsf{X}_{\mathfrak{F}},\mathcal{A}_{\mathfrak{F}})$, and $V$ denotes the total variation distance. If $\Delta(\mathfrak{E},\mathfrak{F})=0$, the experiments $\mathfrak{E}$ and $\mathfrak{F}$ are said to be (exactly) equivalent in the sense of Le Cam. More generally, two sequences $(\mathfrak{E}_{n})$ and $(\mathfrak{F}_{n})$ of statistical experiments having the same parameter space $\Theta$ are said to be asymptotically equivalent if

$$\lim_{n\to\infty}\Delta(\mathfrak{E}_{n},\mathfrak{F}_{n}) = 0.$$

2.3 Gaussian white noise

Consider the Gaussian white noise model with continuous observation

$$\mathrm{d}Z(x) = f(x)\,\mathrm{d}\mu(x) + \widetilde{\sigma}\,\mathrm{d}W(x), \tag{2.4}$$

where $f\in L^{2}(\mathbb{S}^{d})$, $\widetilde{\sigma}>0$, and $\mathrm{d}W$ is standard Gaussian white noise on the $d$-dimensional sphere. The stochastic differential equation (2.4) can be interpreted in a distributional sense as follows: (2.4) is equivalent to observing a Gaussian process $G$, indexed by the set $L^{2}(\mathbb{S}^{d})$ of test functions, which is defined by

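$$G_{g} = \int_{\mathbb{S}^{d}} f(x)g(x)\,\mathrm{d}\mu(x) + \widetilde{\sigma}\int_{\mathbb{S}^{d}} g(x)\,\mathrm{d}W(x)$$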

for $g\in L^{2}(\mathbb{S}^{d})$ . Here, the white noise part

$$g\mapsto\int_{\mathbb{S}^{d}} g(x)\,\mathrm{d}W(x)$$

is a centered Gaussian process with covariance structure

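$$\operatorname{Cov}\Bigl(\int g(x)\,\mathrm{d}W(x),\int h(x)\,\mathrm{d}W(x)\Bigr) = \langle g,h\rangle_{L^{2}(\mathbb{S}^{d})}.$$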

Evaluating the process $G$ at an orthonormal basis $\{\varphi_{j}\}_{j\in\mathbb{N}}$ of $L^{2}(\mathbb{S}^{d})$ shows that observing (2.4) is equivalent to the infinite-dimensional Gaussian sequence model

$$y_{j} = G_{\varphi_{j}} = \theta_{j} + \widetilde{\sigma}\eta_{j}, \qquad j\in\mathbb{N},$$

where $\theta_{j}=\langle f,\varphi_{j}\rangle_{L^{2}(\mathbb{S}^{d})}$ is the sequence of Fourier coefficients (with respect to the basis $\{\varphi_{j}\}_{j\in\mathbb{N}}$ ) and $(\eta_{j})_{j\in\mathbb{N}}$ is a sequence of independent standard Gaussian random variables. Let $\mathbf{P}_{f}$ denote the distribution of the process $Z$ in (2.4). The following expression for the total variation distance between $\mathbf{P}_{f}$ and $\mathbf{P}_{g}$ is derived in [Car06], Section 3.2, and will be used frequently in the proofs of Section 7:

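$$V(\mathbf{P}_{f},\mathbf{P}_{g}) = 1 - 2\Phi\Bigl(-\frac{\lVert f-g\rVert_{L^{2}(\mathbb{S}^{d})}}{2\widetilde{\sigma}}\Bigr) \lesssim \widetilde{\sigma}^{-1}\lVert f-g\rVert_{L^{2}(\mathbb{S}^{d})},$$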

where $\Phi$ denotes the distribution function of a standard Gaussian random variable.
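The total variation identity above can be evaluated directly, and the constant hidden in $\lesssim$ made explicit: since $1-2\Phi(-a)=2(\Phi(a)-\Phi(0))\leq 2a\varphi(0)$ with $\varphi(0)=1/\sqrt{2\pi}$ the standard Gaussian density at zero, one obtains $V(\mathbf{P}_{f},\mathbf{P}_{g})\leq\lVert f-g\rVert_{L^{2}(\mathbb{S}^{d})}/(\sqrt{2\pi}\,\widetilde{\sigma})$. A minimal Python sketch, with function names of our own choosing and the $L^{2}$ distance passed in as a number:

```python
import math


def std_normal_cdf(z):
    """Phi(z), the standard Gaussian distribution function, via math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tv_white_noise(dist, sigma_tilde):
    """V(P_f, P_g) = 1 - 2*Phi(-||f - g|| / (2*sigma_tilde)) for two white noise experiments."""
    return 1.0 - 2.0 * std_normal_cdf(-dist / (2.0 * sigma_tilde))


def tv_linear_bound(dist, sigma_tilde):
    """Explicit bound V <= ||f - g|| / (sqrt(2*pi) * sigma_tilde),
    because 1 - 2*Phi(-a) = 2*(Phi(a) - Phi(0)) <= 2*a*phi(0)."""
    return dist / (math.sqrt(2.0 * math.pi) * sigma_tilde)


sigma, n = 1.0, 10_000
sigma_tilde = sigma / math.sqrt(n)  # noise level used in the white noise experiment (3.2)
for dist in (1e-4, 1e-3, 1e-2):
    print(dist, tv_white_noise(dist, sigma_tilde), tv_linear_bound(dist, sigma_tilde))
```

With $\widetilde{\sigma}=\sigma/\sqrt{n}$, the bound reads $\sqrt{n}\,\lVert f-g\rVert_{L^{2}(\mathbb{S}^{d})}/(\sqrt{2\pi}\,\sigma)$, so two white noise experiments whose drifts differ by $o(n^{-1/2})$ in $L^{2}(\mathbb{S}^{d})$ are asymptotically indistinguishable in total variation.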

3 Regression on spherical $t$-designs

For a class $\Theta$ of functions $f\colon\mathbb{S}^{d}\to\mathbb{R}$ and a finite, non-empty set $\mathbb{X}=\{x_{1},\ldots,x_{n}\}\subset\mathbb{S}^{d}$ of fixed sampling points, we denote by $\mathfrak{F}^{d}_{n}=\mathfrak{F}^{d}_{n}(\Theta,\mathbb{X})$ the regression experiment with observations

$$Z_{i} = f(x_{i}) + \sigma\varepsilon_{i}, \qquad i = 1,\dots,n, \tag{3.1}$$

where $f\in\Theta$ is the unknown regression function, $\varepsilon_{1},\ldots,\varepsilon_{n}$ are i.i.d. standard Gaussian, and $\sigma>0$ . We assume that the standard deviation $\sigma$ of the additive noise is known and suppress the dependence of the experiment $\mathfrak{F}^{d}_{n}$ on $\sigma$ in the notation. Similarly, we denote by $\mathfrak{G}^{d}_{n}=\mathfrak{G}^{d}_{n}(\Theta)$ the Gaussian white noise experiment defined via Equation (1.2) in the introduction with $\widetilde{\sigma}=\sigma/\sqrt{n}$ , that is, given by

$$\mathrm{d}Z(x) = f(x)\,\mathrm{d}\mu(x) + \frac{\sigma}{\sqrt{n}}\,\mathrm{d}W(x). \tag{3.2}$$
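A short heuristic computation, offered as a sketch rather than as the argument of Section 7, indicates where the noise level $\sigma/\sqrt{n}$ in (3.2) and a condition of the form $t\geq 2L$ come from. Let $Y_{1},\ldots,Y_{M}$ be the spherical harmonics spanning $\mathscr{P}^{d}_{L}$ (single-index notation), let $\mathbb{X}=\{x_{1},\ldots,x_{n}\}$ be a spherical $t$-design with $t\geq 2L$, and write $\langle g,h\rangle_{n}:=\frac{1}{n}\sum_{i=1}^{n}g(x_{i})h(x_{i})$ for the equal-weight cubature inner product (assumed here to match the notation used below). Projecting the observations (3.1) onto $Y_{j}$ gives

```latex
\begin{aligned}
y_j &:= \frac{1}{n}\sum_{i=1}^{n} Z_i\,Y_j(x_i)
     = \langle f, Y_j\rangle_n + \xi_j,
  \qquad \xi_j := \frac{\sigma}{n}\sum_{i=1}^{n}\varepsilon_i\,Y_j(x_i),\\
\operatorname{Cov}(\xi_j,\xi_k)
  &= \frac{\sigma^2}{n^2}\sum_{i=1}^{n} Y_j(x_i)\,Y_k(x_i)
   = \frac{\sigma^2}{n}\,\langle Y_j, Y_k\rangle_n
   = \frac{\sigma^2}{n}\,\langle Y_j, Y_k\rangle_{L^2(\mathbb{S}^d)}
   = \frac{\sigma^2}{n}\,\delta_{jk}.
\end{aligned}
```

The third equality is where $t\geq 2L$ enters: $Y_{j}Y_{k}$ is a polynomial of degree at most $2L\leq t$, so the equal-weight rule integrates it exactly. Being jointly Gaussian and uncorrelated, the $\xi_{j}$ are independent $\mathcal{N}(0,\sigma^{2}/n)$ variables, that is, they carry the noise level $\sigma/\sqrt{n}$ of (3.2) written in sequence form. The drifts $\langle f,Y_{j}\rangle_{n}$ and $\theta_{j}=\langle f,Y_{j}\rangle_{L^{2}(\mathbb{S}^{d})}$ agree whenever $fY_{j}$ has degree at most $t$; otherwise their difference is an approximation error.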

3.1 A general bound for the Le Cam distance

In order to state a general bound on the Le Cam distance $\Delta(\mathfrak{F}^{d}_{n},\mathfrak{G}^{d}_{n})$ , we introduce some further notation. Given a finite set $\{\psi_{1},\ldots,\psi_{m}\}$ of approximating functions in $L^{2}(\mathbb{S}^{d})$ , not necessarily forming an orthonormal basis of their span $S=\mathrm{span}(\{\psi_{1},\ldots,\psi_{m}\})$ , we consider the finite-dimensional approximation

$$f_{m} = \sum_{j=1}^{m}\beta_{j}\psi_{j} \tag{3.3}$$

of a function $f\in L^{2}(\mathbb{S}^{d})$ where

[TABLE]

The empirical counterparts $\widetilde{\beta}_{j}$ of the coefficients $\beta_{j}$ are obtained by the corresponding equal-weight cubature rules at the sampling points $\mathbb{X}=\{x_{1},\ldots,x_{n}\}$ ,

[TABLE]

Since spherical harmonics are restrictions of polynomials in $d+1$ variables to the unit sphere, with the index $\ell$ indicating the degree of the polynomial associated with $Y_{\ell,m}$ (see [DX13], Chapter 1, for details), a spherical $t$-design $\mathbb{X}=\{x_{1},\ldots,x_{n}\}$ with $t=L$ yields an equal-weight cubature rule that is exact for functions in $\mathscr{P}_{L}^{d}$, that is,

[TABLE]

The semi-norm associated with the inner product $\langle\boldsymbol{\cdot},\boldsymbol{\cdot}\rangle_{n}$ is denoted by $\lVert\,\boldsymbol{\cdot}\,\rVert_{n}$ . Replacing $\beta_{j}$ in (3.3) with $\widetilde{\beta}_{j}$ leads to the empirical approximation

$$\widetilde{f}_{m} = \sum_{j=1}^{m}\widetilde{\beta}_{j}\psi_{j}.$$

Note that while $f_{m}$ depends on $f$ via exact $L^{2}(\mathbb{S}^{d})$-inner products, the approximation $\widetilde{f}_{m}$ depends on $f$ only through its values at the sampling points, by means of the cubature formula (3.4). Associated with $\widetilde{f}_{m}$, we introduce the intermediate experiment $\widetilde{\mathfrak{G}}^{d}_{n}=\widetilde{\mathfrak{G}}^{d}_{n}(\Theta,\mathbb{X})$ with continuous observation

[TABLE]

Note that the intermediate experiment $\widetilde{\mathfrak{G}}^{d}_{n}$ , contrary to the Gaussian white noise experiment $\mathfrak{G}_{n}^{d}$ , depends on the sampling points $\mathbb{X}$ via the empirical coefficients $\widetilde{\beta}_{j}$ . The following theorem states the announced upper bound on the Le Cam distance $\Delta(\mathfrak{F}_{n}^{d},\mathfrak{G}_{n}^{d})$ .

Theorem 3.1.

Consider the experiments $\mathfrak{F}_{n}^{d}$ , defined by (3.1), and $\mathfrak{G}_{n}^{d}$ , defined by (3.2), where the unknown parameter $f$ belongs to some class $\Theta$ of functions defined on the $d$ -dimensional sphere $\mathbb{S}^{d}$ . Assume that $S=\mathrm{span}(\{\psi_{1},\ldots,\psi_{m}\})\subseteq\mathscr{P}^{d}_{L}$ for some $L\in\mathbb{N}_{0}$ and that $\mathbb{X}=\{x_{1},\ldots,x_{n}\}\subset\mathbb{S}^{d}$ is a spherical $t$ -design for $t\geq 2L$ . Then, the Le Cam distance between the two experiments is bounded by

[TABLE]

where

[TABLE]

and

[TABLE]

*Remark 3.2.*

Let us briefly comment on the two terms appearing in the bound on $\Delta(\mathfrak{F}_{n}^{d},\mathfrak{G}_{n}^{d})$ derived in the theorem. The second term

[TABLE]

is not surprising as it already appears in known results for the multivariate Euclidean case studied before (see, for instance, the bound stated in Theorem 2.4 of [Rei08]). In contrast, the first term

[TABLE]

is new. In the setting of Theorem 3.1, the approximant $\widetilde{f}_{m}$ cannot, in general, be chosen to interpolate $f$ at the sampling points $x_{1},\ldots,x_{n}$. This is because so-called tight spherical $t$-designs do not exist, except in a few special cases. A more detailed discussion of this issue is given in Remark 3.6.

*Remark 3.3.*

The proof of Theorem 3.1 is constructive in the sense that it provides explicit Markov kernels for transferring observations between the two considered experiments. These kernels involve randomizations, which, however, require knowledge of the noise level $\sigma$. This assumption is common in the literature on asymptotic equivalence between non-parametric regression and Gaussian white noise experiments (see, for instance, [Bro+02, Roh04, Rei08]). Only a few works address the more realistic case of an unknown noise variance, [Car07] being a notable contribution in this direction.

*Remark 3.4.*

It follows easily from Theorem 3.1 that global asymptotic equivalence holds if the class $\Theta$ consists of functions $f$ that satisfy the smoothness condition

[TABLE]

with $L=L(n)\to\infty$ as $n\to\infty$ . This smoothness assumption appears in the numerical analysis literature since it guarantees uniform convergence of the hyperinterpolation $\widetilde{f}_{m}$ to $f$ [TVB19]. For instance, for band-limited functions, (3.7) is trivially fulfilled. More precisely, for the function class

[TABLE]

for some fixed $L^{\ast}\in\mathbb{N}_{0}$, the regression model $\mathfrak{F}_{n}^{d}$ and the Gaussian white noise model $\mathfrak{G}_{n}^{d}$ are, for $n$ (resp. $L$) sufficiently large, both exactly equivalent to observing a multivariate Gaussian vector of dimension $\dim\mathscr{P}^{d}_{L^{\ast}}=\sum_{\ell=0}^{L^{\ast}}N^{d}_{\ell}$ with unknown mean in $\mathbb{R}^{\dim\mathscr{P}^{d}_{L^{\ast}}}$ and covariance matrix $\sigma^{2}n^{-1}I_{\dim\mathscr{P}^{d}_{L^{\ast}}}$.
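The dimension count $\dim\mathscr{P}^{d}_{L^{\ast}}=\sum_{\ell=0}^{L^{\ast}}N^{d}_{\ell}$, and the growth $\dim\mathscr{P}^{d}_{L}\asymp L^{d}$ used in Section 3.2.1, can be made concrete with the classical formula $N^{d}_{\ell}=\binom{\ell+d}{d}-\binom{\ell+d-2}{d}$ ($\ell\geq 1$) for the number of linearly independent homogeneous harmonic polynomials of degree $\ell$ in $d+1$ variables. This formula is standard but is not stated in this section; the Python sketch below (hypothetical helper names) is illustrative only.

```python
from math import comb

def harmonic_dim(ell, d):
    """N_ell^d: dimension of the space of spherical harmonics of degree ell on S^d (d >= 1)."""
    if ell == 0:
        return 1
    # Homogeneous polynomials of degree ell in d + 1 variables, minus those of degree
    # ell - 2 (the image of multiplication by |x|^2); math.comb returns 0 if k > n.
    return comb(ell + d, d) - comb(ell + d - 2, d)

def poly_space_dim(L, d):
    """dim P_L^d = sum over ell = 0, ..., L of N_ell^d."""
    return sum(harmonic_dim(ell, d) for ell in range(L + 1))

print([harmonic_dim(ell, 2) for ell in range(5)])   # [1, 3, 5, 7, 9], i.e. 2*ell + 1 on S^2
print(poly_space_dim(3, 2))                         # 16 = (L + 1)^2 on S^2
# Growth dim P_L^d ~ 2 L^d / d!: the ratio below approaches 2 / 3! = 1/3 for d = 3.
print(poly_space_dim(1000, 3) / 1000**3)
```

The sum telescopes to $\binom{L+d}{d}+\binom{L+d-1}{d}$, which makes the order $L^{d}$ explicit.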

3.2 Results for specific function classes

In the following two subsections, we apply the general Theorem 3.1 to two specific function classes: Sobolev balls (Section 3.2.1) and Besov balls (Section 3.2.2).

3.2.1 Asymptotic equivalence over Sobolev balls

For a smoothness parameter $s>0$ (not necessarily an integer), we define the (fractional) Sobolev norm $\lVert\,\cdot\,\rVert_{H_{2}^{s}(\mathbb{S}^{d})}$ by

[TABLE]

where $(\theta_{\ell,m})$ is the sequence of Fourier coefficients of the function $f$ defined via (2.2). The Sobolev space $H_{2}^{s}(\mathbb{S}^{d})$ is defined as the set of functions $f\in L^{2}(\mathbb{S}^{d})$ such that $\lVert f\rVert_{H_{2}^{s}(\mathbb{S}^{d})}<\infty$. It is easy to verify that

[TABLE]

(replacing the $L^{2}(\mathbb{S}^{d})$-norm by another $L^{p}(\mathbb{S}^{d})$-norm can be used to define fractional Sobolev spaces $H_{p}^{s}(\mathbb{S}^{d})$ for general $p\in[1,\infty]$). The Fourier multiplier approach presented here is sufficient for the purposes of this paper; it is equivalent to the approach that defines Sobolev spaces in terms of charts [Tri86]. The first specific function classes for which we derive asymptotic equivalence results are the Sobolev balls

[TABLE]

consisting of functions in $H^{s}_{2}(\mathbb{S}^{d})$ with Sobolev norm bounded by $R>0$ .

In order to derive from Theorem 3.1 asymptotic equivalence of $\mathfrak{F}_{n}^{d}$ and $\mathfrak{G}_{n}^{d}$ over Sobolev balls, we take $\{\psi_{1},\ldots,\psi_{m}\}$ as the set of all spherical harmonics up to a certain resolution level $L$ , that is, we set $\psi_{j}=Y_{\ell,m}$ if $j=\iota(\ell,m)$ for the enumeration function $\iota$ introduced in Section 2.1. Consequently, $m=\dim\mathscr{P}^{d}_{L}\asymp L^{d}$ . In this specific setup, assuming that the set $\mathbb{X}$ of sampling points is a spherical $t$ -design with $t\geq 2L$ ensures that $\widetilde{f}_{m}$ defined in (3.5) is the least-squares approximation of $f$ in $\mathscr{P}_{L}^{d}$ based on the data $(x_{1},f(x_{1})),\ldots,(x_{n},f(x_{n}))$ , that is,

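$$\widetilde{f}_{m}=\operatorname*{arg\,min}_{p\in\mathscr{P}_{L}^{d}}\sum_{i=1}^{n}\big(f(x_{i})-p(x_{i})\big)^{2}.$$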

In fact, this follows from the inclusion $\mathscr{P}^{d}_{L}\cdot\mathscr{P}^{d}_{L}\subseteq\mathscr{P}^{d}_{2L}$ which implies that for the design matrix $X=(\psi_{j}(x_{i}))\in\mathbb{R}^{n\times m}$ we have $X^{\top}X=nI_{m}$ . Hence, the least-squares approximation is given by $\sum_{j=1}^{m}\widetilde{\beta}_{j}\psi_{j}$ where $\widetilde{\beta}=(\widetilde{\beta}_{1},\ldots,\widetilde{\beta}_{m})^{\top}$ satisfies

$$X^{\top}X\,\widetilde{\beta}=n\,\widetilde{\beta}=X^{\top}\big(f(x_{1}),\ldots,f(x_{n})\big)^{\top}.$$

From this, we obtain $\widetilde{\beta}_{j}=n^{-1}\sum_{i=1}^{n}f(x_{i})\psi_{j}(x_{i})=\langle f,\psi_{j}\rangle_{n}$ , showing that the least-squares approximation coincides with $\widetilde{f}_{m}$ in this case. Before we devote ourselves to the application of Theorem 3.1 to Sobolev balls, we make two remarks.
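The identity $X^{\top}X=nI_{m}$ and the resulting coincidence of the least-squares fit with the empirical coefficients can be checked numerically for $d=1$. The sketch below assumes the $L^{2}(\mu)$-orthonormal basis $1,\sqrt{2}\cos(k\,\cdot),\sqrt{2}\sin(k\,\cdot)$, $k=1,\ldots,L$, of $\mathscr{P}^{1}_{L}$ ($\mu$ the normalized arc-length measure) and uses the regular $n$-gon with $n-1\geq 2L$ as spherical $t$-design; the test functions and helper names are hypothetical.

```python
import numpy as np

def trig_design_matrix(theta, L):
    """(psi_j(x_i)) for the L2(mu)-orthonormal basis 1, sqrt(2)cos(k t), sqrt(2)sin(k t), k <= L."""
    cols = [np.ones_like(theta)]
    for k in range(1, L + 1):
        cols += [np.sqrt(2.0) * np.cos(k * theta), np.sqrt(2.0) * np.sin(k * theta)]
    return np.column_stack(cols)

L = 4
m = 2 * L + 1                                  # dim P_L^1
n = 2 * L + 3                                  # regular n-gon: an (n-1)-design with n - 1 >= 2L
theta = 2.0 * np.pi * np.arange(n) / n
X = trig_design_matrix(theta, L)

f = lambda t: np.exp(np.cos(t)) + np.sin(3 * t) ** 2  # hypothetical, not band-limited
y = f(theta)

beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)  # least-squares fit in P_L^1
beta_emp = X.T @ y / n                           # empirical coefficients <f, psi_j>_n
print(np.allclose(X.T @ X, n * np.eye(m)))       # True
print(np.allclose(beta_ls, beta_emp))            # True

# Band-limited case: for g in P_L^1 the empirical coefficients equal the exact ones.
g = lambda t: 1.0 + 0.5 * np.cos(2 * t) - 2.0 * np.sin(4 * t)
exact = np.zeros(m)
exact[0], exact[3], exact[8] = 1.0, 0.5 / np.sqrt(2.0), -2.0 / np.sqrt(2.0)
print(np.allclose(X.T @ g(theta) / n, exact))    # True
```

The last check mirrors the band-limited situation of Remark 3.4: when $f\in\mathscr{P}^{1}_{L}$, the products $f\psi_{j}$ have degree at most $2L\leq t$, so the cubature reproduces the exact coefficients and $\widetilde{f}_{m}=f$.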

*Remark 3.5.*

For the case of Sobolev balls, condition (3.7) leads to a stronger assumption than necessary on the interplay of the sample size $n$, the resolution level $L$, and the smoothness parameter $s$. More precisely, below we consider spherical $t$-designs of cardinality $n\asymp L^{d}$, which is the minimal achievable order [DGS77]. In this setting, global asymptotic equivalence can be achieved under the assumption $s>d/2$, whereas (3.7) only yields the more restrictive condition $s>d$. Indeed, Equation (26) in [Kus00] implies that

[TABLE]

This lower bound implies that the condition $L^{-s+d}\to 0$ as $L=L(n)\to\infty$ , which is equivalent to $s>d$ , must be satisfied in order to guarantee that the term $\Delta_{1}$ converges to zero as desired. The less restrictive condition $s>d/2$ , established below by finding an appropriate upper bound for the quantity $\lVert f-\Pi_{\mathscr{P}^{d}_{L}}f\rVert_{n}$ , is also the one expected from the corresponding results in the Euclidean case [Rei08].

*Remark 3.6.*

Contrary to the approach in the Euclidean case considered in [Rei08], the least-squares approximation $\widetilde{f}_{m}$ cannot, in general, be chosen as an interpolant through the points $(x_{1},f(x_{1})),\ldots,(x_{n},f(x_{n}))$. Lemma 3 in [Slo95] states that the classical interpolation formula

[TABLE]

holds for all continuous functions on the sphere if and only if the equal-weight cubature rule associated with the design $\mathbb{X}$, which is exact for $f\in\mathscr{P}^{d}_{2L}$, is also minimal in the sense that $n$ equals the dimension of $\mathscr{P}^{d}_{L}$ (see [Slo95], p. 242, for this definition of minimality). As also reported in [Slo95], Section 4.1, spherical $t$-designs with this property (referred to as tight spherical $t$-designs) exist only in a few special cases. More precisely, [BD79] shows that tight spherical designs do not exist for $d\geq 2$ and $L\geq 3$. Therefore, the sample size $n$ is in general strictly larger than $\dim\mathscr{P}^{d}_{L}$, and $\widetilde{f}_{m}|_{\mathbb{X}}\neq f|_{\mathbb{X}}$. Consequently, the term $\Delta_{1}$ in Theorem 3.1 does not vanish. By contrast, in the case $d=1$ considered in [Rei08], the term $\Delta_{1}$ is not present: a set of $2L+1$ equidistant design points (forming a regular $(2L+1)$-gon on the unit circle $\mathbb{S}^{1}$; see also Example 2.6 in [BB09]) is a spherical $t$-design for $t=2L$, and the least-squares approximation interpolates $f$ on this design. In the multivariate case with regression domain $[0,1]^{d}$, which is also treated in [Rei08], a product design can be chosen that inherits the interpolation property from the case $d=1$. In this sense, the term $\Delta_{1}$ is a new ingredient in the bound on the Le Cam distance, owing to the spherical framework of dimension $d\geq 2$. The term $\Delta_{2}$, as mentioned above, is standard. In contrast to [Rei08], where fine properties of the Fourier basis are used to bound this term, we exploit recent results from [LW23] on approximation by (weighted) least-squares polynomials on the sphere to bound $\Delta_{2}$ (and also $\Delta_{1}$).
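The dichotomy described in this remark can be observed on $\mathbb{S}^{1}$, where the regular $(2L+1)$-gon is a tight design. The sketch below assumes the $L^{2}(\mu)$-orthonormal trigonometric basis of $\mathscr{P}^{1}_{L}$ ($\mu$ the normalized arc-length measure) and an arbitrarily chosen smooth test function; it evaluates $\widetilde{f}_{m}$ at the nodes of the regular $n$-gon for $n=2L+1$ and for larger $n$. Function names are hypothetical.

```python
import numpy as np

def node_residual(n, L, f):
    """Max over the nodes of |f~_m(x_i) - f(x_i)| for the regular n-gon and P_L^1 (n - 1 >= 2L)."""
    theta = 2.0 * np.pi * np.arange(n) / n
    cols = [np.ones(n)]
    for k in range(1, L + 1):
        cols += [np.sqrt(2.0) * np.cos(k * theta), np.sqrt(2.0) * np.sin(k * theta)]
    X = np.column_stack(cols)
    y = f(theta)
    fitted = X @ (X.T @ y / n)        # hyperinterpolant f~_m evaluated at the nodes
    return np.max(np.abs(fitted - y))

f = lambda t: np.exp(np.sin(t))       # hypothetical smooth, non-band-limited function
L = 3
for n in (2 * L + 1, 2 * L + 2, 4 * L):
    # n = 2L + 1 = dim P_L^1 (tight case): residual at rounding level; otherwise nonzero.
    print(n, node_residual(n, L, f))
```

For $n=2L+1$ the matrix $X$ is square with $X^{\top}X=nI$, hence $XX^{\top}=nI$ and the fit reproduces the data; for $n>2L+1$ the data vector generically leaves the column space of $X$, which is the source of the term $\Delta_{1}$.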

Applying the bound on the Le Cam distance derived in Theorem 3.1 establishes asymptotic equivalence of the experiments $\mathfrak{F}_{n}^{d}$ and $\mathfrak{G}_{n}^{d}$ over Sobolev balls.

Theorem 3.7.

Assume that $\mathbb{X}=\{x_{1},\ldots,x_{n}\}$ is a spherical $t$ -design with $t\geq 2L$ . Then, for $\Theta=H_{2}^{s}(\mathbb{S}^{d},R)$ with $s>d/2$ ,

[TABLE]

In particular, choosing $\mathbb{X}$ as a spherical $t$ -design with $t=2L$ and cardinality $n$ of the minimal possible order $n\asymp L^{d}$ , yields

[TABLE]

and the experiments $\mathfrak{F}^{d}_{n}$ and $\mathfrak{G}^{d}_{n}$ are asymptotically equivalent as $n\to\infty$ .

3.2.2 Asymptotic equivalence over Besov balls

The representation of a regression function (for instance, in model (1.1)) as a spherical harmonic expansion as in (2.1) has the drawback that spherical harmonics are not localized: each of them is spread over the whole sphere. This leads to poor local performance of regression estimates based on truncated spherical harmonic expansions. To address this issue, the article [NPW06a] introduced a class of localized tight frames on spheres of arbitrary dimension, referred to as needlets because of their excellent localization properties. Figure 3 illustrates this essential difference between spherical harmonics and needlets by contrasting typical heatmaps of a spherical harmonic and a spherical needlet.

We give a brief sketch of the needlets’ construction since our arguments in the following rely on some fine properties of the needlet expansion of a function. Our presentation here is mainly based on the papers [Bal+09] and [Wan+17].

The first ingredient in the definition of needlets is a continuous and compactly supported function $h\colon[0,\infty)\to[0,\infty)$ which is referred to as a filter. We assume that $h$ satisfies

[TABLE]

For some of the properties stated below, the assumption $h\in C^{\infty}$ is more restrictive than necessary. Given (3.9), (3.10) is equivalent to the partition of unity property for $h^{2}$ ,

[TABLE]
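As an illustration, assume that (3.9) and (3.10) take the common form $\operatorname{supp}h\subseteq[1/2,2]$, $h\in C^{\infty}$, and $h^{2}(t)+h^{2}(2t)=1$ for $t\in[1/2,1]$. The Python sketch below builds one such filter through a standard smooth-step construction, $h^{2}(t)=\phi(t)-\phi(2t)$ with $\phi=1$ on $[0,1]$ and $\phi=0$ on $[2,\infty)$, and checks the partition of unity $\sum_{j\geq 0}h^{2}(t/2^{j})=1$ for $t\geq 1$ numerically. This is a hypothetical example filter, not necessarily the one used in the paper.

```python
import numpy as np

def _g(x):
    """exp(-1/x) for x > 0 and 0 otherwise (a C^infinity function)."""
    x = np.asarray(x, dtype=float)
    safe = np.where(x > 0, x, 1.0)               # avoid division by zero
    return np.where(x > 0, np.exp(-1.0 / safe), 0.0)

def smooth_step(x):
    """C^infinity function equal to 0 for x <= 0 and 1 for x >= 1."""
    return _g(x) / (_g(x) + _g(1.0 - x))

def phi(t):
    """C^infinity, non-increasing, phi = 1 on [0, 1] and phi = 0 on [2, infinity)."""
    return smooth_step(2.0 - np.asarray(t, dtype=float))

def h(t):
    """Filter with h^2(t) = phi(t) - phi(2t); it vanishes outside (1/2, 2)."""
    t = np.asarray(t, dtype=float)
    return np.sqrt(np.clip(phi(t) - phi(2.0 * t), 0.0, None))

t = np.linspace(1.0, 50.0, 6)
print(sum(h(t / 2.0**j) ** 2 for j in range(12)))   # all entries equal to 1 up to rounding
```

The check works because the sum telescopes: $\sum_{j=0}^{J}h^{2}(t/2^{j})=\phi(t/2^{J})-\phi(2t)$, which equals $1$ whenever $1\leq t\leq 2^{J}$.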

For a filter function $h$ and $\tau\geq 0$ , we consider the filtered kernel $\kappa_{\tau,h}\colon\mathbb{S}^{d}\times\mathbb{S}^{d}\to\mathbb{R}$ defined by

[TABLE]

where $\langle\boldsymbol{\cdot},\boldsymbol{\cdot}\rangle$ denotes the standard inner product on $\mathbb{R}^{d+1}$ and $P_{\ell}^{(d+1)}$ the normalized Gegenbauer polynomial defined by

[TABLE]

with $P_{\ell}^{(\alpha,\beta)}$ the Jacobi polynomial of degree $\ell$ for $\alpha,\beta>-1$ .
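Assuming the standard normalization $P^{(d+1)}_{\ell}(1)=1$ with Jacobi parameters $\alpha=\beta=(d-2)/2$, the polynomial $P^{(d+1)}_{\ell}$ is proportional to the Gegenbauer polynomial $C^{\lambda}_{\ell}$ with $\lambda=(d-1)/2$ and can be evaluated by the Gegenbauer three-term recurrence followed by normalization at $t=1$. The following sketch (hypothetical function name) does this; for $d=1$ it uses the limiting Chebyshev polynomials $T_{\ell}$.

```python
import numpy as np

def normalized_gegenbauer(ell, d, t):
    """P_ell^{(d+1)}(t): Gegenbauer polynomial for S^d, normalized so that P(1) = 1.

    Uses C^lam_ell with lam = (d - 1) / 2 (Jacobi parameters alpha = beta = (d - 2) / 2).
    """
    t = np.asarray(t, dtype=float)
    if d == 1:
        # lam = 0: the normalized limit is the Chebyshev polynomial T_ell(t) = cos(ell * arccos t).
        return np.cos(ell * np.arccos(np.clip(t, -1.0, 1.0)))
    lam = (d - 1) / 2.0

    def unnormalized(x):
        # Three-term recurrence: k C_k = 2 (k + lam - 1) x C_{k-1} - (k + 2 lam - 2) C_{k-2}.
        c_prev, c = np.ones_like(x), 2.0 * lam * x      # C_0 and C_1
        if ell == 0:
            return c_prev
        for k in range(2, ell + 1):
            c_prev, c = c, (2.0 * (k + lam - 1.0) * x * c - (k + 2.0 * lam - 2.0) * c_prev) / k
        return c

    return unnormalized(t) / unnormalized(np.ones(1))[0]

# d = 2: the normalized polynomials are the Legendre polynomials.
grid = np.linspace(-1.0, 1.0, 5)
print(np.allclose(normalized_gegenbauer(4, 2, grid),
                  np.polynomial.legendre.Legendre.basis(4)(grid)))  # True
```

For $d=3$ the same function returns $U_{\ell}(t)/(\ell+1)$, the normalized Chebyshev polynomial of the second kind, which gives a second independent check.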

The second ingredient in the construction of spherical needlets are cubature rules

[TABLE]

which are exact for functions $f\in\mathscr{P}^{d}_{2^{j+1}-1}$ , that is,

[TABLE]

The weights of these cubature rules are assumed to be positive but not necessarily equal (in principle, however, the underlying cubature rules can themselves be chosen as equal-weight spherical $t$-designs, as in Figure 3). Then, for any $j\in\mathbb{N}_{0}$ and $k=1,\ldots,N_{j}$, spherical needlets $\psi_{j,k}\colon\mathbb{S}^{d}\to\mathbb{R}$ are defined, for $x\in\mathbb{S}^{d}$, by

[TABLE]

or, equivalently, by

[TABLE]

Of course, this construction of needlet functions depends on the cubature points and their corresponding weights. However, it can be imposed that

[TABLE]

which will be assumed from now on. By construction, the $\psi_{j,k}$ are band-limited but not orthogonal. More precisely, $\psi_{j,k}$ is a polynomial of degree $2^{j}-1$ and it holds that

[TABLE]

Moreover, for any $p\in[1,\infty]$ ,

[TABLE]

The needlet coefficients $\beta_{j,k}$ of a function $f\colon\mathbb{S}^{d}\to\mathbb{R}$ are defined as

[TABLE]

and a function $f\in L^{2}(\mathbb{S}^{d})$ can be represented as

[TABLE]

By a slight abuse of notation, we denote the truncated needlet expansion including only the $\psi_{j,k}$ with $j\leq J$ by $f_{J}$ ,

[TABLE]

which can equivalently be written as

[TABLE]

where the new filter $H\colon[0,\infty)\to[0,\infty)$ is defined in terms of the needlet filter $h$ as follows:

[TABLE]

Membership in spherical Besov spaces can be characterized by means of the needlet coefficients of a function $f$. More precisely, $f$ belongs to the Besov space $B^{s}_{r,q}(\mathbb{S}^{d})$ if and only if

[TABLE]

$\lVert\,\boldsymbol{\cdot}\,\rVert_{B_{r,q}^{s}(\mathbb{S}^{d})}$ is referred to as the Besov norm. Putting

[TABLE]

we have the equivalence

[TABLE]

of norms (see [NPW06], Chapter 5). The Sobolev spaces $H^{s}_{2}(\mathbb{S}^{d})$ introduced above coincide with the Besov spaces $B_{2,2}^{s}(\mathbb{S}^{d})$ [Bal+09]. We denote by $B_{r,q}^{s}(\mathbb{S}^{d},R)$ the ball of radius $R$ in the Besov space $B_{r,q}^{s}(\mathbb{S}^{d})$. We now apply the bound on the Le Cam distance derived in Theorem 3.1 to establish asymptotic equivalence of the experiments $\mathfrak{F}^{d}_{n}$ and $\mathfrak{G}_{n}^{d}$ over such Besov balls. Denote by $\widetilde{f}_{J}$ the empirical needlet approximation defined by

[TABLE]

where

[TABLE]

for the set $\mathbb{X}=\{x_{1},\ldots,x_{n}\}$ of sampling points. In the following, we assume that $\mathbb{X}$ is a spherical $t$ -design for $t\geq 3\cdot 2^{J}$ (this assumption is slightly stronger than the one in Theorem 3.1). Of course, $\widetilde{f}_{J}$ is the empirical counterpart of $f_{J}$ obtained by replacing the $L^{2}(\mathbb{S}^{d})$ -inner product with its empirical analogue relying on the equal-weight cubature rule associated with the spherical design $\mathbb{X}$ . Equivalently, we can write

[TABLE]

see [Wan+17], Equations (36), (37), and (43). The maximal resolution level $J=J(n)$ will be chosen such that

[TABLE]

which coincides, except for a missing logarithmic term, with the choice of this truncation parameter in adaptive non-parametric estimation using needlets [Bal+09, Mon11].

Theorem 3.8.

Assume that $\mathbb{X}=\{x_{1},\ldots,x_{n}\}$ is a spherical $t$ -design with $t\geq 3\cdot 2^{J}$ , $J\in\mathbb{N}_{0}$ . Then, for $\Theta=B^{s}_{r,q}(\mathbb{S}^{d},R)$ with $s>d/r$ ,

[TABLE]

In particular, choosing $\mathbb{X}$ as a spherical $t$ -design with $t=3\cdot 2^{J}$ and cardinality $n$ of the minimal possible order $n\asymp 2^{Jd}$ , yields

[TABLE]

and the experiments $\mathfrak{F}^{d}_{n}$ and $\mathfrak{G}^{d}_{n}$ are asymptotically equivalent as $n\to\infty$ .

4 Regression on random uniform designs

The aim of this section is to establish asymptotic equivalence between the Gaussian white noise model $\mathfrak{G}_{n}^{d}$ given by (3.2) and random uniform design regression with model equations

[TABLE]

where $X_{1},\ldots,X_{n}$ are i.i.d. $\sim\mathcal{U}(\mathbb{S}^{d})$ but all the other quantities are defined as in the fixed design regression model $\mathfrak{F}_{n}^{d}$ defined by (3.1). In this section, we restrict ourselves to the smoothness class $\Theta=H^{s}_{2}(\mathbb{S}^{d},R)$ . The corresponding statistical experiment is denoted by $\mathfrak{R}_{n}^{d}=\mathfrak{R}_{n}^{d}(\Theta)$ and the following result states the asymptotic equivalence of $\mathfrak{R}_{n}^{d}$ and $\mathfrak{G}_{n}^{d}$ under the same smoothness assumption $s>d/2$ as in the fixed design case.
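In contrast to the fixed design case, where $X^{\top}X=nI_{m}$ holds exactly, for a random uniform design the normalized Gram matrix equals the identity only approximately. The sketch below illustrates this for $d=1$, assuming the $L^{2}(\mu)$-orthonormal trigonometric basis of $\mathscr{P}^{1}_{L}$; it measures $\lVert X^{\top}X/n-I_{m}\rVert$ in spectral norm for i.i.d. uniform points and for a regular polygon. Helper names are hypothetical.

```python
import numpy as np

def trig_design_matrix(theta, L):
    """(psi_j(x_i)) for the L2(mu)-orthonormal basis 1, sqrt(2)cos(k t), sqrt(2)sin(k t) of P_L^1."""
    cols = [np.ones_like(theta)]
    for k in range(1, L + 1):
        cols += [np.sqrt(2.0) * np.cos(k * theta), np.sqrt(2.0) * np.sin(k * theta)]
    return np.column_stack(cols)

def gram_deviation(theta, L):
    """Spectral-norm distance between the normalized Gram matrix X^T X / n and the identity."""
    X = trig_design_matrix(theta, L)
    n, m = X.shape
    return np.linalg.norm(X.T @ X / n - np.eye(m), ord=2)

rng = np.random.default_rng(0)
L = 3
for n in (50, 500, 50_000):
    random_design = rng.uniform(0.0, 2.0 * np.pi, size=n)   # X_1, ..., X_n i.i.d. uniform on S^1
    polygon = 2.0 * np.pi * np.arange(n) / n                # spherical (n-1)-design
    print(n, gram_deviation(random_design, L), gram_deviation(polygon, L))
```

The random-design deviation shrinks roughly like $n^{-1/2}$ but never vanishes, while the polygon gives zero up to rounding as soon as $n-1\geq 2L$.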

Theorem 4.1.

Let $\Theta=H_{2}^{s}(\mathbb{S}^{d},R)$ with $s>d/2$ . Consider the random design regression experiment $\mathfrak{R}_{n}^{d}$ defined by (4.1). For all sufficiently large $n\in\mathbb{N}$ , let $L\in\mathbb{N}$ be maximal such that

[TABLE]

where $\kappa_{2}=(3\ln(1.5)-1)/6\approx 0.036$. Then, for any $L_{0}\in\mathbb{N}$ with $L_{0}\leq L$, we have

[TABLE]

In particular, choosing $L_{0}=L_{0}(n)$ and $L=L(n)$ with $L_{0}\leq L$ such that in addition

[TABLE]

implies that

[TABLE]

and the experiments $\mathfrak{R}^{d}_{n}$ and $\mathfrak{G}^{d}_{n}$ are asymptotically equivalent as $n\to\infty$ .

The proof of Theorem 4.1, which is given in Section 7.4, is (like that of Theorem 3.1) constructive in the sense that it provides explicit Markov kernels that transform observations between the considered experiments. The proof relies on separate calculations for low- and high-frequency coefficients and is similar to the proof of the analogous result in the multivariate Euclidean case given in [Rei08]. However, several additional technical tools are necessary: a generalization of the classical $QR$ decomposition, results from representation theory such as Schur's lemma, reverse Hölder inequalities for spherical harmonics, and a Taylor series expansion of the matrix-valued function $A\mapsto(I+A)^{1/2}$. The interplay of these tools may be of independent interest and might be useful for further extensions of the present result in future work.

5 Asymptotic non-equivalence

We now show that the asymptotic equivalence results of Sections 3 and 4 are essentially optimal in the sense that the conditions $s>d/2$ (for Sobolev balls) and $s>d/r$ (for Besov balls) cannot be weakened further. For this purpose, it suffices to reduce the framework to the case where the common parameter space $\Theta$ of the regression and the Gaussian white noise experiment consists of only two elements. More precisely, we construct subsets $\Theta_{n}^{\prime}\subset\Theta$ with $\lvert\Theta_{n}^{\prime}\rvert=2$ such that the observations in the regression model are indistinguishable over $\Theta_{n}^{\prime}$, whereas in the Gaussian white noise model the total variation distance between the two candidate distributions is bounded from below by a positive constant, uniformly in the sample size $n$. Asymptotic non-equivalence of the two experiments then follows by a standard argument. The construction of the hypotheses in the two experiments is inspired by the approach in [WW16], where the optimal recovery of smooth functions on spheres from function values was studied (in particular, the ideas used in the proof of Lemma 3.5 of that reference turn out to be useful for the proof of the following result).
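A one-dimensional caricature of this two-point argument (not the construction used in the proof, which follows [WW16]) can be computed explicitly. On the regular $n$-gon, $f_{1}(t)=a\sin(nt)$ vanishes at every design point, so the regression observations under $f_{0}=0$ and $f_{1}$ have the same law. In the white noise model with noise level $\sigma/\sqrt{n}$, the total variation distance between the two hypotheses is $\operatorname{erf}\big(\sqrt{n}\,\lVert f_{1}\rVert_{L^{2}}/(2\sqrt{2}\sigma)\big)$. Assuming that the Sobolev norm of $\sin(nt)$ is of order $n^{s}$ (an assumption on the norm weights), the choice $a=Rn^{-s}$ keeps $f_{1}$ in a Sobolev ball, and the distance stays bounded away from zero exactly when $s\leq 1/2=d/2$. Function names are hypothetical.

```python
import numpy as np
from math import erf, sqrt

def gaussian_shift_tv(separation):
    """TV distance between N(0, 1) and N(separation, 1): 2 * Phi(separation / 2) - 1."""
    return erf(separation / (2.0 * sqrt(2.0)))

def two_point_experiment(n, s, R=1.0, sigma=1.0):
    """Compare the hypotheses f0 = 0 and f1(t) = a sin(n t), a = R n^(-s), on S^1.

    Returns (max |f1| over the regular n-gon, TV distance in the white noise model
    with noise level sigma / sqrt(n)). The scaling a = R n^(-s) is a hypothetical
    choice assuming the Sobolev norm of sin(n t) is of order n^s.
    """
    theta = 2.0 * np.pi * np.arange(n) / n
    a = R * n ** (-s)
    max_at_nodes = float(np.max(np.abs(a * np.sin(n * theta))))  # f1 = f0 on the design
    l2_norm = a / sqrt(2.0)                                       # ||f1||_{L2(mu)}, mu normalized
    return max_at_nodes, gaussian_shift_tv(sqrt(n) * l2_norm / sigma)

for s in (0.5, 0.8):
    for n in (10, 1_000, 100_000):
        print(s, n, two_point_experiment(n, s))
```

For $s=1/2$ the total variation distance is the same constant for every $n$, while for $s=0.8$ it decays like $n^{-0.3}$, matching the threshold $s=d/2$ for $d=1$.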

Theorem 5.1.

Consider $\Theta=H_{2}^{s}(\mathbb{S}^{d},R)$ with $s\leq d/2$ or $\Theta=B^{s}_{r,q}(\mathbb{S}^{d},R)$ with $s\leq d/r$. Let $\mathbb{X}$ be any (deterministic or random) point set with $\lvert\mathbb{X}\rvert\asymp n$. Denote by $\mathfrak{G}_{n}^{d}$ the Gaussian white noise experiment given by (3.2) and by $\mathfrak{E}_{n}^{d}$ the regression experiment defined by (1.1) on the design $\mathbb{X}$ with regression function $f\in\Theta$. Then,

[TABLE]

In particular, the experiments $\mathfrak{E}_{n}^{d}$ and $\mathfrak{G}_{n}^{d}$ are not asymptotically equivalent.

*Remark 5.2.*

The statement of Theorem 5.1 is consistent with the usual embedding theorems, which state that $H_{2}^{s}(\mathbb{S}^{d})$ and $B^{s}_{r,q}(\mathbb{S}^{d})$ embed continuously into the space $C(\mathbb{S}^{d})$ of continuous functions on the sphere provided that $s>d/2$ and $s>d/r$, respectively (see [NPW06] and [Bal+09] for details). In this light, the requirements $s>d/2$ and $s>d/r$ are natural for any method based on function values. The proof of Theorem 5.1 even shows that asymptotic equivalence fails if the function spaces are restricted to functions with a finite series expansion (in the proof, we consider $\Theta_{n}^{\prime}\subset\mathscr{P}^{d}_{L}$ for some $L=L(n)$ with $L\to\infty$).

6 Discussion

We have proven that non-parametric regression with spherical regressors is asymptotically equivalent, in the sense of Le Cam, to a corresponding Gaussian white noise experiment. The results hold both in the fixed design case, where the sampling points form a spherical $t$-design, and in the random uniform design case. As specific function classes for which asymptotic equivalence holds, we considered Sobolev and Besov balls, and we showed that the smoothness assumptions imposed to establish asymptotic equivalence over these classes are sharp.

The derived results suggest that both spherical $t$-designs and random uniform designs are a good choice of sampling points in non-parametric regression, since the resulting statistical experiment is asymptotically equivalent to the Gaussian white noise model, which is usually regarded as the simplest model of the form $\texttt{data}=\texttt{signal}+\texttt{noise}$. The established asymptotic equivalences suggest that known results for the Gaussian white noise model on the sphere (for instance, the sharp minimax results from [Kle99] already cited in the introduction) are also valid in both the fixed and the random design regression framework with spherical regressors. Besides estimation procedures as considered in [Kle99], (sharp) non-parametric testing rates and confidence bands might also be transferable from the idealized Gaussian white noise model to the more realistic regression models.

The interpretation of the considered designs as a good choice of sampling points is in line with existing results on optimal designs in finite-dimensional linear regression models. For truncated expansions in spherical harmonics, the papers [DMP05] (for the two-dimensional sphere) and [Det+19] (for spheres of arbitrary dimension) identify the uniform distribution as an optimal design distribution with respect to the $\Phi_{p}$ -criteria introduced by Kiefer in [Kie74]. Such a design given by an absolutely continuous distribution cannot be implemented directly in practice. In [Det+19], Remark 3.1, the authors mention spherical $t$ -designs as a potential remedy to address this issue. Even more recently, [Hai24] discusses the use of spherical $t$ -designs as optimal designs for the special case of the two-dimensional sphere. So-called $\lambda$ -designs, which extend the notion of spherical $t$ -designs to Riemannian manifolds, are identified as optimal designs for regression on Lie groups in [CDK26].

In this work, we have restricted ourselves to spheres of arbitrary dimension for two reasons: (i) this special case keeps the technical jargon to a minimum while already incorporating all the essential ideas, and (ii) regression with spherical regressors, especially on the two-dimensional sphere $\mathbb{S}^{2}$, is arguably the most relevant case in applications. We conjecture that large parts of our analysis carry over to non-parametric regression on more general compact Riemannian manifolds. Regression models on manifolds and the presumably equivalent Gaussian white noise model are considered, for instance, in [CKP14] in the context of non-parametric Bayesian estimation. Another work in this direction is [KNP11], where confidence bands for needlet density estimators on compact homogeneous manifolds were derived.

7 Proofs

7.1 Proof of Theorem 3.1

We first derive a uniform upper bound on $\Delta(\mathfrak{F}_{n}^{d},\widetilde{\mathfrak{G}}^{d}_{n})$ , that is, the Le Cam distance between the regression experiment $\mathfrak{F}_{n}^{d}$ and the intermediate experiment $\widetilde{\mathfrak{G}}^{d}_{n}$ . For this, we first consider the deficiency $\delta(\mathfrak{F}_{n}^{d},\widetilde{\mathfrak{G}}^{d}_{n})$ . Assume that an observation $Z=(Z_{1},\ldots,Z_{n})^{\top}$ from the regression experiment is given. Setting $\langle v,\psi_{j}\rangle_{n}=n^{-1}\sum_{i=1}^{n}v_{i}\psi_{j}(x_{i})$ for $v=(v_{1},\ldots,v_{n})^{\top}\in\mathbb{R}^{n}$ , we define the random process

[TABLE]

The random variables $\langle\varepsilon,\psi_{j}\rangle_{n}=n^{-1}\sum_{i=1}^{n}\varepsilon_{i}\psi_{j}(x_{i})$ are Gaussian with mean zero. Combining the inclusion $S\cdot S\subseteq\mathscr{P}^{d}_{L}\cdot\mathscr{P}^{d}_{L}\subseteq\mathscr{P}^{d}_{2L}$ with the assumption that $\mathbb{X}$ is a spherical $t$ -design for $t\geq 2L$ implies that

[TABLE]

By linearity, this shows that the process $\zeta_{m}$ is standard Gaussian white noise on $S$ . By adding Gaussian white noise, scaled by the same factor $\sigma/\sqrt{n}$ , on the $L^{2}(\mathbb{S}^{d})$ -orthogonal complement of $S$ , we obtain the process

[TABLE]

where $\zeta$ denotes a standard Gaussian white noise process. In differential notation, the process $\widetilde{Z}$ can equivalently be written as

[TABLE]

showing that

[TABLE]

Conversely, given the continuous observation (3.6) from the intermediate experiment and choosing an orthonormal basis $\{\varphi_{1},\ldots,\varphi_{\dim S}\}$ of $S$, the random vector $\widehat{\theta}=(\widehat{\theta}_{1},\ldots,\widehat{\theta}_{\dim S})^{\top}$ with

[TABLE]

follows a multivariate Gaussian distribution with mean

[TABLE]

and covariance matrix $\sigma^{2}n^{-1}I_{\dim S}$ . Consider the $n\times\dim S$ design matrix $X=(\varphi_{j}(x_{i}))$ . The vector $\widehat{Z}=(\widehat{Z}_{1},\ldots,\widehat{Z}_{n})^{\top}=X\widehat{\theta}$ of fitted values at the sampling points $x_{1},\ldots,x_{n}$ follows a multivariate Gaussian with mean vector

[TABLE]

and covariance matrix $\sigma^{2}n^{-1}XX^{\top}$ . Consider a mean zero multivariate Gaussian random vector $\xi$ with covariance matrix

[TABLE]

independent of $\widehat{Z}$ (the matrix $\sigma^{2}(I_{n}-X(X^{\top}X)^{-1}X^{\top})$ is the covariance matrix of the residuals in the linear model $Y=X\theta+\varepsilon$ and is therefore positive semi-definite; moreover, since the design points form a spherical $t$-design, $(X^{\top}X)^{-1}=n^{-1}I_{\dim S}$, so that this matrix equals $\sigma^{2}I_{n}-\sigma^{2}n^{-1}XX^{\top}$). It follows that $Z^{\prime}=\widehat{Z}+\xi$ is multivariate Gaussian with mean (7.2) and covariance matrix $\sigma^{2}I_{n}$. Hence, the deficiency $\delta(\widetilde{\mathfrak{G}}^{d}_{n},\mathfrak{F}_{n}^{d})$ can be bounded by the supremum over all $f$ of the total variation distance between the distributions of the multivariate Gaussian vectors $Z$ and $Z^{\prime}$,

[TABLE]

see, for instance, [NO24]. Combining (7.1) and (7.3) yields

[TABLE]
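The Markov kernel from the intermediate experiment to the regression experiment can be mimicked numerically. The sketch below takes $d=1$, $S=\mathscr{P}^{1}_{L}$ with the $L^{2}(\mu)$-orthonormal trigonometric basis, and the regular $11$-gon as an illustrative spherical $t$-design with $t=10\geq 2L$ for $L=3$. It checks that $\operatorname{Cov}(X\widehat{\theta})+\operatorname{Cov}(\xi)=\sigma^{2}I_{n}$, draws $Z^{\prime}=X\widehat{\theta}+\xi$, and evaluates the exact total variation distance between $\mathcal{N}(a,\sigma^{2}I_{n})$ and $\mathcal{N}(b,\sigma^{2}I_{n})$, namely $\operatorname{erf}\big(\lVert a-b\rVert/(2\sqrt{2}\sigma)\big)$, for $a=(f(x_{i}))_{i}$ and $b=(\widetilde{f}_{m}(x_{i}))_{i}$, where $\lVert a-b\rVert=\sqrt{n}\,\lVert f-\widetilde{f}_{m}\rVert_{n}$. The regression function, $\sigma$, and all names are hypothetical.

```python
import numpy as np
from math import erf, sqrt

L, n, sigma = 3, 11, 0.7
theta = 2.0 * np.pi * np.arange(n) / n      # regular 11-gon: a spherical 10-design on S^1
cols = [np.ones(n)]
for k in range(1, L + 1):
    cols += [np.sqrt(2.0) * np.cos(k * theta), np.sqrt(2.0) * np.sin(k * theta)]
X = np.column_stack(cols)                   # (phi_j(x_i)) for an orthonormal basis of S = P_L^1
m = X.shape[1]

# Covariance of Z_hat = X theta_hat when theta_hat ~ N(mean, sigma^2 / n * I_m).
cov_zhat = sigma**2 / n * X @ X.T
# Covariance of the independent noise xi: sigma^2 times the residual projector I - X (X^T X)^{-1} X^T.
cov_xi = sigma**2 * (np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T))

rng = np.random.default_rng(0)

def kernel_to_regression(theta_hat):
    """Randomization Z' = X theta_hat + xi with xi ~ N(0, cov_xi) drawn independently."""
    w, V = np.linalg.eigh(cov_xi)
    xi = V @ (np.sqrt(np.clip(w, 0.0, None)) * rng.standard_normal(n))
    return X @ theta_hat + xi

def gaussian_shift_tv(mean_a, mean_b, sigma):
    """Exact TV distance between N(mean_a, sigma^2 I) and N(mean_b, sigma^2 I)."""
    delta = np.linalg.norm(np.asarray(mean_a) - np.asarray(mean_b)) / sigma
    return erf(delta / (2.0 * sqrt(2.0)))

f = lambda t: np.exp(np.cos(t))             # hypothetical regression function
y = f(theta)                                # mean of Z in the regression experiment
fitted = X @ (X.T @ y / n)                  # mean of Z', i.e. f~_m at the nodes
print(np.allclose(cov_zhat + cov_xi, sigma**2 * np.eye(n)))  # True
print(gaussian_shift_tv(y, fitted, sigma))  # TV between the laws of Z and Z'
```

The eigendecomposition is used instead of a Cholesky factorization because the covariance of $\xi$ is only positive semi-definite (it has rank $n-\dim S$).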

Next, we derive a uniform upper bound on the Le Cam distance $\Delta(\widetilde{\mathfrak{G}}^{d}_{n},\mathfrak{G}^{d}_{n})$ between the intermediate experiment $\widetilde{\mathfrak{G}}^{d}_{n}$ and the Gaussian white noise experiment $\mathfrak{G}^{d}_{n}$ . Denote by $\mathbf{P}_{f}$ the distribution of the process $Z$ in the experiment $\mathfrak{G}^{d}_{n}$ , and by $\mathbf{P}_{\widetilde{f}_{m}}$ the distribution of the process $\widetilde{Z}$ in the experiment $\widetilde{\mathfrak{G}}^{d}_{n}$ , respectively. Then, (2.5) yields

[TABLE]

from which it follows that

[TABLE]

By combining (7.4) and (7.5) the bound for $\Delta(\mathfrak{F}^{d}_{n},\mathfrak{G}^{d}_{n})$ announced in the theorem follows from the triangle inequality for the Le Cam distance.

7.2 Proof of Theorem 3.7

In the following, we treat the terms $\Delta_{1}$ and $\Delta_{2}$ separately under the assumption that $\mathbb{X}=\{x_{1},\ldots,x_{n}\}$ is a spherical $t$ -design with $t\geq 2L$ .

Bound for $\Delta_{1}$ when $\Theta=H_{2}^{s}(\mathbb{S}^{d},R)$

Recall that $\widetilde{f}_{m}$ is given by (3.8). We have

[TABLE]

Putting

[TABLE]

for $j\in\mathbb{N}$ we can write

[TABLE]

and the series on the right-hand side converges uniformly for $f\in H_{2}^{s}(\mathbb{S}^{d},R)$ when $s>d/2$ . Using Lemma 7.1 with $p_{0}=p=2$ (note that $A_{j}f\in\mathscr{P}^{d}_{2^{j}L}$ by definition), we obtain for $f\in H_{2}^{s}(\mathbb{S}^{d},R)$ that

[TABLE]

As a consequence, we obtain by means of the triangle inequality that, for $s>d/2$ ,

[TABLE]

It follows that for $\Theta=H_{2}^{s}(\mathbb{S}^{d},R)$ , the term $\Delta_{1}$ in Theorem 3.1 can be bounded as

[TABLE]

Bound for $\Delta_{2}$ when $\Theta=H_{2}^{s}(\mathbb{S}^{d},R)$

Under the assumptions of Theorem 3.1, applying [LW23], Theorem 1.2, Equation (1.6), yields directly that, for $\Theta=H_{2}^{s}(\mathbb{S}^{d},R)$ with $s>d/2$ ,

[TABLE]

Combining the bounds (7.6) and (7.7) implies the statement of Theorem 3.7.

7.3 Proof of Theorem 3.8

As for the Sobolev case, the proof of Theorem 3.8 relies on finding suitable upper bounds for the quantities $\Delta_{1}$ and $\Delta_{2}$ in the proof of Theorem 3.1. In the following analysis, we assume that $r\leq 2$ . The result of the theorem equally holds true for the case $r\geq 2$ and directly follows from the case $r=2$ via the Besov embedding $B^{s}_{r,q}(\mathbb{S}^{d})\subseteq B^{s}_{2,q}(\mathbb{S}^{d})$ for $r\geq 2$ (see [Bal+09], Theorem 5).

Bound for $\Delta_{1}$ when $\Theta=B_{r,q}^{s}(\mathbb{S}^{d},R)$

With $\widetilde{f}_{J}$ taking the role of $\widetilde{f}_{m}$ in Theorem 3.1, that is, setting $m=\sum_{j=0}^{J}N_{j}$ , we obtain

[TABLE]

To bound $\lVert f-\widetilde{f}_{J}\rVert_{n}$ uniformly for $f\in\Theta=B_{r,q}^{s}(\mathbb{S}^{d},R)$ , we use the decomposition

[TABLE]

where we have used that $\mathbb{X}$ is a spherical $t$ -design with $t\geq 2\cdot(2^{J}-1)$ . Below, in the analysis of $\Delta_{2}$ , we will show that

[TABLE]

From [Bal+09], p. 3383, we obtain

[TABLE]

Finally, by Lemma 7.1 and again the estimate from [Bal+09], p. 3383, we obtain that

[TABLE]

Combining (7.8), (7.9), and (7.10), we obtain

[TABLE]

Bound for $\Delta_{2}$ when $\Theta=B_{r,q}^{s}(\mathbb{S}^{d},R)$

We now derive a bound for $\lVert f-\widetilde{f}_{J}\rVert_{L^{2}(\mathbb{S}^{d})}$ . Using Equation (3.15), we obtain

[TABLE]

where we use both the fact that the equal-weight cubature rule associated with the spherical $t$ -design $\mathbb{X}$ is exact for polynomials of degree $\leq 3\cdot 2^{J}$ and identity (3.12). It follows that

[TABLE]

From [Bal+09], we immediately obtain that the first term on the right-hand side can be bounded as

[TABLE]

Let us now consider the second term. [Wan+17], Theorem 3.3, yields that for $H\in C^{\infty}$ the inequality

[TABLE]

holds with a constant $C=C(d,H)$ that depends neither on $\tau$ nor on $x$. Using this inequality yields

[TABLE]

where we use Lemma 7.1 twice (in each case with $p_{0}=2$ but first with $p=1$ and then with $p=2$ ). Hence,

[TABLE]

Combining the bounds (7.12) and (7.13), we obtain, uniformly for $f\in\Theta$ ,

[TABLE]

which implies that

[TABLE]

Combining the bounds (7.11) and (7.14) finishes the proof of Theorem 3.8.

7.4 Proof of Theorem 4.1

As in the fixed design case, we denote by $\langle\boldsymbol{\cdot},\boldsymbol{\cdot}\rangle_{n}$ the empirical inner product defined by

[TABLE]

and write $\lVert\,\boldsymbol{\cdot}\,\rVert_{n}$ for the associated norm. Moreover, $\Pi^{n}_{V}$ denotes the orthogonal projection onto $V$ with respect to $\langle\boldsymbol{\cdot},\boldsymbol{\cdot}\rangle_{n}$.

We first notice that the random design regression experiment $\mathfrak{R}_{n}^{d}$ given by Equation (4.1) is asymptotically equivalent to the experiment $\check{\mathfrak{R}}_{n}^{d}$ with (4.1) replaced by

[TABLE]

Indeed, by conditioning on the design $\mathbb{X}$ , we have for $Z=(Z_{1},\ldots,Z_{n})^{\top}$ and $\check{Z}=(\check{Z}_{1},\ldots,\check{Z}_{n})^{\top}$ the bound

[TABLE]

Using the conditioning property for $f$ -divergences (see [PW24], p. 117) and Jensen’s inequality, we obtain, uniformly for $f\in\Theta$ ,

[TABLE]

Hence,

[TABLE]

Analogously, we bound the Le Cam distance between the Gaussian white noise experiment $\mathfrak{G}_{n}^{d}$ and the truncated experiment $\widetilde{\mathfrak{G}}_{n}^{d}$ given by

[TABLE]

In this case, a direct application of inequality (2.5) yields that

[TABLE]

After this reduction to two experiments with truncated parameter sets, it remains to find a bound for the Le Cam distance $\Delta(\check{\mathfrak{R}}^{d}_{n},\widetilde{\mathfrak{G}}_{n}^{d})$; that is, for the rest of the proof we may assume without loss of generality, and without further mention, that $f\in\mathscr{P}_{L}^{d}$.

In order to obtain the bound for $\Delta(\check{\mathfrak{R}}^{d}_{n},\widetilde{\mathfrak{G}}_{n}^{d})$ we consider three further intermediate experiments (denoted by $\mathfrak{I}_{n,1}^{d}$ , $\mathfrak{I}_{n,2}^{d}$ , and $\mathfrak{I}_{n,3}^{d}$ below). To state these experiments, we first introduce some notation. Define the event

[TABLE]

As in the proof of Theorem 3.1, we denote by $X\in\mathbb{R}^{n\times D}$ the design matrix $(Y_{j}(X_{i}))$, where $j=\iota(\ell,m)$ for the enumeration function introduced in Section 2.1 and $Y_{1},\ldots,Y_{D}$ are the spherical harmonics up to resolution level $L$; that is, $D=\iota(L,N^{d}_{L})$ and $Y_{1},\ldots,Y_{D}$ form an $L^{2}(\mathbb{S}^{d})$-orthonormal basis of $\mathscr{P}^{d}_{L}$. If the matrix $X\in\mathbb{R}^{n\times D}$ has full column rank (which is always the case on $\Omega_{L}$), we apply the generalized thin $QR$ decomposition (described in detail in Section 7.6.3) to obtain a matrix $Q\in\mathbb{R}^{n\times D}$ with orthonormal columns and a block upper triangular matrix $R$ such that

[TABLE]

Here, the choice of the blocks is as in Proposition 7.3, that is, the diagonal blocks are in one-to-one correspondence with the eigenspaces $\mathscr{H}^{d}_{\ell}$ , $\ell=1,\ldots,L$ . We define functions $Y_{1}^{n},\ldots,Y_{D}^{n}$ by

[TABLE]

The fact that $(R^{-1})^{\top}$ is a block lower triangular matrix implies that $Y_{j}^{n}\in\mathscr{P}^{d}_{L}$ for $j=1,\ldots,D$ . Consider the mapping $T\colon\mathscr{P}^{d}_{L}\to\mathscr{P}^{d}_{L}$ defined by $TY_{j}=Y_{j}^{n}$ for $j=1,\ldots,D$ . Since the matrix $Q$ in the generalized thin $QR$ decomposition has orthonormal columns, $Y_{1}^{n},\ldots,Y_{D}^{n}$ are orthonormal with respect to the empirical scalar product $\langle\boldsymbol{\cdot},\boldsymbol{\cdot}\rangle_{n}$ and $T\colon(\mathscr{P}^{d}_{L},\lVert\,\boldsymbol{\cdot}\,\rVert_{L^{2}(\mathbb{S}^{d})})\to(\mathscr{P}^{d}_{L},\lVert\,\boldsymbol{\cdot}\,\rVert_{n})$ is an isometry. More precisely, we have

[TABLE]

where $R_{k,j}$ is the entry of $R$ in the $k$ -th row and $j$ -th column.

Set $D_{0}=\iota(L_{0},N^{d}_{L_{0}})$ for the intermediate resolution level $L_{0}\leq L$. Starting from observations in the random uniform design regression experiment $\check{\mathfrak{R}}_{n}^{d}$ with regression function $f\in\mathscr{P}^{d}_{L}\cap\Theta$, we define, given the design $\mathbb{X}$, a first intermediate continuous experiment $\mathfrak{I}_{n,1}^{d}$ with observation $\widetilde{Z}_{1}$ given by

[TABLE]

where $\zeta_{1}$ is standard Gaussian white noise on the complement of $\mathscr{P}_{L}^{d}$ . Note that the observations $\check{Z}=(\check{Z}_{1},\ldots,\check{Z}_{n})$ in the experiment $\check{\mathfrak{R}}_{n}^{d}$ and $\widetilde{Z}_{1}$ in the experiment $\mathfrak{I}_{n,1}^{d}$ can be transferred into one another provided that $X$ has full column rank. Obviously, (7.16) defines $\widetilde{Z}_{1}$ in terms of $\check{Z}$ under this assumption. Vice versa, observations following the same distribution as $\check{Z}$ can be generated from (7.16) by transforming the high-frequency part by application of the mapping $T$ (that is, $Y_{j}$ is replaced with $Y_{j}^{n}$ ) and then using the same argument as in the proof of Theorem 3.1. Defining Markov kernels between the underlying measurable spaces arbitrarily on the null set where $X$ does not have full column rank then implies $\Delta(\check{\mathfrak{R}}_{n}^{d},\mathfrak{I}_{n,1}^{d})=0$ .

In addition to $\mathfrak{I}_{n,1}^{d}$, we define two further intermediate experiments $\mathfrak{I}_{n,2}^{d}$ and $\mathfrak{I}_{n,3}^{d}$, given, conditionally on the design $\mathbb{X}$, by the observations $\widetilde{Z}_{2}$ and $\widetilde{Z}_{3}$ defined as follows:

[TABLE]

Here, both $\zeta_{2}$ and $\zeta_{3}$ denote standard Gaussian white noise on $L^{2}(\mathbb{S}^{d})$ . In the following, we will bound the Le Cam distance $\Delta(\check{\mathfrak{R}}_{n}^{d},\widetilde{\mathfrak{G}}_{n}^{d})$ by means of the triangle inequality,

[TABLE]

In order to bound these three terms, it is sufficient to work on the event $\Omega_{L}$ , since by (7.26) we have the bound

[TABLE]

(of course, the terms $\Delta(\mathfrak{I}_{n,2}^{d},\mathfrak{I}_{n,3}^{d})$ and $\Delta(\mathfrak{I}_{n,3}^{d},\widetilde{\mathfrak{G}}_{n}^{d})$ can be treated analogously). It remains to find appropriate bounds for the terms

[TABLE]

and

[TABLE]

which will finish the proof. Before we consider these three terms separately, we recall that the transformations used to define $\widetilde{Z}_{1}$, $\widetilde{Z}_{2}$, and $\widetilde{Z}_{3}$ are well-defined only if $X$ has full column rank. As already mentioned above, this condition holds on the event $\Omega_{L}$. If $X$ does not have full column rank, the Markov kernels that transform between the considered experiments can be defined arbitrarily.

Bound on $\mathbf{E}[V(\mathcal{L}(\widetilde{Z}_{1}|\mathbb{X}),\mathcal{L}(\widetilde{Z}_{2}|\mathbb{X}))\mathbf{1}_{\Omega_{L}}]$

Following the notation introduced in Section 7.6.3, we denote by $X_{1:n,\underline{1}:\underline{L}_{0}}$ the submatrix of $X$ consisting only of the first $D_{0}$ columns. Note that $\widetilde{Z}_{1}$ can be written as

[TABLE]

where the mean zero vector $\beta^{\prime}=(\beta^{\prime}_{1},\ldots,\beta^{\prime}_{D_{0}})^{\top}$ is defined by

[TABLE]

for $\varepsilon=(\varepsilon_{1},\ldots,\varepsilon_{n})^{\top}\sim\mathcal{N}(0,1)^{\otimes n}$ and has covariance matrix

[TABLE]

The processes $\widetilde{Z}_{1}$ and $\widetilde{Z}_{2}$ have the same mean but different covariance structure, and applying [DMR18], Theorem 1.1, combined with Equation (2) from the same reference, we obtain that

[TABLE]

where $\lVert\,\boldsymbol{\cdot}\,\rVert_{\mathrm{F}}$ denotes the Frobenius norm of a matrix. Taking expectations and using the addition formula (2.3) we obtain that

[TABLE]
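To make the Gaussian comparison step concrete, the following Python sketch (our own illustration, not part of the proof) computes, in dimension one, the exact total variation distance between two centered Gaussians and compares it with the Frobenius-norm bounds of [DMR18], Theorem 1.1. In one dimension, $\lVert\Sigma_{1}^{-1/2}\Sigma_{2}\Sigma_{1}^{-1/2}-I\rVert_{\mathrm{F}}$ reduces to $\lvert\sigma_{2}^{2}/\sigma_{1}^{2}-1\rvert$. The constants $1/100$ and $3/2$ (applied to the minimum of $1$ and the Frobenius term) are our reading of that theorem and should be checked against the reference; the function names are hypothetical.

```python
import math


def std_normal_cdf(x: float) -> float:
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def tv_centered_normals(sigma1: float, sigma2: float) -> float:
    """Exact total variation distance between N(0, sigma1^2) and N(0, sigma2^2)."""
    lo, hi = sorted((sigma1, sigma2))
    s = hi / lo
    if s == 1.0:
        return 0.0
    # After rescaling by lo, the N(0, 1) density exceeds the N(0, s^2) density
    # exactly on |x| < a, where a^2 = 2 s^2 ln(s) / (s^2 - 1).
    a = math.sqrt(2.0 * s * s * math.log(s) / (s * s - 1.0))
    return 2.0 * (std_normal_cdf(a) - std_normal_cdf(a / s))


def dmr_bounds(sigma1: float, sigma2: float) -> tuple:
    """Assumed 1-D form of the [DMR18] bounds: (1/100) m <= TV <= (3/2) m, m = min(1, frob)."""
    frob = abs(sigma2 ** 2 / sigma1 ** 2 - 1.0)  # ||S1^{-1/2} S2 S1^{-1/2} - I||_F in 1-D
    m = min(1.0, frob)
    return m / 100.0, 1.5 * m


for s1, s2 in [(1.0, 1.1), (1.0, 2.0), (2.0, 1.0), (1.0, 10.0)]:
    lower, upper = dmr_bounds(s1, s2)
    tv = tv_centered_normals(s1, s2)
    print(f"sigma1={s1}, sigma2={s2}: {lower:.4f} <= TV={tv:.4f} <= {upper:.4f}")
```

In the proof, the same comparison is applied conditionally on $\mathbb{X}$ to the $D_{0}$-dimensional Gaussian vectors, where the Frobenius norm replaces the scalar ratio.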

Bound on $\mathbf{E}[V(\mathcal{L}(\widetilde{Z}_{2}|\mathbb{X}),\mathcal{L}(\widetilde{Z}_{3}|\mathbb{X}))\mathbf{1}_{\Omega_{L}}]$

Note that on the event $\Omega_{L}$ we have that

[TABLE]

By combining this bound with Equation (2.5), we have

[TABLE]

where $v=(f(X_{1}),\ldots,f(X_{n}))^{\top}-(\Pi^{d}_{L_{0}}f(X_{1}),\ldots,\Pi^{d}_{L_{0}}f(X_{n}))^{\top}$ . Using the addition formula (2.3), we obtain

[TABLE]

from which we conclude that

[TABLE]
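The addition formula (2.3) enters the last two bounds through the pointwise identity $\sum_{m}Y_{\ell,m}(x)^{2}=N^{d}_{\ell}$, which holds when the $Y_{\ell,m}$ are orthonormal with respect to the uniform probability measure $\mu$ (the normalization under which $\mathbf{E}X_{1:n,\underline{\ell}}^{\top}X_{1:n,\underline{\ell}}=nI_{N^{d}_{\ell}}$, as used later in this proof). The sketch below checks it for $d=2$ and $\ell=2$, using one standard choice of real spherical harmonics rescaled by $\sqrt{4\pi}$; this basis is our choice and need not coincide with the one fixed in Section 2.1.

```python
import math
import random


def real_harmonics_deg2(x: float, y: float, z: float) -> list:
    """Degree-2 real spherical harmonics on S^2, orthonormal for the uniform probability measure.

    These are the usual real harmonics multiplied by sqrt(4*pi) (an illustrative choice of basis).
    """
    c = math.sqrt(15.0)
    return [
        c * x * y,
        c * y * z,
        c * x * z,
        0.5 * c * (x * x - y * y),
        0.5 * math.sqrt(5.0) * (3.0 * z * z - 1.0),
    ]


def random_point_on_sphere(rng: random.Random) -> list:
    """Uniform point on S^2 obtained by normalizing a standard Gaussian vector."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        r = math.sqrt(sum(t * t for t in v))
        if r > 0.0:
            return [t / r for t in v]


rng = random.Random(0)
for _ in range(3):
    p = random_point_on_sphere(rng)
    # Addition formula at x = y: the sum of squares equals N_2^2 = 5 at every point.
    print(sum(h * h for h in real_harmonics_deg2(*p)))
```

Because the sum is constant in $x$, taking expectations over the design points in the proof turns sums of squared harmonics into multiples of $N^{d}_{\ell}$.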

Bound on $\mathbf{E}[V(\mathcal{L}(\widetilde{Z}_{3}|\mathbb{X}),\mathcal{L}(\widetilde{Z}|\mathbb{X}))\mathbf{1}_{\Omega_{L}}]$

We have

[TABLE]

By (7.15), the matrix corresponding to the mapping $T^{-1}$ (in terms of the spherical harmonics $Y_{1},\ldots,Y_{D}$ ) is the block upper triangular matrix $R$ which implies that

[TABLE]

Hence,

[TABLE]

Combining the identity

[TABLE]

for $\ell,\ell^{\prime}\in\mathbb{N}_{0}$ , $\ell,\ell^{\prime}\leq L$ , with Proposition 7.3 yields that

[TABLE]

We decompose

[TABLE]

The Cauchy-Schwarz inequality yields

[TABLE]

Combining the Cauchy-Schwarz inequality, the estimate $(a\pm b)^{2}\leq 2a^{2}+2b^{2}$ , and the fact that $T$ is an isometry yields that

[TABLE]

Hence, by combining the last estimate with Equation (7.26), we get

[TABLE]

Identifying the projection $\Pi_{\mathscr{H}^{d}_{\ell}}f$ with the coefficient vector

[TABLE]

we can write

[TABLE]

where the matrix $M_{\ell}\in\mathbb{R}^{N^{d}_{\ell}\times N^{d}_{\ell}}$ is defined as

[TABLE]

On $\Omega_{L}$ we have

[TABLE]

which implies that the spectrum of $M_{\ell}$ is contained in the interval $[0,3/2]$ . Consequently, the spectrum of $M_{\ell}-I_{N^{d}_{\ell}}$ is contained in $[-1,1/2]$ on $\Omega_{L}$ . Using the Taylor series expansion of the matrix function $A\mapsto(I+A)^{1/2}$ (which converges for $A$ with spectrum contained in $[-1,1]$ ; see [Hig08], Theorem 4.7), we obtain that

[TABLE]
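For intuition on the matrix square root step, the following sketch compares the truncated binomial series $\sum_{k}\binom{1/2}{k}A^{k}$ for $(I+A)^{1/2}$ with the square root obtained from an eigendecomposition. The test matrix (random symmetric, spectrum $\{-0.9,-0.3,0.2,0.5\}\subset[-1,1/2]$) and the number of terms are our assumptions. It also checks the elementary consequence $\lVert(I+A)^{1/2}-I\rVert\leq\lVert A\rVert$ in spectral norm, which follows from $\lvert\sqrt{1+x}-1\rvert\leq\lvert x\rvert$ for $x\geq-1$; this is a generic fact and not necessarily the precise form of the estimate displayed above.

```python
import numpy as np


def sqrt_binomial_series(A: np.ndarray, terms: int = 200) -> np.ndarray:
    """Truncated binomial series sum_{k < terms} binom(1/2, k) A^k for (I + A)^{1/2}."""
    n = A.shape[0]
    result = np.zeros_like(A, dtype=float)
    power = np.eye(n)
    coeff = 1.0  # binom(1/2, 0)
    for k in range(terms):
        result += coeff * power
        power = power @ A
        coeff *= (0.5 - k) / (k + 1)  # binom(1/2, k+1) from binom(1/2, k)
    return result


def sqrt_eig(S: np.ndarray) -> np.ndarray:
    """Principal square root of a symmetric positive semidefinite matrix."""
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(np.clip(w, 0.0, None))) @ V.T


rng = np.random.default_rng(1)
V, _ = np.linalg.qr(rng.standard_normal((4, 4)))
A = V @ np.diag([-0.9, -0.3, 0.2, 0.5]) @ V.T  # symmetric, spectrum in [-1, 1/2]

S = sqrt_eig(np.eye(4) + A)
err = np.max(np.abs(sqrt_binomial_series(A) - S))
print(f"series vs. eigendecomposition: {err:.2e}")
print(np.linalg.norm(S - np.eye(4), 2) <= np.linalg.norm(A, 2))
```

Convergence is slowest for eigenvalues close to $-1$, where the coefficients decay only like $k^{-3/2}$.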

By taking expectations we obtain

[TABLE]

First, since $\mathbf{E}X_{1:n,\underline{\ell}}^{\top}X_{1:n,\underline{\ell}}=nI_{N^{d}_{\ell}}$ , we obtain

[TABLE]

Second, by (7.26), Proposition 7.4, and Proposition 7.5, (b),

[TABLE]

where, according to Proposition 7.4, we can take

[TABLE]

Hence, for $\ell\leq L$ ,

[TABLE]

Third, the bound

[TABLE]

yields

[TABLE]

from which we obtain that

[TABLE]
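The first of the three steps above rests on $\mathbf{E}X_{1:n,\underline{\ell}}^{\top}X_{1:n,\underline{\ell}}=nI_{N^{d}_{\ell}}$, i.e., on orthonormality of the harmonics with respect to the law of the design points. A Monte Carlo sketch for $d=2$ and $L=1$, assuming the basis $\{1,\sqrt{3}x_{1},\sqrt{3}x_{2},\sqrt{3}x_{3}\}$ (orthonormal for the uniform probability measure; our choice), shows that $n^{-1}X^{\top}X$ is close to the identity. Sample size and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
pts = rng.standard_normal((n, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)  # i.i.d. uniform points on S^2

# Columns: the degree-0 harmonic 1 and the degree-1 harmonics sqrt(3) x_k,
# orthonormal with respect to the uniform probability measure (illustrative basis).
X = np.column_stack([np.ones(n), np.sqrt(3.0) * pts])

gram = X.T @ X / n
deviation = np.linalg.norm(gram - np.eye(4), 2)  # spectral norm of n^{-1} X^T X - I
print(np.round(gram, 2))
print(f"||n^-1 X^T X - I|| = {deviation:.4f}")
```

The fluctuation of $n^{-1}X^{\top}X$ around $I$ is what the event $\Omega_{L}$ controls; its probability is bounded in Section 7.6.2.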

We have

[TABLE]

Note that the non-zero eigenvalues of $X_{1:n,\underline{\ell}}X_{1:n,\underline{\ell}}^{\top}$ coincide with those of $X_{1:n,\underline{\ell}}^{\top}X_{1:n,\underline{\ell}}$ and that on the event $\Omega_{L}$ the positive eigenvalues of $X_{1:n,\underline{\ell}}^{\top}X_{1:n,\underline{\ell}}$ are bounded by $3n/2$ . It follows that

[TABLE]
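The eigenvalue fact used here is standard linear algebra: for any matrix $A$, the non-zero eigenvalues of $AA^{\top}$ and $A^{\top}A$ coincide. A quick numerical check, with a random tall matrix of our choosing ($7\times 3$) standing in for $X_{1:n,\underline{\ell}}$:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((7, 3))  # stands in for a tall block X_{1:n, ell}

eig_small = np.linalg.eigvalsh(A.T @ A)  # 3 eigenvalues, ascending
eig_big = np.linalg.eigvalsh(A @ A.T)    # 7 eigenvalues, ascending; 4 are (numerically) zero

print(np.allclose(eig_small, eig_big[-3:]))        # the non-zero spectra coincide
print(np.allclose(eig_big[:4], 0.0, atol=1e-10))   # the remaining eigenvalues vanish
print(np.isclose(eig_small[-1], eig_big[-1]))      # so a bound on the largest eigenvalue transfers
```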

By Proposition 7.5, (a), we thus obtain

[TABLE]

By combining all the obtained estimates, we have

[TABLE]

Thus

[TABLE]

finishing the proof of the theorem.

7.5 Proof of Theorem 5.1

For the experiments $\mathfrak{E}_{n}^{d}$ and $\mathfrak{G}_{n}^{d}$, we reduce the common parameter set $\Theta$ to $\Theta_{n}^{\prime}=\{f_{1,n},f_{2,n}\}$ for suitably defined $f_{1,n},f_{2,n}\in\Theta$. For any $n\in\mathbb{N}$ and for either $\Theta=H^{s}_{2}(\mathbb{S}^{d},R)$ or $\Theta=B^{s}_{r,q}(\mathbb{S}^{d},R)$, we take $f_{1,n}\equiv 0$ (which trivially belongs to both smoothness classes under consideration). In order to define $f_{2,n}$, we first choose an integer $L$ such that $2n\leq\dim\mathscr{P}^{d}_{L}\leq Cn$ for some suitable constant $C>2$. Consider the linear subspace $W_{n}$ of $\mathscr{P}_{L}^{d}$ defined by

[TABLE]

We have

[TABLE]
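The choice of $L$ at the start of this proof can be made explicit. The sketch below computes $N^{d}_{\ell}=\binom{\ell+d}{d}-\binom{\ell+d-2}{d}$ (the standard dimension formula for degree-$\ell$ harmonics on $\mathbb{S}^{d}\subset\mathbb{R}^{d+1}$, with $N^{d}_{0}=1$) and $\dim\mathscr{P}^{d}_{L}=\sum_{\ell\leq L}N^{d}_{\ell}$, and takes the smallest $L$ with $\dim\mathscr{P}^{d}_{L}\geq 2n$. In the cases checked, $\dim\mathscr{P}^{d}_{L}/\dim\mathscr{P}^{d}_{L-1}\leq d+2$, so $C=2(d+2)$ suffices. Assuming $W_{n}$ consists of the elements of $\mathscr{P}^{d}_{L}$ that vanish at the design points (consistent with $f_{2,n}\equiv 0$ on $\mathbb{X}$ later in the proof), $W_{n}$ is the kernel of a linear map into $\mathbb{R}^{n}$, hence $\dim W_{n}\geq\dim\mathscr{P}^{d}_{L}-n\geq n$. Function names are hypothetical.

```python
from math import comb


def dim_harmonics(ell: int, d: int) -> int:
    """N_ell^d = dim H_ell^d for spherical harmonics of degree ell on S^d in R^{d+1}."""
    if ell == 0:
        return 1
    return comb(ell + d, d) - comb(ell + d - 2, d)


def dim_polynomials(L: int, d: int) -> int:
    """dim P_L^d = N_0^d + ... + N_L^d."""
    return sum(dim_harmonics(ell, d) for ell in range(L + 1))


def smallest_level(n: int, d: int) -> tuple:
    """Smallest L with dim P_L^d >= 2n, returned together with dim P_L^d."""
    L, total = 0, dim_harmonics(0, d)
    while total < 2 * n:
        L += 1
        total += dim_harmonics(L, d)
    return L, total


for d in (2, 3):
    for n in (100, 1000):
        L, dim = smallest_level(n, d)
        # Assumed definition: W_n = {g in P_L^d : g vanishes on the n design points},
        # so dim W_n >= dim - n >= n.
        print(f"d={d}, n={n}: L={L}, dim P_L={dim}, dim/n={dim / n:.2f}, dim W_n >= {dim - n}")
```

For $d=2$ the formula gives $\dim\mathscr{P}^{2}_{L}=(L+1)^{2}$, the familiar count of spherical harmonics on the $2$-sphere.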

Now, by [DW13], Proposition 3.5, there exists a function $h_{n}\in W_{n}$ such that

[TABLE]

for all $p\in[1,\infty]$ . Based on these preliminaries we now construct $f_{2,n}\in\Theta$ such that

[TABLE]

Case $\Theta=H_{2}^{s}(\mathbb{S}^{d},R)$

We make the ansatz

[TABLE]

with a constant $c>0$, independent of $n$, whose value will be fixed below. Since $f_{2,n}\in W_{n}\subseteq\mathscr{P}_{L}^{d}$, we obtain from the choice of $L$ combined with (7.17) that

[TABLE]

showing that $f_{2,n}\in H_{2}^{s}(\mathbb{S}^{d},R)$ for $c$ sufficiently small. Moreover,

[TABLE]

Case $\Theta=B^{s}_{r,q}(\mathbb{S}^{d},R)$ with $r\geq 2$

As in the previous case, we put

[TABLE]

with a constant $c>0$ independent of $n$ to be chosen appropriately. Let $m\in\mathbb{N}$ be such that $2^{m-1}\leq L<2^{m}$ . Note that $E_{2^{j}}(f_{2,n},r)\leq\lVert f_{2,n}\rVert_{L^{r}(\mathbb{S}^{d})}$ for any $j\in\mathbb{N}_{0}$ and $E_{2^{j}}(f_{2,n},r)=0$ for $j\geq m$ . Then, using the equivalence (3.14) of norms, we obtain

[TABLE]

Hence $f_{2,n}\in B^{s}_{r,q}(\mathbb{S}^{d},R)$ provided that $c$ is chosen sufficiently small. Moreover, as in the previous case,

[TABLE]

Case $\Theta=B^{s}_{r,q}(\mathbb{S}^{d},R)$ with $r<2$

Let $x_{n}^{\ast}\in\mathbb{S}^{d}$ be such that $\lVert h_{n}\rVert_{L^{\infty}(\mathbb{S}^{d})}=\lvert h_{n}(x_{n}^{\ast})\rvert$ , and put

[TABLE]

with $\kappa_{L,H}$ as defined by (3.11) and (3.13). As in the previous cases, the constant $c>0$ has to be chosen sufficiently small and independent of $n$ . On the one hand

[TABLE]

on the other hand

[TABLE]

and hence $\kappa_{L,H}(x_{n}^{\ast},x_{n}^{\ast})\asymp L^{d}$ . We have

[TABLE]

By [NPW06], Proposition 2.5, we obtain that, for $1\leq p<\infty$ ,

[TABLE]

where we used (7.17) and [Wan+17], Theorem 3.3, in the last estimate. Using (7.19) and [NPW06], Proposition 2.5, again, we have

[TABLE]

Combining the last estimate with (7.20), we obtain

[TABLE]

also for $1\leq p<\infty$. Using (7.21), we first obtain, with $m$ chosen as in the previous case, that

[TABLE]

hence $f_{2,n}\in B^{s}_{r,q}(\mathbb{S}^{d},R)$ for $c$ sufficiently small. Moreover, also from (7.21) we directly obtain

[TABLE]

which is (7.18) for $\Theta=B^{s}_{r,q}(\mathbb{S}^{d},R)$ and $r<2$ .

Having established (7.18) in all three cases of interest, we can now derive (5.1). By the very definition of $W_{n}$, we have $f_{2,n}\rvert_{\mathbb{X}}\equiv 0$, which trivially implies that

[TABLE]

for any $n$ . Combining (7.18) with (2.5), however, shows that

[TABLE]

for all the considered cases. [RSH19], Lemma 1, states that

[TABLE]

Plugging (7.22) and (7.23) into (7.24) implies that $\delta(\mathfrak{E}_{n}^{d},\mathfrak{G}_{n}^{d})\gtrsim 1$ and hence the asymptotic non-equivalence of $\mathfrak{E}_{n}^{d}$ and $\mathfrak{G}_{n}^{d}$ .

7.6 Auxiliary results

7.6.1 A consequence of the Marcinkiewicz-Zygmund condition

Our asymptotic equivalence results for specific function classes, stated in Sections 3.2.1 and 3.2.2, rely on recent work on numerical integration on spheres, for instance [LW23]. The main assumption in [LW23] is a Marcinkiewicz-Zygmund condition on the cubature points (see [LW23], Equation (1.3)). The following technical lemma is essentially based on [Dai06], Theorem 2.1; our formulation is slightly closer to the version stated in [LW23], Lemma 3.1. Condition (7.25) of the lemma is always satisfied for spherical $t$-designs with $p_{0}=2$, $t\geq 2L_{0}$, and $C_{0}=1$, since $\lvert g\rvert^{2}\in\mathscr{P}^{d}_{2L_{0}}$ for $g\in\mathscr{P}^{d}_{L_{0}}$ (in this case, equality even holds in (7.25)). Lemma 7.1 then allows one to bound the value of the cubature formula (which is exact on $\mathscr{P}^{d}_{L_{0}}$) from above by the target integral also for truncated spherical harmonic expansions on which the cubature is no longer exact (the more coefficients are involved, the less accurate the bound becomes). Bounds of this kind are used in the proofs of Theorems 3.7 and 3.8 to control the empirical norms appearing in the term $\Delta_{1}$ of Theorem 3.1.
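The exactness of equal-weight cubature on spherical $t$-designs, and the equality case of (7.25) for $p_{0}=2$, can be checked exactly on a toy example. The sketch below uses the moment formula $\int_{\mathbb{S}^{d}}\prod_{i}x_{i}^{a_{i}}\,\differential\mu=\prod_{i}(a_{i}-1)!!\big/\prod_{k=0}^{\lvert a\rvert/2-1}(d+1+2k)$ for even exponents (and $0$ otherwise), with $\mu$ assumed to be the uniform probability measure, and verifies that the six vertices of the octahedron form a spherical $3$-design on $\mathbb{S}^{2}$ but not a $4$-design. For an example $g\in\mathscr{P}^{2}_{1}$ (so $L_{0}=1$ and $t=3\geq 2L_{0}$), the cubature reproduces $\lVert g\rVert^{2}_{L^{2}(\mathbb{S}^{2})}$ exactly. The point set, the polynomial $g$, and the helper names are ours.

```python
from fractions import Fraction
from itertools import product


def double_factorial(k: int) -> int:
    """k!! with the conventions (-1)!! = 0!! = 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result


def sphere_moment(exponents, d: int) -> Fraction:
    """Integral of prod_i x_i^{a_i} w.r.t. the uniform probability measure on S^d in R^{d+1}."""
    if any(a % 2 for a in exponents):
        return Fraction(0)
    num = 1
    for a in exponents:
        num *= double_factorial(a - 1)
    den = 1
    for k in range(sum(exponents) // 2):
        den *= d + 1 + 2 * k
    return Fraction(num, den)


def is_t_design(points, t: int) -> bool:
    """True if the equal-weight rule on `points` is exact for all monomials of degree <= t."""
    dim = len(points[0])
    for exps in product(range(t + 1), repeat=dim):
        if sum(exps) > t:
            continue
        avg = Fraction(0)
        for p in points:
            term = Fraction(1)
            for coord, a in zip(p, exps):
                term *= Fraction(coord) ** a
            avg += term
        if avg / len(points) != sphere_moment(exps, dim - 1):
            return False
    return True


octahedron = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
print(is_t_design(octahedron, 3), is_t_design(octahedron, 4))


def g(p):
    """A degree-1 polynomial, i.e. an element of P_1^2 (illustrative example)."""
    x, y, z = p
    return 1 + 2 * x - y + 3 * z


# |g|^2 has degree 2 <= t = 3, so the cubature reproduces ||g||_{L^2}^2 exactly.
cubature = Fraction(sum(g(p) ** 2 for p in octahedron), len(octahedron))
exact = 1 + Fraction(2 ** 2 + 1 ** 2 + 3 ** 2, 3)  # cross terms integrate to zero, E[x_i^2] = 1/3
print(cubature, exact)
```

Exact rational arithmetic avoids any ambiguity between "exact" and "accurate up to rounding" in the design check.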

Lemma 7.1.

Let $\mathbb{X}=\{x_{1},\ldots,x_{n}\}\subset\mathbb{S}^{d}$ and $\omega_{1},\ldots,\omega_{n}>0$ be weights satisfying, for some $L_{0}\in\mathbb{N}_{0}$ and some $p_{0}\in(0,\infty)$ ,

[TABLE]

where $C_{0}>0$ is a numerical constant. Then, for any $p\in(0,\infty)$ and any integer $L\geq L_{0}$ ,

[TABLE]

with a numerical constant $C=C(d,p)>0$ depending only on $d$ and $p$ .

7.6.2 Bound for $\mathbf{P}(\Omega_{L}^{\complement})$

The defining property of the event $\Omega_{L}$ ,

[TABLE]

is equivalent to

[TABLE]

where $X$ denotes the design matrix defined in the proof of Theorem 4.1. Applying Theorem 1 from [CDL13] (taking into account the corrected numerical constants from [CDL18]) with $\delta=1/2$ (leading to the choice $c_{1/2}=(3\ln(3/2)-1)/2>0$ in the statement of that theorem) and choosing the maximal $L$ such that for $\kappa_{r}=(3\ln(3/2)-1)/(2+2r)$

[TABLE]

we obtain the estimate

[TABLE]

In Theorem 4.1 and its proof, we choose $r=2$ which leads to $\kappa_{2}\approx 0.036$ .
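The numerical constants can be reproduced directly. The sketch assumes the form $c_{\delta}=(1+\delta)\ln(1+\delta)-\delta$ for the constant in [CDL13], Theorem 1; at $\delta=1/2$ this agrees with the value $c_{1/2}=(3\ln(3/2)-1)/2$ stated above, and $\kappa_{r}=c_{1/2}/(1+r)$.

```python
import math


def c_delta(delta: float) -> float:
    """Assumed form of the constant in [CDL13], Theorem 1: (1 + delta) ln(1 + delta) - delta."""
    return (1.0 + delta) * math.log(1.0 + delta) - delta


def kappa(r: float) -> float:
    """kappa_r = (3 ln(3/2) - 1) / (2 + 2r), i.e. c_{1/2} / (1 + r)."""
    return (3.0 * math.log(1.5) - 1.0) / (2.0 + 2.0 * r)


print(f"c_(1/2) = {c_delta(0.5):.4f}")  # coincides with (3 ln(3/2) - 1)/2
print(f"kappa_2 = {kappa(2):.4f}")
```

This prints $c_{1/2}\approx 0.1082$ and $\kappa_{2}\approx 0.0361$, consistent with $\kappa_{2}\approx 0.036$.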

7.6.3 Generalized thin $QR$ decomposition

For the proof of Theorem 4.1, we need a generalization of the classical thin $QR$ decomposition of a rectangular matrix with full column rank (see [GVL13], Theorem 5.2.3). Since we are not aware of a reference for this kind of generalization, we state it here in full detail (a proof can be obtained by an obvious adjustment of the one in the classical case). Recall that a square matrix $A\in\mathbb{R}^{n\times n}$ is called positive definite if the following two conditions hold:

(i) $A$ is symmetric, that is, $A=A^{\top}$;

(ii) $x^{\top}Ax>0$ for all $x\in\mathbb{R}^{n}$ with $x\neq 0$.

Here and in the proof of Theorem 4.1 (given in Section 7.4), we will use underlined indices like $\underline{\ell}$ in order to refer to the indices belonging to the block with index $\ell$ . For instance, in the following theorem $\underline{\ell}$ is a shorthand for the indices running from $\sum_{j=1}^{\ell-1}N_{j}+1$ to $\sum_{j=1}^{\ell}N_{j}$ .

Theorem 7.2 (Generalized thin $QR$ decomposition).

Suppose $X\in\mathbb{R}^{n\times D}$ has full column rank. Let $N_{1},\ldots,N_{L}\in\mathbb{N}$ such that $\sum_{\ell=1}^{L}N_{\ell}=D$ . Then, the generalized thin $QR$ decomposition

[TABLE]

exists and is unique, where $Q\in\mathbb{R}^{n\times D}$ has orthonormal columns and $R=(R_{\underline{\ell},\underline{\ell^{\prime}}})\in\mathbb{R}^{D\times D}$ with $R_{\underline{\ell},\underline{\ell^{\prime}}}\in\mathbb{R}^{N_{\ell}\times N_{\ell^{\prime}}}$ satisfies the following properties:

(i) $R_{\underline{\ell},\underline{\ell}^{\prime}}=0\in\mathbb{R}^{N_{\ell}\times N_{\ell^{\prime}}}$ if $\ell>\ell^{\prime}$, and

(ii) $R_{\underline{\ell},\underline{\ell}}$ is positive definite.

Proof.

The existence part of Theorem 7.2 can be proven in a constructive way, for instance by adapting the classical Gram-Schmidt algorithm (see [GVL13], Section 5.2.7) in a suitable manner. The resulting algorithm is stated in Algorithm 1. For a positive definite matrix $A\in\mathbb{R}^{D\times D}$ , we call a decomposition

[TABLE]

where $\Lambda=(\Lambda_{\underline{\ell},\underline{\ell}^{\prime}})\in\mathbb{R}^{D\times D}$ with blocks $\Lambda_{\underline{\ell},\underline{\ell}^{\prime}}\in\mathbb{R}^{N_{\ell}\times N_{\ell^{\prime}}}$ a (generalized) Cholesky decomposition if the following properties are satisfied:

(i) $\Lambda_{\underline{\ell},\underline{\ell}^{\prime}}=0\in\mathbb{R}^{N_{\ell}\times N_{\ell^{\prime}}}$ if $\ell<\ell^{\prime}$, and

(ii) $\Lambda_{\underline{\ell},\underline{\ell}}$ is positive definite.

As for the usual Cholesky decomposition (which corresponds to the case $L=D$ and $N_{1}=\ldots=N_{D}=1$), the generalized Cholesky decomposition is uniquely determined given $L$ and $N_{1},\ldots,N_{L}$, and the factor $\Lambda$ is referred to as the Cholesky factor. For the factor $R$ in the generalized thin $QR$ decomposition $X=QR$, we have $X^{\top}X=R^{\top}Q^{\top}QR=R^{\top}R$, so $\Lambda=R^{\top}$ is the Cholesky factor in the generalized Cholesky factorization $P=\Lambda\Lambda^{\top}$ of the symmetric positive definite matrix $P=X^{\top}X$ (for the same choice of $L$ and $N_{1},\ldots,N_{L}$). The uniqueness of the generalized thin $QR$ decomposition follows from the uniqueness of the Cholesky factor $\Lambda=R^{\top}$ combined with $Q=XR^{-1}$. ∎
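The block Gram-Schmidt construction behind the existence part can be sketched in a few lines of NumPy. This is our own illustration and not necessarily identical to Algorithm 1: each block of columns is orthogonalized against the previously computed blocks, and the diagonal block $R_{\underline{\ell},\underline{\ell}}$ is taken as the symmetric positive definite square root of the Gram matrix $V^{\top}V$ of the residual $V$, which makes $VR_{\underline{\ell},\underline{\ell}}^{-1}$ orthonormal. The block sizes $(1,3,5)$ mimic $N^{2}_{0},N^{2}_{1},N^{2}_{2}$; all function names are hypothetical.

```python
import numpy as np


def spd_sqrt(S: np.ndarray) -> np.ndarray:
    """Symmetric positive definite square root via an eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(w)) @ V.T


def block_offsets(sizes):
    """Half-open index ranges [start, end) of consecutive blocks with the given sizes."""
    ends = np.cumsum(sizes)
    starts = ends - np.asarray(sizes)
    return [(int(a), int(b)) for a, b in zip(starts, ends)]


def generalized_thin_qr(X: np.ndarray, sizes):
    """Block Gram-Schmidt sketch: X = Q R with orthonormal columns in Q and R block upper
    triangular with symmetric positive definite diagonal blocks. Requires full column rank."""
    n, D = X.shape
    if sum(sizes) != D:
        raise ValueError("block sizes must add up to the number of columns")
    Q = np.zeros((n, D))
    R = np.zeros((D, D))
    blocks = block_offsets(sizes)
    for l, (a, b) in enumerate(blocks):
        V = X[:, a:b].copy()
        for c, e in blocks[:l]:
            R[c:e, a:b] = Q[:, c:e].T @ X[:, a:b]  # coefficients on an earlier block
            V -= Q[:, c:e] @ R[c:e, a:b]           # remove that component
        R_ll = spd_sqrt(V.T @ V)                   # positive definite diagonal block
        R[a:b, a:b] = R_ll
        Q[:, a:b] = np.linalg.solve(R_ll, V.T).T   # V R_ll^{-1} (R_ll is symmetric)
    return Q, R


rng = np.random.default_rng(3)
sizes = [1, 3, 5]  # mimics N_0^2, N_1^2, N_2^2
X = rng.standard_normal((40, sum(sizes)))
Q, R = generalized_thin_qr(X, sizes)
print(np.allclose(Q @ R, X), np.allclose(Q.T @ Q, np.eye(9)))
print(np.allclose(X.T @ X, R.T @ R))  # Lambda = R^T is the generalized Cholesky factor of X^T X
```

With all block sizes equal to one, each diagonal block is the positive norm of a residual column, so the sketch reduces to classical Gram-Schmidt with a positive diagonal, i.e., the usual thin $QR$ decomposition. Classical Gram-Schmidt can lose orthogonality for ill-conditioned $X$; a reorthogonalized variant would be preferable in practice.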

In the following proposition we state key stochastic properties of the block upper triangular matrix $R$ in the generalized thin $QR$ decomposition (7.27).

Proposition 7.3.

Let $R$ be defined as the block upper triangular matrix $R$ with positive definite diagonal blocks in the generalized thin $QR$ decomposition $X=QR$ of the design matrix $X=(Y_{j}(X_{i}))\in\mathbb{R}^{n\times D}$ , where $Y_{1},\ldots,Y_{D}$ are all the spherical harmonics up to the resolution level $L$ (consequently, $D=\dim\mathscr{P}^{d}_{L}$ ), the numbers $N_{\ell}$ in the generalized thin $QR$ decomposition are chosen as $N_{\ell}=N_{\ell}^{d}$ , and $X_{1},\ldots,X_{n}$ are i.i.d. $\sim\mathcal{U}(\mathbb{S}^{d})$ . Then,

(a) $\mathbf{E}[R_{\underline{\ell},\underline{\ell}}]=\lambda_{\ell}I_{N^{d}_{\ell}}\in\mathbb{R}^{N^{d}_{\ell}\times N^{d}_{\ell}}$ for some $\lambda_{\ell}>0$, $\ell=1,\ldots,L$;

(b) $\mathbf{E}[R_{\underline{\ell},\underline{\ell}^{\prime}}]=0\in\mathbb{R}^{N^{d}_{\ell}\times N^{d}_{\ell^{\prime}}}$ for $\ell,\ell^{\prime}=1,\ldots,L$ with $\ell\neq\ell^{\prime}$.

Proof.

Let $U$ be a random rotation distributed according to the Haar measure on $\mathrm{SO}(d+1)$, independent of $X_{1},\ldots,X_{n}$, and set $\widetilde{X}_{i}\vcentcolon=U^{-1}X_{i}$ (here, $U$ should be interpreted as a $(d+1)\times(d+1)$ matrix and $X_{i}$ as an element of $\mathbb{R}^{d+1}$). Then, $\widetilde{X}_{1},\ldots,\widetilde{X}_{n}$ are i.i.d. $\sim\mathcal{U}(\mathbb{S}^{d})$. Denote by $\widetilde{X}=(Y_{j}(\widetilde{X}_{i}))$ the corresponding design matrix. Note that by [Gro96], Proposition 3.2.4, $\widetilde{X}$ can be written as

[TABLE]

where $B=(B_{\underline{\ell},\underline{\ell}^{\prime}})_{\ell,\ell^{\prime}=1,\ldots,L}\in\mathbb{R}^{D\times D}$ is a block matrix with $B_{\underline{\ell},\underline{\ell}^{\prime}}=0\in\mathbb{R}^{N^{d}_{\ell}\times N^{d}_{\ell^{\prime}}}$ if $\ell\neq\ell^{\prime}$ , and $B_{\underline{\ell},\underline{\ell}}$ is orthogonal. Let $\widetilde{X}=\widetilde{Q}\widetilde{R}$ be the generalized thin $QR$ decomposition of $\widetilde{X}$ which is related to the $QR$ decomposition of $X$ via

[TABLE]

that is, $\widetilde{R}=B^{\top}RB$. Since $R$ and $\widetilde{R}$ have the same law, we have

[TABLE]

Combining Schur’s lemma (see [Ser77], Proposition 4 and Corollary 1, which carry over to the compact group case as explained in Section 4.3 of that reference) with the fact that the spaces $\mathscr{H}^{d}_{\ell}$ , $\ell\in\mathbb{N}_{0}$ , are in one-to-one correspondence with the irreducible representations of $\mathrm{SO}(d+1)$ (see [DX13], Theorem 1.7.2) then implies both assertions (a) and (b). ∎
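The simplest instance of the rotation argument can be checked numerically. For $d=2$ and $\ell=1$, with $Y_{1,k}(x)=\sqrt{3}\,x_{k}$ (orthonormal for the uniform probability measure; our choice of basis), the degree-one block of the rotated design matrix is $X_{1:n,\underline{1}}U$, so the block $B_{\underline{1},\underline{1}}$ is $U$ itself. For a single block, the generalized thin $QR$ decomposition reduces to $R=(X^{\top}X)^{1/2}$, and the sketch confirms $\widetilde{R}=B^{\top}RB$. The Haar sampler (QR of a Gaussian matrix with sign correction, then a column flip to enforce determinant one) is a standard method we chose for illustration; names and sizes are hypothetical.

```python
import numpy as np


def haar_rotation(dim: int, rng: np.random.Generator) -> np.ndarray:
    """Haar-distributed rotation in SO(dim): QR of a Gaussian matrix with sign correction."""
    Q, R = np.linalg.qr(rng.standard_normal((dim, dim)))
    Q = Q * np.sign(np.diag(R))  # Haar on O(dim)
    if np.linalg.det(Q) < 0:     # right-multiply by diag(-1, 1, ..., 1) to land in SO(dim)
        Q[:, 0] = -Q[:, 0]
    return Q


def spd_sqrt(S: np.ndarray) -> np.ndarray:
    """Symmetric positive definite square root via an eigendecomposition."""
    w, V = np.linalg.eigh(S)
    return (V * np.sqrt(w)) @ V.T


rng = np.random.default_rng(4)
n, d = 50, 2
pts = rng.standard_normal((n, d + 1))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)  # X_1, ..., X_n uniform on S^2
U = haar_rotation(d + 1, rng)
pts_rot = pts @ U  # row i equals (U^{-1} X_i)^T = X_i^T U because U^{-1} = U^T

X1 = np.sqrt(3.0) * pts          # degree-1 block of the design matrix
X1_rot = np.sqrt(3.0) * pts_rot  # degree-1 block for the rotated design

R = spd_sqrt(X1.T @ X1)              # single-block generalized QR factor
R_rot = spd_sqrt(X1_rot.T @ X1_rot)
print(np.allclose(X1_rot, X1 @ U), np.allclose(R_rot, U.T @ R @ U))
```

For higher degrees, $B_{\underline{\ell},\underline{\ell}}$ is the (orthogonal) matrix of the rotation acting on $\mathscr{H}^{d}_{\ell}$, which is no longer $U$ itself; averaging over $U$ is what forces the expectations in (a) and (b) to commute with all these representations.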

7.6.4 A reverse Hölder inequality for spherical harmonics

The following result, which provides even (slightly) stronger statements than we need in the proof of Theorem 4.1, collects some special instances of reverse Hölder inequalities for spherical harmonics from [DFT16].

Proposition 7.4.

Let $f_{\ell}\in\mathscr{H}_{\ell}^{d}$ and $X\sim\mathcal{U}(\mathbb{S}^{d})$ . Then,

[TABLE]

Proof.

Note that $\mathbf{E}[f_{\ell}^{4}(X)]=\int_{\mathbb{S}^{d}}f_{\ell}^{4}(x)\,\differential\mu(x)\asymp\lVert f_{\ell}\rVert_{L^{4}(\mathbb{S}^{d})}^{4}$. With this, the claimed assertions follow from Theorem 1.1 in [DFT16]: the cases $d=2$ and $d=3$ follow from assertions (iv) and (i) of that theorem, respectively, and the case $d\geq 4$ follows from assertion (ii). ∎

7.6.5 Further estimates

Part (a) of the following proposition is an adaptation of [Rei08], Proposition 4.9, to our setup.

Proposition 7.5.

Let $f_{\ell}\in\mathscr{H}_{\ell}^{d}$ . Then, the following assertions hold true:

(a) $\mathbf{E}\lVert\Pi^{n}_{\mathscr{P}_{\ell-1}^{d}}f_{\ell}\rVert_{n}^{2}\mathbf{1}_{\Omega_{L}}\lesssim\lVert f_{\ell}\rVert^{2}_{L^{2}(\mathbb{S}^{d})}\ell^{d}n^{-1}$;

(b) $\mathbf{E}\lVert\Pi^{n}_{\mathscr{P}^{d}_{\ell-1}}f_{\ell}\rVert_{n}^{4}\lesssim\lVert f_{\ell}\rVert^{4}_{L^{2}(\mathbb{S}^{d})}\ell^{d}n^{-1}+\lVert f_{\ell}\rVert^{4}_{L^{2}(\mathbb{S}^{d})}\ell^{2d-2}n^{-2}$.

Proof.

We have

[TABLE]

which proves (a). For the proof of (b), we use the decomposition

[TABLE]

For the first term on the right-hand side, we have by (a) and the defining property of the event $\Omega_{L}$ that

[TABLE]

The second term on the right-hand side is bounded as

[TABLE]

Combining the two obtained bounds implies (b). ∎

Bibliography


[ANS96] Peter Alfeld, Marian Neamtu and Larry L. Schumaker “Fitting scattered data on sphere-like surfaces using spherical splines” In J. Comput. Appl. Math. 73.1–2 Elsevier BV, 1996, pp. 5–43 DOI: 10.1016/0377-0427(96)00034-9
[Ave93] John Avery “Selected applications of hyperspherical harmonics in quantum theory” In J. Phys. Chem. 97.10 American Chemical Society (ACS), 1993, pp. 2406–2412 DOI: 10.1021/j100112a048
[Bal+09] P. Baldi, G. Kerkyacharian, D. Marinucci and D. Picard “Adaptive density estimation for directional data using needlets” In Ann. Statist. 37.6A Institute of Mathematical Statistics, 2009 DOI: 10.1214/09-aos682
[BB09] Eiichi Bannai and Etsuko Bannai “A survey on spherical designs and algebraic combinatorics on spheres” In Eur. J. Combin. 30.6 Elsevier BV, 2009, pp. 1392–1425 DOI: 10.1016/j.ejc.2008.11.007
[BD79] E. Bannai and R. M. Damerell “Tight spherical designs. I” In J. Math. Soc. Jpn. 31.1, 1979, pp. 199–207 DOI: 10.2969/jmsj/03110199
[BG15] Johann S. Brauchart and Peter J. Grabner “Distributing many points on spheres: Minimal energy and designs” In J. Complexity 31.3, 2015, pp. 293–326 DOI: 10.1016/j.jco.2015.02.003
[BL96] Lawrence D. Brown and Mark G. Low “Asymptotic equivalence of nonparametric regression and white noise” In Ann. Statist. 24.6, 1996, pp. 2384–2398 DOI: 10.1214/aos/1032181159
[Bro+02] Lawrence D. Brown, T. Tony Cai, Mark G. Low and Cun-Hui Zhang “Asymptotic equivalence theory for nonparametric regression with random design” Dedicated to the memory of Lucien Le Cam In Ann. Statist. 30.3, 2002, pp. 688–707 DOI: 10.1214/aos/1028674838
