Inference for spherical location under high concentration

Davy Paindaveine; Thomas Verdebout

arXiv:1901.00359·math.ST·June 11, 2019

TL;DR

This paper develops a broad semiparametric framework for inference on spherical location parameters under high concentration, revealing optimal procedures and super-efficiency of the spherical mean, with theoretical and simulation validation.

Contribution

It extends inference methods for spherical data beyond Fisher-von Mises-Langevin models to a general class, analyzing high concentration asymptotics and establishing optimality properties.

Findings

01

The spherical mean is a super-efficient estimator of θ under high concentration.

02

Watson and Wald tests have non-standard optimality properties under high concentration.

03

Optimal inference procedures have consistency rates that depend on the nuisance function f.

Abstract

Motivated by the fact that circular or spherical data are often much concentrated around a location $θ$ , we consider inference about $θ$ under "high concentration" asymptotic scenarios for which the probability of any fixed spherical cap centered at $θ$ converges to one as the sample size $n$ diverges to infinity. Rather than restricting to Fisher-von Mises-Langevin distributions, we consider a much broader, semiparametric, class of rotationally symmetric distributions indexed by the location parameter $θ$ , a scalar concentration parameter $κ$ and a functional nuisance $f$ . We determine the class of distributions for which high concentration is obtained as $κ$ diverges to infinity. For such distributions, we then consider inference (point estimation, confidence zone estimation, hypothesis testing) on $θ$ in asymptotic scenarios…

Equations (314)

x \mapsto \frac{c _{p, κ_{n}, f} Γ ( \frac{p - 1}{2} )}{2 π ^{(p - 1) /2}} f (κ_{n} x^{'} θ_{n}),

c_{p,\kappa,f}:=1\,\Big{/}\int_{-1}^{1}(1-s^{2})^{(p-3)/2}f(\kappa s)\,ds.

u_{ni} := X_{ni}^{'} θ_{n}, \quad v_{ni} := \sqrt{1 - u_{ni}^{2}},

S_{ni} := \frac{(I_{p} - θ_{n} θ_{n}^{'}) X_{ni}}{\| (I_{p} - θ_{n} θ_{n}^{'}) X_{ni} \|} = \frac{1}{v_{ni}} (I_{p} - θ_{n} θ_{n}^{'}) X_{ni} .

s \mapsto c_{p, κ_{n}, f} (1 - s^{2})^{(p - 3) /2} f (κ_{n} s) I [s \in [- 1, 1]],

{\rm P}^{(n)}_{{\boldsymbol{\theta}}_{n},\kappa_{n},f}\big{[}\mathbf{X}_{n1}^{\prime}{\boldsymbol{\theta}}_{n}>1-\varepsilon\big{]}=c_{p,\kappa_{n},f}\int_{1-\varepsilon}^{1}(1-s^{2})^{(p-3)/2}f(\kappa_{n}s)\,ds\to 1,

c_{p, κ_{n}, f} \int_{1 - ε}^{1} (1 - s^{2})^{(p - 3) /2} f (κ_{n} s) d s = \frac{\int _{1 - ε}^{1} ( 1 - s ^{2} ) ^{(p - 3) /2} s ^{b} d s}{\int _{0}^{1} ( 1 - s ^{2} ) ^{(p - 3) /2} s ^{b} d s} =: C < 1,

\int_{-1}^{1}g_{\xi,\zeta}(s)\big{|}e^{\log f(\kappa s)-\log f(\kappa)}-e^{(s-1)\kappa\varphi_{f}(\kappa)}\big{|}\,ds=o\bigg{(}\frac{1}{(\kappa\varphi_{f}(\kappa))^{\xi+1}}\bigg{)}

(i)\qquad 1-e_{n2}=\frac{p-1}{\kappa_{n}\varphi_{f}(\kappa_{n})}+o\bigg{(}\frac{1}{\kappa_{n}\varphi_{f}(\kappa_{n})}\bigg{)},

(ii)\qquad\tilde{e}_{n2}=\frac{p-1}{2(\kappa_{n}\varphi_{f}(\kappa_{n}))^{2}}+o\bigg{(}\frac{1}{(\kappa_{n}\varphi_{f}(\kappa_{n}))^{2}}\bigg{)}

(iii)\qquad{\rm E}\big{[}v_{n1}^{4}\big{]}=\frac{p^{2}-1}{(\kappa_{n}\varphi_{f}(\kappa_{n}))^{2}}+o\bigg{(}\frac{1}{(\kappa_{n}\varphi_{f}(\kappa_{n}))^{2}}\bigg{)}

\frac{(1-e_{n2})^{2}}{\tilde{e}_{n2}}=2(p-1)+o(1)\quad\textrm{ and }\quad\frac{{\rm E}\big{[}v_{n1}^{4}\big{]}}{\tilde{e}_{n2}}=2(p+1)+o(1)

\hat{θ}_{n} := \frac{\bar{X}_{n}}{\| \bar{X}_{n} \|},

\sqrt{n\kappa_{n}\varphi_{f}(\kappa_{n})}\,(\hat{{\boldsymbol{\theta}}}_{n}-{\boldsymbol{\theta}})\stackrel{{\scriptstyle\mathcal{D}}}{{\to}}\mathcal{N}\big{(}{\bf 0},{\bf I}_{p}-{\boldsymbol{\theta}}{\boldsymbol{\theta}}^{\prime}\big{)}

n\kappa_{n}\varphi_{f}(\kappa_{n})\big{(}1-({\boldsymbol{\theta}}^{\prime}\hat{{\boldsymbol{\theta}}}_{n})^{2}\big{)}\stackrel{{\scriptstyle\mathcal{D}}}{{\to}}\chi^{2}_{p-1}

\Bigg{\{}{\boldsymbol{\theta}}\in\mathcal{S}^{p-1}\!:|{\boldsymbol{\theta}}^{\prime}\hat{{\boldsymbol{\theta}}}_{n}|\geq\sqrt{1-\frac{\chi^{2}_{p-1,1-\alpha}}{n\kappa_{n}\varphi_{f}(\kappa_{n})}}\ \Bigg{\}},

\Bigg{\{}{\boldsymbol{\theta}}\in\mathcal{S}^{p-1}\!:{\boldsymbol{\theta}}^{\prime}\hat{{\boldsymbol{\theta}}}_{n}\geq 1-\frac{\chi^{2}_{p-1,1-\alpha}}{2n\kappa_{n}\varphi_{f}(\kappa_{n})}\Bigg{\}},

\frac{\sqrt{n(p-1)}(\hat{{\boldsymbol{\theta}}}_{n}-{\boldsymbol{\theta}})}{\sqrt{1-\hat{e}_{n2}}}\stackrel{{\scriptstyle\mathcal{D}}}{{\to}}\mathcal{N}\big{(}{\bf 0},{\bf I}_{p}-{\boldsymbol{\theta}}{\boldsymbol{\theta}}^{\prime}\big{)}

\frac{n(p-1)\big{(}1-({\boldsymbol{\theta}}^{\prime}\hat{{\boldsymbol{\theta}}}_{n})^{2}\big{)}}{1-\hat{e}_{n2}}\stackrel{{\scriptstyle\mathcal{D}}}{{\to}}\chi^{2}_{p-1}\quad\textrm{ and }\quad\frac{2n(p-1)(1-{\boldsymbol{\theta}}^{\prime}\hat{{\boldsymbol{\theta}}}_{n})}{1-\hat{e}_{n2}}\stackrel{{\scriptstyle\mathcal{D}}}{{\to}}\chi^{2}_{p-1}

\Bigg{\{}{\boldsymbol{\theta}}\in\mathcal{S}^{p-1}\!:{\boldsymbol{\theta}}^{\prime}\hat{{\boldsymbol{\theta}}}_{n}\geq 1-\frac{1-\hat{e}_{n2}}{2n(p-1)}\chi^{2}_{p-1,1-\alpha}\Bigg{\}}.

W_{n} := \frac{n (p - 1) \bar{X}_{n}^{'} (I_{p} - θ_{0} θ_{0}^{'}) \bar{X}_{n}}{1 - \frac{1}{n} \sum_{i = 1}^{n} (X_{ni}^{'} θ_{0})^{2}}

S_{n} = \frac{n (p - 1) (\bar{X}_{n}^{'} θ_{0})^{2} \hat{θ}_{n}^{'} (I_{p} - θ_{0} θ_{0}^{'}) \hat{θ}_{n}}{1 - \frac{1}{n} \sum_{i = 1}^{n} (X_{ni}^{'} θ_{0})^{2}}

R_{n} = \frac{1 - \frac{1}{n} \sum_{i = 1}^{n} (X_{ni}^{'} θ_{0})^{2}}{2 (p - 1) \tilde{e}_{n2}^{1/2}}

W_{n} =: \frac{\tilde{W}_{n}}{R_{n}} \quad\textrm{and}\quad S_{n} =: \frac{(\bar{X}_{n}^{'} θ_{0})^{2} \tilde{S}_{n}}{R_{n}}.

W_{n} = S_{n} + o_{P}(1) \stackrel{\mathcal{D}}{\to} χ_{p - 1}^{2}

W_{n}=S_{n}+o_{\rm P}(1)\stackrel{{\scriptstyle\mathcal{D}}}{{\to}}\chi^{2}_{p-1}\big{(}\|{\boldsymbol{\tau}}\|^{2}\big{)}

{\boldsymbol{\theta}}_{n\ell}=\Bigg{(}\!\begin{array}[]{ccc}\cos\alpha_{n\ell}&-\sin\alpha_{n\ell}&0\\[-4.2679pt] \sin\alpha_{n\ell}&\hskip 8.53581pt\cos\alpha_{n\ell}&0\\[-4.2679pt] 0&0&1\end{array}\!\Bigg{)}{\boldsymbol{\theta}}_{0},

{\rm P}[Y_{\ell}>\chi^{2}_{p-1,1-\alpha}],\quad\textrm{ with }Y_{\ell}\sim\chi^{2}_{p-1}\big{(}\ell^{2}\big{)};

\frac{1}{f(\kappa)}\int_{-1}^{1}\!\!\big{(}\varphi_{f}(\kappa s)-\varphi_{f}(\kappa)\big{)}^{2}(1-s^{2})^{(p-3)/2}f(\kappa s)\,\!ds\!=o\bigg{(}\frac{1}{\kappa^{(p+1)/2}(\varphi_{f}(\kappa))^{(p-3)/2}}\bigg{)}

\displaystyle\frac{1}{f(\kappa_{n})}\int_{-1}^{1}\int_{0}^{1}\!\!\big{|}\log f(\kappa_{n}s+h^{\pm}_{n}(s,w))-\log f(\kappa_{n}s)-h^{\pm}_{n}(s,w)\varphi_{f}(\kappa_{n}s)\big{|}f(\kappa_{n}s)

Full text

Inference for spherical location under high concentration

Davy Paindaveine

http://homepages.ulb.ac.be/dpaindav

Thomas Verdebout

http://tverdebo.ulb.ac.be

Université libre de Bruxelles

ECARES and Département de Mathématique

Avenue F.D. Roosevelt, 50

ECARES, CP114/04

B-1050, Brussels

Belgium

Université libre de Bruxelles

ECARES and Département de Mathématique

Boulevard du Triomphe, CP210

B-1050, Brussels

Belgium

Abstract

Motivated by the fact that circular or spherical data are often highly concentrated around a location $\boldsymbol{\theta}$ , we consider inference about $\boldsymbol{\theta}$ under high concentration asymptotic scenarios for which the probability of any fixed spherical cap centered at $\boldsymbol{\theta}$ converges to one as the sample size $n$ diverges to infinity. Rather than restricting to Fisher–von Mises–Langevin distributions, we consider a much broader, semiparametric, class of rotationally symmetric distributions indexed by the location parameter $\boldsymbol{\theta}$ , a scalar concentration parameter $\kappa$ and a functional nuisance $f$ . We determine the class of distributions for which high concentration is obtained as $\kappa$ diverges to infinity. For such distributions, we then consider inference (point estimation, confidence zone estimation, hypothesis testing) on $\boldsymbol{\theta}$ in asymptotic scenarios where $\kappa_{n}$ diverges to infinity at an arbitrary rate with the sample size $n$ . Our asymptotic investigation reveals that, interestingly, optimal inference procedures on $\boldsymbol{\theta}$ show consistency rates that depend on $f$ . Using asymptotics “à la Le Cam”, we show that the spherical mean is, at any $f$ , a parametrically super-efficient estimator of ${\boldsymbol{\theta}}$ and that the Watson and Wald tests for $\mathcal{H}_{0}:{\boldsymbol{\theta}}={\boldsymbol{\theta}}_{0}$ enjoy similar, non-standard, optimality properties. We illustrate our results through simulations and treat a real data example. From a technical point of view, our asymptotic derivations require challenging expansions of rotationally symmetric functionals for large arguments of $f$ .

MSC classification: 62E20, 62F30, 62F05, 62F12.

Keywords: Concentrated distributions, Directional statistics, Le Cam’s asymptotic theory of statistical experiments, Local asymptotic normality, Super-efficiency.

Corresponding author: Davy Paindaveine. Davy Paindaveine’s research is supported by a research fellowship from the Francqui Foundation and by the Program of Concerted Research Actions (ARC) of the Université libre de Bruxelles. Thomas Verdebout’s research is supported by the ARC Program of the Université libre de Bruxelles and by the Crédit de Recherche J.0134.18 of the FNRS (Fonds National pour la Recherche Scientifique), Communauté Française de Belgique.

1 Introduction

Directional statistics is concerned with data on the unit sphere $\mathcal{S}^{p-1}=\{\mathbf{x}\in\mathbb{R}^{p}:\|\mathbf{x}\|^{2}=\mathbf{x}^{\prime}\mathbf{x}=1\}$ of $\mathbb{R}^{p}$ or more generally on Riemannian manifolds such as a torus or an infinite cylinder. Directional data are present in many fields and have attracted a lot of attention in the last decade. Recent applications include analysis of magnetic remanence through copulae on product manifolds in Jupp (2015), analysis of animal movement using angular regression in Rivest et al. (2016), or analysis of flight trajectories through principal component analysis for functional data on $\mathcal{S}^{p-1}$ in Dai and Müller (2018), to cite only a few. For an overview of the topic, we refer to Mardia and Jupp (2000) and Ley and Verdebout (2017).

In this paper, we consider a class of distributions on $\mathcal{S}^{p-1}$ admitting a density at $\mathbf{x}$ that is proportional to $f(\kappa\mathbf{x}^{\prime}{\boldsymbol{\theta}})$ , where ${\boldsymbol{\theta}}\in\mathcal{S}^{p-1}$ , $\kappa>0$ and $f$ is a monotone increasing function from $\mathbb{R}$ to $\mathbb{R}^{+}$ (throughout, densities on $\mathcal{S}^{p-1}$ will be with respect to the surface area measure). The resulting distribution on the sphere will be denoted as ${\rm Rot}_{p}({\boldsymbol{\theta}},\kappa,f)$ to stress its rotational symmetry: if $\mathbf{X}\sim{\rm Rot}_{p}({\boldsymbol{\theta}},\kappa,f)$ , then $\mathbf{O}\mathbf{X}$ and $\mathbf{X}$ are equal in distribution for any $p\times p$ orthogonal matrix $\mathbf{O}$ such that $\mathbf{O}{\boldsymbol{\theta}}={\boldsymbol{\theta}}$ . Clearly, ${\boldsymbol{\theta}}$ is the modal location on the sphere, hence plays the role of a location parameter. In contrast, $\kappa$ is a scale or concentration parameter. This terminology is justified by the fact that, for many functions $f$ , the distribution ${\rm Rot}_{p}({\boldsymbol{\theta}},\kappa,f)$ becomes arbitrarily concentrated around ${\boldsymbol{\theta}}$ as $\kappa$ diverges to infinity; it is in particular so for the celebrated Fisher–von Mises–Langevin (FvML) distributions, that are obtained with $f=\exp$ . FvML distributions play a central role in directional statistics, a role that can be compared to the one played by Gaussian distributions in classical multivariate setups. For instance, the responses of the circular/spherical regression models in Rivest (1986), Downs and Mardia (2002), SenGupta, Kim and Arnold (2013) and Rosenthal et al. (2014) are FvML with a location parameter that depends on the predictors.

In most applications, the location parameter ${\boldsymbol{\theta}}$ is the parameter of interest, whereas the concentration parameter $\kappa$ and the infinite-dimensional parameter $f$ are unspecified nuisances. The most classical estimator of ${\boldsymbol{\theta}}$ is the spherical mean, whereas the most celebrated test for $\mathcal{H}_{0}:{\boldsymbol{\theta}}={\boldsymbol{\theta}}_{0}$ , where ${\boldsymbol{\theta}}_{0}\in\mathcal{S}^{p-1}$ is fixed, is the Watson test (see Sections 3 and 4, respectively). In the standard asymptotic scenario under which $n$ diverges to infinity with $\kappa$ fixed, the asymptotic properties of these procedures are well-known; see, e.g., Mardia and Jupp (2000). In particular, the spherical mean is root- $n$ consistent, whereas the Watson test shows non-trivial asymptotic powers under sequences of local alternatives of the form $\mathcal{H}_{1}^{(n)}:{\boldsymbol{\theta}}={\boldsymbol{\theta}}_{n}$ with $\sqrt{n}\|{\boldsymbol{\theta}}_{n}-{\boldsymbol{\theta}}_{0}\|\to c>0$ .

In practice, the asymptotic results above are relevant in cases where the underlying concentration $\kappa$ is neither too small nor too large. For small values of $\kappa$ , the fixed- $\kappa$ asymptotic distribution of the spherical mean and the corresponding asymptotic null distribution of the Watson test statistic $W_{n}$ only poorly approximate the exact distributions of these statistics, unless the sample size $n$ at hand is extremely large. This motivates considering a double asymptotic scenario where $\kappa=\kappa_{n}$ goes to zero as $n$ diverges to infinity. The observations $\mathbf{X}_{n1},\ldots,\mathbf{X}_{nn}$ are then assumed to form a random sample from the distribution ${\rm Rot}_{p}({\boldsymbol{\theta}},\kappa_{n},f)$ , with $\kappa_{n}=o(1)$ , which makes it strictly necessary here to consider triangular arrays of observations. Such a “low-concentration double asymptotic scenario” was considered in Paindaveine and Verdebout (2017), where it was proved that the faster $\kappa_{n}$ goes to zero, the poorer the consistency rates of the aforementioned inference procedures. More precisely, (i) if $\kappa_{n}=o(1)$ with $\kappa_{n}\sqrt{n}\to\infty$ , then $\kappa_{n}\sqrt{n}(\hat{{\boldsymbol{\theta}}}_{n}-{\boldsymbol{\theta}})$ is asymptotically normal, so that the consistency rate of the spherical mean deteriorates from $\sqrt{n}$ (in the standard fixed- $\kappa$ case) to $\kappa_{n}\sqrt{n}$ (in the present case); (ii) if $\kappa_{n}=O(1/\sqrt{n})$ , then the spherical mean is no longer consistent. Similarly, in situation (i), the Watson test shows non-trivial asymptotic powers under sequences of local alternatives of the form $\mathcal{H}_{1}^{(n)}\!:{\boldsymbol{\theta}}={\boldsymbol{\theta}}_{n}$ with $\kappa_{n}\sqrt{n}\|{\boldsymbol{\theta}}_{n}-{\boldsymbol{\theta}}_{0}\|\to c>0$ , and, in situation (ii), there is no sequence of alternatives under which this test would be consistent. 
These behaviors of the spherical mean and of the Watson test are non-standard yet expected: as the concentration $\kappa_{n}$ gets smaller, the distribution ${\rm Rot}_{p}({\boldsymbol{\theta}},\kappa_{n},f)$ becomes increasingly close to the uniform distribution on $\mathcal{S}^{p-1}$ , for which the parameter of interest ${\boldsymbol{\theta}}$ is not identifiable. In other words, inference on ${\boldsymbol{\theta}}$ becomes increasingly challenging as $\kappa$ decreases to zero, which is reflected in the deterioration of the consistency rates above.

The situation for large concentrations $\kappa$ is similar yet different. On the one hand, a standard, fixed- $\kappa$ , asymptotic analysis may still fail to describe adequately the finite-sample behaviors of the spherical mean and of the Watson test statistic under high concentration. On the other hand, inference about ${\boldsymbol{\theta}}$ intuitively becomes increasingly easy as the distribution gets more and more concentrated around ${\boldsymbol{\theta}}$ , which should make it possible to define “super-efficient” estimators and tests on ${\boldsymbol{\theta}}$ . Inference for “concentrated” FvML distributions has in fact already received considerable attention in the literature. One of the first papers tackling inference problems for the location parameter of FvML distributions under large values of $\kappa$ is Watson (1984), where asymptotic results as $\kappa\to\infty$ with $n$ fixed were derived. In the same asymptotic scenario, Rivest (1986) investigated the null limiting behavior of a goodness-of-fit test for FvML distributions, whereas Rivest (1989), Downs and Mardia (2002) and Downs (2003) considered spherical regression in a concentrated FvML setup. Rosenthal et al. (2014) analyzed concentrated data using a regression model with FvML noise. Fujikoshi and Watamori (1992) obtained the asymptotic null distributions of various test statistics for $\mathcal{H}_{0}\!:{\boldsymbol{\theta}}={\boldsymbol{\theta}}_{0}$ again as $\kappa\to\infty$ with $n$ fixed, and derived the asymptotic powers of the corresponding tests under appropriate sequences of local alternatives. Still in the framework of FvML distributions, Watamori (1996) reviewed point estimation and (one-sample and multi-sample) hypothesis testing in the standard asymptotic scenario where $n\to\infty$ with $\kappa$ fixed and in the concentrated scenario where $\kappa\to\infty$ with $n$ fixed. 
Arnold and Jupp (2013) and Arnold, Jupp and Schaeben (2018) considered estimation of “highly concentrated rotations”. Finally, Chikuse (2003a) considered inference for concentrated matrix FvML distributions, still in a setup where $\kappa\to\infty$ with $n$ fixed; see also Chikuse (2003b). Monographs covering inference for concentrated FvML distributions include Watson (1983) and Mardia and Jupp (2000).

This review of the literature shows that inference on ${\boldsymbol{\theta}}$ under high concentration is a classical topic in directional statistics. Yet this review also reveals some important limitations in previous studies: (i) all available asymptotic results are obtained as $\kappa\to\infty$ with $n$ fixed, while, in parallel with the low-concentration case above, a double asymptotic scenario in which $\kappa=\kappa_{n}$ goes to infinity with $n$ would be at least as natural (particularly so if $\kappa_{n}$ is allowed to diverge to infinity at an arbitrary rate as a function of $n$ ); (ii) all results are limited to the parametric case of FvML distributions, so that the asymptotic properties of the spherical mean and of the Watson test remain unknown in the broader semiparametric class of ${\rm Rot}_{p}({\boldsymbol{\theta}},\kappa,f)$ distributions; (iii) for hypothesis testing, most works focused on the null hypothesis: very few results describe asymptotic powers under sequences of local alternatives, and, more importantly, to the best of our knowledge, no optimality result has been obtained in the literature. In this paper, we therefore fill an important gap by deriving results that overcome limitations (i)–(iii).

The outline of the paper is as follows. In Section 2, we fix the notation, introduce the assumptions that will be used throughout and characterize the rotationally symmetric distributions that provide high concentration for arbitrarily large values of $\kappa$ . In Section 3, we derive the asymptotic distribution of the spherical mean in a double asymptotic scenario where $\kappa_{n}$ diverges to infinity at an arbitrary rate with $n$ . Interestingly, in contrast with what happens for low concentrations, the consistency rate here depends on the nuisance function $f$ . We also provide confidence zones for ${\boldsymbol{\theta}}$ that quite naturally take the form of spherical caps centered at the spherical mean. In Section 4, we study the asymptotic behaviour of the Watson and Wald tests. In Section 5, we turn to optimality issues and show that, under mild assumptions on $f$ , the sequence of statistical experiments considered is locally asymptotically normal. We establish the Le Cam optimality of the spherical mean estimator and of the Watson and Wald tests under high concentration. Finally, a real data application is conducted in Section 6 and a wrap-up is provided in Section 7. Proofs are collected in the appendix.

2 High concentration

Throughout, we will denote as ${\rm P}^{(n)}_{{\boldsymbol{\theta}}_{n},\kappa_{n},f}$ the hypothesis under which the observations $\mathbf{X}_{n1},\ldots,\mathbf{X}_{nn}$ form a random sample from the distribution ${\rm Rot}_{p}({\boldsymbol{\theta}}_{n},\kappa_{n},f)$ described in the introduction, that is, the hypothesis under which these observations are mutually independent and share the common density

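$$\mathbf{x}\mapsto\frac{c_{p,\kappa_{n},f}\,\Gamma\big(\frac{p-1}{2}\big)}{2\pi^{(p-1)/2}}\,f(\kappa_{n}\mathbf{x}^{\prime}{\boldsymbol{\theta}}_{n}),$$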

where $\Gamma(\cdot)$ is the Euler Gamma function and the constant $c_{p,\kappa,f}$ is given by

$$c_{p,\kappa,f}:=1\Big/\int_{-1}^{1}(1-s^{2})^{(p-3)/2}f(\kappa s)\,ds.$$
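The normalizing constant $c_{p,\kappa,f}$ seldom has a closed form, but it is a one-dimensional integral that is easy to approximate. The Python sketch below is a minimal illustration, not the authors' code; the function name and the grid size are our own choices. It substitutes $s=\cos t$, so that the integral becomes $\int_{0}^{\pi}\sin^{p-2}(t)f(\kappa\cos t)\,dt$ (which removes the endpoint singularity of $(1-s^{2})^{(p-3)/2}$ when $p=2$), applies a midpoint rule, and works with $\log f$ so that large values of $\kappa$ do not overflow. For $p=3$ and $f=\exp$, the exact value $c_{3,\kappa,\exp}=\kappa/(2\sinh\kappa)$ gives a check.

```python
import math


def log_normalizing_constant(p, kappa, log_f, n_grid=100_000):
    """Approximate log c_{p,kappa,f}, where
    c_{p,kappa,f} = 1 / int_{-1}^{1} (1 - s^2)^((p-3)/2) f(kappa*s) ds.

    With s = cos(t) the integral equals int_0^pi sin(t)^(p-2) f(kappa*cos t) dt,
    evaluated here with a midpoint rule. f(kappa) is factored out on the log
    scale so that nothing overflows for large kappa; this relies on f being
    non-decreasing, so that f(kappa*s) <= f(kappa) on [-1, 1].
    """
    h = math.pi / n_grid
    log_f_kappa = log_f(kappa)
    scaled = 0.0  # accumulates the integral divided by f(kappa)
    for k in range(n_grid):
        t = (k + 0.5) * h
        scaled += math.sin(t) ** (p - 2) * math.exp(log_f(kappa * math.cos(t)) - log_f_kappa)
    scaled *= h
    return -log_f_kappa - math.log(scaled)


if __name__ == "__main__":
    def fvml(z):
        return z  # log f for the FvML angular function f = exp

    for kappa in (1.0, 10.0, 100.0):
        approx = log_normalizing_constant(3, kappa, fvml)
        exact = math.log(kappa / (2.0 * math.sinh(kappa)))
        print(kappa, approx, exact)
```

Returning $\log c_{p,\kappa,f}$ rather than $c_{p,\kappa,f}$ is a deliberate choice: for $f=\exp$ the constant behaves like $\kappa e^{-\kappa}$ and quickly underflows in double precision as $\kappa$ grows.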

In the sequel, $f:\mathbb{R}\to\mathbb{R}^{+}$ is assumed to be monotone non-decreasing on $(-\infty,0]$ and monotone increasing on $[0,\infty)$ . Under this assumption, the location parameter ${\boldsymbol{\theta}}_{n}$ is properly identified as the modal location on the sphere. One way to also make $\kappa_{n}$ and $f$ identifiable would be to further impose $f(0)=f^{\prime}(0)=1$ . We will not impose these conditions since we also want to consider functions $f$ that are not differentiable at zero. The resulting lack of identifiability will not be an issue in the sequel since $\kappa_{n}$ and $f$ play the role of nuisance parameters when conducting inference on ${\boldsymbol{\theta}}_{n}$ .

We will often make use of the tangent-normal decomposition of $\mathbf{X}_{ni}$ with respect to ${\boldsymbol{\theta}}_{n}$ , which reads $\mathbf{X}_{ni}=u_{ni}{\boldsymbol{\theta}}_{n}+v_{ni}\mathbf{S}_{ni}$ , with

$$u_{ni}:=\mathbf{X}_{ni}^{\prime}{\boldsymbol{\theta}}_{n},\qquad v_{ni}:=\sqrt{1-u_{ni}^{2}}$$

and

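$$\mathbf{S}_{ni}:=\frac{(\mathbf{I}_{p}-{\boldsymbol{\theta}}_{n}{\boldsymbol{\theta}}_{n}^{\prime})\mathbf{X}_{ni}}{\|(\mathbf{I}_{p}-{\boldsymbol{\theta}}_{n}{\boldsymbol{\theta}}_{n}^{\prime})\mathbf{X}_{ni}\|}=\frac{1}{v_{ni}}(\mathbf{I}_{p}-{\boldsymbol{\theta}}_{n}{\boldsymbol{\theta}}_{n}^{\prime})\mathbf{X}_{ni}.$$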
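As a concrete check of the tangent-normal decomposition, the following minimal Python sketch (plain lists, no external libraries; the helper name `tangent_normal` is ours, not the paper's) computes $u$, $v=\sqrt{1-u^{2}}$ and $\mathbf{S}$ for a unit vector $\mathbf{x}$ and a pole ${\boldsymbol{\theta}}$. Since $\mathbf{x}$ has unit norm, the projection $(\mathbf{I}_{p}-{\boldsymbol{\theta}}{\boldsymbol{\theta}}^{\prime})\mathbf{x}$ has norm $v$, which is why dividing by $v$ produces a unit vector orthogonal to ${\boldsymbol{\theta}}$. The sketch assumes $\mathbf{x}\neq\pm{\boldsymbol{\theta}}$, so that $v>0$.

```python
import math


def tangent_normal(x, theta):
    """Tangent-normal decomposition x = u*theta + v*S of a unit vector x
    with respect to a unit vector theta (both given as lists of floats).

    Returns (u, v, S) with u = x'theta, v = sqrt(1 - u^2) and
    S = (I - theta theta') x / v. Requires x != +/- theta so that v > 0.
    """
    u = sum(a * b for a, b in zip(x, theta))
    v = math.sqrt(max(0.0, 1.0 - u * u))
    if v == 0.0:
        raise ValueError("x is (numerically) equal to +/- theta; S is undefined")
    projection = [a - u * b for a, b in zip(x, theta)]  # (I_p - theta theta') x
    S = [c / v for c in projection]
    return u, v, S


if __name__ == "__main__":
    theta = [0.0, 0.0, 1.0]
    x = [1 / 3, 2 / 3, 2 / 3]  # a unit vector
    u, v, S = tangent_normal(x, theta)
    rebuilt = [u * a + v * b for a, b in zip(theta, S)]
    print(u, v, S, rebuilt)  # rebuilt should equal x up to rounding
```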

The cosine $u_{ni}$ is associated with the latitude of $\mathbf{X}_{ni}$ with respect to the “north pole” ${\boldsymbol{\theta}}_{n}$ , whereas $\mathbf{S}_{ni}$ determines the corresponding hyper-longitude. Under ${\rm P}^{(n)}_{{\boldsymbol{\theta}}_{n},\kappa_{n},f}$ , $u_{n1}$ and $\mathbf{S}_{n1}$ are mutually independent, $\mathbf{S}_{n1}$ is uniformly distributed on $\mathcal{S}^{\perp}_{{\boldsymbol{\theta}}_{n}}:=\{\mathbf{x}\in\mathcal{S}^{p-1}:\mathbf{x}^{\prime}{\boldsymbol{\theta}}_{n}=0\}$ , and $u_{n1}$ admits the density

$$s\mapsto c_{p,\kappa_{n},f}\,(1-s^{2})^{(p-3)/2}f(\kappa_{n}s)\,\mathbb{I}[s\in[-1,1]],$$

where $\mathbb{I}[A]$ stands for the indicator function of the set $A$ . The moments of $u_{n1}$ under ${\rm P}^{(n)}_{{\boldsymbol{\theta}}_{n},\kappa_{n},f}$ will play an important role below and will be denoted as $e_{n\ell}:={\rm E}[u_{n1}^{\ell}]$ , $\ell=1,2,\ldots$ . We will also write $\tilde{e}_{n2}:=e_{n2}-e_{n1}^{2}$ for the corresponding variance. The function $f$ governs (jointly with $\kappa_{n}$ ) the distribution of the angle $\arccos(u_{n1})$ between $\mathbf{X}_{n1}$ and ${\boldsymbol{\theta}}_{n}$ , hence is sometimes referred to as an angular function.
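Because $u_{n1}$ has the explicit density above, the moments $e_{n\ell}$ and the variance $\tilde{e}_{n2}$ can be approximated by one-dimensional quadrature. The sketch below is illustrative only (the function name and the grid size are assumptions): it uses the substitution $s=\cos t$ and a midpoint rule, and the normalizing constant cancels because every moment is divided by the total mass. For $p=3$ and $f=\exp$, the exact values $e_{n1}=\coth\kappa_{n}-1/\kappa_{n}$ and $e_{n2}=1-2e_{n1}/\kappa_{n}$ provide a check.

```python
import math


def u_moments(p, kappa, log_f, orders=(1, 2), n_grid=100_000):
    """Approximate E[u^l] for l in `orders`, where u = X'theta and
    X ~ Rot_p(theta, kappa, f), from the density
    s -> c (1 - s^2)^((p-3)/2) f(kappa*s) on [-1, 1].

    Uses s = cos(t) and a midpoint rule; c cancels in the ratios.
    """
    h = math.pi / n_grid
    log_f_kappa = log_f(kappa)
    mass = 0.0
    sums = {ell: 0.0 for ell in orders}
    for k in range(n_grid):
        t = (k + 0.5) * h
        s = math.cos(t)
        w = math.sin(t) ** (p - 2) * math.exp(log_f(kappa * s) - log_f_kappa)
        mass += w
        for ell in orders:
            sums[ell] += w * s ** ell
    return {ell: sums[ell] / mass for ell in orders}


if __name__ == "__main__":
    kappa = 5.0
    m = u_moments(3, kappa, lambda z: z)  # FvML: log f(z) = z
    exact_e1 = 1.0 / math.tanh(kappa) - 1.0 / kappa
    print("e1:", m[1], "exact:", exact_e1)
    print("e2:", m[2], "exact:", 1.0 - 2.0 * exact_e1 / kappa)
    print("variance tilde e2:", m[2] - m[1] ** 2)
```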

The present paper is concerned with sequences of rotationally symmetric distributions that are asymptotically highly concentrated, meaning that the probability mass of any fixed spherical cap centered at ${\boldsymbol{\theta}}_{n}$ converges to one as $n$ diverges to infinity. More precisely, we will say that the sequence of hypotheses ${\rm P}^{(n)}_{{\boldsymbol{\theta}}_{n},\kappa_{n},f}$ is asymptotically highly concentrated if and only if, for any sequence ( $\kappa_{n}$ ) diverging to infinity and any $\varepsilon\in(0,2)$ , we have

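$${\rm P}^{(n)}_{{\boldsymbol{\theta}}_{n},\kappa_{n},f}\big[\mathbf{X}_{n1}^{\prime}{\boldsymbol{\theta}}_{n}>1-\varepsilon\big]=c_{p,\kappa_{n},f}\int_{1-\varepsilon}^{1}(1-s^{2})^{(p-3)/2}f(\kappa_{n}s)\,ds\to 1,\tag{2.4}$$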

that is, if and only if $u_{n1}$ converges in probability to one as soon as $(\kappa_{n})$ diverges to infinity. Since this is clearly a property that depends on $f$ only, we will say that $f$ provides high concentration if and only if (2.4) holds. Not all functions $f$ provide high concentration. The polynomial functions $z\mapsto f(z)=z^{b}\mathbb{I}[z\geq 0]$ , with $b>0$ , are examples since, for any $\varepsilon\in(0,1)$ , they yield

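$$c_{p,\kappa_{n},f}\int_{1-\varepsilon}^{1}(1-s^{2})^{(p-3)/2}f(\kappa_{n}s)\,ds=\frac{\int_{1-\varepsilon}^{1}(1-s^{2})^{(p-3)/2}s^{b}\,ds}{\int_{0}^{1}(1-s^{2})^{(p-3)/2}s^{b}\,ds}=:C<1,$$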

where $C$ does not depend on $n$ . It is easy to check that $z\mapsto f(z)=\frac{\pi}{2}+\arctan(z)$ does not provide high concentration either, but that the angular FvML function $z\mapsto f(z)=\exp(z)$ does. It is therefore desirable to characterize the functions $f$ providing high concentration, which is the aim of the following result.
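The contrast between the polynomial and FvML angular functions can be seen numerically. The following sketch (illustrative; the helper names are ours) approximates the cap probability in (2.4) by quadrature. For $f(z)=z^{b}\mathbb{I}[z\geq 0]$ the factor $\kappa^{b}$ cancels, so the probability does not move as $\kappa$ grows (for $p=3$, $b=2$ and $\varepsilon=0.1$ it equals $C=1-0.9^{3}=0.271$), whereas for $f=\exp$ it approaches one.

```python
import math


def cap_probability(p, kappa, log_f, eps, n_grid=100_000):
    """Approximate P[X'theta > 1 - eps] for X ~ Rot_p(theta, kappa, f), eps in (0, 2).

    With s = cos(t), the event {s > 1 - eps} is {t < arccos(1 - eps)}.
    log_f may return -inf where f vanishes (exp(-inf) is 0.0 in Python).
    """
    h = math.pi / n_grid
    t_max = math.acos(1.0 - eps)
    log_f_kappa = log_f(kappa)
    inside = total = 0.0
    for k in range(n_grid):
        t = (k + 0.5) * h
        w = math.sin(t) ** (p - 2) * math.exp(log_f(kappa * math.cos(t)) - log_f_kappa)
        total += w
        if t < t_max:
            inside += w
    return inside / total


def log_poly(z, b=2.0):
    """log of the polynomial angular function f(z) = z^b I[z >= 0]."""
    return b * math.log(z) if z > 0 else -math.inf


def log_fvml(z):
    """log of the FvML angular function f(z) = exp(z)."""
    return z


if __name__ == "__main__":
    for kappa in (10.0, 100.0, 1000.0):
        print(kappa,
              cap_probability(3, kappa, log_poly, 0.1),   # stays at C = 0.271
              cap_probability(3, kappa, log_fvml, 0.1))   # tends to one
```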

Theorem 2.1.

Let $f:\mathbb{R}\to\mathbb{R}^{+}$ be monotone non-decreasing on $(-\infty,0]$ and monotone increasing on $[0,\infty)$ . Assume that $f$ is differentiable in a neighborhood of $\infty$ $($ in the sense that there exists $M$ such that $f$ is differentiable over $(M,\infty))$ and put $\varphi_{f}:=f^{\prime}/f$ , where $f^{\prime}$ is the derivative of $f$ . Then we have the following:

(i)

If $\kappa\varphi_{f}(\kappa)\nearrow\infty$ as $\kappa\to\infty$ , then $f$ provides high concentration.

(ii)

If $\kappa\varphi_{f}(\kappa)\to c(>0)$ as $\kappa\to\infty$ , then $f$ does not provide high concentration.

(iii)

If $\kappa\varphi_{f}(\kappa)\searrow 0$ as $\kappa\to\infty$ , then $f$ does not provide high concentration.

In this result, $g(\kappa)\nearrow\infty$ (resp., $g(\kappa)\searrow 0$ ) as $\kappa\to\infty$ means that (a) $g(\kappa)$ diverges to infinity (resp., converges to zero) as $\kappa$ diverges to infinity and that (b) there exists $M$ such that $\kappa\mapsto g(\kappa)$ is monotone non-decreasing (resp., monotone non-increasing) over $(M,\infty)$ . Since $\kappa\varphi_{f}(\kappa)$ is the derivative of $\log f(\kappa)$ with respect to $\log\kappa$ , Theorem 2.1 essentially states that high concentration is obtained if $f(z)$ diverges to infinity faster than any polynomial as $z$ diverges to infinity. In particular, this result confirms that the polynomial and arctan functions $f$ above do not provide high concentration, but that the FvML one does. Writing throughout $z^{b}:={\rm sgn}(z)|z|^{b}$ , it also shows that all functions $z\mapsto f_{b}(z):=\exp(z^{b})$ , with $b>0$ , do provide high concentration. These functions $f$ , which include the FvML one, will be our main running examples below.
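Theorem 2.1 only involves the behavior of $\kappa\varphi_{f}(\kappa)=\kappa f^{\prime}(\kappa)/f(\kappa)$ for large $\kappa$ . The sketch below (our own illustration; it uses a central finite difference of $\log f$ rather than exact derivatives) evaluates it for the three running examples: it stays constant, equal to $b=3$, for the polynomial function (case (ii)), tends to zero for the arctan function (case (iii)), and equals $b\kappa^{b}$, hence diverges, for $f_{b}(z)=\exp(z^{b})$ (case (i)).

```python
import math


def kappa_phi(log_f, kappa, rel_step=1e-5):
    """kappa * phi_f(kappa) = kappa * f'(kappa) / f(kappa) = kappa * (log f)'(kappa),
    approximated by a central finite difference with step rel_step * kappa."""
    h = rel_step * kappa
    return kappa * (log_f(kappa + h) - log_f(kappa - h)) / (2.0 * h)


def make_log_fb(b):
    """log f_b for f_b(z) = exp(z^b), with z^b := sgn(z)|z|^b."""
    def log_fb(z):
        return math.copysign(abs(z) ** b, z)
    return log_fb


def log_poly3(z):
    """log f for the polynomial angular function f(z) = z^3 I[z >= 0]."""
    return 3.0 * math.log(z) if z > 0 else -math.inf


def log_arctan(z):
    """log f for f(z) = pi/2 + arctan(z)."""
    return math.log(math.pi / 2.0 + math.atan(z))


if __name__ == "__main__":
    for kappa in (10.0, 100.0, 1000.0):
        print(f"kappa={kappa:6.0f}  polynomial: {kappa_phi(log_poly3, kappa):.4f}  "
              f"arctan: {kappa_phi(log_arctan, kappa):.2e}  "
              f"exp(z^0.5): {kappa_phi(make_log_fb(0.5), kappa):.4f}")
```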

In the rest of the paper, $\mathcal{F}$ will stand for the collection of functions $f:\mathbb{R}\to\mathbb{R}^{+}$ that (i) are monotone non-decreasing on $(-\infty,0]$ and monotone increasing on $[0,\infty)$ , (ii) are differentiable in a neighborhood of $\infty$ , (iii) are such that $\kappa\varphi_{f}(\kappa)\nearrow\infty$ as $\kappa\to\infty$ and (iv) satisfy, for any $\xi,\zeta>-1$ ,

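$$\int_{-1}^{1}g_{\xi,\zeta}(s)\big|e^{\log f(\kappa s)-\log f(\kappa)}-e^{(s-1)\kappa\varphi_{f}(\kappa)}\big|\,ds=o\bigg(\frac{1}{(\kappa\varphi_{f}(\kappa))^{\xi+1}}\bigg)$$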

as $\kappa\to\infty$ , with $g_{\xi,\zeta}(s):=(1-s)^{\xi}(1+s)^{\zeta}$ . As the following result shows, our prototypical examples of angular functions $f$ providing high concentration satisfy these conditions.

Proposition 2.1.

For any $b>0$ , the function $z\mapsto f_{b}(z)=\exp(z^{b})$ belongs to $\mathcal{F}$ .
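Condition (iv) compares $f(\kappa s)/f(\kappa)$ with its exponential approximation $e^{(s-1)\kappa\varphi_{f}(\kappa)}$ . For $f=\exp$ the two coincide and the integral vanishes. The following sketch (our own numerical illustration, not part of the proof of Proposition 2.1; names are hypothetical) evaluates the rescaled left-hand side $(\kappa\varphi_{f}(\kappa))^{\xi+1}\int_{-1}^{1}g_{\xi,\zeta}(s)|\cdots|\,ds$ for $f_{2}(z)=\exp(z^{2})$ and $\xi=\zeta=0$ , where $\kappa\varphi_{f_{2}}(\kappa)=2\kappa^{2}$ . Proposition 2.1 says this quantity tends to zero; a short Laplace-type expansion (ours) suggests it behaves like $1/(2\kappa^{2})$ .

```python
import math


def condition_iv_lhs(log_f, kappa, kappa_phi, xi=0.0, zeta=0.0, n_grid=200_000):
    """(kappa*phi_f(kappa))^(xi+1) times
    int_{-1}^{1} g(s) |exp(log f(kappa s) - log f(kappa)) - exp((s-1) kappa phi_f(kappa))| ds,
    with g(s) = (1-s)^xi (1+s)^zeta, approximated by a midpoint rule on [-1, 1].
    kappa_phi must be the exact value of kappa*phi_f(kappa).
    """
    h = 2.0 / n_grid
    log_f_kappa = log_f(kappa)
    total = 0.0
    for k in range(n_grid):
        s = -1.0 + (k + 0.5) * h
        g = (1.0 - s) ** xi * (1.0 + s) ** zeta
        diff = math.exp(log_f(kappa * s) - log_f_kappa) - math.exp((s - 1.0) * kappa_phi)
        total += g * abs(diff)
    return kappa_phi ** (xi + 1.0) * total * h


def log_f2(z):
    """log f_2 for f_2(z) = exp(z^2), with z^2 := sgn(z)|z|^2."""
    return math.copysign(z * z, z)


if __name__ == "__main__":
    for kappa in (5.0, 10.0, 20.0):
        value = condition_iv_lhs(log_f2, kappa, 2.0 * kappa ** 2)  # kappa*phi_{f_2} = 2 kappa^2
        print(kappa, value, 1.0 / (2.0 * kappa ** 2))
```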

As already mentioned, the moments of $u_{n1}=\mathbf{X}_{n1}^{\prime}{\boldsymbol{\theta}}_{n}$ under ${\rm P}^{(n)}_{{\boldsymbol{\theta}}_{n},\kappa_{n},f}$ will play a key role in the sequel. It will actually be important to understand the asymptotic behavior of these moments under high concentration. This is the role of the following result.

Theorem 2.2.

Fix an integer $p\geq 2$ and $f\in\mathcal{F}$ . Let $(\kappa_{n})$ be a positive real sequence that diverges to infinity. Then,

[TABLE]

and

[TABLE]

as $n\to\infty$ .

As a corollary, we have

[TABLE]

as $n\to\infty$ . Also, Vitali’s Theorem (see, e.g., Theorem 5.5 in Shorack, 2000) readily implies that, under the conditions of Theorem 2.2, $e_{n1}=1+o(1)$ as $n\to\infty$ . One could obtain an expansion of $1-e_{n1}$ that is similar to the one in Theorem 2.2(i), but we will not do so since this is not needed for our purposes.

3 Point estimation

As mentioned in the introduction, the most classical estimator of location under rotational symmetry is the spherical mean, which is given by

$$\hat{\boldsymbol{\theta}}_{n}:=\frac{\bar{\mathbf{X}}_{n}}{\|\bar{\mathbf{X}}_{n}\|},$$

with $\bar{\mathbf{X}}_{n}:=\frac{1}{n}\sum_{i=1}^{n}\mathbf{X}_{ni}$ . Under ${\rm P}^{(n)}_{{\boldsymbol{\theta}},\kappa_{n},f}$ , ${\rm E}[\mathbf{X}_{n1}]=\lambda_{\kappa_{n},f}{\boldsymbol{\theta}}$ for some positive scalar factor $\lambda_{\kappa_{n},f}$ , so that the spherical mean is a moment-type estimator of ${\boldsymbol{\theta}}$ . It is easy to check that it is also the maximum likelihood estimator of ${\boldsymbol{\theta}}$ in the class of FvML distributions. This makes it desirable to investigate the asymptotic behavior of this estimator under high concentration. We have the following result.
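To make the estimator concrete, here is a minimal sketch of the spherical mean on simulated data. It assumes the FvML case $f(z)=\exp(z)$ with $p=3$. There $u=\mathbf{X}^{\prime}{\boldsymbol{\theta}}$ has density proportional to $\exp(\kappa u)$ on $[-1,1]$, so it can be drawn exactly by inverting its cdf, while the tangent direction is uniform on the circle orthogonal to ${\boldsymbol{\theta}}$. All function names are ours, chosen for illustration only.

```python
import math
import random


def orthonormal_complement(theta):
    """p - 1 orthonormal vectors spanning the orthogonal complement of the unit
    vector theta (Gram-Schmidt on canonical vectors, least-aligned ones first)."""
    p = len(theta)
    order = sorted(range(p), key=lambda k: abs(theta[k]))
    basis = []
    for k in order[:p - 1]:
        v = [1.0 if i == k else 0.0 for i in range(p)]
        for w in [theta] + basis:
            dot = sum(a * b for a, b in zip(v, w))
            v = [a - dot * b for a, b in zip(v, w)]
        norm = math.sqrt(sum(a * a for a in v))
        basis.append([a / norm for a in v])
    return basis


def sample_fvml_s2(theta, kappa, n, rng):
    """n draws from the FvML distribution on S^2 with location theta: u = X'theta
    is drawn by inverting the cdf of the density prop. to exp(kappa * u), and the
    tangent direction is uniform on the circle orthogonal to theta."""
    e1, e2 = orthonormal_complement(theta)
    out = []
    for _ in range(n):
        v = 1.0 - rng.random()                      # v in (0, 1], avoids log(0)
        u = 1.0 + math.log(v + (1.0 - v) * math.exp(-2.0 * kappa)) / kappa
        r = math.sqrt(max(0.0, 1.0 - u * u))
        phi = 2.0 * math.pi * rng.random()
        out.append([u * t + r * (math.cos(phi) * a + math.sin(phi) * b)
                    for t, a, b in zip(theta, e1, e2)])
    return out


def spherical_mean(sample):
    """theta_hat = Xbar / ||Xbar||."""
    n, p = len(sample), len(sample[0])
    xbar = [sum(x[j] for x in sample) / n for j in range(p)]
    norm = math.sqrt(sum(c * c for c in xbar))
    return [c / norm for c in xbar]


rng = random.Random(1)
theta = [1.0, 0.0, 0.0]
sample = sample_fvml_s2(theta, kappa=100.0, n=100, rng=rng)
theta_hat = spherical_mean(sample)
print(theta_hat, 1.0 - sum(a * b for a, b in zip(theta, theta_hat)))
```

Normalizing $\bar{\mathbf{X}}_{n}$ is what makes the moment-type estimator land on $\mathcal{S}^{p-1}$. The positive factor $\lambda_{\kappa_{n},f}$ in ${\rm E}[\mathbf{X}_{n1}]$ cancels, so it never needs to be known.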

Theorem 3.1.

Fix an integer $p\geq 2$ , ${\boldsymbol{\theta}}\in\mathcal{S}^{p-1}$ and $f\in\mathcal{F}$ . Let $(\kappa_{n})$ be a positive real sequence that diverges to infinity. Then, under ${\rm P}^{(n)}_{{\boldsymbol{\theta}},\kappa_{n},f}$ ,

[TABLE]

as $n\to\infty$ , so that, still under ${\rm P}^{(n)}_{{\boldsymbol{\theta}},\kappa_{n},f}$ ,

[TABLE]

as $n\to\infty$ (throughout, $\stackrel{{\scriptstyle\mathcal{D}}}{{\to}}$ denotes convergence in distribution).

Since the sequence $(\kappa_{n}\varphi_{f}(\kappa_{n}))$ diverges to infinity under high concentration, Theorem 3.1 shows that the consistency rate of the spherical mean, namely $\sqrt{n\kappa_{n}\varphi_{f}(\kappa_{n})}$ , is faster than the usual parametric root- $n$ rate. Interestingly, this consistency rate depends on the angular function $f$ . For instance, for $f(z)=\exp(z^{b})$ with $b>0$ and $\kappa_{n}=n$ , the rate is $n^{(b+1)/2}$ (up to a multiplicative constant). It can therefore be arbitrarily close to the standard root- $n$ rate for small $b$ , but it can also provide arbitrarily fast polynomial convergence for large $b$ . Clearly, even faster rates can be achieved by considering more extreme high concentration patterns.
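For the running examples, the rate can be made explicit. Recall that $\varphi_{f}=(\log f)^{\prime}$, as used in the proof of Theorem 2.1. The polynomial choice $\kappa_{n}=n^{a}$ below is only illustrative (it matches the simulation designs used in this paper). We then have

$$\varphi_{f_{b}}(z)=\frac{d}{dz}\,z^{b}=b\,z^{b-1}\quad(z>0),\qquad\text{so}\qquad \kappa\varphi_{f_{b}}(\kappa)=b\,\kappa^{b}\nearrow\infty,$$

$$\sqrt{n\kappa_{n}\varphi_{f_{b}}(\kappa_{n})}=\sqrt{b\,n\,\kappa_{n}^{b}}=\sqrt{b}\;n^{(1+ab)/2}\qquad\text{when }\kappa_{n}=n^{a}.$$

With $a=1$, this is $n^{(b+1)/2}$ up to the constant $\sqrt{b}$. With $a=b=\frac{1}{2}$ and $n=10,\!000$, one only gets $\kappa_{n}\varphi_{f_{b}}(\kappa_{n})=\frac{1}{2}\cdot 100^{1/2}=5$.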

The asymptotic result (3.2) in principle allows constructing confidence zones for ${\boldsymbol{\theta}}$ . More precisely, it follows from this result that a confidence zone for ${\boldsymbol{\theta}}$ at asymptotic confidence level $1-\alpha$ is given by

[TABLE]

where $\chi^{2}_{p-1,1-\alpha}$ denotes the upper $\alpha$ -quantile of the $\chi^{2}_{p-1}$ distribution. This confidence zone, however, is problematic in two respects. First, it is not connected, as it takes the form of two antipodal spherical caps centered at $\pm\hat{\boldsymbol{\theta}}_{n}$ , which is not natural. Second, while the $f$ -dependent consistency rate in Theorem 3.1 is interesting, it also leads to confidence zones that cannot be used in practice since $f$ is usually an unspecified nuisance. The first problem can be dealt with by deriving a weak limiting result for ${\boldsymbol{\theta}}^{\prime}\hat{\boldsymbol{\theta}}_{n}$ obtained from a second-order delta method (while Theorem 3.1 results from a classical, first-order, delta method). We have the following result.

Theorem 3.2.

Fix an integer $p\geq 2$ , ${\boldsymbol{\theta}}\in\mathcal{S}^{p-1}$ and $f\in\mathcal{F}$ . Let $(\kappa_{n})$ be a positive real sequence that diverges to infinity. Then, under ${\rm P}^{(n)}_{{\boldsymbol{\theta}},\kappa_{n},f}$ , $2n\kappa_{n}\varphi_{f}(\kappa_{n})(1-{\boldsymbol{\theta}}^{\prime}\hat{{\boldsymbol{\theta}}}_{n})\stackrel{{\scriptstyle\mathcal{D}}}{{\to}}\chi^{2}_{p-1}$ as $n\to\infty$ .

This second-order result provides confidence zones at asymptotic confidence level $1-\alpha$ that are given by

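$$\Big\{{\boldsymbol{\theta}}\in\mathcal{S}^{p-1}:2n\kappa_{n}\varphi_{f}(\kappa_{n})(1-{\boldsymbol{\theta}}^{\prime}\hat{\boldsymbol{\theta}}_{n})\leq\chi^{2}_{p-1,1-\alpha}\Big\},\tag{3.3}$$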

hence take, quite naturally, the form of (connected) spherical caps centered at $\hat{\boldsymbol{\theta}}_{n}$ . Of course, these confidence zones still cannot be used in practice since $f$ is unspecified. Fortunately, Theorem 2.2(i) allows replacing the unknown quantity $\kappa_{n}\varphi_{f}(\kappa_{n})$ by the quantity $(p-1)/(1-e_{n2})=(p-1)/(1-{\rm E}[(\mathbf{X}_{n1}^{\prime}{\boldsymbol{\theta}})^{2}])$ , which can be naturally estimated by $(p-1)/(1-\hat{e}_{n2})$ , where we let $\hat{e}_{n2}:=\frac{1}{n}\sum_{i=1}^{n}(\mathbf{X}_{ni}^{\prime}\hat{{\boldsymbol{\theta}}}_{n})^{2}$ . The following result, which guarantees that this replacement has no asymptotic impact, opens the door to the construction of feasible confidence zones.
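As a sanity check of this replacement in the simplest case, take the FvML angular function $f(z)=\exp(z)$, for which $\varphi_{f}\equiv 1$ and $\kappa\varphi_{f}(\kappa)=\kappa$. Take also $p=3$, so that $u=\mathbf{X}_{n1}^{\prime}{\boldsymbol{\theta}}$ has density proportional to $\exp(\kappa_{n}u)$ on $[-1,1]$. A direct computation (ours, not taken from the paper) gives

$$e_{n2}=1-\frac{2}{\kappa_{n}}\Big(\coth\kappa_{n}-\frac{1}{\kappa_{n}}\Big),\qquad \frac{p-1}{1-e_{n2}}=\frac{\kappa_{n}}{\coth\kappa_{n}-1/\kappa_{n}}=\kappa_{n}+1+o(1)\quad\text{as }\kappa_{n}\to\infty,$$

so that the ratio of $(p-1)/(1-e_{n2})$ to $\kappa_{n}\varphi_{f}(\kappa_{n})=\kappa_{n}$ indeed tends to one.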

Theorem 3.3.

Fix an integer $p\geq 2$ , ${\boldsymbol{\theta}}\in\mathcal{S}^{p-1}$ and $f\in\mathcal{F}$ . Let $(\kappa_{n})$ be a positive real sequence that diverges to infinity. Then, under ${\rm P}^{(n)}_{{\boldsymbol{\theta}},\kappa_{n},f}$ ,

[TABLE]

as $n\to\infty$ , and, still under ${\rm P}^{(n)}_{{\boldsymbol{\theta}},\kappa_{n},f}$ ,

[TABLE]

as $n\to\infty$ , where, in all cases, $\hat{e}_{n2}=\frac{1}{n}\sum_{i=1}^{n}(\mathbf{X}_{ni}^{\prime}\hat{{\boldsymbol{\theta}}}_{n})^{2}$ .

As a direct corollary, a feasible version of the spherical cap confidence zone in (3.3) is

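$$\Big\{{\boldsymbol{\theta}}\in\mathcal{S}^{p-1}:\frac{2n(p-1)(1-{\boldsymbol{\theta}}^{\prime}\hat{\boldsymbol{\theta}}_{n})}{1-\hat{e}_{n2}}\leq\chi^{2}_{p-1,1-\alpha}\Big\}.\tag{3.4}$$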
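A minimal sketch of how such a feasible cap could be computed. It assumes, as Theorem 3.3 indicates, that $2n(p-1)(1-{\boldsymbol{\theta}}^{\prime}\hat{\boldsymbol{\theta}}_{n})/(1-\hat{e}_{n2})$ is asymptotically $\chi^{2}_{p-1}$, and it inverts this statistic. The sketch is restricted to $p=3$ because the $\chi^{2}_{2}$ upper $\alpha$-quantile then has the closed form $-2\log\alpha$. The function names are hypothetical.

```python
import math


def feasible_confidence_cap(sample, alpha=0.05):
    """Spherical cap {theta : theta'theta_hat >= threshold} at asymptotic level
    1 - alpha, obtained by inverting 2n(p-1)(1 - theta'theta_hat)/(1 - e2_hat);
    only p = 3 is handled, where chi^2_{2,1-alpha} = -2 log(alpha)."""
    n, p = len(sample), len(sample[0])
    if p != 3:
        raise ValueError("closed-form chi^2 quantile only implemented for p = 3")
    xbar = [sum(x[j] for x in sample) / n for j in range(p)]
    norm = math.sqrt(sum(c * c for c in xbar))
    theta_hat = [c / norm for c in xbar]                          # spherical mean
    e2_hat = sum(sum(a * b for a, b in zip(x, theta_hat)) ** 2 for x in sample) / n
    q = -2.0 * math.log(alpha)                                    # chi^2_{2,1-alpha}
    threshold = 1.0 - (1.0 - e2_hat) * q / (2.0 * n * (p - 1))
    radius = math.acos(max(-1.0, min(1.0, threshold)))            # angular radius (radians)
    return theta_hat, threshold, radius


def in_cap(theta, theta_hat, threshold):
    """A unit vector theta belongs to the cap iff theta'theta_hat >= threshold."""
    return sum(a * b for a, b in zip(theta, theta_hat)) >= threshold


# Toy data: four points at angle 0.1 rad around (1, 0, 0)'.
t = 0.1
toy = [(math.cos(t), math.sin(t), 0.0), (math.cos(t), -math.sin(t), 0.0),
       (math.cos(t), 0.0, math.sin(t)), (math.cos(t), 0.0, -math.sin(t))]
theta_hat, thr, rad = feasible_confidence_cap(toy)
print(theta_hat, thr, math.degrees(rad))
```

Only $\hat{\boldsymbol{\theta}}_{n}$ and $\hat{e}_{n2}$ enter, so neither $\kappa_{n}$ nor $f$ has to be specified. For general $p$, one would replace `q` by the corresponding $\chi^{2}_{p-1}$ quantile.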

We conducted the following Monte Carlo exercises to check the validity of Theorems 3.2–3.3. For each combination of $a\in\{0.5,1\}$ and $b\in\{0.5,1,1.4\}$ , we generated $M=10,\!000$ random samples of size $n=100$ from the rotationally symmetric distribution with location ${\boldsymbol{\theta}}=(1,0,0)^{\prime}\in\mathcal{S}^{2}$ , concentration $\kappa_{n}=n^{a}$ , and angular function $z\mapsto f_{b}(z)=\exp(z^{b})$ (numerical overflows prevented us from considering larger values of $b$ ). For each $a$ and $b$ , Figure 1 plots kernel density estimates of the resulting $M$ values of $T_{n}^{\rm Oracle}:=2n\kappa_{n}\varphi_{f}(\kappa_{n})(1-{\boldsymbol{\theta}}^{\prime}\hat{{\boldsymbol{\theta}}}_{n})$ and $T^{\rm Feasible}_{n}:=2n(p-1)(1-{\boldsymbol{\theta}}^{\prime}\hat{{\boldsymbol{\theta}}}_{n})/(1-\hat{e}_{n2})$ (for $a=1$ , raw histograms are also provided). Clearly, Figure 1 supports the theoretical results above, with one possible exception, namely the case of $T^{\rm Feasible}_{n}$ with $b=0.5$ . We therefore focused on this case and repeated the same Monte Carlo exercise with $n=10,\!000$ . The results, shown in Figure 2, are now in perfect agreement with the theory for $a=1$ , whereas the fit is still not excellent for $a=0.5$ . A closer inspection provides the explanation: despite the large sample size $n$ considered in Figure 2, the distribution associated with $a=b=0.5$ is far from being highly concentrated; see the right panel of this figure. The fit observed for $a=0.5$ in the left panel of Figure 2 therefore does not contradict our theoretical results, which would materialize for higher concentrations.
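A reduced version of this exercise can be sketched in a few lines. The sketch restricts to the FvML case $b=1$ with $a=1$, since $\varphi_{f_{1}}\equiv 1$ gives $\kappa_{n}\varphi_{f}(\kappa_{n})=\kappa_{n}$ and exact sampling is simple on $\mathcal{S}^{2}$. It also uses $M=2,\!000$ replications instead of $10,\!000$ to keep the run short. These choices, the seed, and the function names are ours. The sketch checks that both $T_{n}^{\rm Oracle}$ and $T_{n}^{\rm Feasible}$ have a mean close to $p-1=2$ and a rejection rate close to $5\%$ at the $\chi^{2}_{2}$ critical value.

```python
import math
import random


def sample_fvml_e1(kappa, n, rng):
    """n draws from the FvML distribution on S^2 with location theta = (1, 0, 0)'."""
    out = []
    for _ in range(n):
        v = 1.0 - rng.random()                      # v in (0, 1], avoids log(0)
        u = 1.0 + math.log(v + (1.0 - v) * math.exp(-2.0 * kappa)) / kappa
        r = math.sqrt(max(0.0, 1.0 - u * u))
        phi = 2.0 * math.pi * rng.random()
        out.append((u, r * math.cos(phi), r * math.sin(phi)))
    return out


def oracle_and_feasible(sample, kappa_phi):
    """(T_oracle, T_feasible) for theta = (1, 0, 0)' and p = 3;
    kappa_phi stands for kappa_n * phi_f(kappa_n)."""
    n, p = len(sample), 3
    xbar = [sum(x[j] for x in sample) / n for j in range(p)]
    norm = math.sqrt(sum(c * c for c in xbar))
    th = [c / norm for c in xbar]                                 # spherical mean
    one_minus_cos = (th[1] ** 2 + th[2] ** 2) / (1.0 + th[0])     # 1 - theta'theta_hat, stably
    e2_hat = sum((x[0] * th[0] + x[1] * th[1] + x[2] * th[2]) ** 2 for x in sample) / n
    t_oracle = 2.0 * n * kappa_phi * one_minus_cos
    t_feasible = 2.0 * n * (p - 1) * one_minus_cos / (1.0 - e2_hat)
    return t_oracle, t_feasible


rng = random.Random(2019)
M, n = 2000, 100
kappa = float(n)                 # kappa_n = n^a with a = 1
crit = -2.0 * math.log(0.05)     # upper 5%-quantile of chi^2_2 (closed form for 2 df)
stats = [oracle_and_feasible(sample_fvml_e1(kappa, n, rng), kappa_phi=kappa)
         for _ in range(M)]
mean_oracle = sum(s[0] for s in stats) / M
mean_feasible = sum(s[1] for s in stats) / M
reject_oracle = sum(s[0] > crit for s in stats) / M
reject_feasible = sum(s[1] > crit for s in stats) / M
print(mean_oracle, mean_feasible, reject_oracle, reject_feasible)
```

The quantity $1-{\boldsymbol{\theta}}^{\prime}\hat{\boldsymbol{\theta}}_{n}$ is of order $1/(n\kappa_{n})$ here. It is therefore computed as $(\hat{\theta}_{2}^{2}+\hat{\theta}_{3}^{2})/(1+\hat{\theta}_{1})$ rather than by subtracting two numbers close to one.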

4 Hypothesis testing

We now turn to hypothesis testing and, more specifically, to the generic problem of testing the null hypothesis $\mathcal{H}_{0}:{\boldsymbol{\theta}}={\boldsymbol{\theta}}_{0}$ against the alternative $\mathcal{H}_{1}:{\boldsymbol{\theta}}\neq{\boldsymbol{\theta}}_{0}$ , where ${\boldsymbol{\theta}}_{0}$ is a fixed unit $p$ -vector. In this section, we consider the Watson test (Watson, 1983, p. 140) and the Wald test (Hayakawa, 1990; Hayakawa and Puri, 1985), that respectively reject the null hypothesis at asymptotic level $\alpha$ whenever

[TABLE]

and

[TABLE]

exceed the critical value $\chi^{2}_{p-1,1-\alpha}$ . In standard asymptotic scenarios where the sample size $n$ diverges to infinity with $\kappa$ fixed, the Watson and Wald test statistics are asymptotically equivalent in probability under the null hypothesis, hence also under sequences of contiguous alternatives, so that these tests may be considered asymptotically equivalent. As shown in Paindaveine and Verdebout (2017), however, this asymptotic equivalence does not survive asymptotic scenarios for which $\kappa_{n}=O(1/\sqrt{n})$ as $n$ diverges to infinity. This suggests investigating the asymptotic behavior of these tests under the high concentration scenarios considered in the previous sections.

To do so, let

[TABLE]

and decompose the Watson and Wald test statistics into

[TABLE]

We then have the following lemma.

Lemma 4.1.

Fix an integer $p\geq 2$ , ${\boldsymbol{\theta}}_{0}\in\mathcal{S}^{p-1}$ and $f\in\mathcal{F}$ . Let $(\kappa_{n})$ be a positive real sequence that diverges to infinity. Let $({\boldsymbol{\tau}}_{n})$ be a bounded sequence in $\mathbb{R}^{p}$ such that ${\boldsymbol{\theta}}_{n}={\boldsymbol{\theta}}_{0}+\nu_{n}{\boldsymbol{\tau}}_{n}\in\mathcal{S}^{p-1}$ for all $n$ , with $\nu_{n}:=1/\sqrt{n\kappa_{n}\varphi_{f}(\kappa_{n})}$ . Then, under ${\rm P}^{(n)}_{{\boldsymbol{\theta}}_{n},\kappa_{n},f}$ , we have $R_{n}=1+o_{\rm P}(1)$ and $\bar{\mathbf{X}}_{n}^{\prime}{\boldsymbol{\theta}}_{0}=1+o_{\rm P}(1)$ as $n\to\infty$ , so that $W_{n}=\tilde{W}_{n}+o_{\rm P}(1)$ and $S_{n}=\tilde{S}_{n}+o_{\rm P}(1)$ as $n\to\infty$ .

This lemma ensures that, both under the sequence of null hypotheses ${\rm P}^{(n)}_{{\boldsymbol{\theta}}_{0},\kappa_{n},f}$ (taking ${\boldsymbol{\tau}}_{n}\equiv{\bf 0}$ ) and under sequences of local alternatives of the form ${\rm P}^{(n)}_{{\boldsymbol{\theta}}_{n},\kappa_{n},f}$ , one may focus on $\tilde{W}_{n}$ and $\tilde{S}_{n}$ when studying the asymptotic behaviors of the Watson and Wald test statistics in (4.1)–(4.2). These asymptotic behaviors are provided in the following result.

Theorem 4.1.

Fix an integer $p\geq 2$ , ${\boldsymbol{\theta}}_{0}\in\mathcal{S}^{p-1}$ and $f\in\mathcal{F}$ . Let $(\kappa_{n})$ be a positive real sequence that diverges to infinity. Let $({\boldsymbol{\tau}}_{n})$ be a sequence in $\mathbb{R}^{p}$ converging to ${\boldsymbol{\tau}}$ and such that ${\boldsymbol{\theta}}_{n}={\boldsymbol{\theta}}_{0}+\nu_{n}{\boldsymbol{\tau}}_{n}\in\mathcal{S}^{p-1}$ for all $n$ , with $\nu_{n}:=1/\sqrt{n\kappa_{n}\varphi_{f}(\kappa_{n})}$ . Then, (i) under ${\rm P}^{(n)}_{{\boldsymbol{\theta}}_{0},\kappa_{n},f}$ ,

[TABLE]

as $n\to\infty$ ; (ii) under ${\rm P}^{(n)}_{{\boldsymbol{\theta}}_{n},\kappa_{n},f}$ ,

[TABLE]

as $n\to\infty$ , where $\chi^{2}_{p-1}(c)$ denotes the non-central chi-square distribution with $p-1$ degrees of freedom and non-centrality parameter $c$ .

This result shows that, under high concentration, the Watson and Wald test statistics remain asymptotically equivalent in probability both under the null hypothesis and under the considered sequences of local alternatives. Both tests show asymptotic size $\alpha$ under the null hypothesis, irrespective of the angular function $f$ and of the rate at which the concentration $\kappa_{n}$ diverges to infinity. Theorem 4.1 also reveals that $\nu_{n}$ describes the consistency rate of these tests, in the sense that the Watson and Wald tests show non-trivial asymptotic powers (that is, asymptotic powers in $(\alpha,1)$ ) under sequences of local alternatives of the form ${\rm P}^{(n)}_{{\boldsymbol{\theta}}_{n},\kappa_{n},f}$ , with $\nu_{n}^{-1}\|{\boldsymbol{\theta}}_{n}-{\boldsymbol{\theta}}_{0}\|\to c>0$ . As in point estimation, this rate depends on $f$ and is faster than the standard parametric root- $n$ rate obtained for fixed $\kappa$ . In other words, under high concentration, one can detect alternatives that are less severe (hence, more challenging) than those detectable in the standard fixed- $\kappa$ situation.

We performed the following Monte Carlo exercise to illustrate the results in Theorem 4.1. For each combination of $a\in\{0.5,1\}$ , $b\in\{0.5,1,1.4\}$ and $\ell\in\{0,1,2,3,4\}$ , we generated $M=10,\!000$ random samples of size $n=100$ from the rotationally symmetric distribution with concentration $\kappa_{n}=n^{a}$ , angular function $z\mapsto f_{b}(z)=\exp(z^{b})$ , and location

[TABLE]

where we let ${\boldsymbol{\theta}}_{0}=(1,0,0)^{\prime}$ and $\alpha_{n\ell}:=2\arcsin(\ell\nu_{n}/2)$ , with $\nu_{n}=1/\sqrt{n\kappa_{n}\varphi_{f_{b}}(\kappa_{n})}$ . The alternative locations ${\boldsymbol{\theta}}_{n\ell}$ rewrite ${\boldsymbol{\theta}}_{0}+\nu_{n}{\boldsymbol{\tau}}_{n\ell}$ for some $p$ -vector ${\boldsymbol{\tau}}_{n\ell}$ with norm $\ell$ (two unit vectors at angle $\alpha_{n\ell}$ are at Euclidean distance $2\sin(\alpha_{n\ell}/2)=\ell\nu_{n}$ ). Clearly, $\ell=0$ refers to the null hypothesis $\mathcal{H}_{0}:{\boldsymbol{\theta}}={\boldsymbol{\theta}}_{0}$ and $\ell=1,2,3,4$ correspond to increasingly severe alternatives. In each sample, we performed the Watson and Wald tests at asymptotic level $\alpha=5\%$ . Figure 3 plots, as a function of $\ell$ , the resulting rejection frequencies, or more precisely, the difference between these rejection frequencies and the corresponding theoretical limiting powers

[TABLE]

see Theorem 4.1(ii). The figure also reports the results for sample size $n=700$ , except for the case with the highest concentration (i.e., the case $(a,b)=(1,1.4)$ ), for which data generation led to numerical overflow. Rejection frequencies agree well with the limiting powers (note the scale of the vertical axes), particularly for $\kappa_{n}=n$ , which provides a higher concentration than $\kappa_{n}=\sqrt{n}$ . The agreement improves as the sample size increases. In all cases but the one with the lowest concentration (i.e., the case $(a,b)=(0.5,0.5)$ ), the asymptotic equivalence between the Watson and Wald tests materializes already for $n=100$ .
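For $p=3$, the limiting powers of Theorem 4.1(ii) under $\|{\boldsymbol{\tau}}\|=\ell$ can be evaluated without special-function libraries. A minimal sketch, assuming the non-centrality parameter is $\|{\boldsymbol{\tau}}\|^{2}=\ell^{2}$ (as in the Le Cam third lemma derivation of Section 5). It uses the standard Poisson-mixture representation of the non-central chi-square distribution, together with the closed form $P(\chi^{2}_{2k}>c)=e^{-c/2}\sum_{i<k}(c/2)^{i}/i!$. The function names are ours.

```python
import math


def noncentral_chi2_2_sf(c, lam, terms=200):
    """P(chi^2_2(lam) > c): Poisson(lam/2) mixture of central chi^2_{2+2j} laws."""
    half_c, half_lam = c / 2.0, lam / 2.0
    weight = math.exp(-half_lam)   # Poisson(lam/2) probability of j = 0
    term = 1.0                     # (c/2)^j / j!
    partial = 0.0                  # sum_{i <= j} (c/2)^i / i!
    total = 0.0
    for j in range(terms):
        partial += term
        total += weight * math.exp(-half_c) * partial   # weight_j * P(chi^2_{2+2j} > c)
        term *= half_c / (j + 1)
        weight *= half_lam / (j + 1)
    return total


def limiting_power(ell, alpha=0.05):
    """Limiting power of a level-alpha chi^2_2 test with non-centrality ell^2 (p = 3)."""
    crit = -2.0 * math.log(alpha)  # chi^2_{2,1-alpha}
    return noncentral_chi2_2_sf(crit, ell ** 2)


for ell in range(5):
    print(ell, round(limiting_power(ell), 4))
```

At $\ell=0$ this returns exactly the nominal level, and the power increases with $\ell$. These values are the benchmarks against which the rejection frequencies in Figure 3 are compared.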

5 Local asymptotic normality

The Watson test was shown to enjoy strong optimality properties, both in the standard asymptotic scenario where the concentration $\kappa_{n}$ is fixed and in the non-standard one where the concentration goes to zero; see Paindaveine and Verdebout (2017). In the latter scenario, the Wald test, on the contrary, fails to be optimal. In this section, we investigate the optimality properties of the Watson and Wald tests and of the spherical mean estimator under high concentration. Optimality will be in the Le Cam sense, which requires studying the Local Asymptotic Normality (LAN) of the sequence of fixed- $f$ parametric submodels at hand.

To do so, we will need to reinforce our assumptions on $f$ . Let $p(\geq 2)$ be an integer, $(\kappa_{n})$ be a positive sequence diverging to infinity, and $(t_{n})$ be a bounded positive sequence. In the sequel, we will denote as $\mathcal{F}_{\rm LAN}(p,\kappa_{n},t_{n})$ the collection of angular functions $f\in\mathcal{F}$ such that, as $\kappa\to\infty$ ,

[TABLE]

and such that, letting $h^{\pm}_{n}(s,w):=-{\textstyle{\frac{1}{2}}}t_{n}^{2}\kappa_{n}\nu_{n}^{2}s\pm c_{n}t_{n}\kappa_{n}\nu_{n}(1-s^{2})^{1/2}w^{1/2}$ , with $\nu_{n}:=1/\sqrt{n\kappa_{n}\varphi_{f}(\kappa_{n})}$ and $c_{n}:=(1-\frac{1}{4}\nu_{n}^{2}t_{n}^{2})^{1/2}$ ,

[TABLE]

as $n\to\infty$ , where, for $p\geq 3$ , $G_{p}$ is the cumulative distribution function of the ${\rm Beta}(\frac{1}{2},\frac{p-2}{2})$ distribution, whereas, for $p=2$ , $G_{p}$ is the cumulative distribution function of the Dirac distribution in $1$ . As shown in the next result, most angular functions $f_{b}$ do satisfy these extra assumptions, sometimes under an extremely mild restriction on the rate at which the sequence $(\kappa_{n})$ diverges to infinity with $n$ .

Proposition 5.1.

Let $p(\geq 2)$ be an integer, $(\kappa_{n})$ be a positive sequence diverging to infinity, and $(t_{n})$ be a bounded positive sequence. Then, for any $b\geq 1$ , the function $z\mapsto f_{b}(z)=\exp(z^{b})$ belongs to $\mathcal{F}_{\rm LAN}(p,\kappa_{n},t_{n})$ . Provided that there exists $\varepsilon\in(0,2)$ such that $\kappa_{n}^{b}/(\log n)\geq(1-b)/(2-\varepsilon)$ for $n$ large enough, the same holds for $f_{b}$ , with $b\in(\frac{1}{2},1)$ .

In other words, $f_{b}$ , with $b\geq 1$ , belongs to $\mathcal{F}_{\rm LAN}(p,\kappa_{n},t_{n})$ irrespective of the sequences $(\kappa_{n})$ and $(t_{n})$ . In contrast, all angular functions $f_{b}$ , with $b\in(\frac{1}{2},1)$ , belong to $\mathcal{F}_{\rm LAN}(p,\kappa_{n},t_{n})$ in particular when $(\kappa_{n})$ diverges to infinity at least as fast as $(\log n)^{2}$ : then $\kappa_{n}^{b}/\log n\geq C(\log n)^{2b-1}\to\infty$ for some $C>0$ , since $2b-1>0$ . This holds, e.g., when $\kappa_{n}=n^{a}$ , with $a>0$ . We then have the following LAN result.

Theorem 5.1.

Fix an integer $p\geq 2$ and ${\boldsymbol{\theta}}\in\mathcal{S}^{p-1}$ . Let $(\kappa_{n})$ be a positive real sequence that diverges to infinity. Let $({\boldsymbol{\tau}}_{n})$ be a bounded sequence in $\mathbb{R}^{p}$ such that ${\boldsymbol{\theta}}_{n}={\boldsymbol{\theta}}+\nu_{n}{\boldsymbol{\tau}}_{n}\in\mathcal{S}^{p-1}$ for all $n$ , with $\nu_{n}:=1/\sqrt{n\kappa_{n}\varphi_{f}(\kappa_{n})}$ . Assume that $f$ belongs to $\mathcal{F}_{\rm LAN}(p,\kappa_{n},\|{\boldsymbol{\tau}}_{n}\|)$ . Then, as $n\to\infty$ under ${\rm P}^{(n)}_{{\boldsymbol{\theta}},\kappa_{n},f}$ ,

[TABLE]

where the central sequence ${\boldsymbol{\Delta}}^{(n)}_{{\boldsymbol{\theta}},f}:=\nu_{n}^{-1}(\mathbf{I}_{p}-{\boldsymbol{\theta}}{\boldsymbol{\theta}}^{\prime})\bar{\mathbf{X}}_{n}$ , still under ${\rm P}^{(n)}_{{\boldsymbol{\theta}},\kappa_{n},f}$ , is asymptotically normal with mean zero and covariance matrix ${\boldsymbol{\Gamma}}_{\boldsymbol{\theta}}:=\mathbf{I}_{p}-{\boldsymbol{\theta}}{\boldsymbol{\theta}}^{\prime}\!$ .

This result shows that the rate $\nu_{n}$ identified in the previous sections is actually the contiguity rate associated with the sequence of statistical experiments at hand. Remarkably, this provides one of the few semiparametric examples (if any) where the contiguity rate depends on the fixed value of the functional nuisance $f$ . Since the contiguity rate coincides with the rate of convergence of the spherical mean (see Theorem 3.1), we conclude that the spherical mean is rate-consistent. In fact, since the proof of Theorem 3.1 establishes that

[TABLE]

as $n\rightarrow\infty$ under ${\rm P}^{(n)}_{{\boldsymbol{\theta}},\kappa_{n},f}$ , it actually follows from Theorems 3.1 and 5.1 that the spherical mean is an asymptotically optimal estimator in the sense of the convolution theorem; see, e.g., Theorem 8.8 in van der Vaart (1998). Turning to hypothesis testing, it also follows from the LAN result above that the Watson and Wald tests from the previous section are rate-consistent, since Theorem 4.1(ii) indicates that these tests show non-trivial asymptotic powers under the sequence of contiguous alternatives involved in Theorem 5.1. Actually, in the present LAN framework, an application of Le Cam's third lemma confirms these asymptotic local powers.

To show this, fix a positive real sequence $(\kappa_{n})$ that diverges to infinity and local alternatives as in Theorem 5.1. Then, under the sequence of null hypotheses ${\rm P}^{(n)}_{{\boldsymbol{\theta}}_{0},\kappa_{n},f}$ ,

[TABLE]

is asymptotically normal with mean zero and covariance matrix ${\boldsymbol{\Gamma}}_{{\boldsymbol{\theta}}_{0}}$ ; this follows from (A.13) in the proof of Theorem 3.1. Now, by using Theorem 2.2(ii), we obtain that, under the same sequence of hypotheses,

[TABLE]

Thus, Le Cam’s third lemma entails that, under the sequence of contiguous alternatives ${\rm P}^{(n)}_{{\boldsymbol{\theta}}_{n},\kappa_{n},f}$ , with ${\boldsymbol{\theta}}_{n}\!={\boldsymbol{\theta}}_{0}+\nu_{n}{\boldsymbol{\tau}}_{n}$ , $\nu_{n}\!=1/\sqrt{n\kappa_{n}\varphi_{f}(\kappa_{n})}$ and $({\boldsymbol{\tau}}_{n})\!\to{\boldsymbol{\tau}}$ , $\mathbf{T}_{n}^{W}$ is asymptotically normal with mean ${\boldsymbol{\tau}}$ and covariance matrix ${\boldsymbol{\Gamma}}_{{\boldsymbol{\theta}}_{0}}$ , so that, under this sequence of hypotheses, $\tilde{W}_{n}=(\mathbf{T}^{W}_{n})^{\prime}{\boldsymbol{\Gamma}}_{{\boldsymbol{\theta}}_{0}}^{-}\mathbf{T}^{W}_{n}\stackrel{{\scriptstyle\mathcal{D}}}{{\to}}\chi^{2}_{p-1}\big{(}\|{\boldsymbol{\tau}}\|^{2}\big{)},$ where $\mathbf{A}^{-}$ stands for the Moore-Penrose inverse of $\mathbf{A}$ . From contiguity, we thus obtain that $W_{n}=\tilde{W}_{n}+o_{\rm P}(1)\stackrel{{\scriptstyle\mathcal{D}}}{{\to}}\chi^{2}_{p-1}\big{(}\|{\boldsymbol{\tau}}\|^{2}\big{)}$ under the alternatives considered, which, as announced, is in agreement with Theorem 4.1(ii). As for the Wald test, the fact that $S_{n}\stackrel{{\scriptstyle\mathcal{D}}}{{\to}}\chi^{2}_{p-1}\big{(}\|{\boldsymbol{\tau}}\|^{2}\big{)}$ under the same sequence of alternatives directly follows from the result for the Watson test and from the fact that the null asymptotic equivalence $W_{n}=S_{n}+o_{\rm P}(1)$ in Theorem 4.1(i) extends, from contiguity, to the present contiguous alternatives.

Beyond this, one of the main interests of the LAN result in Theorem 5.1 is to pave the way to the construction of Le Cam optimal tests for the problem of testing $\mathcal{H}_{0}:{\boldsymbol{\theta}}={\boldsymbol{\theta}}_{0}$ versus $\mathcal{H}_{1}:{\boldsymbol{\theta}}\neq{\boldsymbol{\theta}}_{0}$ under angular function $f$ . It directly follows from this result that, for this problem, the test rejecting the null hypothesis at asymptotic level $\alpha$ whenever

[TABLE]

is Le Cam optimal (more precisely, locally asymptotically maximin) at asymptotic level $\alpha$ . Since Theorem 2.2(ii) ensures that, under the null hypothesis,

[TABLE]

Lemma 4.1 readily entails that $Q_{n}=\tilde{W}_{n}+o_{\rm P}(1)=W_{n}+o_{\rm P}(1)$ under the null hypothesis, hence, from contiguity, also under the sequences of local alternatives above. It follows that, under the assumptions of Theorem 5.1, the Watson test is optimal in the Le Cam sense. Since the Watson test does not depend on $f$ , this optimality holds at any $f$ meeting the assumptions of Theorem 5.1. From the asymptotic equivalence result in Theorem 4.1(i) and from contiguity, this extends to the Wald test.

In the high concentration framework considered, it may be intuitively appealing to linearize the problem and apply a standard Euclidean test to the data projected onto the tangent space to $\mathcal{S}^{p-1}$ at the null location ${\boldsymbol{\theta}}_{0}$ , or equivalently, to the data $\mathbf{Y}_{ni}:={\bf P}_{{\boldsymbol{\theta}}_{0}}^{\prime}\mathbf{X}_{ni}$ , $i=1,\ldots,n$ , where ${\bf P}_{{\boldsymbol{\theta}}_{0}}$ is an arbitrary $p\times(p-1)$ matrix whose columns form an orthonormal basis of the orthogonal complement of ${\boldsymbol{\theta}}_{0}$ in $\mathbb{R}^{p}$ . Under rotational symmetry about ${\boldsymbol{\theta}}_{0}$ , the $\mathbf{Y}_{ni}$ 's share a common spherically symmetric distribution, and the null hypothesis $\mathcal{H}_{0}:{\boldsymbol{\theta}}={\boldsymbol{\theta}}_{0}$ translates into the hypothesis that the mean of this distribution is the zero vector. The Watson test can actually be seen as the (spherical) Hotelling test rejecting the null hypothesis at asymptotic level $\alpha$ whenever $n\bar{\bf Y}_{n}^{\prime}\mathbf{S}_{n}^{-1}\bar{\bf Y}_{n}>\chi^{2}_{p-1,1-\alpha}$ , with $\bar{\bf Y}_{n}:=n^{-1}\sum_{i=1}^{n}\mathbf{Y}_{ni}$ and with a standardization matrix $\mathbf{S}_{n}$ that, in line with the underlying spherical symmetry, is a multiple of the identity matrix. Quite nicely, Theorem 5.1 formally proves that this linearization provides a test that is Le Cam optimal at any $f$ . We stress, however, that it was not clear a priori that such a linearization would provide a test achieving optimality in the original sequence of curved statistical experiments. First, the impact of linearization is difficult to control. Second, it is unknown whether or not the spherical Hotelling test is optimal in any sense under the highly concentrated and skewed alternatives obtained in the tangent space (to the best of our knowledge, the only optimality results for the spherical Hotelling test relate to shifted spherical Gaussian distributions; see, e.g., Hallin and Paindaveine, 2002).
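The linearized test described above can be sketched directly. In the sketch below, the specific multiple of the identity, $\mathbf{S}_{n}=\big(n^{-1}\sum_{i}\|\mathbf{Y}_{ni}\|^{2}/(p-1)\big)\mathbf{I}_{p-1}$, is our assumption; the display (4.1) is not reproduced here. With this choice, $\mathbf{P}_{{\boldsymbol{\theta}}_{0}}\mathbf{P}_{{\boldsymbol{\theta}}_{0}}^{\prime}=\mathbf{I}_{p}-{\boldsymbol{\theta}}_{0}{\boldsymbol{\theta}}_{0}^{\prime}$ and $\|\mathbf{Y}_{ni}\|^{2}=1-(\mathbf{X}_{ni}^{\prime}{\boldsymbol{\theta}}_{0})^{2}$ imply that the statistic equals $n(p-1)\bar{\mathbf{X}}_{n}^{\prime}(\mathbf{I}_{p}-{\boldsymbol{\theta}}_{0}{\boldsymbol{\theta}}_{0}^{\prime})\bar{\mathbf{X}}_{n}/(1-n^{-1}\sum_{i}(\mathbf{X}_{ni}^{\prime}{\boldsymbol{\theta}}_{0})^{2})$, whatever the choice of $\mathbf{P}_{{\boldsymbol{\theta}}_{0}}$. The function names are hypothetical.

```python
import math


def orthonormal_complement(theta0):
    """p - 1 orthonormal vectors spanning the orthogonal complement of the unit
    vector theta0 (Gram-Schmidt on canonical vectors, least-aligned ones first);
    they play the role of the columns of P_theta0."""
    p = len(theta0)
    order = sorted(range(p), key=lambda k: abs(theta0[k]))
    basis = []
    for k in order[:p - 1]:
        v = [1.0 if i == k else 0.0 for i in range(p)]
        for w in [theta0] + basis:
            dot = sum(a * b for a, b in zip(v, w))
            v = [a - dot * b for a, b in zip(v, w)]
        norm = math.sqrt(sum(a * a for a in v))
        basis.append([a / norm for a in v])
    return basis


def watson_hotelling(sample, theta0):
    """n * Ybar' S_n^{-1} Ybar with Y_i = P'X_i and the assumed scaling
    S_n = (n^{-1} sum_i ||Y_i||^2 / (p - 1)) * I_{p-1}."""
    p, n = len(theta0), len(sample)
    P = orthonormal_complement(theta0)
    ys = [[sum(a * b for a, b in zip(x, e)) for e in P] for x in sample]   # tangent coordinates
    ybar = [sum(y[j] for y in ys) / n for j in range(p - 1)]
    s = sum(sum(c * c for c in y) for y in ys) / (n * (p - 1))            # S_n = s * I_{p-1}
    return n * sum(c * c for c in ybar) / s
```

The null hypothesis is rejected at asymptotic level $\alpha$ when the returned value exceeds $\chi^{2}_{p-1,1-\alpha}$, which equals $-2\log\alpha$ for $p=3$.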

6 Real data illustration

The real dataset we analyze here consists of measurements of magnetic remanence directions in $n=62$ rock specimens. Paleomagnetism, that is, the study of remanent magnetism, investigates the strength and the direction of the Earth’s magnetic field over time. The orientation and intensity of the Earth’s magnetic field can be obtained from the record of remanent magnetism preserved in rocks. The directions of remanent magnetization allow scientists to determine the position of the Earth’s magnetic pole with respect to the study location at the time when the magnetization was acquired.

We consider here a well-known dataset on $\mathcal{S}^{2}$ that has already been used for inference on spherical location in Fisher, Lewis and Embleton (1987). The dataset, which is provided as Dataset A in Appendix B8 of this monograph, is shown in the left panel of Figure 4. Clearly, the data are highly concentrated. In line with this, the FvML maximum likelihood estimator of the concentration parameter $\kappa$ takes value $\hat{\kappa}=76.12$ , which is of the same order of magnitude as the sample size $n=62$ . Figure 4 also suggests that rotational symmetry is a plausible assumption. To assess this, we performed the three tests of rotational symmetry on $\mathcal{S}^{2}$ that were recently proposed in García-Portugués, Paindaveine and Verdebout (2019). These are a location test and a scatter test, which show power against location-type and scatter-type alternatives to rotational symmetry, respectively (we refer to García-Portugués, Paindaveine and Verdebout, 2019 for details), as well as a hybrid test that shows power against both types of alternatives. These three tests, which address the null hypothesis of rotational symmetry about an unspecified location ${\boldsymbol{\theta}}$ , yielded the $p$ -values $.844$ , $.305$ and $.607$ , respectively, hence did not lead to rejection at any usual nominal level. To assess the robustness of this conclusion, we performed the same three tests of rotational symmetry on each of the $62$ samples of size $61$ obtained by leaving one of the original observations out. Figure 5 provides boxplots of the 62 $p$ -values obtained for each of the three tests. Again, at any usual nominal level, none of these subsamples led any of the three tests to reject the null hypothesis of rotational symmetry.

The various statistical methods studied in this paper are therefore well suited to the present dataset. To illustrate one of these methods, we computed the 95 $\%$ confidence cap in (3.4) for the spherical location. The resulting confidence cap is shown in the right panel of Figure 4. This confidence zone is centered at the spherical mean $\hat{{\boldsymbol{\theta}}}=(.210,.104,.972)^{\prime}$ and, as expected in the present high concentration setup, has a very small size.

7 Wrap up

We discussed inference on the location parameter of rotationally symmetric distributions under high concentration. We did so by considering double asymptotic scenarios where the underlying concentration parameter $\kappa_{n}$ diverges to infinity at an arbitrary rate with the sample size $n$ . This significantly improves over the state of the art for directional inference under high concentration, since previous works not only focused on a parametric class of distributions (namely, the FvML one) but also restricted to asymptotics as $\kappa$ diverges to infinity with $n$ fixed. Our asymptotic results indicate that standard fixed- $\kappa$ methods are robust to high concentration, in the sense that they will remain valid in the aforementioned double asymptotic scenarios: the spherical mean remains consistent and asymptotically normal, whereas the Watson and Wald tests still asymptotically meet the level constraint. Under high concentration, however, these statistical procedures enjoy faster consistency rates than in the standard fixed- $\kappa$ asymptotic scenario. Remarkably, these consistency rates depend on the type of rotationally symmetric distributions considered, that is, they depend on the underlying angular function $f$ ; this dependence is such that the higher the concentration, the faster the consistency rates. In contrast with all previous works on high concentration, we also considered optimality issues. We showed that, under mild assumptions on $f$ , the aforementioned inference procedures enjoy strong, Le Cam-type, optimality properties. 
For some (not all) angular functions, optimality requires that $\kappa_{n}$ diverges to infinity sufficiently fast as a function of $n$ . As we have seen, the corresponding restriction is extremely mild for our running example $f_{b}(z)=\exp(z^{b})$ : for $b\in(\frac{1}{2},1)$ , optimality holds in particular when $\kappa_{n}$ diverges to infinity at least as fast as $(\log n)^{2}$ , whereas no restriction of this sort is required for $b\geq 1$ , hence in particular for the usual FvML case.

Appendix A Proofs

A.1 Proof of Theorem 2.1

The proof requires the following preliminary result.

Lemma A.1.

If $\kappa\varphi_{f}(\kappa)\nearrow\infty$ (resp., $\kappa\varphi_{f}(\kappa)\searrow 0$ ) as $\kappa\to\infty$ , then there exists $z_{0}$ such that $f$ is convex (resp., concave) in $[z_{0},\infty)$ .

Proof of Lemma A.1. Assume that $\kappa\varphi_{f}(\kappa)\nearrow\infty$ . Pick $z_{0}$ large enough so that, in $[z_{0},\infty)$ , $z\mapsto z\varphi_{f}(z)$ is monotone non-decreasing and takes its values in $[1,\infty)$ . Then, letting $g(z):=z/f(z)$ , the mean value theorem implies that, for any $a,b$ with $z_{0}\leq a<b$ ,

[TABLE]

for some $c\in(a,b)$ . Since $c\varphi_{f}(c)\geq 1$ , we must have $f^{\prime}(b)\geq f^{\prime}(a)$ . Therefore, $f^{\prime}$ is monotone non-decreasing in $[z_{0},\infty)$ , so that $f$ is convex on the same set. The proof is entirely similar for the case $\kappa\varphi_{f}(\kappa)\searrow 0$ , where $z_{0}$ is taken so that, in $[z_{0},\infty)$ , $z\mapsto z\varphi_{f}(z)$ is monotone non-increasing and takes its values in $[0,1]$ . $\square$

Proof of Theorem 2.1. Writing

[TABLE]

note that $f$ provides high concentration if and only if $A_{\kappa}/(A_{\kappa}+B_{\kappa})\to 1$ as $\kappa\to\infty$ , or equivalently, if and only if $A_{\kappa}/B_{\kappa}\to\infty$ as $\kappa\to\infty$ . In this proof, $C$ denotes a positive quantity that does not depend on $\kappa$ and whose value may change from line to line.

(i) Assume that $\kappa\varphi_{f}(\kappa)\nearrow\infty$ . Without loss of generality, restrict to $\kappa\geq\kappa_{0}$ , where $\kappa_{0}$ is such that $f$ is convex in $[\kappa_{0}(1-\varepsilon),\infty)$ (Lemma A.1). Then, using the fact that $(1-s^{2})^{(p-3)/2}(s-(1-\varepsilon))$ is positive for $s\in(1-\varepsilon,1)$ , we have

[TABLE]

Since

[TABLE]

we conclude that

[TABLE]

as $\kappa$ diverges to infinity, so that $f$ provides high concentration.

(ii) Assume that $\kappa\varphi_{f}(\kappa)\to c$ for some $c>0$ , that is, $z\varphi_{f}(z)=c+o(1)$ as $z\to\infty$ . This means that $\varphi_{f}(z)-c/z=(\log f(z)-c\log z)^{\prime}=g(z)$ for a function $g$ that satisfies $g(z)=o(1/z)$ as $z\to\infty$ , hence that is integrable in a neighborhood of $\infty$ . For $z_{0}$ large enough so that $g(z)\leq 1$ for $z\geq z_{0}$ and $g$ is integrable in $[z_{0},\infty)$ , we then have

[TABLE]

as $z\to\infty$ , which rewrites

[TABLE]

for some constant $C$ as $z\to\infty$ . This entails that, for any $0<a<b\leq 1$ ,

[TABLE]

as $\kappa\to\infty$ . Fixing $\varepsilon\in(0,1/2)$ , this implies that

[TABLE]

so that $A_{\kappa}/B_{\kappa}=O(1)$ as $\kappa\to\infty$ , which shows that $f$ does not provide high concentration.

(iii) Assume that $\kappa\varphi_{f}(\kappa)\searrow 0$ . Fix $\tilde{\varepsilon}>\varepsilon$ and restrict, without loss of generality, to $\kappa\geq\kappa_{0}$ , where $\kappa_{0}$ is such that $f$ is concave in $[\kappa_{0}(1-\tilde{\varepsilon}),\infty)$ (Lemma A.1). Concavity ensures that

[TABLE]

Since

[TABLE]

we obtain

[TABLE]

as $\kappa$ diverges to infinity, so that $f$ does not provide high concentration. $\square$
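Which case of Theorem 2.1 applies depends on the limit of $\kappa\varphi_{f}(\kappa)$. The sketch below approximates this quantity by a central finite difference of $\log f$, a numerical shortcut that is not part of the paper. It uses three illustrative choices of $f$, all picked by us:

- $f(z)=e^{z}$, case (i);
- $f(z)=\{\log(1+e^{z})\}^{2}$, case (ii), with limit $2$;
- $f(z)=\log(e+\log(1+e^{z}))$, case (iii), where $\kappa\varphi_{f}(\kappa)$ decays like $1/\log\kappa$.

These functions are hypothetical, and we do not check the remaining conditions that define the class $\mathcal{F}$.

```python
import math

def softplus(z):
    # log(1 + e^z), computed stably (log f is only evaluated at large z > 0 below)
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))

def log_exp(z):          # f(z) = exp(z)
    return z

def log_softplus_sq(z):  # f(z) = softplus(z)^2
    return 2.0 * math.log(softplus(z))

def log_loglog(z):       # f(z) = log(e + softplus(z))
    return math.log(math.log(math.e + softplus(z)))

def kappa_phi(log_f, kappa, rel_step=1e-4):
    """kappa * phi_f(kappa), with phi_f = (log f)' approximated by a central
    finite difference of step rel_step * kappa."""
    h = rel_step * kappa
    return kappa * (log_f(kappa + h) - log_f(kappa - h)) / (2.0 * h)

for log_f in (log_exp, log_softplus_sq, log_loglog):
    print(log_f.__name__, [round(kappa_phi(log_f, k), 4) for k in (1e2, 1e4, 1e6)])
```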

A.2 Proof of Proposition 2.1

The proof of Proposition 2.1 requires the following two preliminary results.

Lemma A.2.

For any $\xi,\zeta>-1$ ,

[TABLE]

as $c\to\infty$.

Proof of Lemma A.2. Letting $y=2-z/c$ (i.e., $z=c(2-y)$ ), we have

[TABLE]

For any $y\in(0,2)$ , we have that

[TABLE]

for any $c>0$ . The result then follows from the Lebesgue Dominated Convergence Theorem. $\square$

Lemma A.3.

(i) For $b\in(0,1)$, $(1-r)^{b}-1+br\leq 0$ for any $r\in[0,2]$ (recall that $z^{b}:={\rm sgn}(z)|z|^{b}$). (ii) For $b\geq 1$, there exists $c\in(0,1)$ such that $0\leq(1-r)^{b}-1+br\leq cbr$ for any $r\in[0,2]$. (iii) For $b>0$, there exists $C>0$ such that $|(1-r)^{b}-1+br|\leq Cr^{2}$ for any $r\in[0,2]$.

Proof of Lemma A.3. (i) Fix $b\in(0,1)$ and put $g(r)=(1-r)^{b}-1+br$ . For $r\in(0,1)$ , $g^{\prime}(r)=-b(1-r)^{b-1}+b\leq 0$ and for $r\in(1,2)$ , $g^{\prime}(r)=(-(r-1)^{b}-1+br)^{\prime}=-b(r-1)^{b-1}+b\leq 0.$ Since $g$ is continuous over $[0,2]$ , this implies that $g$ is monotone non-increasing over $[0,2]$ . The result thus follows from the fact that $g(0)=0$ .
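Part (i) can be checked numerically on a grid. The sketch below implements the signed power $z^{b}={\rm sgn}(z)|z|^{b}$ used in the lemma and evaluates $g(r)=(1-r)^{b}-1+br$. The grid size and the values of $b$ are arbitrary choices of ours, and a grid check does not replace the proof.

```python
import math

def spow(z, b):
    """Signed power sgn(z)|z|^b, the convention recalled in Lemma A.3."""
    return math.copysign(abs(z) ** b, z)

def g(r, b):
    """g(r) = (1 - r)^b - 1 + b r, with the signed power."""
    return spow(1.0 - r, b) - 1.0 + b * r

grid = [2.0 * i / 10_000 for i in range(10_001)]  # r in [0, 2]
for b in (0.1, 0.5, 0.9):
    values = [g(r, b) for r in grid]
    non_increasing = all(v2 <= v1 + 1e-12 for v1, v2 in zip(values, values[1:]))
    print(b, max(values), non_increasing)
```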

(ii) Fix $b\geq 1$ . Then $g^{\prime}(r)\geq 0$ for any $r\in(0,1)\cup(1,2)$ . The continuity of $g$ over $[0,2]$ and the fact that $g(0)=0$ thus imply that $g(r)\geq 0$ for any $r\in(0,2)$ . It remains to show that there exists $c\in(0,1)$ such that $(1-r)^{b}+br-1\leq cbr$ for any $r\in[0,2]$ , or equivalently, that there exists a positive integer $k$ for which

[TABLE]

for any $r\in[0,2]$. Clearly, $h_{k}(r)\to h(r):=(1-r)^{b}-1$ as $k\to\infty$, and the convergence is uniform in $r\in[0,2]$. Since $h_{2}^{\prime}$ is right-continuous at $0$ and satisfies

[TABLE]

there exists $\eta>0$ such that $h^{\prime}_{2}(r)<0$ for all $r\in(0,\eta]$, which (since $h_{2}(0)=0$) yields $h_{2}(r)<0$ for all $r\in(0,\eta]$. Since, for any $r\in(0,2]$, $h_{k}(r)$ is monotone decreasing in $k$, we deduce that $h_{k}(r)<0$ for all $r\in(0,\eta]$ and all $k\geq 2$. Now, put $\varepsilon:=-h_{2}(\eta)>0$. The uniform convergence of $(h_{k})$ to $h$ ensures that there exists $k_{0}$ such that $|h_{k_{0}}(r)-h(r)|<\varepsilon/2$ for any $r\in[0,2]$. This and the fact that $h$ is monotone decreasing in $[0,2]$ imply that, for any $r\in[\eta,2]$,

[TABLE]

We conclude that $h_{k_{0}}(0)=0$ and $h_{k_{0}}(r)<0$ for any $r\in(0,2]$ , so that (A.2) holds for $k=k_{0}$ .
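For Part (ii), the smallest admissible constant is $c^{*}=\sup_{r\in(0,2]}g(r)/(br)$, and the claim is that $c^{*}<1$. The sketch below approximates $c^{*}$ on a grid; the grid and the values of $b$ are our choices. For $b=2$ it should return $c^{*}=2-\sqrt{2}$, which can be confirmed by hand: $g(r)=r^{2}$ on $[0,1]$ and $g(r)=-r^{2}+4r-2$ on $[1,2]$.

```python
import math

def spow(z, b):
    return math.copysign(abs(z) ** b, z)  # signed power sgn(z)|z|^b

def g(r, b):
    return spow(1.0 - r, b) - 1.0 + b * r

def part_ii_constants(b, n=20_000):
    """Return (min of g, sup of g(r)/(b r)) over the grid r = 2i/n, i = 1..n."""
    rs = [2.0 * i / n for i in range(1, n + 1)]
    return min(g(r, b) for r in rs), max(g(r, b) / (b * r) for r in rs)

for b in (1.0, 1.5, 2.0, 5.0):
    print(b, part_ii_constants(b))
```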

(iii) Taylor's theorem with the Lagrange form of the remainder yields that, for any $r\in[0,\frac{1}{2}]$, $(1-r)^{b}-1+br=\frac{1}{2}b(b-1)(1-\eta_{b,r}r)^{b-2}r^{2}$ for some $\eta_{b,r}\in(0,1)$. This implies that there exists $c_{1}>0$ such that

[TABLE]

for any $r\in[0,\frac{1}{2}]$ . Now, the mapping $r\mapsto(1-r)^{b}-1+br$ is continuous over $r\in[\frac{1}{2},2]$ , so that, for any $r\in[\frac{1}{2},2]$ , we have

[TABLE]

The claim therefore holds with $C:=\max(c_{1},4c_{2})$ . $\square$
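Part (iii) states that $|g(r)|/r^{2}$ stays bounded on $(0,2]$, and the Taylor expansion above gives the limit $b(b-1)/2$ as $r\to 0$. The grid sketch below computes the smallest constant that works on a grid and checks this limiting ratio. The helper names and the values of $b$ are ours.

```python
import math

def spow(z, b):
    return math.copysign(abs(z) ** b, z)  # signed power sgn(z)|z|^b

def g(r, b):
    return spow(1.0 - r, b) - 1.0 + b * r

def grid_constant(b, n=20_000):
    """Smallest C with |g(r)| <= C r^2 on the grid r = 2i/n, i = 1..n."""
    return max(abs(g(2.0 * i / n, b)) / (2.0 * i / n) ** 2 for i in range(1, n + 1))

for b in (0.3, 0.5, 1.7, 3.0):
    r = 1e-4
    print(b, grid_constant(b), g(r, b) / r ** 2, b * (b - 1) / 2)
```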

Proof of Proposition 2.1. We only need to prove that Condition (2.5) holds for any $b>0$ (the other conditions are indeed trivially fulfilled). To do so, fix $b>0$ and note that, for $f_{b}(z)=\exp(z^{b})$ , (2.5) rewrites

[TABLE]

(in this proof, all convergences are as $\kappa\to\infty$ ), that is, letting $s=1-r$ ,

[TABLE]

If $b\in(0,1)$ , then Parts (i) and (iii) of Lemma A.3 and the mean value theorem yield (below, $\eta_{b,r}\in(0,1)$ )

[TABLE]

Now, if $b\geq 1$ , then Lemma A.3(ii)–(iii) and the mean value theorem yield

[TABLE]

We therefore showed that, for any $b>0$ , there exists $K>0$ such that

[TABLE]

By letting $z=K\kappa^{b}r$ , this yields

[TABLE]

where we used Lemma A.2. This proves (A.3), hence establishes the result. $\square$

A.3 Proof of Theorem 2.2

The proof crucially relies on the following lemma.

Lemma A.4.

Fix an integer $p\geq 2$ and $f\in\mathcal{F}$ . Let $(\kappa_{n})$ be a positive real sequence that diverges to $\infty$ . Then,

[TABLE]

and

[TABLE]

as $n\to\infty$ .

Proof of Lemma A.4. (i) Write

[TABLE]

with

[TABLE]

and

[TABLE]

Letting $z=(1-s)\kappa_{n}\varphi_{f}(\kappa_{n})$ , Lemma A.2 readily yields

[TABLE]

Since (2.5) ensures that

[TABLE]

the result follows.

(ii) Using the U-statistic formulation of the variance, we have

[TABLE]

where

[TABLE]

and

[TABLE]

We start with $S_{n1}$ . Letting $z=(1-s)\kappa_{n}\varphi_{f}(\kappa_{n})$ and $\tilde{z}=(1-\tilde{s})\kappa_{n}\varphi_{f}(\kappa_{n})$ , we obtain

[TABLE]

so that Lemma A.2 provides

[TABLE]

We turn to $S_{n2}$ . Upper-bounding $(s-\tilde{s})^{2}=((1-s)-(1-\tilde{s}))^{2}$ by $2(1-s)^{2}+2(1-\tilde{s})^{2}$ , we obtain

[TABLE]

Letting $z=(1-s)\kappa_{n}\varphi_{f}(\kappa_{n})$ in two of the four integrals above, (2.5) yields

[TABLE]

We treat $S_{n3}$ by upper-bounding again $(s-\tilde{s})^{2}$ by $2(1-s)^{2}+2(1-\tilde{s})^{2}$ , which yields

[TABLE]

This completes the proof. $\square$
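The U-statistic formulation used in Part (ii) rests on the identity ${\rm Var}[X]=\frac{1}{2}{\rm E}[(X_{1}-X_{2})^{2}]$ for i.i.d. copies $X_{1},X_{2}$ of $X$. The sample version holds exactly: averaging the kernel $(x_{i}-x_{j})^{2}/2$ over all pairs $i<j$ returns the unbiased sample variance. The sketch below checks this on simulated values. The Beta sample mimics a variable concentrated near one, such as $\mathbf{X}^{\prime}{\boldsymbol{\theta}}$, and is purely illustrative.

```python
import random
import statistics
from itertools import combinations

def pairwise_variance(xs):
    """U-statistic with kernel (x - y)^2 / 2, averaged over all pairs i < j."""
    pairs = list(combinations(xs, 2))
    return sum((x - y) ** 2 / 2.0 for x, y in pairs) / len(pairs)

rng = random.Random(0)
xs = [rng.betavariate(20.0, 1.0) for _ in range(200)]  # concentrated near 1
print(pairwise_variance(xs), statistics.variance(xs))
```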

We can now prove Theorem 2.2.

Proof of Theorem 2.2. First note that Lemma A.4 readily yields

[TABLE]

and

[TABLE]

The result then follows by writing

[TABLE]

and

[TABLE]

and by using Lemma A.4 along with (A.4)–(A.5). $\square$

A.4 Proofs of Theorems 3.1, 3.2 and 3.3

Several proofs of this section rely on the following uniform second-order delta method (the proof is a trivial extension of the proof of Theorem 3.8 in van der Vaart, 1998).

Lemma A.5.

Let $\phi:\mathbb{R}^{p}\to\mathbb{R}$ be twice continuously differentiable in a neighborhood of $\mathbf{v}$ . Let $(\mathbf{v}_{n})$ be a sequence in $\mathbb{R}^{p}$ converging to $\mathbf{v}$ . Let $(\mathbf{T}_{n})$ be a sequence of random vectors taking their values in the domain of $\phi$ and such that $r_{n}(\mathbf{T}_{n}-\mathbf{v}_{n})$ is $O_{\rm P}(1)$ for a sequence $(r_{n})$ that diverges to infinity. Then,

[TABLE]

where $\nabla\phi(\mathbf{v})$ and $\mathbf{H}\phi(\mathbf{v})$ denote the gradient and Hessian matrix of $\phi$ at $\mathbf{v}$ , respectively.

Assuming that $\sqrt{n}\tilde{e}_{n2}^{-1/4}(\bar{\mathbf{X}}_{n}-e_{n1}{\boldsymbol{\theta}})$ is $O_{\rm P}(1)$ (this will be proved later in this section), this lemma entails that

[TABLE]

where $g_{j}:\mathbb{R}^{p}\setminus\{{\bf 0}\}\to\mathbb{R}$ is the mapping defined through $g_{j}(\mathbf{x})=x_{j}/\|\mathbf{x}\|$ . Note that this in particular yields

[TABLE]
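The components of the mapping $\mathbf{x}\mapsto\mathbf{x}/\|\mathbf{x}\|$ are the $g_{j}$'s. Its gradient at $e{\boldsymbol{\theta}}$, with $e>0$ and ${\boldsymbol{\theta}}$ a unit vector, is $({\bf I}_{p}-{\boldsymbol{\theta}}{\boldsymbol{\theta}}^{\prime})/e$. The first-order term of the expansion therefore projects $\bar{\mathbf{X}}_{n}-e_{n1}{\boldsymbol{\theta}}$ onto the tangent space at ${\boldsymbol{\theta}}$. The sketch below is our own check, with arbitrary $p$, $e$ and perturbation direction. It confirms numerically that the remainder of this linearization is of second order.

```python
import math
import random

def normalize(x):
    n = math.sqrt(sum(t * t for t in x))
    return [t / n for t in x]

def linearization(theta, e, x):
    """theta + (I - theta theta')(x - e theta) / e, the first-order term."""
    d = [xi - e * ti for xi, ti in zip(x, theta)]
    along = sum(ti * di for ti, di in zip(theta, d))
    return [ti + (di - along * ti) / e for ti, di in zip(theta, d)]

rng = random.Random(1)
p, e = 4, 0.9
theta = normalize([rng.gauss(0.0, 1.0) for _ in range(p)])
direction = [rng.gauss(0.0, 1.0) for _ in range(p)]

def remainder(t):
    """Distance between x/|x| and its linearization at x = e theta + t direction."""
    x = [e * ti + t * di for ti, di in zip(theta, direction)]
    return math.dist(normalize(x), linearization(theta, e, x))

for t in (1e-1, 1e-2, 1e-3):
    print(t, remainder(t) / t ** 2)  # roughly constant: the remainder is O(t^2)
```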

We can now prove Theorem 3.1.

Proof of Theorem 3.1. In this proof, all expectations and variances are under ${\rm P}^{(n)}_{{\boldsymbol{\theta}},\kappa_{n},f}$ and all stochastic convergences are as $n\to\infty$ under ${\rm P}^{(n)}_{{\boldsymbol{\theta}},\kappa_{n},f}$ . Using the tangent-normal decomposition of $\mathbf{X}_{ni}$ with respect to ${\boldsymbol{\theta}}$ , write

[TABLE]

Since Theorem 2.2(ii) implies that

[TABLE]

we obtain that

[TABLE]

For any unit $p$ -vector $\mathbf{u}$ , write

[TABLE]

For any $n$ , the $Z_{ni}$ ’s are centered i.i.d. random variables such that

[TABLE]

where we used the first result in (2.6). To establish the asymptotic normality of $\mathbf{u}^{\prime}\mathbf{W}_{n}$, we check the Lindeberg condition, which reads

[TABLE]

Applying the Cauchy-Schwarz and Chebyshev inequalities yields

[TABLE]

Since the second convergence in (2.6) provides

[TABLE]

the Lindeberg condition in (A.12) is satisfied, so that $s_{n}^{-1}\mathbf{u}^{\prime}\mathbf{W}_{n}$ is asymptotically standard normal for any unit $p$ -vector $\mathbf{u}$ . Consequently,

[TABLE]

for any unit $p$ -vector $\mathbf{u}$ , which entails that

[TABLE]

It follows that

[TABLE]

Therefore, (A.11) holds and readily yields

[TABLE]

which, by using Theorem 2.2(ii), provides the weak limiting result in (3.1). The one in (3.2) then follows by noting that $1-({\boldsymbol{\theta}}^{\prime}\hat{{\boldsymbol{\theta}}}_{n})^{2}=(\hat{{\boldsymbol{\theta}}}_{n}-{\boldsymbol{\theta}})^{\prime}({\bf I}_{p}-{\boldsymbol{\theta}}{\boldsymbol{\theta}}^{\prime})^{-}(\hat{{\boldsymbol{\theta}}}_{n}-{\boldsymbol{\theta}})$, where $\mathbf{A}^{-}$ stands for the Moore-Penrose inverse of $\mathbf{A}$. $\square$
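The identity used for (3.2) relies on two facts. First, ${\bf I}_{p}-{\boldsymbol{\theta}}{\boldsymbol{\theta}}^{\prime}$ is an orthogonal projection and hence its own Moore-Penrose inverse. Second, $\|\hat{\boldsymbol{\theta}}_{n}\|=\|{\boldsymbol{\theta}}\|=1$. The sketch below runs a quick numerical check on random unit vectors; the dimensions and seeds are arbitrary.

```python
import math
import random

def unit(v):
    n = math.sqrt(sum(t * t for t in v))
    return [t / n for t in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def both_sides(p, rng):
    """Return (1 - (theta' theta_hat)^2, d'(I - theta theta')d) with d = theta_hat - theta."""
    theta = unit([rng.gauss(0.0, 1.0) for _ in range(p)])
    theta_hat = unit([rng.gauss(0.0, 1.0) for _ in range(p)])
    d = [a - b for a, b in zip(theta_hat, theta)]
    # d'(I - theta theta')d = |d|^2 - (theta'd)^2; the projection is its own MP inverse
    return 1.0 - dot(theta, theta_hat) ** 2, dot(d, d) - dot(theta, d) ** 2

rng = random.Random(2)
print(max(abs(lhs - rhs) for lhs, rhs in (both_sides(5, rng) for _ in range(1000))))
```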

Proof of Theorem 3.2. Direct computations show that the function $g_{j}:\mathbb{R}^{p}\setminus\{{\bf 0}\}\to\mathbb{R}$ defined through $g_{j}(\mathbf{x})=x_{j}/\|\mathbf{x}\|$ has the Hessian matrix

[TABLE]

where ${\bf e}_{j}$ stands for the $j$ th vector of the canonical basis of $\mathbb{R}^{p}$ . Therefore, premultiplying both sides of (A.10) by ${\boldsymbol{\theta}}^{\prime}$ yields

[TABLE]

where (the $\theta_{j}$ ’s are the components of ${\boldsymbol{\theta}}$ )

[TABLE]

Therefore, using (A.13), we obtain that

[TABLE]

The result then follows from Theorem 2.2(ii). $\square$
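The Hessian of $g_{j}$ follows from differentiating $g_{j}(\mathbf{x})=x_{j}/\|\mathbf{x}\|$ twice. Our own derivation, which should agree with the display above up to notation, gives $\mathbf{H}g_{j}(\mathbf{x})=3x_{j}\mathbf{x}\mathbf{x}^{\prime}/\|\mathbf{x}\|^{5}-(x_{j}{\bf I}_{p}+{\bf e}_{j}\mathbf{x}^{\prime}+\mathbf{x}{\bf e}_{j}^{\prime})/\|\mathbf{x}\|^{3}$. The sketch below checks this closed form against central finite differences at an arbitrary point.

```python
import math

def g_j(x, j):
    """g_j(x) = x_j / |x|."""
    return x[j] / math.sqrt(sum(t * t for t in x))

def hessian_g_j(x, j):
    """Closed form 3 x_j x x'/|x|^5 - (x_j I + e_j x' + x e_j')/|x|^3 (our derivation)."""
    p = len(x)
    n = math.sqrt(sum(t * t for t in x))
    return [[3.0 * x[j] * x[l] * x[k] / n ** 5
             - ((l == k) * x[j] + (l == j) * x[k] + (k == j) * x[l]) / n ** 3
             for k in range(p)] for l in range(p)]

def hessian_fd(x, j, h=1e-4):
    """Central finite-difference Hessian of g_j at x."""
    p = len(x)

    def shifted(l, k, sl, sk):
        y = list(x)
        y[l] += sl * h
        y[k] += sk * h
        return g_j(y, j)

    return [[(shifted(l, k, 1, 1) - shifted(l, k, 1, -1)
              - shifted(l, k, -1, 1) + shifted(l, k, -1, -1)) / (4.0 * h * h)
             for k in range(p)] for l in range(p)]

x = [0.3, -1.2, 0.8]
for j in range(len(x)):
    H, F = hessian_g_j(x, j), hessian_fd(x, j)
    print(j, max(abs(a - b) for ra, rb in zip(H, F) for a, b in zip(ra, rb)))
```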

The proof of Theorem 3.3 requires the following preliminary result.

Lemma A.6.

Fix an integer $p\geq 2$ and $f\in\mathcal{F}$ . Let $({\boldsymbol{\theta}}_{n})$ be a sequence in $\mathcal{S}^{p-1}$ and $(\kappa_{n})$ be a positive real sequence that diverges to infinity. Then, under ${\rm P}^{(n)}_{{\boldsymbol{\theta}}_{n},\kappa_{n},f}$ ,

[TABLE]

as $n\to\infty$ , where $u_{ni}=\mathbf{X}_{ni}^{\prime}{\boldsymbol{\theta}}_{n}$ refers to the tangent-normal decomposition of $\mathbf{X}_{ni}$ with respect to ${\boldsymbol{\theta}}_{n}$ .

Proof of Lemma A.6. Using the tangent-normal decomposition of $\mathbf{X}_{ni}$ with respect to ${\boldsymbol{\theta}}_{n}$ , write

[TABLE]

where

[TABLE]

and

[TABLE]

Applying the Cauchy-Schwarz inequality and using (2.6) yields

[TABLE]

which implies that $\mathbf{T}_{n1}$ converges to zero in probability. Using (2.6) along with the fact that $\tilde{e}_{n2}=o(1)$ (Theorem 2.2), we obtain that $\mathbf{T}_{n2}=o(1)$. Now, denoting by ${\rm vec}$ the operator that stacks the columns of a matrix on top of each other and using the identity ${\rm E}[\mathbf{S}_{n1}\mathbf{S}_{n1}^{\prime}]=(\mathbf{I}_{p}-{\boldsymbol{\theta}}_{n}{\boldsymbol{\theta}}_{n}^{\prime})/(p-1)$, we obtain

[TABLE]

Using (2.6) again, along with the fact that $\tilde{e}_{n2}=o(1)$, shows that $\mathbf{T}_{n3}$ converges to zero in probability, which establishes the result. $\square$
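The identity ${\rm E}[\mathbf{S}_{n1}\mathbf{S}_{n1}^{\prime}]=(\mathbf{I}_{p}-{\boldsymbol{\theta}}_{n}{\boldsymbol{\theta}}_{n}^{\prime})/(p-1)$ holds because, under rotational symmetry, $\mathbf{S}_{n1}$ is uniformly distributed on the unit sphere of the $(p-1)$-dimensional subspace orthogonal to ${\boldsymbol{\theta}}_{n}$. The Monte Carlo sketch below generates such a vector by projecting a standard Gaussian vector onto ${\boldsymbol{\theta}}_{n}^{\perp}$ and normalizing it. The sample size, $p$ and ${\boldsymbol{\theta}}_{n}$ are arbitrary choices of ours.

```python
import math
import random

def unit(v):
    n = math.sqrt(sum(t * t for t in v))
    return [t / n for t in v]

def uniform_tangent(theta, rng):
    """Uniform draw on the unit sphere of theta^perp: project a standard
    Gaussian vector onto theta^perp, then normalize."""
    z = [rng.gauss(0.0, 1.0) for _ in theta]
    along = sum(a * b for a, b in zip(z, theta))
    return unit([a - along * b for a, b in zip(z, theta)])

def mc_second_moment(theta, n_draws, seed=0):
    """Monte Carlo estimate of E[S S']."""
    rng = random.Random(seed)
    p = len(theta)
    M = [[0.0] * p for _ in range(p)]
    for _ in range(n_draws):
        s = uniform_tangent(theta, rng)
        for a in range(p):
            for b in range(p):
                M[a][b] += s[a] * s[b] / n_draws
    return M

theta = unit([1.0, 2.0, -1.0, 0.5])
p = len(theta)
M = mc_second_moment(theta, 40_000)
target = [[((a == b) - theta[a] * theta[b]) / (p - 1) for b in range(p)] for a in range(p)]
print(max(abs(M[a][b] - target[a][b]) for a in range(p) for b in range(p)))
```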

Proof of Theorem 3.3. By using Theorem 2.2(i), it follows from Theorem 3.1 that

[TABLE]

and from Theorem 3.2 that

[TABLE]

as $n\to\infty$ under ${\rm P}^{(n)}_{{\boldsymbol{\theta}},\kappa_{n},f}$ (in this proof, all stochastic convergences are under this sequence of hypotheses). Therefore, it is sufficient to show that

[TABLE]

Since Theorem 2.2 implies that

[TABLE]

we have ${\rm E}[(Y_{n1}-1)^{2}]=\big{(}{\rm E}[Y_{n1}]-1\big{)}^{2}+{\rm Var}[Y_{n1}]={\rm Var}[Y_{n1}]=o(1),$ so that $Y_{n1}=1+o_{\rm P}(1)$ . Since the same theorem also implies that $\tilde{e}_{n2}^{1/2}/(1-e_{n2})=O(1)$ , it is sufficient to prove that $Y_{n2}=o_{\rm P}(1)$ .

To do so, write

[TABLE]

Using Lemma A.6 (with ${\boldsymbol{\theta}}_{n}\equiv{\boldsymbol{\theta}}$ ) and (A.11), we then obtain

[TABLE]

where we used (A.13). Since $n^{-1}\sum_{i=1}^{n}u_{ni}^{2}\leq 1$ almost surely, we conclude that $Y_{n2}$ is $o_{\rm P}(1)$ , which establishes the result. $\square$

A.5 Proofs of Lemma 4.1 and Theorem 4.1

Proof of Lemma 4.1. We start with $\bar{\mathbf{X}}_{n}^{\prime}{\boldsymbol{\theta}}_{0}$ . Since $\bar{\mathbf{X}}_{n}^{\prime}{\boldsymbol{\theta}}_{0}=\bar{\mathbf{X}}_{n}^{\prime}{\boldsymbol{\theta}}_{n}-\nu_{n}\bar{\mathbf{X}}_{n}^{\prime}{\boldsymbol{\tau}}_{n}$ , we have

[TABLE]

where we used the facts that $\|\bar{\mathbf{X}}_{n}\|\leq 1$ almost surely and that $e_{n1}=1+o(1)$ . Since the tangent-normal decomposition with respect to ${\boldsymbol{\theta}}_{n}$ further entails that

[TABLE]

we conclude that $\bar{\mathbf{X}}_{n}^{\prime}{\boldsymbol{\theta}}_{0}$ converges to one in quadratic mean, hence also in probability.

We turn to $R_{n}$ , which we decompose as

[TABLE]

with

[TABLE]

and

[TABLE]

Since (2.6) entails that

[TABLE]

and

[TABLE]

we have that $R_{n1}$ converges to one in quadratic mean, hence also in probability. As for $R_{n2}$ , Lemma A.6 and Theorem 2.2(ii) yield

[TABLE]

where we let $U_{n}:=(1/n)\sum_{i=1}^{n}u_{ni}^{2}$ . Since ${\boldsymbol{\theta}}_{n}={\boldsymbol{\theta}}+\nu_{n}{\boldsymbol{\tau}}_{n}$ is a unit $p$ -vector, we have ${\boldsymbol{\theta}}^{\prime}{\boldsymbol{\tau}}_{n}=-\nu_{n}\|{\boldsymbol{\tau}}_{n}\|^{2}/2$ , which yields ${\boldsymbol{\theta}}_{n}^{\prime}{\boldsymbol{\tau}}_{n}=({\boldsymbol{\theta}}+\nu_{n}{\boldsymbol{\tau}}_{n})^{\prime}{\boldsymbol{\tau}}_{n}=\nu_{n}\|{\boldsymbol{\tau}}_{n}\|^{2}/2$ . Thus, using Theorem 2.2(ii) and the fact that $U_{n}\leq 1$ almost surely, we obtain

[TABLE]

Finally, since $(\mathbf{X}_{ni}^{\prime}{\boldsymbol{\tau}}_{n})^{2}\leq\|{\boldsymbol{\tau}}_{n}\|^{2}$ almost surely, Theorem 2.2(ii) also entails that $R_{n3}=o_{\rm P}(1).$ Therefore, $R_{n}=1+o_{\rm P}(1)$ , as was to be proved. $\square$
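The identities ${\boldsymbol{\theta}}^{\prime}{\boldsymbol{\tau}}_{n}=-\nu_{n}\|{\boldsymbol{\tau}}_{n}\|^{2}/2$ and ${\boldsymbol{\theta}}_{n}^{\prime}{\boldsymbol{\tau}}_{n}=\nu_{n}\|{\boldsymbol{\tau}}_{n}\|^{2}/2$ only use the fact that both ${\boldsymbol{\theta}}$ and ${\boldsymbol{\theta}}_{n}={\boldsymbol{\theta}}+\nu_{n}{\boldsymbol{\tau}}_{n}$ have unit norm; they follow by expanding $\|{\boldsymbol{\theta}}+\nu_{n}{\boldsymbol{\tau}}_{n}\|^{2}=1$. The sketch below checks them numerically by building ${\boldsymbol{\tau}}_{n}=({\boldsymbol{\theta}}_{n}-{\boldsymbol{\theta}})/\nu_{n}$ from two random unit vectors. The value of $\nu_{n}$ is arbitrary.

```python
import math
import random

def unit(v):
    n = math.sqrt(sum(t * t for t in v))
    return [t / n for t in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def identity_gaps(p, nu, rng):
    """Return (theta'tau + nu|tau|^2/2, theta_n'tau - nu|tau|^2/2), both ~ 0."""
    theta = unit([rng.gauss(0.0, 1.0) for _ in range(p)])
    theta_n = unit([rng.gauss(0.0, 1.0) for _ in range(p)])
    tau = [(a - b) / nu for a, b in zip(theta_n, theta)]  # theta_n = theta + nu tau
    half = nu * dot(tau, tau) / 2.0
    return dot(theta, tau) + half, dot(theta_n, tau) - half

rng = random.Random(4)
print([identity_gaps(3, 0.1, rng) for _ in range(3)])
```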

Proof of Theorem 4.1. Since Part (i) of the result is actually a particular case of Part (ii), we only prove the latter. Accordingly, all stochastic convergences in this proof will be as $n\to\infty$ under ${\rm P}^{(n)}_{{\boldsymbol{\theta}}_{n},\kappa_{n},f}$ , with ${\boldsymbol{\theta}}_{n}={\boldsymbol{\theta}}_{0}+\nu_{n}{\boldsymbol{\tau}}_{n}$ , $\nu_{n}:=1/\sqrt{n\kappa_{n}\varphi_{f}(\kappa_{n})}$ and ${\boldsymbol{\tau}}_{n}\to{\boldsymbol{\tau}}$ . Consider then

[TABLE]

where we used Theorem 2.2(ii) and the fact that $e_{n1}=1+o(1)$ . Now, proceeding exactly as in the proof of Theorem 3.1, it can be shown that

[TABLE]

It follows that $\mathbf{T}^{W}_{n}\stackrel{{\scriptstyle\mathcal{D}}}{{\to}}\mathcal{N}\big{(}{\boldsymbol{\tau}},{\bf I}_{p}-{\boldsymbol{\theta}}_{0}{\boldsymbol{\theta}}_{0}^{\prime}\big{)},$ so that Lemma 4.1 entails that

[TABLE]

Turning to the Wald test, consider

[TABLE]

Since, under the sequence of hypotheses considered, Lemma A.5 implies that

[TABLE]

we have that $\mathbf{T}^{S}_{n}=({\bf I}-{\boldsymbol{\theta}}_{0}{\boldsymbol{\theta}}_{0}^{\prime})\mathbf{T}^{W}_{n}+o_{\rm P}(1).$ Using Lemma 4.1 again, this yields that $S_{n}=\tilde{S}_{n}+o_{\rm P}(1)=(\mathbf{T}^{S}_{n})^{\prime}({\bf I}_{p}-{\boldsymbol{\theta}}_{0}{\boldsymbol{\theta}}_{0}^{\prime})\mathbf{T}^{S}_{n}+o_{\rm P}(1)=(\mathbf{T}^{W}_{n})^{\prime}({\bf I}_{p}-{\boldsymbol{\theta}}_{0}{\boldsymbol{\theta}}_{0}^{\prime})\mathbf{T}^{W}_{n}+o_{\rm P}(1)=W_{n}+o_{\rm P}(1)$ , which establishes the result. $\square$
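The last chain of equalities uses the fact that ${\bf P}:={\bf I}_{p}-{\boldsymbol{\theta}}_{0}{\boldsymbol{\theta}}_{0}^{\prime}$ is symmetric and idempotent, so that $({\bf P}\mathbf{t})^{\prime}{\bf P}({\bf P}\mathbf{t})=\mathbf{t}^{\prime}{\bf P}\mathbf{t}$ for any $\mathbf{t}$. The sketch below checks this identity. It also considers $\mathbf{T}\sim\mathcal{N}({\boldsymbol{\tau}},{\bf P})$ with ${\boldsymbol{\theta}}_{0}^{\prime}{\boldsymbol{\tau}}=0$ and compares the Monte Carlo mean of $\mathbf{T}^{\prime}{\bf P}\mathbf{T}$ with $p-1+\|{\boldsymbol{\tau}}\|^{2}$. That value is the mean of a non-central chi-square with $p-1$ degrees of freedom and non-centrality $\|{\boldsymbol{\tau}}\|^{2}$. The values of $p$, ${\boldsymbol{\theta}}_{0}$ and ${\boldsymbol{\tau}}$ are illustrative choices of ours.

```python
import random

def project(theta0, v):
    """P v, where P = I - theta0 theta0' and theta0 is a unit vector."""
    c = sum(a * b for a, b in zip(theta0, v))
    return [x - c * t for x, t in zip(v, theta0)]

def quad(theta0, v):
    """The quadratic form v' P v."""
    return sum(a * b for a, b in zip(v, project(theta0, v)))

def mc_mean_quad(theta0, tau, n=50_000, seed=5):
    """Monte Carlo mean of T' P T for T = tau + P z with z standard Gaussian,
    so that T ~ N(tau, P) when P tau = tau."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        pz = project(theta0, [rng.gauss(0.0, 1.0) for _ in theta0])
        total += quad(theta0, [a + b for a, b in zip(tau, pz)])
    return total / n

theta0 = [0.0, 0.0, 1.0]
tau = [1.0, -0.5, 0.0]  # orthogonal to theta0
t = [0.3, -1.1, 2.0]
print(quad(theta0, project(theta0, t)), quad(theta0, t))  # equal: P is idempotent
print(mc_mean_quad(theta0, tau), len(tau) - 1 + sum(x * x for x in tau))
```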

A.6 Proofs of Proposition 5.1 and Theorem 5.1

The proof of Proposition 5.1 requires the following result.

Lemma A.7.

Fix $b>0$ . Then there exists $C_{b}$ such that for any $x,y\in\mathbb{R}$ with $x,y>0$ , one has $|y^{b}-x^{b}-b(y-x)x^{b-1}|\leq C_{b}(y-x)^{2}(|x|^{b-2}+|y|^{b-2})$ .

Proof. Since $x,y>0$ , the mapping $z\mapsto z^{b}$ is continuous on the interval with end points $x$ and $y$ , and it is differentiable on the interior of this interval. The mean value theorem then yields that, for some $c$ between $x$ and $y$ ,

[TABLE]

which establishes the result. $\square$
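One admissible constant can be made explicit; the following derivation is ours and is not stated in the paper. Taylor's theorem with the Lagrange remainder gives $y^{b}-x^{b}-b(y-x)x^{b-1}=\frac{1}{2}b(b-1)c^{b-2}(y-x)^{2}$ for some $c$ between $x$ and $y$. Since $t\mapsto t^{b-2}$ is monotone on $(0,\infty)$, $c^{b-2}\leq x^{b-2}+y^{b-2}$, so $C_{b}=b|b-1|/2$ works. The sketch below checks this bound on random pairs. The ranges and the values of $b$ are arbitrary.

```python
import random

def lemma_a7_holds(x, y, b):
    """Check |y^b - x^b - b (y - x) x^(b-1)| <= C_b (y - x)^2 (x^(b-2) + y^(b-2))
    with the explicit constant C_b = b |b - 1| / 2 (our derivation)."""
    lhs = abs(y ** b - x ** b - b * (y - x) * x ** (b - 1))
    c_b = b * abs(b - 1) / 2
    rhs = c_b * (y - x) ** 2 * (x ** (b - 2) + y ** (b - 2))
    # slack proportional to the size of the terms, to absorb rounding error
    slack = 1e-12 * (1.0 + y ** b + x ** b + abs(b * (y - x) * x ** (b - 1)))
    return lhs <= rhs + slack

rng = random.Random(6)
pairs = [(rng.uniform(0.01, 10.0), rng.uniform(0.01, 10.0)) for _ in range(10_000)]
for b in (0.25, 0.5, 1.0, 1.5, 2.0, 3.5):
    print(b, all(lemma_a7_holds(x, y, b) for x, y in pairs))
```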

Proof of Proposition 5.1. With $f(z)=\exp(z^{b})$ ,

[TABLE]

where we let $r=1-s$ . Let $\delta=1$ if $b\in(0,1)$ and [math] otherwise. Then, by using Lemma A.3(i)–(ii) and the fact that there exists some constant $C$ such that $|(1-r)^{b-1}-1|\leq Cr|1-r|^{\delta(b-1)}$ for any $r\in[0,2]$ , we obtain that (A.15) is upper-bounded by

[TABLE]

We may therefore focus on (5.1). Fix a positive sequence $(\kappa_{n})$ diverging to infinity (which, for $b\in(\frac{1}{2},1)$, is assumed to satisfy the condition stated in the proposition) and a bounded positive sequence $(t_{n})$, and consider the quantities $h^{\pm}_{n}(s,w)$ appearing in (5.1). First note that

[TABLE]

so that, for $n$ large enough,

[TABLE]

where we let $M:=\sup_{n}t_{n}$ . Hence, for $s\notin\mathcal{I}_{n}:=(-4M\nu_{n},4M\nu_{n})$ and $n$ large enough,

[TABLE]

For $n$ large enough, we then have

[TABLE]

where, with $\mathcal{I}_{n}^{c}:=[-1,1]\setminus\mathcal{I}_{n}$ , we let

[TABLE]

and

[TABLE]

here, $C_{\varepsilon}:=1/2$ if $b\geq 1$ and $C_{\varepsilon}:=\varepsilon/4$ if $b\in(0,1)$ , where $\varepsilon>0$ is as in the statement of the proposition.

Let us first consider $T_{n1}$. It directly follows from (A.16) that, for $n$ large enough, $\kappa_{n}s+h^{\pm}_{n}(s,w)$ and $\kappa_{n}s$ share the same sign in the integrand of $T_{n1}$. Consequently, using Lemma A.7 and then (A.16) yields

[TABLE]

for $n$ large enough. Therefore, by using again Lemma A.3(i)–(ii), we obtain that, still for $n$ large enough,

[TABLE]

Letting $z=K\kappa_{n}^{b}r$ , we obtain

[TABLE]

which, by Lemma A.2, shows that $T_{n1}$ is $O(\kappa_{n}^{-b})$, hence $o(1)$.

Turning to $T_{n2}$ , we have

[TABLE]

which yields

[TABLE]

Consequently, if $b\geq 1$ , then $T_{n2}$ is $o(1)$ , as was to be shown. Focus then on the case $b\in(\frac{1}{2},1)$ . By assumption, for $n$ large enough,

[TABLE]

which yields

[TABLE]

so that $T_{n2}=o(1)$ . The result follows. $\square$

Proof of Theorem 5.1. Write

[TABLE]

with

[TABLE]

and

[TABLE]

Using the identity ${\boldsymbol{\tau}}_{n}^{\prime}{\boldsymbol{\theta}}=-\frac{1}{2}\nu_{n}\|{\boldsymbol{\tau}}_{n}\|^{2}$ and Lemma 4.1, we readily obtain

[TABLE]

so that we only need to show that both $L_{n2}$ and $L_{n3}$ are $o_{\rm P}(1)$ .

We start with $L_{n2}$ . Using the tangent-normal decomposition of $\mathbf{X}_{ni}$ with respect to ${\boldsymbol{\theta}}$ , write $L_{n2}=L_{n2a}+{\boldsymbol{\tau}}_{n}^{\prime}{\bf L}_{n2b}$ , where we let

[TABLE]

and

[TABLE]

We have

[TABLE]

and

[TABLE]

Now, by using Lemma A.4(i) and the fact that $f\in\mathcal{F}_{\rm LAN}(p,\kappa_{n},\|{\boldsymbol{\tau}}_{n}\|)$ , we obtain

[TABLE]

Therefore, ${\rm E}[L_{n2a}^{2}]$ and ${\rm E}[\|{\bf L}_{n2b}\|^{2}]$ are $o(1)$ , which implies that $L_{n2a}$ and ${\bf L}_{n2b}$ , hence also $L_{n2}$ , are $o_{\rm P}(1)$ .

Let us turn to $L_{n3}$ . Since ${\boldsymbol{\tau}}_{n}^{\prime}\mathbf{X}_{n1}=u_{n1}{\boldsymbol{\tau}}_{n}^{\prime}{\boldsymbol{\theta}}+v_{n1}{\boldsymbol{\tau}}_{n}^{\prime}\mathbf{S}_{n1}=-\frac{1}{2}\nu_{n}u_{n1}\|{\boldsymbol{\tau}}_{n}\|^{2}+v_{n1}{\boldsymbol{\tau}}_{n}^{\prime}\mathbf{S}_{n1}$ and $\|(\mathbf{I}_{p}-{\boldsymbol{\theta}}{\boldsymbol{\theta}}^{\prime}){\boldsymbol{\tau}}_{n}\|^{2}=\|{\boldsymbol{\tau}}_{n}\|^{2}-({\boldsymbol{\theta}}^{\prime}{\boldsymbol{\tau}}_{n})^{2}=\|{\boldsymbol{\tau}}_{n}\|^{2}-\frac{1}{4}\nu_{n}^{2}\|{\boldsymbol{\tau}}_{n}\|^{4}=c_{n}^{2}\|{\boldsymbol{\tau}}_{n}\|^{2}$ , rotation invariance yields that ${\rm E}[|L_{n3}|]$ is upper-bounded by

[TABLE]

where $U_{n1}={\boldsymbol{\theta}}_{\perp}^{\prime}\mathbf{S}_{n1}$, with ${\boldsymbol{\theta}}_{\perp}$ an arbitrary unit vector orthogonal to ${\boldsymbol{\theta}}$. Clearly, $U_{n1}$ is equal in distribution to any marginal of a random vector that is uniformly distributed over $\mathcal{S}^{p-2}$. Therefore, $-U_{n1}\stackrel{{\scriptstyle\mathcal{D}}}{{=}}U_{n1}$ and $W:=U_{n1}^{2}$ has the cumulative distribution function $G_{p}$ on page 5, so that conditioning on the sign of $U_{n1}$ yields that ${\rm E}[|L_{n3}|]$ is $o(1)$ if and only if

[TABLE]

In view of Lemma A.4(i), this is the case if and only if

[TABLE]

Since $f$ belongs to $\mathcal{F}_{\rm LAN}(p,\kappa_{n},\|{\boldsymbol{\tau}}_{n}\|)$ , the result then follows. $\square$
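The distributional facts used for $U_{n1}$ are those of a single coordinate of a vector uniformly distributed on $\mathcal{S}^{p-2}\subset\mathbb{R}^{p-1}$. Such a coordinate is symmetric about zero, and its square follows a ${\rm Beta}(\frac{1}{2},\frac{p-2}{2})$ distribution; this is a standard fact. We do not reproduce the paper's definition of $G_{p}$ and only assume that it is this distribution function. Two special cases have closed-form CDFs:

- ${\rm P}[U^{2}\leq w]=\frac{2}{\pi}\arcsin\sqrt{w}$ for $p=3$;
- ${\rm P}[U^{2}\leq w]=\sqrt{w}$ for $p=4$.

The Monte Carlo sketch below uses both as checks, together with ${\rm E}[U^{2}]=1/(p-1)$. The sample size and seed are arbitrary.

```python
import math
import random

def first_coordinate_uniform_sphere(dim, rng):
    """First coordinate of a vector uniform on the unit sphere of R^dim
    (a normalized standard Gaussian vector)."""
    z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    return z[0] / math.sqrt(sum(t * t for t in z))

def simulate(p, n=50_000, seed=7):
    """Draws of U, a marginal of the uniform distribution on S^{p-2}."""
    rng = random.Random(seed)
    return [first_coordinate_uniform_sphere(p - 1, rng) for _ in range(n)]

for p, w, exact in ((3, 0.5, 2 / math.pi * math.asin(math.sqrt(0.5))), (4, 0.25, 0.5)):
    u = simulate(p)
    mean_u = sum(u) / len(u)
    mean_u2 = sum(t * t for t in u) / len(u)
    cdf_w = sum(t * t <= w for t in u) / len(u)
    print(p, round(mean_u, 3), round(mean_u2, 3), 1 / (p - 1), round(cdf_w, 3), exact)
```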

Bibliography

Arnold, R. and Jupp, P. E. (2013). Statistics of orthogonal axial frames. Biometrika 100 571–586.
Arnold, R., Jupp, P. E. and Schaeben, H. (2018). Statistics of ambiguous rotations. J. Multivariate Anal. 165 73–85.
Chikuse, Y. (2003a). Concentrated matrix Langevin distributions. J. Multivariate Anal. 85 375–394.
Chikuse, Y. (2003b). Statistics on Special Manifolds. Lecture Notes in Statistics 174. Springer, New York.
Dai, X. and Müller, H.-G. (2018). Principal component analysis for functional data on Riemannian manifolds and spheres. Ann. Statist. 46 3334–3361.
Downs, T. D. (2003). Spherical regression. Biometrika 90 655–668.
Downs, T. D. and Mardia, K. V. (2002). Circular regression. Biometrika 89 683–698.
Fisher, N. I., Lewis, T. and Embleton, B. J. J. (1987). Statistical Analysis of Spherical Data. Cambridge Univ. Press, Cambridge.
