Inference for spherical location under high concentration
Davy Paindaveine, Thomas Verdebout

TL;DR
This paper develops a broad semiparametric framework for inference on spherical location parameters under high concentration, revealing optimal procedures and super-efficiency of the spherical mean, with theoretical and simulation validation.
Contribution
It extends inference methods for spherical data beyond Fisher-von Mises-Langevin models to a general class, analyzing high concentration asymptotics and establishing optimality properties.
Findings
The spherical mean is a super-efficient estimator of the location parameter under high concentration.
The Watson and Wald tests have non-standard optimality properties under high concentration.
Optimal inference procedures show consistency rates that depend on the nuisance functional $f$.
Davy Paindaveine (http://homepages.ulb.ac.be/dpaindav)
Université libre de Bruxelles, ECARES and Département de Mathématique, Avenue F.D. Roosevelt, 50, ECARES, CP114/04, B-1050, Brussels, Belgium
Thomas Verdebout
Université libre de Bruxelles, ECARES and Département de Mathématique, Boulevard du Triomphe, CP210, B-1050, Brussels, Belgium
Abstract
Motivated by the fact that circular or spherical data are often strongly concentrated around a location $\boldsymbol{\theta}$, we consider inference about $\boldsymbol{\theta}$ under high concentration asymptotic scenarios for which the probability of any fixed spherical cap centered at $\boldsymbol{\theta}$ converges to one as the sample size $n$ diverges to infinity. Rather than restricting to Fisher-von Mises-Langevin distributions, we consider a much broader, semiparametric, class of rotationally symmetric distributions indexed by the location parameter $\boldsymbol{\theta}$, a scalar concentration parameter $\kappa$ and a functional nuisance $f$. We determine the class of distributions for which high concentration is obtained as $\kappa$ diverges to infinity. For such distributions, we then consider inference (point estimation, confidence zone estimation, hypothesis testing) on $\boldsymbol{\theta}$ in asymptotic scenarios where $\kappa_n$ diverges to infinity at an arbitrary rate with the sample size $n$. Our asymptotic investigation reveals that, interestingly, optimal inference procedures on $\boldsymbol{\theta}$ show consistency rates that depend on $f$. Using asymptotics "à la Le Cam", we show that the spherical mean is, at any $f$, a parametrically super-efficient estimator of $\boldsymbol{\theta}$ and that the Watson and Wald tests for $\mathcal{H}_0:\boldsymbol{\theta}=\boldsymbol{\theta}_0$ enjoy similar, non-standard, optimality properties. We illustrate our results through simulations and treat a real data example. From a technical point of view, our asymptotic derivations require challenging expansions of rotationally symmetric functionals for large arguments of $f$.
MSC subject classifications: 62E20, 62F30, 62F05, 62F12.
Keywords: Concentrated distributions, Directional statistics, Le Cam's asymptotic theory of statistical experiments, Local asymptotic normality, Super-efficiency.
Davy Paindaveine is the corresponding author. His research is supported by a research fellowship from the Francqui Foundation and by the Program of Concerted Research Actions (ARC) of the Université libre de Bruxelles. Thomas Verdebout's research is supported by the ARC Program of the Université libre de Bruxelles and by the Crédit de Recherche J.0134.18 of the FNRS (Fonds National pour la Recherche Scientifique), Communauté Française de Belgique.
1 Introduction
Directional statistics is concerned with data on the unit sphere $\mathcal{S}^{p-1}$ of $\mathbb{R}^p$ or, more generally, on Riemannian manifolds such as a torus or an infinite cylinder. Directional data are present in many fields and have attracted a lot of attention in the last decade. Recent applications include the analysis of magnetic remanence through copulae on product manifolds in Jupp (2015), the analysis of animal movement using angular regression in Rivest et al. (2016), and the analysis of flight trajectories through principal component analysis for functional data on the sphere in Dai and Müller (2018), to cite only a few. For an overview of the topic, we refer to Mardia and Jupp (2000) and Ley and Verdebout (2017).
In this paper, we consider a class of distributions on $\mathcal{S}^{p-1}$ admitting a density at $\mathbf{x}$ that is proportional to $f(\kappa\,\mathbf{x}'\boldsymbol{\theta})$, where $\boldsymbol{\theta}\in\mathcal{S}^{p-1}$, $\kappa>0$, and $f$ is a monotone increasing function from $\mathbb{R}$ to $\mathbb{R}^+$ (throughout, densities on $\mathcal{S}^{p-1}$ will be with respect to the surface area measure). The resulting distribution on the sphere is rotationally symmetric about $\boldsymbol{\theta}$: if $\mathbf{X}$ has this distribution, then $\mathbf{O}\mathbf{X}$ and $\mathbf{X}$ are equal in distribution for any $p\times p$ orthogonal matrix $\mathbf{O}$ such that $\mathbf{O}\boldsymbol{\theta}=\boldsymbol{\theta}$. Clearly, $\boldsymbol{\theta}$ is the modal location on the sphere, hence plays the role of a location parameter. In contrast, $\kappa$ is a scale or concentration parameter. This terminology is justified by the fact that, for many functions $f$, the distribution becomes arbitrarily concentrated around $\boldsymbol{\theta}$ as $\kappa$ diverges to infinity; this is in particular the case for the celebrated Fisher-von Mises-Langevin (FvML) distributions, which are obtained with $f(z)=\exp(z)$. FvML distributions play a central role in directional statistics, a role that can be compared to the one played by Gaussian distributions in classical multivariate setups. For instance, the responses of the circular/spherical regression models in Rivest (1986), Downs and Mardia (2002), SenGupta, Kim and Arnold (2013) and Rosenthal et al. (2014) are FvML with a location parameter that depends on the predictors.
In most applications, the location parameter $\boldsymbol{\theta}$ is the parameter of interest, whereas the concentration parameter $\kappa$ and the infinite-dimensional parameter $f$ are unspecified nuisances. The most classical estimator of $\boldsymbol{\theta}$ is the spherical mean, whereas the most celebrated test for $\mathcal{H}_0:\boldsymbol{\theta}=\boldsymbol{\theta}_0$, where $\boldsymbol{\theta}_0$ is fixed, is the Watson test (see Sections 3 and 4, respectively). In the standard asymptotic scenario under which $n$ diverges to infinity with $\kappa$ fixed, the asymptotic properties of these procedures are well-known; see, e.g., Mardia and Jupp (2000). In particular, the spherical mean is root-$n$ consistent, whereas the Watson test shows non-trivial asymptotic powers under sequences of local alternatives at root-$n$ distance from $\boldsymbol{\theta}_0$.
In practice, the asymptotic results above are relevant in cases where the underlying concentration $\kappa$ is neither too small nor too large. For small values of $\kappa$, the fixed-$\kappa$ asymptotic distribution of the spherical mean and the corresponding asymptotic null distribution of the Watson test statistic only poorly approximate the exact distribution of these statistics, unless the sample size at hand is extremely large. This motivates considering a double asymptotic scenario where $\kappa=\kappa_n$ goes to zero as $n$ diverges to infinity. The observations are then assumed to form a random sample from a rotationally symmetric distribution whose concentration $\kappa_n$ depends on $n$, which makes it here strictly necessary to consider triangular arrays of observations. Such a "low-concentration double asymptotic scenario" was considered in Paindaveine and Verdebout (2017), where it was proved that the faster $\kappa_n$ goes to zero, the poorer the consistency rates of the aforementioned inference procedures. More precisely, (i) if $\kappa_n\to 0$ with $\sqrt{n}\kappa_n\to\infty$, then $\sqrt{n}\kappa_n(\hat{\boldsymbol{\theta}}_n-\boldsymbol{\theta})$ is asymptotically normal, so that the consistency rate of the spherical mean $\hat{\boldsymbol{\theta}}_n$ deteriorates from $\sqrt{n}$ (in the standard fixed-$\kappa$ case) to $\sqrt{n}\kappa_n$ (in the present case); (ii) if $\sqrt{n}\kappa_n=O(1)$, then the spherical mean is not consistent anymore. Similarly, in situation (i), the Watson test shows non-trivial asymptotic powers under sequences of local alternatives at distance of order $1/(\sqrt{n}\kappa_n)$ from $\boldsymbol{\theta}_0$, and, in situation (ii), there is no sequence of alternatives under which this test would be consistent. These behaviors of the spherical mean and of the Watson test are non-standard yet expected: as the concentration $\kappa_n$ gets smaller, the distribution becomes increasingly close to the uniform distribution on $\mathcal{S}^{p-1}$, for which the parameter of interest $\boldsymbol{\theta}$ is not identifiable. In other words, inference on $\boldsymbol{\theta}$ is increasingly challenging as $\kappa_n$ decreases to zero, which is reflected in the deterioration of the consistency rates above.
The situation for large concentrations is similar yet different. On the one hand, a standard, fixed-$\kappa$, asymptotic analysis could in principle still fail to describe suitably the finite-sample behaviors of the spherical mean and of the Watson test statistic under high concentration. On the other hand, inference about $\boldsymbol{\theta}$ intuitively becomes increasingly easy as the distribution gets more and more concentrated around $\boldsymbol{\theta}$, which should make it possible to define "super-efficient" estimators and tests on $\boldsymbol{\theta}$. Inference for "concentrated" FvML distributions has actually already been considered quite extensively in the literature. One of the first papers tackling inference problems for the location parameter $\boldsymbol{\theta}$ of FvML distributions under large values of $\kappa$ is Watson (1984), where asymptotic results as $\kappa\to\infty$ with $n$ fixed were derived. In the same asymptotic scenario, Rivest (1986) investigated the null limiting behavior of a goodness-of-fit test for FvML distributions, whereas Rivest (1989), Downs and Mardia (2002) and Downs (2003) considered spherical regression in a concentrated FvML setup. Rosenthal et al. (2014) analyzed concentrated data using a regression model with an FvML noise. Fujikoshi and Watamori (1992) obtained the asymptotic null distributions of various test statistics for $\mathcal{H}_0:\boldsymbol{\theta}=\boldsymbol{\theta}_0$, again as $\kappa\to\infty$ with $n$ fixed, and derived the asymptotic powers of the corresponding tests under appropriate sequences of local alternatives. Still in the framework of FvML distributions, Watamori (1996) reviewed point estimation and (one-sample and multi-sample) hypothesis testing in the standard asymptotic scenario where $n\to\infty$ with $\kappa$ fixed and in the concentrated scenario where $\kappa\to\infty$ with $n$ fixed. Arnold and Jupp (2013) and Arnold, Jupp and Schaeben (2018) considered estimation of "highly concentrated rotations". Finally, Chikuse (2003a) considered inference for concentrated matrix FvML distributions, still in a setup where $\kappa\to\infty$ with $n$ fixed; see also Chikuse (2003b). Monographs covering inference for concentrated FvML distributions include Watson (1983) and Mardia and Jupp (2000).
This review of the literature shows that inference on $\boldsymbol{\theta}$ under high concentration is a classical topic in directional statistics. Yet this review also reveals some important limitations in previous studies: (i) all asymptotic results available are as $\kappa\to\infty$ with $n$ fixed, while, parallel to the low-concentration case above, a double asymptotic scenario where $\kappa=\kappa_n$ would go to infinity with $n$ would be at least as natural (particularly so if $\kappa_n$ were allowed to diverge to infinity at an arbitrary rate as a function of $n$); (ii) all results are limited to the parametric case of FvML distributions, so that the asymptotic properties of the spherical mean and of the Watson test remain unknown in the broader semiparametric class of rotationally symmetric distributions; (iii) for hypothesis testing, most works focused on the null hypothesis: very few results try to describe asymptotic powers under sequences of local alternatives, and, more importantly, not a single optimality result, to the best of our knowledge, was obtained in the literature. In this paper, we therefore fill an important gap by deriving results that remove limitations (i)-(iii).
The outline of the paper is as follows. In Section 2, we fix the notation, introduce the assumptions that will be used throughout and characterize the rotationally symmetric distributions that provide high concentration for arbitrarily large values of $\kappa$. In Section 3, we derive the asymptotic distribution of the spherical mean in a double asymptotic scenario where $\kappa_n$ diverges to infinity at an arbitrary rate with $n$. Interestingly, in contrast with what happens for low concentrations, the consistency rate here depends on the nuisance function $f$. We also provide confidence zones for $\boldsymbol{\theta}$ that quite naturally take the form of spherical caps centered at the spherical mean. In Section 4, we study the asymptotic behaviour of the Watson and Wald tests. In Section 5, we turn to optimality issues and show that, under mild assumptions on $f$, the sequence of statistical experiments considered is locally asymptotically normal. We establish the Le Cam optimality of the spherical mean estimator and of the Watson and Wald tests under high concentration. Finally, a real data application is conducted in Section 6 and a wrap-up is provided in Section 7. Proofs are collected in the appendix.
2 High concentration
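Throughout, we will denote as ${\rm P}^{(n)}_{\boldsymbol{\theta},\kappa,f}$ the hypothesis under which the observations $\mathbf{X}_{n1},\ldots,\mathbf{X}_{nn}$ form a random sample from the rotationally symmetric distribution described in the introduction, that is, the hypothesis under which these observations are mutually independent and share the common density
$$\mathbf{x}\mapsto \frac{c_{p,\kappa,f}\,\Gamma\big(\frac{p-1}{2}\big)}{2\pi^{(p-1)/2}}\, f(\kappa\,\mathbf{x}'\boldsymbol{\theta}),$$
where $\Gamma$ is the Euler Gamma function and the constant $c_{p,\kappa,f}$ is given by
$$c_{p,\kappa,f} := 1\Big/\int_{-1}^{1}(1-s^2)^{(p-3)/2} f(\kappa s)\,ds.$$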
In the sequel, is assumed to be monotone non-decreasing on and monotone increasing on . Under this assumption, the location parameter is properly identified as the modal location on the sphere. One way to also make and identifiable would be to further impose . We will not impose these conditions since we also want to consider functions that are not differentiable at zero. The resulting lack of identifiability will not be an issue in the sequel since and play the role of nuisance parameters when conducting inference on .
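We will often make use of the tangent-normal decomposition of $\mathbf{X}_{ni}$ with respect to $\boldsymbol{\theta}$, which reads $\mathbf{X}_{ni}=u_{ni}\boldsymbol{\theta}+v_{ni}\mathbf{S}_{ni}$, with
$$u_{ni} := \mathbf{X}_{ni}'\boldsymbol{\theta}, \qquad v_{ni} := \sqrt{1-u_{ni}^2}$$
and
$$\mathbf{S}_{ni} := \frac{\mathbf{X}_{ni}-u_{ni}\boldsymbol{\theta}}{\|\mathbf{X}_{ni}-u_{ni}\boldsymbol{\theta}\|}.$$
The cosine $u_{ni}$ is associated with the latitude of $\mathbf{X}_{ni}$ with respect to the "north pole" $\boldsymbol{\theta}$, whereas $\mathbf{S}_{ni}$ determines the corresponding hyper-longitude. Under the rotationally symmetric hypothesis above, $u_{ni}$ and $\mathbf{S}_{ni}$ are mutually independent, $\mathbf{S}_{ni}$ is uniformly distributed on $\{\mathbf{s}\in\mathcal{S}^{p-1}:\mathbf{s}'\boldsymbol{\theta}=0\}$, and $u_{ni}$ admits the density
$$s\mapsto c_{p,\kappa,f}\,(1-s^2)^{(p-3)/2} f(\kappa s)\,\mathbb{I}[s\in[-1,1]],$$
where $c_{p,\kappa,f}$ denotes the normalizing constant introduced above and $\mathbb{I}[A]$ stands for the indicator function of the set $A$. The moments of $u_{ni}$ under this hypothesis will play an important role below and will be denoted as $e_{n\ell}$, $\ell=1,2,\ldots$ We will also write $\tilde{e}_{n2}$ for the corresponding variance. The function $f$ governs (jointly with $\kappa$) the distribution of the angle between $\mathbf{X}_{ni}$ and $\boldsymbol{\theta}$, hence is sometimes referred to as an angular function.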
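The present paper is concerned with sequences of rotationally symmetric distributions that are asymptotically highly concentrated, meaning that the probability mass of any fixed spherical cap centered at $\boldsymbol{\theta}$ converges to one as $n$ diverges to infinity. More precisely, we will say that the sequence of hypotheses ${\rm P}^{(n)}_{\boldsymbol{\theta},\kappa_n,f}$ is asymptotically highly concentrated if and only if, for any sequence $(\kappa_n)$ diverging to infinity and any $\varepsilon>0$, we have
$${\rm P}^{(n)}_{\boldsymbol{\theta},\kappa_n,f}\big[\mathbf{X}_{n1}'\boldsymbol{\theta}\geq 1-\varepsilon\big]\to 1, \qquad (2.4)$$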
that is, if and only if converges in probability to one as soon as diverges to infinity. Since this is clearly a property that depends on only, we will say that provides high concentration if and only if (2.4) holds. Not all functions provide high concentration. The polynomial functions are examples since, for any , they yield
[TABLE]
where does not depend on . It is easy to check that does not provide high concentration either, but that the angular FvML function does. It is therefore desirable to characterize the functions providing high concentration, which is the aim of the following result.
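The distinction drawn above can be checked numerically. The following is a minimal sketch, not taken from the paper: it integrates the density of the cosine $u$, proportional to $(1-u^2)^{(p-3)/2}f(\kappa u)$ on $[-1,1]$, to approximate the probability of the cap $\{u\geq 1-\varepsilon\}$. The three angular functions used here (the FvML one, a polynomial one of the form $\max(z,0)^a$, and an arctan-based one of the form $\pi/2+\arctan(z)$) are illustrative assumptions, since the exact forms used in the text are not legible here; the function names, grid size and the choices $p=3$, $\varepsilon=0.1$ are hypothetical as well.

```python
import numpy as np

def cap_probability(log_f, kappa, p=3, eps=0.1, grid=200_001):
    """Approximate P[u >= 1 - eps] for u with density proportional to
    (1 - u^2)^((p-3)/2) * f(kappa * u) on [-1, 1] (requires p >= 3)."""
    u = np.linspace(-1.0, 1.0, grid)
    log_fu = log_f(kappa * u)
    # Rescale by the maximum before exponentiating to avoid overflow.
    w = (1.0 - u**2) ** ((p - 3) / 2) * np.exp(log_fu - np.max(log_fu))
    du = u[1] - u[0]
    total = np.sum(w[1:] + w[:-1]) * du / 2.0  # trapezoidal rule
    wc = w[u >= 1.0 - eps]
    cap = np.sum(wc[1:] + wc[:-1]) * du / 2.0
    return cap / total

def log_fvml(z):
    # FvML angular function f(z) = exp(z).
    return z

def log_poly(z, a=2.0):
    # Assumed polynomial angular function f(z) = max(z, 0)^a.
    return np.where(z > 0, a * np.log(np.maximum(z, 1e-300)), -np.inf)

def log_atan(z):
    # Assumed arctan-type angular function f(z) = pi/2 + arctan(z).
    return np.log(np.pi / 2 + np.arctan(z))

for kappa in (1.0, 10.0, 100.0):
    print(kappa,
          cap_probability(log_fvml, kappa),
          cap_probability(log_poly, kappa),
          cap_probability(log_atan, kappa))
```

Under these assumptions, the cap probability approaches one only in the FvML column, while for the polynomial function the factor $\kappa^a$ cancels in the normalization, so the value does not change with $\kappa$.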
Theorem 2.1.
Let be monotone non-decreasing on and monotone increasing on . Assume that is differentiable in a neighborhood of in the sense that there exists such that is differentiable over and put , where is the derivative of . Then we have the following:
- (i) If as , then $f$ provides high concentration.
- (ii) If as , then $f$ does not provide high concentration.
- (iii) If as , then $f$ does not provide high concentration.
In this result, (resp., ) as means that (a) diverges to infinity (resp., converges to zero) as diverges to infinity and that (b) there exists such that is monotone non-decreasing (resp., monotone non-increasing) over . Essentially, Theorem 2.1 states that high concentration is obtained if diverges to infinity at least exponentially fast as diverges to infinity. In particular, this result confirms that the polynomial and arctan functions above do not provide high concentration, but that the FvML one does. Writing throughout , it also shows that all functions , with , do provide high concentration. These functions , which include the FvML one, will be our main running examples below.
In the rest of the paper, will stand for the collection of functions that (i) are monotone non-decreasing on and monotone increasing on , (ii) are differentiable in a neighborhood of , (iii) are such that as and (iv) satisfy, for any ,
[TABLE]
as , with . As the following result shows, our prototypical examples of angular functions providing high concentration meet these properties.
Proposition 2.1.
For any , the function belongs to .
As already mentioned, the moments of under will play a key role in the sequel. It will actually be important to understand the asymptotic behavior of these moments under high concentration. This is the role of the following result.
Theorem 2.2.
Fix an integer and . Let be a positive real sequence that diverges to infinity. Then,
[TABLE]
[TABLE]
and
[TABLE]
as .
As a corollary, we have
[TABLE]
as $n\to\infty$. Also, Vitali's Theorem (see, e.g., Theorem 5.5 in Shorack, 2000) readily implies that, under the conditions of Theorem 2.2, as . One could obtain an expansion of that is similar to the one in Theorem 2.2(i), but we will not do so since this is not needed for our purposes.
3 Point estimation
As mentioned in the introduction, the most classical estimator of location under rotational symmetry is the spherical mean, which is given by
[TABLE]
with . Under , for some positive scalar factor , so that the spherical mean is a moment-type estimator of . It is easy to check that it is also the maximum likelihood estimator of in the class of FvML distributions. This makes it desirable to investigate the asymptotic behavior of this estimator under high concentration. We have the following result.
Theorem 3.1.
Fix an integer , and . Let be a positive real sequence that diverges to infinity. Then, under ,
[TABLE]
as , so that, still under ,
[TABLE]
as (throughout, denotes convergence in distribution).
Since the sequence diverges to infinity under high concentration, Theorem 3.1 shows that the consistency rate of the spherical mean is faster than the usual parametric root-$n$ rate. Interestingly, this consistency rate depends on the angular function $f$. For instance, for with , the rate is , hence can be arbitrarily close to the standard root-$n$ rate for small , but can also provide arbitrarily fast polynomial convergence. Clearly, even faster rates can be achieved by considering more extreme high concentration patterns.
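For concreteness, the spherical mean studied in this section is the sample average projected back onto the sphere. A minimal sketch follows, with observations stored as the rows of an array; the function name and the error handling are illustrative choices rather than anything prescribed by the paper.

```python
import numpy as np

def spherical_mean(X):
    """Return the spherical mean of unit vectors stored as the rows of X.

    The estimator is the sample average rescaled to unit norm; it is
    undefined when the average is exactly the zero vector.
    """
    xbar = np.asarray(X, dtype=float).mean(axis=0)
    norm = np.linalg.norm(xbar)
    if norm == 0.0:
        raise ValueError("sample average is zero: spherical mean undefined")
    return xbar / norm

# Two orthogonal unit vectors: the spherical mean bisects them.
X = np.array([[0.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
theta_hat = spherical_mean(X)
```

The zero-average case cannot be rescaled, which is why the sketch raises an error there; under high concentration the average stays close to the sphere, so this case is not a practical concern.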
The asymptotic result (3.2) in principle allows constructing confidence zones for . More precisely, it follows from this result that a confidence zone for at asymptotic confidence level is given by
[TABLE]
where denotes the upper -quantile of the distribution. This confidence zone, however, is problematic in two respects. First, it is not connected, as it takes the form of two antipodal spherical caps centered at , which is not natural. Second, while the -dependent consistency rate in Theorem 3.1 is interesting, it also leads to confidence zones that cannot be used in practice since is usually an unspecified nuisance. The first problem can be dealt with by deriving a weak limiting result for obtained from a second-order delta method (while Theorem 3.1 results from a classical, first-order, delta method). We have the following result.
Theorem 3.2.
Fix an integer , and . Let be a positive real sequence that diverges to infinity. Then, under , as .
This second-order result provides confidence zones at asymptotic confidence level that are given by
[TABLE]
hence take, quite naturally, the form of (connected) spherical caps centered at . Of course, these confidence zones still cannot be used in practice since is unspecified. Fortunately, Theorem 2.2(i) allows replacing the unknown quantity by the quantity , which can be naturally estimated by , where we let . The following result, that guarantees that this replacement has no asymptotic impact, opens the door to the construction of feasible confidence zones.
Theorem 3.3.
Fix an integer , and . Let be a positive real sequence that diverges to infinity. Then, under ,
[TABLE]
as , and, still under ,
[TABLE]
as , where, in all cases, .
As a direct corollary, a feasible version of the spherical cap confidence zone in (3.3) is
[TABLE]
We conducted the following Monte Carlo exercises to check the validity of Theorems 3.2-3.3. For each combination of and , we generated random samples of size from the rotationally symmetric distribution with location , concentration , and angular function (numerical overflows prevented us from considering larger values of ). For each and , Figure 1 plots kernel density estimates of the resulting values of and (for , raw histograms are also provided). Clearly, Figure 1 supports the theoretical results above, with possibly one exception only, namely the case of with . We therefore focused on this case and repeated the same Monte Carlo exercise with . The results, which are shown in Figure 2, are now in perfect agreement with the theory for , whereas the fit still is not excellent for . A closer inspection provides the explanation: despite the large sample size considered in Figure 2, the distribution associated with is far from being highly concentrated; see the right panel of this figure. The fit observed for in the left panel of Figure 2 therefore does not contradict our theoretical results, which would materialize for higher concentrations.
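A small simulation in the same spirit can be sketched as follows. This is not the authors' simulation design: it is restricted to the FvML case with $p=3$, where the cosine $u$ has density proportional to $\exp(\kappa u)$ on $[-1,1]$ and can be drawn by inverting its distribution function, and the sample size, number of replications and concentration values are hypothetical choices. Writing the inverse in terms of $\exp(-2\kappa)$ is one way to sidestep overflow for large $\kappa$.

```python
import numpy as np

def sample_fvml_s2(n, theta, kappa, rng):
    """Draw n points from the FvML distribution on the unit sphere of R^3,
    using the tangent-normal decomposition X = u*theta + sqrt(1-u^2)*S."""
    U = rng.random(n)
    # Inverse CDF of u, expressed through exp(-2*kappa) to avoid overflow.
    arg = np.maximum(U + (1.0 - U) * np.exp(-2.0 * kappa), np.finfo(float).tiny)
    u = np.clip(1.0 + np.log(arg) / kappa, -1.0, 1.0)
    # Orthonormal basis of the orthogonal complement of theta.
    _, _, vt = np.linalg.svd(theta[None, :])
    e1, e2 = vt[1], vt[2]
    # S is uniform on the unit circle orthogonal to theta.
    phi = rng.uniform(0.0, 2.0 * np.pi, n)
    S = np.cos(phi)[:, None] * e1 + np.sin(phi)[:, None] * e2
    return u[:, None] * theta + np.sqrt(1.0 - u**2)[:, None] * S

def mean_error(kappa, n=200, reps=200, seed=0):
    """Average Euclidean distance between the spherical mean and theta."""
    rng = np.random.default_rng(seed)
    theta = np.array([0.0, 0.0, 1.0])
    errs = []
    for _ in range(reps):
        xbar = sample_fvml_s2(n, theta, kappa, rng).mean(axis=0)
        errs.append(np.linalg.norm(xbar / np.linalg.norm(xbar) - theta))
    return float(np.mean(errs))

for kappa in (10.0, 100.0, 1000.0):
    print(kappa, mean_error(kappa))
```

At a fixed sample size, the estimation error in this sketch shrinks as the concentration grows, which is the qualitative behavior behind the faster-than-root-$n$ rates discussed in this section.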
4 Hypothesis testing
We now turn to hypothesis testing and, more specifically, to the generic problem of testing the null hypothesis $\mathcal{H}_0:\boldsymbol{\theta}=\boldsymbol{\theta}_0$ against the alternative $\mathcal{H}_1:\boldsymbol{\theta}\neq\boldsymbol{\theta}_0$, where $\boldsymbol{\theta}_0$ is a fixed unit $p$-vector. In this section, we consider the Watson test (Watson, 1983, p. 140) and the Wald test (Hayakawa, 1990; Hayakawa and Puri, 1985), that respectively reject the null hypothesis at asymptotic level $\alpha$ whenever
[TABLE]
and
[TABLE]
exceed the critical value given by the upper $\alpha$-quantile of the $\chi^2_{p-1}$ distribution. In standard asymptotic scenarios where the sample size $n$ diverges to infinity with $\kappa$ fixed, the Watson and Wald test statistics are asymptotically equivalent in probability under the null hypothesis, hence also under sequences of contiguous alternatives, so that these tests may be considered asymptotically equivalent. As shown in Paindaveine and Verdebout (2017), however, this asymptotic equivalence does not survive asymptotic scenarios for which the concentration $\kappa_n$ goes to zero as $n$ diverges to infinity. This suggests investigating the asymptotic behavior of these tests under the high concentration scenarios considered in the previous sections.
To do so, let
[TABLE]
and decompose the Watson and Wald test statistics into
[TABLE]
We then have the following lemma.
Lemma 4.1.
Fix an integer , and . Let be a positive real sequence that diverges to infinity. Let be a bounded sequence in such that for all , with . Then, under , we have and as , so that and as .
This lemma ensures that, both under the sequence of null hypotheses (taking ) and under sequences of local alternatives of the form , one may focus on and when studying the asymptotic behaviors of the Watson and Wald test statistics in (4.1)-(4.2). These asymptotic behaviors are provided in the following result.
Theorem 4.1.
Fix an integer , and . Let be a positive real sequence that diverges to infinity. Let be a sequence in converging to and such that for all , with . Then, (i) under ,
[TABLE]
as ; (ii) under ,
[TABLE]
as , where denotes the non-central chi-square distribution with degrees of freedom and non-centrality parameter .
This result shows that, under high concentration, the Watson and Wald test statistics remain asymptotically equivalent in probability both under the null hypothesis and under the considered sequences of local alternatives. Both tests show asymptotic size $\alpha$ under the null hypothesis, irrespective of the angular function $f$ and of the rate at which the concentration $\kappa_n$ diverges to infinity. Theorem 4.1 also reveals that describes the consistency rate of these tests, in the sense that the Watson and Wald tests show non-trivial asymptotic powers (that is, asymptotic powers in $(\alpha,1)$) under sequences of local alternatives of the form , with . As in point estimation, this rate depends on $f$ and is faster than the standard parametric root-$n$ rate that is obtained for fixed $\kappa$; that is, compared to the alternatives that can be detected in the standard fixed-$\kappa$ situation, less severe (hence more challenging) alternatives can be detected under high concentration.
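The limiting powers implied by the non-central chi-square limit can be approximated by simulation. The sketch below assumes the limit $\chi^2_{p-1}(\|\boldsymbol{\tau}\|^2)$ that appears later in the text and is restricted to $p=3$, where the upper $\alpha$-quantile of the $\chi^2_2$ distribution has the closed form $-2\log\alpha$; the function name, number of draws and seed are hypothetical.

```python
import numpy as np

def limiting_power_mc(tau_norm, alpha=0.05, draws=400_000, seed=0):
    """Monte Carlo approximation of P[chi2_2(tau_norm^2) > critical value],
    i.e. the limiting power for p = 3 under the assumed chi-square limit."""
    rng = np.random.default_rng(seed)
    # A non-central chi-square with 2 degrees of freedom and non-centrality
    # tau_norm^2 is the squared norm of a N((tau_norm, 0), I_2) vector.
    z = rng.standard_normal((draws, 2))
    z[:, 0] += tau_norm
    stat = np.sum(z**2, axis=1)
    # Upper alpha-quantile of the central chi-square with 2 degrees of freedom.
    crit = -2.0 * np.log(alpha)
    return float(np.mean(stat > crit))

for t in (0.0, 1.0, 2.0, 3.0):
    print(t, limiting_power_mc(t))
```

At $\|\boldsymbol{\tau}\|=0$ the approximation returns a value close to the nominal level, and the power increases with $\|\boldsymbol{\tau}\|$, matching the interpretation of non-trivial asymptotic powers given above.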
We performed the following Monte Carlo exercise to illustrate the results in Theorem 4.1. For each combination of , and , we generated random samples of size from the rotationally symmetric distribution with concentration , angular function , and location
[TABLE]
where we let and , with . The alternative locations rewrite for some -vector with norm . Clearly, refers to the null hypothesis and correspond to increasingly severe alternatives. In each sample, we performed the Watson and Wald tests at asymptotic level . Figure 3 plots, as a function of , the resulting rejection frequencies, or more precisely, the difference between these rejection frequencies and the corresponding theoretical limiting powers
[TABLE]
see Theorem 4.1(ii). The figure also reports the results for sample size , but for the case with highest concentration (i.e., the case ) for which data generation led to numerical overflow. Rejection frequencies agree well with the limiting powers (note the scale of the vertical axes), particularly for which provides a higher concentration than . The agreement improves as the sample size increases. In all cases but the one with lowest concentration (i.e., the case ), the asymptotic equivalence between the Watson and Wald tests materializes already for .
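The Watson test applied in each sample can be sketched as follows. The displayed definition of the statistic is not legible in this section, so the code assumes its usual form, $W_n = n(p-1)\,\bar{\mathbf{X}}_n'(\mathbf{I}_p-\boldsymbol{\theta}_0\boldsymbol{\theta}_0')\bar{\mathbf{X}}_n \big/ \big(1-\frac{1}{n}\sum_i(\mathbf{X}_{ni}'\boldsymbol{\theta}_0)^2\big)$; this form, the function name and the toy data are assumptions. The critical value uses the closed-form $\chi^2_2$ quantile, hence applies to $p=3$ only.

```python
import numpy as np

def watson_statistic(X, theta0):
    """Watson-type statistic for H0: theta = theta0 (assumed usual form)."""
    n, p = X.shape
    xbar = X.mean(axis=0)
    u = X @ theta0                                  # cosines X_i' theta0
    tangent = xbar - (xbar @ theta0) * theta0       # (I - theta0 theta0') xbar
    return n * (p - 1) * (tangent @ tangent) / (1.0 - np.mean(u**2))

# Toy data: four unit vectors placed symmetrically around the north pole.
a, c = 0.6, 0.8
X = np.array([[a, 0.0, c], [-a, 0.0, c], [0.0, a, c], [0.0, -a, c]])

alpha = 0.05
crit = -2.0 * np.log(alpha)   # upper alpha-quantile of chi-square(2), p = 3

w_null = watson_statistic(X, np.array([0.0, 0.0, 1.0]))
w_alt = watson_statistic(X, np.array([1.0, 0.0, 0.0]))
print(w_null, w_alt, crit)
```

With these symmetric toy data, the tangent part of the sample average vanishes when the null location is the north pole, so the statistic is zero there, whereas testing against an orthogonal null location gives a value above the critical threshold.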
5 Local asymptotic normality
The Watson test was shown to enjoy strong optimality properties, both in the standard asymptotic scenario where the concentration is fixed and in the non-standard one where the concentration goes to zero; see Paindaveine and Verdebout (2017). In the latter scenario, the Wald test, on the contrary, fails to be optimal. In this section, we investigate the optimality properties of the Watson and Wald tests and of the spherical mean estimator under high concentration. Optimality will be in the Le Cam sense, which requires studying the Local Asymptotic Normality (LAN) of the sequence of fixed- parametric submodels at hand.
To do so, we will need to reinforce our assumptions on . Let be an integer, be a positive sequence diverging to infinity, and be a bounded positive sequence. In the sequel, we will denote as the collection of angular functions such that, as ,
[TABLE]
and such that, letting , with and ,
[TABLE]
as , where, for , is the cumulative distribution function of the distribution, whereas, for , is the cumulative distribution function of the Dirac distribution in . As shown in the next result, most angular functions do satisfy these extra assumptions, sometimes under an extremely mild restriction on the rate at which the sequence diverges to infinity with .
Proposition 5.1.
Let be an integer, be a positive sequence diverging to infinity, and be a bounded positive sequence. Then, for any , the function belongs to . Provided that there exists such that for large enough, the same holds for , with .
In other words, , with , belongs to irrespective of the sequences () and , whereas all angular functions , with , belong to in particular when diverges to infinity at least as fast as , hence e.g. when , with . We then have the following LAN result.
Theorem 5.1.
Fix an integer and . Let be a positive real sequence that diverges to infinity. Let be a bounded sequence in such that for all , with . Assume that belongs to . Then, as under ,
[TABLE]
where the central sequence , still under , is asymptotically normal with mean zero and covariance matrix .
This result shows that the rate identified in the previous sections is actually the contiguity rate associated with the sequence of statistical experiments at hand. Remarkably, this provides one of the few semiparametric examples (if any) where the contiguity rate depends on the fixed value of the functional nuisance . Since the contiguity rate coincides with the rate of convergence of the spherical mean (see Theorem 3.1), we conclude that the spherical mean is rate-consistent. Better: since the proof of Theorem 3.1 establishes that
[TABLE]
as under , it actually follows from Theorems 3.1 and 5.1 that the spherical mean is an asymptotically optimal estimator in the sense of the convolution theorem; see, e.g., Theorem 8.8 from van der Vaart (1998). Turning to hypothesis testing, it also follows from the LAN result above that the Watson and Wald tests from the previous section are rate-consistent, since Theorem 4.1(ii) indicates that these tests show non-trivial asymptotic powers under the sequence of contiguous alternatives involved in Theorem 5.1. Actually, in the present LAN framework, an application of the Le Cam third lemma confirms these asymptotic local powers.
To show this, fix a positive real sequence that diverges to infinity and local alternatives as in Theorem 5.1. Then, under the sequence of null hypotheses ,
[TABLE]
is asymptotically normal with mean zero and covariance matrix ; this follows from (A.13) in the proof of Theorem 3.1. Now, by using Theorem 2.2(ii), we obtain that, under the same sequence of hypotheses,
[TABLE]
Thus, Le Cam's third lemma entails that, under the sequence of contiguous alternatives considered, with $\boldsymbol{\tau}_n\to\boldsymbol{\tau}$, $\mathbf{T}^W_n$ remains asymptotically normal, with a shifted mean and an unchanged covariance matrix, so that, under this sequence of hypotheses, $\tilde{W}_n=(\mathbf{T}^W_n)'\boldsymbol{\Gamma}_{\boldsymbol{\theta}_0}^{-}\mathbf{T}^W_n\stackrel{\mathcal{D}}{\to}\chi^2_{p-1}(\|\boldsymbol{\tau}\|^2)$, where $\boldsymbol{\Gamma}_{\boldsymbol{\theta}_0}^{-}$ stands for the Moore-Penrose inverse of $\boldsymbol{\Gamma}_{\boldsymbol{\theta}_0}$. From contiguity, we thus obtain that $W_n=\tilde{W}_n+o_{\rm P}(1)\stackrel{\mathcal{D}}{\to}\chi^2_{p-1}(\|\boldsymbol{\tau}\|^2)$ under the alternatives considered, which, as announced, is in agreement with Theorem 4.1(ii). As for the Wald test, the fact that $S_n\stackrel{\mathcal{D}}{\to}\chi^2_{p-1}(\|\boldsymbol{\tau}\|^2)$ under the same sequence of alternatives directly follows from the result for the Watson test and from the fact that the null asymptotic equivalence in Theorem 4.1(i) extends, from contiguity, to the present contiguous alternatives.
Beyond this, one of the main interests of the LAN result in Theorem 5.1 is to pave the way to the construction of Le Cam optimal tests for the problem of testing versus under angular function . It directly follows from this result that, for this problem, the test rejecting the null hypothesis at asymptotic level whenever
[TABLE]
is Le Cam optimal (more precisely, locally asymptotically maximin) at asymptotic level . Since Theorem 2.2(ii) ensures that, under the null hypothesis,
[TABLE]
Lemma 4.1 readily entails that under the null hypothesis, hence, from contiguity, also under the sequences of local alternatives above. It follows that, under the assumptions of Theorem 5.1, the Watson test is optimal in the Le Cam sense. Since the Watson test does not depend on , this optimality holds at any meeting the assumptions of Theorem 5.1. From the asymptotic equivalence result in Theorem 4.1(i) and from contiguity, this extends to the Wald test.
In the high concentration framework considered, it may be intuitively appealing to linearize the problem and apply a standard Euclidean test to the data projected onto the tangent space to at the null location , or equivalently, to the data , , where is an arbitrary matrix whose columns form an orthonormal basis of the orthogonal complement of in . The null hypothesis translates into testing that the mean of the common (under rotational symmetry about , spherically symmetric) distribution of the 's is the zero vector. The Watson test can actually be seen as the (spherical) Hotelling test rejecting the null hypothesis at asymptotic level whenever , with and with a standardization matrix that, in line with the underlying spherical symmetry, is a multiple of the identity matrix. Quite nicely, Theorem 5.1 formally proves that this linearization provides a test that is Le Cam optimal at any . We insist, however, that it was unclear that such a linearization would provide a test that achieves optimality in the original sequence of curved statistical experiments. This is not only because the impact of linearization is difficult to control, but also because it is unknown whether or not the spherical Hotelling test is optimal in any sense under the highly concentrated and skewed alternatives obtained in the tangent space (to the best of our knowledge, the only optimality results for the spherical Hotelling test relate to shifted spherical Gaussian distributions; see, e.g., Hallin and Paindaveine, 2002).
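The linearization described above can be checked numerically. The sketch below assumes the classical form of the Watson statistic, $W_n = n(p-1)\,\bar{\mathbf{X}}'(\mathbf{I}_p-\boldsymbol{\theta}_0\boldsymbol{\theta}_0')\bar{\mathbf{X}}/(1-n^{-1}\sum_i(\mathbf{X}_i'\boldsymbol{\theta}_0)^2)$, since the display defining it is not reproduced in this section. The helper names (`tangent_basis`, `spherical_hotelling`, `watson_statistic`) and the synthetic data are hypothetical. Under that assumption, the Hotelling-type statistic computed on the projected data coincides with the Watson statistic.

```python
import math
import numpy as np


def tangent_basis(theta0):
    """p x (p-1) matrix with orthonormal columns spanning the orthogonal
    complement of the unit vector theta0 (hypothetical helper)."""
    p = theta0.shape[0]
    # QR of [theta0 | I_p]: the first column of Q is +/- theta0, so the
    # remaining p-1 columns form an orthonormal basis of its complement.
    q, _ = np.linalg.qr(np.column_stack([theta0, np.eye(p)]))
    return q[:, 1:]


def spherical_hotelling(x, theta0):
    """Hotelling-type statistic on the projected data, with a
    standardization matrix that is a multiple of the identity."""
    n, p = x.shape
    y = x @ tangent_basis(theta0)              # n x (p-1) projected data
    ybar = y.mean(axis=0)
    s2 = np.mean(np.sum(y ** 2, axis=1)) / (p - 1)
    return n * float(ybar @ ybar) / s2


def watson_statistic(x, theta0):
    """Assumed classical form of the Watson statistic (see lead-in)."""
    n, p = x.shape
    xbar = x.mean(axis=0)
    proj = np.eye(p) - np.outer(theta0, theta0)
    numerator = n * (p - 1) * float(xbar @ proj @ xbar)
    denominator = 1.0 - np.mean((x @ theta0) ** 2)
    return numerator / denominator


def chi2_2_upper_quantile(alpha):
    """Upper-alpha quantile of chi-square with 2 degrees of freedom
    (closed form, valid for p = 3 only)."""
    return -2.0 * math.log(alpha)


# Synthetic, highly concentrated sample on the unit sphere of R^3.
rng = np.random.default_rng(0)
theta0 = np.array([0.0, 0.0, 1.0])
x = theta0 + 0.05 * rng.standard_normal((50, 3))
x /= np.linalg.norm(x, axis=1, keepdims=True)

w = watson_statistic(x, theta0)
h = spherical_hotelling(x, theta0)
reject = w > chi2_2_upper_quantile(0.05)
```

The two statistics agree because $\|\mathbf{Y}_i\|^2 = 1-(\mathbf{X}_i'\boldsymbol{\theta}_0)^2$ for unit vectors $\mathbf{X}_i$, whatever orthonormal basis of the tangent space is chosen.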
6 Real data illustration
The real dataset we analyze here consists of measurements of magnetic remanence directions in rock specimens. Paleomagnetism, the study of remanent magnetism, aims to determine the strength and direction of the Earth's magnetic field over time. The orientation and intensity of the Earth's magnetic field can be obtained through the record of remanent magnetism preserved in rocks. The directions of remanent magnetization allow scientists to determine the position of the Earth's magnetic pole with respect to the study location at the time when the magnetization was acquired.
We consider here a well-known dataset on that has already been used for inference on spherical location in Fisher, Lewis and Embleton (1987). The dataset, which is provided as Dataset A in Appendix B8 of this monograph, is shown in the left panel of Figure 4. Clearly, the data are highly concentrated. In line with this, the FvML maximum likelihood estimator of the concentration parameter takes value , which is of the same order of magnitude as the sample size . Figure 4 also suggests that rotational symmetry is a plausible assumption. To assess this, we performed the three tests of rotational symmetry on that were recently proposed in García-Portugués, Paindaveine and Verdebout (2019): a location test and a scatter test, which respectively show power against location-type and scatter-type alternatives to rotational symmetry (we refer to García-Portugués, Paindaveine and Verdebout, 2019 for details), as well as a hybrid test that shows power against both types of alternatives. These three tests, which are meant to test the null hypothesis of rotational symmetry about an unspecified location , provided the $p$-values , and , respectively, hence did not lead to rejection at any usual nominal level. To assess the robustness of this result, we performed the following analysis: on the samples of size obtained by leaving one of the original observations out, we performed the same three tests of rotational symmetry and provided in Figure 5 the boxplots of the 62 $p$-values obtained for each of the three tests. Again, at any usual nominal level, none of these subsamples led any of the three tests to reject the null hypothesis of rotational symmetry.
The various statistical methods studied in this paper are therefore suitable for the present dataset. To illustrate one of these methods, we computed the 95% confidence cap for the spherical location defined in (3.4). The resulting confidence cap is shown in the right panel of Figure 4. This confidence zone is centered at the spherical mean and, as expected in the present high concentration setup, has a very small size.
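For readers who wish to reproduce the kind of FvML concentration estimate quoted in this section, a minimal sketch follows. It assumes data on the unit sphere of $\mathbb{R}^3$, for which the FvML likelihood equation reads $\coth\kappa - 1/\kappa = \bar{R}$, with $\bar{R}$ the norm of the sample average. The function names are hypothetical and the value of $\bar{R}$ used below is synthetic, not taken from the dataset.

```python
import math


def a3(kappa):
    """Mean resultant length of an FvML distribution on the unit sphere
    of R^3 with concentration kappa: coth(kappa) - 1/kappa."""
    return 1.0 / math.tanh(kappa) - 1.0 / kappa


def fvml_kappa_mle_s2(rbar, tol=1e-10):
    """Solve a3(kappa) = rbar by bisection (a3 is increasing in kappa)."""
    if not 1e-6 < rbar < 1.0:
        raise ValueError("rbar must lie in (1e-6, 1)")
    lo, hi = 1e-6, 1.0
    while a3(hi) < rbar:          # expand the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol * hi:     # bisect to relative tolerance tol
        mid = 0.5 * (lo + hi)
        if a3(mid) < rbar:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Synthetic illustration: a mean resultant length close to one
# corresponds to a large concentration.
rbar = 0.98
kappa_hat = fvml_kappa_mle_s2(rbar)
# High-concentration approximation: kappa is close to 1 / (1 - rbar).
kappa_approx = 1.0 / (1.0 - rbar)
```

Under high concentration, $\coth\kappa$ is numerically indistinguishable from one, so that the exact root and the approximation $1/(1-\bar{R})$ essentially coincide.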
7 Wrap up
We discussed inference on the location parameter of rotationally symmetric distributions under high concentration. We did so by considering double asymptotic scenarios where the underlying concentration parameter diverges to infinity at an arbitrary rate with the sample size . This significantly improves over the state of the art for directional inference under high concentration, since previous works not only focused on a parametric class of distributions (namely, the FvML one) but also restricted to asymptotics as diverges to infinity with fixed. Our asymptotic results indicate that standard fixed- methods are robust to high concentration, in the sense that they will remain valid in the aforementioned double asymptotic scenarios: the spherical mean remains consistent and asymptotically normal, whereas the Watson and Wald tests still asymptotically meet the level constraint. Under high concentration, however, these statistical procedures enjoy faster consistency rates than in the standard fixed- asymptotic scenario. Remarkably, these consistency rates depend on the type of rotationally symmetric distributions considered, that is, they depend on the underlying angular function ; this dependence is such that the higher the concentration, the faster the consistency rates. In contrast with all previous works on high concentration, we also considered optimality issues. We showed that, under mild assumptions on , the aforementioned inference procedures enjoy strong, Le Cam-type, optimality properties. For some (not all) angular functions, optimality requires that diverges to infinity sufficiently fast as a function of ; the corresponding restriction, as we have seen, is extremely mild for our running example associated with , as optimality, for holds in particular when diverges to infinity at least as fast as , whereas no restriction of this sort is required for , hence in particular for the usual FvML case.
Appendix A Proofs
A.1 Proof of Theorem 2.1
The proof requires the following preliminary result.
Lemma A.1.
If (resp., ) as , then there exists such that is convex (resp., concave) in .
Proof of Lemma A.1. Assume that . Pick large enough so that, in , is monotone non-decreasing and takes its values in . Then, letting , the mean value theorem implies that, for any with ,
[TABLE]
for some . Since , we must have . Therefore, is monotone non-decreasing in , so that is convex on the same set. The proof is entirely similar for the case , where is taken so that, in , is monotone non-increasing and takes its values in .
Proof of Theorem 2.1. Writing
[TABLE]
note that provides high concentration if and only if as , or equivalently, if and only if as . In this proof, denotes a positive quantity that does not depend on and whose value may change from line to line.
(i) Assume that . Without loss of generality, restrict then to , where is such that is convex in (Lemma A.1). Then, using the fact that is positive for , we have
[TABLE]
Since
[TABLE]
we conclude that
[TABLE]
as diverges to infinity, so that provides high concentration.
(ii) Assume that for some , that is, as . This means that for a function that satisfies as , hence that is integrable in a neighborhood of . For large enough so that for and is integrable in , we then have
[TABLE]
as , which rewrites
[TABLE]
for some constant as . This entails that, for any ,
[TABLE]
as . Fixing , this implies that
[TABLE]
so that as , which shows that does not provide high concentration.
(iii) Assume that . Fix and restrict, without loss of generality, to , where is such that is concave in (Lemma A.1). Concavity ensures that
[TABLE]
Since
[TABLE]
we obtain
[TABLE]
as diverges to infinity, so that does not provide high concentration.
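As a concrete illustration of what "providing high concentration" means, the following sketch evaluates the probability of a fixed spherical cap in the FvML special case on the unit sphere of $\mathbb{R}^3$, where this probability has a closed form. This is only an illustration under that parametric assumption; it does not cover the general angular functions treated in Theorem 2.1, and the function name is hypothetical.

```python
import math


def fvml_cap_probability(kappa, eps):
    """P[X'theta >= 1 - eps] for an FvML distribution on the unit sphere
    of R^3 with location theta and concentration kappa.

    The density of t = X'theta on [-1, 1] is kappa*exp(kappa*t)/(2*sinh(kappa)),
    which integrates to (1 - exp(-kappa*eps)) / (1 - exp(-2*kappa)).
    expm1 is used to avoid overflow and cancellation.
    """
    return math.expm1(-kappa * eps) / math.expm1(-2.0 * kappa)


# Probability of the fixed cap {x : x'theta >= 0.99} as kappa grows.
eps = 0.01
cap_probabilities = [fvml_cap_probability(k, eps) for k in (1.0, 10.0, 100.0, 1000.0)]
```

For any fixed `eps`, the cap probability increases to one as `kappa` diverges, which is the high concentration property in this special case.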
A.2 Proof of Proposition 2.1
The proof of Proposition 2.1 requires the following two preliminary results.
Lemma A.2.
For any ,
[TABLE]
as .
Proof of Lemma A.2. Letting (i.e., ), we have
[TABLE]
For any , we have that
[TABLE]
for any . The result then follows from the Lebesgue Dominated Convergence Theorem.
Lemma A.3.
(i) For , for any recall that . (ii) For , there exists such that for any . (iii) For , there exists such that for any .
Proof of Lemma A.3. (i) Fix and put . For , and for , Since is continuous over , this implies that is monotone non-increasing over . The result thus follows from the fact that .
(ii) Fix . Then for any . The continuity of over and the fact that thus imply that for any . It remains to show that there exists such that for any , or equivalently, that there exists a positive integer for which
[TABLE]
for any . Clearly, as and the convergence is uniform in . Since is right-continuous at [math] and satisfies
[TABLE]
there exists such that for all , which (since ) yields for all . Since, for any , is monotone decreasing in , we deduce that for all and all . Now, put . The uniform convergence of to ensures that there exists such that for any . This and the fact that is monotone decreasing in implies that, for any ,
[TABLE]
We conclude that and for any , so that (A.2) holds for .
(iii) The Cauchy formula for the remainder of Taylor expansions yields that, for any , we have for some . This implies that there exists such that
[TABLE]
for any . Now, the mapping is continuous over , so that, for any , we have
[TABLE]
The claim therefore holds with .
Proof of Proposition 2.1. We only need to prove that Condition (2.5) holds for any (the other conditions are indeed trivially fulfilled). To do so, fix and note that, for , (2.5) rewrites
[TABLE]
(in this proof, all convergences are as ), that is, letting ,
[TABLE]
If , then Parts (i) and (iii) of Lemma A.3 and the mean value theorem yield (below, )
[TABLE]
Now, if , then Lemma A.3(ii)–(iii) and the mean value theorem yield
[TABLE]
We therefore showed that, for any , there exists such that
[TABLE]
By letting , this yields
[TABLE]
where we used Lemma A.2. This proves (A.3), hence establishes the result.
A.3 Proof of Theorem 2.2
The proof crucially relies on the following lemma.
Lemma A.4.
Fix an integer and . Let be a positive real sequence that diverges to . Then,
[TABLE]
and
[TABLE]
as .
Proof of Lemma A.4. (i) Write
[TABLE]
with
[TABLE]
and
[TABLE]
Letting , Lemma A.2 readily yields
[TABLE]
Since (2.5) ensures that
[TABLE]
the result follows.
(ii) Using the U-statistic formulation of the variance, we have
[TABLE]
where
[TABLE]
[TABLE]
and
[TABLE]
We start with . Letting and , we obtain
[TABLE]
so that Lemma A.2 provides
[TABLE]
We turn to . Upper-bounding by , we obtain
[TABLE]
Letting in two of the four integrals above, (2.5) yields
[TABLE]
We treat by upper-bounding again by , which yields
[TABLE]
This completes the proof.
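The U-statistic formulation of the variance invoked in part (ii) is understood here as the identity ${\rm Var}[Y]=\frac{1}{2}{\rm E}[(Y_1-Y_2)^2]$ for independent copies $Y_1,Y_2$ of $Y$ (an assumption, since the corresponding display is not reproduced above). Its sample counterpart can be checked with the following sketch, in which the function name and the data are hypothetical.

```python
import itertools
import random
import statistics


def variance_ustat(values):
    """Unbiased sample variance written as a U-statistic with kernel
    h(a, b) = (a - b)^2 / 2, averaged over all unordered pairs."""
    n = len(values)
    total = sum((a - b) ** 2 for a, b in itertools.combinations(values, 2))
    return total / (n * (n - 1))


random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(200)]
v_pairs = variance_ustat(sample)
v_usual = statistics.variance(sample)   # usual (n - 1)-denominator variance
```

Both computations return the same value up to rounding, which is why a variance can be handled through a double integral over pairs of observations.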
We can now prove Theorem 2.2.
Proof of Theorem 2.2. First note that Lemma A.4 readily yields
[TABLE]
and
[TABLE]
The result then follows by writing
[TABLE]
[TABLE]
and
[TABLE]
and by using Lemma A.4 along with (A.4)–(A.5).
A.4 Proofs of Theorems 3.1, 3.2 and 3.3
Several proofs of this section rely on the following uniform second-order delta method (the proof is a trivial extension of the proof of Theorem 3.8 in van der Vaart, 1998).
Lemma A.5.
Let be twice continuously differentiable in a neighborhood of . Let be a sequence in converging to . Let be a sequence of random vectors taking their values in the domain of and such that is for a sequence that diverges to infinity. Then,
[TABLE]
where and denote the gradient and Hessian matrix of at , respectively.
Assuming that is (this will be proved later in this section), this lemma entails that
[TABLE]
where is the mapping defined through . Note that this in particular yields
[TABLE]
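To make the second-order expansion more tangible, the sketch below assumes that the mapping at hand is the normalization $\mathbf{x}\mapsto\mathbf{x}/\|\mathbf{x}\|$ (natural for the spherical mean, but an assumption here since its definition is not reproduced above). For a unit vector $\boldsymbol{\theta}$ and a perturbation $\mathbf{h}$, a direct Taylor expansion gives first-order term $(\mathbf{I}_p-\boldsymbol{\theta}\boldsymbol{\theta}')\mathbf{h}$ and second-order term $-(\boldsymbol{\theta}'\mathbf{h})\mathbf{h}-\frac{1}{2}\|\mathbf{h}\|^2\boldsymbol{\theta}+\frac{3}{2}(\boldsymbol{\theta}'\mathbf{h})^2\boldsymbol{\theta}$. The code checks numerically that the remainder is of third order. All names are hypothetical.

```python
import numpy as np


def normalize(x):
    """Assumed mapping: projection of x onto the unit sphere."""
    return x / np.linalg.norm(x)


def second_order_expansion(theta, h):
    """Second-order Taylor expansion of normalize at the unit vector theta."""
    t = float(theta @ h)
    first = h - t * theta                                   # (I - theta theta') h
    second = -t * h - 0.5 * float(h @ h) * theta + 1.5 * t * t * theta
    return theta + first + second


theta = np.array([0.0, 0.0, 1.0])
direction = np.array([0.6, -0.3, 0.5])

# Approximation errors for perturbations of size eps, eps/2, eps/4.
errors = [
    float(np.linalg.norm(normalize(theta + eps * direction)
                         - second_order_expansion(theta, eps * direction)))
    for eps in (1e-1, 5e-2, 2.5e-2)
]
# Halving eps should divide the error by roughly 2^3 = 8.
ratios = [errors[0] / errors[1], errors[1] / errors[2]]
```

A ratio close to eight between successive errors is consistent with a remainder of order $\|\mathbf{h}\|^3$, which is what a second-order delta method exploits.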
We can now prove Theorem 3.1.
Proof of Theorem 3.1. In this proof, all expectations and variances are under and all stochastic convergences are as under . Using the tangent-normal decomposition of with respect to , write
[TABLE]
Since Theorem 2.2(ii) implies that
[TABLE]
we obtain that
[TABLE]
For any unit -vector , write
[TABLE]
For any , the 's are centered i.i.d. random variables such that
[TABLE]
where we used the first result in (2.6). Aiming at establishing the asymptotic normality of , the Lindeberg condition reads
[TABLE]
Applying the Cauchy-Schwarz and Chebyshev inequalities yields
[TABLE]
Since the second convergence in (2.6) provides
[TABLE]
the Lindeberg condition in (A.12) is satisfied, so that is asymptotically standard normal for any unit -vector . Consequently,
[TABLE]
for any unit -vector , which entails that
[TABLE]
It follows that
[TABLE]
Therefore, (A.11) holds and readily yields
[TABLE]
which, by using Theorem 2.2(ii), provides the weak limiting result in (3.1). The one in (3.2) then follows by noting that , where stands for the Moore-Penrose inverse of .
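The tangent-normal decomposition used in this proof writes a unit vector $\mathbf{x}$ as $\mathbf{x}=u\,\boldsymbol{\theta}+v\,\mathbf{s}$, with $u=\mathbf{x}'\boldsymbol{\theta}$, $v=\sqrt{1-u^2}$ and $\mathbf{s}$ a unit vector orthogonal to $\boldsymbol{\theta}$. The following sketch computes these three components; the symbols $u$, $v$, $\mathbf{s}$ and the function name are chosen for illustration and need not match the notation of the paper.

```python
import numpy as np


def tangent_normal(x, theta):
    """Return (u, v, s) such that x = u*theta + v*s, where theta and x are
    unit vectors, u = x'theta, v = sqrt(1 - u^2) and s is a unit vector
    orthogonal to theta. Undefined when x = +/- theta (v = 0)."""
    u = float(x @ theta)
    tangent = x - u * theta              # component of x orthogonal to theta
    v = float(np.linalg.norm(tangent))
    s = tangent / v
    return u, v, s


theta = np.array([0.0, 0.0, 1.0])
x = np.array([0.1, -0.2, 1.0])
x /= np.linalg.norm(x)

u, v, s = tangent_normal(x, theta)
reconstructed = u * theta + v * s
```

Under rotational symmetry about $\boldsymbol{\theta}$, the component $\mathbf{s}$ is uniformly distributed over the unit vectors orthogonal to $\boldsymbol{\theta}$ and is independent of $u$, so that all distributional information about concentration sits in $u$.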
Proof of Theorem 3.2. Direct computations allow checking that the function defined through has the Hessian matrix
[TABLE]
where stands for the th vector of the canonical basis of . Therefore, premultiplying both sides of (A.10) by yields
[TABLE]
where (the 's are the components of )
[TABLE]
Therefore, using (A.13), we obtain that
[TABLE]
The result then follows from Theorem 2.2(ii).
The proof of Theorem 3.3 requires the following preliminary result.
Lemma A.6.
Fix an integer and . Let be a sequence in and be a positive real sequence that diverges to infinity. Then, under ,
[TABLE]
as , where refers to the tangent-normal decomposition of with respect to .
Proof of Lemma A.6. Using the tangent-normal decomposition of with respect to , write
[TABLE]
where
[TABLE]
and
[TABLE]
Applying the Cauchy-Schwarz inequality and using (2.6) yields
[TABLE]
which implies that converges to zero in probability. Using (2.6) along with the fact (Theorem 2.2), we obtain that . Now, denoting as the operator that stacks the columns of a matrix on top of each other and using the identity , we obtain
[TABLE]
Using again (2.6) along with the fact thus shows that converges to zero in probability, which establishes the result.
Proof of Theorem 3.3. By using Theorem 2.2(i), it follows from Theorem 3.1 that
[TABLE]
and from Theorem 3.2 that
[TABLE]
as under (in this proof, all stochastic convergences are under this sequence of hypotheses). Therefore, it is sufficient to show that
[TABLE]
Since Theorem 2.2 implies that
[TABLE]
we have ${\rm E}[(Y_{n1}-1)^{2}]=\big({\rm E}[Y_{n1}]-1\big)^{2}+{\rm Var}[Y_{n1}]={\rm Var}[Y_{n1}]=o(1)$, so that . Since the same theorem also implies that , it is sufficient to prove that .
To do so, write
[TABLE]
Using Lemma A.6 (with ) and (A.11), we then obtain
[TABLE]
where we used (A.13). Since almost surely, we conclude that is , which establishes the result.
A.5 Proofs of Lemma 4.1 and Theorem 4.1
Proof of Lemma 4.1. We start with . Since , we have
[TABLE]
where we used the facts that almost surely and that . Since the tangent-normal decomposition with respect to further entails that
[TABLE]
we conclude that converges to one in quadratic mean, hence also in probability.
We turn to , which we decompose as
[TABLE]
with
[TABLE]
and
[TABLE]
Since (2.6) entails that
[TABLE]
and
[TABLE]
we have that converges to one in quadratic mean, hence also in probability. As for , Lemma A.6 and Theorem 2.2(ii) yield
[TABLE]
where we let . Since is a unit -vector, we have , which yields . Thus, using Theorem 2.2(ii) and the fact that almost surely, we obtain
[TABLE]
Finally, since almost surely, Theorem 2.2(ii) also entails that Therefore, , as was to be proved.
Proof of Theorem 4.1. Since Part (i) of the result is actually a particular case of Part (ii), we only prove the latter. Accordingly, all stochastic convergences in this proof will be as under , with , and . Consider then
[TABLE]
where we used Theorem 2.2(ii) and the fact that . Now, proceeding exactly as in the proof of Theorem 3.1, it can be shown that
[TABLE]
It follows that $\mathbf{T}^{W}_{n}\stackrel{\mathcal{D}}{\to}\mathcal{N}\big(\boldsymbol{\tau},\mathbf{I}_{p}-\boldsymbol{\theta}_{0}\boldsymbol{\theta}_{0}^{\prime}\big)$, so that Lemma 4.1 entails that
[TABLE]
Turning then to the Wald test, consider now
[TABLE]
Since, under the sequence of hypotheses considered, Lemma A.5 implies that
[TABLE]
we have that Using Lemma 4.1 again, this yields that , which establishes the result.
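The passage from a singular normal limit to a non-central chi-square limit can be illustrated by simulation. The sketch below relies on two standard facts: the projection matrix $\mathbf{I}_p-\boldsymbol{\theta}_0\boldsymbol{\theta}_0'$ is its own Moore-Penrose inverse, and, if $\mathbf{Z}\sim\mathcal{N}(\boldsymbol{\tau},\mathbf{I}_p-\boldsymbol{\theta}_0\boldsymbol{\theta}_0')$ with $\boldsymbol{\tau}$ orthogonal to $\boldsymbol{\theta}_0$, then the quadratic form in $\mathbf{Z}$ based on this inverse is $\chi^2_{p-1}(\|\boldsymbol{\tau}\|^2)$, with mean $p-1+\|\boldsymbol{\tau}\|^2$. The orthogonality of $\boldsymbol{\tau}$ and the numerical values are assumptions made for the illustration.

```python
import numpy as np

p = 4
theta0 = np.ones(p) / np.sqrt(p)
proj = np.eye(p) - np.outer(theta0, theta0)      # I_p - theta0 theta0'
proj_pinv = np.linalg.pinv(proj)                 # Moore-Penrose inverse

# Shift vector, projected so that it is orthogonal to theta0 (assumption).
tau = proj @ np.array([1.0, -0.5, 0.25, 2.0])

# Rows of z are i.i.d. N(tau, proj): a standard normal vector multiplied
# by the symmetric idempotent matrix proj has covariance proj.
rng = np.random.default_rng(1)
z = tau + rng.standard_normal((200_000, p)) @ proj

# Quadratic forms z_i' proj_pinv z_i, one per simulated vector.
q = np.einsum("ij,jk,ik->i", z, proj_pinv, z)

empirical_mean = float(q.mean())
theoretical_mean = (p - 1) + float(tau @ tau)    # mean of chi2_{p-1}(||tau||^2)
```

The empirical mean of the quadratic forms should be close to $p-1+\|\boldsymbol{\tau}\|^2$, up to Monte Carlo error.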
A.6 Proofs of Proposition 5.1 and Theorem 5.1
The proof of Proposition 5.1 requires the following result.
Lemma A.7.
Fix . Then there exists such that for any with , one has .
Proof. Since , the mapping is continuous on the interval with end points and , and it is differentiable on the interior of this interval. The mean value theorem then yields that, for some between and ,
[TABLE]
which establishes the result.
Proof of Proposition 5.1. With ,
[TABLE]
where we let . Let if and [math] otherwise. Then, by using Lemma A.3(i)–(ii) and the fact that there exists some constant such that for any , we obtain that (A.15) is upper-bounded by
[TABLE]
We may therefore focus on (5.1). Fix then a positive sequence diverging to infinity (which, for , is assumed to satisfy the assumption stated in the proposition), a bounded positive sequence , and consider the quantities appearing in (5.1). First note that
[TABLE]
so that, for large enough,
[TABLE]
where we let . Hence, for and large enough,
[TABLE]
For large enough, we then have
[TABLE]
where, with , we let
[TABLE]
and
[TABLE]
here, if and if , where is as in the statement of the proposition.
Let us first consider . It directly follows from (A.16) that, for large enough, and share the same sign in the integrand of . Consequently, using Lemma A.7 then (A.16) yields
[TABLE]
for large enough. Therefore, by using again Lemma A.3(i)–(ii), we obtain that, still for large enough,
[TABLE]
Letting , we obtain
[TABLE]
which shows using Lemma A.2 that is , hence .
Turning to , we have
[TABLE]
which yields
[TABLE]
Consequently, if , then is , as was to be shown. Focus then on the case . By assumption, for large enough,
[TABLE]
which yields
[TABLE]
so that . The result follows.
Proof of Theorem 5.1. Write
[TABLE]
with
[TABLE]
[TABLE]
and
[TABLE]
Using the identity and Lemma 4.1, we readily obtain
[TABLE]
so that we only need to show that both and are .
We start with . Using the tangent-normal decomposition of with respect to , write , where we let
[TABLE]
and
[TABLE]
We have
[TABLE]
and
[TABLE]
Now, by using Lemma A.4(i) and the fact that , we obtain
[TABLE]
Therefore, and are , which implies that and , hence also , are .
Let us turn to . Since and , rotation invariance yields that is upper-bounded by
[TABLE]
where , with an arbitrary unit vector orthogonal to . Clearly, is equal in distribution to any marginal of a random vector that is uniformly distributed over . Therefore, and has the cumulative distribution function given on page 5, so that conditioning with respect to the sign of yields that is if and only if
[TABLE]
In view of Lemma A.4(i), this is the case if and only if
[TABLE]
Since belongs to , the result then follows.
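The argument above uses the distribution of a marginal of a random vector that is uniform over a unit sphere. As a quick sanity check of this object, the sketch below simulates uniform vectors on the unit sphere of $\mathbb{R}^d$ and verifies the standard fact that every marginal has mean zero and second moment $1/d$. The dimension `d`, the sample size and the function name are arbitrary choices made for the illustration.

```python
import numpy as np


def uniform_on_sphere(n, d, rng):
    """Draw n points uniformly on the unit sphere of R^d by normalizing
    standard Gaussian vectors."""
    g = rng.standard_normal((n, d))
    return g / np.linalg.norm(g, axis=1, keepdims=True)


rng = np.random.default_rng(2)
d = 4
u = uniform_on_sphere(100_000, d, rng)

# By symmetry, all marginals share the same distribution.
first_moments = u.mean(axis=0)
second_moments = (u ** 2).mean(axis=0)
```

Since the squared components sum to one and are exchangeable, each has expectation $1/d$, which the simulated second moments reproduce up to Monte Carlo error.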
References
- Arnold, R. and Jupp, P. E. (2013). Statistics of orthogonal axial frames. Biometrika 100, 571–586.
- Arnold, R., Jupp, P. E. and Schaeben, H. (2018). Statistics of ambiguous rotations. J. Multivariate Anal. 165, 73–85.
- Chikuse, Y. (2003a). Concentrated matrix Langevin distributions. J. Multivariate Anal. 85, 375–394.
- Chikuse, Y. (2003b). Statistics on Special Manifolds. Lecture Notes in Statistics 174. Springer, New York.
- Dai, X. and Müller, H.-G. (2018). Principal component analysis for functional data on Riemannian manifolds and spheres. Ann. Statist. 46, 3334–3361.
- Downs, T. D. (2003). Spherical regression. Biometrika 90, 655–668.
- Downs, T. D. and Mardia, K. V. (2002). Circular regression. Biometrika 89, 683–698.
- Fisher, N. I., Lewis, T. and Embleton, B. J. J. (1987). Statistical analysis of spherical data. Cambridge Univ. Press, Cambridge.
