Tyler shape depth
Davy Paindaveine, Germain Van Bever

TL;DR
This paper introduces Tyler shape depth, a new data depth concept for shape matrices in multivariate analysis, enabling robust estimation, hypothesis testing, and ranking of shapes based on data directions.
Contribution
It proposes Tyler shape depth, a novel depth measure for shape matrices, with theoretical properties and applications in estimation, testing, and outlier detection.
Findings
Proves invariance, quasi-concavity, and continuity of Tyler shape depth.
Establishes existence and Fisher consistency of the deepest shape matrix.
Derives consistency results and a Glivenko-Cantelli-type theorem.
Tyler Shape Depth
Davy Paindaveine∗ and Germain Van Bever†
∗ ECARES and Department of Mathematics, Université libre de Bruxelles, Avenue F.D. Roosevelt, 50, CP114/04, B-1050, Brussels, Belgium
† Department of Mathematics and Namur Institute for Complex Systems, Université de Namur, Rempart de la Vierge, 8, 5000, Namur, Belgium
Abstract
In many problems from multivariate analysis, the parameter of interest is a shape matrix, that is, a normalized version of the corresponding scatter or dispersion matrix. In this paper, we propose a depth concept for shape matrices that involves data points only through their directions from the center of the distribution. We use the terminology Tyler shape depth since the resulting estimator of shape, namely the deepest shape matrix, is the median-based counterpart of the M-estimator of shape of Tyler (1987). Beyond estimation, shape depth, like its Tyler antecedent, also allows hypothesis testing on shape. Its main benefit, however, lies in the ranking of shape matrices it provides, whose practical relevance is illustrated in principal component analysis and in shape-based outlier detection. We study the invariance, quasi-concavity and continuity properties of Tyler shape depth, the topological and boundedness properties of the corresponding depth regions, existence of a deepest shape matrix and prove Fisher consistency in the elliptical case. Finally, we derive a Glivenko–Cantelli-type result and establish almost sure consistency of the deepest shape matrix estimator.
Keywords: Elliptical distribution; Principal component analysis; Robustness; Shape matrix; Statistical depth; Test for sphericity.
1 Introduction
Location depths measure the centrality of an arbitrary -vector with respect to a probability measure over . Letting denote the unit sphere in , the most famous instance is the Tukey (1975) halfspace depth
\[HD(\theta,P)=\inf_{u\in\mathcal{S}^{k-1}}P\big[u^{T}(X-\theta)\geq 0\big];\]
throughout, refers to probability under the probability measure at hand. The halfspace depth regions form a family of nested convex subsets of . The Tukey median , defined as the barycenter of the innermost region , extends the univariate median to the multivariate case and is a robust alternative to the expectation . Beyond location estimation, many inference problems can be tackled in a robust and nonparametric way by using the center-outward order resulting from depth (Liu et al., 1999). Adopting the parametric depth approach from Mizera (2002), can also be read as a measure of how well the location parameter value fits the probability measure . In this spirit, possible outliers in a data set will be flagged by low depth values , where denotes the corresponding empirical probability measure.
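As a rough illustration of how a sample halfspace depth can be evaluated, the sketch below replaces the infimum over all unit vectors by a minimum over randomly drawn directions. The function name `halfspace_depth` and the parameters `n_dir` and `seed` are illustrative assumptions, and the random search only gives an upper bound on the exact sample depth; dedicated exact algorithms exist and are not reproduced here.

```python
import numpy as np

def halfspace_depth(theta, X, n_dir=2000, seed=0):
    """Approximate Tukey halfspace depth of theta w.r.t. the rows of X.

    The infimum over unit vectors u of the empirical P[u'(X - theta) >= 0]
    is approximated by a minimum over n_dir random directions, so the
    returned value is an upper bound on the exact sample depth.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    # Random directions, normalised to the unit sphere.
    U = rng.standard_normal((n_dir, X.shape[1]))
    U /= np.linalg.norm(U, axis=1, keepdims=True)
    # proj[i, j] = u_j' (x_i - theta)
    proj = (X - theta) @ U.T
    # Empirical mass of each closed halfspace, then the smallest one.
    return float((proj >= 0).mean(axis=0).min())
```

A central point of a symmetric sample should receive a depth close to 1/2, whereas a point far outside the data cloud should receive depth 0, which is the behaviour used to flag outliers.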
In this paper, the focus is on multivariate dispersion parameters known as shape matrices. For simplicity, we restrict in this section to elliptical distributions. Let be the collection of symmetric positive definite matrices and write , with , for the unique square root of in . We will say that is elliptical with location , scatter and generating variate if has the same distribution as , where is uniformly distributed over and is independent of the nonnegative scalar random variable , which has unit median. This median constraint makes identifiable without moment conditions. Under finite second-order moments, the resulting covariance matrix is . Inference problems such as constructing confidence regions for require one to estimate the full scatter matrix or the full covariance matrix . However, in many other problems, it is sufficient to estimate the shape matrix, that is, the normalized scatter matrix
\[V=\frac{k\,\Sigma}{{\rm tr}(\Sigma)}.\]
This shape matrix could be normalized, as in Paindaveine (2008), to have determinant one or upper-left entry one, which would not affect the results of the present paper. For instance, principal components may be equivalently computed from , from or, when it exists, from , since proportional matrices have the same eigenvectors. Now, when it comes to fixing the number of principal components on which to base further analysis, one typically looks at the proportions of explained variances (), where denotes the th largest eigenvalue of . Similarly to eigenvectors, these proportions remain unchanged if they are computed from rather than from or . In principal component analysis it is thus sufficient to estimate, or know the value of, .
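The invariance of the proportions of explained variance can be checked numerically. The sketch below is only an illustration: the helper names `explained_proportions` and `to_shape`, the example matrix `Sigma`, and the trace normalisation (trace equal to the dimension) are assumptions made for the example.

```python
import numpy as np

def explained_proportions(S):
    """Proportions lambda_j / sum(lambda), eigenvalues sorted largest first."""
    lam = np.sort(np.linalg.eigvalsh(S))[::-1]
    return lam / lam.sum()

def to_shape(S):
    """Trace normalisation (assumed here): rescale S so that tr(S) = k."""
    return S.shape[0] * S / np.trace(S)

# Hypothetical scatter matrix used only for the illustration.
Sigma = np.array([[4.0, 1.0, 0.0],
                  [1.0, 2.0, 0.5],
                  [0.0, 0.5, 1.0]])
V = to_shape(Sigma)

# Proportional matrices share the same explained-variance proportions.
same = np.allclose(explained_proportions(Sigma), explained_proportions(V))
```

Any other positive rescaling of `Sigma` would give the same proportions, since the scale factor cancels in each ratio.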
There is a large literature on inference for shape. Our main contribution is to provide a depth concept for shape, measuring how well a given shape matrix fits the probability measure . While the proposed depth will lead to estimators and tests for shape, its main added value is the ordering of shape matrices resulting from depth. Here, we mention only two possible applications. The first is in principal component analysis, where a suitable estimator is to be chosen. When it is suspected that there might be outliers, one might for instance consider the minimum covariance determinant estimates , , trimming a proportion of the data; see Section 5. Choosing should typically be done on the basis of the proportion of outliers, which is usually unknown. We will show that the shape depth of allows for an informed choice on . The second application concerns outlier detection in multivariate financial time series. Since volatility is key in finance, one might flag atypical days in such series by spotting days that associate a low depth to a shape estimator computed from the full series.
Depth for a generic parameter has been discussed in Mizera (2002). Depth for scatter matrices, however, has only been considered in Zhang (2002), Chen et al. (2018) and Paindaveine and Van Bever (2018), and only the last considers depth for shape matrices.
2 Shape depth
Tyler (1987) introduced a shape notion extending the concept of shape outside the elliptical setup. Consider the multivariate sign defined as if and as [math] otherwise, where is the inverse of . Let also , where stacks the columns of on top of each other and where is the identity matrix. The Tyler shape of , say, is then the matrix satisfying
[TABLE]
If is smooth at , in the sense that no hyperplane containing has a strictly positive -probability mass, then (2.1) admits a unique solution that agrees with the true shape if is elliptical with location (Tyler, 1987; Kent and Tyler, 1988; Dümbgen, 1998). In essence, (2.1) identifies the shape making the origin of most central in an -sense for the distribution of , that is, it defines as the solution of
[TABLE]
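For readers who want to compute a sample Tyler shape, the following sketch uses the fixed-point iteration commonly associated with Tyler's M-estimator. It assumes that (2.1) is the usual estimating equation, under which the average outer product of the multivariate signs equals the identity divided by the dimension, and it assumes a trace normalisation of the shape. The function name `tyler_shape` and its parameters are illustrative and not taken from the paper.

```python
import numpy as np

def tyler_shape(X, theta, n_iter=200, tol=1e-10):
    """Fixed-point iteration for the sample Tyler shape (trace normalised to k).

    Assumption: the update V <- (k/n) * sum z_i z_i' / (z_i' V^{-1} z_i),
    followed by rescaling to trace k, is used as a standard scheme.
    """
    Z = np.asarray(X, dtype=float) - theta
    # Observations equal to theta carry no direction and are dropped.
    Z = Z[np.linalg.norm(Z, axis=1) > 0]
    n, k = Z.shape
    V = np.eye(k)
    for _ in range(n_iter):
        # q[i] = z_i' V^{-1} z_i
        q = np.einsum("ij,jk,ik->i", Z, np.linalg.inv(V), Z)
        V_new = (k / n) * (Z / q[:, None]).T @ Z
        V_new *= k / np.trace(V_new)
        if np.linalg.norm(V_new - V, "fro") < tol:
            return V_new
        V = V_new
    return V
```

On data drawn from a centred normal distribution with covariance diag(4, 1), the output should be close to diag(1.6, 0.4), the trace-normalised version of the covariance.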
The present work finds its source in the idea that one may define the shape of as the matrix making the origin of most central for the distribution of , in the halfspace depth sense, that is, as the value of maximizing the following depth.
Definition 2.1 (Tyler shape depth).
Let be a probability measure over and fix . (i) For any , the fixed- shape depth of with respect to is . (ii) The shape depth of with respect to is , where is the Tukey median of .
We will use the notation for both halfspace and Tyler shape depths, as the vector or matrix nature of the argument will remove any ambiguity. The fixed- shape depth can equivalently be defined as where the infimum is over all symmetric matrices ; see Lemma 1 in the Supplementary Material. While, in view of (2.2), can be seen as a sign-based mean concept for shape, the maximizer of Tyler shape depth is of a median nature. The main benefit of the proposed depth does not come from the deepest shape itself but rather from the ranking of shapes it provides; see Section 5.
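The characterisation through symmetric matrices suggests a simple, if crude, way to approximate the sample fixed-location depth. The sketch below assumes that the depth is the infimum, over symmetric matrices M, of the empirical probability that u'Mu is at least tr(M)/k, where u is the multivariate sign; this reading is inferred from the sets appearing in Lemma A.1. The infimum is replaced by a minimum over random symmetric matrices, so the result is an upper bound on the sample depth. All function and parameter names are hypothetical.

```python
import numpy as np

def sym_inv_sqrt(V):
    """Inverse of the symmetric positive definite square root of V."""
    lam, Q = np.linalg.eigh(V)
    return (Q / np.sqrt(lam)) @ Q.T

def tyler_shape_depth(V, X, theta, n_dir=2000, seed=0):
    """Random-search approximation of the fixed-theta shape depth of V."""
    rng = np.random.default_rng(seed)
    Z = (np.asarray(X, dtype=float) - theta) @ sym_inv_sqrt(V)
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    # Multivariate signs; the sign is set to 0 when x equals theta.
    U = np.divide(Z, norms, out=np.zeros_like(Z), where=norms > 0)
    k = U.shape[1]
    depth = 1.0
    for _ in range(n_dir):
        A = rng.standard_normal((k, k))
        M = (A + A.T) / 2                        # random symmetric matrix
        quad = np.einsum("ij,jk,ik->i", U, M, U)  # u_i' M u_i
        depth = min(depth, float(np.mean(quad >= np.trace(M) / k)))
    return depth
```

In dimension two and for normal data with covariance diag(4, 1), the shape diag(1.6, 0.4) should receive a depth close to 1/2, while the identity matrix should receive a visibly smaller value.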
Definition 2.1(ii) calls for some comments. Two approaches were considered in the literature for Tyler shape in the case of unspecified center: the Tyler (1987) plug-in approach, which replaces the unknown with some location functional, and the Hettmansperger and Randles (2002) approach, which jointly solves and ; existence of a unique solution to joint location and scatter M-estimating equations was studied in Maronna (1976) under ellipticity and in Tatsuoka and Tyler (2000) for non-elliptic cases. Outside the elliptical setup, these two approaches provide distinct shapes. In contrast, for the proposed depth, the plug-in and joint maximization approaches always lead to the same shape: irrespective of , the objective function is indeed maximized at and , since is, for any , maximized at .
An alternative way to obtain an unspecified location version of Tyler shape is to construct it on pairwise differences (Dümbgen, 1998). We will not investigate this for our shape depth, since the sample version of the resulting depth would lead to a much heavier computational burden.
3 Main properties
In this section, we study the main properties of the shape depth and of the corresponding depth regions . Topological statements for subsets of and for functions defined on will refer to the topology whose open sets are generated by balls of the form , where is the usual geodesic distance on : with the classical log mapping on , this distance is such that , where is the Frobenius norm of (Bhatia, 2007). We start with the following continuity result.
Theorem 3.1.
Let be a probability measure over and fix . Then, (i) is upper semicontinuous on ; (ii) the depth region is closed for any ; (iii) if is absolutely continuous with respect to the Lebesgue measure, then is also lower semicontinuous, hence continuous, on .
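The geodesic distance that defines the topology used in this section can be computed from an eigendecomposition. The sketch below assumes the standard affine-invariant form from Bhatia (2007), namely the Frobenius norm of the matrix logarithm of V1^{-1/2} V2 V1^{-1/2}; the function name `geodesic_distance` is illustrative.

```python
import numpy as np

def geodesic_distance(V1, V2):
    """Affine-invariant geodesic distance between two SPD matrices.

    Computed as sqrt(sum_j log(mu_j)^2), where the mu_j are the eigenvalues
    of V1^{-1/2} V2 V1^{-1/2}; this equals the Frobenius norm of the
    logarithm of that matrix.
    """
    lam, Q = np.linalg.eigh(V1)
    R = (Q / np.sqrt(lam)) @ Q.T           # V1^{-1/2}
    mu = np.linalg.eigvalsh(R @ V2 @ R)
    return float(np.sqrt(np.sum(np.log(mu) ** 2)))
```

This distance is symmetric in its two arguments and is unchanged when both matrices are transformed as A V A' for an invertible matrix A, which is what makes it natural for affine-invariant depth regions.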
We will say that a subset of is bounded if and only if for some ; since satisfies the triangle inequality, we need only consider balls centered at . Moreover, we will say that is smooth at if and only if , with . We then have the following result.
Theorem 3.2.
Let be a probability measure over and fix . Then the depth region is bounded and compact for any .
The main reason to work with geodesic distance rather than Frobenius distance is that, unlike , the metric space is complete; see, e.g., Proposition 10 in Bhatia and Holbrook (2006). This is what allows us to establish compactness in Theorem 3.2, which is the main ingredient for the following result.
Theorem 3.3.
Let be a probability measure over and fix . (i) If is non-empty, then there exists a shape maximizing . In particular, (ii) if is smooth at , then such a deepest shape exists.
While the previous result guarantees existence of a deepest shape for absolutely continuous probability measures, uniqueness is not guaranteed in general. Parallel to what is done for the Tukey median, we then define the fixed- shape matrix of as the barycenter of the deepest shape region of , that is, as the shape matrix satisfying
[TABLE]
with . Two remarks are in order. First, the integrals in (3.1) exist and are finite since is a bounded subset of : for any . Second, the following convexity result implies that has maximal depth.
Theorem 3.4.
Let be a probability measure over and fix . Then, (i) is quasi-concave: for with and ; (ii) the region is convex for any .
This defines the fixed- shape of a probability measure under the very mild condition that is non-empty, hence in particular when is smooth at . Of course, it is important that, under ellipticity, this agrees with the elliptical concept of shape provided in Section 1. The following Fisher consistency result confirms that this is the case.
Theorem 3.5.
Let be an elliptical probability measure over with location and shape . Then, for any , and, provided that , the equality holds if and only if . Letting be Beta with parameters and , the maximal depth is .
In this result, equals the probability that the generating variate associated to is equal to zero. Lemma 2 in Paindaveine and Van Bever (2017) implies that the maximal depth in Theorem 3.5 is monotone decreasing in if does not depend on , in which case the maximal depth is convergent as goes to infinity. Since has the same distribution as , where is -variate standard normal, the limit is equal to . The proof of Theorem 3.5 requires the following result.
Theorem 3.6.
Let be a probability measure over and fix . Then, for any shape matrix , any invertible matrix and any -vector ,
[TABLE]
where is the shape matrix proportional to .
This shows that the fixed- shape depth and the corresponding regions behave well under affine transformations, and in particular under changes of the measurement units. Affine invariance is a classical requirement in location depth (Zuo and Serfling, 2000).
Tyler shape depth is a sign concept in the sense that it depends on the underlying random vector only through its multivariate sign . In the elliptical case, it follows that, if the distribution does not charge the center of the distribution, this depth does not depend on the distribution of the underlying generating variate . More precisely, we have the following result.
Theorem 3.7.
Let be an elliptical probability measure over with location and shape . Then, (i) for some that does not depend on or on ,
[TABLE]
(ii) for ,
[TABLE]
with is Beta distributed with parameters and .
The function in this result does not depend on , so that depth, under ellipticity, depends on through and only, with the dependence on not affecting the induced ranking of shape matrices. It is easy to check that the explicit bivariate elliptical depth in (3.3) is compatible with the general results obtained above. While it seems very challenging to obtain an explicit expression for the function in (3.2), numerical experiments lead us to conjecture that, irrespective of the dimension , the mapping is of the form for some function .
The results of this section extend to the unspecified-location shape depth and to the corresponding regions . Theorems 3.1 to 3.4 hold for any fixed and their unspecified- versions are simply obtained by substituting for throughout. In particular, the existence of an unspecified-location deepest shape matrix is guaranteed if is smooth at , or, more generally, if is non-empty. Under unspecified location, the shape of is then defined as the barycenter of the set of shape matrices maximizing . In view of the affine equivariance of , i.e., , the affine-invariance/equivariance properties
[TABLE]
follow directly from Theorem 3.6, to which we refer for the definition of . Finally, Theorems 3.5 and 3.7 also readily extend to the unspecified-location case, since for any elliptical probability measure with location . In particular, if is elliptical with shape , then the unspecified- shape depth is uniquely maximized at , if the distribution is not degenerate at a single point.
4 Consistency
When -variate observations are available, we define the sample fixed- depth of a shape matrix as , where is the empirical probability measure associated with , and its unspecified-location version as . In this section, we state a Glivenko–Cantelli-type result for these sample depths and investigate consistency of max-depth shape estimators.
Theorem 4.1.
Let be a probability measure over and let denote the empirical probability measure associated with a random sample of size from . Then, (i) for any , almost surely as ; (ii) if is absolutely continuous with respect to the Lebesgue measure, then almost surely as .
We illustrate this result in the bivariate elliptical case associated with Theorem 3.7(ii). Figure 1 provides contour plots of in terms of and , for various bivariate, arbitrarily elliptical, probability measures. The sign nature of shape depth ensures that these contours, along with their empirical counterparts, are distribution-free in the class of elliptical distributions that do not charge the centre of symmetry. Figure 1 also reports the empirical contour plots obtained from a random sample of size drawn from the corresponding bivariate normal distributions. Clearly, the results support the consistency in Theorem 4.1(i).
In Section 3, the shape of was defined as the barycenter of the collection of -deepest shape matrices. In the empirical case, a natural estimator is the corresponding shape matrix computed from the empirical probability measure associated with the sample at hand; existence here follows from the fact that may only take values (). The same argument ensures the existence of the sample deepest shape in the unspecified-location case. The sample Tukey median was one of the first affine-equivariant location estimators with a high breakdown point. It would therefore be interesting to investigate whether the affine-equivariant shape estimator , parallel to the Maronna–Stahel–Yohai P-estimators of scatter, also has a high breakdown point (Tyler, 1994). Since this is beyond the scope of this paper, we focus on consistency of sample deepest shapes.
Theorem 4.2.
Let be a probability measure over and let denote the empirical probability measure associated with a random sample of size from . (i) Fix and assume that is non-empty. Then, almost surely as . (ii) If is absolutely continuous with respect to the Lebesgue measure, then almost surely as .
The specified- result in Theorem 4.2(i) holds in particular if is smooth at . The unspecified- result requires a more stringent smoothness assumption, namely absolute continuity of . This assumption, which is already present in Theorem 4.1(ii), is only needed to control the impact of replacing by in and . Figure 1 also supports Theorem 4.2(i) since, in each sample considered, the sample deepest shape is close to its population counterpart.
5 Two applications
5.1 Choosing a shape matrix estimator in principal component analysis
There is a vast literature on scatter or shape estimation. Among the most famous estimators are the minimum covariance determinant scatters . Recall that, in the empirical case, is the covariance matrix with the smallest determinant among covariance matrices computed using only a proportion of the observations. The choice of the trimming proportion is crucial, as the loss in efficiency can be very large if the trimming is excessive; see, for example, Croux and Haesbroeck (1999) or Paindaveine and Van Bever (2014). Choosing is therefore difficult, as it should be taken large, but not so large as to incorporate outliers. In this section, we consider robust principal component analysis based on the shape estimators and show that Tyler shape depth allows one to make an informed choice of .
For several contamination proportions , we independently generated bivariate samples of independent observations, each comprising clean observations and outliers. With bivariate normal with zero mean and covariance matrix and bivariate normal with mean and identity covariance matrix, the clean observations are equal to in distribution, whereas the outliers are distributed, in equal proportions, as or . Two simulations were conducted, one for and one for ; clearly, the former simulation provides a harder robustness problem than the latter. We consider estimating the first principal direction of the uncontaminated distribution. For any , a natural estimator is, up to a sign, the first eigenvector of . Denoting as this estimate in replication , estimation performance can be measured through the mean squared error
[TABLE]
where is the angle between the population first eigendirection and its estimate . Figure 2 plots as a function of ; the Monte Carlo exercise was performed for every value of . The results confirm that, for any contamination proportion , a suitable value of should be identified. The optimal value basically coincides with in the easy case , whereas, in the harder one , is slightly smaller than for large contaminations. This is no surprise: when outliers are hard to identify, the estimators , with , are likely to be based on some outliers, which will strongly affect the estimation performance.
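The performance measure can be illustrated as follows. The sketch assumes that the mean squared error is the average, over replications, of the squared angle between the true first eigendirection and its estimate, with the angle made insensitive to the sign of the eigenvectors; the function names are hypothetical.

```python
import numpy as np

def first_eigvec(S):
    """Unit eigenvector of S associated with its largest eigenvalue."""
    lam, Q = np.linalg.eigh(S)   # eigenvalues returned in ascending order
    return Q[:, -1]

def angle(v, v_hat):
    """Angle in [0, pi/2] between two directions, ignoring their signs."""
    c = abs(v @ v_hat) / (np.linalg.norm(v) * np.linalg.norm(v_hat))
    return float(np.arccos(min(1.0, c)))

def mse_angle(v, estimates):
    """Average squared angle between v and a list of estimated directions."""
    return float(np.mean([angle(v, e) ** 2 for e in estimates]))
```

Taking the absolute value of the inner product reflects the fact that an eigenvector is only defined up to a sign.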
In this framework, Tyler shape depth, as announced, may be very useful to select a suitable value of . We suggest choosing based on visual inspection of the curve , where denotes the empirical measure associated with the optimal subsample leading to . The rationale is the following: for small, will remain relatively high as long as no outlier is added to the optimal subsample. As increases and outliers are added in the computation of , the depth will sharply decrease, thereby forming a kink in . The selected for a given dataset, , should therefore be the largest value for which exhibits a stable behaviour. Figure 2 plots the curve for the values of and considered above and clearly illustrates the behaviour of the depth curves just described. When the outliers are easily identifiable, the kinks occur at , which coincides with . In the harder case, where outliers and clean data tend to be mixed, the selected value is still remarkably close to . In conclusion, Tyler shape depth, and the ranking of shape matrices it provides, yield an effective visual tool that allows the selection of a sensible trimming proportion in a data-driven way when conducting, e.g., a principal component analysis.
5.2 Outlier detection
For each trading day between February 1st, 2015 and February 1st, 2017, we collected the Nasdaq Composite and SP500 stock indices every five minutes and computed their returns, that is, the differences between two logs of consecutive index values. The returns on a given day form a bivariate dataset of usually observations, though the number of observations varies due to missing values; days with fewer than bivariate returns were discarded. The resulting dataset comprises observations on trading days.
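The return computation described above amounts to differencing log index values. A minimal sketch, assuming the intraday values of the two indices are stored as the columns of an array (the name `log_returns` and the toy numbers are illustrative only):

```python
import numpy as np

def log_returns(prices):
    """Differences of the logs of consecutive values, column by column."""
    return np.diff(np.log(np.asarray(prices, dtype=float)), axis=0)

# Toy example with three consecutive values of two hypothetical indices.
toy = np.array([[100.0, 50.0],
                [110.0, 55.0],
                [121.0, 55.0]])
toy_returns = log_returns(toy)   # one row fewer than the input
```

Each day then contributes a bivariate sample whose rows are these return vectors.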
Our analysis studies the joint behaviour of the bivariate returns in order to determine which trading days are atypical. An important source of atypicality is associated with the overall scale of the bivariate returns, which alternate between periods of high and low volatility. Such deviations can easily be detected by comparing the trace of any scatter measure on intraday data with that on the whole dataset, so we focus instead on detecting atypical joint volatility, i.e., days on which the ratios of the marginal volatilities or the correlations between the returns deviate greatly from their global behaviour.
Let denote the minimum covariance determinant shape estimator computed from the full collection of returns with maximal shape depth. More precisely, denoting as the empirical distribution of the full collection of returns, let , for . The value obtained is , with corresponding depth . This high depth value ensures that is an excellent proxy for the deepest shape matrix , so the computation of is unnecessary. Returns at the beginning of each trading period are known to be more volatile and should be discarded in shape estimation, so the robustness of is an obvious asset: the value of allows us to adaptively discard days on which the volatility deviates from its global pattern. The procedure discarded more than half of the corresponding intra-day returns for 17 days, and, remarkably, of these days lie within the two atypical periods mentioned in the next paragraph.
For each day , we evaluated the depth of the global shape estimate with respect to the empirical distribution of the bivariate returns on day . The left panel of Figure 3 presents the depth values . Vertical lines mark major events affecting the shape of the volatility, while the two greyed rectangles cover two periods during which the markets notoriously gave atypical returns: the first period follows the devaluation of the Yuan on August 11th, 2015 which saw rapid changes in the stock markets, including large devaluations on August 24th, event (a). The second period covers the beginning of 2016, when a slump in oil prices made stocks relying on oil very volatile compared to others. This resulted in atypical shape behaviour during January 22 – February 9; this last day, event (b), had the sharpest loss for the SP500 index. The other events are (c) the decision of the European Central Bank on March 10th, 2016 to extend quantitative easing thereby slashing interest rates, which had a significant positive impact on both the Nasdaq and SP500, but more pronounced for the latter, (d) the positive impact on the financial stocks following Fed officials’ comments on the possibility of rate hike made on May 27, 2016, and (e) the aftermath of Donald Trump’s election on November 9th. Detection of atypical observations was achieved by flagging outliers with a depth so low that it is outside the box-and-whiskers plot. This resulted in 12 flagged days, each either being one of the events described above or lying in one of the greyed regions.
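The flagging rule based on the box-and-whiskers plot can be sketched as follows. The sketch assumes the usual convention that the lower whisker ends at the first quartile minus 1.5 times the interquartile range; the text does not state which whisker convention was used, and the function name `flag_low_depth` is hypothetical.

```python
import numpy as np

def flag_low_depth(depths, whisker=1.5):
    """Boolean mask of days whose depth falls below the lower whisker.

    Assumption: lower fence = Q1 - whisker * (Q3 - Q1), with quartiles
    computed by numpy's default linear interpolation.
    """
    d = np.asarray(depths, dtype=float)
    q1, q3 = np.percentile(d, [25, 75])
    return d < q1 - whisker * (q3 - q1)
```

Only unusually low depth values are flagged, since a high depth indicates that the global shape estimate fits the returns of that day well.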
We also computed the halfspace shape depth of the global estimate for each day (Paindaveine and Van Bever, 2018). The right panel of Figure 3, a plot of versus , shows a clear positive association. Halfspace shape depth values seem to have a higher concentration than Tyler’s, because the former maximizes a concept of scatter depth in scale and may be able to find scatter estimates better suited to the data. Indeed, a decrease in volatility in one of the marginals might be balanced by considering a scatter with a smaller scale which would have a large depth value. A byproduct of this is the fact that, when evaluating halfspace shape depth, the difficult maximisation step in scale seems to be crucial in correctly computing the depth ranking of the data, which can be affected by small deviations. More importantly, while events (a) and (b) receive low depth with respect to both concepts, only Tyler shape depth succeeds in flagging days associated with events (c) to (e) as outlying.
6 Hypothesis testing for shape
In the previous section, we presented two specific applications of shape depth. The concept also allows us to tackle more standard inference problems for shape, such as point estimation and hypothesis testing. Here, we consider testing against at level , where is fixed, based on a random sample from a -variate elliptical distribution with known location and unknown shape . In view of Theorem 3.5, a natural depth-based test, say, rejects the null for small values of , where is the empirical distribution of . Since is discrete, achieving null size in general requires randomization. The resulting test thus rejects the null hypothesis if , rejects the null hypothesis with probability if , and does not reject the null hypothesis if , where is the null -quantile of and is the amount of randomization. Under the assumption that does not charge the center of the distribution, is distribution-free under the null hypothesis, which allows estimating and arbitrarily well through simulations. Prior to applying the test below for at level with sample sizes , , these were estimated from mutually independent standard normal samples for each sample size, yielding , , and . Distribution-freeness of under the null hypothesis actually extends to the class of distributions with elliptical directions (Randles, 2000).
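The calibration of the randomized test from simulated null values of the statistic can be sketched as follows. The code assumes that the critical value is the smallest value t whose null cumulative probability is at least the nominal level, and that the randomization probability is chosen so that the rejection probability is exactly the level in the simulated null distribution. The names `calibrate` and `depth_test` are hypothetical.

```python
import numpy as np

def calibrate(null_stats, alpha):
    """Return (t, gamma): critical value and randomization probability."""
    s = np.asarray(null_stats, dtype=float)
    values, counts = np.unique(s, return_counts=True)
    cdf = np.cumsum(counts) / s.size
    j = int(np.searchsorted(cdf, alpha, side="left"))  # first cdf >= alpha
    t = float(values[j])
    below = float(cdf[j - 1]) if j > 0 else 0.0          # P[T < t]
    gamma = (alpha - below) / (counts[j] / s.size)       # solves size = alpha
    return t, float(gamma)

def depth_test(stat, t, gamma, rng):
    """Reject if stat < t, reject with probability gamma if stat == t."""
    if stat < t:
        return True
    if stat == t:
        return bool(rng.random() < gamma)
    return False
```

The exact comparison `stat == t` is meaningful here only because the statistic is discrete; the observed and simulated statistics should therefore be computed by the same routine so that tied values compare equal.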
We performed two simulations in the bivariate case. The first considers the problem of testing the null hypothesis of sphericity about and compares the finite-sample powers of with those of some competitors. For each value of we generated independent random samples of size from the normal distribution with location and shape
[TABLE]
and from the corresponding elliptical Cauchy distribution. The value corresponds to the null hypothesis, whereas provide increasingly severe alternatives. We took and for the normal and Cauchy samples in order to obtain roughly the same rejection frequencies in both cases.
For each sample, we carried out six tests at nominal level : (i) the test described above; (ii) the Gaussian test from John (1972), or more precisely, its extension to elliptical distributions with finite fourth-order moments from Hallin and Paindaveine (2006); (iii) the sign test from Hallin and Paindaveine (2006); (iv) the Wald test based on the Tyler (1987) scatter matrix; (v)–(vi) the tests from Paindaveine and Van Bever (2014) based on the shape estimator in Section 5, with and . The tests (ii)–(vi) were performed based on their asymptotic null distribution. The rejection frequencies in Figure 4 reveal that performs very similarly to, although it may be slightly dominated by, the sign-based tests in (iii)–(iv) but performs very well under heavy tails, where it beats all other tests. As expected, the Gaussian test collapses under heavy tails and the minimum covariance determinant tests show low empirical power.
The second simulation tests , with and specified location , and compares the tests above in terms of the level robustness (He et al., 1990). We considered mixture distributions with several contamination levels . Here, is a bivariate, normal or elliptical Cauchy, null random vector. The contamination random vector was chosen as follows: (a) has the same distribution as the vector obtained by rotating about the origin by degrees; (b) has the same elliptical distribution as but its shape is ; (c) is obtained by multiplying the vector in (b) by four. The uncontaminated distribution puts more mass along the horizontal axis. In (a), the contamination typically shows along the main bisector, whereas the contamination in (b) is uniformly distributed over the unit circle. As for (c), the contamination combines the directional feature of (b) with radial outlyingness. For each combination of distribution, normal or Cauchy, of contamination pattern, (a)–(c), and of contamination level, or , we generated independent random samples of size . Figure 5 plots the resulting rejection frequencies and reveals the very good robustness of the depth-based test ; recall that, irrespective of , the target rejection frequency is here . In particular, always dominates its sign-based competitors (iii)–(iv). The minimum covariance determinant tests (v)–(vi) dominate in terms of robustness but exhibit poor finite-sample power. Radial outliers strongly affect the Gaussian test.
Summing up, the test associated with the proposed shape depth provides a good balance between efficiency and robustness. The improved robustness compared to its sign-based competitors is obtained at a very slight loss of power. Depth-based procedures can thus be defined for standard inference problems on shape, and will tend to perform as well as sign-based procedures. As shown in Section 5, however, shape depth provides a whole ranking of shape matrices that allows addressing less standard applications.
7 Perspectives for future research
The present work offers quite rich research perspectives. The asymptotic distributions of the sample depths and as well as those of the corresponding deepest shape estimators could be studied. Investigating the robustness properties of these shape estimators would also be of interest, in particular to see whether these estimators have a high breakdown point. Regarding hypothesis testing, it would be desirable to define depth-based tests for other shape problems, such as testing the null hypothesis that two populations share the same shape.
Another key point is related to computational aspects. Since Tyler shape depth was defined through halfspace depth, it can in principle be evaluated by using the numerous packages that are dedicated to halfspace depth. The definition of Tyler shape depth suggests that evaluation of this depth in dimension requires the computation of halfspace depth in dimension . Fortunately, redundancies in the random vector reduce the dimension from to as shown by the following result.
Theorem 7.1.
Let be a probability measure over and fix . Let be the vector stacking the lower-diagonal entries of on top of each other and be deprived of its first component. Then, , with .
It follows that, for and , Tyler shape depth dominates its halfspace counterpart from Paindaveine and Van Bever (2018) from a computational point of view. There is, though, probably room for ad hoc algorithms to compute Tyler shape depth more efficiently. It would also be desirable to design iterative algorithms for the computation of deepest shape matrices.
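As a rough complement to the halfspace-depth route, the sample depth can also be approximated by random search. The sketch below assumes the representation used in the proofs of the Appendix (Lemma A.1), namely that the depth of $V$ is the infimum over symmetric matrices $M$ of the probability of the set $C^{M}_{\theta,V}$. It replaces the infimum with a minimum over randomly drawn matrices, so it only returns an upper bound on the sample depth. The function name `tyler_shape_depth_mc` and its parameters are hypothetical, and this is not the algorithm of the paper.

```python
import numpy as np

def tyler_shape_depth_mc(X, V, theta=None, n_dirs=2000, seed=0):
    """Random-search upper bound on the sample Tyler shape depth of V.

    Uses min over random symmetric M of the fraction of observations x != theta
    with u' M u >= tr(M)/k, where u is the direction of V^{-1/2} (x - theta).
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    theta = np.zeros(k) if theta is None else np.asarray(theta, dtype=float)
    Z = X - theta
    keep = np.any(Z != 0.0, axis=1)        # points equal to theta lie outside C^M
    w, Q = np.linalg.eigh(V)
    V_inv_sqrt = (Q / np.sqrt(w)) @ Q.T    # symmetric inverse square root of V
    Y = Z[keep] @ V_inv_sqrt
    U = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    best = 1.0
    for _ in range(n_dirs):
        G = rng.standard_normal((k, k))
        M = G + G.T                        # random symmetric matrix
        quad = np.einsum("ij,jk,ik->i", U, M, U)
        freq = np.sum(quad >= np.trace(M) / k) / n
        best = min(best, freq)
    return best

# Hypothetical usage on spherical Gaussian data: the identity shape should be
# much deeper than a strongly elongated shape.
rng = np.random.default_rng(1)
data = rng.standard_normal((2000, 2))
depth_identity = tyler_shape_depth_mc(data, np.eye(2), n_dirs=500)
depth_elongated = tyler_shape_depth_mc(data, np.diag([16.0, 1.0 / 16.0]), n_dirs=500)
```

Random search degrades quickly as the dimension grows, since the space of symmetric matrices has dimension $k(k+1)/2$; it is meant as a sanity check in low dimension rather than as a substitute for exact halfspace-depth algorithms.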
Appendix A
As in the main manuscript, will refer to probability under the probability measure at hand. However, it will sometimes be needed to emphasize the underlying probability measure, in which case we will write , , , etc.
Many of the subsequent results require the following lemma.
Lemma A.1.
Let be a probability measure over and fix . Write $C^{M}_{\theta,V}=\big\{x\in\mathbb{R}^{k}\setminus\{\theta\}:(u_{\theta,V}^{x})^{T}Mu_{\theta,V}^{x}\geq{\rm tr}(M)/k\big\}$ and $\tilde{C}^{M}_{\theta,V}=\big\{x\in\mathbb{R}^{k}:(u_{\theta,V}^{x})^{T}Mu_{\theta,V}^{x}\geq{\rm tr}(M)/k\big\}$, where is defined as if and as $0$ otherwise. Then, for any and any ,
[TABLE]
where collects the symmetric matrices with arbitrary trace, is the subset of of matrices with trace , and where is the collection of matrices in with Frobenius norm one.
Proof.
It directly follows from the definition of Tyler shape depth that
[TABLE]
When runs over , the matrix satisfying runs over the collection of matrices. Since for any , this yields
[TABLE]
Letting be equal to one if condition A holds and to zero otherwise, this provides
[TABLE]
where we have used the fact that ${\rm pr}\big(C_{\theta,V}^{M}\big)$ is unchanged when is replaced with for any . The same invariance property explains that the infimum over in (A.2) may be replaced with an infimum over for any . Finally, the result for follows from (A.1) by noting that for any and that cannot provide the infimum in (A.1). The proof is complete. ∎
Proof of Theorem 3.1.
(i) Fix and consider , where was defined in Lemma A.1. Since is closed, the mapping is upper semicontinuous for weak convergence. Now, Slutsky’s lemma entails that, as , the measure defined by converges weakly to the one defined by . Therefore, is upper semicontinuous at . From Lemma A.1, we then obtain that
[TABLE]
is upper semicontinuous, as it is the infimum of a collection of upper semicontinuous functions. (ii) The result follows from the fact that the depth region is the inverse image of by the upper semicontinuous function . (iii) Fix a sequence in such that . In view of Lemma A.1 again, we can, for any , pick such that . Compactness of ensures that we can extract a subsequence of that converges to . Writing for the indicator function of the set , the dominated convergence theorem then yields that
[TABLE]
as . The absolute continuity assumption on guarantees that -almost everywhere. Consequently,
[TABLE]
We conclude that, if is absolutely continuous with respect to the Lebesgue measure, then is also lower semicontinuous, hence continuous. ∎
The proof of Theorem 3.2 requires the following result.
Lemma A.2.
Let be a probability measure over and fix . Write if and 0 otherwise. For any , further let , so that . Then, as .
Proof of Lemma A.2.
Since is increasing in over and is larger than or equal to for any positive , we have that exists and is such that . Now, fix a decreasing sequence converging to $0$ and consider an arbitrary sequence such that
[TABLE]
Since is compact, we can consider a subsequence that converges to ; without loss of generality, we can of course assume that this subsequence is such that is an increasing sequence. Let then . Clearly, is a decreasing sequence of sets with , so that
[TABLE]
Now, for any , we have ${\rm pr}\big[u_{\theta}^{X}\in\cup_{v\in C_{\ell}}\{y:|v^{T}y|\leq c_{n_{\ell}}\}\big]\geq{\rm pr}(|v_{n_{\ell}}^{T}u_{\theta}^{X}|\leq c_{n_{\ell}})\geq t_{\theta,P}(c_{n_{\ell}})-(1/n_{\ell})$, which implies that . ∎
Proof of Theorem 3.2.
Fix and denote as the largest eigenvalue of . Similarly, denote . Possible ties are unimportant below. Letting and be arbitrary corresponding unit eigenvectors, Lemma A.1 provides, with
[TABLE]
where we used the inequality which follows from the constraint , and where is defined in Lemma A.2. Therefore,
[TABLE]
Now, ad absurdum, take such that is unbounded. This implies that there exists a sequence in satisfying for any and for which . Since , we must have that . Lemma A.2 and (A.3) then imply that for large enough, a contradiction. Consequently, is bounded for any .
Now, Lemma C.1 in Paindaveine and Van Bever (2018) readily implies that a bounded subset of is also totally bounded, in the sense that, for any , it can be covered by finitely many balls of the form . Part (i) of the result and Theorem 3.1(ii) thus entail that, for any , the region is closed and totally bounded. The result then follows from the completeness of the metric space . ∎
Proof of Theorem 3.3.
Let . By assumption, is non-empty. Thus, and the result holds if . We may therefore assume that . For any , pick then in , where is defined as for . Fix . For large enough, all terms of the sequence belong to the compact set ; see Theorem 3.2. Thus, there exists a subsequence that converges in , to say. For any , all eventually belong to the closed set , so that . Therefore, for any such , which establishes the result. ∎
The proof of Theorem 3.4 requires the following preliminary result.
Lemma A.3.
For any and any symmetric matrix , the mapping is quasi-convex, that is, for any and any , , with .
Proof.
We treat two cases separately. (i) Assume first that . Write
[TABLE]
Since , the weighted harmonic-arithmetic matrix inequality then shows that, for any ,
[TABLE]
as was to be shown; we refer to Lemma 2.1(vii) in Lawson and Lim (2013) for the aforementioned inequality. (ii) Assume then that . Without loss of generality, assume that and . If , then for any and the result trivially holds. Hence, we may assume that or , which implies that for a unique . From continuity, pick then such that, for any ,
[TABLE]
By applying Part (i) of the proof with and , we obtain that, for any ,
[TABLE]
Since ${\rm tr}(MV_{t})y^{T}V^{-1}_{t}y\leq 0\leq\max\big\{{\rm tr}(MV_{a})y^{T}V^{-1}_{a}y,{\rm tr}(MV_{b})y^{T}V^{-1}_{b}y\big\}$ for any , the result follows. ∎
Proof of Theorem 3.4.
(i) Write , where and are fixed. First note that, letting , Lemma A.1 yields
[TABLE]
Writing again , Lemma A.3 thus yields that, for any ,
[TABLE]
The result then follows from (A.4). (ii) If , then Part (i) of the result entails that , so that . ∎
The proof of Theorem 3.5 requires the following two lemmas.
Lemma A.4.
Let be elliptical over with location $0$ and shape . Then, where is uniformly distributed over the unit sphere .
Lemma A.5.
Let be elliptical over with location $0$ and shape . Then, for any , , where is uniformly distributed over .
Proof of Lemma A.4.
In the spherical setup considered, we have that, for any ,
[TABLE]
where is uniform over . Lemma A.1 then entails that
[TABLE]
Decomposing into , where is a orthogonal matrix and where is a diagonal matrix, this yields
[TABLE]
By using successively the facts that and for any , we obtain
[TABLE]
The result then follows from Theorem 2 from Paindaveine and Van Bever (2017), that states that the last infimum in (A.5) is equal to . ∎
Proof of Lemma A.5.
Fix and let be a random -vector with . Write , where is a orthogonal matrix and is a diagonal matrix with . The affine invariance property from Theorem 3.6 entails that
[TABLE]
Denoting by the first vector of the canonical basis of , we then have
[TABLE]
where is uniform over . To have $D_{0}(V,P^{X})={\rm pr}(X\neq 0)\,{\rm pr}\big(U_{1}^{2}\geq 1/k\big)$, the inequality in (A.7) needs to be an equality, which requires that for all , hence that . ∎
We can now prove Theorem 3.5.
Proof of Theorem 3.5.
Lemmas A.4–A.5 establish the result in the spherical case associated with and . For general values of and , note that is elliptical with location $0$, shape , and satisfies . Writing
[TABLE]
affine invariance then entails that
[TABLE]
with equality if and only if , that is, if and only if . ∎
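The maximal depth value appearing in Lemmas A.4–A.5 involves the probability ${\rm pr}(U_{1}^{2}\geq 1/k)$, where $U$ is uniform over the unit sphere. As a quick numerical illustration (a sketch with a hypothetical function name, not part of the paper), this probability can be estimated by Monte Carlo, generating $U$ as a normalized standard Gaussian vector. It equals $1/2$ for $k=2$, and $1-1/\sqrt{3}$ for $k=3$ since $U_{1}$ is then uniform over $[-1,1]$.

```python
import numpy as np

def max_depth_factor(k, n_draws=200_000, seed=0):
    """Monte Carlo estimate of pr(U_1^2 >= 1/k) for U uniform on the unit sphere of R^k."""
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((n_draws, k))
    # A normalized standard Gaussian vector is uniform over the unit sphere.
    u1_squared = G[:, 0] ** 2 / np.sum(G ** 2, axis=1)
    return float(np.mean(u1_squared >= 1.0 / k))

# Estimates for a few dimensions.
estimates = {k: max_depth_factor(k) for k in (2, 3, 5, 10)}
```

Multiplying such an estimate by ${\rm pr}(X\neq 0)$ gives the depth of the true shape in the spherical case, in line with the identity displayed in the proof of Lemma A.5.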
Proof of Theorem 3.6.
In the proof of Theorem 3.4, we showed that
[TABLE]
Using the fact that for some orthogonal matrix , this readily yields
[TABLE]
as was to be shown. The affine-equivariance property of the depth regions readily follows. ∎
The proof of Theorem 3.7 requires the following lemma, whose proof is straightforward, hence is omitted.
Lemma A.6.
For any such that , we have
[TABLE]
Proof of Theorem 3.7.
(i) If is elliptical with location and shape , then is equal in distribution to , where is uniformly distributed over the unit sphere and is independent of the nonnegative random variable . Theorem 3.6 then yields
[TABLE]
where is as in (A.8). Now, for any , Lemma A.1 entails that
[TABLE]
Combining with (A.9), we obtain
[TABLE]
which establishes Part (i) of the result. (ii) Assume that is bivariate standard normal and fix . We aim at evaluating
[TABLE]
see (A). To do so, it will be convenient to parametrise and the matrix as
[TABLE]
with and . Indeed, makes the probability in (A.11) equal to one, which cannot be the infimum. Decomposing into , where is a orthogonal matrix and where , with , involves the eigenvalues of or, equivalently, of , we have
[TABLE]
where is still bivariate standard normal. Since for any , we have
[TABLE]
which allows us to restrict to positive values of . We will show below that for any . A direct computation shows that, for ,
[TABLE]
and
[TABLE]
Since does not depend on , (A) leads to
[TABLE]
It is easy to check that is differentiable over with a derivative of the form , where for any , and that
[TABLE]
We treat the cases and separately.
(a) Assume that . If , then and Theorem 3.5 establishes the result. If , then has no critical point and
[TABLE]
and
[TABLE]
so that (A.14) yields
[TABLE]
where we have used the fact that if has a Fisher-Snedecor distribution, then has a distribution.
(b) Assume now that . Then the only critical point of is , so that, irrespective of the fact that this critical point is a local minimum/maximum of ,
[TABLE]
and
[TABLE]
Lemma A.6 yields
[TABLE]
and
[TABLE]
hence also
[TABLE]
Therefore, (A.14) finally provides
[TABLE]
This proves the result for the case where is bivariate standard normal. The general result then follows from Part (i) of the theorem. ∎
Proof of Theorem 4.1.
(i) Let and be two probability measures over and fix . Fix and assume, without loss of generality, that . Lemma A.1 entails that there exists such that ${\rm pr}_{P}\big(C^{M_{0}}_{\theta,V}\big)\leq D_{\theta}(V,P)+\varepsilon$, where we still use the notation $C^{M}_{\theta,V}=\big\{x\in\mathbb{R}^{k}\setminus\{\theta\}:(u_{\theta,V}^{x})^{T}Mu_{\theta,V}^{x}\geq{\rm tr}(M)/k\big\}$. Consequently, using Lemma A.1 again,
[TABLE]
with . Since this holds for any and for any , we have
[TABLE]
It thus only remains to show that is a Vapnik-Chervonenkis class. To do so, note that $C^{M}_{\theta,V}=\big\{x\in\mathbb{R}^{k}\setminus\{\theta\}:(x-\theta)^{T}V^{-1/2}MV^{-1/2}(x-\theta)\geq 0\big\}$, so that , with $D_{\theta,A}=\big\{x\in\mathbb{R}^{k}:(x-\theta)^{T}A(x-\theta)\geq 0\big\}$. Theorem 4.6 from Dudley (2014) implies that is a Vapnik-Chervonenkis class. It then follows from Lemma 2.6.17(ii) in van der Vaart and Wellner (1996) that , hence also , is a Vapnik-Chervonenkis class. (ii) The proof is long and technical, but follows along the same lines as the proof of Theorem 2.2 in Paindaveine and Van Bever (2018), hence is omitted for the sake of brevity. ∎
Proof of Theorem 4.2.
(i) Recall from (3.1) that is defined as the barycentre of , with . The mapping is upper semicontinuous (Theorem 3.1) and constant over . Clearly, it is easy to define a mapping that is upper semicontinuous, agrees with in the complement of , and for which is the unique maximizer. By using Theorem 4.1, it follows from Theorem 2.12 and Lemma 14.3 in Kosorok (2008) that almost surely as . Part (i) of the result then follows from the fact that, in neighbourhoods of the form , there exists a constant such that , where is the Frobenius distance. (ii) The proof is entirely similar, hence is omitted. ∎
Proof of Theorem 7.1.
Let . Since , there exists a full-rank matrix such that Therefore, there exists a full-rank matrix such that . One can, for example, take , where is the usual duplication matrix. It follows that
[TABLE]
where we used the fact that has full column rank. ∎
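The redundancy exploited in Theorem 7.1 can be checked numerically. The sketch below reads "stacking the lower-diagonal entries" as the usual vech operator (an assumption) and uses hypothetical function names. For a unit vector $u$, the matrix $uu^{T}$ is symmetric with trace one, so that its vech, once deprived of its first component, has length $k(k+1)/2-1$ and still determines the dropped entry $u_{1}^{2}$.

```python
import numpy as np

def vech(A):
    """Stack the lower-triangular entries of A, column by column."""
    k = A.shape[0]
    return np.concatenate([A[j:, j] for j in range(k)])

def reduced_vech_of_direction(u):
    """vech(u u^T) for the normalized u, deprived of its first component."""
    u = np.asarray(u, dtype=float)
    u = u / np.linalg.norm(u)
    return vech(np.outer(u, u))[1:]

def first_component_from_rest(w, k):
    """Recover u_1^2 from the reduced vector, using tr(u u^T) = 1."""
    # Positions, in the reduced vector, of the diagonal entries (j, j), j >= 1.
    diag_pos = np.cumsum([k - j for j in range(k - 1)]) - 1
    return 1.0 - np.sum(w[diag_pos])

# Hypothetical usage in dimension k = 4.
w = reduced_vech_of_direction([1.0, 2.0, 3.0, 4.0])
u1_squared = first_component_from_rest(w, 4)
```

Since no information is lost, halfspace-depth computations can be carried out on the reduced vectors, which is what makes the lower dimension in Theorem 7.1 sufficient.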
References
- Bhatia, R. (2007) Positive Definite Matrices. Princeton, NJ: Princeton University Press.
- Bhatia, R. and Holbrook, J. (2006) Riemannian geometry and matrix geometric means. Linear Algebra Appl., 413, 594–618.
- Chen, M., Gao, C. and Ren, Z. (2018) Robust covariance and scatter matrix estimation under Huber’s contamination model. Ann. Statist., 46, 1932–1960.
- Croux, C. and Haesbroeck, G. (1999) Influence function and efficiency of the minimum covariance determinant scatter matrix estimator. J. Multivariate Anal., 71, 161–190.
- Dudley, R. M. (2014) Uniform Central Limit Theorems. Cambridge University Press, 2nd edn.
- Dümbgen, L. (1998) On Tyler’s M-functional of scatter in high dimension. Ann. Inst. Statist. Math., 50, 471–491.
- Hallin, M. and Paindaveine, D. (2006) Semiparametrically efficient rank-based inference for shape. I. Optimal rank-based tests for sphericity. Ann. Statist., 34, 2707–2756.
- He, X., Simpson, D. and Portnoy, S. (1990) Breakdown robustness of tests. J. Amer. Statist. Assoc., 85, 446–452.
