Tyler shape depth

Davy Paindaveine; Germain Van Bever

arXiv:1706.00666·math.ST·December 3, 2018

TL;DR

This paper introduces Tyler shape depth, a new data depth concept for shape matrices in multivariate analysis, enabling robust estimation, hypothesis testing, and ranking of shapes based on data directions.

Contribution

It proposes Tyler shape depth, a novel depth measure for shape matrices, with theoretical properties and applications in estimation, testing, and outlier detection.

Findings

01

Proves invariance, quasi-concavity, and continuity of Tyler shape depth.

02

Establishes existence and Fisher consistency of the deepest shape matrix.

03

Derives consistency results and a Glivenko-Cantelli-type theorem.

Abstract

In many problems from multivariate analysis, the parameter of interest is a shape matrix, that is, a normalized version of the corresponding scatter or dispersion matrix. In this paper, we propose a depth concept for shape matrices that involves data points only through their directions from the center of the distribution. We use the terminology Tyler shape depth since the resulting estimator of shape, namely the deepest shape matrix, is the median-based counterpart of the M-estimator of shape of Tyler (1987). Beyond estimation, shape depth, like its Tyler antecedent, also allows hypothesis testing on shape. Its main benefit, however, lies in the ranking of shape matrices it provides, whose practical relevance is illustrated in principal component analysis and in shape-based outlier detection. We study the invariance, quasi-concavity and continuity properties of Tyler shape depth, the…

Equations

$$D(\theta,P)=\inf_{u\in\mathcal{S}^{k-1}}{\rm pr}\{u^{T}(X-\theta)\geq 0\};$$

$$V=\frac{k}{{\rm tr}(\Sigma)}\,\Sigma=\frac{k}{{\rm tr}(\Sigma_{P})}\,\Sigma_{P}.$$

$$E(W_{\theta,V})=0.$$

$$0=\arg\min_{m\in\mathbb{R}^{k^{2}}}E(\|W_{\theta,V}-m\|^{2}).$$

$${\rm vec}\,V_{\theta,P}=\int_{{\rm vec}\,R_{\theta}(\alpha_{*},P)}v\,dv\Big/\int_{{\rm vec}\,R_{\theta}(\alpha_{*},P)}dv,$$

$$D_{A\theta+b}\big(V_{A},P^{AX+b}\big)=D_{\theta}(V,P^{X}),\quad R_{A\theta+b}(\alpha,P^{AX+b})=\big\{V_{A}:V\in R_{\theta}(\alpha,P)\big\},$$

$$D_{\theta_{0}}(V,P)=(1-{\rm pr}[\{\theta_{0}\}])\,h\bigg\{\frac{k(V_{0}^{-1/2}VV_{0}^{-1/2})}{{\rm tr}(V_{0}^{-1}V)}\bigg\};$$

$$D_{\theta_{0}}(V,P)=(1-{\rm pr}[\{\theta_{0}\}])\,{\rm pr}\bigg(Y_{2}\geq\frac{1}{2}+\frac{1}{2}\bigg[1-\det\bigg\{\frac{2V_{0}^{-1}V}{{\rm tr}(V_{0}^{-1}V)}\bigg\}\bigg]^{1/2}\,\bigg),$$

$$D\big(V_{A},P^{AX+b}\big)=D(V,P^{X}),\quad R(\alpha,P^{AX+b})=\big\{V_{A}:V\in R(\alpha,P)\big\}$$

$${\rm MSE}_{\gamma}=\frac{1}{R}\sum_{r=1}^{R}(\Delta\alpha_{r,\gamma})^{2},$$

V_{ℓ, ξ} = I_{2} + ℓ ξ (0.5 - 1 1 0.5)

$$D_{\theta}(V,P)=\inf_{M\in\mathcal{M}^{\rm all}_{k}}{\rm pr}\big(\tilde{C}^{M}_{\theta,V}\big)=\inf_{M\in\mathcal{M}^{\rm all}_{k,F}}{\rm pr}\big(\tilde{C}^{M}_{\theta,V}\big)=\inf_{M\in\mathcal{M}^{\rm all}_{k}}{\rm pr}\big(C^{M}_{\theta,V}\big)=\inf_{M\in\mathcal{M}^{r}_{k}}{\rm pr}\big(C^{M}_{\theta,V}\big),$$

$$D_{\theta}(V,P)=\inf_{v\in\mathbb{R}^{k^{2}}}{\rm pr}\big[\big\{x\in\mathbb{R}^{k}:v^{T}{\rm vec}\,\{u_{\theta,V}^{x}(u_{\theta,V}^{x})^{T}-(1/k)I_{k}\}\geq 0\big\}\big].$$

$$D_{\theta}(V,P)$$

$$D_{\theta}(V,P)=\inf_{M\in\mathcal{M}_{k}^{\rm all}}\Big({\rm pr}\big(C_{\theta,V}^{M}\big)+{\rm pr}[\{\theta\}]\,\mathbb{I}\big\{{\rm tr}(M)\leq 0\big\}\Big)=\inf_{M\in\mathcal{M}_{k}^{\rm all}}{\rm pr}\big(C_{\theta,V}^{M}\big),$$

$$V\mapsto D_{\theta}(V,P)=\inf_{M\in\mathcal{M}_{k}^{\rm all}}{\rm pr}\big(\tilde{C}_{\theta,V}^{M}\big),$$

$${\rm pr}(\tilde{C}^{M_{n_{\ell}}}_{\theta,V_{n_{\ell}}})-{\rm pr}(\tilde{C}^{M_{0}}_{\theta,V_{0}})=\int_{\mathbb{R}^{k}}\big\{\mathbb{I}(\tilde{C}^{M_{n_{\ell}}}_{\theta,V_{n_{\ell}}})-\mathbb{I}(\tilde{C}^{M_{0}}_{\theta,V_{0}})\big\}\,dP\to 0$$

$$\liminf_{n\to\infty}D_{\theta}(V_{n},P)=\liminf_{n\to\infty}{\rm pr}\big(\tilde{C}^{M}_{\theta,V_{n}}\big)=\liminf_{\ell\to\infty}{\rm pr}\big(\tilde{C}^{M_{n_{\ell}}}_{\theta,V_{n_{\ell}}}\big)={\rm pr}\big(\tilde{C}^{M_{0}}_{\theta,V_{0}}\big)\geq D_{\theta}(V_{0},P).$$

$${\rm pr}(|v_{n}^{T}u_{\theta}^{X}|\leq c_{n})\geq t_{\theta,P}(c_{n})-(1/n).$$

$$\lim_{\ell\to\infty}{\rm pr}\big[u_{\theta}^{X}\in\cup_{v\in C_{\ell}}\{y:|v^{T}y|\leq c_{n_{\ell}}\}\big]={\rm pr}\big[u_{\theta}^{X}\in\{y:|v_{0}^{T}y|\leq 0\}\big]={\rm pr}\big(|v_{0}^{T}u_{\theta}^{X}|=0\big).$$

$$D_{\theta}(V,P)$$

$$D_{\theta}(V,P)\leq{\rm pr}\big[\lambda_{1}(V^{-1})\{v_{1}^{T}(V)u_{\theta}^{X}\}^{2}\leq k\big]\leq t_{\theta,P}\big[\{k\lambda_{k}(V)\}^{1/2}\big].$$

$$\frac{V_{t}}{{\rm tr}(MV_{t})}=(1-s_{t})\frac{V_{a}}{{\rm tr}(MV_{a})}+s_{t}\frac{V_{b}}{{\rm tr}(MV_{b})},\quad\text{with}\quad s_{t}=\frac{t\,{\rm tr}(MV_{b})}{(1-t)\,{\rm tr}(MV_{a})+t\,{\rm tr}(MV_{b})}.$$

$$y^{T}\bigg\{\frac{V_{t}}{{\rm tr}(MV_{t})}\bigg\}^{-1}y$$

$${\rm tr}(MV_{t})\,y^{T}V_{t}^{-1}y$$

$${\rm tr}(MV_{t})\,y^{T}V_{t}^{-1}y$$

$$D_{\theta}(V,P)$$

$${\rm pr}\big\{(X-\theta)^{T}M(X-\theta)\geq(1/k)\,{\rm tr}(MV_{t})\,d^{2}_{\theta}(V_{t}),\ X\neq\theta\big\}$$

$${\rm pr}\big\{(X^{T}MX)/\|X\|^{2}\geq{\rm tr}(M)/k,\,X\neq 0\big\}={\rm pr}\big\{U^{T}MU\geq{\rm tr}(M)/k\big\}\,{\rm pr}(X\neq 0),$$

$$D_{0}(I_{k},P)=(1-{\rm pr}[\{0\}])\inf_{M\in\mathcal{M}_{k}^{\rm all}}{\rm pr}\big\{U^{T}MU\geq{\rm tr}(M)/k\big\}.$$

Full text

Tyler Shape Depth

Davy Paindaveine∗ and Germain Van Bever†

∗ ECARES and Department of Mathematics, Université libre de Bruxelles, Avenue F.D. Roosevelt, 50, CP114/04, B-1050, Brussels, Belgium

† Department of Mathematics and Namur Institute for Complex Systems, Université de Namur, Rempart de la Vierge, 8, 5000, Namur, Belgium

Abstract

In many problems from multivariate analysis, the parameter of interest is a shape matrix, that is, a normalized version of the corresponding scatter or dispersion matrix. In this paper, we propose a depth concept for shape matrices that involves data points only through their directions from the center of the distribution. We use the terminology Tyler shape depth since the resulting estimator of shape, namely the deepest shape matrix, is the median-based counterpart of the M-estimator of shape of Tyler (1987). Beyond estimation, shape depth, like its Tyler antecedent, also allows hypothesis testing on shape. Its main benefit, however, lies in the ranking of shape matrices it provides, whose practical relevance is illustrated in principal component analysis and in shape-based outlier detection. We study the invariance, quasi-concavity and continuity properties of Tyler shape depth, the topological and boundedness properties of the corresponding depth regions, existence of a deepest shape matrix and prove Fisher consistency in the elliptical case. Finally, we derive a Glivenko–Cantelli-type result and establish almost sure consistency of the deepest shape matrix estimator.

Keywords: Elliptical distribution; Principal component analysis; Robustness; Shape matrix; Statistical depth; Test for sphericity.

1 Introduction

Location depths measure the centrality of an arbitrary $k$ -vector $\theta$ with respect to a probability measure $P=P^{X}$ over $\mathbb{R}^{k}$ . Letting $\mathcal{S}^{k-1}=\{x\in\mathbb{R}^{k}:\|x\|^{2}=x^{T}x=1\}$ denote the unit sphere in $\mathbb{R}^{k}$ , the most famous instance is the Tukey (1975) halfspace depth

$$D(\theta,P)=\inf_{u\in\mathcal{S}^{k-1}}{\rm pr}\{u^{T}(X-\theta)\geq 0\};$$

throughout, $\rm pr$ refers to probability under the probability measure $P$ at hand. The halfspace depth regions $\{\theta\in\mathbb{R}^{k}:D(\theta,P)\geq\alpha\}$ form a family of nested convex subsets of $\mathbb{R}^{k}$ . The Tukey median $\theta_{P}$ , defined as the barycenter of the innermost region $M_{P}=\{\theta\in\mathbb{R}^{k}:D(\theta,P)=\max_{\xi\in\mathbb{R}^{k}}D(\xi,P)\}$ , extends the univariate median to the multivariate case and is a robust alternative to the expectation $E(X)$ . Beyond location estimation, many inference problems can be tackled in a robust and nonparametric way by using the center-outward order resulting from depth (Liu et al., 1999). Adopting the parametric depth approach from Mizera (2002), $D(\theta,P)$ can also be read as a measure of how well the location parameter value $\theta$ fits the probability measure $P$ . In this spirit, possible outliers in a data set $X_{1},\ldots,X_{n}$ will be flagged by low depth values $D(X_{i},P_{n})$ , where $P_{n}$ denotes the corresponding empirical probability measure.
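To make the halfspace depth concrete, the following minimal sketch evaluates its sample version in the plane. It assumes $k=2$ and replaces the infimum over $\mathcal{S}^{1}$ by a finite grid of directions, so it can only overestimate the true depth; the function name `halfspace_depth_2d` and the toy data are illustrative and not taken from the paper.

```python
import math

def halfspace_depth_2d(theta, points, n_dirs=720):
    """Approximate Tukey depth of theta w.r.t. the empirical measure of points.

    Scans n_dirs unit vectors u and returns the smallest fraction of points in
    the closed halfspace {x : u^T (x - theta) >= 0}. A finite grid of
    directions can only overestimate the true infimum.
    """
    n = len(points)
    best = 1.0
    for j in range(n_dirs):
        angle = 2.0 * math.pi * (j + 0.5) / n_dirs
        ux, uy = math.cos(angle), math.sin(angle)
        count = sum(1 for (x, y) in points
                    if ux * (x - theta[0]) + uy * (y - theta[1]) >= 0.0)
        best = min(best, count / n)
    return best

# Toy data (an assumption for illustration): corners of a square plus its center.
square = [(-1.0, -1.0), (-1.0, 1.0), (1.0, -1.0), (1.0, 1.0), (0.0, 0.0)]
depth_center = halfspace_depth_2d((0.0, 0.0), square)   # central point
depth_corner = halfspace_depth_2d((1.0, 1.0), square)   # extreme data point
depth_outside = halfspace_depth_2d((3.0, 3.0), square)  # outside the convex hull
```

The center receives the largest depth and a point outside the convex hull receives depth zero, which is the center-outward ordering the text refers to.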

In this paper, the focus is on multivariate dispersion parameters known as shape matrices. For simplicity, we restrict in this section to elliptical distributions. Let $\mathcal{P}_{k}$ be the collection of $k\times k$ symmetric positive definite matrices and write $A^{1/2}$ , with $A\in\mathcal{P}_{k}$ , for the unique square root of $A$ in $\mathcal{P}_{k}$ . We will say that $P=P^{X}$ is elliptical with location $\theta\in\mathbb{R}^{k}$ , scatter $\Sigma\in\mathcal{P}_{k}$ and generating variate $R$ if $X$ has the same distribution as $\theta+R\Sigma^{1/2}U$ , where $U$ is uniformly distributed over $\mathcal{S}^{k-1}$ and is independent of the nonnegative scalar random variable $R$ , which has unit median. This median constraint makes $\Sigma$ identifiable without moment conditions. Under finite second-order moments, the resulting covariance matrix is $\Sigma_{P}=\{E(R^{2})/k\}\Sigma$ . Inference problems such as constructing confidence regions for $\theta$ require one to estimate the full scatter matrix $\Sigma$ or the full covariance matrix $\Sigma_{P}$ . However, in many other problems, it is sufficient to estimate the shape matrix, that is, the normalized scatter matrix

$$V=\frac{k}{{\rm tr}(\Sigma)}\,\Sigma=\frac{k}{{\rm tr}(\Sigma_{P})}\,\Sigma_{P}.$$

This shape matrix $V$ could be normalized, as in Paindaveine (2008), to have determinant one or upper-left entry one, which would not affect the results of the present paper. For instance, principal components may be equivalently computed from $V$ , from $\Sigma$ or, when it exists, from $\Sigma_{P}$ , since proportional matrices have the same eigenvectors. Now, when it comes to fixing the number of principal components on which to base further analysis, one typically looks at the proportions of explained variances $p_{m}(\Sigma_{P})=\sum_{\ell=1}^{m}\lambda_{\ell}(\Sigma_{P})/\sum_{\ell=1}^{k}\lambda_{\ell}(\Sigma_{P})$ ( $m=1,\ldots,k$ ), where $\lambda_{\ell}(A)$ denotes the $\ell$ th largest eigenvalue of $A$ . Similarly to eigenvectors, these proportions remain unchanged if they are computed from $V$ rather than from $\Sigma$ or $\Sigma_{P}$ . In principal component analysis it is thus sufficient to estimate, or know the value of, $V$ .
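The invariance of the explained-variance proportions under rescaling can be checked numerically. The sketch below assumes $k=2$, so that eigenvalues have a closed form, and uses an arbitrary illustrative matrix; the helper names are hypothetical.

```python
import math

def trace_normalized_shape(sigma):
    """Return V = k * Sigma / tr(Sigma) for a square matrix given as nested lists."""
    k = len(sigma)
    tr = sum(sigma[i][i] for i in range(k))
    return [[k * sigma[i][j] / tr for j in range(k)] for i in range(k)]

def eigenvalues_sym_2x2(a):
    """Eigenvalues (largest first) of a symmetric 2x2 matrix."""
    half_tr = 0.5 * (a[0][0] + a[1][1])
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    root = math.sqrt(max(half_tr * half_tr - det, 0.0))
    return half_tr + root, half_tr - root

def explained_proportion_first(a):
    """p_1(A) = lambda_1(A) / (lambda_1(A) + lambda_2(A))."""
    l1, l2 = eigenvalues_sym_2x2(a)
    return l1 / (l1 + l2)

sigma = [[4.0, 1.0], [1.0, 2.0]]                      # illustrative scatter matrix
sigma_p = [[2.5 * s for s in row] for row in sigma]   # any proportional matrix
shape = trace_normalized_shape(sigma)                 # V with tr(V) = k = 2

p_sigma = explained_proportion_first(sigma)
p_sigma_p = explained_proportion_first(sigma_p)
p_shape = explained_proportion_first(shape)
```

All three proportions coincide, since multiplying a matrix by a positive constant rescales every eigenvalue by the same factor.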

There is a large literature on inference for shape. Our main contribution is to provide a depth concept for shape, measuring how well a given shape matrix $V$ fits the probability measure $P$. While the proposed depth will lead to estimators and tests for shape, its main added value is the ordering of shape matrices resulting from depth. Here, we mention only two possible applications. The first is in principal component analysis, where a suitable estimator $\hat{V}$ is to be chosen. When it is suspected that there might be outliers, one might for instance consider the minimum covariance determinant estimates $\hat{V}_{\gamma}$, $\gamma\in[0.5,1]$, trimming a proportion $1-\gamma$ of the data; see $\S$ 5. Choosing $\gamma$ should typically be done on the basis of the proportion of outliers, which is usually unknown. We will show that the shape depth of $\hat{V}_{\gamma}$ allows for an informed choice of $\gamma$. The second application concerns outlier detection in multivariate financial time series. Since volatility is key in finance, one might flag atypical days in such series by spotting days that associate a low depth to a shape estimator $\hat{V}_{\rm full}$ computed from the full series.

Depth for a generic parameter has been discussed in Mizera (2002). Depth for scatter matrices, however, has only been considered in Zhang (2002), Chen et al. (2018) and Paindaveine and Van Bever (2018), and only the last considers depth for shape matrices.

2 Shape depth

Tyler (1987) introduced a shape notion extending the concept of shape outside the elliptical setup. Consider the multivariate sign $U_{\theta,V}$ defined as $V^{-1/2}(X-\theta)/\|V^{-1/2}(X-\theta)\|$ if $X\neq\theta$ and as $0$ otherwise, where $V^{-1/2}$ is the inverse of $V^{1/2}$. Let also $W_{\theta,V}={\rm vec}\{U_{\theta,V}U^{T}_{\theta,V}-(1/k)I_{k}\}$, where ${\rm vec}\,A$ stacks the columns of $A$ on top of each other and where $I_{k}$ is the $k\times k$ identity matrix. The Tyler shape of $P=P^{X}$, $V_{T}$ say, is then the matrix $V\in\mathcal{P}_{k,{\rm tr}}=\{V\in\mathcal{P}_{k}:{\rm tr}(V)=k\}$ satisfying

$$E(W_{\theta,V})=0. \tag{2.1}$$

If $P$ is smooth at $\theta$ , in the sense that no hyperplane containing $\theta$ has a strictly positive $P$ -probability mass, then (2.1) admits a unique solution $V\in\mathcal{P}_{k,{\rm tr}}$ that agrees with the true shape if $P$ is elliptical with location $\theta$ (Tyler, 1987; Kent and Tyler, 1988; Dümbgen, 1998). In essence, (2.1) identifies the shape $V$ making the origin of $\mathbb{R}^{k^{2}}$ most central in an $L_{2}$ -sense for the distribution $P^{W_{\theta,V}}$ of $W_{\theta,V}$ , that is, it defines $V_{T}$ as the solution of

$$0=\arg\min_{m\in\mathbb{R}^{k^{2}}}E(\|W_{\theta,V}-m\|^{2}). \tag{2.2}$$
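For readers who want to see the sample version of $E(W_{\theta,V})=0$ solved, the sketch below uses the standard fixed-point iteration for Tyler's M-estimator, $V\leftarrow c\sum_{i}(X_{i}-\theta)(X_{i}-\theta)^{T}/\{(X_{i}-\theta)^{T}V^{-1}(X_{i}-\theta)\}$ with $c$ chosen so that ${\rm tr}(V)=k$. This algorithm is not described in the present text; it is included only as an illustration, restricted to $k=2$, and the function name and test data are assumptions.

```python
import math

def tyler_shape_2d(points, theta=(0.0, 0.0), n_iter=200, tol=1e-12):
    """Fixed-point iteration for Tyler's M-estimator of shape (k = 2, tr(V) = 2)."""
    zs = [(x - theta[0], y - theta[1]) for (x, y) in points]
    zs = [z for z in zs if z != (0.0, 0.0)]   # points at theta carry no direction
    v11, v12, v22 = 1.0, 0.0, 1.0             # start from the identity
    for _ in range(n_iter):
        det = v11 * v22 - v12 * v12
        s11 = s12 = s22 = 0.0
        for (a, b) in zs:
            # quadratic form z^T V^{-1} z, using the explicit 2x2 inverse
            d2 = (v22 * a * a - 2.0 * v12 * a * b + v11 * b * b) / det
            s11 += a * a / d2
            s12 += a * b / d2
            s22 += b * b / d2
        scale = 2.0 / (s11 + s22)             # renormalize so that tr(V) = 2
        new = (scale * s11, scale * s12, scale * s22)
        change = max(abs(new[0] - v11), abs(new[1] - v12), abs(new[2] - v22))
        v11, v12, v22 = new
        if change < tol:
            break
    return [[v11, v12], [v12, v22]]

# Illustrative data: evenly spaced directions mapped through A = diag(3, 1),
# with radii that vary from point to point.
n = 60
sample = []
for i in range(n):
    ang = 2.0 * math.pi * (i + 0.5) / n
    r = 1.0 + (i % 5)
    sample.append((3.0 * r * math.cos(ang), 1.0 * r * math.sin(ang)))
v_hat = tyler_shape_2d(sample)
```

Here `v_hat` is numerically the trace-normalized version of $AA^{T}={\rm diag}(9,1)$, that is, ${\rm diag}(1.8,0.2)$, whatever the radii, which reflects the sign nature of Tyler's shape.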

The present work finds its source in the idea that one may define the shape of $P$ as the matrix $V\in\mathcal{P}_{k,{\rm tr}}$ making the origin of $\mathbb{R}^{k^{2}}$ most central for the distribution of $W_{\theta,V}$ , in the halfspace depth sense, that is, as the value of $V$ maximizing the following depth.

Definition 2.1 (Tyler shape depth).

Let $P=P^{X}$ be a probability measure over $\mathbb{R}^{k}$ and fix $V\in\mathcal{P}_{k,{\rm tr}}$. (i) For any $\theta\in\mathbb{R}^{k}$, the fixed-$\theta$ shape depth of $V$ with respect to $P$ is $D_{\theta}(V,P)=D(0,P^{W_{\theta,V}})=\inf_{u\in\mathcal{S}^{k^{2}-1}}{\rm pr}(u^{T}W_{\theta,V}\geq 0)$. (ii) The shape depth of $V$ with respect to $P$ is $D(V,P)=D_{\theta_{P}}(V,P)$, where $\theta_{P}$ is the Tukey median of $P$.

We will use the notation $D(\cdot,P)$ for both halfspace and Tyler shape depths, as the vector or matrix nature of the argument will remove any ambiguity. The fixed- $\theta$ shape depth can equivalently be defined as $D_{\theta}(V,P)=\inf_{M}{\rm pr}\{U_{\theta,V}^{T}MU_{\theta,V}-{{\rm tr}(M)/k}\geq 0\},$ where the infimum is over all $k\times k$ symmetric matrices $M$ ; see Lemma 1 in the Supplementary Material. While, in view of (2.2), $V_{T}$ can be seen as a sign-based mean concept for shape, the maximizer of Tyler shape depth is of a median nature. The main benefit of the proposed depth does not come from the deepest shape itself but rather from the ranking of shapes it provides; see $\S$ 5.
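The matrix form of the definition makes the sample depth easy to compute when $k=2$. Writing $U_{\theta,V}=(\cos\varphi,\sin\varphi)^{T}$ and $M$ with entries $a,b,b,c$, one gets $U_{\theta,V}^{T}MU_{\theta,V}-{\rm tr}(M)/2=\{(a-c)/2\}\cos 2\varphi+b\sin 2\varphi$, so the infimum over $M$ becomes a halfspace depth of the origin for the doubled-angle points $(\cos 2\varphi,\sin 2\varphi)$. This reduction is worked out here for illustration and is not taken from the paper; the sketch below, with hypothetical function names, relies on it and on the closed-form square root of a $2\times 2$ positive definite matrix.

```python
import math

def inv_sqrt_spd_2x2(v):
    """Inverse of the symmetric square root of a 2x2 SPD matrix.

    Uses sqrt(V) = (V + s I) / t with s = sqrt(det V), t = sqrt(tr V + 2 s).
    """
    s = math.sqrt(v[0][0] * v[1][1] - v[0][1] * v[1][0])
    t = math.sqrt(v[0][0] + v[1][1] + 2.0 * s)
    r11, r12, r22 = (v[0][0] + s) / t, v[0][1] / t, (v[1][1] + s) / t
    det = r11 * r22 - r12 * r12
    return [[r22 / det, -r12 / det], [-r12 / det, r11 / det]]

def tyler_shape_depth_2d(points, theta, v):
    """Sample fixed-theta Tyler shape depth D_theta(V, P_n) for k = 2."""
    b = inv_sqrt_spd_2x2(v)
    doubled = []
    for (x, y) in points:
        zx, zy = x - theta[0], y - theta[1]
        if zx == 0.0 and zy == 0.0:
            continue                      # zero sign: never counted at the infimum
        sx = b[0][0] * zx + b[0][1] * zy
        sy = b[1][0] * zx + b[1][1] * zy
        doubled.append(2.0 * math.atan2(sy, sx))
    if not doubled:
        return 0.0
    two_pi = 2.0 * math.pi
    # The count changes only where a point enters or leaves the half-circle.
    cuts = sorted((a + h) % two_pi for a in doubled
                  for h in (0.5 * math.pi, -0.5 * math.pi))
    best = len(doubled)
    for j, c in enumerate(cuts):
        nxt = cuts[j + 1] if j + 1 < len(cuts) else cuts[0] + two_pi
        psi = 0.5 * (c + nxt)             # midpoint of an interval of constancy
        count = sum(1 for a in doubled if math.cos(a - psi) >= 0.0)
        best = min(best, count)
    return best / len(points)

# Illustrative sample: 361 evenly spaced directions on the unit circle.
n = 361
circle = [(math.cos(2.0 * math.pi * (i + 0.25) / n),
           math.sin(2.0 * math.pi * (i + 0.25) / n)) for i in range(n)]
depth_identity = tyler_shape_depth_2d(circle, (0.0, 0.0), [[1.0, 0.0], [0.0, 1.0]])
depth_stretched = tyler_shape_depth_2d(circle, (0.0, 0.0), [[1.8, 0.0], [0.0, 0.2]])
```

For these spherically spread directions the identity is close to the largest attainable depth of one half, while the elongated shape ${\rm diag}(1.8,0.2)$ is ranked much lower.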

Definition 2.1(ii) calls for some comments. Two approaches were considered in the literature for Tyler shape in the case of unspecified center: the Tyler (1987) plug-in approach, which replaces the unknown $\theta$ with some location functional, and the Hettmansperger and Randles (2002) approach, which jointly solves $E(U_{\theta,V})=0$ and $E(W_{\theta,V})=0$; existence of a unique solution to joint location and scatter M-estimating equations was studied in Maronna (1976) under ellipticity and in Tatsuoka and Tyler (2000) for non-elliptic cases. The two approaches yield distinct shapes outside the elliptical setup. In contrast, for the proposed depth, the plug-in and joint maximization approaches always lead to the same shape: irrespective of $\lambda$, the objective function $(\theta,V)\mapsto D(0,P^{U_{\theta,V}})+\lambda D(0,P^{W_{\theta,V}})$ is indeed maximized at $\theta=\theta_{P}$ and $V=\arg\max_{V}D(0,P^{W_{\theta_{P},V}})$, since $D(0,P^{U_{\theta,V}})=D(0,P^{V^{-1/2}(X-\theta)})=D(\theta,P^{X})$ is, for any $V$, maximized at $\theta=\theta_{P}$.

An alternative way to obtain an unspecified location version of Tyler shape is to construct it on pairwise differences (Dümbgen, 1998). We will not investigate this for our shape depth, since the sample version of the resulting depth would lead to a much heavier computational burden.

3 Main properties

In this section, we study the main properties of the shape depth $D_{\theta}(V,P)$ and of the corresponding depth regions $R_{\theta}(\alpha,P)=\{V\in\mathcal{P}_{k,{\rm tr}}:D_{\theta}(V,P)\geq\alpha\}$ . Topological statements for subsets of $\mathcal{P}_{k,{\rm tr}}$ and for functions defined on $\mathcal{P}_{k,{\rm tr}}$ will refer to the topology whose open sets are generated by balls of the form $B(V_{0},r)=\{V\in\mathcal{P}_{k,{\rm tr}}:d(V,V_{0})<r\}$ , where $d$ is the usual geodesic distance on $\mathcal{P}_{k}$ : with the classical log mapping on $\mathcal{P}_{k}$ , this distance is such that $d(V_{a},V_{b})=\|\log(V_{a}^{-1/2}V_{b}V_{a}^{-1/2})\|_{F}$ , where $\|A\|_{F}=\{{\rm tr}(AA^{T})\}^{1/2}$ is the Frobenius norm of $A$ (Bhatia, 2007). We start with the following continuity result.

Theorem 3.1.

Let $P$ be a probability measure over $\mathbb{R}^{k}$ and fix $\theta\in\mathbb{R}^{k}$ . Then, (i) $V\mapsto D_{\theta}(V,P)$ is upper semicontinuous on $\mathcal{P}_{k,{\rm tr}}$ ; (ii) the depth region $R_{\theta}(\alpha,P)$ is closed for any $\alpha\geq 0$ ; (iii) if $P$ is absolutely continuous with respect to the Lebesgue measure, then $V\mapsto D_{\theta}(V,P)$ is also lower semicontinuous, hence continuous, on $\mathcal{P}_{k,{\rm tr}}$ .

We will say that a subset $R$ of $\mathcal{P}_{k,{\rm tr}}$ is bounded if and only if $R\subset B(I_{k},r)$ for some $r>0$ ; since $d$ satisfies the triangle inequality, we need only consider balls centered at $I_{k}$ . Moreover, we will say that $P$ is smooth at $\theta$ if and only if $t_{\theta,P}=0$ , with $t_{\theta,P}=\sup_{u\in\mathcal{S}^{k-1}}{\rm pr}\{u^{T}(X-\theta)=0\}$ . We then have the following result.

Theorem 3.2.

Let $P$ be a probability measure over $\mathbb{R}^{k}$ and fix $\theta\in\mathbb{R}^{k}$ . Then the depth region $R_{\theta}(\alpha,P)$ is bounded and compact for any $\alpha>t_{\theta,P}$ .
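Boundedness in Theorem 3.2 is measured with the geodesic distance $d$ introduced at the start of this section. As a small illustration, the sketch below evaluates $d$ for $2\times 2$ matrices through the eigenvalues $\lambda_{1},\lambda_{2}$ of $V_{a}^{-1}V_{b}$, using the fact that $V_{a}^{-1/2}V_{b}V_{a}^{-1/2}$ is similar to $V_{a}^{-1}V_{b}$, so that $d(V_{a},V_{b})=\{\log^{2}\lambda_{1}+\log^{2}\lambda_{2}\}^{1/2}$. The function names and example matrices are assumptions made for this example.

```python
import math

def geodesic_distance_2x2(va, vb):
    """d(Va, Vb) = ||log(Va^{-1/2} Vb Va^{-1/2})||_F for 2x2 SPD matrices."""
    det_a = va[0][0] * va[1][1] - va[0][1] * va[1][0]
    det_b = vb[0][0] * vb[1][1] - vb[0][1] * vb[1][0]
    # tr(Va^{-1} Vb), written out with the explicit 2x2 inverse of Va
    tr = (va[1][1] * vb[0][0] - va[0][1] * vb[1][0]
          - va[1][0] * vb[0][1] + va[0][0] * vb[1][1]) / det_a
    det = det_b / det_a
    root = math.sqrt(max(tr * tr - 4.0 * det, 0.0))
    l1 = 0.5 * (tr + root)
    l2 = det / l1                         # avoids cancellation in (tr - root) / 2
    return math.sqrt(math.log(l1) ** 2 + math.log(l2) ** 2)

def frobenius_distance_2x2(va, vb):
    """d_F(Va, Vb) = ||Vb - Va||_F."""
    return math.sqrt(sum((vb[i][j] - va[i][j]) ** 2
                         for i in range(2) for j in range(2)))

identity = [[1.0, 0.0], [0.0, 1.0]]
v_a = [[1.5, 0.3], [0.3, 0.5]]
v_b = [[1.8, 0.0], [0.0, 0.2]]
d_example = geodesic_distance_2x2(identity, v_b)

# Shapes diag(2 - eps, eps) approach a singular matrix as eps decreases.
eps_values = (0.2, 0.02, 0.002)
distances = [geodesic_distance_2x2(identity, [[2.0 - e, 0.0], [0.0, e]])
             for e in eps_values]
frobenius = [frobenius_distance_2x2(identity, [[2.0 - e, 0.0], [0.0, e]])
             for e in eps_values]
```

Along this sequence the Frobenius distances stay below $\sqrt{2}$ whereas the geodesic distances grow without bound, so a set that is bounded for $d$ stays away from singular matrices.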

The main reason to work with geodesic distance rather than Frobenius distance $d_{F}(V_{1},V_{2})=\|V_{2}-V_{1}\|_{F}$ is that, unlike $(\mathcal{P}_{k,{\rm tr}},d_{F})$, the metric space $(\mathcal{P}_{k,{\rm tr}},d)$ is complete; see, e.g., Proposition 10 in Bhatia and Holbrook (2006). This is what allows us to establish compactness in Theorem 3.2, which is the main ingredient for the following result.

Theorem 3.3.

Let $P$ be a probability measure over $\mathbb{R}^{k}$ and fix $\theta\in\mathbb{R}^{k}$ . (i) If $R_{\theta}(t_{\theta,P},P)$ is non-empty, then there exists a shape $V_{*}\in\mathcal{P}_{k,{\rm tr}}$ maximizing $D_{\theta}(V,P)$ . In particular, (ii) if $P$ is smooth at $\theta$ , then such a deepest shape $V_{*}$ exists.

While the previous result guarantees existence of a deepest shape for absolutely continuous probability measures, uniqueness is not guaranteed in general. Parallel to what is done for the Tukey median, we then define the fixed- $\theta$ shape matrix of $P$ as the barycenter of the deepest shape region of $P$ , that is, as the shape matrix $V_{\theta,P}$ satisfying

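$${\rm vec}\,V_{\theta,P}=\int_{{\rm vec}\,R_{\theta}(\alpha_{*},P)}v\,dv\Big/\int_{{\rm vec}\,R_{\theta}(\alpha_{*},P)}dv, \tag{3.1}$$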

with $\alpha_{*}=\max_{V}D_{\theta}(V,P)$ . Two remarks are in order. First, the integrals in (3.1) exist and are finite since ${\rm vec}\,\mathcal{P}_{k,{\rm tr}}$ is a bounded subset of $\mathbb{R}^{k^{2}}$ : $0\leq V^{2}_{ij}<V_{ii}V_{jj}\leq k^{2}$ for any $V\in\mathcal{P}_{k,{\rm tr}}$ . Second, the following convexity result implies that $V_{\theta,P}$ has maximal depth.

Theorem 3.4.

Let $P$ be a probability measure over $\mathbb{R}^{k}$ and fix $\theta\in\mathbb{R}^{k}$ . Then, (i) $V\mapsto D_{\theta}(V,P)$ is quasi-concave: $D_{\theta}(V_{t},P)\geq\min\{D_{\theta}(V_{a},P),D_{\theta}(V_{b},P)\}$ for $V_{t}=(1-t)V_{a}+tV_{b}$ with $V_{a},V_{b}\in\mathcal{P}_{k,{\rm tr}}$ and $t\in[0,1]$ ; (ii) the region $R_{\theta}(\alpha,P)$ is convex for any $\alpha\geq 0$ .

This defines the fixed- $\theta$ shape of a probability measure $P$ under the very mild condition that $R_{\theta}(t_{\theta,P},P)$ is non-empty, hence in particular when $P$ is smooth at $\theta$ . Of course, it is important that, under ellipticity, this agrees with the elliptical concept of shape provided in $\S$ 1. The following Fisher consistency result confirms that this is the case.

Theorem 3.5.

Let $P$ be an elliptical probability measure over $\mathbb{R}^{k}$ with location $\theta_{0}$ and shape $V_{0}$ . Then, $D_{\theta_{0}}(V_{0},P)\geq D_{\theta_{0}}(V,P)$ for any $V\in\mathcal{P}_{k,{\rm tr}}$ , and, provided that ${\rm pr}[\{\theta_{0}\}]<1$ , the equality holds if and only if $V=V_{0}$ . Letting $Y_{k}$ be Beta with parameters $1/2$ and $(k-1)/2$ , the maximal depth is $D_{\theta_{0}}(V_{0},P)=(1-{\rm pr}[\{\theta_{0}\}]){\rm pr}(Y_{k}>1/k)$ .
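The maximal depth in Theorem 3.5 can be evaluated numerically. The sketch below is an illustration rather than the authors' computation: it uses the representation $Y_{k}=Z_{1}^{2}/\sum_{\ell=1}^{k}Z_{\ell}^{2}$ with $Z$ standard normal, under which $Y_{k}>1/k$ is equivalent to $|T|>1$ for a Student $t$ variable $T$ with $k-1$ degrees of freedom, and integrates the $t$ density by Simpson's rule. The function name and the default number of steps are assumptions.

```python
import math

def max_depth_elliptical(k, mass_at_center=0.0, n_steps=2000):
    """(1 - pr[{theta_0}]) * pr(Y_k > 1/k) with Y_k ~ Beta(1/2, (k-1)/2), k >= 2.

    Uses pr(Y_k > 1/k) = pr(|T| > 1) for T Student-t with nu = k - 1 degrees
    of freedom, and Simpson's rule for the t density on [0, 1].
    """
    nu = k - 1
    log_c = (math.lgamma(0.5 * (nu + 1)) - math.lgamma(0.5 * nu)
             - 0.5 * math.log(nu * math.pi))

    def density(t):
        return math.exp(log_c - 0.5 * (nu + 1) * math.log1p(t * t / nu))

    h = 1.0 / n_steps                     # n_steps must be even for Simpson's rule
    total = density(0.0) + density(1.0)
    for j in range(1, n_steps):
        total += (4.0 if j % 2 else 2.0) * density(j * h)
    inside = 2.0 * total * h / 3.0        # pr(|T| <= 1)
    return (1.0 - mass_at_center) * (1.0 - inside)

depths = {k: max_depth_elliptical(k) for k in (2, 3, 5, 10, 100, 1000)}
normal_limit = math.erfc(1.0 / math.sqrt(2.0))   # pr(Z_1^2 > 1)
```

For $k=2$ the value is exactly one half, and the values decrease with $k$ towards ${\rm pr}(Z_{1}^{2}>1)$, stored in `normal_limit`.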

In this result, ${\rm pr}[\{\theta_{0}\}]$ equals the probability that the generating variate $R$ associated to $P$ is equal to zero. Lemma 2 in Paindaveine and Van Bever (2017) implies that the maximal depth in Theorem 3.5 is monotone decreasing in $k$ if ${\rm pr}[\{\theta_{0}\}]$ does not depend on $k$ , in which case the maximal depth is convergent as $k$ goes to infinity. Since $Y_{k}$ has the same distribution as $Z_{1}^{2}/(\sum_{\ell=1}^{k}Z_{\ell}^{2})$ , where $Z=(Z_{1},\ldots,Z_{k})^{T}$ is $k$ -variate standard normal, the limit is equal to ${\rm pr}(Z_{1}^{2}>1)\approx 0.317$ . The proof of Theorem 3.5 requires the following result.

Theorem 3.6.

Let $P=P^{X}$ be a probability measure over $\mathbb{R}^{k}$ and fix $\theta\in\mathbb{R}^{k}$ . Then, for any shape matrix $V$ , any invertible $k\times k$ matrix $A$ and any $k$ -vector $b$ ,

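$$D_{A\theta+b}\big(V_{A},P^{AX+b}\big)=D_{\theta}(V,P^{X}),\quad R_{A\theta+b}(\alpha,P^{AX+b})=\big\{V_{A}:V\in R_{\theta}(\alpha,P)\big\},$$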

where $V_{A}=kAV\!A^{T}/{\rm tr}(AV\!A^{T})$ is the shape matrix proportional to $AV\!A^{T}$ .

This shows that the fixed- $\theta$ shape depth and the corresponding regions behave well under affine transformations, and in particular under changes of the measurement units. Affine invariance is a classical requirement in location depth (Zuo and Serfling, 2000).
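As a small illustration of the map $V\mapsto V_{A}$ in Theorem 3.6, the sketch below computes $V_{A}=kAVA^{T}/{\rm tr}(AVA^{T})$ for two changes of measurement units. The helper names and the numerical values are hypothetical.

```python
def matmul(a, b):
    """Product of two matrices given as nested lists."""
    return [[sum(a[i][l] * b[l][j] for l in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def transformed_shape(v, a):
    """V_A = k A V A^T / tr(A V A^T), the shape matrix proportional to A V A^T."""
    k = len(v)
    ava = matmul(matmul(a, v), transpose(a))
    tr = sum(ava[i][i] for i in range(k))
    return [[k * ava[i][j] / tr for j in range(k)] for i in range(k)]

v = [[1.5, 0.3], [0.3, 0.5]]              # illustrative shape matrix, tr(V) = 2

# Same change of units in both coordinates (A = 10 I): the shape is unchanged.
v_rescaled = transformed_shape(v, [[10.0, 0.0], [0.0, 10.0]])

# Different units per coordinate (A = diag(100, 1)): the shape changes, and
# Theorem 3.6 states that V_A has the same depth for AX + b as V has for X.
v_units = transformed_shape(v, [[100.0, 0.0], [0.0, 1.0]])
```

In both cases the output has trace $k$, so it is again an element of $\mathcal{P}_{k,{\rm tr}}$.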

Tyler shape depth is a sign concept in the sense that it depends on the underlying random vector $X$ only through its multivariate sign $U_{\theta,V}$ . In the elliptical case, it follows that, if the distribution does not charge the center of the distribution, this depth does not depend on the distribution of the underlying generating variate $R$ . More precisely, we have the following result.

Theorem 3.7.

Let $P$ be an elliptical probability measure over $\mathbb{R}^{k}$ with location $\theta_{0}$ and shape $V_{0}$ . Then, (i) for some $h:\mathcal{P}_{k,{\rm tr}}\to[0,1]$ that does not depend on $V$ or on $P$ ,

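$$D_{\theta_{0}}(V,P)=(1-{\rm pr}[\{\theta_{0}\}])\,h\bigg\{\frac{k(V_{0}^{-1/2}VV_{0}^{-1/2})}{{\rm tr}(V_{0}^{-1}V)}\bigg\}; \tag{3.2}$$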

(ii) for $k=2$ ,

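$$D_{\theta_{0}}(V,P)=(1-{\rm pr}[\{\theta_{0}\}])\,{\rm pr}\bigg(Y_{2}\geq\frac{1}{2}+\frac{1}{2}\bigg[1-\det\bigg\{\frac{2V_{0}^{-1}V}{{\rm tr}(V_{0}^{-1}V)}\bigg\}\bigg]^{1/2}\,\bigg), \tag{3.3}$$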

where $Y_{2}$ is Beta distributed with parameters $1/2$ and $1/2$.

The function $h$ in this result does not depend on $P$ , so that depth, under ellipticity, depends on $P$ through $V_{0}$ and ${\rm pr}[\{\theta_{0}\}]$ only, with the dependence on ${\rm pr}[\{\theta_{0}\}]$ not affecting the induced ranking of shape matrices. It is easy to check that the explicit bivariate elliptical depth in (3.3) is compatible with the general results obtained above. While it seems very challenging to obtain an explicit expression for the function $h$ in (3.2), numerical experiments lead us to conjecture that, irrespective of the dimension $k$ , the mapping $h$ is of the form $h(M)=g(\det M)$ for some function $g:\mathbb{R}^{+}\to[0,1]$ .
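The bivariate formula of Theorem 3.7(ii), $D_{\theta_{0}}(V,P)=(1-{\rm pr}[\{\theta_{0}\}])\,{\rm pr}(Y_{2}\geq 1/2+(1/2)[1-\det\{2V_{0}^{-1}V/{\rm tr}(V_{0}^{-1}V)\}]^{1/2})$, is simple to evaluate because the Beta distribution with parameters $1/2$ and $1/2$ is the arcsine law, with distribution function $y\mapsto(2/\pi)\arcsin(y^{1/2})$. The following sketch, with hypothetical names and example matrices, evaluates it and checks the compatibility with Theorems 3.4 and 3.5 numerically on one example.

```python
import math

def elliptical_depth_2d(v, v0, mass_at_center=0.0):
    """D_{theta_0}(V, P) from Theorem 3.7(ii) for 2x2 shape matrices V and V0."""
    det0 = v0[0][0] * v0[1][1] - v0[0][1] * v0[1][0]
    det_v = v[0][0] * v[1][1] - v[0][1] * v[1][0]
    # tr(V0^{-1} V), written out with the explicit 2x2 inverse of V0
    tr = (v0[1][1] * v[0][0] - v0[0][1] * v[1][0]
          - v0[1][0] * v[0][1] + v0[0][0] * v[1][1]) / det0
    # det{2 V0^{-1} V / tr(V0^{-1} V)} = 4 det(V0^{-1} V) / tr(V0^{-1} V)^2
    ratio = 4.0 * (det_v / det0) / (tr * tr)
    y = 0.5 + 0.5 * math.sqrt(max(1.0 - ratio, 0.0))
    # arcsine law: pr(Y_2 <= y) = (2 / pi) * asin(sqrt(y))
    tail = 1.0 - (2.0 / math.pi) * math.asin(math.sqrt(min(y, 1.0)))
    return (1.0 - mass_at_center) * tail

v0 = [[1.0, 0.0], [0.0, 1.0]]             # true shape (spherical case)
v_a = [[1.8, 0.0], [0.0, 0.2]]
v_b = [[1.0, 0.9], [0.9, 1.0]]
v_mid = [[0.5 * (v_a[i][j] + v_b[i][j]) for j in range(2)] for i in range(2)]

depth_at_truth = elliptical_depth_2d(v0, v0)
depth_a = elliptical_depth_2d(v_a, v0)
depth_b = elliptical_depth_2d(v_b, v0)
depth_mid = elliptical_depth_2d(v_mid, v0)
```

The depth at $V=V_{0}$ equals the maximal value ${\rm pr}(Y_{2}>1/2)=1/2$ of Theorem 3.5, and the depth of the midpoint `v_mid` is at least the smaller of the depths of its endpoints, as quasi-concavity requires.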

The results of this section extend to the unspecified-location shape depth $D(V,P)=D_{\theta_{P}}(V,P)$ and to the corresponding regions $R(\alpha,P)=\{V\in\mathcal{P}_{k,{\rm tr}}:D(V,P)\geq\alpha\}$. Theorems 3.1 to 3.4 hold for any fixed $\theta$ and their unspecified-$\theta$ versions are simply obtained by substituting $\theta_{P}$ for $\theta$ throughout. In particular, the existence of an unspecified-location deepest shape matrix is guaranteed if $P$ is smooth at $\theta_{P}$, or, more generally, if $R(t_{\theta_{P},P},P)$ is non-empty. Under unspecified location, the shape $V_{P}$ of $P$ is then defined as the barycenter of the set of shape matrices maximizing $D(\cdot,P)$. In view of the affine equivariance of $\theta_{P}$, i.e., $\theta_{P^{AX+b}}=A\theta_{P^{X}}+b$, the affine-invariance/equivariance properties

$$D\big(V_{A},P^{AX+b}\big)=D(V,P^{X}),\quad R(\alpha,P^{AX+b})=\big\{V_{A}:V\in R(\alpha,P)\big\}$$

follow directly from Theorem 3.6, to which we refer for the definition of $V_{A}$ . Finally, Theorems 3.5 and 3.7 also readily extend to the unspecified-location case, since $\theta_{P}=\theta_{0}$ for any elliptical probability measure $P$ with location $\theta_{0}$ . In particular, if $P$ is elliptical with shape $V_{0}$ , then the unspecified- $\theta$ shape depth $D(V,P)$ is uniquely maximized at $V=V_{0}$ , if the distribution is not degenerate at a single point.

4 Consistency

When $k$ -variate observations $X_{1},\ldots,X_{n}$ are available, we define the sample fixed- $\theta$ depth of a shape matrix $V$ as $D_{\theta}(V,P_{n})$ , where $P_{n}$ is the empirical probability measure associated with $X_{1},\ldots,X_{n}$ , and its unspecified-location version as $D(V,P_{n})$ . In this section, we state a Glivenko–Cantelli-type result for these sample depths and investigate consistency of max-depth shape estimators.

Theorem 4.1.

Let $P$ be a probability measure over $\mathbb{R}^{k}$ and let $P_{n}$ denote the empirical probability measure associated with a random sample of size $n$ from $P$ . Then, (i) for any $\theta\in\mathbb{R}^{k}$ , $\sup_{V\in\mathcal{P}_{k,{\rm tr}}}|D_{\theta}(V,P_{n})-D_{\theta}(V,P)|\to 0$ almost surely as $n\to\infty$ ; (ii) if $P$ is absolutely continuous with respect to the Lebesgue measure, then $\sup_{V\in\mathcal{P}_{k,{\rm tr}}}|D(V,P_{n})-D(V,P)|\to 0$ almost surely as $n\to\infty$ .

We illustrate this result in the bivariate elliptical case associated with Theorem 3.7(ii). Figure 1 provides contour plots of $D_{\theta}(V,P)$ in terms of $V_{12}/(V_{11}V_{22})^{1/2}$ and $V_{22}/V_{11}$ , for various bivariate, arbitrarily elliptical, probability measures. The sign nature of shape depth ensures that these contours, along with their empirical counterparts, are distribution-free in the class of elliptical distributions that do not charge the centre of symmetry. Figure 1 also reports the empirical contour plots obtained from a random sample of size $n=800$ drawn from the corresponding bivariate normal distributions. Clearly, the results support the consistency in Theorem 4.1(i).

In $\S$ 3, the shape $V_{\theta,P}$ of $P$ was defined as the barycenter of the collection of $P$ -deepest shape matrices. In the empirical case, a natural estimator is the corresponding shape matrix $V_{\theta,P_{n}}$ computed from the empirical probability measure $P_{n}$ associated with the sample at hand; existence here follows from the fact that $D_{\theta}(V,P_{n})$ may only take values $\ell/n$ ( $\ell=0,1,\ldots,n$ ). The same argument ensures the existence of the sample deepest shape $V_{P_{n}}$ in the unspecified-location case. The sample Tukey median $\theta_{P_{n}}$ was one of the first affine-equivariant location estimators with a high breakdown point. It would therefore be interesting to investigate whether the affine-equivariant shape estimator $V_{P_{n}}$ , parallel to the Maronna–Stahel–Yohai P-estimators of scatter, also has a high breakdown point (Tyler, 1994). Since this is beyond the scope of this paper, we focus on consistency of sample deepest shapes.

Theorem 4.2.

Let $P$ be a probability measure over $\mathbb{R}^{k}$ and let $P_{n}$ denote the empirical probability measure associated with a random sample of size $n$ from $P$ . (i) Fix $\theta\in\mathbb{R}^{k}$ and assume that $R_{\theta}(t_{\theta,P},P)$ is non-empty. Then, $V_{\theta,P_{n}}\to V_{\theta,P}$ almost surely as $n\to\infty$ . (ii) If $P$ is absolutely continuous with respect to the Lebesgue measure, then $V_{P_{n}}\to V_{P}$ almost surely as $n\to\infty$ .

The specified- $\theta$ result in Theorem 4.2(i) holds in particular if $P$ is smooth at $\theta$ . The unspecified- $\theta$ result requires a more stringent smoothness assumption, namely absolute continuity of $P$ . This assumption, which is already present in Theorem 4.1(ii), is only needed to control the impact of replacing $\theta$ by $\theta_{P_{n}}$ in $D_{\theta}(V,P_{n})$ and $V_{\theta,P_{n}}$ . Figure 1 also supports Theorem 4.2(i) since, in each sample considered, the sample deepest shape is close to its population counterpart.

5 Two applications

5.1 Choosing a shape matrix estimator in principal component analysis

There is a vast literature on scatter or shape estimation. Among the most famous estimators are the minimum covariance determinant scatters $S_{\gamma}$ . Recall that, in the empirical case, $S_{\gamma}$ is the covariance matrix with the smallest determinant among covariance matrices computed using only a proportion $\gamma$ of the observations. The choice of the trimming proportion $1-\gamma$ is crucial, as the loss in efficiency can be very large if the trimming is excessive; see, for example, Croux and Haesbroeck (1999) or Paindaveine and Van Bever (2014). Choosing $\gamma$ is therefore difficult, as it should be taken large, but not so large as to incorporate outliers. In this section, we consider robust principal component analysis based on the shape estimators $\hat{V}_{\gamma}=kS_{\gamma}/{\rm tr}(S_{\gamma})$ and show that Tyler shape depth allows an informed choice of $\gamma$ .

For several contamination proportions $\eta$ , we independently generated $R=500$ bivariate samples of $n=800$ independent observations, each comprising $(1-\eta)n$ clean observations and $\eta n$ outliers. With $X$ bivariate normal with zero mean and covariance matrix ${\rm diag}(4,1)$ and $Y$ bivariate normal with mean $(0,\delta)^{T}$ and identity covariance matrix, the clean observations are equal to $X$ in distribution, whereas the outliers are distributed, in equal proportions, as $Y$ or $-Y$ . Two simulations were conducted, one for $\delta=4$ and one for $\delta=5$ ; clearly, the former simulation provides a harder robustness problem than the latter. We consider estimating the first principal direction $e_{1}=(1,0)^{T}$ of the uncontaminated distribution. For any $\gamma\in[0.5,1]$ , a natural estimator is, up to a sign, the first eigenvector $\hat{v}_{\gamma}$ of $\hat{V}_{\gamma}$ . Denoting as $\hat{v}_{r,\gamma}$ this estimate in replication $r=1,\ldots,R$ , estimation performance can be measured through the mean squared error

$$\textrm{MSE}_{\gamma}=\frac{1}{R}\sum_{r=1}^{R}(\Delta\alpha_{r,\gamma})^{2},$$

where $\Delta\alpha_{r,\gamma}=\arccos(|e_{1}^{T}\hat{v}_{r,\gamma}|)$ is the angle between the population first eigendirection $e_{1}$ and its estimate $\hat{v}_{r,\gamma}$ . Figure 2 plots $\textrm{MSE}_{\gamma}$ as a function of $\gamma$ ; the Monte Carlo exercise was performed for every value of $\gamma\in\{0.5,0.51,\ldots,0.99,1\}$ . The results confirm that, for any contamination proportion $\eta$ , a suitable value of $\gamma$ should be identified. The optimal value $\gamma_{0}=\arg\min_{\gamma}\textrm{MSE}_{\gamma}$ basically coincides with $1-\eta$ in the easy case $\delta=5$ , whereas, in the harder one $\delta=4$ , $\gamma_{0}$ is slightly smaller than $1-\eta$ for large contaminations. This is no surprise: when outliers are hard to identify, the estimators $\hat{V}_{\gamma}$ , with $\gamma\approx 1-\eta$ , are likely to be based on some outliers, which will strongly affect the estimation performance.
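The Monte Carlo design above can be sketched as follows. This is a minimal illustration, not the authors' code: it takes the mean squared error to be the average of the squared angles $\Delta\alpha_{r,\gamma}$ and, to stay self-contained, only treats the untrimmed case $\gamma=1$ , where $S_{1}$ is the usual sample covariance matrix. The function names `sample_contaminated`, `angle_error`, `first_eigvec` and `mse_gamma_one` are hypothetical.

```python
import numpy as np

def sample_contaminated(n, eta, delta, rng):
    """Draw (1-eta)*n clean points from N(0, diag(4,1)) and eta*n outliers,
    the latter distributed in equal proportions as Y or -Y, Y ~ N((0,delta), I)."""
    n_out = int(round(eta * n))
    clean = rng.standard_normal((n - n_out, 2)) * np.array([2.0, 1.0])
    Y = rng.standard_normal((n_out, 2)) + np.array([0.0, delta])
    signs = np.where(np.arange(n_out) % 2 == 0, 1.0, -1.0)[:, None]
    return np.vstack([clean, signs * Y])

def angle_error(v_hat, e1=(1.0, 0.0)):
    """Angle arccos(|e1' v_hat|) between two directions, insensitive to sign."""
    e1 = np.asarray(e1, dtype=float)
    v_hat = np.asarray(v_hat, dtype=float)
    c = abs(e1 @ v_hat) / (np.linalg.norm(e1) * np.linalg.norm(v_hat))
    return float(np.arccos(np.clip(c, 0.0, 1.0)))

def first_eigvec(S):
    """Unit eigenvector associated with the largest eigenvalue of a symmetric S."""
    vals, vecs = np.linalg.eigh(S)  # eigenvalues in ascending order
    return vecs[:, -1]

def mse_gamma_one(R, n, eta, delta, seed=0):
    """Monte Carlo MSE of the first eigendirection of the full-sample covariance
    (the gamma = 1 case only; other gamma values would require an MCD fit)."""
    rng = np.random.default_rng(seed)
    errs = []
    for _ in range(R):
        X = sample_contaminated(n, eta, delta, rng)
        errs.append(angle_error(first_eigvec(np.cov(X, rowvar=False))))
    return float(np.mean(np.square(errs)))
```

Replacing `np.cov` with a minimum covariance determinant fit at trimming level $1-\gamma$ would give the other points of the curve; that step is deliberately left out here.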

In this framework, Tyler shape depth, as announced, may be very useful to select a suitable value of $\gamma$ . We suggest choosing $\gamma$ based on visual inspection of the curve $\mathcal{C}=\{(\gamma,D(\hat{V}_{\gamma},P_{\gamma,n})):\gamma\in[0.5,1]\}$ , where $P_{\gamma,n}$ denotes the empirical measure associated with the optimal subsample leading to $\hat{V}_{\gamma}$ . The rationale is the following: for $\gamma$ small, $D(\hat{V}_{\gamma},P_{\gamma,n})$ will remain relatively high as long as no outlier is added to the optimal subsample. As $\gamma$ increases and outliers are added in the computation of $\hat{V}_{\gamma}$ , the depth $D(\hat{V}_{\gamma},P_{\gamma,n})$ will sharply decrease, thereby forming a kink in $\mathcal{C}$ . The selected $\gamma$ for a given dataset, $\hat{\gamma}$ , should therefore be the largest value for which $\mathcal{C}$ exhibits a stable behaviour. Figure 2 plots the curve $\mathcal{C}$ for the values of $\delta$ and $\eta$ considered above and clearly illustrates the behaviour of the depth curves just described. When the outliers are easily identifiable, the kinks occur at $\gamma_{0}$ , which coincides with $1-\eta$ . In the harder case, where outliers and clean data tend to be mixed, the selected value $\hat{\gamma}$ is still remarkably close to $\gamma_{0}$ . In conclusion, Tyler shape depth, and the ranking of shape matrices it provides, yield an effective visual tool that allows the selection of a sensible trimming proportion $1-\gamma$ in a data-driven way when conducting, e.g., a principal component analysis.

5.2 Outlier detection

For each trading day between February 1st, 2015 and February 1st, 2017, we collected the Nasdaq Composite and S $\&$ P500 stock indices every five minutes and computed their returns, that is, the differences between the logarithms of consecutive index values. The returns on a given day form a bivariate dataset of usually $78$ observations, though the number of observations varies due to missing values; days with fewer than $70$ bivariate returns were discarded. The resulting dataset comprises $n=38489$ observations on $D=478$ trading days.

Our analysis studies the joint behaviour of the bivariate returns in order to determine which trading days are atypical. An important source of atypicality is associated with the overall scale of the bivariate returns, which alternate between periods of high and low volatility. Such deviations can easily be detected by comparing the trace of any scatter measure on intraday data with that on the whole dataset, so we focus instead on detecting atypical joint volatility, i.e., days on which the ratios of the marginal volatilities or the correlations between the returns deviate greatly from their global behaviour.

Let $\hat{V}_{\rm full}=\hat{V}_{\hat{\gamma}}$ denote the minimum covariance determinant shape estimator computed from the full collection of $n$ returns with maximal shape depth. More precisely, denoting as $P_{\rm full}$ the empirical distribution of the full collection of returns, let $\hat{\gamma}=\arg\max_{\gamma\in\Gamma}D(\hat{V}_{\gamma},P_{\rm full})$ , for $\Gamma=\{0.5,0.505,0.51,\dots,0.995,1\}$ . The value obtained is $\hat{\gamma}=0.825$ , with corresponding depth $D(\hat{V}_{\hat{\gamma}},P_{\rm full})=0.497$ . This high depth value ensures that $\hat{V}_{\rm full}$ is an excellent proxy for the deepest shape matrix $\hat{V}=\arg\max_{V}D(V,P_{\rm full})$ , so the computation of $\hat{V}$ is unnecessary. Returns at the beginning of each trading period are known to be more volatile and should be discarded in shape estimation, so the robustness of $\hat{V}_{\rm full}$ is an obvious asset: the value of $\hat{\gamma}$ allows us to adaptively discard days on which the volatility deviates from its global pattern. The procedure discarded more than half of the corresponding intra-day returns for 17 days, and, remarkably, $13$ of these days lie within the two atypical periods mentioned in the next paragraph.

For each day $d=1,\dots,D$ , we evaluated the depth $D(\hat{V}_{\rm full},P_{d})$ of the global shape estimate with respect to the empirical distribution $P_{d}$ of the bivariate returns on day $d$ . The left panel of Figure 3 presents the depth values $D(\hat{V}_{\rm full},P_{d})$ . Vertical lines mark major events affecting the shape of the volatility, while the two greyed rectangles cover two periods during which the markets notoriously gave atypical returns. The first period follows the devaluation of the Yuan on August 11th, 2015, which saw rapid changes in the stock markets, including large devaluations on August 24th, event (a). The second period covers the beginning of 2016, when a slump in oil prices made stocks relying on oil very volatile compared to others. This resulted in atypical shape behaviour during January 22 – February 9; this last day, event (b), had the sharpest loss for the S $\&$ P500 index. The other events are (c) the decision of the European Central Bank on March 10th, 2016 to extend quantitative easing, thereby slashing interest rates, which had a significant positive impact on both the Nasdaq and S $\&$ P500, but more pronounced for the latter, (d) the positive impact on the financial stocks following Fed officials’ comments, made on May 27, 2016, on the possibility of a rate hike, and (e) the aftermath of Donald Trump’s election on November 9th. Atypical days were detected by flagging those whose depth is so low that it falls outside the whiskers of the box-and-whiskers plot of the depth values. This resulted in 12 flagged days, each either being one of the events described above or lying in one of the greyed regions.
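The flagging rule just described can be sketched as follows. The text does not specify the whisker convention, so the usual Tukey rule (lower fence at the first quartile minus $1.5$ times the interquartile range) is an assumption here, as is the function name `flag_low_depth_days`.

```python
import numpy as np

def flag_low_depth_days(depths, whisker=1.5):
    """Return the indices of days whose depth falls below the lower whisker.

    Assumption: Tukey's boxplot rule, lower fence = Q1 - whisker * IQR.
    Only the lower side matters, since atypical days have LOW depth.
    """
    depths = np.asarray(depths, dtype=float)
    q1, q3 = np.percentile(depths, [25, 75])
    lower_fence = q1 - whisker * (q3 - q1)
    return np.flatnonzero(depths < lower_fence), lower_fence
```

Here `depths` would hold the daily values $D(\hat{V}_{\rm full},P_{d})$ , $d=1,\dots,D$ , computed beforehand.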

We also computed the halfspace shape depth ${\rm HD}(\hat{V}_{\rm full},P_{d})$ of the global estimate for each day $d$ (Paindaveine and Van Bever, 2018). The right panel of Figure 3, a plot of $D(\hat{V}_{\rm full},P_{d})$ versus ${\rm HD}(\hat{V}_{\rm full},P_{d})$ , shows a clear positive association. Halfspace shape depth values seem to have a higher concentration than Tyler’s, because the former maximizes a concept of scatter depth in scale and may be able to find scatter estimates better suited to the data. Indeed, a decrease in volatility in one of the marginals might be balanced by considering a scatter with a smaller scale which would have a large depth value. A byproduct of this is the fact that, when evaluating halfspace shape depth, the difficult maximisation step in scale seems to be crucial in correctly computing the depth ranking of the data, which can be affected by small deviations. More importantly, while events (a) and (b) receive low depth with respect to both concepts, only Tyler shape depth succeeds in flagging days associated with events (c) to (e) as outlying.

6 Hypothesis testing for shape

In the previous section, we presented two specific applications of shape depth. The concept also allows us to tackle more standard inference problems for shape, such as point estimation and hypothesis testing. Here, we consider testing $\mathcal{H}_{0}:V=V_{0}$ against $\mathcal{H}_{1}:V\neq V_{0}$ at level $\alpha\in(0,1)$ , where $V_{0}\in\mathcal{P}_{k,{\rm tr}}$ is fixed, based on a random sample $X_{1},\ldots,X_{n}$ from a $k$ -variate elliptical distribution with known location $\theta$ and unknown shape $V$ . In view of Theorem 3.5, a natural depth-based test, $\phi_{D}$ say, rejects the null for small values of $T_{\theta,n}=D_{\theta}(V_{0},P_{n})$ , where $P_{n}$ is the empirical distribution of $X_{1},\ldots,X_{n}$ . Since $T_{\theta,n}$ is discrete, achieving null size $\alpha$ in general requires randomization. The resulting test thus rejects the null hypothesis if $T_{\theta,n}<t_{\alpha,n}$ , rejects the null hypothesis with probability $\gamma_{\alpha,n}$ if $T_{\theta,n}=t_{\alpha,n}$ , and does not reject the null hypothesis if $T_{\theta,n}>t_{\alpha,n}$ , where $t_{\alpha,n}$ is the null $\alpha$ -quantile of $T_{\theta,n}$ and $\gamma_{\alpha,n}$ is the amount of randomization. Under the assumption that $P$ does not charge the center of the distribution, $T_{\theta,n}$ is distribution-free under the null hypothesis, which allows estimating $t_{\alpha,n}$ and $\gamma_{\alpha,n}$ arbitrarily well through simulations. Prior to applying the test below for $k=2$ at level $5\%$ with sample sizes $n=200$ , $500$ , these were estimated from $500,\!000$ mutually independent standard normal samples for each sample size, yielding $\hat{t}_{0.05,200}=0.40$ , $\hat{\gamma}_{0.05,200}=0.61$ , $\hat{t}_{0.05,500}=0.43$ and $\hat{\gamma}_{0.05,500}=0.25$ . Distribution-freeness of $T_{\theta,n}$ under the null hypothesis actually extends to the class of distributions with elliptical directions (Randles, 2000).
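The calibration and decision steps of the randomized test can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: `T_null` is an array of simulated null values of $T_{\theta,n}$ , the critical value is taken as the smallest $t$ with ${\rm pr}(T_{\theta,n}\leq t)\geq\alpha$ , and the names `calibrate` and `randomized_test` are hypothetical.

```python
import numpy as np

def calibrate(T_null, alpha):
    """Estimate the critical value t and the randomization amount gamma
    from simulated null statistics, so that
    pr(T < t) + gamma * pr(T = t) = alpha under the simulated distribution."""
    T_null = np.asarray(T_null, dtype=float)
    vals, counts = np.unique(T_null, return_counts=True)
    cdf = np.cumsum(counts) / T_null.size
    j = int(np.searchsorted(cdf, alpha))      # first index with cdf >= alpha
    below = cdf[j - 1] if j > 0 else 0.0      # estimated pr(T < t)
    gamma = (alpha - below) / (cdf[j] - below)
    return vals[j], gamma

def randomized_test(T, t, gamma, rng):
    """Reject for small depth; randomize on the boundary T == t."""
    if T < t:
        return True
    if T > t:
        return False
    return bool(rng.random() < gamma)
```

Since $T_{\theta,n}$ only takes values $\ell/n$ , the equality `T == t` is meaningful provided the observed and simulated statistics are computed by the same routine.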

We performed two simulations in the bivariate case. The first considers the problem of testing the null hypothesis of sphericity $\mathcal{H}_{0}:V=I_{2}$ about $\theta=0$ and compares the finite-sample powers of $\phi_{D}$ with those of some competitors. For each value of $\ell=0,1,\ldots,6$ we generated $M=3,\!000$ independent random samples $X_{1},\ldots,X_{n}$ of size $n=500$ from the normal distribution with location $\theta=0$ and shape

[TABLE]

and from the corresponding elliptical Cauchy distribution. The value $\ell=0$ corresponds to the null hypothesis, whereas $\ell=1,\ldots,6$ provide increasingly severe alternatives. We took $\xi=0.035$ and $0.045$ for the normal and Cauchy samples, respectively, in order to obtain roughly the same rejection frequencies in both cases.

For each sample, we carried out six tests at nominal level $5\%$ : (i) the test $\phi_{D}$ described above; (ii) the Gaussian test from John (1972), or more precisely, its extension to elliptical distributions with finite fourth-order moments from Hallin and Paindaveine (2006); (iii) the sign test from Hallin and Paindaveine (2006); (iv) the Wald test based on the Tyler (1987) scatter matrix; (v)–(vi) the tests from Paindaveine and Van Bever (2014) based on the shape estimator $\hat{V}_{\gamma}$ in $\S$ 5, with $\gamma=0.5$ and $\gamma=0.8$ . The tests (ii)–(vi) were performed based on their asymptotic null distribution. The rejection frequencies in Figure 4 reveal that $\phi_{D}$ performs very similarly to, although it may be slightly dominated by, the sign-based tests in (iii)–(iv) but performs very well under heavy tails, where it beats all other tests. As expected, the Gaussian test collapses under heavy tails and the minimum covariance determinant tests show low empirical power.

The second simulation tests $\mathcal{H}_{0}:V=V_{0}$ , with $V_{0}={\rm diag}(2,1/2)$ and specified location $\theta=0$ , and compares the tests above in terms of the level robustness (He et al., 1990). We considered mixture distributions $P^{X_{(\eta)}}=(1-\eta)P^{X}+\eta P^{Y}$ with several contamination levels $\eta$ . Here, $X$ is a bivariate, normal or elliptical Cauchy, null random vector. The contamination random vector $Y$ was chosen as follows: (a) $Y$ has the same distribution as the vector obtained by rotating $X$ about the origin by $45$ degrees; (b) $Y$ has the same elliptical distribution as $X$ but its shape is $V=I_{2}$ ; (c) $Y$ is obtained by multiplying the vector $Y$ in (b) by four. The uncontaminated distribution $P^{X}$ puts more mass along the horizontal axis. In (a), the contamination typically shows along the main bisector, whereas the contamination in (b) is uniformly distributed over the unit circle. As for (c), the contamination combines the directional feature of (b) with radial outlyingness. For each combination of distribution, normal or Cauchy, of contamination pattern, (a)–(c), and of contamination level, $\eta=0,0.025,0.05,0.1,0.2,0.25$ or $0.3$ , we generated $3,\!000$ independent random samples $X_{(\eta)1},\ldots,X_{(\eta)n}$ of size $n=200$ . Figure 5 plots the resulting rejection frequencies and reveals the very good robustness of the depth-based test $\phi_{D}$ ; recall that, irrespective of $\eta$ , the target rejection frequency is here $5\%$ . In particular, $\phi_{D}$ always dominates its sign-based competitors (iii)–(iv). The minimum covariance determinant tests (v)–(vi) dominate $\phi_{D}$ in terms of robustness but exhibit poor finite-sample power. Radial outliers strongly affect the Gaussian test.
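The contamination patterns (a)–(c) can be sketched as follows. This is a hedged illustration only: it draws each observation from $P^{Y}$ with probability $\eta$ , generates the elliptical Cauchy case as a bivariate $t$ distribution with one degree of freedom, and uses hypothetical names `sample_elliptical` and `contaminated_sample`.

```python
import numpy as np

def sample_elliptical(n, V, rng, cauchy=False):
    """Centred bivariate normal sample with covariance V; if cauchy is True,
    each row is divided by |g|, g ~ N(0,1), giving an elliptical Cauchy."""
    L = np.linalg.cholesky(np.asarray(V, dtype=float))
    Z = rng.standard_normal((n, 2)) @ L.T
    if cauchy:
        Z = Z / np.abs(rng.standard_normal((n, 1)))
    return Z

def contaminated_sample(n, eta, pattern, rng, cauchy=False):
    """Sample from (1 - eta) P^X + eta P^Y with null shape V0 = diag(2, 1/2)."""
    V0 = np.diag([2.0, 0.5])
    X = sample_elliptical(n, V0, rng, cauchy)
    if pattern == "a":      # null vector rotated by 45 degrees about the origin
        c, s = np.cos(np.pi / 4), np.sin(np.pi / 4)
        rot = np.array([[c, -s], [s, c]])
        Y = sample_elliptical(n, V0, rng, cauchy) @ rot.T
    elif pattern == "b":    # same elliptical family, shape I_2
        Y = sample_elliptical(n, np.eye(2), rng, cauchy)
    elif pattern == "c":    # pattern (b) multiplied by four
        Y = 4.0 * sample_elliptical(n, np.eye(2), rng, cauchy)
    else:
        raise ValueError("pattern must be 'a', 'b' or 'c'")
    from_Y = rng.random(n) < eta
    return np.where(from_Y[:, None], Y, X)
```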

Summing up, the test associated with the proposed shape depth provides a good balance between efficiency and robustness. The improved robustness compared to its sign-based competitors is obtained at a very slight loss of power. Depth-based procedures can thus be defined for standard inference problems on shape, and will tend to perform as well as sign-based procedures. As shown in $\S$ 5, however, shape depth provides a whole ranking of shape matrices that allows addressing less standard applications.

7 Perspectives for future research

The present work offers quite rich research perspectives. The asymptotic distributions of the sample depths $D_{\theta}(V,P_{n})$ and $D(V,P_{n})$ as well as those of the corresponding deepest shape estimators could be studied. Investigating the robustness properties of these shape estimators would also be of interest, in particular to see whether these estimators have a high breakdown point. Regarding hypothesis testing, it would be desirable to define depth-based tests for other shape problems, such as testing the null hypothesis that two populations share the same shape.

Another key point is related to computational aspects. Since Tyler shape depth was defined through halfspace depth, it can in principle be evaluated by using the numerous packages that are dedicated to halfspace depth. The definition of Tyler shape depth suggests that evaluation of this depth in dimension $k$ requires the computation of halfspace depth in dimension $k^{2}$ . Fortunately, redundancies in the random vector $W_{\theta,V}$ reduce the dimension from $k^{2}$ to $d_{k}=k(k+1)/2-1$ as shown by the following result.

Theorem 7.1.

Let $P=P^{X}$ be a probability measure over $\mathbb{R}^{k}$ and fix $\theta\in\mathbb{R}^{k}$ . Let ${\rm vech}(A)$ be the vector stacking the lower-triangular entries of $A=(A_{ij})$ , diagonal included, on top of each other and ${\rm vech}_{0}\,A$ be ${\rm vech}(A)$ deprived of its first component. Then, $D_{\theta}(V,P)\!=\!D(0,P^{\tilde{W}_{\theta,V}})\!=\!\inf_{u\in\mathcal{S}^{d_{k}-1}}\!{\rm pr}(u^{T}\tilde{W}_{\theta,V}\geq 0)$ , with $\tilde{W}_{\theta,V}\!=\!{\rm vech}_{0}\{U_{\theta,V}U^{T}_{\theta,V}-(1/k)I_{k}\}$ .

It follows that, for $k=2$ and $3$ , Tyler shape depth dominates its halfspace counterpart from Paindaveine and Van Bever (2018) from a computational point of view. There is, though, probably room for ad hoc algorithms to compute Tyler shape depth more efficiently. It would also be desirable to design iterative algorithms for the computation of deepest shape matrices.
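For $k=2$ , Theorem 7.1 gives $d_{2}=2$ and $\tilde{W}_{\theta,V}=(U_{1}U_{2},\,U_{2}^{2}-1/2)^{T}$ , so the sample depth reduces to the halfspace depth of the origin in the plane. The sketch below evaluates it by an angular sweep; it is a minimal illustration rather than an optimized or official algorithm, it assumes that no observation coincides with $\theta$ , and the names `halfspace_depth_origin_2d` and `tyler_shape_depth_2d` are hypothetical.

```python
import numpy as np

def halfspace_depth_origin_2d(W):
    """Empirical halfspace depth of the origin for nonzero planar points W (n x 2):
    the minimum over unit vectors u of the fraction of points with u'w >= 0."""
    n = W.shape[0]
    ang = np.arctan2(W[:, 1], W[:, 0])
    # The count is piecewise constant in the angle of u and only changes
    # when the boundary line passes through a data point.
    brk = np.sort(np.mod(np.concatenate([ang + np.pi / 2, ang - np.pi / 2]),
                         2 * np.pi))
    gaps = np.diff(np.append(brk, brk[0] + 2 * np.pi))
    mids = brk + gaps / 2                     # one direction per angular cell
    counts = (np.cos(ang[None, :] - mids[:, None]) >= 0).sum(axis=1)
    return counts.min() / n

def tyler_shape_depth_2d(X, V, theta=(0.0, 0.0)):
    """Sample Tyler shape depth of V for bivariate data X (n x 2), known theta."""
    Z = np.asarray(X, dtype=float) - np.asarray(theta, dtype=float)
    vals, vecs = np.linalg.eigh(np.asarray(V, dtype=float))
    V_inv_half = vecs @ np.diag(vals ** -0.5) @ vecs.T   # symmetric V^{-1/2}
    Z = Z @ V_inv_half
    norms = np.linalg.norm(Z, axis=1)
    if np.any(norms == 0):
        raise ValueError("this sketch assumes no observation equals theta")
    U = Z / norms[:, None]
    # vech_0 of U U' - I/2: drop the (1,1) entry, keep the (2,1) and (2,2) entries
    W = np.column_stack([U[:, 0] * U[:, 1], U[:, 1] ** 2 - 0.5])
    return halfspace_depth_origin_2d(W)
```

The sweep evaluates $O(n)$ directions against $n$ points, hence costs $O(n^{2})$ operations; a sort-based sweep would bring this down, but the quadratic version is easier to check.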

Appendix A

As in the main manuscript, $\rm pr$ will refer to probability under the probability measure $P$ at hand. However, we will sometimes need to emphasize the underlying probability measure, in which case we will write ${\rm pr}_{P}$ , ${\rm pr}_{Q}$ , ${\rm pr}_{P_{n}}$ , etc.

Many of the subsequent results require the following lemma.

Lemma A.1.

Let $P$ be a probability measure over $\mathbb{R}^{k}$ and fix $\theta\in\mathbb{R}^{k}$ . Write $C^{M}_{\theta,V}=\big{\{}x\in\mathbb{R}^{k}\setminus\{\theta\}:(u_{\theta,V}^{x})^{T}Mu_{\theta,V}^{x}\geq{{\rm tr}(M)/k}\big{\}}$ and $\tilde{C}^{M}_{\theta,V}=\big{\{}x\in\mathbb{R}^{k}:(u_{\theta,V}^{x})^{T}Mu_{\theta,V}^{x}\geq{{\rm tr}(M)/k}\big{\}},$ where $u_{\theta,V}^{x}$ is defined as $V^{-1/2}(x-\theta)/\|V^{-1/2}(x-\theta)\|$ if $x\neq\theta$ and as $0$ otherwise. Then, for any $V\in\mathcal{P}_{k,{\rm tr}}$ and any $r\in\mathbb{R}$ ,

[TABLE]

where $\mathcal{M}^{\rm all}_{k}$ collects the $k\times k$ symmetric matrices with arbitrary trace, $\mathcal{M}^{r}_{k}$ is the subset of $\mathcal{M}^{\rm all}_{k}$ of matrices with trace $r$ , and where $\mathcal{M}^{\rm all}_{k,F}$ is the collection of matrices in $\mathcal{M}^{\rm all}_{k}$ with Frobenius norm one.

Proof.

It directly follows from the definition of Tyler shape depth that

[TABLE]

When $v$ runs over $\mathbb{R}^{k^{2}}$ , the matrix $M$ satisfying $v={\rm vec}(M^{T})$ runs over the collection $\mathcal{N}_{k}$ of $k\times k$ matrices. Since $(u_{\theta,V}^{x})^{T}Mu_{\theta,V}^{x}=(u_{\theta,V}^{x})^{T}\{(M+M^{T})/2\}u_{\theta,V}^{x}$ for any $M\in\mathcal{N}_{k}$ , this yields

[TABLE]

Letting $\mathbb{I}(A)$ be equal to one if condition A holds and to zero otherwise, this provides

[TABLE]

where we have used the fact that ${\rm pr}\big{(}C_{\theta,V}^{M}\big{)}$ is unchanged when $M$ is replaced with $M+\lambda I_{k}$ for any $\lambda\in\mathbb{R}$ . The same invariance property explains that the infimum over $\mathcal{M}_{k}^{\rm all}$ in (A.2) may be replaced with an infimum over $\mathcal{M}_{k}^{r}$ for any $r$ . Finally, the result for $\mathcal{M}^{\rm all}_{k,F}$ follows from (A.1) by noting that $\tilde{C}_{\theta,V}^{\lambda M}=\tilde{C}_{\theta,V}^{M}$ for any $\lambda>0$ and that $M=0$ cannot provide the infimum in (A.1). The proof is complete. ∎

Proof of Theorem 3.1.

(i) Fix $M\in\mathcal{M}^{\rm all}_{k}$ and consider $\tilde{C}^{M}=\tilde{C}^{M}_{0,I_{k}}$ , where $\tilde{C}^{M}_{\theta,V}$ was defined in Lemma A.1. Since $\tilde{C}^{M}$ is closed, the mapping $P\mapsto{\rm pr}_{P}(\tilde{C}^{M})$ is upper semicontinuous for weak convergence. Now, Slutsky’s lemma entails that, as $d(V,V_{0})\to 0$ , the measure defined by $B\mapsto{\rm pr}(\theta+V^{1/2}B)$ converges weakly to the one defined by $B\mapsto{\rm pr}(\theta+V_{0}^{1/2}B)$ . Therefore, $V\mapsto{\rm pr}(\theta+V^{1/2}\tilde{C}^{M})={\rm pr}(\tilde{C}^{M}_{\theta,V})$ is upper semicontinuous at $V_{0}$ . From Lemma A.1, we then obtain that

[TABLE]

is upper semicontinuous, as it is the infimum of a collection of upper semicontinuous functions. (ii) The result follows from the fact that the depth region $R_{\theta}(\alpha,P)$ is the inverse image of $[\alpha,\infty)$ by the upper semicontinuous function $V\mapsto D_{\theta}(V,P)$ . (iii) Fix a sequence $(V_{n})$ in $\mathcal{P}_{k,{\rm tr}}$ such that $d(V_{n},V_{0})\to 0$ . In view of Lemma A.1 again, we can, for any $n$ , pick $M_{n}\in\mathcal{M}^{\rm all}_{k,F}$ such that $P(\tilde{C}^{M_{n}}_{\theta,V_{n}})\leq D_{\theta}(V_{n},P)+\frac{1}{n}$ . Compactness of $\mathcal{M}^{\rm all}_{k,F}$ ensures that we can extract a subsequence $(M_{n_{\ell}})$ of $(M_{n})$ that converges to $M_{0}\in\mathcal{M}^{\rm all}_{k,F}$ . Writing $\mathbb{I}(B)$ for the indicator function of the set $B$ , the dominated convergence theorem then yields that

[TABLE]

as $\ell\to\infty$ . The absolute continuity assumption on $P$ guarantees that $\mathbb{I}(\tilde{C}^{M_{n_{\ell}}}_{\theta,V_{n_{\ell}}})-\mathbb{I}(\tilde{C}^{M_{0}}_{\theta,V_{0}})\to 0$ $P$ -almost everywhere. Consequently,

[TABLE]

We conclude that, if $P$ is absolutely continuous with respect to the Lebesgue measure, then $V\mapsto D_{\theta}(V,P)$ is also lower semicontinuous, hence continuous. ∎

The proof of Theorem 3.2 requires the following result.

Lemma A.2.

Let $P$ be a probability measure over $\mathbb{R}^{k}$ and fix $\theta\in\mathbb{R}^{k}$ . Write $u_{\theta}^{x}=(x-\theta)/\|x-\theta\|$ if $x\neq\theta$ and $0$ otherwise. For any $c\geq 0$ , further let $t_{\theta,P}(c)=\sup_{v\in\mathcal{S}^{k-1}}{\rm pr}(|v^{T}u_{\theta}^{X}|\leq c)$ , so that $t_{\theta,P}=t_{\theta,P}(0)=\sup_{v\in\mathcal{S}^{k-1}}{\rm pr}\{v^{T}(X-\theta)=0\}$ . Then, $t_{\theta,P}(c)\to t_{\theta,P}$ as $c\to 0$ .

Proof of Lemma A.2.

Since $t_{\theta,P}(c)$ is increasing in $c$ over $[0,\infty)$ and is larger than or equal to $t_{\theta,P}$ for any positive $c$ , we have that $\tilde{t}_{\theta,P}=\lim_{c\to 0}t_{\theta,P}(c)$ exists and is such that $\tilde{t}_{\theta,P}\geq t_{\theta,P}$ . Now, fix a decreasing sequence $(c_{n})$ converging to $0$ and consider an arbitrary sequence $(v_{n})$ such that

[TABLE]

Since $\mathcal{S}^{k-1}$ is compact, we can consider a subsequence $(v_{n_{\ell}})$ that converges to $v_{0}\in\mathcal{S}^{k-1}$ ; without loss of generality, we can of course assume that this subsequence is such that $(v_{0}^{T}v_{n_{\ell}})$ is an increasing sequence. Let then $C_{\ell}=\{v\in\mathcal{S}^{k-1}:v_{0}^{T}v\geq v_{0}^{T}v_{n_{\ell}}\}$ . Clearly, $C_{\ell}$ is a decreasing sequence of sets with $\cap_{\ell}C_{\ell}=\{v_{0}\}$ , so that

[TABLE]

Now, for any $\ell$ , we have ${\rm pr}\big{[}u_{\theta}^{X}\in\cup_{v\in C_{\ell}}\{y:|v^{T}y|\leq c_{n_{\ell}}\}\big{]}\geq{\rm pr}(|v_{n_{\ell}}^{T}u_{\theta}^{X}|\leq c_{n_{\ell}})\geq t_{\theta,P}(c_{n_{\ell}})-(1/n_{\ell})$ , which implies that $t_{\theta,P}\geq{\rm pr}(|v_{0}^{T}u_{\theta}^{X}|=0)\geq\tilde{t}_{\theta,P}$ . ∎

Proof of Theorem 3.2.

Fix $V\in\mathcal{P}_{k,{\rm tr}}$ and denote as $\lambda_{1}(V)$ and $\lambda_{k}(V)$ the largest and smallest eigenvalues of $V$ , respectively. Possible ties are unimportant below. Letting $v_{1}(V)$ and $v_{k}(V)$ be arbitrary corresponding unit eigenvectors, Lemma A.1 provides, with $M_{V}=v_{1}(V)v_{1}^{T}(V)\in\mathcal{M}^{\rm all}_{k}$ ,

[TABLE]

where we used the inequality $\lambda_{1}(V)\geq 1$ which follows from the constraint ${\rm tr}(V)=k$ , and where $u_{\theta}^{x}$ is defined in Lemma A.2. Therefore,

[TABLE]

Now, ad absurdum, take $\varepsilon>0$ such that $R_{\theta}(t_{\theta,P}+\varepsilon,P)$ is unbounded. This implies that there exists a sequence $(V_{n})$ in $\mathcal{P}_{k,{\rm tr}}$ satisfying $D_{\theta}(V_{n},P)\geq t_{\theta,P}+\varepsilon$ for any $n$ and for which $d(V_{n},I_{k})\to\infty$ . Since $\lambda_{1}(V_{n})<{\rm tr}(V_{n})=k$ , we must have that $\lambda_{k}(V_{n})\to 0$ . Lemma A.2 and (A.3) then imply that $D_{\theta}(V_{n},P)<t_{\theta,P}+\varepsilon$ for $n$ large enough, a contradiction. Consequently, $R_{\theta}(\alpha,P)$ is bounded for any $\alpha>t_{\theta,P}$ .

Now, Lemma C.1 in Paindaveine and Van Bever (2018) readily implies that a bounded subset of $\mathcal{P}_{k,{\rm tr}}$ is also totally bounded, in the sense that, for any $\varepsilon>0$ , it can be covered by finitely many balls of the form $B(V,\varepsilon)=\{\tilde{V}\in\mathcal{P}_{k,{\rm tr}}:d(\tilde{V},V)<\varepsilon\}$ . Part (i) of the result and Theorem 3.1(ii) thus entail that, for any $\alpha>t_{\theta,P}$ , the region $R_{\theta}(\alpha,P)$ is closed and totally bounded. The result then follows from the completeness of the metric space $(\mathcal{P}_{k,{\rm tr}},d)$ . ∎

Proof of Theorem 3.3.

Let $\alpha_{*}=\sup_{V\in\mathcal{P}_{k,{\rm tr}}}D_{\theta}(V,P)$ . By assumption, $R_{\theta}(t_{\theta,P},P)$ is non-empty. Thus, $\alpha_{*}\geq t_{\theta,P}$ and the result holds if $\alpha_{*}=t_{\theta,P}$ . We may therefore assume that $\alpha_{*}>t_{\theta,P}$ . For any $n$ , pick then $V_{n}$ in $R_{\theta}(\alpha_{*}-1/n,P)$ , where $R_{\theta}(\alpha,P)$ is defined as $\mathcal{P}_{k,{\rm tr}}$ for $\alpha<0$ . Fix $\varepsilon\in(0,\alpha_{*}-t_{\theta,P})$ . For $n$ large enough, all terms of the sequence $(V_{n})$ belong to the compact set $R_{\theta}(\alpha_{*}-\varepsilon,P)$ ; see Theorem 3.2. Thus, there exists a subsequence $(V_{n_{k}})$ that converges in $R_{\theta}(\alpha_{*}-\varepsilon,P)$ , to $V_{*}$ say. For any $\varepsilon^{\prime}\in(0,\varepsilon)$ , all $(V_{n_{k}})$ eventually belong to the closed set $R_{\theta}(\alpha_{*}-\varepsilon^{\prime},P)$ , so that $V_{*}\in R_{\theta}(\alpha_{*}-\varepsilon^{\prime},P)$ . Therefore, $\alpha_{*}-\varepsilon^{\prime}\leq D_{\theta}(V_{*},P)\leq\alpha_{*}$ for any such $\varepsilon^{\prime}$ , which establishes the result. ∎

The proof of Theorem 3.4 requires the following preliminary result.

Lemma A.3.

For any $y\in\mathbb{R}^{k}$ and any $k\times k$ symmetric matrix $M$ , the mapping $V\mapsto{\rm tr}(MV)y^{T}V^{-1}y$ is quasi-convex, that is, for any $V_{a},V_{b}\in\mathcal{P}_{k,{\rm tr}}$ and any $t\in[0,1]$ , ${\rm tr}(MV_{t})y^{T}V_{t}^{-1}y\leq\max\{{\rm tr}(MV_{a})y^{T}V^{-1}_{a}y,{\rm tr}(MV_{b})y^{T}V^{-1}_{b}y\}$ , with $V_{t}=(1-t)V_{a}+tV_{b}$ .

Proof.

We treat two cases separately. (i) Assume first that ${\rm tr}(MV_{a}){\rm tr}(MV_{b})>0$ . Write

[TABLE]

Since $s_{t}\in[0,1]$ , the weighted harmonic-arithmetic matrix inequality then shows that, for any $y\in\mathbb{R}^{k}$ ,

[TABLE]

as was to be shown; we refer to Lemma 2.1(vii) in Lawson and Lim (2013) for the aforementioned inequality. (ii) Assume then that ${\rm tr}(MV_{a}){\rm tr}(MV_{b})\leq 0$ . Without loss of generality, assume that ${\rm tr}(MV_{a})\leq 0$ and ${\rm tr}(MV_{b})\geq 0$ . If ${\rm tr}(MV_{a})={\rm tr}(MV_{b})=0$ , then ${\rm tr}(MV_{t})=0$ for any $t$ and the result trivially holds. Hence, we may assume that ${\rm tr}(MV_{a})\neq 0$ or ${\rm tr}(MV_{b})\neq 0$ , which implies that ${\rm tr}(MV_{t_{0}})=0$ for a unique $t_{0}\in[0,1]$ . From continuity, pick then $\delta\in(0,1-t_{0})$ such that, for any $t\in[t_{0},t_{0}+\delta)$ ,

[TABLE]

By applying Part (i) of the proof with $V_{t_{0}+\delta}$ and $V_{b}$ , we obtain that, for any $t\in[t_{0}+\delta,1]$ ,

[TABLE]

Since ${\rm tr}(MV_{t})y^{T}V^{-1}_{t}y\leq 0\leq\max\big{\{}{\rm tr}(MV_{a})y^{T}V^{-1}_{a}y,{\rm tr}(MV_{b})y^{T}V^{-1}_{b}y\big{\}}$ for any $t\in[0,t_{0}]$ , the result follows. ∎

Proof of Theorem 3.4.

(i) Write $V_{t}=(1-t)V_{a}+tV_{b}$ , where $V_{a},V_{b}\in\mathcal{P}_{k,{\rm tr}}$ and $t\in[0,1]$ are fixed. First note that, letting $d^{2}_{\theta}(V)=(X-\theta)^{T}V^{-1}(X-\theta)$ , Lemma A.1 yields

[TABLE]

Writing again $V_{t}=(1-t)V_{a}+tV_{b}$ , Lemma A.3 thus yields that, for any $M\in\mathcal{M}^{\rm all}_{k}$ ,

[TABLE]

The result then follows from (A.4). (ii) If $V_{a},V_{b}\in R_{\theta}(\alpha,P)$ , then Part (i) of the result entails that $D_{\theta}(V_{t},P)\geq\min\{D_{\theta}(V_{a},P),D_{\theta}(V_{b},P)\}\geq\alpha$ , so that $V_{t}\in R_{\theta}(\alpha,P)$ . ∎

The proof of Theorem 3.5 requires the following two lemmas.

Lemma A.4.

Let $P$ be elliptical over $\mathbb{R}^{k}$ with location $0$ and shape $I_{k}$ . Then, $D_{0}(I_{k},P)=(1-{\rm pr}[\{0\}]){\rm pr}(U_{1}^{2}>1/k),$ where $U=(U_{1},\ldots,U_{k})^{T}$ is uniformly distributed over the unit sphere $\mathcal{S}^{k-1}$ .

Lemma A.5.

Let $P$ be elliptical over $\mathbb{R}^{k}$ with location $0$ and shape $I_{k}$ . Then, for any $V\in\mathcal{P}_{k,{\rm tr}}\setminus\{I_{k}\}$ , $D_{0}(V,P)<(1-{\rm pr}[\{0\}]){\rm pr}(U_{1}^{2}>1/k)$ , where $U=(U_{1},\ldots,U_{k})^{T}$ is uniformly distributed over $\mathcal{S}^{k-1}$ .
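As a worked illustration of the maximal depth value appearing in Lemmas A.4 and A.5, consider the bivariate case $k=2$ . The computation below is a standard one and is not taken from the source; the angle $\Phi$ is notation introduced here for the example.

```latex
% For k = 2, write U = (\cos\Phi, \sin\Phi)^T with \Phi uniform over [0, 2\pi).
\[
{\rm pr}(U_{1}^{2} > 1/2)
  = {\rm pr}(\cos^{2}\Phi > 1/2)
  = {\rm pr}(\cos 2\Phi > 0)
  = \frac{1}{2},
\]
% so that, for a bivariate spherical P,
\[
D_{0}(I_{2}, P) = \frac{1 - {\rm pr}[\{0\}]}{2}.
\]
```

In particular, a bivariate elliptical distribution that does not charge its centre has maximal Tyler shape depth $1/2$ . For general $k$ , $U_{1}^{2}$ follows a Beta distribution with parameters $1/2$ and $(k-1)/2$ , which allows ${\rm pr}(U_{1}^{2}>1/k)$ to be evaluated numerically.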

Proof of Lemma A.4.

In the spherical setup considered, we have that, for any $M\in\mathcal{M}_{k}^{\rm all}$ ,

[TABLE]

where $U=(U_{1},\ldots,U_{k})^{T}$ is uniform over $\mathcal{S}^{k-1}$ . Lemma A.1 then entails that

[TABLE]

Decomposing $M$ into $O\Lambda O^{T}$ , where $O$ is a $k\times k$ orthogonal matrix and where $\Lambda={\rm diag}(\lambda_{1},\ldots,\lambda_{k})$ is a diagonal matrix, this yields

[TABLE]

By using successively the facts that $p(0)=1$ and $p(\lambda)=p(\lambda/\|\lambda\|)$ for any $\lambda\in\mathbb{R}^{k}\setminus\{0\}$ , we obtain

[TABLE]

The result then follows from Theorem 2 from Paindaveine and Van Bever (2017), that states that the last infimum in (A.5) is equal to ${\rm pr}(U_{1}^{2}>1/k)$ . ∎

Proof of Lemma A.5.

Fix $V\in\mathcal{P}_{k,{\rm tr}}$ and let $X$ be a random $k$ -vector with $P=P^{X}$ . Write $V=O{\Lambda}O^{T}$ , where $O$ is a $k\times k$ orthogonal matrix and ${\Lambda}={\rm diag}(\lambda_{1},\ldots,\lambda_{k})$ is a diagonal matrix with $\lambda_{1}\geq\lambda_{2}\geq\ldots\geq\lambda_{k}$ . The affine invariance property from Theorem 3.6 entails that

[TABLE]

Denoting by $e_{1}$ the first vector of the canonical basis of $\mathbb{R}^{k^{2}}$ , we then have

[TABLE]

where $U=(U_{1},\ldots,U_{k})^{T}$ is uniform over $\mathcal{S}^{k-1}$ . To have $D_{0}(V,P^{X})={\rm pr}(X\neq 0){\rm pr}\big{(}U_{1}^{2}\geq 1/k\big{)}$ , the inequality in (A.7) needs to be an equality, which requires that $\lambda_{\ell}=\lambda_{1}$ for all $\ell$ , hence that $V=I_{k}$ . ∎

We can now prove Theorem 3.5.

Proof of Theorem 3.5.

Lemmas A.4–A.5 establish the result in the spherical case associated with $\theta_{0}=0$ and $V_{0}=I_{k}$ . For general values of $\theta_{0}$ and $V_{0}$ , note that $Y=V_{0}^{-1/2}(X-\theta_{0})$ is elliptical with location $0$ , shape $I_{k}$ , and satisfies ${\rm pr}(Y=0)={\rm pr}(X=\theta_{0})$ . Writing

[TABLE]

affine invariance then entails that

[TABLE]

with equality if and only if $W_{0}=I_{k}$ , that is, if and only if $V=V_{0}$ . ∎

Proof of Theorem 3.6.

In the proof of Theorem 3.4, we showed that

[TABLE]

Using the fact that $V_{A}^{1/2}=k^{1/2}AV^{1/2}O/\{{\rm tr}(AV\!A^{T})\}^{1/2}$ for some $k\times k$ orthogonal matrix $O$ , this readily yields

[TABLE]

as was to be shown. The affine-equivariance property of the depth regions readily follows. ∎
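The square-root identity used in this proof can be checked in a few lines. The derivation below is a sketch that assumes $V_{A}=kAV\!A^{T}/{\rm tr}(AV\!A^{T})$ with $A$ an invertible $k\times k$ matrix, which is the normalisation suggested by the displayed formula for $V_{A}^{1/2}$. The auxiliary matrix $B$ is introduced here for illustration only.

```latex
% Assumption: V_A = k A V A^T / tr(A V A^T), with A invertible and V^{1/2} symmetric.
\begin{align*}
B &:= \frac{k^{1/2}\,A V^{1/2}}{\{{\rm tr}(A V A^{T})\}^{1/2}},
\qquad
B B^{T} = \frac{k\,A V^{1/2} V^{1/2} A^{T}}{{\rm tr}(A V A^{T})} = V_{A}, \\
O &:= B^{-1} V_{A}^{1/2}
\quad\Longrightarrow\quad
O O^{T} = B^{-1} V_{A} (B^{-1})^{T} = B^{-1} B B^{T} (B^{T})^{-1} = I_{k}, \\
V_{A}^{1/2} &= B O = \frac{k^{1/2}\,A V^{1/2} O}{\{{\rm tr}(A V A^{T})\}^{1/2}} .
\end{align*}
```

The orthogonal matrix $O$ depends on $A$ and $V$, but it plays no role once it is absorbed into a quantity that is invariant under orthogonal transformations.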

The proof of Theorem 3.7 requires the following lemma, whose proof is straightforward, hence is omitted.

Lemma A.6.

For any $v_{1},v_{2}$ such that $v_{1}^{2}+v_{2}^{2}<1$ , we have

[TABLE]

Proof of Theorem 3.7.

(i) If $P=P^{X}$ is elliptical with location $\theta_{0}$ and shape $V_{0}$ , then $V_{0}^{-1/2}(X-\theta_{0})$ is equal in distribution to $RU$ , where $U$ is uniformly distributed over the unit sphere $\mathcal{S}^{k-1}$ and is independent of the nonnegative random variable $R$ . Theorem 3.6 then yields

[TABLE]

where $W_{0}$ is as in (A.8). Now, for any $\tilde{V}\in\mathcal{P}_{k,{\rm tr}}$ , Lemma A.1 entails that

[TABLE]

Combining with (A.9), we obtain

[TABLE]

which establishes Part (i) of the result. (ii) Assume that $P=P^{X}$ is bivariate standard normal and fix $V\in\mathcal{P}_{2,{\rm tr}}$ . We aim at evaluating

[TABLE]

see (A). To do so, it will be convenient to parametrise $V$ and the matrix $M$ as

[TABLE]

with $v_{1}^{2}+v_{2}^{2}<1$ and $m_{1}\neq 0$. Indeed, $m_{1}=0$ makes the probability in (A.11) equal to one, which cannot be the infimum. Decompose $V^{-1/2}MV^{-1/2}$ into $O\Lambda O^{T}$, where $O$ is a $2\times 2$ orthogonal matrix and where $\Lambda={\rm diag}\{\lambda_{1}(V^{-1}M),\lambda_{2}(V^{-1}M)\}$, with $\lambda_{1}(V^{-1}M)\geq\lambda_{2}(V^{-1}M)$, collects the eigenvalues of $V^{-1}M$ or, equivalently, of $V^{-1/2}MV^{-1/2}$. We then have

[TABLE]

where $X=(X_{1},X_{2})^{T}$ is still bivariate standard normal. Since $\lambda_{1}(-V^{-1}M)=-\lambda_{2}(V^{-1}M)$ for any $M\in\mathcal{M}_{k}^{0}$ , we have

[TABLE]

which allows us to restrict to positive values of $m_{1}$ . We will show below that $\lambda_{2}(V^{-1}M)<0<\lambda_{1}(V^{-1}M)$ for any $M\in\mathcal{M}_{k}^{0}$ . A direct computation shows that, for $m_{1}>0$ ,

[TABLE]

and

[TABLE]

Since $f(m_{2})=-\lambda_{2}(V^{-1}M)/\lambda_{1}(V^{-1}M)$ does not depend on $m_{1}$ , (A) leads to

[TABLE]

It is easy to check that $f$ is differentiable over $\mathbb{R}$ with a derivative of the form $c_{v_{1},v_{2}}(m_{2})(v_{2}-v_{1}m_{2})$, where $c_{v_{1},v_{2}}(m_{2})>0$ for any $m_{2}$, and that

[TABLE]

We treat the cases $v_{1}=0$ and $v_{1}\neq 0$ separately.

(a) Assume that $v_{1}=0$ . If $v_{2}=0$ , then $V=I_{2}$ and Theorem 3.5 establishes the result. If $v_{2}\neq 0$ , then $f$ has no critical point and

[TABLE]

and

[TABLE]

so that (A.14) yields

[TABLE]

where we have used the fact that if $Z$ has a $F(1,1)$ Fisher-Snedecor distribution, then $Z/(1+Z)$ has a ${\rm Beta}(1/2,1/2)$ distribution.
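The distributional fact used above can be verified directly. The following short derivation is a sketch relying on the usual representation of an $F(1,1)$ variable as a ratio of squared independent standard normals. The angle $\Theta$ is notation introduced here and does not appear in the paper.

```latex
% Z ~ F(1,1) is represented as X_1^2 / X_2^2 with X_1, X_2 i.i.d. N(0,1);
% Theta denotes the polar angle of (X_1, X_2), uniform over [0, 2*pi).
\begin{align*}
\frac{Z}{1+Z}
  &\overset{d}{=} \frac{X_{1}^{2}}{X_{1}^{2}+X_{2}^{2}} = \cos^{2}\Theta, \\
{\rm pr}\big(\cos^{2}\Theta \leq t\big)
  &= {\rm pr}\big(|\cos\Theta| \leq t^{1/2}\big)
   = \frac{2}{\pi}\arcsin\big(t^{1/2}\big),
   \qquad t\in[0,1],
\end{align*}
% which is the cumulative distribution function of Beta(1/2, 1/2),
% whose density is 1 / (pi * sqrt(t (1 - t))) on (0, 1).
```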

(b) Assume now that $v_{1}\neq 0$. Then the only critical point of $f$ is $m_{2}^{\rm crit}=v_{2}/v_{1}$, so that, irrespective of whether this critical point is a local minimum or a local maximum of $f$,

[TABLE]

and

[TABLE]

Lemma A.6 yields

[TABLE]

and

[TABLE]

hence also

[TABLE]

Therefore, (A.14) finally provides

[TABLE]

This proves the result for the case where $P$ is bivariate standard normal. The general result then follows from Part (i) of the theorem. ∎

Proof of Theorem 4.1.

(i) Let $P$ and $Q$ be two probability measures over $\mathbb{R}^{k}$ and fix $V\in\mathcal{P}_{k,{\rm tr}}$. Fix $\varepsilon>0$ and assume, without loss of generality, that $D_{\theta}(V,P)\leq D_{\theta}(V,Q)$. Lemma A.1 entails that there exists $M_{0}\in\mathcal{M}^{0}_{k}$ such that ${\rm pr}_{P}\big(C^{M_{0}}_{\theta,V}\big)\leq D_{\theta}(V,P)+\varepsilon$, where we still use the notation $C^{M}_{\theta,V}=\big\{x\in\mathbb{R}^{k}\setminus\{\theta\}:(u_{\theta,V}^{x})^{T}Mu_{\theta,V}^{x}\geq{\rm tr}(M)/k\big\}$. Consequently, using Lemma A.1 again,

[TABLE]

with $\mathcal{C}_{\theta}=\{C^{M}_{\theta,V}:M\in\mathcal{M}^{0}_{k},V\in\mathcal{P}_{k,{\rm tr}}\}$ . Since this holds for any $\varepsilon>0$ and for any $V\in\mathcal{P}_{k,{\rm tr}}$ , we have

[TABLE]

It thus only remains to show that $\mathcal{C}_{\theta}$ is a Vapnik-Chervonenkis class. To do so, note that $C^{M}_{\theta,V}=\big\{x\in\mathbb{R}^{k}\setminus\{\theta\}:(x-\theta)^{T}V^{-1/2}MV^{-1/2}(x-\theta)\geq 0\big\}$, so that $\mathcal{C}_{\theta}\subset\{D_{\theta,A}\cap(\mathbb{R}^{k}\setminus\{\theta\}):A\in\mathcal{M}_{k}^{\rm all}\}$, with $D_{\theta,A}=\big\{x\in\mathbb{R}^{k}:(x-\theta)^{T}A(x-\theta)\geq 0\big\}$. Theorem 4.6 from Dudley (2014) implies that $\{D_{\theta,A}:A\in\mathcal{M}_{k}^{\rm all}\}$ is a Vapnik-Chervonenkis class $\mathcal{D}_{\theta}$. It then follows from Lemma 2.6.17(ii) in van der Vaart and Wellner (1996) that $\{D_{\theta,A}\cap(\mathbb{R}^{k}\setminus\{\theta\}):A\in\mathcal{M}_{k}^{\rm all}\}$, hence also $\mathcal{C}_{\theta}$, is a Vapnik-Chervonenkis class. (ii) The proof is long and technical, but follows along the same lines as the proof of Theorem 2.2 in Paindaveine and Van Bever (2018), hence is omitted for the sake of brevity. ∎

Proof of Theorem 4.2.

(i) Recall from (3.1) that $V_{\theta,P}$ is defined as the barycentre of $R_{\theta}(\alpha_{*},P)$ , with $\alpha_{*}=\max_{V}D_{\theta}(V,P)$ . The mapping $V\mapsto D_{\theta}(V,P)$ is upper semicontinuous (Theorem 3.1) and constant over $R_{\theta}(\alpha_{*},P)$ . Clearly, it is easy to define a mapping $V\mapsto\tilde{D}_{\theta}(V,P)$ that is upper semicontinuous, agrees with $V\mapsto D_{\theta}(V,P)$ in the complement of $R_{\theta}(\alpha_{*},P)$ , and for which $V_{\theta,P}$ is the unique maximizer. By using Theorem 4.1, it follows from Theorem 2.12 and Lemma 14.3 in Kosorok (2008) that $d(V_{\theta,P_{n}},V_{\theta,P})\to 0$ almost surely as $n\to\infty$ . Part (i) of the result then follows from the fact that, in neighbourhoods of the form $\{V:d(V,V_{\theta,P})<\varepsilon\}$ , there exists a constant $C=C_{\varepsilon}$ such that $d_{F}(V,V_{\theta,P})<Cd(V,V_{\theta,P})$ , where $d_{F}$ is the Frobenius distance. (ii) The proof is entirely similar, hence is omitted. ∎

Proof of Theorem 7.1.

Let $L_{\theta,V}=U_{\theta,V}U^{T}_{\theta,V}-(1/k)I_{k}$ . Since $(L_{\theta,V})_{11}=-\sum_{\ell=2}^{k}(L_{\theta,V})_{\ell\ell}$ , there exists a $(d_{k}+1)\times d_{k}$ full-rank matrix $H_{0}$ such that ${\rm vech}(L_{\theta,V})=H_{0}\,{\rm vech}_{0}(L_{\theta,V}).$ Therefore, there exists a $k^{2}\times d_{k}$ full-rank matrix $H$ such that $W_{\theta,V}={\rm vec}(L_{\theta,V})=H\tilde{W}_{\theta,V}$ . One can, for example, take $H=DH_{0}$ , where $D$ is the usual duplication matrix. It follows that

[TABLE]

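where we used the fact that $H$ has full column rank, so that $H^{T}u$ ranges over the whole of $\mathbb{R}^{d_{k}}$ when $u$ ranges over $\mathbb{R}^{k^{2}}$. ∎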
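To make the matrices $H_{0}$ and $H=DH_{0}$ from this proof concrete, the sketch below builds them for an arbitrary $k$ using plain Python lists. It assumes that ${\rm vec}$ and ${\rm vech}$ stack entries column by column and that ${\rm vech}_{0}$ is ${\rm vech}$ with its first component $(L)_{11}$ removed. These conventions, and all function names, are assumptions made for illustration rather than definitions taken from the paper.

```python
def vech_indices(k):
    """Index pairs (i, j) with i >= j, in the column-major order used by vech."""
    return [(i, j) for j in range(k) for i in range(j, k)]


def duplication_matrix(k):
    """k^2 x k(k+1)/2 matrix D such that vec(L) = D vech(L) for symmetric L."""
    idx = vech_indices(k)
    D = [[0] * len(idx) for _ in range(k * k)]
    for col, (i, j) in enumerate(idx):
        D[j * k + i][col] = 1  # position of L[i][j] in the column-major vec
        D[i * k + j][col] = 1  # position of the mirror entry L[j][i]
    return D


def h0_matrix(k):
    """(d_k + 1) x d_k matrix H0 with vech(L) = H0 vech0(L) when tr(L) = 0."""
    idx = vech_indices(k)
    H0 = [[0] * (len(idx) - 1) for _ in idx]
    for r in range(1, len(idx)):
        H0[r][r - 1] = 1  # every entry except L_11 is copied as is
        i, j = idx[r]
        if i == j:
            H0[0][r - 1] = -1  # L_11 = -(L_22 + ... + L_kk)
    return H0


def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def matvec(A, x):
    """Product of a matrix (list of rows) with a vector."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


# Example with k = 2 and u = (0.6, 0.8): L = u u^T - (1/2) I_2.
H = matmul(duplication_matrix(2), h0_matrix(2))
L = [[0.36 - 0.5, 0.48], [0.48, 0.64 - 0.5]]
vech0_L = [L[1][0], L[1][1]]
vec_L = [L[0][0], L[1][0], L[0][1], L[1][1]]
reconstructed = matvec(H, vech0_L)  # should coincide with vec_L
```

In this example $H$ is a $4\times 2$ matrix and `reconstructed` reproduces `vec_L` up to floating-point rounding, which is the relation $W_{\theta,V}=H\tilde{W}_{\theta,V}$ in dimension $k=2$.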

Bibliography

Bhatia, R. (2007) Positive Definite Matrices. Princeton, NJ: Princeton University Press.
Bhatia, R. and Holbrook, J. (2006) Riemannian geometry and matrix geometric means. Linear Algebra Appl., 413, 594–618.
Chen, M., Gao, C. and Ren, Z. (2018) Robust covariance and scatter matrix estimation under Huber's contamination model. Ann. Statist., 46, 1932–1960.
Croux, C. and Haesbroeck, G. (1999) Influence function and efficiency of the minimum covariance determinant scatter matrix estimator. J. Multivariate Anal., 71, 161–190.
Dudley, R. M. (2014) Uniform Central Limit Theorems. Cambridge University Press, 2nd edn.
Dümbgen, L. (1998) On Tyler's $M$-functional of scatter in high dimension. Ann. Inst. Statist. Math., 50, 471–491.
Hallin, M. and Paindaveine, D. (2006) Semiparametrically efficient rank-based inference for shape. I. Optimal rank-based tests for sphericity. Ann. Statist., 34, 2707–2756.
He, X., Simpson, D. and Portnoy, S. (1990) Breakdown robustness of tests. J. Amer. Statist. Assoc., 85, 446–452.
