Illumination depth

Stanislav Nagy; Jiří Dvořák

arXiv:1905.04119·math.ST·May 28, 2021

TL;DR

This paper introduces the illumination depth, a new depth measure based on convex geometry. It provides finer resolution of sample points, breaks ties in depth-based orderings, and extends to points outside the convex hull of the data support, while retaining key properties such as affine invariance and robustness.

Contribution

It proposes the illumination depth, a novel concept that complements halfspace depth and enhances multivariate data analysis.

Findings

1. Illumination depth provides finer resolution of sample points.
2. It naturally breaks ties in depth-based ordering.
3. The measure is affine invariant and robust.

Abstract

The concept of illumination bodies studied in convex geometry is used to amend the halfspace depth for multivariate data. The proposed notion of illumination enables finer resolution of the sample points, naturally breaks ties in the associated depth-based ordering, and introduces a depth-like function for points outside the convex hull of the support of the probability measure. The illumination is, in a certain sense, dual to the halfspace depth mapping, and shares the majority of its beneficial properties. It is affine invariant, robust, uniformly consistent, and aligns well with common probability distributions.

Tables

Table 1. Computation times (in seconds) for the evaluation of a single halfspace depth region $P_{n,\alpha}$ that is illuminated on (TukeyRegion); the illumination of $1000$ randomly sampled points onto $P_{n,\alpha}$ (geometry); and the usual halfspace depth of these $1000$ points w.r.t. $P_{n}$ (ddalpha). In all cases the depth is computed w.r.t. a random sample of size $n$ from a standard $d$-variate normal distribution.

Setup \ R package		TukeyRegion	geometry	ddalpha
$n = 50$	$d = 2$	0.02	0.77	0.08
	$d = 3$	0.03	1.03	0.05
	$d = 4$	0.27	9.61	0.06
$n = 200$	$d = 2$	0.03	0.70	0.20
	$d = 3$	0.54	1.88	0.21
	$d = 4$	295.64	70.36	0.20
$n = 500$	$d = 2$	0.31	0.71	0.51
	$d = 3$	9.99	2.31	0.48
	$d = 4$	72010.61	220.21	0.50

Table 2. For different values of $\delta$ the table shows the mean and standard deviation (in brackets) of the number of observations (out of $1000$) with $hD(x;P_{n})\leq\delta$, and for these observations the mean and standard deviation (in brackets) of the estimated (Spearman) correlation coefficient between their correct ranking $R_{c}$ and the rankings based on the halfspace depth $R_{hD}$ and the improved ranking $R_{\mathcal{I}}$ based on the illumination, respectively. The last row corresponds to the observations lying on the boundary of the convex hull of the data; these all have the same halfspace depth and the same $R_{hD}$ rank, and hence $\mathrm{cor}_{S}(R_{hD},R_{c})$ is not defined. Based on $100$ replications of the experiment.

	observations	$\mathrm{cor}_{S}(R_{hD},R_{c})$	$\mathrm{cor}_{S}(R_{\mathcal{I}},R_{c})$
$\delta=0.5$	1000 (0)	0.989 (0.002)	0.991 (0.002)
$\delta=0.05$	733 (11)	0.975 (0.004)	0.981 (0.004)
$\delta=0.01$	363 (12)	0.895 (0.016)	0.933 (0.012)
$\delta=0.005$	253 (11)	0.806 (0.029)	0.905 (0.017)
$\delta=0.001$	107 (10)	—	0.923 (0.023)

Table 3. Average misclassification rates and their standard deviations (in brackets), bivariate normal distributions with different location and different scale, level of contamination in one of the training samples ranging from 0 to 10 %. Based on 100 replications of the experiment and all testing points (left part) and the outsiders (right part), respectively.

	All points			Outsiders
	Illumination	QDA	Ref. depth	Illumination	QDA	Ref. depth
0 %	0.025 (0.003)	0.024 (0.003)	0.029 (0.006)	0.034 (0.023)	0.033 (0.024)	0.202 (0.148)
1 %	0.025 (0.003)	0.050 (0.005)	0.045 (0.010)	0.035 (0.029)	0.055 (0.038)	0.506 (0.155)
5 %	0.026 (0.003)	0.044 (0.005)	0.052 (0.011)	0.033 (0.026)	0.081 (0.052)	0.539 (0.146)
10 %	0.034 (0.004)	0.047 (0.008)	0.059 (0.059)	0.036 (0.029)	0.092 (0.058)	0.541 (0.139)

Table 4. Misclassification rates and their standard deviations (in brackets), bivariate elliptical distributions with different location and different scale, level of contamination in one of the training samples ranging from 0 to 10 %. Based on 100 replications of the experiment and all testing points (left part) and the outsiders (right part), respectively.

	All points			Outsiders
	Illumination	QDA	Ref. depth	Illumination	QDA	Ref. depth
0 %	0.055 (0.006)	0.064 (0.007)	0.060 (0.007)	0.155 (0.071)	0.184 (0.105)	0.240 (0.118)
1 %	0.056 (0.006)	0.096 (0.014)	0.066 (0.008)	0.179 (0.081)	0.238 (0.094)	0.527 (0.183)
5 %	0.057 (0.005)	0.103 (0.029)	0.093 (0.016)	0.185 (0.090)	0.239 (0.102)	0.755 (0.129)
10 %	0.068 (0.008)	0.154 (0.038)	0.108 (0.015)	0.202 (0.084)	0.266 (0.098)	0.742 (0.128)

Table B.5. Average misclassification rates and their standard deviations (in brackets), bivariate normal distributions with different location and same scale, level of contamination in one of the training samples ranging from 0 to 10 %. Based on 100 replications of the experiment and all testing points (left part) and the outsiders (right part), respectively.

	All points			Outsiders
	Illumination	QDA	Ref. depth	Illumination	QDA	Ref. depth
0 %	0.079 (0.006)	0.079 (0.006)	0.085 (0.009)	0.054 (0.030)	0.051 (0.031)	0.236 (0.165)
1 %	0.079 (0.006)	0.089 (0.007)	0.101 (0.010)	0.039 (0.032)	0.059 (0.056)	0.236 (0.235)
5 %	0.081 (0.006)	0.118 (0.010)	0.113 (0.011)	0.047 (0.038)	0.065 (0.052)	0.216 (0.109)
10 %	0.087 (0.006)	0.135 (0.012)	0.120 (0.011)	0.066 (0.045)	0.069 (0.054)	0.210 (0.117)

Table B.6. Average misclassification rates and their standard deviations (in brackets), bivariate elliptical distributions with different location and same scale, level of contamination in one of the training samples ranging from 0 to 10 %. Based on 100 replications of the experiment and all testing points (left part) and the outsiders (right part), respectively.

	All points			Outsiders
	Illumination	QDA	Ref. depth	Illumination	QDA	Ref. depth
0 %	0.135 (0.007)	0.136 (0.008)	0.140 (0.009)	0.286 (0.089)	0.296 (0.109)	0.353 (0.127)
1 %	0.134 (0.008)	0.168 (0.015)	0.142 (0.010)	0.317 (0.111)	0.374 (0.119)	0.382 (0.136)
5 %	0.140 (0.007)	0.242 (0.017)	0.162 (0.013)	0.285 (0.092)	0.354 (0.116)	0.384 (0.107)
10 %	0.165 (0.014)	0.265 (0.020)	0.175 (0.014)	0.311 (0.110)	0.336 (0.114)	0.365 (0.120)

Equations

$$hD\left(x;P\right)=\inf_{u\in\mathbb{R}^{d}}\mathsf{P}\left(u^{\mathsf{T}}X\leq u^{\mathsf{T}}x\right).$$

$$K_{\delta}=\bigcap_{\operatorname{vol}_{d}\left(K\cap H^{-}\right)=\delta}H^{+},$$

$$K_{\delta}=\left\{x\in\mathbb{R}^{d}\colon hD\left(x;P\right)\geq\delta\right\}.$$

$$K^{\delta}=\left\{x\in\mathbb{R}^{d}\colon\operatorname{vol}_{d}\left(\operatorname{co}\left(K\cup\{x\}\right)\right)\leq\operatorname{vol}_{d}\left(K\right)+\delta\right\}.$$

$$\mathcal{I}\left(x;K\right)=\operatorname{vol}_{d}\left(\operatorname{co}\left(K\cup\{x\}\right)\right).$$

$$\mathcal{I}\left(x;\mathcal{E}_{\mu,\Sigma}\right)=\operatorname{vol}_{d}\left(\mathcal{E}_{\mu,\Sigma}\right)+\sqrt{\left|\Sigma\right|}\,\frac{\pi^{\frac{d-1}{2}}}{\Gamma\left(\frac{d+1}{2}\right)}\left(\frac{\mathsf{d}_{\Sigma}\left(x,\mu\right)}{d}\left(1-\frac{1}{\mathsf{d}_{\Sigma}\left(x,\mu\right)^{2}}\right)^{\frac{d+1}{2}}-\int_{0}^{\arccos\left(1/\mathsf{d}_{\Sigma}\left(x,\mu\right)\right)}\sin^{d}(t)\,\mathrm{d}t\right).$$

$$g_{d}\left(\mathsf{d}_{\Sigma}\left(x,\mu\right)\right)=\frac{\mathcal{I}\left(x;\mathcal{E}_{\mu,\Sigma}\right)}{\operatorname{vol}_{d}\left(\mathcal{E}_{\mu,\Sigma}\right)}.$$

$$g_{d}^{\prime}(t)=\frac{\Gamma\left(\frac{d}{2}+1\right)}{\sqrt{\pi}\,\Gamma\left(\frac{d+1}{2}\right)}\,\frac{1}{d}\left(1-\frac{1}{t^{2}}\right)^{(d-1)/2}\quad\mbox{for }t\in(1,\infty).$$

$$\mathsf{d}_{\Sigma}\left(x,\mu\right)=g_{d}^{-1}\left(\frac{\mathcal{I}\left(x;\mathcal{E}_{\mu,\Sigma}\right)}{\operatorname{vol}_{d}\left(\mathcal{E}_{\mu,\Sigma}\right)}\right).$$

$$\operatorname{asa}\left(K\right)=c_{d}\lim_{\delta\to 0}\frac{\operatorname{vol}_{d}\left(K\right)-\operatorname{vol}_{d}\left(K_{\delta}\right)}{\delta^{2/(d+1)}}$$

$$\operatorname{asa}\left(K\right)=b_{d}\lim_{\delta\to 0}\frac{\operatorname{vol}_{d}\left(K^{\delta}\right)-\operatorname{vol}_{d}\left(K\right)}{\delta^{2/(d+1)}}$$

$$P_{n,\delta}=\left\{x\in\mathbb{R}^{d}\colon hD\left(x;P_{n}\right)\geq\delta\right\},$$

$$D_{\alpha_{n}}\left(x;P_{n}\right)=\begin{pmatrix}hD\left(x;P_{n}\right)\\ \mathcal{I}\left(x;P_{n,\alpha_{n}}\right)/\operatorname{vol}_{d}\left(P_{n,\alpha_{n}}\right)\end{pmatrix}.$$

$$\lim_{n\to\infty}\left\|D_{\alpha_{n}}\left(x;P_{n}\right)-\left(hD\left(x;P\right),1\right)^{\mathsf{T}}\right\|=0\quad\mbox{almost surely}.$$

$$D_{\alpha}\left(x;P\right)=\begin{pmatrix}D_{\alpha}^{1}\left(x;P\right)\\ D_{\alpha}^{2}\left(x;P\right)\end{pmatrix}=\begin{pmatrix}hD\left(x;P\right)\\ \mathcal{I}\left(x;P_{\alpha}\right)/\operatorname{vol}_{d}\left(P_{\alpha}\right)\end{pmatrix},$$

$$\sup_{x\in K}\left|\mathcal{I}\left(x;P_{n,\alpha}\right)-\mathcal{I}\left(x;P_{\alpha}\right)\right|\xrightarrow[n\to\infty]{\mathrm{a.s.}}0.$$

$$\sup_{x\in K_{n}}\left|\mathcal{I}\left(x;P_{n,\alpha}\right)-\mathcal{I}\left(x;P_{\alpha}\right)\right|=O_{\mathsf{P}}\left(\frac{\max\left\{1,R_{n}^{d-1}\right\}}{\sqrt{n}}\right),$$

$$\sup_{x\in K_{n}}\left|\frac{\mathcal{I}\left(x;P_{n,\alpha}\right)}{\operatorname{vol}_{d}\left(P_{n,\alpha}\right)}-\frac{\mathcal{I}\left(x;P_{\alpha}\right)}{\operatorname{vol}_{d}\left(P_{\alpha}\right)}\right|=O_{\mathsf{P}}\left(\frac{\max\left\{1,R_{n}^{d-1}\right\}}{\sqrt{n}}\right),$$

$$\sup_{x\in K_{n}}\left\|D_{\alpha}\left(x;P_{n}\right)-D_{\alpha}\left(x;P\right)\right\|=O_{\mathsf{P}}\left(\frac{\max\left\{1,R_{n}^{d-1}\right\}}{\sqrt{n}}\right).$$

$$\sup_{x\in K_{n}}\left|\mathcal{I}\left(x;P_{n,\alpha}\right)-\mathcal{I}\left(x;P_{\alpha}\right)\right|\xrightarrow[n\to\infty]{\mathsf{P}}0,$$

$$\sup_{x\in K_{n}}\left|\frac{\mathcal{I}\left(x;P_{n,\alpha}\right)}{\operatorname{vol}_{d}\left(P_{n,\alpha}\right)}-\frac{\mathcal{I}\left(x;P_{\alpha}\right)}{\operatorname{vol}_{d}\left(P_{\alpha}\right)}\right|\xrightarrow[n\to\infty]{\mathsf{P}}0,$$

$$\sup_{x\in K_{n}}\left\|D_{\alpha}\left(x;P_{n}\right)-D_{\alpha}\left(x;P\right)\right\|\xrightarrow[n\to\infty]{\mathsf{P}}0.$$

$$BP\left(T,P_{n}\right)=\min\left\{\frac{m}{n+m}\colon\sup_{Y^{(m)}}\mathsf{d}\left(T\left(Q_{m+n}\right),T\left(P_{n}\right)\right)=\infty\right\},$$

$$\mathsf{d}_{H}\left(K,L\right)=\max\left\{\max_{x\in K}\inf_{y\in L}\left\|x-y\right\|,\ \max_{x\in L}\inf_{y\in K}\left\|x-y\right\|\right\}$$

$$BP\left(T_{\delta},P_{n}\right)=\frac{\left\lceil\delta/(1-\delta)\,n\right\rceil}{n+\left\lceil\delta/(1-\delta)\,n\right\rceil}\quad\mbox{if }\delta\leq\Pi(P_{n})/(1+\Pi(P_{n})).$$

$$T_{\alpha,\delta}\left(P_{n}\right)=\left\{x\in\mathbb{R}^{d}\colon\mathcal{I}\left(x;P_{n,\alpha}\right)/\operatorname{vol}_{d}\left(P_{n,\alpha}\right)\leq\delta\right\}$$

$$BP\left(T_{\alpha,\delta},P_{n}\right)$$

$$\lim_{n\to\infty}BP\left(T_{\alpha,\delta},P_{n}\right)=\begin{cases}\alpha&\mbox{if }\alpha<\Pi(P)/(1+\Pi(P)),\\ \frac{\Pi(P)}{1+\Pi(P)}&\mbox{otherwise}.\end{cases}$$

$$hD\left(x;P\right)=hD\left(\Sigma^{-1/2}\left(x-\mu\right);Q\right)=F\left(-\left\|\Sigma^{-1/2}\left(x-\mu\right)\right\|\right)=F\left(-\mathsf{d}_{\Sigma}\left(x,\mu\right)\right).$$

$$P_{\alpha}=\left\{x\in\mathbb{R}^{d}\colon\mathsf{d}_{\Sigma}\left(x,\mu\right)\leq-F^{-1}(\alpha)=F^{-1}(1-\alpha)\right\}=\mathcal{E}_{\mu,\Sigma\left(F^{-1}(1-\alpha)\right)^{2}}.$$

$$T_{\alpha,\delta}\left(P\right)$$

Full text

Illumination depth

Stanislav Nagy

and

Jiří Dvořák

Charles University, Prague, Faculty of Mathematics and Physics, Department of Probability and Math. Statistics, Czech Republic

1. Introduction

Halfspace depth is a well-known statistical tool that allows one to define orders, ranks, and quantiles for multivariate datasets. Recent discoveries of connections between the depth and floating bodies [1, 19] uncovered a vast body of knowledge on depth-like procedures in geometry and related fields. That body of mathematics, accumulated over the past 70 years, is little known in mathematical statistics. In this paper we focus on the paradigm of illumination, which is intimately connected to the depth, yet has never been studied with respect to its statistical applications. In convex geometry, illumination is known to be dual to the floating bodies (and, by extension, to the halfspace depth). We introduce it as a tool complementary to the halfspace depth, explore its statistical properties, and outline applications. We show that halfspace depth in conjunction with illumination allows one to devise a nonparametric methodology similar to the depth, with many advantageous properties:

(i) conceptual and computational simplicity; (ii) full affine invariance; (iii) excellent robustness and large sample properties; (iv) the capacity of naturally breaking ties in data orderings; (v) it can be used for the estimation of extreme quantile regions with efficiency comparable to the state-of-the-art approaches; (vi) it is well adjusted to elliptically symmetric distributions; and (vii) it is powerful in applications such as classification.

In Section 2 we introduce illumination and motivate our research by drawing connections between illumination, floating bodies, and halfspace depth. The definition of the illumination depth and its properties are provided in Section 3. The special case of elliptically symmetric distributions is treated in Section 4. In Section 5 we apply the new procedures to tie-breaking in depth-induced orderings, the estimation of extremal depth regions, and classification. Additional technical details, proofs and supplementary results from the simulation studies are gathered in the appendix.

2. Illumination of convex bodies

Since its proposal by John W. Tukey [29, 3], the concept of the halfspace depth has occupied a prominent place in multivariate statistics. Its main idea is to rank the points in a $d$ -dimensional Euclidean space $\mathbb{R}^{d}$ , $d\geq 1$ , according to their centrality as recognized with respect to (w.r.t.) a Borel probability measure $P$ on $\mathbb{R}^{d}$ . The higher the depth of $x$ w.r.t. $P$ is, the more centrally located $x$ is within the probability mass of $P$ . Points that maximize the depth over $\mathbb{R}^{d}$ generalize medians, and the loci of points whose depth exceeds given thresholds form equivalents of the inter-quantile regions from univariate inference. A remarkable array of applications of the depth can be found in [13, 35, 14] and the references therein.

Suppose that all random variables are defined on a probability space $\left(\Omega,\mathcal{F},\mathsf{P}\right)$ and denote the set of Borel probability measures on $\mathbb{R}^{d}$ by $\mathcal{P}\left({\mathbb{R}^{d}}\right)$ . The halfspace depth of $x\in\mathbb{R}^{d}$ w.r.t. $X\sim P\in\mathcal{P}\left({\mathbb{R}^{d}}\right)$ is the minimum probability of a halfspace that contains $x$

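$$hD\left(x;P\right)=\inf_{u\in\mathbb{R}^{d}}\mathsf{P}\left(u^{\mathsf{T}}X\leq u^{\mathsf{T}}x\right).$$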
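To make the definition concrete, the following minimal sketch evaluates the sample halfspace depth in the plane, i.e. with $P$ replaced by the empirical measure of a finite sample. It is an illustration only, not the algorithm used in the paper or in any R package; the function name `halfspace_depth_2d` is hypothetical. The sketch relies on the observation that the number of sample points in a closed halfplane through $x$ changes only at directions where a data point lies on the boundary line, so it suffices to test one direction inside each arc between consecutive such directions.

```python
import math

def halfspace_depth_2d(x, sample):
    """Sample halfspace depth of the point x w.r.t. a planar sample (illustrative sketch)."""
    n = len(sample)
    diffs = [(px - x[0], py - x[1]) for px, py in sample]
    # Sample points that coincide with x belong to every closed halfplane through x.
    at_x = sum(1 for dx, dy in diffs if dx == 0 and dy == 0)
    others = [(dx, dy) for dx, dy in diffs if dx != 0 or dy != 0]
    if not others:
        return at_x / n
    # Angles of normal directions u for which some data point lies on the boundary line.
    critical = sorted(
        (math.atan2(dy, dx) + s * math.pi / 2) % (2 * math.pi)
        for dx, dy in others
        for s in (1, -1)
    )
    best = n
    for a, b in zip(critical, critical[1:] + [critical[0] + 2 * math.pi]):
        if b - a < 1e-12:
            continue  # skip degenerate arcs, where rounding could miscount boundary points
        mid = (a + b) / 2
        ux, uy = math.cos(mid), math.sin(mid)
        # Count the points in the closed halfplane {y : u^T y <= u^T x}.
        count = at_x + sum(1 for dx, dy in others if ux * dx + uy * dy <= 0)
        best = min(best, count)
    return best / n
```

The cost is quadratic in the sample size, which is acceptable for illustration; dedicated implementations are considerably faster.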

For $P$ uniform on a convex body $K$ (a convex compact subset of $\mathbb{R}^{d}$ with non-empty interior) the map $hD\left(\cdot;P\right)$ closely relates to the floating bodies of $K$ , a concept used in geometry since the 19th century; for an extensive bibliography on the topic see [19].

The (convex) floating body $K_{\delta}$ of a convex body $K$ with $\delta\geq 0$ is defined as the intersection of all halfspaces whose defining hyperplanes cut off a set of volume $\delta$ from $K$

$$K_{\delta}=\bigcap_{\operatorname{vol}_{d}\left(K\cap H^{-}\right)=\delta}H^{+},$$

where $H^{+}$ and $H^{-}$ are the two halfspaces with a boundary hyperplane $H\subset\mathbb{R}^{d}$ [25]. When $P\in\mathcal{P}\left({\mathbb{R}^{d}}\right)$ is uniform on $K$ of unit volume,

$$K_{\delta}=\left\{x\in\mathbb{R}^{d}\colon hD\left(x;P\right)\geq\delta\right\}.$$

For general $P\in\mathcal{P}\left({\mathbb{R}^{d}}\right)$ and $\delta\in[0,1]$ the latter set is called the central region of $P$ corresponding to $\delta$ . In the sequel it will be denoted by $P_{\delta}$ .

In geometry, a particular collection of bodies somewhat dual to the floating bodies is known as the illumination bodies. Floating bodies $K_{\delta}$ form subsets of $K$ and fill in $K$ from the inside as $\delta$ decreases to zero. In contrast, illumination bodies $K^{\delta}$ of $K$ are supersets of $K$ and approximate $K$ from the outside as $\delta\to 0$ . Let $K$ be a convex body in $\mathbb{R}^{d}$ and let $\delta\geq 0$ . The illumination body $K^{\delta}$ is the collection of all points whose volume of the convex hull with $K$ does not exceed the volume of $K$ by more than $\delta$

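$$K^{\delta}=\left\{x\in\mathbb{R}^{d}\colon\operatorname{vol}_{d}\left(\operatorname{co}\left(K\cup\{x\}\right)\right)\leq\operatorname{vol}_{d}\left(K\right)+\delta\right\}.$$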

Illumination bodies were proposed by Werner [32], who found that several important properties of the floating bodies have analogues when recast in terms of illumination. Just like the floating bodies, the illumination bodies are

(i) convex bodies; (ii) affine equivariant; (iii) ellipsoids if $K$ is an ellipsoid; and (iv) they converge to $K$ at the same rate as $K_{\delta}$ with $\delta$ decreasing to zero (see also the discussion in Section 2.2 below). Further important properties of illumination bodies can be found in [31, 33, 34, 24, 26].

All these characteristics make the illumination bodies of great interest in statistics. Convexity of the upper level sets of depths is a trait that is often recognized as desirable [5, 27]. As argued by Donoho [3, 4], affine invariance in connection with robustness is the most valuable characteristic of the halfspace depth. For $P\in\mathcal{P}\left({\mathbb{R}^{d}}\right)$ multivariate normal, or more generally, for $P$ elliptically symmetric with a density $f$ , the depth central regions $P_{\delta}$ are known to be ellipsoids with the same centre and orientation as the ellipsoids given by the level sets of $f$ .

It may appear that floating bodies and illumination bodies are inverse to each other, i.e. that $(K_{\delta})^{\delta^{\prime}}=(K^{\delta^{\prime}})_{\delta}=K$ for $\delta=\delta^{\prime}$, or at least for $\delta^{\prime}$ chosen appropriately (see Figure 1). The latter is true for $K$ an ellipsoid. However, neither identity holds in general, as can be seen already for $K$ a polytope. Indeed, $K_{\delta}$ is always strictly convex, but for $K$ a polytope, $K^{\delta}$ is again a polytope [25]; see also Figure 2.

The open problem of finding an inverse floating body is much more involved; for an overview of some advances in this direction see [19, Section 8]. For these reasons, it appears that at the current state of the art, illumination is the closest one can get to the inverse mapping of the floating body operator (or, by extension, to the inverse mapping of the halfspace depth).

2.1. Illumination of ellipsoids

The set of ellipsoids is central to the theories of illumination, floating bodies and halfspace depth. For instance, ellipsoids form an invariance class for all three transformations. In the sequel, it will thus be important to have exact expressions for the illumination bodies of ellipsoids. Define for a convex body $K$ and $x\in\mathbb{R}^{d}$ the illumination of $x$ w.r.t. $K$ as

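$$\mathcal{I}\left(x;K\right)=\operatorname{vol}_{d}\left(\operatorname{co}\left(K\cup\{x\}\right)\right).$$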

Obviously, $K^{\delta}=\left\{x\in\mathbb{R}^{d}\colon\mathcal{I}\left(x;K\right)\leq\operatorname{vol}_{d}\left(K\right)+\delta\right\}$ .
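As a small illustration of these definitions, the sketch below computes the illumination of a point onto a convex polygon in the plane and tests membership in the illumination body $K^{\delta}$. It assumes that $K$ is given by a list of vertices; the helper names (`convex_hull`, `polygon_area`, `illumination`, `in_illumination_body`) are hypothetical and are not taken from the paper.

```python
def convex_hull(points):
    """Andrew's monotone chain; returns the hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Area of a simple polygon by the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(vertices, vertices[1:] + vertices[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


def illumination(x, body_vertices):
    """I(x; K): area of the convex hull of K together with the point x."""
    return polygon_area(convex_hull(list(body_vertices) + [tuple(x)]))


def in_illumination_body(x, body_vertices, delta):
    """Membership of x in K^delta = {x : I(x; K) <= vol(K) + delta}."""
    volume = polygon_area(convex_hull(list(body_vertices)))
    return illumination(x, body_vertices) <= volume + delta
```

For the unit square and the point $(2,0.5)$, the hull gains a triangle with base $1$ and height $1$, so the illumination equals $1.5$; for any point inside the square it equals the area of the square.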

Lemma 1.

For $\mu\in\mathbb{R}^{d}$ and a symmetric, positive definite matrix $\Sigma\in\mathbb{R}^{d\times d}$ consider the distance $\mathsf{d}_{\Sigma}\left(x,\mu\right)=\sqrt{\left(x-\mu\right)^{\mathsf{T}}\Sigma^{-1}\left(x-\mu\right)}$ and the ellipsoid given as its unit ball $\mathcal{E}_{\mu,\Sigma}=\left\{x\in\mathbb{R}^{d}\colon\mathsf{d}_{\Sigma}\left(x,\mu\right)\leq 1\right\}$ . For all $x\notin\mathcal{E}_{\mu,\Sigma}$ we have

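$$\mathcal{I}\left(x;\mathcal{E}_{\mu,\Sigma}\right)=\operatorname{vol}_{d}\left(\mathcal{E}_{\mu,\Sigma}\right)+\sqrt{\left|\Sigma\right|}\,\frac{\pi^{\frac{d-1}{2}}}{\Gamma\left(\frac{d+1}{2}\right)}\left(\frac{\mathsf{d}_{\Sigma}\left(x,\mu\right)}{d}\left(1-\frac{1}{\mathsf{d}_{\Sigma}\left(x,\mu\right)^{2}}\right)^{\frac{d+1}{2}}-\int_{0}^{\arccos\left(1/\mathsf{d}_{\Sigma}\left(x,\mu\right)\right)}\sin^{d}(t)\,\mathrm{d}t\right).$$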

The proof of Lemma 1 can be found in Appendix A along with Lemma 10 that states some important properties of the function $g_{d}$ given by

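$$g_{d}\left(\mathsf{d}_{\Sigma}\left(x,\mu\right)\right)=\frac{\mathcal{I}\left(x;\mathcal{E}_{\mu,\Sigma}\right)}{\operatorname{vol}_{d}\left(\mathcal{E}_{\mu,\Sigma}\right)}.$$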

The function $g_{d}$ is continuously differentiable and its derivative takes a rather simple form

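$$g_{d}^{\prime}(t)=\frac{\Gamma\left(\frac{d}{2}+1\right)}{\sqrt{\pi}\,\Gamma\left(\frac{d+1}{2}\right)}\,\frac{1}{d}\left(1-\frac{1}{t^{2}}\right)^{(d-1)/2}\quad\mbox{for }t\in(1,\infty).$$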

According to Lemma 10, there exists an inverse function $g_{d}^{-1}\colon[1,\infty)\to[1,\infty)$ to $g_{d}$ , and we can write

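$$\mathsf{d}_{\Sigma}\left(x,\mu\right)=g_{d}^{-1}\left(\frac{\mathcal{I}\left(x;\mathcal{E}_{\mu,\Sigma}\right)}{\operatorname{vol}_{d}\left(\mathcal{E}_{\mu,\Sigma}\right)}\right).$$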

This result will be of great importance in the subsequent analysis.
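The relation between the normalized illumination and the distance $\mathsf{d}_{\Sigma}$ can be explored numerically. The sketch below evaluates $g_{d}$ from the cone-minus-cap decomposition of the convex hull of a unit ball and an outside point (the volume of the tangent cone minus the volume of the spherical cap it covers, divided by the volume of the ball), and inverts $g_{d}$ by bisection. The function names `g` and `g_inverse`, the use of Simpson's rule, and the tolerances are choices made here for illustration, not taken from the paper. For $d=2$ the result can be checked against the elementary planar formula $g_{2}(t)=1+\left(\sqrt{t^{2}-1}-\arccos(1/t)\right)/\pi$.

```python
import math

def g(t, d, steps=2000):
    """Normalized illumination g_d(t) of a point at distance t >= 1 from the centre."""
    if t < 1:
        raise ValueError("g_d is defined for t >= 1 only")
    # Ratio of the volumes of the unit balls in dimensions d - 1 and d.
    ratio = math.gamma(d / 2 + 1) / (math.sqrt(math.pi) * math.gamma((d + 1) / 2))
    theta = math.acos(1 / t)
    # Composite Simpson's rule (steps is even) for the integral of sin^d over [0, theta].
    h = theta / steps
    total = math.sin(theta) ** d  # the integrand vanishes at the left endpoint
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * math.sin(k * h) ** d
    cap = total * h / 3
    cone = (t / d) * (1 - 1 / t ** 2) ** ((d + 1) / 2)
    return 1 + ratio * (cone - cap)


def g_inverse(value, d, tol=1e-10):
    """Inverse of g_d on [1, infinity) by bisection; g_d is increasing there."""
    lo, hi = 1.0, 2.0
    while g(hi, d) < value:
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if g(mid, d) < value:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

Applying `g_inverse` to an observed ratio $\mathcal{I}/\operatorname{vol}_{d}$ then recovers the distance $\mathsf{d}_{\Sigma}\left(x,\mu\right)$, which is the use of the inverse described above.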

2.2. Duality considerations

The main impetus for considering floating/illumination bodies in geometry comes from the calculus of variations where many functionals over subsets of $\mathbb{R}^{d}$ are minimized by ellipsoids. Such statements are conveniently quantified using the affine surface area $\operatorname{asa}\left(K\right)$ of a convex body $K$ . For $K$ with a sufficiently smooth boundary $\partial K$ , $\operatorname{asa}\left(K\right)$ is given as a certain integral over $\partial K$ [22, Section 10.5]. Interestingly, it can be written also as the limit

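$$\operatorname{asa}\left(K\right)=c_{d}\lim_{\delta\to 0}\frac{\operatorname{vol}_{d}\left(K\right)-\operatorname{vol}_{d}\left(K_{\delta}\right)}{\delta^{2/(d+1)}}\tag{2}$$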

for $c_{d}>0$ a known constant. Thus, floating bodies can be used to extend the definition of the affine surface area to arbitrary convex bodies [25], see also [19, Section 5.3]. In connection with the motivation from the calculus of variations, by the important affine isoperimetric inequality [22, Section 10.5] ellipsoids are the only maximizers of the affine surface area among all convex bodies with fixed volume.

As shown in [32], a definition equivalent to (2) can be stated also in terms of the illumination bodies

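$$\operatorname{asa}\left(K\right)=b_{d}\lim_{\delta\to 0}\frac{\operatorname{vol}_{d}\left(K^{\delta}\right)-\operatorname{vol}_{d}\left(K\right)}{\delta^{2/(d+1)}}$$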

for $\operatorname{asa}\left(K\right)$ from (2), $b_{d}>0$ a known constant, and $K$ a convex body. Thus, floating bodies and illumination bodies approach $K$ at the same rate, and in this respect they act dually to each other.

Another duality aspect of floating and illumination bodies was recently studied in [17, 18]. There it was shown that, under appropriate conditions, the polar of a floating body of $K$ is, in a proper distance, close to an illumination body of the polar of $K$ . Therefore, even though the exact correspondence between $\left(K_{\delta}\right)^{\delta}$ and $K$ does not hold true, solid evidence from convex geometry suggests that illumination is a concept that is naturally complementary to the floating body (and the halfspace depth). This pairing will be used throughout this paper to define robust, affine invariant extensions of the halfspace depth $hD$ .

3. Illumination depth

Denote the depth of the halfspace median of $P\in\mathcal{P}\left({\mathbb{R}^{d}}\right)$ by $\Pi(P)=\sup_{x\in\mathbb{R}^{d}}hD\left(x;P\right)$. By definition, $\Pi(P)\geq 1/2$ if and only if $P$ is halfspace symmetric [36]. Thus, $\Pi(\cdot)$ may be considered a measure of symmetry of probability distributions. For distributions with a density, $1/2\geq\Pi(P)\geq(d+1)^{-1}$ [19, Section 4].

Let $\left\{X_{n}\right\}_{n=1}^{\infty}$ be a sequence of independent random variables with distribution $P\in\mathcal{P}\left({\mathbb{R}^{d}}\right)$ . Denote by $P_{n}\in\mathcal{P}\left({\mathbb{R}^{d}}\right)$ the empirical measure of the first $n$ variables. We propose to amend the sample halfspace depth $hD\left(\cdot;P_{n}\right)$ by considering not only the collection of all depth central regions

$$P_{n,\delta}=\left\{x\in\mathbb{R}^{d}\colon hD\left(x;P_{n}\right)\geq\delta\right\},$$

but also the illuminations on these. Let $\left\{\alpha_{n}\right\}_{n=1}^{\infty}\subset[0,\Pi(P))$ be a non-increasing sequence of constants. For a random sample of size $n$ the halfspace-illumination depth (or simply the illumination depth) of $x\in\mathbb{R}^{d}$ w.r.t. the empirical measure $P_{n}$ is

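$$D_{\alpha_{n}}\left(x;P_{n}\right)=\begin{pmatrix}hD\left(x;P_{n}\right)\\ \mathcal{I}\left(x;P_{n,\alpha_{n}}\right)/\operatorname{vol}_{d}\left(P_{n,\alpha_{n}}\right)\end{pmatrix}.\tag{3}$$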

In Figure 3 some level sets of this depth are displayed. Several remarks are in order.

(R1) The depth $D_{\alpha_{n}}$ has two components. The usual halfspace depth is a reliable indicator of centrality if $x$ lies inside the region $P_{n,\alpha_{n}}$ where there are enough observations to assess its degree of centrality. For such $x$, $D_{\alpha_{n}}(x;P_{n})=(hD(x;P_{n}),1)^{\mathsf{T}}$, and the illumination does not affect depth-rankings. In contrast, the illumination evaluates the position of $x$ against the group of central points $P_{n,\alpha_{n}}$ that represent the main mass of $P_{n}$. Illumination thus plays a role in the ranking of extremal points whose depth is small, or zero. In the terminology of [35], illumination is an outlyingness function.

(R2) The illumination is undefined if $\Pi(P_{n})<\alpha_{n}$, or if $\operatorname{vol}_{d}\left(P_{n,\alpha_{n}}\right)=0$. The first situation cannot occur for $n$ large enough since $\Pi(P_{n})\xrightarrow[n\to\infty]{\mathrm{a.s.}}\Pi(P)$ [4, formula (6.7)]. Suppose then that $n$ is big enough for $P_{n,\alpha_{n}}$ to be non-empty. Because $P_{n,\alpha_{n}}$ is convex, its volume is zero only if that set is contained in a hyperplane. For random samples from continuous distributions, that happens with probability zero.

(R3) The maximum depth $\Pi(P)$ is typically unknown. In practice it can be replaced by $\Pi(P_{n})$ or, if $P$ is regular enough, by a universal lower bound on $\Pi(P)$ (i.e. $1/2$ for halfspace (or elliptically) symmetric distributions, $\exp(-1)$ for log-concave distributions, or $(s+1)^{-1/s}$ for $s$-concave measures with $-1<s<0$ [19, Theorem 3]). In any case, $\alpha_{n}$ should be bounded away from $\Pi(P)$ for the sets $P_{n,\alpha_{n}}$ to be sufficiently large.

(R4) The illumination depth may be parametrized also in terms of probability: $\alpha_{n}$ may be chosen as the maximum $\delta\in(0,1/2)$ with the property that $P_{n,\delta}$ contains at least $\left\lceil p_{n}\,n\right\rceil$ sample points for $\left\{p_{n}\right\}_{n=1}^{\infty}\subset(0,1)$ given. In the theoretical treatment of the depth we consider the simpler parametrization by $\alpha$.
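The parametrization by probability in (R4) can be sketched as follows. The sample points contained in $P_{n,\delta}$ are exactly those whose sample depth is at least $\delta$, so the largest admissible $\delta$ is the $\left\lceil p_{n}\,n\right\rceil$-th largest sample depth. The function name `cutoff_from_coverage` is hypothetical, the sample depths are assumed to have been computed beforehand, and the restriction $\delta\in(0,1/2)$ is not enforced in this sketch.

```python
import math

def cutoff_from_coverage(sample_depths, p):
    """Largest delta such that at least ceil(p * n) sample points have depth >= delta."""
    n = len(sample_depths)
    # A small tolerance guards against p * n landing just above an integer in floating point.
    k = math.ceil(p * n - 1e-9)
    k = max(1, min(n, k))
    ordered = sorted(sample_depths, reverse=True)
    return ordered[k - 1]
```

For instance, with $p_{n}=0.5$ the returned cut-off is the sample depth that is attained or exceeded by at least half of the observations.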

The practical choice of the cut-off levels $\alpha_{n}$ determines the properties of the depth and should be made with an application in mind. For estimation of extreme depth quantile regions $P_{\delta}$ with $\delta$ extremely small, it is advantageous to let $\alpha_{n}$ converge to $0$ slowly enough. That way, one obtains estimators comparable to the approaches taken in [7, 10]. The disadvantage of this choice of thresholding is its lack of robustness. For a procedure with good robustness properties, a sequence of cut-offs bounded from below is more appropriate. Here we strive for robustness in conjunction with affine invariance. Therefore, we focus mainly on the latter situation; one example of the former scenario will be given in Section 5.2. For tie-breaking purposes, for a particular point $x$ the cut-off $\alpha_{n}$ may even be taken to depend on $x$. Then, if the halfspace depths of $x$ and $y$ coincide, illumination on some $P_{n,\alpha_{n}}$ with $\alpha_{n}>hD\left(x;P_{n}\right)$ may help to decide which of the two points is deeper inside the mass of $P_{n}$, see Section 5.1. Without any substantial loss of generality (all results presented here could be extended in a straightforward way, at the cost of further technicalities), in our theoretical treatment we focus on the situation of a constant cut-off sequence $\alpha_{n}=\alpha<\Pi(P)$ for all $n$. As we will see in Section 3.2, for $\alpha=\Pi(P)/(1+\Pi(P))$ we obtain a procedure with excellent robustness properties.

The mode of the cut-offs $\alpha_{n}$ affects the population version of $D_{\alpha_{n}}$ . If $\alpha_{n}\to 0$ , only $hD$ in (3) is relevant as $n\to\infty$ . In this situation the appropriate population version of $D_{\alpha_{n}}$ is the usual halfspace depth $hD\left(\cdot;P\right)$ and by the standard consistency result for $hD$ for any $P\in\mathcal{P}\left({\mathbb{R}^{d}}\right)$ [4]

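$$\lim_{n\to\infty}\left\|D_{\alpha_{n}}\left(x;P_{n}\right)-\left(hD\left(x;P\right),1\right)^{\mathsf{T}}\right\|=0\quad\mbox{almost surely}.$$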

The situation is different for a constant $\alpha_{n}=\alpha\in(0,\Pi(P))$ . Then we define the population version of $D_{\alpha}$ as

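$$D_{\alpha}\left(x;P\right)=\begin{pmatrix}D_{\alpha}^{1}\left(x;P\right)\\ D_{\alpha}^{2}\left(x;P\right)\end{pmatrix}=\begin{pmatrix}hD\left(x;P\right)\\ \mathcal{I}\left(x;P_{\alpha}\right)/\operatorname{vol}_{d}\left(P_{\alpha}\right)\end{pmatrix},\tag{4}$$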

which is well defined as soon as $\operatorname{vol}_{d}\left(P_{\alpha}\right)>0$ . It is immediate that (4) reduces to (3) for $P=P_{n}$ and $\alpha=\alpha_{n}$ .

The illumination depth $D_{\alpha}$ satisfies the desirable properties of a statistical depth suggested in [35, 27].

Theorem 2.

Let $x\in\mathbb{R}^{d}$ , $X\sim P_{X}=P\in\mathcal{P}\left({\mathbb{R}^{d}}\right)$ , and $\alpha\in(0,\Pi(P))$ be such that $\operatorname{vol}_{d}\left(P_{\alpha}\right)>0$ .

(i) For any $A\in\mathbb{R}^{d\times d}$ non-singular and any $b\in\mathbb{R}^{d}$ we have $D_{\alpha}\left(Ax+b;P_{AX+b}\right)=D_{\alpha}\left(x;P_{X}\right)$, where $P_{AX+b}$ is the distribution of the random vector $AX+b$.

(ii) $P$ is halfspace symmetric around $x\in\mathbb{R}^{d}$ if and only if $D_{\alpha}\left(x;P\right)=\left(c,1\right)^{\mathsf{T}}$ for $c\geq 1/2$.

(iii) For $x$ that satisfies $hD\left(x;P\right)=\Pi(P)$ and any $u\in\mathbb{R}^{d}$, the first and the second component of $D_{\alpha}\left(x+u\,t;P\right)$ are a non-increasing and a non-decreasing function of $t\geq 0$, respectively.

(iv) All the level sets of the form $\left\{x\in\mathbb{R}^{d}\colon D_{\alpha}^{1}\left(x;P\right)\geq c_{1}\mbox{ and }D_{\alpha}^{2}\left(x;P\right)\leq c_{2}\right\}$ are convex for all $c_{1},c_{2}\in\mathbb{R}$, and compact for all $c_{1}>0$, $c_{2}\in\mathbb{R}$.

(v) The first element of $D_{\alpha}\left(\cdot;P\right)$ is a function that is upper semi-continuous. Its second element is a function that is continuous on $\mathbb{R}^{d}$. If $P$ has a density, $D_{\alpha}\left(\cdot;P\right)$ is continuous on $\mathbb{R}^{d}$.

(vi) As $\left\|x\right\|\to\infty$, the first element of $D_{\alpha}\left(x;P\right)$ converges uniformly to zero. Its second element increases uniformly to $\infty$.
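The affine invariance in part (i) can be checked numerically for the illumination component in the plane. The sketch below assumes that the central region is a convex polygon given by its vertices; because central regions are affine equivariant, mapping the vertices and the point by the same non-singular affine transformation should leave the ratio $\mathcal{I}/\operatorname{vol}_{d}$ unchanged, since all areas are multiplied by $\left|\det A\right|$. All names (`hull_area`, `illumination_ratio`, `affine_map`) are hypothetical and serve only this illustration.

```python
def hull_area(points):
    """Area of the convex hull of planar points (monotone chain plus shoelace formula)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return 0.0

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    hull = lower[:-1] + upper[:-1]
    total = 0.0
    for (x1, y1), (x2, y2) in zip(hull, hull[1:] + hull[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


def illumination_ratio(x, region_vertices):
    """Second component of the illumination depth for a polygonal central region."""
    region = list(region_vertices)
    return hull_area(region + [tuple(x)]) / hull_area(region)


def affine_map(p, A, b):
    """Image of the planar point p under the map p -> A p + b."""
    return (A[0][0] * p[0] + A[0][1] * p[1] + b[0],
            A[1][0] * p[0] + A[1][1] * p[1] + b[1])
```

Evaluating `illumination_ratio` before and after applying `affine_map` to both the region and the point gives values that agree up to rounding error.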

3.1. Uniform consistency

Recall that $P$ is said to have contiguous support if the support of $P$ cannot be separated by a slab between two parallel hyperplanes. Connected support is contiguous.

Theorem 3.

Let $P\in\mathcal{P}\left({\mathbb{R}^{d}}\right)$ be absolutely continuous with contiguous support, and let $\alpha\in(0,\Pi(P))$ . Then the illumination is locally uniformly consistent for $P$ , that is for any $K\subset\mathbb{R}^{d}$ compact

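$$\sup_{x\in K}\left|\mathcal{I}\left(x;P_{n,\alpha}\right)-\mathcal{I}\left(x;P_{\alpha}\right)\right|\xrightarrow[n\to\infty]{\mathrm{a.s.}}0.$$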

If, furthermore, $P$ satisfies Assumptions 1 and 2 from [1], and $K_{n}$ is a sequence of sets such that $K_{n}\cup P_{\alpha}$ is contained in a ball of radius $R_{n}$ , then

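$$\begin{aligned}
\sup_{x\in K_{n}}\left|\mathcal{I}\left(x;P_{n,\alpha}\right)-\mathcal{I}\left(x;P_{\alpha}\right)\right|&=O_{\mathsf{P}}\left(\frac{\max\left\{1,R_{n}^{d-1}\right\}}{\sqrt{n}}\right),\\
\sup_{x\in K_{n}}\left|\frac{\mathcal{I}\left(x;P_{n,\alpha}\right)}{\operatorname{vol}_{d}\left(P_{n,\alpha}\right)}-\frac{\mathcal{I}\left(x;P_{\alpha}\right)}{\operatorname{vol}_{d}\left(P_{\alpha}\right)}\right|&=O_{\mathsf{P}}\left(\frac{\max\left\{1,R_{n}^{d-1}\right\}}{\sqrt{n}}\right),\\
\sup_{x\in K_{n}}\left\|D_{\alpha}\left(x;P_{n}\right)-D_{\alpha}\left(x;P\right)\right\|&=O_{\mathsf{P}}\left(\frac{\max\left\{1,R_{n}^{d-1}\right\}}{\sqrt{n}}\right).
\end{aligned}$$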

In particular, if $d=1$ and $R_{n}>0$ , or $d>1$ and $R_{n}=o\left(n^{1/(2(d-1))}\right)$ ,

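$$\begin{aligned}
\sup_{x\in K_{n}}\left|\mathcal{I}\left(x;P_{n,\alpha}\right)-\mathcal{I}\left(x;P_{\alpha}\right)\right|&\xrightarrow[n\to\infty]{\mathsf{P}}0,\\
\sup_{x\in K_{n}}\left|\frac{\mathcal{I}\left(x;P_{n,\alpha}\right)}{\operatorname{vol}_{d}\left(P_{n,\alpha}\right)}-\frac{\mathcal{I}\left(x;P_{\alpha}\right)}{\operatorname{vol}_{d}\left(P_{\alpha}\right)}\right|&\xrightarrow[n\to\infty]{\mathsf{P}}0,\\
\sup_{x\in K_{n}}\left\|D_{\alpha}\left(x;P_{n}\right)-D_{\alpha}\left(x;P\right)\right\|&\xrightarrow[n\to\infty]{\mathsf{P}}0.
\end{aligned}$$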

Thanks to the sharp bound on the volume difference of convex bodies devised in Lemma 11 in the appendix, it is possible to state an explicit deviation inequality such as that from [1, Theorem 2]. We omit this result for brevity.

Note that the technical Assumptions 1 and 2 from [1] are not restrictive at all. They are satisfied, for instance, if $P$ has a density that is bounded away from zero in a large enough superset of $P_{\alpha}$ , if the density of $P$ is continuous, positive and decreases fast enough [1, Assumption 3], or, for the case of elliptically symmetric distributions, if the density of $P$ is continuous and positive at the boundary of $P_{\alpha}$ .

Neither the illumination nor the illumination depth is consistent uniformly over unbounded sets in $\mathbb{R}^{d}$; an example is given in Section A.5 in the appendix. This does not limit practical applications: for any sequence of compact sets $K_{n}$ whose size increases with $n$ at the rate permitted in Theorem 3, uniform consistency is guaranteed.

3.2. Robustness

We now turn to the robustness properties of the illumination. For a data set that corresponds to an empirical measure $P_{n}\in\mathcal{P}\left({\mathbb{R}^{d}}\right)$ , the addition breakdown point of an estimator $T=T(P_{n})$ is defined [3] as

[TABLE]

where $Y^{(m)}$ is an $m$ -tuple of (not necessarily distinct) points in $\mathbb{R}^{d}$ , $Q_{m+n}$ is the empirical measure that assigns probability $1/(m+n)$ to all the data points from $P_{n}$ and $Y^{(m)}$ , and $\mathsf{d}$ is an appropriate distance in the target space of $T$ . For the usual halfspace depth, the finite sample breakdown point of the central region $T_{\delta}(P_{n})=P_{n,\delta}$ for $\delta\in(0,\Pi(P_{n}))$ can be derived from [4, Section 3]. With the Hausdorff distance of compact sets $K,L\subset\mathbb{R}^{d}$

[TABLE]

in (5) in place of $\mathsf{d}$ , from the advances in [4] (for a formal proof see Section A.6 in the appendix) it can be shown that

[TABLE]

For $\delta$ close to zero and $n$ large, the finite sample breakdown point of $P_{n,\delta}$ is of order $\delta$ . This corroborates the well known fact that the outer, extremal regions of the halfspace depth are not robust. For instance, in any configuration of $n$ points in $\mathbb{R}^{d}$ , to dislocate the largest proper depth region $P_{n,1/n}$ — the convex hull of the sample points — it is enough to add a single contaminating observation. On the other hand, for any $\delta\geq\Pi(P_{n})/(1+\Pi(P_{n}))$ , the central region $P_{n,\delta}$ is rather stable with a positive breakdown point not smaller than $\Pi(P_{n})/(1+\Pi(P_{n}))$ . If $P_{n}$ corresponds to a random sample from $P\in\mathcal{P}\left({\mathbb{R}^{d}}\right)$ , as $n\to\infty$ the latter breakdown point approaches $\Pi(P)/(1+\Pi(P))$ [4, Proposition 3.3]. Thus, the more regular $P$ is, the more robust its inner halfspace depth central regions are.

We now give expressions for the finite sample breakdown point of the illumination.

Theorem 4.

For an empirical measure $P_{n}\in\mathcal{P}\left({\mathbb{R}^{d}}\right)$ , $\alpha<\Pi(P_{n})$ , and

[TABLE]

for $\delta\geq 1$ , we have that

[TABLE]

If $P_{n}$ is the empirical measure of a random sample from $P\in\mathcal{P}\left({\mathbb{R}^{d}}\right)$ of size $n$ , then it almost surely holds true that

[TABLE]

Theorem 4 asserts that the illumination is quite robust: unlike for the halfspace depth, its breakdown point does not depend on $\delta$. For $\alpha=\Pi(P)/(1+\Pi(P))$, Remark (R3) implies that $BP\left(T_{\alpha,\delta},P_{n}\right)\geq 1/(d+2)$ for any configuration of $n$ points $P_{n}$, so the illumination always possesses a strictly positive breakdown point. This simple bound is, however, rather pessimistic. If $P_{n}$ is a random sample from a log-concave distribution, $\lim_{n\to\infty}BP\left(T_{\alpha,\delta},P_{n}\right)=1/(1+e)\approx 0.27$ almost surely, and for $P$ halfspace symmetric, $\lim_{n\to\infty}BP\left(T_{\alpha,\delta},P_{n}\right)=1/3$ almost surely. Overall, for a random sample of size $n$ from $P$ that is regular enough, nearly $m=n/2$ points must, with large probability, be added to the dataset to disturb the illumination procedure completely. This contrasts sharply with the usual halfspace depth. According to (7), for any distribution $P$, the region $P_{n,\delta}$ alone is disrupted completely as soon as around $m=n\delta/(1-\delta)$ contaminants are strategically added to the sample. For numerical results see Section 5.
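The three numerical values quoted above share the form $\Pi/(1+\Pi)$, evaluated at a maximal depth $\Pi$ of $1/(d+1)$, $1/e$ and $1/2$, respectively. The following Python sketch only checks this arithmetic, together with the number of contaminants $m$ that makes up a given fraction of the contaminated sample. The function names are hypothetical, and the three values of $\Pi$ are the standard lower bounds on the maximal halfspace depth, taken here as an assumption.

```python
import math


def bp_from_max_depth(max_depth: float) -> float:
    """Return Pi / (1 + Pi) for a maximal halfspace depth Pi (assumed form)."""
    return max_depth / (1.0 + max_depth)


def contaminants_for_fraction(n: int, eps: float) -> float:
    """Return m such that m / (n + m) = eps, i.e. m = n * eps / (1 - eps)."""
    return n * eps / (1.0 - eps)


d = 2
worst_case = bp_from_max_depth(1.0 / (d + 1))   # equals 1 / (d + 2)
log_concave = bp_from_max_depth(1.0 / math.e)   # equals 1 / (1 + e)
symmetric = bp_from_max_depth(0.5)              # equals 1 / 3

# A breakdown point of 1/3 corresponds to m = n / 2 added points.
m_illumination = contaminants_for_fraction(1000, symmetric)
# For the halfspace depth region at level delta, m = n * delta / (1 - delta).
m_halfspace = contaminants_for_fraction(1000, 0.05)
```

In this reading, a sample of size $1000$ from a halfspace symmetric distribution tolerates roughly $500$ added points for the illumination, compared with roughly $53$ for the halfspace depth region at level $\delta=0.05$.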

3.3. Computational cost

Illumination is computed in two steps. Given a dataset $P_{n}$ , firstly a single central region $P_{n,\alpha}$ is computed. This set is a convex polytope. In the second step, illumination of $x$ onto $P_{n,\alpha}$ is evaluated by employing algorithms for the computation of convex hulls of points and volumes of convex polytopes.
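As a rough illustration of the second step, the following Python sketch evaluates the illumination of a point onto a planar convex polygon. It assumes that the illumination of $x$ onto a convex body is the volume of the convex hull of the body and $x$ (consistent with the one-dimensional case treated in Section A.1), that the vertices of $P_{n,\alpha}$ are already available, and that $d=2$. The helper names are hypothetical, and this is not the R code of Appendix C.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def convex_hull(points: Sequence[Point]) -> List[Point]:
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


def polygon_area(vertices: Sequence[Point]) -> float:
    """Area of a simple polygon by the shoelace formula."""
    total = 0.0
    for i in range(len(vertices)):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % len(vertices)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def illumination(x: Point, region_vertices: Sequence[Point]) -> float:
    """Area of the convex hull of the region together with the point x."""
    return polygon_area(convex_hull(list(region_vertices) + [x]))


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
inside = illumination((0.5, 0.5), square)    # x inside: area of the square
outside = illumination((2.0, 0.5), square)   # square plus a triangle
```

For a point inside the region the illumination equals the area of the region itself, so only points outside it carry additional information.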

Computation of $P_{n,\alpha}$ is generally a demanding task. For a single level $\alpha$ required for the illumination, recent advances made this feasible in dimension $d$ up to five or ten and moderate sample sizes $n$ ; see [15] and the references therein.

Finding the illumination of $x$ is already quite well explored. Both problems of finding a convex hull of a dataset and its volume are standard in computational geometry. A great number of effective algorithms exists in this direction [2]; for a more recent contribution see [8].

In our R implementation we combine tools from the package TukeyRegion with the R interface to the Qhull toolbox (http://www.qhull.org/) implemented in the package geometry. This code, given in Appendix C, handles hundreds of observations in dimensions $d\leq 5$ without substantial difficulties; see Table 1. More efficient algorithms for the computation of volumes of convex polytopes can be used to speed up the computation. The computation of the single halfspace depth region $P_{n,\alpha}$ is the true bottleneck of this procedure, especially in higher dimensions.

4. Illumination for elliptically symmetric distributions

We now turn our attention to elliptically symmetric distributions or, more generally, to those distributions $P$ whose halfspace depth central regions $P_{\delta}$ are close to ellipsoids. It may appear that the latter assumption is restrictive. Nonetheless, it is known that any sufficiently regular distribution $P$ possesses central regions $P_{\delta}$ that are bound to have almost ellipsoidal shapes. This was first observed by Milman and Pajor [16]; see the proposition in the appendix of that paper. There it is shown that for $P$ uniform on a symmetric convex body $K$ every $P_{\delta}$ is uniformly, up to a known constant, isomorphic to an ellipsoid. References to further extensions of that groundbreaking result to (asymmetric) log-concave or $s$-concave measures with additional discussion can be found in [19, Section 7]. Thus, even though formally the restriction to elliptical symmetry in this section is real, at least heuristically all these results will hold true more widely.

We start by collecting some useful information about elliptically symmetric distributions. For references to these results see [28, 9]. $P\in\mathcal{P}\left({\mathbb{R}^{d}}\right)$ is said to be spherically symmetric if the measure $P$ is invariant with respect to all orthogonal transformations on $\mathbb{R}^{d}$. It is elliptically symmetric if it can be represented as an affine image of a spherically symmetric distribution; that is, we say that $X\sim P\in\mathcal{P}\left({\mathbb{R}^{d}}\right)$ is elliptically symmetric if $X\stackrel{d}{=}\mu+AZ$ for $\mu\in\mathbb{R}^{d}$, $A\in\mathbb{R}^{d\times k}$, and $Z=\left(Z_{1},\dots,Z_{k}\right)^{\mathsf{T}}\sim Q\in\mathcal{P}\left({\mathbb{R}^{k}}\right)$ spherically symmetric. The symbol $\stackrel{d}{=}$ stands for “is equal in distribution”. Note that $Q$ is uniquely characterized by the symmetric distribution function $F(z)=\mathsf{P}\left(Z_{1}\leq z\right)$, $z\in\mathbb{R}$; by symmetry of the distribution function we mean that $F(z)=1-F(-z)$ at all points of continuity of $F$. We also write $X\sim P=EC\left(\mu,\Sigma,F\right)$ for $\Sigma=AA^{\mathsf{T}}\in\mathbb{R}^{d\times d}$. Because $\mu+cAZ=\mu+A(cZ)$ and $\widetilde{F}(z)=\mathsf{P}\left(cZ_{1}\leq z\right)=F\left(z/c\right)$ for any $c>0$ and $z\in\mathbb{R}$, $EC\left(\mu,c^{2}\Sigma,F\right)=EC\left(\mu,\Sigma,\widetilde{F}\right)$. To identify $P$ uniquely, we therefore in this section consider mainly elliptically symmetric distributions whose scatter matrix $\Sigma$ is normalized to have a unit determinant $\left|\Sigma\right|=1$.
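As a small worked instance of this normalization (our own illustration, not taken from the source), a general positive definite scatter matrix $\Sigma$ can be rescaled to unit determinant by setting

$$c=\left|\Sigma\right|^{1/(2d)},\qquad\Sigma_{0}=c^{-2}\Sigma,\qquad\left|\Sigma_{0}\right|=c^{-2d}\left|\Sigma\right|=1,$$

so that, by the identity above,

$$EC\left(\mu,\Sigma,F\right)=EC\left(\mu,c^{2}\Sigma_{0},F\right)=EC\left(\mu,\Sigma_{0},\widetilde{F}\right),\qquad\widetilde{F}(z)=F\left(z/c\right).$$

For example, for $d=2$ and $\Sigma=4I_{2}$ we get $\left|\Sigma\right|=16$, $c=2$, $\Sigma_{0}=I_{2}$ and $\widetilde{F}(z)=F(z/2)$; for $F=\Phi$ this is the distribution function of a centred normal variable with variance $4$, as expected.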

For $X\sim P=EC\left(\mu,\Sigma,F\right)\in\mathcal{P}\left({\mathbb{R}^{d}}\right)$ with $\Sigma$ positive definite and $Q=\Sigma^{-1/2}\left(X-\mu\right)$ the spherically symmetric affine image of $P$ , affine invariance of $hD$ gives a simple expression for the halfspace depth of any $x\in\mathbb{R}^{d}$

[TABLE]

It is also not hard to realise (see, e.g., [19, Theorem 34]) that $P=EC(\mu,\Sigma,F)$ if and only if all the halfspace depth central regions $P_{\alpha}$ with $0<\alpha<1/2$ are ellipsoids of the form

[TABLE]

Thus, by the expression for the illumination of ellipsoids (1), the lower level sets of the illumination (8) with $\delta\geq 1$ are also all ellipsoids

[TABLE]

In particular, for $x\notin P_{\alpha}$ ,

[TABLE]

and both the halfspace depth upper level sets and the illumination lower level sets are ellipsoids centred at $\mu$ with the same orientation as the Mahalanobis ellipsoid $\mathcal{E}_{\mu,\Sigma}$ , or equivalently, the density contours of $P$ . A first application of this property is straightforward. For $P$ with a unimodal elliptically symmetric density, the depth-induced centre-outward ordering of the points $x\in\mathbb{R}^{d}$ is a function of their Mahalanobis distance $\mathsf{d}_{\Sigma}(x,\mu)$ from the mode $\mu$ — the smaller $\mathsf{d}_{\Sigma}\left(x,\mu\right)$ , the more central $x$ is. To get a robust, affine invariant depth-based estimator of $\mathsf{d}_{\Sigma}\left(x,\mu\right)$ , employ (9) and (10) to see that

[TABLE]

Because $F$ is non-decreasing and $g_{d}^{-1}$ strictly increases on its domain by Lemma 10, $\mathsf{d}_{\Sigma}(x_{1},\mu)\leq\mathsf{d}_{\Sigma}(x_{2},\mu)$ is equivalent to one of the following three situations:

(i) either $hD\left(x_{1};P\right)\geq\alpha>hD\left(x_{2};P\right)$ ; or (ii) if both depths are high, $hD\left(x_{1};P\right)\geq hD\left(x_{2};P\right)\geq\alpha$ ; or (iii) if both depths are low, $\alpha>\max\left\{hD\left(x_{1};P\right),hD\left(x_{2};P\right)\right\}$ , and at the same time $\mathcal{I}\left(x_{1};P_{\alpha}\right)\leq\mathcal{I}\left(x_{2};P_{\alpha}\right)$ . This yields the following centre-outwards ranking procedure for points $x_{1},\dots,x_{m}\in\mathbb{R}^{d}$ (the lowest rank is for the most central position):

(i)

compute the depth $D_{\alpha}$ of all points $x_{1},\dots,x_{m}$;

(ii)

the $k$ points whose halfspace depth is at least $\alpha$ are ranked as the $k$ most central points $x_{(1)}\preceq\dots\preceq x_{(k)}$ according to their decreasing halfspace depth, i.e. $x_{i}\preceq x_{j}$ if $hD\left(x_{i};P\right)\geq hD\left(x_{j};P\right)$;

(iii)

the $m-k$ remaining points are ranked as the less central points $x_{(k+1)}\preceq\dots\preceq x_{(m)}$ according to their increasing illumination, i.e. $x_{i}\preceq x_{j}$ if $\mathcal{I}\left(x_{i};P_{\alpha}\right)\leq\mathcal{I}\left(x_{j};P_{\alpha}\right)$ .

This robust ranking can produce ties. However, they are easy to break. If $hD\left(x_{i};P\right)=hD\left(x_{j};P\right)\geq\alpha$ , use the illumination and set $x_{i}\prec x_{j}$ if $\mathcal{I}\left(x_{i};P_{\alpha^{\prime}}\right)<\mathcal{I}\left(x_{j};P_{\alpha^{\prime}}\right)$ for some $\alpha^{\prime}>hD\left(x_{i};P\right)$ . If the original tied ranks of $x_{i}$ and $x_{j}$ were decided from the illumination, set $x_{i}\prec x_{j}$ if $hD\left(x_{i};P\right)>hD\left(x_{j};P\right)$ . For $x_{1},\dots,x_{m}$ sampled randomly from a continuous distribution, we identify the ranks uniquely, with no ties, almost surely. The performance of these ranking procedures is demonstrated in Section 5.1.
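The ranking procedure above can be sketched in a few lines of Python. The sketch assumes that the halfspace depths and the illuminations onto $P_{\alpha}$ have already been computed and are passed in as plain lists; the function name is hypothetical. Ties among the low-depth points are broken by decreasing halfspace depth, as described above; ties among the high-depth points would require a further illumination at a level $\alpha^{\prime}$ and are left in input order here.

```python
from typing import List, Sequence


def centre_outward_order(
    hd: Sequence[float], illum: Sequence[float], alpha: float
) -> List[int]:
    """Indices of the points ordered from the most central to the most extreme.

    hd[i]    -- halfspace depth of point i
    illum[i] -- illumination of point i onto the region of level alpha
    """
    indices = range(len(hd))
    # Step (ii): points of depth at least alpha, by decreasing depth.
    inner = sorted((i for i in indices if hd[i] >= alpha), key=lambda i: -hd[i])
    # Step (iii): remaining points, by increasing illumination;
    # equal illuminations are resolved by decreasing depth.
    outer = sorted(
        (i for i in indices if hd[i] < alpha),
        key=lambda i: (illum[i], -hd[i]),
    )
    return inner + outer


depths = [0.40, 0.00, 0.30, 0.00, 0.10]
illuminations = [1.0, 2.5, 1.0, 1.8, 1.2]
order = centre_outward_order(depths, illuminations, alpha=0.25)
```

In this toy input, the two points with zero halfspace depth are tied under the depth alone, and the illumination separates them.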

Suppose now that we have an estimator $F_{n}$ of $F$ from $P=EC\left(\mu,\Sigma,F\right)$ at hand. It could be obtained, for instance, by first performing a robust whitening transformation of the random sample $X_{1},\dots,X_{n}$ from $P$ , i.e. considering $Z_{i}=\widehat{\Sigma}^{-1/2}\left(X_{i}-\widehat{\mu}\right)$ , $i=1,\dots,n$ , for some robust location and scatter estimators $\widehat{\mu}$ and $\widehat{\Sigma}$ (in accordance with our parametrization, $\left|\widehat{\Sigma}\right|=1$ ). The estimators $\widehat{\mu}$ and $\widehat{\Sigma}$ could be, for instance, the halfspace median (the barycentre of the points that maximize $hD\left(\cdot;P_{n}\right)$ ), and the matrix of unit determinant proportional to the halfspace scatter median matrix (the matrix that maximizes the scatter extension of the halfspace depth, see [20]). In the second step, $F$ is estimated simply by the empirical distribution function $F_{n}$ of any univariate marginal distribution of $Z_{1},\dots,Z_{n}$ . Since $F_{n}$ estimates a univariate distribution function, it does not suffer from the curse of dimensionality, and can be expected to have decent theoretical properties. More involved estimators of $F$ such as that from [12] can be employed as well.

Because $F$ is symmetric, we can assume that also $F_{n}$ possesses the symmetry property. From a possibly non-symmetric estimator $\widetilde{F}_{n}$ of $F$ this can be achieved by symmetrization: set $F_{n}$ to be the right continuous version of the function $t\mapsto\left(\widetilde{F}_{n}(t)+1-\widetilde{F}_{n}(-t)\right)/2$ . This procedure improves the properties of the basic estimator $\widetilde{F}_{n}$ if $F$ is symmetric [23].
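For the empirical distribution function, the symmetrization above has a particularly simple form. If $\widetilde{F}_{n}$ is the empirical distribution function of $z_{1},\dots,z_{n}$, then the right continuous version of $t\mapsto\left(\widetilde{F}_{n}(t)+1-\widetilde{F}_{n}(-t)\right)/2$ is the empirical distribution function of the pooled sample $z_{1},\dots,z_{n},-z_{1},\dots,-z_{n}$. A minimal Python sketch under this assumption (the function name is hypothetical):

```python
from bisect import bisect_right
from typing import Callable, Sequence


def symmetrized_ecdf(z: Sequence[float]) -> Callable[[float], float]:
    """Right continuous symmetrized empirical distribution function.

    Equals the empirical distribution function of the sample pooled with
    its reflection about the origin.
    """
    pooled = sorted(list(z) + [-value for value in z])
    size = len(pooled)

    def distribution_function(t: float) -> float:
        # Fraction of pooled observations not exceeding t.
        return bisect_right(pooled, t) / size

    return distribution_function


F_n = symmetrized_ecdf([1.0, 2.0, 3.0])
value_at_zero = F_n(0.0)  # one half, by symmetry
```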

Finally, assume that $F_{n}$ is non-decreasing and affine invariant, the latter meaning that $F_{n}$ based on $X_{1},\dots,X_{n}$ is the same as $F_{n}$ constructed from $AX_{1}+b,\dots,AX_{n}+b$ for any $A\in\mathbb{R}^{d\times d}$ non-singular and $b\in\mathbb{R}^{d}$ . These conditions are natural in our setting and are all satisfied by most reasonable estimators.

4.1. Estimation of the Mahalanobis distance

From (11) we see that the Mahalanobis distance $\mathsf{d}_{\Sigma}\left(x,\mu\right)$ can be estimated directly from the illumination depth, given that an estimator of $F$ is at hand. Consider the estimator

[TABLE]

Note that $F_{n}$ needs to be known only in the central part of the distribution; in the more extreme regions, $M_{\alpha}$ is proportional to a known function of the illumination only. Several desirable properties of $M_{\alpha}\left(\cdot;P_{n}\right)$ are summarized in the following theorem.

Theorem 5.

Let $P_{n}\in\mathcal{P}\left({\mathbb{R}^{d}}\right)$ be the empirical measure of a random sample $X_{1},\dots,X_{n}$ from $P\in\mathcal{P}\left({\mathbb{R}^{d}}\right)$ that is not concentrated in a singleton, let $F_{n}^{-1}(1-\alpha)>0$ with $0<\alpha<1/2$ , and let $x\in\mathbb{R}^{d}$ .

(i)

If $P=EC\left(\mu,\Sigma,F\right)$ and $F_{n}$ is a Fisher consistent estimator of $F$, then $M_{\alpha}\left(x;P_{n}\right)$ is a Fisher consistent estimator of the Mahalanobis distance $\mathsf{d}_{\Sigma}(x,\mu)$.

(ii)

$M_{\alpha}$ is affine invariant, i.e. $M_{\alpha}\left(Ax+b;P_{AX+b,n}\right)=M_{\alpha}\left(x;P_{n}\right)$ for any non-singular matrix $A\in\mathbb{R}^{d\times d}$ and $b\in\mathbb{R}^{d}$, where $P_{AX+b,n}$ is the empirical measure of the transformed random sample $AX_{1}+b,\dots,AX_{n}+b$.

(iii)

For any $\delta\geq F_{n}^{-1}\left(1-\Pi(P_{n})\right)$ , the lower level set

[TABLE]

is either the unique halfspace median of $P_{n}$, or a convex body.

(iv)

As $n\to\infty$ , let the limiting addition breakdown point (5) of the estimator $T(P_{n})=F_{n}^{-1}\left(1-\alpha\right)$ with the metric $\mathsf{d}(s,t)=\left|\log(s)-\log(t)\right|$ for $s,t>0$ be at least $\min\left\{\alpha,1/3\right\}$ almost surely. Then for any $\delta>\lim_{n\to\infty}F_{n}^{-1}\left(1-\Pi\left(P_{n}\right)\right)$ , the limiting addition breakdown point of the level set (12) with respect to the Hausdorff distance is $\min\left\{\alpha,1/3\right\}$ almost surely.

The choice of the metric in the breakdown point from part (iv) is natural. For a non-degenerate symmetric distribution function $F$ and its symmetric estimator $F_{n}$, the quantile $F^{-1}\left(1-\alpha\right)$ with $0<\alpha<1/2$ lies in the positive halfline, and a sequence of estimated quantiles that converges to zero is just as undesirable as one escaping to infinity. Note also that the condition on the breakdown point of the sample quantile is naturally satisfied for any reasonable estimator of $F$; already the (symmetrized) empirical distribution function of any univariate random sample obeys it. The additional condition on $\delta$ in part (iv) guarantees that the level set (12) is non-empty. If, for instance, $P=EC\left(\mu,\Sigma,F\right)$ is such that $F$ strictly increases in a neighbourhood of $0$, and $F_{n}$ is an estimator that is strongly uniformly consistent on this neighbourhood, this condition reduces to $\delta>0$.

In the following theorem we study the uniform consistency of our robust estimator of the Mahalanobis distance.

Theorem 6.

Let $P=EC\left(\mu,\Sigma,F\right)\in\mathcal{P}\left({\mathbb{R}^{d}}\right)$ be such that $\Sigma$ is positive definite, $0<\alpha<1/2$, $F$ is continuous at $0$ and strictly increasing on $[0,F^{-1}\left(1-\alpha\right)]$, it has a density that is bounded from below in a neighbourhood of $F^{-1}\left(1-\alpha\right)$, and let $F_{n}$ satisfy

[TABLE]

and

[TABLE]

where $R_{n}=o\left(n^{1/(2(d-1))}\right)$ . Let $K_{n}$ be a sequence of sets with $K_{n}\subset B^{d}\left(\mu,R_{n}\right)$ . Then

[TABLE]

The proof of Theorem 6 is technical and can be found in the appendix.

4.2. Estimation of the halfspace depth

The close relation of the illumination depth with the Mahalanobis distance will now be used in a robustified definition of the sample halfspace depth based on the idea of illumination. Recall the connections of the halfspace depth with the Mahalanobis distance from (9). For $\alpha\in[0,1/2)$ we propose the following estimator of the depth $hD\left(\cdot;P\right)$ of $P=EC(\mu,\Sigma,F)$

[TABLE]

As for the illumination, there are several natural choices of the cut-off $\alpha$. Our main focus is on the robust estimation of the halfspace depth. Thus, we consider mainly constant $\alpha$. In spaces of lower dimensions, $\alpha=1/3$ guarantees decent stability in combination with affine invariance, and superb robustness properties; details analogous to Theorem 5 are omitted.

Theorem 7.

Under the assumptions of Theorem 6

[TABLE]

Theorem 7 asserts that the robustified depth is a uniformly consistent estimator of the true halfspace depth. Following [7, 10], for an estimator $F_{n}$ that performs well also in the tails of the distribution, we are able to derive a multiplicative version of the uniform consistency result. By [7, Remark 1], this result is much stronger than Theorem 7, and is valuable especially when extreme depth-regions are to be estimated.

Theorem 8.

Suppose that the assumptions of Theorem 6 are satisfied. In addition, let

[TABLE]

for a sequence $\xi_{n}$ , and let

[TABLE]

hold true for $\lambda$ the smallest eigenvalue of $\Sigma$ . For $b=F^{-1}\left(1-\alpha\right)$ and any $c>0$ let

[TABLE]

with $\omega_{n}=\left(R_{n}^{d-1}/\sqrt{n}\right)^{2/(d+1)}$ . Then

[TABLE]

We conclude this section by giving several remarks on the assumptions of Theorems 6, 7, and 8.

(R5)

The conditions on the convergence rates of the estimator of the quantile (14) and (16) are not restrictive at all. Suppose, for instance, that the parametric convergence rate $F_{n}^{-1}\left(1-\alpha\right)-F^{-1}\left(1-\alpha\right)=\mathcal{O}_{\mathsf{P}}\left(n^{-1/2}\right)$ is true. Then, for $d>1$, both conditions (14) and (16) are satisfied provided already that $R_{n}=o\left(n^{1/(2(d-1))}\right)$, as required for the consistency of the illumination.

(R6)

Condition (17) is also not too stringent. It is satisfied by the refined estimator of the univariate distribution function $F$ studied in [7, Section 2.1]. There, based on the extreme value theory, an estimator $F_{n}$ is constructed that, under appropriate assumptions, obeys

[TABLE]

for an adequate sequence $\delta_{n}\to 0$ as $n\to\infty$. Given that the sequence $R_{n}$ in (17) does not grow too fast, i.e. that $R_{n}\leq\sqrt{\lambda}F^{-1}\left(1-\delta_{n}\right)/2$, we have that $\left\{t\in\mathbb{R}\colon\left|t\right|<2R_{n}/\sqrt{\lambda}\right\}\subset\left\{t\in\mathbb{R}\colon\left|t\right|<F^{-1}\left(1-\delta_{n}\right)\right\}$, and (17) is valid for the estimator of $F$ from [7].

(R7)

Condition (18) is valid for $R_{n}$ that increases slowly enough. Suppose, for instance, as in Remark (R5) that $\xi_{n}=n^{-1/2}$ in (16). Then $\max\left\{R_{n}\xi_{n},\omega_{n}\right\}$ from (18) reduces to $\omega_{n}$ , and we may bound, with $w_{F}$ the minimal modulus of continuity of $F$ ,

[TABLE]

If $F$ has a density bounded from above by a constant $M>0$ , the mean value theorem gives $w_{F}\left(h\right)=\sup_{\left|s-t\right|<h}\left|F(s)-F(t)\right|\leq Mh$ . Therefore, for $d>1$ , for (18) to be true it is enough that $\omega_{n}=\left(R_{n}^{d-1}/\sqrt{n}\right)^{2/(d+1)}=o\left(F\left(-R_{n}/\sqrt{\lambda}\right)\right)$ . If $F(t)$ does not decrease with $t\to-\infty$ at a rate faster than $\left|t\right|^{\gamma}$ for some $\gamma<0$ , it is not difficult to see that $R_{n}=o\left(n^{(2(d-1)-\gamma(d+1))^{-1}}\right)$ guarantees (18). For $F$ with a lighter tail, polynomial rates of $R_{n}$ may not be sufficiently slow. For $F(t)$ not decreasing with $t\to-\infty$ at a rate faster than $e^{-\left|t\right|^{\gamma}}$ for $\gamma>0$ , we need to take $R_{n}$ increasing slower than $\left(\left(\frac{1}{d+1}-\varepsilon\right)\log(n)\right)^{1/\gamma}\sqrt{\lambda}$ for some $\varepsilon>0$ to get (18). Likewise, for $F(t)$ not decreasing faster than $\exp\left(-e^{\left|t\right|^{\gamma}}\right)$ with $t\to-\infty$ and $\gamma>0$ , we take $R_{n}$ slower than $\left(\log\left(\frac{1}{d+1}-\varepsilon\right)+\log\log n\right)^{1/\gamma}\sqrt{\lambda}$ for any $\varepsilon>0$ small enough. Note, however, that these estimates are rough, and for particular distribution functions $F$ finer rates of $R_{n}$ can be deduced directly from (18). Detailed proofs of these results can be found in Appendix A.

5. Applications

5.1. Tie-breaking

For an illustration of the tie-breaking capability of the illumination we consider a sample of $n=1000$ observations from the spherically symmetric standard five-dimensional normal distribution $P$ ( $d=5$ ). The correct ranking $R_{c}$ of the observations from the most central to the most extreme is given by the decreasing values of the (population) density of $P$ . We compare it to the ranking $R_{hD}$ based on the decreasing values of the sample halfspace depth of the observations, where many ties occur, and the improved ranking $R_{\mathcal{I}}$ where the ties are resolved based on the value of $\mathcal{I}(x;P_{n,\alpha_{n}(x)})$ , larger values corresponding to more extreme observations, see Section 4. The cut-off $\alpha_{n}(x)$ is chosen so that $P_{n,\alpha_{n}(x)}$ contains one half of observations whose halfspace depth is not smaller than that of $x$ .

In Table 2 we report the means and standard deviations of the estimated (Spearman) correlation coefficient between $R_{hD},R_{c}$ and $R_{\mathcal{I}},R_{c}$ , respectively, for observations with $hD(x;P_{n})\leq\delta$ for several values of $\delta$ . The reported results are based on $100$ replications of the experiment.

From the $1000$ observations, on average $107$ lie on the boundary of the convex hull of the data, and the halfspace depth alone cannot rank them properly. The refined ranking based on illumination is quite successful in this case; see the last row of Table 2 and Figure 4.

5.2. Estimation of extreme central regions

Consider the problem of estimation of the region $P_{\delta}$ for very small values of $\delta$ , based on the sample of size $n$ from distribution $P$ . The approach described in [7] is based on the so-called refined halfspace depth and finding the set $S=P_{n,k/n}$ for an appropriate value of $k\in\mathbb{N}$ . The region $P_{\delta}$ is then estimated by the inflated set $E_{R}=c\,S=\left\{c\,x\colon x\in S\right\}$ , where $c=\left(\frac{k}{n\delta}\right)^{1/\hat{\alpha}}$ , and $\hat{\alpha}$ is the estimated tail index of $P$ . Note the implicit assumptions of homothety of the depth contours, and that of the halfspace median of $P$ being the origin.
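To give a feeling for the size of the inflation factor $c=\left(k/(n\delta)\right)^{1/\hat{\alpha}}$, the following Python sketch evaluates it for $n=500$, $k=75$ and $\delta=1/n$. The tail index values plugged in are assumptions chosen for illustration only (a tail index of $1$ corresponds to a Cauchy-type tail), and the function name is hypothetical.

```python
def inflation_factor(k: int, n: int, delta: float, tail_index: float) -> float:
    """Inflation factor c = (k / (n * delta)) ** (1 / tail_index)."""
    if delta <= 0.0 or tail_index <= 0.0:
        raise ValueError("delta and tail_index must be positive")
    return (k / (n * delta)) ** (1.0 / tail_index)


n, k = 500, 75
delta = 1.0 / n

c_exact_index = inflation_factor(k, n, delta, tail_index=1.0)   # k / (n * delta)
c_overestimated = inflation_factor(k, n, delta, tail_index=1.2)
c_underestimated = inflation_factor(k, n, delta, tail_index=0.8)
```

Because the tail index enters through an exponent, moderate errors in $\hat{\alpha}$ change $c$ considerably when $k/(n\delta)$ is large.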

Assume now that $S=P_{n,k/n}$ is an ellipsoid given by $\left\{x\in\mathbb{R}^{d}\colon\mathsf{d}_{\Sigma}\left(x,0\right)\leq 1\right\}$ (which is a relevant approximation for elliptical distributions). Let $x$ be a point on the boundary of $S$ and $x^{*}=c\,x$ a point on the boundary of $c\,S$ . It holds that $\mathsf{d}_{\Sigma}(x^{*},0)=c\,\mathsf{d}_{\Sigma}(x,0)=c$ and using (1) we get $g_{d}(c)=g_{d}(\mathsf{d}_{\Sigma}(x^{*},0))=\mathcal{I}(x^{*};S)/\operatorname{vol}_{d}\left(S\right)$ .

Instead of the inflation of the set $S$ , our approach is based on finding the set $E_{\mathcal{I}}=\{x\in\mathbb{R}^{d}:\mathcal{I}(x;S)\leq g_{d}(c)\}$ . This is related to the approach from [7] but more robust in the sense that our procedure is less sensitive to errors in estimation of $S=P_{n,k/n}$ and the tail index $\alpha$ . Figure 5 shows the estimates of the central region $P_{1/n}$ based on a sample of size $n=500$ from the spherically symmetric bivariate Cauchy distribution where, in agreement with [7], we take $k=75$ for finding $S=P_{n,k/n}$ and the tail index $\alpha$ .

We repeated the experiment 100 times and computed the two Hausdorff distances $\mathsf{d}_{H}(E_{\mathcal{I}},P_{1/n})$ and $\mathsf{d}_{H}(E_{R},P_{1/n})$ , respectively. The boxplots of these distances are given in Figure 6. We remark that in all 100 replications of the experiment we observed $\mathsf{d}_{H}(E_{\mathcal{I}},P_{1/n})<\mathsf{d}_{H}(E_{R},P_{1/n})$ , making the illumination-based approach more successful in the estimation of $P_{1/n}$ . This appears to be justified by a more spherical shape of $E_{\mathcal{I}}$ , obtained by the illumination of $S$ , compared to $E_{R}$ , obtained by the inflation of $S$ .
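For finite point sets, the Hausdorff distance can be computed directly from its definition, as in the Python sketch below. This is only an approximation device: it assumes that each region is represented by a sufficiently dense set of points (for instance, points sampled on its boundary), since the Hausdorff distance between two convex regions is in general not the distance between their vertex sets. The function name is hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def hausdorff_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Hausdorff distance between two non-empty finite point sets."""

    def directed(p_set: Sequence[Point], q_set: Sequence[Point]) -> float:
        # Largest distance from a point of p_set to its nearest point in q_set.
        return max(min(math.dist(p, q) for q in q_set) for p in p_set)

    return max(directed(a, b), directed(b, a))


first = [(0.0, 0.0), (1.0, 0.0)]
second = [(0.0, 0.0), (3.0, 0.0)]
distance = hausdorff_distance(first, second)
```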

5.3. Robust classification

Illumination can be used to devise a robust and affine invariant version of the quadratic discriminant analysis (QDA) classification rule, whose population version is optimal. Suppose, for simplicity, that we have two independent $d$-variate random samples from normal distributions $P^{(j)}$ with unknown mean vectors $\mu_{j}$ and unknown variance matrices $\Sigma_{j}$ for $j=1,2$, respectively. A new observation $x$ is sampled from $P^{(j)}$ with a known probability $\pi_{j}$, $\pi_{1}+\pi_{2}=1$. Our task is to determine from which distribution $x$ was sampled. For the illumination-based QDA, let $0<\delta<1/2$ be a fixed parameter. We suggest assigning $x$ to $P^{(1)}$ if and only if

[TABLE]

In the population case, this simple classification rule is equivalent to the QDA, i.e. it is optimal in our setting. At the same time, it is affine invariant, highly robust for $\delta$ large enough, and entirely depth-based, as we saw in Section 4.1 that $\mathsf{d}_{\Sigma_{j}}(x,\mu_{j})$ can be consistently estimated by $M_{\delta}\left(x;P_{n}^{(j)}\right)$ with $F_{n}=\Phi$ the distribution function of the univariate standard normal variable, and $\operatorname{vol}_{d}\left(P_{n,\delta}^{(j)}\right)$ almost surely approaches $\operatorname{vol}_{d}\left(P_{\delta}^{(j)}\right)$ as $n\to\infty$. The proof of the following result can be found in Appendix A.
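The following Python sketch illustrates how the two depth-based ingredients mentioned above can be plugged into the classical QDA comparison for $d=2$. It is not a transcription of rule (19). It assumes the classical rule in the form "assign $x$ to $P^{(1)}$ if $\pi_{1}f_{1}(x)>\pi_{2}f_{2}(x)$" for normal densities $f_{j}$, and it recovers $\log\left|\Sigma_{j}\right|$ from the area of the central region via $\operatorname{vol}_{2}\left(P_{\delta}^{(j)}\right)=\pi r^{2}\sqrt{\left|\Sigma_{j}\right|}$ with $r=\Phi^{-1}(1-\delta)$. All function names are hypothetical.

```python
import math
from statistics import NormalDist


def logdet_from_region_area(area: float, delta: float) -> float:
    """log|Sigma| from the area of the level-delta central region (d = 2)."""
    r = NormalDist().inv_cdf(1.0 - delta)      # radius in Mahalanobis units
    return 2.0 * math.log(area / (math.pi * r * r))


def qda_assigns_first(
    d1: float, d2: float, logdet1: float, logdet2: float, pi1: float
) -> bool:
    """Classical QDA comparison of the two log posterior scores.

    d1, d2 are (estimated) Mahalanobis distances of x from the two centres.
    """
    score1 = -0.5 * d1 * d1 - 0.5 * logdet1 + math.log(pi1)
    score2 = -0.5 * d2 * d2 - 0.5 * logdet2 + math.log(1.0 - pi1)
    return score1 > score2


# Population quantities for N(0, I) versus N((4, 4), 4 I), delta = 0.1.
delta = 0.1
r = NormalDist().inv_cdf(1.0 - delta)
logdet1 = logdet_from_region_area(math.pi * r * r, delta)        # |Sigma| = 1
logdet2 = logdet_from_region_area(4.0 * math.pi * r * r, delta)  # |Sigma| = 16

# x = (0, 0): distance 0 from the first centre, sqrt(8) from the second.
origin_to_first = qda_assigns_first(0.0, math.sqrt(8.0), logdet1, logdet2, 0.5)
```

In a sample version, `d1` and `d2` would be replaced by the depth-based estimates of the Mahalanobis distance and the areas by those of the sample central regions.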

Theorem 9.

For $P^{(j)}$ as above the illumination-based QDA classification rule (19) is optimal, i.e. for any $\delta\in(0,1/2)$ it coincides with the classical quadratic discriminant rule. Furthermore, for any $K\subset\mathbb{R}^{d}$ bounded and $j=1,2$

[TABLE]

where $\Phi$ is used in place of $F_{n}$ in $M_{\delta}\left(x;P_{n}^{(j)}\right)$ .

Note that in Theorem 9 we deal with normal distributions. For other elliptically symmetric distributions $EC\left(\mu_{j},\Sigma_{j},F\right)$ analogous results are straightforward to derive; for details see Appendix A.12.

To illustrate the performance of the robust QDA classification approach we consider two simulation experiments. Another two scenarios are given in Appendix B.

5.3.1. Bivariate normal distribution, location and scale difference

Let $P_{X}$ be the standard bivariate normal distribution and denote $P^{(1)}=P_{X},P^{(2)}=P_{2X+(4,4)^{\mathsf{T}}}$ . The training sets consist of $500$ observations from $P^{(1)}$ and $P^{(2)}$ , respectively; the testing sets consist of $1000$ points sampled from $P^{(1)}$ and $P^{(2)}$ , respectively. We consider the illumination-based QDA procedure given above and compare it to the classical QDA method and the method based on the refined halfspace depth [7].

For the illumination we choose $\delta$ so that the probability content of $P_{X,\delta}$ is $0.5$ . This results in $\delta=1-\Phi(\sqrt{2\log 2})$ . For computing the refined depth we take $k=75$ in agreement with [7].
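The value of $\delta$ can be checked numerically. For the standard bivariate normal distribution the squared norm has the chi-square distribution with two degrees of freedom, so the disc of radius $r$ has probability content $1-e^{-r^{2}/2}$, and the halfspace depth on its boundary is $1-\Phi(r)$. A short Python sketch (variable names are ours):

```python
import math
from statistics import NormalDist

# Radius of the disc that carries probability one half.
radius = math.sqrt(2.0 * math.log(2.0))

# Probability content of the disc: chi-square (2 df) distribution function.
content = 1.0 - math.exp(-radius * radius / 2.0)

# Halfspace depth on the boundary of the disc, i.e. the level delta.
delta = 1.0 - NormalDist().cdf(radius)
```

The resulting level is slightly below $0.12$.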

The experiment was repeated $100$ times. Figure 7 (left panel) shows boxplots of the misclassification rates for different methods. The illumination-based approach and the classical QDA on average achieve the optimal (Bayes) error rate while the misclassification rates of the method based on the refined depth tend to be slightly higher.

To assess the performance of the classification methods in the extremes we consider another experiment where first $2500$ testing points are generated from each distribution, but only those outside the convex hull of both training sets are used for classification. This setup corresponds to the so-called outsider problem studied, among others, in [11]. The outsiders all have zero empirical halfspace depth w.r.t. both training sets, and hence cannot be classified based on $hD$ only. Figure 7 (right panel) shows boxplots of the misclassification rates of the considered methods for the outsiders. Both the illumination-based approach and QDA still perform well. The method based on the refined depth suffers from much higher misclassification rates because of the incorrect estimation of the tail index.

To study the robustness properties of the classification methods we consider contamination of the first training set by observations from $P^{(3)}=P_{X+(40,40)^{\mathsf{T}}}$ . We set the extent of contamination to 1 %, 5 % and 10 % of the data points, respectively. Table 3 gives the average misclassification rates in these settings. Note that in the right part of the table different rows are not directly comparable since higher contamination implies larger span of the training points, hence on average fewer (more outlying) test points lie outside the convex hull of the training points. We observe that the illumination-based approach is very robust and performs well even under rather severe contamination. In contrast, the refined depth [7] is sensitive to the contamination through the estimation of the tail index.

5.3.2. Bivariate elliptical distribution, location and scale difference

To study the classification performance for a heavy-tailed distribution, let $P_{Y}$ be the elliptical distribution from [10] with the probability density function $f(x,y)=\frac{3(x^{2}/4+y^{2})^{2}}{4\pi(1+(x^{2}/4+y^{2})^{3})^{3/2}},(x,y)^{\mathsf{T}}\in\mathbb{R}^{2}$. Let $P^{(1)}=P_{Y}$, $P^{(2)}=P_{2Y+(4,4)^{\mathsf{T}}}$, $P^{(3)}=P_{X+(40,40)^{\mathsf{T}}}$ for $X$ from Section 5.3.1. It is possible to adapt our robust QDA procedure by replacing $\Phi$ with the marginal distribution function $F$ of the first element of the spherically symmetric affine image of $Y$. The function $F$ is assumed to be known. (To avoid lengthy numerical computations of multiple quantiles of $F$, here and in Section B.1.2 in the appendix we slightly simplify the rule (19): for $x\in P_{n,\delta}^{(1)}\cup P_{n,\delta}^{(2)}$ we assign $x$ to $P^{(1)}$ if and only if $hD\left(x;P^{(1)}_{n}\right)>hD\left(x;P^{(2)}_{n}\right)$.) Here we choose $\delta\doteq 0.11205$ so that the probability content of $P_{Y,\delta}$ is $0.5$.

The results are summarized in Figure 8 and Table 4. We observe that in the case with no contamination, the illumination-based approach performs best of the three classification methods considered. Note that the very high misclassification rates for the method based on the refined depth and the group of outsiders under contamination (see the right part of Table 4) are caused by the poorly estimated tail index of the distribution $P^{(1)}$, and by the fact that approximately three times more testing points come from the distribution $P^{(2)}$ than from $P^{(1)}$ due to the greater spread of $P^{(2)}$.

Overall, our experiments demonstrate a great potential for varied illumination-based statistical methodology. All these results will be further elaborated on in the coming works of the authors.

Appendix A Proofs of the theoretical results

A.1. Proof of Lemma 1

For $d=1$ , $\Sigma=\sigma^{2}>0$ and $\left|x-\mu\right|>\sigma$ , the formula reduces to $\mathcal{I}\left(x;\mathcal{E}_{\mu,\Sigma}\right)=\left|x-\mu\right|+\sigma$ , which is the illumination of $x$ outside $\mathcal{E}_{\mu,\Sigma}=\sigma B^{1}$ on that ball. For $d>1$ , let us first compute the illumination of a unit ball. Take $x\notin B^{d}$ . The set difference of the convex hull of $x$ and $B^{d}$ minus $B^{d}$ is a cone with height $\left\|x\right\|-1/\left\|x\right\|$ and base a $(d-1)$ -dimensional ball with radius $\sqrt{1-1/\left\|x\right\|^{2}}$ , without a spherical cap of $B^{d}$ of height $1-1/\left\|x\right\|$ . Because $\operatorname{vol}_{d}\left(B^{d}\right)=\frac{\pi^{d/2}}{\Gamma\left(\frac{d}{2}+1\right)}$ , the volume of the cone is

[TABLE]

and the volume of the cap is

[TABLE]

Altogether, (20) and (21) give that

[TABLE]
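For orientation, the two volumes described above can be written out from the standard formulas for a cone and for a spherical cap. This is a sketch only: the shorthand $\kappa_{k}=\operatorname{vol}_{k}\left(B^{k}\right)$ is notation assumed here, and the displays (20) and (21) may be arranged differently.

```latex
\begin{align*}
\kappa_{k} &= \frac{\pi^{k/2}}{\Gamma\left(\frac{k}{2}+1\right)}, \\
\operatorname{vol}_{d}(\text{cone}) &= \frac{\kappa_{d-1}}{d}
  \left(1-\frac{1}{\|x\|^{2}}\right)^{(d-1)/2}\left(\|x\|-\frac{1}{\|x\|}\right), \\
\operatorname{vol}_{d}(\text{cap}) &= \kappa_{d-1}\int_{1/\|x\|}^{1}\left(1-t^{2}\right)^{(d-1)/2}\,\mathrm{d}t, \\
\mathcal{I}\left(x;B^{d}\right) &= \kappa_{d}+\operatorname{vol}_{d}(\text{cone})-\operatorname{vol}_{d}(\text{cap}).
\end{align*}
```

For $d=1$ the cone has volume $\|x\|-1/\|x\|$ and the cap has volume $1-1/\|x\|$, so the last line gives $2+\|x\|-1=\|x\|+1$, in agreement with the one-dimensional case treated at the start of the proof.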

It is not difficult to see that $\mathcal{E}_{\mu,\Sigma}=\Sigma^{1/2}B^{d}+\mu=\bigcup_{x\in B^{d}}\left\{\Sigma^{1/2}x+\mu\right\}$ . Thus, by the affine equivariance of the illumination bodies [34, Proposition 2] we have $\mathcal{I}\left(x;\mathcal{E}_{\mu,\Sigma}\right)=\mathcal{I}\left(\Sigma^{-1/2}\left(x-\mu\right);B^{d}\right)\sqrt{\left|\Sigma\right|}$ . The general assertion then follows from $\left\|\Sigma^{-1/2}\left(x-\mu\right)\right\|=\sqrt{\left(x-\mu\right)^{\mathsf{T}}\Sigma^{-1}\left(x-\mu\right)}$ .
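The cone-minus-cap decomposition used in this proof is easy to evaluate numerically. The following base R sketch computes the illumination of a point at distance `r` from the centre of the unit ball; the function names are hypothetical, and the cap volume is obtained by one-dimensional numerical integration rather than by any closed form from the paper.

```r
# Volume of the unit ball in R^d (ball_vol(0) equals 1).
ball_vol <- function(d) pi^(d / 2) / gamma(d / 2 + 1)

# Illumination of a point at distance r >= 1 from the centre of the unit ball:
# volume of the ball, plus the cone with apex at the point, minus the cap.
illum_unit_ball <- function(r, d) {
  stopifnot(r >= 1, d >= 1)
  if (r == 1) return(ball_vol(d))          # point on the sphere: nothing is added
  kappa <- ball_vol(d - 1)                 # volume of the (d-1)-dimensional unit ball
  cone <- kappa * (1 - 1 / r^2)^((d - 1) / 2) * (r - 1 / r) / d
  cap_integrand <- function(t) pmax(0, 1 - t^2)^((d - 1) / 2)
  cap <- kappa * integrate(cap_integrand, lower = 1 / r, upper = 1)$value
  ball_vol(d) + cone - cap
}

# Illumination of the ellipsoid with centre mu and scatter Sigma, via affine equivariance.
illum_ellipsoid <- function(x, mu, Sigma) {
  r <- sqrt(drop(t(x - mu) %*% solve(Sigma) %*% (x - mu)))
  illum_unit_ball(r, length(x)) * sqrt(det(Sigma))
}

print(illum_unit_ball(2, 3) / pi)   # a multiple of pi in dimension three
```

In dimension three the integrand is a polynomial, so the example with `r = 2` can be verified by hand: the ball contributes $4\pi/3$, the cone $3\pi/8$, and the cap $5\pi/24$.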

A.2. Lemma 10

The next lemma summarizes some analytical properties of the function $g_{d}$ defined in Section 2.

Lemma 10.

For all $d\geq 1$

(i) function $g_{d}\colon[1,\infty)\to[1,\infty)$ is uniformly continuous, strictly increasing, and convex;

(ii) $g_{d}(1)=1$, $\lim_{t\to\infty}g_{d}(t)=\infty$;

(iii) $g_{d}$ is differentiable on $(1,\infty)$ and

[TABLE]

(iv) $g_{d}(t)-1=\mathcal{O}\left(\left(t-1\right)^{(d+1)/2}\right)$ as $t\to 1$ from the right;

(v) the minimal modulus of continuity of the inverse function $g_{d}^{-1}$ takes the form

[TABLE]

(vi) as $h\to 0$ from the right, $w_{g_{d}^{-1}}(h)=\mathcal{O}\left(h^{2/(d+1)}\right)$;

(vii) as $t\to\infty$, $g_{d}^{-1}(t)=\mathcal{O}\left(t\right)$.

Proof.

Using the Leibniz integral formula it is easy to see that the derivative of $g_{d}$ is (22). That function is positive, increasing, and bounded from above. Hence, $g_{d}$ is strictly increasing, convex, and Lipschitz continuous. Part (iv) follows by an application of l’Hôpital’s rule

[TABLE]

For Part (v) first note that because $g_{d}$ is smooth, strictly increasing and convex, its inverse $g_{d}^{-1}$ must be smooth, strictly increasing and concave. For such a function the mean value theorem asserts that the greatest difference $g_{d}^{-1}(s)-g_{d}^{-1}(t)$ subject to $1\leq t\leq s<t+h$ must be attained at the left endpoint of its domain, i.e. for $t=1$ and $s=1+h$ . To obtain the rate of the modulus of continuity, note that by (23) there exists $c>0$ such that

[TABLE]

Apply $g_{d}^{-1}$ to both sides of this inequality and substitute $h=c\left(t-1\right)^{(d+1)/2}$ to get

[TABLE]

and the conclusion follows. Finally, using the substitution $t=g_{d}(s)$ and l’Hôpital’s rule again,

[TABLE]

Hence, $g_{d}^{-1}(t)=\mathcal{O}\left(t\right)$ as $t\to\infty$ . ∎
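The statements of Lemma 10 can be read off directly in the simplest case $d=1$. The worked example below assumes that $g_{d}(t)$ is the illumination of a point at distance $t$ from the centre of the unit ball, divided by $\operatorname{vol}_{d}\left(B^{d}\right)$; this normalisation is inferred from $g_{d}(1)=1$ and from the way $g_{d}^{-1}$ is used in Appendix A.8, not quoted from Section 2.

```latex
\begin{align*}
g_{1}(t) &= \frac{t+1}{2}, \qquad t\geq 1,
  && \text{since } \mathcal{I}\left(x;B^{1}\right)=|x|+1 \text{ and } \operatorname{vol}_{1}\left(B^{1}\right)=2, \\
g_{1}(t)-1 &= \frac{t-1}{2}=\mathcal{O}\left((t-1)^{(1+1)/2}\right)
  && \text{(part (iv))}, \\
g_{1}^{-1}(s) &= 2s-1, \qquad w_{g_{1}^{-1}}(h)=2h=\mathcal{O}\left(h^{2/(1+1)}\right)
  && \text{(parts (v) and (vi))}, \\
g_{1}^{-1}(t) &= 2t-1=\mathcal{O}(t) \quad \text{as } t\to\infty
  && \text{(part (vii))}.
\end{align*}
```

In this case all rates hold with equality up to constants, which shows that the exponents in parts (iv) and (vi) are matched to each other through the inversion.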

A.3. Proof of Theorem 2

We only prove the first part of the theorem. The remaining parts are straightforward, and follow directly from the essential properties of the halfspace depth [19], and the properties of the illumination [34].

By the affine invariance of the halfspace depth [4, Lemma 2.1] we know that $\left(P_{AX+b}\right)_{\alpha}=A(P_{X})_{\alpha}+b$ . For the illumination, it follows that

[TABLE]
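The scaling behind this step, namely that the volume of a convex hull is multiplied by $\left|\det A\right|$ under the map $y\mapsto Ay+b$, can be checked in the plane with base R. The sketch is illustrative only: the helper names are hypothetical, and a polygon given by its vertices stands in for the central region.

```r
# Area of the convex hull of the rows of P (planar points), by the shoelace formula.
hull_area <- function(P) {
  H <- P[chull(P), , drop = FALSE]   # hull vertices in cyclic order
  nxt <- c(2:nrow(H), 1)             # index of the next vertex along the hull
  abs(sum(H[, 1] * H[nxt, 2] - H[nxt, 1] * H[, 2])) / 2
}

# Illumination of the point x onto the convex hull of the rows of K.
illum2d <- function(K, x) hull_area(rbind(K, x))

K <- rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1))   # unit square
x <- c(2, 0.5)
A <- matrix(c(2, 1, 0, 3), 2, 2)                 # non-singular, det(A) = 6
b <- c(-1, 4)

KA <- sweep(K %*% t(A), 2, b, "+")               # image of K under y -> A y + b
xA <- as.vector(A %*% x + b)

print(c(original = illum2d(K, x), transformed = illum2d(KA, xA)))
```

Dividing the illumination by the volume of the body cancels the factor $\left|\det A\right|$, which is why the volume-normalised illumination considered in the proof of Theorem 3 is called the affine invariant version.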

A.4. Proof of Theorem 3

We start with the illumination. From [6, Theorem 4.2] we know that under the assumptions of the theorem, the central regions $P_{\alpha}$ are consistent for $P$ in the Hausdorff distance, i.e.

[TABLE]

For any $x\in K_{n}$ we know that almost surely for $n$ large

[TABLE]

In the inequalities we used Lemma 11 stated below for $\mathsf{d}_{H}\left(P_{n,\alpha},P_{\alpha}\right)<1$ , and the properties of the Hausdorff distance [22, p. 64]. Since for a fixed compact set $K=K_{n}$ for all $n$ the term $R_{n}$ is constant, the first part of the theorem is verified in view of (24).

To derive the rates of convergence, by [1, Theorem 2] we have that $\mathsf{d}_{H}\left(P_{n,\alpha},P_{\alpha}\right)=\mathcal{O}_{\mathsf{P}}\left(n^{-1/2}\right)$ , and the last inequality in (25) is enough to conclude.

For the affine invariant version of the illumination, write

[TABLE]

By the assumptions of the theorem we know that $\operatorname{vol}_{d}\left(P_{\alpha}\right)>0$ . From (24) and Lemma 11 it thus follows that for $n$ large enough $\operatorname{vol}_{d}\left(P_{n,\alpha}\right)\geq\operatorname{vol}_{d}\left(P_{\alpha}\right)/2$ almost surely, and that for such $n$ it also holds true that

[TABLE]

almost surely, for $c_{d}>0$ the constant from Lemma 11. By [1, Theorem 2] the last formula can be written also as

[TABLE]

Finally, because $P_{\alpha}$ is a fixed bounded set, a trivial upper bound for $\sup_{x\in K_{n}}\left|\mathcal{I}(x;P_{\alpha})\right|$ is the maximum illumination of $x\in K_{n}$ w.r.t. the smallest enclosing ball of $P_{\alpha}$ . By Lemmas 1 and 10 this is of order $\mathcal{O}\left(R_{n}\right)$ . Altogether, all the above bounds and the consistency result for $\mathcal{I}$ can be plugged into (26) to obtain the desired rate of convergence

[TABLE]

Lemma 11.

Let $R>0$ . There exists a constant $c_{d}>0$ such that for all convex bodies $K,L\subset\mathbb{R}^{d}$ with $K\subset B^{d}\left(x,R\right)$ for some $x\in\mathbb{R}^{d}$

[TABLE]

Proof.

Write $\delta=\mathsf{d}_{H}\left(K,L\right)$ . From the definition of the Hausdorff distance (6) we have that

[TABLE]

If $\operatorname{vol}_{d}\left(K\right)\leq\operatorname{vol}_{d}\left(L\right)$ , this gives $\operatorname{vol}_{d}\left(K\right)\leq\operatorname{vol}_{d}\left(L\right)\leq\operatorname{vol}_{d}\left(K+\delta B^{d}\right)$ ; in the other case $\operatorname{vol}_{d}\left(L\right)<\operatorname{vol}_{d}\left(K\right)$ we get $\operatorname{vol}_{d}\left(L\right)<\operatorname{vol}_{d}\left(K\right)\leq\operatorname{vol}_{d}\left(L+\delta B^{d}\right)$ . This results in

[TABLE]

and it is enough to bound the excess volume of the outer parallel body $K+\delta B^{d}$ of a convex body $K$ , and analogously for $L$ . For this, use the Steiner formula [22, Formula (4.1)]

[TABLE]

where $V_{j}(K)$ stands for the intrinsic volume of the convex body $K$ [22, Chapter 4]. In particular, it holds true that $V_{d}(K)=\operatorname{vol}_{d}\left(K\right)$ , $V_{d-1}(K)$ is proportional to the surface area measure of $K$ , $V_{1}(K)$ is the so-called intrinsic width of $K$ , and $V_{0}(K)=1$ .
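A planar instance makes the role of the individual terms concrete. The example assumes the usual normalisation of the Steiner formula, $\operatorname{vol}_{d}\left(K+\delta B^{d}\right)=\sum_{j=0}^{d}\delta^{d-j}\kappa_{d-j}V_{j}(K)$ with $\kappa_{k}=\operatorname{vol}_{k}\left(B^{k}\right)$; the symbol $\kappa_{k}$ and the square $Q_{a}$ with side $a$ are introduced here for illustration.

```latex
\begin{align*}
\operatorname{vol}_{2}\left(K+\delta B^{2}\right)
  &= V_{2}(K)+2\delta\,V_{1}(K)+\pi\delta^{2}\,V_{0}(K)
   = \operatorname{area}(K)+\delta\,\operatorname{perimeter}(K)+\pi\delta^{2}, \\
\operatorname{vol}_{2}\left(Q_{a}+\delta B^{2}\right)-\operatorname{vol}_{2}\left(Q_{a}\right)
  &= 4a\delta+\pi\delta^{2}\leq\left(4a+\pi\right)\delta
  \qquad\text{for } 0<\delta\leq 1 .
\end{align*}
```

The excess volume is therefore of order $\delta$ with a constant that depends only on the size of the body, which is the type of bound the remainder of the proof establishes in general dimension.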

From the monotonicity of the intrinsic volumes that follows from formulas (5.25) and (5.31) in [22], and $K\subset B^{d}\left(x,R\right)$ , we can use the expression for the intrinsic volumes of a ball (4.64) from [22] and bound

[TABLE]

For a bound on the excess volume of $L+\delta B^{d}$ , first note that from (27) we have

[TABLE]

Similarly as in (28) we can thus write

[TABLE]

From (28) and the last inequality we see that our claim holds true for

[TABLE]

the maximum of all the terms that are constant in $R$ and $\delta$ in the sums on the right-hand sides of the two excess volume bounds. ∎

A.5. Consistency of the illumination on unbounded sets

Over unbounded subsets of $\mathbb{R}^{d}$ with $d>1$, neither the illumination nor the illumination depth is uniformly consistent. To see this, take a convex body $K$ in $\mathbb{R}^{d}$, a point $y$ at distance $\varepsilon>0$ from $K$, and let $K_{y}=\operatorname{co}\left(K\cup\{y\}\right)$. Surely, $\mathsf{d}_{H}\left(K_{y},K\right)=\varepsilon$. By the Hahn-Banach separation theorem [22, Theorem 1.3.7], $y$ and $K$ can be strongly separated by two parallel hyperplanes $H_{1}$, $H_{2}$ whose distance is at least $\varepsilon/2$ and $y\in H_{1}$. Take $x\in H_{2}$ far enough from $y$. The illuminations $\mathcal{I}\left(x;K_{y}\right)$ and $\mathcal{I}\left(x;K\right)$ then differ by at least the illumination of $x$ onto the cone $K_{y}\cap H_{2}^{+}$, for $H_{2}^{+}$ the halfspace whose boundary is $H_{2}$ and $y\in H_{2}^{+}$. This illumination can be bounded from both below and above by the illumination of $x$ on any two balls $B_{1}$ and $B_{2}$ centred at some $z\in K_{y}\cap H_{2}^{+}$ such that $B_{1}\subset K_{y}\cap H_{2}^{+}\subset B_{2}$, respectively. By Lemmas 1 and 10, the latter two illuminations both grow with increasing $R=\left\|z-x\right\|$ at a rate $\mathcal{O}\left(R\right)$, i.e. $\mathcal{I}\left(x;K_{y}\right)-\mathcal{I}\left(x;K\right)=\mathcal{O}\left(\left\|z-x\right\|\right)$ with $H_{2}\ni x\to\infty$. In other words, for any $\varepsilon>0$ one can find $x$ far enough so that $\mathcal{I}\left(x;K_{y}\right)-\mathcal{I}\left(x;K\right)\geq 1$. Consequently, even if the distance $\mathsf{d}_{H}\left(K_{n},K\right)$ converges to zero (almost surely), the illumination differences $\left|\mathcal{I}\left(x;K_{n}\right)-\mathcal{I}\left(x;K\right)\right|$ and $\left|\mathcal{I}\left(x;K_{n}\right)/\operatorname{vol}_{d}\left(K_{n}\right)-\mathcal{I}\left(x;K\right)/\operatorname{vol}_{d}\left(K\right)\right|$ cannot, in general, vanish uniformly over unbounded sets. The same example applies to the second component of $D_{\alpha_{n}}$.

A.6. Proof of Theorem 4

For $x$ fixed, the illumination of $x$ tends to infinity if and only if the halfspace depth central region $P_{n,\alpha}$ breaks down. Hence, it is enough to evaluate the breakdown point of $P_{n,\alpha}$ with respect to the Hausdorff distance. We follow the derivations in the proofs of [3, Proposition 2.2] and [4, Proposition 3.2]. Let $x_{M}\in\mathbb{R}^{d}$ be (any) halfspace median of $P_{n}$ , that is let $hD\left(x_{M};P_{n}\right)=\Pi(P_{n})$ . By the argument used in the proof of [4, Lemma 3.1] to upset the set $P_{n,\alpha}=\left\{y\in\mathbb{R}^{d}\colon hD\left(y;P_{n}\right)n\geq\lceil\alpha n\rceil\right\}$ entirely, the smallest number of additional points that need to be added to the data is $m$ , the smallest integer that satisfies $m\geq\lceil\alpha\left(m+n\right)\rceil$ (compare with formula (6.19) in [4]). This inequality is solved by $m=\lceil(\alpha/(1-\alpha))n\rceil$ . The additional condition $\alpha\leq\Pi(P_{n})/(1+\Pi(P_{n}))$ ensures that $m\leq\lceil\Pi(P_{n})\,n\rceil=\Pi(P_{n})\,n$ . From this it follows that the depth of $x_{M}$ with respect to the contaminated dataset must be at least $\Pi(P_{n})\,n/(n+m)\geq\Pi(P_{n})/(1+\Pi(P_{n}))\geq\alpha$ . Hence, after the contamination procedure, the central region of points whose depth is at least $\alpha$ must be non-empty.
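The count of contaminating points in this argument can be verified by brute force. In the base R sketch below, $\alpha$ is passed as a ratio of integers `a / b` so that the comparison is free of floating-point rounding; the function names are hypothetical. The search uses the fact that, for an integer $m$, the condition $m\geq\lceil\alpha\left(m+n\right)\rceil$ is equivalent to $m\geq\alpha\left(m+n\right)$.

```r
# Smallest integer m with m >= ceiling(alpha * (m + n)), for alpha = a / b.
min_contamination <- function(a, b, n) {
  stopifnot(a > 0, a < b, n >= 1)
  m <- 0
  while (m * b < a * (m + n)) m <- m + 1   # integer form of m >= alpha * (m + n)
  m
}

# Closed form ceiling(alpha * n / (1 - alpha)) = ceiling(a * n / (b - a)),
# evaluated with integer division only.
closed_form <- function(a, b, n) (a * n + (b - a) - 1) %/% (b - a)

# Compare the two on a small grid of sample sizes and levels alpha = a / 10.
grid <- expand.grid(a = 1:4, n = 1:200)
brute <- mapply(min_contamination, grid$a, 10, grid$n)
exact <- mapply(closed_form, grid$a, 10, grid$n)
print(all(brute == exact))
```

For instance, with $\alpha=0.1$ and $n=50$ both functions return the same number of points, $\lceil 50/9\rceil$.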

In the situation when $\alpha>\Pi(P_{n})/(1+\Pi(P_{n}))$ , due to the nestedness of the central regions $P_{n,\alpha}$ , by the previous part of the proof at least

[TABLE]

contaminating points are needed.

The corollary with the asymptotic value of the breakdown point follows the same argument as in the proof of [4, Propositions 3.2 and 3.3].

A.7. Proof of Theorem 5

The proofs of parts (i), (ii) and (iii) are straightforward and analogous to the proof of Theorem 2. For part (iv) it is sufficient to realise that according to the non-degeneracy of $P$, and symmetry conditions imposed on the estimator $F_{n}$, the lower level set of $M_{\alpha}\left(\cdot;P_{n}\right)$ is large if and only if either (i) the central region $P_{n,\alpha}$ is extremely large; or (ii) $F_{n}^{-1}\left(1-\alpha\right)$ is extremely small. By Theorem 4, for the former case, asymptotically at least $m\approx n\min\{\alpha,1/3\}/(1-\min\{\alpha,1/3\})$ contaminating points have to be added to the random sample to disrupt the central region entirely. In the latter case, unless there exists a configuration of $m$ points that makes $F_{n}^{-1}\left(1-\alpha\right)$ arbitrarily small, the set $P_{n,\alpha}$ cannot be made arbitrarily large. By extension, no fixed lower level set (12) can then be made too big. By the assumption on the breakdown point of $F_{n}^{-1}\left(1-\alpha\right)$, in the second scenario it is even more difficult to break down the estimator (12) than in the first one. Another way in which the lower level set (12) can break down is by becoming empty. But that can happen only if for some $\delta>0$ small enough, $F_{n}^{-1}\left(1-\Pi\left(P_{n}\right)\right)>\delta$. This is ruled out by the additional condition imposed on $\delta$. Thus, the resulting limiting breakdown point of the level set is the same as that of $P_{n,\alpha}$.

A.8. Proof of Theorem 6

By (9) and (10), the Mahalanobis distance $\mathsf{d}_{\Sigma}(x,\mu)$ can be written either as $F^{-1}\left(1-hD\left(x;P\right)\right)$ for any $x\in\mathbb{R}^{d}$ , or, in case when $x\notin P_{\alpha}$ , also as $F^{-1}\left(1-\alpha\right)g_{d}^{-1}\left(\mathcal{I}\left(x;P_{\alpha}\right)/\operatorname{vol}_{d}\left(P_{\alpha}\right)\right)$ . It is thus sufficient to bound

[TABLE]

The three suprema on the right hand side will be treated separately. Denote them by I, II, and III, respectively.
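The second representation of the Mahalanobis distance can be illustrated in the plane at the population level. The base R sketch below discretises the boundary of the region $\left\{y\colon\mathsf{d}_{\Sigma}(y,\mu)\leq q\right\}$, computes the normalised illumination of a point as a ratio of polygon areas, and inverts $g_{2}$ numerically. Here `q` plays the role of $F^{-1}\left(1-\alpha\right)$. The closed form used for $g_{2}$ is an assumption of this sketch (the area of the convex hull of the unit disc and a point at distance $t$, divided by $\pi$), and all function names are hypothetical.

```r
# Assumed planar form of g_d: normalised illumination of the unit disc from distance t >= 1.
g2 <- function(t) (pi - acos(1 / t) + sqrt(t^2 - 1)) / pi

# Numerical inverse of g2 on [1, upper].
g2_inv <- function(s, upper = 1e6) {
  if (s <= 1) return(1)
  uniroot(function(t) g2(t) - s, lower = 1, upper = upper, tol = 1e-12)$root
}

# Area of the convex hull of the rows of P, by the shoelace formula.
hull_area <- function(P) {
  H <- P[chull(P), , drop = FALSE]
  nxt <- c(2:nrow(H), 1)
  abs(sum(H[, 1] * H[nxt, 2] - H[nxt, 1] * H[, 2])) / 2
}

# Mahalanobis distance of x recovered from the illumination onto the ellipse of radius q.
mahalanobis_from_illumination <- function(x, mu, Sigma, q, m = 4000) {
  theta <- seq(0, 2 * pi, length.out = m + 1)[-1]
  L <- t(chol(Sigma))                                    # Sigma = L %*% t(L)
  E <- t(mu + q * L %*% rbind(cos(theta), sin(theta)))   # m boundary points, one per row
  ratio <- hull_area(rbind(E, x)) / hull_area(E)         # illumination divided by volume
  q * g2_inv(ratio)
}

Sigma <- matrix(c(4, 1, 1, 2), 2, 2)
mu <- c(1, -1)
x <- c(6, 3)
d_true <- sqrt(drop(t(x - mu) %*% solve(Sigma) %*% (x - mu)))
d_est <- mahalanobis_from_illumination(x, mu, Sigma, q = 1.5)
print(c(true = d_true, from_illumination = d_est))
```

The two values agree up to the discretisation error of the polygon. In the sample setting of Theorem 6, the ellipse is replaced by $P_{n,\alpha}$ and `q` by the estimated quantile $F_{n}^{-1}\left(1-\alpha\right)$, which is where the three suprema above enter.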

A.8.1. Supremum I

The sample halfspace depth $hD\left(\cdot;P_{n}\right)$ is known [4, formula (6.6)] to be a uniformly consistent estimator of its population version

[TABLE]

Because $P$ is halfspace symmetric, yet its centre of symmetry has zero probability mass, for any $x\in K_{n}\cap P_{n,\alpha}$ we have $\alpha\leq hD\left(x;P_{n}\right)\leq 1/2$ , with the second inequality almost surely for all $n$ large enough due to (29). We may use the consistency (29) again to get that for any $\varepsilon>0$ small, $\alpha-\varepsilon\leq hD\left(x;P\right)\leq 1/2$ for all $x\in K_{n}\cap P_{n,\alpha}$ and $n$ large enough.

Function $F$ is strictly increasing in a neighbourhood of $[0,F^{-1}\left(1-\alpha\right)]$ . Thus, $F^{-1}$ is (uniformly) continuous on $I=[1/2,1-\alpha]$ . Its approximating sequence $\left\{F_{n}^{-1}\right\}_{n=1}^{\infty}$ is a sequence of functions that are non-decreasing, and converge to $F^{-1}$ at each $t\in I$ by the uniform consistency of $F_{n}$ from (13), and [30, Lemma 21.2]. A lemma of Pólya [21, Problem 127, part II] gives that this convergence is uniform on $I$ . We can thus write for $n$ large enough

[TABLE]

where $w_{F^{-1}}$ is the minimal modulus of continuity of $F^{-1}$ restricted to the interval $I$. The first supremum on the right-hand side is small almost surely for $n$ large by the uniform convergence of the quantile functions established above. The second will vanish almost surely because of (29) and the uniform continuity of $F^{-1}$ on $I$.

A.8.2. Supremum II

Let us first introduce the notation

[TABLE]

In supremum II we bound for $L_{n}^{II}=K_{n}\setminus\left(P_{n,\alpha}\cup P_{\alpha}\right)$

[TABLE]

For the supremum in the first summand in (31) we know from (10) that

[TABLE]

where $\lambda>0$ is the smallest eigenvalue of $\Sigma$ . Using the assumption (14) we see that the first summand on the right hand side of (31) vanishes in probability as $n\to\infty$ . Furthermore, by (14) we also have that $\left|b_{n}\right|=\mathcal{O}_{\mathsf{P}}(1)$ , and by Lemma 10 together with Theorem 3

[TABLE]

where $w_{g_{d}^{-1}}$ is the minimal modulus of continuity of $g_{d}^{-1}$ from Lemma 10. Together, we have verified that

[TABLE]

A.8.3. Supremum III

Here it will be crucial that under the conditions of the theorem, the set $P_{\alpha}\setminus P_{n,\alpha}$ is negligible as $n\to\infty$ by the consistency of the halfspace depth contours (24). First, without loss of generality, suppose that both $P_{\alpha}$ and $P_{n,\alpha}$ are contained in $K_{n}$ . This is possible, because $P_{\alpha}$ is a fixed set, and the sequence $P_{n,\alpha}$ is convergent almost surely by (24). Thus, possible enlargement of $K_{n}$ by a fixed set does not affect any results in this proof. Take $x\in L_{n}^{III}=K_{n}\cap\left(P_{\alpha}\setminus P_{n,\alpha}\right)$ . As $x\notin P_{n,\alpha}$ ,

[TABLE]

In terms of $x$ , this expression varies monotonically with $\mathcal{I}\left(x;P_{n,\alpha}\right)$ . Note that for $K$ a convex body, the illumination $\mathcal{I}\left(\cdot;K\right)$ strictly increases on any straight halfline $L$ that starts from $x\in\partial K$ (the boundary of $K$ ) and does not intersect $K$ elsewhere, i.e. $K\cap L=\{x\}$ . Thus, in our situation, if one considers any halfline that starts at a boundary point of $P_{n,\alpha}$ and passes through $x$ ,

[TABLE]

On the boundary of $P_{n,\alpha}$ we are in the situation dealt with in supremum I, and by that part of the proof we know that for $\varepsilon>0$ given, almost surely for any $n$ large enough,

[TABLE]

Likewise, for the upper bound, by part II of this proof, and the continuity of $\mathsf{d}_{\Sigma}(x,\mu)$ , we have an analogous restriction, and with high probability, for $n$ large enough,

[TABLE]

Finally, we use (24) and the fact that the Hausdorff distances of convex bodies, and of their boundaries, are the same [22, Lemma 1.8.1]. This gives that almost surely, for any $\delta>0$, for all $n$ large enough, and any $y\in\partial P_{n,\alpha}$, there exists $z\in\partial P_{\alpha}$ such that $\left\|y-z\right\|<\delta$. Now, because $\mathsf{d}_{\Sigma}(x,\mu)$ is (uniformly) continuous in $x$ in a uniform neighbourhood of $P_{\alpha}$, this means that almost surely, for $n$ large enough,

[TABLE]

and for any $x\in L_{n}^{III}$

[TABLE]

Altogether, collect all the bounds in this part of the proof to get that for any $\varepsilon>0$ , with high probability, for $n$ large enough,

[TABLE]

which finishes the proof.

A.9. Proof of Theorem 7

In view of the uniform consistency of the halfspace depth (29) it suffices to show that

[TABLE]

Proceed analogously as in the proof of Theorem 6, and consider two situations: the supremum above over $x\in L_{n}^{II}=K_{n}\setminus\left(P_{n,\alpha}\cup P_{\alpha}\right)$, and the supremum over $x\in L_{n}^{III}=K_{n}\cap\left(P_{\alpha}\setminus P_{n,\alpha}\right)$.

Suppose first that $x\in L_{n}^{II}$ . By (9) and (10), in the notation from (30) we have that

[TABLE]

Therefore,

[TABLE]

The first summand on the right hand side above vanishes almost surely as $n\to\infty$ by (13). For the second summand we already have a bound from (34) from the proof of Theorem 6. Since $F$ has a density, it must be uniformly continuous on $\mathbb{R}$ . Denote by $w_{F}\colon(0,\infty)\to\mathbb{R}$ its minimal modulus of continuity. We obtain

[TABLE]

This completes the part of the proof with $L_{n}^{II}$ .

For the second part, consider $x\in L_{n}^{III}$ . Note that thanks to (9) and the continuity of $F$ in a neighbourhood of $F^{-1}\left(1-\alpha\right)$ , the halfspace depth $hD\left(\cdot;P\right)$ must be (uniformly) continuous in a uniform neighbourhood of $P_{\alpha}$ . Furthermore, for $x\notin P_{n,\alpha}$ , $RhD_{\alpha}\left(x;P_{n}\right)$ varies monotonically with $\mathcal{I}\left(x;P_{n,\alpha}\right)$ . Thus, derivation analogous to that from part III in the proof of Theorem 6 gives that the convergence of the halfspace depth contours (24) implies that with $n\to\infty$

[TABLE]

and the proof is finished.

A.10. Proof of Theorem 8

By the uniform consistency of the halfspace depth (29) we can bound for $n$ large enough

[TABLE]

where the last term vanishes almost surely as $n\to\infty$. Thus, in the notation established in (30) in the proof of Theorem 6, it suffices to show that also the right-hand side of

[TABLE]

is asymptotically negligible, where $L_{n}^{II}=K_{n}\setminus\left(P_{n,\alpha}\cup P_{\alpha}\right)$ and $L_{n}^{III}=K_{n}\cap\left(P_{\alpha}\setminus P_{n,\alpha}\right)$ . We used (9) and (10) to obtain the expression on the right hand side. We already have everything prepared to bound the second summand above. Indeed, by Theorem 7

[TABLE]

Let us now focus on the supremum over $L_{n}^{II}$ . For $x\in L_{n}^{II}$ we can write

[TABLE]

In the same way as in (31), (32), (33) and (34) in the proof of Theorem 6 we have, using (16), that

[TABLE]

By the definition of the refined depth (15) we also see that $a_{n,x}<-1$ for any $x\in L_{n}^{II}$ . Combine this with (36) to obtain that there exists $c>0$ such that for all $n\geq 1$ and $x\in L_{n}^{II}$ we can write $\left(1-c\,\omega_{n}\right)b<\left|a_{x}b\right|$ , which means that for $n$ large enough $b/2<\left|a_{x}b\right|<R_{n}/\sqrt{\lambda}$ for all $x\in L_{n}^{II}$ . Formulas (36) therefore allow us to write for some $c>0$ large enough for all $\varepsilon>0$ and $n$ large

[TABLE]

The first two summands on the right hand side vanish with $n\to\infty$ because of the first two formulas in (36). The argument in the last summand is non-random, and the probability is therefore equal to zero for $n$ large by (18). Thus,

[TABLE]

Using similar argumentation we have that there is $c>0$ with the property that for all $\varepsilon>0$ and $n$ large enough

[TABLE]

and the last expression tends to zero as $n\to\infty$ thanks to the second rate in (36) and (17). Thus,

[TABLE]

Altogether, we can start from (35) and bound

[TABLE]

The first term on the right hand side is bounded in probability due to (37). The summands vanish in probability thanks to (38) and (37), respectively. The theorem is proved.

A.11. Proof of Remark (R7)

Suppose first that $F(t)\geq c\,\left|t\right|^{\gamma}$ for some $c>0$ , $t$ small enough and $\gamma<0$ . Consider $R_{n}=\mathcal{O}(n^{\alpha})$ for $\alpha>0$ . We have

[TABLE]

For the right hand side to be $o(1)$ , it is sufficient that $\alpha<\left(2(d-1)-\gamma(d+1)\right)^{-1}$ .

If $F(t)\geq c\,e^{-\left|t\right|^{\gamma}}$ for some $c>0$ , $\gamma>0$ and all $t$ small enough, we get for $R_{n}\leq\left(\left(\frac{1}{d+1}-\varepsilon\right)\log(n)\right)^{1/\gamma}\sqrt{\lambda}$ and for $\varepsilon>0$

[TABLE]

For $F(t)\geq c\,\exp\left(-e^{\left|t\right|^{\gamma}}\right)$ for some $c>0$ , $\gamma>0$ and all $t$ small enough, $R_{n}\leq\left(\log\left(\frac{1}{d+1}-\varepsilon\right)+\log\log n\right)^{1/\gamma}\sqrt{\lambda}$ with $\varepsilon>0$ gives

[TABLE]

A.12. Proof of Theorem 9

The logarithm of the density of $P^{(j)}$ at $x\in\mathbb{R}^{d}$ can be written as

[TABLE]

By (9) we know that for any $0<\delta<1/2$

[TABLE]

where $\widetilde{\Sigma}_{j}=\Phi^{-1}\left(1-\delta\right)^{2}\Sigma_{j}$ . Thus,

[TABLE]

and $\pi_{1}f_{1}(x)>\pi_{2}f_{2}(x)$ if and only if (19) is true.

The uniform consistency follows from Theorem 6, formula (24), and Lemma 11.

Appendix B Additional simulations and results

B.1. Robust classification

B.1.1. Bivariate normal distribution, location difference

We repeat the same classification experiment as in Section 5.3.1, with $P^{(1)}=P_{X}$, $P^{(2)}=P_{X+(2,2)^{\mathsf{T}}}$, $P^{(3)}=P_{X+(20,20)^{\mathsf{T}}}$. This accounts for classification in the presence of a location difference only. The results are summarized in Figure B.10 and Table B.5. We observe similar results as in Section 5.3.1: the optimal (Bayes) error rate is nearly achieved by the illumination-based approach and the classical QDA. The approach based on the refined depth performs worse, especially in the extremes, and it is very sensitive to possible contamination.

B.1.2. Bivariate elliptical distribution, location difference

Finally, consider the experiment from Section 5.3.2 with $P^{(1)}=P_{Y},P^{(2)}=P_{Y+(2,2)^{\mathsf{T}}},P^{(3)}=P_{X+(20,20)^{\mathsf{T}}}$ . Our results are summarized in Figure B.11 and Table B.6. We observe that in the case with no contamination, the illumination-based approach and the classical QDA, used only as a reference method here, perform slightly better than the method based on the refined depth. If some contamination is present, the robust QDA appears to outperform both competitors.

Appendix C R source code

    library(TukeyRegion)
    library(geometry)

    Illumination = function(X, x, alpha){
      # X: n-times-d matrix of the sample points (n points in d dimensions)
      # x: vector of length d whose illumination is computed
      # alpha: cut-off value for the illumination
      #
      # returns
      # I: the illumination of x onto the depth central region
      #    (volume of the convex hull of points with hD at least alpha, and x)
      # volPa: volume of the depth central region
      #    (volume of the region of points whose hD is at least alpha)
      Pa = TukeyRegion(X, depth=alpha*nrow(X), retVertices=TRUE, retVolume=TRUE)
      volPax = convhulln(rbind(Pa$vertices, x), options="FA")$vol
      return(list(I=volPax, volPa=Pa$volume))
    }

Acknowledgements

We would like to thank John H. J. Einmahl and Jun Li for sharing the source code of the refined halfspace depth. This work was supported by the grant 19-16097Y of the Czech Science Foundation, and by the PRIMUS/17/SCI/3 project of Charles University.

Bibliography

[1] Brunel, V.-E. (2019). Concentration of the empirical level sets of Tukey’s halfspace depth. Probab. Theory Related Fields, 173(3–4):1165–1196.
[2] Büeler, B., Enge, A., and Fukuda, K. (2000). Exact volume computation for polytopes: a practical study. In Polytopes—combinatorics and computation (Oberwolfach, 1997), volume 29 of DMV Sem., pages 131–154. Birkhäuser, Basel.
[3] Donoho, D. L. (1982). Breakdown properties of multivariate location estimators. Qualifying paper, Harvard University.
[4] Donoho, D. L. and Gasko, M. (1992). Breakdown properties of location estimates based on halfspace depth and projected outlyingness. Ann. Statist., 20(4):1803–1827.
[5] Dyckerhoff, R. (2004). Data depths satisfying the projection property. Allg. Stat. Arch., 88(2):163–190.
[6] Dyckerhoff, R. (2018). Convergence of depths and depth-trimmed regions. arXiv preprint arXiv:1611.08721.
[7] Einmahl, J. H. J., Li, J., and Liu, R. Y. (2015). Bridging centrality and extremity: refining empirical data depth using extreme value statistics. Ann. Statist., 43(6):2738–2765.
[8] Emiris, I. Z. and Fisikopoulos, V. (2018). Practical polytope volume approximation. ACM Trans. Math. Software, 44(4):Art. 38, 21.
