Extent of occurrence reconstruction using a new data-driven support estimator

A. Rodríguez-Casal; P. Saavedra-Nieves

arXiv:1907.08627·math.ST·July 23, 2019

Open Access

TL;DR

This paper introduces a new data-driven method for estimating the probability support of a distribution, using an r-convex set estimator with an algorithm to determine the shape parameter from data, applicable to ecological data.

Contribution

The paper presents a stochastic algorithm to estimate the shape parameter r for r-convex support sets, enabling flexible and accurate support reconstruction from data.

Findings

1. Achieves convergence rates similar to convex hull estimators for convex sets.

2. Provides a practical algorithm for estimating the shape parameter r.

3. Demonstrates application to ecological data for invasive species.

Abstract

Given a random sample of points from some unknown distribution, we propose a new data-driven method for estimating its probability support S. Under the mild assumption that S is r-convex, the smallest r-convex set which contains the sample points is the natural estimator. The main problem for using this estimator in practice is that r is an unknown geometric characteristic of the set S. A stochastic algorithm is proposed for determining an optimal estimate of r from the data under mild regularity assumptions on the density function. The resulting data-driven reconstruction of S attains the same convergence rates as the convex hull for estimating convex sets, but under a much more flexible smoothness shape condition. The new support estimator will be used for reconstructing the extent of occurrence of an assemblage of invasive plant species in the Azores archipelago.

Equations

$$C_{r}(A)=\bigcap_{\{B_{r}(x):\,B_{r}(x)\cap A=\emptyset\}}\left(B_{r}(x)\right)^{c}$$

$$C_{r}(A)=(A\oplus rB)\ominus rB,$$

$$r_{0}=\sup\{\gamma>0:\ C_{\gamma}(S)=S\}.$$

$$\hat{f}_{n}(x)=\max_{i:\,x\in Vor(X_{i})}f_{n}(X_{i})\,\mathbb{I}_{\{x\in C_{r}(\mathcal{X}_{n})\}}.$$

$$\Delta_{n}^{*}(\mathcal{X}_{n})=\sup\{\gamma:\ \exists x\text{ such that }\{x\}+\gamma A\subset S\setminus\mathcal{X}_{n}\}.$$

$$R(S)=\sup\{\gamma>0:\ \exists x\in S\text{ such that }B_{\gamma}[x]\subset S\}.$$

$$\Delta_{n}(\mathcal{X}_{n})=\sup\left\{\gamma:\ \exists x\text{ such that }\{x\}+\frac{\gamma}{f(x)^{1/d}}A\subset S\setminus\mathcal{X}_{n}\right\}$$

$$V_{n}(\mathcal{X}_{n})=\Delta_{n}(\mathcal{X}_{n})^{d}.$$

$$P(U\leq u)=\exp(-\exp(-u))\quad\text{for }u\in\mathbb{R}$$

$$U(\mathcal{X}_{n})\stackrel{d}{\longrightarrow}U\quad\text{when }n\rightarrow\infty,$$

$$\liminf_{n\rightarrow\infty}\frac{nV_{n}(\mathcal{X}_{n})-\log(n)}{\log(\log(n))}\geq d-1\ \text{ a.s.},\qquad\limsup_{n\rightarrow\infty}\frac{nV_{n}(\mathcal{X}_{n})-\log(n)}{\log(\log(n))}\leq d+1\ \text{ a.s.}$$

$$U(\mathcal{X}_{n})=nV_{n}(\mathcal{X}_{n})-\log(n)-(d-1)\log(\log(n))-\log(\beta).$$

$$\beta=\frac{1}{d!}\left(\frac{\sqrt{\pi}\,\Gamma\left(\frac{d}{2}+1\right)}{\Gamma\left(\frac{d+1}{2}\right)}\right)^{d-1}.$$

$$\hat{\delta}(C_{r}(\mathcal{X}_{n})\setminus\mathcal{X}_{n})=\sup\left\{\gamma:\ \exists x\text{ such that }\{x\}+\frac{\gamma}{\hat{f}_{n}(x)^{1/d}}A\subset C_{r}(\mathcal{X}_{n})\setminus\mathcal{X}_{n}\right\}.$$

$$H_{0}:\ S\text{ is }r\text{-convex}\quad\text{versus}\quad H_{1}:\ S\text{ is not }r\text{-convex}.$$

$$c_{n,\alpha}=\frac{1}{n}\left(-\log(-\log(1-\alpha))+\log(n)+(d-1)\log(\log(n))+\log(\beta)\right)$$

$$\hat{r}_{0}=\sup\{\gamma>0:\ \text{the null hypothesis }H_{0}\text{ that }S\text{ is }\gamma\text{-convex is accepted}\}.$$

$$d_{H}(A,C)=\max\left\{\sup_{a\in A}d(a,C),\ \sup_{c\in C}d(c,A)\right\},$$

$$d_{H}(S,C_{r_{n}}(\mathcal{X}_{n}))=O_{P}\left(\frac{\log n}{n}\right)^{\frac{2}{d+1}}.$$

$$x+\frac{c_{n,\alpha}^{1/d}}{\hat{f}_{n}^{1/d}(x)}A\subset C_{r}(\mathcal{X}_{n})\setminus\mathcal{X}_{n}.$$

$$x+\frac{c_{n,\alpha}^{1/d}}{\hat{f}_{n}^{1/d}(x)}A\subset C_{r}(\mathcal{X}_{n}\setminus\mathcal{X}_{n}).$$

$$\hat{\delta}(C_{r}(\mathcal{X}_{n})\setminus\mathcal{X}_{n})\leq\sup\left\{\gamma:\ \exists x\text{ such that }\{x\}+\frac{\gamma}{\hat{f}_{n}(x)^{1/d}}A\subset S\setminus\mathcal{X}_{n}\right\}.$$

$$\hat{\delta}(C_{r}(\mathcal{X}_{n})\setminus\mathcal{X}_{n})\leq\sup\left\{\gamma:\ \exists x\text{ such that }\{x\}+\frac{(1-\epsilon_{n}^{+})\gamma}{f(x)^{1/d}}A\subset S\setminus\mathcal{X}_{n}\right\}.$$

$$P\Big(U(\mathcal{X}_{n})>-(1-\epsilon_{n}^{+})^{d}\log(-\log(1-\alpha))+\left((1-\epsilon_{n}^{+})^{d}-1\right)\left(\log(n)+(d-1)\log(\log(n))+\log(\beta)\right)\Big).$$

$$\sup_{u}\left|P(U(\mathcal{X}_{n})\leq u)-P(U\leq u)\right|\rightarrow 0.$$

$$P(U>-\log(-\log(1-\alpha))+o(1))\rightarrow\alpha.$$

$$P(\hat{V}_{n,r}>c_{n,\alpha})\leq P(U(\mathcal{X}_{n})>-\log(-\log(1-\alpha))+o(1))\rightarrow\alpha.$$

$$\hat{\delta}(C_{r}(\mathcal{X}_{n})\setminus\mathcal{X}_{n})\geq(\lambda_{0}-\epsilon_{n}^{-})R(C_{r}(\mathcal{X}_{n})\setminus\mathcal{X}_{n}),$$

$$R(C_{r}(\mathcal{X}_{n})\setminus\mathcal{X}_{n})\geq\rho^{\prime}>0.$$
Taxonomy

Topics: Advanced Statistical Methods and Models · Pharmacological Effects of Medicinal Plants · Statistical Methods and Inference

Full text

Extent of occurrence reconstruction using a new data-driven support estimator

A. Rodríguez-Casal

P. Saavedra-Nieves

Department of Statistics, Mathematical Analysis and Optimization, Universidade de Santiago de Compostela, Spain

Abstract

Given a random sample of points from some unknown distribution, we propose a new data-driven method for estimating its probability support $S$ . Under the mild assumption that $S$ is $r-$ convex, the smallest $r-$ convex set which contains the sample points is the natural estimator. The main problem for using this estimator in practice is that $r$ is an unknown geometric characteristic of the set $S$ . A stochastic algorithm is proposed for determining an optimal estimate of $r$ from the data under mild regularity assumptions on the density function. The resulting data-driven reconstruction of $S$ attains the same convergence rates as the convex hull for estimating convex sets, but under a much more flexible smoothness shape condition. The new support estimator will be used for reconstructing the extent of occurrence of an assemblage of invasive plant species in the Azores archipelago.

Keywords: Support estimation, $r$-convex, testing $r$-convexity, spacing, extent of occurrence (EOO), area of occupancy (AOO)

1 Introduction

Natural reserve network designs require information about species occurrence data. One of the most widely handled concepts is the extent of occurrence (EOO). In fact, the International Union for the Conservation of Nature (IUCN) establishes the EOO as a key measure of extinction risk. Roughly speaking, the IUCN defines the EOO as the area contained within the shortest continuous imaginary boundary which can be drawn to encompass all the known, inferred or projected sites of present occurrence of a taxon, excluding cases of vagrancy. For a complete review on this subject, see Rondinini et al. (2006).

The problem of EOO reconstruction will be illustrated via the analysis of a real dataset containing $740$ geographical coordinates (or occurrences) for $28$ species of terrestrial invasive plants distributed in two of the Azorean islands (Terceira and São Miguel) from 2010 until 2018. In Figure 1, a satellite image of major Azorean islands (top, left) and five of the invasive species are shown (bottom). The $740$ geographical locations (slightly jittered) are represented on the map of Terceira and São Miguel islands in Figure 1 (top, right). This dataset is available from the Global Biodiversity Information Facility (GBIF) website (see GBIF.org, 27th May 2019).

An initial estimation of the EOO for this assemblage of invasive plants was obtained from GeoCAT, an open source, browser-based tool endorsed by the IUCN that allows users to reconstruct the EOO from the geographical locations of a species or taxon. Users can quickly combine data from multiple sources, including GBIF datasets, which can be easily imported. The GeoCAT reconstruction of the EOO for the assemblage of plant species used here as an example is given by the convex hull of the sample of the $740$ coordinates, $H(\mathcal{X}_{740})$. Mathematically, $H(\mathcal{X}_{740})$ is the smallest convex set that contains $\mathcal{X}_{740}$; it can be computed as the intersection of all half spaces containing $\mathcal{X}_{740}$. For more details, compare Figure 4 (first row, left) and Figure 4 (second row, left). Note that this EOO estimation presents some limitations because a marine area lies inside $H(\mathcal{X}_{740})$. Obviously, none of the plant species considered here can occur in the open sea, which should remain outside the EOO. Therefore, convexity can be too restrictive a shape condition to assume in practice.

Our goal is to propose a more realistic and automatic EOO reconstruction from a support estimation perspective. This methodological approach has proved to be useful in different disciplines such as image analysis (see Rodríguez-Casal and Saavedra-Nieves, 2016), quality control (see Devroye and Wise, 1980 or Chevalier, 1976) or animal home range estimation (see De Haan and Resnick, 1994 or Baíllo and Chacón, 2018). However, the problem of reconstructing the EOO has not yet been considered formally from this viewpoint.

In general, support estimation deals with the problem of reconstructing the compact and nonempty support $S\subset\mathbb{R}^{d}$ of an absolutely continuous random vector $X$ from a random sample $\mathcal{X}_{n}=\{X_{1},...,X_{n}\}$ (see Cuevas and Fraiman, 2010 for a complete survey on the subject). Of course, when the support $S$ is assumed to be convex, the convex hull of the sample points, $H(\mathcal{X}_{n})$, provides a natural support estimator. See Schneider (1988, 1993), Dümbgen and Walther (1996) or Reitzner (2003) for a thorough analysis of this estimator. This estimator is indeed simple, but it may not be suitable for practical situations: it fails to provide a satisfactory support estimator when $S$ is disconnected, as in the example of invasive plants in the Azores archipelago, where the occurrences are distributed over two different islands.

In this work, we will propose a new data-driven support estimator and, as a consequence, an original and realistic EOO reconstruction that overcomes the limitations derived from the convexity restriction. Concretely, we assume that the support $S$ satisfies the $r$-convexity shape condition for some $r>0$, a much more flexible property than convexity, as will be shown. Our proposal considers the smallest $r$-convex set containing $\mathcal{X}_{n}$ (the $r$-convex hull of $\mathcal{X}_{n}$, namely $C_{r}(\mathcal{X}_{n})$) as the natural estimator for the usually unknown support. This estimator is well known in the computational geometry literature for providing reasonable global reconstructions if the sample points are (approximately) uniformly distributed on the set $S$ (see Edelsbrunner, 2014). In fact, although $r$-convexity is a more general condition than convexity, $C_{r}(\mathcal{X}_{n})$ can achieve the same convergence rates as $H(\mathcal{X}_{n})$, as proved by Rodríguez-Casal (2007). However, this estimator presents an important disadvantage: it depends on the commonly unknown parameter $r$. Although the influence of $r$ is considerable, it must be specified by the practitioner (see Joppa et al., 2016). For the example of invasive species in the Azorean islands, Figure 4 shows $C_{r}(\mathcal{X}_{740})$ for different values of $r$. Small values of $r$ provide fragmented estimators (many isolated points and connected components), leading to an EOO reconstruction which resembles $\mathcal{X}_{n}$ (Figure 4: second row, right). If $r=0.3$, a realistic reconstruction of the EOO is obtained since sea areas are not inside the estimator (Figure 4: third row, left). However, if large values of $r$ are considered then $C_{r}(\mathcal{X}_{n})$ basically coincides with $H(\mathcal{X}_{n})$ (Figure 4: third row, right). Therefore, arbitrary choices of $r$ may provide incongruous EOO estimations.

Most of the available results in the literature about support estimation make special emphasis on asymptotic properties, especially consistency and convergence rates but they do not usually give any criterion for selecting the unknown parameter $r$ in $C_{r}(\mathcal{X}_{n})$ from the sample. The aim of this paper is to overcome this drawback and present a method for selecting the parameter $r$ for the $r-$ convex hull estimator from the available data. This problem has scarcely been studied in the statistical literature with just a couple of references available on the topic. First, Mandal and Murthy (1997) proposed a selector for $r$ based on the concept of minimum spanning tree but only consistency of the method was provided without considering optimality issues. Later, Rodríguez-Casal and Saavedra-Nieves (2016) proposed an automatic selection criterion based on a very intuitive idea for the selection of $r$ but under the restriction that the sample distribution is uniform. According to Figure 4 (bottom, right), sea areas are contained in $C_{r}(\mathcal{X}_{n})$ if the selected $r$ is too large. So, the estimator contains a large ball empty of sample points, see gray balls in Figure 4 (top, left) and (bottom, right). Janson (1987) calibrated the size of this maximal ball (or spacing) when the sample distribution is uniform on $S$ . Berrendero et al. (2012) used this result to test uniformity when the support is unknown. However, Rodríguez-Casal and Saavedra-Nieves (2016) followed the somewhat opposite approach. They assume that $\mathcal{X}_{n}$ comes from a uniform distribution on $S$ and if a big enough spacing is found in $C_{r}(\mathcal{X}_{n})$ then it is incompatible with the assumption that data are uniform. As a consequence, it is concluded that $r$ is too large. Therefore, it is proposed to select the largest value of $r$ compatible with the uniformity assumption on $C_{r}(\mathcal{X}_{n})$ .

Recently, Aaron et al. (2017) extended the results by Janson (1987) to the case where the data are generated from a density $f$ that is bounded from below and Lipschitz continuous restricted to its bounded support. Here, we will use this extension in order to derive a test to decide, given a fixed $r>0$ , whether the unknown support $S$ is $r-$ convex with no more information apart from $\mathcal{X}_{n}$ . In this case, if a large enough spacing is found in $C_{r}(\mathcal{X}_{n})$ then the null hypothesis of $r-$ convexity will be rejected. A new data-driven selector for the index $r$ will be established from this test. Following the scheme in Rodríguez-Casal and Saavedra-Nieves (2016), it is proposed to choose the largest value of $r$ compatible with the $r-$ convexity assumption.

Once the parameter $r$ is estimated from $\mathcal{X}_{n}$ , a new data-driven support reconstruction, based on the estimator of $r$ , will be proposed. As a consequence, a flexible reconstruction for the EOO will be obtained.

This paper is organized as follows. Mathematical tools are introduced in Section 2. First, the geometric assumptions on $S$ and the optimal value of the parameter $r$ to be estimated are introduced. Then, the regularity assumptions on $f$ and a new nonparametric estimator are established. Finally, the maximal spacing and its estimator are formally defined. In Section 3, we propose a procedure for testing the null hypothesis that $S$ is $r$-convex for a given $r>0$. This test will play a key role in the definition of the consistent estimator of $r$. Then, a new estimator for the support $S$ is proposed in Section 4, and it will be seen that it achieves the same convergence rates as the convex hull for estimating convex sets. The main numerical aspects of the practical application of the algorithm are described in Section 5. In Section 6, the performance of the new support reconstruction is analyzed by estimating the EOO of an assemblage of terrestrial plant species in two Azorean islands. Conclusions are presented in Section 7. In Section 8, we detail the proofs of the theoretical results. Finally, some auxiliary results are deferred to Section 9.

2 Mathematical tools

Regularity conditions, namely shape assumptions on $S$, will be introduced next. In addition, we will discuss the optimal value of the shape index $r$ to be estimated. Then, the required conditions on the density function $f$ and an original nonparametric kernel estimator will be presented. Finally, basic notions on maximal spacings are established.

2.1 About geometric assumptions on $S$ and the optimal value of $r$

In this work, $S$ is assumed to be $r-$ convex for some $r>0$ . Therefore, it is necessary to establish the formal definition of this geometric property in Definition 2.1.

Definition 2.1.

A closed set $A\subset\mathbb{R}^{d}$ is said to be $r-$ convex, for some $r>0$ , if $A=C_{r}(A)$ , where

$$C_{r}(A)=\bigcap_{\{B_{r}(x):\,B_{r}(x)\cap A=\emptyset\}}\left(B_{r}(x)\right)^{c}$$

denotes the $r-$ convex hull of $A$ and $B_{r}(x)$ , the open ball with center $x$ and radius $r$ .

In practice, $C_{r}(\mathcal{X}_{n})$ can be computed as the intersection of the complements of all open balls of radius larger than or equal to $r$ that do not intersect $\mathcal{X}_{n}$ . In Figure 2, the computation of $C_{r}(\mathcal{X}_{740})$ for $r=0.3$ (left) and $r=5$ (right) is shown considering the example in Azorean islands. Note that $C_{0.3}(\mathcal{X}_{740})$ is an acceptable EOO reconstruction equal to the intersection of the complements of all gray open balls represented. However, if we select $r=5$ , marine areas are clearly inside the $C_{5}(\mathcal{X}_{740})$ .
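The ball-based description above can be turned into a rough membership check. The following is a minimal sketch, not the authors' implementation: the function name `in_r_convex_hull`, the polar grid of candidate ball centres and the toy two-island sample are all assumptions made for illustration. A point is declared outside $C_{r}(\mathcal{X}_{n})$ as soon as one open ball of radius $r$ containing it is found to be empty of sample points; because only finitely many centres are tried, the check can err towards inclusion.

```python
import numpy as np


def in_r_convex_hull(x, sample, r, n_angles=36, n_radii=20):
    """Approximate test of whether the planar point x lies in C_r(sample).

    x is excluded when some open ball of radius r contains x and no sample
    point. Ball centres are searched on a polar grid around x (a hypothetical
    discretisation), so a very thin excluding ball may be missed.
    """
    x = np.asarray(x, dtype=float)
    sample = np.asarray(sample, dtype=float)
    angles = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    # Centres at distance strictly less than r from x, so x is in the open ball.
    radii = r * np.linspace(0.0, 1.0, n_radii, endpoint=False)
    directions = np.stack([np.cos(angles), np.sin(angles)], axis=1)
    centres = (x + radii[:, None, None] * directions[None, :, :]).reshape(-1, 2)
    # Distance from every candidate centre to its nearest sample point.
    diff = centres[:, None, :] - sample[None, :, :]
    nearest = np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)
    # The open ball B_r(c) misses the sample exactly when nearest >= r.
    return not bool(np.any(nearest >= r))


# Toy sample: two "islands" given by regular grids on [0,1]^2 and [3,4]x[0,1].
g = np.linspace(0.0, 1.0, 21)
island = np.array([(a, b) for a in g for b in g])
sample = np.vstack([island, island + np.array([3.0, 0.0])])

print(in_r_convex_hull((0.5, 0.5), sample, r=0.3))   # point inside an island
print(in_r_convex_hull((2.0, 0.5), sample, r=0.3))   # point in the gap, small r
print(in_r_convex_hull((2.0, 0.5), sample, r=50.0))  # point in the gap, large r
```

On this toy sample the gap between the islands is excluded for $r=0.3$ and included for $r=50$, which mirrors the behaviour described for the marine area between the two Azorean islands.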

Furthermore, the concept of $r-$ convex hull is closely related to the closing of $A$ by $B_{r}(0)$ from the mathematical morphology, see Serra (1982). It can be shown that

$$C_{r}(A)=(A\oplus rB)\ominus rB,$$

where $B=B_{1}(0)$ , $\lambda C=\{\lambda c:c\in C\}$ , $C\oplus D=\{c+d:\ c\in C,d\in D\}$ and $C\ominus D=\{x\in\mathbb{R}^{d}:\ \{x\}\oplus D\subset C\}$ , for $\lambda\in\mathbb{R}$ and sets $C$ and $D$ .

As mentioned in the Introduction, the problem of reconstructing an $r$-convex support $S$ using a data-driven procedure could be easily solved if the parameter $r$ is estimated from a random sample of points $\mathcal{X}_{n}$ taken in $S$. The first step is to determine precisely the optimal value of $r$ to be estimated, which is established in Definition 2.2: we propose to estimate the largest value of $r$ for which $S$ is $r$-convex.

Definition 2.2.

Let $S\subset\mathbb{R}^{d}$ be a compact, nonconvex and $r$-convex set for some $r>0$. We define

$$r_{0}=\sup\{\gamma>0:\ C_{\gamma}(S)=S\}.\qquad(1)$$

For simplicity, it is assumed that $S$ is not convex (of course, if $S$ is convex, $r_{0}$ would be infinity). Proposition 2.4 in Rodríguez-Casal and Saavedra-Nieves (2016) shows that, under mild regularity conditions, the supremum in (1) is a maximum, that is, $S$ is $r_{0}$-convex and $r$-convex for all $r<r_{0}$. Under this hypothesis, the optimality of the smoothing parameter defined in (1) can be justified. It is clear that $S$ is $r$-convex for $r\leq r_{0}$ but, if $r<r_{0}$, $C_{r}(\mathcal{X}_{n})$ is an inadmissible estimator since it is always outperformed by $C_{r_{0}}(\mathcal{X}_{n})$. This happens because, with probability one, $C_{r}(\mathcal{X}_{n})\subset C_{r_{0}}(\mathcal{X}_{n})\subset S$. It should also be noted that, for $r>r_{0}$, even for $r$ very close to $r_{0}$, $C_{r}(\mathcal{X}_{n})$ would considerably overestimate $S$. For instance, if $S$ is equal to the circular ring in Figure 3 (right) and $r>r_{0}$, $C_{r}(S)$ coincides with the disc enclosed by the outer circle. The mild regularity condition we need is slightly stronger than $r$-convexity:

( $R$ ) $S$ fulfills the $r-$ rolling property and $S^{c}$ fulfills the $\lambda-$ rolling condition for some $r$ , $\lambda>0$ .

Following Cuevas et al. (2012), a set $A$ is said to satisfy the (outside) $r$-rolling condition if each boundary point $a\in\partial A$ is contained in a closed ball with radius $r$ whose interior does not meet $A$. There exist interesting relationships between this property and $r$-convexity. In particular, Cuevas et al. (2012) proved that if $A$ is compact and $r$-convex then $A$ fulfills the $r$-rolling condition. According to Figure 3 (left), the converse is not always true. Proposition 2.2 in Rodríguez-Casal and Saavedra-Nieves (2016) shows that ( $R$ ) is a (mild) sufficient condition to ensure that the $r$-rolling condition implies $r$-convexity. Condition ( $R$ ) was essentially analyzed by Walther (1997, 1999), but only the case $r=\lambda$ was taken into account. In this work, the radius $\lambda$ can be different from $r$, see Figure 3 (center). Walther (1997, 1999) proved that, if $r=\lambda$, $\partial S$ is a $\mathcal{C}^{1}$ $(d-1)$-dimensional submanifold in $\mathbb{R}^{d}$ and that $S$ is $r$-convex. Proposition 2.2 in Rodríguez-Casal and Saavedra-Nieves (2016) generalized this property since, for $\lambda<r$, Walther's result would only imply $\lambda$-convexity but not $r$-convexity. So, for sets satisfying ( $R$ ), $r$-convexity is ensured, even for very small values of $\lambda$.

Proposition 2.2 in Rodríguez-Casal and Saavedra-Nieves (2016) is the key for proving that $r_{0}$ is a maximum. To see this, let $\{r_{n}\}$ be a sequence converging to $r_{0}$ such that $C_{r_{n}}(S)=S$. Such a sequence always exists by Definition 2.2. It can be proved, using the results by Cuevas et al. (2012), that $S$ satisfies the $r_{n}$-rolling condition and, by Proposition 2.3 in Rodríguez-Casal and Saavedra-Nieves (2016), this property is preserved in the limit, so $S$ is also $r_{0}$-rolling. Finally, under ( $R$ ), $r_{0}$-rolling implies that $S$ is $r_{0}$-convex.

The authors conjecture that the equivalence between $r-$ convexity and $r-$ rolling could be stated in a more general framework and it may be proved under milder conditions.
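As a worked illustration of Definition 2.2, consider a planar ring with inner radius $a$ and outer radius $b$. This example and the symbols $a$, $b$ are assumptions added here for illustration; the computation only uses the definition of $C_{r}$ through empty open balls.

```latex
S = \{x \in \mathbb{R}^{2} : a \leq \|x\| \leq b\}, \qquad 0 < a < b.

% For r <= a, every point of the hole {||x|| < a} lies in an open ball of
% radius r contained in the hole, and every point with ||x|| > b lies in an
% open ball of radius r that does not meet S. Hence
C_{r}(S) = S \qquad \text{for } 0 < r \leq a.

% For r > a, an open ball of radius r that meets the hole cannot be contained
% in it, so it intersects S. The hole is therefore filled:
C_{r}(S) = \{x \in \mathbb{R}^{2} : \|x\| \leq b\} \supsetneq S \qquad \text{for } r > a.

% Consequently the supremum in Definition 2.2 is attained:
r_{0} = \sup\{\gamma > 0 : C_{\gamma}(S) = S\} = a.
```

In this example any $r$ slightly above $r_{0}=a$ already adds the whole hole to the reconstruction, which is the overestimation effect described above for values $r>r_{0}$.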

Remark 2.3.

Under certain conditions on $S$ (for instance, $\operatorname{Int}(H(S))\neq\emptyset$), it holds that $C_{\infty}(S)=H(S)$, where $C_{\infty}(S)=\lim_{r_{n}\rightarrow\infty}C_{r_{n}}(S)$. Therefore, if $S$ is assumed to be convex, Proposition 2.4 in Rodríguez-Casal and Saavedra-Nieves (2016) remains true. For more details, see Walther (1999).

2.2 About regularity conditions on $f$ and its nonparametric estimation

All through this paper, we assume that the random sample of points, $\mathcal{X}_{n}$ , is generated from a density $f$ that satisfies the next regularity condition:

( $f_{0,1}^{L}$ ) The restriction of the density $f$ to $S$ is Lipschitz continuous (there exists $k_{f}$ such that $|f(x)-f(y)|\leq k_{f}\|x-y\|$ for all $x,y\in S$) and there exists $f_{0}>0$ such that $f(x)\geq f_{0}$ for all $x\in S$. Furthermore, $f_{1}=\max_{x\in S}f(x)$.

As an example in the one-dimensional case, condition ( $f_{0,1}^{L}$ ) is satisfied by the uniform density $f(x)=1/(b-a)$ if $x\in[a,b]$ and $f(x)=0$ otherwise, where $a$ and $b$ denote two real numbers with $a<b$.

Moreover, a non-conventional density estimator will be introduced in Definition 2.4.

Definition 2.4.

Let $r>0$ and let $Vor(X_{i})$ be the Voronoi cell of the point $X_{i}$ (i.e. $Vor(X_{i})=\{x:\|x-X_{i}\|=\min_{y\in\mathcal{X}_{n}}\|x-y\|\}$ ). If $K$ is a kernel function and $f_{n}(x)=\frac{1}{nh_{n}^{d}}\sum K((x-X_{i})/h_{n})$ denotes the usual kernel density estimator, we define

$$\hat{f}_{n}(x)=\max_{i:\,x\in Vor(X_{i})}f_{n}(X_{i})\,\mathbb{I}_{\{x\in C_{r}(\mathcal{X}_{n})\}}.$$

Note that this nonparametric estimator has an unusual behaviour: it is expected to converge towards the unknown density when the support is $r$-convex, but not when the support is not $r$-convex.
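A minimal sketch of the estimator in Definition 2.4 is given below, assuming a Gaussian kernel and a fixed bandwidth `h`. The function names and the Boolean argument `in_hull` (the value of the indicator that $x$ belongs to $C_{r}(\mathcal{X}_{n})$, to be computed separately) are hypothetical and introduced only for illustration.

```python
import numpy as np


def gaussian_kde_at(points, sample, h):
    """Usual kernel density estimate f_n at `points` (Gaussian kernel, bandwidth h)."""
    points = np.atleast_2d(points).astype(float)
    sample = np.atleast_2d(sample).astype(float)
    n, d = sample.shape
    u = (points[:, None, :] - sample[None, :, :]) / h
    kernel = np.exp(-0.5 * (u ** 2).sum(axis=2)) / (2.0 * np.pi) ** (d / 2.0)
    return kernel.sum(axis=1) / (n * h ** d)


def voronoi_max_density(x, sample, h, in_hull):
    """Sketch of Definition 2.4.

    Returns the maximum of f_n(X_i) over the sample points whose Voronoi cell
    contains x, multiplied by the indicator `in_hull` that x lies in C_r(X_n).
    """
    if not in_hull:
        return 0.0
    x = np.asarray(x, dtype=float)
    sample = np.atleast_2d(sample).astype(float)
    dist = np.sqrt(((sample - x) ** 2).sum(axis=1))
    # x belongs to Vor(X_i) for every i attaining the minimum distance
    # (ties are detected up to floating-point tolerance).
    owners = np.isclose(dist, dist.min())
    f_at_sample = gaussian_kde_at(sample, sample, h)
    return float(f_at_sample[owners].max())
```

The estimate is piecewise constant on Voronoi cells: it only ever takes the values of the usual kernel estimate at the sample points, so it stays bounded away from zero inside $C_{r}(\mathcal{X}_{n})$ even at locations far from every observation.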

Moreover, some technical hypotheses on the kernel function must be established.

( $\mathcal{K}_{\phi}^{p}$ ) The kernel function $K$ belongs to the set of kernels $\mathcal{K}$ such that $K(u)=\phi(p(u))$, where $p$ is a polynomial and $\phi$ is a bounded real function of bounded variation, verifying that $c_{K}=\int\|u\|K(u)du<\infty$, $K\geq 0$ and there exist $r_{K}$ and $c^{\prime}_{K}>0$ such that $K(x)\geq c^{\prime}_{K}$ for all $x\in B_{r_{K}}[0]$.

Condition ( $\mathcal{K}_{\phi}^{p}$ ) is satisfied, for instance, by the Gaussian kernel.

2.3 About maximal spacings and its nonparametric estimation

The optimal value of the shape index $r$ to be estimated was established in Definition 2.2. Some concepts from the theory of maximal spacings are needed to propose a consistent estimate of $r$.

The notion of maximal-spacing in several dimensions was introduced and studied by Deheuvels (1983) for uniformly distributed data on the unit cube. Later on, Janson (1987) extended these results to uniformly distributed data on any bounded set and derived the asymptotic distribution of different maximal-spacings notions without conditions on the shape of the support $S$ . Aaron et al. (2017) generalized the results by Janson (1987) to the non-uniform case.

The shape of the considered spacings will be defined by a given set $A\subset\mathbb{R}^{d}$. For the validity of the theoretical results, it is sufficient to assume that $A$ is a compact and convex set. For practical purposes, the usual choices are $A=[0,1]^{d}$ or $A=B_{1}[0]$, the closed ball with center $0$ and radius $1$. For a general dimension $d$, the first definition of maximal spacing is the one used by Janson (1987) under the restriction that the data are uniformly distributed:

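$$\Delta_{n}^{*}(\mathcal{X}_{n})=\sup\{\gamma:\ \exists x\text{ such that }\{x\}+\gamma A\subset S\setminus\mathcal{X}_{n}\}.$$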

If the Lebesgue measure of the set $A$ is one, $\Delta_{n}^{*}(\mathcal{X}_{n})^{d}$ represents the Lebesgue measure of the largest set $\{x\}+\gamma A\subset S\setminus\mathcal{X}_{n}$ . The concept of maximal spacing can be related easily to the maximal inner radius when $A=B_{1}[0]$ . If $\operatorname{Int}(S)\neq\emptyset$ , the maximal inner radius of $S$ is defined as

$$R(S)=\sup\{\gamma>0:\ \exists x\in S\text{ such that }B_{\gamma}[x]\subset S\}.$$

Note that the value of the maximal spacing depends on $S$ and also on $\mathcal{X}_{n}$ . However, the definition of the maximal inner radius relies only on $S$ .
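In dimension one the uniform-case maximal spacing reduces to the largest gap between consecutive order statistics, which makes the definition easy to check numerically. The sketch below assumes $S=[a,b]$ and $A=[-1/2,1/2]$ (a set of Lebesgue measure one); the function name `maximal_spacing_1d` is hypothetical.

```python
import numpy as np


def maximal_spacing_1d(sample, a, b):
    """Maximal spacing on S = [a, b] for A = [-1/2, 1/2] (dimension d = 1).

    The set {x} + gamma * A is an interval of length gamma, so the supremum of
    gamma over intervals contained in S and avoiding the sample is the largest
    gap between consecutive points of {a, X_(1), ..., X_(n), b}.
    """
    pts = np.sort(np.asarray(sample, dtype=float))
    if pts.size and (pts[0] < a or pts[-1] > b):
        raise ValueError("sample points must lie in [a, b]")
    knots = np.concatenate(([a], pts, [b]))
    return float(np.diff(knots).max())


# Gaps of the sample {0.2, 0.5, 0.9} on [0, 1] are 0.2, 0.3, 0.4 and 0.1.
print(maximal_spacing_1d([0.9, 0.2, 0.5], 0.0, 1.0))
```

The value depends on both $S$ (through the end points $a$ and $b$) and the sample, in line with the remark above, whereas the maximal inner radius of $[a,b]$ is $(b-a)/2$ regardless of the sample.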

Aaron et al. (2017) extended the definition of maximal-spacing assuming that $\mathcal{X}_{n}$ is drawn according to a density $f$ with bounded support $S$ , the Lebesgue measure of the set $A$ is one and its barycentre is the origin of $\mathbb{R}^{d}$ . In this more general setting, the maximal spacing is defined as

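$$\Delta_{n}(\mathcal{X}_{n})=\sup\left\{\gamma:\ \exists x\text{ such that }\{x\}+\frac{\gamma}{f(x)^{1/d}}A\subset S\setminus\mathcal{X}_{n}\right\}$$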

and

$$V_{n}(\mathcal{X}_{n})=\Delta_{n}(\mathcal{X}_{n})^{d}.$$

The previous definition of maximal spacing relies on density $f$ . In this way, it distinguishes between low and high density regions. Throughout this paper, we will assume this latter choice $A=w_{d}^{-1/d}B_{1}[0]$ where $w_{d}$ denotes the Lebesgue measure of $B_{1}[0]$ .

Janson (1987) calibrated the volume of the maximal spacing under uniformity assumptions without conditions on the shape of the support $S$. The corresponding extension, established in Theorem 2 in Aaron et al. (2017), is stated in Theorem 2.5 with slightly modified hypotheses on $f$ and on the shape of $S$. The result remains true if it is assumed that $S$ satisfies ( $R$ ) and the density function $f$ satisfies ( $f_{0,1}^{L}$ ).

Theorem 2.5.

Let $\mathcal{X}_{n}$ be an i.i.d. random sample drawn according to a density $f$ that satisfies ( $f_{0,1}^{L}$ ), with compact and nonempty support $S$ satisfying ( $R$ ). Let $U$ be a random variable with distribution

$$P(U\leq u)=\exp(-\exp(-u))\quad\text{for }u\in\mathbb{R}$$

and let $\beta$ be a constant specified in Janson (1987). Then, we have that

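$$U(\mathcal{X}_{n})\stackrel{d}{\longrightarrow}U\quad\text{when }n\rightarrow\infty,$$

$$\liminf_{n\rightarrow\infty}\frac{nV_{n}(\mathcal{X}_{n})-\log(n)}{\log(\log(n))}\geq d-1\ \text{ a.s.},\qquad\limsup_{n\rightarrow\infty}\frac{nV_{n}(\mathcal{X}_{n})-\log(n)}{\log(\log(n))}\leq d+1\ \text{ a.s.},$$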

where

$$U(\mathcal{X}_{n})=nV_{n}(\mathcal{X}_{n})-\log(n)-(d-1)\log(\log(n))-\log(\beta).$$

Remark 2.6.

The value of constant $\beta$ does not depend on $S$ . It is explicitly given in Janson (1987). Concretely,

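$$\beta=\frac{1}{d!}\left(\frac{\sqrt{\pi}\,\Gamma\left(\frac{d}{2}+1\right)}{\Gamma\left(\frac{d+1}{2}\right)}\right)^{d-1}.$$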

In particular, for the bidimensional case, $\beta=1$ .
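The constant can be evaluated numerically. The sketch below assumes that $\beta$ has the form $\frac{1}{d!}\big(\sqrt{\pi}\,\Gamma(d/2+1)/\Gamma((d+1)/2)\big)^{d-1}$, which is the reading consistent with $\beta=1$ in the bidimensional case; the function name `janson_beta` is hypothetical.

```python
import math


def janson_beta(d):
    """Constant beta in dimension d, assuming the sqrt(pi) form stated in the lead-in."""
    ratio = math.sqrt(math.pi) * math.gamma(d / 2 + 1) / math.gamma((d + 1) / 2)
    return ratio ** (d - 1) / math.factorial(d)


for dim in (1, 2, 3):
    print(dim, janson_beta(dim))
```

For $d=2$ the ratio inside the parentheses equals $\sqrt{\pi}\,\Gamma(2)/\Gamma(3/2)=2$, so $\beta=2/2!=1$, and the term $\log(\beta)$ vanishes from the normalization of $U(\mathcal{X}_{n})$ in the planar case.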

A plug-in estimator of the maximal spacing $\Delta_{n}(\mathcal{X}_{n})$ will be proposed next. Note that the definition of $\Delta_{n}(\mathcal{X}_{n})$ relies on the support $S$ and also on the density function $f$ (both are usually unknown). Under the assumption of $r-$ convexity, $S$ will be estimated as $C_{r}(\mathcal{X}_{n})$ . As for the density function $f$ , the new nonparametric density estimator introduced in Definition 2.4 will be used. Then, we define the following plug-in estimator of $\Delta_{n}(\mathcal{X}_{n})$ :

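$$\hat{\delta}(C_{r}(\mathcal{X}_{n})\setminus\mathcal{X}_{n})=\sup\left\{\gamma:\ \exists x\text{ such that }\{x\}+\frac{\gamma}{\hat{f}_{n}(x)^{1/d}}A\subset C_{r}(\mathcal{X}_{n})\setminus\mathcal{X}_{n}\right\}.$$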

Note that if $S$ is $r-$ convex, $\hat{\delta}(C_{r}(\mathcal{X}_{n})\setminus\mathcal{X}_{n})$ should converge to zero as the sample size increases. However, if $S\subsetneq C_{r}(S)$ , the plug-in estimator of $\Delta_{n}(\mathcal{X}_{n})$ is expected to converge to a positive constant.
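With the ball-shaped choice $A=w_{d}^{-1/d}B_{1}[0]$ adopted in this section, the plug-in estimator can be rewritten in terms of empty balls. The following derivation is a sketch added for illustration; the notation $\rho_{n}(x)$ for the largest radius of a closed ball centred at $x$ that fits in $C_{r}(\mathcal{X}_{n})\setminus\mathcal{X}_{n}$ is an assumption, not notation from the paper.

```latex
% The scaled and translated copy of A is a closed ball centred at x:
\{x\} + \frac{\gamma}{\hat{f}_{n}(x)^{1/d}}\, w_{d}^{-1/d} B_{1}[0]
  = B_{\rho}[x], \qquad \rho = \frac{\gamma}{\left(w_{d}\,\hat{f}_{n}(x)\right)^{1/d}}.

% Write \rho_{n}(x) = \sup\{\rho \geq 0 : B_{\rho}[x] \subset C_{r}(\mathcal{X}_{n}) \setminus \mathcal{X}_{n}\}.
% Solving for \gamma and taking the supremum over x in C_{r}(\mathcal{X}_{n}):
\hat{\delta}(C_{r}(\mathcal{X}_{n}) \setminus \mathcal{X}_{n})
  = \sup_{x \in C_{r}(\mathcal{X}_{n})} \rho_{n}(x) \left(w_{d}\,\hat{f}_{n}(x)\right)^{1/d},

% so that the d-th power is an estimated density times the volume of an empty ball:
\hat{\delta}(C_{r}(\mathcal{X}_{n}) \setminus \mathcal{X}_{n})^{d}
  = \sup_{x \in C_{r}(\mathcal{X}_{n})} \hat{f}_{n}(x)\, w_{d}\, \rho_{n}(x)^{d}.
```

Read this way, the $d$-th power of the estimator is roughly the largest estimated probability mass of a ball inside $C_{r}(\mathcal{X}_{n})$ that contains no observation, which explains why it should vanish when $S$ is $r$-convex and stay bounded away from zero when $C_{r}(S)$ contains a region outside $S$.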

3 A new test for $r-$ convexity

We will introduce a consistent hypothesis test, based on a sample $\mathcal{X}_{n}$ drawn according to an unknown density $f$ on the unknown support $S$, to assess $r$-convexity for a certain $r>0$. This test is crucial for defining an estimator of $r_{0}$ that allows the data-driven estimation of the support $S$.

Given $r>0$ , the null hypothesis that $S$ is $r-$ convex will be tested taking the volume of $\hat{\delta}(C_{r}(\mathcal{X}_{n})\setminus\mathcal{X}_{n})$ as statistic. The idea that supports this procedure is simple: Under ( $f_{0,1}^{L}$ ) and ( $R$ ), Theorem 2.5 allows us to detect which values of $V_{n}(\mathcal{X}_{n})$ are large enough to be incompatible with these two assumptions. Since a similar reasoning can be also applied if we consider the volume of $\hat{\delta}(C_{r}(\mathcal{X}_{n})\setminus\mathcal{X}_{n})$ , the test is based on the opposite approach: Under ( $f_{0,1}^{L}$ ) and ( $R$ ), if the test statistic takes large enough values, it will mean that the selected $r$ is not appropriate and a smaller one should be considered.

The performance of this test can be illustrated using the real database of invasive plants in Azorean islands. Given the sample $\mathcal{X}_{740}$ , the practitioner could be interested in testing the null hypothesis that the EOO is $r-$ convex, for instance, for $r=5$ . According to Figure 4 (third row, right), it is clear that large Atlantic Ocean areas are inside $C_{5}(\mathcal{X}_{740})$ and the EOO is overestimated. Moreover, the volume of $\hat{\delta}(C_{5}(\mathcal{X}_{740})\setminus\mathcal{X}_{740})$ will be too large. In fact, even if larger sample sizes were considered, its volume would remain essentially constant (see gray ball inside the EOO reconstruction). Therefore, the null hypothesis of $5-$ convexity should be rejected. Note that the situation is the opposite if the goal is to test $r-$ convexity for $r=0.3$ . The volume of $\hat{\delta}(C_{0.3}(\mathcal{X}_{740})\setminus\mathcal{X}_{740})$ should be clearly smaller. Furthermore, when the sample size increases, this volume tends to zero. Formally, the asymptotic behaviour of the test is stated in Theorem 3.1.

Theorem 3.1.

Let $r>0$ and let $\mathcal{X}_{n}$ be a random and i.i.d sample drawn according to a density $f$ that satisfies ( $f_{0,1}^{L}$ ) with compact and nonempty support $S$ under ( $R$ ). Let $\hat{f}_{n}$ be the corresponding density estimator introduced in Definition 2.4 and let $K$ be the kernel function under ( $\mathcal{K}_{\phi}^{p}$ ). Assume that $h_{n}=O(n^{-\zeta})$ for some $0<\zeta<1/d$ . For the following decision problem,

$H_{0}:\ S \mbox{ is } r-\mbox{convex} \quad \mbox{vs.} \quad H_{1}:\ S \mbox{ is not } r-\mbox{convex},$

(a)

The test based on the statistic $\hat{V}_{n,r}=\hat{\delta}(C_{r}(\mathcal{X}_{n})\setminus\mathcal{X}_{n})^{d}$ with critical region $RC=\{\hat{V}_{n,r}>c_{n,\alpha}\}$ , where

$c_{n,\alpha}=n^{-1}\left(-\log(-\log(1-\alpha))+\log(n)+(d-1)\log\log(n)+\log{\beta}\right),$

has an asymptotic level less than $\alpha$ .

(b)

Moreover, if $S$ verifying ( $R$ ) is not $r-$ convex, the power is $1$ for sufficiently large $n$ .

Remark 3.2.

Note that the optimal kernel sequence size, $h_{n}=h_{0}n^{-1/(d+4)}$ , satisfies the hypotheses under which Theorem 3.1 holds, since $0<1/(d+4)<1/d$ . Therefore, any reasonable bandwidth selector should be suitable for testing $r-$ convexity.
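As a small illustration of how the critical value of the test could be evaluated in practice, the following Python sketch implements the expression for $c_{n,\alpha}$ that is written out in the proof of Proposition 8.1. The function name and argument defaults are ours, not the authors'; the default $\beta=1$ corresponds to the bidimensional case mentioned in Section 2.

```python
import math


def critical_value(n: int, alpha: float, d: int = 2, beta: float = 1.0) -> float:
    """Critical value c_{n,alpha} of the r-convexity test.

    Follows the expression quoted in the proof of Proposition 8.1:
        c = (u_alpha + log n + (d - 1) log log n + log beta) / n,
    with u_alpha = -log(-log(1 - alpha)).
    The name and signature are illustrative assumptions.
    """
    if n < 2:
        raise ValueError("n must be at least 2 so that log(log(n)) is defined")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    u_alpha = -math.log(-math.log(1.0 - alpha))
    return (u_alpha + math.log(n) + (d - 1) * math.log(math.log(n)) + math.log(beta)) / n


# Example call with the sample size and significance level used later in the paper.
c = critical_value(n=740, alpha=0.01, d=2, beta=1.0)
```

The critical value grows as $\alpha$ decreases and shrinks as $n$ increases, which is the behaviour the asymptotic results rely on.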

3.1 Selection and consistency results of the optimal smoothing parameter

The optimal estimation of the smoothing parameter $r_{0}$ from $\mathcal{X}_{n}$ is based on the test previously proposed. Specifically, according to Definition 2.2, $r_{0}$ will be estimated by

[TABLE]

That is, it is proposed to select the largest value of $\gamma$ compatible with the $\gamma-$ convexity assumption. Note that this choice depends on the significance level of the test. Again, we use the example of invasive plants in Azorean islands in order to analyze this estimator. Under ( $f_{0,1}^{L}$ ) and ( $R$ ), if the volume of $\hat{\delta}(C_{\gamma}(\mathcal{X}_{n})\setminus\mathcal{X}_{n})$ is large enough, then the null hypothesis of $\gamma-$ convexity will be rejected. Therefore, a smaller value of $\gamma$ should be selected. This case corresponds to Figure 4 (third row, right) taking $\gamma=5$ . However, the situation is completely opposite in Figure 4 (second row, right) when $\gamma=0.03$ . Here, the size of the maximal spacing found in $C_{0.03}(\mathcal{X}_{740})\setminus\mathcal{X}_{740}$ does not allow us to reject that the support is $0.03-$ convex. As a consequence, a value of $\gamma$ larger than $0.03$ should be considered.

The technical properties of the estimator of $r_{0}$ are considered next. First, the existence of the supremum defined in (2) must be guaranteed, a result which is proved in Theorem 3.3. In addition, it is also proved that $\hat{r}_{0}$ consistently estimates $r_{0}$ .

Theorem 3.3.

Let $f$ be a density function that satisfies ( $f_{0,1}^{L}$ ) with compact, nonconvex and nonempty support $S$ under ( $R$ ). Let $\hat{f}_{n}$ be the density estimator introduced in Definition 2.4 and let $K$ be the kernel function under ( $\mathcal{K}_{\phi}^{p}$ ). Assume that $h_{n}=O(n^{-\zeta})$ for some $0<\zeta<1/d$ . Let $r_{0}$ be the parameter defined in (1) and $\hat{r}_{0}$ defined in (2). Let $\{\alpha_{n}\}\subset(0,1)$ be a sequence converging to zero such that $\log(\alpha_{n})/n\rightarrow 0$ . Then, $\hat{r}_{0}$ converges to $r_{0}$ in probability.

Remark 3.4.

For the sake of clarity, $S$ is assumed non-convex throughout the text. However, if $S$ is convex, it can be shown that $\hat{r}_{0}$ goes to infinity (which is the value of $r_{0}$ in this case) because, with high probability, the null hypothesis is not rejected for any value of $r$ .

4 Consistency of resulting support estimator

The behaviour of the random set $C_{\hat{r}_{0}}(\mathcal{X}_{n})$ as an estimator of $S$ can be studied once the consistency of $\hat{r}_{0}$ has been proved. Two metrics between sets are usually considered in order to assess the performance of a support estimator. Specifically, let $A$ and $C$ be two closed, bounded, nonempty subsets of $\mathbb{R}^{d}$ . The Hausdorff distance between $A$ and $C$ is defined by

$d_{H}(A,C)=\max\left\{\sup_{a\in A}d(a,C),\ \sup_{c\in C}d(c,A)\right\},$

where $d(a,C)=\inf\{\|a-c\|:c\in C\}$ and $\|\cdot\|$ denotes the Euclidean norm. Besides, if $A$ and $C$ are two bounded and Borel sets then the distance in measure between $A$ and $C$ is defined by $d_{\mu}(A,C)=\mu(A\triangle C)$ , where $\mu$ denotes the Lebesgue measure and $\triangle$ , the symmetric difference, that is, $A\triangle C=(A\setminus C)\cup(C\setminus A).$ The Hausdorff distance quantifies the physical proximity between two sets whereas the distance in measure is useful to quantify their similarity in content. However, neither of these distances is entirely adequate for measuring the similarity between the shapes of two sets. The Hausdorff distance between boundaries, $d_{H}(\partial A,\partial C)$ , can also be used to evaluate the performance of the estimators (see Baíllo and Cuevas, 2001; Cuevas and Rodríguez-Casal, 2004; Rodríguez-Casal, 2007 or Genovese et al., 2012).
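To make the Hausdorff distance concrete, the sketch below evaluates the usual two-sided definition, $\max\{\sup_{a\in A}d(a,C),\sup_{c\in C}d(c,A)\}$ , for finite point sets, where suprema and infima reduce to maxima and minima. It is only an illustration under that finiteness assumption (the sets considered in the paper are general compact sets), and the function names are ours.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def directed_distance(A: Sequence[Point], C: Sequence[Point]) -> float:
    """max over a in A of d(a, C), where d(a, C) = min over c in C of ||a - c||."""
    return max(min(math.dist(a, c) for c in C) for a in A)


def hausdorff_distance(A: Sequence[Point], C: Sequence[Point]) -> float:
    """Two-sided Hausdorff distance between two finite, nonempty point sets."""
    return max(directed_distance(A, C), directed_distance(C, A))


# Toy example: C contains A plus one extra point at distance 2 from A.
A = [(0.0, 0.0), (1.0, 0.0)]
C = [(0.0, 0.0), (1.0, 0.0), (1.0, 2.0)]
d_AC = hausdorff_distance(A, C)
```

In the toy example every point of $A$ belongs to $C$ , so the distance is driven entirely by the extra point of $C$ , which illustrates why the definition needs both directions.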

In particular, if $\lim_{r\rightarrow r_{0}^{+}}d_{H}(S,C_{r}(S))=0$ then, the consistency of $C_{\hat{r}_{0}}(\mathcal{X}_{n})$ can be proved easily from Theorem 3.3. However, the consistency cannot be guaranteed if $d_{H}(S,C_{r}(S))$ does not go to zero as $r$ goes to $r_{0}$ from above (as $\hat{r}_{0}$ does, see Proposition 8.1 below). This problem can be solved by considering the estimator $C_{r_{n}}(\mathcal{X}_{n})$ where $r_{n}=\nu\hat{r}_{0}$ with $\nu\in(0,1)$ fixed. This ensures that, for $n$ large enough, with high probability, $C_{r_{n}}(\mathcal{X}_{n})\subset S$ . From the practical point of view the selection of $\nu$ is not a major issue because $\hat{r}_{0}$ is numerically approximated and the computed estimator always satisfies this property without multiplying by $\nu$ . In some sense, Theorem 4.1 gives the convergence rate of the numerical approximation of $\hat{r}_{0}$ .

Theorem 4.1.

Let $\mathcal{X}_{n}$ be a random and i.i.d sample drawn according to a density $f$ that satisfies ( $f_{0,1}^{L}$ ) with compact, nonconvex and nonempty support $S$ under ( $R$ ). Let $r_{0}$ be the parameter defined in (1) and $\hat{r}_{0}$ defined in (2). Let $\{\alpha_{n}\}\subset(0,1)$ be a sequence converging to zero such that $\log(\alpha_{n})/n\rightarrow 0$ . Let $\nu\in(0,1)$ and $r_{n}=\nu\hat{r}_{0}$ . Then,

[TABLE]

The same convergence order holds for $d_{H}(\partial S,\partial C_{r_{n}}(\mathcal{X}_{n}))$ and $d_{\mu}(S,C_{r_{n}}(\mathcal{X}_{n}))$ .

5 Numerical illustration

The main numerical aspects of the estimation algorithm of $r_{0}$ in (1) are detailed in what follows. Although the method proposed in this work is fully data-driven from a theoretical point of view, its practical implementation depends on the specification of two parameters to be selected by the practitioner: the significance level of the test $\alpha$ and the maximum number for connected components $\mathcal{C}$ of the resulting support estimator. Choosing them is a much more flexible and simpler problem than the specification of the shape index $r_{0}$ .

With probability one, for a large enough $n$ , the existence of the estimator $\hat{r}_{0}$ defined in (2) is guaranteed under the hypotheses of Theorem 3.3. However, in practice, this estimator might not exist for a specific sample $\mathcal{X}_{n}$ and a given value of the significance level $\alpha$ . Therefore, the influence of $\alpha$ must be taken into account. The null hypothesis of $r-$ convexity will be (incorrectly) rejected for $0<r\leq r_{0}$ with probability $\alpha$ , approximately. This is not important from the theoretical point of view, since we are assuming that $\alpha=\alpha_{n}$ goes to zero as the sample size increases. But what should be done, for a given sample, if $H_{0}$ is rejected for all $r$ (or at least all reasonable values of $r$ )? In order to fix a minimum acceptable value of $r$ , it is assumed that $S$ (and, hence, its estimator) will have no more than $\mathcal{C}$ connected components. Overly fragmented estimators will not be considered even if we reject $H_{0}$ for all $r$ . In this latter case, the minimum value of $r$ that ensures a number of connected components not greater than $\mathcal{C}$ will be taken. Therefore, this parameter $\mathcal{C}$ can be interpreted as a geometric stopping criterion that does not appear in the theoretical results because the sequence $\alpha_{n}$ is assumed to tend to zero.

Dichotomy algorithms can be used to compute $\hat{r}_{0}$ . The practitioner must select a maximum number of iterations $I$ and two initial points $r_{m}$ and $r_{M}$ with $r_{m}<r_{M}$ such that the null hypothesis of $r_{M}-$ convexity is rejected and the null hypothesis of $r_{m}-$ convexity is accepted. According to the previous comments, it is assumed that the number of connected components of $C_{r_{m}}(\mathcal{X}_{n})$ must not be greater than $\mathcal{C}$ . Choosing a value close enough to zero is usually sufficient to select $r_{m}$ . According to Figure 4 (second row, right), the maximal spacing in $C_{0.03}(\mathcal{X}_{n})$ will be small enough to accept $0.03-$ convexity. Therefore, taking $r_{m}\leq 0.03$ will be a good choice. However, if selecting this $r_{m}$ is not possible because, for very low values of $r$ , the hypothesis of $r-$ convexity is still rejected, then $r_{0}$ is estimated as the positive value of $r$ closest to zero such that the number of connected components of $C_{r}(\mathcal{X}_{n})$ is smaller than or equal to $\mathcal{C}$ . On the other hand, if no spacing large enough to yield a statistically significant test can be found in $H(\mathcal{X}_{n})$ , then we propose $H(\mathcal{X}_{n})$ as the estimator for the support.

To sum up, the following inputs should be given: the significance level $\alpha\in(0,1)$ , a maximum number of iterations $I$ , a maximum number of connected components $\mathcal{C}$ and two initial values $r_{m}$ and $r_{M}$ . Given these parameters $\hat{r}_{0}$ will be computed as follows:

1. In each iteration, and while the number of iterations is smaller than $I$ :

(a) $r=(r_{m}+r_{M})/2$ .

(b) If the null hypothesis of $r-$ convexity is not rejected then $r_{m}=r$ .

(c) Otherwise, $r_{M}=r$ .

2. Then, $\hat{r}_{0}=r_{m}$ .
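The dichotomy steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the test itself is abstracted as a hypothetical callable `rejects(r)` that returns `True` when the null hypothesis of $r-$ convexity is rejected for the sample at hand, and the fallback based on the number of connected components $\mathcal{C}$ is not covered.

```python
from typing import Callable


def estimate_r0(rejects: Callable[[float], bool], r_m: float, r_M: float, max_iter: int) -> float:
    """Dichotomy search for the largest r whose r-convexity is not rejected.

    `rejects` is a hypothetical stand-in for the r-convexity test.
    It is assumed that rejects(r_m) is False and rejects(r_M) is True.
    """
    if not 0.0 < r_m < r_M:
        raise ValueError("need 0 < r_m < r_M")
    for _ in range(max_iter):
        r = (r_m + r_M) / 2.0
        if rejects(r):
            r_M = r  # r is too large: shrink the upper end
        else:
            r_m = r  # r is still compatible with H0: raise the lower end
    return r_m


# Toy stand-in for the test: pretend H0 is rejected exactly when r > 1.
r_hat = estimate_r0(lambda r: r > 1.0, r_m=0.01, r_M=5.0, max_iter=40)
```

Returning $r_{m}$ rather than the last midpoint guarantees that the output is always a value for which the null hypothesis was not rejected (or the initial $r_{m}$ ), which matches the last step of the procedure.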

Some technical aspects related to the computation of the maximal spacings must also be mentioned. In the proposed procedure, the null hypothesis needs to be tested $I$ times. Since this involves the calculation of the maximal spacing, one may be concerned about the computational cost of the method. Nevertheless, as noted by Rodríguez-Casal and Saavedra-Nieves (2016), this maximal spacing does not need to be specifically determined and it is enough to check if there exists a point $x$ such that

[TABLE]

In this case, $\hat{V}_{n,r}\geq c_{n,\alpha}$ and, therefore, the null hypothesis of $r-$ convexity will be rejected. Furthermore, note that if this disc exists then $x\notin B_{c_{n,\alpha}^{x,w}}(X_{k})$ where $c_{n,\alpha}^{x,w}=c_{n,\alpha}^{1/d}w_{d}^{-1/d}\hat{f}_{n}^{-1/d}(x)$ and $X_{k}$ denotes the sample point such that $x\in Vor(X_{k})$ . Therefore, $\hat{f}_{n}(x)=f_{n}(X_{k})$ .
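The two quantities in this paragraph, the piecewise-constant density estimate $\hat{f}_{n}(x)=f_{n}(X_{k})$ and the radius $c_{n,\alpha}^{x,w}$ , can be sketched as follows. Several ingredients are assumptions made only for illustration: $w_{d}$ is taken to be the volume of the unit ball in $\mathbb{R}^{d}$ , $f_{n}$ is taken to be a kernel density estimator with a Gaussian kernel, and all function names are ours. The Voronoi tile containing $x$ is identified by its nearest sample point.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, ...]


def unit_ball_volume(d: int) -> float:
    """Lebesgue measure of the unit ball in R^d (assumed meaning of w_d)."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


def kde(x: Point, sample: Sequence[Point], h: float) -> float:
    """Kernel density estimate f_n(x); the Gaussian kernel is an illustrative choice."""
    d = len(x)
    norm = len(sample) * (h ** d) * (2 * math.pi) ** (d / 2)
    return sum(math.exp(-0.5 * (math.dist(x, xi) / h) ** 2) for xi in sample) / norm


def plug_in_density(x: Point, sample: Sequence[Point], h: float) -> float:
    """hat f_n(x) = f_n(X_k), with X_k the nucleus of the Voronoi tile containing x."""
    nearest = min(sample, key=lambda xi: math.dist(x, xi))
    return kde(nearest, sample, h)


def exclusion_radius(c: float, density: float, d: int) -> float:
    """Radius c^{1/d} * w_d^{-1/d} * density^{-1/d}."""
    return (c / (unit_ball_volume(d) * density)) ** (1.0 / d)


# Toy example in the plane: two sample points and a query point closer to the first.
sample = [(0.0, 0.0), (1.0, 0.0)]
f_hat = plug_in_density((0.1, 0.3), sample, h=0.5)
radius = exclusion_radius(c=0.02, density=f_hat, d=2)
```

The radius is chosen so that a ball of that radius has estimated probability content equal to the critical value, so denser regions lead to smaller balls.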

Then, the centers of the possible maximal balls that belong to the Voronoi tile with nucleus $X_{i}$ ( $i=1,\ldots,n$ ) necessarily lie in $B_{c_{n,\alpha}^{X_{i},w}}(X_{i})^{c}\cap Vor(X_{i})$ . We will follow the next steps:

1. Determine the set of candidates for ball centers $D(r)=C_{r}(\mathcal{X}_{n})\cap\bigcup_{X_{i}\in E(m)}(\partial B_{c_{n,\alpha}^{X_{i},w}}(X_{i})\cap Vor(X_{i}))$ where $E(m)\subset\mathcal{X}_{n}$ denotes the extremes of the $m-$ shape of $\mathcal{X}_{n}$ when $m=\min\left\{c_{n,\alpha}^{X_{j},w}:X_{j}\in\mathcal{X}_{n}\right\}$ , see Edelsbrunner (2014). If $x\in D(r)$ then we can guarantee that $B_{c_{n,\alpha}^{X_{i},w}}(x)\cap\mathcal{X}_{n}=\emptyset$ . Equivalently,

[TABLE]

2. Calculate $M(r)=\max\{d(x,\partial C_{r}(\mathcal{X}_{n})):x\in D(r)\}$ .

3. If $M(r)\leq\hat{c}_{n,\alpha}$ then the null hypothesis of $r-$ convexity is not rejected.

It should be noted that if $X_{i}\notin E(m)$ , for all $x\in B_{c_{n,\alpha}^{X_{i},w}}(X_{i})^{c}\cap Vor(X_{i})$ , $B_{c_{n,\alpha}^{X_{i},w}}(x)\cap\mathcal{X}_{n}\neq\emptyset$ . Therefore, these points can be discarded in order to determine $D(r)$ . Furthermore, $E(m)$ , $\partial C_{r}(\mathcal{X}_{n})$ and $\partial B_{\hat{c}_{n,\alpha,r}^{*}}(\mathcal{X}_{n})$ can be easily computed (at least for the bidimensional case). See Pateiro-López and Rodríguez-Casal (2010) for further details.

6 Extent of occurrence estimation

The new support estimator introduced in this work will be used for reconstructing the EOO of an assemblage of terrestrial invasive plants in two islands of the Azores Archipelago, Terceira and São Miguel. For this real dataset, we have shown that the convexity assumption is very restrictive. According to Figure 4 (first and second rows, left), sea areas are inside the classical estimator of the EOO. Obviously, the EOO is overestimated, given that terrestrial invasive plants do not occupy the Atlantic Ocean. The goal here is to reconstruct the EOO overcoming these limitations.

First, it is necessary to estimate the optimal value $r_{0}$ from the sample of $740$ geographical locations. If we select the significance level $\alpha$ equal to $0.01$ and $\mathcal{C}=4$ , the resulting estimate is $\hat{r}_{0}=0.127$ . In Figure 5, $C_{\hat{r}_{0}}(\mathcal{X}_{740})$ is shown. According to the results obtained, the EOO reconstruction has two different connected components corresponding to the two Azorean islands. Unlike the classical EOO estimator, sea areas are not inside the reconstruction. Therefore, if the sample size is large enough, a more sophisticated and realistic estimator of the EOO can be determined.

The new method, although designed for handling more complex situations, provides reconstructions similar to the convex hull in those cases where the classical reconstruction works appropriately. To show this, we will focus on the geographical locations from São Miguel island. The EOO will be estimated separately from data corresponding to years $2015$ and $2016$ . A total of $33$ and $48$ geographical locations are available in $2015$ and $2016$ , respectively.

Figure 6 contains the EOO estimator in $2015$ (left) and $2016$ (center). In 2015, the resulting reconstruction of the EOO is equal to $H(\mathcal{X}_{33})$ . In 2016, $\hat{r}_{0}=1.5$ ; however, the estimation of the EOO obtained, $C_{1.5}(\mathcal{X}_{48})$ , is not so different from the convex hull. This last illustration suggests that, if more data are available per year, this kind of analysis could be useful for studying the temporal changes in the spatial pattern of organisms, including invasive plants, in an area of interest.

7 Conclusions and open problems

The main goal of this work is to propose a new data-driven method for reconstructing an $r-$ convex support in a consistent way. The route designed to reach this goal can be summarized as follows: (1) defining the optimal value of $r$ , $r_{0}$ , to be estimated, (2) establishing a nonparametric test to assess the null hypothesis that $S$ is $r-$ convex for a given $r>0$ , (3) defining the estimator of $r_{0}$ that strongly relies on the previous test and (4) checking that the estimator of $r_{0}$ and the resulting support reconstruction are consistent.

The definition of the estimator $\hat{r}_{0}$ depends on the proposed $r-$ convexity test which, of course, could also be used independently. In many practical situations where the support is completely unknown and only a sample of points is available, it can be interesting to test whether the support of the corresponding distribution is $r-$ convex.

Furthermore, the behaviour of the proposed support estimator was illustrated through the estimation of the EOO of an assemblage of terrestrial invasive plants in two Azorean islands. In this particular case, where the convexity assumption on the EOO is too restrictive, our support estimator provides a more realistic and sophisticated reconstruction. Besides, we have shown that when the classical convex reconstruction works appropriately, our estimator offers similar reconstructions. Moreover, we have shown that estimating the EOO from annual (or any other time period) occurrences could be useful for detecting temporal changes in the spatial pattern of organisms.

Note that the resulting support estimator is spatially flexible. In other words, it is able to distinguish the different disconnected components of the support. Therefore, it could be used for reconstructing the support of an intensity function of a Poisson process.

Finally, another interesting problem, intimately related to the EOO reconstruction, is to estimate the area of occupancy (AOO). The IUCN defines the AOO as the area within the extent of occurrence that is occupied by a taxon. Under $r-$ convexity, we could estimate the AOO as the area of the $r-$ convex hull of the sample points. However, this estimator suffers from the drawback of not being rate-optimal. Arias-Castro et al. (2018) proposed an optimal volume estimator based on the sample $r-$ convex hull using a sample splitting strategy that attains the minimax lower bound. Therefore, the problem of estimating the AOO could be studied from a different perspective in the future.

8 Proofs

In this section the proofs of the stated theorems are presented.

Proof of Theorem 2.5.

First, Aaron et al. (2017) assumed that $f$ is Hölder continuous with respect to Lebesgue measure. Under ( $f_{0,1}^{L}$ ), this condition is satisfied. See Aaron et al. (2017) for more details.

Furthermore, Aaron et al. (2017) also assumed that there exists $k<d$ and $C_{\partial S}>0$ such that $N(\partial S,\epsilon)\leq C_{\partial S}\epsilon^{-k}$ where $N(\partial S,\epsilon)$ denotes the inner covering number of $\partial S$ . Under ( $R$ ), Theorem 1 in Walther (1997) guaranteed that $\partial S$ is a $\mathcal{C}^{1}$ $(d-1)-$ dimensional submanifold. Therefore, the previous assumption is fulfilled for $k=d-1$ . See Aaron et al. (2017) for more details.

Proof of Theorem 3.1.

First, we will prove (a) and then, (b).

(a)

Under $H_{0}$ ( $C_{r}(S)=S$ ), $C_{r}(\mathcal{X}_{n})\subset S$ . Then,

[TABLE]

If we apply Lemma 9.1, we get, with probability one, for $n$ large enough,

[TABLE]

Equivalently, $\Delta_{n}(\mathcal{X}_{n})\geq(1-\epsilon_{n}^{+})\hat{\delta}(C_{r}(\mathcal{X}_{n})\setminus\mathcal{X}_{n})$ and, therefore, $\mathbb{P}(\hat{V}_{n,r}>c_{n,\alpha})\leq\mathbb{P}(V_{n}(\mathcal{X}_{n})>(1-\epsilon_{n}^{+})^{d}c_{n,\alpha})$ from where it follows that $\mathbb{P}(\hat{V}_{n,r}>c_{n,\alpha})$ can be majorized by,

[TABLE]

According to Theorem 2.5, $U(\mathcal{X}_{n})\stackrel{{\scriptstyle d}}{{\rightarrow}}U$ when $n\rightarrow\infty$ . Furthermore, notice that $U$ has a continuous distribution, so convergence in distribution implies that

[TABLE]

Therefore, using that $\log(n)\epsilon_{n}^{+}$ tends to zero, we get that

[TABLE]

As a consequence,

[TABLE]

(b)

From Lemma 9.1 (ii),

[TABLE]

where $\epsilon_{n}^{-}$ tends to zero, almost surely. Under $H_{1}$ ( $S$ is not $r-$ convex, $S\subsetneq C_{r}(S)$ ), we will prove that, with probability one and for $n$ large enough,

[TABLE]

In particular, we will find a closed ball of radius $\rho^{{}^{\prime}}>0$ that, with probability one and for $n$ large enough, is inside $C_{r}(\mathcal{X}_{n})\setminus\mathcal{X}_{n}$ .

Then, let $r^{*}$ be such that $r>r^{*}>0$ and $S\subsetneq C_{r^{*}}(S)\subset C_{r}(S)$ . Since $S$ is under ( $R$ ), Proposition 2.2 in Rodríguez-Casal and Saavedra-Nieves (2016) ensures that $S$ is $r-$ convex. However, under $H_{1}$ , $S$ is not $r-$ convex. Therefore, it is easy to guarantee the existence of $r^{*}$ .

According to Lemma 8.3 in Rodríguez-Casal and Saavedra-Nieves (2016),

[TABLE]

It can be assumed, without loss of generality, that $r\leq\frac{\rho}{2}+r^{*}$ . If this is not the case then it would be possible to replace $r^{*}$ by $r^{**}>r^{*}$ satisfying $r^{**}<r\leq\frac{\rho}{2}+r^{**}$ . For this $r^{**}$ ,

[TABLE]

Now, we can apply Lemma 3 in Walther (1997) in order to ensure that

[TABLE]

If $S\oplus r^{*}B_{1}[0]\subset\mathcal{X}_{n}\oplus rB_{1}[0]$ then $(S\oplus r^{*}B_{1}[0])\ominus r^{*}B_{1}[0]\subset(\mathcal{X}_{n}\oplus rB_{1}[0])\ominus r^{*}B_{1}[0]$ , that is, $C_{r^{*}}(S)\subset(\mathcal{X}_{n}\oplus rB_{1}[0])\ominus r^{*}B_{1}[0]$ . This implies that

[TABLE]

In addition,

[TABLE]

where we have used that, for sets $A,C$ and $D$ , $(A\ominus C)\ominus D=A\ominus(C\oplus D)$ . Finally, since $B_{\rho}(x)\subset C_{r^{*}}(S)$ and $\rho/2\geq(r-r^{*})$ , we have $B_{\rho/3}[x]\subset C_{r^{*}}(S)\ominus(\rho/2)B_{1}[0]\subset C_{r^{*}}(S)\ominus(r-r^{*})B_{1}[0]\subset C_{r}(\mathcal{X}_{n})$ . This part of the proof is concluded by taking $\rho^{{}^{\prime}}=\rho/3$ .

Therefore,

[TABLE]

Then, with probability one and for $n$ large enough,

[TABLE]

The proof is finished taking into account that $c_{n,\alpha}$ tends to zero.

Proof of Theorem 3.3.

Some auxiliary results are necessary. First we will prove that, with probability tending to one, $\hat{r}_{0}$ is at least as big as $r_{0}$ .

Proposition 8.1.

Let $f$ be a density function that fulfils condition ( $f_{0,1}^{L}$ ) with compact, nonconvex and nonempty support $S$ under ( $R$ ). Let $\hat{f}_{n}$ be the corresponding density estimator introduced in Definition 2.4 and let $K$ be the kernel function under ( $\mathcal{K}_{\phi}^{p}$ ). Assume that $h_{n}=O(n^{-\zeta})$ for some $0<\zeta<1/d$ . Let $r_{0}$ be the parameter defined in (1) and $\hat{r}_{0}$ defined in (2). Let $\{\alpha_{n}\}\subset(0,1)$ be a sequence converging to zero. Then,

[TABLE]

Proof.

Equivalently, we will prove that

[TABLE]

From the definition of $\hat{r}_{0}$ , see (2), it is clear that

[TABLE]

where $\hat{V}_{n,r_{0}}=\hat{\delta}(C_{r_{0}}(\mathcal{X}_{n})\setminus\mathcal{X}_{n})^{d}$ and $c_{n,\alpha_{n}}=n^{-1}(-\log(-\log(1-\alpha_{n}))+\log(n)+(d-1)\log\log(n)+\log{\beta})$ . Therefore,

[TABLE]

Since, with probability one, $C_{r_{0}}(\mathcal{X}_{n})\subset S$ , applying Lemma 9.1, $\Delta_{n}(\mathcal{X}_{n})\geq(1-\epsilon_{n}^{+})\hat{\delta}(C_{r_{0}}(\mathcal{X}_{n})\setminus\mathcal{X}_{n})$ and, therefore, $\mathbb{P}(\hat{V}_{n,r_{0}}>c_{n,\alpha_{n}})\leq\mathbb{P}(V_{n}(\mathcal{X}_{n})>(1-\epsilon_{n}^{+})^{d}c_{n,\alpha_{n}})$ from where it follows that $\mathbb{P}(\hat{V}_{n,r_{0}}>c_{n,\alpha_{n}})$ can be majorized by,

[TABLE]

According to Theorem 2.5, $U(\mathcal{X}_{n})\stackrel{{\scriptstyle d}}{{\rightarrow}}U$ when $n\rightarrow\infty$ . Furthermore, notice that $U$ has a continuous distribution, so convergence in distribution implies that

[TABLE]

Since $\alpha_{n}\to 0$ and $\log(n)\epsilon_{n}^{+}\to 0$ , we can prove

[TABLE]

This ensures that

[TABLE]

Therefore, $\mathbb{P}(\hat{r}_{0}\geq r_{0})\to 1$ .

∎

It remains to prove that $\hat{r}_{0}$ cannot be arbitrarily larger than $r_{0}$ . Some auxiliary results must be proved next.

Lemma 8.2.

Let $\mathcal{X}_{n}$ be a random and i.i.d sample drawn according to a density $f$ that satisfies ( $f_{0,1}^{L}$ ) with compact, nonconvex and nonempty support $S$ under ( $R$ ). Let $r_{0}$ be the parameter defined in (1). Then, for all $r>r_{0}$ , there exists an open ball $B_{\rho}(x)$ such that $B_{\rho}(x)\cap S=\emptyset$ and

[TABLE]

Proof.

Let $r^{*}$ be such that $r>r^{*}>r_{0}$ . Since $C_{r_{0}}(S)=S\subsetneq C_{r^{*}}(S)$ , according to Lemma 8.3 in Rodríguez-Casal and Saavedra-Nieves (2016),

[TABLE]

It can be assumed, without loss of generality, that $r\leq\frac{\epsilon}{2}+r^{*}$ . If this is not the case then it would be possible to replace $r^{*}$ by $r^{**}>r^{*}$ satisfying $r^{**}<r\leq\frac{\epsilon}{2}+r^{**}$ . For this $r^{**}$ ,

[TABLE]

Now, we can apply Lemma 3 in Walther (1997) in order to ensure that

[TABLE]

If $S\oplus r^{*}B\subset\mathcal{X}_{n}\oplus rB$ then $(S\oplus r^{*}B)\ominus r^{*}B\subset(\mathcal{X}_{n}\oplus rB)\ominus r^{*}B$ , that is, $C_{r^{*}}(S)\subset(\mathcal{X}_{n}\oplus rB)\ominus r^{*}B$ . This implies that

[TABLE]

In addition,

[TABLE]

where we have used that, for sets $A,C$ and $D$ , $(A\ominus C)\ominus D=A\ominus(C\oplus D)$ . Finally, since $B_{\epsilon}(x)\subset C_{r^{*}}(S)$ and $\epsilon/2\geq(r-r^{*})$ , we have $B_{\epsilon/2}(x)\subset C_{r^{*}}(S)\ominus(\epsilon/2)B\subset C_{r^{*}}(S)\ominus(r-r^{*})B\subset C_{r}(\mathcal{X}_{n})$ . This concludes the proof of the lemma by taking $\rho=\epsilon/2$ . ∎

Proposition 8.3.

Let $\mathcal{X}_{n}$ be a random and i.i.d sample drawn according to a density $f$ that satisfies ( $f_{0,1}^{L}$ ) with compact, nonconvex and nonempty support $S$ under ( $R$ ). Let $r_{0}$ be the parameter defined in (1) and $\{\alpha_{n}\}\subset(0,1)$ a sequence converging to zero such that $\log(\alpha_{n})/n\rightarrow 0$ . Then, for any $\epsilon>0$ ,

[TABLE]

Proof.

Given $\epsilon>0$ , let $r=r_{0}+\epsilon$ . According to Lemma 8.2, there exist $x\in\mathbb{R}^{d}$ and $\rho>0$ such that $B_{\rho}(x)\cap S=\emptyset$ and

[TABLE]

Since, with probability one, $\mathcal{X}_{n}\subset S$ we have $B_{\rho}(x)\cap\mathcal{X}_{n}=\emptyset$ . Then, $\{x\}+\rho B_{1}[0]\subset C_{r}(\mathcal{X}_{n})\setminus\mathcal{X}_{n}$ . Let $W$ be the positive number $\hat{f}_{n}^{1/d}(x)w_{d}^{1/d}$ . If $\gamma=\rho W>0$ then it is trivial to check that

[TABLE]

Therefore, $\hat{\delta}(C_{r}(\mathcal{X}_{n})\setminus\mathcal{X}_{n})\geq\gamma>0$ and, consequently, $\hat{V}_{n,r}=c_{\gamma}>0$ . Similarly, $\hat{V}_{n,r^{{}^{\prime}}}\geq\hat{V}_{n,r}=c_{\gamma}>0$ for all $r^{{}^{\prime}}\geq r$ . On the other hand, since $-u_{\alpha_{n}}/\log(\alpha_{n})=\log(-\log(1-\alpha_{n}))/\log(\alpha_{n})\to 1$ , we have, with probability one,

[TABLE]

Then, with probability one, there exists $n_{0}$ such that if $n\geq n_{0}$ we have

[TABLE]

Therefore, $\hat{r}_{0}\leq r$ . This last statement follows from $\hat{V}_{n,r^{\prime}}>c_{n,\alpha_{n}}$ for all $r^{\prime}\geq r$ and the definition of $\hat{r}_{0}$ , see (2). ∎
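The limit $-u_{\alpha_{n}}/\log(\alpha_{n})\to 1$ used in this proof can be checked with a first-order expansion. The short derivation below is added only as a reading aid; it writes $u_{\alpha}=-\log(-\log(1-\alpha))$ , in agreement with the expression of $c_{n,\alpha_{n}}$ given in the proof of Proposition 8.1.

$$\begin{aligned}
-\log(1-\alpha_{n}) &= \alpha_{n}\,(1+O(\alpha_{n})) \quad \mbox{as } \alpha_{n}\to 0,\\
\log(-\log(1-\alpha_{n})) &= \log(\alpha_{n})+O(\alpha_{n}),\\
\frac{-u_{\alpha_{n}}}{\log(\alpha_{n})} &= 1+O\!\left(\frac{\alpha_{n}}{\log(\alpha_{n})}\right)\longrightarrow 1 .
\end{aligned}$$

Hence $u_{\alpha_{n}}\sim-\log(\alpha_{n})$ , and the condition $\log(\alpha_{n})/n\to 0$ is what makes $u_{\alpha_{n}}/n$ , and therefore $c_{n,\alpha_{n}}$ , tend to zero.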

Theorem 3.3 is a straightforward consequence of Propositions 8.1 and 8.3.

Proof of Theorem 4.1.

Theorem 3 of Rodríguez-Casal (2007) ensures that, under ( $R$ ) (when $r=\lambda=\widetilde{r}$ ), $\mathbb{P}(\mathcal{E}_{n})\to 1$ , where

[TABLE]

and $D$ is some constant. Under the hypothesis of Theorem 4.1 this holds for any $\widetilde{r}\leq\min\{r,\lambda\}$ . Fix one $\widetilde{r}\leq\min\{r,\lambda\}$ such that $\widetilde{r}<\nu r_{0}$ and define $\mathcal{R}_{n}=\{\widetilde{r}\leq r_{n}\leq r_{0}\}$ . Since, by Theorem 3.3, $r_{n}=\nu\hat{r}_{0}$ converges in probability to $\nu r_{0}$ and $\widetilde{r}<\nu r_{0}<r_{0}$ , we have that $\mathbb{P}(\mathcal{R}_{n})\to 1$ . If the events $\mathcal{E}_{n}$ and $\mathcal{R}_{n}$ hold (notice that $\mathbb{P}(\mathcal{E}_{n}\cap\mathcal{R}_{n})\to 1$ ) we have $C_{\widetilde{r}}(\mathcal{X}_{n})\subset C_{r_{n}}(\mathcal{X}_{n})\subset S$ and, therefore,

[TABLE]

This completes the proof of the first statement of Theorem 4.1. Similarly, it is possible to prove the result for the other error criteria considered in Theorem 4.1.

9 Auxiliary results

Lemma 9.1 shows that Lemma 5 in Aaron et al. (2017) remains true if $S$ satisfies ( $R$ ) and the density estimator $\hat{f}_{n}$ introduced in Definition 2.4 is considered. Concretely, Aaron et al. (2017) assumed that $S$ is a compact standard set. Roughly speaking, this condition prevents the support $S$ from being too spiky. Under the smoothness condition ( $R$ ), standardness is guaranteed. See Rodríguez-Casal (2007) or Cuevas and Fraiman (1997) for more details.

Lemma 9.1.

Let $r>0$ and let $f$ be a density function that satisfies ( $f_{0,1}^{L}$ ) with compact and nonempty support $S$ under ( $R$ ). Let $\hat{f}_{n}$ be the corresponding density estimator introduced in Definition 2.4 and let $K$ be the kernel function under ( $\mathcal{K}_{\phi}^{p}$ ). Assume that $h_{n}=O(n^{-\zeta})$ with $\zeta\in(0,1/d)$ . Then,

(i)

there exists a sequence $\epsilon_{n}^{+}$ such that $\log(n)\epsilon_{n}^{+}$ tends to zero and for all $x\in S$ ,

[TABLE]

(ii)

there exists a sequence $\epsilon_{n}^{-}$ tending to zero and a constant $\lambda_{0}$ such that for all $x\in C_{r}(\mathcal{X}_{n})$ , $(\hat{f}_{n}(x))^{1/d}\geq\lambda_{0}-\epsilon_{n}^{-}$ e.a.s.

Proof.

Next, some preliminary results established in Aaron et al. (2017) (see proof of Lemma 5) are detailed.

First, taking $\rho_{n}=\left(\frac{4f_{1}\log(n)}{f_{0}w_{d}n}\right)^{1/d}$ , it can be proved that

[TABLE]

Since $S$ verifies ( $R$ ) and, under ( $\mathcal{K}_{\phi}^{p}$ ), $K$ is bounded from below on a neighbourhood of the origin, there exist $c^{{}^{\prime\prime}}_{K}$ and $r_{K}>0$ such that

[TABLE]

Furthermore, for all $x\in S$ ,

[TABLE]

Since $f$ is Lipschitz and $\int_{\mathbb{R}^{d}}K(u)du=1$ it is verified that, for all $x\in S$ ,

[TABLE]

where $c_{K}>0$ and $k_{f}$ is established in Condition B.

From (4) and the condition $f(x)>f_{0}$ for all $x\in S$ , it follows that

[TABLE]

First, we will prove (i). Using the triangle inequality, we can ensure that

[TABLE]

As for the first term on the right hand side of this inequality, it is necessary to take into account that $K$ verifies ( $\mathcal{K}_{\phi}^{p}$ ) and $h_{n}=O(n^{-\zeta})$ with $\zeta\in(0,1/d)$ . Then, Theorem 2.3 in Giné and Guillou (2002) guarantees that there exists a constant $C_{1}$ such that, with probability one, for $n$ large enough,

[TABLE]

Therefore,

[TABLE]

As a consequence,

[TABLE]

Next, the second term on the right hand side of inequality (7) will be bounded. For all $x\in S$ ,

[TABLE]

Since $\{(x,y)\in S^{2},\mbox{ }\|x-y\|\leq h_{n}\}$ is compact, the Lebesgue dominated convergence theorem entails that there exists $y_{0}\in S$ such that $\|x-y_{0}\|\leq\rho_{n}$ , and a sequence $y_{k}$ with $y_{k}$ tending to $y_{0}$ , $\|y_{k}-y_{0}\|\leq\rho_{n}$ , such that for $n$ large enough, with probability one,

[TABLE]

Next, equation (5) and the Lipschitz continuity of $f$ allow us to prove that

[TABLE]

With the same type of argument, we can ensure that

[TABLE]

From equations (9), (10), (11) and (3), we get

[TABLE]

Take $\epsilon_{n}=k_{f}\rho_{n}+k_{f}h_{n}c_{K}+(f_{1}+k_{f}h_{n}c_{K})C_{S}n^{-2}+C_{1}\left(\frac{nh_{n}^{d}}{-\log(h_{n})}\right)^{-1/2}$ , so that $\log(n)\epsilon_{n}$ tends to zero. From equations (7), (8), (12), we obtain that, with probability one, for $n$ large enough,

[TABLE]

Then, for all $x\in S$ , $\hat{f}_{n}(x)-f(x)\leq f(x)\epsilon_{n}/f_{0}$ , and thus,

[TABLE]

or equivalently,

[TABLE]

Finally, if $\epsilon_{n}^{+}=(1-(1+\epsilon_{n}/f_{0})^{-1/d})\sim\epsilon_{n}/(df_{0})$ then $\epsilon_{n}^{+}\log(n)$ tends to zero. Therefore,

[TABLE]

This concludes the proof of (i).
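The equivalence $\epsilon_{n}^{+}\sim\epsilon_{n}/(df_{0})$ used in the last step follows from a Taylor expansion. As a brief check, writing $t=\epsilon_{n}/f_{0}\to 0$ (the symbol $t$ is introduced here only for this computation):

$$\epsilon_{n}^{+}=1-(1+t)^{-1/d}=\frac{t}{d}-\frac{d+1}{2d^{2}}\,t^{2}+O(t^{3})=\frac{\epsilon_{n}}{df_{0}}\left(1+O(\epsilon_{n})\right).$$

In particular, $\log(n)\epsilon_{n}^{+}\to 0$ is inherited directly from $\log(n)\epsilon_{n}\to 0$ .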

In order to prove (ii), observe that

[TABLE]

Since we have already proved that $\max_{x\in\mathbb{R}^{d}}|\mathbb{E}\hat{f}_{n}(x)-\hat{f}_{n}(x)|$ tends to zero almost surely, it only remains to check that $\min_{x\in\mathbb{R}^{d}}\mathbb{E}\hat{f}_{n}(x)$ is bounded from below by a positive constant. From $\min_{x\in\mathbb{R}^{d}}\mathbb{E}\hat{f}_{n}(x)=\min_{x\in\mathcal{X}_{n}}\mathbb{E}f_{n}(x)$ and (6), we get

[TABLE]

∎

Acknowledgements. The authors are grateful to Ignacio Munilla Rumbao for drawing their attention to the extent of occurrence estimation problem and to Rosa M. Crujeiras for her useful and enriching comments. This work has been supported by Projects MTM2016-76969P and MTM2017-089422-P from the Ministry of Economy and Competitiveness and ERDF.

Bibliography


[1] Aaron, C., Cholaquidis, A., Fraiman, R.: On the maximal multivariate spacing extension and convexity tests. arXiv preprint arXiv:1411.2482 (2014)
[2] Arias-Castro, E., Pateiro-López, B., Rodríguez-Casal, A.: Minimax Estimation of the Volume of a Set Under the Rolling Ball Condition. JASA, 1–12 (2018)
[3] Baíllo, A., Chacón, J. E.: A survey and a new selection criterion for statistical home range estimation. arXiv preprint arXiv:1804.05129 (2018)
[4] Baíllo, A., Cuevas, A.: On the estimation of a star-shaped set. Adv. in Appl. Probab., 33, 717–726 (2001)
[5] Berrendero, J. R., Cuevas, A., Pateiro-López, B.: A multivariate uniformity test for the case of unknown support. Stat. Comput., 22, 259–271 (2012)
[6] Chevalier, J.: Estimation du support et du contour du support d’une loi de probabilité. Ann. Inst. Henri Poincaré Probab. Stat., 12, 339–364 (1976)
[7] Cuevas, A., Fraiman, R.: Set estimation. New perspectives in stochastic geometry, 374–397 (2010)
[8] Cuevas, A., Fraiman, R., Pateiro-López, B.: On statistical properties of sets fulfilling rolling-type conditions. Adv. in Appl. Probab., 44, 311–329 (2012)

