Nonparametric intensity estimation from noisy observations of a Poisson process under unknown error distribution

Martin Kroll

arXiv:1703.05619·math.ST·February 19, 2019


TL;DR

This paper develops a nonparametric method for estimating the intensity of a Poisson process from noisy, indirect observations, achieving minimax optimal rates even when the error distribution is unknown and estimated from additional data.

Contribution

It introduces an orthonormal series estimator that adapts to unknown smoothness and error distribution, providing minimax optimal convergence rates in a circular Poisson process model.

Findings

1. Estimator attains minimax optimal convergence rates.
2. Data-driven dimension selection improves adaptivity.
3. Method effectively handles unknown error distribution.


Tables

Table 1. Exemplary rates of convergence for nonparametric intensity estimation. The rates are given in the framework of Theorems 1, 2 and 4, which impose the given restrictions. In all examples $\omega_{0}=1$ and $\omega_{j}=|j|^{2s}$ for $j\neq 0$, whereas the choices (pol) and (exp) for the sequences $\gamma$ and $\alpha$ are explained in Section 3.4.

| $\gamma$ | $\alpha$ | $\Theta(\Psi_{n})$ | $\Theta(\Phi_{m})$ | Restrictions |
|---|---|---|---|---|
| (pol) | (pol) | $n^{-\frac{2(p-s)}{2p+2a+1}}$ | $m^{-\frac{(p-s)\wedge a}{a}}$ | $p\geq s$, $a>\frac{1}{2}$ |
| (exp) | (pol) | $(\log n)^{2s+2a+1}\cdot n^{-1}$ | $m^{-1}$ | $a>\frac{1}{2}$ |
| (pol) | (exp) | $(\log n)^{-2(p-s)}$ | $(\log m)^{-2(p-s)}$ | $p\geq s$ |
| (exp) | (exp) | $(\log n)^{2s}\cdot n^{-\frac{p}{p+a}}$ | $(\log m)^{2s}\cdot m^{-p/a}$ if $a\geq p$; $m^{-1}$ if $a<p$ | |

Equations

$\widetilde{N}_{i}=\sum_{j}\delta_{x_{ij}}$

$N_{i}=\sum_{j}\delta_{y_{ij}}$

$N_{1},\ldots,N_{n}$ i.i.d. and $Y_{1},\ldots,Y_{m}$ i.i.d. $\sim f$

$\widehat{\lambda}_{k}=\sum_{0\leq|j|\leq k}\widehat{[\lambda]}_{j}\mathbf{e}_{j}$

$[g]_{j}=\int_{0}^{1}g(t)\mathbf{e}_{j}(-t)\,\mathrm{d}t$

$\sup_{\lambda\in\Lambda}\sup_{f\in\mathcal{F}}\mathbb{E}[\|\widetilde{\lambda}-\lambda\|_{\omega}^{2}]$

$\inf_{\widetilde{\lambda}}\sup_{\lambda\in\Lambda}\sup_{f\in\mathcal{F}}\mathbb{E}[\|\widetilde{\lambda}-\lambda\|_{\omega}^{2}]$

$\sup_{\lambda\in\Lambda}\sup_{f\in\mathcal{F}}\mathbb{E}[\|\lambda^{\ast}-\lambda\|_{\omega}^{2}]\lesssim\inf_{\widetilde{\lambda}}\sup_{\lambda\in\Lambda}\sup_{f\in\mathcal{F}}\mathbb{E}[\|\widetilde{\lambda}-\lambda\|_{\omega}^{2}]$

$N_{i}=\sum_{j}\delta_{x_{ij}+\varepsilon_{ij}-\lfloor x_{ij}+\varepsilon_{ij}\rfloor}$

$\ell(t)=\int_{0}^{1}\lambda\big((t-\varepsilon)-\lfloor t-\varepsilon\rfloor\big)f(\varepsilon)\,\mathrm{d}\varepsilon,\quad t\in[0,1)$

$\widehat{[\ell]}_{j}=\frac{1}{n}\sum_{i=1}^{n}\int_{0}^{1}\mathbf{e}_{j}(-t)\,\mathrm{d}N_{i}(t)$

$\widehat{[\ell]}_{j}=[\lambda]_{j}\cdot[f]_{j}+\xi_{j}$ for all $j\in\mathbb{Z}$

$\widehat{\lambda}_{k}=\sum_{0\leq|j|\leq k}\frac{\widehat{[\ell]}_{j}}{\widehat{[f]}_{j}}\mathds{1}_{\Omega_{j}}\mathbf{e}_{j}$

$\Lambda_{\gamma}^{r}:=\{\lambda\in\mathbb{L}^{2}:\lambda\geq 0\text{ and }\sum_{j\in\mathbb{Z}}\gamma_{j}|[\lambda]_{j}|^{2}=:\|\lambda\|_{\gamma}^{2}\leq r\}$

$\mathcal{F}_{\alpha}^{d}:=\{f\in\mathbb{L}^{2}:f\geq 0,\ [f]_{0}=1\text{ and }d^{-1}\leq|[f]_{j}|^{2}/\alpha_{j}\leq d\}$

$\Psi_{n}$

$\Phi_{m}$

$k_{n}^{\ast}=\operatornamewithlimits{argmin}_{k\in\mathbb{N}_{0}}\max\Big\{\frac{\omega_{k}}{\gamma_{k}},\sum_{0\leq|j|\leq k}\frac{\omega_{j}}{n\alpha_{j}}\Big\}$

$\inf_{\widetilde{\lambda}}\sup_{\lambda\in\Lambda_{\gamma}^{r}}\sup_{f\in\mathcal{F}_{\alpha}^{d}}\mathbb{E}[\|\widetilde{\lambda}-\lambda\|_{\omega}^{2}]\geq\frac{\zeta r}{16\eta}\cdot\Psi_{n}$

$\inf_{\widetilde{\lambda}}\sup_{\lambda\in\Lambda_{\gamma}^{r}}\sup_{f\in\mathcal{F}_{\alpha}^{d}}\mathbb{E}[\|\widetilde{\lambda}-\lambda\|_{\omega}^{2}]\geq\frac{1-\sqrt{3}/2}{8}\cdot\zeta^{2}rd^{-1/2}\cdot\Phi_{m}$

$\inf_{\widetilde{\lambda}}\sup_{\lambda\in\Lambda_{\gamma}^{r}}\sup_{f\in\mathcal{F}_{\alpha}^{d}}\mathbb{E}[\|\widetilde{\lambda}-\lambda\|_{\omega}^{2}]\geq\frac{1}{2}\Big\{\frac{\zeta r}{16\eta}\cdot\Psi_{n}+\frac{1-\sqrt{3}/2}{8}\cdot\zeta^{2}rd^{-1/2}\cdot\Phi_{m}\Big\}$

$\sup_{\lambda\in\Lambda_{\gamma}^{r}}\sup_{f\in\mathcal{F}_{\alpha}^{d}}\mathbb{E}[\|\widehat{\lambda}_{k_{n}^{\ast}}-\lambda\|_{\omega}^{2}]\lesssim\Psi_{n}+\Phi_{m}$

$\widehat{k}\vcentcolon=\operatornamewithlimits{argmin}_{k\in\mathcal{M}_{n}}\big\{\Upsilon_{n}(\widehat{\lambda}_{k})+\textsc{pen}_{k}\big\}$

$\Delta_{k}^{\alpha}=\max_{0\leq j\leq k}\omega_{j}\alpha_{j}^{-1}$ and $\delta_{k}^{\alpha}=(2k+1)\Delta_{k}^{\alpha}\frac{\log(\Delta_{k}^{\alpha}\vee(k+3))}{\log(k+3)}$

$N_{n}^{\alpha}=\inf\Big\{1\leq j\leq n:\frac{\alpha_{j}}{2j+1}<\frac{\log(n+3)\,\omega_{j}^{+}}{n}\Big\}-1\wedge n$

$M_{m}^{\alpha}=\inf\{1\leq j\leq m:\alpha_{j}<640\,d\,m^{-1}\log(m+1)\}-1\wedge m$

$\widetilde{\textsc{pen}}_{k}=\frac{165}{2}\,d\cdot(\widehat{[\ell]}_{0}\vee 1)\cdot\frac{\delta_{k}^{\alpha}}{n}$

$\widetilde{k}=\operatornamewithlimits{argmin}_{0\leq k\leq K_{nm}^{\alpha}}\{\Upsilon(\widehat{\lambda}_{k})+\widetilde{\textsc{pen}}_{k}\}$

$\sup_{\lambda\in\Lambda_{\gamma}^{r}}\sup_{f\in\mathcal{F}_{\alpha}^{d}}\mathbb{E}[\|\widehat{\lambda}_{\widetilde{k}}-\lambda\|_{\omega}^{2}]\lesssim\min_{0\leq k\leq K_{nm}^{\alpha}}\max\Big\{\frac{\omega_{k}}{\gamma_{k}},\frac{\delta_{k}^{\alpha}}{n}\Big\}+\Phi_{m}+\frac{1}{m}+\frac{1}{n}$

$\widehat{\Delta}_{k}=\max_{0\leq j\leq k}\frac{\omega_{j}}{|\widehat{[f]}_{j}|^{2}}\mathds{1}_{\Omega_{j}}$ and $\widehat{\delta}_{k}=(2k+1)\widehat{\Delta}_{k}\frac{\log(\widehat{\Delta}_{k}\vee(k+4))}{\log(k+4)}$

$\widehat{N}_{n}=\inf\{1\leq j\leq n:|\widehat{[f]}_{j}|^{2}/(2j+1)<\log(n+4)\,\omega_{j}^{+}/n\}-1\wedge n$


Full text

Nonparametric intensity estimation from noisy observations of a Poisson process under unknown error distribution

Martin Kroll

ENSAE-ParisTech CREST


Abstract.

We consider the nonparametric estimation of the intensity function of a Poisson point process in a circular model from indirect observations $N_{1},\ldots,N_{n}$ . These observations emerge from hidden point process realizations with the target intensity through contamination with additive error. In the case where the error distribution can only be estimated from an additional sample $Y_{1},\ldots,Y_{m}$, we derive minimax rates of convergence with respect to the sample sizes $n$ and $m$ under abstract smoothness conditions and propose an orthonormal series estimator which attains the optimal rate of convergence. The performance of the estimator depends on the correct specification of a dimension parameter whose optimal choice relies on smoothness characteristics of both the intensity and the error density. We propose a data-driven choice of the dimension parameter based on model selection and show that the adaptive estimator attains the minimax optimal rate.

Key words and phrases:

Research for this article was performed while I was a PhD student at Universität Mannheim. Financial support by the Deutsche Forschungsgemeinschaft (DFG) through the Research Training Group RTG 1953 is gratefully acknowledged. I am indebted to my supervisors Jan Johannes and Martin Schlather for fruitful discussions and helpful comments on the paper.

1. Introduction

Point process models are used in a wide variety of applications, including stochastic geometry [Chi+13], extreme value theory [Res08], and queueing theory [Bré81]. Each realization of a point process is a random set of points $\{x_{j}\}$, which can alternatively be represented as an $\mathbb{N}_{0}$-valued random measure $\sum_{j}\delta_{x_{j}}$, where $\delta_{\bullet}$ denotes the Dirac measure concentrated at $\bullet$. Poisson point processes (PPPs) are of particular importance since they serve as the elementary building blocks for more complex point process models. Let $\mathbb{X}$ be a locally compact second countable Hausdorff space, $\mathscr{X}$ the corresponding Borel $\sigma$-field and $\Lambda$ a locally finite measure on the measurable space $(\mathbb{X},\mathscr{X})$, i.e., $\Lambda(C)<\infty$ for all relatively compact sets $C$ in $\mathscr{X}$. A random set of points $N=\{x_{j}\}$ from $\mathbb{X}$ (resp. the random measure $N=\sum_{j}\delta_{x_{j}}$) is called a Poisson point process with intensity measure $\Lambda$ if (i) the number $N_{C}=|N\cap C|$ of points located in $C$ follows a Poisson distribution with parameter $\Lambda(C)$ for all relatively compact $C\in\mathscr{X}$, and (ii) for all $n\in\mathbb{N}$ and disjoint sets $A_{1},\ldots,A_{n}\in\mathscr{X}$, the random variables $N_{A_{1}},\ldots,N_{A_{n}}$ are independent. It is well known that the distribution of a PPP is completely determined by its intensity measure. Hence, from a statistical point of view, the (nonparametric) estimation of the intensity measure, or of its Radon-Nikodym derivative (the intensity function) with respect to some dominating measure, from observations of the point process is of fundamental importance.
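To fix ideas, the following minimal sketch simulates one realization of a PPP on $\mathbb{X}=[0,1)$ whose intensity measure has a bounded density `lam` with respect to Lebesgue measure. It uses thinning of a homogeneous PPP, a standard simulation device that is not part of the method studied in this paper; the bound `lam_max` and all function names are hypothetical. By property (i), the number of retained points is Poisson distributed with mean $\int_{0}^{1}\lambda(t)\,\mathrm{d}t$.

```python
import math
import random
from typing import Callable, List


def poisson_sample(mean: float, rng: random.Random) -> int:
    """Draw a Poisson(mean) variate by inverting the CDF (adequate for moderate means)."""
    u = rng.random()
    k, p = 0, math.exp(-mean)
    cdf = p
    # Stop if p underflows to 0 so that rounding in the CDF cannot cause an endless loop.
    while u > cdf and p > 0.0:
        k += 1
        p *= mean / k
        cdf += p
    return k


def simulate_ppp(lam: Callable[[float], float], lam_max: float,
                 rng: random.Random) -> List[float]:
    """Simulate a PPP on [0, 1) with intensity lam, assuming 0 <= lam <= lam_max.

    1. Draw a homogeneous PPP with rate lam_max: a Poisson(lam_max) number
       of i.i.d. uniform points on [0, 1).
    2. Keep each point x independently with probability lam(x) / lam_max.
    """
    candidates = [rng.random() for _ in range(poisson_sample(lam_max, rng))]
    return sorted(x for x in candidates if rng.random() < lam(x) / lam_max)
```

For instance, with `lam(t) = 20 * (1 + 0.5 * cos(2*pi*t))` and `lam_max = 30`, the average number of points over many simulated realizations should be close to 20.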

Inference and testing problems for Poisson and more general point processes have been tackled in a wide range of scenarios. The monographs [Kar91] and [Kut98] offer a comprehensive overview and discuss both parametric and nonparametric methods. From a methodological point of view, our approach in this paper is related to the article [Rey03] where the estimation of the intensity function from direct observations was studied using concentration inequalities.

Other approaches to nonparametric intensity estimation from direct observations, without making a claim to be exhaustive, can be found in [BB09] (where the performance of a histogram estimator under Hellinger loss is analysed), [Bir07] (using a testing approach to model selection), [GN00] (using a minimum complexity estimator in the Aalen model), and [PW04] (suggesting a wavelet estimator in the multiplicative intensity model).

Theoretical work on intensity estimation has recently been motivated by applications to genomic data. The model considered in the article [Big+13] is motivated by the processing of DNA ChIP-seq data. The article [San14] takes its motivation from the analysis of genomic data as well. In addition, let us mention two further articles where the development of nonparametric statistical methods for the analysis of point processes was inspired by applications from biology. First, motivated by DNA sequencing techniques, the article [SZ12] introduces a change-point model for nonhomogeneous Poisson processes occurring in molecular biology. Second, the article [ZK10] considers the nonparametric inference of Cox process data by means of a kernel type estimator.

Usually one aims to estimate the intensity function $\lambda$ from direct observations $\widetilde{N}_{1},\ldots,\widetilde{N}_{n}$ where

$$\widetilde{N}_{i}=\sum_{j}\delta_{x_{ij}},\qquad i=1,\ldots,n,\qquad(1)$$

are realizations of a PPP with the target intensity $\lambda$ . In this paper, however, we are interested in the nonparametric estimation of the intensity function $\lambda$ without access to the observations in (1). Instead, we are in the setup of a Poisson inverse problem [AB06] where we can only observe $N_{1},\ldots,N_{n}$ given by

$$N_{i}=\sum_{j}\delta_{y_{ij}},\qquad i=1,\ldots,n.\qquad(2)$$

The indirect observations $N_{i}$ are related to the hidden $\widetilde{N}_{i}$ by the identity $y_{ij}=x_{ij}+\varepsilon_{ij}-\lfloor x_{ij}+\varepsilon_{ij}\rfloor$ . The definition of the $y_{ij}$ as the fractional part of the additively contaminated $x_{ij}$ yields a circular model by means of the usual topological identification of the interval $[0,1)$ and the circle of perimeter $1$ .
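The passage from a hidden point to an observed point is a simple pointwise map. The sketch below applies it to a list of hidden points; `draw_error` is a hypothetical argument that stands in for independent draws from the error density $f$, which is unknown in the setting of this paper.

```python
import math
from typing import Callable, List


def contaminate(hidden: List[float], draw_error: Callable[[], float]) -> List[float]:
    """Return y = x + eps - floor(x + eps) for each hidden point x.

    This is the position of x + eps on the circle [0, 1) of perimeter 1;
    one independent error eps = draw_error() is used per point.
    """
    observed = []
    for x in hidden:
        z = x + draw_error()
        y = z - math.floor(z)
        # Guard against floating-point rounding producing exactly 1.0.
        observed.append(0.0 if y >= 1.0 else y)
    return observed
```

For example, a hidden point at $0.95$ contaminated by an error of $0.1$ is observed at (approximately) $0.05$, and a hidden point at $0.1$ with error $-0.3$ is observed at $0.8$.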

In contrast to our approach, the few existing papers on Poisson inverse problems [CK02, AB06, Big+13] assume the error distribution to be known. This conservative assumption is also standard in research articles dealing with classical deconvolution problems [Mei09]. If the error density is unknown, even identifiability of the statistical model is not guaranteed. Several remedies have been introduced to overcome this problem: for instance, it is possible to impose additional assumptions on the statistical model (e.g., [SV10] which deals with blind convolution under additive Gaussian noise with unknown variance). Alternatively, one can consider a framework with panel data [Neu07]. Finally, one can assume the availability of an additional sample from the error density (e.g., [DH93, Joh09, CL10, CL11]) to guarantee identifiability and enable inference. In this paper, we will stick to this last option.

Let us assume that the errors $\varepsilon_{ij}$ in the general model (2) are i.i.d. $\sim f$ for some unknown error density $f$ . We will study the resulting model and consider the nonparametric estimation of the intensity function from observations

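$$N_{1},\ldots,N_{n}\ \text{i.i.d.}\quad\text{and}\quad Y_{1},\ldots,Y_{m}\ \text{i.i.d.}\sim f,\qquad(3)$$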

where the $N_{i}$ are given as in (2). A natural aim here is to determine optimal rates of convergence in terms of the sample sizes $n$ and $m$ and to construct adaptive estimators attaining these rates. Note that the observation of $n$ i.i.d. processes $N_{1},\ldots,N_{n}$ with intensity $\lambda$ is equivalent to the observation of one process $N$ with intensity $n\lambda$, and both directions of this equivalence can easily be made rigorous. In order to obtain $N$ from $N_{1},\ldots,N_{n}$, put $N=N_{1}\cup\ldots\cup N_{n}$ (denoting by $\cup$ the set-theoretic union of point processes; this shows the infinite divisibility of Poisson point processes). For the other direction, given $N$ and $n$, it suffices to assign every point $x\in N$ to one of the processes $N_{1},\ldots,N_{n}$ with equal probability.
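Both directions of this equivalence can be mimicked in a few lines. The sketch below (function names hypothetical) merges realizations by taking the union of their points and, conversely, splits one realization by assigning each point independently and uniformly to one of $n$ processes.

```python
import random
from typing import List


def superpose(realizations: List[List[float]]) -> List[float]:
    """Union N = N_1 u ... u N_n of point sets; the intensities add up."""
    return sorted(x for points in realizations for x in points)


def split(points: List[float], n: int, rng: random.Random) -> List[List[float]]:
    """Assign each point independently to one of n processes with probability 1/n.

    If `points` is a PPP with intensity n * lam, the n parts are independent
    PPPs with intensity lam each (independent thinning).
    """
    parts: List[List[float]] = [[] for _ in range(n)]
    for x in points:
        parts[rng.randrange(n)].append(x)
    return parts
```

The splitting step relies on the standard fact that independent random labelling of the points of a PPP produces independent PPPs; this is what makes the direction from $N$ to $N_{1},\ldots,N_{n}$ rigorous.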

From a methodological point of view, our approach is inspired by the one conducted in [JS13]. We consider orthonormal series estimators of the form

$$\widehat{\lambda}_{k}=\sum_{0\leq|j|\leq k}\widehat{[\lambda]}_{j}\mathbf{e}_{j},$$

where $\mathbf{e}_{j}(\cdot)=\exp(2\pi ij\cdot)$ and $\widehat{[\lambda]}_{j}$ is an appropriate estimator of the Fourier coefficient ${[\lambda]}_{j}$ corresponding to the basis function $\mathbf{e}_{j}(\cdot)$ (see Section 2 for details). This estimator is motivated by the $\mathbb{L}^{2}$-convergent representation $\lambda=\sum_{j\in\mathbb{Z}}[\lambda]_{j}\mathbf{e}_{j}$ for square-integrable $\lambda$. It turns out that the performance of the estimator $\widehat{\lambda}_{k}$ crucially depends on the choice of the dimension parameter $k$, and that its optimal value depends on smoothness characteristics of the intensity that are usually not available in practice. In order to choose $k$ in a completely data-driven manner, we follow an approach based on model selection (see [BBM99, Com15]) and select the dimension parameter as the minimizer of a penalized contrast criterion. For the theoretical analysis of the adaptive estimator we need Talagrand-type concentration inequalities tailored to the framework with PPP observations, which cannot be directly transferred from results used in the usual density estimation or deconvolution frameworks (see Remark 2.2 in [Kro16]). These inequalities have already been derived in a separate manuscript [Kro16], and we only state the necessary consequences of these results in the appendix.

The article is organized as follows: in Section 2 we introduce our methodological approach. In Section 3 we study the nonparametric estimation problem from a minimax point of view. Section 4 considers adaptive estimation of the intensity for the Poisson model. Proofs are given in Sections A and B.

2. Methodology

2.1. Notation

Throughout this work we assume that the intensity $\lambda$ and the density $f$ belong to the space $\mathbb{L}^{2}=\mathbb{L}^{2}([0,1),\mathrm{d}x)$ of square-integrable functions on the interval $[0,1)$ . Let $\{\mathbf{e}_{j}\}_{j\in\mathbb{Z}}$ be the complex trigonometric basis of $\mathbb{L}^{2}$ given by $\mathbf{e}_{j}(t)=\exp(2\pi ijt)$ . The Fourier coefficients of a function $g\in\mathbb{L}^{2}$ are denoted as follows:

$$[g]_{j}=\int_{0}^{1}g(t)\mathbf{e}_{j}(-t)\,\mathrm{d}t,\qquad j\in\mathbb{Z}.$$

For a strictly positive symmetric sequence $\omega=(\omega_{j})_{j\in\mathbb{Z}}$ we introduce the weighted norm $\lVert\cdot\rVert_{\omega}$ defined via $\lVert g\rVert_{\omega}^{2}=\sum_{j\in\mathbb{Z}}\omega_{j}\lvert[g]_{j}\rvert^{2}$. The corresponding scalar product is denoted by $\langle g,h\rangle_{\omega}=\sum_{j\in\mathbb{Z}}\omega_{j}[g]_{j}\overline{[h]}_{j}$. Throughout the paper, we use the notation $a(n,m)\lesssim b(n,m)$ if $a(n,m)\leq C\cdot b(n,m)$ for some numerical constant $C$ independent of $n$ and $m$.
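As a concrete illustration of this notation, the sketch below approximates $[g]_{j}$ by a Riemann sum on an equispaced grid and evaluates $\lVert g\rVert_{\omega}^{2}$ over a finite set of indices. The grid size and the truncation to finitely many $j$ are assumptions of the sketch; for a trigonometric polynomial $g$ of low degree the Riemann sum is exact up to rounding.

```python
import cmath
import math
from typing import Callable, Dict


def fourier_coeff(g: Callable[[float], float], index: int, grid: int = 1024) -> complex:
    """Approximate [g]_index = int_0^1 g(t) e_index(-t) dt, with e_j(t) = exp(2 pi i j t),
    by a left Riemann sum on `grid` equispaced points. The sum is exact up to rounding
    for trigonometric polynomials whose degree and |index| are below grid / 2."""
    total = sum(g(k / grid) * cmath.exp(complex(0.0, -2.0 * math.pi * index * k / grid))
                for k in range(grid))
    return total / grid


def weighted_norm_sq(coeffs: Dict[int, complex], omega: Callable[[int], float]) -> float:
    """||g||_omega^2 = sum_j omega_j |[g]_j|^2, restricted to the supplied indices j."""
    return sum(omega(j) * abs(c) ** 2 for j, c in coeffs.items())
```

With $g(t)=1+\cos(2\pi t)$ one has $[g]_{0}=1$, $[g]_{\pm1}=1/2$ and $[g]_{j}=0$ otherwise, so $\lVert g\rVert_{\omega}^{2}=1.5$ for $\omega_{0}=1$ and $\omega_{j}=j^{2}$.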

2.2. The minimax point of view

We evaluate the performance of an arbitrary estimator $\widetilde{\lambda}$ of $\lambda$ by means of the mean integrated weighted squared loss $\mathbb{E}[\|\widetilde{\lambda}-\lambda\|_{\omega}^{2}]$ . We take up the minimax point of view and consider the maximum risk defined by

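$$\sup_{\lambda\in\Lambda}\sup_{f\in\mathcal{F}}\mathbb{E}[\|\widetilde{\lambda}-\lambda\|_{\omega}^{2}],\qquad(4)$$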

where $\Lambda$ and $\mathcal{F}$ are classes of potential intensity functions $\lambda$ and densities $f$ , respectively. The minimax risk is defined via

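$$\inf_{\widetilde{\lambda}}\sup_{\lambda\in\Lambda}\sup_{f\in\mathcal{F}}\mathbb{E}[\|\widetilde{\lambda}-\lambda\|_{\omega}^{2}],$$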

where the infimum is taken over all estimators $\widetilde{\lambda}$ of $\lambda$ . An estimator $\lambda^{\ast}$ is called rate optimal if

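$$\sup_{\lambda\in\Lambda}\sup_{f\in\mathcal{F}}\mathbb{E}[\|\lambda^{\ast}-\lambda\|_{\omega}^{2}]\lesssim\inf_{\widetilde{\lambda}}\sup_{\lambda\in\Lambda}\sup_{f\in\mathcal{F}}\mathbb{E}[\|\widetilde{\lambda}-\lambda\|_{\omega}^{2}].$$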

By allowing for general weight sequences $\omega$, we can treat both the estimation of $\lambda$ (in this case, $\omega\equiv 1$) and the estimation of derivatives (take $\omega_{j}=|j|^{2s}$ for $|j|\geq 1$ for the $s$-th derivative). The classes $\Lambda$ of intensity functions and $\mathcal{F}$ of densities to be considered in this article will be specified in Section 3 below, where we derive lower bounds on the minimax risk for these specific choices and prove that these lower bounds are attained up to a numerical constant by a suitably defined orthonormal series estimator.

2.3. Sequence space representation

Under the considered model, the observed point processes $N_{1},\ldots,N_{n}$ in (3) are generated from independent Poisson point processes $\widetilde{N}_{1},\ldots,\widetilde{N}_{n}$ with intensity function $\lambda$ by independent random contaminations of the individual points. We emphasize again that the (unobserved) contaminations are assumed to follow a probability law given by an unknown density $f$ and are to be understood additively modulo $1$ . Thus, the observations $N_{i}$ under the Poisson model are given by

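$$N_{i}=\sum_{j}\delta_{x_{ij}+\varepsilon_{ij}-\lfloor x_{ij}+\varepsilon_{ij}\rfloor},\qquad i=1,\ldots,n,$$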

where $\widetilde{N}_{i}=\sum_{j}\delta_{x_{ij}}$ is the realization of a Poisson point process with intensity function $\lambda$ and the errors $\varepsilon_{ij}$ are i.i.d. $\sim f$ . Note that each $N_{i}$ is again a realization of a Poisson point process whose intensity function is given by the circular convolution $\lambda\star f$ modulo 1 of $\lambda$ with the error density $f$ . More precisely, $\ell=\lambda\star f$ is given by the formula

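$$\ell(t)=\int_{0}^{1}\lambda\big((t-\varepsilon)-\lfloor t-\varepsilon\rfloor\big)f(\varepsilon)\,\mathrm{d}\varepsilon,\qquad t\in[0,1).$$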

By the convolution theorem, we have $[\ell]_{j}=[\lambda]_{j}\cdot[f]_{j}$ for all $j\in\mathbb{Z}$ . From Campbell’s theorem (cf. [Ser09], Chapter 3, Theorem 24) it can be deduced that for measurable functions $g$ we have $\mathbb{E}[\int_{0}^{1}g(t)\mathrm{d}N_{i}(t)]=\int_{0}^{1}\ell(t)g(t)\mathrm{d}t$ provided that the integral on the right-hand side exists. Exploiting this equation for $g(t)=\mathbf{e}_{j}(-t)$ and setting
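The identity $[\ell]_{j}=[\lambda]_{j}\cdot[f]_{j}$ can be checked numerically. In the sketch below, the interval $[0,1)$ is replaced by an equispaced grid, the circular convolution is computed with indices taken modulo the grid size (the discrete counterpart of the wrap-around modulo 1), and Fourier coefficients of both sides are compared. The example intensity and density are hypothetical trigonometric polynomials; on the grid the identity holds exactly up to rounding.

```python
import cmath
import math
from typing import List

GRID = 256  # number of equispaced points t_k = k / GRID on [0, 1)


def coeff(values: List[float], index: int) -> complex:
    """Discrete Fourier coefficient (1/GRID) * sum_k g(t_k) e_index(-t_k)."""
    return sum(v * cmath.exp(complex(0.0, -2.0 * math.pi * index * k / GRID))
               for k, v in enumerate(values)) / GRID


def circular_convolution(lam: List[float], f: List[float]) -> List[float]:
    """ell(t_k) ~ (1/GRID) * sum_m lam(t_k - t_m mod 1) f(t_m); the index
    (k - m) % GRID implements the wrap-around modulo 1."""
    return [sum(lam[(k - m) % GRID] * f[m] for m in range(GRID)) / GRID
            for k in range(GRID)]


ts = [k / GRID for k in range(GRID)]
lam_vals = [5.0 + 2.0 * math.cos(2 * math.pi * t) + math.sin(4 * math.pi * t) for t in ts]
f_vals = [1.0 + 0.8 * math.cos(2 * math.pi * t) for t in ts]  # positive, integrates to 1
ell_vals = circular_convolution(lam_vals, f_vals)
max_gap = max(abs(coeff(ell_vals, j) - coeff(lam_vals, j) * coeff(f_vals, j))
              for j in range(-3, 4))
print(f"largest coefficient mismatch: {max_gap:.1e}")
```

Here $[\ell]_{0}=5$, $[\ell]_{\pm1}=1\cdot 0.4=0.4$, and $[\ell]_{\pm2}=0$ because $[f]_{\pm2}=0$ even though $[\lambda]_{\pm2}\neq 0$: frequencies absent from the error density cannot be recovered from $\ell$.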

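$$\widehat{[\ell]}_{j}=\frac{1}{n}\sum_{i=1}^{n}\int_{0}^{1}\mathbf{e}_{j}(-t)\,\mathrm{d}N_{i}(t),\qquad(5)$$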

we thus obtain that $\mathbb{E}\widehat{[\ell]}_{j}=[\lambda]_{j}\cdot[f]_{j}$ for all $j\in\mathbb{Z}$ . More precisely, we have

$$\widehat{[\ell]}_{j}=[\lambda]_{j}\cdot[f]_{j}+\xi_{j}\qquad\text{for all }j\in\mathbb{Z},\qquad(6)$$

where $\xi_{j}=\widehat{[\ell]}_{j}-\mathbb{E}\widehat{[\ell]}_{j}=\frac{1}{n}\sum_{i=1}^{n}[\int_{0}^{1}\mathbf{e}_{j}(-t)\mathrm{d}N_{i}(t)-\int_{0}^{1}\ell(t)\mathbf{e}_{j}(-t)\mathrm{d}t].$
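Since each $N_{i}$ is a finite sum of Dirac measures, $\int_{0}^{1}\mathbf{e}_{j}(-t)\,\mathrm{d}N_{i}(t)$ is simply the sum of $\mathbf{e}_{j}(-y)$ over the points $y$ of $N_{i}$, so the estimator in (5) needs only the observed point locations. A minimal sketch, with a hypothetical function name and input format (one list of points per observed process):

```python
import cmath
import math
from typing import List


def ell_hat(observations: List[List[float]], index: int) -> complex:
    """Estimate [ell]_index by (1/n) * sum_i sum_{y in N_i} e_index(-y),
    where e_j(t) = exp(2 pi i j t) and n = len(observations)."""
    n = len(observations)
    total = sum(cmath.exp(complex(0.0, -2.0 * math.pi * index * y))
                for points in observations for y in points)
    return total / n
```

For example, for the two observed processes $\{0,\,0.5\}$ and $\{0.25\}$ the estimate of $[\ell]_{0}$ is $3/2$, the average number of points, and the estimate of $[\ell]_{1}$ is $(1-1-i)/2=-i/2$.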

2.4. Orthonormal series estimator

In view of (6) and the fact that $\mathbb{E}\xi_{j}=0$ , a natural estimator of $\lambda$ is given by

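$$\widehat{\lambda}_{k}=\sum_{0\leq|j|\leq k}\frac{\widehat{[\ell]}_{j}}{\widehat{[f]}_{j}}\mathds{1}_{\Omega_{j}}\mathbf{e}_{j}\qquad(7)$$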

with $\widehat{[\ell]}_{j}$ as defined in (5), $\widehat{[f]}_{j}\vcentcolon=\frac{1}{m}\sum_{i=1}^{m}\mathbf{e}_{j}(-Y_{i})$ and $\Omega_{j}\vcentcolon=\{|\widehat{[f]}_{j}|^{2}\geq 1/m\}$ . Note that $[f]_{j}$ in (6) is not directly available and thus has to be estimated from the sample $Y_{1},\ldots,Y_{m}$ in (3). The additional threshold occurring in the definition of $\widehat{\lambda}_{k}$ through the indicator function over the set $\Omega_{j}$ compensates for ’too small’ absolute values of $\widehat{[f]}_{j}$ and is imposed in order to avoid unstable behaviour of the estimator. The optimal choice ${k_{n}^{\ast}}$ of the dimension parameter in the minimax framework will be determined in Section 3 and depends on the classes $\Lambda$ and $\mathcal{F}$ . The data-driven choice of the dimension parameter is discussed in Section 4.
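Putting the pieces together, the following sketch computes the coefficients of $\widehat{\lambda}_{k}$ in (7): it estimates $[f]_{j}$ from the error sample, keeps the $j$-th term only on $\Omega_{j}=\{|\widehat{[f]}_{j}|^{2}\geq 1/m\}$, and evaluates the resulting trigonometric polynomial. The function names and input format are hypothetical, and the direct summation is chosen for clarity rather than speed.

```python
import cmath
import math
from typing import Dict, List


def e_minus(index: int, t: float) -> complex:
    """e_index(-t) = exp(-2 pi i index t)."""
    return cmath.exp(complex(0.0, -2.0 * math.pi * index * t))


def series_estimator(observations: List[List[float]], errors: List[float],
                     k: int) -> Dict[int, complex]:
    """Coefficients c_j, |j| <= k, of hat-lambda_k = sum_j ell_hat_j / f_hat_j * 1{Omega_j} e_j."""
    n, m = len(observations), len(errors)
    coeffs: Dict[int, complex] = {}
    for j in range(-k, k + 1):
        ell_hat = sum(e_minus(j, y) for points in observations for y in points) / n
        f_hat = sum(e_minus(j, eps) for eps in errors) / m
        # Threshold: keep the term only on Omega_j = {|f_hat_j|^2 >= 1/m}.
        coeffs[j] = ell_hat / f_hat if abs(f_hat) ** 2 >= 1.0 / m else 0j
    return coeffs


def evaluate(coeffs: Dict[int, complex], t: float) -> float:
    """Value at t of sum_j c_j e_j(t); the imaginary part is only rounding noise."""
    return sum(c * cmath.exp(complex(0.0, 2.0 * math.pi * j * t))
               for j, c in coeffs.items()).real
```

Because $\widehat{[\ell]}_{-j}$ and $\widehat{[f]}_{-j}$ are the complex conjugates of $\widehat{[\ell]}_{j}$ and $\widehat{[f]}_{j}$, and $\Omega_{-j}=\Omega_{j}$, the estimate is real-valued; taking the real part only removes rounding noise.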

3. Minimax theory

3.1. Model assumptions

Let $\gamma=(\gamma_{j})_{j\in\mathbb{Z}}$ and $\alpha=(\alpha_{j})_{j\in\mathbb{Z}}$ be strictly positive symmetric sequences and fix $r>0,d\geq 1$ . In this section, we derive minimax rates of convergence concerning the maximum risk defined in (4) with respect to the classes

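$$\Lambda_{\gamma}^{r}\vcentcolon=\Big\{\lambda\in\mathbb{L}^{2}:\lambda\geq 0\text{ and }\sum_{j\in\mathbb{Z}}\gamma_{j}|[\lambda]_{j}|^{2}=\vcentcolon\|\lambda\|_{\gamma}^{2}\leq r\Big\}$$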

and

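$$\mathcal{F}_{\alpha}^{d}\vcentcolon=\big\{f\in\mathbb{L}^{2}:f\geq 0,\ [f]_{0}=1\text{ and }d^{-1}\leq|[f]_{j}|^{2}/\alpha_{j}\leq d\big\}$$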

of intensity functions and error densities, respectively. We now state some regularity conditions imposed on the sequences $\gamma$ and $\alpha$ .

Assumption 1.

$\gamma=(\gamma_{j})_{j\in\mathbb{Z}}$ , $\alpha=(\alpha_{j})_{j\in\mathbb{Z}}$ and $\omega=(\omega_{j})_{j\in\mathbb{Z}}$ are strictly positive symmetric sequences such that $\gamma_{0}=\omega_{0}=\alpha_{0}=1$ , $\gamma_{j}\geq 1$ for all $j\in\mathbb{Z}$ and the sequences $(\omega_{n}/\gamma_{n})_{n\in\mathbb{N}_{0}}$ and $(\alpha_{n})_{n\in\mathbb{N}_{0}}$ are both non-increasing. Finally, $\rho\vcentcolon=\sum_{j\in\mathbb{Z}}\alpha_{j}<\infty$ .

3.2. Minimax lower bounds

The following two theorems provide minimax lower bounds in terms of the sample sizes $n$ and $m$ in (3), respectively. To state our results, we put

[TABLE]

By the results of this section, $\Psi_{n}$ and $\Phi_{m}$ will turn out to be the optimal (up to constants) rates of convergence in terms of $n$ and $m$. The two terms over which the maximum is taken in the definition of $\Psi_{n}$ can be interpreted as a squared bias term and a variance term, respectively. The rate in $n$ should then be obtained by choosing the truncation value such that the maximum of these two terms is minimized. This suggests choosing the truncation parameter as

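$$k_{n}^{\ast}=\operatornamewithlimits{argmin}_{k\in\mathbb{N}_{0}}\max\Big\{\frac{\omega_{k}}{\gamma_{k}},\sum_{0\leq|j|\leq k}\frac{\omega_{j}}{n\alpha_{j}}\Big\}.\qquad(8)$$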
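Once $\omega$, $\gamma$ and $\alpha$ are fixed, $k_{n}^{\ast}$ in (8) can be found by direct search. The sketch below (the function name and the search cap `k_max` are hypothetical) returns $k_{n}^{\ast}$ together with the minimized value of the criterion, i.e. the balance of the squared bias term and the variance term discussed above. It uses the fact that the variance term is non-decreasing in $k$ to stop early. For the (pol)/(pol) choices of Section 3.4 with, e.g., $s=0$, $p=2$, $a=1$, the minimized value should decay roughly like $n^{-2(p-s)/(2p+2a+1)}=n^{-4/7}$, in line with Table 1; over a finite range of $n$ the empirical exponent matches only approximately.

```python
from typing import Callable, Tuple

Seq = Callable[[int], float]  # symmetric sequence, evaluated at j >= 0


def k_star(n: int, omega: Seq, gamma: Seq, alpha: Seq,
           k_max: int = 100_000) -> Tuple[int, float]:
    """Return (k, value) minimising
    max{omega_k / gamma_k, sum_{0 <= |j| <= k} omega_j / (n alpha_j)} over 0 <= k <= k_max.

    The sum is accumulated incrementally: j = 0 once, then j and -j for k >= 1
    (using symmetry). As the sum is non-decreasing in k, the search stops once
    it alone reaches the best value found so far.
    """
    best_k, best_val = 0, float("inf")
    variance = 0.0
    for k in range(k_max + 1):
        if k == 0:
            variance += omega(0) / (n * alpha(0))
        else:
            variance += 2.0 * omega(k) / (n * alpha(k))
        if variance >= best_val:
            break
        val = max(omega(k) / gamma(k), variance)
        if val < best_val:
            best_k, best_val = k, val
    return best_k, best_val
```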

Our first theorem establishes a lower bound in terms of $n$ .

Theorem 1.

Let Assumption 1 hold, and further assume that

(C1) $\Gamma\vcentcolon=\sum_{j\in\mathbb{Z}}\gamma_{j}^{-1}<\infty$, and

(C2) $0<\eta^{-1}=\inf_{n\in\mathbb{N}}\Psi_{n}^{-1}\cdot\min\big\{\frac{\omega_{k_{n}^{\ast}}}{\gamma_{k_{n}^{\ast}}},\sum_{0\leq|j|\leq k_{n}^{\ast}}\frac{\omega_{j}}{n\alpha_{j}}\big\}$ for some $1\leq\eta<\infty$.

Then, for any $n\in\mathbb{N}$ ,

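$$\inf_{\widetilde{\lambda}}\sup_{\lambda\in\Lambda_{\gamma}^{r}}\sup_{f\in\mathcal{F}_{\alpha}^{d}}\mathbb{E}[\|\widetilde{\lambda}-\lambda\|_{\omega}^{2}]\geq\frac{\zeta r}{16\eta}\cdot\Psi_{n},$$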

where $\zeta=\min\{\frac{1}{2\Gamma d\eta},\frac{2\delta}{d\sqrt{r}}\}$ with $\delta=\frac{1}{2}-\frac{1}{2\sqrt{2}}$ , and the infimum is taken over all estimators $\widetilde{\lambda}$ of $\lambda$ based on the observations from (3).

As the proof of Theorem 1 shows, the lower bound $\Psi_{n}$, which does not depend on the size $m$ of the auxiliary sample from the error density, also holds in the case of a known error density. Any deterioration of the overall rate of convergence relative to this case is caused by the uncertainty concerning the error density $f$. Since this uncertainty is quantified by the sample size $m$, one would expect the lower bound to depend on $m$ as well. This intuition is made rigorous by the following theorem.

Theorem 2.

Let Assumption 1 hold, and in addition assume that

(C3)

there exists a density $f$ in $\mathcal{F}_{\alpha}^{\sqrt{d}}$ with $f\geq 1/2$ .

Then, for any $m\in\mathbb{N}$ ,

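$$\inf_{\widetilde{\lambda}}\sup_{\lambda\in\Lambda_{\gamma}^{r}}\sup_{f\in\mathcal{F}_{\alpha}^{d}}\mathbb{E}[\|\widetilde{\lambda}-\lambda\|_{\omega}^{2}]\geq\frac{1-\sqrt{3}/2}{8}\cdot\zeta^{2}rd^{-1/2}\cdot\Phi_{m},$$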

where $\zeta=\min\{1/(4\sqrt{d}),1-d^{-1/4}\}$ and the infimum is taken over all estimators $\widetilde{\lambda}$ of $\lambda$ based on the observations from (3).

The next corollary is an immediate consequence of Theorems 1 and 2.

Corollary 3.

Under the assumptions of Theorems 1 and 2, for any $n,m\in\mathbb{N}$ ,

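$$\inf_{\widetilde{\lambda}}\sup_{\lambda\in\Lambda_{\gamma}^{r}}\sup_{f\in\mathcal{F}_{\alpha}^{d}}\mathbb{E}[\|\widetilde{\lambda}-\lambda\|_{\omega}^{2}]\geq\frac{1}{2}\Big\{\frac{\zeta r}{16\eta}\cdot\Psi_{n}+\frac{1-\sqrt{3}/2}{8}\cdot\zeta^{2}rd^{-1/2}\cdot\Phi_{m}\Big\}.$$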

Note that the contributions of the sample sizes $n$ and $m$ to the overall lower bound are separated from one another, and the rate is determined by the maximum of $\Psi_{n}$ and $\Phi_{m}$. This phenomenon has already been observed in the related problem of density estimation [Joh09, CL11, JS13] and in other inverse problems with unknown operator [Del+12, JS13a]. In addition, it can be seen directly from the definitions of $\Psi_{n}$ and $\Phi_{m}$ that the rate in terms of $m$ is never slower than the one in $n$. Hence, as long as $m\geq n$, there is no deterioration of the rate in comparison to the setup with known error density (see Table 1 for a more detailed evaluation of the rates in some special cases).

3.3. Upper bound

Let us now establish an upper bound for the maximum risk in terms of $n$ and $m$ for the estimator $\widehat{\lambda}_{k}$ in (7) under a suitable choice of the dimension parameter $k$. More precisely, the following theorem establishes an upper bound for the rate of convergence of $\widehat{\lambda}_{k_{n}^{\ast}}$ with $k_{n}^{\ast}$ defined in Equation (8). Together with the lower bounds of the preceding subsection, this shows that $\widehat{\lambda}_{k_{n}^{\ast}}$ attains the minimax rates of convergence in terms of the sample sizes $n$ and $m$. Note that this rate optimal choice $k_{n}^{\ast}$ of the dimension parameter does not depend on the sample size $m$ (recall Equation (8) for its definition, and note that none of the quantities appearing there depends on $m$). That the rate-optimal dimension parameter does not depend on $m$ has also been observed in the related model of circular density deconvolution with unknown error density considered in [JS13].

Theorem 4.

Let Assumption 1 hold. Then, for any $n,m\in\mathbb{N}$ ,

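$$\sup_{\lambda\in\Lambda_{\gamma}^{r}}\sup_{f\in\mathcal{F}_{\alpha}^{d}}\mathbb{E}[\|\widehat{\lambda}_{k_{n}^{\ast}}-\lambda\|_{\omega}^{2}]\lesssim\Psi_{n}+\Phi_{m}.$$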

3.4. Examples of convergence rates

Fixing $\omega_{0}=1$ , $\omega_{j}=|j|^{2s}$ for $j\neq 0$ and some $s\geq 0$ , we consider specific choices of the sequences $\gamma$ and $\alpha$ and state the resulting rates with respect to both sample sizes $n$ and $m$ .

Choices for the sequence $\gamma$ :

•

(pol): $\gamma_{0}=1$ and $\gamma_{j}=|j|^{2p}$ for all $j\neq 0$ and some $p\geq 0$. This corresponds to the case when the unknown intensity function belongs to some Sobolev space.

•

(exp): $\gamma_{j}=\exp(2p|j|)$ for all $j\in\mathbb{Z}$ and some $p\geq 0$ . In this case, $\lambda$ belongs to some space of analytic functions.

Choices for the sequence $\alpha$ :

•

(pol): $\alpha_{0}=1$ and $\alpha_{j}=|j|^{-2a}$ for all $j\neq 0$ and some $a\geq 0$. This corresponds to the case when the error density is ordinary smooth.

•

(exp): $\alpha_{j}=\exp(-2a|j|)$ for all $j\in\mathbb{Z}$ and some $a\geq 0$. This corresponds to the case when the error density is super smooth.

Table 1 summarizes the rates $\Psi_{n}$ and $\Phi_{m}$ for the different choices of $\gamma$ and $\alpha$ . The rates in terms of $n$ coincide formally with the classical rates for nonparametric inverse problems (see [Fan91, Lac06], for instance). The rates in $m$ are of the same order as those that have already been obtained in the related model of (circular) density deconvolution with unknown error density in [Joh09, CL11, JS13]. They can also be compared with the rates in the indirect Gaussian sequence model with partially known operator [JS13a], which provides a benchmark model for a variety of nonparametric inverse problems.

4. Adaptive estimation

The estimator considered in Theorem 4 is obtained by specializing the estimator in (7) with the truncation parameter ${k_{n}^{\ast}}$ . This procedure suffers from the apparent drawback that the resulting estimator depends on the knowledge of the classes $\Lambda_{\gamma}^{r}$ and $\mathcal{F}_{\alpha}^{d}$ . In this section, we provide adaptive choices of the truncation parameter based on model selection (see [BBM99, Mas07] for comprehensive presentations in the context of nonparametric estimation). The principal idea of model selection procedures consists in defining a truncation parameter $\widehat{k}$ in a fully data-driven way as the minimizer of a penalized empirical contrast,

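$$\widehat{k}\vcentcolon=\operatornamewithlimits{argmin}_{k\in\mathcal{M}_{n}}\big\{\Upsilon_{n}(\widehat{\lambda}_{k})+\textsc{pen}_{k}\big\},$$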

where $\Upsilon_{n}:\mathcal{S}_{n}\to\mathbb{R}$ is a contrast function with $\mathcal{S}_{n}$ being the linear subspace of $\mathbb{L}^{2}$ spanned by the functions $\mathbf{e}_{j}(\cdot)$ for $j\in\{-n,\ldots,n\}$, $\textsc{pen}_{k}$ is a penalty, non-decreasing in $k$, that mimics the variance, and $\mathcal{M}_{n}$ is a set of admissible values of $k$. Each $k\in\mathcal{M}_{n}$ represents an admissible model, namely the finite-dimensional space spanned by the basis functions $\mathbf{e}_{j}(\cdot)$ with $j\in\{-k,\ldots,k\}$.

In order to construct an adaptive estimator which does not require any a priori knowledge of $\Lambda_{\gamma}^{r}$ and $\mathcal{F}_{\alpha}^{d}$, we proceed in two steps. In the first step, we assume that $\Lambda_{\gamma}^{r}$ is unknown but $\mathcal{F}_{\alpha}^{d}$ is known. Hence, the overall estimation procedure (in particular, the definition of the penalty term) may still depend on the knowledge of the sequence $\alpha=(\alpha_{j})_{j\in\mathbb{Z}}$. This results in a partially adaptive definition $\widetilde{k}$ of the truncation parameter. In the second step, we dispense with any knowledge of the classes $\Lambda_{\gamma}^{r}$ and $\mathcal{F}_{\alpha}^{d}$ and propose a fully data-driven choice $\widehat{k}$ of the truncation parameter.

4.1. Partially adaptive estimation ( $\Lambda_{\gamma}^{r}$ unknown, $\mathcal{F}_{\alpha}^{d}$ known)

For the definition of our partially adaptive choice of the dimension parameter we introduce some notation: for any $k\in\mathbb{N}_{0}$ , let

[TABLE]

For all $n,m\in\mathbb{N}$ , setting $\omega_{j}^{+}=\max_{0\leq i\leq j}\omega_{i}$ , we define

[TABLE]

and set $K_{nm}^{\alpha}=N_{n}^{\alpha}\wedge M_{m}^{\alpha}$ . Now, for $t\in\mathbb{L}^{2}$ , define the contrast $\Upsilon(t)=\|t\|_{\omega}^{2}-2\Re\langle\widehat{\lambda}_{n\wedge m},t\rangle_{\omega}$ and the random sequence of penalties $(\widetilde{\textsc{pen}}_{k})_{k\in\mathbb{N}_{0}}$ via

[TABLE]

Building on our definition of contrast and penalty, we define the partially adaptive selection of the dimension parameter $k$ as

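$$\widetilde{k}=\operatornamewithlimits{argmin}_{0\leq k\leq K_{nm}^{\alpha}}\left\{\Upsilon(\widehat{\lambda}_{k})+\widetilde{\textsc{pen}}_{k}\right\}.$$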
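The selection rule, which minimizes the penalized contrast $\Upsilon(\widehat{\lambda}_{k})+\widetilde{\textsc{pen}}_{k}$ over the admissible dimensions, can be illustrated by the following Python sketch. It assumes (i) the weighted inner product $\langle g,h\rangle_{\omega}=\sum_{j}\omega_{j}[g]_{j}\overline{[h]_{j}}$, under which $\Upsilon(\widehat{\lambda}_{k})=-\|\widehat{\lambda}_{k}\|_{\omega}^{2}$ (the identity $\Upsilon(t)=\|t-\widehat{\lambda}_{k}\|_{\omega}^{2}-\|\widehat{\lambda}_{k}\|_{\omega}^{2}$ from the proof of Theorem 5 evaluated at $t=\widehat{\lambda}_{k}$), and (ii) that the penalty values are supplied by the caller, because the exact form of $\widetilde{\textsc{pen}}_{k}$ is given in a display not reproduced here. Coefficients are stored as a dictionary indexed by $j$; the helper names are hypothetical.

```python
from typing import Callable, Dict, Sequence

Coeffs = Dict[int, complex]


def weighted_inner(g: Coeffs, h: Coeffs, omega: Callable[[int], float]) -> complex:
    """Assumed <g, h>_omega = sum_j omega_j [g]_j conj([h]_j)."""
    return sum(omega(j) * g[j] * h.get(j, 0j).conjugate() for j in g)


def truncate(coeffs: Coeffs, k: int) -> Coeffs:
    """Projection onto S_k: keep only the coefficients with |j| <= k."""
    return {j: c for j, c in coeffs.items() if abs(j) <= k}


def contrast(t: Coeffs, full: Coeffs, omega: Callable[[int], float]) -> float:
    """Upsilon(t) = ||t||_omega^2 - 2 Re <full, t>_omega, with full = hat_lambda_{n ^ m}."""
    return weighted_inner(t, t, omega).real - 2 * weighted_inner(full, t, omega).real


def select_dimension(full: Coeffs, omega: Callable[[int], float],
                     penalties: Sequence[float], k_max: int) -> int:
    """Smallest minimiser of Upsilon(hat_lambda_k) + pen_k over 0 <= k <= k_max."""
    criteria = [contrast(truncate(full, k), full, omega) + penalties[k]
                for k in range(k_max + 1)]
    return min(range(k_max + 1), key=criteria.__getitem__)
```

Since $\Upsilon(\widehat{\lambda}_{k})$ decreases in $k$, a vanishing penalty always selects the largest admissible dimension; the penalty is what stops the selection once the gain in contrast no longer pays for the additional variance.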

Theorem 5.

Let Assumption 1 hold. Then, for any $n,m\in\mathbb{N}$ ,

[TABLE]

4.2. Fully adaptive estimation ( $\Lambda_{\gamma}^{r}$ and $\mathcal{F}_{\alpha}^{d}$ unknown)

We now also dispense with the knowledge of the smoothness of the error density and propose a fully data-driven selection ${\widehat{k}}$ of the dimension parameter. As in the case of partially adaptive estimation, we have to introduce some notation first. For $k\in\mathbb{N}_{0}$ , let

[TABLE]

For $n,m\in\mathbb{N}$ , set

[TABLE]

and ${\widehat{K}_{nm}}=\widehat{N}_{n}\wedge\widehat{M}_{m}$. We consider the same contrast function as in the partially adaptive case, but now define the random sequence $(\widehat{\textsc{pen}}_{k})_{k\in\mathbb{N}_{0}}$ of penalties by

[TABLE]

which no longer depends on $\alpha$ or $d$. Finally, set

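$$\widehat{k}=\operatornamewithlimits{argmin}_{0\leq k\leq\widehat{K}_{nm}}\left\{\Upsilon(\widehat{\lambda}_{k})+\widehat{\textsc{pen}}_{k}\right\}.$$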

In order to state and prove the upper risk bound of the estimator $\widehat{\lambda}_{\widehat{k}}$ , we have to introduce some further notation. We keep the definition of $\Delta_{k}^{\alpha}$ from Subsection 4.1 but slightly redefine $\delta_{k}^{\alpha}$ as

[TABLE]

For $k\in\mathbb{N}_{0}$ , we also define

[TABLE]

which can be regarded as analogues of $\Delta_{k}^{\alpha}$ and $\delta_{k}^{\alpha}$ in Subsection 4.1 in the case of a known error density $f$ . Finally, for $n,m\in\mathbb{N}$ , define

[TABLE]

and set ${K_{nm}^{\alpha-}}={N_{n}^{\alpha-}}\wedge{M_{m}^{\alpha-}}$ and ${K_{nm}^{\alpha+}}={N_{n}^{\alpha+}}\wedge{M_{m}^{\alpha+}}$. In contrast to Theorem 5, proving an upper risk bound for $\widehat{\lambda}_{{\widehat{k}}}$ requires an additional assumption.

Assumption 2.

$\exp(-m\alpha_{{M_{m}^{\alpha+}}+1}/(128d))\leq C(\alpha,d)m^{-5}$ for all $m\in\mathbb{N}$.

Theorem 6.

Let Assumptions 1 and 2 hold. Then, for any $n,m\in\mathbb{N}$ ,

[TABLE]

Compared with Theorem 5, the only additional prerequisite of Theorem 6 is Assumption 2.

4.3. Examples of convergence rates (continued from Subsection 3.4)

We consider the same configurations for the sequences $\omega$ , $\gamma$ and $\alpha$ as in Subsection 3.4. In particular, we assume that $\omega_{0}=1$ and $\omega_{j}=|j|^{2s}$ for all $j\neq 0$ . The different configurations for $\gamma$ and $\alpha$ will be investigated in the following (compare also with the minimax rates of convergence given in Table 1). Note that the additional Assumption 2 is satisfied in all the considered cases. Let us define ${k_{n}^{\diamond}}=\operatornamewithlimits{argmin}_{k\in\mathbb{N}_{0}}\max\left\{\omega_{k}/\gamma_{k},\delta_{k}^{\alpha}/n\right\}$ , that is, ${k_{n}^{\diamond}}$ realizes the best compromise between squared bias and penalty.

Scenario (pol)-(pol):

In this scenario, ${k_{n}^{\diamond}}\asymp n^{1/(2p+2a+1)}$ and ${N_{n}^{\alpha-}}\asymp(n/\log n)^{1/(2s+2a+1)}$. First assume that ${N_{n}^{\alpha-}}\leq{M_{m}^{\alpha-}}$. If $s<p$, the rate with respect to $n$ is $n^{-2(p-s)/(2p+2a+1)}$, which is the minimax optimal rate. If $s=p$, then ${N_{n}^{\alpha-}}\lnsim{k_{n}^{\diamond}}$ and the rate is $(n/\log n)^{-2(p-s)/(2p+2a+1)}$, which is minimax optimal up to a logarithmic factor. Assume now that ${M_{m}^{\alpha-}}\leq{N_{n}^{\alpha-}}$. If ${k_{n}^{\diamond}}\lesssim{M_{m}^{\alpha-}}$, the estimator attains the optimal rate with respect to $n$. Otherwise, ${M_{m}^{\alpha-}}\asymp(m/\log m)^{1/(2a)}$ yields the contribution $(m/\log m)^{-(p-s)/a}$ to the rate.
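The order $k_{n}^{\diamond}\asymp n^{1/(2p+2a+1)}$ comes from balancing $k^{2s-2p}$ against $k^{2s+2a+1}/n$, and it can be checked numerically. The Python sketch below assumes the (pol) sequences $\gamma_{j}=|j|^{2p}$ and $\alpha_{j}=|j|^{-2a}$ for $j\neq 0$ with $\gamma_{0}=\alpha_{0}=1$, and, since the display defining $\delta_{k}^{\alpha}$ is not reproduced in this section, it replaces $\delta_{k}^{\alpha}$ by the hypothetical proxy $\sum_{|j|\leq k}\omega_{j}/\alpha_{j}$, which ignores logarithmic factors. It is a sanity check of the balancing argument, not the exact $k_{n}^{\diamond}$.

```python
def k_diamond(n: float, s: float, p: float, a: float, k_max: int = 10_000) -> int:
    """Brute-force argmin over k of max{omega_k / gamma_k, delta_k / n}, (pol)-(pol) case.

    Assumptions: omega_j = |j|^{2s}, gamma_j = |j|^{2p}, alpha_j = |j|^{-2a} for j != 0,
    all equal to 1 at j = 0, and the hypothetical proxy delta_k = sum_{|j|<=k} omega_j / alpha_j.
    """
    def omega(j: int) -> float:
        return 1.0 if j == 0 else abs(j) ** (2 * s)

    def gamma(j: int) -> float:
        return 1.0 if j == 0 else abs(j) ** (2 * p)

    def alpha(j: int) -> float:
        return 1.0 if j == 0 else abs(j) ** (-2 * a)

    best_k, best_val, delta = 0, float("inf"), 0.0
    for k in range(k_max + 1):
        # add the |j| = k terms of the proxy: one term for k = 0, two (j = -k, k) otherwise
        delta += (1 if k == 0 else 2) * omega(k) / alpha(k)
        if delta / n >= best_val:
            break  # delta / n only grows, so no larger k can improve the criterion
        val = max(omega(k) / gamma(k), delta / n)
        if val < best_val:
            best_k, best_val = k, val
    return best_k
```

For example, with $s=1/2$, $p=2$ and $a=1$ the exponent is $1/7$, and the brute-force minimizer stays within a constant factor of $n^{1/7}$ as $n$ grows.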

Scenario (exp)-(pol):

As in scenario (pol)-(pol), ${N_{n}^{\alpha-}}\asymp(n/\log n)^{1/(2a+2s+1)}$. Since ${k_{n}^{\diamond}}\asymp\log n$, we have ${k_{n}^{\diamond}}\lesssim{N_{n}^{\alpha-}}$, and the optimal rate with respect to $n$ is attained if ${k_{n}^{\diamond}}\lesssim{M_{m}^{\alpha-}}$. Otherwise, the bias-penalty trade-off generates the contribution $({M_{m}^{\alpha-}})^{2s}\cdot\exp(-2p\cdot{M_{m}^{\alpha-}})$ to the rate.

Scenario (pol)-(exp):

Here ${k_{n}^{\diamond}}\asymp{N_{n}^{\alpha-}}$, and again the sample size $n$ is no obstacle to attaining the optimal rate of convergence. If ${k_{n}^{\diamond}}\lesssim{M_{m}^{\alpha-}}$, the optimal rate is attained as well. If ${M_{m}^{\alpha-}}\lnsim{k_{n}^{\diamond}}$, we get the rate $(\log m)^{-2(p-s)}$, which coincides with the optimal rate with respect to the sample size $m$.

Scenario (exp)-(exp):

We have ${N_{n}^{\alpha-}}\asymp\log n$ and $k_{1}\leq{k_{n}^{\diamond}}\leq k_{2}$, where $k_{1}$ is the solution of $k_{1}^{2}\exp((2a+2p)k_{1})\asymp n$ and $k_{2}$ the solution of $\exp((2a+2p)k_{2})\asymp n$. Thus, ${k_{n}^{\diamond}}\lnsim{N_{n}^{\alpha-}}$, and computing $\omega_{k_{1}}/\gamma_{k_{1}}$ and $\delta_{k_{2}}^{\alpha}/n$ shows that at most a logarithmic factor is lost as long as ${k_{n}^{\diamond}}\leq{N_{n}^{\alpha-}}\wedge{M_{m}^{\alpha-}}$. If ${M_{m}^{\alpha-}}\leq{k_{n}^{\diamond}}$, the contribution to the rate arising from the trade-off between squared bias and penalty is determined by $({M_{m}^{\alpha-}})^{2s}\cdot\exp(-2p{M_{m}^{\alpha-}})$, which deteriorates the optimal rate with respect to $m$ by at most a logarithmic factor.
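The claim that only a logarithmic factor is lost can be made concrete. Taking the defining relations for $k_{1}$ and $k_{2}$ with equality (an assumption made for illustration) and writing $c=2a+2p$, we get $k_{2}=\log(n)/c$ and $ck_{1}+2\log k_{1}=\log n$, hence $k_{2}-k_{1}=2\log(k_{1})/c$. Assuming the (exp) choices $\omega_{k}=k^{2s}$ and $\gamma_{k}=e^{2pk}$ for $k\geq 1$, the bias at $k_{1}$ exceeds $n^{-2p/(2a+2p)}=e^{-2pk_{2}}$ by the factor $k_{1}^{2s}e^{2p(k_{2}-k_{1})}=k_{1}^{2s+4p/c}$, a power of $\log n$. The Python sketch below solves for $k_{1}$ by bisection (treating $k$ as continuous) and checks both identities numerically; the function names are hypothetical.

```python
import math
from typing import Tuple


def solve_k1_k2(n: float, a: float, p: float, tol: float = 1e-12) -> Tuple[float, float]:
    """Continuous solutions of k^2 exp(c k) = n (k1) and exp(c k) = n (k2), c = 2a + 2p."""
    c = 2 * a + 2 * p
    log_n = math.log(n)
    if log_n <= c:
        raise ValueError("need log(n) > 2a + 2p so that k2 > 1")
    k2 = log_n / c
    # g(k) = 2 log k + c k - log n is increasing, with g(1e-9) < 0 < g(k2) = 2 log k2.
    lo, hi = 1e-9, k2
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if 2 * math.log(mid) + c * mid < log_n:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi), k2


def bias_inflation(n: float, a: float, p: float, s: float) -> float:
    """Ratio of the assumed bias k1^{2s} exp(-2 p k1) to the rate n^{-2p/(2a+2p)}."""
    k1, _ = solve_k1_k2(n, a, p)
    return k1 ** (2 * s) * math.exp(-2 * p * k1) * n ** (2 * p / (2 * a + 2 * p))
```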

Appendix A Proofs of Section 3

A.1. Proof of Theorem 1

Let us define $\zeta$ as in the statement of the theorem and for each $\theta=(\theta_{j})_{0\leq j\leq k_{n}^{*}}\in\{\pm 1\}^{k_{n}^{*}+1}$ the function $\lambda_{\theta}$ through

[TABLE]

By definition, each $\lambda_{\theta}$ is a real-valued function, and it is non-negative since

[TABLE]

Moreover $\left\lVert\lambda_{\theta}\right\rVert_{\gamma}^{2}\leq r$ holds for each $\theta\in\{\pm 1\}^{{k_{n}^{\ast}}+1}$ due to the estimate

[TABLE]

This estimate and the non-negativity of $\lambda_{\theta}$ together imply $\lambda_{\theta}\in\Lambda_{\gamma}^{r}$ for all $\theta\in\{\pm 1\}^{k_{n}^{*}+1}$ . From now on let $f\in\mathcal{F}_{\alpha}^{d}$ be fixed and let $\mathbb{P}_{\theta}$ denote the joint distribution of the i.i.d. samples $N_{1},\ldots,N_{n}$ and $Y_{1},\ldots,Y_{m}$ when the true parameters are $\lambda_{\theta}$ and $f$ , respectively. Let $\mathbb{P}_{\theta}^{N_{i}}$ denote the corresponding one-dimensional marginal distributions and $\mathbb{E}_{\theta}$ the expectation with respect to $\mathbb{P}_{\theta}$ . Let $\widetilde{\lambda}$ be an arbitrary estimator of $\lambda$ . The key argument of the proof is the following reduction scheme:

[TABLE]

where for $\theta\in\{\pm 1\}^{k_{n}^{*}+1}$ and $j\in\{-{k_{n}^{\ast}},\ldots,{k_{n}^{\ast}}\}$ the element $\theta^{(|j|)}\in\{\pm 1\}^{k_{n}^{*}+1}$ is defined by $\theta^{(|j|)}_{k}=\theta_{k}$ for $k\neq|j|$ and $\theta^{(|j|)}_{|j|}=-\theta_{|j|}$ . Consider the Hellinger affinity $\rho(\mathbb{P}_{\theta},\mathbb{P}_{\theta^{(|j|)}})\vcentcolon=\int\sqrt{d\mathbb{P}_{\theta}d\mathbb{P}_{\theta^{(|j|)}}}$ . For an arbitrary estimator $\widetilde{\lambda}$ of $\lambda$ we have

[TABLE]

from which we conclude by means of the elementary inequality $(a+b)^{2}\leq 2a^{2}+2b^{2}$ that

[TABLE]

Define the Hellinger distance between two probability measures $\mathbb{P}$ and $\mathbb{Q}$ as $H(\mathbb{P},\mathbb{Q})=(\int[\sqrt{d\mathbb{P}}-\sqrt{d\mathbb{Q}}]^{2})^{1/2}$ and, analogously, the Hellinger distance between two finite measures $\nu$ and $\mu$ (not necessarily of total mass one) by $H(\nu,\mu)=(\int[\sqrt{d\nu}-\sqrt{d\mu}]^{2})^{1/2}$ (as usual, the integral is taken with respect to any measure dominating both $\nu$ and $\mu$). Let $\nu_{\theta}$ denote the intensity measure of a Poisson point process $N$ on $[0,1)$ whose Radon-Nikodym derivative with respect to the Lebesgue measure is given by $\ell_{\theta}=\lambda_{\theta}\star f$. Note that $\ell_{\theta}\geq\delta\sqrt{r}$ for all $\theta\in\{\pm 1\}^{k_{n}^{*}+1}$ with $\delta=\frac{1}{2}-\frac{1}{2\sqrt{2}}$ due to

[TABLE]

which can be verified analogously to the non-negativity of $\lambda_{\theta}$ shown above. We have

[TABLE]

Since the distribution of the sample $Y_{1},\ldots,Y_{m}$ does not depend on the choice of $\theta$ we obtain

[TABLE]

where the first estimate follows from Lemma 3.3.10 (i) in [Rei89] and the second one from Theorem 3.2.1 in [Rei93], which applies since each $N_{i}$ is a Poisson point process. Thus, the relation $\rho(\mathbb{P}_{\theta},\mathbb{P}_{\theta^{(|j|)}})=1-\frac{1}{2}H^{2}(\mathbb{P}_{\theta},\mathbb{P}_{\theta^{(|j|)}})$ implies $\rho(\mathbb{P}_{\theta},\mathbb{P}_{\theta^{(|j|)}})\geq\frac{1}{2}$. Finally, inserting the obtained estimates into the reduction scheme (9) leads to

[TABLE]

which finishes the proof of the theorem since $\widetilde{\lambda}$ was arbitrary. ∎
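The result borrowed from [Rei93] in this proof bounds the Hellinger distance between the laws of two Poisson point processes by the Hellinger distance between their intensity measures. A one-point toy case, used here only as an illustration, makes this concrete: if the intensity measures are $a\delta_{x}$ and $b\delta_{x}$, the counts are Poisson($a$) and Poisson($b$), the Hellinger affinity equals $\exp(-(\sqrt{a}-\sqrt{b})^{2}/2)$, and with the convention $H^{2}=\int(\sqrt{d\mathbb{P}}-\sqrt{d\mathbb{Q}})^{2}$ used above, $H^{2}=2-2\exp(-(\sqrt{a}-\sqrt{b})^{2}/2)\leq(\sqrt{a}-\sqrt{b})^{2}$ because $1-e^{-x}\leq x$. The Python sketch checks the closed form against a direct sum over the probability mass functions, as well as the relation $\rho=1-\frac{1}{2}H^{2}$; the function names are hypothetical.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """Poisson(lam) probability of k, computed in log space (requires lam > 0)."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def hellinger_sq_numeric(a: float, b: float, kmax: int = 1000) -> float:
    """H^2 = sum_k (sqrt(p_k) - sqrt(q_k))^2 for Poisson(a) vs Poisson(b), truncated at kmax."""
    return sum((math.sqrt(poisson_pmf(k, a)) - math.sqrt(poisson_pmf(k, b))) ** 2
               for k in range(kmax + 1))


def hellinger_sq_closed(a: float, b: float) -> float:
    """Closed form 2 - 2 * affinity with affinity exp(-(sqrt(a) - sqrt(b))^2 / 2)."""
    return 2.0 - 2.0 * math.exp(-((math.sqrt(a) - math.sqrt(b)) ** 2) / 2.0)


def hellinger_sq_measures(a: float, b: float) -> float:
    """H^2 between the finite measures a * delta_x and b * delta_x."""
    return (math.sqrt(a) - math.sqrt(b)) ** 2
```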

A.2. Proof of Theorem 2

By Markov’s inequality we have for an arbitrary estimator $\widetilde{\lambda}$ of $\lambda$ and $A>0$ (which will be specified below)

[TABLE]

which by reduction to two hypotheses implies

[TABLE]

where $\mathbb{P}_{\theta}$ denotes the distribution when the true parameters are $\lambda_{\theta}$ and $f_{\theta}$ . The specific hypotheses $\lambda_{1},\lambda_{-1}$ and $f_{1},f_{-1}$ will be specified below. If $\lambda_{-1}$ and $\lambda_{1}$ can be chosen such that $\left\lVert\lambda_{1}-\lambda_{-1}\right\rVert_{\omega}^{2}\geq 4A\Phi_{m}$ , application of the triangle inequality yields

[TABLE]

where $\tau^{*}$ is the minimum distance test given by $\tau^{*}=\arg\min_{\theta\in\{\pm 1\}}\|\widetilde{\lambda}-\lambda_{\theta}\|_{\omega}^{2}$ . Hence, we obtain

[TABLE]

where the infimum is taken over all $\{\pm 1\}$ -valued functions $\tau$ based on the observations. Thus, it remains to find hypotheses $\lambda_{1},\lambda_{-1}\in\Lambda_{\gamma}^{r}$ and $f_{1},f_{-1}\in\mathcal{F}_{\alpha}^{d}$ such that

[TABLE]

and which allow us to bound $p^{*}$ from below by a universal constant (independent of $m$). For this purpose, set ${k_{m}^{\ast}}=\arg\max_{j\geq 1}\{\omega_{j}/\gamma_{j}\min(1,\frac{1}{m\alpha_{j}})\}$ and $a_{m}=\zeta\min(1,m^{-1/2}\alpha_{k_{m}^{*}}^{-1/2})$, where $\zeta$ is defined as in the statement of the theorem. Note the inequalities $1/d^{1/2}=(1-(1-1/d^{1/4}))^{2}\leq(1-a_{m})^{2}\leq 1$ and $1\leq(1+a_{m})^{2}\leq(1+(1-1/d^{1/4}))^{2}=(2-1/d^{1/4})^{2}\leq d^{1/2}$, where the last step is equivalent to $(d^{1/4}-1)^{2}\geq 0$. In combination, they imply $1/d^{1/2}\leq(1+\theta a_{m})^{2}\leq d^{1/2}$ for $\theta\in\{\pm 1\}$. These inequalities will be used below without further reference. For $\theta\in\{\pm 1\}$, we define

[TABLE]

Furthermore, we have

[TABLE]

and $\left|\lambda_{\theta}(t)\right|\geq\left(\frac{r}{2}\right)^{1/2}-2\left(\frac{r}{8}\right)^{1/2}\geq 0\text{ for all }t\in[0,1)$ which together imply that $\lambda_{\theta}\in\Lambda_{\gamma}^{r}$ for $\theta\in\{\pm 1\}$ . The identity

[TABLE]

shows that the condition in (11) is satisfied with $A=\zeta^{2}r/(4\sqrt{d})$ .

Let $f\in\mathcal{F}_{\alpha}^{\sqrt{d}}$ be such that $f\geq 1/2$ (its existence is guaranteed by condition (C4)), and define for $\theta\in\{\pm 1\}$

[TABLE]

Since $k_{m}^{*}\geq 1$ we have $\int_{0}^{1}f_{\theta}(x)dx=1$ and $f_{\theta}\geq 0$ holds because of the estimate

[TABLE]

For ${|j|}\neq k_{m}^{*}$ , we have $[f]_{j}=[f_{\theta}]_{j}$ and thus trivially $1/d\leq|[f_{\theta}]_{j}|^{2}/\alpha_{j}\leq d$ for ${|j|}\neq k_{m}^{*}$ since $\mathcal{F}_{\alpha}^{\sqrt{d}}\subset\mathcal{F}_{\alpha}^{d}$ . Moreover

[TABLE]

and hence $f_{\theta}\in\mathcal{F}_{\alpha}^{d}$ for $\theta\in\{\pm 1\}$ .

To obtain a lower bound for $p^{\ast}$ defined in (10), consider the joint distribution $\mathbb{P}_{\theta}$ of the samples $N_{1},\ldots,N_{n}$ and $Y_{1},\ldots,Y_{m}$ under $\lambda_{\theta}$ and $f_{\theta}$. By construction, $\lambda_{-1}\star f_{-1}=\lambda_{1}\star f_{1}$. Hence $\mathbb{P}_{-1}^{N_{i}}=\mathbb{P}_{1}^{N_{i}}$ for all $i=1,\ldots,n$ (the distribution of a Poisson point process is determined by its intensity), and the Hellinger distance between $\mathbb{P}_{-1}$ and $\mathbb{P}_{1}$ depends only on the distribution of the sample $Y_{1},\ldots,Y_{m}$. More precisely,

[TABLE]

and we proceed by bounding $H^{2}(\mathbb{P}_{-1}^{Y_{1}},\mathbb{P}_{1}^{Y_{1}})$ from above. Recall that $f\geq 1/2$ which is used to obtain the estimate

[TABLE]

Hence we have $H^{2}(\mathbb{P}_{-1},\mathbb{P}_{1})\leq 1$ and application of statement (ii) of Theorem 2.2 in [Tsy09] with $\alpha=1$ implies $p^{\ast}\geq\frac{1}{2}(1-\sqrt{3}/2)$ which finishes the proof of the theorem. ∎
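The two numerical facts behind this last step can be checked directly. With the convention $H^{2}=\int(\sqrt{d\mathbb{P}}-\sqrt{d\mathbb{Q}})^{2}$, Theorem 2.2 (ii) in [Tsy09] gives, whenever $H^{2}\leq\alpha<2$ (here $\alpha$ is the constant of [Tsy09], not the sequence $\alpha$), the bound $p^{\ast}\geq\frac{1}{2}(1-\sqrt{\alpha(1-\alpha/4)})$, which equals $\frac{1}{2}(1-\sqrt{3}/2)\approx 0.067$ for $\alpha=1$. Moreover, since the affinity $1-\frac{1}{2}H^{2}$ is multiplicative over independent coordinates, $H^{2}(\mathbb{P}^{\otimes m},\mathbb{Q}^{\otimes m})=2-2(1-\frac{1}{2}H^{2}(\mathbb{P},\mathbb{Q}))^{m}\leq mH^{2}(\mathbb{P},\mathbb{Q})$ by Bernoulli's inequality; this is one standard way to pass from a bound on $H^{2}(\mathbb{P}_{-1}^{Y_{1}},\mathbb{P}_{1}^{Y_{1}})$ to a bound for the whole sample. A short Python sketch with hypothetical function names:

```python
import math


def tsybakov_two_point_bound(alpha: float) -> float:
    """Lower bound 0.5 * (1 - sqrt(alpha * (1 - alpha / 4))), valid when H^2 <= alpha < 2."""
    if not 0.0 <= alpha < 2.0:
        raise ValueError("alpha must lie in [0, 2)")
    return 0.5 * (1.0 - math.sqrt(alpha * (1.0 - alpha / 4.0)))


def hellinger_sq_product(h2_single: float, m: int) -> float:
    """H^2 between m-fold product measures: the affinity 1 - H^2/2 multiplies over coordinates."""
    return 2.0 - 2.0 * (1.0 - h2_single / 2.0) ** m
```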

A.3. Proof of Theorem 4

Set $\widetilde{\lambda}_{k_{n}^{\ast}}=\sum_{0\leq\left|j\right|\leq k_{n}^{\ast}}[\lambda]_{j}\mathds{1}_{\Omega_{j}}\mathbf{e}_{j}$ . The proof consists in finding appropriate upper bounds for the quantities $\square$ and $\triangle$ in the estimate

[TABLE]

Upper bound for $\square$ : Using the identity $\mathbb{E}\widehat{[\ell]}_{j}=[f]_{j}[\lambda]_{j}$ we obtain

[TABLE]

Using the estimate $\left|a\right|^{2}\leq 2\left|a-1\right|^{2}+2$ for $a=[f]_{j}/\widehat{[f]}_{j}$ , the definition of $\Omega_{j}$ and the independence of $\widehat{[\ell]}_{j}$ and $\widehat{[f]}_{j}$ we get

[TABLE]

Applying statements a) and b) from Lemma 7 together with $f\in\mathcal{F}_{\alpha}^{d}$ yields

[TABLE]

which, since $\gamma_{j}\geq 1$ by Assumption 1, implies

[TABLE]

Now consider $\square_{2}$ . Using the estimate $\left|a\right|^{2}\leq 2\left|a-1\right|^{2}+2$ for $a=[f]_{j}/\widehat{[f]}_{j}$ and the definition of $\Omega_{j}$ yields

[TABLE]

Notice that Theorem 2.10 in [Pet95] implies the existence of a constant $C>0$ with $\mathbb{E}[|\widehat{[f]}_{j}-[f]_{j}|^{4}]\leq C/m^{2}$ . Using this inequality in combination with assertion b) from Lemma 7 and $f\in\mathcal{F}_{\alpha}^{d}$ implies

[TABLE]

In addition, $\mathbb{E}[|[f]_{j}/\widehat{[f]}_{j}-1|^{2}\mathds{1}_{\Omega_{j}}]\leq m\operatorname{Var}(\widehat{[f]}_{j})\leq 1$ which in combination with (13) implies

[TABLE]

Exploiting the fact that $\lambda\in\Lambda_{\gamma}^{r}$ and the definition of $\Phi_{m}$, we obtain

[TABLE]

Putting together the estimates for $\square_{1}$ and $\square_{2}$ yields

[TABLE]
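The moment bound $\mathbb{E}[|\widehat{[f]}_{j}-[f]_{j}|^{4}]\leq C/m^{2}$ taken from [Pet95] can also be seen directly in this setting: the summands $X_{i}=\mathbf{e}_{j}(-Y_{i})-[f]_{j}$ are centred, satisfy $|X_{i}|\leq 2$ and $\mathbb{E}|X_{i}|^{2}\leq 1$, and expanding the fourth moment of their sum gives $\mathbb{E}|\sum_{i}X_{i}|^{4}\leq m\,\mathbb{E}|X_{1}|^{4}+3m(m-1)(\mathbb{E}|X_{1}|^{2})^{2}\leq 4m^{2}$, so $C=4$ works. The Monte Carlo sketch below estimates $m^{2}\,\mathbb{E}|\widehat{[f]}_{1}-[f]_{1}|^{4}$ under two assumptions: the basis $\mathbf{e}_{j}(t)=\exp(2\pi\mathrm{i}jt)$ and a hypothetical error law, uniform on $[0,1/2)$, for which $[f]_{1}=-2\mathrm{i}/\pi$.

```python
import cmath
import math
import random
from typing import Callable


def scaled_fourth_moment(m: int, j: int, true_coeff: complex,
                         sampler: Callable[[random.Random], float],
                         reps: int = 2000, seed: int = 0) -> float:
    """Monte Carlo estimate of m^2 * E|hat f_j - f_j|^4 with hat f_j = (1/m) sum e_j(-Y_i)."""
    rng = random.Random(seed)
    acc = 0.0
    for _rep in range(reps):
        est = sum(cmath.exp(-2 * math.pi * 1j * j * sampler(rng)) for _ in range(m)) / m
        acc += abs(est - true_coeff) ** 4
    return m ** 2 * acc / reps


def uniform_half(rng: random.Random) -> float:
    """Hypothetical error law: Y uniform on [0, 1/2)."""
    return 0.5 * rng.random()


F1 = -2j / math.pi  # [f]_1 for the uniform law on [0, 1/2)
```

For this law the same expansion suggests that the scaled moment approaches $2(1-4/\pi^{2})^{2}+(4/\pi^{2})^{2}\approx 0.87$ as $m$ grows, well below the crude constant $4$.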

Upper bound for $\triangle$ : $\triangle$ can be decomposed as

[TABLE]

$\lambda\in\Lambda_{\gamma}^{r}$ implies $\triangle_{1}\leq r\omega_{k_{n}^{*}}/\gamma_{k_{n}^{*}}\leq r\cdot\Psi_{n}$ and Lemma 7 yields the estimate $\triangle_{2}\leq 4dr\cdot\Phi_{m}$ which together imply $\triangle\leq r\cdot\Psi_{n}+4dr\cdot\Phi_{m}$ . Combining the derived estimates for $\square$ and $\triangle$ finishes the proof.∎

A.4. Auxiliary results for the proof of Theorem 4

Lemma 7.

The following assertions hold:

a) $\operatorname{Var}(\widehat{[\ell]}_{j})\leq[\lambda]_{0}/n$,

b) $\operatorname{Var}(\widehat{[f]}_{j})\leq 1/m$,

c) $\mathbb{P}(\Omega_{j}^{\mathsf{c}})=\mathbb{P}(|\widehat{[f]}_{j}|^{2}<1/m)\leq\min\left\{1,4d/(m\alpha_{j})\right\}\quad\forall f\in\mathcal{F}_{\alpha}^{d}$.

Proof.

The proof of statement a) is given by the identity

[TABLE]

For the proof of b), note that we have $\operatorname{Var}(\widehat{[f]}_{j})=\frac{1}{m}\,\operatorname{Var}\left(\mathbf{e}_{j}(-Y_{1})\right)$ and the assertion follows from the estimate

[TABLE]

For the proof of c), we consider two cases: if $\left|[f]_{j}\right|^{2}<4/m$ we have $1<\frac{4d}{m\alpha_{j}}$ because $f\in\mathcal{F}_{\alpha}^{d}$ and the statement is evident. Otherwise, $\left|[f]_{j}\right|^{2}\geq 4/m$ which implies

[TABLE]

Applying Chebyshev’s inequality and exploiting the definition of $\mathcal{F}_{\alpha}^{d}$ yields

[TABLE]

and statement c) follows. ∎
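The two-case argument for statement c) can be illustrated by simulation. In the second case, $|[f]_{j}|^{2}\geq 4/m$ and $|\widehat{[f]}_{j}|^{2}<1/m$ force $|\widehat{[f]}_{j}-[f]_{j}|\geq|[f]_{j}|/2$, so Chebyshev's inequality and statement b) give $\mathbb{P}(\Omega_{j}^{\mathsf{c}})\leq 4/(m|[f]_{j}|^{2})$, which becomes $4d/(m\alpha_{j})$ after using $|[f]_{j}|^{2}\geq\alpha_{j}/d$. The Python sketch compares an empirical frequency with $\min\{1,4/(m|[f]_{j}|^{2})\}$ for a hypothetical error law, uniform on $[0,1/2)$ with $|[f]_{1}|^{2}=4/\pi^{2}$, assuming the basis $\mathbf{e}_{j}(t)=\exp(2\pi\mathrm{i}jt)$; the function names are hypothetical.

```python
import cmath
import math
import random


def omega_complement_freq(m: int, j: int, reps: int = 5000, seed: int = 1) -> float:
    """Empirical frequency of {|hat f_j|^2 < 1/m} for Y uniform on [0, 1/2) (hypothetical law)."""
    rng = random.Random(seed)
    hits = 0
    for _rep in range(reps):
        est = sum(cmath.exp(-2 * math.pi * 1j * j * 0.5 * rng.random()) for _ in range(m)) / m
        hits += int(abs(est) ** 2 < 1.0 / m)
    return hits / reps


def chebyshev_bound(m: int, f_abs_sq: float) -> float:
    """min{1, 4 / (m |f_j|^2)}, the bound before replacing |f_j|^2 by alpha_j / d."""
    return min(1.0, 4.0 / (m * f_abs_sq))
```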

Appendix B Proofs of Section 4

B.1. Proof of Theorem 5

Define the events $\Xi_{1}=\{([\ell]_{0}\vee 1)/2\leq\widehat{[\ell]}_{0}\vee 1\leq 2([\ell]_{0}\vee 1)\}$ and

[TABLE]

The identity $1=\mathds{1}_{\Xi_{1}\cap\Xi_{2}}+\mathds{1}_{\Xi_{2}^{\mathsf{c}}}+\mathds{1}_{\Xi_{1}^{\mathsf{c}}\cap\Xi_{2}}$ provides the decomposition

[TABLE]

and we will establish uniform upper bounds over the ellipsoids $\Lambda_{\gamma}^{r}$ and $\mathcal{F}_{\alpha}^{d}$ for the three terms on the right-hand side separately.

Uniform upper bound for $\square_{1}$: Denote by $\mathcal{S}_{k}$ the linear subspace of $\mathbb{L}^{2}$ spanned by the functions $\mathbf{e}_{j}(\cdot)$ for $j\in\{-k,\ldots,k\}$. For $t\in\mathcal{S}_{k}$ with $k\in\{0,\ldots,n\wedge m\}$, we have $\langle\widehat{\lambda}_{n\wedge m},t\rangle_{\omega}=\langle\widehat{\lambda}_{k},t\rangle_{\omega}$, and expanding the square shows that the identity $\Upsilon(t)=\|t-\widehat{\lambda}_{k}\|_{\omega}^{2}-\|\widehat{\lambda}_{k}\|_{\omega}^{2}$ holds for all such $t$. Consequently, $\operatornamewithlimits{argmin}_{t\in\mathcal{S}_{k}}\Upsilon(t)=\widehat{\lambda}_{k}$ for all such $k$. Using this identity and the definition of ${\widetilde{k}}$ yields for all $k\in\{0,\ldots,{K_{nm}^{\alpha}}\}$ that

[TABLE]

where $\lambda_{k}=\sum_{0\leq|j|\leq k}[\lambda]_{j}\mathbf{e}_{j}$ denotes the projection of $\lambda$ on the subspace $\mathcal{S}_{k}$ . Elementary computations imply

[TABLE]

for all $k\in\{0,\ldots,{K_{nm}^{\alpha}}\}$ . In addition to $\lambda_{k}$ defined above, introduce the further abbreviations

[TABLE]

as well as $\Theta_{k}=\widehat{\lambda}_{k}-\check{\lambda}_{k}-\widetilde{\lambda}_{k}+\lambda_{k}$, $\widetilde{\Theta}_{k}=\widetilde{\lambda}_{k}-\lambda_{k}$, and $\check{\Theta}_{k}=\check{\lambda}_{k}-\lambda_{k}$. Using these abbreviations and the identity $\widehat{\lambda}_{n\wedge m}-\lambda_{n\wedge m}=\Theta_{n\wedge m}+\widetilde{\Theta}_{n\wedge m}+\check{\Theta}_{n\wedge m}$, we deduce from (14) that

[TABLE]

for all $k\in\{0,\ldots,{K_{nm}^{\alpha}}\}$ . Define $\mathcal{B}_{k}=\{\lambda\in\mathcal{S}_{k}:\left\lVert\lambda\right\rVert_{\omega}\leq 1\}$ . For every $\tau>0$ and $t\in\mathcal{S}_{k}$ , the estimate $2uv\leq\tau u^{2}+\tau^{-1}v^{2}$ implies

[TABLE]

Because $\widehat{\lambda}_{\widetilde{k}}-\lambda_{k}\in\mathcal{S}_{\widetilde{k}\vee k}$ , combining the last estimate with (B.1) we get

[TABLE]

Note that $\|\widehat{\lambda}_{\widetilde{k}}-\lambda_{k}\|_{\omega}^{2}\leq 2\|\widehat{\lambda}_{\widetilde{k}}-\lambda\|_{\omega}^{2}+2\left\lVert\lambda_{k}-\lambda\right\rVert_{\omega}^{2}$ and $\left\lVert\lambda-\lambda_{k}\right\rVert_{\omega}^{2}\leq r\omega_{k}\gamma_{k}^{-1}$ for all $\lambda\in\Lambda_{\gamma}^{r}$ since $\omega\gamma^{-1}$ is non-increasing due to Assumption 1. Specializing with $\tau=1/8$ , we obtain

[TABLE]

Combining the facts that $\mathds{1}_{\Omega_{j}}\mathds{1}_{\Xi_{2}}=\mathds{1}_{\Xi_{2}}$ for $0\leq|j|\leq M_{m}^{\alpha}$ and ${K_{nm}^{\alpha}}\leq M_{m}^{\alpha}$ by definition, we obtain for all $j\in\{-{K_{nm}^{\alpha}},\ldots,{K_{nm}^{\alpha}}\}$ the estimate

[TABLE]

Hence, $\sup_{t\in\mathcal{B}_{k}}|\langle\Theta_{n\wedge m},t\rangle_{\omega}|^{2}\mathds{1}_{\Xi_{2}}\leq\frac{1}{4}\sup_{t\in\mathcal{B}_{k}}|\langle\widetilde{\Theta}_{n\wedge m},t\rangle_{\omega}|^{2}$ for all $0\leq k\leq{K_{nm}^{\alpha}}$ . Thus, from (B.1) we obtain

[TABLE]

Exploiting the definition of both the penalty $\widetilde{\textsc{pen}}$ and the event $\Xi_{1}$ , we obtain

[TABLE]

Applying Lemma 9 with $\delta_{k}^{*}=d\delta_{k}^{\alpha}$ and $\Delta_{k}^{*}=d\Delta_{k}^{\alpha}$ yields

[TABLE]

Using Statement a) of Lemma 8 and the fact that $K_{nm}^{\alpha}\leq n$ by definition, we obtain that

[TABLE]

where the last estimate is due to the fact that $\left\lVert f\right\rVert^{2}\leq d\rho$ for all $f\in\mathcal{F}_{\alpha}^{d}$ and $\left\lVert\lambda\right\rVert^{2}\leq r$ for all $\lambda\in\Lambda_{\gamma}^{r}$ . Note that we have

[TABLE]

with a numerical constant $C$ which implies

[TABLE]

The last term in (B.1) is bounded by means of Lemma 10 which immediately yields $\mathbb{E}[\sup_{t\in\mathcal{B}_{K_{nm}^{\alpha}}}|\langle\check{\Theta}_{n\wedge m},t\rangle_{\omega}|^{2}]\lesssim\Phi_{m}$ . Combining the preceding estimates, which hold uniformly for all $\lambda\in\Lambda_{\gamma}^{r}$ and $f\in\mathcal{F}_{\alpha}^{d}$ , we conclude from Equation (B.1) that

[TABLE]

Uniform upper bound for $\square_{2}$ : Define $\breve{\lambda}_{k}=\sum_{0\leq\left|j\right|\leq k}[\lambda]_{j}\mathds{1}_{\Omega_{j}}\mathbf{e}_{j}$ . Note that $\|\widehat{\lambda}_{k}-\breve{\lambda}_{k}\|_{\omega}^{2}\leq\|\widehat{\lambda}_{k^{\prime}}-\breve{\lambda}_{k^{\prime}}\|_{\omega}^{2}$ for $k\leq k^{\prime}$ and $\|\breve{\lambda}_{k}-\lambda\|_{\omega}^{2}\leq\|\lambda\|_{\omega}^{2}$ for all $k\in\mathbb{N}_{0}$ . Consequently, since $k\in\{0,\ldots,{K_{nm}^{\alpha}}\}$ , we obtain the estimate

[TABLE]

and due to Assumption 1 and Lemma 12 it is easily seen that $\|\lambda\|^{2}_{\omega}\cdot\mathbb{P}(\Xi_{2}^{\mathsf{c}})\lesssim m^{-4}$ . Using the definition of $\Omega_{j}$ , we further obtain

[TABLE]

where the last estimate follows by applying Theorem 2.10 from [Pet95] with $p=4$ two times. If ${K_{nm}^{\alpha}}=0$ , Lemma 12 implies

[TABLE]

Otherwise, if ${K_{nm}^{\alpha}}>0$ , we exploit $\omega_{j}\leq\omega_{j}^{+}\alpha_{j}^{-1}$ , ${K_{nm}^{\alpha}}\leq N_{n}^{\alpha}$ and the definition of $N_{n}^{\alpha}$ to bound the first term in (18). The second term in (18) can be bounded from above by noting that $\omega_{j}\leq\gamma_{j}$ thanks to Assumption 1, and we obtain

[TABLE]

Thanks to the logarithmic increase of the harmonic series, $N_{n}^{\alpha}\leq n$ and Lemma 12, the last estimate implies

[TABLE]

if ${K_{nm}^{\alpha}}>0$ , and thus $\mathbb{E}[\|\widehat{\lambda}_{{K_{nm}^{\alpha}}}-\breve{\lambda}_{{K_{nm}^{\alpha}}}\|_{\omega}^{2}\,\mathds{1}_{\Xi_{2}^{\mathsf{c}}}]\lesssim\frac{1}{m}+\frac{1}{m^{2}}$ , independent of the actual value of ${K_{nm}^{\alpha}}$ . Using the obtained estimates, we conclude

[TABLE]

Uniform upper bound for $\square_{3}$ : In order to find a uniform upper bound for $\square_{3}$ , first recall the definition $\breve{\lambda}_{k}=\sum_{0\leq{|j|}\leq k}[\lambda]_{j}\mathds{1}_{\Omega_{j}}\mathbf{e}_{j}$ and consider the estimate

[TABLE]

Using the estimate $\|\breve{\lambda}_{\widetilde{k}}-\lambda\|_{\omega}^{2}\leq\|\lambda\|_{\omega}^{2}$ , we obtain for $\lambda\in\Lambda_{\gamma}^{r}$ by means of Lemma 11 that

[TABLE]

which controls the second term on the right-hand side of (19). We now bound the first term on the right-hand side of (19). If ${K_{nm}^{\alpha}}=0$ , we have ${\widetilde{k}}=0$ , and by means of the Cauchy-Schwarz inequality and Theorem 2.10 from [Pet95] it is easily seen that

[TABLE]

Otherwise, ${K_{nm}^{\alpha}}>0$ , and we need the following further estimate, which is easily verified:

[TABLE]

We start by bounding the first term on the right-hand side of (B.1). Using the definition of $\Xi_{2}$ and $\omega_{j}\leq\gamma_{j}$ , we obtain for all $\lambda\in\Lambda_{\gamma}^{r}$ that

[TABLE]

Since $|[f]_{j}|^{-2}\leq d\alpha_{j}$ for $f\in\mathcal{F}_{\alpha}^{d}$ , the Cauchy-Schwarz inequality in combination with Theorem 2.10 from [Pet95] implies for the second term on the right-hand side of (B.1) that

[TABLE]

We exploit the definition of $N_{n}^{\alpha}$ together with ${K_{nm}^{\alpha}}\leq N_{n}^{\alpha}$ to obtain

[TABLE]

from which by the logarithmic increase of the harmonic series and Lemma 11 we conclude that

[TABLE]

independent of the actual value of ${K_{nm}^{\alpha}}$. Finally, the third and last term on the right-hand side of (B.1) can be bounded from above in the same way after exploiting the definition of $\Xi_{2}$, and we obtain

[TABLE]

Putting together the derived estimates, we obtain

[TABLE]

The statement of the theorem follows by combining the upper bounds for $\square_{1}$ , $\square_{2}$ , and $\square_{3}$ .∎

B.2. Proof of Theorem 6

Consider the event

[TABLE]

in addition to the event $\Xi_{1}$ introduced in the proof of Theorem 5 and the slightly redefined event $\Xi_{2}$ defined as

[TABLE]

Defining $\Xi=\Xi_{1}\cap\Xi_{2}\cap\Xi_{3}$ , the identity $1=\mathds{1}_{\Xi}+\mathds{1}_{\Xi_{2}^{\mathsf{c}}}+\mathds{1}_{\Xi_{1}^{\mathsf{c}}\cap\Xi_{2}}+\mathds{1}_{\Xi_{1}\cap\Xi_{2}\cap\Xi_{3}^{\mathsf{c}}}$ motivates the decomposition

[TABLE]

and we establish uniform upper risk bounds for the four terms on the right-hand side separately.

Uniform upper bound for $\square_{1}$ : On $\Xi$ we have the estimate $\frac{1}{4}\Delta_{k}\leq\widehat{\Delta}_{k}\leq\frac{9}{4}\Delta_{k}$ , and thus

[TABLE]

for all $k\in\{0,\ldots,{M_{m}^{\alpha+}}\}$ . This last estimate implies

[TABLE]

from which we conclude $\frac{3}{100}\cdot\delta_{k}\leq\widehat{\delta}_{k}\leq\frac{17}{5}\cdot\delta_{k}$ . Putting $\textsc{pen}_{k}=\frac{165}{2}(\widehat{[\ell]}_{0}\vee 1)\cdot\frac{\delta_{k}}{n}$ , we observe that on $\Xi_{2}$ the estimate

[TABLE]

holds for all $k\in\{0,\ldots,{M_{m}^{\alpha+}}\}$ . Note that on $\Xi$ we have $\widehat{k}\leq{M_{m}^{\alpha+}}$ which using $\textsc{pen}_{k\vee\widehat{k}}\leq\textsc{pen}_{k}+\textsc{pen}_{\widehat{k}}$ implies

[TABLE]

Now, we can proceed by mimicking the derivation of (B.1) in the proof of Theorem 5. More precisely, replacing the penalty term $\widetilde{\textsc{pen}}_{k}$ used in that proof by $\widehat{\textsc{pen}}_{k}$ , using the definition of $\textsc{pen}_{k}$ above and (21), we obtain

[TABLE]

As in the proof of Theorem 5, the second and the third term are bounded by applying Lemma 9 (with $\delta_{k}^{*}\vcentcolon=\delta_{k}$ and $\Delta_{k}^{*}\vcentcolon=\Delta_{k}$) and Lemma 10, respectively. Hence, by means of an obvious adaptation of statement a) in Lemma 8 (with $N_{n}^{\alpha}$ replaced by ${N_{n}^{\alpha+}}$) and the estimates

[TABLE]

with $\zeta_{d}=\log(4d)/\log(4)$ , we obtain in analogy to the way of proceeding in the proof of Theorem 5 that

[TABLE]

Upper bound for $\square_{2}$ : The uniform upper bound for $\square_{2}$ can be derived in analogy to the bound for $\square_{2}$ in the proof of Theorem 5 using Assumption 2 instead of Statement b) from Lemma 8 in the proof of Lemma 12. Hence, we obtain

[TABLE]

Upper bound for $\square_{3}$ : The term $\square_{3}$ is bounded analogously to the bound established for $\square_{3}$ in the proof of Theorem 5 (here, we do not have to exploit the additional Assumption 2), and we get

[TABLE]

Upper bound for $\square_{4}$ : To find a uniform upper bound for the term $\square_{4}$ , one can use exactly the same decompositions as in the proof of the uniform upper bound for $\square_{3}$ in Theorem 5 by replacing the probability of $\Xi_{1}^{\mathsf{c}}$ with the one of $\Xi_{3}^{\mathsf{c}}$ . Doing this, we obtain by means of Lemma 13 that

[TABLE]

The result of the theorem now follows by combining (22), (23), (24) and (25). ∎

B.3. Auxiliary results

Lemma 8.

Let Assumption 1 hold. Then the following assertions hold true.

a) $\delta_{j}^{\alpha}/n\leq 1$ for all $n\in\mathbb{N}$ and $j\in\{0,\ldots,N_{n}^{\alpha}\}$,

b) $\exp\left(-m\alpha_{M_{m}^{\alpha}}/(128d)\right)\leq C(d)m^{-5}$ for all $m\in\mathbb{N}$, and

c) $\min_{1\leq j\leq M_{m}^{\alpha}}\left|[f]_{j}\right|^{2}\geq 2m^{-1}$ for all $m\in\mathbb{N}$.

Proof.

a) In case $N_{n}^{\alpha}=0$ , we have $\delta^{\alpha}_{N_{n}^{\alpha}}=1$ and there is nothing to show. Otherwise $0<N_{n}^{\alpha}\leq n$ , and by definition of $N_{n}^{\alpha}$ we have $(2j+1)\Delta^{\alpha}_{j}\leq n/\log(n+3)$ for $0\leq j\leq N_{n}^{\alpha}$ which by the definition of $\delta_{j}^{\alpha}$ implies that

[TABLE]

We consider two cases: In the first case, $n/((2j+1)\log(n+3))\vee(j+3)=j+3$ . Then $n\geq 1$ directly implies the estimate $\delta_{j}^{\alpha}\leq n$ . In the second case, we have $n/((2j+1)\log(n+3))\vee(j+3)=n/((2j+1)\log(n+3))$ and therefrom

[TABLE]

and thus $\delta_{j}^{\alpha}\leq n$ in both cases. Division by $n$ yields assertion a).

b) Due to Assumption 1, we have $M_{m}^{\alpha}>0$ for all sufficiently large $m$, and it suffices to show the desired inequality for such values of $m$. By the definition of $M_{m}^{\alpha}$, we have $\alpha_{M_{m}^{\alpha}}\geq 640dm^{-1}\cdot\log(m+1)$, which implies $\exp(-m\alpha_{M_{m}^{\alpha}}/(128d))\leq\exp(-5\log(m+1))=(m+1)^{-5}\leq m^{-5}$, and the assertion follows.

c) Note that

[TABLE]

and $640m^{-1}\cdot\log(m+1)\geq 2m^{-1}$ for all $m\geq 1$ . ∎

Lemma 9.

Let $(\delta^{*}_{k})_{k\in\mathbb{N}_{0}}$ and $(\Delta^{*}_{k})_{k\in\mathbb{N}_{0}}$ be sequences such that for all $k\geq 1$ ,

[TABLE]

Then, for any $k\in\{1,\ldots,n\wedge m\}$ , we have

[TABLE]

with positive numerical constants $K_{1}$ , $K_{2}$ , and $K_{3}$ .

Proof.

The proof is a combination of the proofs of Lemma A.1 in [Kro16] (which deals with the case $\omega\equiv 1$ ) and Lemma A.4 in [JS13]. More precisely, one can apply Proposition C.1 in [Kro16] with $c(\varepsilon)$ from that statement replaced with $c(\varepsilon)=4(1+2\varepsilon)$ (this makes the proposition applicable also for complex-valued functions), $M_{1}^{2}=\delta_{k}^{\ast}$ , $H^{2}=\frac{\delta_{k}^{\ast}}{n}([\ell]_{0}\vee 1)$ , $\upsilon\vcentcolon=\|\lambda\|\|f\|\Delta_{k}^{\ast}([\ell]_{0}\vee 1)$ and setting $\varepsilon=1/64$ . ∎

Lemma 10.

Let $m\in\mathbb{N}$ , $k\in\mathbb{N}_{0}$ . Then

[TABLE]

The proof follows along the lines of the proof of Lemma A5 in [JS13] and is thus omitted.

Lemma 11.

Let Assumption 1 hold and consider the event $\Xi_{1}$ defined in the proof of Theorem 5. Then, for any $n\in\mathbb{N}$, $\mathbb{P}(\Xi_{1}^{\mathsf{c}})\leq 2\exp(-Cn)$ with a numerical constant $C>0$.

Proof.

Note that $\mathbb{P}(\Xi_{1}^{\mathsf{c}})=\mathbb{P}(\widehat{[\ell]}_{0}\vee 1<([\ell]_{0}\vee 1)/2)+\mathbb{P}(\widehat{[\ell]}_{0}\vee 1>2([\ell]_{0}\vee 1)),$ and the two terms on the right-hand side can be bounded by Chernoff bounds for Poisson distributed random variables (see [MU17], Theorem 5.4) which yields the result. ∎
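The exponential decay can be seen concretely. Assuming that $\widehat{[\ell]}_{0}=\frac{1}{n}\sum_{i=1}^{n}N_{i}([0,1))$, the variable $n\widehat{[\ell]}_{0}$ is Poisson distributed with mean $n[\ell]_{0}$, and with $L=[\ell]_{0}\vee 1$ the event $\Xi_{1}^{\mathsf{c}}$ is the union of $\{n\widehat{[\ell]}_{0}<nL/2\}$ (possible only if $L>2$) and $\{n\widehat{[\ell]}_{0}>2nL\}$. The Python sketch below evaluates this probability exactly from the Poisson probability mass function instead of bounding it via Chernoff; the assumption on $\widehat{[\ell]}_{0}$ and the helper names are for illustration only.

```python
import math


def poisson_log_pmf(k: int, mu: float) -> float:
    """log P(X = k) for X ~ Poisson(mu), mu > 0."""
    return -mu + k * math.log(mu) - math.lgamma(k + 1)


def prob_xi1_complement(n: int, ell0: float) -> float:
    """Exact P(Xi_1^c), assuming n * hat_ell_0 ~ Poisson(n * ell0) with ell0 > 0."""
    mu = n * ell0
    big_l = max(ell0, 1.0)
    total = 0.0
    if big_l > 2.0:
        # lower tail {n * hat_ell_0 < n * L / 2}
        upper_k = math.ceil(n * big_l / 2.0) - 1
        total += sum(math.exp(poisson_log_pmf(k, mu)) for k in range(upper_k + 1))
    # upper tail {n * hat_ell_0 > 2 n L}; beyond 2 n L >= 2 mu the terms shrink at least
    # geometrically with ratio mu / k <= 1/2, so 200 terms capture the tail.
    start = math.floor(2.0 * n * big_l) + 1
    total += sum(math.exp(poisson_log_pmf(k, mu)) for k in range(start, start + 200))
    return total
```

Summing the upper tail directly avoids the cancellation that $1-\mathbb{P}(X\leq k)$ would suffer for such small probabilities.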

Lemma 12.

Let Assumption 1 hold and consider the event $\Xi_{2}$ defined in the proof of Theorem 5. Then, for any $m\in\mathbb{N}$ , $\mathbb{P}(\Xi_{2}^{\mathsf{c}})\leq C(d)m^{-4}$ .

The proof follows along the lines of the proof of Lemma A6 in [JS13] and is thus omitted.

Lemma 13.

Let Assumptions 1 and 2 hold. The event $\Xi_{3}$ defined in (B.2) satisfies $\mathbb{P}(\Xi_{3}^{\mathsf{c}})\leq C(\alpha,d)m^{-4}$ for all $m\in\mathbb{N}$ .

The proof follows along the lines of the proof of Lemma A7 in [JS13] and is thus omitted.

Bibliography

[AB06] Anestis Antoniadis and Jérémie Bigot. “Poisson inverse problems”. In: Ann. Statist. 34.5 (2006), pp. 2132–2158. DOI: 10.1214/009053606000000687.
[BB09] Yannick Baraud and Lucien Birgé. “Estimating the intensity of a random measure by histogram type estimators”. In: Probab. Theory Related Fields 143.1-2 (2009), pp. 239–284. DOI: 10.1007/s00440-007-0126-6.
[BBM99] Andrew Barron, Lucien Birgé and Pascal Massart. “Risk bounds for model selection via penalization”. In: Probab. Theory Related Fields 113.3 (1999), pp. 301–413. DOI: 10.1007/s004400050210.
[Big+13] Jérémie Bigot, Sébastien Gadat, Thierry Klein and Clément Marteau. “Intensity estimation of non-homogeneous Poisson processes from shifted trajectories”. In: Electron. J. Stat. 7 (2013), pp. 881–931. DOI: 10.1214/13-EJS794.
[Bir07] Lucien Birgé. “Model selection for Poisson processes”. In: Asymptotics: particles, processes and inverse problems. Vol. 55. IMS Lecture Notes Monogr. Ser. Inst. Math. Statist., Beachwood, OH, 2007, pp. 32–64. DOI: 10.1214/074921707000000265.
[Bré81] Pierre Brémaud. Point processes and queues. Martingale dynamics. Springer Series in Statistics. Springer-Verlag, New York-Berlin, 1981, pp. xviii+354.
[Chi+13] Sung Nok Chiu, Dietrich Stoyan, Wilfrid S. Kendall and Joseph Mecke. Stochastic geometry and its applications. Wiley Series in Probability and Statistics. John Wiley & Sons, Ltd., Chichester, 2013, pp. xxvi+544. DOI: 10.1002/9781118658222.
[CK02] Laurent Cavalier and Ja-Yong Koo. “Poisson intensity estimation for tomographic data using a wavelet shrinkage approach”. In: IEEE Trans. Inform. Theory 48.10 (2002), pp. 2794–2802. DOI: 10.1109/TIT.2002.802632.