Nonparametric intensity estimation from noisy observations of a Poisson process under unknown error distribution
Martin Kroll

ENSAE-ParisTech CREST
Abstract.
We consider the nonparametric estimation of the intensity function of a Poisson point process in a circular model from indirect observations. These observations emerge from hidden point process realizations with the target intensity through contamination with additive error. In the case where the error distribution can only be estimated from an additional sample, we derive minimax rates of convergence with respect to the sample sizes under abstract smoothness conditions and propose an orthonormal series estimator which attains the optimal rate of convergence. The performance of the estimator depends on the correct specification of a dimension parameter whose optimal choice relies on smoothness characteristics of both the intensity and the error density. We propose a data-driven choice of the dimension parameter based on model selection and show that the adaptive estimator attains the minimax optimal rate.
Key words and phrases:
Research for this article was performed while I was a PhD student at Universität Mannheim. Financial support by the Deutsche Forschungsgemeinschaft (DFG) through the Research Training Group RTG 1953 is gratefully acknowledged. I am indebted to my supervisors Jan Johannes and Martin Schlather for fruitful discussions and helpful comments on the paper.
1. Introduction
Point process models are used in a wide variety of applications, including, amongst others, stochastic geometry [Chi+13], extreme value theory [Res08], and queueing theory [Bré81]. Each realization of a point process is a random set of points $\{x_i\}$, which can alternatively be represented as an $\overline{\mathbb{N}}_0$-valued random measure $N = \sum_i \delta_{x_i}$, where $\delta_x$ denotes the Dirac measure concentrated at $x$. Poisson point processes (PPPs) are of particular importance since they serve as the elementary building blocks for more complex point process models. Let $\mathcal{X}$ be a locally compact second countable Hausdorff space, $\mathscr{X}$ the corresponding Borel $\sigma$-field and $\Lambda$ a locally finite measure on the measurable space $(\mathcal{X}, \mathscr{X})$, i.e., $\Lambda(A) < \infty$ for all relatively compact sets $A$ in $\mathcal{X}$. A random set of points $\{x_i\}$ from $\mathcal{X}$ (resp. the random measure $N$) is called a Poisson point process with intensity measure $\Lambda$ if (i) the number of points $N(A)$ located in $A$ follows a Poisson distribution with parameter $\Lambda(A)$ for all relatively compact $A \in \mathscr{X}$, and (ii) for all $k \in \mathbb{N}$ and disjoint sets $A_1, \ldots, A_k \in \mathscr{X}$, the random variables $N(A_1), \ldots, N(A_k)$ are independent. It is well known that the distribution of a PPP is completely determined by its intensity measure. Hence, from a statistical point of view, the (nonparametric) estimation of the intensity measure or its Radon-Nikodym derivative (the intensity function) with respect to some dominating measure from observations of the point process is of fundamental importance.
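To make properties (i) and (ii) concrete, here is a minimal simulation sketch on the interval [0, 1) (an illustrative choice, not part of the paper). It uses Lewis-Shedler thinning with a hypothetical intensity function $20 + 10\cos(2\pi t)$ and checks empirically that the number of points in $[0, 0.5)$ has mean and variance close to its intensity measure, which equals 10 here, and that the counts in the disjoint sets $[0, 0.5)$ and $[0.5, 1)$ are essentially uncorrelated.

```python
import math
import random

def simulate_ppp(intensity, bound, rng):
    """Simulate one realization of a Poisson point process on [0, 1) with
    intensity function `intensity` by Lewis-Shedler thinning; `bound` must
    dominate `intensity` on [0, 1)."""
    points = []
    t = 0.0
    while True:
        # Candidate points form a homogeneous PPP with rate `bound`.
        t += rng.expovariate(bound)
        if t >= 1.0:
            return points
        # Keep each candidate independently with probability intensity(t) / bound.
        if rng.random() < intensity(t) / bound:
            points.append(t)

def intensity(t):
    """Hypothetical intensity, bounded by 30 on [0, 1)."""
    return 20.0 + 10.0 * math.cos(2 * math.pi * t)

rng = random.Random(1)
reps = 4000
realizations = [simulate_ppp(intensity, 30.0, rng) for _ in range(reps)]
left = [sum(1 for x in pts if x < 0.5) for pts in realizations]    # counts in [0, 0.5)
right = [len(pts) - nl for pts, nl in zip(realizations, left)]    # counts in [0.5, 1)
mean_left = sum(left) / reps
var_left = sum((c - mean_left) ** 2 for c in left) / (reps - 1)
mean_right = sum(right) / reps
cov = sum((a - mean_left) * (b - mean_right) for a, b in zip(left, right)) / (reps - 1)
# Both intensity measures equal 10 (the cosine integrates to 0 on each half).
print(mean_left, var_left, mean_right, cov)  # approximately 10, 10, 10 and 0
```

A Poisson count has equal mean and variance, so the first two printed values being close is the empirical signature of (i); the near-zero covariance is consistent with (ii).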
Inference and testing problems for Poisson and more general point processes have been tackled in a wide range of scenarios. The monographs [Kar91] and [Kut98] offer a comprehensive overview and discuss both parametric and nonparametric methods. From a methodological point of view, our approach in this paper is related to the article [Rey03] where the estimation of the intensity function from direct observations was studied using concentration inequalities.
Other approaches to nonparametric intensity estimation from direct observations, without making a claim to be exhaustive, can be found in [BB09] (where the performance of a histogram estimator under Hellinger loss is analysed), [Bir07] (using a testing approach to model selection), [GN00] (using a minimum complexity estimator in the Aalen model), and [PW04] (suggesting a wavelet estimator in the multiplicative intensity model).
Theoretical work on intensity estimation has recently been motivated by applications to genomic data. The model considered in the article [Big+13] is motivated by data arising in the processing of DNA ChIP-seq data. The article [San14] also takes its motivation from the analysis of genomic data. In addition, let us mention two further articles where the development of nonparametric statistical methods for the analysis of point processes was inspired by applications from biology: first, motivated by DNA sequencing techniques, the article [SZ12] introduces a change-point model for nonhomogeneous Poisson processes occurring in molecular biology. Second, the article [ZK10] considers the nonparametric inference of Cox process data by means of a kernel-type estimator.
Usually one aims to estimate the intensity function from direct observations where
[TABLE]
are realizations of a PPP with the target intensity . In this paper, however, we consider the nonparametric estimation of the intensity function without access to the observations in (1). Instead, we are in the setup of a Poisson inverse problem [AB06] where we can only observe the point processes given through
[TABLE]
The points of the indirect observations are related to the points of the hidden processes by taking the fractional part of each additively contaminated hidden point. This definition yields a circular model by means of the usual topological identification of the interval $[0,1)$ and the circle of perimeter $1$.
In contrast to our approach, the few existing papers on Poisson inverse problems [CK02, AB06, Big+13] assume the error distribution to be known. This conservative assumption is also standard in research articles dealing with classical deconvolution problems [Mei09]. If the error density is unknown, even identifiability of the statistical model is not guaranteed. Several remedies have been introduced to overcome this problem: for instance, it is possible to impose additional assumptions on the statistical model (e.g., [SV10] which deals with blind convolution under additive Gaussian noise with unknown variance). Alternatively, one can consider a framework with panel data [Neu07]. Finally, one can assume the availability of an additional sample from the error density (e.g., [DH93, Joh09, CL10, CL11]) to guarantee identifiability and enable inference. In this paper, we will stick to this last option.
Let us assume that the errors in the general model (2) are i.i.d. for some unknown error density . We will study the resulting model and consider the nonparametric estimation of the intensity function from observations
[TABLE]
where the are given as in (2). A natural aim here is to determine optimal rates of convergence in terms of the sample sizes and and to construct adaptive estimators attaining these rates. Note that the observation of i.i.d. processes with intensity is equivalent to the observation of one process with intensity , and both directions of this equivalence can easily be made rigorous. In order to obtain from the , put (denoting by the set-theoretic union of point processes; this shows the infinite divisibility of Poisson point processes). For the other direction, given and , it suffices to assign every point independently to one of the processes with equal probability.
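The equivalence between several i.i.d. processes and a single process with proportionally larger intensity can be checked by simulation. The sketch below is illustrative only: homogeneous intensities are a simplifying assumption, and the helper names are hypothetical. It superposes 5 processes with rate 4 and, conversely, splits one process with rate 20 by assigning every point independently and uniformly to one of 5 processes.

```python
import random

def homogeneous_ppp(rate, rng):
    """Points of a homogeneous PPP with the given rate on [0, 1)."""
    points, t = [], rng.expovariate(rate)
    while t < 1.0:
        points.append(t)
        t += rng.expovariate(rate)
    return points

def superpose(processes):
    """Set-theoretic union (superposition) of several point processes."""
    return sorted(x for pts in processes for x in pts)

def split(points, n, rng):
    """Assign every point independently and uniformly to one of n processes."""
    parts = [[] for _ in range(n)]
    for x in points:
        parts[rng.randrange(n)].append(x)
    return parts

rng = random.Random(7)
n, rate, reps = 5, 4.0, 4000
# Direction 1: the union of n i.i.d. PPPs with rate 4 behaves like one PPP with rate 20.
union_counts = [len(superpose([homogeneous_ppp(rate, rng) for _ in range(n)]))
                for _ in range(reps)]
# Direction 2: splitting a PPP with rate 20 gives n independent PPPs with rate 4.
split_counts = [[len(p) for p in split(homogeneous_ppp(n * rate, rng), n, rng)]
                for _ in range(reps)]
mean_union = sum(union_counts) / reps
mean_first = sum(c[0] for c in split_counts) / reps
mean_second = sum(c[1] for c in split_counts) / reps
cov_first_second = sum((c[0] - mean_first) * (c[1] - mean_second)
                       for c in split_counts) / (reps - 1)
print(mean_union, mean_first, mean_second, cov_first_second)  # about 20, 4, 4 and 0
```

The vanishing covariance between the split parts reflects the fact that independent random splitting of a Poisson process yields independent Poisson processes.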
From a methodological point of view, our approach is inspired by the one conducted in [JS13]. We consider orthonormal series estimators of the form
[TABLE]
where and is an appropriate estimator of the Fourier coefficient corresponding to the basis function (see Section 2 for details). This estimator is motivated by the $L^2$-convergent representation for square-integrable . It turns out that the performance of the estimator crucially depends on the choice of the dimension parameter and that its optimal value depends on smoothness characteristics of both the intensity and the error density that are usually not available in practice. In order to choose in a completely data-driven manner, we follow an approach based on model selection (see [BBM99, Com15]) and select the dimension parameter as the minimizer of a penalized contrast criterion. For the theoretical analysis of the adaptive estimator we need Talagrand-type concentration inequalities tailored to the framework with PPP observations, which cannot be directly transferred from results applied in the usual density estimation or deconvolution frameworks (see Remark 2.2 in [Kro16]). These inequalities have already been derived in a separate manuscript [Kro16], and we only state the necessary consequences of these results in the appendix. The article is organized as follows: in Section 2 we introduce our methodological approach. In Section 3 we study the nonparametric estimation problem from a minimax point of view. Section 4 considers adaptive estimation of the intensity for the Poisson model. Proofs are given in Appendices A and B.
2. Methodology
2.1. Notation
Throughout this work we assume that the intensity and the density belong to the space $L^2 = L^2([0,1))$ of square-integrable functions on the interval $[0,1)$. Let $(e_j)_{j\in\mathbb{Z}}$ be the complex trigonometric basis of $L^2$ given by $e_j(t) = \exp(2\pi\mathrm{i}jt)$. The Fourier coefficients of a function are denoted as follows:
[TABLE]
For a strictly positive symmetric sequence we introduce the weighted norm defined via . The corresponding scalar product is denoted by . Throughout the paper, we use the notation if for some numerical constant independent of and .
2.2. The minimax point of view
We evaluate the performance of an arbitrary estimator of by means of the mean integrated weighted squared loss . We take up the minimax point of view and consider the maximum risk defined by
[TABLE]
where and are classes of potential intensity functions and densities , respectively. The minimax risk is defined via
[TABLE]
where the infimum is taken over all estimators of . An estimator is called rate optimal if
[TABLE]
By allowing for general weight sequences $\omega$, we can treat both the estimation of the intensity itself (in this case, $\omega_j = 1$ for all $j$) and the estimation of its derivatives (take $\omega_j = |j|^{2s}$ for $j \neq 0$ for the $s$-th derivative). The classes of intensity functions and of densities to be considered in this article will be specified in Section 3 below, where we derive lower bounds on the minimax risk for these specific choices and prove that these lower bounds are attained up to a numerical constant by a suitably defined orthonormal series estimator.
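A quick numerical check of the derivative interpretation, as a sketch. By Parseval's identity, a smooth 1-periodic function satisfies $\int_0^1 |f^{(s)}(t)|^2\,dt = (2\pi)^{2s}\sum_{j\neq 0}|j|^{2s}|c_j|^2$ with $c_j = \int_0^1 f(t)e^{-2\pi\mathrm{i}jt}\,dt$. With weights $\omega_0 = 1$ (our assumption for the zero frequency) and $\omega_j = |j|^{2s}$ for $j \neq 0$, the weighted norm therefore equals $|c_0|^2 + (2\pi)^{-2s}\|f^{(s)}\|^2$. The code verifies this for $s = 1$ and a hypothetical trigonometric polynomial, computing Fourier coefficients by equispaced Riemann sums, which are exact (up to rounding) for trigonometric polynomials of low degree.

```python
import cmath
import math

GRID = 256  # equispaced nodes; Riemann sums are exact for low-degree trig polynomials

def fourier_coefficient(f, freq):
    """Integral over [0, 1) of f(t) exp(-2 pi i freq t), as a Riemann sum."""
    return sum(f(k / GRID) * cmath.exp(-2j * math.pi * freq * k / GRID)
               for k in range(GRID)) / GRID

def weighted_norm_sq(f, s, k_max):
    """Truncated weighted norm with the assumed weights omega_0 = 1 and
    omega_j = |j|^(2s) for j != 0."""
    total = 0.0
    for freq in range(-k_max, k_max + 1):
        omega = 1.0 if freq == 0 else abs(freq) ** (2 * s)
        total += omega * abs(fourier_coefficient(f, freq)) ** 2
    return total

def f(t):
    """Hypothetical trigonometric polynomial of degree 3."""
    return 1 + 0.5 * math.cos(2 * math.pi * t) + 0.2 * math.sin(6 * math.pi * t)

def f_prime(t):
    """Exact derivative of f."""
    return -math.pi * math.sin(2 * math.pi * t) + 1.2 * math.pi * math.cos(6 * math.pi * t)

norm_sq = weighted_norm_sq(f, s=1, k_max=10)
mean_sq = abs(fourier_coefficient(f, 0)) ** 2                                # |c_0|^2 = 1
derivative_energy = sum(f_prime(k / GRID) ** 2 for k in range(GRID)) / GRID  # ||f'||^2
# Both printed values equal 0.305 up to rounding.
print(norm_sq - mean_sq, derivative_energy / (2 * math.pi) ** 2)
```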
2.3. Sequence space representation
Under the considered model, the observed point processes in (3) are generated from independent Poisson point processes with intensity function by independent random contaminations of the individual points. We emphasize again that the (unobserved) contaminations are assumed to follow a probability law given by an unknown density and are to be understood additively modulo $1$. Thus, the observations under the Poisson model are given by
[TABLE]
where is the realization of a Poisson point process with intensity function and the errors are i.i.d. . Note that each is again a realization of a Poisson point process whose intensity function is given by the circular convolution modulo 1 of with the error density . More precisely, is given by the formula
[TABLE]
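The convolution theorem for the circular convolution modulo 1 can be checked numerically. In this sketch the intensity $2 + \cos(2\pi t)$ and the error density $1 + 0.8\cos(2\pi t)$ are hypothetical choices. On an equispaced grid, both the convolution integral and the Fourier coefficients are computed exactly (up to rounding), because all functions involved are trigonometric polynomials of low degree. The check confirms that the $j$-th Fourier coefficient of the convolution is the product of the $j$-th coefficients of the two factors.

```python
import cmath
import math

N = 64  # grid size; Riemann sums are exact for trigonometric polynomials of degree < N

def coeff(values, freq):
    """Fourier coefficient of h at frequency freq from the samples h(k/N), k = 0..N-1."""
    return sum(v * cmath.exp(-2j * math.pi * freq * k / N)
               for k, v in enumerate(values)) / N

def circular_convolution(lam_vals, f_vals):
    """g(k/N) = integral over [0, 1) of lam(s) f((k/N - s) mod 1) ds, on the grid."""
    return [sum(lam_vals[i] * f_vals[(k - i) % N] for i in range(N)) / N
            for k in range(N)]

grid = [k / N for k in range(N)]
lam = [2 + math.cos(2 * math.pi * t) for t in grid]        # hypothetical intensity
f = [1 + 0.8 * math.cos(2 * math.pi * t) for t in grid]    # hypothetical error density
g = circular_convolution(lam, f)                           # intensity of the observations

# Convolution theorem: coefficient of g = coefficient of lam * coefficient of f.
max_err = max(abs(coeff(g, freq) - coeff(lam, freq) * coeff(f, freq))
              for freq in range(-5, 6))
print(max_err, coeff(g, 1))  # tiny error; the first coefficient is 0.5 * 0.4 = 0.2
```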
By the convolution theorem, we have for all . From Campbell’s theorem (cf. [Ser09], Chapter 3, Theorem 24) it can be deduced that for measurable functions we have provided that the integral on the right-hand side exists. Exploiting this equation for and setting
[TABLE]
we thus obtain that for all . More precisely, we have
[TABLE]
where
2.4. Orthonormal series estimator
In view of (6) and the fact that , a natural estimator of is given by
[TABLE]
with as defined in (5), and . Note that in (6) is not directly available and thus has to be estimated from the sample in (3). The additional threshold, imposed through the indicator function of the set , compensates for 'too small' absolute values of and prevents unstable behaviour of the estimator. The optimal choice of the dimension parameter in the minimax framework will be determined in Section 3 and depends on the classes and . The data-driven choice of the dimension parameter is discussed in Section 4.
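The following is a minimal sketch of a thresholded orthonormal series estimator in the spirit of (7); it is not the author's implementation, and several details are assumptions. The Fourier coefficient of the observed intensity (the circular convolution of the target intensity with the error density) is estimated by averaging $e^{-2\pi\mathrm{i}jy}$ over all observed points $y$ and dividing by the number of processes, which is unbiased by Campbell's theorem. The error coefficient $\hat f_j$ is the empirical characteristic function of the error sample, and the threshold discards frequencies with $|\hat f_j|^2 < 1/m$. The simulated intensity, the Gaussian error scale and the sample sizes are hypothetical.

```python
import cmath
import math
import random

def simulate_ppp(intensity, bound, rng):
    """One PPP realization on [0, 1) via Lewis-Shedler thinning (bound >= intensity)."""
    points, t = [], 0.0
    while True:
        t += rng.expovariate(bound)
        if t >= 1.0:
            return points
        if rng.random() < intensity(t) / bound:
            points.append(t)

def basis(freq, t):
    """Complex trigonometric basis function exp(2 pi i freq t)."""
    return cmath.exp(2j * math.pi * freq * t)

def series_estimator(observed, errors, k):
    """Hypothetical thresholded series estimator with dimension parameter k.
    observed: n point patterns on [0, 1); errors: m draws from the error law."""
    n, m = len(observed), len(errors)
    coeffs = {}
    for freq in range(-k, k + 1):
        # Unbiased (Campbell's theorem) estimate of the coefficient of lam * f.
        g_hat = sum(basis(-freq, y) for pts in observed for y in pts) / n
        # Empirical characteristic function of the error sample.
        f_hat = sum(basis(-freq, eps) for eps in errors) / m
        # Assumed threshold: drop frequencies with |f_hat|^2 < 1/m.
        coeffs[freq] = g_hat / f_hat if abs(f_hat) ** 2 >= 1.0 / m else 0j
    def estimate(t):
        return sum(c * basis(freq, t) for freq, c in coeffs.items()).real
    return coeffs, estimate

def lam(t):
    """Hypothetical target intensity."""
    return 20 + 10 * math.cos(2 * math.pi * t)

rng = random.Random(3)
sigma, n, m = 0.05, 200, 500
# Observed points: hidden points plus Gaussian errors, reduced modulo 1.
observed = [[(x + rng.gauss(0, sigma)) % 1.0 for x in simulate_ppp(lam, 30.0, rng)]
            for _ in range(n)]
errors = [rng.gauss(0, sigma) for _ in range(m)]
coeffs, estimate = series_estimator(observed, errors, k=2)
print(coeffs[0].real, coeffs[1].real, estimate(0.0))  # roughly 20, 5 and 30
```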
3. Minimax theory
3.1. Model assumptions
Let and be strictly positive symmetric sequences and fix . In this section, we derive minimax rates of convergence concerning the maximum risk defined in (4) with respect to the classes
[TABLE]
and
[TABLE]
of intensity functions and error densities, respectively. We now state some regularity conditions imposed on the sequences and .
Assumption 1**.**
, and are strictly positive symmetric sequences such that , for all and the sequences and are both non-increasing. Finally, .
3.2. Minimax lower bounds
The following two theorems provide minimax lower bounds in terms of the sample sizes and in (3), respectively. To state our results, we put
[TABLE]
By the results of this section, and will turn out to be the optimal (up to constants) rates of convergence in terms of and . The two terms over which the maximum is taken in the definition of can be interpreted as a squared bias term and a variance term, respectively. The rate in should then be obtained by choosing the truncation value such that the maximum of these two terms is minimized. This suggests choosing the truncation parameter as
[TABLE]
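Reading the squared bias term $\omega_k/\gamma_k$ and the variance term $\sum_{0\le|j|\le k}\omega_j/(n\alpha_j)$ off condition (C2) of Theorem 1 below, the computation of $\Psi_n$ and of a minimizing truncation value $k_n^*$ can be sketched as follows, assuming $\Psi_n$ is the minimum over $k$ of the larger of the two terms. The example uses polynomial sequences of the kind considered in Subsection 3.4 with hypothetical exponents: $\omega_j = 1$, $\gamma_j = |j|^2$ and $\alpha_j = |j|^{-1}$ for $j \neq 0$, all equal to 1 at $j = 0$.

```python
def psi_and_kstar(n, omega, gamma, alpha, k_max):
    """Return (Psi_n, k_n^*) with Psi_n = min_k max{omega_k/gamma_k,
    sum_{|j|<=k} omega_j/(n alpha_j)}, searching k = 0, ..., k_max.
    omega, gamma, alpha map a nonnegative index to the symmetric sequence value."""
    best_val, best_k = float("inf"), 0
    variance = 0.0
    for k in range(k_max + 1):
        # Add the variance terms for j = k and j = -k (only j = 0 when k = 0).
        variance += (1 if k == 0 else 2) * omega(k) / (n * alpha(k))
        bias = omega(k) / gamma(k)
        value = max(bias, variance)
        if value < best_val:  # keep the smallest minimizer
            best_val, best_k = value, k
    return best_val, best_k

def pol(exponent):
    """Sequence equal to 1 at j = 0 and |j|**exponent otherwise."""
    return lambda j: 1.0 if j == 0 else float(j) ** exponent

psi, k_star = psi_and_kstar(10_000, omega=pol(0), gamma=pol(2), alpha=pol(-1), k_max=1000)
print(psi, k_star)  # approximately 0.0111 and 10
```

At $k = 10$ the bias is $0.01$ and the variance is $111/10^4$; both neighbours give a larger maximum, so $k_n^* = 10$ here.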
Our first theorem establishes a lower bound in terms of .
Theorem 1**.**
Let Assumption 1 hold, and further assume that
- (C1) , and
- (C2) $0<\eta^{-1}=\inf_{n\in\mathbb{N}}\Psi_{n}^{-1}\cdot\min\Big\{\frac{\omega_{k_{n}^{\ast}}}{\gamma_{k_{n}^{\ast}}},\sum_{0\leq|j|\leq k_{n}^{\ast}}\frac{\omega_{j}}{n\alpha_{j}}\Big\}$ for some $1\leq\eta<\infty$.
Then, for any ,
[TABLE]
where with , and the infimum is taken over all estimators of based on the observations from (3).
As the proof of Theorem 1 shows, the lower bound , which does not depend on the sample size of the auxiliary sample from the error density, is also valid in the case of a known error density. Any deterioration of the overall rate of convergence compared with this case is caused by the uncertainty concerning the error density . Since this uncertainty is quantified by the sample size , one would expect a dependence of the lower bound on as well. This intuition is made rigorous by the following theorem.
Theorem 2**.**
Let Assumption 1 hold, and in addition assume that
- (C3)
there exists a density in with .
Then, for any ,
[TABLE]
where and the infimum is taken over all estimators of based on the observations from (3).
The next corollary is an immediate consequence of Theorems 1 and 2.
Corollary 3**.**
Under the assumptions of Theorems 1 and 2, for any ,
[TABLE]
Note that the contributions of the sample sizes and to the overall lower bound are separated from one another, and the rate is determined by the maximum of and . This phenomenon has already been observed in the related problem of density estimation [Joh09, CL11, JS13] and other inverse problems with unknown operator [Del+12, JS13a]. In addition, it can be seen from the mere definition of and that the rate in terms of is always faster than the one in . Hence, as long as , there is no deterioration in the rate in comparison to the setup with known error density (see Table 1 for a more detailed evaluation of the rates in some special cases).
3.3. Upper bound
Let us now establish an upper bound for the maximum risk in terms of and for the estimator in (7) under a suitable choice of the dimension parameter. More precisely, the following theorem establishes an upper bound for the rate of convergence of with defined in Equation (8). Combined with the lower bounds of the preceding subsection, this shows that attains the minimax rates of convergence in terms of the sample sizes and . Note that this rate-optimal choice of the dimension parameter does not depend on the size of the auxiliary sample from the error density (recall Equation (8) for its definition, and note that none of the quantities appearing there depends on ). This independence of the rate-optimal smoothing parameter from the size of the error sample has also been observed in the related model of circular density deconvolution with unknown error density considered in [JS13].
Theorem 4**.**
Let Assumption 1 hold. Then, for any ,
[TABLE]
3.4. Examples of convergence rates
Fixing , for and some , we consider specific choices of the sequences and and state the resulting rates with respect to both sample sizes and .
Choices for the sequence :
- •
(pol): and for all and some . This corresponds to the case when the unknown intensity function belongs to some Sobolev space.
- •
(exp): for all and some . In this case, belongs to some space of analytic functions.
Choices for the sequence :
- •
(pol): and for all and some . This corresponds to the case when the error density is ordinary smooth.
- •
(exp): for all and some .
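For the (pol)-(pol) configuration, the rate in terms of $n$ can be reproduced heuristically. Writing, with hypothetical exponents, $\omega_j = |j|^{2s}$, $\gamma_j = |j|^{2p}$ and $\alpha_j = |j|^{-2a}$ for $j \neq 0$, the squared bias behaves like $k^{2s-2p}$ and the variance like $k^{2s+2a+1}/n$. Equating the two gives $k \asymp n^{1/(2p+2a+1)}$ and hence $\Psi_n \asymp n^{-(2p-2s)/(2p+2a+1)}$, the classical rate for ordinarily smooth inverse problems. The sketch below checks this exponent numerically by a log-log least-squares fit, with $\Psi_n$ computed as the minimum over $k$ of the larger of the bias and variance terms appearing in condition (C2).

```python
import math

def psi_pol_pol(n, s, p, a, k_max=5000):
    """Psi_n = min_k max{omega_k/gamma_k, sum_{|j|<=k} omega_j/(n alpha_j)} for
    omega_j = |j|^(2s), gamma_j = |j|^(2p), alpha_j = |j|^(-2a) (all 1 at j = 0)."""
    best, variance = float("inf"), 0.0
    for k in range(k_max + 1):
        # omega_j / alpha_j = |j|^(2s + 2a), counted for j = k and j = -k
        variance += (1.0 if k == 0 else 2.0 * k ** (2 * s + 2 * a)) / n
        bias = 1.0 if k == 0 else k ** (2 * s - 2 * p)
        best = min(best, max(bias, variance))
    return best

def loglog_slope(xs, ys):
    """Least-squares slope of log(ys) against log(xs)."""
    lx, ly = [math.log(x) for x in xs], [math.log(y) for y in ys]
    mx, my = sum(lx) / len(lx), sum(ly) / len(ly)
    num = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    return num / sum((u - mx) ** 2 for u in lx)

s, p, a = 0.0, 1.0, 0.5                        # hypothetical smoothness parameters
ns = [10.0 ** e for e in range(4, 13, 2)]      # n = 1e4, 1e6, ..., 1e12
slope = loglog_slope(ns, [psi_pol_pol(n, s, p, a) for n in ns])
predicted = -(2 * p - 2 * s) / (2 * p + 2 * a + 1)   # -0.5 for these values
print(slope, predicted)
```

The fitted slope differs slightly from the prediction because of the integer constraint on $k$ and lower-order terms in the variance, both of which fade as $n$ grows.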
Table 1 summarizes the rates and for the different choices of and . The rates in terms of coincide formally with the classical rates for nonparametric inverse problems (see [Fan91, Lac06], for instance). The rates in are of the same order as those that have already been obtained in the related model of (circular) density deconvolution with unknown error density in [Joh09, CL11, JS13]. They can also be compared with the rates in the indirect Gaussian sequence model with partially known operator [JS13a], which provides a benchmark model for a variety of nonparametric inverse problems.
4. Adaptive estimation
The estimator considered in Theorem 4 is obtained by specializing the estimator in (7) with the truncation parameter . This procedure suffers from the apparent drawback that the resulting estimator depends on the knowledge of the classes and . In this section, we provide adaptive choices of the truncation parameter based on model selection (see [BBM99, Mas07] for comprehensive presentations in the context of nonparametric estimation). The principal idea of model selection procedures consists in defining a truncation parameter in a fully data-driven way as the minimizer of a penalized empirical contrast,
[TABLE]
where is a contrast function with being the linear subspace of spanned by the functions for , a penalty that is non-decreasing as a function of $k$ and mimics the variance, and a set of admissible values of $k$ (which represents the set of admissible models, since each choice of $k$ corresponds to a finite-dimensional model given by the span of the basis functions with indices $0\le|j|\le k$).
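A sketch of such a selection rule, under two assumptions that are not spelled out in the text. First, the contrast is taken to be $\Upsilon(t) = \|t\|_\omega^2 - 2\,\mathrm{Re}\langle t, \hat\lambda_K\rangle_\omega$, with $\hat\lambda_K$ the estimator at the largest admissible dimension $K$, a choice common for projection estimators; evaluated at the truncated estimator with dimension $k \le K$ and estimated coefficients $\hat c_j$ (our notation), it equals $-\sum_{0\le|j|\le k}\omega_j|\hat c_j|^2$. Second, the penalty below is a hypothetical toy sequence growing linearly in $k$; the data-dependent penalties of Subsections 4.1 and 4.2 would take its place.

```python
def select_dimension(coeffs, omega, penalty, k_max):
    """Return the k in 0..k_max minimizing
    -sum_{|j|<=k} omega(j) |coeffs[j]|^2 + penalty(k).
    coeffs maps each frequency j with |j| <= k_max to an estimated coefficient."""
    best_k, best_crit = 0, float("inf")
    fit = 0.0
    for k in range(k_max + 1):
        # Add the frequencies that enter when moving from dimension k - 1 to k.
        for j in ((0,) if k == 0 else (k, -k)):
            fit += omega(j) * abs(coeffs[j]) ** 2
        crit = -fit + penalty(k)
        if crit < best_crit:
            best_k, best_crit = k, crit
    return best_k

# Hypothetical coefficients: a clear signal for |j| <= 2, small noise beyond.
coeffs = {j: 0.1 for j in range(-6, 7)}
coeffs.update({0: 20.0, 1: 5.0, -1: 5.0, 2: 2.0, -2: 2.0})
k_hat = select_dimension(coeffs, omega=lambda j: 1.0, penalty=lambda k: 2 * k + 1, k_max=6)
print(k_hat)  # 2
```

Beyond $k = 2$ each additional pair of frequencies improves the fit by only $0.02$ while the penalty grows by $2$, so the criterion stops at the last frequency carrying signal.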
In order to construct an adaptive estimator which does not require any a priori knowledge of and , we proceed in two steps: in the first step, we assume that is unknown but known. Hence, the overall estimation procedure (in particular, the definition of the penalty term) might still depend on the knowledge of the sequence . This results in a partially adaptive definition of the truncation parameter. In the second step, we dispense with any knowledge of the classes and and propose a fully data-driven choice of the truncation parameter.
4.1. Partially adaptive estimation ( unknown, known)
For the definition of our partially adaptive choice of the dimension parameter we introduce some notation: for any , let
[TABLE]
For all , setting , we define
[TABLE]
and set . Now, for , define the contrast and the random sequence of penalties via
[TABLE]
Building on our definition of contrast and penalty, we define the partially adaptive selection of the dimension parameter as
[TABLE]
Theorem 5**.**
Let Assumption 1 hold. Then, for any ,
[TABLE]
4.2. Fully adaptive estimation ( and unknown)
We now also dispense with the knowledge of the smoothness of the error density and propose a fully data-driven selection of the dimension parameter. As in the case of partially adaptive estimation, we have to introduce some notation first. For , let
[TABLE]
For , set
[TABLE]
and . We consider the same contrast function as in the partially adaptive case but now define the random sequence of penalties by
[TABLE]
which no longer depends on or . Finally, set
[TABLE]
In order to state and prove the upper risk bound of the estimator , we have to introduce some further notation. We keep the definition of from Subsection 4.1 but slightly redefine as
[TABLE]
For , we also define
[TABLE]
which can be regarded as analogues of and in Subsection 4.1 in the case of a known error density . Finally, for , define
[TABLE]
and set , . In contrast to the proof of Theorem 5 we have to impose an additional assumption for the proof of an upper risk bound of .
Assumption 2**.**
* for all .*
Theorem 6**.**
Let Assumptions 1 and 2 hold. Then, for any ,
[TABLE]
Note that the only additional prerequisite of Theorem 6 in contrast to Theorem 5 is the validity of Assumption 2.
4.3. Examples of convergence rates (continued from Subsection 3.4)
We consider the same configurations for the sequences , and as in Subsection 3.4. In particular, we assume that and for all . The different configurations for and will be investigated in the following (compare also with the minimax rates of convergence given in Table 1). Note that the additional Assumption 2 is satisfied in all the considered cases. Let us define , that is, realizes the best compromise between squared bias and penalty.
Scenario (pol)-(pol):
In this scenario, it holds and . First assume that . In case that , the rate with respect to is which is the minimax optimal rate. In case that , it holds and the rate is which is minimax optimal up to a logarithmic factor. Assume now that . If , then the estimator obtains the optimal rate with respect to . Otherwise, yields the contribution to the rate.
Scenario (exp)-(pol):
as in scenario (pol)-(pol). Since , it holds and the optimal rate with respect to holds in case that . Otherwise, the bias-penalty tradeoff generates the contribution to the rate.
Scenario (pol)-(exp):
It holds that and again the sample size is no obstacle for attaining the optimal rate of convergence. If , the optimal rate holds as well. If , we get the rate which coincides with the optimal rate with respect to the sample size .
Scenario (exp)-(exp):
We have and where is the solution of and the solution of . Thus, we have and computation of resp. shows that only a loss by a logarithmic factor can occur as long as . If , the contribution to the rate arising from the trade-off between squared bias and penalty is determined by which deteriorates the optimal rate with respect to at most by a logarithmic factor.
Appendix A Proofs of Section 3
A.1. Proof of Theorem 1
Let us define as in the statement of the theorem and for each the function through
[TABLE]
Then each is, by definition, a real-valued function, which is non-negative since we have
[TABLE]
Moreover holds for each due to the estimate
[TABLE]
This estimate and the non-negativity of together imply for all . From now on let be fixed and let denote the joint distribution of the i.i.d. samples and when the true parameters are and , respectively. Let denote the corresponding one-dimensional marginal distributions and the expectation with respect to . Let be an arbitrary estimator of . The key argument of the proof is the following reduction scheme:
[TABLE]
where for and the element is defined by for and . Consider the Hellinger affinity . For an arbitrary estimator of we have
[TABLE]
from which we conclude by means of the elementary inequality that
[TABLE]
Define the Hellinger distance between two probability measures and as and, analogously, the Hellinger distance between two finite measures and (which do not necessarily have total mass one) by (as usual, the integral is formed with respect to any measure dominating both and ). Let denote the intensity measure of a Poisson point process on whose Radon-Nikodym derivative with respect to the Lebesgue measure is given by . Note that we have the estimate for all with due to
[TABLE]
which can be realized in analogy to the non-negativity of shown above. We have
[TABLE]
Since the distribution of the sample does not depend on the choice of we obtain
[TABLE]
where the first estimate follows from Lemma 3.3.10 (i) in [Rei89] and the second one is due to Theorem 3.2.1 in [Rei93] which can be applied since each is a Poisson point process for the Poisson model. Thus, the relation implies . Finally, putting the obtained estimates into the reduction scheme (9) leads to
[TABLE]
which finishes the proof of the theorem since was arbitrary. ∎
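The step relying on Theorem 3.2.1 in [Rei93] bounds a Hellinger distance between laws of Poisson point processes by the Hellinger distance between their intensity measures. Its simplest special case, a single Poisson count, can be checked directly. With the normalization $H^2(P,Q) = \int(\sqrt{dP}-\sqrt{dQ})^2$ (assumed here), the laws Poisson($\mu_1$) and Poisson($\mu_2$) satisfy $H^2 = 2\{1 - \exp(-(\sqrt{\mu_1}-\sqrt{\mu_2})^2/2)\} \le (\sqrt{\mu_1}-\sqrt{\mu_2})^2$, where the right-hand side is the Hellinger distance between the point masses $\mu_1$ and $\mu_2$. The sketch evaluates the series numerically for a few hypothetical parameter pairs.

```python
import math

def poisson_pmf(k, mu):
    """P(X = k) for X ~ Poisson(mu), mu > 0, computed on the log scale."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))

def hellinger_sq_poisson(mu1, mu2, k_max=400):
    """H^2(P, Q) = sum_k (sqrt(p_k) - sqrt(q_k))^2 for Poisson(mu1) and Poisson(mu2)."""
    return sum((math.sqrt(poisson_pmf(k, mu1)) - math.sqrt(poisson_pmf(k, mu2))) ** 2
               for k in range(k_max + 1))

results = []
for mu1, mu2 in [(1.0, 1.5), (10.0, 12.0), (3.0, 30.0)]:
    h2_measures = (math.sqrt(mu1) - math.sqrt(mu2)) ** 2   # between the intensity measures
    h2_laws = hellinger_sq_poisson(mu1, mu2)               # between the count distributions
    closed_form = 2 * (1 - math.exp(-h2_measures / 2))
    results.append((h2_laws, closed_form, h2_measures))
    print(f"{h2_laws:.6f} {closed_form:.6f} <= {h2_measures:.6f}")
```

The inequality follows from $2(1 - e^{-x/2}) \le x$ for $x \ge 0$; for distant intensities the left-hand side saturates at 2 while the right-hand side keeps growing.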
A.2. Proof of Theorem 2
By Markov’s inequality we have for an arbitrary estimator of and (which will be specified below)
[TABLE]
which by reduction to two hypotheses implies
[TABLE]
where denotes the distribution when the true parameters are and . The specific hypotheses and will be specified below. If and can be chosen such that , application of the triangle inequality yields
[TABLE]
where is the minimum distance test given by . Hence, we obtain
[TABLE]
where the infimum is taken over all -valued functions based on the observations. Thus, it remains to find hypotheses and such that
[TABLE]
and which allow us to bound by a universal constant (independent of ) from below. For this purpose, set and , where is defined as in the statement of the theorem. Take note of the inequalities and which in combination imply for . These inequalities will be used below without further reference. For , we define
[TABLE]
Furthermore, we have
[TABLE]
and which together imply that for . The identity
[TABLE]
shows that the condition in (11) is satisfied with .
Let be such that (the existence is guaranteed through condition (C3)) and define for
[TABLE]
Since we have and holds because of the estimate
[TABLE]
For , we have and thus trivially for since . Moreover
[TABLE]
and hence for .
To obtain a lower bound for defined in (10), consider the joint distribution of the samples and under and . Note that due to our construction we have . Thus for all (due to the fact that the distribution of a Poisson point process is determined by its intensity), and the Hellinger distance between and depends only on the distribution of the sample . More precisely,
[TABLE]
and we proceed by bounding from above. Recall that which is used to obtain the estimate
[TABLE]
Hence we have and application of statement (ii) of Theorem 2.2 in [Tsy09] with implies which finishes the proof of the theorem. ∎
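Statement (ii) of Theorem 2.2 in [Tsy09], used in the last step, turns a bound on the squared Hellinger distance between two hypotheses into a lower bound on the minimax probability of error of any test. As we recall it, if $H^2(P_0,P_1) = \int(\sqrt{dP_0}-\sqrt{dP_1})^2 \le c < 2$, the bound reads $\frac12\big(1-\sqrt{c(1-c/4)}\big)$, which stays strictly positive for every such $c$; this is why a constant bound on the Hellinger distance suffices. A small sketch (the function name is hypothetical):

```python
import math

def two_point_lower_bound(h2_bound):
    """(1/2) * (1 - sqrt(c * (1 - c / 4))) with c = h2_bound: lower bound on the minimax
    probability of error for two hypotheses whose squared Hellinger distance,
    H^2 = integral of (sqrt(dP0) - sqrt(dP1))^2, is at most c < 2."""
    if not 0.0 <= h2_bound < 2.0:
        raise ValueError("h2_bound must lie in [0, 2)")
    return 0.5 * (1.0 - math.sqrt(h2_bound * (1.0 - h2_bound / 4.0)))

bounds = [two_point_lower_bound(x / 10) for x in range(20)]   # c = 0.0, 0.1, ..., 1.9
print(bounds[0], bounds[10])  # 0.5 and about 0.067
```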
A.3. Proof of Theorem 4
Set . The proof consists in finding appropriate upper bounds for the quantities and in the estimate
[TABLE]
Upper bound for : Using the identity we obtain
[TABLE]
Using the estimate for , the definition of and the independence of and we get
[TABLE]
Applying statements a) and b) from Lemma 7 together with yields
[TABLE]
which using that (which holds due to Assumption 1) implies
[TABLE]
Now consider . Using the estimate for and the definition of yields
[TABLE]
Notice that Theorem 2.10 in [Pet95] implies the existence of a constant with . Using this inequality in combination with assertion b) from Lemma 7 and implies
[TABLE]
In addition, which in combination with (13) implies
[TABLE]
Exploiting the fact that and the definition of $\Phi_m$ in Subsection 3.2, we obtain
[TABLE]
Putting together the estimates for and yields
[TABLE]
Upper bound for : can be decomposed as
[TABLE]
implies and Lemma 7 yields the estimate which together imply . Combining the derived estimates for and finishes the proof.∎
A.4. Auxiliary results for the proof of Theorem 4
Lemma 7**.**
The following assertions hold:
- a)
, 2. b)
, 3. c)
.
Proof.
The proof of statement a) is given by the identity
[TABLE]
For the proof of b), note that we have and the assertion follows from the estimate
[TABLE]
For the proof of c), we consider two cases: if we have because and the statement is evident. Otherwise, which implies
[TABLE]
Applying Chebyshev’s inequality and exploiting the definition of yields
[TABLE]
and statement c) follows. ∎
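A Chebyshev argument for the empirical Fourier coefficient of the error sample rests on the elementary variance identity $\mathbb{E}|\hat f_j - f_j|^2 = (1-|f_j|^2)/m \le 1/m$, where $\hat f_j = \frac1m\sum_{k=1}^m e^{-2\pi\mathrm{i}j\varepsilon_k}$ and $f_j$ is its expectation (our notation). The Monte Carlo sketch below checks this identity and the resulting Chebyshev tail bound $1/(mt^2)$ for hypothetical Gaussian errors with standard deviation $\sigma$, whose coefficients are $f_j = \exp(-2\pi^2 j^2 \sigma^2)$.

```python
import cmath
import math
import random

def ecf(sample, freq):
    """Empirical Fourier coefficient (1/m) * sum_k exp(-2 pi i freq eps_k)."""
    return sum(cmath.exp(-2j * math.pi * freq * eps) for eps in sample) / len(sample)

rng = random.Random(11)
sigma, m, freq, reps, t = 0.1, 50, 2, 4000, 0.3               # hypothetical settings
f_true = math.exp(-2 * math.pi ** 2 * freq ** 2 * sigma ** 2)  # coefficient of N(0, sigma^2)
devs = [abs(ecf([rng.gauss(0, sigma) for _ in range(m)], freq) - f_true)
        for _ in range(reps)]
emp_var = sum(d ** 2 for d in devs) / reps    # Monte Carlo estimate of E|f_hat - f|^2
theory = (1 - f_true ** 2) / m                # exact variance, at most 1/m
emp_tail = sum(d >= t for d in devs) / reps   # estimate of P(|f_hat - f| >= t)
chebyshev = 1 / (m * t ** 2)                  # Chebyshev bound using variance <= 1/m
print(emp_var, theory, emp_tail, chebyshev)
```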
Appendix B Proofs of Section 4
B.1. Proof of Theorem 5
Define the events and
[TABLE]
The identity provides the decomposition
[TABLE]
and we will establish uniform upper bounds over the ellipsoids and for the three terms on the right-hand side separately.
Uniform upper bound for : Denote by the linear subspace of spanned by the functions for . Since the identity holds for all , , we obtain for all such that . Using this identity and the definition of yields for all that
[TABLE]
where denotes the projection of on the subspace . Elementary computations imply
[TABLE]
for all . In addition to defined above, introduce the further abbreviations
[TABLE]
as well as . Using these abbreviations and the identity , we deduce from (14) that
[TABLE]
for all . Define . For every and , the estimate implies
[TABLE]
Because , combining the last estimate with (B.1) we get
[TABLE]
Note that and for all since is non-increasing due to Assumption 1. Specializing with , we obtain
[TABLE]
Combining the facts that for and by definition, we obtain for all the estimate
[TABLE]
Hence, for all . Thus, from (B.1) we obtain
[TABLE]
Exploiting the definition of both the penalty and the event , we obtain
[TABLE]
Applying Lemma 9 with and yields
[TABLE]
Using Statement a) of Lemma 8 and the fact that by definition, we obtain that
[TABLE]
where the last estimate is due to the fact that for all and for all . Note that we have
[TABLE]
with a numerical constant which implies
[TABLE]
The last term in (B.1) is bounded by means of Lemma 10 which immediately yields . Combining the preceding estimates, which hold uniformly for all and , we conclude from Equation (B.1) that
[TABLE]
Uniform upper bound for : Define . Note that for and for all . Consequently, since , we obtain the estimate
[TABLE]
and due to Assumption 1 and Lemma 12 it is easily seen that . Using the definition of , we further obtain
[TABLE]
where the last estimate follows by applying Theorem 2.10 from [Pet95] with two times. If , Lemma 12 implies
[TABLE]
Otherwise, if , we exploit , and the definition of to bound the first term in (18). The second term in (18) can be bounded from above by noting that thanks to Assumption 1, and we obtain
[TABLE]
Thanks to the logarithmic increase of the harmonic series and Lemma 12, the last estimate implies
[TABLE]
if , and thus , independent of the actual value of . Using the obtained estimates, we conclude
[TABLE]
Uniform upper bound for : In order to find a uniform upper bound for , first recall the definition and consider the estimate
[TABLE]
Using the estimate , we obtain for by means of Lemma 11 that
[TABLE]
which controls the second term on the right-hand side of (19). We now bound the first term on the right-hand side of (19). If , we have , and by means of the Cauchy-Schwarz inequality and Theorem 2.10 from [Pet95] it is easily seen that
[TABLE]
Otherwise, , and we need the following further estimate, which is easily verified:
[TABLE]
We start by bounding the first term on the right-hand side of (B.1). Using the definition of and , we obtain for all that
[TABLE]
Since for , the Cauchy-Schwarz inequality in combination with Theorem 2.10 from [Pet95] implies for the second term on the right-hand side of (B.1) that
[TABLE]
We exploit the definition of together with to obtain
[TABLE]
from which by the logarithmic increase of the harmonic series and Lemma 11 we conclude that
[TABLE]
independent of the actual value of . Finally, the third and last term on the right-hand side of (B.1) can be bounded from above in the same way after exploiting the definition of , and we obtain
[TABLE]
Putting together the derived estimates, we obtain
[TABLE]
The statement of the theorem follows by combining the upper bounds for , , and .∎
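The following simulation sketches the kind of estimator whose risk the proof above bounds. Everything in it is an illustrative assumption rather than the paper's definition:

- the trigonometric basis on [0, 1);
- n independent realizations of the hidden Poisson process, with each point x observed as (x + ε) mod 1;
- an additional sample of m errors;
- a spectral cut-off estimator that divides the empirical Fourier coefficients of the observations by those of the error sample, keeping only frequencies |j| ≤ k with |φ̂_j|² ≥ 1/m;
- the intensity, the wrapped Gaussian error law and the sample sizes.

The dimension k is fixed here rather than chosen from the data.

```python
import cmath
import math
import random

def poisson_process(intensity, lam_max, rng):
    """Points of a Poisson process on [0, 1) with the given intensity, via thinning.

    Candidates come from a homogeneous process of rate lam_max (exponential gaps);
    each candidate x is kept with probability intensity(x) / lam_max <= 1.
    """
    points, t = [], rng.expovariate(lam_max)
    while t < 1.0:
        if rng.random() < intensity(t) / lam_max:
            points.append(t)
        t += rng.expovariate(lam_max)
    return points

def basis_conj(freq, x):
    """exp(-2 pi i freq x), the complex conjugate of the basis function e_freq."""
    return cmath.exp(-2j * math.pi * freq * x)

def series_estimator(observed, errors, k):
    """Coefficients of an assumed spectral cut-off estimator of dimension k.

    observed : list of n point patterns of contaminated points in [0, 1)
    errors   : additional sample of m errors (reduced mod 1)
    """
    n, m = len(observed), len(errors)
    coeffs = {}
    for freq in range(-k, k + 1):
        # empirical Fourier coefficient of the intensity of the observed points
        theta = sum(basis_conj(freq, y) for pattern in observed for y in pattern) / n
        # empirical Fourier coefficient of the error distribution
        phi = sum(basis_conj(freq, e) for e in errors) / m
        # divide only where the estimated error coefficient is not too small
        coeffs[freq] = theta / phi if abs(phi) ** 2 >= 1.0 / m else 0.0
    return coeffs

def evaluate(coeffs, x):
    """Real part of sum_freq coeffs[freq] * exp(2 pi i freq x)."""
    return sum(c * cmath.exp(2j * math.pi * freq * x) for freq, c in coeffs.items()).real

# Hypothetical setting: intensity 50 (1 + 0.5 cos 2 pi x), wrapped Gaussian errors.
rng = random.Random(1)
intensity = lambda x: 50 * (1 + 0.5 * math.cos(2 * math.pi * x))
sigma, n, m = 0.05, 200, 500
observed = [[(x + rng.gauss(0.0, sigma)) % 1.0 for x in poisson_process(intensity, 75.0, rng)]
            for _ in range(n)]
errors = [rng.gauss(0.0, sigma) % 1.0 for _ in range(m)]
coeffs = series_estimator(observed, errors, k=3)
for x in (0.0, 0.25, 0.5):
    print(f"x={x:.2f}  estimate={evaluate(coeffs, x):6.2f}  truth={intensity(x):6.2f}")
```

With these sample sizes the estimates should lie close to the true values 75, 50 and 25. Replacing the fixed k = 3 by a data-driven choice is where the model-selection step enters.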
B.2. Proof of Theorem 6
Consider the event
[TABLE]
in addition to the event introduced in the proof of Theorem 5 and the slightly redefined event defined as
[TABLE]
Defining , the identity motivates the decomposition
[TABLE]
and we establish uniform upper risk bounds for the four terms on the right-hand side separately.
Uniform upper bound for : On we have the estimate , and thus
[TABLE]
for all . This last estimate implies
[TABLE]
from which we conclude . Putting , we observe that on the estimate
[TABLE]
holds for all . Note that on we have , which, using , implies
[TABLE]
Now, we can proceed by mimicking the derivation of (B.1) in the proof of Theorem 5. More precisely, replacing the penalty term used in that proof by , using the definition of above and (21), we obtain
[TABLE]
As in the proof of Theorem 5, the second and third terms are bounded by applying Lemmata 9 (with and ) and 10, respectively. Hence, by means of an obvious adaptation of Statement a) in Lemma 8 (with replaced by ) and the estimates
[TABLE]
with , we obtain in analogy to the way of proceeding in the proof of Theorem 5 that
[TABLE]
Upper bound for : The uniform upper bound for can be derived in analogy to the bound for in the proof of Theorem 5 using Assumption 2 instead of Statement b) from Lemma 8 in the proof of Lemma 12. Hence, we obtain
[TABLE]
Upper bound for : The term is bounded analogously to the bound established for in the proof of Theorem 5 (here, we do not have to exploit the additional Assumption 2), and we get
[TABLE]
Upper bound for : To find a uniform upper bound for the term , one can use exactly the same decompositions as in the proof of the uniform upper bound for in Theorem 5, replacing the probability of with that of . Doing this, we obtain by means of Lemma 13 that
[TABLE]
The result of the theorem now follows by combining (22), (23), (24) and (25). ∎
B.3. Auxiliary results
Lemma 8.
Let Assumption 1 hold. Then the following assertions hold true.
- a) for all and ,
- b) for all , and
- c) for all .
Proof.
a) In case , we have and there is nothing to show. Otherwise , and by definition of we have for which by the definition of implies that
[TABLE]
We consider two cases: In the first case, . Then directly implies the estimate . In the second case, we have and therefrom
[TABLE]
and thus in both cases. Division by yields the assertion of the lemma.
b) Note that, due to Assumption 1, we have for all sufficiently large , so it suffices to show the desired inequality for such values of . By the definition of , we have , which implies , and the assertion follows.
c) Observe that
[TABLE]
and for all . ∎
Lemma 9.
Let and be sequences such that for all ,
[TABLE]
Then, for any , we have
[TABLE]
with positive numerical constants , , and .
Proof.
The proof is a combination of the proofs of Lemma A.1 in [Kro16] (which deals with the case ) and Lemma A.4 in [JS13]. More precisely, one can apply Proposition C.1 in [Kro16] with from that statement replaced with (this makes the proposition applicable also for complex-valued functions), , , and setting . ∎
Lemma 10.
Let , . Then
[TABLE]
The proof follows along the lines of the proof of Lemma A5 in [JS13] and is thus omitted.
Lemma 11.
Let Assumption 1 hold and consider the event defined in Theorem 5. Then, for any , with a numerical constant .
Proof.
Note that , and the two terms on the right-hand side can be bounded by Chernoff bounds for Poisson-distributed random variables (see [MU17], Theorem 5.4), which yields the result. ∎
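Lemma 11 rests on the Chernoff bounds for Poisson random variables in [MU17], Theorem 5.4. If X ~ Poisson(μ), then P(X ≥ x) ≤ e^{−μ}(eμ)^x / x^x for x > μ, and P(X ≤ x) ≤ e^{−μ}(eμ)^x / x^x for x < μ. The sketch below compares these bounds with exact tail probabilities. It illustrates the cited inequality only, with arbitrary choices of μ and x.

```python
import math

def poisson_pmf(k, mu):
    """P(X = k) for X ~ Poisson(mu), computed in log space to avoid overflow."""
    return math.exp(-mu + k * math.log(mu) - math.lgamma(k + 1))

def upper_tail(x, mu):
    """P(X >= x), summing the pmf directly (terms far above x are negligible)."""
    return sum(poisson_pmf(k, mu) for k in range(x, 10 * x + 100))

def lower_tail(x, mu):
    """P(X <= x)."""
    return sum(poisson_pmf(k, mu) for k in range(x + 1))

def chernoff_bound(x, mu):
    """e^{-mu} (e mu)^x / x^x, with the convention 0^0 = 1."""
    x_log_x = x * math.log(x) if x > 0 else 0.0
    return math.exp(-mu + x * (1.0 + math.log(mu)) - x_log_x)

for mu in (5.0, 50.0, 200.0):
    for x in (int(1.5 * mu), int(2 * mu)):     # x > mu: upper-tail bound
        assert upper_tail(x, mu) <= chernoff_bound(x, mu)
    for x in (int(0.5 * mu), int(0.25 * mu)):  # x < mu: lower-tail bound
        assert lower_tail(x, mu) <= chernoff_bound(x, mu)
    print(mu, upper_tail(int(2 * mu), mu), chernoff_bound(int(2 * mu), mu))
```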
Lemma 12.
Let Assumption 1 hold and consider the event defined in the proof of Theorem 5. Then, for any , .
The proof follows along the lines of the proof of Lemma A6 in [JS13] and is thus omitted.
Lemma 13.
Let Assumptions 1 and 2 hold. The event defined in (B.2) satisfies for all .
The proof follows along the lines of the proof of Lemma A7 in [JS13] and is thus omitted.
References
- [AB06] Anestis Antoniadis and Jérémie Bigot “Poisson inverse problems” In Ann. Statist. 34.5, 2006, pp. 2132–2158 DOI: 10.1214/009053606000000687
- [BB09] Yannick Baraud and Lucien Birgé “Estimating the intensity of a random measure by histogram type estimators” In Probab. Theory Related Fields 143.1-2, 2009, pp. 239–284 DOI: 10.1007/s00440-007-0126-6
- [BBM99] Andrew Barron, Lucien Birgé and Pascal Massart “Risk bounds for model selection via penalization” In Probab. Theory Related Fields 113.3, 1999, pp. 301–413 DOI: 10.1007/s004400050210
- [Big+13] Jérémie Bigot, Sébastien Gadat, Thierry Klein and Clément Marteau “Intensity estimation of non-homogeneous Poisson processes from shifted trajectories” In Electron. J. Stat. 7, 2013, pp. 881–931 DOI: 10.1214/13-EJS794
- [Bir07] Lucien Birgé “Model selection for Poisson processes” In Asymptotics: particles, processes and inverse problems, IMS Lecture Notes Monogr. Ser. 55, Inst. Math. Statist., Beachwood, OH, 2007, pp. 32–64 DOI: 10.1214/074921707000000265
- [Bré81] Pierre Brémaud “Point processes and queues: Martingale dynamics”, Springer Series in Statistics, Springer-Verlag, New York-Berlin, 1981, pp. xviii+354
- [Chi+13] Sung Nok Chiu, Dietrich Stoyan, Wilfrid S. Kendall and Joseph Mecke “Stochastic geometry and its applications”, Wiley Series in Probability and Statistics, John Wiley & Sons, Ltd., Chichester, 2013, pp. xxvi+544 DOI: 10.1002/9781118658222
- [CK02] Laurent Cavalier and Ja-Yong Koo “Poisson intensity estimation for tomographic data using a wavelet shrinkage approach” In IEEE Trans. Inform. Theory 48.10, 2002, pp. 2794–2802 DOI: 10.1109/TIT.2002.802632
