Estimation in the convolution structure density model. Part II: adaptation over the scale of anisotropic classes
Oleg Lepski, Thomas Willer

TL;DR
This paper advances adaptive minimax estimation in the convolution structure density model over anisotropic Nikol'skii classes, highlighting the impact of boundedness on risk and proposing a near-optimal adaptive estimator.
Contribution
It fully characterizes the minimax risk behavior across different parameters and introduces a selection rule for constructing nearly optimal adaptive estimators.
Findings
Boundedness of the function improves minimax risk asymptotics.
The proposed selection rule yields near-optimal adaptive estimators.
The behavior of minimax risk varies with regularity and norm parameters.
Abstract
This paper continues the research started in Lepski and Willer (2016). In the framework of the convolution structure density model on {\mathbb{R}}^{d}, we address the problem of adaptive minimax estimation with {\mathbb{L}}_{p}-loss over the scale of anisotropic Nikol'skii classes. We fully characterize the behavior of the minimax risk for different relationships between regularity parameters and norm indexes in the definitions of the functional class and of the risk. In particular, we show that the boundedness of the function to be estimated leads to an essential improvement of the asymptotic of the minimax risk. We prove that the selection rule proposed in Part I leads to the construction of an optimally or nearly optimally (up to logarithmic factor) adaptive estimator.
O.V. Lepski
T. Willer
Aix Marseille Univ, CNRS, Centrale Marseille, I2M, Marseille, France
Institut de Mathématique de Marseille
Aix-Marseille Université
39, rue F. Joliot-Curie
13453 Marseille, France
AMS subject classifications: 62G05, 62G20.
Keywords: deconvolution model, density estimation, oracle inequality, adaptive estimation, kernel estimators, {\mathbb{L}}_{p}-risk, anisotropic Nikol’skii class.
This work has been carried out in the framework of the Labex Archimède (ANR-11-LABX-0033) and of the A*MIDEX project (ANR-11-IDEX-0001-02), funded by the “Investissements d’Avenir” French Government program managed by the French National Research Agency (ANR).
1 Introduction
In the present paper we are interested in adaptive estimation in the convolution structure density model. Our considerations here continue the research started in Lepski and Willer (2016).
Thus, we observe i.i.d. vectors with a common probability density satisfying the following structural assumption
[TABLE]
where and are supposed to be known and is the function to be estimated. Recall that for two functions f,g\in{\mathbb{L}}_{1}\big{(}{\mathbb{R}}^{d}\big{)}
[TABLE]
and for any , g\in{\mathbb{L}}_{1}\big{(}{\mathbb{R}}^{d}\big{)} and ,
[TABLE]
Furthermore \mathfrak{P}\big{(}{\mathbb{R}}^{d}\big{)} denotes the set of probability densities on , is the ball of radius in {\mathbb{L}}_{s}\big{(}{\mathbb{R}}^{d}\big{)}:={\mathbb{L}}_{s}\big{(}{\mathbb{R}}^{d},\nu_{d}\big{)},1\leq s\leq\infty and is the Lebesgue measure on . At last, for any U\in{\mathbb{L}}_{1}\big{(}{\mathbb{R}}^{d}\big{)} let be the Fourier transform of .
The convolution structure density model (1.1) will be studied for an arbitrary g\in{\mathbb{L}}_{1}\big{(}{\mathbb{R}}^{d}\big{)} and . Then, except in the case , the function is not necessarily a probability density.
We want to estimate using the observations . By estimator, we mean any -measurable map \hat{f}:{\mathbb{R}}^{n}\to{\mathbb{L}}_{p}\big{(}{\mathbb{R}}^{d}\big{)}. The accuracy of an estimator is measured by the –risk
[TABLE]
where denotes the expectation with respect to the probability measure of the observations . Also, , , is the -norm on . The objective is to construct an estimator of with a small –risk.
1.1 Adaptive estimation
Let be a given subset of {\mathbb{L}}_{p}\big{(}{\mathbb{R}}^{d}\big{)}. For any estimator define its maximal risk by {\cal R}^{(p)}_{n}\big{[}\tilde{f}_{n};\mathbb{F}\big{]}=\sup_{f\in\mathbb{F}}{\cal R}^{(p)}_{n}\big{[}\tilde{f}_{n};f\big{]} and its minimax risk on is given by
[TABLE]
Here, the infimum is taken over all possible estimators. An estimator whose maximal risk is bounded by up to some constant factor is called minimax on .
Let \big{\{}\mathbb{F}_{\vartheta},\vartheta\in\Theta\big{\}} be a collection of subsets of {\mathbb{L}}_{p}\big{(}{\mathbb{R}}^{d},\nu_{d}\big{)}, where is a nuisance parameter which may have a very complicated structure.
The problem of adaptive estimation can be formulated as follows: is it possible to construct a single estimator which would be simultaneously minimax on each class , i.e.
[TABLE]
We refer to this question as *the problem of minimax adaptive estimation over the scale * . If such an estimator exists, we will call it optimally adaptive. Using the modern statistical language we call the estimator nearly optimally adaptive if
[TABLE]
We will be interested in adaptive estimation over the scale
[TABLE]
where {\mathbb{N}}_{\vec{r},d}\big{(}\vec{\beta},\vec{L}\big{)} is the anisotropic Nikolskii class, see Definition 1 below. As it was explained in Part I, the adaptive estimation over the scale \big{\{}{\mathbb{N}}_{\vec{r},d}\big{(}\vec{\beta},\vec{L}\big{)},\;\big{(}\vec{\beta},\vec{r},\vec{L}\big{)}\in(0,\infty)^{d}\times[1,\infty]^{d}\times(0,\infty)^{d}\big{\}} can be viewed as the adaptation to anisotropy and inhomogeneity of the function to be estimated. Recall also that
[TABLE]
so simply means that the common density of observations is uniformly bounded by . It is easy to see that if and then for any .
Let us briefly discuss another example. Let and be arbitrary but a priori chosen numbers. Assume that the considered collection of anisotropic Nikol’skii classes obeys the following restrictions: and . Suppose also that , where . Then, there exists completely determined by and such that {\mathbb{N}}_{\vec{r},d}\big{(}\vec{\beta},\vec{L}\big{)}\cap\mathbb{F}_{g,\mathbf{\infty}}(R,Q)={\mathbb{N}}_{\vec{r},d}\big{(}\vec{\beta},\vec{L}\big{)}\cap\mathbb{F}_{g}(R) for any .
Additionally, we will study the adaptive estimation over the collection
[TABLE]
We will show that the boundedness of the underlying function makes it possible to improve considerably the accuracy of estimation.
1.2 Historical notes
Minimax adaptive estimation is a very active area of mathematical statistics, and the interested reader can find a detailed overview, as well as several open problems in adaptive estimation, in the recent paper Lepski (2015). Below we discuss only the articles whose results are relevant to our consideration, i.e. the density setting under {\mathbb{L}}_{p}-loss, from a minimax perspective.
Let us start with the following remark. If one assumes additionally that f,g\in\mathfrak{P}\big{(}{\mathbb{R}}^{d}\big{)} the convolution structure density model can be interpreted as follows. The observations can be written as a sum of two independent random vectors, that is,
[TABLE]
where are i.i.d. -dimensional random vectors with common density to be estimated. The noise variables are i.i.d. -dimensional random vectors with known common density . At last are i.i.d. Bernoulli random variables with , where is supposed to be known. The sequences , and are supposed to be mutually independent.
The observation scheme (1.3) can be viewed as a generalization of two classical statistical models. Indeed, the case \alpha=1 corresponds to the standard deconvolution model Z_{i}=X_{i}+Y_{i}. Another “extreme” case, \alpha=0, corresponds to the direct observation scheme Z_{i}=X_{i}. The “intermediate” case \alpha\in(0,1), considered for the first time in Hesse (1995), is understood as partially contaminated observations.
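The three regimes of the observation scheme can be explored by simulation. The following is a minimal sketch, assuming the scheme Z_i = X_i + ε_i Y_i with Bernoulli indicators of parameter α as described above; it works in dimension one with the standard library only, and the names `sample_observations`, `gaussian_signal` and `laplace_noise` are hypothetical helpers chosen for illustration, not notation from the paper.

```python
import random

def sample_observations(n, alpha, draw_signal, draw_noise, seed=0):
    """Draw n observations Z_i = X_i + eps_i * Y_i with eps_i ~ Bernoulli(alpha)."""
    rng = random.Random(seed)
    observations = []
    for _ in range(n):
        x = draw_signal(rng)                     # X_i, drawn from the density to be estimated
        y = draw_noise(rng)                      # Y_i, drawn from the known noise density
        eps = 1 if rng.random() < alpha else 0   # contamination indicator
        observations.append(x + eps * y)
    return observations

def gaussian_signal(rng):
    # Illustrative choice of the unknown density: standard Gaussian.
    return rng.gauss(0.0, 1.0)

def laplace_noise(rng):
    # The difference of two independent Exp(1) variables has the standard Laplace law.
    return rng.expovariate(1.0) - rng.expovariate(1.0)

direct = sample_observations(5, 0.0, gaussian_signal, laplace_noise)    # alpha = 0
deconv = sample_observations(5, 1.0, gaussian_signal, laplace_noise)    # alpha = 1
partial = sample_observations(5, 0.3, gaussian_signal, laplace_noise)   # 0 < alpha < 1
```

With `alpha = 0.0` the noise never enters and the sample reduces to direct observations, while `alpha = 1.0` contaminates every observation, which is the deconvolution setting.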
Direct case, \alpha=0
There is a vast literature dealing with minimax and minimax adaptive density estimation, see for example, Efroimovich (1986), Hasminskii and Ibragimov (1990), Golubev (1992), Donoho et al. (1996), Devroye and Lugosi (1997), Rigollet (2006), Rigollet and Tsybakov (2007), Samarov and Tsybakov (2007), Birgé (2008), Giné and Nickl (2009), Akakpo (2012), Gach et al. (2013), Lepski (2013), among many others. Special attention was paid to the estimation of densities with unbounded support, see Juditsky and Lambert–Lacroix (2004), Reynaud–Bouret et al. (2011). The most developed results can be found in Goldenshluger and Lepski (2011), Goldenshluger and Lepski (2014) and in Section 2 we will compare in detail our results with those obtained in these papers.
Intermediate case, \alpha\in(0,1)
To the best of our knowledge, adaptive estimation in the case of partially contaminated observations has not been studied yet. We were able to find only two papers dealing with minimax estimation. The first one is Hesse (1995) (where the discussed model was introduced in dimension ) in which the author evaluated the -risk of the proposed estimator over a functional class formally corresponding to the Nikol’skii class . In Yuana and Chenb (2002) the latter result was developed to the multidimensional setting, i.e. to the minimax estimation on {\mathbb{N}}_{\infty,d}\big{(}\vec{2},1\big{)}. The most intriguing fact is that the accuracy of estimation in partially contaminated noise is the same as in the direct observation scheme. However none of these articles studied the optimality of the proposed estimators. We will come back to the aforementioned papers in Section 1.3.1 in order to compare the assumptions imposed on the noise density .
Deconvolution case, \alpha=1
First let us remark that the behavior of the Fourier transform of the function plays an important role in all the works dealing with deconvolution. Indeed, ill-posed problems correspond to Fourier transforms decaying towards zero. Our results will be established for “moderately” ill-posed problems, so we detail only the results of papers studying that type of operator. This assumption means that there exist and such that the Fourier transform of satisfies:
[TABLE]
Some minimax and minimax adaptive results in dimension 1 over different classes of smooth functions can be found in particular in Stefanski and Carroll (1990), Fan (1991), Fan (1993), Pensky and Vidakovic (1999), Fan and Koo (2002), Comte et al. (2006), Hall and Meister (2007), Meister (2009), Lounici and Nickl (2011), Kerkyacharian et al. (2011).
There are very few results in the multidimensional setting. It seems that Masry (1993) was the first paper where the deconvolution problem was studied for multivariate densities. It is worth noting that Masry (1993) considered more general weakly dependent observations and this paper formally does not deal with the minimax setting. However the results obtained in this paper could be formally compared with the estimation under -loss over the isotropic Hölder class of regularity , i.e. {\mathbb{N}}_{\infty,d}\big{(}\vec{2},1\big{)} which is exactly the same setting as in Yuana and Chenb (2002) in the case of partially contaminated observations. Let us also remark that there is no lower bound result in Masry (1993). The most developed results in the deconvolution model were obtained in Comte and Lacour (2013) and Rebelles (2016) and in Section 2 we will compare in detail our results with those obtained in these papers.
1.3 Lower bound for the minimax -risk
We have seen that the problem of optimal adaptation over the collection \big{\{}\mathbb{F}_{\vartheta},\vartheta\in\Theta\big{\}} is formulated as the ”attainability” of the family of minimax risks \big{\{}\phi_{n}(\mathbb{F}_{\vartheta}),\vartheta\in\Theta\big{\}} by a single estimator. Although it is not necessary, the following ”two-stage” approach is used for the majority of problems related to the minimax adaptive estimation. The first step consists in finding a lower bound for for any while the second one consists in constructing an estimator ”attaining”, at least asymptotically, this bound. We adopt this strategy in our investigations and below we present several lower bound results recently obtained in Lepski and Willer (2017).
1.3.1 Assumptions on the function imposed in Lepski and Willer (2017)
Let denote the set of all subsets of . Set and for any let denote the cardinality of while denotes its elements.
For any define the operator and let denote the identity operator. For any define \mathfrak{D}^{I,J}=\mathfrak{D}^{I}\big{(}\mathfrak{D}^{J}\big{)} and note that obviously .
Assumption 1** ().**
* exists for any and \;\sup_{J\in\mathfrak{J}^{*}}\big{\|}\mathfrak{D}^{J}\check{g}\big{\|}_{\infty}<\infty;*
Assumption 2** ().**
* exists for any and \sup_{J\in\mathfrak{J}^{*}}\big{\|}\check{g}^{-1}\mathfrak{D}^{J}\check{g}\big{\|}_{\infty}<\infty. Moreover, there exists and such that*
Assumption 3** ().**
* is a bounded function.*
Assumption 4** ().**
* exists for any and \;\sup_{I,J\in\mathfrak{J}}\big{\|}\mathfrak{D}^{I,J}\big{(}\check{g}\big{)}\big{\|}_{1}<\infty. Moreover*
[TABLE]
It is worth noting that all the bounds in Lepski and Willer (2017) are obtained under Assumptions 1 and 2. Assumption 3 is used when the estimation of unbounded functions is considered; we come back to this assumption in Section 2.4.2.
As to Assumption 4, it seems purely technical and does not appear in upper bound results. We also recall that the lower bounds in Lepski and Willer (2017) are proved under the condition: g\in\mathfrak{P}\big{(}{\mathbb{R}}^{d}\big{)}.
1.3.2 Some lower bounds from Lepski and Willer (2017)
Set , , , , and introduce for any , and the following quantities.
[TABLE]
Define for any and
[TABLE]
General case. Recall that , p^{*}=\big{[}\max_{l=1,\ldots,d}r_{l}\big{]}\vee p. Set
[TABLE]
Here and later we assume , which implies in particular that if and . Recall also that if . Put at last
[TABLE]
Theorem 1 (Lepski and Willer (2017)).
Let and be fixed.
Then for any , , , and g\in\mathfrak{P}\big{(}{\mathbb{R}}^{d}\big{)}, satisfying Assumptions 1–4, there exists independent of such that
[TABLE]
where the infimum is taken over all possible estimators.
Following the terminology used in Lepski and Willer (2017), we will call the set of parameters satisfying the tail zone, satisfying the dense zone and satisfying the sparse zone. In its turn, the latter zone is divided into two sub-domains: the sparse zone 1 corresponding to and the sparse zone 2 corresponding to .
Bounded case. Introduce
[TABLE]
Theorem 2 (Lepski and Willer (2017)).
Let and be fixed.
Then for any , , , and g\in\mathfrak{P}\big{(}{\mathbb{R}}^{d}\big{)}, satisfying Assumptions 1 and 2 there exists independent of such that
[TABLE]
where the infimum is taken over all possible estimators.
1.4 Assumptions on the function
The selection rule from the family of linear estimators, the -norm oracle inequalities obtained in Part I and all the adaptive results presented in the paper are established under the following condition imposed on the function .
Assumption 5.
(1) if then there exists such that
\big{|}1-\alpha+\alpha\check{g}(t)\big{|}\geq\varepsilon,\quad\forall t\in{\mathbb{R}}^{d};
(2) if then there exists and such that
Comparing this condition with Assumption 2 from Section 1.3.1, we can assert that both are coherent if . Indeed, in this case, we come to the following assumption, which is well-known in the literature:
referred to as a moderately ill-posed statistical problem, cf. (1.4). In particular, the assumption is checked for the centered multivariate Laplace law.
Note first that Assumption 5 is in some sense weaker than Assumption 1 when , since it does not require regularity properties of the function . Moreover both assumptions are not too restrictive. They are verified for many distributions, including centered multivariate Laplace and Gaussian ones. Note also that Assumption 5 always holds with if . Additionally, it holds with if is a real positive function. The latter is true, in particular, for any probability law obtained by an even number of convolutions of a symmetric distribution with itself.
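As a numerical illustration of the remarks above, the following sketch checks both parts of Assumption 5 for one concrete noise law. Several choices here are assumptions made for illustration: the noise is taken to be a product of independent standard Laplace coordinates (one simple instance of a centered multivariate Laplace law), whose Fourier transform is the real positive function \prod_{j}(1+t_{j}^{2})^{-1}; and, since the display in part (2) is not available in this text, the polynomial lower bound is assumed to have the usual moderately ill-posed form \Upsilon_{0}\prod_{j}(1+t_{j}^{2})^{-\mu_{j}/2}. All function names are hypothetical.

```python
def laplace_cf(t):
    """Fourier transform of a product of standard Laplace densities at t = (t_1, ..., t_d)."""
    value = 1.0
    for tj in t:
        value /= 1.0 + tj * tj
    return value

def margin_partial(alpha, t):
    """|1 - alpha + alpha * g_check(t)|, the quantity bounded from below in part (1)."""
    return abs(1.0 - alpha + alpha * laplace_cf(t))

def ratio_deconvolution(t, mu=2.0):
    """g_check(t) divided by prod_j (1 + t_j^2)^(-mu/2).

    A positive lower bound of this ratio plays the role of the constant
    in the (assumed) polynomial condition of part (2)."""
    envelope = 1.0
    for tj in t:
        envelope *= (1.0 + tj * tj) ** (-mu / 2.0)
    return laplace_cf(t) / envelope

# Two-dimensional frequency grid on [-50, 50]^2 with step 0.5.
grid = [(-50.0 + 0.5 * i, -50.0 + 0.5 * j) for i in range(201) for j in range(201)]

alpha = 0.7
worst_margin = min(margin_partial(alpha, t) for t in grid)   # stays above 1 - alpha
worst_ratio = min(ratio_deconvolution(t) for t in grid)      # stays at 1 for mu = 2
```

Because the transform is real and positive, the margin in part (1) never drops below 1 - alpha on the grid, in line with the remark on real positive functions; in the deconvolution case the ratio is constant, so this law decays exactly at the polynomial rate with exponent 2 in each coordinate.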
Next, our Assumption 5 is weaker than the conditions imposed in Hesse (1995) and Yuana and Chenb (2002). In these papers \check{g}\in\mathbb{C}^{(2)}\big{(}{\mathbb{R}}^{d}\big{)}, for any and
[TABLE]
2 Adaptive estimation over the scale of anisotropic Nikol’skii classes
We start this section by recalling the definition of the pointwise selection rule proposed in Part I.
2.1 Pointwise selection rule
Let be a continuous function belonging to {\mathbb{L}}_{1}\big{(}{\mathbb{R}}^{d}\big{)} such that . Set {\cal H}=\big{\{}e^{k},\;k\in{\mathbb{Z}}\big{\}} and let {\cal H}^{d}=\big{\{}\vec{h}=(h_{1},\ldots,h_{d}):\;h_{j}\in{\cal H},j=1,\ldots,d\big{\}}. Recall that {\cal H}^{d}_{\text{isotr}}=\big{\{}\vec{h}\in{\cal H}^{d}:\;\vec{h}=(h,\ldots,h),\;h\in{\cal H}\big{\}}. Set and let for any
[TABLE]
Later on for any the operations and relations , , ,, , are understood in coordinate-wise sense. In particular means that for any .
For any let M\big{(}\cdot,\vec{h}\big{)} satisfy the operator equation
[TABLE]
Introduce for any and
[TABLE]
where M_{\infty}=\big{[}(2\pi)^{-d}\big{\{}\varepsilon^{-1}\big{\|}\check{K}\big{\|}_{1}\mathrm{1}_{\alpha\neq 1}+\Upsilon_{0}^{-1}\mathbf{k}_{1}\mathrm{1}_{\alpha=1}\big{\}}\big{]}\vee 1 and
[TABLE]
Let be an arbitrary subset of . For any and introduce
[TABLE]
and define \vec{\mathbf{h}}(x)=\arg\inf_{\vec{h}\in\mathbb{H}}\Big{[}\widehat{{\cal R}}_{\vec{h}}(x)+8\widehat{U}^{*}_{n}\big{(}x,\vec{h}\big{)}\Big{]}.
Our final estimator is and we will call (2.2) the pointwise selection rule.
Remark 1.
Note that the estimator depends on and later on we will consider two choices of the parameter set , namely and . So, to present our results we will write in order to underline the aforementioned dependence. The choice will be used when the adaptation is studied over anisotropic Nikol’skii classes while will be used when the considered scale consists of isotropic classes.
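The mechanics of the selection step can be sketched in a few lines. The code below is only an illustration of the minimization of \widehat{{\cal R}}_{\vec{h}}(x)+8\widehat{U}^{*}_{n}(x,\vec{h}) over a bandwidth set: the two criteria are passed in as callables, the toy stand-ins `toy_r_hat` and `toy_u_hat` are NOT the quantities defined in the paper, and the truncation of the grid \{e^{k}\} to a finite range of k is an assumption made so that the search is finite.

```python
import itertools
import math

def bandwidth_grid(d, k_min, k_max, isotropic=False):
    """Finite piece of the anisotropic (or isotropic) grid built from {e^k, k integer}."""
    scales = [math.exp(k) for k in range(k_min, k_max + 1)]
    if isotropic:
        return [(h,) * d for h in scales]          # all coordinates equal
    return list(itertools.product(scales, repeat=d))

def select_bandwidth(x, grid, r_hat, u_hat):
    """Return the bandwidth minimizing r_hat(x, h) + 8 * u_hat(x, h) over the grid."""
    return min(grid, key=lambda h: r_hat(x, h) + 8.0 * u_hat(x, h))

# Toy stand-ins (hypothetical): a bias-like term growing with h and a
# stochastic-error-like term decreasing with h.
def toy_r_hat(x, h):
    return sum(hj ** 2 for hj in h)

def toy_u_hat(x, h, n=10_000):
    return 1.0 / math.sqrt(n * math.prod(h))

grid = bandwidth_grid(d=2, k_min=-6, k_max=0)
h_selected = select_bandwidth((0.0, 0.0), grid, toy_r_hat, toy_u_hat)
```

Passing `isotropic=True` restricts the search to bandwidths with equal coordinates, which mirrors the second choice of the parameter set mentioned in Remark 1.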
2.2 Anisotropic Nikol’skii classes
Let denote the canonical basis of . For some function and real number define the first order difference operator with step size in direction of the variable by
[TABLE]
By induction, the -th order difference operator with step size in direction of the variable is defined as
[TABLE]
Definition 1.
For given vectors and we say that a function belongs to the anisotropic Nikolskii class {\mathbb{N}}_{\vec{r},d}\big{(}\vec{\beta},\vec{L}\big{)} if
(i)* for all ;*
(ii)* for every there exists natural number such that*
[TABLE]
If and for any the corresponding Nikolskii class, denoted furthermore , is called isotropic.
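The difference operators entering Definition 1 are easy to experiment with numerically. The sketch below assumes the standard forward difference G(x+ue_{j})-G(x) for the first order operator (the displayed formula is not available in this text) and builds higher orders by induction, as described above; the binomial closed form is included as a cross-check. The function names are hypothetical.

```python
from math import comb

def first_difference(G, x, u, j):
    """First order difference with step u in direction j: G(x + u e_j) - G(x)."""
    shifted = list(x)
    shifted[j] += u
    return G(tuple(shifted)) - G(tuple(x))

def difference(G, x, u, j, order):
    """Difference of the given order in direction j, defined by induction on the order."""
    if order == 0:
        return G(tuple(x))
    # Apply the first order operator to the difference of order - 1.
    lower = lambda y: difference(G, y, u, j, order - 1)
    return first_difference(lower, x, u, j)

def difference_closed_form(G, x, u, j, order):
    """Binomial expansion: sum over i of (-1)^(order - i) C(order, i) G(x + i u e_j)."""
    total = 0.0
    for i in range(order + 1):
        shifted = list(x)
        shifted[j] += i * u
        total += (-1) ** (order - i) * comb(order, i) * G(tuple(shifted))
    return total

# Example function: quadratic in the first variable, linear in the second.
def G(x):
    return x[0] ** 2 + 3.0 * x[1]

second_in_x0 = difference(G, (1.0, 2.0), 0.5, 0, 2)   # equals 2 * u^2 for a quadratic
third_in_x0 = difference(G, (1.0, 2.0), 0.5, 0, 3)    # vanishes for a quadratic
```

A difference of order l annihilates polynomials of degree below l in the chosen direction, which is why the order in Definition 1 must exceed the smoothness index for the condition to be informative.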
2.3 Construction of kernel
First, we recall that all results concerning the risk of the pointwise selection rule, established in Part I, are proved under the following assumption imposed on the kernel .
Assumption 6.
There exist and such that
[TABLE]
Next, we will use the following specific kernel in the definition of the estimator’s family \big{\{}\widehat{f}_{\vec{\mathrm{h}}}(\cdot),\;\vec{\mathrm{h}}\in{\cal H}^{d}\big{\}} [see, e.g., Kerkyacharian et al. (2001) or Goldenshluger and Lepski (2014)].
Let be an integer number, and let be a compactly supported continuous function satisfying , and . Put
[TABLE]
and add the following structural condition to Assumption 6.
Assumption 7**.**
**
The kernel constructed in this way is bounded, compactly supported, belongs to and satisfies . Some examples of kernels satisfying simultaneously Assumptions 6 and 7 can be found for instance in Comte and Lacour (2013).
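The moment properties of such a kernel can be verified exactly on a small example. Since the display defining the kernel is not available in this text, the sketch below assumes the construction used in Goldenshluger and Lepski (2014), namely w_{\ell}(y)=\sum_{i=1}^{\ell}\binom{\ell}{i}(-1)^{i+1}\frac{1}{i}w(y/i) with K(t)=\prod_{j=1}^{d}w_{\ell}(t_{j}). The triangular density is used for w only because its moments are available in closed form; it may not satisfy every condition imposed on w in the paper. The function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def triangular_moment(k):
    """Exact k-th moment of w(y) = max(0, 1 - |y|), a continuous compactly supported density."""
    if k % 2 == 1:
        return Fraction(0)                      # odd moments vanish by symmetry
    return Fraction(2, (k + 1) * (k + 2))

def w_ell_moment(ell, k):
    """Exact k-th moment of w_ell(y) = sum_{i=1}^{ell} C(ell, i) (-1)^(i+1) (1/i) w(y/i).

    The substitution y = i * z shows that the i-th term contributes i^k times
    the k-th moment of w."""
    weight = Fraction(0)
    for i in range(1, ell + 1):
        weight += comb(ell, i) * (-1) ** (i + 1) * Fraction(i) ** k
    return weight * triangular_moment(k)

ell = 4
moments = [w_ell_moment(ell, k) for k in range(ell + 1)]
```

For `ell = 4` the moment of order zero equals one and the moments of orders one to three vanish, while the moment of order four does not. For the product kernel the multivariate moments factorize over coordinates, so the same cancellation holds in each direction.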
2.4 Main results
Introduce the following notations: and
[TABLE]
2.4.1 Bounded case
The first problem we address is the adaptive estimation over the collection of the functional classes \big{\{}{\mathbb{N}}_{\vec{r},d}\big{(}\vec{\beta},\vec{L}\big{)}\cap\mathbb{F}_{g}(R)\cap\mathbb{B}_{\infty,d}(Q)\big{\}}_{\vec{\beta},\vec{r},\vec{L},R,Q}.\;
As conjectured in Lepski and Willer (2017), the boundedness of the function belonging to {\mathbb{N}}_{\vec{r},d}\big{(}\vec{\beta},\vec{L}\big{)}\cap\mathbb{F}_{g}(R) is a minimal condition allowing one to eliminate the inconsistency zone. The results obtained in Theorem 3 below together with those from Theorem 2 confirm this conjecture.
Theorem 3.
Let , and g\in{\mathbb{L}}_{1}\big{(}{\mathbb{R}}^{d}\big{)}, satisfying Assumption 5, be fixed. Let satisfy Assumptions 6 and 7.
1) Then for any , , , , , and there exists , independent of , such that:
[TABLE]
where is defined in (1.17).
2) For any , , , , and there exists , independent of , such that:
[TABLE]
Some remarks are in order. Our estimation procedure is completely data-driven, i.e. independent of , , and the assertions of Theorem 3 are completely new if . Comparing the results obtained in Theorems 2 and 3 we can assert that our estimator is optimally-adaptive if and nearly optimally adaptive if . The construction of an estimation procedure which would be optimally-adaptive when is an open problem, and we conjecture that the lower bounds for the asymptotics of the minimax risk found in Theorem 2 are sharp in order. This conjecture in the case is partially confirmed by the results obtained in Comte and Lacour (2013) and Rebelles (2016). Since both articles deal with the estimation of unbounded functions we will discuss them in the next section.
It is worth noting that all the previous statements are true not only for the convolution structure density model but also, in view of Theorem 2, for the observation scheme (1.3).
We note that the asymptotic of the minimax risk under partially contaminated observations, , is independent of and coincides with the asymptotic of the risk in the direct observation model, . This phenomenon was first discovered in Hesse (1995) and Yuana and Chenb (2002). In the very recent paper Lepski (2017), in the particular case , the optimally adaptive estimator was built. It is easy to check that independently of the value of and , the corresponding set of parameters belongs to the dense zone. Note however that our estimator is only nearly optimally adaptive in this zone, but it is applied to a much more general collection of functional classes. It is worth noting that the estimation procedure used in Lepski (2017) has nothing in common with our pointwise selection rule.
As to the direct observation scheme, , our results coincide with those obtained recently in Goldenshluger and Lepski (2014), when . However, for the tail zone , our bound is slightly better since the bound obtained in the latter paper contains an additional factor . It is interesting to note that although both estimator constructions are based upon local selections from the family of kernel estimators, the selection rules are different.
Let us finally discuss the results corresponding to the tail zone, . First, the lower bound for the minimax risk is given by while the accuracy provided by our estimator is
[TABLE]
As we mentioned above, the passage from to seems to be an unavoidable payment for the application of a local selection scheme. It is interesting to note that the additional factor disappears in the dimension . First, note that if the one-dimensional setting was considered in Juditsky and Lambert–Lacroix (2004) and Reynaud–Bouret et al. (2011). The setting of Juditsky and Lambert–Lacroix (2004) corresponds to , while Reynaud–Bouret et al. (2011) deal with the case of and . Both settings rule out the sparse zone. The rates of convergence found in these papers are easily recovered from our results corresponding to the tail and dense zones.
Next, we remark that the aforementioned factor appears only when anisotropic functional classes are considered. Indeed, in view of the second assertion of Theorem 3 our estimator is nearly optimally adaptive on the tail zone in the isotropic case. The natural question arising in this context, is whether the -factor is an unavoidable payment for anisotropy of the underlying function or not?
At last, we note that in the isotropic case our results remain true when the corresponding Nikol’skii class is defined in -norm on (). It is worth noting that the analysis of the proof of the theorem allows us to assert that if , for some the first statement remains true up to some logarithmic factor. However the asymptotic of the maximal risk of our estimator if for any remains unknown.
We finish our discussion with the following remark. If the assumption implies in many cases that is uniformly bounded and, therefore, Theorem 3 is applicable. In particular it is always the case if the model (1.3) is considered. Indeed f,g\in\mathfrak{P}\big{(}{\mathbb{R}}^{d}\big{)} in this case, which implies . Another case is and recall that this assumption was used in the proofs of Theorems 1 and 2, Assumption 3. We obviously have that
[TABLE]
More generally if and . Since the definition of the Nikol’skii class implies that , where and , the latter condition can be verified in particular if . All of the above explains why we study the estimation of unbounded functions only in the case .
2.4.2 Unbounded case,
The problem we address now is the adaptive estimation over the collection of functional classes \big{\{}{\mathbb{N}}_{\vec{r},d}\big{(}\vec{\beta},\vec{L}\big{)}\cap\mathbb{F}_{g,\infty}(R,Q)\big{\}}_{\vec{\beta},\vec{r},\vec{L},R,Q}.\;
As we already mentioned, if additionally then for any and, therefore, in view of Theorem 1 discussed in Section 1.3, there is no consistent estimator if either or . Analyzing the proof of the latter theorem, we come to the following assertion.
Conjecture 1.
Let and assume that Assumption 4 is fulfilled. Suppose additionally that Assumption 2 holds with . Then, the assertion of Theorem 1 remains true if one replaces {\mathbb{N}}_{\vec{r},d}\big{(}\vec{\beta},\vec{L}\big{)}\cap\mathbb{F}_{g}(R) by {\mathbb{N}}_{\vec{r},d}\big{(}\vec{\beta},\vec{L}\big{)}\cap\mathbb{F}_{g,\infty}(R,Q).
The latter result is formulated as a conjecture only because we will not prove it in the present paper. Its proof is postponed to Part III where the adaptive estimation over the collection
[TABLE]
introduced in Part I will be studied. For this reason, later on we will only consider the parameters belonging to the set defined below.
[TABLE]
For given and the latter set consists of the class parameters for which a uniform consistent estimation is possible.
Theorem 4.
Let and g\in{\mathbb{L}}_{1}\big{(}{\mathbb{R}}^{d}\big{)}, satisfying Assumption 5 be fixed and let satisfy Assumptions 6 and 7.
1) Then for any , , , \big{(}\vec{\beta},\vec{r})\in{\cal P}_{p,\vec{\mu}}\cap\big{\{}(0,\ell]^{d}\times(1,\infty]^{d}\big{\}} and there exists , independent of , such that:
[TABLE]
where is defined in (1.11).
2) For any , , , (\boldsymbol{\beta},\mathbf{r})\in{\cal P}_{p,\vec{\mu}}\cap\big{\{}(0,\ell]\times(1,\infty]\big{\}} and there exists , independent of , such that:
[TABLE]
Some remarks are in order.
Note that implies that and, therefore the Parseval identity together with Assumption 5 allows us to assert that
[TABLE]
Hence, the condition is automatically checked if and .
Also, it is worth noting that considering the adaptation over the collection of isotropic classes, we do not require that the coordinates of would be the same. The latter is true for the second assertion of Theorem 3 as well. At last, analyzing the proof of the theorem, we can assert that the second assertion remains true under the slightly weaker assumption .
The assertion of Theorem 1 has no analogue in the existing literature except the results obtained in Comte and Lacour (2013) and Rebelles (2016). Comte and Lacour (2013) deals with the particular case , while Rebelles (2016) studied the case , . It is easy to check that in both papers whatever the value of and , the corresponding set of parameters belongs to the dense zone. Note also that the estimation procedures used in Comte and Lacour (2013) as well as in Rebelles (2016), if , (both based on a global version of the Goldenshluger-Lepski method) are optimally-adaptive. They attain the asymptotic of minimax risks corresponding to the dense zone found in Theorem 1, while our method is only nearly optimally adaptive. However, it is well-known that the global selection from the family of standard kernel estimators leads to correct results only if when the -risk is considered, see, for instance Goldenshluger and Lepski (2011). On the other hand, estimation procedures based on a local selection scheme, which can be applied to the estimation of functions belonging to much more general functional classes, often do not lead to an optimally adaptive method. Fortunately, the loss of accuracy inherent to local procedures is logarithmic w.r.t. the number of observations.
Together with Theorems 1 and 2, Theorems 3 and 4 provide the full classification of the asymptotics of the minimax risks over anisotropic/isotropic Nikolskii classes for the class parameters belonging to the sparse zone and, up to some logarithmic factor, belonging to the tail and dense zones as well as the boundaries. We mean that the results of these theorems are valid for any fixed and . Indeed, for given and one can choose , and the number , used in the kernel construction (2.6), as any integer strictly larger than .
2.4.3 Open problems
Let us briefly discuss some unresolved adaptive estimation problems in the convolution structure density model.
Construction of an optimally-adaptive estimator
As we already mentioned, the proposed pointwise selection rule leads to an optimally adaptive estimator only for the class parameters belonging to the sparse zone (in both the bounded and unbounded cases). We conjecture that the construction of an optimally-adaptive estimator for all values of the nuisance parameters via pointwise selection is impossible, and other methods should be invented. It is worth noting that no optimally-adaptive estimator is known either in the density model or in density deconvolution, even in dimension 1. In dimension larger than 1, one of the intriguing questions is related to the possible price to pay for anisotropy (-factor) discussed in the remark after Theorem 3.
Adaptive estimation of unbounded functions
We were able to study the unbounded case only if . The estimation of unbounded densities under direct as well as partially contaminated observations remains an open problem. We conjecture that the results obtained in the case are not true anymore for (neither upper bounds nor lower bound), but correct (or nearly correct) upper bounds for the asymptotics of the minimax risk can still be deduced from the oracle inequalities proved in Part I.
In the case there are at least two interesting problems. First, all our results are valid under the condition . How the absence of this assumption affects the accuracy of estimation is entirely unclear. Next, let us mention that the lower bound result proved in Theorem 1 holds only for the convolution structure density model. Could the same bounds be established in the deconvolution model (1.3)?
Adjustment of “lower” and “upper bound” assumptions to each other
Comparing the assertions of Theorems 1 and 2 with those of Theorems 3 and 4, we remark that obtaining the corresponding lower bounds for the minimax risk requires additional, rather restrictive, assumptions on the function . Can they be weakened or even removed?
3 Proof of Theorems 3 and 4
The proofs are based on the application of Theorem 3 from Part I and on some auxiliary assertions presented below.
In the subsequent proofs , stand for constants that may depend on , , , , and , but are independent of and . These constants may differ from one occurrence to the next.
3.1 Important concepts from Part I and proof outline
In this section we recall the definition of some important quantities that appeared in Theorem 3 of Part I and discuss the facts which should be established to make this theorem applicable.
Theorem 3 (Part I) deals with the minimax result over a class being an arbitrary subset of defined in Section 2.3 of Part I. In Theorem 3 we will consider \mathbb{F}={\mathbb{N}}_{\vec{r},d}\big{(}\vec{\beta},\vec{L}\big{)}\cap\mathbb{B}_{\mathbf{\infty},d}(Q) and, therefore, with . This makes Theorem 3 (Part I) with applicable in this case.
In Theorem 4 we consider \mathbb{F}={\mathbb{N}}_{\vec{r},d}\big{(}\vec{\beta},\vec{L}\big{)}\cap\mathbb{F}_{g,\mathbf{\infty}}(Q). We will show that for any and one can find and such that {\mathbb{N}}_{\vec{r},d}\big{(}\vec{\beta},\vec{L}\big{)}\subset\mathbb{B}_{\mathbf{q},d}(D) and, therefore, Theorem 3 (Part I) is applicable with . The latter inclusions are mostly based on the embedding of anisotropic Nikol’skii spaces used in the proof of Proposition 3 and on Lemma 1.
Applying Theorem 3 (Part I) in the case requires computing
[TABLE]
where, we recall, F_{n}\big{(}\vec{h}\big{)}=\big{(}\ln{n}+\sum_{j=1}^{d}|\ln{h_{j}|}\big{)}^{1/2}\prod_{j=1}^{d}(nh)^{-\frac{1}{2}}_{j}(h_{j}\wedge 1)^{-\boldsymbol{\mu}_{j}(\alpha)} and is a universal constant completely determined by the kernel and the dimension .
In the next section we propose rather sophisticated constructions of the vectors and , and show, in Propositions 1 and 2, that
[TABLE]
Here is defined in (3.19), are defined in (3.20) and , where are defined in (3.22) and is given in (3.23). In Proposition 3 we prove that for any
[TABLE]
and if then additionally
[TABLE]
where and are defined in (3.16) below and is independent of . Finally, the definition of and , together with (3.2), allows us to assert, see (3.31), that
[TABLE]
where . Thus, putting
[TABLE]
we obtain in view of (3.1), (3.2) and (3.4) that
[TABLE]
To get (3.6) we have used that for all large enough and all
F_{n}\big{(}\vec{\boldsymbol{h}}(v,\mathbf{1}))\big{)}\leq C_{2}(\ln{n}/n)\prod_{j=1}^{d}(\boldsymbol{h}_{j}(v,\mathbf{1}))^{-1-2\boldsymbol{\mu}_{j}(\alpha)},
where is independent of . This follows from assertions (4.1) and (4.3) established in the proof of Proposition 1. We deduce from (3.5) and (3.6) the following bound.
[TABLE]
Moreover, if we get in view of (3.1), (3.3) and (3.4)
[TABLE]
3.2 Special set of bandwidths
The construction of bandwidths presented below, as well as the auxiliary statements from the next section, will be used not only to prove Theorems 3 and 4 but also in the considerations forming Part III of this work. For this reason we formulate them in a slightly more general form than is needed for our current purposes. Set for any
Recall that \mathbf{c}=\big{(}20d\big{)}^{-1}\big{[}\max(2c_{{\cal K}_{\ell}}\|{\cal K}_{\ell}\|_{\infty},\|{\cal K}_{\ell}\|_{1})\big{]}^{-d} and let be any number satisfying (recall that appeared in (3.2))
[TABLE]
Recall that and introduce for any , and
[TABLE]
where we have put , is complementary to and
[TABLE]
The constant will be chosen differently in accordance with some special relationships between the parameters , , , and . Determine and , from the relations
[TABLE]
and set \vec{\boldsymbol{h}}(\cdot,\mathbf{s})=\big{(}\boldsymbol{h}_{1}(\cdot,\mathbf{s}),\ldots,\boldsymbol{h}_{d}(\cdot,\mathbf{s})\big{)} and \vec{\mathfrak{h}}(\cdot,\mathbf{s})=\big{(}\mathfrak{h}_{1}(\cdot,\mathbf{s}),\ldots,\mathfrak{h}_{d}(\cdot,\mathbf{s})\big{)}.
3.3 Auxiliary statements
All the results formulated below are proved in Section 4. Let
\mathfrak{z}(v)=2\big{(}\mathfrak{a}^{-2}\delta_{n}\big{)}^{-\frac{\omega(\alpha)}{\omega(\alpha)+\mathbf{u}}}v^{\frac{\omega(\alpha)(2+1/\beta(\alpha))}{\mathbf{u}+\omega(\alpha)}},\quad\mathbf{u}\in[1,\infty],
and remark that if . Note also that
[TABLE]
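The closed form of \mathfrak{z}(v) given above can be evaluated numerically; the following is a minimal sketch only. The function name `frak_z` and all argument names are hypothetical, and the quantities \omega(\alpha), \beta(\alpha), \mathfrak{a} and \delta_{n}, which are defined elsewhere in the paper, are simply passed in as positive numbers. For \mathbf{u}=\infty both exponents in the formula tend to zero, so the sketch returns the limiting value 2.

```python
import math


def frak_z(v, a, delta_n, omega, beta, u):
    """Evaluate z(v) = 2 (a^{-2} delta_n)^{-omega/(omega+u)} v^{omega(2+1/beta)/(u+omega)}.

    All arguments are plain positive floats supplied by the caller (an
    assumption of this sketch); omega and beta stand for omega(alpha) and
    beta(alpha), and u lies in [1, infinity].
    """
    if min(v, a, delta_n, omega, beta) <= 0 or u < 1:
        raise ValueError("expected positive inputs and u >= 1")
    if math.isinf(u):
        # Both exponents vanish in the limit u -> infinity.
        return 2.0
    base = delta_n / a ** 2
    exp_delta = -omega / (omega + u)
    exp_v = omega * (2.0 + 1.0 / beta) / (u + omega)
    return 2.0 * base ** exp_delta * v ** exp_v
```

For instance, `frak_z(4.0, 1.0, 1.0, 1.0, 1.0, 1.0)` evaluates 2 · 4^{3/2}; the inputs here are arbitrary illustrative numbers, not values taken from the paper.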
Introduce the following notations: ,
[TABLE]
Recall that and define
[TABLE]
Set if and let if . Put finally .
Proposition 1.
Let , , , , and be given. Assume that . Then,
1) there exists independent of such that for all large enough
[TABLE]
2) there exists independent of and such that for all large enough
[TABLE]
if either or , .
Remark 2.
Note that if , the condition simply means , since . On the other hand if this condition holds if whatever the values of and , since . Also, note that
[TABLE]
Indeed, since for any we have
[TABLE]
and (3.21) follows. To get the last inequality we have used that and that is strictly decreasing, so . In particular we deduce from (3.21) that the condition is always fulfilled in the case .
Recall that is defined in (3.19) and introduce the following quantities.
[TABLE]
where Define also
[TABLE]
Note that , if (this will be proved in Proposition 2 below). However if . As shown in the proof of Proposition 1, see (4.11), for all large enough. Also , if . Lastly , since . Moreover if . Introduce finally
[TABLE]
Proposition 2.
Let , , , , and be given and let , . Then, there exists independent of and such that for all large enough
[TABLE]
In the current paper we will use the statements of Propositions 1 and 2 only with . In this context we remark that .
Proposition 3.
Let , and satisfying Assumption 7 be fixed. Then for any , and one can find independent of such that (3.2) holds. If additionally then (3.3) is fulfilled as well. At last, (3.2) and (3.3) remain true if one replaces the quantity by .
The quantities and are introduced in Part I, but the reader can also find them in the proof of the proposition. Let us also present the following auxiliary results, which will be useful in the sequel. Their proofs are postponed to the Appendix.
Lemma 1.
For any
[TABLE]
Let and . Then there exists such that
[TABLE]
We finish this section with the following observations which will be useful in the sequel.
If one has
[TABLE]
If one has
[TABLE]
3.4 Concluding remarks
Let us collect some bounds for several terms appearing in Theorem 3 (Part I) and used in the proofs of Theorems 3 and 4 simultaneously.
First we remark that \boldsymbol{h}_{j}(\cdot,\mathbf{1})\equiv\boldsymbol{h}_{j}(\cdot,\mathbf{\infty})\equiv\mathfrak{h}_{j}(\cdot,\mathbf{\infty})\leq\big{(}\boldsymbol{L}L_{j}^{-1}\big{)}^{\frac{1}{\beta_{j}}}, . Then, (3.4) follows from (3.2) and (3.9) because for any and
[TABLE]
We deduce from the definition of that
[TABLE]
It yields together with (3.7) and the definitions of and , choosing ,
[TABLE]
After elementary computations and taking into account (3.28), we obtain
[TABLE]
These bounds are not surprising because if . At last, if , we get from (3.8) thanks to the definition of and the presentation proved in (4.6) with
[TABLE]
At last, choosing , we obtain \ell_{\mathbb{H}}(\underline{\boldsymbol{v}})\leq c_{6}\delta_{n}^{\frac{p-1}{1-1/\omega(\alpha)+1/\beta(\alpha)}}\big{(}\ln{n}\big{)}^{t(\mathbb{H})}, which yields by (3.28), (3.29) and (3.30):
[TABLE]
3.5 Proof of Theorem 3
As already mentioned, we will apply Theorem 3 (Part I) with , , and .
Consider the cases or .
Choose and remark that the statements of Propositions 1 and 2 hold for any . Indeed, it suffices to note that , because and if since in this case by (3.25). Then we can apply all the bounds obtained above, and in particular we get from (3.5)
[TABLE]
since in both considered cases in view of the second equality in (3.28) and of (3.30). Applying the third assertion of Theorem 3 (Part I), we obtain from (3.4), (3.33), (3.36) and (3.35)
\displaystyle{\sup_{f\in{\mathbb{N}}_{\vec{r},d}\big{(}\vec{\beta},\vec{L}\big{)}\cap\mathbb{F}_{g}(R)}}{\cal R}^{(p)}_{n}[\widehat{f}_{\vec{\mathbf{h}}(\cdot)},f]\leq C\bigg{[}(c_{2}+c_{3}+c_{5}+c_{6})\mathfrak{b}^{p}_{n}(\mathbb{H})\delta_{n}^{p\rho(\alpha)}\bigg{]}^{\frac{1}{p}}\leq c_{7}\mathfrak{b}_{n}(\mathbb{H})\delta_{n}^{\rho(\alpha)},
and the assertion of Theorem 3 follows in both considered cases.
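The last inequality in the display above uses only the homogeneity of x\mapsto x^{1/p} on nonnegative numbers. As a short worked step (the explicit expression for c_{7} is one admissible choice read off from the display, not a formula stated in the text):

```latex
\Big[(c_{2}+c_{3}+c_{5}+c_{6})\,\mathfrak{b}^{p}_{n}(\mathbb{H})\,\delta_{n}^{p\rho(\alpha)}\Big]^{\frac{1}{p}}
=(c_{2}+c_{3}+c_{5}+c_{6})^{\frac{1}{p}}\,\mathfrak{b}_{n}(\mathbb{H})\,\delta_{n}^{\rho(\alpha)},
\qquad\text{so one may take}\quad
c_{7}=C\,(c_{2}+c_{3}+c_{5}+c_{6})^{\frac{1}{p}}.
```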
Consider the case .
Choose and remark that the statements of Propositions 1 and 2 hold for any . Indeed, implies and, therefore, . We deduce from (3.4), (3.33), (3.34) and (3.35), applying the first assertion of Theorem 3 (Part I), that
[TABLE]
Here we have also used (3.30). This completes the proof of Theorem 3.
3.6 Proof of Theorem 4
In the following we assume , since implies by definition of the anisotropic Nikol’skii class that {\mathbb{N}}_{\vec{r},d}\big{(}\vec{\beta},\vec{L}\big{)}\subset\mathbb{B}_{\infty,d}(L_{\infty}). Hence, the results in that case follow from Theorem 3 since when .
Moreover, we remark that the imposed condition implies in view of (3.21) proved in Remark 2. This, first, makes the second assertion of Proposition 1 applicable.
Next, it allows us (recall that and ) to rewrite the quantity appearing in Proposition 2 as
Consider the case .
Taking into account that we remark that in view of Nikol’skii (1977) [Theorem 6.9.1, Section 6.9] {\mathbb{N}}_{\vec{r},d}\big{(}\vec{\beta},\vec{L}\big{)}\subset\mathbb{B}_{p^{*},d}(c_{9}L_{\infty}), where is independent of . Thus, Theorem 3 (Part I) is applicable with , and . Choose and remark that the statements of Propositions 1 and 2 hold since . The assertion of the theorem is obtained from (3.4), (3.33), (3.34), (3.35), (3.29) and the first assertion of Theorem 3 (Part I) by the same computations that led to (3.37).
Consider the case . Recall that in this case because it is necessary for the existence of a uniformly consistent estimator. Since the definition of the anisotropic Nikol’skii class implies that {\mathbb{N}}_{\vec{r},d}\big{(}\vec{\beta},\vec{L}\big{)}\subset\mathbb{B}_{p^{*},d}(L_{\infty}), we conclude that the second assertion of Theorem 3 (Part I) is applicable with , and . Choose and note that in the considered case. Thus, we deduce from (3.4), (3.33), (3.35) and (3.29)
[TABLE]
and the assertion of the theorem follows in this case.
It remains to study the case . Let be an arbitrary number satisfying (3.27) of Lemma 1. Since and we can assert in view of Nikol’skii (1977) [Theorem 6.9.1, Section 6.9] that {\mathbb{N}}_{\vec{r},d}\big{(}\vec{\beta},\vec{L}\big{)}\subset\mathbb{B}_{s,d}(c_{9}L_{\infty}), where is independent of . Thus, Theorem 3 (Part I) is applicable with , and . Choosing , we deduce from (3.4), (3.33), (3.35) and from the second assertion of Theorem 3 (Part I)
[TABLE]
Since either , or , and we get
[TABLE]
Simple algebra shows that
[TABLE]
Using again and we obtain
[TABLE]
since satisfies (3.27) of Lemma 1. Thus, we have for all large enough
and the assertion of the theorem in the case follows from (3.38) and the first equality in (3.28). Theorem 4 is proved.
4 Proofs of Propositions 1, 2 and 3
The proof of Lemma 2 is postponed to the Appendix.
Lemma 2.
For any , , , and the following is true.
1/\gamma(\alpha)-1/\beta(\alpha)=\big{[}\tau(\infty)\beta(0)\big{]}^{-1}\big{[}1/\omega(\alpha)-1/\upsilon(\alpha)\big{]}.
4.1 Proof of Proposition 1
We start the proof with several remarks which will be useful in the sequel. First, obviously there exists 0<\mathbf{T}:=T\big{(}\vec{\beta},\vec{r},\vec{\mu},p\big{)}<\infty independent of such that
[TABLE]
Next, for any and any
[TABLE]
- Let us proceed to the proof of the first assertion. First we remark that for all
[TABLE]
Indeed for any we have since ,
[TABLE]
Therefore, for any one has in view of the definition of
[TABLE]
Note that for any
[TABLE]
and the proof of (4.3) is completed since by construction.
Set T_{0}=\big{[}\mathbf{T}+2\big{]}\;e^{d+2\sum_{j=1}^{d}\mu_{j}(\alpha)}\boldsymbol{L}^{-\frac{1}{\beta(\alpha)}} and remark that in view of (4.1), (4.2) and (4.3) for all large enough and any
[TABLE]
Here we have taken into account that . Since
[TABLE]
denoting we assert that
[TABLE]
The first assertion is established.
- Before proving the second assertion, let us make several remarks.
For any the following is true.
[TABLE]
The first equality follows directly from the definition of since, we recall, if . Let us now prove the second equality. We have
[TABLE]
Here we have used that for any . Using the definition of we get
[TABLE]
Using the definition of we obtain
[TABLE]
We obtain applying Lemma 2
[TABLE]
The second formula in (4.6) is established.
Next, let us prove that
[TABLE]
If , which is equivalent to , the definition of implies that for all large enough, since and in view of (3.25). We deduce from the first equality in (4.6)
[TABLE]
and (4.7) is proved for any .
It remains to note that since and therefore, if we have
[TABLE]
for all large enough in view of (3.25) of Lemma 1, the second equality in (4.6) and since . This completes the proof of (4.7).
For any one has
[TABLE]
where we have denoted T\big{(}\alpha\big{)}=\inf_{\vec{L}\in[L_{0},L_{\infty}]^{d}}\prod_{j\in{\cal J}_{\infty}}\big{(}\boldsymbol{L}L_{j}^{-1}\big{)}^{\frac{1+2\boldsymbol{\mu}_{j}(\alpha)}{\beta_{j}}}\prod_{j\in\bar{{\cal J}}_{\infty}}\big{(}\boldsymbol{L}L_{j}^{-1}\big{)}^{\frac{1+2\boldsymbol{\mu}_{j}(\alpha)}{\gamma_{j}}}.
Indeed, we have in view of (4.6) and the definition of
[TABLE]
where we have put Note that for any
[TABLE]
and (4.8) and (4.9) are established.
Simple algebra shows that for any
[TABLE]
and we deduce from (4.8) for any (recall that if )
[TABLE]
Let us also prove that for any and all large enough
[TABLE]
The latter inclusion follows from (3.19). Indeed, if then . If
[TABLE]
in view of (3.25), so . Note at last that for any
[TABLE]
Let us proceed to the proof of the second assertion. Let us choose . We have in view of (4.1), (4.8) and (4.10) similarly to (4.5)
[TABLE]
Thus to prove the assertion all we need to show is that , i.e. G_{n}\big{(}\vec{\mathfrak{h}}(\mathbf{v},\mathbf{u})\big{)}\leq a\mathbf{v}. Let us distinguish three cases.
Let We remark that the definition of in this case yields for all large enough and we obtain from (4.10) and (4.11) that
[TABLE]
Then we have in view of (4.1), (4.7), (4.8) and (4.14) similarly to (4.5)
[TABLE]
Let and . Then by assumption , and thus . We get from (4.10) and (4.12)
[TABLE]
so G_{n}\big{(}\vec{\mathfrak{h}}(\mathbf{v},\mathbf{u})\big{)}\leq a\mathbf{v} follows from (4.15) and (4.13).
Let , . We have as previously
[TABLE]
Here we have used (4.10) and put . Our goal now is to show that for any and all large enough
[TABLE]
In view of (4.9) and of the definition of in order to establish (4.18) it suffices to show that .
Since we assumed and , necessarily since and is strictly decreasing. Hence, the required result follows from (3.26). Thus, (4.18) is proved. Then, choosing such that , we obtain from (4.17) and (4.18) that for all large enough
[TABLE]
The second assertion is proved.
4.2 Proof of Proposition 2
We start the proof with several remarks which will be useful in the sequel.
Let us show that for all large enough
[TABLE]
In view of the definition of ,
[TABLE]
Therefore, for any one has, taking into account that ,
[TABLE]
It remains to note that for all large enough and, therefore,
[TABLE]
We also have in view of the definition of ,
[TABLE]
for any . This together with (4.21) proves (4.19) in the cases when .
Noting that is equivalent to , we deduce from (4.20) for any
[TABLE]
Thus, if then for any
[TABLE]
This together with (4.21) yields (4.19) in the case , whatever the value of .
Let .
Then and we have for any and in view of the definition of
[TABLE]
in view of (3.25). Hence, (4.19) holds in this case.
Let .
Then and we have for any in view of the definition of
[TABLE]
and, therefore (4.19) holds in this case.
Let . First we note that and imply
[TABLE]
since either or and \tau\big{(}\mathbf{u}^{*}\vee p^{*}\big{)}=0. Thus and, therefore, for any
[TABLE]
Note that 1-\mathbf{u}/\omega(0)+1/\beta(0)=\varkappa_{0}(p^{*},\mathbf{u})\big{[}1/\mathbf{u}+1/\omega(0)\big{]}-(\mathbf{u}-p^{*})\big{[}1/\mathbf{u}+1/\omega(0)\big{]} and, therefore
[TABLE]
which yields \mathbf{v}_{1}^{-\varkappa_{0}(p^{*},\mathbf{u})}\leq\big{\{}\mathfrak{a}^{-2}\delta_{n}\big{\}}^{-\frac{\mathbf{u}\omega(0)}{\mathbf{u}+\omega(0)}}.
It remains to note that if then and, therefore . This implies and and this case has already been treated. This completes the proof of (4.19).
Remark that there obviously exists 0<\mathbf{S}:=S\big{(}\vec{\beta},\vec{r},\vec{\mu},p\big{)}<\infty independent of such that
[TABLE]
Hence, in view of (4.19) one has for all large enough and
[TABLE]
Taking into account that and setting S_{0}=\big{[}\mathbf{S}+2\big{]}\;e^{d+2\sum_{j=1}^{d}\mu_{j}}\boldsymbol{L}^{-\frac{1}{\beta(1)}} we obtain from (4.2) for any and
[TABLE]
From now on we choose . It yields in view of (4.22) and (4.23)
[TABLE]
Since (4.24) holds, to finish the proof of Proposition 2 all we need to show is that G_{n}\big{(}\vec{\mathfrak{h}}(\mathbf{v},\mathbf{u})\big{)}\leq a\mathbf{v},\;\;\forall v\in{\cal I}_{\mathbf{u}}(\alpha). Let us distinguish three cases.
Let or . First we note that in these cases . Next in view of the second inequality in (4.22), (4.19), (4.23) and (4.24) we obtain
[TABLE]
To get the last inequality we have used that , and .
Let . We have in view of the second inequality in (4.22) and (4.23)
[TABLE]
For any , simple algebra shows that v\mathfrak{z}^{-1}(v)=\big{\{}\mathfrak{a}^{-2}\delta_{n}\big{\}}^{\frac{\mathbf{u}\omega(0)}{\mathbf{u}+\omega(0)}}v^{\frac{\mathbf{u}-\omega(0)-\omega(0)/\beta(0)}{\mathbf{u}+\omega(0)}}, and since , which implies , the result follows from
[TABLE]
Let . We have in view of the second inequality in (4.22) and (4.23)
[TABLE]
where we have denoted .
Our goal now is to show that for all large enough
[TABLE]
We easily compute for any
[TABLE]
Denoting the right hand side of the obtained inequality by we obviously have
[TABLE]
where . Remarking that we easily compute that for any
[TABLE]
Moreover we obviously have
[TABLE]
Consider the case . Here .
If then and we deduce from (4.31)
[TABLE]
thanks to (4.30). If the definition of implies that
[TABLE]
Both last results together with (4.29) and (4.30) prove (4.27) in the case .
Consider the case . Here .
If then . Moreover since if and if . Hence in view of (3.26) of Lemma 1
[TABLE]
We have in view of the definition of
[TABLE]
Note that,
[TABLE]
To get the last inequality we have used that
[TABLE]
Thus, we conclude that which together with (4.30) implies (4.27) in the considered case.
If then . Moreover . We have in view of the definition of
[TABLE]
After routine computations we come to the following equality
[TABLE]
Hence, for all large enough, which together with (4.30) allows us to assert (4.27) in the considered case.
Consider the case . Here .
If the required result follows from (4.32).
If then by (4.31) is strictly increasing and, therefore,
[TABLE]
in view of (4.33). This completes the proof of (4.27).
Finally to conclude in the case , choosing , we deduce from (4.26) and (4.27) that for all large enough
G_{n}\big{(}\vec{\boldsymbol{h}}(v,\mathbf{u})\big{)}\leq\sqrt{S_{1}}\mathfrak{a}av\leq av,\quad v\in{\cal I}_{\mathbf{u}}(1).
4.3 Proof of Proposition 3
In view of Lemma 5 in Lepski (2015), if then
[TABLE]
where is independent of . Note also that for any .
Let \big{(}\vec{\pi},\vec{s}\big{)} be either \big{(}\vec{\beta},\vec{r}\big{)} or \big{(}\vec{\gamma},\vec{q}\big{)}; without further mention, the pair \big{(}\vec{\gamma},\vec{q}\big{)} is used below only under the condition . We obviously have for any
[TABLE]
For we have
[TABLE]
The last equality follows from the definition of the -th order difference operator (2.4). Hence, for any we have in view of the definition of the Nikol’skii class (recall that )
[TABLE]
This yields for any
[TABLE]
and the first and the second assertions of the proposition are proved for any .
Let . Choosing from the relation (recall that ), we have for any
[TABLE]
By the monotone convergence theorem and the triangle inequality, we have
[TABLE]
By the Minkowski inequality for integrals [see, e.g., (Folland, 1999, Section 6.3)], we obtain
[TABLE]
Taking into account that f\in{\mathbb{N}}_{\vec{r},d}\big{(}\vec{\beta},\vec{L}\big{)} and (4.36), we have for any
[TABLE]
This proves the first and the second assertions of the proposition for any .
Set \mathbb{F}={\mathbb{N}}_{\vec{r},d}\big{(}\vec{\beta},\vec{L}\big{)} and recall that
\mathbf{B}^{*}_{j,s_{j},\mathbb{F}}(\mathbf{h}):=\displaystyle{\sup_{f\in\mathbb{F}}\sum_{h\in{\cal H}:\>h\leq\mathbf{h}}}\bigg{\|}\int_{{\mathbb{R}}}{\cal K}_{\ell}(u)\big{[}f\big{(}x+uh\mathbf{e}_{j}\big{)}-f(x)\big{]}\nu_{1}({\rm d}u)\bigg{\|}_{s_{j}}\leq\sup_{f\in\mathbb{F}}\sum_{h\in{\cal H}:\>h\leq\mathbf{h}}\big{\|}b_{h,f,j}\big{\|}_{s_{j}}.
Hence, the third assertion follows from (4.37) and (4.38).
5 Appendix
5.1 Proof of Lemma 1
Note that
[TABLE]
and (3.25) follows. On the other hand we have
[TABLE]
and (3.26) is checked if since . If and then we note first that necessarily since and is strictly decreasing. Hence and
[TABLE]
and (3.26) is established.
Let us prove (3.27). First we note that (3.27) is obvious if because in this case for any . Thus, from now on we will assume that .
Next, if then (3.27) holds. Indeed, in this case implies . Hence any number from the interval \big{(}p^{*}\vee(X+1)/Y,\mathbf{u}^{*}\big{)} satisfies (3.27). At last, note that if we have
[TABLE]
since in view of for any . The obtained contradiction completes the proof of (3.27).
5.2 Proof of Lemma 2
Indeed,
[TABLE]
Moreover, in view of the latter inequality
[TABLE]
It remains to note that 1-\big{[}\tau(p_{\pm})\beta(0)p_{\pm}\big{]}^{-1}=\tau(\infty)/\tau(p_{\pm}) and the lemma follows.
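The identity invoked in the last step can be checked in one line. The sketch below assumes the definition \tau(s)=1-1/\omega(0)+1/(s\beta(0)), which is not restated in this section and should be compared with the definitions given earlier in the paper; under that assumption, with s=p_{\pm} and \tau(s)\neq 0:

```latex
\tau(s)-\frac{1}{s\beta(0)}=1-\frac{1}{\omega(0)}=\tau(\infty)
\quad\Longrightarrow\quad
1-\frac{1}{\tau(s)\beta(0)s}
=\frac{\tau(s)-\frac{1}{s\beta(0)}}{\tau(s)}
=\frac{\tau(\infty)}{\tau(s)}.
```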
References
- Akakpo, N. (2012). Adaptation to anisotropy and inhomogeneity via dyadic piecewise polynomial selection. Math. Methods Statist. 21, 1–28.
- Birgé, L. (2008). Model selection for density estimation with {\mathbb{L}}_{2}-loss. arXiv:0808.1416v2, http://arxiv.org
- Comte, F., Rozenholc, Y. and Taupin, M.-L. (2006). Penalized contrast estimator for adaptive density deconvolution. Canad. J. Statist. 34, 3, 431–452.
- Comte, F. and Lacour, C. (2013). Anisotropic adaptive kernel deconvolution. Ann. Inst. H. Poincaré Probab. Statist. 49, 2, 569–609.
- Fan, J. (1991). On the optimal rates of convergence for nonparametric deconvolution problems. Ann. Statist. 19, 3, 1257–1272.
- Fan, J. (1993). Adaptively local one-dimensional subproblems with application to a deconvolution problem. Ann. Statist. 21, 2, 600–610.
- Fan, J. and Koo, J. (2002). Wavelet deconvolution. IEEE Trans. Inform. Theory 48, 734–747.
