Estimation in the convolution structure density model. Part I: oracle inequalities

Oleg Lepski; Thomas Willer

arXiv:1704.04418·math.ST·April 17, 2017



TL;DR

This paper develops a new pointwise selection rule for kernel estimators in the convolution structure density model, establishing oracle inequalities and demonstrating near-optimal adaptive minimax estimation under $L_p$-loss.

Contribution

It introduces a novel pointwise selection rule for kernel estimators in the convolution structure density model, with proven oracle inequalities and adaptive minimax optimality results.

Findings

01

Established $L_p$-norm oracle inequalities for the selected estimator.

02

Proved the proposed selection rule yields nearly optimal adaptive estimators.

03

Fully characterized the minimax risk behavior over anisotropic Nikol'skii classes.

Abstract

We study the problem of nonparametric estimation under ${\mathbb{L}}_{p}$-loss, $p \in [1, \infty)$, in the framework of the convolution structure density model on ${\mathbb{R}}^{d}$. This observation scheme is a generalization of two classical statistical models, namely density estimation under direct and indirect observations. In Part I the original pointwise selection rule from a family of "kernel-type" estimators is proposed. For the selected estimator, we prove an ${\mathbb{L}}_{p}$-norm oracle inequality and several of its consequences. In Part II the problem of adaptive minimax estimation under ${\mathbb{L}}_{p}$-loss over the scale of anisotropic Nikol'skii classes is addressed. We fully characterize the behavior of the minimax risk for different relationships between regularity parameters and norm indexes in the definitions of the functional class and of the risk. We prove that the selection rule proposed in Part I leads…

Equations

\mathfrak{p}=(1-\alpha)f+\alpha[f\star g],\quad f\in\mathbb{F}_{g}(R),\;\alpha\in[0,1],

\big{[}f\star g\big{]}(x)=\int_{{\mathbb{R}}^{d}}f(x-z)g(z)\nu_{d}({\rm d}z),\;\;x\in{\mathbb{R}}^{d},


\mathbb{F}_{g}(R)=\Big{\{}f\in\mathbb{B}_{1,d}(R):\;(1-\alpha)f+\alpha[f\star g]\in\mathfrak{P}\big{(}{\mathbb{R}}^{d}\big{)}\Big{\}}.


Z_{i}=X_{i}+\epsilon_{i}Y_{i},\quad i=1,\ldots,n,

{\cal R}^{(p)}_{n}[\hat{f},f]:=\Big{(}\mathbb{E}_{f}\|\hat{f}-f\|_{p}^{p}\Big{)}^{1/p},\;p\in[1,\infty),


{\cal R}^{(p)}_{n}\big{[}\hat{f}_{\hat{\mathfrak{t}}(\cdot)};f\big{]}\leq C_{1}\Big{\|}\inf_{\mathfrak{t}\in\mathfrak{T}}A_{n}\left(f,\mathfrak{t},\cdot\right)\Big{\|}_{p}+C_{2}n^{-\frac{1}{2}},\quad\forall f\in{\mathbb{L}}_{p}\big{(}{\mathbb{R}}^{d}\big{)}.


\phi_{n}(\mathbb{F}):=\inf_{\tilde{f}_{n}}{\cal R}^{(p)}_{n}\big{[}\tilde{f}_{n};\mathbb{F}\big{]}.


\limsup_{n\to\infty}\phi^{-1}_{n}(\mathbb{F}_{\vartheta}){\cal R}^{(p)}_{n}\big{[}\hat{f}_{n};\mathbb{F}_{\vartheta}\big{]}<\infty,\;\;\forall\vartheta\in\Theta?


R_{n}\big{(}\mathbb{F}_{\vartheta}\big{)}=\sup_{f\in\mathbb{F}_{\vartheta}}\Big{\|}\inf_{\mathfrak{t}\in\mathfrak{T}}A_{n}\left(f,\mathfrak{t},\cdot\right)\Big{\|}_{p}+n^{-\frac{1}{2}},\quad\vartheta\in\Theta.


\limsup_{n\to\infty}R^{-1}_{n}\big{(}\mathbb{F}_{\vartheta}\big{)}{\cal R}^{(p)}_{n}\big{[}\hat{f}_{\hat{\mathfrak{t}}(\cdot)};\mathbb{F}_{\vartheta}\big{]}<\infty.


\liminf_{n\to\infty}R_{n}\big{(}\mathbb{F}_{\vartheta}\big{)}\phi^{-1}_{n}(\mathbb{F}_{\vartheta})<\infty,


\mathbb{F}_{\vartheta}={\mathbb{N}}_{\vec{r},d}\big{(}\vec{\beta},\vec{L}\big{)}\cap\mathbb{F}_{g}(R),\;\;\vartheta=\big{(}\vec{\beta},\vec{r},\vec{L},R\big{)},


\big{|}1-\alpha+\alpha\check{g}(t)\big{|}\geq\varepsilon,\quad\forall t\in{\mathbb{R}}^{d};


\big{|}\check{g}(t)\big{|}\geq\Upsilon_{0}\prod_{j=1}^{d}(1+t_{j}^{2})^{-\frac{\mu_{j}}{2}},\quad\forall t=(t_{1},\ldots,t_{d})\in{\mathbb{R}}^{d}.

\Upsilon_{0}\prod_{j=1}^{d}(1+t_{j}^{2})^{-\frac{\mu_{j}}{2}}\leq\big{|}\check{g}(t)\big{|}\leq\Upsilon\prod_{j=1}^{d}(1+t_{j}^{2})^{-\frac{\mu_{j}}{2}},\quad\forall t\in{\mathbb{R}}^{d}.

\displaystyle\int_{{\mathbb{R}}^{d}}\big{|}\check{K}(t)\big{|}\prod_{j=1}^{d}(1+t^{2}_{j})^{\frac{\boldsymbol{\mu}_{j}(\alpha)}{2}}{\rm d}t\leq\mathbf{k}_{1},\quad\int_{{\mathbb{R}}^{d}}\big{|}\check{K}(t)\big{|}^{2}\prod_{j=1}^{d}(1+t^{2}_{j})^{\boldsymbol{\mu}_{j}(\alpha)}{\rm d}t\leq\mathbf{k}^{2}_{2}.


K_{\vec{h}}(t)=V^{-1}_{\vec{h}}K\big{(}t_{1}/h_{1},\ldots,t_{d}/h_{d}\big{)},\;t\in{\mathbb{R}}^{d},\quad V_{\vec{h}}=\prod_{j=1}^{d}h_{j}.


\displaystyle K_{\vec{h}}(y)=(1-\alpha)M\big{(}y,\vec{h}\big{)}+\alpha\int_{{\mathbb{R}}^{d}}g(t-y)M\big{(}t,\vec{h}\big{)}{\rm d}t,\quad y\in{\mathbb{R}}^{d}.


\displaystyle\widehat{U}_{n}\big{(}x,\vec{\mathrm{h}}\big{)}=\sqrt{\frac{2\lambda_{n}\big{(}\vec{\mathrm{h}}\big{)}\widehat{\sigma}^{2}\big{(}x,\vec{\mathrm{h}}\big{)}}{n}}+\frac{4M_{\infty}\lambda_{n}\big{(}\vec{\mathrm{h}}\big{)}}{3n\prod_{j=1}^{d}\mathrm{h}_{j}(\mathrm{h}_{j}\wedge 1)^{\boldsymbol{\mu}_{j}(\alpha)}},\quad\widehat{\sigma}^{2}\big{(}x,\vec{\mathrm{h}}\big{)}=\frac{1}{n}\sum_{i=1}^{n}M^{2}\big{(}Z_{i}-x,\vec{\mathrm{h}}\big{)};


\displaystyle\lambda_{n}\big{(}\vec{\mathrm{h}}\big{)}=4\ln(M_{\infty})+6\ln{(n)}+(8p+26)\sum_{j=1}^{d}\big{[}1+\boldsymbol{\mu}_{j}(\alpha)\big{]}\big{|}\ln(\mathrm{h}_{j})\big{|};

\displaystyle M_{\infty}=\big{[}(2\pi)^{-d}\big{\{}\varepsilon^{-1}\big{\|}\check{K}\big{\|}_{1}\mathrm{1}_{\alpha\neq 1}+\Upsilon_{0}^{-1}\mathbf{k}_{1}\mathrm{1}_{\alpha=1}\big{\}}\big{]}\vee 1.

\displaystyle\widehat{{\cal R}}_{\vec{h}}(x)=\sup_{\vec{\eta}\in\mathbb{H}}\Big{[}\big{|}\widehat{f}_{\vec{h}\vee\vec{\eta}}(x)-\widehat{f}_{\vec{\eta}}(x)\big{|}-4\widehat{U}_{n}\big{(}x,\vec{h}\vee\vec{\eta}\big{)}-4\widehat{U}_{n}\big{(}x,\vec{\eta}\big{)}\Big{]}_{+};


\displaystyle\widehat{U}^{*}_{n}\big{(}x,\vec{h}\big{)}=\sup_{\vec{\eta}\in\mathbb{H}:\;\vec{\eta}\geq\vec{h}}\widehat{U}_{n}\big{(}x,\vec{\eta}\big{)},

\displaystyle\vec{\mathbf{h}}(x)=\arg\inf_{\vec{h}\in\mathbb{H}}\Big{[}\widehat{{\cal R}}_{\vec{h}}(x)+8\widehat{U}^{*}_{n}\big{(}x,\vec{h}\big{)}\Big{]}.


\displaystyle U^{*}_{n}\big{(}x,\vec{h}\big{)}=\sup_{\vec{\eta}\in{\cal H}^{d}:\;\vec{\eta}\geq\vec{h}}U_{n}\big{(}x,\vec{\eta}\big{)},\qquad S_{\vec{h}}(x,f)=\int_{{\mathbb{R}}^{d}}K_{\vec{h}}(t-x)f(t)\nu_{d}({\rm d}t);


\displaystyle U_{n}\big{(}x,\vec{\eta}\big{)}=\sqrt{\frac{2\lambda_{n}\big{(}\vec{\eta}\big{)}\sigma^{2}\big{(}x,\vec{\eta}\big{)}}{n}}+\frac{4M_{\infty}\lambda_{n}\big{(}\vec{\eta}\big{)}}{3n\prod_{j=1}^{d}\eta_{j}(\eta_{j}\wedge 1)^{\boldsymbol{\mu}_{j}(\alpha)}},\quad\sigma^{2}\big{(}x,\vec{\eta}\big{)}=\int_{{\mathbb{R}}^{d}}M^{2}\big{(}t-x,\vec{\eta}\big{)}\mathfrak{p}(t)\nu_{d}({\rm d}t).


B^{*}_{\vec{h}}(x,f)=\sup_{\vec{\eta}\in\mathbb{H}}\big{|}S_{\vec{h}\vee\vec{\eta}}(x,f)-S_{\vec{\eta}}(x,f)\big{|},\qquad B_{\vec{h}}(x,f)=\big{|}S_{\vec{h}}(x,f)-f(x)\big{|}.


\forall f\in\mathbb{F}_{g}(R),\quad{\cal R}^{(p)}_{n}[\widehat{f}_{\vec{\mathbf{h}}(\cdot)},f]\leq\Big{\|}\inf_{\vec{h}\in\mathbb{H}}\Big{\{}2B^{*}_{\vec{h}}(\cdot,f)+B_{\vec{h}}(\cdot,f)+49U^{*}_{n}\big{(}\cdot,\vec{h}\big{)}\Big{\}}\Big{\|}_{p}+\mathbf{C}_{p}n^{-\frac{1}{2}}.


{\cal H}^{d}_{\text{isotr}}:=\big{\{}\vec{h}\in{\cal H}^{d}:\;\vec{h}=(h,\ldots,h),\;\;h\in{\cal H}\big{\}}.


B^{*}_{\vec{h}}(\cdot,f)\leq 2\sup_{\vec{\eta}\in{\cal H}^{d}_{\text{isotr}}\;:\;\eta\leq h}B_{\vec{\eta}}(\cdot,f)

{\cal R}^{(p)}_{n}[\widehat{f}_{\vec{\mathbf{h}}(\cdot)},f]\leq\bigg{\|}\inf_{\vec{h}\in{\cal H}^{d}_{\text{isotr}}}\bigg{\{}5\sup_{\vec{\eta}\in{\cal H}^{d}_{\text{isotr}}\;:\;\eta\leq h}B_{\vec{\eta}}(\cdot,f)+49U^{*}_{n}\big{(}\cdot,\vec{h}\big{)}\bigg{\}}\bigg{\|}_{p}+\mathbf{C}_{p}n^{-\frac{1}{2}},\quad\forall f\in\mathbb{F}_{g}(R).


\Big{\|}\inf_{\vec{h}\in\mathbb{H}}\Big{\{}2B^{*}_{\vec{h}}(\cdot,f)+B_{\vec{h}}(\cdot,f)+49U^{*}_{n}\big{(}\cdot,\vec{h}\big{)}\Big{\}}\Big{\|}_{p}


Taxonomy

Topics: Statistical Methods and Inference · Mathematical Approximation and Integration · Risk and Portfolio Optimization

Full text

Estimation in the convolution structure density model. Part I: oracle inequalities.

O.V. Lepski and T. Willer

Aix Marseille Univ, CNRS, Centrale Marseille, I2M, Marseille, France

Institut de Mathématique de Marseille

Aix-Marseille Université

39, rue F. Joliot-Curie

13453 Marseille, France

Abstract

We study the problem of nonparametric estimation under ${\mathbb{L}}_{p}$-loss, $p\in[1,\infty)$, in the framework of the convolution structure density model on ${\mathbb{R}}^{d}$. This observation scheme is a generalization of two classical statistical models, namely density estimation under direct and indirect observations. In Part I the original pointwise selection rule from a family of "kernel-type" estimators is proposed. For the selected estimator, we prove an ${\mathbb{L}}_{p}$-norm oracle inequality and several of its consequences. In Part II the problem of adaptive minimax estimation under ${\mathbb{L}}_{p}$-loss over the scale of anisotropic Nikol’skii classes is addressed. We fully characterize the behavior of the minimax risk for different relationships between regularity parameters and norm indexes in the definitions of the functional class and of the risk. We prove that the selection rule proposed in Part I leads to the construction of an optimally or nearly optimally (up to logarithmic factor) adaptive estimator.

AMS subject classifications: 62G05, 62G20.

Keywords: deconvolution model, density estimation, oracle inequality, adaptive estimation, kernel estimators, ${\mathbb{L}}_{p}$-risk, anisotropic Nikol’skii class.

This work has been carried out in the framework of the Labex Archimède (ANR-11-LABX-0033) and of the A*MIDEX project (ANR-11-IDEX-0001-02), funded by the "Investissements d’Avenir" French Government program managed by the French National Research Agency (ANR).

1 Introduction

In the present paper we will investigate the following observation scheme introduced in Lepski and Willer (2017). Suppose that we observe i.i.d. vectors $Z_{i}\in{\mathbb{R}}^{d},i=1,\ldots,n,$ with a common probability density $\mathfrak{p}$ satisfying the following structural assumption

$$\mathfrak{p}=(1-\alpha)f+\alpha[f\star g],\quad f\in\mathbb{F}_{g}(R),\;\alpha\in[0,1], \tag{1.1}$$

where $\alpha\in[0,1]$ and $g:{\mathbb{R}}^{d}\to{\mathbb{R}}$ are supposed to be known and $f:{\mathbb{R}}^{d}\to{\mathbb{R}}$ is the function to be estimated. We will call the observation scheme (1.1) the convolution structure density model.

Here and later, for two functions $f,g\in{\mathbb{L}}_{1}\big{(}{\mathbb{R}}^{d}\big{)}$

$$\big[f\star g\big](x)=\int_{{\mathbb{R}}^{d}}f(x-z)g(z)\nu_{d}({\rm d}z),\quad x\in{\mathbb{R}}^{d},$$

and for any $\alpha\in[0,1]$ , $g\in{\mathbb{L}}_{1}\big{(}{\mathbb{R}}^{d}\big{)}$ and $R>1$ ,

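$$\mathbb{F}_{g}(R)=\Big\{f\in\mathbb{B}_{1,d}(R):\;(1-\alpha)f+\alpha[f\star g]\in\mathfrak{P}\big({\mathbb{R}}^{d}\big)\Big\}.$$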

Here $\mathfrak{P}\big{(}{\mathbb{R}}^{d}\big{)}$ denotes the set of probability densities on ${\mathbb{R}}^{d}$ , $\mathbb{B}_{s,d}(R)$ is the ball of radius $R>0$ in ${\mathbb{L}}_{s}\big{(}{\mathbb{R}}^{d}\big{)}:={\mathbb{L}}_{s}\big{(}{\mathbb{R}}^{d},\nu_{d}\big{)},1\leq s\leq\infty$ and $\nu_{d}$ is the Lebesgue measure on ${\mathbb{R}}^{d}$ .

We remark that if one assumes additionally that $f,g\in\mathfrak{P}\big{(}{\mathbb{R}}^{d}\big{)}$ , this model can be interpreted as follows. The observations $Z_{i}\in{\mathbb{R}}^{d},i=1,\ldots,n,$ can be written as a sum of two independent random vectors, that is,

$$Z_{i}=X_{i}+\epsilon_{i}Y_{i},\quad i=1,\ldots,n, \tag{1.2}$$

where $X_{i},i=1,\ldots,n,$ are i.i.d. $d$-dimensional random vectors with a common density $f$, to be estimated. The noise variables $Y_{i},i=1,\ldots,n,$ are i.i.d. $d$-dimensional random vectors with a known common density $g$. Finally, $\epsilon_{i}\in\{0,1\},i=1,\ldots,n,$ are i.i.d. Bernoulli random variables with ${\mathbb{P}}(\epsilon_{1}=1)=\alpha$, where $\alpha\in[0,1]$ is supposed to be known. The sequences $\{X_{i},i=1,\ldots,n\}$, $\{Y_{i},i=1,\ldots,n\}$ and $\{\epsilon_{i},i=1,\ldots,n\}$ are supposed to be mutually independent.
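The sampling mechanism just described can be simulated directly. The following is a minimal sketch for $d=1$; the function names, the standard Gaussian choice for $f$ and the centered Laplace choice for $g$ are illustrative assumptions and are not taken from the paper.

```python
import random

def sample_convolution_structure(n, alpha, draw_x, draw_y, rng):
    """Draw Z_i = X_i + eps_i * Y_i, i = 1..n, with eps_i ~ Bernoulli(alpha)."""
    zs, eps = [], []
    for _ in range(n):
        e = 1 if rng.random() < alpha else 0   # contamination indicator eps_i
        x = draw_x(rng)                        # signal X_i with density f
        y = draw_y(rng)                        # noise Y_i with known density g
        zs.append(x + e * y)
        eps.append(e)
    return zs, eps

def draw_gauss(rng):
    # Illustrative choice of f: the standard Gaussian density.
    return rng.gauss(0.0, 1.0)

def draw_laplace(rng):
    # Illustrative choice of g: a centered Laplace(1) variable,
    # generated as the difference of two independent Exp(1) variables.
    return rng.expovariate(1.0) - rng.expovariate(1.0)

rng = random.Random(0)
z, eps = sample_convolution_structure(10_000, 0.3, draw_gauss, draw_laplace, rng)
share_noisy = sum(eps) / len(eps)   # empirical share of contaminated observations
```

In practice only `z` is observed; the indicators `eps` are returned here solely to make visible which observations were contaminated, which is exactly the information the statistician does not have.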

The observation scheme (1.2) can be viewed as a generalization of two classical statistical models. Indeed, the case $\alpha=1$ corresponds to the standard deconvolution model $Z_{i}=X_{i}+Y_{i},\;i=1,\ldots,n$. The other "extreme" case $\alpha=0$ corresponds to the direct observation scheme $Z_{i}=X_{i},\;i=1,\ldots,n$. The "intermediate" case $\alpha\in(0,1)$, considered for the first time in Hesse (1995), can be treated as the mathematical modeling of the following situation. One part of the data, roughly $(1-\alpha)n$ observations, is observed without noise, while the other part is contaminated by additional noise. If the indexes corresponding to the first part were known, the density $f$ could be estimated using only this part of the data, with the accuracy corresponding to the direct case. The question we address is: can one obtain the same accuracy if this information is not available? We will see that the answer is positive, but the construction of optimal estimation procedures is based upon ideas corresponding to the "pure" deconvolution model.

The convolution structure density model (1.1) will be studied for an arbitrary $g\in{\mathbb{L}}_{1}\big{(}{\mathbb{R}}^{d}\big{)}$ and $f\in\mathbb{F}_{g}(R)$ . Then, except in the case $\alpha=0$ , the function $f$ is not necessarily a probability density.

We want to estimate $f$ using the observations $Z^{(n)}=(Z_{1},\ldots,Z_{n})$ . By estimator, we mean any $Z^{(n)}$ -measurable map $\hat{f}:{\mathbb{R}}^{n}\to{\mathbb{L}}_{p}\big{(}{\mathbb{R}}^{d}\big{)}$ . The accuracy of an estimator $\hat{f}$ is measured by the ${\mathbb{L}}_{p}$ –risk

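$${\cal R}^{(p)}_{n}[\hat{f},f]:=\Big(\mathbb{E}_{f}\|\hat{f}-f\|_{p}^{p}\Big)^{1/p},\quad p\in[1,\infty),$$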

where $\mathbb{E}_{f}$ denotes the expectation with respect to the probability measure ${\mathbb{P}}_{f}$ of the observations $Z^{(n)}=(Z_{1},\ldots,Z_{n})$. Also, $\|\cdot\|_{p}$, $p\in[1,\infty)$, is the ${\mathbb{L}}_{p}$-norm on ${\mathbb{R}}^{d}$, and without further mention we will assume that $f\in{\mathbb{L}}_{p}\big{(}{\mathbb{R}}^{d}\big{)}$. The objective is to construct an estimator of $f$ with a small ${\mathbb{L}}_{p}$-risk.

1.1 Oracle approach via local selection. Objectives of Part I

Let ${\cal F}=\big{\{}\hat{f}_{\mathfrak{t}},\mathfrak{t}\in\mathfrak{T}\big{\}}$ be a family of estimators built from the observation $Z^{(n)}$ . The goal is to propose a data-driven (based on $Z^{(n)}$ ) selection procedure from the collection ${\cal F}$ and to establish for it an ${\mathbb{L}}_{p}$ -norm oracle inequality. More precisely, we want to construct a $Z^{(n)}$ -measurable random map $\hat{\mathfrak{t}}:{\mathbb{R}}^{d}\to\mathfrak{T}$ and prove that for any $p\in[1,\infty)$ and $n\geq 1$

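$${\cal R}^{(p)}_{n}\big[\hat{f}_{\hat{\mathfrak{t}}(\cdot)};f\big]\leq C_{1}\Big\|\inf_{\mathfrak{t}\in\mathfrak{T}}A_{n}\left(f,\mathfrak{t},\cdot\right)\Big\|_{p}+C_{2}n^{-\frac{1}{2}},\quad\forall f\in{\mathbb{L}}_{p}\big({\mathbb{R}}^{d}\big). \tag{1.3}$$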

Here $C_{1}$ and $C_{2}$ are numerical constants which may depend on $d,p$ and $\mathfrak{T}$ only.

We call (1.3) an ${\mathbb{L}}_{p}$-norm oracle inequality obtained by local selection, and in Part I we provide an explicit expression of the functional $A_{n}(\cdot,\cdot,x),x\in{\mathbb{R}}^{d}$ in the case where ${\cal F}={\cal F}\big{(}{\cal H}^{d}\big{)}$ is the family of "kernel-type" estimators parameterized by a collection of multi-bandwidths ${\cal H}^{d}$. The selection from the latter family is done pointwise, i.e. for any $x\in{\mathbb{R}}^{d}$, which allows us to take into account the "local structure" of the function to be estimated. The ${\mathbb{L}}_{p}$-norm oracle inequality is then obtained by integrating the pointwise risk of the proposed estimator, which is a kernel estimator whose bandwidth is a multivariate random function. This, in turn, allows us to derive the different minimax adaptive results presented in Part II of the paper. They are all obtained from a single ${\mathbb{L}}_{p}$-norm oracle inequality.

Our selection rule presented in Section 2.1 can be viewed as a generalization and modification of some statistical procedures proposed in Kerkyacharian et al. (2001) and Goldenshluger and Lepski (2014). As we mentioned above, establishing (1.3) is the main objective of Part I. We will see however that although $A_{n}(\cdot,\cdot,x),x\in{\mathbb{R}}^{d}$ will be presented explicitly, its computation in particular problems is not a simple task. The main difficulty here is mostly related to the fact that (1.3) is proved without any assumption (except for the model requirements) imposed on the underlying function $f$. It turns out that under some nonrestrictive assumptions imposed on $f$, the obtained bounds can be considerably simplified, see Section 2.3. Moreover, these new inequalities allow us to better understand the methodology for obtaining minimax adaptive results by the use of the oracle approach.

1.2 Adaptive estimation. Objectives of Part II

Let $\mathbb{F}$ be a given subset of ${\mathbb{L}}_{p}\big{(}{\mathbb{R}}^{d}\big{)}$. For any estimator $\tilde{f}_{n}$, define its maximal risk by ${\cal R}^{(p)}_{n}\big{[}\tilde{f}_{n};\mathbb{F}\big{]}=\sup_{f\in\mathbb{F}}{\cal R}^{(p)}_{n}\big{[}\tilde{f}_{n};f\big{]}$. The minimax risk on $\mathbb{F}$ is given by

$$\phi_{n}(\mathbb{F}):=\inf_{\tilde{f}_{n}}{\cal R}^{(p)}_{n}\big[\tilde{f}_{n};\mathbb{F}\big].$$

Here, the infimum is taken over all possible estimators. An estimator whose maximal risk is bounded, up to some constant factor, by $\phi_{n}(\mathbb{F})$ , is called minimax on $\mathbb{F}$ .

Let $\big{\{}\mathbb{F}_{\vartheta},\vartheta\in\Theta\big{\}}$ be a collection of subsets of ${\mathbb{L}}_{p}\big{(}{\mathbb{R}}^{d},\nu_{d}\big{)}$ , where $\vartheta$ is a nuisance parameter which may have a very complicated structure.

The problem of adaptive estimation can be formulated as follows: is it possible to construct a single estimator $\hat{f}_{n}$ which would be simultaneously minimax on each class $\mathbb{F}_{\vartheta},\;\vartheta\in\Theta$ , i.e.

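$$\limsup_{n\to\infty}\phi^{-1}_{n}(\mathbb{F}_{\vartheta}){\cal R}^{(p)}_{n}\big[\hat{f}_{n};\mathbb{F}_{\vartheta}\big]<\infty,\quad\forall\vartheta\in\Theta?$$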

We refer to this question as *the problem of minimax adaptive estimation over the scale of* $\{\mathbb{F}_{\vartheta},\;\vartheta\in\Theta\}$. If such an estimator exists, we will call it optimally adaptive.

From oracle approach to adaptation. Let the oracle inequality (1.3) be established. Define

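$$R_{n}\big(\mathbb{F}_{\vartheta}\big)=\sup_{f\in\mathbb{F}_{\vartheta}}\Big\|\inf_{\mathfrak{t}\in\mathfrak{T}}A_{n}\left(f,\mathfrak{t},\cdot\right)\Big\|_{p}+n^{-\frac{1}{2}},\quad\vartheta\in\Theta.$$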

We immediately deduce from (1.3) that for any $\vartheta\in\Theta$

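$$\limsup_{n\to\infty}R^{-1}_{n}\big(\mathbb{F}_{\vartheta}\big){\cal R}^{(p)}_{n}\big[\hat{f}_{\hat{\mathfrak{t}}(\cdot)};\mathbb{F}_{\vartheta}\big]<\infty.$$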

Hence, the minimax adaptive optimality of the estimator $\hat{f}_{\hat{\mathfrak{t}}(\cdot)}$ is reduced to the comparison of the normalization $R_{n}\big{(}\mathbb{F}_{\vartheta}\big{)}$ with the minimax risk $\phi_{n}(\mathbb{F}_{\vartheta})$ . Indeed, if one proves that for any $\vartheta\in\Theta$

$$\liminf_{n\to\infty}R_{n}\big(\mathbb{F}_{\vartheta}\big)\phi^{-1}_{n}(\mathbb{F}_{\vartheta})<\infty,$$

then the estimator $\hat{f}_{\hat{\mathfrak{t}}(\cdot)}$ is optimally adaptive over the scale $\big{\{}\mathbb{F}_{\vartheta},\vartheta\in\Theta\big{\}}$ .
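The step from the oracle inequality to the bound on the maximal risk can be spelled out in one line. The display below is a short worked derivation, assuming only that (1.3) holds for every $f\in\mathbb{F}_{\vartheta}\subseteq{\mathbb{L}}_{p}\big({\mathbb{R}}^{d}\big)$ and that $R_{n}\big(\mathbb{F}_{\vartheta}\big)$ is the supremum over $\mathbb{F}_{\vartheta}$ of the ${\mathbb{L}}_{p}$-norm of the infimum of $A_{n}$, plus $n^{-1/2}$; the notation $C_{1}\vee C_{2}$ for the larger of the two constants is ours.

```latex
{\cal R}^{(p)}_{n}\big[\hat{f}_{\hat{\mathfrak{t}}(\cdot)};\mathbb{F}_{\vartheta}\big]
  =\sup_{f\in\mathbb{F}_{\vartheta}}{\cal R}^{(p)}_{n}\big[\hat{f}_{\hat{\mathfrak{t}}(\cdot)};f\big]
  \leq C_{1}\sup_{f\in\mathbb{F}_{\vartheta}}\Big\|\inf_{\mathfrak{t}\in\mathfrak{T}}A_{n}(f,\mathfrak{t},\cdot)\Big\|_{p}+C_{2}n^{-\frac{1}{2}}
  \leq (C_{1}\vee C_{2})\,R_{n}\big(\mathbb{F}_{\vartheta}\big).
```

Since the constant $C_{1}\vee C_{2}$ does not depend on $n$, dividing by $R_{n}\big(\mathbb{F}_{\vartheta}\big)$ and letting $n\to\infty$ gives the finiteness of the upper limit.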

Objectives. In the framework of the convolution structure density model, we will be interested in adaptive estimation over the scale

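$$\mathbb{F}_{\vartheta}={\mathbb{N}}_{\vec{r},d}\big(\vec{\beta},\vec{L}\big)\cap\mathbb{F}_{g}(R),\quad\vartheta=\big(\vec{\beta},\vec{r},\vec{L},R\big),$$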

where ${\mathbb{N}}_{\vec{r},d}\big{(}\vec{\beta},\vec{L}\big{)}$ is the anisotropic Nikol’skii class (its exact definition will be presented in Part II). Here we only mention that for any $f\in{\mathbb{N}}_{\vec{r},d}\big{(}\vec{\beta},\vec{L}\big{)}$ the coordinate $\beta_{i}$ of the vector $\vec{\beta}=(\beta_{1},\ldots,\beta_{d})\in(0,\infty)^{d}$ represents the smoothness of $f$ in the direction $i$ and the coordinate $r_{i}$ of the vector $\vec{r}=(r_{1},\ldots,r_{d})\in[1,\infty]^{d}$ represents the index of the norm in which $\beta_{i}$ is measured. Moreover, ${\mathbb{N}}_{\vec{r},d}\big{(}\vec{\beta},\vec{L}\big{)}$ is the intersection of balls in some semi-metric space and the vector $\vec{L}\in(0,\infty)^{d}$ represents the radii of these balls.

The aforementioned dependence on the direction is usually referred to as anisotropy of the underlying function and the corresponding functional class. The use of the integral norm in the definition of the smoothness is referred to as inhomogeneity of the underlying function. The latter means that the function $f$ can be sufficiently smooth on some part of the observation domain and rather irregular on another part. Thus, the adaptive estimation over the scale $\big{\{}{\mathbb{N}}_{\vec{r},d}\big{(}\vec{\beta},\vec{L}\big{)},\;\big{(}\vec{\beta},\vec{r},\vec{L}\big{)}\in(0,\infty)^{d}\times[1,\infty]^{d}\times(0,\infty)^{d}\big{\}}$ can be viewed as the adaptation to anisotropy and inhomogeneity of the function to be estimated.

Additionally, we will consider $\mathbb{F}_{\vartheta}={\mathbb{N}}_{\vec{r},d}\big{(}\vec{\beta},\vec{L}\big{)}\cap\mathbb{F}_{g}(R)\cap\mathbb{B}_{\infty,d}(Q),\;\vartheta=\big{(}\vec{\beta},\vec{r},\vec{L},R,Q\big{)}$ . It will allow us to understand how the boundedness of the underlying function may affect the accuracy of estimation.

Minimax adaptive estimation is a very active area of mathematical statistics, and the theory of adaptation has been developed considerably over the past three decades. Several estimation procedures were proposed in various statistical models, such as the Efroimovich-Pinsker method, Efroimovich and Pinsker (1984); Efroimovich (1986), the Lepski method, Lepskii (1991), and its generalizations, Kerkyacharian et al. (2001), Goldenshluger and Lepski (2009), unbiased risk minimization, Golubev (1992), wavelet thresholding, Donoho et al. (1996), model selection, Barron et al. (1999); Birgé and Massart (2001), the blockwise Stein method, Cai (1999), aggregation of estimators, Nemirovski (2000), Wegkamp (2003), Tsybakov (2003), Goldenshluger (2009), exponential weights, Leung and Barron (2006), Dalalyan and Tsybakov (2008), and the risk hull method, Cavalier and Golubev (2006), among many others. The interested reader can find a very detailed overview as well as several open problems in adaptive estimation in the recent paper Lepski (2015).

As already said, the convolution structure density model includes density estimation under direct and indirect observations as special cases. In Part II we compare in detail our minimax adaptive results to those already existing in both statistical models. Here we only mention that more developed results can be found in Goldenshluger and Lepski (2011), Goldenshluger and Lepski (2014) (density model) and in Comte and Lacour (2013), Rebelles (2016) (density deconvolution).

1.3 Assumption on the function $g$

Later on for any $U\in{\mathbb{L}}_{1}\big{(}{\mathbb{R}}^{d}\big{)}$ , let $\check{U}$ denote its Fourier transform, defined as $\check{U}(t):=\int_{{\mathbb{R}}^{d}}U(x)e^{-i\sum_{j=1}^{d}x_{j}t_{j}}\nu_{d}(dx),t\in{\mathbb{R}}^{d}.$ The selection rule from the family of kernel estimators, the ${\mathbb{L}}_{p}$ -norm oracle inequality as well as the adaptive results presented in Part II are established under the following condition.

Assumption 1.

(1) if $\alpha\neq 1$ then there exists $\varepsilon>0$ such that

$$\big|1-\alpha+\alpha\check{g}(t)\big|\geq\varepsilon,\quad\forall t\in{\mathbb{R}}^{d};$$

(2) if $\alpha=1$ then there exist $\vec{\mu}=(\mu_{1},\ldots,\mu_{d})\in(0,\infty)^{d}$ and $\Upsilon_{0}>0$ such that

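$$\big|\check{g}(t)\big|\geq\Upsilon_{0}\prod_{j=1}^{d}(1+t_{j}^{2})^{-\frac{\mu_{j}}{2}},\quad\forall t=(t_{1},\ldots,t_{d})\in{\mathbb{R}}^{d}.$$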

Recall that the following assumption is well known in the literature:

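$$\Upsilon_{0}\prod_{j=1}^{d}(1+t_{j}^{2})^{-\frac{\mu_{j}}{2}}\leq\big|\check{g}(t)\big|\leq\Upsilon\prod_{j=1}^{d}(1+t_{j}^{2})^{-\frac{\mu_{j}}{2}},\quad\forall t\in{\mathbb{R}}^{d}.$$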

It is referred to as a moderately ill-posed statistical problem. In particular, the assumption is satisfied for the centered multivariate Laplace law.

Note that Assumption 1 (1) is very weak and it is verified for many distributions, including centered multivariate Laplace and Gaussian ones. Note also that this assumption always holds with $\varepsilon=1-2\alpha$ if $\alpha<1/2$ . Additionally, it holds with $\varepsilon=1-\alpha$ if $\check{g}$ is a real positive function. The latter is true, in particular, for any probability law obtained by an even number of convolutions of a symmetric distribution with itself.
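Both parts of Assumption 1 can be checked numerically on a concrete example. The sketch below takes $d=1$ and the centered Laplace density $g(x)=e^{-|x|}/2$, whose Fourier transform is $\check{g}(t)=(1+t^{2})^{-1}$ (a standard fact); the unit scale, the finite grid of frequencies and all function names are illustrative assumptions. A grid check is of course only a sanity check, not a proof.

```python
def laplace_cf(t):
    """Fourier transform of the centered Laplace density g(x) = exp(-|x|)/2."""
    return 1.0 / (1.0 + t * t)

def min_modulus(alpha, grid):
    """Smallest value of |1 - alpha + alpha * g_check(t)| over a finite grid."""
    return min(abs(1.0 - alpha + alpha * laplace_cf(t)) for t in grid)

grid = [k / 10.0 for k in range(-2000, 2001)]   # frequencies t in [-200, 200]

# Part (1): g_check is real and positive, so epsilon = 1 - alpha should work.
part_one_ok = all(
    min_modulus(alpha, grid) >= 1.0 - alpha for alpha in (0.0, 0.2, 0.5, 0.9)
)

# Part (2), alpha = 1: |g_check(t)| = (1 + t^2)^(-mu/2) with mu = 2,
# so the lower bound holds with Upsilon_0 = 1 (up to rounding error).
mu, upsilon0 = 2.0, 1.0
part_two_ok = all(
    laplace_cf(t) >= upsilon0 * (1.0 + t * t) ** (-mu / 2.0) - 1e-15 for t in grid
)
```

For this noise density the lower and upper polynomial bounds coincide, which is why the Laplace law is the textbook instance of a moderately ill-posed problem.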

2 Pointwise selection rule and ${\mathbb{L}}_{p}$ -norm oracle inequality

To present our results in a unified way, let us define $\vec{\boldsymbol{\mu}}(\alpha)=\vec{\mu}$ if $\alpha=1$ and $\vec{\boldsymbol{\mu}}(\alpha)=(0,\ldots,0)$ if $\alpha\in[0,1)$. Let $K:{\mathbb{R}}^{d}\to{\mathbb{R}}$ be a continuous function belonging to ${\mathbb{L}}_{1}\big{(}{\mathbb{R}}^{d}\big{)}$, $\int_{{\mathbb{R}}^{d}}K=1$, and such that its Fourier transform $\check{K}$ satisfies the following condition.

Assumption 2.

There exist $\mathbf{k}_{1}>0$ and $\mathbf{k}_{2}>0$ such that

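$$\int_{{\mathbb{R}}^{d}}\big|\check{K}(t)\big|\prod_{j=1}^{d}(1+t^{2}_{j})^{\frac{\boldsymbol{\mu}_{j}(\alpha)}{2}}{\rm d}t\leq\mathbf{k}_{1},\quad\int_{{\mathbb{R}}^{d}}\big|\check{K}(t)\big|^{2}\prod_{j=1}^{d}(1+t^{2}_{j})^{\boldsymbol{\mu}_{j}(\alpha)}{\rm d}t\leq\mathbf{k}^{2}_{2}.$$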

Set ${\cal H}=\big{\{}e^{k},\;k\in{\mathbb{Z}}\big{\}}$ and let ${\cal H}^{d}=\big{\{}\vec{h}=(h_{1},\ldots,h_{d}):\;h_{j}\in{\cal H},j=1,\ldots,d\big{\}}.$ Define for any $\vec{h}=(h_{1},\ldots,h_{d})\in{\cal H}^{d}$

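$$K_{\vec{h}}(t)=V^{-1}_{\vec{h}}K\big(t_{1}/h_{1},\ldots,t_{d}/h_{d}\big),\;t\in{\mathbb{R}}^{d},\quad V_{\vec{h}}=\prod_{j=1}^{d}h_{j}.$$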

Later on, for any $u,v\in{\mathbb{R}}^{d}$ the operations and relations $u/v$, $uv$, $u\vee v$, $u\wedge v$, $u\geq v$, $au,a\in{\mathbb{R}},$ are understood in the coordinate-wise sense. In particular, $u\geq v$ means that $u_{j}\geq v_{j}$ for any $j=1,\ldots,d$.
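The bandwidth grid, the rescaled kernel and the coordinate-wise conventions translate into a few small helpers. This is only an illustrative sketch: the truncation of ${\cal H}=\{e^{k},k\in{\mathbb{Z}}\}$ to a finite range of exponents, the product Gaussian kernel and all names are assumptions made for the example.

```python
import math
from itertools import product

def bandwidth_grid(d, k_min, k_max):
    """Finite piece of H^d: all vectors (e^{k_1}, ..., e^{k_d}), k_min <= k_j <= k_max."""
    one_dim = [math.exp(k) for k in range(k_min, k_max + 1)]
    return list(product(one_dim, repeat=d))

def vmax(u, v):
    """Coordinate-wise maximum of u and v."""
    return tuple(max(a, b) for a, b in zip(u, v))

def geq(u, v):
    """Coordinate-wise order: u >= v iff u_j >= v_j for every j."""
    return all(a >= b for a, b in zip(u, v))

def scaled_kernel(K, h):
    """Return K_h(t) = V_h^{-1} K(t_1/h_1, ..., t_d/h_d) with V_h = prod_j h_j."""
    volume = math.prod(h)
    return lambda t: K(tuple(tj / hj for tj, hj in zip(t, h))) / volume

def gauss_product(t):
    """Product Gaussian kernel on R^d (an illustrative choice of K)."""
    return math.prod(math.exp(-0.5 * tj * tj) / math.sqrt(2.0 * math.pi) for tj in t)

H2 = bandwidth_grid(2, -2, 0)                       # 3 x 3 grid of multi-bandwidths
K_h = scaled_kernel(gauss_product, (math.exp(-1), 1.0))
```

Note that the coordinate-wise order is only partial: two multi-bandwidths such as `(1.0, 0.5)` and `(0.2, 2.0)` are not comparable, which is why the selection rule works with $\vec{h}\vee\vec{\eta}$ rather than with a total ordering.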

2.1 Pointwise selection rule from the family of kernel estimators

For any $\vec{h}\in(0,\infty)^{d}$ let $M\big{(}\cdot,\vec{h}\big{)}$ satisfy the operator equation

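$$K_{\vec{h}}(y)=(1-\alpha)M\big(y,\vec{h}\big)+\alpha\int_{{\mathbb{R}}^{d}}g(t-y)M\big(t,\vec{h}\big){\rm d}t,\quad y\in{\mathbb{R}}^{d}.$$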

For any $\vec{\mathrm{h}}\in{\cal H}^{d}$ and $x\in{\mathbb{R}}^{d}$ introduce the estimator $\widehat{f}_{\vec{\mathrm{h}}}(x)=n^{-1}\sum_{i=1}^{n}M\big{(}Z_{i}-x,\vec{\mathrm{h}}\big{)}.$
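One way to compute $M\big(\cdot,\vec{h}\big)$ in practice is through the Fourier domain. Taking Fourier transforms on both sides of the operator equation suggests $\check{M}(s,\vec{h})=\check{K}(s\vec{h})\big/\big(1-\alpha+\alpha\check{g}(-s)\big)$; this is our own reading of the equation, not a formula quoted from the paper. The sketch below assumes $d=1$, a Gaussian kernel $K$ and centered Laplace noise with $\check{g}(s)=(1+s^{2})^{-1}$, and inverts the transform with a plain trapezoidal rule; the cut-off and the number of steps are arbitrary illustrative choices.

```python
import math

def deconv_kernel(y, h, alpha, s_steps=4000):
    """M(y, h) for d = 1, Gaussian K and Laplace(1) noise, by Fourier inversion.

    Uses M_check(s) = K_check(s h) / (1 - alpha + alpha * g_check(-s)) and
    M(y) = (1/pi) * int_0^inf cos(s y) M_check(s) ds (the integrand is even in s).
    """
    s_max = 12.0 / h                  # K_check(s h) = exp(-(s h)^2 / 2) is negligible beyond
    ds = s_max / s_steps
    total = 0.0
    for k in range(s_steps + 1):
        s = k * ds
        g_check = 1.0 / (1.0 + s * s)             # symmetric noise: g_check(-s) = g_check(s)
        m_check = math.exp(-0.5 * (s * h) ** 2) / (1.0 - alpha + alpha * g_check)
        weight = 0.5 if k in (0, s_steps) else 1.0   # trapezoidal rule end-point weights
        total += weight * math.cos(s * y) * m_check
    return total * ds / math.pi

def f_hat(x, sample, h, alpha):
    """Kernel-type estimator n^{-1} * sum_i M(Z_i - x, h)."""
    return sum(deconv_kernel(z - x, h, alpha) for z in sample) / len(sample)

phi0 = 1.0 / math.sqrt(2.0 * math.pi)            # Gaussian density at zero
direct = deconv_kernel(0.0, 1.0, alpha=0.0)      # alpha = 0: M coincides with K_h
pure = deconv_kernel(0.0, 1.0, alpha=1.0)        # alpha = 1: M = K_h - K_h'' for this noise
```

Two sanity checks follow from the closed forms: for $\alpha=0$ the value at the origin with $h=1$ is the Gaussian density at zero, and for $\alpha=1$ it is twice that value, since $1/\check{g}(s)=1+s^{2}$ turns the inversion into $K_{h}-K_{h}''$.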

Our first goal is to propose for any given $x\in{\mathbb{R}}^{d}$ a data-driven selection rule from the family of kernel estimators ${\cal F}\big{(}{\cal H}^{d}\big{)}=\big{\{}\widehat{f}_{\vec{\mathrm{h}}}(x),\;\vec{\mathrm{h}}\in{\cal H}^{d}\big{\}}$ . Define for any $\vec{\mathrm{h}}\in{\cal H}^{d}$

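$$\widehat{U}_{n}\big(x,\vec{\mathrm{h}}\big)=\sqrt{\frac{2\lambda_{n}\big(\vec{\mathrm{h}}\big)\widehat{\sigma}^{2}\big(x,\vec{\mathrm{h}}\big)}{n}}+\frac{4M_{\infty}\lambda_{n}\big(\vec{\mathrm{h}}\big)}{3n\prod_{j=1}^{d}\mathrm{h}_{j}(\mathrm{h}_{j}\wedge 1)^{\boldsymbol{\mu}_{j}(\alpha)}},\quad\widehat{\sigma}^{2}\big(x,\vec{\mathrm{h}}\big)=\frac{1}{n}\sum_{i=1}^{n}M^{2}\big(Z_{i}-x,\vec{\mathrm{h}}\big);$$

$$\lambda_{n}\big(\vec{\mathrm{h}}\big)=4\ln(M_{\infty})+6\ln(n)+(8p+26)\sum_{j=1}^{d}\big[1+\boldsymbol{\mu}_{j}(\alpha)\big]\big|\ln(\mathrm{h}_{j})\big|;$$

$$M_{\infty}=\big[(2\pi)^{-d}\big\{\varepsilon^{-1}\big\|\check{K}\big\|_{1}\mathrm{1}_{\alpha\neq 1}+\Upsilon_{0}^{-1}\mathbf{k}_{1}\mathrm{1}_{\alpha=1}\big\}\big]\vee 1.$$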

Pointwise selection rule

Let $\mathbb{H}$ be an arbitrary subset of ${\cal H}^{d}$ . For any $\vec{h}\in\mathbb{H}$ and $x\in{\mathbb{R}}^{d}$ introduce

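$$\widehat{{\cal R}}_{\vec{h}}(x)=\sup_{\vec{\eta}\in\mathbb{H}}\Big[\big|\widehat{f}_{\vec{h}\vee\vec{\eta}}(x)-\widehat{f}_{\vec{\eta}}(x)\big|-4\widehat{U}_{n}\big(x,\vec{h}\vee\vec{\eta}\big)-4\widehat{U}_{n}\big(x,\vec{\eta}\big)\Big]_{+};$$

$$\widehat{U}^{*}_{n}\big(x,\vec{h}\big)=\sup_{\vec{\eta}\in\mathbb{H}:\;\vec{\eta}\geq\vec{h}}\widehat{U}_{n}\big(x,\vec{\eta}\big),$$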

and define

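$$\vec{\mathbf{h}}(x)=\arg\inf_{\vec{h}\in\mathbb{H}}\Big[\widehat{{\cal R}}_{\vec{h}}(x)+8\widehat{U}^{*}_{n}\big(x,\vec{h}\big)\Big]. \tag{2.3}$$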

Our final estimator is $\widehat{f}_{\vec{\mathbf{h}}(x)}(x),\;x\in{\mathbb{R}}^{d}$ and we will call (2.3) the pointwise selection rule.

Note that the estimator $\widehat{f}_{\vec{\mathbf{h}}(\cdot)}(\cdot)$ does not necessarily belong to the collection $\big{\{}\widehat{f}_{\vec{\mathrm{h}}}(\cdot),\;\vec{\mathrm{h}}\in{\cal H}^{d}\big{\}}$ since the multi-bandwidth $\vec{\mathbf{h}}(\cdot)$ is a $d$-variate function, which is not necessarily constant on ${\mathbb{R}}^{d}$. The latter fact allows us to take into account the "local structure" of the function to be estimated. Moreover, $\vec{\mathbf{h}}(\cdot)$ is chosen with respect to the observations, and therefore it is a random vector-function.
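To make the pointwise selection rule concrete, here is a minimal sketch in the simplest setting: $d=1$, direct observations ($\alpha=0$, so that $M(\cdot,h)=K_{h}$, $\vec{\boldsymbol{\mu}}(\alpha)=0$ and $\varepsilon=1$), a Gaussian kernel and $p=2$. The majorant, the factor $4$ in the comparison term and the factor $8$ in the penalty follow the definitions of this section; the finite set `H`, the simulated sample and all function names are illustrative assumptions.

```python
import math
import random

P = 2          # index p of the L_p risk (enters lambda_n)
M_INF = 1.0    # M_infinity: (2 pi)^{-1} * ||K_check||_1 is about 0.399, so its max with 1 is 1

def M(y, h):
    """For alpha = 0 the operator equation gives M(., h) = K_h (Gaussian K here)."""
    return math.exp(-0.5 * (y / h) ** 2) / (h * math.sqrt(2.0 * math.pi))

def lam(n, h):
    """lambda_n(h) with d = 1 and mu(alpha) = 0."""
    return 4.0 * math.log(M_INF) + 6.0 * math.log(n) + (8 * P + 26) * abs(math.log(h))

def f_hat(x, z, h):
    """Kernel estimator n^{-1} * sum_i M(Z_i - x, h)."""
    return sum(M(zi - x, h) for zi in z) / len(z)

def U_hat(x, z, h):
    """Empirical majorant U_hat_n(x, h); (h ^ 1)^{mu} equals 1 because mu = 0."""
    n = len(z)
    sigma2 = sum(M(zi - x, h) ** 2 for zi in z) / n
    return math.sqrt(2.0 * lam(n, h) * sigma2 / n) + 4.0 * M_INF * lam(n, h) / (3.0 * n * h)

def select_bandwidth(x, z, H):
    """Return the minimizer over H of R_hat_h(x) + 8 * U_hat*_n(x, h)."""
    f = {h: f_hat(x, z, h) for h in H}
    u = {h: U_hat(x, z, h) for h in H}
    best_h, best_val = None, math.inf
    for h in H:
        # R_hat_h(x): largest positive part of the penalized comparison over eta in H.
        r_hat = max(
            max(abs(f[max(h, eta)] - f[eta]) - 4.0 * u[max(h, eta)] - 4.0 * u[eta], 0.0)
            for eta in H
        )
        # U_hat*_n(x, h): supremum of the majorant over bandwidths eta >= h.
        u_star = max(u[eta] for eta in H if eta >= h)
        if r_hat + 8.0 * u_star < best_val:
            best_h, best_val = h, r_hat + 8.0 * u_star
    return best_h

rng = random.Random(1)
z = [rng.gauss(0.0, 1.0) for _ in range(2000)]    # illustrative sample, f standard Gaussian
H = [math.exp(k) for k in range(-4, 1)]           # finite subset of {e^k, k in Z}
h_at_0 = select_bandwidth(0.0, z, H)
estimate_at_0 = f_hat(0.0, z, h_at_0)
```

Calling `select_bandwidth` at each point of a grid of `x` values yields a bandwidth that varies with `x`, which is the sense in which the resulting estimator adapts to the local structure of the underlying function.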

2.2 ${\mathbb{L}}_{p}$ -norm oracle inequality

Introduce for any $x\in{\mathbb{R}}^{d}$ and $\vec{h}\in{\cal H}^{d}$

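$$U^{*}_{n}\big(x,\vec{h}\big)=\sup_{\vec{\eta}\in{\cal H}^{d}:\;\vec{\eta}\geq\vec{h}}U_{n}\big(x,\vec{\eta}\big),\qquad S_{\vec{h}}(x,f)=\int_{{\mathbb{R}}^{d}}K_{\vec{h}}(t-x)f(t)\nu_{d}({\rm d}t);$$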

where we have put

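$$U_{n}\big(x,\vec{\eta}\big)=\sqrt{\frac{2\lambda_{n}\big(\vec{\eta}\big)\sigma^{2}\big(x,\vec{\eta}\big)}{n}}+\frac{4M_{\infty}\lambda_{n}\big(\vec{\eta}\big)}{3n\prod_{j=1}^{d}\eta_{j}(\eta_{j}\wedge 1)^{\boldsymbol{\mu}_{j}(\alpha)}},\quad\sigma^{2}\big(x,\vec{\eta}\big)=\int_{{\mathbb{R}}^{d}}M^{2}\big(t-x,\vec{\eta}\big)\mathfrak{p}(t)\nu_{d}({\rm d}t).$$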

For any $\mathbb{H}\subseteq{\cal H}^{d}$ , $\vec{h}\in\mathbb{H}$ and $x\in{\mathbb{R}}^{d}$ introduce also

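$$B^{*}_{\vec{h}}(x,f)=\sup_{\vec{\eta}\in\mathbb{H}}\big|S_{\vec{h}\vee\vec{\eta}}(x,f)-S_{\vec{\eta}}(x,f)\big|,\qquad B_{\vec{h}}(x,f)=\big|S_{\vec{h}}(x,f)-f(x)\big|.$$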

Theorem 1.

Let Assumptions 1 and 2 be fulfilled. Then for any $\mathbb{H}\subseteq{\cal H}^{d}$ , $n\geq 3$ and $p\in[1,\infty)$ ,

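$$\forall f\in\mathbb{F}_{g}(R),\quad{\cal R}^{(p)}_{n}\big[\widehat{f}_{\vec{\mathbf{h}}(\cdot)},f\big]\leq\Big\|\inf_{\vec{h}\in\mathbb{H}}\Big\{2B^{*}_{\vec{h}}(\cdot,f)+B_{\vec{h}}(\cdot,f)+49U^{*}_{n}\big(\cdot,\vec{h}\big)\Big\}\Big\|_{p}+\mathbf{C}_{p}n^{-\frac{1}{2}}.$$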

The explicit expression for the constant $\mathbf{C}_{p}$ can be found in the proof of the theorem.

Later on we will pay attention to a special choice for the collection of multi-bandwidths, namely

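$${\cal H}^{d}_{\text{isotr}}:=\big\{\vec{h}\in{\cal H}^{d}:\;\vec{h}=(h,\ldots,h),\;\;h\in{\cal H}\big\}.$$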

More precisely, in Part II, the selection from the corresponding family of kernel estimators will be used for the adaptive estimation over the collection of isotropic Nikol’skii classes. Note also that if $\mathbb{H}={\cal H}^{d}_{\text{isotr}}$ then obviously for any $\vec{h}=(h,\ldots,h)\in{\cal H}^{d}_{\text{isotr}}$

[TABLE]

and we come to the following corollary of Theorem 1.

Corollary 1.

Let Assumptions 1 and 2 be fulfilled. Then for any $n\geq 3$ and $p\in[1,\infty)$

[TABLE]

The oracle inequality proved in Theorem 1 is particularly useful since it does not require any assumption on the underlying function $f$ (except for the restrictions ensuring the existence of the model and of the risk). However, the quantity appearing in the right hand side of this inequality, namely

[TABLE]

is not easy to analyze. In particular, in order to use the result of Theorem 1 for adaptive estimation, one has to be able to compute

[TABLE]

for a given class $\mathbb{F}\subset{\mathbb{L}}_{p}\big{(}{\mathbb{R}}^{d}\big{)}\cap\mathbb{F}_{g}(R)$ with either $\mathbb{H}={\cal H}^{d}$ or $\mathbb{H}={\cal H}^{d}_{\text{isotr}}$ . It turns out that under some nonrestrictive assumptions imposed on $f$ , the obtained bounds can be considerably simplified. Moreover, the new inequality obtained below will clarify how adaptive results can be proved.

2.3 Some consequences of Theorem 1

In what follows we will assume that $f\in\mathbb{F}_{g,\mathbf{u}}(R,D)\cap\mathbb{B}_{\mathbf{q},d}(D)$ , $\mathbf{q},\mathbf{u}\in[1,\infty],D>0,$ where

[TABLE]

and $\mathbb{B}^{(\infty)}_{\mathbf{u},d}(D)$ denotes the ball of radius $D$ in the weak-type space ${\mathbb{L}}_{\mathbf{u},\infty}\big{(}{\mathbb{R}}^{d}\big{)}$ , i.e.

[TABLE]

As usual $\mathbb{B}^{(\infty)}_{\mathbf{\infty},d}(D)=\mathbb{B}_{\infty,d}(D)$ and obviously $\mathbb{B}^{(\infty)}_{\mathbf{u},d}(D)\supset\mathbb{B}_{\mathbf{u},d}(D)$ . Note also that $\mathbb{F}_{g,\mathbf{1}}(R,D)=\mathbb{F}_{g}(R)$ for any $D\geq 1$ . It is worth noting that the assumption $f\in\mathbb{F}_{g,\mathbf{u}}(R,D)$ simply means that the common density of the observations $\mathfrak{p}$ belongs to $\mathbb{B}^{(\infty)}_{\mathbf{u},d}(D)$ .

Remark 1.

It is easily seen that $\mathbb{F}_{g,\infty}\big{(}R,R\|g\|_{\infty}\big{)}=\mathbb{F}_{g}(R)$ if $\alpha=1$ and $\|g\|_{\infty}<\infty$ . Note also that $\mathbb{F}_{g,\mathbf{\infty}}(R,Q\|g\|_{1})\supset\mathbb{F}_{g}(R)\cap\mathbb{B}_{\infty,d}(Q)$ for any $\alpha\in[0,1]$ and $Q>0$ .

2.3.1 Oracle inequality over $\mathbb{F}_{g,\mathbf{u}}(R,D)\cap\mathbb{B}_{\mathbf{q},d}(D)$

For any $\vec{h}\in{\cal H}^{d}$ and any $v>0$ , let

[TABLE]

Furthermore let $\mathbb{H}$ be either ${\cal H}^{d}$ or ${\cal H}^{d}_{\text{isotr}}$ and for any $v,z>0$ define

[TABLE]

Here $a>0$ is a numerical constant whose explicit expression is given in the beginning of Section 3.2. Introduce for any $v>0$ and $f\in\mathbb{F}_{g,\mathbf{u}}(R,D)$

[TABLE]

Remark 2.

Note that $\mathfrak{H}(v)\neq\emptyset$ and $\mathfrak{H}(v,z)\neq\emptyset$ whatever the values of $v>0$ and $z\geq 2.$ Indeed, for any $v>0$ and $z>2$ one can find $b>1$ such that

[TABLE]

The latter means that $\vec{b}=(b,\ldots,b)\in\mathfrak{H}(v,z)\cap\mathfrak{H}(v)$ . Thus, we conclude that the quantities $\Lambda(v,f)$ , $\Lambda(v,f,\mathbf{u})$ and $\Lambda_{p}(v,f,\mathbf{u})$ are well-defined for all $v>0$ .

Also, it is easily seen that for any $v>0$ and $f\in\mathbb{F}_{g,\mathbf{\infty}}(R,D)$

[TABLE]

Finally, put for any $v>0$ , $l_{\mathbb{H}}(v)=v^{p-1}(1+|\ln{(v)}|)^{t(\mathbb{H})}$ , where $t(\mathbb{H})=d-1$ if $\mathbb{H}={\cal H}^{d}$ and $t(\mathbb{H})=0$ if $\mathbb{H}={\cal H}^{d}_{\text{isotr}}$ .
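The weight $l_{\mathbb{H}}(v)$ is a plain formula and can be transcribed directly. The sketch below does so; the function name `l_H` and the boolean flag `isotropic` (standing for the choice $\mathbb{H}={\cal H}^{d}_{\text{isotr}}$) are naming choices made here, not notation from the paper.

```python
import math

def l_H(v, p, d, isotropic=False):
    """l_H(v) = v^(p-1) * (1 + |ln v|)^t, with t = d-1 (anisotropic) or 0 (isotropic)."""
    if v <= 0:
        raise ValueError("v must be positive")
    t = 0 if isotropic else d - 1
    return v ** (p - 1) * (1.0 + abs(math.log(v))) ** t
```

In the isotropic case the logarithmic factor disappears entirely, and in dimension $d=1$ the two cases coincide.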

Theorem 2.

Let the assumptions of Theorem 1 be fulfilled and let $K$ be a compactly supported function. Then for any $n\geq 3$ , $p>1,\mathbf{q}>1,R>1,D>0,0<\underline{\boldsymbol{v}}\leq\overline{\boldsymbol{v}}<\infty,\mathbf{u}\in(p/2,\infty],\mathbf{u}\geq\mathbf{q}$ and any $f\in\mathbb{F}_{g,\mathbf{u}}(R,D)\cap\mathbb{B}_{\mathbf{q},d}(D)$

[TABLE]

Here $C^{(1)}$ is a universal constant independent of $f$ and $n$ . Its explicit expression can be found in the proof of the theorem. We remark also that only this constant depends on $\mathbf{q}$ .

The result announced in Theorem 2 suggests a way for establishing minimax and minimax adaptive properties of the pointwise selection rule given in (2.3). For a given $\mathbb{F}\subset\mathbb{F}_{g,\mathbf{u}}(R,D)\cap\mathbb{B}_{\mathbf{q},d}(D)$ it mostly consists in finding a careful estimate for

[TABLE]

The choice of $\underline{\boldsymbol{v}},\overline{\boldsymbol{v}}>0$ is a delicate problem and it depends on $S(\cdot,\cdot)$ .

In the next section we present several results concerning some useful upper estimates for the quantities

[TABLE]

We would like to underline that these bounds will be established for an arbitrary $\mathbb{F}$ and, therefore, they can be applied to the adaptation over different scales of functional classes. In particular, the results obtained below form the basis for our consideration in Part II.

2.3.2 Application to the minimax adaptive estimation

Our objective now is to bound from above $\sup_{f\in\mathbb{F}}{\cal R}^{(p)}_{n}[\widehat{f}_{\vec{\mathbf{h}}(\cdot)},f]$ for any $\mathbb{F}\subset\mathbb{F}_{g,\mathbf{u}}(R,D)\cap\mathbb{B}_{\mathbf{q},d}(D)$ . All the results in this section will be proved under an additional condition imposed on the kernel $K$ .

Assumption 3.

Let ${\cal K}:{\mathbb{R}}\to{\mathbb{R}}$ be a compactly supported, bounded function and $\int{\cal K}=1$ . Then

[TABLE]

Without loss of generality we will assume that $\|{\cal K}\|_{\infty}\geq 1$ and $\text{supp}({\cal K})\subset[-c_{\cal K},c_{\cal K}]$ with $c_{\cal K}\geq 1$ .

Introduce the following notations. Set for any $h\in{\cal H}$ , $x\in{\mathbb{R}}^{d}$ and $j=1,\ldots,d$

[TABLE]

where $(\mathbf{e}_{1},\ldots,\mathbf{e}_{d})$ denotes the canonical basis of ${\mathbb{R}}^{d}$ . For any $s\in[1,\infty]$ introduce

[TABLE]

Set for any $\vec{h}\in{\cal H}^{d}$ , $v>0$ and $j=1,\ldots,d$ ,

[TABLE]

where $\mathbf{c}=(20d)^{-1}\big{[}\max(2c_{\cal K}\|{\cal K}\|_{\infty},\|{\cal K}\|_{1})\big{]}^{-d}$ . As usual the complement of $J\big{(}\vec{h},v\big{)}$ will be denoted by $\bar{J}\big{(}\vec{h},v\big{)}$ . Furthermore, the summation over the empty set is supposed to be zero.
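To get a feeling for the size of the constant $\mathbf{c}$, the following sketch evaluates it from the three kernel characteristics $c_{\cal K}$, $\|{\cal K}\|_{\infty}$ and $\|{\cal K}\|_{1}$. The function name `constant_c` and the numerical inputs in the usage note are hypothetical illustrations.

```python
def constant_c(d, c_K, sup_norm_K, l1_norm_K):
    """c = (20 d)^(-1) * [max(2 c_K ||K||_inf, ||K||_1)]^(-d)."""
    if d < 1:
        raise ValueError("d must be a positive integer")
    base = max(2.0 * c_K * sup_norm_K, l1_norm_K)
    return 1.0 / (20.0 * d * base ** d)
```

For instance, with the smallest values allowed by the normalization $c_{\cal K}\geq 1$, $\|{\cal K}\|_{\infty}\geq 1$, say $c_{\cal K}=\|{\cal K}\|_{\infty}=\|{\cal K}\|_{1}=1$ and $d=2$, one gets $\mathbf{c}=1/160$; the constant decays geometrically in the dimension.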

For any $\vec{s}=(s_{1},\ldots,s_{d})\in[1,\infty)^{d}$ , $\mathbf{u}\geq 1$ and $v>0$ introduce

[TABLE]

Theorem 3.

Let assumptions of Theorem 2 be fulfilled and suppose additionally that $K$ satisfies Assumption 3. Then for any $n\geq 3$ , $p>1,\mathbf{q}>1,R>1,D>0,0<\underline{\boldsymbol{v}}\leq\overline{\boldsymbol{v}}<\infty,\mathbf{u}\in(p/2,\infty],\mathbf{u}\geq\mathbf{q}$ , $\vec{s}\in(1,\infty)^{d}$ , $\vec{q}\in[p,\infty)^{d}$ and any $\mathbb{F}\subset\mathbb{B}_{\mathbf{q},d}(D)\cap\mathbb{F}_{g,\mathbf{u}}(R,D)$

[TABLE]

If additionally $\mathbf{q}\in(p,\infty)$ one has also

[TABLE]

Moreover, if $\mathbf{q}=\infty$ one has

[TABLE]

Finally, if $\mathbb{H}={\cal H}^{d}_{\text{isotr}}$ all the assertions above remain true for any $\vec{s}\in[1,\infty)^{d}$ if one replaces in (2.7)–(2.8) $\mathbf{B}_{j,s_{j},\mathbb{F}}(\cdot)$ by $\mathbf{B}^{*}_{j,s_{j},\mathbb{F}}(\cdot)$ .

It is important to emphasize that $C^{(2)}$ depends only on $\vec{s},\vec{q},g,{\cal K},d$ , $R,D,\mathbf{u}$ and $\mathbf{q}$ . Note also that the assertions of the theorem remain true if we minimize the right hand sides of the obtained inequalities with respect to $\vec{s},\vec{q}$ , since their left hand sides are independent of $\vec{s}$ and $\vec{q}$ . In this context it is important to realize that $C^{(2)}=C^{(2)}(\vec{s},\cdots)$ is finite for any $\vec{s}\in(1,\infty)^{d}$ but $C^{(2)}(\vec{s},\cdots)=\infty$ if there exists $j\in\{1,\ldots,d\}$ such that $s_{j}=1$ . In contrast, $C^{(2)}(\vec{s},\cdots)<\infty$ for any $\vec{s}\in[1,\infty)^{d}$ if $\mathbb{H}={\cal H}^{d}_{\text{isotr}}$ , which explains in particular the fourth assertion of the theorem.

Note also that $D,R,\mathbf{u},\mathbf{q}$ are not involved in the construction of our pointwise selection rule. That means that one and the same estimator can be actually applied on any

[TABLE]

Moreover, the assertion of the theorem is non-asymptotic: we do not suppose that the number of observations $n$ is large.

Discussion

As we see, the application of our results to some functional class is mainly reduced to the computation of the functions $\mathbf{B}^{*}_{j,s,\mathbb{F}}(\cdot)$ , $j=1,\ldots,d,$ for some properly chosen $s$ . Note however that this task is not necessary for many functional classes used in nonparametric statistics, at least for the classes defined with the help of kernel approximation. Indeed, a typical description of $\mathbb{F}$ can be summarized as follows. Let $\lambda_{j}:{\mathbb{R}}_{+}\to{\mathbb{R}}_{+}$ be such that $\lambda_{j}(0)=0,\lambda_{j}\uparrow$ for any $j=1,\ldots,d$ . Then, the functional class, say $\mathbb{F}_{K}\big{[}\vec{\lambda}(\cdot),\vec{r}\big{]}$ , can be defined as a collection of functions satisfying

[TABLE]

for some $\vec{r}\in[1,\infty]^{d}$ . This obviously yields

[TABLE]

and the result of Theorem 3 remains valid if we replace formally $\mathbf{B}_{j,r_{j},\mathbb{F}}(\cdot)$ by $\lambda_{j}(\cdot)$ in all the expressions appearing in this theorem. In Part II we show that for some particular kernel $K^{*}$ , the anisotropic Nikol’skii class ${\mathbb{N}}_{\vec{r},d}\big{(}\vec{\beta},\vec{L}\big{)}$ is included into the class defined by (2.9) with $\lambda_{j}(\mathbf{h})=L_{j}\mathbf{h}^{\beta_{j}}$ , whatever the values of $\vec{\beta},\vec{L}$ and $\vec{r}$ .

Denote $\vartheta=(\vec{\lambda}(\cdot),\vec{r})$ and remark that in many cases $\mathbb{F}_{K}[\vartheta]\subset\mathbb{B}_{\mathbf{q},d}(D)$ for any $\vartheta\in\Theta$ for some class parameter $\Theta$ and $\mathbf{q}\geq p,D>0$ . Then, replacing $\mathbf{B}_{j,r_{j},\mathbb{F}}(\cdot)$ by $\lambda_{j}(\cdot)$ in (2.7) and (2.8) and choosing $\vec{q}=(\mathbf{q},\ldots,\mathbf{q})$ we come to the quantities $\boldsymbol{\Lambda}\big{(}v,\mathbf{u},\vartheta\big{)}$ and $\boldsymbol{\Lambda}_{\mathbf{q}}\big{(}v,\vartheta\big{)},$ completely determined by the functions $\lambda_{j}(\cdot),j=1,\ldots,d$ , the vector $\vec{r}$ and the number $\mathbf{q}$ . Therefore, putting

[TABLE]

we deduce from the first and the second assertions of Theorem 3 for any $\vec{\lambda}(\cdot)$ and $\vec{r}$ and $n\geq 3$

[TABLE]

Since the estimator $\widehat{f}_{\vec{\mathbf{h}}(\cdot)}$ is completely data-driven and, therefore, is independent of $\vec{\lambda}(\cdot)$ and $\vec{r}$ , the bound (2.10) holds for the scale of functional classes $\big{\{}\mathbb{F}_{K}[\vartheta]\big{\}}_{\vartheta}$ .

If $\phi_{n}\big{(}\mathbb{F}_{K}[\vartheta]\big{)}$ is the minimax risk defined in (1.4) and

[TABLE]

we can assert that our estimator is optimally adaptive over the considered scale $\big{\{}\mathbb{F}_{K}[\vartheta],\;\vartheta\in\Theta\big{\}}$ .

To illustrate the power of our approach, let us consider a particular scale of functional classes defined by (2.9).

Classes of Hölderian type

Let $\vec{\beta}\in(0,\infty)^{d}$ and $\vec{L}\in(0,\infty)^{d}$ be given vectors.

Definition 1.

We say that a function $f$ belongs to the class $\mathbb{F}_{K}\big{(}\vec{\beta},\vec{L}\big{)}$ , where $K$ satisfies Assumption 3, if $f\in\mathbb{B}_{\infty,d}\big{(}\max_{j=1,\ldots,d}L_{j}\big{)}$ and for any $j=1,\ldots,d$

[TABLE]

We remark that this class is a particular case of the one defined in (2.9), since it corresponds to $\lambda_{j}(\mathbf{h})=L_{j}\mathbf{h}^{\beta_{j}}$ and $r_{j}=\infty$ for any $j=1,\ldots,d$ . Moreover, let us introduce the following notation

[TABLE]

Then the following result is a direct consequence of Theorem 3. Its simple and short proof is postponed to Section 3.4.

Assertion 1.

Let the assumptions of Theorem 3 be fulfilled. Then for any $n\geq 3$ , $p>1$ , $\vec{\beta}\in(0,\infty)^{d}$ , $0<L_{0}\leq L_{\infty}<\infty$ and $\vec{L}\in[L_{0},L_{\infty}]^{d}$ there exists $C>0$ independent of $\vec{L}$ such that

[TABLE]

where we have denoted

[TABLE]

It is interesting to note that the obtained bound, being a very particular case of our consideration in Part II, is completely new if $\alpha\neq 0$ . As we already mentioned, for some particular choice of the kernel $K^{*}$ , the anisotropic Nikol’skii class ${\mathbb{N}}_{\vec{r},d}\big{(}\vec{\beta},\vec{L}\big{)}$ is included in the class $\mathbb{F}_{K^{*}}\big{[}\vec{\lambda}(\cdot),\vec{r}\big{]}$ with $\lambda_{j}(\mathbf{v})=L_{j}\mathbf{v}^{\beta_{j}}$ , whatever the values of $\vec{\beta},\vec{L}$ and $\vec{r}$ . Therefore, the aforementioned result holds on an arbitrary Hölder class ${\mathbb{N}}_{\vec{\infty},d}\big{(}\vec{\beta},\vec{L}\big{)}$ . Comparing the result of Assertion 1 with the lower bound for the minimax risk obtained in Lepski and Willer (2017), we can state that it differs only by some logarithmic factor. Using the modern statistical language, we say that the estimator $\widehat{f}_{\vec{\mathbf{h}}(\cdot)}$ is nearly optimally-adaptive over the scale of Hölder classes.

3 Proofs

3.1 Proof of Theorem 1

The main ingredients of the proof of the theorem are given in Proposition 1, whose proof is postponed to Section 3.1.2. Introduce for any $\vec{h}\in{\cal H}^{d}$

[TABLE]

Proposition 1.

Let Assumptions 1 and 2 be fulfilled. Then for any $n\geq 3$ and any $p>1$

[TABLE]

The explicit expressions of the constants $C_{p}$ and $C^{\prime}_{p}$ can be found in the proof.

3.1.1 Proof of the theorem

We start by proving the so-called pointwise oracle inequality.

Pointwise oracle inequality. Let $\vec{h}\in\mathbb{H}$ and $x\in{\mathbb{R}}^{d}$ be fixed. We have in view of the triangle inequality

[TABLE]

$1^{0}.\;$ First, note that obviously $\widehat{f}_{\vec{\mathbf{h}}(x)\vee\vec{h}}(x)=\widehat{f}_{\vec{h}\vee\vec{\mathbf{h}}(x)}(x)$ and, therefore,

[TABLE]

Moreover by definition, $\widehat{U}_{n}\big{(}x,\vec{\eta}\big{)}\leq\widehat{U}^{*}_{n}\big{(}x,\vec{\eta}\big{)}$ for any $\vec{\eta}\in{\cal H}^{d}$ .

Next, for any $\vec{h},\vec{\eta}\in{\cal H}^{d}$ we have obviously $\widehat{U}_{n}\big{(}x,\vec{h}\vee\vec{\eta}\big{)}\leq\widehat{U}^{*}_{n}\big{(}x,\vec{h}\big{)}\wedge\widehat{U}^{*}_{n}\big{(}x,\vec{\eta}\big{)}.$ Thus, we obtain

[TABLE]

Similarly we have

[TABLE]

The definition of $\vec{\mathbf{h}}(x)$ implies that for any $\vec{h}\in\mathbb{H}$

[TABLE]

and we get from (3.1), (3.2) and (3.3) for any $\vec{h}\in\mathbb{H}$

[TABLE]

$2^{0}.\;$ We obviously have for any $\vec{h},\vec{\eta}\in{\cal H}^{d}$

[TABLE]

Note that for any $\mathrm{h}\in{\cal H}^{d}$

[TABLE]

in view of the structural assumption (1.1) imposed on the density $\mathfrak{p}$ . Note that

[TABLE]

and, therefore, in view of the definition of $M\big{(}\cdot,\vec{h}\big{)}$ , cf. (2.2), we obtain for any $\mathrm{h}\in{\cal H}^{d}$

[TABLE]

We deduce from (3.5) that

[TABLE]

and, therefore, for any $\vec{h},\vec{\eta}\in{\cal H}^{d}$

[TABLE]

$3^{0}.\;$ Set for any $\vec{h}\in{\cal H}^{d}$ and any $x\in{\mathbb{R}}^{d}$

[TABLE]

We obtain in view of (3.6) that for any $\vec{h}\in\mathbb{H}$ (since obviously $\vec{h}\vee\vec{\eta}\in{\cal H}^{d}$ for any $\vec{h},\vec{\eta}\in{\cal H}^{d}$ )

[TABLE]

Note also that in view of the obvious inequality $(\sup_{\alpha}F_{\alpha}-\sup_{\alpha}G_{\alpha})_{+}\leq\sup_{\alpha}(F_{\alpha}-G_{\alpha})_{+}$

[TABLE]
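For completeness, the elementary inequality $(\sup_{\alpha}F_{\alpha}-\sup_{\alpha}G_{\alpha})_{+}\leq\sup_{\alpha}(F_{\alpha}-G_{\alpha})_{+}$ invoked here can be checked in two lines. The argument below is a standard one, written under the assumption that $\sup_{\alpha}G_{\alpha}$ is finite; it is not taken from the original text.

```latex
% For every index \alpha:
F_{\alpha} = G_{\alpha} + (F_{\alpha}-G_{\alpha})
  \leq \sup_{\alpha'} G_{\alpha'} + \sup_{\alpha'} (F_{\alpha'}-G_{\alpha'})_{+} .
% Taking the supremum over \alpha on the left-hand side:
\sup_{\alpha} F_{\alpha} - \sup_{\alpha} G_{\alpha}
  \leq \sup_{\alpha} (F_{\alpha}-G_{\alpha})_{+} .
% The right-hand side is nonnegative, so the positive part can be taken on the left:
\Big(\sup_{\alpha} F_{\alpha} - \sup_{\alpha} G_{\alpha}\Big)_{+}
  \leq \sup_{\alpha} (F_{\alpha}-G_{\alpha})_{+} .
```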

We get from (3.4), (3.7) and (3.8)

[TABLE]

It remains to note that

[TABLE]

and we obtain for any $\vec{h}\in\mathbb{H}$ and $x\in{\mathbb{R}}^{d}$

[TABLE]

Noting that the left hand side of the latter inequality is independent of $\vec{h}$ we obtain for any $x\in{\mathbb{R}}^{d}$

[TABLE]

This is the pointwise oracle inequality.

Application of Proposition 1. Set for any $x\in{\mathbb{R}}^{d}$

[TABLE]

Applying Proposition 1 we obtain in view of (3.9) and the triangle inequality

[TABLE]

where $\mathbf{C}_{p}=5(C_{p})^{\frac{1}{p}}+20(C^{\prime}_{p})^{\frac{1}{p}}$ . The theorem is proved.

3.1.2 Proof of Proposition 1

Since the proof of the proposition is quite long and technical, we divide it into several steps.

Preliminaries

$1^{0}.\;$ We start the proof with the following simple remark. Let $\check{M}\big{(}t,\vec{h}\big{)},t\in{\mathbb{R}}^{d},$ denote the Fourier transform of $M\big{(}\cdot,\vec{h}\big{)}$ . Then, we obtain in view of the definition of $M\big{(}\cdot,\vec{h}\big{)}$

[TABLE]

Note that Assumptions 1 and 2 guarantee that $\check{M}\big{(}\cdot,\vec{h}\big{)}\in{\mathbb{L}}_{1}\big{(}{\mathbb{R}}^{d}\big{)}\cap{\mathbb{L}}_{2}\big{(}{\mathbb{R}}^{d}\big{)}$ for any $\vec{h}\in{\cal H}^{d}$ and, therefore,

[TABLE]

Thus, putting

[TABLE]

we obtain in view of Assumptions 1 and 2 for any $\vec{h}\in{\cal H}^{d}$

[TABLE]

where $M_{2}=\big{[}(2\pi)^{-d}\big{\{}\varepsilon^{-1}\big{\|}\check{K}\big{\|}_{2}\mathrm{1}_{\alpha\neq 1}+\Upsilon_{0}^{-1}\mathbf{k}_{2}\mathrm{1}_{\alpha=1}\big{\}}\big{]}\vee 1.$ Additionally we deduce from (3.11)

[TABLE]

Let ${\cal L}\big{(}\cdot,\vec{h}\big{)}$ be either $M\big{(}\cdot,\vec{h}\big{)}$ or $M^{2}\big{(}\cdot,\vec{h}\big{)}$ and let ${\cal L}_{\infty}\big{(}\vec{h}\big{)}$ denote either ${\cal M}_{\infty}\big{(}\vec{h}\big{)}$ or ${\cal M}^{2}_{\infty}\big{(}\vec{h}\big{)}$ .

We have in view of (3.11)

[TABLE]

Additionally, we get from (3.11) and (3.12)

[TABLE]

Set $\sigma^{{\cal L}}\big{(}x,\vec{h}\big{)}=\sqrt{\int_{{\mathbb{R}}^{d}}{\cal L}^{2}\big{(}t-x,\vec{h}\big{)}\mathfrak{p}(t)\nu_{d}({\rm d}t)}$ and note that in view of (3.14) for any $\vec{h}\in{\cal H}^{d}$

[TABLE]

Next, we have in view of (3.13)

[TABLE]

$2^{0}.\;$ Define for any $x\in{\mathbb{R}}^{d}$ and $\vec{h}\in{\cal H}^{d}$

[TABLE]

where, we recall, $\lambda_{n}\big{(}\vec{h}\big{)}=4\ln(M_{\infty})+6\ln{(n)}+(8p+26)\sum_{j=1}^{d}\big{[}1+\boldsymbol{\mu}_{j}(\alpha)\big{]}\big{|}\ln(h_{j})\big{|}.$
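The quantity $\lambda_{n}\big{(}\vec{h}\big{)}$ is an explicit function of $n$, $p$ and the multi-bandwidth, so it can be transcribed directly. In the sketch below the constant $M_{\infty}$ and the exponents $\boldsymbol{\mu}_{j}(\alpha)$, which come from the assumptions stated earlier in the paper, are treated as plain numerical inputs (`M_inf` and `mu`); these argument names are choices made here.

```python
import math

def lambda_n(n, h, p, mu, M_inf):
    """lambda_n(h) = 4 ln(M_inf) + 6 ln(n) + (8p+26) * sum_j (1 + mu_j) * |ln h_j|."""
    if len(h) != len(mu):
        raise ValueError("h and mu must have the same length d")
    # Logarithmic penalty: vanishes at h = (1,...,1), grows as h_j -> 0 or h_j -> infinity.
    penalty = sum((1.0 + m) * abs(math.log(hj)) for hj, m in zip(h, mu))
    return 4.0 * math.log(M_inf) + 6.0 * math.log(n) + (8.0 * p + 26.0) * penalty
```

The formula is symmetric in $h_{j}$ and $1/h_{j}$, so very small and very large bandwidths are penalized in the same way.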

Noting that $\sup_{z\in[a,b]}|\ln z|\leq|\ln a|\vee|\ln b|$ for any $0<a<b<\infty$ we deduce from (3.16) $z_{n}\big{(}x,\vec{h}\big{)}\leq\lambda_{n}\big{(}\vec{h}\big{)}$ for any $x\in{\mathbb{R}}^{d}$ and, therefore, for any $\vec{h}\in{\cal H}^{d}$

[TABLE]

First step

Let $x\in{\mathbb{R}}^{d}$ and $\vec{h}\in{\cal H}^{d}$ be fixed and put $b=8p+22$ .

We obtain for any $z\geq 1$ and $q\geq 1$ by the integration of the Bernstein inequality

[TABLE]

where $\Gamma$ is the Gamma-function.

$1^{0}.\;$ Choose $z=z_{n}\big{(}x,\vec{h}\big{)}.$ Noting that for any $n\in{\mathbb{N}}^{*}$ and $x\in{\mathbb{R}}^{d}$

[TABLE]

and taking into account that $\exp{\{-|\ln(y)|\}}\leq y$ for any $y>0$ , we get

[TABLE]

Here to get the second inequality we have used (3.13) and put $C^{(1)}_{q}=2M_{\infty}^{2q}3^{q}\Gamma(q+1)$ .
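The elementary bound $\exp{\{-|\ln(y)|\}}\leq y$ used in this step can be verified by splitting into two cases; the following short check is added for the reader's convenience and is not part of the original argument.

```latex
\exp\{-|\ln(y)|\} =
\begin{cases}
  e^{\ln(y)} = y, & 0<y\leq 1,\\[2pt]
  e^{-\ln(y)} = y^{-1}\leq 1\leq y, & y>1,
\end{cases}
\qquad\text{so that}\qquad
\exp\{-|\ln(y)|\}=\min\{y,y^{-1}\}\leq y .
```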

Set ${\cal X}\big{(}\vec{h}\big{)}=\big{\{}x\in{\mathbb{R}}^{d}:\;\sigma^{{\cal L}}\big{(}x,\vec{h}\big{)}\geq n^{-3/2}{\cal L}_{\infty}\big{(}\vec{h}\big{)}\big{\}}$ , $\bar{{\cal X}}\big{(}\vec{h}\big{)}={\mathbb{R}}^{d}\setminus{\cal X}\big{(}\vec{h}\big{)}$ and later on the integration over the empty set is supposed to be zero.

We have in view of (3.17), (3.15) and (3.1.2) applied with $q=p$ that for any $\vec{h}\in{\cal H}^{d}$

[TABLE]

where $C^{(2)}_{p}=C^{(1)}_{p}M^{2}_{2}M_{\infty}^{2}$ .

$2^{0}.\;$ Introduce the following notations. For any $i=1,\ldots,n$ set

[TABLE]

and introduce the random event $D\big{(}x,\vec{h}\big{)}=\Big{\{}\sum_{i=1}^{n}\Psi_{i}\big{(}x,\vec{h}\big{)}\geq 2\Big{\}}$ . As usual, the complementary event will be denoted by $\bar{D}\big{(}x,\vec{h}\big{)}$ . Set finally $\pi\big{(}x,\vec{h}\big{)}={\mathbb{P}}_{f}\big{\{}\Psi_{1}\big{(}x,\vec{h}\big{)}=1\big{\}}$ .

We obviously have

[TABLE]

and, therefore,

[TABLE]

Applying the Cauchy-Schwarz inequality, we deduce from (3.20) that

[TABLE]

Using (3.1.2) with $q=2p$ and (3.13) we obtain for any $x\in\bar{{\cal X}}\big{(}\vec{h}\big{)}$

[TABLE]

where we have put $C^{(3)}_{p}=\big{[}C^{(1)}_{2p}\big{]}^{\frac{1}{2}}M_{\infty}^{2}$ .

For any $\lambda>0$ we have in view of the exponential Markov inequality

[TABLE]

Applying the Chebyshev inequality, we get $\pi\big{(}x,\vec{h}\big{)}\leq n^{2}{\cal L}^{-2}_{\infty}\big{(}\vec{h}\big{)}\big{[}\sigma^{{\cal L}}\big{(}x,\vec{h}\big{)}\big{]}^{2}.$ It yields

[TABLE]

Note that the definition of $\bar{{\cal X}}\big{(}\vec{h}\big{)}$ implies $n^{3}{\cal L}^{-2}_{\infty}\big{(}\vec{h}\big{)}\big{[}\sigma^{{\cal L}}\big{(}x,\vec{h}\big{)}\big{]}^{2}<1$ for any $x\in\bar{{\cal X}}\big{(}\vec{h}\big{)}$ . Hence, choosing $\lambda=\ln 2-2\ln{\big{\{}n^{3/2}{\cal L}^{-1}_{\infty}\big{(}\vec{h}\big{)}\sigma^{{\cal L}}\big{(}x,\vec{h}\big{)}\big{\}}}$ we have

[TABLE]

Together with (3.13), (3.15) and (3.21), this yields for any $\vec{h}\in{\cal H}^{d}$

[TABLE]

where $C^{(4)}_{p}=C^{(3)}_{p}(e/2)M_{\infty}^{6}M_{2}^{2}$ . Putting $C^{(5)}_{p}=C^{(2)}_{p}+C^{(4)}_{p}$ and noting that $2p+10-b/2<0$ we obtain from (3.19) and (3.22) for any $\vec{h}\in{\cal H}^{d}$

[TABLE]

$3^{0}.\;$ Choosing ${\cal L}=M$ and ${\cal L}_{\infty}={\cal M}_{\infty}$ we get from (3.23) and the definition of $b$

[TABLE]

The first assertion of the proposition follows from (3.24) with $C_{p}=C^{(5)}_{p}\sum_{k\in{\mathbb{Z}}^{d}}e^{-\sum_{j=1}^{d}|k_{j}|}.$
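The lattice sum entering $C_{p}$ can be evaluated in closed form, because the summand factorizes over the coordinates. The computation below is a side remark added for illustration:

```latex
\sum_{k\in\mathbb{Z}^{d}} e^{-\sum_{j=1}^{d}|k_{j}|}
  = \prod_{j=1}^{d}\,\sum_{k_{j}\in\mathbb{Z}} e^{-|k_{j}|}
  = \Big(1+2\sum_{m=1}^{\infty}e^{-m}\Big)^{d}
  = \Big(1+\frac{2}{e-1}\Big)^{d}
  = \Big(\frac{e+1}{e-1}\Big)^{d}.
```

In particular the sum is finite, so $C_{p}$ is a finite constant, and the one-dimensional factor $(e+1)/(e-1)$ is approximately $2.164$.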

Second step

Denoting $\chi\big{(}x,\vec{h}\big{)}=\big{\{}\big{|}\widehat{\sigma}^{2}\big{(}x,\vec{h}\big{)}-\sigma^{2}\big{(}x,\vec{h}\big{)}\big{|}-\mathfrak{U}_{n}\big{(}x,\vec{h}\big{)}\big{\}}_{+}$ , where

[TABLE]

and choosing ${\cal L}=M^{2}$ and ${\cal L}_{\infty}={\cal M}^{2}_{\infty}$ , we get from (3.23)

[TABLE]

Note that $\sigma^{M^{2}}\big{(}x,\vec{h}\big{)}\leq{\cal M}_{\infty}\big{(}\vec{h}\big{)}\sigma\big{(}x,\vec{h}\big{)}$ and, therefore, for any $x\in{\mathbb{R}}^{d}$ and any $\vec{h}\in{\cal H}^{d}$

[TABLE]

This implies,

[TABLE]

where we have denoted $\chi^{*}(x,\vec{h}\big{)}={\cal M}^{-1}_{\infty}\big{(}\vec{h}\big{)}\chi(x,\vec{h}\big{)}$ . Hence

[TABLE]

By the same reason

[TABLE]

Note that the definition of $\widehat{U}_{n}\big{(}x,\vec{h}\big{)}$ and $U_{n}\big{(}x,\vec{h}\big{)}$ implies that

[TABLE]

Using the inequality $\sqrt{|ab|}\leq 2^{-1}(|ay|+|b/y|)$ , $y>0$ we get from (3.27), (3.28) and (3.29)

[TABLE]

Choosing $y=1/2$ in the first inequality and $y=1$ in the second we get for any $x\in{\mathbb{R}}^{d}$ and $\vec{h}\in{\cal H}^{d}$

[TABLE]
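The inequality $\sqrt{|ab|}\leq 2^{-1}(|ay|+|b/y|)$ , $y>0$ , used above is the weighted arithmetic-geometric mean inequality. A one-line verification, together with the two specializations of $y$ , reads as follows (added for convenience; only the elementary inequality is derived, not the displays of the proof):

```latex
0 \leq \Big(\sqrt{|a|y}-\sqrt{|b|/y}\Big)^{2}
  = |a|y + |b|/y - 2\sqrt{|ab|}
\quad\Longrightarrow\quad
\sqrt{|ab|}\leq \tfrac{1}{2}\big(|ay|+|b/y|\big),
\\[4pt]
y=\tfrac12:\ \ \sqrt{|ab|}\leq \tfrac14|a|+|b|,
\qquad
y=1:\ \ \sqrt{|ab|}\leq \tfrac12\big(|a|+|b|\big).
```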

Remembering that $b=8p+22$ we obtain from (3.30), (3.31), (3.25) and (3.13) for any $\vec{h}\in{\cal H}^{d}$

[TABLE]

The second and third assertions follow from (3.32) and (3.33) with $C^{\prime}_{p}=M_{\infty}^{2p}C^{(5)}_{p}.$

3.2 Proof of Theorem 2

Let $f\in\mathbb{F}_{g,\mathbf{u}}(R,D)$ . Introduce the following notations:

[TABLE]

where $c_{1}=M_{2}\sqrt{2D}$ , $c_{2}=\frac{4M_{\infty}}{3}$ and $c_{3}=2\max\big{\{}4\ln(M_{\infty}),(8p+26)\max_{j=1,\ldots,d}[1+\boldsymbol{\mu}_{j}(\alpha)]\big{\}}.$

3.2.1 Preliminaries

Recall that for any locally integrable function $\lambda:{\mathbb{R}}^{d}\to{\mathbb{R}}$ its strong maximal function is defined as

[TABLE]

where the supremum is taken over all possible rectangles $H$ in ${\mathbb{R}}^{d}$ with sides parallel to the coordinate axes, containing the point $x$ .

It is well known that the strong maximal operator $\lambda\mapsto\mathfrak{M}[\lambda]$ is of the strong $(\mathbf{t},\mathbf{t})$ –type for all $1<\mathbf{t}\leq\infty$ , i.e., if $\lambda\in{\mathbb{L}}_{\mathbf{t}}({\mathbb{R}}^{d})$ then $\mathfrak{M}[\lambda]\in{\mathbb{L}}_{\mathbf{t}}({\mathbb{R}}^{d})$ and there exists a constant $C_{\mathbf{t}}$ depending on $\mathbf{t}$ only such that

[TABLE]

Let $\mathfrak{m}[\lambda]$ be defined by (3.34), where, instead of rectangles, the supremum is taken over all possible cubes $H$ in ${\mathbb{R}}^{d}$ with sides parallel to the coordinate axes, containing the point $x$ . Then, it is known that $\lambda\mapsto\mathfrak{m}[\lambda]$ is of the weak $(1,1)$ -type, i.e. there exists $C_{\mathbf{1}}$ depending on $d$ only such that for any $\lambda\in{\mathbb{L}}_{1}({\mathbb{R}}^{d})$

[TABLE]

The results presented below deal with the weak property of the strong maximal function. The following inequality can be found in Guzman (1975). There exists a constant $\mathbf{C}>0$ depending on $d$ only such that

[TABLE]

where for all $z\in{\mathbb{R}}$ , $\ln_{+}(z):=\max\{\ln(z),0\}$ .

Lemma 1.

For any given $d\geq 1,R>0$ , $Q>0$ and $\mathbf{q}\in(1,\infty]$ there exists $C(d,\mathbf{q},R,Q)$ such that for any $\lambda\in\mathbb{B}_{1,d}(R)\cap\mathbb{B}_{\mathbf{q},d}(Q)$

[TABLE]

The lemma is an elementary consequence of the aforementioned result, and its proof is omitted.

Recall also the particular case of the Young inequality for weak-type spaces, see Grafakos (2008), Theorem 1.2.13. For any $\mathbf{u}\in(1,\infty]$ there exists $C_{\mathbf{u}}>0$ such that for any $\lambda_{1}\in{\mathbb{L}}_{1}\big{(}{\mathbb{R}}^{d}\big{)}$ and $\lambda_{2}\in{\mathbb{L}}_{\mathbf{u},\infty}\big{(}{\mathbb{R}}^{d}\big{)}$ one has

[TABLE]

Auxiliary results

Let us prove several simple facts. First note that for any $n\geq 3$ for any $\vec{h}\in{\cal H}^{d}$

[TABLE]

Second, it is easy to see that for any $n\geq 3$ ,

[TABLE]

where $l(v)=v^{-1}(1+\ln{v})$ . Since $\vec{\eta}\geq\vec{h}$ implies $V_{\vec{\eta}}\geq V_{\vec{h}}$ and $l(v)\leq 1$ if $v\geq 1$ , we have

[TABLE]
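The bound $l(v)\leq 1$ for $v\geq 1$ invoked above follows from the standard inequality $\ln(v)\leq v-1$ ; the short check below is added for convenience.

```latex
\ln(v)\leq v-1 \quad (v>0)
\qquad\Longrightarrow\qquad
l(v)=\frac{1+\ln(v)}{v}\leq\frac{1+(v-1)}{v}=1 ,
```

with equality only at $v=1$ .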

Then by (3.38) and the second inequality in (3.39), we have:

[TABLE]

Now let us establish two bounds for $\|U^{*}_{n}\big{(}\cdot,\vec{h}\big{)}\|_{\infty}$ .

$\bf 1^{0}a.\;$ Let $\mathbf{u}=\infty$ . We have in view of the second inequality in (3.11) for any $\vec{\eta}\in{\cal H}^{d}$

[TABLE]

It yields for any $x\in{\mathbb{R}}^{d}$ in view of the first inequality in (3.39)

[TABLE]

Then, combining (3.40) and (3.41) with the definition of $a$ , we have

[TABLE]

$\bf 1^{0}b.\;$ Another bound for $\|U^{*}_{n}\big{(}\cdot,\vec{h}\big{)}\|_{\infty}$ is available regardless of the value of $\mathbf{u}$ . Indeed for any $\vec{\eta}\in{\cal H}^{d}$ in view of the first inequality in (3.11)

[TABLE]

It yields for any $x\in{\mathbb{R}}^{d}$ and any $n\geq 3$

[TABLE]

Combining this with (3.40) again, we have $\|U^{*}_{n}\big{(}\cdot,\vec{h}\big{)}\|_{\infty}\leq\big{[}(\sqrt{2c_{3}n}M_{\infty})\vee(c_{2}c_{3})\big{]}G_{n}\big{(}\vec{h}\big{)}$ for any $\vec{h}\in{\cal H}^{d}$ and, therefore,

[TABLE]

To get this it suffices to choose $\vec{h}=(b,\ldots,b)$ and to make $b$ tend to infinity.

$\bf 2^{0}.\;$ Let now $\mathbf{u}<\infty$ . Let us prove that for any $\mathfrak{z}>0$ , $\mathbf{s}\in\{1,\mathbf{u}\}$ and any $f\in\mathbb{F}_{g,\mathbf{u}}(R,D)$

[TABLE]

where we have put ${\cal U}^{2}_{n}\big{(}\cdot,\vec{\eta},f\big{)}=2n^{-1}\lambda_{n}\big{(}\vec{\eta}\big{)}\sigma^{2}\big{(}\cdot,\vec{\eta}\big{)}$ and $\widetilde{D}=1$ if $\mathbf{s}=1$ and $\widetilde{D}=D$ if $\mathbf{s}=\mathbf{u}$ .

Indeed, if $\mathbf{s}=1$ , applying the Markov inequality, we obtain in view of the second inequality in (3.11) for any $\vec{\eta}\in{\cal H}^{d}$

[TABLE]

Here we have put $c_{6}=2M_{2}^{2}c_{1}^{2}c_{3}$ and to get the last inequality we have used (3.38).

To get a similar result if $\mathbf{s}=\mathbf{u}$ we remark that $\sigma^{2}\big{(}\cdot,\vec{\eta}\big{)}=M^{2}\big{(}\cdot,\vec{\eta}\big{)}\star\mathfrak{p}(\cdot)$ and that $M^{2}\big{(}\cdot,\vec{\eta}\big{)}\in{\mathbb{L}}_{1}\big{(}{\mathbb{R}}^{d}\big{)}$ in view of the second inequality in (3.11). It remains to note that $f\in\mathbb{F}_{g,\mathbf{u}}(R,D)$ implies $\mathfrak{p}\in\mathbb{B}_{\mathbf{u},d}^{(\infty)}(D)$ and to apply the inequality (3.37).

It yields together with the second inequality in (3.11) for any $\vec{\eta}\in{\cal H}^{d}$

[TABLE]

Thus, denoting $\widetilde{C}=1$ if $\mathbf{s}=1$ and $\widetilde{C}=C_{\mathbf{u}}$ if $\mathbf{s}=\mathbf{u}$ , we get from (3.45) and (3.46)

[TABLE]

It remains to note that since $\vec{\eta},\vec{h}\in{\cal H}^{d}$ and $\vec{\eta}\geq\vec{h}$ we can write $\eta_{j}=e^{m_{j}}h_{j}$ with $m_{j}\geq 0$ for any $j=1,\ldots,d$ . It yields together with the first inequality in (3.39)

[TABLE]

Hence, (3.44) with $c_{5}=c_{7}[c_{6}\widetilde{C}]^{\mathbf{s}}$ follows from (3.47).

$\bf 3^{0}.\;$ Let $c_{K}\geq 1$ be such that $\text{supp}(K)\subset[-c_{K},c_{K}]^{d}$ . We have

[TABLE]

If $\vec{h}=(h,\ldots,h),h\in(0,\infty)$ , the latter inequality holds with $\mathfrak{m}[|f|]$ instead of $\mathfrak{M}[|f|]$ . Thus,

[TABLE]

where we have denoted $\mathfrak{M}_{\mathbb{H}}=\mathfrak{M}$ if $\mathbb{H}={\cal H}^{d}$ and $\mathfrak{M}_{\mathbb{H}}=\mathfrak{m}$ if $\mathbb{H}={\cal H}^{d}_{\text{isotr}}$ .

Moreover, we deduce from (3.48) and (3.43) putting $T_{\vec{h}}(x,f)={\cal B}_{\vec{h}}(x,f)+49U^{*}_{n}\big{(}\cdot,\vec{h}\big{)}$ that

[TABLE]

3.2.2 Proof of the theorem

For any $v>0$ set ${\cal C}_{v}(f)=\big{\{}x\in{\mathbb{R}}^{d}:\;\mathbf{T}(x,f)\geq v\big{\}}$ , where we have put $\mathbf{T}(x,f)=\inf_{\vec{h}\in\mathbb{H}}|T_{\vec{h}}(x,f)|$ . For any given $\overline{\boldsymbol{v}}>0$ one obviously has

[TABLE]

Denoting ${\cal W}_{v}(\vec{h},f)=\{x\in{\mathbb{R}}^{d}:\;49U^{*}_{n}\big{(}x,\vec{h}\big{)}\geq 2^{-1}v\}$ we obviously have for any $\vec{h}\in\mathbb{H}$ and $v>0$

[TABLE]

The last inequality follows from (3.49). Set ${\cal U}^{*}_{n}\big{(}x,\vec{h},f\big{)}=\sup_{\vec{\eta}\in{\cal H}^{d}:\;\vec{\eta}\geq\vec{h}}\;{\cal U}_{n}\big{(}x,\vec{\eta},f\big{)}$ .

$\bf 1^{0}.\;$ Noting that $U^{*}_{n}\big{(}x,\vec{h}\big{)}\leq{\cal U}^{*}_{n}\big{(}x,\vec{h},f\big{)}+(196a)^{-1}G_{n}\big{(}\vec{h}\big{)}$ in view of (3.40), we get

[TABLE]

Applying (3.44) with $\mathbf{s}=1$ we deduce from (3.51) that

[TABLE]

Noting that the left hand side of the latter inequality is independent of $\vec{h}$ we get

[TABLE]

$\bf 2^{0}.\;$ Let us establish the following bounds, where $c_{9}$ is given in the paragraph $\bf 2^{0}b.$ below.

For any $\mathbf{u}\in[1,\infty]$ ,

[TABLE]

and for any $\mathbf{u}\in(p/2,\infty]$ ,

[TABLE]

$\bf 2^{0}a.\;$ Let $\mathbf{u}=\infty$ . We remark that ${\cal W}_{v}(\vec{h},f)=\emptyset$ for any $\vec{h}\in\mathfrak{H}(v,2)$ in view of (3.42). Thus, we deduce from (3.51), (3.52) and (2.6), taking into account that the left hand sides of both inequalities are independent of $\vec{h}$

[TABLE]

This inequality and (3.55) ensure that (3.56) and (3.57) hold if $\mathbf{u}=\infty$ .

$\bf 2^{0}b.\;$ Let $\mathbf{u}<\infty$ . Applying (3.44) with $\mathbf{s}=\mathbf{u}$ , we obtain in view of (3.54)

[TABLE]

It yields together with (3.51)

[TABLE]

This inequality and (3.55) ensure that (3.56) holds if $\mathbf{u}<\infty$ .

What is more, we have in view of (3.40) and (3.54) for any $\vec{h}\in\mathfrak{H}(v)$

[TABLE]

Moreover, applying (3.44) with $\mathbf{s}=\mathbf{u}$ , we have for any $y>0$ and $\vec{h}\in\mathfrak{H}(v,z)$

[TABLE]

Hence, if additionally $\mathbf{u}>p/2$ , we have for any $\vec{h}\in\mathfrak{H}(v,z)$

[TABLE]

This yields together with (3.52)

[TABLE]

This inequality ensures that (3.57) holds if $\mathbf{u}<\infty$ .

$\bf 3^{0}.\;$ Recall that $f\in\mathbb{F}_{g}(R)$ implies that $f\in\mathbb{B}_{\mathbf{1},d}(R)$ . Since additionally $f\in\mathbb{B}_{\mathbf{q},d}(D)$ , $\mathbf{q}>1$ , Lemma 1 as well as (3.36) is applicable and we obtain in view of (3.53)

[TABLE]

It yields for any $\overline{\boldsymbol{v}}>0$ and $p>1$

[TABLE]

In the case of $t(\mathbb{H})=0$ the last inequality is obvious and if $t(\mathbb{H})=d-1$ it follows by integration by parts. The assertion of the theorem follows now from (3.50), where the bound (3.61) is used for any $v<\underline{\boldsymbol{v}}$ , the estimate (3.56) for any $v\in[\underline{\boldsymbol{v}},\overline{\boldsymbol{v}}]$ and the bound (3.57) with $v=\overline{\boldsymbol{v}}$ .

3.3 Proof of Theorem 3

The proof of the theorem is based essentially on some auxiliary statements formulated in Section 3.3.1 below.

Some properties related to the kernel approximation of the underlying function $f$ are summarized in Lemma 2 and in formulae (3.62). The results presented in Lemma 1 and in formulae (3.63) deal with the properties of the strong maximal function. In the subsequent proof $c_{1},c_{2},\ldots$ , stand for constants depending only on $\vec{s},\vec{q},g,{\cal K},d$ , $R,D,\mathbf{u}$ and $\mathbf{q}$ .

3.3.1 Auxiliary results

Let $\mathfrak{J}$ denote the set of all the subsets of $\{1,\ldots d\}$ endowed with the empty set $\emptyset$ . For any $J\in\mathfrak{J}$ and $y\in{\mathbb{R}}^{d}$ set $y_{J}=\{y_{j},\;j\in J\}\in{\mathbb{R}}^{|J|}$ and we will write $y=\big{(}y_{J},y_{\bar{J}}\big{)}$ , where as usual $\bar{J}=\{1,\ldots d\}\setminus J$ .

For any $j=1,\ldots,d$ introduce the $d\times d$ matrix $\mathbf{E}_{j}=(\mathbf{0},\ldots,\mathbf{e}_{j},\ldots,\mathbf{0})$ where, recall, $(\mathbf{e}_{1},\dots,\mathbf{e}_{d})$ denotes the canonical basis of $\mathbb{R}^{d}$ . Set also $\mathbf{E}[J]=\sum_{j\in J}\mathbf{E}_{j}$ . Later on $\mathbf{E}_{0}=\mathbf{E}[\emptyset]$ denotes the matrix with zero entries.
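To make this notation concrete, the following minimal Python sketch builds $\mathbf{E}_{j}$ and $\mathbf{E}[J]$ as plain nested lists and applies them to a vector. The function names `E_single`, `E_of` and `apply` are illustrative and not part of the paper; indices are 1-based as in the text.

```python
def E_single(j, d):
    """E_j: the d x d matrix whose j-th column is e_j and all others are zero (j is 1-based)."""
    return [[1 if r == c == j - 1 else 0 for c in range(d)] for r in range(d)]


def E_of(J, d):
    """E[J] = sum of E_j over j in J; the empty set gives the zero matrix E_0."""
    total = [[0] * d for _ in range(d)]
    for j in J:
        Ej = E_single(j, d)
        for r in range(d):
            for c in range(d):
                total[r][c] += Ej[r][c]
    return total


def apply(E, y):
    """Matrix-vector product E y."""
    return [sum(E[r][c] * y[c] for c in range(len(y))) for r in range(len(E))]


y = [5.0, 7.0, 9.0]
print(apply(E_of({1, 3}, 3), y))  # keeps coordinates 1 and 3, zeroes coordinate 2
print(E_of(set(), 2))             # the zero matrix E_0
```

In this reading $\mathbf{E}[J]y$ keeps the coordinates $y_{J}$ and sets those indexed by $\bar{J}$ to zero, which matches the decomposition $y=(y_{J},y_{\bar{J}})$ .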

To any $J\in\mathfrak{J}$ and any $\lambda:{\mathbb{R}}^{d}\to{\mathbb{R}}$ associate the function

[TABLE]

with the obvious agreement $\lambda_{J}\equiv\lambda$ if $J=\{1,\ldots d\}$ , which is always the case if $d=1$ .

For any $\vec{h}\in{\cal H}^{d}$ and $J\subseteq\{1,\ldots d\}$ set $K_{\vec{h},J}(u_{J})=\prod_{j\in J}h^{-1}_{j}{\cal K}\big{(}u_{j}/h_{j}\big{)}$ and define for any $y\in{\mathbb{R}}^{d}$

[TABLE]

where $\nu_{|\bar{J}|}$ is the Lebesgue measure on ${\mathbb{R}}^{|\bar{J}|}$ . For any $\vec{h},\vec{\eta}\in{\cal H}^{d}$ set

[TABLE]
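The product kernel $K_{\vec{h},J}$ introduced above is straightforward to evaluate numerically. The sketch below is only an illustration: the boxcar function `K` is a hypothetical stand-in for ${\cal K}$ (the kernel of the paper must satisfy its own assumptions), and bandwidths and points are passed as dictionaries keyed by the 1-based coordinate index.

```python
def K(u):
    """Hypothetical univariate kernel: uniform density on [-1/2, 1/2] (an assumption, not the paper's K)."""
    return 1.0 if abs(u) <= 0.5 else 0.0


def K_hJ(h, J, u):
    """Evaluate prod_{j in J} h_j^{-1} K(u_j / h_j); an empty J gives the empty product 1."""
    value = 1.0
    for j in J:
        value *= K(u[j] / h[j]) / h[j]
    return value


def integral_1d(hj, lo=-2.0, hi=2.0, n=4000):
    """Midpoint Riemann sum of u -> h_j^{-1} K(u / h_j) over [lo, hi]."""
    step = (hi - lo) / n
    return sum(K_hJ({1: hj}, [1], {1: lo + (i + 0.5) * step}) for i in range(n)) * step


h = {1: 0.5, 2: 2.0}
print(K_hJ(h, [1, 2], {1: 0.1, 2: 0.3}))  # both coordinates lie inside the support
print(K_hJ(h, [1], {1: 0.3, 2: 0.3}))     # 0.3 / 0.5 lies outside [-1/2, 1/2]
print(integral_1d(0.5))                    # close to 1: rescaling preserves the integral
```

The last line illustrates why the factor $h_{j}^{-1}$ appears: each rescaled factor keeps the same integral as the unscaled kernel.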

Lemma 2.

Let Assumption 3 hold. One can find $k\in\{1,\ldots d\}$ and a collection of indices $\big{\{}j_{1}<j_{2}<\cdots<j_{k}\big{\}}\subseteq\{1,\ldots,d\}$ such that for any $x\in{\mathbb{R}}^{d}$ and any $f:{\mathbb{R}}^{d}\to{\mathbb{R}}$

[TABLE]

The proof of the lemma can be found in Lepski (2015), Lemma 2.

Also, let us mention the following bound which is a trivial consequence of the Young inequality and the Fubini theorem. If $\lambda\in{\mathbb{L}}_{\mathbf{t}}({\mathbb{R}}^{d})$ then for any $\mathbf{t}\in[1,\infty]$

[TABLE]
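The Young inequality underlying this bound can be checked numerically in a discrete one-dimensional analogue, namely $\|k*\lambda\|_{t}\leq\|k\|_{1}\|\lambda\|_{t}$ for finite sequences. The sketch below is only such a sanity check under that simplification; the sequences are arbitrary test data and the helper names are illustrative.

```python
def convolve(a, b):
    """Full discrete convolution of two finite sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def norm(x, t):
    """l_t norm of a finite sequence; t = float('inf') gives the sup norm."""
    if t == float("inf"):
        return max(abs(v) for v in x)
    return sum(abs(v) ** t for v in x) ** (1.0 / t)


kernel = [0.25, 0.5, 0.25]              # arbitrary weights with l_1 norm equal to 1
lam = [1.0, -3.0, 2.0, 0.0, 4.0, -1.0]  # arbitrary test sequence

for t in (1, 2, 3.5, float("inf")):
    lhs = norm(convolve(kernel, lam), t)
    rhs = norm(kernel, 1) * norm(lam, t)
    print(t, lhs <= rhs + 1e-12)  # small slack for floating-point rounding
```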

To any $J\in\mathfrak{J}$ and any locally integrable function $\lambda:{\mathbb{R}}^{d}\to{\mathbb{R}}_{+}$ we associate the operator

[TABLE]

where the supremum is taken over all hyper-rectangles in ${\mathbb{R}}^{|\bar{J}|}$ containing $x_{\bar{J}}=(x_{j},j\in\bar{J})$ and with sides parallel to the axes.

As we see $\mathfrak{M}_{J}[\lambda]$ is the strong maximal operator applied to the function obtained from $\lambda$ by fixing the coordinates whose indices belong to $J$ . It is obvious that $\mathfrak{M}_{\emptyset}[\lambda]\equiv\mathfrak{M}[\lambda]$ and $\mathfrak{M}_{\{1,\ldots,d\}}[\lambda]\equiv\lambda$ .

The following result is a direct consequence of (3.35) and of the Fubini theorem. For any $\mathbf{t}\in(1,\infty]$ there exists $\mathbf{C}_{\mathbf{t}}$ such that for any $\lambda\in{\mathbb{L}}_{\mathbf{t}}\big{(}{\mathbb{R}}^{d})$

[TABLE]

Obviously this inequality holds if $\mathbf{t}=\infty$ with $\mathbf{C}_{\infty}=1$ .
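A discrete analogue may help to visualise $\mathfrak{M}_{J}$ . In the sketch below, which assumes $d=2$ and a nonnegative function sampled on a small grid, averages over axis-parallel rectangles of grid cells replace the integral averages. The function name `maximal` and the sample values are illustrative only.

```python
def maximal(lam, J=frozenset()):
    """Discrete strong maximal function on a 2-D grid; coordinates listed in J (1-based) are frozen."""
    n1, n2 = len(lam), len(lam[0])

    def spans(x, n, frozen):
        # All index intervals [a, b] containing x, or only [x, x] when the coordinate is frozen.
        if frozen:
            return [(x, x)]
        return [(a, b) for a in range(x + 1) for b in range(x, n)]

    out = [[0.0] * n2 for _ in range(n1)]
    for x1 in range(n1):
        for x2 in range(n2):
            best = 0.0
            for a1, b1 in spans(x1, n1, 1 in J):
                for a2, b2 in spans(x2, n2, 2 in J):
                    total = sum(lam[i][k] for i in range(a1, b1 + 1) for k in range(a2, b2 + 1))
                    cells = (b1 - a1 + 1) * (b2 - a2 + 1)
                    best = max(best, total / cells)
            out[x1][x2] = best
    return out


lam = [[0.0, 1.0, 0.0], [2.0, 0.0, 0.0], [0.0, 0.0, 4.0]]
print(maximal(lam, {1, 2}) == lam)           # freezing every coordinate returns lam itself
print(max(max(row) for row in maximal(lam)))  # the sup norm is not increased
```

The two printed checks mirror the identities $\mathfrak{M}_{\{1,\ldots,d\}}[\lambda]\equiv\lambda$ and $\mathbf{C}_{\infty}=1$ in this discrete setting.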

3.3.2 Proof of the theorem

$\mathbf{1^{0}.}\;$ We start with the following obvious observation. For any $\lambda:{\mathbb{R}}^{d}\to{\mathbb{R}}_{+}$ , $\vec{u}\in{\mathbb{R}}^{d}$ and $J\in\mathfrak{J}$

[TABLE]

Putting $C_{1}=(2c_{\cal K}\|{\cal K}\|_{\infty})^{d}$ we get for any $\vec{h},\vec{\eta}\in{\cal H}^{d}$ and $x\in{\mathbb{R}}^{d}$ in view of (3.65) and assertions of Lemma 2 that

[TABLE]

Thus noting that the right hand side of the first inequality above is independent of $\vec{\eta}$ , we obtain

[TABLE]

Applying (3.64) with $\mathbf{t}=\infty$ , we have for any $v>0$ in view of the definition of $J(\vec{h},v)$

[TABLE]

We obtain for any $f\in\mathbb{F}$ , $v>0$ and $\vec{s}=(s_{1},\ldots,s_{d})\in(1,\infty)^{d}$ , applying consecutively the Markov inequality and (3.64) with $\mathbf{t}=s_{j}$ ,

[TABLE]

Noting that the right hand side of the latter inequality is independent of $f$ and the left hand side is independent of $\vec{s}$ , we get

[TABLE]
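The Markov step used above has the generic form $\nu\{x:\,b(x)\geq v\}\leq v^{-s}\|b\|_{s}^{s}$ , and since the left hand side does not depend on $s$ , the bound may be minimised over $s$ . The following sketch illustrates this on a function sampled on a one-dimensional grid; the function `b`, the grid and the exponents are hypothetical choices made only for illustration.

```python
import math


def level_set_measure(values, step, v):
    """Riemann approximation of the Lebesgue measure of {x : b(x) >= v}."""
    return step * sum(1 for b in values if b >= v)


def markov_bound(values, step, v, s):
    """v^{-s} * ||b||_s^s, with the integral replaced by a Riemann sum."""
    return step * sum(b ** s for b in values) / v ** s


step = 0.01
grid = [-5.0 + (i + 0.5) * step for i in range(1000)]  # midpoints of cells covering [-5, 5]
values = [math.exp(-x * x) for x in grid]               # hypothetical nonnegative b

v = 0.5
exponents = (1.0, 2.0, 4.0, 8.0)
measure = level_set_measure(values, step, v)
bounds = [markov_bound(values, step, v, s) for s in exponents]
print(measure, min(bounds))  # the measure is below every bound, hence below the smallest one
```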

$\mathbf{2^{0}.}\;$ Note also that in view of (3.67), we have for any $v>0$

[TABLE]

For any $v>0$ and $j=1,\ldots,d,$ introduce

[TABLE]

Noting that in view of (3.67) for any $v>0$ and any $j\in\bar{J}(\vec{h},v)$

[TABLE]

we deduce from (3.3.2) that for any $\vec{q}\in[p,\infty)^{d}$

[TABLE]

It remains to note that, similarly to (3.68), for any $\vec{s}\in(1,\infty)^{d}$

[TABLE]

and to apply (3.64) with $\mathbf{t}=q_{j}$ to each term of the sum appearing in (3.3.2). All of this, together with (3.68) applied with $\vec{s}=\vec{q}$ , yields for any $v>0$ and $\vec{q}\in[p,\infty)^{d}$

[TABLE]

Noting that the right hand side of the latter inequality is independent of $f$ and the left hand side is independent of $\vec{q}$ , we get

[TABLE]

The first assertion of the theorem follows from (3.69), (3.72) and Theorem 2.

$\mathbf{3^{0}.}\;$ Remark that in view of (3.48) and (3.35) $f\in\mathbb{B}_{\mathbf{q},d}(D)$ implies

[TABLE]

where $C_{\mathbf{q}}$ is the constant which appeared in (3.35). Hence for any $v>0$ and $\mathbf{q}\in[p,\infty)$

[TABLE]

Recall that $\mathfrak{H}(v)\neq\emptyset$ , $\mathfrak{H}(v,z)\neq\emptyset$ whatever $v>0$ and $z\geq 2$ , see Remark 2. Hence, in view of (3.74) for any $f$

[TABLE]

It remains to note that the right hand side of the obtained inequality is independent of $f$ and the second assertion of the theorem follows from this inequality, (3.69) and Theorem 2.

$\mathbf{4^{0}.}\;$ Since $C_{\mathbf{\infty}}=1$ we obtain in view of (3.73) for all $f\in\mathbb{B}_{\infty,d}(D)$

[TABLE]

It yields for any $\vec{s}\in(1,\infty)^{d}$ in view of (3.68) if $\mathbf{q}=\infty$

[TABLE]

Since the right hand side of the obtained inequality is independent of $f$ and the left hand side is independent of $\vec{s}$ , we conclude that

[TABLE]

The third assertion of the theorem follows now from (3.69), (3.75) and Theorem 2.

$\mathbf{5^{0}.}\;$ We have already seen (Corollary 1) that $B^{*}_{\vec{h}}(\cdot,f)\leq 2\sup_{\eta\in{\cal H}:\eta\leq h}B_{\vec{\eta}}(\cdot,f)$ if $\vec{h}=(h,\ldots,h)\in{\cal H}^{d}_{\text{isotr}}$ . Therefore by definition of ${\cal B}_{\vec{h}}(\cdot,f)$ :

[TABLE]

where, recall, $\vec{\eta}=(\eta,\ldots,\eta)\in{\cal H}^{d}_{\text{isotr}}$ . We remark that (3.76) is similar to (3.66) but the maximal operator is not involved in this bound. This, in turn, allows us to consider $\vec{s}\in[1,\infty)^{d}$ .

Indeed, similarly to (3.67) we have for any $v>0$ , applying (3.62) with $\mathbf{t}=\infty$

[TABLE]

We obtain for any $f\in\mathbb{F}$ , $v>0$ and $\vec{s}=(s_{1},\ldots,s_{d})\in[1,\infty)^{d}$ applying consecutively the Markov inequality and (3.62) with $\mathbf{t}=s_{j}$

[TABLE]

We note that the obtained inequality coincides with (3.68) if one replaces $\mathbf{B}_{j,s_{j},\mathbb{F}}(\cdot)$ by $\mathbf{B}^{*}_{j,s_{j},\mathbb{F}}(\cdot)$ . It remains to remark that $\mathbf{B}_{j,s_{j},\mathbb{F}}(\cdot)\leq\mathbf{B}^{*}_{j,s_{j},\mathbb{F}}(\cdot)$ . Indeed,

[TABLE]

Therefore, by the monotone convergence theorem and the triangle inequality for any $s\in[1,\infty)$

[TABLE]

The fourth statement of the theorem follows now from (3.69), (3.72), (3.74) and Theorem 2.

3.4 Proof of Assertion 1

Obviously $\mathbb{F}_{K}\big{(}\vec{\beta},\vec{L}\big{)}\subset\mathbb{B}_{\infty,d}(L_{\infty})$ . Thus, we can choose $D=L_{\infty}$ and $\mathbf{q}=\infty$ , which implies $\mathbf{u}=\infty$ . For any $v>0$ let $\vec{\boldsymbol{h}}(v)=\big{(}\boldsymbol{h}_{1}(v),\ldots,\boldsymbol{h}_{d}(v)\big{)}$ , where

[TABLE]

and $\boldsymbol{L}\in(0,1)$ is chosen to satisfy $\boldsymbol{L}L_{0}\leq\mathbf{c}$ . This in its turn implies $\boldsymbol{L}L_{0}<1$ .

This choice of $\vec{\boldsymbol{h}}(v)$ together with the definition of the class $\mathbb{F}_{K}\big{(}\vec{\beta},\vec{L}\big{)}$ implies that

[TABLE]

Moreover, there exists $T_{1}:=T_{1}\big{(}\vec{\beta}\big{)}<\infty$ independent of $\vec{L}$ such that

[TABLE]

Then set $T_{2}=e^{\frac{d}{2}+\sum_{j=1}^{m}\mu_{j}(\alpha)}\sqrt{T_{1}+2}(\boldsymbol{L}L_{0})^{-\frac{1}{2\beta(\alpha)}}$ .

We have in view of (3.79) and (3.80) for all $n$ large enough and any $v\in V_{n}$

[TABLE]

Setting $T_{3}=(\sqrt{2}a^{-1}T_{2})^{\frac{2\beta(\alpha)}{2\beta(\alpha)+1}}$ and $T_{4}=(a^{-1}T^{2}_{2})^{\frac{\beta(\alpha)}{\beta(\alpha)+1}}$ we obtain in view of (3.81) and (3.82) for all $n$ large enough

[TABLE]
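The constants $T_{2}$ , $T_{3}$ and $T_{4}$ are explicit, so they can be evaluated once the quantities entering them are known. The sketch below simply transcribes the three formulas; all numerical inputs (`d`, `mu`, `T1`, `LL0`, `beta`, `a`) are hypothetical placeholders, since their actual values are fixed elsewhere in the paper.

```python
import math


def constants(d, mu, T1, LL0, beta, a):
    """Transcription of T_2, T_3, T_4; mu lists the mu_j(alpha), LL0 is the product L * L_0."""
    T2 = math.exp(d / 2 + sum(mu)) * math.sqrt(T1 + 2) * LL0 ** (-1 / (2 * beta))
    T3 = (math.sqrt(2) * T2 / a) ** (2 * beta / (2 * beta + 1))
    T4 = (T2 ** 2 / a) ** (beta / (beta + 1))
    return T2, T3, T4


# Hypothetical inputs, chosen only so that the arithmetic is easy to follow.
T2, T3, T4 = constants(d=2, mu=[0.5, 0.5], T1=2.0, LL0=0.25, beta=1.0, a=1.0)
print(T2, T3, T4)
```

With nonnegative `mu`, `T1 >= 0` and `LL0 < 1`, every factor of `T2` exceeds one, so `T2 > 1` in this example.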

It is worth noting that $T_{2}>1$ , which implies $T_{4}>1$ , and $\delta_{n}^{\frac{\beta(\alpha)}{1+\beta(\alpha)}}\varphi^{-1}_{n}\to 0,n\to\infty$ . Choose

[TABLE]

Since $\vec{\boldsymbol{h}}(v)\in\mathfrak{H}(v)$ for any $v\in[\underline{\boldsymbol{v}},\overline{\boldsymbol{v}}]$ in view of (3.83), we deduce from (3.78) and (3.81) for any $\vec{s}$

[TABLE]

This, in its turn, yields for any $\vec{s}$

[TABLE]

where we have denoted $T_{5}=T_{2}^{2}\big{\{}1\vee|p-2-1/\beta(\alpha)|^{-1}\big{\}}$ and

[TABLE]

Moreover, since $\vec{\boldsymbol{h}}(\overline{\boldsymbol{v}})\in\mathfrak{H}(\overline{\boldsymbol{v}},2)$ in view of (3.83), we deduce from (3.78) that for any $\vec{s}$

[TABLE]

Finally, putting $T_{6}=T_{4}^{p-1}(1+\ln{T_{4}})^{t(\mathbb{H})}$ , we obtain

[TABLE]

Applying the third assertion of Theorem 3, we deduce from (3.84), (3.85) and (3.86) that

[TABLE]

After elementary computations we come to the statement of Assertion 1.

Bibliography

Barron, A., Birgé, L. and Massart, P. (1999). Risk bounds for model selection via penalization. Probab. Theory Related Fields 113, 301–413.
Birgé, L. and Massart, P. (2001). Gaussian model selection. J. Eur. Math. Soc. (JEMS) 3, 3, 203–268.
Cai, T. T. (1999). Adaptive wavelet estimation: a block thresholding and oracle inequality approach. Ann. Statist. 27, 3, 898–924.
Cavalier, L. and Golubev, G.K. (2006). Risk hull method and regularization by projections of ill-posed inverse problems. Ann. Statist. 34, 1653–1677.
Comte, F. and Lacour, C. (2013). Anisotropic adaptive kernel deconvolution. Ann. Inst. H. Poincaré Probab. Statist. 49, 2, 569–609.
Dalalyan, A. and Tsybakov, A.B. (2008). Aggregation by exponential weighting, sharp PAC-Bayesian bounds and sparsity. Machine Learning 72, 39–61.
Donoho, D. L., Johnstone, I. M., Kerkyacharian, G. and Picard, D. (1996). Density estimation by wavelet thresholding. Ann. Statist. 24, 508–539.
