Density deconvolution under general assumptions on the distribution of measurement errors

Denis Belomestny; Alexander Goldenshluger

arXiv:1907.11024·math.ST·February 4, 2020

TL;DR

This paper develops a flexible method for density deconvolution that works under broad conditions on the measurement error distribution, including cases in which the characteristic function of the errors has zeros on the real line.

Contribution

It introduces a novel approach for density deconvolution that handles general error distributions, relaxing the common zero-free characteristic function assumption.

Findings

1. Derived upper bounds on the risk of the proposed estimators.
2. Provided sufficient conditions under which zeros of the characteristic function do not affect estimation accuracy.
3. Showed that these conditions are also necessary in some specific problem instances.

Equations

Y_{j}=X_{j}+\epsilon_{j},\quad j=1,\ldots,n,

f_{Y}(y)=[f*\mathrm{d}G](y)=\int_{-\infty}^{\infty}f(y-x)\,\mathrm{d}G(x).

{\cal R}_{n,\Delta}[\tilde{f};{\mathscr{F}}]=\sup_{f\in{\mathscr{F}}}\Big\{\mathbb{E}_{f}\Delta^{2}(\tilde{f},f)\Big\}^{1/2},

{\cal R}_{n,\Delta}^{*}[{\mathscr{F}}]=\inf_{\tilde{f}}{\cal R}_{n,\Delta}[\tilde{f};{\mathscr{F}}],

\widehat{g}(z):=\int_{-\infty}^{\infty}e^{-zx}\,\mathrm{d}G(x),

{\cal R}_{n,\Delta_{x_{0}}}^{*}[{\mathscr{H}}_{\alpha}(A)]\asymp n^{-\alpha/(2\alpha+2\gamma+1)},\qquad{\cal R}_{n,\Delta_{2}}^{*}[{\mathscr{S}}_{\alpha}(A)]\asymp n^{-\alpha/(2\alpha+2\gamma+1)}

\widehat{\phi}(z)={\cal L}[\phi;z]:=\int_{-\infty}^{\infty}\phi(t)e^{-zt}\,\mathrm{d}t.

\Sigma_{\phi}:=\{z\in\mathbb{C}:\sigma_{\phi}^{-}<{\rm Re}(z)<\sigma_{\phi}^{+}\}\quad\text{for some }-\infty\leq\sigma_{\phi}^{-}<\sigma_{\phi}^{+}\leq\infty.

\widehat{\phi}(i\omega)={\cal F}[\phi;\omega]:=\int_{-\infty}^{\infty}\phi(t)e^{-i\omega t}\,\mathrm{d}t,\quad\omega\in{\mathbb{R}}

\phi(t)=\frac{1}{2\pi i}\int_{s-i\infty}^{s+i\infty}\widehat{\phi}(z)e^{zt}\,\mathrm{d}z=\frac{1}{2\pi}\int_{-\infty}^{\infty}\widehat{\phi}(s+i\omega)e^{(s+i\omega)t}\,\mathrm{d}\omega,\quad s\in(\sigma_{\phi}^{-},\sigma_{\phi}^{+}).

\widehat{g}(z)=\int_{-\infty}^{\infty}e^{-zt}\,\mathrm{d}G(t)

\int K(x)f(x)\,\mathrm{d}x=\int L(y)f_{Y}(y)\,\mathrm{d}y.

\tilde{f}(x_{0}):=\frac{1}{n}\sum_{i=1}^{n}L(Y_{i}).

\widehat{g}(-z)\neq 0,\quad\forall z\in S_{g}:=\big\{z:{\rm Re}(z)\in(\varkappa_{g}^{-},0)\cup(0,\varkappa_{g}^{+})\big\}.

\widehat{L}_{h}(z):=\frac{\widehat{K}(zh)}{\widehat{g}(-z)},\quad z\in S_{g},

L_{s,h}(t):=\frac{1}{2\pi i}\int_{s-i\infty}^{s+i\infty}\frac{\widehat{K}(zh)}{\widehat{g}(-z)}\,e^{zt}\,\mathrm{d}z,

\tilde{f}_{s,h}(x_{0})=\frac{1}{n}\sum_{j=1}^{n}L_{s,h}(Y_{j}-x_{0}),\quad s\in(\varkappa_{g}^{-},0)\cup(0,\varkappa_{g}^{+}).

\int_{-\infty}^{\infty}|L_{s,h}(y-x_{0})|\,f_{Y}(y)\,\mathrm{d}y<\infty;

\int_{-\infty}^{\infty}L_{s,h}(y-x_{0})f_{Y}(y)\,\mathrm{d}y=\int_{-\infty}^{\infty}\frac{1}{h}K\Big(\frac{x-x_{0}}{h}\Big)f(x)\,\mathrm{d}x.

\widehat{g}(z)=\frac{1}{\widehat{\psi}(z)}\prod_{k=1}^{q}\Big(1-\frac{e^{a_{k}z}}{\lambda_{k}}\Big)^{m_{k}},

\widehat{\psi}(z)=\widehat{\psi}_{0}(z)\prod_{k\in\Lambda}(-a_{k}z)^{m_{k}}\prod_{k\notin\Lambda}\Big(1-\frac{1}{\lambda_{k}}\Big)^{m_{k}},

D_{1}|\omega|^{\gamma}\leq|\widehat{\psi}(i\omega)|\leq D_{2}|\omega|^{\gamma},\quad\forall|\omega|\geq\omega_{0}.

|\widehat{\psi}^{(j)}(i\omega)|\leq D_{2+j}(1+|\omega|^{\gamma}),\quad\forall\omega\in{\mathbb{R}}

\sup_{y\in[-r,r]}|\widehat{\psi}(i\omega+y)|\leq C(r)(1+|\omega|^{\gamma}),\quad\omega\in{\mathbb{R}}

\widehat{g}(z)=\frac{\sinh(\theta z)}{\theta z}=-\frac{e^{-\theta z}}{2\theta z}\big(1-e^{2\theta z}\big),\quad z\in\mathbb{C}.

\widehat{g}(z)=\prod_{k=1}^{q}\Big[\frac{\sinh(\theta_{k}z)}{\theta_{k}z}\Big]^{m_{k}}=\frac{\exp\{-z\sum_{k=1}^{q}\theta_{k}m_{k}\}}{\prod_{k=1}^{q}(-2\theta_{k}z)^{m_{k}}}\prod_{k=1}^{q}\big(1-e^{2\theta_{k}z}\big)^{m_{k}},\quad z\in\mathbb{C}.

\widehat{\psi}(z)=\prod_{k=1}^{q}(-2\theta_{k}z)^{m_{k}}e^{z\sum_{k=1}^{q}\theta_{k}m_{k}},\qquad\widehat{\psi}_{0}(z)=e^{z\sum_{k=1}^{q}\theta_{k}m_{k}}.

\widehat{g}(z)=\sum_{k=-M}^{M}p_{k}e^{-bkz}=e^{-bMz}\sum_{k=0}^{2M}p_{M-k}e^{bkz}=e^{-bMz}p_{M}P(e^{bz}),

\widehat{g}(z)=p_{M}e^{-bMz}\prod_{k=1}^{2M}\Big(1-\frac{e^{bz}}{\lambda_{k}}\Big)=p_{M}e^{-bMz}\prod_{k:|\lambda_{k}|\neq 1}\Big(1-\frac{e^{bz}}{\lambda_{k}}\Big)\prod_{k:|\lambda_{k}|=1}\Big(1-\frac{e^{bz}}{\lambda_{k}}\Big),

\widehat{\psi}(z)=\frac{e^{bMz}}{p_{M}\prod_{k:|\lambda_{k}|\neq 1}(1-e^{bz}/\lambda_{k})},\qquad\widehat{\psi}_{0}(z)=\frac{\widehat{\psi}(z)}{\prod_{k:|\lambda_{k}|=1}(1-1/\lambda_{k})}

Full text

Density deconvolution under general assumptions on the distribution of measurement errors

Denis Belomestny and Alexander Goldenshluger

Duisburg-Essen University, Higher School of Economics and University of Haifa

Faculty of Mathematics, Duisburg-Essen University, Thea-Leymann-Str. 9, D-45127 Essen, Germany

Department of Statistics, University of Haifa, Haifa 3498838, Israel

National University Higher School of Economics, 11 Pokrovsky Bulvar, Pokrovka Complex, Moscow, Russia

Abstract

In this paper we study the problem of density deconvolution under general assumptions on the measurement error distribution. Typically deconvolution estimators are constructed using Fourier transform techniques, and it is assumed that the characteristic function of the measurement errors does not have zeros on the real line. This assumption is rather strong and is not fulfilled in many cases of interest. In this paper we develop a methodology for constructing optimal density deconvolution estimators in the general setting that covers vanishing and non-vanishing characteristic functions of the measurement errors. We derive upper bounds on the risk of the proposed estimators and provide sufficient conditions under which zeros of the corresponding characteristic function have no effect on estimation accuracy. Moreover, we show that the derived conditions are also necessary in some specific problem instances.

MSC classification: 62G07, 62G20.

Keywords: density deconvolution, minimax risk, characteristic function, Laplace transform, lower bounds, density estimation.

1 Introduction

1.1 Problem formulation and background

The problem of density deconvolution is formulated as follows. Suppose that we observe a random sample $Y_{1},\ldots,Y_{n}$ generated from the model

$$Y_{j}=X_{j}+\epsilon_{j},\quad j=1,\ldots,n,$$

where $X_{1},\ldots,X_{n}$ are i.i.d. random variables with density $f$, and the measurement errors $\epsilon_{1},\ldots,\epsilon_{n}$ are i.i.d. random variables with known distribution $G$. Furthermore, assume that $\epsilon_{1},\ldots,\epsilon_{n}$ are independent of $X_{1},\ldots,X_{n}$. Then the probability density $f_{Y}$ of $Y=X+\epsilon$ is given by the convolution

$$f_{Y}(y)=[f*\mathrm{d}G](y)=\int_{-\infty}^{\infty}f(y-x)\,\mathrm{d}G(x).$$

The goal is to estimate $f$ from the observations $Y_{1},\ldots,Y_{n}$ .
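To make the observation scheme concrete, here is a minimal simulation sketch. It assumes, purely for illustration, that $f$ is the standard normal density and that $G$ is the uniform distribution on $[-\theta,\theta]$; the function name is hypothetical and not taken from the paper. Only the array `y` would be available to the statistician.

```python
import numpy as np

def simulate_deconvolution_sample(n, theta=1.0, seed=0):
    """Draw (X, eps, Y) with Y = X + eps (hypothetical helper).

    Illustrative assumptions: X ~ N(0, 1) plays the role of the unknown
    density f, and eps ~ U(-theta, theta) is the known error law G.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)                 # latent variables X_j (unobserved)
    eps = rng.uniform(-theta, theta, size=n)   # measurement errors, independent of X
    y = x + eps                                # observed sample Y_j
    return x, eps, y

x, eps, y = simulate_deconvolution_sample(n=5, theta=0.5, seed=1)
```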

An estimator $\tilde{f}$ of $f$ is a measurable function of observations $(Y_{1},\ldots,Y_{n})$ , and the accuracy of $\tilde{f}$ is measured by the maximal risk

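$${\cal R}_{n,\Delta}[\tilde{f};{\mathscr{F}}]=\sup_{f\in{\mathscr{F}}}\Big\{\mathbb{E}_{f}\Delta^{2}(\tilde{f},f)\Big\}^{1/2},$$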

where $\Delta(\cdot,\cdot)$ is a loss function, ${\mathscr{F}}$ is a class of density functions, and $\mathbb{E}_{f}$ is the expectation with respect to the probability measure ${\mathbb{P}}_{f}$ of the observations $Y_{1},\ldots,Y_{n}$ when the density of $X$ is $f$. In this paper we are interested in estimating $f$ at a single point $x_{0}\in{\mathbb{R}}$ and in the ${\mathbb{L}}_{2}$-norm; this corresponds to the loss functions $\Delta_{x_{0}}(f_{1},f_{2}):=|f_{1}(x_{0})-f_{2}(x_{0})|$ and $\Delta_{2}(f_{1},f_{2}):=\|f_{1}-f_{2}\|_{2}=\big{\{}\int_{-\infty}^{\infty}|f_{1}(x)-f_{2}(x)|^{2}\mathrm{d}x\big{\}}^{1/2}$, respectively. The minimax risk is then defined by

$${\cal R}_{n,\Delta}^{*}[{\mathscr{F}}]=\inf_{\tilde{f}}{\cal R}_{n,\Delta}[\tilde{f};{\mathscr{F}}],$$

where the infimum is taken over all possible estimators. An estimator $\tilde{f}_{*}$ is called rate-optimal if ${\cal R}_{n,\Delta}[\tilde{f}_{*};{\mathscr{F}}]=O({\cal R}_{n,\Delta}^{*}[{\mathscr{F}}])$ as $n\to\infty$, and our goal is to construct rate-optimal estimators for natural functional classes of densities.
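The maximal risk involves a supremum over ${\mathscr{F}}$ and cannot be evaluated directly, but for one fixed density $f$ the inner quantity $\{\mathbb{E}_{f}\Delta_{x_{0}}^{2}(\tilde{f},f)\}^{1/2}$ can be approximated by Monte Carlo. The sketch below is generic; the function and argument names are hypothetical, and the estimator and data-generating mechanism are supplied by the caller.

```python
import numpy as np

def mc_pointwise_risk(estimator, sampler, f_x0, n, reps=200, seed=0):
    """Monte Carlo approximation of {E_f |f_tilde(x0) - f(x0)|^2}^{1/2} for ONE fixed f.

    estimator: callable mapping a 1-D array Y_1..Y_n to an estimate of f(x0).
    sampler:   callable (n, rng) -> 1-D array of n observations generated under f.
    f_x0:      the true value f(x0).
    The supremum over the class F in the maximal risk is NOT computed here.
    """
    rng = np.random.default_rng(seed)
    sq_errors = np.empty(reps)
    for r in range(reps):
        y = sampler(n, rng)                          # fresh sample Y_1..Y_n under f
        sq_errors[r] = (estimator(y) - f_x0) ** 2    # squared pointwise loss
    return float(np.sqrt(sq_errors.mean()))          # root of the averaged squared loss
```

Bounding the maximal risk would require repeating this over a family of densities in ${\mathscr{F}}$, which is outside the scope of the sketch.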

The problem of density deconvolution has been extensively studied in the literature; see, e.g., Carroll & Hall (1988), Zhang (1990), Fan (1991), Butucea & Tsybakov (2008a, 2008b), Lounici & Nickl (2011), Comte & Lacour (2013) and Lepski & Willer (2019). We also refer to the book of Meister (2009), where many additional references can be found.

Deconvolution estimators are usually constructed using Fourier transform techniques, and the majority of results in the existing literature assume that the characteristic function of the measurement errors has no zeros on the real line. Specifically, let $\widehat{g}$ denote the bilateral Laplace transform of $G$,

$$\widehat{g}(z):=\int_{-\infty}^{\infty}e^{-zx}\,\mathrm{d}G(x),$$

with $\widehat{g}(i\omega)$ being the characteristic function of the measurement errors. The standard assumptions on $\widehat{g}$ in the density deconvolution problem are the following:

(A)

$\widehat{g}(i\omega)$ does not vanish, that is, $|\widehat{g}(i\omega)|\neq 0$ for all $\omega\in{\mathbb{R}}$ ;

(B)

$|\widehat{g}(i\omega)|$ decreases in an appropriate way as $|\omega|\to\infty$ : for some $\gamma>0$

(B1)

$|\widehat{g}(i\omega)|\asymp|\omega|^{-\gamma}$ as $|\omega|\to\infty$ ,

or
(B2)

$|\widehat{g}(i\omega)|\asymp\exp\{-c|\omega|^{\gamma}\}$ as $|\omega|\to\infty$ with $c>0$ .

The setting under conditions (A)–(B1) is usually referred to as the case of smooth measurement error densities, while conditions (A)–(B2) correspond to the so-called super-smooth case. Under assumption (A) the achievable estimation accuracy is determined by the rate at which $|\widehat{g}(i\omega)|$ decreases as $|\omega|\to\infty$ and by the smoothness of the density $f$. In particular, it is well known that in the smooth case, for the Hölder class ${\mathscr{H}}_{\alpha}(A)$ and for the Sobolev class ${\mathscr{S}}_{\alpha}(A)$ of regularity $\alpha$, one has

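$${\cal R}_{n,\Delta_{x_{0}}}^{*}[{\mathscr{H}}_{\alpha}(A)]\asymp n^{-\alpha/(2\alpha+2\gamma+1)},\qquad{\cal R}_{n,\Delta_{2}}^{*}[{\mathscr{S}}_{\alpha}(A)]\asymp n^{-\alpha/(2\alpha+2\gamma+1)}\tag{1.2}$$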

as $n\to\infty$; see, e.g., Zhang (1990) and Fan (1991). The definitions of the classes ${\mathscr{H}}_{\alpha}(A)$ and ${\mathscr{S}}_{\alpha}(A)$ are presented in Section 5. In what follows we will refer to the rate $n^{-\alpha/(2\alpha+2\gamma+1)}$ as the standard rate of convergence.
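As a worked instance of the standard rate (the numbers are illustrative and not taken from the paper): the Laplace error density $\frac{1}{2}e^{-|x|}$ has $\widehat{g}(i\omega)=1/(1+\omega^{2})$, which never vanishes and satisfies (B1) with $\gamma=2$. For smoothness $\alpha=2$,

$$\frac{\alpha}{2\alpha+2\gamma+1}\bigg|_{\alpha=2,\;\gamma=2}=\frac{2}{4+4+1}=\frac{2}{9},\qquad\text{whereas}\qquad\frac{\alpha}{2\alpha+1}\bigg|_{\alpha=2}=\frac{2}{5}$$

is the exponent obtained by formally setting $\gamma=0$, i.e., the classical rate for estimating $f$ from direct observations. The gap between $n^{-2/9}$ and $n^{-2/5}$ quantifies the price paid for the ill-posedness of deconvolution.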

It is worth noting that condition (A) is rather restrictive and excludes many settings of interest. For example, it fails for many compactly supported measurement error distributions: if $g$ is the uniform density on $[-1,1]$, then $\widehat{g}(i\omega)=\sin\omega/\omega$, which vanishes at $\omega=\pi k$, $k=\pm 1,\pm 2,\ldots$. Another typical situation in which condition (A) is violated is the case of measurement errors having discrete distributions. In general, if $\widehat{g}(i\omega)$ has zeros, the standard Fourier-transform-based estimation methods are not directly applicable. This raises the following natural questions.
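The zeros of $\widehat{g}(i\omega)$ for uniform errors are easy to confirm numerically. The sketch below (function name hypothetical) evaluates $\sin(\theta\omega)/(\theta\omega)$ for $U(-\theta,\theta)$ through NumPy's normalized `np.sinc`, which computes $\sin(\pi x)/(\pi x)$; with $\theta=1$ this is the example above.

```python
import numpy as np

def uniform_cf(omega, theta=1.0):
    """Characteristic function g_hat(i*omega) = sin(theta*omega)/(theta*omega) of U(-theta, theta).

    np.sinc(x) = sin(pi*x)/(pi*x), so omega is rescaled by theta/pi;
    np.sinc also returns the correct limit 1 at omega = 0.
    """
    return np.sinc(theta * np.asarray(omega, dtype=float) / np.pi)

# Values at omega = pi*k, k = 1..5 (theta = 1): zero up to rounding error.
values_at_zeros = uniform_cf(np.pi * np.arange(1, 6))
```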

(i)

How can rate-optimal estimators be constructed when assumption (A) does not hold, that is, when $\widehat{g}(i\omega)$ has zeros, and what is the best achievable rate of convergence in this case?

(ii)

Under which conditions on $f$ can the standard rate of convergence (1.2) be achieved without assuming (A)?

The existing literature contains only partial and fragmentary answers to questions (i) and (ii). Devroye (1989) constructed a consistent estimator of $f$ under the assumption that $|\widehat{g}(i\omega)|\neq 0$ for almost all $\omega$; the proposed estimator is a modification of the standard Fourier-transform-based kernel density estimator. Hall et al. (2001) consider the setting with the uniform measurement error density $g$ and develop an estimator under the assumption that the density $f$ is compactly supported. Other works dealing with uniform measurement errors are Groeneboom & Jongbloed (2003) and Feuerverger et al. (2008). The first of these papers assumes that $X$ is non-negative and shows that, for a class of twice continuously differentiable densities, the pointwise risk of the proposed estimators converges to zero at the standard rate corresponding to $\gamma=1$. Feuerverger et al. (2008) studied the problem of estimating densities from Sobolev functional classes under the ${\mathbb{L}}_{2}$-risk; they show that the standard rate of convergence with $\gamma=1$ can be achieved in this setting provided that $f$ has two bounded moments. These results demonstrate that, for uniformly distributed measurement errors and under the aforementioned assumptions on $f$, zeros of the characteristic function of $\epsilon$ have no effect on the minimax rate of convergence.

Hall & Meister (2007) and Meister (2008) considered the density deconvolution problem with an oscillating Fourier transform $\widehat{g}(i\omega)$ that vanishes periodically. They proposed several modifications of the standard Fourier-transform-based estimators, considered the ${\mathbb{L}}_{2}$-risk and showed that, for certain nonparametric classes of probability densities, zeros of the characteristic function $\widehat{g}(i\omega)$ do affect the rate of convergence. Delaigle & Meister (2011) demonstrated that if the density to be estimated has a finite left endpoint, then it can be estimated with the standard rate, as in the case where $\widehat{g}(i\omega)$ does not have zeros. Meister & Neumann (2010) considered a setting where $\widehat{g}(i\omega)$ may have zeros, but there are two observations of the same variable $X$ with independent measurement errors. In this setting zeros of $\widehat{g}(i\omega)$ have no influence on the rate of convergence.

The existing results in the literature leave open a fundamental question about the construction of rate-optimal density deconvolution estimators under general assumptions on $\widehat{g}(i\omega)$. Specifically, it is not clear whether, and under which conditions, zeros of $\widehat{g}(i\omega)$ have no influence on the minimax rates of convergence.

The current paper addresses the aforementioned issues. First, we develop a methodology for constructing optimal density deconvolution estimators under general conditions on the measurement error distribution. These conditions cover settings with vanishing and non-vanishing characteristic functions of the measurement errors, and the proposed methodology treats all these settings in a unified way. The estimation methods we propose are based on the Laplace transform; in this sense they generalize the standard Fourier-transform-based estimation techniques commonly used in the literature on density deconvolution. Second, we derive upper bounds on the risk of the proposed estimators and provide sufficient conditions on $f$ under which the standard rate of convergence can be achieved under general assumptions on $\widehat{g}$. In particular, we prove that if, in addition to the smoothness restriction $f\in{\mathscr{H}}_{\alpha}(A)$ or $f\in{\mathscr{S}}_{\alpha}(A)$, the density $f$ has bounded moments of a sufficiently large order, then the standard rate of convergence can be achieved even without Assumption (A). The required number of bounded moments is characterized in terms of a sequence of coefficients (the zero set sequence) which, in turn, is determined by the geometry of the zeros of $\widehat{g}(i\omega)$. Third, we specialize our general methodology to specific problem instances in which the zero set sequences can be calculated explicitly. Last but not least, we show that the derived sufficient moment conditions are also necessary to guarantee the standard rate of convergence in the absence of (A) for some specific problem instances.

The rest of the paper is organized as follows. In Section 2 we present a general idea for the construction of the proposed estimators. Section 3 introduces assumptions on the distribution of the measurement errors and presents examples of distributions satisfying these assumptions. Section 4 discusses construction of the estimator kernel and develops its infinite series representation. In Section 5 we define the estimator and present upper bounds on its risk. Settings corresponding to specific problem instances are discussed in Section 6, and lower bounds showing necessity of the moment conditions are presented in Section 7. Some concluding remarks are given in Section 8. Proofs of all results are given in the Appendix.

1.2 Notation

For a generic locally integrable function $\phi$ the bilateral Laplace transform is defined by

$$\widehat{\phi}(z)={\cal L}[\phi;z]:=\int_{-\infty}^{\infty}\phi(t)e^{-zt}\,\mathrm{d}t.$$

The Laplace transform $\widehat{\phi}(z)$ is an analytic function in the convergence region $\Sigma_{\phi}$ of the above integral which, in general, is a vertical strip:

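$$\Sigma_{\phi}:=\{z\in\mathbb{C}:\sigma_{\phi}^{-}<{\rm Re}(z)<\sigma_{\phi}^{+}\}\quad\text{for some }-\infty\leq\sigma_{\phi}^{-}<\sigma_{\phi}^{+}\leq\infty.$$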

The convergence region can degenerate to a vertical line $\Sigma_{\phi}:=\{z\in\mathbb{C}:{\rm Re}(z)=\sigma_{\phi}\}$ , $\sigma_{\phi}\in{\mathbb{R}}$ , in the complex plane. If $\phi$ is a probability density, then the imaginary axis always belongs to $\Sigma_{\phi}$ , that is, $\{z:{\rm Re}(z)=0\}\subseteq\Sigma_{\phi}$ , and

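$$\widehat{\phi}(i\omega)={\cal F}[\phi;\omega]:=\int_{-\infty}^{\infty}\phi(t)e^{-i\omega t}\,\mathrm{d}t,\quad\omega\in{\mathbb{R}}$$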

is the characteristic function (the Fourier transform of $\phi$). The degenerate case in which $\Sigma_{\phi}$ reduces to the imaginary axis corresponds to distributions whose characteristic function cannot be analytically continued to a strip around the imaginary axis in the complex plane. The inverse Laplace transform is given by the formula

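$$\phi(t)=\frac{1}{2\pi i}\int_{s-i\infty}^{s+i\infty}\widehat{\phi}(z)e^{zt}\,\mathrm{d}z=\frac{1}{2\pi}\int_{-\infty}^{\infty}\widehat{\phi}(s+i\omega)e^{(s+i\omega)t}\,\mathrm{d}\omega,\quad s\in(\sigma_{\phi}^{-},\sigma_{\phi}^{+}).$$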

The uniqueness property of the bilateral Laplace transform states that if $\widehat{\phi}_{1}(z)=\widehat{\phi}_{2}(z)$ in a common strip of convergence ${\rm Re}(z)\in(\sigma_{\phi_{1}}^{-},\sigma_{\phi_{1}}^{+})\cap(\sigma_{\phi_{2}}^{-},\sigma_{\phi_{2}}^{+}),$ then $\phi_{1}(t)$ is equal to $\phi_{2}(t)$ for almost all $t$ (Widder, 1946, Theorem 6b).
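For a compactly supported density the bilateral Laplace transform is entire, so it can be evaluated at any $z\in\mathbb{C}$ by quadrature. The sketch below (hypothetical helper name; composite trapezoidal rule; the density is assumed to vanish outside $[lo, hi]$) can be compared with the closed form $\sinh(\theta z)/(\theta z)$ for the uniform density on $[-\theta,\theta]$, which reappears in Example 1 below.

```python
import numpy as np

def bilateral_laplace(density, z, lo, hi, num=40001):
    """Approximate the bilateral Laplace transform  int density(t) * exp(-z t) dt.

    Assumes the density vanishes (or is negligible) outside [lo, hi]; the integral
    is approximated by the composite trapezoidal rule on a uniform grid.
    """
    t = np.linspace(lo, hi, num)
    vals = density(t) * np.exp(-z * t)          # complex-valued integrand
    dt = t[1] - t[0]
    return dt * (vals.sum() - 0.5 * (vals[0] + vals[-1]))
```

For example, with `unif = lambda t: np.where(np.abs(t) <= 1.5, 1/3, 0.0)`, the value `bilateral_laplace(unif, 0.3 + 2j, -1.5, 1.5)` should agree with `np.sinh(1.5*(0.3+2j))/(1.5*(0.3+2j))` to roughly six digits.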

2 General idea for estimator construction

Let $G$ be the measurement error distribution function with the corresponding Laplace transform

$$\widehat{g}(z)=\int_{-\infty}^{\infty}e^{-zt}\,\mathrm{d}G(t),$$

whose convergence region is denoted by $\Sigma_{g}$. Throughout the paper we suppose that $\Sigma_{g}$ is a vertical strip in the complex plane, $\Sigma_{g}=\{z:\sigma_{g}^{-}<{\rm Re}(z)<\sigma_{g}^{+}\}$ with $\sigma_{g}^{-}$ and $\sigma_{g}^{+}$ satisfying $\sigma_{g}^{-}<0<\sigma_{g}^{+}$ (see Assumption 1 in Section 3). As discussed above, if $\widehat{g}(z)$ has zeros on the imaginary axis, then the usual Fourier-transform-based methods are not directly applicable. We will be mainly interested in this case.

2.1 Linear functional strategy

The construction of our estimators follows the so-called linear functional strategy that is frequently used for solving ill-posed inverse problems [see, e.g., Goldberg (1979) and Anderssen (1980)]. In the context of the density deconvolution problem the main idea of the strategy is as follows. We want to find two kernels, say $K$ and $L$, with the following properties:

(i)

the integral $\int K(x-x_{0})f(x)\mathrm{d}x$ approximates the value $f(x_{0})$ “well”;

(ii)

the kernel $L$ is related to the kernel $K$ via the equation:

$$\int K(x)f(x)\,\mathrm{d}x=\int L(y)f_{Y}(y)\,\mathrm{d}y.\tag{2.1}$$

Under conditions (i) and (ii), the obvious estimator of $f(x_{0})$ from the observations $Y_{1},\ldots,Y_{n}$ is the empirical estimator $\tilde{f}$ of the integral on the right hand side of (2.1):

$$\tilde{f}(x_{0}):=\frac{1}{n}\sum_{i=1}^{n}L(Y_{i}).$$

Let $K:{\mathbb{R}}\to{\mathbb{R}}$ be a kernel with standard properties that will be specified later. For $h>0$ denote $K_{h}(\cdot):=(1/h)K(\cdot/h)$ . Assume that $K$ has bounded support so that $\widehat{K}(z)$ is an entire function, that is, $\Sigma_{K}=\mathbb{C}$ . Furthermore, assume that there exist real numbers $\varkappa_{g}^{-}$ and $\varkappa_{g}^{+}$ satisfying $\sigma_{g}^{-}\leq\varkappa_{g}^{-}<0<\varkappa_{g}^{+}\leq\sigma_{g}^{+}$ , such that

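$$\widehat{g}(-z)\neq 0,\quad\forall z\in S_{g}:=\big\{z:{\rm Re}(z)\in(\varkappa_{g}^{-},0)\cup(0,\varkappa_{g}^{+})\big\}.\tag{2.2}$$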

In words, $S_{g}$ is the union of two open strips (with the imaginary axis as the boundary), where the function $\widehat{g}(-z)$ does not have zeros. Therefore we can define

$$\widehat{L}_{h}(z):=\frac{\widehat{K}(zh)}{\widehat{g}(-z)},\quad z\in S_{g},$$

and this function is analytic in $S_{g}$ . Let

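$$L_{s,h}(t):=\frac{1}{2\pi i}\int_{s-i\infty}^{s+i\infty}\frac{\widehat{K}(zh)}{\widehat{g}(-z)}\,e^{zt}\,\mathrm{d}z=\frac{1}{2\pi}\int_{-\infty}^{\infty}\frac{\widehat{K}((s+i\omega)h)}{\widehat{g}(-s-i\omega)}\,e^{(s+i\omega)t}\,\mathrm{d}\omega,\tag{2.4}$$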

with $s\in(\varkappa_{g}^{-},0)\cup(0,\varkappa_{g}^{+})$. Observe that the kernel $L_{s,h}$ is defined by the inverse Laplace transform of the function $\widehat{L}_{h}(z)=\widehat{K}(zh)/\widehat{g}(-z)$, and the denominator of the integrand in (2.4) does not vanish for $s\in(\varkappa_{g}^{-},0)\cup(0,\varkappa_{g}^{+})$. If the integral on the right-hand side of (2.4) is absolutely convergent, then (2.4) defines the same function $L_{s,h}$ for all $s\in(0,\varkappa_{g}^{+})$, and likewise the same function for all $s\in(\varkappa_{g}^{-},0)$. In other words, depending on the sign of $s$, equation (2.4) defines two, in general different, functions, which will be denoted by $L_{+,h}(t)$ for $s>0$ and $L_{-,h}(t)$ for $s<0$. The estimator of $f(x_{0})$ is then defined by

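$$\tilde{f}_{s,h}(x_{0})=\frac{1}{n}\sum_{j=1}^{n}L_{s,h}(Y_{j}-x_{0}),\quad s\in(\varkappa_{g}^{-},0)\cup(0,\varkappa_{g}^{+}).\tag{2.5}$$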

The parameters $s$ and $h$ will be specified in the sequel.
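To make the construction concrete, the following sketch evaluates $L_{s,h}(t)$ from (2.4) by truncating the $\omega$-integral to $[-\Omega,\Omega]$ and discretizing it with a uniform Riemann sum along the vertical line ${\rm Re}(z)=s$. It assumes, for illustration only, uniform errors $U(-\theta,\theta)$, so that $\widehat{g}(-z)=\sinh(\theta z)/(\theta z)$ vanishes only on the imaginary axis, and takes $K$ to be the cubic B-spline on $[-2,2]$, whose Laplace transform is $(\sinh(z/2)/(z/2))^{4}$. The function names, the truncation level and the grid step are hypothetical choices, not the paper's. Comparing two positive values of $s$ illustrates that the result does not depend on $s$ within $(0,\varkappa_{g}^{+})$.

```python
import numpy as np

def K_hat(z):
    """Laplace transform (sinh(z/2)/(z/2))**4 of the cubic B-spline kernel on [-2, 2].

    Valid for z != 0, which always holds on the contour Re(z) = s*h with s != 0.
    """
    w = z / 2
    return (np.sinh(w) / w) ** 4

def g_hat_uniform(z, theta):
    """Laplace transform sinh(theta z)/(theta z) of U(-theta, theta), for z != 0."""
    return np.sinh(theta * z) / (theta * z)

def L_kernel(t, s, h, theta, omega_max=1000.0, d_omega=0.01):
    """Approximate L_{s,h}(t) = (1/2pi) int K_hat((s+iw)h) / g_hat(-(s+iw)) exp((s+iw)t) dw.

    The w-integral is truncated to [-omega_max, omega_max] and discretized with a
    uniform Riemann sum; s must be nonzero so that g_hat(-z) never vanishes on the contour.
    """
    omega = np.arange(-omega_max, omega_max + d_omega / 2, d_omega)
    z = s + 1j * omega                                   # points on the line Re(z) = s
    integrand = K_hat(z * h) / g_hat_uniform(-z, theta)
    t = np.atleast_1d(np.asarray(t, dtype=float))
    vals = np.exp(np.outer(t, z)) @ integrand * (d_omega / (2 * np.pi))
    # K and G are real, so the integrand is conjugate-symmetric in w and the
    # result is real up to truncation and discretization error.
    return vals.real
```

Calling `L_kernel(t, s=0.3, h=0.5, theta=1.0)` and `L_kernel(t, s=0.6, h=0.5, theta=1.0)` on the same grid of `t` should give nearly identical values (both approximate $L_{+,h}$), whereas a negative `s` approximates the different kernel $L_{-,h}$.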

2.2 Relationship between kernels $K_{h}$ and $L_{s,h}$

The following lemma demonstrates that (2.1) holds for the kernels $K_{h}(\cdot):=(1/h)K(\cdot/h)$ and $L_{s,h}$ given by (2.4).

Lemma 1.

Suppose that for any $s\in(\varkappa_{g}^{-},0)\cup(0,\varkappa_{g}^{+})$ the integral on the right hand side of (2.4) is absolutely convergent, and

$$\int_{-\infty}^{\infty}|L_{s,h}(y-x_{0})|\,f_{Y}(y)\,\mathrm{d}y<\infty;$$

then for any $x_{0}$

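$$\int_{-\infty}^{\infty}L_{s,h}(y-x_{0})f_{Y}(y)\,\mathrm{d}y=\int_{-\infty}^{\infty}\frac{1}{h}K\Big(\frac{x-x_{0}}{h}\Big)f(x)\,\mathrm{d}x.\tag{2.6}$$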

Note that relation (2.6) holds for both kernels $L_{+,h}$ and $L_{-,h}$, corresponding to $s\in(0,\varkappa_{g}^{+})$ and $s\in(\varkappa_{g}^{-},0)$, respectively. Thus, either $L_{+,h}$ or $L_{-,h}$ can be used in the estimator construction.
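The mechanism behind Lemma 1 can be seen from a short formal computation. This is our own sketch, not the paper's proof, and it assumes that all interchanges of integration are justified, which is the role of the integrability condition in the lemma. Writing $f_{Y}(y)=\int f(y-e)\,\mathrm{d}G(e)$ and substituting $y=x+e$,

$$\int_{-\infty}^{\infty}L_{s,h}(y-x_{0})f_{Y}(y)\,\mathrm{d}y=\int_{-\infty}^{\infty}f(x)\Big[\int_{-\infty}^{\infty}L_{s,h}(x-x_{0}+e)\,\mathrm{d}G(e)\Big]\mathrm{d}x,$$

and, by (2.4) with $u=x-x_{0}$,

$$\int_{-\infty}^{\infty}L_{s,h}(u+e)\,\mathrm{d}G(e)=\frac{1}{2\pi i}\int_{s-i\infty}^{s+i\infty}\frac{\widehat{K}(zh)}{\widehat{g}(-z)}\,e^{zu}\Big[\int_{-\infty}^{\infty}e^{ze}\,\mathrm{d}G(e)\Big]\mathrm{d}z=\frac{1}{2\pi i}\int_{s-i\infty}^{s+i\infty}\widehat{K}(zh)\,e^{zu}\,\mathrm{d}z=\frac{1}{h}K\Big(\frac{u}{h}\Big),$$

because the bracket equals $\widehat{g}(-z)$, and $\widehat{K}(zh)$ is the Laplace transform of $K_{h}$, an entire function whose inversion integral may be taken along any vertical line.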

Remark 1.

A naive approach towards estimator construction could be based on a direct application of the Laplace transform inversion formula. In particular, (1.1) implies that $\widehat{f}_{Y}(z)=\widehat{f}(z)\widehat{g}(z)$ . The empirical estimator of $\widehat{f}_{Y}(z)$ can be constructed in the standard way using the available data $Y_{1},\ldots,Y_{n}$ ; then a division by $\widehat{g}(z)$ with proper regularization and application of the inverse Laplace transform formula yields an estimator of $f$ . Note, however, that this approach requires analyticity of $\widehat{f}$ in a strip containing the imaginary axis, i.e., $f$ must have very light tails. In contrast, our construction does not require existence of $\widehat{f}(z)$ for $z$ outside the imaginary axis; only the analyticity of $\widehat{g}(z)$ is required.

3 Distribution of measurement errors

3.1 Assumptions

Accuracy of the estimator $\tilde{f}_{s,h}(x_{0})$ defined in (2.5) will be studied under the following general assumptions on the distribution of the measurement errors.

Assumption 1.

The Laplace transform $\widehat{g}(z)$ of the measurement error distribution exists in a vertical strip $\Sigma_{g}:=\{z\in\mathbb{C}:\sigma_{g}^{-}<{\rm Re}(z)<\sigma_{g}^{+}\}$ , $\sigma_{g}^{-}<0<\sigma_{g}^{+}$ , and admits the following representation:

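$$\widehat{g}(z)=\frac{1}{\widehat{\psi}(z)}\prod_{k=1}^{q}\Big(1-\frac{e^{a_{k}z}}{\lambda_{k}}\Big)^{m_{k}},\tag{3.1}$$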

where $a_{1},\ldots,a_{q}$ are positive real numbers, $|\lambda_{k}|=1$ , $k=1,\ldots,q,$ $m_{1},\ldots,m_{q}$ are non-negative integer numbers, and the pairs $(a_{k},\lambda_{k})$ , $k=1,\ldots,q$ are distinct. The function $\widehat{\psi}(z)$ is represented as

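$$\widehat{\psi}(z)=\widehat{\psi}_{0}(z)\prod_{k\in\Lambda}(-a_{k}z)^{m_{k}}\prod_{k\notin\Lambda}\Big(1-\frac{1}{\lambda_{k}}\Big)^{m_{k}},\tag{3.2}$$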

where $\Lambda:=\{k=1,\ldots,q:\lambda_{k}=1\}$, and $\widehat{\psi}_{0}(z)$ is analytic and does not vanish in a vertical strip $\Sigma_{\psi}:=\{z\in\mathbb{C}:\sigma_{\psi}^{-}<{\rm Re}(z)<\sigma_{\psi}^{+}\}$ with $\sigma_{g}^{-}\leq\sigma_{\psi}^{-}<0<\sigma_{\psi}^{+}\leq\sigma_{g}^{+}$, and satisfies $\widehat{\psi}_{0}(0)=1$.

Several remarks on Assumption 1 are in order.

Remark 2.

(i). Assumption 1 states that $\widehat{g}(z)$ factorizes into a product of two functions: the first function, $\prod_{k=1}^{q}(1-e^{a_{k}z}/\lambda_{k})^{m_{k}}$ , has zeros only on the imaginary axis, while the second one, $1/\widehat{\psi}(z)$ , does not have zeros in $\Sigma_{\psi}\subseteq\Sigma_{g}$ . The latter fact follows from analyticity of $\widehat{\psi}(z)$ in $\Sigma_{\psi}$ .

(ii). Zeros of $\widehat{g}$ on the imaginary axis are $z_{k,j}:=i({\rm arg}\{\lambda_{k}\}+2\pi j)/a_{k}$ , $z_{k,j}\neq 0$ , where $j\in{\mathbb{Z}}$ , $k=1,\ldots,q$ , and the multiplicity of each zero $z_{k,j}$ is equal to $m_{k}$ . Assumption 1 implies that $\widehat{g}(z)$ has no zeros in $\Sigma_{\psi}\backslash\{z:{\rm Re}(z)=0\}$ , and (2.2) holds with $S_{g}=\{z:{\rm Re}(z)\in(-\sigma_{\psi}^{+},0)\cup(0,-\sigma_{\psi}^{-})\}$ , that is, $\varkappa_{g}^{-}=-\sigma_{\psi}^{+}$ and $\varkappa_{g}^{+}=-\sigma_{\psi}^{-}$ .

(iii). The form of $\widehat{\psi}(z)$ in (3.2) follows from (3.1) and the fact that $\widehat{g}(0)=1$ .
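Remark 2(ii) lists the zeros of $\widehat{g}$ on the imaginary axis explicitly. The small sketch below (hypothetical helper) enumerates $z_{k,j}=i({\rm arg}\{\lambda_{k}\}+2\pi j)/a_{k}$ for a finite window of indices $j$, since the full zero set is infinite, and drops the point $z=0$, where $\widehat{g}(0)=1$.

```python
import cmath
import math

def imaginary_axis_zeros(a, lam, m, j_range=range(-3, 4)):
    """Zeros z_{k,j} = i(arg(lambda_k) + 2*pi*j)/a_k of g_hat, paired with multiplicity m_k.

    a, lam, m are the q-tuples of Assumption 1 (a_k > 0, |lambda_k| = 1).
    Only indices j in j_range are listed; z = 0 is excluded since g_hat(0) = 1.
    """
    zeros = []
    for a_k, lam_k, m_k in zip(a, lam, m):
        for j in j_range:
            z = 1j * (cmath.phase(lam_k) + 2 * math.pi * j) / a_k
            if abs(z) > 1e-12:
                zeros.append((z, m_k))
    return zeros

# Uniform errors on [-1, 1] (a_1 = 2, lambda_1 = 1, m_1 = 1): zeros at i*pi*j, j != 0.
uniform_zeros = imaginary_axis_zeros((2.0,), (1.0,), (1,), j_range=range(-2, 3))
```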

In addition to Assumption 1, we require conditions on the growth of the function $\widehat{\psi}(\cdot)$ from (3.1) along the imaginary axis. These conditions are similar to the standard ones in the smooth case [see condition (B1) in Section 1].

Assumption 2.

Assume there exist constants $\omega_{0}>0$ , $\gamma\geq 0$ and $D_{1}>0$ , $D_{2}>0$ such that

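$$D_{1}|\omega|^{\gamma}\leq|\widehat{\psi}(i\omega)|\leq D_{2}|\omega|^{\gamma},\quad\forall|\omega|\geq\omega_{0}.\tag{3.3}$$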

In addition, suppose that

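$$|\widehat{\psi}^{(j)}(i\omega)|\leq D_{2+j}(1+|\omega|^{\gamma}),\quad\forall\omega\in{\mathbb{R}}\tag{3.4}$$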

for all natural $j\geq 1$ and some positive constants $D_{3},D_{4},\ldots.$

Remark 3.

Condition (3.3), when imposed on $1/\widehat{g}(i\omega)$ , is rather standard in the literature; it corresponds to the so-called smooth error densities. As for (3.4), similar restrictions on the derivatives of $\widehat{g}(i\omega)$ are usually imposed in the proofs of lower bounds; see, e.g., (Fan, 1991, Theorem 5). We also note that all derivatives of the function $\widehat{\psi}(i\omega)$ exist due to analyticity of $\widehat{\psi}$ . Furthermore, if the inequality

$$\sup_{y\in[-r,r]}|\widehat{\psi}(i\omega+y)|\leq C(r)(1+|\omega|^{\gamma}),\quad\omega\in{\mathbb{R}}$$

holds for some $r\in(0,\min\{-\sigma_{\psi}^{-},\sigma_{\psi}^{+}\})$, where the constant $C(r)$ does not depend on $\omega$, then the well-known Cauchy estimates for derivatives of analytic functions imply (3.4).

3.2 Examples of distributions

Assumptions 1 and 2 define a broad class of distributions containing densities with characteristic functions that vanish on the real line. Moreover, discrete distributions are covered by Assumptions 1 and 2. All this is illustrated in the following examples.

Example 1 (Uniform distribution).

Let $\epsilon\sim U(-\theta,\theta)$; then

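$$\widehat{g}(z)=\frac{\sinh(\theta z)}{\theta z}=-\frac{e^{-\theta z}}{2\theta z}\big(1-e^{2\theta z}\big),\quad z\in\mathbb{C}.$$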

In this case representation (3.1) holds with $q=1$, $m_{1}=1$, $a_{1}=2\theta$, $\lambda_{1}=1$, and $\widehat{\psi}(z)=-2\theta ze^{\theta z}$, $\widehat{\psi}_{0}(z)=e^{\theta z}$. Clearly, $\widehat{\psi}$ satisfies Assumption 2 with $\gamma=1$. Note that $\widehat{g}(z)$ has simple zeros on the imaginary axis at $z_{k}=i\pi k/\theta$, $k=\pm 1,\pm 2,\ldots$, and $\Sigma_{g}=\Sigma_{\psi}=\mathbb{C}$.
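Representation (3.1) for this example can be checked numerically at a few random complex points. The sketch below uses an arbitrary illustrative value $\theta=0.8$, and the function names are hypothetical; it compares $\sinh(\theta z)/(\theta z)$ with $(1-e^{2\theta z})/\widehat{\psi}(z)$ for $\widehat{\psi}(z)=-2\theta ze^{\theta z}$.

```python
import numpy as np

theta = 0.8  # illustrative value

def g_hat(z):
    """Laplace transform of U(-theta, theta): sinh(theta z)/(theta z)."""
    return np.sinh(theta * z) / (theta * z)

def psi_hat(z):
    """psi_hat(z) = -2 theta z exp(theta z), as in Example 1."""
    return -2 * theta * z * np.exp(theta * z)

rng = np.random.default_rng(0)
z = rng.normal(size=5) + 1j * rng.normal(size=5)   # random test points (nonzero a.s.)
lhs = g_hat(z)
rhs = (1 - np.exp(2 * theta * z)) / psi_hat(z)     # (3.1) with q = 1, a_1 = 2 theta, lambda_1 = 1, m_1 = 1
```

On the imaginary axis $|\widehat{\psi}(i\omega)|=2\theta|\omega|$, which is exactly the growth required by Assumption 2 with $\gamma=1$.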

Example 2 (Convolution of uniform distributions).

Consider a convolution of the uniform distributions $U(-\theta_{k},\theta_{k})$ , $k=1,\ldots,q,$ with distinct parameters $\theta_{1},\ldots,\theta_{q}$ , each of multiplicity $m_{k}.$ In this case

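$$\widehat{g}(z)=\prod_{k=1}^{q}\Big[\frac{\sinh(\theta_{k}z)}{\theta_{k}z}\Big]^{m_{k}}=\frac{\exp\{-z\sum_{k=1}^{q}\theta_{k}m_{k}\}}{\prod_{k=1}^{q}(-2\theta_{k}z)^{m_{k}}}\prod_{k=1}^{q}\big(1-e^{2\theta_{k}z}\big)^{m_{k}},\quad z\in\mathbb{C}.$$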

Therefore Assumption 1 holds with $a_{k}=2\theta_{k}$ , $\lambda_{k}=1$ , $k=1,\ldots,q$ , $\Sigma_{g}=\Sigma_{\psi}=\mathbb{C}$ , and

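$$\widehat{\psi}(z)=\prod_{k=1}^{q}(-2\theta_{k}z)^{m_{k}}e^{z\sum_{k=1}^{q}\theta_{k}m_{k}},\qquad\widehat{\psi}_{0}(z)=e^{z\sum_{k=1}^{q}\theta_{k}m_{k}}.$$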

Thus, $\widehat{\psi}(z)$ satisfies Assumption 2 with $\gamma=m_{1}+\cdots+m_{q}$. Of special interest is the case of $m$ identical uniform distributions $U(-\theta,\theta)$. Here $q=1$, $\theta_{1}=\theta$, $m_{1}=m$, $\widehat{\psi}(z)=(-2\theta z)^{m}\exp\{m\theta z\}$, $a_{1}=2\theta$ and $\lambda_{1}=1$. Note also that in this case $\gamma=m$.

Example 3 (Discrete distributions).

Let $\epsilon$ be a discrete random variable taking values in the set $\{0,\pm b,\ldots,\pm Mb\}$ , $b>0,$ with corresponding probabilities $p_{k}$ , $k=0,\pm 1,\ldots,\pm M$ , where $p_{M}\neq 0$ . Then

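$$\widehat{g}(z)=\sum_{k=-M}^{M}p_{k}e^{-bkz}=e^{-bMz}\sum_{k=0}^{2M}p_{M-k}e^{bkz}=e^{-bMz}p_{M}P(e^{bz}),$$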

where $P(x):=1+\sum_{k=1}^{2M}(p_{M-k}/p_{M})x^{k}$. Let $\lambda_{1},\ldots,\lambda_{2M}$ be the roots of the polynomial $P(x)$; note that $\lambda_{k}\neq 1$ for all $k=1,\ldots,2M$, because $P(1)=\sum_{k=-M}^{M}p_{k}/p_{M}=1/p_{M}\neq 0$. Then we have

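$$\widehat{g}(z)=p_{M}e^{-bMz}\prod_{k=1}^{2M}\Big(1-\frac{e^{bz}}{\lambda_{k}}\Big)=p_{M}e^{-bMz}\prod_{k:|\lambda_{k}|\neq 1}\Big(1-\frac{e^{bz}}{\lambda_{k}}\Big)\prod_{k:|\lambda_{k}|=1}\Big(1-\frac{e^{bz}}{\lambda_{k}}\Big),$$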

and $\widehat{g}$ is an entire function, i.e., $\Sigma_{g}=\mathbb{C}$ . Representations (3.1) and (3.2) hold with

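$$\widehat{\psi}(z)=\frac{e^{bMz}}{p_{M}\prod_{k:|\lambda_{k}|\neq 1}(1-e^{bz}/\lambda_{k})},\qquad\widehat{\psi}_{0}(z)=\frac{\widehat{\psi}(z)}{\prod_{k:|\lambda_{k}|=1}(1-1/\lambda_{k})}$$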

and $\Sigma_{\psi}=\big{\{}z:b^{-1}\ln(\lambda_{-})<{\rm Re}(z)<b^{-1}\ln(\lambda_{+})\big{\}}$ , where $\lambda_{-}:=\max\{|\lambda_{k}|:|\lambda_{k}|<1\}$ , and $\lambda_{+}:=\min\{|\lambda_{k}|:|\lambda_{k}|>1\}$ . In this example if all $\lambda_{k}$ with $|\lambda_{k}|=1$ are distinct, then $q:=\#\{k:|\lambda_{k}|=1\}$ , $a_{1}=\cdots=a_{q}=b$ , and $m_{1}=\cdots=m_{q}=1$ . It is obvious that Assumption 2 holds with $\gamma=0$ .

In the special case of the Bernoulli distribution with success probability $1/2$ (values $0$ and $1$) we have $\widehat{g}(z)=\frac{1}{2}(1+e^{-z})=\frac{1}{2}e^{-z}(1+e^{z})$; hence (3.1) holds with $q=1$, $a_{1}=1$, $\lambda_{1}=-1$, $m_{1}=1$, and $\widehat{\psi}(z)=2e^{z}$. If $\epsilon$ is a binomial random variable with $m$ trials and success probability $1/2$, then $\widehat{g}(z)=2^{-m}(1+e^{-z})^{m}$, and (3.1) holds with $q=1$, $a_{1}=1$, $\lambda_{1}=-1$, $m_{1}=m$, and $\widehat{\psi}(z)=2^{m}e^{mz}$.
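The factorization in Example 3 is straightforward to reproduce numerically: the roots $\lambda_{k}$ of $P$ come from a polynomial root finder, and the direct sum for $\widehat{g}(z)$ can be compared with the product form. In the sketch below (hypothetical helper name) the probabilities are passed as $(p_{-M},\ldots,p_{M})$; since the coefficients of $P$ in descending powers of $x$ are $(p_{-M},\ldots,p_{M})/p_{M}$, they can be handed to `np.roots` directly, which also discards leading zero coefficients when $p_{-M}=0$.

```python
import numpy as np

def discrete_error_factorization(probs, b):
    """Roots lambda_k of P and two expressions for g_hat in Example 3.

    probs: (p_{-M}, ..., p_0, ..., p_M), the probabilities of eps = k*b, with p_M != 0.
    Returns (roots, g_direct, g_product); the two callables should agree for every z.
    """
    probs = np.asarray(probs, dtype=float)
    M = (len(probs) - 1) // 2
    p_M = probs[-1]
    ks = np.arange(-M, M + 1)
    # P(x) = 1 + sum_{k=1}^{2M} (p_{M-k}/p_M) x^k has descending-power
    # coefficients (p_{-M}, ..., p_M) / p_M.
    roots = np.roots(probs / p_M)

    def g_direct(z):
        return np.sum(probs * np.exp(-b * ks * z))          # sum_k p_k exp(-b k z)

    def g_product(z):
        return p_M * np.exp(-b * M * z) * np.prod(1 - np.exp(b * z) / roots)

    return roots, g_direct, g_product

# Symmetric three-point law on {-b, 0, b}: P(x) = (1 + x)^2, double root -1 on the unit circle.
roots, g_direct, g_product = discrete_error_factorization([0.25, 0.5, 0.25], b=0.5)
```

Roots with $|\lambda_{k}|=1$ (up to a numerical tolerance) produce the zeros of $\widehat{g}$ on the imaginary axis; the remaining roots determine $\widehat{\psi}$ and the strip $\Sigma_{\psi}$.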

Example 4 (Convolution of uniform and smooth densities).

Let $\varphi$ be a probability density with Laplace transform $\widehat{\varphi}$ defined in a strip $\Sigma_{\varphi}=\{z:\sigma_{\varphi}^{-}<{\rm Re}(z)<\sigma_{\varphi}^{+}\}$ such that $\widehat{\varphi}(z)\neq 0$ , $\forall z\in\Sigma_{\varphi}$ . Assume also that $|\widehat{\varphi}(i\omega)|\asymp|\omega|^{-\gamma}$ for some $\gamma>0$ as $|\omega|\to\infty$ , that is, $\varphi$ satisfies the standard conditions of the smooth case. Let $g$ be a convolution of the uniform density on $[-\theta,\theta]$ with $\varphi$ ; then

[TABLE]

and (3.1) obviously holds with $\widehat{\psi}(z)=-2\theta ze^{\theta z}/\widehat{\varphi}(z)$ . For instance, let $\varphi$ be a density of the Gamma distribution with parameters $\gamma>0$ and $\lambda>0$ , that is, $\varphi(x)=\lambda^{\gamma}[\Gamma(\gamma)]^{-1}x^{\gamma-1}e^{-\lambda x}$ , $x>0.$ Then $\widehat{\varphi}(z)=\lambda^{\gamma}(z+\lambda)^{-\gamma}$ , ${\rm Re}(z)>-\lambda$ , and $\widehat{\psi}(z)=-2\theta\lambda^{-\gamma}ze^{\theta z}(z+\lambda)^{\gamma}$ .
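As a sanity check on Example 4 (our own algebra, not a display from the paper), multiply the stated $\widehat{\psi}$ by $\widehat{g}(z)=\widehat{\varphi}(z)\sinh(\theta z)/(\theta z)$, the product of the transform of the uniform density on $[-\theta,\theta]$ and $\widehat{\varphi}$:

$$
\widehat{\psi}(z)\,\widehat{g}(z)
=\frac{-2\theta z\,e^{\theta z}}{\widehat{\varphi}(z)}\cdot\frac{\sinh(\theta z)}{\theta z}\,\widehat{\varphi}(z)
=-2e^{\theta z}\sinh(\theta z)
=1-e^{2\theta z}.
$$

Since $\widehat{\psi}$ vanishes in $\Sigma_{\varphi}$ only at $z=0$, where $\widehat{g}(0)=1$, the zeros of $\widehat{g}$ are the purely imaginary points $z=i\pi k/\theta$, $k\in\mathbb{Z}\setminus\{0\}$. This is the same zero structure as for a single uniform distribution ($a_{1}=2\theta$, $\lambda_{1}=1$, $m_{1}=1$). In the Gamma case, $|\widehat{\psi}(i\omega)|=2\theta\lambda^{-\gamma}|\omega|(\omega^{2}+\lambda^{2})^{\gamma/2}$, which grows like $|\omega|^{\gamma+1}$ as $|\omega|\to\infty$.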

4 Kernel representation

Under Assumption 1, the kernel $L_{s,h}$ defined in (2.4) can be rewritten as follows

[TABLE]

where $S_{g}=\{z:{\rm Re}(z)\in(\varkappa_{g}^{-},0)\cup(0,\varkappa_{g}^{+})\}$ is the set on which $\widehat{g}(-z)$ does not vanish. Thus, for any $s\in(\varkappa_{g}^{-},0)\cup(0,\varkappa_{g}^{+})$ the denominator of the integrand in (4.1) is non–zero. Below we demonstrate that $L_{s,h}$ can be formally represented as an infinite series.

4.1 Infinite series representation

To develop the infinite series representation we need the following notation. According to Assumption 1, the set of zeros of $\widehat{g}(z)$ on the imaginary axis is determined by three $q$ -tuples $a=(a_{1},\ldots,a_{q})$ , $\lambda=(\lambda_{1},\ldots,\lambda_{q})$ and $m=(m_{1},\ldots,m_{q})$ . For a given vector $a=(a_{1},\ldots,a_{q})$ define

[TABLE]

The set ${\mathscr{L}}$ can be represented as an ordered set of real numbers ${\mathscr{L}}:=\{\ell_{0},\ell_{1},\ell_{2},\ldots\}$ , where $0=\ell_{0}<\ell_{1}<\ell_{2}<\ell_{3}<\cdots$ . Define also

[TABLE]

and let

[TABLE]

In fact, $C_{j,m}$ is the number of weak compositions of $j$ into $m$ parts [see, e.g., (Stanley, 1997, p. 25)]. Recall that an $m$ –tuple $(i_{1},\ldots,i_{m})$ of non–negative integers with $i_{1}+\cdots+i_{m}=j$ is called a weak composition of $j$ into $m$ parts.
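By the stars-and-bars argument, $C_{j,m}=\binom{j+m-1}{m-1}$, which is a polynomial of degree $m-1$ in $j$. This is one concrete way to see the polynomial growth of the zero set sequences mentioned in Remark 4(ii). A minimal check of the closed form against brute-force enumeration (both helper names are hypothetical):

```python
from itertools import product
from math import comb


def weak_compositions_bruteforce(j, m):
    """Count m-tuples (i_1, ..., i_m) of non-negative integers with i_1 + ... + i_m = j."""
    return sum(1 for t in product(range(j + 1), repeat=m) if sum(t) == j)


def weak_compositions(j, m):
    """Closed form C_{j,m} = binom(j + m - 1, m - 1), valid for m >= 1."""
    return comb(j + m - 1, m - 1)


# Example: the weak compositions of 3 into 2 parts are (0,3), (1,2), (2,1), (3,0).
print(weak_compositions(3, 2))  # 4
```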

Lemma 2.

Let Assumption 1 hold, and $\int_{-\infty}^{\infty}\big{|}\widehat{K}(i\omega h)\widehat{\psi}(-i\omega)\big{|}\mathrm{d}\omega<\infty$ .

(a)

If $s\in(0,\varkappa_{g}^{+}),$ then

[TABLE]

provided that the summation on the right hand side of (4.3) defines a finite function for any $t$ .

(b)

If $s\in(\varkappa_{g}^{-},0)$ then

[TABLE]

provided that the summation on the right hand side of (4.5) is finite for any $t$ .

Remark 4.

(i). Lemma 2 shows that under Assumption 1 the kernel $L_{s,h}(t)$ can be represented as an infinite linear combination of one–sided translations of $R_{h}$ , where the translation parameter $\ell$ takes values in the set ${\mathscr{L}}$ .

(ii). The coefficients $\{{\cal C}_{\ell}^{+}\}$ and $\{{\cal C}_{\ell}^{-}\}$ of the linear combination are completely determined by the structure of the zero set of $\widehat{g}(z)$ on the imaginary axis. The sequences $\{{\cal C}_{\ell}^{+}\}$ , $\{{\cal C}_{\ell}^{-}\}$ will play an important role in the sequel, and we call them the zero set sequences. The definitions in (4.4) and (4.6) imply that the coefficients $|{\cal C}^{+}_{\ell}|$ , $|{\cal C}^{-}_{\ell}|$ may grow at most polynomially in $\ell$ as $\ell\to\infty$ . Note also that ${\cal C}_{\ell_{0}}^{+}=1$ and ${\cal C}^{-}_{\ell_{0}}=\prod_{k=1}^{q}(-\lambda_{k})^{m_{k}}$ .
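The display defining ${\mathscr{L}}$ did not survive extraction. The examples in Section 4.2 (for instance ${\mathscr{L}}=\{2\sum_{k}\theta_{k}j_{k}:(j_{1},\ldots,j_{q})\in\mathbb{Z}_{+}^{q}\}$ when $a_{k}=2\theta_{k}$) are consistent with reading it as the set of sums $\sum_{k}a_{k}j_{k}$, $j\in\mathbb{Z}_{+}^{q}$. Under that assumption, the sketch below lists the translation parameters $\ell_{0}<\ell_{1}<\cdots$ up to a bound, together with the number of tuples $j$ that produce each $\ell$. In the special case $\lambda_{k}=1$, $m_{k}=1$ of Example 2(b), this count is exactly ${\cal C}^{+}_{\ell}$. The helper name and the rounding used to merge floating-point sums are our own choices.

```python
from collections import Counter
from itertools import product


def zero_lattice(a, bound, ndigits=10):
    """Elements ell <= bound of {sum_k a_k j_k : j in Z_+^q} (assumed form of the set L),
    returned in increasing order together with their numbers of representations."""
    if any(ak <= 0 for ak in a):
        raise ValueError("all a_k must be positive")
    counts = Counter()
    # j_k can never exceed bound / a_k, so these ranges cover every candidate tuple.
    ranges = [range(int(bound // ak) + 1) for ak in a]
    for j in product(*ranges):
        ell = sum(ak * jk for ak, jk in zip(a, j))
        if ell <= bound + 1e-12:
            # Rounding merges sums that differ only by floating-point noise.
            counts[round(ell, ndigits)] += 1
    return sorted(counts), counts


ells, mult = zero_lattice((2.0, 3.0), 7.0)
print(ells)     # [0.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
print(mult[6])  # 2, since 6 = 3*2 = 2*3
```

For irrational ratios $a_{k}/a_{l}$ all representations are distinct, so every multiplicity equals one. Rational ratios produce coincidences such as $6=3\cdot2=2\cdot3$ above.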

4.2 Kernel representation in specific problem instances

In general, determining the coefficients $\{{\cal C}^{+}_{\ell}\}$ and $\{{\cal C}^{-}_{\ell}\}$ in (4.4) and (4.6) is difficult. It is instructive to apply Lemma 2 to particular cases of Examples 1–4 in which the zero set sequences and the corresponding kernels can be calculated explicitly.

Uniform distribution

This is the setting of Example 1. Recall that here $q=1$ , $m_{1}=1$ , $a_{1}=2\theta$ , $\lambda_{1}=1$ ; hence ${\mathscr{L}}=\{2\theta j:j=0,1,\ldots\}$ , and ${\cal C}^{+}_{\ell}=1$ , ${\cal C}^{-}_{\ell}=-1$ for all $\ell=2\theta j$ , $j=0,1,\ldots$ . Since $\widehat{\psi}(-i\omega)=2\theta i\omega e^{-i\theta\omega}$ ,

[TABLE]

Thus, in view of (4.3) and (4.5)

[TABLE]

If $K$ is a bounded, continuously differentiable kernel with finite support and $h$ is small enough, then the formulas in (4.7) define functions $L_{+,h}(t)$ and $L_{-,h}(t)$ which are finite for every $t\in{\mathbb{R}}$ .

Convolution of uniform distributions

We consider two specific cases of Example 2.

(a). Consider convolution of $m$ identical uniform distributions $U(-\theta,\theta)$ . In this case $q=1$ , $m_{1}=m$ , $a_{1}=2\theta$ and $\lambda_{1}=1$ . Thus, ${\mathscr{L}}=\{2\theta j:j=0,1,\ldots\}$ , ${\cal C}^{+}_{2\theta j}=C_{j,m}$ , ${\cal C}^{-}_{2\theta j}=(-1)^{m}C_{j,m}$ , $j=0,1,2,\ldots$ . Since $\widehat{\psi}(z)=(-2\theta z)^{m}\exp\{m\theta z\}$ ,

[TABLE]

Therefore

[TABLE]

Similarly to the previous example, if $K$ is $m$ times continuously differentiable with finite support and $h$ is small enough, then these formulas define functions that are finite for any $t$ .

(b). Consider convolution of $q$ uniform distributions $U(-\theta_{k},\theta_{k})$ , $k=1,\ldots,q$ with distinct $\theta_{k}$ , $k=1,\ldots,q$ . In this case $a_{k}=2\theta_{k}$ , $\lambda_{k}=1$ , $m_{k}=1$ for $k=1,\ldots,q$ . Thus, ${\mathscr{L}}=\{2\sum_{k=1}^{q}\theta_{k}j_{k}:(j_{1},\ldots,j_{q})\in{\mathbb{Z}}_{+}^{q}\}$ , and if $\ell=2\sum_{k=1}^{q}\theta_{k}j_{k}^{*}$ for some $(j_{1}^{*},\ldots,j_{q}^{*})\in{\mathbb{Z}}_{+}^{q}$ then ${\cal C}^{+}_{\ell}$ is the number of non–negative integer solutions $(x_{1},\ldots,x_{q})$ to the equation

[TABLE]

and ${\cal C}^{-}_{\ell}=(-1)^{q}{\cal C}^{+}_{\ell}$ . It is clear that there is at least one solution $(x_{1},\ldots,x_{q})=(j_{1}^{*},\ldots,j_{q}^{*})$ ; the total number of solutions depends on $\theta_{1},\ldots,\theta_{q}$ . For instance, assume that $\theta_{k}=r_{k}\theta_{1}$ , $k=1,\ldots,q$ , where $1=r_{1}<r_{2}<\cdots<r_{q}$ are coprime integers. Then ${\cal C}^{+}_{\ell}$ with $\ell=2\theta_{1}\ell_{*}$ is the number of representations of the integer $\ell_{*}=j_{1}^{*}+r_{2}j_{2}^{*}+\cdots+r_{q}j_{q}^{*}$ as a non–negative integer linear combination of $r_{1},\ldots,r_{q}$ . By Schur’s theorem [see, e.g., (Wilf, 2006, Section 3.15)]

[TABLE]

It follows from (3.5) that

[TABLE]

and therefore

[TABLE]

Thus,

[TABLE]

where ${\cal C}^{-}_{\ell}=(-1)^{q}{\cal C}^{+}_{\ell}$ , and the sequence $\{{\cal C}^{+}_{\ell},\,\ell=2\theta_{1}j,\,j=0,1,\ldots\}$ satisfies (4.8). If kernel $K$ is $q$ times continuously differentiable and has bounded support then the last formulas define functions which are finite for any fixed $t$ .
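For integer ratios $r_{k}=\theta_{k}/\theta_{1}$, the count ${\cal C}^{+}_{2\theta_{1}\ell_{*}}$ can be computed exactly with the standard coin-change dynamic program. It can then be compared with the leading term $\ell_{*}^{q-1}/((q-1)!\,r_{1}\cdots r_{q})$ from Schur's theorem, whose standard statement requires the $r_{k}$ to have greatest common divisor 1; this holds automatically here because $r_{1}=1$. The sketch below is minimal and uses hypothetical function names. It illustrates the polynomial growth of order $\ell^{q-1}$ of the zero set sequence.

```python
from math import factorial, prod


def count_representations(n, r):
    """Number of (x_1, ..., x_q) in Z_+^q with r_1 x_1 + ... + r_q x_q = n (coin-change DP)."""
    ways = [1] + [0] * n  # ways[v] = number of representations of v found so far
    for rk in r:
        for v in range(rk, n + 1):
            ways[v] += ways[v - rk]
    return ways[n]


def schur_leading_term(n, r):
    """Leading term n^{q-1} / ((q-1)! r_1 ... r_q) of Schur's asymptotic formula."""
    q = len(r)
    return n ** (q - 1) / (factorial(q - 1) * prod(r))


# theta = (theta_1, 2 theta_1, 3 theta_1): C^+ at ell = 2 theta_1 * 6 counts 7 representations.
print(count_representations(6, (1, 2, 3)))  # 7
print(count_representations(1000, (1, 2, 3)) / schur_leading_term(1000, (1, 2, 3)))  # about 1.006
```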

Binomial distribution

Assume that the measurement error distribution is binomial with parameters $m$ and $1/2$ ; this is a particular case of Example 3. Here $q=1$ , $a_{1}=1$ , $\lambda_{1}=-1$ , $m_{1}=m$ and $\widehat{\psi}(z)=2^{m}$ . Hence ${\mathscr{L}}={\mathbb{Z}}_{+}^{1}$ , ${\cal C}_{\ell}^{+}=C_{\ell,m}$ , ${\cal C}_{\ell}^{-}=(-1)^{m}C_{\ell,m}$ ,

[TABLE]

and

[TABLE]

5 Estimator and upper bounds on the risk

Based on the general ideas presented in Section 2 and kernel representations developed in Section 4 we are now in a position to define the proposed estimator of $f$ and to study its accuracy.

5.1 Estimator

We assume that kernel $K$ is chosen to satisfy the following condition.

(K)

Let $K\in C^{\infty}({\mathbb{R}})$ be a function supported on $[-1,1]$ such that for a fixed positive integer $k_{0},$

[TABLE]

Condition (K) is standard in nonparametric kernel density estimation; clearly, one can always construct a kernel $K$ satisfying (K) with a prescribed parameter $k_{0}$ .

Let $N$ be a natural number, and denote

[TABLE]

The estimator of $f(x_{0})$ is defined as follows

[TABLE]

where we set

[TABLE]

and

[TABLE]

In what follows we will write $\tilde{f}_{+,h}^{(N)}(x_{0})$ and $\tilde{f}_{-,h}^{(N)}(x_{0})$ for the estimator (5.1) associated with $s\in(0,\varkappa_{g}^{+})$ and $s\in(\varkappa_{g}^{-},0),$ respectively. Let finally

[TABLE]

Recall that the function $R_{h}$ and the sequences $\{{\cal C}^{+}_{\ell},\ell\in{\mathscr{L}}\}$ , $\{{\cal C}^{-}_{\ell},\ell\in{\mathscr{L}}\}$ are defined in (4.2), (4.4) and (4.6), respectively. The estimator construction follows the linear functional strategy of Section 2 in conjunction with the kernel representation developed in Section 4. Note that we truncate the infinite series kernel representation at the cut–off parameter $N$ ; this introduces some bias but ensures that the integral on the left hand side of (2.6) is absolutely convergent. The estimator $\tilde{f}^{(N)}_{s,h}(x_{0})$ requires specification of $h$ and $N$ ; this will be done in the sequel.

5.2 Functional classes

Now we define functional classes over which accuracy of the proposed estimators will be assessed. The next two definitions introduce standard classes of smooth functions.

Definition 1.

Let $A>0$ , $\alpha>0$ be real numbers. We say that a probability density $f$ belongs to the functional class ${\mathscr{H}}_{\alpha}(A)$ if $f$ is $\lfloor\alpha\rfloor:=\max\{k\in{\mathbb{N}}\cup\{0\}:k<\alpha\}$ times continuously differentiable and

[TABLE]

Definition 2.

For real numbers $A>0$ and $\alpha>1/2$ we say that a probability density $f$ belongs to the functional class ${\mathscr{S}}_{\alpha}(A)$ if $f$ is $\lfloor\alpha\rfloor:=\max\{k\in{\mathbb{N}}\cup\{0\}:k<\alpha\}$ times differentiable and

[TABLE]

We will also consider classes of probability densities with bounded moments.

Definition 3.

Let $p>0$ and $B>0$ be real numbers. We say that a probability density $f$ belongs to the functional class ${\mathscr{M}}_{p}(B)$ if

[TABLE]

We also denote by ${\mathscr{M}}_{p}^{\prime}(B)$ the class of all densities $f$ from ${\mathscr{M}}_{p}(B)$ satisfying the following additional condition:

[TABLE]

where $\gamma\geq 0$ is a constant appearing in Assumption 2.

Remark 5.

(i). Condition (5.5) is rather mild. Note that $f\in{\mathscr{M}}_{p}(B)$ implies boundedness of $|\widehat{f}^{(j)}(i\omega)|$ for all $j=0,\ldots,p$ , and in (5.5) we require integrability of $\widehat{f}^{(j)}(i\omega)$ with the weight $(1+|\omega|^{\gamma})^{-1}$ . If $\gamma>1$ and $f\in{\mathscr{M}}_{p}(B_{1}),$ then (5.5) holds trivially: $f\in{\mathscr{M}}_{p}^{\prime}(B_{2})$ with $B_{2}=c_{\gamma}B_{1}$ where $c_{\gamma}:=\int_{-\infty}^{\infty}(1+|\omega|^{\gamma})^{-1}\mathrm{d}\omega$ . Therefore in the definition of ${\mathscr{M}}^{\prime}_{p}(B)$ the restriction (5.5) is active only if $\gamma\leq 1$ .

(ii). If $f\in{\mathscr{H}}_{\alpha}(A),$ then $f$ is uniformly bounded above by a constant depending on $A$ only. However, for the sake of convenience, we explicitly require boundedness of $f$ in the definition of the class ${\mathscr{M}}_{p}(B)$ .
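For reference, the constant $c_{\gamma}$ in Remark 5(i) has a closed form. This is a standard integral, not stated in the paper. It follows from the substitution $x=\omega^{\gamma}$ and the identity $\int_{0}^{\infty}x^{s-1}(1+x)^{-1}\mathrm{d}x=\pi/\sin(\pi s)$, $0<s<1$:

$$
c_{\gamma}=\int_{-\infty}^{\infty}\frac{\mathrm{d}\omega}{1+|\omega|^{\gamma}}
=\frac{2}{\gamma}\int_{0}^{\infty}\frac{x^{1/\gamma-1}}{1+x}\,\mathrm{d}x
=\frac{2\pi/\gamma}{\sin(\pi/\gamma)},\qquad \gamma>1 .
$$

For example, $c_{2}=\pi$. For $\gamma\leq 1$ the integral diverges, which is why (5.5) is a genuine restriction only in that range.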

We also denote

[TABLE]

5.3 Upper bounds

In this section we derive an upper bound on the maximal risk of the estimator (5.1) under Assumptions 1, 2, and under the following additional condition on the growth of zero set sequences $\{{\cal C}_{\ell}^{+}\}$ and $\{{\cal C}_{\ell}^{-}\}$ .

Assumption 3.

Assume that

[TABLE]

for some $C_{0}>0$ and $\nu>1.$

Theorem 1.

Suppose that Assumptions 1, 2 and 3 hold. Let $\tilde{f}_{h}^{(N)}(x_{0})$ be associated with kernel $K$ satisfying the condition (K) with $k_{0}\geq\alpha+1$ .

(a)

Assume that $f\in{\mathscr{F}}^{\prime}_{\alpha,p}(A,B)$ with $p\geq 2\nu.$ Let $h=h_{*}:=\big{[}B(A^{2}n)^{-1}\big{]}^{1/(2\alpha+2\gamma+1)}$ and $N\geq\big{(}A^{-2\gamma+1}B^{2\gamma+\alpha}n^{\alpha+1}\big{)}^{1/p(2\alpha+2\gamma+1)}$ . Then for all large enough $n$ one has

[TABLE]

where $C_{1}$ may depend on $\alpha$ and $p$ only.

(b)

Let $f\in{\mathscr{G}}^{\prime}_{\alpha,p}(A,B)$ with $p\geq 2\nu+1.$ Let $N\geq\big{(}A^{-2\gamma+1}B^{2\gamma+\alpha}n^{\alpha+1}\big{)}^{2/(2p-1)(2\alpha+2\gamma+1)}$ and $h=h_{*}$ . Then for all large enough $n$ one has

[TABLE]

where $C_{2}$ may depend on $\alpha$ and $p$ only.
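The tuning rules of Theorem 1 are explicit, so they are easy to evaluate. Below is a minimal sketch with a hypothetical helper `tuning_parameters`. It reads the exponent $1/p(2\alpha+2\gamma+1)$ as $1/[p(2\alpha+2\gamma+1)]$, which is our interpretation. The choice $h_{*}$ balances the usual squared-bias proxy $A^{2}h^{2\alpha}$ against the variance proxy $B/(nh^{2\gamma+1})$. This balancing is a standard heuristic, not a display from the paper, and it gives a risk of order $Ah_{*}^{\alpha}\propto n^{-\alpha/(2\alpha+2\gamma+1)}$. With $\gamma=m$ and $\gamma=0$ the same formulas reproduce the choices in Theorems 2 and 3.

```python
import math


def tuning_parameters(n, A, B, alpha, gamma, p, risk="pointwise"):
    """Bandwidth h_* and smallest admissible integer cut-off N from Theorem 1."""
    h = (B / (A ** 2 * n)) ** (1.0 / (2 * alpha + 2 * gamma + 1))
    base = A ** (-2 * gamma + 1) * B ** (2 * gamma + alpha) * n ** (alpha + 1)
    if risk == "pointwise":  # part (a); assumes p >= 2 nu
        exponent = 1.0 / (p * (2 * alpha + 2 * gamma + 1))
    elif risk == "L2":  # part (b); assumes p >= 2 nu + 1, so 2p - 1 > 0
        exponent = 2.0 / ((2 * p - 1) * (2 * alpha + 2 * gamma + 1))
    else:
        raise ValueError("risk must be 'pointwise' or 'L2'")
    return h, math.ceil(base ** exponent)


h, N = tuning_parameters(n=10**4, A=2.0, B=3.0, alpha=2, gamma=1, p=4)
# At h_*, the squared-bias proxy A^2 h^{2 alpha} equals the variance proxy B / (n h^{2 gamma + 1}).
print(h, N)
```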

Remark 6.

(i). It is well known that under the assumptions $\widehat{g}(i\omega)\neq 0$ , $\omega\in{\mathbb{R}}$ , and $\widehat{g}(i\omega)\asymp|\omega|^{-\gamma}$ , $|\omega|\to\infty$ , we have ${\cal R}_{n,\Delta_{x_{0}}}^{*}[{\mathscr{H}}_{\alpha}(A)]\asymp n^{-\alpha/(2\alpha+2\gamma+1)}$ and ${\cal R}_{n,\Delta_{2}}^{*}[{\mathscr{S}}_{\alpha}(A)]\asymp n^{-\alpha/(2\alpha+2\gamma+1)}$ as $n\to\infty$ . Theorem 1 provides conditions on $f$ that guarantee the standard rate of convergence when $\widehat{g}(i\omega)$ may have zeros. In particular, the conditions $f\in{\mathscr{M}}^{\prime}_{p}(B)$ with $p\geq 2\nu$ and $p\geq 2\nu+1$ are sufficient to ensure the standard rate $n^{-\alpha/(2\alpha+2\gamma+1)}$ for the pointwise and ${\mathbb{L}}_{2}$ –risks, respectively. It is worth noting that the ${\mathbb{L}}_{2}$ –risk bound requires a stronger condition; as we will show in Section 7, this is an intrinsic feature of the problem.

(ii). The result of Theorem 1 is rather general: it holds for any configuration of zeros of $\widehat{g}$ and function $\widehat{\psi}$ satisfying Assumption 2. One interesting implication is that for discrete error distributions such as in Example 3, the achievable rate of convergence is $n^{-\alpha/(2\alpha+1)},$ provided that $f$ has a finite absolute moment of sufficiently large order. Note that $n^{-\alpha/(2\alpha+1)}$ is the minimax rate of convergence in the problem of density estimation from direct i.i.d. observations.

(iii). If the measurement error distribution is uniform, then Assumption 3 holds with $\nu=1+\varepsilon$ for arbitrarily small $\varepsilon>0$ . Therefore Theorem 1 implies that the standard rate of convergence $n^{-\alpha/(2\alpha+3)}$ of the pointwise and ${\mathbb{L}}_{2}$ –risks is achieved if $f\in{\mathscr{M}}_{p}^{\prime}(B)$ with $p>2$ and $p>3$ , respectively.

In some specific cases when closed form expressions for the zero sequences $\{{\cal C}^{+}_{\ell}\}$ and $\{{\cal C}_{\ell}^{-}\}$ and function $\widehat{\psi}(z)$ are available, the conditions of Theorem 1 can be relaxed. We demonstrate this in the next section.

6 Specific problem instances

In this section we consider specific distributions of measurement errors for which conditions of Theorem 1 can be relaxed.

6.1 Convolution of uniform distributions

Consider a particular case of Example 2 where $\widehat{g}(z)=[\sinh(\theta z)/(\theta z)]^{m}$ , $m\geq 1$ . This setting also covers Example 1 that corresponds to $m=1$ . Recall that in this case ${\cal C}^{+}_{2\theta j}=C_{j,m}$ , ${\cal C}^{-}_{2\theta j}=(-1)^{m}C_{j,m}$ , and therefore Assumption 3 is valid for any $\nu>m$ . Then Theorem 1 implies that the pointwise and ${\mathbb{L}}_{2}$ –risks converge to zero at the standard rate provided that $f\in{\mathscr{M}}^{\prime}_{p}(B)$ with $p>2m$ , and $p>2m+1$ , respectively. In fact, as the following result demonstrates, these conditions are too strong: the standard rate is achievable if $p\geq 2m-2$ for the pointwise risk, and if $p>2m-1$ for the ${\mathbb{L}}_{2}$ –risk. Recall that $\widehat{\psi}(z)=(-2\theta z)^{m}\exp\{m\theta z\}$ and

[TABLE]

The corresponding kernels are

[TABLE]

Theorem 2.

Let $\widehat{g}(z)=[\sinh(\theta z)/(\theta z)]^{m}$ , $m\in\mathbb{N}$ . Let $K$ be a kernel satisfying the condition (K) with $k_{0}\geq\alpha+1$ , and let $\tilde{f}_{h}^{(N)}$ denote the estimator defined in (5.1)–(5.4) and associated with the kernels in (6.2).

(a)

Assume that $f\in{\mathscr{F}}_{\alpha,p}(A,B)$ with $p\geq 2m-2$ if $m>1$ , and $p>0$ if $m=1$ . Let $h=h_{*}:=\big{[}B(A^{2}n)^{-1}\big{]}^{1/(2\alpha+2m+1)}$ , and $N\geq\big{(}A^{-2m+1}B^{2m+\alpha}n^{\alpha+1}\big{)}^{1/p(2\alpha+2m+1)}$ . Then for large enough $n,$

[TABLE]

where $C_{1}$ may depend on $\alpha$ , $m$ and $\theta$ only.

(b)

Let $f\in{\mathscr{G}}_{\alpha,p}(A,B)$ with $p>2m-1$ , $N\geq(B^{2m+\alpha}A^{-2m+1}n^{\alpha+1})^{2/(2p-1)(2\alpha+2m+1)}$ , and $h=h_{*}$ . Then for all large enough $n$

[TABLE]

where $C_{2}$ may depend on $\alpha$ , $m$ and $\theta$ only.

Remark 7.

(i). In contrast to the proof of Theorem 1, the proof of Theorem 2 relies on closed-form expressions for the kernels $L_{\pm,h}^{(N)}(t)$ [cf. (6.2)]. In this case the support of the function $R_{h}$ has “small” length, and this fact is crucial for relaxing the assumptions of Theorem 1.

(ii). Theorem 2 shows that the standard rate of convergence $n^{-\alpha/(2\alpha+2\gamma+1)}$ , here with $\gamma=m$ , is achieved

• by the maximal pointwise risk over ${\mathscr{H}}_{\alpha}(A)$ if $f\in{\mathscr{M}}_{2m-2}(B)$ when $m>1$ , and $f\in{\mathscr{M}}_{\varepsilon}(B)$ for any $\varepsilon>0$ when $m=1$ ;

• by the maximal ${\mathbb{L}}_{2}$ –risk over ${\mathscr{S}}_{\alpha}(A)$ if $f\in{\mathscr{M}}_{2m-1+\varepsilon}(B)$ , $\varepsilon>0$ .

In contrast, Theorem 1 requires $f\in{\mathscr{M}}_{p}(B)$ with $p>2m$ and $p>2m+1$ , respectively. The difference is particularly noticeable for the uniform error density, where $m=1$ . Indeed, while Theorem 1 requires a finite $p$-th moment with $p>2$ for the pointwise risk and $p>3$ for the ${\mathbb{L}}_{2}$ –risk, Theorem 2 shows that the conditions $f\in{\mathscr{M}}_{\varepsilon}(B)$ and $f\in{\mathscr{M}}_{1+\varepsilon}(B)$ , $\varepsilon>0$ , suffice. It also follows from the proof that the assumption $f\in{\mathscr{M}}_{\varepsilon}(B)$ for the pointwise risk can be relaxed further: any uniform decay of $f(x)$ as $|x|\to\infty$ is sufficient.

6.2 Binomial distribution

In this section we consider a specific case of Example 3 of Section 3.2, where the measurement errors have binomial distribution with parameters $m$ and $1/2$ . Here $\widehat{g}(z)=2^{-m}(1+e^{z})^{m}$ . Recall that in this case ${\mathscr{L}}={\mathbb{Z}}_{+}^{1}$ , ${\cal C}_{\ell}^{+}=C_{\ell,m}$ , ${\cal C}^{-}_{\ell}=(-1)^{m}C_{\ell,m}$ ,

[TABLE]

and

[TABLE]

Theorem 3.

Let $\widehat{g}(z)=2^{-m}(1+e^{z})^{m}$ , $m\geq 1$ . Fix some $\alpha>0.$ Let $K$ be a kernel satisfying condition (K) with $k_{0}\geq\alpha+1$ , and let $\tilde{f}_{h}^{(N)}$ denote the estimator defined in (5.1)–(5.4) and associated with the kernels in (6.3).

(a)

Assume that $f\in{\mathscr{F}}_{\alpha,p}(A,B)$ with $p\geq 2m-2$ if $m>1$ , and $p>0$ if $m=1$ . Let $h=h_{*}:=[B(A^{2}n)^{-1}]^{1/(2\alpha+1)}$ , and $N\geq(AB^{\alpha}n^{\alpha+1})^{1/p(2\alpha+1)}$ . Then for large enough $n,$

[TABLE]

where $C_{1}$ may depend on $\alpha$ only.

(b)

Assume that $f\in{\mathscr{G}}_{\alpha,p}(A,B)$ for some $p>2m-1$ . Let $N\geq(AB^{\alpha}n^{\alpha+1})^{2/(2p-1)(2\alpha+1)}$ , and $h=h_{*}$ . Then for all large enough $n,$

[TABLE]

where $C_{2}$ may depend on $\alpha$ only.

The proof is omitted as it goes along the same lines as the proof of Theorem 2 with minor modifications.

7 Lower bounds: necessity of moment conditions

Theorem 2 shows that if the error distribution is the $m$ –fold convolution of the uniform distribution on $[-\theta,\theta],$ then the maximal pointwise and ${\mathbb{L}}_{2}$ –risks on the classes ${\mathscr{H}}_{\alpha}(A)$ and ${\mathscr{S}}_{\alpha}(A)$ converge to zero at the standard rate $n^{-\alpha/(2\alpha+2m+1)}$ , provided that suitable moment conditions hold. The following theorem demonstrates that these moment conditions are also necessary.

Theorem 4.

Let $\widehat{g}(i\omega)=[\sin(\theta\omega)/(\theta\omega)]^{m}$ , where $m\geq 1$ is integer.

(a)

If $f\in{\mathscr{F}}_{\alpha,p}(A,B)$ with $0<p<2m-2$ then

[TABLE]

(b)

Moreover, if $f\in{\mathscr{G}}_{\alpha,p}(A,B)$ with $p<2m-1$ then

[TABLE]

Remark 8.

The theorem states that under the conditions $0<p<2m-2$ and $p<2m-1$ the standard rates of convergence cannot be attained for the pointwise and ${\mathbb{L}}_{2}$ –risks, respectively. On the other hand, Theorem 2 provides an estimator that achieves the standard rate of convergence provided that $p\geq 2m-2$ when $m>1$ (and $p>0$ when $m=1$) for the pointwise risk, and $p>2m-1$ for the ${\mathbb{L}}_{2}$ –risk. Thus, the indicated moment conditions are necessary for convergence of the risks at the standard rate.

8 Concluding remarks

We close this paper with some concluding remarks.

1. The proposed estimator $\tilde{f}_{h}^{(N)}(x_{0})$ in (5.4) is associated with the one–sided kernels $L_{+,h}$ and $L_{-,h}$ , which are used for positive and negative values of $x_{0},$ respectively. This definition was adopted for the sake of convenience and unification of proofs. In fact, a closer inspection of the proofs of Theorems 1 and 2 shows that for any $x_{0}$ one can construct an estimator relying on either of the two kernels with the same risk guarantees. In this case the parameter $N$ should be chosen depending on $x_{0}$ .

2. In this paper we considered the functional class ${\mathscr{M}}_{p}(B)$ of densities satisfying certain moment conditions. It is worth noting that the proposed estimators can be analyzed under other assumptions as well. For instance, if the support of $f$ has a finite left endpoint, then there is no need to assume that $f\in{\mathscr{M}}_{p}(B)$ . Indeed, the proof of Theorem 2 shows that the accuracy of $\tilde{f}_{+,h}^{(N)}(x_{0})$ ( $\tilde{f}_{-,h}^{(N)}(x_{0})$ ) is determined by the right (left) tail of $f$ . Therefore if the support of $f$ has a finite left endpoint, then it is reasonable to use the estimator $\tilde{f}_{-,h}^{(N)}(x_{0})$ , whose risk will converge to zero at the standard rate. This fact connects our result to those of Groeneboom & Jongbloed (2003) and Delaigle & Meister (2011).

3. The following lower bounds on the minimax risks ${\cal R}_{n,\Delta_{x_{0}}}^{*}[{\mathscr{H}}_{\alpha}(A)]$ and ${\cal R}_{n,\Delta_{2}}^{*}[{\mathscr{S}}_{\alpha}(A)]$ can be extracted from the proof of Theorem 4. If the measurement error distribution is an $m$ –fold convolution of the uniform distribution, then for any $\delta>0$

[TABLE]

Observe that $\phi_{n,\Delta_{x_{0}}}(m)\gg n^{-\alpha/(2\alpha+2m+1)}$ if $m>1$ , and $\phi_{n,\Delta_{2}}(m)\gg n^{-\alpha/(2\alpha+2m+1)}$ for all $m\geq 1$ . These results should be compared with the upper bounds in Theorem 2. In particular, even in the case of the uniform error density there is a significant difference in the behavior of the minimax risks ${\cal R}^{*}_{n,\Delta_{2}}[{\mathscr{S}}_{\alpha}(A)\cap{\mathscr{M}}_{1+\varepsilon}(B)]$ and ${\cal R}^{*}_{n,\Delta_{2}}[{\mathscr{S}}_{\alpha}(A)]$ : while the former is of the order $n^{-\alpha/(2\alpha+3)}$ , the latter converges to zero at a rate slower than $n^{-\alpha/(4\alpha+2)-\delta}$ for any small $\delta>0$ . It is worth noting that some lower bounds on the minimax ${\mathbb{L}}_{2}$ –risk are reported in Hall & Meister (2007) and Meister (2008). However, these bounds are not directly comparable with ours, since the functional classes and assumptions on $\widehat{g}(z)$ considered in those papers differ from the ones adopted here. Moreover, we mainly focus on the minimal conditions needed to preserve the standard convergence rates and do not consider the problem of constructing optimal (in the minimax sense) estimators when these conditions are violated.

4. We focused on the setting where the characteristic function of the measurement errors has zeros on the imaginary axis and decreases at a polynomial rate. This corresponds to the case of smooth error densities. The super–smooth case, where the characteristic function decreases at an exponential rate, can also be considered within the proposed framework. This assumption leads to slow logarithmic rates, and it can be shown that zeros of the error characteristic function do not affect the minimax rates of convergence, i.e., the standard minimax rates are preserved with no additional tail conditions.

5. The proposed estimators are not adaptive in the sense that they require knowledge of the underlying functional classes. However, based on the results of this paper, adaptive estimators can be developed using standard methods [see, e.g., Lepski (1990) and Goldenshluger & Lepski (2011)]. We do not pursue this direction in the current paper.

6. Johnstone & Raimondo (2004) considered a closely related problem of signal deconvolution in the periodic Gaussian white noise model $\mathrm{d}y(t)=(f\ast g)(t)\mathrm{d}t+\sigma\mathrm{d}W(t)$ , $t\in[-1,1]$ , where $g$ is a boxcar kernel, that is, $g(t)=(2\theta)^{-1}{\bf 1}_{[-\theta,\theta]}(t)$ , and $\{W(t)\}$ is the standard two–sided Wiener process. If $\theta$ is a rational number, then the signal $f$ is non–identifiable. Assuming that $\theta$ is irrational, Johnstone & Raimondo (2004) studied the behavior of the minimax ${\mathbb{L}}_{2}$ –risk over the classes of ellipsoids and hyperrectangles defined on the Fourier coefficients of $f$ . They showed that the minimax rates of convergence for the ${\mathbb{L}}_{2}$ –risk are affected by the oscillating behavior of the Fourier coefficients of the boxcar kernel. Our results suggest that if the assumption of periodicity of $f$ and $g$ is dropped, then the minimax ${\mathbb{L}}_{2}$ –risk over the class ${\mathscr{S}}_{\alpha}(A)$ should be of the standard order $(\sigma^{2})^{\alpha/(2\alpha+3)}$ . We plan to study these signal deconvolution models in future research.

Appendix

Proof of Lemma 1

Fix $s\in(\varkappa_{g}^{-},0)\cup(0,\varkappa_{g}^{+})$ . By Fubini’s theorem

[TABLE]

Now we show that for almost all $x,$

[TABLE]

Applying the bilateral Laplace transform to the left hand side of the previous display formula we obtain

[TABLE]

In view of (2.3), the function on the right hand side is analytic and equal to $\widehat{K}(zh)e^{-zx_{0}}$ on $S_{g}$ . On the other hand,

[TABLE]

Thus, the bilateral Laplace transforms of the functions on both sides of (.1) coincide on $\mathbb{C}$ ; therefore (.1) holds by the uniqueness property of the bilateral Laplace transform. This implies the lemma statement.

Proof of Lemma 2

(a). For $s\in(0,\varkappa_{g}^{+})$ , $a>0$ and $|\lambda|=1$ we have

[TABLE]

Therefore

[TABLE]

and it follows from (4.1) that

[TABLE]

where the third line follows from analyticity of the integrand, and ${\cal C}_{\ell}^{+}$ is defined in (4.4). Note that the change of the order of integration and summation is permissible under the premise of the lemma.

(b). If $s\in(\varkappa_{g}^{-},0)$ then

[TABLE]

and, similarly to the above,

[TABLE]

which yields

[TABLE]

where ${\cal C}_{\ell}^{-}$ is defined in (4.6). This completes the proof.

Proof of Theorem 1

Throughout the proof we keep track of dependence of all constants on parameters of the classes ${\mathscr{F}}^{\prime}_{\alpha,p}(A,B)$ and ${\mathscr{G}}_{\alpha,p}^{\prime}(A,B)$ . In what follows $c_{1},c_{2},\ldots$ stand for constants that can depend on parameters appearing in Assumptions 1, 2 and 3, and on $\alpha$ and $p$ only. For the sake of brevity, in the subsequent proof we do not indicate integration limits if the corresponding integrals are taken over the entire real line.

Proof of statement (a)

We assume that $x_{0}\geq 0$ and consider the estimator $\tilde{f}_{+,h}^{(N)}(x_{0})$ only; the derivation for $x_{0}<0$ and $\tilde{f}_{-,h}^{(N)}(x_{0})$ is similar.

First we verify that under Assumption 2 and condition (K) the estimator $\tilde{f}_{+,h}^{(N)}(x_{0})$ is well defined. Because $K$ has finite support and it is infinitely differentiable on the real line, $\widehat{K}(i\omega)$ is also infinitely differentiable and rapidly decreasing as $|\omega|\to\infty$ in the sense that $|\omega|^{k}|\widehat{K}^{(j)}(i\omega)|\leq c(k,j)$ for all $k$ and $j$ . In particular, for $k>\gamma+1$ by (3.3) of Assumption 2 we have

[TABLE]

Thus, the function $R_{h}(\cdot)$ in (4.2) and the kernels $L_{+,h}^{(N)}$ and $L_{-,h}^{(N)}$ in (5.3) are well defined.

Next we derive an upper bound on the bias of $\tilde{f}_{+,h}^{(N)}(x_{0})$ . We have

[TABLE]

It follows from definition of ${\cal C}_{\ell}^{+}$ [cf. (4.4)] that

[TABLE]

Now noting that

[TABLE]

we obtain

[TABLE]

Substituting this expression in (.2), and taking into account (3.1) and $\widehat{f}_{Y}(-i\omega)=\widehat{f}(-i\omega)\widehat{g}(-i\omega)$ we obtain

[TABLE]

where we denote

[TABLE]

If $f\in{\mathscr{M}}_{p}(B)$ then

[TABLE]

where we denoted $a_{\min}:=\min\{a_{1},\ldots,a_{q}\}$ and took into account that $a_{\min}>h$ for large enough $n$ . Then (.3), (.4), condition (K) and the fact that $k_{0}\geq\alpha+1$ imply the following upper bound on the bias

[TABLE]

Now we bound the variance of $\tilde{f}_{+,h}^{(N)}(x_{0})$ . We need the following notation. For non–negative integer number $j$ we let

[TABLE]

where ${\mathscr{L}}^{*}_{N}:={\mathscr{L}}_{N}\backslash\{0\}$ , and $\{{\cal C}_{\ell}^{+},\ell\in{\mathscr{L}}\}$ and $\{{\cal C}_{\ell}^{-},\ell\in{\mathscr{L}}\}$ are given in (4.4) and (4.6).

We have

[TABLE]

where for $x_{1},x_{2}>0$ we put

[TABLE]

where $\widehat{R}_{h}(i\omega)=\widehat{K}(i\omega h)\widehat{\psi}(-i\omega)$ . Since $\widehat{K}(i\omega)$ is rapidly decreasing as $|\omega|\to\infty$ and in view of (3.4) we have that $|\widehat{R}^{(j)}_{h}(i\omega)|\to 0$ as $|\omega|\to\infty$ for all $0\leq j\leq p_{0}$ , where $p_{0}$ appears in Assumption 2. Recall that by premise of the theorem $p_{0}\geq p$ .

Now, we proceed with bounding the terms $S_{i}(x_{0})$ , $i=1,2,3$ on the right hand side of (.7). First, since $f\in{\mathscr{M}}_{p}(B)$ , $f(x)\leq B$ and therefore $f_{Y}(y)\leq B$ for all $y$ . By Parseval’s identity and Assumption 2

[TABLE]

Furthermore, $f\in{\mathscr{M}}_{p}(B)$ implies that $|\widehat{f}^{(j)}(i\omega)|\leq B$ , $0\leq j\leq p$ , and since $\widehat{g}$ is analytic, the derivatives $\widehat{f}_{Y}^{(j)}(i\omega)$ are finite for $0\leq j\leq p$ . Therefore for any integer $r\leq p$ by repeated integration by parts with respect to $\omega$ in (.8) we obtain for $x_{1}\neq 0$

[TABLE]

In the first line we have taken into account that $|\widehat{R}^{(j)}_{h}(i\omega)|\to 0$ as $|\omega|\to\infty$ for all $0\leq j\leq p_{0}$ , and $p_{0}\geq p\geq r$ . Now, invoking (.6) we have

[TABLE]

The Cauchy–Schwarz inequality applied to the double integral on the right hand side yields

[TABLE]

We bound $S_{3}(x_{0})$ similarly. In particular, for any integer $r$ such that $2r\leq p$ repeated integration by parts in (.8) first with respect to $\omega$ and then with respect to $\mu$ yields

[TABLE]

Hence

[TABLE]

and by the Cauchy–Schwarz inequality

[TABLE]

Combining the above bounds on $S_{1}$ , $S_{2}$ and $S_{3}$ we obtain that for any integer number $r$ such that $2r\leq p$ one has

[TABLE]

Now we bound the integrals on the right hand side of the above display formula.

Note that for any $j=0,\ldots,2r,$

[TABLE]

In view of (3.1) for $l=0,\ldots,2r$

[TABLE]

It is obvious that $|\widehat{\varphi}^{(k)}(i\omega)|\leq c_{6}$ for all $k=0,\ldots,2r$ . Furthermore, by the Faà di Bruno formula

[TABLE]

where $B_{k,m}(\cdot)$ are the Bell polynomials. Recall that $B_{k,m}$ is a homogeneous polynomial in $k$ variables of degree $m$ . This fact and (3.4) imply that for $k,m=0,\ldots,2r,$

[TABLE]
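The exact display above was lost in extraction. The standard form of the Faà di Bruno formula for a reciprocal is $(1/\psi)^{(k)}=\sum_{m=0}^{k}(-1)^{m}m!\,\psi^{-m-1}B_{k,m}(\psi',\ldots,\psi^{(k-m+1)})$, obtained from the outer function $u\mapsto 1/u$. We assume this is the kind of expression used for $[1/\widehat{\psi}]^{(k)}$ in the bound that follows. The sketch below evaluates it numerically via the usual recurrence for the partial Bell polynomials; the helper names are hypothetical.

```python
from functools import lru_cache
from math import comb, factorial


def bell_partial(n, k, x):
    """Partial Bell polynomial B_{n,k}(x_1, ..., x_{n-k+1}); x[i-1] holds x_i."""
    @lru_cache(maxsize=None)
    def B(nn, kk):
        if nn == 0 and kk == 0:
            return 1
        if nn == 0 or kk == 0:
            return 0
        # Recurrence: B_{n,k} = sum_{i=1}^{n-k+1} binom(n-1, i-1) x_i B_{n-i,k-1}
        return sum(comb(nn - 1, i - 1) * x[i - 1] * B(nn - i, kk - 1)
                   for i in range(1, nn - kk + 2))
    return B(n, k)


def reciprocal_derivative(k, psi_derivs):
    """k-th derivative of 1/psi at a point, given psi_derivs[j] = psi^{(j)} there, j = 0..k.
    Faa di Bruno with outer function u -> 1/u, whose m-th derivative is (-1)^m m! u^{-m-1}."""
    psi0, x = psi_derivs[0], psi_derivs[1:]
    return sum((-1) ** m * factorial(m) * psi0 ** (-m - 1) * bell_partial(k, m, x)
               for m in range(k + 1))


# psi(t) = 2 + t^2 at t = 1: psi = 3, psi' = 2, psi'' = 2, so (1/psi)'' = -2/9 + 8/27 = 2/27.
print(reciprocal_derivative(2, [3, 2, 2]))
```

Each $B_{k,m}$ is homogeneous of degree $m$ in the derivatives of $\psi$. This is the property the proof combines with (3.4) to bound the derivatives of $1/\widehat{\psi}$.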

Using (3.3) we obtain $|[1/\widehat{\psi}(i\omega)]^{(k)}|\leq c_{8}|\omega|^{-\gamma}$ for $|\omega|\geq\omega_{0}$ ; hence $|\widehat{g}^{(l)}(i\omega)|\leq c_{9}|\omega|^{-\gamma}$ , $\forall|\omega|\geq\omega_{0}$ . This inequality, in conjunction with the boundedness of $\widehat{g}(i\omega)$ and $|\widehat{f}^{(j)}(i\omega)|\leq B$ , $0\leq j\leq p$ , for all $\omega$ , implies that for $l=0,\ldots,j$ and $j=0,\ldots,2r,$

[TABLE]

Thus, if $f\in{\mathscr{M}}^{\prime}_{p}(B)$ with $p\geq 2r$ then

[TABLE]

and therefore

[TABLE]

Taking into account that

[TABLE]

we have

[TABLE]

Combining the above bounds on the bias and variance, we obtain that for any non-negative integer $r$ satisfying $2r\leq p$ one has

[TABLE]

where $C_{1}$ and $C_{2}$ may depend on $\alpha$ and $p$ only.

To complete the proof of statement (a) it suffices to note that under Assumption 3, $|H_{N,r}(\omega;x_{0})|\leq c_{17}$ provided that $r\geq\nu$. Therefore, in view of Assumption 2 and condition (K), the last term on the right-hand side of (.12) is bounded above by $c_{18}Bh^{-2\gamma-1}$. The announced result then follows by substituting the chosen values of $h$ and $N$ into inequality (.12).

Proof of statement (b)

The proof uses pointwise bounds derived in the proof of statement (a).

To derive the upper bound on the integrated squared bias, consider equality (.3). First we note the standard bound (Tsybakov, 2009, Section 1.2.3):

[TABLE]

Moreover, if $2p-1>0$ then by (.4)

[TABLE]

The same upper bound holds for the integral of the squared bias of the estimator over $x_{0}\in(-\infty,0]$ . Thus,

[TABLE]

Now consider the variance term. We use the variance decomposition given in (.7). It follows from (.9) that

[TABLE]

Furthermore, we note that for $r>1/2$

[TABLE]

and the same inequality holds for the integral $\int_{-\infty}^{0}|H_{N,r}(\omega;x_{0})|^{2}\mathrm{d}x_{0}$ . By Assumption 3, if $r\geq\nu+1/2$ then the sum on the right hand side of (.14) is uniformly bounded in $N$ . Therefore using (.10) and applying the Cauchy–Schwarz inequality we obtain

[TABLE]

The term originating from $S_{3}(x_{0})$ is bounded similarly. The bound (.14) together with (.11) yields

[TABLE]

Combining the obtained inequalities with the bound on the bias and using the same reasoning as in the proof of Theorem 1 we conclude that for $r\geq\nu+1/2$ we have

[TABLE]

Substitution of the values $h=h_{*}$ and $N$ completes the proof of statement (b).

Proof of Theorem 2

In the subsequent proof we keep track of all constants depending on the parameters of the classes ${\mathscr{F}}_{\alpha,p}(A,B)$ and ${\mathscr{G}}_{\alpha,p}(A,B)$. In what follows, $c_{1},c_{2},\ldots$ denote positive constants that depend on $m$ and $\alpha$ only.

Proof of statement (a)

We provide the proof of statement (a) for the estimator corresponding to $x_{0}\geq 0$ only. The proof for $x_{0}<0$ is identical in every detail. The estimator is given by the formula

[TABLE]

The variance of this estimator is bounded as follows:

[TABLE]

Assume that $h<\theta$ (this holds for all sufficiently large $n$), and denote

[TABLE]

Since ${\rm supp}(K)\subseteq[-1,1]$ and $h<\theta$ , the intervals $I_{j}(x_{0})$ and $I_{l}(x_{0})$ are disjoint for $j\neq l$ . Therefore

[TABLE]

where we have used that $K^{(m)}(\cdot)$ is bounded above by a constant. Now we bound the sum on the right hand side of (.15).

First, note that $g$ is supported on $[-m\theta,m\theta]$ , and $g(x)\leq c_{2}/\theta$ . Therefore

[TABLE]

Furthermore, writing $\xi_{j}:=x_{0}+(2j+m)\theta$ for brevity we have

[TABLE]

and since $m\theta>h$ we obtain

[TABLE]

Note that $C_{j,m}\leq\big[(j+m-1)/(m-1)\big]^{m-1}e^{m-1}\leq c_{4}j^{m-1}$ for $j\geq 1$. Taking into account that $f\in{\mathscr{M}}_{2m-2}(B)$, we have

[TABLE]

where the first line follows from the elementary inequality $\int_{a}^{b}f(t)\mathrm{d}t\leq\int_{a}^{b}(t/a)^{k}f(t)\mathrm{d}t$, valid for $0<a\leq b$, $k\geq 0$ and non-negative $f$, while the second line follows from the fact that the integrals under the sum are taken over disjoint intervals because $h<\theta$. A similar upper bound holds for the sum corresponding to the second integral on the right-hand side of (.16). The expression corresponding to the third integral is bounded as follows:

[TABLE]

where the last inequality holds because

[TABLE]

Therefore

[TABLE]
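The estimate $C_{j,m}\leq c_{4}j^{m-1}$ used above rests on the standard bound $\binom{n}{k}\leq(en/k)^{k}$. The Python check below assumes, as the form of that estimate suggests, that $C_{j,m}=\binom{j+m-1}{m-1}$; this identification is a hypothesis made for illustration only. It also makes the constant explicit as $c_{4}=\big(em/(m-1)\big)^{m-1}$, which follows from $j+m-1\leq mj$ for $j\geq 1$.

```python
from math import comb, e


def binomial_bound_holds(m, j_max=200):
    """Check C(j+m-1, m-1) <= [e (j+m-1)/(m-1)]^(m-1) <= c4 j^(m-1) for 1 <= j <= j_max.

    Hypothetical identification: C_{j,m} = C(j+m-1, m-1). Requires m >= 2.
    c4 = (e m / (m-1))^(m-1) comes from j + m - 1 <= m j for j >= 1.
    """
    c4 = (e * m / (m - 1)) ** (m - 1)
    for j in range(1, j_max + 1):
        exact = comb(j + m - 1, m - 1)
        standard = ((j + m - 1) / (m - 1)) ** (m - 1) * e ** (m - 1)
        # Tiny relative slack: the last two quantities coincide at j = 1.
        if not (exact <= standard <= c4 * j ** (m - 1) * (1 + 1e-12)):
            return False
    return True
```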

Arguing as in the part of the proof of Theorem 1 that precedes formula (.4), we have

[TABLE]

where

[TABLE]

Letting $\xi_{j}=x_{0}+2\theta(N+1)j$ and taking into account that $f\in{\mathscr{M}}_{p}(B)$ , and $\theta>h$ we have for any $j=1,\ldots,m$

[TABLE]

These inequalities yield $|T_{N}(f;x_{0})|\leq c_{11}Bh^{-1}(\theta N)^{-p}$, and since $f\in{\mathscr{H}}_{\alpha}(A)$,

[TABLE]

Then statement (a) follows immediately from the established bounds on the bias and variance by substitution of the chosen values of $h$ and $N$ .

Proof of statement (b)

We start by bounding the variance term. The derivation is based on formula (.15), which we integrate over $x_{0}\in[0,\infty)$. In view of (.16) and the subsequent formulas,

[TABLE]

Since $f\in{\mathscr{M}}_{p}(B)$ we have

[TABLE]

Hence, taking into account that $p>2m-1$ and integrating with respect to $x_{0}\in[0,\infty)$, we obtain

[TABLE]

The same reasoning shows that the integrals of $J_{2}(x_{0})$ and $J_{3}(x_{0})$ obey the same bound: $\int_{0}^{\infty}J_{i}(x_{0})\mathrm{d}x_{0}\leq c_{4}B\theta^{-p}$, $i=2,3$. Combining these bounds with (.15) we obtain

[TABLE]

The integral over the negative semi-axis is bounded similarly.

To bound the integrated squared bias we note that (.17)–(.18) and $f\in{\mathscr{M}}_{p}(B)$ imply

[TABLE]

and the same estimate holds for the integral of $T_{N}(f;x_{0})$ over the negative semi-axis. In view of (.13) we obtain

[TABLE]

Then the statement follows from the established upper bounds on the integrated variance and the integrated squared bias.

Proof of Theorem 4

In the subsequent proof, $c_{1},c_{2},\ldots$ stand for positive constants that do not depend on $n$. The proof of (7.1) is based on the standard reduction to a two-point testing problem, while the proof of (7.2) uses a reduction to the problem of testing multiple hypotheses [see (Tsybakov, 2009, Chapter 2)].

Proof of statement (a)

Let $r>1/2$ be a real number and consider the probability density

[TABLE]

where $C_{r}$ is a normalizing constant. Clearly, $f_{0}\in{\mathscr{M}}_{p}(B)$ for $p<2r-1$ and sufficiently large constant $B$ depending on $p$ . In addition, $f_{0}$ is infinitely differentiable and belongs to ${\mathscr{H}}_{\alpha}(A)$ for any $\alpha$ and large enough $A$ .
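The precise form of $f_{0}$ is given in (.19). As a hypothetical example with the same qualitative features, consider $f_{0}(x)=C_{r}(1+x^{2})^{-r}$, which is infinitely differentiable and has tails of order $|x|^{-2r}$. For this density $\int|x|^{p}f_{0}(x)\mathrm{d}x<\infty$ exactly when $p<2r-1$, in line with the claim $f_{0}\in{\mathscr{M}}_{p}(B)$ for $p<2r-1$ (assuming, as the proofs above suggest, that ${\mathscr{M}}_{p}(B)$ is a moment-type condition). The Python sketch evaluates $C_{r}$ and the absolute moments in closed form via $\int_{\mathbb{R}}|x|^{p}(1+x^{2})^{-r}\mathrm{d}x=\Gamma\big(\tfrac{p+1}{2}\big)\Gamma\big(r-\tfrac{p+1}{2}\big)/\Gamma(r)$.

```python
import math


def normalizing_constant(r):
    """C_r such that C_r (1 + x^2)^(-r) integrates to one (requires r > 1/2)."""
    return math.gamma(r) / (math.sqrt(math.pi) * math.gamma(r - 0.5))


def absolute_moment(r, p):
    """E|X|^p for the hypothetical density C_r (1 + x^2)^(-r); infinite unless p < 2r - 1."""
    if p >= 2 * r - 1:
        return math.inf
    a = (p + 1) / 2
    # int_R |x|^p (1 + x^2)^(-r) dx = Gamma(a) Gamma(r - a) / Gamma(r)
    return normalizing_constant(r) * math.gamma(a) * math.gamma(r - a) / math.gamma(r)
```

For example, with $r=2$ the second moment equals $1$, while the third absolute moment is infinite because $3=2r-1$.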

Pick a function $\widehat{\psi}_{0}$ with the following properties:

(i)

$\widehat{\psi}_{0}(\omega)=\widehat{\psi}_{0}(-\omega)$ , $\forall\omega$ ;

(ii)

$\widehat{\psi}_{0}(\omega)=1$ for $\omega\in[-1+\delta,1-\delta]$ with some fixed $\delta>0$ , and $\widehat{\psi}_{0}(\omega)=0$ , $|\omega|>1$ ;

(iii)

$\widehat{\psi}_{0}$ increases monotonically from $0$ to $1$ on $[-1,-1+\delta]$. In addition, $\widehat{\psi}_{0}$ is an infinitely differentiable function on the real line.
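A function with properties (i)–(iii) can be built from the standard $C^{\infty}$ step $s(t)=e^{-1/t}/\big(e^{-1/t}+e^{-1/(1-t)}\big)$ on $(0,1)$, extended by $0$ for $t\leq 0$ and by $1$ for $t\geq 1$, by setting $\widehat{\psi}_{0}(\omega)=s\big((1-|\omega|)/\delta\big)$. This is one possible construction, not necessarily the one the authors have in mind; the value $\delta=0.25$ in the Python sketch is an arbitrary choice.

```python
import math


def smooth_step(t):
    """C-infinity step function: 0 for t <= 0, 1 for t >= 1, increasing in between."""
    if t <= 0.0:
        return 0.0
    if t >= 1.0:
        return 1.0
    a = math.exp(-1.0 / t)
    b = math.exp(-1.0 / (1.0 - t))
    return a / (a + b)


def psi0_hat(omega, delta=0.25):
    """Even plateau: 1 on [-1+delta, 1-delta], 0 for |omega| >= 1 (needs 0 < delta < 1).

    Depends on omega only through |omega| (property (i)); it is constant near
    omega = 0, so the kink of |omega| there does not affect smoothness.
    """
    return smooth_step((1.0 - abs(omega)) / delta)
```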

Let $h>0$ be a small real number such that $h<\pi/\theta$, and let $N\geq 1$ be an integer. Define

[TABLE]

Note that $\widehat{\psi}_{h}$ is even and supported on the union of the disjoint sets $\cup_{k=N+1}^{2N}A_{k}$, where $A_{k}:=[-\pi k/\theta-h,-\pi k/\theta+h]\cup[\pi k/\theta-h,\pi k/\theta+h]$. The function $\psi_{h}$ is given by the inverse Fourier transform:

[TABLE]

For real numbers $M>0$ and $c_{0}>0$ define

[TABLE]

We show that, under an appropriate choice of the constants $M$, $h$ and $N$, the function $f_{1}$ is a probability density belonging to ${\mathscr{H}}_{\alpha}(A)\cap{\mathscr{M}}_{p}(B)$ with $p<2r-1$.

First, we note that $\int_{-\infty}^{\infty}\psi_{h}(x)\mathrm{d}x=0$ because $\widehat{\psi}_{h}(0)=0$ . Second, since $\widehat{\psi}_{0}$ is infinitely differentiable, $\psi_{0}$ is rapidly decreasing, and $|\psi_{0}(x)|\leq c_{1}|x|^{-2r}$ for some constant $c_{1}=c_{1}(r)$ depending on $r$ . Therefore $|\psi_{h}(x)|\leq c_{2}|x|^{-2r}h^{-2r+1}N$ for all $x$ , and if we set

[TABLE]

then by a suitable choice of the constant $c_{0}$ we can ensure that $c_{0}M|\psi_{h}(x)|\leq f_{0}(x)$ for all $x$. Thus, $f_{1}$ is a probability density provided (.21) holds. This also shows that $f_{1}\in{\mathscr{M}}_{p}(B)$ for $p<2r-1$.

For simplicity, assume that $\alpha$ is an integer; then

[TABLE]

This implies that $f_{1}\in{\mathscr{H}}_{\alpha}(A)$ if $MhN^{\alpha+1}=h^{2r}N^{\alpha}\leq A$ .

Without loss of generality we consider the problem of estimating the value $f(0)$ . Note that we have

[TABLE]

Now we bound from above the $\chi^{2}$-divergence between the densities of observations $f_{Y,0}$ and $f_{Y,1}$ corresponding to the hypotheses $f=f_{0}$ and $f=f_{1}$. We have

[TABLE]

Since $g$ is supported on $[-m\theta,m\theta]$ we have

[TABLE]

and

[TABLE]

Therefore

[TABLE]

We now bound the integrals $I_{1}$ and $I_{2}$ on the right-hand side. First, assume that $r$ is an integer. Then by Parseval's identity

[TABLE]

Furthermore,

[TABLE]

We have

[TABLE]

By the Faà di Bruno formula, for $j\geq 1$

[TABLE]

where $B_{j,l}(\cdot)$ are the Bell polynomials, and $\widehat{g}_{0}(\omega):=(\sin\theta\omega)/(\theta\omega)$ . First, we note that $|\widehat{g}_{0}^{(k)}(\omega)|\leq c_{10}(k)\min\{|\omega|^{-1},1\}$ for any $k$ . Using this fact and taking into account that $B_{j,l}$ is a homogeneous polynomial in $j$ variables of degree $l$ we have

[TABLE]

Taking into account this inequality, (.23), (.22), the fact that $\widehat{\psi}_{0}$ is supported on $[-1,1]$ , and recalling that $A_{k}:=[-\pi k/\theta-h,-\pi k/\theta+h]\cup[\pi k/\theta-h,\pi k/\theta+h]$ are disjoint for all $k=N+1,\ldots,2N$ , we obtain

[TABLE]

Combining these bounds we obtain that for integer $r\geq 1$

[TABLE]

The same upper bound holds for non-integer $r$. Indeed, it follows from the above bounds on $I_{1}$ and $I_{2}$ that for integer $k\geq 1$

[TABLE]

Then for real $0\leq r\leq k$ by the interpolation inequality for the Sobolev spaces [see, e.g., (Aubin, 2000, Proposition 6.3.3)] we have

[TABLE]

which yields (.25) for real $r$ . The choice

[TABLE]

ensures that $f_{0}$ and $f_{1}$ cannot be reliably distinguished from the observations, which leads to the lower bound

[TABLE]
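The passage from a bound on the $\chi^{2}$-divergence for a single observation to indistinguishability from $n$ observations relies on the standard tensorization identity $1+\chi^{2}(P^{\otimes n},Q^{\otimes n})=\big(1+\chi^{2}(P,Q)\big)^{n}$; in particular, the $\chi^{2}$-divergence of the $n$-sample stays bounded as long as $n\chi^{2}(P,Q)$ does. The Python sketch below verifies the identity in exact rational arithmetic for two small discrete distributions, which are illustrative stand-ins for $f_{Y,1}$ and $f_{Y,0}$ and are not taken from the proof.

```python
from fractions import Fraction
from itertools import product


def chi_square(p, q):
    """chi^2(P, Q) = sum_i (p_i - q_i)^2 / q_i for discrete P, Q with all q_i > 0."""
    return sum((pi - qi) ** 2 / qi for pi, qi in zip(p, q))


def product_measure(p, n):
    """Probabilities of all outcomes of n i.i.d. draws from p, in a fixed order."""
    probs = []
    for outcome in product(range(len(p)), repeat=n):
        prob = Fraction(1)
        for i in outcome:
            prob *= p[i]
        probs.append(prob)
    return probs


q = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]   # stand-in for f_{Y,0}
p = [Fraction(5, 12), Fraction(1, 3), Fraction(1, 4)]  # stand-in for f_{Y,1}
n = 3
lhs = chi_square(product_measure(p, n), product_measure(q, n))
rhs = (1 + chi_square(p, q)) ** n - 1
```

Here $\chi^{2}(P,Q)=1/18$, and `lhs` equals `rhs` exactly.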

The rate of convergence obtained in (.26) dominates the standard rate if

[TABLE]

Therefore if $p<2r-1\leq 2m-2$ then the standard rate of convergence is not achievable. This completes the proof of (7.1).
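With the Fourier normalization $\|u\|_{H^{s}}^{2}=\int(1+\omega^{2})^{s}|\widehat{u}(\omega)|^{2}\mathrm{d}\omega$ (an assumption; the norm behind (.25) may be normalized differently), the interpolation inequality $\|u\|_{H^{r}}\leq\|u\|_{L_{2}}^{1-r/k}\|u\|_{H^{k}}^{r/k}$, $0\leq r\leq k$, invoked above for non-integer $r$, is Hölder's inequality with exponents $k/(k-r)$ and $k/r$ applied to $|\widehat{u}|^{2}$. The Python sketch below checks a Riemann-sum version of this inequality on randomly generated non-negative spectra; the grid and the random test data are arbitrary illustrative choices.

```python
import random


def weighted_sobolev_sq(omegas, spectrum, s, d_omega):
    """Riemann sum for the integral of (1 + w^2)^s |u_hat(w)|^2 dw."""
    return d_omega * sum((1 + w * w) ** s * a for w, a in zip(omegas, spectrum))


def interpolation_holds(r, k, trials=200, seed=0):
    """Check ||u||_{H^r}^2 <= (||u||_{L2}^2)^(1 - r/k) (||u||_{H^k}^2)^(r/k) on random spectra."""
    rng = random.Random(seed)
    d_omega = 0.05
    omegas = [-10 + d_omega * i for i in range(401)]
    theta = r / k
    for _ in range(trials):
        spectrum = [rng.random() ** 3 for _ in omegas]  # stands in for |u_hat(w)|^2
        lhs = weighted_sobolev_sq(omegas, spectrum, r, d_omega)
        rhs = (weighted_sobolev_sq(omegas, spectrum, 0, d_omega) ** (1 - theta)
               * weighted_sobolev_sq(omegas, spectrum, k, d_omega) ** theta)
        if lhs > rhs * (1 + 1e-12):  # tiny slack for floating-point rounding
            return False
    return True
```

Equality holds when the spectrum is concentrated at a single frequency.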

Proof of statement (b)

Let $r>1/2$ and let $f_{0}$ be defined by (.19). Clearly, $f_{0}$ satisfies $f_{0}\in{\mathscr{S}}_{\alpha}(A)\cap{\mathscr{M}}_{p}(B)$ for $p<2r-1$ and large enough $A$ and $B$.

Let $\widehat{\psi}_{0}$ be the function satisfying conditions (i)–(iii) in the proof of statement (a). For positive integer $k$ define

[TABLE]

Note that $\widehat{\psi}_{h,k}$ is even and supported on $[-\pi k/\theta-h,-\pi k/\theta+h]\cup[\pi k/\theta-h,\pi k/\theta+h]$, and that $\widehat{\psi}_{h,k}$ and $\widehat{\psi}_{h,k^{\prime}}$, $k\neq k^{\prime}$, have disjoint supports because $h<\pi/\theta$. Moreover,

[TABLE]

Let $N\geq 1$ be an integer. Define the following family of functions

[TABLE]

where $M>0$ and $c_{0}>0$ are real numbers.

First we show that $f_{w}(x)$, $w\in\{0,1\}^{N}$, is a probability density from the class ${\mathscr{S}}_{\alpha}(A)\cap{\mathscr{M}}_{p}(B)$, provided that the constants $M$, $h$ and $N$ are chosen appropriately. We have $\int_{-\infty}^{\infty}\varphi_{h,w}(x)\mathrm{d}x=0$ because $\widehat{\psi}_{h,k}(0)=0$ for all $k=N+1,\ldots,2N$. Moreover, as in the proof of statement (a), since $\psi_{0}$ is rapidly decreasing,

[TABLE]

Therefore, if we let

[TABLE]

then by an appropriate choice of constant $c_{0}$ we can ensure that $c_{0}M|\varphi_{h,w}(x)|\leq f_{0}(x)$ for all $x$ . Thus, $f_{w}(x)$ is indeed a probability density for any $w\in\{0,1\}^{N}$ . This also shows that $f_{w}\in{\mathscr{M}}_{p}(B)$ with $p<2r-1$ . Furthermore,

[TABLE]

where the last expression follows from (.27). Therefore, if

[TABLE]

then $f_{w}\in{\mathscr{S}}_{\alpha}(A)$ for any $w\in\{0,1\}^{N}$ .

By the Varshamov–Gilbert lemma [see, e.g., (Tsybakov, 2009, Lemma 2.9)] there exists a subset $W=\{w^{(0)},\ldots,w^{(J)}\}$ of $\{0,1\}^{N}$ such that $w^{(0)}=(0,\ldots,0)$, $J\geq 2^{N/8}$, and any two distinct vectors $w,w^{\prime}\in W$ differ in at least $N/8$ entries (i.e., the Hamming distance between $w$ and $w^{\prime}$ is at least $N/8$). In what follows we consider the family of functions $\{f_{w},w\in W\}$. Clearly, for any $w,w^{\prime}\in W$ we have

[TABLE]
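The Varshamov–Gilbert lemma is an existence statement. For small $N$ its conclusion can be observed with the greedy construction below: scan $\{0,1\}^{N}$, with vectors encoded as integers, and keep every vector whose Hamming distance to all previously kept vectors is at least $\lceil N/8\rceil$, stopping once $w^{(0)}$ and $\lceil 2^{N/8}\rceil$ further vectors have been found. This Python sketch is an illustration only and does not replace the proof of the lemma.

```python
import math


def hamming(u, v):
    """Number of differing entries of two 0/1 vectors encoded as integers."""
    return bin(u ^ v).count("1")


def greedy_code(N, min_dist, size):
    """Greedily pick up to `size` vectors of {0,1}^N, starting with the zero vector,
    with pairwise Hamming distance at least min_dist."""
    words = [0]
    for v in range(1, 2 ** N):
        if len(words) == size:
            break
        if all(hamming(v, w) >= min_dist for w in words):
            words.append(v)
    return words


N = 24
min_dist = math.ceil(N / 8)           # at least N/8 differing entries
size = 1 + math.ceil(2 ** (N / 8))    # w^(0) plus J >= 2^(N/8) further vectors
W = greedy_code(N, min_dist, size)
```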

Next, we bound the $\chi^{2}$-divergence between the distributions of observations corresponding to $f_{0}$ and $f_{w}$, $w\in W\setminus\{w^{(0)}\}$. As in the proof of statement (a), we have

[TABLE]

Furthermore,

[TABLE]

An upper bound on $I_{2}$ is derived as in the proof of statement (a). In particular, taking into account that $\widehat{\varphi}_{h,w}(\omega)$ is a sum of functions with disjoint supports, and repeating the steps from (.22) to (.24) we obtain

[TABLE]

Combining these bounds on $I_{1}$ and $I_{2}$ we obtain

[TABLE]

To complete the proof we use Theorem 2.7 from Tsybakov (2009); see also Lemma 4 from Goldenshluger & Lepski (2014). In particular, this result implies that if $N$ and $h$ satisfy

[TABLE]

then for large $n$ one has ${\cal R}^{*}_{n,\Delta_{2}}[{\mathscr{G}}_{\alpha,p}(A,B)]\geq c_{12}\rho$ , where $p<2r-1$ , and $\rho$ is defined in (.29). Now we choose $h$ and $N$ so that (.30) and (.28) are satisfied. To this end define

[TABLE]

and set

[TABLE]

With this choice (.28) and (.30) hold, and

[TABLE]

where $\beta:=2\big[(2m+2)(\mu/\varkappa)-1\big]/(2\alpha-1)$. The rate of convergence obtained in (.29) dominates the standard rate of convergence if

[TABLE]

which is equivalent to $2r\leq 2m$ . Therefore if $p<2r-1\leq 2m-1$ then the standard rate of convergence is not attained. This completes the proof.

Acknowledgement

The authors are grateful to Taeho Kim for careful reading and useful remarks. This article was prepared within the framework of the HSE University Basic Research Program and funded by the Russian Academic Excellence Project '5-100'. AG is supported by the ISF Research Grant no. 361/15.

Bibliography

Anderssen, R. S. (1980). On the use of linear functionals for Abel-type integral equations. In: The Application and Numerical Solution of Integral Equations (R. S. Anderssen, F. De Hoog and M. Lucas, eds.), 195–221. Sijthoff and Noordhoff International Publishers.
Aubin, J.-P. (2000). Applied Functional Analysis. Second edition. John Wiley & Sons, New York.
Butucea, C. and Tsybakov, A. B. (2008a). Sharp optimality in density deconvolution with dominating bias. I. Theory Probab. Appl. 52, 24–39.
Butucea, C. and Tsybakov, A. B. (2008b). Sharp optimality in density deconvolution with dominating bias. II. Theory Probab. Appl. 52, 237–249.
Carroll, R. J. and Hall, P. (1988). Optimal rates of convergence for deconvolving a density. J. Amer. Statist. Assoc. 83, 1184–1186.
Comte, F. and Lacour, C. (2013). Anisotropic adaptive kernel deconvolution. Ann. Inst. Henri Poincaré Probab. Stat. 49, 569–609.
Delaigle, A. and Meister, A. (2011). Nonparametric function estimation under Fourier-oscillating noise. Statist. Sinica 21, 1065–1092.
