On asymptotically efficient maximum likelihood estimation of linear functionals in Laplace measurement error models

Catia Scricciolo

arXiv:1902.00653 · math.ST · February 5, 2019


TL;DR

This paper investigates the conditions under which maximum likelihood estimation can efficiently estimate linear functionals in Laplace measurement error models, revealing fundamental limitations and the absence of regular estimators at parametric rates.

Contribution

It characterizes when the MLE achieves asymptotic efficiency for linear functionals in Laplace deconvolution and demonstrates the fundamental limitations in achieving parametric rates.

Findings

01. MLE can be asymptotically efficient for some functionals

02. No regular estimator can achieve parametric rate in general

03. Estimation often requires slower rates with non-Gaussian limits


Equations

$$X=Y+Z,$$

$$k(z)=\frac{1}{2}e^{-|z|},\quad z\in\mathbb{R}.$$

$$p_{0}(x)\equiv p_{F_{0}}(x)=\int_{\mathscr{Y}}k(x-y)\,\mathrm{d}F_{0}(y)=\int_{\mathscr{Y}}\frac{1}{2}e^{-|x-y|}\,\mathrm{d}F_{0}(y),\quad x\in\mathbb{R}.$$

$$X_{i}=Y_{i}+Z_{i},\quad i=1,\,\ldots,\,n,$$

$$f_{E}(e)=\frac{1}{2s}\exp\,(-|e-\mu|/s),\quad e\in\mathbb{R}.$$

$$M=D+E,$$

$$f_{D}(d)=\frac{1}{\tau}\exp\,(-d/\tau),\quad d>0,$$

$$\mathscr{P}:=\{\text{all p.m.'s $F$ on $(\mathscr{Y},\,\mathscr{B}(\mathscr{Y}))$}\}$$

$$F\mapsto\psi(F):=\int_{\mathscr{Y}}a(y)\,\mathrm{d}F(y)$$

$$\psi(\hat{F}_{n}):=\psi(F)\big|_{F=\hat{F}_{n}}.$$

$$\hat{F}_{n}\in\arg\max_{F\in\mathscr{P}}\frac{1}{n}\sum_{i=1}^{n}\log p_{F}(X_{i}),$$

$$\hat{F}_{n}\in\arg\max_{F\in\mathscr{P}}\mathbb{P}_{n}\log p_{F},$$

$$p_{F}(\cdot):=\int_{\mathscr{Y}}k(\cdot-y)\,\mathrm{d}F(y)$$

$$\mathbb{P}_{n}:=\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}}$$

$$p_{0}(x)=\int_{\mathscr{Y}}k(x-y)f_{0}(y)\,\mathrm{d}y,\quad x\in\mathbb{R},$$

$$\hat{F}_{n}([X_{1:n},\,X_{n:n}])=1,$$

$$\hat{\theta}_{n}=\begin{cases}X_{m+1:n}, & \text{for } n=2m+1,\ m\in\mathbb{N},\\ \frac{1}{2}(X_{m:n}+X_{m+1:n}), & \text{for } n=2m.\end{cases}$$

$$\sqrt{n}\big(\hat{\theta}_{n}-\theta\big)\xrightarrow{\mathscr{L}}\mathscr{N}\big(0,\,1\big),$$

$$\sqrt{n}\big(\psi(\hat{F}_{n})-\psi(F_{0})\big)$$

$$p_{0}(x)\equiv p_{F_{0}}(x)=\int_{\mathscr{Y}}e^{-(x-y)}1\{y\leq x\}\,\mathrm{d}F_{0}(y),\quad x\in\mathscr{X},$$

$$\sup_{y\in\mathscr{Y}}\bigg|\frac{\mathrm{d}\big(\dot{a}(y)e^{-y}\big)}{\mathrm{d}F_{0}(y)}\bigg|\leq c_{0}.$$

$$\sqrt{n}\big(\psi(\hat{F}_{n})-\psi(F_{0})\big)=\sqrt{n}\,\mathbb{P}_{n}b_{F_{0}}+o_{\mathbf{P}}(1)\,\,\bigg(\xrightarrow{\mathscr{L}}\mathscr{N}\big(0,\,\|b_{F_{0}}\|_{2,P_{0}}^{2}\big)\bigg),$$

$$x\mapsto b_{F_{0}}(x):=a(x)-\dot{a}(x)-\psi(F_{0}),$$

$$\|b_{F_{0}}\|_{2,P_{0}}^{2}:=\int b_{F_{0}}^{2}\,\mathrm{d}P_{0}$$

$$I_{\theta}^{-1}\dot{\ell}_{\theta},$$

$$\mathrm{E}\big[(I_{\theta}^{-1}\dot{\ell}_{\theta})(I_{\theta}^{-1}\dot{\ell}_{\theta})^{T}\big]=I_{\theta}^{-1}\,\mathrm{E}\big[\dot{\ell}_{\theta}\dot{\ell}_{\theta}^{T}\big]\,I_{\theta}^{-1}=I_{\theta}^{-1}I_{\theta}I_{\theta}^{-1}=I_{\theta}^{-1}.$$

$$\frac{\mathrm{d}}{\mathrm{d}t}\mathbb{P}_{n}\log p_{\hat{F}_{n,t}}\bigg|_{t=0}=0.$$

$$A_{F}h_{F}(x):=\int_{\mathscr{Y}}h_{F}(y)\,\frac{e^{-(x-y)}}{p_{F}(x)}\,1\{y\leq x\}\,\mathrm{d}F(y).$$

$$\tilde{\psi}_{n}:=\frac{1}{n}\sum_{i=1}^{n}\big[a(X_{i})-\dot{a}(X_{i})\big]=\mathbb{P}_{n}(a-\dot{a})$$

$$\begin{split}\sqrt{n}\big(\psi(\hat{F}_{n})-\psi(F_{0})\big)&=\sqrt{n}\,\mathbb{P}_{n}b_{F_{0}}+o_{\mathbf{P}}(1)\\ &=\sqrt{n}\big(\tilde{\psi}_{n}-\psi(F_{0})\big)+o_{\mathbf{P}}(1)\,\,\bigg(\xrightarrow{\mathscr{L}}\mathscr{N}\big(0,\,\|b_{F_{0}}\|_{2,P_{0}}^{2}\big)\bigg).\end{split}$$


Taxonomy

Topics: Statistical Methods and Inference · Bayesian Methods and Mixture Models · Statistical Methods and Bayesian Inference

Full text

On asymptotically efficient maximum likelihood estimation of linear functionals in Laplace measurement error models

Catia Scricciolo

Dipartimento di Scienze Economiche, Università degli Studi di Verona, Polo Universitario Santa Marta, Via Cantarane 24, I-37129 Verona (VR), ITALY

Abstract

Maximum likelihood estimation of linear functionals in the inverse problem of deconvolution is considered. Given observations of a random sample from a distribution $P_{0}\equiv P_{F_{0}}$ indexed by a (potentially infinite-dimensional) parameter $F_{0}$ , which is the distribution of the latent variable in a standard additive Laplace measurement error model, one wants to estimate a linear functional of $F_{0}$ . Asymptotically efficient maximum likelihood estimation (MLE) of integral linear functionals of the mixing distribution $F_{0}$ in a convolution model with the Laplace kernel density is investigated. Situations are distinguished in which the functional of interest can be consistently estimated at $n^{-1/2}$ -rate by the plug-in MLE, which is asymptotically normal and efficient, in the sense of achieving the variance lower bound, from those in which no integral linear functional can be estimated at parametric rate, which precludes any possibility for asymptotic efficiency. The $\sqrt{n}$ -convergence of the MLE, valid in the case of a degenerate mixing distribution at a single location point, fails in general, as does asymptotic normality. It is shown that there exists no regular estimator sequence for integral linear functionals of the mixing distribution that, when recentered about the estimand and $\sqrt{n}$ -rescaled, is asymptotically efficient, viz., has Gaussian limit distribution with minimum variance. One can thus only expect estimation with some slower rate and, often, with a non-Gaussian limit distribution.

Keywords: Asymptotic efficiency · Asymptotic normality · Laplace convolution model · Linear functionals · Non-parametric maximum likelihood estimation

MSC: 62G05 · 62G20 · 62G30

1 Introduction

The problem of asymptotically efficient estimation of integral linear functionals of the distribution of the latent variable in a standard additive Laplace measurement error model is considered. The focus is on establishing whether asymptotic normality and efficiency hold for the estimator obtained by plugging into the functional of interest the NPMLE of the mixing distribution in a convolution model with the Laplace kernel density. We study the behaviour of the plug-in NPMLE to answer the question of whether there exist integral linear functionals of the mixing distribution that can be consistently estimated by the maximum likelihood method at $n^{-1/2}$ -rate, the recentered and $\sqrt{n}$ -rescaled version of the plug-in NPMLE being asymptotically normal with zero mean and minimum variance. Situations are distinguished in which the plug-in NPMLE is consistent at parametric rate and asymptotically efficient, albeit the mixing distribution itself can typically be estimated only at slower rates, from those in which there exists no regular sequence of estimators that can be asymptotically efficient. The model is described hereafter and the problem formally stated.

Model description

Let $X$ be a real-valued random variable (r.v.) with distribution $P_{0}$ defined, for every Borel set $B$ on the real line, by the mapping $B\mapsto P_{0}(B):=\mathrm{P}(X\in B)$ . Suppose that $P_{0}$ is dominated by Lebesgue measure $\lambda$ on $\mathbb{R}$ , with probability density function (p.d.f.) $p_{0}:=\mathrm{d}P_{0}/\mathrm{d}\lambda$ . Let $X$ satisfy the relationship

$$X=Y+Z,\qquad (1)$$

where $Y$ and $Z$ are (stochastically) independent, unobservable random variables such that $Y$ has unknown cumulative distribution function (c.d.f.) $F_{0}$ and $Z$ has the standard classical Laplace$^{1}$ or double exponential$^{2}$ distribution with scale parameter $s=1$, in symbols, $Z\sim\mathrm{Laplace}\,(0,\,1)$, whose density $k$ has expression

$$k(z)=\frac{1}{2}e^{-|z|},\quad z\in\mathbb{R}.\qquad (2)$$

$^{1}$ It is also known as the first law of Laplace to distinguish it from the second law of Laplace, as the normal distribution is sometimes called. It was named after Pierre-Simon Laplace (1749–1827) who, in 1774 (Laplace 1774), obtained $e^{-|z-\theta|}/2$, for $z,\,\theta\in\mathbb{R}$, as the density of the distribution whose likelihood is maximized when the location parameter $\theta$ is equal to the sample median.

$^{2}$ It is so called because it is formed by reflecting the exponential distribution around its mean.

The density $p_{0}$ is therefore the convolution of $F_{0}$ and $k$ or a location mixture of Laplace densities with mixing distribution $F_{0}$ supported on a subset $\mathscr{Y}\subseteq\mathbb{R}$ , where $\mathscr{Y}$ stands for a support of $F_{0}$ , see, e.g., Billingsley (1995), p. 23,

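$$p_{0}(x)\equiv p_{F_{0}}(x)=\int_{\mathscr{Y}}k(x-y)\,\mathrm{d}F_{0}(y)=\int_{\mathscr{Y}}\frac{1}{2}e^{-|x-y|}\,\mathrm{d}F_{0}(y),\quad x\in\mathbb{R}.$$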
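As an illustration of the location-mixture form of $p_{0}$, the following minimal sketch evaluates $p_{F}(x)=\sum_{j}w_{j}\,k(x-y_{j})$ for a discrete mixing distribution and checks numerically that it integrates to one. The two-point mixing distribution, the grid and the function names are assumptions made only for this example and do not come from the article.

```python
import math

def laplace_kernel(z):
    """Standard Laplace density k(z) = exp(-|z|) / 2."""
    return 0.5 * math.exp(-abs(z))

def mixture_density(x, atoms, weights):
    """p_F(x) = sum_j w_j * k(x - y_j) for a discrete mixing distribution F."""
    return sum(w * laplace_kernel(x - y) for y, w in zip(atoms, weights))

# Hypothetical two-point mixing distribution, chosen only for illustration.
atoms = [-1.0, 2.0]
weights = [0.3, 0.7]

# Crude Riemann sum on [-40, 40]: the mixture density should integrate to about 1.
n_steps = 80_000
step = 80.0 / n_steps
grid = [-40.0 + 80.0 * i / n_steps for i in range(n_steps + 1)]
total_mass = step * sum(mixture_density(x, atoms, weights) for x in grid)
```

A general (non-discrete) $F_{0}$ would require replacing the finite sum by an integral; the discrete case is used here only because it can be evaluated exactly.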

For ease of exposition, the density of the standard Laplace distribution is considered as a kernel, but the density of any Laplace distribution centered at zero, with known scale parameter $s>0$, in symbols, $Z\sim\mathrm{Laplace}\,(0,\,s)$, whose variance is $\sigma^{2}_{Z}=2s^{2}$,$^{3}$ could be employed. Assume that $X_{1},\,\ldots,\,X_{n}$ constitute a random sample from $p_{0}$. Every $X_{i}$ then satisfies

$$X_{i}=Y_{i}+Z_{i},\quad i=1,\,\ldots,\,n,$$

where $Y_{1},\,\ldots,\,Y_{n}$ and $Z_{1},\,\ldots,\,Z_{n}$ are independent samples from the distributions with c.d.f. $F_{0}$ and p.d.f. $k$, respectively. The r.v.’s $Y_{1},\,\ldots,\,Y_{n}$ are independent and identically distributed (i.i.d.) as $Y$ and $Z_{1},\,\ldots,\,Z_{n}$ are independent copies of $Z$. Realizations $x_{1},\,\ldots,\,x_{n}$ of the noisy sample data $X_{1},\,\ldots,\,X_{n}$ are observed instead of outcomes of the uncorrupted r.v.’s $Y_{1},\,\ldots,\,Y_{n}$. The r.v.’s $Z_{1},\,\ldots,\,Z_{n}$ represent additive errors and their distribution is called the error distribution. In this model, the variable of interest $Y$ cannot be directly observed and empirical access is limited to the sum of $Y$ and the “noise” $Z$. Therefore, estimating the distribution function $F_{0}$ of $Y$, or related quantities like the p.d.f. $f_{0}$ (if it exists), based on a sample from $P_{0}$, amounts to solving a particular inverse problem, called deconvolution, which consists in reconstructing (estimating) $F_{0}$ from indirect noisy observations $X_{1},\,\ldots,\,X_{n}$ drawn from $P_{0}\equiv P_{F_{0}}$, the latter being the image of $F_{0}$ under a known transformation that has to be “inverted”.

As remarked in Groeneboom and Wellner (1992), p. 4, as well as in Bolthausen et al. (2002), p. 363, the problem can be viewed as a missing data problem: the complete observations would consist of the independent pairs $X^{0}_{i}:=(Y_{i},\,Z_{i})$, with $X^{0}_{i}\sim Q:=F_{0}\times F_{Z}$,$^{4}$ but part of the data is missing and only outcomes or realizations of the sums $X_{i}=T(X_{i}^{0}):=(Y_{i}+Z_{i})\sim P_{0}\equiv QT^{-1}$, viewed as transformations of the $X^{0}_{i}$’s through the function $T$, are observed.

$^{3}$ To see that $\sigma^{2}_{Z}=2s^{2}$, one can take into account that, if $V_{1}$ and $V_{2}$ are independent r.v.’s, identically distributed as an exponential with parameter $1/s$, in symbols, $V_{j}\sim\textrm{Exp}\,(1/s)$, $j=1,\,2$, then $V_{1}-V_{2}$ has a $\textrm{Laplace}\,(0,\,s)$ distribution. Consequently, $\sigma^{2}_{Z}=2\sigma^{2}_{V_{1}}=2(1/s)^{-2}=2s^{2}$.

$^{4}$ The symbol $F_{Z}$ denotes the c.d.f. of $Z$, that is, $F_{Z}(z)=\int_{-\infty}^{z}k(u)\,\mathrm{d}u$, with density $k$ as in (2).
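The sampling scheme $X_{i}=Y_{i}+Z_{i}$ can be mimicked by simulation. The sketch below draws Laplace errors as the difference of two independent exponentials with rate $1/s$, which also gives an empirical check of $\sigma^{2}_{Z}=2s^{2}$. The uniform latent law, the sample sizes and all function names are hypothetical choices made for the example only.

```python
import random

def sample_laplace(rng, scale=1.0):
    """Draw Z ~ Laplace(0, scale) as the difference of two independent exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)

def simulate_observations(rng, n, draw_latent, scale=1.0):
    """Return n noisy observations X_i = Y_i + Z_i; the latent Y_i are not returned."""
    return [draw_latent(rng) + sample_laplace(rng, scale) for _ in range(n)]

rng = random.Random(0)

# Empirical variance of the noise, to be compared with 2 * scale**2 = 4.5.
scale = 1.5
noise = [sample_laplace(rng, scale) for _ in range(200_000)]
noise_mean = sum(noise) / len(noise)
noise_variance = sum((z - noise_mean) ** 2 for z in noise) / len(noise)

# Hypothetical latent law: Y uniform on [0, 1], so E[Y] = 0.5 and E[X] = E[Y].
xs = simulate_observations(rng, 50_000, lambda r: r.random())
sample_mean = sum(xs) / len(xs)
```

Only `xs` would be available to the statistician; the latent draws are discarded inside `simulate_observations` to reflect the missing-data structure of the problem.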

The statistical model described by relationship (1), with a zero-mean Laplace measurement error r.v. $Z$ independent of $Y$ , is a special case of the classical error model $X=Y+Z$ , in which $X$ is a measurement of $Y$ in the usual sense, $Z$ has zero mean and is independent of $Y$ , see, e.g., Buzas et al. (2005), p. 733, and the references therein. Measurement errors with possibly different structures occur in nearly every discipline from medical statistics to astronomy and econometrics, cf. the monographs of Fuller (1987) and Buonaccorsi (2010). Furthermore, the Laplace distribution finds applications in a variety of disciplines, from image and speech recognition to ocean engineering, see Kotz et al. (2001), chpts. 7–10, pp. 343–397. An application to quality control of the classical Laplace measurement error model is outlined hereafter.

Application to steam generator inspection

An application of the Laplace measurement error model to steam generator inspection and testing is described herein, see Easterling (1980) and Sollier (2017) for more details. Steam generators of pressurized water reactors contain many tubes through which heated water flows. For a variety of reasons, such as corrosion-induced wastage, the steam generator tube integrity can be degraded, the walls becoming thinned or cracked. Leaks may occur during normal operating conditions, thus requiring the plant to be shut down. In order to develop an inspection plan, a statistical model for tube degradation is considered. Experimental data show that measurements are affected by heavy-tailed and biased errors that can be represented by a r.v. $E$ following a Laplace distribution with mean $\mu>0$ and scale parameter $s>0$, in symbols, $E\sim\mathrm{Laplace}\,(\mu,\,s)$, with density

$$f_{E}(e)=\frac{1}{2s}\exp\,(-|e-\mu|/s),\quad e\in\mathbb{R}.$$

Denoting by $D$ the actual degradation (extent of thinning) of a tube, expressed as a percentage of the initial tube wall thickness, the measured degradation $M$ is modeled as

$$M=D+E,$$

where $E$ is supposed to be independent of $D$. Assuming that the scale parameter $s$ is known and the distribution of $D$ possesses a probability density function, say $f_{D}$, the interest is, in the first place, in estimating the p.d.f. of $D$, based on i.i.d. observations $M_{1},\,\ldots,\,M_{n}$ drawn from the distribution of $M$. An exponential distribution for $D$, with scale parameter $\tau>0$, in symbols, $D\sim\textrm{Exp}\,(1/\tau)$, whose density has expression

$$f_{D}(d)=\frac{1}{\tau}\exp\,(-d/\tau),\quad d>0,$$

provides an exponential-double exponential model for the measured degradation $M$, which has proved to fit experimental data adequately. Statistical procedures for fitness-for-service assessment are described in Carroll (2017).
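The exponential-double exponential model can be explored by simulation. The sketch below draws $M=D+E$ and compares the empirical mean and variance with the values $\tau+\mu$ and $\tau^{2}+2s^{2}$ implied by independence. The parameter values are hypothetical and are not taken from Easterling (1980), Sollier (2017) or any experimental data set.

```python
import random

def simulate_measured_degradation(rng, n, tau, mu, s):
    """Draw n values of M = D + E, with D exponential (mean tau) and E ~ Laplace(mu, s)."""
    measurements = []
    for _ in range(n):
        actual = rng.expovariate(1.0 / tau)  # actual degradation D > 0
        # Biased Laplace error: location mu plus a difference of two exponentials.
        error = mu + rng.expovariate(1.0 / s) - rng.expovariate(1.0 / s)
        measurements.append(actual + error)
    return measurements

rng = random.Random(1)
tau, mu, s = 10.0, 2.0, 3.0  # hypothetical values, for illustration only
ms = simulate_measured_degradation(rng, 100_000, tau, mu, s)

mean_m = sum(ms) / len(ms)                            # compare with tau + mu
var_m = sum((m - mean_m) ** 2 for m in ms) / len(ms)  # compare with tau**2 + 2 * s**2
```

Because $\mu>0$, the measurement $M$ overstates the actual degradation on average by $\mu$, which is why the error is described as biased.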

Asymptotic efficiency of the NPMLE for linear functionals of the mixing distribution

For many purposes, interest can lie in only a few aspects of the distribution of $Y$, key features of which can be represented as linear functionals of $F_{0}$. In what follows, symbols $F_{0}$ and $F$ will be used to indicate probability measures (p.m.’s) on $(\mathscr{Y},\,\mathscr{B}(\mathscr{Y}))$, where $\mathscr{B}(\mathscr{Y})$ denotes the Borel $\sigma$-field on $\mathscr{Y}$, as well as the corresponding cumulative distribution functions, the correct meaning being clear from the context. Letting

$$\mathscr{P}:=\{\text{all p.m.'s $F$ on $(\mathscr{Y},\,\mathscr{B}(\mathscr{Y}))$}\}$$

be the collection of all probability measures $F$ on $(\mathscr{Y},\,\mathscr{B}(\mathscr{Y}))$ , a functional is a mapping $\psi:\,\mathscr{P}\rightarrow\mathbb{R}$ that maps every $F\in\mathscr{P}$ to a real number $\psi(F)$ . The focus is on estimating integral linear functionals

$$F\mapsto\psi(F):=\int_{\mathscr{Y}}a(y)\,\mathrm{d}F(y)$$

at the “point” $F_{0}$ , where the function $a\in L^{1}(F_{0})$ is given. The following examples illustrate choices of $a$ for some common statistical functionals.

• Distribution function at a point. If, for some fixed $y_{1}\in\mathbb{R}$, the function $a(\cdot)=1_{(-\infty,\,y_{1}]}(\cdot)$, then $\psi(F_{0})=\int_{\mathscr{Y}}1_{(-\infty,\,y_{1}]}(y)\,\mathrm{d}F_{0}(y)=F_{0}(y_{1})$ is the c.d.f. of $Y$ at the point $y_{1}$.

• Probability of an interval. If, for fixed points $y_{1},\,y_{2}\in\mathbb{R}$, the function $a(\cdot)=1_{(y_{1},\,y_{2}]}(\cdot)$, then $\psi(F_{0})=\int_{\mathscr{Y}}1_{(y_{1},\,y_{2}]}(y)\,\mathrm{d}F_{0}(y)=F_{0}(y_{2})-F_{0}(y_{1})=\mathrm{P}(y_{1}<Y\leq y_{2})$ is the probability of the interval $(y_{1},\,y_{2}]$.

• Mean. If $a(\cdot)=\textrm{id}_{\mathscr{Y}}(\cdot)$ is the identity function on $\mathscr{Y}$, then $\psi(F_{0})=\int_{\mathscr{Y}}y\,\mathrm{d}F_{0}(y)=\mathrm{E}Y$ is the expected value of $Y$ which, for any kernel density $k$ (not necessarily the Laplace) with zero mean, $\mathrm{E}Z=0$, is equal to $\mathrm{E}X$: in fact, from the relationship $X=Y+Z$ in (1), it follows that $\mathrm{E}X=\mathrm{E}Y+\mathrm{E}Z=\mathrm{E}Y$ by linearity of the expected value.

• $r$th moment. If, for any positive integer $r$, the function $y\mapsto a(y)=y^{r}$, then $\psi(F_{0})=\int_{\mathscr{Y}}y^{r}\,\mathrm{d}F_{0}(y)=\mathrm{E}Y^{r}$ is the $r$th moment of $Y$.

• Moment generating function. If, for some fixed point $t\in\mathbb{R}$ such that $0<|t|<1$, the mapping $y\mapsto a(y)=e^{ty}$, then $\psi(F_{0})$ coincides with the moment generating function (m.g.f.) of $F_{0}$ at $t$, denoted by $M_{F_{0}}(t)$ or $M_{Y}(t)$, that is, $\psi(F_{0})=\int_{\mathscr{Y}}e^{ty}\,\mathrm{d}F_{0}(y)=M_{F_{0}}(t)$. Some features of the mixing distribution $F_{0}$, like the mean or the variance, can be expressed in terms of the derivatives of the corresponding m.g.f. $M_{F_{0}}$ evaluated at zero. Therefore, in principle, results for estimating aspects of $F_{0}$ can be obtained as by-products of the inference on $M_{F_{0}}$.
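The examples above all have the form $\psi(F)=\int a\,\mathrm{d}F$, which for a discrete $F$ reduces to the weighted sum $\sum_{j}w_{j}\,a(y_{j})$. The following sketch evaluates each functional for one hypothetical three-point distribution; the atoms, the weights and the helper name `linear_functional` are assumptions made for illustration.

```python
import math

def linear_functional(a, atoms, weights):
    """psi(F) = integral of a dF = sum_j w_j * a(y_j) for a discrete distribution F."""
    return sum(w * a(y) for y, w in zip(atoms, weights))

# Hypothetical discrete mixing distribution, for illustration only.
atoms = [-1.0, 0.0, 2.0]
weights = [0.2, 0.5, 0.3]

# Distribution function at y1 = 0: a is the indicator of (-inf, 0].
cdf_at_0 = linear_functional(lambda y: 1.0 if y <= 0.0 else 0.0, atoms, weights)
# Probability of the interval (-1, 2]: a is the indicator of (-1, 2].
prob_interval = linear_functional(lambda y: 1.0 if -1.0 < y <= 2.0 else 0.0, atoms, weights)
# Mean and second moment: a(y) = y and a(y) = y**2.
mean = linear_functional(lambda y: y, atoms, weights)
second_moment = linear_functional(lambda y: y ** 2, atoms, weights)
# Moment generating function at t = 0.5, which satisfies 0 < |t| < 1.
mgf_at_half = linear_functional(lambda y: math.exp(0.5 * y), atoms, weights)
```

The same helper applies unchanged to any discrete estimate of the mixing distribution, since only the atoms and weights change.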

A standard and principled method for pointwise estimation of linear functionals consists in plugging the$^{5}$ NPMLE $\hat{F}_{n}$ of $F_{0}$ into $\psi(\cdot)$ to obtain the plug-in estimator

$$\psi(\hat{F}_{n}):=\psi(F)\big|_{F=\hat{F}_{n}}.$$

$^{5}$ Uniqueness of $\hat{F}_{n}$ is not guaranteed; it is therefore with an abuse of language that we refer to the NPMLE throughout the article.

A NPMLE $\hat{F}_{n}$ of $F_{0}$ is a measurable function of the observations $X_{1},\,\ldots,\,X_{n}$ taking values in $\mathscr{P}$ , which is not necessarily uniquely defined by the relationship

$$\hat{F}_{n}\in\arg\max_{F\in\mathscr{P}}\frac{1}{n}\sum_{i=1}^{n}\log p_{F}(X_{i}),$$

equivalently written as

$$\hat{F}_{n}\in\arg\max_{F\in\mathscr{P}}\mathbb{P}_{n}\log p_{F},$$

where

$$p_{F}(\cdot):=\int_{\mathscr{Y}}k(\cdot-y)\,\mathrm{d}F(y)$$

is the generic location mixture of Laplace densities with mixing distribution $F$ and

$$\mathbb{P}_{n}:=\frac{1}{n}\sum_{i=1}^{n}\delta_{X_{i}}$$

is the empirical probability measure associated with the random sample $X_{1},\,\ldots,\,X_{n}$, namely, the discrete uniform distribution on the sample values that puts mass $1/n$ on each one of the observations. In the sequel, for a measurable function $f:\,\mathscr{X}\rightarrow\mathbb{R}$, where $\mathscr{X}\subseteq\mathbb{R}$ is specified at the different occurrences, the notation $\mathbb{P}_{n}f$ is used to abbreviate the empirical average $n^{-1}\sum_{i=1}^{n}f(X_{i})$. Analogously, $P_{0}f$ is used in lieu of $\int f\,\mathrm{d}P_{0}$. Hereafter, unless it is necessary within the context to specify the integral domain, integration is understood to be performed over the entire natural domain of the integrand. Throughout the article, the probability measure $\mathbf{P}$ stands for $P_{0}^{n}$, the joint law of the first $n$ coordinate projections of the infinite product probability measure $P_{0}^{\mathbb{N}}$. Sequences of random variables are meant to converge (in law or in probability) as the sample size $n$ grows indefinitely large (as $n\rightarrow+\infty$).
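One way to approximate a maximizer of $\mathbb{P}_{n}\log p_{F}$ numerically is sketched below. It is not the article's procedure: it assumes that the search can be restricted to distributions supported on the observed points, and it uses the classical EM fixed-point update for mixing weights on that fixed support, stopped after a finite number of iterations. The simulated data, the two-point latent law and all names are hypothetical.

```python
import math
import random

def laplace_kernel(z):
    """Standard Laplace density k(z) = exp(-|z|) / 2."""
    return 0.5 * math.exp(-abs(z))

def em_npmle_weights(xs, support, n_iter=200):
    """EM updates for mixing weights on a fixed support (the support is not optimized)."""
    n, m = len(xs), len(support)
    weights = [1.0 / m] * m
    kern = [[laplace_kernel(x - y) for y in support] for x in xs]
    history = []  # average log-likelihood P_n log p_F before each update
    for _ in range(n_iter):
        dens = [sum(w * k for w, k in zip(weights, row)) for row in kern]
        history.append(sum(math.log(d) for d in dens) / n)
        # w_j <- w_j * (1/n) * sum_i k(x_i - y_j) / p_F(x_i)
        weights = [
            w * sum(row[j] / d for row, d in zip(kern, dens)) / n
            for j, w in enumerate(weights)
        ]
    return weights, history

rng = random.Random(2)
# Hypothetical two-point latent law on {-2, 2}, chosen only for illustration.
ys = [rng.choice([-2.0, 2.0]) for _ in range(60)]
xs = [y + rng.expovariate(1.0) - rng.expovariate(1.0) for y in ys]

support = sorted(xs)  # assumption: candidate atoms are the observed points
weights, history = em_npmle_weights(xs, support)
plug_in_mean = sum(w * y for w, y in zip(weights, support))  # psi(F) with a(y) = y
```

The update keeps the weights on the probability simplex and never decreases the average log-likelihood, but convergence can be slow, so the output should be read as an approximation rather than as an exact NPMLE.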

Historical and conceptual background, overview of the results

The deconvolution problem has been intensively studied over the last thirty years. There exists a vast literature on density deconvolution, which amounts to reconstructing/estimating the density $f_{0}$ of $Y$ (if it exists) that satisfies the equation

$$p_{0}(x)=\int_{\mathscr{Y}}k(x-y)f_{0}(y)\,\mathrm{d}y,\quad x\in\mathbb{R},$$

wherein the kernel density $k$ (not necessarily a Laplace) is assumed to be known, based on outcomes $x_{1},\,\ldots,\,x_{n}$ of i.i.d. r.v.’s $X_{1},\,\ldots,\,X_{n}$ as in (3). We cite key articles of the late 1980s and early 1990s, like Carroll and Hall (1988), Stefanski and Carroll (1990) and Fan (1991),$^{6}$ that have been ground-breaking and have had a great impact on the area of measurement error, setting the general framework for attacking measurement error/deconvolution problems and developing an approach based on Fourier inversion techniques to construct a deconvolution kernel density estimator for recovering the density of the latent distribution, meanwhile showing how difficult it is to account for measurement errors: in fact, the smoother the error distribution, the stronger its confounding effect on the latent distribution and, hence, the slower the optimal attainable rate of convergence for its estimators.

$^{6}$ For a recent reference list, see also Davidian et al. (2014).
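For Laplace errors the Fourier-inversion construction has a closed form, which the sketch below illustrates. Since the characteristic function of $\mathrm{Laplace}\,(0,\,s)$ is $1/(1+s^{2}t^{2})$, dividing the Fourier transform of a smoothing kernel $K$ by it at $t/h$ yields the deconvoluting kernel $K(u)-(s/h)^{2}K''(u)$. The Gaussian choice of $K$, the bandwidth, the standard normal latent law and all names are assumptions made for the example; this is the classical kernel approach mentioned in the text, not the maximum likelihood method studied in the article.

```python
import math
import random

def gaussian(u):
    """Standard normal density, used here as the smoothing kernel K."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def deconvolution_kernel(u, h, s=1.0):
    """K(u) - (s/h)^2 K''(u) for Gaussian K, where K''(u) = (u^2 - 1) K(u)."""
    return gaussian(u) * (1.0 - (s / h) ** 2 * (u * u - 1.0))

def deconvolution_density_estimate(y, xs, h, s=1.0):
    """Estimate f_0(y) from observations xs contaminated by Laplace(0, s) errors."""
    return sum(deconvolution_kernel((y - x) / h, h, s) for x in xs) / (len(xs) * h)

# The deconvoluting kernel integrates to one (Riemann sum on [-10, 10]).
h = 0.4  # hypothetical bandwidth, not tuned
n_steps = 20_000
kernel_mass = (20.0 / n_steps) * sum(
    deconvolution_kernel(-10.0 + 20.0 * i / n_steps, h) for i in range(n_steps + 1)
)

# Hypothetical latent law: Y standard normal, so f_0(0) is about 0.399.
rng = random.Random(5)
xs = [rng.gauss(0.0, 1.0) + rng.expovariate(1.0) - rng.expovariate(1.0)
      for _ in range(20_000)]
estimate_at_zero = deconvolution_density_estimate(0.0, xs, h)
```

The factor $(s/h)^{2}$ inflates the kernel as the bandwidth shrinks, which is one way to see why measurement error slows down the attainable rates.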

Far less instead seems to be known about distribution function deconvolution, keynote contributions, also based on Fourier inversion techniques, being those of Hall and Lahiri (2008), Dattner et al. (2011), the former article containing an illuminating critical analysis of the background to the problem of distribution estimation in deconvolution problems. Since the focus of this article is on the behaviour of the NPMLE $\hat{F}_{n}$ of the mixing distribution $F_{0}$, attention is hereafter restricted to reviewing the theory of non-parametric maximum likelihood estimation in deconvolution problems. In general mixture models, the NPMLE $\hat{F}_{n}$ of $F_{0}$ is discrete, with at most $l\leq n$ support points, $l$ being the number of distinct values of the data points, cf. Lindsay (1983). In deconvolution problems with continuous and symmetric (about the origin) kernels decreasing on $[0,\,+\infty)$, a NPMLE $\hat{F}_{n}$ always exists (uniqueness is not guaranteed), see Groeneboom and Wellner (1992), Lemma 2.1, pp. 57–58; for kernels that are also strictly convex on $[0,\,+\infty)$, like the (standard) Laplace, the NPMLE $\hat{F}_{n}$ is supported on the set of observation points $\{X_{1},\,\ldots,\,X_{n}\}$, so that the corresponding probability measure, still denoted by $\hat{F}_{n}$ consistently with the notational convention adopted throughout, is concentrated on the range of the data points $[X_{1:n},\,X_{n:n}]$,$^{7}$

$$\hat{F}_{n}([X_{1:n},\,X_{n:n}])=1,$$

see ibid., Corollary 2.2 and Corollary 2.3, p. 59 and p. 60, respectively. Consistency of $\hat{F}_{n}$ at a continuous distribution function $F_{0}$ is proved in ibid., § 4.2, pp. 79–81. More is known about the one-parameter (location only) Laplace model, which can be viewed as a degenerate mixture with point mass mixing distribution at some fixed $\theta\in\mathbb{R}$, the Dirac measure at $\theta$.$^{8}$ A simple maximization argument to find a MLE of the location parameter $\theta$, denoted by $\hat{\theta}_{n}$, is given in Norton (1984): the sample median is a MLE which is an M-estimator, see Huber (1967), solving the equation $\sum_{i=1}^{n}\mathrm{sign}(X_{i}-\theta)=0$.$^{9}$ Therefore, a MLE exists, but may not be unique: if $n$ is odd, that is, $n=2m+1$ for some $m\in\mathbb{N}$, then the sample median is uniquely defined as the middle observation $X_{m+1:n}$, while, if $n=2m$ is even, then there are two middle observations $X_{m:n}$ and $X_{m+1:n}$ so that, in principle, any value in the interval $[X_{m:n},\,X_{m+1:n}]$ could be chosen, even if the canonical median $(X_{m:n}+X_{m+1:n})/2$, the average of the middle observations, is typically used in practice. Therefore,

$$\hat{\theta}_{n}=\begin{cases}X_{m+1:n}, & \text{for } n=2m+1,\ m\in\mathbb{N},\\ \frac{1}{2}(X_{m:n}+X_{m+1:n}), & \text{for } n=2m.\end{cases}$$

$^{7}$ Following a common notational convention, we denote by $X_{r:n}$ the $r$th order statistic.

$^{8}$ The Dirac measure at $\theta$, denoted by $\delta_{\theta}(\cdot)$, is defined on the Borel sets $B\in\mathscr{B}(\mathbb{R})$ by $\delta_{\theta}(B)=1,\,0$ if $B\ni\theta$ or $B\not\ni\theta$, respectively.

$^{9}$ The sign-function is defined as $\mathrm{sign}(x)=-1,\,0,\,1$ if $x<0$, $x=0$ or $x>0$, respectively.
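The characterization of the sample median as a MLE in the location-only Laplace model can be checked numerically. Up to an additive constant, the negative log-likelihood is $\sum_{i}|X_{i}-\theta|$, so the sketch below verifies the sign equation for an odd sample and the flatness of the objective between the two middle observations for an even sample. The simulated data, the location value and the helper names are hypothetical.

```python
import random
import statistics

def neg_log_likelihood(theta, xs):
    """Negative Laplace(theta, 1) log-likelihood up to a constant: sum_i |x_i - theta|."""
    return sum(abs(x - theta) for x in xs)

def sign(v):
    """Sign function with values -1, 0, 1."""
    return (v > 0) - (v < 0)

rng = random.Random(3)
theta_true = 1.0  # hypothetical location, for illustration only
odd_sample = [theta_true + rng.expovariate(1.0) - rng.expovariate(1.0)
              for _ in range(101)]

# Odd n: the median is the middle observation and solves the sign equation.
median_odd = statistics.median(odd_sample)
score_at_median = sum(sign(x - median_odd) for x in odd_sample)

# Even n: every point between the two middle observations maximizes the likelihood.
even_sample = odd_sample[:100]
ordered = sorted(even_sample)
lower_mid, upper_mid = ordered[49], ordered[50]
canonical = statistics.median(even_sample)  # average of the two middle observations
```

For even $n$ the objective is constant on $[X_{m:n},\,X_{m+1:n}]$ because exactly $m$ observations lie on each side, so the choice of the canonical median is a convention rather than a consequence of the likelihood.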

The sample median $\hat{\theta}_{n}$ is a MLE and is asymptotically efficient, that is, consistent and, when recentered at $\theta$ and $\sqrt{n}$ -rescaled, asymptotically normal with zero mean and variance equal to one, which is the information lower bound corresponding to the amount of information in a single observation,

$$\sqrt{n}\big(\hat{\theta}_{n}-\theta\big)\xrightarrow{\mathscr{L}}\mathscr{N}\big(0,\,1\big),$$

where “ $\,\xrightarrow{\mathscr{L}}\,$ ” denotes convergence in law. This has been established by Daniels (1961) who, motivated by the non-differentiability at zero of the (standard) Laplace density, proved a general theorem on the asymptotic efficiency of the MLE under conditions not involving the second and higher-order derivatives of the likelihood function. Even though, as later on noted by Huber (1967), a crucial step has been overlooked in Daniels’ proof, the assertion remains valid.
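The unit variance in the limit can be recovered by a short heuristic computation, given here as a sketch that ignores the non-differentiability of the density at $x=\theta$ (a $P_{\theta}$-null set); making this step rigorous is precisely what requires conditions of the kind considered by Daniels (1961). The symbols $\dot{\ell}_{\theta}$ and $I_{\theta}$ denote the score function and the Fisher information for a single observation.

```latex
\begin{align*}
\log k(x-\theta) &= -\log 2 - |x-\theta|, \\
\dot{\ell}_{\theta}(x) &= \frac{\partial}{\partial\theta}\log k(x-\theta)
  = \mathrm{sign}(x-\theta), \qquad x\neq\theta, \\
I_{\theta} &= \mathrm{E}_{\theta}\big[\dot{\ell}_{\theta}(X)^{2}\big]
  = \mathrm{P}_{\theta}(X\neq\theta) = 1,
  \qquad\text{so that}\qquad I_{\theta}^{-1} = 1 .
\end{align*}
```

This agrees with the classical asymptotic variance of the sample median, $1/\{4\,k(0)^{2}\}=1/\{4\,(1/2)^{2}\}=1$.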

In this article, we study the behaviour of the plug-in NPMLE $\psi(\hat{F}_{n})$ to answer the question of whether there exist integral linear functionals $\psi(F_{0})$ that can be consistently estimated by the maximum likelihood method at $n^{-1/2}$ -rate and for which

$$\sqrt{n}\big(\psi(\hat{F}_{n})-\psi(F_{0})\big)$$

is asymptotically efficient, in the sense of Definition 2.8 in Bolthausen et al. (2002), p. 349, that is, asymptotically normal with zero mean and minimum variance. In fact, also in non-parametric problems estimation can be performed at $n^{-1/2}$-rate and, in general mixture models $\int_{\mathscr{Y}}k(\cdot\mid y)\,\mathrm{d}F_{0}(y)$, where $k$ is any (not necessarily the Laplace) kernel density, there may exist linear functionals of $F_{0}$ that are estimable at parametric rate, even if $F_{0}$ itself can be pointwise estimated only at slower rates. Fundamental contributions developing the theory of information bounds are van der Vaart (1991), van der Vaart (1998), chpt. 25, pp. 358–432, Bolthausen et al. (2002), Part III, pp. 331–457, Groeneboom and Wellner (1992), with emphasis on non-parametric maximum likelihood estimation, and van de Geer (2000), chpt. 11, pp. 211–246, with a focus on asymptotic efficiency of the NPMLE in mixture models. To exemplify the issue, consider estimating the mean functional $\psi(F_{0})=\mathrm{E}Y$ which, in a mixture model such that $\mathrm{E}(X\mid Y)=Y$, is equal to $\mathrm{E}X$. Then, the sample mean $\bar{X}_{n}=n^{-1}\sum_{i=1}^{n}X_{i}$ is a $n^{-1/2}$-consistent and, after $\sqrt{n}$-rescaling, asymptotically normal estimator of $\mathrm{E}Y$, but may not be a MLE; furthermore, it does not take into account the information that the sampling density is a mixture. On the other hand, the MLE may be $n^{-1/2}$-consistent and converge to a normal distribution with smaller variance than that of the sample mean. This is the case for the sample median in the single-parameter (location only) Laplace model, see van der Vaart (1998), Example 7.8 on location models, p. 96. Surprisingly, little is known in general about the asymptotic behaviour of the plug-in NPMLE for linear functionals in Laplace convolution mixtures, even if only for estimating the mean functional. Although the topic is useful, the existence of this gap can be partially explained by the fact that the Laplace or double-exponential distribution is not an exponential family model so that standard results may not be valid or immediately available from the theory of exponential families.
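The comparison between the sample mean and the sample median in the location-only Laplace model can be illustrated by a small Monte Carlo experiment. In that model $\mathrm{Var}(X)=2$, so $n\,\mathrm{Var}(\bar{X}_{n})=2$, whereas the rescaled variance of the median approaches 1. The sample size, the number of replications, the seed and the function names below are hypothetical choices, and the finite-sample values only approximate the limits.

```python
import random
import statistics

def laplace_sample(rng, theta, n):
    """Draw n observations from Laplace(theta, 1) via differences of exponentials."""
    return [theta + rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(n)]

def scaled_variances(n=201, reps=3000, theta=0.0, seed=4):
    """Monte Carlo estimates of n * Var(sample mean) and n * Var(sample median)."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        xs = laplace_sample(rng, theta, n)
        means.append(statistics.fmean(xs))
        medians.append(statistics.median(xs))
    return n * statistics.pvariance(means), n * statistics.pvariance(medians)

mean_var, median_var = scaled_variances()
```

In this degenerate-mixture case the MLE roughly halves the variance of the moment-type estimator; the article's point is that this favourable behaviour does not extend to general mixing distributions.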

In order to investigate whether integral linear functionals of the mixing distribution in a convolution model with the Laplace kernel density are estimable at $n^{-1/2}$-rate, we appeal to van der Vaart’s differentiability theorem, see Theorem 3.1 in van der Vaart (1991), p. 183, a general result that allows for a unified treatment of the information lower bound theory based on the concept of a differentiable functional, see, for a definition of the latter, display (2.2) in van der Vaart (1991), p. 180, or Definition 1.10 in Bolthausen et al. (2002), p. 343. The differentiability theorem characterizes differentiable functionals and, by combining the description of the set of differentiable functionals with a result stating that the existence of regular estimator sequences for a functional implies its differentiability, provides a way to distinguish situations in which the functional of interest is estimable at $n^{-1/2}$-rate from situations in which this is not the case, see van der Vaart (1998), p. 365, for a definition of a regular estimator sequence. A necessary and sufficient condition for differentiability of a (not necessarily linear) functional is that its gradients are contained in the range of the adjoint of the score operator, where the score operator can be viewed as a derivative (in quadratic mean) of the map $F\mapsto P_{F}$, see (3.6) in van der Vaart (1991), p. 183, or (25.29) in § 25.5 of van der Vaart (1998), p. 372. As previously mentioned, differentiability is necessary for regular estimability of a functional or, equivalently, for the existence of regular estimator sequences, see Theorem 2.1 of van der Vaart (1991), p. 181, so that if the functional is not differentiable, then there exist no regular estimators and estimation at $n^{-1/2}$-rate is impossible. Interestingly, for real-valued functionals, the differentiability condition is equivalent to having positive efficient information, see Theorem 4.1 of van der Vaart (1991), pp. 186–187. We find that the differentiability condition fails for integral linear functionals of the mixing distribution in a convolution model with the Laplace density, this implying that there exists no estimator sequence for $\psi(F_{0})$ that is regular at $F_{0}$ and estimation at $n^{-1/2}$-rate is impossible.

Organization

The rest of the article is organized as follows. The main results are presented in Sect. 2, which is split into two parts. In the first one, asymptotic efficiency of the plug-in NPMLE for integral linear functionals of the mixing distribution in a convolution model with the (one-sided) exponential kernel density is analysed and settled in the affirmative. Construction of interval estimators and tests based on a Studentized version of the plug-in NPMLE, when the asymptotic variance is consistently estimated, is revisited. Conditions for extending results to non-linear functionals are discussed as a side issue. In the second part, the focus is on asymptotically efficient estimation by the plug-in NPMLE for integral linear functionals of the mixing distribution in a convolution model with the double-exponential (Laplace) kernel density. It is shown that, except for the case of a degenerate mixing distribution at a single location point, maximum likelihood estimation completely fails, in the sense that no integral linear functional can be estimated at $n^{-1/2}$ -rate, which precludes any possibility for the NPMLE of being asymptotically efficient. Indeed, there exists no regular sequence of estimators for integral linear functionals of the mixing distribution, so that estimation of linear functionals at $n^{-1/2}$ -rate is impossible. Final remarks and comments are collected in Sect. 3. Proofs of the main results are deferred to the appendices: Appendix A reports the proof of the result for convolution with the exponential density, Appendix B that of the result for convolution with the Laplace density.

2 Main results

In this section, the main results of the article are presented. First, the case of a convolution model with an exponential kernel density is considered. Since, as previously noted, the Laplace density in (2) can be thought of as two exponential densities spliced together back-to-back, the positive half being a standard exponential density scaled by $1/2$ , it is reasonable to begin the analysis from the problem of asymptotically efficient non-parametric maximum likelihood estimation of linear functionals in a convolution model with the exponential kernel density. A preliminary study of the one-sided problem, beyond being of interest in itself, is useful to attack the two-sided one by partially reducing it to the previously solved case; it may furthermore provide insight for a better understanding of the reasons why symmetrization leads to a failure of asymptotically efficient estimation of linear functionals in the double-exponential (Laplace) case.

Convolution with the exponential density

In this paragraph, a standard exponential kernel density on $[0,\,+\infty)$ is considered. This gives rise to a one-sided mixture density generating the data,

[TABLE]

where $\mathscr{Y}:=\mathrm{support}\,(F_{0})$ is assumed, without loss of generality, to be a proper, left-closed subset of the real line, and $\mathscr{X}:=[y_{\textrm{min}},\,+\infty)$ is the support of $P_{0}$ , with $y_{\textrm{min}}:=\mathrm{min}\,\mathscr{Y}>-\infty$ . Proposition 1 below establishes that, under sufficient conditions, an integral linear functional $\psi(F_{0})$ can be consistently estimated at $n^{-1/2}$ -rate by the plug-in NPMLE $\psi(\hat{F}_{n})$ , which, when recentered about the estimand and $\sqrt{n}$ -rescaled, is asymptotically normal and efficient. In stating hereafter the assumptions, Newton’s notation (or the dot notation) for differentiation is adopted, that is, ${\dot{a}}(y):=\mathrm{d}a(y)/\mathrm{d}y$ .

Assumptions

$(\bf{A0})$

$a\in L^{1}(F_{0})$ ,

$(\bf{A1})$

$M_{F_{0}}(1):=\int_{\mathscr{Y}}e^{y}\,\mathrm{d}F_{0}(y)<+\infty$ ,

$(\bf{A2})$

$|\psi(\hat{F}_{n})-\psi(F_{0})|=o_{\mathbf{P}}(1)$ ,

$(\bf{A3})$

(i) $a$ is continuous on $\mathscr{Y}$ ,

(ii) either $\mathscr{Y}$ is compact or $a$ is bounded on $\mathscr{Y}$ ,

(iii) $F_{0}(y_{\textrm{min}})=0$ ,

$(\bf{A4})$

there exists ${\dot{a}}$ on $\mathscr{Y}$ and $\sup_{y\in\mathscr{Y}}|\dot{a}(y)|<+\infty$ ,

$(\bf{A5})$

there exists a constant $0<c_{0}<+\infty$ such that

[TABLE]

Some remarks and comments on the above listed assumptions are in order. Except for Assumption $({\bf{A2}})$ , which concerns the plug-in NPMLE $\psi(\hat{F}_{n})$ , all other assumptions involve the function $a$ and/or the mixing distribution $F_{0}$ that jointly define the functional $\psi(F_{0})$ . Specifically, Assumption $(\bf{A0})$ guarantees that $\psi(F_{0})$ is well defined. Assumption $(\bf{A1})$ ensures the existence of the moment generating function of $F_{0}$ at the point $t=1$ . Assumption $(\bf{A2})$ requires consistency of $\psi(\hat{F}_{n})$ at $\psi(F_{0})$ . If $\hat{F}_{n}$ converges weakly to $F_{0}$ in $P_{0}^{n}$ -probability, then parts (i) and (ii) of Assumption $(\bf{A3})$ together imply Assumption $(\bf{A2})$ because $a$ is continuous and bounded on $\mathscr{Y}$ . Sufficient conditions for $\hat{F}_{n}$ to converge weakly to $F_{0}$ in the convolution model with the standard exponential kernel density on $[0,\,+\infty)$ are stated in Groeneboom and Wellner (1992), p. 86; see also Theorem 2.3 of Chen (2017), p. 54, for sufficient conditions for strong consistency of $\hat{F}_{n}$ in general mixture models. Part (ii) of Assumption $(\bf{A3})$ postulates that either $\mathscr{Y}$ is a closed and bounded interval $[y_{\mathrm{min}},\,y_{\mathrm{max}}]$ or $\mathscr{Y}$ is a right-unbounded interval $[y_{\mathrm{min}},\,+\infty)$ and $a$ is bounded. Assumption $(\bf{A4})$ requires $a$ to be differentiable with bounded derivative on $\mathscr{Y}$ , which, in particular, requires $a$ to be right-differentiable at $y_{\mathrm{min}}$ , that is, $\dot{a}_{+}(y_{\mathrm{min}})<+\infty$ , and, in the case where $\mathscr{Y}=[y_{\mathrm{min}},\,y_{\mathrm{max}}]$ , to be also left-differentiable at $y_{\mathrm{max}}$ , that is, $\dot{a}_{-}(y_{\mathrm{max}})<+\infty$ . Here, the right-derivative of $a$ at $y_{\mathrm{min}}$ , denoted by $\dot{a}_{+}(y_{\mathrm{min}})$ , is defined as the one-sided limit $\lim_{y\rightarrow y_{\mathrm{min}}+}[a(y)-a(y_{\mathrm{min}})]/(y-y_{\mathrm{min}})$ if it exists as a real number; analogously, the left-derivative of $a$ at $y_{\mathrm{max}}$ , denoted by $\dot{a}_{-}(y_{\mathrm{max}})$ , is defined as the one-sided limit $\lim_{y\rightarrow y_{\mathrm{max}}-}[a(y_{\mathrm{max}})-a(y)]/(y_{\mathrm{max}}-y)$ if it exists as a real number. Assumption $(\bf{A5})$ plays its role in the proof of Proposition 1 when bounding the worst possible sub-directions, see the proof of Proposition 1 in Appendix A.

Proposition 1

Under Assumptions (A0)–(A5), we have

[TABLE]

where the mapping $b_{F_{0}}:\,\mathscr{X}\rightarrow\mathbb{R}$ , defined as

[TABLE]

is the efficient influence function, whose squared $L^{2}(P_{0})$ -norm

[TABLE]

is the efficient asymptotic variance.

Proposition 1 establishes that, under the sufficient conditions listed as Assumptions (A0)–(A5), cf. Lemma 4.6 of van de Geer (2003), p. 461, and van de Geer (2000), p. 231, the plug-in NPMLE $\psi(\hat{F}_{n})$ consistently estimates $\psi(F_{0})$ at $n^{-1/2}$ -rate; furthermore, when recentered at $\psi(F_{0})$ and $\sqrt{n}$ -rescaled, it is asymptotically distributed as a zero-mean Gaussian, with variance attaining the lower bound given by the squared $L^{2}(P_{0})$ -norm of the efficient influence function, which here plays the same role as the normalized score function for the case of independent sampling from a parametric model $\{p_{\theta}$ , $\theta\in\Theta\subseteq\mathbb{R}^{d}\}$ , $d\geq 1$ ,

[TABLE]

where $\dot{\ell}_{\theta}(\cdot)=\partial[\log p_{\theta}(\cdot)]/\partial\theta$ is the score function of the model and $I_{\theta}=\mathrm{E}[\dot{\ell}_{\theta}\dot{\ell}_{\theta}^{\mathrm{T}}]$ the Fisher information matrix for $\theta$ . In a parametric set-up, the minimum variance lower bound reduces to the Cramér-Rao bound, which states that the inverse of the Fisher information matrix $I^{-1}_{\theta}$ is a lower bound on the variance of any $\sqrt{n}$ -rescaled unbiased estimator $T_{n}\equiv T_{n}(X_{1},\,\ldots,\,X_{n})$ of $\theta$ , in symbols, $\textrm{var}(\sqrt{n}T_{n})\geq I^{-1}_{\theta}$ . Therefore, the counterpart of $\|b_{F_{0}}\|_{2,P_{0}}^{2}=P_{0}b_{F_{0}}^{2}$ is, by symmetry of ( $I_{\theta}$ hence of) $I^{-1}_{\theta}$ ,

[TABLE]

In general, given a function $\psi$ that maps $\Theta$ into $\mathbb{R}^{m}$ , $m\geq 1$ , and denoting by $\dot{\psi}_{\theta}$ the derivative of $\theta\mapsto\psi(\theta)$ , the matrix $\dot{\psi}_{\theta}I^{-1}_{\theta}\dot{\psi}_{\theta}^{\mathrm{T}}$ is a lower bound on the variance of any $\sqrt{n}$ -rescaled unbiased estimator of $\psi(\theta)$ .

Even though a statement of the result in Proposition 1 appears in Lemma 4.6 of van de Geer (2003), p. 461, as far as we are aware, a complete derivation of the assertion is not available in the literature, cf. also van de Geer (2000), p. 231, so the proof reported in Appendix A might prove helpful. The underlying idea is outlined hereafter. An NPMLE $\hat{F}_{n}$ solves the likelihood equation for every path $t\mapsto F_{t}$ , with $\mathrm{d}F_{t}:=(1+th_{F})\,\mathrm{d}F$ , starting at a fixed point $F$ corresponding to $t=0$ and direction $h_{F}$ such that $\int h_{F}\,\mathrm{d}F=0$ , that is, for every parametric sub-model which passes (at $t=0$ ) through it. In symbols,

[TABLE]

For ease of notation, let

[TABLE]

Equation (8) reduces to $\mathbb{P}_{n}b_{\hat{F}_{n}}=0$ , where $b_{\hat{F}_{n}}(\cdot)=A_{\hat{F}_{n}}h_{\hat{F}_{n}}(\cdot)$ is the score function (at $t=0$ ) in an “information loss model”, see, e.g., § 25.5.2 in van der Vaart (1998), pp. 374–375. If $\hat{F}_{n}$ dominates $F_{0}$ , which, however, is seldom true, then $-P_{0}b_{\hat{F}_{n}}=\psi(\hat{F}_{n})-\psi(F_{0})$ so that, combining this identity with $\mathbb{P}_{n}b_{\hat{F}_{n}}=0$ , we get $\psi(\hat{F}_{n})-\psi(F_{0})=(\mathbb{P}_{n}-P_{0})b_{\hat{F}_{n}}$ . Asymptotic equicontinuity arguments then yield that $(\mathbb{P}_{n}-P_{0})b_{\hat{F}_{n}}=(\mathbb{P}_{n}-P_{0})b_{F_{0}}+o_{\bf P}(n^{-1/2})=\mathbb{P}_{n}b_{F_{0}}+o_{\bf P}(n^{-1/2})$ because $P_{0}b_{F_{0}}=0$ , namely, the score has zero mean. So, $\sqrt{n}(\psi(\hat{F}_{n})-\psi(F_{0}))=\sqrt{n}\mathbb{P}_{n}b_{F_{0}}+o_{\bf P}(1)$ . Asymptotic normality then follows from the central limit theorem. The reader is referred to Sect. 11.2 of van de Geer (2000), pp. 211–246, for a more comprehensive treatment of the topic, including the technical difficulties that cannot be given the necessary space here.

Remark 1

For simplicity, a convolution model with an exponential kernel density on $[0,\,+\infty)$ having intensity $\lambda=1$ has been considered (we warn the reader of the clash of notation with the symbol $\lambda$ previously used to denote Lebesgue measure on the real line), but, as revealed by an inspection of the proof of Proposition 1, the assertion holds true for every $\lambda>0$ .

Remark 2

Part (i) of Assumption (A3) requires $a$ to be continuous on $\mathscr{Y}$ , which is not true for indicator functions; therefore, Proposition 1 applies neither to pointwise estimation of the c.d.f. $F_{0}$ nor to the estimation of the probability of an interval, so that it cannot be concluded that these functionals are estimable at $n^{-1/2}$ -rate by the corresponding plug-in NPMLEs. Indeed, $F_{0}$ can be pointwise estimated only at $n^{-1/3}$ -rate, see Groeneboom and Wellner (1992), p. 121. Part (ii) of Assumption (A3) and Assumption (A4) require that both $a$ and $\dot{a}$ are bounded on $\mathscr{Y}$ , which, for example, may not be true for the functions $y^{r}$ and $e^{ty}$ that define the $r$th moment and the moment generating function of $Y$ at the point $t$ , respectively: in fact, both $y^{r}$ and $e^{ty}$ , as well as their first derivatives, are continuous on the half-line $[y_{\mathrm{min}},\,+\infty)$ , but not bounded therein. Nonetheless, boundedness can be retrieved by restriction to a compact domain. Therefore, if, besides Assumptions (A0)–(A2) and (A5), it also holds that $F_{0}$ has compact support, then, by Proposition 1, it can be concluded that $\mathrm{E}Y^{r}$ and $M_{F_{0}}(t)$ are consistently and efficiently estimated at $n^{-1/2}$ -rate by their respective plug-in NPMLEs.

Although Proposition 1 asserts that certain integral linear functionals can be consistently estimated at $n^{-1/2}$ -rate by the plug-in NPMLE $\psi(\hat{F}_{n})$ , which, when recentered at $\psi(F_{0})$ and $\sqrt{n}$ -rescaled, is asymptotically normal and efficient, two kinds of problems may arise that can make it difficult to employ the result for statistical inference:

a)

computation of the NPMLE $\hat{F}_{n}$ ,

b)

dependence of the variance $\|b_{F_{0}}\|_{2,P_{0}}^{2}$ on the unknown sampling distribution $P_{0}$ .

As for the former difficulty, the NPMLE $\hat{F}_{n}$ can be found by a one-step procedure computing the slope of the convex minorant of a certain function, cf. Groeneboom and Wellner (1992), pp. 62–63 (see also Vardi (1989) for a different approach). Its computation can, however, be avoided for the present purposes: as a by-product of Theorem 11.8 of van de Geer (2000), p. 217, which the assertion of Proposition 1 relies on, the recentered and $\sqrt{n}$ -rescaled plug-in NPMLE $\sqrt{n}(\psi(\hat{F}_{n})-\psi(F_{0}))$ is asymptotically approximable, up to an $o_{\mathbf{P}}(1)$ -error term, by $\sqrt{n}$ times the empirical average of the efficient influence function. This is an instance of the general fact that sequences of efficient estimators for functionals are asymptotically approximable by an empirical average of the efficient influence function, see, e.g., Lemma 2.9 in Bolthausen et al. (2002), p. 349. In fact, setting

[TABLE]

and noting that, from the definition of $b_{F_{0}}$ in (6), the term $\mathbb{P}_{n}b_{F_{0}}$ appearing in (5) can be written as $\tilde{\psi}_{n}-\psi(F_{0})$ , we have

[TABLE]

Thus, both $\sqrt{n}(\psi(\hat{F}_{n})-\psi(F_{0}))$ and $\sqrt{n}(\tilde{\psi}_{n}-\psi(F_{0}))$ are asymptotically normal and efficient. Moreover, estimators arising from $\tilde{\psi}_{n}$ may coincide with simple naïve estimators. For example,

•

if $\psi(F_{0})=\mathrm{E}Y$ , from (9), for $(a-\dot{a})(y)=y-1$ , we get the estimator $\tilde{\psi}_{n}=\bar{X}_{n}-1$ , which is the one we would suggest considering that $\mathrm{E}Y=\mathrm{E}X-\mathrm{E}Z=\mathrm{E}X-1$ ;

•

if $\psi(F_{0})=M_{Y}(t)$ for any fixed $t<1$ such that $\int_{\mathscr{Y}}e^{ty}\,\mathrm{d}F_{0}(y)<+\infty$ , then the estimator derived from (9), for $(a-\dot{a})(y)=(1-t)e^{ty}$ , is $\tilde{\psi}_{n}=(1-t)\times{n}^{-1}\sum_{i=1}^{n}e^{tX_{i}}$ , which is the one we would suggest taking into account that $M_{Y}(t)=M_{X}(t)/M_{Z}(t)=(1-t)M_{X}(t)$ , where $M_{Z}(t)=(1-t)^{-1}$ , $t<1$ , is the m.g.f. of a standard exponential r.v. $Z$ . So, letting $M_{n}(t):=n^{-1}\sum_{i=1}^{n}e^{tX_{i}}$ , $t\in\mathbb{R}$ , be the empirical m.g.f. for the random sample $X_{1},\,\ldots,\,X_{n}$ , it turns out that $\tilde{\psi}_{n}={M_{n}(t)}/{M_{Z}(t)}$ .
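As a rough numerical illustration of the two examples above, the following Python sketch simulates the exponential convolution model and evaluates the empirical averages $n^{-1}\sum_{i=1}^{n}(a-\dot{a})(X_{i})$ , which is the form of $\tilde{\psi}_{n}$ suggested by the two examples. The choice $Y\sim\mathrm{Uniform}(0,\,1)$ for $F_{0}$ , the sample size, the seed and the function names are illustrative assumptions that do not come from the article, and the sketch does not compute the NPMLE $\hat{F}_{n}$ .

```python
import math
import random


def simulate_sample(n, rng):
    """Draw X_i = Y_i + Z_i with Y ~ Uniform(0, 1) (assumed F_0) and Z ~ Exp(1)."""
    return [rng.random() + rng.expovariate(1.0) for _ in range(n)]


def psi_tilde(xs, a_minus_adot):
    """Empirical average of (a - a')(X_i)."""
    return sum(a_minus_adot(x) for x in xs) / len(xs)


rng = random.Random(0)
xs = simulate_sample(200_000, rng)

# Mean of Y: a(y) = y, so (a - a')(x) = x - 1.
mean_hat = psi_tilde(xs, lambda x: x - 1.0)

# M_Y(t) at t = 0.3: a(y) = exp(t * y), so (a - a')(x) = (1 - t) * exp(t * x).
t = 0.3
mgf_hat = psi_tilde(xs, lambda x: (1.0 - t) * math.exp(t * x))

true_mean = 0.5                      # mean of Uniform(0, 1)
true_mgf = (math.exp(t) - 1.0) / t   # m.g.f. of Uniform(0, 1) at t

print(mean_hat, true_mean)
print(mgf_hat, true_mgf)
```

The value $t=0.3$ is chosen below $1/2$ so that $e^{tX}$ has finite variance in this illustrative set-up; both averages should then lie close to the corresponding true values, up to Monte Carlo error.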

As for the difficulty listed in b), by the plug-in approach, replacing the asymptotic variance $\|b_{F_{0}}\|_{2,P_{0}}^{2}$ with a consistent estimator $S_{n}^{2}$ leads to the following assertion.

Corollary 1

Under the conditions of Proposition 1, if, in addition, $S_{n}^{2}\,\xrightarrow{\mathrm{P}}\,\|b_{F_{0}}\|_{2,P_{0}}^{2}$ , where “ $\xrightarrow{\mathrm{P}}$ ” denotes convergence in $P_{0}^{n}$ -probability, then

[TABLE]

Once the efficient asymptotic variance in (7) is replaced with a consistent sequence of estimators, asymptotic normality of the Studentized version of $\tilde{\psi}_{n}$ allows one to carry out pointwise inference on linear functionals by constructing confidence intervals or tests. For every $0<\alpha<1$ , let $z_{\alpha/2}$ be the $(1-\alpha/2)$ -quantile of a standard normal distribution, i.e., $\Phi(z_{\alpha/2})=1-\alpha/2$ , where $\Phi(\cdot)$ stands for the c.d.f. of a standard normal. Then,

[TABLE]

is an approximate $(1-\alpha)$ -level confidence interval for $\psi(F_{0})$ .
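A minimal sketch of how such an interval might be computed in practice is given below. It assumes the usual form $\tilde{\psi}_{n}\pm z_{\alpha/2}S_{n}/\sqrt{n}$ and takes for $S_{n}^{2}$ the sample variance of the values $(a-\dot{a})(X_{i})$ , which is one possible consistent estimator when these values have finite variance; both choices, as well as the simulated model with $Y\sim\mathrm{Uniform}(0,\,1)$ and the function name, are assumptions made for illustration only.

```python
import math
import random
from statistics import NormalDist, mean, variance


def confidence_interval(values, alpha=0.05):
    """Normal-approximation interval: average +/- z_{alpha/2} * S_n / sqrt(n)."""
    n = len(values)
    centre = mean(values)
    s_n = math.sqrt(variance(values))  # sample standard deviation of (a - a')(X_i)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    half_width = z * s_n / math.sqrt(n)
    return centre - half_width, centre + half_width


rng = random.Random(1)
# Assumed model: Y ~ Uniform(0, 1), Z ~ Exp(1); functional E[Y], so (a - a')(x) = x - 1.
values = [rng.random() + rng.expovariate(1.0) - 1.0 for _ in range(5_000)]
lower, upper = confidence_interval(values)
print(lower, upper)
```

Tests at level $\alpha$ can be obtained from the same quantities by rejecting a hypothesized value of $\psi(F_{0})$ whenever it falls outside the interval.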

Remark 3

Asymptotic normality of the plug-in NPMLE for linear functionals of the mixing distribution can be employed to establish asymptotic normality for non-linear functionals. Suppose, for instance, that $F\mapsto\varphi(F)$ is defined as

[TABLE]

where the function $g:\,\mathbb{R}\rightarrow\mathbb{R}$ has non-zero derivative at $\psi(F_{0})$ , denoted by $\dot{g}(\psi(F_{0}))$ . Asymptotic normality of $\sqrt{n}(\varphi(\hat{F}_{n})-\varphi(F_{0}))$ then follows from asymptotic normality of $\sqrt{n}(\psi(\hat{F}_{n})-\psi(F_{0}))$ by the delta method, see, e.g., Chap. 3 in van der Vaart (1998), pp. 25–34. So, if the convergence in (5) takes place, then

[TABLE]

where efficiency of $\psi(\hat{F}_{n})$ carries over into efficiency of $\varphi(\hat{F}_{n})$ , see ibid., p. 386, for details.

Alternatively, set the position

[TABLE]

under the condition

[TABLE]

which requires that, in probability, $\varphi(\hat{F}_{n})-\varphi(F_{0})$ behaves asymptotically as $\psi(\hat{F}_{n})-\psi(F_{0})$ , so that, after $\sqrt{n}$ -rescaling, the two differences have the same limiting distribution. In fact, if the convergence in (5) takes place, then Slutsky’s lemma implies that

[TABLE]

see also the Remark of van de Geer (2000) on p. 223.
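The delta-method step of Remark 3 can be illustrated numerically. The sketch below takes $g=\log$ and $\psi(F_{0})=M_{Y}(t)$ , so that $\varphi(F_{0})$ is the cumulant generating function of $Y$ at $t$ , and uses the empirical average $\tilde{\psi}_{n}$ in place of the plug-in NPMLE $\psi(\hat{F}_{n})$ , relying on their asymptotic equivalence. The simulated model with $Y\sim\mathrm{Uniform}(0,\,1)$ , the use of the sample variance for $S_{n}^{2}$ and all names are illustrative assumptions.

```python
import math
import random
from statistics import mean, variance


def delta_method_estimate(values, g, g_dot):
    """Return g(psi_hat) and the delta-method standard error |g'(psi_hat)| * S_n / sqrt(n)."""
    n = len(values)
    psi_hat = mean(values)
    s_n = math.sqrt(variance(values))
    return g(psi_hat), abs(g_dot(psi_hat)) * s_n / math.sqrt(n)


rng = random.Random(2)
t = 0.3
# Values (a - a')(X_i) for a(y) = exp(t * y), with Y ~ Uniform(0, 1) and Z ~ Exp(1) (assumed).
values = [
    (1.0 - t) * math.exp(t * (rng.random() + rng.expovariate(1.0)))
    for _ in range(50_000)
]
phi_hat, std_err = delta_method_estimate(values, math.log, lambda u: 1.0 / u)
phi_true = math.log((math.exp(t) - 1.0) / t)  # log m.g.f. of Uniform(0, 1) at t
print(phi_hat, phi_true, std_err)
```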

Convolution with the double-exponential (Laplace) density

In this paragraph, we consider the case of main interest of the article, namely, asymptotically efficient maximum likelihood estimation of linear functionals of the mixing distribution in a convolution model with the (standard) Laplace kernel density. It has been recalled in Sect. 1 that, for a one-parameter (location only) Laplace model with parameter $\theta$ , the sample median $\hat{\theta}_{n}$ is an MLE, consistent and asymptotically efficient, even if, for small sample sizes, it may not be the best estimator to use because there exist other unbiased estimators with smaller variances, which are therefore more efficient, see, e.g., Remark 2.6.2 in Kotz et al. (2001), p. 82. More precisely, for a sample of odd size $n$ from a general $\mathrm{Laplace}\,(\theta,\,s)$ distribution, the variance of $\hat{\theta}_{n}$ is equal to

[TABLE]

where $k(\cdot)$ is the density of a standard Laplace distribution as defined in (2), while the asymptotic variance is equal to

[TABLE]

It is worth observing that the sample mean $\bar{X}_{n}$ is also asymptotically normal with mean $\theta$ , but the asymptotic relative efficiency (ARE) of the median to the mean, namely, the ratio of the variance of the sample mean to the asymptotic variance of the sample median, equals $2$ :

[TABLE]
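The factor $2$ can be checked by a small Monte Carlo experiment. The sketch below draws repeated samples from a standard Laplace distribution, generated as the difference of two independent standard exponential variables, and compares the empirical variances of the sample mean and of the sample median. The sample size, the number of replications, the seed and the function names are illustrative choices.

```python
import random
from statistics import median, pvariance


def laplace_sample(n, rng, theta=0.0):
    """Laplace(theta, 1) draws: theta plus the difference of two independent Exp(1) variables."""
    return [theta + rng.expovariate(1.0) - rng.expovariate(1.0) for _ in range(n)]


def relative_efficiency(n, replications, rng):
    """Monte Carlo ratio var(sample mean) / var(sample median)."""
    means, medians = [], []
    for _ in range(replications):
        xs = laplace_sample(n, rng)
        means.append(sum(xs) / n)
        medians.append(median(xs))
    return pvariance(means) / pvariance(medians)


rng = random.Random(3)
ratio = relative_efficiency(n=201, replications=2_000, rng=rng)
print(ratio)
```

The printed ratio should lie in the vicinity of $2$ , up to finite-sample effects and Monte Carlo error; the value $2$ itself is only the limit as $n\rightarrow+\infty$ .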

On a side note, we recall that, for any function $g$ differentiable at $\theta$ , with derivative $\dot{g}(\theta)$ , the plug-in MLE $g(\hat{\theta}_{n})$ is also asymptotically efficient, with

[TABLE]

see, e.g., Lehmann and Casella (1998), p. 440.

In what follows, we aim at giving results on asymptotically efficient maximum likelihood estimation of linear functionals of the mixing distribution, beyond the case of a degenerate mixing distribution localized at a point $\theta$ on the real line. As recalled in Sect. 1, in the deconvolution problem with the Laplace kernel density, an NPMLE $\hat{F}_{n}$ always exists and consistency at a continuous distribution function $F_{0}$ holds, but little is known about the asymptotic behaviour of the plug-in NPMLE for linear functionals. The following proposition states that, except for the above recalled degenerate case, estimation of integral linear functionals at $n^{-1/2}$ -rate is impossible.

Proposition 2

Let $F_{0}$ be a non-degenerate probability measure supported on $\mathscr{Y}$ . Let $\psi(F_{0})$ be any integral linear functional evaluated at $F_{0}$ . Then, there exists no estimator sequence for $\psi(F_{0})$ that is regular at $F_{0}$ .

Some comments on Proposition 2, whose proof is deferred to Appendix B, are in order. It states that no integral linear functional is estimable at parametric rate, in particular, by the plug-in NPMLE $\psi(\hat{F}_{n})$ . One can thus expect estimation, performed by any method, only at slower rates and, possibly, with a non-Gaussian limiting distribution. The theorem we invoke to establish Proposition 2, however, does not give any indication about which rates to expect when estimation at $n^{-1/2}$ -rate fails, an issue that requires further investigation. A related open question concerns the possible extension of the negative result of Proposition 2 to convolution models with general kernel densities that are symmetric about zero, but not differentiable at it, a feature that seems to play a crucial role in causing failure of estimation at parametric rate. To sum up, only in the case of a degenerate mixing distribution at a point $\theta$ is the MLE $\hat{\theta}_{n}$ asymptotically efficient for the location parameter and the plug-in MLE $g(\hat{\theta}_{n})$ asymptotically efficient for any $g(\theta)$ , with $g$ differentiable at $\theta$ .

3 Final remarks

In this article, we have studied asymptotically efficient maximum likelihood estimation of linear functionals of the mixing distribution in a standard additive measurement error model, when the error has either the exponential or the Laplace distribution. In the former case, the plug-in NPMLE of certain linear functionals is $\sqrt{n}$ -consistent, asymptotically normal, efficient and equivalent to naïve estimators that are empirical averages of a given transformation of the observations. In the latter case, instead, even if the kernel is generated by symmetrization about the origin of the exponential density, leaving aside the degenerate case of a single Laplace model in which the MLE, the sample median, is asymptotically efficient for the location parameter, asymptotically efficient estimation of linear functionals completely fails, in the sense that estimation at $n^{-1/2}$ -rate is impossible for linear functionals of non-degenerate mixing distributions. An open question then is whether this negative result extends to general kernel densities symmetric about zero, but not differentiable at zero, a feature that seems to play a crucial role in causing the failure.

Appendix A

In this section, we present the proof of Proposition 1 on the asymptotic efficiency of the plug-in NPMLE for integral linear functionals of the mixing distribution in a convolution model with the exponential kernel density on $[0,\,+\infty)$ .

Proof of Proposition 1

We appeal to Theorem 2.1 of van de Geer (1997), p. 21 (see also Theorem 11.8 of van de Geer (2000), pp. 217–220, for a slightly more general version) and, in showing that Conditions 1–4 are satisfied, we follow the guidelines given in Sect. 3 of van de Geer (1997), pp. 24–27.

Verification of Condition 1. (Consistency and rates).

Under Assumption $(\bf{A1})$ that $M_{F_{0}}(1)<+\infty$ , the MLE $p_{\hat{F}_{n}}$ converges in the Hellinger distance $d_{\mathrm{H}}$ , defined as the $L^{2}$ -distance between the square-root densities, at the rate $O_{\mathbf{P}}(n^{-1/3})$ . In symbols, for $\delta_{n}:=n^{-1/3}$ ,

[TABLE]

The result can be obtained by applying Theorem 7.4 in van de Geer (2000), pp. 99–100, see also ibid., p. 124. As a consequence, see, e.g., Corollary 7.5, ibid., p. 100,

[TABLE]

where $\delta_{n}^{2}=n^{-2/3}=o(n^{-1/2})$ . Consistency of $\psi(\hat{F}_{n})$ is guaranteed by Assumption $(\bf{A2})$ .

Verification of Condition 2. (Existence of the worst possible sub-directions and efficient influence functions. Differentiability of $\psi$ in a neighborhood of $F_{0}$ ).

For real numbers $M>M_{F_{0}}(1)>0$ and $r>0$ , let

[TABLE]

be a Hellinger-type ball centered at $p_{0}$ with radius $r>0$ . For every $\alpha\in[0,\,1)$ and $F\in\mathscr{P}_{0}$ , let $F_{\alpha}:=\alpha F+(1-\alpha)F_{0}$ . We prove

(a)

existence of the worst possible sub-directions $h_{F_{\alpha}}$ such that $h_{F_{\alpha}}\in L^{2}(F_{\alpha})$ and $\int h_{F_{\alpha}}\,\mathrm{d}F_{\alpha}=0$ ;

(b)

existence of the efficient influence functions $b_{F_{\alpha}}:=A_{F_{\alpha}}h_{F_{\alpha}}$ , where $A_{F_{\alpha}}h_{F_{\alpha}}(\cdot):=\mathrm{E}[h_{F_{\alpha}}(Y)\mid X=\cdot]$ ;

(c)

differentiability of $\psi$ at $F_{\alpha}$ :

[TABLE]

where $A^{*}b_{F_{\alpha}}(\cdot):=\mathrm{E}[b_{F_{\alpha}}(X)\mid Y=\cdot]$ .

For every $\alpha\in[0,\,1)$ , we prove the existence of $h_{F_{\alpha}}$ such that the corresponding $b_{F_{\alpha}}=A_{F_{\alpha}}h_{F_{\alpha}}$ satisfies $A^{*}b_{F_{\alpha}}(y)=a(y)-\psi({F_{\alpha}})$ for $F_{\alpha}$ -almost all $y$ ’s. We proceed by first deriving the expression of $b_{F_{\alpha}}$ as a solution of (11) and then proving the existence of the corresponding worst possible sub-direction $h_{F_{\alpha}}$ as required in (a) and (b). The function $b_{F_{\alpha}}$ has to satisfy

[TABLE]

where $\mathscr{X}=[y_{\textrm{min}},\,+\infty)$ , for $F_{\alpha}$ -almost all $y$ ’s. Differentiating both sides of (12) with respect to $y$ , we get

[TABLE]

Using constraint (12) in (13), we obtain that $a(y)-\psi(F_{\alpha})-b_{F_{\alpha}}(y)=\dot{a}(y)$ , whence $b_{F_{\alpha}}(y)=a(y)-\dot{a}(y)-\psi(F_{\alpha})$ . The solution is unique up to sets of $F_{\alpha}$ -measure zero. By an extension to $\mathscr{X}$ , $b_{F_{\alpha}}$ is then defined as in (6).
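The solution $b_{F_{\alpha}}=a-\dot{a}-\psi(F_{\alpha})$ can be checked numerically. The sketch below assumes that the constraint has the form $\int_{y}^{+\infty}b(x)e^{-(x-y)}\,\mathrm{d}x=a(y)-\psi(F_{\alpha})$ , which is what the conditional density $e^{-(x-y)}$ of $X$ given $Y=y$ on $[y,\,+\infty)$ suggests, and verifies it by a trapezoidal rule for the illustrative choice $a=\cos$ ; the constant standing in for $\psi(F_{\alpha})$ , the evaluation point and the function names are hypothetical.

```python
import math


def adjoint_score(b, y, upper=60.0, steps=200_000):
    """Trapezoidal approximation of E[b(X) | Y = y] = int_y^inf b(x) exp(-(x - y)) dx."""
    h = upper / steps
    total = 0.5 * (b(y) + b(y + upper) * math.exp(-upper))
    for i in range(1, steps):
        u = i * h
        total += b(y + u) * math.exp(-u)
    return total * h


def a(y):
    """Illustrative bounded function with bounded derivative."""
    return math.cos(y)


def a_dot(y):
    """Derivative of a."""
    return -math.sin(y)


psi = 0.25  # stands in for psi(F_alpha); the identity holds for any constant


def b(x):
    """Candidate solution b = a - a' - psi."""
    return a(x) - a_dot(x) - psi


y0 = 0.7
lhs = adjoint_score(b, y0)
rhs = a(y0) - psi
print(lhs, rhs)
```

The agreement of the two printed values reflects the identity $\frac{\mathrm{d}}{\mathrm{d}x}[-a(x)e^{-(x-y)}]=(a-\dot{a})(x)e^{-(x-y)}$ , valid for bounded differentiable $a$ .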

(a) Existence of $h_{F_{\alpha}}\in L^{2}(F_{\alpha})$ such that $\int h_{F_{\alpha}}\,\mathrm{d}F_{\alpha}=0$ .

Recall that

[TABLE]

Defined the function $I_{F_{\alpha}}:\,\mathscr{X}\rightarrow\mathbb{R}$ as

[TABLE]

integration by parts yields that

[TABLE]

because $a(y_{\textrm{min}})<+\infty$ and ${F_{\alpha}}(y_{\textrm{min}})=0$ by part (iii) of Assumption $(\bf{A3})$ combined with the fact that $F\in\mathscr{P}_{0}$ . Analogously, since $\dot{a}_{+}(y_{\mathrm{min}})<+\infty$ ,

[TABLE]

Then, defined the mapping $h_{F_{\alpha}}:\,\mathscr{Y}\rightarrow\mathbb{R}$ as

[TABLE]

by previous computations, we have that

[TABLE]

In order to check that $h_{F_{\alpha}}$ has expected value $\int h_{F_{\alpha}}\,\mathrm{d}F_{\alpha}=0$ , it suffices to note that, by applying the tower rule twice and using equalities (15) and (12),

[TABLE]

Next, we show that, for every $\alpha\in[0,\,1)$ and $F\in\mathscr{P}_{0}$ ,

[TABLE]

which implies that $h_{F_{\alpha}}\in L^{2}(F_{\alpha})$ . Noting that

[TABLE]

we can rewrite $h_{F_{\alpha}}$ in (14) as

[TABLE]

To conclude that $h_{F_{\alpha}}$ is bounded on $\mathscr{Y}$ , we observe two facts. First,

[TABLE]

where $|\psi(F_{0})|<+\infty$ by Assumption $(\bf{A0})$ and $\sup_{y\in\mathscr{Y}}|a(y)|<+\infty$ by parts (i) and (ii) of Assumption $(\bf{A3})$ . Second, for every $\alpha\in[0,\,1)$ and $F\in\mathscr{P}_{0}$ ,

[TABLE]

where $(\mathrm{d}F_{0}/\mathrm{d}F_{\alpha})$ exists because $F_{\alpha}$ dominates $F_{0}$ . The bound in (17) holds uniformly over $\mathscr{P}_{0}$ . Therefore,

[TABLE]

by Assumptions $(\bf{A0})$ , $(\bf{A1})$ , $(\bf{A3})$ – $(\bf{A5})$ and the fact that $M_{F}(1)$ is bounded by a constant $M$ on $\mathscr{P}_{0}$ .

(b)–(c) Definition of $b_{F_{\alpha}}$ and differentiability of $\psi$ at $F_{\alpha}$ .

The function $b_{F_{\alpha}}$ defined in (6), which solves equation (12), is such that $A_{F_{\alpha}}h_{F_{\alpha}}(x)=b_{F_{\alpha}}(x)$ for every $x\in\mathscr{X}$ , in virtue of (15).

Verification of Condition 3. (Control on the worst possible sub-directions $h_{F_{\alpha}}$ ).

Recall that $M_{F}(1)$ , which appears in the bound on $h_{F_{\alpha}}$ derived in the verification of Condition 2, is bounded by $M$ on $\mathscr{P}_{0}$ . Besides, the factor $(1-\alpha)^{-1}$ , which diverges to $+\infty$ as $\alpha\rightarrow 1$ , is counterbalanced by $1-\alpha$ . There thus exists a positive constant $B\equiv B(M,\,r)<+\infty$ such that

[TABLE]

Verification of Condition 4. (Control on the efficient influence functions $b_{F_{\alpha}}$ ).

The information for estimating $\psi(F_{0})$ is positive and finite, $0<\|b_{F_{0}}\|_{2,P_{0}}^{2}<+\infty$ . Also, the influence functions are uniformly bounded. In fact, for every $x\in\mathscr{X}$ , we have $|b_{F_{\alpha}}(x)|\leq|b_{F_{0}}(x)|+|\psi(F)-\psi(F_{0})|$ so that

[TABLE]

by Assumptions $(\bf{A0})$ , $(\bf{A3})$ (parts (i) and (ii)) and $(\bf{A4})$ .

Next, to show that relationships (2.10) and (2.11) in van de Geer (1997), p. 21, are satisfied, we follow the reasoning illustrated in Sect. 3.4, ibid., pp. 26–27, and check that, for some positive sequence $r_{n}\rightarrow 0$ ,

[TABLE]

where $\mathscr{P}_{n}$ is the set obtained from $\mathscr{P}_{0}$ in (10) by replacing $r$ with $r_{n}$ . Note that, by linearity of $\psi$ , $b_{F_{\alpha}}-b_{F_{0}}=-[\psi(F_{\alpha})-\psi(F_{0})]=-\alpha[\psi(F)-\psi(F_{0})]=-\alpha\int_{\mathscr{Y}}a(y)\,\mathrm{d}(F-F_{0})(y)$ . Using integration by parts, together with conditions (i) and (ii) of Assumption $(\bf{A3})$ , which jointly guarantee that $a$ is bounded on $\mathscr{Y}$ , as well as the fact that every $F\in\mathscr{P}_{n}$ has the same support as $F_{0}$ , we find that $\int_{\mathscr{Y}}a(y)\,\mathrm{d}(F-F_{0})(y)=-\int_{\mathscr{Y}}\dot{a}(y)(F-F_{0})(y)\,\mathrm{d}y$ . The latter integral can be bounded above by applying inequality (30) in Scricciolo (2018), p. 358, which relates the $L^{1}$ -Wasserstein or Kantorovich distance $W_{1}(F,\,F_{0})=\|F-F_{0}\|_{1}$ between distribution functions $F$ and $F_{0}$ to the Hellinger distance $d_{\mathrm{H}}\equiv d_{\mathrm{H}}(p_{F},\,p_{0})=\|p_{F}^{1/2}-p_{0}^{1/2}\|_{2}$ between the corresponding mixtures of exponential densities,

[TABLE]

where “ $\lesssim$ ” indicates inequality valid up to a constant multiple that is universal or fixed within the context, but anyway inessential for our purposes because the bound is uniform over $\mathscr{P}_{n}$ . The inequality is obtained by setting $p=1$ and $\beta=1$ , the latter value being determined by condition (29), ibid., p. 358, on the Fourier transform of a standard exponential density. By Assumption $(\bf{A4})$ , which guarantees that $\dot{a}$ is bounded on $\mathscr{Y}$ , and inequality (20), we have

[TABLE]

where $\lim_{n\rightarrow+\infty}d_{\mathrm{H}}\log^{3/2}(1/d_{\mathrm{H}})=0$ because $d_{\mathrm{H}}\leq r_{n}$ on $\mathscr{P}_{n}$ . The limit in (19) follows.

It remains to check that, for the collection of functions $\mathscr{I}:=\{b_{F_{\alpha}}:\,d_{\mathrm{H}}(p_{F},\,p_{0})\leq r,\,\,\,0\leq\alpha<1\}$ , the bracketing integral

[TABLE]

where $N_{[]}(\varepsilon,\,\mathscr{I},\,L^{2}(P_{0}))$ is the $\varepsilon$ -bracketing number of $\mathscr{I}$ for the $L^{2}(P_{0})$ -metric, namely, the smallest number of $\varepsilon$ -brackets needed to cover $\mathscr{I}$ , see, e.g., Definition 2.1.6 (Bracketing numbers) in van der Vaart and Wellner (1996), p. 83, or Definition 2.2 in van de Geer (2000), p. 16. Under Assumption $(\bf{A3})$ (parts (i) and (ii)) and Assumption $(\bf{A4})$ , by the same arguments as before, the $L^{2}(P_{0})$ -distance between the lower and upper functions $b_{F^{L}_{\alpha}}$ and $b_{F^{U}_{\alpha}}$ of every bracket $[b_{F^{L}_{\alpha}},\,b_{F^{U}_{\alpha}}]$ can be bounded above as follows:

[TABLE]

By Theorem 2.7.5 in van der Vaart and Wellner (1996), pp. 159–162, the bracketing entropy of the class of all uniformly bounded, monotone functions on the real line is of the order $O(1/\varepsilon)$ . Therefore, $\log N_{[]}(\varepsilon,\,\mathscr{I},\,L^{2}(P_{0}))=O(1/\varepsilon)$ and the integral in (21) is finite because $\varepsilon^{-1/2}$ is integrable in a neighborhood of zero. The proof of Condition 4 is thus complete.

The conclusion of Theorem 2.1 follows:

[TABLE]

where $\mathbb{P}_{n}b_{F_{0}}$ has expected value $P_{0}b_{F_{0}}=0$ , as can be deduced from the tower-rule identity used in part (a) of the verification of Condition 2, taken at $\alpha=0$ . Hence,

[TABLE]

and the proof is complete. ∎

Appendix B

In this section, we present the proof of Proposition 2, which states that no integral linear functional of a non-degenerate mixing distribution in a convolution model with the Laplace kernel density is estimable at the parametric rate, in particular by the maximum likelihood method.

Proof of Proposition 2

At the outset, let $\psi(F_{0})$ be any integral linear functional, as defined in (4), evaluated at the “point” $F_{0}$ . Arguments are laid down to identify functions $a$ (if any) whose corresponding functionals are estimable at $n^{-1/2}$ -rate. To this end, we appeal to van der Vaart’s differentiability theorem, which provides a necessary and sufficient condition for pathwise differentiability of a (not necessarily linear) functional, see Theorem 3.1, Corollaries 3.2, 3.3 and Lemma 3.4 of van der Vaart (1991), pp. 183–185, or Theorem 3.1, Corollaries 3.1, 3.2 and Proposition 3.1 in Groeneboom and Wellner (1992), pp. 24–28. If differentiability of a functional fails, then, by Theorem 2.1 of van der Vaart (1991), p. 181, the functional is not estimable at $n^{-1/2}$ -rate, see also Chapter 25 in van der Vaart (1998), pp. 358–432. A necessary and sufficient condition for differentiability of an integral linear functional $\psi(F_{0})$ is that, for $\mathscr{X}=\mathbb{R}$ , there exists a function $b:\,\mathscr{X}\rightarrow\mathbb{R}$ , with $b\in L^{2}(P_{0})$ , satisfying

[TABLE]

explicitly,

[TABLE]

where the conditional density of $X$ , given $Y=y$ , is $k(x-y)=e^{-|x-y|}/2$ , see § 7 in van der Vaart (1991), pp. 189–191, or Example 3.2 in Groeneboom and Wellner (1992), pp. 30–31. If an integral linear functional $\psi(F_{0})$ is regularly estimable, then the condition in (22) must necessarily be satisfied and a regular estimator for $\psi(F_{0})$ is given by $\mathbb{P}_{n}b=n^{-1}\sum_{i=1}^{n}b(X_{i})$ . The following arguments are aimed at deriving the expression of $b$ . Let $y\in\mathscr{Y}$ be fixed. For a function $a:\,\mathscr{Y}\rightarrow\mathbb{R}$ such that

[TABLE]

where, in the case when $\mathscr{Y}$ is bounded, $a$ (hence its derivative $\dot{a}$ ) is taken to be identically equal to zero on $\mathscr{Y}^{c}$ so that the limit is automatically verified, integration by parts yields that

[TABLE]

whence

[TABLE]

The integral analogous to the one on the left-hand side of the integration-by-parts display above, but with the right branch of the Laplace density, can be dealt with similarly. For some $a$ satisfying the limit in (23) and also

[TABLE]

where the same proviso on $a$ and $\dot{a}$ applies for the case when $\mathscr{Y}$ is bounded, we get

[TABLE]

whence

[TABLE]
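For the reader's convenience, the two integration-by-parts computations can be sketched as follows. The sketch assumes that $a$ is differentiable with derivative $\dot{a}$ and that the boundary terms $a(x)e^{-|x-y|}$ vanish as $x\rightarrow\mp\infty$ , which is how we read the limit conditions (23) and (24). The grouping of terms is chosen for illustration and need not match the displays of the proof line by line.

```latex
% Left branch (x < y), where k(x-y) = e^{-(y-x)}/2:
\int_{-\infty}^{y} a(x)\,\frac{e^{-(y-x)}}{2}\,dx
  = \Big[a(x)\,\frac{e^{-(y-x)}}{2}\Big]_{-\infty}^{y}
    - \int_{-\infty}^{y} \dot{a}(x)\,\frac{e^{-(y-x)}}{2}\,dx
  = \frac{a(y)}{2} - \int_{-\infty}^{y} \dot{a}(x)\,k(x-y)\,dx ,
\qquad\text{so}\qquad
\frac{a(y)}{2} = \int_{-\infty}^{y} \big[a(x)+\dot{a}(x)\big]\,k(x-y)\,dx .

% Right branch (x > y), where k(x-y) = e^{-(x-y)}/2:
\int_{y}^{+\infty} a(x)\,\frac{e^{-(x-y)}}{2}\,dx
  = \Big[-a(x)\,\frac{e^{-(x-y)}}{2}\Big]_{y}^{+\infty}
    + \int_{y}^{+\infty} \dot{a}(x)\,\frac{e^{-(x-y)}}{2}\,dx
  = \frac{a(y)}{2} + \int_{y}^{+\infty} \dot{a}(x)\,k(x-y)\,dx ,
\qquad\text{so}\qquad
\frac{a(y)}{2} = \int_{y}^{+\infty} \big[a(x)-\dot{a}(x)\big]\,k(x-y)\,dx .
```

The opposite signs in front of $\dot{a}$ on the two branches are the origin of the factor $\mathrm{sgn}(\cdot-y)$ in the integrand.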

Summing side by side (25) and (26) and subtracting $\psi(F_{0})$ on both sides of the resulting equation, we obtain

[TABLE]
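As a numerical sanity check, the identity obtained by summing the two branches can be read, before $\psi(F_{0})$ is subtracted, as $a(y)=\int[a(x)-\mathrm{sgn}(x-y)\dot{a}(x)]\,k(x-y)\,dx$ . The Python sketch below evaluates the right-hand side by a composite Simpson rule for the test function $a(x)=\sin x$ . The test function, the truncation parameter `cutoff` and the helper names are illustrative assumptions and are not part of the paper.

```python
import math


def laplace_kernel(z):
    """Standard Laplace density k(z) = exp(-|z|) / 2."""
    return 0.5 * math.exp(-abs(z))


def simpson(f, lo, hi, n=20000):
    """Composite Simpson rule on [lo, hi] with an even number n of subintervals."""
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3.0


def smoothed_rhs(a, a_dot, y, cutoff=40.0):
    """Approximate the integral of [a(x) - sgn(x - y) a'(x)] k(x - y) over the real line.

    The two branches of the kernel are integrated separately so that the kink
    at x = y falls on an endpoint; the tails beyond y +/- cutoff are dropped
    (an assumption that is harmless for functions of at most polynomial growth).
    """
    # On the left branch sgn(x - y) = -1, so the integrand is (a + a') k.
    left = simpson(lambda x: (a(x) + a_dot(x)) * laplace_kernel(x - y), y - cutoff, y)
    # On the right branch sgn(x - y) = +1, so the integrand is (a - a') k.
    right = simpson(lambda x: (a(x) - a_dot(x)) * laplace_kernel(x - y), y, y + cutoff)
    return left + right


if __name__ == "__main__":
    for y in (-1.5, 0.0, 0.7, 2.0):
        approx = smoothed_rhs(math.sin, math.cos, y)
        print(f"y = {y:5.2f}   a(y) = {math.sin(y): .6f}   integral = {approx: .6f}")
```

For each value of $y$ the two printed columns should agree up to quadrature error. The same function can be called with any other pair `(a, a_dot)` for which the boundary terms vanish.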

In order to remove the dependence of the function $a(\cdot)-\mathrm{sgn}(\cdot-y)\dot{a}(\cdot)-\psi(F_{0})$ on $y$ , the derivative $\dot{a}(\cdot)$ must be equal to zero, which means that $a(\cdot)$ is identically equal to a constant on ${\mathscr{Y}}$ and the functional is trivially equal to that constant. We conclude that there exists no integral linear functional $\psi(F_{0})$ of a non-degenerate mixing distribution $F_{0}$ that can be estimated at $n^{-1/2}$ -rate. This completes the proof. ∎

References

Billingsley P (1995) Probability and measure, 3rd ed. Wiley, New York

Bolthausen E, Perkins E, van der Vaart A (2002) Lectures on probability theory and statistics. Ecole d’Eté de Probabilités de Saint-Flour XXIX – 1999. Bernard P (ed) Lecture Notes in Mathematics, Vol 1781. Springer-Verlag, Berlin, pp 331–457

Buonaccorsi JP (2010) Measurement error: models, methods, and applications. Chapman & Hall/CRC Press, Boca Raton, FL

Buzas JS, Stefanski LA, Tosteson TD (2005) Measurement error. In: Ahrens W, Pigeot I (eds) Handbook of epidemiology. Springer-Verlag, Berlin, Heidelberg, pp 729–765

Carroll LB (2017) Nuclear steam generator fitness-for-service assessment. In: Riznic J (ed) Steam generators for nuclear power plants. Woodhead Publishing, pp 511–523

Carroll RJ, Hall P (1988) Optimal rates of convergence for deconvolving a density. J Amer Statist Assoc 83:1184–1186

Chen J (2017) Consistency of the MLE under mixture models. Stat Sci 32:47–63

Daniels HE (1961) The asymptotic efficiency of a maximum likelihood estimator. In: Proc Fourth Berkeley Symp on Math Statist and Prob, Vol 1. Univ of Calif Press, pp 151–163

Dattner I, Goldenshluger A, Juditsky A (2011) On deconvolution of distribution functions. Ann Stat 39:2477–2501

Davidian M, Lin X, Morris JS, Stefanski LA (2014) The work of Raymond J. Carroll: The impact and influence of a statistician. Springer International Publishing, Switzerland

Easterling RG (1980) Statistical analysis of steam generator inspection plans and eddy current testing. Washington, DC: Division of Operating Reactors, Office of Nuclear Reactor Regulation, US Nuclear Regulatory Commission

Fan J (1991) On the optimal rates of convergence for nonparametric deconvolution problems. Ann Stat 19:1257–1272

Fuller WA (1987) Measurement error models. John Wiley, New York

Groeneboom P, Wellner JA (1992) Information bounds and nonparametric maximum likelihood estimation. Birkhäuser, Basel

Hall P, Lahiri SN (2008) Estimation of distributions, moments and quantiles in deconvolution problems. Ann Stat 36:2110–2134

Huber PJ (1967) The behavior of maximum likelihood estimates under nonstandard conditions. In: Proc Fifth Berkeley Symp on Math Statist and Prob, Vol 1. Univ of Calif Press, pp 221–233

Kotz S, Kozubowski TJ, Podgórski K (2001) The Laplace distribution and generalizations: a revisit with applications to communications, economics, engineering, and finance. Birkhäuser, Boston

Laplace P-S (1774) Mémoire sur la probabilité des causes par les événements. Mém Acad Roy Sci Paris (Savants étrangers) Tome VI:621–656

Lehmann EL, Casella G (1998) Theory of point estimation, 2nd ed. Springer-Verlag, New York

Lindsay BG (1983) The geometry of mixture likelihoods: a general theory. Ann Stat 11:86–94

Norton RM (1984) The double exponential distribution: using calculus to find a maximum likelihood estimator. Am Stat 38:135–136

Scricciolo C (2018) Bayes and maximum likelihood for $L^{1}$ -Wasserstein deconvolution of Laplace mixtures. Stat Methods Appl 27:333–362

Sollier T (2017) Nuclear steam generator inspection and testing. In: Riznic J (ed) Steam generators for nuclear power plants. Woodhead Publishing, pp 471–493

Stefanski L, Carroll RJ (1990) Deconvoluting kernel density estimators. Statistics 21:169–184

van de Geer S (1997) Asymptotic normality in mixture models. ESAIM Probab Stat 1:17–33

van de Geer SA (2000) Empirical processes in M-estimation. Cambridge University Press, Cambridge

van de Geer S (2003) Asymptotic theory for maximum likelihood in nonparametric mixture models. Comput Stat Data An 41:453–464

van der Vaart A (1991) On differentiable functionals. Ann Stat 19:178–204

van der Vaart AW (1998) Asymptotic statistics. Cambridge University Press, Cambridge

van der Vaart AW, Wellner JA (1996) Weak convergence and empirical processes. Springer-Verlag, New York

Vardi Y (1989) Multiplicative censoring, renewal processes, deconvolution and decreasing density: nonparametric estimation. Biometrika 76:751–761