An alternative local polynomial estimator for the error-in-variables problem

Xianzheng Huang; Haiming Zhou

arXiv:1701.06105·stat.ME·January 24, 2017


TL;DR

This paper introduces a new local polynomial estimator for error-in-variables regression that can be less biased and is numerically more stable than the existing estimator of Delaigle, Fan, and Carroll (2009), supported by asymptotic analysis and empirical validation.

Contribution

It proposes an alternative estimator that avoids kernel transformation, with rigorous asymptotic analysis, an efficient implementation, and demonstrated advantages over existing methods.

Findings

01. The new estimator can be less biased than the existing local polynomial estimator.

02. It is numerically more stable in simulations.

03. The estimator performs well on real motorcycle crash data.

Abstract

We consider the problem of estimating a regression function when a covariate is measured with error. Using the local polynomial estimator of Delaigle, Fan, and Carroll (2009) as a benchmark, we propose an alternative way of solving the problem without transforming the kernel function. The asymptotic properties of the alternative estimator are rigorously studied. A detailed implementing algorithm and a computationally efficient bandwidth selection procedure are also provided. The proposed estimator is compared with the existing local polynomial estimator via extensive simulations and an application to the motorcycle crash data. The results show that the new estimator can be less biased than the existing estimator and is numerically more stable.


Equations

$$E(Y_{j}\mid X_{j})=m(X_{j}),\qquad W_{j}=X_{j}+U_{j},$$

$$\hat{m}(x)=\mathbf{e}_{1}^{\mathrm{T}}\mathbf{S}_{n}^{-1}\mathbf{T}_{n},$$

$$\mathbf{S}_{n}=\begin{pmatrix}S_{n,0}(x)&\cdots&S_{n,p}(x)\\ \vdots&\ddots&\vdots\\ S_{n,p}(x)&\cdots&S_{n,2p}(x)\end{pmatrix},$$

$$\begin{cases}S_{n,\ell}(x)=n^{-1}\displaystyle\sum_{j=1}^{n}\left(\frac{X_{j}-x}{h}\right)^{\ell}K_{h}(X_{j}-x),&\text{for }\ell=0,1,\ldots,2p,\\ T_{n,\ell}(x)=n^{-1}\displaystyle\sum_{j=1}^{n}Y_{j}\left(\frac{X_{j}-x}{h}\right)^{\ell}K_{h}(X_{j}-x),&\text{for }\ell=0,1,\ldots,p,\end{cases}$$

$$E\{(W_{j}-x)^{\ell}L_{\ell,h}(W_{j}-x)\mid X_{j}\}=(X_{j}-x)^{\ell}K_{h}(X_{j}-x),\quad\text{for }\ell=0,1,\ldots,2p,$$

$$K_{U,\ell}(x)=i^{-\ell}\frac{1}{2\pi}\int e^{-itx}\frac{\phi_{K}^{(\ell)}(t)}{\phi_{U}(-t/h)}\,dt,\quad\text{for }\ell=0,1,\ldots,2p,$$

$$\begin{cases}\hat{S}_{n,\ell}(x)=n^{-1}\displaystyle\sum_{j=1}^{n}\left(\frac{W_{j}-x}{h}\right)^{\ell}L_{\ell,h}(W_{j}-x),&\text{for }\ell=0,1,\ldots,2p,\\ \hat{T}_{n,\ell}(x)=n^{-1}\displaystyle\sum_{j=1}^{n}Y_{j}\left(\frac{W_{j}-x}{h}\right)^{\ell}L_{\ell,h}(W_{j}-x),&\text{for }\ell=0,1,\ldots,p.\end{cases}$$

$$\phi_{m^{*}f_{W}}(t)=\phi_{mf_{X}}(t)\phi_{U}(t),$$

$$\hat{m}_{\mathrm{HZ}}(x)=\{\hat{f}_{X}(x)\}^{-1}\frac{1}{2\pi}\int e^{-itx}\frac{\phi_{\hat{m}^{*}\hat{f}_{W}}(t)}{\phi_{U}(t)}\,dt,$$

$$\mathcal{B}(x)=\int\mathcal{A}(w)D(x-w)\,dw=(\mathcal{A}*D)(x),$$

$$\lim_{t\to+\infty}t^{b}\phi_{U}(t)=c\quad\text{and}\quad\lim_{t\to+\infty}t^{b+1}\phi_{U}^{\prime}(t)=-cb$$

$$d_{0}|t|^{b_{0}}\exp(-|t|^{b}/d_{2})\leq|\phi_{U}(t)|\leq d_{1}|t|^{b_{1}}\exp(-|t|^{b}/d_{2})\quad\text{as }|t|\to\infty$$

$$\{f_{X}(x)\}^{-1}\left[E\{\mathcal{B}(x)\mid\mathbb{W}\}-m(x)\hat{f}_{X}(x)\right],$$

$$\hat{f}_{X}(x)=f_{X}(x)+\mu_{2}h^{2}f_{X}^{(2)}(x)/2+o_{P}(h^{2}).$$

$$E\{\hat{m}^{*}(w)\mid\mathbb{W}\}$$

$$\hat{f}_{W}(w)$$

$$E\{\mathcal{A}(w)\mid\mathbb{W}\}=A(w)+\mu_{2}M(w)h^{2}/2+o_{P}(h^{2}),$$

$$E\{\mathcal{B}(x)\mid\mathbb{W}\}=\{E(\mathcal{A}\mid\mathbb{W})*D\}(x)=B(x)+\mu_{2}h^{2}(M*D)(x)/2+o_{P}(h^{2}).$$

$$\frac{\mu_{2}h^{2}}{2f_{X}(x)}\left\{(M*D)(x)-m(x)f_{X}^{(2)}(x)\right\}+o_{P}(h^{2}),$$

$$\mathrm{Var}\{\hat{m}_{\mathrm{HZ}}(x)\mid\mathbb{W}\}=\mathrm{Var}\{\mathcal{B}(x)\mid\mathbb{W}\}f_{X}^{-2}(x)\{1+o_{P}(1)\},$$

$$\mathrm{Var}\{\mathcal{B}(x)\mid\mathbb{W}\}=\int D(x-w_{1})\int D(x-w_{2})\,\mathrm{Cov}\{\mathcal{A}(w_{1}),\mathcal{A}(w_{2})\mid\mathbb{W}\}\,dw_{2}\,dw_{1}.$$

$$\mathrm{Cov}\{\mathcal{A}(w_{1}),\mathcal{A}(w_{2})\mid\mathbb{W}\}=\mathrm{Cov}\{\hat{m}^{*}(w_{1}),\hat{m}^{*}(w_{2})\mid\mathbb{W}\}f_{W}(w_{1})f_{W}(w_{2})\{1+o_{P}(1)\}.$$

$$\mathrm{Cov}\{\hat{m}^{*}(w_{1}),\hat{m}^{*}(w_{2})\mid\mathbb{W}\}=\mathbf{e}_{1}^{\mathrm{T}}(\mathbf{G}_{1}^{\mathrm{T}}\mathbf{W}_{1}\mathbf{G}_{1})^{-1}(\mathbf{G}_{1}^{\mathrm{T}}\boldsymbol{\Sigma}_{12}\mathbf{G}_{2})(\mathbf{G}_{2}^{\mathrm{T}}\mathbf{W}_{2}\mathbf{G}_{2})^{-1}\mathbf{e}_{1},$$

$$\mathbf{G}_{k}=\begin{pmatrix}1&(W_{1}-w_{k})&\cdots&(W_{1}-w_{k})^{p}\\ \vdots&\vdots&\ddots&\vdots\\ 1&(W_{n}-w_{k})&\cdots&(W_{n}-w_{k})^{p}\end{pmatrix}.$$

$$\mathrm{Cov}\{\hat{m}^{*}(w_{1}),\hat{m}^{*}(w_{2})\mid\mathbb{W}\}$$

$$\xi_{\ell_{1},\ell_{2}}(w,h)=\int(u-w/h)^{\ell_{1}}(u+w/h)^{\ell_{2}}K(u-w/h)K(u+w/h)\,du.$$

$$\mathrm{Cov}\{\mathcal{A}(w_{1}),\mathcal{A}(w_{2})\mid\mathbb{W}\}=\frac{\gamma\{(w_{1}+w_{2})/2\}}{nh}\mathbf{e}_{1}^{\mathrm{T}}\mathbf{S}^{-1}\mathbf{S}^{*}_{W,h}\mathbf{S}^{-1}\mathbf{e}_{1}\left\{1+o_{P}\left(\frac{1}{nh}\right)\right\},$$

$$\mathrm{Var}\{\mathcal{B}(x)\mid\mathbb{W}\}$$

$$\int D(x-w_{1})\int D(x-w_{2})\,\gamma\left(\frac{w_{1}+w_{2}}{2}\right)\xi_{\ell_{1},\ell_{2}}\left(\frac{w_{1}-w_{2}}{2},h\right)dw_{2}\,dw_{1},$$

$$\{\gamma(x)+O(h)\}\int K_{U,\ell_{1}}(v)K_{U,\ell_{2}}(v)\,dv.$$

$$\mathrm{Var}\{\mathcal{B}(x)\mid\mathbb{W}\}=\frac{\gamma(x)}{nh}\mathbf{e}_{1}^{\mathrm{T}}\mathbf{S}^{-1}\mathbf{K}(h)\mathbf{S}^{-1}\mathbf{e}_{1}\left\{1+o_{P}\left(\frac{1}{nh}\right)\right\}.$$


Taxonomy

Topics: Statistical Methods and Inference · Fault Detection and Control Systems · Gaussian Processes and Bayesian Inference

Full text

An alternative local polynomial estimator for the error-in-variables problem

Xianzheng Huang$^{a,*}$ and Haiming Zhou$^{b}$

$^{*}$Corresponding author. Email: [email protected]

$^{a}$Department of Statistics, University of South Carolina, Columbia, South Carolina, U.S.A.; $^{b}$Division of Statistics, Northern Illinois University, DeKalb, Illinois, U.S.A.



Keywords: convolution; deconvolution; Fourier transform; measurement error.

AMS subject classification: 62G05; 62G08; 62G20

1 Introduction

Over the past two decades, the error-in-covariates problem has received great attention among researchers who study nonparametric inference for regression functions. Schennach (2004a, b) proposed an estimator of the regression function when the error-prone covariate is measured twice. Her estimator does not require a known measurement error distribution. Zwanzig (2007) proposed a local least squares estimator of the regression function, assuming a uniformly distributed error-prone covariate with normal measurement error. Many more existing methods are developed under the assumption of a known measurement error distribution and an unknown true covariate distribution. Among these works, many follow the theme of the deconvolution kernel pioneered in the density estimation problem in the presence of measurement error (Carroll and Hall, 1988; Stefanski and Carroll, 1990). In particular, starting from the well-known Nadaraya-Watson kernel estimator developed for the error-free case (Nadaraya, 1964; Watson, 1964), Fan and Truong (1993) formulated the local constant estimator of a regression function using the deconvolution kernel technique. Delaigle et al. (2009) generalized this estimator to local polynomial estimators of higher orders by introducing a complex transform of the kernel function. This transform is the key step that allows for the extension from the zero-order to a higher-order local polynomial estimator in error-in-variables problems.

In this study, we propose a new estimator motivated by an identity that relates the Fourier transform of the functions to be estimated to the Fourier transform of the counterpart naive functions. Here, a naive estimate refers to an estimate that results from replacing the unobserved true covariate one would use in the absence of measurement error with the error-contaminated observed covariate. This identity and the new estimator are presented in Section 2, following a brief review of the estimator in Delaigle et al. (2009), which we refer to as the DFC estimator henceforth. Sections 3, 4, and 5 are devoted to studying the asymptotic distribution of the new estimator. The finite sample performance of our estimator is demonstrated in comparison with the DFC estimator in Section 6. We summarize our contributions and findings and discuss some practical issues in Section 7. All appendices referenced in this article are provided in the Supplementary Materials.

2 Existing and proposed estimators

Denote by $\{(Y_{j},W_{j}),\,j=1,\ldots,n\}$ a random sample of size $n$ from a regression model with additive measurement error in the covariate specified as follows,

$$E(Y_{j}\mid X_{j})=m(X_{j}),\qquad W_{j}=X_{j}+U_{j}, \tag{1}$$

where $X_{j}$ is the unobserved true covariate following a distribution with probability density function (pdf) $f_{\hbox{\tiny$ X $}}(x)$ , $U_{j}$ is the measurement error, assumed to be independent of $(X_{j},Y_{j})$ and follow a known distribution with pdf $f_{\hbox{\tiny$ U $}}(u)$ , $W_{j}$ is the error-contaminated observed covariate following a distribution with pdf $f_{\hbox{\tiny$ W $}}(w)$ , for $j=1,\ldots,n$ . The problem of interest in this study is to estimate the regression function, $m(x)$ , based on the observed data. The index $j$ is often suppressed in the sequel when a generic observation or random variable is referenced.

2.1 The DFC estimator

In the absence of measurement error, the well-known local polynomial estimator of order $p$ for $m(x)$ is given by (Fan and Gijbels, 1996, Chapter 3)

$$\hat{m}(x)=\mathbf{e}_{1}^{\mathrm{T}}\mathbf{S}_{n}^{-1}\mathbf{T}_{n}, \tag{2}$$

where $\mbox{\boldmath$ e $}_{1}$ is a $(p+1)\times 1$ vector with 1 in the first entry and 0 in the remaining $p$ entries,

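$$\mathbf{S}_{n}=\begin{pmatrix}S_{n,0}(x)&\cdots&S_{n,p}(x)\\ \vdots&\ddots&\vdots\\ S_{n,p}(x)&\cdots&S_{n,2p}(x)\end{pmatrix},$$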

and $\mathbf{T}_{n}=(T_{n,0}(x),\ldots,T_{n,p}(x))^{\mathrm{\scriptscriptstyle T}}$ , in which

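$$\begin{cases}S_{n,\ell}(x)=n^{-1}\displaystyle\sum_{j=1}^{n}\left(\frac{X_{j}-x}{h}\right)^{\ell}K_{h}(X_{j}-x),&\text{for }\ell=0,1,\ldots,2p,\\ T_{n,\ell}(x)=n^{-1}\displaystyle\sum_{j=1}^{n}Y_{j}\left(\frac{X_{j}-x}{h}\right)^{\ell}K_{h}(X_{j}-x),&\text{for }\ell=0,1,\ldots,p,\end{cases}$$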

and $K_{h}(x)=h^{-1}K(x/h)$ with $K(\cdot)$ being a symmetric kernel function and $h$ being the bandwidth.
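The error-free estimator in (2) can be sketched in a few lines of code. The snippet below is a minimal illustration, not the authors' implementation: the Gaussian kernel, the function names, and the toy data are assumptions made for the example. It builds the moments $S_{n,\ell}(x)$ and $T_{n,\ell}(x)$, assembles $\mathbf{S}_{n}$, and returns the first entry of $\mathbf{S}_{n}^{-1}\mathbf{T}_{n}$.

```python
import numpy as np

def gaussian_kernel(u):
    """Standard normal density, used here as the symmetric kernel K (an assumption)."""
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def local_poly_estimate(x, X, Y, h, p=1, kernel=gaussian_kernel):
    """Return e_1^T S_n^{-1} T_n, the order-p local polynomial fit at the point x."""
    u = (X - x) / h                      # scaled distances (X_j - x) / h
    k_h = kernel(u) / h                  # K_h(X_j - x) = K((X_j - x) / h) / h
    # Moments S_{n,l} for l = 0..2p and T_{n,l} for l = 0..p.
    s = np.array([np.mean(u**l * k_h) for l in range(2 * p + 1)])
    t = np.array([np.mean(Y * u**l * k_h) for l in range(p + 1)])
    # S_n is the (p+1) x (p+1) matrix whose (l1, l2) entry is S_{n, l1 + l2}.
    idx = np.arange(p + 1)
    S = s[idx[:, None] + idx[None, :]]
    beta = np.linalg.solve(S, t)
    return beta[0]                       # e_1 picks out the estimate of m(x)

# Toy check: on noise-free linear data a local linear fit (p = 1) reproduces the line.
X_demo = np.linspace(-2.0, 2.0, 41)
Y_demo = 1.0 + 2.0 * X_demo
m_hat_demo = local_poly_estimate(0.3, X_demo, Y_demo, h=0.5, p=1)
```

Solving the linear system rather than inverting $\mathbf{S}_{n}$ explicitly is a numerical convenience; both give the same estimate when $\mathbf{S}_{n}$ is well conditioned.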

In the presence of measurement error, one could replace $X_{j}$ with $W_{j}$ for $j=1,\ldots,n$ in the above local polynomial estimator, yielding a naive estimator of $m(x)$ , denoted by $\hat{m}^{*}(x)$ . Clearly, $\hat{m}^{*}(x)$ is merely a sensible estimator of the naive regression function $m^{*}(x)=E(Y|W=x)$ . Following the rationale behind the corrected score method (Carroll et al., 2006, Section 7.4), Delaigle et al. (2009) sought some function, denoted by $L_{\ell}(\cdot)$ , that satisfies

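$$E\{(W_{j}-x)^{\ell}L_{\ell,h}(W_{j}-x)\mid X_{j}\}=(X_{j}-x)^{\ell}K_{h}(X_{j}-x),\quad\text{for }\ell=0,1,\ldots,2p, \tag{4}$$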

where $L_{\ell,h}(x)=h^{-1}L_{\ell}(x/h)$. The authors derived such a function by solving the Fourier transform version of (4), and showed that $L_{\ell}(x)=x^{-\ell}K_{\hbox{\tiny$ U $},\ell}(x)$, where

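$$K_{U,\ell}(x)=i^{-\ell}\frac{1}{2\pi}\int e^{-itx}\frac{\phi_{K}^{(\ell)}(t)}{\phi_{U}(-t/h)}\,dt,\quad\text{for }\ell=0,1,\ldots,2p, \tag{5}$$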

in which $i=\sqrt{-1}$, $\phi^{(\ell)}_{\hbox{\tiny$ K $}}(t)$ is the $\ell$-th derivative of $\phi_{\hbox{\tiny$ K $}}(t)=\int e^{itx}K(x)dx$, and $\phi_{\hbox{\tiny$ U $}}(t)$ is the characteristic function of $U$. Throughout this article, $\phi_{g}$ denotes the Fourier transform (characteristic function) of $g$ if $g$ is a function (random variable). All integrals in this article integrate over either the entire real line or a subset of it that guarantees the existence of the relevant integrals, and we will remark on such a subset whenever it is needed for clarity. The DFC estimator is given by $\hat{m}_{\hbox{\tiny DFC}}(x)=\mbox{\boldmath$ e $}_{1}^{\mathrm{\scriptscriptstyle T}}\hat{\mathbf{S}}_{n}^{-1}\hat{\mathbf{T}}_{n}$, where $\hat{\mathbf{S}}_{n}$ and $\hat{\mathbf{T}}_{n}$ are defined similarly to $\mathbf{S}_{n}$ and $\mathbf{T}_{n}$ in (2) but with the elements in the matrices given by

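$$\begin{cases}\hat{S}_{n,\ell}(x)=n^{-1}\displaystyle\sum_{j=1}^{n}\left(\frac{W_{j}-x}{h}\right)^{\ell}L_{\ell,h}(W_{j}-x),&\text{for }\ell=0,1,\ldots,2p,\\ \hat{T}_{n,\ell}(x)=n^{-1}\displaystyle\sum_{j=1}^{n}Y_{j}\left(\frac{W_{j}-x}{h}\right)^{\ell}L_{\ell,h}(W_{j}-x),&\text{for }\ell=0,1,\ldots,p.\end{cases}$$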

The transform of $K$ defined in (5) is a natural extension of the transform used in the deconvolution density estimator (Stefanski and Carroll, 1990) and the local constant estimator (Fan and Truong, 1993) of $m(x)$ under the setting of (1). In particular, the estimator in Fan and Truong (1993) is a special case of the DFC estimator with $p=0$ .
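To make the transform in (5) concrete, the following sketch evaluates $K_{U,0}(x)$ numerically for one convenient case. Everything specific here is an assumption chosen for illustration: a Gaussian kernel with $\phi_{K}(t)=e^{-t^{2}/2}$, a Laplace measurement error with scale `s` so that $\phi_{U}(t)=1/(1+s^{2}t^{2})$, and a plain Riemann sum for the integral. In this case the integral has the closed form $K(x)\{1+(s/h)^{2}(1-x^{2})\}$, which the code uses as a cross-check.

```python
import numpy as np

def k_u0_numeric(x, h, s, t_max=12.0, n_t=4801):
    """Approximate K_{U,0}(x) in (5) by a Riemann sum over t in [-t_max, t_max]."""
    t = np.linspace(-t_max, t_max, n_t)
    dt = t[1] - t[0]
    phi_k = np.exp(-0.5 * t**2)                 # Fourier transform of the Gaussian kernel
    phi_u = 1.0 / (1.0 + (s * t / h) ** 2)      # Laplace characteristic function at -t/h
    integrand = np.exp(-1j * t * x) * phi_k / phi_u
    return float(np.real(np.sum(integrand) * dt) / (2.0 * np.pi))

def k_u0_closed_form(x, h, s):
    """Closed form for this Gaussian/Laplace pairing: K(x) * {1 + (s/h)^2 (1 - x^2)}."""
    k = np.exp(-0.5 * x**2) / np.sqrt(2.0 * np.pi)
    return float(k * (1.0 + (s / h) ** 2 * (1.0 - x**2)))

numeric_value = k_u0_numeric(0.5, h=0.4, s=0.3)
closed_value = k_u0_closed_form(0.5, h=0.4, s=0.3)
```

The factor $(s/h)^{2}$ shows how the transformed kernel grows as $h$ shrinks relative to the error scale, which is the source of the variance inflation typical of deconvolution estimators.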

2.2 The proposed estimator

Deviating from the theme of the deconvolution kernel and its extension in (5), we propose a new estimator that more directly exploits the naive inference as a whole. This direct use of the naive inference is motivated by the following result proved in Delaigle (2014), $m^{*}(w)f_{\hbox{\tiny$ W $}}(w)=(mf_{\hbox{\tiny$ X $}})*f_{\hbox{\tiny$ U $}}(w)$, where $(mf_{\hbox{\tiny$ X $}})*f_{\hbox{\tiny$ U $}}(w)$ is the convolution given by $\int m(x)f_{\hbox{\tiny$ X $}}(x)f_{\hbox{\tiny$ U $}}(w-x)dx$. Applying the Fourier transform to both sides of this identity, one has

$$\phi_{m^{*}f_{W}}(t)=\phi_{mf_{X}}(t)\phi_{U}(t), \tag{6}$$

where $\phi_{m^{*}f_{\hbox{\tiny$ W $}}}(t)$ is the Fourier transform of $m^{*}(w)f_{\hbox{\tiny$ W $}}(w)$ and $\phi_{mf_{\hbox{\tiny$ X $}}}(t)$ is the Fourier transform of $m(x)f_{\hbox{\tiny$ X $}}(x)$. Immediately following (6), by the Fourier inversion theorem, one has $m(x)f_{\hbox{\tiny$ X $}}(x)=(2\pi)^{-1}\int e^{-itx}\phi_{m^{*}f_{\hbox{\tiny$ W $}}}(t)/\phi_{\hbox{\tiny$ U $}}(t)dt$. Assuming the relevant Fourier transforms are well defined, this motivates our local polynomial estimator of order $p$ for $m(x)$, given by

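$$\hat{m}_{\mathrm{HZ}}(x)=\{\hat{f}_{X}(x)\}^{-1}\frac{1}{2\pi}\int e^{-itx}\frac{\phi_{\hat{m}^{*}\hat{f}_{W}}(t)}{\phi_{U}(t)}\,dt, \tag{7}$$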

where $\hat{f}_{\hbox{\tiny$ X $}}(x)$ is the deconvolution kernel density estimator of $f_{\hbox{\tiny$ X $}}(x)$ in Stefanski and Carroll (1990), and $\phi_{\hat{m}^{*}\hat{f}_{\hbox{\tiny$ W $}}}(t)$ is the Fourier transform of $\hat{m}^{*}(w)\hat{f}_{\hbox{\tiny$ W $}}(w)$ , in which $\hat{m}^{*}(w)$ is the $p$ -th order local polynomial estimator of $m^{*}(w)$ , and $\hat{f}_{\hbox{\tiny$ W $}}(w)$ is the regular kernel density estimator of $f_{\hbox{\tiny$ W $}}(w)$ (Fan and Gijbels, 1996, Section 2.7.1), i.e., the naive estimator of $f_{\hbox{\tiny$ X $}}(\cdot)$ . Note that, although we consider a scalar covariate for notational simplicity in this article, the estimators on the right-hand side of (7) have their multivariate counterparts to account for multivariate covariates. Hence, with multivariate (inverse) Fourier transform used in (7), the proposed estimator becomes applicable to regression models with multiple covariates. Moreover, if some of these covariates are measured without error, one may reflect this in $\phi_{\hbox{\tiny$ U $}}(t)$ by viewing that the elements in the multivariate $U$ corresponding to the error-free covariates follow a degenerate distribution with all probability mass on zero.

By its appearance, the new estimator in (7) results from applying an integral transform similar to that in (5) on the naive product $\hat{m}^{*}(\cdot)\hat{f}_{\hbox{\tiny$ W $}}(\cdot)$ rather than on $K$ . It can be shown (via straightforward algebra omitted here) that, when $p=0$ , this new estimator is the same as the DFC estimator, both reducing to the local constant estimator in Fan and Truong (1993). Other than this special case, $\hat{m}_{\hbox{\tiny HZ}}(x)$ differs from $\hat{m}_{\hbox{\tiny DFC}}(x)$ in general.
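The convolution identity $m^{*}(w)f_{W}(w)=(mf_{X})*f_{U}(w)$ that motivates the proposed estimator can be checked numerically in a simple case. The sketch below assumes, purely for illustration, $X\sim N(0,1)$, $U\sim N(0,\sigma_{u}^{2})$ and $m(x)=x$, for which $m^{*}(w)=\lambda w$ with $\lambda=1/(1+\sigma_{u}^{2})$ and $W\sim N(0,1+\sigma_{u}^{2})$; the grid sizes and variable names are likewise assumptions.

```python
import numpy as np

def normal_pdf(z, sd=1.0):
    """Density of N(0, sd^2) evaluated at z."""
    return np.exp(-0.5 * (z / sd) ** 2) / (sd * np.sqrt(2.0 * np.pi))

sigma_u = 0.5                                  # assumed measurement error s.d.
lam = 1.0 / (1.0 + sigma_u**2)                 # reliability ratio in this setting
x_grid = np.linspace(-10.0, 10.0, 4001)        # integration grid for x
dx = x_grid[1] - x_grid[0]
w_grid = np.linspace(-3.0, 3.0, 13)            # points w at which to compare

# Right-hand side: integral of m(x) f_X(x) f_U(w - x) dx with m(x) = x.
rhs = np.array([
    np.sum(x_grid * normal_pdf(x_grid) * normal_pdf(w - x_grid, sigma_u)) * dx
    for w in w_grid
])
# Left-hand side: m*(w) f_W(w) with m*(w) = lam * w and W ~ N(0, 1 + sigma_u^2).
lhs = lam * w_grid * normal_pdf(w_grid, np.sqrt(1.0 + sigma_u**2))
max_gap = float(np.max(np.abs(lhs - rhs)))     # should be at rounding-error level
```

The estimator in (7) replaces the population quantities on the left-hand side by their naive estimates and then undoes the convolution in the Fourier domain.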

2.3 Preamble for asymptotic analyses

The majority of the theoretical development presented in Delaigle et al. (2009) revolves around properties of the transformed kernel, $K_{\hbox{\tiny$ U $},\ell}(x)$ , which is not surprising as $K_{\hbox{\tiny$ U $},\ell}(x)$ is everywhere in the building blocks of their estimator. Because of the close tie between our proposed estimator and the naive estimators, much of our theoretical development builds upon well established results for kernel-based estimators in the absence of measurement error. This can be better appreciated by interchanging the order of the two integrals in (7), assuming that $\phi_{\hat{m}^{*}\hat{f}_{\hbox{\tiny$ W $}}}(t)$ is compactly supported on $I_{t}$ (to allow the interchange), $\hat{m}_{\hbox{\tiny HZ}}(x)\hat{f}_{\hbox{\tiny$ X $}}(x)=\int\hat{m}^{*}(w)\hat{f}_{\hbox{\tiny$ W $}}(w)(2\pi)^{-1}\int_{I_{t}}e^{-it(x-w)}/\phi_{\hbox{\tiny$ U $}}(t)\,dtdw$ . This identity can be re-expressed more succinctly as

$$\mathcal{B}(x)=\int\mathcal{A}(w)D(x-w)\,dw=(\mathcal{A}*D)(x), \tag{8}$$

where $\mathcal{A}(w)=\hat{m}^{*}(w)\hat{f}_{\hbox{\tiny$ W $}}(w)$, $\mathcal{B}(x)=\hat{m}_{\hbox{\tiny HZ}}(x)\hat{f}_{\hbox{\tiny$ X $}}(x)$, and $D(s)=(2\pi)^{-1}\int_{I_{t}}e^{-its}/\phi_{\hbox{\tiny$ U $}}(t)dt$. Note that $\mathcal{A}(w)$ is a random process depending on the naive estimators $\hat{m}^{*}(w)$ and $\hat{f}_{\hbox{\tiny$ W $}}(w)$, and $\mathcal{B}(x)$ results from convolving $\mathcal{A}(w)$ with the non-random function $D(s)$. A natural question is, given the asymptotic properties of $\mathcal{A}(w)$, what can be deduced about the convolution of $\mathcal{A}$ and $D$. More specifically, we are interested to know how the moments of $\mathcal{A}$ compare with those of $\mathcal{B}$, and whether a Gaussian process on $\mathcal{A}(w)$ implies another Gaussian process on $\mathcal{B}(x)$. These questions about random process convolution are of mathematical interest in their own right besides being the key to understanding $\hat{m}_{\hbox{\tiny HZ}}(x)$.

Here we provide two definitions of smoothness of a distribution (Fan, 1991a; Fan, 1991b; Fan, 1991c) and two sets of conditions to be referenced later.

Definition 1.

The distribution of $U$ is ordinary smooth of order $b$ if

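$$\lim_{t\to+\infty}t^{b}\phi_{U}(t)=c\quad\text{and}\quad\lim_{t\to+\infty}t^{b+1}\phi_{U}^{\prime}(t)=-cb$$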

for some positive constants $b$ and $c$ .

Definition 2.

The distribution of $U$ is super smooth of order $b$ if

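$$d_{0}|t|^{b_{0}}\exp(-|t|^{b}/d_{2})\leq|\phi_{U}(t)|\leq d_{1}|t|^{b_{1}}\exp(-|t|^{b}/d_{2})\quad\text{as }|t|\to\infty$$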

for some positive constants $d_{0}$ , $d_{1}$ , $d_{2}$ , $b$ , $b_{0}$ and $b_{1}$ .
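As a worked illustration of the two smoothness classes, consider two standard error distributions. These examples are assumptions added here for orientation and are not taken from the definitions themselves: a Laplace error with scale $s>0$ satisfies Definition 1 with $b=2$ and $c=1/s^{2}$, while a normal error $N(0,\sigma_{u}^{2})$ has the exponential decay required by Definition 2 with $b=2$ and $d_{2}=2/\sigma_{u}^{2}$.

```latex
% Laplace error with scale s > 0 (assumed example): ordinary smooth of order b = 2.
\phi_{U}(t)=\frac{1}{1+s^{2}t^{2}},\qquad
\lim_{t\to+\infty}t^{2}\phi_{U}(t)=\frac{1}{s^{2}}=c,\qquad
\lim_{t\to+\infty}t^{3}\phi_{U}^{\prime}(t)
  =\lim_{t\to+\infty}\frac{-2s^{2}t^{4}}{(1+s^{2}t^{2})^{2}}
  =-\frac{2}{s^{2}}=-cb.

% Normal error N(0, \sigma_u^2) (assumed example): exponential decay of order b = 2.
|\phi_{U}(t)|=\exp(-\sigma_{u}^{2}t^{2}/2)=\exp(-|t|^{b}/d_{2}),
\qquad b=2,\quad d_{2}=2/\sigma_{u}^{2}.
```

The polynomial tail of the Laplace characteristic function makes deconvolution comparatively mild, whereas the exponential tail of the normal one makes $1/\phi_{U}(t)$ grow very quickly.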

Condition O:

For $\ell=0,\ldots,2p+1$ , $\|\phi^{(\ell)}_{\hbox{\tiny$ K $}}(t)\|_{\infty}<\infty$ and $\int(|t|^{b}+|t|^{b-1})|\phi^{(\ell)}_{\hbox{\tiny$ K $}}(t)|dt<\infty$ . For $0\leq\ell_{1},\ell_{2}\leq 2p$ , $\int|t|^{2b}|\phi^{(\ell_{1})}_{\hbox{\tiny$ K $}}(t)||\phi^{(\ell_{2})}_{\hbox{\tiny$ K $}}(t)|dt<\infty$ . And, $\|\phi^{\prime}_{\hbox{\tiny$ U $}}(t)\|_{\infty}<\infty$ .

Condition S:

For $\ell=0,\ldots,2p$ , $\|\phi^{(\ell)}_{\hbox{\tiny$ K $}}(t)\|_{\infty}<\infty$ , and $\phi_{\hbox{\tiny$ K $}}(t)$ is supported on $[-1,1]$ .

In addition, we assume $f_{\hbox{\tiny$ X $}}(x)>0$ and $\phi_{\hbox{\tiny$ U $}}(t)$ is an even function that never vanishes. We reach the convolution form in (8) under the assumption that $\phi_{\hat{m}^{*}\hat{f}_{\hbox{\tiny$ W $}}}(t)$ is compactly supported on $I_{t}$, where $I_{t}$ is a region that guarantees that $D(s)$ is well defined. This assumption can be easily satisfied by choosing a kernel whose Fourier transform has a finite support. Even without this assumption the asymptotic properties presented in the following three sections still hold, although some of the proofs need to be revised to use the estimator in its original form in (7). While acknowledging the overlap between the regularity conditions needed in our asymptotic analyses and those required for the DFC estimator, we also assume existence of the Fourier transform of $m^{*}(\cdot)f_{\hbox{\tiny$ W $}}(\cdot)$ and that of $m(\cdot)f_{\hbox{\tiny$ X $}}(\cdot)$ in (6). We next dissect the asymptotic bias, variance and normality of $\hat{m}_{\hbox{\tiny HZ}}(x)$.
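A kernel whose Fourier transform has finite support, as Condition S requires, can be built directly from its transform. The sketch below assumes $\phi_{K}(t)=(1-t^{2})^{3}$ on $[-1,1]$, a choice that is common in the deconvolution literature but is not stated in the text above, and recovers $K(x)$ by numerical Fourier inversion. The function names and grid size are illustrative.

```python
import numpy as np

def phi_k(t):
    """Assumed kernel Fourier transform: (1 - t^2)^3 on [-1, 1] and zero elsewhere."""
    return np.where(np.abs(t) <= 1.0, (1.0 - t**2) ** 3, 0.0)

def kernel_from_phi(x, n_t=2001):
    """Recover K(x) = (2 pi)^{-1} * integral of e^{-itx} phi_K(t) dt over [-1, 1]."""
    t = np.linspace(-1.0, 1.0, n_t)
    dt = t[1] - t[0]
    # phi_K is even, so the imaginary part cancels and cos(t x) suffices.
    return float(np.sum(np.cos(t * x) * phi_k(t)) * dt / (2.0 * np.pi))

k_at_zero = kernel_from_phi(0.0)        # analytically 16 / (35 * pi)
```

Because $\phi_{K}$ vanishes outside $[-1,1]$, any integral of the form $\int\phi_{K}(t)/\phi_{U}(t/h)\,dt$ runs over a bounded range, which is what keeps the deconvolution integrals finite even for a super smooth $U$.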

3 Asymptotic bias

We provide the derivations of the asymptotic bias of $\hat{m}_{\hbox{\tiny HZ}}(x)$ for $p\geq 0$ in Appendix A. To better apprehend the distinction between our bias results and those of $\hat{m}_{\hbox{\tiny DFC}}(x)$ , we present a brief derivation of the bias when $p=1$ in this section.

3.1 Dominating bias when $p=1$

Define $\mu_{\ell}=\int u^{\ell}K(u)\,du$ , for $\ell=0,1,\ldots,2p$ . Let $A(w)=m^{*}(w)f_{\hbox{\tiny$ W $}}(w)$ and $B(x)=m(x)f_{\hbox{\tiny$ X $}}(x)$ be the non-random counterparts of $\mathcal{A}(w)$ and $\mathcal{B}(x)$ in (8), respectively. Then, like (8), we have $B(x)=(A*D)(x)$ .

By Theorem 2.1 in Stefanski and Carroll (1990), the deconvolution density estimator $\hat{f}_{\hbox{\tiny$ X $}}(x)$ is a consistent estimator of $f_{\hbox{\tiny$ X $}}(x)$ . Noting that $\hat{f}_{\hbox{\tiny$ X $}}(x)/f_{\hbox{\tiny$ X $}}(x)$ converges to one in probability, we derive the dominating bias via elaborating $E[\{\hat{m}_{\hbox{\tiny HZ}}(x)-m(x)\}\hat{f}_{\hbox{\tiny$ X $}}(x)/f_{\hbox{\tiny$ X $}}(x)|\mathbb{W}]$ , which is equal to

$$\{f_{X}(x)\}^{-1}\left[E\{\mathcal{B}(x)\mid\mathbb{W}\}-m(x)\hat{f}_{X}(x)\right], \tag{9}$$

where $\mathbb{W}=(W_{1},\ldots,W_{n})$ , and

$$\hat{f}_{X}(x)=f_{X}(x)+\mu_{2}h^{2}f_{X}^{(2)}(x)/2+o_{P}(h^{2}). \tag{10}$$

To derive $E\{\mathcal{B}(x)|\mathbb{W}\}$ in (9), we invoke the following two results for kernel-based estimators in the absence of measurement error (Fan and Gijbels, 1996, Chapter 3),

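$$E\{\hat{m}^{*}(w)\mid\mathbb{W}\}=m^{*}(w)+\mu_{2}h^{2}m^{*(2)}(w)/2+o_{P}(h^{2}),\qquad\hat{f}_{W}(w)=f_{W}(w)+\mu_{2}h^{2}f_{W}^{(2)}(w)/2+o_{P}(h^{2}).$$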

Following these results, one can show that

$$E\{\mathcal{A}(w)\mid\mathbb{W}\}=A(w)+\mu_{2}M(w)h^{2}/2+o_{P}(h^{2}), \tag{11}$$

where $M(w)=m^{*}(w)f^{(2)}_{\hbox{\tiny$ W $}}(w)+m^{*(2)}(w)f_{\hbox{\tiny$ W $}}(w)$ . Then, assuming interchangeability of expectation and integration, (8) and (11) imply

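$$E\{\mathcal{B}(x)\mid\mathbb{W}\}=\{E(\mathcal{A}\mid\mathbb{W})*D\}(x)=B(x)+\mu_{2}h^{2}(M*D)(x)/2+o_{P}(h^{2}). \tag{12}$$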

Finally, by (10) and (12), (9) reduces to

$$\frac{\mu_{2}h^{2}}{2f_{X}(x)}\left\{(M*D)(x)-m(x)f_{X}^{(2)}(x)\right\}+o_{P}(h^{2}), \tag{13}$$

which reveals the dominating bias of $\hat{m}_{\hbox{\tiny HZ}}(x)$ of order $h^{2}$ .

Different from Delaigle et al. (2009), we directly use the existing results associated with estimators in the absence of measurement error for deriving the asymptotic bias.

3.2 Comparison with the bias of the DFC estimator

By Theorem 3.2 in Delaigle et al. (2009), the dominating bias of $\hat{m}_{\hbox{\tiny DFC}}(x)$ is the same as that of $\hat{m}(x)$ , which is $\mu_{2}h^{2}m^{(2)}(x)/2$ when $p=1$ . To make the comparison of dominating bias more tractable, we consider regression functions in the form of a polynomial of order $r$ , $m(x)=\sum_{k=0}^{r}\beta_{k}x^{k}$ . Furthermore, we set $X\sim N(0,1)$ and $U\sim N(0,\sigma^{2}_{u})$ , resulting in a reliability ratio (Carroll et al., 2006, Section 3.2.1) of $\lambda=1/(1+\sigma_{u}^{2})$ .

Under this setting, the dominating bias in (13) can be derived explicitly. Instead of directly comparing the dominating bias associated with the two estimators, we focus on studying the number of $x$'s at which each dominating bias is zero. Note that $m^{(2)}(x)$ is a polynomial of order $r-2$ provided that $r\geq 2$, and thus the dominating bias of $\hat{m}_{\hbox{\tiny DFC}}(x)$ is zero at no more than $r-2$ $x$'s. In contrast, we show in Appendix A that the dominating bias in (13) reduces to a polynomial of order $r$, suggesting that the dominating bias of $\hat{m}_{\hbox{\tiny HZ}}(x)$ can be zero at $r$ $x$'s. Suppose that the bias of each estimator is continuous in $x$, which is a realistic assumption in many applications. Then having two more roots to the equation, $\textrm{dominating bias}=0$, for $\hat{m}_{\hbox{\tiny HZ}}(x)$ indicates that the proposed estimator can have two more regions in the support of $m(x)$ within which $\hat{m}_{\hbox{\tiny HZ}}(x)$ is less biased than $\hat{m}_{\hbox{\tiny DFC}}(x)$, where each region is a neighborhood of some root. For example, when $r=2$, clearly the dominating bias of $\hat{m}_{\hbox{\tiny DFC}}(x)$ can never be zero. It is shown in Appendix A that the dominating bias of $\hat{m}_{\hbox{\tiny HZ}}(x)$ is zero at the roots of the equation $2(\lambda-1)\beta_{2}x^{2}+(\lambda-1)\beta_{1}x+(2\lambda^{2}-2\lambda+1)\beta_{2}=0$. With $\lambda\in(0,1)$ and $\beta_{2}\neq 0$, the discriminant $(\lambda-1)^{2}\beta_{1}^{2}+8(1-\lambda)(2\lambda^{2}-2\lambda+1)\beta_{2}^{2}$ is strictly positive, so this quadratic equation has two distinct real roots.
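The two-root claim for $r=2$ is easy to verify numerically. The helper below is a small sketch: the function name and the parameter values are assumptions for illustration. It solves the quadratic equation quoted above for a given reliability ratio and returns its two real roots, that is, the points at which the dominating bias of $\hat{m}_{\hbox{\tiny HZ}}(x)$ vanishes in this setting.

```python
import math

def hz_bias_roots(lam, beta1, beta2):
    """Real roots of 2(lam-1) b2 x^2 + (lam-1) b1 x + (2 lam^2 - 2 lam + 1) b2 = 0."""
    a = 2.0 * (lam - 1.0) * beta2
    b = (lam - 1.0) * beta1
    c = (2.0 * lam**2 - 2.0 * lam + 1.0) * beta2
    # disc = (lam-1)^2 b1^2 + 8 (1-lam)(2 lam^2 - 2 lam + 1) b2^2,
    # which is positive whenever 0 < lam < 1 and beta2 != 0.
    disc = b * b - 4.0 * a * c
    root = math.sqrt(disc)
    return sorted([(-b - root) / (2.0 * a), (-b + root) / (2.0 * a)])

sigma_u2 = 0.25                        # assumed measurement error variance
lam = 1.0 / (1.0 + sigma_u2)           # reliability ratio, 0.8 in this example
roots = hz_bias_roots(lam, beta1=1.0, beta2=1.0)
```

Near each returned root the leading $h^{2}$ bias term of the proposed estimator is close to zero, whereas the leading bias of the DFC estimator, proportional to $m^{(2)}(x)=2\beta_{2}$, is constant in $x$.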

4 Asymptotic variance

Because

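$$\mathrm{Var}\{\hat{m}_{\mathrm{HZ}}(x)\mid\mathbb{W}\}=\mathrm{Var}\{\mathcal{B}(x)\mid\mathbb{W}\}f_{X}^{-2}(x)\{1+o_{P}(1)\}, \tag{14}$$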

we focus on deriving $\textrm{Var}\{\mathcal{B}(x)|\mathbb{W}\}$ in order to study the asymptotic variance of $\hat{m}_{\hbox{\tiny HZ}}(x)$ . Detailed derivations are provided in Appendix B, which consists of five steps. In what follows, we provide a sketch of the derivations, where we highlight the connection between our results and the counterpart results in the absence of measurement error, and how our derivations differ from and relate to those in Delaigle et al. (2009).

4.1 Derivations of $\textrm{Var}\{\mathcal{B}(x)|\mathbb{W}\}$

First, we deduce from (8) that $\textrm{Var}\{\mathcal{B}(x)|\mathbb{W}\}$ can be formulated as an iterative convolution of the covariance of $\mathcal{A}(w)$ as follows,

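$$\mathrm{Var}\{\mathcal{B}(x)\mid\mathbb{W}\}=\int D(x-w_{1})\int D(x-w_{2})\,\mathrm{Cov}\{\mathcal{A}(w_{1}),\mathcal{A}(w_{2})\mid\mathbb{W}\}\,dw_{2}\,dw_{1}. \tag{15}$$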

Since $\hat{f}_{\hbox{\tiny$ W $}}(w)/f_{\hbox{\tiny$ W $}}(w)$ converges to 1 in probability under regularity conditions,

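$$\mathrm{Cov}\{\mathcal{A}(w_{1}),\mathcal{A}(w_{2})\mid\mathbb{W}\}=\mathrm{Cov}\{\hat{m}^{*}(w_{1}),\hat{m}^{*}(w_{2})\mid\mathbb{W}\}f_{W}(w_{1})f_{W}(w_{2})\{1+o_{P}(1)\}. \tag{16}$$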

Second, we view $\hat{m}^{*}(w)$ as a weighted least squares estimator (Fan and Gijbels, 1996, page 58), and show that

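$$\mathrm{Cov}\{\hat{m}^{*}(w_{1}),\hat{m}^{*}(w_{2})\mid\mathbb{W}\}=\mathbf{e}_{1}^{\mathrm{T}}(\mathbf{G}_{1}^{\mathrm{T}}\mathbf{W}_{1}\mathbf{G}_{1})^{-1}(\mathbf{G}_{1}^{\mathrm{T}}\boldsymbol{\Sigma}_{12}\mathbf{G}_{2})(\mathbf{G}_{2}^{\mathrm{T}}\mathbf{W}_{2}\mathbf{G}_{2})^{-1}\mathbf{e}_{1}, \tag{17}$$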

where $\mbox{\boldmath$ \Sigma $}_{12}=\textrm{diag}\{K_{h}(W_{1}-w_{1})K_{h}(W_{1}-w_{2})\nu^{2}(W_{1}),\ldots,K_{h}(W_{n}-w_{1})K_{h}(W_{n}-w_{2})\nu^{2}(W_{n})\}$ , $\nu^{2}(w)=\textrm{Var}(Y|W=w)$ , and, for $k=1,2$ , $\mathbf{W}_{k}=\textrm{diag}\{K_{h}(W_{1}-w_{k}),\ldots,K_{h}(W_{n}-w_{k})\}$ ,

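$$\mathbf{G}_{k}=\begin{pmatrix}1&(W_{1}-w_{k})&\cdots&(W_{1}-w_{k})^{p}\\ \vdots&\vdots&\ddots&\vdots\\ 1&(W_{n}-w_{k})&\cdots&(W_{n}-w_{k})^{p}\end{pmatrix}.$$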

Then we approximate the random quantities on the right hand side of (17) to establish that

[TABLE]

where $\mathbf{S}=(\mu_{\ell_{1}+\ell_{2}})_{0\leq\ell_{1},\ell_{2}\leq p}$ and $\mathbf{S}^{*}_{\hbox{\tiny$ W $},h}=(\xi_{\ell_{1},\ell_{2}}((w_{1}-w_{2})/2,h))_{0\leq\ell_{1},\ell_{2}\leq p}$ , in which, for $\ell_{1},\ell_{2}=0,1,\ldots,p$ ,

$$\xi_{\ell_{1},\ell_{2}}(w,h)=\int(u-w/h)^{\ell_{1}}(u+w/h)^{\ell_{2}}K(u-w/h)K(u+w/h)\,du. \tag{19}$$

The result in (18) is a counterpart result of $\textrm{Var}\{\hat{m}(x)|\mathbb{X}\}$, where $\mathbb{X}=(X_{1},\ldots,X_{n})$ (Fan and Gijbels, 1996, equation (3.7)).

Third, substituting (18) in (16) gives

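$$\mathrm{Cov}\{\mathcal{A}(w_{1}),\mathcal{A}(w_{2})\mid\mathbb{W}\}=\frac{\gamma\{(w_{1}+w_{2})/2\}}{nh}\mathbf{e}_{1}^{\mathrm{T}}\mathbf{S}^{-1}\mathbf{S}^{*}_{W,h}\mathbf{S}^{-1}\mathbf{e}_{1}\left\{1+o_{P}\left(\frac{1}{nh}\right)\right\}, \tag{20}$$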

where $\gamma(w)=\nu^{2}(w)f_{\hbox{\tiny$ W $}}(w)$ . And plugging (20) in (15) yields

[TABLE]

Note that, among the matrices in (21), only $\mathbf{S}^{*}_{\hbox{\tiny$ W $},h}$ depends on $w_{1}$ and $w_{2}$, of which the entries are $\xi_{\ell_{1},\ell_{2}}(w,h)$ in (19).

The fourth step is to derive

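$$\int D(x-w_{1})\int D(x-w_{2})\,\gamma\left(\frac{w_{1}+w_{2}}{2}\right)\xi_{\ell_{1},\ell_{2}}\left(\frac{w_{1}-w_{2}}{2},h\right)dw_{2}\,dw_{1}, \tag{22}$$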

which is equal to

$$\{\gamma(x)+O(h)\}\int K_{U,\ell_{1}}(v)K_{U,\ell_{2}}(v)\,dv. \tag{23}$$

Define $\kappa_{\ell_{1},\ell_{2}}(h)=\int K_{\hbox{\tiny$ U $},\ell_{1}}(v)K_{\hbox{\tiny$ U $},\ell_{2}}(v)\,dv$ to highlight the dependence of this integral on $h$ (since $K_{\hbox{\tiny$ U $},\ell}(v)$ depends on $h$ according to (5)), and define matrix $\mathbf{K}(h)=(\kappa_{\ell_{1},\ell_{2}}(h))_{0\leq\ell_{1},\ell_{2}\leq p}$. To this end, we can conclude that, by (21) and (23),

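$$\mathrm{Var}\{\mathcal{B}(x)\mid\mathbb{W}\}=\frac{\gamma(x)}{nh}\mathbf{e}_{1}^{\mathrm{T}}\mathbf{S}^{-1}\mathbf{K}(h)\mathbf{S}^{-1}\mathbf{e}_{1}\left\{1+o_{P}\left(\frac{1}{nh}\right)\right\}. \tag{24}$$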

This is where the path of our derivations meets that of Delaigle et al. (2009), as now we need to incorporate the properties of $\kappa_{\ell_{1},\ell_{2}}(h)$ as $n\to\infty$ (and thus $h\to 0$ ), for an ordinary smooth $U$ and for a super smooth $U$ , respectively. These properties are thoroughly studied in Delaigle et al. (2009) and summarized in their Lemmas B.4, B.6, B.9, which are restated in Appendix B for completeness. Equipped with these lemmas, we are ready to move on to the fifth step of the derivations.

By Lemma B.4, for an ordinary smooth $U$ , under Condition O, $\kappa_{\ell_{1},\ell_{2}}(h)=h^{-2b}\eta_{\ell_{1},\ell_{2}}+o\left(h^{-2b}\right)$ as $n\to\infty$ , where

[TABLE]

in which $b$ and $c$ are constants in Definition 1. Define $\mathbf{S}^{*}=(\eta_{\ell_{1},\ell_{2}})_{0\leq\ell_{1},\ell_{2}\leq p}$, then $\mathbf{K}(h)=h^{-2b}\mathbf{S}^{*}+o\left(h^{-2b}\right)$, and thus (24) implies (25) in Theorem 4.1 below. For a super smooth $U$, by Lemma B.9, under Condition S, $|\kappa_{\ell_{1},\ell_{2}}(h)|\leq Ch^{2b_{3}}\exp(2h^{-b}/d_{2})$, where $b_{3}=b_{0}I(b_{0}<0.5)$, $b_{0}$, $b$ and $d_{2}$ are constants in Definition 2, and $C$ is some generic non-negative finite constant appearing in Lemma B.8 in Delaigle et al. (2009). This leads to (26) in Theorem 4.1 below, which serves as a recap of our findings in this subsection.
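The $h^{-2b}$ growth of $\kappa_{\ell_{1},\ell_{2}}(h)$ for an ordinary smooth error can be seen numerically in a simple special case. The sketch below is built on assumptions made for illustration: a Gaussian kernel and a Laplace error with scale `s` (so $b=2$), for which $K_{U,0}(v)=K(v)\{1+(s/h)^{2}(1-v^{2})\}$ in closed form, and a Riemann sum for the integral. Under these assumptions $h^{4}\kappa_{0,0}(h)$ should approach $3s^{4}/(8\sqrt{\pi})$ as $h\to 0$.

```python
import numpy as np

def kappa_00(h, s, v_max=12.0, n_v=4801):
    """kappa_{0,0}(h): integral of K_{U,0}(v)^2 for a Gaussian K and Laplace(s) error."""
    v = np.linspace(-v_max, v_max, n_v)
    dv = v[1] - v[0]
    k = np.exp(-0.5 * v**2) / np.sqrt(2.0 * np.pi)
    k_u0 = k * (1.0 + (s / h) ** 2 * (1.0 - v**2))   # closed form of K_{U,0} in this case
    return float(np.sum(k_u0**2) * dv)

s = 0.3                                              # assumed Laplace scale, so b = 2
limit = 3.0 * s**4 / (8.0 * np.sqrt(np.pi))          # limit of h^4 * kappa_{0,0}(h)
scaled = {h: h**4 * kappa_00(h, s) for h in (0.2, 0.05, 0.01)}
```

The scaled values decrease toward `limit` as `h` shrinks, which is the behavior summarized by $\mathbf{K}(h)=h^{-2b}\mathbf{S}^{*}+o(h^{-2b})$ with $b=2$.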

Theorem 4.1.

When $U$ is ordinary smooth of order $b$ , under Condition O, if $nh^{2b+1}\to\infty$ , then

[TABLE]

When $U$ is super smooth of order $b$ , under Condition S, if $n\exp(2h^{b}/d_{2})h^{1-2b_{3}}\to\infty$ , then $\textrm{Var}\left\{\hat{m}_{\hbox{\tiny HZ}}(x)|\mathbb{W}\right\}$ is bounded from above by

[TABLE]

4.2 Comparison with the variance of the DFC estimator

By Theorem 3.1 in Delaigle et al. (2009), when the distribution of $U$ is ordinary smooth, under Condition O, if $nh^{2b+1}\to\infty$ , then

[TABLE]

where $\tau^{2}(x)=\textrm{Var}(Y|X=x)$ . Note that the asymptotic variance results in Theorem 4.1, as well as the asymptotic bias results in Section 3, are conditional on $\mathbb{W}$ whereas (27) is an unconditional variance. The conditional arguments in our moment analysis originate from the direct use of asymptotic moments of the local polynomial estimator of a regression function in the absence of measurement error, which are conditional moments given $\mathbb{X}$ (Ruppert and Wand, 1994). As pointed out in Ruppert and Wand (1994, Remark 1, page 1351), because the dominating terms in these conditional moments are free of $\mathbb{W}$ , they still have the interpretation of unconditional dominating moments. Once this is clear, one can see that the difference between the dominating variance in (27) and that in (25) lies in the distinction between $(\tau^{2}f_{\hbox{\tiny$ X $}})*f_{\hbox{\tiny$ U $}}(x)$ and $\gamma(x)$ . It is shown in Appendix B that $\gamma(x)=(\tau^{2}f_{\hbox{\tiny$ X $}})*f_{\hbox{\tiny$ U $}}(x)+f_{\hbox{\tiny$ W $}}(x)\textrm{Var}\{m(X)|W=x\}\geq(\tau^{2}f_{\hbox{\tiny$ X $}})*f_{\hbox{\tiny$ U $}}(x)$ . Hence, for an ordinary smooth $U$ , the dominating variance of $\hat{m}_{\hbox{\tiny HZ}}(x)$ is greater than or equal to that of $\hat{m}_{\hbox{\tiny DFC}}(x)$ . In Section 6, we will see how this large sample comparison takes effect in the comparison of finite sample variances associated with the two estimators.
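The decomposition of $\gamma(x)$ quoted above can be motivated by the law of total variance. The following is only a sketch and not the Appendix B argument. It assumes that $\gamma(x)$ stands for $f_{\hbox{\tiny$ W $}}(x)\textrm{Var}(Y|W=x)$ (its definition lies outside this subsection), that $Y$ is conditionally independent of $W$ given $X$ , and that $U$ is independent of $X$ .

```latex
\begin{align*}
\mathrm{Var}(Y \mid W = x)
  &= E\{\tau^{2}(X) \mid W = x\} + \mathrm{Var}\{m(X) \mid W = x\}, \\
f_{W}(x)\, E\{\tau^{2}(X) \mid W = x\}
  &= \int \tau^{2}(u)\, f_{X}(u)\, f_{U}(x - u)\, du
   = (\tau^{2} f_{X}) * f_{U}(x), \\
f_{W}(x)\, \mathrm{Var}(Y \mid W = x)
  &= (\tau^{2} f_{X}) * f_{U}(x) + f_{W}(x)\, \mathrm{Var}\{m(X) \mid W = x\}.
\end{align*}
```

Under these assumptions the second term on the right is non-negative, which gives the stated inequality; it vanishes only where $m(X)$ is degenerate given $W=x$ .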

5 Asymptotic normality

Under the conditions stated in Theorem 4.1, we show the asymptotic normality of $\hat{m}_{\hbox{\tiny HZ}}(x)$ in Appendix C. The logic behind the proof is similar to that in Delaigle et al. (2009). More specifically, we first approximate $\mathcal{B}(x)-B(x)$ via an average, $n^{-1}\sum_{j=1}^{n}\tilde{U}_{n,j}(x)$ , where $\{\tilde{U}_{n,j}(x)\}_{j=1}^{n}$ is a set of independent and identically distributed (i.i.d.) random variables at each fixed $x$ . Then we show that, for some positive constant $\eta$ ,

[TABLE]

which is a sufficient condition for

[TABLE]

This in turn leads to the asymptotic normality of $\mathcal{B}(x)-B(x)$ , and further suggests the asymptotic normality of $\hat{m}_{\hbox{\tiny HZ}}(x)$ .

At this point, we have answered the questions raised in Section 2.3 regarding the properties of a random process $\mathcal{B}(x)$ resulting from the convolution of another random process $\mathcal{A}(w)$ and the non-random function $D(s)$ . We now see that the first two moments of $\mathcal{B}(x)$ are closely related to the first two moments of $\mathcal{A}(w)$ via similar convolutions. Also, if $\mathcal{A}(w)$ is asymptotically Gaussian, then under mild regularity conditions, $\mathcal{B}(x)$ is also asymptotically Gaussian, and many of these conditions can be satisfied by choosing an appropriate kernel function in $\mathcal{A}(w)$ .

6 Implementation and finite sample performance

After a thorough investigation of asymptotic properties of the proposed estimator, we are now in a position to look into its finite sample performance. By the construction of $\hat{m}_{\hbox{\tiny HZ}}(x)$ , we need to evaluate continuous Fourier transforms (CFT) and inverse CFTs. In this section we first describe the algorithm for these evaluations, then discuss bandwidth selection. Finally, we present experiments to compare our estimator with the DFC estimator under four settings where we simulate data from the true models with our design of $m(x)$ , and under another setting where error-prone data are simulated from a motorcycle-crash data set with the underlying $m(x)$ unknown.

6.1 Numerical evaluations

For an integrable function that maps the real line into the complex plane, $f:\mathbb{R}\rightarrow\mathbb{C}$ , define the CFT of $f$ as

[TABLE]

In our study, we first approximate the CFT via a discrete Fourier transform (DFT), then we use the fast Fourier transform algorithm (FFT, Bailey and Swarztrauber, 1994) to evaluate the corresponding DFT. For a sequence of $G$ complex values $\boldsymbol{z}=\{z_{0},\ldots,z_{G-1}\}$ , the DFT is defined as $D_{k}[\boldsymbol{z}]=\sum_{g=0}^{G-1}z_{g}e^{-i2\pi kg/G}$ , for $k=0,\ldots,G-1$ , which can be easily evaluated using FFT in standard statistical software. The approximation of CFT using DFT is sketched next.

To prepare for the approximation, one first specifies a sequence of input values and then specifies a sequence of output values accordingly. More specifically, let $\{s_{g}=(g-G/2)\alpha_{1},\,g=0,1,\ldots,G-1\}$ be the input values for the CFT, where $G/2$ is an even integer, $\alpha_{1}=a/G$ is the increment, and $a$ is chosen such that (28) can be well approximated by $\int_{-a/2}^{a/2}f(s)e^{-its}ds$ . With the input values specified, the corresponding output values are $\{t_{k}=(k-G/2)\alpha_{2},\,k=0,1,\ldots,G-1\}$ , where $\alpha_{2}=2\pi/(G\alpha_{1})$ . With the input and output values ready, we approximate the CFT as follows, for $k=0,1,\ldots,G-1$ ,

[TABLE]

This approximation converges to the truth very rapidly provided that the Fourier coefficients of $f$ rapidly decrease (Davis and Rabinowitz, 1984). The values of $\alpha_{1}$ and $\alpha_{2}$ determine the resolution of the input and output results, respectively. Comparable resolutions in $s$ and $t$ are typically desired, which can be achieved by setting $\alpha_{1}=\alpha_{2}=\sqrt{2\pi/G}$ . A larger $G$ tends to yield a more accurate approximation of the CFT. Bailey and Swarztrauber (1994) computed the CFT of the standard normal density function using $G=2^{16}$ and achieved the root-mean-squared error of order $10^{-16}$ . In the simulations presented in this article, we set $G=2^{16}$ , resulting in $\alpha_{1}=\alpha_{2}\approx 0.01$ and $a\approx 641.7$ . In additional simulation studies where we used a larger $G$ , we found the results essentially unchanged. This algorithm can be similarly applied to approximate the inverse CFT.
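The CFT-by-DFT approximation described in this subsection can be sketched in Python with NumPy. This is a minimal illustration and not the lpme implementation: the function name `cft_via_fft` is hypothetical, and the alternating-sign form used below is our own rearrangement of the Riemann sum $\alpha_{1}\sum_{g}f(s_{g})e^{-it_{k}s_{g}}$ , obtained from $\alpha_{1}\alpha_{2}=2\pi/G$ and the requirement that $G/2$ be even, so it may differ in form from the displayed approximation.

```python
import numpy as np

def cft_via_fft(f, G=2**16):
    """Approximate F(t) = int f(s) exp(-i t s) ds on a grid of G points.

    Hypothetical helper: uses alpha_1 = alpha_2 = sqrt(2*pi/G), the
    comparable-resolution choice mentioned in the text.
    """
    if G % 4 != 0:
        raise ValueError("G/2 must be an even integer")
    alpha = np.sqrt(2.0 * np.pi / G)          # alpha_1 = alpha_2
    idx = np.arange(G)
    s = (idx - G // 2) * alpha                # input grid s_g
    t = (idx - G // 2) * alpha                # output grid t_k
    sign = np.where(idx % 2 == 0, 1.0, -1.0)  # (-1)^g, and (-1)^k on output
    # np.fft.fft computes D_k[z] = sum_g z_g exp(-2*pi*i*k*g/G).
    # With t_k * s_g = 2*pi*k*g/G - pi*k - pi*g + pi*G/2 and G/2 even,
    # exp(-i t_k s_g) = (-1)^k (-1)^g exp(-2*pi*i*k*g/G).
    F = alpha * sign * np.fft.fft(sign * f(s))
    return t, F

# Usage: the CFT of the standard normal density is exp(-t^2 / 2).
t, F = cft_via_fft(lambda s: np.exp(-s**2 / 2) / np.sqrt(2 * np.pi))
max_abs_err = np.max(np.abs(F - np.exp(-t**2 / 2)))
```

With the default $G=2^{16}$ the grid spacing is about 0.0098 and the covered range is about 641.7, matching the values quoted above.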

6.2 Bandwidth selection

It has been well acknowledged that the choice of bandwidth is crucial in kernel-based nonparametric estimation. In our study, we adopt the method of cross-validation (CV) in conjunction with simulation extrapolation (SIMEX, Carroll et al., 2006, Chapter 5) as proposed by Delaigle and Hall (2008). To implement this method, one first randomly divides the observed data, $\{(Y_{j},W_{j})\}_{j=1}^{n}$ , into $\delta$ subsamples of (nearly) equal size. Denote by $\mathcal{D}_{k}$ the $k$ th subsample, and $I_{k}$ the set of subject indices corresponding to the observations in $\mathcal{D}_{k}$ , for $k=1,\ldots,\delta$ . Then one carries out two rounds of $\delta$ -fold cross validation using further contaminated data. In the first round, one generates further contaminated data according to $W_{b,j}^{*}=W_{j}+U_{b,j}^{*}$ , for $b=1,\ldots,B$ and $j=1,\ldots,n$ , where $\{U_{b,j}^{*},b=1,\ldots,B\}_{j=1}^{n}$ are i.i.d. according to $f_{\hbox{\tiny$ U $}}(u)$ . Viewing $\mathbb{W}$ as the “unobserved true” covariate values, and $m^{*}(x)=E(Y|W=x)$ as the target regression function to be estimated using the “observed” data, $\{(Y_{j},W^{*}_{b,j})\}_{j=1}^{n}$ , for $b=1,\ldots,B$ , one may use the proposed method to estimate $m^{*}(x)$ . Denote this estimator by $\hat{m}_{\hbox{\tiny HZ}}^{*}(x)$ . Now one carries out the $\delta$ -fold cross validation to choose a bandwidth for estimating $m^{*}(x)$ that minimizes

[TABLE]

where $\hat{m}_{\hbox{\tiny HZ},b}^{*(-k)}(x)$ is the estimate $\hat{m}_{\hbox{\tiny HZ}}^{*}(x)$ computed using the further contaminated data excluding $\mathcal{D}_{k}$ , for $k=1,\ldots,\delta$ , and $w(\cdot)$ is a suitable weight function. Define $\hat{h}_{1}=\textrm{argmin}_{h>0}\textrm{CV}_{1}(h)$ . In the second round of $\delta$ -fold cross validation, another set of further contaminated data is produced according to $W_{b,j}^{**}=W_{b,j}^{*}+U_{b,j}^{**}$ , where $\{U_{b,j}^{**},b=1,\ldots,B\}_{j=1}^{n}$ are i.i.d. according to $f_{\hbox{\tiny$ U $}}(u)$ , for $b=1,\ldots,B$ and $j=1,\ldots,n$ , also independent of $\{U_{b,j}^{*},b=1,\ldots,B\}_{j=1}^{n}$ . Similar to the first round, one views $\mathbb{W}^{*}=\{W_{b,j}^{*},b=1,\ldots,B\}_{j=1}^{n}$ as the “unobserved true” covariate values, and considers estimating another target regression function $m^{**}(x)=E(Y|W^{*}=x)$ using the proposed method based on the “observed” data $\{(Y_{j},W^{**}_{b,j})\}_{j=1}^{n}$ , for $b=1,\ldots,B$ . Denote this estimator by $\hat{m}_{\hbox{\tiny HZ}}^{**}(x)$ . To select a bandwidth for estimating $m^{**}(x)$ , one minimizes the following criterion with respect to $h$ ,

[TABLE]

where $\hat{m}_{\hbox{\tiny HZ},b}^{**(-k)}(x)$ is the estimate $\hat{m}_{\hbox{\tiny HZ}}^{**}(x)$ computed using the data $\{(Y_{j},W^{**}_{b,j})\}_{j=1}^{n}$ excluding $\mathcal{D}_{k}$ , for $k=1,\ldots,\delta$ . Define $\hat{h}_{2}={\rm argmin}_{h>0}~{}\textrm{CV}_{2}(h)$ . Finally, one sets $\hat{h}=\hat{h}_{1}^{2}/\hat{h}_{2}$ as the bandwidth used in $\hat{m}_{\hbox{\tiny HZ}}(x)$ for estimating $m(x)$ based on the original observed data $\{(Y_{j},W_{j})\}_{j=1}^{n}$ .
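The two rounds of further contamination and the final extrapolation step can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the measurement error is taken to be Laplace with known standard deviation `sigma_u`, and the criteria $\textrm{CV}_{1}(h)$ and $\textrm{CV}_{2}(h)$ are passed in as user-supplied callables because evaluating them requires the estimator itself. As we understand the SIMEX rationale, $\hat{h}=\hat{h}_{1}^{2}/\hat{h}_{2}$ makes the ratio $\hat{h}/\hat{h}_{1}$ mimic the ratio $\hat{h}_{1}/\hat{h}_{2}$ .

```python
import numpy as np

def further_contaminate(W, sigma_u, B, rng):
    """Return W* and W** as (B, n) arrays (hypothetical helper).

    Assumes U ~ Laplace(0, sigma_u / sqrt(2)), whose variance is sigma_u^2.
    """
    scale = sigma_u / np.sqrt(2.0)
    n = W.shape[0]
    W_star = W + rng.laplace(0.0, scale, size=(B, n))        # first round
    W_2star = W_star + rng.laplace(0.0, scale, size=(B, n))  # second round
    return W_star, W_2star

def argmin_on_grid(cv, grid):
    """Grid point minimising a cross-validation criterion cv(h)."""
    values = np.array([cv(h) for h in grid])
    return grid[int(np.argmin(values))]

def simex_bandwidth(cv1, cv2, grid1, grid2):
    """Combine the two CV minimisers into h_hat = h1^2 / h2."""
    h1 = argmin_on_grid(cv1, grid1)
    h2 = argmin_on_grid(cv2, grid2)
    return h1**2 / h2

# Toy usage with made-up quadratic criteria (not real CV curves).
grid = np.array([0.1, 0.2, 0.3, 0.45, 0.6])
h_hat = simex_bandwidth(lambda h: (h - 0.3) ** 2,
                        lambda h: (h - 0.45) ** 2, grid, grid)
```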

This bandwidth selection procedure can be computationally cumbersome because, first, in search of $\hat{h}_{1}$ and $\hat{h}_{2}$ , one needs to evaluate $\textrm{CV}_{1}(h)$ and $\textrm{CV}_{2}(h)$ on a fine grid of candidate bandwidths; second, as recommended in most SIMEX applications, one needs a $B$ not too small in order to control the Monte Carlo variability when generating further contaminated data. To reduce the computational burden, we propose a procedure to refine the search region of $h$ . Take the first round of cross validation described above as an example. Recall that, during this round, $\mathbb{W}$ is viewed as the unobserved true covariate values whereas $\mathbb{W}^{*}$ is the error-contaminated version of the true covariate values. To narrow down the search region of $h$ when minimizing $\textrm{CV}_{1}(h)$ , we first find an initial bandwidth, $\tilde{h}_{1}$ . In particular, we obtain $\tilde{h}_{1}$ by minimizing the following approximated mean integrated squared error (MISE) for the deconvolution kernel density estimator of $f_{\hbox{\tiny$ W $}}(w)$ using $\mathbb{W}^{*}$ (Stefanski and Carroll, 1990),

[TABLE]

where $\int\{f^{\prime\prime}_{\hbox{\tiny$ W $}}(w)\}^{2}dw$ can be easily estimated using $\mathbb{W}$ . After $\tilde{h}_{1}$ is found, we search for $\hat{h}_{1}$ across $L$ grid points within $[0.2\tilde{h}_{1},\,2\tilde{h}_{1}]$ . This strategy is motivated by the theoretical finding that the deconvolution kernel regression estimators have the same optimal rates as the deconvolution kernel density estimators. In our extensive trial-and-error simulation experiments under the model settings described in Section 6.3, we considered a wider search region that encompasses $[0.2\tilde{h}_{1},\,2\tilde{h}_{1}]$ , and we observed that all selected $h$ indeed fell in the above refined search region. Similarly, in the second round of cross validation, we search for $\hat{h}_{2}$ across $L$ grid points within $[0.2\tilde{h}_{2},\,2\tilde{h}_{2}]$ , where $\tilde{h}_{2}$ is chosen by minimizing (29) with $\int\{f^{\prime\prime}_{\hbox{\tiny$ W $}}(w)\}^{2}dw$ replaced by $\int\{f^{\prime\prime}_{\hbox{\tiny$ W^{*} $}}(w)\}^{2}dw$ , which can be easily estimated using $\mathbb{W}^{*}$ .

One may legitimately question our choice of the multiplicative factors, 0.2 and 2, in the recommended refined search region of $h$ . For a given application, the safe and conservative way to choose $h$ usually involves some trial-and-error. If the optimal $h$ found within this refined region is too close to one of the boundaries, one may consider pushing that end of the region out slightly and adjusting the search region accordingly. Using the refined search region of $h$ at each round of cross validation, we also observe in simulations that one can even use a much smaller $B$ without noticeably compromising the quality of $\hat{m}_{\hbox{\tiny HZ}}(x)$ . This refined bandwidth selection procedure and the algorithm for approximating CFT and inverse CFT described in Section 6.1 are implemented in an R package called lpme created and maintained by the second author, which provides both $\hat{m}_{\hbox{\tiny HZ}}(x)$ and $\hat{m}_{\hbox{\tiny DFC}}(x)$ .
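The refined search, including the suggestion to push a boundary outward when the minimizer lands on it, might be organised as in the sketch below. The function names, the equally spaced grid, the widening factor of 1.5, and the cap on the number of rounds are all assumptions made for illustration; the text specifies only the interval $[0.2\tilde{h},\,2\tilde{h}]$ and the number of grid points $L$ .

```python
import numpy as np

def refined_grid(h_tilde, L=10, lower=0.2, upper=2.0):
    """L equally spaced candidates in [lower*h_tilde, upper*h_tilde]."""
    return np.linspace(lower * h_tilde, upper * h_tilde, L)

def search_bandwidth(cv, h_tilde, L=10, lower=0.2, upper=2.0,
                     widen=1.5, max_rounds=5):
    """Minimise cv(h) on the refined grid, widening an end if it is hit.

    cv is a user-supplied callable standing in for CV_1(h) or CV_2(h).
    """
    for _ in range(max_rounds):
        grid = refined_grid(h_tilde, L, lower, upper)
        best = int(np.argmin([cv(h) for h in grid]))
        if best == 0:
            lower /= widen       # minimiser on the left edge: extend left
        elif best == L - 1:
            upper *= widen       # minimiser on the right edge: extend right
        else:
            break                # interior minimiser: stop
    return grid[best]

# Toy usage with a made-up criterion whose minimiser (3.0) lies outside
# the initial region [0.2, 2.0], so the upper end is widened.
h_found = search_bandwidth(lambda h: (h - 3.0) ** 2, h_tilde=1.0)
```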

6.3 Simulation study

In the simulation experiments, we compare realizations of $\hat{m}_{\hbox{\tiny HZ}}(x)$ and $\hat{m}_{\hbox{\tiny DFC}}(x)$ (with $p=1$ ) obtained under the following four model configurations:

(C1) $[Y|X=x]\sim N(m(x),0.2^{2})$ , where $m(x)=2x\exp(-10x^{4}/81)$ , $X=0.8X_{1}+0.2X_{2}$ , $X_{1}\sim f_{\hbox{\tiny$ X_{1} $}}(x)=0.1875x^{2}I_{[-2,2]}(x)$ , $X_{2}\sim\textrm{uniform}(-1,1)$ , and $U\sim{\rm Laplace}(0,\sigma_{u}/\sqrt{2})$ .

(C2) $[Y|X=x]\sim N(m(x),0.5^{2})$ , where $m(x)=(x+x^{2})/4$ , $X\sim N(0,1)$ , and $U\sim N(0,\sigma_{u}^{2})$ .

(C3) $[Y|X=x]\sim N(m(x),0.2^{2})$ , where $m(x)=x^{6}/30-5x^{4}/6+9x^{2}/2+x$ , $X\sim\mathrm{uniform}(-2,2)$ , and $U\sim\mathrm{Laplace}(0,\sigma_{u}/\sqrt{2})$ .

(C4) $[Y|X=x]\sim N(m(x),0.2^{2})$ , where $m(x)=\cos(x^{2})+\sin(x)$ , $X\sim\mathrm{uniform}(-2,2)$ , and $U\sim\mathrm{Laplace}(0,\sigma_{u}/\sqrt{2})$ .

Among these configurations, (C1) is considered in Delaigle et al. (2009); (C2) creates a scenario where the dominating bias of $\hat{m}_{\hbox{\tiny DFC}}(x)$ never vanishes since $m(x)$ is a second-order polynomial; (C3), with $m(x)$ being a higher order polynomial, results in zero dominating bias for $\hat{m}_{\hbox{\tiny DFC}}(x)$ within the support of $X$ at $\pm 1$ ; and (C4) has $m(x)$ out of the polynomial family yet it can be expanded as a polynomial of infinite order. Besides the model configuration, we also vary the reliability ratio $\lambda=\textrm{Var}(X)/\{\textrm{Var}(X)+\sigma_{u}^{2}\}$ from 0.7 to 0.95 at increments of 0.05 when generating $\mathbb{W}$ . Under (C2), although the measurement errors are simulated from a normal distribution, we computed the estimates of $m(x)$ assuming a normal $U$ first, and then we repeated the estimation assuming a Laplace $U$ . This exercise allows us to observe the effects of a misspecified distribution for $U$ on the estimates. Under each simulation setting, $500$ Monte Carlo (MC) replicates of sample size $n=500$ are generated from the true model of $(Y,W)$ . For both estimation methods, we used the kernel of which the Fourier transform is given by $\phi_{\hbox{\tiny$ K $}}(t)=(1-t^{2})^{8}I_{[-1,1]}(t)$ .
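To make the role of the reliability ratio concrete, the sketch below generates one data set under (C3). Solving $\lambda=\textrm{Var}(X)/\{\textrm{Var}(X)+\sigma_{u}^{2}\}$ for the error variance gives $\sigma_{u}^{2}=\textrm{Var}(X)(1-\lambda)/\lambda$ , and for $X\sim\textrm{uniform}(-2,2)$ we have $\textrm{Var}(X)=4/3$ . The function names are hypothetical, and this is not the code used to produce the reported results.

```python
import numpy as np

def sigma_u_from_reliability(var_x, lam):
    """Solve lam = var_x / (var_x + sigma_u^2) for sigma_u."""
    return np.sqrt(var_x * (1.0 - lam) / lam)

def simulate_c3(n, lam, rng):
    """One sample (Y, W, X) under configuration (C3)."""
    m = lambda x: x**6 / 30 - 5 * x**4 / 6 + 9 * x**2 / 2 + x
    X = rng.uniform(-2.0, 2.0, size=n)
    var_x = 4.0**2 / 12.0                      # variance of uniform(-2, 2)
    sigma_u = sigma_u_from_reliability(var_x, lam)
    # Laplace(0, s) has variance 2 s^2, so s = sigma_u / sqrt(2).
    U = rng.laplace(0.0, sigma_u / np.sqrt(2.0), size=n)
    Y = m(X) + rng.normal(0.0, 0.2, size=n)
    return Y, X + U, X

rng = np.random.default_rng(2017)
Y, W, X = simulate_c3(n=500, lam=0.8, rng=rng)
```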

Denote by $\hat{m}_{[\cdot]}(x)$ one of the two estimates under comparison generically. For the majority of the simulation experiments, in order to mitigate the confounding effect of a data-driven bandwidth selection method on the quality of $\hat{m}_{[\cdot]}(x)$ , we computed $\hat{m}_{[\cdot]}(x)$ using the theoretical optimal bandwidth obtained via minimizing an approximation of the integrated squared error (ISE), ${\rm ISE}=\int_{x_{\hbox{\tiny L}}}^{x_{\hbox{\tiny U}}}\{\hat{m}_{[\cdot]}(x)-m(x)\}^{2}dx$ , where $[x_{\hbox{\tiny L}},\,x_{\hbox{\tiny U}}]$ is the interval of the true covariate value of interest. This approximated ${\rm ISE}$ is given by $\sum_{k=0}^{\mathcal{M}}\{\hat{m}_{[\cdot]}(x_{k})-m(x_{k})\}^{2}\Delta$ , where $\Delta$ is the partition resolution, $\mathcal{M}$ is the largest integer no greater than $(x_{\hbox{\tiny U}}-x_{\hbox{\tiny L}})/\Delta$ , and $x_{k}=x_{\hbox{\tiny L}}+k\Delta$ , for $k=0,\ldots,\mathcal{M}$ . For a small portion of the presented simulation experiments, we used the CV-SIMEX bandwidth selection strategy described in Section 6.2 to select a bandwidth for each of the two estimators. Note that, when choosing a bandwidth for $\hat{m}_{\hbox{\tiny DFC}}(x)$ , one should change $\hat{m}^{*}_{\hbox{\tiny HZ}}(x)$ and $\hat{m}^{**}_{\hbox{\tiny HZ}}(x)$ in Section 6.2 to the counterpart estimates $\hat{m}^{*}_{\hbox{\tiny DFC}}(x)$ and $\hat{m}^{**}_{\hbox{\tiny DFC}}(x)$ , respectively.
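The approximated ISE is a plain Riemann sum and can be computed as in the following sketch. The function name `approx_ise` is hypothetical, and `m_hat` stands for any fitted curve supplied as a vectorised callable.

```python
import numpy as np

def approx_ise(m_hat, m, x_lo, x_hi, delta):
    """Riemann-sum approximation of int_{x_lo}^{x_hi} (m_hat - m)^2 dx."""
    M = int(np.floor((x_hi - x_lo) / delta))  # largest integer <= range/delta
    x = x_lo + delta * np.arange(M + 1)       # x_k = x_lo + k*delta, k=0..M
    return float(np.sum((m_hat(x) - m(x)) ** 2) * delta)

# Toy usage: the (C4) curve and a made-up "estimate" that is off by 0.2
# everywhere, so each of the M + 1 = 9 grid terms contributes 0.04 * 0.5.
m = lambda x: np.cos(x**2) + np.sin(x)
ise = approx_ise(lambda x: m(x) + 0.2, m, -2.0, 2.0, 0.5)
```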

We compare the performance of $\hat{m}_{\hbox{\tiny HZ}}(x)$ and $\hat{m}_{\hbox{\tiny DFC}}(x)$ with regard to the quality of the entire regression curve estimation over $[x_{\hbox{\tiny L}},\,x_{\hbox{\tiny U}}]$ , as well as the quality of the estimation of $m(x)$ at individual $x$ ’s. The quantity used to monitor the overall regression curve estimation is the approximated ISE. The quantities used to assess the quality of $\hat{m}_{[\cdot]}(x)$ at a particular point $x=x_{0}$ are based on the pointwise absolute error (PAE), $\textrm{PAE}(x_{0})=|\hat{m}_{[\cdot]}(x_{0})-m(x_{0})|$ . Specifically, we compute the following three summary statistics: first, the pointwise mean absolute error ratio (PmAER) defined by

[TABLE]

second, the pointwise standard deviation of absolute error ratio (PsdAER) defined by

[TABLE]

and third, the pointwise mean squared error ratio (PMSER) defined by

[TABLE]

These quantities are presented in Figures 1–4 for (C1)–(C4), respectively. Figure 5 shows the counterpart results of Figure 2 under (C2) when it is (incorrectly) assumed that $U$ follows a Laplace distribution. These five figures depict results obtained when the theoretical optimal $h$ is used. Lastly, Figure 6 is the counterpart of Figure 4 under (C4) with $h$ chosen by the CV-SIMEX bandwidth selection procedure with $B=10$ and $L=10$ . Very similar performance of the two estimates is observed when larger values of $B$ or $L$ are used in this round of experiment.

When the theoretical optimal bandwidth is used, as in Figures 1–5, $\hat{m}_{\hbox{\tiny HZ}}(x)$ outperforms $\hat{m}_{\hbox{\tiny DFC}}(x)$ over most of each considered range of $x$ in regard to both accuracy and precision. Even though it is shown in Section 4.2 that the dominating variance of $\hat{m}_{\hbox{\tiny HZ}}(x)$ is higher than that of $\hat{m}_{\hbox{\tiny DFC}}(x)$ when the distribution of $U$ is ordinary smooth (e.g., a Laplace distribution), this large sample trend does not take effect over most of the range of $x$ in these finite sample experiments. The regions where $\hat{m}_{\hbox{\tiny DFC}}(x)$ performs better than $\hat{m}_{\hbox{\tiny HZ}}(x)$ in regard to bias, variance, and MSE are usually neighborhoods of the inflection points of $m(x)$ . For instance, under (C3) (see panel (i) in Figure 3), $\hat{m}_{\hbox{\tiny DFC}}(x)$ is less biased than $\hat{m}_{\hbox{\tiny HZ}}(x)$ in small neighborhoods of $\pm 1$ . It is worth pointing out that the gain in accuracy and precision from our estimator compared to the DFC estimator is especially notable at the boundary of $x$ in (C3) and (C4) (see panels (c), (f), and (i) in Figures 3 and 4). In both cases, the data points are uniformly distributed over the domain of $m(x)$ . Different from (C3) and (C4), in (C1), there are more data points near the boundaries than elsewhere in the domain. Excluding (C2) (since the plotted range of $x$ in Figures 2 and 5 is not the entire observed range), (C1) is the only case among all considered cases here in which $\hat{m}_{\hbox{\tiny DFC}}(x)$ outperforms $\hat{m}_{\hbox{\tiny HZ}}(x)$ near the boundaries in terms of bias. However, $\hat{m}_{\hbox{\tiny HZ}}(x)$ is still substantially less variable, and its MSE is lower than that of the competing estimator (see panels (c), (f), and (i) of Figure 1). Finally, contrasting Figure 2 and Figure 5, one can see that both estimators are fairly robust to the misspecification of the measurement error distribution.

When the bandwidth is chosen by the refined CV-SIMEX method, as in Figure 6, both estimates become more variable, with our estimates better than the DFC estimates over most of the 500 MC replicates. As mentioned earlier, increasing $B$ to a larger value does not substantially change our estimate. More importantly, using a $B$ smaller than ten affects our estimator far less than it affects the DFC estimator.

6.4 Motorcycle data

We now apply the two estimation methods to error-contaminated data sets created based on the motorcycle crash data from a simulated motorcycle crash designed to test crash helmets (available under R library MASS). The original data set consists of 133 measurements of head acceleration measured in standard gravity acceleration (gs) at various times in milliseconds after impact. It is of interest to estimate the underlying head acceleration, $Y$ , as a function of time after impact, $X$ . Having the error-free data in this example allows us to have a reference estimate of the regression function with which the estimates based on error-prone data can be compared.

Based on the original data, we first obtain the local linear estimate of $m(x)$ , denoted by $\hat{m}(x)$ , using the R function locpol in the locpol package, with the bandwidth chosen by cross validation (Wang and Jones, 1995) implemented by function regCVBwSelC in the same R package. Compared to the fitted curves using error-prone data, the $\hat{m}(x)$ can be viewed as the “ideal” estimate in the sense that one cannot do better than this with error-contaminated data. We use this ideal curve as the reference curve in our follow-up experiments, where we contaminate $X$ with simulated independent Laplace measurement errors to achieve different levels of reliability ratio $\lambda$ . At each level of $\lambda$ , we use the error-contaminated data to estimate the acceleration curve using the two estimation methods, both assuming Laplace $U$ . This experiment of curve estimation is repeated 500 times at each level of $\lambda$ . We obtained very similar results when we contaminated $X$ with simulated normal $U$ while assuming Laplace $U$ for estimations.

Figure 7 depicts the results, including boxplots of ISE at each $\lambda$ level, the fitted curves for $\lambda=0.95$ selected according to quantiles of ISE when the approximated theoretical optimal $h$ is used, and the counterpart fitted curves when the refined CV-SIMEX method is used to select $h$ with $B=10$ and $L=10$ . Using the ideal estimate as the “truth,” our estimate appears to be less biased and less variable at all considered levels of error contamination than the DFC estimate. When the refined CV-SIMEX method is used to select $h$ , our estimator suffers less numerical instability compared to the competing method.

7 Discussion

In this study we proposed a local polynomial estimator of the regression function when the covariate is measured with error. The proposed estimator makes direct use of the naive inference, leading to relatively more transparent connections between the properties of the proposed estimator and those of the inference from error-free data. We rigorously derived the asymptotic properties of the proposed estimator in comparison with the estimator proposed by Delaigle et al. (2009). Under very similar regularity conditions, besides the asymptotic normality that both estimators possess, the asymptotic bias and variance of these estimators are carefully compared. Theoretical evidence suggests that the new estimator can be less biased than the competing estimator. Results from an extensive simulation study also support this finding.

To implement the proposed method, we thoughtfully refined the CV-SIMEX bandwidth selection method proposed by Delaigle and Hall (2008) to narrow the search region of $h$ , which in turn allows us to use a much smaller $B$ in the SIMEX implementation without noticeable loss in accuracy. This refinement greatly reduces the computational burden for the otherwise intrinsically cumbersome bandwidth selection procedure.

In our simulation studies, how the proposed estimator and the DFC estimator compare at the boundary of the support of $X$ depends on the distribution of $X$ . Even though the proposed estimator appears to suffer less numerical instability when the refined CV-SIMEX method is used to select $h$ , it can still be rather challenging to estimate the curve near the boundary. The properties of our estimator near the boundary deserve further investigation, which may lead to ways to improve its behavior near the boundary. Besides the generalization of the proposed method pointed out earlier in Section 2.2 to allow multiple covariates, one can also follow the construction of the proposed estimator in (7) to obtain non-naive estimators of $m(x)$ by starting with a parametric naive estimator $\hat{m}^{*}(w)$ . For instance, one may naively fit a polynomial regression function to obtain $\hat{m}^{*}(w)$ , then use it in (7) to achieve a non-naive estimator of $m(x)$ that is not completely nonparametric. However, the obtained estimator of $m(x)$ is usually not of the same functional form as $m^{*}(w)$ . If one wishes to fit a polynomial regression function accounting for measurement error, the method proposed by Zavala et al. (2007) is a more appealing approach than our proposed nonparametric approach.

The measurement error distribution is assumed to be known in the simulation study presented in Section 6.3, where in one case the distribution is misspecified as a Laplace distribution, and we observe little influence of such misspecification on the proposed estimator. This robustness phenomenon is also pointed out in Delaigle et al. (2009) for the DFC estimator, and is discussed in Meister (2004) and Delaigle (2008). Taking advantage of this robustness feature, when the measurement error distribution is unknown, we recommend using the mean-zero Laplace characteristic function, $\phi_{\hbox{\tiny$ U $}}(t)=1/\{1+(\sigma^{2}_{u}/2)t^{2}\}$ , in the estimator, where $\sigma^{2}_{u}$ can be trivially and consistently estimated by equation (4.3) in Carroll et al. (2006) when repeated measures of each $X_{j}$ are available. We implement this recommended strategy for the four cases considered in Section 6.3 and observe very similar results to those shown in Figures 1–5. In particular, we generate two replicate measures, $W_{j,k}=X_{j}+U_{j,k}$ , where $U_{j,k}$ ’s are i.i.d. with variance $2\sigma^{2}_{u}$ , for $k=1,2$ , $j=1,\ldots,n$ . Then we define $W_{j}=(W_{j,1}+W_{j,2})/2$ , for $j=1,\ldots,n$ , as the observed covariate values used in $\hat{m}_{\hbox{\tiny HZ}}(x)$ and $\hat{m}_{\hbox{\tiny DFC}}(x)$ , where the associated measurement error variance is $\sigma_{u}^{2}$ . Following equation (4.3) in Carroll et al. (2006), we estimate $\sigma^{2}_{u}$ via $\sum_{j=1}^{n}\sum_{k=1}^{2}(W_{j,k}-W_{j})^{2}/(2n)$ . Figure 8 shows the counterpart results of those shown in Figure 5, from which we can see that using an estimated variance in the misspecified $\phi_{\hbox{\tiny$ U $}}(t)$ does not affect the estimates noticeably. Plots parallel to Figures 1, 3, and 4, which show estimates obtained using the same strategy under the other three cases, are given in Appendix D.
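The replicate-based variance estimate and the assumed Laplace characteristic function can be sketched as follows. The helper names are hypothetical, and the choices of a standard normal $X$ , Laplace replicate errors, and $n=100{,}000$ are assumptions made only so that the check is visible; they are not the settings of the reported simulations.

```python
import numpy as np

def estimate_sigma_u2(W_rep):
    """sum_j sum_k (W_jk - W_j)^2 / (2n) for an (n, 2) array of replicates."""
    W_bar = W_rep.mean(axis=1, keepdims=True)  # W_j = (W_j1 + W_j2) / 2
    n = W_rep.shape[0]
    return float(np.sum((W_rep - W_bar) ** 2) / (2 * n))

def laplace_cf(t, sigma_u2):
    """Mean-zero Laplace characteristic function 1 / {1 + (sigma_u^2/2) t^2}."""
    return 1.0 / (1.0 + 0.5 * sigma_u2 * np.asarray(t) ** 2)

rng = np.random.default_rng(0)
n, sigma_u2 = 100_000, 0.25
X = rng.normal(size=n)
# Each replicate error has variance 2*sigma_u2; a Laplace(0, s) variable has
# variance 2 s^2, so the scale is s = sqrt(sigma_u2).
U = rng.laplace(0.0, np.sqrt(sigma_u2), size=(n, 2))
W_rep = X[:, None] + U
sigma_u2_hat = estimate_sigma_u2(W_rep)        # should be close to 0.25
phi_U = laplace_cf(np.linspace(-5, 5, 11), sigma_u2_hat)
```

Since $W_{j,k}-W_{j}=\pm(U_{j,1}-U_{j,2})/2$ has variance $\sigma_{u}^{2}$ , each inner sum has expectation $2\sigma_{u}^{2}$ , which is why dividing by $2n$ targets $\sigma_{u}^{2}$ .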

Alternatively, one may follow the approach proposed by Delaigle et al. (2008) to estimate $\phi_{\hbox{\tiny$ U $}}(t)$ when repeated measures are available, which we also implement in the four cases considered in Section 6.3 using the aforementioned simulated repeated measures. Although this approach frees one from assuming a specific distribution for $U$ and estimating $\sigma^{2}_{u}$ , the resultant estimates are mostly inferior to the estimates resulting from an assumed Laplace $U$ with $\sigma^{2}_{u}$ estimated. Figure 9 shows the comparison between these two treatments of $\phi_{\hbox{\tiny$ U $}}(t)$ in our proposed estimator in regard to bias, variability, and MSE. In the three ratios, PmAER, PsdAER, and PMSER, depicted in Figure 9, the estimate in the numerator is our estimate assuming Laplace $U$ with an estimated $\sigma_{u}^{2}$ , and the estimate in the denominator is our estimate with the estimated $\phi_{\hbox{\tiny$ U $}}(t)$ . The comparison clearly shows that there is no gain from estimating $\phi_{\hbox{\tiny$ U $}}(t)$ instead of simply assuming a Laplace $U$ with $\sigma_{u}^{2}$ estimated. Obviously, neither $\sigma^{2}_{u}$ nor $\phi_{\hbox{\tiny$ U $}}(t)$ is identifiable when one does not have repeated measures or other forms of external data that allow one to estimate the measurement error distribution. In this case, one can carry out sensitivity analysis with $\sigma^{2}_{u}$ varying over a range of practical interest.

Supplemental materials

The supplement to this article contains Appendices A–D referenced in Sections 3, 4, 5, and 7.

Acknowledgments

The authors express sincere thanks to the Editor, the Associate Editor and an anonymous referee for their constructive and valuable suggestions on this article, which have led to significant improvements.

Disclosure statement

No potential conflict of interest was reported by the authors.

Funding

The first author gratefully acknowledges support from the NSF under grant number DMS-1006222.

Bibliography

Bailey, D. and Swarztrauber, P. (1994), ‘A fast method for the numerical evaluation of continuous Fourier and Laplace transforms’, SIAM Journal on Scientific Computing, 15, 1105–1110.

Carroll, R. and Hall, P. (1988), ‘Optimal rates of convergence for deconvoluting a density’, Journal of the American Statistical Association, 83, 1184–1186.

Carroll, R., Ruppert, D., Stefanski, L. A., and Crainiceanu, C. M. (2006), Measurement error in nonlinear models: A modern perspective (2nd ed.), Chapman & Hall/CRC, Boca Raton, FL.

Davis, P. J. and Rabinowitz, P. (1984), Methods of Numerical Integration, Academic Press.

Delaigle, A. (2008), ‘An alternative view of the deconvolution problem’, Statistica Sinica, 18, 1025–1045.

Delaigle, A. (2014), ‘Nonparametric kernel methods with errors-in-variables: constructing estimators, computing them, and avoiding common mistakes’, Australian & New Zealand Journal of Statistics, 56, 105–124.

Delaigle, A., Fan, J., and Carroll, R. (2009), ‘A design-adaptive local polynomial estimator for the error-in-variables problem’, Journal of the American Statistical Association, 104, 348–359.

Delaigle, A. and Hall, P. (2008), ‘Using SIMEX for smoothing-parameter choice in errors-in-variables problems’, Journal of the American Statistical Association, 103, 280–287.
