Rate-optimal nonparametric estimation for random coefficient regression models

Hajo Holzmann; Alexander Meister

arXiv:1902.05261·math.ST·February 18, 2020


TL;DR

This paper establishes the optimal convergence rate for nonparametric density estimation in linear random coefficient models, highlighting the influence of tail behavior and proposing an estimator that does not require density division.

Contribution

It derives the first optimal pointwise convergence rate for density estimation in these models, accounting for tail behavior, and introduces an adaptive estimator without dividing by a density estimate.

Findings

01

Achieves the optimal convergence rate for density estimation.

02

Shows the tail behavior of the design density affects the rate.

03

Proposes an estimator that does not require dividing by a nonparametric density estimate.

Abstract

Random coefficient regression models are a popular tool for analyzing unobserved heterogeneity, and have seen renewed interest in the recent econometric literature. In this paper we obtain the optimal pointwise convergence rate for estimating the density in the linear random coefficient model over Hölder smoothness classes, and in particular show how the tail behavior of the design density impacts this rate. In contrast to previous suggestions, the estimator that we propose and that achieves the optimal convergence rate does not require dividing by a nonparametric density estimate. The optimal choice of the tuning parameters in the estimator depends on the tail parameter of the design density and on the smoothness level of the Hölder class, and we also study adaptive estimation with respect to both parameters.

Equations

$$Y_{j}=A_{0,j}+A_{1,j}\,X_{j}.$$

$$C_{X}\,(1+|x|)^{-\beta-2}\,\geq\,f_{X}(x)\,\geq\,c_{X}\,(1+|x|)^{-\beta-2},\qquad\forall x\in\mathbb{R},$$

$$U_{j}=Y_{j}/\sqrt{1+X_{j}^{2}}\,,\qquad\big(\cos Z_{j},\sin Z_{j}\big)=(1,X_{j})/\sqrt{1+X_{j}^{2}}\,,$$

$$U_{j}=A_{0,j}\cos Z_{j}+A_{1,j}\sin Z_{j}.$$

$$\psi_{U|Z}(t|z)=\psi_{A}\big(t\cos z,t\sin z\big)\,.$$

$$f_{A}(a)=\frac{1}{(2\pi)^{2}}\int_{\mathbb{R}}\int_{-\pi/2}^{\pi/2}|t|\,\exp\big(-it(a_{0}\cos z+a_{1}\sin z)\big)\,\psi_{U|Z}\big(t|z\big)\,\mathrm{d}z\,\mathrm{d}t\,.$$

$$\tilde{f}_{A}(a;h)=\frac{1}{(2\pi)^{2}}\int_{\mathbb{R}}\int_{-\pi/2}^{\pi/2}w(th)\,|t|\,\exp\big(-it(a_{0}\cos z+a_{1}\sin z)\big)\,\psi_{U|Z}\big(t|z\big)\,\mathrm{d}z\,\mathrm{d}t=\int_{-\pi/2}^{\pi/2}\int_{\mathbb{R}}K\big(u-a_{0}\cos z-a_{1}\sin z;h\big)\,f_{U|Z}(u|z)\,\mathrm{d}u\,\mathrm{d}z\,,$$

$$K\big(x;h\big):=\frac{1}{(2\pi)^{2}}\int_{\mathbb{R}}w(th)\,|t|\,\exp(itx)\,\mathrm{d}t=\frac{2}{(2\pi)^{2}}\int_{0}^{\infty}w(th)\,t\,\cos(tx)\,\mathrm{d}t.$$

$$\hat{f}_{A}(a;h,\delta)=\frac{1}{(2\pi)^{2}}\int_{\mathbb{R}}w(th)\,|t|\sum_{j=1}^{n-1}\exp\big(it\big(U_{[j]}-a_{0}\cos Z_{(j)}-a_{1}\sin Z_{(j)}\big)\big)\cdot\big(Z_{(j+1)}-Z_{(j)}\big)\,\mathbbm{1}(-\pi/2+\delta\leq Z_{(j)}\leq Z_{(j+1)}\leq\pi/2-\delta)\,\mathrm{d}t,$$

$$\sum_{j,n,\delta}\;:=\sum_{\substack{j\in\{1,\dots,n\},\\ -\pi/2+\delta\leq Z_{(j)}\leq Z_{(j+1)}\leq\pi/2-\delta}}$$

$$\hat{f}_{A}(a;h,\delta)=\sum_{j,n,\delta}K\big(U_{[j]}-a_{0}\cos Z_{(j)}-a_{1}\sin Z_{(j)};h\big)\,\big(Z_{(j+1)}-Z_{(j)}\big).$$

$$\Big|\frac{\partial^{s}f_{A}}{\partial x^{k}\partial y^{s-k}}(x,y)-\frac{\partial^{s}f_{A}}{\partial x^{k}\partial y^{s-k}}(a_{0},a_{1})\Big|\,\leq\,c_{A}\cdot\big|(x,y)-a\big|^{\alpha-s}\,,$$

$$\int\underset{y\in\mathbb{R}}{\operatorname{ess\,sup}}\,\big|\nabla\psi_{A}(x,y)\big|\,\mathrm{d}x\,\leq\,c_{B}\,,$$

$$\delta\asymp n^{-\frac{1}{\beta+1}},\quad\text{ and }\quad h\asymp n^{-\frac{1}{(\alpha+2)(\beta+1)}},$$

$$\sup_{f_{A}\in{\cal F}}\operatorname{\mathbb{E}}_{f_{A}}\big[\big|\hat{f}_{A}(a;h,\delta)-f_{A}(a)\big|^{2}\big]\,=\,{\cal O}\big(n^{-\frac{2\,\alpha}{(\alpha+2)(\beta+1)}}\big)\,.$$

$$\liminf_{n\to\infty}\,n^{\frac{2\,\alpha}{(\alpha+2)(\beta+1)}}\,\sup_{f_{A}\in{\cal F}}\operatorname{\mathbb{E}}_{f_{A}}\big[\big|\hat{f}_{n}(0)-f_{A}(0)\big|^{2}\big]\,>\,0\,.$$

$$\sup_{f_{A}\in{\cal F}}\operatorname{\mathbb{E}}_{f_{A}}\big[\big|\hat{f}_{A}(a;h,\delta)-f_{A}(a)\big|^{2}\big]\,=\,{\cal O}\big(n^{-\frac{2\alpha}{2\alpha+4}}\big);$$

$$\text{Var}_{f_{A}}\big(\hat{f}_{A}(a;h,\delta)\,|\,\sigma_{Z}\big)$$

$$\delta\asymp\Big(\frac{\log n}{n}\Big)^{\frac{1}{\beta+1}},\quad\text{ and }\quad h\asymp\Big(\frac{\log n}{n}\Big)^{\frac{1}{(\alpha+2)(\beta+1)}}.$$

$$\sup_{f_{A}\in{\cal F}}\operatorname{\mathbb{E}}_{f_{A}}\big[\sup_{a\in K}\,\big|\hat{f}_{A}(a;h,\delta)-f_{A}(a)\big|^{2}\big]\,=\,{\cal O}\Big(\Big(\frac{\log n}{n}\Big)^{\frac{2\,\alpha}{(\alpha+2)(\beta+1)}}\Big)\,.$$

$$L_{n}(\delta)$$

$${\cal C}_{n}(\delta):=\;\ldots\;+\big(L_{n}(\delta)+\pi/2\big)^{2}+\big(\pi/2-R_{n}(\delta)\big)^{2}+\delta^{2},$$

$${\cal C}_{n}(\hat{\delta}_{n})\,\leq\,\exp(-n)+\inf_{\delta\in[n^{-1/2},\pi/4]}{\cal C}_{n}(\delta).$$

$$\hat{h}_{n}=\big({\cal C}_{n}(\hat{\delta}_{n})\big)^{\frac{1}{2\,(\alpha+2)}},$$

$$\sup_{f_{A}\in{\cal F}}\operatorname{\mathbb{E}}_{f_{A}}\big[\big|\hat{f}_{A}\big(a;\hat{h}_{n},\hat{\delta}_{n}\big)-f_{A}(a)\big|^{2}\big]\,=\,{\cal O}\big(n^{-\frac{2\,\alpha}{(\alpha+2)(\beta+1)}}\big)\,,$$

$$h_{k}=\hat{\delta}_{n}^{1/2}\,q^{k},\qquad k\in{\cal K}_{n}=\{0,\dots,K\},$$

$$\hat{f}_{k}=\hat{f}_{A}(a;h_{k},\hat{\delta}_{n}).$$


Full text


Hajo Holzmann

Fachbereich Mathematik und Informatik,

Philipps-Universität Marburg,

35037 Marburg, Germany.

Alexander Meister

Institut für Mathematik,

Universität Rostock,

18051 Rostock, Germany.


MSC subject classifications: 62G07, 62G20, 62G30.

Keywords: adaptive estimation, ill-posed inverse problem, minimax risk, nonparametric estimation.

1 Introduction

In this paper we consider the linear random coefficient regression model, in which i.i.d. (independent and identically distributed) data $(X_{j},Y_{j})$ , $j=1,\ldots,n$ are observed according to

$$Y_{j}=A_{0,j}+A_{1,j}\,X_{j}.\tag{1.1}$$

Therein $A_{j}:=(A_{0,j},A_{1,j})$ are unobserved i.i.d. random variables with the bivariate Lebesgue density $f_{A}$, while $A_{j}$ and $X_{j}$ are independent. Note that (1.1) represents a randomized extension of the standard linear regression model. We shall derive the optimal convergence rates for estimating $f_{A}$ over Hölder smoothness classes in the case where the $X_{j}$ have a Lebesgue density $f_{X}$ with polynomial tail behaviour, as specified in Assumption 1 below.

From a parametric point of view with focus on means and variances of the random coefficients, a multivariate version of model (1.1) is studied by [11]. They assume the coefficients $A_{j}$ to be mutually independent. The nonparametric analysis of model (1.1) has been initiated by [3] and [4]. [2] use Fourier methods to construct an estimator of $f_{A}$ . They do not derive the optimal convergence rate, though. Furthermore, their estimator is rather involved as it requires a nonparametric estimator of a conditional characteristic function, which is then plugged into a regularized Fourier inversion.

Extensions of model (1.1) have seen renewed interest in the econometrics literature in recent years. [13] suggest a nonparametric estimator in a multivariate version of model (1.1). They only obtain its convergence rate for very heavy tailed regressors. Moreover, their estimator requires dividing by a nonparametric density estimator for a transformed version of the regressors. This involves an additional smoothing step, and potentially renders the estimator unstable. [5] propose a specification test for model (1.1) against a general nonseparable model as the alternative, while [6] suggest multiscale tests for qualitative hypotheses on $f_{A}$ . Extensions and modifications of model (1.1) are studied in [9], [17], [1], [8], [10], [18], [19] and [12]. Methods of analytic continuation of the coefficients density outside the support of the covariates are considered under more restrictive conditions in [12] and in the recent work of [7].

In this paper, we consider the basic model (1.1) under the following condition.

Assumption 1 (Design density).

For some constants $\beta>0$ and $C_{X}>c_{X}>0$ , the density $f_{X}$ satisfies

$$C_{X}\,(1+|x|)^{-\beta-2}\,\geq\,f_{X}(x)\,\geq\,c_{X}\,(1+|x|)^{-\beta-2},\qquad\forall x\in\mathbb{R}.\tag{1.2}$$

We analyze precisely how the tail parameter $\beta$ of $f_{X}$ influences the optimal rate of convergence for estimating $f_{A}$ at a given point $a\in\mathbb{R}^{2}$ in a minimax sense in the case $\beta>1$. Note that the heavy-tailed setting studied in [13] corresponds to $\beta=0$ in Assumption 1. To the best of our knowledge, a rigorous study of the minimax convergence rate in the more realistic case $\beta>1$ has been missing so far. We fill this gap and derive optimal rates, which are fundamentally new and not known from any other nonparametric estimation problem.

The estimator which we propose is inspired by [12]. It achieves the optimal convergence rate and does not require dividing by a nonparametric density estimator. Instead we exploit the order statistics of the transformed design variables in a Priestley-Chao manner. The optimal choice of the tuning parameters depends both on the tail parameter $\beta$ and on the smoothness parameter of the Hölder class. This is reminiscent of the estimation problem in [14] and contrasts with usual adaptation problems in nonparametric curve estimation, in which the smoothing parameters need to adapt only to an unknown smoothness level. Here we show how to make the estimator adaptive with respect to both of these parameters.

The paper is organized as follows. In Section 2 we introduce our estimation procedure. Section 3 is devoted to upper and lower risk bounds, which yield minimax rate optimality for the pointwise risk. We also derive an upper risk bound for the uniform risk, where an additional logarithmic factor occurs. In Section 4 we deal with adaptivity. The proofs and technical lemmas are deferred to Section 5.

Let us fix some notation: $\psi_{A}$ denotes the characteristic function of the $A_{j}$ , while $\psi_{U|Z}$ is the conditional characteristic function of the random variable $U$ given the random variable $Z$ . Throughout $|\cdot|$ stands for the Euclidean norm of a real or complex vector, and $\mathbbm{1}(A)$ denotes the indicator function of the event $A$ . For positive sequences $(a_{n})$ and $(b_{n})$ we write $a_{n}\asymp b_{n}$ if $c\,a_{n}\leq b_{n}\leq C\,a_{n}$ , $n\in\mathbb{N}$ for constants $0<c<C$ .

2 The estimator

In order to construct an estimator for $f_{A}$ in model (1.1), we transform the data $(X_{j},Y_{j})$ into $(Z_{j},U_{j})$ via

$$U_{j}=Y_{j}/\sqrt{1+X_{j}^{2}}\,,\qquad\big(\cos Z_{j},\sin Z_{j}\big)=(1,X_{j})/\sqrt{1+X_{j}^{2}}\,,$$

so that $Z_{j}\in(-\pi/2,\pi/2)$ almost surely (a.s.), $Z_{j}$ and $A_{j}$ are independent, and

$$U_{j}=A_{0,j}\cos Z_{j}+A_{1,j}\sin Z_{j}.$$
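The transformation of the data can be sketched in a few lines. The following is only an illustration under stated assumptions: the Student-$t$ design with 3 degrees of freedom (whose density decays like $|x|^{-4}$, i.e. $\beta=2$ in Assumption 1), the standard normal coefficients, the seed, and the helper name `transform` are all choices made here and are not taken from the paper.

```python
import numpy as np

def transform(X, Y):
    """Map (X_j, Y_j) to (Z_j, U_j): U = Y / sqrt(1 + X^2), Z = arctan(X)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    Z = np.arctan(X)                 # then (cos Z, sin Z) = (1, X) / sqrt(1 + X^2)
    U = Y / np.sqrt(1.0 + X**2)
    return Z, U

# Simulated data from the linear random coefficient model (assumed distributions).
rng = np.random.default_rng(0)
n = 1000
X = rng.standard_t(df=3, size=n)     # assumed design with polynomial tails
A = rng.normal(size=(n, 2))          # assumed coefficients (A_0, A_1), independent of X
Y = A[:, 0] + A[:, 1] * X
Z, U = transform(X, Y)
```

In this simulation the coefficients are known, so the identity $U_{j}=A_{0,j}\cos Z_{j}+A_{1,j}\sin Z_{j}$ can be checked directly; with real data only $(Z_{j},U_{j})$ would be available.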

Then the conditional characteristic function $\psi_{U|Z}(\cdot|z)$ of $U_{j}$ given $Z_{j}=z$ equals

$$\psi_{U|Z}(t|z)=\psi_{A}\big(t\cos z,t\sin z\big)\,.\tag{2.2}$$

By Fourier inversion, integral substitution into polar coordinates (with signed radius) and (2.2) we deduce that

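$$f_{A}(a)=\frac{1}{(2\pi)^{2}}\int_{\mathbb{R}}\int_{-\pi/2}^{\pi/2}|t|\,\exp\big(-it(a_{0}\cos z+a_{1}\sin z)\big)\,\psi_{U|Z}\big(t|z\big)\,\mathrm{d}z\,\mathrm{d}t\,.\tag{2.3}$$

Equation (2.3) motivates us to estimate $f_{A}$ by an empirical version of the conditional characteristic function $\psi_{U|Z}$, which is directly accessible from the data $(Z_{j},U_{j})$. For that purpose choose a function $w$ which satisfies the following assumption.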

Assumption 2 (Kernel).

For a number $\ell\in\mathbb{N}_{0}$ the function $w:\mathbb{R}\to\mathbb{R}$ is even, supported on $[-1,1]$ , $(\ell+1)$ -fold continuously differentiable on the whole real line, satisfies $w(0)=1$ as well as $w^{(k)}(0)=0$ for all $k=1,\ldots,\ell$ , and $|w|$ is bounded by $1$ .

Assumption 2 could be relaxed somewhat. In particular, we may assume compact support instead of imposing the support of $w$ to be a subset of $[-1,1]$ and we may remove the condition that $|w|$ is bounded by $1$ . Simple boundedness is sufficient, which follows from the other conditions.

Now we consider the regularized version of $f_{A}$ by kernel smoothing as follows

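$$\begin{aligned}\tilde{f}_{A}(a;h)&=\frac{1}{(2\pi)^{2}}\int_{\mathbb{R}}\int_{-\pi/2}^{\pi/2}w(th)\,|t|\,\exp\big(-it(a_{0}\cos z+a_{1}\sin z)\big)\,\psi_{U|Z}\big(t|z\big)\,\mathrm{d}z\,\mathrm{d}t\\&=\int_{-\pi/2}^{\pi/2}\int_{\mathbb{R}}K\big(u-a_{0}\cos z-a_{1}\sin z;h\big)\,f_{U|Z}(u|z)\,\mathrm{d}u\,\mathrm{d}z\,,\end{aligned}\tag{2.4}$$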

where

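$$K\big(x;h\big):=\frac{1}{(2\pi)^{2}}\int_{\mathbb{R}}w(th)\,|t|\,\exp(itx)\,\mathrm{d}t=\frac{2}{(2\pi)^{2}}\int_{0}^{\infty}w(th)\,t\,\cos(tx)\,\mathrm{d}t.\tag{2.5}$$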
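To make the kernel $K(\cdot;h)$ concrete, here is a minimal numerical sketch. It assumes the illustrative weight $w(s)=(1-s^{2})^{3}$ on $[-1,1]$, which is not taken from the paper; this $w$ is even, twice continuously differentiable, and has $w(0)=1$, $w'(0)=0$, so it fits Assumption 2 only for $\ell\leq 1$. Substituting $s=th$ in the cosine form gives $K(x;h)=\frac{2}{(2\pi)^{2}}\,h^{-2}\int_{0}^{1}w(s)\,s\,\cos(sx/h)\,\mathrm{d}s$, which the code evaluates by Gauss-Legendre quadrature (the node count 200 is likewise an arbitrary choice).

```python
import numpy as np

def w(s):
    """Illustrative weight (an assumption): (1 - s^2)^3 on [-1, 1], zero outside."""
    s = np.asarray(s, dtype=float)
    return np.where(np.abs(s) <= 1.0, (1.0 - s**2) ** 3, 0.0)

# Gauss-Legendre nodes and weights, mapped from [-1, 1] to [0, 1].
_NODES, _WEIGHTS = np.polynomial.legendre.leggauss(200)
_S = 0.5 * (_NODES + 1.0)
_W = 0.5 * _WEIGHTS

def K(x, h):
    """K(x; h) = 2 / (2 pi)^2 * h^(-2) * int_0^1 w(s) s cos(s x / h) ds."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    integrand = w(_S) * _S * np.cos(np.outer(x / h, _S))   # shape (len(x), 200)
    return 2.0 / (2.0 * np.pi) ** 2 / h**2 * (integrand @ _W)
```

The rescaled form makes the sup-norm bound $\|K(\cdot;h)\|_{\infty}={\cal O}(h^{-2})$ visible: since $|\cos|\leq 1$ and this $w$ is non-negative, $|K(x;h)|\leq K(0;h)=h^{-2}/(16\pi^{2})$ for the assumed weight.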

Inspired by (2.4) we introduce a Priestley-Chao type estimator of the density $f_{A}$ ,

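$$\begin{aligned}\hat{f}_{A}(a;h,\delta)&=\frac{1}{(2\pi)^{2}}\int_{\mathbb{R}}w(th)\,|t|\sum_{j=1}^{n-1}\exp\big(it\big(U_{[j]}-a_{0}\cos Z_{(j)}-a_{1}\sin Z_{(j)}\big)\big)\\&\qquad\cdot\big(Z_{(j+1)}-Z_{(j)}\big)\,\mathbbm{1}(-\pi/2+\delta\leq Z_{(j)}\leq Z_{(j+1)}\leq\pi/2-\delta)\,\mathrm{d}t,\end{aligned}\tag{2.6}$$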

where $(U_{[j]},Z_{(j)})$ , $j=1,\ldots,n$ , denotes the sample $(U_{j},Z_{j})$ , $j=1,\ldots,n$ , sorted such that $Z_{(1)}\leq\ldots\leq Z_{(n)}$ , and where $h=h_{n}>0$ is a classical bandwidth parameter and $\delta=\delta_{n}\geq 0$ is a threshold parameter both of which remain to be selected. By the parameter $\delta$ we cut off that subset of the interval $[-\pi/2,\pi/2]$ in which the $Z_{j}$ are sparse.

In the following we shall use the symbol

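$$\sum_{j,n,\delta}\;:=\sum_{\substack{j\in\{1,\dots,n\},\\ -\pi/2+\delta\leq Z_{(j)}\leq Z_{(j+1)}\leq\pi/2-\delta}}\tag{2.7}$$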

to denote the sum over the random set of indices $1\leq j\leq n-1$ for which $-\pi/2+\delta\leq Z_{(j)}\leq Z_{(j+1)}\leq\pi/2-\delta$ . Thus, we may write the estimator in (2.6) as

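$$\hat{f}_{A}(a;h,\delta)=\sum_{j,n,\delta}K\big(U_{[j]}-a_{0}\cos Z_{(j)}-a_{1}\sin Z_{(j)};h\big)\,\big(Z_{(j+1)}-Z_{(j)}\big).$$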
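The sum form of the estimator translates almost literally into code. The sketch below is an illustration only: it again assumes the weight $w(s)=(1-s^{2})^{3}$ and Gauss-Legendre quadrature for $K$, neither of which comes from the paper, and the function names `K` and `f_hat` are hypothetical. Note that no density estimate of the $Z_{j}$ appears anywhere; the spacings $Z_{(j+1)}-Z_{(j)}$ play that role.

```python
import numpy as np

# Quadrature nodes on [0, 1] for the kernel integral (assumed numerical scheme).
_NODES, _WEIGHTS = np.polynomial.legendre.leggauss(200)
_S, _W = 0.5 * (_NODES + 1.0), 0.5 * _WEIGHTS

def K(x, h):
    """Kernel K(x; h) for the assumed weight w(s) = (1 - s^2)^3."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    ws = (1.0 - _S**2) ** 3
    integral = (ws * _S * np.cos(np.outer(x / h, _S))) @ _W
    return integral / (2.0 * np.pi**2 * h**2)      # 2 / (2 pi)^2 = 1 / (2 pi^2)

def f_hat(a, Z, U, h, delta):
    """Priestley-Chao type estimate of f_A at a = (a0, a1)."""
    Z = np.asarray(Z, dtype=float)
    U = np.asarray(U, dtype=float)
    order = np.argsort(Z)                          # sort the sample by Z
    Zs, Us = Z[order], U[order]
    left, right = Zs[:-1], Zs[1:]                  # Z_(j) and Z_(j+1)
    keep = (left >= -np.pi / 2 + delta) & (right <= np.pi / 2 - delta)
    if not keep.any():                             # empty sum: estimator is zero
        return 0.0
    arg = Us[:-1][keep] - a[0] * np.cos(left[keep]) - a[1] * np.sin(left[keep])
    return float(np.sum(K(arg, h) * (right[keep] - left[keep])))
```

As a sanity check on the indexing, if every kernel argument is zero the estimate reduces to $K(0;h)$ times the total length of the retained spacings.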

In this paper we consider one-dimensional covariates $Z_{1},\ldots,Z_{n}$ only. From a methodological point of view, the estimator (2.6) could be extended to the multivariate setting by using Voronoi cells instead of the order statistics. A similar technique is proposed in eq. (36) in [12]. On the other hand, the asymptotic properties of such an estimator might be completely different from the univariate case.

3 Upper and lower risk bounds

We consider the following Hölder smoothness class of densities.

Definition.

For a point $a=(a_{0},a_{1})\in\mathbb{R}^{2}$ , a smoothness index $\alpha>0$ and constants $c_{A},c_{B},r_{A},c_{M}>0$ define the class ${\cal F}={\cal F}(a,\alpha,c_{A},c_{B},r_{A},c_{M})$ of densities as follows: $f_{A}\in{\cal F}(a,\alpha,c_{A},c_{B},r_{A},c_{M})$ is Hölder-smooth of the degree $\alpha$ in the neighborhood $U_{r_{A}}(a)=\{b\in\mathbb{R}^{2}\mid|a-b|<r_{A}\}$ , that is, $f_{A}$ is $s=\lfloor\alpha\rfloor=\max\{k\in\mathbb{N}_{0}\mid k<\alpha\}$ -times continuously differentiable in $U_{r_{A}}(a)$ and its partial derivatives satisfy

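$$\Big|\frac{\partial^{s}f_{A}}{\partial x^{k}\partial y^{s-k}}(x,y)-\frac{\partial^{s}f_{A}}{\partial x^{k}\partial y^{s-k}}(a_{0},a_{1})\Big|\,\leq\,c_{A}\cdot\big|(x,y)-a\big|^{\alpha-s}\,,\tag{3.1}$$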

for all $k=0,\ldots,s$ and $(x,y)\in U_{r_{A}}(a)$ . Furthermore, assume that the Fourier transform $\psi_{A}$ of $f_{A}$ is weakly differentiable and its weak derivative $\nabla\psi_{A}$ satisfies

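$$\int\underset{y\in\mathbb{R}}{\operatorname{ess\,sup}}\,\big|\nabla\psi_{A}(x,y)\big|\,\mathrm{d}x\,\leq\,c_{B}\,,\tag{3.2}$$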

and that $f_{A}(a)\leq c_{M}$ for all $a\in\mathbb{R}^{2}$ .

For the proof of the first theorem, the global partial tail and smoothness condition (3.2) of the order $1$ is required in addition to the local smoothness assumption (3.1) of the order $\alpha$ . The theorem provides an upper bound on the convergence rate for the estimator in (2.6).

Theorem 3.1.

Consider model (1.1) and assume that $f_{X}$ satisfies (1.2) for some $\beta>1$ . If $w$ satisfies Assumption 2 for $l\geq 2\,\lfloor\alpha\rfloor$ , and if $\delta=\delta_{n}$ and $h=h_{n}$ are chosen such that

$$\delta\asymp n^{-\frac{1}{\beta+1}},\quad\text{ and }\quad h\asymp n^{-\frac{1}{(\alpha+2)(\beta+1)}},$$

then the estimator (2.6) attains the following asymptotic risk upper bound over the function class ${\cal F}={\cal F}(a,\alpha,c_{A},c_{B},r_{A},c_{M})$ ,

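$$\sup_{f_{A}\in{\cal F}}\operatorname{\mathbb{E}}_{f_{A}}\big[\big|\hat{f}_{A}(a;h,\delta)-f_{A}(a)\big|^{2}\big]\,=\,{\cal O}\big(n^{-\frac{2\,\alpha}{(\alpha+2)(\beta+1)}}\big)\,.$$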

The following theorem yields that the convergence rates which our estimator (2.6) achieves according to Theorem 3.1 are optimal for the pointwise risk in the minimax sense.

Theorem 3.2.

Fix $a=0$ and the constants $c_{A}$ , $c_{B}$ sufficiently large for any $\alpha>0$ and $\beta>1$ . Let $(\hat{f}_{n})_{n}$ be an arbitrary sequence of estimators of $f_{A}$ , where $\hat{f}_{n}$ is based on the data $(X_{j},Y_{j})$ , $j=1,\ldots,n$ , for each $n$ . Assume that $f_{X}$ satisfies (1.2). Then

$$\liminf_{n\to\infty}\,n^{\frac{2\,\alpha}{(\alpha+2)(\beta+1)}}\,\sup_{f_{A}\in{\cal F}}\operatorname{\mathbb{E}}_{f_{A}}\big[\big|\hat{f}_{n}(0)-f_{A}(0)\big|^{2}\big]\,>\,0\,.$$

The convergence rates from Theorems 3.1 and 3.2 differ significantly from standard rates in nonparametric estimation. While they become faster as $\alpha$ increases, they become slower as $\beta$ gets larger. It is remarkable that, for large $\alpha$, they do not approach the (squared) parametric rate $n^{-1}$ but the slower rate $n^{-2/(\beta+1)}$.
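The dependence of the rate on $\alpha$ and $\beta$ is easy to tabulate. The helper below is a small sketch (the function name `tuning_exponents` is hypothetical); it returns the exponents $1/(\beta+1)$ for $\delta$, $1/((\alpha+2)(\beta+1))$ for $h$, and $2\alpha/((\alpha+2)(\beta+1))$ for the squared pointwise risk, as stated in Theorem 3.1 for $\beta>1$.

```python
def tuning_exponents(alpha, beta):
    """Return (delta_exp, h_exp, risk_exp) with
    delta ~ n^(-delta_exp), h ~ n^(-h_exp), risk = O(n^(-risk_exp))."""
    if alpha <= 0 or beta <= 1:
        raise ValueError("requires alpha > 0 and beta > 1")
    delta_exp = 1.0 / (beta + 1.0)
    h_exp = 1.0 / ((alpha + 2.0) * (beta + 1.0))
    risk_exp = 2.0 * alpha / ((alpha + 2.0) * (beta + 1.0))
    return delta_exp, h_exp, risk_exp

# Example: alpha = 2, beta = 3 gives delta ~ n^(-1/4), h ~ n^(-1/16),
# and squared risk of order n^(-1/4).
delta_exp, h_exp, risk_exp = tuning_exponents(2.0, 3.0)
```

Because $2\alpha/(\alpha+2)<2$ and $\beta+1>2$, the risk exponent always stays below $2/(\beta+1)<1$, which is the saturation effect described above.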

The case $\beta\leq 1$ . An analysis of the proof of Theorem 3.1 shows that in case $\beta<1$ , choosing $\delta\asymp n^{-\frac{1}{\beta+1}}$ and $h\asymp n^{-\frac{1}{2\alpha+4}}$ gives the rate

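$$\sup_{f_{A}\in{\cal F}}\operatorname{\mathbb{E}}_{f_{A}}\big[\big|\hat{f}_{A}(a;h,\delta)-f_{A}(a)\big|^{2}\big]\,=\,{\cal O}\big(n^{-\frac{2\alpha}{2\alpha+4}}\big);$$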

in case $\beta=1$ , an additional logarithmic factor occurs. The upper bound no longer depends on $\beta$ in this regime. For $\beta=0$ , [13] obtain the faster rate ${\cal O}\big{(}n^{-\frac{2\alpha}{2\alpha+3}}\big{)}$ ; their rate is in $\mathcal{L}_{2}$ but could be transferred to a pointwise rate. However, they additionally impose the assumption that the density $f_{A}$ is uniformly bounded with a bounded support. This implies that $f_{U|Z}$ is also uniformly bounded. Under this additional assumption, instead of (5.4) in our analysis, we have the sharper bound

[TABLE]

since $\int_{\mathbb{R}}K^{2}\big{(}u;h\big{)}\,\,\mathrm{d}u\leq\mbox{const.}\cdot h^{-3}.$ Then one can show that our estimator also achieves the rate ${\cal O}\big{(}n^{-\frac{2\alpha}{2\alpha+3}}\big{)}$ for $\beta=0$ , even with the choice $\delta=0$ .

Finally, we consider the uniform rate of convergence, again in the case $\beta>1$ .

Theorem 3.3.

Consider model (1.1) and assume that $f_{X}$ satisfies (1.2) for some $\beta>1$ . Suppose that $w$ satisfies Assumption 2 for $l\geq 2\,\lfloor\alpha\rfloor$ , and that $\delta=\delta_{n}$ and $h=h_{n}$ are chosen such that

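$$\delta\asymp\Big(\frac{\log n}{n}\Big)^{\frac{1}{\beta+1}},\quad\text{ and }\quad h\asymp\Big(\frac{\log n}{n}\Big)^{\frac{1}{(\alpha+2)(\beta+1)}}.$$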

For a compact rectangle $K\subseteq\mathbb{R}^{2}$ let ${\cal F}(K,\alpha,c_{A},c_{B},r_{A},c_{M})$ denote the class of densities on $\mathbb{R}^{2}$ such that $f\in{\cal F}(a,\alpha,c_{A},c_{B},r_{A},c_{M})$ for each $a\in K$ . Then the estimator (2.6) attains the following uniform asymptotic risk upper bound over the function class ${\cal F}={\cal F}(K,\alpha,c_{A},c_{B},r_{A},c_{M})$ ,

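$$\sup_{f_{A}\in{\cal F}}\operatorname{\mathbb{E}}_{f_{A}}\big[\sup_{a\in K}\,\big|\hat{f}_{A}(a;h,\delta)-f_{A}(a)\big|^{2}\big]\,=\,{\cal O}\Big(\Big(\frac{\log n}{n}\Big)^{\frac{2\,\alpha}{(\alpha+2)(\beta+1)}}\Big)\,.$$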

4 Adaptation

4.1 Adaptation with respect to $\beta$ for given smoothness

Assume that (1.2) holds with unknown $\beta>1$ . If there are at least two observations $Z_{j}$ in the interval $[-\pi/2+\delta,\pi/2-\delta]$ so that $\sum_{j,n,\delta}\,$ is not the sum over the empty set, we set

[TABLE]

otherwise we put $L_{n}(\delta)=-\pi/2$ and $R_{n}(\delta)=\pi/2$ . To define a selection rule for $\delta$ , define the function

[TABLE]

which is continuous except at the sites $\pi/2$ , $Z_{j}+\pi/2$ and $\pi/2-Z_{j}$ for $j=1,\ldots,n$ . Now choose $\delta=\hat{\delta}_{n}$ in the interval $[n^{-1/2},\pi/4]$ such that

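$${\cal C}_{n}(\hat{\delta}_{n})\,\leq\,\exp(-n)+\inf_{\delta\in[n^{-1/2},\pi/4]}{\cal C}_{n}(\delta).\tag{4.2}$$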

The next proposition shows that the convergence rate from Theorem 3.1 does not deteriorate if only $\beta$ is unknown but $\alpha$ is known.

Proposition 4.1.

Consider model (1.1) and assume that $f_{X}$ satisfies (1.2) for some unknown $\beta>1$. Choose $w$ satisfying Assumption 2 with $2\,\lfloor\alpha\rfloor\leq l$ for given $\alpha>0$. If $\hat{\delta}_{n}$ is chosen as in (4.2) and

$$\hat{h}_{n}=\big({\cal C}_{n}(\hat{\delta}_{n})\big)^{\frac{1}{2\,(\alpha+2)}},$$

then for the estimator $\hat{f}_{A}\big{(}a;\hat{h}_{n},\hat{\delta}_{n}\big{)}$ we have that

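$$\sup_{f_{A}\in{\cal F}}\operatorname{\mathbb{E}}_{f_{A}}\big[\big|\hat{f}_{A}\big(a;\hat{h}_{n},\hat{\delta}_{n}\big)-f_{A}(a)\big|^{2}\big]\,=\,{\cal O}\big(n^{-\frac{2\,\alpha}{(\alpha+2)(\beta+1)}}\big)\,,$$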

where ${\cal F}={\cal F}(a,\alpha,c_{A},c_{B},r_{A},c_{M})$ .

4.2 Adaptation by the Lepski method

Finally we consider adaptivity with respect to both parameters $\beta$ and $\alpha$ based on a combination of Lepski’s method, see [15] and [16], and the choice (4.2). Consider the grid of bandwidths

$$h_{k}=\hat{\delta}_{n}^{1/2}\,q^{k},\qquad k\in{\cal K}_{n}=\{0,\dots,K\},$$

where $q>1$ , $K=K_{n}=\lfloor\log_{q}n\rfloor$ and $\hat{\delta}_{n}$ is defined in (4.2). Fix $a\in\mathbb{R}^{2}$ and denote

$$\hat{f}_{k}=\hat{f}_{A}(a;h_{k},\hat{\delta}_{n}).$$
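The geometric bandwidth grid $h_{k}=\hat{\delta}_{n}^{1/2}q^{k}$, $k=0,\ldots,K$ with $K=\lfloor\log_{q}n\rfloor$, can be built as follows. This is a sketch only: the function name `bandwidth_grid` and the default $q=2$ are assumptions, the value of $\hat{\delta}_{n}$ is taken as given rather than computed from (4.2), and the Lepski selection step itself is not implemented. The loop computes $K$ by repeated multiplication instead of a floating-point logarithm, to avoid rounding issues when $n$ is close to a power of $q$.

```python
def bandwidth_grid(delta_hat, n, q=2.0):
    """Return [h_0, ..., h_K] with h_k = sqrt(delta_hat) * q**k and
    K the largest integer such that q**K <= n."""
    if delta_hat <= 0 or n < 1 or q <= 1:
        raise ValueError("requires delta_hat > 0, n >= 1 and q > 1")
    K = 0
    power = q
    while power <= n:            # afterwards q**K <= n < q**(K + 1)
        K += 1
        power *= q
    return [delta_hat**0.5 * q**k for k in range(K + 1)]

# Example with an assumed threshold delta_hat = 0.04 and n = 1000, q = 2:
# K = 9, so the grid runs from h_0 = 0.2 up to h_9 = 0.2 * 2**9.
grid = bandwidth_grid(0.04, 1000, q=2.0)
```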

For $C_{\text{Lep}}>0$ sufficiently large to be chosen we let

[TABLE]

where

[TABLE]

Theorem 4.1.

Consider model (1.1) and assume that $f_{X}$ satisfies (1.2) for some unknown $\beta>1$ . Choose $w$ according to Assumption 2 for some $l\in\mathbb{N}_{0}$ . Then for sufficiently large $C_{\text{Lep}}>0$ (e.g. $C_{\text{Lep}}=20^{2}$ suffices), we have, for every $\alpha>0$ with $2\,\lfloor\alpha\rfloor\leq l$ , that

[TABLE]

where ${\cal F}:={\cal F}(a,\alpha,c_{A},c_{B},r_{A},c_{M})$ .

Thus for adaptivity an additional logarithmic factor occurs in the pointwise rate under Hölder smoothness constraints.

5 Proofs

In the proofs we drop $f_{A}\in{\cal F}$ in $\operatorname{\mathbb{E}}=\operatorname{\mathbb{E}}_{f_{A}}$ and in $\operatorname{\mathbb{P}}=\operatorname{\mathbb{P}}_{f_{A}}$ from the notation.

5.1 Proofs for Section 3

Proof of Theorem 3.1.

By passing to Cartesian coordinates in (2.4) we can write

[TABLE]

Assumption 2 guarantees that $\tilde{w}$ is a kernel of order $\ell$ . Then, using Taylor approximation as usual in kernel regularization, see p. 37–38 in [20] for the argument in case of non-compactly supported kernels, the following asymptotic rate of the regularization bias term occurs

[TABLE]

where the constant factor $C_{\text{Bias}}(\alpha,w,c_{A},c_{M})$ only depends on $c_{A}$ , $c_{M}$ , $w$ and $\alpha$ .

Now let $\sigma_{Z}$ denote the $\sigma$ -field generated by $Z_{1},\ldots,Z_{n}$ , and consider the conditional bias-variance decomposition

[TABLE]

Since $U_{[1]},\ldots,U_{[n]}$ are independent given $\sigma_{Z}$ , observing from (2.5) that $\|K(\cdot;h)\|_{\infty}={\cal O}(h^{-2})$ , we may bound

[TABLE]

where the constant factor only depends on $w$ . Therein we use the notation (2.7). For the conditional expectation, we obtain that

[TABLE]

where we set

[TABLE]

We deduce that

[TABLE]

where

[TABLE]

where $L_{n}(\delta)$ and $R_{n}(\delta)$ are defined in (4.1). If there are no two consecutive $Z_{j}$ in the interval $[-\pi/2+\delta,\pi/2-\delta]$, then $\tilde{\psi}(t,z)=0$ (indeed $\hat{f}_{A}(a;h,\delta)=0$). In this case, by our convention we have $L_{n}(\delta)=-\pi/2$ and $R_{n}(\delta)=\pi/2$, so that $I_{2}=I_{3}=0$ and $I_{1}$ is the integral from $-\pi/2$ to $\pi/2$, as required for the estimate (5.5) to remain true in this case.

First, consider the term $I_{3}$ . Using the Cauchy-Schwarz inequality, it holds that

[TABLE]

Analogously we establish that

[TABLE]

Finally, consider the term $I_{1}$ . In case when there are two consecutive $Z_{j}$ in the interval $[-\pi/2+\delta,\pi/2-\delta]$ so that the sum in (2.7) is not empty, it holds that

[TABLE]

Now, for $z\in[Z_{(j)},Z_{(j+1)})$ , we get that

[TABLE]

according to (2.2). Hence we may bound

[TABLE]

Applying the Cauchy-Schwarz inequality gives for $I_{1,2}$

[TABLE]

For $I_{1,1}$ interchanging sum and integrals we obtain

[TABLE]

Using the Cauchy-Schwarz inequality twice yields

[TABLE]

Hence, the term $I_{1}$ obeys the upper bound

[TABLE]

Finally, if there are no two consecutive $Z_{j}$ in the interval $[-\pi/2+\delta,\pi/2-\delta]$ , we simply have $I_{1}\leq\big{|}\tilde{f}_{A}(a;h)\big{|}^{2}\leq f_{A}(a)^{2}+\text{const.}\,\cdot h^{2\alpha}\leq\text{const.}$ Collecting the terms that bound (5.5) and using (5.4), from (5.3) we obtain that

[TABLE]

Here, the last term takes care of the event in which the sum $\sum_{j,n,\delta}\,$ is empty and the estimator actually is zero. In order to bound the terms in (5.7) involving the order statistics, we note that since $\beta>1$ ,

[TABLE]

From (5.2) and (5.7) and Lemma 5.1 we obtain for $\delta\leq\pi/4$ that

[TABLE]

Upon inserting the rates for $\delta$ and $h$ we obtain the result.

∎

Proof of Theorem 3.2.

We introduce the functions

[TABLE]

for $\theta\in\{0,1\}$ , some constant $c_{L}>0$ and some sequences $(\alpha_{n})_{n}\downarrow 0$ and $(\beta_{n})_{n}\uparrow\infty$ which remain to be selected; moreover we specify

[TABLE]

and

[TABLE]

where

[TABLE]

We verify that $f_{A,0}$ is a probability density as $f_{0}$ and $\varphi$ are probability densities. The Fourier transform of $f_{A,\theta}$ equals

[TABLE]

so that

[TABLE]

since $\varphi^{ft}$ is supported on the interval $[-1,1]$ . Choosing the constant $c_{L}>0$ sufficiently small we can guarantee that $f_{A,1}$ is a non-negative function and satisfies the inequality

[TABLE]

for some constant $c_{L}^{*}\in(0,1)$ . Thus, $f_{A,1}$ is a probability density as well. Furthermore we verify that $f_{A,\theta}\in{\cal F}$ for both $\theta\in\{0,1\}$ under the constraint

[TABLE]

as $c_{A}$ and $c_{B}$ may be viewed as sufficiently large. Therein note that (3.2) is satisfied as $\psi_{A,\theta}$ can be written as the sum of two functions $(x,y)\mapsto\psi_{0}(x/\alpha_{n})\cdot\psi_{1}(y/\beta_{n})$ where $\psi_{j}$ , $j=0,1$ are bounded, weakly differentiable, integrable functions whose weak derivatives are essentially bounded and integrable as well.

The squared pointwise distance between $f_{A,0}$ and $f_{A,1}$ at $a=0$ equals

[TABLE]

Using (5.8), the conditional density of $Y_{j}$ given $X_{j}$ under the parameter $\theta$ equals

[TABLE]

for all $y\in\mathbb{R}$ . Moreover we have that

[TABLE]

where the Fourier transform equals

[TABLE]

Therefore the $\chi^{2}$ -distance between the competing observation densities is bounded from above as follows,

[TABLE]

where

[TABLE]

Moreover, this choice also guarantees that $f_{C,\theta}$ integrates to $1$ and, hence, is a probability density. Then the integrals in (5.11) range over a subset of

[TABLE]

as $H_{0}^{ft}$ and its (weak) derivative are supported on $[-1,1]$ . Also these functions are uniformly bounded by $1$ . Thus the integrals vanish whenever $|X_{j}|<\beta_{n}/\alpha_{n}$ . It follows that

[TABLE]

if $|X_{j}|\geq\beta_{n}/\alpha_{n}$; and $\chi^{2}(f_{Y_{j}\mid X_{j},\theta=0},f_{Y_{j}\mid X_{j},\theta=1})=0$ otherwise. According to standard arguments from decision theory, (5.10) represents a lower bound on the attainable rate if the Hellinger distance between the competing data distributions $f_{X,Y;\theta}^{(n)}$ (for $\theta=0$ and $\theta=1$, respectively) obeys an upper bound which is smaller than $1$, uniformly with respect to $n$; see e.g. [21]. Writing ${\cal H}$ for the Hellinger distance, it holds that

[TABLE]

as the distribution of the $X_{j}$ is identical for $\theta=0$ and $\theta=1$ . Then, the term (5.12) is bounded from above by

[TABLE]

as $\beta>1$ . We choose $\beta_{n}\asymp n^{1/[(2+\alpha)(1+\beta)]}$ so that the $\chi^{2}$ -distance between the joint densities of the observations under $\theta=0$ and $\theta=1$ in (5.13) is bounded from above as $n$ tends to infinity. By elementary decision theoretic arguments and by (5.10), a lower bound on the attainable convergence rate is given by

[TABLE]

which completes the proof of the theorem. ∎

Proof of Theorem 3.3.

We estimate

[TABLE]

where $\tilde{f}_{A}(a;h)$ is defined in (5.1). The second term - the regularization bias - is bounded in (5.2), and that bound is uniform in $a\in K$ from the assumptions on the function class ${\cal F}(K,\alpha,c_{A},c_{B},r_{A},c_{M})$ . For the first term we have, similarly to (5.3), that

[TABLE]

The second term in (5.14) is bounded by

[TABLE]

where $I_{j}(a)$ are defined as in (5.5), and the dependence on $a$ is stressed in the notation. The bounds on the $I_{j}(a)$ derived after (5.5) are uniform in $a$ over a bounded set $K$ . Thus, it remains to bound the first term in (5.14).

Given $\epsilon>0$ let $I_{\epsilon}$ be a subset of $K$ for which the $\epsilon$-balls with centers at points in $I_{\epsilon}$ cover $K$. It is possible to choose such a set with cardinality $\text{card}\,(I_{\epsilon})\leq C_{K}\,\epsilon^{-2}$, where $C_{K}>0$ depends on $K$ but not on $\epsilon$. Then

[TABLE]
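The covering set $I_{\epsilon}$ used here can be constructed explicitly for a rectangle. The following sketch is one possible construction and is not taken from the paper (the function name `epsilon_net` is hypothetical): tile the rectangle with square cells of side $\epsilon\sqrt{2}$, whose half-diagonal is exactly $\epsilon$, and use the cell centers, clipped to the rectangle so that they stay inside $K$. Clipping is a projection onto a convex set and therefore does not increase the distance to any point of $K$. The number of centers is of order $\epsilon^{-2}$.

```python
import math
import numpy as np

def epsilon_net(x0, x1, y0, y1, eps):
    """Centers in [x0, x1] x [y0, y1] whose closed eps-balls cover the rectangle."""
    s = eps * math.sqrt(2.0)                     # cell side; half-diagonal = eps
    nx = max(1, math.ceil((x1 - x0) / s))
    ny = max(1, math.ceil((y1 - y0) / s))
    cx = np.minimum(x0 + (np.arange(nx) + 0.5) * s, x1)   # clip centers to K
    cy = np.minimum(y0 + (np.arange(ny) + 0.5) * s, y1)
    gx, gy = np.meshgrid(cx, cy, indexing="ij")
    return np.column_stack([gx.ravel(), gy.ravel()])

# Example: the unit square with eps = 0.1 needs 8 * 8 = 64 centers,
# which is below eps**(-2) = 100.
net = epsilon_net(0.0, 1.0, 0.0, 1.0, 0.1)
```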

Since $\|\partial_{x}\,K(x;h)\|_{\infty}\leq\,h^{-3}$, which follows from formula (2.5) for $K(\cdot;h)$ and Assumption 2 on $w$, by Lipschitz continuity the second term is $\leq 8\,\epsilon^{2}\,h^{-6}$. From the Hoeffding inequality, since $\|K(\cdot;h)\|_{\infty}\leq h^{-2}$, we obtain for $t>0$ that

[TABLE]

Set

[TABLE]

Then, for $\kappa>0$ we estimate

[TABLE]

Choose $\epsilon=n^{-2}$ and $\kappa=10^{1/2}$ . Then if $h^{-1}={\cal O}(n^{1/2})$ we obtain from Lemma 5.1 that

[TABLE]

and overall

[TABLE]

Plugging in the choices of $\delta$ and $h$ gives the result. ∎

5.2 Proofs for Section 4

Proof of Proposition 4.1.

From (5.7) and (5.2) we estimate

[TABLE]

Observe that from the term $\delta^{2}$ in the definition of ${\cal C}_{n}(\delta)$ ,

[TABLE]

Since $\hat{\delta}_{n}\leq\pi/4\leq 1$ , and since ${\cal C}_{n}(\delta)$ contains the term $\delta^{-1}\sum_{j,n,\delta}\,\,(Z_{(j+1)}-Z_{(j)})^{3}$ , from (5.16) and the choice of $\hat{h}_{n}$ we obtain the bound

[TABLE]

By definition of $\hat{\delta}_{n}$ ,

[TABLE]

for the deterministic choice $\delta_{n}=n^{-1/(\beta+1)}$ , which is contained in $[n^{-1/2},\pi/4]$ for sufficiently large $n$ since $\beta>1$ . Further, by Jensen’s inequality, Lemma 5.1 and the choice of $\delta_{n}$ ,

[TABLE]

Substituting these estimates into (5.17), and using (5.28) finally gives

[TABLE]

∎

Proof of Theorem 4.1.

Fix $0<\alpha$ with $2\,\lfloor\alpha\rfloor\leq l$ and $f_{A}\in{\cal F}(a,c_{A},c_{B},r_{A},\alpha,c_{M})$ , and set

[TABLE]

see the bound for the regularization bias in (5.2). We shall abbreviate $f_{A}(a)=f$ .

On the event

[TABLE]

where $\hat{f}_{\hat{k}}=0$ , we may estimate

[TABLE]

since $\hat{\delta}_{n}\leq\pi/4$. In the following, suppose that there are at least two design points $Z_{j}$ in the interval $[-\pi/2+\hat{\delta}_{n},\pi/2-\hat{\delta}_{n}]$. Since $h_{k}\geq\hat{\delta}^{1/2}_{n}$ for each $k\in\mathcal{K}_{n}$, as in the proof of Proposition 4.1 the term involving $h_{k}^{-6}$ in (5.16) is negligible compared with the term with the factor $\hat{\delta}_{n}^{-1}\,h_{k}^{-4}$. Hence, using (5.7) and (5.2), we estimate

[TABLE]

Define the ‘oracle index’ $k^{*}$ by

[TABLE]

Note that $b(0,\alpha)=C_{\text{Bias}}^{2}(\alpha,w,c_{A},c_{M})\,\hat{\delta}_{n}^{\alpha}\leq\text{const.}$ since $\hat{\delta}_{n}^{\alpha}\leq 1$, while $\sigma(0,n)=\hat{\delta}_{n}^{-2}\,{\cal C}_{n}(\hat{\delta}_{n})\,\log n\geq\,\log n$ since ${\cal C}_{n}(\hat{\delta}_{n})\,\hat{\delta}_{n}^{-2}\geq 1$ from the definition of ${\cal C}_{n}(\delta)$. Further, since by the choice of $K$ we have that $q^{K}\geq n/q$, we estimate

[TABLE]

since $\hat{\delta}_{n}^{\alpha}\geq n^{-\alpha/2}$ by the choice of $\hat{\delta}_{n}$ . Finally,

[TABLE]

since ${\cal C}_{n}(\hat{\delta}_{n})\,\hat{\delta}_{n}^{-2}\leq\text{const.}\,\cdot n^{5/2}$, which follows from the definition of ${\cal C}_{n}(\delta)$ and from $\hat{\delta}_{n}\geq n^{-1/2}$.

Since $b(k,\alpha)$ increases by a factor $q^{2\alpha}$ and $\sigma(k,n)$ decreases by a factor $q^{-4}$ with each step in $k$, it follows from the above estimates that $k^{*}\to\infty$ and $K-k^{*}\to\infty$, and that there are constants $0<\tilde{c}_{1}<\tilde{c}_{2}$ such that $\tilde{c}_{1}\leq\sigma(k^{*},n)/b(k^{*},\alpha)\leq\tilde{c}_{2}$. Rearranging yields

[TABLE]

for constants $c_{2}>c_{1}>0$ . We obtain from (5.18) that

[TABLE]
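The balancing argument behind the oracle index can be illustrated numerically. The sketch below assumes a simple balancing rule, the largest $k$ with $b(k,\alpha)\leq\sigma(k,n)$, which need not coincide with the paper's definition of $k^{*}$. The function names and numerical inputs are hypothetical. Only the geometric growth factors $q^{2\alpha}$ and $q^{-4}$ are taken from the text.

```python
def oracle_index(b0, sigma0, q, alpha, k_max):
    """Largest k in {0, ..., k_max} with b(k) <= sigma(k).

    Assumed balancing rule for illustration, with
    b(k) = b0 * q^(2*alpha*k) and sigma(k) = sigma0 * q^(-4*k).
    Meaningful when b0 <= sigma0, as holds for large n in the proof.
    """
    k_star = 0
    for k in range(k_max + 1):
        if b0 * q ** (2.0 * alpha * k) <= sigma0 * q ** (-4.0 * k):
            k_star = k
    return k_star

def variance_bias_ratio(b0, sigma0, q, alpha, k):
    """Ratio sigma(k) / b(k) on the geometric grid."""
    return (sigma0 * q ** (-4.0 * k)) / (b0 * q ** (2.0 * alpha * k))

k_star = oracle_index(b0=1.0, sigma0=1000.0, q=2.0, alpha=1.0, k_max=20)
print(k_star, variance_bias_ratio(1.0, 1000.0, 2.0, 1.0, k_star))
```

Under this rule, and provided the index is not capped at `k_max`, the ratio at the selected index lies in $[1,q^{2\alpha+4})$, since one further step on the grid would push it below one. This is the kind of two-sided bound $\tilde{c}_{1}\leq\sigma(k^{*},n)/b(k^{*},\alpha)\leq\tilde{c}_{2}$ used in the proof.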

Now, for $\hat{f}_{\hat{k}}$ we estimate

[TABLE]

For the second term, we have that

[TABLE]

The second term in (5.22) is bounded by (5.20) after a trivial estimate of the indicator. Further, from the definition of $\hat{k}$ and (5.19) we have the bound

[TABLE]

which also holds in conditional expectation given $\sigma_{Z}$ .

For the first term in (5.21) we estimate

[TABLE]

Then

[TABLE]

Now let

[TABLE]

By choice of $k^{*}$ , for $0\leq l<k\leq k^{*}$ we have that

[TABLE]

Hence, setting $\tilde{f}_{k}=\tilde{f}_{A}(a;h_{k})$ we may estimate

[TABLE]

Therefore, for $0\leq l<k\leq k^{*}$ ,

[TABLE]

Since $\sigma(l,n)>\sigma(k,n)$ for $l<k$, it suffices to bound

[TABLE]

By choice of the grid $\mathcal{K}_{n}$ , $h_{l}^{2}\geq h_{0}^{2}=\hat{\delta}_{n}$ , therefore

[TABLE]

for $n$ sufficiently large. Hence

[TABLE]

where $\tilde{C}=\big(C_{\text{Lep}}^{1/2}/4\,-1\big)$. Using the bound $\|K(\cdot;h)\|_{\infty}\leq\,h^{-2}$, see formula (2.5) for $K(\cdot;h)$ and Assumption 2 on $w$, we apply the conditional Hoeffding inequality to estimate

[TABLE]

see (5.4), where

[TABLE]

for the choice $C_{\text{Lep}}=20^{2}$ . Note that in this step, the logarithmic factor is essential.

Hence

[TABLE]

and in (5.23) we obtain the bound

[TABLE]

The crude bound

[TABLE]

now suffices to conclude that, for a sufficiently large choice of the constant $C_{\text{Lep}}$,

[TABLE]

The remainder of the proof proceeds as in the proof of Proposition 4.1. ∎

5.3 Spacings

As $Z_{j}=\arctan X_{j}$ the density of $Z_{j}$ equals

[TABLE]

so that (1.2) implies

[TABLE]

for some constants $C_{Z},c_{Z}>0$ .
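The effect of the transformation $Z_{j}=\arctan X_{j}$ on the tails can be checked numerically. The sketch below assumes one particular design density satisfying (1.2), namely $f_{X}(x)=\tfrac{\beta+1}{2}(1+|x|)^{-\beta-2}$, and uses the change-of-variables formula $f_{Z}(z)=f_{X}(\tan z)/\cos^{2}z$. This example density and the function names are illustrative choices, not taken from the paper. Near $\pm\pi/2$ a direct computation suggests that $f_{Z}(z)$ behaves like $(\pi/2-|z|)^{\beta}$, and the code evaluates the ratio of the two on a grid.

```python
import math

def f_X(x, beta):
    """Assumed example design density: ((beta+1)/2) * (1 + |x|)^(-beta-2)."""
    return 0.5 * (beta + 1.0) * (1.0 + abs(x)) ** (-beta - 2.0)

def f_Z(z, beta):
    """Density of Z = arctan(X) for |z| < pi/2, by change of variables."""
    return f_X(math.tan(z), beta) / math.cos(z) ** 2

def tail_ratio_range(beta, n_grid=2000):
    """Min and max of f_Z(z) / (pi/2 - |z|)^beta over a grid in (-pi/2, pi/2)."""
    ratios = []
    for i in range(1, n_grid):
        z = -math.pi / 2 + math.pi * i / n_grid
        ratios.append(f_Z(z, beta) / (math.pi / 2 - abs(z)) ** beta)
    return min(ratios), max(ratios)

lo, hi = tail_ratio_range(beta=2.0)
print(lo, hi)
```

For this example the ratio stays between two positive constants, which matches a two-sided bound with constants playing the role of $c_{Z}$ and $C_{Z}$.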

Lemma 5.1.

If $f_{X}$ satisfies (1.2) and hence $f_{Z}$ fulfills (5.25), then for $\kappa>1$ we have that

[TABLE]

Furthermore,

[TABLE]

and for $\delta\leq\pi/4$ that

[TABLE]

Proof of Lemma 5.1.

Setting

[TABLE]

we deduce under (5.25) that

[TABLE]

that is, (5.26). Moreover we write $Z_{j}^{*}:=Z_{j}+\pi/2$ and $L_{n}^{*}(\delta):=L_{n}(\delta)+\pi/2$ so that

[TABLE]

as $\delta\downarrow 0$. The term $\operatorname{\mathbb{E}}\big[\big(R_{n}(\delta)-\pi/2\big)^{2}\big]$ can be bounded analogously.

Concerning (5.28), we bound the probability that there is at most one observation in $[-\pi/2+\delta,\pi/2-\delta]$ for $\delta\leq\pi/4$ by

[TABLE]

which implies the result. ∎
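The event considered in the last step has an explicit probability for i.i.d. design points. If $p$ denotes the probability that a single $Z_{j}$ falls into $[-\pi/2+\delta,\pi/2-\delta]$, the number of points in the interval is binomial with parameters $n$ and $p$. The sketch below evaluates the resulting probability of at most one point. It is an illustration only, with a hypothetical function name, and does not reproduce the specific bound used in the proof.

```python
def prob_at_most_one(n, p):
    """P(Bin(n, p) <= 1) = (1 - p)^n + n * p * (1 - p)^(n - 1)."""
    return (1.0 - p) ** n + n * p * (1.0 - p) ** (n - 1)

# The probability decays geometrically in n once p is bounded away from zero.
for n in (2, 10, 50):
    print(n, prob_at_most_one(n, 0.5))
```

Since $\delta\leq\pi/4$ keeps the interval at least as large as $[-\pi/4,\pi/4]$, $p$ is bounded away from zero, so this probability is exponentially small in $n$.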

Acknowledgements

The authors are grateful to the editors and a referee for their thorough review and very helpful and constructive comments. H. Holzmann gratefully acknowledges financial support of the DFG, grant Ho 3260/5-1.

