Adaptive regression with Brownian path covariate

Karine Bertin; Nicolas Klutchnikoff

arXiv:1907.11284·math.ST·November 23, 2020

TL;DR

This paper introduces an adaptive estimation method for regression functions with continuous outcomes and Wiener process covariates, utilizing Wiener-Itô decomposition and data-driven selection to achieve optimal convergence rates.

Contribution

It develops a new adaptive regression estimator for functional covariates based on Wiener-Itô decomposition, with proven minimax convergence rates and a data-driven selection procedure.

Findings

1. Achieves minimax convergence rates for the regression function estimation.
2. Provides an oracle inequality leading to adaptive estimation.
3. Demonstrates the effectiveness of the proposed method on Wiener process covariates.

Abstract

This paper deals with estimation with functional covariates. More precisely, we aim at estimating the regression function $m$ of a continuous outcome $Y$ against a standard Wiener coprocess $W$. Following Cadre and Truquet (2015) and Cadre, Klutchnikoff, and Massiot (2017), the Wiener-Itô decomposition of $m(W)$ is used to construct a family of estimators. The minimax rate of convergence over specific smoothness classes is obtained. A data-driven selection procedure is defined following the ideas developed by Goldenshluger and Lepski (2011). An oracle-type inequality is obtained which leads to adaptive results.

Equations

$$Y=m(W)+\varepsilon$$

$$\tilde{m}_{h}(w)=\sum_{i=1}^{n}Y_{i}\frac{\mathbf{I}_{\{d(W_{i},w)\leq h\}}}{\sum_{j=1}^{n}\mathbf{I}_{\{d(W_{j},w)\leq h\}}},$$

$$h^{\beta}+\left(\frac{1}{n\varphi_{w}(h)}\right)^{1/2}$$

$$\log\mathbf{P}\Big(\sup_{t\in[0,1]}|W(t)|\leq h\Big)=\log\varphi_{0}(h)\underset{h\to 0}{\asymp}-h^{-2}$$

$$m(W)\overset{\mathbb{L}^{2}}{=}\mathbf{E}(Y)+\sum_{\ell=1}^{\infty}\frac{1}{\ell!}I_{\ell}(f_{\ell})(W),$$

$$I_{\ell}(f_{\ell})(W)=\int_{\Delta_{\ell}}f_{\ell}\,\mathrm{d}W^{\otimes\ell}=\int_{\Delta_{\ell}}f_{\ell}(u_{1},\dots,u_{\ell})\,W(\mathrm{d}u_{1})\dots W(\mathrm{d}u_{\ell}).$$

$$\hat{m}_{\mathcal{L}}(W)=\frac{1}{n}\sum_{i=1}^{n}Y_{i}+\sum_{\ell=1}^{\mathcal{L}}\frac{1}{\ell!}I_{\ell}(\widetilde{f}_{\ell})(W),$$

$$\sum_{\ell=1}^{\infty}\frac{e^{2\gamma\ell}}{\ell!}\|f_{\ell}\|_{\Delta_{\ell}}^{2}\leq M^{2},$$

$$Y=m(W)+\varepsilon$$

$$m(W)=\mathbf{E}(Y)+\sum_{\ell=1}^{\infty}\frac{1}{\ell!}I_{\ell}(f_{\ell})(W),$$

$$D^{\alpha}f=\frac{\partial^{|\alpha|}f}{\partial x_{1}^{\alpha_{1}}\dots\partial x_{\ell}^{\alpha_{\ell}}}.$$

$$\sum_{|\alpha|=\lfloor s_{\ell}\rfloor}|D^{\alpha}f(x)-D^{\alpha}f(y)|\leq\Lambda_{\ell}|x-y|^{s_{\ell}-\lfloor s_{\ell}\rfloor},$$

$$m(W)\overset{\mathbb{L}^{2}}{=}a+\sum_{\ell=1}^{\infty}\frac{1}{\ell!}I_{\ell}(f_{\ell})(W),$$

$$\sum_{\ell=1}^{\infty}\frac{e^{2\gamma\ell}}{\ell!}\|f_{\ell}\|_{\Delta_{\ell}}^{2}\leq M^{2}$$

$$m(W)\in\bigcap_{\substack{k\geq 0\\ 1<p<e^{\gamma}+1}}\mathbb{D}_{k,p}$$

$$m(W)=a+\sum_{\ell=1}^{L}\frac{1}{\ell!}I_{\ell}(f_{\ell})(W),$$

$$R_{p}(\tilde{m}_{n},m)=\big(\mathbf{E}|\tilde{m}_{n}(W)-m(W)|^{p}\big)^{1/p}$$

$$R_{p}(\tilde{m}_{n},\mathfrak{M})=\sup_{m\in\mathfrak{M}}R_{p}(\tilde{m}_{n},m),$$

$$\Phi_{n}(\mathfrak{M},p)=\inf_{\tilde{m}_{n}}R_{p}(\tilde{m}_{n},\mathfrak{M}).$$

$$R_{p}(m_{n}^{*},\mathfrak{m})\leq\inf_{\eta\in H}R_{p}(\tilde{m}_{n,\eta},\mathfrak{m}),$$

$$R_{p}(m_{n}^{*},\mathfrak{m})\leq\Upsilon_{1,p}\inf_{\eta\in H}R_{p}^{*}(\mathfrak{m},\eta)+\Upsilon_{2,p}\left(\frac{\log n}{n}\right)^{1/2},$$

$$Y=g(X)+\varepsilon,$$

$$\mathrm{d}X_{t}=b(t,X_{t})\,\mathrm{d}t+\sigma(t,X_{t})\,\mathrm{d}W_{t},\qquad 0\leq t\leq 1.$$

$$W_{t}=\int_{0}^{t}\frac{\mathrm{d}X_{s}}{\sigma(s,X_{s})}-\int_{0}^{t}\frac{b(s,X_{s})}{\sigma(s,X_{s})}\,\mathrm{d}s.$$

$$\begin{cases}\mathrm{d}X_{t}=-\theta(X_{t}-\mu)\,\mathrm{d}t+\sigma\,\mathrm{d}W_{t}\\ X_{0}=x_{0},\end{cases}$$

$$W_{t}=\frac{1}{\sigma}\left[X_{t}-x_{0}+\theta\int_{0}^{t}(X_{s}-\mu)\,\mathrm{d}s\right].$$

$$\begin{cases}\mathrm{d}X_{t}=X_{t}(\mu\,\mathrm{d}t+\sigma\,\mathrm{d}W_{t})\\ X_{0}=x_{0}.\end{cases}$$

$$W_{t}=\frac{1}{\sigma}\left[\log(X_{t}/x_{0})+\left(\frac{\sigma^{2}}{2}-\mu\right)t\right].$$

$$Y=m(W)+\varepsilon\quad\text{where}\quad m=g\circ\phi^{-1}.$$

$$\hat{g}=\widehat{g\circ\phi^{-1}}\circ\phi=\hat{m}\circ\phi.$$


Full text


Karine Bertin, CIMFAV-INGEMAT, Universidad de Valparaíso, General Cruz 222, Valparaíso, Chile

Nicolas Klutchnikoff, Univ Rennes, CNRS, IRMAR – UMR 6625, F-35000 Rennes, France


Keywords: Functional regression, Wiener-Itô chaos expansion, Oracle inequalities, Adaptive minimax rates of convergence.

AMS Subject Classification: 62G08, 62H12

1 Introduction

The problem of regression estimation is one of the most studied in statistics and different models have been considered depending on the nature of the data. In an increasing number of applications, it seems natural to assume that the covariate takes values in a functional space. The book of Ramsay and Silverman (2005) provides an overview on the subject of functional data analysis. In this context, several authors studied linear functional regression models (see for example Müller and Stadtmüller, 2005; Cai and Hall, 2006; Crambes et al., 2009). Nonparametric functional regression models have also been investigated (see Ferraty and Vieu, 2006, and references therein). In this paper, we are interested in such a model where the covariate is a Wiener Process. More precisely, let $\varepsilon$ be a real-valued random variable and $W=(W(t):0\leq t\leq 1)$ be a standard Brownian motion independent of $\varepsilon$ . We define

$$Y=m(W)+\varepsilon$$

where $m:\mathcal{C}\to\mathbb{R}$ is a mapping defined on the set $\mathcal{C}$ of all continuous functions $w:[0,1]\to\mathbb{R}$ and we assume that both $m(W)$ and $\varepsilon$ are square integrable random variables. Our goal is to estimate the function $m$ using a dataset $(Y_{1},W_{1}),\ldots,(Y_{n},W_{n})$ of independent realizations of $(Y,W)$ .

Since this framework is a specific case of the more general functional regression framework, usual approaches (which mainly consist in extending classical local methods such as $k$-nearest neighbors, kernel smoothing or local polynomial smoothing) could be used. However, in our context, these methods are known to lead to slow rates of convergence over classical models (see below for detailed references). Taking advantage of the probabilistic properties of the Wiener coprocess, we aim at defining a new family of models as well as dedicated estimation procedures with faster rates of convergence (in both minimax and adaptive minimax senses). Although considering Brownian path covariates seems restrictive for practical purposes, several Brownian diffusion paths could also be considered. Although the systematic theoretical study of such models is beyond the scope of this paper (and is left to further developments), we propose some extensions of our framework as well as some examples of usual processes that can be considered, such as geometric Brownian motions or Ornstein-Uhlenbeck processes.

In usual functional approaches the set $\mathcal{C}$ is endowed with a metric $d$ (see for example Ferraty and Vieu, 2006; Ferraty et al., 2007; Biau et al., 2010), which makes it possible to extend several nonparametric estimators. For example, a simple version of the Nadaraya-Watson estimator is given, for any function $w\in\mathcal{C}$ and any bandwidth $h>0$, by:

$$\tilde{m}_{h}(w)=\sum_{i=1}^{n}Y_{i}\frac{\mathbf{I}_{\{d(W_{i},w)\leq h\}}}{\sum_{j=1}^{n}\mathbf{I}_{\{d(W_{j},w)\leq h\}}},$$

where $\mathbf{I}$ stands for the indicator function. The properties of these estimators are related to the behavior of a quantity known as the small ball probability defined for $w\in\mathcal{C}$ and $h>0$ by $\varphi_{w}(h)=\mathbf{P}(d(W,w)\leq h)$ . Pointwise risks of such methods can be generally bounded, up to a positive factor by

$$h^{\beta}+\left(\frac{1}{n\varphi_{w}(h)}\right)^{1/2}$$

where $\beta$ denotes the smoothness of the mapping $m$ measured in a Hölder sense. For example, if $0<\beta\leq 1$ it is assumed that there exists $L>0$ such that $|m(w)-m(w^{\prime})|\leq Ld(w,w^{\prime})^{\beta}$ for any $w,w^{\prime}\in\mathcal{C}$ . Under additional assumptions similar results can be obtained for integrated risks.
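To make the estimator above concrete, here is a minimal Python sketch of the indicator-kernel Nadaraya-Watson estimator on discretized paths. The sup-norm distance, the grid size, the mapping $m(w)=w(1)$ and the helper names (`brownian_path`, `sup_distance`, `nadaraya_watson`) are illustrative assumptions, not choices made in the paper.

```python
import math
import random


def brownian_path(n_steps, rng):
    """Simulate W on the grid k / n_steps, k = 0..n_steps, with W(0) = 0."""
    dt = 1.0 / n_steps
    path = [0.0]
    for _ in range(n_steps):
        path.append(path[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return path


def sup_distance(w1, w2):
    """Sup-norm distance between two paths on the same grid (assumed metric d)."""
    return max(abs(a - b) for a, b in zip(w1, w2))


def nadaraya_watson(w, paths, responses, h, dist=sup_distance):
    """Average the Y_i whose path W_i lies in the ball of radius h around w.

    Returns None when the ball is empty (the estimator is then undefined).
    """
    inside = [y for w_i, y in zip(paths, responses) if dist(w_i, w) <= h]
    if not inside:
        return None
    return sum(inside) / len(inside)


rng = random.Random(0)
paths = [brownian_path(50, rng) for _ in range(500)]
# Hypothetical regression mapping m(w) = w(1), plus Gaussian noise.
responses = [p[-1] + rng.gauss(0.0, 0.1) for p in paths]
query = brownian_path(50, rng)
estimate = nadaraya_watson(query, paths, responses, h=1.5)
```

Shrinking `h` quickly empties the ball around the query path, which is the practical face of the small ball probability $\varphi_{w}(h)$ entering the risk bound.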

The classical assumption $\varphi_{w}(h)\asymp h^{k}$ corresponds roughly to the situation where the covariate $W$ lies in some space of finite dimension $k$ (see Azaïs and Fort, 2013). This framework corresponds to the usual nonparametric case. The minimax rates of convergence are then given by $n^{-\beta/(2\beta+k)}$ (see Tsybakov, 2009). However if $W$ lies in a functional space, the behavior of $\varphi_{w}(h)$ is quite different. In our context, where $W$ is a standard Wiener process, it is well-known (see Li and Shao, 2001) that

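$$\log\mathbf{P}\Big(\sup_{t\in[0,1]}|W(t)|\leq h\Big)=\log\varphi_{0}(h)\underset{h\to 0}{\asymp}-h^{-2}$$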

which leads to slower rates of convergence of the form $(\log n)^{-\beta/2}$ assuming a $\beta-$ Hölder condition on $m$ . We refer the reader to Chagny and Roche (2016) for recent results with different behavior of $\varphi_{w}(h)$ .
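The logarithmic rate can be checked numerically by balancing the two terms of the risk bound. The sketch below assumes the stand-in $\varphi(h)=\exp(-h^{-2})$ (the constant is set to 1; only the order $\log\varphi(h)\asymp-h^{-2}$ comes from the text) and solves $h^{\beta}=(n\varphi(h))^{-1/2}$ by bisection. The function names are hypothetical.

```python
import math


def balanced_bandwidth(n, beta, lo=1e-3, hi=10.0, iters=200):
    """Solve h**beta = (n * phi(h))**(-1/2) with phi(h) = exp(-h**-2) by bisection.

    phi is a stand-in with the constant set to 1 (an assumption of this sketch).
    """
    def gap(h):
        # log(bias term) - log(stochastic term); this quantity is increasing in h.
        return beta * math.log(h) + 0.5 * math.log(n) - 0.5 * h ** -2

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if gap(mid) < 0.0:
            lo = mid  # bias still smaller than the stochastic term: increase h
        else:
            hi = mid
    return 0.5 * (lo + hi)


def balanced_rate(n, beta):
    """Value of the bias term h**beta at the balancing bandwidth."""
    return balanced_bandwidth(n, beta) ** beta


beta = 1.0
for n in (10 ** 2, 10 ** 4, 10 ** 8):
    rate = balanced_rate(n, beta)
    reference = math.log(n) ** (-beta / 2)
    print(n, round(rate, 4), round(reference, 4))
```

Under this assumption the balanced rate stays of the order of $(\log n)^{-\beta/2}$: even multiplying $n$ by $10^{4}$ reduces it only modestly.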

In practical situations, since $\beta$ is unknown, finding adaptive procedures to select the smoothing parameter $h$ is of prime interest. To the best of our knowledge, few papers deal with this problem. Adaptive procedures based on cross validation have been used in Rachdi and Vieu (2007). Chagny and Roche (2016) also propose an adaptation of the method developed by Goldenshluger and Lepski (see Goldenshluger and Lepski, 2011) using an empirical version of the quantity $\varphi_{w}(h)$. Lower bounds have been investigated by Mas (2012). In all these papers, the pointwise risk is studied in terms of $\varphi_{w}(h)$ and theoretical properties are obtained assuming a $\beta$-Hölder condition on $m$ with respect to the metric $d$ with smoothness $\beta\in(0,1]$.

In this paper we follow a different strategy. Taking advantage of probabilistic properties of the Wiener process, similarly to the methodology developed by Cadre and Truquet (2015) and Cadre et al. (2017), we consider the Wiener-Itô chaotic decomposition of $m(W)$ . Indeed, every random variable that belongs to $\mathbb{L}^{2}_{W}=\{\mathfrak{m}(W)\mid\mathfrak{m}:\mathcal{C}\to\mathbb{R}\text{ and }\mathbf{E}(\mathfrak{m}(W))^{2}<+\infty\}$ can be decomposed as a sum of multiple stochastic integrals (see Di Nunno et al., 2009, for more details). There exists a unique sequence of functions $(f_{\ell})_{\ell\geq 1}$ such that

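$$m(W)\overset{\mathbb{L}^{2}}{=}\mathbf{E}(Y)+\sum_{\ell=1}^{\infty}\frac{1}{\ell!}I_{\ell}(f_{\ell})(W),\tag{2}$$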

where $f_{\ell}$ belongs to ${\mathbf{L}^{2}_{\mathrm{sym}}}(\Delta_{\ell})$ , the set of symmetric and square integrable real-valued functions defined on $\Delta_{\ell}=[0,1]^{\ell}$ and

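$$I_{\ell}(f_{\ell})(W)=\int_{\Delta_{\ell}}f_{\ell}\,\mathrm{d}W^{\otimes\ell}=\int_{\Delta_{\ell}}f_{\ell}(u_{1},\dots,u_{\ell})\,W(\mathrm{d}u_{1})\dots W(\mathrm{d}u_{\ell}).$$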

We recall that $f$ is symmetric on $\Delta_{\ell}$ if for any $(t_{1},\dots,t_{\ell})\in\Delta_{\ell}$ and any permutation $\sigma$ of $\{1,2,\dots,\ell\}$ , $f(t_{1},\dots,t_{\ell})=f(t_{\sigma(1)},\dots,t_{\sigma(\ell)})$ . Note that the symmetry implies that the functions $f_{\ell}$ are isotropic. The iterated integral $I_{\ell}(f_{\ell})(W)$ is called a chaos of order $\ell$ .

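Our approach consists in defining kernel-type estimators $\widetilde{f}_{\ell}$ of $f_{\ell}$ using Itô’s isometry, see (14). Then, based on (2), we propose the following estimator of $m$:

$$\hat{m}_{\mathcal{L}}(W)=\frac{1}{n}\sum_{i=1}^{n}Y_{i}+\sum_{\ell=1}^{\mathcal{L}}\frac{1}{\ell!}I_{\ell}(\widetilde{f}_{\ell})(W),$$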

with $\mathcal{L}\in\mathbb{N}$ . To study these estimators, we assume that $m$ belongs to a specific class of mappings that satisfy

$$\sum_{\ell=1}^{\infty}\frac{e^{2\gamma\ell}}{\ell!}\|f_{\ell}\|_{\Delta_{\ell}}^{2}\leq M^{2},$$

for some $\gamma>0$ and $M>0$, where $\|\cdot\|_{\Delta_{\ell}}$ is the classical $L_{2}$ norm on $\Delta_{\ell}$, and that the $f_{\ell}$ defined by (2) are Hölderian. Such classes are quite natural in our context and are connected with the usual Meyer-Watanabe test function space (see Section 2.1 for more details).

In this case, we find rates of convergence for the prediction error in $\mathbb{L}^{p}$ norm. Contrary to the classical functional framework, where logarithmic rates are derived, the rates we obtain are intermediate between logarithmic and polynomial rates.

If we assume moreover that the summation in (2) stops at a known index $L$ , we prove that the estimators $\hat{m}_{{L}}$ achieve optimal rates of convergence. We derive minimax rates of convergence which are polynomial in $n$ with an exponent that depends on the smoothness of the functions $f_{\ell}$ . A data-driven procedure, based on the method developed by Goldenshluger and Lepski (2011), is then defined to tune the bandwidths used in the estimation of the functions $f_{\ell}$ . The resulting estimator of $m$ satisfies an oracle-type inequality that allows us to derive adaptive results.

The paper is organized as follows. Section 2 presents the model and the studied problem. Section 3 describes the construction of the estimators. Section 4 gives the main results and Section 5 is dedicated to the proofs.

2 Statistical framework

2.1 Model

Let $W=(W(t):0\leq t\leq 1)$ be a standard Brownian motion and let $\varepsilon$ be a centered real-valued random variable independent of $W$ . We define:

$$Y=m(W)+\varepsilon$$

where $m:\mathcal{C}\to\mathbb{R}$ is a given mapping. We assume that $m(W)$ as well as $\varepsilon$ belong to $\mathbb{L}^{2}$, the set of square integrable random variables. Then

$$m(W)=\mathbf{E}(Y)+\sum_{\ell=1}^{\infty}\frac{1}{\ell!}I_{\ell}(f_{\ell})(W),$$

where for $\ell\in\mathbb{N}$ , $f_{\ell}$ belongs to ${\mathbf{L}^{2}_{\mathrm{sym}}}(\Delta_{\ell})$ . As mentioned in the introduction we also assume that $f_{\ell}$ is a regular function. Below we define precisely the functional classes used to measure the smoothness of each function $f_{\ell}$ .

Definition 1

Set $\ell\in\mathbb{N}$, $s_{\ell}>0$ and $\Lambda_{\ell}>0$. The Hölder ball $\mathcal{H}_{\ell}(s_{\ell},\Lambda_{\ell})$ is the set of all functions $f:\Delta_{\ell}\to\mathbb{R}$ that satisfy the following properties:

1. For any $\alpha=(\alpha_{1},\dotsc,\alpha_{\ell})\in\mathbb{N}^{\ell}$ such that $|\alpha|=\sum_{i}\alpha_{i}\leq\lfloor s_{\ell}\rfloor=\max\{k\in\mathbb{N}\mid k<s_{\ell}\}$, the partial derivative $D^{\alpha}f$ exists, where

$$D^{\alpha}f=\frac{\partial^{|\alpha|}f}{\partial x_{1}^{\alpha_{1}}\dots\partial x_{\ell}^{\alpha_{\ell}}}.$$

2. For any $x$ and $y$ in $\Delta_{\ell}$ we have:

$$\sum_{|\alpha|=\lfloor s_{\ell}\rfloor}|D^{\alpha}f(x)-D^{\alpha}f(y)|\leq\Lambda_{\ell}|x-y|^{s_{\ell}-\lfloor s_{\ell}\rfloor},$$

where $|\cdot|$ stands for the Euclidean norm of $\mathbb{R}^{\ell}$.

3. We have $f\in{\mathbf{L}^{2}_{\mathrm{sym}}}(\Delta_{\ell})$.

Equipped with these notations we can define a scale of classes $\mathfrak{A}(s,\Lambda,\gamma,M)$ for the mapping $m$. Roughly, we impose two kinds of restrictions on the functions $f_{\ell}$ that appear in (2): a minimal smoothness is imposed on each $f_{\ell}$, and the growth of the $\mathbf{L}^{2}$-norms of the $f_{\ell}$ is controlled.

Definition 2

Set $s=(s_{1},s_{2},\dotsc)\in(0,+\infty)^{\mathbb{N}}$ , $\Lambda=(\Lambda_{1},\Lambda_{2},\dotsc)\in(0,+\infty)^{\mathbb{N}}$ , $\gamma>0$ and $M>0$ . We say that $m:\mathcal{C}\to\mathbb{R}$ belongs to the mapping class $\mathfrak{A}(s,\Lambda,\gamma,M)$ if there exist $a\in\mathbb{R}$ and a sequence of functions $(f_{\ell})_{\ell\in\mathbb{N}}$ satisfying

$$m(W)\overset{\mathbb{L}^{2}}{=}a+\sum_{\ell=1}^{\infty}\frac{1}{\ell!}I_{\ell}(f_{\ell})(W),$$

with $f_{\ell}\in\mathcal{H}_{\ell}(s_{\ell},\Lambda_{\ell})$ and

$$\sum_{\ell=1}^{\infty}\frac{e^{2\gamma\ell}}{\ell!}\|f_{\ell}\|_{\Delta_{\ell}}^{2}\leq M^{2}\tag{5}$$

where $\|f_{\ell}\|_{\Delta_{\ell}}=\left(\int_{\Delta_{\ell}}f^{2}_{\ell}(u)\,\mathrm{d}u\right)^{1/2}$ .

Remark 1

Equation (5) implies that

$$m(W)\in\bigcap_{\substack{k\geq 0\\ 1<p<e^{\gamma}+1}}\mathbb{D}_{k,p},$$

where $\mathbb{D}_{k,p}$ denotes the usual Sobolev space over the Wiener space defined in Watanabe (1984). Note also that, if, for any $\ell\geq 1$ we have $\|f_{\ell}\|_{\Delta_{\ell}}\leq C^{\ell}$ for some positive constant $C$ , then (5) is fulfilled for any $\gamma\geq 0$ .

We also define subclasses of the classes $\mathfrak{A}(s,\Lambda,\gamma,M)$ assuming that the summation in (2) stops at a finite index $L\in\mathbb{N}$ .

Definition 3

Set $L\in\mathbb{N}$ . Set $s=(s_{1},\dotsc,s_{L})\in(0,+\infty)^{L}$ and $\Lambda=(\Lambda_{1},\dotsc,\Lambda_{L})\in(0,+\infty)^{L}$ . We say that $m:\mathcal{C}\to\mathbb{R}$ belongs to the mapping class $\mathfrak{M}(s,\Lambda,L,M)$ if there exist $a\in\mathbb{R}$ and a sequence of functions $(f_{\ell})_{1\leq\ell\leq L}$ satisfying

$$m(W)=a+\sum_{\ell=1}^{L}\frac{1}{\ell!}I_{\ell}(f_{\ell})(W),$$

with $f_{\ell}\in\mathcal{H}_{\ell}(s_{\ell},\Lambda_{\ell})$ and $\|f_{\ell}\|^{2}_{\Delta_{\ell}}\leq M^{2}\ell!$ for any $\ell=1,\dotsc,L$ .

More precisely, for any $s$, $\Lambda$ and $M$, the subclasses $\mathfrak{M}(s,\Lambda,L,M)$ satisfy $\mathfrak{M}(s,\Lambda,L,M)\subset\mathfrak{A}(s,\Lambda,\gamma,M^{\prime})$ with $M^{\prime 2}=M^{2}\sum_{\ell=1}^{L}e^{2\gamma\ell}$; indeed, $\|f_{\ell}\|^{2}_{\Delta_{\ell}}\leq M^{2}\ell!$ gives $\sum_{\ell=1}^{L}\frac{e^{2\gamma\ell}}{\ell!}\|f_{\ell}\|^{2}_{\Delta_{\ell}}\leq M^{2}\sum_{\ell=1}^{L}e^{2\gamma\ell}$. Let us comment on the above definitions since the framework we consider in this paper is quite different from the usual functional framework recalled in the introduction. In our framework the “regularity” of a map $m$ is seen through the prism of the chaotic decomposition of $m(W)$ and, thus, the functions $f_{\ell}$. This is not directly linked with the regularity of the mapping $m$ between the space $\mathcal{C}$ endowed with the topology induced by the $\mathbf{L}^{2}$ norm and $\mathbb{R}$. For example, it can be easily seen that the mapping $m$ defined, for any $w\in\mathcal{C}$, by $m(w)=w^{2}(1)$ is not continuous (which implies that this function is not Hölderian and, thus, cannot be considered in the usual framework). However it is well known that $m(W)=1+I_{2}(1)(W)$. As a consequence, the mapping $m$ falls within our scope since $m$ belongs to $\mathfrak{M}(s,\Lambda,2,1)$ for any $s$ and $\Lambda$.
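The identity $W(1)^{2}=1+I_{2}(1)(W)$ can be checked on a grid. The sketch below approximates $I_{2}(1)(W)$ by the off-diagonal double sum of Brownian increments, which is a standard discretization and an assumption of this example; the function names are hypothetical.

```python
import math
import random


def increments(n_steps, rng):
    """Brownian increments W(t_{k+1}) - W(t_k) on a regular grid of [0, 1]."""
    sd = math.sqrt(1.0 / n_steps)
    return [rng.gauss(0.0, sd) for _ in range(n_steps)]


def double_integral_of_one(dw):
    """Riemann-type approximation of I_2(1)(W): sum over i != j of dW_i * dW_j.

    Diagonal terms are excluded, as in the definition of multiple Wiener integrals.
    """
    total = 0.0
    for i, a in enumerate(dw):
        for j, b in enumerate(dw):
            if i != j:
                total += a * b
    return total


rng = random.Random(1)
gaps = []
for _ in range(10):
    dw = increments(300, rng)
    terminal = sum(dw)  # discretized W(1)
    gaps.append(abs(terminal ** 2 - (1.0 + double_integral_of_one(dw))))
print(max(gaps))
```

The gap equals the distance between the discrete quadratic variation $\sum_{k}(\Delta W_{k})^{2}$ and 1, so it shrinks as the grid is refined.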

2.2 Minimax and adaptive framework

The observations consist of an $n$-sample $(Y_{1},W_{1}),\dotsc,(Y_{n},W_{n})$ distributed as and independent of $(Y,W)$. Our first goal is to investigate the estimation of $m$, based on these observations, over the classes $\mathfrak{M}(s,\Lambda,L,M)$ and $\mathfrak{A}(s,\Lambda,\gamma,M)$ for $0<s\leq s^{*}$ where ${s^{*}}$ is fixed. To measure the accuracy of an arbitrary estimator $\tilde{m}_{n}=\tilde{m}(\;\cdot\;;(Y_{1},W_{1}),\dotsc,(Y_{n},W_{n}))$ of $m$, we consider the prediction risk:

$$R_{p}(\tilde{m}_{n},m)=\big(\mathbf{E}|\tilde{m}_{n}(W)-m(W)|^{p}\big)^{1/p}$$

where $p\geq 2$ . The maximal risk of an arbitrary estimator $\tilde{m}_{n}$ over a given class of mappings $\mathfrak{M}$ is defined by:

$$R_{p}(\tilde{m}_{n},\mathfrak{M})=\sup_{m\in\mathfrak{M}}R_{p}(\tilde{m}_{n},m),$$

whereas the minimax risk is defined, taking the infimum over all possible estimators, by:

$$\Phi_{n}(\mathfrak{M},p)=\inf_{\tilde{m}_{n}}R_{p}(\tilde{m}_{n},\mathfrak{M}).$$

An estimator $\tilde{m}_{n}$ whose maximal risk is asymptotically bounded, up to a multiplicative factor, by $\Phi_{n}(\mathfrak{M},p)$ is called minimax over $\mathfrak{M}$ . Such an estimator is well-adapted to the estimation over $\mathfrak{M}$ but it can perform poorly over another class of mappings. The problem of adaptive estimation consists in finding a single estimation procedure that is simultaneously minimax over a scale of mapping classes.

Our second goal is to investigate the adaptive estimation of $m$ over the scale of classes $\mathcal{M}(s^{*},L,M)=\{\mathfrak{M}_{s,\Lambda,L,M}\mid s\in(0,{s^{*}})^{L},\Lambda\in(0,+\infty)^{L}\}$ where ${s^{*}}>0$ , $L\in\mathbb{N}$ and $M>0$ are fixed and known by the statistician. More precisely our goal is to construct a single estimation procedure $m_{n}^{*}$ such that, for any $\mathfrak{M}\in\mathcal{M}(s^{*},L,M)$ , the risk $R_{p}(m_{n}^{*},\mathfrak{M})$ is asymptotically bounded, up to a multiplicative constant, by $\Phi_{n}(\mathfrak{M},p)$ . One of the main tools to prove such a result is to find an oracle-type inequality that guarantees that this procedure performs almost as well as the best estimator in a rich family of estimators. Ideally, we would like to have, for any $\mathfrak{m}\in\bigcup\mathfrak{M}_{s,\Lambda,L,M}$ , an inequality of the following form:

$$R_{p}(m_{n}^{*},\mathfrak{m})\leq\inf_{\eta\in H}R_{p}(\tilde{m}_{n,\eta},\mathfrak{m}),\tag{8}$$

where $\{\tilde{m}_{n,\eta}\mid\eta\in H\}$ is a family of estimators well-adapted to our problem in the following sense: for any $\mathfrak{M}\in\mathcal{M}(s^{*},L,M)$ , there exists $\eta\in H$ such that $\tilde{m}_{n,\eta}$ is minimax over $\mathfrak{M}$ . However, in many situations, (8) is relaxed and we prove a weaker inequality of the type:

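$$R_{p}(m_{n}^{*},\mathfrak{m})\leq\Upsilon_{1,p}\inf_{\eta\in H}R_{p}^{*}(\mathfrak{m},\eta)+\Upsilon_{2,p}\left(\frac{\log n}{n}\right)^{1/2},\tag{9}$$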

where $\Upsilon_{\!1,p}$ and $\Upsilon_{\!2,p}$ are two positive constants and $R_{p}^{*}(\mathfrak{m},\eta)$ is an appropriate quantity to be determined that can be viewed as a tight upper bound on $R_{p}(\tilde{m}_{n,\eta},\mathfrak{m})$ . Inequalities of the form (9) are called oracle-type inequalities.

Theorems 3 and 4 below correspond respectively to an oracle-type inequality and an adaptive result of these types.

2.3 Extensions to our model

In this paper, we focus on pure Brownian coprocesses. However our framework allows us to consider a larger class of covariates. Assume that we aim at estimating the regression function $g:\mathcal{C}\to\mathbb{R}$ in the model:

$$Y=g(X)+\varepsilon,\tag{10}$$

where $X$ is a process driven by the SDE:

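$$\mathrm{d}X_{t}=b(t,X_{t})\,\mathrm{d}t+\sigma(t,X_{t})\,\mathrm{d}W_{t},\qquad 0\leq t\leq 1.\tag{11}$$

Here $\sigma$ and $b$ are assumed to be known functions and we also assume that conditions guaranteeing the existence and uniqueness of the solution of (11) are fulfilled. If for any $0\leq t\leq 1$, $\sigma(t,X_{t})>0$, then, under mild integrability conditions, we have:

$$W_{t}=\int_{0}^{t}\frac{\mathrm{d}X_{s}}{\sigma(s,X_{s})}-\int_{0}^{t}\frac{b(s,X_{s})}{\sigma(s,X_{s})}\,\mathrm{d}s.\tag{12}$$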

This implies that there exists a known invertible function $\phi:\mathcal{C}\to\mathcal{C}$ such that $W=\phi(X)$ . In general, this function can be computed by numerical integration. However, in some situations, an exact expression can be obtained using Itô’s formula. This is the case for two parametric families of processes widely used to model several practical situations. First, Ornstein–Uhlenbeck processes $X_{t}$ are driven by the following SDE:

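$$\begin{cases}\mathrm{d}X_{t}=-\theta(X_{t}-\mu)\,\mathrm{d}t+\sigma\,\mathrm{d}W_{t}\\ X_{0}=x_{0},\end{cases}$$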

where $x_{0}\in\mathbb{R}$ is fixed and $\theta>0$ , $\mu\in\mathbb{R}$ and $\sigma>0$ are known parameters. By Itô’s formula we have:

$$W_{t}=\frac{1}{\sigma}\left[X_{t}-x_{0}+\theta\int_{0}^{t}(X_{s}-\mu)\,\mathrm{d}s\right].$$
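As a numerical illustration of this reconstruction, the following sketch simulates an Ornstein–Uhlenbeck path with an Euler scheme and recovers the driving Brownian path from $X$ alone through $W_{t}=\sigma^{-1}[X_{t}-x_{0}+\theta\int_{0}^{t}(X_{s}-\mu)\,\mathrm{d}s]$. The Euler discretization, the left Riemann sum and the parameter values are assumptions of the example, not part of the paper.

```python
import math
import random


def simulate_ou_euler(x0, theta, mu, sigma, n_steps, rng):
    """Euler scheme for dX = -theta (X - mu) dt + sigma dW on [0, 1].

    Returns the discretized paths (X, W) on the grid k / n_steps.
    """
    dt = 1.0 / n_steps
    x, w = [x0], [0.0]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))
        x.append(x[-1] - theta * (x[-1] - mu) * dt + sigma * dw)
        w.append(w[-1] + dw)
    return x, w


def recover_w_from_ou(x, theta, mu, sigma):
    """Discrete version of W_t = (X_t - x0 + theta * int_0^t (X_s - mu) ds) / sigma.

    The time integral is a left Riemann sum, which matches the Euler scheme above.
    """
    dt = 1.0 / (len(x) - 1)
    w, integral = [0.0], 0.0
    for k in range(1, len(x)):
        integral += (x[k - 1] - mu) * dt
        w.append((x[k] - x[0] + theta * integral) / sigma)
    return w


rng = random.Random(2)
x_path, w_true = simulate_ou_euler(x0=1.0, theta=0.7, mu=0.5, sigma=0.3,
                                   n_steps=200, rng=rng)
w_rec = recover_w_from_ou(x_path, theta=0.7, mu=0.5, sigma=0.3)
max_error = max(abs(a - b) for a, b in zip(w_true, w_rec))
```

Because the left Riemann sum inverts the Euler step exactly, `max_error` only reflects floating-point rounding; with a path observed from a different scheme, a discretization error would appear.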

Next, Geometric Brownian motions are used to model stock prices in the Black–Scholes model. Let $\mu\in\mathbb{R}$ and $\sigma>0$ be given parameters. We assume that the process $X=(X_{t}:t\in I)$ is driven by the following SDE:

$$\begin{cases}\mathrm{d}X_{t}=X_{t}(\mu\,\mathrm{d}t+\sigma\,\mathrm{d}W_{t})\\ X_{0}=x_{0}.\end{cases}$$

By Itô’s formula we have:

$$W_{t}=\frac{1}{\sigma}\left[\log(X_{t}/x_{0})+\left(\frac{\sigma^{2}}{2}-\mu\right)t\right].$$
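The same reconstruction can be sketched for a geometric Brownian motion, using the closed form $X_{t}=x_{0}\exp((\mu-\sigma^{2}/2)t+\sigma W_{t})$ and its inverse $W_{t}=\sigma^{-1}[\log(X_{t}/x_{0})+(\sigma^{2}/2-\mu)t]$. The parameter values and function names below are illustrative assumptions.

```python
import math
import random


def simulate_gbm(x0, mu, sigma, n_steps, rng):
    """Exact simulation of X_t = x0 * exp((mu - sigma^2 / 2) t + sigma W_t) on a grid."""
    dt = 1.0 / n_steps
    w = [0.0]
    for _ in range(n_steps):
        w.append(w[-1] + rng.gauss(0.0, math.sqrt(dt)))
    x = [x0 * math.exp((mu - 0.5 * sigma ** 2) * k * dt + sigma * w[k])
         for k in range(n_steps + 1)]
    return x, w


def recover_w_from_gbm(x, mu, sigma):
    """W_t = (log(X_t / x0) + (sigma^2 / 2 - mu) t) / sigma, evaluated on the grid."""
    dt = 1.0 / (len(x) - 1)
    return [(math.log(x[k] / x[0]) + (0.5 * sigma ** 2 - mu) * k * dt) / sigma
            for k in range(len(x))]


rng = random.Random(4)
x_gbm, w_gbm = simulate_gbm(x0=100.0, mu=0.05, sigma=0.2, n_steps=250, rng=rng)
w_gbm_rec = recover_w_from_gbm(x_gbm, mu=0.05, sigma=0.2)
gbm_error = max(abs(a - b) for a, b in zip(w_gbm, w_gbm_rec))
```

Here the map from $X$ to $W$ is pointwise in time, so no numerical integration is needed once $\mu$ and $\sigma$ are known.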

Remark 2

In practical situations the parameters $\mu$, $\sigma$ and $\theta$ in the above examples are not known. However, estimators of these parameters could be used to estimate the coprocess $W$. This leads to new models where the covariate is observed with errors. The study of such models is beyond the scope of this paper and left to further developments.

In view of (12), equation (10) can be written as:

$$Y=m(W)+\varepsilon\quad\text{where}\quad m=g\circ\phi^{-1}.$$

Thus, the regression problem (10) falls into our framework. The estimation strategy consists of estimating the function $m$ based on the reconstruction of the Brownian path $W=\phi(X)$. This can be summarized by the formula:

$$\hat{g}=\widehat{g\circ\phi^{-1}}\circ\phi=\hat{m}\circ\phi.$$

Remark also that, in this context, it is relevant to assume that the chaotic decomposition of $m(W)$ is finite. Indeed, under mild assumptions on $b$ and $\sigma$ (see Hu, 1997, for more details), if $g(X)$ is a polynomial of the terminal value $X_{1}$ of the process $X$ , then the mapping $m(W)=g(X)$ can be written as a finite chaotic decomposition with smooth functions $f_{\ell}$ .

3 Estimator construction

In this section we present our estimation procedure. To do so, we first recall classical properties satisfied by Wiener chaos which allow us to construct a family of “simple” estimators that depends on a multivariate tuning parameter. Next we construct a procedure which selects, in a data-driven way, this tuning parameter using the methodology developed by Goldenshluger and Lepski (2011).

3.1 Classical properties of the chaos

Throughout this paper and in the construction of our statistical procedure, we use the following two fundamental properties satisfied by the iterated integrals.

For $\ell,\ell^{\prime}\in\mathbb{N}$ , Itô’s isometry (Di Nunno et al., 2009) ensures that, if $g\in{\mathbf{L}^{2}_{\mathrm{sym}}}(\Delta_{\ell})$ and $g^{\prime}\in{\mathbf{L}^{2}_{\mathrm{sym}}}(\Delta_{\ell^{\prime}})$ , then

[TABLE]

where $\delta_{\ell,\ell^{\prime}}$ denotes the Kronecker delta.

The hypercontractivity property (Nourdin and Peccati, 2012) will be used to control the concentration of our estimators. Set $q\geq 2$ and $\ell\in\mathbb{N}$ . For any $g\in{\mathbf{L}^{2}_{\mathrm{sym}}}(\Delta_{\ell})$ we have:

[TABLE]

3.2 A simple family of estimators

Let $\mathbf{k}:\mathbb{R}\to\mathbb{R}$ be a function that satisfies the following properties: $\mathbf{k}$ is continuous inside $[0,1]$ , $\mathbf{k}(x)=0$ for any $x\notin[0,1]$ ,

[TABLE]

Let $\ell\in\mathbb{N}$ . A natural estimator of the function $f_{\ell}$ is given, for $h\in(0,1)$ , by:

[TABLE]

where $K^{(\ell)}_{h}$ is a multivariate kernel defined by:

[TABLE]

This specific construction allows one to obtain an estimator free of boundary bias (see Bertin et al., 2019, for more details).

Indeed, note that for any $t\in\Delta_{\ell}$ and under regularity assumptions on $f_{\ell}$ we have:

[TABLE]

where the last two lines are obtained using (2) and (14). Since $\varepsilon$ is centered and independent of $W$ we have:

[TABLE]
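For the first chaos only, the identity behind this construction can be illustrated numerically: by Itô’s isometry, $\mathbf{E}[Y\,I_{1}(g)(W)]=\int_{0}^{1}f_{1}(u)g(u)\,\mathrm{d}u$, so choosing $g=K_{h}(t,\cdot)$ yields a local average of $f_{1}$ around $t$. The sketch below is a simplified stand-in, not the paper's estimator: it uses a plain box kernel instead of the boundary-corrected kernel, a hypothetical $f_{1}(u)=1+u$, and a model with a single chaos.

```python
import math
import random


def box_kernel(t, u, h):
    """Plain box kernel of width h centred at t (an illustrative choice)."""
    return 1.0 / h if abs(u - t) <= h / 2 else 0.0


def wiener_integral(g_values, dw):
    """Discretized first-order integral I_1(g)(W) = sum_k g(u_k) dW_k."""
    return sum(g * d for g, d in zip(g_values, dw))


def estimate_f1(t, h, samples, grid):
    """Empirical mean of Y_i * I_1(K_h(t, .))(W_i) over the sample."""
    k_values = [box_kernel(t, u, h) for u in grid]
    return sum(y * wiener_integral(k_values, dw) for y, dw in samples) / len(samples)


rng = random.Random(3)
n_steps, n_samples = 50, 4000
grid = [(k + 0.5) / n_steps for k in range(n_steps)]    # midpoints of the grid cells
f1 = [1.0 + u for u in grid]                            # hypothetical f_1(u) = 1 + u
samples = []
for _ in range(n_samples):
    dw = [rng.gauss(0.0, math.sqrt(1.0 / n_steps)) for _ in range(n_steps)]
    y = wiener_integral(f1, dw) + rng.gauss(0.0, 0.1)   # Y = I_1(f_1)(W) + eps
    samples.append((y, dw))
f1_hat = estimate_f1(t=0.5, h=0.2, samples=samples, grid=grid)
```

With these choices the target is the average of $f_{1}$ over $[0.4,0.6]$, that is $1.5$, and `f1_hat` fluctuates around it with a Monte Carlo error that grows as `h` decreases.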

Equipped with these notations we define a family of plugin estimators of the mapping $m$ . For $\mathcal{L}\in\mathbb{N}$ and all $\boldsymbol{h}=(h_{1},\dotsc,h_{\mathcal{L}})\in(0,1)^{\mathcal{L}}$ we set:

[TABLE]

where $\overline{Y}_{\!\!n}=\sum_{i=1}^{n}Y_{i}/n$ . In the following, we study the rate of convergence of the estimator (22) when $\mathcal{L}=L_{n}$ where $(L_{n})_{n\in\mathbb{N}}$ is a sequence of integers that tends to $\infty$ as $n$ tends to $\infty$ (see Theorem 1) and $\mathcal{L}=L$ where $L$ is a known fixed integer (see Theorem 2).

3.3 Selection procedure

Set ${s_{*}}>0$ , $L\in\mathbb{N}$ and $M>0$ . Assume that $\mu_{4}=\left(\mathbf{E}|\varepsilon|^{4}\right)^{1/4}$ exists. Let $\ell\in\{1,\ldots,L\}$ be fixed and define

[TABLE]

Now, define

[TABLE]

where

[TABLE]

where the constants $\mathfrak{c}_{\ell}(k)$ are defined in (15). Define for $h\in\mathbf{H}_{\ell}$

[TABLE]

and set

[TABLE]

The estimation procedure is then defined by $\hat{m}_{L}=\hat{m}_{\boldsymbol{\hat{h}},L}$ where $\boldsymbol{\hat{h}}=\left(\hat{h}_{1},\ldots,\hat{h}_{L}\right)$.

Remark 3

This selection rule follows the principles and the ideas developed by Goldenshluger and Lepski in a series of papers (see Goldenshluger and Lepski, 2011, 2014, among others). The quantity $M(\ell,h)$, which is called a majorant in the papers cited above, is a penalized version of the standard deviation of the estimator $\hat{f}^{(\ell)}_{h}$ while the quantity $B(\ell,h)$ is, in some sense, close to its bias term, see (83). Finding tight majorants is the key point of the method since $\hat{h}_{\ell}$ is chosen in (24) in order to realize an empirical trade-off between these two quantities.

It is worth noting that the procedure depends on a hyperparameter ${s_{*}}>0$ which can be chosen arbitrarily small. The introduction of this parameter is due to technical reasons, see (106) in the proof of Lemma 2. This additional assumption (we would like to take ${s_{*}}=0$) implies some restrictions on Theorem 4 below.
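The trade-off described in this remark can be sketched generically. The code below implements one common Goldenshluger-Lepski-type rule on a finite bandwidth grid; it is not the paper's exact definition of $B(\ell,h)$ and $M(\ell,h)$, and the majorant values are hypothetical inputs supplied by the caller.

```python
import math


def l2_distance(f, g):
    """Root-mean-square distance between two functions sampled on a common grid."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(f, g)) / len(f))


def select_bandwidth(estimates, majorant):
    """Generic Goldenshluger-Lepski-type rule on a finite bandwidth grid.

    estimates: dict mapping bandwidth h -> estimator values on a grid.
    majorant:  dict mapping bandwidth h -> penalized standard-deviation bound.
    """
    def bias_proxy(h):
        # Compare the estimator at h with every estimator using a smaller bandwidth,
        # discounting what the majorant already explains.
        excess = [l2_distance(estimates[h], estimates[hp]) - majorant[hp]
                  for hp in estimates if hp < h]
        return max([0.0] + excess)

    # Empirical trade-off between the bias proxy and the majorant.
    return min(estimates, key=lambda h: bias_proxy(h) + majorant[h])


# Toy inputs (hypothetical): three bandwidths, estimators sampled at two points.
toy_estimates = {0.1: [0.0, 1.0], 0.5: [0.1, 0.9], 1.0: [0.5, 0.5]}
toy_majorant = {0.1: 0.6, 0.5: 0.2, 1.0: 0.1}
h_selected = select_bandwidth(toy_estimates, toy_majorant)
```

In the toy example the smallest bandwidth is penalized by its large majorant and the largest one by its bias proxy, so the rule settles on the intermediate value.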

4 Main results

4.1 Result for the infinite chaos model

Our first result studies the risk of our family of estimators over the class $\mathfrak{A}(s,\Lambda,\gamma,M)$ . In this class, the function $m$ is decomposed into an infinite sum of chaos:

[TABLE]

Theorem 1

Set $p\geq 2$ , $\Lambda_{*}>0$ and ${s^{*}}>0$ . Set $s\in(0,{s^{*}})^{\mathbb{N}}$ , $\Lambda\in(\Lambda_{*},+\infty)^{\mathbb{N}}$ and $M>0$ and let $\gamma$ be such that $2\gamma>\max(2s^{*}+\log(p-1),\log(3))$ . Assume that $\mu_{p}=\mathbf{E}|\varepsilon|^{p}<+\infty$ . Define

[TABLE]

where $[\cdot]$ denotes the integer part and $\boldsymbol{h}_{n}=\big{(}h_{n}^{(\ell)}(s,\Lambda)\big{)}_{\ell=1,\dotsc,L_{n}}\in(0,1)^{L_{n}}$ where for any $\ell=1,\dotsc,L_{n}$ :

[TABLE]

There exists a positive constant $\kappa$ depending on ${s^{*}}$ , $\Lambda_{*}$ , $\mu_{2}$ , $\mu_{p}$ , $\gamma$ and $M$ such that

[TABLE]

Let us briefly comment on this result. Assume first that the parameters $s_{\ell}$ are constant and denote by $s_{0}$ their common value. In this case we obtain

[TABLE]

This implies that, for $n$ large enough, $R_{p}\big{(}\hat{m}_{\boldsymbol{h}_{n},L_{n}},\mathfrak{A}(s,\Lambda,\gamma,M)\big{)}$ is upper bounded, up to a multiplicative constant, by

[TABLE]

Remark that such a rate of convergence lies in between polylogarithmic rates of convergence and polynomial ones. This result can be compared with those obtained by Cadre and Truquet (2015). Recall that, in that paper, the authors study a similar model with a Poisson point process covariate. The rates $v_{n}$ obtained there are slightly better than ours since they obtain, for some $\alpha\in(0,1)$,

[TABLE]

whereas, in our case,

[TABLE]

However, remark that their study is limited to $p=2$ and $s_{0}=1$ and that, moreover, they assume that the response $Y$ is a bounded variable. In our situation neither $m(W)$ nor $\varepsilon$ is assumed to be bounded.

4.2 Results for finite chaos model

In the following three results, we assume that there exists a known integer $L\in\mathbb{N}$ such that

[TABLE]

Our second result proves that the minimax rate of convergence over the class $\mathfrak{M}_{s,\Lambda,L,M}$ is of the same order as:

[TABLE]

Theorem 2

Set $p\geq 2$ , $s\in(0,{s^{*}})^{L}$ , $\Lambda\in(\Lambda_{*},+\infty)^{L}$ , $M>0$ and assume that $\mu_{p}=(\mathbf{E}|\varepsilon|^{p})^{1/p}<+\infty$ . Define $\tilde{\boldsymbol{h}}_{n}=\big{(}\tilde{h}_{n}^{(\ell)}(s,\Lambda)\big{)}_{\ell=1,\dotsc,L}\in(0,1)^{L}$ where:

[TABLE]

There exist two positive constants $\kappa_{*}$ and $\kappa^{*}$ that depend only on $L$ , $M$ , $\Lambda_{*}$ , $\mu_{2}$ , $\mu_{p}$ and ${s^{*}}$ such that

[TABLE]

and

[TABLE]

Note that this result also ensures that the family of estimators constructed in Section 3.2 is well-adapted to our problem. The next result states an oracle-type inequality satisfied by our data-driven estimator $\hat{m}_{L}$ .

Theorem 3

Set $p\geq 2$ and assume that for any $\ell=1,\ldots,L$ , $\|f_{\ell}\|_{\Delta_{\ell}}^{2}\leq M^{2}\ell!$ and that for any $q\geq 1$ the moment $\mu_{q}=\left(\mathbf{E}|\varepsilon|^{q}\right)^{1/q}$ exists. Then:

[TABLE]

where $\Upsilon_{1}$ and $\Upsilon_{2}$ are two positive constants that depend on $L$ , $M$ , ${s_{*}}$ and ${s^{*}}$ .

Using Theorems 2 and 3 we can derive our last result: the data-driven estimation procedure is adaptive, up to a logarithmic factor, over the scale $\{\mathfrak{M}_{s,\Lambda,L,M}:s\in({s_{*}},s^{*})^{L},\Lambda\in(0,+\infty)^{L},M>0\}$ .

Theorem 4

Set $p\geq 2$ and assume that for any $q\geq 1$ the moment $\mu_{q}=\left(\mathbf{E}|\varepsilon|^{q}\right)^{1/q}$ exists. For any $s\in({s_{*}},s^{*})^{L}$ , any $\Lambda\in(0,+\infty)^{L}$ , any $M>0$ , we have

[TABLE]

where $\kappa^{**}$ is a positive constant that depends on $\Lambda$ , $L$ , $M$ , ${s_{*}}$ and ${s^{*}}$ and

[TABLE]

Remark 4

While the selection procedure is defined using the $\mathbf{L}^{2}$ -norms, the procedure is adaptive for any $p\geq 2$ . This phenomenon is due to the hypercontractivity property, see (79). Note that in Theorem 3, the quantity

[TABLE]

is a tight upper bound of the bias term of the estimator $\hat{f}_{h}^{\ell}$ .

This result ensures that our data-driven procedure is adaptive, up to a logarithmic factor, over a large scale of mapping classes.

The presence of the extra logarithmic factor in the adaptive rate of convergence is not usual for prediction risks. This term is introduced in the definition of $M(\ell,h)$ to control the deviation of the estimator (16) based on the variables $I_{\ell}\left(K_{h}^{(\ell)}(t,\cdot)\right)(W_{i})$ . See (133) for more details.

5 Proofs

We first introduce some notation and lemmas. Define, for $i\in\{1,\ldots,n\}$, $\ell\in\{1,\ldots,L\}$ and $h\in(0,1)$,

[TABLE]

and

[TABLE]

Lemma 1

We have, for any $\ell\in\{1,\ldots,L\}$ , $h\in\mathbf{H}_{\ell}$ and $r\geq 1$

[TABLE]

Moreover for $\varphi>0$ and $q\geq 1$

[TABLE]

Lemma 2

Let $\ell\in\{1,\ldots,L\}$ and $h\in\mathbf{H}_{\ell}$. Let $\chi,\chi_{1},\ldots,\chi_{n}$ be i.i.d. random variables such that, for any $r\geq 1$,

[TABLE]

Define

[TABLE]

and

[TABLE]

Then there exists a positive constant $C>0$ such that

[TABLE]

The following lemma recalls Bousquet’s version of Talagrand’s concentration inequality (see Bousquet, 2002; Boucheron et al., 2013).

Lemma 3 (Bousquet’s inequality)

Let $X_{1},\ldots,X_{n}$ be independent identically distributed random variables. Let $\mathcal{S}$ be a countable set of functions and define $Z=\sup_{s\in\mathcal{S}}\sum_{i=1}^{n}s(X_{i})$ . Assume that, for all $i=1,\ldots,n$ and $s\in\mathcal{S}$ , we have $\mathbf{E}s(X_{i})=0$ and $s(X_{i})\leq 1$ almost surely. Assume also that $v=2\mathbf{E}Z+\sup_{s\in\mathcal{S}}\sum_{i=1}^{n}\mathbf{E}(s(X_{i}))^{2}<\infty$ . Then we have for all $t>0$

[TABLE]
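As a rough numerical illustration of Lemma 3, the sketch below assumes that the displayed bound has the usual form $\mathbf{P}\big(Z\geq\mathbf{E}Z+\sqrt{2vt}+t/3\big)\leq e^{-t}$ given in Boucheron et al. (2013). The class $\mathcal{S}=\{x\mapsto\cos(2\pi kx):k=1,\dotsc,5\}$, the uniform design, and the function names are hypothetical choices made only for this example; $\mathbf{E}Z$ is replaced by a Monte Carlo average, so this is a sanity check and not a proof.

```python
import math
import random

def simulate_sup(n, freqs, rng):
    """Draw X_1..X_n ~ U[0,1] and return Z = max_k sum_i cos(2*pi*k*X_i)."""
    xs = [rng.random() for _ in range(n)]
    return max(sum(math.cos(2 * math.pi * k * x) for x in xs) for k in freqs)

def bousquet_check(n=50, freqs=(1, 2, 3, 4, 5), t=1.0, reps=2000, seed=0):
    """Compare the empirical tail of Z with the bound exp(-t).

    Each s_k(x) = cos(2*pi*k*x) is centered under U[0,1] and bounded by 1,
    as required in Lemma 3 (illustrative choice, not taken from the paper).
    """
    rng = random.Random(seed)
    zs = [simulate_sup(n, freqs, rng) for _ in range(reps)]
    mean_z = sum(zs) / reps          # Monte Carlo proxy for E Z
    # E cos^2(2*pi*k*X) = 1/2 for every integer k >= 1, so the sup of the
    # summed second moments equals n/2.
    v = 2 * mean_z + n * 0.5
    threshold = mean_z + math.sqrt(2 * v * t) + t / 3
    freq = sum(z >= threshold for z in zs) / reps
    return freq, math.exp(-t)

if __name__ == "__main__":
    freq, bound = bousquet_check()
    print(f"empirical tail frequency: {freq:.4f}, bound exp(-t): {bound:.4f}")
```

In this toy setting the empirical frequency is expected to be far below $e^{-t}$, which reflects the fact that the inequality is a worst-case bound over all admissible classes.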

5.1 Proof of Theorem 1

Set $p\geq 2$ , $s\in(0,{s^{*}})^{\mathbb{N}}$ , $\Lambda\in(\Lambda_{*},+\infty)^{\mathbb{N}}$ , $\gamma>0$ , and $M>0$ . For the sake of readability we denote $\boldsymbol{h}=\boldsymbol{h}_{n}$ , $h_{\ell}=h^{(\ell)}_{n}(s,\Lambda)$ , $K_{\ell}=K^{(\ell)}_{h_{\ell}}$ , $\xi_{\ell}(t)=\xi_{\ell}(t,h_{\ell})$ and $\hat{f}_{\ell}=\hat{f}^{(\ell)}_{h_{\ell}}$ .

Decomposition of the risk.

Using the triangle inequality we have:

[TABLE]

The last line follows from the hypercontractivity property. Now, using Itô’s isometry, we obtain:

[TABLE]

where the bias term $B(\ell)$ and the stochastic term $V(\ell)$ are defined by:

[TABLE]
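The two tools used in this decomposition can be checked exactly on a simple element of the $\ell$-th chaos. The sketch below relies on two standard facts that are not restated in this section: for $G\sim\mathcal{N}(0,1)$, the Hermite polynomial $He_{\ell}(G)$ belongs to the $\ell$-th Wiener chaos, and hypercontractivity gives $\|F\|_{p}\leq(p-1)^{\ell/2}\|F\|_{2}$ for any $F$ in that chaos. The helper names are hypothetical, and the constant $(p-1)^{\ell/2}$ is assumed to play the role of $\mathfrak{c}_{\ell}(p)$.

```python
import math

def hermite(n):
    """Coefficients (lowest degree first) of the probabilists' Hermite polynomial He_n."""
    prev, cur = [1], [0, 1]
    if n == 0:
        return prev
    for k in range(1, n):
        # Recurrence: He_{k+1}(x) = x * He_k(x) - k * He_{k-1}(x)
        nxt = [0] + cur
        for i, c in enumerate(prev):
            nxt[i] -= k * c
        prev, cur = cur, nxt
    return cur

def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def gaussian_moment(k):
    """E[G^k] for G ~ N(0,1): 0 for odd k, (k-1)!! for even k."""
    if k % 2:
        return 0
    return math.prod(range(1, k, 2))

def expect(poly):
    """Exact expectation E[poly(G)] for G ~ N(0,1)."""
    return sum(c * gaussian_moment(k) for k, c in enumerate(poly))

def chaos_norms(ell):
    """Return (||He_ell(G)||_2, ||He_ell(G)||_4), computed from exact moments."""
    he = hermite(ell)
    sq = poly_mul(he, he)
    return math.sqrt(expect(sq)), expect(poly_mul(sq, sq)) ** 0.25

if __name__ == "__main__":
    for ell in range(1, 7):
        l2, l4 = chaos_norms(ell)
        print(ell, round(l2 ** 2), l4, 3 ** (ell / 2) * l2)
```

The squared $\mathbf{L}^{2}$-norm equals $\ell!$, in line with the isometry and with the normalisation $\|f_{\ell}\|_{\Delta_{\ell}}^{2}\leq M^{2}\ell!$, while the $\mathbf{L}^{4}$-norm stays below $3^{\ell/2}\sqrt{\ell!}$.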

Study of the constant term.

Remark that

[TABLE]

where the last line is obtained using Rosenthal’s inequality (Johnson et al., 1985). Here $C_{1,p}$ and $C_{2,p}$ denote two positive constants while $\sigma_{Y}(p)=(\mathbf{E}|Y-\mathbf{E}Y|^{p})^{1/p}$ . Moreover since

[TABLE]

the hypercontractivity property implies that, for any $p\geq 2$,

[TABLE]

The last line follows from Itô’s isometry. Now, using that $m\in\mathfrak{A}(s,\Lambda,\gamma,M)$ and applying the Cauchy-Schwarz inequality, we obtain:

[TABLE]

where, using the definition of $\mathfrak{c}_{\ell}(p)$ and the fact that $2\gamma>\log(p-1)$

[TABLE]

We finally obtain

[TABLE]

where $\kappa_{0}=C_{1,p}\sigma_{Y}(p)+C_{2,p}\sigma_{Y}(2)$ depends only on $M$, $\gamma$, $\mu_{2}$ and $\mu_{p}$.
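For the reader's convenience, the Rosenthal step can be written out as follows. This is a sketch under the assumption that the constant term is estimated by the empirical mean of $Y_{1},\dotsc,Y_{n}$; the variables $\xi_{i}$ are introduced only for this illustration.

```latex
% Rosenthal's inequality: \xi_1,\dots,\xi_n independent and centered, p \ge 2
\Big(\mathbf{E}\Big|\sum_{i=1}^{n}\xi_i\Big|^{p}\Big)^{1/p}
  \le C_{1,p}\Big(\sum_{i=1}^{n}\mathbf{E}|\xi_i|^{p}\Big)^{1/p}
    + C_{2,p}\Big(\sum_{i=1}^{n}\mathbf{E}\xi_i^{2}\Big)^{1/2}.
% Applied to \xi_i = (Y_i-\mathbf{E}Y)/n (assumed form of the constant-term estimator):
\Big(\mathbf{E}\Big|\frac{1}{n}\sum_{i=1}^{n}(Y_i-\mathbf{E}Y)\Big|^{p}\Big)^{1/p}
  \le C_{1,p}\,\sigma_Y(p)\,n^{-(1-1/p)} + C_{2,p}\,\sigma_Y(2)\,n^{-1/2}
  \le \big(C_{1,p}\,\sigma_Y(p)+C_{2,p}\,\sigma_Y(2)\big)\,n^{-1/2},
% since 1-1/p \ge 1/2 when p \ge 2.
```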

Study of the bias term.

Set $\ell\in\{1,\dotsc,L_{n}\}$ and note that:

[TABLE]

Using Itô’s isometry we thus obtain:

[TABLE]

To apply multivariate Taylor formula we introduce, for any $\alpha=(\alpha_{1},\dotsc,\alpha_{\ell})\in(\mathbb{N}\cup\{0\})^{\ell}$ , the notation $|\alpha|=\alpha_{1}+\dotsc+\alpha_{\ell}$ . Moreover we define $s_{\ell}=m_{\ell}+\gamma_{\ell}$ with $m_{\ell}\in\mathbb{N}\cup\{0\}$ and $0<\gamma_{\ell}\leq 1$ . Since $f_{\ell}\in\mathcal{H}_{\ell}(s_{\ell},\Lambda_{\ell})$ , we obtain, using classical arguments (see Bertin et al., 2019), that:

[TABLE]

where

[TABLE]

and $\wp(\cdot)$ denotes the partition function of an integer. We then obtain:

[TABLE]

Since the sequence

[TABLE]

tends to [math] as $\ell$ goes to infinity, there exists a constant $C_{0}>0$ that depends only on $p$, $s^{*}$ and $\|\mathbf{k}\|_{[0,1]}$ such that:

[TABLE]

Study of the stochastic term $V(\ell)$

Set $\ell\in\{1,\dotsc,L_{n}\}$ . We have:

[TABLE]

where

[TABLE]

and

[TABLE]

Then we have

[TABLE]

Since $W$ and $\varepsilon$ are independent, Lemma 1 implies:

[TABLE]

Then we have,

[TABLE]

where

[TABLE]

Now we have:

[TABLE]

Using the Cauchy-Schwarz inequality, we obtain:

[TABLE]

Now, using Lemma 1 and that $m\in\mathfrak{A}(s,\Lambda,\gamma,M)$ , we obtain:

[TABLE]

This implies that

[TABLE]

where

[TABLE]

Note that $a_{2}$ is finite since $2\gamma>\log(3)$. Combining (56) and (61), we obtain, denoting $C_{1}=a_{2}+\mu_{2}$, that

[TABLE]

General bound on the risk

Combining (49), (56) and (61), the following bound can be easily obtained:

[TABLE]

Study of the residual term

Finally, using that $m\in\mathfrak{A}(s,\Lambda,\gamma,M)$, we have

[TABLE]

where

[TABLE]

Note that $\psi_{\gamma}(L)$ tends to [math] as $L$ tends to infinity.

Upper bound

Using the definitions of $L_{n}$ and $h_{\ell}$ we have

[TABLE]

Now note that, since $\gamma_{p}>s^{*}$ and $C_{2}>1$, we have

[TABLE]

Then there exists a positive constant $a_{4}$ that depends on $\Lambda_{*}$ and $s^{*}$ such that

[TABLE]

This implies that

[TABLE]

where $\rho_{n}$ is a negligible remainder term.

5.2 Proof of Theorem 2

This proof is decomposed into two parts. We first prove the upper bound (28) and then the lower bound (29).

5.2.1 Proof of the upper bound

For the sake of readability we denote $\tilde{\boldsymbol{h}}=\tilde{\boldsymbol{h}}_{n}$ and $h_{\ell}=\tilde{h}^{(\ell)}_{n}(s,\Lambda)$. Using the same notation as in the proof of Theorem 1, we have

[TABLE]

Note that in this case there is no residual term. Similarly to the proof of Theorem 1, and using the same notations, we have

[TABLE]

with $\kappa_{0}$ depending on $M$ , $\mu_{2}$ and $\mu_{p}$ . The bias term satisfies

[TABLE]

and the stochastic term satisfies

[TABLE]

where $C_{4}$ depends on $L$ , $M$ , and $\mu_{2}$ . Now by substituting $h_{\ell}$ by its value, we obtain

[TABLE]

where $\kappa^{*}$ is a positive constant that depends only on $L$, $M$, $\mu_{2}$, $\mu_{p}$, $\Lambda_{*}$ and ${s^{*}}$. This ends the proof of the upper bound. Now, let us prove the lower bound.

5.2.2 Proof of the lower bound

Note that for any $p\geq 2$ and any estimator $\tilde{m}_{n}$ of $m$ we have $R_{p}(\tilde{m}_{n},m)\geq R_{2}(\tilde{m}_{n},m)$ . This implies that, to prove the lower bound, it is sufficient to consider the case $p=2$ .
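The comparison between the two risks is an instance of the monotonicity of $\mathbf{L}^{p}$-norms. As a short sketch, assuming that $R_{p}(\tilde{m}_{n},m)$ is the $\mathbf{L}^{p}$-norm of the prediction error $X=\tilde{m}_{n}(W)-m(W)$ (the symbol $X$ is introduced only here):

```latex
% p \ge 2; Jensen's inequality with the convex map x \mapsto x^{p/2} on [0,\infty)
\big(\mathbf{E}|X|^{2}\big)^{p/2} \le \mathbf{E}\big(|X|^{2}\big)^{p/2} = \mathbf{E}|X|^{p}
\quad\Longrightarrow\quad
\big(\mathbf{E}|X|^{2}\big)^{1/2} \le \big(\mathbf{E}|X|^{p}\big)^{1/p}.
```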

Method.

We fix $s\in(0,{s^{*}})^{L}$ , $\Lambda\in(0,+\infty)^{L}$ and $M>0$ . To prove the lower bound over the space $\mathfrak{M}(s,\Lambda,L,M)$ , we define

[TABLE]

and we follow the strategy developed by Cadre et al. (2017). In particular, Lemma 6.1 of that paper implies (using Itô’s isometry combined with Theorem 2.5 in Tsybakov (2009)) that the problem boils down to finding a finite family of functions $\left\{g_{\omega}\right\}_{\omega\in\mathcal{W}}$ with cardinality $|\mathcal{W}|\geq 2$ that satisfies the following assumptions:

(i) the null function $0\in\{g_{\omega}\}_{\omega\in\mathcal{W}}$.

(ii) for any $\omega\in\mathcal{W}$, the function $g_{\omega}\in\mathcal{H}_{\ell}(s_{\ell},\Lambda_{\ell},M)$ and $\|g_{\omega}\|_{\Delta_{\ell}}^{2}\leq\ell!M^{2}$.

(iii) there exists $\kappa_{*}>0$ such that for $\omega\neq\omega^{\prime}$, $\|g_{\omega}-g_{\omega^{\prime}}\|_{\Delta_{\ell}}\geq 2\kappa_{*}\phi_{n}(s,\Lambda)$.

(iv) there exists $0<\alpha<1/8$ such that

[TABLE]

Under these assumptions, the lower bound (29) holds for $p=2$.

Notation.

Here, we construct a finite set of functions used in the rest of the proof. We consider the function $\psi:\mathbb{R}\to\mathbb{R}$ defined, for any $u\in\mathbb{R}$ by

[TABLE]

This function is in $L_{2}$ and we denote $\|\psi\|^{2}=\int_{\mathbb{R}}\psi^{2}(x)dx<2$ . Note that, since the function $\psi$ is infinitely differentiable with compact support, we have:

[TABLE]

Now we consider $0<\alpha<1/8$ ,

[TABLE]

and

[TABLE]

We consider the bandwidth

[TABLE]

and we set $R=1/(2h)$ . We assume, without loss of generality, that $R$ is an integer and $nh^{\ell}\geq 1$ . Let $\mathcal{R}=\{0,\dotsc,R-1\}^{\ell}$ and define, for any $r=(r_{1},\dotsc,r_{\ell})\in\mathcal{R}$ , the function $\phi_{r}:\Delta_{\ell}\to\mathbb{R}$ by:

[TABLE]

where $x_{i}^{(r)}=(2r_{i}+1)h$ . Finally, for any $w:\mathcal{R}\to\{0,1\}$ we define:

[TABLE]

where

[TABLE]

Proof of (ii).

Set $w:\mathcal{R}\to\{0,1\}$ . The following property can be readily verified:

[TABLE]

This implies that

[TABLE]

Moreover note that, for any $y\in\Delta_{\ell}$ and $\alpha=(\alpha_{1},\dotsc,\alpha_{\ell})$ such that $|\alpha|=\lfloor s_{\ell}\rfloor$ , we have:

[TABLE]

which implies that, for any $z\in\Delta_{\ell}$ we have

[TABLE]

This also implies, since the function $\psi$ vanishes outside $(-1,1)$ , that

[TABLE]

Using (70), we deduce that $g_{w}$ belongs to $\mathcal{H}_{\ell}(s_{\ell},\Lambda_{\ell})$ . Combining with (67), (ii) is fulfilled.

Proof of (i) and (iii).

Using Lemma 2.9 of Tsybakov (2009), there exists a set $\mathcal{W}\subset\{w:\mathcal{R}\to\{0,1\}\}$ such that the null function belongs to $\mathcal{W}$ , $\log_{2}|\mathcal{W}|\geq R^{\ell}/8$ and

[TABLE]

Let $w,w^{\prime}\in\mathcal{W}$ be such that $w\neq w^{\prime}$. We have

[TABLE]

Then Assumptions (i) and (iii) are fulfilled.
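The combinatorial ingredient of this step is a family of binary labellings that contains the null labelling and whose elements are pairwise far apart in Hamming distance. The sketch below builds such a family by a random greedy search for a small number of cells $m$, which plays the role of $R^{\ell}$. This is only an illustration: the search procedure, the function names and the value $m=24$ are assumptions made here, whereas Lemma 2.9 of Tsybakov (2009) establishes existence by a different argument.

```python
import random

def hamming(a, b):
    """Hamming distance between two binary words stored as integers."""
    return bin(a ^ b).count("1")

def separated_family(m, seed=0, max_draws=100_000):
    """Greedily collect m-bit words with pairwise Hamming distance >= m/8.

    The family always contains the zero word and the search stops once it
    holds 2**(m//8) + 1 words (or after max_draws random candidates).
    """
    rng = random.Random(seed)
    target = 2 ** (m // 8) + 1
    min_dist = m / 8
    family = [0]                      # the null labelling comes first
    draws = 0
    while len(family) < target and draws < max_draws:
        cand = rng.getrandbits(m)
        draws += 1
        # Keep the candidate only if it is far from every word kept so far.
        if all(hamming(cand, w) >= min_dist for w in family):
            family.append(cand)
    return family

if __name__ == "__main__":
    fam = separated_family(24)
    print(len(fam), [format(w, "024b") for w in fam[:3]])
```

Each word $w$ in the family corresponds to one function $g_{w}$, and the Hamming separation between two words translates into the separation required in (iii).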

Proof of (iv).

Using (67) and the definition of $c_{1}$, we deduce that

[TABLE]

Then Assumption (iv) is fulfilled.

5.3 Proof of Theorem 3

Using (40) and (49), we have

[TABLE]

Let $\ell\in\{1,\ldots,L\}$ . Let $h\in\mathbf{H}_{\ell}$ . We have

[TABLE]

Then we have

[TABLE]

Note that we have

[TABLE]

where we use the properties of $V_{1}(\ell)$ and $V_{2}(\ell)$ stated in page 61. In the following, we will demonstrate that

[TABLE]

Combining (80) with (81) and (82), we obtain that

[TABLE]

Theorem 3 is then a direct consequence of the above inequality and (79).

Proof of (82)

Now let us control $B(\ell,h)$ for $h\in\mathbf{H}_{\ell}$ . We have

[TABLE]

Then

[TABLE]

where

[TABLE]

We have

[TABLE]

where

[TABLE]

and

[TABLE]

Using these notations we have:

[TABLE]

where

[TABLE]

and

[TABLE]

Now note that

[TABLE]

and for $k\in\{0,\ldots,L\}$

[TABLE]

Using Lemma 2 with $\mathbf{U}=\mathbf{U}_{\ell}$ (respectively $\mathbf{U}=k!\mathbf{U}_{k,\ell}$), $T=T(\ell)$ (respectively $T=T(k,\ell)$) and $\chi_{i}=\varepsilon_{i}$ (respectively $\chi_{i}=\Theta_{i,k}$), we deduce that, for all $h^{\prime}\leq h$,

[TABLE]

This implies that

[TABLE]

Now (83) and (84) entail (82).

5.4 Proof of Theorem 4

Let $s=(s_{1},\ldots,s_{L})\in[{s_{*}},{s^{*}})^{L}$ , $\Lambda\in(0,+\infty)^{L}$ , $M>0$ and $f\in\mathfrak{M}_{s,\Lambda,L,M}$ . Define for $\ell\in\{1,\ldots,L\}$

[TABLE]

For $n$ large enough, we have $h_{\ell}=e^{-k_{\ell}}\in\mathbf{H}_{\ell}$ . Using (5.1) and (23), Theorem 3 implies that

[TABLE]

where $C$ is a constant that changes from line to line and depends on $s$ , $\Lambda$ and $M$ . Since $C$ does not depend on $m\in\mathfrak{M}(s,\Lambda,L,M)$ , this ends the proof.

5.5 Proof of Lemma 1

We have

[TABLE]

Moreover we have

[TABLE]

5.6 Proof of Lemma 2

In this proof, $C$ denotes a positive constant whose value may change from line to line. Since $h$ is fixed, we simplify the notation and write $\xi_{i,\ell}(t)$ for $\xi_{i,\ell}(t,h)$ and $\xi_{\ell}(t)$ for $\xi_{\ell}(t,h)$. Now, we have for $k\geq 1$:

[TABLE]

where

[TABLE]

where for any $\ell=1,\ldots,L$ :

[TABLE]

and

[TABLE]

Note that both $\alpha$ and $\beta$ are positive numbers.

5.6.1 Control of $\eta_{1}$

We have

[TABLE]

Note that, using the Cauchy-Schwarz and Markov inequalities, we have

[TABLE]

Moreover since $h\in\mathbf{H}_{\ell}$ , using Lemma 1 with $q=\frac{8{s_{*}}+6\ell}{{s_{*}}}$ , we have

[TABLE]

Now using (98), (103) and (106), we finally obtain

[TABLE]

5.6.2 Control of $\eta_{2}$

Define

[TABLE]

We have

[TABLE]

where using Lemma 1 with $q=4/\beta$

[TABLE]

and following (103)

[TABLE]

5.6.3 Control of $\eta_{3}$

Define

[TABLE]

We have

[TABLE]

where

[TABLE]

and

[TABLE]

Note that using similar arguments as above with $q=(8{s_{*}}+4\ell)/{s_{*}}$ ,

[TABLE]

and following (106) $B\leq Cn^{-1}$ .

5.6.4 Control of $\bar{\eta}_{0}$

We have to bound

[TABLE]

Note that, using duality arguments, there exists a countable set $\mathcal{S}$ of functions $s\in\mathbb{L}^{2}(\Delta_{\ell})$ such that $\|s\|_{\Delta_{\ell}}\leq 1$ and

[TABLE]

where

[TABLE]

and, for $s\in\mathcal{S}$ , we have:

[TABLE]

and

[TABLE]

Note that we have both $\mathbf{E}(X_{i,s})=0$ and $\|X_{i,s}\|_{\infty}\leq 1$ . Now, let us control:

[TABLE]

Using the Cauchy-Schwarz inequality and Fubini’s theorem, we obtain:

[TABLE]

We have

[TABLE]

and

[TABLE]

Combining the previous results we have:

[TABLE]

Define

[TABLE]

We have:

[TABLE]

Define:

[TABLE]

Using Bousquet’s inequality we have:

[TABLE]

where

[TABLE]

and

[TABLE]

Since $\mathfrak{a}\mathfrak{d}-\mathfrak{b}\mathfrak{c}>0$, we have $D_{n}(u)\leq D_{n}(0)$, that is:

[TABLE]

Since $h\in\mathbf{H}_{\ell}$, for $n$ large enough that $(1+\delta/6)\leq\sqrt{\theta_{n}}$, we have $D_{n}(u)\leq h^{\ell}$. Moreover, using the change of variables $v=\sqrt{nh^{\ell}}u$, we have

[TABLE]

This implies that:

[TABLE]

Combining the results of Sections 5.6.1, 5.6.2, 5.6.3 and 5.6.4, we obtain (31).

Acknowledgements

The authors have been supported by Fondecyt projects 1171335 and 1190801, and Mathamsud projects 19-MATH-06 and 20-MATH-05.

Bibliography

Azaïs and Fort (2013) Jean-Marc Azaïs and Jean-Claude Fort. Remark on the finite-dimensional character of certain results of functional statistics. C. R. Math. Acad. Sci. Paris, 351(3-4):139–141, 2013. ISSN 1631-073X. doi: 10.1016/j.crma.2013.02.004. URL https://doi.org/10.1016/j.crma.2013.02.004.
Bertin et al. (2019) Karine Bertin, Salima El Kolei, and Nicolas Klutchnikoff. Adaptive density estimation on bounded domains. Annales de l’Institut Henri Poincaré, Probabilités et Statistiques, 55(4):1916–1947, 2019. doi: 10.1214/18-AIHP938. URL https://projecteuclid.org/euclid.aihp/1573203619.
Biau et al. (2010) Gérard Biau, Frédéric Cérou, and Arnaud Guyader. Rates of convergence of the functional $k$-nearest neighbor estimate. IEEE Trans. Inform. Theory, 56(4):2034–2040, 2010. ISSN 0018-9448. doi: 10.1109/TIT.2010.2040857. URL https://doi.org/10.1109/TIT.2010.2040857.
Boucheron et al. (2013) Stéphane Boucheron, Gábor Lugosi, and Pascal Massart. Concentration inequalities. Oxford University Press, Oxford, 2013. ISBN 978-0-19-953525-5. doi: 10.1093/acprof:oso/9780199535255.001.0001. URL https://doi.org/10.1093/acprof:oso/9780199535255.001.0001. A nonasymptotic theory of independence, with a foreword by Michel Ledoux.
Bousquet (2002) Olivier Bousquet. A Bennett concentration inequality and its application to suprema of empirical processes. C. R. Math. Acad. Sci. Paris, 334(6):495–500, 2002. ISSN 1631-073X. doi: 10.1016/S1631-073X(02)02292-6. URL https://doi.org/10.1016/S1631-073X(02)02292-6.
Cadre and Truquet (2015) Benoît Cadre and Lionel Truquet. Nonparametric regression estimation onto a Poisson point process covariate. ESAIM Probab. Stat., 19:251–267, 2015. ISSN 1292-8100. doi: 10.1051/ps/2014023. URL https://doi.org/10.1051/ps/2014023.
Cadre et al. (2017) Benoît Cadre, Nicolas Klutchnikoff, and Gaspar Massiot. Minimax regression estimation for Poisson coprocess. ESAIM Probab. Stat., 21:138–158, 2017. ISSN 1292-8100. doi: 10.1051/ps/2017004. URL https://doi.org/10.1051/ps/2017004.
Cai and Hall (2006) T. Tony Cai and Peter Hall. Prediction in functional linear regression. Ann. Statist., 34(5):2159–2179, 2006. ISSN 0090-5364. doi: 10.1214/009053606000000830. URL https://doi.org/10.1214/009053606000000830.
