Adaptive regression with Brownian path covariate
Karine Bertin, Nicolas Klutchnikoff

TL;DR
This paper introduces an adaptive estimation method for regression functions with continuous outcomes and Wiener process covariates, utilizing Wiener-Itô decomposition and data-driven selection to achieve optimal convergence rates.
Contribution
It develops a new adaptive regression estimator for functional covariates based on Wiener-Itô decomposition, with proven minimax convergence rates and a data-driven selection procedure.
Findings
Achieves minimax convergence rates for the regression function estimation.
Provides an oracle inequality leading to adaptive estimation.
Demonstrates the effectiveness of the proposed method on Wiener process covariates.
Abstract
This paper deals with estimation with functional covariates. More precisely, we aim at estimating the regression function $m$ of a continuous outcome $Y$ against a standard Wiener coprocess $W$. Following Cadre and Truquet (2015) and Cadre, Klutchnikoff, and Massiot (2017), the Wiener-Itô decomposition of $m(W)$ is used to construct a family of estimators. The minimax rate of convergence over specific smoothness classes is obtained. A data-driven selection procedure is defined following the ideas developed by Goldenshluger and Lepski (2011). An oracle-type inequality is obtained which leads to adaptive results.
Karine Bertin, CIMFAV-INGEMAT, Universidad de Valparaíso, General Cruz 222, Valparaíso, Chile
Nicolas Klutchnikoff, Univ Rennes, CNRS, IRMAR – UMR 6625, F-35000 Rennes, France
Keywords: Functional regression, Wiener-Itô chaos expansion, Oracle inequalities, Adaptive minimax rates of convergence.
AMS Subject Classification: 62G08, 62H12
1 Introduction
The problem of regression estimation is one of the most studied in statistics, and different models have been considered depending on the nature of the data. In an increasing number of applications, it seems natural to assume that the covariate takes values in a functional space. The book of Ramsay and Silverman (2005) provides an overview on the subject of functional data analysis. In this context, several authors studied linear functional regression models (see for example Müller and Stadtmüller, 2005; Cai and Hall, 2006; Crambes et al., 2009). Nonparametric functional regression models have also been investigated (see Ferraty and Vieu, 2006, and references therein). In this paper, we are interested in such a model where the covariate is a Wiener process. More precisely, let $\varepsilon$ be a real-valued random variable and $W$ be a standard Brownian motion independent of $\varepsilon$. We define
$$Y = m(W) + \varepsilon, \tag{1}$$
where $m$ is a mapping defined on the set of all continuous functions and we assume that both $Y$ and $\varepsilon$ are square integrable random variables. Our goal is to estimate the function $m$ using a dataset of independent realizations of $(W, Y)$.
Since this framework is a specific case of the more general functional regression framework, usual approaches (which mainly consist in extending classical local methods such as $k$-nearest neighbors, kernel smoothing or local polynomial smoothing) could be used. However, in our context, these methods are known to lead to slow rates of convergence over classical models (see below for detailed references). Taking advantage of the probabilistic properties of the Wiener coprocess, we aim at defining a new family of models as well as dedicated estimation procedures with faster rates of convergence (in both minimax and adaptive minimax senses). Although considering Brownian path covariates seems restrictive for practical purposes, several Brownian diffusion paths could also be considered. Although the systematic theoretical study of such models is beyond the scope of this paper (and is left to further developments), we propose some extensions of our framework as well as some examples of usual processes that can be considered, such as geometric Brownian motions or Ornstein-Uhlenbeck processes.
In usual functional approaches, the set of continuous functions is endowed with a metric (see for example Ferraty and Vieu, 2006; Ferraty et al., 2007; Biau et al., 2010), which allows one to extend several nonparametric estimators. For example, a simple version of the Nadaraya-Watson estimator is given, for any function and any bandwidth , by:
[TABLE]
where stands for the indicator function. The properties of these estimators are related to the behavior of a quantity known as the small ball probability defined for and by . Pointwise risks of such methods can be generally bounded, up to a positive factor by
[TABLE]
where denotes the smoothness of the mapping measured in a Hölder sense. For example, if it is assumed that there exists such that for any . Under additional assumptions similar results can be obtained for integrated risks.
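To make the role of the small ball probability concrete, the following minimal sketch implements an indicator-kernel Nadaraya-Watson estimator on simulated Brownian paths. It rests on several assumptions that are not taken from the paper: the metric is the sup-norm on a discretized grid, the regression mapping `m(w) = sup_t |w(t)|` is a hypothetical choice made only for illustration, and the helper names `simulate_brownian_paths` and `nadaraya_watson_indicator` are invented here.

```python
import numpy as np

def simulate_brownian_paths(n_paths, n_steps, rng):
    """Simulate standard Brownian paths on [0, 1] on a regular grid, with W_0 = 0."""
    dt = 1.0 / n_steps
    increments = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
    paths = np.cumsum(increments, axis=1)
    return np.hstack([np.zeros((n_paths, 1)), paths])

def nadaraya_watson_indicator(w_query, paths, responses, bandwidth):
    """Average the responses whose path lies in the sup-norm ball around w_query.

    Returns the local average and the number of paths that fall in the ball.
    """
    distances = np.max(np.abs(paths - w_query), axis=1)
    in_ball = distances <= bandwidth
    if not in_ball.any():
        return np.nan, 0
    return float(responses[in_ball].mean()), int(in_ball.sum())

rng = np.random.default_rng(0)
paths = simulate_brownian_paths(n_paths=2000, n_steps=200, rng=rng)
# Hypothetical regression mapping m(w) = sup_t |w(t)|, plus Gaussian noise.
responses = np.max(np.abs(paths), axis=1) + 0.1 * rng.normal(size=paths.shape[0])
w_query = np.zeros(paths.shape[1])

for h in (2.0, 1.0, 0.5):
    estimate, count = nadaraya_watson_indicator(w_query, paths, responses, h)
    print(f"h={h}: {count} paths in the ball, estimate={estimate}")
```

The number of paths falling in the ball drops quickly as the bandwidth shrinks, which is the practical face of the small ball probability: local averaging around a given path has very few neighbours to work with.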
The classical assumption corresponds roughly to the situation where the covariate lies in some space of finite dimension (see Azaïs and Fort, 2013). This framework corresponds to the usual nonparametric case. The minimax rates of convergence are then given by (see Tsybakov, 2009). However if lies in a functional space, the behavior of is quite different. In our context, where is a standard Wiener process, it is well-known (see Li and Shao, 2001) that
[TABLE]
which leads to slower rates of convergence of the form assuming a Hölder condition on . We refer the reader to Chagny and Roche (2016) for recent results with different behavior of .
In practical situations, since is unknown, finding adaptive procedures to select the smoothing parameter is of prime interest. To the best of our knowledge, few papers deal with this problem. Adaptive procedures based on cross validation have been used in Rachdi and Vieu (2007). Chagny and Roche (2016) also propose an adaptation of the method developed by Goldenshluger and Lepski (see Goldenshluger and Lepski, 2011) using an empirical version of the quantity . Lower bounds have been investigated by Mas (2012). In all these papers, the pointwise risk is studied in terms of and theoretical properties are obtained assuming a -Hölder condition on with respect to the metric with smoothness .
In this paper we follow a different strategy. Taking advantage of probabilistic properties of the Wiener process, similarly to the methodology developed by Cadre and Truquet (2015) and Cadre et al. (2017), we consider the Wiener-Itô chaotic decomposition of . Indeed, every random variable that belongs to can be decomposed as a sum of multiple stochastic integrals (see Di Nunno et al., 2009, for more details). There exists a unique sequence of functions such that
[TABLE]
where belongs to , the set of symmetric and square integrable real-valued functions defined on and
[TABLE]
We recall that is symmetric on if for any and any permutation of , . Note that the symmetry implies that the functions are isotropic. The iterated integral is called a chaos of order .
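As an illustration of what a chaos of order 2 looks like, the sketch below checks numerically the classical identity $\int_0^1 W_t\,dW_t = (W_1^2 - 1)/2$, which follows from Itô's formula. The time interval $[0,1]$, the grid size and the sample size are assumptions made for this example, and the iterated integral is written out explicitly so that no particular normalization of the multiple integrals is assumed.

```python
import numpy as np

rng = np.random.default_rng(1)
n_paths, n_steps = 5000, 400
dt = 1.0 / n_steps

# Brownian increments and the path evaluated at the left endpoint of each step.
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
W = np.cumsum(dW, axis=1)
W_left = np.hstack([np.zeros((n_paths, 1)), W[:, :-1]])

# Left-point (Ito) Riemann sum approximating the iterated integral int_0^1 W_t dW_t.
iterated_integral = np.sum(W_left * dW, axis=1)
# Closed form given by Ito's formula.
closed_form = 0.5 * (W[:, -1] ** 2 - 1.0)

mean_abs_gap = float(np.mean(np.abs(iterated_integral - closed_form)))
empirical_mean = float(np.mean(iterated_integral))
empirical_variance = float(np.var(iterated_integral))
cross_moment = float(np.mean(iterated_integral * W[:, -1]))

print("mean |Riemann sum - closed form|:", mean_abs_gap)
print("mean (theory 0):", empirical_mean)
print("variance (theory 1/2):", empirical_variance)
print("cross moment with W_1 (theory 0):", cross_moment)
```

The second-order term is centered and uncorrelated with the first-order term $W_1$, which is the orthogonality between chaoses of different orders that the decomposition relies on.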
Our approach consists in defining kernel-type estimators of using Itô’s isometry, see (14). Then, based on (2), we propose the following estimator of
[TABLE]
with . To study these estimators, we assume that belongs to a specific class of mappings that satisfy
[TABLE]
for some and , where is the classical norm on , and that the functions defined by (2) are Hölderian. Such classes are quite natural in our context and are connected with the usual Meyer-Watanabe test function space (see Section 2.1 for more details).
In this case, we find rates of convergence for the prediction error in norm. Contrary to the classical functional framework, where logarithmic rates are derived, the rates we obtain are intermediate between logarithmic and polynomial rates.
If we assume moreover that the summation in (2) stops at a known index , we prove that the estimators achieve optimal rates of convergence. We derive minimax rates of convergence which are polynomial in with an exponent that depends on the smoothness of the functions . A data-driven procedure, based on the method developed by Goldenshluger and Lepski (2011), is then defined to tune the bandwidths used in the estimation of the functions . The resulting estimator of satisfies an oracle-type inequality that allows us to derive adaptive results.
The paper is organized as follows. Section 2 presents the model and the studied problem. Section 3 describes the construction of the estimators. Section 4 gives the main results and Section 5 is dedicated to the proofs.
2 Statistical framework
2.1 Model
Let be a standard Brownian motion and let be a centered real-valued random variable independent of . We define:
[TABLE]
where is a given mapping. We assume that as well as belong to , the set of square integrable random variables, then
[TABLE]
where for , belongs to . As mentioned in the introduction we also assume that is a regular function. Below we define precisely the functional classes used to measure the smoothness of each function .
Definition 1
Set and . The Hölder ball is the set of all functions that satisfy the following properties:
1. For any such that , the partial derivative exists where
[TABLE]
2. For any and in we have:
[TABLE]
where stands for the Euclidean norm of .
3. We have .
Equipped with these notations we can define a scale of classes for the mapping . Roughly, we impose some restrictions on the functions that appear in (2) of two kinds: a minimal smoothness, for each , is imposed and the growth of the -norm of the is controlled.
Definition 2
Set , , and . We say that belongs to the mapping class if there exist and a sequence of functions satisfying
[TABLE]
with and
[TABLE]
where .
Remark 1
Equation (5) implies that
[TABLE]
where denotes the usual Sobolev space over the Wiener space defined in Watanabe (1984). Note also that, if, for any we have for some positive constant , then (5) is fulfilled for any .
We also define subclasses of the classes assuming that the summation in (2) stops at a finite index .
Definition 3
Set . Set and . We say that belongs to the mapping class if there exist and a sequence of functions satisfying
[TABLE]
with and for any .
More precisely, for any , and , the subclasses satisfy with . Let us comment on the above definitions since the framework we consider in this paper is quite different from the usual functional framework recalled in the introduction. In our framework the “regularity” of a map is seen through the prism of the chaotic decomposition of and, thus, the functions . This is not directly linked with the regularity of the mapping between the space endowed with the topology induced by the norm and . For example, it can be easily seen that the mapping defined, for any by is not continuous (which implies that this function is not Hölderian and, thus, cannot be considered in the usual framework). However it is well known that . As a consequence, the mapping falls within our scope since belongs to for any and .
2.2 Minimax and adaptive framework
The observations consist in a -sample distributed as and independent of . Our first goal is to investigate the estimation of , based on these observations, over the classes and for where is fixed. To measure the accuracy of an arbitrary estimator of , we consider the prediction risk:
[TABLE]
where . The maximal risk of an arbitrary estimator over a given class of mappings is defined by:
[TABLE]
whereas the minimax risk is defined, taking the infimum over all possible estimators, by:
[TABLE]
An estimator whose maximal risk is asymptotically bounded, up to a multiplicative factor, by is called minimax over . Such an estimator is well-adapted to the estimation over but it can perform poorly over another class of mappings. The problem of adaptive estimation consists in finding a single estimation procedure that is simultaneously minimax over a scale of mapping classes.
Our second goal is to investigate the adaptive estimation of over the scale of classes where , and are fixed and known by the statistician. More precisely our goal is to construct a single estimation procedure such that, for any , the risk is asymptotically bounded, up to a multiplicative constant, by . One of the main tools to prove such a result is to find an oracle-type inequality that guarantees that this procedure performs almost as well as the best estimator in a rich family of estimators. Ideally, we would like to have, for any , an inequality of the following form:
[TABLE]
where is a family of estimators well-adapted to our problem in the following sense: for any , there exists such that is minimax over . However, in many situations, (8) is relaxed and we prove a weaker inequality of the type:
[TABLE]
where and are two positive constants and is an appropriate quantity to be determined that can be viewed as a tight upper bound on . Inequalities of the form (9) are called oracle-type inequalities.
Theorems 3 and 4 below correspond respectively to an oracle-type inequality and an adaptive result of these types.
2.3 Extensions to our model
In this paper, we focus on pure Brownian coprocesses. However our framework allows us to consider a larger class of covariates. Assume that we aim at estimating the regression function in the model:
[TABLE]
where is a process driven by the SDE:
[TABLE]
Here and are assumed to be known functions and we also assume that assumptions guaranteeing the existence and uniqueness of the solution of (10) are fulfilled. If for any , , then, under mild integrability conditions, we have:
[TABLE]
This implies that there exists a known invertible function such that . In general, this function can be computed by numerical integration. However, in some situations, an exact expression can be obtained using Itô’s formula. This is the case for two parametric families of processes widely used to model several practical situations. First, Ornstein–Uhlenbeck processes are driven by the following SDE:
[TABLE]
where is fixed and , and are known parameters. By Itô’s formula we have:
[TABLE]
Next, Geometric Brownian motions are used to model stock prices in the Black–Scholes model. Let and be given parameters. We assume that the process is driven by the following SDE:
[TABLE]
By Itô’s formula we have:
[TABLE]
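The reconstruction of the Brownian path from the observed process can be sketched numerically as follows. This is a minimal illustration under assumed parametrizations, since the SDEs of this section are not reproduced in the text above: the geometric Brownian motion is taken as $dX_t = \mu X_t\,dt + \sigma X_t\,dW_t$ and the Ornstein-Uhlenbeck process as $dX_t = \theta(\mu - X_t)\,dt + \sigma\,dW_t$. The function names `brownian_from_gbm` and `brownian_from_ou` and all parameter values are hypothetical.

```python
import numpy as np

def brownian_from_gbm(x, t, mu, sigma):
    """Invert X_t = x_0 exp((mu - sigma^2 / 2) t + sigma W_t) to recover W_t."""
    return (np.log(x / x[0]) - (mu - 0.5 * sigma**2) * t) / sigma

def brownian_from_ou(x, t, theta, mu, sigma):
    """Recover W_t from dX_t = theta (mu - X_t) dt + sigma dW_t.

    The drift integral is approximated by a left-point Riemann sum.
    """
    dt = np.diff(t)
    drift_integral = np.concatenate([[0.0], np.cumsum(theta * (mu - x[:-1]) * dt)])
    return (x - x[0] - drift_integral) / sigma

rng = np.random.default_rng(2)
n_steps = 1000
t = np.linspace(0.0, 1.0, n_steps + 1)
dW = rng.normal(0.0, np.sqrt(1.0 / n_steps), size=n_steps)
W = np.concatenate([[0.0], np.cumsum(dW)])

# Geometric Brownian motion, simulated exactly on the grid.
mu, sigma, x0 = 0.3, 0.8, 1.5
gbm = x0 * np.exp((mu - 0.5 * sigma**2) * t + sigma * W)

# Ornstein-Uhlenbeck process, simulated with an Euler scheme driven by the same W.
theta, level, sigma_ou = 2.0, 0.5, 0.4
ou = np.empty(n_steps + 1)
ou[0] = 1.0
for k in range(n_steps):
    ou[k + 1] = ou[k] + theta * (level - ou[k]) * (t[k + 1] - t[k]) + sigma_ou * dW[k]

W_from_gbm = brownian_from_gbm(gbm, t, mu, sigma)
W_from_ou = brownian_from_ou(ou, t, theta, level, sigma_ou)
print("max GBM reconstruction error:", np.max(np.abs(W_from_gbm - W)))
print("max OU reconstruction error:", np.max(np.abs(W_from_ou - W)))
```

Here the Ornstein-Uhlenbeck path is generated by an Euler scheme, so the left-point inversion is exact up to rounding. With observations of a continuous-time path the drift integral would carry a discretization error.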
Remark 2
In practical situations, the parameters , and in the above examples are not known. However, estimators of these parameters could be used to estimate the coprocess . This leads to new models where the covariate is observed with errors. The study of such models is beyond the scope of this paper and left to further developments.
In view of (12), equation (10) can be written as:
[TABLE]
Thus, the regression problem (10) falls into our framework. The estimation strategy consists of estimating the function based on the reconstruction of the Brownian path . This can be summarized by the formula:
[TABLE]
Remark also that, in this context, it is relevant to assume that the chaotic decomposition of is finite. Indeed, under mild assumptions on and (see Hu, 1997, for more details), if is a polynomial of the terminal value of the process , then the mapping can be written as a finite chaotic decomposition with smooth functions .
3 Estimator construction
In this section we present our estimation procedure. To do so, we first recall classical properties satisfied by Wiener chaos which allow us to construct a family of “simple” estimators that depends on a multivariate tuning parameter. Next we construct a procedure which selects, in a data-driven way, this tuning parameter using the methodology developed by Goldenshluger and Lepski (2011).
3.1 Classical properties of the chaos
Throughout this paper and in the construction of our statistical procedure, we use the following two fundamental properties satisfied by the iterated integrals.
For , Itô’s isometry (Di Nunno et al., 2009) ensures that, if and , then
[TABLE]
where denotes the Kronecker delta.
The hypercontractivity property (Nourdin and Peccati, 2012) will be used to control the concentration of our estimators. Set and . For any we have:
[TABLE]
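Itô's isometry can be checked by simulation in the simplest case of first-order integrals, where it reads $\mathbb{E}\big[\int f\,dW \int g\,dW\big] = \int f g$. The sketch below is only a Monte Carlo sanity check under assumptions made here: the time interval is $[0,1]$, and the test functions $f(t) = \sin(\pi t)$ and $g(t) = t$ are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)
n_paths, n_steps = 20000, 200
dt = 1.0 / n_steps
t_left = np.arange(n_steps) * dt
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))

# Deterministic integrands sampled at the left endpoint of each step.
f = np.sin(np.pi * t_left)
g = t_left

# Riemann-sum approximations of the Wiener integrals of f and g, one per path.
I_f = dW @ f
I_g = dW @ g

empirical_covariance = float(np.mean(I_f * I_g))
inner_product = float(np.sum(f * g) * dt)   # approximates int_0^1 t sin(pi t) dt = 1 / pi
empirical_variance_f = float(np.mean(I_f**2))
squared_norm_f = float(np.sum(f**2) * dt)   # approximates int_0^1 sin(pi t)^2 dt = 1 / 2

print("E[I(f) I(g)] ~", empirical_covariance, "vs <f, g> ~", inner_product)
print("E[I(f)^2]    ~", empirical_variance_f, "vs ||f||^2 ~", squared_norm_f)
```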
3.2 A simple family of estimators
Let be a function that satisfies the following properties: is continuous inside , for any ,
[TABLE]
Let . A natural estimator of the function is given, for , by:
[TABLE]
where is a multivariate kernel defined by:
[TABLE]
This specific construction allows one to obtain an estimator free of boundary bias (see Bertin et al., 2019, for more details).
Indeed, note that for any and under regularity assumptions on we have:
[TABLE]
where the last two lines are obtained using (2) and (14). Since is centered and independent of we have:
[TABLE]
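The mechanism behind this computation can be illustrated in the simplest setting of a first-order chaos. If $Y = \int_0^1 f_1(s)\,dW_s + \varepsilon$, then Itô's isometry gives $\mathbb{E}\big[Y \int K_h(t-s)\,dW_s\big] = \int K_h(t-s) f_1(s)\,ds$, so an empirical average of $Y_i \int K_h(t-s)\,dW_i(s)$ estimates a smoothed version of $f_1(t)$. The sketch below is a simplified stand-in, not the estimator of the paper: it uses a plain Epanechnikov convolution kernel without boundary correction, evaluates only interior points, and the function $f_1(s) = \sin(\pi s)$, the noise level, the bandwidth and the name `first_chaos_estimator` are all assumptions.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel, supported on [-1, 1]."""
    return 0.75 * np.maximum(1.0 - u**2, 0.0)

def first_chaos_estimator(t, responses, dW, t_left, h):
    """Estimate f_1(t) by averaging Y_i times the Wiener integral of K_h(t - .).

    Plain convolution kernel: no boundary correction, so t should stay
    at distance at least h from the endpoints of [0, 1].
    """
    weights = epanechnikov((t - t_left) / h) / h
    stochastic_integrals = dW @ weights
    return float(np.mean(responses * stochastic_integrals))

rng = np.random.default_rng(4)
n_paths, n_steps = 20000, 200
dt = 1.0 / n_steps
t_left = np.arange(n_steps) * dt
dW = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))

# Hypothetical first-order model: Y = int f_1 dW + noise, with f_1(s) = sin(pi s).
f1 = np.sin(np.pi * t_left)
responses = dW @ f1 + 0.1 * rng.normal(size=n_paths)

estimate_mid = first_chaos_estimator(0.5, responses, dW, t_left, h=0.1)
estimate_quarter = first_chaos_estimator(0.25, responses, dW, t_left, h=0.1)
print("estimate at t = 0.5 :", estimate_mid, "(target 1.0)")
print("estimate at t = 0.25:", estimate_quarter, "(target", np.sin(np.pi * 0.25), ")")
```

The bandwidth plays its usual role: a smaller value reduces the smoothing bias but inflates the variance, since the squared norm of the kernel grows like the inverse of the bandwidth.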
Equipped with these notations we define a family of plugin estimators of the mapping . For and all we set:
[TABLE]
where . In the following, we study the rate of convergence of the estimator (22) when where is a sequence of integers that tends to as tends to (see Theorem 1) and where is a known fixed integer (see Theorem 2).
3.3 Selection procedure
Set , and . Assume that exists. Let be fixed and define
[TABLE]
Now, define
[TABLE]
where
[TABLE]
where the constants are defined in (15). Define for
[TABLE]
and set
[TABLE]
The estimation procedure is then defined by where .
Remark 3
This selection rule follows the principles and the ideas developed by Goldenshluger and Lepski in a series of papers (see Goldenshluger and Lepski, 2011, 2014, among others). The quantity , which is called a majorant in the papers cited above, is a penalized version of the standard deviation of the estimator while the quantity is, in some sense, close to its bias term, see (83). Finding tight majorants is the key point of the method since is chosen in (24) in order to realize an empirical trade-off between these two quantities.
It is worth noting that the procedure depends on a hyperparameter which can be chosen arbitrarily small. The introduction of this parameter is due to technical reasons, see (106) in the proof of Lemma 2. This additional assumption (we would like to take ) implies some restrictions on Theorem 4 below.
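The trade-off described in this remark can be sketched with a generic comparison rule in the spirit of Goldenshluger and Lepski. The code below is a simplified illustration and not the procedure of the paper: it handles a scalar bandwidth, compares estimators through a discrete $L^2$ norm on a grid, uses the estimator at bandwidth $\max(h, h')$ as auxiliary estimator, and takes the majorant as a user-supplied input. The names `l2_norm` and `lepski_type_selection` and the toy numbers are hypothetical.

```python
import numpy as np

def l2_norm(values):
    """Discrete L2 norm of a function sampled on a regular grid (root mean square)."""
    return float(np.sqrt(np.mean(np.square(values))))

def lepski_type_selection(estimates, majorant):
    """Select a bandwidth by balancing a bias proxy against a majorant.

    estimates: dict mapping each bandwidth to the estimator values on a common grid.
    majorant:  dict mapping each bandwidth to a penalized standard deviation bound.
    Returns the selected bandwidth and the criterion value for each bandwidth.
    """
    bandwidths = sorted(estimates)
    criterion = {}
    for h in bandwidths:
        bias_proxy = 0.0  # starting at zero implements the positive part
        for h_prime in bandwidths:
            gap = l2_norm(estimates[max(h, h_prime)] - estimates[h_prime])
            bias_proxy = max(bias_proxy, gap - majorant[h_prime])
        criterion[h] = bias_proxy + majorant[h]
    selected = min(bandwidths, key=lambda h: criterion[h])
    return selected, criterion

# Toy example: the two smallest bandwidths agree, the largest one is oversmoothed.
toy_estimates = {
    0.1: np.array([0.0, 0.0]),
    0.2: np.array([0.0, 0.0]),
    0.4: np.array([1.0, 1.0]),
}
toy_majorant = {0.1: 0.5, 0.2: 0.3, 0.4: 0.1}
selected, criterion = lepski_type_selection(toy_estimates, toy_majorant)
print("selected bandwidth:", selected)
print("criterion values:", criterion)
```

In this toy example the largest bandwidth has the smallest majorant but is penalized by its bias proxy, while the smallest bandwidth pays for its larger majorant, so the rule settles on the intermediate value.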
4 Main results
4.1 Result for the infinite chaos model
Our first result studies the risk of our family of estimators over the class . In this class, the function is decomposed into an infinite sum of chaos:
[TABLE]
Theorem 1
Set , and . Set , and and let be such that . Assume that . Define
[TABLE]
where denotes the integer part and $\boldsymbol{h}_{n}=\big(h_{n}^{(\ell)}(s,\Lambda)\big)_{\ell=1,\dotsc,L_{n}}\in(0,1)^{L_{n}}$ where for any :
[TABLE]
There exists a positive constant depending on , , , , and such that
[TABLE]
Let us briefly comment on this result. Assume first that the parameters are constant and denote by their common value. In this case we obtain
[TABLE]
This implies that, for large enough, $R_{p}\big(\hat{m}_{\boldsymbol{h}_{n},L_{n}},\mathfrak{A}(s,\Lambda,\gamma,M)\big)$ is upper bounded, up to a multiplicative constant, by
[TABLE]
Remark that such a rate of convergence lies between polylogarithmic rates of convergence and polynomial ones. This result can be compared with those obtained by Cadre and Truquet (2015). Recall that, in that paper, the authors study a similar model with a Poisson point process covariate. The rates obtained in that paper are slightly better than ours since they obtain, for some
[TABLE]
whereas, in our case,
[TABLE]
However remark that their study is limited to and and that, moreover, they assume that the response is a bounded variable. In our situation neither nor are assumed to be bounded.
4.2 Results for finite chaos model
In the following three results, we assume that there exists a known integer such that
[TABLE]
Our second result proves that the minimax rate of convergence over the class is of the same order as:
[TABLE]
Theorem 2
Set , , , and assume that . Define $\tilde{\boldsymbol{h}}_{n}=\big(\tilde{h}_{n}^{(\ell)}(s,\Lambda)\big)_{\ell=1,\dotsc,L}\in(0,1)^{L}$ where:
[TABLE]
There exist two positive constants and that depend only on , , , , and such that
[TABLE]
and
[TABLE]
Note that this result also ensures that the family of estimators constructed in Section 3.2 is well-adapted to our problem. The next result states an oracle-type inequality satisfied by our data-driven estimator .
Theorem 3
Set and assume that for any , and that for any the moment exists. Then:
[TABLE]
where and are two positive constants that depend on , , and .
Using Theorems 2 and 3 we can derive our last result: the data-driven estimation procedure is adaptive, up to a logarithmic factor, over the scale .
Theorem 4
Set and assume that for any the moment exists. For any , any , any , we have
[TABLE]
where is a positive constant that depends on , , , and and
[TABLE]
Remark 4
While the selection procedure is defined using the -norms, the procedure is adaptive for any . This phenomenon is due to the hypercontractivity property, see (79). Note that in Theorem 3, the quantity
[TABLE]
is a tight upper bound of the bias term of the estimator .
This result ensures that our data-driven procedure is adaptive, up to a logarithmic factor, over a large scale of mapping classes.
The presence of the extra logarithmic factor in the adaptive rate of convergence is not usual for prediction risks. This term is introduced in the definition of to control the deviation of the estimator (16) based on the variables . See (133) for more details.
5 Proofs
We first consider some notations and lemmas. Define for , and
[TABLE]
and
[TABLE]
Lemma 1
We have, for any , and
[TABLE]
Moreover for and
[TABLE]
Lemma 2
Let and . Let be i.i.d random variables such that, for any
[TABLE]
Define
[TABLE]
and
[TABLE]
Then there exists a positive constant such that
[TABLE]
The following lemma recalls Bousquet’s version of Talagrand’s concentration inequality (see Bousquet, 2002; Boucheron et al., 2013).
Lemma 3 (Bousquet’s inequality)
Let be independent identically distributed random variables. Let be a countable set of functions and define . Assume that, for all and , we have and almost surely. Assume also that . Then we have for all
[TABLE]
5.1 Proof of Theorem 1
Set , , , , and . For the sake of readability we denote , , , and .
Decomposition of the risk.
Using the triangle inequality we have:
[TABLE]
The last line follows from the hypercontractivity property. Now, using Itô’s isometry, we obtain:
[TABLE]
where the bias term and the stochastic term are defined by:
[TABLE]
Study of the constant term.
Remark that
[TABLE]
where the last line is obtained using Rosenthal’s inequality (Johnson et al., 1985). Here and denote two positive constants while . Moreover since
[TABLE]
the hypercontractivity property implies that, for any
[TABLE]
The last line follows from Itô’s isometry. Now, using that and applying the Cauchy-Schwarz inequality, we obtain:
[TABLE]
where, using the definition of and the fact that
[TABLE]
We finally obtain
[TABLE]
where depends only on , , and .
Study of the bias term.
Set and note that:
[TABLE]
Using Itô’s isometry we thus obtain:
[TABLE]
To apply multivariate Taylor formula we introduce, for any , the notation . Moreover we define with and . Since , we obtain, using classical arguments (see Bertin et al., 2019), that:
[TABLE]
where
[TABLE]
and denotes the partition function of an integer. We then obtain:
[TABLE]
Since the sequence
[TABLE]
tends to [math] as goes to infinity, there exists an absolute constant that depends only on , and such that:
[TABLE]
Study of the stochastic term
Set . We have:
[TABLE]
where
[TABLE]
and
[TABLE]
Then we have
[TABLE]
Since and are independent Lemma 1 implies:
[TABLE]
Then we have,
[TABLE]
where
[TABLE]
Now we have:
[TABLE]
Using Cauchy-Schwarz inequality we obtain:
[TABLE]
Now, using Lemma 1 and that , we obtain:
[TABLE]
This implies that
[TABLE]
where
[TABLE]
Note that is finite since . Combining (56) and (61), we obtain, denoting , that
[TABLE]
General bound on the risk
Combining (49), (56) and (61), the following bound can be easily obtained:
[TABLE]
Study of the residual term
Finally we have using that
[TABLE]
where
[TABLE]
Note that tends to [math] as tends to infinity.
Upper bound
Using the definitions of and we have
[TABLE]
Now, remark that, since and , we have
[TABLE]
Then there exists a positive constant that depends on and such that
[TABLE]
This implies that
[TABLE]
where is a negligible remainder term.
5.2 Proof of Theorem 2
This proof is decomposed into two parts. We first prove the upper bound (28) and then the lower bound (29).
5.2.1 Proof of the upper bound
For the sake of readability we denote and . Following the same notations as in the proof of Theorem 1, we have
[TABLE]
Note that in this case there is no residual term. Similarly to the proof of Theorem 1, and using the same notations, we have
[TABLE]
with depending on , and . The bias term satisfies
[TABLE]
and the stochastic term satisfies
[TABLE]
where depends on , , and . Now by substituting by its value, we obtain
[TABLE]
where is a positive constant that depends only on , , , and . This ends the proof of the upper bound. Now, let us prove the lower bound.
5.2.2 Proof of the lower bound
Note that for any and any estimator of we have . This implies that, to prove the lower bound, it is sufficient to consider the case .
Method.
We fix , and . To prove the lower bound over the space , we define
[TABLE]
and we follow the strategy developed by Cadre et al. (2017). In particular, Lemma 6.1 of that paper implies (using Itô’s isometry combined with Theorem 2.5 in Tsybakov (2009)) that the problem boils down to finding a finite family of functions with cardinality that satisfies the following assumptions:
- (i) the null function .
- (ii) for any , the function and
- (iii) there exists such that for ,
- (iv) there exists such that
[TABLE]
Under these assumptions, the lower-bound (29) holds for .
Notation.
Here, we construct a finite set of functions used in the rest of the proof. We consider the function defined, for any by
[TABLE]
This function is in and we denote . Note that, since the function is infinitely differentiable with compact support, we have:
[TABLE]
Now we consider ,
[TABLE]
and
[TABLE]
We consider the bandwidth
[TABLE]
and we set . We assume, without loss of generality, that is an integer and . Let and define, for any , the function by:
[TABLE]
where . Finally, for any we define:
[TABLE]
where
[TABLE]
Proof of (ii).
Set . The following property can be readily verified:
[TABLE]
This implies that
[TABLE]
Moreover note that, for any and such that , we have:
[TABLE]
which implies that, for any we have
[TABLE]
This also implies, since the function vanishes outside , that
[TABLE]
Using (70), we deduce that belongs to . Combining with (67), (ii) is fulfilled.
Proof of (i) and (iii).
Using Lemma 2.9 of Tsybakov (2009), there exists a set such that the null function belongs to , and
[TABLE]
Let such that . We have
[TABLE]
Then Assumptions (i) and (iii) are fulfilled.
Proof of (iv).
Using (67), we deduce that using the definition of
[TABLE]
Then Assumption (iv) is fulfilled.
5.3 Proof of Theorem 3
We have using (40) and (49) that
[TABLE]
Let . Let . We have
[TABLE]
Then we have
[TABLE]
Note that we have
[TABLE]
where we use the properties of and stated in page 61. In the following, we will demonstrate that
[TABLE]
Combining (80) with (81) and (82), we obtain that
[TABLE]
Theorem 3 is then a direct consequence of the above inequality and (79).
Proof of (82)
Now let us control for . We have
[TABLE]
Then
[TABLE]
where
[TABLE]
We have
[TABLE]
where
[TABLE]
and
[TABLE]
Using these notations we have:
[TABLE]
where
[TABLE]
and
[TABLE]
Now note that
[TABLE]
and for
[TABLE]
Using Lemma 2 with (respectively ), (respectively ) and (respectively ), we deduce that for all for
[TABLE]
This implies that
[TABLE]
Now (83) and (84) entail (82).
5.4 Proof of Theorem 4
Let , , and . Define for
[TABLE]
For large enough, we have . Using (5.1) and (23), Theorem 3 implies that
[TABLE]
where is a constant that changes from line to line and depends on , and . Since does not depend on , this ends the proof.
5.5 Proof of Lemma 1
We have
[TABLE]
Moreover we have
[TABLE]
5.6 Proof of Lemma 2
In this proof, is a positive constant whose value may change from line to line. Since is fixed, we simplify the notation and use in the proof and . Now, we have for :
[TABLE]
where
[TABLE]
where for any :
[TABLE]
and
[TABLE]
Note that both and are positive numbers.
5.6.1 Control of
We have
[TABLE]
Note that we have, using the Cauchy-Schwarz and Markov inequalities,
[TABLE]
Moreover since , using Lemma 1 with , we have
[TABLE]
Now using (98), (103) and (106), we finally obtain
[TABLE]
5.6.2 Control of
Define
[TABLE]
We have
[TABLE]
where using Lemma 1 with
[TABLE]
and following (103)
[TABLE]
5.6.3 Control of
Define
[TABLE]
We have
[TABLE]
where
[TABLE]
and
[TABLE]
Note that using similar arguments as above with ,
[TABLE]
and following (106) .
5.6.4 Control of
We have to bound
[TABLE]
Note that, using duality arguments, there exists a countable set of functions such that and
[TABLE]
where
[TABLE]
and, for , we have:
[TABLE]
and
[TABLE]
Note that we have both and . Now, let us control:
[TABLE]
Using the Cauchy-Schwarz inequality and Fubini’s theorem, we obtain:
[TABLE]
We have
[TABLE]
and
[TABLE]
Combining the previous results we have:
[TABLE]
Define
[TABLE]
We have:
[TABLE]
Define:
[TABLE]
Using Bousquet’s inequality we have:
[TABLE]
where
[TABLE]
and
[TABLE]
Since , we have, , that is:
[TABLE]
Since , for large enough we have . Moreover we have, using the change of variables ,
[TABLE]
This implies that:
[TABLE]
Combining the results of Sections 5.6.1, 5.6.2, 5.6.3 and 5.6.4, we obtain (31).
Acknowledgements
The authors have been supported by Fondecyt projects 1171335 and 1190801, and Mathamsud projects 19-MATH-06 and 20-MATH-05.
References
- Azaïs and Fort (2013) Jean-Marc Azaïs and Jean-Claude Fort. Remark on the finite-dimensional character of certain results of functional statistics. C. R. Math. Acad. Sci. Paris, 351(3-4):139–141, 2013. ISSN 1631-073X. doi: 10.1016/j.crma.2013.02.004. URL https://doi.org/10.1016/j.crma.2013.02.004.
- Bertin et al. (2019) Karine Bertin, Salima El Kolei, and Nicolas Klutchnikoff. Adaptive density estimation on bounded domains. Annales de l’Institut Henri Poincaré, Probabilités et Statistiques, 55(4):1916–1947, 2019. doi: 10.1214/18-AIHP938. URL https://projecteuclid.org/euclid.aihp/1573203619.
- Biau et al. (2010) Gérard Biau, Frédéric Cérou, and Arnaud Guyader. Rates of convergence of the functional $k$-nearest neighbor estimate. IEEE Trans. Inform. Theory, 56(4):2034–2040, 2010. ISSN 0018-9448. doi: 10.1109/TIT.2010.2040857. URL https://doi.org/10.1109/TIT.2010.2040857.
- Boucheron et al. (2013) Stéphane Boucheron, Gábor Lugosi, and Pascal Massart. Concentration inequalities. A nonasymptotic theory of independence, with a foreword by Michel Ledoux. Oxford University Press, Oxford, 2013. ISBN 978-0-19-953525-5. doi: 10.1093/acprof:oso/9780199535255.001.0001. URL https://doi.org/10.1093/acprof:oso/9780199535255.001.0001.
- Bousquet (2002) Olivier Bousquet. A Bennett concentration inequality and its application to suprema of empirical processes. C. R. Math. Acad. Sci. Paris, 334(6):495–500, 2002. ISSN 1631-073X. doi: 10.1016/S1631-073X(02)02292-6. URL https://doi.org/10.1016/S1631-073X(02)02292-6.
- Cadre and Truquet (2015) Benoît Cadre and Lionel Truquet. Nonparametric regression estimation onto a Poisson point process covariate. ESAIM Probab. Stat., 19:251–267, 2015. ISSN 1292-8100. doi: 10.1051/ps/2014023. URL https://doi.org/10.1051/ps/2014023.
- Cadre et al. (2017) Benoît Cadre, Nicolas Klutchnikoff, and Gaspar Massiot. Minimax regression estimation for Poisson coprocess. ESAIM Probab. Stat., 21:138–158, 2017. ISSN 1292-8100. doi: 10.1051/ps/2017004. URL https://doi.org/10.1051/ps/2017004.
- Cai and Hall (2006) T. Tony Cai and Peter Hall. Prediction in functional linear regression. Ann. Statist., 34(5):2159–2179, 2006. ISSN 0090-5364. doi: 10.1214/009053606000000830. URL https://doi.org/10.1214/009053606000000830.
