Aggregated kernel based tests for signal detection in a regression model
Thi Thien Trang Bui

TL;DR
This paper introduces an aggregated kernel-based testing method for detecting signals in regression models, which is effective even with unknown variance and adapts to various alternative hypotheses.
Contribution
It proposes a novel aggregation approach for kernel-based tests that automatically selects kernels and parameters, ensuring adaptivity and non-asymptotic control.
Findings
The method achieves minimax adaptive testing over multiple classes of alternatives.
It provides non-asymptotic level-α tests with controlled error rates.
The aggregation procedure simplifies kernel choice and improves detection power.
Abstract
Considering a regression model, we address the question of testing the nullity of the regression function. The testing procedure is available when the variance of the observations is unknown and does not depend on any prior information on the alternative. We first propose a single testing procedure based on a general symmetric kernel and an estimation of the variance of the observations. The corresponding critical values are constructed to obtain non-asymptotic level-α tests. We then introduce an aggregation procedure to avoid the difficult choice of the kernel and of the parameters of the kernel. The multiple tests satisfy non-asymptotic properties and are adaptive in the minimax sense over several classes of regular alternatives.

Table 1. Estimated probabilities of first kind error of the tests P, G and PG.

| Test | 1st error | CI |
|---|---|---|
| P | 0.0504 | |
| G | 0.0506 | |
| PG | 0.0498 | |

Table 2. Estimated powers of the tests P, G and PG.

| Test | | | | |
|---|---|---|---|---|
| P | 0.876 | 0.986 | 0.996 | 0.699 |
| G | 0.831 | 0.977 | 0.992 | 0.635 |
| PG | 0.884 | 0.984 | 0.996 | 0.690 |

Table 3. Estimated powers of the tests P, G and PG.

| Test | | | | |
|---|---|---|---|---|
| P | 0.218 | 0.654 | 1 | * |
| G | 0.208 | 0.668 | 1 | * |
| PG | 0.210 | 0.678 | 1 | * |

Table 4. Estimated powers of the tests P, G and PG.

| Test | | | | |
|---|---|---|---|---|
| P | 0.35 | 0.90 | 0.98 | |
| G | 0.56 | 0.98 | 1 | * |
| PG | 0.34 | 0.89 | 1 | * |

Table 5. Comparison of the tests P, G and PG with the tests EL1 and EL2.

| Test | | | | |
|---|---|---|---|---|
| P | 0.049 | 0.606 | 1 | 1 |
| G | 0.048 | 0.459 | 0.99 | 1 |
| PG | 0.048 | 0.441 | 0.99 | 1 |
| EL1 | 0.074 | 0.837 | 1 | 1 |
| EL2 | 0.062 | 0.805 | 1 | 1 |
| P | 0.053 | 0.224 | 0.905 | 1 |
| G | 0.053 | 0.630 | 0.922 | 1 |
| PG | 0.049 | 0.228 | 1 | 1 |
| EL1 | 0.069 | 0.718 | 1 | 1 |
| EL2 | 0.058 | 0.693 | 1 | 1 |
| P | 0.043 | 0.134 | 0.696 | 0.990 |
| G | 0.044 | 0.146 | 0.741 | 0.995 |
| PG | 0.045 | 0.134 | 0.700 | 0.996 |
| EL1 | 0.076 | 0.134 | 0.428 | 0.979 |
| EL2 | 0.056 | 0.107 | 0.368 | 0.961 |

**Aggregated kernel based tests for signal detection in a regression model**
Bui Thi Thien Trang
Institut de Mathématiques de Toulouse, Université Paul Sabatier, 118 route de Narbonne, F-31062 Toulouse Cedex 9
Abstract. Considering a regression model, we address the question of testing the nullity of the regression function. The testing procedure is available when the variance of the observations is unknown and does not depend on any prior information on the alternative. We first propose a single testing procedure based on a general symmetric kernel and an estimation of the variance of the observations. The corresponding critical values are constructed to obtain non-asymptotic level-α tests. We then introduce an aggregation procedure to avoid the difficult choice of the kernel and of the parameters of the kernel. The multiple tests satisfy non-asymptotic properties and are adaptive in the minimax sense over several classes of regular alternatives.
Keywords. Separation rates, adaptive tests, regression model, kernel methods, aggregated test.
1 Introduction
We observe that obey the regression model described as follows,
[TABLE]
We assume that are i.i.d real random variables with values in a measurable set such that with bounded density with respect to the Lebesgue measure on and are i.i.d standard Gaussian variables, independent of . All along the paper, is assumed to be in . We also assume that . In order to estimate , we assume that we also observe that obey the model
[TABLE]
where is independent of .
Given the observation of , we want to test the null hypothesis
[TABLE]
against the alternative
[TABLE]
Hypothesis testing in nonparametric regression has been considered in the papers by King, (1988), Hardle and Marron, (1990), Hall and Hart, (1990), King et al., (1991) and Delgado, (1992). Tests for no effect in nonparametric regression are investigated in Eubank and LaRiccia, (1993). In the paper of Spokoiny et al., (1996), the authors considered the particular case where is assumed to be known. They propose tests that achieve the minimax rates of testing [up to an unavoidable factor] for a wide range of Besov classes. Baraud et al., (2003) propose a test, based on model selection methods, for testing in a fixed design regression model that belongs to a linear subspace of against a nonparametric alternative. They obtain optimal rates of testing up to a possible factor over various classes of alternatives simultaneously. More recently, in a Poisson process framework, Fromont et al., (2012, 2013) consider two independent Poisson processes and address the question of testing equality of their respective intensities. They introduce tests based on a single kernel function and aggregate several kernel based tests to obtain adaptive minimax testing procedures over alternatives based on Besov or Sobolev balls.
In this work, we propose to construct aggregated kernel based testing procedures of versus in a regression model. Our test statistics are based on a single kernel function which can be chosen either as a projection or Gaussian kernel and we propose an estimation for the unknown variance . Our tests are exactly (and not only asymptotically) of level . We obtain the optimal non-asymptotic conditions on the alternative which guarantee that the probability of second kind error is at most equal to a prescribed level . However, the testing procedures that we introduce hereafter are also intended to overcome the question of calibrating the choice of kernel and/or the parameters of the kernel. They are based on an aggregation approach, that is well-known in adaptive testing (Baraud et al., (2003) and Fromont et al., (2013)). This paper is strongly inspired by the paper of Fromont et al., (2013). Instead of considering a particular single kernel, we consider a collection of kernels and the corresponding collection of tests, each with an adapted level of significance. We then reject the null hypothesis when at least one of the tests in the collection rejects the null hypothesis. The aggregated testing procedures are constructed to be of level and the loss in second kind error due to the aggregation, when unavoidable, is as small as possible. Then we prove that these multiple tests satisfy the adaptive minimax properties over several classes of alternatives. Finally, we compare our tests with tests investigated in Eubank and LaRiccia, (1993) from a practical point of view.
The paper is organized as follows. In Section 2, we describe the single tests based on a single kernel function, with the corresponding critical values approximated by a Monte Carlo method. In Section 3, we specify the performances of the single tests for two particular examples of kernels and explain why we need to aggregate tests based on a collection of kernel functions, which are presented in Section 4. We present the simulation study in Section 5 and the major proofs are given in the Appendix.
2 Single tests based on a single kernel.
2.1 Definition of the testing procedure.
We assume that we observe that obey model (1). In order to estimate the unknown variance , we assume that we observe another sample from the model (2). We are interested in testing the null hypothesis against . Let be a symmetric kernel function: satisfying:
Assumption 1.
[TABLE]
We introduce the test statistic defined as follows,
[TABLE]
where
[TABLE]
and
[TABLE]
where for the sake of simplicity, we assume that is even. Let us now introduce some notations. We set and is a constant depending on and , that will be used all along the paper and may vary from line to line.
The expectation of is equal to
[TABLE]
In the following, we denote for all ,
[TABLE]
and for all
[TABLE]
With these notations,
[TABLE]
whose existence is ensured by Assumption 1. We now compute the expectation of .
[TABLE]
with .
Thus is a biased estimator of with bias . If is a regular function this bias will be small.
We have chosen to consider and study in this paper two possible examples of kernel functions. For each example, we give a simpler expression of .
Example 1. When , our first choice for is a symmetric kernel function based on a finite orthonormal family with respect to the scalar product ,
[TABLE]
For all in we get
[TABLE]
where is the subspace of generated by the functions and denotes the orthogonal projection onto for . Thus
[TABLE]
Hence, when is well-chosen, can also be viewed as a relevant estimator of .
Example 2. When and is a density function with respect to the Lebesgue measure on , our second choice for is a Gaussian kernel defined by,
[TABLE]
where and is a positive bandwidth. Then, for all we have
[TABLE]
where is the convolution product with respect to the measure and . Thus in this case
[TABLE]
Hence, when the bandwidth is well chosen, can also be viewed as a relevant estimator of .
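As a rough illustration of the quantities introduced in this section, the following Python sketch computes a kernel statistic and a variance estimate on simulated data. The exact forms used here are assumptions made for illustration only: the statistic is taken to be a U-statistic averaging K(X_i, X_j) Y_i Y_j over pairs i ≠ j, the variance estimate sums squared differences of consecutive observations of the second sample (with an even sample size), and the kernel is a one-dimensional Gaussian kernel with bandwidth h. The function names `gaussian_kernel`, `t_statistic` and `variance_estimate` are hypothetical.

```python
import math
import random


def gaussian_kernel(x: float, y: float, h: float) -> float:
    """One-dimensional Gaussian kernel with bandwidth h (assumed form)."""
    z = (x - y) / h
    return math.exp(-0.5 * z * z) / (h * math.sqrt(2.0 * math.pi))


def t_statistic(xs, ys, kernel) -> float:
    """Assumed U-statistic: average of K(x_i, x_j) * y_i * y_j over pairs i != j."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += kernel(xs[i], xs[j]) * ys[i] * ys[j]
    # Each unordered pair appears twice in the sum over ordered pairs i != j.
    return 2.0 * total / (n * (n - 1))


def variance_estimate(y_second) -> float:
    """Assumed estimator: squared differences of consecutive pairs, divided by n."""
    n = len(y_second)
    if n % 2 != 0:
        raise ValueError("the sample size is assumed to be even")
    squares = sum((y_second[2 * i + 1] - y_second[2 * i]) ** 2 for i in range(n // 2))
    return squares / n


# Hypothetical data generated under the null hypothesis (pure noise).
rng = random.Random(0)
n = 50
xs = [rng.random() for _ in range(n)]
ys = [rng.gauss(0.0, 1.0) for _ in range(n)]
y_second = [rng.gauss(0.0, 1.0) for _ in range(n)]

normalized = t_statistic(xs, ys, lambda x, y: gaussian_kernel(x, y, h=0.1))
normalized /= variance_estimate(y_second)
print(normalized)
```

Under the null hypothesis both the numerator and the denominator scale with the noise variance, so in this sketch the ratio does not depend on it; this is the property that makes a simulated critical value usable when the variance is unknown.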
From the choices of the two examples above for , we have seen that the test statistic can be viewed as a relevant estimator of . Thus, it seems reasonable to consider a test which rejects when is "large enough". Now, we define the critical values used in our tests.
We define
[TABLE]
Note that, under , conditionally on , and have exactly the same distribution. We now choose the quantile of the conditional distribution of given as the critical value for our test. This quantity can easily be estimated by simulations.
More precisely, for in , if denotes the quantile of the distribution of conditionally on , we consider the test that rejects when . The corresponding test function is defined by
[TABLE]
Notice that in practice, the true quantile is not available, but it can be approximated by a Monte Carlo procedure.
2.2 Probabilities of first and second kind errors of the test.
Since under , and have the same distribution conditionally on , for any , we have
[TABLE]
By taking the expectation over , we obtain
[TABLE]
Let us now consider an alternative hypothesis, corresponding to a non zero regression function . Given in , we now aim to determine a non-asymptotic condition on the regression function which guarantees that . Denoting by the quantile of the conditional quantile ,
[TABLE]
Thus, a condition which guarantees that will ensure that . The following proposition gives such a condition.
Proposition 2.1.
Let be the fixed levels in . We have that
[TABLE]
as soon as
[TABLE]
with
[TABLE]
Thus we have, under (11),
[TABLE]
Moreover, there exists some constant such that, for every and
[TABLE]
To prove the first part of this result, we simply use Markov’s inequality for the term and an exponential inequality for non-central Chi-square variables due to Birgé, (2001) for the term . The control of derives from a property of Gaussian chaoses combined with an exponential inequality (due to De la Pena and Giné, (2012) and Huskova and Janssen, (1993)). The detailed proof is given in the Appendix.
The following theorem gives a condition on for the test to be powerful.
Theorem 2.2.
Let be fixed levels in , be a positive constant, be a symmetric kernel function, and be the test defined by (10). Let be an upper bound for . Then for all , we have , as soon as
[TABLE]
The right hand side of the above inequality corresponds to a bias-variance trade-off. For particular choices of the kernel function , these terms will be upper bounded in Section 3.
2.3 Performance of the Monte Carlo approximation.
In this section, we introduce a Monte Carlo method used to approximate the conditional quantiles by as follows. We consider the set of independent sequences of i.i.d standard Gaussian variables
[TABLE]
where , , .
We define
[TABLE]
where are observed from model (2).
Under , conditionally on , the variables have the same distribution function as and as . We denote by the empirical distribution function of the sample , conditionally on .
[TABLE]
Then the Monte Carlo approximation of is defined by
[TABLE]
We recall the test function defined in (10) and we reject when with the quantile of defined by (9) conditionally on . Now, by using the estimated quantile , we consider the test given by
[TABLE]
For the test defined in (14), the probabilities of first and second kind errors can be upper bounded. This is the purpose of the two following propositions, whose proofs are given in Fromont et al., (2013).
Proposition 2.3.
Let be some fixed level in , and be the test defined by (14). Then,
[TABLE]
Proposition 2.4.
Let and be fixed levels in such that and . Let be the test given in (14). Let and as in Proposition 2.1, and let be the quantile of . If
[TABLE]
then . Moreover,
[TABLE]
Comments. When comparing (15) and (16) with (11) and (12) in Proposition 2.1, we notice that they asymptotically coincide when . Moreover, if and , the multiplicative factor of is of order in (16) compared with (12).
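The Monte Carlo approximation of this section can be sketched in Python as follows. This is only an illustration under stated assumptions: the normalized statistic is assumed to be a U-statistic in the noise divided by a pairwise-difference variance estimate, the design points are held fixed, and the critical value is taken as the empirical quantile of the simulated values. The names `null_statistic`, `monte_carlo_quantile` and `single_test`, the box kernel and the sample sizes are all hypothetical.

```python
import math
import random


def null_statistic(xs, eps, eps_second, kernel):
    """Normalized statistic computed from pure noise (assumed form).

    Numerator: average of K(x_i, x_j) * eps_i * eps_j over pairs i != j.
    Denominator: variance estimate built from the second noise sample.
    """
    n = len(xs)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += kernel(xs[i], xs[j]) * eps[i] * eps[j]
    numerator = 2.0 * total / (n * (n - 1))
    m = len(eps_second)
    denominator = sum(
        (eps_second[2 * i + 1] - eps_second[2 * i]) ** 2 for i in range(m // 2)
    ) / m
    return numerator / denominator


def monte_carlo_quantile(xs, kernel, level, n_rep, rng):
    """Empirical (1 - level) quantile of the null statistic, given the design xs."""
    n = len(xs)
    draws = []
    for _ in range(n_rep):
        eps = [rng.gauss(0.0, 1.0) for _ in range(n)]
        eps_second = [rng.gauss(0.0, 1.0) for _ in range(n)]
        draws.append(null_statistic(xs, eps, eps_second, kernel))
    draws.sort()
    # Smallest order statistic whose empirical distribution function reaches 1 - level.
    k = min(max(math.ceil((1.0 - level) * n_rep), 1), n_rep)
    return draws[k - 1]


def single_test(observed, critical_value):
    """Return 1 (reject the null hypothesis) when the statistic exceeds the critical value."""
    return 1 if observed > critical_value else 0


# Hypothetical usage with a box kernel on four cells of [0, 1).
rng = random.Random(0)
xs = [rng.random() for _ in range(30)]
box_kernel = lambda x, y: 1.0 if int(4 * x) == int(4 * y) else 0.0
critical_value = monte_carlo_quantile(xs, box_kernel, level=0.05, n_rep=200, rng=rng)
print(critical_value)
```

Conditioning on the design means that `xs` is not redrawn across replications; only the Gaussian noise is simulated again.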
3 Two particular examples of kernel function.
In this section, we specify the performances of the above test for two examples of kernels: projection kernels and Gaussian kernels.
3.1 Projection kernels.
We assume . We consider the projection kernel defined in (7) and aim to give a more explicit formulation for the result of Theorem 2.2 under the choice of this kernel. We also evaluate the uniform separation rates over Besov bodies.
Corollary 3.1.
Let and be a constant. Let be defined in (10), where is the projection kernel defined by (7). We denote by the linear subspace of , generated by the functions , and we assume that the dimension of is equal to . Then if
[TABLE]
then
[TABLE]
Let us consider the particular case when the kernel is the projection kernel onto the space generated by functions of the Haar basis defined as follows.
Let be the Haar basis of with
[TABLE]
where . The linear subspace is generated by a subset of the Haar basis. More precisely, we denote by the subspace of generated by , and we define
[TABLE]
We also consider, for the subspace generated by with , and
[TABLE]
We set and for every , .
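A short Python sketch may help make the Haar projection kernels concrete. It assumes, for illustration, that the level-J space is spanned by the constant function on [0, 1) together with the Haar wavelets ψ_{j,k} for j < J and 0 ≤ k < 2^j, and it checks numerically that the kernel obtained by summing φ_λ(x) φ_λ(y) over this family coincides with 2^J times the indicator that x and y fall in the same dyadic interval of length 2^{-J}. The function names are hypothetical.

```python
import math


def haar_father(x: float) -> float:
    """Constant function equal to 1 on [0, 1)."""
    return 1.0 if 0.0 <= x < 1.0 else 0.0


def haar_wavelet(j: int, k: int, x: float) -> float:
    """psi_{j,k}(x) = 2^{j/2} psi(2^j x - k), psi = 1 on [0, 1/2) and -1 on [1/2, 1)."""
    t = (2 ** j) * x - k
    if 0.0 <= t < 0.5:
        return 2 ** (j / 2)
    if 0.5 <= t < 1.0:
        return -(2 ** (j / 2))
    return 0.0


def projection_kernel(level: int, x: float, y: float) -> float:
    """Sum of phi(x) * phi(y) over the father function and all wavelets with j < level."""
    total = haar_father(x) * haar_father(y)
    for j in range(level):
        for k in range(2 ** j):
            total += haar_wavelet(j, k, x) * haar_wavelet(j, k, y)
    return total


def dyadic_kernel(level: int, x: float, y: float) -> float:
    """Closed form: 2^level when x and y share a dyadic cell of length 2^-level."""
    same_cell = math.floor((2 ** level) * x) == math.floor((2 ** level) * y)
    return float(2 ** level) if same_cell else 0.0


# Compare the two expressions on a small grid of points in [0, 1).
points = [i / 16 + 1 / 32 for i in range(16)]
max_gap = max(
    abs(projection_kernel(3, x, y) - dyadic_kernel(3, x, y))
    for x in points
    for y in points
)
print(max_gap)
```

The closed form avoids the double loop over the basis, which matters when the kernel has to be evaluated for every pair of design points and every simulated replication.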
We now introduce the Besov body defined for by
[TABLE]
For all , we consider the kernel function defined by (18), (19) and the associated test function defined in (10) with . For an optimal choice of , realizing a good compromise between the bias term and the variance term appearing in (2.2), we give a condition on for which ensures that the power of our test is larger than .
Proposition 3.2.
Let . For all , let be defined by (18), (19) and consider the test function where
[TABLE]
For all such that
[TABLE]
we have .
Comments.
1. Non-asymptotic lower bounds for the rates of testing in signal detection over Besov bodies are given in Baraud et al., (2002). These lower bounds coincide with the bound given in (21), hence our result is sharp.
2. In (20), depends on , the regularity parameter of the Besov body, so it leads to the natural question of the choice of this parameter. In order to propose a procedure that is adaptive with respect to the regularity of the unknown regression function , we introduce aggregated tests in Section 4.
3.2 Gaussian kernels.
For this second example, we assume that . We consider the Gaussian kernel defined in (8) and rewrite the result of Theorem 2.2 under the choice of this kernel. We also evaluate the uniform separation rates over Sobolev balls for this test.
Corollary 3.3.
Let , be a constant and be the test function defined in (10) where is defined in (8). For if
[TABLE]
we obtain that
[TABLE]
Let and . For in and , for all , we consider
[TABLE]
with
[TABLE]
Let us introduce for the Sobolev ball defined by
[TABLE]
where denotes the Fourier transform of : .
For all , we consider the kernel function defined by (23) and the associated test function defined in (10) with . For an optimal choice of , realizing a good compromise between the bias term and the variance term appearing in (3.3), we give a condition on for which ensures that the power of our test is larger than .
Proposition 3.4.
Let . For all , let be defined by (23) and consider the test function , where we set
[TABLE]
For all such that
[TABLE]
we have .
Comments.
1. As in Proposition 3.2, we obtain in the right hand term of (25) a classical bound for the separation rates of testing over regular classes of alternatives such as Holderian balls (see Ingster, (1993) for nonparametric minimax rates of testing in various setups).
2. Non-asymptotic lower bounds for the rates of testing in signal detection over Sobolev balls are given in Fromont and Lévy-Leduc, (2006). These bounds coincide with the bound given in (25).
3. In (24), as previously, depends on , the regularity parameter of the Sobolev ball, so it leads to the natural question of the choice of this parameter, which is answered through the aggregated tests in Section 4.
4 Multiple or aggregated tests based on collections of kernel functions.
In the previous section, we have considered testing procedures based on a single kernel function . However, the following question is natural: how can we choose the kernel and its parameters, for instance the orthonormal family in the projection kernel in Section 3.1 or the bandwidth in the Gaussian kernel in Section 3.2? Baraud et al., (2003) proposed adaptive testing procedures based on the aggregation of a collection of tests. This idea is presented in a series of papers, among which Fromont et al., (2013) proposed an aggregation procedure. Following this idea, we consider in this section a collection of kernel functions instead of a single one. In addition, we define a multiple testing procedure by aggregating the corresponding single tests, with an adapted choice of the critical values.
4.1 The aggregated testing procedure.
Let us describe the aggregated testing procedure by introducing a finite collection of symmetric kernel functions: . For , we replace in (3) and (9) by to define and and let be a collection of positive numbers such that . Conditionally on , for , we denote by the quantile of . Given in , we consider the test which rejects when there exists at least one in such that
[TABLE]
where is defined by
[TABLE]
We consider the test function defined by
[TABLE]
Using the Monte Carlo method, we can estimate and the quantiles for all . The following theorem provides a control of the first and second kind errors for the test . The detailed proof is given in the Appendix.
Theorem 4.1.
Let be fixed levels in and be the test defined by (27). We have
[TABLE]
Moreover, for every regression function , we have
[TABLE]
as soon as there exists in such that
[TABLE]
Comments. This theorem shows that the aggregated test is of level , for all . Moreover, as soon as the second kind error is controlled by for at least one test in the collection, the same holds for the aggregated procedure with the price that the level is replaced by to guarantee that the aggregated procedure is of level .
4.2 The aggregation of projection kernels.
Let us specify the performance of the aggregated test for a collection of projection kernels.
Corollary 4.2.
Let be fixed levels in . Let be a finite collection of linear subspaces of , generated by the functions and we assume that the dimension of is equal to . We set, for all , . Let be defined by (27) with the collection of kernels and the collection of positive numbers such that .
Then is a level test. Moreover, if
[TABLE]
where and .
Comments. Comparing this result with the one obtained in Corollary 3.1 for the single test based on a projection kernel, we can see that the multiple testing procedure makes it possible to obtain the infimum over all in in the right hand side of (4.2) at the price of the additional term .
Let us consider the particular case when the collection of kernels is the collection of projection kernels based on the constructions in Section 3.1. Let for some , and for all in , .
We consider , the test defined by (27) with the collection of kernels where is defined in (18), (19). We obtain from Corollary 4.2 that there exists some constant such that as soon as
[TABLE]
For any we consider
[TABLE]
Corollary 4.3.
Let . For all , we consider the test function . Assume that . Then, for any , we set
[TABLE]
For all such that
[TABLE]
we have .
Comments. We obtain a right hand term in (33) of order . This rate of testing was shown to be optimal for signal detection in a Gaussian white noise by Spokoiny et al., (1996). In particular, they showed that the logarithmic factor is the price to pay for adaptation.
4.3 The aggregation of Gaussian kernels.
We here consider the aggregated test based on a collection of Gaussian kernels.
Corollary 4.4.
Let , be a collection of positive bandwidths, we consider a collection of Gaussian kernels corresponding to the above collection of positive bandwidths, where is defined in (23). Let be defined by (27) with the collection of kernels and a collection of positive numbers such that .
Then is a level test. Moreover, there exists such that if
[TABLE]
We obtain that
For . We consider the particular case where we take and for all . Let be the test defined by (27) with the collection of Gaussian kernels and . We obtain from Corollary 4.4 that there exists such that if
[TABLE]
For we consider
[TABLE]
Corollary 4.5.
Let . For all , we consider the test function and assume that . For any , we set
[TABLE]
For all such that
[TABLE]
we have .
Comments. The rate of testing is of order . This rate was shown to be optimal over periodic Sobolev balls up to the logarithm, by Castillo et al., (2006).
5 Simulation study.
5.1 Presentation of the simulation study.
We study our aggregated testing procedures from a practical point of view in this section. We consider and choose . In the following simulation, are i.i.d uniform random variables on .
Let us introduce the collection of symmetric kernel functions and the aggregated testing procedure defined by (27) as follows. First, we consider the test denoted by P corresponding to a collection of projection kernels. To be more explicit, we consider the Haar basis introduced in Section 3.1. Let and for with . Let and for all in , . We consider the multiple testing procedure with the collection of kernels .
Second, we also consider the multiple test associated with the collection of Gaussian kernel functions defined in Section 4.3. For we take , let with . Then taking , we consider the multiple testing procedure denoted by G, with the collection of kernels .
Finally, we are interested in the collection of both projection and Gaussian kernels. We denote by PG the multiple testing procedure with the collection of kernels . For we take and for we take .
We recall that the test rejects when there exists at least one in such that . Hence, for each observation we have to estimate defined by (26) and . Applying the Monte Carlo method introduced in Section 2.3, these quantities are well approximated. To be more explicit, we generate samples of and , in which we use one half to approximate the conditional probability occurring in (26) and the other half to estimate the distribution of each . We note that is approximated by taking in a regular grid of with bandwidth and choosing the approximation of as the largest value of the grid such that the estimated conditional probabilities in (26) are less than .
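The calibration step described above can be sketched in Python as follows. This is a simplified illustration, not the exact procedure of the paper: it assumes that the simulated null statistics are already available as two independent halves (one for the quantiles, one for the rejection probability), that each kernel in the collection receives the individual level u times a weight, and that the weights sum to at most one. The function names, the weights and the grid are hypothetical, and independent standard Gaussian draws stand in for the simulated null statistics.

```python
import math
import random


def empirical_quantile(sorted_values, prob):
    """Smallest order statistic whose empirical distribution function reaches prob."""
    b = len(sorted_values)
    k = min(max(math.ceil(prob * b), 1), b)
    return sorted_values[k - 1]


def calibrate_u(null_a, null_b, weights, alpha, grid):
    """Largest u in the grid whose estimated aggregated first kind error is <= alpha.

    null_a, null_b: lists of simulated null vectors, one coordinate per kernel.
    Returns (u, thresholds), or None if no grid value qualifies.
    """
    m = len(weights)
    columns = [sorted(row[k] for row in null_a) for k in range(m)]
    best = None
    for u in sorted(grid):
        # Individual critical values at level u * weight, from the first half.
        thresholds = [
            empirical_quantile(columns[k], 1.0 - u * weights[k]) for k in range(m)
        ]
        # Aggregated rejection frequency, from the second half.
        rejections = sum(
            any(row[k] > thresholds[k] for k in range(m)) for row in null_b
        )
        if rejections / len(null_b) <= alpha:
            best = (u, thresholds)
    return best


# Hypothetical stand-in for simulated null statistics of three kernels.
rng = random.Random(0)
half_a = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(400)]
half_b = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(400)]
weights = [1.0 / 3.0] * 3
grid = [0.01 * i for i in range(1, 31)]
print(calibrate_u(half_a, half_b, weights, alpha=0.05, grid=grid))
```

Using separate halves for the quantiles and for the rejection frequency keeps the two estimates independent, which mirrors the split described in the text.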
5.2 Simulation results.
We first study the probabilities of first kind error of each test. We run simulations of . For each simulation, we determine the conclusions of tests P, G and PG where the critical values are approximated by the Monte Carlo methods described above. The probabilities of first kind error of the tests are estimated by the number of rejections for these tests divided by . The obtained estimated levels of the tests and the corresponding confidence intervals (CI) are shown in Table 1.
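The estimation of a rejection probability from repeated simulations can be illustrated with a few lines of Python. The sketch below assumes a normal-approximation confidence interval at the 95% level (z = 1.96); the function name `rejection_rate_ci` and the counts in the usage example are hypothetical and are not taken from the simulation study.

```python
import math


def rejection_rate_ci(n_reject: int, n_sim: int, z: float = 1.96):
    """Estimated rejection probability with a normal-approximation interval."""
    p = n_reject / n_sim
    half_width = z * math.sqrt(p * (1.0 - p) / n_sim)
    # Clip the interval to [0, 1], since it estimates a probability.
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical counts: 252 rejections out of 5000 simulated samples.
estimate, lower, upper = rejection_rate_ci(252, 5000)
print(estimate, lower, upper)
```

When every simulated sample leads to a rejection, the estimated standard error is zero and this interval degenerates to a single point.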
We then study the probabilities of rejection for each test under several alternatives. We first consider the following alternative,
[TABLE]
with and . Second, we consider the alternative defined by
[TABLE]
with , and for all . Next, we consider the following alternative,
[TABLE]
with . The last alternative, for which we aim to compare our results with the results of Eubank and LaRiccia, (1993) is defined as follows
[TABLE]
where and .
For each alternative , we run 1000 simulations of . For each simulation, we determine the conclusions of tests P, G and PG, where the critical values of our tests are still approximated by the Monte Carlo method. The powers of the tests are estimated by the number of rejections divided by 1000. The obtained estimated powers of the tests and lower bounds of the asymptotic confidence intervals with the confidence level are reported in Tables 2, 3 and 4. Table 5 compares our tests with the two tests, denoted by EL1 and EL2, which were proposed in Eubank and LaRiccia, (1993). We briefly recall the tests , as follows.
[TABLE]
and
[TABLE]
where indicates summation excluding the zero index and are the sample Fourier coefficients,
[TABLE]
In the three alternatives , and , the test PG is more powerful than P and G tests. Our conclusion is that the test PG is a good choice in practice. In Table 5, we see in the first column (), which corresponds to the null hypothesis, that our test is of level , which is not the case for the tests proposed by Eubank and LaRiccia, (1993), which are only asymptotically of level . This explains why our test is generally less powerful than the tests EL1 and EL2 for . In the other cases, we obtain as good or better results.
Appendix A Proof of Proposition 2.1
Let us prove the first part of Proposition 2.1. Recall that denotes the quantile of which is the quantile of conditionally on . We here want to find a condition on , ensuring that
[TABLE]
From Markov’s inequality, we have for all
[TABLE]
Let us compute . We see that
[TABLE]
Then
[TABLE]
Since $\mathbb{E}\left[T_{K}^{2}\right]=\mathbb{E}\left[\mathbb{E}\left[T_{K}^{2} \mid X\right]\right]$, and since are i.i.d with density on , we obtain
[TABLE]
Thus
[TABLE]
In fact
[TABLE]
Then
[TABLE]
Replacing (38) into (37) we obtain
[TABLE]
Choosing , the above inequality leads to
[TABLE]
This implies
[TABLE]
Now we consider the term . Following Cochran’s theorem, we consider the orthogonal subspace of dimension . We denote by an orthogonal basis of , where for all , is a vector whose elements take the two values and , and whose values are equal to at the two positions and . On the other hand, for , with we have
[TABLE]
Using Cochran’s theorem, we have
[TABLE]
where and denotes a non central Chi-square variable with degrees of freedom and non centrality parameter .
Moreover,
[TABLE]
Hence
[TABLE]
Now, we consider the variable . Using Lemma 8.1 in Birgé, (2001), we have
[TABLE]
This implies
[TABLE]
Choosing , (41) leads to
[TABLE]
where
[TABLE]
Thus
[TABLE]
[TABLE]
with
[TABLE]
If we have .
Therefore, if
[TABLE]
then
[TABLE]
Let us now give an upper bound for . Reasoning conditionally on , we recognize in a Gaussian chaos, as defined by De la Pena and Giné, (2012), of the form , where ’s are some real deterministic numbers and is a sequence of i.i.d Gaussian variables. Corollary 3.26 of De la Pena and Giné, (2012) states that there exists some absolute constant such that if . Then
[TABLE]
Hence by Markov’s inequality,
[TABLE]
Applying the result (45) for with
[TABLE]
we have
[TABLE]
On the other hand, we have, under , where
[TABLE]
Since the variables , are i.i.d standard Gaussian variables, using Lemma 8.1 in Birgé, (2001), we obtain
[TABLE]
Choosing , (48) leads to
[TABLE]
Moreover, we have
[TABLE]
[TABLE]
This implies
[TABLE]
Thus the quantile of conditionally on satisfies
[TABLE]
Taking , so , (51) becomes
[TABLE]
Hence is upper bounded by the quantile of .
We define
[TABLE]
Using Markov’s inequality again for the nonnegative random variable , we obtain for any
[TABLE]
We have
[TABLE]
Choosing , (53) becomes
[TABLE]
and
[TABLE]
which concludes the proof.
Appendix B Proof of Theorem 2.2
For every symmetric kernel function , we have
[TABLE]
On the other hand
[TABLE]
Let be an upper bound for , we have
[TABLE]
From Proposition 2.1, the bounds for and and the inequality for all , we deduce that as soon as,
[TABLE]
By using the elementary inequality with and in the right hand side of the above condition, the above condition holds if
[TABLE]
Appendix C Proof of Corollary 3.1 and 3.3.
Under the hypothesis of Corollary 3.1,
[TABLE]
and the linear space generated by the functions is of dimension . Hence, we have
[TABLE]
Thus, we can take .
Second, under the choice of the Gaussian kernel defined by (8), we recall that
[TABLE]
where and is a positive bandwidth.
We have
[TABLE]
Hence, we can choose .
Appendix D Proof of Proposition 3.2
For all , let be the dimension of . Let us assume that ; this implies
[TABLE]
We obtain from Corollary 3.1 that there exists
[TABLE]
such that if
[TABLE]
In this case, we see that the right hand side of (21) reproduces a bias-variance decomposition close to the bias-variance decomposition for projection estimators, with the bias term and the variance term . The optimal choice of satisfies
[TABLE]
Thus, we obtain the optimal choice ,
[TABLE]
leading to the desired result.
Appendix E Proof of Proposition 3.4
Considering (3.3), we mainly have to find a sharp upper bound for when . Plancherel’s theorem gives that when ,
[TABLE]
We assume that and
[TABLE]
for some . There also exists some constant such that
[TABLE]
Then
[TABLE]
and since ,
[TABLE]
We obtain from Corollary 3.3 that there exists
[TABLE]
such that if
[TABLE]
In this case, we see that the right hand side of (59) reproduces a bias-variance decomposition with the bias term and the variance term . The optimal choice of satisfies
[TABLE]
Thus, we obtain the optimal choice as follows.
[TABLE]
leading to the desired result.
Appendix F Proof of Theorem 4.1, Corollary 4.2 and 4.4.
We have
[TABLE]
We have,
[TABLE]
by definition of , which implies that .
On the other hand, we know that . Setting , we have
[TABLE]
as soon as there exists in such that
[TABLE]
We can now apply Corollary 3.1 and 3.3 with , so we replace by to obtain the desired results in Corollary 4.2 and 4.4.
Appendix G Proof of Corollary 4.3.
Considering (31), we aim to find a sharp upper bound for the right hand side of the inequality when . Let us assume that . Then , we have
[TABLE]
and
[TABLE]
Hence (31) can be upper bounded by
[TABLE]
Taking
[TABLE]
[TABLE]
That leads to if
[TABLE]
Appendix H Proof of Corollary 4.5.
Considering (35), we aim to find a sharp upper bound for the right hand side of the inequality when . Let us assume that . Similarly to the proof of Proposition 3.4, we have
[TABLE]
and
[TABLE]
Hence (35) can be upper bounded by
[TABLE]
Choosing
[TABLE]
[TABLE]
That leads to if
[TABLE]
Acknowledgement
I gratefully thank Professor Béatrice Laurent of Institut National des Sciences Appliquées de Toulouse and Professor Jean-Michel Loubes of Institut de Mathématiques de Toulouse for their support and for their valuable ideas and comments.
References
- Bachoc, F., Gamboa, F., Loubes, J.-M., and Venet, N. (2017). A Gaussian process regression model for distribution inputs. IEEE Transactions on Information Theory.
- Baraud, Y. et al. (2002). Non-asymptotic minimax rates of testing in signal detection. Bernoulli, 8(5):577–606.
- Baraud, Y., Huet, S., Laurent, B., et al. (2003). Adaptive tests of linear hypotheses by model selection. The Annals of Statistics, 31(1):225–251.
- Birgé, L. (2001). An alternative point of view on Lepski’s method. Lecture Notes-Monograph Series, pages 113–133.
- Butucea, C. and Tribouley, K. (2006). Nonparametric homogeneity tests. Journal of Statistical Planning and Inference, 136(3):597–639.
- Castillo, I., Lévy-Leduc, C., and Matias, C. (2006). Exact adaptive estimation of the shape of a periodic function with unknown period corrupted by white noise. Mathematical Methods of Statistics, 15(2):146–175.
- De la Pena, V. and Giné, E. (2012). Decoupling: from dependence to independence. Springer Science & Business Media.
- Delgado, M. A. (1992). Testing the equality of nonparametric regression curves.
