Aggregated kernel based tests for signal detection in a regression model

Thi Thien Trang Bui

arXiv:1904.02965·math.ST·April 8, 2019

TL;DR

This paper introduces an aggregated kernel-based testing method for detecting signals in regression models, which is effective even with unknown variance and adapts to various alternative hypotheses.

Contribution

It proposes a novel aggregation approach for kernel-based tests that automatically selects kernels and parameters, ensuring adaptivity and non-asymptotic control.

Findings

01

The method achieves minimax adaptive testing over multiple classes of alternatives.

02

It provides non-asymptotic level-$\alpha$ tests with controlled error rates.

03

The aggregation procedure simplifies kernel choice and improves detection power.

Abstract

Considering a regression model, we address the question of testing the nullity of the regression function. The testing procedure is available when the variance of the observations is unknown and does not depend on any prior information on the alternative. We first propose a single testing procedure based on a general symmetric kernel and an estimation of the variance of the observations. The corresponding critical values are constructed to obtain non-asymptotic level-$\alpha$ tests. We then introduce an aggregation procedure to avoid the difficult choice of the kernel and of the parameters of the kernel. The multiple tests satisfy non-asymptotic properties and are adaptive in the minimax sense over several classes of regular alternatives.

Tables

Table 1: The probabilities of first kind error of the test for $\alpha=0.05$ and the upper and lower bounds of an asymptotic confidence interval with confidence level $99\%$ .

| Test | First kind error | CI |
|---|---|---|
| P | 0.0504 | $[0.033, 0.068]$ |
| G | 0.0506 | $[0.032, 0.068]$ |
| PG | 0.0498 | $[0.032, 0.0657]$ |

Table 2: The power of the test for the alternative $f_{1,a,\epsilon}$ corresponding to $\left(a,\epsilon\right)=\left(1/4,0.7\right),\ \left(1/4,0.9\right),\ \left(1/4,1\right),\ \left(1/8,1\right)$ and the upper and lower bounds of asymptotic confidence intervals with confidence level $99\%$ .

| Test | $(1/4, 0.7)$: $\hat{p}$ | $(1/4, 0.7)$: CI | $(1/4, 0.9)$: $\hat{p}$ | $(1/4, 0.9)$: CI | $(1/4, 1)$: $\hat{p}$ | $(1/4, 1)$: CI | $(1/8, 1)$: $\hat{p}$ | $(1/8, 1)$: CI |
|---|---|---|---|---|---|---|---|---|
| P | 0.876 | $[0.849, 0.903]$ | 0.986 | $[0.976, 0.996]$ | 0.996 | $[0.990, 1.001]$ | 0.699 | $[0.662, 0.736]$ |
| G | 0.831 | $[0.801, 0.861]$ | 0.977 | $[0.965, 0.989]$ | 0.992 | $[0.985, 0.999]$ | 0.635 | $[0.596, 0.674]$ |
| PG | 0.884 | $[0.858, 0.910]$ | 0.984 | $[0.973, 0.994]$ | 0.996 | $[0.991, 1.001]$ | 0.690 | $[0.652, 0.727]$ |

Table 3: The power of the test for the alternative $f_{2,\tau}$ corresponding to $\tau=1,2,3$ and the upper and lower bounds of asymptotic confidence intervals with confidence level $99\%$ .

| Test | $\tau=0.05$: $\hat{p}$ | $\tau=0.05$: CI | $\tau=0.1$: $\hat{p}$ | $\tau=0.1$: CI | $\tau=0.5$: $\hat{p}$ | $\tau=0.5$: CI |
|---|---|---|---|---|---|---|
| P | 0.218 | $[0.177, 0.243]$ | 0.654 | $[0.615, 0.693]$ | 1 | * |
| G | 0.208 | $[0.175, 0.241]$ | 0.668 | $[0.629, 0.704]$ | 1 | * |
| PG | 0.210 | $[0.177, 0.243]$ | 0.678 | $[0.639, 0.716]$ | 1 | * |

Table 4: The power of the test for the alternative $f_{3,c}$ corresponding to $c=1,2,3$ and the upper and lower bounds of asymptotic confidence intervals with confidence level $99\%$ .

| Test | $c=1$: $\hat{p}$ | $c=1$: CI | $c=2$: $\hat{p}$ | $c=2$: CI | $c=3$: $\hat{p}$ | $c=3$: CI |
|---|---|---|---|---|---|---|
| P | 0.35 | $[0.311, 0.389]$ | 0.90 | $[0.876, 0.924]$ | 0.98 | $[0.969, 0.991]$ |
| G | 0.56 | $[0.519, 0.600]$ | 0.98 | $[0.967, 0.991]$ | 1 | * |
| PG | 0.34 | $[0.301, 0.379]$ | 0.89 | $[0.864, 0.915]$ | 1 | * |

Table 5: The power of the test for the alternative $f_{4,\varrho,j}$ corresponding to $\varrho=0,0.5,1,1.5,\ j=1,2,3$ .

| $j$ | Test | $\varrho=0$ | $\varrho=0.5$ | $\varrho=1$ | $\varrho=1.5$ |
|---|---|---|---|---|---|
| $j=1$ | P | 0.049 | 0.606 | 1 | 1 |
| $j=1$ | G | 0.048 | 0.459 | 0.99 | 1 |
| $j=1$ | PG | 0.048 | 0.441 | 0.99 | 1 |
| $j=1$ | EL1 | 0.074 | 0.837 | 1 | 1 |
| $j=1$ | EL2 | 0.062 | 0.805 | 1 | 1 |
| $j=3$ | P | 0.053 | 0.224 | 0.905 | 1 |
| $j=3$ | G | 0.053 | 0.630 | 0.922 | 1 |
| $j=3$ | PG | 0.049 | 0.228 | 1 | 1 |
| $j=3$ | EL1 | 0.069 | 0.718 | 1 | 1 |
| $j=3$ | EL2 | 0.058 | 0.693 | 1 | 1 |
| $j=6$ | P | 0.043 | 0.134 | 0.696 | 0.990 |
| $j=6$ | G | 0.044 | 0.146 | 0.741 | 0.995 |
| $j=6$ | PG | 0.045 | 0.134 | 0.700 | 0.996 |
| $j=6$ | EL1 | 0.076 | 0.134 | 0.428 | 0.979 |
| $j=6$ | EL2 | 0.056 | 0.107 | 0.368 | 0.961 |

Equations

Y_{i} = f(X_{i}) + \sigma \epsilon_{i}, \quad i = 1, \dots, n.

Y^{\prime}_{i} = f\left(\frac{i}{n}\right) + \sigma \epsilon^{\prime}_{i}, \quad i = 1, \dots, n,

(H_{0}): f = 0,

(H_{1}): f \neq 0.

\int_{E^{2}} K^{2}(x, y) f(x) f(y) \, d\nu(x) \, d\nu(y) < +\infty.

V_{K} = \frac{T_{K}}{\widehat{\sigma}^{2}_{n}},

T_{K} = \frac{1}{n(n-1)} \sum_{i \neq j = 1}^{n} K(X_{i}, X_{j}) Y_{i} Y_{j}

\widehat{\sigma}^{2}_{n} = \frac{1}{n} \sum_{i=1}^{n/2} \left(Y^{\prime}_{2i-1} - Y^{\prime}_{2i}\right)^{2},

\mathbb{E}[T_{K}] = \mathbb{E}\left[\frac{1}{n(n-1)} \sum_{i \neq j = 1}^{n} K_{ij} f(X_{i}) f(X_{j})\right] = \int_{E^{2}} K(x, y) f(x) f(y) \, d\nu(x) \, d\nu(y).

K[f](x) = \int_{E} K(x, y) f(y) \, d\nu(y),

\langle f, g \rangle = \int_{E} f(x) g(x) \, d\nu(x) \quad \text{and} \quad \|f\|^{2} = \langle f, f \rangle.

\mathbb{E}(T_{K}) = \langle K[f], f \rangle,

\mathbb{E}[\widehat{\sigma}^{2}_{n}] = \frac{1}{n} \sum_{i=1}^{n/2} \left[f\left(\frac{2i-1}{n}\right) - f\left(\frac{2i}{n}\right)\right]^{2} + \frac{1}{n} \sum_{i=1}^{n/2} \sigma^{2} \, \mathbb{E}\left(\epsilon^{\prime}_{2i-1} - \epsilon^{\prime}_{2i}\right)^{2} = a^{2} + \sigma^{2},

K(x, y) = \sum_{\lambda \in \Lambda} \phi_{\lambda}(x) \phi_{\lambda}(y).

K[f](x) = \sum_{\lambda \in \Lambda} \left(\int_{0}^{1} \phi_{\lambda}(y) f(y) \, d\nu(y)\right) \phi_{\lambda}(x) = \Pi_{S}(f),

\mathbb{E}(T_{K}) = \langle \Pi_{S}(f), f \rangle.

K(x, y) = \frac{1}{h} k\left(\frac{x - y}{h}\right), \quad \text{for all } (x, y) \in \mathbb{R}^{2}

K[f](x) = \int_{-\infty}^{\infty} \frac{1}{h} k\left(\frac{x - y}{h}\right) f(y) \, d\nu(y) = k_{h} * f(x),

\mathbb{E}(T_{K}) = \langle k_{h} * f, f \rangle.

V_{K}^{(0)} = \frac{\frac{1}{n(n-1)} \sum_{i \neq j = 1}^{n} K(X_{i}, X_{j}) \epsilon_{i} \epsilon_{j}}{\frac{1}{n} \sum_{i=1}^{n/2} \left(\epsilon^{\prime}_{2i-1} - \epsilon^{\prime}_{2i}\right)^{2}}.

\Phi_{K, \alpha} = \mathbb{1}\left\{V_{K} > q^{(X)}_{K, 1-\alpha}\right\}.

\mathbb{P}_{(H_{0})}\left(V_{K} > q^{(X)}_{K, 1-\alpha} \,\Big|\, X\right) \leq \alpha.

\mathbb{P}_{(H_{0})}\left(\Phi_{K, \alpha} = 1\right) \leq \alpha.

\mathbb{P}_{f}\left(\Phi_{K, \alpha} = 0\right) \leq \mathbb{P}_{f}\left(V_{K} \leq q^{\alpha}_{K, 1-\beta/2}\right) + \beta/2.

\mathbb{P}_{f}\left(V_{K} \leq q^{\alpha}_{K, 1-\beta/2}\right) \leq \beta/2,

\langle K[f], f \rangle \geq \frac{16 A_{K} + 8 B_{K}}{\beta} + D_{n, \beta} \, q^{\alpha}_{K, 1-\beta/2},

A_{K}

Full text

**Aggregated kernel based tests for signal detection in a regression model**

Bui Thi Thien Trang 1

1* Institut de Mathématiques de Toulouse

Université Paul Sabatier 118, route de Narbonne F-31062 Toulouse Cedex 9

[email protected] *

Abstract. Considering a regression model, we address the question of testing the nullity of the regression function. The testing procedure is available when the variance of the observations is unknown and does not depend on any prior information on the alternative. We first propose a single testing procedure based on a general symmetric kernel and an estimation of the variance of the observations. The corresponding critical values are constructed to obtain non-asymptotic level-$\alpha$ tests. We then introduce an aggregation procedure to avoid the difficult choice of the kernel and of the parameters of the kernel. The multiple tests satisfy non-asymptotic properties and are adaptive in the minimax sense over several classes of regular alternatives.

Keywords. Separation rates, adaptive tests, regression model, kernel methods, aggregated test.

1 Introduction

We observe $\left(X_{i},Y_{i}\right)_{1\leq i\leq n}$ that obey the regression model described as follows,

$$Y_{i}=f(X_{i})+\sigma\epsilon_{i},\quad i=1,\dots,n. \tag{1}$$

We assume that $X=\left(X_{1},X_{2},\cdots,X_{n}\right)$ are i.i.d. real random variables with values in a measurable set $E$ such that $\left[0,1\right]\subset E\subset\mathbb{R}$ with bounded density $\nu$ with respect to the Lebesgue measure on $E$ and $\epsilon=\left(\epsilon_{1},\epsilon_{2},\cdots,\epsilon_{n}\right)$ are i.i.d. standard Gaussian variables, independent of $\left(X_{1},X_{2},\cdots,X_{n}\right)$ . All along the paper, $f$ is assumed to be in $\mathbb{L}^{2}\left(E,d\nu\right)$ . We also assume that $\|f\|_{\infty}=\sup_{x\in E}|f(x)|<+\infty$ . In order to estimate $\sigma^{2}$ , we assume that we also observe $\left(Y^{{}^{\prime}}_{1},\cdots,Y^{{}^{\prime}}_{n}\right)$ that obey the model

$$Y^{\prime}_{i}=f\left(\frac{i}{n}\right)+\sigma\epsilon^{\prime}_{i},\quad i=1,\dots,n, \tag{2}$$

where $\epsilon^{{}^{\prime}}=\left(\epsilon_{1}^{{}^{\prime}},\cdots,\epsilon_{n}^{{}^{\prime}}\right)$ is independent of $\left(X_{1},\cdots,X_{n},\epsilon_{1},\cdots,\epsilon_{n}\right)$ .

Given the observation of $\left(X_{i},Y_{i}\right)_{1\leq i\leq n},\ \left(Y^{{}^{\prime}}_{i}\right)_{1\leq i\leq n}$ , we want to test the null hypothesis

$$(H_{0}):\ f=0,$$

against the alternative

$$(H_{1}):\ f\neq 0.$$

Hypothesis testing in nonparametric regression has been considered in the papers by King, (1988), Hardle and Marron, (1990), Hall and Hart, (1990), King et al., (1991) and Delgado, (1992). Tests for no effect in nonparametric regression are investigated in Eubank and LaRiccia, (1993). In the paper of Spokoiny et al., (1996), the authors considered the particular case where $\sigma$ is assumed to be known. They propose tests that achieve the minimax rates of testing [up to an unavoidable $\log\log(n)$ factor] for a wide range of Besov classes. Baraud et al., (2003) propose a test, based on model selection methods, for testing in a fixed design regression model that $\left(f\left(X_{1}\right),\cdots,f\left(X_{n}\right)\right)$ belongs to a linear subspace of $\mathbb{R}^{n}$ against a nonparametric alternative. They obtain optimal rates of testing, up to a possible $\log n$ factor, over various classes of alternatives simultaneously. More recently, in a Poisson process framework, Fromont et al., (2012, 2013) consider two independent Poisson processes and address the question of testing equality of their respective intensities. They introduce tests based on a single kernel function and aggregate several kernel based tests to obtain adaptive minimax testing procedures over alternatives based on Besov or Sobolev balls.

In this work, we propose to construct aggregated kernel based testing procedures of $(H_{0})$ versus $(H_{1})$ in a regression model. Our test statistics are based on a single kernel function, which can be chosen either as a projection or a Gaussian kernel, and we propose an estimator of the unknown variance $\sigma^{2}$ . Our tests are exactly (and not only asymptotically) of level $\alpha$ . We obtain the optimal non-asymptotic conditions on the alternative which guarantee that the probability of second kind error is at most equal to a prescribed level $\beta$ . The testing procedures that we introduce hereafter are also intended to overcome the question of calibrating the choice of the kernel and/or the parameters of the kernel. They are based on an aggregation approach that is well known in adaptive testing (Baraud et al., (2003) and Fromont et al., (2013)). This paper is strongly inspired by the paper of Fromont et al., (2013). Instead of considering a particular single kernel, we consider a collection of kernels and the corresponding collection of tests, each with an adapted level of significance. We then reject the null hypothesis when at least one of the tests in the collection rejects the null hypothesis. The aggregated testing procedures are constructed to be of level $\alpha$ and the loss in second kind error due to the aggregation, when unavoidable, is as small as possible. Then we prove that these multiple tests satisfy adaptive minimax properties over several classes of alternatives. Finally, we compare our tests with the tests investigated in Eubank and LaRiccia, (1993) from a practical point of view.

The paper is organized as follows. In Section 2, we describe the single tests based on a single kernel function, with the corresponding critical values approximated by a Monte Carlo method. In Section 3, we specify the performances of the single tests for two particular examples of kernels and explain why we need to aggregate tests based on a collection of kernel functions; the aggregated tests are presented in Section 4. We present the simulation study in Section 5 and the major proofs are given in the Appendix.

2 Single tests based on a single kernel.

2.1 Definition of the testing procedure.

We assume that we observe $\{(X_{i},Y_{i})\}_{i=1}^{n}$ that obey model (1). In order to estimate the unknown variance $\sigma^{2}$ , we assume that we observe another sample $\left(Y^{{}^{\prime}}_{i}\right)_{1\leq i\leq n}$ from model (2). We are interested in testing the null hypothesis $(H_{0}):\ f=0$ against $(H_{1}):\ f\neq 0$ . Let $K$ be a symmetric kernel function: $E\times E\rightarrow\mathbb{R}$ satisfying:

Assumption 1.

$$\int_{E^{2}}K^{2}(x,y)f(x)f(y)\,d\nu(x)\,d\nu(y)<+\infty.$$

We introduce the test statistic $V_{K}$ defined as follows,

$$V_{K}=\frac{T_{K}}{\widehat{\sigma}^{2}_{n}},$$

where

$$T_{K}=\frac{1}{n(n-1)}\sum_{i\neq j=1}^{n}K(X_{i},X_{j})Y_{i}Y_{j}$$

and

$$\widehat{\sigma}^{2}_{n}=\frac{1}{n}\sum_{i=1}^{n/2}\left(Y^{\prime}_{2i-1}-Y^{\prime}_{2i}\right)^{2},$$

where for the sake of simplicity, we assume that $n$ is even. Let us now introduce some notations. We set $K_{ij}=K\left(X_{i},X_{j}\right),\ f_{i}=f(X_{i})$ and $C(a,b)$ is a constant depending on $a$ and $b$ , that will be used all along the paper and may vary from line to line.
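As a concrete illustration of these definitions, the following Python sketch computes $T_{K}$ , $\widehat{\sigma}^{2}_{n}$ and $V_{K}$ with NumPy. The signal `f`, the bandwidth `h=0.1`, the sample size and all function names are hypothetical choices made only for the example; they are not taken from the paper.

```python
import numpy as np

def gaussian_kernel(x, y, h):
    """K(x, y) = k((x - y) / h) / h, with k the standard normal density."""
    u = (x - y) / h
    return np.exp(-0.5 * u**2) / (h * np.sqrt(2.0 * np.pi))

def t_statistic(x, y, kernel):
    """T_K: sum of K(X_i, X_j) Y_i Y_j over pairs i != j, divided by n(n-1)."""
    n = len(x)
    k_mat = kernel(x[:, None], x[None, :])
    np.fill_diagonal(k_mat, 0.0)              # drop the i == j terms
    return float(y @ k_mat @ y) / (n * (n - 1))

def variance_estimate(y_prime):
    """sigma_hat_n^2 = (1/n) * sum_{i=1}^{n/2} (Y'_{2i-1} - Y'_{2i})^2, n even."""
    n = len(y_prime)
    diffs = y_prime[0::2] - y_prime[1::2]     # consecutive pairs (1,2), (3,4), ...
    return float(np.sum(diffs**2)) / n

def v_statistic(x, y, y_prime, kernel):
    """V_K = T_K / sigma_hat_n^2."""
    return t_statistic(x, y, kernel) / variance_estimate(y_prime)

# Hypothetical data: uniform design on [0, 1], sinusoidal signal, sigma = 0.5.
rng = np.random.default_rng(0)
n, sigma = 200, 0.5
f = lambda t: np.sin(2.0 * np.pi * t)
x = rng.uniform(0.0, 1.0, n)
y = f(x) + sigma * rng.standard_normal(n)
y_prime = f(np.arange(1, n + 1) / n) + sigma * rng.standard_normal(n)
kernel = lambda a, b: gaussian_kernel(a, b, h=0.1)
print(v_statistic(x, y, y_prime, kernel))
```

The kernel matrix is built once and its diagonal is zeroed, so the quadratic form `y @ k_mat @ y` is exactly the sum over $i\neq j$ .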

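The expectation of $T_{K}$ is equal to

$$\mathbb{E}[T_{K}]=\mathbb{E}\left[\frac{1}{n(n-1)}\sum_{i\neq j=1}^{n}K_{ij}f(X_{i})f(X_{j})\right]=\int_{E^{2}}K(x,y)f(x)f(y)\,d\nu(x)\,d\nu(y).$$

In the following, we denote for all $x\in E$ ,

$$K[f](x)=\int_{E}K(x,y)f(y)\,d\nu(y),$$

and for all $f,g\in\mathbb{L}^{2}(E,d\nu)$

$$\langle f,g\rangle=\int_{E}f(x)g(x)\,d\nu(x)\quad\text{and}\quad\|f\|^{2}=\langle f,f\rangle.$$

With this notation,

$$\mathbb{E}(T_{K})=\langle K[f],f\rangle,$$

whose existence is ensured by Assumption 1. We now compute the expectation of $\widehat{\sigma}^{2}_{n}$ .

$$\mathbb{E}[\widehat{\sigma}^{2}_{n}]=\frac{1}{n}\sum_{i=1}^{n/2}\left[f\left(\frac{2i-1}{n}\right)-f\left(\frac{2i}{n}\right)\right]^{2}+\frac{1}{n}\sum_{i=1}^{n/2}\sigma^{2}\,\mathbb{E}\left(\epsilon^{\prime}_{2i-1}-\epsilon^{\prime}_{2i}\right)^{2}=a^{2}+\sigma^{2},$$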

with $a^{2}:=\frac{1}{n}\sum_{i=1}^{n/2}\left[f\left(\frac{2i-1}{n}\right)-f\left(\frac{2i}{n}\right)\right]^{2}$ .

Thus $\widehat{\sigma}^{2}_{n}$ is a biased estimator of $\sigma^{2}$ with bias $a^{2}$ . If $f$ is a regular function this bias will be small.
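To see how small the bias $a^{2}$ is for a regular function, here is a short Python sketch. The test function $\sin(2\pi t)$ and the helper name `bias_a2` are assumptions made for illustration. For an $L$-Lipschitz $f$ each squared difference is at most $(L/n)^{2}$ , so $a^{2}\leq L^{2}/(2n^{2})$ ; the code checks this bound numerically.

```python
import numpy as np

def bias_a2(f, n):
    """a^2 = (1/n) * sum_{i=1}^{n/2} [f((2i-1)/n) - f(2i/n)]^2 for even n."""
    i = np.arange(1, n // 2 + 1)
    return float(np.sum((f((2 * i - 1) / n) - f(2 * i / n)) ** 2)) / n

# Hypothetical regular signal; its Lipschitz constant on [0, 1] is 2 * pi.
smooth = lambda t: np.sin(2.0 * np.pi * t)
lipschitz = 2.0 * np.pi

for n in (100, 1000, 10000):
    bound = lipschitz**2 / (2.0 * n**2)
    print(n, bias_a2(smooth, n), bound)
```

In this example the bias decays like $1/n^{2}$ , which is negligible next to $\sigma^{2}$ for moderate $n$ .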

We have chosen to consider and study in this paper two possible examples of kernel functions. For each example, we give a simpler expression of $\mathbb{E}\left(T_{K}\right)$ .

Example 1. When $E=[0,1]$ , our first choice for $K$ is a symmetric kernel function based on a finite orthonormal family $\{\phi_{\lambda},\ \lambda\in\Lambda\}$ with respect to the scalar product $\langle.,.\rangle$ ,

$$K(x,y)=\sum_{\lambda\in\Lambda}\phi_{\lambda}(x)\phi_{\lambda}(y). \tag{7}$$

For all $f$ in $\mathbb{L}^{2}([0,1],d\nu)$ we get

$$K[f](x)=\sum_{\lambda\in\Lambda}\left(\int_{0}^{1}\phi_{\lambda}(y)f(y)\,d\nu(y)\right)\phi_{\lambda}(x)=\Pi_{S}(f),$$

where $S$ is the subspace of $\mathbb{L}^{2}([0,1],d\nu)$ generated by the functions $\{\phi_{\lambda},\ \lambda\in\Lambda\}$ and $\Pi_{S}$ denotes the orthogonal projection onto $S$ for $\langle.,.\rangle$ . Thus

$$\mathbb{E}(T_{K})=\langle\Pi_{S}(f),f\rangle.$$

Hence, when $\{\phi_{\lambda},\ \lambda\in\Lambda\}$ is well-chosen, $T_{K}$ can also be viewed as a relevant estimator of $\left\lVert f\right\rVert^{2}$ .

Example 2. When $E=\mathbb{R}$ and $\nu$ is a density function with respect to the Lebesgue measure on $\mathbb{R}$ , our second choice for $K$ is a Gaussian kernel defined by,

$$K(x,y)=\frac{1}{h}k\left(\frac{x-y}{h}\right),\quad\text{for all }(x,y)\in\mathbb{R}^{2} \tag{8}$$

where $k(u)=\frac{1}{\sqrt{2\pi}}\exp\left(-u^{2}/2\right),\ \text{for all}\ u\in\mathbb{R}$ and $h$ is a positive bandwidth. Then, for all $f\in\mathbb{L}^{2}(\mathbb{R},d\nu)$ we have

$$K[f](x)=\int_{-\infty}^{\infty}\frac{1}{h}k\left(\frac{x-y}{h}\right)f(y)\,d\nu(y)=k_{h}\ast f(x),$$

where $\ast$ is the convolution product with respect to the measure $\nu$ and $k_{h}(u)=\frac{1}{h}k\left(\frac{u}{h}\right),\ \forall u\in\mathbb{R}$ . Thus in this case

$$\mathbb{E}(T_{K})=\langle k_{h}\ast f,f\rangle.$$

Hence, when the bandwidth $h$ is well chosen, $T_{K}$ can also be viewed as a relevant estimator of $\|f\|^{2}$ .

From the two examples above, we have seen that the test statistic $V_{K}$ can be viewed as a relevant estimator of $\|f\|^{2}$ . Thus, it seems reasonable to consider a test which rejects $(H_{0})$ when $V_{K}$ is "large enough". Now, we define the critical values used in our tests.

We define

$$V_{K}^{(0)}=\frac{\frac{1}{n(n-1)}\sum_{i\neq j=1}^{n}K(X_{i},X_{j})\epsilon_{i}\epsilon_{j}}{\frac{1}{n}\sum_{i=1}^{n/2}\left(\epsilon^{\prime}_{2i-1}-\epsilon^{\prime}_{2i}\right)^{2}}. \tag{9}$$

Note that, under $\left(H_{0}\right)$ , conditionally on $X$ , $V_{K}$ and $V_{K}^{\left(0\right)}$ have exactly the same distribution. We now choose the quantile of the conditional distribution of $V_{K}^{\left(0\right)}$ given $X$ as the critical value for our test. This quantity can easily be estimated by simulations.

More precisely, for $\alpha$ in $(0,1)$ , if $q^{(X)}_{K,1-\alpha}$ denotes the $(1-\alpha)$ quantile of the distribution of $V_{K}^{\left(0\right)}$ conditionally on $X$ , we consider the test that rejects $(H_{0})$ when $V_{K}>q^{(X)}_{K,1-\alpha}$ . The corresponding test function is defined by

$$\Phi_{K,\alpha}=\mathbb{1}\left\{V_{K}>q^{(X)}_{K,1-\alpha}\right\}. \tag{10}$$

Notice that in practice, the true quantile $q^{(X)}_{K,1-\alpha}$ is not available, but it can be approximated by a Monte Carlo procedure.
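A minimal Python sketch of such a Monte Carlo approximation is given below. It simulates copies of $V_{K}^{(0)}$ conditionally on $X$ and takes an empirical quantile. The quantile convention (smallest order statistic $t$ with empirical distribution function at least $1-\alpha$ ), the Gaussian kernel with `h = 0.1`, and the function names are assumptions made for illustration, not details confirmed by the text.

```python
import numpy as np

def v0_samples(k_mat, n_sim, rng):
    """Draw n_sim copies of V_K^(0) given X; k_mat must have a zero diagonal."""
    n = k_mat.shape[0]
    eps = rng.standard_normal((n_sim, n))          # plays the role of epsilon
    eps_prime = rng.standard_normal((n_sim, n))    # plays the role of epsilon'
    numer = np.sum((eps @ k_mat) * eps, axis=1) / (n * (n - 1))
    denom = np.sum((eps_prime[:, 0::2] - eps_prime[:, 1::2]) ** 2, axis=1) / n
    return numer / denom

def mc_quantile(samples, alpha):
    """Smallest order statistic t whose empirical CDF value is >= 1 - alpha."""
    ordered = np.sort(samples)
    idx = int(np.ceil((1.0 - alpha) * len(ordered))) - 1
    return ordered[min(max(idx, 0), len(ordered) - 1)]

# Hypothetical design and kernel matrix (Gaussian kernel, bandwidth 0.1).
rng = np.random.default_rng(1)
n, h, alpha = 100, 0.1, 0.05
x = rng.uniform(0.0, 1.0, n)
k_mat = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2) / (h * np.sqrt(2.0 * np.pi))
np.fill_diagonal(k_mat, 0.0)

samples = v0_samples(k_mat, n_sim=2000, rng=rng)
q_hat = mc_quantile(samples, alpha)
print(q_hat)
```

The single test then rejects $(H_{0})$ when the observed $V_{K}$ exceeds `q_hat`. Note that $\sigma$ never enters the simulation, because it cancels in the ratio under $(H_{0})$ .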

2.2 Probabilities of first and second kind errors of the test.

Since under $(H_{0})$ , $V_{K}$ and $V_{K}^{\left(0\right)}$ have the same distribution conditionally on $X$ , for any $\alpha\in(0,1)$ , we have

$$\mathbb{P}_{(H_{0})}\left(V_{K}>q^{(X)}_{K,1-\alpha}\,\Big|\,X\right)\leq\alpha.$$

By taking the expectation over $X$ , we obtain

$$\mathbb{P}_{(H_{0})}\left(\Phi_{K,\alpha}=1\right)\leq\alpha.$$

Let us now consider an alternative hypothesis, corresponding to a non zero regression function $f$ . Given $\beta$ in $(0,1)$ , we now aim to determine a non-asymptotic condition on the regression function $f$ which guarantees that $\mathbb{P}_{f}(\Phi_{K,\alpha}=0)\leq\beta$ . Denoting by $q^{\alpha}_{K,1-\beta/2}$ the $(1-\beta/2)$ quantile of the conditional quantile $q^{(X)}_{K,1-\alpha}$ ,

$$\mathbb{P}_{f}\left(\Phi_{K,\alpha}=0\right)\leq\mathbb{P}_{f}\left(V_{K}\leq q^{\alpha}_{K,1-\beta/2}\right)+\beta/2.$$

Thus, a condition which guarantees that $\mathbb{P}_{f}\left(V_{K}\leq q^{\alpha}_{K,1-\beta/2}\right)\leq\beta/2$ will ensure that $\mathbb{P}_{f}(\Phi_{K,\alpha}=0)\leq\beta$ . The following proposition gives such a condition.

Proposition 2.1.

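Let $\alpha,\ \beta$ be fixed levels in $(0,1)$ . We have that

$$\mathbb{P}_{f}\left(V_{K}\leq q^{\alpha}_{K,1-\beta/2}\right)\leq\beta/2,$$

as soon as

$$\langle K[f],f\rangle\geq\frac{16A_{K}+8B_{K}}{\beta}+D_{n,\beta}\,q^{\alpha}_{K,1-\beta/2}, \tag{11}$$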

with

[TABLE]

Thus we have, under (11),

[TABLE]

Moreover, there exists some constant $\kappa>0$ such that, for every $K$ and $n\geq 32\ln(2/\alpha)$

[TABLE]

To prove the first part of this result, we simply use Markov’s inequality for the term $T_{K}$ and an exponential inequality for non-central Chi-square variables due to Birgé, (2001) for the term $\hat{\sigma}^{2}_{n}$ . The control of $q^{\alpha}_{K,1-\beta/2}$ derives from a property of Gaussian chaoses combined with an exponential inequality (due to De la Pena and Giné, (2012) and Huskova and Janssen, (1993)). The detailed proof is given in the Appendix.

The following theorem gives a condition on $\|f\|^{2}$ for the test to be powerful.

Theorem 2.2.

Let $\alpha,\ \beta$ be fixed levels in $(0,1)$ , $\kappa$ be a positive constant, $K$ be a symmetric kernel function, and $\Phi_{K,\alpha}$ be the test defined by (10). Let $C_{K}$ be an upper bound for $\int_{E^{2}}K^{2}(x,y)d\nu(x)d\nu(y)$ . Then for all $n\geq 32\ln(2/\alpha)$ , we have $\mathbb{P}_{f}(\Phi_{K,\alpha}=0)\leq\beta$ , as soon as

[TABLE]

The right hand side of the above inequality corresponds to a bias-variance trade-off. For particular choices of the kernel function $K$ , these terms will be upper bounded in Section 3.

2.3 Performance of the Monte Carlo approximation.

In this section, we introduce a Monte Carlo method used to approximate the conditional quantiles $q^{(X)}_{K,1-\alpha}$ by $\hat{q}^{(X)}_{K,1-\alpha}$ as follows. We consider the set of $2B$ independent sequences of i.i.d standard Gaussian variables

[TABLE]

where $\epsilon^{b}=\{\epsilon^{b}_{i}\}_{i=1}^{n}$ , $\epsilon^{{}^{\prime}b}=\{\epsilon^{{}^{\prime}b}_{i}\}_{i=1}^{n}$ , $1\leq b\leq B$ .

We define

[TABLE]

where $X=\left(X_{1},\cdots,X_{n}\right)$ are observed from model (1).

Under $(H_{0})$ , conditionally on $X$ , the variables $V_{K}^{\left(\epsilon^{b},\epsilon^{{}^{\prime}b}\right)}$ have the same distribution function as $V_{K}$ and as $V_{K}^{\left(0\right)}$ . We denote by $F_{K,B}$ the empirical distribution function of the sample $\left\{V_{K}^{\left(\epsilon^{b},\epsilon^{{}^{\prime}b}\right)},\ 1\leq b\leq B\right\}$ , conditionally on $X$ .

[TABLE]

Then the Monte Carlo approximation of $q^{(X)}_{K,1-\alpha}$ is defined by

[TABLE]

We recall the test function defined in (10) and we reject $(H_{0})$ when $V_{K}>q^{(X)}_{K,1-\alpha}$ with $q^{(X)}_{K,1-\alpha}$ the $(1-\alpha)$ quantile of $V_{K}^{\left(0\right)}$ defined by (9) conditionally on $X$ . Now, by using the estimated quantile $\hat{q}^{(X)}_{K,1-\alpha}$ , we consider the test given by

[TABLE]

For the test defined in (14), the probabilities of first and second kind errors can be upper bounded. This is the purpose of the two following propositions, whose proofs are given in Fromont et al., (2013).

Proposition 2.3.

Let $\alpha$ be some fixed level in $(0,1)$ , and $\widehat{\Phi}_{K,\alpha}$ be the test defined by (14). Then,

[TABLE]

Proposition 2.4.

Let $\alpha$ and $\beta$ be fixed levels in $(0,1)$ such that $\alpha_{B}=\alpha-\sqrt{\ln B/(2B)}>0$ and $\beta_{B}=\beta-2/B>0$ . Let $\widehat{\Phi}_{K,\alpha}$ be the test given in (14). Let $A_{K},B_{K},D_{n,\beta}$ and $\kappa$ be as in Proposition 2.1, and let $q^{\alpha_{B}}_{K,1-\beta_{B}/2}$ be the $(1-\beta_{B}/2)$ quantile of $q^{(X)}_{K,1-\alpha_{B}}$ . If

[TABLE]

then $\mathbb{P}_{f}\left(\widehat{\Phi}_{K,\alpha}=0\right)\leq\beta$ . Moreover,

[TABLE]

Comments. When comparing (15) and (16) with (11) and (12) in Proposition 2.1, we notice that they asymptotically coincide when $B\rightarrow+\infty$ . Moreover, if $\alpha=\beta=0.05$ and $B\geq 6000$ , the multiplicative factor of $\kappa n\sqrt{B_{K}}$ is of order $1.2$ in (16) compared with (12).
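The corrected levels $\alpha_{B}$ and $\beta_{B}$ of Proposition 2.4 are easy to evaluate. The Python sketch below (the function name `corrected_levels` is a hypothetical helper) computes them for a given number $B$ of Monte Carlo draws and illustrates why $B$ must be fairly large before $\alpha_{B}$ becomes positive.

```python
import math

def corrected_levels(alpha, beta, n_mc):
    """Return (alpha_B, beta_B) with
    alpha_B = alpha - sqrt(ln(B) / (2B)) and beta_B = beta - 2 / B."""
    alpha_b = alpha - math.sqrt(math.log(n_mc) / (2.0 * n_mc))
    beta_b = beta - 2.0 / n_mc
    return alpha_b, beta_b

for n_mc in (200, 6000, 100000):
    print(n_mc, corrected_levels(0.05, 0.05, n_mc))
```

With $\alpha=\beta=0.05$ , the correction $\sqrt{\ln B/(2B)}$ exceeds $\alpha$ for $B=200$ , so $\alpha_{B}$ is negative there, while for $B=6000$ it is roughly $0.023$ . Both levels increase towards $\alpha$ and $\beta$ as $B\rightarrow+\infty$ .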

3 Two particular examples of kernel function.

In this section, we specify the performances of the above test for two examples of kernels: projection kernels and Gaussian kernels.

3.1 Projection kernels.

We assume $E=[0,1]$ . We consider the projection kernel defined in (7) and aim to give a more explicit formulation for the result of Theorem 2.2 under the choice of this kernel. We also evaluate the uniform separation rates over Besov bodies.

Corollary 3.1.

Let $\alpha,\ \beta\in(0,1)$ and $\kappa>0$ be a constant. Let $\Phi_{K,\alpha}$ be defined in (10), where $K$ is the projection kernel defined by (7). We denote by $S$ the linear subspace of $\mathbb{L}^{2}(\left[0,1\right],d\nu)$ , generated by the functions $\{\phi_{\lambda},\ \lambda\in\Lambda\}$ , and we assume that the dimension of $S$ is equal to $D$ . Then, for $n\geq 32\ln(2/\alpha)$ , if

[TABLE]

then

[TABLE]

Let us consider the particular case when the kernel $K$ is the projection kernel onto the space generated by functions of the Haar basis defined as follows.

Let $\{\phi_{0},\ \phi_{\left(j,k\right)},\ j\in\mathbb{N},k\in\{0,\cdots,2^{j}-1\}\}$ be the Haar basis of $\mathbb{L}^{2}([0,1])$ with

[TABLE]

where $\psi(x)=\mathbbm{1}_{\left[0,1/2\right)}(x)-\mathbbm{1}_{\left[1/2,1\right]}(x)$ . The linear subspace $S$ is generated by a subset of the Haar basis. More precisely, we denote by $S_{0}$ the subspace of $\mathbb{L}^{2}([0,1])$ generated by $\phi_{0}$ , and we define

[TABLE]

We also consider, for $J\geq 1$ the subspace $S_{J}$ generated by $\{\phi_{\lambda},\ \lambda\in\{0\}\cup\Lambda_{J}\}$ with $\Lambda_{J}=\{(j,k),\ j\in\{0,\cdots,J-1\},\ k\in\{0,\cdots,2^{j}-1\}\}$ , and

[TABLE]

We set $\alpha_{0}=\left\langle f,\phi_{0}\right\rangle$ and for every $j\in\mathbb{N},\ k\in\{0,\cdots,2^{j}-1\}$ , $\alpha_{j,k}=\left\langle f,\phi_{j,k}\right\rangle$ .
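The Haar projection kernel can be sketched in Python as follows. The code assumes the standard Haar normalisation $\phi_{0}=\mathbb{1}_{[0,1]}$ and $\phi_{(j,k)}(x)=2^{j/2}\psi(2^{j}x-k)$ , and a uniform density $\nu$ on $[0,1]$ so that the family is orthonormal; both are assumptions, since the displayed definitions are not reproduced here. The function names are hypothetical. As a sanity check, the kernel built from levels $j<J$ should coincide with $2^{J}$ times the indicator that $x$ and $y$ fall in the same dyadic interval of length $2^{-J}$ .

```python
import numpy as np

def psi(u):
    """Mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere.
    The right endpoint is left open here to avoid overlaps between cells."""
    u = np.asarray(u, dtype=float)
    left = np.where((u >= 0.0) & (u < 0.5), 1.0, 0.0)
    right = np.where((u >= 0.5) & (u < 1.0), 1.0, 0.0)
    return left - right

def haar_kernel(x, y, J):
    """K_J(x, y) = phi_0(x) phi_0(y) + sum_{j < J} sum_k phi_{j,k}(x) phi_{j,k}(y)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    total = np.ones(np.broadcast(x, y).shape)      # phi_0 = 1 on [0, 1]
    for j in range(J):
        for k in range(2**j):
            total = total + 2.0**j * psi(2.0**j * x - k) * psi(2.0**j * y - k)
    return total

# Check against the closed form on random points of [0, 1).
rng = np.random.default_rng(0)
xs, ys = rng.uniform(0.0, 1.0, 50), rng.uniform(0.0, 1.0, 50)
J = 3
closed_form = 2.0**J * (np.floor(2**J * xs) == np.floor(2**J * ys))
print(np.allclose(haar_kernel(xs, ys, J), closed_form))
```

The closed form makes the role of $J$ explicit: the kernel averages over dyadic cells of width $2^{-J}$ , so larger $J$ reduces the bias and increases the dimension $2^{J}$ of $S_{J}$ .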

We now introduce the Besov body defined for $\delta>0,\ R>0$ by

[TABLE]

For all $J\geq 0$ , we consider the kernel function $K_{J}$ defined by (18), (19) and the associated test function $\Phi_{K_{J},\alpha}$ defined in (10) with $K=K_{J}$ . For an optimal choice of $J$ , realizing a good compromise between the bias term and the variance term appearing in (2.2), we give a condition on $\|f\|^{2}$ for $f\in\mathcal{B}_{2,\infty}^{\delta}(R)$ which ensures that the power of our test is larger than $1-\beta$ .

Proposition 3.2.

Let $\alpha,\ \beta\in(0,1)$ . For all $J\geq 0$ , let $K_{J}$ be defined by (18), (19) and consider the test function $\Phi_{{K_{J^{*}},\alpha}}=\mathbbm{1}\{V_{K_{J^{*}}}>q^{(X)}_{K_{J^{*}},1-\alpha}\}$ where

[TABLE]

For all $f\in\mathcal{B}_{2,\infty}^{\delta}(R)$ such that

[TABLE]

we have $\mathbb{P}_{f}(\Phi_{K_{J^{*}},\alpha}=0)\leq\beta$ .

Comments.

1. Non-asymptotic lower bounds for the rates of testing in signal detection over Besov bodies are given in Baraud et al., (2002). These lower bounds coincide with the bound given in (21), hence our result is sharp.

2. In (20), $J^{*}$ depends on $\delta$ , the regularity parameter of the Besov body, so it leads to the natural question of the choice of this parameter. In order to propose a procedure that is adaptive with respect to the regularity of the unknown regression function $f$ , we introduce aggregated tests in Section 4.

3.2 Gaussian kernels.

For this second example, we assume that $E=\mathbb{R}$ . We consider the Gaussian kernel defined in (8) and rewrite the result of Theorem 2.2 under the choice of this kernel. We also evaluate the uniform separation rates over Sobolev balls for this test.

Corollary 3.3.

Let $\alpha,\beta\in(0,1)$ , $\kappa>0$ be a constant and $\Phi_{K,\alpha}$ be the test function defined in (10) where $K$ is defined in (8). For $n\geq 32\ln(2/\alpha)$ if

[TABLE]

then we obtain that

[TABLE]

Let $E=\mathbb{R}$ and $\mathcal{L}=\mathbb{N}^{*}$ . For $x,\ y$ in $\mathbb{R}$ and $h=2^{-l}$ , for all $l\in\mathcal{L}$ , we consider

[TABLE]

with

[TABLE]

Let us introduce for $\delta>0$ the Sobolev ball $\mathcal{S}^{\delta}(R)$ defined by

[TABLE]

where $\hat{s}$ denotes the Fourier transform of $s$ : $\hat{s}(u)=\int_{\mathbb{R}}s(x)e^{i\langle x,u\rangle}dx$ .

For all $l\in\mathcal{L}$ , we consider the kernel function $K_{l}$ defined by (23) and the associated test function $\Phi_{K_{l},\alpha}$ defined in (10) with $K=K_{l}$ . For an optimal choice of $l$ , realizing a good compromise between the bias term and the variance term appearing in (3.3), we give a condition on $\|f\|^{2}$ for $f\in\mathcal{S}^{\delta}(R)$ which ensures that the power of our test is larger than $1-\beta$ .

Proposition 3.4.

Let $\alpha,\ \beta\in(0,1)$ . For all $l\in\mathcal{L}$ , let $K_{l}$ be defined by (23) and consider the test function $\Phi_{K_{l},\alpha}=\mathbbm{1}\{V_{K_{l}}>q^{(X)}_{K_{l},1-\alpha}\}$ . We set

[TABLE]

For all $f\in\mathcal{S}^{\delta}(R)$ such that

[TABLE]

we have $\mathbb{P}_{f}(\Phi_{K_{l^{*}},\alpha}=0)\leq\beta$ .

Comments.

1. As in Proposition 3.2, we obtain in the right hand term of (25) a classical bound for the separation rates of testing over regular classes of alternatives such as Hölderian balls (see Ingster, (1993) for nonparametric minimax rates of testing in various setups).

2. Non-asymptotic lower bounds for the rates of testing in signal detection over Sobolev balls are given in Fromont and Lévy-Leduc, (2006). These bounds coincide with the bound given in (25).

3. In (24), as previously, $l^{*}$ depends on $\delta$ , the regularity parameter of the Sobolev ball, so it leads to the natural question of the choice of this parameter, which is answered through the aggregated tests in Section 4.

4 Multiple or aggregated tests based on collections of kernel functions.

In the previous section, we have considered testing procedures based on a single kernel function $K$ . However, the following question is natural: how should we choose the kernel and its parameters, for instance the orthonormal family for the projection kernel of Section 3.1 or the bandwidth $h$ for the Gaussian kernel of Section 3.2? Baraud et al., (2003) proposed adaptive testing procedures based on the aggregation of a collection of tests. This idea is presented in a series of papers, among which Fromont et al., (2013) proposed an aggregation procedure. Following this idea, we consider in this section a collection of kernel functions instead of a single one. In addition, we define a multiple testing procedure by aggregating the corresponding single tests, with an adapted choice of the critical values.

4.1 The aggregated testing procedure.

Let us describe the aggregated testing procedure by introducing a finite collection $\left\{K_{m},\ m\in\mathcal{M}\right\}$ of symmetric kernel functions: $E\times E\rightarrow\mathbb{R}$ . For $m\in\mathcal{M}$ , we replace $K$ in (3) and (9) by $K_{m}$ to define $V_{K_{m}}$ and $V_{K_{m}}^{\left(0\right)}$ and let $\left\{w_{m},\ m\in\mathcal{M}\right\}$ be a collection of positive numbers such that $\sum_{m\in\mathcal{M}}e^{-w_{m}}\leq 1$ . Conditionally on $X$ , for $u\in(0,1)$ , we denote by $q^{(X)}_{m,1-u}$ the $(1-u)$ quantile of $V_{K_{m}}^{\left(0\right)}$ . Given $\alpha$ in $(0,1)$ , we consider the test which rejects $(H_{0})$ when there exists at least one $m$ in $\mathcal{M}$ such that

[TABLE]

where $u_{\alpha}^{(X)}$ is defined by

[TABLE]

We consider the test function $\Phi_{\alpha}$ defined by

[TABLE]

Using the Monte Carlo method, we can estimate $u_{\alpha}^{(X)}$ and the quantiles $q^{(X)}_{m,1-u_{\alpha}^{(X)}e^{-w_{m}}}$ for all $m\in\mathcal{M}$ . The following theorem provides a control of the first and second kind errors for the test $\Phi_{\alpha}$ . The detailed proof is given in the Appendix.

Theorem 4.1.

Let $\alpha,\beta$ be fixed levels in $(0,1)$ and $\Phi_{\alpha}$ be the test defined by (27). We have

[TABLE]

Moreover, for every regression function $f$ , we have

[TABLE]

as soon as there exists $m$ in $\mathcal{M}$ such that

[TABLE]

Comments. This theorem shows that the aggregated test is of level $\alpha$ , for all $n$ . Moreover, as soon as the second kind error is controlled by $\beta$ for at least one test in the collection, the same holds for the aggregated procedure with the price that the level $\alpha$ is replaced by $u_{\alpha}^{(X)}e^{-w_{m}}$ to guarantee that the aggregated procedure is of level $\alpha$ .
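The calibration of the aggregated test can be sketched in Python. This sketch assumes, following the construction in Fromont et al., (2013) that the paper takes as its model, that $u_{\alpha}^{(X)}$ is the largest $u$ such that, conditionally on $X$ , the probability under $(H_{0})$ that at least one $V_{K_{m}}^{(0)}$ exceeds its quantile $q^{(X)}_{m,1-ue^{-w_{m}}}$ is at most $\alpha$ . The Gaussian kernel collection, the weights, the reuse of one set of simulations for both the quantiles and $u$ , and all function names are illustrative assumptions.

```python
import numpy as np

def upper_quantile(samples, level):
    """Smallest order statistic t with empirical P(sample > t) <= level."""
    ordered = np.sort(samples)
    idx = int(np.ceil((1.0 - level) * len(ordered))) - 1
    return ordered[min(max(idx, 0), len(ordered) - 1)]

def rejection_rate(sims, weights, u):
    """Fraction of simulated null draws rejected by at least one single test."""
    thresholds = np.array([upper_quantile(s, u * np.exp(-w)) for s, w in zip(sims, weights)])
    return float(np.mean(np.any(sims > thresholds[:, None], axis=0)))

def calibrate_u(sims, weights, alpha, n_iter=30):
    """Bisection for the largest u whose simulated rejection rate is <= alpha.
    u = alpha is always feasible by a union bound, since sum_m exp(-w_m) <= 1."""
    lo, hi = alpha, float(np.exp(np.min(weights)))
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if rejection_rate(sims, weights, mid) <= alpha:
            lo = mid
        else:
            hi = mid
    return lo

# Hypothetical collection: Gaussian kernels with bandwidths 2^{-l}, l = 1, 2, 3.
rng = np.random.default_rng(0)
n, n_sim, alpha = 60, 2000, 0.05
x = rng.uniform(0.0, 1.0, n)
levels = (1, 2, 3)
weights = np.array([2.0 * (np.log(l) + np.log(np.pi / np.sqrt(6.0))) for l in levels])
eps = rng.standard_normal((n_sim, n))              # shared across kernels
eps_prime = rng.standard_normal((n_sim, n))
denom = np.sum((eps_prime[:, 0::2] - eps_prime[:, 1::2]) ** 2, axis=1) / n
sims = []
for l in levels:
    h = 2.0 ** (-l)
    k_mat = np.exp(-0.5 * ((x[:, None] - x[None, :]) / h) ** 2) / (h * np.sqrt(2.0 * np.pi))
    np.fill_diagonal(k_mat, 0.0)
    sims.append(np.sum((eps @ k_mat) * eps, axis=1) / (n * (n - 1)) / denom)
sims = np.array(sims)                               # shape (number of kernels, n_sim)

u_hat = calibrate_u(sims, weights, alpha)
print(u_hat, rejection_rate(sims, weights, u_hat))
```

Because the statistics $V_{K_{m}}^{(0)}$ are simulated jointly with the same noise, the calibrated `u_hat` is typically larger than the union-bound choice $u=\alpha$ , which is how the aggregation limits the loss in power.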

4.2 The aggregation of projection kernels.

Let us specify the performance of the aggregated test for a collection of projection kernels.

Corollary 4.2.

Let $\alpha,\beta$ be fixed levels in $(0,1)$ . Let $\left\{S_{m},m\in\mathcal{M}\right\}$ be a finite collection of linear subspaces of $\mathbb{L}^{2}([0,1],d\nu)$ , generated by the functions $\left\{\phi_{\lambda},\lambda\in\Lambda_{m}\right\}$ and we assume that the dimension of $S_{m}$ is equal to $D_{m}$ . We set, for all $m\in\mathcal{M}$ , $K_{m}(x,y)=\sum_{\lambda\in\Lambda_{m}}\phi_{\lambda}(x)\phi_{\lambda}(y)$ . Let $\Phi_{\alpha}$ be defined by (27) with the collection of kernels $\left\{K_{m},\ m\in\mathcal{M}\right\}$ and the collection $\left\{w_{m},m\in\mathcal{M}\right\}$ of positive numbers such that $\sum_{m\in\mathcal{M}}e^{-w_{m}}\leq 1$ .

Then $\Phi_{\alpha}$ is a level $\alpha$ test. Moreover, $\mathbb{P}_{f}\left(\Phi_{\alpha}=0\right)\leq\beta$ if

[TABLE]

where $\kappa>0$ and $n\geq 32\ln(2/\alpha)$ .

Comments. Comparing this result with the one obtained in Corollary 3.1 for the single test based on a projection kernel, we see that the multiple testing procedure achieves the infimum over all $m$ in $\mathcal{M}$ in the right hand side of (4.2), at the price of the additional term $w_{m}$ .

Let us consider the particular case when the collection of kernels $\left\{K_{m},\ m\in\mathcal{M}\right\}$ is the collection of projection kernels based on the constructions in Section 3.1. For some $\bar{J}\geq 1$ , let $\mathcal{M}_{\bar{J}}=\left\{J,\ 0\leq J\leq\bar{J}\right\}$ and, for all $J$ in $\mathcal{M}_{\bar{J}}$ , $w_{J}=2\left(\ln(J+1)+\ln(\pi/\sqrt{6})\right)$ .
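That these weights satisfy the constraint $\sum_{J}e^{-w_{J}}\leq 1$ can be verified directly. The short computation below is added as a check and relies only on the classical identity $\sum_{k\geq 1}k^{-2}=\pi^{2}/6$ :

$$e^{-w_{J}}=\frac{6}{\pi^{2}}\cdot\frac{1}{(J+1)^{2}},\qquad\sum_{J=0}^{\bar{J}}e^{-w_{J}}\leq\frac{6}{\pi^{2}}\sum_{k=1}^{\infty}\frac{1}{k^{2}}=1.$$

The bound holds for every $\bar{J}$ , so the same weights can be used whatever the size of the collection.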

We consider $\Phi_{\alpha}^{(1)}$ , the test defined by (27) with the collection of kernels $\left\{K_{J},0\leq J\leq\bar{J}\right\}$ , where $K_{0}$ and $K_{J}$ , $0<J\leq\bar{J}$ , are defined in (18) and (19). We obtain from Corollary 4.2 that there exists some constant $C(\alpha,\beta,\sigma,\|f\|_{\infty})$ such that $\mathbb{P}_{f}\left(\Phi_{\alpha}^{(1)}=0\right)\leq\beta$ as soon as

[TABLE]

For any $\delta>0,R,R^{{}^{\prime}}>0$ we consider

[TABLE]

Corollary 4.3.

Let $\alpha,\beta\in(0,1)$ . For all $J\in\mathcal{M}_{\bar{J}}$ , we consider the test function $\Phi_{\alpha}^{(1)}$ . Assume that $\ln\ln(n)\geq 1$ and $2^{\bar{J}}\geq n^{2}$ . For any $\delta,R,R^{\prime}>0$ , we set

[TABLE]

For all $f\in\mathcal{B}_{2,\infty}^{\delta}(R,R^{{}^{\prime}})$ such that

[TABLE]

we have $\mathbb{P}_{f}\left(\Phi_{\alpha}^{(1)}=0\right)\leq\beta$ .

Comments. The right hand side of (33) is of order $\left(\ln\ln(n)/n\right)^{4\delta/(1+4\delta)}$ . This rate of testing was shown to be optimal for signal detection in Gaussian white noise by Spokoiny et al., (1996), which showed in particular that the logarithmic factor is the price to pay for adaptation.

4.3 The aggregation of Gaussian kernels.

We here consider the aggregated test based on a collection of Gaussian kernels.

Corollary 4.4.

Let $\alpha,\beta\in(0,1)$ and let $\left\{h_{l},\ l\in\mathcal{L}\right\}$ be a collection of positive bandwidths. We consider $\left\{K_{l},l\in\mathcal{L}\right\}$ , the collection of Gaussian kernels corresponding to these bandwidths, where $K_{l}$ is defined in (23). Let $\Phi_{\alpha}$ be defined by (27) with the collection of kernels $\left\{K_{l},\ l\in\mathcal{L}\right\}$ and a collection $\left\{w_{l},\ l\in\mathcal{L}\right\}$ of positive numbers such that $\sum_{l\in\mathcal{L}}e^{-w_{l}}\leq 1$ .

Then $\Phi_{\alpha}$ is a level $\alpha$ test. Moreover, there exists $\kappa>0$ such that if

[TABLE]

then $\mathbb{P}_{f}(\Phi_{\alpha}=0)\leq\beta.$

We consider the particular case where $\mathcal{L}=\mathbb{N}\setminus\{0\}$ , $h_{l}=2^{-l}$ and $w_{l}=2\left(\ln(l+1)+\ln(\pi^{2}/6)\right)$ for all $l\in\mathcal{L}$ . Let $\Phi_{\alpha}^{(2)}$ be the test defined by (27) with the collection of Gaussian kernels $\left\{K_{l},l\in\mathcal{L}\right\}$ and $\left\{w_{l},l\in\mathcal{L}\right\}$ . We obtain from Corollary 4.4 that there exists $C(\alpha,\beta,\sigma,\left\lVert f\right\rVert_{\infty})$ such that $\mathbb{P}_{f}\left(\Phi_{\alpha}^{(2)}=0\right)\leq\beta$ if

[TABLE]

For $\delta>0,R,R^{{}^{\prime}}>0$ we consider

[TABLE]

Corollary 4.5.

Let $\alpha,\beta\in(0,1)$ . For all $l\in\mathcal{L}$ , we consider the test function $\Phi_{\alpha}^{(2)}$ and assume that $\ln\ln(n)\geq 1$ . For any $\delta>0,R,R^{\prime}>0$ , we set

[TABLE]

For all $f\in\mathcal{S}_{\delta}(R,R^{{}^{\prime}})$ such that

[TABLE]

we have $\mathbb{P}_{f}\left(\Phi_{\alpha}^{(2)}=0\right)\leq\beta$ .

Comments. The rate of testing is of order $\left(\ln\ln(n)/n\right)^{4\delta/(1+4\delta)}$ . This rate was shown to be optimal over periodic Sobolev balls up to the logarithm, by Castillo et al., (2006).

5 Simulation study.

5.1 Presentation of the simulation study.

We study our aggregated testing procedures from a practical point of view in this section. We consider $E=[0,1],\ n=100$ and choose $\alpha=0.05$ . In the following simulation, $X_{1},\cdots,X_{n}$ are i.i.d uniform random variables on $[0,1]$ .

Let us introduce the collection of symmetric kernel functions and the aggregated testing procedure $\Phi_{\alpha}$ defined by (27) as follows. First, we consider the test $\Phi_{\alpha}^{(1)}$ , denoted by P, corresponding to a collection of projection kernels. To be more explicit, we consider the Haar basis $\{\phi_{0},\ \phi_{\left(j,k\right)},\ j\in\mathbb{N},k\in\{0,\cdots,2^{j}-1\}\}$ introduced in Section 3.1. Let $K_{0}(x,x^{\prime})=\phi_{0}(x)\phi_{0}(x^{\prime})$ and, for $J\geq 1$ , $K_{J}(x,x^{\prime})=\sum_{\lambda\in\{0\}\cup\Lambda_{J}}\phi_{\lambda}(x)\phi_{\lambda}(x^{\prime})$ with $\Lambda_{J}=\{(j,k),\ j\in\{0,\cdots,J-1\},\ k\in\{0,\cdots,2^{j}-1\}\}$ . Let $\mathcal{M}_{\overline{J}}=\left\{J,\ 0\leq J\leq 7\right\}$ and, for all $J$ in $\mathcal{M}_{\overline{J}}$ , $w_{J}=2\left(\ln(J+1)+\ln(\pi/\sqrt{6})\right)$ . We consider $\Phi_{\alpha}^{(1)}$ , the multiple testing procedure with the collection of kernels $\left\{K_{J},\ J\in\mathcal{M}_{\overline{J}}\right\}$ .
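The projection kernels $K_{J}$ can be evaluated numerically as follows. This is a minimal sketch, not the code used in the paper: it assumes the standard Haar system on $[0,1)$ , namely $\phi_{0}=\mathbb{1}_{[0,1)}$ and $\phi_{(j,k)}(x)=2^{j/2}\psi(2^{j}x-k)$ with $\psi=\mathbb{1}_{[0,1/2)}-\mathbb{1}_{[1/2,1)}$ (Section 3.1 is not reproduced here), and the function names are hypothetical. Under that assumption the sum collapses to $2^{J}$ times the indicator that $x$ and $x^{\prime}$ fall in the same dyadic interval of length $2^{-J}$ , which the second function uses as a cross-check.

```python
import numpy as np


def haar_psi(t):
    """Haar mother wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    t = np.asarray(t, dtype=float)
    return (np.where((t >= 0.0) & (t < 0.5), 1.0, 0.0)
            - np.where((t >= 0.5) & (t < 1.0), 1.0, 0.0))


def haar_projection_kernel(x, y, J):
    """K_J(x, y) as the explicit sum over phi_0 and phi_(j,k), j < J."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    K = np.ones((x.size, y.size))  # phi_0(x) * phi_0(y) on [0, 1)
    for j in range(J):
        for k in range(2 ** j):
            px = 2.0 ** (j / 2) * haar_psi(2 ** j * x - k)
            py = 2.0 ** (j / 2) * haar_psi(2 ** j * y - k)
            K += np.outer(px, py)
    return K


def haar_projection_kernel_closed_form(x, y, J):
    """Equivalent form: 2^J if x and y share a dyadic bin of width 2^-J."""
    bx = np.floor(2 ** J * np.asarray(x, dtype=float)).astype(int)
    by = np.floor(2 ** J * np.asarray(y, dtype=float)).astype(int)
    return 2.0 ** J * (bx[:, None] == by[None, :])


rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=50)  # design points in [0, 1)
K3 = haar_projection_kernel(x, x, J=3)
K3_closed = haar_projection_kernel_closed_form(x, x, J=3)
```

The closed form costs $O(n^{2})$ per kernel regardless of $J$ , whereas the explicit sum has $2^{J}$ terms, which matters for the largest level $J=7$ used in the collection.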

Second, we also consider the multiple test associated with the collection of Gaussian kernel functions defined in Section 4.3. For $\mathcal{L}=\left\{1,2,\cdots,6\right\}$ we take $\left\{h_{l},l\in\mathcal{L}\right\}=\left\{1/24,1/16,1/12,1/8,1/4,1/2\right\}$ , let $K_{l}(x,y)=\frac{1}{h_{l}}k\left(\frac{x-y}{h_{l}}\right)$ with $k(u)=(2\pi)^{-1/2}\exp\left(-u^{2}/2\right)$ . Then taking $w_{l}=1/|\mathcal{L}|=1/6$ , we consider $\Phi_{\alpha}^{(2)}$ the multiple testing procedure denoted by G, with the collection of kernels $\left\{K_{l},l\in\mathcal{L}\right\}$ .
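For illustration, the Gaussian kernel matrices $\left(K_{l}(X_{i},X_{j})\right)_{i,j}$ for the six bandwidths above can be computed as in the sketch below. The formula for $K_{l}$ and the bandwidth values are taken from the text; the function name, the random seed and the simulated design are hypothetical choices made only for this example.

```python
import numpy as np

# Bandwidths h_l listed in the text for l = 1, ..., 6.
BANDWIDTHS = [1 / 24, 1 / 16, 1 / 12, 1 / 8, 1 / 4, 1 / 2]


def gaussian_kernel_matrix(x, h):
    """Matrix with entries K(x_i, x_j) = k((x_i - x_j) / h) / h, k standard normal density."""
    x = np.asarray(x, dtype=float)
    u = (x[:, None] - x[None, :]) / h
    return np.exp(-0.5 * u ** 2) / (h * np.sqrt(2.0 * np.pi))


rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=100)  # n = 100 uniform design points on [0, 1]
kernels = {h: gaussian_kernel_matrix(x, h) for h in BANDWIDTHS}
```

Each matrix is symmetric with constant diagonal $1/(h_{l}\sqrt{2\pi})$ . Any statistic that sums only over pairs $i\neq j$ ignores that diagonal.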

Finally, we are interested in the collection combining both projection and Gaussian kernels. We define $\Phi_{\alpha}^{(3)}$ , denoted by PG, as the multiple testing procedure with the collection of kernels $\left\{K_{p},\ p\in\mathcal{P}=\mathcal{M}_{\overline{J}}\cup\mathcal{L}\right\}$ . For $p\in\mathcal{M}_{\overline{J}}$ we take $w_{p}=\ln(J+1)+\ln(\pi/\sqrt{6})$ and for $p\in\mathcal{L}$ we take $w_{p}=1/12$ .

We recall that the test rejects $(H_{0})$ when there exists at least one $m$ in $\mathcal{M}$ such that $V_{K_{m}}>q^{(X)}_{m,1-u^{(X)}_{\alpha}e^{-w_{m}}}$ . Hence, for each observation $X=\left(X_{1},\cdots,X_{n}\right)$ we have to estimate $u_{\alpha}^{(X)}$ defined by (26) and $q^{(X)}_{m,1-u^{(X)}_{\alpha}e^{-w_{m}}}$ . These quantities are approximated by the Monte Carlo method introduced in Section 2.3. To be more explicit, we generate $400000$ samples $\left\{\epsilon^{b}\right\}_{b=1}^{400000}$ and $\left\{\epsilon^{\prime b}\right\}_{b=1}^{400000}$ ; one half is used to approximate the conditional probability occurring in (26) and the other half is used to estimate the distribution of each $V_{K_{m}}^{\left(0\right)}$ . We approximate $u_{\alpha}^{(X)}$ by taking $u$ in a regular grid of $[0,1]$ with step $2^{-16}$ and choosing the largest value of the grid such that the estimated conditional probability in (26) is less than $\alpha$ .
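The calibration step described above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the paper's code: the two arrays `null_q` and `null_p` (hypothetical names) are assumed to hold two independent batches of simulated null statistics $V_{K_{m}}^{\left(0\right)}$ , one column per kernel, and the search over the grid of step $2^{-16}$ is done by 16 bisection steps, which reaches the same grid point because the estimated rejection probability is nondecreasing in $u$ . The demonstration data are standard normal draws, used only to exercise the function.

```python
import numpy as np


def estimate_u_alpha(null_q, null_p, w, alpha, n_bisect=16):
    """Return (u_hat, thresholds) for the aggregated test.

    null_q : (B, M) simulated null statistics used for the quantiles.
    null_p : (B, M) independent batch used for the rejection probability.
    w      : (M,) positive weights with sum(exp(-w)) <= 1.
    """
    null_q = np.asarray(null_q, dtype=float)
    null_p = np.asarray(null_p, dtype=float)
    w = np.asarray(w, dtype=float)

    def thresholds(u):
        # Empirical (1 - u * exp(-w_m)) quantile of each kernel's null statistic.
        orders = 1.0 - u * np.exp(-w)
        return np.array([np.quantile(null_q[:, m], orders[m])
                         for m in range(w.size)])

    def rejection_rate(u):
        # Estimated probability that at least one kernel rejects under H0.
        return float(np.mean(np.any(null_p > thresholds(u), axis=1)))

    lo, hi = 0.0, 1.0
    for _ in range(n_bisect):  # ends on a multiple of 2^-n_bisect
        mid = 0.5 * (lo + hi)
        if rejection_rate(mid) <= alpha:
            lo = mid
        else:
            hi = mid
    return lo, thresholds(lo)


# Toy demonstration with M = 3 kernels and equal weights w_m = ln(3).
rng = np.random.default_rng(1)
B, M = 4000, 3
w = np.full(M, np.log(M))
null_q = rng.standard_normal((B, M))
null_p = rng.standard_normal((B, M))
u_hat, q_hat = estimate_u_alpha(null_q, null_p, w, alpha=0.05)
```

The aggregated test then rejects when at least one observed statistic exceeds its entry of `q_hat`. Using two independent batches, as the text describes, avoids estimating the rejection probability on the same draws that define the thresholds.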

5.2 Simulation results.

We first study the probabilities of first kind error of each test. We run $5000$ simulations of $X$ . For each simulation, we determine the conclusions of the tests P, G and PG, where the critical values are approximated by the Monte Carlo method described above. The probability of first kind error of each test is estimated by its number of rejections divided by $5000$ . The estimated levels and the corresponding confidence intervals (CI) are shown in Table 1.
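The text does not spell out how the asymptotic confidence intervals are computed. A common choice, assumed here purely for illustration, is the normal-approximation (Wald) interval for a binomial proportion. The function name and the rejection count in the example are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(rejections, n_sim, level=0.99):
    """Wald interval for a rejection frequency estimated from n_sim simulations."""
    p_hat = rejections / n_sim
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # two-sided normal quantile
    half_width = z * sqrt(p_hat * (1.0 - p_hat) / n_sim)
    return p_hat, max(0.0, p_hat - half_width), min(1.0, p_hat + half_width)


# Hypothetical example: 250 rejections out of 5000 simulated samples.
p_hat, lower, upper = proportion_ci(250, 5000)
```

The half-width shrinks like $1/\sqrt{N}$ in the number $N$ of simulations, so the interval is a statement about Monte Carlo accuracy and not about the test itself.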

We then study the probabilities of rejection for each test under several alternatives. We first consider the following alternative,

[TABLE]

with $0<\epsilon\leq 1$ and $0<a<1$ . Second, we consider the alternative defined by

[TABLE]

with $\tau>0$ , and $h_{j}\in\mathcal{Z},\ 0<p_{j}<1$ for all $j$ . Next, we consider the following alternative,

[TABLE]

with $c>0$ . The last alternative, for which we aim to compare our results with the results of Eubank and LaRiccia, (1993) is defined as follows

[TABLE]

where $\varrho\geq 0$ and $j\in\mathbb{N}^{*}$ .

For each alternative $f$ , we run 1000 simulations of $X$ . For each simulation, we determine the conclusions of the tests P, G and PG, where the critical values of our tests are again approximated by the Monte Carlo method. The power of each test is estimated by its number of rejections divided by 1000. The estimated powers and the upper and lower bounds of the asymptotic confidence intervals with confidence level $99\%$ are reported in Tables 2, 3 and 4. Table 5 compares our tests with the two tests $T_{nm}$ , denoted by EL1, and $T_{n\lambda}$ , denoted by EL2, which were proposed in Eubank and LaRiccia, (1993). We briefly recall the tests $T_{nm}$ and $T_{n\lambda}$ as follows.

[TABLE]

and

[TABLE]

where $\sum^{{}^{\prime}}$ indicates summation excluding the zero index and $\tilde{a}_{jn}$ are the sample Fourier coefficients,

[TABLE]

For the three alternatives $f_{1,a,\epsilon}$ , $f_{2,\tau}$ and $f_{3,c}$ , the test PG is more powerful than the tests P and G. Our conclusion is that the test PG is a good choice in practice. In Table 5, the first column ( $\varrho=0$ ), which corresponds to the null hypothesis, shows that our test is of level $\alpha=0.05$ , which is not the case for the tests proposed by Eubank and LaRiccia, (1993), which are only asymptotically of level $\alpha$ . This explains why our test is generally less powerful than the tests EL1 and EL2 for $\varrho=0.5$ . In the other cases, we obtain results that are as good or better.

Appendix A Proof of Proposition 2.1

Let us prove the first part of Proposition 2.1. Recall that $q^{\alpha}_{K,1-\beta/2}$ denotes the $(1-\beta/2)$ quantile of $q^{(X)}_{K,1-\alpha}$ , which is the $(1-\alpha)$ quantile of $V_{K}^{\left(0\right)}$ conditionally on $X$ . We want to find a condition on $\varepsilon_{K}=\mathbb{E}(T_{K})$ ensuring that

[TABLE]

From Markov’s inequality, we have for all $\lambda>0$

[TABLE]

Let us compute $\mathbb{E}\left(T_{K}^{2}|X\right)$ . We see that

[TABLE]

Then

[TABLE]

Since $\mathbb{E}\left[T_{K}^{2}\right]=\mathbb{E}\left[\mathbb{E}\left[T_{K}^{2}\big{|}X\right]\right]$ , and since $\left(X_{1},\cdots,X_{n}\right)$ are i.i.d with density $\nu$ on $E$ , we obtain

[TABLE]

Thus

[TABLE]

In fact

[TABLE]

Then

[TABLE]

Replacing (38) into (37) we obtain

[TABLE]

Choosing $\lambda=\sqrt{\frac{16A_{K}+8B_{K}}{\beta}}$ , the above inequality leads to

[TABLE]

This implies

[TABLE]

Now we consider the term $\widehat{\sigma}_{n}^{2}=\frac{1}{n}\sum_{i=1}^{n/2}\left(Y_{2i-1}-Y_{2i}\right)^{2}$ . Following Cochran’s theorem, we consider the orthogonal subspace $W$ of dimension $n/2$ . Let $\left(e_{1},\cdots,e_{n/2}\right)$ be an orthogonal basis of $W$ , where for all $i=1,\cdots,n/2$ , $e^{T}_{i}$ is a vector of $n$ entries taking values in $\{0,1\}$ , equal to $1$ exactly at the two positions $2i-1$ and $2i$ . On the other hand, for $Y=\left(Y_{1},\cdots,Y_{n}\right)$ , with $Y_{i}=f(X_{i})+\sigma\epsilon_{i}$ , we have

[TABLE]

Using Cochran’s theorem, we have

[TABLE]

where $a^{2}:=\frac{1}{n}\sum_{i=1}^{n/2}\left[f\left(\frac{2i-1}{n}\right)-f\left(\frac{2i}{n}\right)\right]^{2}$ and $\chi^{2}(k,\lambda)$ denotes a non central Chi-square variable with $k$ degrees of freedom and non centrality parameter $\lambda$ .
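As a numerical sanity check of the estimator $\widehat{\sigma}_{n}^{2}$ , the sketch below computes it from successive differences and evaluates it on pure noise. It assumes that the sum runs over the $n/2$ consecutive pairs $(Y_{2i-1},Y_{2i})$ , consistent with the definition of $a^{2}$ above; the function name, sample size and value of $\sigma$ are hypothetical. Under $(H_{0})$ each difference has variance $2\sigma^{2}$ , so the $n/2$ squared differences sum to about $n\sigma^{2}$ and the estimator should be close to $\sigma^{2}$ .

```python
import numpy as np


def difference_variance(y):
    """(1/n) * sum over i = 1..n/2 of (Y_{2i-1} - Y_{2i})^2, for even n."""
    y = np.asarray(y, dtype=float)
    if y.size % 2 != 0:
        raise ValueError("the number of observations must be even")
    d = y[0::2] - y[1::2]  # Y_1 - Y_2, Y_3 - Y_4, ...
    return float(np.sum(d ** 2) / y.size)


# Pure-noise check under H0 (f = 0): the estimate should be close to sigma^2.
rng = np.random.default_rng(2)
sigma = 2.0
y_null = sigma * rng.standard_normal(200_000)
sigma2_hat = difference_variance(y_null)
```

When $f$ is not zero, the differences $f(X_{2i-1})-f(X_{2i})$ add a nonnegative bias, which is what the noncentrality parameter of the chi-square distribution above accounts for.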

Moreover,

[TABLE]

Hence

[TABLE]

Now, we consider the variable $Z\sim\chi^{2}\left(\frac{n}{2},\frac{na^{2}}{2\sigma^{2}}\right)$ . Using Lemma 8.1 in Birgé, (2001), we have

[TABLE]

This implies

[TABLE]

Choosing $\rho=\ln\left(4/\beta\right)$ , (41) leads to

[TABLE]

where

[TABLE]

Thus

[TABLE]

From (40) and (42), we obtain

[TABLE]

with

[TABLE]

If $q^{\alpha}_{K,1-\beta/2}\leq UV$ we have $\mathbb{P}_{f}\left(V_{K}\leq q^{\alpha}_{K,1-\beta/2}\right)\leq\beta/2$ .

Therefore, if

[TABLE]

then

[TABLE]

Let us now give an upper bound for $q^{\alpha}_{K,1-\beta/2}$ . Reasoning conditionally on $X$ , we recognize that $T_{K}^{\left(0\right)}:=\frac{1}{n(n-1)}\sum_{i\neq j}^{n}K\left(X_{i},X_{j}\right)\epsilon_{i}\epsilon_{j}$ is a Gaussian chaos, as defined by De la Pena and Giné, (2012), of the form $Z=\sum_{i\neq i^{\prime}}x_{i,i^{\prime}}\epsilon_{i}\epsilon_{i^{\prime}}$ , where the $x_{i,i^{\prime}}$ are real deterministic numbers and $(\epsilon_{i})_{i}$ is a sequence of i.i.d. Gaussian variables. Corollary 3.26 of De la Pena and Giné, (2012) states that there exists an absolute constant $\kappa>0$ such that, with $\gamma^{2}=\mathbb{E}[Z^{2}]=\sum_{i\neq i^{\prime}}x_{i,i^{\prime}}^{2}$ ,

[TABLE]

Hence by Markov’s inequality,

[TABLE]

Applying the result (45) for $T_{K}^{\left(0\right)}$ with

[TABLE]

we have

[TABLE]

On the other hand, we have, under $H_{0}$ , $\hat{\sigma}_{n}^{2}=\hat{\sigma}^{2}_{n,\epsilon}$ where

[TABLE]

Since the variables $Z_{i}=\epsilon^{\prime}_{2i-1}-\epsilon^{\prime}_{2i}$ , $i=1,\cdots,n/2$ , are i.i.d. standard Gaussian variables, Lemma 8.1 in Birgé, (2001) gives

[TABLE]

Choosing $x=\ln(2/\alpha)$ , (48) leads to

[TABLE]

Moreover, we have

[TABLE]

From (47) and (49), we obtain

[TABLE]

This implies

[TABLE]

Thus the $(1-\alpha)$ quantile of $V_{K}^{\left(0\right)}$ conditionally on $X$ satisfies

[TABLE]

Taking $n\geq 32\ln\left(\frac{2}{\alpha}\right)$ , so that $\sqrt{n}\geq 4\sqrt{2}\sqrt{\ln\left(\frac{2}{\alpha}\right)}$ , (51) becomes

[TABLE]

Hence $q^{\alpha}_{K,1-\beta/2}$ is upper bounded by the $(1-\beta/2)$ quantile of $\frac{2\kappa}{\sqrt{n(n-1)}}\ln\left(\frac{2}{\alpha}\right)\sqrt{\frac{1}{n(n-1)}\sum_{i\neq j=1}^{n}K^{2}_{ij}}$ .

We define

[TABLE]

Applying Markov’s inequality again to the nonnegative random variable $U_{n}$ , we obtain for any $\delta>0$

[TABLE]

We have

[TABLE]

Choosing $\delta=2\int_{E^{2}}K^{2}(x,y)d\nu(x)d\nu(y)/\beta$ , (53) becomes

[TABLE]

and

[TABLE]

which concludes the proof.

Appendix B Proof of Theorem 2.2

For all symmetric kernel function $K$ , we have

[TABLE]

On the other hand

[TABLE]

Let $C_{K}$ be an upper bound for $\int_{E^{2}}K^{2}(x,y)d\nu(x)d\nu(y)$ , we have

[TABLE]

From Proposition 2.1, the bounds for $A_{K}$ and $B_{K}$ and the inequality $\sqrt{a+b}\leq\sqrt{a}+\sqrt{b}$ for all $a\geq 0,b\geq 0$ , we deduce that $\mathbb{P}_{f}\left(\Phi_{K,\alpha}=0\right)\leq\beta$ as soon as,

[TABLE]

Using the elementary inequality $2cd\leq c^{2}+d^{2}$ with $c=\left\lVert K[f]\right\rVert$ and $d=4\sqrt{\frac{\left(\|f\|^{2}_{\infty}+\sigma^{2}\right)}{n\beta}}$ in the right hand side of the above condition, we see that the condition holds if

[TABLE]

Appendix C Proof of Corollary 3.1 and 3.3.

Under the hypothesis of Corollary 3.1,

[TABLE]

and the linear space $S$ generated by the functions $\left(\phi_{\lambda},\lambda\in\Lambda\right)$ is of dimension $D$ . Hence, we have

[TABLE]

Thus, we can take $C_{K}=D$ .

Second, for the Gaussian kernel defined by (8), we recall that

[TABLE]

where $k(u)=\frac{1}{\sqrt{2\pi}}\exp\left(-u^{2}/2\right),\ \text{for all}\ u\in\mathbb{R}$ and $h$ is a positive bandwidth.

We have

[TABLE]

Hence, we can choose $C_{K}=\frac{\left\lVert\nu\right\rVert_{\infty}}{h\sqrt{2\pi}}$ .

Appendix D Proof of Proposition 3.2

For all $J\geq 0$ , let $D=2^{J}$ denote the dimension of $S_{J}$ . Assume that $f\in\mathcal{B}_{2,\infty}^{\delta}(R)$ ; this implies

[TABLE]

We obtain from Corollary 3.1 that there exists

[TABLE]

such that $\mathbb{P}_{f}\left(\Phi_{K,\alpha}=0\right)\leq\beta$ if

[TABLE]

In this case, we see that the right hand side of (21) reproduces a bias-variance decomposition close to the bias-variance decomposition for projection estimators, with the bias term $R^{2}2^{-2J\delta}$ and the variance term $2^{J/2}/n$ . The optimal choice of $J$ satisfies

[TABLE]

Thus, we obtain the optimal choice $J^{*}$ ,

[TABLE]

leading to the desired result.
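As a heuristic check of the exponent, one can balance the two terms named above while ignoring all constants and the dependence on $\alpha,\beta,\sigma$ and $\|f\|_{\infty}$ . This is a rough computation added for orientation and is not the exact statement of the proposition:

$$R^{2}2^{-2J\delta}=\frac{2^{J/2}}{n}\iff 2^{J(1+4\delta)/2}=R^{2}n\iff 2^{J}=\left(R^{2}n\right)^{2/(1+4\delta)},$$

and, for this choice of $J$ ,

$$\frac{2^{J/2}}{n}=\frac{\left(R^{2}n\right)^{1/(1+4\delta)}}{n}=R^{2/(1+4\delta)}\,n^{-4\delta/(1+4\delta)}.$$

The exponent $4\delta/(1+4\delta)$ is the one appearing in the testing rates discussed in Section 4, where $n$ is replaced by $n/\ln\ln(n)$ in the adaptive case.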

Appendix E Proof of Proposition 3.4

Considering (3.3), we mainly have to find a sharp upper bound for $\|f-k_{h}*f\|^{2}$ when $f\in\mathcal{S}_{\delta}(R)$ . Plancherel’s theorem gives that when $f\in\mathbb{L}^{1}(\mathbb{R})\cap\mathbb{L}^{2}(\mathbb{R})$ ,

[TABLE]

We assume that $\left\lVert\hat{k}\right\rVert_{\infty}<+\infty$ and

[TABLE]

for some $C>0$ . There also exists some constant $C(\delta)>0$ such that

[TABLE]

Then

[TABLE]

and since $f\in\mathcal{S}_{\delta}(R)$ ,

[TABLE]

We obtain from Corollary 3.3 that there exists

[TABLE]

such that $\mathbb{P}_{f}\left(\Phi_{K,\alpha}=0\right)\leq\beta$ if

[TABLE]

In this case, we see that the right hand side of (59) reproduces a bias-variance decomposition with the bias term $2^{-2\delta l}$ and the variance term $2^{l/2}/n$ . The optimal choice of $l$ satisfies

[TABLE]

Thus, we obtain the optimal choice $l^{*}$ as follows.

[TABLE]

leading to the desired result.

Appendix F Proof of Theorem 4.1, Corollary 4.2 and 4.4.

We have

[TABLE]

We have,

[TABLE]

by definition of $u_{\alpha}^{(X)}$ , which implies that $\mathbb{P}_{(H_{0})}\left(\Phi_{\alpha}=1\right)\leq\alpha$ .

On the other hand, we know that $u_{\alpha}^{(X)}\geq\alpha$ . Setting $\alpha_{m}=\alpha e^{-w_{m}}$ , we have

[TABLE]

as soon as there exists $m$ in $\mathcal{M}$ such that

[TABLE]

We can now apply Corollary 3.1 and 3.3 with $\alpha_{m}=\alpha e^{-w_{m}}$ , so we replace $\ln(2/\alpha)$ by $\left(\ln(2/\alpha)+w_{m}\right)$ for desired results in Corollary 4.2 and 4.4.

Appendix G Proof of Corollary 4.3.

Considering (31), we aim to find a sharp upper bound for the right hand side of the inequality when $f\in\mathcal{B}_{2,\infty}^{\delta}(R,R^{\prime})$ . Let us assume that $f\in\mathcal{B}_{2,\infty}^{\delta}(R,R^{\prime})$ . Then $f\in\mathcal{B}_{2,\infty}^{\delta}(R)$ and we have

[TABLE]

and

[TABLE]

Hence (31) can be upper bounded by

[TABLE]

Taking

[TABLE]

That leads to $\mathbb{P}_{f}\left(\Phi_{\alpha}^{(1)}=0\right)\leq\beta$ if

[TABLE]

Appendix H Proof of Corollary 4.5.

Considering (35), we aim to find a sharp upper bound for the right hand side of the inequality when $f\in\mathcal{S}_{\delta}(R,R^{\prime})$ . Let us assume that $f\in\mathcal{S}_{\delta}(R,R^{\prime})$ . As in the proof of Proposition 3.4, we have

[TABLE]

and

[TABLE]

Hence (35) can be upper bounded by

[TABLE]

Choosing

[TABLE]

That leads to $\mathbb{P}_{f}\left(\Phi_{\alpha}^{(2)}=0\right)\leq\beta$ if

[TABLE]

Acknowledgement

I gratefully thank Professor Béatrice Laurent of Institut National des Sciences Appliquées de Toulouse and Professor Jean-Michel Loubes of Institut de Mathématiques de Toulouse for their support, ideas and comments.

Bibliography

Bachoc, F., Gamboa, F., Loubes, J.-M., and Venet, N. (2017). A Gaussian process regression model for distribution inputs. IEEE Transactions on Information Theory.
Baraud, Y. et al. (2002). Non-asymptotic minimax rates of testing in signal detection. Bernoulli, 8(5):577–606.
Baraud, Y., Huet, S., Laurent, B., et al. (2003). Adaptive tests of linear hypotheses by model selection. The Annals of Statistics, 31(1):225–251.
Birgé, L. (2001). An alternative point of view on Lepski’s method. Lecture Notes-Monograph Series, pages 113–133.
Butucea, C. and Tribouley, K. (2006). Nonparametric homogeneity tests. Journal of Statistical Planning and Inference, 136(3):597–639.
Castillo, I., Lévy-Leduc, C., and Matias, C. (2006). Exact adaptive estimation of the shape of a periodic function with unknown period corrupted by white noise. Mathematical Methods of Statistics, 15(2):146–175.
De la Pena, V. and Giné, E. (2012). Decoupling: from dependence to independence. Springer Science & Business Media.
Delgado, M. A. (1992). Testing the equality of nonparametric regression curves.
