Minimax $L_2$-Separation Rate in Testing the Sobolev-Type Regularity of a Function

Maurilio Gutzeit

arXiv:1901.00880 · math.ST · February 19, 2020



TL;DR

This paper investigates the minimax $L_2$-separation rate for testing whether a function in a Sobolev space has higher smoothness, deriving bounds that reveal the rate's independence from the higher smoothness level.

Contribution

It characterizes the minimax separation rate in Sobolev smoothness testing up to constants, showing that it matches the rate in simple signal detection.

Findings

1. The separation rate scales as $n^{-t/(2t+1/2)}$.
2. The rate is independent of the higher smoothness level $s$.
3. The rate coincides with the signal-detection rate, i.e. the rate for a simple null hypothesis.

Abstract

In this paper we study the problem of testing if an $L_{2}$-function $f$ belonging to a certain $l_{2}$-Sobolev ball $B_{t}(R)$ of radius $R>0$ with smoothness level $t>0$ indeed exhibits a higher smoothness level $s>t$, that is, belongs to $B_{s}(R)$. We assume that only a perturbed version of $f$ is available, where the noise is governed by a standard Brownian motion scaled by $\frac{1}{\sqrt{n}}$. More precisely, considering a testing problem of the form $H_{0}: f\in B_{s}(R)$ vs. $H_{1}: f\in B_{t}(R),\ \inf_{h\in B_{s}(R)}\|f-h\|_{L_{2}}>\rho$ for some $\rho>0$, we approach the task of identifying the smallest value for $\rho$, denoted $\rho^{*}$, enabling the existence of a test $\varphi$ with small error probability in a minimax sense. By deriving lower and upper bounds on $\rho^{*}$, we expose its precise dependence on $n$: $\rho^{*}\sim n^{-\frac{t}{2t+1/2}}$. As a remarkable…

Equations

H_0 : f \in B_s(R) \quad \text{vs.} \quad H_1 : f \in B_t(R),\ \inf_{h \in B_s(R)} \|f - h\|_{L_2} > \rho

\rho^{*} \sim n^{-\frac{t}{2t + 1/2}}.

L_2 := L_2([0,1]) = \Big\{ g : [0,1] \to \mathbb{R} ;\ \int_0^1 g(x)^2 \, d\lambda(x) < \infty \Big\}

dY(x) = f(x)\,dx + \frac{1}{\sqrt{n}}\,dB(x), \quad x \in [0,1].

\widetilde{B}_{s,t}(R, \rho) := \Big\{ g \in B_t(R) ;\ \inf_{h \in B_s(R)} \|g - h\|_{L_2} > \rho \Big\}.

H_0 : f \in B_s(R) \quad \text{vs.} \quad H_1 : f \in \widetilde{B}_{s,t}(R, \rho).

\rho^{*}(\eta) = \inf \Big\{ \rho > 0 ;\ \exists \text{ test } \varphi : \sup_{f \in B_s(R)} \mathds{P}_f(\varphi = 1) + \sup_{f \in \widetilde{B}_{s,t}(R,\rho)} \mathds{P}_f(\varphi = 0) \leq \eta \Big\}.

n^{-\frac{t}{2t + 1/2}}.

n^{-\frac{t}{2t + 1/2}} \lesssim \rho^{*}(\eta) \lesssim \max\Big( n^{-\frac{s}{2s+1}}, n^{-\frac{t}{2t + 1/2}} \Big).

n^{-\frac{t}{2t + 1/2}}.

\langle g, h \rangle := \int_0^1 g(x) h(x) \, d\lambda(x) \quad \text{with} \quad \|g\|_{L_2} := \sqrt{\langle g, g \rangle}, \quad g, h \in L_2.

\mathcal{W} = \bigcup_{j=2}^{\infty} \big\{ \psi_{j,k} : k \in \{1, 2, \dots, 2^j\} \big\},

\langle g, \psi_{j,k} \rangle = \int_0^1 g(x) \psi_{j,k}(x) \, dx, \quad j \geq 2,\ k \in \{1, 2, \dots, 2^j\}.

g = \sum_{j=2}^{\infty} \sum_{k=1}^{2^j} \langle g, \psi_{j,k} \rangle \psi_{j,k}.

B_r(R) := \Big\{ g \in L_2 ;\ \sum_{j=2}^{\infty} 4^{jr} \sum_{k=1}^{2^j} \langle g, \psi_{j,k} \rangle^2 \leq R^2 \Big\}

\|g\|_{\mathcal{B}_r} := \Big( \sum_{j=2}^{\infty} 4^{jr} \sum_{k=1}^{2^j} \langle g, \psi_{j,k} \rangle^2 \Big)^{1/2}, \quad g \in L_2

B_{r,\infty}(R) := \Big\{ g \in L_2 ;\ \sup_{j \geq 2} 4^{jr} \sum_{k=1}^{2^j} \langle g, \psi_{j,k} \rangle^2 \leq R^2 \Big\}.

\mathcal{I} = \big\{ (j,k) \in \mathbb{N}^2 \mid j \geq 2,\ k \leq 2^j \big\}.

a_{j,k} := \langle f, \psi_{j,k} \rangle

f = \sum_{j=2}^{\infty} \sum_{k=1}^{2^j} a_{j,k} \psi_{j,k}.

\widehat{a}_{j,k} := \langle dY, \psi_{j,k} \rangle, \quad \widehat{f} = \sum_{j=2}^{\infty} \sum_{k=1}^{2^j} \widehat{a}_{j,k} \psi_{j,k}.

\widehat{a}_{j,k} \sim \mathcal{N}\Big(a_{j,k}, \frac{1}{n}\Big).

H_0' : S_J := \Big\| \sum_{j=2}^{J} \sum_{k=1}^{2^j} a_{j,k} \psi_{j,k} \Big\|_{\mathcal{B}_s} \leq R \quad \text{vs.} \quad H_1' : \inf_{h \in B_s(R)} \Big\| \sum_{j=2}^{J} \sum_{k=1}^{2^j} a_{j,k} \psi_{j,k} - h \Big\|_{L_2} > \rho_J,

\alpha_{j^*}, \quad T_{j^*, \alpha_{j^*}}, \quad \beta_{j^*}, \quad C_{\beta_{j^*}}, \quad D_{j^*, \beta_{j^*}}, \quad \tau_{j^*, \alpha_{j^*}}

\varphi = 1 - \prod_{j^*=2}^{J} \mathds{1}_{\{ T_{j^*, \alpha_{j^*}} \leq \tau_{j^*, \alpha_{j^*}} \}}.

J = \Big\lfloor \frac{1}{2t + 1/2} \, \frac{\ln(n)}{\ln(2)} \Big\rfloor.

\rho \geq \Big( \frac{1346}{\eta} + \frac{R}{1 - 2^{-t}} \Big) n^{-\frac{t}{2t + 1/2}},

\sup_{f \in B_s(R)} \mathds{P}_f(\varphi = 1) + \sup_{f \in \widetilde{B}_{s,t}(R,\rho)} \mathds{P}_f(\varphi = 0) \leq \eta.


Taxonomy

Topics: Mathematical Approximation and Integration · Nonlinear Partial Differential Equations · Numerical methods in inverse problems

Full text

Minimax $\boldsymbol{L_{2}}$ -Separation Rate in Testing the Sobolev-type Regularity of a Function

Maurilio Gutzeit, OvGU Magdeburg, Institut für Mathematische Stochastik

Universitätsplatz 2, 39106 Magdeburg, Germany

(2019)

Abstract

In this paper we study the problem of testing if an $L_{2}$-function $f$ belonging to a certain $l_{2}$-Sobolev ball $B_{t}(R)$ of radius $R>0$ with smoothness level $t>0$ indeed exhibits a higher smoothness level $s>t$, that is, belongs to $B_{s}(R)$. We assume that only a perturbed version of $f$ is available, where the noise is governed by a standard Brownian motion scaled by $\frac{1}{\sqrt{n}}$. More precisely, considering a testing problem of the form

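$$H_{0}:\ f\in B_{s}(R)\quad\text{vs.}\quad H_{1}:\ f\in B_{t}(R),\ \inf_{h\in B_{s}(R)}\|f-h\|_{L_{2}}>\rho$$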

for some $\rho>0$ , we approach the task of identifying the smallest value for $\rho$ , denoted $\rho^{\ast}$ , enabling the existence of a test $\varphi$ with small error probability in a minimax sense. By deriving lower and upper bounds on $\rho^{\ast}$ , we expose its precise dependence on $n$ :

$$\rho^{\ast}\sim n^{-\frac{t}{2t+1/2}}.$$

As a remarkable aspect of this composite-composite testing problem, it turns out that the rate does not depend on $s$ and is equal to the rate in signal-detection, i.e. the case of a simple null hypothesis.

MSC: 62G10

Keywords: minimax hypothesis testing, nonasymptotic minimax separation rate, Gaussian white noise, Sobolev ball, smoothness

1 Introduction
2 Setting
3 Main results
3.1 Upper Bound
3.2 Remark on the relation to [8]
3.3 Lower Bound
4 Alternative settings
5 Proof of Theorem 3.1
5.1 General preparations
5.2 Preliminary Bounds on $\boldsymbol{\|P_{2}^{j^{\ast}}f\|_{\mathcal{B}_{s}}}$
5.3 Estimation of $\boldsymbol{M_{j^{\ast}}}$
5.4 Conclusion
6 Proof of Theorem 3.2
6.1 Description of the Strategy
6.2 Application to our Problem

1 Introduction

Let $n\in\mathbb{N}^{\ast}=\mathbb{N}\backslash\{0\}$ , $f$ a fixed unknown element of

$$L_{2}:=L_{2}([0,1])=\Big\{g:[0,1]\to\mathbb{R};\ \int_{0}^{1}g(x)^{2}\,d\lambda(x)<\infty\Big\}$$

and $(B(x))_{x\in[0,1]}$ a standard Brownian motion. Suppose we observe the Gaussian process $(Y(x))_{x\in[0,1]}$ determined by the stochastic differential equation

$$dY(x)=f(x)\,dx+\frac{1}{\sqrt{n}}\,dB(x),\qquad x\in[0,1].$$

The resulting probability measure, expectation and variance given $f$ will be written $\mathds{P}_{f}$, $\mathds{E}_{f}$ and $\mathds{V}\mathrm{ar}_{f}$, respectively. Depending on the context, and if there is no risk of confusion, we may drop the index $f$ or write another index, for instance in the context of lower bounds (section 3.3).

Testing problem

We now fix $s>t>0$ and $R,\rho>0$ . For any $r>0$ , we denote by $B_{r}(R)$ the $l_{2}$ -Sobolev-ball of radius $R$ of functions on $[0,1]$ with regularity at least $r$ – see section 2 for a precise definition. Based on that, let

$$\widetilde{B}_{s,t}(R,\rho):=\Big\{g\in B_{t}(R);\ \inf_{h\in B_{s}(R)}\|g-h\|_{L_{2}}>\rho\Big\}.$$

Hence, if we interpret $s$ and $t$ as degrees of smoothness, $\widetilde{B}_{s,t}(R,\rho)$ is the set of functions with smoothness level at least $t$ which are separated from the class $B_{s}(R)$ of stronger smoothness $s$ by more than $\rho$ in the $L_{2}$ sense. The testing problem of interest is

$$H_{0}:\ f\in B_{s}(R)\quad\text{vs.}\quad H_{1}:\ f\in\widetilde{B}_{s,t}(R,\rho).$$

More specifically, given $\eta\in(0,1)$ , we aim at finding the magnitude in terms of $n$ of the smallest separation distance $\rho^{\ast}(\eta)=\rho^{\ast}(n,s,t,\eta)$ which enables the existence of a test $\varphi$ of level $\eta$ in a minimax sense, i.e. of

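$$\rho^{\ast}(\eta)=\inf\Big\{\rho>0;\ \exists\text{ test }\varphi:\ \sup_{f\in B_{s}(R)}\mathds{P}_{f}(\varphi=1)+\sup_{f\in\widetilde{B}_{s,t}(R,\rho)}\mathds{P}_{f}(\varphi=0)\leq\eta\Big\}.$$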

2 Setting

In this section, we describe how the relevant Sobolev balls and the observed Gaussian process will be represented throughout the paper.

Wavelet transform and associated Sobolev ball

Throughout the paper, we make heavy use of a wavelet decomposition of $f$ . As is well-known, we can define a scalar product and associated norm on $L_{2}$ by

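$$\langle g,h\rangle:=\int_{0}^{1}g(x)h(x)\,d\lambda(x)\quad\text{with}\quad\|g\|_{L_{2}}:=\sqrt{\langle g,g\rangle},\qquad g,h\in L_{2}.$$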

There are many orthogonal wavelet bases of $L_{2}$ with respect to $<\cdot,\cdot>$ . A suitable choice for our purposes is a basis developed in [10] that can be written as

$$\mathcal{W}=\bigcup_{j=2}^{\infty}\big\{\psi_{j,k}:\ k\in\{1,2,\ldots,2^{j}\}\big\},$$

i.e. it is tailored such that there are exactly $2^{j}$ basis functions at resolution $j\geq 2$ . Clearly, the coefficients of $g\in L_{2}$ with respect to $\mathcal{W}$ are given by

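$$\langle g,\psi_{j,k}\rangle=\int_{0}^{1}g(x)\psi_{j,k}(x)\,dx,\qquad j\geq 2,\ k\in\{1,2,\ldots,2^{j}\},$$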

and yield the representation

$$g=\sum_{j=2}^{\infty}\sum_{k=1}^{2^{j}}\langle g,\psi_{j,k}\rangle\,\psi_{j,k}.$$

Let $r>0$ . By virtue of isometry properties discussed for instance in [24] and [13], we may now define a functional $(r,2)$ -Sobolev-ball of radius $R$ solely through the wavelet coefficients of its elements, based on the basis from (2.1):

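$$B_{r}(R):=\Big\{g\in L_{2};\ \sum_{j=2}^{\infty}4^{jr}\sum_{k=1}^{2^{j}}\langle g,\psi_{j,k}\rangle^{2}\leq R^{2}\Big\}$$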

with associated $(r,2)$ -Sobolev-norm

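$$\|g\|_{\mathcal{B}_{r}}:=\Big(\sum_{j=2}^{\infty}4^{jr}\sum_{k=1}^{2^{j}}\langle g,\psi_{j,k}\rangle^{2}\Big)^{1/2},\qquad g\in L_{2},$$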

and also, as mentioned at the end of the previous section,

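$$B_{r,\infty}(R):=\Big\{g\in L_{2};\ \sup_{j\geq 2}4^{jr}\sum_{k=1}^{2^{j}}\langle g,\psi_{j,k}\rangle^{2}\leq R^{2}\Big\}.$$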
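The two balls differ only in how the level-wise weighted energies $4^{jr}\sum_{k}\langle g,\psi_{j,k}\rangle^{2}$ are aggregated: they are summed for $B_{r}(R)$ and maximised for $B_{r,\infty}(R)$. The following minimal Python sketch assumes, hypothetically, that the coefficients are stored as one NumPy array per level starting at $j=2$; the helper names are illustrative and do not come from the paper.

```python
import numpy as np

def level_energies(coeffs, r):
    """Weighted energies 4^{jr} * sum_k <g, psi_{j,k}>^2, one entry per level.

    coeffs[i] holds the 2**j coefficients of level j = i + 2 (hypothetical layout).
    """
    energies = []
    for i, level in enumerate(coeffs):
        j = i + 2
        level = np.asarray(level, dtype=float)
        if level.shape != (2 ** j,):
            raise ValueError(f"level {j} must hold {2 ** j} coefficients")
        energies.append(4.0 ** (j * r) * np.sum(level ** 2))
    return np.array(energies)

def sobolev_norm(coeffs, r):
    """||g||_{B_r}: square root of the summed level energies."""
    return float(np.sqrt(np.sum(level_energies(coeffs, r))))

def in_sobolev_ball(coeffs, r, R):
    return sobolev_norm(coeffs, r) <= R

def in_besov_inf_ball(coeffs, r, R):
    """Membership in B_{r,infinity}(R): every level energy is at most R^2."""
    return bool(np.max(level_energies(coeffs, r)) <= R ** 2)

def spike_per_level(r, levels):
    """One coefficient 2^{-jr} per level, so that every level energy equals 1."""
    coeffs = []
    for j in levels:
        c = np.zeros(2 ** j)
        c[0] = 2.0 ** (-j * r)
        coeffs.append(c)
    return coeffs

g = spike_per_level(r=1.0, levels=range(2, 7))   # five levels, j = 2, ..., 6
print(sobolev_norm(g, 1.0))                      # sqrt(5), about 2.236
print(in_besov_inf_ball(g, 1.0, 1.0), in_sobolev_ball(g, 1.0, 1.0))
```

The example echoes the geometric imbalance discussed in section 3.2: each level energy is at most $R^{2}=1$, so $g\in B_{1,\infty}(1)$, while its Sobolev norm grows like the square root of the number of active levels.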

Discrete observation scheme based on the wavelet basis

Let

$$\mathcal{I}=\big\{(j,k)\in\mathbb{N}^{2}\mid j\geq 2,\ k\leq 2^{j}\big\}.$$

Motivated by (2.3), for each $(j,k)\in\mathcal{I}$ we consider

$$a_{j,k}:=\langle f,\psi_{j,k}\rangle$$

so that

$$f=\sum_{j=2}^{\infty}\sum_{k=1}^{2^{j}}a_{j,k}\psi_{j,k}.$$

The natural corresponding estimators read

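$$\widehat{a}_{j,k}:=\langle dY,\psi_{j,k}\rangle,\qquad\widehat{f}=\sum_{j=2}^{\infty}\sum_{k=1}^{2^{j}}\widehat{a}_{j,k}\psi_{j,k}.$$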

By construction and due to the orthonormality of $\mathcal{W}$, we know that the family $(\widehat{a}_{j,k})_{(j,k)\in\mathcal{I}}$ is independent with

$$\widehat{a}_{j,k}\sim\mathcal{N}\Big(a_{j,k},\frac{1}{n}\Big).$$

Clearly, observing this family is equivalent to observing the original process $(Y(x))_{x\in[0,1]}$ .
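This equivalence can be illustrated directly in sequence space: each empirical coefficient is the true one plus independent $\mathcal{N}(0,\frac{1}{n})$ noise, so the expected noise energy up to level $J$ is $\sum_{j=2}^{J}2^{j}/n=(2^{J+1}-4)/n$, which diverges as $J\to\infty$. A minimal simulation sketch in Python, with hypothetical function names and $f=0$ for simplicity:

```python
import numpy as np

def observe_coefficients(a, n, rng):
    """Sequence-space version of dY = f dx + n^{-1/2} dB.

    a[i] holds the true coefficients a_{j,k} of level j = i + 2; each observed
    coefficient is a_{j,k} plus independent N(0, 1/n) noise.
    """
    return [level + rng.standard_normal(level.shape) / np.sqrt(n) for level in a]

def expected_noise_energy(J, n):
    """E[sum_{j<=J} sum_k (ahat_{j,k} - a_{j,k})^2] = sum_{j=2}^J 2^j / n."""
    return (2 ** (J + 1) - 4) / n

rng = np.random.default_rng(0)
n, J = 400, 12
a = [np.zeros(2 ** j) for j in range(2, J + 1)]   # f = 0, levels j = 2, ..., J
ahat = observe_coefficients(a, n, rng)
residuals = np.concatenate([obs - true for obs, true in zip(ahat, a)])
print(residuals.var() * n)            # close to 1, i.e. variance 1/n
print(expected_noise_energy(J, n))    # grows like 2^{J+1} / n
```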

3 Main results

In this section, we state and discuss our main results, that is, upper and lower bounds on $\rho^{\ast}(\eta)$. We also provide a high-level description of the strategy and ideas behind the upper bound proof, which is our main contribution.

3.1 Upper Bound

The test

Note that $\widehat{f}$ from (2.4) is not a useful estimator as it exhibits infinite variance. Therefore, we need to carefully impose a restriction of the form $j\leq J$ for some fixed $J\in\mathbb{N}$ , $J\geq 2$ . Actually, section 5 is primarily concerned with obtaining an upper bound on $\rho_{J}^{\ast}(\eta)$ for the reduced, finite-dimensional problem

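$$H_{0}^{\prime}:\ S_{J}:=\Big\|\sum_{j=2}^{J}\sum_{k=1}^{2^{j}}a_{j,k}\psi_{j,k}\Big\|_{\mathcal{B}_{s}}\leq R\quad\text{vs.}\quad H_{1}^{\prime}:\ \inf_{h\in B_{s}(R)}\Big\|\sum_{j=2}^{J}\sum_{k=1}^{2^{j}}a_{j,k}\psi_{j,k}-h\Big\|_{L_{2}}>\rho_{J},$$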

where $\rho_{J}$ and $\rho_{J}^{\ast}(\eta)$ are analogous in definition and relation to their counterparts in (1.2) and (1.3). In fact, finding a sufficient separation distance $\rho_{J}\geq\rho^{\ast}_{J}(\eta)$ here is the central and most involved part of the paper.

As we illustrate in section 3.2, a test based only on estimating $S_{J}^{2}$ cannot perform well enough under the targeted separation distance of order $n^{-t/(2t+1/2)}$, due to the strong variance at high levels, so more flexibility is necessary. In Lemma 5.3, we analyse the smallest level $j^{\ast}$ such that $S_{j^{\ast}}$ considerably exceeds $R$ (such an index must exist under $H_{1}^{\prime}$), and it turns out that this is detectable through the estimator $\|P_{2}^{j^{\ast}}\widehat{f}\|_{\mathcal{B}_{s}}^{2}$ (section 5.4, second paragraph). Hence, we propose a test which evaluates the individual accumulated (squared) Sobolev norms of the projections up to level $J$ and rejects the null hypothesis whenever one of these norms is too large.

In particular, we define for $j^{\ast}\in\{2,3,\ldots,J\}$

[TABLE]

and finally the test

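$$\varphi=1-\prod_{j^{\ast}=2}^{J}\mathds{1}_{\{T_{j^{\ast},\alpha_{j^{\ast}}}\leq\tau_{j^{\ast},\alpha_{j^{\ast}}}\}}.$$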

In principle, the conditions $T_{j^{\ast},\alpha_{j^{\ast}}}\leq\tau_{j^{\ast},\alpha_{j^{\ast}}}$ are based on applying Chebyshev's inequality to the estimators $\|P_{2}^{j^{\ast}}\widehat{f}\|_{B_{s}}^{2}$ with a bias-correction term $A_{j^{\ast}}$ (Lemma 5.2 below). Since the variance of $\|P_{2}^{j^{\ast}}\widehat{f}\|_{B_{s}}^{2}$ depends on $f$, it needs to be estimated, which is reflected in particular in the last part of $T_{j^{\ast},\alpha_{j^{\ast}}}$.

The choice of $J$ is then governed by a trade-off between the resulting upper bound on $\rho_{J}^{\ast}(\eta)$ and the error incurred by ignoring the resolutions beyond $J$: it is the index at which both are of order $n^{-\frac{t}{2t+1/2}}$,

$$J=\Big\lfloor\frac{1}{2t+1/2}\,\frac{\ln(n)}{\ln(2)}\Big\rfloor.$$
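The balance behind this choice can be checked numerically. The sketch below assumes, as in classical signal detection, that testing $2^{J}$ coefficients at noise level $n^{-1/2}$ costs a separation of order $2^{J/4}/\sqrt{n}$ (an assumption about the finite-dimensional part, not a statement taken from the proof), while the neglected tail of a $B_{t}(R)$ function is of order $R\,2^{-Jt}$. Because of the floor, $2^{J}\leq n^{1/(2t+1/2)}<2^{J+1}$, so both terms stay within constant factors of $n^{-t/(2t+1/2)}$:

```python
import math

def resolution_cutoff(n, t):
    """J = floor(log2(n) / (2t + 1/2)), the truncation level in the text."""
    return math.floor(math.log2(n) / (2 * t + 0.5))

def tradeoff_ratios(n, t):
    """Both error terms at level J, divided by the target rate n^{-t/(2t+1/2)}."""
    J = resolution_cutoff(n, t)
    rate = n ** (-t / (2 * t + 0.5))
    detection = 2 ** (J / 4) / math.sqrt(n)  # assumed cost of testing 2**J coefficients
    tail = 2.0 ** (-J * t)                    # tail term R * 2^{-Jt}, with R = 1
    return J, detection / rate, tail / rate

for n in (10 ** 3, 10 ** 6, 10 ** 9):
    J, d, b = tradeoff_ratios(n, t=1.0)
    # The floor guarantees 2^{-1/4} < d <= 1 and 1 <= b < 2^t.
    print(n, J, round(d, 3), round(b, 3))
```

For example, $n=10^{6}$ and $t=1$ give $J=7$.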

In terms of technical ingredients, all these considerations are remarkable in that they rely solely on elementary computations based on the geometry of the Sobolev balls and on classical properties of the $\chi^{2}$-distribution.
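To make the shape of the test (3.1) concrete, here is a simplified Python sketch of a multiple test of the same form. It is not the paper's test: the levels $\alpha_{j^{\ast}}$, the thresholds $\tau_{j^{\ast},\alpha_{j^{\ast}}}$ and the variance estimate inside $T_{j^{\ast},\alpha_{j^{\ast}}}$ are replaced by hypothetical choices, namely a Bonferroni split $\alpha/(J-1)$ and a worst-case variance bound under the null. Subtracting $\sum_{j\leq j^{\ast}}4^{js}2^{j}/n$ makes each statistic unbiased for $\|P_{2}^{j^{\ast}}f\|_{\mathcal{B}_{s}}^{2}$, and Chebyshev's inequality plus a union bound keeps the type-I error below $\alpha$.

```python
import numpy as np

def multilevel_chebyshev_test(ahat, n, s, R, alpha):
    """Simplified multiple test with the shape of (3.1); not the paper's test.

    ahat[i] holds the 2**j empirical coefficients of level j = i + 2, j <= J.
    Returns (reject, statistics, thresholds), one statistic per j* = 2, ..., J.
    """
    alpha_level = alpha / len(ahat)        # hypothetical Bonferroni split over levels
    stats, thresholds = [], []
    energy = bias = noise_var = 0.0
    for i, level in enumerate(ahat):
        j = i + 2
        w = 4.0 ** (j * s)                 # Sobolev weight 4^{js}
        energy += w * np.sum(level ** 2)   # ||P_2^j fhat||_{B_s}^2
        bias += w * 2 ** j / n             # expected contribution of pure noise
        noise_var += w ** 2 * 2 * 2 ** j / n ** 2
        # Under the null, the signal part of the variance is at most 4 * 4^{js} * R^2 / n.
        var_bound = noise_var + 4 * w * R ** 2 / n
        stats.append(energy - bias)        # unbiased for ||P_2^j f||_{B_s}^2
        thresholds.append(R ** 2 + np.sqrt(var_bound / alpha_level))
    reject = any(T > tau for T, tau in zip(stats, thresholds))
    return reject, stats, thresholds

def simulate(a, n, rng):
    """Empirical coefficients: true coefficients plus N(0, 1/n) noise."""
    return [lvl + rng.standard_normal(lvl.shape) / np.sqrt(n) for lvl in a]

rng = np.random.default_rng(1)
n, s, R, alpha, J = 10 ** 6, 1.0, 1.0, 0.1, 5
null = [np.zeros(2 ** j) for j in range(2, J + 1)]   # f = 0 lies in B_s(R)
alt = [lvl.copy() for lvl in null]
alt[0][0] = 0.75                                     # ||f||_{B_s} = sqrt(4^{2s}) * 0.75 = 3 > R
print(multilevel_chebyshev_test(simulate(null, n, rng), n, s, R, alpha)[0])
print(multilevel_chebyshev_test(simulate(alt, n, rng), n, s, R, alpha)[0])
```

A worst-case variance bound is conservative; the paper instead estimates the $f$-dependent variance, as described above.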

Our main result reads as follows:

Theorem 3.1.

Let $\eta\in(0,1)$ . Whenever

$$\rho\geq\Big(\frac{1346}{\eta}+\frac{R}{1-2^{-t}}\Big)n^{-\frac{t}{2t+1/2}},$$

the test $\varphi$ from (3.1) fulfils

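$$\sup_{f\in B_{s}(R)}\mathds{P}_{f}(\varphi=1)+\sup_{f\in\widetilde{B}_{s,t}(R,\rho)}\mathds{P}_{f}(\varphi=0)\leq\eta.$$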

Hence,

$$\rho^{\ast}(\eta)\leq\Big(\frac{1346}{\eta}+\frac{R}{1-2^{-t}}\Big)n^{-\frac{t}{2t+1/2}}.$$

3.2 Remark on the relation to [8]

In order to clarify the distinction between the previous work [8], with $H_{0}:\ f\in B_{s,\infty}(R)$, and the present paper, we consider two rather specific examples.

Testing the resolutions separately does not suffice

First of all, note that $B_{s,\infty}(R)$ is very large compared to $B_{s}(R)$ , which ensures that, as mentioned above, the test $\Psi$ from [8] performs well under the null hypothesis $H_{0}$ of the present paper. However, this geometric imbalance is so strong that often for one and the same function, we would like one test to reject the null hypothesis and the other test to not reject it:

Consider a simple extreme case where

[TABLE]

Then clearly we have

[TABLE]

It can be ensured that $f\in B_{t}(R)$ through the condition $t<s-\log_{4}\left(\frac{2}{\sqrt{5}-1}\right)$, so that we have found a case where

[TABLE]

i.e. both the null hypothesis of [8] and our alternative hypothesis are met. With high probability, the test from [8], which evaluates the individual levels separately, will not reject our null hypothesis. On the other hand, in order to check the new test's performance, let us invoke Theorem 3.1: by construction, for any $h\in B_{s}(R)$, there is a sequence $(a_{j})_{j\in\{2,3,\ldots\}}$ in $[0,1]$ such that

[TABLE]

Then we have

[TABLE]

where the last bound can be derived from the observation that necessarily $a_{2}\leq\frac{1}{\sqrt{2}}$ or $a_{3}\leq\frac{1}{\sqrt{2}}$ . As this holds for any $h\in B_{s}(R)$ , in particular we have

[TABLE]

for appropriate $n\in\mathbb{N}$ so that the new test detects that $f\notin B_{s}(R)$ with high probability.

Estimating only $\boldsymbol{\|P_{2}^{J}f\|_{\mathcal{B}_{s}}}$ does not suffice

The strategy of only estimating $\|P_{2}^{J}f\|_{\mathcal{B}_{s}}$ is too optimistic in the present setting:

Consider a case where for some $a>1$

[TABLE]

Then on the one hand,

[TABLE]

which, again, exceeds our (squared) upper bound for appropriate $n$ or $a$ so that in principle, it is possible to detect $f\notin B_{s}(R)$ in the sense of Theorem 3.1.

Note that we can see this without using information beyond the first level. This is an important observation with regard to the construction of our test.

Furthermore, we have

[TABLE]

On the other hand, as we show in Lemma 5.2 below, in this special case the cost in terms of standard deviation of including the estimate $\|P_{J}\widehat{f}\|_{\mathcal{B}_{s}}^{2}$ would be

[TABLE]

(up to an absolute constant). For large enough $s$ and/or small enough $a$, this standard deviation exceeds the (squared) distance to be detected; hence, a test based on level $J$ alone is unlikely to correctly reject the null hypothesis. The test we propose copes with such a situation by analysing multiple accumulated estimates, and would detect $f\notin B_{s}(R)$ with high probability already at the first level.

3.3 Lower Bound

Using the same choice for $J$ as indicated above, a lower bound on $\rho^{\ast}(\eta)$ of the same order can be derived by studying the statistical distance between specific distributions agreeing with $H_{0}$ and $H_{1}$, respectively.

Theorem 3.2.

Let $\eta\in(0,1)$ . There are $C_{\eta}>0$ and $N_{\eta}\in\mathbb{N}$ such that whenever $n\geq N_{\eta}$ and

[TABLE]

for any test $\varphi$ it holds that

[TABLE]

Hence,

[TABLE]

In particular, one may choose

[TABLE]

Note that, as mentioned in the introduction, Theorems 3.1 and 3.2 in conjunction reveal the minimax separation rate to be of order

$$n^{-\frac{t}{2t+1/2}},$$

which does not depend on the size of the null hypothesis and is equal to the signal-detection rate. Indeed, in order to obtain the lower bound of Theorem 3.2, the fact that $H_{0}$ is a composite hypothesis need not be used.

4 Alternative settings

Before presenting the proofs of our main results, we briefly discuss their possible application in two alternative settings which might also be of interest, see also [8, Section 3.3] and references therein.

Heteroscedastic noise

As a generalisation of (1.1), consider the model

[TABLE]

where $\sigma\in L_{2}$ is unknown. The proof of Theorem 3.1 relies heavily on unbiased estimators of $a_{j,k}^{2}$ , $(j,k)\in\mathcal{I}$ , and hence on knowledge of the noise coefficient, so that in this generalised version we cannot directly apply our result. However, there is a relatively simple solution under certain conditions: Suppose we have access to two independent realisations $(Y^{(1)}(x))_{x\in[0,1]}$ and $(Y^{(2)}(x))_{x\in[0,1]}$ with noise coefficient, say, $\frac{\sigma(x)}{\sqrt{n/2}}$ . Then we can still consider the estimates

[TABLE]

and define a new unbiased estimator for $a_{j,k}^{2}$ based on the simple observation

[TABLE]

If, in addition, we know an upper bound on $\|\sigma\|_{L_{2}}$, it turns out that we can state a concentration result analogous to the one for the homoscedastic model (see Lemma 5.2 below) and obtain essentially the same result.
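The point of working with two independent realisations is that, for independent $\widehat{a}^{(1)}_{j,k}$ and $\widehat{a}^{(2)}_{j,k}$ with common mean $a_{j,k}$, the product satisfies $\mathds{E}[\widehat{a}^{(1)}_{j,k}\widehat{a}^{(2)}_{j,k}]=a_{j,k}^{2}$ whatever their variance, whereas $\mathds{E}[(\widehat{a}^{(1)}_{j,k})^{2}]=a_{j,k}^{2}+\mathrm{Var}(\widehat{a}^{(1)}_{j,k})$ requires knowing the noise level. A quick Monte Carlo check in Python; the value `noise_sd` is a hypothetical stand-in for the unknown noise contribution and is never used by the estimators:

```python
import numpy as np

rng = np.random.default_rng(2)
a, noise_sd, reps = 0.3, 0.5, 10 ** 6   # noise_sd plays the role of the unknown sigma

# Two independent noisy copies of the same wavelet coefficient.
a_hat_1 = a + noise_sd * rng.standard_normal(reps)
a_hat_2 = a + noise_sd * rng.standard_normal(reps)

squared = np.mean(a_hat_1 ** 2)        # biased: about a^2 + noise_sd^2 = 0.34
product = np.mean(a_hat_1 * a_hat_2)   # unbiased: about a^2 = 0.09
print(squared, product)
```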

Regression

Another possible observation scheme for testing the smoothness of $f$ would be collecting $n$ iid samples $(X_{i},Y_{i})_{i\in\{1,2,\ldots,n\}}$ according to the model

[TABLE]

for $\epsilon\sim\mathcal{N}(0,1)$ and $X$ uniformly distributed on $[0,1]$. This situation is particularly interesting since, as mentioned above, it is asymptotically equivalent to (4.1) in the sense of Le Cam ([18]). We could then arrive at the same situation as in the previous setting by considering

[TABLE]

Note that if $X$ is not uniformly distributed, $\mathds{E}[\widehat{a}^{(i)}_{j,k}]=a_{j,k}$ is generally not true and it becomes crucial to guarantee a certain spread of the design points $(X_{i})_{i\in\{1,2,\ldots,n\}}$ over $[0,1]$ .
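With uniform design, the sample average $\frac{1}{n}\sum_{i}Y_{i}\psi_{j,k}(X_{i})$ is unbiased for $a_{j,k}$, since $\mathds{E}[Y\psi_{j,k}(X)]=\int_{0}^{1}f\psi_{j,k}\,dx$; with a non-uniform design density $p$ it instead targets $\int_{0}^{1}f\psi_{j,k}\,p\,dx$. A Python sketch of both cases; as a stand-in we use the Haar wavelet (an assumption, since the paper works with the basis from [10]), a hypothetical test function $f=\frac{1}{2}+3\psi_{2,1}$ for which $a_{2,1}=3$ exactly, and a Beta(2,1) design as an example of non-uniform sampling:

```python
import numpy as np

def haar_psi(j, k, x):
    """L2-normalised Haar wavelet at level j, position k in {1, ..., 2**j}.

    Stand-in only: the paper uses the wavelet basis from [10].
    """
    y = 2.0 ** j * x - (k - 1)
    up = ((0.0 <= y) & (y < 0.5)).astype(float)
    down = ((0.5 <= y) & (y < 1.0)).astype(float)
    return 2.0 ** (j / 2) * (up - down)

def f(x):
    return 0.5 + 3.0 * haar_psi(2, 1, x)     # hypothetical test function, a_{2,1} = 3

rng = np.random.default_rng(3)
n = 200_000
X = rng.uniform(0.0, 1.0, n)                 # uniform design
Y = f(X) + rng.standard_normal(n)            # epsilon ~ N(0, 1)
a_hat = np.mean(Y * haar_psi(2, 1, X))       # empirical coefficient
print(a_hat)                                 # close to 3

# Non-uniform design: the same average now estimates int f psi p dx instead.
X_skew = rng.beta(2.0, 1.0, n)
Y_skew = f(X_skew) + rng.standard_normal(n)
print(np.mean(Y_skew * haar_psi(2, 1, X_skew)))
```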

Open problems: separation in $\boldsymbol{L}_{p}$ -norm and more general Sobolev-spaces

We only consider separation in $L_{2}$-norm, which raises the question whether the results can be generalised to separation in $L_{p}$-norm, $p>0$; the same is true for the previous paper [8]. We believe that the strategies of both papers cannot easily be generalised, as different values of $p$ result in fundamentally different problems. Indeed, strong differences with varying $p$ already show in the seemingly simple setting of signal detection in the Gaussian vector/sequence model, see [15, section 3.3.2]. Much more closely related to the present paper, in [19] the authors derive optimal rates for estimating $\|f\|_{L_{p}}$ and give very different results and approaches for even versus odd integers $p$. With that said, considering more general Sobolev balls would seem to produce similar effects, as our results rely heavily on estimating the $2$-norm of projections of $f$ (or, in some sense, the $\sup$-norm in the previous paper); coping with different parameters here is not trivial, as can be seen for instance in the proofs of [7].

In summary, such considerations are generally possible and constitute worthwhile future work, but they are beyond the scope of the present paper.

5 Proof of Theorem 3.1

5.1 General preparations

Reduction of the range of resolutions

We first make the restriction to resolutions $j\leq J$ precise. For $j_{1},j_{2}\in\mathbb{N}\cup\{\infty\}$ with $2\leq j_{1}\leq j_{2}$ and $g\in L_{2}$, define the projections

[TABLE]

Now observe that since $f\in B_{t}(R)$ , for each $j\in\mathbb{N}$ , $j\geq 2$ , we have

[TABLE]

and hence

[TABLE]

Using the triangle inequality, this tells us that under the alternative hypothesis

[TABLE]

Accordingly, under $H_{1}$ we consider the assumption

[TABLE]

and first solve the testing problem for $\rho_{J}$ on the reduced range $j\in\{2,3,\ldots,J\}$; that is, in what follows we primarily study the testing problem

[TABLE]

Finally, $\rho$ will be determined by choosing $J$ such that a reasonable trade-off between the two summands,

[TABLE]

is realised.
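For orientation, here is one way to obtain a tail bound of the kind used in this reduction. It is a sketch consistent with the constant $\frac{R}{1-2^{-t}}$ in Theorem 3.1, not a reproduction of the displays above. Write $P_{j}:=P_{j}^{j}$. Since $f\in B_{t}(R)$, every single level satisfies $4^{jt}\|P_{j}f\|_{L_{2}}^{2}\leq\|f\|_{\mathcal{B}_{t}}^{2}\leq R^{2}$, and hence, by the triangle inequality,

$$\|P_{J+1}^{\infty}f\|_{L_{2}}\leq\sum_{j=J+1}^{\infty}\|P_{j}f\|_{L_{2}}\leq R\sum_{j=J+1}^{\infty}2^{-jt}=\frac{R\,2^{-(J+1)t}}{1-2^{-t}}\leq\frac{R}{1-2^{-t}}\,n^{-\frac{t}{2t+1/2}},$$

where the last step uses $2^{J+1}>n^{1/(2t+1/2)}$, which holds for $J=\lfloor\log_{2}(n)/(2t+1/2)\rfloor$.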

Now, more specifically, with $a=1346$ , for $j^{\ast}\in\{2,3,\ldots,J\}=:\mathcal{J}$ , let

[TABLE]

Under $H_{1}^{\prime}$ it will be technically useful to detect the level $\boldsymbol{j^{\ast}}\in\mathcal{J}$ at which $\displaystyle\inf_{h\in B_{s}(R)}\|P_{2}^{j^{\ast}}f-h\|_{\mathcal{B}_{s}}$ first exceeds $\rho_{j^{\ast}}$ in the sense of Lemma 5.1 below. That leads to a multiple test across the set $\mathcal{J}$, finally given in (5.89).

Decomposition of $\boldsymbol{H_{1}^{\prime}}$

Lemma 5.1.

Under the alternative hypothesis $H_{1}^{\prime}$ , we have

[TABLE]

**Proof.** By contradiction: assume that (5.9) is false, i.e.

[TABLE]

Then clearly $F_{J}$ is false, so that $E_{J}$ is true. Equivalently, $F_{J-1}$ is false and in turn $E_{J-1}$ must be true. Continued application of this argument leads to the contradiction

[TABLE]

$\Box$

Concentration of $\boldsymbol{\|P_{2}^{j^{\ast}}\widehat{f}\|_{\mathcal{B}_{s}}^{2}}$

Lemma 5.2.

Let $j^{\ast}\in\mathcal{J}$ . Then, with

[TABLE]

it holds that

[TABLE]

**Proof.** For $j\in\mathcal{J}$, let

[TABLE]

Then, by construction, we know that

[TABLE]

i.e. a $\chi^{2}$-distribution with $2^{j}$ degrees of freedom and non-centrality parameter $\lambda_{j}$. Classical properties of this distribution now tell us

[TABLE]

Since

[TABLE]

independence in conjunction with (5.12) yields

[TABLE]

We obtain the desired result directly through Chebyshev’s inequality: For $\epsilon>0$ ,

[TABLE]

and hence the claim. $\Box$
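The moment formulas behind Lemma 5.2 can be made explicit. Writing $E_{j}:=\sum_{k=1}^{2^{j}}a_{j,k}^{2}$, the variable $n\sum_{k}\widehat{a}_{j,k}^{2}$ is noncentral $\chi^{2}$ with $2^{j}$ degrees of freedom and noncentrality $nE_{j}$ (our normalisation; the paper's $\lambda_{j}$ may be scaled differently). Since a noncentral $\chi^{2}_{m}(\lambda)$ variable has mean $m+\lambda$ and variance $2(m+2\lambda)$, independence across levels gives

$$\mathds{E}\,\|P_{2}^{j^{\ast}}\widehat{f}\|_{\mathcal{B}_{s}}^{2}=\|P_{2}^{j^{\ast}}f\|_{\mathcal{B}_{s}}^{2}+\sum_{j=2}^{j^{\ast}}4^{js}\frac{2^{j}}{n},\qquad\mathds{V}\mathrm{ar}\,\|P_{2}^{j^{\ast}}\widehat{f}\|_{\mathcal{B}_{s}}^{2}=\sum_{j=2}^{j^{\ast}}16^{js}\Big(\frac{2\cdot 2^{j}}{n^{2}}+\frac{4E_{j}}{n}\Big).$$

The Python sketch below, with hypothetical helper names and coefficients, checks both formulas by Monte Carlo and confirms empirically that Chebyshev's bound $\mathds{P}(|X-\mathds{E}X|\geq\sqrt{\mathds{V}\mathrm{ar}X/\delta})\leq\delta$ holds:

```python
import numpy as np

def weighted_moments(a, n, s):
    """Exact mean and variance of sum_j 4^{js} sum_k ahat_{j,k}^2, ahat ~ N(a, 1/n).

    a[i] holds the true coefficients of level j = i + 2.
    """
    mean = var = 0.0
    for i, level in enumerate(a):
        j = i + 2
        w = 4.0 ** (j * s)
        E_j = np.sum(level ** 2)
        mean += w * (E_j + 2 ** j / n)
        var += w ** 2 * (2 * 2 ** j / n ** 2 + 4 * E_j / n)
    return mean, var

def simulate_statistic(a, n, s, reps, rng):
    """Monte Carlo draws of the same weighted statistic."""
    out = np.zeros(reps)
    for i, level in enumerate(a):
        j = i + 2
        noisy = level + rng.standard_normal((reps, level.size)) / np.sqrt(n)
        out += 4.0 ** (j * s) * np.sum(noisy ** 2, axis=1)
    return out

rng = np.random.default_rng(4)
n, s, delta = 50, 0.5, 0.1
a = [np.full(2 ** j, 0.1) for j in range(2, 5)]   # hypothetical coefficients, j = 2, 3, 4
mean, var = weighted_moments(a, n, s)
draws = simulate_statistic(a, n, s, reps=200_000, rng=rng)
print(mean, draws.mean())                          # exact vs simulated mean
print(var, draws.var())                            # exact vs simulated variance
print(np.mean(np.abs(draws - mean) >= np.sqrt(var / delta)))  # at most delta
```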

More specifically, observe that

[TABLE]

(where we use that for $x\geq 2$ , $\frac{x}{x-1}\leq 2$ ) and hence for $\delta\in(0,1)$

[TABLE]

Furthermore,

[TABLE]

The maximum in the latter computation will play an important role in the sequel. From now on we use the abbreviation

[TABLE]

Plugging these bounds into (5.11) leads to

[TABLE]

for any $\delta\in(0,1)$ .

5.2 Preliminary Bounds on $\boldsymbol{\|P_{2}^{j^{\ast}}f\|_{\mathcal{B}_{s}}}$

As a next step towards controlling the type-I and type-II errors of our test, we study $\|P_{2}^{j^{\ast}}f\|_{\mathcal{B}_{s}}$ more closely.

On the one hand, under $H_{0}^{\prime}$ , for any $j^{\ast}\in\mathcal{J}$ we clearly have $\|P_{2}^{j^{\ast}}f\|_{\mathcal{B}_{s}}\leq R$ .

On the other hand, under $H_{1}^{\prime}$ , we require a lower bound on $\|P_{2}^{j^{\ast}}f\|_{\mathcal{B}_{s}}$ . The following bound is preliminary in the sense that it requires the knowledge of an index $j^{\ast}\in\mathcal{J}$ with the property from (5.9) and the corresponding $M_{j^{\ast}}$ . The generalisation will be considered in sections 5.3 and 5.4.

Lemma 5.3.

Let $j^{\ast}\in\mathcal{J}$ be an index with the property

[TABLE]

Then the following assertion holds for $A=11$ :

[TABLE]

**Proof.** Before giving the main arguments, we need a technical preparation and a general (i.e. only depending on $j^{\ast}$) lower bound on $\|P_{2}^{j^{\ast}}f\|_{\mathcal{B}_{s}}$:

Proxy minimisation of $\inf_{h\in B_{s}(R)}\|P_{2}^{j^{\ast}}f-h\|_{L_{2}}$

For $\widetilde{j}\in\mathcal{J}$ , write $P_{j\neq\widetilde{j}}:=P_{2}^{j^{\ast}}-P_{\widetilde{j}}$ . In the case that $\|P_{j\neq\widetilde{j}}f\|_{\mathcal{B}_{s}}\leq R$ , we can introduce the function $\widetilde{h}$ through the wavelet coefficients

[TABLE]

Then $\widetilde{h}\in B_{s}(R)$ holds since

[TABLE]

Hence, by assumption

[TABLE]

where

[TABLE]

This tells us that if $\|P_{j\neq\widetilde{j}}f\|_{\mathcal{B}_{s}}\leq R$ ,

[TABLE]

Bound in terms of $4^{j^{\ast}s}\rho_{j^{\ast}}^{2}$

If $\|P_{2}^{j^{\ast}-1}f\|_{\mathcal{B}_{s}}\leq R$ , we can use (5.37) with $\widetilde{j}=j^{\ast}$ and $d\geq\rho_{j^{\ast}}\geq 0$ and obtain

[TABLE]

If $\|P_{2}^{j^{\ast}-1}f\|_{\mathcal{B}_{s}}>R$ , observe that by the triangle inequality

[TABLE]

and since

[TABLE]

we obtain

[TABLE]

So, in any case,

[TABLE]

Main arguments

We are now ready to prove (5.28). To that end, fix an index

[TABLE]

Case 1: $\boldsymbol{\|P_{j\neq\overline{j}}f\|_{\mathcal{B}_{s}}\leq R}$

In that case, we can use (5.36) and (5.37) with $\widetilde{j}=\overline{j}$ in combination with (5.41) and obtain

[TABLE]

remembering (5.25).

Case 2: $\boldsymbol{\|P_{j\neq\overline{j}}f\|_{\mathcal{B}_{s}}>R}$

That case can be handled quickly by considering two subcases:

Subcase 1: $\boldsymbol{4^{j^{\ast}s}\rho_{j^{\ast}}^{2}\geq\rho_{j^{\ast}}M_{j^{\ast}}}$

Observe that with (5.41)

[TABLE]

Subcase 2: $\boldsymbol{4^{j^{\ast}s}\rho_{j^{\ast}}^{2}<\rho_{j^{\ast}}M_{j^{\ast}}}$

In that case we have

[TABLE]

and thus

[TABLE]

This concludes the proof since in any case (5.28) holds.

$\Box$

5.3 Estimation of $\boldsymbol{M_{j^{\ast}}}$

As a last major step before directly controlling the type-I and type-II error probabilities, we need to find an appropriate estimator for $M_{j^{\ast}}$ .

Lemma 5.4.

For $\delta\in(0,1)$ and $j^{\ast},j\in\mathcal{J}$ , let

[TABLE]

and define the events

[TABLE]

Then, for any monotone decreasing sequence $(\beta_{j})_{j\in\mathcal{J}}$ in $(0,1)$ , the following holds:

[TABLE]

**Proof.** Remembering (5.12), we know that for $j\in\{2,3,\ldots,j^{\ast}\}$

[TABLE]

has the properties

[TABLE]

Now observe that for $\delta\in(0,1)$

[TABLE]

With $Y_{j}=Z_{j}-16^{js}\frac{2^{j}}{n}$ , Chebyshev’s inequality now tells us that

[TABLE]

We derive two bounds from this statement by lower bounding the left-hand side in two different ways:

On the one hand, observe

[TABLE]

Now, since $(\beta_{j})_{j\in\mathcal{J}}$ is monotone decreasing, the sequence $(v_{\beta_{j},j^{\ast}})_{j\in\mathcal{J}}$ is increasing, so that via a union bound we obtain

[TABLE]

With

[TABLE]

we have

[TABLE]

and hence the first claim from (5.58).

On the other hand, observe

[TABLE]

and consider the specific case $j=\overline{j}$ in (5.65):

[TABLE]

which asserts the second claim from (5.58). $\Box$

5.4 Conclusion

We will now assemble the individual results of the previous sections to obtain the claim of Theorem 3.1. For $j\in\mathcal{J}$ we introduce

[TABLE]

so that in particular

[TABLE]

and $(\beta_{j})_{j\in\mathcal{J}}$ is monotone decreasing.

Result for fixed index

For $j^{\ast}\in\mathcal{J}$ define

[TABLE]

Then under $H_{0}^{\prime}\cap\xi_{j^{\ast},\beta_{j^{\ast}}}^{0}$ , (5.56) and (5.26) yield that with probability at least $1-\alpha_{j^{\ast}}$

[TABLE]

so that with

[TABLE]

we obtain

[TABLE]

On the other hand, let $\boldsymbol{j^{\ast}}$ be a transition index with property (5.27). Then under $H_{1}^{\prime}\cap\xi_{\boldsymbol{j^{\ast}},\beta_{\boldsymbol{j^{\ast}}}}^{1}$ , (5.26) and (5.28) tell us that with probability at least $1-\alpha_{\boldsymbol{j^{\ast}}}$

[TABLE]

Provided that

[TABLE]

using (5.57) this yields

[TABLE]

Now by explicit computation we see that the choices in (5.76) ensure (5.86) as well as

[TABLE]

so that (5.87) can be continued as

[TABLE]

and hence, finally,

[TABLE]

Generalisation to unknown $\boldsymbol{j^{\ast}}$

For our test

[TABLE]

we can conclude with (5.58) and (5.76) that on the one hand

[TABLE]

and on the other hand

[TABLE]

Specification of $\boldsymbol{J}$ and conclusion

We are now ready to return to (5.7). Choose

[TABLE]

so that

[TABLE]

That yields

[TABLE]

and, on the other hand,

[TABLE]

Therefore, whenever we choose

[TABLE]

indeed by (5.91) and (5.95)

[TABLE]

6 Proof of Theorem 3.2

6.1 Description of the Strategy

According to (1.3), given $\eta\in(0,1)$ , we aim at finding $\rho>0$ such that for any test $\varphi$ ,

[TABLE]

This can be achieved through a Bayesian-type approach, see e.g. [1]: Let $\nu_{0},\nu_{\rho}$ be probability distributions (priors) such that $\mathrm{supp}(\nu_{0})\subseteq B_{s}(R)$ and $\mathrm{supp}(\nu_{\rho})\subseteq\widetilde{B}_{s,t}(R,\rho)$ . Then we have

[TABLE]

This tells us that if we find $\widetilde{\rho}>0$ such that

[TABLE]

for any test $\varphi$ it holds that

[TABLE]

and hence

[TABLE]
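Spelled out, one standard version of this reduction reads as follows; it is a sketch, and the constants in the displays above may differ. Write $\mathds{P}_{\nu}:=\int\mathds{P}_{f}\,d\nu(f)$. Suprema dominate averages over priors supported in the respective hypotheses, $\mathds{P}_{\nu_{0}}(\varphi=1)+\mathds{P}_{\nu_{\rho}}(\varphi=0)=1-\big(\mathds{P}_{\nu_{\rho}}(\varphi=1)-\mathds{P}_{\nu_{0}}(\varphi=1)\big)$, and $\mathrm{TV}(P,Q)\leq\frac{1}{2}\sqrt{\chi^{2}(P\,\|\,Q)}$ by the Cauchy–Schwarz inequality. Hence, for every test $\varphi$,

$$\sup_{f\in B_{s}(R)}\mathds{P}_{f}(\varphi=1)+\sup_{f\in\widetilde{B}_{s,t}(R,\rho)}\mathds{P}_{f}(\varphi=0)\ \geq\ \mathds{P}_{\nu_{0}}(\varphi=1)+\mathds{P}_{\nu_{\rho}}(\varphi=0)\ \geq\ 1-\mathrm{TV}(\mathds{P}_{\nu_{0}},\mathds{P}_{\nu_{\rho}})\ \geq\ 1-\frac{1}{2}\sqrt{\chi^{2}(\mathds{P}_{\nu_{\rho}}\,\|\,\mathds{P}_{\nu_{0}})}.$$

Consequently, $\chi^{2}(\mathds{P}_{\nu_{\rho}}\,\|\,\mathds{P}_{\nu_{0}})<4(1-\eta)^{2}$ suffices for the sum of the two error probabilities to exceed $\eta$.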

6.2 Application to our Problem

Priors

Since the upper bound does not depend on $s$ and we found the index $J$ from (5.96) to be critical, we choose the following structurally simple priors: Let $\nu_{0}$ be the Dirac- $\delta$ distribution on $\{0\}$ (i.e. $f\equiv 0$ ) and $\nu_{\rho}$ be the uniform distribution on

[TABLE]

where $v>0$ needs further specification. On the one hand, it is necessary to ensure that each $f\in\mathcal{A}_{\rho,v}$ fulfils $\|f\|_{\mathcal{B}_{t}}\leq R$; note that for any such $f$, $\|f\|_{L_{2}}=2^{J/2}v$, so that by construction this condition reads

[TABLE]

This motivates the choice $v:=a_{\eta}\cdot R\cdot 2^{-J(t+1/2)}$ for some $a_{\eta}\in(0,1]$ specified later based on further restrictions. On the other hand, we require

[TABLE]

Since only the level $J$ is involved, this is in fact merely the minimum over the Euclidean ball with radius $R\cdot 2^{-Js}$ so that

[TABLE]

Now, by explicit computation we see that if

[TABLE]

with our choice of $v$ we have

[TABLE]

so that (6.6) holds if

[TABLE]
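The two requirements on $v$ can be checked numerically. For $f$ whose $2^{J}$ level-$J$ coefficients all equal $\pm v$, the choice $v=a_{\eta}R\,2^{-J(t+1/2)}$ gives $\|f\|_{\mathcal{B}_{t}}=2^{Jt}\,2^{J/2}v=a_{\eta}R$, and, because the minimum is over a Euclidean ball of radius $R\,2^{-Js}$, the $L_{2}$-distance from $f$ to $B_{s}(R)$ equals $\max(a_{\eta}R\,2^{-Jt}-R\,2^{-Js},0)$, which for large $n$ is of order $n^{-t/(2t+1/2)}$ since $s>t$. A Python sketch with a hypothetical helper and hypothetical parameter values:

```python
import math

def prior_geometry(n, t, s, R, a_eta):
    """Level-J prior: returns J, v, ||f||_{B_t} and the L2-distance of f to B_s(R)."""
    J = math.floor(math.log2(n) / (2 * t + 0.5))
    v = a_eta * R * 2.0 ** (-J * (t + 0.5))
    l2_norm = 2.0 ** (J / 2) * v                 # 2**J coefficients equal to +-v
    bt_norm = 2.0 ** (J * t) * l2_norm           # all energy sits at level J
    # Distance from a point to the Euclidean ball of radius R 2^{-Js} (level-J part of B_s(R)).
    separation = max(l2_norm - R * 2.0 ** (-J * s), 0.0)
    return J, v, bt_norm, separation

n, t, s, R, a_eta = 10 ** 6, 1.0, 2.0, 1.0, 0.5
J, v, bt_norm, separation = prior_geometry(n, t, s, R, a_eta)
rate = n ** (-t / (2 * t + 0.5))
print(J, bt_norm, separation / rate)     # J = 7 and ||f||_{B_t} is about a_eta * R = 0.5
```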

Statistical distance

Again, the central task in this proof is to compute the $\chi^{2}$-divergence between $\mathds{P}_{f\sim\nu_{0}}$ and $\mathds{P}_{f\sim\nu_{\rho}}$. By construction, $\mathds{P}_{f\sim\nu_{0}}$ corresponds to the $2^{J}$-fold product of Gaussian distributions with mean $0$ and variance $\frac{1}{n}$, so that for $x\in\mathbb{R}^{2^{J}}$

[TABLE]

On the other hand, $\mathds{P}_{f\sim\nu_{\rho}}$ corresponds to a uniform mixture of $2^{2^{J}}$ products of $2^{J}$ independent Gaussians with means of the form $\pm v$ and variance $\frac{1}{n}$ .

Let $\mathcal{S}:=\{1,-1\}^{2^{J}}$ and let $R$ be uniformly distributed on $\mathcal{S}$ (i.e. a vector of $2^{J}$ independent Rademacher variables). Then

[TABLE]

and furthermore, with an independent copy $R^{\prime}$ of $R$,

[TABLE]
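Written out, the Gaussian likelihood ratio of a single mixture component $\mathcal{N}(vr,\frac{1}{n}I)$ against $\mathcal{N}(0,\frac{1}{n}I)$, for $r\in\mathcal{S}$, is $\exp(nv\langle r,x\rangle-n2^{J}v^{2}/2)$, and averaging over the independent signs makes the mixture likelihood ratio factorise as $\prod_{k}e^{-nv^{2}/2}\cosh(nvx_{k})$. The Python sketch below checks this factorisation numerically by enumerating $\mathcal{S}$ for a small, hypothetical $2^{J}$; it assumes the displays above are these standard Gaussian likelihood ratios, and the function names are hypothetical.

```python
import itertools
import math


def mixture_lr_enumerated(x, v, n):
    """Average over all r in S = {1,-1}^m of dN(v r, I/n)/dN(0, I/n) at x."""
    m = len(x)
    total = 0.0
    for r in itertools.product((1, -1), repeat=m):
        inner = sum(ri * xi for ri, xi in zip(r, x))
        total += math.exp(n * v * inner - n * m * v * v / 2)
    return total / 2 ** m


def mixture_lr_factorised(x, v, n):
    """Same quantity via independence of the signs: prod_k e^{-n v^2/2} cosh(n v x_k)."""
    return math.prod(math.exp(-n * v * v / 2) * math.cosh(n * v * xk) for xk in x)


# Hypothetical example with 2^J = 4 coordinates.
x = [0.3, -0.1, 0.7, 0.0]
v, n = 0.2, 5
print(mixture_lr_enumerated(x, v, n), mixture_lr_factorised(x, v, n))
```

The enumeration costs $2^{2^{J}}$ terms, while the factorised form is linear in $2^{J}$; this independence across coordinates is also what makes the $\chi^{2}$-divergence tractable.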

The quotient we need to integrate in (6.5) therefore reads

[TABLE]

Since the coordinatewise product of two independent Rademacher vectors is again a vector of independent Rademacher variables, we obtain

[TABLE]
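For Gaussian means $\theta,\theta'$ and covariance $\frac{1}{n}I$, the standard identity $\int\frac{p_{\theta}p_{\theta'}}{p_{0}}=\exp(n\langle\theta,\theta'\rangle)$ with $\theta=vR$, $\theta'=vR'$ turns the second moment of the likelihood ratio into $\mathbb{E}\exp(nv^{2}\sum_{k}R_{k}R^{\prime}_{k})$, and the independence of the Rademacher products gives $\cosh(nv^{2})^{2^{J}}$. The Python sketch below, assuming the display above has this form, compares the closed form with brute-force enumeration of all pairs $(R,R')$ for a small, hypothetical $2^{J}$; the helper names are hypothetical.

```python
import itertools
import math


def second_moment_enumerated(m, v, n):
    """E_{R,R'} exp(n v^2 <R, R'>) by enumerating all pairs of sign vectors in {1,-1}^m."""
    signs = list(itertools.product((1, -1), repeat=m))
    total = sum(
        math.exp(n * v * v * sum(a * b for a, b in zip(r, r_prime)))
        for r in signs
        for r_prime in signs
    )
    return total / len(signs) ** 2


def second_moment_closed_form(m, v, n):
    """cosh(n v^2)^m, using that the products R_k R'_k are i.i.d. Rademacher."""
    return math.cosh(n * v * v) ** m


m, v, n = 4, 0.3, 10  # hypothetical: 2^J = 4
chi2 = second_moment_closed_form(m, v, n) - 1.0  # chi^2-divergence = E[L^2] - 1
print(second_moment_enumerated(m, v, n), second_moment_closed_form(m, v, n), chi2)
```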

Conclusion

Now, (6.5) holds if

[TABLE]

which, by explicit computation, is fulfilled if

[TABLE]
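One standard way to carry out this computation uses the elementary inequality $\cosh(x)\leq e^{x^{2}/2}$, which follows from comparing Taylor coefficients, since $(2k)!\geq 2^{k}k!$. The sketch below assumes that the quantity controlled in (6.5) is the second moment $\cosh(nv^{2})^{2^{J}}$ and that it must stay below $1+\kappa_{\eta}$, where $\kappa_{\eta}>0$ is a hypothetical name for the threshold fixed in the display above:

```latex
% Sketch; \kappa_\eta is a hypothetical stand-in for the threshold in (6.5)
\cosh(nv^{2})^{2^{J}}
  \leq \exp\Bigl(2^{J}\cdot\frac{n^{2}v^{4}}{2}\Bigr)
  \leq 1+\kappa_{\eta}
\quad\text{whenever}\quad
2^{J-1}n^{2}v^{4}\leq\log(1+\kappa_{\eta}).
```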

Through (5.97) and (5.98) we find that

[TABLE]

and obtain the stronger condition

[TABLE]

In summary: Let

[TABLE]

If

[TABLE]

the priors $\nu_{0}$ and $\nu_{\rho}$ meet all requirements and the lower bound

[TABLE]

is established, where we write $C_{\eta}:=\frac{R}{2}a_{\eta}$.
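The rate stated in the abstract, $\rho^{*}\sim n^{-t/(2t+1/2)}$, does not involve $s$, and its exponent can equivalently be written as $2t/(4t+1)$, the form in which the signal detection rate is commonly stated. The short Python check below confirms this identity with exact rational arithmetic for a few sample values of $t$; `rate_exponent` is a hypothetical helper name.

```python
from fractions import Fraction


def rate_exponent(t):
    """Exponent r in rho* ~ n^{-r}, with r = t / (2t + 1/2)."""
    return t / (2 * t + Fraction(1, 2))


for t in (Fraction(1, 4), Fraction(1), Fraction(5, 2), Fraction(7)):
    r = rate_exponent(t)
    # Same exponent as the detection rate n^{-2t/(4t+1)}; s never enters.
    assert r == 2 * t / (4 * t + 1)
    print(t, r)
```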

Bibliography


[1] Baraud, Y. Non-asymptotic minimax rates of testing in signal detection. Bernoulli 8, 5 (2002), 577–606.
[2] Belitser, E., et al. On coverage and local radial rates of credible sets. The Annals of Statistics 45, 3 (2017), 1124–1151.
[3] Blanchard, G., Carpentier, A., and Gutzeit, M. Minimax Euclidean separation rates for testing convex hypotheses in $\mathbb{R}^{d}$. Electronic Journal of Statistics 12, 2 (2018), 3713–3735.
[4] Bull, A., and Nickl, R. Adaptive confidence sets in $L_{2}$. Probability Theory and Related Fields 156, 3-4 (2013), 889–919.
[5] Cai, T. T., and Low, M. G. Adaptive confidence balls. The Annals of Statistics 34, 1 (2006), 202–228.
[6] Cai, T. T., and Low, M. G. Testing Composite Hypotheses, Hermite Polynomials and Optimal Estimation of a Nonsmooth Functional. The Annals of Statistics 39, 2 (2011), 1012–1041.
[7] Carpentier, A. Honest and adaptive confidence sets in $l_{p}$. Electronic Journal of Statistics 7 (2013), 2875–2923.
[8] Carpentier, A. Testing the regularity of a smooth signal. Bernoulli 21, 1 (2015), 465–488.


