Minimax $L_2$-Separation Rate in Testing the Sobolev-Type Regularity of a Function
Maurilio Gutzeit

TL;DR
This paper investigates the minimax $L_2$-separation rate for testing whether a function in a Sobolev space has higher smoothness, deriving bounds that reveal the rate's independence from the higher smoothness level.
Contribution
It provides the first precise characterization of the minimax separation rate in Sobolev smoothness testing, showing it matches the rate in simple signal detection.
Findings
The separation rate scales as $n^{-t/(2t+1/2)}$.
The rate is independent of the higher smoothness level $s$.
The results unify the understanding of smoothness testing and signal detection rates.
OvGU Magdeburg, Institut für Mathematische Stochastik
Universitätsplatz 2, 39106 Magdeburg, Germany
(2019)
Abstract
In this paper we study the problem of testing if a function belonging to a certain -Sobolev-ball of radius with smoothness level indeed exhibits a higher smoothness level , that is, belongs to . We assume that only a perturbed version of is available, where the noise is governed by a standard Brownian motion scaled by . More precisely, considering a testing problem of the form
[TABLE]
for some , we approach the task of identifying the smallest value for , denoted , enabling the existence of a test with small error probability in a minimax sense. By deriving lower and upper bounds on , we expose its precise dependence on :
[TABLE]
As a remarkable aspect of this composite-composite testing problem, it turns out that the rate does not depend on $s$ and is equal to the rate in signal-detection, i.e. the case of a simple null hypothesis.
MSC: 62G10
Keywords: minimax hypothesis testing, nonasymptotic minimax separation rate, Gaussian white noise, Sobolev ball, smoothness
1 Introduction
Let , a fixed unknown element of
[TABLE]
and a standard Brownian motion. Suppose we observe the Gaussian process determined by the stochastic differential equation
[TABLE]
The resulting probability measure, expectation and variance given will be written , and , respectively. Depending on the context and if there is no risk of confusion we may drop the index or write another index, for instance in the context of lower bounds (section 3.3).
Testing problem
We now fix and . For any , we denote by the -Sobolev-ball of radius of functions on with regularity at least – see section 2 for a precise definition. Based on that, let
[TABLE]
Hence, if we interpret and as degrees of smoothness, is the set of functions with smoothness level at least which are separated from the class with stronger smoothness by in -sense. Now, the testing problem of interest is
[TABLE]
More specifically, given , we aim at finding the magnitude in terms of of the smallest separation distance which enables the existence of a test of level in a minimax sense, i.e. of
[TABLE]
Related questions and literature
There are in essence two lines of work with questions or ideas closely related to the present paper.
Firstly, considering the simpler null hypothesis puts us in the so-called signal-detection setting, which has already been studied; see for instance the series of seminal papers [14] as well as [15, 19], [11] for a more recent treatment, or [22] for the question of adaptivity to . In that context, the order of with respect to is shown to be
[TABLE]
Moreover, the question of adaptivity to e.g. is considered in [22], and [1] covers signal detection for Besov balls in a Gaussian sequence setting.
Secondly, another closely related task is the construction of (adaptive and honest) confidence regions for . In [4], the authors study such sets in terms of -separation, but rather than the observation they use a Gaussian sequence model. However, due to the asymptotic equivalence of these models in the sense of Le Cam (see [18]), it is possible to derive from their arguments that for our problem (1.2),
[TABLE]
While the resulting gap in the case is not essential in the confidence region setting (see also [5] and [16]), it is quite important from a testing perspective as it raises the question how the complexity of the null hypothesis influences the separation rate.
Further relevant literature on confidence sets and adaptivity would be [7] as well as [9] (matrix completion), [20] (linear regression) and [12] (density estimation). Moreover, interestingly, literature on the frequentist coverage of Bayesian credible sets reveals conditions (“polished tail”, “self-similarity” or also “excessive bias”) which enable deriving adaptive or honest confidence sets from adaptive Bayesian credible sets - see for instance [23] (adaptive confidence sets in Gaussian sequence model with Sobolev-type regularity), [21] (honest confidence sets in rather abstract framework) or also [2] (white-noise model, e.g. adaptive minimax results for the setting from [23] under “excessive bias”).
Now, the article [8] is by far the closest previous work to the present paper. Indeed, the author studies the same problem with another choice of Sobolev-ball, namely the -Sobolev-balls . In this context, is proved to be of magnitude
[TABLE]
Note that this quantity is equal to the rate in the signal-detection case and hence in particular does not depend on . This makes the issue of the gap in (1.4) even more interesting and, from a technical perspective, it is rather striking given that moving from a simple to the composite null hypothesis is a significant step. On top of that, there are settings where the separation rate strongly depends on the shape of the null hypothesis, see e.g. [3] and [17] or also [6].
To the best of our knowledge, the case of [8] is the only one for which the minimax -separation rate is known and our main contribution is to extend that result to the -Sobolev-space. While our lower bound (Theorem 3.2 in section 3) is essentially a corollary of the corresponding result [8, Theorem 3.2], the upper bound (Theorem 3.1 in section 3) cannot be established through a simple application of [8, Theorem 3.1]. As , this might be surprising at first sight: Indeed, the test from [8] would perform well in the present setting in terms of type-I-error. However, ensuring sufficient power is significantly more difficult when considering -Sobolev-balls, see section 3.2 for an explicit example.
2 Setting
In this section, we describe how the relevant Sobolev balls and the observed Gaussian process will be represented throughout the paper.
Wavelet transform and associated Sobolev ball
Throughout the paper, we make heavy use of a wavelet decomposition of . As is well-known, we can define a scalar product and associated norm on by
[TABLE]
There are many orthogonal wavelet bases of with respect to . A suitable choice for our purposes is a basis developed in [10] that can be written as
[TABLE]
i.e. it is tailored such that there are exactly basis functions at resolution . Clearly, the coefficients of with respect to are given by
[TABLE]
and yield the representation
[TABLE]
Let . By virtue of isometry properties discussed for instance in [24] and [13], we may now define a functional -Sobolev-ball of radius solely through the wavelet coefficients of its elements, based on the basis from (2.1):
[TABLE]
with associated -Sobolev-norm
[TABLE]
or also, as mentioned at the end of the previous section,
[TABLE]
Discrete observation scheme based on the wavelet basis
Let
[TABLE]
Motivated by (2.3), for each we consider
[TABLE]
so that
[TABLE]
The natural corresponding estimators read
[TABLE]
By construction and due to the orthonormality of , we know that the family is independent with
[TABLE]
Clearly, observing this family is equivalent to observing the original process .
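As a rough illustration of this discrete observation scheme, the following sketch simulates a family of independent noisy coefficients. It rests on assumptions that are not confirmed by the text above: each empirical coefficient equals the true coefficient plus centred Gaussian noise with variance 1/n, and level j carries 2**j coefficients. The names `simulate_coefficients` and `theta` are hypothetical.

```python
import math
import random


def simulate_coefficients(theta, n, rng):
    """Return noisy coefficients: each entry is theta[j][k] plus N(0, 1/n) noise.

    Assumption (not taken from the paper): theta is a list of levels,
    where level j is a list of 2**j true wavelet coefficients.
    """
    sigma = 1.0 / math.sqrt(n)
    return [[a + rng.gauss(0.0, sigma) for a in level] for level in theta]


# Hypothetical true coefficients that decay with the level j.
theta = [[2.0 ** (-1.5 * j)] * (2 ** j) for j in range(6)]
rng = random.Random(0)
observed = simulate_coefficients(theta, n=10_000, rng=rng)
level_sizes = [len(level) for level in observed]
```

The nested-list layout mirrors the level-wise structure of the basis, so that later statistics can be accumulated level by level.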
3 Main results
In this section, we state and discuss our main results, that is, upper and lower bounds on . We also provide a high-level description of the strategy and ideas included in the upper bound proof, which is our main contribution.
3.1 Upper Bound
The test
Note that from (2.4) is not a useful estimator as it exhibits infinite variance. Therefore, we need to carefully impose a restriction of the form for some fixed , . Actually, section 5 is primarily concerned with obtaining an upper bound on for the reduced, finite-dimensional problem
[TABLE]
where and are analogous in definition and relation to their counterparts in (1.2) and (1.3). In fact, finding a sufficient separation distance here is the central and most involved part of the paper.
As we illustrate in section 3.2, it turns out that a test based on estimating only cannot perform well enough under the targeted separation distance of order due to the strong variance at high levels, so that more flexibility is necessary: In Lemma 5.3, we analyse the smallest level such that considerably exceeds (such an index must exist under ) and it turns out that this is detectable through the estimator (section 5.4, second paragraph). Hence, we propose a test which evaluates the individual accumulated (squared) Sobolev-norms of the projections up until level and rejects the null hypothesis whenever one of these norms is too large.
In particular, we define for
[TABLE]
and finally the test
[TABLE]
In principle, the conditions are based on applying Chebyshev’s inequality to the estimators with a bias-correction term (Lemma 5.2 below). Now, since the variance of depends on , it needs to be estimated, which manifests itself especially in the last part of .
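The structure of such a multiple test can be sketched as follows. This is a minimal illustration under stated assumptions, not the test defined in (3.1): the Sobolev weight 2**(2*j*s) per level, the noise variance 1/n, and above all the thresholds are placeholders chosen for this example, whereas the thresholds of the actual test involve a bias correction and an estimated variance term. The function names are hypothetical.

```python
def accumulated_statistics(observed, n, s):
    """Bias-corrected estimates of the weighted squared norms up to each level.

    Assumptions (illustrative, not the paper's exact definitions):
    observed[j] holds the noisy coefficients of level j, each with noise
    variance 1/n, and level j carries the Sobolev weight 2**(2*j*s).
    """
    stats, total = [], 0.0
    for j, level in enumerate(observed):
        weight = 2.0 ** (2 * j * s)
        # y*y - 1/n is unbiased for the squared true coefficient.
        total += weight * sum(y * y - 1.0 / n for y in level)
        stats.append(total)
    return stats


def multiple_test(observed, n, s, thresholds):
    """Reject (return 1) if any accumulated statistic exceeds its threshold."""
    stats = accumulated_statistics(observed, n, s)
    return int(any(stat > tau for stat, tau in zip(stats, thresholds)))


# Tiny deterministic example with two levels.
observed = [[1.0], [0.0, 0.0]]
stats = accumulated_statistics(observed, n=4, s=1.0)
```

Rejecting as soon as one accumulated statistic is too large is what allows a signal at a low level to be detected without paying for the variance of the higher levels.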
The choice of is then governed by reaching a trade-off between the resulting upper bound on and the error incurred by ignoring the resolutions beyond - it is the index where they are both of order ,
[TABLE]
In terms of technical ingredients, all these considerations are remarkable in that they solely rely on elementary computations based on the Sobolev-balls’ geometry and classical properties of the distribution.
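The trade-off behind the choice of the maximal level can be illustrated numerically. The sketch below is a heuristic reading rather than the paper's exact definition: it assumes that the squared error of ignoring levels beyond J is of order 2**(-2*J*t), that the standard deviation of the accumulated estimator is of order 2**(J/2)/n, and that the rate is n**(-t/(2t+1/2)). The names `balanced_level` and `separation_rate` are hypothetical.

```python
import math


def balanced_level(n, t):
    """Smallest integer J with 2**(-2*J*t) <= 2**(J/2) / n.

    Assumption (heuristic, not the paper's exact choice): the two terms
    being balanced are the squared bias 2**(-2*J*t) and the standard
    deviation 2**(J/2) / n of the accumulated estimator.
    """
    return math.ceil(math.log2(n) / (2.0 * t + 0.5))


def separation_rate(n, t):
    """The rate n**(-t / (2*t + 1/2))."""
    return n ** (-t / (2.0 * t + 0.5))


n, t = 2 ** 20, 1.0
J = balanced_level(n, t)
bias_sq = 2.0 ** (-2 * J * t)
std = 2.0 ** (J / 2) / n
rate_sq = separation_rate(n, t) ** 2
```

With these values, both terms coincide with the squared rate, which is the balance described above; the level J depends on t and n only.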
Our main result reads as follows:
Theorem 3.1.
Let . Whenever
[TABLE]
the test from (3.1) fulfils
[TABLE]
Hence,
[TABLE]
3.2 Remark on the relation to [8]
In order to clarify the distinction between the previous work [8] with and the present paper, we consider two rather specific examples.
Testing the resolutions separately does not suffice
First of all, note that is very large compared to , which ensures that, as mentioned above, the test from [8] performs well under the null hypothesis of the present paper. However, this geometric imbalance is so strong that often for one and the same function, we would like one test to reject the null hypothesis and the other test to not reject it:
Consider a simple extreme case where
[TABLE]
Then clearly we have
[TABLE]
It can be assured that through the condition , so that clearly we have found a case where
[TABLE]
i.e. both the null hypothesis of [8] and our alternative hypothesis are met. The test from [8] based on separately evaluating the individual levels will clearly not reject our null hypothesis with high probability. On the other hand, in order to check the new test’s performance, let us invoke Theorem 3.1: By construction, for any , there is a sequence in such that
[TABLE]
Then we have
[TABLE]
where the last bound can be derived from the observation that necessarily or . As this holds for any , in particular we have
[TABLE]
for appropriate so that the new test detects that with high probability.
Estimating only does not suffice
The strategy of only estimating is too optimistic in the present setting:
Consider a case where for some
[TABLE]
Then on the one hand,
[TABLE]
which, again, exceeds our (squared) upper bound for appropriate or so that in principle, it is possible to detect in the sense of Theorem 3.1.
Note that we can see this without using information on more than the first level. This is an important observation with regard to the construction of our test.
Furthermore, we have
[TABLE]
On the other hand, as we show in Lemma 5.2 below, in this special case the cost in terms of standard deviation of including the estimate would be
[TABLE]
(absolute constant). For large enough and/or small enough , this standard deviation exceeds the (squared) distance to be detected - hence, a test based on level is unlikely to correctly reject the null hypothesis. The test we propose copes with such a situation through analysing multiple accumulated estimates and would have detected at the first level already with high probability.
3.3 Lower Bound
Using the same choice for as indicated above, a lower bound on of the same order can be derived through studying the statistical distance between specific distributions agreeing with and respectively.
Theorem 3.2.
Let . There are and such that whenever and
[TABLE]
for any test it holds that
[TABLE]
Hence,
[TABLE]
In particular, one may choose
[TABLE]
Note that, as mentioned in the introduction, Theorems 3.1 and 3.2 in conjunction reveal the minimax separation rate to be of order
[TABLE]
which does not depend on the size of the null hypothesis and is equal to the signal-detection rate. Indeed, in order to obtain the lower bound of Theorem 3.2, the fact that is a composite hypothesis need not be used.
4 Alternative settings
Before presenting the proofs of our main results, we briefly discuss their possible application in two alternative settings which might also be of interest, see also [8, Section 3.3] and references therein.
Heteroscedastic noise
As a generalisation of (1.1), consider the model
[TABLE]
where is unknown. The proof of Theorem 3.1 relies heavily on unbiased estimators of , , and hence on knowledge of the noise coefficient, so that in this generalised version we cannot directly apply our result. However, there is a relatively simple solution under certain conditions: Suppose we have access to two independent realisations and with noise coefficient, say, . Then we can still consider the estimates
[TABLE]
and define a new unbiased estimator for based on the simple observation
[TABLE]
If in addition we know an upper bound on , it turns out that we can state an analogous concentration result as the one for the homoscedastic model (see Lemma 5.2 below) and obtain essentially the same result.
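The observation behind the new estimator can be checked with a small simulation. The sketch assumes, in hypothetical notation, that two independent observations Y1 and Y2 of the same coefficient `theta` are available with the same noise level `sigma`; the product Y1 * Y2 then has expectation theta**2 regardless of `sigma`, so no knowledge of the noise coefficient is needed for unbiasedness.

```python
import random
import statistics


def product_estimates(theta, sigma, reps, rng):
    """Draw `reps` values of Y1 * Y2, where Y1 and Y2 are independent
    observations of the same coefficient theta with noise level sigma.

    The product is unbiased for theta**2 whatever sigma is, because
    E[Y1 * Y2] = E[Y1] * E[Y2] = theta**2 by independence.
    """
    values = []
    for _ in range(reps):
        y1 = theta + sigma * rng.gauss(0.0, 1.0)
        y2 = theta + sigma * rng.gauss(0.0, 1.0)
        values.append(y1 * y2)
    return values


rng = random.Random(1)
draws = product_estimates(theta=0.5, sigma=0.3, reps=200_000, rng=rng)
mean_estimate = statistics.fmean(draws)  # should be close to 0.25
```

The variance of the product still depends on the noise level, which is why an upper bound on it is needed for a concentration result.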
Regression
Another possible observation scheme for testing the smoothness of would be collecting iid samples according to the model
[TABLE]
for and uniformly distributed on . This situation is particularly interesting since, as mentioned above, it is asymptotically equivalent to (4.1) in the sense of Le Cam ([18]). We could then arrive at the same situation as in the previous setting by considering
[TABLE]
Note that if is not uniformly distributed, is generally not true and it becomes crucial to guarantee a certain spread of the design points over .
Open problems: separation in -norm and more general Sobolev-spaces
We only consider separation in -norm, which raises the question whether it is possible to generalise the results to separation in -norm, ; the same is true for the previous paper [8]. We believe that the strategies of both papers cannot be easily generalised as different values for result in fundamentally different problems. Indeed, strong differences with varying already show in the seemingly simple setting of signal detection in the Gaussian vector/sequence model, see [15, section 3.3.2]. Much more closely related to the present paper, in [19] the authors derive optimal rates for estimating and give very different results and approaches for even versus odd integers . With that said, considering more general Sobolev-balls would seem to produce similar effects as our results heavily rely on estimating the -norm of projections of (or, in some sense, -norm in the previous paper); coping with different parameters here is not trivial as can be seen for instance in the proofs of [7].
In summary, such considerations are generally possible and constitute worthwhile future work, but they are beyond the scope of the present paper.
5 Proof of Theorem 3.1
5.1 General preparations
Reduction of the range of resolutions
Let us make this reduction precise at this point already: For with and , define the projections
[TABLE]
Now observe that since , for each , , we have
[TABLE]
and hence
[TABLE]
Using the triangle inequality, this tells us that under the alternative hypothesis
[TABLE]
Accordingly, under we consider the assumption
[TABLE]
and firstly solve for in terms of the reduced range , that is, subsequently, we will primarily study the testing problem
[TABLE]
Finally, will be determined by choosing such that a reasonable trade-off between the two summands,
[TABLE]
is realised.
Now, more specifically, with , for , let
[TABLE]
Under it will be technically useful to detect the level at which firstly exceeds in the sense of Lemma 5.1 below. That leads to a multiple test across the set finally given in (5.89).
Decomposition of
Lemma 5.1.
Under the alternative hypothesis , we have
[TABLE]
Proof. By contradiction: Assume that (5.9) is false, i.e.
[TABLE]
Then clearly is false, so that is true. Equivalently, is false and in turn must be true. Continued application of this argument leads to the contradiction
[TABLE]
Concentration of
Lemma 5.2.
Let . Then, with
[TABLE]
it holds that
[TABLE]
Proof. For , let
[TABLE]
Then, by construction, we know that
[TABLE]
i.e. a distribution with degrees of freedom and non-centrality parameter . Classical properties of this distribution now tell us
[TABLE]
Since
[TABLE]
independence in conjunction with (5.12) yields
[TABLE]
We obtain the desired result directly through Chebyshev’s inequality: For ,
[TABLE]
and hence the claim.
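To make the classical properties used here concrete in the simplest case, consider a single observation $Y = \theta + \xi/\sqrt{n}$ with $\xi \sim \mathcal{N}(0,1)$. The symbols $Y$, $\theta$ and $\xi$ are generic placeholders introduced only for this sketch and are not the paper's notation, and the noise variance $1/n$ is an assumption. Then $nY^2$ follows a non-central $\chi^2$-distribution with one degree of freedom and non-centrality parameter $n\theta^2$, so that

```latex
\mathbb{E}\left[Y^2 - \tfrac{1}{n}\right] = \theta^2,
\qquad
\operatorname{Var}\left(Y^2\right) = \frac{2}{n^2} + \frac{4\theta^2}{n}.
```

Summing over $m$ independent coordinates without weights gives an unbiased estimator of $\|\theta\|^2$ with variance $2m/n^2 + 4\|\theta\|^2/n$, and Chebyshev's inequality then bounds its deviation by $\sqrt{\operatorname{Var}/\alpha}$ with probability at least $1-\alpha$.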
More specifically, observe that
[TABLE]
(where we use that for , ) and hence for
[TABLE]
Furthermore,
[TABLE]
The maximum in the latter computation will play an important role in the sequel. From now on we use the abbreviation
[TABLE]
Plugging these bounds in (5.11) leads to
[TABLE]
for any .
5.2 Preliminary Bounds on
As a next step towards controlling the type-I and type-II errors of our test, we study more closely.
On the one hand, under , for any we clearly have .
On the other hand, under , we require a lower bound on . The following bound is preliminary in the sense that it requires the knowledge of an index with the property from (5.9) and the corresponding . The generalisation will be considered in sections 5.3 and 5.4.
Lemma 5.3.
Let be an index with the property
[TABLE]
Then the following assertion holds for :
[TABLE]
Proof. Before giving the main arguments, we need a technical preparation and a general (i.e. only depending on ) lower bound on :
1. Proxy minimisation of
For , write . In the case that , we can introduce the function through the wavelet coefficients
[TABLE]
Then holds since
[TABLE]
Hence, by assumption
[TABLE]
where
[TABLE]
This tells us that if ,
[TABLE]
2. Bound in terms of
If , we can use (5.37) with and and obtain
[TABLE]
If , observe that by the triangle inequality
[TABLE]
and since
[TABLE]
we obtain
[TABLE]
So, in any case,
[TABLE]
3. Main arguments
We are now ready to prove (5.28) effectively. To that end, fix an index
[TABLE]
Case 1:
In that case, we can use (5.36) and (5.37) with in combination with (5.41) and obtain
[TABLE]
remembering (5.25).
Case 2:
That case can be handled quickly by considering two subcases:
Subcase 1:
Observe that with (5.41)
[TABLE]
Subcase 2:
In that case we have
[TABLE]
and thus
[TABLE]
This concludes the proof since in any case (5.28) holds.
5.3 Estimation of
As a last major step before directly controlling the type-I and type-II error probabilities, we need to find an appropriate estimator for .
Lemma 5.4.
For and , let
[TABLE]
and define the events
[TABLE]
Then, for any monotone decreasing sequence in , the following holds:
[TABLE]
Proof. Remembering (5.12), we know that for
[TABLE]
has the properties
[TABLE]
Now observe that for
[TABLE]
With , Chebyshev’s inequality now tells us that
[TABLE]
We derive two bounds from this statement by lower bounding the left-hand side in two different ways:
On the one hand, observe
[TABLE]
Now, since is monotone decreasing, the sequence is increasing, so that via a union bound we obtain
[TABLE]
With
[TABLE]
we have
[TABLE]
and hence the first claim from (5.58).
On the other hand, observe
[TABLE]
and consider the specific case in (5.65):
[TABLE]
which asserts the second claim from (5.58).
5.4 Conclusion
We will now assemble the individual results of the previous sections to obtain the claim of Theorem 3.1. For we introduce
[TABLE]
so that in particular
[TABLE]
and is monotone decreasing.
Result for fixed index
For define
[TABLE]
Then under , (5.56) and (5.26) yield that with probability at least
[TABLE]
so that with
[TABLE]
we obtain
[TABLE]
On the other hand, let be a transition index with property (5.27). Then under , (5.26) and (5.28) tell us that with probability at least
[TABLE]
Provided that
[TABLE]
using (5.57) this yields
[TABLE]
Now by explicit computation we see that the choices in (5.76) ensure (5.86) as well as
[TABLE]
so that (5.87) can be continued as
[TABLE]
and hence, finally,
[TABLE]
Generalisation to unknown
For our test
[TABLE]
we can conclude with (5.58) and (5.76) that on the one hand
[TABLE]
and on the other hand
[TABLE]
Specification of and conclusion
We are now ready to return to (5.7). Choose
[TABLE]
so that
[TABLE]
That yields
[TABLE]
and, on the other hand,
[TABLE]
Therefore, whenever we choose
[TABLE]
[TABLE]
6 Proof of Theorem 3.2
6.1 Description of the Strategy
According to (1.3), given , we aim at finding such that for any test ,
[TABLE]
This can be achieved through a Bayesian-type approach, see e.g. [1]: Let be probability distributions (priors) such that and . Then we have
[TABLE]
This tells us that if we find such that
[TABLE]
for any test it holds that
[TABLE]
and hence
[TABLE]
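For orientation, reductions of this type rest on two standard facts, stated here in generic notation that is assumed for illustration and may differ from the paper's: $\mathbb{P}_{\nu_0}$ and $\mathbb{P}_{\nu_1}$ denote the distributions of the observation when the function is drawn from the respective prior, $\mathrm{TV}$ the total variation distance and $\chi^2$ the chi-square divergence.

```latex
\inf_{\varphi}\left[\mathbb{P}_{\nu_0}(\varphi = 1) + \mathbb{P}_{\nu_1}(\varphi = 0)\right]
= 1 - \mathrm{TV}\left(\mathbb{P}_{\nu_0}, \mathbb{P}_{\nu_1}\right)
\ge 1 - \frac{1}{2}\sqrt{\chi^2\left(\mathbb{P}_{\nu_1}, \mathbb{P}_{\nu_0}\right)}.
```

The inequality follows from the Cauchy-Schwarz inequality applied to the likelihood ratio, so a small chi-square divergence forces the sum of the two error probabilities of any test to be close to one.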
6.2 Application to our Problem
Priors
Since the upper bound does not depend on and we found the index from (5.96) to be critical, we choose the following structurally simple priors: Let be the Dirac- distribution on (i.e. ) and be the uniform distribution on
[TABLE]
where needs further specification: On the one hand, it is necessary to ensure that each fulfils - note that for any such , , so that by construction that condition reads
[TABLE]
This motivates the choice for some specified later based on further restrictions. On the other hand, we require
[TABLE]
Since only the level is involved, this is in fact merely the minimum over the Euclidean ball with radius so that
[TABLE]
Now, by explicit computation we see that if
[TABLE]
with our choice of we have
[TABLE]
so that (6.6) holds if
[TABLE]
Statistical distance
Again, the central task in this proof is to compute the -divergence between and . By construction, corresponds to the -fold product of Gaussian distributions with mean $0$ and variance , so that for
[TABLE]
On the other hand, corresponds to a uniform mixture of products of independent Gaussians with means of the form and variance .
Let and be uniformly distributed on (i.e. the product of Rademacher variables). Then
[TABLE]
and furthermore, with an independent copy of ,
[TABLE]
The quotient we need to integrate in (6.5) therefore reads
[TABLE]
Since the product of independent Rademacher variables is itself a Rademacher variable, we obtain
[TABLE]
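The shape of this computation can be summarised in generic notation that is assumed here rather than taken from the paper: let $m$ be the number of coefficients at the critical level, $a > 0$ the common absolute value of the prior means, $1/n$ the noise variance, and $\epsilon, \epsilon'$ independent Rademacher vectors. A standard calculation for Gaussian mixtures against a centred Gaussian then gives

```latex
1 + \chi^2\left(\mathbb{P}_{\nu_1}, \mathbb{P}_{\nu_0}\right)
= \mathbb{E}_{\epsilon,\epsilon'}\left[\exp\left(n a^2 \sum_{k=1}^{m} \epsilon_k \epsilon'_k\right)\right]
= \cosh\left(n a^2\right)^{m}
\le \exp\left(\frac{m n^2 a^4}{2}\right),
```

where the last step uses $\cosh(x) \le e^{x^2/2}$. Under these assumptions the divergence stays small as soon as $a^2$ is a sufficiently small multiple of $1/(n\sqrt{m})$.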
Conclusion
Now, (6.5) holds if
[TABLE]
which, by explicit computation, is fulfilled if
[TABLE]
Through (5.97) and (5.98) we find that
[TABLE]
and obtain the stronger condition
[TABLE]
In summary: Let
[TABLE]
If
[TABLE]
the priors and meet all requirements and the lower bound
[TABLE]
is established, where we write .
References
[1] Baraud, Y. Non-asymptotic minimax rates of testing in signal detection. Bernoulli 8, 5 (2002), 577–606.
[2] Belitser, E., et al. On coverage and local radial rates of credible sets. The Annals of Statistics 45, 3 (2017), 1124–1151.
[3] Blanchard, G., Carpentier, A., and Gutzeit, M. Minimax Euclidean separation rates for testing convex hypotheses in $\mathbb{R}^{d}$. Electronic Journal of Statistics 12, 2 (2018), 3713–3735.
[4] Bull, A., and Nickl, R. Adaptive confidence sets in $L_{2}$. Probability Theory and Related Fields 156, 3-4 (2013), 889–919.
[5] Cai, T. T., and Low, M. G. Adaptive confidence balls. The Annals of Statistics 34, 1 (2006), 202–228.
[6] Cai, T. T., and Low, M. G. Testing Composite Hypotheses, Hermite Polynomials and Optimal Estimation of a Nonsmooth Functional. The Annals of Statistics 39, 2 (2011), 1012–1041.
[7] Carpentier, A. Honest and adaptive confidence sets in $l_{p}$. Electronic Journal of Statistics 7 (2013), 2875–2923.
[8] Carpentier, A. Testing the regularity of a smooth signal. Bernoulli 21, 1 (2015), 465–488.
