Testing Gaussian Process with Applications to Super-Resolution
Jean-Marc Azaïs, Yohann De Castro, Stéphane Mourareau

TL;DR
This paper develops exact Gaussian process testing methods for super-resolution applications, including grid-based and grid-less procedures, demonstrating their effectiveness in deconvolution and sparse signal detection.
Contribution
It introduces novel grid-based and grid-less testing procedures for Gaussian processes, applicable to super-resolution and deconvolution, with proven convergence and power advantages.
Findings
Grid-less test is more powerful in detecting sparse signals.
Both tests work with unknown variance and known correlation.
Numerical results show the grid-less method's superior performance.
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Gaussian Processes and Bayesian Inference · Statistical Methods and Inference
arXiv:1706.00679
Institut de Mathématiques de Toulouse, Université Paul Sabatier, 118 route de Narbonne, 31062 Toulouse, France.
Laboratoire de Mathématiques d'Orsay, Univ. Paris-Sud, CNRS, Université Paris-Saclay, 91405 Orsay, France.
INRIA, Centre de Recherche de Paris, Équipe MoKaPlan, 2 rue Simone Iff, 75012 Paris, France.
Laboratoire d'Analyse et de Mathématiques Appliquées, Univ. Paris-Est, Champs sur Marne, 77454 Marne La Vallée, France.
Abstract
This article introduces exact testing procedures on the mean of a Gaussian process $X$ derived from the outcomes of $\ell_1$-minimization over the space of complex valued measures. The process $X$ can be thought of as the sum of two terms: first, the convolution between some kernel and a target atomic measure (mean of the process); second, a random perturbation by an additive centered Gaussian process. The first testing procedure considered is based on a dense sequence of grids on the index set of $X$ and we establish that it converges (as the grid step tends to zero) to a randomized testing procedure: the decision of the test depends on the observation $X$ and also on an independent random variable. The second testing procedure is based on the maxima and the Hessian of $X$ in a grid-less manner. We show that both testing procedures can be performed when the variance is unknown (and the correlation function of $X$ is known). These testing procedures can be used for the problem of deconvolution over the space of complex valued measures, and applications in the frame of the Super-Resolution theory are presented. As a byproduct, numerical investigations suggest that our grid-less method is more powerful (it detects sparse alternatives) than tests based on very thin grids.
MSC subject classifications: 62E15, 62F03, 60G15, 62H10, 62H15, 60E05, 60G10, 62J05, 94A08.
Keywords: Hypothesis Testing, Gaussian Process, Kac-Rice formula, Super-Resolution.
Preprint of March 3, 2024
1 Introduction
1.1 Grid-less spike detection through the “continuous” LARS
New testing procedures based on the outcomes of $\ell_1$-minimization methods have attracted a lot of attention in the statistical community. Of particular interest is the so-called “Spacing test”, which we refer to as $S^{\mathrm{ST}}$, based on the Least-Angle Regression Selection (LARS); it measures the significance of the addition of a new active variable along the LARS path, see [16, Chapter 6] for further details. Specifically, one tests the relative distance between consecutive “knots” of the LARS, for instance $\lambda_{1,p}$ and $\lambda_{2,p}$. The first knot $\lambda_{1,p}$ is the maximal correlation between a response variable and $p$ predictors. The second knot $\lambda_{2,p}$ is then the correlation between some residuals and $p-1$ predictors, and so on. This approach is now well referenced among the regularized methods of high-dimensional statistics and it can be linked to minimizing the $\ell_1$-norm over $p$ coordinates, see for instance [16, Chapter 6].
In this paper, we focus on $\ell_1$-minimization over the space of signed measures and we ask for testing procedures based on these solutions. Indeed, in deconvolution problems over the space of measures [7] (e.g., Super-Resolution or line spectral estimation [8, 14, 12, 20, 11, 3]), one may observe a noisy version of a convolution of a target discrete measure by some known kernel and one may wish to infer on the target discrete measure. In this case, testing a particular measure is encompassed by testing the mean of some “correlation” process $Z$, see Section 6 for further details.
In general deconvolution problems, remark that there is an uncountable number of predictors, valued in a Hilbert space (not necessarily finite dimensional), while there were $p$ predictors previously when inferring on vectors of $\mathbb{R}^p$ in the high-dimensional statistics framework. Indeed, we are looking at correlations between a response variable and a “feature map” indexed by a continuum. In this case, the set of predictors is uncountable and given by the image of the feature map. Furthermore, each predictor is an element of the Reproducing Kernel Hilbert Space (RKHS) defined by the convolution kernel, assumed to be symmetric positive definite. In particular, the Hilbert space can be infinite dimensional. As an example, assume that one observes some Fourier coefficients of some discrete measure on the torus and one wishes to infer on its support. A strategy would be to look at correlations between the response variable and the Fourier curve truncated at some frequency cut-off. It results in a complex valued correlation process $Z$ indexed by the torus. In this case, the dimension of the RKHS is the number of observed Fourier coefficients, and the convolution kernel is given by the Dirichlet kernel, see Section 6. As an illustration, we present Figure 1, where the red curve displays the absolute value $|Z|$ of the correlation process. One can show by standard arguments that $|Z(t)|$ is the likelihood of the model that consists in one spike at point $t$. Therefore, its maximal value can be interpreted as the Maximum Likelihood for models with one spike. Its maximizer is then the Maximum Likelihood Estimator and one may consider it as a first estimate of the support of the target discrete measure. Then one can consider the residuals obtained by subtracting the estimated signal, with a weight chosen so that we get the blue curve of Figure 1: a second support point should enter the model since the residuals achieve their maximal absolute value at two locations. More details can be found in Section 2.4.
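The Fourier example can be made concrete with a minimal sketch. It assumes (these choices are illustrative, not taken from the paper) that the observations are the coefficients $y_k = \sum_j a_j e^{-\imath k t_j} + \text{noise}$ for $|k| \le f_c$, that the correlation process is normalized by $\sqrt{2 f_c + 1}$, and that the maximum is searched on a fine grid purely for illustration; the function names are hypothetical.

```python
import cmath
import math
import random


def fourier_observations(spikes, fc, sigma=0.0, rng=None):
    """Noisy Fourier coefficients y_k, |k| <= fc, of the measure sum_j a_j * delta_{t_j}.

    `spikes` is a list of (location, amplitude) pairs; `sigma` scales an
    additive complex Gaussian noise (assumed noise model).
    """
    rng = rng or random.Random(0)
    obs = []
    for k in range(-fc, fc + 1):
        clean = sum(a * cmath.exp(-1j * k * t) for t, a in spikes)
        noise = sigma * complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        obs.append(clean + noise)
    return obs


def correlation_process(obs, fc, t):
    """Correlation of the observations with the Fourier curve at location t."""
    n = 2 * fc + 1
    total = sum(y * cmath.exp(1j * k * t) for k, y in zip(range(-fc, fc + 1), obs))
    return total / math.sqrt(n)  # normalisation is an assumption of this sketch


def first_knot_on_grid(obs, fc, grid_size=512):
    """Largest modulus of the correlation process on a regular grid, and its location."""
    grid = [2 * math.pi * j / grid_size for j in range(grid_size)]
    values = [abs(correlation_process(obs, fc, t)) for t in grid]
    j_hat = max(range(grid_size), key=values.__getitem__)
    return values[j_hat], grid[j_hat]
```

For a single noiseless spike of amplitude $a$ at $t_0$, the correlation process reduces to $a$ times the Dirichlet kernel centred at $t_0$ (up to the normalization), so the grid maximum sits at $t_0$ whenever $t_0$ is a grid point.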
In this framework, the LARS algorithm does not return a sequence of entries (among $p$ possible coordinates) and phases as in high-dimensional statistics but rather a sequence of locations (among the continuum) and phases. In this paper, we extend the LARS to this framework, and refer to it as the “continuous” LARS, for which an uncountable number of active variables may enter the model. We present this extension in Section 2, defining consecutive knots $(\lambda_1, \lambda_2)$. One can wonder:
- *Can the Spacing test be used in the frame of Super-Resolution?*
- *Is there a grid-less procedure more powerful, in the sense of detecting spikes, than the Spacing tests constructed on thin grids?*
Interestingly, as we will prove, the answer to the first question is no if no modification of the test statistic is made. Furthermore, the way the Spacing test can be fixed and extended to a “grid-less” frame gives a new testing procedure that accounts for the distance between consecutive knots with respect to the value of the Hessian at some maximal point, see Figure 2 for a global view of our approach.
1.2 A comparative study
When the predictors are normalized, the Spacing test (ST) statistic is given by the expression
$$S^{\mathrm{ST}}(\lambda_{1,p},\lambda_{2,p}) := \frac{\bar\Phi(\lambda_{1,p})}{\bar\Phi(\lambda_{2,p})},$$
where $\bar\Phi = 1 - \Phi$ is the Gaussian survival function and $\Phi$ the standard normal cumulative distribution function. In the framework of high-dimensional statistics, this statistic is exactly uniformly distributed on $[0,1]$ under the global null, namely $S^{\mathrm{ST}}$ can be considered as the observed significance [21, 4]. It is clear that one should not use this testing procedure in the Super-Resolution framework since there are no theoretical guarantees in this case. Yet the practitioner may be tempted to replace $(\lambda_{1,p},\lambda_{2,p})$ by $(\lambda_1,\lambda_2)$ given by the “continuous” LARS. Unfortunately, this paper shows that the resulting test statistic is non-conservative in this frame, i.e., it makes too many false rejections, and one should avoid using it in practice, see the green line in Figure 3.
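As a sanity check of the finite-dimensional claim, here is a minimal sketch under simplifying assumptions that are ours: an orthonormal design with unit noise variance, so that the first two LARS knots are simply the two largest absolute correlations. The ratio of Gaussian survival functions is then exactly uniform under the global null, which a small Monte Carlo run illustrates. The function names are hypothetical.

```python
import math
import random


def gaussian_survival(x):
    """Standard normal survival function, 1 - Phi(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def spacing_test(lam1, lam2):
    """Ratio of Gaussian survival functions at the first two knots (lam1 >= lam2)."""
    return gaussian_survival(lam1) / gaussian_survival(lam2)


def simulate_null_pvalues(p=50, n_rep=2000, seed=0):
    """Spacing test p-values under the global null for an orthonormal design.

    Assumption of this sketch: with orthonormal predictors and unit variance,
    the knots are the ordered absolute values of p independent N(0, 1) draws.
    """
    rng = random.Random(seed)
    pvalues = []
    for _ in range(n_rep):
        corr = sorted((abs(rng.gauss(0.0, 1.0)) for _ in range(p)), reverse=True)
        pvalues.append(spacing_test(corr[0], corr[1]))
    return pvalues
```

In this toy setting the empirical mean of the simulated p-values should be close to 1/2 and about 5% of them should fall below 0.05; the point of the paper is that this calibration is lost when the finite-dimensional knots are naively replaced by the knots of the “continuous” LARS.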
To overcome this disappointing feature, one may consider thinner and thinner grids and look at the limit as $p$ tends to infinity. In this case, one can show that $\lambda_{1,p}$ tends to the $\lambda_1$ of the “continuous” LARS, but $\lambda_{2,p}$ does not converge to $\lambda_2$: it converges to a different limit, as shown in (15). This results in a limit test that is a randomized version of the Spacing test, which we refer to as $S^{\mathrm{Grid}}$ and present in Theorem 1.
The second approach is to take a thin grid and to use $S^{\mathrm{ST}}$. This approach is perfectly valid: this test statistic follows a uniform distribution under the null, and it should be compared to our new testing procedure $S^{\mathrm{Rice}}$. This numerical investigation has been performed in the frame of Super-Resolution and is presented in Figure 4; more details can be found in Section 6.2. Figure 4 gives the cumulative distribution functions of the test statistics under “sparse” alternatives, i.e., when true spikes are to be detected. The larger the power, the better the test detects spikes (the abscissa represents the level of the test and the ordinate the probability of detecting the spike). In these sets of experiments, we can note that
- The testing procedure $S^{\mathrm{Rice}}$, based on some Hessian and the whole process, is uniformly better than the spacing test even if one takes very thin grids.
One can see that the power (the ability to detect sparse objects, Dirac masses here) of the grid methods seems to present a limit that is always improved by the continuous approach.
1.3 Contribution
For the first time, this paper paves the way to build new testing procedures in the framework of Super-Resolution theory and line spectral estimation. In particular, we prove that we can rightfully construct global null exact testing procedures on the first two knots $\lambda_1$ and $\lambda_2$ of the “continuous” LARS when one has a continuum of predictors, see Theorems 1 and 4 and Figure 3. These two new procedures offer the ability to test the mean of any stationary Gaussian process with known correlation function and $\mathcal{C}^2$-paths. Furthermore, one of these tests is unbiased, see Theorem 1, and both can be studentized when the variance is unknown, see Theorems 2 and 8 and Figure 6.
Outline
Notations and the formal problem formulation are described in Section 3. In Section 4, we present the test statistic $S^{\mathrm{Grid}}$, which is constructed by taking the limit of consecutive LARS knots on thinner and thinner grids (namely, the number of predictors tends to infinity). Section 5 gives the theoretical construction of our grid-less test based on consecutive knots of the “continuous” LARS. The main result concerning the test statistic $S^{\mathrm{Rice}}$ is presented in this section. Applications to spike detection in Super-Resolution are developed in Section 6. An appendix with the proofs can be found at the end of the paper.
The general construction of the “continuous” LARS is given in Section 2. This section is independent from the rest of the paper.
2 The “continuous” LARS
2.1 Cameron-Martin type Assumption on the mean
The algorithm presented here can be used for a large class of complex processes $Z$. We consider a complex-valued Gaussian process $Z$ indexed on a compact metric space, with a given covariance function.
Remark 1.
Note that this model encompasses our to-be-announced-framework (see Section 3) setting and with the correlation of defined in Section 3.1. We do not assume that the process is stationary in this section.
We assume that its covariance is such that there exists $\sigma > 0$ such that, for every point $t$ of the index space,
$$\mathrm{Var}\big(Z(t)\big) = 2\sigma^2.$$
The scalar $2$ accounts for the contribution of the real and the imaginary part of $Z$, and $\sigma^2$ is the variance of the real part of $Z$. We assume that $Z$ has continuous sample paths.
We present here the underlying hypothesis on the mean of the Gaussian processes under consideration when using the LARS algorithm. This hypothesis is of Cameron-Martin type. Indeed, the main drawback that should be avoided is when the mean cannot be represented in the RKHS of the Gaussian process . We recall that we can define a reproducing Hilbert space of the covariance , see [15, Chapter 2.6] for instance. Denote this complex Hilbert space. Also, we can invoke a Karhunen-Loève expansion of the process . Namely, there exist i.i.d. complex standard normal variables , a real orthonormal system on and such that
[TABLE]
where the identity holds almost surely in the Banach space of continuous functions equipped with the infinity norm. By Mercer's theorem, we know that
[TABLE]
where the identity holds almost surely in the Banach space of continuous functions equipped with the infinity norm. We recall also that the Hilbert space can be defined as
[TABLE]
with the inner product
[TABLE]
We observe and we want to estimate its mean . Remark that almost surely it holds , where is the closure of in the space of continuous functions equipped with the infinity norm, see e.g. [15, Corollary 2.6.11]. Remark that is also closed in . Denoting by the orthogonal space of , one has where the sum is orthogonal. We denote by (resp. ) the orthogonal projection onto (resp. ). Since almost surely , remark that almost surely and this process can be observed and is deterministic. Without loss of generality, we assume that subtracting to . Also, we assume that
[TABLE]
Recall that and Assumption (2) gives that using .
2.2 Description of the “continuous” LARS
We assume that and, as mentioned above, this assumption is equivalent to Assumption (2). Following standard references, e.g., [16, Chapter 5.6], the Least-Angle Regression Selection (LARS) algorithm can be extended to Gaussian processes. To the best of our knowledge, the LARS for complex Gaussian processes has never been introduced and we present its formulation here for the first time. Actually, the presentation given in this section can be applied to any RKHS setting. It results in a description of the LARS in infinite dimensional feature spaces, a framework that has been dealt with in [19]. However, note that the paper [19] only concerns real signed measures and their “doubling” dimension trick [19, page 546] cannot be used when dealing with complex measures. In particular, their result cannot be invoked in Super-Resolution, where it is of utmost importance to deal with complex measures. This section presents the “continuous” LARS for Super-Resolution.
The LARS is a variable selection algorithm giving a sequence where the knots are ordered such that and is a complex-valued measure. We recall that the space is defined as the dual space of the space of continuous functions equipped with the infinity norm. A pseudo-code is presented in Algorithm 1 and the technical details are presented below. When defining the “continuous” LARS, we assume that
[TABLE]
Under this assumption, the process is twice differentiable in quadratic mean and once differentiable almost surely.
2.2.1 The first knot
Inspired by the Super-Resolution framework presented in Section 6, we consider $Z$ as some “correlation process” in the spirit of (22). In particular, the most correlated point can be defined by (13), namely
[TABLE]
Under Assumption (1), Proposition 16 shows that almost surely there exists a unique point such that . Define the “active set” function as
[TABLE]
and for . The path for will be defined in the sequel. It is a piecewise continuously differentiable path representing the support of a discrete measure such that
[TABLE]
namely the residual has -norm less than . Set the first fitted solution to and the first residual to for initialization purposes. Observe that
[TABLE]
2.2.2 The second knot
We want to add another point to the active set and define a discrete measure supported on while keeping the above inequalities true. First, we solve the least-squares fit given by
[TABLE]
This program can be solved in closed form and it holds that . Then, for any , define by
[TABLE]
and observe that . Remark also that has a local maximum at point . Indeed, under (), the function is continuously differentiable and it has as global maximum by definition of and . It follows from (11) that is a local maximum of and therefore is a local maximum of .
Now, we keep track of the largest value of the “correlation” process on the complement of while moving from toward zero. We define as the largest value for which there exists a point such that . Set
[TABLE]
If is not unique, we add all the solutions of (4) to the active set . For the sake of readability, we assume that is the only solution to (4). Then, update
[TABLE]
where, for all ,
[TABLE]
is the second residual associated to the second fitted solution . Remark also that
[TABLE]
2.2.3 The other Knots: Moving the Active Set between Knots
From this point we proceed iteratively. For , we assume that we have found and such that
[TABLE]
We want to define the path for values starting from . We look for a path such that are continuously differentiable and there exists supported on such that the above inequalities hold true. This path will be defined on for a value defined later.
Consider and define
[TABLE]
where we denote and we assume that is invertible. If is not invertible then we stop. The path will be defined later on. Note that for .
Remark 2.
Note that the function is the regression of onto the finite dimensional space .
Then, for any , define
[TABLE]
and observe that for all . Indeed, it holds
[TABLE]
and recall that it holds .
We will enforce that is a local maximum of imposing that its derivative is zero along the path for . This can be done invoking the implicit function theorem as follows. Define for and
[TABLE]
where
[TABLE]
Assume that the Jacobian is invertible. If is not invertible then we stop. Therefore, the implicit function theorem implies that there exists a continuously differentiable path such that is equivalent to on a neighborhood of . On this path, the derivative of at points is zero (thanks to (9)) while . We deduce that there exists a neighborhood of on which each point is a local maximum of .
Now, we keep track of the largest value of the “correlation” process on the complement of while moving from toward zero. We define as the largest value for which there exists a point such that . Set
[TABLE]
If is not unique, we add all the solutions of (10) to the active set . For the sake of readability, we assume that is the only solution to (10).
Update
[TABLE]
where, for all ,
[TABLE]
is the th residual associated to the th fitted solution . Remark also that
[TABLE]
and update to to iterate the procedure.
2.3 Equivalent expression of the second knot
First, observe that is defined as in (13) and that the two definitions agree. Indeed, recall that so that at point with as in (3). By optimality, it holds that .
Then, the case is interesting since is a statistic used in the test statistics described in the sequel. We will see that the two definitions agree here again, please refer to Section 3 for notations. For , it holds and the least squares direction is given by and by
[TABLE]
Multiplying by and taking the real part, this latter can be equivalently written as
[TABLE]
where . Now, recall that to compute
[TABLE]
We deduce that
[TABLE]
showing that the second knot is exactly the quantity defined in (14).
2.4 Illustration: The two first knots of Super-Resolution
The Super-Resolution process is defined in (22). It satisfies Condition () and Condition () of Section 5.2.1 with . The first point is given by the maximum of the modulus of , see the red curve in Figure 1. Observe that and the maximum satisfies . Then, we compute
[TABLE]
where denotes the Dirichlet kernel. For , the maximum of is achieved at a unique point, namely . For , a second point achieves the maximum. This transition defines , see Figure 1.
From this point, we can iterate, fitting the least squares direction on the support and decreasing until a third point achieves the maximum. Given the red curve in Figure 1, it was not obvious that the second knot would be , since other local maxima seemed more significant than on the red curve.
3 Notations and problem formulation
3.1 Hypothesis testing problem
In this paper, our purpose is to test the mean value of a stationary complex-valued Gaussian process $Z$ with $\mathcal{C}^2$-paths. We assume that the real and imaginary parts of $Z$ are two independent and identically distributed real-valued processes with $\mathcal{C}^2$-paths. Assume that their common correlation function satisfies
[TABLE]
and let so that
[TABLE]
We denote by the -dimensional torus. Assume that we observe a real-valued process indexed by such that
$$X(z) := \mathrm{Re}\big(e^{-\imath\theta}Z(t)\big),$$
where $z = (t,\theta)$ and $\mathrm{Re}$ denotes the real part of a complex number. Remark that observing $X$ is equivalent to observing $Z$ since we can recover $X$ from $Z$ and conversely. Furthermore, we may assume that the process $X$ satisfies
[TABLE]
where and denote the gradient and the Hessian of $X$ at point $z$. Note that sufficient conditions for () are given by [5, Proposition 6.5] applied to . In particular, if the distribution of is non-degenerate, [5, Condition (b) of Proposition 6.5] implies that Assumption () is met. Note also that a process satisfying Assumption () is referred to as a “Morse” process in [2]. Remark that () and () are mild assumptions ensuring that $X$ is a non-pathological process with $\mathcal{C}^2$-paths.
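This paper aims at testing the following hypotheses:
$$\mathbb{H}_0:\ \text{“}Z\text{ is centered”} \quad\text{against}\quad \mathbb{H}_1:\ \text{“}Z\text{ is not centered”}.$$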
Remark that this framework encompasses any testing problem whose null hypothesis is a single hypothesis on the mean of $Z$, by subtracting the mean tested by the null hypothesis. Indeed, remark that $Z$ can always be decomposed into
$$Z = Z^{0} + \eta,$$
where $Z^{0}$ is the deterministic noiseless response and $\eta$ is some centered random additive perturbation of $Z^{0}$. Given any candidate mean function, one might be interested in testing whether it equals $Z^{0}$, or equivalently whether the residual obtained by subtracting it from $Z$ is centered. Not rejecting this hypothesis means that there is no evidence that the residual is not centered. On the other hand, rejecting the null means that the testing procedure has found some evidence that one should not consider the residual to be centered. Now, the same discussion can be made for $X$, remarking that
$$X = X^{0} + N,$$
where we denote by $X^{0}(z) := \mathrm{Re}\big(e^{-\imath\theta}Z^{0}(t)\big)$ the deterministic noiseless response part and by $N(z) := \mathrm{Re}\big(e^{-\imath\theta}\eta(t)\big)$ some centered random additive perturbation of $X^{0}$.
3.2 The first and second knots of a Gaussian process
As in high-dimensional statistics, we can define the first and second knots $(\lambda_1, \lambda_2)$ as follows. If we model some spatial correlation by means of the process $X$, the most correlated point $\hat z$ and the maximal correlation $\lambda_1$ are respectively the argument maximum and the maximum of $X$, defined by
$$\hat z := \arg\max_{z} X(z) \quad\text{and}\quad \lambda_1 := X(\hat z). \tag{13}$$
Under Assumption (), one can check that the argument maximum is almost surely a singleton, see Proposition 16.
To construct the second knot, given a fixed , one can equivalently consider two regressions of , as follows.
On the one hand, the regression on that will appear in the grid method of Section 4. Using a convenient normalisation related to the definition of the LARS knots, we set
[TABLE]
where
[TABLE]
is the correlation function of the stationary Gaussian process . One can check that is a Gaussian process indexed by and independent of .
On the other hand, the regression on will be needed for convergence purposes in Section 5. With the convenient normalization, we set
[TABLE]
where is the gradient of the correlation function and is the variance-covariance matrix of the derivative process of , namely .
Since the derivative at is zero, note that and we define the second knot  as
[TABLE]
where we prove that $(\lambda_1, \lambda_2)$ are well defined and that the maximizer is almost surely unique, see Proposition 16 and Remark 8. Furthermore, the couple $(\lambda_1, \lambda_2)$ can be equivalently defined using the extension of the LARS to our framework; the interested reader may consult Section 2.3.
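To fix ideas, the first two knots can be approximated on a finite grid. The sketch below is only illustrative: it assumes a unit-variance stationary process with known correlation function `rho`, and it uses the familiar finite-dimensional LARS expression for the second knot, $\max_{y \neq \hat z} \big(X(y) - \rho(y - \hat z)\,X(\hat z)\big) / \big(1 - \rho(y - \hat z)\big)$, as a stand-in for the definitions of this section; the function name and argument layout are hypothetical.

```python
import math


def first_two_knots_on_grid(grid, values, rho):
    """Grid approximation of the first two knots of a unit-variance process.

    grid   : list of index points (floats)
    values : observed process values at the grid points
    rho    : correlation function of the lag, with rho(0) == 1
    The second-knot formula is the standard LARS expression for normalized
    predictors (an assumption of this sketch, see the lead-in).
    """
    j_hat = max(range(len(grid)), key=values.__getitem__)
    z_hat, lam1 = grid[j_hat], values[j_hat]
    lam2 = -math.inf
    for j, y in enumerate(grid):
        r = rho(y - z_hat)
        if j == j_hat or r >= 1.0:
            continue  # skip the maximizer and perfectly correlated points
        lam2 = max(lam2, (values[j] - r * lam1) / (1.0 - r))
    return lam1, z_hat, lam2
```

Since every grid value is at most `lam1`, each candidate ratio is at most `lam1`, so the sketch always returns `lam2 <= lam1`, in line with the ordering of the knots.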
4 Passing to the limit, the grid approach
The main idea of this section is to define a sequence of grids on , to construct a sequence of test statistics from the values of the process on as in [4] and to pass to the limit as . More precisely, we consider to be the grid with mesh on (corresponding to grid points so that ),
[TABLE]
It is the maximum of the process when indexing by the grid. We can also define the maximum of the regression when indexing by the grid, namely
[TABLE]
The Hessian at the maximum (13) on is denoted by (in particular it does not depend on the grid but on the maximum of ). By Assumption (), it is a random variable with values in the set of non-degenerate negative definite matrices of size . We can define a non-degenerate positive quadratic form (i.e., a metric) on by , for . Using this metric, we can consider the corresponding Voronoi tessellation of . It is a regular partition of by parallelograms, invariant by translations and . Denote by the Voronoi cell of the origin in this partition and by the uniform distribution on this cell. We understand the law as a conditional law with respect to and, conditionally on , this law is taken independent of , see Lemma 12. Conditionally on , define the randomized statistic
[TABLE]
where is the square root of and . A proof of the following result is given in Appendix A.1.
Remark 3.
Remark that we have taken dyadic grids here. Following the proof in Appendix A.1, one can exhibit how depends on the sequence of grids. The key result is Lemma 12 and we borrow its notation in this remark. In the general case where one considers a different type of sequence of grids, one still has independence between and but the law of the limit of (for some that may depend on the grid sequence) may differ from . We refer to this law (if it exists) as where . The dependence in depicts the fact that cells defined by joining adjacent points of the grid might be topologically different (which is not the case in the dyadic case). As a result, the definition of should be modified, changing by . It does not change the main message here: the resulting test is randomized, and (16) is non-conservative and should be avoided in practice.
Theorem 1.
Under , Assumptions () and (), the randomized test statistics
[TABLE]
where $\bar\Phi$ denotes the standard Gaussian survival function. Moreover, the test with $p$-value is unbiased: under the alternative , it holds for all .
Theorem 1 shows in particular that the statistic given below, referred to as the Spacing test statistic in the introduction,
[TABLE]
does not follow a uniform distribution under the null and leads to a non-conservative test. Indeed, observe that almost surely so that almost surely. Note that the two test statistics differ on the event .
Now, when the variance is unknown, we can build an estimator defined in (20) and obtain a studentized version of the previous theorem. Please consult Section 5.2.1 for further details on the construction of the estimator and on Conditions () and ().
Theorem 2.
Assume (), (), () and () where , then the following test statistics satisfies
[TABLE]
under where , is the Student cumulative distribution function with degrees of freedom, its survival function and is defined by (20).
A proof can be found in Appendix A.2.
Remark 4.
Only the first point of () is required for the proof. Moreover, if , the Student distribution is to be replaced by a standard normal distribution.
5 The Rice method: a grid-less approach
In this section, we build our test statistic directly on the entire path of the process in a grid-less manner. We assume that the process satisfies Assumptions () and (), and is centered (namely ). As in the preceding section, we consider and defined by (13) and (14) respectively.
We denote so that the covariance function of is the correlation function of , namely is the standardized version of . Note that, by regression formulas and stationarity, it holds
[TABLE]
so that we can define the process by the decomposition
[TABLE]
where and are independent for any and is the variance-covariance matrix of . In particular, observe that
[TABLE]
where . Using the Rice method of [5, Theorem 7.2] (see also [18]), it follows that the maximum has for density w.r.t the Lebesgue measure on at point
[TABLE]
where denotes the standard Gaussian density, is the event and , as in the following, denotes a positive constant. Its numerical value may vary from one occurrence to another and may depend on and , which are assumed fixed in our framework.
5.1 The known variance case
We begin by the known variance case. The main observation is that the method of [5, Theorem 7.2] can be extended to compute the joint distribution of as follows.
- Denote the set of symmetric matrices and pick a Borel set on .
- For every , recall that
[TABLE]
and define
[TABLE]
Remark that, for fixed , is a.s. finite by Lemma 9, is independent of and, as a consequence, is independent of . Furthermore, note that since is without boundary, for , one has and as defined by (14).
- Observe that on the event one has almost surely that , , and . Also, a simple computation shows that
[TABLE]
almost surely. Hence, by uniqueness of and recalling that the set is finite under (), we deduce that
[TABLE]
- On define smooth lower approximations of the indicator function of that converge when goes to infinity, i.e.
[TABLE]
- Apply Rice formula with weights [5, Theorem 6.4] (see also the proof of [5, Theorem 7.2]) to compute
[TABLE]
where the last equality relies on the fact that is independent of .
- Combining the previous observations and passing to the monotone limit as tends to in the aforementioned Rice formula with weights, we get that
[TABLE]
by stationarity and using that, on the event , the matrix belongs to the set of positive definite symmetric matrices, namely .
Before stating the key result on the joint density of we need to introduce a dominating measure. First, recall that is independent of the pair . Then, observe that where is defined as in (17) for the process . Denote the law of and note that it does not depend on . Denote the law of and remark that for any Borel set of , it holds . Eventually, remark that
[TABLE]
where denotes the Lebesgue measure on . As a consequence we can prove the following proposition.
Proposition 3.
Under , the joint law of satisfies for all ,
[TABLE]
where is defined by (19) and denotes the set of symmetric matrices.
Proof.
Observe that the density at point of with respect to the Lebesgue measure is and recall (19). Now, for any Borel set of , note that
[TABLE]
thanks to (18), which proves the result. □
We can now state our result when the variance is known.
Theorem 4.
Set
[TABLE]
where denotes the Hessian of the correlation function of at the origin. Under Assumptions () and (), the test statistic
[TABLE]
under .
Proof.
Using Proposition 3, we know that the density of at , conditionally on , is equal to
[TABLE]
It is well known that, if a random variable has continuous cumulative distribution function , then follows a uniform distribution on . This implies that, conditionally on ,
[TABLE]
Since the conditional distribution does not depend on , it is also the unconditional distribution, which yields
[TABLE]
as claimed. □
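The last step of the proof relies on the probability integral transform: applying a continuous cumulative distribution function to a variable that has this distribution yields a uniform variable on (0, 1). The sketch below illustrates this fact on a standard Gaussian variable only. It is not the test statistic of Theorem 4, and the helper names are hypothetical.

```python
import math
import random


def gaussian_cdf(x: float) -> float:
    """Standard Gaussian cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def empirical_cdf_at(sample, u: float) -> float:
    """Fraction of the sample that is <= u."""
    return sum(1 for s in sample if s <= u) / len(sample)


rng = random.Random(0)
# Draw X ~ N(0, 1) and transform each draw with the CDF of X itself.
transformed = [gaussian_cdf(rng.gauss(0.0, 1.0)) for _ in range(50_000)]

# If F(X) is uniform on (0, 1), then P(F(X) <= u) should be close to u.
for u in (0.05, 0.25, 0.5, 0.75, 0.95):
    freq = empirical_cdf_at(transformed, u)
    print(f"u = {u:.2f}   empirical P(F(X) <= u) = {freq:.3f}")
```

The same mechanism is what turns a conditional distribution function evaluated at the observed statistic into a p-value that is uniform under the null.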
5.2 The unknown variance case
5.2.1 Estimating the variance
When the variance is unknown in (12), we specify here the assumptions and the estimator we use for the variance. In this section, except for explicit examples, we consider a real-valued centered Gaussian process (not necessarily stationary) defined on the -dimensional torus . Let (possibly infinite) and assume that admits an order Karhunen-Loève expansion in the sense that
[TABLE]
where the equality holds in and is a system of non-zero functions orthogonal on . Throughout our analysis, we need to consider one of the following assumptions.
- If is finite,
[TABLE]
- If ,
[TABLE]
Recall that a Gaussian vector is called non-degenerate if its variance-covariance matrix is non-degenerate, i.e., it has full rank.
Some examples of processes satisfying () and () with are given by the normalized Brownian motion and any Gaussian stationary process with a spectrum that admits an accumulation point, see [10, Page 203]. For instance, the process corresponding to the Super-Resolution problem satisfies () and () with finite, namely is twice the number of observed frequencies, see Section 6.
Definition 5.
Let be a Gaussian process with constant variance and satisfying Assumptions () and () with finite. The quantity
[TABLE]
is called the Karhunen-Loève estimator of .
Remark 5.
An explicit expression of the estimator can always be obtained from a set of pairwise distinct points with . We only need to check that the variance-covariance matrix of the has rank .
Remark 6.
Sufficiency considerations imply that is an optimal unbiased estimator for the mean-squared error by the Rao–Blackwell theorem.
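To make Definition 5 and Remark 5 more concrete, here is a minimal sketch under stated assumptions. The defining formula of the estimator is not reproduced in this extract, so the code assumes that the estimator is the average of the squared Karhunen-Loève coefficients, assumes a trigonometric basis with a normalization chosen so that the process has constant variance, and recovers the coefficients from finitely many sample points by discrete orthogonality. All function names are hypothetical.

```python
import math
import random


def simulate_path(K, sigma, M, rng):
    """Sample Y(t) = K^{-1/2} * sum_k (a_k cos(kt) + b_k sin(kt)) at M equispaced points.

    The coefficients a_k, b_k are i.i.d. N(0, sigma^2), so Var Y(t) = sigma^2 for all t
    and the expansion has order N = 2K.
    """
    a = [rng.gauss(0.0, sigma) for _ in range(K)]
    b = [rng.gauss(0.0, sigma) for _ in range(K)]
    ts = [2.0 * math.pi * m / M for m in range(M)]
    ys = [
        sum(a[k] * math.cos((k + 1) * t) + b[k] * math.sin((k + 1) * t) for k in range(K))
        / math.sqrt(K)
        for t in ts
    ]
    return ts, ys, a + b


def kl_coefficients(ts, ys, K):
    """Recover the 2K coefficients by discrete orthogonal projection (requires M > 2K)."""
    M = len(ts)
    scale = 2.0 * math.sqrt(K) / M
    cos_part = [scale * sum(y * math.cos(k * t) for t, y in zip(ts, ys)) for k in range(1, K + 1)]
    sin_part = [scale * sum(y * math.sin(k * t) for t, y in zip(ts, ys)) for k in range(1, K + 1)]
    return cos_part + sin_part


def variance_estimate(coeffs):
    """Average of the squared coefficients (assumed form of the estimator)."""
    return sum(c * c for c in coeffs) / len(coeffs)


rng = random.Random(3)
K, sigma, M = 10, 2.0, 64
ts, ys, true_coeffs = simulate_path(K, sigma, M, rng)
recovered = kl_coefficients(ts, ys, K)
print("max recovery error:", max(abs(r - c) for r, c in zip(recovered, true_coeffs)))
print("variance estimate :", variance_estimate(recovered), "for a true variance of", sigma ** 2)
```

In this toy setting the coefficients are recovered exactly from the sampled path, so the estimate depends on the path only through finitely many points, in the spirit of Remark 5.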
Given the aforementioned definition, we are now able to construct variance estimators for the process . We assume that the complex Gaussian process that defines satisfies the following hypotheses for some .
[TABLE]
and satisfies the following non-degeneracy conditions:
[TABLE]
Our aim is to build, for each , two estimators of the variance independently of or . Indeed, in the following, we distinguish two kinds of statistics. The first one is the limit of the finite-dimensional statistic , see Section 4. The second one is the case of the maximum over , see Section 5. The two cases do not use the same estimator of .
- In the grid situation, we define
[TABLE]
where belongs to , denotes the correlation function of the process and set
[TABLE]
which is well defined, independent of and with constant variance . Finally, we consider the variance estimator
[TABLE]
defined at point given by (13).
- In the continuous case, we define
[TABLE]
where belongs to and set
[TABLE]
which is well defined, independent of and with constant variance . Finally, we consider the variance estimator
[TABLE]
defined at point given by (13).
Proposition 6.
Let satisfy () and () and set ; then the following claims hold under .
is well defined and follows a distribution.
is well defined and follows a distribution.
The process is independent of , and the process is independent of the random variable .
Proof.
Fix . Since satisfies (), there exist pairwise distinct such that is non-degenerate. Then, considering , and
[TABLE]
the vector satisfies
[TABLE]
where denotes the rank of the covariance matrix of a random vector. Deduce that satisfies . This, in turn, implies that the functions
[TABLE]
are in fact in a space of dimension and a Gram-Schmidt orthogonalization in gives for the process . Finally, from , we compute that follows the desired distribution.
In the case of the regression over , remark that
[TABLE]
and where (resp. ) denote the partial derivative with respect to (resp. ). Because of hypothesis (), the two vectors and
[TABLE]
have rank so both are invertible functions of . In particular, is a linear combination of . Let and be the coefficients associated to and . By triangular combination, we deduce that the distribution of
[TABLE]
is non-degenerated and so that . Setting such that
[TABLE]
we get the non-degeneracy of
[TABLE]
where . Finally, similarly to the proof of the previous point, regression, scaling and independence prove that the rank of is so that satisfies and and that is well defined and distributed as .
This is a direct consequence of the independence of the angle and the norm for each marginal Gaussian vector built from or . □
Remark 7.
When the complex process admits an infinite Karhunen-Loève decomposition, we need the following modified hypothesis
[TABLE]
Indeed, for every integer , note that from the observation of the vector resp. for pairwise distinct points , we can construct an estimator, say resp. , of with distribution resp. under . Letting tend to infinity, classical concentration inequalities and the Borel-Cantelli lemma prove that resp. converges almost surely to under . Thus the variance is directly observable from the entire path of . We still denote resp. this observation, where .
5.2.2 Computing the Joint Law
Hence, suppose that we observe where is unknown. Assume that satisfies () and (), and set . The regression of the Hessian on reads now
[TABLE]
because is independent of by stationarity. The variance being unknown, we estimate it using which is defined by (21). For fixed , by Claims and of Proposition 6, we know that the following random variables or random processes
[TABLE]
are mutually independent. As where is a deterministic function and as Lemma 15 shows that can be expressed as radial limits of at point , we get that
[TABLE]
and consequently
[TABLE]
We turn now to the Rice formula described previously and introduce the notation
[TABLE]
Denote the Lebesgue measure on and let be the joint law of the couple of random variables . Under , note that is a centered Gaussian variable with variance and is distributed as a -distribution with degrees of freedom, i.e., the law of density
[TABLE]
where is the Gamma function. Then the quadruplet has a density with respect to at point equal to
[TABLE]
Using the same method as for the proof of Proposition 3 we have the following proposition.
Proposition 7.
Assume that satisfies (), (), () and (), and set . Then, under , the joint distribution of $\big(\lambda_{1},\ \widehat{\sigma}_{|}/\sigma,\ T_{2},\ R(\widehat{z})/\widehat{\sigma}_{|}\big)$ has a density with respect to at point equal to
[TABLE]
where is a positive constant that may depend on and .
Consequently, we derive the following result.
Theorem 8.
Assume that satisfies (), (), () and (), and set . For all , define as
[TABLE]
where is the density of the Student distribution with degrees of freedom. Under the null , the test statistic
[TABLE]
where , and is defined by (21).
Proof.
First, using Proposition 7 and the change of variable , the joint distribution of the quadruplet at point is given by
[TABLE]
Second, note that if and are two independent random variables of density and then the density of satisfies
[TABLE]
In our case, integrating over and with the change of variable , it holds
[TABLE]
Putting together, the density of at point is now given by
[TABLE]
and we conclude using the same argument as in the proof of Theorem 4. □
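The Student distribution enters the proof through the ratio of a Gaussian variable and an independent, suitably normalized chi variable. As a sanity check of this classical fact only (it does not reproduce the statistic of Theorem 8), the sketch below simulates such a ratio with 2 degrees of freedom, a choice made here because the Student cumulative distribution function then has a simple closed form. The function names are hypothetical.

```python
import math
import random


def student_cdf_2df(x: float) -> float:
    """Closed-form CDF of the Student distribution with 2 degrees of freedom."""
    return 0.5 + x / (2.0 * math.sqrt(2.0 + x * x))


def sample_ratio(rng: random.Random) -> float:
    """Draw Z / S with Z ~ N(0, 1) and S^2 ~ chi^2(2) / 2 independent of Z."""
    z = rng.gauss(0.0, 1.0)
    s = math.sqrt((rng.gauss(0.0, 1.0) ** 2 + rng.gauss(0.0, 1.0) ** 2) / 2.0)
    return z / s


def empirical_cdf(sample, x: float) -> float:
    """Fraction of the sample that is <= x."""
    return sum(1 for t in sample if t <= x) / len(sample)


rng = random.Random(1)
sample = [sample_ratio(rng) for _ in range(100_000)]

for x in (-2.0, -0.5, 0.0, 1.0, 3.0):
    print(f"x = {x:+.1f}   empirical = {empirical_cdf(sample, x):.3f}"
          f"   Student(2) CDF = {student_cdf_2df(x):.3f}")
```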
6 Applications to the Super-Resolution Theory
6.1 Framework and results
Deconvolution over the space of complex-valued Radon measures has recently attracted a lot of attention in the "Super-Resolution" community, and in its companion formulation, "Line spectral estimation". A standard aim is to recover fine-scale details of an image from few low-frequency measurements; ideally the observation is given by a low-pass filter. The novelty in this body of work lies in new theoretical guarantees for the -minimization over the space of Radon measures with finite support. Some recent works on this topic can be found in the papers [11, 7, 20, 8, 3, 13, 6, 12] and references therein.
An important example throughout this paper is given by the Super-Resolution problem, which can be stated as follows. Let be a complex-valued Radon measure on the one-dimensional torus identified with equipped with the natural circle-wise metric. Note that denotes the total variation norm on . The space can be defined as the topological dual space of continuous functions on equipped with the -norm.
Let where is referred to as the "frequency cut-off". Denote by the Dirichlet kernel defined by
[TABLE]
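For readers who want to experiment, the following sketch evaluates the Dirichlet kernel in its two standard forms: as a sum of complex exponentials over the frequencies between minus and plus the cut-off, and in closed form. The displayed definition is not reproduced in this extract, so the normalization used here (no extra scaling factor) is an assumption, and `fc` is a hypothetical name for the frequency cut-off.

```python
import cmath
import math


def dirichlet_sum(t: float, fc: int) -> float:
    """Dirichlet kernel as the sum of exp(i k t) for k = -fc, ..., fc."""
    return sum(cmath.exp(1j * k * t) for k in range(-fc, fc + 1)).real


def dirichlet_closed(t: float, fc: int) -> float:
    """Closed form sin((fc + 1/2) t) / sin(t / 2), with the limit 2 fc + 1 at t = 0 mod 2 pi."""
    half = math.sin(t / 2.0)
    if abs(half) < 1e-12:
        return float(2 * fc + 1)
    return math.sin((fc + 0.5) * t) / half


for t in (0.0, 0.1, 1.0, math.pi, 5.0):
    print(f"t = {t:.3f}   sum = {dirichlet_sum(t, 5):+.6f}   closed = {dirichlet_closed(t, 5):+.6f}")
```

The guard at multiples of 2 pi avoids a 0/0 evaluation, the same kind of numerical care that Section 6.2.1 discusses for the test statistic.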
Consider the linear operator that maps any complex-valued Radon measure onto its Fourier coefficients where
[TABLE]
for integers such that . Consider where and are i.i.d. standard Gaussian random variables for and . In the Super-Resolution framework, we observe a perturbed version of the Fourier coefficients, namely
[TABLE]
Applying , the dual operator of , note that we observe the trigonometric polynomial
[TABLE]
which reads as
[TABLE]
Hence, one observes and infers on assuming that it has finite support. To this purpose, consider the process defined for all by
[TABLE]
where and denote the real and imaginary part of a complex number. When , remark that the processes and are two independent and identically distributed real-valued processes with -paths. An elementary computation shows that has correlation function and has correlation function with
[TABLE]
for all and in . Remark that () holds true for . In this case, we are testing
[TABLE]
or equivalently
[TABLE]
Subtracting the known measure , note that this framework encompasses testing problems whose null hypothesis is any single hypothesis against alternatives of the form .
Furthermore, we have the following propositions. First, we check that we can apply our results to the Super-Resolution process.
Proposition 9.
The process defined by (23) satisfies Condition () and Condition () with .
Then, we derive a first result when the noise level is known.
Proposition 10.
Under the null , the test statistic
[TABLE]
where is the standard Gaussian cumulative distribution function, its survival function, its density function, is defined by (13), (14) and
[TABLE]
Finally, we have the following result when the noise level is unknown.
Proposition 11.
Under the null , the test statistic
[TABLE]
where is the Student cumulative distribution function with degrees of freedom, its survival function, its density function, , , is defined by (21) and .
A proof of these propositions can be found in Appendix A.4.
6.2 A numerical study
A Python code (and a Jupyter notebook) illustrating the following numerical experiments can be found at: https://github.com/ydecastro/super-resolution-testing.
6.2.1 Computation of
To build our test statistic in the Super-Resolution context (namely ), we need to compute three quantities. The first one is , the maximum of over the torus . Its simple form allows us to use classical optimization routines, for instance scipy.optimize.minimize in Python, fminsearch in MATLAB or optim in R, each combined with global resolution options on . The second one is which appears in the test statistic through the coefficients and that are simple functions of the observation and . Finally, the third one is
[TABLE]
Contrary to , a numerical indeterminacy arises when is close to . In particular, the approximation of is by definition not exact and the radial limits of are not attained numerically. One way around this is to use the integral form of the remainder in Taylor's theorem. In full generality, we compute
[TABLE]
Denote by the distance between and . The numerical indeterminacy occurs for small values of . Note, however, that one can factorize in both the numerator and the denominator. This leads to the expression
[TABLE]
which is more robust in practice. In the Super-Resolution case, elementary trigonometric identities give the following simpler form of the denominator
[TABLE]
where and sinc denotes the cardinal sine function, i.e.
[TABLE]
which is a numerically robust function. We conclude the optimization using the same routine as the one of .
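The first optimization step can be sketched as follows. This is not the authors' code: it assumes that the process is of the form cos(theta) Re Z(t) + sin(theta) Im Z(t) for a trigonometric polynomial Z, so that maximizing over the angle gives the modulus |Z(t)|, and it ignores the normalization of Z. A coarse grid search followed by a golden-section refinement, written with the standard library only, stands in for the scipy.optimize.minimize routine mentioned above. All names are hypothetical.

```python
import cmath
import math


def trig_poly(t: float, coeffs: dict) -> complex:
    """Evaluate Z(t) = sum_k coeffs[k] * exp(i k t)."""
    return sum(c * cmath.exp(1j * k * t) for k, c in coeffs.items())


def maximize_modulus(coeffs: dict, grid_size: int = 2048, tol: float = 1e-10):
    """Return (t_hat, max |Z|) using a coarse grid search and a golden-section refinement."""
    step = 2.0 * math.pi / grid_size

    def f(t: float) -> float:
        return abs(trig_poly(t, coeffs))

    best = max(range(grid_size), key=lambda m: f(m * step))
    lo, hi = (best - 1) * step, (best + 1) * step  # bracket around the best grid point
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        a = hi - inv_phi * (hi - lo)
        b = lo + inv_phi * (hi - lo)
        if f(a) < f(b):
            lo = a  # the maximum lies in [a, hi]
        else:
            hi = b  # the maximum lies in [lo, b]
    t_hat = 0.5 * (lo + hi)
    return t_hat % (2.0 * math.pi), f(t_hat)


# Noiseless toy example: Fourier coefficients of a single spike of amplitude 2 at t0 = 1.3.
fc, amplitude, t0 = 5, 2.0, 1.3
coeffs = {k: amplitude * cmath.exp(-1j * k * t0) for k in range(-fc, fc + 1)}
t_hat, lambda_1 = maximize_modulus(coeffs)
print("estimated location:", t_hat, "  maximum:", lambda_1)
```

The grid must be fine enough for the bracket to contain a single local maximum; with noise, a finer grid or several restarts may be needed.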
6.2.2 Monte-Carlo experiment
In this section, we compare the cumulative distribution functions of several test statistics in the case where the variance is known, namely:
- The statistics of the Rice test , given by Theorem 4, are displayed in blue.
- The statistics of the Spacing test , given by (16), are displayed in green.
- The statistics of the Spacing test on grids given by are displayed with a color that takes the respective values green, red, purple and cyan for sizes equal to .
- The grid test, based on of Theorem 1, can be viewed as the limit of the discrete grid tests above as the size grows to infinity. As one can see in the figures, there is some evidence that this limit is numerically reached for a size .
We complete each graph with the diagonal, that is, the cumulative distribution function of the uniform law on , displayed in black. All the figures are based on simulations of the corresponding statistics.
The first figure studies the distribution of and under the null. This figure is displayed in the introduction (see Figure 3). The second figure deals with the grid statistic and under various alternatives defined by a single spike and compares the power of the Rice test with the discrete grid tests, see Figure 4. Finally, the third figure performs the same study but with an alternative defined by two atoms, see Figure 5.
A last set of experiments is devoted to the computation of the testing procedure when the noise level is unknown, see Figure 6.
These numerical experiments were conducted using Python code. The notebook testing_super_resolution.ipynb available at github.com/ydecastro/super-resolution-testing allows one to reproduce them.
Discussion
Figure 3 suggests that the Spacing test is highly non-conservative, which is a major drawback. For instance, when , the empirical level of the Spacing test at a nominal level of 5% is in fact 11.3%. For its part, the Rice test is exact, as predicted by the theory. This numerical agreement indicates that the numerical algorithm described in Section 6.2.1 performs well.
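The notion of empirical level used in this discussion can be illustrated on a toy example. The sketch below does not implement the Rice or Spacing statistics: it uses a one-sided Gaussian test, once with the correct standard deviation and once with a deliberately understated one (the value 0.8 is an arbitrary choice made here), and reports the fraction of simulated p-values below the nominal level.

```python
import math
import random


def gaussian_sf(x: float) -> float:
    """Survival function of the standard Gaussian distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def empirical_level(p_values, alpha: float) -> float:
    """Fraction of simulated p-values at or below the nominal level alpha."""
    return sum(1 for p in p_values if p <= alpha) / len(p_values)


rng = random.Random(2)
n_sim = 50_000
draws = [rng.gauss(0.0, 1.0) for _ in range(n_sim)]  # observations under the null

# Exact test: the p-value uses the true standard deviation (1.0).
exact = [gaussian_sf(z) for z in draws]
# Miscalibrated test: the p-value assumes a standard deviation of 0.8, which is too small.
miscalibrated = [gaussian_sf(z / 0.8) for z in draws]

print("exact test, empirical level at 5%        :", empirical_level(exact, 0.05))
print("miscalibrated test, empirical level at 5%:", empirical_level(miscalibrated, 0.05))
```

An exact test has p-values whose empirical distribution function follows the diagonal, whereas a non-conservative test rejects more often than the nominal level, so its curve lies above the diagonal for small levels.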
In Figures 4 and 5, the power of the discrete grid tests appears to be an increasing function of the number of grid points. This power seems to converge, since the curves associated with (purple) and (cyan) are almost indistinguishable. The figures also suggest that the Rice test (blue) is always more powerful than the discrete grid tests or the limit grid test. Consequently, it seems unbiased for any choice of alternative.
In conclusion, the Rice test seems to be the best choice, even though we are not yet able to prove theoretically that it is unbiased.
Acknowledgement
The authors would like to thank the referees for their useful comments and interesting remarks that have improved the presentation of this paper.
Appendix A Proofs
We denote for random variables, and (for ) means that converges to zero in probability and is uniformly tight, respectively. Furthermore, we consider the following processes.
- The stationary process defined on with covariance function given by where we recall the correlation function is given by ,
- For every , recall the regressions with respect to
[TABLE]
- For every , recall the regressions with respect to
[TABLE]
In particular, recall that is defined by (13) so and it yields that .
A.1 Proof of Theorem 1
Since the variance is known, we consider without loss of generality that . Using the metric given by the quadratic form represented by , we can consider the closest point of the grid to by
[TABLE]
The main claim is that, while it holds a.s., we do not have the same result for , see Lemma 14. We begin with the following preliminary result, which is related to the result of Azaïs-Chassan [9].
Lemma 12.
Under and conditionally on , follows a uniform distribution on and this distribution is independent of and .
Proof.
Note that has uniform distribution on by stationarity and this distribution is independent of and . Let be a Borel set in . Note that by definition of and that . Conditionally on , it holds
[TABLE]
Since has uniform distribution on and since is a partition of , it holds that
[TABLE]
where denotes the Lebesgue measure on . □
Lemma 13.
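A one-dimensional illustration of the mechanism behind Lemma 12, under simplifying assumptions made here: the maximizer is replaced by a uniform point on the circle, the grid is regular, and the usual distance replaces the metric of the proof. Conditionally on the nearest grid point, the offset should then be uniform on a cell of width equal to the grid step, hence with mean zero and variance step squared over 12. The function name is hypothetical.

```python
import math
import random


def nearest_grid_offset(z: float, grid_size: int) -> tuple:
    """Return (index of the nearest grid point, signed offset to it) on [0, 2 pi)."""
    step = 2.0 * math.pi / grid_size
    nearest = round(z / step)
    # The index wraps around the circle; the offset is kept as a signed distance.
    return nearest % grid_size, z - nearest * step


rng = random.Random(4)
grid_size = 16
step = 2.0 * math.pi / grid_size

# Keep the offsets of the uniform draws whose nearest grid point has index 3.
offsets = []
for _ in range(200_000):
    index, offset = nearest_grid_offset(rng.uniform(0.0, 2.0 * math.pi), grid_size)
    if index == 3:
        offsets.append(offset)

mean = sum(offsets) / len(offsets)
var = sum((o - mean) ** 2 for o in offsets) / len(offsets)
print("conditional mean    :", mean)
print("conditional variance:", var, "  step^2 / 12:", step ** 2 / 12.0)
```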
Under , it holds that
.
as goes to .
Let be any measurable function, then tends to zero in probability at arbitrary speed.
Almost surely, one has and as goes to infinity.
Proof.
Let . By definition of and since , it holds that
[TABLE]
almost surely. Since has -paths and by Taylor expansion, one has
[TABLE]
Since is positive definite, there exists sufficiently large such that
[TABLE]
where denotes the Löwner ordering between symmetric matrices. Then, it holds
[TABLE]
From (24), (25) and (26), we deduce that
[TABLE]
using the optimality of and .
By compactness of , uniqueness of optimum and -continuity of , there exists and a neighborhood of such that for any and
[TABLE]
using again a Taylor expansion as in (25). Using (27), it holds that, on an event of probability at least and for large enough, implying that . Invoke (26), (27) and (28) to deduce that .
Using Taylor formula again, we get that
[TABLE]
By optimality of and and using (25) and (29), one gets
[TABLE]
Observing that , we get .
Conditionally to and in the metric defined by , there exists , such that the -neighborhood, denoted by , of the boundary of has relative volume (for the Lebesgue measure) less than . More precisely, denotes the set of points in with -distance less than to the boundary of . In particular,
[TABLE]
by Cauchy-Schwarz inequality. Using Lemma 12 and by homogeneity, we deduce that it holds
[TABLE]
with probability at least . It follows that
[TABLE]
using that for . Now, invoke (30) to get that
[TABLE]
On these events, we get that, for sufficiently large, and must be equal except on an event of probability at most . Furthermore, this result holds unconditionally in . We deduce that , proving . Note that is a consequence of the fact that, for sufficiently large, and must be equal except on an event of arbitrarily small probability. In particular, it shows that converges towards zero in probability, which is equivalent to almost sure convergence of towards zero. Claim follows by noting that (24) proves a.s. convergence of towards . □
Lemma 14.
As tends to infinity, converges in distribution to .
Proof.
Let be such that , say . Let . We can write with
[TABLE]
We first prove that as tends to infinity in distribution. By compactness, remark that there exists a constant such that
[TABLE]
It also holds that
[TABLE]
Let us look at the right-hand sides of (32) and (33). By Claim of Lemma 13 and the continuous mapping theorem, note that converges toward a.s. and we can omit these terms. It remains to prove that on the second terms are equivalent. Because of Lemma 13, converges to at arbitrary speed. Recall that (24) gives and it holds that on . It follows that there exists such that
[TABLE]
with probability greater than . As for the numerators, Eqs. (27) and (28) show that for all
[TABLE]
In this sense, we say that is uniformly equivalent to on the grid in probability. Using (34) and noticing that for any
[TABLE]
the same result holds for the denominators, namely is uniformly equivalent to on the grid in probability. We deduce that is uniformly equivalent to on the grid in probability and, passing to their maxima, one can deduce that converges to in probability.
We now turn to the study of the local part . Again, by Claim of Lemma 13, we can replace by in the numerator of the right-hand side of (32), and we drop the first term, whose limit is clearly almost surely. A Taylor expansion at gives
[TABLE]
for any . Since , we also get that
[TABLE]
As for the denominator, invoke (24), (31) and Claim of Lemma 13 to get that
[TABLE]
where denotes the Hessian at point [math] of . Putting all together yields
[TABLE]
for any . Now we know that, in distribution, and we know that with belonging to a certain growing subset of whose limit is . Finally, conditionally on , we obtain that
[TABLE]
in distribution. □
Finally, consider the test statistic and keep in mind that $X\big(u+(0,\pi)\big)=-X(u)$ and that if $u$ belongs to , then $u+(0,\pi)$ also belongs to it. So Theorem of [4] applies, showing that, under the alternative, . It suffices to pass to the limit to get the desired result.
A.2 Proof of Theorem 2
We use the same grid argument as for the proof of Theorem 1.
Let be pairwise distinct points of , , and set
[TABLE]
Because of the first assumption of (), the distribution of is non-degenerate. Consequently, following the proof of Proposition 6, we know that satisfies and . Denote the eigenfunctions of the Karhunen-Loève (KL) representation of . Note that has the same distribution as (stationarity) and that both are defined on the same space, so the KL-eigenfunctions of are .
Now consider the matrix with entries which is invertible thanks to and and build so that
[TABLE]
One possible explicit expression, among many others, of , the estimator of on the grid , is
[TABLE]
which is a composition of continuous functions of . In particular, as converges a.s. to (see Lemma 13, Claim ), we deduce that converges a.s. to as goes to infinity.
Finally, since the KL estimator is unique, this estimator coincides with the estimator of [4] and Theorem 3 of [4] implies that
[TABLE]
Note that converges almost surely to and converges in distribution to (see Lemma 14) to complete the proof.
A.3 Proof of Proposition 9
(a). We can assume that defined by (22) is centered and, in this case, it holds
[TABLE]
where we recall that and for are independent standard complex Gaussian variables. Formula (35) shows that satisfies ().
(b). Let be pairwise distinct, and set
[TABLE]
where is a Vandermonde matrix, invertible as soon as for all . This proves the first point of . For the second assertion, consider such that and the Gaussian vector
[TABLE]
where the covariance matrix satisfies
[TABLE]
where if are pairwise distinct. Finally, denote by the linear transformation involving the first and the last coordinate such that
[TABLE]
and remark that
[TABLE]
giving the desired non-degeneracy condition.
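For reference, the invertibility claim in part (b) rests on the classical Vandermonde determinant formula. The display below states it for generic nodes $x_1,\dots,x_n$. Taking $x_j = e^{\mathrm{i} t_j}$ is an assumption about how the matrix of the proof is built, since its exact definition is not reproduced in this extract.

```latex
\det
\begin{pmatrix}
1 & x_1 & \cdots & x_1^{n-1} \\
\vdots & \vdots & & \vdots \\
1 & x_n & \cdots & x_n^{n-1}
\end{pmatrix}
= \prod_{1 \le j < k \le n} (x_k - x_j)
```

The product is non-zero exactly when the nodes are pairwise distinct. Under the assumption above, this means that the points $t_j$ are pairwise distinct modulo $2\pi$.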
A.4 Proof of Proposition 10 and Proposition 11
Easy computations give the following results for ,
[TABLE]
for ,
[TABLE]
and for ,
[TABLE]
where
[TABLE]
To conclude, use Proposition 9 to apply Theorem 4 and Theorem 8.
Appendix B Auxiliary results
B.1 Regularity of and new expression of
Lemma 15.
admits radial limits as . More precisely, for all in the unit sphere
[TABLE]
Proof.
As tends to zero
[TABLE]
Moreover, a Taylor expansion gives
[TABLE]
and
[TABLE]
where and are the directional derivative and directional Hessian. Consequently,
[TABLE]
which tends to
[TABLE]
as tends to zero since . The result follows from . □
B.2 Maximum of a continuous process
The following result is borrowed from [17, Theorem 3] and [22].
Proposition 16.
Let be a Gaussian process with continuous sample paths defined on a compact metric space . Suppose in addition that:
[TABLE]
Then almost surely the maximum of on is attained at a single point.
Observe that () implies (36).
Remark 8.
Proposition 16 can be applied to the process which is not continuous on a compact set. We use the "pumping method" as follows. Use
- a parameterization of as ,
- polar coordinates for with origin at ,
- the change of parameter
[TABLE]
that transforms the non-compact set into a compact set: we have inflated the "hole" into a ball centered around with radius one, on which the process is continuous thanks to Lemma 15.
References
- [1] M. Abramowitz and I. Stegun. Handbook of Mathematical Functions. Dover, New York, fifth edition, 1964.
- [2] R. J. Adler and J. E. Taylor. Random fields and geometry. Springer Science & Business Media, 2009.
- [3] J.-M. Azaïs, Y. De Castro, and F. Gamboa. Spike detection from inaccurate samplings. Applied and Computational Harmonic Analysis, 38(2):177–195, 2015.
- [4] J.-M. Azaïs, Y. De Castro, and S. Mourareau. Power of the Spacing test for Least-Angle Regression. Bernoulli, 2016.
- [5] J.-M. Azaïs and M. Wschebor. Level sets and extrema of random processes and fields. John Wiley & Sons Inc., 2009.
- [6] T. Bendory, S. Dekel, and A. Feuer. Robust recovery of stream of pulses using convex optimization. Journal of Mathematical Analysis and Applications, 442(2):511–536, 2016.
- [7] K. Bredies and H. K. Pikkarainen. Inverse problems in spaces of measures. ESAIM: Control, Optimisation and Calculus of Variations, 19(01):190–218, 2013.
- [8] E. J. Candès and C. Fernandez-Granda. Towards a Mathematical Theory of Super-resolution. Communications on Pure and Applied Mathematics, 67(6):906–956, 2014.
