Super-resolution meets machine learning: approximation of measures
H. N. Mhaskar

Institute of Mathematical Sciences, Claremont Graduate University, Claremont, CA 91711. The research of this author is supported in part by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via 2018-18032000002. email: [email protected]
Abstract
The problem of super-resolution in general terms is to recuperate a finitely supported measure given finitely many of its coefficients with respect to some orthonormal system. The interesting case concerns situations where the number of coefficients required is substantially smaller than a power of the reciprocal of the minimal separation among the points in the support of .
In this paper, we consider the more severe problem of recuperating approximately without any assumption on beyond having a finite total variation. In particular, may be supported on a continuum, so that the minimal separation among the points in the support of is 0. A variant of this problem is also of interest in machine learning as well as the inverse problem of de-convolution.
We define an appropriate notion of a distance between the target measure and its recuperated version, give an explicit expression for the recuperation operator, and estimate the distance between and its approximation. We show that these estimates are the best possible in many different ways.
We also explain why for a finitely supported measure the approximation quality of its recuperation is bounded from below if the amount of information is smaller than what is demanded in the super-resolution problem.
Keywords: Super-resolution, machine learning, de-convolution, data defined spaces, widths.
1 Introduction
This paper is motivated by two apparently disjoint areas: super-resolution and machine learning. A problem of interest in both of these areas is the approximation of a measure using a finite amount of information on the measure. Thus we wish to develop a theory of (weak-star) approximation of measures. We will describe our motivation and the connections of this work to the problem of super-resolution and the problem of machine learning in Sections 1.1 and 1.2 respectively. The aims and contributions of this paper, and its outline, are given in Section 1.3. Since this section and the next are introductory in nature, the notation used in them may differ from that used in the remainder of the paper.
1.1 Super-resolution
The problem of super-resolution is stated by Donoho [12] as follows. Given observations of the form
[TABLE]
where is a sequence of complex numbers, , and represents a perturbation subject to the condition that , recuperate the sequence up to an accuracy in the sense of optimal recovery, when . It is shown in [12] that this is not possible in general, but it is possible under some sparsity assumptions.
Relevant to the current paper is a generalization stated by Candès and Fernandez-Granda in [6, Theorem 1.2]. We denote the quotient space by . Let
[TABLE]
where denotes the Dirac delta measure at [math]. We assume that the moments
[TABLE]
are known for for some integer . The goal is to estimate how much degradation to expect when this data is extended for the values of with for some larger integer . Since the degradation is expected to be greater as increases, the authors propose to measure this degradation using the Fejér kernel
[TABLE]
They prove that if
[TABLE]
then a measure obtained by the solution of an optimization problem satisfies
[TABLE]
where is a positive constant. We note that the number of observations considered known is . Thus, the condition (1.3) is a lower bound on the amount of information required in terms of the minimal separation in order to guarantee a stable recovery as measured in (1.4).
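To make the role of the Fejér kernel concrete, the following is a minimal sketch, not the construction of [6]. It assumes the standard normalization with triangular weights 1 - |k|/(n+1) on the circle, and the helper names (`fejer_weights`, `moments`, `fejer_smooth`) as well as the sample point masses are hypothetical. It forms the low-resolution version of a finitely supported measure using only its moments with |k| <= n.

```python
import cmath
import math

def fejer_weights(n):
    """Triangular weights 1 - |k|/(n+1), k = -n..n (standard normalization, assumed)."""
    return {k: 1.0 - abs(k) / (n + 1) for k in range(-n, n + 1)}

def fejer_kernel(n, x):
    """Evaluate the Fejer kernel at x by summing its Fourier series."""
    w = fejer_weights(n)
    return sum(w[k] * cmath.exp(1j * k * x) for k in w).real

def moments(atoms, kmax):
    """Moments of mu = sum_j a_j delta_{x_j}: mu_hat(k) = sum_j a_j exp(-i k x_j)."""
    return {k: sum(a * cmath.exp(-1j * k * x) for x, a in atoms)
            for k in range(-kmax, kmax + 1)}

def fejer_smooth(atoms, n, t):
    """(K_n * mu)(t), computed from the moments with |k| <= n only."""
    w = fejer_weights(n)
    m = moments(atoms, n)
    return sum(w[k] * m[k] * cmath.exp(1j * k * t) for k in w).real

# Hypothetical example: two positive point masses, observed through 2n+1 moments.
atoms = [(0.3, 1.0), (0.5, 0.7)]
n = 20
values = [fejer_smooth(atoms, n, -math.pi + 2 * math.pi * i / 200) for i in range(200)]
```

Because the Fejér kernel is non-negative, the smoothed version of a positive measure stays non-negative; this is the positivity that Section 1.3 contrasts with the non-positive kernels used later in the paper.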
There is a vast amount of literature on this problem, where the problem is referred to by different names: problem of hidden periodicities (e.g., [24, Chapter IV, Section 22]), direction finding in phased array antennas (e.g., [21]), detection of singularities (e.g., [19, 13, 35]), parameter estimation in exponential sums (e.g., [17, 33]), etc. The oldest work we are aware of is the paper [8] of Prony, where the problem is considered without noise. We note also that there is some effort [31, 3, 2] in the direction of overcoming this barrier in the case of the univariate trigonometric setting, where the information is in the form
[TABLE]
so that each appears with multiplicity .
We will not even begin to list the many modern works in the context of the periodic problem. The problem has been studied also in other settings; for example, the sphere (e.g., [4, 5]) or the rotation group [18], where the exponential monomials are replaced by eigenfunctions of the Laplace-Beltrami operator. In all this work, it is required that the number of Fourier coefficients known about the measure is at least a constant multiple of , where is the dimension of the space involved.
This sort of condition seems to be an inherent barrier; we will refer to it as the minimal separation barrier. We note that any finite set of points will have a positive minimal separation; the condition refers to the amount of information necessary to recuperate the measure to a given accuracy.
In this paper, we are interested in approximating an arbitrary measure, not just a measure supported on a finite set. When the support of the target measure is a continuum, then the minimal separation is 0, and any recuperation with a finite amount of noisy data is arguably beyond super-resolution. Of course, even in the absence of noise, an exact recuperation cannot be expected in general using only a finite amount of information. On the other hand, an approximate recovery is very standard in the trigonometric case; the entire section [36, Chapter 2, Section 8] already describes many constructions, some of which are used in [19] in the case of finitely supported measures.
To summarize, the question of overcoming the minimal separation barrier in super-resolution problems for point masses can be viewed as the problem of efficient approximation of a measure given a finite amount of information about the measure.
1.2 Machine learning
A central problem in machine learning is to find a target function on a space , equipped with a probability measure , given the information , , for some points chosen randomly from , where is a random noise. Typically, the function is defined on a very high dimensional space, say a space of dimension . In this case, there are well known results in approximation theory, known as width theorems, which give a lower bound of the form on how accurately one can approximate a function, for which the only available a priori information is that it belongs to a smoothness class indexed by (e.g., a Sobolev class) [9]. This is known as the curse of dimensionality.
In recent years, deep networks have caused a revolution in machine learning, with many spectacular achievements in industrial problems. It is therefore an important problem to examine why and when deep networks perform better than the so called shallow networks. We have argued in [30] that one reason that deep networks perform better than shallow networks is that many functions of practical interest have a compositional structure which deep networks can exploit and shallow networks cannot. For example, suppose that the target function is known to have the structure
[TABLE]
where the functions , , are continuously differentiable on a cube . A shallow network of the form
[TABLE]
where is a suitable activation function, yields an approximation [26].
However, if we use the same construction as in [26] to obtain shallow networks , each with terms to approximate respectively, one gets an accuracy in each approximation. By the triangle inequality the deep network given by
[TABLE]
yields an accuracy , using only parameters.
This theory suggests on the other hand that deep networks do not give any advantage when there is no curse of dimensionality. There is some research exploring other prior assumptions on the target function which ensure that there is no curse of dimensionality of the kind described above.
Let us illustrate the situation by an example. If admits a representation of the form
[TABLE]
for some measure having a bounded total variation on , then it is known that one can obtain approximations to by linear combinations of the form with the degree of approximation being dimension-independent in terms of , often tractable as well (e.g., [1, 20, 22, 23, 27]). The total variation of , sometimes known as the -variation of , plays the role of the norm of the derivatives of in classical approximation theory. The proofs of such theorems depend upon a probabilistic argument, and are not constructive. Therefore, in practice, the parameters , are determined using some learning algorithm.
Thus, in theory, the problem is to determine given finitely many samples of , even though the number of these samples may not be dimension independent. Let us assume that is a compact Riemannian manifold on which has a Mercer expansion of the form for some orthonormal system . Then it is shown under some conditions in [16] (see also [14]) that one can obtain quadrature formulas to integrate all linear combinations of the first few functions ; the number of these functions depends on the number of points at which the values of are available. Thus, in theory, the inverse problem of recuperating from the samples of is reduced to recuperating (respectively, its discretized version) using finitely many Fourier coefficients of with respect to the system .
A question of theoretical interest here is to estimate the degree of approximation of in a suitable sense in terms of the number of Fourier coefficients that can be computed reliably from the data. This is the same problem that we were led to in our musings on super-resolution in Section 1.1, thereby establishing a close connection between these two problems.
1.3 Contributions of this paper
The problem of approximating a measure in the weak-star sense is inherently different from that of approximation of functions, and also from that of an exact or approximate recuperation of the support of the measure. In particular, in the context of machine learning and probability density estimation, it is customary to use a “convolution” with a positive kernel. As an approximation device, it is well known that this is doomed not to give a good approximation. However, if we use a non-positive kernel to guarantee good approximation as we propose to do in this paper, then there are some difficulties in recuperating the support of the measure exactly and directly without some non-linear operations such as thresholding and clustering. In this paper, we are focused on approximation of measures, and will postpone the discussion of these other issues to future work.
We will describe how we address the difference between approximation of functions and that of measures in the current paper.
To study an approximation problem, one needs the notion of a distance between two objects and some notion of smoothness of the target object to be approximated. In classical theory of function approximation, there are standard ways of defining both of these; e.g., in approximating a continuous -periodic function by trigonometric polynomials, one uses the uniform norm and the smoothness is measured by an appropriate modulus of smoothness. Such questions have been studied in many different contexts, e.g., [10].
In contrast, there is no standard definition for measuring the distance between two measures so that the convergence in the topology corresponds to weak-star convergence. There are several ways of defining such a distance in different application domains, and we will list a few of these in Section 2 to motivate our own definition. However, we are not aware of any standard smoothness class for measures. In our Theorem 4.1 below, we will observe that with no assumption on the target measure, the degree of approximation depends entirely on the definition of the distance. Since an estimate on the degree of approximation is typically (and in our theorem, is) achieved using a specific construction (given in (4.1)), this surprising fact gives rise to the question whether one could obtain better estimates using a different construction, or even using a different kind of information about the measure. We will discuss these issues in Theorem 4.3 and Theorem 4.4. In particular, Theorem 4.4 provides one explanation for the minimal separation barrier in super-resolution of point masses as described in Section 1.1. Another natural question to ask is to understand what a better estimate on the degree of approximation allows us to conclude about the measure, analogous to the converse theorems of approximation theory. A trivial case is when the target measure is absolutely continuous with respect to a base measure, and the Radon-Nikodym derivative is then approximated as in the classical function approximation paradigm. We will prove in Theorem 4.2 that an improvement on the approximation bounds in Theorem 4.1 implies that the target measure is in fact absolutely continuous with respect to a base measure and the derivative is in the right smoothness class as expected in the theory of function approximation.
In Section 2, we review a few notions of distance between measures in order to motivate our Definition 3.2. In Section 3, we develop the set up for our theory, and establish some notation to be used in the subsequent sections. The main results are discussed in Section 4, and the proofs of all the new results are given in Section 5.
We thank Professor Dr. Hans Feichtinger for many useful comments on the presentation in this paper.
2 Distance between measures
In order to discuss the quality of approximate recuperation of measures, we need first to develop a notion of distance between measures. There are many ways of defining a distance. We mention a few of these to motivate Definition 3.2 which we will use in this paper.
In the univariate case, a very old way to define a distance is the Erdős-Turán discrepancy (known also as Kolmogorov-Smirnov statistic in statistics and star discrepancy in information based complexity). In the context of measures on (identified with ), this is defined for a signed measure with by
[TABLE]
A comparison of Fourier coefficients shows that
[TABLE]
where denotes the Bernoulli spline defined by , . In the form (2.2), this notion of discrepancy is generalized using many different kernels on different high dimensional domains (e.g., [32, 11]). A similar notion in statistics is the so called maximum mean discrepancy (MMD), defined by
[TABLE]
for some measure space and a positive definite kernel defined on this space.
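As an illustration of the maximum mean discrepancy for discrete measures, here is a small sketch. It assumes the usual form in which the squared MMD of a signed measure is the double integral of the kernel against the measure, and it uses a Gaussian kernel on the real line; the function names and the choice of kernel are illustrative assumptions rather than part of the text.

```python
import math

def gaussian_kernel(x, y, bandwidth=1.0):
    """A positive definite kernel on the real line (illustrative choice)."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))

def mmd(points, weights, kernel=gaussian_kernel):
    """sqrt(sum_{i,j} w_i w_j K(x_i, x_j)) for the signed measure sum_i w_i delta_{x_i}."""
    total = sum(wi * wj * kernel(xi, xj)
                for xi, wi in zip(points, weights)
                for xj, wj in zip(points, weights))
    return math.sqrt(max(total, 0.0))  # guard against tiny negative rounding errors

def mmd_between(p_points, p_weights, q_points, q_weights):
    """MMD of the difference P - Q of two discrete measures."""
    pts = list(p_points) + list(q_points)
    wts = list(p_weights) + [-w for w in q_weights]
    return mmd(pts, wts)

# Two unit point masses half a unit apart.
shifted = mmd_between([0.0], [1.0], [0.5], [1.0])
```

For two unit point masses at 0 and 0.5, the value is the square root of K(0,0) + K(0.5,0.5) - 2K(0,0.5), so it shrinks continuously as the two points merge, unlike the total variation of the difference.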
Another popular distance between measures is the -Wasserstein distance. Let be a metric space and be a Borel measure on this space with . One of the equivalent definitions of this distance is given by where the supremum is over all Lipschitz continuous functions on with Lipschitz constant . If is a manifold, is the Laplace-Beltrami operator on , an analogue of this distance, more responsive to the manifold structure, is obtained by taking the supremum over all functions with . Denoting the Green function for by , this in turn is equivalent to
[TABLE]
Finally, we note that the estimate (1.4) utilizes a semi-norm of the form
[TABLE]
where is the Fejér kernel with Fourier coefficients equal to 0 outside of .
3 Notation and definitions
In this section, we describe the general set up for our discussion, and establish notation.
Let be a locally compact metric measure space, with denoting the metric on , and being a distinguished positive measure on . In the sequel, only complete, sigma finite, Borel measures are considered, defined on a sigma algebra containing all Borel subsets of . In the sequel, -measurability will be understood in the sense of membership in this fixed sigma algebra.
For , -measurable, and a -measurable function we write
[TABLE]
denotes the class of all –measurable functions for which , where two functions are considered equal if they are equal –almost everywhere. We will omit the mention of if and that of if . Thus, . For , we define with the usual understanding that , . The symbol denotes the space of all continuous real functions on vanishing at infinity; . The symbol will denote the dual space of ; i.e., the class of all regular, Borel, measures with bounded total variation.
We also need a non-decreasing sequence of real numbers, and an (-) orthonormal system of functions in . We assume that , and . In addition we assume that the system is fundamental in both and .
Definition 3.1
The system is called an admissible system if
1. For each and , the ball is compact.
2. There exists and such that for , ,
[TABLE]
3. For , ,
[TABLE]
Remark 3.1
In some of our other papers we have referred to an admissible system in the sense of the above definition as a data defined space. This is motivated by an idea for semi-supervised learning, called diffusion geometry/manifold learning. One assumes that the data for this kind of machine learning problem lives on an unknown low dimensional sub-manifold of a high dimensional Euclidean space. The learning takes place based on the eigen-decomposition of a suitably constructed graph Laplacian. In theory, one may assume the eigen-decomposition of the heat kernel with respect to an elliptic differential operator on the manifold itself. The properties of this heat kernel play a central role in the theoretical development. In particular, it is shown in [29, Theorem 4.3] that the condition (3.2) implies the localization properties of the kernels defined in (3.7) below, which in turn play a crucial role in this paper via Proposition 5.1.
Constant convention:
In the sequel, the symbols will denote generic positive constants depending only on the system and other constant parameters under discussion. Their value will be different at different occurrences, even within a single formula. The notation means .
We now define a candidate for a semi-norm on which will be used in this paper.
Definition 3.2
Let be a kernel that admits a formal Mercer expansion , where for every . For and , we define formally
[TABLE]
We will be particularly interested in the following class of kernels (cf. [28]):
Definition 3.3
Let . A function will be called a mask of type if is an even, times continuously differentiable function such that for , for some such that , , , and , . A function will be called a kernel of type if it admits a formal expansion for some mask of type . If we wish to specify the connection between and , we will write in place of .
Example 3.1
We consider . If , then the kernel defined formally by
[TABLE]
is a kernel of type .
Example 3.2
We consider the unit sphere . If , the kernel defined formally by is a kernel of type ([34, Section 9.3(4)]).
When is a kernel as defined in Definition 3.3, is a norm consistent with the weak-star topology on . We will give a proof of the following simple proposition in Section 5.
Proposition 3.1
Let , , be a kernel of type . Then the functional defines a norm on . If is a sequence in , then if and only if .
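The following sketch illustrates, under stated assumptions, how such a kernel norm tracks weak-star closeness where total variation does not. The explicit formulas of Definition 3.2 and Example 3.1 are not legible in this copy, so the code assumes the circle, the supremum norm of the function obtained by integrating the kernel against the measure, and the illustrative Fourier multiplier b(k) = (1 + k^2)^(-beta/2). The names `kernel_norm_sup` and `dipole`, the frequency cutoff, and the grid size are all hypothetical.

```python
import cmath
import math

def kernel_norm_sup(atoms, beta, kmax=200, grid=400):
    """Approximate sup_t |sum_k b(k) mu_hat(k) exp(i k t)|, b(k) = (1 + k^2)^(-beta/2).

    atoms is a list of (location, weight) pairs on [-pi, pi); the series is
    truncated at |k| <= kmax and the supremum is taken over a uniform grid.
    """
    coeffs = []
    for k in range(-kmax, kmax + 1):
        mu_hat = sum(a * cmath.exp(-1j * k * x) for x, a in atoms)
        coeffs.append((k, (1.0 + k * k) ** (-beta / 2.0) * mu_hat))
    best = 0.0
    for i in range(grid):
        t = -math.pi + 2.0 * math.pi * i / grid
        val = sum(c * cmath.exp(1j * k * t) for k, c in coeffs)
        best = max(best, abs(val))
    return best

def dipole(sep):
    """delta_0 - delta_sep: total variation 2 for every sep > 0."""
    return [(0.0, 1.0), (sep, -1.0)]

near = kernel_norm_sup(dipole(0.01), beta=2.0)
far = kernel_norm_sup(dipole(1.0), beta=2.0)
```

In this sketch the dipole with separation 0.01 has a much smaller kernel norm than the one with separation 1.0, although both have total variation 2, which is consistent with the norm being compatible with weak-star convergence.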
Next, we define some smoothness classes of functions in terms of their degree of approximation by linear combinations of . We define
[TABLE]
and . If , we denote . Following [25], we refer to the elements of as diffusion polynomials. The -closure of is denoted by ; i.e., if , and if .
If and , we define
[TABLE]
If then the smoothness class is the set of all such that
[TABLE]
Our main tool in the recuperation of measures is a localized kernel. Given a compactly supported function , we define:
[TABLE]
For , we define formally
[TABLE]
We write
[TABLE]
Then
[TABLE]
We note that can be identified with the measure . In general, if is absolutely continuous, so that for some , then by an abuse of the notation we write for , and likewise, for .
4 Main results
Our first objective is to estimate the degree of approximation in recuperating a measure from noisy measurements of the form , for with . Toward this end, we fix, for the rest of this paper, an infinitely differentiable, even function such that is non-increasing on , if , if . The constants will depend upon as well.
The approximation to is the measure , defined spectrally by
[TABLE]
We find it convenient to denote the noiseless recuperation measure by ; i.e.,
[TABLE]
for all Borel subsets .
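A minimal sketch of this spectral construction in the simplest setting may help fix ideas. It assumes the circle with the exponentials as the orthonormal system and eigenvalue |k| for frequency k, and it assumes the filter equals 1 on [0, 1/2] and 0 on [1, infinity), since the cutoffs are not legible in this copy. The names `smooth_step`, `h`, `noisy_coefficients`, and `sigma_n_density`, the noise model, and the sample data are hypothetical.

```python
import cmath
import math
import random

def smooth_step(s):
    """Infinitely differentiable step: 0 for s <= 0, 1 for s >= 1."""
    f = lambda u: math.exp(-1.0 / u) if u > 0 else 0.0
    return f(s) / (f(s) + f(1.0 - s))

def h(t):
    """Even, smooth, non-increasing on [0, inf): 1 for |t| <= 1/2, 0 for |t| >= 1 (assumed cutoffs)."""
    return 1.0 - smooth_step(2.0 * abs(t) - 1.0)

def noisy_coefficients(atoms, n, noise_level, rng):
    """mu_hat(k) + eps_k for |k| < n, with real noise |eps_k| <= noise_level (toy noise model)."""
    out = {}
    for k in range(-n + 1, n):
        mu_hat = sum(a * cmath.exp(-1j * k * x) for x, a in atoms)
        out[k] = mu_hat + noise_level * rng.uniform(-1.0, 1.0)
    return out

def sigma_n_density(coeffs, n, t):
    """Real part of sum_k h(|k|/n) c_k exp(i k t): density with respect to dx/(2*pi)."""
    return sum(h(abs(k) / n) * c * cmath.exp(1j * k * t) for k, c in coeffs.items()).real

rng = random.Random(0)
atoms = [(-1.0, 0.5), (0.2, 1.5)]           # hypothetical target: two point masses
n = 32
clean = noisy_coefficients(atoms, n, 0.0, rng)
grid = [-math.pi + 2.0 * math.pi * i / 128 for i in range(128)]
density = [sigma_n_density(clean, n, t) for t in grid]
mass = sum(density) / len(density)           # integral of the density against dx/(2*pi)
```

Because the filter equals 1 at frequency zero, the noiseless reconstruction preserves the total mass of the target, and each noisy coefficient perturbs the density by at most the noise level times the corresponding filter value. Unlike the Fejér weights, this filter is not positive definite, so the reconstructed density may take negative values.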
The following theorem shows that the rate at which the degree of approximation of by (as a function of ), measured in the norm given in Definition 3.2, decreases to 0 depends only on the kernel . There is no natural way to define a smoothness of the measure .
Theorem 4.1
Let , , be a kernel of type , and , be defined by (4.1). Let . Then
[TABLE]
Moreover, for the high pass filter , we have
[TABLE]
Remark 4.1
We compare this theorem with [6, Theorem 1.2] described in Section 1.1. The analogue of the high pass filter is given by . Note that, unlike (1.4), the noise term has a decreasing influence in the high pass range. Analogous to the kernel , the kernel gives a lower weight to the higher frequencies, but unlike the kernel , the kernel includes all the high frequency components.
We note that there is no longer any assumption on the minimal separation among the points in the support of the target measure . An exact recovery is in general impossible, even in the noise-free case. Our construction in (4.1), being general, does not give an exact recuperation even in the case of finitely supported measures without some further processing, which is not within the scope of this paper. However, the result is applicable for measures defined on a very general space, and does not require the verification of a signature polynomial as in [7]. Therefore, we expect that the approximation is easier to construct so as to obtain a good approximation, even if no exact recovery is possible.
Remark 4.2
Let admit a representation of the form
[TABLE]
for some measure . A comparison of Fourier coefficients shows that
[TABLE]
Therefore, Theorem 4.1 implies
[TABLE]
In particular, in the case , we get bounds nominally sharper than those in [25]. Rather than assuming a condition on in terms of pseudo-differential operators (informally, choosing to be a Green function of a pseudo-differential operator), we allow a more general kernel . Also, we no longer require the object defined spectrally by , , to be a function in , but allow it to be a measure. It is explained in [28, 14] how to discretize the quantity based on values of at scattered data points. This leads to a constructive procedure to obtain an approximation to by sums of the form [28]. However, the error bounds are not dimension independent. Dimension independent bounds can be obtained using concentration inequalities in a probabilistic sense, but then the proof is not constructive.
Next, we address the question whether one can improve upon the bounds in (4.3). For simplicity, we consider the noiseless case; i.e., assume in the sequel that . The first theorem below states that one cannot improve the factor of to except in “trivial” cases; i.e., when for some , so that results from function approximation are applicable directly. Thus, in the case when , the estimate (4.3) cannot be improved.
Theorem 4.2
Let , , , be a kernel of type , and for each , be defined by (4.2). Then the following are equivalent:
(a) There exists such that .
(b) We have
[TABLE]
Another way to examine a possible improvement in (4.3) is using the notion of non-linear widths. We note that the recuperation measure depends upon the parameters , for such that ; i.e., as many parameters as the dimension of . In most manifolds, the eigenfunctions of the Laplace-Beltrami operator satisfy an additional estimate given in (4.6) below (see [15, 16] for a fuller discussion). In the general set up which we are working with, it is therefore reasonable to assume that there exists such that
[TABLE]
Under this assumption, it is not difficult to verify that the dimension of is . Thus, in terms of the number of parameters used in the recuperation, the bound (4.3) for the case is . We now proceed to show that this is the best possible.
Let be a weak-star compact subset of . We denote by the set of all weak-star continuous mappings from (parameter selection maps). An algorithm is a mapping . Thus, for any algorithm and parameter selection , and , is an attempted reconstruction of from the data using the algorithm . We define
[TABLE]
and the nonlinear width of in the sense of by
[TABLE]
Theorem 4.3
Let be compact, , , be a kernel of type . We assume further that there exists such that (4.6) holds. Let
[TABLE]
Then for integer ,
[TABLE]
We end this section with a width result that demonstrates that the minimal separation is an essential barrier to the recuperation of finitely supported measures, not just from the Fourier information, but from any robust parameter selection. Toward this end, let and
[TABLE]
Although we do not prescribe the exact number of point masses in the definition above, when is compact, then a volume argument shows that this number cannot exceed . It is not difficult to show in this case that is a compact subset of .
Theorem 4.4
Let be compact, , , be a kernel of type . We assume further that (4.6) holds. Then for integer ,
[TABLE]
Remark 4.3
We remark that is a decreasing function of . Therefore, the estimate (4.11) shows a lower limit on how accurately a finitely supported measure with the minimal separation of its support equal to can be approximated using continuously selected parameters.
Remark 4.4
In the case when , and is a measure supported on points, then the Prony method can recuperate the measure exactly using parameters, regardless of minimal separation among the points. This is not a contradiction to Theorem 4.4, which refers to the worst case error for approximating measures in . For any , the class contains a measure supported on points and for this measure, .
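To illustrate the remark about the Prony method, here is a sketch restricted to two point masses on the circle and exact, noise-free data; it is a textbook special case written out in closed form, not code from the paper. The moment convention, with nodes z_j = exp(-i x_j), and the function name `prony_two_spikes` are assumptions made for the example.

```python
import cmath

def prony_two_spikes(m):
    """Recover [(x1, a1), (x2, a2)] from m[k] = a1*z1**k + a2*z2**k, k = 0..3, z_j = exp(-i*x_j).

    The nodes z1, z2 are the roots of z**2 + c1*z + c0, whose coefficients
    solve the 2x2 Hankel system built from the four moments.
    """
    m0, m1, m2, m3 = m
    det = m0 * m2 - m1 * m1            # Hankel determinant; zero if the spikes coincide
    c0 = (m1 * m3 - m2 * m2) / det     # product of the two roots
    c1 = (m1 * m2 - m0 * m3) / det     # minus the sum of the two roots
    disc = cmath.sqrt(c1 * c1 - 4.0 * c0)
    z1, z2 = (-c1 + disc) / 2.0, (-c1 - disc) / 2.0
    a1 = (m1 - m0 * z2) / (z1 - z2)    # amplitudes from the first two moments
    a2 = m0 - a1
    return sorted([(-cmath.phase(z1), a1.real), (-cmath.phase(z2), a2.real)])

# Hypothetical test measure: two spikes only 0.05 apart, observed through 4 moments.
true_spikes = [(0.50, 1.0), (0.55, 2.0)]
data = [sum(a * cmath.exp(-1j * k * x) for x, a in true_spikes) for k in range(4)]
recovered = prony_two_spikes(data)
```

Four moments suffice here even though the separation is 0.05, but the Hankel determinant scales with the square of the separation, so the computation becomes increasingly sensitive to perturbations of the data as the spikes approach each other.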
5 Proofs
In the sequel, if , we will write . If is a finite set, we define
[TABLE]
For , we define
[TABLE]
so that
[TABLE]
In the sequel, we write , .
We recall the following results from [28]. Although the set up there is that of a compact smooth manifold without boundary, the proofs are verbatim the same for admissible spaces.
Proposition 5.1
Let , , be a mask of type .
(a) We have
[TABLE]
(b) If then for every , there exists such that , . We have
[TABLE]
(c) If , , then
[TABLE]
(d) If , is compact, and (4.6) holds, then for any , , ,
[TABLE]
[TABLE]
Proof. The second inequality in (5.4) is proved in [28, Eqn. (5.3)]. The first inequality in (5.4) follows easily from [28, Eqn. (5.11)]. Part (b) is proved in [28, Proposition 5.2]. Part (c) is proved in [28, Eqn. (5.33)], used with . Part (d) is proved in [28, Theorem 3.4].
Proof of Proposition 3.1.
We note that Proposition 5.1 shows that is defined for all . Hence, (5.5) shows that for any , is well defined, and hence, so is .
It is clear that is a semi-norm. If and then for all ; i.e., for all . Since the system is fundamental in , this implies that . The fact that implies that for all , which in turn implies that . Conversely, if then for each ,
[TABLE]
The dominated convergence theorem now leads to the fact that .
Proof of Theorem 4.1.
Using Fubini’s theorem and then making a change of dummy variables, we see that for ,
[TABLE]
Hence, (5.5) leads to
[TABLE]
This proves (4.3). The proof of (4.4) is similar; the last term in the middle expression in (5) does not appear in this case.
It is convenient to organize some details of the proof of Theorem 4.2 in the following lemma.
Lemma 5.1
Let , , , and be a mask of type . Then for any ,
[TABLE]
Proof. In view of (5.4) with , we have for any real and mask of type .
[TABLE]
Consequently, Young's inequality shows that for and ,
[TABLE]
In this proof, let . Then is supported on . Analogous to (5.11), we see that for , (5.4) and Young’s inequality lead to
[TABLE]
In view of the fact that for all in the support of , we have
[TABLE]
Therefore, using (5.11) with in place of , the fact that for all , and (5.12), we conclude that
[TABLE]
Hence,
[TABLE]
Since is a mask of type , this leads to (5.10).
Proof of Theorem 4.2. Let . Since is supported on , for all . Therefore, using (5.12), we obtain for any that
[TABLE]
i.e.,
[TABLE]
Since , this yields, together with (5.10) that
[TABLE]
Therefore,
[TABLE]
Thus, part (a) implies part (b).
Conversely, let part (b) hold. Then
[TABLE]
In view of (5.10) this leads to
[TABLE]
This implies that the sequence
[TABLE]
converges in to some . Moreover, for all . Therefore, . Further, (5.15) shows that
[TABLE]
Thus, .
Our proofs of Theorems 4.3 and 4.4 depend upon another notion of widths, the so-called Bernstein width. This is defined for a weak-star compact subset and integer by
[TABLE]
where the supremum is over all subspaces of with dimension . It is proved in [9, Theorem 3.1] that for any integer ,
[TABLE]
Proof of Theorem 4.3.
Let be a maximal separated subset of , where is the constant appearing in the upper bound (3.1) on the -measure of balls. Then . In view of (3.1),
[TABLE]
Therefore, satisfies . We consider the dimensional space , where denotes the Dirac delta at . For any , Proposition 5.1(d) shows that
[TABLE]
In view of (5.17), this leads to (4.9).
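The maximal separated subset used in this proof can be produced greedily when the space is replaced by a finite sample. The sketch below assumes Euclidean points and a hypothetical function name `maximal_separated_subset`; the proof itself only needs the existence of such a set, so this is an illustration rather than part of the argument.

```python
import math
import random

def maximal_separated_subset(points, eps, dist=math.dist):
    """Greedy selection: keep a point iff it is at distance >= eps from all kept points.

    The result is eps-separated, and maximal among subsets of `points`:
    every rejected point lies within distance < eps of some kept point.
    """
    chosen = []
    for p in points:
        if all(dist(p, c) >= eps for c in chosen):
            chosen.append(p)
    return chosen

# Hypothetical sample: random points in the unit square standing in for the space.
rng = random.Random(0)
cloud = [(rng.random(), rng.random()) for _ in range(500)]
centers = maximal_separated_subset(cloud, eps=0.2)
```

Separation gives disjoint balls of half the radius around the chosen points, which is the volume argument behind the upper bound on their number, while maximality means the balls of full radius cover the sample, which gives the matching lower bound.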
Proof of Theorem 4.4.
Choosing so that , the space constructed in the proof of Theorem 4.3 serves also for this theorem in order to apply (5.17).
[1] A. R. Barron. Neural net approximation. In Proc. 7th Yale Workshop on Adaptive and Learning Systems, volume 1, pages 69–72, 1992.
[2] D. Batenkov. Stability and super-resolution of generalized spike recovery. Applied and Computational Harmonic Analysis, 2016.
[3] D. Batenkov and Y. Yomdin. On the accuracy of solving confluent Prony systems. SIAM Journal on Applied Mathematics, 73(1):134–154, 2013.
[4] T. Bendory, S. Dekel, and A. Feuer. Exact recovery of Dirac ensembles from the projection onto spaces of spherical harmonics. Constructive Approximation, 42(2):183–207, 2015.
[5] T. Bendory, S. Dekel, and A. Feuer. Super-resolution on the sphere using convex optimization. IEEE Transactions on Signal Processing, 63(9):2253–2262, 2015.
[6] E. J. Candès and C. Fernandez-Granda. Super-resolution from noisy data. Journal of Fourier Analysis and Applications, 19(6):1229–1254, 2013.
[7] E. J. Candès and C. Fernandez-Granda. Towards a mathematical theory of super-resolution. Communications on Pure and Applied Mathematics, 67(6):906–956, 2014.
[8] B. G. R. De Prony. Essai expérimental et analytique: sur les lois de la dilatabilité de fluides élastique et sur celles de la force expansive de la vapeur de l’alkool, à différentes températures. Journal de l’école polytechnique, 1(22):24–76, 1795.
