Error localization of best L1 polynomial approximants
Yuji Nakatsukasa, Alex Townsend

TL;DR
This paper establishes a connection between best L0 and L1 polynomial approximants for corrupted polynomials, demonstrating an error localization property and proposing an improved approximation algorithm.
Contribution
It introduces a continuous analogue of compressed sensing principles for polynomial approximation and develops an enhanced method for computing best L1 polynomial approximants.
Findings
Best L0 and L1 polynomial approximants are nearly equal for corrupted polynomials.
Error localization property of best L1 polynomial approximants is demonstrated.
An improved algorithm for computing best L1 polynomial approximants is proposed.
Abstract
An important observation in compressed sensing is that the minimizer of an underdetermined linear system is equal to the minimizer when there exists a sparse solution vector and a certain restricted isometry property holds. Here, we develop a continuous analogue of this observation and show that the best and polynomial approximants of a polynomial that is corrupted on a set of small measure are nearly equal. We go on to demonstrate an error localization property of best polynomial approximants and use our observations to develop an improved algorithm for computing best polynomial approximants to continuous functions.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSparse and Compressive Sensing Techniques · Digital Filter Design and Implementation · Image and Signal Denoising Methods
Error localization of best L1 polynomial approximants††thanks: Submitted to the editors .
\fundingThe National Institute of Informatics in Tokyo partially funded an extended collaboration visit between the authors in December 2018, where the majority of this research took place. The first author is supported by the JSPS grants no. 17H01699 and 18H05837. The second author is supported by the National Science Foundation grant no. 1818757.
Yuji Nakatsukasa
Mathematical Institute, University of Oxford, Oxford, OX2 6GG, UK ([email protected]).
Alex Townsend
Department of Mathematics, Cornell University, Ithaca, NY 14853. ([email protected]).
Abstract
An important observation in compressed sensing is that the minimizer of an underdetermined linear system is equal to the minimizer when there exists a sparse solution vector and a certain restricted isometry property holds. Here, we develop a continuous analogue of this observation and show that the best and polynomial approximants of a polynomial that is corrupted on a set of small measure are nearly equal. We go on to demonstrate an error localization property of best polynomial approximants and use our observations to develop an improved algorithm for computing best polynomial approximants to continuous functions.
keywords:
polynomial approximation, best , compressed sensing, best , restricted isometry property, error localization
{AMS}
65F15, 15A18, 15A22
1 Introduction
In compressed sensing the minimizer of an underdetermined linear system can be exactly recovered by the minimizer when the minimizer is sufficiently sparse and satisfies some regularity conditions [11, 15, 18]. Similarly, when an acquired signal is sparsely corrupted, one can exactly recover the original signal by minimizing the error, under suitable assumptions [13]. In this paper, we investigate a continuous analogue of this phenomenon and show that the best and polynomial approximants of corrupted polynomials (see Definition 1.1) are equal, under suitable assumptions (see Section 2). We also make precise a related observation that the best error can be concentrated to intervals of small measure, showing that they can be advantageous compared to minimax approximants for certain applications (see [30]).
Let be a continuous function and an integer. The best polynomial approximant, , of degree to exists, is unique [27, Thm. 14.3], and satisfies
[TABLE]
where is the space of polynomials of degree . While the minimax approximant, , is the best approximant in the sense that , where is the maximum norm, we know by the equioscillation theorem that the maximum deviation is attained times [27, Thm. 7.2]. On the other hand, it can frequently be observed that for most, but not all, (see Fig. 1 and Section 4). To make this observation precise, we define the set111The constant of in the definition of (see Eq. 2) is an arbitrary choice as any constant in would do, with very minor changes to the results that we derive.
[TABLE]
For any we know that is a better approximation to than . By the definition of , is not the empty set, but we often observe that as (see Section 4). For example, in Section 4 we prove that for and for . In such cases we say that the error is “highly localized”. This property of best approximation seems to be underappreciated and is related to observations from compressed sensing.
The highly localized nature of means that best polynomial approximation is ideal for recovering functions that have been arbitrarily corrupted on a set of small measure.
Definition 1.1**.**
For , we say that a function is a -corrupted function if can be written as
[TABLE]
*where is a continuous function, is a measurable function with , and denotes the Lebesgue measure of the support of on . Note that the support of , denoted by , is a closed subset of .
If is a polynomial of degree in Definition 1.1, then we say that is a corrupted polynomial. If, in addition, for some integer , then one finds that the best polynomial approximant of degree to is unique and (see Corollary 2.6). This means that best approximation exactly recovers a corrupted polynomial with arbitrary corruption, provided that the corruption has small enough support.
Figure 2 illustrates the four regimes that one typically observes with best approximants of degree of : (a) If , then , but is a near-best approximant to (see Section 3), (b) If is small and , then one gets exact recovery as (see Corollary 2.6), (c) If is a little larger, then tries to fit corruptions near but not the corruptions away from , and (d) When is large, tries to fit all the corruption, resulting in an overfit.
We go on to derive an efficient algorithm for the recovery of from by showing that the continuous optimization problem in Eq. 1 for can be reduced to a linear programming problem, provided that a sampling condition is satisfied (see Theorem 2.1). This observation results in a computationally efficient algorithm for the exact recovery of corrupted polynomials (see Section 2.3).
It is worth emphasizing that the Lebesgue measure of the support of the corruption must be extremely small. For example, our theory only guarantees that a corrupted polynomial of degree can be exactly recovered if it is corrupted on a set of measure . Nevertheless, in practice, we observe that exact recovery is usually still possible when the corruption occurs on sets that have a much larger measure. Moreover, the distribution of the corruption in does matter. In particular, larger regions of corruption are allowed away from and we present an initial result in this direction (see Theorem A.1). For example, when exact recovery is still guaranteed with any corruption interval of the form with .
The error localization properties of best approximants lead to an iterative algorithm for computing given a continuous function , based on a combination of linear programming and Newton’s method (see Section 5). This can be seen as an improvement on Watson’s algorithm [20, 34]. Our algorithm allows for the zero set of to have positive measure and heavily employs algorithmic advances over the last decade in polynomial rootfinding and adaptive Chebyshev interpolants [4, 25]. In particular, our implementation greatly benefits from the adaptive and robust algorithms for computing with functions in Chebfun.222Chebfun is an object-oriented software system written in MATLAB that provides an environment to compute with piecewise smooth functions [25]. It represents univariate functions defined on a finite interval by piecewise Chebyshev interpolants of adaptively selected degrees that are accurate to essentially machine precision [16]. It is able to accurately compute best approximants of degrees in the thousands (see Section 5).
In addition to the -norm (see Eq. 1), we also define the following for continuous functions :
[TABLE]
where are weights so that as . Despite the notation, is not a norm. For completeness, we also define as the Lebesgue measure of the support of . We always take in the discrete norms and to be the roots of the degree Chebyshev polynomial of the second kind [24, Tab. 18.3.1]. That is,
[TABLE]
Accordingly, we take in Eq. 3 so that the corresponding quadrature rule is related to the Gauss–Chebyshev rule. The Chebyshev polynomials of the second kind and their roots in Eq. 4 play a special role in best approximation [27, Ch. 14]. In particular, when , the polynomial interpolant of at the points in Eq. 4, i.e.,
[TABLE]
is the best polynomial approximation of degree to if has exactly distinct zeros in [8, 26].
For an integer , we denote by , , , and any best , , , and polynomial of degree to , respectively. These polynomials are solutions to the following optimization problems:
[TABLE]
We also define , when best polynomial in this sense exists.
The paper is structured as follows. In Section 2, we show that the exact recovery of an arbitrarily corrupted polynomial is possible provided that the support of the corruption has small enough measure. This leads to an efficient algorithm to achieve recovery. In Section 3, we extend these ideas to the near-recovery of corrupted smooth functions. In Section 4, we show that is small precisely when faster than as and carefully consider two worked examples with error localization. Finally, in Section 5, we present our iterative algorithm for computing best polynomial approximants of continuous functions.
2 Exact recovery of corrupted polynomials
In this section we suppose that is formed by an arbitrarily corrupted polynomial, i.e., , where is a polynomial of degree and is a function with small support. We investigate the question: When is it possible to exactly recover from knowledge of ?
We show that for corrupted polynomials, we have provided that the support of is sufficiently small, , and enough of the samples in Eq. 4 lie outside of the support of .
Theorem 2.1**.**
Let be a -corrupted polynomial of degree . Then, the following statements hold when :
If , then . 2. 2.
If , or fewer, of the samples are in , then . 3. 3.
If of are in , N+1>{\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}6}(n+1)k-1, and , then . 4. 4.
If , then .
We prove the four statements in the theorem, in turn, in the next four subsections.
2.1 Exact recovery with best approximation
Intuitively, recovery of a corrupted function is ideal for best polynomial approximation as the Lebesgue measure of is minimized. The polynomial approximant is so good at recovery that when we have provided that is less than half the interval and .
To see this, note that . Suppose there is a polynomial of degree such that . Then, so and must coincide on a set of positive measure in . Since and are polynomials and , we have that . We conclude that provided that and . This proves the first statement of Theorem 2.1.
2.2 Exact recovery with best approximation
It can be algorithmically challenging to compute and it is reasonable to attempt recovery from instead, which involves a discrete optimization problem. The polynomial approximant is also ideal at recovering polynomials under the mild assumption that enough of the samples (see Eq. 4) lie outside of .
To see this, suppose that and there is a polynomial of degree such that
[TABLE]
where and is the number of samples in . If , then is zero on at least distinct points and hence [27, p. 34]. By definition of , we must have . This proves the second statement of Theorem 2.1.
2.3 Exact recovery with best approximation
The polynomial can be computationally prohibitive to compute if is large. Fortunately, by using the restricted isometry property (RIP) from compressed sensing, one finds that when an oversampling condition is satisfied, along with some regularity assumptions. This means that , which can be computed efficiently, can often be used for exact recovery [3].
First, we know that for is equivalent to a vector having precisely nonzero entries, where
[TABLE]
and is the Chebyshev polynomial of the second kind of degree [24, Tab. 18.3.1]. The problem of minimizing over in Eq. 8 is solved by and can be written as
[TABLE]
which is equivalent to the following diagonally-scaled problem:
[TABLE]
By a technique described in [13, p. 4204], if is a matrix whose columns form a basis for the left null space of so that , then Eq. 9 is also a constrained minimization problem:
[TABLE]
where . This problem is precisely the task of interest in the compressed sensing literature with a short-fat matrix and an unknown sparse vector .
The minimization problem (11) is known to be NP-hard [18, Sec. 2.3]. A practical remedy is to replace the norm with the norm. To understand when this gives the solution to the problem, an important concept in compressed sensing is the RIP. We say that a matrix satisfies the RIP if there exists a constant such that
[TABLE]
for every vector that has at most nonzero entries [13]. It is known that if satisfies the RIP with , then the solution to Eq. 11 is exactly recovered (under the assumption that the -minimizer nonzero entries) by solving the minimization problem [9]
[TABLE]
Here, Eq. 13 can be efficiently solved as a basis pursuit problem via the spectral projected-gradient (SPGL1) algorithm [32]; especially, since there is a fast matrix-vector product for based on the discrete sine transformation (see Eq. 16).
Note that unlike in Eq. 3 for functions, the norm for vectors is simply the sum of the absolute values of the vector entries. The problem in Eq. 13 is equivalent to
[TABLE]
which in turn can be written as (recalling Eq. 3) the best approximation problem:
[TABLE]
We conclude that if the matrix satisfies the RIP with then we have , where
[TABLE]
and the vector is the solution to Eq. 14.
We are left with the task of studying when the matrix in Eq. 11 satisfies the RIP with . For the samples that are given in Eq. 4, we have the discrete orthogonality condition for [23, Sec. 4.6.1] so that we can write down an explicit basis for the left null space of in Eq. 8. That is,
[TABLE]
It turns out that due to the choice of the diagonal matrix in Eq. 10, the matrix in Eq. 16 is formed from a subset of columns of an orthogonal matrix. Furthermore, the size of need not be extremely short-fat, as often required in compressed sensing. It is therefore possible to show that satisfies the RIP under a mild oversampling condition.
Proposition 2.2**.**
*If for some integer , then in Eq. 16 satisfies the RIP with . *
Proof 2.3**.**
Let be the Chebyshev–Vandermonde matrix, i.e., for , where is given in Eq. 4. Let be a diagonal matrix with for . By the discrete orthogonality properties of Chebyshev polynomials of the second kind [23, Sec. 4.6.1], is an orthogonal matrix with
[TABLE]
where and are given in Eq. 8 and Eq. 16, respectively. Since has orthonormal columns, we find that
[TABLE]
Since for [24, (18.14.7)], each entry of has absolute value it follows by Cauchy–Schwarz that each entry of is bounded by where is the number of nonzero entries in , so we have
[TABLE]
Therefore, from Eq. 17 and the trivial bound of , we conclude that
[TABLE]
*for any vector with at most nonzero entries. The statement immediately follows from the definition of the RIP (see Eq. 12). *
Proposition 2.2 tells us that in Eq. 16 satisfies the RIP with if . Since is the number of samples that lie in , it means that provided that the discrete problem is sufficiently oversampled. Since implies that when and when we need , we conclude from Section 2.2 that if and , then when . This proves the third statement of Theorem 2.1.
The polynomial can be computed by solving the basis pursuit problem in Eq. 13. This means that Proposition 2.2 gives us a practical and efficient algorithm for the exact recovery of corrupted polynomials with degrees in the thousands. Often it is the case that one does not know the degree of the corrupted polynomial or . Since the oversampling condition penalizes taking unnecessarily large , we recommend slowly increasing , computing the error , and stopping at the smallest for which .
2.4 Exact recovery with best approximation
To begin to highlight the importance of error localization of best polynomial approximants, we now show that can also be used for exact recovery of corrupted polynomials when the corruption has sufficiently small support. One can achieve this by demonstrating that a polynomial of degree is not too concentrated in any small subset of .
Lemma 2.4**.**
Let be a set of Lebesgue measure . For any , we have
[TABLE]
*for any polynomial of degree . *
Proof 2.5**.**
*This statement is proved in [5, Sec. 4.2, Exercise 6]. *
Lemma 2.4 tells us that polynomials of degree cannot be too localized in a set of small measure. In particular, if , then
[TABLE]
with equality if and only if is the zero polynomial. A consequence of Eq. 20 is that a corrupted polynomial can be exactly recovered by best polynomial approximation.
Corollary 2.6**.**
*Let be a -corrupted polynomial of degree on . Then, the best polynomial approximant of degree to is if and . *
Proof 2.7**.**
Let and let be the support of . Since , we have by the triangle inequality
[TABLE]
*where the last inequality follows from Eq. 20 as well as the fact that for . An equality holds in Eq. 21 if and only if . We conclude that is the unique best polynomial approximant to of degree . *
This proves the fourth and final statement of Theorem 2.1 and explains regime (b) in Fig. 2. It tells us that if a polynomial is corrupted on a subset of that has small enough Lebesgue measure, then the best polynomial approximant exactly recovers the polynomial. Figure 3 illustrates Corollary 2.6 for the corrupted polynomial , where is the degree Chebyshev polynomial of the first kind and . Using the fact that , one can efficiently recover to within essentially machine precision. Numerically, we find that .
To highlight the importance of the -norm for Corollary 2.6, we consider the best polynomial approximants of degree to in the - and -norm (see Fig. 3 (right)). One finds that any corruption of arbitrarily small support prevents the best and polynomial approximants from recovering the uncorrupted polynomial.
The bound on of in Corollary 2.6 is probably not sharp. Though, we know that it cannot be increased above [5, Sec. 4.2]. This means that the algebraic scaling with respect to is definitive. In Appendix A, we extend Corollary 2.6 by demonstrating that the location of the support of the corruption in is important, and more is allowed provided that the corruption occurs away from .
For concreteness, we have assumed that the sample points are the Chebyshev points given in Eq. 4. This choice is recommended when the samples can be taken at arbitrary points in . However, in some cases, the sample points may be given a priori and cannot be chosen. Most of our results carry over to such cases with minor modifications and assumptions on the distribution of sample points.
3 Near-recovery of corrupted smooth functions
When recovering a corrupted polynomial , the degree of is usually unknown so we compute best polynomial approximants to of degree for a slowly increasing sequence of , stopping when . For the majority of this process and one may wonder what is achieving in this regime (see Fig. 2 (a)). Similarly, if is a corrupted smooth function , where is a continuous function (not necessarily a polynomial) on , then one cannot hope for exact recovery using best polynomial approximation. Instead, we find that delivers a near-recovery of in the sense that is a near-best approximation to , provided that the support of the corruption is small and can be well-approximated by a degree polynomial. We first show that the best approximations for and are relatively close to each other.
Theorem 3.1**.**
Let be a -corrupted function on , where is continuous, and be a best polynomial approximant of degree to . If , then
[TABLE]
*where is the best approximant of degree to on .
Proof 3.2**.**
Let and . Since , and by the triangle inequality, we have
[TABLE]
where the last inequality holds since and for . From Eq. 19, we find that
[TABLE]
Hence, for any we have the inequality
[TABLE]
Finally, by setting and noting that we conclude that
[TABLE]
*The result follows by rearranging this inequality. *
Theorem 3.1 shows that best polynomial approximation is useful for near-recovery of a corrupted smooth function. More precisely, when we have
[TABLE]
and we conclude that a best approximant of recovers as best it can, up to a factor that depends on and .
The inequality in Eq. 22 also partially explains regime (a) in Fig. 2. It provides theoretical justification that is a near-best polynomial approximant to in Fig. 2. For the example in Fig. 2, we observe this near-recovery phenomenon since
[TABLE]
where is the best approximant of degree to the corrupted function.
Unlike corrupted polynomials (see Section 2), cannot be exactly recovered by . Nonetheless, we find that is often still a near-best approximant to , i.e., . By interpreting as noise, we observe that minimization gives a stable signal recovery in the presence of noise, a phenomenon that is appreciated in the classical compressed sensing context [12]. Making this observation precise in our setting is left as an open problem. Since by Theorem 3.1 we also have , it follows that and is an excellent initial guess for Newton’s method for computing (see Section 5.3).
3.1 Related studies
The contents of Sections 2 and 3 can be regarded as contributions in compressed sensing, and a number of related studies are available in the literature. For (exact and near-exact) recovery of corrupted functions with minimization, examples include the paper by Adcock, Brugiapaglia and Webster [29], and Shin and Xiu [1]. Unlike this work, these papers consider recovering high-dimensional functions, describing probabilistic methods by taking random samples. Here we focus on univariate polynomials and reveal connections between and minimizers, and derive a deterministic recovery algorithm (under assumptions on the size of sampled corruption ) with minimization. Few of the results in this paper appear to be trivially generalizable to the higher-dimensional setting; this is left as an interesting open problem.
In the more classical setting of recovering a discrete signal (rather than a function) from a corrupted vector of observations, numerous contributions are available in the literature. See for example [10, 13, 21, 35] and the references therein. Ideas in compressed sensing have also been applied for general high-dimensional function approximation [2, 14].
4 Error localization of best polynomial approximants
In Sections 2 and 3 we saw that can be used for recovering corrupted polynomials and smooth functions. This is fundamentally due to the error localization properties of best polynomial approximation. The error localization properties of are also important when approximating continuous functions that one might not necessarily view as corrupted functions. We observe that continuous functions with singularities often have for most .
To make this precise, recall the definition of in Eq. 2. By definition of , we find that and thus,
[TABLE]
Therefore, the measure of is bounded above by the disparity between the magnitude of and . If asymptotically faster than as , then the error must be highly localized for sufficiently large . An upper bound on follows from an upper bound on and a lower bound on .
4.1 Error localization of best approximants to
Consider the function , which is continuous on with square root singularities at . Here, we show that proving that is a better pointwise estimate to than for all except for a set of measure .
By [17, Lem. 4], we know that when is an even integer we have for , where is the degree Chebyshev interpolant of (see Eq. 5). This allows us to derive an explicit expression for by using an explicit formula for [8]. By applying the formula in [8] to , we find that
[TABLE]
Here, the values of are derived as the expansion coefficients of in a Chebyshev series of the second kind. That is,
[TABLE]
Since for , we can bound by
[TABLE]
where the last inequality uses the crude bounds of and .
We now seek a lower bound on . Let be the Chebyshev expansion of the first kind for that is truncated after terms. The values of are simple to calculate: for all integers , and
[TABLE]
Assuming is an even integer, we find that
[TABLE]
Thus, for an even integer . By [22, Cor. 4.1], we know that
[TABLE]
We conclude from Eq. 23 that for we have
[TABLE]
where the final equality holds since it is known that [22, Eq. 20].
Figure 4 (left) shows the error for demonstrating that it is localized near . The measure of is shown in Fig. 4 (right) where it is numerically observed that . When , we find that for all except for a set of measure .
4.2 Error localization of best approximants to
As a second example of error localization, consider on , which is continuously differentiable except at . The error formula for with is calculated in [8] and simplifies to
[TABLE]
Moreover, it is known that for some [33]. We conclude from Eq. 23 that as . Figure 5 (left) shows the error for demonstrating that it is highly localized and Fig. 5 numerically confirms that .
5 A globally convergent algorithm for computing best polynomial approximants
We now turn to the algorithmic aspects of computing . We integrate our findings on exact recovery of corrupted polynomials and error localization into Watson’s algorithm based on Newton’s method [34]. An algorithm to compute best approximants with degrees in the thousands is developed based on recent advances in approximation theory such as stable polynomial interpolation, fast domain subdivision, and robust rootfinding implemented in Chebfun [16]. Figure 6 gives an overview of our algorithm.
5.1 Initial attempt: The Chebyshev interpolant
The polynomial interpolant in Eq. 5 with can be computed in operations [19] and the roots of on can be computed efficiently when is a smooth function [7]. Since when has exactly roots in [26], we recommend that Eq. 5 is always computed to see if . When it is, is efficient to compute and from practical experience it is relatively common for (for example, see [17, Lem. 4]). This can happen also when is a corrupted polynomial.
5.2 Test for corrupted polynomials and initial guess: Compute minimizer
When has zeros in , computing is more involved and, in general, requires an iterative procedure. In this case, we first solve the discrete problem in Eq. 15 to obtain . This has two purposes: (i) If is a corrupted polynomial (see Section 2), then , and (ii) If is not a corrupted polynomial, then [28, Thm. 3.9], which is then used as the initial guess for Newton’s method (see Section 5.3).
Specifically, we solve the LP in Eq. 25 with a large number of samples , taking and as in Eq. 4. In our implementation we select . (This is an engineering choice that assumes the corruption is small.) Recall from Theorem 2.1 that we want .) The maximum value 5000 is set to keep the LP size manageable.
Once is computed, we check whether is a corrupted polynomial. This can be done by testing if holds at most of the sample points to within working precision. If not, then we improve the estimate by refining the LP mesh, and then proceed to Newton’s method.
5.2.1 Refinement: Reducing the discretization error
Underlying the minimization problem Eq. 13 is an approximate integration of a non-differentiable function. Specifically,
[TABLE]
Since is expected to be continuous, but non-differentiable at points, one expects the integration error in Eq. 24 to be large and there is little benefit from using a high-order quadrature rules. Indeed using sample points, we find that the LP solution has accuracy , whether a high-order method (e.g. Clenshaw-Curtis) or a low-order method (such as the midpoint rule) is used. In more detail, the quadrature error in Eq. 24 is , so the objective function value is within of optimal: . However, this only implies , which is a common phenomenon in optimization: at a global (or local) minimum, an -perturbation in the solution results in perturbation in the objective value. This low accuracy of can cause convergence issues for Newton’s method, when it is used as an initial guess.
To improve the discretization error in Eq. 24, we follow a three-step procedure: (1) We use the initial LP solution with points to obtain an approximation to , which we denote by . (2) The roots of in are computed, which we expect to be approximations to the roots of . Finally, (3) we solve another LP to obtain , which is a better approximant to than , with a discretization scheme that forms a finer mesh near the roots: We take points on , where , taking equispaced points on each subinterval. We then take more points on , outside the subintervals, again uniformly, i.e., the grid is much coarser (see Fig. 7 (right)). We take the weights according to the midpoint rule. We thus take a mesh rather than near the roots, while still having a mesh elsewhere. This refinement of the quadrature rule is observed to improve the accuracy to , as the quadrature error at the roots have been improved from to . We then solve Eq. 13 by a standard technique of casting it as linear programming (LP) [13], namely
[TABLE]
Note that we do not use SPGL1 or the Chebyshev points from Eq. 8 in the refinement stage. This is because SPGL1 requires the computation of the null space , which can be more expensive. Due to the sparsity structure of LP, we find that the MOSEK optimization toolbox [3] (using its MATLAB interface) offers an efficient solver.
In Fig. 7 (left) we show the error with the LP solution for , with and without the refinement. Note that the number of decision variables in LP Eq. 25 is , with inequality constraints.
5.3 Iterative procedure: Newton’s method
To improve the initial guess obtained in Section 5.2 we employ Newton’s method based on the ideas in Watson’s algorithm [34, Sec. 4], which is a globally convergent (under mild assumptions) iterative method for computing when the set has zero Lebesgue measure. We assume this below; otherwise was a corrupted polynomial, which would be detected by Eq. 13 if the corruption is small.
When the set has zero Lebesgue measure, an alternative characterization of is [27, Thm. 14.1]
[TABLE]
for all . We propose to apply Newton’s method to Eq. 26. By using the Chebyshev polynomials of the second kind as a basis for , we define a vector-valued operator given by
[TABLE]
We note that if and only if from Eq. 26, and we propose to use Newton’s method on to find it.
Newton’s method tells us to perform the following iteration:
[TABLE]
Moreover, it can be shown that can be expressed as [34]
[TABLE]
where are the roots of and is the Chebyshev–Vandermonde matrix at , i.e., .
At the th Newton iteration, we must calculate the roots of , evaluate for and at , form using Eq. 29, and then solve an dense linear system where the righthand side is . All these operations can be performed conveniently and robustly in Chebfun to an accuracy of essentially machine precision [16]. The dominant computation in each Newton’s step lies either in the evaluation of in (27), which costs where is the Chebfun degree of , or the linear system , for a total of complexity. Typically Newton converges within a handful of iterations.
As Watson notes [34], a small modification for the formula for in Eq. 29 is required when for some , e.g., set , or when is rank-deficient, e.g., set for some small . Under mild restrictions, this modified Newton’s method generically converges to at a quadratic rate [34].
5.4 Stopping criterion: Near-best condition
It is important to have a stopping criterion to determine when Newton’s method in Eq. 28 should be terminated. The simplest criterion could be to stop computing iterates as soon as , where is a small parameter. However, we prefer to stop Newton’s method as soon as because it leads to a near-best guarantee.
Theorem 5.1**.**
Let be a continuous function and . If , then
[TABLE]
*where is the degree Chebyshev polynomial of the second kind. *
Proof 5.2**.**
Let and define so that . Then,
[TABLE]
Therefore, we find that . Expanding in a Chebyshev series, we find that
[TABLE]
Since for [24, (18.14.4) & (18.7.4)], we have
[TABLE]
where the last inequality comes from the fact that . It follows that
[TABLE]
*where the inequality holds since \sum_{i=0}^{n}(i+1){\color[rgb]{0,0,0}\definecolor[named]{pgfstrokecolor}{rgb}{0,0,0}=(n+1)(n+2)/2\leq(n+2)^{2}/2}. By using Eq. 32 to bound the righthand side of Eq. 31, the result follows by rearranging. *
Theorem 5.1 shows that one can track the quantity for to estimate how close the current Newton iterate is to computing . In practice, we terminate Newton’s method as soon as . It can happen that the initial guess in Section 5.2 already satisfies the stopping criteria in which case no Newton iterations are computed.
Acknowledgments
We thank Laurent Demanet for discussing the implications of the Remez inequality with us. We also thank Vanni Noferini who was present during the initial discussions of this work. We thank Nick Trefethen and Heather Wilber for reading a draft of this manuscript and improving the text.
Appendix A Corruption away from the endpoints
Lemma 2.4 shows that polynomials of degree cannot be too concentrated in a set of measure , which is a consequence of the fact that for any . An alternative bound on the derivative of a polynomial is [6, Ch. 5]
[TABLE]
for any . This inequality is better when is away from , and suggests that polynomials of degree are less concentrated in the middle of compared to near . This turns out to be the case.
Theorem A.1**.**
Let with Lebesgue measure and suppose that is such that . For , we have
[TABLE]
*for any polynomial .
Proof A.2**.**
Let and let denote its absolute maximum in . By Bernstein’s inequality [6, Ch. 5] we have that for and for . Let be such that . Using these two inequalities, we observe that there is an interval containing of width at least for which is of the same sign as . The area of the triangle of width and height is . Next use the same argument for such that , to obtain a triangle with area . Note that since , the two triangles can be chosen to not overlap. We can thus write .
Since , we find that
[TABLE]
*The function on is minimized at . The bound in Eq. 33 holds since for any . *
Arguing as in Corollary 2.6, Theorem A.1 means that a corrupted polynomial of degree can be exactly recovered by when and . For sufficiently large , this is a relaxation of the requirements for exact recovery in Section 2 when the corruption is away from (see Fig. 2 (c) and the localized error near in Fig. 5). Other results in Section 3 can be relaxed by using Theorem A.1 under the restriction that the corruption occurs away from . In particular, one can show that if with , then
[TABLE]
where is the best polynomial approximation of on .
Theorem A.1 also encourages us to wildly speculate (recalling the derivation of Corollary 2.6) that the error localization of is usually more concentrated for functions with endpoint singularities, i.e., , and less concentrated for functions with singularities away from , i.e., or even .
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] B. Adcock, A. Bao, and S. Brugiapaglia. Correcting for unknown errors in sparse high-dimensional function approximation. Numerische Mathematik , 142(3):667–711, 2019.
- 2[2] B. Adcock, S. Brugiapaglia, and C. G. Webster. Compressed sensing approaches for polynomial approximation of high-dimensional functions. In Compressed Sensing and its Applications , pages 93–124. Springer, 2017.
- 3[3] MOSEK. Ap S. The MOSEK optimization toolbox for MATLAB manual. Version 8.1. , 2017.
- 4[4] Z. Battles and L. N. Trefethen. An extension of MATLAB to continuous functions and operators. SIAM J. Sci. Comp. , 25(5):1743–1770, 2004.
- 5[5] Y. Benyamini, A. Kroó, and A. Pinkus. l 1 superscript 𝑙 1 l^{1} -approximation and finding solutions with small support. Constructive Approximation , 36(3):399–431, 2012.
- 6[6] P. Borwein and T. Erdélyi. Polynomials and Polynomial Inequalities , volume 161. Springer Science & Business Media, 2012.
- 7[7] J. P. Boyd. Computing zeros on a real interval through Chebyshev expansion and polynomial rootfinding. SIAM J. Numer. Anal. , 40(5):1666–1682, 2002.
- 8[8] H. Brass. A remark on best L 1 superscript 𝐿 1 L^{1} -approximation by polynomials. J. Approx. Theory , 52:359–361, 1988.
