Compressed sensing with sparse corruptions: Fault-tolerant sparse collocation approximations
Ben Adcock, Anyi Bao, John D. Jakeman, Akil Narayan

TL;DR
This paper develops a robust compressive sensing method to recover sparse coefficients in Polynomial Chaos Expansions despite measurement corruptions, addressing hardware/software failures in complex computational frameworks.
Contribution
It introduces a novel theoretical analysis and an iterative reweighted algorithm for fault-tolerant sparse recovery in polynomial chaos expansions.
Findings
The method effectively recovers sparse coefficients with corrupted measurements.
The iterative algorithm improves regularization parameter selection.
Numerical tests demonstrate robustness in high-dimensional differential equation solutions.
Abstract
The recovery of approximately sparse or compressible coefficients in a Polynomial Chaos Expansion is a common goal in modern parametric uncertainty quantification (UQ). However, relatively little effort in UQ has been directed toward theoretical and computational strategies for addressing the sparse corruptions problem, where a small number of measurements are highly corrupted. Such a situation has become pertinent today since modern computational frameworks are sufficiently complex with many interdependent components that may introduce hardware and software failures, some of which can be difficult to detect and result in a highly polluted simulation result. In this paper we present a novel compressive sampling-based theoretical analysis for a regularized minimization algorithm that aims to recover sparse expansion coefficients in the presence of measurement corruptions. Our…
| number of measurements | |
| length of sparse vector | |
| sparse vector in | |
| corruptions vector in | |
| measurement matrix | |
| noise vector in | |
| noise bound | |
| non-negative weighting parameter for the corruptions vector | |
| , | solutions of the optimization problem |
| subset of , indices corresponding to | |
| subset of , indices corresponding to | |
| sparsity of | |
| sparsity of | |
| set of -sparse vectors in | |
| set of -sparse vectors in | |
| best -term approximation error, measured in the norm | |
| best -term approximation error, measured in the norm |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Compressed sensing with sparse corruptions: Fault-tolerant sparse collocation approximations
Ben Adcock111B. Adcock and A. Bao acknowledge the support of the Alfred P. Sloan Foundation and the Natural Sciences and Engineering Research Council of Canada through grant 611675.
Department of Mathematics
Simon Fraser University
Burnaby, BC, Canada
Anyi Bao11footnotemark: 1
Department of Mathematics
Simon Fraser University
Burnaby, BC, Canada
John D. Jakeman222J.D.Jakeman’s work was supported by DARPA EQUiPS.
Computer Science Research Institute
Sandia National Laboratories
Albuquerque, NM, USA
Akil Narayan333A. Narayan is partially supported by NSF DMS-1720416, AFOSR FA9550-15-1-0467, and DARPA EQUiPS N660011524053
Department of Mathematics and
Scientific Computing and Imaging (SCI) Institute
University of Utah
Salt Lake City, UT, USA
Abstract
The recovery of approximately sparse or compressible coefficients in a polynomial chaos expansion is a common goal in many modern parametric uncertainty quantification (UQ) problems. However, relatively little effort in UQ has been directed toward theoretical and computational strategies for addressing the sparse corruptions problem, where a small number of measurements are highly corrupted. Such a situation has become pertinent today since modern computational frameworks are sufficiently complex with many interdependent components that may introduce hardware and software failures, some of which can be difficult to detect and result in a highly polluted simulation result.
In this paper we present a novel compressive sampling-based theoretical analysis for a regularized minimization algorithm that aims to recover sparse expansion coefficients in the presence of measurement corruptions. Our recovery results are uniform (the theoretical guarantees hold for all compressible signals and compressible corruptions vectors), and prescribe algorithmic regularization parameters in terms of a user-defined a priori estimate on the ratio of measurements that are believed to be corrupted. We also propose an iteratively reweighted optimization algorithm that automatically refines the value of the regularization parameter, and empirically produces superior results. Our numerical results test our framework on several medium-to-high dimensional examples of solutions to parameterized differential equations, and demonstrate the effectiveness of our approach.
1 Introduction
The approximation of function values using point evaluations or samples is necessary in a wide number of applications. Much attention has been focused recently on the approximation technique of compressive sampling (CS): The ability to recover sparse linear representations of a function from a given dictionary. This is a particularly important problem in parametric uncertainty quantification (UQ) where the number of parameters translates into the number of variables on which an unknown function depends (the “dimension” of the problem). It is common for dimension to be very large, and the number of degrees of freedom in classical approximation strategies generally grows exponentially with the dimension. This makes classical computational procedures for approximating functions infeasible for large dimensions.
In contrast, compressive sampling seeks a sparse representation of a function using only a small number of samples or measurements, regardless of the parametric dimension. In a non-intrusive UQ pipeline, each function sample corresponds to a potentially large-scale simulation, and so minimizing the requisite number of samples is desirable. When functions are sparse or compressible in a given basis or dictionary, this reconstruction procedure has the potential to mitigate the exponentially debilitating curse of dimensionality. Algorithms in UQ that utilize compressive sampling have enjoyed great success in recent years [43, 42, 44, 27, 18, 10, 17, 15, 22]. For related theoretical contributions, see [1, 2, 8, 16, 29, 28, 41].
Missing from the sparse recovery UQ contributions above is a concrete strategy for fault-tolerant or resilient algorithms. Ensuring modeling resilience for UQ in the presence of system failures is essential for credible prediction on new and emerging massively parallel systems. Fault-tolerant algorithms in general have become necessary in computational science since node failures on distributed architectures can yield corrupted data (the frequency of which increases as the number of processors increases), or algorithmic run-time software failures can result in polluted simulation results. These failures can generate polluted measurements in unpredictable and sometimes undetectable ways [6].
Faults can occur due to complex combination of internal and external conditions that are difficult to reproduce. For example, bits may suffer random corruption, or physical defects in hardware may cause data faults. Corruption errors during model simulation can be grouped into two main types, soft and hard. In this paper, we consider hard faults as errors that cause the simulation to terminate prematurely and/or return obvious, automatically detectable error values such as NaN or Inf. Hard faults by this definition are easy to identify and mark for discard, thus obviating or ameliorating the need for fault-tolerant algorithms.
In contrast, soft failures are essentially random systematic corruption of results that are not easily identifiable. These soft failures pose challenges in UQ: A soft failure will not cause obvious failure in fault-intolerant UQ methods; however, incorrect model values caused by soft failures can significantly degrade an approximation. It is in this case that we require the development of robust and resilient algorithms that can, ideally, deliver constant levels of performance when faced with a few highly corrupted data points.
To address this issue, fault-tolerant algorithms for UQ have been investigated in the context of multilevel Monte Carlo algorithms [24, 25, 26], and in overdetermined least-squares polynomial recovery problems [31]. To the best of our knowledge, there is no comprehensive research in the UQ literature on fault-tolerant sparse recovery algorithms, and in the compressive sampling literature only a handful of papers [19, 21, 23, 32, 33, 34, 35, 39] deal with the problem of corrupted measurements.
The operative distinction in the problem we consider in this paper is a hardware or software fault resulting in occasional large-magnitude errors; we call this the problem of corruptions. Existing CS algorithms are known to be stable with respect to small noise perturbations, but cannot handle sparse corruptions, i.e., situations when a small number of samples are highly corrupted with the corruption magnitudes much larger than typical noise. In this paper we present novel theory and application studies of a sparse corruptions algorithm for CS. The algorithm we use was considered in [21], but we present more general theoretical guarantees on recovery, including practical guidance for the choice of algorithmic regularization parameters.
For fault-tolerance in the context of the sparse recovery problem, the recovery properties of an ideal resilient algorithm would be agnostic to large-magnitude corruptions in a small number of function samples. As described above, these corruptions can arise due to unknown failure modes in computational models or because of large but intermittent measurement errors. Development of mathematical theory for the corrupted compressive sampling problem, and investigation of a corresponding resilient algorithm for sparse recovery of expansion coefficients are the central goals of this paper. The target applications we investigate are exemplars of a common task in UQ: recovery of approximately sparse expansion coefficients in an orthogonal polynomial (polynomial chaos) basis.
The theory and algorithms developed in this paper have the following features:
- •
The compressive sampling recovery theorems are uniform with respect to the function and the corruptions. That is, the recovery guarantees hold over all compressible functions having sparsely corrupted measurements for a single random sampling of measurements.
- •
The algorithm involves a tunable regularization parameter , and a theoretically optimal choice of this parameter is explicitly determined by our analytical results. This theoretically optimal value is defined only by the ratio of measurement corruptions to signal sparsity. Since signal sparsity is frequently comparable to the number of measurements, this optimal loosely translates into the fraction of measurement samples that are corrupted. From a user’s point of view, our analysis thus suggests a value of having knowledge only of the ratio of measurements believed to be corrupted.
- •
In experiments, we observe that optimal values of the regularization parameter are non-trivially dependent on the number of measurements, the signal sparsity, and the number of corruptions. We thus propose an iteratively reweighted algorithm for recovery that learns values of the regularization parameter. Our experiments suggest that these learned algorithmic parameters perform better than the value defined by our theoretical results, and thus this reweighted algorithm is more useful in practice.
- •
The location and magnitude of the corruptions amongst the collection of function samples can be unknown, but the algorithm recovers those locations and the corresponding corruption values.
- •
The algorithm is robust to small, but non-sparse measurement errors – e.g. due to noise, truncation of an infinite polynomial expansion or numerical error in computing function samples – and moreover is noise-blind. That is to say, it requires no a priori upper bound on such errors.
- •
The optimization problem we solve to compute solutions is from [21], but our work is both a theoretical and practical advancement over the results in that reference. In order to show the solution computed is indeed the original sparse solution, [21] uses conditions on the restricted isometry constant (RIC) of the measurement matrix. Our results are a significant relaxation of previously reported conditions on the RIC (compare conditions on in Lemma 2.3 of [21] versus our Theorem 3.7, equation (3.8), and the discussion in Section 3.4). The results for general sensing matrices in [21] are nonuniform with respect to the signal and corruptions support, and require certain models for the signal and corruptions; our results are uniform and require no model for the signal or corruptions, other than compressibility. Finally, our paper is also devoted to numerical investigation of the performance of the method, including practical guidance for choosing the regularization parameter ; such thorough investigations are absent in [21].
We first introduce notation and summarize the main mathematical statements of this paper in Section 2. This is followed in Section 3 by our theoretical analysis. Section 4 presents numerical results to complement our theoretical analysis and verify the practical efficacy of the algorithm.
2 Model problem and main results
Let denote an unknown function, and let be a given dictionary of functions, . For example, the functions are frequently multivariate polynomial chaos basis elements; our capstone numerical examples will show results from such a basis. In scenarios of interest, the size of the dictionary is very large.
The ultimate goal is to recover coefficients that determine the approximation
[TABLE]
using samples of , where is an assumed small discrepancy term between the exact function and its -term linear approximation in 444Our notation suggests that depends explicitly and deterministically on ; however, our theory encompasses the case when is a stochastic variable or process, e.g., independent Gaussian random variable additive perturbations of the measurements.. For the purposes of exposition we assume for some known uniform noise bound ; we will show later that lack of a priori knowledge for this bound only affects theoretical results in benign ways. As described above, we assume the vector to be compressible. Sparsity or compressibility of a vector can be quantified via its best -term approximation error,
[TABLE]
where is the standard norm on vectors; for , is the sparsity of , i.e., the number of non-zero elements in the vector.
With a collection of samples of , we have the corresponding corrupted function measurements,
[TABLE]
where the corruption vector is assumed to be -sparse but can have large entries. To enforce an underdetermined system, we assume . Defining the rectangular matrix with entries , then the unknown vectors and satisfy the underdetermined linear system
[TABLE]
In order to compute the solution having knowledge of only and , we consider the following model problem (see also [21] and references therein):
[TABLE]
Let be a minimizer of this problem, where and . Our objective is to obtain conditions on (in particular, on the number of measurements ) and such that the error
[TABLE]
can be bounded by the best approximation numbers and , and the noise magnitude .
2.1 Main results
In all that follows, the statement means for some universal constant . Our first main result shows that stable and robust recovery of and is implied by a certain modification of the classical Restricted Isometry Property (RIP) which incorporates the sparse corruptions term (Definition 3.5). Specifically, Theorem 3.7 establishes that if the matrix satisfies the RIP for the corruptions problem of order (see Definition 3.5) with constant satisfying
[TABLE]
then the following error bounds hold:
[TABLE]
Our second main result (Theorem 3.15) provides explicit conditions on , and for (2.4) to hold for matrices of so-called bounded orthonormal system [12, Chpt. 12]. Specifically, suppose that is an -orthonormal system, where is a probability measure and its support. Define
[TABLE]
and let where are drawn i.i.d. according to . If
[TABLE]
then with probability at least , the restricted isometry constant of the scaled matrix satisfies .
One can see from these estimates that optimizing over values of yields a minimum value of when . Assuming , this provides a concrete determination of the parameter for use in (2.3) having knowledge only of the ratio of corrupted measurements. We note in passing that we do not believe that the second condition in (2.6) is sharp in the dependence on the product . Improvement of this to a condition of the form
[TABLE]
is left as a topic for future work. Note that such a condition is known for Gaussian random matrices. Moreover, a nonuniform recovery result with the scaling (2.7) for exactly sparse coefficients and corruptions having random sign patterns was given in [21]. See Section 3.3 for further discussion.
It is common in compressed sensing to assume some a priori known noise bound based on the user’s knowledge of measurement noise or truncation error. Although there are some results that circumvent this assumption [1, 2], they typically yield somewhat weaker recovery guarantees. However, in the context of the sparse corruptions theory presented above, such prior knowledge of is not necessary for stable recovery: The error introduced by an unknown noise can be passed into theoretical estimates as a penalty of size . To see this, note that if we define , then the system can be written as . Solving (2.3) by setting results in the version of the estimate (2.5b) with replacing . However, the normalized best -term approximation error to appearing in (2.5b) is stable with respect to noise perturbations:
[TABLE]
Here is any bound for the perturbation in the uniform norm. Using (2.7), we see that , which is on the same order as the estimate (2.5b) that uses a priori knowledge of . A similar argument holds for the bound (2.5a).
While our theoretical results are thus insensitive to ignorance about small noise levels, we caution that it is always a good idea to use such information in practical recovery algorithms if available, e.g. as the result of cross validation. See, for example, [10, 43, 17].
2.2 Remarks on numerical results
We postpone presenting numerical results until the end of this paper in Section 4. However, some remarks on our findings are pertinent here in the context of the previous section’s theory. First, the optimal value of that is suggested by (2.4) does not appear to be the computationally optimal value of . That this fixed value of is not the best is not surprising since the bounds (2.5) are derived using some loose inequalities. However, such bounds can be useful in understanding qualitative trends. Results from our experimentation do suggest that large values of more reliably recover corruptions when is large (see Figures 1 and 2). This general trend in numerical results is consistent with the behavior of in (2.4) as a function of when is large.
We address this discrepancy between the theory and empirical results by propose an iteratively reweighted optimization scheme (see [7]) that learns and updates the value of . Our results show that this proposed algorithm performs much better in practice than algorithms that fix . However, we do not present any theory to support the observed superiority of reweighted optimization schemes for the corruptions problem.
Many of our capstone numerical examples are from applications using polynomial chaos expansions, where the compressible function has an expansion in a multivariate orthogonal polynomial basis. To simplify the presentation of our results, we focus on such examples where the basis is a tensor-product Legendre polynomial or Chebyshev polynomial system. Much recent work has shown that randomly generating measurements using samples from standard distributions (e.g., the uniform distribution) can accurately and near-optimally recover orthogonal polynomial expansions from such basis sets [28, 17, 41]. Recovery in more general polynomial spaces has been investigated [16, 18, 15], but these methods usually rely on sophisticated sampling strategies and optimal sampling schemes are still an active area of research.
3 Theory for the sparse corruptions problem
We recall and summarize our notation for the sparse corruptions problem in Table 1. Our previous discussion was framed for real-valued signals and measurements , but we now generalize to the complex-valued setting. This adds generality with no additional mathematical difficulty.
We follow a familiar path for deriving conditions on such that optimization problems recover sparse solutions (see, for example, [12]). Section 3.1 defines an appropriate robust Null Space Property (NSP) for the matrix in the sparse corruptions setting. Under this property, we show that the recovery estimates (2.5) hold. In order to construct matrices that satisfy the robust NSP, Section 3.2 generalizes the concept of the Restricted Isometry Property (RIP) for matrices to the sparse corruptions setting. That section shows that matrices satisfying the RIP for the sparse corruptions problem also satisfy the robust NSP. Sections 3.3 and 3.3.2 show that if the dictionary elements form a bounded orthonormal system, then under the condition (2.6), the matrix satisfies the RIP with high probability. Finally, using these various results, we discuss a theoretically-optimal choice for in Section 3.4.
3.1 The Robust Null Space Property for the sparse corruptions problem
The following two definitions are generalizations of robust null space properties (cf. [12, Definition 4.17] and [12, Definition 4.21], respectively), and prescribe classes of matrices whose kernels do not contain sparse vectors.
Definition 3.1**.**
Let , and . A matrix satisfies the -robust null space property of order with weight if there exist constants and such that
[TABLE]
for all sets and with and . Above, is the complement of in , and similarly for .
Definition 3.2**.**
Let , and . A matrix satisfies the -robust null space property of order with weight if there exist constants and such that
[TABLE]
for all sets and with and .
These definitions yield the following two results:
Lemma 3.3**.**
If satisfies the -robust null space property of order with weight and constants , then it satisfies the -robust null space property of order with weight and constants , .
Proof.
Observe that
[TABLE]
We now use the definition of the -robust null space property. ∎
Theorem 3.4**.**
Let , and and suppose that satisfies the -robust null space property of order with weight . Let , , and be such that , and suppose that is a minimizer of
[TABLE]
*Then *
[TABLE]
and
[TABLE]
where the constants depend on and only and is given by
[TABLE]
Proof.
We first prove (3.2). Lemma 3.3 implies that satisfies the -robust null space property. Let , and , be such that and . Then, if and we have
[TABLE]
Rearranging now gives
[TABLE]
where in the second inequality we note that since is feasible and is a minimizer. The -robust null space property now implies that
[TABLE]
and since , and
[TABLE]
we deduce that
[TABLE]
Finally, to complete the proof of (3.2) we argue as follows:
[TABLE]
Here, we use the -robust null space property in the second step, and (3.5) and (3.6) in the third step.
We now consider (3.3). Writing and as before, let be the index of the largest elements of in absolute value and be the index set of the largest elements of in absolute value. Define
[TABLE]
Then
[TABLE]
Now observe that and , and therefore
[TABLE]
where in the second step we use the -robust null space property and (3.5). Combining this with the previous estimate and using the definition of gives
[TABLE]
where we have defined the non-negative scalar as
[TABLE]
Completing the square with respect to under the brackets yields
[TABLE]
Using the -robust NSP on the pair along with the above estimate, we have
[TABLE]
We note that
[TABLE]
Combining the above with (3.7) proves (3.3). ∎
3.2 The Restricted Isometry Property for the sparse corruptions problem
The robust NSP is typically difficult to prove directly. Hence we now introduce the Restricted Isometry Property (RIP) for the sparse corruptions problem, and show that it implies the robust NSP. Note that this has been defined previously in [21, Defn. 2.1].
Definition 3.5**.**
Let , and . The Restricted Isometry Constant (RIC) of the matrix is the smallest constant such that
[TABLE]
for all and . If then we say that has the Restricted Isometry Property (RIP) of order .
Our first result is the following:
Lemma 3.6**.**
Let , , and . If satisfies the RIP of order with constant
[TABLE]
where is as in (3.4), then satisfies the -robust NSP of order with weight and constants and depending only on .
The proof of this result is given next. Combining this lemma with Theorem 3.4 now yields our main result:
Theorem 3.7**.**
Let , and and suppose that satisfies the RIP of order with constant satisfying (3.8) and as in (3.4). Let , , and be such that , and suppose that is a minimizer of
[TABLE]
Then
[TABLE]
*where the constants depend on only. *
We now prove Lemma 3.6. We first require the following:
Lemma 3.8**.**
Let , , and let satisfy the RIP of order with constant . Suppose that and are such that
[TABLE]
for some with . If and are orthogonal to and , respectively, then
[TABLE]
Proof.
Assume that without loss of generality. Let and and notice that and . Therefore
[TABLE]
Note that in the second step we use orthogonality of the vectors and and and . Similarly,
[TABLE]
Subtracting the second equation from the first gives
[TABLE]
On the other hand
[TABLE]
Combining this with (3.9) gives
[TABLE]
Now let be such that and . Then, after rearranging, we get
[TABLE]
We now seek values and which minimize the right-hand side of this expression. If then the minimal value [math] is attained by setting and letting . Conversely, if the minimal value is attained when and . This gives
[TABLE]
which completes the proof. ∎
Proof of Lemma 3.6.
Let and . To prove the -robust NSP for it is enough to show that (3.1) holds when is the index set of the largest coefficients of in absolute value and is the set of the largest values of in absolute value. Given , let be the index set of the next largest coefficients of in absolute value, be the index set of the next largest coefficients and so on. Define in a similar way. We now have the following:
[TABLE]
Let be such that
[TABLE]
and note that this gives
[TABLE]
For the second term of (3.10), we use the disjointness of and and and for in combination with Lemma 3.8 to get
[TABLE]
Let and be the largest entries of in absolute value. Then, by [12, Lem. 6.14], we have
[TABLE]
Similarly,
[TABLE]
which gives
[TABLE]
Therefore, combining this with (3.10), (3.11), (3.12) and (3.13) yields
[TABLE]
Consider the function , where . This function attains its maximum value at and takes value there. Additionally . Hence we get
[TABLE]
After noting that and rearranging, we obtain
[TABLE]
where
[TABLE]
To complete the proof we note that provided . This holds by assumption, since and therefore the condition (3.8) implies that . Also, after rearranging we see that if
[TABLE]
which again holds by assumption. ∎
- Remark 3.9
The RIP for the sparse corruptions problem is a special case of the RIP in levels (RIPL), introduced in [5]. The RIPL applies to vectors that are sparse in levels; namely, having different amounts of sparsity in different (but fixed) sections of the vector. In the context of the sparse corruptions problem, this corresponds to the concatenated vector , which is -sparse in its first entries and -sparse in its remaining entries. As a general tool, sparsity in levels has been used in the context of compressive imaging [3, 4, 30], radar [11] and multi-sensor acquisition [9]. It is interesting that the same model also occurs naturally in the, seemingly unrelated, sparse corruptions problem. We note in passing that Theorems 3.4 and 3.7 follow a similar approach to that of [5] with some changes made to incorporate the weighted optimization problem.
3.3 Matrices that satisfy the RIP for sparse corruptions
We first recall the classical RIP for sparse vectors:
Definition 3.10**.**
Let and . The Restricted Isometry Constant (RIC) of the matrix is the smallest constant such that
[TABLE]
for all . If then we say that has the Restricted Isometry Property (RIP) of order .
To distinguish it from the RIP for the sparse corruptions problem (Definition 3.5), we shall refer to this as the RIP for sparse vectors.
Lemma 3.11**.**
Let , , and define
[TABLE]
where is the submatrix of with entries . Suppose that has the RIP for sparse vectors with constant and that . Then has the RIP of order for the sparse corruptions problem with constant
[TABLE]
In other words,
[TABLE]
for all and .
Proof.
Let and and write and . Then
[TABLE]
By Young’s inequality
[TABLE]
for any . Hence
[TABLE]
Solving the equation yields the value , and substituting this value of into the previous expression yields the proof. ∎
This result shows that any matrix satisfying the RIP for sparse vectors also satisfies the RIP for the sparse corruptions problem, provided the all submatrices have small spectral norm.
3.3.1 Gaussian random matrices
Gaussian random matrices in the context of the sparse corruptions problem were considered in [21]. The following result essentially recaps the main result for this case given therein. We include a short proof for completeness:
Theorem 3.12**.**
Let , , and suppose that
[TABLE]
Let be a matrix whose entries are independent Gaussian random variables with mean zero and variance . Then with probability at least , the matrix has the RIP for the sparse corruptions problem of order with constant .
Proof.
Lemma 3.11 asserts that has the RIP of order for the sparse corruptions problem with constant provided (i) has the RIP of order with and (ii) the constant defined in (3.15) satisfies . Hence, by the union bound it suffices to show that (3.16) and (3.17) imply both (i) and (ii) separately with probabilities at least . Due to a standard result in compressed sensing (see, for example, [12, Thm. 9.2]), property (i) holds with probability at least whenever the condition (3.16) is satisfied. We now consider property (ii). First, notice that is increasing in . Therefore, we may assume that , i.e. and . Fix subsets and with and . Then, due to a known result for singular values of random Gaussian matrices (see, for example, [38, Cor. 5.35]), we have
[TABLE]
The conditions (3.16) and (3.17) imply that and . Hence, by the union bound
[TABLE]
In particular, provided
[TABLE]
Since , we have . Hence this condition is implied by (3.16) and (3.17). This establishes property (ii) and completes the proof. ∎
This result asserts that Gaussian random matrices can recover a fixed fraction of corruptions (see (3.17)) and (up to constants) the same level of sparsity as in the uncorrupted case (see (3.16)).
3.3.2 Bounded orthonormal systems
Gaussian random matrices, while mathematically appealing, are of little relevance to multivariate approximation using Polynomial Chaos expansions. In this case, a more suitable framework is that of bounded orthonormal systems (see, for example, [12, Chpt. 12]):
Let be a domain with a probability measure and be an orthonormal system of complex-value functions in . Recall that this system is bounded if
[TABLE]
Given such a system, we construct the measurement matrix as
[TABLE]
where are drawn independently at random according to the probability measure .
Theorem 3.13**.**
Let be the matrix of a bounded orthonormal system, and . If
[TABLE]
then satisfies the RIP for sparse vectors with probability at least .
We remark in passing that the logarithmic dependence in can be improved by one power, at the expense of a larger factor in [8]. However, this may not be best for the purposes of this paper, since in view of Theorem 3.7, scales linearly in the parameter (see next).
The following lemma estimates the constant for matrices of the form (3.18):
Lemma 3.14**.**
Let be the matrix of a bounded orthonormal system, and be as in (3.15). Then
[TABLE]
Proof.
Fix subsets , and , and let and with and . Then
[TABLE]
Hence . This now gives the result. ∎
With this in hand, we now deduce the following result:
Theorem 3.15**.**
Let , , and suppose that
[TABLE]
and
[TABLE]
Then, with probability at least , has the RIP of order for the sparse corruptions problem with constant .
Proof.
Theorem 3.13 and (3.19) imply that has the RIP of order with with probability at least . Moreover, Lemma 3.14 and (3.15) imply that . We now apply Lemma 3.11. ∎
- Remark 3.16
This result asserts that the number of corruptions that can be tolerated is a fraction of . This is inferior to the case of Gaussian random measurements, where Theorem 3.12 gives that a fraction of corruptions are permitted. We conjecture, however, that a similar estimate can be proved for the bounded orthonormal systems case – indeed, a nonuniform recovery result of this form was proved in [21] for the case of exactly sparse coefficients and corruptions with random sign sequences – albeit with a substantially more sophisticated argument than the proof of Theorem 3.12. In particular, while estimates for the singular values of matrices of bounded orthonormal systems are known [38], they are more stringent than those for Gaussian random matrices. Using these estimates and arguing via the union bound (as in the proof of Theorem 3.12) unfortunately results in an estimate similar to (3.15). We also note in passing that while there exist RIP estimates for quite general matrices under the sparsity in levels model [20] (see Remark 3.2), these unfortunately do not apply to the setup of the sparse corruptions problem. We therefore leave the problem of improving (3.15) for future work.
3.4 Strategy for choosing
Regardless of the matrix , our main theorems (Theorems 3.7 and 3.13) suggest an optimal strategy for choosing the parameter . Notice that the restricted isometry constant enters into the measurement condition in Theorem 3.13 as . Since Theorem 3.7 requires that (3.8) holds, the measurement condition contains a factor that is at least as large as
[TABLE]
We wish to minimize this factor so as to reduce the measurement condition as much as possible. This can be done by minimizing , which in turn yields the theoretically-optimal scaling
[TABLE]
Notice that this gives the value . In particular, the condition (3.8) becomes
[TABLE]
with right-hand side independent of and . We remark in passing that the choice (3.20) is implicitly made in [21]. However, the condition given in [21, Lem. 2.3] is which is significantly more stringent than (3.20). Moreover, [21] only considers exact sparsity, whereas Theorem 3.7 also treats the case of stable recovery of inexactly sparse coefficients and corruptions.
4 Numerical experiments
We divide our numerical results into two main sections. The goal of Section 4.1 is to study the behavior of numerical algorithms in the context of the theoretical estimates presented earlier. In particular, we investigate the influence that the regularization parameter has on recovery properties. We confine these investigations to problems with manufactured sparsity so that systematic studies may be carried out. The lessons learned from these studies allow us to formulate and propose an iteratively reweighted alternative to the one-time optimization (2.3). Note that none of our theoretical error estimates apply to algorithms with weighted norms. However, weighted schemes can provide empirically superior results, e.g., [43]. Thus, we explore weighted algorithms because their use is natural from a practical point of view, but is not in the scope of our theoretical analysis. Our simulations in this section use the SPGL1 package [36, 37].
The second collection of results, Section 4.2, focuses on more practical scenarios in scientific computing, dealing with recovery of sparse or compressible polynomial Chaos expansions of solutions to parameterized differential equations. Here we use the algorithmic lessons learned from Section 4.1 to illustrate the efficacy and fault-tolerance of our approaches on realistic problems in the presence of measurement corruptions.
4.1 Recovery of manufactured solutions with sparse corruptions
This section is primarily concerned with the generation of phase recovery diagrams for the sparse corruptions problem. In particular, our tests here are not necessarily motivated by sensing matrices and corruptions from function approximation, but instead are designed to understand behavior of the algorithms. The following standard experiment for accomplishing this is carried out: We fix the number of measurements and the dictionary size , and we vary the signal sparsity and the number of measurement corruptions . For each and we generate an -sparse signal , and for a given model of a measurement matrix , we generate measurements from the signal , and subsequently corrupt (highly pollute) these measurements with a -sparse vector , whose non-zero entries are , where is a random draw from a certain probability distribution and is a scaling constant. In this test, is a standard normal random variable and .
We then run the recovery algorithm (2.3) for a given value of , producing a recovered signal and measurement corruption vector . We define the recovery as successful if . In this test, we set the success tolerance to be .
In the test above, the generation of , and of , and of , are statistically independent555Measurement corruptions are generated as iid standard normal random variables, and support indices in a sparse vector are generated using the uniform probability law (draws without replacement) on the index set.. For each and , the above procedure is run times with independent draws, and an empirical estimate of the probability of “success” is computed. In the phase transitions plots below, we use simulations.
The phase transitions color each pixel, corresponding to a particular value of and , according to the empirical success probability. The phase transition axes are and , and thus each ranges in the interval , but we truncate to in our plots because this region is sufficient to illustrate behavior. We consider the following two models of measurement matrix :
Model 1: a Gaussian random matrix
Model 2: a randomly-subsampled Discrete Fourier Transform (DFT) matrix
Note that Model 2 is an example of a bounded orthonormal system. We compare several different choices of for each model.
4.1.1 Phase transition plots for fixed
Figures 1 and 2 display the results for models 1 and 2 described above, respectively. Each figure shows an array of plots; the columns correspond to differing values of , increasing from left to right; the rows correspond to differing values of , increasing from top to bottom, except the last two rows, which show the “optimal” value of suggested by the theory, and the iterative reweighting procedure described in the next section.
Comparing the results for (row 5 in the plots) with the other plots with fixed, we see that does not behave optimally in practice, even though this is suggested by our theory. Indeed, further experimentation reveals that the behavior of these transition plots changes notably when is varied. However, the following observations are consistent across all our runs:
- •
When there are few corruptions relative to the signal sparsity (), larger values of tend to perform better. This general trend is consistent with the theory from previous sections: Our recovery results are stated in terms of a quantity defined in (3.4), and when , we require large to make small.
- •
When there are many corruptions relative to signal sparsity (), smaller values of tend to perform better. Again, this is consistent with the theory in terms of the parameter .
4.1.2 Iteratively reweighted minimization
The results from the previous section show that our a priori postulated optimal values of are not optimal in practice; this suggests that an adaptive learning of may produce better results. See, for example, [7]. This section introduces an iteratively reweighted optimization procedure that effects this learning of .
We compute minimizers and using an initial value of . We then update based on and , and then recompute minimizers with the new . Such an approach not only allows for a single parameter to be updated, it also permits individual (i.e. non-equal) weights to be used for term in the regularization functional. This aims to enhance recovery performance by both iteratively estimating an optimal weighting between the coefficients and corruptions term, and iteratively estimating the support sets of and .
We outline the procedure below:
- •
Step 1. Set , for , and for . Prescribe noise tolerance and a small positive number .
- •
Step 2. Compute the solution to
[TABLE]
where and .
- •
Step 3. Update and as follows:
[TABLE]
- •
Step 4. If , set and go back to step 2, otherwise stop.
Numerical results in the bottom row of plots in Figures 1 and 2 show this approach (implemented with iterations) significantly improves the recovery over a fixed choice of . We therefore use this iteratively reweighted approach for optimization for all our simulations in the next section.
4.1.3 Large corruption values
This section is devoted to understanding the behavior of our algorithm with respect to the magnitude of the corruptions.
We run the same experiment as outlined at the beginning of Section 4.1 on Model 2 (the measurement matrix is a subsampled DFT matrix) using the iteratively reweighted algorithm outlined in Section 4.1.2. For this test, we vary between and , and choose the random variable defining the corruptions as a standard Cauchy random variable.666The point of generating from a Cauchy distribution is to show that measurement corruption by heavy-tailed distributions does not adversely affect the algorithm’s results.
A straightforward application of the iteratively reweighted algorithm in Section 4.1.2 when is very large produces suboptimal results. The reason for this is the scale differential between and , so that the algorithm heavily favors recovery of the corruptions and devotes little effort to recovering the signal. To overcome this limitation, we leverage a significant advantage of our algorithm: Corruption indices and values in the measurement vector are identified. This allows us to formulate a slight modification of the algorithm in Section 4.1.2:
Run the algorithm from Section 4.1.2, generating computed solutions and . 2. 2.
If , then return the solutions and . 3. 3.
If instead , then define a support set for the vector as
[TABLE]
and let equal to on and zero otherwise. 4. 4.
Remove the large corruptions from the measurements and resolve with the measurements . This yields a new solution pair and . Return and .
This procedure uses the algorithm to identify and remove highly corrupted measurements, and then uses another instance of the algorithm to accurately compute the signal. We use the procedure above with the choices and .
We can now generate a phase transition plot for a fixed value of . Figure 3 shows the transition plots for values , and . We see that the algorithm detects and removes corruptions just as well when as when .
- Remark 4.1
The iteratively reweighted procedure in (4.13) updates weights for both the corruptions () and the signal (). Since our focus here is recovery of the corruptions, one may wonder which set of weights is more influential. We have conducted tests in this direction by performing an experiment parallel to the results in Figure 3, where we iteratively update according to (4.13), but keep fixed at unity for all . Our results, shown in Figure 4, indicate that fixing the weights results in significant deterioration of the algorithm’s performance when . However, it results in notable improvement of the algorithm when or . In the context of soft faults, the behavior of the algorithm is more relevant since when it is likely that the corruptions can be easily identified and removed by other means. In this small- context, allowing both sets of weights and to vary appears to be beneficial. On the other hand, the deterioration of the algorithm for very large is an interesting phenomenon whose investigation we leave for future work.
4.2 Recovery of compressible polynomial Chaos expansions
In this section we test our algorithm on more realistic problems in UQ: sparse recovery of multivariate polynomial Chaos expansion coefficients with corrupted measurements. Polynomial chaos expansions (PCE) [40, 14] have become a popular means of quantifying parametric uncertainty in expensive computer simulations. To formulate our problem using our earlier notation, let denote a scalar-valued response of a model (e.g., a differential equation) where is a random parameter appearing in the model. The dependence of on thus encodes uncertainty in the response. We are interested in building the approximation , where are computable orthonormal polynomials constructed from the probability density of the random vector , and we wish to compute the unknown coefficients . In a CS recovery procedure, we construct samples of the random vector , collect the measurements , and then attempt to find a sparse coefficient vector minimizing , where is the measurement matrix with entries . The underlying assumption is that is expensive to evaluate so that should be as small as possible. To focus our study on the corruptions problem, we consider the case where the vector can have a sparse number of entries that are polluted by large-magnitude errors.
The models we consider here reflect the types of large scale models that are susceptible to soft failures. However, these test models can be evaluated repeatedly with almost zero probability of corruptions. Therefore, to simulate the effect of soft failures we randomly generate soft faults according to the corruptions model from Section 4.1. After constructing components of as , we pollute of these entries as described at the beginning of Section 4.1. In our tests below we fix a value , the ratio of corrupted measurements.
4.2.1 Genz test functions
We compare the algorithm presented in this paper against a classical minimization approach in the presence of measurement corruptions for the purposes of computing compressible PCE expansion coefficients of a function. A classical minimization algorithm sets the corruptions vector in (2.3) and minimizes over all .
Our function will be one of the multidimensional test functions used by Genz [13]. For , , we investigate computing expansion coefficients for the following two functions on the hypercube :
[TABLE]
We use and in our tests, with the dictionary elements given by tensor-product Chebyshev polynomials of total degree and , respectively, over . We set the corruptions ratio to the value uniformly over all tests, and vary the corruptions magnitude . After computing a coefficient vector solving either a classical problem or (2.3), we compute a discrete error metric defined by
[TABLE]
where for each test, and are iid samples drawn from the product Chebyshev distribution over .
Figure 5 shows the result of this test. (See the figure caption for additional details of the test.) The results indicate that when corruptions are present, a standard minimization algorithm suffers severe degradation of the quality of the computed expansion coefficients. However, the corruptions algorithm of this paper is able to compute accurate coefficients in the presence of corruptions, whether they have large or small magnitude.
This example shows that there may be a penalty for using our algorithm when no corruptions are present. This is mostly easily noticed in the product peak example with no corruptions (): The corruptions algorithm of this paper computes a PCE that is less accurate than the result using a standard minimization approach. (Compare the black lines in row 3 of Figure 5.)
4.2.2 Damped Harmonic Oscillator
In this section we investigate the fault-tolerance of our algorithm for recovery of PCE coefficients in a damped linear oscillator subject to external forcing with six unknown parameters. The model is
[TABLE]
where we assume the damping coefficient , spring constant , forcing amplitude and frequency , and the initial conditions and are all uncertain, defining components of a 6-dimensional random vector . We solve (4.14) analytically to circumvent the impact of discretization errors in our study.
Defining , we restrict the components of to the following ranges:
[TABLE]
We define to be the range of defined by the product of these intervals. For any parameter realization in the harmonic oscillator is underdamped. In the following, we choose our quantity of interest as . We set the corruptions magnitude as the mean of the function, i.e. .
Figure 6 compares, as a function of the number of measurements, the error in classical recovery for uncorrupted sparse recovery versus the iteratively reweighted version of the sparse corruptions optimization proposed in Section 4.1.2. The results show that the sparse corruptions optimization notably outperforms standard minimization when corruptions are present, and is competitive without corruptions.
In Figure 7 we run the iteratively reweighted sparse corruptions optimization but vary the corruptions rate , and the corruptions magnitude . The left-hand plot shows predictable behavior: increasing corruptions has deleterious effects on the error in recovery, but notably the algorithm is reasonably stable for increasing . The right-hand plot shows that the algorithm is relatively insensitive to the magnitude of the corruptions.
5 Summary and conclusion
We have developed novel theoretical guarantees and algorithms for recovery of sparse or compressible signals where measurements have been polluted by high-magnitude corruptions. Our results are uniform theoretical recovery estimates for general linear systems where the measurement matrix satisfies a corruptions-based RIP-like condition.
We have refined an existing regularized minimization algorithm into an iteratively reweighted minimization algorithm that shows superior performance for the examples that we have investigated. An application of these examples to recovery of polynomial Chaos expansions from model UQ problems illustrates that our algorithms are resistant to highly-corrupted measurement data that may result from hardware or software faults in modern large-scale parallel computing paradigms.
Empirical tests suggest that refinements of our algorithm is relatively stable with respect to the magnitude of the corruptions, but our theory is not applicable to these algorithmic refinements and some observed behavior (e.g., Remark 4.1.3) remains theoretically unexplained, which can be the subject of future explorations.
Acknowledgments
B. Adcock thanks Simone Brugiapaglia and Xiaodong Li for helpful discussions. The authors acknowledge an anonymous referee whose report led to the investigations outlined in Remark 4.1.3.
Sandia National Laboratories is a multimission laboratory managed and operated by National Technology and Engineering Solutions of Sandia, LLC., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA-0003525. The views expressed in the article do not necessarily represent the views of the U.S. Department of Energy or the United States Government.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] B. Adcock. Infinite-dimensional compressed sensing and function interpolation. Found. Comput. Math. (to appear) , 2017.
- 2[2] B. Adcock. Infinite-dimensional ℓ 1 superscript ℓ 1 \ell^{1} minimization and function approximation from pointwise data. Constr. Approx. (to appear) , 2017.
- 3[3] B. Adcock, A. C. Hansen, C. Poon, and B. Roman. Breaking the coherence barrier: A new theory for compressed sensing. Forum of Math., Sigma (to appear) , 2017.
- 4[4] B. Adcock, A. C. Hansen, and B. Roman. The quest for optimal sampling: computationally efficient, structure-exploiting measurements for compressed sensing. In Compressed Sensing and Its Applications . Springer, 2015.
- 5[5] A. Bastounis and A. C. Hansen. On the absence of the RIP in real-world applications of compressed sensing and the RIP in levels. ar Xiv:1411.4449 , 2014.
- 6[6] P. G. Bridges, K. B. Ferreira, M. A. Heroux, and M. Hoemmen. Fault-tolerant linear solvers via selective reliability. ar Xiv:1206.1390 [cs, math] , June 2012. ar Xiv: 1206.1390.
- 7[7] E. J. Candès, M. B. Wakin, and S. P. Boyd. Enhancing sparsity by reweighted ℓ 1 subscript ℓ 1 \ell_{1} minimization. J. Fourier Anal. Appl. , 14(5):877–905, 2008.
- 8[8] A. Chkifa, N. Dexter, H. Tran, and C. Webster. Polynomial approximation via compressed sensing of high-dimensional functions on lower sets. Technical Report ORNL/TM-2015/497, Oak Ridge National Laboratory, 2015.
