A Near-Optimal Sampling Strategy for Sparse Recovery of Polynomial Chaos Expansions
Negin Alemazkoor, Hadi Meidani

TL;DR
This paper introduces a near-optimal sampling strategy for polynomial chaos expansions that improves the accuracy of surrogate models by optimizing sample locations, outperforming existing methods in numerical tests.
Contribution
It proposes a new sampling strategy that enhances local-coherence and cross-correlation properties, leading to more accurate polynomial chaos surrogates.
Findings
Significantly improves surrogate accuracy over existing sampling methods
Enhances local-coherence and cross-correlation properties of measurement matrices
Numerical examples demonstrate superior performance of the proposed strategy
Abstract
Compressive sampling has become a widely used approach to construct polynomial chaos surrogates when the number of available simulation samples is limited. Originally, these expensive simulation samples would be obtained at random locations in the parameter space. It was later shown that the choice of sample locations could significantly impact the accuracy of resulting surrogates. This motivated new sampling strategies or design-of-experiment approaches, such as coherence-optimal sampling, which aim at improving the coherence property. In this paper, we propose a sampling strategy that can identify near-optimal sample locations that lead to improvement in local-coherence property and also enhancement of cross-correlation properties of measurement matrices. We provide theoretical motivations for the proposed sampling strategy along with several numerical examples that show that our…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
A near-optimal sampling strategy for
sparse recovery of polynomial chaos expansions
Negin Alemazkoor
Hadi Meidani
Department of Civil and Environmental Engineering, University of Illinois at Urbana-Champaign, Urbana, Illinois, USA.
Abstract
Compressive sampling has become a widely used approach to construct polynomial chaos surrogates when the number of available simulation samples is limited. Originally, these expensive simulation samples would be obtained at random locations in the parameter space. It was later shown that the choice of sample locations could significantly impact the accuracy of resulting surrogates. This motivated new sampling strategies or design-of-experiment approaches, such as coherence-optimal sampling, which aim at improving the coherence property. In this paper, we propose a sampling strategy that can identify near-optimal sample locations that lead to improvement in local-coherence property and also enhancement of cross-correlation properties of measurement matrices. We provide theoretical motivations for the proposed sampling strategy along with several numerical examples that show that our near-optimal sampling strategy produces substantially more accurate results, compared to other sampling strategies.
††journal: Elsevier
1 Introduction
In order to facilitate stochastic computation in analysis and design of complex systems, analytical surrogates that approximate and replace full-scale simulation models have been increasingly studied. One of the most widely adopted surrogates is the polynomial chaos expansion (PCE), which approximates the quantity of interest (QoI) by a spectral representation using polynomial functions of random parameters [1, 2, 3, 4]. In estimating these spectral surrogates, non-intrusive stochastic techniques, based on either spectral projection or linear regression, are widely used especially because they don’t require modifying deterministic solvers or legacy codes, which is an otherwise cumbersome task [5]. These non-intrusive techniques are still the subject of ongoing research as the number of required samples for accurate surrogate estimation rapidly grows with the number of random parameters, even when efficient techniques such as sparse grid are used [6, 7, 8, 9].
More recently, researchers have developed techniques, based on compressive sampling (CS), that are particularly advantageous when surrogate expansions are expected to be sparse, i.e. the QoI can be accurately represented with a few polynomial chaos (PC) basis functions. Compressive sampling was first introduced in the field of signal processing to recover sparse signals using a number of samples significantly smaller that the conventionally used Shannon-Nyquist sampling rate [10, 11, 12]. Motivated by the fact that the solution of many high dimensional problems of interest, such as high dimensional PDEs, can be represented by sparse, or at least approximately sparse, PCEs, CS was proposed in [13, 14, 15] to estimate PC coefficients in underdetermined cases. As CS theorems suggest, the success of sparse estimation of PCE depends not only upon the sparsity of the solution of stochastic system, but also on the coherence property of the Vandermonde-like measurement matrix, formed by evaluations of orthogonal polynomials at sample locations [10], as will be elaborated later.
Several efforts have been made in order to improve the two mentioned conditions for successful recovery. For instance in [16], for Hermite expansions with Gaussian input variables, the original inputs are rotated such that a few of the new coordinates, i.e. linear combinations of original inputs, have significant impact on QoI, thereby increasing the sparsity of solution and, in turn, the accuracy of recovery. The second condition, i.e. coherence of the measurement matrix, can be poor especially when trial expansions are high-order and/or high-dimensional. To remedy this, the iterative approaches in [17, 18] can be used to optimally include only the “important” basis functions into the trial expansion and its associated measurement matrix. Focusing on this second condition, another class of methods have proposed sampling strategies that produce less coherent measurement matrices [19, 20, 21]. Among these approaches, the sampling strategy proposed in [21] was designed to be optimal in achieving the lowest local-coherence.
In this work, we introduce a near-optimal sampling strategy by further improving the local-coherence-based sampling of [21] and filtering sample locations based on cross-correlation properties of the resulting measurement matrix. Specifically, we establish quantitative measures to capture these cross-correlation properties between measurement matrix columns, and use these measures as the criteria for near-optimal identification of sample locations. It will be demonstrated that a sampling strategy that seeks to optimize these measures will lead to CS results that on average outperforms all other CS sampling strategies. This paper is organized as follows. Section 2 presents general concepts in compressive sampling and its theoretical background. In Section 3, we introduce our sampling algorithm along with relevant theoretical supports. Finally, Section 4 includes numerical examples and discussions about the advantages of the proposed approach.
2 Setup and background
2.1 Polynomial chaos expansion
Let be a tensor-product domain that is the support of , where is the vector of independent random variables, i.e. and . Also, let be the probability measure for variable and let . Given this setting, the set of univariate orthonormal polynomials, , satisfies
[TABLE]
where , and is the delta function. Therefore, the density function of , , determines the type of polynomial. For example, Gaussian and uniform probability distributions enforce Hermite and Legendre polynomials, respectively. The -dimensional orthonormal polynomials are then derived from the multiplication of one dimensional polynomials in all dimensions. For example,
[TABLE]
Consequently, we have
[TABLE]
Using this construction, any function that is square-integrable can be represented as
[TABLE]
where is the set of orthonormal basis functions satisfying Equation (3). However, for computation’s sake, is approximated by a finite order truncation of PC expansion given by
[TABLE]
where is the total order of the polynomial expansion and is the set of multi-indices defined as
[TABLE]
The cardinality of , i.e. the number of expansion terms, here denoted by , is a function of and according to
[TABLE]
Given this setting, approximates in a proper sense and is referred to as the -th degree PC approximation of . Each of the coefficients involved in the definition of can be exactly calculated by projecting onto the associated basis function:
[TABLE]
Numerical approaches such as quadrature rule or sparse grid are usually used to approximate the above integral as its exact evaluation might be very cumbersome or not even possible [5, 22]. However, as the dimensionality of the problems increases, the number of samples required by these numerical techniques for an accurate integral evaluation increases exponentially. Supplying such a large sample size is prohibitively costly, especially if these samples are drawn from expensive high-fidelity simulations or costly experiments. As a result, efforts have been made to develop adaptive approaches that reduce the number of required samples by placing fewer samples on dimensions that do not impact QoI significantly [23, 24]. Another widely used approach to estimate PC coefficients is linear regression. In order to use linear regression, the number of samples must be equal or greater than the number of PC expansion terms. The well accepted oversampling rate is around 1.5 to 3 times the number of unknown coefficients. Recently, a quasi-optimal sampling approach was introduced in [25] that results in accurate regression with oversampling.
As briefly mentioned in Section 1, when the QoI is sparse with respect to the PC basis functions, compressive sampling approach have proved effective in estimating expansion coefficients using a number of samples that is significantly smaller than the number of PC terms. This section follows with the review on the basics of sparse PC recovery using compressive sampling.
2.2 Stochastic collocation using compressive sampling
Compressive sampling was first introduced in the field of signal processing, where conventionally the number of samples required to recover a signal was determined by Shannon-Nyquist sampling rate [10]. Compressive sampling allows for successful signal recovery using significantly fewer samples when the signal of interest is sparse. Therefore, it has been extensively applied in cases where the number of available samples is limited [26, 27, 28, 29]. Due to this very advantage, compressive sampling has recently gained substantial attention in UQ, more specifically in stochastic collocation, where a sample set is used to build an analytical surrogate model (typically in the form of a PC expansion) [13, 14, 21, 30, 31]. In what follows, a formal brief background on the use of compressive sampling for PCE estimation is provided.
The objective of PCE estimation is to calculate the vector of unknown expansion coefficients in Equation (5), given samples of the QoI, denoted by the data vector . These sampled outputs are evaluated at the realizations, , of model input . Requiring to approximate results in the following system of equation,
[TABLE]
where is the measurement matrix, constructed according to
[TABLE]
We are interested in the underdetermined case where . Under such condition, there may exist infinity many solutions for . Compressive sampling approach can be readily borrowed to find the sparsest solution, by formulating the sparse recovery problem as
[TABLE]
where indicates the -norm, i.e. the number of non-zero terms. However, since -norm is non-convex and discontinuous, the above problem is NP-hard. Therefore, minimization is usually replaced by its convex relaxation where -norm of the solution is minimized instead, i.e.
[TABLE]
minimization is the closest convex problem to minimization. It has been shown that when is sufficiently incoherent, as will be explained later, and is sufficiently sparse, the solution of minimization is unique and is equal to the solution of minimization [32]. The minimization in (12) is named basis pursuit [33], and it can be solved using linear programming. If the measurements are known to be noisy, the problem can be reformulated as
[TABLE]
which is known as the basis pursuit denoising problem. controls the accuracy tolerance and can be prescribed if the distribution of measurement noise is known. For example, if the measurement noise follows a normal distribution with zero mean and standard deviation , then is naturally set to be . When instead of measurement samples, simulation samples are used in the PC estimation, and a distribution for numerical or modeling error is absent, one may use cross-validation to determine the tolerance parameter . This section continues with a brief review on theorems developed on recoverability of compressive sampling and efforts heretofore made to improve the accuracy of PC estimation using compressive sampling.
2.3 PC recoverability using compressive sampling
Whether desired solution can be recovered from compressive sampling or not mainly depends on the properties of the measurement matrix . One of the properties that are shown to be significantly relevant is the Restricted Isometry Property (RIP) or the -restricted isometry constant for the measurement matrix [34]. Formally speaking the -restricted isometry constant for a matrix is defined to be the smallest such that
[TABLE]
for every submatrix , of and every vector . Thus defined, a small RIP constant for a measurement matrix will lead to a situation wherein for any given sparse signal, energy of the measured signal is not very different from the energy of the given signal itself; hence the appeal of a small RIP constant. The following theorem provides an upperbound on RIP constant so that the optimization problem (13) leads to accurate recovery.
Theorem 1** ([34]).**
Let with RIP constant such that . For a given , and noisy measurement , , let be the solution of
[TABLE]
Then the reconstruction error satisfies
[TABLE]
where and only depend on and is the vector with all but the s-largest entries set to be zero. If is s-sparse and the measurements are noiseless, then the recovery is exact.
Calculating RIP constant for a given matrix is an NP-complete problem [35]. The following theorem gives a probabilistic upperbound on RIP constant for bounded orthonormal systems. Let be an orthonormal bounded system of functions on , where is endowed with a probability measure . Specifically, we have
[TABLE]
and the following uniform bound,
[TABLE]
Consequently, an orthonormal polynomial system defined in (3) is a bounded orthonormal system if it is uniformly bounded by
[TABLE]
for some constant . Hereinafter, let us refer to the bound as the local-coherence.
Theorem 2** ([36]).**
Let be a bounded orthonormal system satisfying Equations (17) and (18). Also let be a measurement matrix with entries , where are random samples drawn from measure . Assuming that
[TABLE]
then with probability at least the RIP constant of satisfies . The are universal constants.
It is worthwhile to highlight the differences between these two theorems. Theorem 1 is a deterministic theorem, while Theorem 2 is probabilistic. Theorem 1 provides a deterministic guarantee that is universal, in the sense that once a specific RIP property is satisfied for samples of a class of measurement matrices, then the corresponding error bounds hold for all kinds of target expansions and recovery is always exact for cases with noiseless measurement and -sparse target. On the other hand, when the condition of Theorem 2 is satisfied, i.e. sufficient number of samples are provided, the recovery accuracy can be achieved with high probability, and not with probability 1.
3 Near-optimal sampling strategy
Suppose that a limited sampling budget allows for samples to be drawn. Our objective, then, is to optimally select locations in the parameter space, at which samples of QoI are computationally or experimentally evaluated. Let us pose a slightly different, but closely related, question: out of a sufficiently large pool of candidate sample locations, how can one identify the “optimal” locations? The sufficiently large number can be determined such that an accurate regression for the given stochastic system is achieved. In what follows, we first discuss the sampling strategies that aim at improving the local-coherence. This will then be followed by discussions about other relevant properties, in particular those of the measurement matrix, that have implications on accuracy and stability of sparse recovery, and how a ‘near-optimal’ sampling strategy can be designed to improve these relevant properties.
3.1 Sampling strategies focusing on local-coherence
As is evident by the theoretical results of Section 2.3, specifically in Theorem 2, local-coherence has an implication on recovery accuracy. This has motivated researchers to develop new sampling strategies such that this property is improved. The conventional sampling approach, namely the standard (random) sampling approach, involves drawing random realizations from uniform and normal distributions for Legendre and Hermite PC expansions, respectively. In [19], it was recommended to use a preconditioned Legendre measurement matrix, where samples are taken from Chebyshev distribution, instead of a uniform distribution, to improve accuracy. With respect to this new sampling distribution, new orthogonal basis functions were calculated by scaling or weighting the Legendre polynomials. It was shown in [30] that when system dimensionality is larger than polynomial order , the Chebyshev preconditioning approach will result in a polynomial system with a larger local-coherence, , and subsequently in less accurate results compared with that from standard sampling. An alternative sampling strategy was proposed in [20], where the Legendre polynomial system was first turned into a discrete orthogonal system using Gauss quadrature points, from which samples are drawn randomly for sparse recovery. But, it was shown that the new discretized orthogonal system improves the bound only when , and for , recovery accuracy will be lowered than that from standard sampling.
In [21], a coherence-optimal, or to be compliant with the terminology in this paper, a local-coherence-optimal sampling approach was introduced, where instead of directly sampling from the probability measure, , samples are drawn from a different “optimal” probability measure, , constructed according to
[TABLE]
where is a normalizing constant and
[TABLE]
Corresponding to this probability measure, the weight function, used to retrieve the orthogonality of the basis functions, should be
[TABLE]
Accordingly, the -minimizations in (12) and (13) were adjusted to include the weight functions. The following weighted -minimizations were then used for sparse recovery
[TABLE]
[TABLE]
where is the diagonal weight matrix, with for . It was also proved in [21] that this sampling approach results in the lowest local-coherence among all sampling approaches, and can thus outperform standard sampling for both cases, i.e., and .
In what follows, we discuss how this strategy can be further improved by considering other relevant properties with implications on recovery accuracy.
3.2 Sampling strategies focusing on measurement matrix properties
It is obvious that to recover all -sparse vectors from observation vector , using the minimization problem in (11), we must have for any pair of distinct -sparse vectors . In other words, we want to lead to distinct observation vectors. For this to happen, the measurement matrix should not satisfy , for any vector that has 2 or fewer nonzero terms. This means that the null space of should contain no vector with or fewer nonzero terms. In compressive sampling, this property is mostly characterized using spark [37]. The spark of a matrix is the smallest number of its columns that are linearly dependent. Using the definition of spark, it is then straightforward to guarantee that the solution of minimization given in (11) exactly recovers the -sparse if spark [38]. Moreover, it can be shown that in this case the minimization in (12) also recovers the exact solution, [39].
Clearly, a measurement matrix with a larger spark allows exact recovery using compressive sampling for a larger set of target signals (solution vectors). Therefore, it is desired to maximize the spark of measurement matrices. However, computing the spark of a matrix is an NP-hard problem [40]. As an alternative, one can analyze recovery guarantees using alternative properties which are easier to compute. One such property is the mutual-coherence, which for a given matrix is defined as the maximum absolute normalized inner product, i.e. cross-correlation, between its columns [13, 41]. Let be the columns of matrix . The mutual-coherence of matrix , denoted by , is then given by
[TABLE]
The following simple proposition explains how spark and mutual coherence are related.
Proposition 1** ([42]).**
For any matrix the following holds:
[TABLE]
Mutual-coherence is an indicator of the worst interdependence, i.e. maximum cross-correlation between the columns of a matrix. It is zero for an orthogonal matrix and is strictly positive when . Proposition 1 makes it obvious that a small value for mutual-coherence is desired. It may be concluded that measurement matrix should be designed in a way that its mutual-coherence, i.e. its maximum cross-correlation, is minimized. However, it has been observed that minimizing maximum cross-correlation does not necessarily improve the recovery accuracy of compressive sampling [39, 43]. This is because Equation (27) only provides a lower bound on the spark. Therefore, minimizing mutual-coherence considers only the worst-case scenario and fails to account for other possibilities for improving compressive sampling performance [43].
Our objective in this work is to design a measurement matrix which leads to better compressive sampling performance on average, and not merely in the worst-case scenario. To this end, we need to determine which property of the measurement matrix should be optimized. This property should ideally be one that directly and sufficiently controls the accuracy of compressive sampling. A widely-accepted property is the RIP constant, as suggested by Theorem 1. As previously mentioned, calculating RIP constant for a given matrix is an NP-complete problem. However, it is known that the eigenvalues of each column-submatrix can be bounded by the RIP constant,
[TABLE]
Therefore, one may suggest to minimize the condition number of all column-submatrices with or fewer columns. The challenge, however, is that the sparsity level of solution, , is not known in advance. Moreover, calculating such a combinatorial measure is not a trivial task and can be computationally impossible [43].
As a result, establishing a single matrix property that sufficiently guarantees the accuracy of compressive sampling method and is easily computable still remains an open challenge [39, 43]. To sidestep this challenge, efforts have focused on identifying properties or measures that are relatively better than maximum cross-correlation, or mutual-coherence. In [43], for the first time, it was suggested that a -averaged mutual-coherence be minimized. Denoted by , the -averaged mutual-coherence for a measurement matrix is defined as the average of cross-correlations larger than threshold , i.e.
[TABLE]
where is the indicator function, is the th component of the Gram matrix , and is the column-normalized version of . It has been shown that recovery accuracy can be significantly improved if instead of a random measurement matrix, the measurement matrix is optimized based on . However, it was later shown in [44] that a -optimized measurement matrix is not robust when target signals are attached to some noise, and as such, not exactly sparse.
In order to improve this robustness, efforts have focused on optimizing measurement matrices by considering all the cross-correlations, and not the ones larger than a threshold. Specifically, these efforts seek to minimize the distance between the Gram matrix, , and the corresponding identity matrix. With denoting the Frobenius norm, it has been shown that the measurement matrix determined by solving the following minimization problem
[TABLE]
can lead to a signal recovery that is both more accurate and more robust compared with that using the -optimized matrix [39, 45, 46, 47]. In what follows, we propose our near-optimal sampling strategy which considers measures of maximum and average cross-correlation of measurement matrices, and optimizes sample locations accordingly.
3.3 Near-optimal sampling strategy
To achieve a better robustness, as argued earlier, we aim to select sample locations that collectively minimize a measure of average cross-correlation given by
[TABLE]
where is the total number of column pairs. For the sake of brevity, we refer to as the average cross-correlation, even though precisely speaking, it is the average of squares of cross-correlation. It should be noted that minimizing this average cross-correlation alone will not necessarily result in the smallest maximum cross-correlation, and the mutual-coherence could still be undesirably large and the sparse recovery significantly inaccurate. As a remedy, we seek to minimize both the average cross-correlation and the mutual-coherence , simultaneously in a multi-objective problem.
To solve the optimization problem in (30) for PC measurement matrix, an option is to adopt the solution approaches proposed for (30). These are mainly iterative approaches based on gradient descent method [45] or bound-optimization method [46]. However, these algorithms are applicable for measurement matrices that are less restricted compared to PC measurement matrix, in that their components are not constrained to be evaluations of specific orthogonal basis functions at certain sample locations, and as such can be freely optimized. Therefore, a greedy algorithm is the natural option for solving (30). Our greedy algorithm for near-optimal sampling strategy begins by populating a large pool of candidate sample locations in the multidimensional parameter space, and seeks to find the near-optimal locations and the corresponding measurement matrix. We select the first sample location randomly from the candidates pool, and this constitutes the first row in the measurement matrix. At each step of the algorithm, a new sample location, or matrix row is added. This is done by searching through the candidates pool for the best sample location. In this two-objective optimization problem, we select the “best” sample location as the one with the smallest normalized distance with respect to the utopia point [48, 49]. The utopia point is an abstract point in our two-dimensional objective space, whose first and second coordinates are the smallest maximum cross-correlation and smallest average cross-correlation. Note that this abstract point is typically not among the Pareto optimal solutions of the two-objective optimization problem, and is therefore not attainable as an optimal solution. This is why we select from the sample locations pool the location that is closest to the utopia point, i.e. the sample location whose corresponding average cross-correlation and maximum cross-correlation have the smallest normalized distance with the coordinates of the utopia point, as is elaborated in the following pseudo-code.
Let denote the measurement matrix associated with the large pool of candidate locations, and the near-optimal row-submatrix of the “pool” matrix . This submatrix is identified through incremental row-concatenation, as shown in Algorithm 1. In this pseudo-code, of size denotes the near-optimal submatrix at the th step of the algorithm, and represents the th row in the “pool” matrix. Also, at each iteration, and are the maximum cross-correlation and average cross-correlation, respectively recorded for candidate location .
As discussed in Section 2.3, it has been shown that local-coherence-optimal sampling improves the PC coefficients recovery accuracy, over standard sampling. To exploit this advantage, we use the local-coherence-optimal sampling strategy to generate the large pool of candidate sample locations. In order to retain orthogonality, we just need to substitute with in Algorithm 1, where is the diagonal weight matrix defined earlier in Section 3.1. Compared to standard sampling and local-coherence-optimal sampling, our sampling strategy incurs extra computational cost due to additional row selections from the candidates pool in the greedy algorithm. However, this additional cost is typically negligible with respect to the cost of sample collection, especially when the benefit of improved accuracy is also considered.
We refer to the proposed sampling approach as near-optimal sampling strategy. The reason we do not use the term ‘optimal’ is two-fold: First, even though studies in signal processing have shown significant improvement in recovery accuracy by minimizing (31), this orthogonality property does not by itself establish a sufficient criterion for recovery accuracy of compressive sampling (a measure that is both sufficient and tractable has not been identified in the literature.) Second, our approach can be sensitive to (i) the random choice of candidate locations and (ii) the random choice of the initial location (first row in the submatrix), and is therefore not fully deterministic, and as such not optimal. In the next section, we provide numerical examples to demonstrate advantages of our proposed sampling strategy.
4 Numerical examples
To demonstrate the advantage of the near-optimal sampling strategy over other sampling approaches, four target functions are considered in this section: (i) a low-dimensional high-order polynomial function, (ii) a high-dimensional low-order polynomial function, (iii) a six-dimensional generalized Rosenbrock function, and (iv) the solution to a stochastic diffusion problem. This would allow for a comprehensive comparison of the sampling strategies for a variety of models with different combinations of dimension and order. The first three target functions are chosen to be exactly -sparse (or sparse, in short), whereas the last one is approximately -sparse (or compressible, in short). In all examples, optimization problems (12) and (24), corresponding to a noise-less measurement setting, were formed and solved using the SPGL1 package [50]. Also, coherence-optimal samples were generated using the ‘coh_opt’ package developed by the authors of [21]. For the sake of brevity, we only report the results for Legendre polynomial expansions in this paper, but note that similar improvements in recovery accuracy were observed when near-optimal sampling was applied to Hermite polynomial expansions. In all these examples, the proposed near-optimal sampling approach is compared with (i) standard sampling, and (ii) local-coherence-optimal sampling, or in short, ‘coherence-optimal’ sampling.
4.1 Low-dimensional high-order sparse PCE
Let us consider the target function, , to be a sparse 20th-order Legendre polynomial expansion in a two-dimension random space with uniform density on , manufactured according to
[TABLE]
Using limited samples from this exact function, the objective is to recover the sparse coefficient vector. In selecting the sample locations to form the measurement matrix and data vector, the proposed near-optimal sampling approach is compared versus random sampling strategies, namely (i) standard sampling and (ii) coherence-optimal sampling. The candidates pool in the near-optimal sampling approach includes coherence-optimal samples. For all three approaches, we report the performance results obtained by 100 independent runs. This is to capture the variability induced by small sample size in standard and coherence-optimal sampling approaches. In the near-optimal approach, these independent runs allow us to account for (i) the sensitivity of performance with respect to the choice of initial sample location, and (ii) the variability induced by the finite size of the candidate pool, and (iii) the variability induced by the fact that our cross-correlation measures do not necessarily guarantee CS recovery accuracy.
Figure 1(a) shows the median, and 1st and 3rd quantiles of relative error and Figure 1(b) shows the mean of relative error. The relative error is calculated as , where is the exact coefficient vector and is the solution of minimization in (12). As is apparent in Figure 1(a) and 1(b) , coherence optimal sampling results in a smaller error, compared to standard sampling, as the samples are drawn from a distribution with a smaller bound defined in (19). However, our proposed sampling outperforms the coherence-optimal one. To explain this, Figures 1(c) and 1(d) show the median, and 1st and 3rd quantiles of mutual-coherence and average cross-correlation for the measurement matrix, respectively. It can be seen that near-optimal sampling beats the other two approaches in both measures, leading to the higher observed accuracy in the PCE estimation.
4.2 High-dimensional low-order sparse PCE
As a contrasting example, let us consider the target function, , to be a sparse second order Legendre polynomial expansion in a 20-dimensional random space with uniform density on , manufactured according to
[TABLE]
To compare the performance of near-optimal sampling with standard and coherence optimal sampling on this target function, the numerical results were obtained under a setting similar to that in the previous example: 100 independent runs were used for all three approaches, and 100,000 coherence-optimal samples constituted the initial sample pool in the near-optimal approach of Algorithm 1. Figure 2(a) shows the median, and 1st and 3rd quantiles of relative error. Figure 2(b) shows the mean of relative error. Figure 2(c) and 2(d) show the improvement in the median, and 1st and 3rd quantiles of mutual-coherence and average cross-correlation, respectively, when the near-optimal sampling is used, which has translated into the observed improvement in recovery accuracy.
It should be highlighted that standard sampling is mostly suitable for high-dimensional and low-order cases such as this example [13]. It should also be noted that in the examples in this section and Section 4.1, the measurement matrices have the same number of columns (); however, as it can be seen in Figures 2(c) and 2(d), the high-dimensionality of this example has resulted in smaller mutual-coherences and average cross-correlation for various sample sizes. Consequently, we expect to observe that recently developed sampling strategies do not outperform standard sampling in these high-dimensional low-order problem cases [20, 51]. However, as it can be seen in figures 2(a) and 2(b), the near-optimal sampling approach does outperform both standard and coherence optimal sampling. Although, this improvement in terms of mean or median of relative error may not seem significant, it should be noted that this is a rather “extreme” case, where not only the response is high-dimensional, but its order is also very low. This could be thought of as a lower bound for the improvement offered by the near-optimal sampling approach. That is, the fact that the near-optimal sampling strategy still shows improvement, even non-significant, for such extreme cases where standard sampling is supposed to work well, can be considered as a numerical evidence that it will most likely outperform other sampling strategies in all the other cases.
As we already discussed in Section 3.2, in order to have exact recovery we need the spark of measurement matrix to be larger than two times the number of non-zero coefficients. At small sample sizes none of the sampling approaches meet this requirement. Therefore, as it can be seen in figures 2(a) and 2(b), at low sample sizes, all three sampling strategies recover poor approximations. As the number of sample increases the chance to meet this requirement also increases; hence the variability in recovery accuracy also increases. In near-optimal sampling approach we select sample locations such that the orthogonality of measurement matrix is improved, thereby enhancing the probability of achieving a larger spark and consequently exact recovery. To further clarify this, we demonstrate the advantage of near-optimal sampling by comparing another performance measure, namely the success rate. The success rate, here, is defined as the ratio of trials, out of the total 100 trials, that result in a relative error smaller than . Figure 2(e) shows that near-optimal sampling consistently results in a higher success rate, and is thus expected to produce accurate recoveries, more frequently.
4.3 Generalized Rosenbrock function
Compared to the previous two examples with “extreme” dimension and order combinations, the third target function can be chosen to be of “moderate” combination of dimension and order. Specifically, let us consider the following 6-dimensional generalized Rosenbrock function with random inputs following a uniform density on ,
[TABLE]
The objective is to recover the corresponding sparse Legendre polynomial expansion. Similar setting is used for comparison of the three sampling strategies: 100 independent runs for all sampling approaches, together with a candidate pool of 100,000 coherence-optimal samples for near-optimal sampling of Algorithm 1. Figure 3(a) demonstrates the improvements in the median, and 1st and 3rd quantiles of relative error when near-optimal sampling is used. Figure 3(b) demonstrates the improvements in terms of the mean of relative error. Figure 3(c) and 3(d) show that near-optimal sampling results in smaller mutual-coherence and average cross-correlation, in terms of their median, and 1st and 3rd quantiles. Similar to the previous example, we note that at small sample sizes all approaches result in equally inaccurate recoveries as all approaches fail to achieve a measurement matrix with sufficiently large spark, i.e. spark. A larger sample sizes results in a larger lower bound for the spark of measurement matrix and a higher probability to achieve the critical spark value. Therefore, we observe more variability in recovery accuracy as the sample size increases. Figure 3(e) shows that using near-optimal sampling leads to significant improvement in the success rate, i.e. the ratio of trials with relative errors smaller than . This is a direct result of improving the orthogonality of measurement matrix, i.e. enhancing the chance of achieving critical spark value in near-optimal sampling.
4.4 Stochastic diffusion problem
In this example we consider a stochastic diffusion problem in a one dimensional physical domain, given by
[TABLE]
We assume that the diffusion coefficient, , takes the following analytical form
[TABLE]
We consider to be uniformly distributed on and consider the solution of diffusion problem at to be the quantity of interest. We use 3rd-order Legendre polynomial expansion to approximate the QoI and employ the three sampling approaches to estimate the coefficients of expansion. Similar setting to previous examples is used here: 100 independent runs for all sampling approaches, together with a candidate pool of 100,000 coherence-optimal samples for near-optimal sampling of Algorithm 1. For this example, we define the relative error to be , where is the vector including the exact solutions calculated at 1000 new random samples and includes the approximate solutions calculated by evaluating the PC expansion at the same 1000 samples. Figure 4(a) demonstrates the improvement in the median, and 1st and 3rd quantiles of relative error when near-optimal sampling is used. Figure 4(b) shows the improvement in the mean of relative error. The mutual-coherence and average cross-correlation are also compared in figures 4(c) and 4(d), respectively, in terms of their medians, and 1st and 3rd quantiles. To compare the success rates, since the target expansion is not exactly sparse, we consider a looser definition for successful recovery. This is justified by noting that the error levels in Figure 4(a) are relatively larger than those those in Figures 2(a) and 3(a). Accordingly, we define a recovery to be successful when its relative error is smaller that , and show the resulting success rates in Figure 4(e). These results show that success rates can be improved significantly by using near-optimal sampling.
5 Conclusion
In this paper, we presented a new sampling strategy, or a design-of-experiment technique, for selecting sample locations for sparse estimation of PCEs using compressive sampling. The sample locations are selected such that (i) the local-coherence property is improved, and (ii) the resulting measurement matrix has the smallest mutual-coherence and the smallest average cross-correlation between its columns. It was discussed how the latter two measures have implications on recovery accuracy and numerical results were presented to support it. A greedy algorithm was introduced that selects a prescribed number of near-optimal locations out of a large pool of candidate locations. The resulting measurement matrix is claimed to be near-optimal, rather than optimal, because the two aforementioned cross-correlation measures may not be the only measures that control the recovery accuracy, and therefore minimizing them does not guarantee optimal performance. Another reason why our algorithm can be sub-optimal is that the greedy algorithm is inherently a non-deterministic algorithm, and as such is susceptible to variabilities induced by random choice of location candidates and initial (first) sample location in the algorithm. Four numerical examples with various combinations of dimensionality and order demonstrated the advantages of our proposed sampling strategy over other sampling strategies, in terms of accuracy and robustness.
6 References
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] R. G. Ghanem, P. D. Spanos, Stochastic finite elements: a spectral approach, Courier Corporation, 2003.
- 2[2] D. Xiu, G. E. Karniadakis, The Wiener–Askey polynomial chaos for stochastic differential equations, SIAM journal on scientific computing 24 (2) (2002) 619–644.
- 3[3] M. K. Deb, I. M. Babuška, J. T. Oden, Solution of stochastic partial differential equations using Galerkin finite element techniques, Computer Methods in Applied Mechanics and Engineering 190 (48) (2001) 6359–6372.
- 4[4] I. Babuska, R. Tempone, G. E. Zouraris, Galerkin finite element approximations of stochastic elliptic partial differential equations, SIAM Journal on Numerical Analysis 42 (2) (2004) 800–825.
- 5[5] M. S. Eldred, C. G. Webster, P. Constantine, Evaluation of non-intrusive approaches for Wiener-Askey generalized polynomial chaos, in: Proceedings of the 10th AIAA Non-Deterministic Approaches Conference, number AIAA-2008-1892, Schaumburg, IL, Vol. 117, 2008, p. 189.
- 6[6] E. Novak, K. Ritter, The curse of dimension and a universal method for numerical integration, in: Multivariate approximation and splines, Springer, 1997, pp. 177–187.
- 7[7] B. Ganapathysubramanian, N. Zabaras, Sparse grid collocation schemes for stochastic natural convection problems, Journal of Computational Physics 225 (1) (2007) 652–685.
- 8[8] D. Xiu, J. S. Hesthaven, High-order collocation methods for differential equations with random inputs, SIAM Journal on Scientific Computing 27 (3) (2005) 1118–1139.
