A Sketched Finite Element Method for Elliptic Models
Robert Lung, Yue Wu, Dimitris Kamilis, Nick Polydorides

TL;DR
This paper introduces a sketched finite element method for elliptic PDEs that uses random sampling based on leverage scores to significantly speed up computations while maintaining accuracy.
Contribution
It proposes a novel algorithm combining low-dimensional projection and randomized sketching with leverage score sampling for efficient high-dimensional elliptic PDE solutions.
Findings
Achieves nearly optimal performance with leverage score sampling
Provides theoretical bounds on error and complexity
Demonstrates two orders of magnitude speedup in simulations
Abstract
We consider a sketched implementation of the finite element method for elliptic partial differential equations on high-dimensional models. Motivated by applications in real-time simulation and prediction we propose an algorithm that involves projecting the finite element solution onto a low-dimensional subspace and sketching the reduced equations using randomised sampling. We show that a sampling distribution based on the leverage scores of a tall matrix associated with the discrete Laplacian operator, can achieve nearly optimal performance and a significant speedup. We derive an expression of the complexity of the algorithm in terms of the number of samples that are necessary to meet an error tolerance specification with high probability, and an upper bound for the distance between the sketched and the high-dimensional solutions. Our analysis shows that the projection not only reduces…
| [] | time [s] | ||||||
|---|---|---|---|---|---|---|---|
| 50 | 0.5 | 0.43 | 0.04 | 0.07 | 1.60 | 0.07 | 0.09 |
| 50 | 1 | 0.78 | 0.06 | 0.07 | 1.07 | 0.05 | 0.08 |
| 100 | 0.5 | 0.49 | 0.04 | 0.03 | 3.99 | 0.11 | 0.11 |
| 100 | 1 | 0.80 | 0.06 | 0.03 | 2.30 | 0.06 | 0.07 |
| 100 | 5 | 3.22 | 0.11 | 0.03 | 0.77 | 0.02 | 0.04 |
| [ | time [s] | ||||||
|---|---|---|---|---|---|---|---|
| 25 | 0.5 | 0.52 | 0.04 | 0.15 | 0.73 | 0.05 | 0.17 |
| 50 | 1 | 0.52 | 0.06 | 0.07 | 0.95 | 0.04 | 0.08 |
| 50 | 5 | 3.51 | 0.12 | 0.07 | 0.35 | 0.02 | 0.07 |
| 100 | 1 | 0.85 | 0.06 | 0.03 | 1.97 | 0.05 | 0.06 |
| 100 | 5 | 3.51 | 0.12 | 0.03 | 0.65 | 0.04 | 0.04 |
| time [s] | |||||||
|---|---|---|---|---|---|---|---|
| 1000 | 1 | 2.67 | 0.06 | 0.07 | 4.61 | 0.01 | 0.26 |
| 1000 | 5 | 5.96 | 0.12 | 0.05 | 1.25 | 0.01 | 0.26 |
| 2000 | 1 | 4.87 | 0.06 | 0.02 | 77.36 | 0.02 | 0.08 |
| 2000 | 5 | 9.95 | 0.12 | 0.03 | 9.64 | 0.01 | 0.08 |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
A sketched finite element method for elliptic models
Robert Lung
Robert Lung
School of Engineering
University of Edinburgh
UK
,
Yue Wu
Yue Wu
Mathematical Institute
University of Oxford
Oxford
UK
,
Dimitris Kamilis
Dimitris Kamilis
School of Engineering
University of Edinburgh
EH9 3JL Edinburgh
UK
and
Nick Polydorides
Nick Polydorides
School of Engineering
University of Edinburgh
EH9 3JL Edinburgh
UK & The Alan Turing Institute
London, UK
Abstract.
We consider a sketched implementation of the finite element method for elliptic partial differential equations on high-dimensional models. Motivated by applications in real-time simulation and prediction we propose an algorithm that involves projecting the finite element solution onto a low-dimensional subspace and sketching the reduced equations using randomised sampling. We show that a sampling distribution based on the leverage scores of a tall matrix associated with the discrete Laplacian operator, can achieve nearly optimal performance and a significant speedup. We derive an expression of the complexity of the algorithm in terms of the number of samples that are necessary to meet an error tolerance specification with high probability, and an upper bound for the distance between the sketched and the high-dimensional solutions. Our analysis shows that the projection not only reduces the dimension of the problem but also regularises the reduced system against sketching error. Our numerical simulations suggest speed improvements of two orders of magnitude in exchange for a small loss in the accuracy of the prediction.
Key words and phrases:
fRandomised linear algebra, Galerkin finite element method, statistical leverage scores, real-time simulation.
2019 Mathematics Subject Classification:
65F05, 65M60, 68W20
Contents
1. Introduction
Motivated by applications in digital manufacturing twins and real-time simulation in robotics, we consider the implementation of the Finite Element Method (FEM) in high-dimensional discrete models associated with elliptic partial differential equations (PDE). In particular, we focus on the many-query context, where a stream of approximate solutions are sought for various PDE parameter fields [8], aiming to expedite computations in situations where speedy model prediction is critical. Realising real-time simulation with high-dimensional models is instrumental to enable digital economy functions and has been driving developments in model reduction over the last decade [12]. Reducing the computational complexity of models is also central to the practical performance of statistical inference and uncertainty quantification algorithms, where a multitude of model evaluations are necessary to achieve convergence [16]. When real-time prediction is coupled with noisy sensor data, as in the digital twins paradigm, a fast, somewhat inaccurate model prediction typically suffices [4].
Our approach is thus tailored to applications where some of the accuracy of the solution can be traded off with speed. In these circumstances the framework of randomised linear algebra presents a competitive alternative [23]. In the seminal work [6], Drineas and Mahoney propose an algorithm for computing the solution of the Laplacian of a graph, making the case for sampling the rows of the matrices involved based on their statistical leverage scores. Despite aimed explicitly for symmetric diagonally dominant systems arising, their approach provides inspiration for the numerical solution of PDEs on unstructured meshes. Apart from the algebraic resemblance to the Galerkin FEM systems, the authors introduced sampling based on leverage scores of matrices through the concept of ‘effective resistance’ of a graph derived by mimicking Ohmic relations in resistor networks. As it turns out the complexity of computing the leverage scores is similar to that of solving the high-dimensional problem deterministically, however efficient methods to approximate them have since been suggested [7]. More recently, Avron and Toledo have proposed an extension of [6] for preconditioning the FEM equations by introducing the ‘effective stiffness’ of an element in a finite element mesh [1]. Specifically, for sparse symmetric positive definite (SSPD) stiffness matrices, they derive an expression for the effective stiffness of an element and show its equivalence to the statistical leverage scores. Sampling elements leads to a sparser preconditioner.
In situations where a single, high-dimensional linear system is sought, randomised algorithms suited to SSPD systems are readily available. The methods of Gower and Richtarik for example randomises the row-action iterative methods by taking a sequence of random projections onto convex sets [9]. This algorithm is equivalent to a stochastic gradient descent method with provable convergence, while their alternative approach in [10] iteratively sketches the inverse of the matrix. In [2], Bertsekas and Yu present a Monte Carlo method for simulating approximate solutions to linear fixed-point equations, arising in evaluating the cost of stationary policies in Markovian decisions. Their algorithm is based on approximate dynamic programming and has subsequently led to [20], that extends some of the proposed importance sampling ideas in the context of linear ill-posed inverse problems.
Real-time FEM computing at the many query paradigm, is hindered by two fundamental challenges: the fast assembly of the stiffness matrix for each parameter field, and the efficient solution of the resulting system to the required accuracy. To mitigate these, is to compromise slightly on the accuracy in order to capitalise on speed. To achieve this we first transform the linear SSPD system into an overdetermined least squares problem, and then project its solution this onto a low-dimensional subspace. This mounts to inverting a low-dimensional, dense matrix whose entries are perturbed by random errors. Our emphasis and contributions are in developing the projected sketching algorithm, and in optimising the sampling process so that it is both efficient in the multi-query context and effective in suppressing the variance of the solution. We also analyse the complexity of our algorithm and derive, probabilistic error bounds for quality of the approximation.
Our paper is organised as follows: In section 2 we provide a concise introduction to the Galerkin formulation for elliptic boundary value problems, and subsequently derive the projected least squares formulation of the problem. We then describe the sampling distribution used in the sketching and provide the conditions under which the reduced sketched system has a unique solution. Section 4 contains a description of our algorithm, and our main result that describes the complexity of our algorithm in achieving an error tolerance in high probability. We then provide an error analysis addressing the various types of errors imparted on the solution through the various stages of the methodology, before concluding with some numerical experiments based on the steady-state diffusion equation.
1.1. Notation
Let denote the set of integers between 1 and inclusive. For a matrix , and denote its -th row and column respectively, and its -th entry. is the pseudo-inverse of and its condition number. If we define the singular value decomposition where , and . Unless stated otherwise, singular values and eigenvalues are ordered in non-increasing order. Analogously, for a symmetric and positive definite matrix , is the largest eigenvalue, and the smallest. By we denote the number of non-zero elements in . Further we write for the Euclidean norm for a vector or the spectral norm of a matrix and the Frobenius norm of a matrix. For matrices and with the same number of rows is the augmented matrix formed by column concatenation. The identity matrix is expressed as or to specify its dimension when important to the context. We write for the Kronecker product of vector with the ones vector in dimensions.
2. Galerkin finite element method preliminaries
Consider the elliptic partial differential equation
[TABLE]
on a bounded, simply connected domain , with Dirichlet conditions
[TABLE]
on a Lipschitz smooth boundary . Further let a bounded positive parameter function in the Banach space such that
[TABLE]
for some finite constants and . Multiplying (1) by an appropriate test function , then integrating over the domain and invoking the divergence theorem yields
[TABLE]
where denotes the -dimensional integration element. Using the standard definition of the Sobolev space on this domain as
[TABLE]
where is the space of square-integrable functions on we define the solution and test function spaces respectively by
[TABLE]
Let and , where the Sobolev space is to be understood in terms of a surjective trace operator from to . Then the weak form of the boundary value problem (1)-(2) is to find a function such that
[TABLE]
The existence and uniqueness of the weak solution is guaranteed by the Lax-Milgram theorem [8].
To derive the Galerkin finite element approximation method from the weak form (7), we consider a mesh comprising elements, having interior and boundary vertices (nodes). Further let the conforming finite dimensional space associated with the chosen finite element basis defined on . Choosing
[TABLE]
to comprise linear interpolation shape functions with local support over the elements in then we can express the FEM approximation of in this basis for a set of coefficients as
[TABLE]
We have made slight abuse of notation by using for the function in as well as its FEM approximation in . In effect, the finite element formulation of the boundary value problem is to find such that
[TABLE]
We further define the element-average coefficients
[TABLE]
and applying the Dirichlet boundary conditions on the boundary nodes we arrive at the Galerkin system of equations for the vector
[TABLE]
The equations in (11) are expressed in a matrix form as
[TABLE]
where is the symmetric, sparse and positive-definite stiffness matrix, whose dependence on the parameters is implicit and suppressed for clarity. The FEM construction guarantees the consistency of the system (12), thus is always in the column space of and consequently it admits a unique solution . As we focus to the efficient approximation of in the many query context we content with two challenges: the efficient assembly of the stiffness matrix, and the speedy solution of the resulted FEM system.
2.1. The stiffness matrix
Let is the index set of the vertices of the th element, and consider to be the sparse matrix holding the gradients of the linear shape functions where . In this is then a constant gradients vector associated with the th node of , and let the element of a vector such that and a row concatenation of matrices for all elements. If we define as and the concatenation of the matrices as
[TABLE]
then the stiffness matrix takes the form of a high-dimensional sum or product of sparse matrices
[TABLE]
which for large require efficient assembly using reference elements and geometry mappings [15]. The above construction typically leads to a stiffness matrix that is well-conditioned for inversion with the exception of acute element skewness [14] and parameter vectors with wild variation [22], which cause the the condition number to increase dramatically. Explicit bounds on the largest and smallest eigenvalues of , and respectively the singular values of , are given in [13].
3. A regularised sketched formulation
The sought solution can be alternatively obtained by solving the over-determined least squares problem
[TABLE]
since
[TABLE]
The fact that the above problem is over-determined implies, at least to some extent, robustness against noise, such as random perturbations on the elements of the matrix and vector . A similar error is induced by randomised sketching where we replace (15) with
[TABLE]
and look for a random approximation of in the sense that . We note that and don’t have to be similar as such, e.g. have the same dimensions, as long as the problems are well defined and the optimisers remain similar. Following [6] and [19] we seek to approximate with some sketch by sampling and scaling rows according to probabilities that will be specified later. The number of rows in in that case equals the number of drawn samples. Clearly must have at least rows as otherwise the problem (16) will be under-determined and, due to the non-uniqueness of the solution, the error could become arbitrarily large. On the other hand, if around rows are sampled from a suitable distribution, then Drineas and Mahoney show that the resulting sketch is a good approximation with high probability. However, if substantially less than samples are drawn then the sketching induced error outweighs its computational benefits. In order to understand how this issue can be addressed we note that, if has full column-rank and thus the optimiser of (16) is unique, the solution of the sketched problem can be obtained by solving the linear system
[TABLE]
which is equivalent to solving
[TABLE]
From (17) it becomes clear that the sketching induced error can be regarded as an error on the right-hand side of the linear system (12) or the least squares problem (15). We can easily obtain a bound for the relative error given by
[TABLE]
A standard way of dealing with noise as in (17) is regularisation [18]. Suppose that there exists a low-dimensional subspace
[TABLE]
spanned by a basis of orthonormal functions arranged in the columns of matrix , and assume that is sufficient to approximate within some acceptable level of accuracy, in the sense of incurring a small subspace error . The orthogonal projection operator maps vectors from onto the subspace . Of course, such a subspace can’t accommodate all but rather only sufficiently regular . For that reason has to be constructed using prior information (e.g. degree of smoothness) about the solution. Orthogonality of ensures for any the existence of a unique, optimal low-dimensional vector satisfying
[TABLE]
In these conditions we can pose a projected-regularised least-squares problem replacing (15) by
[TABLE]
in order to improve the robustness of the solution against sketching-induced errors. The problem in (20) still involves high-dimensional quantities such as and , but the solution is unique as soon as and the null-space of have intersection. We start by introducing the low dimensional problem***We emphasise the contrast between the projected equations in (21) and the projected variable least squares problem
r^{\prime}=\arg\min_{r\in\mathbb{R}^{\rho}}\bigl{\|}A\Psi r-b\bigr{\|}^{2},
whose solution is
and incurs a subspace regression error term that is quadratic in . Moreover, note that the right hand side vector in the normal equations has dependence on the parameter through .
[TABLE]
A solution of (21) yields a solution of (20) because the columns of form an ONB of . In addition, we have the following.
Lemma 3.1**.**
If has full column rank and the columns of form an ONB of so that is the projection onto , then
[TABLE]
In particular, both problems have a unique solution.
Proof.
Both problems have unique solutions because is convex and has (by assumption) full column rank. Therefore it suffices to show that there exists an element that solves both problems. The solution of (21) can be found explicitly by solving the linear system
[TABLE]
We have used that has full column rank so that and is invertible. Similarly we may consider
[TABLE]
which produces solutions such that is a solution of the right-hand side of (22). Since and has full column rank we can write as
[TABLE]
We conclude that is a solution to both sides of (22) which completes the proof. ∎
The right hand side of (22) has a very natural interpretation and is obtained by embedding the rows of , the vector and the variable in using its low dimensional representation from the basis induced by the columns of . In view of Lemma 3.1 we may regularise the problem from (16) and obtain an embedded sketched counterpart to (20) as
[TABLE]
We argue that (23) is much more robust to the noise imparted by the approximation and produces solutions with controlled errors even if substantially less than suitably drawn samples are used for the approximation. In order to see why, notice that the problem (23) can be expressed in terms of the low-dimensional vector of coefficients
[TABLE]
so that . Recalling that , it is convenient to introduce
[TABLE]
together with their sketched approximations
[TABLE]
Lemma 3.2**.**
If has full column rank then the solution of the least-squares problem (24) is given by and we have
[TABLE]
where and are the solutions of (20) and (23) respectively.
Proof.
If has linearly independent columns then and the solution of (24) solves
[TABLE]
Again is invertible because has linearly independent columns and the first claim follows. The matrix is positive definite which implies that is positive definite and . The matrix has orthonormal columns which implies . Since we can use the formula we have just shown and obtain
[TABLE]
where the last identity is due to . ∎
In order to understand the effect of row sampling and why it can be a good approximation, we start by writing
[TABLE]
as a sum of outer products of rows. Introduce for some sample size the iid random indices taking values in with distribution
[TABLE]
for each and . Instead of (28) we may consider the sketch
[TABLE]
If we define the random matrix and the random diagonal matrix via
[TABLE]
then can put and construct the sketch as
[TABLE]
Lastly, we can write as well as for the sketches of and . A simple computation together with an application of the strong law of large numbers shows the following.
Proposition 3.3** (Lemma 3 and 4 in [DrineasMahoneyKannan]).**
Assume that the sampling probabilities satisfy the consistency condition
[TABLE]
In this case we have for the matrix as defined in (30) that and . As a consequence, almost surely for .
Proposition 3.3 summarises the asymptotic properties of the used sketch. The condition (33) is very mild and holds for a wide range of distributions such as sampling from scaled row norms or uniform sampling. The convergence rate of cannot be improved although the constant depends on the chosen probabilities . In other words, as long as we sample all non-zero rows with positive probability we will obtain a sketch that has good asymptotic properties when considered as an approximation for . However, in order to find good sampling probabilities we have to consider the non-asymptotic behaviour of the sketch. In fact, the main purpose of the regularisation/dimensionality reduction was to avoid situations where sampling a large number of rows is necessary. If , then the regularised problem (21) has substantially fewer degrees of freedom than the high dimensional formulation in (15). Consequently, the dependence of on the rows of is a lot smoother than the dependence of on . In other words, approximating by row sampling has a much smaller effect on the regularised solution than an approximation of with the same sample size would have on the solution of the full system (12). For example, a much smaller number of rows needs to be sampled to obtain the correct null-space which results in a full-rank approximation of . Note that, conditional on being invertible, in combination with Lemma 3.2 implies
[TABLE]
so the randomisation error of the regularised problem is entirely controlled by low dimensional structures. This property is the key to a small sketching error and thus to an overall accurate approximation when only few samples are drawn. Using the notation from before and letting be the singular value decomposition of , we can write the bound from (34) as
[TABLE]
From the above formulation it becomes apparent that the error will be small if the sketch is constructed such that in spectral norm. We argue that this is essentially equivalent to . Indeed, we have the following.
Lemma 3.4**.**
If then
[TABLE]
Proof.
Under the condition of the lemma we know that is invertible and that
[TABLE]
which implies the upper bound by considering the estimate
[TABLE]
Denote by the -th eigenvalue of . Then we may write
[TABLE]
where is the smallest eigenvalue. By assumption of the lemma
[TABLE]
which implies the claim after dividing by and taking the inverse. ∎
An approximation of can be obtained by sampling with probabilities that are proportional to the statistical leverage scores
[TABLE]
i.e. the row norms of the left singular vectors of [7]. At first sight it seems that taking sampling probabilities proportional to the leverage scores in (35) in order to obtain a sketch of (21) is very similar to using the leverage scores of to obtain (16) from (15) as was proposed by Drineas and Mahoney in [6] for a similar problem. A key difference is that is tall and dense while is sparse and thus is quite different to the initial stiffness matrix . Consequently, an interpretation of the leverage scores from (35) in terms of effective stiffness [1] is, to the best of our knowledge, not possible. The following Lemma will be useful for our further developments.
Lemma 3.5** ([21] section 6.4).**
Assume that is constructed as before with sampling probabilities satisfying
[TABLE]
for some . Then we have
[TABLE]
An important corollary of the above lemma is that a sketch which is constructed by sampling from leverage score probabilities will virtually always be invertible and therefore the sketched problem (24) has a unique solution. The following result states that this property is preserved even when the rows are re-weighted, an operation which changes the leverage scores.
Proposition 3.6**.**
Let be a diagonal matrix with positive entries, i.e. for each . Assume that the sketching matrix is constructed with sampling probabilities . For the scaled sketch we have
[TABLE]
Proof.
It is sufficient to show that
[TABLE]
because the probability bound follows immediately from
[TABLE]
after applying (37) from Lemma 3.5. The above matrices are always positive semi-definite and therefore invertibility is equivalent to positive definiteness. For any diagonal matrix it holds that where is a random diagonal matrix with entries . Thus for any we have
[TABLE]
Since has full column rank we know that corresponds to a change of basis and whenever . It follows that is positive definite if and only if is positive definite. As is a diagonal such that with probability , the latter is equivalent to being positive definite. The case of is covered by . ∎
Proposition 3.6 states that re-scaling of rows doesn’t affect the quality of the sketching matrix regarding its invertibility and after sampling rows the probability of the sketch being singular decays exponentially fast with each additional draw. In practice this makes knowledge of valuable because we only need to sample rows for some moderately large and obtain a sketch that is virtually never singular. On the other hand, we need at least samples so that there is any hope in obtaining a non-singular matrix. The remarkable thing about Proposition 3.6 is that the failure probability is independent of both, the inner dimension of the product as well as the scaling matrix and equivalent to the bound which could be obtained by sampling from . This suggests that a sketch which is constructed by drawing samples from is not too different compared to sampling from . This intuition is supported by the following result which describes the change in the leverage scores after re-weighting a single row.
Proposition 3.7** ([5] Lemma 5).**
Let be a diagonal matrix with and for each . Then
[TABLE]
and for
[TABLE]
where are the cross leverage scores.
Since has orthogonal columns, we have for any and thus the cross leverage scores from the above Lemma satisfy
[TABLE]
For a general diagonal matrix as in Proposition 3.6 we may without loss of generality assume that each entry lies in since we can divide the elements by their maximum. The re-weighting can thus be considered as a superposition of single row operations
[TABLE]
where the are as in Proposition 3.7. Since the commute we can apply them in any order without changing the outcome. Considering Lemma 3.5, if we could ensure that isn’t substantially smaller than then sampling from will produce good sketches for .
Large leverage scores
Equation (39) shows that the relative change of the -th leverage score after a re-weighting of the -th row shrinks when . In the extreme case when the re-weighting has no effect. In addition to this stability property it trivially holds that which suggests that large leverage scores are fairly stable when rows are re-weighted.
Small leverage scores
From Equation (40) we know that the increase of after re-weighting of row is proportional to . If the entries of the scaling matrix don’t vary too much, then (41) suggests that we can expect the total increase, i.e. after applying for each to be roughly of order . On the other hand, small are fairly sensitive to re-weighting of row since in that case. Thus we can expect that the re-weighting of row will counterbalance the effects from re-weighting the other rows. In addition, we know that
[TABLE]
Since large leverage scores will likely be quite stable and we would expect that not too many small leverage scores will become large.
So far we have discussed the projection of the high-dimensional system without providing explicit details on how the basis is selected. A desired property is to sustain a small projection error for all admissible parameter choices under the constraint . Suitable options include subsets of the right singular vectors of or orthogonalised Krylov-subspace bases [11], however these have to be computed for each individual parameter vector which can be detrimental to the speed of the solver. Alternatively, we opt for a generic basis exploiting the smoothness of on domains with smooth Lipschitz boundaries. A simple choice is to select the basis among the eigenvectors of the discrete Laplacian operator
[TABLE]
for Z^{2}_{\Delta}=\mathrm{diag}\bigl{(}[|\Omega_{1}|,\ldots,|\Omega_{k}|]\otimes 1_{d}\bigr{)}. From and splitting the eigenvectors as
[TABLE]
such that the columns of correspond to the last columns of , and respectively to the smallest eigenvalues . In effect, with constrained by the Dirichlet boundary conditions, the norm provides a measure of the smoothness of in the interior of . It is not difficult to see that this basis satisfies
[TABLE]
We remark that the computation of the basis is computationally very expensive for large , as the eigen-decomposition of is necessary, however this is only computed once, prior to the beginning of the simulation (offline stage) in an offline stage. After the matrix has been obtained we can compute the leverage scores . The Laplacian differs from a general stiffness matrix only by different diagonal weights, i.e. is replaced by the diagonal matrix Z^{2}=Z^{2}_{\Delta}\mathrm{diag}\bigl{[}(p_{1},\ldots,p_{k})\otimes 1_{d}\bigr{]} where the contain information about the parameter from (1). Propositions 3.6 and 3.7 along with the developments thereafter suggest that the Laplacian leverage scores can nonetheless be used to construct sketches of the projected matrix because the difference in the stiffness matrices is just a diagonal weighting.
4. Complexity and error analysis
Motivated by the developments from the previous sections we propose the following algorithm for computing solutions to a sequence of problem of the form (1). We assume that each problem is specified by its parameter vector for (see section 2.1).
The complexity and approximation error of Algorithm 1 are obviously linked. The more samples we draw the better we expect our solutions to be. Although the size of the reduced system matrix (and therefore its sketched counterpart as well) is independent of , the computational burden for building is higher when drawing more samples. More precisely, we need:
- •
operations in order to find . This is possible because is fixed and we can perform the necessary pre-processing offline [3].
- •
operations for computing the sampled indices and their frequencies as this requires a single loop through the set of initial samples.
- •
operation for assembling the diagonal matrices and .
- •
operations for computing . This can be achieved since computing requires multiplications and multiplications are enough for computing due to sparsity of .
- •
operations in order to build which corresponds to the cost of multiplication for dense matrices.
- •
operations for solving with a direct method.
The sketch will be singular if we draw distinct samples which means that building the sketch dominates the complexity of Algorithm 1. If the sampling probabilities are a good approximation in the sense that in Lemma 3.5 can be chosen close to , then we need samples in order to have a provably controlled error. The worst case, i.e. the the largest increase of , will be observed if for . A parameter corresponding to such a situation essentially renders the implementation of the classical Galerkin FEM problematic, as scales to , see Theorem 5.2 in [13] The following theorem summarises the findings of this section.
Theorem 4.1**.**
Let and is such that the sampling probabilities from Algorithm 1 satisfy (36), i.e.
[TABLE]
where . Let be the reduced system matrix corresponding to parameter and its condition number. For the choice Algorithm 1 requires operations and outputs, with probability exceeding , a vector that satisfies
[TABLE]
Proof.
As stated before, the complexity of Algorithm 1 is which immediately implies that it requires operations for a single query. It remains to prove the error bound. In view of (34) and the developments thereafter it follows, conditional on being invertible, that
[TABLE]
Since we only need to show that
[TABLE]
because is necessarily invertible on that event which implies validity of the estimates from before. But plugging the value for into (37) we obtain for any
[TABLE]
∎
Algorithm 1 is most attractive when we can tolerate an error somewhere between 1% to 10% in which case we can obtain the solution to a single query in about time. In practice the value for is unobtainable since it requires knowledge of the true leverage scores but considering Lemma 3.7 and the arguments thereafter, we expect that for a moderately large the required bound will hold for all but a few small leverage scores. The statement in Lemma 3.5 is rather pessimistic when there are few misaligned leverage scores since it requires a uniform bound. For practical purposes we expect that can be substituted with a small constant and we take which will ensure reglarity of the sketch. Up until now we have only considered the randomisation error of the sketched solution, i.e. we have analysed . However, the the total error of compared to the high dimensional solution of (12) has two components. If we decompose the process into two steps
[TABLE]
it becomes apparent that even with a perfect sketch, i.e. if we solved the noiseless projected problem (20) and (46) is negligible, we could still not achieve an error smaller than . The next result tells us that the error from (45) is close to the optimal one.
Theorem 4.2**.**
Let be the solution of (12) and be the optimum of (20). If is the condition number of the stiffness matrix and the projection ont , then
[TABLE]
Proof.
Recall that and . From the developments in Lemma 3.2 we know that . We may write as before so that and
[TABLE]
If we write and for the smallest and largest eigenvalues of , then it must hold that
[TABLE]
because has orthogonal columns. Indeed, if is the -dimensional unit sphere, then
[TABLE]
is obviously true. Since the columns of form an ONB of we have
[TABLE]
Thus, . Clearly we also have . Due to orthogonality we know that . Combining those estimates we obtain
[TABLE]
which yields the desired bound. ∎
If the subspace is such that the relative projection error is small, then the norm of will be similar to the norm of . More precisely,
[TABLE]
so that Theorem 4.1 applies to with a small -dependent constant. By combining the previous two theorems we obtain the following.
Corollary 4.3**.**
Let and assume that the assumptions of Theorem 4.1 are satisfied for . If is the solution of (12) and the subspace is such that
[TABLE]
for some . Then the total error of the solutions produced by Algorithm 1 satisfy the bound
[TABLE]
Proof.
We can start with the estimate
[TABLE]
Using the estimate from Theorem 4.2 we get
[TABLE]
It remains to bound the other term. Since has orthogonal columns we obtain from Theorem 4.1
[TABLE]
Since we have shown in the proof of Theorem 4.2 that
[TABLE]
we can estimate
[TABLE]
As before, we have used the fact that
[TABLE]
From it follows that
[TABLE]
which completes the proof. ∎
If we assume that , then the error estimate from Corollary 4.3 states, with small leading constants, that
[TABLE]
It therefore makes sense to have a sketching error that is of the same order as the projection error . In practice we found that projection errors of roughly 1% to 10% can be expected so that the sketching induced error isn’t very harmful if we choose the sample size as in Theorem 4.1 with .
5. Numerical results
To test the performance of Algorithm 1 we consider the finite element formulation of the elliptic equation (1) with homogeneous Dirichlet boundary conditions on and a forcing term derived from a piecewise constant approximation of the function
[TABLE]
We discretise the model on a spherical domain of unit radius comprising unstructured linear tetrahedral elements. This leads to a total nodes of which are situated in the interior of the domain. In these circumstances is a tall matrix with rows, the stiffness matrix has dimensions and the sample space is .
We seek to assess the practical performance of our algorithm in terms of its speed and accuracy in computing the sketched solution under various choices sampling budgets and low-dimensional subspaces, for the proposed sampling distribution. To achieve this we perform three benchmark tests involving realisations of (i) a uniformly distributed random parameter field, (ii) a smoothly varying lognormal random field, and (iii) a random field with jump discontinuities. For each of these we run a sequence of simulations, i.e. queries, and record timings and error measures on average. For each realisation we compute also the conventional FEM solution to provide a reference for comparison. The high-dimensional is computed using Matlab’s built-in A\b command [17], and the times provided include the efficient assembly of the full stiffness matrix as a triple product of sparse matrices . Our code was implemented in Matlab R2018b and executed on a workstation equipped with two 14-core Intel Xeon dual processors, running Linux NixOS with 384GB RAM.
In the offline phase of Algorithm 1 we form a low-dimensional ONB for the projection by computing the last eigenfunctions of the sparse Laplacian matrix discretised on . For this time consuming and memory demanding operation we have resorted to the svds and qr commands which avoid computing the complete spectrum or they produce a sparse ONB respectively. The computation of the sampling distribution based on the leverage scores of was also performed once during the offline phase and took about 4 hours, using the svd(,’econ’) command. The distribution was sampled with replacement during the online phase of the algorithm using the efficient command datasample, which indicatively, for the chosen , outputs a million samples in less than 0.3 s. Notice that although this sampling implementation is not independent of the dimension , there exist alternative schemes that can handle arbitrarily large distributions with constant complexity [3].
In the implementation of the algorithm we record the following quantities–diagnostics that provide evidence on the performance in the conditions of each benchmark: the ratio indicating how many of the rows of are used in the sketch, the relative subspace projection error , the upper bound of the randomisation error , the relative regression error , and the relative total error . In the context of real-time model prediction in manufacturing processes an upper limit of 10% for the total error is deemed reasonable.
5.1. Uniformly random parameter field
In this first instance we simulate sketched solutions for 100 parameter vectors drawn at random from \mathcal{U}\bigl{(}[10^{-1},10^{2}]\bigr{)}. Five sets of simulations were performed using ONBs incorporating the last singular functions of the Laplacian. Our focus was on monitoring the trade-off between accuracy and time consumption when iid samples are drawn from . The results are tabulated in table 1.
Although the values in vary over four orders of magnitude, the parameter has a homogeneous expectation within the domain and thus overall the algorithm yields sketched solutions at 10% or less total error, with only 100 basis functions. The results show that the sampling is highly non-uniform since even in the case where a million idd samples were taken these involved only 41074, a mere 6%, of the rows of . The sketching-induced error factor appears to reduce almost linearly with the number of samples . Comparing the relative subspace projection and total errors note that for the later is kept marginally larger than the former, which verifies the regularising effect of the projection on the sketching-induced noise. It is also important to see that in switching from to the projection error is halved to 0.03, however the number of samples necessary to yield the same levels of the error increases by about 5 times. For relative error tolerances around the 10% mark, the times recorded are below 1 s, while by comparison the time for computing was on average found to be 23.75 s.
The trade-off between speed and accuracy can be seen by comparing the results in the first and last rows of the table 1 where the algorithm achieves a 4% total error, when the projection error is at 3%, after five million samples. On the other hand, solutions within a 10% error margin, when the projection error is at 7%, are obtained in less than 0.5 s, which is 55 times faster than computing the standard . The speedup in sketching the more accurate solution with and million is still 7 times faster, compared to the FEM solver. The histograms in figure 1 provide a further insight on how the various error components vary within the ensemble of the 100 problems. We point out that the numerical results are in good agreement with the assertion of Theorem 4.1. For the example shown in figure 1, i.e. when and the error tolerance is , our theorem predicts samples which is consistent to the observed when . In the histograms we see that the sketching error virtually never exceeds and that exhibits the same pattern as which supports the claim that this quantity is driving the sketching error. Similar observations can be made for the other cases of table 1. Figure 1 also shows that, although their magnitude is comparable, the variability in the projection error is much smaller than that of the sketching error. This is not surprising as the sketching is an intrinsically random method while the differences in the projection are only due to perturbations in the parameter.
5.2. Smooth parameter field
In the second benchmark we turn our attention to parameter functions with smooth spatial variation like those encountered in the context of uncertainty quantification for PDEs [16]. As the anticipated FEM solution is smooth we maintain the bases used in 5.1. In this case, the parameter is a lognormal random field given by , where is a zero-mean Gaussian random field with Whittle-Matérn covariance function with smoothness parameter given by
[TABLE]
where is the Gamma function, is the weighted Euclidean norm with positive definite matrix and is the order modified Bessel function of the second kind. Here we use , and . We draw realisations of by calculating once the Karhunen-Loève expansion of and then drawing iid from .
The results presented in table 2 show a similar performance to the uniformly random case in subsection 5.1. The suitability of the low-dimensional subspace is evidenced by the 7% relative projection error attained at . Sketched solutions within an error tolerance of 10% were computed in less than 1 s. Further, note that the total error is within a 2% margin from the projection error, which demonstrates the effectiveness of our sketching regularisation approach, apart from the test with and where is considerably higher, implying that was insufficiently small for that test. This observation is consistent with our error bound in (4.1). Comparing the results for and shows that in the former case, although using half the number of basis functions and five times more samples, due to the larger projection error, the total error is still 1% larger than that of the later. The images presented in figure 2 correspond to one of the simulations in this benchmark with and million, illustrating a cross section of the profile of , the exact FEM solution, the sketched solution and the relative error between the two.
5.3. Non-smooth parameter field
A more challenging benchmark test is to consider the FEM solution for a parameter field with non-smooth variation. In this case it is natural to anticipate that any significant jump discontinuities in the profile of will have an adverse effect on the condition number of the stiffness matrix [13]. For our simulations we choose a piecewise constant approximation of the positive function
[TABLE]
which is discontinuous along the three axes. The sign function is given by when and . In constructing the projection subspace we found that the smooth basis utilised in the previous cases was not appropriate to this case and we thus resorted in a sparse ONB taking a subset of the columns of the sparse unitary matrix computed from the QR decomposition of the Laplacian.
The results in table 3 suggest that the chosen basis is not very appropriate since not only the number of basis functions is substantially larger, but also the reduction in the projection error for a 100% increase in is quiet marginal. In turn, this increase in the dimension of affects the level of sketching error, as even with million samples . Consequently, this has a profound effect on timings, although the sketched approach maintains a five fold advantage to the standard FEM solver. For the tests for and notice that increasing the samples by five times does not yield a significant improvement in the results, which is likely triggered by the large in the error term of Theorem 4.2 which causes the to grow.
6. Conclusions
We have considered expediting the solution of the finite element method equations arising from the discretisation of elliptic PDEs on high-dimensional models. Taking into consideration the multi-query context and the smooth profile of the FEM solution, we proposed a practical sketch-based algorithm that involves projection onto lower-dimensional subspace and sketching using a generic, sampling distribution derived from the leverage scores of a tall matrix associated with the Laplacian operator. We have elaborated on the impact of the projection in reducing the dimensionality as well as mitigating the effects of sketching noise. The performance of our method was evaluated in a series of benchmark tests of FEM simulations that demonstrated substantial speed improvements at the cost of a small compromise in accuracy when the stiffness matrix is well conditioned.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] Avron, H., and Toledo, S. Effective Stiffness: Generalizing Effective Resistance Sampling to Finite Element Matrices . Ar Xiv, oct 2011.
- 2[2] Bertsekas, D. P., and Yu, H. Journal of Computational and Applied Projected equation methods for approximate solution of large linear systems. Journal of Computational and Applied Mathematics 227 , 1 (2009), 27–50.
- 3[3] Bringmann, K., and Panagiotou, K. Efficient sampling methods for discrete distributions. Algorithmica 79 , 2 (Oct 2017), 484–508.
- 4[4] Calvetti, D., Dunlop, M., Somersalo, E., and Stuart, A. Iterative updating of model error for Bayesian inversion. Inverse Problems 34 , 2 (feb 2018), 025008.
- 5[5] Cohen, M. B., Lee, Y. T., Musco, C., Musco, C., Peng, R., and Sidford, A. Uniform sampling for matrix approximation. In Proceedings of the 2015 Conference on Innovations in Theoretical Computer Science (New York, NY, USA, 2015), ITCS ’15, ACM, pp. 181–190.
- 6[6] Drineas, P., and Mahoney, M. W. Effective Resistances, Statistical Leverage, and Applications to Linear Equation Solving . Ar Xiv, may 2010.
- 7[7] Drineas P., Magdon-Ismail M., M. M., and D., W. Fast approximation of matrix coherence and statistical leverage. Journal of Machine Learning Research 13 , 1 (2012), 3441–3472.
- 8[8] Elman, H., Silvester, D., and Wathen, A. Finite Elements and Fast Iterative Solvers , 2nd ed. Oxford University Press, 2014.
