Complexity estimates for triangular hierarchical matrix algorithms
Steffen B\"orm

TL;DR
This paper provides theoretical complexity estimates for hierarchical matrix algorithms, demonstrating that LR factorizations are computationally efficient and comparable to matrix multiplication, thereby supporting their practical advantages in solving integral and differential equations.
Contribution
It offers the first theoretical complexity bounds for $ ext{H}$-matrix LR factorizations, inversion, and multiplication, confirming their efficiency over direct inversion.
Findings
LR factorization requires no more operations than matrix multiplication.
An improved upper bound for the complexity of matrix multiplication is established.
Theoretical estimates support the efficiency of $ ext{H}$-matrix algorithms in practical applications.
Abstract
Triangular factorizations are an important tool for solving integral equations and partial differential equations with hierarchical matrices (-matrices). Experiments show that using an -matrix LR factorization to solve a system of linear questions is superior to direct inversion both with respect to accuracy and efficiency, but so far theoretical estimates quantifying these advantages were missing. Due to a lack of symmetry in -matrix algorithms, we cannot hope to prove that the LR factorization takes one third of the operations of the inversion or the matrix multiplication, as in standard linear algebra. We can, however, prove that the LR factorization together with two other operations of similar complexity, i.e., the inversion and multiplication of triangular matrices, requires not more operations than the matrix multiplication. We can…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsElectromagnetic Scattering and Analysis · Electromagnetic Simulation and Numerical Methods · Matrix Theory and Algorithms
Complexity estimates for triangular hierarchical matrix
algorithms
Steffen Börm
Abstract
Triangular factorizations are an important tool for solving integral equations and partial differential equations with hierarchical matrices (-matrices).
Experiments show that using an -matrix LR factorization to solve a system of linear questions is superior to direct inversion both with respect to accuracy and efficiency, but so far theoretical estimates quantifying these advantages were missing.
Due to a lack of symmetry in -matrix algorithms, we cannot hope to prove that the LR factorization takes one third of the operations of the inversion or the matrix multiplication, as in standard linear algebra. We can, however, prove that the LR factorization together with two other operations of similar complexity, i.e., the inversion and multiplication of triangular matrices, requires not more operations than the matrix multiplication.
We can complete the estimates by proving an improved upper bound for the complexity of the matrix multiplication, designed for recently introduced variants of classical -matrices.
1 Introduction
Hierarchical matrices [21, 16], -matrices for short, can be used to approximate certain densely populated matrices arising in the context of integral equations [5, 8, 9] and elliptic partial differential equations [6, 11] in linear-polylogarithmic complexity. Compared to other methods like fast multipole expansions [19, 20] or wavelet approximations [7, 10], it is possible to approximate arithmetic operations like the matrix multiplication, inversion, or triangular factorization for -matrices in linear-polylogarithmic complexity. This property makes -matrices attractive for a variety of applications, starting with solving partial differential equations [6, 11] and integral equations [12], up to dealing with matrix equations [15, 17, 4, 3] and evaluating matrix functions [13, 14].
Already the first articles on -matrix techniques introduced an algorithm for approximating the inverse of an -matrix by recursively applying a block representation [21, 16]. This approach works well, but is quite time-consuming.
The situation improved significantly when Lintner and Grasedyck introduced an efficient algorithm for approximating the LR factorization of an -matrix [23, 18], reducing the computational work by a large factor and simultaneously considerably improving the accuracy. It is fairly easy to prove that the -LR or -Cholesky factorization requires less computational work than the -matrix multiplication or inversion, and for the latter operations linear-polylogarithmic complexity bounds have been known for years [16].
For dense matrices in standard array representation, we know that a straightforward implementation of the LR factorization requires
[TABLE]
Our first goal is to prove that this statement also holds for -matrices with (almost) arbitrary block trees, i.e., that the operations appearing in the factorization, triangular inversion, and multiplication fit together like the parts of a jigsaw puzzle corresponding to the -matrix multiplication. Incidentally, combining the three algorithms also allows us to compute the approximate -matrix inverse in place without the need for a separate output matrix.
In order to complete the complexity analysis, we also have to show that the -matrix multiplication has linear-polylogarithmic complexity. This is already known [16, 22], but can find an improved estimate that reduces the impact of the sparsity of the block tree and therefore may be interesting for recently developed versions of -matrices, e.g., MBLR-matrices, that use a denser block tree to improve the potential for parallelization [1, 2].
2 Definitions
The blockwise low-rank structure of -matrices is conveniently described by the cluster tree, a hierarchical subdivision of an index set into disjoint subsets called clusters, and the block tree, a hierarchical subdivision of a product index set into subsets constructed from these clusters.
Definition 1** (Cluster tree)**
Let be a finite index set. A tree is a cluster tree for this index set if each node is labeled with a subset and if these subsets satisfy the following conditions:
- •
The root of is labeled with .
- •
If has sons, the label of is the union of the labels of the sons, i.e., .
- •
The labels of sons of are disjoint, i.e., for and with , we have .
The nodes of a cluster tree are called clusters. The set of leaves is denoted by .
Definition 2** (Block tree)**
Let be a cluster tree for an index set . A tree is a block tree for this cluster tree if
- •
For each node there are cluster with . is called the row cluster for and the column cluster.
- •
If is the root of , the root of is .
- •
If has sons, they are pairs of the sons of and , i.e., .
The nodes of a block tree are called blocks. The set of leaves is denoted by .
We can see that the labels of the leaves of a cluster tree correspond to a disjoint partition of the index set and that the sets with correspond to a disjoint partition of the index set , i.e., of a decomposition of a matrix into submatrices.
Among the leaf blocks , we identify those that correspond to submatrices that we expect to have low numerical rank. These blocks are called admissible and collected in the set . The remaining leaves are called inadmissible and collected in the set .
We note that there are efficient algorithms at our disposal for constructing cluster and block trees for various applications [16, 22].
Definition 3** (Hierarchical matrix)**
Let be a cluster tree for an index set , and let be a block tree for with sets and of admissible and inadmissible leaves. A matrix is a hierarchical matrix (or short -matrix) of local rank if
[TABLE]
i.e., if all admissible leaves have a rank smaller or equal to .
If is a hierarchical matrix, we can find matrices and for every admissible leaf such that
[TABLE]
Here we use the shorthand notation for the set of matrices with row indices in and column indices in .
For inadmissible leaves , we store the nearfield matrices directly. The matrix families , , and together represent an -matrix .
3 Basic -matrix operations
Before we consider algorithms for triangular -matrices, we have to recall the algorithms they are based on: the -matrix-vector multiplication, the -matrix low-rank update, and the -matrix multiplication.
The multiplication an -matrix with multiple vectors collected in the columns of a matrix can be split into updates
[TABLE]
where denotes the restriction of to the row indices in and is a scaling factor. For inadmissible leaves, the update can be carried out directly, taking care to minimize the computational work by ensuring that the scaling by is applied to if and to the product otherwise.
For admissible leaves we have and can first compute the intermediate matrix and then add to the output, i.e.,
[TABLE]
For non-leaf blocks , we recursively consider sons until we arrive at leaves. In total, the number of operations for (3) is equal to
[TABLE]
for all . The multiplication by the transposed -matrix , i.e., updates of the form
[TABLE]
can be handled simultaneously and also requires operations. In the following, we assume that procedures “addeval” and “addevaltrans” for the operations (3) and (5) are at our disposal.
The low-rank update of an -matrix , i.e., the approximation of
[TABLE]
for a block , and is realized by recursively moving to the leaves of the block tree and performing a direct update for inadmissible leaves and a truncated update for admissible ones: if , we have and approximate
[TABLE]
by computing the thin Householder factorization , a low-rank approximation of , so that yields a low-rank approximation . Assuming that the Householder factorization and the low-rank approximation of matrices require operations, we find a constant such that the number of operations for a low-rank update (6) is bounded by
[TABLE]
for all . In the following, we assume that a procedure “update” for approximating the operation (6) in this way is available.
During the course of the -matrix multiplication, we may have to split a low-rank matrix into submatrices, perform updates to these submatrices, and then merge them into a larger low-rank matrix. This task can be handled essentially like the update, but we have to take special care in case that the number of submatrices is large. Let with and . We are looking for an approximation of the matrix
[TABLE]
where we assume that preliminary compression steps have ensured .
In a first step, we compute thin Householder factorizations with for all . Due to our assumption and , this requires operations. Now we form the reduced matrix
[TABLE]
Once we have found a rank- approximation with and an isometric matrix , applying the Householder reflections yields the rank- approximation
[TABLE]
of the original matrix in operations. To construct a rank- approximation of , we can proceed sequentially: first we use techniques like the singular value decomposition or rank-revealing QR factorization to obtain a rank- approximation
[TABLE]
with and an isometric matrix . Due to our assumptions, this task can be accomplished in operations. We find
[TABLE]
where denotes the -dimensional identity matrix. The left factor now has only columns, and repeating the procedure times yields
[TABLE]
Since the matrices have only columns and rows by construction, we can compute
[TABLE]
in operations and find the desired low-rank approximation. We conclude that there is a constant such that not more than operations are needed to merge the row blocks. We can apply the same procedure to merge column blocks, as well, and see that
[TABLE]
operations are sufficient to merge submatrices for all the sons of a block , where . In the following, we assume that a procedure “merge” for this task is available.
Finally, the -matrix multiplication algorithm carries out the approximate update with and , again by recursively considering sons of the blocks until one of them is a leaf, so the product is of low rank and can be computed using the functions “addeval” and “addevaltrans”. The functions “update” and “merge” can then be used to add the result to , performing low-rank truncations if necessary. The algorithm is summarized in Figure 1.
In order to keep the notation short, we introduce the local rank of leaf blocks by
[TABLE]
We can see that the multiplication algorithm in Figure 1 performs matrix-vector multiplications for matrices with columns if and for matrices with columns if , followed by a low-rank update.
We obtain the bound
[TABLE]
for the computational work required by the algorithm “addmul” in Figure 1 called with , where we include the work for merging submatrices in all non-leaf cases for the sake of simplicity.
4 Algorithms for triangular -matrices
In the context of algorithms for hierarchical matrices, we assume triangular matrices to be compatible with the structure of the cluster tree , i.e., if we have two sons of a cluster and if there are indices and with , all indices in are smaller than all indices in . For this property, we use the shorthand .
While this may appear to be a significant restriction at first glance, it rarely poses problems in practice, since we can define the order of the indices in the index set to satisfy our condition by simply choosing an arbitrary order on the indices in leaf clusters and an arbitrary order on the sons of non-leaf clusters. By induction, these orders give rise to a global order on satisfying our requirements.
We are mainly interested in three operations: the construction of an LR factorization , i.e., the decomposition of into a left lower triangular matrix with unit diagonal and a right upper triangular matrix , the inversion of the triangular matrices, and the multiplication of triangular matrices. Together, these three operations allow us to overwrite a matrix with its inverse.
For the sake of simplicity, we assume that we are working with a binary cluster tree , i.e., a cluster is either a leaf or has exactly two sons . In the latter case, we can split triangular matrices into submatrices
[TABLE]
to obtain
[TABLE]
We also assume that diagonal blocks, i.e., blocks of the form with , are never admissible.
Forward and backward substitution
In order to use an LR factorization as a solver, we have to be able to solve systems , , , and . The third equation can be reduced to the second by taking the adjoint, and the fourth equation similarly reduces to the first. We consider the more general tasks of solving
[TABLE]
for an arbitrary cluster .
We first address the case that and are matrices in standard representation, i.e., that no low-rank approximations are required.
If is a leaf, and are a standard matrices and we can solve the equations by standard forward and backward substitution.
If is not a leaf, the first equation takes the form
[TABLE]
so we can solve by recursion, overwrite by using the algorithm “addeval”, and solve by recursion.
The second equation takes the form
[TABLE]
so we can solve by recursion, overwrite by using the algorithm “addeval” again, and solve by recursion. The algorithms are summarized in Figure 2. Counterparts “lsolvetrans” and “rsolvetrans” for the adjoint matrices and can be defined in a similar fashion using “addevaltrans” instead of “addeval”.
In order to construct the LR factorization, we will also have to solve the systems and with -matrices and , and this requires some modifications to the algorithms: we consider the systems
[TABLE]
for blocks . On one hand, we can take advantage of the low-rank structure if holds, i.e., if is an admissible leaf. In this case, we have with and , and with the solution of the linear system
[TABLE]
we find that solves
[TABLE]
This property allows us to handle admissible blocks very efficiently.
On the other hand, we cannot expect to be able to perform the update exactly, since we want to preserve the -matrix structure of , so we have to use “addmul” instead of “addeval”, approximating the intermediate result with the given accuracy.
In order to keep the implementation simple, it also makes sense to follow the structure of the block tree: if we switch to the sons of , we should also switch to the sons of , if it still has sons. The resulting algorithms are summarized in Figure 3.
We also require counterparts for the systems
[TABLE]
for blocks . These can be constructed along the same lines as before and are summarized in Figure 4.
LR factorization
Now we have the necessary tools at our disposal to address the LR factorization. Given an -matrix , we consider computing a lower triangular matrix with unit diagonal and an upper triangular matrix with
[TABLE]
for a cluster . If is a leaf, is given in standard array representation and we can compute the LR factorization by the usual algorithms.
If is not a leaf, we follow (9) and define
[TABLE]
Our equation takes the form
[TABLE]
We can compute the LR factorization of by first computing the factorization by recursion, followed by solving with “rrsolve” and with “llsolve”, computing the Schur complement approximately with “addmul”, and finding the LR factorization , again by recursion. The resulting algorithm is summarized in Figure 5, where is overwritten by the Schur complement .
Triangular inversion
The next operation to consider is the inversion of triangular matrices, i.e., we are looking for and . As before, we consider the inversion of submatrices and for a cluster .
Again, if is a leaf, the matrices and are given in standard representation and can be inverted by standard algorithms. If has sons, the inverses can be written as
[TABLE]
so we can compute the off-diagonal blocks by calling “llsolve” and “lrsolve” in the first case and “rlsolve” and “rrsolve” in the second case, and then invert the diagonal blocks by recursion. The algorithms are summarized in Figure 6.
Triangular matrix multiplication
Finally, having and at our disposal, we consider computing the inverse
[TABLE]
As in the previous cases, recursion leads to sub-problems of the form
[TABLE]
for clusters . If is a leaf, we can compute the product directly.
Otherwise, the equation takes the form
[TABLE]
We can see that we have to compute products and that are of the same kind as the original problem and can be handled by recursion. We also have to compute the product , which can be accomplished by “addmul”.
Finally, we have to compute and , i.e., products of triangular and non-triangular -matrices. In order to handle this task, we could introduce suitable counterparts of the algorithms “rlsolve” and “lrsolve” that multiply by a triangular matrix instead of by its inverse. These algorithms would in turn require counterparts of “lsolve” and “rsolve”, i.e., we would have to introduce four more algorithms.
To keep this article short, another approach can be used: Since we have and , we can evaluate the product by the algorithm “lrsolve” and by the algorithm “rlsolve” without the need for additional algorithms. The result is summarized in Figure 7.
Remark 4** (In-place operation)**
All algorithms for triangular matrices introduced in this section can overwrite input variables with the result: for triangular solves, the right-hand side can be overwritten with the result, for the LR factorization, the lower and upper triangular parts of the input matrix can be overwritten with the triangular factors, the triangular inversion algorithms can overwrite the input matrices with the inverses.
If we are only interested in computing the inverse, we can interleave the algorithm “lrinvert” with “linvert” and “rinvert” to avoid additional storage for the intermediate results and : once the LR factorization is available, we first overwrite the off-diagonal blocks and by the intermediate results and , then overwrite the first diagonal block recursively by its inverse, add the product , then overwrite the no longer required off-diagonal blocks with and . A recursive call to compute the inverse of the second diagonal block completes the algorithm.
5 Complexity estimates for combined operations
Due to the lack of symmetry introduced by the low-rank approximation steps required to compute an -matrix, we cannot prove that the LR factorization requires one third of the work of the matrix multiplication. We can, however, prove that the LR factorization , the inversion of the triangular factors, and the multiplication together require not more work than the matrix multiplication.
Before we can consider the -matrix case, we recall the corresponding estimates for standard matrices, cf. (1): the LR factorization, triangular matrix inversion, and multiplication require
[TABLE]
By adding the estimates for the four parts of the inversion algorithm, we obtain a computational cost of
[TABLE]
i.e., inverting requires less operations than multiplying the matrix by itself and adding the result to a matrix. We aim to obtain a similar result for -matrices.
We assume that the block tree is admissible, i.e., that a leaf of the block tree is either admissible or has a leaf of either as row or column cluster:
[TABLE]
and we assume that there is a constant such that
[TABLE]
The constant is called the resolution (sometimes also the leaf size) of . Both properties (10) and (11) can be ensured during the construction of the cluster tree.
Now let us consider the number of operations for the algorithms “lsolve” and “rsolve” given in Figure 2. If is a leaf, solving the linear systems requires operations. Otherwise, we just use “addeval” and recursive calls and arrive at the recurrence formulas
[TABLE]
that give bounds for the number of operations required by “lsolve” and “rsolve”, respectively, where again denotes the columns of the matrices and .
Lemma 5** (Solving linear systems)**
We have
[TABLE]
Proof 5.6**.**
By structural induction.
Let . We have and therefore
[TABLE]
Let now be such that our claim holds for the sons and . We have
[TABLE]
In the next step, we compare the computational work for the -matrix multiplication with that for the combination of the algorithms “llsolve” and “rlsolve” or “lrsolve” and “rrsolve”, respectively.
The computational work for the forward substitution algorithm “llsolve” given in Figure 3 can be bounded by
[TABLE]
for all , while we get
[TABLE]
for all for the algorithm “rlsolve”.
Lemma 5.7** (Forward and backward solves).**
We have
[TABLE]
Proof 5.8**.**
By structural induction, where the base case is split into two sub-cases for admissible and inadmissible leaves.
Case 1:* Let . If holds, we have and*
[TABLE]
Otherwise, i.e., if , we have , and Lemma 5 yields
[TABLE]
Case 2:* Let . If holds, we have and*
[TABLE]
Otherwise, i.e., if , we have and therefore . Due to (11), this means , and we can use and Lemma 5 to obtain
[TABLE]
Case 3:* Let be such that our claim holds for all sons of . Since is not a leaf, we have and therefore also . This implies*
[TABLE]
and our proof is complete.
Now we consider the two algorithms “lrsolve” and “rrsolve”. They rely on “lsolvetrans” and “rsolvetrans”, and these algorithms require the same work as “lsolve” and “rsolve”, respectively. The work for “lrsolve” and “rrsolve” is then bounded by
[TABLE]
for all . Proceeding as in the proof of Lemma 5.7 leads us to
[TABLE]
Now that the fundamental statements for the forward and backward subsitution algorithms are at our disposal, we can consider the factorization and inversion algorithms. We directly obtain the bounds
[TABLE]
for all clusters and the algorithms “lrdecomp”, “linvert”, “rinvert”, and “lrinvert”, respectively.
Theorem 5.9** (Combined complexity).**
We have
[TABLE]
Proof 5.10**.**
By structural induction. We start with the base case and observe
[TABLE]
Now let be chosen such that the estimate holds for all of its sons. We obtain
[TABLE]
where we have used Lemma 5.7 in the next-to-last estimate.
6 Complexity of the -matrix
multiplication
We have seen that the number of operations for our algorithms can be bounded by the number of operations required by the matrix multiplication. In order to complete the analysis, we derive a bound for that is sharper than the standard results provided in [16, 22]. We rely on the following assumptions:
- •
for an inadmissible leaf of the block tree, we have either or ,
- •
there is a constant , e.g., the resolution introduced in (11), such that
[TABLE]
- •
there is a constant such that
[TABLE]
i.e., is an upper bound for the depth of the cluster tree,
- •
the block tree is sparse, i.e., there is a constant such that
[TABLE]
We denote the maximal rank of leaf blocks by
[TABLE]
In order to facilitate working with sums involving clusters and blocks, we introduce the sets of descendants of clusters and blocks by
[TABLE]
for all . The special case will be important when dealing with products of hierarchical matrices that are added to a part of a low-rank submatrix.
Since the matrix multiplication relies on the matrix-vector multiplication, we start by deriving a bound for introduced in (4). If , our first assumption yields or . In the first case, the second assumption gives us and
[TABLE]
In the second case, we have and obtain
[TABLE]
Now we can combine the estimates for the inadmissible leaves with those for the admissible ones to find
[TABLE]
for all . A straightforward induction yields
[TABLE]
Next we consider the low-rank update and look for an upper bound for introduced in (7). Using the same arguments as before, we find
[TABLE]
for all , and introducing yields the upper bound
[TABLE]
for all due to . A straightforward induction, keeping in mind the special case of for , leads to the estimate
[TABLE]
Now we can investigate the matrix multiplication, i.e., we can look for an upper bound for introduced in (8). While the computational work for the matrix-vector multiplication and the update depends only on two clusters , the matrix multiplication depends on three . We can collect these triples in a special product tree that represents the recursive structure of the algorithm.
Definition 6.11** (Product tree).**
Given a cluster tree and a corresponding block tree , the product tree is the minimal tree satisfying the following conditions:
- •
For every node of the product tree, there are clusters with .
- •
Let be the root of . Then is the root of .
- •
A node is a leaf if and only if or . Otherwise, its sons are given by .
Due to this definition, implies and .
Let . If , we can apply the estimates (13) and (14) to obtain
[TABLE]
If , we get
[TABLE]
Otherwise, we have
[TABLE]
In order to get rid of the recursion, we define the set of descendants for every triple as before and obtain
[TABLE]
We can investigate each of these terms separately. First we notice that Definition 1 implies that all index sets corresponding to clusters on the same level of are disjoint, and we have
[TABLE]
for all . Next we notice that Definition 2 and the sparsity assumption (12) together with (16) yield
[TABLE]
for all . Finally we observe that Definition 6.11 ensures that implies both and , so that we can again use the sparsity assumption (12) together with (17) to get
[TABLE]
for all . With these preliminary estimates at our disposal, we can consider the sums appearing in (15).
For (15a), we can take advantage of the fact that due to (12), for every , there are at most clusters such that , so we find
[TABLE]
By Definition 2, the depth of the block tree is bounded by the depth of the cluster tree , and therefore by the constant introduced in our assumptions. Since every block has at most one father, a block cannot have more than predecessors , and we can use (17) to get
[TABLE]
We can proceed in a similar manner for (15b) to get
[TABLE]
The sum (15d) can be handled using (18) to get
[TABLE]
This leaves us with only (15c). Here, we have to distinguish two cases: if , we can proceed as in the first two cases to find
[TABLE]
where we have used (17) in the last step. If , i.e., if it has been necessary to split an admissible leaf block temporarily, the definition of yields
[TABLE]
using (18) in the last step. Collecting all parts of the sum (15) yields our final result.
Theorem 6.12** (Matrix multiplication).**
Let . We have
[TABLE]
with .
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] P. Amestoy, C. Ashcraft, O. Boiteau, A. Buttari, J.-Y. L’Excellent, and Clément Weisbecker. Improving multifrontal methods by means of block low-rank representations. SIAM J. Sci. Comput. , 37(3):A 1451–A 1474, 2015.
- 2[2] P. Amestoy, A. Buttari, J.-Y. L’Excellent, and T. Mary. On the complexity of the block low-rank multifrontal factorizations. SIAM J. Sci. Comput. , 39(4):A 1710–A 1740, 2017.
- 3[3] U. Baur. Low rank solution of data-sparse Sylvester equations. Numer. Lin. Alg. Appl. , 15:837–851, 2008.
- 4[4] U. Baur and P. Benner. Factorized solution of Lyapunov equations based on hierarchical matrix arithmetic. Computing , 78(3):211–234, 2006.
- 5[5] M. Bebendorf. Approximation of boundary element matrices. Numer. Math. , 86(4):565–589, 2000.
- 6[6] M. Bebendorf and W. Hackbusch. Existence of ℋ ℋ {\mathcal{H}} -matrix approximants to the inverse FE-matrix of elliptic operators with L ∞ superscript 𝐿 L^{\infty} -coefficients. Numer. Math. , 95:1–28, 2003.
- 7[7] G. Beylkin, R. Coifman, and V. Rokhlin. The fast wavelet transform and numerical algorithms. Comm. Pure and Appl. Math. , 44:141–183, 1991.
- 8[8] S. Börm and L. Grasedyck. Low-rank approximation of integral operators by interpolation. Computing , 72:325–332, 2004.
