Complexity estimates for triangular hierarchical matrix algorithms

Steffen Börm

arXiv:1905.10824·math.NA·May 28, 2019


TL;DR

This paper provides theoretical complexity estimates for hierarchical matrix algorithms, demonstrating that LR factorizations are computationally efficient and comparable to matrix multiplication, thereby supporting their practical advantages in solving integral and differential equations.

Contribution

It offers the first theoretical complexity bounds for $\mathcal{H}$-matrix LR factorizations, inversion, and multiplication, confirming their efficiency over direct inversion.

Findings

01 LR factorization requires no more operations than matrix multiplication.

02 An improved upper bound for the complexity of matrix multiplication is established.

03 Theoretical estimates support the efficiency of $\mathcal{H}$-matrix algorithms in practical applications.


Equations

$$\sum_{\ell=1}^{n}(n-\ell)+2(n-\ell)^{2}$$

$$\sum_{\ell=1}^{n}1+(n-\ell)+(n-\ell)^{2}$$

$$\sum_{\ell=1}^{n}(n-\ell)+2(n-\ell)^{2}$$

$$\frac{n}{6}(12n^{2}-6n+6)\leq 2n^{3}\text{ operations,}$$

$$\operatorname{rank}(G|_{\hat{t}\times\hat{s}})$$

$$G|_{\hat{t}\times\hat{s}}=A_{ts}B_{ts}^{*}.$$

$$X|_{\hat{t}}$$

$$W_{\text{ev}}(t,s,\ell)$$

$$X|_{\hat{s}}$$

$$G|_{\hat{t}\times\hat{s}}\leftarrow G|_{\hat{t}\times\hat{s}}+AB^{*}$$

$$G|_{\hat{t}\times\hat{s}}$$

$$W_{\text{up}}(t,s,\ell)$$

$$\widehat{G}:=\begin{pmatrix}A_{1}R_{1}^{*}&\ldots&A_{m}R_{m}^{*}\end{pmatrix}\in\mathbb{R}^{\hat{t}\times(km)}.$$

$$G|_{\hat{t}\times\hat{s}}=\widehat{G}\begin{pmatrix}Q_{1}^{*}&&\\&\ddots&\\&&Q_{m}^{*}\end{pmatrix}\approx\widehat{A}\widehat{Q}^{*}\begin{pmatrix}Q_{1}^{*}&&\\&\ddots&\\&&Q_{m}^{*}\end{pmatrix}$$

$$\widehat{A}_{m-1}\widehat{Q}_{m-1}^{*}\approx\begin{pmatrix}A_{m-1}R_{m-1}^{*}&A_{m}R_{m}^{*}\end{pmatrix}$$

$$\widehat{G}$$

$$=\begin{pmatrix}A_{1}R_{1}^{*}&\ldots&\widehat{A}_{m-1}\end{pmatrix}\begin{pmatrix}I_{(m-2)k}&\\&\widehat{Q}_{m-1}^{*}\end{pmatrix},$$

$$\widehat{G}\approx\widehat{A}_{1}\widehat{Q}_{1}^{*}\begin{pmatrix}I_{k}&\\&\widehat{Q}_{2}^{*}\end{pmatrix}\begin{pmatrix}I_{2k}&\\&\widehat{Q}_{3}^{*}\end{pmatrix}\cdots\begin{pmatrix}I_{(m-2)k}&\\&\widehat{Q}_{m-1}^{*}\end{pmatrix}.$$

$$\widehat{Q}:=\begin{pmatrix}I_{(m-2)k}&\\&\widehat{Q}_{m-1}\end{pmatrix}\begin{pmatrix}I_{(m-3)k}&\\&\widehat{Q}_{m-2}\end{pmatrix}\cdots\begin{pmatrix}I_{k}&\\&\widehat{Q}_{2}\end{pmatrix}\widehat{Q}_{1}$$

$$C_{\text{mg}}^{\prime}k^{2}\bigl(|\hat{t}|\,|\operatorname{sons}(s)|+|\hat{s}|\,(2|\operatorname{sons}(t)|-1)\bigr)\leq C_{\text{mg}}k^{2}\sum_{\substack{t^{\prime}\in\operatorname{sons}(t)\\ s^{\prime}\in\operatorname{sons}(s)}}\bigl(|\hat{t}^{\prime}|+|\hat{s}^{\prime}|\bigr)$$

$$k_{ts}$$

$$W_{\text{mm}}(t,s,r)$$

$$L_{\nu\mu}$$

$$L|_{\hat{t}\times\hat{t}}$$

$$L|_{\hat{t}\times\hat{t}}X|_{\hat{t}}$$

$$\begin{pmatrix}Y|_{\hat{t}_{1}}\\Y|_{\hat{t}_{2}}\end{pmatrix}=Y|_{\hat{t}}=L|_{\hat{t}\times\hat{t}}X|_{\hat{t}}=\begin{pmatrix}L_{11}&\\L_{21}&L_{22}\end{pmatrix}\begin{pmatrix}X|_{\hat{t}_{1}}\\X|_{\hat{t}_{2}}\end{pmatrix}=\begin{pmatrix}L_{11}X|_{\hat{t}_{1}}\\L_{21}X|_{\hat{t}_{1}}+L_{22}X|_{\hat{t}_{2}}\end{pmatrix},$$

$$\begin{pmatrix}Y|_{\hat{t}_{1}}\\Y|_{\hat{t}_{2}}\end{pmatrix}=Y|_{\hat{t}}=R|_{\hat{t}\times\hat{t}}X|_{\hat{t}}=\begin{pmatrix}R_{11}&R_{12}\\&R_{22}\end{pmatrix}\begin{pmatrix}X|_{\hat{t}_{1}}\\X|_{\hat{t}_{2}}\end{pmatrix}=\begin{pmatrix}R_{11}X|_{\hat{t}_{1}}+R_{12}X|_{\hat{t}_{2}}\\R_{22}X|_{\hat{t}_{2}}\end{pmatrix},$$

$$L|_{\hat{t}\times\hat{t}}X|_{\hat{t}\times\hat{s}}$$

$$L|_{\hat{t}\times\hat{t}}A_{X,ts}=A_{Y,ts},$$

$$L|_{\hat{t}\times\hat{t}}X|_{\hat{t}\times\hat{s}}=L|_{\hat{t}\times\hat{t}}A_{X,ts}B_{Y,ts}^{*}=A_{Y,ts}B_{Y,ts}^{*}=Y|_{\hat{t}\times\hat{s}}.$$


Taxonomy

Topics: Electromagnetic Scattering and Analysis · Electromagnetic Simulation and Numerical Methods · Matrix Theory and Algorithms

Full text


Abstract

Triangular factorizations are an important tool for solving integral equations and partial differential equations with hierarchical matrices ($\mathcal{H}$-matrices).

Experiments show that using an $\mathcal{H}$-matrix LR factorization to solve a system of linear equations is superior to direct inversion both with respect to accuracy and efficiency, but so far theoretical estimates quantifying these advantages have been missing.

Due to a lack of symmetry in $\mathcal{H}$-matrix algorithms, we cannot hope to prove that the LR factorization takes one third of the operations of the inversion or the matrix multiplication, as in standard linear algebra. We can, however, prove that the LR factorization together with two other operations of similar complexity, i.e., the inversion and multiplication of triangular matrices, requires no more operations than the matrix multiplication.

We can complete the estimates by proving an improved upper bound for the complexity of the matrix multiplication, designed for recently introduced variants of classical $\mathcal{H}$-matrices.

1 Introduction

Hierarchical matrices [21, 16], $\mathcal{H}$-matrices for short, can be used to approximate certain densely populated matrices arising in the context of integral equations [5, 8, 9] and elliptic partial differential equations [6, 11] in linear-polylogarithmic complexity. In contrast to other methods like fast multipole expansions [19, 20] or wavelet approximations [7, 10], $\mathcal{H}$-matrices also make it possible to approximate arithmetic operations like the matrix multiplication, inversion, or triangular factorization in linear-polylogarithmic complexity. This property makes $\mathcal{H}$-matrices attractive for a variety of applications, starting with solving partial differential equations [6, 11] and integral equations [12], up to dealing with matrix equations [15, 17, 4, 3] and evaluating matrix functions [13, 14].

Already the first articles on $\mathcal{H}$ -matrix techniques introduced an algorithm for approximating the inverse of an $\mathcal{H}$ -matrix by recursively applying a block representation [21, 16]. This approach works well, but is quite time-consuming.

The situation improved significantly when Lintner and Grasedyck introduced an efficient algorithm for approximating the LR factorization of an $\mathcal{H}$ -matrix [23, 18], reducing the computational work by a large factor and simultaneously considerably improving the accuracy. It is fairly easy to prove that the $\mathcal{H}$ -LR or $\mathcal{H}$ -Cholesky factorization requires less computational work than the $\mathcal{H}$ -matrix multiplication or inversion, and for the latter operations linear-polylogarithmic complexity bounds have been known for years [16].

For dense $n\times n$ matrices in standard array representation, we know that a straightforward implementation of the LR factorization requires

[TABLE]

Our first goal is to prove that this statement also holds for $\mathcal{H}$ -matrices with (almost) arbitrary block trees, i.e., that the operations appearing in the factorization, triangular inversion, and multiplication fit together like the parts of a jigsaw puzzle corresponding to the $\mathcal{H}$ -matrix multiplication. Incidentally, combining the three algorithms also allows us to compute the approximate $\mathcal{H}$ -matrix inverse in place without the need for a separate output matrix.

In order to complete the complexity analysis, we also have to show that the $\mathcal{H}$-matrix multiplication has linear-polylogarithmic complexity. This is already known [16, 22], but we can find an improved estimate that reduces the impact of the sparsity of the block tree and therefore may be interesting for recently developed versions of $\mathcal{H}$-matrices, e.g., MBLR-matrices, that use a denser block tree to improve the potential for parallelization [1, 2].

2 Definitions

The blockwise low-rank structure of $\mathcal{H}$ -matrices $G\in\mathbb{R}^{\mathcal{I}\times\mathcal{I}}$ is conveniently described by the cluster tree, a hierarchical subdivision of an index set $\mathcal{I}$ into disjoint subsets $\hat{t}$ called clusters, and the block tree, a hierarchical subdivision of a product index set $\mathcal{I}\times\mathcal{I}$ into subsets $\hat{t}\times\hat{s}$ constructed from these clusters.

Definition 1 (Cluster tree)

Let $\mathcal{I}$ be a finite index set. A tree $\mathcal{T}_{\mathcal{I}}$ is a cluster tree for this index set if each node $t\in\mathcal{T}_{\mathcal{I}}$ is labeled with a subset $\hat{t}\subseteq\mathcal{I}$ and if these subsets satisfy the following conditions:

• The root of $\mathcal{T}_{\mathcal{I}}$ is labeled with $\mathcal{I}$.

• If $t\in\mathcal{T}_{\mathcal{I}}$ has sons, the label of $t$ is the union of the labels of the sons, i.e., $\hat{t}=\bigcup_{t^{\prime}\in\mathop{\operatorname{sons}}\nolimits(t)}\hat{t}^{\prime}$.

• The labels of sons of $t\in\mathcal{T}_{\mathcal{I}}$ are disjoint, i.e., for $t\in\mathcal{T}_{\mathcal{I}}$ and $t_{1},t_{2}\in\mathop{\operatorname{sons}}\nolimits(t)$ with $t_{1}\neq t_{2}$, we have $\hat{t}_{1}\cap\hat{t}_{2}=\emptyset$.

The nodes of a cluster tree are called clusters. The set of leaves is denoted by $\mathcal{L}_{\mathcal{I}}$ .
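As a minimal sketch of Definition 1, the following Python code builds a binary cluster tree by repeatedly halving a contiguous index range and checks the union and disjointness conditions. The names `Cluster`, `build_cluster_tree`, `is_cluster_tree`, and the `leaf_size` parameter are illustrative and do not appear in the text; bisection is only one of many possible construction strategies.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Node t of a cluster tree; `indices` plays the role of the label t-hat."""
    indices: range
    sons: list = field(default_factory=list)


def build_cluster_tree(indices: range, leaf_size: int) -> Cluster:
    """Bisect a contiguous index range until at most `leaf_size` indices remain."""
    t = Cluster(indices)
    if len(indices) > leaf_size:
        mid = indices.start + len(indices) // 2
        t.sons = [build_cluster_tree(range(indices.start, mid), leaf_size),
                  build_cluster_tree(range(mid, indices.stop), leaf_size)]
    return t


def is_cluster_tree(t: Cluster) -> bool:
    """Check that son labels are pairwise disjoint and that their union is the father's label."""
    if not t.sons:
        return True
    labels = [set(s.indices) for s in t.sons]
    union = set().union(*labels)
    disjoint = sum(len(lab) for lab in labels) == len(union)
    return disjoint and union == set(t.indices) and all(is_cluster_tree(s) for s in t.sons)


def leaves(t: Cluster) -> list:
    """Collect the leaf clusters in left-to-right order."""
    return [t] if not t.sons else [leaf for s in t.sons for leaf in leaves(s)]


root = build_cluster_tree(range(16), leaf_size=4)
print(is_cluster_tree(root), [list(leaf.indices) for leaf in leaves(root)])
```

The leaf labels returned by `leaves` form the disjoint partition of the index set that the definition implies.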

Definition 2 (Block tree)

Let $\mathcal{T}_{\mathcal{I}}$ be a cluster tree for an index set $\mathcal{I}$ . A tree $\mathcal{T}_{\mathcal{I}\times\mathcal{I}}$ is a block tree for this cluster tree if

• For each node $b\in\mathcal{T}_{\mathcal{I}\times\mathcal{I}}$ there are clusters $t,s\in\mathcal{T}_{\mathcal{I}}$ with $b=(t,s)$. $t$ is called the row cluster for $b$ and $s$ the column cluster.

• If $r\in\mathcal{T}_{\mathcal{I}}$ is the root of $\mathcal{T}_{\mathcal{I}}$, the root of $\mathcal{T}_{\mathcal{I}\times\mathcal{I}}$ is $b=(r,r)$.

• If $b=(t,s)\in\mathcal{T}_{\mathcal{I}\times\mathcal{I}}$ has sons, they are pairs of the sons of $t$ and $s$, i.e., $\mathop{\operatorname{sons}}\nolimits(b)=\mathop{\operatorname{sons}}\nolimits(t)\times\mathop{\operatorname{sons}}\nolimits(s)$.

The nodes of a block tree are called blocks. The set of leaves is denoted by $\mathcal{L}_{\mathcal{I}\times\mathcal{I}}$ .

We can see that the labels of the leaves of a cluster tree $\mathcal{T}_{\mathcal{I}}$ correspond to a disjoint partition of the index set $\mathcal{I}$ and that the sets $\hat{t}\times\hat{s}$ with $(t,s)\in\mathcal{L}_{\mathcal{I}\times\mathcal{I}}$ correspond to a disjoint partition of the index set $\mathcal{I}\times\mathcal{I}$, i.e., to a decomposition of a matrix $G\in\mathbb{R}^{\mathcal{I}\times\mathcal{I}}$ into submatrices.

Among the leaf blocks $\mathcal{L}_{\mathcal{I}\times\mathcal{I}}$ , we identify those that correspond to submatrices that we expect to have low numerical rank. These blocks are called admissible and collected in the set $\mathcal{L}_{\mathcal{I}\times\mathcal{I}}^{+}\subseteq\mathcal{L}_{\mathcal{I}\times\mathcal{I}}$ . The remaining leaves are called inadmissible and collected in the set $\mathcal{L}_{\mathcal{I}\times\mathcal{I}}^{-}:=\mathcal{L}_{\mathcal{I}\times\mathcal{I}}\setminus\mathcal{L}_{\mathcal{I}\times\mathcal{I}}^{+}$ .

We note that there are efficient algorithms at our disposal for constructing cluster and block trees for various applications [16, 22].
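The following sketch illustrates Definition 2 and the split into admissible and inadmissible leaves. Clusters are half-open index intervals `(lo, hi)` that are bisected down to a hypothetical `LEAF_SIZE`, and the admissibility test (maximal diameter bounded by `eta` times the distance) is an assumed one-dimensional stand-in, since the text does not fix a particular condition.

```python
from itertools import product

LEAF_SIZE = 4  # hypothetical resolution of the cluster tree


def sons(t):
    """Sons of the cluster t = (lo, hi), a half-open index interval, by bisection."""
    lo, hi = t
    if hi - lo <= LEAF_SIZE:
        return []
    mid = (lo + hi) // 2
    return [(lo, mid), (mid, hi)]


def admissible(t, s, eta=1.0):
    """Assumed 1D admissibility condition: max diameter <= eta * distance."""
    dist = max(s[0] - t[1], t[0] - s[1], 0)
    return max(t[1] - t[0], s[1] - s[0]) <= eta * dist


def block_leaves(t, s):
    """Return (admissible leaves, inadmissible leaves) of the block tree below (t, s)."""
    if admissible(t, s):
        return [(t, s)], []
    if not sons(t) or not sons(s):
        return [], [(t, s)]
    adm, inadm = [], []
    for t1, s1 in product(sons(t), sons(s)):   # sons(b) = sons(t) x sons(s)
        a, i = block_leaves(t1, s1)
        adm += a
        inadm += i
    return adm, inadm


n = 32
adm, inadm = block_leaves((0, n), (0, n))
covered = sum((t[1] - t[0]) * (s[1] - s[0]) for t, s in adm + inadm)
print(len(adm), len(inadm), covered == n * n)
```

With this condition, diagonal blocks have distance zero and are therefore never admissible, and the leaf blocks together cover every entry of the matrix exactly once.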

Definition 3 (Hierarchical matrix)

Let $\mathcal{T}_{\mathcal{I}}$ be a cluster tree for an index set $\mathcal{I}$, and let $\mathcal{T}_{\mathcal{I}\times\mathcal{I}}$ be a block tree for $\mathcal{T}_{\mathcal{I}}$ with sets $\mathcal{L}_{\mathcal{I}\times\mathcal{I}}^{+}$ and $\mathcal{L}_{\mathcal{I}\times\mathcal{I}}^{-}$ of admissible and inadmissible leaves. A matrix $G\in\mathbb{R}^{\mathcal{I}\times\mathcal{I}}$ is a hierarchical matrix ($\mathcal{H}$-matrix for short) of local rank $k\in\mathbb{N}$ if

$$\operatorname{rank}(G|_{\hat{t}\times\hat{s}})\leq k\qquad\text{for all }b=(t,s)\in\mathcal{L}_{\mathcal{I}\times\mathcal{I}}^{+},$$

i.e., if all admissible leaves have a rank smaller than or equal to $k$.

If $G$ is a hierarchical matrix, we can find matrices $A_{ts}\in\mathbb{R}^{\hat{t}\times k}$ and $B_{ts}\in\mathbb{R}^{\hat{s}\times k}$ for every admissible leaf $b=(t,s)\in\mathcal{L}_{\mathcal{I}\times\mathcal{I}}^{+}$ such that

$$G|_{\hat{t}\times\hat{s}}=A_{ts}B_{ts}^{*}.$$

Here we use the shorthand notation $\mathbb{R}^{\hat{t}\times k}$ for the set $\mathbb{R}^{\hat{t}\times[1:k]}$ of matrices with row indices in $\hat{t}\subseteq\mathcal{I}$ and column indices in $[1:k]$ .

For inadmissible leaves $b=(t,s)\in\mathcal{L}_{\mathcal{I}\times\mathcal{I}}^{-}$ , we store the nearfield matrices $N_{ts}\in\mathbb{R}^{\hat{t}\times\hat{s}}$ directly. The matrix families $(A_{ts})_{(t,s)\in\mathcal{L}_{\mathcal{I}\times\mathcal{I}}^{+}}$ , $(B_{ts})_{(t,s)\in\mathcal{L}_{\mathcal{I}\times\mathcal{I}}^{+}}$ , and $(N_{ts})_{(t,s)\in\mathcal{L}_{\mathcal{I}\times\mathcal{I}}^{-}}$ together represent an $\mathcal{H}$ -matrix $G\in\mathbb{R}^{\mathcal{I}\times\mathcal{I}}$ .

3 Basic $\mathcal{H}$ -matrix operations

Before we consider algorithms for triangular $\mathcal{H}$ -matrices, we have to recall the algorithms they are based on: the $\mathcal{H}$ -matrix-vector multiplication, the $\mathcal{H}$ -matrix low-rank update, and the $\mathcal{H}$ -matrix multiplication.

The multiplication of an $\mathcal{H}$-matrix $G\in\mathbb{R}^{\mathcal{I}\times\mathcal{I}}$ with multiple vectors collected in the columns of a matrix $Y\in\mathbb{R}^{\mathcal{I}\times\ell}$ can be split into updates

$$X|_{\hat{t}}\leftarrow X|_{\hat{t}}+\alpha G|_{\hat{t}\times\hat{s}}Y|_{\hat{s}}\tag{3}$$

where $X|_{\hat{t}}:=X|_{\hat{t}\times\ell}$ denotes the restriction of $X$ to the row indices in $\hat{t}$ and $\alpha\in\mathbb{R}$ is a scaling factor. For inadmissible leaves, the update can be carried out directly, taking care to minimize the computational work by ensuring that the scaling by $\alpha$ is applied to $Y|_{\hat{s}}$ if $|\hat{s}|\leq|\hat{t}|$ and to the product $G|_{\hat{t}\times\hat{s}}Y|_{\hat{s}}$ otherwise.

For admissible leaves we have $G|_{\hat{t}\times\hat{s}}=A_{ts}B_{ts}^{*}$ and can first compute the intermediate matrix $\widehat{Z}_{ts}:=\alpha B_{ts}^{*}Y|_{\hat{s}}\in\mathbb{R}^{k\times\ell}$ and then add $A_{ts}\widehat{Z}_{ts}$ to the output, i.e.,

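$$X|_{\hat{t}}\leftarrow X|_{\hat{t}}+A_{ts}\widehat{Z}_{ts}=X|_{\hat{t}}+\alpha A_{ts}B_{ts}^{*}Y|_{\hat{s}}.$$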

For non-leaf blocks $b=(t,s)\in\mathcal{T}_{\mathcal{I}\times\mathcal{I}}\setminus\mathcal{L}_{\mathcal{I}\times\mathcal{I}}$ , we recursively consider sons until we arrive at leaves. In total, the number of operations for (3) is equal to

[TABLE]

for all $b=(t,s)\in\mathcal{T}_{\mathcal{I}\times\mathcal{I}}$ . The multiplication by the transposed $\mathcal{H}$ -matrix $G^{*}$ , i.e., updates of the form

$$X|_{\hat{s}}\leftarrow X|_{\hat{s}}+\alpha G|_{\hat{t}\times\hat{s}}^{*}Y|_{\hat{t}},\tag{5}$$

can be handled simultaneously and also requires $W_{\text{ev}}(t,s,\ell)$ operations. In the following, we assume that procedures “addeval” and “addevaltrans” for the operations (3) and (5) are at our disposal.
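A minimal sketch of the recursive update (3) could look as follows. The dictionary-based block representation (keys `rows`, `cols`, and one of `sons`, `A`/`B`, `N`) is a hypothetical data structure chosen for brevity, real matrices are assumed so that $B^{*}$ becomes `B.T`, and the refinement of applying $\alpha$ to the smaller operand is omitted.

```python
import numpy as np


def addeval(alpha, block, Y, X):
    """Perform X|_t <- X|_t + alpha * G|_{t x s} Y|_s by recursion over the block tree."""
    rows, cols = block["rows"], block["cols"]
    if "sons" in block:                       # non-leaf block: recurse into the sons
        for son in block["sons"]:
            addeval(alpha, son, Y, X)
    elif "A" in block:                        # admissible leaf: G|_{t x s} = A B^*
        Z = alpha * (block["B"].T @ Y[cols])  # intermediate k x l matrix
        X[rows] += block["A"] @ Z
    else:                                     # inadmissible leaf: dense nearfield matrix
        X[rows] += alpha * (block["N"] @ Y[cols])


def to_dense(block, G):
    """Assemble the dense matrix represented by the block tree (for checking only)."""
    if "sons" in block:
        for son in block["sons"]:
            to_dense(son, G)
    elif "A" in block:
        G[block["rows"], block["cols"]] = block["A"] @ block["B"].T
    else:
        G[block["rows"], block["cols"]] = block["N"]
    return G


rng = np.random.default_rng(0)
n, k, ell = 8, 2, 3
t1, t2 = slice(0, 4), slice(4, 8)


def lowrank(r, c):
    return {"rows": r, "cols": c,
            "A": rng.standard_normal((4, k)), "B": rng.standard_normal((4, k))}


def nearfield(r, c):
    return {"rows": r, "cols": c, "N": rng.standard_normal((4, 4))}


H = {"rows": slice(0, n), "cols": slice(0, n),
     "sons": [nearfield(t1, t1), lowrank(t1, t2), lowrank(t2, t1), nearfield(t2, t2)]}
Y = rng.standard_normal((n, ell))
X = np.zeros((n, ell))
addeval(2.0, H, Y, X)
print(np.allclose(X, 2.0 * to_dense(H, np.zeros((n, n))) @ Y))
```

For an admissible leaf, the intermediate matrix `Z` has only $k$ rows, which is where the savings over the dense product come from.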

The low-rank update of an $\mathcal{H}$ -matrix $G$ , i.e., the approximation of

$$G|_{\hat{t}\times\hat{s}}\leftarrow G|_{\hat{t}\times\hat{s}}+AB^{*}\tag{6}$$

for a block $(t,s)\in\mathcal{T}_{\mathcal{I}\times\mathcal{I}}$ , $A\in\mathbb{R}^{\hat{t}\times\ell}$ and $B\in\mathbb{R}^{\hat{s}\times\ell}$ is realized by recursively moving to the leaves of the block tree and performing a direct update for inadmissible leaves and a truncated update for admissible ones: if $(t,s)\in\mathcal{L}_{\mathcal{I}\times\mathcal{I}}^{+}$ , we have $G|_{\hat{t}\times\hat{s}}=A_{ts}B_{ts}^{*}$ and approximate

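$$G|_{\hat{t}\times\hat{s}}+AB^{*}=A_{ts}B_{ts}^{*}+AB^{*}=\widehat{A}\widehat{B}^{*},\qquad\widehat{A}:=\begin{pmatrix}A_{ts}&A\end{pmatrix},\quad\widehat{B}:=\begin{pmatrix}B_{ts}&B\end{pmatrix},$$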

by computing the thin Householder factorization $\widehat{B}=QR$ , a low-rank approximation $C\widehat{D}^{*}$ of $\widehat{A}R^{*}$ , so that $D:=Q\widehat{D}$ yields a low-rank approximation $CD^{*}=C\widehat{D}^{*}Q^{*}\approx\widehat{A}R^{*}Q^{*}=\widehat{A}\widehat{B}^{*}$ . Assuming that the Householder factorization and the low-rank approximation of $n\times m$ matrices require $\mathcal{O}(nm\min\{n,m\})$ operations, we find a constant $C_{\text{ad}}$ such that the number of operations for a low-rank update (6) is bounded by

[TABLE]

for all $b=(t,s)\in\mathcal{T}_{\mathcal{I}\times\mathcal{I}}$ . In the following, we assume that a procedure “update” for approximating the operation (6) in this way is available.
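The truncated update for an admissible leaf can be sketched with NumPy as follows. The function name `truncated_update` is hypothetical, real matrices are assumed, and the truncated singular value decomposition is one possible choice for the low-rank approximation of $\widehat{A}R^{*}$; the text does not prescribe a specific method at this point.

```python
import numpy as np


def truncated_update(A_ts, B_ts, A, B, k):
    """Approximate A_ts B_ts^* + A B^* by a factorization C D^* of rank at most k."""
    A_hat = np.hstack([A_ts, A])              # t x (k + l)
    B_hat = np.hstack([B_ts, B])              # s x (k + l)
    Q, R = np.linalg.qr(B_hat)                # thin QR factorization: B_hat = Q R
    U, sigma, Vt = np.linalg.svd(A_hat @ R.T, full_matrices=False)
    C = U[:, :k] * sigma[:k]                  # C D_hat^* approximates A_hat R^*
    D_hat = Vt[:k].T
    return C, Q @ D_hat                       # D := Q D_hat


rng = np.random.default_rng(1)
t, s, k, ell = 10, 8, 3, 2
A_ts, B_ts = rng.standard_normal((t, k)), rng.standard_normal((s, k))
M = rng.standard_normal((k, ell))
A, B = A_ts @ M, rng.standard_normal((s, ell))   # update stays in the range of A_ts
C, D = truncated_update(A_ts, B_ts, A, B, k)
exact = A_ts @ B_ts.T + A @ B.T
print(np.allclose(C @ D.T, exact))
```

In this example the updated block still has rank at most $k$, so the truncation is exact up to rounding; in general the truncation introduces an approximation error.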

During the course of the $\mathcal{H}$-matrix multiplication, we may have to split a low-rank matrix into submatrices, perform updates to these submatrices, and then merge them into a larger low-rank matrix. This task can be handled essentially like the update, but we have to take special care if the number of submatrices is large. Let $t,s\in\mathcal{T}_{\mathcal{I}}$ with $|\mathop{\operatorname{sons}}\nolimits(s)|=m\in\mathbb{N}$ and $\mathop{\operatorname{sons}}\nolimits(s)=\{s_{1},\ldots,s_{m}\}$. We are looking for an approximation of the matrix

$$G|_{\hat{t}\times\hat{s}}=\begin{pmatrix}A_{1}B_{1}^{*}&\ldots&A_{m}B_{m}^{*}\end{pmatrix},\qquad A_{j}\in\mathbb{R}^{\hat{t}\times k},\quad B_{j}\in\mathbb{R}^{\hat{s}_{j}\times k},$$

where we assume that preliminary compression steps have ensured $k\leq|\hat{t}|$ .

In a first step, we compute thin Householder factorizations $B_{j}=Q_{j}R_{j}$ with $R_{j}\in\mathbb{R}^{k\times k}$ for all $j\in[1:m]$ . Due to our assumption and $|\hat{s}|=|\hat{s}_{1}|+\ldots+|\hat{s}_{m}|$ , this requires $\mathcal{O}(|\hat{s}|k^{2})$ operations. Now we form the reduced matrix

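$$\widehat{G}:=\begin{pmatrix}A_{1}R_{1}^{*}&\ldots&A_{m}R_{m}^{*}\end{pmatrix}\in\mathbb{R}^{\hat{t}\times(km)}.$$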

Once we have found a rank- $k$ approximation $\widehat{A}\widehat{Q}^{*}\approx\widehat{G}$ with $\widehat{A}\in\mathbb{R}^{\hat{t}\times k}$ and an isometric matrix $\widehat{Q}\in\mathbb{R}^{(mk)\times k}$ , applying the Householder reflections yields the rank- $k$ approximation

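$$G|_{\hat{t}\times\hat{s}}=\widehat{G}\begin{pmatrix}Q_{1}^{*}&&\\&\ddots&\\&&Q_{m}^{*}\end{pmatrix}\approx\widehat{A}\widehat{Q}^{*}\begin{pmatrix}Q_{1}^{*}&&\\&\ddots&\\&&Q_{m}^{*}\end{pmatrix}$$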

of the original matrix in $\mathcal{O}(|\hat{s}|k^{2})$ operations. To construct a rank- $k$ approximation of $\widehat{G}$ , we can proceed sequentially: first we use techniques like the singular value decomposition or rank-revealing QR factorization to obtain a rank- $k$ approximation

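$$\widehat{A}_{m-1}\widehat{Q}_{m-1}^{*}\approx\begin{pmatrix}A_{m-1}R_{m-1}^{*}&A_{m}R_{m}^{*}\end{pmatrix}$$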

with $\widehat{A}_{m-1}\in\mathbb{R}^{\hat{t}\times k}$ and an isometric matrix $\widehat{Q}_{m-1}\in\mathbb{R}^{(2k)\times k}$ . Due to our assumptions, this task can be accomplished in $\mathcal{O}(|\hat{t}|k^{2})$ operations. We find

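$$\widehat{G}\approx\begin{pmatrix}A_{1}R_{1}^{*}&\ldots&A_{m-2}R_{m-2}^{*}&\widehat{A}_{m-1}\widehat{Q}_{m-1}^{*}\end{pmatrix}=\begin{pmatrix}A_{1}R_{1}^{*}&\ldots&A_{m-2}R_{m-2}^{*}&\widehat{A}_{m-1}\end{pmatrix}\begin{pmatrix}I_{(m-2)k}&\\&\widehat{Q}_{m-1}^{*}\end{pmatrix},$$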

where $I_{(m-2)k}$ denotes the $(m-2)k$ -dimensional identity matrix. The left factor now has only $(m-1)k$ columns, and repeating the procedure $m-2$ times yields

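$$\widehat{G}\approx\widehat{A}_{1}\widehat{Q}_{1}^{*}\begin{pmatrix}I_{k}&\\&\widehat{Q}_{2}^{*}\end{pmatrix}\begin{pmatrix}I_{2k}&\\&\widehat{Q}_{3}^{*}\end{pmatrix}\cdots\begin{pmatrix}I_{(m-2)k}&\\&\widehat{Q}_{m-1}^{*}\end{pmatrix}.$$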

Since the matrices $\widehat{Q}_{1},\ldots,\widehat{Q}_{m-1}$ have only $k$ columns and $2k$ rows by construction, we can compute

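$$\widehat{Q}:=\begin{pmatrix}I_{(m-2)k}&\\&\widehat{Q}_{m-1}\end{pmatrix}\begin{pmatrix}I_{(m-3)k}&\\&\widehat{Q}_{m-2}\end{pmatrix}\cdots\begin{pmatrix}I_{k}&\\&\widehat{Q}_{2}\end{pmatrix}\widehat{Q}_{1}$$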

in $\mathcal{O}(k^{3}(m-1))\subseteq\mathcal{O}(|\hat{t}|k^{2}(m-1))$ operations and find the desired low-rank approximation. We conclude that there is a constant $C_{\text{mg}}^{\prime}$ such that no more than $C_{\text{mg}}^{\prime}k^{2}(|\hat{t}|(m-1)+|\hat{s}|)$ operations are needed to merge the row blocks. We can apply the same procedure to merge column blocks as well and see that

$$C_{\text{mg}}^{\prime}k^{2}\bigl(|\hat{t}|\,|\operatorname{sons}(s)|+|\hat{s}|\,(2|\operatorname{sons}(t)|-1)\bigr)\leq C_{\text{mg}}k^{2}\sum_{\substack{t^{\prime}\in\operatorname{sons}(t)\\ s^{\prime}\in\operatorname{sons}(s)}}\bigl(|\hat{t}^{\prime}|+|\hat{s}^{\prime}|\bigr)$$

operations are sufficient to merge submatrices for all the sons of a block $(t,s)\in\mathcal{T}_{\mathcal{I}}\times\mathcal{T}_{\mathcal{I}}$, where $C_{\text{mg}}:=2C_{\text{mg}}^{\prime}$. In the following, we assume that a procedure “merge” for this task is available.
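The sequential merging of a row of low-rank blocks described above can be sketched as follows. The function name `merge_columns` is hypothetical, and the sketch assumes real matrices, $m\geq 2$, and $k\leq|\hat{t}|$ as well as $k\leq|\hat{s}_{j}|$ for all $j$; the truncated singular value decomposition stands in for whichever rank-revealing technique is actually used.

```python
import numpy as np


def merge_columns(As, Bs, k):
    """Merge the row block (A_1 B_1^* ... A_m B_m^*) into one rank-k factorization A B^*."""
    m = len(As)
    QRs = [np.linalg.qr(B) for B in Bs]               # thin QR: B_j = Q_j R_j
    cur = As[-1] @ QRs[-1][1].T                       # A_m R_m^*
    Qhats = []
    for j in range(m - 2, -1, -1):                    # j = m-1, ..., 1 in the text's numbering
        pair = np.hstack([As[j] @ QRs[j][1].T, cur])  # matrix with 2k columns
        U, sigma, Vt = np.linalg.svd(pair, full_matrices=False)
        cur = U[:, :k] * sigma[:k]                    # new left factor A_hat_j
        Qhats.insert(0, Vt[:k].T)                     # isometric Q_hat_j: 2k rows, k columns
    Qhat = Qhats[0]
    for j in range(1, m - 1):                         # apply diag(I_{jk}, Q_hat_{j+1})
        Qhat = np.vstack([Qhat[:j * k], Qhats[j] @ Qhat[j * k:]])
    B = np.vstack([QRs[j][0] @ Qhat[j * k:(j + 1) * k] for j in range(m)])
    return cur, B


rng = np.random.default_rng(7)
t, k = 12, 2
A0 = rng.standard_normal((t, k))
As = [A0 @ rng.standard_normal((k, k)) for _ in range(3)]   # common range: rank k overall
Bs = [rng.standard_normal((s_j, k)) for s_j in (5, 6, 7)]
G = np.hstack([A @ B.T for A, B in zip(As, Bs)])
A_merged, B_merged = merge_columns(As, Bs, k)
print(np.allclose(A_merged @ B_merged.T, G))
```

Each singular value decomposition only involves a matrix with $2k$ columns, so the cost per step does not grow with the number $m$ of submatrices.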

Finally, the $\mathcal{H}$ -matrix multiplication algorithm carries out the approximate update $Z|_{\hat{t}\times\hat{r}}\leftarrow Z|_{\hat{t}\times\hat{r}}+\alpha X|_{\hat{t}\times\hat{s}}Y|_{\hat{s}\times\hat{r}}$ with $(t,s)\in\mathcal{T}_{\mathcal{I}\times\mathcal{I}}$ and $(s,r)\in\mathcal{T}_{\mathcal{I}\times\mathcal{I}}$ , again by recursively considering sons of the blocks until one of them is a leaf, so the product $X|_{\hat{t}\times\hat{s}}Y|_{\hat{s}\times\hat{r}}$ is of low rank and can be computed using the functions “addeval” and “addevaltrans”. The functions “update” and “merge” can then be used to add the result to $Z|_{\hat{t}\times\hat{r}}$ , performing low-rank truncations if necessary. The algorithm is summarized in Figure 1.

In order to keep the notation short, we introduce the local rank of leaf blocks by

[TABLE]

We can see that the multiplication algorithm in Figure 1 performs matrix-vector multiplications for matrices with $k_{ts}$ columns if $(t,s)\in\mathcal{L}_{\mathcal{I}\times\mathcal{I}}$ and for matrices with $k_{sr}$ columns if $(s,r)\in\mathcal{L}_{\mathcal{I}\times\mathcal{I}}$ , followed by a low-rank update.

We obtain the bound

[TABLE]

for the computational work required by the algorithm “addmul” in Figure 1 called with $(t,s),(s,r)\in\mathcal{T}_{\mathcal{I}\times\mathcal{I}}$ , where we include the work for merging submatrices in all non-leaf cases for the sake of simplicity.

4 Algorithms for triangular $\mathcal{H}$ -matrices

In the context of algorithms for hierarchical matrices, we assume triangular matrices to be compatible with the structure of the cluster tree $\mathcal{T}_{\mathcal{I}}$ , i.e., if we have two sons $t_{1},t_{2}\in\mathop{\operatorname{sons}}\nolimits(t)$ of a cluster $t\in\mathcal{T}_{\mathcal{I}}$ and if there are indices $i\in\hat{t}_{1}$ and $j\in\hat{t}_{2}$ with $i<j$ , all indices in $\hat{t}_{1}$ are smaller than all indices in $\hat{t}_{2}$ . For this property, we use the shorthand $t_{1}<t_{2}$ .

While this may appear to be a significant restriction at first glance, it rarely poses problems in practice, since we can define the order of the indices in the index set $\mathcal{I}$ to satisfy our condition by simply choosing an arbitrary order on the indices in leaf clusters and an arbitrary order on the sons of non-leaf clusters. By induction, these orders give rise to a global order on $\mathcal{I}$ satisfying our requirements.

We are mainly interested in three operations: the construction of an LR factorization $G=LR$ , i.e., the decomposition of $G$ into a left lower triangular matrix with unit diagonal $L$ and a right upper triangular matrix $R$ , the inversion of the triangular matrices, and the multiplication of triangular matrices. Together, these three operations allow us to overwrite a matrix with its inverse.

For the sake of simplicity, we assume that we are working with a binary cluster tree $\mathcal{T}_{\mathcal{I}}$ , i.e., a cluster is either a leaf or has exactly two sons $t_{1}<t_{2}$ . In the latter case, we can split triangular matrices into submatrices

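$$L_{\nu\mu}:=L|_{\hat{t}_{\nu}\times\hat{t}_{\mu}},\qquad R_{\nu\mu}:=R|_{\hat{t}_{\nu}\times\hat{t}_{\mu}}\qquad\text{for all }\nu,\mu\in\{1,2\}$$

to obtain

$$L|_{\hat{t}\times\hat{t}}=\begin{pmatrix}L_{11}&\\L_{21}&L_{22}\end{pmatrix},\qquad R|_{\hat{t}\times\hat{t}}=\begin{pmatrix}R_{11}&R_{12}\\&R_{22}\end{pmatrix}.$$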

We also assume that diagonal blocks, i.e., blocks of the form $b=(t,t)$ with $t\in\mathcal{T}_{\mathcal{I}}$ , are never admissible.

Forward and backward substitution

In order to use an LR factorization as a solver, we have to be able to solve systems $LX=Y$ , $RX=Y$ , $XL=Y$ , and $XR=Y$ . The third equation can be reduced to the second by taking the adjoint, and the fourth equation similarly reduces to the first. We consider the more general tasks of solving

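$$L|_{\hat{t}\times\hat{t}}X|_{\hat{t}}=Y|_{\hat{t}},\qquad R|_{\hat{t}\times\hat{t}}X|_{\hat{t}}=Y|_{\hat{t}}$$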

for an arbitrary cluster $t\in\mathcal{T}_{\mathcal{I}}$ .

We first address the case that $X$ and $Y$ are matrices in standard representation, i.e., that no low-rank approximations are required.

If $t$ is a leaf, $L|_{\hat{t}\times\hat{t}}$ and $R|_{\hat{t}\times\hat{t}}$ are standard matrices and we can solve the equations by standard forward and backward substitution.

If $t$ is not a leaf, the first equation takes the form

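$$\begin{pmatrix}Y|_{\hat{t}_{1}}\\Y|_{\hat{t}_{2}}\end{pmatrix}=Y|_{\hat{t}}=L|_{\hat{t}\times\hat{t}}X|_{\hat{t}}=\begin{pmatrix}L_{11}&\\L_{21}&L_{22}\end{pmatrix}\begin{pmatrix}X|_{\hat{t}_{1}}\\X|_{\hat{t}_{2}}\end{pmatrix}=\begin{pmatrix}L_{11}X|_{\hat{t}_{1}}\\L_{21}X|_{\hat{t}_{1}}+L_{22}X|_{\hat{t}_{2}}\end{pmatrix},$$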

so we can solve $L_{11}X|_{\hat{t}_{1}}=Y|_{\hat{t}_{1}}$ by recursion, overwrite $Y|_{\hat{t}_{2}}$ by $\widetilde{Y}_{2}:=Y|_{\hat{t}_{2}}-L_{21}X|_{\hat{t}_{1}}$ using the algorithm “addeval”, and solve $L_{22}X|_{\hat{t}_{2}}=\widetilde{Y}_{2}$ by recursion.

The second equation takes the form

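$$\begin{pmatrix}Y|_{\hat{t}_{1}}\\Y|_{\hat{t}_{2}}\end{pmatrix}=Y|_{\hat{t}}=R|_{\hat{t}\times\hat{t}}X|_{\hat{t}}=\begin{pmatrix}R_{11}&R_{12}\\&R_{22}\end{pmatrix}\begin{pmatrix}X|_{\hat{t}_{1}}\\X|_{\hat{t}_{2}}\end{pmatrix}=\begin{pmatrix}R_{11}X|_{\hat{t}_{1}}+R_{12}X|_{\hat{t}_{2}}\\R_{22}X|_{\hat{t}_{2}}\end{pmatrix},$$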

so we can solve $R_{22}X|_{\hat{t}_{2}}=Y|_{\hat{t}_{2}}$ by recursion, overwrite $Y|_{\hat{t}_{1}}$ by $\widetilde{Y}_{1}:=Y|_{\hat{t}_{1}}-R_{12}X|_{\hat{t}_{2}}$ using the algorithm “addeval” again, and solve $R_{11}X|_{\hat{t}_{1}}=\widetilde{Y}_{1}$ by recursion. The algorithms are summarized in Figure 2. Counterparts “lsolvetrans” and “rsolvetrans” for the adjoint matrices $L^{*}$ and $R^{*}$ can be defined in a similar fashion using “addevaltrans” instead of “addeval”.
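The two recursions can be sketched for dense NumPy arrays as follows. Clusters are represented by half-open index intervals `[lo, hi)` that are bisected down to a hypothetical `LEAF_SIZE`, and the call to “addeval” is replaced by a dense matrix product, so the sketch shows the recursive structure only, not the $\mathcal{H}$-matrix arithmetic.

```python
import numpy as np

LEAF_SIZE = 4  # hypothetical resolution of the cluster tree


def lsolve(L, lo, hi, Y):
    """Solve L|_{t x t} X|_t = Y|_t for t = [lo, hi), overwriting Y|_t with X|_t."""
    if hi - lo <= LEAF_SIZE:                       # leaf: standard forward substitution
        for i in range(lo, hi):
            Y[i] -= L[i, lo:i] @ Y[lo:i]
            Y[i] /= L[i, i]
        return
    mid = (lo + hi) // 2
    lsolve(L, lo, mid, Y)                          # L_11 X_1 = Y_1
    Y[mid:hi] -= L[mid:hi, lo:mid] @ Y[lo:mid]     # Y_2 <- Y_2 - L_21 X_1 (role of "addeval")
    lsolve(L, mid, hi, Y)                          # L_22 X_2 = Y_2


def rsolve(R, lo, hi, Y):
    """Solve R|_{t x t} X|_t = Y|_t for t = [lo, hi), overwriting Y|_t with X|_t."""
    if hi - lo <= LEAF_SIZE:                       # leaf: standard backward substitution
        for i in range(hi - 1, lo - 1, -1):
            Y[i] -= R[i, i + 1:hi] @ Y[i + 1:hi]
            Y[i] /= R[i, i]
        return
    mid = (lo + hi) // 2
    rsolve(R, mid, hi, Y)                          # R_22 X_2 = Y_2
    Y[lo:mid] -= R[lo:mid, mid:hi] @ Y[mid:hi]     # Y_1 <- Y_1 - R_12 X_2 (role of "addeval")
    rsolve(R, lo, mid, Y)                          # R_11 X_1 = Y_1


rng = np.random.default_rng(3)
n, ell = 16, 3
L = np.eye(n) + np.tril(rng.uniform(-1, 1, (n, n)), -1) / n
R = np.triu(rng.uniform(-1, 1, (n, n)), 1) / n + np.diag(rng.uniform(1, 2, n))
X_true = rng.standard_normal((n, ell))
Y = L @ X_true
lsolve(L, 0, n, Y)
Z = R @ X_true
rsolve(R, 0, n, Z)
print(np.allclose(Y, X_true), np.allclose(Z, X_true))
```

Both functions overwrite the right-hand side with the solution, which matches the in-place formulation used for the algorithms in this section.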

In order to construct the LR factorization, we will also have to solve the systems $LX=Y$ and $XR=Y$ with $\mathcal{H}$ -matrices $X$ and $Y$ , and this requires some modifications to the algorithms: we consider the systems

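$$L|_{\hat{t}\times\hat{t}}X|_{\hat{t}\times\hat{s}}=Y|_{\hat{t}\times\hat{s}},\qquad R|_{\hat{t}\times\hat{t}}X|_{\hat{t}\times\hat{s}}=Y|_{\hat{t}\times\hat{s}}$$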

for blocks $(t,s)\in\mathcal{T}_{\mathcal{I}\times\mathcal{I}}$ . On one hand, we can take advantage of the low-rank structure if $(t,s)\in\mathcal{L}_{\mathcal{I}\times\mathcal{I}}^{+}$ holds, i.e., if $(t,s)$ is an admissible leaf. In this case, we have $Y|_{\hat{t}\times\hat{s}}=A_{Y,ts}B_{Y,ts}^{*}$ with $A_{Y,ts}\in\mathbb{R}^{\hat{t}\times k}$ and $B_{Y,ts}\in\mathbb{R}^{\hat{s}\times k}$ , and with the solution $A_{X,ts}\in\mathbb{R}^{\hat{t}\times k}$ of the linear system

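$$L|_{\hat{t}\times\hat{t}}A_{X,ts}=A_{Y,ts},$$

we find that $X|_{\hat{t}\times\hat{s}}:=A_{X,ts}B_{Y,ts}^{*}$ solves

$$L|_{\hat{t}\times\hat{t}}X|_{\hat{t}\times\hat{s}}=L|_{\hat{t}\times\hat{t}}A_{X,ts}B_{Y,ts}^{*}=A_{Y,ts}B_{Y,ts}^{*}=Y|_{\hat{t}\times\hat{s}}.$$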

This property allows us to handle admissible blocks very efficiently.
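A short numerical illustration of this observation, assuming real matrices and using `np.linalg.solve` as a stand-in for the triangular solver: only the $k$ columns of the factor $A_{Y,ts}$ have to be processed, while the column factor $B_{Y,ts}$ is reused unchanged.

```python
import numpy as np

rng = np.random.default_rng(2)
t, s, k = 6, 50, 2
L = np.eye(t) + np.tril(rng.standard_normal((t, t)), -1)   # unit lower triangular
A_Y, B_Y = rng.standard_normal((t, k)), rng.standard_normal((s, k))

A_X = np.linalg.solve(L, A_Y)     # k right-hand sides instead of |s| = 50
X = A_X @ B_Y.T                   # X keeps the column factor B_Y of Y
print(np.allclose(L @ X, A_Y @ B_Y.T))
```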

On the other hand, we cannot expect to be able to perform the update $\widetilde{Y}_{2}=Y|_{\hat{t}_{2}}-L_{21}X|_{\hat{t}_{1}}$ exactly, since we want to preserve the $\mathcal{H}$ -matrix structure of $Y$ , so we have to use “addmul” instead of “addeval”, approximating the intermediate result with the given accuracy.

In order to keep the implementation simple, it also makes sense to follow the structure of the block tree: if we switch to the sons of $t$ , we should also switch to the sons of $s$ , if it still has sons. The resulting algorithms are summarized in Figure 3.

We also require counterparts for the systems

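$$X|_{\hat{s}\times\hat{t}}L|_{\hat{t}\times\hat{t}}=Y|_{\hat{s}\times\hat{t}},\qquad X|_{\hat{s}\times\hat{t}}R|_{\hat{t}\times\hat{t}}=Y|_{\hat{s}\times\hat{t}}$$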

for blocks $(s,t)\in\mathcal{T}_{\mathcal{I}\times\mathcal{I}}$ . These can be constructed along the same lines as before and are summarized in Figure 4.

LR factorization

Now we have the necessary tools at our disposal to address the LR factorization. Given an $\mathcal{H}$ -matrix $G$ , we consider computing a lower triangular matrix $L$ with unit diagonal and an upper triangular matrix $R$ with

$$G|_{\hat{t}\times\hat{t}}=L|_{\hat{t}\times\hat{t}}R|_{\hat{t}\times\hat{t}}$$

for a cluster $t\in\mathcal{T}_{\mathcal{I}}$ . If $t$ is a leaf, $G|_{\hat{t}\times\hat{t}}$ is given in standard array representation and we can compute the LR factorization by the usual algorithms.

If $t$ is not a leaf, we follow (9) and define

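$$G_{11}:=G|_{\hat{t}_{1}\times\hat{t}_{1}},\quad G_{12}:=G|_{\hat{t}_{1}\times\hat{t}_{2}},\quad G_{21}:=G|_{\hat{t}_{2}\times\hat{t}_{1}},\quad G_{22}:=G|_{\hat{t}_{2}\times\hat{t}_{2}}.$$

Our equation takes the form

$$\begin{pmatrix}G_{11}&G_{12}\\G_{21}&G_{22}\end{pmatrix}=\begin{pmatrix}L_{11}&\\L_{21}&L_{22}\end{pmatrix}\begin{pmatrix}R_{11}&R_{12}\\&R_{22}\end{pmatrix}=\begin{pmatrix}L_{11}R_{11}&L_{11}R_{12}\\L_{21}R_{11}&L_{21}R_{12}+L_{22}R_{22}\end{pmatrix}.$$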

We can compute the LR factorization of $G|_{\hat{t}\times\hat{t}}$ by first computing the factorization $G_{11}=L_{11}R_{11}$ by recursion, followed by solving $G_{21}=L_{21}R_{11}$ with “rrsolve” and $G_{12}=L_{11}R_{12}$ with “llsolve”, computing the Schur complement $\widetilde{G}_{22}:=G_{22}-L_{21}R_{12}$ approximately with “addmul”, and finding the LR factorization $L_{22}R_{22}=\widetilde{G}_{22}$ , again by recursion. The resulting algorithm is summarized in Figure 5, where $G_{22}$ is overwritten by the Schur complement $\widetilde{G}_{22}$ .
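The recursive structure of the factorization can be sketched for a dense NumPy array as follows. The function name `lrdecomp`, the interval representation of clusters, and `LEAF_SIZE` are hypothetical; `np.linalg.solve` stands in for “rrsolve” and “llsolve”, and the exact Schur complement stands in for the approximate “addmul”. No pivoting is performed, so the example uses a diagonally dominant matrix.

```python
import numpy as np

LEAF_SIZE = 4  # hypothetical resolution of the cluster tree


def lrdecomp(G, lo, hi):
    """Overwrite G|_{t x t}, t = [lo, hi), by its LR factors (unit diagonal of L not stored)."""
    if hi - lo <= LEAF_SIZE:                   # leaf: standard LR factorization, no pivoting
        for j in range(lo, hi):
            G[j + 1:hi, j] /= G[j, j]
            G[j + 1:hi, j + 1:hi] -= np.outer(G[j + 1:hi, j], G[j, j + 1:hi])
        return
    mid = (lo + hi) // 2
    t1, t2 = slice(lo, mid), slice(mid, hi)
    lrdecomp(G, lo, mid)                       # G_11 = L_11 R_11
    L11 = np.tril(G[t1, t1], -1) + np.eye(mid - lo)
    R11 = np.triu(G[t1, t1])
    G[t2, t1] = np.linalg.solve(R11.T, G[t2, t1].T).T   # role of "rrsolve": L_21 R_11 = G_21
    G[t1, t2] = np.linalg.solve(L11, G[t1, t2])         # role of "llsolve": L_11 R_12 = G_12
    G[t2, t2] -= G[t2, t1] @ G[t1, t2]         # Schur complement (role of "addmul")
    lrdecomp(G, mid, hi)                       # L_22 R_22 = Schur complement


rng = np.random.default_rng(4)
n = 16
G0 = rng.uniform(-1.0, 1.0, (n, n)) + n * np.eye(n)   # diagonally dominant test matrix
G = G0.copy()
lrdecomp(G, 0, n)
L = np.tril(G, -1) + np.eye(n)
R = np.triu(G)
print(np.allclose(L @ R, G0))
```

The factors overwrite the strictly lower and the upper triangular part of the input, and the block `G[t2, t2]` is overwritten by the Schur complement before the second recursive call.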

Triangular inversion

The next operation to consider is the inversion of triangular matrices, i.e., we are looking for $\widetilde{L}:=L^{-1}$ and $\widetilde{R}:=R^{-1}$ . As before, we consider the inversion of submatrices $L|_{\hat{t}\times\hat{t}}$ and $R|_{\hat{t}\times\hat{t}}$ for a cluster $t\in\mathcal{T}_{\mathcal{I}}$ .

Again, if $t$ is a leaf, the matrices $L|_{\hat{t}\times\hat{t}}$ and $R|_{\hat{t}\times\hat{t}}$ are given in standard representation and can be inverted by standard algorithms. If $t$ has sons, the inverses can be written as

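$$L|_{\hat{t}\times\hat{t}}^{-1}=\begin{pmatrix}L_{11}^{-1}&\\-L_{22}^{-1}L_{21}L_{11}^{-1}&L_{22}^{-1}\end{pmatrix},\qquad R|_{\hat{t}\times\hat{t}}^{-1}=\begin{pmatrix}R_{11}^{-1}&-R_{11}^{-1}R_{12}R_{22}^{-1}\\&R_{22}^{-1}\end{pmatrix},$$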

so we can compute the off-diagonal blocks by calling “llsolve” and “lrsolve” in the first case and “rlsolve” and “rrsolve” in the second case, and then invert the diagonal blocks by recursion. The algorithms are summarized in Figure 6.
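A dense sketch of the lower triangular case, under the same assumptions as before (interval clusters, hypothetical `LEAF_SIZE`, and `np.linalg.solve` in the roles of “llsolve” and “lrsolve”); the upper triangular case is analogous.

```python
import numpy as np

LEAF_SIZE = 4  # hypothetical resolution of the cluster tree


def linvert(L, lo, hi):
    """Overwrite the lower triangular block L|_{t x t}, t = [lo, hi), by its inverse."""
    if hi - lo <= LEAF_SIZE:                               # leaf: standard inversion
        L[lo:hi, lo:hi] = np.linalg.inv(L[lo:hi, lo:hi])
        return
    mid = (lo + hi) // 2
    t1, t2 = slice(lo, mid), slice(mid, hi)
    # Off-diagonal block first, while L_11 and L_22 still hold the original factors.
    X = np.linalg.solve(L[t2, t2], L[t2, t1])              # role of "llsolve": L_22 X = L_21
    L[t2, t1] = -np.linalg.solve(L[t1, t1].T, X.T).T       # role of "lrsolve": Y L_11 = X
    linvert(L, lo, mid)                                    # L_11 <- L_11^{-1}
    linvert(L, mid, hi)                                    # L_22 <- L_22^{-1}


rng = np.random.default_rng(5)
n = 16
L0 = np.eye(n) + np.tril(rng.uniform(-1, 1, (n, n)), -1) / n
L = L0.copy()
linvert(L, 0, n)
print(np.allclose(L, np.linalg.inv(L0)))
```

The order of the steps matters for in-place operation: the off-diagonal block needs the original diagonal blocks, so these are inverted last.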

Triangular matrix multiplication

Finally, having $\widetilde{L}=L^{-1}$ and $\widetilde{R}=R^{-1}$ at our disposal, we consider computing the inverse

$$G^{-1}=(LR)^{-1}=R^{-1}L^{-1}=\widetilde{R}\widetilde{L}.$$

As in the previous cases, recursion leads to sub-problems of the form

[TABLE]

for clusters $t\in\mathcal{T}_{\mathcal{I}}$ . If $t$ is a leaf, we can compute the product directly.

Otherwise, the equation takes the form

[TABLE]

We can see that we have to compute products $\widetilde{R}_{11}\widetilde{L}_{11}$ and $\widetilde{R}_{22}\widetilde{L}_{22}$ that are of the same kind as the original problem and can be handled by recursion. We also have to compute the product $\widetilde{R}_{12}\widetilde{L}_{21}$ , which can be accomplished by “addmul”.

Finally, we have to compute $\widetilde{R}_{22}\widetilde{L}_{21}$ and $\widetilde{R}_{12}\widetilde{L}_{22}$ , i.e., products of triangular and non-triangular $\mathcal{H}$ -matrices. In order to handle this task, we could introduce suitable counterparts of the algorithms “rlsolve” and “lrsolve” that multiply by a triangular matrix instead of by its inverse. These algorithms would in turn require counterparts of “lsolve” and “rsolve”, i.e., we would have to introduce four more algorithms.

To keep this article short, another approach can be used: Since we have $\widetilde{R}_{22}=R_{22}^{-1}$ and $\widetilde{L}_{22}=L_{22}^{-1}$ , we can evaluate the product $\widetilde{R}_{12}\widetilde{L}_{22}=\widetilde{R}_{12}L_{22}^{-1}$ by the algorithm “lrsolve” and $\widetilde{R}_{22}\widetilde{L}_{21}=R_{22}^{-1}\widetilde{L}_{21}$ by the algorithm “rlsolve” without the need for additional algorithms. The result is summarized in Figure 7.
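One level of this block computation can be checked numerically as follows. The sketch assumes dense real matrices and a single bisection of the index set; `np.linalg.inv` and `np.linalg.solve` stand in for the triangular inversion and solve algorithms, and the variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(6)
n, h = 8, 4
L = np.eye(n) + np.tril(rng.uniform(-1, 1, (n, n)), -1) / n
R = np.triu(rng.uniform(-1, 1, (n, n)), 1) / n + np.diag(rng.uniform(1, 2, n))
Lt, Rt = np.linalg.inv(L), np.linalg.inv(R)      # stand-ins for the triangular inversions
t1, t2 = slice(0, h), slice(h, n)

M = np.empty((n, n))
M[t1, t1] = Rt[t1, t1] @ Lt[t1, t1] + Rt[t1, t2] @ Lt[t2, t1]   # recursion plus "addmul"
M[t2, t2] = Rt[t2, t2] @ Lt[t2, t2]                              # recursion
# Products with a triangular factor, rewritten as solves with the original blocks:
M[t1, t2] = np.linalg.solve(L[t2, t2].T, Rt[t1, t2].T).T        # role of "lrsolve": X L_22 = R~_12
M[t2, t1] = np.linalg.solve(R[t2, t2], Lt[t2, t1])              # role of "rlsolve": R_22 X = L~_21
print(np.allclose(M, np.linalg.inv(L @ R)))
```

The two off-diagonal blocks use the original factors $L_{22}$ and $R_{22}$ rather than their inverses, which is why no additional multiplication routines are required.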

Remark 4 (In-place operation)

All algorithms for triangular matrices introduced in this section can overwrite input variables with the result: for triangular solves, the right-hand side can be overwritten with the solution; for the LR factorization, the lower and upper triangular parts of the input matrix can be overwritten with the triangular factors; and the triangular inversion algorithms can overwrite the input matrices with their inverses.

If we are only interested in computing the inverse, we can interleave the algorithm “lrinvert” with “linvert” and “rinvert” to avoid additional storage for the intermediate results $\widetilde{L}$ and $\widetilde{R}$: once the LR factorization is available, we first overwrite the off-diagonal blocks $L_{21}$ and $R_{12}$ by the intermediate results $\widetilde{L}_{21}=-L_{22}^{-1}L_{21}L_{11}^{-1}$ and $\widetilde{R}_{12}=-R_{11}^{-1}R_{12}R_{22}^{-1}$, then overwrite the first diagonal block recursively by its inverse, add the product $\widetilde{R}_{12}\widetilde{L}_{21}$, then overwrite the no longer required off-diagonal blocks with $R_{22}^{-1}\widetilde{L}_{21}$ and $\widetilde{R}_{12}L_{22}^{-1}$. A recursive call to compute the inverse of the second diagonal block completes the algorithm.

5 Complexity estimates for combined operations

Due to the lack of symmetry introduced by the low-rank approximation steps required to compute an $\mathcal{H}$ -matrix, we cannot prove that the LR factorization requires one third of the work of the matrix multiplication. We can, however, prove that the LR factorization $G=LR$ , the inversion of the triangular factors, and the multiplication $R^{-1}L^{-1}$ together require no more work than the matrix multiplication.

Before we can consider the $\mathcal{H}$ -matrix case, we recall the corresponding estimates for standard matrices, cf. (1): the LR factorization, triangular matrix inversion, and multiplication require

[TABLE]

By adding the estimates for the four parts of the inversion algorithm, we obtain a computational cost of

[TABLE]

i.e., inverting $G$ requires fewer operations than multiplying the matrix by itself and adding the result to a matrix. We aim to obtain a similar result for $\mathcal{H}$ -matrices.
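For the LR factorization alone, the dense operation count can be reproduced by instrumenting the algorithm. The sketch below is an assumption-laden illustration: it counts every division, multiplication, and subtraction as one operation, uses no pivoting, and compares the result with the sum $\sum_{\ell=1}^{n}(n-\ell)+2(n-\ell)^{2}=\tfrac{n}{6}(4n^{2}-3n-1)$ and with the $2n^{3}$ operations of a dense multiply-and-add. The function names are hypothetical.

```python
def lr_factorize_counting(A):
    """Unpivoted dense LR factorization; returns packed factors and operation count."""
    n = len(A)
    A = [[float(x) for x in row] for row in A]
    ops = 0
    for k in range(n):
        for i in range(k + 1, n):
            A[i][k] /= A[k][k]                # one division per entry of L
            ops += 1
            for j in range(k + 1, n):
                A[i][j] -= A[i][k] * A[k][j]  # one multiplication, one subtraction
                ops += 2
    return A, ops

def closed_form(n):
    """Closed form of sum_{l=1}^{n} (n-l) + 2 (n-l)^2."""
    return n * (4 * n * n - 3 * n - 1) // 6

for n in (4, 16, 64):
    # Diagonally dominant test matrix, so that no pivot vanishes.
    A = [[n + 1 if i == j else 1 for j in range(n)] for i in range(n)]
    _, ops = lr_factorize_counting(A)
    assert ops == closed_form(n)
    print(n, ops, 2 * n**3, ops / (2 * n**3))
```

The printed ratio approaches one third as $n$ grows, which is the dense relation that cannot be carried over to $\mathcal{H}$ -matrices directly.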

We assume that the block tree is admissible, i.e., that a leaf $b=(t,s)$ of the block tree $\mathcal{T}_{\mathcal{I}\times\mathcal{I}}$ is either admissible or has a leaf of $\mathcal{T}_{\mathcal{I}}$ either as row or column cluster:

[TABLE]

and we assume that there is a constant $\varrho\in\mathbb{N}$ such that

[TABLE]

The constant $\varrho$ is called the resolution (sometimes also the leaf size) of $\mathcal{T}_{\mathcal{I}}$ . Both properties (10) and (11) can be ensured during the construction of the cluster tree.
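A cluster tree with a prescribed resolution can be obtained by simple bisection. The following sketch assumes that (11) is read as "a cluster is a leaf exactly if it contains at most $\varrho$ indices", which is how the condition is used in the proof of Lemma 5.7; the dictionary layout and the function names are hypothetical.

```python
def build_cluster_tree(indices, resolution):
    """Bisect an index list until at most `resolution` indices remain."""
    node = {"indices": indices, "sons": []}
    if len(indices) > resolution:
        half = len(indices) // 2
        node["sons"] = [build_cluster_tree(indices[:half], resolution),
                        build_cluster_tree(indices[half:], resolution)]
    return node

def clusters(node):
    """Iterate over a cluster and all of its descendants."""
    yield node
    for son in node["sons"]:
        yield from clusters(son)

tree = build_cluster_tree(list(range(100)), 8)
# A cluster is a leaf exactly if it holds at most `resolution` indices.
assert all((not c["sons"]) == (len(c["indices"]) <= 8) for c in clusters(tree))
# The leaves form a partition of the index set.
leaves = [c for c in clusters(tree) if not c["sons"]]
assert sorted(i for c in leaves for i in c["indices"]) == list(range(100))
```

Geometric clustering strategies split along coordinate directions instead of index positions, but the stopping criterion can be enforced in the same way.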

Now let us consider the number of operations for the algorithms “lsolve” and “rsolve” given in Figure 2. If $t\in\mathcal{T}_{\mathcal{I}}$ is a leaf, solving the linear systems requires $|\hat{t}|^{2}$ operations. Otherwise, we just use “addeval” and recursive calls and arrive at the recurrence formulas

[TABLE]

that give bounds for the number of operations required by “lsolve” and “rsolve”, respectively, where $\ell\in\mathbb{N}$ again denotes the number of columns of the matrices $X$ and $Y$ .
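The structure of such a recurrence can be illustrated in the dense case. The sketch below is a stand-in under explicit assumptions: the lower triangular matrix is dense with a general non-zero diagonal, the off-diagonal multiply-and-add plays the role of “addeval”, and each arithmetic operation counts once. The function `lsolve` defined here is a hypothetical dense analogue, not the algorithm of Figure 2.

```python
from fractions import Fraction

def lsolve(L, B, lo, hi, leaf_size, counter):
    """Solve L[lo:hi, lo:hi] X = B[lo:hi] by recursive bisection, overwriting B."""
    n = hi - lo
    cols = len(B[0])
    if n <= leaf_size:
        # Leaf: plain forward substitution, n^2 operations per column.
        for i in range(lo, hi):
            for j in range(lo, i):
                for c in range(cols):
                    B[i][c] -= L[i][j] * B[j][c]
            for c in range(cols):
                B[i][c] /= L[i][i]
        counter[0] += n * n * cols
        return
    mid = lo + n // 2
    lsolve(L, B, lo, mid, leaf_size, counter)
    # Dense counterpart of "addeval": B2 <- B2 - L21 X1.
    for i in range(mid, hi):
        for j in range(lo, mid):
            for c in range(cols):
                B[i][c] -= L[i][j] * B[j][c]
    counter[0] += 2 * (hi - mid) * (mid - lo) * cols
    lsolve(L, B, mid, hi, leaf_size, counter)

N, cols = 16, 3
L = [[Fraction(i + j + 1) if j <= i else Fraction(0) for j in range(N)] for i in range(N)]
X = [[Fraction(i - c) for c in range(cols)] for i in range(N)]
B = [[sum(L[i][j] * X[j][c] for j in range(N)) for c in range(cols)] for i in range(N)]
counter = [0]
lsolve(L, B, 0, N, 4, counter)
assert B == X                        # the right-hand side now holds the solution
assert counter[0] == N * N * cols    # n1^2 + 2 n1 n2 + n2^2 = (n1 + n2)^2 per column
```

In the dense case the recursion reproduces the cost of plain forward substitution exactly; for $\mathcal{H}$ -matrices the off-diagonal term is replaced by the cost of the $\mathcal{H}$ -matrix-vector multiplication.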

Lemma 5 (Solving linear systems)

We have

[TABLE]

Proof 5.6.

By structural induction.

Let $t\in\mathcal{L}_{\mathcal{I}}$ . We have $(t,t)\in\mathcal{L}_{\mathcal{I}\times\mathcal{I}}^{-}$ and therefore

[TABLE]

Let now $t\in\mathcal{T}_{\mathcal{I}}\setminus\mathcal{L}_{\mathcal{I}}$ be such that our claim holds for the sons $t_{1}$ and $t_{2}$ . We have

[TABLE]

In the next step, we compare the computational work for the $\mathcal{H}$ -matrix multiplication with that for the combination of the algorithms “llsolve” and “rlsolve” or “lrsolve” and “rrsolve”, respectively.

The computational work for the forward substitution algorithm “llsolve” given in Figure 3 can be bounded by

[TABLE]

for all $(t,s)\in\mathcal{T}_{\mathcal{I}\times\mathcal{I}}$ , while we get

[TABLE]

for all $(t,s)\in\mathcal{T}_{\mathcal{I}\times\mathcal{I}}$ for the algorithm “rlsolve”.

Lemma 5.7 (Forward and backward solves).

We have

[TABLE]

Proof 5.8.

By structural induction, where the base case $(t,s)\in\mathcal{L}_{\mathcal{I}\times\mathcal{I}}$ is split into two sub-cases for admissible and inadmissible leaves.

Case 1: Let $(t,s)\in\mathcal{L}_{\mathcal{I}\times\mathcal{I}}^{+}$ . If $(t,t)\in\mathcal{L}_{\mathcal{I}\times\mathcal{I}}$ holds, we have $t\in\mathcal{L}_{\mathcal{I}}$ and

[TABLE]

Otherwise, i.e., if $(t,t)\not\in\mathcal{L}_{\mathcal{I}\times\mathcal{I}}$ , we have $t\not\in\mathcal{L}_{\mathcal{I}}$ , and Lemma 5 yields

[TABLE]

Case 2: Let $(t,s)\in\mathcal{L}_{\mathcal{I}\times\mathcal{I}}^{-}$ . If $(t,t)\in\mathcal{L}_{\mathcal{I}\times\mathcal{I}}$ holds, we have $t\in\mathcal{L}_{\mathcal{I}}$ and

[TABLE]

Otherwise, i.e., if $(t,t)\not\in\mathcal{L}_{\mathcal{I}\times\mathcal{I}}$ , we have $t\not\in\mathcal{L}_{\mathcal{I}}$ and therefore $s\in\mathcal{L}_{\mathcal{I}}$ . Due to (11), this means $|\hat{s}|\leq\varrho<|\hat{t}|$ , and we can use $k_{ts}=|\hat{s}|$ and Lemma 5 to obtain

[TABLE]

Case 3: Let $(t,s)\in\mathcal{T}_{\mathcal{I}\times\mathcal{I}}\setminus\mathcal{L}_{\mathcal{I}\times\mathcal{I}}$ be such that our claim holds for all sons of $(t,s)$ . Since $(t,s)$ is not a leaf, we have $t\not\in\mathcal{L}_{\mathcal{I}}$ and therefore also $(t,t)\not\in\mathcal{L}_{\mathcal{I}\times\mathcal{I}}$ . This implies

[TABLE]

and our proof is complete.

Now we consider the two algorithms “lrsolve” and “rrsolve”. They rely on “lsolvetrans” and “rsolvetrans”, and these algorithms require the same work as “lsolve” and “rsolve”, respectively. The work for “lrsolve” and “rrsolve” is then bounded by

[TABLE]

for all $(s,t)\in\mathcal{T}_{\mathcal{I}\times\mathcal{I}}$ . Proceeding as in the proof of Lemma 5.7 leads us to

[TABLE]

Now that the fundamental statements for the forward and backward substitution algorithms are at our disposal, we can consider the factorization and inversion algorithms. We directly obtain the bounds

[TABLE]

for all clusters $t\in\mathcal{T}_{\mathcal{I}}$ and the algorithms “lrdecomp”, “linvert”, “rinvert”, and “lrinvert”, respectively.

Theorem 5.9 (Combined complexity).

We have

[TABLE]

Proof 5.10.

By structural induction. We start with the base case $t\in\mathcal{L}_{\mathcal{I}}$ and observe

[TABLE]

Now let $t\in\mathcal{T}_{\mathcal{I}}$ be chosen such that the estimate holds for all of its sons. We obtain

[TABLE]

where we have used Lemma 5.7 in the next-to-last estimate.

6 Complexity of the $\mathcal{H}$ -matrix multiplication

We have seen that the number of operations for our algorithms can be bounded by the number of operations $W_{\text{mm}}(t,s,r)$ required by the matrix multiplication. In order to complete the analysis, we derive a bound for $W_{\text{mm}}(t,s,r)$ that is sharper than the standard results provided in [16, 22]. We rely on the following assumptions:

• for an inadmissible leaf $b=(t,s)\in\mathcal{L}_{\mathcal{I}\times\mathcal{I}}^{-}$ of the block tree, we have either $t\in\mathcal{L}_{\mathcal{I}}$ or $s\in\mathcal{L}_{\mathcal{I}}$ ,

• there is a constant $m\in\mathbb{N}$ , e.g., the resolution introduced in (11), such that

[TABLE]

• there is a constant $p\in\mathbb{N}_{0}$ such that

[TABLE]

i.e., $p$ is an upper bound for the depth of the cluster tree,

• the block tree is sparse, i.e., there is a constant $C_{\text{sp}}\in\mathbb{N}$ such that

[TABLE]
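The sparsity assumption can be explored numerically in a model situation. The sketch below assumes a one-dimensional index set clustered by bisection, a hypothetical admissibility condition (the larger cluster diameter is at most the distance of the clusters), and reads (12) as a bound on the number of blocks sharing a row or a column cluster; none of these choices is prescribed by the text, and the class and function names are illustrative.

```python
from collections import Counter

class Cluster:
    """Cluster for the index range [lo, hi), bisected down to `leaf_size`."""
    def __init__(self, lo, hi, leaf_size):
        self.lo, self.hi = lo, hi
        self.sons = []
        if hi - lo > leaf_size:
            mid = (lo + hi) // 2
            self.sons = [Cluster(lo, mid, leaf_size), Cluster(mid, hi, leaf_size)]

def admissible(t, s):
    """Hypothetical 1D condition: max(diam(t), diam(s)) <= dist(t, s)."""
    dist = max(t.lo - s.hi, s.lo - t.hi, 0)
    return max(t.hi - t.lo, s.hi - s.lo) <= dist

def block_tree(t, s):
    """Yield all blocks; subdivide unless admissible or one cluster is a leaf."""
    yield (t, s)
    if not admissible(t, s) and t.sons and s.sons:
        for t_son in t.sons:
            for s_son in s.sons:
                yield from block_tree(t_son, s_son)

def sparsity_constant(root):
    """Largest number of blocks that share a row cluster or a column cluster."""
    rows, cols = Counter(), Counter()
    for t, s in block_tree(root, root):
        rows[t] += 1
        cols[s] += 1
    return max(max(rows.values()), max(cols.values()))

c_small = sparsity_constant(Cluster(0, 64, 4))
c_large = sparsity_constant(Cluster(0, 1024, 4))
assert c_small == c_large   # the constant does not grow with the problem size
print(c_small, c_large)
```

In this model problem the constant stays fixed while the matrix dimension grows by a factor of sixteen, which is the behaviour the sparsity assumption formalizes.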

We denote the maximal rank of leaf blocks by

[TABLE]

In order to facilitate working with sums involving clusters and blocks, we introduce the sets of descendants of clusters and blocks by

[TABLE]

for all $t,s\in\mathcal{T}_{\mathcal{I}}$ . The special case $(t,s)\not\in\mathcal{T}_{\mathcal{I}\times\mathcal{I}}$ will be important when dealing with products of hierarchical matrices that are added to a part of a low-rank submatrix.

Since the matrix multiplication relies on the matrix-vector multiplication, we start by deriving a bound for $W_{\text{ev}}(t,s,\ell)$ introduced in (4). If $(t,s)\in\mathcal{L}_{\mathcal{I}\times\mathcal{I}}^{-}$ , our first assumption yields $t\in\mathcal{L}_{\mathcal{I}}$ or $s\in\mathcal{L}_{\mathcal{I}}$ . In the first case, the second assumption gives us $|\hat{t}|\leq m$ and

[TABLE]

In the second case, we have $|\hat{s}|\leq m$ and obtain

[TABLE]

Now we can combine the estimates for the inadmissible leaves with those for the admissible ones to find

[TABLE]

for all $b=(t,s)\in\mathcal{T}_{\mathcal{I}\times\mathcal{I}}$ . A straightforward induction yields

[TABLE]

Next we consider the low-rank update and look for an upper bound for $W_{\text{up}}(t,s)$ introduced in (7). Using the same arguments as before, we find

[TABLE]

for all $t,s\in\mathcal{T}_{\mathcal{I}}$ , and introducing $C_{\text{up}}:=\max\{C_{\text{ad}},1\}$ yields the upper bound

[TABLE]

for all $t,s\in\mathcal{T}_{\mathcal{I}}$ due to $2\ell\hat{k}\leq(\hat{k}+\ell)^{2}$ . A straightforward induction, keeping in mind the special case of $\mathop{\operatorname{desc}}\nolimits(t,s)$ for $(t,s)\not\in\mathcal{T}_{\mathcal{I}\times\mathcal{I}}$ , leads to the estimate

[TABLE]

Now we can investigate the matrix multiplication, i.e., we can look for an upper bound for $W_{\text{mm}}(t,s,r)$ introduced in (8). While the computational work for the matrix-vector multiplication and the update depends only on two clusters $t,s\in\mathcal{T}_{\mathcal{I}}$ , the matrix multiplication depends on three clusters $t,s,r\in\mathcal{T}_{\mathcal{I}}$ . We can collect these triples in a special product tree that represents the recursive structure of the algorithm.

Definition 6.11 (Product tree).

Given a cluster tree $\mathcal{T}_{\mathcal{I}}$ and a corresponding block tree $\mathcal{T}_{\mathcal{I}\times\mathcal{I}}$ , the product tree $\mathcal{T}_{\mathcal{I}\times\mathcal{I}\times\mathcal{I}}$ is the minimal tree satisfying the following conditions:

• For every node $\pi\in\mathcal{T}_{\mathcal{I}\times\mathcal{I}\times\mathcal{I}}$ of the product tree, there are clusters $t,s,r\in\mathcal{T}_{\mathcal{I}}$ with $\pi=(t,s,r)$ .

• Let $t\in\mathcal{T}_{\mathcal{I}}$ be the root of $\mathcal{T}_{\mathcal{I}}$ . Then $(t,t,t)$ is the root of $\mathcal{T}_{\mathcal{I}\times\mathcal{I}\times\mathcal{I}}$ .

• A node $\pi=(t,s,r)\in\mathcal{T}_{\mathcal{I}\times\mathcal{I}\times\mathcal{I}}$ is a leaf if and only if $(t,s)\in\mathcal{L}_{\mathcal{I}\times\mathcal{I}}$ or $(s,r)\in\mathcal{L}_{\mathcal{I}\times\mathcal{I}}$ . Otherwise, its sons are given by $\mathop{\operatorname{sons}}\nolimits(\pi)=\mathop{\operatorname{sons}}\nolimits(t)\times\mathop{\operatorname{sons}}\nolimits(s)\times\mathop{\operatorname{sons}}\nolimits(r)$ .

Due to this definition, $\pi=(t,s,r)\in\mathcal{T}_{\mathcal{I}\times\mathcal{I}\times\mathcal{I}}$ implies $(t,s)\in\mathcal{T}_{\mathcal{I}\times\mathcal{I}}$ and $(s,r)\in\mathcal{T}_{\mathcal{I}\times\mathcal{I}}$ .
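Definition 6.11 translates directly into a short construction. The following sketch assumes clusters given as index ranges `(lo, hi)` obtained by bisection, a hypothetical one-dimensional admissibility condition, and a block tree that subdivides a block unless it is admissible or one of its clusters is a leaf; all names are illustrative. It checks the implication stated above for every node of the product tree.

```python
LEAF_SIZE = 2  # assumed resolution of the cluster tree

def sons(t):
    """Sons of the cluster t = (lo, hi); leaves have no sons."""
    lo, hi = t
    if hi - lo <= LEAF_SIZE:
        return []
    mid = (lo + hi) // 2
    return [(lo, mid), (mid, hi)]

def admissible(t, s):
    """Hypothetical 1D condition: max(diam(t), diam(s)) <= dist(t, s)."""
    dist = max(t[0] - s[1], s[0] - t[1], 0)
    return max(t[1] - t[0], s[1] - s[0]) <= dist

def block_sons(t, s):
    """Sons of the block (t, s); an empty list means (t, s) is a leaf."""
    if admissible(t, s) or not sons(t) or not sons(s):
        return []
    return [(a, b) for a in sons(t) for b in sons(s)]

def block_tree(root):
    blocks, stack = set(), [(root, root)]
    while stack:
        b = stack.pop()
        blocks.add(b)
        stack.extend(block_sons(*b))
    return blocks

def product_tree(root):
    """All triples (t, s, r) of the product tree, root first."""
    nodes, stack = [], [(root, root, root)]
    while stack:
        t, s, r = stack.pop()
        nodes.append((t, s, r))
        # Subdivide only if neither (t, s) nor (s, r) is a leaf of the block tree.
        if block_sons(t, s) and block_sons(s, r):
            stack.extend((a, b, c) for a in sons(t) for b in sons(s) for c in sons(r))
    return nodes

root = (0, 32)
blocks = block_tree(root)
nodes = product_tree(root)
assert nodes[0] == (root, root, root)
assert all((t, s) in blocks and (s, r) in blocks for t, s, r in nodes)
```

Note that $(t,r)$ is not required to be a block of the block tree, which is the source of the special case treated for (15c).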

Let $\pi=(t,s,r)\in\mathcal{T}_{\mathcal{I}\times\mathcal{I}\times\mathcal{I}}$ . If $(t,s)\in\mathcal{L}_{\mathcal{I}\times\mathcal{I}}$ , we can apply the estimates (13) and (14) to obtain

[TABLE]

If $(s,r)\in\mathcal{L}_{\mathcal{I}\times\mathcal{I}}$ , we get

[TABLE]

Otherwise, we have

[TABLE]

In order to get rid of the recursion, we define the set of descendants $\mathop{\operatorname{desc}}\nolimits(t,s,r)$ for every triple $\pi=(t,s,r)\in\mathcal{T}_{\mathcal{I}\times\mathcal{I}\times\mathcal{I}}$ as before and obtain

[TABLE]

We can investigate each of these terms separately. First we notice that Definition 1 implies that all index sets corresponding to clusters on the same level of $\mathcal{T}_{\mathcal{I}}$ are disjoint, and we have

[TABLE]

for all $t\in\mathcal{T}_{\mathcal{I}}$ . Next we notice that Definition 2 and the sparsity assumption (12) together with (16) yield

[TABLE]

for all $(t,s)\in\mathcal{T}_{\mathcal{I}\times\mathcal{I}}$ . Finally we observe that Definition 6.11 ensures that $(t^{\prime},s^{\prime},r^{\prime})\in\mathcal{T}_{\mathcal{I}\times\mathcal{I}\times\mathcal{I}}$ implies both $(t^{\prime},s^{\prime})\in\mathcal{T}_{\mathcal{I}\times\mathcal{I}}$ and $(s^{\prime},r^{\prime})\in\mathcal{T}_{\mathcal{I}\times\mathcal{I}}$ , so that we can again use the sparsity assumption (12) together with (17) to get

[TABLE]

for all $(t,s,r)\in\mathcal{T}_{\mathcal{I}\times\mathcal{I}\times\mathcal{I}}$ . With these preliminary estimates at our disposal, we can consider the sums appearing in (15).

For (15a), we can take advantage of the fact that due to (12), for every $(s^{\prime},r^{\prime})\in\mathcal{T}_{\mathcal{I}\times\mathcal{I}}$ , there are at most $C_{\text{sp}}$ clusters $t^{\prime}\in\mathcal{T}_{\mathcal{I}}$ such that $(t^{\prime},s^{\prime},r^{\prime})\in\mathcal{T}_{\mathcal{I}\times\mathcal{I}\times\mathcal{I}}$ , so we find

[TABLE]

By Definition 2, the depth of the block tree $\mathcal{T}_{\mathcal{I}\times\mathcal{I}}$ is bounded by the depth of the cluster tree $\mathcal{T}_{\mathcal{I}}$ , and therefore by the constant $p$ introduced in our assumptions. Since every block has at most one father, a block $(s^{\prime\prime},r^{\prime\prime})\in\mathop{\operatorname{desc}}\nolimits(s,r)$ cannot have more than $p+1$ predecessors $(s^{\prime},r^{\prime})\in\mathop{\operatorname{desc}}\nolimits(s,r)$ , and we can use (17) to get

[TABLE]

We can proceed in a similar manner for (15b) to get

[TABLE]

The sum (15d) can be handled using (18) to get

[TABLE]

This leaves us with only (15c). Here, we have to distinguish two cases: if $(t^{\prime},r^{\prime})\in\mathcal{T}_{\mathcal{I}\times\mathcal{I}}$ , we can proceed as in the first two cases to find

[TABLE]

where we have used (17) in the last step. If $(t^{\prime},r^{\prime})\not\in\mathcal{T}_{\mathcal{I}\times\mathcal{I}}$ , i.e., if it has been necessary to split an admissible leaf block temporarily, the definition of $\mathop{\operatorname{desc}}\nolimits(t^{\prime},r^{\prime})$ yields

[TABLE]

using (18) in the last step. Collecting all parts of the sum (15) yields our final result.

Theorem 6.12 (Matrix multiplication).

Let $(t,s,r)\in\mathcal{T}_{\mathcal{I}\times\mathcal{I}\times\mathcal{I}}$ . We have

[TABLE]

with $C_{\text{mm}}:=4+2C_{\text{up}}+C_{\text{mg}}$ .

Bibliography

[1] P. Amestoy, C. Ashcraft, O. Boiteau, A. Buttari, J.-Y. L’Excellent, and C. Weisbecker. Improving multifrontal methods by means of block low-rank representations. SIAM J. Sci. Comput., 37(3):A1451–A1474, 2015.
[2] P. Amestoy, A. Buttari, J.-Y. L’Excellent, and T. Mary. On the complexity of the block low-rank multifrontal factorizations. SIAM J. Sci. Comput., 39(4):A1710–A1740, 2017.
[3] U. Baur. Low rank solution of data-sparse Sylvester equations. Numer. Lin. Alg. Appl., 15:837–851, 2008.
[4] U. Baur and P. Benner. Factorized solution of Lyapunov equations based on hierarchical matrix arithmetic. Computing, 78(3):211–234, 2006.
[5] M. Bebendorf. Approximation of boundary element matrices. Numer. Math., 86(4):565–589, 2000.
[6] M. Bebendorf and W. Hackbusch. Existence of $\mathcal{H}$-matrix approximants to the inverse FE-matrix of elliptic operators with $L^{\infty}$-coefficients. Numer. Math., 95:1–28, 2003.
[7] G. Beylkin, R. Coifman, and V. Rokhlin. The fast wavelet transform and numerical algorithms. Comm. Pure and Appl. Math., 44:141–183, 1991.
[8] S. Börm and L. Grasedyck. Low-rank approximation of integral operators by interpolation. Computing, 72:325–332, 2004.
