Low-rank updates and divide-and-conquer methods for quadratic matrix equations

Daniel Kressner; Patrick Kürschner; Stefano Massei

arXiv:1903.02343·math.NA·March 7, 2019


TL;DR

This paper introduces a fast low-rank update technique and a divide-and-conquer method for solving large-scale quadratic matrix equations, especially those with hierarchical low-rank structured coefficients, improving efficiency over iterative schemes.

Contribution

It presents a novel low-rank update approach and a divide-and-conquer algorithm tailored to quadratic matrix equations with hierarchical low-rank structure, extending earlier methods for linear matrix equations.

Findings

1. In the numerical experiments, the proposed methods are faster than the iterative schemes in most cases (Tables 1, 2, and 4); for the example in Table 5, the timings are comparable.

2. The divide-and-conquer approach efficiently handles structured large-scale equations.

3. Low-rank updates enable quick adjustments of solutions under coefficient modifications.

Abstract

In this work, we consider two types of large-scale quadratic matrix equations: Continuous-time algebraic Riccati equations, which play a central role in optimal and robust control, and unilateral quadratic matrix equations, which arise from stochastic processes on 2D lattices and vibrating systems. We propose a simple and fast way to update the solution to such matrix equations under low-rank modifications of the coefficients. Based on this procedure, we develop a divide-and-conquer method for quadratic matrix equations with coefficients that feature a specific type of hierarchical low-rank structure, which includes banded matrices. This generalizes earlier work on linear matrix equations. Numerical experiments indicate the advantages of our newly proposed method versus iterative schemes combined with hierarchical low-rank arithmetic.

Tables

Table 1: Execution times (in seconds) and residuals for the divide-and-conquer method and SDA applied to the CARE from Example 5.1.

n	Algorithm 4 time (s)	Algorithm 4 Res	Algorithm 4 HODLR rank	SDA time (s)	SDA Res	SDA HODLR rank
1024	2.06	$4.41 \cdot 10^{-11}$	55	6.11	$1.17 \cdot 10^{-10}$	55
2048	4.02	$1.00 \cdot 10^{-10}$	58	21.36	$1.82 \cdot 10^{-9}$	59
4096	8.33	$5.85 \cdot 10^{-10}$	67	65.42	$1.42 \cdot 10^{-9}$	67
8192	18.27	$5.62 \cdot 10^{-9}$	67	226	$1.33 \cdot 10^{-8}$	67
16384	39.71	$1.02 \cdot 10^{-8}$	74	670.17	$7.44 \cdot 10^{-8}$	76
32768	84.1	$5.56 \cdot 10^{-8}$	73	2023.8	$2.33 \cdot 10^{-6}$	62

Table 2: Execution times (in seconds) and residuals for the divide-and-conquer method and SDA applied to the CARE from Example 5.2.

n	Algorithm 4 time (s)	Algorithm 4 Res	Algorithm 4 HODLR rank	SDA time (s)	SDA Res	SDA HODLR rank
1024	1.44	$2.34 \cdot 10^{-10}$	10	3.15	$9.54 \cdot 10^{-14}$	5
2048	2.74	$1.23 \cdot 10^{-11}$	9	7.76	$2.54 \cdot 10^{-13}$	5
4096	5.5	$8.03 \cdot 10^{-12}$	10	20.43	$2.80 \cdot 10^{-13}$	4
8192	11.2	$7.51 \cdot 10^{-13}$	10	52.07	$2.88 \cdot 10^{-13}$	4
16384	22.21	$8.82 \cdot 10^{-13}$	10	131.18	$2.89 \cdot 10^{-13}$	3
32768	44.47	$1.56 \cdot 10^{-13}$	9	312.57	$2.89 \cdot 10^{-13}$	3

Table 3: Execution times (in seconds) and residuals for the divide-and-conquer method and SDA applied to the CARE from Example 5.3.

n	Algorithm 4 time (s)	Algorithm 4 Res	Algorithm 4 HODLR rank
1024	7.54	$1.20 \cdot 10^{-7}$	23
2048	15.01	$3.02 \cdot 10^{-8}$	27
4096	29.74	$7.58 \cdot 10^{-9}$	28
8192	62.01	$1.90 \cdot 10^{-9}$	31
16384	128.99	$4.74 \cdot 10^{-10}$	28
32768	263.61	$1.19 \cdot 10^{-10}$	27

Table 4: Execution times (in seconds) and residuals for the divide-and-conquer method and cyclic reduction (CR) applied to the example from Section 5.2.

n	Algorithm 6 time (s)	Algorithm 6 Res	Algorithm 6 HODLR rank	CR time (s)	CR Res	CR HODLR rank
1024	1.84	$6.45 \cdot 10^{-9}$	15	3	$7.04 \cdot 10^{-9}$	20
2048	3.2	$3.83 \cdot 10^{-9}$	16	10.37	$4.31 \cdot 10^{-9}$	18
4096	8.68	$5.08 \cdot 10^{-9}$	17	23.55	$6.82 \cdot 10^{-9}$	21
8192	22.27	$5.18 \cdot 10^{-9}$	18	68.16	$3.87 \cdot 10^{-9}$	20
16384	55.54	$6.10 \cdot 10^{-9}$	18	160.28	$5.54 \cdot 10^{-9}$	22
32768	137.09	$6.67 \cdot 10^{-9}$	18	429.2	$8.61 \cdot 10^{-9}$	22

Table 5: Execution times (in seconds) and residuals for the divide-and-conquer method and cyclic reduction (CR) applied to the example from Section 5.3.

n	Algorithm 6 time (s)	Algorithm 6 Res	Algorithm 6 HODLR rank	CR time (s)	CR Res	CR HODLR rank
1024	0.48	$7.76 \cdot 10^{-9}$	2	0.61	$2.38 \cdot 10^{-8}$	4
2048	1.2	$7.76 \cdot 10^{-9}$	2	1.28	$2.38 \cdot 10^{-8}$	4
4096	2.95	$7.76 \cdot 10^{-9}$	2	2.97	$2.38 \cdot 10^{-8}$	4
8192	7.18	$7.76 \cdot 10^{-9}$	2	6.95	$2.38 \cdot 10^{-8}$	4
16384	18.05	$7.76 \cdot 10^{-9}$	2	14.75	$2.38 \cdot 10^{-8}$	4
32768	41.14	$7.76 \cdot 10^{-9}$	2	34.57	$2.38 \cdot 10^{-8}$	4

Equations

$$A^{*}XE+E^{*}XA-E^{*}XFXE+Q=0$$

$$A_{0}^{*}X_{0}+X_{0}A_{0}-X_{0}F_{0}X_{0}+Q_{0}=0$$

$$A^{*}X+XA-XFX+Q=0$$

$$A:=A_{0}+\delta A,\quad F:=F_{0}+\delta F,\quad Q:=Q_{0}+\delta Q,\quad\text{with }\delta A,\delta F,\delta Q\text{ of low rank}$$

$$(A-FX_{0})^{*}\delta X+\delta X(A-FX_{0})-\delta XF\delta X+\widehat{Q}=0$$

$$\hbox{rk}(\widehat{Q})\leq\hbox{rk}(\delta Q)+2\,\hbox{rk}(\delta A)+\hbox{rk}(\delta F)$$

$$AX^{2}+BX+C=0$$

$$\varphi(\lambda):=\lambda^{2}A+\lambda B+C$$

$$|\lambda_{1}|\leq\cdots\leq|\lambda_{n}|\leq 1\leq|\lambda_{n+1}|\leq\cdots\leq|\lambda_{2n}|,\qquad|\lambda_{n}|<|\lambda_{n+1}|$$

$$A_{0}X_{0}^{2}+B_{0}X_{0}+C_{0}=0$$

$$(A_{0}+\delta A)(X_{0}+\delta X)^{2}+(B_{0}+\delta B)(X_{0}+\delta X)+(C_{0}+\delta C)=0$$

$$A\,\delta X^{2}+(AX_{0}+B)\,\delta X+A\,\delta X\,X_{0}+\widehat{C}=0$$

$$AX+XB=Q$$

$$\sigma_{kh+1}(X)\leq\|r(A)\|_{2}\,\|r(-B)^{-1}\|_{2}\,\|X\|_{2}$$

$$\frac{\sigma_{kh+1}(X)}{\|X\|_{2}}\leq K_{C}\min_{r\in\mathcal{R}_{h,h}}\frac{\max_{z\in E}|r(z)|}{\min_{z\in F}|r(z)|}$$

$$\frac{\sigma_{kh+1}(X)}{\|X\|_{2}}\leq\kappa_{\mathsf{eig}}(A)\,\kappa_{\mathsf{eig}}(B)\min_{r\in\mathcal{R}_{h,h}}\frac{\max_{z\in E}|r(z)|}{\min_{z\in F}|r(z)|}$$

$$(A-FX_{0})-F\delta X=A-FX$$

$$A^{*}\delta X+\delta XA=-\widehat{Q}+\delta XF(\delta X+X_{0})+X_{0}F\delta X$$

$$(A-FX)^{*}\delta X+\delta X(A-FX)=-\widehat{Q}-\delta XF\delta X$$

$$(A-F\widetilde{X})^{*}\delta X+\delta X(A-F\widetilde{X})=-\widehat{Q}+\delta XF\delta X-\delta\widetilde{X}F\delta X-\delta XF\delta\widetilde{X}$$

$$\begin{bmatrix}X_{0}&I\\-\widehat{C}&-(AX_{0}+B)\end{bmatrix}-\lambda\begin{bmatrix}I&0\\0&A\end{bmatrix}$$

$$\begin{bmatrix}X_{0}&I\\-\widehat{C}&-(AX_{0}+B)\end{bmatrix}\begin{bmatrix}I\\\delta X\end{bmatrix}=\begin{bmatrix}I&0\\0&A\end{bmatrix}\begin{bmatrix}I\\\delta X\end{bmatrix}X$$

$$\begin{bmatrix}X_{0}&I\\-A^{-1}\widehat{C}&-(X_{0}+A^{-1}B)\end{bmatrix}$$

$$\begin{bmatrix}I&0\\\delta X&I\end{bmatrix}^{-1}\begin{bmatrix}X_{0}&I\\-A^{-1}\widehat{C}&-(X_{0}+A^{-1}B)\end{bmatrix}\begin{bmatrix}I&0\\\delta X&I\end{bmatrix}=\begin{bmatrix}X&I\\0&-(X+A^{-1}B)\end{bmatrix}$$

$$\lambda^{2}A+\lambda B+C=(\lambda A+AX+B)(\lambda I-X)=A(\lambda I+X+A^{-1}B)(\lambda I-X)$$

$$(X+A^{-1}B)\,\delta X+\delta X\,X_{0}=-A^{-1}\widehat{C}$$

$$\mathcal{R}\mathcal{K}_{t}(A,U_{0},\xi):=\mathrm{range}\Big\{\Big[U_{0},(A-\xi_{1}I)^{-1}U_{0},\ldots,\Big(\prod_{j=1}^{t-1}(A-\xi_{j}I)^{-1}\Big)U_{0}\Big]\Big\}$$

$$\delta A=U_{A}V_{A}^{*},\quad\delta Q=U_{Q}D_{Q}U_{Q}^{*},\quad\delta F=U_{F}D_{F}U_{F}^{*}$$

$$U:=[U_{Q},V_{A},X_{0}U_{A},X_{0}U_{F}],\quad D:=\operatorname{diag}\Big(D_{Q},\begin{bmatrix}0&I_{\hbox{rk}(\delta A)}\\I_{\hbox{rk}(\delta A)}&0\end{bmatrix},-D_{F}\Big)$$

$$A_{\mathsf{corr}}^{*}\delta X+\delta XA_{\mathsf{corr}}-\delta XF\delta X+UDU^{*}=0$$


Full text

Daniel Kressner, EPF Lausanne, Switzerland

Patrick Kürschner, KU Leuven, Electrical Engineering (ESAT), Kulak Kortrijk Campus, Belgium

Stefano Massei, EPF Lausanne, Switzerland


1 Introduction

This paper is concerned with numerical algorithms for treating two types of quadratic matrix equations with large-scale, data-sparse coefficients.

Type 1: CARE.

A continuous-time algebraic Riccati equation (CARE) takes the form

$$A^{*}XE+E^{*}XA-E^{*}XFXE+Q=0,\tag{1}$$

where $A,E,F,Q$ are real $n\times n$ matrices such that $E$ is invertible and $F,Q$ are symmetric positive semi-definite. Motivated by its central role in robust and optimal control [42, 31, 44, 35, 8], this class of equations has been widely studied in the literature; see, e.g., [15, 11, 6]. A solution $X$ to (1) is called stabilizing if the so-called closed-loop matrix $A-FXE$ is stable, that is, all its eigenvalues are contained in the open left half-plane. Mild conditions on the coefficients (see, e.g., [15, Sec. 2.2.2]) ensure the existence, uniqueness, and symmetric positive semi-definiteness of such a stabilizing solution $X$.

We consider the case when $n$ is large and $F$ has low rank, that is, $F=BB^{*}$ for some matrix $B\in\mathbb{R}^{n\times m}$ with $m\ll n$ . This is a common assumption in linear-quadratic optimal control problems, where $m$ corresponds to the number of inputs [10, 11]. However, we do not impose low rank on $Q$ , which allows for having a large number of outputs in control problems, e.g., when observing the state directly. To simplify the exposition, we will focus the discussion mostly on the case $E=I$ ; the extension to general invertible $E$ will be explained in Section 3.2.1.

For $F=0$, the equation (1) becomes linear and is called a Lyapunov equation. A low-rank updating procedure for such linear matrix equations has recently been proposed in [30]. In this work, we extend this procedure to CARE. More specifically, assuming that $X_{0}$ satisfies a reference CARE

$$A_{0}^{*}X_{0}+X_{0}A_{0}-X_{0}F_{0}X_{0}+Q_{0}=0,\tag{2}$$

we aim at computing a correction $\delta X$ such that $X:=X_{0}+\delta X$ solves the modified CARE

$$A^{*}X+XA-XFX+Q=0,\tag{3}$$

with

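$$A:=A_{0}+\delta A,\quad F:=F_{0}+\delta F,\quad Q:=Q_{0}+\delta Q,\quad\text{with }\delta A,\delta F,\delta Q\text{ of low rank.}$$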

Subtracting (2) from (3) yields

$$(A-FX_{0})^{*}\delta X+\delta X(A-FX_{0})-\delta XF\delta X+\widehat{Q}=0.\tag{4}$$

The modified constant term $\widehat{Q}:=\delta Q+\delta A^{*}X_{0}+X_{0}\delta A-X_{0}\delta FX_{0}$ satisfies

$$\hbox{rk}(\widehat{Q})\leq\hbox{rk}(\delta Q)+2\,\hbox{rk}(\delta A)+\hbox{rk}(\delta F),$$

where $\hbox{rk}(\cdot)$ denotes the rank of a matrix. Hence, independently of the rank of $Q_{0}$ , the constant term of the CARE (4) is guaranteed to have low rank. Note that most algorithms for large-scale Riccati equations [10, 11, 47, 46, 7] assume the constant term to be of low rank which, in turn, may render them unsuitable for solving (3). In contrast, the formulation (4) is well suited for such methods, returning an approximation of $\delta X$ in the form of a symmetric low-rank factorization.
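The update procedure can be illustrated on a small dense example with $E=I$. The sketch below is written in Python with NumPy (our choice; it is not the authors' implementation). The helper `care_stabilizing` is hypothetical: it extracts the stabilizing solution from the stable invariant subspace of the Hamiltonian matrix via `numpy.linalg.eig`, which is adequate only for small, well-conditioned problems. For brevity we assume $\delta F=0$, so that $\widehat{Q}=\delta Q+\delta A^{*}X_{0}+X_{0}\delta A$ has rank at most $\hbox{rk}(\delta Q)+2\hbox{rk}(\delta A)=3$, although $Q_0$ has full rank.

```python
import numpy as np

def care_stabilizing(A, F, Q):
    """Stabilizing solution of A^T X + X A - X F X + Q = 0 (small dense sketch).

    Uses eigenvectors of H = [[A, -F], [-Q, -A^T]] belonging to eigenvalues with
    negative real part; assumes H is diagonalizable with no imaginary eigenvalues.
    """
    n = A.shape[0]
    H = np.block([[A, -F], [-Q, -A.T]])
    w, V = np.linalg.eig(H)
    Vs = V[:, w.real < 0]                            # basis [U1; U2] of stable subspace
    X = np.linalg.solve(Vs[:n].T, Vs[n:].T).T.real   # X = U2 U1^{-1}
    return (X + X.T) / 2                             # remove round-off asymmetry

rng = np.random.default_rng(0)
n = 8
A0 = -3.0 * np.eye(n) + 0.3 * rng.standard_normal((n, n))
Bm = rng.standard_normal((n, 1))
F = Bm @ Bm.T                                        # F = B B^*, unchanged (delta F = 0)
G = rng.standard_normal((n, n))
Q0 = G @ G.T + np.eye(n)                             # full-rank constant term
X0 = care_stabilizing(A0, F, Q0)                     # reference CARE (2)

dA = 0.5 * np.outer(rng.standard_normal(n), rng.standard_normal(n))   # rank 1
uQ = rng.standard_normal(n)
dQ = np.outer(uQ, uQ)                                                  # rank 1
A, Q = A0 + dA, Q0 + dQ

Qhat = dQ + dA.T @ X0 + X0 @ dA                      # constant term of (4)
dX = care_stabilizing(A - F @ X0, F, Qhat)           # correction equation (4)
X = care_stabilizing(A, F, Q)                        # direct solve of (3), for comparison

rank_Qhat = np.linalg.matrix_rank(Qhat, tol=1e-10 * np.linalg.norm(Qhat, 2))
print(rank_Qhat, np.linalg.norm(X0 + dX - X))
```

The Hamiltonian of (4) is similar to that of (3), which is why the same dense routine applies even though $\widehat{Q}$ is indefinite.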

Type 2: UQME.

A unilateral quadratic matrix equation (UQME) takes the form

$$AX^{2}+BX+C=0,\tag{5}$$

with $A,B,C\in\mathbb{R}^{n\times n}$ . The spectrum of a solution to (5) corresponds to a subset of the $2n$ eigenvalues of the matrix polynomial

$$\varphi(\lambda):=\lambda^{2}A+\lambda B+C.\tag{6}$$

Instances of equation (5) arise in overdamped systems in structural mechanics [24] and are at the core of the matrix analytic method for quasi-birth–death (QBD) stochastic processes [13].

A typical situation in applications is that the eigenvalues of $\varphi(\lambda)$ are separated by the unit circle into two subsets of cardinality $n$ :

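$$|\lambda_{1}|\leq\cdots\leq|\lambda_{n}|\leq 1\leq|\lambda_{n+1}|\leq\cdots\leq|\lambda_{2n}|,\qquad|\lambda_{n}|<|\lambda_{n+1}|,\tag{7}$$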

and it is of interest to compute the minimal solution of (5), that is, the solution $X$ associated with $\lambda_{1},\dots,\lambda_{n}$ . Note that some of the eigenvalues are allowed to be infinite.

Condition (7) implies that $X$ is the only power-bounded solution of (5); this uniquely identifies the matrix-geometric property [13] of certain QBD processes. Condition (7) also guarantees the quadratic convergence of the cyclic reduction algorithm for computing $X$ [18, Theorem 9]. The minimal solution can be constructed as $X=V\operatorname{diag}(\lambda_{1},\dots,\lambda_{n})V^{-1}$ if the matrix $V$ containing the eigenvectors associated with $\lambda_{1},\dots,\lambda_{n}$ is invertible [29]. This property is usually met in practice, and in the QBD setting it can be ensured via the probabilistic interpretation of the minimal solution [34, Section 6.2].
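For small dense problems, the eigenvector construction of the minimal solution can be sketched directly. This Python/NumPy snippet adds assumptions not made in the text: $A$ is invertible (so there are no infinite eigenvalues), the companion linearization $\begin{bmatrix}0&I\\-A^{-1}C&-A^{-1}B\end{bmatrix}$ is formed explicitly, and the eigenvectors for the $n$ smallest-modulus eigenvalues are linearly independent. The function name `uqme_minimal_eig` and the test coefficients are hypothetical.

```python
import numpy as np

def uqme_minimal_eig(A, B, C):
    """Minimal solution X = V diag(lambda_1..lambda_n) V^{-1} of A X^2 + B X + C = 0.

    Companion linearization: L [v; lam v] = lam [v; lam v] iff phi(lam) v = 0.
    """
    n = A.shape[0]
    L = np.block([[np.zeros((n, n)), np.eye(n)],
                  [-np.linalg.solve(A, C), -np.linalg.solve(A, B)]])
    lam, W = np.linalg.eig(L)
    order = np.argsort(np.abs(lam))
    idx = order[:n]                               # lambda_1, ..., lambda_n in (7)
    V = W[:n, idx]                                # top block holds the eigenvectors v
    X = V @ np.diag(lam[idx]) @ np.linalg.inv(V)
    return X.real, lam[order]

rng = np.random.default_rng(1)
n = 6
A = 0.2 * np.eye(n) + 0.02 * rng.standard_normal((n, n))
B = -np.eye(n) + 0.05 * rng.standard_normal((n, n))
C = 0.3 * np.eye(n) + 0.02 * rng.standard_normal((n, n))
X, lam_sorted = uqme_minimal_eig(A, B, C)
res = np.linalg.norm(A @ X @ X + B @ X + C)
print(res, abs(lam_sorted[n - 1]), abs(lam_sorted[n]))
```

Each column $v_i$ of $V$ satisfies $\varphi(\lambda_i)v_i=0$, so $AX^2+BX+C$ vanishes on every $v_i$ and hence on the whole space.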

We assume that the minimal solution exists and that (7) holds for a reference equation

$$A_{0}X_{0}^{2}+B_{0}X_{0}+C_{0}=0,\tag{8}$$

as well as for the modified equation

$$(A_{0}+\delta A)(X_{0}+\delta X)^{2}+(B_{0}+\delta B)(X_{0}+\delta X)+(C_{0}+\delta C)=0,\tag{9}$$

where $\delta A,\delta B$ and $\delta C$ are given low-rank matrices.

Denoting $A:=A_{0}+\delta A,B:=B_{0}+\delta B,C:=C_{0}+\delta C$ and subtracting (8) from (9) yields the following equation for the correction $\delta X$ :

$$A\,\delta X^{2}+(AX_{0}+B)\,\delta X+A\,\delta X\,X_{0}+\widehat{C}=0,\tag{10}$$

where $\widehat{C}:=\delta AX_{0}^{2}+\delta BX_{0}+\delta C$ has rank bounded by $\hbox{rk}(\delta A)+\hbox{rk}(\delta B)+\hbox{rk}(\delta C)$. Note that equation (10) is not a UQME. Nevertheless, as will be seen in Section 2, there is still a correspondence between the solutions of (10) and an appropriately chosen eigenvalue problem. As for CARE, the low rank of $\widehat{C}$ will allow us to devise an efficient numerical method for (10).
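The correction equation (10) can be checked numerically on a small example. The Python/NumPy sketch below computes minimal solutions with a textbook cyclic reduction recursion (a dense stand-in chosen for illustration; `cyclic_reduction`, its stopping rule, and `unit_outer` are our assumptions) and verifies that $\delta X=X-X_{0}$ satisfies (10) with a constant term $\widehat{C}$ of rank at most 3.

```python
import numpy as np

def cyclic_reduction(A, B, C, tol=1e-14, maxit=50):
    """Minimal solution of A X^2 + B X + C = 0 by cyclic reduction (dense sketch).

    Assumes the eigenvalue splitting (7) and invertible middle coefficients B_k.
    """
    Ak, Bk, Ck, Bhat = A.copy(), B.copy(), C.copy(), B.copy()
    for _ in range(maxit):
        SA = np.linalg.solve(Bk, Ak)               # B_k^{-1} A_k
        SC = np.linalg.solve(Bk, Ck)               # B_k^{-1} C_k
        Bhat = Bhat - Ak @ SC
        Bk = Bk - Ak @ SC - Ck @ SA
        Ak, Ck = -Ak @ SA, -Ck @ SC
        if min(np.linalg.norm(Ak, 1), np.linalg.norm(Ck, 1)) < tol:
            break
    return -np.linalg.solve(Bhat, C)

def unit_outer(rng, n, scale):
    """Hypothetical helper: scale * u v^T with random unit vectors u, v (rank 1)."""
    u, v = rng.standard_normal(n), rng.standard_normal(n)
    return scale * np.outer(u / np.linalg.norm(u), v / np.linalg.norm(v))

rng = np.random.default_rng(3)
n = 6
A0 = 0.2 * np.eye(n) + 0.02 * rng.standard_normal((n, n))
B0 = -np.eye(n) + 0.05 * rng.standard_normal((n, n))
C0 = 0.3 * np.eye(n) + 0.02 * rng.standard_normal((n, n))
dA, dB, dC = unit_outer(rng, n, 0.02), unit_outer(rng, n, 0.05), unit_outer(rng, n, 0.02)
A, B, C = A0 + dA, B0 + dB, C0 + dC

X0 = cyclic_reduction(A0, B0, C0)                  # reference UQME (8)
X = cyclic_reduction(A, B, C)                      # modified UQME (9)
dX = X - X0
Chat = dA @ X0 @ X0 + dB @ X0 + dC                 # rank <= 3
res10 = np.linalg.norm(A @ dX @ dX + (A @ X0 + B) @ dX + A @ dX @ X0 + Chat)
rank_Chat = np.linalg.matrix_rank(Chat, tol=1e-10 * np.linalg.norm(Chat, 2))
print(res10, rank_Chat)
```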

Quadratic matrix equations with hierarchical low-rank structure.

In the second part of the paper, we focus on quadratic matrix equations whose coefficients feature a hierarchical low-rank structure. More specifically, the coefficients of a CARE (1) or a UQME (5) are assumed to be hierarchically off-diagonal low-rank (HODLR) matrices [2, 27]. This framework aligns well with the low-rank updates discussed above, because a HODLR matrix becomes block diagonal after a low-rank perturbation (removing its low-rank off-diagonal blocks) and, in turn, the corresponding reference equations (2) and (8) decouple into two equations of smaller size. Applying this idea recursively results in a divide-and-conquer method for solving UQMEs with HODLR coefficients and CAREs with a low-rank quadratic term and all other coefficients in the HODLR format.

Existing fast algorithms that address such (and more general) scenarios are based on combining a matrix iteration with fast arithmetic in hierarchical low-rank format. For CAREs, a combination of the sign function iteration with hierarchical matrices has been proposed in [3, 23]. For UQMEs, a combination based on cyclic reduction has been proposed in [16, 17]. As pointed out in [30], a disadvantage of these strategies is that they exploit the structure only indirectly and rely on repeated recompression during the iteration, which may constitute a computational bottleneck.

Outline.

The rest of this paper is organized as follows. In Section 2, we study the correction equations (4) and (10), with a particular focus on providing intuition why one can expect their solutions to admit good low-rank approximations. Section 3 is concerned with numerical methods for obtaining such low-rank approximations. While a variety of large-scale solution methods have been recently developed for (4), the equation (10) is non-standard and requires the development of a novel large-scale solver, which may be of independent interest. Section 4 utilizes these solvers to derive divide-and-conquer methods for CARE and UQME featuring HODLR matrix coefficients. Finally, Section 5 highlights several applications of these divide-and-conquer methods and provides numerical evidence of their effectiveness.

2 Analysis of the correction equations

The purpose of this section is to study properties of the correction matrix $\delta X$ , which satisfies one of the two correction equations, (4) or (10).

2.1 Existence and low-rank approximability

Most solvers for large-scale matrix equations perform well only if the solution admits good low-rank approximations. This property can sometimes be verified a priori by showing that the singular values of the solution decay rapidly. In the following, we first recall such results for linear matrix equations and then use them to gain insight into the low-rank approximability of $\delta X$.

Singular value decay for linear matrix equations

Let us consider the so-called Sylvester equation

$$AX+XB=Q$$

with coefficients $A,B,Q\in\mathbb{R}^{n\times n}$ such that the spectra of $A$ and $-B$ are disjoint and $Q$ has rank $k\ll n$. Moreover, let $\mathcal{R}_{h,h}$ denote the set of rational functions with numerator and denominator degrees at most $h$.

In [5], it is shown that for every $r\in\mathcal{R}_{h,h}$ there exists a matrix $\widetilde{X}$ of rank at most $kh$ such that $X-\widetilde{X}=r(A)Xr(-B)^{-1}$ , provided that the right-hand side is well defined. Using that the $(kh+1)$ th singular value, denoted by $\sigma_{kh+1}(\cdot)$ , governs the $2$ -norm error of the best approximation by a matrix of rank at most $kh$ , one obtains

$$\sigma_{kh+1}(X)\leq\|r(A)\|_{2}\,\|r(-B)^{-1}\|_{2}\,\|X\|_{2}.$$

Combined with norm estimates for rational matrix functions, this leads to the following theorem.

Theorem 2.1 (Theorem 2.1 in [5]).

Consider the Sylvester equation $AX+XB=Q$ , with $Q$ of rank $k$ , and let $E$ and $F$ be disjoint compact sets in the complex plane.

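(i) If $E,F$ contain the numerical ranges of $A$ and $-B$, respectively, then

$$\frac{\sigma_{kh+1}(X)}{\|X\|_{2}}\leq K_{C}\min_{r\in\mathcal{R}_{h,h}}\frac{\max_{z\in E}|r(z)|}{\min_{z\in F}|r(z)|},$$

where $K_{C}=1$ if $A,B$ are normal matrices and $1\leq K_{C}\leq(1+\sqrt{2})^{2}$ otherwise.

(ii) If $A,B$ are diagonalizable and $E,F$ contain the spectra of $A$ and $-B$, respectively, then

$$\frac{\sigma_{kh+1}(X)}{\|X\|_{2}}\leq\kappa_{\mathsf{eig}}(A)\,\kappa_{\mathsf{eig}}(B)\min_{r\in\mathcal{R}_{h,h}}\frac{\max_{z\in E}|r(z)|}{\min_{z\in F}|r(z)|},$$

where $\kappa_{\mathsf{eig}}(\cdot)$ denotes the $2$-norm condition number of the eigenvector matrix.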

The quantities $Z_{h}(E,F):=\min_{r\in\mathcal{R}_{h,h}}\frac{\max_{E}\lvert r(z)\rvert}{\min_{F}\lvert r(z)\rvert}$ are known in the literature as Zolotarev numbers. When $E$ and $F$ are well separated one can expect that $Z_{h}(E,F)$ decreases rapidly, as $h$ increases, and quickly reaches the level of machine precision. Explicit bounds showing exponential decay have been established for various configurations of $E$ and $F$ , including disjoint real intervals and circles [5, 48].
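To see Theorem 2.1 at work in a simple setting, the Python/NumPy sketch below solves a Sylvester equation with symmetric $A,B$ whose spectra lie in $[1,2]$ (so the spectra of $A$ and $-B$ are well separated) and a rank-one right-hand side, and then inspects the singular values of $X$. The sizes, the spectra, and the diagonalization-based solver are illustrative assumptions that exploit symmetry; this is not a general-purpose Sylvester solver.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 60
P, _ = np.linalg.qr(rng.standard_normal((n, n)))
R, _ = np.linalg.qr(rng.standard_normal((n, n)))
a = np.linspace(1.0, 2.0, n)                     # spec(A) in [1, 2]
b = np.linspace(1.0, 2.0, n)                     # spec(B) in [1, 2], so spec(-B) in [-2, -1]
A = P @ np.diag(a) @ P.T
B = R @ np.diag(b) @ R.T
Q = np.outer(rng.standard_normal(n), rng.standard_normal(n))   # rank k = 1

# Solve A X + X B = Q in the eigenbases of the symmetric matrices A and B.
X = P @ ((P.T @ Q @ R) / (a[:, None] + b[None, :])) @ R.T

sv = np.linalg.svd(X, compute_uv=False)
print(sv[:10] / sv[0])                           # normalized singular values
```

Since $A$ and $B$ are normal, part (i) applies with $K_C=1$, and the normalized singular values drop to the level of round-off after only a handful of indices.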

CARE.

The existence and uniqueness of a stabilizing solution to the correction equation (4) follows immediately from the observation that the closed-loop matrices of CARE (3) and CARE (4) are identical:

$$(A-FX_{0})-F\delta X=A-FX.$$

This yields the following lemma.

Lemma 2.2.

Let $X_{0}$ be a solution of (2). Then the correction equation (4) has a unique stabilizing solution $\delta X$ if and only if the modified equation (3) has a unique stabilizing solution $X$ .

To study the low-rank approximability of $\delta X$ , let us first assume that $A$ is stable. By rearranging (4), we get

$$A^{*}\delta X+\delta XA=-\widehat{Q}+\delta XF(\delta X+X_{0})+X_{0}F\delta X.$$

Hence, $\delta X$ satisfies a Lyapunov equation with the rank of the right-hand side bounded by $2\hbox{rk}(F)+\hbox{rk}(\widehat{Q})$ . If, additionally, the numerical range of $A$ is contained in the open left half plane then the first part of Theorem 2.1 can be applied to yield singular value bounds for $\delta X$ . Alternatively, the second part can be applied under the milder assumption that $A$ is diagonalizable.

If $A$ is not stable, we rearrange (4) as

$$(A-FX)^{*}\delta X+\delta X(A-FX)=-\widehat{Q}-\delta XF\delta X.$$

As the closed-loop matrix $A-FX$ is stable and the rank of the right-hand side is bounded by $\hbox{rk}(F)+\hbox{rk}(\widehat{Q})$, Theorem 2.1 applies under the assumptions stated above with $A$ replaced by $A-FX$. Note, however, that the resulting bounds are somewhat implicit because they involve the numerical range or the eigenvector conditioning of $A-FX$, quantities that are hard to estimate a priori. If more information is available for the closed-loop matrix $A-F\widetilde{X}$ associated with a stabilizing initial guess $\widetilde{X}$, one can instead work with the equation

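$$(A-F\widetilde{X})^{*}\delta X+\delta X(A-F\widetilde{X})=-\widehat{Q}+\delta XF\delta X-\delta\widetilde{X}F\delta X-\delta XF\delta\widetilde{X},$$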

where $\delta\widetilde{X}:=\widetilde{X}-X_{0}$ .
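The Lyapunov reformulation with the closed-loop matrix can be checked on a small example. In this Python/NumPy sketch we assume $\delta F=0$, a symmetric negative definite $A_0$, and a hypothetical dense helper `care_stabilizing` based on Hamiltonian eigenvectors. It verifies the identity $(A-FX)^{*}\delta X+\delta X(A-FX)=-\widehat{Q}-\delta XF\delta X$ and prints the normalized singular values of $\delta X$, which illustrate the decay predicted by Theorem 2.1.

```python
import numpy as np

def care_stabilizing(A, F, Q):
    """Stabilizing solution of A^T X + X A - X F X + Q = 0 (small dense sketch)."""
    n = A.shape[0]
    w, V = np.linalg.eig(np.block([[A, -F], [-Q, -A.T]]))
    Vs = V[:, w.real < 0]                              # stable invariant subspace
    X = np.linalg.solve(Vs[:n].T, Vs[n:].T).T.real     # X = U2 U1^{-1}
    return (X + X.T) / 2

rng = np.random.default_rng(5)
n = 40
P, _ = np.linalg.qr(rng.standard_normal((n, n)))
A0 = -P @ np.diag(np.linspace(1.0, 5.0, n)) @ P.T     # symmetric, stable
b = rng.standard_normal((n, 1))
F = 0.25 * (b @ b.T) / (b.T @ b)                      # rank one
Q0 = np.eye(n)                                        # full-rank constant term
u, v, w = (rng.standard_normal(n) for _ in range(3))
dA = 0.2 * np.outer(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))
dQ = np.outer(w, w) / (w @ w)
A, Q = A0 + dA, Q0 + dQ

X0 = care_stabilizing(A0, F, Q0)
X = care_stabilizing(A, F, Q)
dX = X - X0
Qhat = dQ + dA.T @ X0 + X0 @ dA                       # delta F = 0
Acl = A - F @ X                                       # closed-loop matrix
ident = Acl.T @ dX + dX @ Acl + Qhat + dX @ F @ dX    # vanishes up to round-off
sv = np.linalg.svd(dX, compute_uv=False)
print(np.linalg.norm(ident), sv[:12] / sv[0])
```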

UQME.

Solutions of the correction equation (10) are intimately related to the matrix pencil

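$$\begin{bmatrix}X_{0}&I\\-\widehat{C}&-(AX_{0}+B)\end{bmatrix}-\lambda\begin{bmatrix}I&0\\0&A\end{bmatrix}.\tag{11}$$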

In fact, a direct computation shows that $\delta X$ solves (10) if and only if

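$$\begin{bmatrix}X_{0}&I\\-\widehat{C}&-(AX_{0}+B)\end{bmatrix}\begin{bmatrix}I\\\delta X\end{bmatrix}=\begin{bmatrix}I&0\\0&A\end{bmatrix}\begin{bmatrix}I\\\delta X\end{bmatrix}X.$$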

For simplicity, let us assume that $A$ is invertible. Then the eigenvalues of (11) coincide with the eigenvalues of the matrix

$$\begin{bmatrix}X_{0}&I\\-A^{-1}\widehat{C}&-(X_{0}+A^{-1}B)\end{bmatrix}.$$

By a similarity transformation,

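$$\begin{bmatrix}I&0\\\delta X&I\end{bmatrix}^{-1}\begin{bmatrix}X_{0}&I\\-A^{-1}\widehat{C}&-(X_{0}+A^{-1}B)\end{bmatrix}\begin{bmatrix}I&0\\\delta X&I\end{bmatrix}=\begin{bmatrix}X&I\\0&-(X+A^{-1}B)\end{bmatrix}.\tag{13}$$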

Because $X=X_{0}+\delta X$ is a solution of (5), the quadratic matrix polynomial $\varphi(\lambda)$ defined in (6) admits the factorization

$$\lambda^{2}A+\lambda B+C=(\lambda A+AX+B)(\lambda I-X)=A(\lambda I+X+A^{-1}B)(\lambda I-X).$$

Together with (13), this shows that the eigenvalues of the pencil (11) coincide with the eigenvalues of $\varphi(\lambda)$. In particular, if $X_{0}$ and $X$ are the minimal solutions of (8) and (5), respectively, then the spectra of $X_{0}$ and $-(X+A^{-1}B)$ are separated by the unit circle. By rearranging (10), $\delta X$ can be viewed as the solution of a Sylvester equation with these coefficients and a low-rank right-hand side:

$$(X+A^{-1}B)\,\delta X+\delta X\,X_{0}=-A^{-1}\widehat{C}.\tag{14}$$

This indicates good low-rank approximability of $\delta X$ .
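A small numerical check of this derivation, in Python/NumPy: it verifies the factorization of $\varphi(\lambda)$ at a sample point, the separation of the spectra of $X_0$ and $-(X+A^{-1}B)$ by the unit circle, and that $\delta X$ solves the Sylvester equation (14). The minimal solutions are computed with a textbook cyclic reduction recursion; this dense routine, the sample point, and the test coefficients are assumptions of the sketch.

```python
import numpy as np

def cyclic_reduction(A, B, C, tol=1e-14, maxit=50):
    """Minimal solution of A X^2 + B X + C = 0 by cyclic reduction (dense sketch)."""
    Ak, Bk, Ck, Bhat = A.copy(), B.copy(), C.copy(), B.copy()
    for _ in range(maxit):
        SA, SC = np.linalg.solve(Bk, Ak), np.linalg.solve(Bk, Ck)
        Bhat = Bhat - Ak @ SC
        Bk = Bk - Ak @ SC - Ck @ SA
        Ak, Ck = -Ak @ SA, -Ck @ SC
        if min(np.linalg.norm(Ak, 1), np.linalg.norm(Ck, 1)) < tol:
            break
    return -np.linalg.solve(Bhat, C)

rng = np.random.default_rng(6)
n = 8
I = np.eye(n)
A0 = 0.2 * I + 0.01 * rng.standard_normal((n, n))
B0 = -I + 0.02 * rng.standard_normal((n, n))
C0 = 0.3 * I + 0.01 * rng.standard_normal((n, n))
dA = 0.02 * np.outer(rng.standard_normal(n), rng.standard_normal(n)) / n   # rank 1
dB = 0.05 * np.outer(rng.standard_normal(n), rng.standard_normal(n)) / n
dC = 0.02 * np.outer(rng.standard_normal(n), rng.standard_normal(n)) / n
A, B, C = A0 + dA, B0 + dB, C0 + dC

X0, X = cyclic_reduction(A0, B0, C0), cyclic_reduction(A, B, C)
dX = X - X0
AinvB = np.linalg.solve(A, B)
Chat = dA @ X0 @ X0 + dB @ X0 + dC

lam = 0.7 + 0.4j                                     # arbitrary sample point
phi = lam**2 * A + lam * B + C
fact_err = np.linalg.norm(phi - A @ (lam * I + X + AinvB) @ (lam * I - X))
syl_res = np.linalg.norm((X + AinvB) @ dX + dX @ X0 + np.linalg.solve(A, Chat))
rho_X0 = max(abs(np.linalg.eigvals(X0)))             # inside the unit disc
min_outer = min(abs(np.linalg.eigvals(-(X + AinvB))))  # outside the unit disc
print(fact_err, syl_res, rho_X0, min_outer)
```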

3 Low-rank updates

In Section 1 we already described the basic procedure for updating the solution $X_{0}$ of a reference CARE or UQME. This requires solving correction equations of the form (4) or (10), respectively. In the following, we discuss how to solve these correction equations efficiently.

3.1 Projection subspaces

According to the discussion in Section 2.1, one may expect that the solutions of (4) and (10) admit good low-rank approximations. A common strategy for obtaining such approximate solutions is to project these matrix equations to a pair of subspaces. To be more specific, let $U,V\in\mathbb{R}^{n\times t}$ contain orthonormal bases of $t$ -dimensional subspaces $\mathcal{U},\mathcal{V}\subset\mathbb{R}^{n}$ . Then we consider approximate solutions of the form $\widetilde{X}:=UYV^{*}$ , where $Y\in\mathbb{R}^{t\times t}$ is obtained from solving a compressed matrix equation.

The choice of the projection subspaces $\mathcal{U},\mathcal{V}$ is key to obtaining good approximations. In the context of matrix equations, rational Krylov subspaces [41] are a popular and effective choice.

Definition 3.1.

Let $A\in\mathbb{R}^{n\times n}$ , $U_{0}\in\mathbb{R}^{n\times k}$ , $k<n$ , and $\xi\in\mathbb{C}^{t}$ . The vector space

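$$\mathcal{R}\mathcal{K}_{t}(A,U_{0},\xi):=\mathrm{range}\Big\{\Big[U_{0},(A-\xi_{1}I)^{-1}U_{0},\ldots,\Big(\prod_{j=1}^{t-1}(A-\xi_{j}I)^{-1}\Big)U_{0}\Big]\Big\}$$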

is called rational Krylov subspace with respect to $(A,U_{0},\xi)$ .
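A direct, dense way to build an orthonormal basis of $\mathcal{RK}_t(A,U_0,\xi)$ is to apply the shifted inverses one block at a time and orthonormalize each new block against the previous ones, as in a block rational Arnoldi process. The Python/NumPy sketch below does this with dense solves and two passes of block Gram-Schmidt; `rational_krylov_basis` is a hypothetical name, and practical implementations use sparse or structured solvers instead.

```python
import numpy as np

def rational_krylov_basis(A, U0, xi):
    """Orthonormal basis of RK_t(A, U0, xi) as in Definition 3.1 (dense sketch).

    Generates U0, (A - xi_1 I)^{-1} U0, ..., prod_{j<t} (A - xi_j I)^{-1} U0 block by
    block; each new block is orthogonalized against the previous ones (two passes of
    block Gram-Schmidt). Only xi_1, ..., xi_{t-1} are used, matching the definition.
    """
    n = A.shape[0]
    V, _ = np.linalg.qr(U0)
    W = V
    for xi_j in xi[:-1]:
        W = np.linalg.solve(A - xi_j * np.eye(n), W)   # next rational block
        for _ in range(2):
            W = W - V @ (V.conj().T @ W)
        W, _ = np.linalg.qr(W)
        V = np.hstack([V, W])
    return V

rng = np.random.default_rng(7)
n = 50
A = -np.diag(np.linspace(1.0, 10.0, n))
U0 = rng.standard_normal((n, 2))
xi = np.array([1.0, 2.0, 4.0])                  # t = 3; xi_3 is not used
V = rational_krylov_basis(A, U0, xi)
print(V.shape)
```

Applying the next shifted inverse to the latest orthonormalized block, rather than to the raw product, spans the same space and is numerically preferable.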

A good choice of shift parameters $\xi_{j}$ is crucial and we will discuss our choices for CARE and UQME below.

3.2 Low-rank solution of correction equation for CARE

To describe our approach for approximating the solution of (4), let us define $A_{\mathsf{corr}}:=A-FX_{0}$ and suppose that the low-rank updates of the coefficients are given in factorized, symmetry-preserving form:

$$\delta A=U_{A}V_{A}^{*},\quad\delta Q=U_{Q}D_{Q}U_{Q}^{*},\quad\delta F=U_{F}D_{F}U_{F}^{*},$$

with $U_{A},V_{A}\in\mathbb{R}^{n\times\hbox{rk}{(\delta A)}}$ , $U_{Q}\in\mathbb{R}^{n\times\hbox{rk}{(\delta Q)}}$ , $U_{F}\in\mathbb{R}^{n\times\hbox{rk}{(\delta F)}}$ , and symmetric matrices $D_{Q}\in\mathbb{R}^{\hbox{rk}{(\delta Q)}\times\hbox{rk}{(\delta Q)}}$ , $D_{F}\in\mathbb{R}^{\hbox{rk}{(\delta F)}\times\hbox{rk}{(\delta F)}}$ . This allows us to write the right-hand side of (4) in factorized form $\widehat{Q}=UDU^{*}$ as well, with

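$$U:=[U_{Q},V_{A},X_{0}U_{A},X_{0}U_{F}],\quad D:=\operatorname{diag}\Big(D_{Q},\begin{bmatrix}0&I_{\hbox{rk}(\delta A)}\\I_{\hbox{rk}(\delta A)}&0\end{bmatrix},-D_{F}\Big),\tag{15}$$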

where $\operatorname{diag}$ denotes a block diagonal matrix with the blocks determined by the arguments. The correction equation (4) now reads as

$$A_{\mathsf{corr}}^{*}\delta X+\delta XA_{\mathsf{corr}}-\delta XF\delta X+UDU^{*}=0.\tag{16}$$
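The factorization (15) can be verified with a few lines of Python/NumPy. The sketch below assembles $U$ and $D$ from randomly generated low-rank factors (a symmetric positive definite matrix stands in for $X_0$, since the identity only needs $X_0$ to be symmetric) and compares $UDU^{*}$ with $\widehat{Q}=\delta Q+\delta A^{*}X_{0}+X_{0}\delta A-X_{0}\delta FX_{0}$ formed densely. The `block_diag` helper is written out to keep the example NumPy-only.

```python
import numpy as np

def block_diag(*blocks):
    """Block diagonal matrix from the given 2-D blocks (NumPy-only helper)."""
    rows = sum(b.shape[0] for b in blocks)
    cols = sum(b.shape[1] for b in blocks)
    out = np.zeros((rows, cols))
    i = j = 0
    for b in blocks:
        out[i:i + b.shape[0], j:j + b.shape[1]] = b
        i, j = i + b.shape[0], j + b.shape[1]
    return out

rng = np.random.default_rng(8)
n, rA = 30, 2
S = rng.standard_normal((n, n))
X0 = S @ S.T / n                                   # symmetric stand-in for the reference solution
U_A, V_A = rng.standard_normal((n, rA)), rng.standard_normal((n, rA))
U_Q, D_Q = rng.standard_normal((n, 1)), np.array([[-1.5]])   # indefinite update allowed
U_F, D_F = rng.standard_normal((n, 1)), np.array([[0.7]])
dA, dQ, dF = U_A @ V_A.T, U_Q @ D_Q @ U_Q.T, U_F @ D_F @ U_F.T

U = np.hstack([U_Q, V_A, X0 @ U_A, X0 @ U_F])
swap = np.block([[np.zeros((rA, rA)), np.eye(rA)], [np.eye(rA), np.zeros((rA, rA))]])
D = block_diag(D_Q, swap, -D_F)                    # note the minus sign on D_F
Qhat = dQ + dA.T @ X0 + X0 @ dA - X0 @ dF @ X0
rel = np.linalg.norm(U @ D @ U.T - Qhat) / np.linalg.norm(Qhat)
print(U.shape[1], rel)
```

The off-diagonal identity block pairs $V_A$ with $X_0U_A$, producing $V_A(X_0U_A)^{*}+X_0U_AV_A^{*}=\delta A^{*}X_0+X_0\delta A$.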

It is recommended to perform an optional preprocessing step that aims at reducing the rank of $\hat{Q}$ further. For this purpose, we compute a thin QR factorization $U=Q_{U}R_{U}$ followed by a (reordered) spectral decomposition

$$R_{U}DR_{U}^{*}=[S_{1},S_{2}]\begin{bmatrix}\Lambda_{1}&0\\0&\Lambda_{2}\end{bmatrix}[S_{1},S_{2}]^{*}$$

such that the diagonal matrix $\Lambda_{2}$ contains all eigenvalues of magnitude smaller than a prescribed tolerance $\tau_{\sigma}$ . Discarding these eigenvalues results in the reduced-rank approximation $\hat{Q}\approx UDU^{*}$ with $U\leftarrow Q_{U}S_{1}$ and $D\leftarrow\Lambda_{1}$ .
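The rank-reducing preprocessing step can be sketched as follows in Python/NumPy. The function name `compress_UDU` is hypothetical, and the relative choice of the tolerance in the example is our assumption (the text only requires some prescribed $\tau_\sigma$).

```python
import numpy as np

def compress_UDU(U, D, tol):
    """Rank reduction of U D U^T: thin QR U = Q_U R_U, then a spectral decomposition
    of the small matrix R_U D R_U^T; eigenvalues with |lambda| < tol are dropped."""
    Q_U, R_U = np.linalg.qr(U)                     # reduced ("thin") QR by default
    M = R_U @ D @ R_U.T
    lam, S = np.linalg.eigh((M + M.T) / 2)         # M is symmetric up to round-off
    keep = np.abs(lam) >= tol                      # Lambda_1: the retained eigenvalues
    return Q_U @ S[:, keep], np.diag(lam[keep])

rng = np.random.default_rng(9)
n = 100
W = rng.standard_normal((n, 3))
U = np.hstack([W, W @ rng.standard_normal((3, 3))])  # 6 columns, but rank 3
D0 = rng.standard_normal((6, 6))
D = (D0 + D0.T) / 2                                   # symmetric, indefinite
Qhat = U @ D @ U.T
U1, D1 = compress_UDU(U, D, tol=1e-10 * np.linalg.norm(Qhat, 2))
print(U.shape[1], "->", U1.shape[1], np.linalg.norm(U1 @ D1 @ U1.T - Qhat))
```

Because $Q_U$ has orthonormal columns, the discarded part has spectral norm at most $\tau_\sigma$, and the symmetric factored form $UDU^{*}$ is preserved.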

For a large-scale CARE of the form (16), with both $\widehat{Q}$ and $F=BB^{*}$ of low rank, various numerical methods have been proposed [10, 11, 47, 46, 7, 6]. In the following, we focus on the rational Krylov subspace method (RKSM) [47, 46], but other solvers could be used as well. While these algorithms usually assume $\widehat{Q}$ to be positive semi-definite, their extension to possibly indefinite $\widehat{Q}$ poses no major obstacle; see also the discussion in [33]. RKSM constructs an approximate solution of the form $\delta X_{t}=V_{t}YV_{t}^{*}$ , where $V_{t}$ contains an orthonormal basis of a rational Krylov subspace $\mathcal{R}\mathcal{K}_{t}(A^{*}_{\mathsf{corr}},U,\xi)$ . The small matrix $Y$ is determined via a Galerkin condition, which comes down to solving the compressed CARE

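$$(V_{t}^{*}A_{\mathsf{corr}}V_{t})^{*}Y+Y(V_{t}^{*}A_{\mathsf{corr}}V_{t})-Y(V_{t}^{*}FV_{t})Y+\tilde{U}D\tilde{U}^{*}=0,\qquad\tilde{U}:=V_{t}^{*}U,$$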

for a stabilizing solution $Y$; this small, dense CARE can be addressed by direct algorithms [9]. Note that the indefinite inhomogeneity $\tilde{U}D\tilde{U}^{*}$ does not prevent the existence of such a stabilizing solution, which is then, in general, also indefinite; see, e.g., [50].

In practice, the Hamiltonian matrix associated with the compressed CARE may have eigenvalues close to the imaginary axis, which can result in inaccurate solutions $Y$. To refine the accuracy of $Y$, we apply a defect correction strategy similar to [37], consisting of (at most) 2 steps of Newton's method. Algorithm 1 gives a basic illustration of this method.

We refer to the relevant literature [22, 47, 46] for implementation details and only comment on some critical steps. For selecting the shift parameters in line 7, we employ the adaptive procedure from [22, 47, 46]. This may result in complex shifts or, more precisely, in complex conjugate pairs of shifts. The increased cost of working in complex arithmetic can be largely reduced by an appropriate implementation, see, e.g., [40], which also returns a real approximation $\delta X_{t}$. The shifted linear systems in line 7 involve the matrix $(A_{\mathsf{corr}}-\xi_{t}I)^{*}=(A-\xi_{t}I-FX_{0})^{*}$. If $A$ is sparse, such a system can be solved, e.g., by combining a sparse direct solver for $A-\xi_{t}I$ with the Sherman-Morrison-Woodbury formula to incorporate the low-rank modification $FX_{0}$. If $A$ is a HODLR matrix, then $A-\xi_{t}I-FX_{0}$ is a HODLR matrix as well, and solvers for HODLR matrices can be used. Algorithm 1 is terminated once the residual is sufficiently small, that is, $\|R_{t}\|_{2}=\|A^{*}_{\mathsf{corr}}\delta X_{t}+\delta X_{t}A_{\mathsf{corr}}-\delta X_{t}F\delta X_{t}+UDU^{*}\|_{2}\leqslant\tau_{\mathsf{care}}$ for some prescribed tolerance $\tau_{\mathsf{care}}>0$. An efficient way of computing the residual norm $\|R_{t}\|_{2}$ is described in [46]. After termination, we recommend an optional post-processing step that reduces the rank of $\delta X_{t}$, analogous to the rank-reducing procedure for $\hat{Q}$ described above.
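To make the projection step concrete, here is a heavily simplified Python/NumPy sketch of a Galerkin rational Krylov method for (16). Unlike Algorithm 1, it uses a fixed, cyclically repeated list of real shifts instead of the adaptive choice, dense shifted solves, an eigenvector-based solver for the compressed CARE, and no defect correction; the function names and the shift values are hypothetical. The example takes $A_{\mathsf{corr}}$ symmetric negative definite and $\widehat{Q}$ positive semi-definite, so that the compressed CARE is guaranteed to have a stabilizing solution.

```python
import numpy as np

def care_stabilizing(A, F, Q):
    """Stabilizing solution of A^T X + X A - X F X + Q = 0 (small dense sketch)."""
    n = A.shape[0]
    w, V = np.linalg.eig(np.block([[A, -F], [-Q, -A.T]]))
    Vs = V[:, w.real < 0]                            # stable invariant subspace
    X = np.linalg.solve(Vs[:n].T, Vs[n:].T).T.real   # X = U2 U1^{-1}
    return (X + X.T) / 2

def rksm_care(Acorr, F, U, D, shifts):
    """Galerkin projection of Acorr^T dX + dX Acorr - dX F dX + U D U^T = 0
    onto a rational Krylov space of Acorr^T generated by U (simplified sketch)."""
    n = Acorr.shape[0]
    V, _ = np.linalg.qr(U)
    W = V
    for s in shifts:                                 # one new block per shift
        W = np.linalg.solve(Acorr.T - s * np.eye(n), W)
        for _ in range(2):                           # block Gram-Schmidt, twice
            W = W - V @ (V.T @ W)
        W, _ = np.linalg.qr(W)
        V = np.hstack([V, W])
    Ut = V.T @ U
    # Compressed CARE: (V^T Acorr V)^T Y + Y (V^T Acorr V) - Y (V^T F V) Y + Ut D Ut^T = 0
    Y = care_stabilizing(V.T @ Acorr @ V, V.T @ F @ V, Ut @ D @ Ut.T)
    return V, Y

rng = np.random.default_rng(10)
n = 200
Acorr = -np.diag(np.linspace(1.0, 10.0, n))          # stable, symmetric
b = rng.standard_normal((n, 1))
F = 0.09 * (b @ b.T) / (b.T @ b)                     # rank one, ||F||_2 = 0.09
U, D = rng.standard_normal((n, 2)), np.eye(2)         # Qhat = U D U^T, rank 2
shifts = np.tile([1.0, 2.15, 4.64, 10.0], 6)          # 24 fixed real shifts (assumption)

V, Y = rksm_care(Acorr, F, U, D, shifts)
dX_t = V @ Y @ V.T
R = Acorr.T @ dX_t + dX_t @ Acorr - dX_t @ F @ dX_t + U @ D @ U.T
dX_ref = care_stabilizing(Acorr, F, U @ D @ U.T)      # dense reference
rel_err = np.linalg.norm(dX_t - dX_ref) / np.linalg.norm(dX_ref)
rel_res = np.linalg.norm(R) / np.linalg.norm(U @ D @ U.T)
print(V.shape[1], rel_err, rel_res)
```

In a genuinely large-scale setting the dense reference would be unavailable; the residual $R_t$, computed cheaply as described in [46], would drive the stopping decision instead.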

3.2.1 Extension to generalized CAREs

In this section, we briefly discuss the extension of our low-rank update procedure to the generalized CARE (GCARE), see (1). The reference and modified equation take the form

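$$A_{0}^{*}X_{0}E_{0}+E_{0}^{*}X_{0}A_{0}-E_{0}^{*}X_{0}F_{0}X_{0}E_{0}+Q_{0}=0,\tag{17}$$

$$A^{*}XE+E^{*}XA-E^{*}XFXE+Q=0,\tag{18}$$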

where $\delta A,\delta E,\delta F$ and $\delta Q$ are of low rank, and both $E=E_{0}+\delta E$ and $E_{0}$ are invertible. By subtracting (17) from (18), we find that $\delta X$ solves

[TABLE]

where $\widehat{Q}:=\delta Q+\delta A^{*}X_{0}E+E^{*}X_{0}\delta A-E^{*}X_{0}\delta FX_{0}E+\delta E^{*}X_{0}\delta FX_{0}E_{0}+E^{*}X_{0}\delta FX_{0}\delta E$ satisfies $\hbox{rk}(\widehat{Q})\leqslant\hbox{rk}(\delta Q)+2\hbox{rk}(\delta A)+2\hbox{rk}(\delta F)+\min\{\hbox{rk}(\delta F),\hbox{rk}(\delta E)\}$. As in (15), we can write $\widehat{Q}=UDU^{*}$ with

[TABLE]

where $\delta E=U_{E}V_{E}^{*}$ with $U_{E},V_{E}\in\mathbb{R}^{n\times\hbox{rk}{(\delta E)}}$ . Again, an optional rank-reducing step for $\hat{Q}$ is recommended. By implicitly working on the equivalent CARE defined by the coefficients $E^{-1}(A-EX_{0}F)$ , $F$ , $E^{-*}\widehat{Q}E^{-1}$ , Algorithm 1 extends with minor modifications to (19). We refer to [47, 11, 46] for further details.

3.3 Low-rank solution of correction equation for UQME

The correction equation (10) features a constant coefficient that has low rank. However, unlike in the case of CARE, we are not aware of existing large-scale solvers tailored to this situation, neither for UQME nor for the modified form (10). For example, a fast cyclic reduction iteration proposed in [14] requires both the quadratic and the constant coefficient (that is, the matrices $A$ and $C$ in (5)) to be of low rank.

In the following, we develop a novel subspace projection method, largely inspired by the existing techniques for CARE described above.

We first discuss the choice of subspaces for our method and consider the Sylvester equation (14) for this purpose. In principle, subspace projection methods for Sylvester equations are well understood, but the coefficients of (14) involve the matrix $X$ , which depends on the unknown $\delta X$ . Solely for the purpose of choosing the subspaces, we replace $X$ with the reference solution $X_{0}$ and consider

$$(X_{0}+A^{-1}B)\,\delta X+\delta X\,X_{0}=-A^{-1}\widehat{C}\tag{20}$$

instead. Again, we assume that the low-rank updates are given in factorized form: $\delta A=U_{A}V_{A}^{*},\delta B=U_{B}V_{B}^{*}$ and $\delta C=U_{C}V_{C}^{*}$ . Then, the right-hand sides of (14) and (20) can be written as $A^{-1}\widehat{C}=UV^{*}$ with

$$U:=A^{-1}[U_{A},U_{B},U_{C}],\qquad V:=[(X_{0}^{2})^{*}V_{A},\,X_{0}^{*}V_{B},\,V_{C}].$$

As for CARE, we recommend a preprocessing step that reduces the rank of $UV^{*}$. Existing solvers for Sylvester equations [46] suggest the use of rational Krylov subspaces with coefficient matrices $X_{0}+A^{-1}B$, $X^{*}_{0}$ and starting vectors $U$, $V$ to solve (20). Specifically, we choose

[TABLE]

where $\mathbf{\pm 1}_{t}=[1,-1,\dots,1,-1]^{*}\in\mathbb{R}^{2t}$. This particular choice of shift parameters corresponds to the extended Krylov subspace [45] for Sylvester equations, adapted to the case in which the spectra of the coefficients are separated by the unit circle instead of the imaginary axis. Indeed, we replaced $0$ and $\infty$, the usual choices in the extended Krylov method, with $T(0)=-1$ and $T(\infty)=1$, where $T(z):=-\frac{1+z}{1-z}$ is the Cayley transform.
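The role of the Cayley transform can be checked numerically: it maps the imaginary axis onto the unit circle and the open left half-plane into the open unit disc, so the extended Krylov poles $0$ and $\infty$ become $-1$ and $1$. A minimal Python/NumPy sketch (the function name `cayley` and the sample points are our choices):

```python
import numpy as np

def cayley(z):
    """T(z) = -(1 + z) / (1 - z), the Cayley transform used above."""
    return -(1 + z) / (1 - z)

y = np.linspace(-50, 50, 11)
on_axis = cayley(1j * y)                            # imaginary axis -> unit circle
left = cayley(-np.array([0.1, 1.0, 10.0]) + 2j)     # open left half-plane -> inside the disc
print(cayley(0.0), cayley(1e12))                    # T(0) = -1; T(z) -> 1 as z -> infinity
```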

Suppose now that $U_{t},V_{t}$ contain orthonormal bases of the subspaces defined in (22). To construct an approximate solution $\delta X_{t}=U_{t}YV_{t}^{*}$ of the original equation (14), we impose a Galerkin condition with respect to the tensorized space $\mathcal{U}_{t}\otimes\mathcal{V}_{t}$ . This implies that the small matrix $Y$ satisfies the non-symmetric algebraic Riccati equation

[TABLE]

This compressed equation is solved by the structured doubling algorithm (SDA) [25, 15], see also Algorithm 2. If the projected Hamiltonian

[TABLE]

has an eigenvalue splitting with respect to the unit disc and both equation (23) and its dual equation (the one obtained by interchanging $\widetilde{F}$ and $\widetilde{Q}$) admit a minimal solution, then SDA converges quadratically to the minimal solution of (23) [15, Theorem 5.4].
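For a concrete picture of the doubling iteration, the following sketch implements the structure-preserving doubling algorithm in the form commonly used for nonsymmetric algebraic Riccati equations written as $XCX - XD - AX + B = 0$; the iterates $H_k$ and $G_k$ converge to the minimal solutions of the equation and of its dual. How (23) is mapped onto this standard form, the choice of the parameter `gamma`, and the function name `sda_nare` are our assumptions; Algorithm 2 may be initialized differently. The stopping rule ($\min\{\lVert E\rVert_1,\lVert F\rVert_1\}<10^{-13}$ or at most 30 iterations) follows the implementation remarks for Algorithm 2.

```python
import numpy as np


def sda_nare(A, B, C, D, gamma=None, tol=1e-13, maxit=30):
    """Doubling iteration for the NARE  X C X - X D - A X + B = 0  (X is m x n).

    Returns (H, G, converged): H approximates the minimal solution X and
    G the minimal solution of the dual equation.
    """
    m, n = B.shape
    Im, In = np.eye(m), np.eye(n)
    if gamma is None:
        # Usual choice for M-matrix NAREs: at least the largest diagonal entry of A and D.
        gamma = max(np.max(np.diag(A)), np.max(np.diag(D)), 1e-8)
    Ag, Dg = A + gamma * Im, D + gamma * In
    W = Ag - B @ np.linalg.solve(Dg, C)            # m x m
    V = Dg - C @ np.linalg.solve(Ag, B)            # n x n
    E = In - 2 * gamma * np.linalg.inv(V)
    F = Im - 2 * gamma * np.linalg.inv(W)
    G = 2 * gamma * np.linalg.solve(Dg, C) @ np.linalg.inv(W)
    H = 2 * gamma * np.linalg.solve(W, B) @ np.linalg.inv(Dg)
    for _ in range(maxit):
        E1 = E @ np.linalg.inv(In - G @ H)         # E (I - G H)^{-1}
        F1 = F @ np.linalg.inv(Im - H @ G)         # F (I - H G)^{-1}
        # All right-hand sides use the previous iterates E, F, G, H.
        G, H = G + E1 @ G @ F, H + F1 @ H @ E
        E, F = E1 @ E, F1 @ F
        if min(np.linalg.norm(E, 1), np.linalg.norm(F, 1)) < tol:
            return H, G, True
    return H, G, False


# Scalar check: x^2 - 6x + 2 = 0 has minimal solution 3 - sqrt(7).
H, G, ok = sda_nare(np.array([[3.0]]), np.array([[2.0]]),
                    np.array([[1.0]]), np.array([[3.0]]))
print(ok, H[0, 0], 3 - np.sqrt(7))
```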

The whole procedure for solving (14) is summarized in Algorithm 3.

A few remarks concerning the implementation of Algorithms 2 and 3:

•

Algorithm 2 is stopped either when $\min\{\lVert E\rVert_{1},\lVert F\rVert_{1}\}<10^{-13}$ or when a maximum of $30$ iterations is reached. If the projected Hamiltonian (24) has the desired splitting of eigenvalues with respect to the unit circle, then Algorithm 2 converges quadratically and is therefore likely to meet the convergence condition within $30$ iterations. Otherwise, we move on and consider the next (enlarged) extended Krylov subspaces.

•

We rely on the rktoolbox [12] for executing the rational block Arnoldi processes that return the orthonormal bases $U_{t}$ and $V_{t}$ . We remark that the compressed matrices in line 6 of Algorithm 3 do not need to be computed explicitly; they can be obtained from the rational Krylov decomposition by adding an artificial final step with an infinite shift, see [26, page 74] and [4].

•

The number of iterations $t$ in Algorithm 3 is chosen adaptively to ensure that the relation

[TABLE]

is satisfied for some tolerance $\tau_{\mathsf{uqme}}$ . The artificial final step with an infinite shift mentioned above allows this relation to be verified efficiently, see [45, 28].

•

For applying $(\widehat{A}\pm I)^{-1}$ , $(X_{0}^{*}\pm I)^{-1}$ , LU factorizations of $\widehat{A}\pm I$ and $X_{0}^{*}\pm I$ are computed once before starting the rational block Arnoldi process.

•

After termination of Algorithm 3, it is – once again – recommended to perform an optional post-processing step that aims at reducing the rank of $\delta X_{t}$ .
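Both the preprocessing of $UV^*$ and the post-processing of $\delta X_t = U_tYV_t^*$ amount to recompressing a matrix given in factored form. A common way to do this, sketched below under our own naming (`compress_lowrank`), is to orthogonalize the outer factors by QR and truncate an SVD of the small core. We assume a truncation threshold relative to the largest singular value, which may differ from the rule used in the authors' implementation.

```python
import numpy as np


def compress_lowrank(U, Y, V, tol=1e-12):
    """Recompress U @ Y @ V^* into L @ R^* with as few columns as possible.

    QR factorizations of U and V reduce the problem to an SVD of the small
    core Ru @ Y @ Rv^*; singular values below tol * (largest one) are dropped.
    """
    Qu, Ru = np.linalg.qr(U)
    Qv, Rv = np.linalg.qr(V)
    W, s, Zh = np.linalg.svd(Ru @ Y @ Rv.conj().T)
    if s.size == 0 or s[0] == 0:
        return U[:, :0], V[:, :0]
    r = int(np.sum(s > tol * s[0]))
    L = Qu @ (W[:, :r] * s[:r])     # absorb the singular values into the left factor
    R = Qv @ Zh[:r, :].conj().T
    return L, R


rng = np.random.default_rng(2)
n = 200
P, Pv = rng.standard_normal((n, 3)), rng.standard_normal((n, 3))
U = np.hstack([P, P @ rng.standard_normal((3, 5))])    # 8 columns, rank 3
V = np.hstack([Pv, Pv @ rng.standard_normal((3, 5))])
Y = rng.standard_normal((8, 8))
L, R = compress_lowrank(U, Y, V)
print(U.shape[1], "->", L.shape[1])   # 8 -> 3
```

For the preprocessing step one simply takes `Y` to be the identity.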

4 Divide-and-conquer methods

Having an efficient procedure for performing low-rank updates at hand allows us to design divide-and-conquer methods for quadratic matrix equations with rank structured coefficients. For example, suppose that the coefficients of the CARE (3) admit the decompositions

[TABLE]

where $A_{0},F_{0},Q_{0}$ are block diagonal matrices of the same shape and $\delta A,\delta F,\delta Q$ have low rank. This allows us to split (3) into the correction equation (4), which we solve with Algorithm 1, and the two smaller, decoupled equations associated with the diagonal blocks of $A_{0},F_{0},Q_{0}$. If these diagonal blocks again admit a decomposition of the form (26), we recursively repeat the splitting. The described strategy easily adapts to the UQME (9).
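As a minimal illustration of one level of the splitting (26), the following sketch (function name ours) separates a dense matrix into its block diagonal part and a factored off-diagonal correction. For a tridiagonal matrix split in the middle, the correction has rank $2$. A dense SVD is used purely for brevity; in a HODLR setting the off-diagonal factors are already available.

```python
import numpy as np


def split_block_diagonal(A, m, tol=1e-12):
    """Return A0, Ud, Vd with A = A0 + Ud @ Vd^*, A0 = blkdiag(A[:m, :m], A[m:, m:]).

    The off-diagonal part dA = A - A0 is factored by a truncated dense SVD,
    which is only sensible for small illustrative sizes.
    """
    A0 = np.zeros_like(A)
    A0[:m, :m] = A[:m, :m]
    A0[m:, m:] = A[m:, m:]
    dA = A - A0
    W, s, Zh = np.linalg.svd(dA)
    r = int(np.sum(s > tol * s[0])) if s[0] > 0 else 0
    return A0, W[:, :r] * s[:r], Zh[:r, :].conj().T


n = 16
A = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)   # tridiag(1, -2, 1)
A0, Ud, Vd = split_block_diagonal(A, n // 2)
print(Ud.shape[1])   # 2
```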

The storage and manipulation of the low-rank corrections on the various levels of the recursion require a suitable format, such as the HODLR format.

4.1 HODLR matrices

A HODLR matrix $A\in\mathbb{R}^{n\times n}$ admits a block partition

[TABLE]

where $A_{12}$ , $A_{21}$ have low rank and $A_{11}$ , $A_{22}$ are square matrices that again take the form (27). This splitting is continued recursively until the diagonal blocks reach a certain minimal block size $n_{\mathsf{min}}$ . Usually, the partitioning is chosen such that $A_{11}$ , $A_{22}$ have nearly equal sizes. Banded matrices are an important special case of HODLR matrices.

We say that $A$ has HODLR rank $k$ if $k$ is the smallest integer such that the ranks of $A_{21}$ and $A_{12}$ in (27) are bounded by $k$ at all levels of the recursion. If $k$ remains small, then $A$ can be stored efficiently by replacing each off-diagonal block with its low-rank factors. The only dense blocks that need to be stored are the diagonal blocks at the lowest level, see Figure 1. Consequently, storing a HODLR matrix requires $\mathcal{O}(kn\log n)$ memory.
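To make the storage count concrete, here is a toy recursive HODLR representation in plain NumPy. The dict-based layout and the helper names (`hodlr`, `hodlr_matvec`, `hodlr_storage`, `hodlr_rank`) are hypothetical and unrelated to the hm-toolbox used in the experiments; off-diagonal blocks are compressed by a dense truncated SVD, which is fine for illustration but not how one would build a HODLR matrix at scale.

```python
import numpy as np


def hodlr(A, nmin=64, tol=1e-12):
    """Recursively split A as in (27); leaves of size <= nmin stay dense."""
    n = A.shape[0]
    if n <= nmin:
        return {"dense": A.copy()}
    m = n // 2

    def lowrank(Blk):
        W, s, Zh = np.linalg.svd(Blk, full_matrices=False)
        r = int(np.sum(s > tol * s[0])) if s.size and s[0] > 0 else 0
        return W[:, :r] * s[:r], Zh[:r, :].conj().T   # Blk ~= L @ R^*

    return {"m": m,
            "A11": hodlr(A[:m, :m], nmin, tol), "A22": hodlr(A[m:, m:], nmin, tol),
            "A12": lowrank(A[:m, m:]), "A21": lowrank(A[m:, :m])}


def hodlr_matvec(H, x):
    """y = A @ x using only the dense leaves and the low-rank factors."""
    if "dense" in H:
        return H["dense"] @ x
    m = H["m"]
    (L12, R12), (L21, R21) = H["A12"], H["A21"]
    y1 = hodlr_matvec(H["A11"], x[:m]) + L12 @ (R12.conj().T @ x[m:])
    y2 = hodlr_matvec(H["A22"], x[m:]) + L21 @ (R21.conj().T @ x[:m])
    return np.concatenate([y1, y2])


def hodlr_storage(H):
    """Number of stored floating-point entries."""
    if "dense" in H:
        return H["dense"].size
    (L12, R12), (L21, R21) = H["A12"], H["A21"]
    return (hodlr_storage(H["A11"]) + hodlr_storage(H["A22"])
            + L12.size + R12.size + L21.size + R21.size)


def hodlr_rank(H):
    """Largest off-diagonal rank over all levels."""
    if "dense" in H:
        return 0
    return max(H["A12"][0].shape[1], H["A21"][0].shape[1],
               hodlr_rank(H["A11"]), hodlr_rank(H["A22"]))


n = 1024
A = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)   # tridiag(1, -2, 1)
H = hodlr(A, nmin=64)
x = np.random.default_rng(3).standard_normal(n)
print(np.allclose(hodlr_matvec(H, x), A @ x), hodlr_rank(H), hodlr_storage(H))
# True 1 73728   (the dense matrix has 1048576 entries)
```

For this tridiagonal matrix, the $16$ dense $64\times 64$ leaves account for $65\,536$ entries, while each of the four levels of rank-one off-diagonal factors adds only $2048$ entries.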

4.2 Divide-and-conquer in the HODLR format

For a CARE (3) with HODLR matrices $A,Q$ and a low-rank matrix $F=BB^{*}$ , a divide-and-conquer method can be derived along the lines of the linear case discussed in [30]. Consider

[TABLE]

The diagonal blocks $A_{ii},~{}Q_{ii}$ , $i=1,2$ are again HODLR matrices (with the recursion depth reduced by one). After recursively solving the CAREs associated with the diagonal blocks, a low-rank approximation to the solution of the correction equation (4) is obtained with Algorithm 1. The resulting procedure is summarized in Algorithm 4. As highlighted in the pseudo-code, it is strongly recommended to reduce the ranks of $UDU^{*}$ in line 9 and of $X_{0}+\delta X$ in line 11. Algorithm 4 requires the equations associated with the diagonal blocks to admit a unique stabilizing solution at all levels of the recursion.
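The recursion of Algorithm 4 can be mimicked with dense matrices to check the splitting on small problems. In the sketch below (our own simplification, not Algorithm 4 itself), the diagonal-block CAREs use the corresponding row blocks of $B$, so that $F_0=\mathrm{blkdiag}(B_1B_1^*,B_2B_2^*)$, and the correction equation for $\delta X = X - X_0$ is written with $A-BB^*X_0$ as linear coefficient and the CARE residual of $X_0$ as constant term. It is solved with SciPy's dense `solve_continuous_are` instead of Algorithm 1; no HODLR arithmetic or compression is involved.

```python
import numpy as np
from scipy.linalg import solve_continuous_are


def dc_care(A, B, Q, nmin=32):
    """Dense divide-and-conquer sketch for A^T X + X A - X B B^T X + Q = 0."""
    n = A.shape[0]
    R = np.eye(B.shape[1])
    if n <= nmin:
        return solve_continuous_are(A, B, Q, R)
    m = n // 2
    # Decoupled CAREs for the diagonal blocks (F0 = blkdiag(B1 B1^T, B2 B2^T)).
    X0 = np.zeros_like(A)
    X0[:m, :m] = dc_care(A[:m, :m], B[:m], Q[:m, :m], nmin)
    X0[m:, m:] = dc_care(A[m:, m:], B[m:], Q[m:, m:], nmin)
    # Correction equation for dX = X - X0: shifted linear term and, as constant
    # term, the (low-rank) residual of X0 in the full CARE.
    A_corr = A - B @ B.T @ X0
    Q_corr = A.T @ X0 + X0 @ A - X0 @ B @ B.T @ X0 + Q
    Q_corr = (Q_corr + Q_corr.T) / 2   # remove rounding-level asymmetry
    return X0 + solve_continuous_are(A_corr, B, Q_corr, R)


rng = np.random.default_rng(6)
n = 128
A = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
B = rng.standard_normal((n, 2))
d, e = rng.standard_normal(n), rng.standard_normal(n - 1)
Q0 = np.diag(d) + np.diag(e, 1) + np.diag(e, -1)
Q = Q0 + (0.1 - np.linalg.eigvalsh(Q0)[0]) * np.eye(n)
X_dc = dc_care(A, B, Q)
X_ref = solve_continuous_are(A, B, Q, np.eye(2))
print(np.linalg.norm(X_dc - X_ref) / np.linalg.norm(X_ref))
```

For tridiagonal $A$, $Q$ and a $B$ with two columns, the constant term of the top-level correction equation has rank at most $10$ ($2$ each from $\delta A^*X_0$, $X_0\delta A$ and $\delta Q$, and $4$ from $X_0\,\delta F\,X_0$), which is the kind of low-rank structure a projection method such as Algorithm 1 relies on.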

The divide-and-conquer method for a UQME with HODLR matrix coefficients is derived in an analogous manner. The only substantial change is that the equations associated with the diagonal blocks are solved by cyclic reduction [18], see Algorithm 5. The resulting procedure is summarized in Algorithm 6. This algorithm requires that the matrix polynomials associated with the diagonal blocks, $\lambda^{2}A_{jj}+\lambda B_{jj}+C_{jj}$ for $j=1,2$, maintain the splitting property (7) at all levels of the recursion. Similarly to Algorithm 4, compression is recommended in lines 9 and 11 of Algorithm 6.

4.2.1 Complexity of divide-and-conquer in the HODLR format

The complexity of Algorithms 4 and 6 critically depends on the convergence of the projection methods (Algorithms 1 and 3, respectively) used for solving the correction equations. To a lesser extent, it also depends on the numerical methods used for solving the small dense equations associated with the diagonal blocks on the lowest level of the recursion. To provide some insight into the computational cost, we make the following simplifying assumptions:

(i) Algorithm 1 and Algorithm 3 converge in a constant number of iterations;

(ii) solving the dense (unstructured) equations has complexity $\mathcal{O}(n^{3})$;

(iii) the matrix $Q$ in CARE has rank $k$;

(iv) all involved HODLR matrices have HODLR rank $k$ and a regular partition, that is, $n=2^{p}n_{\mathsf{min}}$ and the splitting (27) always generates equally sized diagonal blocks;

(v) the compressions in Algorithm 4 and Algorithm 6 are not performed.

Under the assumptions stated above, the LU decomposition of an $n\times n$ HODLR matrix requires $\mathcal{O}(k^{2}n\log^{2}(n))$ operations, while forward or backward substitution with a vector costs $\mathcal{O}(kn\log(n))$. A matrix-vector product costs $\mathcal{O}(kn\log(n))$ and all involved matrix-matrix operations cost at most $\mathcal{O}(k^{2}n\log^{2}(n))$, see, e.g., [27].

CARE.

Let $\mathcal{C}_{\mathsf{care}}(n,k)$ denote the complexity of Algorithm 4. Assumption (i) implies that the cost of Algorithm 1, called at Line 10, is $\mathcal{O}(k^{2}n\log^{2}(n))$ , because it is dominated by the cost of solving (shifted) linear systems with the matrix $A_{\mathsf{corr}}$ . Assumption (i) also implies that $X_{0}$ , see Line 8, has HODLR rank $\mathcal{O}(k\log(n))$ . Because $U_{A}$ and $U_{F}$ each have $2k$ columns, the matrix multiplications $X_{0}U_{A}$ and $X_{0}U_{F}$ at Line 9 require $\mathcal{O}(k^{2}n\log^{2}(n))$ operations. Finally, thanks to assumption (ii) we have

[TABLE]

Applying the master theorem [20] to (29) yields $\mathcal{C}_{\mathsf{care}}(n,k)=\mathcal{O}(k^{2}n\log^{3}(n))$ .
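Assuming that (29) has the form $\mathcal{C}(n)=2\,\mathcal{C}(n/2)+c\,k^{2}n\log^{2}(n)$ for $n>n_{\mathsf{min}}$ (with a constant $c$, and $k$ suppressed in the notation), together with $\mathcal{C}(n_{\mathsf{min}})=\mathcal{O}(n_{\mathsf{min}}^{3})$ from assumption (ii), unrolling the recursion over its $p=\log_2(n/n_{\mathsf{min}})$ levels (logarithms to base $2$) shows where the third logarithmic factor comes from:

```latex
\begin{aligned}
\mathcal{C}(n) &= 2^{p}\,\mathcal{C}(n_{\mathsf{min}})
  + \sum_{\ell=0}^{p-1} 2^{\ell}\, c\,k^{2}\,\frac{n}{2^{\ell}}\,\log^{2}\!\Big(\frac{n}{2^{\ell}}\Big) \\
&\le \frac{n}{n_{\mathsf{min}}}\,\mathcal{O}(n_{\mathsf{min}}^{3}) + c\,k^{2}\,n\,p\,\log^{2}(n) \\
&\le \mathcal{O}\big(n\,n_{\mathsf{min}}^{2}\big) + c\,k^{2}\,n\,\log^{3}(n).
\end{aligned}
```

Each of the $p$ levels contributes at most $c\,k^{2}n\log^{2}(n)$, and $p\le\log_2(n)$. Treating $n_{\mathsf{min}}$ as a constant, the first term is $\mathcal{O}(n)$ and the bound reduces to $\mathcal{O}(k^{2}n\log^{3}(n))$.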

UQME.

Let $\mathcal{C}_{\mathsf{uqme}}(n,k)$ denote the complexity of Algorithm 6. Analogously to CARE, Assumption (i) implies that Algorithm 3 requires $\mathcal{O}(k^{2}n\log^{2}(n))$ operations and that $X_{0}$ at Line 8 has HODLR rank $\mathcal{O}(k\log(n))$. Therefore, the complexity of Line 9 is that of solving $\mathcal{O}(k)$ linear systems, that is, $\mathcal{O}(k^{2}n\log^{2}(n))$. In turn, the recurrence relation for $\mathcal{C}_{\mathsf{uqme}}(n,k)$ is identical to (29) and hence $\mathcal{C}_{\mathsf{uqme}}(n,k)=\mathcal{O}(k^{2}n\log^{3}(n))$.

5 Numerical results

We now proceed to verify the numerical performance of the divide-and-conquer methods, Algorithms 4 and 6 from Section 4. Our methods are compared with state-of-the-art iterative algorithms for solving quadratic matrix equations:

•

structure-preserving doubling algorithm (SDA) for CARE [19],

•

cyclic reduction (CR) for UQME [18] (Algorithm 5).

Both algorithms are well suited for coefficients with hierarchical low-rank structures; we have implemented them in HODLR arithmetic using the hm-toolbox [36]. As indicated in the description of the algorithms, we apply recompression with the threshold $\tau_{\sigma}=10^{-12}$ in order to keep the ranks under control. Unless stated otherwise, we set the minimal block size to $n_{\mathsf{min}}=256$ for the representation in the HODLR format. The parameters $\tau_{\mathsf{care}},\tau_{\mathsf{uqme}}$ used in Algorithm 4 and Algorithm 6, respectively, for stopping the low-rank iterative solver have been set to $10^{-8}$.

All experiments have been performed on a laptop with a dual-core Intel Core i7-7500U 2.70 GHz CPU, 256 KB of level 2 cache, and 16 GB of RAM. The algorithms are implemented in MATLAB and tested under MATLAB 2017a, with MKL BLAS version 11.2.3 utilizing both cores.

5.1 Results for CARE

We will use the following three examples to test the performance of Algorithm 4 for CARE.

Example 5.1.

This is an academic example of arbitrary size $n$ : $A=\operatorname{tridiag}(1,-2,1)$ , that is, $A$ is a tridiagonal matrix with $-2$ on the diagonal and $1$ on the sub- and superdiagonal. The matrix $B\in\mathbb{R}^{n\times 2}$ is random with normally distributed entries, $Q=Q_{0}+(0.1-\theta)I$ , where $Q_{0}$ is a random symmetric tridiagonal matrix also with normally distributed entries, and $\theta\in\mathbb{R}$ is the smallest eigenvalue of $Q_{0}$ .
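For small $n$, the data of Example 5.1 can be generated and solved densely as a sanity check. The sketch below uses SciPy's `solve_continuous_are` (with $R=I$) as a reference solver; it is neither SDA nor Algorithm 4, and the random generator and seed are our choices, so the numbers will not match the tables. The helper `care_residual` (our name) evaluates the relative residual Res used in Section 5.1.

```python
import numpy as np
from scipy.linalg import solve_continuous_are


def example_5_1(n, seed=0):
    """Coefficients of Example 5.1 as dense arrays (illustrative sizes only)."""
    rng = np.random.default_rng(seed)
    A = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)   # tridiag(1, -2, 1)
    B = rng.standard_normal((n, 2))
    d, e = rng.standard_normal(n), rng.standard_normal(n - 1)
    Q0 = np.diag(d) + np.diag(e, 1) + np.diag(e, -1)          # random symmetric tridiagonal
    theta = np.linalg.eigvalsh(Q0)[0]                          # smallest eigenvalue of Q0
    Q = Q0 + (0.1 - theta) * np.eye(n)                         # smallest eigenvalue of Q is 0.1
    return A, B, Q


def care_residual(A, B, Q, X):
    """Res = ||A^* X + X A - X B B^* X + Q||_2 / ||X||_2."""
    Rm = A.T @ X + X @ A - X @ B @ B.T @ X + Q
    return np.linalg.norm(Rm, 2) / np.linalg.norm(X, 2)


A, B, Q = example_5_1(200)
X = solve_continuous_are(A, B, Q, np.eye(2))   # dense reference solution
print(care_residual(A, B, Q, X))
```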

Example 5.2.

This example is taken from [30, Example 3.2 and Section 5.6]:

[TABLE]

where $e_{j}$ denotes the $j$th unit vector of appropriate length. Since $A$ is unstable, we use an initial stabilizing solution $X_{0}:=Z_{0}Z_{0}^{*}$, $Z_{0}=8\left[\begin{smallmatrix}-e_{n}&e_{1}\\ -e_{n}&e_{1}\end{smallmatrix}\right]$, and consider the stabilized CARE for $X-X_{0}$ given by $\tilde{A}:=A-BB^{*}X_{0}$ and $\tilde{Q}:=Q-X_{0}BB^{*}X_{0}+A^{*}X_{0}+X_{0}A$. Because of the structure of $B$ and $Z_{0}$, $\tilde{A}$ is still sparse. All matrices are scaled by $\|A\|_{2}$ and, to obtain a banded structure, reordered by a perfect shuffle permutation.

Example 5.3.

This example is carex18 from the CARE benchmark collection [1] with tridiagonal $A$ and $E$ , $B\in\mathbb{R}^{n}$ , but we set $Q:=I_{n}$ .

All matrices have been converted into the HODLR format using the hm-toolbox. However, for the fast solution of the linear systems in Algorithm 1 we use the original sparse matrices $A$ and $E$ and call a sparse direct solver via MATLAB’s “backslash”.

The results, compared with those of SDA, are summarized in Tables 1–3, where $\mathrm{Res}=\|A^{*}X+XA-XBB^{*}X+Q\|_{2}/\|X\|_{2}$. The reported computing times clearly show that the divide-and-conquer method requires substantially less time than SDA while achieving similar or even better accuracy. Most of the time in SDA was spent in the numerous HODLR matrix-matrix multiplications and the associated recompression steps after each multiplication.

For the largest matrices from Example 5.1, $n=32\,768$, we have profiled the computing time spent in the different stages of the divide-and-conquer method. Solving dense CAREs for the diagonal blocks at the lowest level of recursion consumed about 30% of the total time, while about 50% was spent on solving the correction equation, CARE (4), by RKSM (Algorithm 1). About 15% of the time was spent on performing the update $X_{0}+\delta X$ (line 11 in Algorithm 4). The work spent on rank compressions was negligible; it consumed less than 1% of the total time. Within RKSM, orthonormalization within the Arnoldi method and the solution of the compressed CAREs were the most time-consuming steps (together approximately 40% of the time spent on RKSM), followed by the procedure for shift generation (15%). Due to the sparse, banded structure of $A$, the linear system solves consumed only a very small fraction (about 3%). We note that the time for solving the correction equation (4) could potentially be reduced by employing a different low-rank solver for CAREs. A good candidate for such a solver is the recently proposed RADI method [7], which does not rely on Galerkin projections and, hence, does not require solving compressed CAREs.¹

We also investigated the effect of reducing the block size $n_{\mathsf{min}}$ from 256 to 128. As expected, this decreased the fraction of computing time spent on diagonal blocks from 30% to 10% but, due to the higher number of occurrences, increased the fraction spent on solving the correction CAREs to about 67%. The change in the overall time for Algorithm 4 was negligible compared to $n_{\mathsf{min}}=256$.

¹ Preliminary results suggest that replacing Algorithm 1 with RADI reduces the time by 10% on average over all sizes $n$ used.

5.2 Results for UQMEs from QBD processes

QBD processes are discrete-time stochastic processes with a two-dimensional discrete state space. The variables of the state space are called level and phase; the transition — at each time step — with respect to the level coordinate has at most unit length. We consider models whose state space is isomorphic to $\mathbb{N}\times\{0,\dots,n-1\}$ , that is, we have infinite levels and a finite number of possible phases. Moreover, we assume that the process is level independent, i.e. the transition probability depends on the variation of the level but not on its current value.

Computing the stationary distribution of a level independent QBD process amounts to solving a UQME with coefficients corresponding to (possibly shifted) sub-blocks of its transition probability matrix [13]. More specifically, the coefficients of the UQME $AX^{2}+BX+C=0$ have the properties that $A,B+I,C\in\mathbb{R}^{n\times n}$ are non-negative and $A+B+C+I$ is stochastic, that is, each row sums to one. As the following lemma shows, these properties imply — under some mild additional conditions — the eigenvalue splitting property (7) on every level of recursion in the divide-and-conquer method.

Lemma 5.4.

Suppose that $A,B,C$ have the properties stated above and that $\varphi(\lambda)$ has only one eigenvalue on the unit circle, the simple eigenvalue $1$ . For some index set $J\subseteq\{1,\dots,n\}$ , let $A_{J},B_{J},C_{J}$ denote the corresponding principal submatrices of $A,B,C$ . Assume that $B_{J}$ is invertible and $B_{J}^{-1}(A_{J}+C_{J})$ is irreducible. Then $\varphi_{J}(\lambda):=\lambda^{2}A_{J}+\lambda B_{J}+C_{J}$ has the splitting property (7).

Proof.

For the moment, let us assume that $A_{J}+B_{J}+C_{J}+I$ is substochastic, that is, $(A_{J}+B_{J}+C_{J}+I)\mathbf{e}\lneq\mathbf{e}$ , where $\mathbf{e}$ denotes the vector of all ones and the inequality is understood componentwise. We aim at utilizing the following consequence of Rouché’s theorem for matrix-valued functions [38, Theorem 3.2]: if

[TABLE]

holds for an induced norm $\lVert\cdot\rVert$ then $\varphi_{J}(z)$ has exactly $k$ eigenvalues (counting multiplicities) in the open unit disc and $k$ eigenvalues with modulus greater than $1$ , where $k$ denotes the cardinality of $J$ . This implies the result of the lemma.

Setting $\psi(\lambda):=-B_{J}^{-1}(\lambda A_{J}+\lambda^{-1}C_{J})$ , the condition (30) clearly holds if we can show that the spectral radius $\rho(\psi(\lambda))$ is less than $1$ for every $\lambda$ on the unit circle. Note that $|\lambda A_{J}+\lambda^{-1}C_{J}|\leq A_{J}+C_{J}$ because $A_{J},C_{J}$ are non-negative. Combined with the fact that $-B_{J}$ is an M-matrix, which implies $-B_{J}^{-1}\geqslant 0$ , and the monotonicity of the spectral radius, we obtain

[TABLE]

Using $-B_{J}^{-1}\geqslant 0$ we also have

[TABLE]

In particular, the matrix $\psi(1)$ is irreducible and substochastic, and by the Perron–Frobenius theorem [43, Theorem 1.5] it has spectral radius strictly less than $1$.

It remains to consider the case when $A_{J}+B_{J}+C_{J}+I$ is stochastic. Note that, under this assumption, the matrix $\psi(1)$ is also stochastic. Obviously, the statement of the lemma holds when $A_{J}=0$ and $C_{J}=0$, so we assume $A_{J}+C_{J}\neq 0$ from now on. Assuming $J=\{1,\ldots,k\}$ after a suitable reordering, we can partition

[TABLE]

and hence an eigenvalue of $\varphi_{J}(\lambda)$ is also an eigenvalue of $\varphi(\lambda)$ .

Let us consider the perturbed matrix polynomial $\varphi_{J,\epsilon}(\lambda):=\lambda^{2}(A_{J}-\epsilon E_{A})+\lambda B_{J}+(C_{J}-\epsilon E_{C})$ for $\epsilon>0$ and Boolean matrices $E_{A},E_{C}$ with the sparsity patterns of $A_{J}$ and $C_{J}$, respectively. Because $A_{J}+C_{J}\neq 0$, the matrix $A_{J}-\epsilon E_{A}+B_{J}+C_{J}-\epsilon E_{C}+I$ is substochastic for $\epsilon$ sufficiently close to $0$. Using Rouché’s theorem again, this ensures that $\varphi_{J,\epsilon}(\lambda)$ has the property (7). By continuity, the eigenvalue functions of $\varphi_{J,\epsilon}(\lambda)$ do not cross the unit circle as $\epsilon\to 0$ and, in turn, $\varphi_{J}(\lambda)=\lim_{\epsilon\to 0}\varphi_{J,\epsilon}(\lambda)$ has $k$ eigenvalues inside or on the unit circle and $k$ eigenvalues outside or on the unit circle. Because the simple eigenvalue $1$ is the only eigenvalue of $\varphi(\lambda)$ on the unit circle and the same property holds for $\varphi_{J}(\lambda)$, this completes the proof.

∎

We remark that the eigenvalue assumption on $\varphi(\lambda)$ in Lemma 5.4 can be relaxed to the assumption that $1$ is a simple eigenvalue (admitting possibly other eigenvalues on the unit circle), provided that $B_{J}^{-1}(A_{J}+C_{J})$ is primitive and $A_{J}\circ C_{J}\neq 0$ , where $\circ$ indicates the componentwise product.
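Lemma 5.4 can be illustrated numerically: for a random triple with $A$, $B+I$, $C$ nonnegative and $A+B+C+I$ row stochastic, the quadratic polynomial restricted to a proper principal submatrix should have exactly as many eigenvalues inside the unit disc as outside. The sketch below computes all eigenvalues through a standard companion linearization (`quad_eigs` is our helper name) and uses dense positive random blocks, which makes the irreducibility assumption automatic; it illustrates the statement and does not replace the proof.

```python
import numpy as np
from scipy.linalg import eigvals


def quad_eigs(A, B, C):
    """All eigenvalues of lambda^2 A + lambda B + C from a companion linearization."""
    k = A.shape[0]
    Z, I = np.zeros((k, k)), np.eye(k)
    L = np.block([[Z, I], [-C, -B]])
    M = np.block([[I, Z], [Z, A]])
    return eigvals(L, M)   # L v = lambda M v with v = [x; lambda x]


rng = np.random.default_rng(7)
n = 30
A, Bp, C = rng.random((3, n, n))             # positive, hence irreducible, blocks
rows = (A + Bp + C).sum(axis=1, keepdims=True)
A, Bp, C = A / rows, Bp / rows, C / rows     # A + Bp + C is row stochastic
B = Bp - np.eye(n)                           # so A + B + C + I is row stochastic

idx = np.ix_(np.arange(15), np.arange(15))   # a proper index subset J
lam = quad_eigs(A[idx], B[idx], C[idx])
print(np.sum(np.abs(lam) < 1), np.sum(np.abs(lam) > 1))   # 15 15
```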

Often, the probabilistic model requires bounded transitions in the phase coordinate as well. This translates into a band structure in the matrices $A,B$ and $C$ . For example, in the case of double QBD processes (DQBD) [39] the coefficients are all tridiagonal, see also Figure 2.

We test Algorithm 6 on instances of DQBD with increasing size $n$ . In particular, we choose the entries of the $3$ central diagonals of $A,B$ and $C$ randomly from a uniform distribution on $[0,1]$ . We divide each row of the three matrices by the corresponding entry in $(A+B+C)\mathbf{e}$ , in order to make $A+B+C$ row stochastic. Finally, we subtract the identity matrix from $B$ :

[TABLE]
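The test matrices described above and a dense solver can be sketched as follows. The function `cyclic_reduction` is a textbook dense version of cyclic reduction for $AX^2+BX+C=0$ that returns $X=-\widehat{B}_\infty^{-1}C$; it may differ in details from Algorithm 5 and does not use HODLR arithmetic as the method in [16] does. Function names, seed, stopping rule and sizes are our assumptions.

```python
import numpy as np


def random_dqbd(n, seed=0):
    """Random tridiagonal A, B, C with A, B + I, C >= 0 and A + B + C + I row stochastic."""
    rng = np.random.default_rng(seed)

    def tri():
        # uniform random entries on the three central diagonals
        return (np.diag(rng.random(n)) + np.diag(rng.random(n - 1), 1)
                + np.diag(rng.random(n - 1), -1))

    A, B, C = tri(), tri(), tri()
    rows = (A + B + C).sum(axis=1, keepdims=True)   # the vector (A + B + C) e
    A, B, C = A / rows, B / rows, C / rows
    return A, B - np.eye(n), C


def cyclic_reduction(A, B, C, tol=1e-13, maxit=50):
    """Minimal solution X of A X^2 + B X + C = 0 by dense cyclic reduction."""
    Ak, Bk, Ck, Bhat = A.copy(), B.copy(), C.copy(), B.copy()
    for _ in range(maxit):
        BiA = np.linalg.solve(Bk, Ak)                # B_k^{-1} A_k
        BiC = np.linalg.solve(Bk, Ck)                # B_k^{-1} C_k
        Bhat = Bhat - Ak @ BiC
        Bk = Bk - Ak @ BiC - Ck @ BiA
        Ak, Ck = -Ak @ BiA, -Ck @ BiC                # both use the previous iterates
        if min(np.linalg.norm(Ak, 1), np.linalg.norm(Ck, 1)) < tol:
            break
    return -np.linalg.solve(Bhat, C)


A, B, C = random_dqbd(60)
X = cyclic_reduction(A, B, C)
print(np.linalg.norm(A @ X @ X + B @ X + C, 2))   # Res as in Section 5.2
```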

In Table 4 we compare the performance of Algorithm 6 with the method in [16], which combines cyclic reduction (Algorithm 5) with HODLR arithmetic. Both methods can handle large values of $n$ and return solutions of comparable accuracy, measured in terms of $\mathrm{Res}:=\|AX^{2}+BX+C\|_{2}$. However, the divide-and-conquer method provides a significant speedup; it is about $3$ times faster than the competitor for $n\geqslant 4096$.

5.3 Results for UQMEs from damped mass-spring system

Another application of UQME is the solution of the quadratic eigenvalue problem $(\lambda^{2}A+\lambda B+C)v=0$ , arising in the analysis of damped structural systems and vibration problems [21, 32, 29]. After having determined the solution $X$ of (5), the quadratic eigenvalue problem reduces to two linear eigenvalue problems: the one associated with $X$ and the generalized eigenproblem $(AX+B)v=-\lambda Av$ .
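The reduction to two linear problems rests on the factorization $\lambda^2A+\lambda B+C=(\lambda A+AX+B)(\lambda I-X)$, which holds whenever $AX^2+BX+C=0$. The sketch below checks this on a small random problem with a prescribed solvent $X$ (hypothetical data, not the mass-spring coefficients), comparing against all eigenvalues of a companion linearization.

```python
import numpy as np
from scipy.linalg import eigvals

rng = np.random.default_rng(5)
n = 6
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
X = np.diag(np.linspace(0.1, 0.6, n))   # prescribed solvent (hypothetical data)
C = -(A @ X @ X + B @ X)                # enforce A X^2 + B X + C = 0

# Reference: all 2n eigenvalues of lambda^2 A + lambda B + C via linearization.
Z, I = np.zeros((n, n)), np.eye(n)
lam_qep = eigvals(np.block([[Z, I], [-C, -B]]), np.block([[I, Z], [Z, A]]))

# Split: eigenvalues of X, plus those of (A X + B) v = -lambda A v.
lam_split = np.concatenate([np.linalg.eigvals(X), eigvals(-(A @ X + B), A)])
```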

We repeat the experiments from Section 5.2 for the UQME associated with a quadratic eigenvalue problem from a damped mass-spring system considered in [49, Example 2]. The $n\times n$ coefficients of the UQME are given by

[TABLE]

The results reported in Table 5 confirm the good scalability and accuracy of both methods. The solution exhibits a very low HODLR rank and cyclic reduction needs only 2–3 iterations to converge. As a consequence, cyclic reduction is faster than Algorithm 6 on larger instances of this example.

6 Conclusions

We have proposed novel Krylov subspace methods for updating the solution of continuous-time algebraic Riccati equations and unilateral quadratic matrix equations whose coefficients are subject to low-rank modifications. We have provided theoretical insights into the low-rank and stability properties of the solutions to the involved correction equations. This has led us to design novel divide-and-conquer methods for quadratic equations with large-scale coefficients featuring hierarchical low-rank structures. Our methods have linear polylogarithmic complexity and often outperform existing techniques, sometimes significantly. The applications highlighted in this work include quasi-birth–death processes and damped mass-spring systems.

Acknowledgements. During the larger part of the work on this article, the second author PK was affiliated with the Max Planck Institute for Dynamics of Complex Technical Systems Magdeburg.

Bibliography


[1] J. Abels and P. Benner. CAREX - A Collection of Benchmark Examples for Continuous-Time Algebraic Riccati Equations (Version 2.0). SLICOT working note 1999-14, 1999.
[2] S. Ambikasaran and E. Darve. An $\mathcal{O}(N\log N)$ fast direct solver for partial hierarchically semi-separable matrices: with application to radial basis function interpolation. J. Sci. Comput., 57(3):477–501, 2013.
[3] U. Baur and P. Benner. Factorized solution of Lyapunov equations based on hierarchical matrix arithmetic. Computing, 78(3):211–234, 2006.
[4] B. Beckermann and L. Reichel. Error estimates and evaluation of matrix functions via the Faber transform. SIAM J. Numer. Anal., 47(5):3849–3883, 2009.
[5] B. Beckermann and A. Townsend. On the singular values of matrices with displacement structure. SIAM J. Matrix Anal. Appl., 38(4):1227–1248, 2017.
[6] P. Benner, Z. Bujanović, P. Kürschner, and J. Saak. A numerical comparison of solvers for large-scale, continuous-time algebraic Riccati equations. Technical Report 1811.00850, arXiv, 2018.
[7] P. Benner, Z. Bujanović, P. Kürschner, and J. Saak. RADI: A low-rank ADI-type algorithm for large scale algebraic Riccati equations. Numer. Math., 138(2):301–330, 2018.
[8] P. Benner, R. Byers, V. Mehrmann, and H. Xu. Robust numerical methods for robust control. Technical Report 06-2004, Institut für Mathematik, TU Berlin, 2004.
