Analysis of Krylov Subspace Approximation to Large Scale Differential Riccati Equations

Antti Koskela; Hermann Mena

arXiv:1705.07507·math.NA·June 24, 2021


TL;DR

This paper analyzes a Krylov subspace method for large-scale symmetric differential Riccati equations, demonstrating structure preservation, superlinear convergence, and providing error estimates supported by numerical experiments.

Contribution

It introduces a structure-preserving Krylov subspace approximation for large-scale Riccati equations with proven superlinear convergence and practical error estimation methods.

Findings

01

The method preserves positivity and monotonicity of the Riccati flow.

02

Superlinear convergence of the approximation is theoretically established.

03

Numerical experiments confirm the effectiveness and accuracy of the approach.

Abstract

We consider a Krylov subspace approximation method for the symmetric differential Riccati equation $\dot{X} = A X + X A^{T} + Q - X S X$ , $X (0) = X_{0}$ . The method we consider is based on projecting the large scale equation onto a Krylov subspace spanned by the matrix $A$ and the low rank factors of $X_{0}$ and $Q$ . We prove that the method is structure preserving in the sense that it preserves two important properties of the exact flow, namely the positivity of the exact flow, and also the property of monotonicity. We also provide a theoretical a priori error analysis which shows a superlinear convergence of the method. This behavior is illustrated in the numerical experiments. Moreover, we derive an efficient a posteriori error estimate as well as discuss multiple time stepping combined with a cut of the rank of the numerical solution.

Tables

Table 1: Timings for the Krylov subspace iteration and for solving the projected system using the modified Davison–Maki method, when integrating up to $t=10$ using a single Krylov subspace iteration. Times are in seconds.

$k$	Krylov iteration	solving small dimensional system
$10$	0.046	0.037
$20$	0.16	0.091
$30$	0.31	0.20
$40$	0.49	0.44

Table 2: Maximum number of columns of the basis matrix $V_{k}$ along the iteration for the substepping approach and for the one step approximation using Algorithm 1, when an error tolerance $\mathrm{tol}$ is required.

$tol$	time stepping	single step iteration
$10^{- 2}$	60	112
$10^{- 4}$	76	147
$10^{- 6}$	112	175
$10^{- 8}$	160	203

Equations

J(x,u)=\int_{0}^{t_{f}}\big(x(t)^{T}C^{T}Cx(t)+u(t)^{T}u(t)\big)\,{\rm d}t+x(t_{f})^{T}Gx(t_{f}),

\dot{x}(t)=Ax(t)+Bu(t),\quad x(0)=x_{0},\quad t\in[0,t_{f}]

u(t)=K(t)x(t),\quad\text{where}\quad K(t)=-B^{T}X(t),

\dot{X}+A^{T}X+XA-XBB^{T}X+C^{T}C=0,\quad X(t_{f})=G,

\dot{\widetilde{x}}(t)=\big(A-BB^{T}X(t_{f}-t)\big)\widetilde{x}(t),\quad\widetilde{x}(0)=x_{0}.

\mathcal{F}(A)=\{x^{*}Ax\,:\,x\in\mathbb{C}^{n},\ \|x\|=1\}.

\mu(A):=\max\{\mathrm{Re}\,z\,:\,z\in\mathcal{F}(A)\}.

\varphi_{1}(z)=\frac{{\rm e}^{z}-1}{z}=\sum_{\ell=0}^{\infty}\frac{z^{\ell}}{(\ell+1)!}.

\dot{X}(t)=AX(t)+X(t)A^{T}+Q-X(t)SX(t),\quad X(0)=X_{0},

X_{0}=ZZ^{T},\quad Q=CC^{T},

\frac{{\rm d}}{{\rm d}t}\begin{bmatrix}U(t)\\ V(t)\end{bmatrix}=\begin{bmatrix}-A&S\\ Q&A^{T}\end{bmatrix}\begin{bmatrix}U(t)\\ V(t)\end{bmatrix},\quad\begin{bmatrix}U(0)\\ V(0)\end{bmatrix}=\begin{bmatrix}I\\ X_{0}\end{bmatrix}

X(t)=V(t)U(t)^{-1}.

(J\mathcal{H})^{T}=J\mathcal{H},\quad\text{where}\quad J=\begin{bmatrix}0&I\\ -I&0\end{bmatrix}.

X(t)={\rm e}^{tA}X_{0}{\rm e}^{tA^{T}}+\int_{0}^{t}{\rm e}^{(t-s)A}Q{\rm e}^{(t-s)A^{T}}\,{\rm d}s-\int_{0}^{t}{\rm e}^{(t-s)A}X(s)SX(s){\rm e}^{(t-s)A^{T}}\,{\rm d}s.

\mathcal{H}=\begin{bmatrix}-A&S\\ Q&A^{T}\end{bmatrix}\quad\text{and}\quad\widetilde{\mathcal{H}}=\begin{bmatrix}-\widetilde{A}&\widetilde{S}\\ \widetilde{Q}&\widetilde{A}^{T}\end{bmatrix}

\|X(t)\|\leq{\rm e}^{2t\mu(A)}\|X_{0}\|+t\varphi_{1}\big(2t\mu(A)\big)\|Q\|.

\|X(t)\|\leq\Big\|{\rm e}^{tA}X_{0}{\rm e}^{tA^{T}}+\int_{0}^{t}{\rm e}^{(t-s)A}Q{\rm e}^{(t-s)A^{T}}\,{\rm d}s\Big\|.

\max_{s\in[0,t]}\|X(s)\|\leq\max\{1,{\rm e}^{2t\mu(A)}\}\|X_{0}\|+t\max\{1,\varphi_{1}\big(2t\mu(A)\big)\}\|Q\|.

\mathcal{K}_{k}(A,B)=\mathrm{span}\{B,AB,A^{2}B,\dots,A^{k-1}B\}.

H_{ij}

W_{j}

H_{k}=V_{k}^{T}AV_{k}.

AV_{k}=V_{k}H_{k}+U_{k+1}H_{k+1,k}E_{k}^{T},

{\rm e}^{A}B\approx V_{k}{\rm e}^{H_{k}}V_{k}^{T}B=V_{k}{\rm e}^{H_{k}}E_{1}R_{1},

\|{\rm e}^{tA}B-V_{k}{\rm e}^{tH_{k}}V_{k}^{T}B\|\leq 2\max\{1,{\rm e}^{t\mu(A)}\}\frac{\|tA\|^{k}}{k!}\|B\|.

\mathcal{K}_{k}(A,B,\bar{s})=\mathrm{span}\Big\{B,(s_{1}I-A)^{-1}B,\dots,\prod_{\ell=1}^{k-1}(s_{\ell}I-A)^{-1}B\Big\}.

\begin{array}{rcl}\dot{Y}_{k}(t)&=&H_{k}Y_{k}(t)+Y_{k}(t)H_{k}^{T}+C_{k}C_{k}^{T}-Y_{k}(t)S_{k}Y_{k}(t),\\ Y_{k}(0)&=&Z_{k}Z_{k}^{T}.\end{array}

Y_{k}(t)=W_{k}(t)U_{k}(t)^{-1},\quad\text{where}\quad\begin{bmatrix}U_{k}(t)\\ W_{k}(t)\end{bmatrix}=\exp\left(t\begin{bmatrix}-H_{k}&S_{k}\\ C_{k}C_{k}^{T}&H_{k}^{T}\end{bmatrix}\right)\begin{bmatrix}I_{k}\\ Z_{k}Z_{k}^{T}\end{bmatrix}.

\begin{bmatrix}U_{j+1}\\ W_{j+1}\end{bmatrix}=\exp\left(\Delta t\begin{bmatrix}-H_{k}&S_{k}\\ C_{k}C_{k}^{T}&H_{k}^{T}\end{bmatrix}\right)\begin{bmatrix}I_{k}\\ Y_{k}^{j}\end{bmatrix},\quad Y_{k}^{j+1}=W_{j+1}U_{j+1}^{-1}.


Full text

Analysis of Krylov Subspace Approximation to Large Scale Differential Riccati Equations

Antti Koskela, Department of Mathematics and Statistics, University of Helsinki.

Hermann Mena, Department of Mathematics, Yachay Tech, Urcuquí, Ecuador, and Department of Mathematics, University of Innsbruck, Innsbruck, Austria.

Abstract

We consider a Krylov subspace approximation method for the symmetric differential Riccati equation $\dot{X}=AX+XA^{T}+Q-XSX$ , $X(0)=X_{0}$ . The method we consider is based on projecting the large scale equation onto a Krylov subspace spanned by the matrix $A$ and the low rank factors of $X_{0}$ and $Q$ . We prove that the method is structure preserving in the sense that it preserves two important properties of the exact flow, namely the positivity of the exact flow, and also the property of monotonicity. We also provide a theoretical a priori error analysis which shows a superlinear convergence of the method. This behavior is illustrated in the numerical experiments. Moreover, we derive an efficient a posteriori error estimate as well as discuss multiple time stepping combined with a cut of the rank of the numerical solution.

keywords:

Differential Riccati equations, LQR optimal control problems, large scale ordinary differential equations, Krylov subspace methods, matrix exponential, exponential integrators, model order reduction, low rank approximation.

AMS:

65F10, 65F60, 65L20, 65M22, 93A15, 93C05

1 Introduction

Large scale differential Riccati equations (DREs) arise in the numerical treatment of optimal control problems governed by partial differential equations. This is the case in particular when solving a linear quadratic regulator (LQR) problem, a widely studied problem in control theory. We briefly describe the finite dimensional LQR problem. For more details, we refer to [1, 9]. The differential Riccati equation arises in the finite horizon case, i.e., when a finite time integral cost functional is considered. Denoting the time interval by $[0,t_{f}]$, the functional then has the quadratic form

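$$J(x,u)=\int_{0}^{t_{f}}\big(x(t)^{T}C^{T}Cx(t)+u(t)^{T}u(t)\big)\,{\rm d}t+x(t_{f})^{T}Gx(t_{f}),\tag{1}$$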

where $x\in\mathbb{R}^{n}$ , $C\in\mathbb{R}^{q\times n}$ ( $q\ll n$ ) and $u\in\mathbb{R}^{r}$ ( $r\ll n$ ). The coefficient matrix $G$ of the penalizing term $x(t_{f})^{T}Gx(t_{f})$ is symmetric, nonnegative and has a low rank. The functional (1) is constrained by the system of differential equations

$$\dot{x}(t)=Ax(t)+Bu(t),\quad x(0)=x_{0},\quad t\in[0,t_{f}],\tag{2}$$

where the matrix $A\in\mathbb{R}^{n\times n}$ is sparse and $B\in\mathbb{R}^{n\times r}$ . The number of columns of $B$ corresponds to the number of controls and the matrix $C$ represents an observation matrix. Under suitable conditions [1, 9], the control $\widetilde{u}$ minimizing the functional (1) is given by

$$u(t)=K(t)x(t),\quad\text{where}\quad K(t)=-B^{T}X(t),\tag{3}$$

$X(t)$ is the unique solution of

$$\dot{X}+A^{T}X+XA-XBB^{T}X+C^{T}C=0,\quad X(t_{f})=G,\tag{4}$$

and $\widetilde{x}(t)$ satisfies

$$\dot{\widetilde{x}}(t)=\big(A-BB^{T}X(t_{f}-t)\big)\widetilde{x}(t),\quad\widetilde{x}(0)=x_{0}.$$

As a result, the central computational problem becomes that of solving the final value problem (4) which, with a careful change of variables, can be written as an initial value problem.
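The change of variables can be made explicit by reversing time. In the following sketch the symbol $\widehat{X}$ is a hypothetical name for the time-reversed solution and is not notation from the paper; the result follows directly from (4) by the chain rule.

```latex
\widehat{X}(t) := X(t_{f}-t), \qquad t\in[0,t_{f}],
\\[4pt]
\dot{\widehat{X}}(t) = -\dot{X}(t_{f}-t)
  = A^{T}\widehat{X}(t) + \widehat{X}(t)A
    - \widehat{X}(t)BB^{T}\widehat{X}(t) + C^{T}C,
\qquad \widehat{X}(0) = G.
```

The final condition $X(t_{f})=G$ thus becomes an initial condition for $\widehat{X}$, and the sign of the time derivative flips while the coefficient matrices are unchanged.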

We consider a Krylov subspace approximation method for large scale differential Riccati equations of the form (4). A similar projection method for DREs has been recently proposed in [19]. Our approach differs from that of [19] in the fact that the initial value matrix $G$ of (4) is contained in the Krylov subspace. This allows multiple time stepping. Our approach is also related to projection techniques considered for large scale algebraic Riccati equations [32, 38].

Essentially, the method we consider is based on projecting the matrices $A,Q,S$ and $X_{0}$ on an appropriate Krylov subspace, namely on the *block Krylov subspace* spanned by $A$ and the low rank factors of $X_{0}$ and $Q$. The projected small dimensional system is then solved using existing linearization techniques. We show that when using a Padé approximant to solve the linearized small dimensional system, the total approximation is structure preserving in the sense that positivity is preserved. Monotonicity is also preserved under certain conditions. Our Krylov subspace approach is also strongly related to Krylov subspace techniques used for approximating the product of a matrix function and a vector, $f(A)b$, and to exponential integrators [24]. For an introduction to matrix functions we refer to the monograph [21]. The effectiveness of these techniques comes from the fact that generating Krylov subspaces is essentially based on operations of the form $b\rightarrow Ab$, which are cheap for sparse $A$.

The linearization approach for DREs is a well-known method. This allows an efficient integration for dense problems, see e.g. [31]. Another approach, the so called Davison–Maki method [10], uses the fundamental solution of the linearized system. A modified variant, avoiding some numerical instabilities, is proposed in [27]. However, the application of these methods for large scale problems is impossible due to the high dimensionality of the linearized differential equation.

The problem of solving large scale DREs has recently received considerable attention. In [5, 4] the authors proposed efficient BDF and Rosenbrock methods for solving DREs capable of exploiting several of the above described properties: sparsity of $A$, low rank structure of $B$, $C$ and $G$, and the symmetry of the solution. However, several difficulties arise when approximating the optimal control (3) in the large scale setting. One difficulty is to evaluate the state equation $x(t)$ and Riccati equation $X(t)$ on the same mesh. In [30] a refined ADI integration method is proposed which addresses the high storage requirements of large scale DRE integrators. In recent studies an efficient splitting method [42] and adaptive high-order splitting schemes [41] for large scale DREs have been proposed.

The paper is organized as follows. In Section 2 we describe some preliminaries. Then, in Section 3, the structure preserving method is proposed. In Section 4, the error analysis is presented, first for the differential Lyapunov equation (a simplified version of the DRE), and then for the DRE. In Section 5 a posteriori error estimation is described. In Section 6 the rank cut and multiple time stepping are discussed. Numerical examples in Section 7 and conclusions in Section 8 close the article.

Notation and definitions

Throughout the article $\|\cdot\|$ will denote the Euclidean norm, or its induced matrix norm, i.e., the spectral norm. By $\mathrm{R}(A)$ we denote the column space of a matrix $A$ . We say that a matrix $A$ is nonnegative if it is symmetric positive semidefinite, and write $A\geq 0$ . For symmetric matrices $A$ and $B$ we write $B\geq A$ if $B-A\geq 0$ .

We will repeatedly use the notion of the *logarithmic norm* of a matrix $A\in\mathbb{C}^{n\times n}$. It can be defined via the field of values $\mathcal{F}(A)$, which is defined as

$$\mathcal{F}(A)=\{x^{*}Ax\,:\,x\in\mathbb{C}^{n},\ \|x\|=1\}.$$

Then, the logarithmic norm $\mu(A)$ of $A$ is defined by

$$\mu(A):=\max\{\mathrm{Re}\,z\,:\,z\in\mathcal{F}(A)\}.$$

We will also repeatedly use the exponential-like function $\varphi_{1}$ defined by

$$\varphi_{1}(z)=\frac{{\rm e}^{z}-1}{z}=\sum_{\ell=0}^{\infty}\frac{z^{\ell}}{(\ell+1)!}.$$
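As a brief illustration, both quantities can be evaluated numerically. The following Python sketch assumes NumPy and uses the standard identity that, for the Euclidean norm, $\mu(A)$ equals the largest eigenvalue of the Hermitian part $(A+A^{*})/2$. The function names `log_norm` and `phi1` are hypothetical and do not appear in the paper.

```python
import numpy as np

def log_norm(A):
    """Logarithmic 2-norm mu(A): largest eigenvalue of the Hermitian part of A."""
    A = np.asarray(A)
    hermitian_part = (A + A.conj().T) / 2.0
    # eigvalsh returns real eigenvalues in ascending order
    return float(np.linalg.eigvalsh(hermitian_part)[-1])

def phi1(z):
    """phi_1(z) = (exp(z) - 1) / z for real z, with phi_1(0) = 1."""
    z = np.asarray(z, dtype=float)
    safe = np.where(z == 0.0, 1.0, z)          # avoid 0/0 at the removable singularity
    return np.where(z == 0.0, 1.0, np.expm1(safe) / safe)
```

Note that $\mu(A)$ can be negative, in which case the bound $\|{\rm e}^{tA}\|\leq{\rm e}^{t\mu(A)}$ used later in the text shows decay of the exponential.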

2 Preliminaries

From now on we consider the time invariant symmetric differential Riccati equation (DRE) written in the form

$$\dot{X}(t)=AX(t)+X(t)A^{T}+Q-X(t)SX(t),\quad X(0)=X_{0},\tag{5}$$

where $t\geq 0$ and $A,Q,S,X_{0}\in\mathbb{R}^{n\times n}$ , $Q^{T}=Q$ , $S^{T}=S$ . Specifically, we focus on the low rank positive semidefinite case, where

$$X_{0}=ZZ^{T},\quad Q=CC^{T},$$

for some $Z\in\mathbb{R}^{n\times p}$ and $C\in\mathbb{R}^{n\times q}$, $p,q\ll n$, and $S$ is positive semidefinite. Notice that we have changed here from $A^{T}$ to $A$ (a common choice in the numerical analysis literature [12, 13]), and from now on $C$ is tall and skinny instead of short and fat as in (4). Although $S$ arises from the low rank matrix $B$ in (4), we do not place any restriction on the rank of $S$.

2.1 Linearization

We recall a fact that will be needed later on (see e.g. [1, Thm. 3.1.1.]).

Lemma 1 (Associated linear system).

The DRE (5) is equivalent to solving the linear system of differential equations

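$$\frac{{\rm d}}{{\rm d}t}\begin{bmatrix}U(t)\\ V(t)\end{bmatrix}=\begin{bmatrix}-A&S\\ Q&A^{T}\end{bmatrix}\begin{bmatrix}U(t)\\ V(t)\end{bmatrix},\quad\begin{bmatrix}U(0)\\ V(0)\end{bmatrix}=\begin{bmatrix}I\\ X_{0}\end{bmatrix},\tag{7}$$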

where $U(t),V(t)\in\mathbb{R}^{n\times n}$ . If the solution $X(t)$ of (5) exists on the interval $[0,T]$ , then the solution of (7) exists, $U(t)$ is invertible on $[0,T]$ , and

$$X(t)=V(t)U(t)^{-1}.$$

Notice also that the matrix $\mathcal{H}=\begin{bmatrix}-A&S\\ Q&A^{T}\end{bmatrix}$ is Hamiltonian, i.e., it holds that

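$$(J\mathcal{H})^{T}=J\mathcal{H},\quad\text{where}\quad J=\begin{bmatrix}0&I\\ -I&0\end{bmatrix}.\tag{8}$$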

This linearization approach is a standard method for solving finite dimensional DREs, and leads to efficient integration methods for dense problems, see e.g. [10].

2.2 Integral representation of the exact solution

For the exact solution of (5) we have the following integral representation (see also [29, Thm. 8]).

Theorem 2 (Exact solution of the DRE).

The exact solution of the DRE (5) is given by

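$$X(t)={\rm e}^{tA}X_{0}{\rm e}^{tA^{T}}+\int_{0}^{t}{\rm e}^{(t-s)A}Q{\rm e}^{(t-s)A^{T}}\,{\rm d}s-\int_{0}^{t}{\rm e}^{(t-s)A}X(s)SX(s){\rm e}^{(t-s)A^{T}}\,{\rm d}s.\tag{9}$$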

Proof.

The proof can be carried out by elementary differentiation. ∎
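The differentiation can be written out as follows. In this sketch, $M(s)$ is a placeholder (not notation from the paper) standing for either the constant matrix $Q$ or the product $X(s)SX(s)$, and the second identity is the Leibniz rule for a parameter-dependent integral.

```latex
\begin{aligned}
\frac{\mathrm{d}}{\mathrm{d}t}\Big(\mathrm{e}^{tA}X_{0}\mathrm{e}^{tA^{T}}\Big)
  &= A\,\mathrm{e}^{tA}X_{0}\mathrm{e}^{tA^{T}} + \mathrm{e}^{tA}X_{0}\mathrm{e}^{tA^{T}}A^{T}, \\
\frac{\mathrm{d}}{\mathrm{d}t}\int_{0}^{t}\mathrm{e}^{(t-s)A}M(s)\mathrm{e}^{(t-s)A^{T}}\,\mathrm{d}s
  &= M(t) + A\int_{0}^{t}\mathrm{e}^{(t-s)A}M(s)\mathrm{e}^{(t-s)A^{T}}\,\mathrm{d}s
     + \int_{0}^{t}\mathrm{e}^{(t-s)A}M(s)\mathrm{e}^{(t-s)A^{T}}\,\mathrm{d}s\;A^{T}.
\end{aligned}
```

Adding the first identity, the second with $M=Q$, and subtracting the second with $M(s)=X(s)SX(s)$ collects the three terms of the integral representation into $AX(t)+X(t)A^{T}$ and leaves $Q-X(t)SX(t)$, which is the right-hand side of the DRE. Setting $t=0$ gives $X(0)=X_{0}$.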

2.3 Positivity and monotonicity of the exact flow

We recall two important properties of the symmetric DRE, namely the positivity of the exact solution (see e.g. [12, Prop. 1.1]) and the monotonicity of the solution with respect to the initial data (see e.g. [13, Thm. 2]). By these we mean the following.

Theorem 3 (Positivity and monotonicity of the solution).

For the solution $X(t)$ of the symmetric DRE (5) it holds:

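1. (Positivity) $X(t)$ is symmetric positive semidefinite and it exists for all $t>0$.

2. (Monotonicity) Consider two symmetric DREs of the form (5) corresponding to the linearized systems of the form (7) with the coefficient matrices

$$\mathcal{H}=\begin{bmatrix}-A&S\\ Q&A^{T}\end{bmatrix}\quad\text{and}\quad\widetilde{\mathcal{H}}=\begin{bmatrix}-\widetilde{A}&\widetilde{S}\\ \widetilde{Q}&\widetilde{A}^{T}\end{bmatrix}$$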

and let $J$ be the skew-symmetric matrix (8). Then, if $\widetilde{\mathcal{H}}J\leq\mathcal{H}J$ , and if $0\leq X_{0}\leq\widetilde{X}_{0}$ , then for every $t\geq 0$ : $X(t)\leq\widetilde{X}(t)$ .

We will later show that our proposed numerical method preserves the properties of Theorem 3.

2.4 Bound for the solution

Using the positivity property of $X(t)$ (Thm. 3) we obtain the following bound for the norm of the solution. This will be repeatedly needed in the analysis of the proposed method.

Lemma 4 (Bound for the exact solution).

For the solution $X(t)$ of (5) it holds

$$\|X(t)\|\leq{\rm e}^{2t\mu(A)}\|X_{0}\|+t\varphi_{1}\big(2t\mu(A)\big)\|Q\|.$$

Proof.

Since $X_{0}$ , $Q$ and $X(t)$ are all symmetric positive semidefinite, we see that the first two terms on the right hand side of (9) are symmetric positive semidefinite and the third term is symmetric negative semidefinite. Moreover, since $X(t)$ is symmetric positive semidefinite by Theorem 3, and since for every symmetric positive definite matrix $M$ it holds that $\|M\|=\max\limits_{\|x\|=1}x^{*}Mx$ , we see that

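$$\|X(t)\|\leq\Big\|{\rm e}^{tA}X_{0}{\rm e}^{tA^{T}}+\int_{0}^{t}{\rm e}^{(t-s)A}Q{\rm e}^{(t-s)A^{T}}\,{\rm d}s\Big\|.$$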

Using the well-known bound $\|{\rm e}\hskip 1.0pt^{tA}\|\leq{\rm e}\hskip 1.0pt^{t\mu(A)}$ (see e.g. [43, p. 138]), the fact that $\mu(A^{T})=\mu(A)$ and that $t\varphi_{1}(tz)=\int_{0}^{t}{\rm e}\hskip 1.0pt^{(t-s)z}\,\hskip 1.0pt{\rm d}\hskip 0.5pts$ , the claim follows. ∎
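For completeness, the last step of the proof can be spelled out as a short calculation. It uses only the triangle inequality, submultiplicativity of the spectral norm, and the three facts quoted in the proof; the scalar identity in the first line is stated for $z\neq 0$, and for $z=0$ both sides equal $t$.

```latex
\begin{aligned}
\int_{0}^{t}\mathrm{e}^{(t-s)z}\,\mathrm{d}s &= \frac{\mathrm{e}^{tz}-1}{z} = t\,\varphi_{1}(tz), \\
\|X(t)\| &\leq \|\mathrm{e}^{tA}\|\,\|X_{0}\|\,\|\mathrm{e}^{tA^{T}}\|
   + \int_{0}^{t}\|\mathrm{e}^{(t-s)A}\|\,\|Q\|\,\|\mathrm{e}^{(t-s)A^{T}}\|\,\mathrm{d}s \\
 &\leq \mathrm{e}^{2t\mu(A)}\|X_{0}\| + \|Q\|\int_{0}^{t}\mathrm{e}^{2(t-s)\mu(A)}\,\mathrm{d}s
  = \mathrm{e}^{2t\mu(A)}\|X_{0}\| + t\,\varphi_{1}\big(2t\mu(A)\big)\|Q\|.
\end{aligned}
```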

From Lemma 4 we immediately get the following corollary.

Corollary 5.

The solution $X(t)$ satisfies

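$$\max_{s\in[0,t]}\|X(s)\|\leq\max\{1,{\rm e}^{2t\mu(A)}\}\|X_{0}\|+t\max\{1,\varphi_{1}\big(2t\mu(A)\big)\}\|Q\|.$$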

3 A Krylov subspace approximation and its structure preserving properties

In this section we propose our projection method. The original problem (5) is projected onto a small dimensional space using a matrix $V_{k}$ with orthonormal columns which contains certain Krylov subspaces. The fact that $V_{k}$ needs to contain these subspaces can be seen from the point of view of Krylov subspace approximation of the matrix exponential (see also the solution formula (9)). This is strongly related to the approach taken by Saad already in [36] for the algebraic Lyapunov equation. Before introducing our projection method, we recall some basic facts about the Krylov subspace approximation of the matrix exponential. This will also give some auxiliary results that are needed later in the convergence analysis.

3.1 Block Krylov subspace approximation of the matrix exponential

The Krylov subspace approximation of products of the form $f(A)b$ has recently been an active topic of research, and we mention the work on classical Krylov subspaces [14, 17, 28, 34], extended Krylov subspaces [28], and rational Krylov subspaces [44, 3].

Block Krylov subspace methods are based on the idea of projecting a high dimensional problem involving a matrix $A\in\mathbb{R}^{n\times n}$ and a block matrix $B\in\mathbb{R}^{n\times\ell}$ onto a lower dimensional subspace, a block Krylov subspace $\mathcal{K}_{k}(A,B)$ , which is defined by

$$\mathcal{K}_{k}(A,B)=\mathrm{span}\{B,AB,A^{2}B,\dots,A^{k-1}B\}.$$

Usually, an orthogonal basis matrix $V_{k}$ for $\mathcal{K}_{k}(A,B)$ is generated using an Arnoldi type iteration, and this matrix is then used for the projections. There exist several Arnoldi type methods to produce an orthogonal basis matrix for $\mathcal{K}_{k}(A,B)$, and in numerical experiments we use the *block Arnoldi iteration* given in [35] which is listed algorithmically as follows.

1. Input: $A\in\mathbb{R}^{n\times n}$, $B\in\mathbb{R}^{n\times\ell}$ and number of iterations $k$.

2. Start. Compute QR decomposition: $B=U_{1}R_{1}$.

3. Iterate. For $j=1,\ldots,k$ compute:

[TABLE]

As usual, the orthogonalisation can be carried out at step 3 in a modified Gram–Schmidt manner and reorthogonalisation can be performed if needed.

This algorithm gives a basis matrix with orthogonal columns, $V_{k}=\begin{bmatrix}U_{1}&\ldots&U_{k}\end{bmatrix}\in\mathbb{R}^{n\times k\ell}$ , for the block Krylov subspace $\mathcal{K}_{k}(A,B)$ and the projected block Hessenberg matrix

$$H_{k}=V_{k}^{T}AV_{k}.\tag{12}$$

This means that the $\ell\times\ell$ $(i,j)$ -block of $H_{k}$ is given by $H_{ij}$ in the above algorithm. Moreover, the following Arnoldi relation holds:

$$AV_{k}=V_{k}H_{k}+U_{k+1}H_{k+1,k}E_{k}^{T},$$

where $E_{k}=\begin{bmatrix}0&\ldots&0&I_{\ell}\end{bmatrix}^{T}\in\mathbb{R}^{k\ell\times\ell}$ .
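The block Arnoldi iteration described above can be sketched in Python as follows. This is a minimal illustration assuming NumPy, full column rank of every block (no deflation or breakdown handling) and a single modified Gram–Schmidt pass. The loop body follows the standard block Arnoldi recursion; the function name `block_arnoldi` and its return convention are assumptions of the sketch and are not taken from [35].

```python
import numpy as np

def block_arnoldi(A, B, k):
    """Return V_k, H_k and the residual pair (U_{k+1}, H_{k+1,k}) for K_k(A, B)."""
    n, l = B.shape
    U1, _ = np.linalg.qr(B)                    # Start: B = U_1 R_1
    blocks = [U1]
    H = np.zeros(((k + 1) * l, k * l))
    for j in range(k):
        W = A @ blocks[j]
        for i in range(j + 1):                 # modified Gram-Schmidt against U_1..U_{j+1}
            Hij = blocks[i].T @ W
            H[i * l:(i + 1) * l, j * l:(j + 1) * l] = Hij
            W = W - blocks[i] @ Hij
        U_next, R = np.linalg.qr(W)            # W = U_{j+2} H_{j+2,j+1} in 1-based indexing
        H[(j + 1) * l:(j + 2) * l, j * l:(j + 1) * l] = R
        blocks.append(U_next)
    V = np.hstack(blocks[:k])                  # n x (k*l) basis matrix
    return V, H[:k * l, :], blocks[k], H[k * l:, (k - 1) * l:]
```

The returned quantities can be used to check the orthonormality of $V_{k}$, the projection identity $H_{k}=V_{k}^{T}AV_{k}$ and the Arnoldi relation numerically.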

If $A$ has its field of values on a line, e.g., is Hermitian or skew-Hermitian, then there exists $\theta\in\mathbb{R}$ such that ${\rm e}\hskip 1.0pt^{\text{i}\hskip 1.0pt\theta}A$ is Hermitian. By (12) this implies that $H_{k}$ is block tridiagonal, the orthogonalisation recursions become three-term recursions, and we get the block Lanczos iteration.

The polynomial approximation property of Krylov subspaces motivates approximating the product of the matrix exponential and a block matrix as

$${\rm e}^{A}B\approx V_{k}{\rm e}^{H_{k}}V_{k}^{T}B=V_{k}{\rm e}^{H_{k}}E_{1}R_{1},\tag{14}$$

where $E_{1}=\begin{bmatrix}I_{\ell}&0&\ldots&0\end{bmatrix}^{T}\in\mathbb{R}^{k\ell\times\ell}$ . For a vector $B$ , the approximation (14) was considered already in [14, 17], and for the case of a block matrix $B$ it has been considered also in [33].

Since the columns of $V_{k}$ are orthonormal, we have $\mathcal{F}(H_{k})=\mathcal{F}(V_{k}^{T}AV_{k})\subset\mathcal{F}(A)$, and from this it follows that $\mu(H_{k})\leq\mu(A)$. Clearly, it also holds that $\|H_{k}\|\leq\|A\|$. Moreover, we have the following bound.

Lemma 6.

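For the approximation (14) it holds that

$$\|{\rm e}^{tA}B-V_{k}{\rm e}^{tH_{k}}V_{k}^{T}B\|\leq 2\max\{1,{\rm e}^{t\mu(A)}\}\frac{\|tA\|^{k}}{k!}\|B\|.$$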

Proof.

The proof goes analogously to the proof of [17, Thm 2.1], where $B$ is a vector. ∎
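The bound of Lemma 6 can be checked numerically on a small symmetric example. The sketch below assumes NumPy, a symmetric test matrix (so that the exponentials can be formed from an eigendecomposition) and a simple reorthogonalised block recursion in place of the full block Arnoldi iteration. The helper names `krylov_basis`, `expm_sym` and `krylov_exp_error`, as well as the test matrix, are hypothetical choices made for this illustration.

```python
import math
import numpy as np

def krylov_basis(A, B, k):
    """Orthonormal basis of K_k(A, B), built block by block."""
    V, _ = np.linalg.qr(B)
    U = V
    for _step in range(k - 1):
        W = A @ U
        for _pass in range(2):                 # two Gram-Schmidt passes against all of V
            W = W - V @ (V.T @ W)
        U, _ = np.linalg.qr(W)
        V = np.hstack([V, U])
    return V

def expm_sym(M, t):
    """exp(t*M) for a symmetric matrix M via its eigendecomposition."""
    lam, Q = np.linalg.eigh((M + M.T) / 2.0)
    return (Q * np.exp(t * lam)) @ Q.T

def krylov_exp_error(A, B, t, k):
    """Spectral-norm error of V_k exp(t H_k) V_k^T B and the bound of Lemma 6."""
    V = krylov_basis(A, B, k)
    H = V.T @ A @ V
    approx = V @ (expm_sym(H, t) @ (V.T @ B))
    err = np.linalg.norm(expm_sym(A, t) @ B - approx, 2)
    mu = np.linalg.eigvalsh((A + A.T) / 2.0)[-1]
    bound = (2.0 * max(1.0, math.exp(t * mu))
             * np.linalg.norm(t * A, 2) ** k / math.factorial(k)
             * np.linalg.norm(B, 2))
    return err, bound

n = 50
A = np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
B = np.random.default_rng(0).standard_normal((n, 2))
results = [krylov_exp_error(A, B, 1.0, k) for k in range(1, 21)]
```

The list `results` holds the pairs (error, bound) for $k=1,\ldots,20$. Since the factor $\|tA\|^{k}/k!$ first grows and then decays, the bound is informative only once $k$ exceeds $\|tA\|$.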

3.2 Rational Krylov subspaces

We also mention the possibility of approximating matrix functions in *rational Krylov subspaces* (see e.g. [15], [18], [44] and [38]). For poles $\bar{s}=\{s_{1},s_{2},\ldots\}$, $s_{i}\in\mathbb{C}$, the rational Krylov subspace can be defined as (see also)

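$$\mathcal{K}_{k}(A,B,\bar{s})=\mathrm{span}\Big\{B,(s_{1}I-A)^{-1}B,\dots,\prod_{\ell=1}^{k-1}(s_{\ell}I-A)^{-1}B\Big\}.$$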

Then, if a matrix $V_{k}$ with orthogonal columns gives a basis for the subspace $\mathcal{K}_{k}(A,B,\bar{s})$ , the matrix exponential can be approximated as (14), where $H_{k}=V_{k}^{T}AV_{k}$ . Especially for sparse matrices, the rational Krylov methods are often more efficient, and as the solution usually converges faster with respect to subspace dimension, the rational alternative is usually more memory efficient. These differences will be illustrated in numerical experiments. However, for simplicity, in our analysis and numerical experiments we will use the block Arnoldi iteration.

3.3 The method

We approximate $X(t)$ in the block Krylov subspace $\mathcal{K}_{k}\big(A,\begin{bmatrix}Z&C\end{bmatrix}\big)$. The fact that the projection onto this subspace results in an accurate approximation can be seen from the form of the exact solution (9) and from the Krylov approximation properties shown in the last subsection. To this end, an orthogonal basis matrix $V_{k}\in\mathbb{R}^{n\times k(p+q)}$ for $\mathcal{K}_{k}\big(A,\begin{bmatrix}Z&C\end{bmatrix}\big)$ is first generated using the block Arnoldi iteration. Then, we carry out the approximation as listed in Algorithm 1. Notice that the method works independently of the rank of $S$.

3.4 Solving the small dimensional system

To solve the small dimensional system (17) we use the modified Davison–Maki method [27]. This method is chosen because of its structure preservation properties which are shown in Subsection 3.5. The method can be described as follows.

As shown in Lemma 1, the solution of the projected system (17) is given by

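$$Y_{k}(t)=W_{k}(t)U_{k}(t)^{-1},\quad\text{where}\quad\begin{bmatrix}U_{k}(t)\\ W_{k}(t)\end{bmatrix}=\exp\left(t\begin{bmatrix}-H_{k}&S_{k}\\ C_{k}C_{k}^{T}&H_{k}^{T}\end{bmatrix}\right)\begin{bmatrix}I_{k}\\ Z_{k}Z_{k}^{T}\end{bmatrix}.\tag{18}$$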

Instead of directly evaluating $Y_{k}(t)$ by (18), which is the idea of the original Davison–Maki method [10], we perform substepping in order to avoid numerical instabilities arising from the inversion of the matrix $U_{k}(t)$ in (18). This is exactly the modified Davison–Maki method, and it is presented in the following pseudocode. We denote $Y_{k}^{j}\approx Y_{k}\big{(}\tfrac{j\cdot t}{m}\big{)}$ .

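1. Input: Hamiltonian matrix $\left[\begin{smallmatrix}-H_{k}&S_{k}\\ C_{k}C_{k}^{T}&H_{k}^{T}\end{smallmatrix}\right]$, $Y_{k}(0)=Z_{k}Z_{k}^{T}$, time $t>0$, substep size $\Delta t=t/m$, $m\in\mathbb{Z}_{+}$.

2. Set: $Y_{k}^{0}=Y_{k}(0)$.

3. Iterate. For $j=0,\ldots,m-1$:

$$\begin{bmatrix}U_{j+1}\\ W_{j+1}\end{bmatrix}=\exp\left(\Delta t\begin{bmatrix}-H_{k}&S_{k}\\ C_{k}C_{k}^{T}&H_{k}^{T}\end{bmatrix}\right)\begin{bmatrix}I_{k}\\ Y_{k}^{j}\end{bmatrix},\quad Y_{k}^{j+1}=W_{j+1}U_{j+1}^{-1}.\tag{19}$$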

For computing the matrix exponential in Step 3, we use the 13th order diagonal Padé approximant, which is implemented in Matlab as the 'expm' command [22].
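The substepping procedure can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes NumPy and SciPy, uses `scipy.linalg.expm` as a stand-in for Matlab's 'expm', and takes generic small matrices `H`, `S`, `Q`, `Y0` in place of $H_{k}$, $S_{k}$, $C_{k}C_{k}^{T}$ and $Z_{k}Z_{k}^{T}$. The blocks of the Hamiltonian are arranged here as $\left[\begin{smallmatrix}-H^{T}&S\\ Q&H\end{smallmatrix}\right]$, which is the arrangement for which $WU^{-1}$ satisfies the equation in the form $\dot{Y}=HY+YH^{T}+Q-YSY$; this placement of the transposes is an assumption of the sketch, and the two arrangements coincide when $H$ is symmetric. The function name `modified_davison_maki` and the final symmetrisation are also additions of the sketch.

```python
import numpy as np
from scipy.linalg import expm

def modified_davison_maki(H, S, Q, Y0, t, m):
    """Approximate Y(t) for Y' = H Y + Y H^T + Q - Y S Y, Y(0) = Y0, using m substeps."""
    k = H.shape[0]
    # Linearised system [U; W]' = ham @ [U; W]; with this block arrangement
    # Y = W U^{-1} solves the DRE in the form given in the docstring.
    ham = np.block([[-H.T, S], [Q, H]])
    step = expm((t / m) * ham)                 # one exponential, reused in every substep
    Y = Y0.copy()
    for _ in range(m):
        UW = step @ np.vstack([np.eye(k), Y])  # restart from [I; Y^j] in each substep
        U, W = UW[:k], UW[k:]
        Y = np.linalg.solve(U.T, W.T).T        # Y^{j+1} = W U^{-1}
        Y = (Y + Y.T) / 2.0                    # remove rounding-level asymmetry
    return Y
```

Restarting from $\begin{bmatrix}I_{k}\\ Y_{k}^{j}\end{bmatrix}$ in every substep keeps the matrix that has to be inverted close to the identity when $\Delta t$ is small, which is the reason for substepping given in the text.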

3.5 Structure preserving properties of the approximation

We next inspect the two properties stated in Theorem 3. We show that the proposed projection method preserves the property of the positivity of the exact flow, and it also preserves the property of monotonicity under the condition that the matrix $V_{k}$ used for the projection stays constant when the initial data for the DRE is changed. Notice that these results are not restricted to polynomial Krylov subspace methods.

Theorem 7.

The numerical approximation given by Algorithm 1 preserves the property of positivity stated in Theorem 3.

Proof.

The projected coefficient matrices $S_{k}$ , $C_{k}C_{k}^{T}$ and the initial value $Z_{k}Z_{k}^{T}$ of the small system (17) are clearly all symmetric nonnegative. Thus the small system (17) is a symmetric DRE. By Theorem 3.1 of [12], an application of a symplectic Runge–Kutta scheme with positive weights $b_{i}$ (see [13] for details) gives as a result a symmetric nonnegative solution. As the $s$ th order diagonal Padé approximant equals the stability function of the $s$ -stage Gauss–Legendre method (see e.g. [26, p. 46]), the Padé approximation in the third substep of the modified Davison–Maki method (Subsection 3.4) corresponds to a symplectic Runge–Kutta method. Thus each substep of the modified Davison–Maki method outputs a symmetric nonnegative matrix and as a result $Y_{k}(t)$ is symmetric nonnegative. Therefore also $X_{k}(t)=V_{k}Y_{k}(t)V_{k}^{T}$ is symmetric nonnegative. ∎

Theorem 8.

The numerical approximation given by Algorithm 1 preserves the property of monotonicity in the following sense. Consider two DREs corresponding to linearizations with the coefficient matrices

[TABLE]

such that

[TABLE]

Suppose both systems are projected using the same orthogonal matrix $V_{k}\in\mathbb{R}^{n\times k}$ , giving as a result small $k$ -dimensional systems of the form (17) for the matrices $Y_{k}(t)$ and $\widetilde{Y}_{k}(t)$ . Then, for the matrices $X_{k}(t)=V_{k}Y_{k}(t)V_{k}^{T}$ and $\widetilde{X}_{k}(t)=V_{k}\widetilde{Y}_{k}(t)V_{k}^{T}$ we have

[TABLE]

Proof.

Consider the projected systems of the form (17) corresponding to $Y_{k}(t)$ and $\widetilde{Y}_{k}(t)$ with the projected coefficient matrices $H_{k}$ , $Q_{k}$ and $S_{k}$ , and $\widetilde{H}_{k}$ , $\widetilde{Q}_{k}$ and $\widetilde{S}_{k}$ , respectively. Consider also the corresponding linearizations of the form (19) with the Hamiltonian matrices

[TABLE]

By the reasoning of the proof of Theorem 7, the projected systems corresponding to $Y_{k}(t)$ and $\widetilde{Y}_{k}(t)$ are symmetric DREs. We see that

[TABLE]

where $J_{k}=\left[\begin{smallmatrix}0&I\\ -I&0\end{smallmatrix}\right]\in\mathbb{R}^{2k\times 2k}$. Thus, from (20) it follows that $\widetilde{\mathcal{H}}_{k}J_{k}\leq\mathcal{H}_{k}J_{k}$. Clearly, also $0\leq Y_{k}(0)\leq\widetilde{Y}_{k}(0)$. By Theorem 6 of [13], an application of a symplectic Runge–Kutta scheme with positive weights $b_{i}$ (see [13] for details) preserves the monotonicity. Thus the Padé approximants of the substeps of the modified Davison–Maki method (Subsection 3.4) preserve the monotonicity. Therefore, $Y_{k}(t)\leq\widetilde{Y}_{k}(t)$ and as a consequence $X_{k}(t)\leq\widetilde{X}_{k}(t)$. ∎

Remark 1.

As the basis matrix $V_{k}$ given by Algorithm 1 is independent of the matrix $S=BB^{T}$ in the DRE (5), where $B$ is the control matrix in the original linear system (2), we see that Algorithm 1 preserves monotonicity under modifications of $B$ . However, if we change the initial value $X_{0}$ or the matrix $Q$ , then forming a new basis $V_{k}$ is generally needed. The fact that $V_{k}$ is independent of $B$ can also be seen by considering similar projection methods for the algebraic Riccati equation, see e.g. [37] and the references therein.

4 A priori error analysis

We first consider the approximation of the DRE without the quadratic term $-XSX$ , i.e., we consider the differential Lyapunov equation. This clarifies the presentation as the derived bounds will be needed when we consider the approximation of the differential Riccati equation. We note, however, that the bounds for the Lyapunov equation are applicable outside of the scope of the optimal control problems, e.g., when considering time integration of an inhomogeneous matrix differential equation.

4.1 Error analysis for the Lyapunov equation

Consider the symmetric Lyapunov differential equation with low rank initial data and constant low rank inhomogeneity,

[TABLE]

where $Z\in\mathbb{R}^{n\times p}$ and $C\in\mathbb{R}^{n\times q}$ , $p,q\ll n$ . Then, the approximation is given by $X_{k}(t)=V_{k}Y_{k}(t)V_{k}^{T}$ , where $Y_{k}(t)$ is a solution of the projected system (17) with $S=0$ . For the error of this approximation we obtain the following bound.

Theorem 9.

Let $A\in\mathbb{R}^{n\times n}$, $Z\in\mathbb{R}^{n\times p}$, $C\in\mathbb{R}^{n\times q}$, and let $X(t)$ be the solution of (21). Let $V_{k}\in\mathbb{R}^{n\times k(p+q)}$ be an orthogonal basis matrix of the block Krylov subspace $\mathcal{K}_{k}\big(A,\begin{bmatrix}Z&C\end{bmatrix}\big)$. Let $Y_{k}(t)$ be the solution of the projected system (17) with $S=0$, and let $X_{k}(t)=V_{k}Y_{k}(t)V_{k}^{T}$. Then,

[TABLE]

Proof.

Using the integral representation of Theorem 2 for both $X(t)$ and $Y_{k}(t)$ , we see that

[TABLE]

where

[TABLE]

and

[TABLE]

Adding and subtracting ${\rm e}^{tA}ZZ^{T}V_{k}{\rm e}^{tH_{k}^{T}}V_{k}^{T}$ on the right hand side of (22) gives

[TABLE]

Using Lemma 6 to bound the norm of ${\rm e}\hskip 1.0pt^{tA}Z-V_{k}{\rm e}\hskip 1.0pt^{tH_{k}}V_{k}^{T}Z$ , and using the fact that $\mu(H_{k})\leq\mu(A)$ and that $\|X_{0}\|=\|ZZ^{T}\|=\|Z\|^{2}$ , gives

[TABLE]

Then, similarly, adding and subtracting the term $\int_{0}^{t}{\rm e}^{(t-s)A}CC^{T}V_{k}{\rm e}^{(t-s)H_{k}^{T}}V_{k}^{T}\,{\rm d}s$ in (23) and applying Lemma 6 shows that

[TABLE]

which shows the claim. ∎

We note that the error bound of Theorem 9, similarly to the bounds given in [17], exhibits a hump before it starts to decrease in case $\|tA\|>1$ . Improved bounds for special cases of the matrix $A$ are possible by using, e.g., results of [23].

4.2 Refined error bounds for the Lyapunov equation

Although Theorem 9 shows the superlinear convergence speed for the approximation of the Lyapunov equation (21), sharper bounds can be obtained, e.g., by using the bounds given in [23]. As an example we consider the following. If $A$ is symmetric negative semi-definite with its spectrum inside the interval $[-4\rho,0]$ , and $V_{k}$ is an orthonormal basis matrix for the block Krylov subspace $\mathcal{K}_{k}(A,B)$ , we have (see [23, Thm. 2]) for the error $\varepsilon_{k}:=\|{\rm e}\hskip 1.0pt^{tA}B-V_{k}{\rm e}\hskip 1.0pt^{tH_{k}}V_{k}^{T}B\|$ the bound

[TABLE]

Using (24) and following the proof of Theorem 9, we get the following bound for the case of a symmetric negative semidefinite $A$ .

Theorem 10.

Let $A\in\mathbb{R}^{n\times n}$, $Z\in\mathbb{R}^{n\times p}$ and $C\in\mathbb{R}^{n\times q}$ define the Lyapunov equation (21). Let $V_{k}\in\mathbb{R}^{n\times k(p+q)}$ be an orthogonal basis matrix of the subspace $\mathcal{K}_{k}\big(A,\begin{bmatrix}Z&C\end{bmatrix}\big)$. Let $Y_{k}(t)$ be the solution of the projected (using $V_{k}$) system (17) with $S=0$, and let $X_{k}(t)=V_{k}Y_{k}(t)V_{k}^{T}$. Then, for the error $\varepsilon_{k}:=\|X(t)-X_{k}(t)\|$ it holds that

[TABLE]

The bound (25) can be illustrated with the following simple numerical example. Let $A\in\mathbb{R}^{400\times 400}$ be the tridiagonal matrix $10^{2}\cdot\mathrm{diag}(1,-2,1)$ , $t=0.05$ , and let $Z\in\mathbb{R}^{400}$ and $C\in\mathbb{R}^{400}$ be random vectors. Figure 1 shows the convergence of the algorithm vs. the a priori bound (25).
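The setup of this example can be reproduced with a short script. The following is a minimal sketch under stated assumptions, not the authors' implementation: the helper names `block_krylov_basis` and `lyapunov_flow_sym` are hypothetical, the reference solution is computed from an eigendecomposition (which is valid here because $A$ is symmetric negative definite), and the projected Lyapunov equation is solved in closed form instead of with the scaling and squaring procedure of Subsection 3.4. It only reports the observed errors and does not evaluate the bound (25).

```python
import numpy as np


def block_krylov_basis(A, B, k):
    """Orthonormal basis of K_k(A, B) built by block Gram-Schmidt (illustrative helper)."""
    V, _ = np.linalg.qr(B)
    blocks = [V]
    for _ in range(k - 1):
        W = A @ blocks[-1]
        for _ in range(2):              # two passes to keep the basis orthonormal
            for Vj in blocks:
                W = W - Vj @ (Vj.T @ W)
        W, _ = np.linalg.qr(W)
        blocks.append(W)
    return np.hstack(blocks)


def lyapunov_flow_sym(A, X0, Q, t):
    """X(t) for X' = A X + X A + Q, X(0) = X0, with A symmetric negative definite."""
    lam, W = np.linalg.eigh(A)
    S = lam[:, None] + lam[None, :]     # lambda_i + lambda_j, strictly negative here
    E = np.exp(t * S)
    # Entrywise solution of the decoupled scalar ODEs in the eigenbasis of A.
    Xh = E * (W.T @ X0 @ W) + (E - 1.0) / S * (W.T @ Q @ W)
    return W @ Xh @ W.T


rng = np.random.default_rng(0)
n, t = 400, 0.05
A = 1e2 * (np.diag(-2.0 * np.ones(n))
           + np.diag(np.ones(n - 1), 1)
           + np.diag(np.ones(n - 1), -1))
Z = rng.standard_normal((n, 1))
C = rng.standard_normal((n, 1))
X_ref = lyapunov_flow_sym(A, Z @ Z.T, C @ C.T, t)

errors = []
for k in (5, 10, 15, 20, 25):
    V = block_krylov_basis(A, np.hstack([Z, C]), k)
    H = V.T @ A @ V
    H = 0.5 * (H + H.T)                 # remove rounding asymmetry
    Zk, Ck = V.T @ Z, V.T @ C
    Yk = lyapunov_flow_sym(H, Zk @ Zk.T, Ck @ Ck.T, t)
    errors.append(np.linalg.norm(X_ref - V @ Yk @ V.T, 2))
    print(f"k = {k:2d}   spectral norm error = {errors[-1]:.3e}")
```

Plotting `errors` on a logarithmic scale against $k$ gives a curve of the kind shown in Figure 1.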

4.3 Error for the approximation of the Riccati equation

Here, we state our main theorem which shows the superlinear convergence property of Algorithm 1 when applied to the DRE (5). Its proof, which is essentially based on Lemma 6 and Grönwall’s lemma, is lengthy and is left to the appendix.

First, however, we state a bound for the norm of the numerical solution $X_{k}(t)$ which will be needed in the proof of the main theorem.

Lemma 11.

Suppose $X_{0}=ZZ^{T}$ , $Q=CC^{T}$ and that $S$ is symmetric nonnegative. Then $X_{k}(t)$ is symmetric nonnegative and satisfies the bound

[TABLE]

Proof.

As $ZZ^{T}$ , $CC^{T}$ and $S$ are symmetric and nonnegative, we see from (17) that so are the orthogonally projected matrices $Z_{k}Z_{k}^{T}$ , $C_{k}C_{k}^{T}$ and $S_{k}$ . Thus, the projected system is a symmetric DRE. Applying Lemma 4 to the projected system, and using the bounds $\mu(H_{k})\leq\mu(A)$ , $\|Q_{k}\|\leq\|Q\|$ and $\|V_{k}V_{k}^{T}X_{0}V_{k}V_{k}^{T}\|\leq\|X_{0}\|$ shows the claim. ∎
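The three facts used in this proof hold for any matrix with orthonormal columns, so they are easy to check numerically. The sketch below is illustrative only: it assumes that the projected matrices are $H_{k}=V_{k}^{T}AV_{k}$ and $Q_{k}=V_{k}^{T}QV_{k}$ , uses a random orthonormal matrix as a stand-in for the Krylov basis, and introduces a hypothetical helper `log_norm` for the logarithmic 2-norm $\mu(\cdot)$ .

```python
import numpy as np


def log_norm(M):
    """Logarithmic 2-norm mu(M): largest eigenvalue of the symmetric part of M."""
    return np.linalg.eigvalsh(0.5 * (M + M.T)).max()


rng = np.random.default_rng(1)
n, r = 60, 8
A = rng.standard_normal((n, n))
C = rng.standard_normal((n, 2))
Q = C @ C.T                                          # symmetric nonnegative

# Stand-in for V_k: any matrix with orthonormal columns (assumption for illustration).
V, _ = np.linalg.qr(rng.standard_normal((n, r)))

H = V.T @ A @ V
Qk = V.T @ Q @ V

tol = 1e-10
mu_ok = log_norm(H) <= log_norm(A) + tol             # mu(H_k) <= mu(A)
norm_ok = np.linalg.norm(Qk, 2) <= np.linalg.norm(Q, 2) + tol
psd_ok = np.linalg.eigvalsh(Qk).min() >= -tol        # Q_k stays nonnegative
print(mu_ok, norm_ok, psd_ok)
```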

From Lemma 11 we immediately get the following bound.

Corollary 12.

The numerical solution $X_{k}(t)$ satisfies

[TABLE]

We are now ready to state an error bound for the DRE. The proof is left to the appendix.

Theorem 13.

Let $A\in\mathbb{R}^{n\times n}$ , $Z\in\mathbb{R}^{n\times p}$ , $C\in\mathbb{R}^{n\times q}$ and $S\in\mathbb{R}^{n\times n}$ define the DRE (5). Let $X_{k}(t)$ be the numerical solution given by Algorithm 1. Then, the following bound holds:

[TABLE]

where

[TABLE]

and

[TABLE]

5 A posteriori error estimation

We next consider a posteriori error estimation for the method, using ideas presented in [8].

Denote the original DRE (5) as

$$\dot{X}(t)=F(X(t)),\qquad F(X):=AX+XA^{T}+Q-XSX,\qquad X(0)=X_{0}.$$

Using the residual matrix $R_{k}(t)=F(X_{k}(t))-\dot{X}_{k}(t)$ we derive computable error estimates. These derivations are based on the following lemma.

Lemma 14.

The error $\mathcal{E}_{k}(t):=X(t)-X_{k}(t)$ satisfies the equation

[TABLE]

where

[TABLE]

Proof.

We see that the error $\mathcal{E}_{k}(t)$ satisfies the ODE

[TABLE]

with the initial value $\mathcal{E}_{k}(0)=0$ . Applying the variation-of-constants formula to (29) gives (27).

Next we show the representation (28). Since

[TABLE]

and

[TABLE]

we see that

[TABLE]

since $V_{k}Q_{k}V_{k}^{T}=V_{k}V_{k}^{T}CC^{T}V_{k}V_{k}^{T}=CC^{T}=Q$ as $C\in\mathrm{R}(V_{k})$ . Substituting the Arnoldi relation $AV_{k}-V_{k}H_{k}=U_{k+1}H_{k+1,k}E_{k}^{T}$ into (30) gives the representation (28). ∎
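The Arnoldi relation used in the last step can be checked numerically. The following sketch is an assumption-laden illustration and not Algorithm 1 itself: `block_arnoldi` is a hypothetical helper that uses one pass of block Gram-Schmidt with QR factorizations, and the test matrices are random. It returns $V_{k}$ , $H_{k}$ , the next basis block $U_{k+1}$ and the subdiagonal block $H_{k+1,k}$ , and forms $E_{k}$ from the last block of columns of the identity matrix.

```python
import numpy as np


def block_arnoldi(A, B, k):
    """Block Arnoldi sketch: returns V_k, H_k, U_{k+1}, H_{k+1,k} (illustrative helper)."""
    n, p = B.shape
    U, _ = np.linalg.qr(B)
    blocks = [U]
    H = np.zeros(((k + 1) * p, k * p))
    for j in range(k):
        W = A @ blocks[j]
        for i in range(j + 1):                       # orthogonalize against earlier blocks
            Hij = blocks[i].T @ W
            H[i * p:(i + 1) * p, j * p:(j + 1) * p] = Hij
            W = W - blocks[i] @ Hij
        U_next, R = np.linalg.qr(W)                  # R is the new subdiagonal block
        H[(j + 1) * p:(j + 2) * p, j * p:(j + 1) * p] = R
        blocks.append(U_next)
    V = np.hstack(blocks[:k])
    return V, H[:k * p, :], blocks[k], H[k * p:, (k - 1) * p:]


rng = np.random.default_rng(2)
n, p, k = 80, 2, 5
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, p))

V, Hk, U_next, H_sub = block_arnoldi(A, B, k)
E = np.zeros((k * p, p))
E[-p:, :] = np.eye(p)                                # last block of columns of I

residual = np.linalg.norm(A @ V - V @ Hk - U_next @ H_sub @ E.T)
orthogonality = np.linalg.norm(V.T @ U_next)         # U_{k+1}^T V_k should vanish
projection_gap = np.linalg.norm(Hk - V.T @ A @ V)    # H_k = V_k^T A V_k
print(residual, orthogonality, projection_gap)
```

All three printed quantities should be at the level of rounding errors, which reflects that the defect $AV_{k}-V_{k}H_{k}$ is supported only on the last block column.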

To derive a heuristic a posteriori estimate, we neglect the second term in equation (27) and approximate the first integral by leaving out the exponentials. This is especially meaningful when $A$ has its numerical range in the left half-plane, since then the norms of the exponentials are less than or equal to 1. This leads to the approximation

[TABLE]

From a careful inspection we see that $U_{k+1}^{T}V_{k}=0$ implies

[TABLE]

The integral $\int_{0}^{t}Y_{k}(s)\,\hskip 1.0pt{\rm d}\hskip 0.5pts$ can be estimated by simply summing

[TABLE]

where $\Delta t=t/m$ and where the intermediate values $Y_{k}(\ell\Delta t)$ can be obtained from the scaling and squaring phase of Algorithm 1 (Subsection 3.4). From (31) and (32) we arrive at a computationally efficient a posteriori estimate

[TABLE]

To illustrate the efficiency of this estimate consider the following toy example. Let $A\in\mathbb{R}^{400\times 400}$ be the tridiagonal matrix $10^{2}\cdot\mathrm{diag}(1,-2,1)$ , $t=0.1$ , and let $Z\in\mathbb{R}^{400}$ and $C\in\mathbb{R}^{400}$ be random vectors. Figure 2 shows the error $\|X(t)-X_{k}(t)\|$ vs. the estimate (33).

We note that by using the error representation (27) and the residual $R_{k}(t)$ given in (28) it is possible to derive corrected schemes, similar to those derived for the matrix exponential in [8] and [34].

6 Rank cut

We next describe a rank cut strategy used in our numerical experiments. When using Algorithm 1, if $\mathrm{rank}\begin{bmatrix}X_{0}&Q\end{bmatrix}=m$ , then after $k$ iterations the numerical solution $X_{k}(t)=V_{k}Y_{k}(t)V_{k}^{T}$ has rank at most $km$ and memory for $\mathcal{O}(kmn)$ entries is needed. However, the numerical rank of $X_{k}(t)$ may be considerably smaller already for small threshold values. Therefore, when performing multiple time stepping it is reasonable to cut the rank after each step. This can be done, for example, as follows. Let $X\in\mathbb{R}^{n\times n}$ , let $\sigma_{1}\geq\sigma_{2}\geq\ldots\geq\sigma_{n}$ denote the singular values of $X$ , and let $u_{i},v_{i}$ be the corresponding left and right singular vectors. Consider the projection

[TABLE]

This projection can be applied efficiently to the numerical solution $X_{k}(t)$ given by Algorithm 1, since

[TABLE]
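The practical point is that the rank cut only requires a decomposition of the small matrix $Y_{k}(t)$ . The sketch below illustrates this under an explicit assumption: $P_{\varepsilon}$ is taken to keep exactly the singular triplets with $\sigma_{i}>\varepsilon$ . The function names `rank_cut_factored` and `rank_cut_dense` and the test data are hypothetical. Since $Y$ is symmetric nonnegative and $V$ has orthonormal columns, the nonzero singular values of $VYV^{T}$ coincide with the eigenvalues of $Y$ .

```python
import numpy as np


def rank_cut_factored(V, Y, eps):
    """Rank cut of X = V Y V^T using only the small symmetric nonnegative matrix Y.

    Returns a low-rank factor L with L @ L.T equal to the truncated matrix.
    """
    lam, W = np.linalg.eigh(Y)           # eigenvalues of Y = singular values of X
    keep = lam > eps
    return V @ (W[:, keep] * np.sqrt(lam[keep]))


def rank_cut_dense(X, eps):
    """Reference: the same cut via an SVD of the full n-by-n matrix."""
    U, s, Vt = np.linalg.svd(X)
    keep = s > eps
    return (U[:, keep] * s[keep]) @ Vt[keep, :]


rng = np.random.default_rng(5)
n, r = 50, 6
V, _ = np.linalg.qr(rng.standard_normal((n, r)))
G, _ = np.linalg.qr(rng.standard_normal((r, r)))
d = np.array([5.0, 2.0, 1.0, 1e-3, 1e-5, 1e-7])      # prescribed singular values
Y = G @ np.diag(d) @ G.T
Y = 0.5 * (Y + Y.T)
X = V @ Y @ V.T

eps = 1e-4
L = rank_cut_factored(V, Y, eps)
X_cut = rank_cut_dense(X, eps)
difference = np.linalg.norm(L @ L.T - X_cut)
cut_error = np.linalg.norm(X - L @ L.T, 2)           # largest discarded singular value
print(L.shape, difference, cut_error)
```

Storing the factor `L` instead of the truncated matrix keeps the memory requirement proportional to $n$ times the retained rank.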

We have the following bound for the effect of the rank cut of the initial value.

Theorem 15.

Suppose $\widetilde{X}_{0}=P_{\varepsilon}(X_{0})$ , and let $X(t)$ and $\widetilde{X}(t)$ be solutions of the system (5) for initial values $X_{0}$ and $\widetilde{X}_{0}$ , respectively. Then,

[TABLE]

Proof.

By Thm. 2,

[TABLE]

Since $\widetilde{X}_{0}=P_{\varepsilon}(X_{0})$ , we see that $\widetilde{X}_{0}\leq X_{0}$ . Then, from Thm. 3 it follows that $\widetilde{X}(t)\leq X(t)$ . Furthermore, by Lemma 17, $\widetilde{X}(s)S\widetilde{X}(s)\leq X(s)SX(s)$ for all $s\geq 0$ . Therefore, the second integral on the right hand side of (34) is a negative semidefinite matrix and thus $\|X(t)-\widetilde{X}(t)\|\leq\|{\rm e}\hskip 1.0pt^{tA}(X_{0}-\widetilde{X}_{0}){\rm e}\hskip 1.0pt^{tA^{T}}\|$ , and the claim follows. ∎

A straightforward corollary of Theorem 15 is an estimate for the total error arising solely from the rank cutting.

Corollary 16.

Consider a time stepping scheme where a rank cut of size $\varepsilon_{\ell}$ is made after every step $\ell$ , and where the substeps are otherwise exact solutions of (5). That is, the result of step $\ell$ is $X_{\ell}=P_{\varepsilon_{\ell}}(X(h))$ , where $X(h)$ is the solution of (5) with the initial value $X_{\ell-1}$ . Carrying out this procedure for $n$ steps, it follows from Theorem 15 and Lady Windermere’s fan (see [20, Ch. I.7]) that the global error satisfies

[TABLE]

7 Numerical experiments: optimal cooling problem

As a numerical example we consider an optimal cooling problem described in [6] (see also Example 2 in [42]). The underlying linear system is of the form

[TABLE]

where the coefficient matrices arise from a finite element discretization of the cross section of a rail. A discretization of dimension $n$ gives coefficient matrices $A,M\in\mathbb{R}^{n\times n}$ , $B\in\mathbb{R}^{n\times 7}$ and $C\in\mathbb{R}^{n\times 6}$ , where $A$ is symmetric. This leads to a symmetric DRE of the form (5) with the coefficient matrices $\widetilde{A}=M^{-1}A$ , $Q=C^{T}C$ and $S=M^{-1}B(M^{-1}B)^{T}$ . We take zero initial value for the DRE. The mass matrix $M$ is sparse so the products using the matrix $M^{-1}A$ are cheap. We note that by a symmetric decomposition of the mass matrix, $M=L^{T}L$ , the system could also be written as a system using a symmetric coefficient matrix $L^{-T}AL^{-1}$ for the scaled variable $LXL^{-1}$ .
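Products with $\widetilde{A}=M^{-1}A$ can be formed without ever building $M^{-1}A$ explicitly, by factorizing the sparse mass matrix once and reusing the factors. The following sketch shows one way to do this with SciPy's sparse LU factorization. The matrices are small tridiagonal stand-ins chosen for illustration and are not the rail discretization, and the function name `apply_A_tilde` is hypothetical.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

n = 200
off = np.ones(n - 1)

# Hypothetical stand-ins for the finite element matrices (not the rail data):
# a mass-like symmetric positive definite M and a stiffness-like symmetric A.
M = sp.diags([off, 4.0 * np.ones(n), off], [-1, 0, 1], format="csc") / 6.0
A = sp.diags([off, -2.0 * np.ones(n), off], [-1, 0, 1], format="csc")

M_lu = splu(M)                     # sparse LU factorization, computed once


def apply_A_tilde(W):
    """Return (M^{-1} A) W using one sparse product and one sparse solve."""
    return M_lu.solve(A @ W)


rng = np.random.default_rng(4)
W = rng.standard_normal((n, 3))    # a block of vectors, as in a block Krylov step
out = apply_A_tilde(W)
print(out.shape)
```

Inside a block Krylov iteration, `apply_A_tilde` would replace the dense product with $A$ , so each step costs one sparse multiplication and one pair of triangular solves.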

7.1 Case $n=1357$

Figure 3 shows the convergence of Algorithm 1 and an a posteriori error estimate given by (33), when $T=10$ . We compute the spectral norm error $\|X(T)-X_{k}(T)\|$ for different Krylov subspace dimensions $k$ . For the scaling and squaring part (Subsection 3.4), we set the parameter $m=10$ . Table 1 shows the CPU time needed for the block Krylov process and for the scaling and squaring part of Algorithm 1, for four different Krylov subspace sizes.

Figure 4 shows the convergence of a single step for $T=20$ , when we apply the block orthogonalisation procedure of Subsection 3.1 on the Krylov subspace (11) and on a rational Krylov subspace (16) spanned by $A$ . For the rational Krylov subspace we set all nodes $s_{i}$ equal to 1. Here subspace dimension denotes the number of columns of the basis matrix $V_{k}$ . For comparison, we also consider the best low rank approximation of the solution $X(T)$ obtained from its singular value decomposition (SVD) for different ranks (denoted basis dimension in the figure). We see that the rational approximation needs a considerably smaller subspace for a given error than the polynomial approximation.

Next, we apply Algorithm 1 for $N=10$ subsequent steps. We set a tolerance $\varepsilon$ for the Krylov error, and use the a posteriori estimate (33) as a stopping criterion for the iteration. Also, after each step we cut the rank using the projector $P_{\varepsilon}$ . Figure 5 depicts the final errors at $T=10$ for 4 different values of $\varepsilon$ . As we see, the final errors are not far from the tolerances $\varepsilon$ used for the substeps. Figure 6 depicts the growth of the rank of the numerical solution for different tolerances $\varepsilon$ . We see that the substepping approach requires less memory for a given error tolerance than a single run using Algorithm 1. This is depicted in Table 2.

As a last experiment for the case $n=1357$ , we carry out a time integration up to $T=4500$ using $900$ substeps. Figure 7 shows the relative spectral norm error along the time integration, i.e., the error $\|\widetilde{X}(t)-X(t)\|/\|X(t)\|$ , where $\widetilde{X}(t)$ denotes the numerical solution. We use a Krylov subspace dimension $k=32$ for the first substep and $k=20$ for the rest. After each time step, we cut the rank of the numerical solution to $40$ using SVD. By these choices of Krylov subspace sizes we ensure that the error arising from the rank cut dominates at each time step.

Next, in order to use the estimate (35), we approximate $\mu(A)\approx 0$ and assume that the error arising from the rank cut dominates the total error. We then approximate the total error using (35) as

[TABLE]

Figure 8 shows the error arising from the best 2-norm approximation after each step, i.e., the singular value $\sigma_{41}$ , and the estimate (37). We see that the error in the end is not far from $900\cdot\sigma_{41}^{(900)}$ , the number of time steps times the largest rank cut.

7.2 Case $n=5177$

Next we consider a finite element discretization with $n=5177$ . Figure 9 shows the convergence of Algorithm 1 and an a posteriori error estimate given by (33), when $T=5$ . For the scaling and squaring part we set the parameter $m=10$ .

We next carry out a time integration up to $T=2000$ using $1000$ substeps. We estimate the total error without access to a reference solution using the estimate (37). As above, we use a Krylov subspace dimension $k=32$ for the first step and $k=20$ for the rest of the steps, and cut the rank to 40 after each step using SVD. We see from Figure 10 that the a posteriori error estimate for Algorithm 1 is negligible compared to the error arising from the best 2-norm approximation at each step. Figure 10 also shows the estimate (37). We see that the estimate is of the same order ( $\approx$ 10 times bigger) as in the $n=1357$ case.

8 Conclusions and Outlook

We have proposed a Krylov subspace approximation method for large scale differential Riccati equations. We have proven that the method is structure preserving in the sense that it preserves two important properties of the exact flow, namely positivity and, under certain conditions, monotonicity. We have also provided an a priori error analysis of the Krylov subspace approximation which shows superlinear convergence. This behavior was also verified in numerical experiments. In addition, an a posteriori error analysis was carried out and the proposed estimate was shown to be accurate in numerical examples. In order to limit the memory consumption, we considered limiting the rank of the numerical solution in multiple time stepping. To avoid an excessively large approximation basis $V_{k}$ , further study of rational Krylov subspaces is needed. Their benefits were illustrated in numerical experiments.

We would like to point out that the presented block Krylov subspace method can be extended to the unsymmetric differential Riccati equation. A possible extension could also be the nonautonomous case, i.e., the case in which the coefficient matrices $Q$ , $S$ and $A$ are time dependent. In this case an essential tool would be the so-called Magnus expansion (see, e.g., [2]), which gives the fundamental solution of the linear system corresponding to the time dependent coefficient matrix $A$ .

Acknowledgments

The authors thank Valeria Simoncini for pointing out relevant literature related to the algebraic Riccati equation and Tony Stillfjord for several helpful comments on a draft of the paper.

Appendix A Auxiliary Lemmas and the proof of Thm. 13

We first state two lemmas needed in Thm. 15 and Thm. 13, respectively.

Lemma 17.

Let $A,\widetilde{A},B$ be symmetric positive semidefinite matrices such that $A\leq\widetilde{A}$ . Then, also

[TABLE]

Proof.

Assume first that $B$ is positive definite. Then, (see [25, p. 431])

[TABLE]

and therefore also (see [25, p. 438])

[TABLE]

Then, we see that

[TABLE]

Then, consider the matrix $B_{\varepsilon}=B+\varepsilon I$ , $\varepsilon>0$ , where $B$ is positive semidefinite. Clearly, $B_{\varepsilon}$ is positive definite for all $\varepsilon>0$ . Therefore

[TABLE]

Taking the limit $\varepsilon\rightarrow 0$ and using the fact that

[TABLE]

is a continuous function of $\varepsilon$ , the claim follows. ∎

Lemma 18.

Let $A\in\mathbb{R}^{n\times n}$ , $B\in\mathbb{R}^{n\times\ell}$ , let $V_{k}$ be a matrix with orthonormal columns such that $\mathcal{K}_{k}(A,B)\subset\mathrm{R}(V_{k})$ and let $H_{k}=V_{k}^{T}AV_{k}$ . Then, for all $t,s>0$ it holds that

[TABLE]

Proof.

Using the polynomial approximation property of the Krylov approximation (see [34, Lemma 3.1]), we see that

[TABLE]

Therefore,

[TABLE]

Using the bounds $\|{\rm e}\hskip 1.0pt^{tA}\|\leq{\rm e}\hskip 1.0pt^{t\mu(A)}$ , $\mu(H_{k})\leq\mu(A)$ and (see [17, Lemma A.2])

[TABLE]

on the four terms on the RHS of (39), the claim follows. ∎
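The first bound used above, $\|{\rm e}^{tA}\|\leq{\rm e}^{t\mu(A)}$ , can be observed numerically. The sketch below is a sanity check only: the test matrix is random, and `log_norm` is a hypothetical helper that evaluates the logarithmic 2-norm as the largest eigenvalue of the symmetric part of $A$ .

```python
import numpy as np
from scipy.linalg import expm


def log_norm(M):
    """Logarithmic 2-norm mu(M): largest eigenvalue of the symmetric part of M."""
    return np.linalg.eigvalsh(0.5 * (M + M.T)).max()


rng = np.random.default_rng(3)
A = rng.standard_normal((30, 30)) - 8.0 * np.eye(30)   # nonsymmetric test matrix
mu = log_norm(A)

ts = np.linspace(0.0, 2.0, 9)
lhs = [np.linalg.norm(expm(t * A), 2) for t in ts]     # ||exp(tA)||
rhs = [np.exp(t * mu) for t in ts]                     # exp(t mu(A))

for t, l, r in zip(ts, lhs, rhs):
    print(f"t = {t:4.2f}   ||exp(tA)|| = {l:.3e}   exp(t mu) = {r:.3e}")
```

When $\mu(A)\leq 0$ , the right column is at most 1 for all $t\geq 0$ , which is the situation referred to in Section 5 when the exponentials are dropped from the error representation.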

Lemma 19.

Let $X(s)$ be the solution of the Riccati differential equation (5) at time $s$ , $0\leq s\leq t$ , and let $V_{k}$ be a matrix with orthonormal columns such that $\mathcal{K}_{k}(A,\begin{bmatrix}C&Z\end{bmatrix})\subset\mathrm{R}(V_{k})$ . Denote $H_{k}=V_{k}^{T}AV_{k}$ . Then, the following bound holds:

[TABLE]

where

[TABLE]

Proof.

Using the integral representation (9) for $X(s)$ we may write

[TABLE]

where

[TABLE]

and

[TABLE]

By using Lemma 18, we obtain for the expressions inside the square brackets on right hand side of (41) the bounds

[TABLE]

and

[TABLE]

Thus,

[TABLE]

From (42) we see that

[TABLE]

Next we bound the first factor in the integrand of (45). We substitute the integral representation (9) for $X(u)$ to find that

[TABLE]

As above when bounding $\|c_{1,k}(t,s)\|$ , we use Lemma 18 on the expressions inside the square brackets on right hand side of (46), to get the inequality

[TABLE]

Applying Grönwall’s lemma on (47), we find that

[TABLE]

Substituting (48) into (45), we get

[TABLE]

The bounds (44) and (49) together show the claim. ∎

Using Lemmas 18 and 19 we are now ready to prove Theorem 13.

Proof of Theorem 13

Proof.

From the integral representation (9) for $X(t)$ and for the solution $Y_{k}(t)$ of the small dimensional system (17), we see that

[TABLE]

where

[TABLE]

and

[TABLE]

Theorem 9 shows that $F_{1,k}(t)$ is bounded as

[TABLE]

We add and subtract the term

[TABLE]

to (51) to obtain

[TABLE]

where

[TABLE]

From (53) and (54) we see that

[TABLE]

where

[TABLE]

The claim now follows from (50), (52), (55), Lemma 18, Grönwall’s lemma, Corollary 5 and Corollary 12, applied as a sequence of substitutions. ∎

Bibliography

[1] H. Abou-Kandil, G. Freiling, V. Ionescu, and G. Jank, Matrix Riccati Equations in Control and Systems Theory, Birkhäuser, Basel, 2003.
[2] P. Bader, S. Blanes and E. Ponsoda, Structure preserving integrators for solving (non-)linear quadratic optimal control problems with applications to describe the flight of a quadrotor, J. Comput. Appl. Math. 262 (2014), pp. 223–233.
[3] B. Beckermann and L. Reichel, Error estimates and evaluation of matrix functions via the Faber transform, SIAM J. Numer. Anal. 47.5 (2009), pp. 3849–3883.
[4] P. Benner and H. Mena, Numerical solution of the infinite-dimensional LQR problem and the associated Riccati differential equations, J. Numer. Math., De Gruyter (accepted), (2016), DOI: 10.1515/jnma-2016-1039.
[5] P. Benner and H. Mena, Rosenbrock methods for solving Riccati differential equations, IEEE Trans. Autom. Control 58.11 (2013), pp. 2950–2956.
[6] P. Benner and J. Saak, A semi-discretized heat transfer model for optimal cooling of steel profiles, in: Dimension Reduction of Large-Scale Systems, vol. 45, Lecture Notes in Computational Science and Engineering, P. Benner, V. Mehrmann, and D. Sorensen, Eds., Springer, Berlin/Heidelberg, 2005, pp. 353–356.
[7] D.A. Bini, B. Iannazzo and B. Meini, Numerical solution of algebraic Riccati equations, SIAM, Philadelphia, 2011.
[8] M.A. Botchev, V. Grimm and M. Hochbruck, Residual, restarting, and Richardson iteration for the matrix exponential, SIAM J. Sci. Comput. 35.3 (2013), pp. A1376–A1397.
