Analysis of Krylov Subspace Approximation to Large Scale Differential Riccati Equations
Antti Koskela, Hermann Mena

TL;DR
This paper analyzes a Krylov subspace method for large-scale symmetric differential Riccati equations, demonstrating structure preservation, superlinear convergence, and providing error estimates supported by numerical experiments.
Contribution
It introduces a structure-preserving Krylov subspace approximation for large-scale Riccati equations with proven superlinear convergence and practical error estimation methods.
Findings
The method preserves positivity and monotonicity of the Riccati flow.
Superlinear convergence of the approximation is theoretically established.
Numerical experiments confirm the effectiveness and accuracy of the approach.
Abstract
We consider a Krylov subspace approximation method for the symmetric differential Riccati equation , . The method we consider is based on projecting the large scale equation onto a Krylov subspace spanned by the matrix and the low rank factors of and . We prove that the method is structure preserving in the sense that it preserves two important properties of the exact flow, namely the positivity of the exact flow, and also the property of monotonicity. We also provide a theoretical a priori error analysis which shows a superlinear convergence of the method. This behavior is illustrated in the numerical experiments. Moreover, we derive an efficient a posteriori error estimate as well as discuss multiple time stepping combined with a cut of the rank of the numerical solution.
| Krylov iteration | solving small dimensional system | |
| 0.046 | 0.037 | |
| 0.16 | 0.091 | |
| 0.31 | 0.20 | |
| 0.49 | 0.44 |
| time stepping | single step iteration | |
|---|---|---|
| 60 | 112 | |
| 76 | 147 | |
| 112 | 175 | |
| 160 | 203 |
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Analysis of Krylov subspace approximation to
Large Scale Differential Riccati Equations
Antti Koskela Department of Mathematics and Statisctics, University of Helsinki, [email protected].
Hermann Mena Department of Mathematics, Yachay Tech, Urcuquí, Ecuador, Department of Mathematics, University of Innsbruck, Innsbruck, Austria, [email protected].
Abstract
We consider a Krylov subspace approximation method for the symmetric differential Riccati equation , . The method we consider is based on projecting the large scale equation onto a Krylov subspace spanned by the matrix and the low rank factors of and . We prove that the method is structure preserving in the sense that it preserves two important properties of the exact flow, namely the positivity of the exact flow, and also the property of monotonicity. We also provide a theoretical a priori error analysis which shows a superlinear convergence of the method. This behavior is illustrated in the numerical experiments. Moreover, we derive an efficient a posteriori error estimate as well as discuss multiple time stepping combined with a cut of the rank of the numerical solution.
keywords:
Differential Riccati equations, LQR optimal control problems, large scale ordinary differential equations, Krylov subspace methods, matrix exponential, exponential integrators, model order reduction, low rank approximation.
AMS:
65F10, 65F60, 65L20, 65M22, 93A15, 93C05
1 Introduction
Large scale differential Riccati equations (DREs) arise in the numerical treatment of optimal control problems governed by partial differential equations. This is the case in particular when solving a linear quadratic regulator problem (LQR), a widely studied problem in control theory. We shortly describe the finite dimensional LQR problem. For more details, we refer to [1, 9]. The differential Riccati equation arises in the finite horizon case, i.e., when a finite time integral cost functional is considered. Denoting the time interval , the functional has then the quadratic form
[TABLE]
where , () and (). The coefficient matrix of the penalizing term is symmetric, nonnegative and has a low rank. The functional (1) is constrained by the system of differential equations
[TABLE]
where the matrix is sparse and . The number of columns of corresponds to the number of controls and the matrix represents an observation matrix. Under suitable conditions [1, 9], the control minimizing the functional (1) is given by
[TABLE]
is the unique solution of
[TABLE]
and satisfies
[TABLE]
As a result, the central computational problem becomes that of solving the final value problem (4) which, with a careful change of variables, can be written as a initial value problem.
We consider a Krylov subspace approximation method for large scale differential Riccati equations of the form (4). A similar projection method for DREs has been recently proposed in [19]. Our approach differs from that of [19] in the fact that the initial value matrix of (4) is contained in the Krylov subspace. This allows multiple time stepping. Our approach is also related to projection techniques considered for large scale algebraic Riccati equations [32, 38].
Essentially, the method we consider is based on projecting the matrices and on an appropriate Krylov subspace, namely on the *block Krylov subspace *spanned by and the low rank factors of and . The projected small dimensional system is then solved using existing linearization techniques. We show that when using a Padé approximant to solve the linearized small dimensional system, the total approximation will be structure preserving in a sense that the property of the positivity is preserved. Also the property of monotonicity is preserved under certain conditions. Our Krylov subspace approach is also strongly related to Krylov subspace techniques used for approximation of the product of an matrix function and a vector, , and to exponential integrators [24]. For an introduction to matrix functions we refer to the monograph [21]. The effectiveness of these techniques comes from the fact that generating Krylov subspaces is essentially based on operations of the form , which are cheap for sparse .
The linearization approach for DREs is a well-known method. This allows an efficient integration for dense problems, see e.g. [31]. Another approach, the so called Davison–Maki method [10], uses the fundamental solution of the linearized system. A modified variant, avoiding some numerical instabilities, is proposed in [27]. However, the application of these methods for large scale problems is impossible due to the high dimensionality of the linearized differential equation.
The problem of solving large scale DREs has received recently considerable attention. In [5, 4] the authors proposed efficient BDF and Rosenbrock methods for solving DREs capable of exploiting several of the above described properties: sparsity of , low rank structure of , and , and the symmetry of the solution. However, several difficulties arise when approximating the optimal control (3) in the large scale setting. One difficulty is to evaluate the state equation and Riccati equation in the same mesh. In [30] a refined ADI integration method is proposed which addresses the high storage requirements of large scale DRE integrators. In recent studies an efficient splitting method [42] and adaptive high-order splitting schemes [41] for large scale DREs have been proposed.
The paper is organized as follows. In Section 2 we describe some preliminaries. Then, in Section 3, the structure preserving method is proposed. In Section 4, the error analysis first for the differential Lyapunov equation (a simplified version of the DRE), and then for the DRE is presented. In Section 5 a posteriori error estimation is described. In Section 6 the rank cut and multiple time stepping are discussed. Numerical examples and conclusions of Sections 7 and 8 conclude the article.
Notation and definitions
Throughout the article will denote the Euclidean norm, or its induced matrix norm, i.e., the spectral norm. By we denote the column space of a matrix . We say that a matrix is nonnegative if it is symmetric positive semidefinite, and write . For symmetric matrices and we write if .
We will repeatedly use the notion of the *logarithmic norm *of a matrix . It can be defined via the field of values , which is defined as
[TABLE]
Then, the logarithmic norm of is defined by
[TABLE]
We will also repeatedly use the exponential-like function defined by
[TABLE]
2 Preliminaries
From now on we consider the time invariant symmetric differential Riccati equation (DRE) written in the form
[TABLE]
where and , , . Specifically, we focus on the low rank positive semidefinite case, where
[TABLE]
for some and , , and is positive semidefinite. Notice that we changed here from to to (a common choice the numerical analysis literature [12, 13]) and from now on is tall and skinny instead of short and fat as in (4). Although arises from the low rank matrix in (4), we do not place any restriction on the rank of .
2.1 Linearization
We recall a fact that will be needed later on (see e.g. [1, Thm. 3.1.1.]).
Lemma 1** (Associated linear system).**
The DRE (5) is equivalent to solving the linear system of differential equations
[TABLE]
where . If the solution of (5) exists on the interval , then the solution of (7) exists, is invertible on , and
[TABLE]
Notice also that the matrix is Hamiltonian, i.e., it holds that
[TABLE]
This linearization approach is a standard method for solving finite dimensional DREs, and leads to efficient integration methods for dense problems, see e.g. [10].
2.2 Integral representation of the exact solution
For the exact solution of (5) we have the following integral representation (see also [29, Thm. 8]).
Theorem 2** (Exact solution of the DRE).**
The exact solution of the DRE (5) is given by
[TABLE]
Proof.
The proof can be carried out by elementary differentiation. ∎
2.3 Positivity and monotonicity of the exact flow
We recall two important properties of the symmetric DRE, namely the positivity of the exact solution (see e.g. [12, Prop. 1.1]) and the monotonicity of the solution with relative to the initial data (see e.g. [13, Thm. 2]). By these we mean the following.
Theorem 3** (Positivity and monotonicity of the solution).**
For the solution of the symmetric DRE (5) it holds:
(Positivity) is symmetric positive semidefinite and it exists for all . 2. 2.
(Monotonicity) Consider two symmetric DREs of the (5) corresponding to the linearized systems of the form (7) with the coefficient matrices
[TABLE]
and let be the skew-symmetric matrix (8). Then, if , and if , then for every : .
We will later show that our proposed numerical method preserves the properties of Theorem 3.
2.4 Bound for the solution
Using the positivity property of (Thm. 3) we obtain the following bound for the norm of the solution. This will be repeatedly needed in the analysis of the proposed method.
Lemma 4** (Bound for the exact solution).**
For the solution of (5) it holds
[TABLE]
Proof.
Since , and are all symmetric positive semidefinite, we see that the first two terms on the right hand side of (9) are symmetric positive semidefinite and the third term is symmetric negative semidefinite. Moreover, since is symmetric positive semidefinite by Theorem 3, and since for every symmetric positive definite matrix it holds that , we see that
[TABLE]
Using the well-known bound (see e.g. [43, p. 138]), the fact that and that , the claim follows. ∎
From Lemma 4 we immediately get the following corollary.
Corollary 5**.**
The solution satisfies
[TABLE]
3 A Krylov subspace approximation and its structure preserving properties
In this section we propose our projection method. The original problem (5) is projected to small dimensional space using a matrix with orthonormal columns which contains certain Krylov subspaces. The fact that needs to contain these subspaces can be seen from the point of view of Krylov subspace approximation of the matrix exponential (see also the solution formula (9)). This is strongly related to the approach taken by Saad already in [36] for the algebraic Lyapunov equation. Before introducing our projection method, we recall some basic facts about the Krylov subspace approximation of the matrix exponential. This will also give some auxiliary results that are needed later in the convergence analysis.
3.1 Block Krylov subspace approximation of the matrix exponential
The Krylov subspace approximation of products of the form has recently been an active topic of research, and we mention the work on classical Krylov subspaces [14, 17, 28, 34], extended Krylov subspaces [28], and rational Krylov subspaces [44, 3].
Block Krylov subspace methods are based on the idea of projecting a high dimensional problem involving a matrix and a block matrix onto a lower dimensional subspace, a block Krylov subspace , which is defined by
[TABLE]
Usually, an orthogonal basis matrix for is generated using an Arnoldi type iteration, and this matrix is then used for the projections. There exist several Arnoldi type methods to produce an orthogonal basis matrix for , and in numerical experiments we use the *block Arnoldi iteration *given in [35] which is listed algorithmically as follows.
Input: , and number of iterations . 2. 2.
Start. Compute QR decomposition: . 3. 3.
Iterate. *for compute:
[TABLE]
As usual, the orthogonalisation can be carried out at step 3 in a modified Gram–Schmidt manner and reorthogonalisation can be performed if needed.
This algorithm gives a basis matrix with orthogonal columns, , for the block Krylov subspace and the projected block Hessenberg matrix
[TABLE]
This means that the -block of is given by in the above algorithm. Moreover, the following Arnoldi relation holds:
[TABLE]
where .
If has its field of values on a line, e.g., is Hermitian or skew-Hermitian, then there exists such that is Hermitian. By (12) this implies that is block tridiagonal, the orthogonalisation recursions become three-term recursions, and we get the block Lanczos iteration.
The polynomial approximation property of Krylov subspaces motivates to approximate the product of the matrix exponential and a block matrix as
[TABLE]
where . For a vector , the approximation (14) was considered already in [14, 17], and for the case of a block matrix it has been considered also in [33].
Since the columns of are orthonormal, we have and from this it follows that . Clearly, it also holds that . Moreover, we have the following bound.
Lemma 6**.**
For the approximation (14) holds
[TABLE]
Proof.
The proof goes analogously to the proof of [17, Thm 2.1], where is a vector. ∎
3.2 Rational Krylov subspaces
We also mention the possibility of approximating matrix functions in *rational Krylov subspaces *(see e.g. [15], [18], [44] and [38]). For poles , , the rational Krylov subspace can be defined as (see also)
[TABLE]
Then, if a matrix with orthogonal columns gives a basis for the subspace , the matrix exponential can be approximated as (14), where . Especially for sparse matrices, the rational Krylov methods are often more efficient, and as the solution usually converges faster with respect to subspace dimension, the rational alternative is usually more memory efficient. These differences will be illustrated in numerical experiments. However, for simplicity, in our analysis and numerical experiments we will use the block Arnoldi iteration.
3.3 The method
We approximate in the block Krylov subspace \mathcal{K}_{k}\big{(}A,\begin{bmatrix}Z&C\end{bmatrix}\big{)}. The fact that the projection onto this subspace results as an accurate approximation can be seen from the form of the exact solution (9) and from the Krylov approximation properties shown in the last subsection. To this end, an orthogonal basis matrix for \mathcal{K}_{k}\big{(}A,\begin{bmatrix}Z&C\end{bmatrix}\big{)} is first generated using the block Arnoldi iteration. Then, we carry out the approximation as listed in Algorithm 1. Notice that the method works independently of the rank of .
3.4 Solving the small dimensional system
To solve the small dimensional system (17) we use the modified Davison–Maki method [27]. This method is chosen because of its structure preservation properties which are shown in Subsection 3.5. The method can be described as follows.
As shown in Lemma 1, the solution of the projected system (17) is given by
[TABLE]
Instead of directly evaluating by (18), which is the idea of the original Davison–Maki method [10], we perform substepping in order to avoid numerical instabilities arising from the inversion of the matrix in (18). This is exactly the modified Davison–Maki method, and it is presented in the following pseudocode. We denote Y_{k}^{j}\approx Y_{k}\big{(}\tfrac{j\cdot t}{m}\big{)}.
Input: Hamiltonian matrix , ,
time , substep size , . 2. 2.
Set: . 3. 3.
Iterate. *for :
[TABLE]
For computing the matrix exponential in Step 3, we use the 13 order diagonal Padé aproximant which is implemented in Matlab as ’expm’ command [22].
3.5 Structure preserving properties of the approximation
We next inspect the two properties stated in Theorem 3. We show that the proposed projection method preserves the property of the positivity of the exact flow, and it also preserves the property of monotonicity under the condition that the matrix used for the projection stays constant when the initial data for the DRE is changed. Notice that these results are not restricted to polynomial Krylov subspace methods.
Theorem 7**.**
The numerical approximation given by Algorithm 1 preserves the property of positivity stated in Theorem 3.
Proof.
The projected coefficient matrices , and the initial value of the small system (17) are clearly all symmetric nonnegative. Thus the small system (17) is a symmetric DRE. By Theorem 3.1 of [12], an application of a symplectic Runge–Kutta scheme with positive weights (see [13] for details) gives as a result a symmetric nonnegative solution. As the th order diagonal Padé approximant equals the stability function of the -stage Gauss–Legendre method (see e.g. [26, p. 46]), the Padé approximation in the third substep of the modified Davison–Maki method (Subsection 3.4) corresponds to a symplectic Runge–Kutta method. Thus each substep of the modified Davison–Maki method outputs a symmetric nonnegative matrix and as a result is symmetric nonnegative. Therefore also is symmetric nonnegative. ∎
Theorem 8**.**
The numerical approximation given by Algorithm 1 preserves the property of monotonicity in the following sense. Consider two DREs corresponding to linearizations with the coefficient matrices
[TABLE]
such that
[TABLE]
Suppose both systems are projected using the same orthogonal matrix , giving as a result small -dimensional systems of the form (17) for the matrices and . Then, for the matrices and we have
[TABLE]
Proof.
Consider the projected systems of the form (17) corresponding to and with the projected coefficient matrices , and , and , and , respectively. Consider also the corresponding linearizations of the form (19) with the Hamiltonian matrices
[TABLE]
By the reasoning of the proof of Theorem 7, the projected systems corresponding to and are symmetric DREs. We see that
[TABLE]
where . Thus, from (20) it follows that . Clearly, also . By Theorem 6 of [13], an application of a symplectic Runge–Kutta scheme with positive weights (see [13] for details) preserves the monotonicity. Thus the Padé approximants of the substeps of the modified Davison–Maki method (Subsection 3.4) preserve the monotonicity. Therefore, and as a consequence . ∎
Remark** 1****.**
As the basis matrix given by Algorithm 1 is independent of the matrix in the DRE (5), where is the control matrix in the original linear system (2), we see that Algorithm 1 preserves monotonicity under modifications of . However, if we change the initial value or the matrix , then forming a new basis is generally needed. The fact that is independent of can also be seen by considering similar projection methods for the algebraic Riccati equation, see e.g. [37] and the references therein.
4 A priori error analysis
We first consider the approximation of the DRE without the quadratic term , i.e., we consider the differential Lyapunov equation. This clarifies the presentation as the derived bounds will be needed when we consider the approximation of the differential Riccati equation. We note, however, that the bounds for the Lyapunov equation are applicable outside of the scope of the optimal control problems, e.g., when considering time integration of an inhomogeneous matrix differential equation.
4.1 Error analysis for the Lyapunov equation
Consider the symmetric Lyapunov differential equation with low rank initial data and constant low rank inhomogeneity,
[TABLE]
where and , . Then, the approximation is given by , where is a solution of the projected system (17) with . For the error of this approximation we obtain the following bound.
Theorem 9**.**
Let , , , and let be the solution of (21). Let be an orthogonal basis of the block Krylov subspace \mathcal{K}_{k}\big{(}A,\begin{bmatrix}Z&C\end{bmatrix}\big{)}. Let be the solution of the projected system (17) with , and let . Then,
[TABLE]
Proof.
Using the integral representation of Theorem 2 for both and , we see that
[TABLE]
where
[TABLE]
and
[TABLE]
Adding and substracting to the right hand side of (22) gives
[TABLE]
Using Lemma 6 to bound the norm of , and using the fact that and that , gives
[TABLE]
Then, similarly, adding and substracting the term to (23) and applying Lemma 6 shows that
[TABLE]
which shows the claim. ∎
We note that the error bound of Theorem 9, similarly to the bounds given in [17], exhibits a hump before it starts to decrease in case . Improved bounds for special cases of the matrix are possible by using, e.g., results of [23].
4.2 Refined error bounds for the Lyapunov equation
Although Theorem 9 shows the superlinear convergence speed for the approximation of the Lyapunov equation (21), sharper bounds can be obtained, e.g., by using the bounds given in [23]. As an example we consider the following. If is symmetric negative semi-definite with its spectrum inside the interval , and is an orthonormal basis matrix for the block Krylov subspace , we have (see [23, Thm. 2]) for the error the bound
[TABLE]
Using (24) and following the proof of Theorem 9, we get the following bound for the case of a symmetric negative semidefinite .
Theorem 10**.**
Let , and define the Lyapunov equation 21. Let be an orthogonal basis matrix of the subspace . Let be the solution of the projected (using ) system (17) with , and let . Then, for the error it holds that
[TABLE]
The bound (25) can be illustrated with the following simple numerical example. Let be the tridiagonal matrix , , and let and be random vectors. Figure 1 shows the convergence of the algorithm vs. the a priori bound (25).
4.3 Error for the approximation of the Riccati equation
Here, we state our main theorem which shows the superlinear convergence property of Algorithm 1 when applied to the DRE (5). Its proof, which is essentially based on Lemma 6 and Grönwall’s lemma, is lengthy and is left to the appendix.
First, however, we state a bound for the norm of the numerical solution which will be needed in the proof of the main theorem.
Lemma 11**.**
Suppose, , and that is symmetric nonnegative. Then, is symmetric nonnegative, and satisfies the bound
[TABLE]
Proof.
As , and are symmetric and nonnegative, we see from (17) that so are the orthogonally projected matrices , and . Thus, the projected system is a symmetric DRE. Applying Lemma 4 to the projected system, and using the bounds , and shows the claim. ∎
From Lemma 11 we immediately get the following bound.
Corollary 12**.**
The numerical solution satisfies
[TABLE]
We are now ready to state an error bound for the DRE. The proof is left to the appendix.
Theorem 13**.**
Let , , and defined the DRE (5). Let be the numerical solution given by Algorithm 1. Then, the following bound holds:
[TABLE]
where
[TABLE]
[TABLE]
and
[TABLE]
5 A posteriori error estimation
We consider next an a posteriori error estimation for the method by using ideas presented in [8].
Denote the original DRE (5) as
[TABLE]
Using the residual matrix we derive computable error estimates. These derivations are based on the following lemma.
Lemma 14**.**
The error satisfies the equation
[TABLE]
where
[TABLE]
Proof.
We see that the error satisfies the ODE
[TABLE]
with the initial value . Applying the variation-of-constants formula to (29) gives (27).
Next we show the representation (28). Since
[TABLE]
and
[TABLE]
we see that
[TABLE]
since as . Substituting the Arnoldi relation into (30) gives the representation (28). ∎
To derive a heuristic a posteriori estimate, we neglect the second term in equation (27) and approximate the first integral by leaving out the exponentials. This is especially meaningful in the case has its numerical range on left half-plane, since then the exponentials have their norms less than or equal to 1. This leads to the approximation
[TABLE]
From a careful inspection we see that implies
[TABLE]
The integral can be estimated by simply summing
[TABLE]
where and where the intermediate values can be obtained from the summing and squaring phase of Algorithm 1 (Subsection 3.4). From (31) and (32) we arrive to a computationally efficient a posteriori estimate
[TABLE]
To illustrate the efficiency of this estimate consider the following toy example. Let be the tridiagonal matrix , , and let and be random vectors. Figure 2 shows the error vs. the estimate (33).
We note that by using the error representation (27) and the residual given in (28) it is possible to derive corrected schemes, similarly as is done for the matrix exponential in [8] and [34].
6 Rank cut
We describe next a rank cut strategy used in our numerical experiments. When using Algorithm 1, if , then after iterations the numerical solution has a rank at most and memory for entries is needed. However, the numerical rank of may be considerably smaller already for small treshold values. Therefore, when performing multiple time stepping it is reasonable to cut the rank after each step. This can be done, for example, as follows. Let and denote the singular values of and the corresponding left and right singular vectors. Consider the projection
[TABLE]
This projection is efficiently applied on the numerical solution given by Algorithm 1 since obviously
[TABLE]
We have the following bound for the effect of the rank cut of the initial value.
Theorem 15**.**
Suppose , and let and be solutions of the system (5) for initial values and , respectively. Then,
[TABLE]
Proof.
By Thm. 2,
[TABLE]
Since , we see that . Then, from Thm. 3 it follows that . Furthermore, by Lemma 17, for all . Therefore, the second integral on the right hand side of (34) is a negative semidefinite matrix and thus , and the claim follows. ∎
A straightforward corollary of Theorem 15 is an estimate for the total error arising solely from the rank cutting.
Corollary 16**.**
Consider a time stepping scheme where a rank cut of size is made after every step , and that the substeps are otherwise exact solutions of (5). I.e., the result of step is , where is the solution of (5) with the initial value . Carrying out this procedure for steps, it follows from 15 and Lady Windermere’s fan (see [20, Ch. I.7]) that the global error satisfies
[TABLE]
7 Numerical experiments: optimal cooling problem
As a numerical example we consider an optimal cooling problem described in [6] (see also Example 2 in [42]). The underlying linear system is of the form
[TABLE]
where the coefficient matrices arise from a finite element discretization of the cross section of a rail. A discretization of dimension gives coefficient matrices , and , where is symmetric. This leads to a symmetric DRE of the form (5) with the coefficient matrices , and . We take zero initial value for the DRE. The mass matrix is sparse so the products using the matrix are cheap. We note that by a symmetric decomposition of the mass matrix, , the system could also be written as a system using a symmetric coefficient matrix for the scaled variable .
7.1 Case
Figure 3 shows the convergence of Algorithm 1 and an a posteriori error estimate given by (33), when . We compute the spectral norm error for different Krylov subspace dimensions . For the scaling and squaring part (Subsection 3.4), we set the parameter . Table 1 shows the CPU time needed for the block Krylov process and for the scaling squaring part of Algorithm 1, for four different Krylov subspace sizes.
Figure 4 shows the convergence of a single step for , when we apply the block orthogonalisation procedure of Subsection 3.1 on the Krylov subspace (11) and on a rational Krylov subspace (16) spanned by . For the rational Krylov subspace we set all nodes equal to 1. Here subspace dimension denotes the number of columns of the basis matrix . For comparison, we also consider the best low rank approximation of the solution obtained from its singular value decomposition (SVD) for different ranks (denoted basis dimension in the figure). We see that the rational approximation needs a considerably smaller subspace for a given error than the polynomial approximation.
Next, we apply Algorithm 1 for subsequent steps. We set for the Krylov error a tolerance , and use the a posteriori estimate (33) as a criterion for stopping the iteration. Also, after each step we cut the rank using the projector . Figure 5 depicts the final errors at for 4 different values of . As we see the final errors are not far from the tolerances used for substeps. Figure 6 depicts the growth of the rank in the numerical solution for different tolerances . We see that the substepping approach requires less memory for a given error tolerance than a single run using Algorithm 1. This is depicted in Table 2.
As a last experiment for the case , we carry out a time integration up to using substeps. Figure 7 shows the relative spectral norm error along the time integration, i.e., the error , where denotes the numerical solution. We use a Krylov subspace dimension for the first substep and for the rest. After each time step, we cut the rank of the numerical solution to using SVD. By these choices of Krylov subspace sizes we ensure that the error arising from the rank cut dominates at each time step.
Next, in order to use the estimate (35), we approximate and assuming that the error arising from the rank cut dominates the total error. We then approximate the total using (35) as
[TABLE]
Figure 8 shows the error arising from the best 2-norm approximation after each step, i.e., the singular value , and the estimate (37). We see that the error in the end is not far from , the number of time steps times the largest rank cut.
7.2 Case
Next we consider a finite element discretization with . Figure 9 shows the convergence of Algorithm 1 and an a posteriori error estimate given by (33), when . For the scaling and squaring part we set the parameter .
We next carry out a time integration up to using substeps. We estimate the total error without access to a reference solution using the estimate (37). As above, we use a Krylov subspace dimension for the first step, and for rest of the steps, and cut the rank to 40 after each step using SVD. We see from Figure 10 that the a posteriori error estimate for Algorithm 1 is negligible compared to the error arising from the best 2-norm approximation at each step. Figure 10 shows also the estimate (37). We see that the estimate is of the same order ( 10 times bigger) as in the -case.
8 Conclusions and Outlook
We have proposed a Krylov subspace approximation method for large scale differential Riccati equations. We have proven that the method is structure preserving in the sense that it preserves two important properties of the exact flow, namely the property of the positivity and also under certain conditions also the property of the monotonicity. We have also provided an a priori error analysis of the Krylov subspace approximation which shows a superlinear convergence. This behavior was also verified in numerical experiments. In addition, an a posteriori error analysis was carried out and the proposed estimate was shown to be accurate in numerical examples. In order to limit the memory consumption, we considered limiting the rank of the numerical solution in multiple time stepping. To avoid excessively large approximation basis , more studies of the rational Krylov subspaces are needed. Their benefits were illustrated in numerical experiments.
We would like to point out that the presented block Krylov subspace method can be extended to the unsymmetric differential Riccati equation. A possible extension could also be the nonautonomous case, i.e, the case in which the coefficient matrices are , and are time dependent. In this case an essential tool would be the so called Magnus expansion (see e.g. [2]) which gives the fundamental solution of the linear system corresponding to the time dependent coefficient matrix .
Acknowledgments
The authors thank Valeria Simoncini for pointing out relevant literature related to the algebraic Riccati equation and Tony Stillfjord for several helpful comments on a draft of the paper.
Appendix A Auxiliary Lemmas and the proof of Thm. 13
We first state two lemmas needed in Thm. 15 and Thm. 13, respectively.
Lemma 17**.**
Let be symmetric positive semidefinite matrices such that . Then, also
[TABLE]
Proof.
Assume first that is positive definite. Then, (see [25, p. 431])
[TABLE]
and therefore also (see [25, p. 438])
[TABLE]
Then, we see that
[TABLE]
Then, consider the matrix , , where is positive semidefinite. Clearly, is positive definite for all . Therefore
[TABLE]
Taking the limit and using the fact that
[TABLE]
is a continuous function of , the claim follows. ∎
Lemma 18**.**
Let , , let be a matrix with orthonormal columns such that and let . Then, for all it holds that
[TABLE]
Proof.
Using the polynomial approximation property of the Krylov approximation (see [34, Lemma 3.1]), we see that
[TABLE]
Therefore,
[TABLE]
Using the bounds , and (see [17, Lemma A.2])
[TABLE]
on the four terms on the RHS of (39), the claim follows. ∎
Lemma 19**.**
Let be the solution of the Riccati differential equation (5) at time , , and let be a matrix with orthonormal columns such that . Denote . Then, the following bound holds:
[TABLE]
where
[TABLE]
Proof.
Using the integral representation (9) for we may write
[TABLE]
where
[TABLE]
and
[TABLE]
By using Lemma 18, we obtain for the expressions inside the square brackets on right hand side of (41) the bounds
[TABLE]
and
[TABLE]
Thus,
[TABLE]
From (42) we see that
[TABLE]
Next we bound the first factor in the integrand of (45). We substitute the integral representation (9) for to find that
[TABLE]
As above when bounding , we use Lemma 18 on the expressions inside the square brackets on right hand side of (46), to get the inequality
[TABLE]
Applying Grönwall’s lemma on (47), we find that
[TABLE]
Substituting (48) into (45), we get
[TABLE]
The bounds (44) and (49) together show the claim. ∎
Using Lemmas 18 and 19 we are now ready to prove Theorem 13.
Proof of Theorem 13
Proof.
From the integral representation (9) for and for the solution of the small dimensional system (17), we see that
[TABLE]
where
[TABLE]
and
[TABLE]
Theorem 9 shows that is bounded as
[TABLE]
We add and substract the term
[TABLE]
to (51) to obtain
[TABLE]
where
[TABLE]
From (53) and (54) we see that
[TABLE]
where
[TABLE]
The claim follows now from (50), (52), (55), Lemma 18, Grönwall’s lemma, Corollary 5 and Corollary 12, which form a sequence of substitutions.∎
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] H. Abou-Kandil, G. Freiling, V. Ionescu, and G. Jank , Matrix Riccati Equations in Control and Systems Theory , Birkhäuser, Basel, 2003.
- 2[2] P. Bader, S.Blanes and E. Ponsoda , Structure preserving integrators for solving (non-) linear quadratic optimal control problems with applications to describe the flight of a quadrotor , J. Comput. Appl. Math. 262 (2014), pp.223–233.
- 3[3] B. Beckermann and L. Reichel , Error estimates and evaluation of matrix functions via the Faber transform , SIAM J. Numer. Anal., 47.5 (2009), pp. 3849–3883.
- 4[4] P. Benner and H. Mena , Numerical solution of the infinite-dimensional LQR problem and the associated riccati differential equations , J. Numer. Math., De Gruyter (accepted), (2016), DOI: 10.1515/jnma-2016-1039.
- 5[5] Rosenbrock methods for solving Riccati differential equations , IEEE Trans. Autom. Control 58.11 (2013), pp. 2950–2956.
- 6[6] P. Benner and J. Saak , A semi-discretized heat transfer model for optimal cooling of steel profiles , In: Dimension reduction of large-scale systems, vol. 45, Lecture Notes in Computational Science and Engineering, P. Benner, V. Mehrmann, and D. Sorensen, Eds. Berlin/Heidelberg, Springer, 2005, pp. 353–356.
- 7[7] D.A. Bini, B. Iannazzo and B. Meini , Numerical solution of algebraic Riccati equations , SIAM, Philadephia, 2011.
- 8[8] M.A. Botchev, V. Grimm and M. Hochbruck , Residual, restarting, and Richardson iteration for the matrix exponential , SIAM J. Sci. Comput. 35.3 (2013), pp. A 1376–A 1397.
