Multiscale differential Riccati equations for linear quadratic regulator problems
Axel M{\aa}lqvist, Anna Persson, Tony Stillfjord

TL;DR
This paper introduces a multiscale approach using localized orthogonal decomposition to efficiently approximate solutions to differential Riccati equations in linear quadratic regulator problems with multiscale operators, achieving high accuracy with low computational cost.
Contribution
It develops a novel multiscale method for Riccati equations that maintains accuracy independent of multiscale variations and provides detailed analysis and validation.
Findings
Second-order convergence in $L^2$ operator norm
First-order convergence in energy norm
Effective low-rank computations for large-scale problems
Abstract
We consider approximations to the solutions of differential Riccati equations in the context of linear quadratic regulator problems, where the state equation is governed by a multiscale operator. Similarly to elliptic and parabolic problems, standard finite element discretizations perform poorly in this setting unless the grid resolves the fine-scale features of the problem. This results in unfeasible amounts of computation and high memory requirements. In this paper, we demonstrate how the localized orthogonal decomposition method may be used to acquire accurate results also for coarse discretizations, at the low cost of solving a series of small, localized elliptic problems. We prove second-order convergence (except for a logarithmic factor) in the operator norm, and first-order convergence in the corresponding energy norm. These results are both independent of the multiscaleâŠ
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
\headers
Multiscale Riccati equations for LQR problemsAxel MÄlqvist, Anna Persson and Tony Stillfjord
\newsiamthmassumptionAssumption \newsiamremarkremarkRemark
Multiscale differential Riccati equations for linear quadratic regulator problemsâ â thanks: Submitted to the editors 2017-06-13.
\fundingThis work was supported by the Swedish Research Council under grant no. 2015-04964.
Axel MÄlqvist Mathematical Sciences, Chalmers University of Technology and the University of Gothenburg (, , .) [email protected]
ââ
Anna Persson22footnotemark: 2
ââ
Tony Stillfjord22footnotemark: 2
Abstract
We consider approximations to the solutions of differential Riccati equations in the context of linear quadratic regulator problems, where the state equation is governed by a multiscale operator. Similarly to elliptic and parabolic problems, standard finite element discretizations perform poorly in this setting unless the grid resolves the fine-scale features of the problem. This results in unfeasible amounts of computation and high memory requirements. In this paper, we demonstrate how the localized orthogonal decomposition method may be used to acquire accurate results also for coarse discretizations, at the low cost of solving a series of small, localized elliptic problems. We prove second-order convergence (except for a logarithmic factor) in the operator norm, and first-order convergence in the corresponding energy norm. These results are both independent of the multiscale variations in the state equation. In addition, we provide a detailed derivation of the fully discrete matrix-valued equations, and show how they can be handled in a low-rank setting for large-scale computations. In connection to this, we also show how to efficiently compute the relevant operator-norm errors. Finally, our theoretical results are validated by several numerical experiments.
keywords:
Multiscale, localized orthogonal decomposition, finite elements, linear quadratic regulator problems, differential Riccati equations
{AMS}
49N10, 65N12, 65N30, 93C20
1 Introduction
In a linear quadratic regulator (LQR) problem, the state is a model of a system whose evolution can be influenced through the input . The goal is to drive certain measurable quantities of the system, the output , to a given target which is typically zero. The relations between , and are given by the state and output equations
[TABLE]
where , and are given operators. The optimal input function is found by minimizing the cost functional
[TABLE]
where , and are given weighting factors. It can be shown (see e.g. [1, 21]) that is given in feedback form as , where is the solution to an operator-valued differential Riccati equation (DRE):
[TABLE]
In the case of a nonzero output target, one additional differential equation for the evolution of has to be solved.
In this paper, we consider the case when the operator exhibits multiscale behaviour. In particular, we consider diffusion problems where the spatial variation of the diffusion coefficient is on a fine scale compared to the computational domain. This e.g. occurs in the modeling of composite materials and flows in porous media. Numerically approximating the solutions to elliptic or parabolic equations given by such operators in the usual way is difficult, because a very fine discretization is necessary to resolve the fine-scale structure. These difficulties are exacerbated when considering DREs such as Eq. 3, as their solution essentially requires solving many parabolic equations.
A by now well established method for multiscale elliptic and parabolic problems is the localized orthogonal decomposition (LOD)Â [25, 15]. It is a modification of the finite element method (FEM), which incorporates some of the fine-scale structure into a coarse discretization by precomputing a series of localized fine-scale problems. Due to the localization, these are much cheaper to evaluate than the full fine-scale problem and may additionally be solved in parallel.
We note that finite elements were introduced for the approximation of optimal control problems already in the 1970âs, see e.g. [27, 10, 14, 36], and the field has grown much in several different directions since then. When diffusion problems have been considered, the focus has typically been on constant or slowly varying diffusion. Recently, however, also optimal control problems of multiscale type have been considered in e.g. [11, 12, 22]. None of these consider the LOD approach, instead preferring homogenization or asymptotic expansions. Additionally, a common assumption is that the multiscale features are periodic, which is frequently not the case in applications.
The focus in this paper is on the approximation of DREs such as Eq. 3. In contrast to the forward-adjoint approach, which solves a specific optimal control problem, the DRE provides the feedback laws for all problems defined by the operators . While more expensive to solve, it can be precomputed and reused in many different situations. We refer to [8, 21] for an overview of Riccati theory, with the latter reference treating very general problems.
Our main result is that LOD-approximations to the solution of Eq. 3 with a mesh size converge with order in the operator norm to a given accurate fine-scale FEM approximation. The convergence in the corresponding operator energy norm is shown to be of order . We note that -convergence of FEM approximations to the exact solution of Eq. 3 has previously been shown in [18], and similar results for algebraic Riccati equations can be found in [21]. (See also [30, 6] for convergence results without orders in related settings.) However, the error constants in these results depend on the multiscale variations of , and thus such convergence is not observed in practice. This is not the case for our present results.
For practical computations, also a temporal discretization is necessary; for this we consider a low-rank splitting scheme as introduced in [32]. Such methods decompose the DRE into its affine and nonlinear parts and approximate these separately, thereby greatly reducing the computational cost. The affine problem requires the approximation of several parabolic equations involving in each time step. As the computational efficiency gain for LOD increases with the number of times the modified basis may be reused, splitting schemes are thus particularly well suited to be combined with the LOD method.
We demonstrate how to transform the FEM and LOD discretizations into matrix-valued equations, and how to implement the fully discrete methods. Even if LOD reduces the need for very fine discretizations, large 2D or 3D-problems may still yield large matrices. We therefore consider the low-rank approach, which greatly reduces the necessary amount of computations. As a side effect, this also allows us to compute errors in the operator norms very efficiently.
A brief outline of the paper is as follows: We formalize the setting and our basic assumptions in Section 2, and define the different spatial discretizations in Section 3. Convergence of the LOD approximations with the appropriate order is then shown in Section 4. The matrix-valued formulations of the discretized DREs and related questions are discussed in Section 5, while Section 6 is devoted to the temporal discretization and low-rank setting. Finally, we present several numerical experiments and their results in Section 7.
2 Setting
Let , , be a bounded polygonal/polyhedral domain. We consider the separable Hilbert spaces , , and , where corresponds to the state space, is the control space and is the observation space. In the following, the specification of will be omitted. We write and for the inner product and norm on , and denote the corresponding quantities on , and by subscripts. To define the state evolution operator , we assume that the inner product on is given, with assumptions on given below. Then is defined by and .
Further, let the input operator and the output operator be given. We also consider the output and input weighting operators and (which could be included in and but are typically not) and the final state weighting operator . By â, we denote Hilbert-adjoint operators with respect to , so that e.g. satisfies for all and . Finally, we denote the linear bounded operators from one generic Hilbert space, , to another, , by . When , we abbreviate .
In this notation, the weak form of Eq. 3 is to find satisfying
[TABLE]
for all .
{assumption}
The diffusion coefficient is symmetric and satisfies
[TABLE]
In addition, , , , is invertible with and .
The first part of Section 2 shows that is a bounded and coercive bilinear form, which means that is the generator of an analytic semigroup , see e.g. [34, Theorem 3.6.1]. In conjunction with the boundedness assumptions on , , and , this guarantees the existence and uniqueness of a solution to Eq. 4. In fact, there is even a classical solution to Eq. 3 [8, Part IV, Ch. 3], which means that the term can be extended to an operator in . As a consequence, Equation Eq. 4 holds also for . We note that these conclusions are valid also under various weaker forms of Section 2, which additionally permit the treatment of boundary control and observation [21]. A discussion on an extension of our results to such a setting may be found in Section 8.1.
3 Spatial discretization
We first introduce the FEM approximation of Eq. 4. To this end, we let be a triangulation of with meshwidth and internal nodes. The subspace denotes the space of continuous and piecewise affine functions on , and we denote the corresponding nodal basis functions by . This discretization is referred to as the fine, or sometimes also reference, mesh, see further Section 3.1 below.
We also consider a coarse discretization space for , with the corresponding family of triangulations , which is assumed to be quasi-uniform. For these triangulations, we let be the largest ball contained in the triangle and denote by the shape regularity of the mesh, defined by
[TABLE]
Furthermore, we let denote the identity operator between these spaces, i.e. for all . Similarly, is the identity operator mapping into and is the -orthogonal projection of onto .
The semi-discretized weak form of Eq. 3 is defined by
[TABLE]
for all and with satisfying . Here, the operators , and satisfy
[TABLE]
for all , and . We note that can be proven to be self-adjoint, so we additionally require that is self-adjoint.
For the coarse discretization, we have the same equation but with instead of . We observe that the coarse and fine operators are related in the following way:
[TABLE]
and that the natural extension of to a map on is given by . Here, is the -orthogonal projection of onto .
3.1 Localized orthogonal decomposition
If is varying on a small scale of size , then the classical FEM approximation of a parabolic problem may yield poor results, unless is sufficiently small to resolve the fine-scale variations. That is, we typically do not observe -convergence until , which requires infeasible amounts of computation. The same behaviour occurs for the -discretizations of Eq. 4.
To this end, we assume that is sufficiently small so that is a good approximation of . That is, , and we refer to as the reference solution. The aim is now to approximate by using a multiscale space of the same dimension as the coarse space . To obtain such a space, we use the localized orthogonal decomposition (LOD) method introduced in [25], which incorporates fine-scale information in the coarse-scale space. The construction involves the solution of several fine-scale, but localized and parallelizable, problems. We briefly summarize the procedure here and refer to [25, 15], for the details.
To define the multiscale space , we first introduce an interpolation operator that fulfills
[TABLE]
for all triangles , where . In this paper we use the weighted Clément interpolant as in [25]. Let denote the kernel of ,
[TABLE]
and note that can be decomposed as , meaning that every can be written as with . We now introduce the (global) correction operator by
[TABLE]
and define the (global) multiscale space as , with . This leads to the decomposition with the orthogonality , , . Note that is the orthogonal projection onto with respect to the inner product , i.e. the Ritz projection onto , and is the orthogonal complement to . From the construction it follows that . Indeed, a basis for is given by .
In general, the corrections have global support and are expensive to compute, since they are posed in the entire fine scale space . To overcome this, it is observed that the corrections have exponential decay away from the :th node of (see [25, 15]), which motivates a truncation of the corrections. For this purpose, we define patches of size around each by the following:
[TABLE]
Further, we define to be the restriction of to the patch . For brevity, we do not include the dependence on in the notation. Now note that the correction operator can be written as the sum where
[TABLE]
We can now localize these computations by replacing with . Define such that
[TABLE]
Finally, we can define a local operator and a localized space , with .
The approximation properties (and the required computational effort) of the space depends on the choice of . In [15] it is proven that convergence of order is obtained if is chosen proportional to . In this paper we therefore assume that to avoid explicitly stating the dependence on .
To define an LOD-approximation to the solution in Eq. 5, we additionally need to introduce the identity operator , . Its -adjoint is the -orthogonal projection of onto . Replacing the space with then results in the problem to find satisfying
[TABLE]
for all and the initial condition . Here, the operators , and are given by
[TABLE]
for all , and . Similar to Eq. 6 we have
[TABLE]
The natural -extension of is given by , similar to the -case.
Since has the same dimension as , there is a lower-dimensional representative for , given by . By inserting and , with , in Eq. 7 we see that
[TABLE]
and we consequently define the corrected coarse-scale operators
[TABLE]
4 Error analysis
In the following, denotes a generic constant which may take different values at different occasions. It may depend on the problem data and the size of the domain, but is independent of and . Moreover, it does not depend on the multiscale variations of , i.e. any derivatives of . We start by gathering some useful results:
4.1 Preliminaries
Recall that is the identity mapping, denotes the -orthogonal projection onto , and is the -orthogonal projection onto . We have , i.e. we first project onto and then onto . Straightforward calculations show the following:
Lemma 4.1**.**
*Under Section 2, it holds that , and . *
Further, let denote the solution operator to the equation , i.e. the semigroup generated by . Similarly, is the semigroup generated by . Because generates an analytic semigroup on , these operators are analytic semigroups on and , respectively. More specifically, we have
Lemma 4.2**.**
Under Section 2, the operators
[TABLE]
*are both in for , with the uniform bounds and . *
By arguing as in [24], but for the (simpler) semi-discrete case, we have (choosing
Lemma 4.3**.**
For it holds that
[TABLE]
*Here, the constant depends on , , , and , but not on the multiscale variations of . *
Proof 4.4**.**
We only comment briefly on the proof here. Let and , for . By introducing the Ritz projection satisfying for all we get, see [35, Chapter 3] and [24],
[TABLE]
where . From the error bounds of in [25], see also [24], we get
[TABLE]
*The regularity estimates , for , [35, Lemma 2.5], completes the proof. *
Finally, from Lemma 4.1, we get the existence and uniqueness of solutions and to the discretized DREs Eq. 5 and Eq. 7, respectively. Let us abbreviate
[TABLE]
Then we have
Lemma 4.5**.**
There is a constant which is independent of the multiscale variations of but may depend on and , such that
[TABLE]
*for . *
4.2 Error analysis
We are now ready for the main theorem of this paper:
Theorem 4.6**.**
Suppose that Section 2 is fulfilled. Then for it holds that
[TABLE]
*Here, the constant depends on , , , , and , but not on the multiscale variations of . *
Proof 4.7**.**
We utilize the integral form of Eq. 5. If solves Eq. 5 then it satisfies
[TABLE]
(see e.g. [8, Chapter IV:3, Proposition 2.1]). Recalling that and are the identity operators on and , respectively, and using Eq. 8 therefore shows that
[TABLE]
as well as
[TABLE]
(Note the in the first term, since we suppose .)
Subtracting these expressions yields
[TABLE]
so that
[TABLE]
We observe that for all (with a generic Hilbert space ) it holds that . Thus, using Lemmas 4.2, 4.3 and 4.5 we get
[TABLE]
Additionally using Lemma 4.1 shows that the last two integrands satisfy
[TABLE]
Due to the singularity at in the bound on , we split the integrals of the remaining -terms into two parts. For , we find
[TABLE]
where we have used for the crude estimate , since a -term already appears in the bounds of and . The same bound holds for , and, by Lemma 4.1 and Lemma 4.5, also for and . In conclusion, we thus have
[TABLE]
*which by Grönwallâs lemma yields the statement of the theorem. *
Remark 4.8**.**
*In the common situation that , corresponding to the case of no final state penalization, the -singularity disappears. *
Remark 4.9**.**
*We note that a bound of the same form has been shown in [18] for the FEM error. However, the error constant then depends on the variations in , and one does not observe the given convergence order until . *
Similar to the parabolic case, the error bound becomes less singular near if we measure in the -norm. To prove this we need the following, slightly stronger, assumptions on the operators (cf. Section 2):
{assumption}
In addition to Section 2, , , and . Moreover, we assume that the mesh is of a form such that is stable in the -norm.
Remark 4.10**.**
*In particular, quasi-uniform meshes satisfy Section 4.2. We refer to [2] for a discussion on more general permissible meshes. *
Theorem 4.11**.**
Suppose that Section 4.2 is fulfilled. For it holds that
[TABLE]
*Here, the constant depends on , , , , and , but not on the multiscale variations of . *
Proof 4.12**.**
We start by noting that . Furthermore, since is stable in the -norm, the following bound holds
[TABLE]
Now, note that if the initial data then we may instead of Lemma 4.3 prove the following, less singular, error bound
[TABLE]
In addition, parabolic regularity gives the bounds .
Note that , so from Section 4.2 it follows that
[TABLE]
Similarly, , and we have
[TABLE]
As in the proof of Theorem 4.6 we can write the difference as a sum of eight terms so that
[TABLE]
For we have
[TABLE]
and similarly we prove , where we have used that
[TABLE]
which is bounded due to Section 4.2. Using the bounds Eqs. 10 and 11 we get
[TABLE]
and
[TABLE]
*By applying Grönwallâs lemma we obtain the desired error bound. *
5 Matrix-valued formulation
To perform actual computations, we write the finite-dimensional equations on matrix form by expressing the equations in the FEM or LOD bases. To this end, let the function and the operator have the vector and matrix representations and , i.e.
[TABLE]
Since exactly the same results hold for upon replacing by , we frequently omit the sub- and superscripts in the following manipulations. They will be reinstated later when we compare different discretizations. The coordinates satisfy
[TABLE]
where denotes the (symmetric) mass matrix, . Unfortunately, we will not recover the usual form of the matrix-valued DRE when working in these coordinates. Therefore, we perform the change of variables
[TABLE]
Coincidentally, this means that we actually have
[TABLE]
Equation 5 is equivalent to
[TABLE]
for , and since , the first term becomes
[TABLE]
Likewise, with the (negative) stiffness matrix , the second and third terms become
[TABLE]
due to the symmetry of and . (Recall that we search for a self-adjoint operator .) Finally, the last two terms can be written
[TABLE]
where , , , and , denote orthonormal bases for and , respectively. Summarizing, we can write the equation on matrix form as
[TABLE]
Similar to the relations between the fine and coarse operators Eq. 6, it is easily shown that their matrix representations satisfy
[TABLE]
where is the prolongation matrix that satisfies if and . By expressing the functions in terms of , it can be seen that , where is the :th node of . Thus the coarse systems are easily constructed when the fine system is known. Note, however, that the matrix representation of is not but .
For the LOD case, we let and be the matrix representations of and , respectively. To compute them efficiently, we follow [13]. Then
[TABLE]
where is symmetric and satisfies
[TABLE]
with the matrices
[TABLE]
Finally, we note that if , and , then in coordinates we have . This means that the matrix representation of is .
5.1 Error computation
We measure the quality of different approximations as the -normed distance to a reference approximation at the final time . In order to find a matrix representation for this, we first observe that since , we have
[TABLE]
But for , so since we also get
[TABLE]
To compute the -norm it is thus enough to test with . Again omitting the sub- and superscripts, we have that , and similarly
[TABLE]
Since is symmetric positive definite, we may do a Cholesky factorization , and the change of variables yields
[TABLE]
where denotes the standard spectral matrix norm. Recalling the matrix representation of , we now get that
[TABLE]
The LOD error is completely analogous, using instead and .
A similar approach also allows us to compute -errors. Let be a Cholesky factorization of . Then
[TABLE]
We also get that is bounded from below by , i.e. the latter quantity can be thought of as an equivalent norm. Since is triangular, the extra cost required for the computation of is negligible. If the low-rank formulation is used (see Section 6.1), only a small number of linear equation systems involving needs to be solved, reducing the cost even further.
6 Temporal discretization
We discretize the matrix-valued DREs in time by means of a low-rank splitting scheme, since the basic operation in such methods is the application of , i.e. essentially solving a parabolic problem. Let denote a fixed time step, and let , , be the time discretization of the interval . We split Equation Eq. 15 into two parts, , where
[TABLE]
Then the Strang splitting approximation at time is given by , with and
[TABLE]
Here, the solution operators and satisfy
[TABLE]
where the first equality is apparent from the integral formulation Eq. 9, while the second is easily verified by differentiation.
The low-rank version of the method relies on the assumption that the solution has low rank. This is general true for LQR problems and dramatically reduces the computational cost. In that case, we may factorize , where and with the rank . Also and , and thus the iterates , may then be factorized in such a way. After a reformulation, is very cheap to compute, and the computation of reduces to an evaluation of (plus preliminary, similar work for the integral term). The latter operation is equivalent to solving , , and the matrix is thus never explicitly inverted. For further details, we refer to [32, 33].
6.1 Low-rank errors
Also the error computations outlined in Section 5.1 benefit from being formulated in a low-rank setting. Assume that and with and with , and let be a Cholesky factorization. By setting
[TABLE]
we see that , and it follows that
[TABLE]
Since is not necessarily an eigenvalue decomposition, we cannot immediately determine the norm by inspection. However, performing a QR-factorization is cheap if the number of columns is low, and can also be diagonalized cheaply. (This is precisely the column compression procedure which is applied in each time step.) We acquire , for some , where .
For errors in the -norm, we do not get a symmetric matrix as above. But if we can still write
[TABLE]
with the same , and with
[TABLE]
We can cheaply -factorize both and ; this means that
[TABLE]
where is a small matrix.
7 Numerical experiments
We have performed a number of numerical experiments in order to verify our a priori error bounds for the LOD discretizations, and to demonstrate their efficiency in comparison to the classical FEM.
In all experiments, we compute the relevant matrices for both FEM and LOD by using efficient code written by Fredrik Hellman and Daniel Elfverson111Available on request from Fredrik Hellman, .. These pre-solve computations were run on a IntelÂź Coreâą i5-4690 processor. We note that the localized elliptic fine-scale problems were not solved in parallel. Doing so would further improve the performance of LOD.
For approximating the solutions to the DREs, we employ in all cases the low-rank Strang splitting scheme (as described in Section 6) with time steps. This ensures that the temporal error is small compared to the spatial error, which is our interest here. Our implementation utilizes the DREsplit222Available on request from Tony Stillfjord, , or from http://www.tonystillfjord.net. library. These computations were performed on resources at Chalmers Centre for Computational Science and Engineering (C3SE) provided by the Swedish National Infrastructure for Computing (SNIC). Each simulation used a single IntelŸ XeonŸ E5-2650 v3 processor.
The multiscale diffusion coefficients considered in the numerical examples are of two distinct types. In Examples 1, 2, and 4 we consider a piecewise constant coefficient, generated randomly with no spatial correlation, that varies on a fine scale, see Fig. 1. In Examples 3 and 5 takes two values. One value in the background and one in the thin channels, see Fig. 6. This is a common setup for reinforced (composite) materials. Both these cases are challenging for the finite element method.
7.1 Example 1
In this first example, we consider diffusion on the unit square. More specifically, we take and set \mathcal{A}x=\nabla\cdot\big{(}\kappa\nabla x\big{)} with Dirichlet boundary conditions. Here, is piecewise constant on a square grid of size and taking randomly chosen values in ; see Fig. 1 for an illustration. We consider independent inputs and define the input operator as the sum
[TABLE]
Thus we can control the system on three small squares. As the output operator we take the mean, i.e. . We choose and to be the identity operators and take .
For the discretization in space, we start with a coarse mesh containing triangles, and then refine this times, giving meshes with triangles, for . One additional refinement provides the reference grid with triangles. This results in matrices , and , , with (since we only consider the interior nodes).
The approximations are compared only at the final time, in the - and -norms as outlined in Section 5.1, and the computed errors are shown in Fig. 2. We see that the classical FEM initially struggles due to not resolving the multiscale coefficient properly, but converges with order when the mesh becomes fine enough. The LOD approach converges with order also for the coarse meshes, and additionally results in approximations that are about one order of magnitude more accurate. The plot to the right shows the errors against the actual computation time, including the time spent on constructing the LOD bases. As can be seen, this extra effort is low enough that except for the most inaccurate cases it is always worthwhile to use the LOD approach.
7.2 Example 2
Here, we consider an L-shaped domain , where has been removed from the unit square. The diffusion coefficient is piecewise constant on a square grid of size and taking random values in . We use one control input, given by the characteristic function of the square , and one output, the mean over the square . The meshes are setup as in the previous example, but now with interior nodes ( for the reference solution). The time discretization and other parameters are the same as in the previous example.
The results are shown in Fig. 3. Due to the reentrant corner the errors behave more erratically than in the previous example, but LOD is still clearly first- and second-order convergent in contrast to standard FEM, which performs very poorly. We also observe that LOD is more efficient in all but the coarsest cases.
7.3 Example 3
We again consider the setting of Example 1, but replace the diffusivity constant. Here, takes the constant value everywhere, except for in horizontal stripes where it is . The stripes are centered around the heights , , and have a width of .
The results are shown in Fig. 4. This time, the detrimental effect on the FEM discretization is even more pronounced, with almost no convergence until the thin stripes can be resolved. The LOD approximations are once again more accurate for all . We note that the -error is not quite in this case, but rather close to as predicted by Theorem 4.6. Like in the previous example, computing the LOD bases is cheap enough that the LOD approach is more efficient in all but the least accurate cases.
7.4 Example 4
In this example, we deviate from the basic setting described in Section 4 by considering a boundary control application. All parameters except for the boundary conditions and the input operator are the same as in Example 1. We call the union of the top and bottom edges of the unit square and impose homogeneous Dirichlet boundary conditions there. The left and right edges we denote and , respectively, and there we impose nonhomogeneous Neumann boundary conditions. In particular, with the outward-pointing normal denoted by , we consider functions satisfying
[TABLE]
Here, and are the two control inputs, and
[TABLE]
is a fixed function. The operator now corresponds to x\mapsto\nabla\cdot\big{(}\kappa\nabla x\big{)} on the space with no conditions imposed on , , while the (unbounded) operator implements the Neumann boundary conditions. We refrain from elaborating further on this here, and simply note that the FEM matrix representation becomes
[TABLE]
Fur further details on the proper abstract framework, see e.g. [21] and Section 8.1.
Since Section 2 is no longer satisfied, we may not apply Theorem 4.6. However, the results plotted in Fig. 5 are similar to the results in previous examples. Again, the LOD approximations are more efficient except for the very coarsest meshes. This indicates that our theory could be extended also to the case of unbounded operators and .
7.5 Example 5
As a final experiment, we consider another boundary control application. The domain is formed like a lying U, see Fig. 6. The thickness of each of the âhandlesâ is , the total horizontal extent and the vertical extent . Inside the domain are three evenly spaced stripes with a diameter of . As previously, we consider \mathcal{A}u=\nabla\cdot\big{(}\kappa\nabla u\big{)} where everywhere except for in the stripes where instead . We use homogeneous Neumann boundary conditions over the whole boundary, except for the two vertical sections on the left. On the top-most vertical part, , we impose a nonhomogeneous Neumann condition with having the same hat-shaped form as in Example 4. On the bottom vertical part, , we impose a homogeneous Dirichlet condition. These correspond to an insulated edge, a controllable heat input and a heat sink, respectively. The operator is again given by , and as output we take the mean of the temperature over the domain; . The meshes in this example have interior nodes, respectively, while the reference solution uses .
The results are plotted in Fig. 7, where we can once again observe error behaviour consistent with the bounds given in Theorem 4.6.
Remark 7.1**.**
*In all the experiments, we have chosen the fine-scale structure of the multiscale coefficient such that the reference FEM solution can resolve it, since otherwise we can not properly compute the respective errors. Decreasing the size of the fine-scale features even further would mean that the FEM convergence is further delayed, while we may still compute accurate LOD approximations. In such a case, the efficiency of LOD in comparison to FEM is further (greatly) improved. *
Remark 7.2**.**
*We note that the finest discretizations of the demonstrated numerical experiments are representative of large-scale DRE problems. While there is of course no strict limit, the authors would at the time of writing classify large-scale as problems of size . Due to the matrix-valued nature of the equations, this is naively equivalent to solving a vector-valued differential equation of size . While we do not employ a naive method, the number of unknowns still number in the millions. For readers more familiar with the theory of algebraic Riccati equations (AREs), i.e. the stationary counterparts of DREs, we note that one time step for a DRE solver is roughly equivalent to the solution of one corresponding ARE. The computational effort for solving a DRE is therefore usually at least two orders of magnitude higher, and large-scale in the ARE setting is therefore larger; starting rather at around . We also note that the approximations were here computed on the equivalent of a modern desktop computer. With the increasing availability of parallelization on clusters or GPUs, we expect to see a shift towards even larger problems in the near future. However, for multiscale problems such as these, it is still critical to employ LOD. *
8 Generalizations and future work
In this section we provide some notes on possible extensions of our theory and draw connections to related problems and methods.
8.1 Boundary control
Boundary control applications such as Example 4 occur frequently within the field of optimal control. Then either the input or output operator (or both) acts on the boundary of the computational domain. In order to put such problems into the semigroup framework, one has to allow for unbounded operators and  [21]. Clearly, our convergence analysis is no longer valid in that case, since we can no longer guarantee that or that . However, it is typically assumed that and are not too unbounded. More specifically, if we suppose that and , where , we cover a large class of applications. Here, , denotes fractional powers of which exist due to Section 2. They give rise to the spaces as the completions of in the norm . When we then have that , and by properly extending also the other involved operators to we may follow the line of proof of Theorem 4.6 and show convergence in .
Obviously, this is a sub-optimal estimation, as is a weaker norm than for . However, from [21, Theorem 1.2.1.1] we have that is actually bounded in , at least away from . It therefore seems likely that one could use similar ideas to prove that the same holds for , in which case we would have convergence in . Unfortunately, the theory required for such estimations is rather extensive, and we expect it to be even more so for the LOD approximations. We therefore leave such questions as future work.
8.2 Systems of equations and applications in multiphysics
In this paper we consider problems where the evolution operator in the state equation defines an inner product of the form . However, many interesting applications requires coupled systems to be modeled appropriately, for instance, multiphysical features such as thermoelasticity [9], which describes temperature and displacement in a material. Another example is the singularly perturbed systems [17, 29], which appear when modeling, for instance, fluid catalytic crackers. These are ill-conditioned problems due to a significantly larger time derivative for one (or more) of the equations.
The LOD method has successfully been applied to thermoelasticity and poroelasticity problems, see [23]. With more complicated models, the computational gain in using a coarse representation of the underlying partial differential equation is even greater. Analysis of such problems should be considered in the future.
8.3 Other time discretizations
It should also be noted that the LOD approach could be used with other time discretizations of the DRE. We have here chosen the Strang splitting scheme due to it being familiar to one of the authors and because an efficient implementation was readily available. However, there are also other types of splitting schemes [33, 28]. Additionally, one might instead consider e.g. BDF and Rosenbrock methods [6, 5, 20], projection-based methods [16] or even peer methods [19]. These depend on solving linear equation systems rather than computing the solutions to parabolic problems, and the error analysis approach would thus differ. However, bounds similar to that given in Lemma 4.3 naturally exist also for stationary problems [26].
8.4 Algebraic Riccati equations
The latter fact is even more relevant if one considers algebraic Riccati equations (AREs). These are the stationary counterpoints to the time-dependent DREs and arise when the final time in the cost functional goes to infinity. In this case, splitting does not apply, but we may still apply LOD to the equation to reduce its complexity. Then any method for AREs may be applied to solve this smaller problem, such as Newton-Kleinman ADI [4], rational Krylov subspace methods [31], or RADI [3]. See also [7] for a survey. Clearly, for each of these cases one would have to perform an error analysis such as the one provided in this paper.
Acknowledgments
We are grateful to Fredrik Hellman for his assistance with the code for computing the LOD bases.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] H. Abou-Kandil, G. Freiling, V. Ionescu, and G. Jank , Matrix Riccati equations , Systems & Control: Foundations & Applications, BirkhĂ€user, Basel, 2003, https://doi.org/10.1007/978-3-0348-8081-7 . · doi â
- 2[2] R. E. Bank and H. Yserentant , On the H 1 superscript đ» 1 H^{1} -stability of the L 2 subscript đż 2 L_{2} -projection onto finite element spaces , Numer. Math., 126 (2014), pp. 361â381, https://doi.org/10.1007/s 00211-013-0562-4 . · doi â
- 3[3] P. Benner, Z. BujanoviÄ, P. KĂŒrschner, and J. Saak , RADI: a low-rank ADI-type algorithm for large scale algebraic Riccati equations , Numer. Math., 138 (2018), pp. 301â330, https://doi.org/10.1007/s 00211-017-0907-5 . · doi â
- 4[4] P. Benner, J.-R. Li, and T. Penzl , Numerical solution of large-scale Lyapunov equations, Riccati equations, and linear-quadratic optimal control problems , Numer. Linear Algebra Appl., 15 (2008), pp. 755â777, https://doi.org/10.1002/nla.622 . · doi â
- 5[5] P. Benner and H. Mena , Rosenbrock methods for solving Riccati differential equations , IEEE Trans. Automat. Control, 58 (2013), pp. 2950â2956, https://doi.org/10.1109/TAC.2013.2258495 . · doi â
- 6[6] P. Benner and H. Mena , Numerical solution of the infinite-dimensional LQR problem and the associated Riccati differential equations , J. Numer. Math., 26 (2018), pp. 1â20, https://doi.org/10.1515/jnma-2016-1039 . · doi â
- 7[7] P. Benner and J. Saak , Numerical solution of large and sparse continuous time algebraic matrix Riccati and Lyapunov equations: a state of the art survey , GAMM-Mitt., 36 (2013), pp. 32â52, https://doi.org/10.1002/gamm.201310003 . · doi â
- 8[8] A. Bensoussan, G. Da Prato, M. C. Delfour, and S. K. Mitter , Representation and control of infinite dimensional systems , Systems & Control: Foundations & Applications, BirkhĂ€user Boston, Inc., Boston, MA, second ed., 2007, https://doi.org/10.1007/978-0-8176-4581-6 . · doi â
