Spectral Analysis of Saddle-point Matrices from Optimization problems with Elliptic PDE Constraints
Fabio Durastante, Isabella Furci

TL;DR
This paper characterizes the spectral properties of saddle-point matrices from PDE-constrained optimization problems, revealing a GLT structure that leads to improved preconditioning strategies for iterative solvers.
Contribution
It uncovers the GLT structure in these matrices, enabling sharper spectral analysis and the development of optimal preconditioners for iterative methods.
Findings
Identification of the GLT structure in saddle-point matrices
Sharper spectral characterization of the matrices
Development of optimal preconditioners for GMRES and Flexible-GMRES
Abstract
The main focus of this paper is the characterization and exploitation of the asymptotic spectrum of the saddle--point matrix sequences arising from the discretization of optimization problems constrained by elliptic partial differential equations. We uncover the existence of a hidden structure in these matrix sequences, namely, we show that these are indeed an example of Generalized Locally Toeplitz (GLT) sequences. We show that this enables a sharper characterization of the spectral properties of such sequences than the one that is available by using only the fact that we deal with saddle--point matrices. Finally, we exploit it to propose an optimal preconditioner strategy for the GMRES, and Flexible-GMRES methods.
| 10 | 74 | 100 | 26 | |
| 20 | 353 | 400 | 47 | |
| 40 | 1421 | 1600 | 179 | |
| 80 | 5694 | 6400 | 706 |
| GMRES | FGMRES+PCG+IC | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| N | IT | T(s) | IT | T(s) | IT | T(s) | IT | T(s) | IT | T(s) | |
| 1.0e-03 | 147 | - | 3 | 3.0e-03 | 3 | 2.5e-03 | 3 | 4.4e-03 | 3 | 4.5e-03 | |
| 675 | - | 3 | 6.4e-03 | 3 | 3.7e-03 | 3 | 4.7e-03 | 3 | 4.6e-03 | ||
| 2883 | - | 3 | 1.0e-02 | 3 | 9.9e-03 | 3 | 7.3e-03 | 3 | 7.3e-03 | ||
| 11907 | - | 2 | 3.1e-02 | 2 | 3.0e-02 | 2 | 2.3e-02 | 2 | 2.3e-02 | ||
| 48387 | - | 2 | 2.1e-01 | 2 | 1.7e-01 | 2 | 1.5e-01 | 2 | 1.5e-01 | ||
| 195075 | - | 2 | 9.4e-01 | 2 | 9.0e-01 | 2 | 7.3e-01 | 2 | 7.4e-01 | ||
| 783363 | - | 1 | 2.1e+00 | 1 | 2.0e+00 | 1 | 2.9e+00 | 1 | 2.9e+00 | ||
| 1.0e-06 | 147 | - | 15 | 4.3e-03 | 15 | 4.1e-03 | 15 | 1.5e-03 | 15 | 1.5e-03 | |
| 675 | - | 14 | 1.3e-02 | 14 | 1.2e-02 | 14 | 1.7e-02 | 14 | 1.7e-02 | ||
| 2883 | - | 9 | 3.0e-02 | 9 | 3.0e-02 | 10 | 2.4e-02 | 10 | 2.4e-02 | ||
| 11907 | - | 6 | 1.0e-01 | 6 | 9.3e-02 | 6 | 7.0e-02 | 6 | 7.4e-02 | ||
| 48387 | - | 4 | 3.1e-01 | 4 | 3.0e-01 | 4 | 2.5e-01 | 4 | 2.1e-01 | ||
| 195075 | 86 | 3.8e+00 | 2 | 8.7e-01 | 2 | 8.4e-01 | 2 | 7.8e-01 | 2 | 7.6e-01 | |
| 783363 | 80 | 3.0e+01 | 2 | 4.3e+00 | 2 | 4.3e+00 | 2 | 4.5e+00 | 2 | 4.6e+00 | |
| 1.0e-09 | 147 | - | 27 | 9.6e-03 | 27 | 8.7e-03 | 27 | 3.2e-03 | 27 | 3.4e-03 | |
| 675 | - | 54 | 6.3e-02 | 54 | 5.8e-02 | 54 | 7.9e-02 | 54 | 7.9e-02 | ||
| 2883 | - | 52 | 2.0e-01 | 52 | 2.1e-01 | 52 | 1.6e-01 | 52 | 1.6e-01 | ||
| 11907 | - | 33 | 5.8e-01 | 33 | 6.1e-01 | 33 | 5.2e-01 | 33 | 5.5e-01 | ||
| 48387 | - | 20 | 1.5e+00 | 20 | 1.5e+00 | 43 | 4.8e+00 | 42 | 4.8e+00 | ||
| 195075 | 86 | 2.8e+00 | 33 | 1.3e+01 | 33 | 1.3e+01 | 37 | 3.0e+01 | 36 | 2.9e+01 | |
| 783363 | 80 | 3.0e+01 | 33 | 2.5e+01 | 33 | 2.5e+01 | - | - | |||
| GMRES preconditioned by | ||||||||
|---|---|---|---|---|---|---|---|---|
| 1.0e-09 | N | 147 | 675 | 2883 | 11907 | 48387 | 195075 | 783363 |
| IT | 4 | 5 | 6 | |||||
| T(s) | 1.0e-02 | 5.4e-03 | 1.7e-02 | - | - | - | - | |
| FGMRES | |||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| GMRES | PCG/BiCGstab+IC/ILU | ||||||||||
| N | IT | T(s) | IT | T(s) | IT | T(s) | IT | T(s) | IT | T(s) | |
| 1.0e-03 | 147 | - | 5 | 8.7e-01 | 5 | 4.7e-03 | 5 | 7.7e-03 | 5 | 7.0e-03 | |
| 675 | - | 5 | 9.6e-03 | 5 | 8.7e-03 | 5 | 7.1e-03 | 5 | 6.4e-03 | ||
| 2883 | - | 4 | 7.2e-02 | 4 | 2.7e-02 | 4 | 1.4e-01 | 4 | 9.5e-03 | ||
| 11907 | - | 3 | 8.6e-01 | 3 | 8.9e-02 | 3 | 4.8e-02 | 3 | 3.4e-02 | ||
| 48387 | - | 3 | 1.1e+00 | 3 | 4.4e-01 | 3 | 3.9e-01 | 3 | 2.5e-01 | ||
| 195075 | - | 2 | 1.7e+00 | 2 | 1.7e+00 | 2 | 1.7e+00 | 2 | 1.1e+00 | ||
| 783363 | - | 2 | 8.5e+00 | 2 | 8.9e+00 | 2 | 7.1e+00 | 2 | 7.4e+00 | ||
| 1.0e-06 | 147 | - | 24 | 1.5e-02 | 24 | 1.4e-02 | 24 | 2.9e-02 | 24 | 2.4e-02 | |
| 675 | - | 26 | 4.7e-02 | 26 | 4.6e-02 | 26 | 3.6e-02 | 27 | 3.3e-02 | ||
| 2883 | - | 24 | 1.7e-01 | 24 | 1.6e-01 | 24 | 7.2e-02 | 25 | 6.0e-02 | ||
| 11907 | - | 22 | 6.9e-01 | 22 | 7.0e-01 | 22 | 4.0e-01 | 24 | 2.9e-01 | ||
| 48387 | - | 19 | 2.8e+00 | 19 | 2.8e+00 | 19 | 2.5e+00 | 22 | 1.9e+00 | ||
| 195075 | - | 17 | 1.4e+01 | 17 | 1.4e+01 | 17 | 1.4e+01 | 18 | 1.2e+01 | ||
| 783363 | - | 14 | 5.9e+01 | 14 | 6.1e+01 | 14 | 7.9e+01 | 14 | 5.9e+01 | ||
| 1.0e-09 | 147 | - | 38 | 3.8e-02 | 38 | 3.5e-02 | 38 | 4.2e-02 | 38 | 4.4e-02 | |
| 675 | - | 73 | 1.5e-01 | 73 | 1.6e-01 | 73 | 1.2e-01 | 87 | 1.4e-01 | ||
| 2883 | - | 84 | 6.5e-01 | 73 | 6.5e-01 | 86 | 3.3e-01 | 73 | 3.7e-01 | ||
| 11907 | - | 94 | 3.5e+00 | 94 | 3.4e+00 | 97 | 2.1e+00 | 97 | 1.9e+00 | ||
| 48387 | - | 87 | 1.4e+01 | 87 | 1.4e+01 | 87 | 1.2e+01 | 87 | 1.1e+01 | ||
| 195075 | - | 77 | 6.8e+01 | 77 | 6.8e+01 | - | - | ||||
| 783363 | - | 66 | 5.2e+02 | 66 | 5.4e+02 | - | - | ||||
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Spectral Analysis of Saddle–point Matrices from Optimization problems with Elliptic PDE Constraints††thanks: This work was partially supported by INdAM-GNCS
project “Tecniche innovative per problemi di algebra lineare” (2018), and by the Tor Vergata University “MISSION: SUSTAINABILITY” project “NUMnoSIDS”, CUP E86C18000530005.
Fabio Durastante Istituto per le Applicazioni del Calcolo “Mauro Picone”. Consiglio Nazionale delle Ricerche, Napoli, Italy. ([email protected]).
Isabella Furci Department of Mathematics and Informatics. University of Wuppertal, Wuppertal, Germany. ([email protected]).
Abstract
The main focus of this paper is the characterization and exploitation of the asymptotic spectrum of the saddle–point matrix sequences arising from the discretization of optimization problems constrained by elliptic partial differential equations. We uncover the existence of an hidden structure in these matrix sequences, namely, we show that these are indeed an example of Generalized Locally Toeplitz (GLT) sequences. We show that this enables a sharper characterization of the spectral properties of such sequences than the one that is available by using only the fact that we deal with saddle–point matrices. Finally we exploit it to propose an optimal preconditioner strategy for the GMRES, and Flexible–GMRES methods.
keywords:
Saddle–point matrices, Optimal control, GLT theory, Preconditioning
{AMS}
62M15, 65F08, 15B05
1 Introduction
Linear systems with saddle–point matrices arises in a wide context of applications and have attracted a great deal of attention [5, 2]. In general form, they can be simply stated as the family of linear systems where the left–hand side is given by a block–matrices of the form
[TABLE]
We are interested here in the analysis of their spectral properties in the very specific context of the discretized version of optimal constraint problems [33]
[TABLE]
where, is a fixed constant that acts as a Tikhonov regularization parameter, is a cost functional, is the domain of both the state and the control , and and are two disjoint sets that represent the Dirichlet and Neumann boundary respectively and have the whole boundary as union.
Spectral properties of the general case (1.1) have been indeed thoroughly analyzed [26, 22, 6, 3, 23, 18, 31, 7] under several hypotheses on the blocks of , e.g., , semipositive definite, symmetric and positive definite, and so on. The goal of the latter works has been to provide a sharp localization bounds for their spectrum, and exploit them to devise efficient iterative solvers for such problems. Here we focus on a less general objective, i.e., we intend to exploit finer information on the structure of the blocks of (1.1), a knowledge coming from the coupling of the source problem (1.2) and its discretization, to give an asymptotic description of the spectrum of the matrices . Specifically, we show that the saddle–point form of obtained from (1.1) hides inside another structure, namely, that the sequence of matrices is a Generalized Locally Toeplitz (GLT) sequence [28, 16]. This enables us to obtain a sharper localization of its asymptotic spectrum. Furthermore, we use this characterization to suggest an effective preconditioning strategy for such problems. We stress that an approach of this type has already been exploited for both the saddle–point matrices obtained from a two–dimensional linear elasticity–type problem in [11], and partially explored in [10, 12] for a constrained optimization problem where the constraints were Fractional Differential Equations.
The paper is therefore divided as follows, in Section 2 we describe the discrete form of (1.2) fully specifying the sequence of matrices . In Section 3 we recall the essential tools needed for working with GLT sequences and apply them to our problem, while in Section 4 we exploit them to devise an efficient preconditioning strategy. In Section 5 we substantiate our claims with some numerical examples, and give conclusions in Section 6.
2 From the Continuous Problem to the Saddle–point sequence
The first point we need to answer is how we obtain the sequence of saddle–point matrices from (1.2), indeed a way of doing so is going through its Langrangian formulation. Thus, we find the Lagrangian of (1.2) as
[TABLE]
where represents the PDE constraint as an operator between the Banach spaces and , and is the Adjoint status between the space and its dual acting as Lagrange multiplier. Indeed, a solution for the original constrained optimization problem (1.2) is a stationary point for the Lagrangian (2.3). To obtain such stationary point we require that the Gâteaux derivative with respect to each of the variables of (2.3) is zero, i.e.,
[TABLE]
These are called, in general, the first order optimality conditions or the Karush-Kuhn-Tucker conditions (KKT-conditions) for Problem (1.2). Finally, for obtaining such characterization we have to fully specify the operator , and consequently all the functional spaces , and . The prototypical elliptic problem in this class is represented by the Poisson distributed control
[TABLE]
where represents the forcing term.
The KKT conditions for problem (2.4) are expressed as
[TABLE]
By posing and choosing we can rewrite conditions (2.5) in weak form as:
[TABLE]
Finally, the sequence is obtained by fixing a Finite Element (FEM) approximation of the optimality system (2.6). This means fixing a space with over a mesh on the domain thus obtaining the linear system
[TABLE]
where
[TABLE]
are the usual (scaled) mass and stiffness matrices, and is the zero matrix of order .
2.1 Triangular Lagrangian Elements
To completely specify the linear system (2.7) we need to precise both the mesh and the basis functions , i.e., chose the element defining our discretization. We focus here on nodal Lagrangian elements [9, Chapter 5] of degree . These are built starting from , the vector space of polynomials with scalar coefficients of in of degree less than or equal to ,
[TABLE]
That is indeed a vector space of dimension . Then an homogeneous triangulation of the unit square domain is considered, i.e., a mesh consisting in 2D triangular cells with straight sides, and a lattice of nodes on each triangle; see Figure 1.
By this construction, every polynomial is uniquely determined by its values at the points . The finite element method for triangular Lagrange elements is then built on the discrete finite dimensional space
[TABLE]
and its subspace
[TABLE]
We call degrees of freedom of a function the set of the values of at the nodes on the entire mesh, then the space has exactly the dimension corresponding to the number of internal degrees of freedom, i.e., excluding the nodes on . For our model grid we find that the degrees of freedom are , where and are the number of elements in the and direction, respectively. Thus the dimension of the matrix in (2.8) will be equal to . The matrices (2.8) are then constructed by means of the opportune Gauss quadrature formulas, and in terms of the Lagrange basis functions . For all the discussion, and computation in the paper we deal with the matrices generated for such elements by the FEniCS library (v.2018.1.0) [1, 21].
3 Spectral analysis of the resulting sequence of saddle point matrices
This section is devoted to the attainment of a characterization of the spectra of a suitable scaling of the sequence of matrices in (2.7). Specifically, we are going to answer to the following questions,
- Q1
can we individuate some (possibly sharp) intervals containing the spectrum with respect to ? 2. Q2
For a given how many eigenvalues are in each interval? 3. Q3
What is the relation between the condition number of a suitably preconditioned matrix sequence and the value of the regularization parameter ?
As we mentioned in the introduction, there exist classical localization results for the eigenvalues of a symmetric saddle–point matrix, like the in (1.1).
Theorem 1** (Rusten and Winther [26]).**
Given in (1.1), assume is symmetric and positive definite, has full rank, and . Let and denote the largest and smallest eigenvalues of , and let and denote the largest and smallest singular values of . Then the spectrum of is contained in
[TABLE]
where
[TABLE]
This bound is indeed very general and versatile, since it requires only information on the symmetry/definiteness of the diagonal blocks, and on the rank of the extradiagonal ones. It can be used to obtain an estimate of the condition number of as function of in a straightforward way. To this end, an even sharper result can be obtained by means of [3, Theorem 1(c)] that permits to characterize exactly the eigenvalues with the largest and the smallest module. Nevertheless, by exploiting further information on the blocks, we show that finer answers to our question are indeed possible. Specifically, we are going to individuate three disjoint intervals , , and containing the spectrum of the scaled version of , we show that this choice is not arbitrary, and that it stems directly from the structure of the problem, and the selection of the discretization scheme.
In Section 3.1, we start recalling the tools we use, and then we deploy them to achieve these results in Section 3.2.
3.1 Background and definitions
Throughout this paper, we use the following notation. Let be the linear space of the complex matrices and let , with , , measurable set. We say that f belongs to (resp. is measurable) if all its components belong to (resp. are measurable). We denote by the -dimensional cube and define as the linear space of -variate functions , .
Moreover we indicate by , or simply , the matrix sequence whose elements are the matrices of dimensions , with , .
Definition 1**.**
Let the Fourier coefficients of a given function be defined as
[TABLE]
where and the integrals in (3.9) are computed componentwise.
Then, the th Toeplitz matrix associated with f is the matrix of order given by
[TABLE]
where and is the matrix whose th entry equals 1 if and [math] otherwise.
The set (with ) is called the family of -level Toeplitz matrices generated by f, that in turn is referred to as the generating function or the symbol of .
Moreover from (3.9) the symbol can be expressed via the Fourier series
[TABLE]
In order to deal with low–rank/small–norm perturbations and to show that they do not affect the symbol of a Toeplitz sequence, we introduce the definition of spectral distribution in the sense of the eigenvalues and of the singular values for a generic matrix-sequence , , and then the notion of GLT algebra.
Definition 2**.**
Let be a measurable function, defined on a measurable set with , . Let be the set of continuous functions with compact support over and let , , be a sequence of matrices with eigenvalues , and singular values , .
- •
is distributed as the pair in the sense of the eigenvalues, in symbols
[TABLE]
if the following limit relation holds for all :
[TABLE]
- •
is distributed as the pair in the sense of the singular values, in symbols
[TABLE]
if the following limit relation holds for all :
[TABLE]
In this setting the expression means that every component of the vector tends to infinity, that is, .
Remark 1**.**
We denote by and by the eigenvalues and the singular values of a matrix-valued function f, respectively. If f is smooth enough, an informal interpretation of the limit relation (3.12) (resp. (3.13)) is that when the matrix-size of is sufficiently large, then eigenvalues (resp. singular values) of can be approximated by a sampling of (resp. ) on a uniform equispaced grid of the domain . Analogously each following eigenvalues (resp. singular values) can be approximated by an equispaced sampling of the relative (resp. ), , in the domain.
Remark 2**.**
To perform the sampling in Remark 1 computing a closed analytical expression of any of the eigenvalue functions of f is not the most effective procedure. It is costly and, essentially, useless since for we can provide an “exact” evaluation of at the grid points without actually computing the analytical expression. Indeed the “exact” evaluation for case is achieved by
sampling at , , and thus obtain matrices, ; 2. 2.
for each , compute the eigenvalues of , , ; 3. 3.
for a fixed , the evaluation of at , is given by , .
3.1.1 Spectral analysis of Hermitian (block) Toeplitz sequences: distribution results
We collect here some classical results concerning the distribution of Hermitian (block) Toeplitz sequences from [19, 32], that we will use extensively in the following.
Theorem 2** (Grenander and Szegő [19]).**
Let be a real-valued function with . Then,
[TABLE]
In the case where f is a Hermitian matrix-valued function, according to Tilli [32], the previous theorem can be extended as follows:
Theorem 3** (Tilli [32]).**
Let be a Hermitian matrix-valued function with . Then,
[TABLE]
Remark 3**.**
If is such that each is symmetric with real symmetric blocks, then the symbol has the additional property that
[TABLE]
and therefore Theorem 3 can be restated as
[TABLE]
3.1.2 GLT sequences: operative features
We list here some properties and operative features from the theory of GLT sequences in their block form; refer to [29, 15, 17] for a full account of the GLT theory.
GLT1
Each GLT sequence has a singular value symbol for according to the second Item in Definition 2 with . If the sequence is Hermitian, then the distribution also holds in the eigenvalue sense. If has a GLT symbol we will write .
GLT2
The set of GLT sequences form a -algebra, i.e., it is closed under linear combinations, products, inversion (whenever the symbol is singular, at most, in a set of zero Lebesgue measure), and conjugation. Hence, the sequence obtained via algebraic operations on a finite set of given GLT sequences is still a GLT sequence and its symbol is obtained by performing the same algebraic manipulations on the corresponding symbols of the input GLT sequences.
GLT3
Every Toeplitz sequence generated by an function is a GLT sequence and its symbol is f, with the specifications reported in item GLT1. We note that the function f does not depend on the space variables .
GLT4
Every sequence which is distributed as the constant zero in the singular value sense is a GLT sequence with symbol [math]. In particular:
- •
every sequence in which the rank divided by the size tends to zero, as the matrix size tends to infinity;
- •
every sequence in which the trace-norm (i.e., sum of the singular values) divided by the size tends to zero, as the matrix size tends to infinity.
GLT5
If and the matrices are such that , where
- •
every is Hermitian,
- •
the spectral norms of and are uniformly bounded with respect to ,
- •
the trace-norm of divided by the matrix size converges to 0,
then the distribution holds in the eigenvalue sense.
We highlight that from the previous properties follows that a sequence of Toeplitz matrices is, up to low-rank corrections, a GLT sequence whose symbol is not affected by the low-rank perturbation.
Theorem 4**.**
[16, Section 8.4]** Let be a sequence of Hermitian matrices such that , and let be a sequence of Hermitian positive definite matrices such that and a.e. Then
[TABLE]
3.2 Spectral Analysis of the Sequence
We can now use the introduced tools to perform the spectral analysis of the matrix sequence , assuming that , . For studying it is easier to consider the equivalent distribution given by the following symmetric diagonal scaling
[TABLE]
with
[TABLE]
From the discretization of the Section 2, the elements of the matrix depend on as . Hence, the effect of the proposed scaling permits to eliminate the dependence of of the elements in , which, for large, would make the matrix ill-conditioned.
In particular the matrices , are bi-level Toeplitz matrices with generating functions
[TABLE]
and
[TABLE]
We stress that in this case the matrices and are real and symmetric. A property that we will exploit the theoretical analysis, nevertheless we keep the notation for the (1,3) block of the matrix for two reasons. On one side, for being consistent with the continuous setting, in which the adjoint is usually explicitly expressed. On the other, to keep the analogy with Section 3.3 in which we will discuss the usage of the advection-diffusion equation as constraint.
Theorem 5**.**
The matrix sequence in (3.14) is distributed in the sense of the Eigenvalues as
[TABLE]
i.e., , where
[TABLE]
Proof.
Let , be the th column of the identity matrix of size , we can define a proper permutation matrix, , , such that the th column of , is . The matrix transforms as
[TABLE]
where
- •
is the bi-level block Toeplitz generated by as in (3.11),
- •
is a small-norm matrix, with constant depending on the bandwidths of and .
This is a congruence transformation, thus if we find the distribution of the sequence we found also the distribution for the sequence . Let us observe that the nonzero entries of correspond to the indexes satisfying
[TABLE]
as shown in equation (3.20), for we find
[TABLE]
Therefore, from (3.11), the generating function f is given by the finite sum
[TABLE]
where , that is f is a linear trigonometric polynomial in the variables and with matrix coefficients from (3.18). Moreover, using the equalities in (3.18), the symbol in (3.21) can be readily simplified as
[TABLE]
Note, from the latter, that
[TABLE]
thus f is a symmetric matrix-valued function which implies that is a symmetric matrix. By Theorem 3, we conclude that
[TABLE]
While, from GLT3, we know that is a GLT sequence with symbol f. Moreover, let us observe that is a zero–distributed sequence hence . Indeed, is the permutation of a matrix that in block position (1,1) collects all the terms that contains the scaling , deriving from the (1,1) block of , and [math] anywhere else. Then it can be written as
Since the trace norm of is equal to a constant independent on , we have
[TABLE]
and hence the zero–distribution follows from GLT4. In addition, from GLT1 and the fact that is Hermitian, .
The conclusion of the Theorem is then achieved by applying GLT2 and (3.22), since this proves that is a GLT sequence with symbol , i.e., . Consequently, by recalling that is real symmetric for every and using GLT1, we deduce that the distribution result holds in the sense of the eigenvalues
[TABLE]
Furthermore, since each is symmetric and its blocks are symmetric and real, then f is such that , and therefore (3.23) can be rephrased as
[TABLE]
We can now find a first answer to the questions Q1 and Q2. For sufficiently large, let
[TABLE]
be the eigenvalues of from (3.19), i.e., of . By Remark 1, with , and equation (3.24), we discover that eigenvalues of , up to a number of outliers infinitesimal in the dimension, can be approximated by a sampling of on an opportune grid (see the following discussion). The next on the second one and the last on the sampling of . Moreover, obtaining the following proposition, as a specialized version of Theorem 1, is straightforward.
Proposition 1**.**
Let and be the essential infimum and essential supremum of respectively, for . Then, for sufficiently large, the spectrum of the matrix sequence is contained in three intervals
[TABLE]
*for . *
Proof.
From the definition of f in (3.17), , and matching with the classical analysis for saddle–point matrices in Theorem 1, we find
[TABLE]
i.e.,
[TABLE]
and
[TABLE]
From [27, Theorem 2.3], we know that the thesis holds true for and, from the relation of Theorem 5, we have that asymptotically the inclusion in (3.27) is valid, also involving the small norm correction.
To deliver an actual numerical estimate for these bounds what we need is a reasonable approximation of the eigenvalue functions , , following the procedure from Remark 2 and exploiting Theorem 5, we define the following equispaced grid on
[TABLE]
and consider the following Hermitian matrices of size
[TABLE]
Ordering in ascending way the eigenvalues of
[TABLE]
for any , an evaluation of at is given by , . For a fixed , we denote the vector of all eigenvalues , as , i.e.,
[TABLE]
and by the vector of all eigenvalues , varying , i.e.,
[TABLE]
Note that, refining the grid by increasing , we can provide the evaluation of the eigenvalue functions of f in a larger number of grid points: numerical evidences of this fact are reported in Figure 2,
in which we compare the approximation of on , contained in (ordered in ascending way) with the approximation of the same eigenvalue function on a grid that is twice as fine , contained in (ordered in ascending way as well) for every .
Then, for sufficiently large, if we order in ascending way , its extremes satisfy the following relations
[TABLE]
and we can can compute a satisfactory approximation of the from Proposition 1, e.g., by setting , and , we obtain the following approximations
[TABLE]
This clearly matches with the fact that the matrix–valued symbol is analytically singular in , i.e.,
[TABLE]
hence , nevertheless we stress again that this is not in contradiction with the fact that is non singular.
In conclusion, we can exploit Remark 1, to provide an answer to Q2 determining how many eigenvalues are asymptotically contained in each of the three blocks. According to the relations (3.24), (3.26) we expect the eigenvalues of to verify
[TABLE]
and then to identify blocks
[TABLE]
Correspondingly, we can split the vector containing the sampling of the eigenvalue functions on as follows
[TABLE]
We stress again that (3.29) allows for a number of outliers that is infinitesimal in the dimension .
For example, for (), approximately eigenvalues should be in each block, by a straightforward numerical check one obtains
[TABLE]
Therefore, we expect from that a certain number of eigenvalues of are in none of the blocks; in the example the effective eigenvalues against the expected in the second block. This is confirmed again by Figure 3 in which we highlight represent in blue the whole spectrum of and highlight in black the outliers not belonging to the blocks.
On the other hand, such a phenomenon is in line with (3.29), since the order of what is missing/exceeding is infinitesimal in the dimension . As an example, in Table 1 we compare the actual number of eigenvalues of contained in the second interval with the expected number . In such way, we succeed in counting the outliers of in , whose cardinality behaves as .
A further and more natural evidence of relation (3.24) can be obtained by comparing block by block the eigenvalues of with the sampling of the eigenvalue functions of f, that is comparing Bl1, Bl2, Bl3, with Eval1, Eval2, Eval3, respectively. Indeed we want to compare the eigenvalues of (properly ordered) with the evaluation of at , using the values that are present in the blocks of .
More precisely, we compare the elements of Evalt with the elements of Blt by means of the following matching algorithm:
- •
save the couples of to which the elements of Evalt are associated with;
- •
for a fixed find such that
[TABLE]
- •
associate to the couple corresponding to .
Making use of the previous algorithm, in Figure 4, we compare the eigenvalues of with , displayed as a mesh on , for . The eigenvalues of mimic, up to some outliers shown in the Figure 4b, the sampling of the eigenvalue functions, numerically confirming the result given in Theorem 5.
3.3 From Poisson to advection-diffusion equations
We have built the whole construction using as constraint the Poisson differential equation, this is not restrictive since the analysis can be transparently extended to encompass constraints given by a generic elliptic differential equations, i.e.,
[TABLE]
The matrix sequence (2.7) maintains the same block structure, but with a different (1,3) and (3,1) block . The latter, whenever , is no more symmetric since the new constraint is no more self–adjoint. Specifically, the new block can be decomposed into the sum of three terms,
[TABLE]
with . Therefore, the relative scaled version is given by
[TABLE]
By means of a GLT perturbation argument from Section 3.1, and exploiting the analysis in [17, Section 7.4] for the presence of lower order differential terms, we can obtain again a characterization of the eigenvalues of in (3.32) that is analogous to the one we gave in Theorem 5.
Proposition 2**.**
*The matrix sequence from (3.32) is distributed in the eigenvalue sense as the matrix–valued function from Theorem 5. *
Proof.
Follows from Theorem 5, the techniques adopted in its proof, and from GLT5 applied to , where
[TABLE]
4 An optimal preconditioning strategy
In this section we analyze an effective procedure to precondition the GMRES method for the solution of the systems (3.14), and (3.32). There exist indeed many preconditioners for the linear systems of saddle–point type exploiting their block structure, see, e.g, the review [5] the comparisons in [2], and, more specifically, the approaches described in [4, 24, 25, 20]. What we present here belongs to this class, and is built with the objective of obtaining algorithmic scalability, i.e., independence of the number of iteration from , and optimality with respect to the parameter , i.e., independence of the number of iteration also with respect to it. To achieve this kind of results the classical techniques can be broadly divided into three classes, the case of definite Hermitian preconditioners for which it is possible to retrieve a cluster of the eigenvalue sense from a cluster of the singular values [30, 24, 4], that allows also for the use of the MINRES method; the case of the indefinite Hermitian preconditioners, and non Hermitian preconditioner [25, 20]. We focus here on the last approach, while benefiting both from the spectral distribution of the sequence and of the Sections 3.2, 3.3, and from the block form of the matrices and . Specifically, we propose the following preconditioner
[TABLE]
This is clearly an indefinite, and non Hermitian matrix, nevertheless, the linear systems involving it can be easily solved by the following back–substitution procedure:
Solve ; 2. 2.
Solve ; 3. 3.
Solve .
We stress that this does not require the approximation of any of the possible Schur complements of (), thus greatly simplifying the construction of the preconditioner. Moreover, we are going to prove now that this choice provides a strong cluster at 1 for the eigenvalues of the preconditioned linear system while obtaining also the independence of . We obtain this result in two steps by means of the GLT theory showing that the matrix sequence is distributed in the sense of the eigenvalues as 1. First, in Proposition 3, we show that the eigenvalues of the preconditioned matrix are either , or the generalized eigenvalues of an auxiliary problem, then, in Lemma 1, we prove that the matrix sequence associated to the latter is indeed distributed in the eigenvalue sense as the function 1, thus obtaining that the eigenvalues of the preconditioned system are strictly clustered at .
Proposition 3**.**
Let () be the coefficient matrix in (3.14) (respectively in (3.32)), and let be the associated preconditioner from (4.33). Then, the eigenvalues of the preconditioned matrix are
- •
* for ,*
- •
* for given by the solution of the generalized eigenvalue problem*
[TABLE]
with
Proof.
For each , is an eigenvalue of the matrix if is an eigenpair of the eigenvalue problem
[TABLE]
with
[TABLE]
That is is solution of
[TABLE]
It is clear from the second and the third “block” equations that is an eigenpair for the latter problem for all the vectors in the subspace of
[TABLE]
Otherwise, if , from the third “block” equation
[TABLE]
follows
[TABLE]
And thus, by substitution, we easily find
[TABLE]
and thus the remaining eigenpairs are given by the solution of
[TABLE]
Lemma 1**.**
The matrix sequence
[TABLE]
associated to the generalized eigenvalue problem
[TABLE]
*is distributed in the eigenvalue sense as over . *
Proof.
The statement is equivalent to
[TABLE]
since, from (3.15) and (3.16), we have that and are the symmetric and positive definite matrices and , respectively.
Moreover the sequence \biggl{\{}\frac{h^{4}}{\alpha}T_{\bf n}(m)\biggr{\}}_{\bf n} is distribuited in the singular value sense as [math] over . Hence from property GLT4 plus properties GLT2-GLT3 we have that the following GLT results hold:
[TABLE]
and
[TABLE]
Exploiting again GLT 2–GLT4 we obtain that
[TABLE]
and
[TABLE]
Since the matrix is positive definite, then Theorem 4 implies
[TABLE]
and, hence, the thesis.
Remark 4**.**
Let us stress that the conclusion in Lemma 1 is again an asymptotic result for that is then valid for a fixed value of the parameter . Furthermore, it permits also an answer to Q3 characterizing the condition number of the preconditioned matrix sequence. Specifically, if we let be the matrix of the generalized eigenvectors for the pencil , i.e., if is an invertible matrix such that
[TABLE]
then we find
[TABLE]
It is then straightforward to use (3.15) and (3.16) to estimate the maximum eigenvalues of the generalized eigenvalue problem in Proposition 3 as an . This means that the asymptotic regime described in Lemma 1 is evident whenever becomes smaller than the fixed value of of the given problem.
We can now answer to question Q3 for both the matrix sequences , and of the Subsection 3.3, where in the definition of the preconditioner (4.33) plays the same role of .
Theorem 6**.**
*The matrix sequences , independently of . *
Moreover, an analogous spectral result to Theorem 6 can be given for the sequence (respectively, ), for
[TABLE]
Theorem 7**.**
*The matrix sequences , independently of . *
Proof.
The proof follows the proofs of the Proposition 3 and Lemma 1, replacing the expression of with that of .
This is indeed an example of a block–counter–triangular preconditioner in the style of [4].
Remark 5**.**
The preconditioner proposed in [4] takes the lower anti–triangular part of a different permutation of the system matrix , and considers also a different scaling. By this approach, the term that is dropped out in the preconditioner is not a correction of “small” norm, and this makes a substantial difference in the performances of the two approaches. Specifically, comparing the results of Proposition 3, with [4, Theorem 3.1], it is straightforward to observe that in the latter case it is not possible to infer a cluster of the eigenvalues of the preconditioned system, specifically, for the rearranged system
[TABLE]
The non-unit eigenvalues are the one of the matrix sequence , for which the clustering at one cannot be concluded. Similar observation can be made also for the null–space based block anti–triangular preconditioners [24] arising from the block anti–triangular factorization of the saddle–point matrix. Furthermore, one could consider the preconditioner which neglects the (3,2) block of , avoiding the reordering and the scaling. This would bring to the case where the non-unit eigenvalues are the solution of the following generalized eigenvalue problem
[TABLE]
and, then, we have a behavior analogous to the case with preconditioner (i.e., the absence of a provable cluster of the preconditioned sequence). Precisely, the non-unit eigenvalues are of the form , where, are the reciprocal of the eigenvalues of the matrix sequence .
4.1 Approximate iterative solution of the auxiliary linear systems
The application of the proposed preconditioners requires the solution of auxiliary linear systems with the matrices ,, and or, respectively, , , and obtained from (4.33). In both cases we are dealing with very common linear systems for which there exist highly efficient and specific solvers, e.g., fast Poisson solvers, multigrid methods of geometric, and algebraic type, inner–outer Krylov solver with incomplete factorization preconditioner, and several combinations of all the previous. Potentially, any optimal preconditioner for these matrices could be included in the present framework without spoiling the overall construction, the actual choice is indeed a matter of computational framework; see, e.g., [8, Chapter 3.8]. For the solution of the systems involving the mass matrix a straightforward solution is using the unpreconditioned CG method or its preconditioned version. In the latter case, we use either a modified incomplete Cholesky factorization with drop–tolerance 1e-2 or a standard algebraic multigrid. We stress that the solution of the system involving the stiffness matrix can be machine-dependent; see, e.g., Figure 5. We easily observe that the fastest solution with the required accuracy for the system involving the is obtained by using the PCG with a standard AMG preconditioner. On the other hand, for the non symmetric case we can use the BiCGstab method together with a modified incomplete LU factorization of Crout type.
Nevertheless, as we discuss in the next Section 5, the time–efficiency in the auxiliary solve it is not so crucial, observe that already the direct method gives acceptable results under this aspect. What really matters is the combination of the achieved accuracy of the auxiliary solve with the presence, and the possible accumulation, of the factor in the right–hand side of the auxiliary linear systems. This will cause for their solution by a direct method to return better performances for the lowest value of .
5 Numerical Examples
In this section we test the application of the preconditioners analyzed in Section 4 on some test problems. All the numerical tests are made on a laptop running Linux with 8 Gb memory and CPU Intel® Core™ i7–4710HQ CPU with clock 2.50 GHz and MATLAB version 9.4.0.813654 (R2018a). We recall again that all the relevant matrices and right–hand sides are generated by means of the FEniCS library (v.2018.1.0) [1, 21]; see again Section 2 for the details.
We test the solution procedure with the un–restarted GMRES method set to achieve a tolerance on the residual of tol = 1e-6, and a maximum number of iteration maxit = 100, and measure the number of iterations, and the timings in second. As test problem we consider an instance of a Poisson control problem (2.4), and one with the diffusion–advection–reaction constraint from Section 3.3.
Poisson
The first test problem is an instance of the Poisson control problem (2.4), in which we want to obtain the desired state,
[TABLE]
while using the forcing term
[TABLE]
We test the solution for regularization parameter , and collect the results in Table 2. The approximate preconditioners are applied inside the Flexible–GMRES method as discussed in Section 4.1.
What we observe is that the approximate solution are at an advantage for the higher value of , while perform poorly for the smallest . We stress that this effect is more connected to the behavior of the accuracy in the computation of the Krylov vectors inside the FGMRES method, than to the optimal behavior of the auxiliary problems. Secondarily, what we observe is indeed the optimal behavior with respect to the iteration discussed in Theorem 7. Indeed, the preconditioning routine becomes asymptotically better with the size of the problem, i.e., we get fewer iteration for bigger problems. Moreover, the decreasing of the introduces just a latency effect in the solution, i.e., the asymptotic regimes kicks in for slightly bigger problems when is smaller, we stress that this is exactly the phenomenon described in Remark 4 regarding the asymptotic relation between the value of going to zero, and the value of being fixed independently of . To overcome this limitation, one could decouple the system by neglecting the matrix , i.e., the block in (2.7), thus obtaining the preconditioner
[TABLE]
By computation analogous to the one in Remark 5, we find that the non-unit eigenvalues for this preconditioner are the ones of the matrix sequence . The non-unit eigenvalues tend to cluster at one whenever goes to zero. This means that is efficient for small values of and moderate values of and worsen for diverging values of (keeping fixed ), indeed this is confirmed by the numerical test in Table 3.
Diffusion–Convection–Reaction
The second case we consider is the problem (1.2) in which the costraint is given by the Equation (3.31), with coefficients , and . The desired state is given by the sum of the two impulses
[TABLE]
while the forcing term is given by
[TABLE]
We test the solution for regularization parameter , and collect the results in Table 4.
The results are completely analogous to the one for the Poisson case. We observe a higher number of iteration that is due to the fact that we are using an asymptotic argument both for the sequence , and for its block; see Proposition 2, and the discussion in Remark 4 for the asymptotic relationship between , and .
6 Conclusions and future developments
In this paper we have produced a characterization for the saddle–point matrices arising from the application of the discretize–then–optimize approach to quadratic optimization problems with elliptic PDE constraints highlighting the presence of an hidden Generalized Locally Toeplitz structure, i.e., we have proposed an analysis that is sharper and more informative than the one that can be obtained by looking only at the saddle–point structure. We have produced a localization of the spectrum in three intervals, up to a number of outliers infinitesimal in the dimension of the problem, and used this characterization to produce an asymptotically optimal preconditioner, i.e., a preconditioner that is independent of the value of the regularization parameter , and whose performance increases for finer grids.
We plan to extend this analysis in order that it can cover more general constraints, i.e., we would like to discuss also the case of sparse optimization, and bounded controls. Moreover, the GLT spectral analysis techniques we are using have been recently extended for becoming tools for the fast and reliable computation of generalized eigenvalues see, e.g., [13, 14], since we have analyzed the structure of the eigenvectors of our preconditioned problems (Proposition 3), we plan to investigate the possible application of deflation techniques to further accelerate our iterative methods.
Acknowledgment. We are thankful to Prof. S. Serra–Capizzano for the insightful discussions on the spectral distribution results, and to the referee whose suggestion have been extremely helpful in improving the presentation of the material.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] Martin S. Alnæs, Jan Blechta, Johan Hake, August Johansson, Benjamin Kehlet, Anders Logg, Chris Richardson, Johannes Ring, Marie E. Rognes, and Garth N. Wells. The F Eni CS Project Version 1.5. Archive of Numerical Software , 3(100), 2015.
- 2[2] Owe Axelsson, Shiraz Farouq, and Maya Neytcheva. Comparison of preconditioned Krylov subspace iteration methods for PDE-constrained optimization problems: Poisson and convection-diffusion control. Numer. Algorithms , 73(3):631–663, 2016.
- 3[3] Owe Axelsson and Maya Neytcheva. Eigenvalue estimates for preconditioned saddle point matrices. Numer. Linear Algebra Appl. , 13(4):339–360, 2006.
- 4[4] Zhong-Zhi Bai. Block preconditioners for elliptic PDE-constrained optimization problems. Computing , 91(4):379–395, 2011.
- 5[5] Michele Benzi, Gene H. Golub, and Jörg Liesen. Numerical solution of saddle point problems. Acta Numer. , 14:1–137, 2005.
- 6[6] Michele Benzi and Valeria Simoncini. On the eigenvalues of a class of saddle point matrices. Numer. Math. , 103(2):173–196, 2006.
- 7[7] Luca Bergamaschi. On eigenvalue distribution of constraint-preconditioned symmetric saddle point matrices. Numer. Linear Algebra Appl. , 19(4):754–772, 2012.
- 8[8] Daniele Bertaccini and Fabio Durastante. Iterative methods and preconditioning for large and sparse linear systems with applications . Monographs and Research Notes in Mathematics. CRC Press, Boca Raton, FL, 2018.
