An inexact iterative Bregman method for optimal control problems
Frank Pörner

Abstract
In this article we investigate an inexact iterative regularization method based on generalized Bregman distances of an optimal control problem with control constraints. We show robustness and convergence of the inexact Bregman method under a regularity assumption, which is a combination of a source condition and a regularity assumption on the active sets. We also take the discretization error into account. Numerical results are presented to demonstrate the algorithm.
This work was funded by the German Research Foundation (DFG) under project grant Wa 3626/1-1.
Frank Pörner: Department of Mathematics, University of Würzburg, Emil-Fischer-Str. 40, 97074 Würzburg, Germany, E-mail: [email protected]
AMS Subject Classification: 49N45, 49M30, 65K10
Keywords: optimal control, source condition, Bregman distance, inexact Bregman method
1 Introduction
We consider an optimization problem of the following form:
[TABLE]
Here , is a bounded, measurable set, a Hilbert space and a given function. The operator is supposed to be linear and continuous and inequality constraints are prescribed on the set . Here, we have in mind to choose as the solution operator of a linear partial differential equation. The special case where is defined as the solution of
[TABLE]
will be treated in detail in section 5.
A well-known method to solve (P) is the proximal point method (PPM) introduced by Martinet [21] and developed by Rockafellar [27]. This method is also known as iterated Tikhonov regularization, see [8, 10, 13]. The PPM is an iterative method, and the next iterate is given as the solution of
[TABLE]
with some given starting value . Here is a sequence of non-negative real numbers. One can hope to obtain strong convergence without the additional requirement that the regularization parameters tend to zero. Unfortunately this is not the case in general: in a counter-example by Güler [9] only weak convergence is obtained. Nevertheless, this method is well understood, see e.g. [17, 16, 15, 14, 28] and the references therein.
For the PPM it is interesting to investigate the robustness with respect to numerical errors. Denote by
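A minimal sketch of the proximal point iteration in a finite-dimensional setting may help fix ideas. It assumes the standard iterated Tikhonov subproblem (least-squares misfit plus a quadratic penalty on the distance to the previous iterate), uses a small matrix `S` as a stand-in for the operator, and drops the control constraints so that every step reduces to a linear solve. The names `proximal_point`, `alphas` and the test data are illustrative choices and not taken from the paper.

```python
import numpy as np

def proximal_point(S, z, alphas, u0):
    """Unconstrained proximal point (iterated Tikhonov) iteration.

    Assumed subproblem: minimize 0.5*||S u - z||^2 + 0.5*alpha*||u - u_k||^2.
    Its minimizer solves (S^T S + alpha I) u = S^T z + alpha u_k.
    """
    u = np.asarray(u0, dtype=float).copy()
    n = u.size
    StS, Stz = S.T @ S, S.T @ z
    residuals = [0.5 * np.linalg.norm(S @ u - z) ** 2]
    for alpha in alphas:
        u = np.linalg.solve(StS + alpha * np.eye(n), Stz + alpha * u)
        residuals.append(0.5 * np.linalg.norm(S @ u - z) ** 2)
    return u, residuals

# Illustrative data: a well-conditioned 5x5 matrix and an attainable target.
S = np.triu(np.ones((5, 5)))
z = np.arange(1.0, 6.0)

# The regularization parameter is kept fixed and does not tend to zero.
u_ppm, residuals = proximal_point(S, z, alphas=[1.0] * 200, u0=np.zeros(5))
u_ls = np.linalg.solve(S, z)  # reference solution of S u = z
```

In this toy setting the iterates approach the reference solution although `alpha` stays constant, which is the behaviour one hopes for in the paragraph above. The finite-dimensional case cannot show the gap between weak and strong convergence.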
[TABLE]
the exact solution, which in general cannot be computed exactly. Since it is clear that this problem has a unique solution. Due to numerical errors we only obtain an approximate solution which satisfies . The sequence can be interpreted as the accuracy of the computed solution. One can hope to achieve convergence of the sequence if is chosen appropriately. The iterates generated by the proximal point method converge weakly to a solution of (P) if the condition
[TABLE]
holds, see [16, 27]. If the state is not attainable, i.e. there exists no feasible control such that holds, the optimal solution might be bang-bang. Here is the set of all feasible controls
[TABLE]
This means it is a linear combination of characteristic functions. Hence the solution may not be in and it is unlikely that a source condition holds in this case, see [32].
To handle this non-attainability we considered in [25] an iterative method based on generalized Bregman distances. There, the iterate is given by the solution of
[TABLE]
where is called the (generalized) Bregman distance [3] associated with a regularization function with subgradient . Here we have additional freedom in choosing the regularization function . This method was first applied to an image restoration problem, where was chosen to be the total variation, see [4, 23]. Our approach was to incorporate the control constraints into the regularization functional, resulting in
[TABLE]
Here is the indicator function from convex analysis. This choice allowed us to prove strong convergence under a suitable regularity assumption, which allows bang-bang structure and non-attainability, see [25]. In the case of noisy data we established an a-priori stopping rule in [24].
The aim of this paper is to analyse the robustness of the iterative method presented in [25] with respect to numerical errors. We replace the operator in (1.2) by a linear and continuous operator with finite-dimensional range . This makes the problem (1.2) numerically solvable, but introduces an additional discretization error. If is the solution operator of a linear elliptic partial differential equation and is spanned by linear finite elements, then this can be interpreted as the variational discretization in the sense of Hinze [11].
We aim to establish sufficient conditions on the sequence comparable to (1.1), to ensure convergence.
This paper is structured as follows. In section 2 we recall our iterative method, our regularity assumption and some convergence results. The operator is then introduced in section 3. Furthermore we present an a-posteriori error estimator for the discretized subproblem, which allows us to control the accuracy of the iterates. In section 4 we establish our inexact Bregman iteration and show robustness and convergence results in the presence of numerical errors using our regularity assumption. As an example we consider in section 5 the optimal control of the heat equation. We construct the operator and show its properties. Furthermore numerical results are presented for a bang-bang example. Finally conclusions are drawn in section 6.
Notation.
For elements , we denote the -norm by . Furthermore is a generic constant, which may change from line to line, but is independent of the important variables, e.g. .
2 Assumptions and preliminary results
Let , be a bounded, measurable domain, a Hilbert space, linear and continuous. We are interested in the solution to problem (P). Here we assume and such that . Hence the set of admissible controls is non-empty. By
[TABLE]
we will denote our functional to be minimized.
2.1 Existence of solutions
Using classical arguments we can deduce existence of solutions.
Theorem 2.1.
Under the assumptions listed above the problem (P) has a solution. If the operator is injective the solution is unique.
Let denote a solution of (P) with state and adjoint state . Note that due to the strict convexity of with respect to the optimal state is uniquely defined. We now have the following result, see also [25].
Theorem 2.2.
We have the relation for almost all
[TABLE]
and the following variational inequality holds:
[TABLE]
2.2 Bregman iteration
In [25] we started to investigate an iterative method to solve (P) based on generalized Bregman distances. The Bregman distance [3] for a regularization functional at is given by
[TABLE]
where . We incorporate the control constraints into the regularization functional
[TABLE]
Let us recall some important properties of the regularization functional and the Bregman distance. The next result can also be found in [25, Lemma 2.3].
Lemma 2.3.
Let be non-empty, closed, and convex. The functional
[TABLE]
is convex and nonnegative. Furthermore the Bregman distance
[TABLE]
is nonnegative and convex with respect to .
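The nonnegativity of the Bregman distance can be checked numerically in a discrete setting. The sketch below assumes that the regularization functional is half the squared norm plus the indicator function of a box (the displayed formula is not recoverable from the text here, so this choice is an assumption), and it uses the Euclidean inner product in place of the continuous one. For this functional a vector `lam` is a subgradient at `u` exactly when `u` is the projection of `lam` onto the box. All names are illustrative.

```python
import numpy as np

LO, HI = -1.0, 1.0  # assumed box constraints

def J(u):
    """Assumed functional: 0.5*||u||^2 on the box [LO, HI], +inf outside."""
    if np.any(u < LO) or np.any(u > HI):
        return np.inf
    return 0.5 * float(u @ u)

def bregman_distance(v, u, lam):
    """D^lam(v, u) = J(v) - J(u) - <lam, v - u> with lam a subgradient of J at u."""
    return J(v) - J(u) - float(lam @ (v - u))

rng = np.random.default_rng(1)
lam = 3.0 * rng.standard_normal(6)   # arbitrary vector
u = np.clip(lam, LO, HI)             # u = P(lam), hence lam is a subgradient at u
v = rng.uniform(LO, HI, size=6)      # arbitrary feasible point

d = bregman_distance(v, u, lam)
lower_bound = 0.5 * float((v - u) @ (v - u))
```

For this particular functional one even has `d >= lower_bound`, because the distance equals `0.5*||v - u||^2` plus the term `<u - lam, v - u>`, which is nonnegative by the characterization of the projection.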
In the following we define to be the -projection onto the set . Our algorithm is now given as follows, see [25, 4].
Algorithm A.
Let , and .
1. Solve for :
[TABLE]
2. Choose .
3. Set , go back to 1.
Here is a bounded sequence of non-negative real numbers. In the next theorems we summarize some properties of the algorithm. The proofs can be found in [25]. In the following we use the abbreviation
[TABLE]
Let us first recall a convergence result in terms of the functional .
Theorem 2.4.
Algorithm A is well-posed and we have for all . Let be a solution of (P). We then have
[TABLE]
Furthermore we have the monotonicity property of the sequence with respect to the Bregman distance
[TABLE]
and
[TABLE]
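A finite-dimensional sketch of the Bregman iteration of Algorithm A is given next. It rests on several assumptions that are not recoverable from the text here: the regularization functional is taken to be half the squared norm plus the indicator function of a box, the starting values are the projection of zero and the zero subgradient, and the subgradient is updated from the optimality condition of the subproblem. The inner solver is a plain projected gradient method, chosen only for brevity. The names `solve_subproblem` and `bregman_iteration` are hypothetical.

```python
import numpy as np

def solve_subproblem(S, z, alpha, lam_prev, lo, hi, u_start, tol=1e-12, max_iter=20000):
    """Projected gradient method for the (assumed) Bregman subproblem
        min_{lo <= u <= hi} 0.5*||S u - z||^2 + alpha*(0.5*||u||^2 - <lam_prev, u>),
    which equals misfit + alpha * Bregman distance up to terms independent of u."""
    step = 1.0 / (np.linalg.norm(S, 2) ** 2 + alpha)
    u = u_start.copy()
    for _ in range(max_iter):
        grad = S.T @ (S @ u - z) + alpha * (u - lam_prev)
        u_new = np.clip(u - step * grad, lo, hi)
        if np.linalg.norm(u_new - u) <= tol:
            return u_new
        u = u_new
    return u

def bregman_iteration(S, z, alphas, lo, hi):
    n = S.shape[1]
    u = np.clip(np.zeros(n), lo, hi)  # assumed start: projection of 0 onto the box
    lam = np.zeros(n)                 # assumed start: zero subgradient
    history = [0.5 * np.linalg.norm(S @ u - z) ** 2]
    for alpha in alphas:
        u = solve_subproblem(S, z, alpha, lam, lo, hi, u)
        # Optimality of the subproblem yields the new subgradient.
        lam = lam + S.T @ (z - S @ u) / alpha
        history.append(0.5 * np.linalg.norm(S @ u - z) ** 2)
    return u, lam, history

# Illustrative data: the target is generated by a control that violates the box.
S = np.triu(np.ones((5, 5)))
z = S @ np.array([2.0, 0.3, -0.4, -3.0, 0.5])
lo, hi = -1.0, 1.0
u_k, lam_k, history = bregman_iteration(S, z, alphas=[1.0] * 30, lo=lo, hi=hi)
```

In this sketch the misfit values stored in `history` do not increase from one iteration to the next, and the final control agrees with the projection of `lam_k` onto the box up to the inner solver tolerance.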
We also established a general convergence result in terms of the controls.
Theorem 2.5.
Weak limit points of the sequence generated by Algorithm A are solutions to the problem (P). Furthermore we obtain strong convergence of the states
[TABLE]
where is the uniquely determined optimal state of (P). If in addition is the unique solution of (P), we obtain
In order to establish convergence rates for the iterates of Algorithm A we have to assume some regularity on the solution of (P). A common assumption on a solution is the following source condition, which is an abstract smoothness condition, see, e.g., [4, 5, 12, 22, 30, 32]. We say satisfies the source condition SC if the following assumption holds.
Assumption SC (Source Condition).
Let be a solution of (P). Assume that there exists an element such that holds.
This assumption is too restrictive, as in many cases the solution is bang-bang, i.e. a linear combination of characteristic functions, hence discontinuous. But in many applications the range of contains or , hence Assumption SC is not applicable in this case. To overcome this, we use the regularity of the adjoint state. We say satisfies the active set condition ASC if the following assumption holds. In the following we define to be the indicator function of the set . Recall that the adjoint state is defined by .
Assumption ASC (Active Set Condition).
Let be a solution of (P) and assume that there exists a set , a function , and positive constants such that the following holds:
1. (source condition) and
[TABLE]
2. (structure of active set) and for all
[TABLE]
3. (regularity of solution) .
Assumption ASC is a generalization of Assumption SC, since for both assumptions coincide. A sufficient condition for Assumption ASC can be found in [6]. If satisfies
[TABLE]
Assumption ASC is fulfilled with and . Since Assumption SC imposes more regularity, we expect to establish improved results in this case. The regularity assumption ASC is used in e.g. [30, 32, 29, 25].
Using these regularity assumptions we established in [25] the following convergence results.
Theorem 2.8.
Let be the sequence generated by Algorithm A. Assume that Assumption SC holds for . Then
[TABLE]
If we assume that instead Assumption ASC holds, then
[TABLE]
Note that we have
[TABLE]
see [25]. If Assumption ASC holds with , which implies that is bang-bang on , we can improve the estimate of Theorem 2.8 to
[TABLE]
The sequence of subdifferentials is unbounded in general, but we can show that the weighted average converges. The proof can be found in [25, Corollary 4.14].
Lemma 2.9.
We have
[TABLE]
3 The discretized problem
The aim of this section is to introduce the operator and to establish auxiliary estimates for the discretized subproblem. These estimates will then be applied to prove convergence results in section 4.
3.1 The operator
As mentioned in the introduction we want to introduce a family of linear and continuous operators from to with finite-dimensional range . Throughout this paper we make the following assumption. A similar assumption is also made in [31].
Assumption 3.1.
Assume that there exists a continuous and monotonically increasing function with such that
[TABLE]
holds for all , and .
For the case of a linear elliptic partial differential equation, the operator is the solution operator of the weak formulation with respect to the test function space . If is spanned by linear finite elements, this can be interpreted as the variational discretization in the sense of Hinze, see [11]. We consider a linear elliptic partial differential equation in section 5. We assume that the operator and its adjoint can be computed exactly.
Note that Assumption 3.1 is an assumption on the approximation of discrete functions. Under Assumption 3.1 we can establish the following discretization error estimate. The proof is similar to [31, Proposition 1.6] and is omitted here.
Lemma 3.2.
Let be the solution of
[TABLE]
and be the solution of the discretized problem
[TABLE]
with and . Then we have the following estimate
[TABLE]
with the abbreviation .
Please note that the norm of the operator is bounded in the following sense.
Lemma 3.3.
Let . Then there exists a constant independent of , such that .
Proof.
We compute the operator norm of and estimate
[TABLE]
∎
In the subsequent analysis we will need the following estimate.
Lemma 3.4.
There exists a constant independent of , such that the following estimate holds for all
[TABLE]
Proof.
We compute with
[TABLE]
Please note that we used the continuity of and the assumption on the operator . ∎
As a corollary we obtain the following result.
Lemma 3.5.
Let for . Then there exists a constant independent of and such that the following estimate holds
[TABLE]
3.2 A-posteriori error estimate for the discretized subproblem
We now want to consider the discretized subproblem, i.e. we replace the operator in the minimization problem (step 1) of Algorithm A with the discrete operator . This gives the following problem
[TABLE]
This problem can be rewritten as the equivalent minimization problem (3.1), see also [25]. For brevity we set and .
[TABLE]
To construct an a-posteriori error estimate we use Theorem 2.2 in [20], which will give us the following result. Note that we also use Lemma 3.3 here.
Theorem 3.6.
Let be the solution of the subproblem (3.1). Let be given and define and . Let with . Then there exists a constant independent of such that
[TABLE]
This result allows us to estimate the distance to the exact solution of the subproblem. Note that the problem (3.1) is uniquely solvable if , see [25].
For abbreviation we set
[TABLE]
Let be an approximate solution to the discretized subproblem (3.1). The quantity then is an upper bound for the accuracy of . This is part of the next result. The proof follows directly with Lemma 3.3 and Theorem 3.6.
Lemma 3.7.
Assume that . Let be the solution of the discretized subproblem (3.1). Then there exists a constant independent of such that the following implication holds for all and :
[TABLE]
Let us close this section with the following remark. As mentioned in [11] the solution of the discretized subproblem (3.1) can be approximated with arbitrary accuracy. This will play a role in the analysis presented in the next section.
4 Inexact Bregman iteration
Solving the subproblem
[TABLE]
exactly is very costly and in general not possible. We therefore suggest the following inexact Bregman iteration which can be interpreted as an inexact version of Algorithm A.
Inexact Bregman iterations are analysed in the literature, see e.g. [7, 18, 19, 1] for a finite dimensional approach, and for an abstract Banach space setting, see [26].
Algorithm B.
Let , and .
1. Find with and such that
[TABLE]
2. Set
[TABLE]
3. Set , go back to 1.
Here is a given sequence of positive real numbers controlling the accuracy of the approximate solution . For for all and Algorithm A is obtained.
The analysis of Algorithm A presented in [25] is based on the fact that . This is guaranteed by the construction of . However, since and in general, we cannot expect that holds.
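The effect of inexact subproblem solves can be illustrated in a finite-dimensional sketch. The code below is not the paper's Algorithm B: it assumes the regularization functional is half the squared norm plus the indicator function of a box, it replaces the paper's accuracy indicator by the norm of the projected gradient step divided by the step size, and it updates the multiplier with the same formula as in an exact Bregman step. The function name `inexact_bregman` and the tolerance sequences are hypothetical. The recorded quantity `gaps` measures how far the computed pair is from the relation `u = P(lam)`, which for the assumed functional characterizes the subgradient inclusion.

```python
import numpy as np

def inexact_bregman(S, z, alpha, eps, lo, hi, inner_cap=100000):
    """Bregman-type iteration with inexact inner solves (illustrative sketch).

    In outer step k the projected gradient loop stops as soon as the
    residual ||u_new - u|| / step drops below eps[k]."""
    n = S.shape[1]
    step = 1.0 / (np.linalg.norm(S, 2) ** 2 + alpha)
    u = np.clip(np.zeros(n), lo, hi)
    lam = np.zeros(n)
    gaps = []
    for eps_k in eps:
        for _ in range(inner_cap):
            grad = S.T @ (S @ u - z) + alpha * (u - lam)
            u_new = np.clip(u - step * grad, lo, hi)
            residual = np.linalg.norm(u_new - u) / step
            u = u_new
            if residual <= eps_k:
                break
        # Same multiplier update as in the exact method, now with an inexact u.
        lam = lam + S.T @ (z - S @ u) / alpha
        # Violation of u = P(lam); zero only if the inclusion holds exactly.
        gaps.append(np.linalg.norm(u - np.clip(lam, lo, hi)))
    return u, gaps

# Illustrative data: the target is generated by a control that violates the box.
S = np.triu(np.ones((5, 5)))
z = S @ np.array([2.0, 0.3, -0.4, -3.0, 0.5])
alpha, lo, hi = 1.0, -1.0, 1.0

eps_loose = [1.0 / k**2 for k in range(1, 41)]  # summable tolerances (an assumption)
eps_tight = [1e-10] * 40                        # nearly exact inner solves

u_loose, gaps_loose = inexact_bregman(S, z, alpha, eps_loose, lo, hi)
u_tight, gaps_tight = inexact_bregman(S, z, alpha, eps_tight, lo, hi)
```

With this residual the violation in step k is bounded by `eps[k] / alpha`, so the tolerance sequence directly controls how much of the subgradient structure is lost in each step.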
Before we start to establish robustness results we want to give an overview of the different auxiliary problems we are going to use. Furthermore we want to introduce and clarify our notation.
4.1 Notation and auxiliary results
The aim of this section is to summarize the most important notations and abbreviations. Our aim is to solve the unregularized problem
[TABLE]
This problem is solvable and we want to specify a solution . We assume that this function satisfies one of the regularity assumptions SC or ASC. In Algorithm A we have to solve the following regularized problem. We will refer to this as subproblem
[TABLE]
with some and . Here the (exact) unique solution is denoted by . The superscript ex stands for exact solution.
However, since the operator is not computable in general, we introduced the operator , which is an approximation of . We now replace with in (4.1) and obtain the discretized subproblem
[TABLE]
Again this problem is uniquely solvable and its solution is denoted by . The subscript indicates that it is a discrete solution. Under suitable assumptions we can estimate the discretization error between and . This is done in Lemma 3.2.
Please note that neither nor are computed during the algorithm. As mentioned above we can approximate with arbitrary precision. So we compute an inexact solution of (3.1), which is denoted by . We use the function to measure the accuracy.
To control the accuracy during the algorithm we introduce a sequence of positive real values. In each iteration we now search for a function such that .
In the end we want to estimate the error . This is done by the triangle inequality
[TABLE]
Note that is controlled by the accuracy and is limited by the discretization error. It remains to estimate the regularization error with the help of the regularity assumptions.
We also want to recall the following definitions, as they will appear quite often.
[TABLE]
4.2 Convergence under Assumption SC
We now start to analyse Algorithm B with satisfying Assumption SC.
Theorem 4.1.
Let satisfy Assumption SC and let be a sequence of positive real numbers. Furthermore let be given and let be a sequence generated by Algorithm B. Then we have the estimate
[TABLE]
with the abbreviations
[TABLE]
Proof.
The proof is based on the splitting of the error into three parts, see (4.2)
[TABLE]
Here is controlled by the given accuracy and can be estimated with the help of Lemma 3.2:
[TABLE]
It is left to estimate . We start with adding the optimality conditions for and , see [25, Lemma 3.1] and Theorem 2.2,
[TABLE]
Addition yields
[TABLE]
For the term we estimate with help of the source condition SC
[TABLE]
To estimate the remaining term $\big(-\lambda_{k}^{\mathrm{in}},\, u^{\dagger}-u_{k+1}^{\mathrm{ex}}\big)$ we introduce the quantity
[TABLE]
This quantity will be helpful in the subsequent analysis. Let us sketch the next steps. First we will replace the operator by in order to apply the first order conditions for . Second we eliminate the unknown exact solution by its approximation . For the first part we make use of Lemma 3.5 and estimate
[TABLE]
Now we eliminate the variable by using the first order conditions for presented in Theorem 2.2
[TABLE]
Since the variable is unknown we replace it by its approximation
[TABLE]
Now we use (4.6) and (4.7) in (4.5) and obtain
[TABLE]
In the next step we plug (4.4) and (4.8) in (4.3)
[TABLE]
Before we proceed we need two additional results. A calculation reveals that
[TABLE]
holds. Second we obtain
[TABLE]
Furthermore we use Young’s inequality and (4.10) to establish for :
[TABLE]
This now yields
[TABLE]
with and the abbreviations
[TABLE]
Summation over finally reveals
[TABLE]
where we used the convention . The result now follows by the triangle inequality.
∎
Let us point out that the variables can be identified with the accuracy of the iterates and while the are only influenced by the discretization. The result above can now be interpreted in different ways. First we start with the (theoretical) case that we can evaluate the operator and its dual . This refers to the case where .
Corollary 4.2.
Let satisfy Assumption SC and let be a sequence of positive real numbers such that
[TABLE]
Furthermore assume that and let be a sequence generated by Algorithm B. Then we have in .
The other interesting case is that we can solve the discretized subproblem exactly, i.e. for all . Here we obtain convergence in the following sense.
Corollary 4.3.
Let satisfy Assumption SC. Let be given and for all . Then there exists a constant C such that for every there exists a stopping index such that
[TABLE]
and as . Furthermore as .
Proof.
We only have to show the existence of such a stopping index. The convergence result then is a direct consequence of Theorem 4.1. Let us define the following auxiliary variables
[TABLE]
It is clear that as . Now choose sufficiently large such that
[TABLE]
Now pick . Since is a monotonically increasing function we get the existence of , such that
[TABLE]
Hence, the following expression is well-defined
[TABLE]
It is left to show that as . Assume that this is wrong, hence there exists a such that for all . This yields
[TABLE]
However, since and are independent of this is a contradiction for small enough. This finishes the proof. ∎
If the discretized subproblem is only solved inexactly we can establish the following result. The proof is a combination of Corollary 4.2 and Corollary 4.3.
Corollary 4.4.
Let satisfy Assumption SC and let be a sequence of positive real numbers such that
[TABLE]
Let be given. Then there exists a constant C such that for every there exists a stopping index such that
[TABLE]
and as . Furthermore as .
4.3 Convergence under Assumption ASC
Let us now consider the case when Assumption ASC is satisfied.
Theorem 4.5.
Let satisfy Assumption ASC and let be a sequence of positive real numbers. Furthermore let be given and let be a sequence generated by Algorithm B. Then we have the estimate
[TABLE]
with the abbreviations
[TABLE]
Proof.
The proof mainly follows the idea of Theorem 4.1. Again the main part is to establish estimates for the regularization error for . First we want to estimate the term using Assumption ASC. We use [25, Lemma 4.12] and obtain
[TABLE]
This inequality introduces an additional -term. To compensate this term we use an improved optimality condition, which is valid under Assumption ASC
[TABLE]
with . For a proof we refer to [25, Lemma 4.11]. Similar to (4.6) we compute
[TABLE]
Now we estimate the term using Young’s inequality
[TABLE]
We now consider the following inequality, similar to (4.3)
[TABLE]
and use again the equality
[TABLE]
As done in Theorem 4.1 we obtain with that
[TABLE]
Combining everything now reveals with some
[TABLE]
Now we plug in our estimate (4.14) and obtain
[TABLE]
As in the proof of Theorem 4.1 we apply the triangle inequality to finish the proof.
∎
Let us now establish convergence results similar to Corollaries 4.2 and 4.3.
Corollary 4.6.
Let satisfy Assumption ASC and let be a sequence of positive real numbers such that . Furthermore assume that and let be a sequence generated by Algorithm B. Then we obtain
[TABLE]
as .
Proof.
The sequence is bounded by a constant . Hence we have the following inequalities for large enough
[TABLE]
Furthermore we have by [25, Lemma 3.5] that
[TABLE]
We now obtain
[TABLE]
which finishes the proof. ∎
Corollary 4.7.
Let satisfy Assumption ASC. Let be given and for all . Then there exists a constant C such that for every there exists a stopping index such that
[TABLE]
and as . Furthermore
[TABLE]
as .
Proof.
The proof is very similar to the proof of Corollary 4.3. ∎
A combination of both results yields the following corollary.
Corollary 4.8.
Let satisfy Assumption ASC and let be a sequence of positive real numbers such that . Let be given. Then there exists a constant C such that for every there exists a stopping index such that
[TABLE]
and as . Furthermore
[TABLE]
as .
5 Numerical example
Now, let be defined as the (weak) solution of the linear partial differential equation for a convex set ()
[TABLE]
Let us show that this example fits into our framework. Clearly, for equation (5.1) has a unique weak solution , and the associated solution operator is linear and continuous. For the choice we obtain .
Let us now report on the discretization and the operator . We follow the argumentation and results presented in [31, Section 3]. Let be a regular mesh which consists of closed cells . For we define . Furthermore we set . We assume that there exists a constant such that for all . Here we define to be the diameter of the largest ball contained in .
For this mesh we define an associated finite dimensional space , such that the restriction of a function to a cell is a linear polynomial.
The operator is now defined in the sense of weak solutions. We set if solves
[TABLE]
We also obtain in the discrete case. Let us now mention that the operator satisfies Assumption 3.1. Following [31] and the references therein we obtain the following result.
Lemma 5.1.
Assume that there exists a constant such that holds. Then we have the estimates
[TABLE]
for and a constant independent of and .
Hence Assumption 3.1 is satisfied with .
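How a discretization error of this type behaves under mesh refinement can be observed in a toy computation. The sketch below is a stand-in and not the setting of Lemma 5.1: it assumes the one-dimensional model problem -y'' = u on (0, 1) with homogeneous Dirichlet conditions, uses a second-order finite difference scheme instead of linear finite elements, and measures the nodal maximum error for a right-hand side with known solution. The rate function appearing in Assumption 3.1 is not recoverable from the text here, so the observed rate applies to this toy problem only.

```python
import numpy as np

def solve_poisson_1d(n, f):
    """Solve -y'' = f on (0, 1), y(0) = y(1) = 0, on n interior nodes
    with the standard three-point finite difference stencil."""
    h = 1.0 / (n + 1)
    x = np.linspace(h, 1.0 - h, n)
    A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
    return x, np.linalg.solve(A, f(x)), h

def f(x):
    # Right-hand side chosen so that the exact solution is sin(pi * x).
    return np.pi**2 * np.sin(np.pi * x)

errors = []
for n in (15, 31, 63):  # mesh sizes h = 1/16, 1/32, 1/64
    x, y_h, h = solve_poisson_1d(n, f)
    errors.append(np.max(np.abs(y_h - np.sin(np.pi * x))))

# Error reduction factor when the mesh size is halved.
ratios = [errors[i] / errors[i + 1] for i in range(len(errors) - 1)]
```

Halving the mesh size reduces the error by a factor close to four in this example, which is the signature of a second-order scheme.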
Let us briefly comment on the computation of the solution of (3.1). In [24, Section 4] we applied a variational discretization and a semi-smooth Newton solver to this problem. The space was defined as the span of linear finite elements. This gives us approximate solutions such that , and . Here the control can be computed as the truncation of a finite element function. For more details we refer to [24, 6, 2] and the references therein.
We now consider the following optimal control problem. Note that due to the linearity of this is of the form (P).
[TABLE]
We use the inexact Bregman method (Algorithm B) to solve (5.2). With the choice of , , and
[TABLE]
the functions are a solution to (5.2). Here the solution satisfies Assumption ASC with and . We use different mesh sizes for comparison and plot the error for the first iterations in Figures 1, 2 and 3. Furthermore we set and to satisfy the assumptions of Corollary 4.8. As expected we see that for we obtain convergence for . The coarsest mesh has and the finest mesh has approximately degrees of freedom.
6 Conclusion
We showed that our iterative method is robust against numerical errors. Furthermore we established error estimates and convergence results both for errors introduced by the accuracy of the computed iterates and by the discretization. We constructed an a-posteriori error estimator for the discretized subproblem and provided numerical results.
Together with the exact a-priori regularization estimates [25] and the convergence results obtained for noisy data [24], we conclude that the Bregman iterative method is a stable and robust method to compute solutions for our model problem (P).
References
[1] A. Benfenati and V. Ruggiero. Inexact Bregman iteration with an application to Poisson data reconstruction. Inverse Problems, 29(6):065016, 31, 2013.
[2] S. Beuchler, C. Pechstein, and D. Wachsmuth. Boundary concentrated finite elements for optimal boundary control problems of elliptic PDEs. Comput. Optim. Appl., 51(2):883–908, 2012.
[3] L. M. Bregman. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics, 7:200–217, 1967.
[4] M. Burger, E. Resmerita, and L. He. Error estimation for Bregman iterations and inverse scale space methods in image restoration. Computing, 81(2-3):109–135, 2007.
[5] G. Chavent and K. Kunisch. Convergence of Tikhonov regularization for constrained ill-posed inverse problems. Inverse Problems, 10(1):63–76, 1994.
[6] K. Deckelnick and M. Hinze. A note on the approximation of elliptic control problems with bang-bang controls. Comput. Optim. Appl., 51(2):931–939, 2012.
[7] J. Eckstein. Approximate iterations in Bregman-function-based proximal algorithms. Math. Programming, 83(1, Ser. A):113–123, 1998.
[8] H. W. Engl, M. Hanke, and A. Neubauer. Regularization of inverse problems, volume 375 of Mathematics and its Applications. Kluwer Academic Publishers Group, Dordrecht, 1996.
