An inexact iterative Bregman method for optimal control problems

Frank Pörner

arXiv:1702.04547 · math.OC · August 30, 2017

TL;DR

This paper introduces an inexact iterative Bregman regularization method for optimal control problems with control constraints, demonstrating its robustness, convergence, and effectiveness through numerical experiments.

Contribution

It develops a novel inexact Bregman iterative method tailored for constrained optimal control problems, including analysis of convergence and discretization effects.

Findings

01 Method is robust under certain regularity conditions

02 Convergence of the inexact Bregman method is established

03 Numerical results confirm the effectiveness of the proposed algorithm

Abstract

In this article we investigate an inexact iterative regularization method based on generalized Bregman distances of an optimal control problem with control constraints. We show robustness and convergence of the inexact Bregman method under a regularity assumption, which is a combination of a source condition and a regularity assumption on the active sets. We also take the discretization error into account. Numerical results are presented to demonstrate the algorithm.

Equations (245)

$$\text{Minimize}\quad \frac{1}{2}\|Su-z\|_{Y}^{2}\quad\text{such that}\quad u_{a}\leq u\leq u_{b}\ \text{a.e. in }\Omega.$$

$$-\Delta y$$

$$y$$

$$\text{Minimize}\quad \frac{1}{2}\|Su-z\|_{Y}^{2}+\alpha_{k+1}\|u-u_{k}\|_{L^{2}(\Omega)}^{2}\quad\text{such that}\quad u_{a}\leq u\leq u_{b}\ \text{a.e. in }\Omega,$$

$$\mathcal{P}u_{k}:=\underset{u\in U_{\mathrm{ad}}}{\arg\min}\ \frac{1}{2}\|Su-z\|_{Y}^{2}+\alpha_{k+1}\|u-u_{k}\|_{L^{2}(\Omega)}^{2}$$

$$\sum_{j=1}^{\infty}\frac{\varepsilon_{j}}{\alpha_{j}}<\infty$$

$$U_{\mathrm{ad}}:=\{u\in L^{2}(\Omega):\ u_{a}\leq u\leq u_{b}\}.$$

$$\text{Minimize}\quad \frac{1}{2}\|Su-z\|_{Y}^{2}+\alpha_{k+1}D^{\lambda_{k}}(u,u_{k}),$$

$$J(u):=\frac{1}{2}\|u\|^{2}+I_{U_{\mathrm{ad}}}(u).$$

$$H(u):=\frac{1}{2}\|Su-z\|^{2}$$

$$u^{\dagger}(x)\begin{cases}=u_{a}(x) & \text{if } p^{\dagger}(x)<0,\\ \in[u_{a}(x),u_{b}(x)] & \text{if } p^{\dagger}(x)=0,\\ =u_{b}(x) & \text{if } p^{\dagger}(x)>0,\end{cases}$$

$$(-p^{\dagger},u-u^{\dagger})\geq 0\quad\forall u\in U_{\mathrm{ad}}.$$

$$D^{\lambda}(u,v):=J(u)-J(v)-(u-v,\lambda)$$

$$J:L^{2}(\Omega)\to\mathbb{R}\cup\{-\infty,+\infty\},\quad J(u):=\frac{1}{2}\|u\|^{2}+I_{U_{\mathrm{ad}}}(u).$$

$$J:L^{2}(\Omega)\to\mathbb{R}\cup\{+\infty\},\quad u\mapsto\frac{1}{2}\|u\|^{2}+I_{C}(u)$$

$$D^{\lambda}(u,v):=J(u)-J(v)-(u-v,\lambda),\quad\lambda\in\partial J(v)$$

$$\text{Minimize}\quad \frac{1}{2}\|Su-z\|_{Y}^{2}+\alpha_{k}D^{\lambda_{k-1}}(u,u_{k-1}).$$

$$\gamma_{k}:=\sum_{j=1}^{k}\frac{1}{\alpha_{j}}.$$

$$H(u_{k})$$

$$|H(u_{k})-H(u^{\dagger})|$$

$$D^{\lambda_{k}}(u^{\dagger},u_{k})\leq D^{\lambda_{k-1}}(u^{\dagger},u_{k-1})$$

$$\sum_{i=1}^{\infty}D^{\lambda_{i-1}}(u_{i},u_{i-1})<\infty.$$

$$Su_{k}\to y^{\dagger},$$

$$\chi_{I}u^{\dagger}=\chi_{I}P_{U_{\mathrm{ad}}}(S^{\ast}w),$$

$$|\{x\in A:\ 0<|p^{\dagger}(x)|<\varepsilon\}|\leq c\,\varepsilon^{\kappa},$$

$$\nabla p^{\dagger}(x)\neq 0\quad\forall x\in\bar{\Omega}\ \text{with}\ p^{\dagger}(x)=0$$

$$\|u^{\dagger}-u_{k}\|^{2}$$

$$\sum_{i=1}^{k}\frac{1}{\alpha_{i}}\|u^{\dagger}-u_{i}\|^{2}$$

$$\gamma_{k}^{-1}+\gamma_{k}^{-1}\sum_{j=1}^{k}\alpha_{j}^{-1}\gamma_{j}^{-\kappa}\to 0\quad\text{as }k\to\infty,$$

$$\|u^{\dagger}-u_{k}\|^{2}=\mathcal{O}\Big(\gamma_{k}^{-1}\sum_{j=1}^{k}\alpha_{j}^{-1}\gamma_{j}^{-\kappa}\Big).$$

$$\|\gamma_{k}^{-1}\lambda_{k}-p^{\dagger}\|^{2}=\begin{cases}\mathcal{O}(\gamma_{k}^{-2}) & \text{if } u^{\dagger}\text{ satisfies Assumption SC},\\ \mathcal{O}\Big(\gamma_{k}^{-2}\Big(1+\sum_{j=1}^{k}\alpha_{j}^{-1}\gamma_{j}^{-\kappa}\Big)\Big) & \text{if } u^{\dagger}\text{ satisfies Assumption ASC}.\end{cases}$$

Taxonomy

Topics: Numerical methods in inverse problems · Optimization and Variational Analysis · Contact Mechanics and Variational Inequalities

Full text

An inexact iterative Bregman method for optimal control problems

This work was funded by German Research Foundation DFG under project grant Wa 3626/1-1.

Frank Pörner

Department of Mathematics, University of Würzburg, Emil-Fischer-Str. 40, 97074 Würzburg, Germany, E-mail: [email protected]

AMS Subject Classification: 49N45, 49M30, 65K10

Keywords: optimal control, source condition, Bregman distance, inexact Bregman method

1 Introduction

We consider an optimization problem of the following form:

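$$\text{Minimize}\quad \frac{1}{2}\|Su-z\|_{Y}^{2}\quad\text{such that}\quad u_{a}\leq u\leq u_{b}\ \text{a.e. in }\Omega. \tag{P}$$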

Here $\Omega\subseteq\mathbb{R}^{n}$ , $n\geq 1$ is a bounded, measurable set, $Y$ a Hilbert space and $z\in Y$ a given function. The operator $S:L^{2}(\Omega)\to Y$ is supposed to be linear and continuous and inequality constraints are prescribed on the set $\Omega$ . Here, we have in mind to choose $S$ as the solution operator of a linear partial differential equation. The special case where $y=Su$ is defined as the solution of

[TABLE]

will be treated in detail in section 5.

A well-known method to solve (P) is the proximal point method (PPM) introduced by Martinet [21] and developed by Rockafellar [27]. This method is also known as iterated Tikhonov regularization, see [8, 10, 13]. The PPM is an iterative method, and the next iterate $u_{k+1}$ is given as the solution of

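$$\text{Minimize}\quad \frac{1}{2}\|Su-z\|_{Y}^{2}+\alpha_{k+1}\|u-u_{k}\|_{L^{2}(\Omega)}^{2}\quad\text{such that}\quad u_{a}\leq u\leq u_{b}\ \text{a.e. in }\Omega,$$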

with some given initial starting value $u_{0}$. Here $(\alpha_{k})_{k}$ is a sequence of non-negative real numbers. One might hope to obtain strong convergence without the additional requirement that the regularization parameters $\alpha_{k}$ tend to zero. Unfortunately this is not the case in general: in the counterexample by Güler [9] only weak convergence is obtained. Nevertheless, this method is well understood, see e.g. [17, 16, 15, 14, 28] and the references therein.
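To make the proximal point iteration concrete, the following is a minimal finite-dimensional sketch, not taken from the paper. It assumes that $S$ is replaced by a small random matrix, that the bounds are the constants $-1$ and $1$, and that each subproblem is solved approximately by projected gradient steps; the names `prox_step` and `proximal_point` are hypothetical.

```python
import numpy as np

def prox_step(S, z, u_prev, alpha, lo, hi, iters=2000):
    """Approximately solve min 0.5*||S u - z||^2 + alpha*||u - u_prev||^2 over the box [lo, hi]."""
    # Step size 1/L, where L bounds the Lipschitz constant of the gradient.
    step = 1.0 / (np.linalg.norm(S, 2) ** 2 + 2.0 * alpha)
    u = u_prev.copy()
    for _ in range(iters):
        grad = S.T @ (S @ u - z) + 2.0 * alpha * (u - u_prev)
        u = np.clip(u - step * grad, lo, hi)  # projection onto the box
    return u

def proximal_point(S, z, lo, hi, alphas):
    """Run the proximal point iteration and record H(u_k) = 0.5*||S u_k - z||^2."""
    u = np.clip(np.zeros(S.shape[1]), lo, hi)
    values = [0.5 * np.linalg.norm(S @ u - z) ** 2]
    for alpha in alphas:
        u = prox_step(S, z, u, alpha, lo, hi)
        values.append(0.5 * np.linalg.norm(S @ u - z) ** 2)
    return u, values

rng = np.random.default_rng(0)
S = rng.standard_normal((8, 5))   # hypothetical stand-in for the operator S
z = rng.standard_normal(8)        # hypothetical target z
u_ppm, values_ppm = proximal_point(S, z, -1.0, 1.0, alphas=[1.0] * 30)
```

In this sketch the recorded values of $\frac{1}{2}\|Su_{k}-z\|^{2}$ should be non-increasing, since each step starts the inner solver at the previous iterate.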

For the PPM method it is interesting to investigate the robustness with respect to numerical errors. Denote by

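$$\mathcal{P}u_{k}:=\underset{u\in U_{\mathrm{ad}}}{\arg\min}\ \frac{1}{2}\|Su-z\|_{Y}^{2}+\alpha_{k+1}\|u-u_{k}\|_{L^{2}(\Omega)}^{2}$$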

the exact solution, which in general cannot be computed exactly. Since $\alpha_{k+1}>0$ it is clear that this problem has a unique solution. Due to numerical errors we only obtain an approximate solution $u_{k+1}$ which satisfies $\|u_{k+1}-\mathcal{P}u_{k}\|_{L^{2}(\Omega)}\leq\varepsilon_{k}$ . The sequence $(\varepsilon_{k})_{k}$ can be interpreted as the accuracy of the computed solution. One can hope to achieve convergence of the sequence $(u_{k})_{k}$ if $(\varepsilon_{k})_{k}$ is chosen appropriately. The iterates generated by the proximal point method converge weakly to a solution of (P) if the condition

$$\sum_{j=1}^{\infty}\frac{\varepsilon_{j}}{\alpha_{j}}<\infty \tag{1.1}$$

holds, see [16, 27]. If the state $z$ is not attainable, i.e. there exists no feasible control $u\in{U_{\mathrm{ad}}}$ such that $Su=z$ holds, the optimal solution might be bang-bang. Here ${U_{\mathrm{ad}}}$ is the set of all feasible controls

$$U_{\mathrm{ad}}:=\{u\in L^{2}(\Omega):\ u_{a}\leq u\leq u_{b}\}.$$

This means it is a linear combination of characteristic functions. Hence the solution may not be in $H^{1}(\Omega)$ and it is unlikely that a source condition holds in this case, see [32].
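The summability condition $\sum_{j}\varepsilon_{j}/\alpha_{j}<\infty$ on the accuracies can be explored numerically for a candidate schedule. The sketch below is only an illustration under assumed schedules (constant $\alpha_{j}=0.5$, and $\varepsilon_{j}=j^{-2}$ versus $\varepsilon_{j}=j^{-1}$); the helper name `weighted_error_partial_sums` is hypothetical, and finite partial sums can suggest but never prove summability.

```python
import numpy as np

def weighted_error_partial_sums(eps, alphas):
    """Partial sums of eps_j / alpha_j for j = 1, ..., len(eps)."""
    eps = np.asarray(eps, dtype=float)
    alphas = np.asarray(alphas, dtype=float)
    return np.cumsum(eps / alphas)

j = np.arange(1, 10001)
alphas = np.full(j.shape, 0.5)                                  # assumed constant regularization
sums_quadratic = weighted_error_partial_sums(1.0 / j**2, alphas)  # eps_j = j^-2
sums_harmonic = weighted_error_partial_sums(1.0 / j, alphas)      # eps_j = j^-1
```

For the first schedule the partial sums stay below $2\cdot\pi^{2}/6$, while for the second they keep growing like twice the harmonic series.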

To handle this non-attainability we considered in [25] an iterative method based on generalized Bregman distances. There, the iterate $u_{k+1}$ is given by the solution of

$$\text{Minimize}\quad \frac{1}{2}\|Su-z\|_{Y}^{2}+\alpha_{k+1}D^{\lambda_{k}}(u,u_{k}), \tag{1.2}$$

where $D^{\lambda}(u,v):=J(u)-J(v)-(u-v,\lambda)$ is called the (generalized) Bregman distance [3] associated with a regularization function $J$ with subgradient $\lambda\in\partial J(v)$ . Here we have additional freedom in choosing the regularization function $J$ . This method was first applied to an image restoration problem, where $J$ was chosen to be the total variation, see [4, 23]. Our approach was to incorporate the control constraints into the regularization functional, resulting in

$$J(u):=\frac{1}{2}\|u\|^{2}+I_{U_{\mathrm{ad}}}(u).$$

Here $I$ is the indicator function from convex analysis. This choice allowed us to prove strong convergence under a suitable regularity assumption, which allows bang-bang structure and non-attainability, see [25]. In the case of noisy data $\|z-z^{\delta}\|\leq\delta$ we established an a-priori stopping rule in [24].

The aim of this paper is to analyse the robustness of the iterative method presented in [25] with respect to numerical errors. We replace the operator $S$ in (1.2) by a linear and continuous operator $S_{h}$ with finite-dimensional range $Y_{h}\subset Y$ . This makes the problem (1.2) numerically solvable, but introduces an additional discretization error. If $S$ is the solution operator of a linear elliptic partial differential equation and $Y_{h}$ is spanned by linear finite elements, then this can be interpreted as the variational discretization in the sense of Hinze [11].

We aim to establish sufficient conditions on the sequence $(\varepsilon_{k})_{k}$ comparable to (1.1), to ensure convergence.

This paper is structured as follows. In section 2 we recall our iterative method, our regularity assumption and some convergence results. The operator $S_{h}$ is then introduced in section 3. Furthermore we present an a-posteriori error estimator for the discretized subproblem, which allows us to control the accuracy of the iterates. In section 4 we establish our inexact Bregman iteration and show robustness and convergence results in the presence of numerical errors using our regularity assumption. As an example we consider in section 5 the optimal control of the heat equation. We construct the operator $S_{h}$ and show its properties. Furthermore numerical results are presented for a bang-bang example. Finally conclusions are drawn in section 6.

Notation.

For elements $q\in L^{2}(\Omega)$, we denote the $L^{2}$-norm by $\|q\|:=\|q\|_{L^{2}(\Omega)}$. Furthermore $c$ is a generic constant, which may change from line to line but is independent of the important variables, e.g. $k$.

2 Assumptions and preliminary results

Let $\Omega\subseteq\mathbb{R}^{n}$ , $n\in\mathbb{N}$ be a bounded, measurable domain, $Y$ a Hilbert space, $S:L^{2}(\Omega)\to Y$ linear and continuous. We are interested in the solution to problem (P). Here we assume $z\in Y$ and $u_{a},u_{b}\in L^{\infty}(\Omega)$ such that $u_{a}\leq u_{b}$ . Hence the set of admissible controls ${U_{\mathrm{ad}}}$ is non-empty. By

$$H(u):=\frac{1}{2}\|Su-z\|_{Y}^{2}$$

we will denote our functional to be minimized.

2.1 Existence of solutions

Using classical arguments we can deduce existence of solutions.

Theorem 2.1.

Under the assumptions listed above the problem (P) has a solution. If the operator $S$ is injective the solution is unique.

Let $u^{\dagger}\in{U_{\mathrm{ad}}}$ denote a solution of (P) with state $y^{\dagger}:=Su^{\dagger}$ and adjoint state $p^{\dagger}:=S^{\ast}(z-Su^{\dagger})$ . Note that due to the strict convexity of $H$ with respect to $Su$ the optimal state $y^{\dagger}$ is uniquely defined. We now have the following result, see also [25].

Theorem 2.2.

We have the relation for almost all $x\in\Omega$

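$$u^{\dagger}(x)\begin{cases}=u_{a}(x) & \text{if } p^{\dagger}(x)<0,\\ \in[u_{a}(x),u_{b}(x)] & \text{if } p^{\dagger}(x)=0,\\ =u_{b}(x) & \text{if } p^{\dagger}(x)>0,\end{cases}$$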

and the following variational inequality holds:

$$(-p^{\dagger},u-u^{\dagger})\geq 0\quad\forall u\in U_{\mathrm{ad}}.$$

2.2 Bregman iteration

In [25] we started to investigate an iterative method to solve (P) based on generalized Bregman distances. The Bregman distance [3] $D^{\lambda}$ for a regularization functional $J$ at $u,v\in L^{2}(\Omega)$ is given by

$$D^{\lambda}(u,v):=J(u)-J(v)-(u-v,\lambda)$$

where $\lambda\in\partial J(v)$ . We incorporate the control constraints into the regularization functional

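$$J:L^{2}(\Omega)\to\mathbb{R}\cup\{-\infty,+\infty\},\quad J(u):=\frac{1}{2}\|u\|^{2}+I_{U_{\mathrm{ad}}}(u).$$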

Let us recall some important properties of the regularization functional and the Bregman distance. The next result can also be found in [25, Lemma 2.3].

Lemma 2.3.

Let $C\subseteq L^{2}(\Omega)$ be non-empty, closed, and convex. The functional

$$J:L^{2}(\Omega)\to\mathbb{R}\cup\{+\infty\},\quad u\mapsto\frac{1}{2}\|u\|^{2}+I_{C}(u)$$

is convex and nonnegative. Furthermore the Bregman distance

$$D^{\lambda}(u,v):=J(u)-J(v)-(u-v,\lambda),\quad\lambda\in\partial J(v)$$

is nonnegative and convex with respect to $u$ .
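The nonnegativity of the Bregman distance can be checked on a small example. The sketch below assumes a finite-dimensional box $C=[-1,1]^{3}$ and builds a subgradient as $\lambda=v+n$ with $n$ in the normal cone of the box at $v$ (nonpositive entries at the lower bound, nonnegative at the upper bound, zero in between); the function names are hypothetical. For feasible $u,v$ a short calculation gives $D^{\lambda}(u,v)=\frac{1}{2}\|u-v\|^{2}+(v-\lambda,u-v)$, so the distance should be at least $\frac{1}{2}\|u-v\|^{2}$.

```python
import numpy as np

def J(u, lo, hi):
    """0.5*||u||^2 plus the indicator function of the box [lo, hi]."""
    if np.any(u < lo) or np.any(u > hi):
        return np.inf
    return 0.5 * float(u @ u)

def bregman_distance(u, v, lam, lo, hi):
    """D^lam(u, v) = J(u) - J(v) - (u - v, lam), with lam assumed to lie in the subdifferential of J at v."""
    return J(u, lo, hi) - J(v, lo, hi) - float((u - v) @ lam)

lo, hi = -1.0, 1.0
v = np.array([-1.0, 0.3, 1.0])
normal = np.array([-2.0, 0.0, 0.5])   # element of the normal cone of the box at v
lam = v + normal                      # a subgradient of J at v
u = np.array([0.2, -0.4, 0.9])

d = bregman_distance(u, v, lam, lo, hi)
lower_bound = 0.5 * float((u - v) @ (u - v))
```

Different choices of `normal` give different distances for the same pair $(u,v)$, which is why the subgradient appears as a superscript in $D^{\lambda}$.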

In the following we define $P_{U}$ to be the $L^{2}$-projection onto the set $U$. Our algorithm is now given as follows (see [25, 4]).

Algorithm A.

Let $u_{0}=P_{U_{\mathrm{ad}}}(0)\in{U_{\mathrm{ad}}}$, $\lambda_{0}=0\in\partial J(u_{0})$ and $k=1$.

1. Solve for $u_{k}$:

$$\text{Minimize}\quad \frac{1}{2}\|Su-z\|_{Y}^{2}+\alpha_{k}D^{\lambda_{k-1}}(u,u_{k-1}).$$

2. Choose $\lambda_{k}\in\partial J(u_{k})$.

3. Set $k:=k+1$, go back to 1.
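The steps of Algorithm A can be sketched in finite dimensions as follows. This is an illustration under several assumptions that are not part of the source: $S$ is a small random matrix, the bounds are $\pm 1$, the subproblem is solved by projected gradient steps (up to a constant it equals $\frac{1}{2}\|Su-z\|^{2}+\alpha_{k}(\frac{1}{2}\|u\|^{2}-(u,\lambda_{k-1}))$ on the box), and the subgradient is chosen as $\lambda_{k}=\lambda_{k-1}+\alpha_{k}^{-1}S^{\ast}(z-Su_{k})$, which is what the first-order condition of the subproblem suggests. The source itself only requires $\lambda_{k}\in\partial J(u_{k})$.

```python
import numpy as np

def solve_subproblem(S, z, lam, alpha, lo, hi, u_start, iters=3000):
    """Projected gradient for min 0.5*||S u - z||^2 + alpha*(0.5*||u||^2 - (u, lam)) on the box."""
    step = 1.0 / (np.linalg.norm(S, 2) ** 2 + alpha)
    u = u_start.copy()
    for _ in range(iters):
        grad = S.T @ (S @ u - z) + alpha * (u - lam)
        u = np.clip(u - step * grad, lo, hi)
    return u

def bregman_iteration(S, z, lo, hi, alphas):
    u = np.clip(np.zeros(S.shape[1]), lo, hi)   # u_0 = P_Uad(0)
    lam = np.zeros_like(u)                      # lambda_0 = 0
    values = [0.5 * np.linalg.norm(S @ u - z) ** 2]
    for alpha in alphas:
        u = solve_subproblem(S, z, lam, alpha, lo, hi, u)   # step 1
        lam = lam + S.T @ (z - S @ u) / alpha               # step 2 (assumed choice of subgradient)
        values.append(0.5 * np.linalg.norm(S @ u - z) ** 2)
    return u, lam, values

rng = np.random.default_rng(1)
S = rng.standard_normal((8, 5))   # hypothetical stand-in for S
z = rng.standard_normal(8)
u_A, lam_A, values_A = bregman_iteration(S, z, -1.0, 1.0, alphas=[1.0] * 30)
```

With this choice, the difference $\lambda_{k}-u_{k}$ should vanish where $u_{k}$ lies strictly between the bounds and have the sign of a normal-cone element where a bound is active, which is one way to check $\lambda_{k}\in\partial J(u_{k})$ numerically.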

Here $(\alpha_{k})_{k}$ is a bounded sequence of non-negative real numbers. In the next theorems we summarize some properties of the algorithm. The proofs can be found in [25]. In the following we use the abbreviation

$$\gamma_{k}:=\sum_{j=1}^{k}\frac{1}{\alpha_{j}}.$$

Let us first recall a convergence result in terms of the functional $H$ .

Theorem 2.4.

Algorithm A is well-posed and we have $\lambda_{k}\in\partial J(u_{k})$ for all $k\in\mathbb{N}_{0}$ . Let $u^{\dagger}$ be a solution of (P). We then have

[TABLE]

Furthermore we have the monotonicity property of the sequence $(u_{k})_{k}$ with respect to the Bregman distance

$$D^{\lambda_{k}}(u^{\dagger},u_{k})\leq D^{\lambda_{k-1}}(u^{\dagger},u_{k-1})$$

and

$$\sum_{i=1}^{\infty}D^{\lambda_{i-1}}(u_{i},u_{i-1})<\infty.$$

We also established a general convergence result in terms of the controls.

Theorem 2.5.

Weak limit points of the sequence $(u_{k})_{k}$ generated by Algorithm A are solutions to the problem (P). Furthermore we obtain strong convergence of the states

$$Su_{k}\to y^{\dagger},$$

where $y^{\dagger}$ is the uniquely determined optimal state of (P). If in addition $u^{\dagger}$ is the unique solution of (P), we obtain $u_{k}\to u^{\dagger}.$

In order to establish convergence rates for the iterates of Algorithm A we have to assume some regularity on the solution of (P). A common assumption on a solution $u^{\dagger}$ is the following source condition, which is an abstract smoothness condition, see, e.g., [4, 5, 12, 22, 30, 32]. We say $u^{\dagger}$ satisfies the source condition SC if the following assumption holds.

Assumption SC (Source Condition).

Let $u^{\dagger}$ be a solution of (P). Assume that there exists an element $w\in Y$ such that $u^{\dagger}=P_{U_{\mathrm{ad}}}(S^{\ast}w)$ holds.

This assumption is too restrictive, since in many cases the solution $u^{\dagger}$ is bang-bang, i.e. a linear combination of characteristic functions, hence discontinuous. However, in many applications the range of $S^{\ast}$ is contained in $H^{1}(\Omega)$ or $C(\bar{\Omega})$, hence Assumption SC is not applicable in this case. To overcome this, we use the regularity of the adjoint state. We say $u^{\dagger}$ satisfies the active set condition ASC if the following assumption holds. In the following we define $\chi_{A}$ to be the characteristic function of the set $A$. Recall that the adjoint state is defined by $p^{\dagger}=S^{\ast}(z-Su^{\dagger})$.

Assumption ASC (Active Set Condition).

Let $u^{\dagger}$ be a solution of (P) and assume that there exists a set $I\subseteq\Omega$ , a function $w\in Y$ , and positive constants $\kappa,c$ such that the following holds

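1. (source condition) $I\supset\{x\in\Omega:\;p^{\dagger}(x)=0\}$ and

$$\chi_{I}u^{\dagger}=\chi_{I}P_{U_{\mathrm{ad}}}(S^{\ast}w),$$

2. (structure of active set) $A:=\Omega\setminus I$ and for all $\varepsilon>0$

$$|\{x\in A:\ 0<|p^{\dagger}(x)|<\varepsilon\}|\leq c\,\varepsilon^{\kappa},$$

3. (regularity of solution) $S^{\ast}w\in L^{\infty}(\Omega)$.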

Assumption ASC is a generalization of Assumption SC, since for $I=\Omega$ both assumptions coincide. A sufficient condition for Assumption ASC can be found in [6]. If $p^{\dagger}\in C^{1}(\bar{\Omega})$ satisfies

$$\nabla p^{\dagger}(x)\neq 0\quad\forall x\in\bar{\Omega}\ \text{with}\ p^{\dagger}(x)=0,$$

then Assumption ASC is fulfilled with $A=\Omega$ and $\kappa=1$. Since Assumption SC imposes more regularity, we expect to establish improved results in this case. The regularity assumption ASC is used in e.g. [30, 32, 29, 25].
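The measure condition on the set where the adjoint state is small can be illustrated on a one-dimensional toy example. The sketch assumes $\Omega=(0,1)$ and the hypothetical adjoint state $p(x)=x-\frac{1}{2}$, whose derivative does not vanish at its zero; the function name `near_zero_measure` is also hypothetical. Here the set $\{0<|p|<\varepsilon\}$ is an interval of length $2\varepsilon$, so the ratio of measure to $\varepsilon$ should be close to $2$, in line with $\kappa=1$.

```python
import numpy as np

def near_zero_measure(p_values, cell_size, eps):
    """Approximate the measure of {x : 0 < |p(x)| < eps} from samples at cell midpoints."""
    mask = (np.abs(p_values) > 0.0) & (np.abs(p_values) < eps)
    return cell_size * np.count_nonzero(mask)

n = 100_000
x = (np.arange(n) + 0.5) / n     # midpoints of a uniform grid on (0, 1)
p = x - 0.5                      # hypothetical adjoint state with p'(x) = 1
eps_values = np.array([0.1, 0.05, 0.025])
measures = np.array([near_zero_measure(p, 1.0 / n, e) for e in eps_values])
ratios = measures / eps_values   # should stay bounded if kappa = 1 is admissible
```

A function with a flat zero, such as $p(x)=(x-\frac{1}{2})^{3}$, would instead give ratios that grow as $\varepsilon$ decreases, indicating a smaller exponent $\kappa$.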

Using these regularity assumptions we established in [25] the following convergence results.

Theorem 2.8.

Let $(u_{k})_{k}$ be the sequence generated by Algorithm A. Assume that Assumption SC holds for $u^{\dagger}$ . Then

[TABLE]

If we assume that instead Assumption ASC holds, then

[TABLE]

Note that we have

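$$\gamma_{k}^{-1}+\gamma_{k}^{-1}\sum_{j=1}^{k}\alpha_{j}^{-1}\gamma_{j}^{-\kappa}\to 0\quad\text{as }k\to\infty,$$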

see [25]. If Assumption ASC holds with $A=\Omega$ , which implies that $u^{\dagger}$ is bang-bang on $\Omega$ , we can improve the estimate of Theorem 2.8 to

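$$\|u^{\dagger}-u_{k}\|^{2}=\mathcal{O}\Big(\gamma_{k}^{-1}\sum_{j=1}^{k}\alpha_{j}^{-1}\gamma_{j}^{-\kappa}\Big).$$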
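To get a feeling for the rate expression $\gamma_{k}^{-1}\sum_{j=1}^{k}\alpha_{j}^{-1}\gamma_{j}^{-\kappa}$, it can be evaluated for a concrete parameter sequence. The sketch below assumes the constant choice $\alpha_{j}=1$ and $\kappa=1$, for which $\gamma_{k}=k$ and the expression reduces to the harmonic number $\sum_{j\leq k}1/j$ divided by $k$; the function name `asc_rate_bound` is hypothetical.

```python
import numpy as np

def asc_rate_bound(alphas, kappa):
    """Return gamma_k and gamma_k^{-1} * sum_{j<=k} alpha_j^{-1} gamma_j^{-kappa} for all k."""
    alphas = np.asarray(alphas, dtype=float)
    gamma = np.cumsum(1.0 / alphas)                    # gamma_k = sum_{j<=k} 1/alpha_j
    weighted = np.cumsum(gamma ** (-kappa) / alphas)   # inner sum up to k
    return gamma, weighted / gamma

gamma, rate = asc_rate_bound(np.ones(1000), kappa=1.0)
```

For this choice the bound decays like $\log(k)/k$, slightly slower than $\gamma_{k}^{-1}=1/k$.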

The sequence of subgradients $(\lambda_{k})_{k}$ is unbounded in general, but we can show that the weighted average $(\gamma_{k}^{-1}\lambda_{k})_{k}$ converges. The proof can be found in [25, Corollary 4.14].

Lemma 2.9.

We have

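$$\|\gamma_{k}^{-1}\lambda_{k}-p^{\dagger}\|^{2}=\begin{cases}\mathcal{O}(\gamma_{k}^{-2}) & \text{if } u^{\dagger}\text{ satisfies Assumption SC},\\ \mathcal{O}\Big(\gamma_{k}^{-2}\Big(1+\sum_{j=1}^{k}\alpha_{j}^{-1}\gamma_{j}^{-\kappa}\Big)\Big) & \text{if } u^{\dagger}\text{ satisfies Assumption ASC}.\end{cases}$$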

3 The discretized problem

The aim of this section is to introduce the operator $S_{h}$ and to establish auxiliary estimates for the discretized subproblem. These estimates will then be applied to prove convergence results in section 4.

3.1 The operator $S_{h}$

As mentioned in the introduction we want to introduce a family of linear and continuous operators $(S_{h})_{h}$ from $L^{2}(\Omega)$ to $Y$ with finite-dimensional range $Y_{h}\subset Y$ . Throughout this paper we make the following assumption. A similar assumption is also made in [31].

Assumption 3.1.

Assume that there exists a continuous and monotonically increasing function $\delta:\mathbb{R}^{+}\to\mathbb{R}^{+}$ with $\delta(0)=0$ such that

[TABLE]

holds for all $h\geq 0$ , $u_{h}\in{U_{\mathrm{ad}}}$ and $y_{h}:=S_{h}u_{h}$ .

For the case of a linear elliptic partial differential equation, the operator $S_{h}$ is the solution operator of the weak formulation with respect to the test function space $Y_{h}$ . If $Y_{h}$ is spanned by linear finite elements, this can be interpreted as the variational discretization in the sense of Hinze, see [11]. We consider a linear elliptic partial differential equation in section 5. We assume that the operator $S_{h}$ and its adjoint $S_{h}^{\ast}$ can be computed exactly.

Note that Assumption 3.1 is an assumption on the approximation of discrete functions. Under Assumption 3.1 we can establish the following discretization error estimate. The proof is similar to [31, Proposition 1.6] and is omitted here.

Lemma 3.2.

Let $u_{k}$ be the solution of

[TABLE]

and $u_{k,h}$ be the solution of the discretized problem

[TABLE]

with $\lambda\in L^{2}(\Omega)$ and $\alpha>0$ . Then we have the following estimate

[TABLE]

with the abbreviation $\rho^{2}_{k}:=\alpha_{k}^{-1}(1+\alpha_{k}^{-1})$ .

Please note that the norm of the operator $S_{h}$ is bounded in the following sense.

Lemma 3.3.

Let $0<h\leq h_{\max}$. Then there exists a constant $C>0$ independent of $h$ such that $\|S_{h}\|\leq C$.

Proof.

We compute the operator norm of $S_{h}$ and estimate

[TABLE]

∎

In the subsequent analysis we will need the following estimate.

Lemma 3.4.

There exists a constant $c>0$ independent of $h$ such that the following estimate holds for all $u_{h}\in{U_{\mathrm{ad}}}$

[TABLE]

Proof.

We compute with $y_{h}:=S_{h}u_{h}$

[TABLE]

Please note that we used the continuity of $S^{\ast}$ and the assumption on the operator $S_{h}$ . ∎

As a corollary we obtain the following result.

Lemma 3.5.

Let $u_{i}\in{U_{\mathrm{ad}}}$ for $i=1,\dots,k$. Then there exists a constant $c>0$ independent of $h$ and $k$ such that the following estimate holds

[TABLE]

3.2 A-posteriori error estimate for the discretized subproblem

We now want to consider the discretized subproblem, i.e. we replace the operator $S$ in the minimization problem (step 1) of Algorithm A with the discrete operator $S_{h}$. This gives the following problem

[TABLE]

This problem can be rewritten as the equivalent minimization problem (3.1), see also [25]. For brevity we set $\lambda:=\lambda_{k-1}$ and $\alpha:=\alpha_{k}$ .

[TABLE]

To construct an a-posteriori error estimate we use Theorem 2.2 in [20], which will give us the following result. Note that we also use Lemma 3.3 here.

Theorem 3.6.

Let $\hat{u}$ be the solution of the subproblem (3.1). Let $u_{h}\in L^{2}(\Omega)$ be given and define $y_{h}:=S_{h}u_{h}$ and $p_{h}:=S_{h}^{\ast}(z-y_{h})$. Let $0<h\leq h_{\max}$ with $h_{\max}>0$. Then there exists a constant $c>0$ independent of $h$ such that

[TABLE]

This result allows us to estimate the distance to the exact solution of the subproblem. Note that the problem (3.1) is uniquely solvable if $\alpha>0$, see [25].

For brevity we set

[TABLE]

Let $u\in L^{2}(\Omega)$ be an approximate solution to the discretized subproblem (3.1). The quantity $\mathcal{B}(\alpha,\lambda,u)$ then is an upper bound for the accuracy of $u$ . This is part of the next result. The proof follows directly with Lemma 3.3 and Theorem 3.6.

Lemma 3.7.

Assume that $0<h\leq h_{\max}$. Let $\hat{u}$ be the solution of the discretized subproblem (3.1). Then there exists a constant $c>0$ independent of $h$ such that the following implication holds for all $u\in L^{2}(\Omega)$ and $\varepsilon\geq 0$:

[TABLE]

Let us close this section with the following remark. As mentioned in [11] the solution of the discretized subproblem (3.1) can be approximated with arbitrary accuracy. This will play a role in the analysis presented in the next section.

4 Inexact Bregman iteration

Solving the subproblem

[TABLE]

exactly is very costly and in general not possible. We therefore suggest the following inexact Bregman iteration which can be interpreted as an inexact version of Algorithm A.

Inexact Bregman iterations are analysed in the literature, see e.g. [7, 18, 19, 1] for a finite dimensional approach, and for an abstract Banach space setting, see [26].

Algorithm B.

Let $u_{0}^{\mathrm{in}}=P_{U_{\mathrm{ad}}}(0)\in{U_{\mathrm{ad}}}$, $\lambda_{0}^{\mathrm{in}}=0\in\partial J(u_{0})$ and $k=1$.

1. Find $u_{k}^{\mathrm{in}}$ with $y_{k}^{\mathrm{in}}={S_{h}}u_{k}^{\mathrm{in}}$ and $p_{k}^{\mathrm{in}}={S_{h}}^{\ast}(z-{S_{h}}u_{k}^{\mathrm{in}})$ such that

[TABLE]

2. Set

[TABLE]

3. Set $k:=k+1$, go back to 1.

Here $(\varepsilon_{k})_{k}$ is a given sequence of non-negative real numbers controlling the accuracy of the approximate solution $u_{k}^{\mathrm{in}}$. For $\varepsilon_{k}=0$ for all $k\in\mathbb{N}$ and $h=0$, Algorithm A is recovered.

The analysis of Algorithm A presented in [25] is based on the fact that $\lambda_{k}\in\partial J(u_{k})$. This is guaranteed by the construction of $\lambda_{k}$. However, since $S_{h}\neq S$ and $\varepsilon_{k}>0$ in general, we cannot expect that $\lambda_{k}^{\mathrm{in}}\in\partial J(u_{k}^{\mathrm{in}})$ holds.
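The loss of the subgradient property under inexact solves can be observed in a small finite-dimensional experiment. The sketch below is a rough illustration, not the paper's Algorithm B: the accuracy test uses a projected-gradient residual as a stand-in for the estimator $\mathcal{B}$, the update $\lambda_{k}^{\mathrm{in}}=\lambda_{k-1}^{\mathrm{in}}+\alpha_{k}^{-1}p_{k}^{\mathrm{in}}$ is an assumed form, the matrix `S_h` is random, and all function names are hypothetical. The helper `subgradient_violation` measures how far $\lambda-u$ is from the normal cone of the box at $u$.

```python
import numpy as np

def inexact_solve(S, z, lam, alpha, lo, hi, u_start, eps, max_iters=5000):
    """Projected gradient on the subproblem, stopped once a residual drops below eps."""
    step = 1.0 / (np.linalg.norm(S, 2) ** 2 + alpha)
    u = u_start.copy()
    for _ in range(max_iters):
        grad = S.T @ (S @ u - z) + alpha * (u - lam)
        u_next = np.clip(u - step * grad, lo, hi)
        residual = np.linalg.norm(u_next - u) / step   # stand-in accuracy measure (assumption)
        u = u_next
        if residual <= eps:
            break
    return u

def subgradient_violation(u, lam, lo, hi):
    """Distance of lam - u to the normal cone of the box at u (zero iff lam is a subgradient of J at u)."""
    n = lam - u
    viol = np.where(u <= lo, np.maximum(n, 0.0),
                    np.where(u >= hi, np.maximum(-n, 0.0), np.abs(n)))
    return float(np.linalg.norm(viol))

def inexact_bregman(S, z, lo, hi, alphas, eps_seq):
    u = np.clip(np.zeros(S.shape[1]), lo, hi)
    lam = np.zeros_like(u)
    for alpha, eps in zip(alphas, eps_seq):
        u = inexact_solve(S, z, lam, alpha, lo, hi, u, eps)
        lam = lam + S.T @ (z - S @ u) / alpha   # assumed update with p = S^T (z - S u)
    return u, lam

rng = np.random.default_rng(2)
S_h = rng.standard_normal((8, 5))   # hypothetical discrete operator
z = rng.standard_normal(8)
alphas = [1.0] * 10
u_loose, lam_loose = inexact_bregman(S_h, z, -1.0, 1.0, alphas, [1.0 / k**2 for k in range(1, 11)])
u_tight, lam_tight = inexact_bregman(S_h, z, -1.0, 1.0, alphas, [1e-10] * 10)
viol_loose = subgradient_violation(u_loose, lam_loose, -1.0, 1.0)
viol_tight = subgradient_violation(u_tight, lam_tight, -1.0, 1.0)
```

With the tight tolerance the violation should be at the level of the inner accuracy, while for looser tolerances it is in general not zero, which is the difficulty the subsequent analysis has to handle.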

Before we start to establish robustness results we want to give an overview over the different auxiliary problems we are going to use. Furthermore we want to introduce and clarify our notation.

4.1 Notation and auxiliary results

The aim of this section is to summarize the most important notations and abbreviations. Our aim is to solve the unregularized problem

[TABLE]

This problem is solvable, and we fix a solution $u^{\dagger}$. We assume that this function satisfies one of the regularity assumptions SC or ASC. In Algorithm A we have to solve the following regularized problem, which we will refer to as the subproblem

[TABLE]

with some $\lambda\in L^{2}(\Omega)$ and $\alpha_{k+1}>0$ . Here the (exact) unique solution is denoted with $u_{k+1}^{\mathrm{ex}}$ . The superscript ex stands for exact solution.

However, since the operator $S$ is not computable in general, we introduced the operator $S_{h}$ , which is an approximation of $S$ . We now replace $S$ with $S_{h}$ in (4.1) and obtain the discretized subproblem

[TABLE]

Again this problem is uniquely solvable and its solution is denoted by $u_{k+1,h}^{\mathrm{ex}}$. The subscript $h$ indicates that it is a discrete solution. Under suitable assumptions we can estimate the discretization error between $u_{k+1}^{\mathrm{ex}}$ and $u_{k+1,h}^{\mathrm{ex}}$. This is done in Lemma 3.2.

Please note that neither $u_{k+1}^{\mathrm{ex}}$ nor $u_{k+1,h}^{\mathrm{ex}}$ is computed during the algorithm. As mentioned above we can approximate $u_{k+1,h}^{\mathrm{ex}}$ with arbitrary precision. So we compute an inexact solution of (3.1), which is denoted by $u_{k+1}^{\mathrm{in}}$. We use the function $\mathcal{B}$ to measure the accuracy.

To control the accuracy during the algorithm we introduce a sequence $(\varepsilon_{k})_{k}$ of positive real values. In each iteration we now search for a function $u_{k+1}^{\mathrm{in}}\in{U_{\mathrm{ad}}}$ such that $\mathcal{B}(\alpha,\lambda,u_{k+1}^{\mathrm{in}})\leq\varepsilon_{k+1}$ .

In the end we want to estimate the error $\|u^{\dagger}-u_{k}^{\mathrm{in}}\|$. This is done using the triangle inequality

[TABLE]

Note that $(I)$ is controlled by the accuracy $\varepsilon_{k}$ and $(II)$ is limited by the discretization error. It remains to estimate the regularization error $(III)$ with the help of the regularity assumptions.

We also want to recall the following definitions, as they will appear quite often.

[TABLE]

4.2 Convergence under Assumption SC

We now start to analyse Algorithm B with $u^{\dagger}$ satisfying Assumption SC.

Theorem 4.1.

Let $u^{\dagger}$ satisfy Assumption SC and let $(\varepsilon_{k})_{k}$ be a sequence of positive real numbers. Furthermore let $h>0$ be given and let $(u_{k}^{\mathrm{in}})_{k}$ be a sequence generated by Algorithm B. Then we have the estimate

[TABLE]

with the abbreviations

[TABLE]

Proof.

The proof is based on the splitting of the error $\|u_{k}^{\mathrm{in}}-u^{\dagger}\|$ in three parts, see (4.2)

[TABLE]

Here $(I)$ is controlled by the given accuracy $\varepsilon_{k}$ and $(II)$ can be estimated with the help of Lemma 3.2:

[TABLE]

It is left to estimate $(III)$ . We start with adding the optimality conditions for $u_{k+1}^{\mathrm{ex}}$ and $u^{\dagger}$ , see [25, Lemma 3.1] and Theorem 2.2,

[TABLE]

Addition yields

[TABLE]

For the term $(u^{\dagger},u^{\dagger}-u_{k+1}^{\mathrm{ex}})$ we estimate with help of the source condition SC

[TABLE]

To estimate the remaining term $(-\lambda_{k}^{\mathrm{in}},u^{\dagger}-u_{k+1}^{\mathrm{ex}})$ we introduce the quantity

[TABLE]

This quantity will be helpful in the subsequent analysis. Let us sketch the next steps. First we will replace the operator $S_{h}$ by $S$ in order to apply the first order conditions for $u^{\dagger}$ . Second we eliminate the unknown exact solution $u_{k+1}^{\mathrm{ex}}$ by its approximation $u_{k+1}^{\mathrm{in}}$ . For the first part we make use of Lemma 3.5 and estimate

[TABLE]

Now we eliminate the variable $z$ by using the first order conditions for $u^{\dagger}$ presented in Theorem 2.2

[TABLE]

Since the variable $u_{k+1}^{\mathrm{ex}}$ is unknown we replace it by its approximation $u_{k+1}^{\mathrm{in}}$

[TABLE]

Now we use (4.6) and (4.7) in (4.5) and obtain

[TABLE]

In the next step we plug (4.4) and (4.8) in (4.3)

[TABLE]

Before we proceed we need two additional results. A calculation reveals that

[TABLE]

holds. Second we obtain

[TABLE]

Furthermore we use Young’s inequality and (4.10) to establish for $\tau>1$ :

[TABLE]

This now yields

[TABLE]

with $c_{\tau}=1-\frac{1}{2}\left(1+\frac{1}{\tau}\right)>0$ and the abbreviations

[TABLE]

Summation over $k$ finally reveals

[TABLE]

where we used the convention $v_{0}^{\mathrm{in}}=0$. The result now follows from the triangle inequality.

∎

Let us point out that the variables $R_{i}$ can be identified with the accuracy of the iterates, while the $H_{i}$ are only influenced by the discretization. The result above can now be interpreted in different ways. First we start with the (theoretical) case that we can evaluate the operator $S$ and its adjoint $S^{\ast}$. This refers to the case where $h=0$.

Corollary 4.2.

Let $u^{\dagger}$ satisfy Assumption SC and let $(\varepsilon_{k})_{k}$ be a sequence of positive real numbers such that

[TABLE]

Furthermore assume that $S_{h}=S$ and let $(u_{k}^{\mathrm{in}})_{k}$ be a sequence generated by Algorithm B. Then we have $u_{k}^{\mathrm{in}}\to u^{\dagger}$ in $L^{2}(\Omega)$ .

The other interesting case is that we can solve the discretized subproblem exactly, i.e. $\varepsilon_{k}=0$ for all $k\in\mathbb{N}$. Here we obtain convergence in the following sense.

Corollary 4.3.

Let $u^{\dagger}$ satisfy Assumption SC. Let $h_{\max}>0$ be given and $\varepsilon_{k}=0$ for all $k\in\mathbb{N}$ . Then there exists a constant C such that for every $0<h\leq h_{\max}$ there exists a stopping index $k(h)$ such that

[TABLE]

and $k(h)\to\infty$ as $h\to 0$ . Furthermore $u_{k(h)}^{\mathrm{in}}\to u^{\dagger}$ as $h\to 0$ .

Proof.

We only have to show the existence of such a stopping index. The convergence result then is a direct consequence of Theorem 4.1. Let us define the following auxiliary variables

[TABLE]

It is clear that $A_{k},B_{k}\to\infty$ as $k\to\infty$ . Now choose $C>0$ sufficiently large such that

[TABLE]

Now pick $0<h\leq h_{\max}$. Since $\delta:(0,\infty)\to\mathbb{R}$ is a monotonically increasing function, we get the existence of $\tilde{k}\in\mathbb{N}$, $\tilde{k}\geq 1$ such that

[TABLE]

Hence, the following expression is well-defined

[TABLE]

It is left to show that $k(h)\to\infty$ as $h\to 0$. Assume the contrary, i.e. there exists an $n\in\mathbb{N}$ such that $k(h)<n$ for all $h>0$. This yields

[TABLE]

However, since $A_{i}$ and $B_{i}$ are independent of $h$, this is a contradiction for $h$ small enough. This finishes the proof. ∎

If the discretized subproblem is only solved inexactly we can establish the following result. The proof is a combination of Corollary 4.2 and Corollary 4.3.

Corollary 4.4.

Let $u^{\dagger}$ satisfy Assumption SC and let $(\varepsilon_{k})_{k}$ be a sequence of positive real numbers such that

[TABLE]

Let $h_{\max}>0$ be given. Then there exists a constant $C$ such that for every $0<h\leq h_{\max}$ there exists a stopping index $k(h)$ such that

[TABLE]

and $k(h)\to\infty$ as $h\to 0$ . Furthermore $u_{k(h)}^{\mathrm{in}}\to u^{\dagger}$ as $h\to 0$ .

4.3 Convergence under Assumption ASC

Let us now consider the case when Assumption ASC is satisfied.

Theorem 4.5.

Let $u^{\dagger}$ satisfy Assumption ASC and let $(\varepsilon_{k})_{k}$ be a sequence of positive real numbers. Furthermore let $h>0$ be given and let $(u_{k}^{\mathrm{in}})_{k}$ be a sequence generated by Algorithm B. Then we have the estimate

[TABLE]

with the abbreviations

[TABLE]

Proof.

The proof mainly follows the idea of Theorem 4.1. Again the main part is to establish estimates for the regularization error for $u_{k+1}^{\mathrm{ex}}$ . First we want to estimate the term $(u^{\dagger},u^{\dagger}-u)$ using Assumption ASC. We use [25, Lemma 4.12] and obtain

[TABLE]

This inequality introduces an additional $L^{1}$-term. To compensate for this term we use an improved optimality condition, which is valid under Assumption ASC,

[TABLE]

with $c_{A}>0$ . For a proof we refer to [25, Lemma 4.11]. Similar to (4.6) we compute

[TABLE]

Now we estimate the term $(u^{\dagger},u^{\dagger}-u_{k+1}^{\mathrm{ex}})$ using Young’s inequality

[TABLE]

We now consider the following inequality, similar to (4.3)

[TABLE]

and use again the equality

[TABLE]

As done in Theorem 4.1 we obtain with $\tau>1$ that

[TABLE]

Combining everything now reveals with some $c_{\tau}>0$

[TABLE]

Now we plug in our estimate (4.14) and obtain

[TABLE]

As in the proof of Theorem 4.1 we apply the triangle inequality to finish the proof.

∎

Let us now establish convergence results similar to Corollary 4.2 and 4.3.

Corollary 4.6.

Let $u^{\dagger}$ satisfy Assumption ASC and let $(\varepsilon_{k})_{k}$ be a sequence of positive real numbers such that $\gamma_{i-1}\varepsilon_{i}\to 0$ . Furthermore assume that $S_{h}=S$ and let $(u_{k}^{\mathrm{in}})_{k}$ be a sequence generated by Algorithm B. Then we obtain

[TABLE]

as $k\to\infty$ .

Proof.

The sequence $(\alpha_{k})_{k}$ is bounded by a constant $M$ . Hence we have the following inequalities for $k$ large enough

[TABLE]

Furthermore we have by [25, Lemma 3.5] that

[TABLE]

We now obtain

[TABLE]

which finishes the proof. ∎

Corollary 4.7.

Let $u^{\dagger}$ satisfy Assumption ASC. Let $h_{\max}>0$ be given and $\varepsilon_{k}=0$ for all $k\in\mathbb{N}$. Then there exists a constant $C$ such that for every $0<h\leq h_{\max}$ there exists a stopping index $k(h)$ such that

[TABLE]

and $k(h)\to\infty$ as $h\to 0$ . Furthermore

[TABLE]

as $h\to 0$ .

Proof.

The proof is very similar to the proof of Corollary 4.3. ∎

A combination of both results yields the following corollary.

Corollary 4.8.

Let $u^{\dagger}$ satisfy Assumption ASC and let $(\varepsilon_{k})_{k}$ be a sequence of positive real numbers such that $\gamma_{i-1}\varepsilon_{i}\to 0$. Let $h_{\max}>0$ be given. Then there exists a constant $C$ such that for every $0<h\leq h_{\max}$ there exists a stopping index $k(h)$ such that

[TABLE]

and $k(h)\to\infty$ as $h\to 0$ . Furthermore

[TABLE]

as $h\to 0$ .

5 Numerical example

Now, let $y=Su$ be defined as the (weak) solution of the linear partial differential equation for a convex set $\Omega\subset\mathbb{R}^{n}$ ($n=2,3$)

$$-\Delta y=u\ \text{ in }\Omega,\qquad y=0\ \text{ on }\partial\Omega.\qquad(5.1)$$

Let us show that this example fits into our framework. Clearly, for $u\in L^{2}(\Omega)$ equation (5.1) has a unique weak solution $y\in H_{0}^{1}(\Omega)$, and the associated solution operator $S$ is linear and continuous. For the choice $Y=L^{2}(\Omega)$ we obtain $S^{\ast}=S$.

Let us now report on the discretization and the operator $S_{h}$. We follow the argumentation and results presented in [31, Section 3]. Let $\mathcal{T}_{h}$ be a regular mesh which consists of closed cells $T$. For $T\in\mathcal{T}_{h}$ we define $h_{T}:=\operatorname{diam}T$. Furthermore we set $h:=\max_{T\in\mathcal{T}_{h}}h_{T}$. We assume that there exists a constant $R>0$ such that $\frac{h_{T}}{R_{T}}\leq R$ for all $T\in\mathcal{T}_{h}$. Here $R_{T}$ denotes the diameter of the largest ball contained in $T$.
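The two mesh quantities above can be computed directly for a triangulation. The following is a minimal sketch, assuming a two-dimensional mesh given as a list of non-degenerate triangles with explicit vertex coordinates. The function names `shape_regularity` and `mesh_parameters` are hypothetical and not taken from the paper. For a triangle, $h_{T}$ is the longest edge and $R_{T}$ is twice the inradius, which equals the area divided by the semi-perimeter.

```python
import math

def shape_regularity(p0, p1, p2):
    """Return (h_T, R_T, h_T / R_T) for a non-degenerate triangle."""
    a = math.dist(p1, p2)
    b = math.dist(p0, p2)
    c = math.dist(p0, p1)
    h_T = max(a, b, c)                  # diameter of a triangle = longest edge
    semi = 0.5 * (a + b + c)            # semi-perimeter
    area = 0.5 * abs((p1[0] - p0[0]) * (p2[1] - p0[1])
                     - (p2[0] - p0[0]) * (p1[1] - p0[1]))
    R_T = 2.0 * area / semi             # diameter of the inscribed circle
    return h_T, R_T, h_T / R_T

def mesh_parameters(triangles):
    """Return (h, max_T h_T / R_T, max_T h_T / min_T h_T) for a triangle list."""
    stats = [shape_regularity(*tri) for tri in triangles]
    diameters = [entry[0] for entry in stats]
    worst_ratio = max(entry[2] for entry in stats)
    return max(diameters), worst_ratio, max(diameters) / min(diameters)

# Illustrative mesh (an assumption): the unit square split into two right triangles.
unit_square = [((0.0, 0.0), (1.0, 0.0), (1.0, 1.0)),
               ((0.0, 0.0), (1.0, 1.0), (0.0, 1.0))]
h, worst_ratio, quasi_uniformity = mesh_parameters(unit_square)
```

The second returned value is the smallest admissible constant $R$ for the given mesh. The third value is the ratio of the largest to the smallest cell diameter, which measures how uniform the cell sizes are.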

For this mesh $\mathcal{T}_{h}$ we define an associated finite dimensional space $Y_{h}\subset H_{0}^{1}(\Omega)$, such that the restriction of a function $v\in Y_{h}$ to a cell $T\in\mathcal{T}_{h}$ is a linear polynomial.

The operator $S_{h}$ is now defined in the sense of weak solutions. We set $y_{h}:=S_{h}u$ if $y_{h}\in Y_{h}$ solves

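$$\int_{\Omega}\nabla y_{h}\cdot\nabla v_{h}\,\mathrm{d}x=\int_{\Omega}u\,v_{h}\,\mathrm{d}x\qquad\forall v_{h}\in Y_{h}.$$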

We also obtain $S_{h}^{\ast}=S_{h}$ in the discrete case. Let us now mention that the operator $S_{h}$ satisfies Assumption 3.1. Following [31] and the references therein we obtain the following result.

Lemma 5.1.

Assume that there exists a constant $C_{M}>1$ such that $\max\limits_{T\in\mathcal{T}_{h}}h_{T}\leq C_{M}\min\limits_{T\in\mathcal{T}_{h}}h_{T}$ holds. Then we have the estimates

[TABLE]

for $f\in L^{2}(\Omega)$ and a constant $c$ independent of $f$ and $h$.

Hence Assumption 3.1 is satisfied with $\delta(h)=ch^{2}$ .
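The quadratic rate $\delta(h)=ch^{2}$ can be observed numerically in a simplified setting. The sketch below is not the discretization used in the paper. It assumes a one-dimensional analogue, $-y''=u$ on $(0,1)$ with homogeneous Dirichlet conditions, discretized by the standard three-point finite difference stencil on a uniform mesh, and it measures the maximum nodal error instead of the operator norm. The test data $u(x)=\pi^{2}\sin(\pi x)$ with exact state $y(x)=\sin(\pi x)$ and the function name `solve_poisson_1d` are illustrative assumptions.

```python
import math

def solve_poisson_1d(n):
    """Solve -y'' = u on (0,1), y(0) = y(1) = 0, on a uniform mesh with n cells.

    Assumed test data: u(x) = pi^2 sin(pi x), so that y(x) = sin(pi x).
    Returns the maximum nodal error of the three-point stencil solution.
    """
    h = 1.0 / n
    x = [i * h for i in range(1, n)]     # interior nodes
    m = n - 1
    # Tridiagonal system: -y_{i-1} + 2 y_i - y_{i+1} = h^2 u(x_i)
    rhs = [h * h * math.pi ** 2 * math.sin(math.pi * xi) for xi in x]
    diag = [2.0] * m
    # Thomas algorithm, forward elimination (off-diagonal entries are -1)
    for i in range(1, m):
        w = -1.0 / diag[i - 1]
        diag[i] -= w * (-1.0)
        rhs[i] -= w * rhs[i - 1]
    # Back substitution
    y = [0.0] * m
    y[-1] = rhs[-1] / diag[-1]
    for i in range(m - 2, -1, -1):
        y[i] = (rhs[i] + y[i + 1]) / diag[i]
    return max(abs(yi - math.sin(math.pi * xi)) for yi, xi in zip(y, x))

# Halve the mesh size repeatedly and estimate the observed convergence order.
errors = [solve_poisson_1d(n) for n in (8, 16, 32, 64)]
rates = [math.log2(errors[i] / errors[i + 1]) for i in range(len(errors) - 1)]
```

Each halving of $h$ reduces the error by a factor of about four, so the entries of `rates` are close to $2$, in line with an estimate of the form $ch^{2}$.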

Let us briefly report on the computation of the solution of (3.1). In [24, Section 4] we applied a variational discretization and a semi-smooth Newton solver to this problem. The space $Y_{h}$ was defined as the span of linear finite elements. This gives us approximate solutions $(u_{h},y_{h},p_{h})$ such that $y_{h}:=S_{h}u_{h}\in Y_{h}$, $p_{h}:=S_{h}^{\ast}(z-y_{h})\in Y_{h}$ and $u_{h}\in{U_{\mathrm{ad}}}$. Here the control $u_{h}$ can be computed as the truncation of a finite element function. For more details we refer to [24, 6, 2] and the references therein.

We now consider the following optimal control problem. Note that due to the linearity of $S$ this is of form (P).

[TABLE]

We use the inexact Bregman method (Algorithm B) to solve (5.2). With the choice of $\Omega=(0,1)^{2}$, $u_{a}=-1$, $u_{b}=1$ and

[TABLE]

the functions $(u^{\dagger},y^{\dagger},p^{\dagger})$ are a solution to (5.2). Here the solution satisfies Assumption ASC with $A=\Omega$ and $\kappa=1$. We use different mesh sizes for comparison and plot the error for the first $500$ iterations in Figures 1, 2 and 3. Furthermore we set $\alpha_{k}:=0.1$ and $\varepsilon_{k}:=k^{-3/2}$ to satisfy the assumptions of Corollary 4.8. As expected we see that for $h\to 0$ we obtain convergence for $k\to\infty$. The coarsest mesh has $10^{2}$ and the finest mesh has approximately $10^{5}$ degrees of freedom.
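The condition $\gamma_{i-1}\varepsilon_{i}\to 0$ from Corollary 4.8 can be checked for this parameter choice with a few lines of code. The sketch below assumes the definition $\gamma_{k}:=\sum_{j=1}^{k}\alpha_{j}^{-1}$ with $\gamma_{0}=0$, which is not restated in this section and should be compared with the paper's notation. The function name `gamma_eps_products` is hypothetical.

```python
def gamma_eps_products(alpha, eps, n_iter):
    """Return [gamma_{k-1} * eps_k for k = 1, ..., n_iter].

    Assumption: gamma_k = sum_{j=1}^{k} 1 / alpha_j, with gamma_0 = 0.
    alpha and eps are callables mapping the iteration index k to a float.
    """
    products = []
    gamma_prev = 0.0                     # gamma_0 (empty sum)
    for k in range(1, n_iter + 1):
        products.append(gamma_prev * eps(k))
        gamma_prev += 1.0 / alpha(k)     # gamma_prev now holds gamma_k
    return products

# Parameters from the numerical example: alpha_k = 0.1, eps_k = k^(-3/2).
products = gamma_eps_products(lambda k: 0.1, lambda k: k ** -1.5, 500)
```

Under this assumption $\gamma_{k-1}\varepsilon_{k}=10(k-1)k^{-3/2}$, which peaks at $k=3$ and then decays like $10/\sqrt{k}$. The product tends to zero, but slowly: after $500$ iterations it is still about $0.45$.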

6 Conclusion

We showed that our iterative method is robust against numerical errors. Furthermore we established error estimates and convergence results for errors introduced both by the inexact computation of the iterates and by the discretization. We constructed an a-posteriori error estimator for the discretized subproblem and provided numerical results.

Together with the exact a-priori regularization estimates [25] and the convergence results obtained for noisy data [24], we conclude that the Bregman iterative method is a stable and robust method to compute solutions for our model problem (P).

Bibliography

[1] A. Benfenati and V. Ruggiero. Inexact Bregman iteration with an application to Poisson data reconstruction. Inverse Problems, 29(6):065016, 31, 2013.
[2] S. Beuchler, C. Pechstein, and D. Wachsmuth. Boundary concentrated finite elements for optimal boundary control problems of elliptic PDEs. Comput. Optim. Appl., 51(2):883–908, 2012.
[3] L. M. Bregman. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. USSR Computational Mathematics and Mathematical Physics, 7:200–217, 1967.
[4] M. Burger, E. Resmerita, and L. He. Error estimation for Bregman iterations and inverse scale space methods in image restoration. Computing, 81(2-3):109–135, 2007.
[5] G. Chavent and K. Kunisch. Convergence of Tikhonov regularization for constrained ill-posed inverse problems. Inverse Problems, 10(1):63–76, 1994.
[6] K. Deckelnick and M. Hinze. A note on the approximation of elliptic control problems with bang-bang controls. Comput. Optim. Appl., 51(2):931–939, 2012.
[7] J. Eckstein. Approximate iterations in Bregman-function-based proximal algorithms. Math. Programming, 83(1, Ser. A):113–123, 1998.
[8] H. W. Engl, M. Hanke, and A. Neubauer. Regularization of inverse problems, volume 375 of Mathematics and its Applications. Kluwer Academic Publishers Group, Dordrecht, 1996.


This work was funded by the German Research Foundation (DFG) under project grant Wa 3626/1-1.

