A convex analysis approach to optimal controls with switching structure for partial differential equations

Christian Clason; Kazufumi Ito; Karl Kunisch

arXiv:1702.07540·math.OC·February 27, 2017


TL;DR

This paper introduces a convex analysis framework for optimal control problems with switching structures in PDEs, enabling explicit characterization and efficient computation of solutions with zero optimality gap.

Contribution

It develops a convex relaxation and a primal-dual system for hybrid control costs, facilitating semismooth Newton methods for PDE control problems with switching controls.

Findings

01. Explicit pointwise characterization of optimal controls.

02. Zero optimality gap for controls with switching structure.

03. Numerical examples demonstrate method effectiveness.

Abstract

Optimal control problems involving hybrid binary-continuous control costs are challenging due to their lack of convexity and weak lower semicontinuity. Replacing such costs with their convex relaxation leads to a primal-dual optimality system that allows an explicit pointwise characterization and whose Moreau-Yosida regularization is amenable to a semismooth Newton method in function space. This approach is especially suited for computing switching controls for partial differential equations. In this case, the optimality gap between the original functional and its relaxation can be estimated and shown to be zero for controls with switching structure. Numerical examples illustrate the effectiveness of this approach.

Equations

$$|\cdot|_{0}:\mathbb{R}\to\mathbb{R},\qquad|t|_{0}:=\begin{cases}1&\text{if }t\neq 0,\\0&\text{if }t=0,\end{cases}$$

$$g(v)=\frac{\alpha}{2}\left(v_{1}^{2}+v_{2}^{2}\right)+\beta|v_{1}v_{2}|_{0}.$$

$$\left\{\begin{aligned}\min_{u\in L^{2}(0,T;\mathbb{R}^{2})}\ &\frac{1}{2}\|y-z\|_{L^{2}(\omega_{T})}^{2}+\int_{0}^{T}g(u(t))\,dt,\\\text{s.t.}\ &Ly=Bu.\end{aligned}\right.$$

$$\min_{u}\ \mathcal{F}(u)+\mathcal{G}(u),$$

$$\min_{u}\ \mathcal{F}(u)+\mathcal{G}^{**}(u),$$

$$\begin{cases}-p_{\gamma}\in\partial\mathcal{F}(u_{\gamma}),\\u_{\gamma}=(\partial\mathcal{G}^{*})_{\gamma}(p_{\gamma}),\end{cases}$$

$$(\mathcal{P})$$

$$\mathcal{G}^{*}:U^{*}\to\mathbb{R}\cup\{\infty\},\qquad\mathcal{G}^{*}(p)=\sup_{u\in U}\langle u,p\rangle-\mathcal{G}(u),$$

$$\begin{cases}-\bar{p}\in\partial\mathcal{F}(\bar{u}),\\\bar{u}\in\partial\mathcal{G}^{*}(\bar{p}),\end{cases}$$

$$\min_{u\in U}\ \mathcal{F}(u)+\mathcal{G}^{**}(u),$$

$$\begin{cases}\mathcal{F}\text{ is convex and weakly lower-semicontinuous,}\\\mathcal{G}\text{ is proper and non-negative,}\\\mathcal{F}+\mathcal{G}^{**}\text{ is radially unbounded.}\end{cases}$$

$$\bigcup_{\lambda\geq 0}\lambda\left(\operatorname{dom}\mathcal{F}-\operatorname{dom}\mathcal{G}^{**}\right)\text{ is a closed vector space}$$

$$\delta(u,p):=\mathcal{G}(u)+\mathcal{G}^{*}(p)-\langle p,u\rangle$$

$$\mathcal{J}(\bar{u})\leq\mathcal{J}(u)+\delta(\bar{u},\bar{p})\quad\text{for all }u\in U.$$

$$\mathcal{F}(u)-\mathcal{F}(\bar{u})-\langle-\bar{p},u-\bar{u}\rangle\geq 0.$$

$$\mathcal{G}(u)-\mathcal{G}(\bar{u})-\langle\bar{p},u-\bar{u}\rangle=\mathcal{G}(u)-\langle\bar{p},u\rangle+\mathcal{G}^{*}(\bar{p})-\delta(\bar{u},\bar{p})\geq-\delta(\bar{u},\bar{p}).$$

$$\mathcal{J}(u)-\mathcal{J}(\bar{u})=\left(\mathcal{F}(u)+\mathcal{G}(u)\right)-\left(\mathcal{F}(\bar{u})+\mathcal{G}(\bar{u})\right)=\left(\mathcal{F}(u)-\mathcal{F}(\bar{u})-\langle-\bar{p},u-\bar{u}\rangle\right)+\left(\mathcal{G}(u)-\mathcal{G}(\bar{u})-\langle\bar{p},u-\bar{u}\rangle\right)\geq-\delta(\bar{u},\bar{p}).$$

$$u=(\partial\mathcal{G}^{*})_{\gamma}(p):=\frac{1}{\gamma}\left(p-\mathrm{prox}_{\gamma\mathcal{G}^{*}}(p)\right),$$

$$\mathrm{prox}_{\gamma f}(v)=\arg\min_{w}\ f(w)+\frac{1}{2\gamma}\|w-v\|^{2}$$

$$f_{\gamma}(v)=f\left(\mathrm{prox}_{\gamma f}(v)\right)+\frac{1}{2\gamma}\|\mathrm{prox}_{\gamma f}(v)-v\|^{2}$$

$$(\partial f)_{\gamma}=\frac{1}{\gamma}\left(\mathrm{Id}-(\mathrm{Id}+\gamma\partial f)^{-1}\right)=\partial f\circ(\mathrm{Id}+\gamma\partial f)^{-1},$$

$$\begin{cases}-p_{\gamma}\in\partial\mathcal{F}(u_{\gamma}),\\u_{\gamma}=H_{\gamma}(p_{\gamma}).\end{cases}$$

$$\min_{u}\ \mathcal{F}(u)+(\mathcal{G}_{\gamma}^{*})^{*}(u),$$

$$\begin{cases}\text{(i) }\mathcal{F}\text{ is Fréchet differentiable, }\mathcal{F}^{\prime}\text{ has weakly closed graph, and}\\\text{(ii) }\{\mathcal{F}(u_{\gamma})\}_{\gamma>0}\text{ bounded implies }\{\mathcal{F}^{\prime}(u_{\gamma})\}_{\gamma>0}\text{ bounded,}\end{cases}$$

$$\{p_{\gamma}\}_{\gamma>0}\text{ bounded implies }\Big\{\inf_{q\in\partial\mathcal{G}^{*}(p_{\gamma})}\|q\|_{U}\Big\}_{\gamma>0}\text{ bounded.}$$

$$(\mathcal{G}_{\gamma}^{*})^{*}(0)=\sup_{p\in U}-\mathcal{G}_{\gamma}^{*}(p)=\inf_{p\in U}\mathcal{G}_{\gamma}^{*}(p)\leq\inf_{p\in U}\mathcal{G}^{*}(p)$$

$$\mathcal{F}(u_{\gamma})\leq\mathcal{F}(u_{\gamma})+(\mathcal{G}_{\gamma}^{*})^{*}(u_{\gamma})\leq\mathcal{F}(0)+\inf_{p\in U}\mathcal{G}^{*}(p).$$

$$\{p_{\gamma}\}_{\gamma>0}=\{-\mathcal{F}^{\prime}(u_{\gamma})\}_{\gamma>0}$$

$$\|u_{\gamma}\|_{U}=\|H_{\gamma}(p_{\gamma})\|_{U}\leq\inf_{q\in\partial\mathcal{G}^{*}(p_{\gamma})}\|q\|_{U}\leq C,$$

$$\hat{p}=-\mathcal{F}^{\prime}(\hat{u}).$$

$$\langle H_{\gamma_{1}}(p_{\gamma_{1}})-H_{\gamma_{2}}(p_{\gamma_{2}}),p_{\gamma_{1}}-p_{\gamma_{2}}\rangle=-\langle u_{\gamma_{1}}-u_{\gamma_{2}},\mathcal{F}^{\prime}(u_{\gamma_{1}})-\mathcal{F}^{\prime}(u_{\gamma_{2}})\rangle\leq 0,$$


Code & Models

Repositories

clason/switchingcontrol (official)


Full text


Christian Clason, Faculty of Mathematics, University Duisburg-Essen, 45117 Essen, Germany

Kazufumi Ito, Department of Mathematics, North Carolina State University, Raleigh, North Carolina, USA

Karl Kunisch, Institute of Mathematics and Scientific Computing, University of Graz, Heinrichstrasse 36, 8010 Graz, Austria

(March 15, 2015)


1 Introduction

In the context of control of differential equations, switching control refers to problems with two or more controls of which only one should be active at every point in time. This is a challenging problem due to its hybrid discrete–continuous nature.

To partially set the stage, consider the parabolic partial differential equation $Ly=Bu$ on $\Omega_{T}:=[0,T]\times\Omega$ , where $L=\partial_{t}-A$ for an elliptic operator $A$ defined on $\Omega\subset\mathbb{R}^{n}$ , and $B$ is defined by $(Bu)(t,x)=\chi_{\omega_{1}}(x)u_{1}(t)+\chi_{\omega_{2}}(x)u_{2}(t)$ for given control domains $\omega_{1},\omega_{2}\subset\overline{\Omega}$ (which may include controls acting on the boundary). To promote a switching structure, we propose to use the binary function

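$$|\cdot|_{0}:\mathbb{R}\to\mathbb{R},\qquad|t|_{0}:=\begin{cases}1&\text{if }t\neq 0,\\0&\text{if }t=0,\end{cases}\tag{1}$$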

to construct a cost functional which has the value $0$ if and only if at most one control is active pointwise. To guarantee coercivity, we also need to add an (in this case) quadratic term, i.e., we define for $v=(v_{1},v_{2})\in\mathbb{R}^{2}$ the pointwise control cost

$$g(v)=\frac{\alpha}{2}\left(v_{1}^{2}+v_{2}^{2}\right)+\beta|v_{1}v_{2}|_{0}.\tag{2}$$

This term combines in a single functional both switching enhancement and a quadratic cost for the active control(s), where the binary part naturally acts as a penalization of the switching constraint $v_{1}v_{2}=0$ . In this respect we shall consider the asymptotic behavior $\beta\to\infty$ in Section 4.
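The pointwise cost can be sketched in a few lines of Python. This is only an illustration: the function names and the parameter values $\alpha=\beta=1$ are assumptions made here, not part of the paper. Evaluating the cost at two switching points and at their midpoint shows why convexity fails.

```python
def binary_norm(t: float) -> int:
    """Binary function |t|_0: 1 if t is nonzero, 0 if t is zero."""
    return 0 if t == 0 else 1


def switching_cost(v1: float, v2: float, alpha: float, beta: float) -> float:
    """Pointwise cost g(v): quadratic term plus binary switching penalty."""
    # |v1 v2|_0 equals |v1|_0 * |v2|_0; the product form avoids floating-point underflow.
    return 0.5 * alpha * (v1 * v1 + v2 * v2) + beta * binary_norm(v1) * binary_norm(v2)


alpha, beta = 1.0, 1.0  # illustrative parameters (assumption, not from the paper)

one_active_a = switching_cost(1.0, 0.0, alpha, beta)  # only the first control is active
one_active_b = switching_cost(0.0, 1.0, alpha, beta)  # only the second control is active
both_active = switching_cost(0.5, 0.5, alpha, beta)   # midpoint: both controls active

# Convexity would require g(midpoint) <= average of the two endpoint values.
print(both_active <= 0.5 * (one_active_a + one_active_b))
```

With these values the two switching points cost $0.5$ each, while their midpoint costs $0.25+\beta=1.25$, so the convexity inequality is violated.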

For some $\omega_{T}\subset\Omega_{T}$ we then consider the problem

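$$\left\{\begin{aligned}\min_{u\in L^{2}(0,T;\mathbb{R}^{2})}\ &\frac{1}{2}\|y-z\|_{L^{2}(\omega_{T})}^{2}+\int_{0}^{T}g(u(t))\,dt,\\\text{s.t.}\ &Ly=Bu.\end{aligned}\right.\tag{3}$$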

Using the solution operator $S=L^{-1}B:u\mapsto y$ , problem (3) can be expressed in reduced form as

$$\min_{u}\ \mathcal{F}(u)+\mathcal{G}(u),\tag{4}$$

where $\mathcal{F}$ is smooth and convex, and $\mathcal{G}$ is neither smooth nor convex nor, in fact, weakly lower semicontinuous (since this is the case if and only if $g$ is lower semicontinuous and convex, which is not the case; see, e.g., [4, Corollary 2.14]). This makes both its analysis and its numerical solution challenging; for example, one cannot rely on standard techniques to guarantee existence of solutions. We therefore consider the relaxed problem

$$\min_{u}\ \mathcal{F}(u)+\mathcal{G}^{**}(u),\tag{5}$$

where $\mathcal{G}^{**}$ is the biconjugate of $\mathcal{G}$ , which is always convex. Existence and optimality conditions for the relaxed problem can readily be obtained. However, as we shall see, these optimality conditions are not directly amenable to numerical solution by Newton-type techniques. For this reason we consider a regularized optimality system

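$$\begin{cases}-p_{\gamma}\in\partial\mathcal{F}(u_{\gamma}),\\u_{\gamma}=(\partial\mathcal{G}^{*})_{\gamma}(p_{\gamma}),\end{cases}\tag{6}$$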

where $(\partial\mathcal{G}^{*})_{\gamma}$ is the Moreau–Yosida approximation of the subdifferential of the Fenchel conjugate $\mathcal{G}^{*}$ . Thus for the numerical realization, only $(\partial\mathcal{G}^{*})_{\gamma}$ is needed which can be computed without explicit knowledge of $\mathcal{G}^{**}$ . For problem (3), the first relation of (6) coincides with the usual state and adjoint equations, while the second relation allows a pointwise characterization; see (65) below.

The remainder of this work is organized as follows. In Section 2, we shall provide the abstract existence results, derive optimality conditions, and prove the convergence of solutions to system (6) to minimizers of problem (5). Section 3 is dedicated to giving an explicit pointwise characterization of the subdifferential $\partial\mathcal{G}^{*}$ and its Moreau–Yosida regularization $(\partial\mathcal{G}^{*})_{\gamma}$ in the concrete case of switching control; two other functionals involving $|\cdot|_{0}$ (sparsity and multi-bang penalties) are discussed in Appendix A. These characterizations allow addressing the significant questions related to the relaxation (5) of (4) in Section 4: We clarify the relation between the value of the costs in (5) and in (4) in terms of the duality gap between $\mathcal{G}$ and $\mathcal{G}^{*}$, and show that in certain cases it can be guaranteed to be zero. If this is the case, then the solution to problem (5) is also a solution to problem (4). Moreover, we analyze to which extent the choice of the functional $(v_{1},v_{2})\mapsto|v_{1}v_{2}|_{0}$, when used as part of control costs, in fact leads to optimal solutions of switching type. We shall be able to give a sufficient condition on the relation of $\alpha$ and $\beta$ for (5) that rules out free arcs, where $|v_{1}|$ and $|v_{2}|$ are both strictly positive but not equal, whereas singular arcs, on which $|v_{1}|=|v_{2}|>0$, may remain. Section 5 is concerned with the numerical solution of (6) via a path-following semismooth Newton method. To guarantee convergence, a globalization is required. This guarantees superlinear convergence of the semismooth Newton algorithm in spite of the challenging cost, which combines continuous and discrete objectives. Finally, Section 6 contains numerical tests for switching controls in the context of an elliptic and a parabolic partial differential equation.

Let us put our work into perspective with respect to the existing literature. Casting the problem of switching controls as a nonconvex optimization problem involving the binary functional $|\cdot|_{0}$ is certainly new. Concerning the convex relaxation of nonconvex problems, we can draw from existing works. We only mention the monograph [8], where, however, the focus is on obtaining existence rather than on explicit optimality conditions and numerical realization. The partial (Moreau–Yosida) regularization of nonsmooth convex finite-dimensional problems for the purpose of efficiently applying first-order methods was investigated in [3]. Switching control has been studied mainly for ordinary differential equations; here we refer to [21] for a survey with emphasis on stability of switching systems. The Hamilton–Jacobi–Bellman equation for switching controls was extensively studied in [6] and [23]. Switching control in the context of partial differential equations was especially investigated with respect to its improved flexibility over nonswitching controls for stabilization [10, 18]. Controllability for systems with switching controls was studied in [24, 17]. The hybrid nature of continuous and discrete phenomena when the system switches among different modes is the focus of the work in [11, 12]. In [12] a relaxation technique combined with rounding strategies is proposed to solve mixed-integer programming problems arising in optimal control of partial differential equations. It is verified that the solution of the relaxed problems can be approximated with arbitrary accuracy by a solution satisfying the integer requirements. In [14] optimal control of linear switched systems is considered, and an algorithmic treatment is proposed that relies on an exhaustive search which involves solving on the order of $m^{k}$ differential Riccati equations, where $m$ denotes the number of possible controller configurations and $k$ the number of predefined switching times.

2 Convex relaxation and regularization approach

In this section we introduce the abstract framework and recall relevant concepts from convex analysis. Consider the variational problem

$$\min_{u\in U}\ \mathcal{J}(u):=\mathcal{F}(u)+\mathcal{G}(u),\tag{$\mathcal{P}$}$$

where $U$ is a Hilbert space and $\mathcal{F}:U\to\mathbb{R}$ is convex. If moreover $\mathcal{G}:U\to\mathbb{R}\cup\{\infty\}$ is convex, any minimizer $\bar{u}\in U$ satisfies (under a regularity assumption stated below) the following necessary optimality conditions: There exists a $\bar{p}\in-\partial\mathcal{F}(\bar{u})\subset U^{*}$ such that $\bar{p}\in\partial\mathcal{G}(\bar{u})\subset U^{*}$ , which holds if and only if $\bar{u}\in\partial\mathcal{G}^{*}(\bar{p})$ ; see, e.g., [20, Proposition 4.4.4]. Here,

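$$\mathcal{G}^{*}:U^{*}\to\mathbb{R}\cup\{\infty\},\qquad\mathcal{G}^{*}(p)=\sup_{u\in U}\langle u,p\rangle-\mathcal{G}(u),\tag{7}$$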

denotes the Fenchel conjugate of the convex functional $\mathcal{G}$ , and $\partial\mathcal{G}^{*}$ denotes its convex subdifferential. (In the following, we identify the Hilbert space $U$ with its dual $U^{*}$ and consider $\mathcal{G}^{*}:U\to\mathbb{R}\cup\{\infty\}$ .) We thus obtain the primal-dual optimality system

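$$\begin{cases}-\bar{p}\in\partial\mathcal{F}(\bar{u}),\\\bar{u}\in\partial\mathcal{G}^{*}(\bar{p}),\end{cases}\tag{8}$$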

which is well-defined even for nonconvex $\mathcal{G}:U\to\mathbb{R}\cup\{\infty\}$ as in the situation we are interested in. To argue existence of a solution, we will show that the system (8) is the necessary optimality condition for

$$\min_{u\in U}\ \mathcal{F}(u)+\mathcal{G}^{**}(u),\tag{9}$$

where $\mathcal{G}^{**}=(\mathcal{G}^{*})^{*}$ is the biconjugate of $\mathcal{G}$ , and make the following standard assumptions:

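$$\begin{cases}\mathcal{F}\text{ is convex and weakly lower-semicontinuous,}\\\mathcal{G}\text{ is proper and non-negative,}\\\mathcal{F}+\mathcal{G}^{**}\text{ is radially unbounded.}\end{cases}\tag{a1}$$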

Proposition 2.1.

Under assumption (a1), the system (8) admits a solution $(\bar{u},\bar{p})\in U\times U$ . If $\mathcal{F}$ is strictly convex, this solution is unique.

Proof 2.2.

By assumption, $\mathcal{G}:U\to\mathbb{R}_{+}\cup\{\infty\}$ is bounded from below by $0$, which implies that $\mathcal{G}^{**}\geq 0$ as well, see, e.g., [2, Proposition 13.14]. Furthermore, Fenchel conjugates are always lower semicontinuous and convex, see, e.g., [2, Proposition 13.11]. Together with assumption (a1) this implies that $\mathcal{F}+\mathcal{G}^{**}$ is convex, weakly lower semicontinuous, and radially unbounded, and thus a standard subsequence argument yields existence of a minimizer $\bar{u}\in U$ to (9).

Since $\operatorname{\mathrm{dom}}\mathcal{F}=U$ ensures that the stability condition

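$$\bigcup_{\lambda\geq 0}\lambda\left(\operatorname{dom}\mathcal{F}-\operatorname{dom}\mathcal{G}^{**}\right)\text{ is a closed vector space}\tag{10}$$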

holds, we can apply the sum rule for the convex subdifferential from [1] and again appeal to [20, Proposition 4.4.4] for $\partial\mathcal{G}^{**}$ to arrive at the necessary optimality conditions (8).

Problem (9) can be seen as a convex relaxation of problem ( $\mathcal{P}$ ). This approach is thus related to the $\Gamma$ -regularization in the calculus of variations, see, e.g., [8, Chapter IX], although here we consider a more specific relaxation and pass to the biconjugate only in the nonconvex term rather than to the full biconjugate functional $\mathcal{J}^{**}$ , which allows us to obtain explicit optimality conditions in the primal-dual form (8) that are useful for numerical computations.

In general, a solution to system (8) is not necessarily a minimizer of ( $\mathcal{P}$ ), since for nonconvex $\mathcal{G}$ we cannot rely on equality in the Fenchel–Young inequality (which requires the characterization of the convex subdifferential). In fact, a solution to problem ( $\mathcal{P}$ ) may not even exist. However, for the class of penalties we are interested in, it is possible to show that a solution to system (8) is suboptimal in the sense that the corresponding functional value is within a certain distance of the infimum. This distance is given by the duality gap

$$\delta(u,p):=\mathcal{G}(u)+\mathcal{G}^{*}(p)-\langle p,u\rangle\tag{11}$$

between $\mathcal{G}$ and its Fenchel dual $\mathcal{G}^{*}$ . This gap is always non-negative by the Fenchel–Young inequality, and vanishes if $\mathcal{G}$ is convex and $p\in\partial\mathcal{G}(u)$ .

Lemma 2.3.

Let $\mathcal{F}$ satisfy (a1), and let $(\bar{u},\bar{p})$ satisfy (8). Then

$$\mathcal{J}(\bar{u})\leq\mathcal{J}(u)+\delta(\bar{u},\bar{p})\quad\text{for all }u\in U.\tag{12}$$

Proof 2.4.

Assume that $(\bar{u},\bar{p})$ is a solution to system (8) and let $u\in U$ be arbitrary. Recall that the first relation of (8) then implies that

$$\mathcal{F}(u)-\mathcal{F}(\bar{u})-\langle-\bar{p},u-\bar{u}\rangle\geq 0.\tag{13}$$

Furthermore, by definition (11) and the Fenchel–Young inequality (which holds for any proper $\mathcal{G}$ ) we have that

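$$\mathcal{G}(u)-\mathcal{G}(\bar{u})-\langle\bar{p},u-\bar{u}\rangle=\mathcal{G}(u)-\langle\bar{p},u\rangle+\mathcal{G}^{*}(\bar{p})-\delta(\bar{u},\bar{p})\geq-\delta(\bar{u},\bar{p}).\tag{14}$$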

Hence,

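$$\begin{aligned}\mathcal{J}(u)-\mathcal{J}(\bar{u})&=\left(\mathcal{F}(u)+\mathcal{G}(u)\right)-\left(\mathcal{F}(\bar{u})+\mathcal{G}(\bar{u})\right)\\&=\left(\mathcal{F}(u)-\mathcal{F}(\bar{u})-\langle-\bar{p},u-\bar{u}\rangle\right)+\left(\mathcal{G}(u)-\mathcal{G}(\bar{u})-\langle\bar{p},u-\bar{u}\rangle\right)\\&\geq-\delta(\bar{u},\bar{p}).\end{aligned}\tag{15}$$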

Since the subdifferential $\partial\mathcal{G}^{*}$ is in general multivalued and not Lipschitz continuous, system (8) is not amenable to numerical solution. We therefore introduce the Moreau–Yosida regularization of $\partial\mathcal{G}^{*}$ :

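$$u=(\partial\mathcal{G}^{*})_{\gamma}(p):=\frac{1}{\gamma}\left(p-\mathrm{prox}_{\gamma\mathcal{G}^{*}}(p)\right),\tag{16}$$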

where

$$\mathrm{prox}_{\gamma f}(v)=\arg\min_{w}\ f(w)+\frac{1}{2\gamma}\|w-v\|^{2}\tag{17}$$

is the proximal mapping of $f$ ; see [19]. We recall the following properties of $\mathrm{prox}_{\gamma f}$ and $(\partial f)_{\gamma}$ , e.g., from [2, Props. 12.29, 12.15, 23.10, 23.43, 12.9, 16.34]; see also [15, Chapter 4.4].

Proposition 2.5.

Let $f:H\to\mathbb{R}\cup\{\infty\}$ be a proper convex function on a Hilbert space $H$ . Then,

(i)

$(\partial f)_{\gamma}=(f_{\gamma})^{\prime}$ , where

$$f_{\gamma}(v)=f\left(\mathrm{prox}_{\gamma f}(v)\right)+\frac{1}{2\gamma}\|\mathrm{prox}_{\gamma f}(v)-v\|^{2}\tag{18}$$

is the Moreau envelope of $f$ , which is real-valued and convex.

(ii)

$(\partial f)_{\gamma}$ is single-valued, maximally monotone and Lipschitz-continuous with constant $\gamma^{-1}$ ,

(iii)

$\|(\partial f)_{\gamma}(v)\|_{H}\leq\inf_{q\in\partial f(v)}\|q\|_{H}$ for all $v\in H$ ,

(iv)

$f\left(\mathrm{prox}_{\gamma f}(v)\right)\leq f_{\gamma}(v)\leq f(v)$ for all $\gamma>0$ and $v\in H$ ,

(v)

$\mathrm{prox}_{\gamma f}=(\operatorname{\mathrm{Id}}+\gamma\partial f)^{-1}$ (the resolvent of $\partial f$ ).

From the last property, we can see that

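$$(\partial f)_{\gamma}=\frac{1}{\gamma}\left(\mathrm{Id}-(\mathrm{Id}+\gamma\partial f)^{-1}\right)=\partial f\circ(\mathrm{Id}+\gamma\partial f)^{-1},\tag{19}$$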

i.e., $(\partial f)_{\gamma}$ is indeed the Moreau–Yosida regularization of $\partial f$ .
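To make the proximal mapping, the Moreau envelope, and the Moreau–Yosida regularization concrete, the following sketch evaluates them for the scalar function $f(v)=|v|$ on $H=\mathbb{R}$. This choice of $f$ is an assumption made purely for illustration (it is not the functional treated in the paper); for it, the proximal mapping is the well-known soft-thresholding operator.

```python
def prox_abs(v: float, gamma: float) -> float:
    """prox_{gamma f}(v) for f = |.|: soft thresholding with threshold gamma."""
    if v > gamma:
        return v - gamma
    if v < -gamma:
        return v + gamma
    return 0.0


def yosida_abs(v: float, gamma: float) -> float:
    """Moreau-Yosida regularization (v - prox_{gamma f}(v)) / gamma of the subdifferential of |.|."""
    return (v - prox_abs(v, gamma)) / gamma


def envelope_abs(v: float, gamma: float) -> float:
    """Moreau envelope f_gamma(v) = f(prox(v)) + |prox(v) - v|^2 / (2 gamma)."""
    w = prox_abs(v, gamma)
    return abs(w) + (w - v) ** 2 / (2.0 * gamma)


gamma = 0.1  # illustrative regularization parameter
for v in (-3.0, -0.05, 0.0, 0.05, 3.0):
    w = prox_abs(v, gamma)
    # Sandwich property: f(prox(v)) <= f_gamma(v) <= f(v).
    assert abs(w) <= envelope_abs(v, gamma) <= abs(v)
    # Norm bound: the regularized value never exceeds 1, the size of sign(v).
    assert abs(yosida_abs(v, gamma)) <= 1.0 + 1e-12
    print(v, w, yosida_abs(v, gamma), envelope_abs(v, gamma))
```

In this example the set-valued sign function is replaced by a single-valued, piecewise linear function with slope $1/\gamma$ near the origin, which is the kind of Lipschitz continuous approximation that Newton-type methods can work with.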

For brevity, we set $\mathcal{G}_{\gamma}^{*}:=(\mathcal{G}^{*})_{\gamma}$ and $H_{\gamma}:=(\partial\mathcal{G}^{*})_{\gamma}$ from here on and consider the regularized optimality system

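$$\begin{cases}-p_{\gamma}\in\partial\mathcal{F}(u_{\gamma}),\\u_{\gamma}=H_{\gamma}(p_{\gamma}).\end{cases}\tag{20}$$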

Arguing as in Proposition 2.1, existence of a solution follows from the fact that this system is the necessary optimality condition for the problem

$$\min_{u}\ \mathcal{F}(u)+(\mathcal{G}_{\gamma}^{*})^{*}(u),\tag{21}$$

using that $\mathcal{G}^{*}_{\gamma}\leq\mathcal{G}^{*}$ implies that $0\leq\mathcal{G}^{**}\leq(\mathcal{G}^{*}_{\gamma})^{*}$ and that $H_{\gamma}=(\partial\mathcal{G}^{*})_{\gamma}$ is single-valued by Proposition 2.5 (i,ii).

Proposition 2.6.

Under assumption (a1), the system (20) admits a solution $(u_{\gamma},p_{\gamma})\in U\times U$ . If $\mathcal{F}$ is strictly convex, this solution is unique.

The convergence $(u_{\gamma},p_{\gamma})\to(\bar{u},\bar{p})$ as $\gamma\to 0$ requires additional assumptions on $\mathcal{F}$ and $\mathcal{G}$ :

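$$\begin{cases}\text{(i) }\mathcal{F}\text{ is Fréchet differentiable, }\mathcal{F}^{\prime}\text{ has weakly closed graph, and}\\\text{(ii) }\{\mathcal{F}(u_{\gamma})\}_{\gamma>0}\text{ bounded implies }\{\mathcal{F}^{\prime}(u_{\gamma})\}_{\gamma>0}\text{ bounded,}\end{cases}\tag{a2}$$

$$\{p_{\gamma}\}_{\gamma>0}\text{ bounded implies }\Big\{\inf_{q\in\partial\mathcal{G}^{*}(p_{\gamma})}\|q\|_{U}\Big\}_{\gamma>0}\text{ bounded.}\tag{a3}$$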

We point out that (a2 ii) is generically satisfied for functionals of the type $\mathcal{F}(u)=F(S(u))$ , where

(i)

$F:Y\to\mathbb{R}$ is radially unbounded on a Banach space $Y$ ,

(ii)

$F$ is Fréchet differentiable and $F^{\prime}$ is bounded on bounded sets,

(iii)

$S:U\to Y$ is Fréchet differentiable and $S^{\prime}(u)^{*}$ is uniformly bounded on $U$ ,

since in this case boundedness of $\mathcal{F}(u_{\gamma})$ implies boundedness of $y_{\gamma}:=S(u_{\gamma})$ and hence boundedness of $\mathcal{F}^{\prime}(u_{\gamma})=S^{\prime}(u_{\gamma})^{*}F^{\prime}(y_{\gamma})$ . In particular, it holds for many common tracking-type functionals of the form $F(y)=\frac{1}{2}\|y-z\|_{Y}^{2}$ and bounded linear control-to-state mappings $S$ . In this case, $\mathcal{F}^{\prime}(u)=S^{*}(Su-z)$ and (a2 i) trivially holds. Assumption (a3) is more restrictive but satisfied for the class of functionals we shall consider later on.

Proposition 2.7.

If $\mathcal{F}$ and $\mathcal{G}$ satisfy assumptions (a1)–(a3), the family $\{(u_{\gamma},p_{\gamma})\}_{\gamma>0}$ contains a subsequence converging weakly as $\gamma\to 0$ to a solution $(\bar{u},\bar{p})$ to system (8). If $\mathcal{F}$ is strictly convex, the whole sequence converges weakly.

Proof 2.8.

First, observe that

[TABLE]

by Proposition 2.5 (iii). By the optimality of $u_{\gamma}$ we thus have for any $\gamma>0$ that

[TABLE]

Hence, $\{\mathcal{F}(u_{\gamma})\}_{\gamma>0}$ is bounded, and assumption (a2) yields that

$$\{p_{\gamma}\}_{\gamma>0}=\{-\mathcal{F}^{\prime}(u_{\gamma})\}_{\gamma>0}\tag{24}$$

is bounded. From assumption (a3) together with Proposition 2.5 (iii) it then follows that for every $\gamma>0$ , we have that

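$$\|u_{\gamma}\|_{U}=\|H_{\gamma}(p_{\gamma})\|_{U}\leq\inf_{q\in\partial\mathcal{G}^{*}(p_{\gamma})}\|q\|_{U}\leq C,\tag{25}$$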

i.e., $\{H_{\gamma}(p_{\gamma})\}_{\gamma>0}$ and $\{u_{\gamma}\}_{\gamma>0}$ are bounded. Hence, there exist subsequences $\{u_{\gamma_{n}}\}_{n\in\mathbb{N}}$ , $\{p_{\gamma_{n}}\}_{n\in\mathbb{N}}$ and $\{H_{\gamma_{n}}(p_{\gamma_{n}})\}_{n\in\mathbb{N}}$ converging weakly in $U$ to some $\hat{u}$ , $\hat{p}$ , and $\hat{y}$ , respectively. The weak closedness of $\mathcal{F}^{\prime}$ then yields

$$\hat{p}=-\mathcal{F}^{\prime}(\hat{u}).\tag{26}$$

For the second relation of system (8), we first observe that due to the monotonicity of $\mathcal{F}^{\prime}$ and using both relations of system (20), we have for any $\gamma_{1},\gamma_{2}>0$ that

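$$\langle H_{\gamma_{1}}(p_{\gamma_{1}})-H_{\gamma_{2}}(p_{\gamma_{2}}),p_{\gamma_{1}}-p_{\gamma_{2}}\rangle=-\langle u_{\gamma_{1}}-u_{\gamma_{2}},\mathcal{F}^{\prime}(u_{\gamma_{1}})-\mathcal{F}^{\prime}(u_{\gamma_{2}})\rangle\leq 0,\tag{27}$$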

and hence that for any sequence $\{\gamma_{n}\}_{n\in\mathbb{N}}$ with $\gamma_{n}\to 0$ ,

[TABLE]

Since $H_{\gamma}$ is monotone, we can apply [5, Lemma 1.3(e)] to obtain that $\hat{u}\in\partial\mathcal{G}^{*}(\hat{p})$ , i.e., $(\hat{u},\hat{p})$ satisfies system (8).

If $\mathcal{F}$ is strictly convex, the solution to system (8) is unique, and the claim follows from a subsequence–subsequence argument.
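The convergence statement can be observed on a scalar toy problem. The sketch below assumes $U=\mathbb{R}$, $\mathcal{F}(u)=\frac{1}{2}(u-z)^{2}$ and the convex penalty $\mathcal{G}(u)=|u|$; these choices are illustrative assumptions and not the switching functional of the paper. For this $\mathcal{G}$, the conjugate $\mathcal{G}^{*}$ is the indicator function of $[-1,1]$, its proximal mapping is the projection onto $[-1,1]$, and the unregularized system (8) is solved by soft thresholding of $z$. The regularized system (20) is solved here by plain bisection (a stand-in chosen for brevity, not the semismooth Newton method of the paper).

```python
def proj_unit(p: float) -> float:
    """Projection onto [-1, 1], i.e. the prox of the indicator function G*."""
    return max(-1.0, min(1.0, p))


def H(p: float, gamma: float) -> float:
    """Moreau-Yosida regularization of the subdifferential of G*."""
    return (p - proj_unit(p)) / gamma


def solve_regularized(z: float, gamma: float, iters: int = 200):
    """Solve -p = u - z, u = H(p) by bisection on r(u) = u - H(z - u), which is increasing."""
    lo, hi = -abs(z) - 1.0, abs(z) + 1.0  # r(lo) < 0 < r(hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid - H(z - mid, gamma) < 0.0:
            lo = mid
        else:
            hi = mid
    u = 0.5 * (lo + hi)
    return u, z - u


def soft_threshold(z: float) -> float:
    """Solution u of the unregularized system for this toy problem."""
    return max(abs(z) - 1.0, 0.0) * (1.0 if z > 0 else -1.0)


z = 2.5  # illustrative target value
for gamma in (1.0, 0.1, 0.01, 0.001):
    u_gamma, p_gamma = solve_regularized(z, gamma)
    print(gamma, u_gamma, p_gamma, abs(u_gamma - soft_threshold(z)))
```

For $|z|>1$ the regularized solution of this toy problem can also be written in closed form as $u_{\gamma}=(z-\operatorname{sign}(z))/(1+\gamma)$, so the distance to the limit $z-\operatorname{sign}(z)$ decays linearly in $\gamma$.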

To conclude this section, we compare the Moreau–Yosida regularization with the following complementarity formulation of the second relation of system (8): For any $\gamma>0$ , we have that

[TABLE]

see also [15, Theorem 4.41]. The subdifferential inclusion can thus be equivalently expressed as a nonlinear equation. While the subdifferential inclusion is explicit with respect to $u$ , the nonlinear equation is implicit. Moreover, the appearance of $u$ in the proximal mapping rules out the effective use of semismooth Newton methods for the applications we have in mind. On the other hand, note that the Moreau–Yosida approximation (16) differs only in the absence of $\gamma u$ on the right hand side of the last equality. Hence semismooth Newton methods will be applicable.

3 Switching cost functional $g$

To make practical use of the proposed approach, we require an explicit, pointwise, characterization of $\partial\mathcal{G}^{*}$ and $(\partial\mathcal{G}^{*})_{\gamma}$ . For this, we exploit the integral nature of functionals of the type

[TABLE]

with $D\subset\mathbb{R}^{d}$ for some $d\geq 1$ , which allows computing the Fenchel conjugate and its subdifferential pointwise as well; see, e.g., [8, Props. IV.1.2, IX.2.1], [2, Prop. 16.50].

Specifically, we consider here the switching cost functional on $\mathbb{R}^{2}$ ,

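$$g:\mathbb{R}^{2}\to\mathbb{R},\qquad g(v)=\frac{\alpha}{2}\left(v_{1}^{2}+v_{2}^{2}\right)+\beta|v_{1}v_{2}|_{0}.\tag{31}$$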

Other penalties of this class are discussed in Appendix A. The use of the term $|v_{1}v_{2}|_{0}$ enhances switching between the control variables $v_{1}$ and $v_{2}$ in such a manner that simultaneous nontriviality of both of them is penalized. We shall give sufficient conditions which guarantee that in fact $v_{1}$ and $v_{2}$ are not simultaneously nontrivial except for a singular set of controls for which $|v_{1}|=|v_{2}|\leq\sqrt{2\beta/\alpha}$ .

3.1 Fenchel conjugate of $g$

To characterize

$$g^{*}(q)=\sup_{v\in\mathbb{R}^{2}}\ v\cdot q-g(v),\tag{32}$$

first note that the function $v\mapsto g(v)-v\cdot q$ is lower semicontinuous and radially unbounded. The supremum in (32) is thus attained at some $\bar{v}\in\mathbb{R}^{2}$ . We then discriminate the following cases:

(i)

$\bar{v}_{1}=0$ , in which case $g(\bar{v})=\frac{\alpha}{2}\bar{v}_{2}^{2}$ . The supremum in (32) is attained if and only if the necessary optimality condition $q_{2}-\alpha\bar{v}_{2}=0$ holds. Solving for $\bar{v}_{2}$ and inserting into (32) yields

[TABLE]

(ii)

$\bar{v}_{2}=0$ , in which case $g(\bar{v})=\frac{\alpha}{2}\bar{v}_{1}^{2}$ . By the same argument as in case (i) we obtain

[TABLE]

(iii)

$\bar{v}_{1},\bar{v}_{2}\neq 0$ , in which case $g(\bar{v})=\frac{\alpha}{2}(\bar{v}_{1}^{2}+\bar{v}_{2}^{2})+\beta$ . Again, using the necessary optimality condition for the supremum in (32) yields

[TABLE]

It remains to decide which of these cases is attained based on the value of $q$ . For this purpose, define

[TABLE]

Since all $g_{i}^{*}$ are finite, the supremum in (32) is attained at

[TABLE]

From the definition, we have that $g_{1}^{*}(q)\geq g_{2}^{*}(q)$ if $|q_{1}|\geq|q_{2}|$ and $g_{1}^{*}(q)\geq g_{0}^{*}(q)$ if $|q_{2}|\leq\sqrt{2\alpha\beta}$ ; similarly for $g_{2}^{*}(q)$ . Conversely, $g_{0}^{*}(q)\geq g_{i}^{*}(q)$ , $i=1,2$ , if $|q_{j}|\geq\sqrt{2\alpha\beta}$ , $j=1,2$ . Summarizing the above, we have

$$g^{*}(q)=\begin{cases}\frac{1}{2\alpha}q_{1}^{2}&\text{if }|q_{1}|\geq|q_{2}|\text{ and }|q_{2}|\leq\sqrt{2\alpha\beta},\\\frac{1}{2\alpha}q_{2}^{2}&\text{if }|q_{1}|\leq|q_{2}|\text{ and }|q_{1}|\leq\sqrt{2\alpha\beta},\\\frac{1}{2\alpha}\left(q_{1}^{2}+q_{2}^{2}\right)-\beta&\text{if }|q_{1}|,|q_{2}|\geq\sqrt{2\alpha\beta}.\end{cases}$$
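As a sanity check of the summarized expression for $g^{*}$, the following sketch compares the maximum of its three branches with a brute-force evaluation of the supremum in (32) over a grid. The grid radius, grid size, parameter values and function names are assumptions chosen for illustration. The last helper evaluates the pointwise analogue $g(v)+g^{*}(q)-q\cdot v$ of the duality gap (11).

```python
def g(v1: float, v2: float, alpha: float, beta: float) -> float:
    """Switching cost g(v) = alpha/2 (v1^2 + v2^2) + beta |v1 v2|_0."""
    both = 1.0 if (v1 != 0 and v2 != 0) else 0.0
    return 0.5 * alpha * (v1 * v1 + v2 * v2) + beta * both


def g_conj(q1: float, q2: float, alpha: float, beta: float) -> float:
    """Fenchel conjugate of g as the maximum of the three branches."""
    g1 = q1 * q1 / (2.0 * alpha)                     # second component of v is zero
    g2 = q2 * q2 / (2.0 * alpha)                     # first component of v is zero
    g0 = (q1 * q1 + q2 * q2) / (2.0 * alpha) - beta  # both components nonzero
    return max(g0, g1, g2)


def g_conj_bruteforce(q1, q2, alpha, beta, radius=5.0, n=201):
    """Grid approximation (from below) of sup_v v.q - g(v) on [-radius, radius]^2."""
    best = float("-inf")
    for i in range(n):
        v1 = -radius + 2.0 * radius * i / (n - 1)
        for j in range(n):
            v2 = -radius + 2.0 * radius * j / (n - 1)
            best = max(best, v1 * q1 + v2 * q2 - g(v1, v2, alpha, beta))
    return best


def pointwise_gap(v, q, alpha, beta):
    """Pointwise duality gap g(v) + g*(q) - q.v, non-negative by Fenchel-Young."""
    return g(v[0], v[1], alpha, beta) + g_conj(q[0], q[1], alpha, beta) - (v[0] * q[0] + v[1] * q[1])


alpha, beta = 1.0, 1.0  # illustrative parameters
for q in ((1.0, 0.5), (2.0, 2.0), (0.5, -1.5)):
    print(q, g_conj(q[0], q[1], alpha, beta), g_conj_bruteforce(q[0], q[1], alpha, beta))

print(pointwise_gap((1.0, 0.0), (1.0, 0.5), alpha, beta))  # switching point
print(pointwise_gap((1.0, 1.0), (1.0, 1.0), alpha, beta))  # both components active
```

The test points are chosen so that the maximizer $q/\alpha$ (or its switching counterpart) lies on the grid; for other points the grid value only approximates the supremum from below.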

3.2 Subdifferential of $g^{*}$

Since $g^{*}$ is the maximum of a finite number of convex functions, its subdifferential is given by

[TABLE]

where $\overline{\mathrm{co}}$ denotes the closed convex hull; see, e.g., [13, Corollary 4.3.2]. We make a case distinction based on all possibilities for $g^{*}(q)=g_{i}^{*}(q)$ , $i\in\{0,1,2\}$ :

(i)

$g^{*}(q)=g_{1}^{*}(q)$ only, which is the case if and only if

[TABLE]

Here the subdifferential is single-valued and given by

[TABLE]

(ii)

$g^{*}(q)=g_{2}^{*}(q)$ only, which is the case if and only if

[TABLE]

Here,

[TABLE]

(iii)

$g^{*}(q)=g_{0}^{*}(q)$ only, which is the case if and only if

[TABLE]

Here,

[TABLE]

(iv)

$g^{*}(q)=g_{1}^{*}(q)=g_{0}^{*}(q)\neq g_{2}^{*}(q)$ , which is the case if and only if

[TABLE]

Here, the subdifferential is given by the convex hull of $\{(g_{1}^{*})^{\prime}(q),(g_{0}^{*})^{\prime}(q)\}$ , i.e.,

[TABLE]

To keep the notation concise, we use the convention $[a,b]:=[\min\{a,b\},\max\{a,b\}]$ here and below.

(v)

$g^{*}(q)=g_{2}^{*}(q)=g_{0}^{*}(q)\neq g_{1}^{*}(q)$ , which is the case if and only if

[TABLE]

Here,

[TABLE]

(vi)

$g^{*}(q)=g_{1}^{*}(q)=g_{2}^{*}(q)$ , which is the case if and only if

[TABLE]

Here,

[TABLE]

Note that this also includes the case $g^{*}(q)=g_{1}^{*}(q)=g_{2}^{*}(q)=g_{0}^{*}(q)$ , since then $(g_{0}^{*})^{\prime}(q)\in\partial g^{*}(q)$ .

Since $\mathbb{R}^{2}$ is the disjoint union of the sets $Q_{i}$ defined above, see Fig. 1, we thus obtain a complete characterization of the subdifferential $\partial g^{*}(q)$ .
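The case distinction can be mirrored numerically by listing which branches of $g^{*}$ attain the maximum at a given $q$ together with their gradients; the subdifferential is then the closed convex hull of the returned vectors. In the sketch below, the branch formulas are read off from the summarized expression for $g^{*}$ in Section 3.1, while the function name, the tolerance and the parameter values are assumptions made for illustration.

```python
import math


def active_gradients(q1: float, q2: float, alpha: float, beta: float, tol: float = 1e-12):
    """Return {i: gradient of g_i^*} for every branch i in {0, 1, 2} attaining g^*(q)."""
    values = {
        0: (q1 * q1 + q2 * q2) / (2.0 * alpha) - beta,
        1: q1 * q1 / (2.0 * alpha),
        2: q2 * q2 / (2.0 * alpha),
    }
    gradients = {
        0: (q1 / alpha, q2 / alpha),
        1: (q1 / alpha, 0.0),
        2: (0.0, q2 / alpha),
    }
    top = max(values.values())
    # A branch is active if its value matches the maximum up to the tolerance.
    return {i: gradients[i] for i in values if top - values[i] <= tol}


alpha, beta = 1.0, 1.0  # illustrative parameters
print(active_gradients(2.0, 0.5, alpha, beta))             # one branch: only first component nonzero
print(active_gradients(2.0, 2.0, alpha, beta))             # one branch: both components nonzero
print(active_gradients(1.0, 1.0, alpha, beta))             # two branches: a segment of subgradients
print(active_gradients(2.0, math.sqrt(2.0), alpha, beta))  # boundary where |q2| equals sqrt(2 alpha beta)
```

When a single branch is active, the subdifferential is single-valued and either exactly one or both components are nonzero; when two branches are active, every convex combination of the returned gradients is a subgradient.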

3.3 Proximal mapping of $g^{*}$

For the Moreau–Yosida regularization or the complementarity formulation, we need to compute the proximal mapping of $g^{*}$ or, equivalently, the resolvent of $\partial g^{*}$ . For given $\gamma>0$ and $v\in\mathbb{R}^{2}$ , the resolvent $w:=(\mathrm{Id}+\gamma\partial g^{*})^{-1}(v)$ is characterized by the subdifferential inclusion

[TABLE]

Note that this implies

[TABLE]

and hence that $\operatorname{\mathrm{sign}}(v_{j})=\operatorname{\mathrm{sign}}(w_{j})$ , $j=1,2$ . We now follow the case discrimination in the characterization of the subdifferential.

(i)

$w\in Q_{1}$ : In this case, the subdifferential inclusion (51) yields $v_{1}=(1+\frac{\gamma}{\alpha})w_{1}$ and $v_{2}=w_{2}$ ; solving for $w_{1},w_{2}$ and inserting the result into the definition of $Q_{1}$ yields

[TABLE]

(ii)

$w\in Q_{2}$ : In this case, $v_{1}=w_{1}$ and $v_{2}=(1+\frac{\gamma}{\alpha})w_{2}$ , and as in case (i) we have that

[TABLE]

(iii)

$w\in Q_{0}$ : In this case, $v_{1}=(1+\frac{\gamma}{\alpha})w_{1}$ and $v_{2}=(1+\frac{\gamma}{\alpha})w_{2}$ , and hence

[TABLE]

(iv)

$w\in Q_{10}$ : In this case, $v_{1}=(1+\frac{\gamma}{\alpha})w_{1}$ and $v_{2}\in[w_{2},(1+\frac{\gamma}{\alpha})w_{2}]$ . Since $\operatorname{\mathrm{sign}}(w_{2})=\operatorname{\mathrm{sign}}(v_{2})$ , we have from the definition of $Q_{10}$ that $w_{2}=\operatorname{\mathrm{sign}}(v_{2})\sqrt{2\alpha\beta}$ . Hence

[TABLE]

(v)

$w\in Q_{20}$ : In this case, $v_{2}=(1+\frac{\gamma}{\alpha})w_{2}$ and $v_{1}\in[w_{1},(1+\frac{\gamma}{\alpha})w_{1}]$ . As in (iv), we have that

[TABLE]

(vi)

$w\in Q_{12}$ : In this case, $v_{1}\in[w_{1},(1+\frac{\gamma}{\alpha})w_{1}]$ and $v_{2}\in[w_{2},(1+\frac{\gamma}{\alpha})w_{2}]$ . This does not yield an explicit value for $w$ , although the definition of $Q_{12}$ implies that $|w_{1}|=|w_{2}|\leq\sqrt{2\alpha\beta}$ . We therefore turn to the equivalent characterization of $w$ via the proximal mapping

[TABLE]

First, assume that $z_{1}=z_{2}=:z$ (which implies $\operatorname{\mathrm{sign}}(v_{1})=\operatorname{\mathrm{sign}}(z)=\operatorname{\mathrm{sign}}(v_{2})$ ). The minimizer of the reduced problem is then given by the projection of the unconstrained minimizer $z=\frac{\alpha}{2\alpha+\gamma}(v_{1}+v_{2})$ to the (convex) feasible set $[-\sqrt{2\alpha\beta},\sqrt{2\alpha\beta}]$ , i.e.,

[TABLE]

Inserting each of these values for $w$ into the relation $v\in[w,(1+\tfrac{\gamma}{\alpha})w]$ yields (after some algebraic manipulations)

[TABLE]

and

[TABLE]

respectively.

We argue similarly for $z_{1}=-z_{2}$ (where $\operatorname{\mathrm{sign}}(v_{1})=\operatorname{\mathrm{sign}}(z)=-\operatorname{\mathrm{sign}}(v_{2})$ ). Combining the two cases, we obtain

[TABLE]

and

[TABLE]

Inserting this into the definition of the Moreau–Yosida regularization

[TABLE]

and simplifying yields

[TABLE]

where

[TABLE]

see Fig. 2.

This pointwise characterization allows obtaining expressions for the Moreau–Yosida approximation and the complementarity formulation of $u\in\partial\mathcal{G}^{*}(p)$ .

4 Optimality conditions and structure

We now discuss the properties of solutions $(\bar{u},\bar{p})$ to system (8). Specifically, let

[TABLE]

with $g$ given by (31). The functional $\mathcal{F}$ will be assumed to be a tracking term of the form

[TABLE]

for a Hilbert space $Y=Y^{*}$ (e.g., $Y=L^{2}([0,T]\times\Omega)$ ), given $z\in Y$ , and a bounded linear control-to-observation mapping $S:U\to Y$ . We further assume the existence of a Banach space $V\hookrightarrow L^{r}(D;\mathbb{R}^{2})$ with $r>2$ such that the adjoint $S^{*}:Y\to U$ maps continuously into $V$ . The optimality system (8) is then given by

[TABLE]

From (b.2) it follows that $\mathcal{G}^{**}$ is radially unbounded. Hence, $\mathcal{F}$ and $\mathcal{G}$ satisfy assumption (a1), and Proposition 2.1 yields existence of a solution $(\bar{u},\bar{p})\in U\times U$ (which is unique if $S$ is injective).

Using Section 3.2 and the pointwise characterization of the subdifferential of integral functionals (see, e.g., [2, Proposition 16.50]), the second relation in (OS) implies that for almost all $x\in D$ ,

[TABLE]

We define the switching arc (where at most one control is active, i.e., nonzero)

[TABLE]

Clearly,

[TABLE]

Let us address the question when the solution to system (OS) will be optimal. For this purpose, we first estimate the duality gap (11).

Lemma 4.1.

If $(\bar{u},\bar{p})\in U\times U$ satisfies $\bar{u}\in\partial\mathcal{G}^{*}(\bar{p})$ , then

[TABLE]

Proof 4.2.

We discriminate pointwise in the definition (11) based on the value of $\bar{p}(x)$ for almost every $x\in D$ .

(i)

$\bar{p}(x)\in Q_{1}$ . In this case, the relation (75) yields $\bar{u}_{1}(x)=\frac{1}{\alpha}\bar{p}_{1}(x)$ and $\bar{u}_{2}(x)=0$ , and thus

[TABLE]

(ii)

$\bar{p}(x)\in Q_{2}$ . In this case, the relation (75) yields $\bar{u}_{1}(x)=0$ and $\bar{u}_{2}(x)=\frac{1}{\alpha}\bar{p}_{2}(x)$ , and thus

[TABLE]

(iii)

$\bar{p}(x)\in Q_{0}$ . In this case, the relation (75) yields $\bar{u}_{1}(x)=\frac{1}{\alpha}\bar{p}_{1}(x)$ and $\bar{u}_{2}(x)=\frac{1}{\alpha}\bar{p}_{2}(x)$ , and thus

[TABLE]

(iv)

$\bar{p}(x)\in Q_{10}$ . In this case, the relation (75) yields $\bar{u}_{1}(x)=\frac{1}{\alpha}\bar{p}_{1}(x)$ and $\bar{u}_{2}(x)\in[0,\frac{1}{\alpha}\bar{p}_{2}(x)]$ . Assume first that $\bar{p}_{2}(x)$ is positive, and that $0<\bar{u}_{2}(x)<\frac{1}{\alpha}\bar{p}_{2}(x)$ (otherwise argue as in case (i) or (iii)). Then,

[TABLE]

A simple calculus argument shows that the right-hand side is a monotonically decreasing function of $\bar{u}_{2}(x)$ on $(0,\frac{1}{\alpha}\bar{p}_{2}(x))$ and hence attains its supremum for $\bar{u}_{2}(x)=0$ , which implies that

[TABLE]

for all $\bar{u}_{2}(x)\in(0,\frac{1}{\alpha}\bar{p}_{2}(x))$ . For $\bar{p}_{2}(x)$ negative, we argue similarly.

(v)

$\bar{p}(x)\in Q_{20}$ . In this case, the relation (75) yields $\bar{u}_{1}(x)\in[0,\frac{1}{\alpha}\bar{p}_{1}(x)]$ and $\bar{u}_{2}(x)=\frac{1}{\alpha}\bar{p}_{2}(x)$ . Proceeding as in case (iv) yields

[TABLE]

(vi)

$\bar{p}(x)\in Q_{12}$ . In this case, the relation (75) yields $(\bar{u}_{1}(x),\bar{u}_{2}(x))=\left(\frac{t}{\alpha}\bar{p}_{1}(x),\frac{1-t}{\alpha}\bar{p}_{2}(x)\right)$ for some $t\in[0,1]$ . Furthermore, we have that $|\bar{p}_{1}(x)|=|\bar{p}_{2}(x)|\leq\sqrt{2\alpha\beta}$ .

First, if $\bar{p}(x)=(0,0)\in Q_{12}$ , this implies that $\bar{u}(x)=(0,0)$ and hence

[TABLE]

For $\bar{p}(x)\neq(0,0)$ , we obtain

[TABLE]

Both expressions in parentheses are convex quadratic functions of $t\in[0,1]$ and hence attain their supremum at $t=0$ or $t=1$ . Together with $|\bar{p}_{1}(x)|\leq\sqrt{2\alpha\beta}$ this implies that

[TABLE]

Integrating over $D$ now yields the claim.

From Lemma 2.3 we obtain the following characterization of (sub)optimality of solutions.

Theorem 4.3.

If $(\bar{u},\bar{p})\in U\times U$ satisfies (OS), then for any $u\in U$ ,

[TABLE]

Hence if $\partial\mathcal{I}$ and $\mathcal{S}$ are sets of Lebesgue measure zero, $\bar{u}$ is a solution to ( $\mathcal{P}$ ).

We next investigate the behavior of $\mathcal{I}$ and $\mathcal{S}$ as $\beta\to\infty$ . For this purpose, we denote by $(u_{\beta},p_{\beta})$ the solution to (OS) for given $\beta>0$ , with corresponding free arc $\mathcal{I}_{\beta}$ . Note that the value of $\beta$ does not appear in the relation (52) except as part of the case distinction, and hence $\beta\to\infty$ does not necessarily imply that $u_{\beta}\to 0$ .

Theorem 4.4.

Let $\alpha>0$ be fixed and let $(u_{\beta},p_{\beta})$ satisfy (OS). Then, $|\mathcal{I}_{\beta}|\to 0$ as $\beta\to\infty$ .

Proof 4.5.

We use the minimizing properties of $u_{\beta}$ with respect to $\mathcal{F}+\mathcal{G}^{**}$ by making use of $g^{**}$ computed in Appendix B; see (b.1). Note that from the subdifferential inclusion (75), we can see that $u_{\beta}(x)\in D_{0}$ if and only if $p_{\beta}(x)\in\overline{Q_{0}}$ . Since $g^{**}(0)=0$ , we have that

[TABLE]

i.e., the family $\{\mathcal{G}^{**}(u_{\beta})\}_{\beta>0}$ is bounded. We thus have for the free arc

[TABLE]

that

[TABLE]

where the right-hand side remains bounded as $\beta\to\infty$ if and only if the second term goes to zero as claimed.

Note that $\partial\mathcal{I}_{\beta}\subset\mathcal{I}_{\beta}$ and hence, from the estimate (94), the corresponding optimality gap $\beta|\partial\mathcal{I}_{\beta}|$ remains bounded for $\beta\to\infty$ .

If $p_{\beta}$ is uniformly bounded pointwise almost everywhere, we can deduce that $\mathcal{I}_{\beta}$ must vanish for some sufficiently large (finite) value of $\beta$ .

Theorem 4.6.

If $V\hookrightarrow L^{\infty}(D)$ , then there exists a $\beta_{0}>0$ such that $|\mathcal{I}_{\beta}|=0$ for all $\beta\geq\beta_{0}$ .

Proof 4.7.

Due to the estimate (94) and the definition of $\mathcal{G}^{**}$ , the family $\{u_{\beta}\}_{\beta>0}$ is bounded in $U$ . Hence $\{Su_{\beta}\}_{\beta>0}$ and thus $\{F^{\prime}(Su_{\beta})\}_{\beta>0}$ are bounded in $Y$ and $Y^{*}$ , respectively. Since $S^{*}$ maps continuously to $L^{\infty}(D)$ , this implies that $\{p_{\beta}\}_{\beta>0}=\{-S^{*}F^{\prime}(Su_{\beta})\}_{\beta>0}$ is uniformly bounded pointwise almost everywhere by a constant $M>0$ . Choosing $\beta_{0}$ such that $M<\sqrt{2\alpha\beta_{0}}$ , i.e., $\beta_{0}>M^{2}/(2\alpha)$ , we obtain from the subdifferential inclusion (75) that $p_{\beta}(x)\notin Q_{0}\cup Q_{10}\cup Q_{20}$ for almost every $x\in D$ and all $\beta\geq\beta_{0}$ , which yields the claim.

Remark 4.8.

The above theorem is a result in the spirit of exact penalization as in, e.g., [9]. However, it does not yield an exact penalization of the switching condition $u_{1}u_{2}=0$ almost everywhere since the singular set $\mathcal{S}$ cannot be controlled fully. It appears difficult to give a sufficient condition for $\mathcal{S}$ to be empty, since on this set neither $\mathcal{F}(u)$ nor $\mathcal{G}(u)$ yields enough information to decide which component of $u$ should be active. On the other hand, since $|\bar{p}_{1}(x)|=|\bar{p}_{2}(x)|$ has to hold on the singular arc, we can expect $|\mathcal{S}|$ to be small. We shall comment on the cardinality of $\mathcal{S}$ for the numerical examples. Direct extensions of the concepts in [9] are not possible, since sparsity-promoting or exact penalty functionals of the type $|\cdot|^{p}$ with $p\in[0,1]$ on the controls do not lead to well-posed optimal control problems.

5 Numerical solution

We return to the Moreau–Yosida regularization of the optimality system (OS): For given $\gamma>0$ , find $(u_{\gamma},p_{\gamma})\in U\times U$ satisfying

[TABLE]

Since $\mathcal{F}^{\prime}(u)=S^{*}(Su-z)$ is linear and bounded, assumption (a2) is clearly satisfied; in addition, the explicit characterization of $\partial\mathcal{G}^{*}$ in Section 3 immediately yields that $\inf_{q\in\partial\mathcal{G}^{*}(p)}\|q\|_{U}\leq\frac{1}{\alpha}\|p\|_{U}$ , and hence assumption (a3) holds. From Proposition 2.6 and Proposition 2.7, we thus obtain existence of a solution (which is unique if $S$ is injective) and convergence to a solution of (OS) as $\gamma\to 0$ . For later reference, we note that the mapping properties of $S^{*}$ imply that $p_{\gamma}\in V$ .

The solution to (OSγ) can be computed using a semismooth Newton method. We first show that $H_{\gamma}$ is Newton-differentiable. Recall that $H_{\gamma}$ is defined pointwise almost everywhere by

[TABLE]

and that $h_{\gamma}$ is globally Lipschitz continuous with constant $\gamma^{-1}$ by Proposition 2.5 (iii). Hence, $h_{\gamma}$ is directionally differentiable almost everywhere. In addition, $h_{\gamma}$ is piecewise differentiable, and hence its directional derivative

[TABLE]

at $q$ in direction $\delta q$ satisfies

[TABLE]

Together we obtain that $h_{\gamma}$ is semismooth; see, e.g., [15, Theorem 8.2] or [22, Proposition 2.7]; see also [22, Proposition 2.26].

This implies that the superposition operator $H_{\gamma}$ is Newton-differentiable from $V\hookrightarrow L^{r}(D;\mathbb{R}^{2})$ to $L^{2}(D;\mathbb{R}^{2})$ for any $r>2$ ; see, e.g., [15, Example 8.12] or [22, Theorem 3.49]. Its Newton derivative will be denoted by $D_{N}H_{\gamma}:V\to U$ , and it is given pointwise almost everywhere at $p$ in direction $\delta p$ by a measurable selection

[TABLE]

where $\partial_{C}h_{\gamma}(q)$ is the Clarke derivative, which for piecewise differentiable functions is given by the convex hull of the piecewise derivatives at each point. Specifically, for $h_{\gamma}$ given in Section 3.3, a Newton derivative $D_{N}h_{\gamma}(q)\in\partial_{C}h_{\gamma}(q)$ is given by

[TABLE]

where $\mathrm{diag}(\cdot,\cdot)$ denotes the $2\times 2$ diagonal matrix with the given entries.

In the sequel, we shall require the following two properties of the Newton derivative.

Lemma 5.1.

For all $p\in V$ and $\delta p\in V$ , we have

[TABLE]

Proof 5.2.

Recall from Proposition 2.5 that $h_{\gamma}$ is the derivative of the convex functional $(g^{*})_{\gamma}$ and hence is monotone. Therefore we have for all $t>0$ , almost all $q$ , and all $\delta q$ that

[TABLE]

Dividing by $t^{2}>0$ and taking the limit as $t\to 0$ yields

[TABLE]

Similarly, since $h_{\gamma}$ is globally Lipschitz with constant $\gamma^{-1}$ , we have for all $t>0$ , almost all $q$ , and all $\delta q$ that

[TABLE]

Taking again the limit as $t\to 0$ yields

[TABLE]

As a consequence, all elements in the Clarke derivative satisfy the inequalities (103) and (105). Since $D_{N}H_{\gamma}(p)$ is taken as a measurable selection from $\partial_{C}h_{\gamma}(p(\cdot))$ , the claim follows by substitution and integration over $D$ .
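Written out, the two limit arguments of this proof take the following form, where $h_{\gamma}^{\prime}(q;\delta q)$ denotes the directional derivative of $h_{\gamma}$ at $q$ in direction $\delta q$ . This is a sketch of the reasoning as it can be reconstructed from the surrounding text; the notation is an assumption and may differ from the numbered displays.

```latex
% Monotonicity of h_gamma, divided by t^2 > 0, then t -> 0:
\langle h_{\gamma}(q+t\,\delta q)-h_{\gamma}(q),\,t\,\delta q\rangle \geq 0
\quad\Longrightarrow\quad
\langle h_{\gamma}^{\prime}(q;\delta q),\,\delta q\rangle \geq 0,
% Lipschitz continuity with constant gamma^{-1}, divided by t > 0, then t -> 0:
\| h_{\gamma}(q+t\,\delta q)-h_{\gamma}(q)\| \leq \gamma^{-1}\,t\,\|\delta q\|
\quad\Longrightarrow\quad
\| h_{\gamma}^{\prime}(q;\delta q)\| \leq \gamma^{-1}\,\|\delta q\| .
```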

To apply a semismooth Newton method to (OSγ), we first introduce the state $y_{\gamma}:=S(u_{\gamma})\in Y$ and eliminate $u_{\gamma}$ , thus obtaining the equivalent optimality system

[TABLE]

Considering the system (106) as an operator equation from $Y\times V$ to $Y\times V$ , a semismooth Newton step for its solution consists in computing $(\delta y,\delta p)\in Y\times V$ for given $(y^{k},p^{k})\in Y\times V$ such that

[TABLE]

and setting $y^{k+1}=y^{k}+\delta y$ and $p^{k+1}=p^{k}+\delta p$ .

To show superlinear convergence of this iteration, it remains to show uniform solvability of each Newton step.

Proposition 5.3.

For any $(y,p)\in Y\times V$ and $(w_{1},w_{2})\in Y\times V$ , the system

[TABLE]

has a solution $(\delta y,\delta p)\in Y\times V$ which satisfies

[TABLE]

Proof 5.4.

Eliminating $\delta p=S^{*}\delta y+w_{2}\in V$ , we obtain that (108) is equivalent to

[TABLE]

Since $S^{*}$ is linear and bounded from $Y$ to $V$ and $D_{N}H_{\gamma}$ is monotone on $V$ from Lemma 5.1, the operator $SD_{N}H_{\gamma}(p)S^{*}$ is maximally monotone from $Y$ to $Y$ ; see, e.g., [2, Propositions 20.10, 20.24]. Minty’s theorem thus yields existence of a solution $\delta y\in Y$ and hence of a corresponding $\delta p\in V$ ; see, e.g., [2, Proposition 21.1].

Taking the inner product of equation (110) with $\delta y$ and using Lemma 5.1 with $S^{*}\delta y\in V\hookrightarrow U$ implies that

[TABLE]

using the boundedness of $S^{*}$ from $Y$ to $V$ and Lemma 5.1 with $w_{2}\in V\hookrightarrow U$ . The second equation of (108) then yields

[TABLE]

As a consequence of the Newton differentiability of $H_{\gamma}$ and of Proposition 5.3, we obtain the following result; see, e.g., [15, Theorem 8.16], [22, Chapter 3.2].

Theorem 5.5.

The semismooth Newton iteration (107) converges locally superlinearly in $Y\times V$ .

Since the right-hand side of the Newton system (107) is linear apart from the term $H_{\gamma}(p^{k})$ , we can use the following termination criterion for the Newton iteration: If all active sets $A_{i}(p)=\left\{x\in\Omega:p(x)\in Q^{\gamma}_{i}\right\}$ coincide for $p^{k}$ and $p^{k+1}$ , and the control is computed as $u^{k+1}=H_{\gamma}(p^{k+1})$ , then $(u^{k+1},p^{k+1})$ satisfies (OSγ); see, e.g., [15, Remark 7.1.1].

This can be used as part of a continuation strategy to deal with the local convergence behavior of Newton methods: Starting with $\gamma^{0}$ large and $(y^{0},p^{0})=(0,0)$ , we solve the regularized optimality system (OSγ) using the semismooth Newton iteration (107). If the iteration converges for some $\gamma^{m}$ (in the sense that all active sets coincide), we reduce $\gamma^{m+1}=\frac{1}{10}\gamma^{m}$ and solve the system (OSγ) again with the solution for $\gamma^{m}$ as the starting point. This procedure is terminated if the Newton iteration converges in a single step (assuming that the corresponding iterate then satisfies the system for smaller values of $\gamma$ as well) or if the Newton iteration fails to converge within a given number of steps (assuming that the system has then become too ill-conditioned for a stable numerical solution). In any case, the continuation is stopped when $\gamma^{m}\leq 10^{-16}$ is reached.
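The continuation strategy can be sketched on a scalar toy problem. The following Python fragment is a minimal illustration under stated assumptions, not the implementation used for the examples below: it takes $S=\mathrm{Id}$ on $\mathbb{R}$ , so that the regularized system reduces to $p+h_{\gamma}(p)-z=0$ , and it uses the scalar sparse-control penalty of Section A.1 in place of the switching penalty, with $h_{\gamma}=\gamma^{-1}(\mathrm{Id}-\mathrm{prox}_{\gamma g^{*}})$ evaluated piecewise. The function names and the floor `gamma_min` are hypothetical, and the sketch omits both the single-step stopping rule and any line search.

```python
import math

def region(p, alpha, beta, gamma):
    """Index of the piece of h_gamma containing p (the scalar 'active set')."""
    s = math.sqrt(2.0 * alpha * beta)
    if abs(p) <= s:
        return 0                                   # h_gamma vanishes
    return 2 if abs(p) > (1.0 + gamma / alpha) * s else 1

def h_and_slope(p, alpha, beta, gamma):
    """Value of h_gamma(p) and one Newton derivative, chosen piecewise."""
    s = math.sqrt(2.0 * alpha * beta)
    r = region(p, alpha, beta, gamma)
    if r == 0:
        return 0.0, 0.0
    if r == 1:                                     # steep transition piece
        return (p - math.copysign(s, p)) / gamma, 1.0 / gamma
    return p / (alpha + gamma), 1.0 / (alpha + gamma)

def solve_with_continuation(z, alpha, beta, gamma0=1.0, gamma_min=1e-10,
                            max_newton=50):
    """Reduce gamma by a factor of 10 after each converged Newton solve."""
    p, gamma, best = 0.0, gamma0, None
    while gamma >= gamma_min:
        q, converged = p, False
        for _ in range(max_newton):
            h, dh = h_and_slope(q, alpha, beta, gamma)
            q_new = q - (q + h - z) / (1.0 + dh)   # semismooth Newton step
            same = region(q_new, alpha, beta, gamma) == region(q, alpha, beta, gamma)
            q = q_new
            if same:                               # active sets coincide
                converged = True
                break
        if not converged:
            break                                  # keep last converged solution
        p = q
        best = (h_and_slope(p, alpha, beta, gamma)[0], p, gamma)
        gamma /= 10.0
    return best                                    # (u, p, gamma) or None
```

For $z=4$ , $\alpha=1$ , $\beta=\frac{1}{2}$ , the iterates remain on the outer piece for every $\gamma$ , and the returned control approaches the minimizer $u=2$ of the unregularized scalar problem as $\gamma$ decreases.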

While this strategy has proved robust for problems with scalar $L^{1}$ - and $L^{0}$ -type penalties, see e.g. [16, 7], the situation is more delicate for the vector functional considered here; this is in particular the case when the singular arc $\mathcal{S}$ is non-negligible and $D_{N}H_{\gamma}$ is not a diagonal matrix, where the continuation strategy failed in some cases to provide a good initial guess for the next Newton iteration. We thus combine the semismooth Newton method with a backtracking line search along the Newton direction. In principle, this requires computation of $(\mathcal{G}^{*}_{\gamma})^{*}$ (or $\mathcal{F}^{*}$ and $\mathcal{G}^{*}_{\gamma}$ ); however, if the tracking term $\mathcal{F}$ is strictly convex (as will be the case in the examples considered below), the system (OSγ) is a sufficient as well as necessary condition and hence we can equivalently backtrack according to the residual norm of (OSγ). This was sufficient to achieve a robust and superlinear convergence in all examples.

6 Numerical examples

We illustrate the behavior of the proposed approach and the structure of the resulting controls with two numerical examples. First, we consider an elliptic problem where the two control components each act along a strip in one coordinate direction. Specifically, we set $\Omega=[0,1]^{2}$ , $D=[0,1]$ ,

[TABLE]

and consider the control-to-state mapping $S:u\mapsto y\in Y=L^{2}(\Omega)$ satisfying

[TABLE]

The target is

[TABLE]

see Fig. 3.

The state $y$ and adjoint $p$ are discretized using piecewise linear finite elements based on a uniform triangulation $\mathcal{T}_{h}$ of the domain $\Omega$ with $N_{h}=128\times 128$ nodes. Since the control is eliminated, this can be interpreted as a variational discretization. Integration over the piecewise defined functions $H_{\gamma}(p_{h})$ and $D_{N}H_{\gamma}(p_{h})\delta p_{h}$ in the weak formulation of (107) is approximated by applying the mass matrix to the vector of nodal values; see [7]. The control operator $B$ is approximated by forming the tensor product of the discrete indicator function of $\omega_{i}$ with the nodal values of $u_{i}$ ; the adjoint operator $B^{*}$ is approximated by the transpose of this matrix in order to preserve symmetry. The “globalized” semismooth Newton method with continuation and line searches described above is applied to the discretized system. The continuation is started at $\gamma^{0}=1$ and the backtracking is performed in steps of $\tau_{i}=2^{-i}$ for $i=0,\dots,40$ ; if $\tau_{i}<10^{-12}$ , the Newton iteration is restarted with reduced $\gamma$ . Since we no longer perform full Newton steps, we augment the termination criterion for the Newton iteration with an additional check for the residual norm in the optimality system, i.e., we terminate if all active sets coincide and the residual is smaller than $10^{-6}$ . A Matlab implementation of the described algorithm can be downloaded from https://github.com/clason/switchingcontrol.
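The backtracking rule with step sizes $\tau_{i}=2^{-i}$ can be sketched as follows. This is a hedged illustration only: the acceptance test (simple decrease of the residual norm) is an assumption, since the precise decrease condition is not specified above, and the function name `backtrack` and its arguments are hypothetical. A return value of `None` signals that the step size fell below $10^{-12}$ , in which case the caller would restart the Newton iteration with reduced $\gamma$ .

```python
import math

def backtrack(residual_norm, x, direction, max_halvings=40, tau_min=1e-12):
    """Return the first tau = 2**-i that lowers the residual norm, else None.

    residual_norm: callable mapping a list of floats to a nonnegative float
    x, direction:  current iterate and Newton direction (lists of floats)
    """
    r0 = residual_norm(x)
    for i in range(max_halvings + 1):
        tau = 2.0 ** (-i)
        if tau < tau_min:
            return None                  # step too small: restart with smaller gamma
        trial = [xj + tau * dj for xj, dj in zip(x, direction)]
        if residual_norm(trial) < r0:    # assumed criterion: simple decrease
            return tau
    return None
```

As a usage example, for the scalar equation $\arctan(x)=0$ at $x=2$ the full Newton step overshoots and increases the residual, so the first accepted step size is $\tau_{1}=\frac{1}{2}$ .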

We begin by illustrating the effects of the values of $\alpha$ and $\beta$ on the structure of the resulting controls. Figure 4 shows the final computed controls $u_{\gamma}$ for the same target $z$ and different combinations of control costs. For the choice $\alpha=\beta=10^{-3}$ (Fig. 4(a)), the control has a pure switching structure, with $80$ nodes (out of $128$ ) having values in the active set $Q^{\gamma}_{1}$ and $48$ nodes in the set $Q^{\gamma}_{2}$ (the remaining sets being empty); in particular, the singular arc $\mathcal{S}$ is empty. Furthermore, the effect of the $L^{2}$ costs on the active control components can be observed clearly. Decreasing $\beta$ to $10^{-8}$ results in a control that is no longer purely switching (Fig. 4(b)), although some switching behavior still obtains in parts of $D$ ; the resulting active sets have $51$ nodes in $Q^{\gamma}_{1}$ , $25$ nodes in $Q^{\gamma}_{2}$ , and $52$ nodes in the regularized free arc $Q^{\gamma}_{0}$ . Since $\alpha$ is unchanged, the magnitude of the active controls is the same as before. Decreasing $\alpha$ , on the other hand, allows for controls of larger magnitude, but results in the appearance of singular arcs. For $\alpha=10^{-5}$ and $\beta=10^{-3}$ (Fig. 4(c)), we observe a control which is almost purely switching ( $66$ and $59$ nodes in $Q^{\gamma}_{1}$ and $Q^{\gamma}_{2}$ , respectively) but still has a non-negligible singular arc with $3$ nodes in $Q^{\gamma}_{12}$ . The control shows a chattering behavior on part of the switching arc, which can be attributed to the weak but not pointwise convergence of the regularized controls. For the smaller value of $\beta$ (Fig. 4(d)), the singular arc disappears at the expense of the appearance of a large free arc ( $5$ nodes in $Q^{\gamma}_{1}$ , $3$ nodes in $Q^{\gamma}_{2}$ , and $120$ nodes in $Q^{\gamma}_{0}$ ).

Let us briefly comment on the convergence behavior of the “globalized” Newton method. For $\gamma>10^{-9}$ , the semismooth Newton iteration shows the typical superlinear behavior, converging within two or three (full) steps to a solution of the system (OSγ). For smaller values of $\gamma$ , backtracking becomes necessary after one full step, but, depending on the presence of singular arcs, often enters into a superlinear phase again where full steps are taken to convergence. Specifically, in the case of $\alpha=\beta=10^{-3}$ , the iteration terminates successfully at $\gamma=10^{-12}$ with only a few reduced steps necessary. For $\alpha=10^{-5}$ and $\beta=10^{-3}$ , more line searches are performed, but the final superlinear phase is still observed for $\gamma>10^{-13}$ , after which the Newton iteration terminated since no sufficient decrease in the residual was possible. However, restarting with smaller $\gamma$ still allowed some successful steps before terminating again, which continued until the specified terminal value of $\gamma=10^{-16}$ was reached. For $\beta=10^{-8}$ , no backtracking was necessary, and the algorithm showed the typical behavior of a semismooth Newton method with continuation (terminating successfully at $\gamma=10^{-9}$ for $\alpha=10^{-3}$ and at $\gamma=10^{-10}$ for $\alpha=10^{-5}$ ).

To demonstrate the applicability of the proposed approach to switching control of parabolic equations, we also show results for the one-dimensional heat equation, where $S:u\mapsto y$ satisfying

[TABLE]

with $\Omega=[-1,1]$ , $D=[0,2]$ , $\Omega_{T}=D\times\Omega$ ,

[TABLE]

As a target, we choose the trajectory of the heat equation with the right-hand side

[TABLE]

see Fig. 5.

The discretization is similar to that of the elliptic case, using a full space-time discontinuous Galerkin discretization corresponding to a backward Euler method with $N_{h}=128$ spatial grid points and $N_{t}=512$ time steps.

The resulting controls for $\alpha=10^{-1}$ are shown in Fig. 6. For $\beta=1$ (Fig. 6(a)), the control is again of purely switching type with $256$ nodes each in $Q^{\gamma}_{1}$ and $Q^{\gamma}_{2}$ . No backtracking was necessary, and the continuation terminated successfully at $\gamma=10^{-9}$ . The control for $\beta=10^{-1}$ (Fig. 6(b)) shows a free arc, with $77$ nodes in $Q^{\gamma}_{1}$ , $110$ nodes in $Q^{\gamma}_{2}$ , and $325$ nodes in $Q^{\gamma}_{0}$ . The convergence behavior is now different due to the intermittent appearance of singular arcs: Although the first continuation step with $\gamma=10^{-2}$ shows the usual superlinear convergence with full steps, the resulting iterate contains nodes in $Q^{\gamma}_{10}$ and $Q^{\gamma}_{20}$ . Subsequently, the iterations for $\gamma>10^{-5}$ suffer from progressively smaller steps until no sufficient decrease is possible. At $\gamma=10^{-5}$ , however, the corresponding singular arc $\partial\mathcal{I}$ is empty and the iteration returns to superlinear convergence with full steps, terminating successfully at $\gamma=10^{-9}$ . The difference to the elliptic case can be attributed to the lower regularity of the adjoint state $p$ with respect to the control dimension (here: time) and the corresponding smaller norm gap in the regularized subdifferential $H_{\gamma}(p)$ .

7 Conclusion

A framework for optimal control problems was presented that promotes controls of switching type. While switching is promoted by a sparsity-enhancing part of the cost functional, the active controls are weighted with quadratic cost. Analysis of the proposed approach is carried out by techniques from convex analysis, while its numerical solution is achieved using a semismooth Newton method with continuation and line searches. Numerical results support the theoretical findings.

There are many interesting follow-up topics, including the treatment of problems with nonlinear control-to-state mappings, a more detailed analysis of the influence of the control cost parameters on the structure of the controls, and problems with multiple controls exhibiting generalized switching structures.

Appendix A Application to other binary penalties

This appendix demonstrates the application of the approach of Section 3 to other functionals involving the binary functional $|v|_{0}$ . While the Fenchel conjugates and subdifferentials have already been obtained in the previous works cited below, the proximal mappings and corresponding Moreau–Yosida regularizations and complementarity formulations are new.

A.1 Sparse control

We first consider the functional

[TABLE]

which promotes sparsity in optimal control and, contrary to $L^{1}$ -type penalties, allows separate penalization of magnitude and support; see [16]. Setting

[TABLE]

we compute the Fenchel conjugate

[TABLE]

by case distinction. Assume that the supremum is attained for some $\bar{v}\in\mathbb{R}$ . Then we discriminate the following two cases:

(i)

$\bar{v}=0$ , in which case $g(\bar{v})=0$ and hence $g^{*}(q)=0$ ;

(ii)

$\bar{v}\neq 0$ , in which case $g(\bar{v})=\frac{\alpha}{2}\bar{v}^{2}+\beta$ . Since $g$ is differentiable at $\bar{v}$ , the necessary condition for $\bar{v}$ to attain the maximum is $q=\alpha\bar{v}$ . Solving for $\bar{v}$ and inserting in (a.1) yields

[TABLE]

It remains to decide which of these cases is attained for a given $q$ , i.e., whether

[TABLE]

This directly yields

[TABLE]

as well as

[TABLE]

We now turn to the computation for given $\gamma>0$ and $v\in\mathbb{R}$ of the proximal mapping $w=\mathrm{prox}_{\gamma g^{*}}(v)$ of $g^{*}$ or, equivalently, the resolvent of $\partial g^{*}$ , which is characterized by the relation $v\in(\mathrm{Id}+\gamma\partial g^{*})(w)$ . We now distinguish all possible cases in (a.2):

(i)

$|w|<\sqrt{2\alpha\beta}$ : In this case $v=w$ , which implies that $|v|<\sqrt{2\alpha\beta}$ .

(ii)

$|w|>\sqrt{2\alpha\beta}$ : In this case $v=(1+\frac{\gamma}{\alpha})w$ , which implies that $|v|>(1+\frac{\gamma}{\alpha})\sqrt{2\alpha\beta}$ .

(iii)

$|w|=\sqrt{2\alpha\beta}$ : In this case $v\in[w,(1+\frac{\gamma}{\alpha})w]$ , which implies that $\sqrt{2\alpha\beta}\leq|v|\leq(1+\frac{\gamma}{\alpha})\sqrt{2\alpha\beta}$ .

Inserting this into the definition of the Moreau–Yosida regularization and simplifying yields

[TABLE]

which can be interpreted as a soft-thresholding operator.

Since $h_{\gamma}:=(\partial g^{*})_{\gamma}$ is Lipschitz continuous and piecewise differentiable, it is semismooth, and its Newton-derivative at $q$ in direction $\delta q$ is given by

[TABLE]
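The three cases above translate directly into a few lines of code. The following Python sketch is an illustration under assumptions, not a transcription of the displayed formulas: it evaluates the proximal mapping from the case distinction, forms $h_{\gamma}=\gamma^{-1}(\mathrm{Id}-\mathrm{prox}_{\gamma g^{*}})$ (the standard Yosida formula, assumed here to coincide with the definition used in Section 2), and selects one element of the Clarke derivative on each piece. All function names are hypothetical.

```python
import math

def prox_conj(v, alpha, beta, gamma):
    """Proximal map of gamma*g^* for g(v) = alpha/2 v^2 + beta |v|_0 (scalar v)."""
    s = math.sqrt(2.0 * alpha * beta)           # location of the kink of g^*
    if abs(v) < s:                              # case (i): w = v
        return v
    if abs(v) > (1.0 + gamma / alpha) * s:      # case (ii): w = v / (1 + gamma/alpha)
        return v / (1.0 + gamma / alpha)
    return math.copysign(s, v)                  # case (iii): |w| = sqrt(2 alpha beta)

def h_gamma(v, alpha, beta, gamma):
    """Yosida approximation (v - prox(v)) / gamma of the subdifferential of g^*."""
    return (v - prox_conj(v, alpha, beta, gamma)) / gamma

def newton_derivative(v, alpha, beta, gamma):
    """Piecewise derivative of h_gamma, used as a Newton derivative."""
    s = math.sqrt(2.0 * alpha * beta)
    if abs(v) < s:
        return 0.0
    if abs(v) > (1.0 + gamma / alpha) * s:
        return 1.0 / (alpha + gamma)
    return 1.0 / gamma
```

On the outer piece this gives $h_{\gamma}(v)=\frac{1}{\alpha+\gamma}v$ , on the transition piece $h_{\gamma}(v)=\frac{1}{\gamma}(v-\operatorname{sign}(v)\sqrt{2\alpha\beta})$ , and zero otherwise; the slopes never exceed $\gamma^{-1}$ , in agreement with the Lipschitz constant of $h_{\gamma}$ .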

A.2 Multi-bang control

We now consider the multi-bang functional

[TABLE]

where $u_{1},\dots,u_{d}$ are given desired control states and $\delta_{C}$ denotes the indicator function of the convex set $C$ . In optimal control problems, the binary term (together with the pointwise constraints) promotes controls which, for $\beta$ sufficiently large, take on only the desired values almost everywhere except possibly on a singular set; see [7].

Proceeding as in Section A.1 yields the Fenchel conjugate

[TABLE]

whose subdifferential is

[TABLE]

where

[TABLE]

Note that some of these sets can be empty. In fact, for $\beta$ sufficiently large, $Q_{0}$ and hence $Q_{i0}$ , $i=1,\dots,d$ , can be guaranteed to vanish; see [7, § 2.3].

To compute for given $\gamma>0$ and $v\in\mathbb{R}$ the resolvent $w=(\mathrm{Id}+\gamma\partial g^{*})^{-1}(v)$ of $\partial g^{*}$ , we again use the relation $v\in\{w\}+\gamma\partial g^{*}(w)$ and follow the case differentiation in the subdifferential.

(i)

$w\in Q_{i}$ for some $i\in\{1,\dots,d\}$ : In this case, $v=w+\gamma u_{i}$ , which implies that

[TABLE]

and

[TABLE]

(with the first and last condition being void for $i=1$ and $i=d$ , respectively).

(ii)

$w\in Q_{0}$ : In this case, $v=\left(1+\frac{\gamma}{\alpha}\right)w$ , which implies that

[TABLE]

and

[TABLE]

(iii)

$w\in Q_{i0}$ for some $i\in\{1,\dots,d\}$ : In this case, $v\in[w,(1+\frac{\gamma}{\alpha})w]$ and $w=\alpha u_{i}+\sqrt{2\alpha\beta}$ , which implies that

[TABLE]

(iv)

$w\in Q_{i,i+1}$ for some $i\in\{1,\dots,d-1\}$ : In this case, $v\in[w+\gamma u_{i},w+\gamma u_{i+1}]$ and $w=\frac{\alpha}{2}(u_{i}+u_{i+1})$ , which implies that

[TABLE]

Inserting this into the definition of the Moreau–Yosida regularization and simplifying, we obtain

[TABLE]

where

[TABLE]

Since $h_{\gamma}:=(\partial g^{*})_{\gamma}$ is Lipschitz continuous and piecewise differentiable, it is semismooth, and its Newton-derivative at $q$ in direction $\delta q$ is given by

[TABLE]

Appendix B Biconjugate of $g$

We now compute the biconjugate $g^{**}$ used in Theorem 4.4. As in Section 3.1, we proceed by a casewise maximization based on the definition of $g^{*}$ ; however, we need to take into account the restrictions $q\in Q_{i}$ . We assume that $v_{1},v_{2}\geq 0$ , the remaining cases following by symmetry. Consider first

[TABLE]

and note that the supremum can only be attained for $q_{1},q_{2}\geq 0$ . Introducing Lagrange multipliers $\lambda,\mu\geq 0$ for the constraints $q_{1}-q_{2}\geq 0$ and $\sqrt{2\alpha\beta}-q_{2}\geq 0$ , we obtain the KKT system

[TABLE]

We now make a case differentiation based on the optimal value of the multipliers $\bar{\lambda},\bar{\mu}$ .

(i)

$\bar{\mu}=0$ : Adding the first two equations then yields

[TABLE]

To obtain an equation for $\bar{q}_{2}$ , we further discriminate based on the value of $\bar{\lambda}$ :

(a)

$\bar{\lambda}=0$ : The second equation yields the condition $v_{2}=0$ . In this case, the value of $\bar{q}_{2}$ is irrelevant to the supremum and we obtain for any admissible $\bar{q}_{2}$

[TABLE]

(b)

$\bar{\lambda}\neq 0$ : In this case, $\bar{q}_{1}=\bar{q}_{2}=\alpha(v_{1}+v_{2})$ and we obtain

[TABLE]

while the condition $\bar{q}_{2}\leq\sqrt{2\alpha\beta}$ translates into

[TABLE]

(ii)

$\bar{\mu}\neq 0$ : This implies $\bar{q}_{2}=\sqrt{2\alpha\beta}$ . For the value of $\bar{q}_{1}$ , we again further discriminate based on the value of $\bar{\lambda}$ :

(a)

$\bar{\lambda}=0$ : The first equation then yields $v_{1}=\frac{1}{\alpha}\bar{q}_{1}$ and we obtain

[TABLE]

while the condition $\bar{q}_{1}\geq\bar{q}_{2}=\sqrt{2\alpha\beta}$ translates into

[TABLE]

(b)

$\bar{\lambda}\neq 0$ : In this case, $\bar{q}_{1}=\bar{q}_{2}=\sqrt{2\alpha\beta}$ , which yields

[TABLE]

Note that no conditions on $v_{1},v_{2}$ are obtained.

Collecting these cases, we obtain

[TABLE]

We proceed similarly for

[TABLE]

to obtain the possible values and conditions

[TABLE]

where the case (i) a) has been absorbed into the first and second case (which for $v_{1}=0$ are exhaustive).

For

[TABLE]

we use the fact that the optimality conditions for the maximizer are given by $\bar{q}=P_{Q_{0}}(\alpha v)$ , where $P_{Q_{0}}$ denotes the projection onto the convex feasible set $Q_{0}=\{q:q_{1},q_{2}\geq\sqrt{2\alpha\beta}\}$ . Inserting the possible cases $\bar{q}_{i}\in\{\alpha v_{i},\sqrt{2\alpha\beta}\}$ , $i=1,2$ , yields

[TABLE]

It remains to decide for a given $v\in\mathbb{R}^{2}$ which of the feasible values is maximal.

(i)

For $v_{1},v_{2}\geq\sqrt{\frac{2\beta}{\alpha}}$ , we have the three possible values

[TABLE]

Since $\sqrt{2\alpha\beta}\leq\alpha v_{i}$ , $i=1,2$ , and $\beta>0$ , the first two are clearly smaller than the third. For the last case, we consider

[TABLE]

For these values of $v_{1},v_{2}$ , the terms in parentheses are monotonically increasing functions of $v_{1}$ and $v_{2}$ , respectively; the minimum is thus attained for $v_{1}=v_{2}=\sqrt{\frac{2\beta}{\alpha}}$ at $2\beta>0$ . Hence, $g^{**}(v)=\frac{\alpha}{2}(v_{1}^{2}+v_{2}^{2})+\beta$ .

(ii)

For $v_{1}\geq\sqrt{\frac{2\beta}{\alpha}}\geq v_{2}$ , the only two distinct cases are

[TABLE]

Considering the difference of these functions as above, we conclude that $g^{**}(v)=\frac{\alpha}{2}v_{1}^{2}+\sqrt{2\alpha\beta}v_{2}$ .

(iii)

We argue similarly for $v_{2}\geq\sqrt{\frac{2\beta}{\alpha}}\geq v_{1}$ to conclude $g^{**}(v)=\frac{\alpha}{2}v_{2}^{2}+\sqrt{2\alpha\beta}v_{1}$ .

(iv)

For $v_{1}+v_{2}\leq\sqrt{\frac{2\beta}{\alpha}}$ , we have to compare the two cases

[TABLE]

We have

[TABLE]

and thus $g^{**}(v)=\frac{\alpha}{2}(v_{1}+v_{2})^{2}$ .

(v)

In the remaining case $v_{1},v_{2}\leq\sqrt{\frac{2\beta}{\alpha}}$ and $v_{1}+v_{2}\geq\sqrt{\frac{2\beta}{\alpha}}$ , the only possible value is

[TABLE]

Arguing similarly for the three remaining quadrants of $\mathbb{R}^{2}$ , we obtain

[TABLE]

where

[TABLE]

see Fig. 7.

A short calculation shows that

[TABLE]

This is obvious for $v\in D_{0}$ and $v\in D_{4}$ . For $v\in D_{1}$ , we have $\sqrt{2\alpha\beta}\geq\alpha|v_{2}|$ and hence

[TABLE]

and similarly for $v\in D_{2}$ . For $v\in D_{3}$ , we consider the difference

[TABLE]

On $D_{3}$, the terms in parentheses are monotonically increasing functions of $|v_{1}|$ and $|v_{2}|$, respectively, and thus the minimum is attained at the boundary $|v_{1}|+|v_{2}|=\sqrt{2\beta/\alpha}$, i.e., for $|v_{1}|=t\sqrt{2\beta/\alpha}$ and $|v_{2}|=(1-t)\sqrt{2\beta/\alpha}$ for some $t\in[0,1]$. Inserting this and simplifying yields

[TABLE]

which is a concave quadratic function of $t$ and thus attains its minimum at $t=0$ or $t=1$ , yielding $r(v)\geq 0$ as desired.
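The boundary computation can be written out explicitly under two assumptions inferred from the surrounding argument rather than taken from the displayed formulas: that $g^{**}(v)=\sqrt{2\alpha\beta}(|v_{1}|+|v_{2}|)-\beta$ on $D_{3}$, and that $r(v):=g^{**}(v)-\frac{\alpha}{2}(v_{1}^{2}+v_{2}^{2})$. Then

$$
r(v)=\left(\sqrt{2\alpha\beta}\,|v_{1}|-\frac{\alpha}{2}v_{1}^{2}\right)+\left(\sqrt{2\alpha\beta}\,|v_{2}|-\frac{\alpha}{2}v_{2}^{2}\right)-\beta .
$$

Writing $s=\sqrt{2\beta/\alpha}$, we have $\sqrt{2\alpha\beta}\,s=2\beta$ and $\frac{\alpha}{2}s^{2}=\beta$, so that for $|v_{1}|=ts$ and $|v_{2}|=(1-t)s$

$$
r(v)=2\beta t-\beta t^{2}+2\beta(1-t)-\beta(1-t)^{2}-\beta=\beta\left(1-t^{2}-(1-t)^{2}\right)=2\beta\,t(1-t)\geq 0,\qquad t\in[0,1],
$$

with equality exactly at $t=0$ and $t=1$, i.e., where one of the two components of $v$ vanishes.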

Acknowledgments

The work of CC and KK was supported in part by the Austrian Science Fund (FWF) under grant SFB F32 (SFB “Mathematical Optimization and Applications in Biomedical Sciences”). The work of KI was partially supported by the Army Research Office under grant DAAD19-02-1-0394.

