TL;DR
This paper introduces a convex analysis framework for optimal control problems with switching structures in PDEs, enabling explicit characterization and efficient computation of solutions with zero optimality gap.
Contribution
It develops a convex relaxation and a primal-dual system for hybrid control costs, facilitating semismooth Newton methods for PDE control problems with switching controls.
Findings
Explicit pointwise characterization of optimal controls.
Zero optimality gap for controls with switching structure.
Numerical examples demonstrate method effectiveness.
Abstract
Optimal control problems involving hybrid binary-continuous control costs are challenging due to their lack of convexity and weak lower semicontinuity. Replacing such costs with their convex relaxation leads to a primal-dual optimality system that allows an explicit pointwise characterization and whose Moreau-Yosida regularization is amenable to a semismooth Newton method in function space. This approach is especially suited for computing switching controls for partial differential equations. In this case, the optimality gap between the original functional and its relaxation can be estimated and shown to be zero for controls with switching structure. Numerical examples illustrate the effectiveness of this approach.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
A convex analysis approach to optimal controls with switching structure for partial differential equations
Christian Clason Faculty of Mathematics, University Duisburg-Essen, 45117 Essen, Germany () [email protected]
Kazufumi Ito Department of Mathematics, North Carolina State University, Raleigh, North Carolina, USA (). [email protected]
Karl Kunisch Institute of Mathematics and Scientific Computing, University of Graz, Heinrichstrasse 36, 8010 Graz, Austria (). [email protected]
(March 15, 2015)
Abstract
Optimal control problems involving hybrid binary–continuous control costs are challenging due to their lack of convexity and weak lower semicontinuity. Replacing such costs with their convex relaxation leads to a primal-dual optimality system that allows an explicit pointwise characterization and whose Moreau–Yosida regularization is amenable to a semismooth Newton method in function space. This approach is especially suited for computing switching controls for partial differential equations. In this case, the optimality gap between the original functional and its relaxation can be estimated and shown to be zero for controls with switching structure. Numerical examples illustrate the effectiveness of this approach.
1 Introduction
In the context of control of differential equations, switching control refers to problems with two or more controls of which only one should be active at every point in time. This is a challenging problem due to its hybrid discrete–continuous nature.
To partially set the stage, consider the parabolic partial differential equation on , where for an elliptic operator defined on , and is defined by for given control domains (which may include controls acting on the boundary). To promote a switching structure, we propose to use the binary function
[TABLE]
to construct a cost functional which has the value [math] if and only if at most one control is active pointwise. To guarantee coercivity, we also need to add an (in this case) quadratic term, i.e., we define for the pointwise control cost
[TABLE]
This term combines in a single functional both switching enhancement and a quadratic cost for the active control(s), where the binary part naturally acts as a penalization of the switching constraint . In this respect we shall consider the asymptotic behavior in Section 4.
For some we then consider the problem
[TABLE]
Using the solution operator , problem (3) can be expressed in reduced form as
[TABLE]
where is smooth and convex, and is neither smooth nor convex nor, in fact, weakly lower semicontinuous (since this is the case if and only if is lower semicontinous and convex, which is not the case; see, e.g., [4, Corollary 2.14]). This makes both its analysis and its numerical solution challenging; for example, one cannot rely on standard techniques to guarantee existence of solutions. We therefore consider the relaxed problem
[TABLE]
where is the biconjugate of , which is always convex. Existence and optimality conditions for the relaxed problem can readily be obtained. However, as we shall see, these optimality conditions are not directly amenable to numerical solution by Newton-type techniques. For this reason we consider a regularized optimality system
[TABLE]
where is the Moreau–Yosida approximation of the subdifferential of the Fenchel conjugate . Thus for the numerical realization, only is needed which can be computed without explicit knowledge of . For problem (3), the first relation of (6) coincides with the usual state and adjoint equations, while the second relation allows a pointwise characterization; see (65) below.
The remainder of this work is organized as follows. In Section 2, we shall provide the abstract existence results, derive optimality conditions, and prove the convergence of solutions to system (6) to minimizers of problem (5). Section 3 is dedicated to giving an explicit pointwise characterization of the subdifferential and its Moreau–Yosida in the concrete case of switching control; two other functionals involving (sparsity and multi-bang penalties) are discussed in Appendix A. These characterizations allow addressing the significant questions related to the relaxation (5) of (4) in Section 4: We clarify the relation between the value of the costs in (5) and in (4) in terms of the duality gap between and , and show that in certain cases it can be guaranteed to be zero. If this is the case, then the solution to problem (5) is also a solution to problem (4). Moreover, we analyze to which extent the choice of the functional , when used as part of control costs, in fact leads to optimal solutions of switching type. We shall be able to give a sufficient condition on the relation of and for (5) that rule out free arcs, where and are both strictly positive but not equal, whereas singular arcs, on which , may remain. Section 5 is concerned with the numerical solution of (6) via a path-following semismooth Newton method. To guarantee convergence, a globalization is required. This guarantees superlinear convergence of the semismooth Newton algorithm in spite of the challenging cost, which combines continuous and discrete objectives. Finally, Section 6 contains numerical tests for switching controls in the context of an elliptic and a parabolic partial differential equation.
Let us put our work into perspective with respect to the existing literature. Casting the problem of switching controls as a nonconvex optimization problem involving the binary functional is certainly new. Concerning the convex relaxation of nonconvex problems, we can draw from existing works. We only mention the monograph [8], where, however, the focus is on obtaining existence rather than on explicit optimality conditions and numerical realization. The partial (Moreau–Yosida) regularization of nonsmooth convex finite-dimensional problems for the purpose of efficiently applying first-order methods was investigated in [3]. Switching control has been studied mainly for ordinary differential equations; here we refer to [21] for a survey with emphasis on stability of switching systems. The Hamilton–Jacobi–Bellman equation for switching controls was extensively studied in [6] and [23]. Switching control in the context of partial differential equations was especially investigated with respect to their improved flexibility over nonswitching controls for stabilization [10, 18]. Controllability for systems with switching controls were studied in [24, 17]. The hybrid nature of continuous and discrete phenomena when the system switches among different modes is the focus of the work in [11, 12]. In [12] a relaxation technique combined with rounding strategies is proposed to solve mixed-integer programming problems arising in optimal control of partial differential equations. It is verified that the solution of the relaxed problems can be approximated with arbitrary accuracy by a solution satisfying the integer requirements. In [14] optimal control of linear switched systems are considered, and an algorithmic treatment is proposed that relies on an exhaustive search which involves solving on the order of differential Riccati equations, where denotes the number of possible controller configurations and the number of predefined switching times.
2 Convex relaxation and regularization approach
In this section we introduce the abstract framework and recall relevant concepts from convex analysis. Consider the variational problem
[TABLE]
where is a Hilbert space and is convex. If moreover is convex, any minimizer satisfies (under a regularity assumption stated below) the following necessary optimality conditions: There exists a such that , which holds if and only if ; see, e.g., [20, Proposition 4.4.4]. Here,
[TABLE]
denotes the Fenchel conjugate of the convex functional , and denotes its convex subdifferential. (In the following, we identify the Hilbert space with its dual and consider .) We thus obtain the primal-dual optimality system
[TABLE]
which is well-defined even for nonconvex as in the situation we are interested in. To argue existence of a solution, we will show that the system (8) is the necessary optimality condition for
[TABLE]
where is the biconjugate of , and make the following standard assumptions:
[TABLE]
Proposition 2.1**.**
Under assumption (a1), the system (8) admits a solution . If is strictly convex, this solution is unique.
Proof 2.2**.**
By assumption, is bounded from below by [math], which implies that as well, see, e.g. [2, Proposition 13.14]. Furthermore, Fenchel conjugates are always lower semicontinuous and convex, see, e.g. [2, Proposition 13.11]. Together with assumption (a1) this implies that is convex, weakly lower semicontinuous, and radially unbounded, and thus a standard subsequence argument yields existence of a minimizer to (9).
Since ensures that the stability condition
[TABLE]
holds, we can apply the sum rule for the convex subdifferential from [1] and again appeal to [20, Proposition 4.4.4] for to arrive at the necessary optimality conditions (8).
Problem (9) can be seen a convex relaxation of problem (). This approach is thus related to the -regularization in the calculus of variations, see, e.g., [8, Chapter IX], although here we consider a more specific relaxation and pass to the biconjugate only in the nonconvex term rather than to the full biconjugate functional , which allows us to obtain explicit optimality conditions in the primal-dual form (8) that are useful for numerical computations.
In general, a solution to system (8) is not necessarily a minimizer of (), since for nonconvex we cannot rely on equality in the Fenchel–Young inequality (which requires the characterization of the convex subdifferential). In fact, a solution to problem () may not even exist. However, for the class of penalties we are interested in, it is possible to show that a solution to system (8) is suboptimal in the sense that the corresponding functional value is within a certain distance of the infimum. This distance is given by the duality gap
[TABLE]
between and its Fenchel dual . This gap is always non-negative by the Fenchel–Young inequality, and vanishes if is convex and .
Lemma 2.3**.**
Let satisfy (a1), and let satisfy (8). Then
[TABLE]
Proof 2.4**.**
Assume that is a solution to system (8) and let be arbitrary. Recall that the first relation of (8) then implies that
[TABLE]
Furthermore, by definition (11) and the Fenchel–Young inequality (which holds for any proper ) we have that
[TABLE]
Hence,
[TABLE]
Since the subdifferential is in general multivalued and not Lipschitz continuous, system (8) is not amenable to numerical solution. We therefore introduce the Moreau–Yosida regularization of :
[TABLE]
where
[TABLE]
is the proximal mapping of ; see [19]. We recall the following properties of and , e.g., from [2, Props. 12.29, 12.15, 23.10, 23.43, 12.9, 16.34]; see also [15, Chapter 4.4].
Proposition 2.5**.**
Let be a proper convex function on a Hilbert space . Then,
- (i)
, where
[TABLE]
is the Moreau-envelope of , which is real-valued and convex. 2. (ii)
* is single-valued, maximally monotone and Lipschitz-continuous with constant ,* 3. (iii)
* for all ,* 4. (iv)
* for all and ,* 5. (v)
* (the resolvent of ).*
From the last property, we can see that
[TABLE]
i.e., is indeed the Moreau–Yosida regularization of .
For brevity, we set and from here on and consider the regularized optimality system
[TABLE]
Arguing as in Proposition 2.1, existence of a solution follows from the fact that this system is the necessary optimality condition for the problem
[TABLE]
using that implies that and that is single-valued by Proposition 2.5 (i,ii).
Proposition 2.6**.**
Under assumption (a1), the system (20) admits a solution . If is strictly convex, this solution is unique.
The convergence as requires additional assumptions on and :
[TABLE]
We point out that (a2 ii) is generically satisfied for functionals of the type , where
- (i)
is radially unbounded on a Banach space , 2. (ii)
is Fréchet differentiable and is bounded on bounded sets, 3. (iii)
is Fréchet differentiable and is uniformly bounded on ,
since in this case boundedness of implies boundedness of and hence boundedness of . In particular, it holds for many common tracking-type functionals of the form and bounded linear control-to-state mappings . In this case, and (a2 i) trivially holds. Assumption (a3) is more restrictive but satisfied for the class of functionals we shall consider later on.
Proposition 2.7**.**
If and satisfy assumptions (a1)–(a3), the family contains a subsequence converging weakly as to a solution to system (8). If is strictly convex, the whole sequence converges weakly.
Proof 2.8**.**
First, observe that
[TABLE]
by Proposition 2.5 (iii). By the optimality of we thus have for any that
[TABLE]
Hence, is bounded, and assumption (a2) yields that
[TABLE]
is bounded. From assumption (a3) together with Proposition 2.5 (iii) it then follows that for every , we have that
[TABLE]
i.e., and are bounded. Hence, there exist subsequences , and converging weakly in to some , , and , respectively. The weak closedness of then yields
[TABLE]
For the second relation of system (8), we first observe that due to the monotonicity of and using both relations of system (20), we have for any that
[TABLE]
and hence that for any sequence with ,
[TABLE]
Since is monotone, we can apply [5, Lemma 1.3(e)] to obtain that , i.e., satisfies system (8).
If is strictly convex, the solution to system (8) is unique, and the claim follows from a subsequence–subsequence argument.
To conclude this section, we compare the Moreau–Yosida regularization with the following complementarity formulation of the second relation of system (8): For any , we have that
[TABLE]
see also [15, Theorem 4.41]. The subdifferential inclusion can thus be equivalently expressed as a nonlinear equation. While the subdifferential inclusion is explicit with respect to , the nonlinear equation is implicit. Moreover, the appearance of in the proximal mapping rules out the effective use of semismooth Newton methods for the applications we have in mind. On the other hand, note that the Moreau–Yosida approximation (16) differs only in the absence of on the right hand side of the last equality. Hence semismooth Newton methods will be applicable.
3 Switching cost functional
To make practical use of the proposed approach, we require an explicit, pointwise, characterization of and . For this, we exploit the integral nature of functionals of the type
[TABLE]
with for some , which allows computing the Fenchel conjugate and its subdifferential pointwise as well; see, e.g., [8, Props. IV.1.2, IX.2.1], [2, Prop. 16.50].
Specifically, we consider here the switching cost functional on ,
[TABLE]
Other penalties of this class are discussed in Appendix A. The use of the term enhances switching between the control variables and in such a manner that simultaneous nontriviality of both of them is penalized. We shall give sufficient conditions which guarantee that in fact and are not simultaneously nontrivial except for a singular set of controls for which .
3.1 Fenchel conjugate of
To characterize
[TABLE]
first note that the function is lower semicontinuous and radially unbounded. The supremum in (32) is thus attained at some . We then discriminate the following cases:
- (i)
, in which case . The supremum in (32) is attained if and only if the necessary optimality condition holds. Solving for and inserting into (32) yields
[TABLE] 2. (ii)
, in which case . By the same argument as in case (i) we obtain
[TABLE] 3. (iii)
, in which case . Again, using the necessary optimality condition for the supremum in (32) yields
[TABLE]
It remains to decide which of these cases is attained based on the value of . For this purpose, define
[TABLE]
Since all are finite, the supremum in (32) is attained at
[TABLE]
From the definition, we have that if and if ; similarly for . Conversely, if , . Summarizing the above, we have {equation+} g^(q) = {12αq12if |q1|≥|q2| and |q2|≤2αβ*,12αq22if |q1|≤|q2| and |q1|≤2αβ,12α(q12+q22)-βif |q1|,|q2| ≥2αβ.
3.2 Subdifferential of
Since is the maximum of a finite number of convex functions, its subdifferential is given by
[TABLE]
where denotes the closed convex hull; see, e.g., [13, Corollary 4.3.2]. We make a case distinction based on all possibilities for , :
- (i)
only, which is the case if and only if
[TABLE]
Here the subdifferential is single-valued and given by
[TABLE] 2. (ii)
only, which is the case if and only if
[TABLE]
Here,
[TABLE] 3. (iii)
only, which is the case if and only if
[TABLE]
Here,
[TABLE] 4. (iv)
, which is the case if and only if
[TABLE]
Here, the subdifferential is given by the convex hull of , i.e.,
[TABLE]
To keep the notation concise, we use the convention here and below. 5. (v)
, which is the case if and only if
[TABLE]
Here,
[TABLE] 6. (vi)
, which is the case if and only if
[TABLE]
Here,
[TABLE]
Note that this also includes the case , since then .
Since is the disjoint union of the sets defined above, see Fig. 1, we thus obtain a complete characterization of the subdifferential .
3.3 Proximal mapping of
For the Moreau–Yosida regularization or the complementarity formulation, we need to compute the proximal mapping of or, equivalently, the resolvent of . For given and , the resolvent is characterized by the subdifferential inclusion
[TABLE]
Note that this implies
[TABLE]
and hence that , . We now follow the case discrimination in the characterization of the subdifferential.
- (i)
: In this case, the subdifferential inclusion (51) yields and ; solving for and inserting the result into the definition of yields
[TABLE] 2. (ii)
: In this case, and , and as in case (i) we have that
[TABLE] 3. (iii)
: In this case, and , and hence
[TABLE] 4. (iv)
: In this case, and . Since , we have from the definition of that . Hence
[TABLE] 5. (v)
: In this case, and . As in (iv), we have that
[TABLE] 6. (vi)
: In this case, and . This does not yield an explicit value for , although the definition of implies that . We therefore turn to the equivalent characterization of via the proximal mapping
[TABLE]
First, assume that (which implies ). The minimizer of the reduced problem is then given by the projection of the unconstrained minimizer to the (convex) feasible set , i.e.,
[TABLE]
Inserting each of these values for into the relation yields (after some algebraic manipulations)
[TABLE]
and
[TABLE]
respectively.
We argue similarly for (where ). Combining the two cases, we obtain
[TABLE]
and
[TABLE]
Inserting this into the definition of the Moreau–Yosida regularization
[TABLE]
and simplifying yields
[TABLE]
where
[TABLE]
see Fig. 2.
This pointwise characterization allows obtaining expressions for the Moreau–Yosida approximation and the complementarity formulation of .
4 Optimality conditions and structure
We now discuss the properties of solutions to system (8). Specifically, let
[TABLE]
with given by (31). The functional will be assumed to be a tracking term of the form
[TABLE]
for a Hilbert space (e.g., ), given , and a bounded linear control-to-observation mapping . We further assume the existence of a Banach space with such that the adjoint maps continuously into . The optimality system (8) is then given by
[TABLE]
From (b.2) it follows that is radially unbounded. Hence, and satisfy assumption (a1), and Proposition 2.1 yields existence of a solution (which is unique if is injective).
Using Section 3.2 and the pointwise characterization of the subdifferential of integral functionals (see, e.g., [2, Proposition 16.50]), the second relation in (OS) implies that for almost all ,
[TABLE]
We define the switching arc (where at most one control is active, i.e., nonzero)
[TABLE]
Clearly,
[TABLE]
Let us address the question when the solution to system (OS) will be optimal. For this purpose, we first estimate the duality gap (11).
Lemma 4.1**.**
If satisfies , then
[TABLE]
Proof 4.2**.**
We discriminate pointwise in the definition (11) based on the value of for almost every .
- (i)
. In this case, the relation (75) yields and , and thus
[TABLE] 2. (ii)
. In this case, the relation (75) yields and , and thus
[TABLE] 3. (iii)
. In this case, the relation (75) yields and , and thus
[TABLE] 4. (iv)
. In this case, the relation (75) yields and . Assume first that is positive, and that (otherwise argue as in case (i) or (iii)). Then,
[TABLE]
A simple calculus argument shows that the right-hand side is a monotonically decreasing function of on and hence attains its supremum for , which implies that
[TABLE]
for all . For negative, we argue similarly. 5. (v)
. In this case, the relation (75) yields and . Proceeding as in case (iv) yields
[TABLE] 6. (vi)
. In this case, the relation (75) yields for some . Furthermore, we have that .
First, if , this implies that and hence
[TABLE]
For , we obtain
[TABLE]
Both expressions in parentheses are convex quadratic functions of and hence attain their supremum at and . Together with this implies that
[TABLE]
Integrating over now yields the claim.
From Lemma 2.3 we obtain the following characterization of (sub)optimality of solutions.
Theorem 4.3**.**
If satisfies (OS), then for any ,
[TABLE]
Hence if and are sets of Lebesgue measure zero, is a solution to ().
We next investigate the behavior of and as . For this purpose, we denote by the solution to (OS) for given , with corresponding free arc . Note that the value of does not appear in the relation (52) except as part of the case distinction, and hence does not necessarily imply that .
Theorem 4.4**.**
Let be fixed and let satisfy (OS). Then, as .
Proof 4.5**.**
We use the minimizing properties of with respect to by making use of computed in Appendix B; see (b.1). Note that from the subdifferential inclusion (75), we can see that if and only if . Since , we have that
[TABLE]
i.e., the family is bounded. We thus have for the free arc
[TABLE]
that
[TABLE]
where the right-hand side remains bounded as if and only if the second term goes to zero as claimed.
Note that and hence, from the estimate (94), the corresponding optimality gap remains bounded for .
If is uniformly bounded pointwise almost everywhere, we can deduce that must vanish for some sufficiently large (finite) value of .
Theorem 4.6**.**
If , then there exists a such that for all .
Proof 4.7**.**
Due to the estimate (94) and the definition of , the family is bounded in . Hence and thus are bounded in and , respectively. Since maps continuously to , this implies that is uniformly bounded pointwise almost everywhere by a constant . Choosing such that , we obtain from the subdifferential inclusion (75) that , which yields the claim.
Remark 4.8**.**
The above theorem is a result in the spirit of exact penalization as in, e.g., [9]. However, it does not yield an exact penalization of the switching condition almost everywhere since the singular set cannot be controlled fully. It appears difficult to give a sufficient condition for to be empty, since on this set neither nor yield enough information to decide which component of should be active. On the other hand, since has to hold on the singular arc, we can expect to be small. We shall comment on the cardinality of for the numerical examples. Direct extensions of the concepts in [9] are not possible, since sparsity-promoting or exact penalty functionals of the type with on the controls do not lead to well-posed optimal control problems.
5 Numerical solution
We return to the Moreau–Yosida regularization of the optimality system (OS): For given , find satisfying
[TABLE]
Since is linear and bounded, assumption (a2) is clearly satisfied; in addition, the explicit characterization of in Section 3 immediately yields that , and hence assumption (a3) holds. From Proposition 2.6 and Proposition 2.7, we thus obtain existence of a solution (which is unique if is injective) and convergence to a solution of (OS) as . For later reference, we note that the mapping properties of imply that .
The solution to (OSγ) can be computed using a semismooth Newton method. We first show that is Newton-differentiable. Recall that is defined pointwise almost everywhere by
[TABLE]
and that is globally Lipschitz continuous with constant by Proposition 2.5 (iii). Hence, is directionally differentiable almost everywhere. In addition, is piecewise differentiable, and hence its directional derivative
[TABLE]
at in direction satisfies
[TABLE]
Together we obtain that is semismooth; see, e.g., [15, Theorem 8.2] or [22, Proposition 2.7]; see also [22, Proposition 2.26].
This implies that the superposition operator is Newton-differentiable from to for any ; see, e.g., [15, Example 8.12] or [22, Theorem 3.49]. Its Newton derivative will be denoted by , and it is given pointwise almost everywhere at in direction by a measurable selection
[TABLE]
where is the Clarke derivative, which for piecewise differentiable functions is given by the convex hull of the piecewise derivatives at each point. Specifically, for given in Section 3.3, a Newton derivative is given by
[TABLE]
where denotes the diagonal matrix with the given entries.
In the sequel, we shall require the following two properties of the Newton derivative.
Lemma 5.1**.**
For all and , we have
[TABLE]
Proof 5.2**.**
Recall from Proposition 2.5 that is the derivative of the convex functional and hence is monotone. Therefore we have for all , almost all , and all that
[TABLE]
Dividing by and taking the limit as yields
[TABLE]
Similarly, since is globally Lipschitz with constant , we have for all , almost all , and all that
[TABLE]
Taking again the limit as yields
[TABLE]
As a consequence, all elements in the Clarke derivative satisfy the inequalities (103) and (105). Since is taken as a measurable selection from , the claim follows by substitution and integration over .
To apply a semismooth Newton method to (OSγ), we first introduce the state and eliminate , thus obtaining the equivalent optimality system
[TABLE]
Considering the system (106) as an operator equation from to , a semismooth Newton step for its solution consists in computing for given such that
[TABLE]
and setting and .
To show superlinear convergence of this iteration, it remains to show uniform solvability of each Newton step.
Proposition 5.3**.**
For any and , the system
[TABLE]
has a solution which satisfies
[TABLE]
Proof 5.4**.**
Eliminating , we obtain that (108) is equivalent to
[TABLE]
Since is linear and bounded from to and is monotone on from Lemma 5.1, the operator is maximally monotone from to ; see, e.g., [2, Propositions 20.10, 20.24]. Minty’s theorem thus yields existence of a solution and hence of a corresponding ; see, e.g., [2, Proposition 21.1].
Taking the inner product of equation (110) with and using Lemma 5.1 with implies that
[TABLE]
using the boundedness of from to and Lemma 5.1 with . The second equation of (108) then yields
[TABLE]
As a consequence of the Newton differentiability of and of Proposition 5.3, we obtain the following result; see, e.g., [15, Theorem 8.16], [22, Chapter 3.2].
Theorem 5.5**.**
The semismooth Newton iteration (107) converges locally superlinearly in .
Since the right-hand side of the Newton system (107) is linear apart from the term , we can use the following termination criterion for the Newton iteration: If all active sets coincide for and , and the control is computed as , then satisfies (OSγ); see, e.g., [15, Remark 7.1.1].
This can be used as part of a continuation strategy to deal with the local convergence behavior of Newton methods: Starting with large and , we solve the regularized optimality system (OSγ) using the semismooth Newton iteration (107). If the iteration converges for some (in the sense that all active sets coincide), we reduce and solve the system (OSγ) again with the solution for as the starting point. This procedure is terminated if the Newton iteration converges in a single step (assuming that the corresponding iterate then satisfies the system for smaller values of as well) or if the Newton iteration fails to converge within a given number of steps (assuming that the system has then become too ill-conditioned for a stable numerical solution). In any case, the continuation is stopped when is reached.
While this strategy has proved robust for problems with scalar - and -type penalties, see e.g. [16, 7], the situation is more delicate for the vector functional considered here; this is in particular the case when the singular arc is non-negligible and is not a diagonal matrix, where the continuation strategy failed in some cases to provide a good initial guess for the next Newton iteration. We thus combine the semismooth Newton method with a backtracking line search along the Newton direction. In principle, this requires computation of (or and ); however, if the tracking term is strictly convex (as will be the case in the examples considered below), the system (OSγ) is a sufficient as well as necessary condition and hence we can equivalently backtrack according to the residual norm of (OSγ). This was sufficient to achieve a robust and superlinear convergence in all examples.
6 Numerical examples
We illustrate the behavior of the proposed approach and the structure of the resulting controls with two numerical examples. First, we consider an elliptic problem where the two control components each act along a strip in one coordinate direction. Specifically, we set , ,
[TABLE]
and consider the control-to-state mapping satisfying
[TABLE]
The target is
[TABLE]
see Fig. 3.
The state and adjoint are discretized using piecewise linear finite elements based on a uniform triangulation of the domain with nodes. Since the control is eliminated, this can be interpreted as a variational discretization. Integration over the piecewise defined functions and in the weak formulation of (107) is approximated by applying the mass matrix to the vector of nodal values; see [7]. The control operator is approximated by forming the tensor product of the discrete indicator function of with the nodal values of ; the adjoint operator is approximated by the transpose of this matrix in order to preserve symmetry. The “globalized” semismooth Newton method with continuation and line searches described above is applied to the discretized system. The continuation is started at and the backtracking is performed in steps of for ; if , the Newton iteration is restarted with reduced . Since we no longer perform full Newton steps, we augment the termination criterion for the Newton iteration with an additional check for the residual norm in the optimality system, i.e., we terminate if all active sets coincide and the residual is smaller than . A Matlab implementation of the described algorithm can be downloaded from https://github.com/clason/switchingcontrol.
We begin by illustrating the effects of the values of and on the structure of the resulting controls. Figure 4 shows the final computed controls for the same target and different combinations of control costs. For the choice (Fig. 4(a)), the control has a pure switching structure, with nodes (out of ) having values in the active set and nodes in the set (the remaining sets being empty); in particular, the singular arc is empty. Furthermore, the effect of the costs on the active control components can be observed clearly. Decreasing to results in a control that is no longer purely switching (Fig. 4(b)), although some switching behavior still obtains in parts of ; the resulting active sets have nodes in , nodes in , and nodes in the regularized free arc . Since is unchanged, the magnitude of the active controls is the same as before. Decreasing , on the other hand, allows for controls of larger magnitude, but results in the appearance of singular arcs. For and (Fig. 4(c)), we observe a control which is almost purely switching ( and nodes in and , respectively) but still has a non-negligible singular arc with nodes in . The control shows a chittering behavior on part of the switching arc, which can be attributed to the weak but not pointwise convergence of the regularized controls. For the smaller value of (Fig. 4(d)), the singular arc disappears at the expense of the appearance of a large free arc ( nodes in , nodes in , and nodes in ).
Let us briefly comment on the convergence behavior of the “globalized” Newton method. For , the semismooth Newton iteration shows the typical superlinear behavior, converging within two or three (full) steps to a solution of the system (OSγ). For smaller values of , backtracking becomes necessary after one full step, but, depending on the presence of singular arcs, often enters into a superlinear phase again where full steps are taken to convergence. Specifically, in the case of , the iteration terminates successfully at with only a few reduced steps necessary. For and , more line searches are performed, but the final superlinear phase is still observed for , after which the Newton iteration terminated since no sufficient decrease in the residual was possible. However, restarting with smaller still allowed some successful steps before terminating again, which continued until the specified terminal value of was reached. For , no backtracking was necessary, and the algorithm showed the typical behavior of a semismooth Newton method with continuation (terminating successfully at for and at for ).
To demonstrate the applicability of the proposed approach to switching control of parabolic equations, we also show results for the one-dimensional heat equation, where satisfying
[TABLE]
with , , ,
[TABLE]
As a target, we choose the trajectory of the heat equation with the right-hand side
[TABLE]
see Fig. 5.
The discretization is similar as in the elliptic case, using a full space-time discontinuous Galerkin discretization corresponding to a backward Euler method with spatial grid points and time steps.
The resulting controls for are shown in Fig. 6. For (Fig. 6(a)), the control is again of purely switching type with nodes each in and . No backtracking was necessary, and the continuation terminated successfully at . The control for (Fig. 6(b)) shows a free arc, with nodes in , nodes in , and nodes in . The convergence behavior is now different due to the intermittent appearance of singular arcs: Although the first continuation step with shows the usual superlinear convergence with full steps, the resulting iterate contains nodes in and . Subsequently, the iterations for suffer from progressively smaller steps until no sufficient decrease is possible. At , however, the corresponding singular arc is empty and the iteration returns to superlinear convergence with full steps, terminating successfully at . The difference to the elliptic case can be attributed to the lower regularity of the adjoint state with respect to the control dimension (here: time) and the corresponding smaller norm gap in the regularized subdifferential .
7 Conclusion
A framework for optimal control problems was presented that promotes controls of switching type. While switching is promoted by a sparsity-enhancing part of the cost functional, the active controls are weighted with quadratic cost. Analysis of the proposed approach is carried out by techniques from convex analysis, while its numerical solution is achieved using a semismooth Newton method with continuation and line searches. Numerical results support the theoretical findings.
There are many interesting follow-up topics, including the treatment of problems with nonlinear control-to-state mappings, a more detailed analysis of the influence of the control cost parameters on the structure of the controls, and problems with multiple controls exhibiting generalized switching structures.
Appendix A Application to other binary penalties
This appendix demonstrates the application of the approach of Section 3 to other functionals involving the binary functional . While the Fenchel conjugates and subdifferentials have already been obtained in the previous works cited below, the proximal mappings and corresponding Moreau–Yosida regularizations and complementarity formulations are new.
A.1 Sparse control
We first consider the functional
[TABLE]
which promotes sparsity in optimal control and, contrary to -type penalties, allows separate penalization of magnitude and support; see [16]. Setting
[TABLE]
we compute the Fenchel conjugate
[TABLE]
by case distinction. Assume that the supremum is attained for some . Then we discriminate the following two cases:
- (i)
, in which case and hence ; 2. (ii)
, in which case . Since is differentiable at , the necessary condition for to attain the maximum is . Solving for and inserting in (a.1) yields
[TABLE]
It remains to decide which of these cases is attained for a given , i.e., whether
[TABLE]
This directly yields
[TABLE]
as well as
[TABLE]
We now turn to the computation for given and of the proximal mapping of or, equivalently, the resolvent of , which is characterized by the relation . We now distinguish all possible cases in (a.2):
- (i)
: In this case , which implies that . 2. (ii)
: In this case , which implies that . 3. (iii)
: In this case , which implies that .
Inserting this into the definition of the Moreau–Yosida regularization and simplifying yields
[TABLE]
which can be interpreted as a soft-thresholding operator.
Since is Lipschitz continuous and piecewise differentiable, it is semismooth, and its Newton-derivative at in direction is given by
[TABLE]
A.2 Multi-bang control
We now consider the multi-bang functional
[TABLE]
where are given desired control states and denotes the indicator function of the convex set . In optimal control problems, the binary term (together with the pointwise constraints) promotes controls which, for sufficiently large, take on only the desired values almost everywhere except possibly on a singular set; see [7].
Proceeding as in Section A.1 yields the Fenchel conjugate
[TABLE]
whose subdifferential is
[TABLE]
where
[TABLE]
Note that some of these sets can be empty. In fact, for sufficiently large, and hence , , can be guaranteed to vanish; see [7, § 2.3].
To compute for given and the resolvent of , we again use the relation and follow the case differentiation in the subdifferential.
- (i)
for some : In this case, , which implies that
[TABLE]
and
[TABLE]
(with the first and last condition being void for and , respectively). 2. (ii)
: In this case, , which implies that
[TABLE]
and
[TABLE] 3. (iii)
for some : In this case, and , which implies that
[TABLE] 4. (iv)
for some : In this case, and , which implies that
[TABLE]
Inserting this into the definition of the Moreau–Yosida regularization and simplifying, we obtain
[TABLE]
where
[TABLE]
Since is Lipschitz continuous and piecewise differentiable, it is semismooth, and its Newton-derivative at in direction is given by
[TABLE]
Appendix B Biconjugate of
We now compute the biconjugate used in Theorem 4.4. As in Section 3.1, we proceed by a casewise maximization based on the definition of ; however, we need to take into account the restrictions . We assume that , the remaining cases following by symmetry. Consider first
[TABLE]
and note that the supremum can only be attained for . Introducing Lagrange multipliers for the constraints and , we obtain the KKT system
[TABLE]
We now make a case differentiation based on the optimal value of the multipliers .
- (i)
: Adding the first two equations then yields
[TABLE]
To obtain an equation for , we further discriminate based on the value of :
- (a)
: The second equation yields the condition . In this case, the value of is irrelevant to the supremum and we obtain for any admissible
[TABLE] 2. (b)
: In this case, and we obtain
[TABLE]
while the condition translates into
[TABLE] 2. (ii)
: This implies . For the value of , we again further discriminate based on the value of :
- (a)
: The first equation then yields and we obtain
[TABLE]
while the condition translates into
[TABLE] 2. (b)
: In this case, , which yields
[TABLE]
Note that no conditions on are obtained.
Collecting these cases, we obtain
[TABLE]
We proceed similarly for
[TABLE]
to obtain the possible values and conditions
[TABLE]
where the case (i) a) has been absorbed into the first and second case (which for are exhaustive).
For
[TABLE]
we use the fact that the optimality conditions for the maximizer are given by , where denotes the projection onto the convex feasible set . Inserting the possible cases , , yields
[TABLE]
It remains to decide for a given which is the maximal of the feasible values.
- (i)
For , we have the three possible values
[TABLE]
Since , , and , the first two are clearly smaller than the third. For the last case, we consider
[TABLE]
For these values of , the terms in parentheses are monotonously increasing functions of and , respectively; the minimimum is thus attained for at . Hence, . 2. (ii)
For , the only two distinct cases are
[TABLE]
Considering the difference of these functions as above, we conclude that . 3. (iii)
We argue similarly for to conclude . 4. (iv)
For , we have to compare the two cases
[TABLE]
We have
[TABLE]
and thus . 5. (v)
In the remaining case and , the only possible value is
[TABLE]
Arguing similarly for the three remaining quadrants of , we obtain
[TABLE]
where
[TABLE]
see Fig. 7.
A short calculation shows that
[TABLE]
This is obvious for and . For , we have and hence
[TABLE]
and similarly for . For , we consider the difference
[TABLE]
On , the terms in parentheses are monotonically increasing functions of and respectively, and thus the minimum is attained at the boundard , i.e., for and for some . Inserting this and simplifying yields
[TABLE]
which is a concave quadratic function of and thus attains its minimum at or , yielding as desired.
Acknowledgments
The work of CC and KK was supported in part by the Austrian Science Fund (FWF) under grant SFB f32 (SFB “Mathematical Optimization and Applications in Biomedical Sciences”). The work of KI was partially supported by the Army Research Office under grant daad 19-02-1-0394.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] Hédy Attouch and Haı̈m Brezis “Duality for the sum of convex functions in general Banach spaces” In Aspects of Mathematics and its Applications Amsterdam: North-Holland, 1986, pp. 125–133 DOI: 10.1016/S 0924-6509(09)70252-1 · doi ↗
- 2[2] Heinz H. Bauschke and Patrick L. Combettes “Convex Analysis and Monotone Operator Theory in Hilbert Spaces” New York: Springer, 2011 DOI: 10.1007/978-1-4419-9467-7 · doi ↗
- 3[3] Amir Beck and Marc Teboulle “Smoothing and first order methods: a unified framework” In SIAM Journal on Optimization 22.2 , 2012, pp. 557–580 DOI: 10.1137/100818327 · doi ↗
- 4[4] Andrea Braides “ Γ Γ \Gamma -Convergence for Beginners” Oxford: Oxford University Press, 2002 DOI: 10.1093/acprof:oso/9780198507840.001.0001 · doi ↗
- 5[5] Haı̈m Brezis, Michael G. Crandall and Amnon Pazy “Perturbations of nonlinear maximal monotone sets in Banach space” In Comm. Pure Appl. Math. 23 , 1970, pp. 123–144 DOI: 10.1002/cpa.3160230107 · doi ↗
- 6[6] Italo Capuzzo Dolcetta and Lawrence C. Evans “Optimal switching for ordinary differential equations” In SIAM Journal on Control and Optimization 22.1 , 1984, pp. 143–161 DOI: 10.1137/0322011 · doi ↗
- 7[7] Christian Clason and Karl Kunisch “Multi-bang control of elliptic systems” In Annales de l’Institut Henri Poincaré (C) Analyse Non Linéaire 31.6 , 2014, pp. 1109–1130 DOI: 10.1016/j.anihpc.2013.08.005 · doi ↗
- 8[8] Ivar Ekeland and Roger Témam “Convex Analysis and Variational Problems” 28 , Classics Appl. Math. Philadelphia: SIAM, 1999 DOI: 10.1137/1.9781611971088 · doi ↗
