TL;DR
This paper extends primal-dual proximal splitting methods to solve challenging non-convex, non-smooth problems by leveraging generalized conjugates and saddle-point reformulations, with proven local linear convergence under certain conditions.
Contribution
It introduces a novel framework applying primal-dual proximal splitting to non-convex, non-smooth problems using generalized conjugates and saddle-point formulations, with convergence analysis.
Findings
Method successfully applied to Nash equilibrium and Potts segmentation problems.
Proven local linear convergence under strong convexity assumptions.
Numerical experiments confirm theoretical convergence results.
Abstract
We demonstrate that difficult non-convex non-smooth optimization problems, such as Nash equilibrium problems and anisotropic as well as isotropic Potts segmentation models, can be written in terms of generalized conjugates of convex functionals. These, in turn, can be formulated as saddle-point problems involving convex non-smooth functionals and a general smooth but non-bilinear coupling term. We then show through detailed convergence analysis that a conceptually straightforward extension of the primal–dual proximal splitting method of Chambolle and Pock is applicable to the solution of such problems. Under sufficient local strong convexity assumptions on the functionals (but still with a non-bilinear coupling term), we even demonstrate local linear convergence of the method. We illustrate these theoretical results numerically on the aforementioned example problems.
arXiv:1901.02746v4
Primal–dual proximal splitting and generalized conjugation in non-smooth non-convex optimization
Christian Clason, Faculty of Mathematics, University Duisburg-Essen, 45117 Essen, Germany (ORCID 0000-0002-9948-8426)
Stanislav Mazurenko, Loschmidt Laboratories, Masaryk University, Brno, Czechia; previously Department of Mathematical Sciences, University of Liverpool, United Kingdom (ORCID 0000-0003-3659-4819)
Tuomo Valkonen, ModeMat, Escuela Politécnica Nacional, Quito, Ecuador and Department of Mathematics and Statistics, University of Helsinki, Finland; previously Department of Mathematical Sciences, University of Liverpool, United Kingdom (ORCID 0000-0001-6683-3572)
(2020-03-19)
1 Introduction
This work is concerned with the numerical solution of non-smooth non-convex saddle-point problems of the form
$$\min_{x\in X}\,\max_{y\in Y}\; G(x) + K(x,y) - F^*(y). \tag{1}$$
where $G\colon X\to\overline{\mathbb{R}}$ and $F^*\colon Y\to\overline{\mathbb{R}}$ are (possibly non-smooth) proper, convex and lower semicontinuous functionals on Hilbert spaces $X$ and $Y$, and $K\colon X\times Y\to\mathbb{R}$ is smooth but may be non-convex-concave. Such problems arise in many areas of optimal control, inverse problems, and imaging; we will treat two specific examples below. To find a critical point of (1), we propose the generalized primal–dual proximal splitting (GPDPS) method:
Algorithm 1.1 (GPDPS).
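Given a starting point $(x^0, y^0)\in X\times Y$ and step lengths $\tau_i, \sigma_i, \omega_i > 0$, iterate:
$$\begin{aligned}
x^{i+1} &:= \operatorname{prox}_{\tau_i G}\bigl(x^i - \tau_i K_x(x^i, y^i)\bigr),\\
\bar x^{i+1} &:= x^{i+1} + \omega_i\,(x^{i+1} - x^i),\\
y^{i+1} &:= \operatorname{prox}_{\sigma_{i+1} F^*}\bigl(y^i + \sigma_{i+1} K_y(\bar x^{i+1}, y^i)\bigr),
\end{aligned}$$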
where $\operatorname{prox}_{\tau G}$ is the proximal mapping for $\tau G$; $K_x$ and $K_y$ are the partial Fréchet derivatives of $K$ with respect to $x$ and $y$. A main result of this work is that under suitable conditions on the step length parameters $\tau_i$, $\sigma_i$, and $\omega_i$, this algorithm converges weakly to a critical point of (1); see Theorem 6.1. Furthermore, if $\partial G$ and/or $\partial F^*$ is strongly metrically subregular at the saddle point (in particular, if $G$ and/or $F^*$ are strongly convex), we show optimal convergence rates for the standard acceleration strategies; see Theorems 6.4 and 6.6.
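The iteration of Algorithm 1.1 can be sketched in a few lines of NumPy. This is a minimal sketch with constant step lengths $\tau$, $\sigma$, $\omega$, not the step length rules analyzed in Section 6; the function `gpdps`, its callable arguments, and both toy problems are our own illustrative assumptions rather than examples from the paper. Both toy problems have closed-form solutions: the non-bilinear coupling $K(x,y) = \langle x^3, y\rangle$ with $G(x) = \tfrac12\|x-b\|^2$ and $F^* = \delta_{[-1,1]^n}$ gives $\min_x \tfrac12(x-2)^2 + |x|^3$ with minimizer $x = 2/3$, and the bilinear coupling $K(x,y) = \langle x, y\rangle$ gives $\ell^1$ denoising, solved by soft-thresholding.

```python
import numpy as np

def gpdps(prox_G, prox_Fstar, K_x, K_y, x0, y0, tau, sigma, omega=1.0, iters=500):
    """Sketch of Algorithm 1.1 (GPDPS) with constant step lengths.

    prox_G(v, tau)       -> prox_{tau G}(v)
    prox_Fstar(v, sig)   -> prox_{sig F*}(v)
    K_x(x, y), K_y(x, y) -> partial derivatives of the coupling term K
    """
    x = np.asarray(x0, dtype=float)
    y = np.asarray(y0, dtype=float)
    for _ in range(iters):
        # primal step: proximal step on G after a gradient step on K(., y^i)
        x_new = prox_G(x - tau * K_x(x, y), tau)
        # over-relaxation of the primal iterate
        x_bar = x_new + omega * (x_new - x)
        # dual step: K_y is evaluated at the over-relaxed point (x_bar, y^i)
        y = prox_Fstar(y + sigma * K_y(x_bar, y), sigma)
        x = x_new
    return x, y

def prox_G_factory(b):
    """prox of tau * 0.5*||x - b||^2 (hypothetical G for the toy problems)."""
    return lambda v, tau: (v + tau * b) / (1.0 + tau)

def prox_box(v, sigma):
    """Projection onto [-1, 1]^n, the prox of the indicator F* (independent of sigma)."""
    return np.clip(v, -1.0, 1.0)

# Non-bilinear coupling K(x, y) = <x**3, y>: min_x 0.5*(x - 2)^2 + |x|^3
b_cubic = np.array([2.0])
x_cubic, y_cubic = gpdps(prox_G_factory(b_cubic), prox_box,
                         K_x=lambda x, y: 3.0 * x**2 * y,
                         K_y=lambda x, y: x**3,
                         x0=[0.0], y0=[0.0], tau=0.5, sigma=0.5)

# Bilinear coupling K(x, y) = <x, y>: min_x 0.5*||x - b||^2 + ||x||_1
b_l1 = np.array([3.0, -0.5, 1.5])
x_l1, y_l1 = gpdps(prox_G_factory(b_l1), prox_box,
                   K_x=lambda x, y: y, K_y=lambda x, y: x,
                   x0=np.zeros(3), y0=np.zeros(3), tau=0.5, sigma=0.5)
print(x_cubic, y_cubic)  # approximately [0.6667] [1.]
print(x_l1, y_l1)        # approximately [2. 0. 0.5] [1. -0.5 1.]
```

With a bilinear coupling $K(x,y) = \langle Ax, y\rangle$ the dual step evaluates $K_y(\bar x^{i+1}, y^i) = A\bar x^{i+1}$, so the sketch reduces to the PDPS method of [8]. For a genuinely nonlinear $K$, the analysis in this work only guarantees local convergence under the step length conditions of Section 6, which this sketch does not check.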
In addition, we demonstrate how, through a suitable reformulation, this method can be applied to the following two non-trivial applications:
- (i) elliptic Nash equilibrium problems, where $K$ is the so-called Nikaido–Isoda function encoding the Nash equilibrium [29, 25, 38]; see Section 2.1 for details.
- (ii) (Huber-regularized) $\ell^0$-TV denoising (also referred to as the Potts model) [18, 33, 34], where $K$ is used to express the non-convex Potts functional as the generalized $K$-conjugate of a convex indicator function; see Section 2.2 for details.
In particular, the second example demonstrates how the proposed method can be used to solve (some) non-convex non-smooth problems by reformulating them in terms of a convex but non-smooth functional and a smooth but non-convex coupling term. (We stress, however, that we do not claim that this approach is superior to state-of-the-art problem-specific approaches such as the ones mentioned in the cited works for the specific problems; such an investigation is left for the future.)
Related literature.
Our approach is obviously motivated by the well-known primal–dual proximal splitting (PDPS) method of Chambolle and Pock [8] for convex optimization problems of the form $\min_x G(x) + F(Kx)$ for proper, convex, and lower semicontinuous $G$ and $F$ and linear $K$. The method is based on the equivalent reformulation as the saddle-point problem
$$\min_{x\in X}\,\max_{y\in Y}\; G(x) + \langle Kx, y\rangle - F^*(y), \tag{2}$$
where $F^*$ is the Fenchel conjugate of $F$. This approach was extended to allow for nonlinear but Fréchet differentiable $K$ in [35]; later work [12, 10] applied this to non-convex PDE-constrained optimization problems and derived accelerated variants. Several alternative techniques for such optimization problems have also been developed, e.g., using smoothing schemes [28] or a proximal alternating predictor corrector [13].
In a broader context, generalized convex conjugation has been studied for many decades with applications in economics, see, e.g., [26, 32, 15] and the references therein. Algorithms for the solution of general saddle-point problems have been considered in several seminal papers. In particular, a prox-type method was suggested in [27] for convex–concave functions yielding a rate of convergence for an ergodic version of the gap . These results were further extended to allow non-smooth functions in the Mirror Descent method [22], demonstrating a rate of convergence for the ergodic gap although with a vanishing step size for large . The authors also considered an acceleration of the Mirror Proximal method for the case when the gradient map of can be split into a Lipschitz-continuous part and a monotone operator [23]. The latter was assumed “simple” in the sense that a solution to a specific variational inequality could be found relatively efficiently. As a result, the authors obtained an rate of convergence with a possibility for improvement to for a strongly concave . Finally, the reformulation of (1) with a bilinear $K$ as a monotone inclusion problem was considered in [21].
Algorithms applicable to (1) with a genuinely nonlinear $K$ have only started to appear in the literature relatively recently. An abstract convergence result was obtained for an inexact regularized Gauss–Seidel method in [3]. In [20], the authors considered saddle-point representable functions and arrived at a very similar structure to (1); specifically, they reformulated this problem as a smooth linearly-constrained saddle point problem by moving the non-smooth terms into the problem domain and applied the Mirror Proximal algorithm mentioned earlier, with a smooth cost function and the convergence rate [27]. Following [21], Kolossoski and Monteiro [24] developed a non-Euclidean hybrid proximal extragradient method for and Bregman distances, and general convex–concave $K$.
The case of a general convex–concave $K$ in (1) (which therefore becomes an overall convex–concave problem) has recently been studied in [19]. Besides being restricted to convex–concave problems, their algorithm differs from Algorithm 1.1 in applying the overrelaxation to instead of to in the third step. Finally, problems for general sufficiently smooth $K$ were considered in [5] in conjunction with a variant of ADMM; however, no proofs of convergence were given in the general case.
Organization.
To motivate our approach, we start with a more detailed description of the above-mentioned example problems and their reformulation as a saddle-point problem of the form (1) in the next Section 2. (This section can be skipped by readers only interested in the convergence analysis for the general Algorithm 1.1.) The following Section 3 then collects basic notation and definitions as well as the fundamental assumptions that will be used throughout the following. We then study the convergence and convergence rates of Algorithm 1.1 in Sections 4, 5 and 6. More precisely, in Section 4 we derive a basic convergence estimate using the “testing” framework introduced in [36, 37] for the study of preconditioned proximal point methods. The results and assumptions depend on the iterates staying in a local neighborhood of a solution. In Section 5 we therefore derive conditions on the step length parameters and initial iterate that ensure that the iterates do not escape from a local neighborhood. Afterwards, we provide in Section 6 exact step length rules for Algorithm 1.1 together with respective weak convergence or convergence rate results: linear under sufficient strong convexity of $G$ and $F^*$, and “accelerated” rates under somewhat weaker assumptions. Finally, we illustrate the applicability and performance of the proposed approach on our two example problems in Section 7. Appendices A to C contain further technical results on the assumptions required for convergence, in particular verifying them for the Huber-regularized $\ell^0$-TV denoising example.
2 Applications
Before we begin our analysis of the convergence of Algorithm 1.1, we motivate its generality by discussing two examples of practically relevant problems that can be cast in the form (1) and which will be used to numerically illustrate the behavior of the algorithm in Section 7. The idea in each case is to write a non-convex functional as the generalized -conjugate of a convex functional , i.e.,
[TABLE]
for a suitable (depending on ).
2.1 Elliptic Nash equilibrium problems
Our first example is the reformulation of Nash equilibrium problems using the Nikaido–Isoda function following [38]. Consider a non-cooperative game of players, each of which has a strategy and a payout function . For convenience, we introduce the vector of strategies and the notation
[TABLE]
for the vector where player changes their strategy to . We also set . A vector of strategies is then a Nash equilibrium if
[TABLE]
We now introduce the Nikaido–Isoda function [29] (also called the Ky Fan function [17])
[TABLE]
as well as the optimum response function
[TABLE]
It follows from [38, Thm. 2.2] that is a Nash equilibrium if and only if it is a minimizer of . Using the indicator function of the set defined by
[TABLE]
we see that the generally non-convex response function is the -preconjugate of the convex functional and can characterize a Nash equilibrium as the solution to the saddle-point problem
[TABLE]
We can therefore solve the Nash equilibrium problem (3) by applying Algorithm 1.1 to
[TABLE]
In Section 7.1, we illustrate this for the two-player elliptic Nash equilibrium problem from [6].
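To make the Nikaido–Isoda reformulation concrete, the following sketch evaluates $\Psi(x,y) = \sum_j [\theta_j(x) - \theta_j(y_j, x_{-j})]$ and the optimum response value $V(x) = \max_y \Psi(x,y)$ for a hypothetical two-player quadratic game with unconstrained scalar strategies and costs $\theta_j(x) = x_j^2 - x_1x_2 - x_j$. We assume each player minimizes a cost, as is common in the Nikaido–Isoda literature; the sign convention of the lost displays above (which speak of a payout) may differ, and the strategy-set indicator is omitted. For this game the best responses are available in closed form, the unique Nash equilibrium is $x = (1,1)$, and $V$ vanishes exactly there.

```python
import numpy as np

def theta(j, x):
    """Hypothetical cost of player j (players minimize): x_j^2 - x_0*x_1 - x_j."""
    return x[j] ** 2 - x[0] * x[1] - x[j]

def best_response(j, x):
    """Minimizer of theta_j over player j's own strategy, others fixed (closed form)."""
    return (x[1 - j] + 1.0) / 2.0

def with_strategy(x, j, yj):
    """The vector (y_j, x_{-j}): player j switches to y_j, all others keep x."""
    z = np.array(x, dtype=float)
    z[j] = yj
    return z

def nikaido_isoda(x, y):
    """Psi(x, y) = sum_j [theta_j(x) - theta_j(y_j, x_{-j})]."""
    return sum(theta(j, x) - theta(j, with_strategy(x, j, y[j])) for j in range(len(x)))

def optimum_response_value(x):
    """V(x) = max_y Psi(x, y); the max separates over players, so use best responses."""
    y = np.array([best_response(j, x) for j in range(len(x))])
    return nikaido_isoda(x, y)

print(optimum_response_value(np.array([1.0, 1.0])))  # Nash equilibrium: 0.0
print(optimum_response_value(np.array([0.0, 0.0])))  # not an equilibrium: 0.5
```

Since choosing $y = x$ gives $\Psi(x,x) = 0$, always $V \ge 0$, and minimizing $V$ recovers the equilibrium, which is the characterization used above.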
Remark 2.1.
If the set of feasible strategies for each player depends on the strategies of the other players (i.e., ), (3) becomes a generalized Nash equilibrium problem (GNEP); see the survey [16] and the literature cited therein. If for all
[TABLE]
for some closed and convex set , the GNEP is called jointly convex. In this case, minimization of (4) is no longer an equivalent characterization but defines variational equilibria [31]; every variational equilibrium is a generalized Nash equilibrium but not vice versa, see, e.g., [16, Thm. 3.9]. Hence Algorithm 1.1 can also be applied to compute (some if not all) solutions to jointly convex GNEPs.
2.2 Huber–Potts denoising
Our next example is concerned with (Huber-regularized) $\ell^0$-TV denoising or segmentation, also referred to as the Potts model. Let , , be a given noisy image or an image to be segmented. We then search for the denoised or segmented image as the solution to
[TABLE]
for a regularization parameter (which we write in front of the discrepancy term to simplify the computations), the discrete gradient , and the vectorial -seminorm
[TABLE]
and for is the usual -norm on ; we will discuss the choice of in detail below. Clearly, is a non-convex functional for any . Let us briefly comment on the use of $\ell^0$-TV as a regularizer in imaging. Intuitively, the functional in (6) applied to the discrete gradient counts the number of jumps of the image value between neighboring pixels; it can therefore be expected that minimizers are piecewise constant, and that jumps are penalized even more strongly than by the (convex) total variation model.
To motivate our approach, we first consider a simple scalar (lower semicontinuous) step function, i.e., we consider for the corresponding characteristic function
[TABLE]
To write this non-convex function as the generalized preconjugate of a convex function, let satisfy , , and . Then a simple case distinction shows that
[TABLE]
Setting , we thus obtain that is the -preconjugate of the convex indicator function . One possible choice for is ; however, we require to be smooth in order to apply Algorithm 1.1. A better choice is therefore
[TABLE]
see Fig. 2, which has the advantage that the supremum in (8) is always attained at a finite . We will use this choice from now on.
Noting that , we can proceed similarly by case distinction to write
[TABLE]
i.e., for as above, is the -preconjugate of the zero function . In practice, it may be useful to add Huber regularization, i.e., replace by for some . Using the fact that and our choice (9) are differentiable, an elementary calculus argument shows that the corresponding preconjugate is
[TABLE]
which is still a non-convex approximation of , see Fig. 2.
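One construction consistent with the case distinction above can be checked numerically. Writing $|t|_0$ (our notation) for $0$ if $t=0$ and $1$ otherwise: for any function $\rho$ with $\rho(0)=0$, $\rho\le 1$, and $\rho(t^*)=1$ for some $t^*\ne 0$, the supremum $\sup_{s\in\mathbb{R}} \rho(st)$ equals $|t|_0$, so $|t|_0$ is a preconjugate of the zero function with coupling $\rho(st)$. The profile $\rho(t) = 1 - \max(1-t^2, 0)^2$ below is a hypothetical $C^1$ choice, not the function (9) of the paper; its supremum is attained at the finite value $s = 1/|t|$, mirroring the advantage noted for (9).

```python
import numpy as np

def rho(t):
    """Hypothetical C^1 profile: rho(0) = 0, rho < 1 on (-1, 1), rho = 1 for |t| >= 1."""
    return 1.0 - np.maximum(1.0 - t**2, 0.0) ** 2

def l0_via_preconjugate(t, s_grid):
    """Grid approximation of sup_s [rho(s*t) - 0], the preconjugate of the zero function."""
    return float(np.max(rho(s_grid * t)))

# The grid must reach |s| >= 1/|t|, where the supremum is attained; here |t| >= 1/50 suffices.
s_grid = np.linspace(-50.0, 50.0, 100001)
for t in [0.0, 0.03, -2.0, 5.0]:
    print(t, l0_via_preconjugate(t, s_grid))  # 0.0 for t = 0, 1.0 otherwise
```

The grid only approximates the supremum; for $|t|$ smaller than $1/50$ the printed value would fall below 1 because the attaining $s$ lies outside the grid.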
We now turn to the vectorial seminorm, where we distinguish between the cases $q=1$, $q=\infty$, and $q\in(1,\infty)$.
The case $q=1$.
With this choice, (6) reduces to
[TABLE]
which is the most common choice for the Potts model found in the literature. Here, the Potts functional counts for each pixel the jumps across each edge of the pixel separately, i.e., the contribution of each pixel is either 0 (no jump), 1 (jump in either horizontal or vertical direction), or 2 (jump in both directions). We thus refer (in a slight abuse of terminology) to this case as the anisotropic Potts model.
Since this functional is completely separable, we can apply the above scalar approach componentwise by taking
[TABLE]
such that is the -preconjugate of the zero function . Correspondingly, the Huber regularization of is given by
[TABLE]
The case $q=\infty$.
Now (6) reduces to
[TABLE]
Here, each pixel contributes to the Potts functional only once, even if there is a jump across both edges. Since a simple case distinction shows that for any and , this case is equivalent to
[TABLE]
for any , which leads to an alternate definition of the Potts functional sometimes found in the literature. We refer to this case as the isotropic Potts model.
This functional is only separable with respect to the pixel coordinates but not with respect to . We thus extend our preconjugation approach to by observing for that
[TABLE]
since for , for all , while for or , the supremum will be attained at by the choice of . Setting
[TABLE]
makes again the -preconjugate of the zero function . The corresponding Huber regularization can be once more computed by elementary calculus as
[TABLE]
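The difference between the anisotropic and isotropic Potts functionals can be seen on a tiny image. The sketch below assumes forward differences with a zero difference at the last row and column (our choice; the paper's discrete gradient may use a different boundary rule), and the function names are hypothetical. A pixel with jumps across both of its edges contributes 2 to the anisotropic count but only 1 to the isotropic count.

```python
import numpy as np

def forward_gradient(u):
    """Forward differences; the difference is set to zero at the last row/column (assumed)."""
    u = np.asarray(u, dtype=float)
    g_vert = np.zeros_like(u)
    g_horz = np.zeros_like(u)
    g_vert[:-1, :] = u[1:, :] - u[:-1, :]
    g_horz[:, :-1] = u[:, 1:] - u[:, :-1]
    return g_vert, g_horz

def potts_anisotropic(u):
    """Each pixel contributes 0, 1 or 2: one count per edge with a jump."""
    g_vert, g_horz = forward_gradient(u)
    return int(np.count_nonzero(g_vert) + np.count_nonzero(g_horz))

def potts_isotropic(u):
    """Each pixel contributes 0 or 1: one count if its gradient vector is nonzero."""
    g_vert, g_horz = forward_gradient(u)
    return int(np.count_nonzero((g_vert != 0) | (g_horz != 0)))

u = np.array([[0, 0, 1],
              [0, 0, 1],
              [1, 1, 1]])
print(potts_anisotropic(u), potts_isotropic(u))  # 4 3
```

Here the center pixel has a jump both below and to the right, which accounts for the difference between the two counts.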
The case $q\in(1,\infty)$.
In principle, one could proceed as for by constructing a function with
[TABLE]
and setting . However, since the corresponding Potts functional only differs from the case by the relative contribution of pixels with jumps in both directions and for , we will only consider the extremal cases $q=1$ and $q=\infty$.
In all cases, we can apply Algorithm 1.1 to
[TABLE]
for and . We illustrate the application of Algorithm 1.1 for and in Section 7.2.
Remark 2.2.
We can also apply this approach for with using the same as above, writing
[TABLE]
as if and attains the maximal value otherwise. However, is not ; we can achieve that by instead writing
[TABLE]
3 Notation and assumptions
We start the development of our proposed method by introducing the necessary notation and overall assumptions. Throughout the rest of this paper, we write for the space of bounded linear operators between Hilbert spaces and . In what follows, we let and denote elements of and , respectively, and denote by a pair . For brevity, we will also use this notation for similar tuples, e.g., , without explicit introduction in each case.
For any Hilbert space, $\mathrm{Id}$ is the identity operator, $\langle\cdot,\cdot\rangle$ is the inner product in the corresponding space, and $\mathbb{B}(x,r)$ is the closed ball of radius $r$ centered at $x$. If is a set-valued map, we will frequently use the concise notation
[TABLE]
as well as, e.g.,
[TABLE]
if the corresponding relation holds for all .
For self-adjoint , the inequality means is positive semidefinite. If is self-adjoint, we further set , and (which define an inner product and a norm in , respectively, if is in addition positive definite). In this case, implies that for all .
We also recall that $K_x$ and $K_y$ denote the partial Fréchet derivatives of a continuously differentiable operator $K$ with respect to the given variable.
Throughout this paper, we make the following fundamental assumptions on (1).
Assumption 3.1.
The functionals $G$ and $F^*$ are convex, proper, and lower semicontinuous. Furthermore,
- (i) there exist a constant and a neighborhood of such that
[TABLE]
- (ii) there exist a constant and a neighborhood of such that
[TABLE]
Let us comment on this assumption. First, since the subgradients $\partial G$ and $\partial F^*$ of convex, proper, and lower semicontinuous functionals are maximally monotone operators [4, Theorem 20.25], Assumption 3.1 always holds with . This is already sufficient for showing weak convergence of Algorithm 1.1; see Theorem 6.1. For strong convergence with rates, however, we (as usual in nonlinear optimization) need a local superlinear growth condition near the solution that requires taking and/or strictly positive (unless we can compensate by better properties of $K$ through Assumption 3.2 below); see Theorems 6.4 and 6.6. In this case, Assumption 3.1 (i), for example, coincides with strong metric subregularity of $\partial G$; see [1, 2]. This property holds (at any and ) whenever $G$ is strongly convex; however, it is a strictly weaker property since we only require it to hold at a specific and arising from the first-order necessary optimality conditions (17) below. (For example, for is strongly metrically subregular at for , but not at , although is not strongly convex.)
Assumption 3.2.
The functional and there exist such that for all
[TABLE]
the following properties hold:
- (i) (second partial derivatives) The second partial derivatives and exist and satisfy .
- (ii) (locally Lipschitz gradients) For some functions and a constant ,
[TABLE]
- (iii) (locally bounded gradient) There exists with .
- (iv) (three-point condition) There exist , , such that
[TABLE]
We again elaborate on this assumption. Assumption 3.2 (i)–(iii) are standard in nonlinear optimization of smooth functions. Apart from the estimates in Assumption 3.2 (ii), we make use of the following inequality that is an immediate consequence:
[TABLE]
The constants and in Assumption 3.2 (iv) can typically be taken positive by exploiting the strong monotonicity factors and of $\partial G$ and $\partial F^*$. Indeed, further on in Theorem 4.1, we will require that and , where and will be acceleration factors employed to update the step length parameters $\tau_i$, $\sigma_i$, and $\omega_i$ in the algorithm.
In Appendix A we demonstrate that Assumption 3.2 (iv) is closely related to standard second-order optimality conditions, i.e., a positive definite Hessian at the solution. In particular, if the primal problem for the saddle-point functional is strongly convex and the dual problem is strongly concave, the constants that ensure Assumption 3.2 (iv) can be found explicitly. Nonetheless, Assumption 3.2 (iv) is more general than simple strong convexity–concavity. Indeed, in Appendix C we verify Assumption 3.2 for $K$ arising from combinations of a linear operator with generalized conjugate representations of the step function and the function from Section 2.2.
Since (15b) holds for any when for some , the conditions (15) reduce to the three-point condition for from [10] with the exponent . In the present work, such an exponent would correspond to exponents over the norms with the factors and that we consider in Appendix B. These can sometimes be useful: The exponent was needed in [35, Appendix B] to show the three-point condition for for a phase and amplitude reconstruction problem. For the sake of readability, in the main part of the present work we focus on the case , i.e., Assumption 3.2 (iv), and discuss the changes needed for in Appendix B.
4 An abstract convergence result
We want to find a critical point of the saddle point functional , i.e., satisfying
[TABLE]
Since and are proper, convex, and lower semicontinuous, and is continuously differentiable, using the definition of the saddle-point, the Fréchet derivative, and the convex subdifferential, an elementary limiting argument as in, e.g., [9, Prop. 2.2] shows that the inclusion (17) is a first-order necessary optimality condition for a saddle point. If for , (17) reduces to and , which coincides with the well-known Fenchel–Rockafellar extremality conditions for (2); see [14, Remark 4.2].
To study Algorithm 1.1, we reformulate it in the preconditioned proximal point and testing framework of [36]. Specifically, we write Algorithm 1.1 in implicit proximal point form as solving in each iteration for in
[TABLE]
where the linearization of , the linear preconditioner , and the step length operator are defined as
[TABLE]
Inserting these definitions into (IPP) and rearranging, we can rewrite inclusion (IPP) as
[TABLE]
Therefore, based on the definitions of the proximal point mapping and of , solving (IPP) for is equivalent to performing one step of Algorithm 1.1. Since proximal mappings of proper, convex and lower semicontinuous functionals are well-defined, single-valued, and Lipschitz continuous [4, Proposition 12.15], and is twice Fréchet differentiable on , this also shows that (IPP) always admits a unique solution .
The next step is to “test” the inclusion (IPP) by application of for the testing operator
[TABLE]
This testing operator and the respective primal and dual testing variables and will be seen to encode convergence rates after some rearrangements of the tested inclusions for .
We will base our convergence analysis on the following abstract estimate, where forms a local metric that measures the convergence of the iterates while can potentially be used to measure function value or gap convergence. In particular, we therefore want as with a certain rate such that boundedness of implies the convergence of at the reciprocal rate (see Theorems 6.4 and 6.6).
Theorem 4.1 ([36, Theorem 2.1]).
Suppose (IPP) is solvable, and denote the iterates by . If is self-adjoint and for some and , for all ,
[TABLE]
The next theorem specializes Theorem 4.1 to our specific setup, converting the abstract condition (22) into several step length and testing parameter update rules and bounds. Specifically, (24a) below couples the primal and dual step lengths and and the over-relaxation parameter with the testing parameters. Condition (24b) determines convergence rates by limiting how fast the testing parameters can grow. This rate is limited through the available strong monotonicity or second-order behavior ( and ) through (24d) and (24e) as well as additional step length bounds from (24c). We point out that only the latter are specific to our non-convex setting; the remaining conditions are present in the convex setting as well, see [36]. We will further develop these rules and conditions in the next section to obtain specific convergence results; an explicit example for a set of parameters satisfying these rules and conditions will be provided for the $\ell^0$-TV denoising in Section 7.2 and Appendix C. Here and in the following, we use the notation from Algorithm 1.1 for brevity.
Theorem 4.2.
Suppose Assumptions 3.1 and 3.2 hold with the constants ; ; ; and . For all , let , and suppose for some . Assume for all that and that for some ; ; and ,
[TABLE]
Then (22) is satisfied for any .
Proof.
We split the proof into several steps.
Step 1 (estimation of )
By (24a), and , so (19) yields
[TABLE]
which is clearly self-adjoint. Applying Cauchy’s and Young’s inequalities, we further obtain for any , , and that
[TABLE]
implying that
[TABLE]
Step 2 (estimation of )
Expanding according to (25) and then applying (24b), we obtain
[TABLE]
Step 3 (estimation of )
By (18) we have
[TABLE]
Since , we have and . Using (21) multiplied by , Assumption 3.1, and (24a), we can thus estimate
[TABLE]
Combining (28), (27), and (26), we arrive at
[TABLE]
for
[TABLE]
The claim of the theorem is established if we prove that .
Step 4 (estimation of )
With
[TABLE]
we can rewrite
[TABLE]
We rearrange
[TABLE]
Since , setting
[TABLE]
we can write
[TABLE]
As for the estimate for , using Assumption 3.2 (ii) and (16) we obtain
[TABLE]
using in the last inequality the expansion and the bound that follows from the assumed inclusion .
We now use Assumption 3.2 (iv) to further bound and . From (15a), we obtain
[TABLE]
using in the last two inequalities that for some , and from (24e). Analogously, from (15b) and Cauchy’s inequality,
[TABLE]
where in the last two inequalities we again used , , and from (24d). Therefore, combining (30), (31), and (32), we obtain
[TABLE]
where we have also used the first bounds of (24d) and (24e) in the final step. Further using (24c) and , we deduce that . Recalling (29), we obtain , i.e., (22) holds with as claimed.
In the subsequent sections, we will also need the following corollary.
Corollary 4.4.
Suppose that Assumption 3.2 (iii) and the conditions (24) hold. Then
[TABLE]
and
[TABLE]
Proof.
Observe that due to (24),
[TABLE]
This is our first claim. As for the second claim, from Assumption 3.2 (iii) we have
[TABLE]
Inserting this bound into (26) in the proof of Theorem 4.2 establishes (34).
5 Local step length bounds
In the previous section, we derived step length conditions that we will further develop in Section 6 to prove convergence and convergence rates. However, we implicitly required that all the iterates belong to . In this section, we derive additional step length restrictions to ensure that this holds.
We start with a lemma that bounds the next iterate given bounds on the current iterate and the step lengths for the current iteration. Afterwards, we chain these estimates to only require bounds on the initial iterates and the step lengths.
Lemma 5.1.
Fix . Suppose Assumption 3.1 and Assumption 3.2 (ii) and (iii) hold in , and that solves (IPP). For simplicity, assume . Suppose and are such that and . If
[TABLE]
then and .
Proof.
We want to show that the step length conditions (35) are sufficient for
[TABLE]
We do this by applying the testing argument on the primal and dual variables separately. Multiplying (IPP) by with and , we obtain
[TABLE]
Using the three-point identity
[TABLE]
we obtain
[TABLE]
Using further and the monotonicity of , we arrive at
[TABLE]
With , this implies that
[TABLE]
After rearranging the terms and using , we thus have
[TABLE]
which leads to
[TABLE]
To estimate the dual variable, we multiply (IPP) by with and . This gives
[TABLE]
Using and following the steps leading to (38), we deduce
[TABLE]
with .
We now proceed to derive bounds on and with the goal of bounding both (38) and (39) from above. Using Assumption 3.2 (ii), (iii), and the mean value theorem applied to and ,
[TABLE]
the latter under the assumption that , which we now verify. First, by definition,
[TABLE]
Applying (37) and (38), we obtain
[TABLE]
The bound (35) on implies that and hence that . From (38) we thus obtain . The bound (35) on then implies that , which together with (39) completes the proof.
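The three-point identity invoked in the proof of Lemma 5.1 was lost in extraction. Its standard form, which we assume is the one meant there, reads $\langle a-b, a-c\rangle_M = \tfrac12\|a-b\|_M^2 - \tfrac12\|b-c\|_M^2 + \tfrac12\|a-c\|_M^2$ for any self-adjoint $M$, with $\langle a,b\rangle_M := \langle Ma, b\rangle$ and $\|a\|_M^2 := \langle a,a\rangle_M$; positive semidefiniteness of $M$ is not needed for the identity itself. A quick numerical check with random data (the helper names are hypothetical):

```python
import numpy as np

def inner_M(a, b, M):
    """Weighted inner product <a, b>_M := <M a, b> for a symmetric matrix M."""
    return float((M @ a) @ b)

def three_point_gap(a, b, c, M):
    """|lhs - rhs| of <a-b, a-c>_M = 1/2|a-b|_M^2 - 1/2|b-c|_M^2 + 1/2|a-c|_M^2."""
    lhs = inner_M(a - b, a - c, M)
    rhs = 0.5 * (inner_M(a - b, a - b, M) - inner_M(b - c, b - c, M) + inner_M(a - c, a - c, M))
    return abs(lhs - rhs)

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
M = B + B.T  # symmetric but possibly indefinite; the identity only needs symmetry
a, b, c = rng.standard_normal((3, 4))
print(three_point_gap(a, b, c, M))  # floating-point round-off only
```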
To chain the applications of Lemma 5.1 on each iteration , we introduce the following assumption, for which we recall the notations in Section 3 as well as the definition of from (14).
Assumption 5.2.
Suppose Section 3 holds near a solution . Given an initial iterate , and initial step length parameters as well as (to satisfy (24)), define the weighted distance
[TABLE]
We then assume that there exist and such that
[TABLE]
and that for all the step lengths satisfy
[TABLE]
Lemma 5.3.
For all , suppose solves (IPP) and that all the conditions of Theorem 4.2 are satisfied for some and except for the requirement . Then, if Assumption 5.2 holds, .
Proof.
We define and
[TABLE]
Since the conditions (24) hold, we can apply Corollary 4.4 and the estimate (34) on to deduce that
[TABLE]
From (24b), we also deduce that and hence that . Consequently, if , then
[TABLE]
so it will suffice to show that for each to prove the claim. We do this in two steps. In the first step, we show that and
[TABLE]
In the second step, we show by induction that as well as for .
Step 1
We first prove (44). Since , we only have to show that . First, note that (24) and imply as well as for defined in (40). We then obtain from the definition of substituting from (25) that
[TABLE]
Using Cauchy’s and Young’s inequalities, the fact that , and the assumption that , we arrive at
[TABLE]
We obtain from Corollary 4.4 that and hence that . The assumption on then yields for all that
[TABLE]
Thus (44) follows from the definition of .
Step 2
We next show by induction that and for all . Since (42) holds for , we have that . Moreover, since in Step 1 we have , the bound (35) for follows from (41). This gives the induction basis.
Suppose now that . By (44), we have that . Since again the bound (35) for follows from (41) and the bound follows from Step 1, we can apply Lemma 5.1 to obtain
[TABLE]
By (43), we have and thus . Theorem 4.2 now implies that (23) is satisfied for with , which together with (23) and (42) yields that . This completes the induction step and hence the proof.
6 Convergence estimates
We are now ready to formulate the main convergence results of this paper based on the estimates derived above. First, based on (24d) and (24e), strong convexity may be required if and have to be positive for Assumption 3.2 to be satisfied. Moreover, the neighborhood has to be small enough, as determined by the assumptions and in the next results. This affects the admissible step lengths and how close we have to initialize via Assumption 5.2. After the next three main convergence results, we show that Assumption 5.2 is satisfied if we initialize close enough to a root . Hence, to apply the theorems in practice, we have to find constants for which Assumptions 3.1 and 3.2 are satisfied, use these constants to bound and compute the step lengths as described in the theorems, and initialize close enough to . In Appendix B we consider a relaxation of Assumption 3.2 (iv), which in turn requires larger and instead of and .
The following theorem provides conditions sufficient for weak convergence of the sequence generated by Algorithm 1.1. Apart from technical requirements of Theorem 4.2, we require additional weak-to-strong continuity of the mapping . While its verification depends on the particular choice of , it is trivially satisfied in two cases: (i) and are finite-dimensional and is continuous; or (ii) the mapping is linear and compact.
**Theorem 6.1** (weak convergence).
Suppose Sections 3, 3 and 5 hold for some ; ; ; and such that
[TABLE]
For some , choose
[TABLE]
Furthermore, suppose that
- (i) implies that for all ,
and either
- (iia) the mapping is weak-to-strong continuous in ; or
- (iib) the mapping is weak-to-weak continuous, but Section 3 (monotone and ) and Section 3 (iv) (three-point condition on ) hold at any weak limit of for the same choices of and .
Then the sequence generated by Algorithm 1.1 converges weakly to some (possibly different from ).
Since it is assumed that , we can replace by in the bound on in (47) if the latter is more readily available.
For constant , , and , we have to set and to satisfy (24a). Consequently, applying Corollary 4.4 to bound from below will not help to prove Theorem 6.1. We will instead make use of the following enhanced version of Opial’s lemma.
**Lemma 6.2** ([10, Lemma A.2]).
Let be a Hilbert space, (not necessarily closed or convex), and . Also let be self-adjoint and for some for all . If the following conditions hold, then in for some :
- (i) The sequence is nonincreasing for some .
- (ii) All weak limit points of belong to .
- (iii) There exists such that for all , and for any weakly convergent subsequence there exists such that strongly in for all .
**Proof 6.3** (Proof of Theorem 6.1).
We first verify (24) so that we can apply Theorem 4.2 and Lemma 5.3. We set , , to satisfy (24a), (24b), (24d) and (24e) for and , , , satisfying (46). With the choice , the bounds (47) thus ensure (24c).
Hence (24) holds, which together with Section 5 and enables us to use Lemma 5.3 to obtain and . Therefore there exists at least one weak limit point of . Moreover, (25) yields self-adjointness of and since the bounds (47) are strict, Theorem 4.2 holds with for some .
We now verify the conditions of Lemma 6.2 with and . Estimate (23) is valid for any starting iterate; thus setting and taking instead of , we obtain for any due to Theorem 4.2. This verifies (i). Moreover, (iii) follows from the assumed constant step lengths, Section 3 (iii), and the assumption that for all if .
Hence we only need to verify (ii), i.e., if a subsequence of converges weakly to some , then . We note that , and (IPP) implies that for
[TABLE]
Therefore it suffices to show that if for a subsequence, then
[TABLE]
which by construction is equivalent to . Note that is maximally monotone since it only involves subgradient mappings of proper convex lower semicontinuous functions due to Section 3. Moreover, further use of (23) shows that and hence that . The last two terms in (51) thus converge strongly to zero. We therefore only have to consider the first term, for which we make a case distinction.
- (a) If assumption (iia) holds, we obtain that , and the required inclusion follows from the fact that the graph of the maximally monotone operator is sequentially weakly–strongly closed; see [4, Proposition 16.36].
- (b) If assumption (iib) holds, then only . In this case, we can apply the Brezis–Crandall–Pazy Lemma [4, Corollary 20.59 (iii)] to obtain the required inclusion under the additional condition that . In our case, recalling that the last two terms of (51) converge strongly to zero, we have that
[TABLE]
for
[TABLE]
Defining
[TABLE]
we rearrange and estimate
[TABLE]
Using , , (16), and both Section 3 and Section 3 (iv) at , we estimate as
[TABLE]
In the last bounds we used , , and because both and ; likewise, . Since , we obtain that . The Brezis–Crandall–Pazy Lemma thus yields the desired inclusion .
Hence in both cases, and the condition (ii) of Lemma 6.2 is satisfied. Applying Lemma 6.2, we obtain the claim.
We now provide convergence rates under additional assumptions of strong convexity of and/or , although we still allow non-convexity of the overall problem through . To be specific, we require that we can take the acceleration or step length update factors and/or in (24d) and (24e), respectively. Let us start with , which is the case, for instance, when is strongly convex and (15a) holds with . Since we obtain a fortiori strong convergence from the rates, we do not require the additional assumptions on introduced in Theorem 6.1; on the other hand, we only obtain convergence of the primal iterates. Similar to the linear case of [10], the step length choice follows directly from having to satisfy (24b) and the desire to keep the right-hand side of the -rule (24c) constant.
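For orientation only, the following Python sketch implements the classical accelerated step length update of Chambolle and Pock [8] for bilinear coupling, ω_i = 1/√(1 + 2γτ_i), τ_{i+1} = ω_iτ_i, σ_{i+1} = σ_i/ω_i, where γ stands for an assumed strong convexity factor. This is not the rule (54) of Theorem 6.4, which is not reproduced here; the sketch merely shows how such an update keeps τ_iσ_i constant while τ_N decays like 1/(γN), which is the mechanism behind the O(1/N²) rate in the bilinear convex setting.

```python
import math


def accelerated_steps(tau0, sigma0, gamma, n_iter):
    """Classical Chambolle-Pock acceleration (bilinear case), for orientation only.

    gamma is an assumed strong convexity factor of the primal functional.
    Returns the lists of primal and dual step lengths tau_i and sigma_i.
    """
    taus, sigmas = [tau0], [sigma0]
    tau, sigma = tau0, sigma0
    for _ in range(n_iter):
        omega = 1.0 / math.sqrt(1.0 + 2.0 * gamma * tau)  # acceleration factor < 1
        tau, sigma = omega * tau, sigma / omega          # tau * sigma is preserved
        taus.append(tau)
        sigmas.append(sigma)
    return taus, sigmas


taus, sigmas = accelerated_steps(tau0=1.0, sigma0=0.5, gamma=1.0, n_iter=10_000)
print(taus[-1] * sigmas[-1])    # stays at tau0 * sigma0 up to rounding
print(10_000 * 1.0 * taus[-1])  # close to 1, i.e. tau_N is about 1/(gamma N)
```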
**Theorem 6.4** (convergence rates under acceleration).
Suppose Sections 3, 3 and 5 hold for some ; ; ; and such that for some ,
[TABLE]
Choose
[TABLE]
satisfying for some the bounds
[TABLE]
Then converges to zero at the rate .
**Proof 6.5.**
We again first verify (24) so that we can apply Theorem 4.2 and Lemma 5.3. Setting , , , and , (24a) follows from the -rule of (54) and the choice of , , and . Using (54) and , we obtain , and hence (24b) follows. Since and , (24c) follows from (55) and . Furthermore, (24d) and (24e) are satisfied due to the assumed bounds (53) on , , , and taking .
We can thus apply Theorem 4.2 and Lemma 5.3 to arrive at (23) for . We now estimate the convergence rate from (23) by bounding from below. Using Corollary 4.4, we obtain . Moreover,
[TABLE]
which yields the claim.
**Theorem 6.6** (linear convergence).
Suppose Sections 3, 3 and 5 hold for some ; ; ; and as well as
[TABLE]
with
[TABLE]
Assume for some the bound
[TABLE]
Then converges to zero with the linear rate .
**Proof 6.7.**
We will use Theorem 4.2 and Lemma 5.3, for both of which we need to verify (24) first. We set ,
[TABLE]
Then and , verifying (24a) and (24b). We next observe that substituting , the first bound of (24c) is tantamount to requiring
[TABLE]
Substituting , this in turn is equivalent to
[TABLE]
which after solving a quadratic inequality for yields the second bound of (58). Since , the first bound of (58) gives the second bound of (24c). Finally, (24d) and (24e) follow directly from (56) with .
Since Section 5 and (24) hold, we can apply Lemma 5.3 to obtain and . Moreover, (25) yields self-adjointness of . Consequently, we can apply Theorem 4.2 and Lemma 5.3 to arrive at (23) for any .
We now estimate the convergence rate from (23) by bounding from below. Using Corollary 4.4, we obtain that
[TABLE]
Since , this gives the claimed linear convergence rate through the exponential growth of .
**Remark 6.8.**
If for some , then and with and for a local Lipschitz factor of . Furthermore, Section 3, the step length bounds, and the update rules required in Theorem 6.1 or 6.6 reduce to the corresponding ones introduced in [10] for this case. As for acceleration, Theorem 6.4 now gives a weaker convergence rate of compared to in [10, Theorem 4.3]. This is due to (24c) requiring to be bounded whenever , even when goes to zero.
Before we conclude this section, we refine Section 5 by showing that its implicit requirements do not add any additional step length bounds provided the starting point is sufficiently close to .
**Proposition 6.9.**
Under the assumptions of Theorem 6.1, 6.4, or 6.6, suppose that . Then there exists such that Section 5 holds whenever the initial iterate satisfies
[TABLE]
**Proof 6.10.**
We take , , , , and as they are defined in the corresponding Theorem 6.1, 6.4, or 6.6, and , from Section 3. We need to show that there exist and such that (41) holds and
[TABLE]
Let and set as well as and . Observing (60), we then see both that and that (61) holds for sufficiently small. Furthermore, (60) yields that in Lemma 5.3. Let
[TABLE]
Since , , and for small enough, we see that as . Comparing the definition of to (41), we therefore see that the latter holds for any given and by taking sufficiently small. Since in Theorems 6.1, 6.4 and 6.6 we have , the inequalities (41) hold.
7 Numerical examples
Finally, we illustrate the applicability of the proposed approach for the example applications described in Section 2. The Julia implementation used to generate the following results is on Zenodo [11].
7.1 An elliptic Nash equilibrium problem
Our first example illustrates the reformulation from Section 2.1 for the two-player elliptic Nash equilibrium problem from [6]. Here the action space of each player is for a bounded domain with boundary . To avoid confusion with the spatial variable, we will in this subsection denote the primal variable with and the dual variable with . The set of admissible strategies is
[TABLE]
For a set of strategies , the payout function for each player is
[TABLE]
where , are given target states, maps to the solution to the elliptic boundary value problem
[TABLE]
are control operators which are here chosen as
[TABLE]
for some control domains , and is a common source term. Following Section 2.1, the corresponding Nash equilibrium problem (3) can then be solved by applying Algorithm 1.1 to
[TABLE]
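To make the structure of the iteration concrete, the Python sketch below assumes that Algorithm 1.1 takes the form x^{i+1} = prox_{τF}(x^i − τ∇_x K(x^i, y^i)), x̄^{i+1} = 2x^{i+1} − x^i, y^{i+1} = prox_{σG*}(y^i + σ∇_y K(x̄^{i+1}, y^i)) with constant step lengths, i.e., the Chambolle–Pock iteration with the bilinear coupling replaced by partial gradients of K. The exact statement of Algorithm 1.1 (including any step length updates) is not reproduced in this section, and the function name `nl_pdps` is hypothetical. As a toy check we take the bilinear coupling K(x, y) = ⟨x, y⟩, F(x) = ½‖x − b‖², and G* the indicator of the box [−λ, λ]^n, so that the primal solution is the soft-thresholding of b.

```python
import numpy as np


def nl_pdps(prox_F, prox_Gstar, grad_x_K, grad_y_K, x0, y0, tau, sigma, n_iter):
    """Hypothetical sketch of primal-dual proximal splitting with a
    non-bilinear coupling K(x, y) and constant step lengths tau, sigma.

    prox_F(v, tau) and prox_Gstar(v, sigma) evaluate the proximal maps of
    tau*F and sigma*G^*; grad_x_K and grad_y_K are the partial gradients of K.
    """
    x, y = np.asarray(x0, dtype=float), np.asarray(y0, dtype=float)
    for _ in range(n_iter):
        x_new = prox_F(x - tau * grad_x_K(x, y), tau)        # primal step at (x^i, y^i)
        x_bar = 2.0 * x_new - x                                # over-relaxation
        y = prox_Gstar(y + sigma * grad_y_K(x_bar, y), sigma)  # dual step at (x_bar^{i+1}, y^i)
        x = x_new
    return x, y


# Toy check with bilinear K(x, y) = <x, y>: min_x 0.5*||x - b||^2 + lam*||x||_1.
b, lam = np.array([3.0, -0.5, 1.2]), 1.0
x, y = nl_pdps(
    prox_F=lambda v, t: (v + t * b) / (1.0 + t),    # prox of t * 0.5*||. - b||^2
    prox_Gstar=lambda v, s: np.clip(v, -lam, lam),  # projection onto [-lam, lam]^n
    grad_x_K=lambda x, y: y,
    grad_y_K=lambda x, y: x,
    x0=np.zeros(3), y0=np.zeros(3), tau=0.9, sigma=0.9, n_iter=500,
)
print(x)  # soft-thresholding of b: approximately [2.0, 0.0, 0.2]
```

Here τσ‖K‖² = 0.81 < 1, the usual step length condition for the bilinear case.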
To implement the algorithm, we need explicit forms of the proximal mappings for and and of the partial derivatives of . Since for , we have
[TABLE]
for the metric projections onto the convex sets given pointwise almost everywhere by
[TABLE]
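Metric projections onto sets defined by pointwise constraints are computed pointwise. The Python sketch below assumes, for illustration, box constraints of the form u_a ≤ u ≤ u_b on a grid (the actual bounds of the admissible sets are not reproduced here, and `project_box` is a hypothetical helper). It also checks the variational characterization ⟨v − P(v), w − P(v)⟩ ≤ 0 for every admissible w.

```python
import numpy as np


def project_box(v, lower, upper):
    """Pointwise metric projection onto {u : lower <= u <= upper}, assuming lower <= upper."""
    return np.clip(v, lower, upper)


rng = np.random.default_rng(0)
v = rng.normal(scale=2.0, size=100)  # hypothetical grid function
lower, upper = -1.0, 0.5             # hypothetical bounds
p = project_box(v, lower, upper)

# Characterization of the projection onto a closed convex set:
# <v - p, w - p> <= 0 for every admissible w.
for _ in range(5):
    w = rng.uniform(lower, upper, size=100)
    print(np.dot(v - p, w - p) <= 1e-12)  # True
```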
It remains to address the computation of and . Using adjoint calculus and the linearity of the adjoint equation, we have that
[TABLE]
where and are the solutions to the equations
[TABLE]
all with homogeneous Dirichlet conditions. Hence, every iteration of Algorithm 1.1 requires nine solutions of a partial differential equation (recall that is evaluated at , while is evaluated at ). Since and hence and are affine in and , the assumptions of Theorem 6.1 are satisfied for sufficiently small step lengths. Since neither nor is strongly convex, no acceleration is possible.
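The adjoint calculus used above can be illustrated on a discretized linear state equation. The Python sketch assumes, for illustration only, a discrete state equation A y = B u + f with an invertible matrix A and a tracking term J(u) = ½‖y(u) − z‖²; then ∇J(u) = Bᵀp, where the adjoint state p solves Aᵀp = y(u) − z. The matrices and function names are hypothetical, and the gradient is checked against a central finite difference.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 20, 5
A = 4.0 * np.eye(n) + rng.normal(scale=0.3, size=(n, n))  # hypothetical invertible state operator
B = rng.normal(size=(n, m))                                # hypothetical control operator
f = rng.normal(size=n)                                     # source term
z = rng.normal(size=n)                                     # target state


def state(u):
    return np.linalg.solve(A, B @ u + f)  # y(u): one state solve


def J(u):
    r = state(u) - z
    return 0.5 * r @ r


def grad_J(u):
    p = np.linalg.solve(A.T, state(u) - z)  # one adjoint solve with A^T
    return B.T @ p


u = rng.normal(size=m)
e = np.zeros(m)
e[0] = 1.0
h = 1e-6
fd = (J(u + h * e) - J(u - h * e)) / (2 * h)  # central difference in direction e
print(abs(fd - grad_J(u)[0]))                 # small (rounding-level, since J is quadratic)
```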
For our numerical tests we follow [6] and consider a finite-difference discretization of (62) on with nodes in each direction,
[TABLE]
as well as , , and . Using the method of manufactured solutions, , , and are chosen such that the solution of the Nash equilibrium problem is known a priori; see Fig. 3. By construction, the saddle point then satisfies and hence .
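The method of manufactured solutions can be illustrated on a state equation alone. The Python sketch assumes, for illustration, the Poisson problem −Δy = g on the unit square with homogeneous Dirichlet conditions, discretized by the standard five-point finite-difference stencil; the exact solution y = sin(πx)sin(πy) is chosen first and g = 2π² sin(πx)sin(πy) is computed from it. The helper names are hypothetical, and the actual operators and data of the Nash example are not reproduced here.

```python
import numpy as np


def laplacian_2d(n):
    """Five-point -Laplacian on n x n interior nodes of (0,1)^2, zero Dirichlet data."""
    h = 1.0 / (n + 1)
    T = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
    I = np.eye(n)
    return np.kron(I, T) + np.kron(T, I), h


def mms_error(n):
    L, h = laplacian_2d(n)
    t = np.linspace(h, 1.0 - h, n)                   # interior grid points
    X, Y = np.meshgrid(t, t, indexing="ij")
    y_exact = np.sin(np.pi * X) * np.sin(np.pi * Y)  # manufactured solution
    g = 2.0 * np.pi**2 * y_exact                     # source term chosen to match it
    y_h = np.linalg.solve(L, g.ravel())
    return np.max(np.abs(y_h - y_exact.ravel()))


e_coarse, e_fine = mms_error(15), mms_error(31)
print(e_coarse / e_fine)  # roughly 4: second-order convergence as h is halved
```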
Since the Lipschitz constants for and its derivatives are not available, we simply take the parameters in Algorithm 1.1 as , , and . The results of the algorithm for different values of are shown in Table 1, which reports the distance of the primal-dual iterates to the exact solution. As can be seen, the iteration converges in each case to machine precision within iterations, and the convergence behavior is virtually identical. This demonstrates the mesh independence expected from an algorithm for which convergence can be shown in function spaces.
7.2 -TV denoising
Our next example concerns the -TV denoising or segmentation problem from Section 2.2. Recall that we can solve the (Huber-regularized) -TV problem (5) by applying Algorithm 1.1 to
[TABLE]
for and , where is the discrete gradient. We write for defined in (17) corresponding to . Since and are quadratic, a simple computation shows that
[TABLE]
where all operations are to be understood componentwise. For the derivatives of , we have by the chain rule
[TABLE]
where is the discrete (negative) divergence. For the partial derivatives of and , we again distinguish the cases and :
[TABLE]
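Proximal maps of quadratic functionals are available in closed form and act componentwise. As a generic illustration (the specific quadratic functionals of this example are not reproduced here, and the helper name is hypothetical), for Q(w) = (a/2)‖w − c‖² with a > 0 one has prox_{tQ}(v) = (v + tac)/(1 + ta). The Python sketch verifies this through the optimality condition of the proximal subproblem.

```python
import numpy as np


def prox_quadratic(v, t, a, c):
    """prox of t*Q with Q(w) = (a/2)*||w - c||^2, applied componentwise."""
    return (v + t * a * c) / (1.0 + t * a)


rng = np.random.default_rng(2)
v, c = rng.normal(size=(2, 8))
t, a = 0.3, 2.0
w = prox_quadratic(v, t, a, c)
# w minimizes t*Q(w) + 0.5*||w - v||^2, so t*a*(w - c) + (w - v) must vanish.
print(np.max(np.abs(t * a * (w - c) + (w - v))))  # zero up to rounding
```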
It remains to choose valid step sizes for Algorithm 1.1, for which the next result gives useful estimates. We recall from [7] that a forward-difference discretization of the gradient operator satisfies . Recalling (63) and the definitions of and , a critical point satisfies
[TABLE]
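The discrete gradient and divergence can be sketched in Python as follows, assuming forward differences with unit grid spacing and a zero difference at the last row and column, as in [7]; the divergence is defined as the negative adjoint of the gradient. The sketch checks the adjoint relation ⟨∇x, p⟩ = −⟨x, div p⟩ and estimates ‖∇‖² by power iteration, which stays below the bound 8 from [7]. The function names are hypothetical.

```python
import numpy as np


def grad(x):
    """Forward differences with zero difference at the last row/column."""
    g = np.zeros((2,) + x.shape)
    g[0, :-1, :] = x[1:, :] - x[:-1, :]
    g[1, :, :-1] = x[:, 1:] - x[:, :-1]
    return g


def div(p):
    """Discrete divergence, defined so that <grad x, p> = -<x, div p>."""
    d = np.zeros(p.shape[1:])
    d[:-1, :] += p[0, :-1, :]
    d[1:, :] -= p[0, :-1, :]
    d[:, :-1] += p[1, :, :-1]
    d[:, 1:] -= p[1, :, :-1]
    return d


def norm_sq_estimate(shape, n_iter=300, seed=0):
    """Power iteration for the largest eigenvalue of grad^T grad = -div(grad(.))."""
    x = np.random.default_rng(seed).normal(size=shape)
    lam = 0.0
    for _ in range(n_iter):
        y = -div(grad(x))
        lam = np.linalg.norm(y) / np.linalg.norm(x)
        x = y / np.linalg.norm(y)
    return lam


rng = np.random.default_rng(3)
x = rng.normal(size=(32, 32))
p = rng.normal(size=(2, 32, 32))
print(np.sum(grad(x) * p) + np.sum(x * div(p)))  # zero up to rounding
print(norm_sq_estimate((32, 32)))                # slightly below 8
```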
For brevity, we set
[TABLE]
Using the results of Appendix C we verify the fundamental Section 3.
**Corollary 7.1.**
Let for either or . Choose and . Then Section 3 holds for some and with
[TABLE]
and the constants , satisfying
[TABLE]
**Proof 7.2.**
We consider only as the proof for is similar. Taking , Lemma C.1 applied componentwise shows that the operator satisfies Section 3 for some and (depending on ) when we take
[TABLE]
Moreover, the constants and need to satisfy as well as and for .
By Lemma C.3 on compositions with a linear operator, we can now take
[TABLE]
These give the claim.
We now obtain from Theorem 6.6 the following estimate.
**Corollary 7.3.**
Suppose Section 3 holds. Choose . For some and , take and as well as such that (65) holds. For some , take and as well as
[TABLE]
Then converges to zero with the linear rate provided is close enough to .
**Proof 7.4.**
The assumptions and ensure . Since we have assumed (65), Corollary 7.1 yields Section 3 for any and some . We next use Theorem 6.6, whose conditions we need to verify. First, taking ensures that and . Furthermore, the strict inequality in (66) implies (58) for sufficiently small . Finally, Proposition 6.9 ensures that we can satisfy Section 5 by taking sufficiently close to . The rest of the conditions we have assumed explicitly, so we can apply Theorem 6.6 to finish the proof.
Recall that Section 3 is a second-order growth condition at the critical point , which is a common assumption needed to show convergence of algorithms for non-convex optimization problems. To calculate the upper bounds on in (66), we need to find satisfying (65). For this, in turn, we need to estimate and . To do this, note that the critical point conditions (64) imply
[TABLE]
Since is increasing, we can estimate based on . Since any solution of the Potts problem should be piecewise constant with very few intensity quantization levels, we can estimate as the expected maximal jump between neighboring pixels. To be safe, we take this to be 100% of the dynamic range. As a practical choice of will likely not satisfy , we use an over-approximation in (67). We remark that we thus cannot guarantee convergence of Algorithm 1.1 for small ; however, we demonstrate below that these estimates can still lead to useful step sizes for such cases. Similarly, we do not have an estimate for the unknown local neighborhood of convergence; we compensate for this by taking small in (66). As the results below demonstrate, with these parameters we nevertheless observe convergence for the reasonable starting point with and .
We illustrate the performance of the algorithm and the effects of the choice of . As a test image, we choose “blobs” from the ImageJ framework [30] with size , see Fig. 4(a). We set and (cf. Fig. 2) and use the accelerated step size rule from Theorem 6.6. To do this, we need to satisfy (66) for the primal step length . We discretize the problem such that and hence . Furthermore, we set and for . The above estimates then lead to the step length parameters
:
, , ;
:
, , .
Since the exact solution is not available here, we instead use for and similarly as references for computing errors. The corresponding reference images obtained from Algorithm 1.1 after iterations are shown in Figs. 4(b) and 4(c) for and , respectively. While the evaluation of the formulation and the algorithm in the context of image processing is outside of the scope of this work, we briefly comment on the difference between and . As can be seen by comparing the two images, the results are very similar. However, since diagonal jumps are penalized less for , the isotropic Huber–Potts model is better able to preserve small light blobs such as the one indicated by the red circles. The edges of the blobs are also noticeably smoother.
The convergence behavior of the method for both choices of over iterations is shown in Fig. 5. For the function values, we observe in Fig. 5(a) the usual fast decrease in the beginning of the iteration, after which the values stagnate. Nevertheless, the errors continue to decrease down to machine precision at the predicted linear rate. The convergence behavior for and is similar, although the linear convergence for has a significantly smaller constant. We remark that visually, the iterates in both cases are indistinguishable from the reference images already after iterations. This is consistent with Fig. 5(b) since the total error is dominated by the dual component, which acts as an edge indicator; small changes of the boundaries of the blobs during the iteration will, even for small gray value changes, lead to large differences in the dual variable.
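When errors decay linearly, the contraction factor can be read off from a least-squares fit of the logarithm of the error against the iteration count. The Python helper below is a hypothetical post-processing sketch; it is exercised on a synthetic geometric sequence, not on the data behind Fig. 5.

```python
import numpy as np


def estimate_linear_rate(errors, start=0):
    """Fit log(e_k) ~ log(C) + k*log(q) by least squares and return (q, C).

    Only entries from index `start` on are used, e.g. to skip an initial
    transient; all errors used must be positive.
    """
    e = np.asarray(errors[start:], dtype=float)
    k = np.arange(start, start + e.size)
    slope, intercept = np.polyfit(k, np.log(e), 1)
    return np.exp(slope), np.exp(intercept)


# Synthetic example: e_k = 3 * 0.97**k with small multiplicative noise.
rng = np.random.default_rng(4)
k = np.arange(500)
errors = 3.0 * 0.97**k * np.exp(0.01 * rng.normal(size=k.size))
q, C = estimate_linear_rate(errors, start=50)
print(q, C)  # close to 0.97 and 3.0
```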
8 Conclusion
Using generalized conjugation, some non-smooth non-convex optimization problems can be transformed into saddle-point problems involving non-smooth convex functionals and a smooth non-convex-concave coupling term. For such problems, a generalized primal–dual proximal splitting method can be applied that converges weakly under step length conditions if a local quadratic growth condition is satisfied near a saddle-point. Under additional strong convexity assumptions on the functionals (but not the coupling term and hence the problem), convergence rates for accelerated algorithms can be shown. This approach can be applied to elliptic Nash equilibrium problems and for the anisotropic and isotropic Huber-regularized Potts models, as the numerical examples illustrate. Future work is concerned with further evaluating and comparing the performance of the proposed algorithm for these examples.
Acknowledgments
In the first stages of the research T. Valkonen and S. Mazurenko were supported by the EPSRC First Grant EP/P021298/1, “PARTIAL Analysis of Relations in Tasks of Inversion for Algorithmic Leverage”. Later T. Valkonen was supported by the Academy of Finland grants 314701 and 320022. C. Clason was supported by the German Science Foundation (DFG) under grant Cl 487/2-1. We thank the anonymous reviewers for insightful comments.
A data statement for the EPSRC
The source codes for the numerical experiments are on Zenodo at [11].
Appendix A Reductions of the three-point condition
The following two propositions demonstrate that Section 3 (iv) is closely related to standard second-order optimality conditions, i.e., that the Hessian is positive definite at the solution .
**Proposition A.1.**
Suppose Section 3 (ii) (locally Lipschitz gradients of ) holds in some neighborhood of , and for some , ,
[TABLE]
Then (15a) holds in with , and for any .
**Proof A.2.**
An application of Cauchy’s and Young’s inequalities with any factor , Section 3 (ii), and (68) yields the estimate
[TABLE]
At the same time, using (16),
[TABLE]
Therefore (15a) holds if we take and .
**Proposition A.3.**
Suppose Section 3 (ii) (locally Lipschitz gradients of ) holds in some neighborhood of with , and that
[TABLE]
for some constant . Assume, moreover, for some , that
[TABLE]
Then (15b) holds in with , and for any , .
**Proof A.4.**
An application of Cauchy’s and Young’s inequalities with any factor , Section 3 (ii), and (69) yields the estimate
[TABLE]
At the same time, using (16) and Young’s inequality for any ,
[TABLE]
Therefore (15b) holds if we take and .
Appendix B Relaxations of the three-point condition
In all the results of this paper, Section 3 (iv) can be generalized to the following three-point condition similar to the one used in [10].
**Assumption B.0.**
The functional and there exists a neighborhood
[TABLE]
for some such that for all , the following property holds:
- (ivenumi) (three-point condition) There exist , , , and such that
[TABLE]
This assumption introduces and in , while in Section 3 (iv) we had . For instance, in [10, Appendix B] we verified Appendix B with for the case for the reconstruction of the phase and amplitude of a complex number. This relaxation mainly affects the proof of Step 4 in Theorem 4.2, which now requires a few intermediate derivations.
**Corollary B.1.**
The results of Theorem 4.2 continue to hold if Section 3 (iv) is replaced with Appendix B (ivenumi) for some , where in case , (24d) is replaced by
[TABLE]
**Proof B.2.**
The beginning of the proof follows the exact same steps as in the proof of Theorem 4.2 up until (30). We now use Appendix B (ivenumi) to further bound and similarly to (31) and (32). From (71a),
[TABLE]
The following generalized Young’s inequality for any positive and such that allows for our choice of varying :
[TABLE]
Applying this inequality with ,
[TABLE]
for any to the last term of (73), we arrive at the estimate
[TABLE]
We now use for some , and to obtain
[TABLE]
If , we use the assumed inequality from (24e) to show that the right-hand side of (75) is non-negative for any . Otherwise we take to ensure the right-hand side of (75) is zero. In either case, and hence
[TABLE]
Analogously, from (71b) and Cauchy’s inequality,
[TABLE]
This has a structure similar to (73) with now as a multiplier. Hence, we apply a similar generalized Young’s inequality to the last term with any . Noting that , we use the following bound similar to (75):
[TABLE]
The last inequality holds for any if due to the assumed from (24d); otherwise, we set . We then obtain that
[TABLE]
Combining (30), (76), and (77), we can thus bound
[TABLE]
where in the final step, we have also used (72) and the selected and if or or both. Thus, we obtained exactly the same lower bound as in (33). We then continue along the rest of the proof of Theorem 4.2 to obtain the claim.
It is worth observing that when or , the inequalities (72) do not directly bound the respective or . Hence, we do not need to initialize the corresponding variable locally, unlike when or . On the other hand, sufficient strong convexity is required from the corresponding and .
We start with the lemma ensuring that the iterates stay in the initial neighborhood of the saddle point.
**Corollary B.3.**
The results of Lemma 5.3 continue to hold if the corresponding conditions of Theorem 4.2 are replaced with those in Corollary B.1.
**Proof B.4.**
The proof repeats that of Lemma 5.3, applying Corollary B.1 instead of Theorem 4.2 in Step 2.
We next extend the results of Section 6 to arbitrary choices of both and . This mainly consists of verifying (72a) when and (72b) when . Note that it is possible to take and , or vice versa, as long as the corresponding conditions are satisfied.
**Corollary B.5.**
The results of Theorem 6.1 continue to hold if Section 3 (iv) is replaced with Appendix B (ivenumi) for some , where in case , (46a) is replaced with
[TABLE]
**Proof B.6.**
Since conditions (79) are sufficient for (72) with to hold, we can repeat the proof of Theorem 6.1 replacing the references to Theorem 4.2 by references to Corollary B.1 up until (52). If , we now obtain a lower bound on by arguing as in (73)–(75) with replaced by . Specifically, using (16), Appendix B (ivenumi) at , and the generalized Young’s inequality (74), we obtain for any that
[TABLE]
Inserting and , we eliminate the first term on the right-hand side. Likewise, if , similar steps applied to result in
[TABLE]
for . Using and the selection of and , we then obtain the desired estimate .
**Corollary B.7.**
The results of Theorem 6.4 continue to hold if Section 3 (iv) is replaced with Appendix B (ivenumi) for some , where in case , (53a) is replaced for some with
[TABLE]
**Proof B.8.**
Conditions (80) are sufficient for (72) with to hold; therefore, we can repeat the proof of Theorem 6.4 replacing the references to Theorem 4.2 by references to Corollary B.1.
**Corollary B.9.**
The results of Theorem 6.6 continue to hold if Section 3 (iv) is replaced with Appendix B (ivenumi) for some , where in case , (56a) is replaced for some with
[TABLE]
**Proof B.10.**
Conditions (81) are sufficient for (72) with to hold; therefore, we can repeat the proof of Theorem 6.6 replacing the references to Theorem 4.2 by references to Corollary B.1.
**Corollary B.11.**
The results of Proposition 6.9 continue to hold if the corresponding conditions of Theorem 6.1, 6.4, or 6.6 are replaced with those in Corollary B.5, B.7, or B.9.
**Proof B.12.**
The proof repeats that of Proposition 6.9.
Appendix C Verification of conditions for step function presentation and Potts model
Throughout this section, we set and for . Then so that
[TABLE]
where is the tensor product between two vectors and , producing a matrix of all the combinations of products between the entries.
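For vectors a ∈ ℝ^m and b ∈ ℝ^n, the tensor product a ⊗ b is the m × n matrix with entries a_i b_j, i.e., the outer product abᵀ. In NumPy this is `np.outer`; the short Python snippet below also checks the identity (a ⊗ b)v = ⟨b, v⟩a, which is convenient when such a matrix appears in a second-order expansion.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0])
T = np.outer(a, b)  # tensor (outer) product: T[i, j] = a[i] * b[j]
print(T)
# [[ 4.  5.]
#  [ 8. 10.]
#  [12. 15.]]

v = np.array([0.5, -1.0])
print(np.allclose(T @ v, np.dot(b, v) * a))  # True: (a ⊗ b) v = <b, v> a
```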
The following lemma verifies Section 3 for .
**Lemma C.1.**
Let , and suppose for with
[TABLE]
Then the function defined above satisfies Section 3 for some and some dependent on with
[TABLE]
as well as the constants , satisfying , , and .
**Proof C.2.**
First, Section 3 (i) holds everywhere since . To verify Section 3 (ii), we observe using (82) that
[TABLE]
Hence , , and are as claimed.
To verify Section 3 (iii), we first of all observe using (83) that
[TABLE]
Therefore for some dependent on .
Finally, to verify Section 3 (iv), we start with (15a), i.e.,
[TABLE]
Expanding the equation using (82), (84), and
[TABLE]
we require that
[TABLE]
Taking any , this will hold by Cauchy’s and Young’s inequalities if and . If , clearly these hold for some . Otherwise, solving from the latter as an equality, i.e., taking , the former holds if . If , this holds for some in a neighborhood of (.
It remains to verify (15b), i.e.,
[TABLE]
Again, using (82) and (84) we expand this as
[TABLE]
Rearranging the -term, we see that this holds if
[TABLE]
Rearranging and estimating the first term as
[TABLE]
and then using Young’s inequality on both parts, we obtain the condition
[TABLE]
If and , this holds for some in .
We comment on the condition (83) on the primal–dual solution pair . First, for , this condition reduces to . This is necessarily satisfied in the case of the step function (where ) and in the case of the function (where ) as in both cases, by the dual optimality condition . Furthermore, if we take for some , then for any the dual optimality condition reads , i.e., , for which (83) is easily verified.
The following lemma shows that Section 3 remains valid if we include a linear operator in the primal component.
**Lemma C.3.**
Let for some and on Hilbert spaces . Suppose satisfies Section 3 at . Mark the corresponding constants with a tilde: , , and so on. Then satisfies Section 3 with ; , ; , ; , ; , and as well as
[TABLE]
**Proof C.4.**
Observe first of all that by the chain rule,
[TABLE]
and hence Section 3 Item (i) holds for if it holds for .
Let now Section 3 (ii) hold for with , , and . Observing that
[TABLE]
Section 3 (ii) thus also holds with the function of (86). Similarly in Section 3 (iii), we can take .
Finally, we expand Section 3 (iv) for as
[TABLE]
where , , and . Since , etc., this follows from Section 3 (iv) for with the constants as claimed.
Applying this lemma to , we can thus lift the scalar estimates for as in (82) to the corresponding estimates on as used in the Potts model example.
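The chain-rule identity behind Lemma C.3 can be checked numerically. Assuming, as the lemma suggests, a coupling of the form K(x, y) = K̃(Ax, y) with a linear operator A, one has ∇_x K(x, y) = A*∇_1 K̃(Ax, y). The Python sketch uses a hypothetical smooth K̃(z, y) = Σ_j y_j tanh(z_j) and a random matrix A, neither of which is taken from the paper, and compares the chain-rule gradient with central finite differences.

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.normal(size=(6, 4))  # hypothetical linear operator R^4 -> R^6
y = rng.normal(size=6)


def K_tilde(z, y):
    return np.sum(y * np.tanh(z))  # hypothetical smooth coupling


def grad_z_K_tilde(z, y):
    return y * (1.0 - np.tanh(z) ** 2)  # derivative of tanh is 1 - tanh^2


def K(x, y):
    return K_tilde(A @ x, y)  # composition with A in the primal variable


def grad_x_K(x, y):
    return A.T @ grad_z_K_tilde(A @ x, y)  # chain rule: A^* (grad_z K_tilde)(Ax, y)


x = rng.normal(size=4)
h = 1e-6
fd = np.array([(K(x + h * e, y) - K(x - h * e, y)) / (2 * h) for e in np.eye(4)])
print(np.max(np.abs(fd - grad_x_K(x, y))))  # small (finite-difference error)
```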
References
- [1] F. J. Aragón Artacho and M. H. Geoffroy, Characterization of metric regularity of subdifferentials, Journal of Convex Analysis 15 (2008), 365–380.
- [2] F. J. Aragón Artacho and M. H. Geoffroy, Metric subregularity of the convex subdifferential in Banach spaces, J. Nonlinear Convex Anal. 15 (2014), 35–47.
- [3] H. Attouch, J. Bolte, and B. Svaiter, Convergence of descent methods for semi-algebraic and tame problems: proximal algorithms, forward–backward splitting, and regularized Gauss–Seidel methods, Mathematical Programming 137 (2013), 91–129, doi:10.1007/s10107-011-0484-9.
- [4] H. H. Bauschke and P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, CMS Books in Mathematics/Ouvrages de Mathématiques de la SMC, Springer, 2nd edition, 2017, doi:10.1007/978-3-319-48311-5.
- [5] M. Benning, F. Knoll, C. B. Schönlieb, and T. Valkonen, Preconditioned ADMM with nonlinear operator constraint, in System Modeling and Optimization: 27th IFIP TC 7 Conference, CSMO 2015, Sophia Antipolis, France, June 29–July 3, 2015, Revised Selected Papers, L. Bociu, J. A. Désidéri, and A. Habbal (eds.), Springer International Publishing, 2016, 117–126, doi:10.1007/978-3-319-55795-3_10, arXiv:1511.00425, https://tuomov.iki.fi/m/nonlinearADMM.pdf.
- [6] A. Borzì and C. Kanzow, Formulation and numerical solution of Nash equilibrium multiobjective elliptic control problems, SIAM Journal on Control and Optimization 51 (2013), 718–744, doi:10.1137/120864921.
- [7] A. Chambolle, An algorithm for total variation minimization and applications, Journal of Mathematical Imaging and Vision 20 (2004), 89–97, doi:10.1023/b:jmiv.0000011325.36760.1e.
- [8] A. Chambolle and T. Pock, A first-order primal-dual algorithm for convex problems with applications to imaging, Journal of Mathematical Imaging and Vision 40 (2011), 120–145, doi:10.1007/s10851-010-0251-1.
