A primal-dual dynamical approach to structured convex minimization problems
Radu Ioan Boţ, Ernö Robert Csetnek, Szilárd Csaba László

TL;DR
This paper introduces a primal-dual dynamical system for structured convex minimization, proving convergence to saddle points and deriving convergence rates, leading to a new numerical algorithm combining proximal methods.
Contribution
It presents a novel dynamical system approach for structured convex problems and derives an explicit discretization that results in an effective numerical algorithm.
Findings
Trajectories asymptotically converge to saddle points.
Convergence rates for feasibility violation and objective function.
Discretization yields a new algorithm combining proximal methods.
Abstract
In this paper we propose a primal-dual dynamical approach to the minimization of a structured convex function consisting of a smooth term, a nonsmooth term, and the composition of another nonsmooth term with a linear continuous operator. In this scope we introduce a dynamical system for which we prove that its trajectories asymptotically converge to a saddle point of the Lagrangian of the underlying convex minimization problem as time tends to infinity. In addition, we provide rates for both the violation of the feasibility condition by the ergodic trajectories and the convergence of the objective function along these ergodic trajectories to its minimal value. Explicit time discretization of the dynamical system results in a numerical algorithm which is a combination of the linearized proximal method of multipliers and the proximal ADMM algorithm.
Radu Ioan Boţ, University of Vienna, Faculty of Mathematics, Oskar-Morgenstern-Platz 1, A-1090 Vienna, Austria. Research partially supported by FWF (Austrian Science Fund), project I 2419-N32.
Ernö Robert Csetnek, University of Vienna, Faculty of Mathematics, Oskar-Morgenstern-Platz 1, A-1090 Vienna, Austria. Research supported by FWF (Austrian Science Fund), project P 29809-N32.
Szilárd Csaba László, Technical University of Cluj-Napoca, Department of Mathematics, Memorandumului 28, Cluj-Napoca, Romania. This work was supported by a grant of Ministry of Research and Innovation, CNCS - UEFISCDI, project number PN-III-P1-1.1-TE-2016-0266, within PNCDI III.
Keywords. structured convex minimization, dynamical system, proximal ADMM algorithm, primal-dual algorithm
AMS Subject Classification. 37N40, 49N15, 90C25, 90C46
1 Introduction and preliminaries
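For $\mathcal{H}$ and $\mathcal{G}$ real Hilbert spaces, we consider the convex minimization problem
$$\inf_{x \in \mathcal{H}} \, f(x) + h(x) + g(Ax), \tag{1}$$
where $f : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}$ and $g : \mathcal{G} \to \mathbb{R} \cup \{+\infty\}$ are proper, convex and lower semicontinuous functions, $h : \mathcal{H} \to \mathbb{R}$ is a convex and Fréchet differentiable function with $L_h$-Lipschitz continuous gradient $\nabla h$ (where $L_h \geq 0$), i.e. $\|\nabla h(x) - \nabla h(y)\| \leq L_h \|x - y\|$ for every $x, y \in \mathcal{H}$, and $A : \mathcal{H} \to \mathcal{G}$ is a continuous linear operator.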
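Problem (1) can be rewritten as
$$\inf_{\substack{(x,z) \in \mathcal{H} \times \mathcal{G} \\ Ax - z = 0}} \, f(x) + h(x) + g(z). \tag{2}$$
Obviously, $x^* \in \mathcal{H}$ is an optimal solution of (1) if and only if $(x^*, z^*) \in \mathcal{H} \times \mathcal{G}$ is an optimal solution of (2) and $Ax^* = z^*$.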
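Based on this reformulation of problem (1) we define its Lagrangian
$$l : \mathcal{H} \times \mathcal{G} \times \mathcal{G} \to \mathbb{R} \cup \{+\infty\}, \quad l(x,z,y) = f(x) + h(x) + g(z) + \langle y, Ax - z \rangle.$$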
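An element $(x^*, z^*, y^*) \in \mathcal{H} \times \mathcal{G} \times \mathcal{G}$ is said to be a saddle point of the Lagrangian $l$, if
$$l(x^*, z^*, y) \leq l(x^*, z^*, y^*) \leq l(x, z, y^*) \quad \forall (x,z,y) \in \mathcal{H} \times \mathcal{G} \times \mathcal{G}.$$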
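It is known that $(x^*, z^*, y^*)$ is a saddle point of $l$ if and only if $x^*$ is an optimal solution of (1), $Ax^* = z^*$, and $y^*$ is an optimal solution of the Fenchel dual to problem (1), which reads
$$\sup_{y \in \mathcal{G}} \, \big( -(f^* \Box h^*)(-A^* y) - g^*(y) \big). \tag{3}$$
In this situation the optimal objective values of (1) and (3) coincide.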
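In the formulation of (3),
$$f^* : \mathcal{H} \to \mathbb{R} \cup \{+\infty\}, \quad f^*(u) = \sup_{x \in \mathcal{H}} \{ \langle u, x \rangle - f(x) \},$$
and
$$g^* : \mathcal{G} \to \mathbb{R} \cup \{+\infty\}, \quad g^*(y) = \sup_{z \in \mathcal{G}} \{ \langle y, z \rangle - g(z) \},$$
denote the conjugate functions of $f$ and $g$, respectively, and $A^* : \mathcal{G} \to \mathcal{H}$ denotes the adjoint operator of $A$. The infimal convolution of the functions $f^*$ and $h^*$ is defined by
$$(f^* \Box h^*)(x) = \inf_{u \in \mathcal{H}} \{ f^*(u) + h^*(x - u) \}.$$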
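As a quick illustration of these two notions, consider a worked example that is not taken from the paper: assume $\mathcal{H} = \mathbb{R}$ with the toy choices $f = |\cdot|$ and $h = \tfrac{1}{2}(\cdot)^2$. Under these assumptions the conjugates and their infimal convolution can be computed in closed form:

```latex
\begin{align*}
f^*(u) &= \sup_{x \in \mathbb{R}} \{ ux - |x| \}
        = \begin{cases} 0, & |u| \le 1, \\ +\infty, & |u| > 1, \end{cases} \\
h^*(u) &= \sup_{x \in \mathbb{R}} \{ ux - \tfrac{1}{2} x^2 \} = \tfrac{1}{2} u^2, \\
(f^* \Box h^*)(x) &= \inf_{|u| \le 1} \tfrac{1}{2} (x - u)^2
        = \tfrac{1}{2} \max\{ |x| - 1, 0 \}^2 .
\end{align*}
```

In this toy case the last expression coincides with $(f+h)^*(x)$, which is the reason the infimal convolution of the conjugates shows up in the dual objective.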
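It is also known that $(x^*, z^*, y^*)$ is a saddle point of the Lagrangian $l$ if and only if it is a solution of the following system of primal-dual optimality conditions
$$-A^* y^* - \nabla h(x^*) \in \partial f(x^*), \quad Ax^* = z^*, \quad y^* \in \partial g(z^*).$$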
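We recall that the convex subdifferential of the function $f$ at $x \in \mathcal{H}$ is defined by $\partial f(x) = \{ u \in \mathcal{H} : f(y) \geq f(x) + \langle u, y - x \rangle \ \forall y \in \mathcal{H} \}$, for $f(x) \in \mathbb{R}$, and by $\partial f(x) = \emptyset$, otherwise.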
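To make the optimality system concrete, here is a small worked check. It is an illustration under stated assumptions, not an example from the paper: take $\mathcal{H} = \mathcal{G} = \mathbb{R}$, $A = \mathrm{Id}$, $f = |\cdot|$, $h(x) = \tfrac{1}{2}(x-3)^2$ and $g$ the indicator function of $[-1,1]$, and write the conditions in the standard form $-A^* y^* - \nabla h(x^*) \in \partial f(x^*)$, $Ax^* = z^*$, $y^* \in \partial g(z^*)$. The candidate $(x^*, z^*, y^*) = (1, 1, 1)$ satisfies all three:

```latex
\begin{align*}
\partial f(1) &= \{1\}, &
  -A^* y^* - \nabla h(x^*) &= -1 - (1 - 3) = 1 \in \partial f(1), \\
 & & A x^* &= 1 = z^*, \\
\partial g(1) &= N_{[-1,1]}(1) = [0, +\infty), &
  y^* &= 1 \in \partial g(1).
\end{align*}
```

This agrees with a direct computation: on $[-1,1]$ the function $|x| + \tfrac{1}{2}(x-3)^2$ is decreasing, so its minimum is attained at $x^* = 1$.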
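A saddle point of the Lagrangian $l$ exists whenever the primal problem (1) has an optimal solution and the so-called Attouch-Brézis regularity condition
$$0 \in \operatorname{sqri}\big( \operatorname{dom} g - A(\operatorname{dom} f) \big)$$
holds. Here,
$$\operatorname{sqri} Q := \Big\{ x \in Q : \bigcup_{\lambda > 0} \lambda (Q - x) \text{ is a closed linear subspace of } \mathcal{G} \Big\}$$
denotes the strong quasi-relative interior of a set $Q \subseteq \mathcal{G}$. We refer the reader to [9, 11, 28] for more insights into the world of regularity conditions and convex duality theory.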
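Let $S_+(\mathcal{H})$ denote the family of continuous linear operators $M : \mathcal{H} \to \mathcal{H}$ which are self-adjoint and positive semidefinite. For $M \in S_+(\mathcal{H})$ we introduce the following seminorm on $\mathcal{H}$:
$$\|x\|_M^2 = \langle x, Mx \rangle \quad \forall x \in \mathcal{H}.$$
This introduces on $S_+(\mathcal{H})$ the following partial ordering: for $M_1, M_2 \in S_+(\mathcal{H})$
$$M_1 \succcurlyeq M_2 \iff \|x\|_{M_1}^2 \geq \|x\|_{M_2}^2 \quad \forall x \in \mathcal{H}.$$
For $\alpha > 0$ fixed, let be
$$P_\alpha(\mathcal{H}) := \{ M \in S_+(\mathcal{H}) : M \succcurlyeq \alpha \, \mathrm{Id} \},$$
where $\mathrm{Id} : \mathcal{H} \to \mathcal{H}$ denotes the identity operator on $\mathcal{H}$.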
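In finite dimensions the seminorm and the ordering above can be checked numerically. The following is a minimal sketch, assuming the space is $\mathbb{R}^2$ and the operator is a symmetric 2x2 matrix; the helper names `quad_form`, `min_eigenvalue_sym2` and `dominates_alpha_identity` are hypothetical and introduced only for this illustration. For a symmetric matrix, dominating $\alpha$ times the identity is equivalent to the smallest eigenvalue being at least $\alpha$.

```python
import math


def quad_form(M, x):
    """Return <x, M x> for a symmetric 2x2 matrix M given as nested lists."""
    Mx = [M[0][0] * x[0] + M[0][1] * x[1],
          M[1][0] * x[0] + M[1][1] * x[1]]
    return x[0] * Mx[0] + x[1] * Mx[1]


def min_eigenvalue_sym2(M):
    """Smallest eigenvalue of a symmetric 2x2 matrix (closed form)."""
    a, b, d = M[0][0], M[0][1], M[1][1]
    mean = 0.5 * (a + d)
    radius = math.sqrt((0.5 * (a - d)) ** 2 + b * b)
    return mean - radius


def dominates_alpha_identity(M, alpha):
    """Check M >= alpha * Id in the ordering induced by the seminorms."""
    return min_eigenvalue_sym2(M) >= alpha


# Illustrative data (not from the paper): eigenvalues of M are 1 and 3.
M = [[2.0, 1.0], [1.0, 2.0]]
x = [1.0, -1.0]
seminorm_sq = quad_form(M, x)  # squared seminorm of x induced by M
```

Here `x` is an eigenvector for the smallest eigenvalue, so the squared seminorm equals the smallest eigenvalue times the squared Euclidean norm of `x`.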
The subject of our investigations in this paper will be the following dynamical system, for which we will show that it asymptotically approaches the set of solutions of the primal-dual pair of optimization problems (1)-(3)
[TABLE]
where , , and and are such that
[TABLE]
One of the motivations for the study of this dynamical system comes from the fact that, as we will see in Remark 1, it provides through explicit time discretization a numerical algorithm which is a combination of the linearized proximal method of multipliers and the proximal ADMM algorithm.
In the next section we will show the existence and uniqueness of strong global solutions for the dynamical system (4) in the framework of the Cauchy-Lipschitz Theorem. In Section 3 we will prove some technical results, which will play an important role in the asymptotic analysis. In Section 4 we will investigate the asymptotic behaviour of the trajectories as the time tends to infinity. By carrying out a Lyapunov analysis and by relying on the continuous variant of the Opial Lemma, we are able to prove that the trajectories generated by (4) asymptotically converge to a saddle point of the Lagrangian $l$. Furthermore, we provide convergence rates of $\mathcal{O}(1/t)$ for the violation of the feasibility condition by ergodic trajectories and the convergence of the objective function along these ergodic trajectories to its minimal value.
The approach of optimization problems by dynamical systems has a long tradition. Crandall and Pazy considered in [20] dynamical systems governed by subdifferential operators (and, more generally, by maximally monotone operators) in Hilbert spaces, addressed questions like the existence and uniqueness of solution trajectories, and related the latter to the theory of semi-groups of nonlinear contractions. Brézis [14] studied the asymptotic behaviour of the trajectories for dynamical systems governed by convex subdifferentials, and Bruck carried out in [15] a similar analysis for maximally monotone operators. Dynamical systems defined via resolvent/proximal evaluations of the governing operators have enjoyed much attention in recent years, as their explicit time discretization yields relaxed versions of standard numerical algorithms, with high flexibility and good numerical performance. Abbas and Attouch introduced in [1] a forward-backward dynamical system, by extending to more general optimization problems an approach proposed by Antipin in [5] and Bolte in [10] on a gradient-projected dynamical system associated to the minimization of a smooth convex function over a convex closed set. Implicit dynamical systems were considered also in [13] in the context of monotone inclusion problems. A dynamical system of forward-backward-forward type was considered in [7], while a dynamical system of Douglas-Rachford type was recently introduced in [21].
It is important to notice that the approaches mentioned above have been introduced in connection with the study of “simple” monotone inclusion and convex minimization problems. They rely on straightforward splitting strategies and cannot be efficiently used when addressing structured minimization problems, like (1), which need to be addressed from a primal and a dual perspective and thus require tools and techniques from convex duality theory. The dynamical approach we introduce and investigate in this paper is, to our knowledge, the first meant to address structured convex minimization problems in the spirit of the full splitting paradigm.
Remark 1.
The first inclusion in (4) can be equivalently written as
[TABLE]
while the second one as
[TABLE]
The explicit discretization of (5) with respect to the time variable and constant step equal to yields the iterative scheme
[TABLE]
By convex subdifferential calculus, one can easily see that this can be for every equivalently written as
[TABLE]
and, further, as
[TABLE]
Similarly, (6) leads for every to
[TABLE]
which is nothing else than
[TABLE]
Here, and are two operator sequences in and , respectively.
Thus the dynamical system (4) leads through explicit time discretization to a numerical algorithm, which, for a starting point , generates a sequence for every as follows
[TABLE]
The algorithm (7) is a combination of the linearized proximal method of multipliers and the proximal ADMM algorithm.
Indeed, in the case when , (7) becomes the proximal ADMM algorithm with variable metrics from [8] (see, also, [12]). If, in addition, and the operator sequences and are constant, then (7) becomes the proximal ADMM algorithm investigated in [25, Section 3.2] (see, also, [23]). It is known that the proximal ADMM algorithm can be seen as a generalization of the full splitting primal-dual algorithms of Chambolle-Pock (see [16]) and Condat-Vu (see [19, 27]).
On the other hand, in the case when , (7) becomes an extension of the linearized proximal method of multipliers of Chen-Teboulle (see [17], [25, Algorithm 1]).
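The passage from a continuous system to an iterative scheme used in this remark can be illustrated on a generic scalar model. The sketch below is an assumption-laden toy, not the paper's system: it takes a system of the form x'(t) + x(t) = T(x(t)) for a hypothetical map `T`, and shows that explicit Euler with step size 1 reproduces the plain iteration x_{k+1} = T(x_k), while smaller steps give a relaxed version of it.

```python
def explicit_euler(T, x0, step, n_steps):
    """Explicit Euler for x'(t) = T(x(t)) - x(t).

    One step reads x_{k+1} = x_k + step * (T(x_k) - x_k),
    i.e. a relaxed fixed-point iteration with relaxation parameter `step`.
    """
    x = x0
    for _ in range(n_steps):
        x = x + step * (T(x) - x)
    return x


def fixed_point_iteration(T, x0, n_steps):
    """Plain iteration x_{k+1} = T(x_k)."""
    x = x0
    for _ in range(n_steps):
        x = T(x)
    return x


def T(x):
    # Hypothetical contraction with unique fixed point 2 (illustration only).
    return 0.5 * x + 1.0


x_euler = explicit_euler(T, 0.0, 1.0, 50)      # step size 1
x_fixed = fixed_point_iteration(T, 0.0, 50)    # same iterates up to rounding
x_relaxed = explicit_euler(T, 0.0, 0.5, 200)   # smaller step, relaxed scheme
```

With step size 1 the two routines agree up to rounding, which mirrors how the discrete algorithm arises from the continuous system; other step sizes yield relaxed variants that approach the same fixed point.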
In the following remark we provide a particular choice for the maps and , which transforms (4) into a dynamical system of primal-dual type formulated in the spirit of the full splitting paradigm.
Remark 2.
For every , define
[TABLE]
where is such that .
Let be fixed. In this particular setting, (5) is equivalent to
[TABLE]
and further to
[TABLE]
In other words,
[TABLE]
where
[TABLE]
denotes the proximal point operator of a proper, convex and lower semicontinuous function .
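For readers who want to see a proximal point operator in action, here is a minimal sketch. It assumes the standard definition, namely that the proximal point of parameter lam > 0 of a function k at v is the unique minimizer of k(u) + (u - v)^2 / (2 lam); the symbols and the helper names `prox_abs`, `project_interval` and `prox_numeric` are hypothetical and chosen only for this illustration. Two classical closed forms are shown (absolute value and indicator of an interval) together with a brute-force grid check.

```python
def prox_abs(v, lam):
    """Proximal point of lam * |.| at v (soft-thresholding)."""
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0


def project_interval(v, lo, hi):
    """Proximal point of the indicator of [lo, hi]: the projection onto it."""
    return min(max(v, lo), hi)


def prox_numeric(func, v, lam, lo=-10.0, hi=10.0, n=200001):
    """Brute-force grid minimizer of func(u) + (u - v)**2 / (2 * lam).

    Only meant as a sanity check of the closed forms above.
    """
    best_u, best_val = lo, float("inf")
    for i in range(n):
        u = lo + (hi - lo) * i / (n - 1)
        val = func(u) + (u - v) ** 2 / (2.0 * lam)
        if val < best_val:
            best_u, best_val = u, val
    return best_u
```

The grid search is deliberately naive; it serves only to confirm that the closed-form expressions minimize the stated objective.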
On the other hand, relation (6) is equivalent to
[TABLE]
hence,
[TABLE]
This is further equivalent to
[TABLE]
and further to
[TABLE]
In other words,
[TABLE]
Consequently, in this particular setting, the dynamical system (4) can be equivalently written as
[TABLE]
Let us also mention that when and the dynamical system (8) reads
[TABLE]
The explicit time discretization of (9) leads to a numerical algorithm, which, for a starting point , generates the sequence for every as follows
[TABLE]
By substituting in the first equation of (10) the term by , which is allowed according to the last equation, one can easily see that (10) is equivalent to the following numerical algorithm, which, for a starting point , generates the sequence for every as follows
[TABLE]
For for every , (11) is nothing else than the primal-dual algorithm proposed by Chambolle and Pock in [16].
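For orientation, the primal-dual iteration of Chambolle and Pock in its commonly stated form (with extrapolation parameter 1) can be sketched as follows. This is an illustration on a hypothetical scalar problem, min over x of 0.5 (x - b)^2 + lam |x| with A equal to the identity, and not a transcription of scheme (11); the function name `chambolle_pock` and all parameter values are assumptions made for this example.

```python
def chambolle_pock(b=3.0, lam=1.0, tau=0.5, sigma=0.5, n_iter=200):
    """Chambolle-Pock iteration for min_x 0.5*(x - b)**2 + lam*|A x|, A = 1.

    Here f(x) = 0.5*(x - b)**2 and g = lam*|.|, so that
      prox of sigma * g^* is the projection onto [-lam, lam],
      prox of tau * f at v is (v + tau * b) / (1 + tau).
    Step sizes satisfy tau * sigma * ||A||^2 < 1.
    """
    x = 0.0      # primal iterate
    x_bar = 0.0  # extrapolated primal iterate
    y = 0.0      # dual iterate
    for _ in range(n_iter):
        # Dual step: y_{k+1} = prox_{sigma g^*}(y_k + sigma * A x_bar_k)
        y = min(max(y + sigma * x_bar, -lam), lam)
        # Primal step: x_{k+1} = prox_{tau f}(x_k - tau * A^* y_{k+1})
        x_new = (x - tau * y + tau * b) / (1.0 + tau)
        # Extrapolation: x_bar_{k+1} = 2 x_{k+1} - x_k
        x_bar = 2.0 * x_new - x
        x = x_new
    return x, y


x_cp, y_cp = chambolle_pock()
```

For this toy problem the minimizer is the soft-thresholding of `b` at level `lam`, so the primal iterate should approach 2 and the dual iterate 1 for the default values.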
Remark 3.
The maps and can be seen as inducing a variable renorming of the underlying Hilbert space and, as seen in the remark above, allow the use of variable step sizes. In addition, they might provide favourable settings for the derivation of convergence rates for function values along the trajectories, as it is the case for the discrete time counterpart of (4) (see [12, Section 3]).
Example 1.
We will illustrate the way in which the parameters and may influence the asymptotic convergence of the primal and dual trajectories via numerical experiments. In this scope, we considered the following primal optimization problem
[TABLE]
which is in fact problem (1) written in the following particular setting: , , , , , for every , and , . Then is the unique optimal solution of (12) and that
[TABLE]
is the Fenchel dual problem of (12). Thus is the unique dual optimal solution.
We considered the dynamical system (8) attached to the primal-dual pair (12)-(13) with starting points , and in the case when for every is a constant function. In order to solve the resulting dynamical system we used the Matlab function ode15s and, to this end, we reformulated it as
[TABLE]
where
[TABLE]
and
[TABLE]
is defined as
[TABLE]
Notice that
[TABLE]
where denotes the projection operator on a convex and closed set .
According to Theorem 12, the asymptotic convergence of the trajectories can be guaranteed when . Since , we considered for three different choices, namely, and . The primal and the dual trajectories generated by the dynamical system for each of these three choices are represented in the figures 1, 2 and 3, respectively. The first row of each figure represents the primal trajectories for and , while the second row represents the dual trajectories for the same choices of the parameter .
One can see that the parameter has a strong influence on the asymptotic behaviour of the primal and dual trajectories. Namely, in all three figures, thus independently of the choice of the parameters and , the primal and the dual trajectories converge faster to the corresponding primal and dual solutions, respectively, for larger values of , namely, when is closer to . As is the coefficient of the derivative in the second inclusion of the dynamical system, it can be seen as a constant which characterizes its level of implicitness. In particular, more implicitness promotes a better asymptotic convergence. On the other hand, we notice that the smaller the values of are, the smaller is the influence of on the asymptotic convergence of the trajectories.
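A numerical experiment of this kind can be mimicked with a few lines of code. The sketch below is not the paper's experiment: it swaps the Matlab solver ode15s for a fixed-step explicit Euler scheme in plain Python, and it uses a hypothetical scalar test problem (f = |.|, h(x) = 0.5 (x - 3)^2, g the indicator of [-1, 1], A the identity), whose saddle point is (1, 1, 1). The right-hand side is a primal-dual system of proximal type written under the assumption that the x-update is a proximal step on f with step size `tau` and the z-update a proximal step on g that sees the term gamma * x'(t); all names and parameter values are illustrative.

```python
def soft_threshold(v, lam):
    """Proximal point of lam * |.| at v."""
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0


def project(v, lo, hi):
    """Projection onto [lo, hi] (proximal point of its indicator)."""
    return min(max(v, lo), hi)


def simulate(c=1.0, tau=0.5, gamma=1.0, dt=0.1, t_end=60.0,
             start=(0.0, 0.0, 0.0)):
    """Explicit Euler integration of an assumed primal-dual system.

    Assumed dynamics (scalar case, A = 1):
      x' + x = prox_{tau f}(x - tau * h'(x) - c * tau * (x - z + y / c))
      z' + z = prox_{g / c}(gamma * x' + x + y / c)
      y'     = c * (x + x') - c * (z + z')
    """
    x, z, y = start
    n_steps = int(round(t_end / dt))
    for _ in range(n_steps):
        x_plus = soft_threshold(
            x - tau * (x - 3.0) - c * tau * (x - z + y / c), tau)
        dx = x_plus - x
        z_plus = project(gamma * dx + x + y / c, -1.0, 1.0)
        dz = z_plus - z
        dy = c * (x + dx) - c * (z + dz)
        # Euler update of the three trajectories
        x, z, y = x + dt * dx, z + dt * dz, y + dt * dy
    return x, z, y


x_end, z_end, y_end = simulate()
```

Calling `simulate` with different values of `gamma` in [0, 1] gives a cheap way to observe how this parameter affects the speed at which the three trajectories approach the saddle point in the toy setting.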
Notations.
The following two functions will play an important role in particular in the forthcoming analysis
[TABLE]
and
[TABLE]
With these two notations, the dynamical system (4) can be rewritten as
[TABLE]
Let be fixed. The function is proper, convex and lower semicontinuous, hence is proper, strongly convex and lower semicontinuous for every . This allows us to use the equality sign in the second relation of (14). On the other hand, a sufficient condition which guarantees that the function , which is proper and lower semicontinuous, is strongly convex is that there exists such that . This actually ensures that is proper, strongly convex and lower semicontinuous for every .
This means that if the assumption
[TABLE]
holds, then we can also use the equality sign in the first relation of (14). It is easy to see that, if holds, then is -strongly monotone for every . In other words, for every , all and all we have
[TABLE]
Notice that, since and for every , is fulfilled, if
[TABLE]
or, if
[TABLE]
Notice also that, if is a finite dimensional Hilbert space, then (16), which is independent of , is nothing else than is positive definite or, equivalently, is injective.
Let be the unit sphere of . Assumption is fulfilled if and only if for every . In this case we can take for every .
Obviously, if holds, then holds with for every .
2 Existence and uniqueness of the trajectories
In this section we will investigate the existence and uniqueness of the trajectories generated by (4). We start by recalling the definition of a locally absolutely continuous map.
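Definition 1.
A function $x : [0,+\infty) \to \mathcal{H}$ is said to be locally absolutely continuous, if it is absolutely continuous on every interval $[0,T]$, $T > 0$; that is, for every $T > 0$ there exists an integrable function $y : [0,T] \to \mathcal{H}$ such that
$$x(t) = x(0) + \int_0^t y(s) \, ds \quad \forall t \in [0,T].$$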
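Remark 4.
(a) Every absolutely continuous function is differentiable almost everywhere, its derivative coincides with its distributional derivative almost everywhere and one can recover the function from its derivative by the above integration formula.
(b) Let $T > 0$ and $x : [0,T] \to \mathcal{H}$ be an absolutely continuous function. This is equivalent to (see [6, 2]): for every $\varepsilon > 0$ there exists $\eta > 0$ such that for any finite family of intervals $I_k = (a_k, b_k) \subseteq [0,T]$ the following property holds:
$$\text{for any subfamily of disjoint intervals } I_k \text{ with } \sum_k |b_k - a_k| < \eta \text{ it holds } \sum_k \|x(b_k) - x(a_k)\| < \varepsilon.$$
From this characterization it is easy to see that, if $B : \mathcal{H} \to \mathcal{H}$ is $L$-Lipschitz continuous with $L \geq 0$, then the function $z = B \circ x$ is absolutely continuous, too. This means that $z$ is differentiable almost everywhere and $\|\dot z(t)\| \leq L \|\dot x(t)\|$ holds almost everywhere.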
The following definition specifies which type of solutions we consider in the analysis of the dynamical system (4).
Definition 2.
Let , , , and and . We say that the function is a strong global solution of (4), if the following properties are satisfied:
(i) the functions are locally absolutely continuous;
(ii) for almost every
[TABLE]
(iii)
The following results will be useful in the proof of the existence and uniqueness theorem.
Lemma 1.
Assume that holds. Then, for every fixed , the operator
[TABLE]
is Lipschitz continuous.
Proof.
Let be fixed and . By subdifferential calculus we obtain that
[TABLE]
and
[TABLE]
Using that, due to , is -strongly monotone, we get
[TABLE]
By the Cauchy-Schwarz inequality we obtain
[TABLE]
which shows that is Lipschitz continuous with constant . ∎
Now we are going to prove another technical result which will be used in the proof of the main theorem of this section.
Lemma 2.
Assume that holds. Let be and the maps
[TABLE]
and
[TABLE]
Then the following statements are true for every :
(i)
(ii)
Proof.
Let be fixed.
(i) From the definition of one has
[TABLE]
and
[TABLE]
which is equivalent to
[TABLE]
Using again that is -strongly monotone for every , we obtain
[TABLE]
The conclusion follows via the Cauchy-Schwarz inequality.
(ii) From the definition of one has
[TABLE]
and
[TABLE]
which is equivalent to
[TABLE]
Using that is strongly monotone, we obtain
[TABLE]
From the Cauchy-Schwarz inequality and (i) it follows
[TABLE]
∎
Now we can prove existence and uniqueness of a strong global solution of (4) under . To this end we will first reformulate (14) as a particular first order dynamical system in a suitably chosen product space (see also [4]). Subsequently we will make use of the Cauchy-Lipschitz-Picard Theorem for absolutely continuous trajectories (see, for example, [24, Proposition 6.2.1], [26, Theorem 54]). Notice that under the operator in Lemma 1 is Lipschitz continuous with constant for every .
Theorem 3.
Assume that holds, and and , namely,
[TABLE]
are integrable on for every . Then, for every starting points , the dynamical system (4) has a unique strong global solution
Proof.
Denoting , the dynamical system (4) can be rewritten as
[TABLE]
where
[TABLE]
is defined as
[TABLE]
The existence and uniqueness of a strong global solution follows according to the Cauchy-Lipschitz-Picard Theorem, if we show: (1) that is -Lipschitz continuous for every and the Lipschitz constant as a function of time has the property that ; (2) that for every .
(1) Let be fixed and consider . We have
[TABLE]
where (see Lemma 1)
[TABLE]
Hence,
[TABLE]
Using Lemma 1 and taking into account that is fulfilled, which means that the Lipschitz constant of the operator is , it follows
[TABLE]
By taking into account the nonexpansiveness of the proximal operator and that , it also follows
[TABLE]
Finally,
[TABLE]
Consequently,
[TABLE]
where
[TABLE]
and
[TABLE]
which means that is -Lipschitz continuous. Since and , it is obvious that .
(2) Now we will show that for every . Let be fixed and . We have
[TABLE]
By Lemma 2 and taking into account that for every and , we have for every that
[TABLE]
[TABLE]
and
[TABLE]
Since and , it follows that the integral
[TABLE]
exists and it is finite, in other words, .
Consequently, the dynamical system (17) has a unique locally absolutely continuous solution, which means that the dynamical system (14) has a unique strong global solution. ∎
3 Some technical results
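In this section we will prove some technical results which will be useful in the asymptotic analysis of the dynamical system (4). We endow the real linear space $L(\mathcal{H}) = \{ A : \mathcal{H} \to \mathcal{H} : A \text{ is linear and continuous} \}$ with the norm
$$\|A\| = \sup_{\|x\| \leq 1} \|Ax\|.$$
If $A \in L(\mathcal{H})$ is self-adjoint, then it holds (see [29, Lemma 3.2.4 iv)])
$$\|A\| = \sup_{\|x\| \leq 1} |\langle Ax, x \rangle|.$$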
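Definition 3.
We say that the map $M : [0,+\infty) \to L(\mathcal{H})$ is derivable at $t_0 \in [0,+\infty)$, if the limit
$$\lim_{h \to 0} \frac{M(t_0 + h) - M(t_0)}{h}$$
taken with respect to the norm topology of $L(\mathcal{H})$ exists. In this case we denote by $\dot M(t_0) \in L(\mathcal{H})$ the value of this limit.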
If exists, for , then one can easily see that
[TABLE]
According to Remark 4, if is locally absolutely continuous then exists for almost every
Assume now that is self-adjoint for every and that it is derivable at . For all we have
[TABLE]
which shows that is also self-adjoint.
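Lemma 4.
Let $M : [0,+\infty) \to L(\mathcal{H})$ be derivable at $t_0 \in [0,+\infty)$, and let the maps $x, y : [0,+\infty) \to \mathcal{H}$ be also derivable at $t_0$. Then the real function $t \mapsto \langle M(t) x(t), y(t) \rangle$ is derivable at $t_0$ and one has
$$\frac{d}{dt} \langle M(t) x(t), y(t) \rangle \Big|_{t = t_0} = \langle \dot M(t_0) x(t_0), y(t_0) \rangle + \langle M(t_0) \dot x(t_0), y(t_0) \rangle + \langle M(t_0) x(t_0), \dot y(t_0) \rangle.$$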
Proof.
We have
[TABLE]
The derivation formula of the scalar product leads to the desired conclusion
[TABLE]
∎
The main result of this section follows.
Lemma 5.
Assume that holds and that the maps and are locally absolutely continuous. For a given starting point , let be the unique strong global solution of the dynamical system (4). Then
[TABLE]
is locally absolutely continuous, hence exists for almost every .
In addition, if and , then there exists such that
[TABLE]
for almost every
Proof.
Let be fixed. We will use the same notations as in the proof of Theorem 3. Let be fixed. We have
[TABLE]
Since
[TABLE]
according to Lemma 1, we get
[TABLE]
Since is bounded on there exists such that
[TABLE]
Similarly, since
[TABLE]
by the nonexpansiveness of the proximal operator we get
[TABLE]
Since is bounded on by taking into consideration (18), one can easily see that there exists such that
[TABLE]
Further, by using (18) and (19), we get
[TABLE]
Hence, there exists such that
[TABLE]
Using now Lemma 2 (i), we get
[TABLE]
Since and are Lipschitz continuous and and are absolutely continuous, the map
[TABLE]
is bounded on Consequently, there exists such that
[TABLE]
Similarly, using this time Lemma 2 (ii), we get
[TABLE]
Since the proximal operator is nonexpansive and and are absolutely continuous, the map
[TABLE]
is bounded on Consequently, there exists such that
[TABLE]
Further, by using (22) and (24), we get
[TABLE]
Consequently, there exists such that
[TABLE]
Summing the relations (18)-(25) we obtain that there exists such that
[TABLE]
Let be . Since the maps and are absolutely continuous on , there exists such that for any finite family of intervals such that for any subfamily of disjoint intervals with it holds
[TABLE]
[TABLE]
Consequently,
[TABLE]
hence is absolutely continuous on . This proves that the second order derivatives exist almost everywhere on
We come now to the proof of the second statement and assume to this end that and . Under these assumptions, and appearing in (18), (19) and (20), respectively, can be taken as being global constants, that is, (18)-(20) hold for every .
Since and for every , from (21) and (23) we get
[TABLE]
and, respectively,
[TABLE]
for every Consequently,
[TABLE]
for every This shows that there exists such that
[TABLE]
for every
Now we fix at which the second derivative of the trajectories exist and take in the above inequality for some . This yields
[TABLE]
After dividing in the above inequality by and letting , we obtain
[TABLE]
This inequality holds for almost every ∎
4 Asymptotic analysis
In this section we will address the asymptotic behaviour of the trajectories generated by the dynamical system (4). At the beginning we will recall two results which will play a central role in the asymptotic analysis (see [2, Lemma 5.1] and [2, Lemma 5.2], respectively).
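Lemma 6.
Suppose that $F : [0,+\infty) \to \mathbb{R}$ is locally absolutely continuous and bounded from below and that there exists $G \in L^1([0,+\infty))$ such that for almost every $t \in [0,+\infty)$
$$\frac{d}{dt} F(t) \leq G(t).$$
Then there exists $\lim_{t \to +\infty} F(t) \in \mathbb{R}$.
Lemma 7.
If $1 \leq p < \infty$, $1 \leq r \leq \infty$, $F : [0,+\infty) \to [0,+\infty)$ is locally absolutely continuous, $F \in L^p([0,+\infty))$, $G : [0,+\infty) \to \mathbb{R}$, $G \in L^r([0,+\infty))$, and for almost every $t \in [0,+\infty)$
$$\frac{d}{dt} F(t) \leq G(t),$$
then $\lim_{t \to +\infty} F(t) = 0$.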
The first result which we prove in this section is a continuous version of the Opial Lemma formulated in the setting of variable metrics (see [18, Theorem 3.3] for its discrete counterpart).
Lemma 8.
Let be a nonempty set and a continuous map. Let be such that for every with and there exists with for every . If the following two conditions are fulfilled
(i) the limit exists for every ;
(ii) every weak sequential cluster point of belongs to ;
then there exists such that , converges weakly to as .
Proof.
Since and , by (i) we have that is bounded, hence it possesses at least one weak sequential cluster point, which belongs to . We show that has exactly one weak sequential cluster point.
Indeed, let two weak sequential cluster points of . For our claim it is enough to show that . Obviously and there exist the sequences with and such that converges weakly to and converges weakly to as .
Further, since for every for every with and for every it follows that for every the function
[TABLE]
is decreasing and is bounded from below, hence there exists
[TABLE]
Since , we have that the limits and exist. Further, since
[TABLE]
holds for every , the limit
[TABLE]
exists.
Next we show that the limits
[TABLE]
exist for every . To this end we fix . We will actually show that
[TABLE]
and the conclusion will follow by the Cauchy criterion.
For we have by the generalized Cauchy-Schwarz inequality that
[TABLE]
Hence, for with we have , therefore
[TABLE]
Since , we have that
[TABLE]
for every . This shows that is bounded. This, together with the fact that , which follows from (26), implies
[TABLE]
This proves (28). For every let us denote by Since for every , it holds
[TABLE]
Since converges weakly to and converges weakly to as and
[TABLE]
passing to the limit in (27) we get
[TABLE]
and
[TABLE]
In conclusion,
[TABLE]
which shows that ∎
Remark 5.
If a map satisfies for every with we say that is monotonically decreasing. If is monotonically decreasing and locally absolutely continuous, then exists and for almost every
The following result is an adaptation of a result from [3] to our setting.
Proposition 9.
(see [3, Proposition 2.4]) In the setting of the optimization problem (1), let be a sequence in the graph of and a sequence in the graph of Suppose that converges weakly to converges weakly to , and as . Then
[TABLE]
and
[TABLE]
The theorem which states the asymptotic convergence of the trajectories generated by the dynamical system (4) to a saddle point of the Lagrangian of the problem (1) follows.
Theorem 10.
In the setting of the optimization problem (1), assume that the set of saddle points of the Lagrangian is nonempty, the maps
[TABLE]
are locally absolutely continuous and monotonically decreasing,
[TABLE]
and
[TABLE]
For an arbitrary starting point , let be the unique strong global solution of the dynamical system (4). If one of the following conditions holds:
(I) there exists such that for every ;
(II) and there exists such that ;
then the trajectory converges weakly to a saddle point of as
Proof.
The proof of the theorem relies on Lemma 8. An important step in the proof will be the derivation of two inequalities of Lyapunov type, namely, (4), in the case when , and (4), in the case when . Let be a saddle point of the Lagrangian . Then
[TABLE]
According to (5) we have for almost every
[TABLE]
which yields, by taking into account the monotonicity of ,
[TABLE]
Similarly, according to (6) we have for almost every
[TABLE]
which yields, by taking into account the monotonicity of ,
[TABLE]
By using the last equation of (4) we obtain for almost every
[TABLE]
Assume that . By using the Baillon-Haddad Theorem we have for almost every
[TABLE]
By summing (30) and (32) and by taking into account (4) and (4) we obtain for almost every
[TABLE]
We have for almost every
[TABLE]
Since
[TABLE]
and
[TABLE]
we obtain from above that for almost every it holds
[TABLE]
By using Lemma 4 we observe that for almost every it holds
[TABLE]
and
[TABLE]
By plugging the last two identities and (4) into (4), we obtain for almost every
[TABLE]
According to Remark 5,
[TABLE]
for almost every . This means that for almost every we have
[TABLE]
From Lemma 6 we have
[TABLE]
Let be . By integrating (4) on the interval we obtain
[TABLE]
Letting converge to we find
[TABLE]
and, consequently,
[TABLE]
In the case when , which corresponds to the situation when is an affine-continuous function, instead of (4) we obtain that for almost every
[TABLE]
By arguing as above, we obtain also in this case (38), (39)-(41) and (42).
Further, we have that . Indeed, in case (I), when we assume that there exists such that for every , then this yields automatically. In case (II), from and , we have
[TABLE]
But, since , it yields for almost every , which means that also in this case
[TABLE]
According to Lemma 5, this yields
[TABLE]
Consequently, for almost every it holds
[TABLE]
and the right-hand side is a function in . Hence, according to Lemma 7,
[TABLE]
Similarly, we obtain that
[TABLE]
We will close the proof of the theorem by showing that the asymptotic convergence of the trajectory follows from Lemma 8. One can easily notice that (38) is nothing else but condition (i) of this lemma when applied in the product space for the trajectory
[TABLE]
the monotonically decreasing map
[TABLE]
and the set taken as the set of saddle points of the Lagrangian
Next we will show that condition (ii) in Lemma 8 is also fulfilled, namely, that every weak sequential cluster point of the trajectory is a saddle point of the Lagrangian .
Let be such a weak sequential cluster point. This means that there exists a sequence with such that converges to as in the weak topology of .
From (29) and (31) we get for every
[TABLE]
and
[TABLE]
respectively. For every , let
[TABLE]
and
[TABLE]
Hence, Similarly, for every , let
[TABLE]
and
[TABLE]
Hence,
Since , and it follows that converges weakly to as . Furthermore, since is bounded, and
[TABLE]
it follows that converges weakly to as .
From (14) we have
[TABLE]
which implies that . On the other hand, since is Lipschitz continuous, we have
[TABLE]
hence
[TABLE]
Thus, according to Proposition 9, we have
[TABLE]
Consequently, is a saddle point of
The conclusion of the theorem follows from Lemma 8. ∎
Next we will address two particular cases of the dynamical system (4). We consider first the case when for every , thus, the system (4) becomes
[TABLE]
where and . The dynamical system (44) can be seen as the continuous counterpart of the classical ADMM algorithm. The corresponding convergence result follows as a particular case of Theorem 10.
Theorem 11.
In the setting of the optimization problem (1), assume that the set of saddle points of the Lagrangian is nonempty, and that there exists such that . For an arbitrary starting point , let be the unique strong global solution of the dynamical system (44). Then the trajectory converges weakly to a saddle point of as
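As a discrete point of comparison, the classical ADMM iteration can be sketched on a toy instance. The example below is an illustration under stated assumptions and not part of the paper: it takes the smooth term h to be zero, a scalar problem min f(x) + g(z) subject to x - z = 0 with f(x) = 0.5 (x - b)^2 and g = |.|, and a penalty parameter `c`; the function names are hypothetical.

```python
def soft_threshold(v, lam):
    """Proximal point of lam * |.| at v."""
    if v > lam:
        return v - lam
    if v < -lam:
        return v + lam
    return 0.0


def admm(b=3.0, c=1.0, n_iter=100):
    """Classical ADMM for min 0.5*(x - b)**2 + |z| subject to x - z = 0.

    x-step: minimize 0.5*(x - b)**2 + (c/2)*(x - z + y/c)**2 in x,
    z-step: minimize |z| + (c/2)*(x - z + y/c)**2 in z,
    y-step: multiplier update y <- y + c*(x - z).
    """
    x = z = y = 0.0
    for _ in range(n_iter):
        x = (b + c * z - y) / (1.0 + c)           # closed-form x-minimizer
        z = soft_threshold(x + y / c, 1.0 / c)    # proximal step on g
        y = y + c * (x - z)                       # dual ascent step
    return x, z, y


x_admm, z_admm, y_admm = admm()
```

For the default values the iterates approach the primal solution x = z = 2 and the multiplier y = 1, which is the saddle point of the corresponding Lagrangian for this toy instance.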
Next we consider the setting from Remark 2 with and , where is such that , for every . The resulting dynamical system is the primal-dual system (8). The corresponding convergence result follows again as a particular case of Theorem 10.
Theorem 12.
In the setting of the optimization problem (1), assume that the set of saddle points of the Lagrangian is nonempty, the map is locally absolutely continuous and monotonically increasing with
[TABLE]
and . For an arbitrary starting point , let be the unique strong global solution of the dynamical system (8). If one of the following assumptions holds:
- (I) there exists such that for every ;
- (II) and there exists such that ;
then the trajectory converges weakly to a saddle point of as
**Remark 6.**
Let be . Notice that the condition is fulfilled if and only if
[TABLE]
On the other hand, the condition holds, for , if and only if
[TABLE]
For the last result of this paper we go back to the general dynamical system (4) and provide convergence rates for the violation of the feasibility condition by ergodic trajectories and the convergence of the objective function along these ergodic trajectories to its minimal value. The result can be seen as the continuous counterpart of a convergence rate result proved for the ADMM algorithm in [22, Theorem 4.3].
**Theorem 13.**
In the setting of the optimization problem (1), assume that the set of saddle points of the Lagrangian is nonempty, the maps
[TABLE]
are locally absolutely continuous and monotonically decreasing,
[TABLE]
[TABLE]
and that one of the following conditions holds:
- (I) there exists such that for every ;
- (II) and there exists such that ;
For an arbitrary starting point , let be the unique strong global solution of the dynamical system (4). Consider further for every the ergodic trajectories
[TABLE]
and
[TABLE]
Then there exists such that for every
[TABLE]
In addition, for every and every such that , one has
[TABLE]
where
[TABLE]
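To make the notion of an ergodic trajectory concrete, the following sketch computes a running time average of a sampled curve. It assumes that an ergodic trajectory is an average of the form (1/t)∫₀ᵗ u(s) ds for some curve u; the exact integrand used in the theorem is not reproduced here, and the function name `ergodic_average` and the trapezoidal quadrature are choices made only for this illustration.

```python
import numpy as np

def ergodic_average(ts, us):
    """Running time average (1/t) * integral_0^t u(s) ds on a sample grid.

    ts: increasing times with ts[0] == 0, shape (N,)
    us: samples u(ts[k]), shape (N, d)
    Uses the trapezoidal rule; the value at t = 0 is set to u(0),
    which is the limit of the average as t tends to 0.
    """
    ts = np.asarray(ts, dtype=float)
    us = np.asarray(us, dtype=float)
    dt = np.diff(ts)
    # Trapezoidal contribution of each subinterval, then cumulative sum.
    pieces = 0.5 * (us[1:] + us[:-1]) * dt[:, None]
    integral = np.cumsum(pieces, axis=0)
    avg = np.empty_like(us)
    avg[0] = us[0]
    avg[1:] = integral / ts[1:, None]
    return avg

# Usage: average a decaying spiral; the averaged curve is smoother
# than the original and stays in the convex hull of its values.
ts = np.linspace(0.0, 20.0, 2001)
us = np.exp(-0.1 * ts)[:, None] * np.stack([np.cos(ts), np.sin(ts)], axis=1)
avg = ergodic_average(ts, us)
print(avg[-1])
```

Averaging is what allows a Jensen-type argument to transfer integrated estimates along the trajectory to estimates at the averaged point.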
Proof.
Let be fixed. By using (5), that is
[TABLE]
it yields
[TABLE]
for almost every . Similarly, by using (31), that is
[TABLE]
it yields
[TABLE]
for almost every . Further, by using the convexity of and the Descent Lemma we obtain for almost every
[TABLE]
Adding (45) and (4) we obtain for almost every
[TABLE]
We recall the following four identities from the proof of Theorem 10 (here we actually replace with and by [math])
[TABLE]
which corresponds to (4),
[TABLE]
which corresponds to (4), and
[TABLE]
and
[TABLE]
which all hold for for almost every . By adding the four identities, (48) and (46), we obtain for almost every
[TABLE]
By neglecting the negative terms (here we use also that ), we obtain for almost every
[TABLE]
where
[TABLE]
For and it holds
[TABLE]
From Theorem 10 it follows that the trajectory , converges weakly to a saddle point of as . This means that it is bounded, thus there exists such that
[TABLE]
Let be such that . By Jensen’s inequality in the integral form we have for every
[TABLE]
and
[TABLE]
which, combined with (4), yields
[TABLE]
Hence,
[TABLE]
∎
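For reference, the two standard inequalities invoked in the proof above can be stated in generic form. The notation here is an assumption made for illustration and is not taken from the paper: $\varphi$ denotes a proper, convex and lower semicontinuous function on a real Hilbert space, $u$ an integrable curve, and $\psi$ a differentiable function whose gradient is Lipschitz continuous with constant $L$.

```latex
% Jensen's inequality in integral form, for every t > 0:
\varphi\!\left( \frac{1}{t} \int_0^t u(s)\, ds \right)
  \;\le\; \frac{1}{t} \int_0^t \varphi\big(u(s)\big)\, ds .

% Descent Lemma, for all points v and w:
\psi(w) \;\le\; \psi(v) + \langle \nabla \psi(v),\, w - v \rangle
  + \frac{L}{2}\, \| w - v \|^2 .
```

Jensen's inequality is what carries a bound on a time-averaged integrand over to the averaged trajectory, while the Descent Lemma bounds the smooth term along the trajectory.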
References
- [1] B. Abbas, H. Attouch, Dynamical systems and forward-backward algorithms associated with the sum of a convex subdifferential and a monotone cocoercive operator, Optimization 64(10), 2223–2252, 2015
- [2] B. Abbas, H. Attouch, B.F. Svaiter, Newton-like dynamics and forward-backward methods for structured monotone inclusions in Hilbert spaces, Journal of Optimization Theory and its Applications 161(2), 331–360, 2014
- [3] A. Alotaibi, P.L. Combettes, N. Shahzad, Solving coupled composite monotone inclusions by successive Fejér approximations of their Kuhn-Tucker set, SIAM Journal on Optimization 24(4), 2076–2095, 2014
- [4] F. Alvarez, H. Attouch, J. Bolte, P. Redont, A second-order gradient-like dissipative dynamical system with Hessian-driven damping. Application to optimization and mechanics, Journal de Mathématiques Pures et Appliquées (9) 81(8), 747–779, 2002
- [5] A.S. Antipin, Minimization of convex functions on convex sets by means of differential equations, (Russian) Differentsial’nye Uravneniya 30(9), 1475–1486, 1994; translation in Differential Equations 30(9), 1365–1375, 1994
- [6] H. Attouch, B.F. Svaiter, A continuous dynamical Newton-like approach to solving monotone inclusions, SIAM Journal on Control and Optimization 49(2), 574–598, 2011
- [7] S. Banert, R.I. Boţ, A forward-backward-forward differential equation and its asymptotic properties, Journal of Convex Analysis 25(2), 371–388, 2018
- [8] S. Banert, R.I. Boţ, E.R. Csetnek, Fixing and extending some recent results on the ADMM algorithm, Numerical Algorithms, DOI: 10.1007/s11075-020-00934-5, 2020
