Shadow Douglas–Rachford Splitting for Monotone Inclusions
Ernö Robert Csetnek, Yura Malitsky, Matthew K. Tam

TL;DR
This paper introduces a novel algorithm derived from a non-standard discretization of a continuous dynamical system for solving monotone inclusions involving a Lipschitz continuous operator, requiring one forward and one backward evaluation per iteration.
Contribution
It presents a new explicit discretization-based algorithm for monotone inclusions, expanding the Douglas–Rachford splitting framework with a different discretization approach.
Findings
Convergence of the proposed algorithm is established.
The method efficiently handles problems with a Lipschitz continuous operator.
It offers an alternative to traditional implicit discretization methods.
Abstract
In this work, we propose a new algorithm for finding a zero in the sum of two monotone operators where one is assumed to be single-valued and Lipschitz continuous. This algorithm naturally arises from a non-standard discretization of a continuous dynamical system associated with the Douglas–Rachford splitting algorithm. More precisely, it is obtained by performing an explicit, rather than implicit, discretization with respect to one of the operators involved. Each iteration of the proposed algorithm requires the evaluation of one forward and one backward operator.
Shadow Douglas–Rachford Splitting for Monotone Inclusions
Ernö Robert Csetnek¹, Yura Malitsky², Matthew K. Tam²
¹ Faculty of Mathematics, University of Vienna, [email protected]
² Institute for Numerical and Applied Mathematics, University of Göttingen, [email protected], [email protected]
Keywords. monotone operator, operator splitting, Douglas–Rachford algorithm, dynamical systems
MSC2010. 49M29, 90C25, 47H05, 47J20, 65K15
1 Introduction
The study of continuous time dynamical systems associated with iterative algorithms for solving optimization problems has a long history which can be traced back at least to the 1950s [14, 4]. The relationship between the continuous and discrete versions of an algorithm provides a unifying perspective which gives insights into their behavior and properties. As we will see in this work, this includes suggesting new algorithmic schemes as well as appropriate Lyapunov functions for analyzing their convergence properties. The interplay between continuous and discrete dynamical systems has been studied by many authors including [22, 2, 3, 9, 21, 1, 10, 5, 6].
The following well-known idea will help to motivate the approach used in this work. Let $\mathcal{H}$ be a real Hilbert space and suppose $A\colon\mathcal{H}\to\mathcal{H}$ is a maximal monotone operator. Consider the monotone equation
\[ \text{find } x\in\mathcal{H} \text{ such that } 0 = A(x), \tag{1} \]
to which the following continuous time dynamical system can be attached
\[ \dot{x}(t) = -A(x(t)). \tag{2} \]
Let $\lambda>0$. We now devise two iterative algorithms for solving (1) by using different discretizations of $\dot{x}(t)$ in (2). To this end, let us first approximate the trajectory $x(t)$ in (2) by discretizing at the points $(k\lambda)_{k\in\mathbb{N}}$, and denote the discretized trajectory by $(x_k)_{k\in\mathbb{N}}$.
Now, on one hand, using the forward discretization gives
\[ \frac{x_{k+1}-x_k}{\lambda} = -A(x_k) \iff x_{k+1} = x_k - \lambda A(x_k). \tag{3} \]
In the particular case when $A$ is the gradient of a function, (3) is nothing more than the classical gradient descent method. On the other hand, using the backward discretization gives
\[ \frac{x_{k+1}-x_k}{\lambda} = -A(x_{k+1}) \iff x_{k+1} = J_{\lambda A}(x_k), \tag{4} \]
where $J_{A} := (I+A)^{-1}$ denotes the resolvent of a (potentially multi-valued) maximal monotone operator $A$. This iteration is precisely the proximal point algorithm for the monotone inclusion (1). It is worth emphasizing that (3) and (4) are different iterative algorithms which, in general, do not converge under the same conditions. In particular, if $A$ is monotone but not cocoercive, then (4) converges to a solution for any $\lambda>0$ whereas (3), in general, does not. Nevertheless, both algorithms correspond to the same continuous dynamical system (2).
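The contrast between the forward and backward discretizations can be checked numerically. The following is a minimal sketch, assuming an illustrative skew-symmetric matrix as the operator (monotone but not cocoercive); the matrix, the stepsize `lam` and the starting point are choices made here for illustration and do not come from the paper.

```python
import numpy as np

# Illustrative operator (an assumption, not from the paper): A is skew-symmetric,
# so <A x, x> = 0 for all x. It is monotone but not cocoercive.
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
lam = 0.5
resolvent = np.linalg.inv(np.eye(2) + lam * A)  # J_{lam A} = (I + lam A)^{-1}

x_fwd = np.array([1.0, 1.0])
x_bwd = x_fwd.copy()
for _ in range(200):
    x_fwd = x_fwd - lam * (A @ x_fwd)  # forward (explicit) step
    x_bwd = resolvent @ x_bwd          # backward (proximal point) step

fwd_norm = np.linalg.norm(x_fwd)  # norm grows by sqrt(1 + lam^2) per step
bwd_norm = np.linalg.norm(x_bwd)  # norm shrinks by 1 / sqrt(1 + lam^2) per step
print(fwd_norm, bwd_norm)
```

For this operator the only zero is the origin: the explicit iteration spirals away from it, while the proximal point iteration spirals towards it.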
In this work, we exploit the same type of relationship between continuous and discrete dynamical systems to discover a new algorithm for monotone inclusions of the form
\[ \text{find } x\in\mathcal{H} \text{ such that } 0\in(A+B)(x), \tag{5} \]
where $A\colon\mathcal{H}\rightrightarrows\mathcal{H}$ and $B\colon\mathcal{H}\to\mathcal{H}$ are (maximally) monotone operators with $B$ Lipschitz continuous (but not necessarily cocoercive). More precisely, by using a non-standard discretization of the continuous time Douglas–Rachford algorithm, we obtain
\[ x_{k+1} = J_{\lambda A}\bigl(x_k-\lambda B(x_k)\bigr) - \lambda\bigl(B(x_k)-B(x_{k-1})\bigr), \tag{6} \]
which, as we will show, converges weakly to a solution of (5) whenever $\lambda\in\bigl(0,\frac{1}{3L}\bigr)$, where $L$ denotes the Lipschitz constant of $B$. Note also that, by choosing the operators $A$ and $B$ appropriately, the setting of (5) covers smooth-nonsmooth convex minimization, monotone inclusions through duality, and saddle point problems with smooth convex-concave couplings. For further details, see [20].
Despite substantial progress in monotone operator theory, there are not many original splitting algorithms for solving monotone inclusions of the form (5) which use forward evaluations of $B$. Tseng’s forward-backward-forward algorithm [24], published in 2000, was the first such method capable of solving (5). Until recently, this was the only known method with these properties; however, there has been progress in the area with the discovery of further methods having this property [16, 17, 20]. In this connection, see also [12, 8].
The remainder of this work is organized as follows. In Section 2, we discuss the classical Douglas–Rachford algorithm and study an alternative form of its continuous time dynamical system. In Section 3, we discretize this alternative form to obtain (6) and prove its convergence. In Section 4, we briefly show how the same idea can be applied to derive a new primal-dual algorithm. Section 5 concludes our work by suggesting avenues for further investigation.
2 From the Discrete to the Continuous
The Douglas–Rachford method is an algorithm for finding a zero in the sum of two maximally monotone operators, $A$ and $B$. This popular splitting method works by only requiring the evaluation of the resolvents of each of the operators individually, rather than the resolvent of their sum. The method was first formulated for solving linear equations in [13] and later generalized to monotone inclusions in [18].
The method can be compactly described as the fixed point iteration
\[ z_{k+1} = T(z_k) := \frac{1}{2}\bigl(I + R_{\lambda A}R_{\lambda B}\bigr)(z_k), \tag{7} \]
where $R_{A} := 2J_{A} - I$ denotes the reflected resolvent of a monotone operator $A$. Its behavior is summarized in the following theorem.
Theorem 1.
([7, Theorem 25.6]). Let $A$ and $B$ be maximally monotone operators with $(A+B)^{-1}(0)\neq\emptyset$. Let $\lambda>0$ and $z_0\in\mathcal{H}$. Then the sequence $(z_k)$, generated by (7), satisfies
(i) $(z_k)$ converges weakly to a point $z\in\operatorname{Fix} T$.
(ii) $(J_{\lambda B}(z_k))$ converges weakly to $J_{\lambda B}(z)\in(A+B)^{-1}(0)$.
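To make the fixed point iteration concrete, here is a minimal sketch of Douglas–Rachford splitting written as $z \leftarrow z + J_{\lambda A}(2J_{\lambda B}(z) - z) - J_{\lambda B}(z)$, which is the usual expanded form of the averaged reflected-resolvent map. The test problem (normal cone of a box plus $B(x) = x - c$), the function name `douglas_rachford` and all parameter values are assumptions chosen here because both resolvents have closed forms.

```python
import numpy as np

def douglas_rachford(res_A, res_B, z0, iters):
    """Iterate z <- z + J_A(2 J_B(z) - z) - J_B(z); return z and its shadow J_B(z)."""
    z = z0.copy()
    for _ in range(iters):
        x = res_B(z)                   # shadow point
        z = z + res_A(2.0 * x - z) - x
    return z, res_B(z)

# Hypothetical test problem: A = normal cone of the box [0, 1]^3, B(x) = x - c.
lam = 1.0
c = np.array([2.0, -1.0, 0.3])
res_A = lambda v: np.clip(v, 0.0, 1.0)         # projection onto the box
res_B = lambda v: (v + lam * c) / (1.0 + lam)  # solves x + lam * (x - c) = v

z, shadow = douglas_rachford(res_A, res_B, np.zeros(3), 200)
print(shadow)
```

For this problem the unique zero of $A+B$ is the projection of $c$ onto the box, so the shadow sequence can be compared directly with `np.clip(c, 0, 1)`.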
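The iteration (7) can be viewed as a discretization of the continuous time dynamical system
\[ \dot{z}(t) = T(z(t)) - z(t), \tag{8} \]
where the discretization $\dot{z}(t)\approx z_{k+1}-z_k$ and $z(t)\approx z_k$ are used. Since the operator $T=\frac{1}{2}(I+R_{\lambda A}R_{\lambda B})$ is nonexpansive (i.e., $1$-Lipschitz), the Picard–Lindelöf theorem [15, Theorem 2.2] implies that, for any $z_0\in\mathcal{H}$, there exists a unique trajectory $z(t)$ satisfying (8) and the initial condition $z(0)=z_0$.
Let us now express this dynamical system in an alternative form. First, by using the definition of the reflected resolvent, we observe that (8) can be written as
\[ \dot{z}(t) = J_{\lambda A}\bigl(2J_{\lambda B}(z(t)) - z(t)\bigr) - J_{\lambda B}(z(t)). \tag{9} \]
Denote $x(t) := J_{\lambda B}(z(t))$ and $y(t) := z(t)-x(t)$. Clearly, $y(t)\in\lambda B(x(t))$. Then we have
\[ z(t) = x(t)+y(t), \qquad \dot{z}(t) = \dot{x}(t)+\dot{y}(t). \tag{10} \]
By using these identities to eliminate $z(t)$ from (9), we obtain
\[ \dot{x}(t)+\dot{y}(t)+x(t) = J_{\lambda A}\bigl(x(t)-y(t)\bigr), \qquad y(t)\in\lambda B(x(t)). \tag{11} \]
This system can be viewed as the continuous dynamical system associated with the shadow trajectories, $x(t)=J_{\lambda B}(z(t))$, of the Douglas–Rachford system (8) specified by $z(t)$. In particular, this fact implies the existence of the trajectories $x(t)$ and $y(t)$. In a later section, we will use a discretization of this system to obtain a new splitting algorithm.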
We begin with a theorem concerning the asymptotic behavior of (11). Although this result can be obtained, with some work, from [10, Theorem 6], we give a more direct proof which serves the additional purpose of providing insights useful for the analysis of the discrete case. We require the following two preparatory lemmas.
Lemma 1.
Let $\lambda>0$. Suppose $A$ and $B$ are maximally monotone operators. Then the set-valued operator on $\mathcal{H}\times\mathcal{H}$ defined by
[TABLE]
is demiclosed. That is, its graph is a sequentially closed set in the weak-strong topology.
Proof.
Note that the operator in (12) is maximally monotone as the sum of two maximally monotone operators, the latter having full domain [7, Corollary 24.4(i)]. Since maximally monotone operators are demiclosed [7, Proposition 20.32], the result follows. ∎
Although the following lemma is a direct consequence of [1, Lemma 5.2], we include its explicit statement for the convenience of the reader.
Lemma 2.
Suppose is -Lipschitz continuous. If and , then as .
Proof.
Since is -Lipschitz continuous, [10, Remark 1] implies that exists almost everywhere and that for almost all . From this it follows that . We also have
[TABLE]
Since the right hand side is integrable, [1, Lemma 5.2] yields the result. ∎
The following theorem is our main result regarding the asymptotic behavior of (11).
Theorem 2.
Let and be maximally monotone operators with . Let and . Then the trajectories , , generated by (11) with initial condition , satisfy
(i) converges weakly to a point .
(ii) converges weakly to a point .
Proof.
Let and . Denote and . By using monotonicity of followed by monotonicity of , we obtain
[TABLE]
In particular, this shows that is decreasing, hence exists, and that . The latter combined with Lemma 2 implies that as . Monotonicity of then yields
[TABLE]
from which it follows that is bounded. By using the definition of the resolvent , we can express (11) in the form
[TABLE]
Let be a weak sequential cluster point of the bounded trajectory . Taking the limit along this subsequence in (14), using Lemma 1, and unraveling the resulting expression gives
[TABLE]
In particular, by combining (13) with (15), we deduce that exists. Applying Opial’s lemma [10, Lemma 4] then shows that converges weakly to a point where is a weak sequential cluster point of . The definition of then yields , which implies that is the unique cluster point of . The trajectory therefore converges weakly to a point . To complete the proof, simply note that as . ∎
3 From the Continuous to the Discrete
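In this section, we devise a new splitting algorithm by considering different discretizations of the dynamical system (11). For the remainder of this work, we will suppose that $B$ is a single-valued operator. In this case, $y(t)=\lambda B(x(t))$ and the system (11) simplifies to
\[ \dot{x}(t) + \lambda\frac{d}{dt}B(x(t)) + x(t) = J_{\lambda A}\bigl(x(t)-\lambda B(x(t))\bigr). \tag{16} \]
In order to discretize this system, let us replace $x(t)\approx x_k$ and $B(x(t))\approx B(x_k)$. As two derivatives appear in (16), there are many combinations of possible discretizations. One involves using forward discretizations of both $\dot{x}(t)$ and $\frac{d}{dt}B(x(t))$, that is,
\[ \dot{x}(t)\approx x_{k+1}-x_k, \qquad \frac{d}{dt}B(x(t))\approx B(x_{k+1})-B(x_k). \tag{17} \]
Under this discretization, (16) becomes
\[ x_{k+1} = J_{\lambda A}\bigl(x_k-\lambda B(x_k)\bigr) - \lambda\bigl(B(x_{k+1})-B(x_k)\bigr). \tag{18} \]
As written, this expression does not give rise to a useful algorithm, since $x_{k+1}$ appears on both sides of the equation. However, we note that by taking $z_k := x_k+\lambda B(x_k)$ and rearranging, we obtain
\[ z_{k+1} = z_k + J_{\lambda A}\bigl(2J_{\lambda B}(z_k)-z_k\bigr) - J_{\lambda B}(z_k), \]
which is precisely the usual Douglas–Rachford algorithm given in (7).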
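The change of variables behind this observation can be verified numerically. The sketch below assumes the implicit relation reads $x_{k+1} = J_{\lambda A}(x_k - \lambda B(x_k)) - \lambda(B(x_{k+1}) - B(x_k))$ and the substitution is $z_k = x_k + \lambda B(x_k)$; the affine operator `B`, the box constraint and the numbers are hypothetical choices made for illustration.

```python
import numpy as np

lam = 0.7
M = np.array([[1.0, 2.0], [-2.0, 1.0]])  # symmetric part is I, so B is monotone
q = np.array([0.3, -0.4])
B = lambda x: M @ x + q
res_A = lambda v: np.clip(v, -1.0, 1.0)  # resolvent of the normal cone of a box
res_B = lambda z: np.linalg.solve(np.eye(2) + lam * M, z - lam * q)

z0 = np.array([1.5, -2.0])
x0 = res_B(z0)                       # so that z0 = x0 + lam * B(x0)
z1 = z0 + res_A(2.0 * x0 - z0) - x0  # one Douglas-Rachford step
x1 = res_B(z1)

# Residual of the implicit relation at (x0, x1); zero up to rounding error.
residual = x1 - (res_A(x0 - lam * B(x0)) - lam * (B(x1) - B(x0)))
print(residual)
```

The residual vanishes because $2x_0 - z_0 = x_0 - \lambda B(x_0)$ and $x_1 = z_1 - \lambda B(x_1)$, which is exactly the rearrangement described in the text.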
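To derive a new algorithm, we consider a different discretization of (16). To this end, we perform a forward discretization of $\dot{x}(t)$ and a backward discretization of $\frac{d}{dt}B(x(t))$, that is,
\[ \dot{x}(t)\approx x_{k+1}-x_k, \qquad \frac{d}{dt}B(x(t))\approx B(x_k)-B(x_{k-1}). \tag{19} \]
Under this discretization, (11) becomes
\[ x_{k+1} = J_{\lambda A}\bigl(x_k-\lambda B(x_k)\bigr) - \lambda\bigl(B(x_k)-B(x_{k-1})\bigr). \tag{20} \]
Although not surprising, it is interesting to note that (18) and (20) only differ in the indices which appear in the last two terms. In particular, in this expression, $x_{k+1}$ does not appear on the right-hand side.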
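The resulting explicit scheme is easy to prototype. The following is a minimal sketch, assuming the iteration reads $x_{k+1} = J_{\lambda A}(x_k - \lambda B(x_k)) - \lambda(B(x_k) - B(x_{k-1}))$ with the initialisation $x_{-1} = x_0$. The function name `shadow_douglas_rachford`, the test problem (a box constraint plus a skew affine operator with Lipschitz constant 1) and the stepsize are assumptions made here, not taken from the paper.

```python
import numpy as np

def shadow_douglas_rachford(res_A, B, x0, lam, iters):
    """One resolvent of A and one new evaluation of B per iteration."""
    x_prev, x = x0.copy(), x0.copy()  # initialise with x_{-1} = x_0
    Bx_prev = B(x_prev)
    for _ in range(iters):
        Bx = B(x)                     # the only forward evaluation this iteration
        x_next = res_A(x - lam * Bx) - lam * (Bx - Bx_prev)
        x_prev, x, Bx_prev = x, x_next, Bx
    return x

# Hypothetical test problem: A = normal cone of [-1, 1]^2, B(x) = M x + q.
M = np.array([[0.0, 1.0], [-1.0, 0.0]])  # skew, so B is monotone with L = 1
q = np.array([-0.5, 0.25])
B = lambda x: M @ x + q
res_A = lambda v: np.clip(v, -1.0, 1.0)

lam = 0.3                                # below 1 / (3 L)
x = shadow_douglas_rachford(res_A, B, np.array([1.0, 1.0]), lam, 2000)
x_star = np.array([0.25, 0.5])           # interior point with B(x_star) = 0
print(x)
```

Caching `Bx_prev` is a design choice that keeps the cost at one evaluation of `B` per iteration; note that the iterates themselves need not lie in the box, since the correction term is applied after the resolvent.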
Before turning our attention to the convergence properties of this iteration, we make the following remark.
Remark 1.
Backward/forward discretizations of a derivative usually correspond to the same type of step in the discrete counterpart of the algorithm. This is, for instance, the case for the forward-backward method, which includes the discussion from Section 1 as a special case. It is curious to note, however, that here forward (resp. backward) discretization gave rise to backward (resp. forward) operators in the discrete counterparts. In particular, two forward discretizations of (16) gave rise to the Douglas–Rachford algorithm, which has two backward steps, whereas one forward and one backward discretization produced a method also having one forward and one backward step.
We now prove the following preparatory lemma, which might be interesting in its own right due to the very general form of the recurrence relation.
Lemma 3.
Let be a maximal monotone operator and let be an arbitrary sequence. Let and consider defined by
[TABLE]
Then, for all and , we have
[TABLE]
Proof.
By the definition of the resolvent and (21), it follows that
[TABLE]
Since and is monotone, we have
[TABLE]
which is equivalent to
[TABLE]
To simplify (24), we note that
[TABLE]
Now, using the above three identities in (24), we obtain
[TABLE]
The equivalence between the last inequality and the inequality claimed in the lemma is now obvious. ∎
Since (20) is of the form specified by Lemma 3, this lemma suggests one possible way to prove convergence of (20): the quantity will be decreasing if the other terms on the right-hand side of its inequality can be estimated appropriately. The following theorem, which is our main result regarding convergence of (20), makes use of this observation.
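Theorem 3.
Let $A$ be maximally monotone and $B$ be monotone and $L$-Lipschitz with $(A+B)^{-1}(0)\neq\emptyset$. Let $\lambda\in\bigl(0,\frac{1}{3L}\bigr)$, and let $x_0, x_{-1}\in\mathcal{H}$. Then the sequence $(x_k)$, generated by (20), satisfies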
(i) $(x_k)$ converges weakly to a point $\bar{x}\in(A+B)^{-1}(0)$.
(ii) $(B(x_k))$ converges weakly to $B(\bar{x})$.
Proof.
Let and set . Since (20) is of the form specified by (21), we apply Lemma 3 to the monotone operator with to deduce that the inequality stated there holds. Now, using that is monotone, we have and hence
[TABLE]
Next, we estimate the inner-product in the last line of the inequality from Lemma 3. To this end, note that Young’s inequality gives
[TABLE]
and that Lipschitzness of yields
[TABLE]
Combining these two estimates with the inequality from Lemma 3 gives
[TABLE]
By denoting and , the previous inequality implies
[TABLE]
which telescopes to yield
[TABLE]
From this, it follows that is bounded and that . The latter, together with Lipschitz continuity of , implies and, consequently, we also have that . Since , we have
[TABLE]
Since is bounded, and is nonexpansive, it then follows that the sequence is also bounded. Also, due to (29), we see that the following limit exists
[TABLE]
Now, by using the definition of the resolvent , we can express (23) in the form
[TABLE]
Let be a weak cluster point of the bounded sequence . Taking the limit along this subsequence in (30), using Lemma 1, and unravelling the resulting expression gives
[TABLE]
Applying Opial’s Lemma [7, Lemma 2.39] then shows that converges weakly to a point where is a weak cluster point of . But then the definition of yields that which implies that is the unique cluster point of . The sequence therefore converges weakly to a point . To complete the proof, simply note that as . ∎
Some remarks regarding Theorem 3 and its proof are in order.
Remark 2 (Continuous and discrete proofs).
The sequence $(z_k)$ plays a similar role in Theorem 3 to the trajectory $z(t)$ in Theorem 2. This does however highlight a subtle difference between the two proofs: in the discrete case, we have $x_k = J_{\lambda B}\bigl(z_k+(y_k-y_{k-1})\bigr)$ whereas, in the continuous case, we have $x(t) = J_{\lambda B}(z(t))$. Note also that although our combination of the estimates (27) and (28) for may appear somewhat arbitrary, the combination of these two inequalities is in fact optimal.
Remark 3.
Although we were unable to prove so in Theorem 3, we conjecture that the interval in which $\lambda$ lies can be extended to $\bigl(0,\frac{1}{2L}\bigr)$. Our original motivation for considering the continuous dynamical system (11) did not arise from its connection to the Douglas–Rachford algorithm, but rather from its connection to the operator splitting method studied in [20] given by
\[ x_{k+1} = J_{\lambda A}\bigl(x_k-2\lambda B(x_k)+\lambda B(x_{k-1})\bigr). \tag{32} \]
Note that the iterations (20) and (32) look very similar and, in fact, coincide if $J_{\lambda A}$ is the identity operator. For (32), convergence has been established when $\lambda<\frac{1}{2L}$, which is slightly better than $\lambda<\frac{1}{3L}$ for (20). Thus, in the case that $A=0$, this provides some evidence for the conjecture.
On the other hand, the analysis of dynamical systems corresponding to (32) is more complicated. In particular, a natural candidate for a continuous analogue of (32) is given by
[TABLE]
Because we are unable to couple the derivatives and in (33) in general, it is not clear how to prove existence of its trajectory .
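The relationship between the two discrete schemes discussed in Remark 3 can be illustrated with a short experiment. This sketch assumes the shadow iteration has the form $x_{k+1} = J_{\lambda A}(x_k - \lambda B(x_k)) - \lambda(B(x_k) - B(x_{k-1}))$ and that the method of [20] has the form $x_{k+1} = J_{\lambda A}(x_k - 2\lambda B(x_k) + \lambda B(x_{k-1}))$; the operator, the box and the helper names (`run`, `shadow_step`, `frb_step`) are hypothetical.

```python
import numpy as np

M = np.array([[0.0, 1.0], [-1.0, 0.0]])  # illustrative skew operator, L = 1
lam = 0.3

def B(x):
    return M @ x

def shadow_step(resolvent, x, x_prev):
    return resolvent(x - lam * B(x)) - lam * (B(x) - B(x_prev))

def frb_step(resolvent, x, x_prev):
    return resolvent(x - 2.0 * lam * B(x) + lam * B(x_prev))

def run(step, resolvent, x0, iters):
    x_prev, x = x0.copy(), x0.copy()     # x_{-1} = x_0
    for _ in range(iters):
        x_prev, x = x, step(resolvent, x, x_prev)
    return x

identity = lambda v: v                   # resolvent when A = 0
box = lambda v: np.clip(v, -1.0, 1.0)    # resolvent of a box normal cone
x0 = np.array([2.0, 2.0])

same_without_A = np.allclose(run(shadow_step, identity, x0, 50),
                             run(frb_step, identity, x0, 50))
differ_with_box = not np.allclose(run(shadow_step, box, x0, 2),
                                  run(frb_step, box, x0, 2))
print(same_without_A, differ_with_box)
```

With the identity resolvent the two updates are algebraically identical, whereas with a nontrivial resolvent the correction term sits outside the resolvent in one scheme and inside it in the other, so the iterates separate after two steps in this example.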
4 Primal-Dual Algorithms
In this section, we use Lemma 3 from Section 3 to analyse a new primal-dual algorithm. Consider the bilinear convex-concave saddle point problem
[TABLE]
where , are proper convex lsc functions, is a bounded linear operator with norm , and denotes the Fenchel conjugate of . A popular method to solve this problem is the primal-dual method [11] defined by
[TABLE]
Under the assumption that the solution set of (34) is non-empty and that , one can prove that the sequence weakly converges to a saddle point of (34).
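For orientation, a primal-dual iteration of this type can be sketched as follows. The code uses the commonly stated form of the primal-dual hybrid gradient method, $x_{k+1} = \operatorname{prox}_{\tau g}(x_k - \tau K^* y_k)$, $y_{k+1} = \operatorname{prox}_{\sigma f^*}(y_k + \sigma K(2x_{k+1} - x_k))$ with $\tau\sigma\|K\|^2 < 1$; this ordering is an assumption and may differ from the exact form of (35). The quadratic test problem, the symbols `g`, `f`, `K`, `tau`, `sigma` and the data are likewise illustrative choices.

```python
import numpy as np

# Hypothetical test problem: g(x) = 0.5 * ||x - a||^2 and f*(y) = 0.5 * ||y||^2,
# so the primal problem is min_x 0.5 * ||x - a||^2 + 0.5 * ||K x||^2.
K = np.array([[1.0, 2.0], [0.0, 1.0]])
a = np.array([1.0, -2.0])
tau = sigma = 0.9 / np.linalg.norm(K, 2)  # tau * sigma * ||K||^2 = 0.81 < 1

prox_g = lambda v: (v + tau * a) / (1.0 + tau)  # prox of tau * g
prox_fconj = lambda w: w / (1.0 + sigma)        # prox of sigma * f*

x, y = np.zeros(2), np.zeros(2)
for _ in range(1000):
    x_new = prox_g(x - tau * (K.T @ y))                  # primal step
    y = prox_fconj(y + sigma * (K @ (2.0 * x_new - x)))  # dual step, reflected point
    x = x_new

x_star = np.linalg.solve(np.eye(2) + K.T @ K, a)         # closed-form minimiser
print(x, x_star)
```

The term `2.0 * x_new - x` is the reflection that distinguishes this method from the Arrow–Hurwicz iteration.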
In the spirit of (20), we propose the following novel primal-dual algorithm:
[TABLE]
In the following theorem, we prove convergence of this algorithm. As one can see, the conditions required for its convergence are exactly the same as for (35). Rather than present the full proof, we will only focus on the most important ingredient: the fact that , remain bounded. Once this is established, the rest of the proof follows the standard argument, as in Theorem 3.
Theorem 4.
Let , be proper convex lsc functions and be a bounded linear operator with norm such that the solution set of (34) is nonempty. Let , let , and let . Then the sequence , generated by (36), converges weakly to a solution of (34).
Proof.
Let be a saddle point of (34). Then the first-order optimality conditions give and . By applying Lemma 3 for a fixed with
[TABLE]
we obtain
[TABLE]
where, instead of the inequality stated in Lemma 3, we used its equivalent form (25). Similarly, by applying Lemma 3 for a fixed with
[TABLE]
we obtain
[TABLE]
By applying Young’s inequality and using the inequality , we have
[TABLE]
Now, multiplying (37) by , (38) by , summing these two inequalities, and then using the estimate (39) yields
[TABLE]
By telescoping this inequality, one obtains boundedness of and . In fact, a slightly tighter estimation in (39) would yield and (since the inequality is strict). ∎
Although we do not yet know whether the proposed scheme (36) has any benefits compared to (35), we believe that both algorithms will perform very similarly. Nevertheless, the fact that the Lyapunov function associated with the analysis of (36) is different from the one used for (35) might be of interest for deriving new extensions.
5 Concluding Remarks/Future Directions
In this work, we proposed and analyzed a new algorithm for finding a zero in the sum of two monotone operators, one of which is assumed to be Lipschitz continuous. This algorithm naturally arises from a non-standard discretization of a continuous dynamical system associated with the Douglas–Rachford algorithm. To conclude, we outline possible directions for future work.
- Extending the stepsize: In our main result, Theorem 3, we established convergence whenever $\lambda<\frac{1}{3L}$. However, for the reasons discussed in Remark 3, the upper-bound can be improved to $\frac{1}{2L}$, at least when $A=0$. It would be interesting to either improve this bound or show, by means of a counterexample, that the condition $\lambda<\frac{1}{3L}$ is optimal. Furthermore, it would also be interesting to investigate the optimal convergence rate under some additional assumptions, as was done in [23] for the classical Douglas–Rachford algorithm.
- Linesearch: It would be interesting to incorporate a linesearch procedure in the shadow Douglas–Rachford method. Similarly, it makes sense to consider a continuous dynamic scheme with variable steps, as was done, for example, in [6] for Tseng’s method.
- Inertial terms: It is important to study the extensions of (11) and (20) which incorporate additional inertial and relaxed terms, as was done in the recent work [5] for the forward-backward method. Combining inertial and relaxing effects allows one to go beyond the standard bound of for the stepsize associated with the inertial term.
- Role of reflection: Perhaps the most interesting and challenging direction for future work is to understand why the inclusion of a “reflection term” in an algorithm allows for convergence to be proven under milder hypotheses. For instance, applied to the saddle point problem (34), the famous Arrow–Hurwicz algorithm [4] can fail to converge. In contrast, both (35) and (36), which can be viewed as its “reflected” modifications, do converge. Similarly, for the monotone variational inequality $0\in N_C(x)+B(x)$, where $C$ is a closed convex set and $N_C$ is its normal cone, the projected gradient algorithm
\[ x_{k+1} = P_C\bigl(x_k-\lambda B(x_k)\bigr) \]
does not work in general, but its “reflected” modification [19] given by
\[ x_{k+1} = P_C\bigl(x_k-\lambda B(2x_k-x_{k-1})\bigr) \]
does converge to a solution. For the more general monotone inclusion $0\in A(x)+B(x)$, the forward-backward method also does not work; however, both of its “reflected” modifications, (20) and (32), do. We note however that although all of the aforementioned algorithms share the same “reflected term”, their analyses are not the same. It would be interesting to understand deeper reasons for their success.
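The effect of the reflection term can be observed on a small variational inequality. The sketch below assumes the two schemes take the forms $x_{k+1} = P_C(x_k - \lambda B(x_k))$ and $x_{k+1} = P_C(x_k - \lambda B(2x_k - x_{k-1}))$; the skew operator, the box $C$, the stepsize and the starting point are hypothetical choices, picked so that the unique solution is the origin.

```python
import numpy as np

M = np.array([[0.0, 1.0], [-1.0, 0.0]])  # B(x) = M x: monotone with L = 1
B = lambda x: M @ x
proj = lambda v: np.clip(v, -1.0, 1.0)   # projection onto C = [-1, 1]^2
lam = 0.3
x0 = np.array([0.5, 0.5])

x_pg = x0.copy()                         # projected gradient iterate
x_prev, x_ref = x0.copy(), x0.copy()     # reflected variant, x_{-1} = x_0
for _ in range(1000):
    x_pg = proj(x_pg - lam * B(x_pg))
    x_prev, x_ref = x_ref, proj(x_ref - lam * B(2.0 * x_ref - x_prev))

pg_norm = np.linalg.norm(x_pg)    # stays away from the solution
ref_norm = np.linalg.norm(x_ref)  # tends to zero
print(pg_norm, ref_norm)
```

In this example the unprojected gradient step enlarges the norm by a factor $\sqrt{1+\lambda^2}$, so the projected gradient iterates are pushed out to the boundary of the box and circulate there, while the reflected iterates spiral into the origin.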
Acknowledgements.
E.R. Csetnek was supported by Austrian Science Fund Project P 29809-N32. Y. Malitsky was supported by German Research Foundation grant SFB755-A4. The authors would also like to thank the Erwin Schrödinger Institute for their support and hospitality during the thematic program “Modern Maximal Monotone Operator Theory: From Nonsmooth Optimization to Differential Inclusions”.
References
- [1] Abbas, B., Attouch, H., and Svaiter, B. F. Newton-like dynamics and forward-backward methods for structured monotone inclusions in Hilbert spaces. J. Optim. Theory Appl. 161, 2 (2014), 331–360.
- [2] Al’ber, Y. I. Continuous regularization of linear operator equations in a Hilbert space. Mathematical Notes of the Academy of Sciences of the USSR 4, 5 (1968), 793–797.
- [3] Antipin, A. S. Minimization of convex functions on convex sets by means of differential equations. Differential Equations 30, 9 (1994), 1365–1375.
- [4] Arrow, K., and Hurwicz, L. Gradient methods for constrained maxima. Operations Research 5, 2 (1957), 258–265.
- [5] Attouch, H., and Cabot, A. Convergence of a relaxed inertial forward-backward algorithm for structured monotone inclusions. Preprint hal-01782016 (2018).
- [6] Banert, S., and Boţ, R. I. A forward-backward-forward differential equation and its asymptotic properties. Journal of Convex Analysis 25, 2 (2018), 371–388.
- [7] Bauschke, H. H., and Combettes, P. L. Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 1st ed. Springer Science+Business Media, New York, 2011.
- [8] Bello Cruz, J., and Díaz Millán, R. A variant of forward-backward splitting method for the sum of two monotone operators with a new search strategy. Optim. 64, 7 (2015), 1471–1486.
