Shadow Douglas–Rachford Splitting for Monotone Inclusions

Ernö Robert Csetnek; Yura Malitsky; Matthew K. Tam

arXiv:1903.03393·math.OC·June 17, 2020


TL;DR

This paper introduces an algorithm for monotone inclusions involving a Lipschitz continuous operator. It is derived from a non-standard discretization of a continuous dynamical system associated with Douglas–Rachford splitting, and each iteration requires one forward and one backward evaluation.

Contribution

It presents a new algorithm for monotone inclusions, obtained by discretizing the continuous-time Douglas–Rachford dynamics explicitly, rather than implicitly, with respect to one of the operators.

Findings

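1. Weak convergence of the proposed algorithm is established for stepsizes $\lambda<\frac{1}{3L}$, where $L$ is the Lipschitz constant of the single-valued operator.

2. The method handles a Lipschitz continuous, but not necessarily cocoercive, operator through forward evaluations.

3. It offers an explicit alternative to the implicit discretization that yields the classical Douglas–Rachford method.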

Abstract

In this work, we propose a new algorithm for finding a zero in the sum of two monotone operators where one is assumed to be single-valued and Lipschitz continuous. This algorithm naturally arises from a non-standard discretization of a continuous dynamical system associated with the Douglas–Rachford splitting algorithm. More precisely, it is obtained by performing an explicit, rather than implicit, discretization with respect to one of the operators involved. Each iteration of the proposed algorithm requires the evaluation of one forward and one backward operator.



Full text


Ernö Robert Csetnek, Faculty of Mathematics, University of Vienna

Yura Malitsky, Institute for Numerical and Applied Mathematics, University of Göttingen

Matthew K. Tam, Institute for Numerical and Applied Mathematics, University of Göttingen


Keywords. monotone operator · operator splitting · Douglas–Rachford algorithm · dynamical systems

MSC2010. 49M29, 90C25, 47H05, 47J20, 65K15

1 Introduction

The study of continuous time dynamical systems associated with iterative algorithms for solving optimization problems has a long history which can be traced back at least to the 1950s [14, 4]. The relationship between the continuous and discrete versions of an algorithm provides a unifying perspective which gives insights into their behavior and properties. As we will see in this work, this includes suggesting new algorithmic schemes as well as appropriate Lyapunov functions for analyzing their convergence properties. The interplay between continuous and discrete dynamical systems has been studied by many authors including [22, 2, 3, 9, 21, 1, 10, 5, 6].

The following well-known idea will help to motivate the approach used in this work. Let $\mathcal{H}$ be a real Hilbert space and suppose ${B:\mathcal{H}\to\mathcal{H}}$ is a maximal monotone operator. Consider the monotone equation

$$\text{find } x\in\mathcal{H}\ \text{such that}\ 0=B(x),\tag{1}$$

to which the following continuous time dynamical system can be attached

$$\dot{x}(t)=-B(x(t)).\tag{2}$$

Let $\lambda>0$. We now devise two iterative algorithms for solving (1) by using different discretizations of $\dot{x}(t)$ in (2). To this end, we first approximate the trajectory $x(t)$ in (2) by discretizing at the points $(k\lambda)_{k\in\mathbb{Z}_{+}}$, and denote the discretized trajectory by $x_{k}:=x(k\lambda)$.

Now, on one hand, using the forward discretization $\dot{x}(t)\approx\frac{x_{k+1}-x_{k}}{\lambda}$ gives

$$x_{k+1}=x_{k}-\lambda B(x_{k}).\tag{3}$$

In the particular case when $B$ is the gradient of a function, (3) is nothing more than the classical gradient descent method. On the other hand, using the backward discretization $\dot{x}(t)\approx\frac{x_{k}-x_{k-1}}{\lambda}$ gives

$$x_{k}=J_{\lambda B}(x_{k-1}),\tag{4}$$

where $J_{A}:=(\operatorname{Id}+A)^{-1}$ denotes the resolvent of a (potentially multi-valued) maximally monotone operator $A:\mathcal{H}\rightrightarrows\mathcal{H}$. This iteration is precisely the proximal point algorithm for the monotone equation (1). It is worth emphasizing that (3) and (4) are different iterative algorithms which, in general, do not converge under the same conditions. In particular, if $B$ is monotone but not cocoercive and (1) has a solution, then (4) converges weakly to a solution for any $\lambda>0$, whereas (3) may fail to converge. Nevertheless, both algorithms correspond to the same continuous dynamical system (2).
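To make the contrast between (3) and (4) concrete, here is a minimal Python sketch on a hypothetical two-dimensional example that is not taken from the paper: the rotation $B(x_1,x_2)=(x_2,-x_1)$, which is monotone and 1-Lipschitz but not cocoercive, and whose only zero is the origin. For this $B$, each explicit step (3) multiplies the Euclidean norm by exactly $\sqrt{1+\lambda^2}$, while each proximal step (4) divides it by the same factor.

```python
import math


def B(x):
    # Hypothetical rotation field B(x1, x2) = (x2, -x1): <B(u) - B(v), u - v> = 0,
    # so B is monotone and 1-Lipschitz, but not cocoercive. Its only zero is (0, 0).
    return (x[1], -x[0])


def forward_step(x, lam):
    # Explicit step (3): x_{k+1} = x_k - lam * B(x_k).
    bx = B(x)
    return (x[0] - lam * bx[0], x[1] - lam * bx[1])


def backward_step(x, lam):
    # Proximal step (4): x_{k+1} = J_{lam B}(x_k), i.e. solve u + lam * B(u) = x_k.
    # For this particular B the 2x2 linear system has the closed form below.
    d = 1.0 + lam * lam
    return ((x[0] - lam * x[1]) / d, (lam * x[0] + x[1]) / d)


def final_norm(step, x0, lam, iters):
    x = x0
    for _ in range(iters):
        x = step(x, lam)
    return math.hypot(x[0], x[1])


lam, iters = 0.5, 50
fwd = final_norm(forward_step, (1.0, 0.0), lam, iters)
bwd = final_norm(backward_step, (1.0, 0.0), lam, iters)
print(f"explicit (3): {fwd:.3e}   proximal (4): {bwd:.3e}")
```

With $\lambda=0.5$ the explicit iterates grow like $1.25^{k/2}$ and the proximal iterates shrink at the same rate, even though both come from the same ODE (2).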

In this work, we exploit the same type of relationship between continuous and discrete dynamical systems to discover a new algorithm for monotone inclusions of the form

$$\text{find } x\in\mathcal{H}\ \text{such that}\ 0\in(A+B)(x),\tag{5}$$

where $A:\mathcal{H}\rightrightarrows\mathcal{H}$ and $B:\mathcal{H}\to\mathcal{H}$ are (maximally) monotone operators with $B$ Lipschitz continuous (but not necessarily cocoercive). More precisely, by using a non-standard discretization of the continuous time Douglas–Rachford algorithm, we obtain

$$x_{k+1}=J_{\lambda A}\bigl(x_{k}-\lambda B(x_{k})\bigr)-\lambda\bigl(B(x_{k})-B(x_{k-1})\bigr),\tag{6}$$

which, as we will show, converges weakly to a solution of (5) whenever $\lambda\in(0,\frac{1}{3L})$, where $L$ denotes the Lipschitz constant of $B$. Note also that, by choosing the operators $A$ and $B$ appropriately, the setting of (5) covers smooth-nonsmooth convex minimization, monotone inclusions through duality, and saddle point problems with smooth convex-concave couplings. For further details, see [20].
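A minimal sketch of iteration (6) in plain Python follows. The test problem is a hypothetical choice made only for illustration: $A=N_C$ is the normal cone of the box $C=[-1,1]^2$, so $J_{\lambda A}$ is the projection onto $C$ for every $\lambda>0$, and $B(x)=Mx+q$ is affine with $M$ having symmetric part $\tfrac12\operatorname{Id}$, so $B$ is monotone and $L$-Lipschitz with $L=\|M\|=\sqrt{1.25}$. The unique solution is the corner $\bar x=(-1,-1)$. Caching $B(x_{k-1})$ keeps the cost at one evaluation of $B$ and one resolvent per iteration.

```python
import math

# Hypothetical test data: A = N_C, the normal cone of the box C = [-1, 1]^2,
# so J_{lam A} is the projection onto C; B(x) = M x + q is affine.
M = ((0.5, 1.0), (-1.0, 0.5))  # symmetric part 0.5*Id, so B is monotone
q = (2.0, 0.0)
L = math.sqrt(1.25)            # ||M||, because M^T M = 1.25*Id


def B(x):
    return (M[0][0] * x[0] + M[0][1] * x[1] + q[0],
            M[1][0] * x[0] + M[1][1] * x[1] + q[1])


def proj_box(x):
    return tuple(min(max(xi, -1.0), 1.0) for xi in x)


def shadow_dr(J_A, B, x0, x_prev, lam, iters):
    """Iteration (6): x_{k+1} = J_A(x_k - lam*B(x_k)) - lam*(B(x_k) - B(x_{k-1})).

    B(x_{k-1}) is cached, so each step uses one call to B and one to J_A.
    """
    x, b_prev = x0, B(x_prev)
    for _ in range(iters):
        bx = B(x)
        p = J_A(tuple(xi - lam * bi for xi, bi in zip(x, bx)))
        x = tuple(pi - lam * (bi - bpi) for pi, bi, bpi in zip(p, bx, b_prev))
        b_prev = bx
    return x


lam = 0.9 / (3 * L)  # strictly inside (0, 1/(3L))
x = shadow_dr(proj_box, B, (0.0, 0.0), (0.0, 0.0), lam, 2000)

# Natural residual ||x - P_C(x - B(x))||: zero exactly at solutions of 0 in N_C(x) + B(x).
bx = B(x)
residual = math.dist(x, proj_box(tuple(xi - bi for xi, bi in zip(x, bx))))
print(x, residual)
```

The natural residual is used only as a convenient stopping measure for this box-constrained example; it is not part of the method.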

Despite substantial progress in monotone operator theory, there are relatively few original splitting algorithms for solving monotone inclusions of the form (5) that use forward evaluations of $B$. Tseng’s forward-backward-forward algorithm [24], published in 2000, was the first such method capable of solving (5). Until recently, it was the only known method with these properties; however, further methods having this property have since been discovered [16, 17, 20]. In this connection, see also [12, 8].

The remainder of this work is organized as follows. In Section 2, we discuss the classical Douglas–Rachford algorithm and study an alternative form of its continuous time dynamical system. In Section 3, we discretize this alternative form to obtain (6) and prove its convergence. In Section 4, we briefly show how the same idea can be applied to derive a new primal-dual algorithm. Section 5 concludes our work by suggesting avenues for further investigation.

2 From the Discrete to the Continuous

The Douglas–Rachford method is an algorithm for finding a zero in the sum of two maximally monotone operators, $A$ and $B$. This popular splitting method requires only the evaluation of the resolvents of the individual operators, rather than the resolvent of their sum. The method was first formulated for solving linear equations in [13] and later generalized to monotone inclusions in [18].

The method can be compactly described as the fixed point iteration

$$z_{k+1}=\Bigl(\frac{\operatorname{Id}+R_{\lambda A}R_{\lambda B}}{2}\Bigr)z_{k},\tag{7}$$

where $R_{\lambda B}=2J_{\lambda B}-\operatorname{Id}$ denotes the reflected resolvent of a monotone operator $\lambda B$ . Its behavior is summarized in the following theorem.

Theorem 1.

([7, Theorem 25.6]). Let $A\colon\mathcal{H}\rightrightarrows\mathcal{H}$ and $B\colon\mathcal{H}\rightrightarrows\mathcal{H}$ be maximally monotone operators with $\operatorname{zer}(A+B)\neq\varnothing$ . Let $\lambda>0$ and $z_{0}\in\mathcal{H}$ . Then the sequence $(z_{k})$ , generated by (7), satisfies

(i) $(z_{k})$ converges weakly to a point $z\in\operatorname{Fix}(R_{\lambda A}R_{\lambda B})$.

(ii) $(J_{\lambda B}z_{k})$ converges weakly to $J_{\lambda B}z\in\operatorname{zer}(A+B)$.
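The following sketch runs the Douglas–Rachford iteration (7) with $\lambda=1$ on a hypothetical box-constrained problem: $A=N_C$ with $C=[-1,1]^2$, and the affine monotone operator $B(x)=Mx+q$. It forms the reflected resolvents explicitly and then reads off the shadow point $J_{\lambda B}z_k$, which by Theorem 1 should approach $\operatorname{zer}(A+B)=\{(-1,-1)\}$. The closed-form $2\times 2$ solve used for $J_{\lambda B}$ is specific to this affine example.

```python
import math

# Hypothetical test data: A = N_C with C = [-1, 1]^2 and B(x) = M x + q (monotone, affine).
M = ((0.5, 1.0), (-1.0, 0.5))
q = (2.0, 0.0)


def proj_box(x):
    # J_{lam A} = P_C for the normal cone, for every lam > 0.
    return tuple(min(max(xi, -1.0), 1.0) for xi in x)


def resolvent_B(v, lam):
    """J_{lam B}(v): solve (Id + lam*M) u = v - lam*q by Cramer's rule."""
    a, b = 1.0 + lam * M[0][0], lam * M[0][1]
    c, d = lam * M[1][0], 1.0 + lam * M[1][1]
    r0, r1 = v[0] - lam * q[0], v[1] - lam * q[1]
    det = a * d - b * c
    return ((d * r0 - b * r1) / det, (a * r1 - c * r0) / det)


def reflect(J, z):
    """Reflected resolvent R = 2J - Id."""
    p = J(z)
    return tuple(2.0 * pi - zi for pi, zi in zip(p, z))


def douglas_rachford(z0, lam, iters):
    """Iteration (7): z_{k+1} = (z_k + R_{lam A} R_{lam B} z_k) / 2."""
    def J_B(v):
        return resolvent_B(v, lam)

    z = z0
    for _ in range(iters):
        w = reflect(proj_box, reflect(J_B, z))
        z = tuple((zi + wi) / 2.0 for zi, wi in zip(z, w))
    return z


lam = 1.0
z = douglas_rachford((0.0, 0.0), lam, 500)
x = resolvent_B(z, lam)  # shadow point J_{lam B} z_k
print("z =", z, " J_B z =", x)
```

Note that $z_k$ itself tends to $(-0.5,-0.5)=\bar x+\lambda B(\bar x)$, a fixed point of $R_{\lambda A}R_{\lambda B}$, and not to the solution; only its shadow $J_{\lambda B}z_k$ approaches $\bar x$.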

The iteration (7) can be viewed as a discretization of the continuous time dynamical system

$$\dot{z}(t)+z(t)=\Bigl(\frac{\operatorname{Id}+R_{\lambda A}R_{\lambda B}}{2}\Bigr)z(t),\tag{8}$$

where the discretizations $\dot{z}(t)\approx z_{k+1}-z_{k}$ and $z(t)\approx z_{k}$ are used. Since the operator $R_{\lambda A}R_{\lambda B}$ is nonexpansive (i.e., 1-Lipschitz), the map $z\mapsto\frac{1}{2}(R_{\lambda A}R_{\lambda B}z-z)$ defining $\dot{z}$ is Lipschitz continuous, so the Picard–Lindelöf theorem [15, Theorem 2.2] implies that, for any $z_{0}\in\mathcal{H}$, there exists a unique trajectory $z(t)$ satisfying (8) and the initial condition $z(0)=z_{0}$.

Let us now express this dynamical system in an alternative form. First, by using the definition of the reflected resolvent, we observe that (8) can be written as

$$\dot{z}(t)=J_{\lambda A}\bigl(2J_{\lambda B}(z(t))-z(t)\bigr)-J_{\lambda B}(z(t)).\tag{9}$$

Denote $x(t)=J_{\lambda B}(z(t))$ and $y(t)=z(t)-x(t)$ . Clearly, $y(t)\in\lambda B(x(t))$ . Then we have

$$z(t)=x(t)+y(t)\quad\text{and}\quad\dot{z}(t)=\dot{x}(t)+\dot{y}(t).\tag{10}$$

Since $2J_{\lambda B}(z(t))-z(t)=2x(t)-z(t)=x(t)-y(t)$, using these identities to eliminate $z$ from (9) gives

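$$\begin{cases}\dot{x}(t)+\dot{y}(t)=J_{\lambda A}\bigl(x(t)-y(t)\bigr)-x(t),\\ y(t)\in\lambda B(x(t)).\end{cases}\tag{11}$$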

This system can be viewed as the continuous dynamical system associated with the shadow trajectories, $x(t)$ , of the Douglas–Rachford system (8) specified by $z(t)$ . In particular, this fact implies the existence of the trajectories $x(t)$ and $y(t)$ . In a later section, we will use a discretization of this system to obtain a new splitting algorithm.
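As a rough numerical illustration of (9) and its shadow $x(t)=J_{\lambda B}(z(t))$, one can integrate (9) with the forward Euler method. The problem data are hypothetical: $A=N_C$ with $C=[-1,1]^2$ and $B(x)=Mx+q$ with a monotone $M$, whose unique solution is $\bar x=(-1,-1)$. An Euler step of size $h\in(0,1]$ is itself a relaxed Douglas–Rachford step, so this is a sketch of the trajectory rather than a faithful ODE solver. The distance to $\bar z=\bar x+\lambda B(\bar x)=(-0.5,-0.5)$ should be nonincreasing, in line with the Lyapunov argument used in the proof of Theorem 2.

```python
import math

# Hypothetical test data: A = N_C with C = [-1, 1]^2 and B(x) = M x + q.
M = ((0.5, 1.0), (-1.0, 0.5))
q = (2.0, 0.0)


def proj_box(x):  # J_{lam A} for the normal cone of the box
    return tuple(min(max(xi, -1.0), 1.0) for xi in x)


def resolvent_B(v, lam):  # J_{lam B}(v): solve (Id + lam*M) u = v - lam*q
    a, b = 1.0 + lam * M[0][0], lam * M[0][1]
    c, d = lam * M[1][0], 1.0 + lam * M[1][1]
    r0, r1 = v[0] - lam * q[0], v[1] - lam * q[1]
    det = a * d - b * c
    return ((d * r0 - b * r1) / det, (a * r1 - c * r0) / det)


def vector_field(z, lam):
    """Right-hand side of (9): J_{lam A}(2 J_{lam B} z - z) - J_{lam B} z."""
    x = resolvent_B(z, lam)
    p = proj_box(tuple(2.0 * xi - zi for xi, zi in zip(x, z)))
    return tuple(pi - xi for pi, xi in zip(p, x))


lam, h, steps = 1.0, 0.05, 4000   # forward Euler on the time interval [0, 200]
z_bar = (-0.5, -0.5)              # x_bar + lam*B(x_bar) for x_bar = (-1, -1)
z = (0.0, 0.0)
dists = []
for _ in range(steps):
    dz = vector_field(z, lam)
    z = tuple(zi + h * dzi for zi, dzi in zip(z, dz))
    dists.append(math.dist(z, z_bar))

x = resolvent_B(z, lam)                      # shadow point x(t) = J_{lam B}(z(t))
y = tuple(zi - xi for zi, xi in zip(z, x))   # y(t) = z(t) - x(t), an element of lam*B(x(t))
print(x, y)
```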

We begin with a theorem concerning the asymptotic behavior of (11). Although this result can be obtained, with some work, from [10, Theorem 6], we give a more direct proof which serves the additional purpose of providing insights useful for the analysis of the discrete case. We require the following two preparatory lemmas.

Lemma 1.

Let $\lambda>0$ . Suppose $A\colon\mathcal{H}\rightrightarrows\mathcal{H}$ and $B\colon\mathcal{H}\rightrightarrows\mathcal{H}$ are maximally monotone operators. Then the set-valued operator on $\mathcal{H}\times\mathcal{H}$ defined by

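$$\begin{pmatrix}u\\ v\end{pmatrix}\mapsto\begin{pmatrix}\lambda A(u)\\ (\lambda B)^{-1}(v)\end{pmatrix}+\begin{pmatrix}0 & \operatorname{Id}\\ -\operatorname{Id} & 0\end{pmatrix}\begin{pmatrix}u\\ v\end{pmatrix}\tag{12}$$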

is demiclosed. That is, its graph is a sequentially closed set in the weak-strong topology.

Proof.

Note that the operator in (12) is maximally monotone as the sum of two maximally monotone operators, the latter (a skew-symmetric bounded linear operator) having full domain [7, Corollary 24.4(i)]. Since maximally monotone operators are demiclosed [7, Proposition 20.32], the result follows. ∎

Although the following lemma is a direct consequence of [1, Lemma 5.2], we include its explicit statement for the convenience of the reader.

Lemma 2.

Suppose $T\colon\mathcal{H}\to\mathcal{H}$ is $L$ -Lipschitz continuous. If $\dot{z}(t)=T(z(t))$ and $\int_{0}^{\infty}\left\|\dot{z}(t)\right\|^{2}\,dt<+\infty$ , then $\dot{z}(t)\to 0$ as $t\to+\infty$ .

Proof.

Since $T$ is $L$ -Lipschitz continuous, [10, Remark 1] implies that $\ddot{z}$ exists almost everywhere and that $\left\|\ddot{z}(t)\right\|=\left\|\frac{d}{dt}Tz(t)\right\|\leq L\left\|\dot{z}(t)\right\|$ for almost all $t\geq 0$ . From this it follows that $\int_{0}^{\infty}\left\|\ddot{z}(t)\right\|^{2}\,dt<+\infty$ . We also have

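$$\frac{d}{dt}\|\dot{z}(t)\|^{2}=2\langle\ddot{z}(t),\dot{z}(t)\rangle\leq\|\ddot{z}(t)\|^{2}+\|\dot{z}(t)\|^{2}.$$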

Since the right-hand side is integrable, [1, Lemma 5.2] yields the result. ∎

The following theorem is our main result regarding the asymptotic behavior of (11).

Theorem 2.

Let $A\colon\mathcal{H}\rightrightarrows\mathcal{H}$ and $B\colon\mathcal{H}\rightrightarrows\mathcal{H}$ be maximally monotone operators with $\operatorname{zer}(A+B)\neq\varnothing$ . Let $\lambda>0$ and $x_{0}\in\mathcal{H}$ . Then the trajectories $x(t)$ , $y(t)$ , generated by (11) with initial condition $x(0)=x_{0}$ , satisfy

(i) $x(t)$ converges weakly to a point $\bar{x}\in\operatorname{zer}(A+B)$.

(ii) $y(t)$ converges weakly to a point $\bar{y}\in\lambda B(\bar{x})\cap(-\lambda A(\bar{x}))$.

Proof.

Let $\bar{x}\in\operatorname{zer}(A+B)$ and $\bar{y}\in\lambda B(\bar{x})\cap(-\lambda A(\bar{x}))$ . Denote $\bar{z}=\bar{x}+\bar{y}$ and $z(t)=x(t)+y(t)$ . By using monotonicity of $\lambda A$ followed by monotonicity of $\lambda B$ , we obtain

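$$\begin{aligned}0&\leq\bigl\langle\bar{y}-y(t)-\dot{z}(t),\,\dot{z}(t)+x(t)-\bar{x}\bigr\rangle\\&=\langle\dot{z}(t),\bar{z}-z(t)\rangle-\|\dot{z}(t)\|^{2}-\langle y(t)-\bar{y},x(t)-\bar{x}\rangle\\&\leq-\frac{1}{2}\frac{d}{dt}\|z(t)-\bar{z}\|^{2}-\|\dot{z}(t)\|^{2}.\end{aligned}\tag{13}$$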

In particular, this shows that $\left\|z(t)-\bar{z}\right\|^{2}$ is decreasing, hence $\lim_{t\to\infty}\left\|z(t)-\bar{z}\right\|$ exists, and that $\int_{0}^{\infty}\left\|\dot{z}(t)\right\|^{2}\,dt<+\infty$ . The latter combined with Lemma 2 implies that $\dot{z}(t)\to 0$ as $t\to\infty$ . Monotonicity of $\lambda B$ then yields

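$$\|z(t)-\bar{z}\|^{2}=\|x(t)-\bar{x}\|^{2}+2\langle x(t)-\bar{x},y(t)-\bar{y}\rangle+\|y(t)-\bar{y}\|^{2}\geq\|x(t)-\bar{x}\|^{2}+\|y(t)-\bar{y}\|^{2},$$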

from which it follows that $x(t)$ is bounded. By using the definition of the resolvent $J_{\lambda A}$ , we can express (11) in the form

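$$-\begin{pmatrix}\dot{z}(t)\\ \dot{z}(t)\end{pmatrix}\in\begin{pmatrix}\lambda A\bigl(\dot{z}(t)+x(t)\bigr)\\ (\lambda B)^{-1}\bigl(z(t)-x(t)\bigr)\end{pmatrix}+\begin{pmatrix}0 & \operatorname{Id}\\ -\operatorname{Id} & 0\end{pmatrix}\begin{pmatrix}\dot{z}(t)+x(t)\\ z(t)-x(t)\end{pmatrix}.\tag{14}$$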

Let $(x,z)$ be a weak sequential cluster point of the bounded trajectory $(x(t),z(t))$ . Taking the limit along this subsequence in (14), using Lemma 1, and unraveling the resulting expression gives

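$$\begin{cases}0\in\lambda A(x)+(z-x)\\ x\in(\lambda B)^{-1}(z-x)\end{cases}\implies\begin{cases}x\in\operatorname{zer}(A+B)\\ z\in x+\lambda B(x)\end{cases}\tag{15}$$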

In particular, by combining (13) with (15), we deduce that $\lim_{t\to+\infty}\left\|z(t)-z\right\|^{2}$ exists. Applying Opial’s lemma [10, Lemma 4] then shows that $z(t)$ converges weakly to a point $\bar{z}\in\bar{x}+\lambda B(\bar{x})$ where $\bar{x}$ is a weak sequential cluster point of $x(t)$ . The definition of $J_{\lambda B}$ then yields $\bar{x}=J_{\lambda B}(\bar{z})$ , which implies that $J_{\lambda B}(\bar{z})$ is the unique cluster point of $x(t)$ . The trajectory $x(t)$ therefore converges weakly to a point $\bar{x}\in\operatorname{zer}(A+B)$ . To complete the proof, simply note that $y(t)=z(t)-x(t)\rightharpoonup\bar{z}-\bar{x}\in\lambda B(\bar{x})\cap(-\lambda A(\bar{x}))$ as $t\to+\infty$ . ∎

3 From the Continuous to the Discrete

In this section, we devise a new splitting algorithm by considering different discretizations of the dynamical system (11). For the remainder of this work, we will suppose that $B$ is a single-valued operator. In this case, the system (11) simplifies to

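$$\begin{cases}\dot{x}(t)+\dot{y}(t)=J_{\lambda A}\bigl(x(t)-y(t)\bigr)-x(t),\\ y(t)=\lambda B(x(t)).\end{cases}\tag{16}$$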

In order to discretize this system, let us replace $x(t)\approx x_{k}$ and $y(t)\approx y_{k}$ . As two derivatives appear in (16), there are many combinations of possible discretizations. One involves using forward discretizations of both $\dot{x}(t)$ and $\dot{y}(t)$ , that is,

$$\dot{x}(t)\approx x_{k+1}-x_{k},\qquad\dot{y}(t)\approx y_{k+1}-y_{k}.\tag{17}$$

Under this discretization, (16) becomes

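$$x_{k+1}=J_{\lambda A}\bigl(x_{k}-\lambda B(x_{k})\bigr)-\lambda\bigl(B(x_{k+1})-B(x_{k})\bigr).\tag{18}$$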

As written, this expression does not give rise to a useful algorithm, since $x_{k+1}$ appears on both sides of the equation. However, by setting $z_{k}=x_{k}+y_{k}=(\operatorname{Id}+\lambda B)x_{k}$ and rearranging, we obtain

$$z_{k+1}=z_{k}+J_{\lambda A}\bigl(2J_{\lambda B}(z_{k})-z_{k}\bigr)-J_{\lambda B}(z_{k}),$$

which is precisely the usual Douglas–Rachford algorithm given in (7).
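The equivalence between (18) and the Douglas–Rachford iteration can also be checked numerically. The sketch below uses a hypothetical problem with $A=N_C$ for the box $C=[-1,1]^2$ and an affine monotone $B(x)=Mx+q$. It runs $z_{k+1}=z_k+J_{\lambda A}(2J_{\lambda B}z_k-z_k)-J_{\lambda B}z_k$, sets $x_k=J_{\lambda B}(z_k)$, and measures how far each $x_{k+1}$ is from the right-hand side of (18); the mismatch should be at the level of rounding error.

```python
import math

# Hypothetical test data: A = N_C with C = [-1, 1]^2 and affine B(x) = M x + q.
M = ((0.5, 1.0), (-1.0, 0.5))
q = (2.0, 0.0)


def B(x):
    return (M[0][0] * x[0] + M[0][1] * x[1] + q[0],
            M[1][0] * x[0] + M[1][1] * x[1] + q[1])


def proj_box(x):  # J_{lam A} = P_C
    return tuple(min(max(xi, -1.0), 1.0) for xi in x)


def resolvent_B(v, lam):  # J_{lam B}(v): solve (Id + lam*M) u = v - lam*q
    a, b = 1.0 + lam * M[0][0], lam * M[0][1]
    c, d = lam * M[1][0], 1.0 + lam * M[1][1]
    r0, r1 = v[0] - lam * q[0], v[1] - lam * q[1]
    det = a * d - b * c
    return ((d * r0 - b * r1) / det, (a * r1 - c * r0) / det)


lam = 0.7
z = (0.3, -0.7)
worst = 0.0
for _ in range(50):
    x = resolvent_B(z, lam)
    # Douglas-Rachford step: z_{k+1} = z_k + J_{lam A}(2 x_k - z_k) - x_k
    p = proj_box(tuple(2.0 * xi - zi for xi, zi in zip(x, z)))
    z_next = tuple(zi + pi - xi for zi, pi, xi in zip(z, p, x))
    x_next = resolvent_B(z_next, lam)
    # Right-hand side of (18): J_{lam A}(x_k - lam B(x_k)) - lam (B(x_{k+1}) - B(x_k))
    bx, bx_next = B(x), B(x_next)
    fb = proj_box(tuple(xi - lam * bi for xi, bi in zip(x, bx)))
    rhs = tuple(f - lam * (bn - b) for f, bn, b in zip(fb, bx_next, bx))
    worst = max(worst, math.dist(x_next, rhs))
    z = z_next

print(f"largest mismatch with (18): {worst:.2e}")
```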

To derive a new algorithm, we consider a different discretization of (16). To this end, we perform a forward discretization of $\dot{x}(t)$ and a backward discretization of $\dot{y}(t)$ , that is,

$$\dot{x}(t)\approx x_{k+1}-x_{k},\qquad\dot{y}(t)\approx y_{k}-y_{k-1}.\tag{19}$$

Under this discretization, (16) becomes

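$$x_{k+1}=J_{\lambda A}\bigl(x_{k}-\lambda B(x_{k})\bigr)-\lambda\bigl(B(x_{k})-B(x_{k-1})\bigr).\tag{20}$$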

Although not surprising, it is interesting to note that (18) and (20) differ only in the indices appearing in their last two terms. In particular, $x_{k+1}$ does not appear on the right-hand side of (20), so the iteration is explicit.

Before turning our attention to the convergence properties of this iteration, we make the following remark.

Remark 1.

Backward (resp. forward) discretizations of a derivative usually give rise to the same type of step in the corresponding discrete algorithm. This is, for instance, the case for the forward-backward method, which includes the discussion from Section 1 as a special case. It is curious to note, however, that here forward (resp. backward) discretizations gave rise to backward (resp. forward) operators in the discrete counterparts. In particular, two forward discretizations of (16) gave rise to the Douglas–Rachford algorithm, which has two backward steps, whereas one forward and one backward discretization produced a method also having one forward and one backward step.

We now prove the following preparatory lemma, which might be of interest in its own right due to the very general form of the recurrence relation.

Lemma 3.

Let $A\colon\mathcal{H}\rightrightarrows\mathcal{H}$ be a maximal monotone operator and let $(y_{k})\subset\mathcal{H}$ be an arbitrary sequence. Let $x_{0}\in\mathcal{H}$ and consider $(x_{k})$ defined by

$$x_{k+1}=J_{A}(x_{k}-y_{k})-(y_{k}-y_{k-1})\qquad\forall k\in\mathbb{N}.\tag{21}$$

Then, for all $x\in\mathcal{H}$ and $y\in-A(x)$ , we have

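$$\begin{aligned}\|(x_{k+1}+y_{k})-(x+y)\|^{2}\leq{}&\|(x_{k}+y_{k-1})-(x+y)\|^{2}-2\langle y_{k}-y,x_{k}-x\rangle\\&+4\langle y_{k}-y_{k-1},x_{k}-x_{k+1}\rangle-\|x_{k+1}-x_{k}\|^{2}-3\|y_{k}-y_{k-1}\|^{2}.\end{aligned}\tag{22}$$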

Proof.

By the definition of the resolvent and (21), it follows that

$$x_{k+1}-x_{k}+y_{k}+(y_{k}-y_{k-1})\in-A\bigl(x_{k+1}+y_{k}-y_{k-1}\bigr).\tag{23}$$

Since $-y\in A(x)$ and $A$ is monotone, we have

$$0\leq\bigl\langle x_{k+1}-x_{k}+y_{k}+(y_{k}-y_{k-1})-y,\;x-x_{k+1}-(y_{k}-y_{k-1})\bigr\rangle,$$

which is equivalent to

[TABLE]

To simplify (24), we note that

[TABLE]

Now, using the above three identities in (24), we obtain

[TABLE]

The equivalence between the last inequality and (22) is now obvious. ∎
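Because Lemma 3 allows an arbitrary sequence $(y_k)$, its conclusion is easy to test numerically. The sketch below makes illustrative assumptions throughout: $A=N_C$ for the box $C=[-1,1]^2$ (so $J_A=P_C$), a random sequence $(y_k)$, and the reference pair $x=(1,1)$, $y=(-0.5,-0.2)$, which satisfies $y\in-A(x)$ because $N_C(1,1)$ is the nonnegative orthant. It records the smallest slack between the right- and left-hand sides of the Lemma 3 inequality, written out in the code comments.

```python
import math
import random


def proj_box(x):  # J_A for A = N_C, the normal cone of C = [-1, 1]^2 (illustrative choice)
    return tuple(min(max(xi, -1.0), 1.0) for xi in x)


def add(u, v):
    return tuple(a + b for a, b in zip(u, v))


def sub(u, v):
    return tuple(a - b for a, b in zip(u, v))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


random.seed(0)
N = 200
# Arbitrary sequence y_{-1}, y_0, ..., y_{N-1}, stored with an index shift of one.
ys = [tuple(random.uniform(-3.0, 3.0) for _ in range(2)) for _ in range(N + 1)]

# Reference pair with y in -A(x): N_C(1, 1) is the nonnegative orthant and -y >= 0.
x_ref, y_ref = (1.0, 1.0), (-0.5, -0.2)
s_ref = add(x_ref, y_ref)

x = (0.4, -2.0)  # x_0
min_slack = math.inf
for k in range(N):
    y_prev, y_k = ys[k], ys[k + 1]           # y_{k-1}, y_k
    d = sub(y_k, y_prev)
    x_next = sub(proj_box(sub(x, y_k)), d)   # iteration (21)
    # ||x_{k+1} + y_k - (x + y)||^2 <= ||x_k + y_{k-1} - (x + y)||^2
    #   - 2<y_k - y, x_k - x> + 4<y_k - y_{k-1}, x_k - x_{k+1}>
    #   - ||x_{k+1} - x_k||^2 - 3||y_k - y_{k-1}||^2
    e1 = sub(add(x_next, y_k), s_ref)
    e0 = sub(add(x, y_prev), s_ref)
    step = sub(x_next, x)
    lhs = dot(e1, e1)
    rhs = (dot(e0, e0) - 2.0 * dot(sub(y_k, y_ref), sub(x, x_ref))
           + 4.0 * dot(d, sub(x, x_next)) - dot(step, step) - 3.0 * dot(d, d))
    min_slack = min(min_slack, rhs - lhs)
    x = x_next

print(f"smallest slack rhs - lhs: {min_slack:.3e}")
```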

Since (20) is of the form specified by Lemma 3, this lemma suggests one possible way to prove convergence of (20): the quantity $\left\|x_{k}+y_{k-1}-x-y\right\|^{2}$ will be decreasing if the other terms on the right-hand side of (22) can be estimated appropriately. The following theorem, which is our main result regarding convergence of (20), makes use of this observation.

Theorem 3.

Let $A:\mathcal{H}\rightrightarrows\mathcal{H}$ be maximally monotone and $B:\mathcal{H}\to\mathcal{H}$ be monotone and $L$ -Lipschitz with $\operatorname{zer}(A+B)\neq\varnothing$ . Let $\varepsilon>0$ , $\lambda\in\left[\varepsilon,\frac{1-3\varepsilon}{3L}\right]$ and let $x_{0},x_{-1}\in\mathcal{H}$ . Then the sequence $(x_{k})$ , generated by (20), satisfies

(i) $(x_{k})$ converges weakly to a point $\overline{x}\in\operatorname{zer}(A+B)$.

(ii) $(B(x_{k}))$ converges weakly to $B(\bar{x})$.

Proof.

Let $x\in\operatorname{zer}(A+B)$ and set $y=\lambda B(x)\in-\lambda A(x)$. Since (20) is of the form specified by (21), we apply Lemma 3 to the monotone operator $\lambda A$ with $y_{k}=\lambda B(x_{k})$ to deduce that the inequality (22) holds. Now, using that $B$ is monotone, we have $\left\langle y_{k}-y,x_{k}-x\right\rangle\geq 0$ and hence

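$$\|(x_{k+1}+y_{k})-(x+y)\|^{2}\leq\|(x_{k}+y_{k-1})-(x+y)\|^{2}+4\langle y_{k}-y_{k-1},x_{k}-x_{k+1}\rangle-\|x_{k+1}-x_{k}\|^{2}-3\|y_{k}-y_{k-1}\|^{2}.\tag{26}$$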

Next, we estimate the inner product in the last line of (26). To this end, note that Young’s inequality gives

[TABLE]

and that Lipschitzness of $B$ yields

[TABLE]

Combining these two estimates with (26) gives the inequality

[TABLE]

By denoting $z_{k}=x_{k}+y_{k-1}$ and $z=x+y$ , the previous inequality implies

[TABLE]

which telescopes to yield

[TABLE]

From this, it follows that $(z_{k})$ is bounded and that ${\left\|x_{k}-x_{k-1}\right\|\to 0}$ . The latter, together with Lipschitz continuity of $B$ , implies ${\left\|y_{k}-y_{k-1}\right\|\to 0}$ and, consequently, we also have that $\left\|z_{k}-z_{k-1}\right\|\to 0$ . Since $z_{k}=(\operatorname{Id}+\lambda B)x_{k}+(y_{k-1}-y_{k})$ , we have

$$x_{k}=J_{\lambda B}\bigl(z_{k}+(y_{k}-y_{k-1})\bigr).$$

Since $(z_{k})$ is bounded, $\left\|y_{k}-y_{k-1}\right\|\to 0$ and $J_{\lambda B}$ is nonexpansive, it follows that the sequence $(x_{k})$ is also bounded. Also, due to (29), the following limit exists

[TABLE]

Now, by using the definition of the resolvent $J_{\lambda A}$ , we can express (23) in the form

[TABLE]

Let $(x,z)$ be a weak cluster point of the bounded sequence $(x_{k},z_{k})$ . Taking the limit along this subsequence in (30), using Lemma 1, and unravelling the resulting expression gives

[TABLE]

Applying Opial’s lemma [7, Lemma 2.39] then shows that $(z_{k})$ converges weakly to a point $\bar{z}=\bar{x}+\lambda B(\bar{x})$, where $\bar{x}$ is a weak cluster point of $(x_{k})$. The definition of $J_{\lambda B}$ then yields $\bar{x}=J_{\lambda B}(\bar{z})$, which implies that $J_{\lambda B}(\bar{z})$ is the unique weak cluster point of $(x_{k})$. The sequence $(x_{k})$ therefore converges weakly to a point $\bar{x}\in\operatorname{zer}(A+B)$. To complete the proof, simply note that $y_{k-1}=z_{k}-x_{k}\rightharpoonup\bar{z}-\bar{x}=\lambda B(\bar{x})$ as $k\to\infty$. ∎

Some remarks regarding Theorem 3 and its proof are in order.

Remark 2 (Continuous and discrete proofs).

The sequence $z_{k}=x_{k}+y_{k-1}$ plays a similar role in Theorem 3 to the trajectory $z(t)=x(t)+y(t)$ in Theorem 2. This does however highlight a subtle difference between the two proofs — in the discrete case, we have $x_{k}=J_{\lambda B}\bigl{(}z_{k}+(y_{k}-y_{k-1})\bigr{)}$ whereas, in the continuous case, we have $x(t)=J_{\lambda B}(z(t))$ . Note also that although our combination of the estimates (27) and (28) for $\left\langle y_{k}-y_{k-1},x_{k}-x_{k+1}\right\rangle$ may appear somewhat arbitrary, the combination of these two inequalities is in fact optimal.

Remark 3.

Although we were unable to prove it in Theorem 3, we conjecture that the interval in which $\lambda$ lies can be extended to $\lambda\in(0,\frac{1}{2L})$. Our original motivation for considering the continuous dynamical system (11) did not arise from its connection to the Douglas–Rachford algorithm, but rather from its connection to the operator splitting method studied in [20], given by

$$x_{k+1}=J_{\lambda A}\bigl(x_{k}-2\lambda B(x_{k})+\lambda B(x_{k-1})\bigr).\tag{32}$$

Note that the iterations (20) and (32) look very similar and, in fact, coincide if $J_{A}$ is the identity operator. For (32), convergence has been established when $\lambda<\frac{1}{2L}$ , which is slightly better than for (20). Thus, in the case that $A=0$ , this provides some evidence for the conjecture.

On the other hand, the analysis of dynamical systems corresponding to (32) is more complicated. In particular, a natural candidate for a continuous analogue of (32) is given by

[TABLE]

Because we are unable to couple the derivatives $\dot{x}(t)$ and $\dot{y}(t)$ in (33) in general, it is not clear how to prove existence of its trajectory $x(t)$ .
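Taking (32) to be the forward-reflected-backward iteration $x_{k+1}=J_{\lambda A}(x_k-2\lambda B(x_k)+\lambda B(x_{k-1}))$ of [20] (our reading of that reference), the statement that (20) and (32) coincide when the resolvent is the identity can be checked directly. With $A=0$ the two iterations generate the same sequence up to rounding, while with a projection as resolvent they generally differ. The affine $B$ and the box $C=[-1,1]^2$ below are hypothetical test data.

```python
import math

# Hypothetical test data: affine monotone B(x) = M x + q and the box C = [-1, 1]^2.
M = ((0.5, 1.0), (-1.0, 0.5))
q = (2.0, 0.0)
lam = 0.9 / (3 * math.sqrt(1.25))   # L = ||M|| = sqrt(1.25)


def B(x):
    return (M[0][0] * x[0] + M[0][1] * x[1] + q[0],
            M[1][0] * x[0] + M[1][1] * x[1] + q[1])


def identity(x):  # J_{lam A} when A = 0
    return x


def proj_box(x):  # J_{lam A} when A = N_C
    return tuple(min(max(xi, -1.0), 1.0) for xi in x)


def step_20(J, x, x_prev):
    # (20): J(x_k - lam B(x_k)) - lam (B(x_k) - B(x_{k-1}))
    bx, bp = B(x), B(x_prev)
    p = J(tuple(xi - lam * bi for xi, bi in zip(x, bx)))
    return tuple(pi - lam * (bi - bpi) for pi, bi, bpi in zip(p, bx, bp))


def step_32(J, x, x_prev):
    # (32), read as J(x_k - 2 lam B(x_k) + lam B(x_{k-1}))
    bx, bp = B(x), B(x_prev)
    return J(tuple(xi - 2 * lam * bi + lam * bpi for xi, bi, bpi in zip(x, bx, bp)))


def max_gap(J, iters=100):
    """Largest distance between the two sequences started from x_0 = x_{-1} = 0."""
    a_prev = a = b_prev = b = (0.0, 0.0)
    gap = 0.0
    for _ in range(iters):
        a, a_prev = step_20(J, a, a_prev), a
        b, b_prev = step_32(J, b, b_prev), b
        gap = max(gap, math.dist(a, b))
    return gap


gap_identity = max_gap(identity)  # A = 0: the sequences coincide up to rounding
gap_box = max_gap(proj_box)       # A = N_C: the projections make them differ
print(gap_identity, gap_box)
```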

4 Primal-Dual Algorithms

In this section, we use Lemma 3 from Section 3 to analyse a new primal-dual algorithm. Consider the bilinear convex-concave saddle point problem

$$\min_{u\in\mathcal{H}_{1}}\max_{v\in\mathcal{H}_{2}}\;g(u)+\langle Ku,v\rangle-f^{*}(v),\tag{34}$$

where $g\colon\mathcal{H}_{1}\to(-\infty,+\infty]$ , $f\colon\mathcal{H}_{2}\to(-\infty,+\infty]$ are proper convex lsc functions, $K\colon\mathcal{H}_{1}\to\mathcal{H}_{2}$ is a bounded linear operator with norm $\left\|K\right\|$ , and $f^{*}$ denotes the Fenchel conjugate of $f$ . A popular method to solve this problem is the primal-dual method [11] defined by

[TABLE]

Under the assumption that the solution set of (34) is non-empty and that $\tau\sigma\left\|K\right\|^{2}<1$ , one can prove that the sequence $(u_{k},v_{k})$ weakly converges to a saddle point of (34).
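For reference, here is a sketch of the primal-dual method (35), assuming it takes the standard form from [11]: $u_{k+1}=\operatorname{prox}_{\tau g}(u_k-\tau K^*v_k)$ and $v_{k+1}=\operatorname{prox}_{\sigma f^*}(v_k+\sigma K(2u_{k+1}-u_k))$. The one-dimensional instance of (34), with $g(u)=\tfrac12(u-3)^2$, $f=|\cdot|$ and $K=1$, is hypothetical. Its saddle point is $(u,v)=(2,1)$, since $-v=u-3$ and $u\in\partial f^*(1)=[0,+\infty)$.

```python
# Hypothetical one-dimensional instance of (34): g(u) = (u - 3)^2 / 2, f(w) = |w|, K = 1.
# Its saddle point is (u, v) = (2, 1).


def prox_g(w, tau):
    # argmin_u (u - 3)^2 / 2 + (u - w)^2 / (2 tau)
    return (w + 3.0 * tau) / (1.0 + tau)


def prox_f_conj(w, sigma):
    # f* is the indicator of [-1, 1], so its prox is the projection onto [-1, 1]
    # (independent of sigma).
    return min(max(w, -1.0), 1.0)


def primal_dual(u, v, tau, sigma, iters, K=1.0):
    """Primal-dual iteration in the standard form of [11], assumed to match (35)."""
    assert tau * sigma * K * K < 1.0, "stepsizes must satisfy tau*sigma*||K||^2 < 1"
    for _ in range(iters):
        u_next = prox_g(u - tau * K * v, tau)                       # K* = K for a scalar
        v = prox_f_conj(v + sigma * K * (2.0 * u_next - u), sigma)  # extrapolated primal point
        u = u_next
    return u, v


u, v = primal_dual(0.0, 0.0, tau=0.9, sigma=0.9, iters=200)
print(u, v)
```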

In the spirit of (20), we propose the following novel primal-dual algorithm:

[TABLE]

In the following theorem, we prove convergence of this algorithm. As one can see, the conditions required for its convergence are exactly the same as for (35). Rather than present the full proof, we only focus on the most important ingredient: the fact that $(u_{k})$ and $(v_{k})$ remain bounded. Once this is established, the rest of the proof follows the standard argument, as in Theorem 3.

Theorem 4.

Let $g\colon\mathcal{H}_{1}\to(-\infty,+\infty]$ and $f\colon\mathcal{H}_{2}\to(-\infty,+\infty]$ be proper convex lsc functions and $K\colon\mathcal{H}_{1}\to\mathcal{H}_{2}$ be a bounded linear operator with norm $\left\|K\right\|$ such that the solution set of (34) is nonempty. Let $\tau\sigma\left\|K\right\|^{2}<1$, let $u_{0}\in\mathcal{H}_{1}$, and let $v_{0}\in\mathcal{H}_{2}$. Then the sequence $(u_{k},v_{k})$, generated by (36), converges weakly to a solution of (34).

Proof.

Let $(u,v)$ be a saddle point of (34). Then the first-order optimality conditions give $-K^{*}v\in\partial g(u)$ and $Ku\in\partial f^{*}(v)$ . By applying Lemma 3 for a fixed $k\in\mathbb{N}$ with

[TABLE]

we obtain

[TABLE]

where, instead of (22), we used its equivalent form (25). Similarly, by applying Lemma 3 for a fixed $k\in\mathbb{N}$ with

[TABLE]

we obtain

[TABLE]

By applying Young’s inequality and using the inequality $\tau\sigma\left\|K\right\|^{2}<1$ , we have

[TABLE]

Now, multiplying (37) by $1/\tau$ , (38) by $1/\sigma$ , summing these two inequalities, and then using the estimate (39) yields

[TABLE]

By telescoping this inequality, one obtains boundedness of $(u_{k})$ and $(v_{k})$ . In fact, a slightly tighter estimation in (39) would yield $\left\|u_{k}-u_{k-1}\right\|\to 0$ and $\left\|v_{k}-v_{k-1}\right\|\to 0$ (since the inequality $\tau\sigma\left\|K\right\|^{2}<1$ is strict). ∎

Although we do not yet know whether the proposed scheme (36) has any benefits compared to (35), we expect both algorithms to perform very similarly. Nevertheless, the fact that the Lyapunov function associated with the analysis of (36) differs from the one used for (35) might be of interest for deriving new extensions.

5 Concluding Remarks/Future Directions

In this work, we proposed and analyzed a new algorithm for finding a zero in the sum of two monotone operators, one of which is assumed to be Lipschitz continuous. This algorithm naturally arises from a non-standard discretization of a continuous dynamical system associated with the Douglas–Rachford algorithm. To conclude, we outline possible directions for future work.

•

Extending the stepsize: In our main result, Theorem 3, we established convergence whenever $\lambda<\frac{1}{3L}$. However, for the reasons discussed in Remark 3, the upper bound can be improved to $\lambda<\frac{1}{2L}$, at least when $A=0$. It would be interesting to either improve this bound in general or show, by means of a counterexample, that the condition $\lambda<\frac{1}{3L}$ is optimal. Furthermore, it would also be interesting to investigate the optimal convergence rate under additional assumptions, as was done in [23] for the classical Douglas–Rachford algorithm.

•

Linesearch: It would be interesting to incorporate a linesearch procedure into the shadow Douglas–Rachford method. Similarly, it makes sense to consider a continuous dynamical system with variable steps, as was done, for example, in [6] for Tseng’s method.

•

Inertial terms: It is important to study extensions of (11) and (20) that incorporate additional inertial and relaxation terms, as was done in the recent work [5] for the forward-backward method. Combining inertial and relaxation effects allows one to go beyond the standard bound of $\frac{1}{3}$ for the parameter associated with the inertial term.

•

Role of reflection: Perhaps the most interesting and challenging direction for future work is to understand why the inclusion of a “reflection term” in an algorithm allows convergence to be proven under milder hypotheses. For instance, applied to the saddle point problem (34), the famous Arrow–Hurwicz algorithm [4] can fail to converge. In contrast, both (35) and (36), which can be viewed as its “reflected” modifications, do converge. Similarly, for the monotone variational inequality $0\in N_{C}(x)+B(x)$, where $C$ is a closed convex set and $N_{C}$ is its normal cone, the projected gradient algorithm

$$x_{k+1}=P_{C}\bigl(x_{k}-\lambda B(x_{k})\bigr)$$

does not work, but its “reflected” modification [19] given by

$$x_{k+1}=P_{C}\bigl(x_{k}-\lambda B(2x_{k}-x_{k-1})\bigr)$$

does converge to a solution. For the more general monotone inclusion ${0\in A(x)+B(x)}$, the forward-backward method also does not work, whereas both of its “reflected” modifications, (20) and (32), do. We note, however, that although all of the aforementioned algorithms share the same “reflection term”, their analyses are not the same. It would be interesting to understand the deeper reasons for their success.
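The contrast between the projected gradient algorithm and its reflected modification can be seen on a small hypothetical example: $B$ is the rotation $B(x_1,x_2)=(x_2,-x_1)$ (monotone, $L=1$) and $C=[-2,2]^2$, so the unique solution of $0\in N_C(x)+B(x)$ is the origin. For the projected gradient iterates, $\|x_k\|\ge\min(\|x_0\|,2)$ for every $k$: an unprojected step multiplies the norm by $\sqrt{1+\lambda^2}$, and a projected point lies on the boundary of $C$. The reflected iterates use $\lambda=0.3<(\sqrt{2}-1)/L$, which is the stepsize range we recall from [19], and approach the origin.

```python
import math


def B(x):
    # Hypothetical rotation field: monotone, 1-Lipschitz, not cocoercive; zero only at the origin.
    return (x[1], -x[0])


def proj_box(x, r=2.0):
    # P_C for the hypothetical box C = [-r, r]^2; the origin is interior, so it is the unique solution.
    return tuple(min(max(xi, -r), r) for xi in x)


def projected_gradient(x, lam, iters):
    for _ in range(iters):
        bx = B(x)
        x = proj_box(tuple(xi - lam * bi for xi, bi in zip(x, bx)))
    return x


def projected_reflected_gradient(x, lam, iters):
    x_prev = x
    for _ in range(iters):
        bx = B(tuple(2.0 * xi - pi for xi, pi in zip(x, x_prev)))  # B at the reflected point
        x_prev, x = x, proj_box(tuple(xi - lam * bi for xi, bi in zip(x, bx)))
    return x


lam = 0.3
pg = projected_gradient((1.0, 0.0), lam, 1000)
prg = projected_reflected_gradient((1.0, 0.0), lam, 1000)
print(f"projected gradient: |x| = {math.hypot(*pg):.3f}")
print(f"reflected variant:  |x| = {math.hypot(*prg):.3e}")
```

The only change between the two loops is the point at which $B$ is evaluated, $x_k$ versus the reflected point $2x_k-x_{k-1}$.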

Acknowledgements.

E.R. Csetnek was supported by Austrian Science Fund Project P 29809-N32. Y. Malitsky was supported by German Research Foundation grant SFB755-A4. The authors would also like to thank the Erwin Schrödinger Institute for their support and hospitality during the thematic program “Modern Maximal Monotone Operator Theory: From Nonsmooth Optimization to Differential Inclusions”.

Bibliography

[1] Abbas, B., Attouch, H., and Svaiter, B. F. Newton-like dynamics and forward-backward methods for structured monotone inclusions in Hilbert spaces. J. Optim. Theory Appl. 161, 2 (2014), 331–360.
[2] Al’ber, Y. I. Continuous regularization of linear operator equations in a Hilbert space. Mathematical Notes of the Academy of Sciences of the USSR 4, 5 (1968), 793–797.
[3] Antipin, A. S. Minimization of convex functions on convex sets by means of differential equations. Differential Equations 30, 9 (1994), 1365–1375.
[4] Arrow, K., and Hurwicz, L. Gradient methods for constrained maxima. Operations Research 5, 2 (1957), 258–265.
[5] Attouch, H., and Cabot, A. Convergence of a relaxed inertial forward-backward algorithm for structured monotone inclusions. Preprint hal-01782016 (2018).
[6] Banert, S., and Boţ, R. I. A forward-backward-forward differential equation and its asymptotic properties. Journal of Convex Analysis 25, 2 (2018), 371–388.
[7] Bauschke, H. H., and Combettes, P. L. Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 1st ed. Springer Science+Business Media, New York, 2011.
[8] Bello Cruz, J., and Díaz Millán, R. A variant of forward-backward splitting method for the sum of two monotone operators with a new search strategy. Optim. 64, 7 (2015), 1471–1486.
