Bregman Forward-Backward Operator Splitting

Minh N. Bùi; Patrick L. Combettes

arXiv:1908.03878 · math.OC · September 29, 2020


TL;DR

This paper proves the convergence of a Bregman distance-based forward-backward splitting algorithm for the sum of two monotone operators in reflexive Banach spaces, extending known results beyond minimization problems and providing sharper convergence rates in the minimization setting.

Contribution

It introduces a Bregman forward-backward splitting framework with iteration-varying Bregman distances and a new assumption on the single-valued operator, which broadens applicability and improves convergence rates in the minimization setting.

Findings

1. Convergence established for the algorithm in reflexive Banach spaces.
2. Sharper convergence rates in the minimization setting.
3. Framework accommodates iteration-varying Bregman distances.

Abstract

We establish the convergence of the forward-backward splitting algorithm based on Bregman distances for the sum of two monotone operators in reflexive Banach spaces. Even in Euclidean spaces, the convergence of this algorithm has so far been proved only in the case of minimization problems. The proposed framework features Bregman distances that vary over the iterations and a novel assumption on the single-valued operator that captures various properties scattered in the literature. In the minimization setting, we obtain rates that are sharper than existing ones.

Equations

(\forall x\in C)(\forall y\in C)(\forall z\in\mathscr{S})(\forall y^{*}\in Ay)(\forall z^{*}\in Az)\quad\langle y-x,By-Bz\rangle\leqslant\kappa D_{f}(x,y)+\big\langle y-z,\delta_{1}(y^{*}-z^{*})+\delta_{2}(By-Bz)\big\rangle.

\text{find }x\in\operatorname{int}\operatorname{dom}f\text{ such that }0\in Ax+Bx.

(\forall n\in\mathbb{N})\quad x_{n+1}=\big(\nabla f_{n}+\gamma_{n}A\big)^{-1}\big(\nabla f_{n}(x_{n})-\gamma_{n}Bx_{n}\big),

(\forall n\in\mathbb{N})\quad x_{n+1}=\big(\nabla f+\gamma_{n}A\big)^{-1}\big(\nabla f(x_{n})\big)

(\forall n\in\mathbb{N})\quad x_{n+1}=\big(U_{n}+\gamma_{n}A\big)^{-1}\big(U_{n}x_{n}-\gamma_{n}Bx_{n}\big)

(\forall n\in\mathbb{N})\quad x_{n+1}=\big(\nabla f_{n}+\gamma_{n}\partial\varphi\big)^{-1}\big(\nabla f_{n}(x_{n})-\gamma_{n}\nabla\psi(x_{n})\big)

(\forall n\in\mathbb{N})\quad x_{n+1}=\big(\nabla f+\gamma A\big)^{-1}\big(\nabla f(x_{n})-\gamma Bx_{n}\big)

\big(\forall(x_{1},x_{1}^{*})\in\operatorname{gra}M\big)\big(\forall(x_{2},x_{2}^{*})\in\operatorname{gra}M\big)\quad\langle x_{1}-x_{2},x_{1}^{*}-x_{2}^{*}\rangle\geqslant 0,

\partial f\colon\mathcal{X}\to 2^{\mathcal{X}^{*}}\colon x\mapsto\big\{x^{*}\in\mathcal{X}^{*}\;\big|\;(\forall y\in\mathcal{X})\;\langle y-x,x^{*}\rangle+f(x)\leqslant f(y)\big\}.

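D_{f}\colon\mathcal{X}\times\mathcal{X}\to\left[0,{+}\infty\right]\colon(x,y)\mapsto\begin{cases}f(x)-f(y)-\langle x-y,\nabla f(y)\rangle,&\text{if }y\in\operatorname{int}\operatorname{dom}f;\\{+}\infty,&\text{otherwise.}\end{cases}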

\mathcal{C}_{\alpha}(f)=\big\{g\in\Gamma_{0}(\mathcal{X})\;\big|\;\operatorname{dom}g=\operatorname{dom}f,\ g\text{ is Gâteaux differentiable on }\operatorname{int}\operatorname{dom}f,\ D_{g}\geqslant\alpha D_{f}\big\}.

(\forall x\in C)(\forall y\in C)(\forall z\in\mathscr{S})\quad D_{\psi}(x,y)\leqslant\kappa D_{f}(x,y)+D_{\psi}(x,z)+D_{\psi}(z,y).

\big(\forall(x,x^{*})\in\operatorname{gra}(A+B)\big)\big(\forall(y,y^{*})\in\operatorname{gra}(A+B)\big)\quad\langle x-y,x^{*}-y^{*}\rangle\geqslant\beta\|Bx-By\|^{2},

(\forall x\in\mathcal{X})(\forall y\in\mathcal{X})\quad\langle x-y,Bx-By\rangle\geqslant\beta\|Bx-By\|^{2}.

(\forall x\in\mathcal{X})(\forall y\in\mathcal{X})(\forall z\in\mathcal{X})\quad\langle y-z,Bz-Bx\rangle\leqslant\frac{1}{4\beta\nu}\langle x-y,Bx-By\rangle.

(\forall x\in C)(\forall y\in C)(\forall z\in\mathscr{S})\quad\kappa D_{f}(x,y)\geqslant D_{\psi}(x,y)-D_{\psi}(x,z)-D_{\psi}(z,y)=\langle z-x,By-Bz\rangle.

(\forall x\in\operatorname{\overline{dom}}A)(\forall y\in\operatorname{\overline{dom}}A)\quad\langle x-y,\nabla f(x)-\nabla f(y)\rangle\geqslant\alpha\|x-y\|^{2}.

\begin{aligned}D_{f}(x,y)&=\int_{0}^{1}\big\langle x-y,\nabla f(y+t(x-y))-\nabla f(y)\big\rangle\,dt\\&\geqslant\int_{0}^{1}t\alpha\|x-y\|^{2}\,dt\\&=\frac{\alpha}{2}\|x-y\|^{2}.\end{aligned}

\begin{aligned}&(\forall x\in C)\big(\forall(y,y^{*})\in\operatorname{gra}A\big)\big(\forall(z,z^{*})\in\operatorname{gra}A\big)\\&\qquad\langle y-x,By-Bz\rangle\leqslant\frac{\|y-x\|^{2}}{2(2\beta-\varepsilon)}+\frac{2\beta-\varepsilon}{2}\|By-Bz\|^{2}\\&\qquad\hphantom{\langle y-x,By-Bz\rangle}\leqslant\kappa D_{f}(x,y)+\big\langle y-z,\delta_{1}(y^{*}-z^{*})+\delta_{2}(By-Bz)\big\rangle.\end{aligned}

\big(\forall(x,x^{*})\in\operatorname{gra}(A+B)\big)\big(\forall(y,y^{*})\in\operatorname{gra}(A+B)\big)\quad\langle x-y,x^{*}-y^{*}\rangle\geqslant\mu\|x-y\|^{2}\geqslant\beta\|Bx-By\|^{2}.

\begin{aligned}&(\forall x\in C)\big(\forall(y,y^{*})\in\operatorname{gra}A\big)\big(\forall(z,z^{*})\in\operatorname{gra}A\big)\\&\qquad\langle y-x,By-Bz\rangle\leqslant\kappa D_{f}(x,y)+\big\langle y-z,\delta_{2}(By-Bz)\big\rangle.\end{aligned}

B=\nabla\psi,\quad\operatorname{Argmin}\psi=\{0\},\quad\text{and}\quad\nabla\psi(K)\subset K.

(\nabla f_{n}+\gamma_{n}A)^{-1}\text{ is single-valued on }\operatorname{dom}(\nabla f_{n}+\gamma_{n}A)^{-1}=\operatorname{ran}(\nabla f_{n}+\gamma_{n}A).

\operatorname{ran}(\nabla f_{n}+\gamma_{n}A)^{-1}=\operatorname{dom}\nabla f_{n}\cap\operatorname{dom}A=(\operatorname{int}\operatorname{dom}f_{n})\cap\operatorname{dom}A=C.

\delta_{1}\gamma_{n+1}\leqslant(1-\varepsilon)\gamma_{n}.

\begin{cases}x_{n+1}^{*}=\gamma_{n}^{-1}\big{(}\nabla f_{n}(x_{n})-\nabla f_{n}(x_{n+1})\big{)}-Bx_{n}\\ \Delta_{n}=D_{f_{n}}(z,x_{n})+\delta_{1}\gamma_{n}\langle{{x_{n}-z},{x_{n}^{*}+Bz}}\rangle\\ \theta_{n}=(1-\kappa\gamma_{n}/\alpha)D_{f_{n}}(x_{n+1},x_{n})\\ \qquad\;+\varepsilon\gamma_{n}\langle{{x_{n+1}-z},{x_{n+1}^{*}+Bz}}\rangle+(1-\delta_{2})\gamma_{n}\langle{{x_{n}-z},{Bx_{n}-Bz}}\rangle.\end{cases}


Full text

Bregman Forward-Backward Operator Splitting

Contact author: P. L. Combettes, [email protected], phone: +1 (919) 515 2671. This work was supported by the National Science Foundation under grant DMS-1818946.

Minh N. Bùi and Patrick L. Combettes

North Carolina State University, Department of Mathematics, Raleigh, NC 27695-8205, USA

[email protected] and [email protected]


Dedicated to Terry Rockafellar on the occasion of his 85th birthday


Keywords. Banach space, Bregman distance, forward-backward splitting, Legendre function, monotone operator.

1 Introduction

Throughout, ${\mathcal{X}}$ is a reflexive real Banach space with topological dual ${\mathcal{X}}^{*}$ . We are concerned with the following monotone inclusion problem (see Section 2.1 for notation and definitions).

Problem 1.1

Let $A\colon{\mathcal{X}}\rightarrow 2^{{\mathcal{X}}^{*}}$ and $B\colon{\mathcal{X}}\rightarrow 2^{{\mathcal{X}}^{*}}$ be maximally monotone, let $f\in\Gamma_{0}({\mathcal{X}})$ be essentially smooth, and let $D_{f}$ be the Bregman distance associated with $f$ . Set $C=(\operatorname{int}\operatorname{dom}f)\cap\operatorname{dom}A$ and $\mathscr{S}=(\operatorname{int}\operatorname{dom}f)\cap\operatorname{zer}(A+B)$ . Suppose that $C\subset\operatorname{int}\operatorname{dom}B$ , $\mathscr{S}\neq\varnothing$ , $B$ is single-valued on $\operatorname{int}\operatorname{dom}B$ , and there exist $\delta_{1}\in\left[0,1\right[$ , $\delta_{2}\in[0,1]$ , and $\kappa\in\left[0,{+}\infty\right[$ such that

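$$(\forall x\in C)(\forall y\in C)(\forall z\in\mathscr{S})(\forall y^{*}\in Ay)(\forall z^{*}\in Az)\quad\langle y-x,By-Bz\rangle\leqslant\kappa D_{f}(x,y)+\big\langle y-z,\delta_{1}(y^{*}-z^{*})+\delta_{2}(By-Bz)\big\rangle.\tag{1.1}$$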

The objective is to

$$\text{find }x\in\operatorname{int}\operatorname{dom}f\text{ such that }0\in Ax+Bx.\tag{1.2}$$

The central problem (1.2) has extensive connections with various areas of mathematics and its applications. In Hilbert spaces, if $B$ is cocoercive, a standard method for solving (1.2) is the forward-backward algorithm, which operates with the update $x_{n+1}=(\operatorname{Id}+\gamma A)^{-1}(x_{n}-\gamma Bx_{n})$ [17]. This iteration is not applicable beyond Hilbert spaces since $A$ maps to ${\mathcal{X}}^{*}\neq{\mathcal{X}}$ . In addition, there has been a significant body of work (see, e.g., [3, 6, 8, 12, 13, 16, 18, 19, 23]) showing the benefits of replacing standard distances by Bregman distances, even in Euclidean spaces. Given a sequence $(\gamma_{n})_{n\in\mathbb{N}}$ in $\left]0,{+}\infty\right[$ and a suitable sequence of differentiable convex functions $(f_{n})_{n\in\mathbb{N}}$ , we propose to solve (1.2) via the iterative scheme

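$$(\forall n\in\mathbb{N})\quad x_{n+1}=\big(\nabla f_{n}+\gamma_{n}A\big)^{-1}\big(\nabla f_{n}(x_{n})-\gamma_{n}Bx_{n}\big),\tag{1.3}$$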

which consists of first applying a forward (explicit) step involving $B$ and then a backward (implicit) step involving $A$ . Let us note that the convergence of such an iterative process has not yet been established, even in finite-dimensional spaces with a single function $f_{n}=f$ and constant parameters $\gamma_{n}=\gamma$ . Furthermore, the novel scheme (1.3) will be shown to unify and extend several iterative methods which have thus far not been brought together:

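• The Bregman monotone proximal point algorithm

$$(\forall n\in\mathbb{N})\quad x_{n+1}=\big(\nabla f+\gamma_{n}A\big)^{-1}\big(\nabla f(x_{n})\big)\tag{1.4}$$

of [6] for finding a zero of $A$ in $\operatorname{int}\operatorname{dom}f$ , where $f$ is a Legendre function.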

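• The variable metric forward-backward splitting method

$$(\forall n\in\mathbb{N})\quad x_{n+1}=\big(U_{n}+\gamma_{n}A\big)^{-1}\big(U_{n}x_{n}-\gamma_{n}Bx_{n}\big)\tag{1.5}$$

of [15] for finding a zero of $A+B$ in a Hilbert space, where $(U_{n})_{n\in\mathbb{N}}$ is a sequence of strongly positive self-adjoint bounded linear operators.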

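• The splitting method

$$(\forall n\in\mathbb{N})\quad x_{n+1}=\big(\nabla f_{n}+\gamma_{n}\partial\varphi\big)^{-1}\big(\nabla f_{n}(x_{n})-\gamma_{n}\nabla\psi(x_{n})\big)\tag{1.6}$$

of [18] for finding a minimizer of the sum of the convex functions $\varphi$ and $\psi$ in $\operatorname{int}\operatorname{dom}f$ .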

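• The Renaud–Cohen algorithm

$$(\forall n\in\mathbb{N})\quad x_{n+1}=\big(\nabla f+\gamma A\big)^{-1}\big(\nabla f(x_{n})-\gamma Bx_{n}\big)\tag{1.7}$$

of [20] for finding a zero of $A+B$ in a Hilbert space, where $f$ is real-valued and strongly convex.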
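To make scheme (1.3) concrete, here is a small finite-dimensional sketch; it is an illustrative instance, not code from the paper. It assumes ${\mathcal{X}}=\mathbb{R}^{3}$, a fixed kernel $f_{n}=f$ with $f(x)=\sum_{i}(x_{i}\log x_{i}-x_{i})$ (Boltzmann–Shannon entropy, so $\nabla f=\log$), $A$ the normal cone to the unit simplex (i.e., $\partial\varphi$ in (1.6) with $\varphi$ the simplex indicator), and $B=Q\cdot-c$ the gradient of a convex quadratic $\psi$. Under these assumptions the backward step $(\nabla f+\gamma A)^{-1}$ is a softmax, so the iteration reduces to the exponentiated-gradient update $x_{n+1}\propto x_{n}\exp(-\gamma Bx_{n})$. The step $\gamma=0.2$ is chosen with $\kappa\gamma<1$ for $\kappa=\max_{i,j}|Q_{ij}|$; via Pinsker's inequality this $\kappa$ should make condition (iii) of Proposition 2.1 hold on $C$, which we take as an assumption here. The function name and data are hypothetical.

```python
import math


def entropic_forward_backward(Q, c, x0, gamma, n_iter=1000):
    """Sketch of x_{n+1} = (grad f + gamma*A)^{-1}(grad f(x_n) - gamma*B x_n).

    Hypothetical instance: f(x) = sum_i (x_i*log(x_i) - x_i), so grad f = log;
    A = normal cone of the unit simplex; B x = Q x - c.
    x0 must have strictly positive entries summing to 1.
    """
    n = len(x0)
    x = list(x0)
    for _ in range(n_iter):
        # Forward (explicit) step in the dual space: grad f(x_n) - gamma * B x_n.
        Bx = [sum(Q[i][j] * x[j] for j in range(n)) - c[i] for i in range(n)]
        u = [math.log(x[i]) - gamma * Bx[i] for i in range(n)]
        # Backward (implicit) step: the x with u - log(x) in the normal cone
        # of the simplex at x is softmax(u); subtracting max(u) avoids overflow.
        m = max(u)
        w = [math.exp(ui - m) for ui in u]
        s = sum(w)
        x = [wi / s for wi in w]
    return x


# Minimize psi(x) = (x1^2 + 2*x2^2 + 4*x3^2)/2 over the simplex; kappa = 4.
Q = [[1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 4.0]]
x = entropic_forward_backward(Q, [0.0, 0.0, 0.0], [1 / 3] * 3, gamma=0.2)
print([round(v, 6) for v in x])  # approximately [4/7, 2/7, 1/7]
```

The minimizer satisfies $q_{i}x_{i}=\text{const}$, i.e., $x\propto(1,1/2,1/4)$, which is the fixed point of the update.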

Problems which cannot be solved by algorithms (1.4)–(1.7) will be presented in Example 2.9 as well as in Sections 3.2 and 3.4. New results in the minimization setting will be presented in Section 3.3.

The goal of the present paper is to investigate the asymptotic behavior of (1.3) under mild conditions on $A$ , $B$ , and $(f_{n})_{n\in\mathbb{N}}$ . Let us note that the convergence proof techniques used in the above four frameworks do not extend to (1.3). For instance, the tools of [18] rely heavily on functional inequalities involving $\varphi$ and $\psi$ . On the other hand, the approach of [15] exploits specific properties of quadratic kernels in Hilbert spaces, while [6] relies on Bregman monotonicity properties of the iterates that no longer hold in the presence of $B$ . Finally, the proofs of [20] depend on the strong convexity of $f$ , the underlying Hilbertian structure, and the fact that the updating equation is governed by a fixed operator. Our analysis will not only capture these frameworks but also provide new methods to solve problems beyond their reach. It hinges on the theory of Legendre functions and on the new condition (1.1), which will be seen to cover, in particular, the cocoercivity assumption used in the standard forward-backward method in Hilbert spaces [7, 17], as well as the seemingly unrelated assumptions used in [6, 15, 18, 20] to study (1.4)–(1.7).

The main result on the convergence of (1.3) is established in Section 2 for the general scenario described in Problem 1.1. Section 3 is dedicated to special cases and applications. In the context of minimization problems, worst-case convergence rates for the method are obtained.

2 Main results

2.1 Notation and definitions

The norm of ${\mathcal{X}}$ is denoted by $\|{\mkern 2.0mu\cdot\mkern 2.0mu}\|$ and the canonical pairing between ${\mathcal{X}}$ and ${\mathcal{X}}^{*}$ by $\langle{{{\mkern 1.0mu\cdot\mkern 2.0mu}},{{\cdot\mkern 1.0mu}}}\rangle$ . If ${\mathcal{X}}$ is Hilbertian, its scalar product is denoted by ${\langle{{{\mkern 1.0mu\cdot}}\mid{{\cdot\mkern 1.0mu}}}\rangle}$ . The symbols $\>\rightharpoonup\>$ and $\rightarrow$ denote weak and strong convergence, respectively. The set of weak sequential cluster points of a sequence $(x_{n})_{n\in\mathbb{N}}$ in ${\mathcal{X}}$ is denoted by $\mathfrak{W}(x_{n})_{n\in\mathbb{N}}$ .

Let $M\colon{\mathcal{X}}\rightarrow 2^{{\mathcal{X}}^{*}}$ be a set-valued operator. Then $\operatorname{gra}M=\big{\{}{(x,x^{*})\in{\mathcal{X}}\times{\mathcal{X}}^{*}}~{}|~{}{x^{*}\in Mx}\big{\}}$ is the graph of $M$ , $\operatorname{dom}M=\big{\{}{x\in{\mathcal{X}}}~{}|~{}{Mx\neq\varnothing}\big{\}}$ the domain of $M$ , $\operatorname{ran}M=\big{\{}{x^{*}\in{\mathcal{X}}^{*}}~{}|~{}{(\exists\,x\in{\mathcal{X}})\,x^{*}\in Mx}\big{\}}$ the range of $M$ , and $\operatorname{zer}M=\big{\{}{x\in{\mathcal{X}}}~{}|~{}{0\in Mx}\big{\}}$ the set of zeros of $M$ . Moreover, $M$ is monotone if

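$$\big(\forall(x_{1},x_{1}^{*})\in\operatorname{gra}M\big)\big(\forall(x_{2},x_{2}^{*})\in\operatorname{gra}M\big)\quad\langle x_{1}-x_{2},x_{1}^{*}-x_{2}^{*}\rangle\geqslant 0,\tag{2.1}$$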

and maximally monotone if, furthermore, there exists no monotone operator from ${\mathcal{X}}$ to $2^{{\mathcal{X}}^{*}}$ the graph of which properly contains $\operatorname{gra}M$ .
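For a linear operator $M$ on $\mathbb{R}^{n}$ (identified with its dual), definition (2.1) reduces to $\langle x,Mx\rangle\geqslant 0$ for every $x$, i.e., the symmetric part $(M+M^{\top})/2$ is positive semidefinite. The hypothetical helper below checks this for $2\times 2$ matrices. The rotation by $\pi/2$ is monotone but not cocoercive, since $\langle x,Mx\rangle=0$ while $\|Mx\|=\|x\|$.

```python
def is_monotone_2x2(M, tol=1e-12):
    """Check <x, Mx> >= 0 for all x in R^2, i.e. (M + M^T)/2 is PSD."""
    a = M[0][0]
    d = M[1][1]
    b = 0.5 * (M[0][1] + M[1][0])  # off-diagonal entry of the symmetric part
    # A symmetric 2x2 matrix [[a, b], [b, d]] is PSD iff a >= 0, d >= 0, ad - b^2 >= 0.
    return a >= -tol and d >= -tol and a * d - b * b >= -tol


rotation = [[0.0, -1.0], [1.0, 0.0]]  # skew: <x, Mx> = 0 for every x
shear = [[1.0, 4.0], [0.0, 1.0]]      # symmetric part [[1, 2], [2, 1]] is indefinite
print(is_monotone_2x2(rotation), is_monotone_2x2(shear))  # True False
```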

A function $f\colon{\mathcal{X}}\rightarrow\left]{-}\infty,{+}\infty\right]$ is coercive if $\lim_{\|x\|\rightarrow+\infty}f(x)=+\infty$ and supercoercive if $\lim_{\|x\|\rightarrow+\infty}f(x)/\|x\|=+\infty$ . $\Gamma_{0}({\mathcal{X}})$ is the class of lower semicontinuous convex functions $f\colon{\mathcal{X}}\rightarrow\left]{-}\infty,{+}\infty\right]$ such that $\operatorname{dom}f=\big{\{}{x\in{\mathcal{X}}}~{}|~{}{f(x)<{{+}\infty}}\big{\}}\neq\varnothing$ . Now let $f\in\Gamma_{0}({\mathcal{X}})$ . The conjugate of $f$ is the function $f^{*}\in\Gamma_{0}({\mathcal{X}}^{*})$ defined by $f^{*}\colon{\mathcal{X}}^{*}\rightarrow\left]{-}\infty,{+}\infty\right]\colon x^{*}\mapsto\sup_{x\in{\mathcal{X}}}(\langle{{x},{x^{*}}}\rangle-f(x))$ , and the subdifferential of $f$ is the maximally monotone operator

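$$\partial f\colon{\mathcal{X}}\rightarrow 2^{{\mathcal{X}}^{*}}\colon x\mapsto\big\{x^{*}\in{\mathcal{X}}^{*}\;\big|\;(\forall y\in{\mathcal{X}})\;\langle y-x,x^{*}\rangle+f(x)\leqslant f(y)\big\}.\tag{2.2}$$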

In addition, $f$ is a Legendre function if it is essentially smooth in the sense that $\partial f$ is both locally bounded and single-valued on its domain, and essentially strictly convex in the sense that $\partial f^{*}$ is locally bounded on its domain and $f$ is strictly convex on every convex subset of $\operatorname{dom}\partial f$ [5]. Suppose that $f$ is Gâteaux differentiable on $\operatorname{int}\operatorname{dom}f\neq\varnothing$ . The Bregman distance associated with $f$ is

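$$D_{f}\colon{\mathcal{X}}\times{\mathcal{X}}\to\left[0,{+}\infty\right]\colon(x,y)\mapsto\begin{cases}f(x)-f(y)-\langle x-y,\nabla f(y)\rangle,&\text{if }y\in\operatorname{int}\operatorname{dom}f;\\{+}\infty,&\text{otherwise.}\end{cases}\tag{2.3}$$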

Given $\alpha\in\left]0,{+}\infty\right[$ , we define

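$$\mathcal{C}_{\alpha}(f)=\big\{g\in\Gamma_{0}({\mathcal{X}})\;\big|\;\operatorname{dom}g=\operatorname{dom}f,\ g\text{ is Gâteaux differentiable on }\operatorname{int}\operatorname{dom}f,\ D_{g}\geqslant\alpha D_{f}\big\}.\tag{2.4}$$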
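The Bregman distance can be evaluated directly once $f$ and $\nabla f$ are available. The following sketch (function names hypothetical) implements $D_{f}(x,y)=f(x)-f(y)-\langle x-y,\nabla f(y)\rangle$ on $\mathbb{R}^{n}$ for $y\in\operatorname{int}\operatorname{dom}f$. With $f=\|\cdot\|^{2}/2$ it returns $\|x-y\|^{2}/2$, and with the Boltzmann–Shannon entropy it returns the generalized Kullback–Leibler divergence $\sum_{i}(x_{i}\log(x_{i}/y_{i})-x_{i}+y_{i})$. Since $D_{2f}=2D_{f}$, the function $2f$ lies in $\mathcal{C}_{2}(f)$, a trivial example of the class $\mathcal{C}_{\alpha}(f)$.

```python
import math


def bregman_distance(f, grad_f, x, y):
    """D_f(x, y) = f(x) - f(y) - <x - y, grad f(y)>, with y assumed in int dom f."""
    g = grad_f(y)
    return f(x) - f(y) - sum((xi - yi) * gi for xi, yi, gi in zip(x, y, g))


def half_sq(x):
    return 0.5 * sum(v * v for v in x)


def entropy(x):
    # Boltzmann-Shannon entropy, with the convention 0*log(0) = 0.
    return sum(v * math.log(v) - v if v > 0 else 0.0 for v in x)


def grad_entropy(y):
    return [math.log(v) for v in y]


x, y = [0.2, 0.8], [0.5, 0.5]
d_euc = bregman_distance(half_sq, lambda v: list(v), x, y)  # ||x - y||^2 / 2
d_kl = bregman_distance(entropy, grad_entropy, x, y)        # generalized KL
print(round(d_euc, 6), round(d_kl, 6))  # 0.09 0.192745
```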

2.2 On condition (1.1)

The following proposition provides several key illustrations of the pertinence of (1.1) in terms of capturing concrete scenarios.

Proposition 2.1

Consider the setting of Problem 1.1. Then (1.1) holds in each of the following cases:

(i)

$\delta_{1}\in\left[0,1\right[$ , $\delta_{2}=1$ , and $(\forall x\in C)(\forall y\in C)(\forall z\in\mathscr{S})$ $\langle{{z-x},{By-Bz}}\rangle\leqslant\kappa D_{f}(x,y)$ .

(ii)

$\delta_{1}=0$ , $\delta_{2}=1$ , and $B=\partial\psi$ , where $\psi\in\Gamma_{0}({\mathcal{X}})$ satisfies

$$(\forall x\in C)(\forall y\in C)(\forall z\in\mathscr{S})\quad D_{\psi}(x,y)\leqslant\kappa D_{f}(x,y)+D_{\psi}(x,z)+D_{\psi}(z,y).\tag{2.5}$$

(iii)

$\delta_{1}=0$ , $\delta_{2}=1$ , and there exists $\psi\in\Gamma_{0}({\mathcal{X}})$ such that $B=\partial\psi$ and $(\forall x\in C)(\forall y\in C)$ $D_{\psi}(x,y)\leqslant\kappa D_{f}(x,y)$ .

(iv)

$\operatorname{dom}B={\mathcal{X}}$ , there exists $\beta\in\left]0,{+}\infty\right[$ such that

$$\big(\forall(x,x^{*})\in\operatorname{gra}(A+B)\big)\big(\forall(y,y^{*})\in\operatorname{gra}(A+B)\big)\quad\langle x-y,x^{*}-y^{*}\rangle\geqslant\beta\|Bx-By\|^{2},\tag{2.6}$$

$f$ is Fréchet differentiable on ${\mathcal{X}}$ , $\nabla f$ is $\alpha$ -strongly monotone on $\operatorname{dom}A$ for some $\alpha\in\left]0,{+}\infty\right[$ , $\varepsilon\in\left]0,2\beta\right[$ , $\kappa=1/(\alpha(2\beta-\varepsilon))$ , and $\delta_{1}=\delta_{2}=(2\beta-\varepsilon)/(2\beta)$ .

(v)

$A+B$ is strongly monotone with constant $\mu\in\left]0,{+}\infty\right[$ , $B$ is Lipschitzian on $\operatorname{dom}B={\mathcal{X}}$ with constant $\nu\in\left]0,{+}\infty\right[$ , $f$ is Fréchet differentiable on ${\mathcal{X}}$ , $\nabla f$ is $\alpha$ -strongly monotone on $\operatorname{dom}A$ for some $\alpha\in\left]0,{+}\infty\right[$ , $\varepsilon\in\left]0,2\mu/\nu^{2}\right[$ , $\kappa=\nu^{2}/(\alpha(2\mu-\varepsilon\nu^{2}))$ , and $\delta_{1}=\delta_{2}=(2\mu-\varepsilon\nu^{2})/(2\mu)$ .

(vi)

$\operatorname{dom}B={\mathcal{X}}$ , $\beta\in\left]0,{+}\infty\right[$ , $f$ is Fréchet differentiable on ${\mathcal{X}}$ , $\nabla f$ is $\alpha$ -strongly monotone on $\operatorname{dom}A$ for some $\alpha\in\left]0,{+}\infty\right[$ , $\varepsilon\in\left]0,2\beta\right[$ , $\kappa=1/(\alpha(2\beta-\varepsilon))$ , $\delta_{1}=0$ , $\delta_{2}=(2\beta-\varepsilon)/(2\beta)$ , and one of the following is satisfied:

[a]

$B$ is $\beta$ -cocoercive, i.e.,

$$(\forall x\in{\mathcal{X}})(\forall y\in{\mathcal{X}})\quad\langle x-y,Bx-By\rangle\geqslant\beta\|Bx-By\|^{2}.\tag{2.7}$$

[b]

$B$ is $\nu$ -Lipschitzian for some $\nu\in\left]0,{+}\infty\right[$ , and angle bounded with constant $1/(4\beta\nu)$ , i.e.,

$$(\forall x\in{\mathcal{X}})(\forall y\in{\mathcal{X}})(\forall z\in{\mathcal{X}})\quad\langle y-z,Bz-Bx\rangle\leqslant\frac{1}{4\beta\nu}\langle x-y,Bx-By\rangle.\tag{2.8}$$

[c]

$B$ is $(1/\beta)$ -Lipschitzian and there exists $\psi\in\Gamma_{0}({\mathcal{X}})$ such that $B=\nabla\psi$ .

Proof. (i): Let $x\in C$ , $y\in C$ , and $z\in\mathscr{S}$ . Then $\langle{{y-x},{By-Bz}}\rangle=\langle{{z-x},{By-Bz}}\rangle+\langle{{y-z},{By-Bz}}\rangle\leqslant\kappa D_{f}(x,y)+\langle{{y-z},{\delta_{2}(By-Bz)}}\rangle$ . In view of the monotonicity of $A$ , we obtain (1.1).

(ii) $\Rightarrow$ (i): In the light of [9, Proposition 4.1.5 and Corollary 4.2.5], $\psi$ is Gâteaux differentiable on $\operatorname{int}\operatorname{dom}\psi$ and $B=\nabla\psi$ on $\operatorname{int}\operatorname{dom}\psi=\operatorname{int}\operatorname{dom}B\supset C$ . Hence, we derive from (2.5), (2.3), and [6, Proposition 2.3(ii)] that

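$$(\forall x\in C)(\forall y\in C)(\forall z\in\mathscr{S})\quad\kappa D_{f}(x,y)\geqslant D_{\psi}(x,y)-D_{\psi}(x,z)-D_{\psi}(z,y)=\langle z-x,By-Bz\rangle.\tag{2.9}$$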

(iii) $\Rightarrow$ (ii): Clear.

(iv): It results from [9, Theorem 4.2.10] that $\nabla f$ is continuous. Thus, using the strong monotonicity of $\nabla f$ on $\operatorname{dom}A$ , we obtain

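$$(\forall x\in\operatorname{\overline{dom}}A)(\forall y\in\operatorname{\overline{dom}}A)\quad\langle x-y,\nabla f(x)-\nabla f(y)\rangle\geqslant\alpha\|x-y\|^{2}.\tag{2.10}$$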

Given $x$ and $y$ in $\operatorname{\overline{dom}}A$ , define $\phi\colon\mathbb{R}\rightarrow\mathbb{R}\colon t\mapsto f(y+t(x-y))$ , and observe that, since $\operatorname{\overline{dom}}A$ is convex [24, Theorem 3.11.12], $[x,y]\subset\operatorname{\overline{dom}}A$ and therefore (2.10) yields

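$$\begin{aligned}D_{f}(x,y)&=\int_{0}^{1}\big\langle x-y,\nabla f(y+t(x-y))-\nabla f(y)\big\rangle\,dt\\&\geqslant\int_{0}^{1}t\alpha\|x-y\|^{2}\,dt\\&=\frac{\alpha}{2}\|x-y\|^{2}.\end{aligned}\tag{2.11}$$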

In turn, using (2.6) and (2.11), we deduce that

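$$\begin{aligned}&(\forall x\in C)\big(\forall(y,y^{*})\in\operatorname{gra}A\big)\big(\forall(z,z^{*})\in\operatorname{gra}A\big)\\&\qquad\langle y-x,By-Bz\rangle\leqslant\frac{\|y-x\|^{2}}{2(2\beta-\varepsilon)}+\frac{2\beta-\varepsilon}{2}\|By-Bz\|^{2}\\&\qquad\hphantom{\langle y-x,By-Bz\rangle}\leqslant\kappa D_{f}(x,y)+\big\langle y-z,\delta_{1}(y^{*}-z^{*})+\delta_{2}(By-Bz)\big\rangle.\end{aligned}\tag{2.12}$$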

(v) $\Rightarrow$ (iv): Set $\beta=\mu/\nu^{2}$ . Then

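$$\big(\forall(x,x^{*})\in\operatorname{gra}(A+B)\big)\big(\forall(y,y^{*})\in\operatorname{gra}(A+B)\big)\quad\langle x-y,x^{*}-y^{*}\rangle\geqslant\mu\|x-y\|^{2}\geqslant\beta\|Bx-By\|^{2}.\tag{2.13}$$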

(vi): We consider each case separately.

(vi)[a]: By arguing as in (2.11), we obtain $(\forall x\in\operatorname{dom}A)(\forall y\in\operatorname{dom}A)$ $D_{f}(x,y)\geqslant(\alpha/2)\|x-y\|^{2}$ . It thus follows from (2.12) and (2.7) that

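$$\begin{aligned}&(\forall x\in C)\big(\forall(y,y^{*})\in\operatorname{gra}A\big)\big(\forall(z,z^{*})\in\operatorname{gra}A\big)\\&\qquad\langle y-x,By-Bz\rangle\leqslant\kappa D_{f}(x,y)+\big\langle y-z,\delta_{2}(By-Bz)\big\rangle.\end{aligned}$$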

(vi)[b] $\Rightarrow$ (vi)[a]: We derive from [1, Proposition 4] that $B$ is cocoercive with constant $\beta$ .

(vi)[c] $\Rightarrow$ (vi)[a]: This follows from [1, Corollaire 10].
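The bound obtained for case (vi)[a] can be sanity-checked numerically in a simple Hilbertian setting. The sketch below assumes ${\mathcal{X}}=\mathbb{R}^{2}$, $A=0$ (so $C={\mathcal{X}}$ and $y^{*}=z^{*}=0$), $f=(\alpha/2)\|\cdot\|^{2}$ (so $D_{f}(x,y)=(\alpha/2)\|x-y\|^{2}$), and $B=Q$ with $Q$ symmetric positive semidefinite, which is $\beta$-cocoercive with $\beta=1/\lambda_{\max}(Q)$. It samples random triples and reports the largest value of $\langle y-x,By-Bz\rangle-\kappa D_{f}(x,y)-\delta_{2}\langle y-z,By-Bz\rangle$ with $\kappa$ and $\delta_{2}$ as in (vi). In this setting the argument of the proof applies to every $z$, so $z$ is sampled freely. Random sampling can only fail to find a counterexample; it does not prove the inequality. All names are hypothetical.

```python
import math
import random


def check_condition_vi_a(Q, alpha, eps, n_samples=10000, seed=0):
    """Largest sampled violation of (1.1) for A = 0, f = (alpha/2)||.||^2, B = Q on R^2.

    Q must be symmetric positive semidefinite and nonzero; then B is
    beta-cocoercive with beta = 1/lambda_max(Q).
    """
    a, b, d = Q[0][0], Q[0][1], Q[1][1]
    # Largest eigenvalue of the symmetric 2x2 matrix [[a, b], [b, d]].
    lam_max = 0.5 * (a + d) + math.sqrt(0.25 * (a - d) ** 2 + b * b)
    beta = 1.0 / lam_max
    if not 0.0 < eps < 2.0 * beta:
        raise ValueError("eps must lie in ]0, 2*beta[")
    kappa = 1.0 / (alpha * (2.0 * beta - eps))  # as in Proposition 2.1(vi)
    delta2 = (2.0 * beta - eps) / (2.0 * beta)

    def B(v):
        return [a * v[0] + b * v[1], b * v[0] + d * v[1]]

    def inner(u, v):
        return u[0] * v[0] + u[1] * v[1]

    def sub(u, v):
        return [u[0] - v[0], u[1] - v[1]]

    rng = random.Random(seed)
    worst = -math.inf
    for _ in range(n_samples):
        x, y, z = [[rng.uniform(-5, 5), rng.uniform(-5, 5)] for _ in range(3)]
        dB = sub(B(y), B(z))
        lhs = inner(sub(y, x), dB)
        d_f = 0.5 * alpha * inner(sub(x, y), sub(x, y))  # D_f(x, y)
        rhs = kappa * d_f + delta2 * inner(sub(y, z), dB)
        worst = max(worst, lhs - rhs)
    return worst  # <= 0 up to rounding if the sampled inequalities hold


print(check_condition_vi_a([[3.0, 1.0], [1.0, 1.0]], alpha=1.0, eps=0.2) <= 1e-9)  # True
```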

Remark 2.2

Condition (iv) in Proposition 2.1 first appeared in [20] and seems to have received little attention in the literature. The cocoercivity condition (vi)[a] was first used in [17] to prove the weak convergence of the classical forward-backward method in Hilbert spaces. Finally, in the context of minimization problems in reflexive Banach spaces, (iii) appears in [18]; see also [3] for the Euclidean case.

Remark 2.3

Condition (iii) is satisfied in particular when ${\mathcal{X}}$ is a Hilbert space, $f=\|{\mkern 2.0mu\cdot\mkern 2.0mu}\|^{2}/2$ , $\operatorname{dom}\psi={\mathcal{X}}$ , and $\nabla\psi$ is Lipschitzian [7, Theorem 18.15], in which case it is known as the “descent lemma.” Condition (ii) can be viewed as an extension of this standard descent lemma involving triples $(x,y,z)$ and an arbitrary Bregman distance $D_{f}$ in reflexive Banach spaces. Let us underline that (ii) is more general than (iii). Indeed, consider the setting of Problem 1.1 with the following additional assumptions: ${\mathcal{X}}$ is a Hilbert space, $0\in\operatorname{int}\operatorname{dom}f$ , $A$ is the normal cone operator of some self-dual cone $K$ , and there exists a Gâteaux differentiable convex function $\psi\colon{\mathcal{X}}\rightarrow\mathbb{R}$ such that

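$$B=\nabla\psi,\quad\operatorname{Argmin}\psi=\{0\},\quad\text{and}\quad\nabla\psi(K)\subset K.\tag{2.16}$$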

Then $C=(\operatorname{int}\operatorname{dom}f)\cap\operatorname{dom}A\subset K$ and $\mathscr{S}=\{0\}$ . Further, for every $x\in C$ and every $y\in C$ , (2.16) yields $D_{\psi}(x,y)-D_{\psi}(x,0)-D_{\psi}(0,y)={\langle{{-x}\mid{\nabla\psi(y)-\nabla\psi(0)}}\rangle}={\langle{{-x}\mid{\nabla\psi(y)}}\rangle}\leqslant 0\leqslant D_{f}(x,y)$ . Therefore, (2.5) is satisfied. On the other hand, (iii) does not hold in general. For instance, take ${\mathcal{X}}=\mathbb{R}$ , $K=\left[0,{+}\infty\right[$ , $f=|{\mkern 2.0mu\cdot\mkern 2.0mu}|^{2}/2$ , and $\psi=|{\mkern 2.0mu\cdot\mkern 2.0mu}|^{3/2}$ .
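The one-dimensional counterexample at the end of Remark 2.3 can be checked by hand and numerically. With $\psi=|\cdot|^{3/2}$ and $f=|\cdot|^{2}/2$ one has $D_{\psi}(x,0)=x^{3/2}$ and $D_{f}(x,0)=x^{2}/2$, so for any $\kappa>0$ the point $x=1/\kappa^{2}\in C=\left[0,{+}\infty\right[$ gives $D_{\psi}(x,0)=\kappa^{-3}>\kappa^{-3}/2=\kappa D_{f}(x,0)$, violating (iii). Meanwhile, with $z=0$, the three-point quantity $D_{\psi}(x,y)-D_{\psi}(x,0)-D_{\psi}(0,y)$ equals $-x\psi'(y)\leqslant 0$ on $C$. The sketch below (helper names hypothetical) evaluates both facts on a few sample points.

```python
import math


def psi(t):
    return abs(t) ** 1.5


def dpsi(t):
    # Derivative of |t|^{3/2}; it is continuous and vanishes at 0.
    return 1.5 * math.copysign(math.sqrt(abs(t)), t)


def f(t):
    return 0.5 * t * t


def df(t):
    return t


def bregman_1d(h, dh, x, y):
    return h(x) - h(y) - dh(y) * (x - y)


# (iii) fails: for every kappa, x = 1/kappa^2 violates D_psi(x,0) <= kappa*D_f(x,0).
for kappa in (1.0, 10.0, 100.0):
    x = 1.0 / kappa ** 2
    print(kappa, bregman_1d(psi, dpsi, x, 0.0) > kappa * bregman_1d(f, df, x, 0.0))

# Condition (2.5) with z = 0: the three-point quantity is -x*psi'(y) <= 0 on C.
gap = max(
    bregman_1d(psi, dpsi, x, y) - bregman_1d(psi, dpsi, x, 0.0) - bregman_1d(psi, dpsi, 0.0, y)
    for x in (0.0, 0.3, 1.0, 4.0)
    for y in (0.0, 0.5, 2.0, 9.0)
)
print(gap <= 1e-12)  # True
```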

2.3 Forward-backward splitting for monotone inclusions

The formal setting of the proposed Bregman forward-backward splitting method is as follows.

Algorithm 2.4

Consider the setting of Problem 1.1. Let $\alpha\in\left]0,{+}\infty\right[$ , let $(\gamma_{n})_{n\in\mathbb{N}}$ be in $\left]0,{+}\infty\right[$ , and let $(f_{n})_{n\in\mathbb{N}}$ be in $\mathcal{C}_{\alpha}(f)$ . Suppose that the following hold:

[a]

$\inf_{n\in\mathbb{N}}\gamma_{n}>0$ , $\sup_{n\in\mathbb{N}}(\kappa\gamma_{n})\leqslant\alpha$ , and $\sup_{n\in\mathbb{N}}(\delta_{1}\gamma_{n+1}/\gamma_{n})<1$ .

[b]

There exists a summable sequence $(\eta_{n})_{n\in\mathbb{N}}$ in $\left[0,{+}\infty\right[$ such that $(\forall n\in\mathbb{N})$ $D_{f_{n+1}}\leqslant(1+\eta_{n})D_{f_{n}}$ .

[c]

For every $n\in\mathbb{N}$ , $\nabla f_{n}$ is strictly monotone on $C$ and $(\nabla f_{n}-\gamma_{n}{B})(C)\subset\operatorname{ran}(\nabla f_{n}+\gamma_{n}A)$ .

Take $x_{0}\in C$ and set $(\forall n\in\mathbb{N})$ $x_{n+1}=(\nabla f_{n}+\gamma_{n}A)^{-1}(\nabla f_{n}(x_{n})-\gamma_{n}Bx_{n})$ .
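Condition [a] couples the step sizes to the constants $\kappa$, $\alpha$, and $\delta_{1}$ of (1.1); the ratio requirement allows $\gamma_{n}$ to grow from one iteration to the next by a factor smaller than $1/\delta_{1}$, and imposes nothing when $\delta_{1}=0$. Since it involves an infimum and suprema over the whole sequence, software can only inspect a finite prefix. The hypothetical helper below does that, which may be convenient when experimenting with varying steps; it suggests, but cannot certify, that [a] holds.

```python
def satisfies_condition_a(gammas, kappa, alpha, delta1):
    """Check condition [a] of Algorithm 2.4 on a finite prefix of (gamma_n).

    Tests gamma_n > 0, kappa*gamma_n <= alpha, and delta1*gamma_{n+1}/gamma_n < 1
    for the listed terms only; the infinite sequence is not certified.
    """
    if not gammas or min(gammas) <= 0.0:
        return False
    if max(kappa * g for g in gammas) > alpha:
        return False
    ratios = [delta1 * g1 / g0 for g0, g1 in zip(gammas, gammas[1:])]
    return max(ratios, default=0.0) < 1.0


# kappa = 2, alpha = 1: steps must stay <= 0.5; delta1 = 0.5 allows growth below 2x.
print(satisfies_condition_a([0.1, 0.19, 0.3, 0.5], kappa=2.0, alpha=1.0, delta1=0.5))  # True
print(satisfies_condition_a([0.1, 0.25, 0.5], kappa=2.0, alpha=1.0, delta1=0.5))       # False
```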

Let us establish basic asymptotic properties of Algorithm 2.4, starting with the fact that its viability domain is $C$ .

Proposition 2.5

Let $(x_{n})_{n\in\mathbb{N}}$ be a sequence generated by Algorithm 2.4 and let $z\in\mathscr{S}$ . Then $(x_{n})_{n\in\mathbb{N}}$ is a well-defined sequence in $C$ and the following hold:

(i)

$(D_{f_{n}}(z,x_{n}))_{n\in\mathbb{N}}$ converges.

(ii)

$\sum_{n\in\mathbb{N}}(1-\kappa\gamma_{n}/\alpha)D_{f_{n}}(x_{n+1},x_{n})<{{+}\infty}$ and $\sum_{n\in\mathbb{N}}(1-\kappa\gamma_{n}/\alpha)D_{f}(x_{n+1},x_{n})<{{+}\infty}$ .

(iii)

$\sum_{n\in\mathbb{N}}\langle{{x_{n+1}-z},{\gamma_{n}^{-1}(\nabla f_{n}(x_{n})-\nabla f_{n}(x_{n+1}))-Bx_{n}+Bz}}\rangle<{{+}\infty}$ .

(iv)

$\sum_{n\in\mathbb{N}}(1-\delta_{2})\langle{{x_{n}-z},{{B}x_{n}-{B}z}}\rangle<{{+}\infty}$ .

(v)

Suppose that one of the following is satisfied:

[a]

$C$ is bounded.

[b]

$f$ is supercoercive.

[c]

$f$ is uniformly convex.

[d]

$f$ is essentially strictly convex with $\operatorname{dom}f^{*}$ open and $\nabla f^{*}$ weakly sequentially continuous.

[e]

${\mathcal{X}}$ is finite-dimensional and $\operatorname{dom}f^{*}$ is open.

[f]

$f$ is essentially strictly convex and $\displaystyle\rho=\inf_{\begin{subarray}{c}x\in\operatorname{int}\operatorname{dom}f\\ y\in\operatorname{int}\operatorname{dom}f\\ x\neq y\end{subarray}}\;\frac{D_{f}(x,y)}{D_{f}(y,x)}\in\left]0,{+}\infty\right[$ .

Then $(x_{n})_{n\in\mathbb{N}}$ is bounded.

Proof. Take $n\in\mathbb{N}$ , and suppose that $(y^{*},y_{1})$ and $(y^{*},y_{2})$ belong to $\operatorname{gra}(\nabla f_{n}+\gamma_{n}A)^{-1}$ . Then $y^{*}\in(\nabla f_{n}+\gamma_{n}A)y_{1}$ and $y^{*}\in(\nabla f_{n}+\gamma_{n}A)y_{2}$ . However, by virtue of condition [c] in Algorithm 2.4, $\nabla f_{n}+\gamma_{n}A$ is strictly monotone. Therefore, since $\langle{{y_{1}-y_{2}},{y^{*}-y^{*}}}\rangle=0$ , we infer that $y_{1}=y_{2}$ . Hence

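$$(\nabla f_{n}+\gamma_{n}A)^{-1}\text{ is single-valued on }\operatorname{dom}(\nabla f_{n}+\gamma_{n}A)^{-1}=\operatorname{ran}(\nabla f_{n}+\gamma_{n}A).\tag{2.17}$$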

Moreover, it follows from [9, Proposition 4.2.2] and (2.4) that

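$$\operatorname{ran}(\nabla f_{n}+\gamma_{n}A)^{-1}=\operatorname{dom}\nabla f_{n}\cap\operatorname{dom}A=(\operatorname{int}\operatorname{dom}f_{n})\cap\operatorname{dom}A=C.\tag{2.18}$$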

Next, we observe that, since $x_{0}\in C\subset\operatorname{int}\operatorname{dom}B$ , $\nabla f_{0}(x_{0})-\gamma_{0}Bx_{0}$ is a singleton. Furthermore, in view of condition [c] in Algorithm 2.4, $\nabla f_{0}(x_{0})-\gamma_{0}Bx_{0}\in\operatorname{ran}(\nabla f_{0}+\gamma_{0}A)$ . We thus deduce from (2.17) that $x_{1}=(\nabla f_{0}+\gamma_{0}A)^{-1}(\nabla f_{0}(x_{0})-\gamma_{0}Bx_{0})$ is uniquely defined. In addition, (2.18) yields $x_{1}\in\operatorname{ran}(\nabla f_{0}+\gamma_{0}A)^{-1}=C$ . The conclusion that $(x_{n})_{n\in\mathbb{N}}$ is a well-defined sequence in $C$ follows by invoking these facts inductively.

(i)–(iv): Condition [a] in Algorithm 2.4 entails that there exists $\varepsilon\in\left]0,1\right[$ such that

$$\delta_{1}\gamma_{n+1}\leqslant(1-\varepsilon)\gamma_{n}.\tag{2.19}$$

Now take $x_{0}^{*}\in Ax_{0}$ and set

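$$\begin{cases}x_{n+1}^{*}=\gamma_{n}^{-1}\big(\nabla f_{n}(x_{n})-\nabla f_{n}(x_{n+1})\big)-Bx_{n}\\ \Delta_{n}=D_{f_{n}}(z,x_{n})+\delta_{1}\gamma_{n}\langle x_{n}-z,x_{n}^{*}+Bz\rangle\\ \theta_{n}=(1-\kappa\gamma_{n}/\alpha)D_{f_{n}}(x_{n+1},x_{n})\\ \qquad\;+\varepsilon\gamma_{n}\langle x_{n+1}-z,x_{n+1}^{*}+Bz\rangle+(1-\delta_{2})\gamma_{n}\langle x_{n}-z,Bx_{n}-Bz\rangle.\end{cases}\tag{2.20}$$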

In view of (2.20),

[TABLE]

In turn, since $(z,-Bz)\in\operatorname{gra}A$ and $A$ is monotone,

[TABLE]

Hence, invoking condition [a] in Algorithm 2.4 and the monotonicity of $B$ , we obtain $\theta_{n}\geqslant 0$ . Next, since $z\in\operatorname{int}\operatorname{dom}f=\operatorname{int}\operatorname{dom}f_{n}$ by (2.4), we derive from (2.20) and [6, Proposition 2.3(ii)] that

[TABLE]

Thus, since $(z,-Bz)\in\operatorname{gra}A$ and $f_{n}\in\mathcal{C}_{\alpha}(f)$ , we infer from (2.19), (2.22), (2.21), and (1.1) that

[TABLE]

Consequently, by condition [b] in Algorithm 2.4 and (2.22),

[TABLE]

Hence, [7, Lemma 5.31] asserts that

[TABLE]

In turn, we infer from (2.20) and condition [a] in Algorithm 2.4 that

[TABLE]

Thus, since $(f_{n})_{n\in\mathbb{N}}$ lies in $\mathcal{C}_{\alpha}(f)$ , we obtain $\sum_{n\in\mathbb{N}}(1-\kappa\gamma_{n}/\alpha)D_{f}(x_{n+1},x_{n})<{{+}\infty}$ . It results from (2.26) and (2.20) that $(D_{f_{n}}(z,x_{n}))_{n\in\mathbb{N}}$ converges.

(v): Recall that $(x_{n})_{n\in\mathbb{N}}$ lies in $C$ .

(v)[a]: Clear.

(v)[b]: We derive from (i) that $(D_{f}(z,x_{n}))_{n\in\mathbb{N}}$ is bounded. In turn, [5, Lemma 7.3(viii)] asserts that $(x_{n})_{n\in\mathbb{N}}$ is bounded.

(v)[c]: It results from [24, Theorem 3.5.10] that there exists a function $\phi\colon\left[0,{+}\infty\right[\rightarrow\left[0,{+}\infty\right]$ that vanishes only at $0$ such that $\lim_{t\rightarrow{{+}\infty}}\phi(t)/t={{+}\infty}$ and

[TABLE]

Hence, in the light of (i), $\sup_{n\in\mathbb{N}}\phi(\|x_{n}-z\|)\leqslant\sup_{n\in\mathbb{N}}D_{f}(z,x_{n})\leqslant(1/\alpha)\sup_{n\in\mathbb{N}}D_{f_{n}}(z,x_{n})<{{+}\infty}$ and $(x_{n})_{n\in\mathbb{N}}$ is therefore bounded.

(v)[d]: Suppose that there exists a subsequence $(x_{k_{n}})_{n\in\mathbb{N}}$ of $(x_{n})_{n\in\mathbb{N}}$ such that $\|x_{k_{n}}\|\rightarrow{{+}\infty}$ . We deduce from [5, Lemma 7.3(vii)] and (i) that

[TABLE]

However, $f^{*}$ is a Legendre function by virtue of [5, Corollary 5.5] and $\nabla f(z)\in\operatorname{int}\operatorname{dom}f^{*}$ by virtue of [5, Theorem 5.10]. Thus, [5, Lemma 7.3(v)] guarantees that $D_{f^{*}}({\mkern 2.0mu\cdot\mkern 2.0mu},\nabla f(z))$ is coercive. It therefore follows from (2.29) that $(\nabla f(x_{k_{n}}))_{n\in\mathbb{N}}$ is bounded, and then from the reflexivity of ${\mathcal{X}}^{*}$ that $\mathfrak{W}(\nabla f(x_{k_{n}}))_{n\in\mathbb{N}}\neq\varnothing$ . In turn, there exist a subsequence $(x_{l_{k_{n}}})_{n\in\mathbb{N}}$ of $(x_{k_{n}})_{n\in\mathbb{N}}$ and $x^{*}\in{\mathcal{X}}^{*}$ such that $\nabla f(x_{l_{k_{n}}})\>\rightharpoonup\>x^{*}$ . The weak lower semicontinuity of $f^{*}$ and (2.29) yield $D_{f^{*}}(x^{*},\nabla f(z))\leqslant\varliminf D_{f^{*}}(\nabla f(x_{l_{k_{n}}}),\nabla f(z))<{{+}\infty}$ . Therefore

[TABLE]

Moreover, [5, Theorem 5.10] asserts that $\nabla f^{*}(x^{*})\in\operatorname{int}\operatorname{dom}f$ and $(\forall n\in\mathbb{N})\;\nabla f^{*}\big{(}\nabla f(x_{n})\big{)}=x_{n}$ . Hence, (2.30) and the weak sequential continuity of $\nabla f^{*}$ imply that $x_{l_{k_{n}}}=\nabla f^{*}(\nabla f(x_{l_{k_{n}}}))\>\rightharpoonup\>\nabla f^{*}(x^{*})$ . This yields $\sup_{n\in\mathbb{N}}\|x_{l_{k_{n}}}\|<{{+}\infty}$ and we reach a contradiction.

(v)[e]: A consequence of [5, Lemma 7.3(ix)] and (i).

(v)[f]: It results from [5, Lemma 7.3(v)] that $D_{f}({\mkern 2.0mu\cdot\mkern 2.0mu},z)$ is coercive. In turn, since $\sup_{n\in\mathbb{N}}D_{f}(x_{n},z)\leqslant(1/\rho)\sup_{n\in\mathbb{N}}D_{f}(z,x_{n})<{{+}\infty}$ by (i), $(x_{n})_{n\in\mathbb{N}}$ is bounded.

As seen in Proposition 2.5, by construction, an orbit of Algorithm 2.4 lies in $C$ and therefore in $\operatorname{int}\operatorname{dom}f$ . Next, we identify sufficient conditions guaranteeing that its weak sequential cluster points also lie in $\operatorname{int}\operatorname{dom}f$ .

Proposition 2.6

Let $(x_{n})_{n\in\mathbb{N}}$ be a sequence generated by Algorithm 2.4 and suppose that one of the following holds:

[a]

$\operatorname{\overline{dom}}f\cap\operatorname{\overline{dom}}A\subset\operatorname{int}\operatorname{dom}f$ .

[b]

$f$ is essentially strictly convex with $\operatorname{dom}f^{*}$ open and $\nabla f^{*}$ weakly sequentially continuous.

[c]

$f$ is strictly convex on $\operatorname{int}\operatorname{dom}f$ and $\displaystyle\rho=\inf_{\begin{subarray}{c}x\in\operatorname{int}\operatorname{dom}f\\ y\in\operatorname{int}\operatorname{dom}f\\ x\neq y\end{subarray}}\;\frac{D_{f}(x,y)}{D_{f}(y,x)}\in\left]0,{+}\infty\right[$ .

[d]

${\mathcal{X}}$ is finite-dimensional.

Then $\mathfrak{W}(x_{n})_{n\in\mathbb{N}}\subset\operatorname{int}\operatorname{dom}f$ .

Proof. Suppose that $x\in\mathfrak{W}(x_{n})_{n\in\mathbb{N}}$ , say $x_{k_{n}}\>\rightharpoonup\>x$ , and fix $z\in\mathscr{S}$ .

[a]: Since $\operatorname{\overline{dom}}f$ is closed and convex, it is weakly closed [10, Corollary II.6.3.3(i)]. Hence, since Proposition 2.5 asserts that $(x_{n})_{n\in\mathbb{N}}$ lies in $C\subset\operatorname{dom}f$ , we infer that $\mathfrak{W}(x_{n})_{n\in\mathbb{N}}\subset\operatorname{\overline{dom}}f$ . Likewise, since $\operatorname{\overline{dom}}A$ is a closed convex set [24, Theorem 3.11.12] and $(x_{n})_{n\in\mathbb{N}}$ lies in $C\subset\operatorname{dom}A$ , we obtain $\mathfrak{W}(x_{n})_{n\in\mathbb{N}}\subset\operatorname{\overline{dom}}A$ . Altogether, $\mathfrak{W}(x_{n})_{n\in\mathbb{N}}\subset\operatorname{\overline{dom}}f\cap\operatorname{\overline{dom}}A\subset\operatorname{int}\operatorname{dom}f$ .

[b]: Using an argument similar to that of the proof of Proposition 2.5 (v)[d], we infer that there exist a strictly increasing sequence $(l_{k_{n}})_{n\in\mathbb{N}}$ in $\mathbb{N}$ and $x^{*}\in\operatorname{int}\operatorname{dom}f^{*}$ such that $x_{l_{k_{n}}}\>\rightharpoonup\>\nabla f^{*}(x^{*})$ . Thus, appealing to [5, Theorem 5.10], we conclude that $x=\nabla f^{*}(x^{*})\in\operatorname{int}\operatorname{dom}f$ .

[c]: Proposition 2.5 (i) and the weak lower semicontinuity of $D_{f}({\mkern 2.0mu\cdot\mkern 2.0mu},z)$ yield

[TABLE]

Thus $x\in\operatorname{dom}f$ . It remains to show that $\operatorname{dom}f$ is open. Suppose, to the contrary, that there exists $y\in\operatorname{dom}f\smallsetminus\operatorname{int}\operatorname{dom}f$ , let $(\alpha_{n})_{n\in\mathbb{N}}$ be a sequence in $\left]0,1\right[$ such that $\alpha_{n}\rightarrow 1$ , and set $(\forall n\in\mathbb{N})$ $y_{n}=\alpha_{n}y+(1-\alpha_{n})z$ . Then $\{y_{n}\}_{n\in\mathbb{N}}\subset\left]y,z\right[\subset(\operatorname{int}\operatorname{dom}f)\smallsetminus\{z\}$ [10, Proposition II.2.6.16]. Moreover, $y_{n}\rightarrow y$ and, by convexity of $f$ , $(\forall n\in\mathbb{N})$ $D_{f}(y_{n},z)\leqslant\alpha_{n}(f(y)-f(z)-\langle{{y-z},{\nabla f(z)}}\rangle)$ . Hence

[TABLE]

However, it results from the lower semicontinuity of $f$ that $\varliminf D_{f}(y_{n},z)=\varliminf(f(y_{n})-f(z))-\lim\langle{{y_{n}-z},{\nabla f(z)}}\rangle\geqslant f(y)-f(z)-\langle{{y-z},{\nabla f(z)}}\rangle=D_{f}(y,z)$ . Hence, (2.32) forces

[TABLE]

In addition, by convexity of $f$ , $(\forall n\in\mathbb{N})\;D_{f}(z,y_{n})\geqslant\alpha_{n}(f(z)-f(y)-\langle{{z-y},{\nabla f(y_{n})}}\rangle)$ . However, [5, Theorem 5.6] and the essential smoothness of $f$ entail that

[TABLE]

Thus,

[TABLE]

It results from (2.33) and (2.35) that $0<\rho\leqslant\lim D_{f}(y_{n},z)/D_{f}(z,y_{n})=0$ , so that we reach a contradiction. Consequently, $\operatorname{dom}f$ is open and hence $x\in\operatorname{dom}f=\operatorname{int}\operatorname{dom}f$ .

[d]: Proposition 2.5 (i) ensures that $(x_{k_{n}})_{n\in\mathbb{N}}$ is a sequence in $\operatorname{int}\operatorname{dom}f$ such that $(D_{f}(z,x_{k_{n}}))_{n\in\mathbb{N}}$ is bounded. Therefore, [4, Theorem 3.8(ii)] and the essential smoothness of $f$ yield $x\in\operatorname{int}\operatorname{dom}f$ .
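The mechanism behind part [c] of the proof can be observed numerically. As an illustration of our own (not taken from the text), take the Boltzmann–Shannon entropy $f\colon t\mapsto t\ln t-t$ on $\left[0,{+}\infty\right[$, whose domain is not open, together with $y=0\in\operatorname{dom}f\smallsetminus\operatorname{int}\operatorname{dom}f$ and $z=1$. Along $y_{n}=\alpha_{n}y+(1-\alpha_{n})z$ with $\alpha_{n}\rightarrow 1$, $D_{f}(y_{n},z)$ tends to $D_{f}(y,z)=1$ while $D_{f}(z,y_{n})\rightarrow{+}\infty$, so the ratio whose infimum defines $\rho$ tends to $0$, which is why condition [c] cannot hold for this $f$. The Python sketch below, whose helper names are our own, tabulates these quantities.

```python
import math

def f(t):
    # Boltzmann-Shannon entropy t ln t - t on [0, +inf), with f(0) = 0 by continuity
    return 0.0 if t == 0.0 else t * math.log(t) - t

def df(t):
    # derivative ln t, defined only on int dom f = ]0, +inf[
    return math.log(t)

def bregman(x, y):
    # D_f(x, y) = f(x) - f(y) - (x - y) f'(y); requires y in int dom f
    return f(x) - f(y) - (x - y) * df(y)

def ratio_table(y=0.0, z=1.0, kmax=15):
    rows = []
    for k in range(1, kmax + 1):
        t = 10.0 ** (-k)                 # t = 1 - alpha_n, so alpha_n -> 1
        y_n = (1.0 - t) * y + t * z      # y_n = alpha_n y + (1 - alpha_n) z -> y
        fwd, bwd = bregman(y_n, z), bregman(z, y_n)
        rows.append((y_n, fwd, bwd, fwd / bwd))
    return rows

for y_n, fwd, bwd, r in ratio_table():
    print(f"y_n={y_n:.0e}  D(y_n,z)={fwd:.6f}  D(z,y_n)={bwd:7.3f}  ratio={r:.4f}")
```

The ratio decays only like $1/\ln(1/(1-\alpha_{n}))$, but it does decay to $0$.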

Definition 2.7

Algorithm 2.4 is focusing if, for every $z\in\mathscr{S}$ ,

[TABLE]

Our main result establishes the weak convergence of the orbits of Algorithm 2.4.

Theorem 2.8

Let $(x_{n})_{n\in\mathbb{N}}$ be a sequence generated by Algorithm 2.4 and suppose that the following hold:

[a] $(x_{n})_{n\in\mathbb{N}}$ is bounded.

[b] $\mathfrak{W}(x_{n})_{n\in\mathbb{N}}\subset\operatorname{int}\operatorname{dom}f$ .

[c] Algorithm 2.4 is focusing.

[d] One of the following is satisfied:

1/ $\mathscr{S}$ is a singleton.

2/ There exists a function $g$ in $\Gamma_{0}({\mathcal{X}})$ which is Gâteaux differentiable on $\operatorname{int}\operatorname{dom}g\supset C$ , with $\nabla g$ strictly monotone on $C$ , and such that, for every sequence $(y_{n})_{n\in\mathbb{N}}$ in $C$ and every $y\in\mathfrak{W}(y_{n})_{n\in\mathbb{N}}\cap C$ , $y_{k_{n}}\>\rightharpoonup\>y$ $\Rightarrow$ $\nabla f_{k_{n}}(y_{k_{n}})\>\rightharpoonup\>\nabla g(y)$ .

Then $(x_{n})_{n\in\mathbb{N}}$ converges weakly to a point in $\mathscr{S}$ .

Proof. It results from [a] and the reflexivity of ${\mathcal{X}}$ that

[TABLE]

On the other hand, [c] and items (i)–(iv) in Proposition 2.5 yield $\mathfrak{W}(x_{n})_{n\in\mathbb{N}}\subset\operatorname{zer}(A+B)$ . In turn, it results from [b] that

[TABLE]

In view of [7, Lemma 1.35] applied in ${\mathcal{X}}^{\text{weak}}$ , it remains to show that $\mathfrak{W}(x_{n})_{n\in\mathbb{N}}$ is a singleton. If [d]1/ holds, this follows from (2.38). Now suppose that [d]2/ holds, and take $y_{1}$ and $y_{2}$ in $\mathfrak{W}(x_{n})_{n\in\mathbb{N}}$ , say $x_{k_{n}}\>\rightharpoonup\>y_{1}$ and $x_{l_{n}}\>\rightharpoonup\>y_{2}$ . Then $y_{1}\in\mathscr{S}$ and $y_{2}\in\mathscr{S}$ by virtue of (2.38), and we therefore deduce from Proposition 2.5 (i) that $(D_{f_{n}}(y_{1},x_{n}))_{n\in\mathbb{N}}$ and $(D_{f_{n}}(y_{2},x_{n}))_{n\in\mathbb{N}}$ converge. However, condition [b] in Algorithm 2.4 and [7, Lemma 5.31] assert that $(D_{f_{n}}(y_{1},y_{2}))_{n\in\mathbb{N}}$ converges. Hence, appealing to [6, Proposition 2.3(ii)], it follows that $(\langle{{y_{1}-y_{2}},{\nabla f_{n}(x_{n})-\nabla f_{n}(y_{2})}}\rangle)_{n\in\mathbb{N}}=(D_{f_{n}}(y_{2},x_{n})+D_{f_{n}}(y_{1},y_{2})-D_{f_{n}}(y_{1},x_{n}))_{n\in\mathbb{N}}$ converges. Set $\ell=\lim\langle{{y_{1}-y_{2}},{\nabla f_{n}(x_{n})-\nabla f_{n}(y_{2})}}\rangle$ . Since $(x_{n})_{n\in\mathbb{N}}$ is a sequence in $C$ , we infer from (2.38) and [d]2/ that $\ell\leftarrow\langle{{y_{1}-y_{2}},{\nabla f_{l_{n}}(x_{l_{n}})-\nabla f_{l_{n}}(y_{2})}}\rangle\rightarrow\langle{{y_{1}-y_{2}},{\nabla g(y_{2})-\nabla g(y_{2})}}\rangle=0$ , which yields $\ell=0$ . However, invoking [d]2/, we obtain $\ell\leftarrow\langle{{y_{1}-y_{2}},{\nabla f_{k_{n}}(x_{k_{n}})-\nabla f_{k_{n}}(y_{2})}}\rangle\rightarrow\langle{{y_{1}-y_{2}},{\nabla g(y_{1})-\nabla g(y_{2})}}\rangle$ . It therefore follows that $\langle{{y_{1}-y_{2}},{\nabla g(y_{1})-\nabla g(y_{2})}}\rangle=0$ and hence from the strict monotonicity of $\nabla g$ on $C$ that $y_{1}=y_{2}$ .
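The proof above rests on the three-point identity of [6, Proposition 2.3(ii)] in the form $\langle y_{1}-y_{2},\nabla f(x)-\nabla f(y_{2})\rangle=D_{f}(y_{2},x)+D_{f}(y_{1},y_{2})-D_{f}(y_{1},x)$, which follows by expanding the three Bregman distances (the function values cancel). As a sanity check under assumptions of our own, the Python sketch below verifies it numerically for the entropy $\xi\mapsto\sum_{i}(\xi_{i}\ln\xi_{i}-\xi_{i})$ on the positive orthant of $\mathbb{R}^{4}$; the helper names are hypothetical.

```python
import math
import random

def f(x):
    # entropy sum_i (x_i ln x_i - x_i) on the positive orthant
    return sum(t * math.log(t) - t for t in x)

def grad_f(x):
    return [math.log(t) for t in x]

def bregman(x, y):
    # D_f(x, y) = f(x) - f(y) - <x - y, grad f(y)>
    return f(x) - f(y) - sum((a - b) * g for a, b, g in zip(x, y, grad_f(y)))

def three_point_gap(y1, y2, x):
    # <y1 - y2, grad f(x) - grad f(y2)> minus D(y2, x) + D(y1, y2) - D(y1, x)
    lhs = sum((a - b) * (gx - g2) for a, b, gx, g2 in zip(y1, y2, grad_f(x), grad_f(y2)))
    rhs = bregman(y2, x) + bregman(y1, y2) - bregman(y1, x)
    return lhs - rhs

rng = random.Random(0)
gaps = []
for _ in range(1000):
    y1, y2, x = ([rng.uniform(0.01, 5.0) for _ in range(4)] for _ in range(3))
    gaps.append(abs(three_point_gap(y1, y2, x)))
print(max(gaps))   # only floating-point rounding remains
```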

Example 2.9

We provide an example with operating conditions that are not captured by any of the methods described in (1.4)–(1.7). Let $p\in\left]1,{{+}\infty}\right[$ , let $(\chi_{n})_{n\in\mathbb{N}}$ be a sequence in $\left[1,{{+}\infty}\right[$ such that $\chi_{n}\rightarrow 1$ , and let $(\eta_{n})_{n\in\mathbb{N}}$ be a summable sequence in $\left[0,{+}\infty\right[$ such that $(\forall n\in\mathbb{N})$ $\chi_{n+1}\leqslant(1+\eta_{n})\chi_{n}$ . We denote by $z=(\zeta_{k})_{k\in\mathbb{N}}$ a generic element of $\ell^{p}(\mathbb{N})$ . Set ${\mathcal{X}}=\ell^{p}(\mathbb{N})\times\mathbb{R}$ , hence ${\mathcal{X}}^{*}=\ell^{p/(p-1)}(\mathbb{N})\times\mathbb{R}$ , and define the Legendre functions

[TABLE]

and

[TABLE]

Now let $\psi\colon{\mathcal{X}}\rightarrow\left[0,{+}\infty\right[\colon(z,\xi)\mapsto\|z\|^{p}/p$ , set $B=\nabla\psi$ , and let $A\colon{\mathcal{X}}\rightarrow 2^{{\mathcal{X}}^{*}}$ be any maximally monotone operator such that

[TABLE]

Let us check that this setting conforms to that of Theorem 2.8. First, Proposition 2.1 (iii) implies that (1.1) is satisfied with $\delta_{1}=0$ and $\delta_{2}=\kappa=1$ . Next, we note that $\operatorname{int}\operatorname{dom}f=\ell^{p}(\mathbb{N})\times\left]0,{+}\infty\right[$ , that $(f_{n})_{n\in\mathbb{N}}$ lies in $\mathcal{C}_{1}(f)$ , and that condition [b] in Algorithm 2.4 holds. Furthermore, we derive from (2.39) that

[TABLE]

and we observe that

[TABLE]

It therefore follows from the Brézis–Haraux theorem [11, Théorème 4] that

[TABLE]

and hence that condition [c] in Algorithm 2.4 holds. It remains to verify condition [d]2/ in Theorem 2.8. Set $\varphi\colon\ell^{p}(\mathbb{N})\rightarrow\left[0,{+}\infty\right[\colon z\mapsto\|z\|^{p}/p$ and $(\forall n\in\mathbb{N})$ $\varphi_{n}\colon\ell^{p}(\mathbb{N})\rightarrow\left[0,{+}\infty\right[\colon z\mapsto\chi_{n}\|z\|^{p}/p$ . Take a sequence $(z_{n},\xi_{n})_{n\in\mathbb{N}}$ in $\operatorname{dom}A$ , where $z_{n}=(\zeta_{n,k})_{k\in\mathbb{N}}$ , and a point $(z,\xi)\in\operatorname{dom}A$ such that $(z_{n},\xi_{n})\>\rightharpoonup\>(z,\xi)$ . We have $\xi_{n}\rightarrow\xi$ and $(\forall k\in\mathbb{N})$ $\zeta_{n,k}\rightarrow\zeta_{k}$ . Now let $(e_{k})_{k\in\mathbb{N}}$ be the canonical Schauder basis of $\ell^{p}(\mathbb{N})$ . Then

[TABLE]

and $(\nabla\varphi_{n}(z_{n}))_{n\in\mathbb{N}}$ is bounded. It therefore follows from [2, Théorème VIII-2] that $\nabla\varphi_{n}(z_{n})\>\rightharpoonup\>\nabla\varphi(z)$ and, in turn, that $\nabla f_{n}(z_{n},\xi_{n})\>\rightharpoonup\>\nabla g(z,\xi)$ by (2.40) and (2.42). Note that the above setting is not covered by the assumptions underlying (1.4)–(1.7): the fact that $B\neq 0$ excludes [6], the fact that ${\mathcal{X}}$ is not a Hilbert space excludes [15] and [20], and [18] is excluded because $A$ is not a subdifferential.
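The gradients in Example 2.9 are explicit: $\nabla\varphi_{n}(z)=\chi_{n}(\operatorname{sign}(\zeta_{k})|\zeta_{k}|^{p-1})_{k\in\mathbb{N}}$, an element of $\ell^{p/(p-1)}(\mathbb{N})$ with norm $\chi_{n}\|z\|^{p-1}$ and satisfying $\langle z,\nabla\varphi(z)\rangle=\|z\|^{p}$. The Python sketch below checks these facts on a finitely supported $z$ (a truncation we adopt only so that the computation is finite), together with a finite-difference derivative; the data and helper names are our own.

```python
import math

def phi(z, p, chi=1.0):
    # chi * ||z||_p^p / p on a finite truncation of l^p
    return chi * sum(abs(t) ** p for t in z) / p

def grad_phi(z, p, chi=1.0):
    # coordinatewise gradient chi * sign(t) |t|^(p - 1)
    return [chi * math.copysign(abs(t) ** (p - 1.0), t) for t in z]

def lp_norm(z, p):
    return sum(abs(t) ** p for t in z) ** (1.0 / p)

p = 3.0
q = p / (p - 1.0)                      # conjugate exponent of p
z = [1.0, -0.5, 0.25, -0.125]          # finitely supported element of l^p
g = grad_phi(z, p)

dual_norm = lp_norm(g, q)              # norm of grad phi(z) in l^q
pairing = sum(a * b for a, b in zip(g, z))

# central finite difference of phi along the second coordinate
h = 1e-6
zp, zm = list(z), list(z)
zp[1] += h
zm[1] -= h
fd = (phi(zp, p) - phi(zm, p)) / (2.0 * h)

print(dual_norm, lp_norm(z, p) ** (p - 1.0))
print(pairing, lp_norm(z, p) ** p)
print(fd, g[1])
```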

3 Special cases and applications

We illustrate the general scope of Theorem 2.8 by recovering apparently unrelated results and also by deriving new ones. Sufficient conditions for [a] and [b] in Theorem 2.8 to hold can be found in Propositions 2.5 (v) and 2.6, respectively. As to checking the focusing condition [c], the following fact will be useful.

Lemma 3.1

[13, Proposition 2.1(iii)] Let $M_{1}\colon{\mathcal{X}}\rightarrow 2^{{\mathcal{X}}^{*}}$ and $M_{2}\colon{\mathcal{X}}\rightarrow 2^{{\mathcal{X}}^{*}}$ be maximally monotone, let $(a_{n},a_{n}^{*})_{n\in\mathbb{N}}$ be a sequence in $\operatorname{gra}M_{1}$ , let $(b_{n},b_{n}^{*})_{n\in\mathbb{N}}$ be a sequence in $\operatorname{gra}M_{2}$ , let $x\in{\mathcal{X}}$ , and let $y^{*}\in{\mathcal{X}}^{*}$ . Suppose that $a_{n}\>\rightharpoonup\>x$ , $b_{n}^{*}\>\rightharpoonup\>y^{*}$ , $a_{n}^{*}+b_{n}^{*}\rightarrow 0$ , and $a_{n}-b_{n}\rightarrow 0$ . Then $x\in\operatorname{zer}(M_{1}+M_{2})$ .

3.1 Recovering existing frameworks for monotone inclusions

In this section, we show that the existing results of [6, 15, 20] discussed in the Introduction can be recovered from Theorem 2.8. As will be clear from the proofs, more general versions of these results can also be derived at once from Theorem 2.8. First, we derive from Theorem 2.8 the convergence of the Bregman-based proximal point algorithm (1.4) studied in [6, Section 5.5].

Corollary 3.2

Let $A\colon{\mathcal{X}}\rightarrow 2^{{\mathcal{X}}^{*}}$ be maximally monotone, let $f\in\Gamma_{0}({\mathcal{X}})$ be a supercoercive Legendre function such that $\varnothing\neq\operatorname{zer}A\subset\operatorname{dom}A\subset\operatorname{int}\operatorname{dom}f$ and $\nabla f$ is weakly sequentially continuous, and let $(\gamma_{n})_{n\in\mathbb{N}}$ be a sequence in $\left]0,{+}\infty\right[$ such that $\inf_{n\in\mathbb{N}}\gamma_{n}>0$ . Suppose that, for every bounded sequence $(y_{n})_{n\in\mathbb{N}}$ in $\operatorname{int}\operatorname{dom}f$ ,

[TABLE]

Take $x_{0}\in C$ and set $(\forall n\in\mathbb{N})$ $x_{n+1}=(\nabla f+\gamma_{n}A)^{-1}(\nabla f(x_{n}))$ . Then $(x_{n})_{n\in\mathbb{N}}$ converges weakly to a point in $\operatorname{zer}A$ .

Proof. We apply Theorem 2.8 with $B=0$ , $\alpha=1$ , $\kappa=\delta_{1}=\delta_{2}=0$ , and $(\forall n\in\mathbb{N})$ $f_{n}=f$ . First, (1.1) and conditions [a] and [b] in Algorithm 2.4 are trivially fulfilled. On the other hand, since $f$ is a Legendre function and $\operatorname{dom}A\subset\operatorname{int}\operatorname{dom}f$ , condition [c] in Algorithm 2.4 follows from [6, Theorem 3.13(iv)(d)]. Next, condition [a] in Theorem 2.8 follows from Proposition 2.5 (v)[b]. Furthermore, in view of the weak sequential continuity of $\nabla f$ , condition [d]2/ in Theorem 2.8 is satisfied with $g=f$ . To show that the algorithm is focusing, suppose that $\sum_{n\in\mathbb{N}}D_{f}(x_{n+1},x_{n})<{{+}\infty}$ and take $x\in\mathfrak{W}(x_{n})_{n\in\mathbb{N}}$ , say $x_{k_{n}}\>\rightharpoonup\>x$ . Since $(x_{n})_{n\in\mathbb{N}}$ is a bounded sequence in $\operatorname{int}\operatorname{dom}f$ , we derive from (3.1) that $\nabla f(x_{n+1})-\nabla f(x_{n})\rightarrow 0$ . In turn, since $\inf_{n\in\mathbb{N}}\gamma_{n}>0$ , it follows that $\gamma_{n}^{-1}(\nabla f(x_{n+1})-\nabla f(x_{n}))\rightarrow 0$ . However, by construction, $(\forall n\in\mathbb{N})$ $\gamma_{k_{n}-1}^{-1}(\nabla f(x_{k_{n}-1})-\nabla f(x_{k_{n}}))\in Ax_{k_{n}}$ . Therefore, upon invoking Lemma 3.1 (with $M_{1}=A$ and $M_{2}=0$ ), we obtain $x\in\operatorname{zer}A$ and the algorithm is therefore focusing. This also shows that $\mathfrak{W}(x_{n})_{n\in\mathbb{N}}\subset\operatorname{zer}A\subset\operatorname{int}\operatorname{dom}f$ . Condition [b] in Theorem 2.8 is thus satisfied.
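For a concrete instance of the iteration in Corollary 3.2 with an explicit Bregman resolvent, assume (for illustration only) that ${\mathcal{X}}=\mathbb{R}^{m}$, that $f$ is the entropy $\xi\mapsto\sum_{i}(\xi_{i}\ln\xi_{i}-\xi_{i})$, and that $A=\partial h$, where $h\colon\xi\mapsto\sum_{i}(\xi_{i}\ln(\xi_{i}/b_{i})-\xi_{i}+b_{i})$ for a fixed vector $b=(b_{i})_{i}$ with positive entries. Then $\operatorname{dom}A=\operatorname{int}\operatorname{dom}f$, $\operatorname{zer}A=\{b\}$, and the inclusion $\nabla f(x_{n})\in\nabla f(x_{n+1})+\gamma_{n}Ax_{n+1}$ reads $\ln x_{n+1}+\gamma_{n}\ln(x_{n+1}/b)=\ln x_{n}$ coordinatewise, that is, $x_{n+1}=(x_{n}b^{\gamma_{n}})^{1/(1+\gamma_{n})}$. We have not checked condition (3.1) for this pair; the Python sketch below, whose function name is our own, only illustrates the update and its convergence to $b$.

```python
import math

def bregman_proximal_point(x0, b, gammas):
    # Entropic Bregman proximal point for A = grad h, h = KL(. | b) (illustrative setting).
    # Each step solves ln x_{n+1} + gamma_n ln(x_{n+1} / b) = ln x_n in closed form.
    x = list(x0)
    history = [x]
    for gamma in gammas:
        x = [math.exp((math.log(xi) + gamma * math.log(bi)) / (1.0 + gamma))
             for xi, bi in zip(x, b)]
        history.append(x)
    return history

b = [0.5, 2.0, 1.0]
x0 = [3.0, 0.1, 1.0]                                   # x0 in dom A (positive entries)
gammas = [1.0 + 0.5 * (n % 2) for n in range(60)]      # inf gamma_n = 1 > 0
history = bregman_proximal_point(x0, b, gammas)

# residual of the defining inclusion at the first step
x1 = history[1]
residual = max(abs(math.log(u) + gammas[0] * math.log(u / bi) - math.log(v))
               for u, v, bi in zip(x1, x0, b))
print(history[-1], residual)
```

In logarithmic coordinates the error to $b$ is divided by $1+\gamma_{n}$ at each step, which makes the convergence easy to follow.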

The next application of Theorem 2.8 is a variable metric version of the Hilbertian forward-backward method (1.5) established in [15, Theorem 4.1].

Corollary 3.3

Let ${\mathcal{X}}$ be a real Hilbert space, let $A\colon{\mathcal{X}}\rightarrow 2^{\mathcal{X}}$ be maximally monotone, let $\alpha$ and $\beta$ be in $\left]0,{+}\infty\right[$ , and let $B\colon{\mathcal{X}}\rightarrow{\mathcal{X}}$ satisfy

[TABLE]

Further, for every $n\in\mathbb{N}$ , let $U_{n}\colon{\mathcal{X}}\rightarrow{\mathcal{X}}$ be a bounded linear operator which is $\alpha$ -strongly monotone and self-adjoint. Suppose that $\operatorname{zer}(A+B)\neq\varnothing$ and that there exists a summable sequence $(\eta_{n})_{n\in\mathbb{N}}$ in $\left[0,{+}\infty\right[$ such that

[TABLE]

Let $\varepsilon\in\left]0,2\beta\right[$ and let $(\gamma_{n})_{n\in\mathbb{N}}$ be a sequence in $\left]0,{+}\infty\right[$ such that $0<\inf_{n\in\mathbb{N}}\gamma_{n}\leqslant\sup_{n\in\mathbb{N}}\gamma_{n}\leqslant(2\beta-\varepsilon)\alpha$ . Define a sequence $(x_{n})_{n\in\mathbb{N}}$ via the recursion

[TABLE]

Then $(x_{n})_{n\in\mathbb{N}}$ converges weakly to a point in $\operatorname{zer}(A+B)$ .

Proof. Set $f=\|{\mkern 2.0mu\cdot\mkern 2.0mu}\|^{2}/2$ , $C=\operatorname{dom}A$ , and $\mathscr{S}=\operatorname{zer}(A+B)$ . In addition, for every $n\in\mathbb{N}$ , define $f_{n}\colon{\mathcal{X}}\rightarrow\mathbb{R}\colon x\mapsto{\langle{{x}\mid{U_{n}x}}\rangle}/2$ . Let us apply Theorem 2.8 with $\kappa=1/(2\beta-\varepsilon)$ , $\delta_{1}=0$ , and $\delta_{2}=(2\beta-\varepsilon)/(2\beta)\in\left]0,1\right[$ . First, $f\in\Gamma_{0}({\mathcal{X}})$ is a supercoercive Legendre function with $\operatorname{dom}f={\mathcal{X}}$ and, for every $n\in\mathbb{N}$ , since $\nabla f_{n}=U_{n}$ is $\alpha$ -strongly monotone, $f_{n}\in\mathcal{C}_{\alpha}(f)$ . Furthermore, it follows from Proposition 2.1 (vi)[a] that (1.1) is fulfilled. We also observe that condition [a] in Algorithm 2.4 is satisfied. Next, by (3.3) and the assumption that the operators $(U_{n})_{n\in\mathbb{N}}$ are self-adjoint,

[TABLE]

and condition [b] in Algorithm 2.4 therefore holds. Now take $n\in\mathbb{N}$ . Since $\nabla f_{n}=U_{n}$ is maximally monotone with $\operatorname{dom}\nabla f_{n}={\mathcal{X}}$ and $A$ is maximally monotone, [7, Corollary 25.5(i)] entails that $\nabla f_{n}+\gamma_{n}A$ is maximally monotone. Thus, since $\nabla f_{n}+\gamma_{n}A$ is $\alpha$ -strongly monotone, [7, Proposition 22.11(ii)] implies that $\operatorname{ran}(\nabla f_{n}+\gamma_{n}A)={\mathcal{X}}$ and it follows that condition [c] in Algorithm 2.4 is satisfied. Next, in view of Proposition 2.5 (v)[b], $(x_{n})_{n\in\mathbb{N}}$ is bounded, while $\mathfrak{W}(x_{n})_{n\in\mathbb{N}}\subset{\mathcal{X}}=\operatorname{int}\operatorname{dom}f$ . Now set $\mu=\sup_{n\in\mathbb{N}}\|U_{n}\|$ . For every $n\in\mathbb{N}$ , since it results from (3.3) and [7, Fact 2.25(iii)] that

[TABLE]

we derive from [7, Fact 2.25(iii)] that $\|U_{n}\|\leqslant\|U_{0}\|\prod_{k\in\mathbb{N}}(1+\eta_{k})$ . Hence $\mu<{{+}\infty}$ and therefore, appealing to [14, Lemma 2.3(i)], there exists an $\alpha$ -strongly monotone self-adjoint bounded linear operator $U\colon{\mathcal{X}}\rightarrow{\mathcal{X}}$ such that $(\forall w\in{\mathcal{X}})$ $U_{n}w\rightarrow Uw$ . Define $g\colon{\mathcal{X}}\rightarrow\mathbb{R}\colon x\mapsto{\langle{{x}\mid{Ux}}\rangle}/2$ . Then $\nabla g=U$ is strongly monotone (and thus strictly monotone). Furthermore, given $(y_{n})_{n\in\mathbb{N}}$ in $C$ and $y\in\mathfrak{W}(y_{n})_{n\in\mathbb{N}}\cap C$ , say $y_{k_{n}}\>\rightharpoonup\>y$ , we have

[TABLE]

and thus $\nabla f_{k_{n}}(y_{k_{n}})\>\rightharpoonup\>\nabla g(y)$ . Therefore, condition [d]2/ in Theorem 2.8 is satisfied. Let us now verify that (3.4) is focusing. Towards this goal, take $z\in\mathscr{S}$ and suppose that $\sum_{n\in\mathbb{N}}(1-\delta_{2}){\langle{{x_{n}-z}\mid{Bx_{n}-Bz}}\rangle}<{{+}\infty}$ and $\sum_{n\in\mathbb{N}}(1-\kappa\gamma_{n}/\alpha)D_{f_{n}}(x_{n+1},x_{n})<{{+}\infty}$ . Since $\delta_{2}<1$ and $\sup_{n\in\mathbb{N}}(\kappa\gamma_{n})<\alpha$ , we infer from (3.2) that

[TABLE]

and $\sum_{n\in\mathbb{N}}\|x_{n+1}-x_{n}\|^{2}=2\sum_{n\in\mathbb{N}}D_{f}(x_{n+1},x_{n})\leqslant(2/\alpha)\sum_{n\in\mathbb{N}}D_{f_{n}}(x_{n+1},x_{n})<{{+}\infty}$ . It follows that

[TABLE]

Now take $x\in\mathfrak{W}(x_{n})_{n\in\mathbb{N}}$ , say $x_{k_{n}}\>\rightharpoonup\>x$ , and set $(\forall n\in\mathbb{N})$ $x_{n+1}^{*}=\gamma_{n}^{-1}U_{n}(x_{n}-x_{n+1})-Bx_{n}$ . It results from (3.4) that $(x_{k_{n}+1},x_{k_{n}+1}^{*})_{n\in\mathbb{N}}$ lies in $\operatorname{gra}A$ and from (3.9) that $x_{k_{n}+1}\>\rightharpoonup\>x$ . Moreover, (3.9) yields $x_{k_{n}+1}^{*}+Bx_{k_{n}}\rightarrow 0$ . Altogether, Lemma 3.1 (applied to the sequences $(x_{k_{n}+1},x_{k_{n}+1}^{*})_{n\in\mathbb{N}}$ in $\operatorname{gra}A$ and $(x_{k_{n}},Bx_{k_{n}})_{n\in\mathbb{N}}$ in $\operatorname{gra}B$ ) guarantees that $x\in\operatorname{zer}(A+B)$ . Consequently, Theorem 2.8 asserts that $(x_{n})_{n\in\mathbb{N}}$ converges weakly to a point in $\mathscr{S}$ .
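The recursion of Corollary 3.3 is easy to run when the metrics are diagonal. We read (3.4) as $x_{n+1}=(U_{n}+\gamma_{n}A)^{-1}(U_{n}x_{n}-\gamma_{n}Bx_{n})$, which is consistent with the definition of $x_{n+1}^{*}$ in the proof above. The Python sketch below rests on further assumptions of our own: ${\mathcal{X}}=\mathbb{R}^{3}$, $A=\partial(\lambda\|{\mkern 2.0mu\cdot\mkern 2.0mu}\|_{1})$, and $B\colon x\mapsto M^{\top}(Mx-b)$ with $M$ diagonal, so that $B$ is $\beta$-cocoercive with $\beta=1/\max_{i}M_{ii}^{2}$. The metrics $U_{n}$ are diagonal with entries $1+1/(n+1)$, hence self-adjoint, $1$-strongly monotone, and nonincreasing in the Loewner order, which we assume to be compatible with (3.3) for $\eta_{n}\equiv 0$. The resolvent step then reduces to coordinatewise soft-thresholding, and $U_{n}\equiv\operatorname{Id}$ gives the classical method of Example 3.4.

```python
def soft(v, t):
    # soft-thresholding: resolvent of t * |.| evaluated at v
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def variable_metric_fb(d, b, lam, gamma, metric, iters):
    x = [0.0] * len(d)
    for n in range(iters):
        u = metric(n)                                               # diagonal of U_n
        bx = [di * (di * xi - bi) for di, xi, bi in zip(d, x, b)]   # B x_n = M^T (M x_n - b)
        # solve U_n x + gamma A x containing U_n x_n - gamma B x_n, coordinate by coordinate
        x = [soft(ui * xi - gamma * gi, gamma * lam) / ui for ui, xi, gi in zip(u, x, bx)]
    return x

d = [1.0, 2.0, 0.5]          # diagonal of M, so beta = 1/4
b = [3.0, 0.25, -4.0]
lam, gamma = 1.0, 0.4        # gamma <= (2 beta - eps) alpha with eps = 0.1 and alpha = 1

# closed-form minimizer of ||M x - b||^2 / 2 + lam ||x||_1 when M is diagonal
exact = [soft(di * bi, lam) / di ** 2 for di, bi in zip(d, b)]

x_vm = variable_metric_fb(d, b, lam, gamma, lambda n: [1.0 + 1.0 / (n + 1)] * 3, 500)
x_fb = variable_metric_fb(d, b, lam, gamma, lambda n: [1.0] * 3, 500)   # U_n = Id
print(exact, x_vm, x_fb)
```

Both runs share the same fixed points, since $U_{n}x-\gamma_{n}Bx\in U_{n}x+\gamma_{n}Ax$ is equivalent to $0\in Ax+Bx$ whatever $U_{n}$ is.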

Example 3.4

The classical forward-backward method is obtained by setting $U_{n}\equiv\operatorname{Id}$ in Corollary 3.3, which yields

[TABLE]

The case when the proximal parameters $(\gamma_{n})_{n\in\mathbb{N}}$ are constant was first addressed in [17].

We now turn to the Renaud–Cohen algorithm (1.7) and recover [20, Theorem 3.4].

Corollary 3.5

Let ${\mathcal{X}}$ be a real Hilbert space, let $A\colon{\mathcal{X}}\rightarrow 2^{\mathcal{X}}$ and $B\colon{\mathcal{X}}\rightarrow{\mathcal{X}}$ be maximally monotone, and let $f\colon{\mathcal{X}}\rightarrow\mathbb{R}$ be convex and Fréchet differentiable. Suppose that $\operatorname{zer}(A+B)\neq\varnothing$ , that $\nabla f$ is $1$ -strongly monotone on $\operatorname{dom}A$ and Lipschitzian on bounded sets, and that there exists $\beta\in\left]0,{+}\infty\right[$ such that

[TABLE]

Let $\gamma\in\left]0,2\beta\right[$ , take $x_{0}\in\operatorname{dom}A$ , and set $(\forall n\in\mathbb{N})$ $x_{n+1}=(\nabla f+\gamma A)^{-1}(\nabla f(x_{n})-\gamma Bx_{n})$ . Suppose, in addition, that $\nabla f$ is weakly sequentially continuous. Then $(x_{n})_{n\in\mathbb{N}}$ converges weakly to a point in $\operatorname{zer}(A+B)$ .

Proof. Let $\varepsilon\in\left]0,2\beta\right[$ be such that $\gamma<2\beta-\varepsilon$ . We apply Theorem 2.8 with $C=\operatorname{dom}A$ , $\alpha=1$ , $\kappa=1/(2\beta-\varepsilon)$ , $\delta_{1}=\delta_{2}=(2\beta-\varepsilon)/(2\beta)\in\left]0,1\right[$ , and $(\forall n\in\mathbb{N})$ $f_{n}=f$ and $\eta_{n}=0$ . Proposition 2.1 (iv) asserts that (1.1) is satisfied. Furthermore, as shown in the proof of Proposition 2.1 (iv),

[TABLE]

Next, note that conditions [a] and [b] in Algorithm 2.4 are trivially satisfied. Since $\nabla f+\gamma A$ is strongly monotone and since, by [7, Corollary 25.5(i)], $\nabla f+\gamma A$ is maximally monotone, it follows from [7, Proposition 22.11(ii)] that $\operatorname{ran}(\nabla f+\gamma A)={\mathcal{X}}$ and therefore that condition [c] in Algorithm 2.4 holds. We observe that condition [b] in Theorem 2.8 is trivially satisfied and that condition [a] in Theorem 2.8 follows from (3.12) and Proposition 2.5 (i). Furthermore, since $\nabla f$ is weakly sequentially continuous and $1$ -strongly monotone on $C$ , condition [d]2/ in Theorem 2.8 is satisfied with $g=f$ . Now take $z\in\operatorname{zer}(A+B)$ and suppose that $\sum_{n\in\mathbb{N}}(1-\kappa\gamma)D_{f}(x_{n+1},x_{n})<{{+}\infty}$ , $\sum_{n\in\mathbb{N}}(1-\delta_{2}){\langle{{x_{n}-z}\mid{Bx_{n}-Bz}}\rangle}<{{+}\infty}$ , and $\sum_{n\in\mathbb{N}}{\langle{{x_{n+1}-z}\mid{\gamma^{-1}(\nabla f(x_{n})-\nabla f(x_{n+1}))-Bx_{n}+Bz}}\rangle}<{{+}\infty}$ . Then, since $\kappa\gamma<1$ and $\delta_{2}<1$ , it follows that

[TABLE]

and therefore that

[TABLE]

Since $(z,0)\in\operatorname{gra}(A+B)$ and since the sequence $(x_{n+1},\gamma^{-1}(\nabla f(x_{n})-\nabla f(x_{n+1}))-Bx_{n}+Bx_{n+1})_{n\in\mathbb{N}}$ lies in $\operatorname{gra}(A+B)$ by construction, it follows from (3.11) and (3.14) that $\sum_{n\in\mathbb{N}}\|Bx_{n}-Bz\|^{2}<{{+}\infty}$ . On the other hand, since $(x_{n})_{n\in\mathbb{N}}$ lies in $\operatorname{dom}A$ by Proposition 2.5, we deduce from (3.12) and (3.13) that $x_{n+1}-x_{n}\rightarrow 0$ . In turn, it results from the Lipschitz continuity of $\nabla f$ on the bounded set $\{x_{n}\}_{n\in\mathbb{N}}$ that $\nabla f(x_{n})-\nabla f(x_{n+1})\rightarrow 0$ . Now take $x\in\mathfrak{W}(x_{n})_{n\in\mathbb{N}}$ , say $x_{k_{n}}\>\rightharpoonup\>x$ , and set $(\forall n\in\mathbb{N})$ $x_{n+1}^{*}=\gamma^{-1}(\nabla f(x_{n})-\nabla f(x_{n+1}))-Bx_{n}$ . Then $(x_{k_{n}+1},x_{k_{n}+1}^{*})_{n\in\mathbb{N}}$ lies in $\operatorname{gra}A$ . Furthermore, $x_{k_{n}+1}^{*}+Bx_{k_{n}}=\gamma^{-1}(\nabla f(x_{k_{n}})-\nabla f(x_{k_{n}+1}))\rightarrow 0$ and, since $x_{n}-x_{n+1}\rightarrow 0$ , $x_{k_{n}+1}\>\rightharpoonup\>x$ . Thus, applying Lemma 3.1 with the sequences $(x_{k_{n}+1},x_{k_{n}+1}^{*})_{n\in\mathbb{N}}$ and $(x_{k_{n}},Bx_{k_{n}})_{n\in\mathbb{N}}$ yields $x\in\operatorname{zer}(A+B)$ , and we conclude that condition [c] in Theorem 2.8 is satisfied as well.
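To illustrate Corollary 3.5 with a non-quadratic $f$, we make the following assumptions, none of which comes from the text: ${\mathcal{X}}=\mathbb{R}^{3}$, $f\colon\xi\mapsto\sum_{i}(\xi_{i}^{2}/2+\xi_{i}^{4}/4)$, whose gradient is $1$-strongly monotone and Lipschitzian on bounded sets, $A$ is the normal cone operator of the box $[-1,1]^{3}$, and $B\colon x\mapsto x-c$, which is $1$-cocoercive, so that $\beta=1$ and $\operatorname{zer}(A+B)$ consists of the projection of $c$ onto the box. Since $t\mapsto t+t^{3}$ is strictly increasing, the step $x_{n+1}=(\nabla f+\gamma A)^{-1}(\nabla f(x_{n})-\gamma Bx_{n})$ amounts to solving $t+t^{3}=v$ coordinatewise (here by bisection) and clipping the root to $[-1,1]$. The helper names in the Python sketch are hypothetical.

```python
def h(t):
    # one coordinate of grad f: derivative of t^2/2 + t^4/4
    return t + t ** 3

def h_inverse(v, iters=200):
    # h is strictly increasing, so bisection on [-(|v|+1), |v|+1] finds the unique root
    lo, hi = -(abs(v) + 1.0), abs(v) + 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if h(mid) < v:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def clip(t, lower=-1.0, upper=1.0):
    return min(max(t, lower), upper)

def renaud_cohen(c, gamma, iters, x0):
    x = list(x0)                                                  # x0 in dom A = [-1, 1]^3
    for _ in range(iters):
        v = [h(xi) - gamma * (xi - ci) for xi, ci in zip(x, c)]   # grad f(x_n) - gamma B x_n
        x = [clip(h_inverse(vi)) for vi in v]                     # (grad f + gamma A)^{-1}
    return x

c = [0.5, 2.0, -3.0]
x = renaud_cohen(c, gamma=1.0, iters=100, x0=[0.0, 0.0, 0.0])     # gamma in ]0, 2 beta[
print(x, [clip(ci) for ci in c])
```

Clipping is exact here: if the root $r$ of $t+t^{3}=v$ exceeds $1$, then $v-h(1)>0$ lies in $\gamma$ times the normal cone at $1$, and symmetrically at $-1$.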

3.2 The finite-dimensional case

We discuss the finite-dimensional case, a setting in which the assumptions can be greatly simplified and the results presented below are new.

Corollary 3.6

Let $(x_{n})_{n\in\mathbb{N}}$ be a sequence generated by Algorithm 2.4. In addition, suppose that the following hold:

[a] ${\mathcal{X}}$ is finite-dimensional.

[b] $f$ is essentially strictly convex and $\operatorname{dom}f^{*}$ is open.

[c] $(\operatorname{int}\operatorname{dom}f)\cap\operatorname{\overline{dom}}A\subset\operatorname{int}\operatorname{dom}B$ .

[d] $\sup_{n\in\mathbb{N}}(\kappa\gamma_{n})<\alpha$ .

[e] There exists a function $g$ in $\Gamma_{0}({\mathcal{X}})$ which is differentiable on $\operatorname{int}\operatorname{dom}g\supset\operatorname{int}\operatorname{dom}f$ , with $\nabla g$ strictly monotone on $C$ , and such that, for every sequence $(y_{n})_{n\in\mathbb{N}}$ in $C$ and every sequential cluster point $y\in\operatorname{int}\operatorname{dom}f$ of $(y_{n})_{n\in\mathbb{N}}$ , $y_{k_{n}}\rightarrow y$ $\Rightarrow$ $\nabla f_{k_{n}}(y_{k_{n}})\rightarrow\nabla g(y)$ .

Then $(x_{n})_{n\in\mathbb{N}}$ converges to a point in $\mathscr{S}$ .

Proof. It follows from Proposition 2.5 (v)[e] that $(x_{n})_{n\in\mathbb{N}}$ is bounded and from Proposition 2.6 [d] that $\mathfrak{W}(x_{n})_{n\in\mathbb{N}}\subset\operatorname{int}\operatorname{dom}f$ . In view of Theorem 2.8, it remains to show that Algorithm 2.4 is focusing. Towards this goal, let $z\in\mathscr{S}$ , suppose that $(D_{f_{n}}(z,x_{n}))_{n\in\mathbb{N}}$ converges and $\sum_{n\in\mathbb{N}}(1-\kappa\gamma_{n}/\alpha)D_{f_{n}}(x_{n+1},x_{n})<{{+}\infty}$ , and let $x$ be a sequential cluster point of $(x_{n})_{n\in\mathbb{N}}$ , say $x_{k_{n}}\rightarrow x$ . Using [d] and the fact that $(f_{n})_{n\in\mathbb{N}}$ lies in $\mathcal{C}_{\alpha}(f)$ , we obtain

[TABLE]

Since $(x_{k_{n}})_{n\in\mathbb{N}}$ lies in $\operatorname{int}\operatorname{dom}f$ , [4, Theorem 3.8(ii)] and (3.15) imply that

[TABLE]

and [5, Theorem 5.10] thus yields

[TABLE]

Next, it results from [b], [5, Lemma 7.3(vii)], and (3.15) that

[TABLE]

Therefore, since $\nabla f(z)\in\operatorname{int}\operatorname{dom}f^{*}$ [5, Theorem 5.10] and since $f^{*}$ is a Legendre function [5, Corollary 5.5], it results from [5, Lemma 7.3(v)] that $(\nabla f(x_{k_{n}+1}))_{n\in\mathbb{N}}$ is bounded. In turn, there exist a strictly increasing sequence $(l_{k_{n}})_{n\in\mathbb{N}}$ in $\mathbb{N}$ and a point $x^{*}\in{\mathcal{X}}^{*}$ such that

[TABLE]

By lower semicontinuity of $D_{f^{*}}({\mkern 2.0mu\cdot\mkern 2.0mu},\nabla f(z))$ and (3.18), $x^{*}\in\operatorname{dom}f^{*}$ . On the other hand, appealing to [5, Lemma 7.3(vii)] and (3.15), we obtain

[TABLE]

Thus, since $(\nabla f(x_{n}))_{n\in\mathbb{N}}$ lies in $\operatorname{int}\operatorname{dom}f^{*}$ by virtue of Proposition 2.5 and [5, Theorem 5.10], we derive from [4, Theorem 3.9(iii)], (3.17), and (3.19) that $x^{*}=\nabla f(x)$ and, hence, from (3.19) that $\nabla f(x_{l_{k_{n}}+1})\rightarrow\nabla f(x)$ . It thus follows from [5, Theorem 5.10] that $x_{l_{k_{n}}+1}\rightarrow x$ . In turn, by applying [e] to the sequences $(x_{n})_{n\in\mathbb{N}}$ and $(x_{n+1})_{n\in\mathbb{N}}$ , respectively, we get $\nabla f_{l_{k_{n}}}(x_{l_{k_{n}}})\rightarrow\nabla g(x)$ and $\nabla f_{l_{k_{n}}}(x_{l_{k_{n}}+1})\rightarrow\nabla g(x)$ . Now set $(\forall n\in\mathbb{N})$ $x_{n+1}^{*}=\gamma_{n}^{-1}(\nabla f_{n}(x_{n})-\nabla f_{n}(x_{n+1}))-Bx_{n}$ . Then, by construction of $(x_{n})_{n\in\mathbb{N}}$ , $(\forall n\in\mathbb{N})$ $(x_{n+1},x_{n+1}^{*})\in\operatorname{gra}A$ . In addition, since $\inf_{n\in\mathbb{N}}\gamma_{n}>0$ and $\nabla f_{l_{k_{n}}}(x_{l_{k_{n}}})-\nabla f_{l_{k_{n}}}(x_{l_{k_{n}}+1})\rightarrow\nabla g(x)-\nabla g(x)=0$ , we deduce that $x_{l_{k_{n}}+1}^{*}+Bx_{l_{k_{n}}}\rightarrow 0$ . On the other hand, since $(x_{n})_{n\in\mathbb{N}}$ lies in $\operatorname{dom}A$ and $x_{k_{n}}\rightarrow x$ , it follows that $x\in\operatorname{\overline{dom}}A$ and therefore, by (3.16) and [c], that $x\in\operatorname{int}\operatorname{dom}B$ . Hence, using [21, Corollary 1.1], we obtain $Bx_{l_{k_{n}}}\rightarrow Bx$ . Altogether, Lemma 3.1 (applied to the sequence $(x_{l_{k_{n}}+1},x_{l_{k_{n}}+1}^{*})_{n\in\mathbb{N}}$ in $\operatorname{gra}A$ and the sequence $(x_{l_{k_{n}}},Bx_{l_{k_{n}}})_{n\in\mathbb{N}}$ in $\operatorname{gra}B$ ) asserts that $x\in\operatorname{zer}(A+B)$ . In view of Theorem 2.8, we conclude that $(x_{n})_{n\in\mathbb{N}}$ converges to a point in $\mathscr{S}$ .

3.3 Forward-backward splitting for convex minimization

In this section, we study the convergence of (1.6). Our results improve on and complement those of [18].

Problem 3.7

Let $\varphi\in\Gamma_{0}({\mathcal{X}})$ , let $\psi\in\Gamma_{0}({\mathcal{X}})$ , and let $f\in\Gamma_{0}({\mathcal{X}})$ be essentially smooth. Set $C=(\operatorname{int}\operatorname{dom}f)\cap\operatorname{dom}\partial\varphi$ and $\mathscr{S}=(\operatorname{int}\operatorname{dom}f)\cap\operatorname{Argmin}(\varphi+\psi)$ . Suppose that $\varphi+\psi$ is coercive, $\varnothing\neq C\subset\operatorname{int}\operatorname{dom}\psi$ , $\mathscr{S}\neq\varnothing$ , $\psi$ is Gâteaux differentiable on $\operatorname{int}\operatorname{dom}\psi$ , and there exists $\kappa\in\left]0,{+}\infty\right[$ such that

[TABLE]

The objective is to find a point in $\mathscr{S}$ .

In the context of Problem 3.7, given $\alpha\in\left]0,{+}\infty\right[$ , $\gamma\in\left]0,{+}\infty\right[$ , and $g\in\mathcal{C}_{\alpha}(f)$ , we define $\operatorname{prox}^{g}_{\gamma\varphi}=(\nabla g+\gamma\partial\varphi)^{-1}$ .

Algorithm 3.8

Consider the setting of Problem 3.7. Let $\alpha\in\left]0,{+}\infty\right[$ , let $(\gamma_{n})_{n\in\mathbb{N}}$ be a sequence in $\left]0,{+}\infty\right[$ , and let $(f_{n})_{n\in\mathbb{N}}$ be a sequence in $\mathcal{C}_{\alpha}(f)$ . Suppose that the following hold:

[a] There exists $\varepsilon\in\left]0,1\right[$ such that $0<\inf_{n\in\mathbb{N}}\gamma_{n}\leqslant\sup_{n\in\mathbb{N}}\gamma_{n}\leqslant\alpha(1-\varepsilon)/\kappa$ .

[b] There exists a summable sequence $(\eta_{n})_{n\in\mathbb{N}}$ in $\left[0,{+}\infty\right[$ such that $(\forall n\in\mathbb{N})$ $D_{f_{n+1}}\leqslant(1+\eta_{n})D_{f_{n}}$ .

[c] For every $n\in\mathbb{N}$ , $\operatorname{int}\operatorname{dom}f_{n}=\operatorname{dom}\partial f_{n}$ and $\nabla f_{n}$ is strictly monotone on $C$ .

Take $x_{0}\in C$ and set $(\forall n\in\mathbb{N})$ $x_{n+1}=\operatorname{prox}^{f_{n}}_{\gamma_{n}\varphi}(\nabla f_{n}(x_{n})-\gamma_{n}\nabla\psi(x_{n}))$ .
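As an illustration of Algorithm 3.8 with data of our own choosing (not the paper's), take ${\mathcal{X}}=\mathbb{R}^{3}$, $f_{n}\equiv f\colon\xi\mapsto\sum_{i}(\xi_{i}\ln\xi_{i}-\xi_{i})$ (so $\alpha=1$), $\varphi$ the indicator function of the unit simplex $\Delta$, and $\psi=\|{\mkern 2.0mu\cdot\mkern 2.0mu}-c\|^{2}/2$ for some $c$ with positive entries summing to $1$. Then $\operatorname{prox}^{f}_{\gamma\varphi}$ maps $u$ to the vector with entries $\exp(u_{i})/\sum_{j}\exp(u_{j})$, and the algorithm becomes the entropic (exponentiated-gradient) update $x_{n+1,i}\propto x_{n,i}\exp(-\gamma_{n}(x_{n,i}-c_{i}))$. We take $\kappa=1$; this presumes that the inequality required in Problem 3.7 is a relative smoothness condition of $\psi$ with respect to $f$ (an assumption on our part), which holds here with constant $1$ on $\Delta$ because $f-\psi$ has Hessian $\operatorname{diag}(1/\xi_{i})-\operatorname{Id}$, positive semidefinite when every $\xi_{i}\leqslant 1$. The Python sketch also records the objective values, which Theorem 3.9(ii) predicts to be nonincreasing; its function names are hypothetical.

```python
import math

def objective(x, c):
    # (phi + psi)(x) for x in the simplex, where phi vanishes
    return 0.5 * sum((xi - ci) ** 2 for xi, ci in zip(x, c))

def entropic_forward_backward(c, gamma, iters):
    x = [1.0 / len(c)] * len(c)                 # x_0 in C: positive entries summing to 1
    values = [objective(x, c)]
    for _ in range(iters):
        # forward step in the dual space: grad f(x_n) - gamma * grad psi(x_n)
        u = [math.log(xi) - gamma * (xi - ci) for xi, ci in zip(x, c)]
        # backward step prox^f_{gamma phi}: normalized exponential, shifted by max(u) for stability
        m = max(u)
        w = [math.exp(ui - m) for ui in u]
        s = sum(w)
        x = [wi / s for wi in w]
        values.append(objective(x, c))
    return x, values

c = [0.5, 0.3, 0.2]
x, values = entropic_forward_backward(c, gamma=0.9, iters=500)   # gamma <= alpha (1 - eps) / kappa
print(x, values[:3], values[-1])
```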

Theorem 3.9

Let $(x_{n})_{n\in\mathbb{N}}$ be a sequence generated by Algorithm 3.8 and suppose that the following hold:

[a] $\mathfrak{W}(x_{n})_{n\in\mathbb{N}}\subset\operatorname{int}\operatorname{dom}f$ .

[b] One of the following is satisfied:

1/ $\mathscr{S}$ is a singleton.

2/ There exists a function $g$ in $\Gamma_{0}({\mathcal{X}})$ which is Gâteaux differentiable on $\operatorname{int}\operatorname{dom}g\supset C$ , with $\nabla g$ strictly monotone on $C$ , and such that, for every sequence $(y_{n})_{n\in\mathbb{N}}$ in $C$ and every $y\in\mathfrak{W}(y_{n})_{n\in\mathbb{N}}\cap C$ , $y_{k_{n}}\>\rightharpoonup\>y$ $\Rightarrow$ $\nabla f_{k_{n}}(y_{k_{n}})\>\rightharpoonup\>\nabla g(y)$ .

Then the following hold:

(i) $(x_{n})_{n\in\mathbb{N}}$ converges weakly to a point in $\mathscr{S}$ .

(ii) $(x_{n})_{n\in\mathbb{N}}$ is a monotone minimizing sequence: $\varphi(x_{n})+\psi(x_{n})\downarrow\min(\varphi+\psi)({\mathcal{X}})$ .

(iii) $\sum_{n\in\mathbb{N}}((\varphi+\psi)(x_{n})-\min(\varphi+\psi)({\mathcal{X}}))<{{+}\infty}$ and $(\varphi+\psi)(x_{n})-\min(\varphi+\psi)({\mathcal{X}})=o(1/n)$ .

(iv) $\sum_{n\in\mathbb{N}}n(D_{f_{n}}(x_{n+1},x_{n})+D_{f_{n}}(x_{n},x_{n+1}))<{{+}\infty}$ .

Proof. (i): We shall derive this result from Theorem 2.8 with $A=\partial\varphi$ , $B=\partial\psi$ , $\delta_{1}=0$ , and $\delta_{2}=1$ . First, appealing to [24, Theorem 2.4.4(i)], $B$ is single-valued on $\operatorname{int}\operatorname{dom}B=\operatorname{int}\operatorname{dom}\psi$ and $B=\nabla\psi$ on $\operatorname{int}\operatorname{dom}B$ . Next, set $\theta=\varphi+\psi$ . Since $\varnothing\neq(\operatorname{int}\operatorname{dom}f)\cap\operatorname{dom}\partial\varphi\subset\operatorname{int}\operatorname{dom}\psi$ , we have $\operatorname{dom}\varphi\cap\operatorname{int}\operatorname{dom}\psi\neq\varnothing$ . Hence, [9, Theorem 4.1.19] yields $A+B=\partial\theta$ . Therefore, $\operatorname{Argmin}\theta=\operatorname{zer}\partial\theta=\operatorname{zer}(A+B)$ and $\mathscr{S}=(\operatorname{int}\operatorname{dom}f)\cap\operatorname{zer}(A+B)$ . Next, in view of Proposition 2.1 (iii), (1.1) is fulfilled. On the other hand, conditions [a] and [b] in Algorithm 2.4 are trivially satisfied. To verify condition [c] in Algorithm 2.4, it suffices to show that, for every $n\in\mathbb{N}$ , $(\nabla f_{n}-\gamma_{n}B)(C)\subset\operatorname{ran}(\nabla f_{n}+\gamma_{n}A)$ , i.e., since $C\subset\operatorname{int}\operatorname{dom}B$ and $B=\nabla\psi$ on $\operatorname{int}\operatorname{dom}B$ , that $(\nabla f_{n}-\gamma_{n}\nabla\psi)(C)\subset\operatorname{ran}(\nabla f_{n}+\gamma_{n}A)$ . To do so, fix temporarily $n\in\mathbb{N}$ , let $x\in C$ , and set

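$$A_{n}\colon{\mathcal{X}}\to2^{{\mathcal{X}}^{*}}\colon y\mapsto\nabla f_{n}(y)+\gamma_{n}Ay-\nabla f_{n}(x)+\gamma_{n}\nabla\psi(x).$$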

Then, since $\operatorname{dom}\partial f_{n}\cap\operatorname{dom}A=(\operatorname{int}\operatorname{dom}f_{n})\cap\operatorname{dom}A=(\operatorname{int}\operatorname{dom}f)\cap\operatorname{dom}A\neq\varnothing$ by condition [c] in Algorithm 3.8, it follows from [6, Proposition 3.12] that $A_{n}$ is maximally monotone. Next, we deduce from condition [a] in Algorithm 3.8 and (3.21) that

[TABLE]

In turn,

[TABLE]

However, by coercivity of $\theta$ , there exists $\rho\in\left]0,{+}\infty\right[$ such that

[TABLE]

Now suppose that $(y,y^{*})\in\operatorname{gra}A_{n}({\mkern 2.0mu\cdot\mkern 2.0mu}+x)$ satisfies $\|y\|\geqslant\rho$ . Then $y+x\in\operatorname{dom}\nabla f_{n}\cap\operatorname{dom}A=(\operatorname{int}\operatorname{dom}f_{n})\cap\operatorname{dom}A=C$ and $y^{*}-\nabla f_{n}(y+x)+\gamma_{n}\nabla\psi(y+x)+\nabla f_{n}(x)-\gamma_{n}\nabla\psi(x)\in\gamma_{n}(A+B)(y+x)$ . Thus, it follows from (3.25) and (3.24) that

[TABLE]

Therefore, in view of [22, Proposition 2] and the maximal monotonicity of $A_{n}({\mkern 2.0mu\cdot\mkern 2.0mu}+x)$ , there exists $\overline{y}\in{\mathcal{X}}$ such that $0\in A_{n}(\overline{y}+x)$ . Hence $(\nabla f_{n}-\gamma_{n}\nabla\psi)(x)\in\nabla f_{n}(\overline{y}+x)+\gamma_{n}A(\overline{y}+x)\subset\operatorname{ran}(\nabla f_{n}+\gamma_{n}A)$ , as desired. Since $(x_{n+1},\gamma_{n}^{-1}(\nabla f_{n}(x_{n})-\nabla f_{n}(x_{n+1}))-\nabla\psi(x_{n}))$ lies in $\operatorname{gra}\partial\varphi$ by construction, we derive from [6, Proposition 2.3(ii)] that

[TABLE]

On the other hand, (3.23) and the convexity of $\psi$ entail that

[TABLE]

Altogether, upon adding (3.3) and (3.28), we obtain

[TABLE]

In particular, since $x_{n}\in C$ ,

[TABLE]

This shows that

$$(\forall n\in\mathbb{N})\quad\theta(x_{n+1})\leqslant\theta(x_{n}).$$

In turn, using the coercivity of $\theta$, we infer that $(x_{n})_{n\in\mathbb{N}}$ is bounded, which secures condition [a] in Theorem 2.8. It remains to verify that Algorithm 3.8 is focusing. To this end, let $z\in\mathscr{S}$ and suppose that

[TABLE]

and

[TABLE]

Set $\gamma=\inf_{n\in\mathbb{N}}\gamma_{n}$ and $\ell=\lim D_{f_{n}}(z,x_{n})$ . It follows from (3.29) applied to $z\in C$ that

[TABLE]

and therefore from condition [b] in Algorithm 3.8 that

[TABLE]

Hence, $\varlimsup\gamma(\theta(x_{n+1})-\min\theta({\mathcal{X}}))+\ell\leqslant\ell$ and therefore $\varlimsup(\theta(x_{n+1})-\min\theta({\mathcal{X}}))=0$ . Thus

$$\theta(x_{n})\to\min\theta({\mathcal{X}}).$$

Now take $x\in\mathfrak{W}(x_{n})_{n\in\mathbb{N}}$ , say $x_{k_{n}}\>\rightharpoonup\>x$ . By weak lower semicontinuity of $\theta$ , $\min\theta({\mathcal{X}})\leqslant\theta(x)\leqslant\varliminf\theta(x_{k_{n}})=\min\theta({\mathcal{X}})$ and it follows that $x\in\operatorname{Argmin}\theta=\operatorname{zer}(A+B)$ . Consequently, Theorem 2.8 asserts that $(x_{n})_{n\in\mathbb{N}}$ converges weakly to a point in $\mathscr{S}$ .
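An estimate of the type used in the proof of (i) can be sketched as follows. The sketch rests on two assumptions made only for this purpose: that [6, Proposition 2.3(ii)] is the standard three-point identity for Bregman distances, and that the smoothness hypothesis invoked through (3.23) amounts to $D_{\psi}(x_{n+1},x_{n})\leqslant\kappa D_{f_{n}}(x_{n+1},x_{n})$ with $\gamma_{n}\kappa<1$. Fix $n\in\mathbb{N}$ and $x\in C$.

```latex
\begin{align*}
% Three-point identity (assumed content of [6, Proposition 2.3(ii)]),
% for x in dom f_n and y, z in int dom f_n:
&D_{f_n}(x,y)+D_{f_n}(y,z)-D_{f_n}(x,z)=\big\langle x-y,\nabla f_n(z)-\nabla f_n(y)\big\rangle.\\
% Subgradient inequality at x_{n+1} for
% u_n=\gamma_n^{-1}(\nabla f_n(x_n)-\nabla f_n(x_{n+1}))-\nabla\psi(x_n)\in\partial\varphi(x_{n+1}),
% combined with the identity at (y,z)=(x_{n+1},x_n):
&\gamma_n\big(\varphi(x_{n+1})-\varphi(x)\big)\leqslant D_{f_n}(x,x_n)-D_{f_n}(x,x_{n+1})-D_{f_n}(x_{n+1},x_n)+\gamma_n\big\langle x-x_{n+1},\nabla\psi(x_n)\big\rangle.\\
% Exact expansion of psi around x_n in terms of D_psi:
&\gamma_n\big(\psi(x_{n+1})-\psi(x)\big)=-\gamma_n\big\langle x-x_{n+1},\nabla\psi(x_n)\big\rangle+\gamma_nD_{\psi}(x_{n+1},x_n)-\gamma_nD_{\psi}(x,x_n).\\
% Adding, with theta = phi + psi and the assumed bound D_psi(x_{n+1},x_n) <= kappa D_{f_n}(x_{n+1},x_n):
&\gamma_n\big(\theta(x_{n+1})-\theta(x)\big)\leqslant D_{f_n}(x,x_n)-D_{f_n}(x,x_{n+1})-(1-\gamma_n\kappa)D_{f_n}(x_{n+1},x_n)-\gamma_nD_{\psi}(x,x_n).
\end{align*}
```

By convexity of $\psi$, $D_{\psi}(x,x_{n})\geqslant0$. Taking $x=x_{n}$ therefore yields $\gamma_{n}(\theta(x_{n+1})-\theta(x_{n}))\leqslant-D_{f_{n}}(x_{n},x_{n+1})-(1-\gamma_{n}\kappa)D_{f_{n}}(x_{n+1},x_{n})\leqslant0$. Hence $(\theta(x_{n}))_{n\in\mathbb{N}}$ decreases, and the two Bregman distances on the right are the ones summed in item (iv).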

(ii): Combine (3.31) and (3.36).

(iii)&(iv): Fix $z\in\mathscr{S}$ and set $\gamma=\inf_{n\in\mathbb{N}}\gamma_{n}$ . Arguing along the same lines as above, we obtain

[TABLE]

and therefore [7, Lemma 5.31] guarantees that $\sum_{n\in\mathbb{N}}(\theta(x_{n})-\min\theta({\mathcal{X}}))<{{+}\infty}$. In addition, $(\theta(x_{n})-\min\theta({\mathcal{X}}))_{n\in\mathbb{N}}$ is decreasing by virtue of (3.31). Now recall that if $(\alpha_{n})_{n\in\mathbb{N}}$ is a decreasing sequence in $\left[0,{+}\infty\right[$ such that $\sum_{n\in\mathbb{N}}\alpha_{n}<{{+}\infty}$, then

$$n\alpha_{n}\to0\quad\text{and}\quad\sum_{n\in\mathbb{N}}n(\alpha_{n}-\alpha_{n+1})<{{+}\infty}.$$

Hence, $\theta(x_{n})-\min\theta({\mathcal{X}})=o(1/n)$ and $\sum_{n\in\mathbb{N}}n(\theta(x_{n})-\theta(x_{n+1}))<{{+}\infty}$ . Consequently, since (3.29) yields

[TABLE]

we infer that $\sum_{n\in\mathbb{N}}n(D_{f_{n}}(x_{n+1},x_{n})+D_{f_{n}}(x_{n},x_{n+1}))<{{+}\infty}$ .
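The fact about decreasing summable sequences recalled in the proof of (iii)&(iv) can be verified in two lines. We give a standard argument here, which is not necessarily the one the authors have in mind. Let $(\alpha_{n})_{n\in\mathbb{N}}$ be decreasing in $\left[0,{+}\infty\right[$ with $\sum_{n\in\mathbb{N}}\alpha_{n}<{{+}\infty}$.

```latex
\begin{align*}
% (a) For n >= 1 there are at least n/2 indices k with ceil(n/2) <= k <= n,
% and alpha_k >= alpha_n for each of them because the sequence decreases:
&\tfrac{n}{2}\,\alpha_n\leqslant\sum_{k=\lceil n/2\rceil}^{n}\alpha_k\leqslant\sum_{k\geqslant\lceil n/2\rceil}\alpha_k\;\xrightarrow[n\to{+}\infty]{}\;0.\\
% (b) Summation by parts, for every N in N:
&\sum_{n=0}^{N}n(\alpha_n-\alpha_{n+1})=\sum_{n=1}^{N+1}\alpha_n-(N+1)\alpha_{N+1}\leqslant\sum_{n\in\mathbb{N}}\alpha_n.
\end{align*}
```

Each term $n(\alpha_{n}-\alpha_{n+1})$ is nonnegative, so (b) gives $\sum_{n\in\mathbb{N}}n(\alpha_{n}-\alpha_{n+1})<{{+}\infty}$. Meanwhile, (a) gives $\alpha_{n}=o(1/n)$.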

Remark 3.10

Let us relate Theorem 3.9 to the literature.

(i) The conclusions of items (i) and (ii) are obtained in [18, Theorem 1(2)] under more restrictive conditions on the sequences $(\gamma_{n})_{n\in\mathbb{N}}$ and $(f_{n})_{n\in\mathbb{N}}$. In particular, Theorem 3.9 does not require the additional condition $(\forall n\in\mathbb{N})$ $(1+\eta_{n})\gamma_{n}-\gamma_{n+1}\leqslant\alpha\eta_{n}/\kappa$. Nor does it assume that ${-}\operatorname{ran}\nabla\psi\subset\operatorname{dom}\varphi^{*}$ or that the functions $(f_{n})_{n\in\mathbb{N}}$ are cofinite.

(ii) Items (iii) and (iv) are new even in Euclidean spaces. In the finite-dimensional setting, partial results can be found in [3], where:

(a) A single convex function is used: $(\forall n\in\mathbb{N})$ $f_{n}=f$.

(b) The viability of the sequence $(x_{n})_{n\in\mathbb{N}}$ is a blanket assumption, whereas it is guaranteed in Theorem 3.9.

(c) Only the rates $\sum_{n\in\mathbb{N}}D_{f}(x_{n+1},x_{n})<{{+}\infty}$ and $(\varphi+\psi)(x_{n})-\min(\varphi+\psi)({\mathcal{X}})=O(1/n)$ are obtained.
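The following Python sketch runs the iteration of Theorem 3.9 on one finite-dimensional instance in the spirit of [3]. It shows the monotone decrease of item (ii) and the viability of the iterates. The instance is an assumption chosen for illustration, not taken from this paper. We take $\mathcal{X}=\mathbb{R}^{3}$, the Burg kernel $f_{n}=f=-\sum_{j}\log x_{j}$, $\psi(x)=\mathrm{KL}(b,Kx)$ for a hypothetical nonnegative matrix $K$ and positive vector $b$, and $\varphi=\lambda\langle\mathbf{1},\cdot\rangle+\iota_{\mathbb{R}^{3}_{+}}$. For this pair, $D_{\psi}\leqslant\|b\|_{1}D_{f}$ on the open orthant; this is the relative-smoothness bound of [3], and it also follows from a direct Hessian computation. The run uses the constant step $\gamma=0.9/\|b\|_{1}$. We do not check every hypothesis of Algorithm 3.8 here.

```python
import math


def theta(K, b, x, lam):
    """Objective psi(x) + phi(x) with psi(x) = KL(b, Kx), phi(x) = lam * sum(x), x > 0."""
    total = lam * sum(x)
    for row, bi in zip(K, b):
        kx = sum(k * xj for k, xj in zip(row, x))
        total += bi * math.log(bi / kx) + kx - bi
    return total


def grad_psi(K, b, x):
    """Gradient of psi(x) = KL(b, Kx), namely K^T (1 - b / Kx)."""
    kx = [sum(k * xj for k, xj in zip(row, x)) for row in K]
    return [sum(K[i][j] * (1.0 - b[i] / kx[i]) for i in range(len(K)))
            for j in range(len(x))]


def bregman_fb_burg(K, b, x0, lam, gamma, iters):
    """Forward-backward steps with the Burg kernel f(x) = -sum(log x_j).

    Since grad f(x) = -1/x and the subdifferential of phi is lam * 1 on x > 0, the
    backward step grad f(x+) + gamma*lam = grad f(x) - gamma*grad psi(x) has the
    closed form 1/x+ = 1/x + gamma*(grad psi(x) + lam), applied componentwise.
    """
    x = list(x0)
    values = [theta(K, b, x, lam)]
    for _ in range(iters):
        g = grad_psi(K, b, x)
        # Denominators stay >= 1 - gamma*sum(b) > 0, so the iterates remain positive.
        x = [xj / (1.0 + gamma * xj * (gj + lam)) for xj, gj in zip(x, g)]
        values.append(theta(K, b, x, lam))
    return x, values


# Hypothetical data: K entrywise nonnegative with nonzero rows, b entrywise positive.
K = [[1.0, 2.0, 0.5], [0.3, 1.0, 1.0], [2.0, 0.1, 1.0], [0.5, 0.5, 0.5]]
b = [2.0, 1.5, 3.0, 1.0]
kappa = sum(b)         # ||b||_1: D_psi <= kappa * D_f for this pair
gamma = 0.9 / kappa    # constant step size strictly below 1/kappa
x, values = bregman_fb_burg(K, b, [1.0, 1.0, 1.0], lam=0.1, gamma=gamma, iters=200)
```

The list `values` should be nonincreasing, as in item (ii). Positivity of the iterates can be traced directly. Because $K$ is nonnegative, $x_{j}\sum_{i}K_{ij}b_{i}/(Kx)_{i}\leqslant\|b\|_{1}$, so every denominator exceeds $1-\gamma\|b\|_{1}>0$.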

3.4 Further applications

Theorems 2.8 and 3.9 operate under broad assumptions that go beyond those of the existing forward-backward methods of [6, 15, 18, 20] described in (1.4)–(1.7). The following two examples exploit this generality and do not fit the existing scenarios.

Example 3.11

Consider the setting of Problem 1.1. Suppose, in addition, that the following hold:

[a] $A$ is uniformly monotone on bounded sets.

[b] There exist $\psi\in\Gamma_{0}({\mathcal{X}})$ and $\kappa\in\left]0,{+}\infty\right[$ such that $B=\partial\psi$ and $(\forall x\in C)(\forall y\in C)$ $D_{\psi}(x,y)\leqslant\kappa D_{f}(x,y)$.

[c] $f$ is supercoercive.

[d] $\operatorname{zer}(A+B)\subset\operatorname{int}\operatorname{dom}f$.

Let $(\gamma_{n})_{n\in\mathbb{N}}$ be a sequence in $\left]0,{+}\infty\right[$ such that $0<\inf_{n\in\mathbb{N}}\gamma_{n}\leqslant\sup_{n\in\mathbb{N}}\gamma_{n}<1/\kappa$ , take $x_{0}\in C$ , and set $(\forall n\in\mathbb{N})$ $x_{n+1}=(\nabla f+\gamma_{n}A)^{-1}(\nabla f(x_{n})-\gamma_{n}\nabla\psi(x_{n}))$ . Then $(x_{n})_{n\in\mathbb{N}}$ converges strongly to the unique zero of $A+\nabla\psi$ .
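The iteration of Example 3.11 can be tried in the simplest setting. Take $\mathcal{X}=\mathbb{R}^{2}$ and $f=\|\cdot\|^{2}/2$, so that $\nabla f=\mathrm{Id}$ and $D_{f}(x,y)=\|x-y\|^{2}/2$. The update then becomes $x_{n+1}=(\mathrm{Id}+\gamma_{n}A)^{-1}(x_{n}-\gamma_{n}\nabla\psi(x_{n}))$. In the sketch below all data are hypothetical. $A$ is the linear map $M=\tfrac12\mathrm{Id}+S$ with $S$ skew-symmetric, which is strongly (hence uniformly) monotone but is not a subdifferential. We take $\psi=\|K\cdot-c\|^{2}/2$, for which $D_{\psi}(x,y)=\|K(x-y)\|^{2}/2\leqslant\kappa D_{f}(x,y)$ with $\kappa=\|K\|^{2}$. The code compares the iterates with the unique zero of $A+\nabla\psi$, computed directly.

```python
import math


def mat_vec(M, x):
    """Product of a 2x2 matrix with a 2-vector."""
    return [M[0][0] * x[0] + M[0][1] * x[1], M[1][0] * x[0] + M[1][1] * x[1]]


def solve_2x2(M, r):
    """Solve M u = r for a nonsingular 2x2 matrix M by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(r[0] * M[1][1] - M[0][1] * r[1]) / det,
            (M[0][0] * r[1] - M[1][0] * r[0]) / det]


# Hypothetical data: A = M is 0.5*Id plus a skew part (strongly monotone, not a
# gradient); psi(x) = 0.5*||K x - c||^2, so grad psi(x) = K^T K x - K^T c.
M = [[0.5, 1.0], [-1.0, 0.5]]
K = [[1.0, 2.0], [0.0, 1.0]]
c = [1.0, -1.0]
KtK = [[sum(K[k][i] * K[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
Ktc = [sum(K[k][i] * c[k] for k in range(2)) for i in range(2)]

# kappa = ||K||^2 = largest eigenvalue of the symmetric matrix K^T K.
tr = KtK[0][0] + KtK[1][1]
det = KtK[0][0] * KtK[1][1] - KtK[0][1] * KtK[1][0]
kappa = (tr + math.sqrt(tr * tr - 4.0 * det)) / 2.0


def iterate(x0, gamma, iters):
    """x_{n+1} = (Id + gamma*M)^{-1}(x_n - gamma*grad psi(x_n)), using grad f = Id."""
    lhs = [[1.0 + gamma * M[0][0], gamma * M[0][1]],
           [gamma * M[1][0], 1.0 + gamma * M[1][1]]]
    x = list(x0)
    for _ in range(iters):
        grad = [u - v for u, v in zip(mat_vec(KtK, x), Ktc)]
        x = solve_2x2(lhs, [xi - gamma * gi for xi, gi in zip(x, grad)])
    return x


# Unique zero of A + grad psi, i.e. the solution of (M + K^T K) x = K^T c.
x_star = solve_2x2([[M[i][j] + KtK[i][j] for j in range(2)] for i in range(2)], Ktc)
x_final = iterate([5.0, -3.0], gamma=0.9 / kappa, iters=500)
```

The step size $\gamma=0.9/\kappa$ respects $\sup_{n}\gamma_{n}<1/\kappa$. In this linear, strongly monotone instance, the error also contracts geometrically, because $\|(\mathrm{Id}+\gamma M)^{-1}\|\leqslant1/(1+\gamma/2)$. This is why a few hundred iterations suffice.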

The next example concerns variational inequalities.

Example 3.12

Let $\varphi\in\Gamma_{0}({\mathcal{X}})$ , let $B\colon{\mathcal{X}}\rightarrow 2^{{\mathcal{X}}^{*}}$ be maximally monotone, let $f\in\Gamma_{0}({\mathcal{X}})$ be essentially smooth, and set $C=(\operatorname{int}\operatorname{dom}f)\cap\operatorname{dom}\partial\varphi$ . Suppose that $C\subset\operatorname{int}\operatorname{dom}B$ and $B$ is single-valued on $\operatorname{int}\operatorname{dom}B$ . Consider the problem of finding a point in

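$$\mathscr{S}=\big\{x\in C\;\big|\;(\forall y\in{\mathcal{X}})\;\;\big\langle{x-y},{Bx}\big\rangle+\varphi(x)\leqslant\varphi(y)\big\},$$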

which is assumed to be nonempty. This is a special case of Problem 1.1 with $A=\partial\varphi$ and, given $x_{0}\in C$, Algorithm 2.4 produces the iterations $(\forall n\in\mathbb{N})$ $x_{n+1}=\operatorname{prox}_{\gamma_{n}\varphi}^{f_{n}}(\nabla f_{n}(x_{n})-\gamma_{n}Bx_{n})$. Theorem 2.8 provides conditions under which $(x_{n})_{n\in\mathbb{N}}$ converges weakly to a point in $\mathscr{S}$. Even in Euclidean spaces, this scheme is new and of interest since, as shown in [3, 13, 18], the Bregman proximity operator $\operatorname{prox}_{\gamma_{n}\varphi}^{f_{n}}$ may be easier to compute for a particular $f_{n}$ than for the standard kernel $\|{\mkern 2.0mu\cdot\mkern 2.0mu}\|^{2}/2$. Altogether, our framework makes it possible to solve variational inequalities by forward-backward splitting with non-cocoercive operators and/or outside of Hilbert spaces.
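To illustrate why a Bregman proximity operator can be cheaper, here is a sketch built on hypothetical choices. Take $\varphi=\iota_{\Delta}$, the indicator of the unit simplex in $\mathbb{R}^{3}$, and $f_{n}=f=\sum_{j}(x_{j}\log x_{j}-x_{j})$. Then $\operatorname{prox}_{\gamma\varphi}^{f}(u)$ solves $\nabla f(y)+N_{\Delta}(y)\ni u$, whose solution is the softmax of $u$. The Euclidean projection onto $\Delta$ is usually computed by sorting. The iteration of Example 3.12 then reads $x_{n+1}=\operatorname{softmax}(\log x_{n}-\gamma Bx_{n})$. The first test operator, $Bx=x-p$, is a gradient. For it, Pinsker's inequality gives $D_{\psi}\leqslant D_{f}$ on the simplex, so $\gamma=0.9$ is a natural step. For the second, skew operator, which is monotone but not a gradient, we make no convergence claim and only check that the iterates stay in $C$.

```python
import math


def entropic_prox_simplex(u):
    """Bregman prox of the simplex indicator for f(y) = sum(y_j log y_j - y_j).

    The inclusion grad f(y) + N_simplex(y) contains u is solved by the softmax of u,
    so no sorting or projection step is needed.
    """
    m = max(u)                                # shift for numerical stability
    w = [math.exp(ui - m) for ui in u]
    s = sum(w)
    return [wi / s for wi in w]


def bregman_fb_vi(B, x0, gamma, iters):
    """x_{n+1} = prox^f_{gamma*phi}(grad f(x_n) - gamma*B x_n), phi = simplex indicator.

    Here grad f(x) = log(x) componentwise and B is a single-valued operator.
    """
    x = list(x0)
    for _ in range(iters):
        bx = B(x)
        x = entropic_prox_simplex([math.log(xj) - gamma * bj for xj, bj in zip(x, bx)])
    return x


# Hypothetical gradient-type operator: B x = x - p, the gradient of 0.5*||x - p||^2.
p = [0.5, 0.3, 0.2]


def shifted_identity(x):
    return [xj - pj for xj, pj in zip(x, p)]


# Hypothetical skew operator (monotone, not a gradient): only feasibility is checked.
def skew(x):
    return [x[1] - x[2], x[2] - x[0], x[0] - x[1]]


x_vi = bregman_fb_vi(shifted_identity, [1 / 3, 1 / 3, 1 / 3], gamma=0.9, iters=2000)
x_skew = bregman_fb_vi(skew, [0.6, 0.3, 0.1], gamma=0.5, iters=50)
```

For the first operator, the unique solution of the variational inequality is $p$, and the iterates approach it. The softmax output is always positive and sums to one, so every iterate lies in $C=(\operatorname{int}\operatorname{dom}f)\cap\operatorname{dom}\partial\varphi$, whatever $B$ is.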

Bibliography

[1] J.-B. Baillon and G. Haddad, Quelques propriétés des opérateurs angle-bornés et $n$-cycliquement monotones, Israel J. Math., vol. 26, pp. 137–150, 1977.
[2] S. Banach, Théorie des Opérations Linéaires. Seminar. Matem. Univ. Warszawa, 1932.
[3] H. H. Bauschke, J. Bolte, and M. Teboulle, A descent lemma beyond Lipschitz gradient continuity: First-order methods revisited and applications, Math. Oper. Res., vol. 42, pp. 330–348, 2017.
[4] H. H. Bauschke and J. M. Borwein, Legendre functions and the method of random Bregman projections, J. Convex Anal., vol. 4, pp. 27–67, 1997.
[5] H. H. Bauschke, J. M. Borwein, and P. L. Combettes, Essential smoothness, essential strict convexity, and Legendre functions in Banach spaces, Commun. Contemp. Math., vol. 3, pp. 615–647, 2001.
[6] H. H. Bauschke, J. M. Borwein, and P. L. Combettes, Bregman monotone optimization algorithms, SIAM J. Control Optim., vol. 42, pp. 596–636, 2003.
[7] H. H. Bauschke and P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2nd ed. Springer, New York, 2017.
[8] H. H. Bauschke, M. N. Dao, and S. B. Lindstrom, Regularizing with Bregman–Moreau envelopes, SIAM J. Optim., vol. 28, pp. 3208–3228, 2018.