A Lyapunov-type approach to convergence of the Douglas-Rachford algorithm

Minh N. Dao; Matthew K. Tam

arXiv:1706.04846·math.OC·April 6, 2020

TL;DR

This paper introduces a Lyapunov-type approach to prove convergence of the Douglas-Rachford algorithm in nonconvex settings, expanding its theoretical understanding and applicability.

Contribution

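It generalizes Benoist's construction of a Lyapunov function for the line-sphere problem into a Lyapunov-type framework that proves convergence of the Douglas-Rachford algorithm for a class of potentially nonconvex feasibility problems.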

Findings

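01. Convergence of the Douglas-Rachford algorithm is proved in a setting where the constraint sets need not be convex.

02. The convexity properties required of the Lyapunov-type functional are independent of convexity of the original constraint sets.

03. Nonconvex examples are given in which the framework proves global convergence of the algorithm.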

Abstract

The Douglas-Rachford projection algorithm is an iterative method used to find a point in the intersection of closed constraint sets. The algorithm has been experimentally observed to solve various nonconvex feasibility problems which current theory cannot sufficiently explain. In this paper, we prove convergence of the Douglas-Rachford algorithm in a potentially nonconvex setting. Our analysis relies on the existence of a Lyapunov-type functional whose convexity properties are not tantamount to convexity of the original constraint sets. Moreover, we provide various nonconvex examples in which our framework proves global convergence of the algorithm.

Equations

$$X \text{ is a Euclidean space} \tag{1}$$

$$\langle (x_{1},y_{1}),(x_{2},y_{2})\rangle_{X\times Y}:=\langle x_{1},x_{2}\rangle_{X}+\langle y_{1},y_{2}\rangle_{Y}. \tag{2}$$

$$P_{C}\colon X\rightrightarrows C\colon x\mapsto\operatorname*{argmin}_{c\in C}\|x-c\|=\{c\in C \mid \|x-c\|=d_{C}(x)\}, \tag{3}$$

$$\text{find } x\in C\cap D. \tag{4}$$

$$T_{C,D}:=\frac{1}{2}(\operatorname{Id}+R_{D}R_{C}), \tag{5}$$

$$(\forall n\in\mathbb{N})\quad x_{n+1}\in T_{C,D}x_{n}, \tag{6}$$

$$(\forall x\in X)\quad T_{C,D}x=\frac{1}{2}(\operatorname{Id}+R_{D}R_{C})x=\{x-c+P_{D}(2c-x) \mid c\in P_{C}x\}. \tag{7}$$

$$(\forall x\in X)(\forall y\in X)\quad \|P_{C}x-P_{C}y\|^{2}+\|(\operatorname{Id}-P_{C})x-(\operatorname{Id}-P_{C})y\|^{2}\leq\|x-y\|^{2}. \tag{8}$$

$$(\forall x\in X)(\forall y\in X)\quad \|R_{C}x-R_{C}y\|\leq\|x-y\|. \tag{9}$$

$$\begin{aligned}
2\|x_{+}-\bar{x}\| &\leq \|x-\bar{x}\|+2\|p-R_{C}x\|+\|R_{C}x-\bar{x}\| \\
&= \|x-\bar{x}\|+2d_{D}(R_{C}x)+\|R_{C}x-\bar{x}\| \\
&\leq \|x-\bar{x}\|+2\|R_{C}x-\bar{x}\|+\|R_{C}x-\bar{x}\| \\
&= \|x-\bar{x}\|+3\|R_{C}x-R_{C}\bar{x}\|\leq 4\|x-\bar{x}\|,
\end{aligned} \tag{10}$$

$$(\forall n\in\mathbb{N})\quad p_{n}:=x_{n+1}-x_{n}+P_{C}x_{n}\in P_{D}R_{C}x_{n}\subseteq D. \tag{11}$$

$$(\forall x\in\operatorname{dom}f)(\forall y\in\operatorname{dom}f)(\forall\lambda\in\left]0,1\right[)\quad f((1-\lambda)x+\lambda y)\leq(1-\lambda)f(x)+\lambda f(y). \tag{12}$$

$$(\forall x\in C)(\forall y\in C)\quad \langle x-y,\nabla f(x)-\nabla f(y)\rangle\geq 0. \tag{13}$$

$$(\forall x\in C)(\forall y\in C)\quad x\neq y\implies\langle x-y,\nabla f(x)-\nabla f(y)\rangle>0. \tag{14}$$

$$N_{C}(x):=\bigg\{x^{*}\in X \;\bigg|\; \exists\,\varepsilon_{n}\downarrow 0,\; x_{n}\xrightarrow{C}x,\text{ and } x_{n}^{*}\to x^{*}\text{ with }\limsup_{y\xrightarrow{C}x_{n}}\frac{\langle x_{n}^{*},y-x_{n}\rangle}{\|y-x_{n}\|}\leq\varepsilon_{n}\bigg\} \tag{15}$$

$$\partial f(x):=\{x^{*}\in X \mid (x^{*},-1)\in N_{\operatorname{epi}f}(x,f(x))\} \tag{16}$$

$$\widehat{\partial}_{\varepsilon}f(x):=\bigg\{x^{*}\in X \;\bigg|\; \liminf_{y\to x}\frac{f(y)-f(x)-\langle x^{*},y-x\rangle}{\|y-x\|}\geq-\varepsilon\bigg\}. \tag{17}$$

$$\partial f(x)=\operatorname*{Limsup}_{y\xrightarrow{f}x,\;\varepsilon\downarrow 0}\widehat{\partial}_{\varepsilon}f(y)=\{x^{*}\in X \mid \exists\,\varepsilon_{n}\to 0,\; x_{n}\xrightarrow{f}x,\; x_{n}^{*}\to x^{*}\text{ with } x_{n}^{*}\in\widehat{\partial}_{\varepsilon_{n}}f(x_{n})\}, \tag{18}$$

$$\operatorname*{Limsup}_{y\to x}F(y):=\{x^{*}\in X \mid \exists\, x_{n}\to x,\; x_{n}^{*}\to x^{*}\text{ with } x_{n}^{*}\in F(x_{n})\} \tag{19}$$

$$\partial(f+g)(x)=\partial f(x)+\nabla g(x). \tag{20}$$

$$\partial(f\cdot g)(x)=\partial(g(x)f+f(x)g)(x)\subseteq\partial(g(x)f)(x)+\partial(f(x)g)(x). \tag{21}$$

$$\partial f(x)=\operatorname*{Limsup}_{y\xrightarrow{f}x}\widehat{\partial}f(y)=\{x^{*}\in X \mid \exists\, x_{n}\xrightarrow{f}x\text{ and } x_{n}^{*}\to x^{*}\text{ with } x_{n}^{*}\in\widehat{\partial}f(x_{n})\}, \tag{22}$$

$$\partial f(0)=[-1,1]\quad\text{and}\quad-\partial(-f)(0)=\{-1,1\}. \tag{23}$$

$$\partial^{0}f:=\partial f\cup\partial^{+}f, \tag{24}$$

$$\partial f(x)=\partial^{+}f(x)=\partial^{0}f(x)=\{\nabla f(x)\}. \tag{25}$$

$$\partial f(x)=\{x^{*}\in X \mid (\forall y\in X)\;\langle x^{*},y-x\rangle\leq f(y)-f(x)\}, \tag{26}$$

$$\partial^{+}f(x)\subseteq\partial f(x)=\partial^{0}f(x). \tag{27}$$

$$(\forall\lambda\in\mathbb{R}\smallsetminus\{0\})\quad\partial(\lambda f)=\begin{cases}\lambda\,\partial f & \text{if }\lambda>0,\\ \lambda\,\partial^{+}f & \text{if }\lambda<0.\end{cases} \tag{28}$$

Full text

A Lyapunov-type approach to convergence of the Douglas–Rachford algorithm

Minh N. Dao, CARMA, University of Newcastle, Callaghan, NSW 2308, Australia.

Matthew K. Tam, Institut für Numerische und Angewandte Mathematik, Universität Göttingen, 37083 Göttingen, Germany.

Mathematics Subject Classification (MSC 2010): 90C26 $\cdot$ 47H10 $\cdot$ 37B25

Keywords: Douglas–Rachford algorithm, feasibility problem, global convergence, graph of a function, linear convergence, Lyapunov function, method of alternating projections, Newton’s method, nonconvex set, projection, stability, zero of a function

1 Introduction

The Douglas–Rachford algorithm (DRA) is an iterative method used to solve the so-called feasibility problem, which asks for a point in the intersection of closed constraint sets. The method generates a sequence by combining the nearest point projectors of the individual constraint sets, thereby exploiting the structure of problems in which these individual projectors can be computed efficiently or, at least, more efficiently than a direct attempt to solve the original problem. The origins of the method can be traced to the work of Douglas & Rachford [DR56], where it was proposed as a method for numerically solving problems arising in heat conduction. In the convex setting, the situation is fairly well understood; convergence is due to Lions & Mercier [LM79] and has since been refined in various works [BCL04, BDM16, BM17, Sva11].

In the absence of convexity, Borwein & Sims [BS11] established local convergence of the DRA applied to a prototypical nonconvex feasibility problem involving a line and a sphere. Here “prototypical” is meant in the sense of being an accessible model for imaging problems where phase is to be reconstructed from magnitude measurements whilst retaining the mathematical complexities. The same prototype has since been studied in [AB13, Gil16]. In their paper, Borwein & Sims [BS11] conjectured that the DRA was actually globally convergent; a conjecture that was recently resolved in the affirmative by Benoist [Ben15] through a cleverly constructed Lyapunov function. Global convergence of the DRA for prototypical combinatorial optimization problems has also been proven in [ABT16, BD16].

A general approach to convergence of the DRA without convexity was provided by Phan [Pha16], to which work of Luke & Hesse [HL13] was a precursor. This approach follows related works, originating from [LLM09], which focus on the method of alternating projections and assume that local regularity properties of the underlying constraint sets hold near solutions of the problem (see also [NR16]). The main difficulty in applying these results is that they give little information regarding the region of convergence, that is, the starting points from which the algorithm converges. Moreover, in practice, finding a point sufficiently close to a solution of a feasibility problem is often just as difficult as solving the original feasibility problem itself.

In this work, we generalize Benoist’s approach to the construction of Lyapunov-type functionals as a tool to prove convergence of the DRA. In particular, we show that convergence of the DRA is ensured provided that the constructed Lyapunov-type function possesses appropriate convexity properties. We emphasize here that the convexity properties of the Lyapunov-type function are independent of convexity of the underlying feasibility problem. As a consequence of our analysis, a region of convergence of the DRA can be identified by analyzing the Lyapunov-type function associated with the problem at hand.

The remainder of this paper is organized as follows. In Section 2, we introduce the necessary notions of nonsmooth analysis. In Section 3, we give a precise description of the Douglas–Rachford operator. In Section 4, we provide conditions under which the DRA enjoys stability properties near fixed points. Our Lyapunov approach to convergence of the algorithm follows in Section 5. Examples to which the results apply are considered in the section on examples, together with some counter-examples to demonstrate that both the method of alternating projections and Newton’s method can fail to converge to a solution even when the DRA does. In fact, global convergence of the DRA is obtained in all bar one of our provided examples.

2 Preliminaries

In this section we introduce and recall necessary notions and tools from nonsmooth analysis. Throughout this work, we assume that

$$X \text{ is a Euclidean space} \tag{1}$$

(i.e., a finite-dimensional real Hilbert space) with inner product $\left\langle{\cdot},{\cdot}\right\rangle$ and induced norm $\|\cdot\|$ . Given two real Hilbert spaces $X$ and $Y$ with corresponding inner products denoted $\left\langle{\cdot},{\cdot}\right\rangle_{X}$ and $\left\langle{\cdot},{\cdot}\right\rangle_{Y}$ , where appropriate, we use the product space $X\times Y$ which is a Hilbert space when equipped with the inner product defined by

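$$\langle (x_{1},y_{1}),(x_{2},y_{2})\rangle_{X\times Y}:=\langle x_{1},x_{2}\rangle_{X}+\langle y_{1},y_{2}\rangle_{Y}. \tag{2}$$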

We denote the set of nonnegative integers by $\mathbb{N}$ and the set of real numbers by $\mathbb{R}$. The set of nonnegative real numbers is denoted $\mathbb{R}_{+}:=\{x\in\mathbb{R} \mid x\geq 0\}$ and the set of positive real numbers $\mathbb{R}_{++}:=\{x\in\mathbb{R} \mid x>0\}$. The sets of nonpositive and negative real numbers, denoted $\mathbb{R}_{-}$ and $\mathbb{R}_{--}$ respectively, are defined analogously. Given a subset $C$ of $X$, its closure and interior are denoted respectively by $\overline{C}$ and $\operatorname{int}C$. For a point $x\in X$ and scalar $\rho\in\mathbb{R}_{++}$, the closed ball centered at $x$ with radius $\rho$ is denoted $\mathbb{B}(x;\rho):=\{y\in X \mid \|x-y\|\leq\rho\}$.

2.1 The Douglas–Rachford algorithm

In this section, we recall the background material for the Douglas–Rachford algorithm. Let $C$ be a nonempty subset of $X$ . The projector onto $C$ is the mapping

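$$P_{C}\colon X\rightrightarrows C\colon x\mapsto\operatorname*{argmin}_{c\in C}\|x-c\|=\{c\in C \mid \|x-c\|=d_{C}(x)\}, \tag{3}$$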

where $d_{C}(x):=\inf_{c\in C}\|x-c\|$ is the distance from $x$ to $C$. Each $y\in P_{C}(x)$ is a nearest point of $x$ in $C$, and is called a projection of $x$ onto $C$. Since we consider only finite-dimensional spaces $X$, closedness of the set $C$ is necessary and sufficient for $C$ to be proximinal, i.e., $(\forall x\in X)$ $P_{C}x\neq\varnothing$; see [BC11, Corollary 3.13]. In an abuse of notation, we write $P_{C}x=c$ whenever $P_{C}x=\{c\}$.
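To illustrate that the projector in (3) is set-valued when $C$ is not convex, the following minimal Python sketch computes $P_{C}x$ for a finite set $C\subset\mathbb{R}^{2}$. The function name `project_finite`, the tolerance `tol`, and the two-point set are illustrative assumptions and are not taken from the paper.

```python
import math

def project_finite(x, points, tol=1e-12):
    """Return every point of `points` whose distance to x equals d_C(x)."""
    dists = [math.dist(x, c) for c in points]
    d_min = min(dists)  # d_C(x) for the finite set C = points
    return [c for c, d in zip(points, dists) if d - d_min <= tol]

# Hypothetical nonconvex set: two isolated points on the horizontal axis.
C = [(-1.0, 0.0), (1.0, 0.0)]

single = project_finite((0.3, 0.0), C)  # unique nearest point
double = project_finite((0.0, 2.0), C)  # equidistant: both points are projections
print(single, double)
```

A point on the perpendicular bisector of the two points has both of them as projections, which is why (7) is written as a set of possible next iterates.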

Let $C$ and $D$ be closed subsets of $X$ such that $C\cap D\neq\varnothing$ . The feasibility problem is

$$\text{find } x\in C\cap D. \tag{4}$$

A classical splitting method for solving (4) is the so-called Douglas–Rachford algorithm which is concisely described as the fixed point iteration corresponding to the Douglas–Rachford (DR) operator defined by

$$T_{C,D}:=\frac{1}{2}(\operatorname{Id}+R_{D}R_{C}), \tag{5}$$

where $\operatorname{Id}$ is the identity operator, and $R_{C}:=2P_{C}-\operatorname{Id}$ and $R_{D}:=2P_{D}-\operatorname{Id}$ are the reflectors across $C$ and $D$ , respectively. A sequence $(x_{n})_{n\in{\mathbb{N}}}$ is called a DR sequence (with respect to $(C,D)$ ), with starting point $x_{0}\in X$ , if

$$(\forall n\in\mathbb{N})\quad x_{n+1}\in T_{C,D}x_{n}, \tag{6}$$

where we note that

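$$(\forall x\in X)\quad T_{C,D}x=\frac{1}{2}(\operatorname{Id}+R_{D}R_{C})x=\{x-c+P_{D}(2c-x) \mid c\in P_{C}x\}. \tag{7}$$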

In the literature, the DRA for feasibility problems is also known as averaged alternating reflections [BCL04] and reflect-reflect-average method [BS11]. For other connections, we refer the reader to [BCL02].
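As an illustration of the iteration (6) via the formula (7), the following Python sketch runs a DR sequence for the line and circle prototype of [BS11] in $\mathbb{R}^{2}$, with $C$ the unit circle and $D$ a horizontal line. The height `0.5`, the starting point, the number of steps, and all function names are assumptions chosen for illustration; when $P_{C}$ is multivalued (at the origin) the sketch simply picks one projection.

```python
import math

def project_circle(z):
    """One nearest point of z on the unit circle (any unit vector if z = 0)."""
    x, y = z
    r = math.hypot(x, y)
    if r == 0.0:
        return (1.0, 0.0)  # arbitrary selection: every point of the circle is nearest
    return (x / r, y / r)

def make_project_line(alpha):
    """Projector onto the horizontal line {(x, alpha) : x real}."""
    return lambda z: (z[0], alpha)

def dr_step(z, proj_c, proj_d):
    """One DR step z - c + P_D(2c - z) with c a selection of P_C z, as in (7)."""
    c = proj_c(z)
    r = (2.0 * c[0] - z[0], 2.0 * c[1] - z[1])  # R_C z = 2c - z
    p = proj_d(r)
    return (z[0] - c[0] + p[0], z[1] - c[1] + p[1])

def dr_sequence(z0, proj_c, proj_d, n_steps):
    """Return the iterate x_n of a DR sequence started at z0."""
    z = z0
    for _ in range(n_steps):
        z = dr_step(z, proj_c, proj_d)
    return z

alpha = 0.5  # hypothetical height of the line D
z = dr_sequence((1.0, 0.0), project_circle, make_project_line(alpha), 200)
print(z, (math.sqrt(1.0 - alpha ** 2), alpha))
```

For this starting point the iterates spiral into the intersection point $(\sqrt{1-\alpha^{2}},\alpha)$, in line with the global convergence result of Benoist [Ben15] recalled in the introduction.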

The following fact gives important properties of convex projectors.

Fact 2.1 (Projectors and reflectors onto convex sets).

Let $C$ be a nonempty closed convex subset of $X$ . Then the following hold:

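(i) $P_{C}$ is everywhere single-valued and firmly nonexpansive, that is,

$$(\forall x\in X)(\forall y\in X)\quad \|P_{C}x-P_{C}y\|^{2}+\|(\operatorname{Id}-P_{C})x-(\operatorname{Id}-P_{C})y\|^{2}\leq\|x-y\|^{2}. \tag{8}$$

(ii) $R_{C}$ is everywhere single-valued and nonexpansive, that is,

$$(\forall x\in X)(\forall y\in X)\quad \|R_{C}x-R_{C}y\|\leq\|x-y\|. \tag{9}$$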

In particular, $P_{C}$ and $R_{C}$ are continuous on $X$ .

Proof.

(i): See [BC11, Theorem 3.14 & Proposition 4.8]. (ii): This follows from (i) and [BC11, Corollary 4.10]. ∎
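The two inequalities in Fact 2.1 can be checked numerically on a concrete convex set. The sketch below, in which the closed unit ball in $\mathbb{R}^{2}$, the sampling scheme, and all function names are assumptions made for illustration, evaluates the slack in the firm nonexpansiveness inequality for $P_{C}$ and compares distances before and after applying $R_{C}$.

```python
import math
import random

def project_ball(z, radius=1.0):
    """Projector onto the closed ball of the given radius centered at 0."""
    x, y = z
    r = math.hypot(x, y)
    if r <= radius:
        return (x, y)
    return (radius * x / r, radius * y / r)

def reflect(z, proj):
    """Reflector R = 2P - Id associated with the projector `proj`."""
    p = proj(z)
    return (2.0 * p[0] - z[0], 2.0 * p[1] - z[1])

def firm_gap(x, y, proj):
    """Right side minus left side of the firm nonexpansiveness inequality."""
    px, py = proj(x), proj(y)
    qx = (x[0] - px[0], x[1] - px[1])  # (Id - P) x
    qy = (y[0] - py[0], y[1] - py[1])  # (Id - P) y
    lhs = math.dist(px, py) ** 2 + math.dist(qx, qy) ** 2
    return math.dist(x, y) ** 2 - lhs

def sample_pairs(n_pairs, seed=0, scale=3.0):
    """Pseudo-random pairs of points in the square [-scale, scale]^2."""
    rng = random.Random(seed)
    draw = lambda: (rng.uniform(-scale, scale), rng.uniform(-scale, scale))
    return [(draw(), draw()) for _ in range(n_pairs)]

pairs = sample_pairs(1000)
worst_gap = min(firm_gap(x, y, project_ball) for x, y in pairs)
print(worst_gap)  # nonnegative up to rounding error
```

Replacing the ball by a nonconvex set such as a circle generally destroys both properties, which is one reason the convex theory does not carry over directly.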

In the case that one of the constraints is convex, we make the following observations. In what follows, recall that a sequence $(x_{n})_{n\in{\mathbb{N}}}$ is asymptotically regular if $x_{n}-x_{n+1}\to 0$ as $n\to+\infty$ .

Lemma 2.2 (Properties of the DRA).

Let $C$ be a closed convex subset and $D$ be a closed subset of $X$ such that $C\cap D\neq\varnothing$ , and let $(x_{n})_{n\in{\mathbb{N}}}$ be a DR sequence with respect to $(C,D)$ . Then the following hold:

(i) $T_{C,D}=\frac{1}{2}(\operatorname{Id}+R_{D}R_{C})=\operatorname{Id}-P_{C}+P_{D}R_{C}$.

(ii) $T_{C,D}(\mathbb{B}(\bar{x};\delta))\subseteq\mathbb{B}(\bar{x};2\delta)$ whenever $\bar{x}\in C\cap D$ and $\delta\in\mathbb{R}_{++}$.

(iii) If $(x_{n})_{n\in\mathbb{N}}$ is asymptotically regular and possesses a cluster point $x$, then $P_{C}x\in C\cap D$.

Proof.

(i): Combine (7) with the single-valuedness of $P_{C}$ (Fact 2.1(i)).

(ii): Let $\bar{x}\in C\cap D$ , let $\delta\in\mathbb{R}_{++}$ , let $x\in\mathbb{B}\left({\bar{x}};{\delta}\right)$ , and let $x_{+}\in T_{C,D}x$ . Then there exists $p\in P_{D}R_{C}x$ such that $x_{+}=\frac{1}{2}(x+(2p-R_{C}x))$ . Since $\bar{x}\in C\cap D$ , it follows that $R_{C}\bar{x}=\bar{x}$ and thus

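$$\begin{aligned}
2\|x_{+}-\bar{x}\| &\leq \|x-\bar{x}\|+2\|p-R_{C}x\|+\|R_{C}x-\bar{x}\| \\
&= \|x-\bar{x}\|+2d_{D}(R_{C}x)+\|R_{C}x-\bar{x}\| \\
&\leq \|x-\bar{x}\|+2\|R_{C}x-\bar{x}\|+\|R_{C}x-\bar{x}\| \\
&= \|x-\bar{x}\|+3\|R_{C}x-R_{C}\bar{x}\|\leq 4\|x-\bar{x}\|,
\end{aligned} \tag{10}$$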

where the last estimate follows from the nonexpansiveness of $R_{C}$ (Fact 2.1(ii)). Altogether, we obtain that $\|x_{+}-\bar{x}\|\leq 2\|x-\bar{x}\|\leq 2\delta$, hence $x_{+}\in\mathbb{B}(\bar{x};2\delta)$ and the result follows.

(iii): Using (i) yields

$$(\forall n\in\mathbb{N})\quad p_{n}:=x_{n+1}-x_{n}+P_{C}x_{n}\in P_{D}R_{C}x_{n}\subseteq D. \tag{11}$$

Let $x$ be a cluster point of $(x_{n})_{n\in\mathbb{N}}$. Then there exists a subsequence $(x_{k_{n}})_{n\in\mathbb{N}}$ of $(x_{n})_{n\in\mathbb{N}}$ such that $x_{k_{n}}\to x$. By Fact 2.1, $P_{C}$ is continuous and so $P_{C}x_{k_{n}}\to P_{C}x$. Combining with (11) and the asymptotic regularity of $(x_{n})_{n\in\mathbb{N}}$, this gives $p_{k_{n}}\to P_{C}x$. Noting that $(\forall n\in\mathbb{N})$ $p_{k_{n}}\in D$ and that $D$ is closed, we deduce that $P_{C}x\in D$ and therefore $P_{C}x\in C\cap D$. ∎
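The bound in Lemma 2.2(ii) can be observed numerically when $C$ is convex and $D$ is not. In the sketch below, $C$ is a horizontal line and $D$ is the unit circle in $\mathbb{R}^{2}$; the height `ALPHA`, the sampling grid, and the function names are assumptions made for illustration only.

```python
import math

ALPHA = 0.5  # hypothetical height of the convex set C = {(x, ALPHA)}

def project_line(z):
    """Projector onto the line C."""
    return (z[0], ALPHA)

def project_circle(z):
    """One nearest point of z on the unit circle D (arbitrary choice at 0)."""
    r = math.hypot(z[0], z[1])
    if r == 0.0:
        return (1.0, 0.0)
    return (z[0] / r, z[1] / r)

def dr_step(z):
    """T_{C,D} z = z - P_C z + p with p a selection of P_D R_C z."""
    c = project_line(z)
    r = (2.0 * c[0] - z[0], 2.0 * c[1] - z[1])
    p = project_circle(r)
    return (z[0] - c[0] + p[0], z[1] - c[1] + p[1])

def expansion_ratio(z, x_bar):
    """Ratio ||T z - x_bar|| / ||z - x_bar||, bounded by 2 in Lemma 2.2(ii)."""
    return math.dist(dr_step(z), x_bar) / math.dist(z, x_bar)

x_bar = (math.sqrt(1.0 - ALPHA ** 2), ALPHA)  # a point of C ∩ D
grid = [(i / 5.0, j / 5.0) for i in range(-20, 21) for j in range(-20, 21)]
print(max(expansion_ratio(z, x_bar) for z in grid))
```

The largest ratio on the grid stays below 2, consistent with the estimate (10); the constant 2 is a worst-case bound and need not be attained on a particular grid.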

2.2 Convexity

Given an extended-real-valued function $f\colon X\to[-\infty,+\infty]$, its effective domain is denoted by $\operatorname{dom}f:=\{x\in X \mid f(x)<+\infty\}$, its graph by $\operatorname{gra}f:=\{(x,\rho)\in X\times\mathbb{R} \mid f(x)=\rho\}$, its epigraph by $\operatorname{epi}f:=\{(x,\rho)\in X\times\mathbb{R} \mid f(x)\leq\rho\}$, and its lower level set at height $\xi\in\mathbb{R}$ by $\operatorname{lev}_{\leq\xi}f:=\{x\in X \mid f(x)\leq\xi\}$. The function $f$ is said to be proper if $\operatorname{dom}f\neq\varnothing$ and it never takes the value $-\infty$, lower semicontinuous (lsc) if $f(x)\leq\liminf_{y\to x}f(y)$ for every $x\in\operatorname{dom}f$, and convex if

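$$(\forall x\in\operatorname{dom}f)(\forall y\in\operatorname{dom}f)(\forall\lambda\in\left]0,1\right[)\quad f((1-\lambda)x+\lambda y)\leq(1-\lambda)f(x)+\lambda f(y). \tag{12}$$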

Let $f\colon X\to[-\infty,+\infty]$ be proper. Then $f$ is said to be strictly convex if, in addition to being convex, the inequality in (12) is strict whenever $x\neq y$. We say that $f$ is convex on $C$ (respectively strictly convex on $C$) if the corresponding inequality holds whenever $x\in C$ and $y\in C$.

Fact 2.3.

Let $f\colon X\to\left[-\infty,+\infty\right]$ be a proper function and let $C$ be a nonempty open convex subset of $\operatorname{dom}f$ .

(i) Suppose that $f$ is Gâteaux differentiable on $C$. Then the following hold:

(a) $f$ is convex on $C$ if and only if $\nabla f$ is monotone on $C$ in the sense that

$$(\forall x\in C)(\forall y\in C)\quad \langle x-y,\nabla f(x)-\nabla f(y)\rangle\geq 0. \tag{13}$$

(b) $f$ is strictly convex on $C$ if and only if $\nabla f$ is strictly monotone on $C$ in the sense that

$$(\forall x\in C)(\forall y\in C)\quad x\neq y\implies\langle x-y,\nabla f(x)-\nabla f(y)\rangle>0. \tag{14}$$

(ii) Suppose that $f$ is twice Gâteaux differentiable on $C$. Then the following hold:

(a) $f$ is convex on $C$ if and only if $\nabla^{2}f(x)$ is positive semidefinite for every $x\in C$.

(b) $f$ is strictly convex on $C$ if $\nabla^{2}f(x)$ is positive definite for every $x\in C$.

Proof.

This follows from [BC11, Propositions 17.10 & 17.13]. ∎
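Note that Fact 2.3(ii)(b) is only a sufficient condition. A standard one-dimensional illustration, which is our own choice of example and is not taken from the paper, is $f(x)=x^{4}$ on $C=\mathbb{R}$: its derivative is strictly monotone, so $f$ is strictly convex by Fact 2.3(i)(b), yet its second derivative vanishes at the origin.

```latex
f(x) = x^{4}, \qquad \nabla f(x) = 4x^{3}, \qquad \nabla^{2} f(x) = 12x^{2}.

% Strict monotonicity of the derivative, using
% x^2 + xy + y^2 = (x + y/2)^2 + 3y^2/4 > 0 unless x = y = 0:
(x-y)\bigl(\nabla f(x)-\nabla f(y)\bigr)
  = 4\,(x-y)^{2}\,\bigl(x^{2}+xy+y^{2}\bigr) > 0
  \quad\text{whenever } x \neq y.

% Yet the Hessian is not positive definite at the origin:
\nabla^{2} f(0) = 0 .
```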

2.3 Subdifferentiability

The limiting normal cone to a subset $C$ of $X$ at a point $x\in X$ is defined by

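$$N_{C}(x):=\bigg\{x^{*}\in X \;\bigg|\; \exists\,\varepsilon_{n}\downarrow 0,\; x_{n}\xrightarrow{C}x,\text{ and } x_{n}^{*}\to x^{*}\text{ with }\limsup_{y\xrightarrow{C}x_{n}}\frac{\langle x_{n}^{*},y-x_{n}\rangle}{\|y-x_{n}\|}\leq\varepsilon_{n}\bigg\} \tag{15}$$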

if $x\in C$ , and by $N_{C}(x):=\varnothing$ otherwise. Here the notation $y\stackrel{{\scriptstyle C}}{{\to}}x$ means $y\to x$ with $y\in C$ .

Let $f\colon X\to\left[-\infty,+\infty\right]$ , let $x\in X$ with $|f(x)|<+\infty$ , and let $\varepsilon\in\mathbb{R}_{+}$ . The limiting subdifferential of $f$ at $x$ is given by

$$\partial f(x):=\{x^{*}\in X \mid (x^{*},-1)\in N_{\operatorname{epi}f}(x,f(x))\} \tag{16}$$

and the analytic $\varepsilon$ -subdifferential of $f$ at $x$ is given by

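$$\widehat{\partial}_{\varepsilon}f(x):=\bigg\{x^{*}\in X \;\bigg|\; \liminf_{y\to x}\frac{f(y)-f(x)-\langle x^{*},y-x\rangle}{\|y-x\|}\geq-\varepsilon\bigg\}. \tag{17}$$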

Both subdifferentials of $f$ at a point $x$ are defined to be empty when $|f(x)|=+\infty$ . The limiting subdifferential can be represented in analytic form [Mor06, Theorem 1.89]

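$$\partial f(x)=\operatorname*{Limsup}_{y\xrightarrow{f}x,\;\varepsilon\downarrow 0}\widehat{\partial}_{\varepsilon}f(y)=\{x^{*}\in X \mid \exists\,\varepsilon_{n}\to 0,\; x_{n}\xrightarrow{f}x,\; x_{n}^{*}\to x^{*}\text{ with } x_{n}^{*}\in\widehat{\partial}_{\varepsilon_{n}}f(x_{n})\}, \tag{18}$$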

where the notation $y\stackrel{{\scriptstyle f}}{{\to}}x$ means $y\to x$ with $f(y)\to f(x)$ and

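$$\operatorname*{Limsup}_{y\to x}F(y):=\{x^{*}\in X \mid \exists\, x_{n}\to x,\; x_{n}^{*}\to x^{*}\text{ with } x_{n}^{*}\in F(x_{n})\} \tag{19}$$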

denotes the sequential Painlevé–Kuratowski upper limit of $F$ at $x$ .

We now recall some important properties of the limiting subdifferential.

Fact 2.4 (Fermat’s rule).

Suppose that a function $f\colon X\to\left[-\infty,+\infty\right]$ attains a local minimum at a point $x$ with $|f(x)|<+\infty$ . Then $0\in\partial f(x)$ .

Proof.

This follows from [Mor06, Proposition 1.114]. ∎

Fact 2.5 (Sum and product rules).

Consider two functions, $f\colon X\to[-\infty,+\infty]$ and $g\colon X\to[-\infty,+\infty]$, and let $x\in X$. The following assertions hold.

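(i) If $f$ is finite at $x$ and $g$ is strictly differentiable at $x$, then

$$\partial(f+g)(x)=\partial f(x)+\nabla g(x). \tag{20}$$

(ii) If $f$ and $g$ are Lipschitz continuous around $x$, then

$$\partial(f\cdot g)(x)=\partial(g(x)f+f(x)g)(x)\subseteq\partial(g(x)f)(x)+\partial(f(x)g)(x). \tag{21}$$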

Proof.

(i): [Mor06, Proposition 1.107]. (ii): [Mor06, Propositions 1.111 and 3.45]. ∎

If $f$ is lsc around $x$ , then a convenient, and often used, representation for the limiting subdifferential is given by [Mor06, Theorems 1.89 & 2.34]

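$$\partial f(x)=\operatorname*{Limsup}_{y\xrightarrow{f}x}\widehat{\partial}f(y)=\{x^{*}\in X \mid \exists\, x_{n}\xrightarrow{f}x\text{ and } x_{n}^{*}\to x^{*}\text{ with } x_{n}^{*}\in\widehat{\partial}f(x_{n})\}, \tag{22}$$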

where $\widehat{\partial}f:=\widehat{\partial}_{0}f$ is the so-called Fréchet subdifferential of $f$. However, in what follows, it will be necessary to consider the subdifferentials of both $f$ and $-f$ simultaneously. In this case, (22) cannot be applied because $f$ and $-f$ are usually not simultaneously lsc (e.g., if $f$ takes the value $+\infty$). Further, it is worth emphasizing that $\partial f$ and $-\partial(-f)$ are, in general, considerably different. For instance, the function $f=|\cdot|\colon\mathbb{R}\to\mathbb{R}$ has

$$\partial f(0)=[-1,1]\quad\text{and}\quad-\partial(-f)(0)=\{-1,1\}. \tag{23}$$

Combining these two subdifferentials yields the symmetric subdifferential of $f$ which is defined by

$$\partial^{0}f:=\partial f\cup\partial^{+}f, \tag{24}$$

where $\partial^{+}f:=-\partial(-f)$ is the so-called limiting upper subdifferential of $f$. In contrast to the limiting subdifferential, the symmetric subdifferential possesses the classical “plus-minus” symmetry (i.e., $\partial^{0}(-f)=-\partial^{0}f$). Also note that, if $f$ is strictly differentiable at $x$, then [Mor06, Corollary 1.82]

$$\partial f(x)=\partial^{+}f(x)=\partial^{0}f(x)=\{\nabla f(x)\}. \tag{25}$$
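For the reader's convenience, the values of the three subdifferentials of $f=|\cdot|$ at the origin can be worked out from the Fréchet subdifferentials of $f$ and $-f$. The computation below is our own sketch; it uses the representation of the limiting subdifferential as an upper limit of Fréchet subdifferentials, which is available here because both $|\cdot|$ and $-|\cdot|$ are continuous.

```latex
% Frechet subdifferential of f = |.| and its upper limit at 0:
\widehat{\partial} f(y) =
\begin{cases}
  \{\operatorname{sign}(y)\} & \text{if } y \neq 0,\\
  [-1,1] & \text{if } y = 0,
\end{cases}
\qquad\Longrightarrow\qquad
\partial f(0) = [-1,1].

% Frechet subdifferential of -f: empty at the kink, so only limits survive:
\widehat{\partial}(-f)(y) =
\begin{cases}
  \{-\operatorname{sign}(y)\} & \text{if } y \neq 0,\\
  \varnothing & \text{if } y = 0,
\end{cases}
\qquad\Longrightarrow\qquad
\partial(-f)(0) = \{-1,1\}.

% Upper and symmetric subdifferentials at 0:
\partial^{+} f(0) = -\partial(-f)(0) = \{-1,1\},
\qquad
\partial^{0} f(0) = [-1,1] \cup \{-1,1\} = [-1,1].
```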

If $f$ is convex, then the limiting subdifferential reduces to the convex subdifferential (or Fenchel subdifferential) of convex analysis [Mor06, Theorem 1.93], that is,

$$\partial f(x)=\{x^{*}\in X \mid (\forall y\in X)\;\langle x^{*},y-x\rangle\leq f(y)-f(x)\}, \tag{26}$$

and we have the inclusions

$$\partial^{+}f(x)\subseteq\partial f(x)=\partial^{0}f(x). \tag{27}$$

The following property of the limiting subdifferential is mentioned without proof in [Mor06, RW98], and we therefore provide one for the convenience of the reader. Furthermore, note that lower semicontinuity is not assumed and so we cannot simply appeal to the representation (22).

Lemma 2.6 (Scalar multiplication rule).

Let $f\colon X\to\left[-\infty,+\infty\right]$ . Then

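$$(\forall\lambda\in\mathbb{R}\smallsetminus\{0\})\quad\partial(\lambda f)=\begin{cases}\lambda\,\partial f & \text{if }\lambda>0,\\ \lambda\,\partial^{+}f & \text{if }\lambda<0.\end{cases} \tag{28}$$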

Proof.

Let $\lambda\in\mathbb{R}\smallsetminus\{0\}$ . Then $|f(x)|<+\infty$ if and only if $|\lambda f(x)|<+\infty$ for all $x\in X$ . In particular, this shows that (28) holds at points at which $f$ is not finite. Assume now that $x\in X$ with $|f(x)|<+\infty$ . By the definition of the analytic $\varepsilon$ -subdifferential of $f$ ,

[TABLE]

Hence, if $\lambda>0$ , then as $|f(x)|<+\infty$ , we may apply (18) to deduce that

[TABLE]

The argument for $\lambda<0$ is performed analogously. ∎

Remark 2.7 (Multiplication by zero).

Care must be exercised in the case that $\lambda=0$ in Lemma 2.6. Consider, for instance, the lsc convex function $f\colon\mathbb{R}\to\left[-\infty,+\infty\right]$ defined by

[TABLE]

which has $\partial f(\pm 1)=\varnothing$ . Under the convention that $0\cdot(+\infty)=+\infty$ , it follows that $0\cdot f=\iota_{\left[-1,1\right]}$ , where $\iota_{C}$ is the indicator function of a set $C$ , so that $\partial(0\cdot f)=N_{\left[-1,1\right]}$ and

[TABLE]

Alternatively, under the convention that $0\cdot(+\infty)=0=0\cdot(-\infty)$ as suggested in [RW98, Section 1E], we have $0\cdot f=0$ and hence that $\partial(0\cdot f)(\pm 1)=\partial(0)(\pm 1)=\{0\}$ .

For our purposes, both conventions are problematic, and thus we shall treat the case $\lambda=0$ directly as it arises.

As holds for the limiting subdifferential, the symmetric subdifferential also enjoys the following robustness property.

Lemma 2.8 (Robustness of the symmetric subdifferential).

Let $f\colon X\to[-\infty,+\infty]$ and let $x\in X$ with $|f(x)|<+\infty$ . Then the symmetric subdifferential has the following robustness property

[TABLE]

Proof.

It is clear that

[TABLE]

To prove the opposite inclusion, we assume that $x_{n}\stackrel{{\scriptstyle f}}{{\to}}x$ and $x_{n}^{*}\to x^{*}$ with $x_{n}^{*}\in\partial^{0}\!f(x_{n})$ . Since $\partial^{0}\!f=\partial f\cup\partial^{+}\!f$ , by passing to subsequences if necessary, it suffices to prove the results assuming that the sequence $(x_{n}^{*})_{{n\in{\mathbb{N}}}}$ is contained only in either $\partial f$ or $\partial^{+}\!f$ . To this end, by a diagonal subsequence argument we derive from (18) that $\partial f$ and $\partial^{+}\!f$ both have the robustness property. Thus, in either case, the result follows. ∎

Lemma 2.9 (Upper semicontinuity of the symmetric subdifferential).

Let $f\colon X\to[-\infty,+\infty]$ be Lipschitz continuous around $x\in X$ with $|f(x)|<+\infty$ , and consider sequences $(x_{n})_{n\in{\mathbb{N}}}$ and $(x_{n}^{*})_{n\in{\mathbb{N}}}$ in $X$ such that $x_{n}\to x$ and $x_{n}^{*}\in\partial^{0}\!f(x_{n})$ for every ${n\in{\mathbb{N}}}$ . Then $(x_{n}^{*})_{n\in{\mathbb{N}}}$ is bounded and its cluster points are contained in $\partial^{0}\!f(x)$ .

Proof.

By assumption, there exist a neighborhood $U$ of $x$ and a constant $\ell\in\mathbb{R}_{+}$ such that $f$ is Lipschitz continuous on $U$ with modulus $\ell$ . In particular, $f$ and $-f$ are Lipschitz continuous around each $u\in U$ with modulus $\ell$ . By [Mor06, Corollary 1.81] and (24), we have that

[TABLE]

Since $x_{n}\to x$ , it follows that $(x_{n}^{*})_{n\in{\mathbb{N}}}$ is bounded.

Let $x^{*}$ be a cluster point of $(x_{n}^{*})_{n\in{\mathbb{N}}}$ . Then there is a subsequence $(x_{k_{n}}^{*})_{n\in{\mathbb{N}}}$ converging to $x^{*}$ . Noting that $x_{k_{n}}\to x$ , the Lipschitz continuity of $f$ around $x$ yields $x_{k_{n}}\stackrel{{\scriptstyle f}}{{\to}}x$ . Now apply Lemma 2.8. ∎

2.4 Coercivity

Recall that a function $f\colon X\to\left[-\infty,+\infty\right]$ is coercive if

$$\lim_{\|x\|\to+\infty}f(x)=+\infty. \tag{36}$$

For convenience, we recall some basic properties of coercivity.

Fact 2.10 (Coercive functions).

Let $f\colon X\to\left[-\infty,+\infty\right]$ . Then the following hold:

(i) $f$ is coercive if and only if its lower level sets $\operatorname{lev}_{\leq\xi}f$ are bounded for all $\xi\in\mathbb{R}$.

(ii) If $f$ is proper, convex, and coercive, then $\inf f(X)>-\infty$.

Proof.

(i): [BC11, Proposition 11.11]. (ii): [BCN06, Lemma 2.13]. ∎

The following preparatory lemma shows that coercivity is preserved under direct sums.

Lemma 2.11.

Let $f\colon X\to\left[-\infty,+\infty\right]$ and let $g\colon Y\to\left[-\infty,+\infty\right]$ , where $X$ and $Y$ are real Hilbert spaces. Set $h\colon X\times Y\to\left[-\infty,+\infty\right]\colon(x,y)\mapsto f(x)+g(y)$ . Suppose that $f$ and $g$ are proper, convex, and coercive on $X$ and $Y$ , respectively. Then $h$ is proper, convex, and coercive on $X\times Y$ .

Proof.

It immediately follows by assumption and definition that $h$ is proper and convex. Now Fact 2.10(ii) implies that

[TABLE]

Suppose, by way of a contradiction, that $h$ is not coercive. Then there exists a sequence $(x_{n},y_{n})_{n\in\mathbb{N}}$ in $X\times Y$ such that $\|(x_{n},y_{n})\|=\sqrt{\|x_{n}\|^{2}+\|y_{n}\|^{2}}\to+\infty$ and $(h(x_{n},y_{n}))_{n\in\mathbb{N}}$ is bounded above, that is, there exists $\mu\in\mathbb{R}$ such that

[TABLE]

Combining with (37), we obtain that $(f(x_{n}))_{n\in{\mathbb{N}}}$ and $(g(y_{n}))_{n\in{\mathbb{N}}}$ are bounded above. But since $f$ and $g$ are coercive, $(x_{n})_{n\in{\mathbb{N}}}$ and $(y_{n})_{n\in{\mathbb{N}}}$ must therefore be bounded, and thus so is $(x_{n},y_{n})_{n\in{\mathbb{N}}}$ which contradicts the fact that $\|(x_{n},y_{n})\|\to+\infty$ . ∎

3 The Douglas–Rachford algorithm for finding a zero of a function

From herein, we assume that

[TABLE]

Note that, since $f$ is assumed proper, $\operatorname{gra}f$ is necessarily a closed set whenever $f$ is continuous throughout its effective domain in the sense that

[TABLE]

As the following examples show, the converse need not hold (i.e., the graph of a discontinuous function can be closed) and, in general, mere lower semicontinuity is not sufficient to ensure closedness of the graph.

Example 3.1 (A discontinuous, lsc function with closed graph).

Consider $f\colon\mathbb{R}\to\mathbb{R}$ defined by

$$f(x):=\begin{cases}1/|x| & \text{if } x\neq 0,\\ 0 & \text{if } x=0.\end{cases} \tag{41}$$

Then $f$ is continuous except at $x=0$ where it is merely lsc. In particular, $f$ is lsc but not continuous. However, $f$ does have a closed graph. Indeed, the graph of $f$ may be expressed as the union of two closed sets: $\operatorname{gra}f=\operatorname{gra}\left(1/|\cdot|\right)\cup\{(0,0)\}$ where we note that $\operatorname{gra}\left(1/|\cdot|\right)$ is closed since $x\mapsto 1/|x|$ is continuous on its domain. ∎

It is known, see for instance [BC11, Corollary 9.15], that every proper lsc convex function $f\colon\mathbb{R}\to\left[-\infty,+\infty\right]$ is continuous throughout the closure of $\operatorname{dom}f$ and hence has a closed graph. However, this does not hold for proper lsc convex functions in $\mathbb{R}^{2}$ which, as a consequence, gives rise to the following example.

Example 3.2 (A proper lsc convex function with nonclosed graph).

Consider $f\colon\mathbb{R}^{2}\to\left[-\infty,+\infty\right]$ defined by

[TABLE]

Then $f$ is proper, lsc, and convex, as shown in [BC11, Example 9.27]. Now setting $(\forall{n\in{\mathbb{N}}})$ $x_{n}=(1/(n+1)^{2},1/(n+1))$ , we have that the sequence $(x_{n},f(x_{n}))_{n\in{\mathbb{N}}}$ lies in $\operatorname{gra}f$ but its limit $((0,0),1)\notin\operatorname{gra}f$ , hence $\operatorname{gra}f$ is not closed.∎
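The sequence used in Example 3.2 can be evaluated numerically. The sketch below assumes that $f(\xi_{1},\xi_{2})$ equals $\xi_{2}^{2}/\xi_{1}$ for $\xi_{1}>0$, equals $0$ at the origin, and equals $+\infty$ elsewhere; this form is consistent with the sequence and limit stated in the example but is an assumption on our part, as are the function names.

```python
import math

def f(xi1, xi2):
    """Assumed form of f: xi2^2 / xi1 if xi1 > 0, 0 at the origin, +inf elsewhere."""
    if xi1 > 0.0:
        return xi2 * xi2 / xi1
    if xi1 == 0.0 and xi2 == 0.0:
        return 0.0
    return math.inf

def graph_points(n_terms):
    """Points (x_n, f(x_n)) of gra f with x_n = (1/(n+1)^2, 1/(n+1))."""
    points = []
    for n in range(n_terms):
        t = 1.0 / (n + 1)
        x = (t * t, t)
        points.append((x, f(*x)))
    return points

points = graph_points(1000)
last_x, last_value = points[-1]
# x_n approaches the origin while f(x_n) stays at 1, whereas f(0, 0) = 0.
print(last_x, last_value, f(0.0, 0.0))
```

Along the sequence the function value is constant while the argument tends to the origin, so the limit point $((0,0),1)$ cannot belong to the graph.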

Our focus is the feasibility problem (4) in the product Hilbert space $X\times\mathbb{R}$ with constraints

[TABLE]

where $A\cap B\neq\varnothing$. Note that, in the case in which $B$ is the epigraph of a proper lower semicontinuous convex function (and hence a nonempty closed convex set), the convergence of the Douglas–Rachford algorithm was previously studied in [BD16, BDNP16a, BDNP16]. Until this work, the case in which $B$ is the graph of a proper function had not been considered even for the class of convex functions. It is also clear that, equivalently, our problem may be posed as

$\text{find }x\in X\text{ such that }f(x)=0$

under the assumption that $f^{-1}(0)\neq\varnothing$ . In what follows, the sequence $(z_{n})_{n\in{\mathbb{N}}}$ shall denote a DR sequence for (43), that is, any sequence which satisfies

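$(\forall n\in\mathbb{N})\quad z_{n+1}\in T_{A,B}z_{n}\quad\text{where }z_{n}=(x_{n},\rho_{n})\in X\times\mathbb{R}.$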

In this setting, the projector onto $A$ and the reflector across $A$ are given, respectively, by

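$(\forall(x,\rho)\in X\times\mathbb{R})\quad P_{A}(x,\rho)=(x,0)\quad\text{and}\quad R_{A}(x,\rho)=(x,-\rho).$

(46)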

Although the two possible DR operators, $T_{A,B}$ and $T_{B,A}$ , associated with $A$ and $B$ give different algorithms, since $A$ is a subspace, it holds that $T_{B,A}^{n}=R_{A}T_{A,B}^{n}R_{A}$ for every $n\in\mathbb{N}$ (see [BM16, Theorem 2.7(i) & Remark 2.10(ii)-(iii)]). Thus in order to study the DRA corresponding to $T_{B,A}$ it suffices just to study the DRA corresponding to $T_{A,B}$ .
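The relation between the two DR operators can be checked numerically in a simple case. The following Python sketch is only an illustration under stated assumptions: it takes $X=\mathbb{R}$ and an affine function $f(t)=at+b$ (chosen here purely so that the projector onto $\operatorname{gra}f$ has a closed form), and all helper names are hypothetical.

```python
def make_dr_operators(a, b):
    """Build T_{A,B}, T_{B,A} and R_A for A = R x {0}, B = gra f, f(t) = a*t + b.

    The affine choice of f is an assumption made for illustration only.
    """

    def proj_B(z):
        # Orthogonal projection of z = (x, y) onto the line {(t, a*t + b)}.
        x, y = z
        t = (x + a * (y - b)) / (1.0 + a * a)
        return (t, a * t + b)

    def refl_A(z):
        # Reflection across A: flip the sign of the second coordinate.
        return (z[0], -z[1])

    def refl_B(z):
        p = proj_B(z)
        return (2.0 * p[0] - z[0], 2.0 * p[1] - z[1])

    def T_AB(z):
        # T_{A,B} = (Id + R_B R_A) / 2
        w = refl_B(refl_A(z))
        return (0.5 * (z[0] + w[0]), 0.5 * (z[1] + w[1]))

    def T_BA(z):
        # T_{B,A} = (Id + R_A R_B) / 2
        w = refl_A(refl_B(z))
        return (0.5 * (z[0] + w[0]), 0.5 * (z[1] + w[1]))

    return T_AB, T_BA, refl_A


def iterate(T, z, n):
    """Apply the operator T to z exactly n times."""
    for _ in range(n):
        z = T(z)
    return z


def identity_gap(a, b, z0, n):
    """Largest coordinate gap between T_{B,A}^n z0 and R_A T_{A,B}^n R_A z0."""
    T_AB, T_BA, refl_A = make_dr_operators(a, b)
    lhs = iterate(T_BA, z0, n)
    rhs = refl_A(iterate(T_AB, refl_A(z0), n))
    return max(abs(lhs[0] - rhs[0]), abs(lhs[1] - rhs[1]))
```

For instance, `identity_gap(0.5, -1.0, (3.0, 2.0), 4)` compares the two sides after four iterations; in this affine setting the gap should be at the level of floating-point rounding.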

To begin, we collect some preparatory lemmas which we use to give a precise description of the DR iteration for the sets $A$ and $B$ in (43). Our first result is concerned with the range of the DR operator.

Lemma 3.3 (Range of $T_{A,B}$ ).

The following assertions hold.

(i)

$(\forall(x,\rho)\in X\times\mathbb{R})\quad T_{A,B}(x,\rho)=(0,\rho)+P_{B}(x,-\rho)$.

(ii)

$\operatorname{ran}T_{A,B}:=T_{A,B}(X\times\mathbb{R})\subseteq\operatorname{dom}f\times\mathbb{R}$ .

Proof.

Let $(x,\rho)\in X\times\mathbb{R}$ . (i): Combining Lemma 2.2 (i) and (46) yields

[TABLE]

(ii): Since $B\subseteq\operatorname{dom}f\times\mathbb{R}$ , it follows from (i) that

[TABLE]

which completes the proof. ∎
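To make Lemma 3.3 (i) concrete, the sketch below evaluates $T_{A,B}=\frac{1}{2}(\operatorname{Id}+R_{B}R_{A})$ directly from the reflectors and compares it with $(0,\rho)+P_{B}(x,-\rho)$ for $X=\mathbb{R}$. The projector onto $\operatorname{gra}f$ is approximated by a grid search followed by a ternary-search refinement; this numerical projector, its search window $[-5,5]$, and all function names are assumptions introduced for illustration, not part of the paper.

```python
def project_onto_graph(f, x, y, lo=-5.0, hi=5.0, n=20001, refine=80):
    """Approximate one nearest point of gra f to (x, y), searching t in [lo, hi]."""

    def dist2(t):
        return (t - x) ** 2 + (f(t) - y) ** 2

    step = (hi - lo) / (n - 1)
    t_best = min((lo + i * step for i in range(n)), key=dist2)
    # Refine around the best grid point; this assumes dist2 is unimodal
    # on the short bracket [t_best - step, t_best + step].
    a, b = t_best - step, t_best + step
    for _ in range(refine):
        m1 = a + (b - a) / 3.0
        m2 = b - (b - a) / 3.0
        if dist2(m1) <= dist2(m2):
            b = m2
        else:
            a = m1
    t = 0.5 * (a + b)
    return (t, f(t))


def dr_via_reflectors(f, z):
    """Return (z + R_B R_A z) / 2 together with the selected point of P_B(R_A z)."""
    x, rho = z
    ra = (x, -rho)                                  # R_A z
    p = project_onto_graph(f, ra[0], ra[1])         # one element of P_B(R_A z)
    rb = (2.0 * p[0] - ra[0], 2.0 * p[1] - ra[1])   # R_B R_A z
    return (0.5 * (x + rb[0]), 0.5 * (rho + rb[1])), p


def dr_via_lemma(z, p):
    """Right-hand side of Lemma 3.3 (i): (0, rho) + p, where p lies in P_B(x, -rho)."""
    return (p[0], z[1] + p[1])
```

Both routes use the same selection `p` of the (possibly set-valued) projector, so they should agree up to rounding; the second coordinate of the result also equals $\rho+f(x_{+})$ by construction.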

Note that, in view of Lemma 3.3 (ii), from now on it suffices to assume that

[TABLE]

In the following lemma, we turn our attention to the projector onto $B=\operatorname{gra}f$ . The provided characterization for $P_{B}$ will then be used in Lemma 3.5 to describe the DR operator relative to $(A,B)$ .

Lemma 3.4 (Projector onto the graph of $f$ ).

Let $(x,\rho)\in\operatorname{dom}f\times\mathbb{R}$ . Then $P_{B}(x,\rho)\neq\varnothing$ and, for any $(p,\pi)\in P_{B}(x,\rho)$ , it holds that $p\in\operatorname{dom}f$ and $\pi=f(p)$ . In addition, the following assertions hold.

(i)

If $f$ is Lipschitz continuous around $p$ , then

[TABLE]

(ii)

If $f$ is convex and $p\in\operatorname{int}\operatorname{dom}\,f$ , then

[TABLE]

Proof.

The existence of a point $(p,\pi)\in P_{B}(x,\rho)$ is ensured since the set $B=\operatorname{gra}f$ is a nonempty closed subset of $X\times\mathbb{R}$ . Since $(p,\pi)\in B=\operatorname{gra}f$ , it holds that $p\in\operatorname{dom}f$ and $\pi=f(p)$ .

(i): Since $(p,f(p))\in P_{B}(x,\rho)$ , we have that

[TABLE]

and, applying Fermat’s rule (2.4), gives

[TABLE]

Using the sum and product rules (2.5) and noting that $\|\cdot-x\|^{2}$ is continuously (Fréchet) differentiable and hence strictly differentiable on $X$ with $\nabla\|\cdot-x\|^{2}(p)=2(p-x)$ (see, for instance, [BC11, Example 16.11 & Corollary 17.36]), we deduce that

[TABLE]

Now by the scalar multiplication rule (Lemma 2.6),

[TABLE]

Finally, if $f(p)=\rho$ , then, since by assumption $p\in\operatorname{int}\operatorname{dom}f$ , the function $2(f(p)-\rho)(f(\cdot)-\rho)$ is zero around $p$ . Consequently, $\partial\big{(}2(f(p)-\rho)(f(\cdot)-\rho)\big{)}(p)=\{0\}=2(f(p)-\rho)\partial f(p)$ where $\partial f(p)\neq\varnothing$ due to [Mor06, Corollary 2.25]. Altogether, we have proven (50).

(ii): Since $f$ is proper and convex, $f$ is locally Lipschitz continuous on $\operatorname{int}\operatorname{dom}\,f$ [BC11, Corollary 8.32]. The claim thus follows from (i). ∎

Lemma 3.5 (One DR step).

Let $(x,\rho)\in\operatorname{dom}f\times\mathbb{R}$ and let $(x_{+},\rho_{+})\in T_{A,B}(x,\rho)$ . Then

[TABLE]

Suppose, in addition, that $f$ is Lipschitz continuous around $x_{+}$ . Then there exists $x^{*}\in\partial^{0}\!f(x_{+})$ such that

[TABLE]

and, furthermore, the following assertions hold.

(i)

If $f$ is strictly differentiable at $x_{+}$ with $\nabla f(x_{+})=0$, then $x_{+}=x$.

(ii)

If $f$ is convex and $0\in\partial f(x_{+})$ , then either $x_{+}=x$ or $0\not\in\partial f(x)$ .

Proof.

It follows from Lemma 3.3 (i) that $(x_{+},\rho_{+}-\rho)=(x_{+},\rho_{+})-(0,\rho)\in P_{B}(x,-\rho)$ and from Lemma 3.4 that $x_{+}\in\operatorname{dom}f$ and $\rho_{+}-\rho=f(x_{+})$ . Altogether, $(x_{+},f(x_{+}))\in P_{B}(x,-\rho)$ and $\rho_{+}=\rho+f(x_{+})$ . The former implies that

[TABLE]

which completes the proof of (56).

Now assume that $f$ is Lipschitz continuous around $x_{+}$ . By Lemma 3.4 (i),

[TABLE]

from which (57) follows since $\partial^{0}\!f(x_{+})=\partial f(x_{+})\cup\partial^{+}\!f(x_{+})$ . Furthermore, we argue as follows.

(i): If $f$ is strictly differentiable at $x_{+}$ with $\nabla f(x_{+})=0$, then $\partial f(x_{+})=\partial^{+}\!f(x_{+})=\partial^{0}\!f(x_{+})=\{0\}$, and so $x^{*}=0$, which gives $x_{+}=x$.

(ii): Suppose $f$ is convex, $0\in\partial f(x_{+})$ and $x_{+}\neq x$ . Then (56) yields

[TABLE]

Since $0\in\partial f(x_{+})$ , we have $f(x_{+})=\min f(X)$ and hence $f(x_{+})\leq f(x)$ . By (60), the inequality is actually strict, that is, $f(x_{+})<f(x)$ which implies that $f(x)>\min f(X)$ and hence $0\not\in\partial f(x)$ . ∎

Recall that the set of fixed points of $T_{A,B}$ is the set $\operatorname{Fix}T_{A,B}:=\{{z\in X\times\mathbb{R}}~{}\big{|}~{}{z\in T_{A,B}z}\}$. If $A$ and $B$ were both convex, the fixed point set of the DR operator could be described precisely [BCL04, Corollary 3.9]. Although $B$ is not convex in our setting, we are nevertheless able to arrive at the following satisfactory characterization.

Lemma 3.6 (Fixed points of $T_{A,B}$ ).

The following assertions hold.

(i)

$(\forall(x,\rho)\in\operatorname{Fix}T_{A,B})$ $f(x)=0$ and $(x,0)\in P_{B}(x,-\rho)$. Consequently,

[TABLE]

(ii)

If $\min f(X)=0$ , then

[TABLE]

(iii)

If $f$ is locally Lipschitz continuous on $f^{-1}(0)$ , then

[TABLE]

In particular, if $f^{-1}(0)\cap(\partial f)^{-1}(0)=f^{-1}(0)\cap(\partial^{+}\!f)^{-1}(0)=\varnothing$, then $\operatorname{Fix}T_{A,B}=A\cap B$.

(iv)

If $f$ is convex and $f^{-1}(0)\subseteq\operatorname{int}\operatorname{dom}\,f$ , then

[TABLE]

In particular, if $\inf f(X)<0$ , then $\operatorname{Fix}T_{A,B}=A\cap B$ .

Proof.

(i): Let $(x,\rho)\in\operatorname{Fix}T_{A,B}$ . Then, by Lemma 3.3 (i), we have

[TABLE]

On the one hand, (65) implies $(x,0)\in B=\operatorname{gra}f$ , so that $f(x)=0$ , and hence $(x,\rho)\in f^{-1}(0)\times\mathbb{R}$ . On the other hand, (65) gives

[TABLE]

which proves that $P_{A}(x,\rho)\in A\cap B$. We deduce that $\operatorname{Fix}T_{A,B}\subseteq f^{-1}(0)\times\mathbb{R}$ and $P_{A}\operatorname{Fix}T_{A,B}\subseteq A\cap B$. It is straightforward to show that $A\cap B\subseteq\operatorname{Fix}T_{A,B}$, from which it follows that $A\cap B=P_{A}(A\cap B)\subseteq P_{A}\operatorname{Fix}T_{A,B}$.

(ii): We immediately have that $A\cap B=f^{-1}(0)\times\{0\}\subseteq f^{-1}(0)\times\mathbb{R}_{+}$. Now let $(x,\rho)\in f^{-1}(0)\times\mathbb{R}_{+}$. Again by Lemma 3.3 (i), $T_{A,B}(x,\rho)=(0,\rho)+P_{B}(x,-\rho)$. It follows from $\min f(X)=0=f(x)$ and $\rho\in\mathbb{R}_{+}$ that

[TABLE]

and therefore $P_{B}(x,-\rho)=(x,f(x))=(x,0)$ , which yields $T_{A,B}(x,\rho)=(0,\rho)+(x,0)=(x,\rho)$ , that is, $(x,\rho)\in\operatorname{Fix}T_{A,B}$ . Hence $f^{-1}(0)\times\mathbb{R}_{+}\subseteq\operatorname{Fix}T_{A,B}$ .

(iii): Let $(x,\rho)\in\operatorname{Fix}T_{A,B}$ . By (i), $f(x)=0$ and $(x,0)\in P_{B}(x,-\rho)$ . If $\rho=0$ , then $f(x)=\rho=0$ , and hence the fixed point $(x,\rho)\in A\cap B$ . If $\rho\neq 0$ , then, by using Lemma 3.4 (i),

[TABLE]

Thus either $(x,\rho)\in\big(f^{-1}(0)\cap(\partial f)^{-1}(0)\big)\times\mathbb{R}_{++}$ or $(x,\rho)\in\big(f^{-1}(0)\cap(\partial^{+}\!f)^{-1}(0)\big)\times\mathbb{R}_{--}$, which completes the proof of the claim.

(iv): By the assumptions on $f$ and [BC11, Corollary 8.32], $f$ is locally Lipschitz continuous on $f^{-1}(0)\subseteq\operatorname{int}\operatorname{dom}\,f$. The first claim follows by applying (iii) and noting from convexity that $\partial f=\partial^{0}\!f=\partial f\cup\partial^{+}\!f$ (see (27)).

To prove the second claim, suppose that there exists $x\in f^{-1}(0)\cap(\partial f)^{-1}(0)$ , that is, $f(x)=0$ and $0\in\partial f(x)$ . But then $\min f(X)=f(x)=0$ which contradicts the assumption that $\inf f(X)<0$ , hence we deduce that $f^{-1}(0)\cap(\partial f)^{-1}(0)=\varnothing$ . The conclusion follows. ∎
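As a hedged numerical illustration of the inclusion $f^{-1}(0)\times\mathbb{R}_{+}\subseteq\operatorname{Fix}T_{A,B}$ established in the proof of (ii), the following sketch takes $X=\mathbb{R}$ and the test function $f(t)=t^{2}$ (so that $\min f(X)=0$ and $f^{-1}(0)=\{0\}$). The grid-based projector and every name below are assumptions made for illustration; the projector returns only one element of $P_{B}$ when the projection is not unique.

```python
def project_onto_graph(f, x, y, lo=-5.0, hi=5.0, n=20001, refine=80):
    """Approximate one nearest point of gra f to (x, y), searching t in [lo, hi]."""

    def dist2(t):
        return (t - x) ** 2 + (f(t) - y) ** 2

    step = (hi - lo) / (n - 1)
    t_best = min((lo + i * step for i in range(n)), key=dist2)
    # Ternary-search refinement, assuming dist2 is unimodal near t_best.
    a, b = t_best - step, t_best + step
    for _ in range(refine):
        m1 = a + (b - a) / 3.0
        m2 = b - (b - a) / 3.0
        if dist2(m1) <= dist2(m2):
            b = m2
        else:
            a = m1
    t = 0.5 * (a + b)
    return (t, f(t))


def dr_step(f, z):
    """One DR step via Lemma 3.3 (i): T_{A,B}(x, rho) = (0, rho) + P_B(x, -rho)."""
    x, rho = z
    p = project_onto_graph(f, x, -rho)
    return (p[0], rho + p[1])


def square(t):
    """Test function f(t) = t^2, an illustrative assumption."""
    return t * t


def is_fixed(f, z, tol=1e-6):
    """Check numerically whether the selected DR step leaves z unchanged."""
    z_plus = dr_step(f, z)
    return abs(z_plus[0] - z[0]) < tol and abs(z_plus[1] - z[1]) < tol
```

With these definitions, `is_fixed(square, (0.0, rho))` should hold for every `rho >= 0`, in line with (ii). By contrast, `is_fixed(square, (0.0, -2.0))` should fail, since the nearest points of the parabola to $(0,2)$ lie at $t=\pm\sqrt{3/2}$; this is consistent with (i), which only asserts $\operatorname{Fix}T_{A,B}\subseteq f^{-1}(0)\times\mathbb{R}$.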

Roughly speaking, Lemma 3.6 shows that the fixed point set of $T_{A,B}$ consists of two parts: the intersection $A\cap B$ and a set containing critical points of $f$ . In the following result, we give conditions under which the DRA stays away from critical points.

For convenience, we denote $\Delta:=T_{A,B}(\operatorname{dom}f\times\mathbb{R})$ and the first coordinate projection by

[TABLE]

Corollary 3.7.

Suppose that one of the following holds:

(i)

$f$ is locally Lipschitz continuous on $\Pi(\Delta)$ and, for every $x\in(\nabla f)^{-1}(0)$, either

[TABLE]

(ii)

$f$ is convex with $\operatorname{dom}f$ open, and $\inf f(X)<\min\{0,\sup f(\operatorname{dom}f)\}$.

Then the set $S:=\{n\in\mathbb{N}~\big|~f\text{ is strictly differentiable at }x_{n}\text{ with }\nabla f(x_{n})=0\}$ is bounded.

Proof.

(i): By way of contradiction, suppose that $S$ is unbounded. In this case, we claim that $S=\mathbb{N}$ and that the sequence $(x_{n})_{{n\in{\mathbb{N}}}}$ is constant. To see this, observe that if $n\in S$ (i.e., $f$ is strictly differentiable at $x_{n}$ with $\nabla f(x_{n})=0$), then Lemma 3.5 (i) yields that $x_{n-1}=x_{n}$. In particular, $f$ is strictly differentiable at $x_{n-1}$ with $\nabla f(x_{n-1})=0$. The claim now follows by descending induction on $n$.

Now, set $x:=x_{0}=x_{n}$ for any ${n\in{\mathbb{N}}}$ . Let $y\in\operatorname{dom}f$ . For all ${n\in{\mathbb{N}}}$ , since $(x_{n+1},f(x_{n+1}))\in P_{B}(x_{n},-\rho_{n})$ , the definition of $P_{B}$ implies

[TABLE]

Since $\nabla f(x)=0$, (70) implies that either $f(x)<0$ or $f(x)>0$. In the former case, Lemma 3.5 implies $\rho_{n}=\rho_{0}+nf(x)\to-\infty$ as $n\to\infty$, and hence $f(x)+f(y)+2\rho_{n}\to-\infty$. Since $\|y-x\|^{2}$ is fixed, (71b) implies that $f(x)-f(y)\geq 0$. Since $y\in\operatorname{dom}f$ was chosen arbitrarily, $f(x)=\sup f(\operatorname{dom}f)$, which contradicts the fact that $f(x)=\inf f(X)<\sup f(\operatorname{dom}f)$. The case in which $f(x)>0$ is proven analogously.

(ii): By assumption and [BC11, Corollary 8.32], $f$ is locally Lipschitz continuous on $\operatorname{dom}f\supseteq\Pi(\Delta)$ . By convexity of $f$ , if $x\in(\nabla f)^{-1}(0)$ , then $f(x)=\inf f(X)<\min\{0,\sup f(\operatorname{dom}f)\}$ , hence (70) is satisfied. The result now follows from (i). ∎

Remark 3.8.

A convex function is strictly differentiable at every point where it is Gâteaux differentiable. Indeed, supposing that a function $f$ is convex and Gâteaux differentiable at $x\in\operatorname{dom}f$ , it then follows, from (26) and [BC11, Proposition 17.26(i)], that $\partial f(x)=\{\nabla f(x)\}$ is a singleton, and, from [BC11, Proposition 17.39], that $f$ is lsc at $x$ and $x\in\operatorname{int}\operatorname{dom}\,f$ . By combining with [RW98, Proposition 8.12 & Theorem 9.18(a) & (c)], $f$ is strictly differentiable at $x$ .

The following result shows that, under a differentiation assumption, the inverse of the DR operator is continuous. This property, and its connection to stability, is explored further in Section 4.

Corollary 3.9.

Suppose that $f$ is strictly differentiable on an open set $U$ contained in $\Pi(\Delta)$ . Then

[TABLE]

and $T_{A,B}^{-1}$ is continuous on $\Pi^{-1}(U)$ . Consequently, if the limit of a convergent DR sequence is contained in $\Pi^{-1}(U)$ , then it is necessarily a fixed point $z$ of $T_{A,B}$ with $P_{A}z\in A\cap B$ .

Proof.

Let $(y,\sigma)\in\Pi^{-1}(U)$ . Then $y\in U$ and there exists $(x,\rho)\in\operatorname{dom}f\times\mathbb{R}$ such that $(y,\sigma)\in T_{A,B}(x,\rho)$ . Since $f$ is strictly differentiable on $U$ , it is Lipschitz continuous around $y$ with

[TABLE]

By Lemma 3.5, $\sigma=\rho+f(y)$ and $y=x-\sigma\nabla f(y)$ , which proves (72). To deduce the continuity of $T^{-1}_{A,B}$ , observe that, since $f$ is strictly differentiable on $U$ , $\nabla f$ is continuous on $U$ [RW98, Corollary 9.19(a)–(b)].

Finally, let $(z_{n})_{n\in{\mathbb{N}}}$ be a DR sequence which converges to a point $z=(x,\rho)\in\Pi^{-1}(U)$. Without loss of generality, we can and do assume that $z_{n}=(x_{n},\rho_{n})\in\Pi^{-1}(U)$ for every ${n\in{\mathbb{N}}}$. Then the continuity of $T^{-1}_{A,B}$, together with the fact that $z_{n-1}=T^{-1}_{A,B}(z_{n})$, gives

[TABLE]

which shows that $z\in T_{A,B}z$ and thus $z\in\operatorname{Fix}T_{A,B}$ . In turn, applying Lemma 3.6 (i) yields $P_{A}z\in P_{A}\operatorname{Fix}T_{A,B}=A\cap B$ . ∎
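The relations $\sigma=\rho+f(y)$ and $y=x-\sigma\nabla f(y)$ from the proof above can be rearranged to recover $(x,\rho)$ from $(y,\sigma)$, namely $x=y+\sigma\nabla f(y)$ and $\rho=\sigma-f(y)$. The sketch below checks this round trip for $X=\mathbb{R}$ and the test function $f(t)=t^{2}-1$. The numerical projector, the test function, and the helper names are illustrative assumptions rather than the authors' implementation.

```python
def project_onto_graph(f, x, y, lo=-5.0, hi=5.0, n=20001, refine=80):
    """Approximate one nearest point of gra f to (x, y), searching t in [lo, hi]."""

    def dist2(t):
        return (t - x) ** 2 + (f(t) - y) ** 2

    step = (hi - lo) / (n - 1)
    t_best = min((lo + i * step for i in range(n)), key=dist2)
    # Ternary-search refinement, assuming dist2 is unimodal near t_best.
    a, b = t_best - step, t_best + step
    for _ in range(refine):
        m1 = a + (b - a) / 3.0
        m2 = b - (b - a) / 3.0
        if dist2(m1) <= dist2(m2):
            b = m2
        else:
            a = m1
    t = 0.5 * (a + b)
    return (t, f(t))


def dr_step(f, z):
    """Forward step: T_{A,B}(x, rho) = (0, rho) + P_B(x, -rho)."""
    x, rho = z
    p = project_onto_graph(f, x, -rho)
    return (p[0], rho + p[1])


def dr_inverse(f, df, w):
    """Backward step obtained by rearranging sigma = rho + f(y), y = x - sigma * f'(y)."""
    y, sigma = w
    return (y + sigma * df(y), sigma - f(y))


def f_test(t):
    """Illustrative test function f(t) = t^2 - 1."""
    return t * t - 1.0


def df_test(t):
    """Derivative of the test function."""
    return 2.0 * t
```

Starting from $(x,\rho)=(2,\tfrac{1}{2})$, the stationarity condition for the projection reduces to $4t^{3}-4=0$, so the forward step should return approximately $(1,\tfrac{1}{2})$, and `dr_inverse` should then recover the starting point up to the accuracy of the numerical projector.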

4 Stability and local convergence

In this section, we use an inverse function argument to give a condition under which the DRA is stable around fixed points in the sense of Lipschitz continuity. Again, we emphasize that such results alone do not guarantee convergence of the DRA. This question will be addressed in Section 5.

To begin, we recall two facts which will be of use: an inverse function theorem, and the so-called Sherman–Morrison formula.

Fact 4.1 (Single-valued Lipschitzian invertibility).

Let $T\colon\mathbb{R}^{m}\to\mathbb{R}^{m}$ be strictly differentiable at $\bar{x}$ . If $\nabla T(\bar{x})$ is nonsingular, then $T^{-1}$ has a Lipschitz continuous single-valued localization, $S$ , around $\bar{y}:=T\bar{x}$ for $\bar{x}$ . Moreover, the Lipschitz modulus of $S$ at $\bar{y}$ is equal to $\|\nabla T(\bar{x})^{-1}\|$ and $S$ is strictly differentiable at $\bar{y}$ with $\nabla S(\bar{y})=\nabla T(\bar{x})^{-1}$ .

Proof.

This is a special case of [RW98, Corollary 9.55]. ∎

Fact 4.2 (Sherman–Morrison formula).

Let $M$ be a nonsingular square matrix and let $u$ and $v$ be column vectors of appropriate dimensions so that the following multiplication operators are well defined. Then the following assertions hold.

(i)

If $1+v^{\top}M^{-1}u\neq 0$ , then $M+uv^{\top}$ is nonsingular and

$(M+uv^{\top})^{-1}=M^{-1}-\frac{M^{-1}uv^{\top}M^{-1}}{1+v^{\top}M^{-1}u}.$

(ii)

If $M+uv^{\top}$ is singular, then $1+v^{\top}M^{-1}u=0$ .

Proof.

(i): See [SM50]. (ii): This is the contrapositive of (i). ∎
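For a quick sanity check of the Sherman–Morrison formula in the special case $M=\operatorname{Id}$ and $u=v=g$ (the rank-one update of the identity that appears in the proof of Theorem 4.3), the formula predicts $(\operatorname{Id}+gg^{\top})^{-1}=\operatorname{Id}-\alpha gg^{\top}$ with $\alpha=1/(1+\|g\|^{2})$. The following pure-Python sketch, with hypothetical helper names, multiplies the two matrices and reports the deviation from the identity.

```python
def outer(u, v):
    """Outer product u v^T as a list of rows."""
    return [[ui * vj for vj in v] for ui in u]


def matmul(P, Q):
    """Plain matrix product of two lists of rows."""
    inner = len(Q)
    cols = len(Q[0])
    return [[sum(P[i][k] * Q[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(len(P))]


def rank_one_update(g):
    """The matrix Id + g g^T."""
    n = len(g)
    G = outer(g, g)
    return [[(1.0 if i == j else 0.0) + G[i][j] for j in range(n)] for i in range(n)]


def sherman_morrison_inverse(g):
    """Inverse of Id + g g^T predicted by the formula: Id - alpha * g g^T."""
    n = len(g)
    alpha = 1.0 / (1.0 + sum(gi * gi for gi in g))
    G = outer(g, g)
    return [[(1.0 if i == j else 0.0) - alpha * G[i][j] for j in range(n)] for i in range(n)]


def residual(g):
    """Largest entry of (Id + g g^T)(Id - alpha g g^T) - Id in absolute value."""
    n = len(g)
    product = matmul(rank_one_update(g), sherman_morrison_inverse(g))
    return max(abs(product[i][j] - (1.0 if i == j else 0.0))
               for i in range(n) for j in range(n))
```

Since $1+g^{\top}g>0$ for every $g$, the nonsingularity condition in (i) is automatically satisfied in this special case, and `residual(g)` should be of the order of machine precision.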

We are ready to prove our main result regarding stability of the DRA. In the following, $\succeq$ denotes the Löwner partial order on the space of symmetric matrices. We say that $f$ is twice strictly differentiable at $\bar{x}$ if $f$ is differentiable around $\bar{x}$ and $\nabla f$ is strictly differentiable at $\bar{x}$ .

Theorem 4.3 (Stability of the DRA).

Let $\bar{z}:=(\bar{x},\bar{\rho})\in\operatorname{Fix}T_{A,B}$ , and suppose that $f$ is twice strictly differentiable at $\bar{x}$ and that $\bar{\rho}\nabla^{2}f(\bar{x})\succeq 0$ . Then $(T_{A,B}^{-1})^{-1}$ has a Lipschitz continuous single-valued localization, $S$ , around $\bar{z}$ for $\bar{z}$ which is strictly differentiable at $\bar{z}$ and has Lipschitz modulus at $\bar{z}$ equal to $\ell\leq 1$ where

[TABLE]

Furthermore, if $\bar{z}=(\bar{x},0)\in A\cap B\subseteq\operatorname{Fix}T_{A,B}$ , then $S$ and $T_{A,B}$ coincide on a neighborhood of $\bar{z}$ .

Proof.

Since $f$ is twice strictly differentiable at $\bar{x}$, $\nabla f$ both exists and is Lipschitz continuous around $\bar{x}$. In particular, $f$ is continuously differentiable around $\bar{x}$ and, consequently, strictly differentiable around $\bar{x}$. Therefore, for every $(x_{+},\rho_{+})\in X\times\mathbb{R}$ with $x_{+}$ sufficiently close to $\bar{x}$, Corollary 3.9 gives that

[TABLE]

and, since $\nabla f$ is strictly differentiable at $\bar{x}$ , $T^{-1}_{A,B}$ is strictly differentiable at $\bar{z}$ with Jacobian given by

[TABLE]

Now, by distinguishing two cases, we show that $\nabla T^{-1}_{A,B}(\bar{z})$ is nonsingular and

[TABLE]

Case 1: Assume $\bar{\rho}=0$ . Then (78) becomes

[TABLE]

and hence $\det\nabla T^{-1}_{A,B}(\bar{z})=\det(\operatorname{Id}+\nabla f(\bar{x})\nabla f(\bar{x})^{\top})$ . Noting that

[TABLE]

it follows from Fact 4.2 (i) that $\operatorname{Id}+\nabla f(\bar{x})\nabla f(\bar{x})^{\top}$ is nonsingular and that

[TABLE]

Therefore, $\det\nabla T^{-1}_{A,B}(\bar{z})=\det(\operatorname{Id}+\nabla f(\bar{x})\nabla f(\bar{x})^{\top})\neq 0$ and hence, in particular, $\nabla T^{-1}_{A,B}(\bar{z})$ is nonsingular. To estimate $\|M\|$ where $M:=(\nabla T^{-1}_{A,B}(\bar{z}))^{-1}$ , recall that $\|M\|=\sqrt{\lambda_{\max}(M^{\top}M)},$ where $\lambda_{\max}$ denotes the largest eigenvalue. Using block matrix inversion and (82) gives

[TABLE]

and so

[TABLE]

Let $\lambda$ be an eigenvalue of $\operatorname{Id}-\alpha\nabla f(\bar{x})\nabla f(\bar{x})^{\top}$ , that is,

[TABLE]

If $\lambda=1$, then we must have $\det(-\alpha\nabla f(\bar{x})\nabla f(\bar{x})^{\top})=0$, which occurs if and only if $\dim X>1$ or $\nabla f(\bar{x})=0$. Otherwise, using (85) and Fact 4.2 (ii) yields

[TABLE]

Hence, either $\lambda=1$ or $\lambda=\alpha$ . In either case,

[TABLE]

and, by noting that $\alpha\leq 1$ with equality if and only if $\nabla f(\bar{x})=0$ , we deduce that

[TABLE]

Case 2: Assume $\bar{\rho}\neq 0$ . Then $\bar{z}\in\operatorname{Fix}T_{A,B}\smallsetminus(A\cap B)$ and, by Lemma 3.6 (iii), $\bar{x}\in f^{-1}(0)\cap(\nabla f)^{-1}(0)$ (i.e., $f(\bar{x})=0$ and $\nabla f(\bar{x})=0$ ). In turn, (78) becomes

[TABLE]

and, since $\bar{\rho}\nabla^{2}f(\bar{x})\succeq 0$ by assumption, we have $\operatorname{Id}+\bar{\rho}\nabla^{2}f(\bar{x})\succeq\operatorname{Id}$ so that

[TABLE]

where $\lambda_{\min}$ denotes the smallest eigenvalue. We therefore have that both $\operatorname{Id}+\bar{\rho}\nabla^{2}f(\bar{x})$ and $\nabla T^{-1}_{A,B}(\bar{z})$ are nonsingular and, moreover, that

[TABLE]

Using (90) yields $0<\lambda\leq 1$ for every eigenvalue $\lambda$ of $(\operatorname{Id}+\bar{\rho}\nabla^{2}f(\bar{x}))^{-1}$ , and as the matrix is symmetric, we have

[TABLE]

Noting that $\nabla f(\bar{x})=0$ , we see that this completes the proof of (79).

In either of the above cases, we have that $\nabla T^{-1}_{A,B}$ is nonsingular at $\bar{z}$ and that $\|(\nabla T^{-1}_{A,B}(\bar{z}))^{-1}\|$ satisfies (79). Now, as $\bar{z}\in T_{A,B}\bar{z}$ and $T_{A,B}^{-1}$ is single-valued at $\bar{z}$, it follows that $\bar{z}=T^{-1}_{A,B}\bar{z}$. By applying Fact 4.1, we deduce that $(T^{-1}_{A,B})^{-1}$ has a Lipschitz continuous single-valued localization, $S$, around $\bar{z}$ for $\bar{z}$ which is strictly differentiable at $\bar{z}$, and which has Lipschitz modulus at $\bar{z}$ equal to $\|\nabla S(\bar{z})\|=\|(\nabla T^{-1}_{A,B}(\bar{z}))^{-1}\|\leq 1$.

Further assume that $\bar{z}=(\bar{x},0)\in A\cap B$ . We shall show that $S$ coincides with $T_{A,B}$ around $\bar{z}$ . First we note that since $S$ is a localization at $\bar{z}$ for $\bar{z}$ , by definition, there exist neighborhoods $U$ and $V$ of $\bar{z}$ such that

[TABLE]

Now set $\delta>0$ such that $\mathbb{B}\left({\bar{z}};{\delta}\right)\subseteq U$ and $\mathbb{B}\left({\bar{z}};{2\delta}\right)\subseteq V$ . Applying Lemma 2.2 (ii) gives

[TABLE]

As $T_{A,B}\subseteq(T_{A,B}^{-1})^{-1}$ , combining (93) with (94) gives that

[TABLE]

and since $T_{A,B}z\neq\varnothing$ and $Sz$ is a singleton, the above inclusion must be an equality. This yields $T_{A,B}=S$ on $\mathbb{B}\left({\bar{z}};{\delta}\right)$ , as was claimed. We therefore deduce that $T_{A,B}$ is single-valued and locally Lipschitz on $\mathbb{B}\left({\bar{z}};{\delta}\right)$ with modulus at $\bar{z}$ equal to $\ell:=\|\nabla T_{A,B}(\bar{z})\|=\|\nabla S(\bar{z})\|=\|(\nabla T^{-1}_{A,B}(\bar{z}))^{-1}\|\leq 1$ satisfying (88). This completes the proof. ∎

A closer inspection of the proof of Theorem 4.3 shows that it actually proves $Q$ -linear convergence of the DRA in a special case. Recall that a sequence $(z_{n})_{n\in{\mathbb{N}}}$ is said to converge $Q$ -linearly to $\bar{z}$ with rate $\kappa\in\left[0,1\right[$ if

[TABLE]

Corollary 4.4 (Local $Q$ -linear convergence of the DRA).

Let $\bar{z}:=(\bar{x},0)\in A\cap B$ , and suppose that $X=\mathbb{R}$ and that $f$ is twice strictly differentiable at $\bar{x}$ with $f^{\prime}(\bar{x})\neq 0$ . Then there exists $\delta\in\mathbb{R}_{++}$ such that $T_{A,B}$ is a single-valued contraction mapping on $\mathbb{B}\left({\bar{z}};{\delta}\right)$ with $T_{A,B}(\mathbb{B}\left({\bar{z}};{\delta}\right))\subset\mathbb{B}\left({\bar{z}};{\delta}\right)$ . Furthermore, for any starting point $z_{0}\in\mathbb{B}\left({\bar{z}};{\delta}\right)$ , the DR sequence $(z_{n})_{n\in{\mathbb{N}}}$ converges $Q$ -linearly to $\bar{z}$ with rate

[TABLE]

Proof.

By applying Theorem 4.3 to $X=\mathbb{R}$ , there exists $\delta\in\mathbb{R}_{++}$ such that $T_{A,B}$ is single-valued and locally Lipschitz continuous on $\mathbb{B}\left({\bar{z}};{\delta}\right)$ with modulus at $\bar{z}$ equal to $\kappa:=\|\nabla T_{A,B}(\bar{z})\|=1/\sqrt{1+|f^{\prime}(\bar{x})|^{2}}<1$ . From the definition of the Lipschitz modulus at $\bar{z}$ , we have

[TABLE]

Let $\kappa^{\prime}\in\left]\kappa,1\right[$ . Then, by shrinking $\delta$ if necessary, we have

[TABLE]

and hence $T_{A,B}$ is a (single-valued) contraction mapping on $\mathbb{B}\left({\bar{z}};{\delta}\right)$ . Substituting $z^{\prime}=\bar{z}$ and noting that $T_{A,B}\bar{z}=\bar{z}$ yield

[TABLE]

which implies that $T_{A,B}(\mathbb{B}\left({\bar{z}};{\delta}\right))\subseteq\mathbb{B}\left({\bar{z}};{\kappa^{\prime}\delta}\right)\subset\mathbb{B}\left({\bar{z}};{\delta}\right)$ and that the DRA sequence $(z_{n})_{{n\in{\mathbb{N}}}}$ converges to $\bar{z}$ whenever $z_{0}\in\mathbb{B}\left({\bar{z}};{\delta}\right)$ . Now since $z_{n}\to\bar{z}$ , the claimed $Q$ -linear rate follows from (98). ∎
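The rate $\kappa=1/\sqrt{1+|f^{\prime}(\bar{x})|^{2}}$ appearing in the proof can be observed exactly in the simplest possible case. The sketch below assumes $X=\mathbb{R}$ and the linear function $f(t)=at$ with $a\neq 0$, for which $\bar{z}=(0,0)$ and the projector onto $\operatorname{gra}f$ has a closed form; this choice and the function names are assumptions for illustration. In this linear setting the DR operator is itself linear, so the contraction factor equals $\kappa$ at every step, whereas for a general $f$ satisfying the hypotheses of Corollary 4.4 the result is only local.

```python
import math


def dr_step_linear(a, z):
    """One DR step for f(t) = a*t, using T_{A,B}(x, rho) = (0, rho) + P_B(x, -rho)."""
    x, rho = z
    # Closed-form projection of (x, -rho) onto the line {(t, a*t)}.
    t = (x - a * rho) / (1.0 + a * a)
    return (t, rho + a * t)


def contraction_ratios(a, z0, steps):
    """Ratios ||z_{n+1}|| / ||z_n|| along the DR sequence started at z0 (limit is 0)."""
    ratios = []
    z = z0
    for _ in range(steps):
        z_next = dr_step_linear(a, z)
        ratios.append(math.hypot(z_next[0], z_next[1]) / math.hypot(z[0], z[1]))
        z = z_next
    return ratios


def predicted_rate(a):
    """The rate kappa = 1 / sqrt(1 + |f'(0)|^2) for f(t) = a*t."""
    return 1.0 / math.sqrt(1.0 + a * a)
```

For example, with `a = 2.0` every entry of `contraction_ratios(2.0, (0.3, -0.1), 20)` should agree with `predicted_rate(2.0)`, that is, $1/\sqrt{5}$, up to rounding.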

Remark 4.5.

Let $\bar{z}:=(\bar{x},\bar{\rho})\in\operatorname{Fix}T_{A,B}$ and suppose that $f$ is twice strictly differentiable at $\bar{x}$ . By Lemma 3.6 (i), $(\bar{x},f(\bar{x}))=(\bar{x},0)\in P_{B}(\bar{x},-\bar{\rho})$ and so

[TABLE]

Differentiating the objective function twice gives

[TABLE]

If $\bar{\rho}\neq 0$ , then since $f(\bar{x})=0$ and $\nabla f(\bar{x})=0$ (Lemma 3.6 (iii)), the second order optimality condition yields

[TABLE]

Let us compare (103) to the assumption in Theorem 4.3. The latter assumed that $\bar{\rho}\nabla^{2}f(\bar{x})\succeq 0$ which is equivalent to

[TABLE]

a condition which is stronger than (103). Nevertheless, (104) holds as soon as one of the following holds: (i) $\bar{\rho}=0$ , (ii) $\bar{\rho}\geq 0$ and $f$ is convex, or (iii) $\bar{\rho}\leq 0$ and $f$ is concave. In fact, when $\bar{\rho}\nabla^{2}f(\bar{x})\succeq 0$ fails, unstable fixed points can arise as is the case in the following example.

Example 4.6 (An unstable fixed point).

Consider $X=\mathbb{R}$ and the function $f=\frac{1}{2}|\cdot|^{2}$ . Appealing to Theorem 4.3, we deduce that $T_{A,B}$ is single-valued and locally Lipschitz around the point $(0,0)\in A\cap B$ . However, $T_{A,B}$ is not locally Lipschitz around the point $\bar{z}=(0,-\frac{1}{2})\in\operatorname{Fix}T_{A,B}\smallsetminus(A\cap B)$ . To see this, let $\varepsilon\geq 0$ and consider the point $z_{\varepsilon}=(-\varepsilon,-\frac{1}{2})$ . We have from Lemma 3.3 (i) that

[TABLE]

Let $(y,f(y))\in P_{B}(-\varepsilon,\frac{1}{2})$ . Then (51) implies that

[TABLE]

To show that $\bar{z}$ is in fact a fixed point, setting $\varepsilon=0$ in (106), we deduce that $y=0$ or $y=-1$ . Further we observe that it cannot be the case that $y=-1$ since

[TABLE]

and so we conclude that $P_{B}(0,\frac{1}{2})=(0,0)$ , which together with (105) gives $T_{A,B}\bar{z}=T_{A,B}z_{0}=(0,-\frac{1}{2})=\bar{z}$ and hence $\bar{z}\in\operatorname{Fix}T_{A,B}$ .

Now, to see that $\bar{z}$ is not stable (in the sense of Lipschitz continuity of $T_{A,B}$), observe that the point $z_{\varepsilon}=(-\varepsilon,-\frac{1}{2})$ can be made arbitrarily close to $\bar{z}$ by choosing $\varepsilon>0$ sufficiently small. For all $\varepsilon\approx 0$, the optimality condition (106) has only one solution at $y\approx-1$. But this implies that

[TABLE]

and consequently that $\|T_{A,B}z_{\varepsilon}-T_{A,B}\bar{z}\|\approx 1$ while $\|z_{\varepsilon}-\bar{z}\|=\varepsilon$, thus $T_{A,B}$ is not locally Lipschitz around $\bar{z}$. Note that this does not contradict Theorem 4.3 since the condition $\bar{\rho}f^{\prime\prime}(\bar{x})\geq 0$ is not satisfied.∎

In a later example (Example LABEL:ex:p_norm), we show that in the setting of Example 4.6 the DRA is globally convergent.

Recall that a sequence $(z_{n})_{n\in{\mathbb{N}}}$ is said to converge $R$ -linearly to a point $\bar{z}$ if there exist constants $\eta\in\mathbb{R}_{+}$ and $\kappa\in\left[0,1\right[$ such that

[TABLE]

Clearly the notion of $Q$ -linear convergence implies $R$ -linear convergence.

To complement the results in this section, we deduce the following $R$-linear convergence result using existing results in the literature. Note that, in contrast to the setting of Theorem 4.3, the following result only applies to fixed points at which $\nabla f$ does not vanish.

Proposition 4.7 (Local $R$ -linear convergence of the DRA).

Let $\bar{z}:=(\bar{x},0)\in A\cap B$ , and suppose that $f$ is continuously differentiable around $\bar{x}$ with $\nabla f(\bar{x})\neq 0$ . Then there exists $\delta>0$ such that, for any starting point ${z_{0}\in\mathbb{B}\left({\bar{z}};{\delta}\right)}$ , the DR sequence $(z_{n})_{n\in{\mathbb{N}}}$ converges $R$ -linearly to a point in $A\cap B$ .

Proof.

By assumption, $f$ is continuously differentiable on some neighborhood $U$ of $\bar{x}$. Define a function $G\colon U\times\mathbb{R}\to\mathbb{R}\colon(x,\rho)\mapsto f(x)-\rho$ and let $D:=\{0\}\subseteq\mathbb{R}$. Then $U\times\mathbb{R}$ is a neighborhood of $(\bar{x},0)$, $G$ is a $C^{1}$ mapping, $D$ is a closed convex subset of $\mathbb{R}$,

[TABLE]

In view of [RW98, Definition 10.23(b)], $B$ is amenable at $(\bar{x},0)$ and hence superregular at $(\bar{x},0)$ by [LLM09, Proposition 4.8]. Moreover, the normal cones to $A$ and $B$ can be described, respectively, by [Mor06, Proposition 1.2] and [RW98, Example 6.8] as

[TABLE]

Since it is assumed that $\nabla f(\bar{x})\neq 0$, it follows that $N_{A}(\bar{x},0)\cap(-N_{B}(\bar{x},0))=\{0\}$, that is to say, $\{A,B\}$ is strongly regular at $(\bar{x},0)$. The assumptions of [Pha16, Theorem 4.3] (or [DP16, Corollary 5.22]) are thus satisfied, from which the result follows. ∎

To conclude this section, we note that Theorem 4.3 applies in situations where Proposition 4.7 does not. In a subsequent section, we shall revisit the following example.

Example 4.8 (A stable fixed point).

Consider the function $f=\frac{1}{2}\|\cdot\|^{2}$ and the point $\{0\}=A\cap B\subseteq\operatorname{Fix}T_{A,B}\subseteq X\times\mathbb{R}$ . Then $f$ does not satisfy the assumptions of Proposition 4.7 at $(\bar{x},\bar{\rho})=(0,0)$ because $0=f(0)=\nabla f(0)$ . Nevertheless, as $f$ is twice continuously differentiable at $\bar{x}=0$ with $\nabla^{2}f=\operatorname{Id}$ , Theorem 4.3 still applies and shows that the DR operator is single-valued and Lipschitz continuous around $(0,0)$ .∎

5 A Lyapunov-type approach to convergence

In this section, we prove convergence of the DRA assuming the existence of a Lyapunov-type function which possesses the following properties on a subset of $X\times\mathbb{R}$. In fact, our framework also provides a procedure for the construction of such a function. In practice, this means that the candidate Lyapunov-type function can be concretely constructed and its properties easily checked.

Assumption 5.1.

There exist a proper convex function $F\colon X\to\left(-\infty,+\infty\right]$ and a nonempty convex subset $D$ of $\operatorname{dom}F$ such that the following hold:

(i)

The subdifferential of $F$ satisfies

$(\forall x\in D)\quad\partial F(x)\supseteq\begin{cases}\left\{{\frac{f(x)}{\|x^{*}\|^{2}}x^{*}}~{}\Big{|}~{}{x^{*}\in\partial^{0}\!f(x)}\right\}&\text{~{}if~{}}0\notin\partial^{0}\!f(x),\\ \{0\}&\text{~{}if~{}}f(x)=0.\end{cases}$

(112)

(ii)

$F$ is coercive.

(iii)

$F$ is continuous on $D\cap f^{-1}(0)$.

The intuition behind Assumption 5.1, specifically (112), is similar to that proposed in [Ben15]. One seeks a function $V\colon D\times\mathbb{R}\to\left[-\infty,+\infty\right]$ of the form

[TABLE]

such that for every $z:=(x,\rho)\in D\times\mathbb{R}$, its level set at the point $z_{+}$ is tangent to $z-z_{+}$, where $z_{+}\in T_{A,B}z$. To do so, we construct an $F$ satisfying Assumption 5.1 by anti-subdifferentiating (112). An illustration of such a function is given in LABEL:fig:level_set. In particular, if the function $f$ is strictly differentiable at $x\not\in(\nabla f)^{-1}(0)$, then (112) becomes

[TABLE]

and, when $\dim X=1$, the expression simplifies further to $F^{\prime}=f/f^{\prime}$.
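As a small worked instance of the one-dimensional relation $F^{\prime}=f/f^{\prime}$, consider the test function $f(t)=e^{t}-1$ (an assumption chosen for illustration; it does not come from the paper). Then $f/f^{\prime}=1-e^{-t}$, and one antiderivative is $F(t)=t+e^{-t}$, which is convex and coercive on $\mathbb{R}$ and satisfies $F^{\prime}(0)=0$ at the zero of $f$. The sketch below checks the derivative relation by central differences and tests midpoint convexity on a sample of points; all names are hypothetical.

```python
import math


def f(t):
    """Illustrative test function with a single zero at t = 0."""
    return math.exp(t) - 1.0


def df(t):
    """Derivative of f; it never vanishes, so f / f' is defined everywhere."""
    return math.exp(t)


def F(t):
    """Candidate antiderivative of f / f' = 1 - exp(-t)."""
    return t + math.exp(-t)


def max_derivative_gap(points, h=1e-5):
    """Largest gap between a central-difference estimate of F' and f / f'."""
    gap = 0.0
    for t in points:
        numeric = (F(t + h) - F(t - h)) / (2.0 * h)
        gap = max(gap, abs(numeric - f(t) / df(t)))
    return gap


def midpoint_convex(points):
    """Check F((s + t) / 2) <= (F(s) + F(t)) / 2 for all sampled pairs."""
    return all(F(0.5 * (s + t)) <= 0.5 * (F(s) + F(t)) + 1e-12
               for s in points for t in points)
```

A sampled check of this kind does not prove convexity or coercivity; here both follow directly from $F^{\prime\prime}(t)=e^{-t}>0$ and from $F(t)\to+\infty$ as $|t|\to\infty$.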

The two piecewise-defined cases in (112) are consistent in the sense that, if $0\not\in\partial^{0}\!f(x)$ and $f(x)=0$, then both cases yield $0\in\partial F(x)$. The inclusion of the “ $f(x)=0$ ” case allows our analysis to include situations in which the “ $0\not\in\partial^{0}\!f(x)$ ” case has a removable discontinuity.
