A proximal point algorithm revisited and extended

Gheorghe Morosanu

arXiv:1703.04051·math.OC·March 14, 2017·J. Optim. Theory Appl.


TL;DR

This paper revisits and extends a proximal point algorithm for finding zeros of maximal monotone operators, providing convergence analysis, practical implementation guidance, and simulation results to demonstrate its effectiveness.

Contribution

It introduces a more general proximal point algorithm with convergence guarantees and practical considerations, extending previous methods in the field.

Findings

01

The algorithm converges strongly under specified conditions.

02

It can approximate minimizers of convex functionals.

03

Simulations confirm practical applicability.

Abstract

This Note is inspired by the recent paper by Djafari Rouhani and Moradi [J. Optim. Theory Appl. 172 (2017) 222-235], where a proximal point algorithm proposed by Boikanyo and Moroşanu [Optim. Lett. 7 (2013) 415-420] is discussed. We start with a brief history of the subject and then propose and analyse the following more general algorithm for approximating the zeroes of a maximal monotone operator $A$ in a real Hilbert space $H$: $x_{n+1} = (I + \beta_{n} A)^{-1}(u_{n} + \alpha_{n}(x_{n} + e_{n})),\ n \geq 0,$ where $x_{0} \in H$ is a given starting point, $u_{n} \to u$ is a given sequence in $H$, $\mathbb{R} \ni \alpha_{n} \to 0$, and $(e_{n})$ is the error sequence satisfying $\alpha_{n} e_{n} \to 0$. Besides the main result on the strong convergence of $(x_{n})$, we discuss some particular cases, including the approximation of minimizers of convex functionals, explain how to use our…

Equations

$x_{n+1} = (I + \beta_{n} A)^{-1}(u_{n} + \alpha_{n}(x_{n} + e_{n})), \quad n \geq 0,$

$(x_{1} - x_{2}, y_{1} - y_{2}) \geq 0 \quad \forall [x_{1}, y_{1}], [x_{2}, y_{2}] \in G(A).$

$\partial\phi(x) = \{y \in H;\ \phi(x) - \phi(v) \leq (y, x - v)\ \ \forall v \in D(\phi)\}$

$\text{Find } x \in D(A) \text{ such that } 0 \in Ax.$

$x_{n+1} = J_{\beta_{n}}(x_{n} + e_{n}), \quad n \geq 0,$

$x_{n+1} = J_{\beta_{n}}(\lambda_{n} u + (1 - \lambda_{n}) x_{n} + e_{n}), \quad n \geq 0,$

$x_{n+1} = J_{\beta_{n}}(\lambda_{n} u + (1 - \lambda_{n})(x_{n} + e_{n})), \quad n \geq 0.$

$x_{n+1} = J_{\beta_{n}}(u_{n} + \alpha_{n}(x_{n} + e_{n})), \quad n \geq 0,$

$Ax_{n} \ni z_{n} := \frac{1}{\beta_{n-1}}(u_{n-1} + \alpha_{n-1} x_{n-1} + \alpha_{n-1} e_{n-1} - x_{n}) \to 0.$

$(v - x_{n}, w - z_{n}) \geq 0 \quad \forall [v, w] \in G(A),$

$(v - p, w - 0) \geq 0 \quad \forall [v, w] \in G(A),$

$[p, 0] \in G(A) \Rightarrow p \in D(A),\ 0 \in Ap.$

$\|x_{n+1} - p\|$

$\|x_{n+1}\| \leq c\|x_{n}\| + c_{1} \quad \forall n \geq 0,$

$\|x_{n}\|$

$\|x_{n+1} - P_{F} u\|$

$x_{n+1} = J_{\beta_{n}} u_{n} \quad \forall n \geq 0,$

$\|x_{n+1} - P_{F} u\|$

$(I + \beta_{0} A) x = u_{0} + \alpha_{0} x_{0}$

$(I + \beta_{1} A) x = u_{1} + \alpha_{1}(x_{1} + e_{1})$

$x_{2} = (I + \beta_{1} A)^{-1}(u_{1} + \alpha_{1}(x_{1} + e_{1})),$

$z_{n+1} = (I + \beta_{n} A)^{-1}(u_{n} + \alpha_{n} z_{n}) + e_{n+1} \quad \forall n \geq 0,$


Full text

Author’s address:

Central European University

Department of Mathematics and its Applications

Nador u. 9

1051 Budapest, Hungary

A proximal point algorithm revisited and extended

Gheorghe Moroşanu


Abstract

This Note is inspired by the recent paper by Djafari Rouhani and Moradi [J. Optim. Theory Appl. 172 (2017) 222-235], where a proximal point algorithm proposed by Boikanyo and Moroşanu [Optim. Lett. 7 (2013) 415-420] is discussed. We start with a brief history of the subject and then propose and analyse the following more general algorithm for approximating the zeroes of a maximal monotone operator $A$ in a real Hilbert space $H$:

$$x_{n+1}=(I+\beta_{n}A)^{-1}(u_{n}+\alpha_{n}(x_{n}+e_{n})),\quad n\geq 0,$$

where $x_{0}\in H$ is a given starting point, $u_{n}\rightarrow u$ is a given sequence in $H$, $\mathbb{R}\ni\alpha_{n}\rightarrow 0$, and $(e_{n})$ is the error sequence satisfying $\alpha_{n}e_{n}\rightarrow 0$. Besides the main result on the strong convergence of $(x_{n})$, we discuss some particular cases, including the approximation of minimizers of convex functionals, explain how to use our algorithm in practice, and present some simulations to illustrate the applicability of our algorithm.

Keywords:

Maximal monotone operator · Proximal point algorithm · Convex function · Strong convergence

MSC:

47J25 · 47H05 · 90C25 · 90C90

1 Introduction

Let $H$ be a real Hilbert space with scalar product $(\cdot,\cdot)$ and norm $\|\cdot\|$ . An operator $A:D(A)\subset H\to H$ (possibly set-valued) is said to be monotone if its graph $G(A)=\{[x,y]\in D(A)\times H;\,y\in Ax\}$ is a monotone subset of $H\times H$ , i.e.,

$$(x_{1}-x_{2},y_{1}-y_{2})\geq 0\quad\forall[x_{1},y_{1}],[x_{2},y_{2}]\in G(A).$$

If in addition $G(A)$ is not properly contained in the graph of any other monotone operator in $H$, then $A$ is called maximal monotone. It is well known that a monotone operator $A$ is maximal monotone if and only if the range of $I+\lambda A$ is all of $H$ for all $\lambda>0$ (equivalently, for some $\lambda>0$). In this case the so-called resolvent operator $J_{\lambda}=(I+\lambda A)^{-1}$ is everywhere defined, single-valued and nonexpansive (i.e., Lipschitz with constant $L=1$). If $\phi:H\to(-\infty,+\infty]$ is a proper (i.e., not identically $+\infty$), convex, lower semicontinuous function, then the subdifferential operator defined by

$$\partial\phi(x)=\{y\in H;\ \phi(x)-\phi(v)\leq(y,x-v)\ \ \forall v\in D(\phi)\}$$

is maximal monotone. For more information on monotone operators and convex functions see [1] and [2].

We are interested in solving the problem

$$\text{Find } x\in D(A)\ \text{such that}\ 0\in Ax.\tag{1}$$

Denote by $F$ the solution set of (1), i.e., $F=A^{-1}0$. One of the most important iterative methods for finding approximate solutions of (1) is the proximal point algorithm (PPA), which was introduced by Martinet [3] for a particular case of $A$ and then extended by Rockafellar [4] to a general maximal monotone operator. For each $x_{0}\in H$ the PPA generates the sequence $(x_{n})$ as follows:

$$x_{n+1}=J_{\beta_{n}}(x_{n}+e_{n}),\quad n\geq 0,\tag{2}$$

where $\beta_{n}\in(0,\infty)$ for all $n\geq 0$ and $(e_{n})$ is the sequence of computational errors. Unfortunately, under the suitable conditions $\liminf\beta_{n}>0$, $\sum_{n=0}^{\infty}\|e_{n}\|<\infty$, the sequence $(x_{n})$ converges in general only weakly (to points of $F$), even in the particular case when $A$ is a subdifferential operator (see Güler [5]). Subsequently, much work has been dedicated to modifying the PPA to obtain algorithms that generate strongly convergent sequences. Recall that, inspired by Lehdili and Moudafi’s prox-Tikhonov method (see [6]), Xu [7] considered the following iterative scheme

$$x_{n+1}=J_{\beta_{n}}(\lambda_{n}u+(1-\lambda_{n})x_{n}+e_{n}),\quad n\geq 0,\tag{3}$$

where $\lambda_{n}\in(0,1)$, $\beta_{n}\in(0,\infty)$ for all $n\geq 0$, $\lambda_{n}\rightarrow 0$, $\sum_{n=1}^{\infty}\lambda_{n}=\infty$, and proved that, under some additional conditions, $(x_{n})$ converges strongly to $P_{F}u$, the metric projection of $u$ onto $F$ (which was assumed to be nonempty). The best convergence result on (3) was later reported by Wang and Cui [8]. Specifically, they proved that $(x_{n})$ generated by (3) converges strongly to $P_{F}u$ under the following conditions: $F\neq\emptyset$, $\lambda_{n}\in(0,1)$, $\beta_{n}\in(0,\infty)$ for all $n\geq 0$, $\liminf\beta_{n}>0$, $\lambda_{n}\rightarrow 0$, $\sum_{n=0}^{\infty}\lambda_{n}=\infty$, and either $\sum_{n=0}^{\infty}\|e_{n}\|<\infty$ or $\lim\|e_{n}\|/\lambda_{n}=0$. In fact, under these conditions, (3) is equivalent to

$$x_{n+1}=J_{\beta_{n}}(\lambda_{n}u+(1-\lambda_{n})(x_{n}+e_{n})),\quad n\geq 0.\tag{4}$$

In the paper by Boikanyo and Moroşanu [Optim. Lett. 7 (2013) 415-420], a strong convergence result for $(x_{n})$ generated by (4) was reported for the alternative framework: $F\neq\emptyset$, $\lambda_{n}\in(0,1)$, $\beta_{n}\in(0,\infty)$ for all $n\geq 0$, $\lambda_{n}\rightarrow 1$, $\beta_{n}\rightarrow\infty$, and $(e_{n})$ bounded. The same framework is reconsidered in a recent paper by Djafari Rouhani and Moradi [J. Optim. Theory Appl. 172 (2017) 222-235]. They use the condition $(\lambda_{n}-1)e_{n}\rightarrow 0$ (instead of the boundedness of $(e_{n})$). In fact, this condition is also easily visible from our approach in the Boikanyo and Moroşanu paper.

The main observation leading to this Note is the following: while the convex combination $\lambda_{n}u+(1-\lambda_{n})(x_{n}+e_{n})$ in (4) is relevant when $\lambda_{n}\rightarrow 0$, this is not the case if $\lambda_{n}\rightarrow 1$. Indeed, we can consider the following more general algorithm

$$x_{n+1}=J_{\beta_{n}}(u_{n}+\alpha_{n}(x_{n}+e_{n})),\quad n\geq 0,\tag{5}$$

where

$(H)$ $A:D(A)\subset H\to H$ is a maximal monotone operator with $A^{-1}0=:F\neq\emptyset$; $\beta_{n}\in(0,\infty)$, $\alpha_{n}\in\mathbb{R}$ for all $n\geq 0$, $\beta_{n}\rightarrow\infty$, $\alpha_{n}\rightarrow 0$; $\alpha_{n}e_{n}\rightarrow 0$; $u_{n}\rightarrow u$.

Our main result (Theorem 2.1) states that under hypotheses $(H)$, for every $x_{0}\in H$, the sequence $(x_{n})$ generated by (5) converges strongly to $P_{F}u$. By choosing $u_{n}=\lambda_{n}u$ and $\alpha_{n}=1-\lambda_{n}$, $n\geq 0$, with $\lambda_{n}\rightarrow 1$, we reobtain Theorem 1 of Boikanyo and Moroşanu [Optim. Lett. 7 (2013) 415-420] and Theorem 3.2 of Djafari Rouhani and Moradi [J. Optim. Theory Appl. 172 (2017) 222-235]. In addition, if $\alpha_{n}=0$ for all $n\geq 0$ (or for all $n\geq N$), then (5) defines just a simple sequence (not an iterative method, since $x_{n+1}$ is no longer dependent on $x_{n}$) which approximates $P_{F}u$, and in this case Theorem 3.4 of Djafari Rouhani and Moradi is reobtained as a simple particular case (with $u_{n}:=\lambda_{n}u+(1-\lambda_{n})(y_{0}+e_{n})$, $n\geq 0$).
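As a rough numerical illustration of the scheme $x_{n+1}=(I+\beta_{n}A)^{-1}(u_{n}+\alpha_{n}(x_{n}+e_{n}))$, the following Python sketch runs it in $\mathbb{R}^{2}$. Everything concrete here is an assumption made for the example and is not taken from the paper: the operator $A=\mathrm{diag}(1,0)$ (so that $F$ is the second coordinate axis and $P_{F}u=(0,u_{2})$), the sequences $\beta_{n}=n+1$, $\alpha_{n}=1/(n+2)$, $u_{n}=u+(1,1)/(n+1)$, the bounded errors $e_{n}$, and the helper names `resolvent` and `generalized_ppa`.

```python
import numpy as np

def resolvent(A, beta, v):
    """Return (I + beta*A)^{-1} v for a symmetric positive semidefinite matrix A."""
    return np.linalg.solve(np.eye(A.shape[0]) + beta * A, v)

def generalized_ppa(A, x0, u, n_iter=2000):
    """Iterate x_{n+1} = (I + beta_n A)^{-1}(u_n + alpha_n (x_n + e_n))."""
    x = np.asarray(x0, dtype=float)
    for n in range(n_iter):
        beta_n = float(n + 1)                       # beta_n -> infinity
        alpha_n = 1.0 / (n + 2)                     # alpha_n -> 0
        u_n = u + np.ones_like(u) / (n + 1)         # u_n -> u
        e_n = 0.1 * np.cos(n + np.arange(x.size))   # bounded errors, so alpha_n e_n -> 0
        x = resolvent(A, beta_n, u_n + alpha_n * (x + e_n))
    return x

A = np.diag([1.0, 0.0])             # positive semidefinite, hence monotone; F = {(0, t)}
u = np.array([3.0, -2.0])
projection = np.array([0.0, u[1]])  # metric projection of u onto F
x = generalized_ppa(A, x0=np.array([10.0, 10.0]), u=u)
gap = np.linalg.norm(x - projection)
print(gap)
```

With these illustrative choices the distance to $P_{F}u$ shrinks roughly like $1/n$, which reflects the chosen rates of $\beta_{n}$, $\alpha_{n}$ and $u_{n}$ rather than any rate claimed in the text.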

2 Main Result

Since we want to show that the sequences generated by (5) are convergent, we begin this section with a preliminary result stating that a necessary condition is $F=A^{-1}0\neq\emptyset$ .

Lemma 1

Assume that $A:D(A)\subset H\to H$ is a maximal monotone operator, $\beta_{n}\rightarrow\infty$ , $(u_{n})_{n\geq 0}$ is a bounded sequence, $|\alpha_{n}|\leq c\ \forall n\geq 0$ for some $c<1$ , and $(\alpha_{n}e_{n})$ is bounded. Then the sequence $(x_{n})$ generated by (5) is bounded for all $x_{0}\in H$ (equivalently, for some $x_{0}\in H$ ) if and only if $F\neq\emptyset$ .

Proof Assume that for some $x_{0}\in H$ the sequence $(x_{n})$ generated by (5) is bounded. We have

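$$Ax_{n}\ni z_{n}:=\frac{1}{\beta_{n-1}}\big(u_{n-1}+\alpha_{n-1}x_{n-1}+\alpha_{n-1}e_{n-1}-x_{n}\big)\rightarrow 0.$$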

Therefore taking the limit in the obvious inequality

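$$(v-x_{n},w-z_{n})\geq 0\quad\forall[v,w]\in G(A),$$

we infer

$$(v-p,w-0)\geq 0\quad\forall[v,w]\in G(A),$$

where $p$ is a weak cluster point of $(x_{n})$. By the maximality of $A$ we obtain

$$[p,0]\in G(A)\ \Rightarrow\ p\in D(A),\ 0\in Ap.$$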

Conversely, assume $F\neq\emptyset$ . Let $p\in F$ and $x_{0}\in H$ be arbitrary but fixed points. Since the resolvent operator is nonexpansive we have

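$$\|x_{n+1}-p\|=\|J_{\beta_{n}}(u_{n}+\alpha_{n}(x_{n}+e_{n}))-J_{\beta_{n}}p\|\leq\|u_{n}+\alpha_{n}(x_{n}+e_{n})-p\|\leq|\alpha_{n}|\,\|x_{n}\|+\|u_{n}\|+\|\alpha_{n}e_{n}\|+\|p\|.$$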

Therefore

$$\|x_{n+1}\|\leq c\|x_{n}\|+c_{1}\quad\forall n\geq 0,\tag{6}$$

where $c_{1}$ is a positive constant. From (6) we derive by induction

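$$\|x_{n}\|\leq c^{n}\|x_{0}\|+c_{1}(1+c+\dots+c^{n-1})\leq\|x_{0}\|+\frac{c_{1}}{1-c}\quad\forall n\geq 1,$$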

which shows that $(x_{n})$ is bounded. ∎

Before stating our main theorem let us recall the following result, which was proved independently by Bruck and by Moroşanu.

Lemma 2

Let $A:D(A)\subset H\to H$ be a maximal monotone operator with $F=:A^{-1}0$ nonempty. Then for every $u\in H$ , $(I+\lambda A)^{-1}u\rightarrow P_{F}u$ as $\lambda\rightarrow\infty$ , where $P_{F}u$ denotes the metric projection of $u$ onto $F$ .
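The limit $(I+\lambda A)^{-1}u\rightarrow P_{F}u$ can be checked numerically on a toy operator. The sketch below assumes $A=I-P_{C}$ with $C=[-1,1]^{3}$ (an illustrative choice, not from the paper), for which $F=C$ and the resolvent has the closed form $J_{\lambda}u=(u+\lambda P_{C}u)/(1+\lambda)$; the function names `project_box` and `resolvent` are hypothetical.

```python
import numpy as np

def project_box(v, lo=-1.0, hi=1.0):
    """Metric projection onto the box C = [lo, hi]^d."""
    return np.clip(v, lo, hi)

def resolvent(v, lam):
    """Solve x + lam*(x - P_C x) = v; the solution lies on the segment [P_C v, v]."""
    return (v + lam * project_box(v)) / (1.0 + lam)

u = np.array([3.0, 0.5, -4.0])
lams = [1.0, 10.0, 100.0, 1000.0]
gaps = [float(np.linalg.norm(resolvent(u, lam) - project_box(u))) for lam in lams]

# Sanity check that the closed form really inverts I + lam*A for lam = 10.
x = resolvent(u, 10.0)
residual = float(np.linalg.norm(x + 10.0 * (x - project_box(x)) - u))
print(gaps, residual)
```

For this operator the gap equals $\mathrm{dist}(u,C)/(1+\lambda)$, so it decays like $1/\lambda$; other operators may behave differently.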

Now let us state the main result of this Note.

Theorem 2.1

Assume $(H)$ (see the previous section). Then for all $x_{0}\in H$ the sequence $(x_{n})$ generated by algorithm (5) converges strongly to $P_{F}u$ (the metric projection of $u$ onto $F=A^{-1}0$ ).

Proof Let $x_{0}\in H$ be an arbitrary but fixed point. By Lemma 1 the corresponding sequence $(x_{n})$ generated by (5) is bounded (since there exists a natural number $N$ such that $|\alpha_{n}|\leq c<1$ for $n\geq N$, Lemma 1 is applicable with $x_{0}:=x_{N}$). Thus we have

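$$\|x_{n+1}-P_{F}u\|\leq\|J_{\beta_{n}}(u_{n}+\alpha_{n}(x_{n}+e_{n}))-J_{\beta_{n}}u\|+\|J_{\beta_{n}}u-P_{F}u\|\leq\|u_{n}-u\|+|\alpha_{n}|\,\|x_{n}\|+\|\alpha_{n}e_{n}\|+\|J_{\beta_{n}}u-P_{F}u\|.$$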

So by $(H)$ and Lemma 2 we conclude that $\|x_{n}-P_{F}u\|\rightarrow 0$ as $n\rightarrow\infty$ . ∎

3 Concluding Comments

If $A=\partial\phi$, where $\phi:H\to(-\infty,\infty]$ is a proper, convex, lower semicontinuous function, the algorithm (5) serves as a method for approximating minimizers of $\phi$ (assuming that the set of minimizers of $\phi$ is nonempty), since in this case $p\in A^{-1}0$ if and only if $p$ is a minimizer of $\phi$.
The error sequence $(e_{n})$ is allowed to be bounded, as usual in numerical analysis, or even unbounded with $\alpha_{n}\|e_{n}\|\rightarrow 0$.
Theorem 2.1 is a generalization of both Theorem 3.2 of Djafari Rouhani and Moradi [J. Optim. Theory Appl. 172 (2017) 222-235] and Theorem 1 of Boikanyo and Moroşanu [Optim. Lett. 7 (2013) 415-420].
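To make the convex case concrete, here is a minimal sketch for $A=\partial\phi$ with the assumed functional $\phi(x)=\sum_{i}\max(|x_{i}|-1,0)$ on $\mathbb{R}^{3}$, whose minimizers form the box $[-1,1]^{3}$. The resolvent $(I+\beta\partial\phi)^{-1}$ is then the proximal map of $\beta\phi$, which for this $\phi$ soft-thresholds the part of the argument lying outside the box. The functional, the parameter choices $\beta_{n}=n+1$, $\alpha_{n}=1/(n+1)$, $u_{n}=u$, $e_{n}=0$, and the names `prox_phi` and `minimize_phi` are illustrative assumptions.

```python
import numpy as np

def prox_phi(v, beta):
    """Proximal map of beta*phi, phi(x) = sum_i max(|x_i| - 1, 0)."""
    inside = np.clip(v, -1.0, 1.0)   # part of v inside the box of minimizers
    excess = v - inside              # signed overshoot outside the box
    return inside + np.sign(excess) * np.maximum(np.abs(excess) - beta, 0.0)

def phi(x):
    return float(np.sum(np.maximum(np.abs(x) - 1.0, 0.0)))

def minimize_phi(u, x0, n_iter=200):
    """x_{n+1} = prox_{beta_n phi}(u + alpha_n x_n), error-free and with u_n = u."""
    x = np.asarray(x0, dtype=float)
    for n in range(n_iter):
        beta_n = float(n + 1)        # beta_n -> infinity
        alpha_n = 1.0 / (n + 1)      # alpha_n -> 0
        x = prox_phi(u + alpha_n * x, beta_n)
    return x

u = np.array([3.0, 0.5, -4.0])
x = minimize_phi(u, x0=np.array([10.0, 10.0, 10.0]))
target = np.clip(u, -1.0, 1.0)       # metric projection of u onto the minimizer set
print(x, phi(x), np.linalg.norm(x - target))
```

The iterates approach the minimizer of $\phi$ nearest to $u$, not an arbitrary minimizer, which is the point of the strong convergence to $P_{F}u$.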

If $\alpha_{n}=0$ for all $n\geq 0$ (or for all $n\geq N$), then (5) defines just a simple sequence, not a real iterative method, since $x_{n+1}$ does not depend on $x_{n}$. In this case we have

$$x_{n+1}=J_{\beta_{n}}u_{n}\quad\forall n\geq 0,\tag{7}$$

or for all $n\geq N$ . In fact, according to Lemma 2, we have

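$$\|x_{n+1}-P_{F}u\|\leq\|J_{\beta_{n}}u_{n}-J_{\beta_{n}}u\|+\|J_{\beta_{n}}u-P_{F}u\|\leq\|u_{n}-u\|+\|J_{\beta_{n}}u-P_{F}u\|\rightarrow 0.$$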

Note that the second algorithm introduced and studied by Djafari Rouhani and Moradi [J. Optim. Theory Appl. 172 (2017) 222-235, p. 228] is in fact a sequence of the form (7) with $u_{n}=\lambda_{n}u+(1-\lambda_{n})(y_{0}+e_{n})$, $n\geq 0$.

Let us explain how the algorithm (5) works when performing simulations. Assume conditions $(H)$ are fulfilled. In addition, for the sake of simplicity, $A$ is assumed to be single-valued. For a given $x_{0}\in H$ we compute $x_{1}$ by solving for $x$ the equation

$$(I+\beta_{0}A)x=u_{0}+\alpha_{0}x_{0}$$

and get $x_{1}+e_{1}$ instead of the exact solution $x=x_{1}$ . We do not have any error for $x_{0}$ (i.e., $e_{0}=0$ ) but we have a computational error $e_{1}$ for $x_{1}$ . Next we solve for $x$ the equation

$$(I+\beta_{1}A)x=u_{1}+\alpha_{1}(x_{1}+e_{1})$$

and get $x_{2}+e_{2}$ instead of the exact solution

$$x_{2}=(I+\beta_{1}A)^{-1}(u_{1}+\alpha_{1}(x_{1}+e_{1})),$$

and so on. Thus using the computer we obtain $z_{n}=x_{n}+e_{n}$ satisfying

$$z_{n+1}=(I+\beta_{n}A)^{-1}(u_{n}+\alpha_{n}z_{n})+e_{n+1}\quad\forall n\geq 0,$$

where $z_{0}=x_{0}$ (since $e_{0}=0$). If $\|e_{n}\|\leq\varepsilon$ for all $n\geq 0$, then $\|x_{n}-z_{n}\|\leq\varepsilon$ for all $n\geq 0$, i.e., $z_{n}$ approximates $P_{F}u$ for $n$ large enough.
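The bookkeeping between the exact iterates $x_{n}$ and the computed values $z_{n}=x_{n}+e_{n}$ can be sketched as follows. The example assumes the toy operator $Ax=x-2$ on the real line (so $F=\{2\}$ and $(I+\beta A)^{-1}v=(v+2\beta)/(1+\beta)$), a constant sequence $u_{n}=u$, and artificial errors $e_{n}=\varepsilon\cos n$ for $n\geq 1$; none of these choices, nor the names `resolvent` and `simulate`, come from the paper.

```python
import math

def resolvent(v, beta):
    """(I + beta*A)^{-1} v for the toy operator A x = x - 2 on the real line."""
    return (v + 2.0 * beta) / (1.0 + beta)

def simulate(x0, u, eps, n_iter):
    """Track the exact iterate x_n and the computed value z_n = x_n + e_n side by side."""
    def error(n):
        return 0.0 if n == 0 else eps * math.cos(n)   # e_0 = 0, |e_n| <= eps

    x, z = x0, x0                                     # z_0 = x_0 because e_0 = 0
    max_gap = 0.0
    for n in range(n_iter):
        beta_n, alpha_n = float(n + 1), 1.0 / (n + 1)
        x = resolvent(u + alpha_n * (x + error(n)), beta_n)     # exact x_{n+1}
        z = resolvent(u + alpha_n * z, beta_n) + error(n + 1)   # computed z_{n+1}
        max_gap = max(max_gap, abs(x - z))
    return x, z, max_gap

eps = 1e-3
x, z, max_gap = simulate(x0=0.0, u=5.0, eps=eps, n_iter=1000)
print(x, z, max_gap)
```

In this sketch the two recursions stay within $\varepsilon$ of each other at every step, and both end up close to the zero of $A$.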

4 Simulations

Intuitively, in order to achieve fast convergence of the sequence $(x_{n})$ generated by algorithm (5) to $P_{F}u$, we need to choose a point $u$ as close as possible to $F=A^{-1}0$ and sequences $(\beta_{n})$ and $(\alpha_{n})$ that converge rapidly to $\infty$ and to $0$, respectively.
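This intuition can be probed with a small experiment. The sketch below again assumes the toy operator $Ax=x-2$ on the real line, with $u_{n}=u$ and no errors, and compares polynomially and geometrically growing $\beta_{n}$ (with $\alpha_{n}=1/\beta_{n}$) as well as a point $u$ closer to $F=\{2\}$. The operator, the parameter sequences and the name `final_error` are assumptions chosen for illustration only.

```python
def resolvent(v, beta):
    """(I + beta*A)^{-1} v for the toy operator A x = x - 2 on the real line."""
    return (v + 2.0 * beta) / (1.0 + beta)

def final_error(beta, alpha, u=5.0, x0=0.0, n_iter=20):
    """Distance from x_n to the zero of A after n_iter error-free steps."""
    x = x0
    for n in range(n_iter):
        x = resolvent(u + alpha(n) * x, beta(n))
    return abs(x - 2.0)

# beta_n = n + 1, alpha_n = 1/(n + 1): slow growth and slow decay.
slow_err = final_error(beta=lambda n: n + 1.0, alpha=lambda n: 1.0 / (n + 1))
# beta_n = 2^n, alpha_n = 2^(-n): geometric growth and decay.
fast_err = final_error(beta=lambda n: 2.0 ** n, alpha=lambda n: 2.0 ** (-n))
# Same slow sequences, but with u chosen closer to F = {2}.
near_err = final_error(beta=lambda n: n + 1.0, alpha=lambda n: 1.0 / (n + 1), u=2.5)
print(slow_err, fast_err, near_err)
```

On this example the geometric sequences give a much smaller error after the same number of steps, and moving $u$ toward $F$ also helps; whether this carries over to a given operator has to be checked case by case.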

etc., etc., … to be continued.

Bibliography

[1] Brezis, H.: Opérateurs Maximaux Monotones et Semi-Groupes de Contractions dans les Espaces de Hilbert. North Holland Math. Studies, Vol. 5. North Holland, Amsterdam (1973)
[2] Moroşanu, G.: Nonlinear Evolution Equations and Applications. Reidel, Dordrecht (1988)
[3] Martinet, B.: Régularisation d’inéquations variationnelles par approximations successives. Rev. Française Informat. Recherche Opérationnelle 4 (Ser. R-3), 154-158 (1970)
[4] Rockafellar, R.T.: Monotone operators and the proximal point algorithm. SIAM J. Control Optim. 14, 877-898 (1976)
[5] Güler, O.: On the convergence of the proximal point algorithm for convex minimization. SIAM J. Control Optim. 29, 403-419 (1991)
[6] Lehdili, N., Moudafi, A.: Combining the proximal algorithm and Tikhonov regularization. Optimization 37, 239-252 (1996)
[7] Xu, H.K.: A regularization method for the proximal point algorithm. J. Global Optim. 36, 115-125 (2006)
[8] Wang, F., Cui, H.: On the contraction-proximal point algorithms with multi-parameters. J. Global Optim. 54, 485-491 (2012)