Pointwise and ergodic convergence rates of a variable metric proximal ADMM

Max L.N. Goncalves; Jefferson G. Melo; M. Marques Alves

arXiv:1702.06626·math.OC·May 5, 2017

TL;DR

This paper establishes the first global pointwise and ergodic convergence rates for a variable metric proximal ADMM, advancing understanding of its efficiency in solving linearly constrained convex optimization problems.

Contribution

It introduces a novel convergence analysis for VM-PADMM, including nonasymptotic rates, by linking it to a new VM-HPE framework for monotone inclusions.

Findings

01 Achieves an $\mathcal{O}(1/\sqrt{k})$ pointwise convergence rate.

02 Achieves an $\mathcal{O}(1/k)$ ergodic convergence rate.

03 First to establish these rates for the VM-PADMM and the VM-HPE framework.

Abstract

In this paper, we obtain global $\mathcal{O}(1/\sqrt{k})$ pointwise and $\mathcal{O}(1/k)$ ergodic convergence rates for a variable metric proximal alternating direction method of multipliers (VM-PADMM) for solving linearly constrained convex optimization problems. The VM-PADMM can be seen as a class of ADMM variants, allowing the use of degenerate metrics (defined by noninvertible linear operators). We first propose and study nonasymptotic convergence rates of a variable metric hybrid proximal extragradient (VM-HPE) framework for solving monotone inclusions. Then, the above-mentioned convergence rates for the VM-PADMM are obtained essentially by showing that it falls within the latter framework. To the best of our knowledge, this is the first time that global pointwise (resp. pointwise and ergodic) convergence rates are obtained for the VM-PADMM (resp. VM-HPE framework).

Equations

$\displaystyle\text{minimize}\;\;f(x)+g(y)\quad\text{subject to}\;\;Ax+By=b,$

$\displaystyle r_{x}\in\partial_{\varepsilon_{x}}f(x)-A^{*}\tilde{\gamma},\quad r_{y}\in\partial_{\varepsilon_{y}}g(y)-B^{*}\tilde{\gamma},\quad r_{\gamma}=Ax+By-b,\quad\max\left\{\|r_{x}\|_{x}^{*},\|r_{y}\|_{y}^{*},\|r_{\gamma}\|_{\gamma}^{*}\right\}\leq\rho,\quad\varepsilon_{x}+\varepsilon_{y}\leq\varepsilon,$

$\displaystyle\langle z,Mz^{\prime}\rangle$

$\displaystyle\|z+z^{\prime}\|_{\mathcal{Z},M}^{2}$

$\displaystyle\|z\|_{\mathcal{Z},M}^{*}:=\sup_{\|z^{\prime}\|_{\mathcal{Z},M}\leq 1}\langle z,z^{\prime}\rangle_{\mathcal{Z}}\quad(z\in\mathcal{Z}).$

$\displaystyle M\preceq N\iff N-M\in\mathcal{M}_{+}^{\mathcal{Z}}.$

$\displaystyle\|\cdot\|_{\mathcal{Z},M}\leq\sqrt{c}\,\|\cdot\|_{\mathcal{Z},N}\quad\text{and}\quad\|\cdot\|_{\mathcal{Z},N}^{*}\leq\sqrt{c}\,\|\cdot\|_{\mathcal{Z},M}^{*}.$

$\displaystyle\langle v-v^{\prime},z-z^{\prime}\rangle\geq 0\quad\forall z,z^{\prime}\in\mathcal{Z},\;\forall v\in T(z),\;\forall v^{\prime}\in T(z^{\prime}).$

$\displaystyle T^{\varepsilon}(z):=\{v\in\mathcal{Z}\;|\;\langle v-v^{\prime},z-z^{\prime}\rangle\geq-\varepsilon,\;\forall z^{\prime}\in\mathcal{Z},\;\forall v^{\prime}\in T(z^{\prime})\}\quad\forall z\in\mathcal{Z}.$

$\displaystyle\tilde{z}_{k}^{a}:=\frac{1}{k}\sum_{i=1}^{k}\tilde{z}_{i},\quad r_{k}^{a}:=\frac{1}{k}\sum_{i=1}^{k}r_{i},\quad\varepsilon_{k}^{a}:=\frac{1}{k}\sum_{i=1}^{k}\langle r_{i},\tilde{z}_{i}-\tilde{z}_{k}^{a}\rangle.$

$\displaystyle 0\in T(z),$

$\displaystyle r_{k}:=M_{k}(z_{k-1}-z_{k})\in T(\tilde{z}_{k}),$

$\displaystyle\|z_{k}-\tilde{z}_{k}\|_{\mathcal{Z},M_{k}}^{2}+\eta_{k}\leq\sigma\|z_{k-1}-\tilde{z}_{k}\|_{\mathcal{Z},M_{k}}^{2}+\eta_{k-1}.$

$\displaystyle r_{k}\in T(\tilde{z}_{k}),\quad\|r_{k}+\tilde{z}_{k}-z_{k-1}\|_{\mathcal{Z}}^{2}\leq\sigma\|\tilde{z}_{k}-z_{k-1}\|_{\mathcal{Z}}^{2},$

$\displaystyle z_{k}=z_{k-1}-r_{k},$

$\displaystyle\sum_{i=0}^{k}c_{i}\leq C_{S},\quad\frac{1}{1+c_{k}}M_{k}\preceq M_{k+1}\preceq(1+c_{k})M_{k}\quad\forall k\geq 0.$

$\displaystyle\prod_{i=0}^{k}(1+c_{i})\leq C_{P}\quad\text{and}\quad M_{j}\preceq C_{P}M_{k},\quad\forall j,k\geq 0.$

$\displaystyle d_{0}:=\inf\{\|z^{*}-z_{0}\|_{\mathcal{Z},M_{0}}\;|\;z^{*}\in T^{-1}(0)\},$

$\displaystyle\|r_{i}\|_{\mathcal{Z},M_{i}}^{*}\leq\left(\frac{2(1+\sigma)C_{P}(d_{0}^{2}+\eta_{0})+2(1-\sigma)\eta_{0}}{(1-\sigma)k}\right)^{1/2}.$

$\displaystyle i=\mathcal{O}\left(\left\lceil\frac{C_{P}(d_{0}^{2}+\eta_{0})}{\rho^{2}}\right\rceil\right)$

$\displaystyle r_{i}\in T(\tilde{z}_{i})\quad\text{and}\quad\|r_{i}\|_{\mathcal{Z},M_{i}}^{*}\leq\rho.$

$\displaystyle\|r_{k}^{a}\|_{\mathcal{Z},M_{k}}^{*}\leq\frac{\mathcal{E}\sqrt{d_{0}^{2}+\eta_{0}}}{k},$

$\displaystyle 0\leq\varepsilon_{k}^{a}\leq\frac{\widehat{\mathcal{E}}(d_{0}^{2}+\eta_{0})}{k},$

$\displaystyle\mathcal{O}\left((1+C_{S})C_{P}^{2}\max\left\{\left\lceil\frac{\sqrt{d_{0}^{2}+\eta_{0}}}{\rho}\right\rceil,\left\lceil\frac{d_{0}^{2}+\eta_{0}}{\varepsilon}\right\rceil\right\}\right)$

$\displaystyle r_{k}^{a}\in T^{\varepsilon_{k}^{a}}(\tilde{z}_{k}^{a}),\quad\|r_{k}^{a}\|_{\mathcal{Z},M_{k}}^{*}\leq\rho\quad\text{and}\quad\varepsilon_{k}^{a}\leq\varepsilon.$

$\displaystyle 0\in\partial f(x)-A^{*}\gamma,\quad 0\in\partial g(y)-B^{*}\gamma,\quad Ax+By-b=0.$

$\displaystyle\Omega^{*}:=\{(x^{*},y^{*},\gamma^{*})\in\mathcal{X}\times\mathcal{Y}\times\Gamma\;|\;(x^{*},y^{*},\gamma^{*})\text{ is a solution of (24)}\},$

$\displaystyle\min_{x\in\mathcal{X}}\left\{f(x)-\langle\gamma_{k-1},Ax\rangle_{\mathcal{X}}+\frac{1}{2}\|Ax+By_{k-1}-b\|_{\Gamma,H_{k}}^{2}+\frac{1}{2}\|x-x_{k-1}\|_{\mathcal{X},R_{k}}^{2}\right\}$

M.L.N. Gonçalves and J.G. Melo

Instituto de Matemática e Estatística, Universidade Federal de Goiás, Campus II, Caixa Postal 131, CEP 74001-970, Goiânia-GO, Brazil (e-mails: [email protected] and [email protected]). The work of these authors was supported in part by CNPq Grants 406250/2013-8, 444134/2014-0 and 309370/2014-0.

M. Marques Alves

Departamento de Matemática, Universidade Federal de Santa Catarina, Florianópolis, Brazil, 88040-900 ([email protected]). The work of this author was partially supported by CNPq grants no. 406250/2013-8, 306317/2014-1 and 405214/2016-2.

(May 4, 2017)

Abstract

In this paper, we obtain global $\mathcal{O}(1/\sqrt{k})$ pointwise and $\mathcal{O}(1/{k})$ ergodic convergence rates for a variable metric proximal alternating direction method of multipliers (VM-PADMM) for solving linearly constrained convex optimization problems. The VM-PADMM can be seen as a class of ADMM variants, allowing the use of degenerate metrics (defined by noninvertible linear operators). We first propose and study nonasymptotic convergence rates of a variable metric hybrid proximal extragradient (VM-HPE) framework for solving monotone inclusions. Then, the above-mentioned convergence rates for the VM-PADMM are obtained essentially by showing that it falls within the latter framework. To the best of our knowledge, this is the first time that global pointwise (resp. pointwise and ergodic) convergence rates are obtained for the VM-PADMM (resp. VM-HPE framework).

2000 Mathematics Subject Classification: 90C25, 90C60, 49M27, 47H05, 47J22, 65K10.

Key words: alternating direction method of multipliers, variable metric, pointwise and ergodic convergence rates, hybrid proximal extragradient method, convex program.

1 Introduction

We consider the linearly constrained convex optimization problem

$\displaystyle\text{minimize}\;\;f(x)+g(y)\quad\text{subject to}\;\;Ax+By=b,$

(1)

where $f:\mathcal{X}\to\overline{\mathbb{R}}:=\mathbb{R}\cup\{+\infty\}$ and $g:\mathcal{Y}\to\overline{\mathbb{R}}$ are extended-real-valued proper closed and convex functions, $\mathcal{X},\mathcal{Y}$ and $\Gamma$ are finite-dimensional real vector spaces, and $A:\mathcal{X}\to\Gamma$ and $B:\mathcal{Y}\to\Gamma$ are linear operators. One of the most popular methods for solving (1) is the alternating direction method of multipliers (ADMM) [4, 14, 15], for which many variants have been proposed and studied in the literature; see, e.g., [1, 3, 7, 9, 10, 11, 12, 13, 17, 18, 19, 21, 25, 31].

In this paper, we obtain global ergodic and pointwise convergence rates for a variable metric proximal ADMM (VM-PADMM) which can be described as follows: given an initial point $(x_{0},y_{0},\gamma_{0})\in\mathcal{X}\times\mathcal{Y}\times\Gamma$ and a stepsize $\theta>0$ , compute a sequence $\{(x_{k},y_{k},\gamma_{k})\}$ , recursively, by

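$\displaystyle x_{k}\in\operatorname*{argmin}_{x\in\mathcal{X}}\left\{f(x)-\langle{\gamma_{k-1}},{Ax}\rangle_{\mathcal{X}}+\frac{1}{2}\|Ax+By_{k-1}-b\|^{2}_{\Gamma,H_{k}}+\frac{1}{2}\|x-x_{k-1}\|_{\mathcal{X},R_{k}}^{2}\right\},$

(2)

$\displaystyle y_{k}\in\operatorname*{argmin}_{y\in\mathcal{Y}}\left\{g(y)-\langle{\gamma_{k-1}},{By}\rangle_{\mathcal{Y}}+\frac{1}{2}\|Ax_{k}+By-b\|^{2}_{\Gamma,H_{k}}+\frac{1}{2}\|y-y_{k-1}\|_{\mathcal{Y},S_{k}}^{2}\right\},$

(3)

$\displaystyle\gamma_{k}=\gamma_{k-1}-\theta H_{k}\left(Ax_{k}+By_{k}-b\right),$

(4)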

where $H_{k}$ , $R_{k}$ and $S_{k}$ are selfadjoint linear operators such that $H_{k}$ is positive definite and $R_{k}$ and $S_{k}$ are positive semidefinite, and $\|\cdot\|_{\Gamma,H_{k}}^{2}:=\langle{H_{k}(\cdot)},{\cdot}\rangle_{\Gamma}$ , etc. We start by reviewing some existing methods and works related to the above method.

VM-PADMM and some variants. The VM-PADMM (2)–(4) can be seen as a class of ADMM variants, depending on the choices of the linear operators $H_{k}$ , $R_{k}$ and $S_{k}$ . Namely,

• by taking $H_{k}=\beta I$ with $\beta>0$, $R_{k}=0$, $S_{k}=0$ and $\theta=1$, it reduces to the standard ADMM, whose ergodic convergence rate was established in [30];

• the ADMM in [21] (related to the Uzawa method [38]) consists of taking $H_{k}=\beta I$ with $\beta>0$, $R_{k}$ constant, $S_{k}=0$ and $\theta=1$. Pointwise and ergodic convergence rates for this variant were obtained in [21, 22];

• the proximal ADMM consists of choosing $H_{k}=\beta I$ with $\beta>0$, and $R_{k}$ and $S_{k}$ constant. This method has been studied by many authors; see, for instance, [8, 10, 16, 35], where convergence rates are analyzed;

• by choosing $H_{k}=\beta_{k}I$, $R_{k}=0$ and $S_{k}=0$, it corresponds to a variable penalty parameter ADMM, for which asymptotic convergence analysis was considered in [20, 23, 36];

• the VM-PADMM (2)–(4) with $R_{k}$ and $S_{k}$ positive definite is closely related to the method studied in [19, 26] for solving (point-to-point) continuous monotone variational inequality problems (in the setting of problem (1), it demands $f$ and $g$ to be continuously differentiable). We mention that, contrary to our analysis, the latter references consider the stepsize $\theta=1$ and do not present nonasymptotic convergence rates;

• by letting $H_{k}=\beta I$, $\beta>0$, and $\theta=1$, the resulting method becomes similar to Algorithm 7 in [2], where a composite structure of $f$ is considered and ergodic convergence rates were obtained under the additional conditions that $B=I$ in (1) and the dual solution set of (1) be bounded.

Contributions of the paper. We obtain an $\mathcal{O}(1/k)$ global convergence rate for an ergodic sequence associated to the VM-PADMM (2)–(4) with $\theta\in(0,(\sqrt{5}+1)/2)$ , which provides, for given tolerances $\rho,\varepsilon>0$ , triples $(x,y,\tilde{\gamma})$ , $(r_{x},r_{y},r_{\gamma})$ and scalars $\varepsilon_{x},\varepsilon_{y}\geq 0$ such that

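$\displaystyle r_{x}\in\partial_{\varepsilon_{x}}f(x)-A^{*}\tilde{\gamma},\quad r_{y}\in\partial_{\varepsilon_{y}}g(y)-B^{*}\tilde{\gamma},\quad r_{\gamma}=Ax+By-b,\quad\max\left\{\|r_{x}\|_{x}^{*},\|r_{y}\|_{y}^{*},\|r_{\gamma}\|_{\gamma}^{*}\right\}\leq\rho,\quad\varepsilon_{x}+\varepsilon_{y}\leq\varepsilon,$

(5)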

in at most $\mathcal{O}\left(\max\left\{\left\lceil d_{0}/\rho\,\right\rceil,\left\lceil d_{0}^{2}/\varepsilon\,\right\rceil\right\}\right)$ iterations, where $\|\cdot\|^{*}_{x}\,,\;\|\cdot\|^{*}_{y}\,$ and $\;\|\cdot\|^{*}_{\gamma}$ denote dual seminorms associated to the linear operators $H_{k},R_{k}$ and $S_{k}$ , and $d_{0}$ is a scalar measuring the quality of the initial point. Moreover, we establish an $\mathcal{O}(1/\sqrt{k})$ pointwise convergence rate in which the inclusions in (5) are strengthened, in the sense that $\varepsilon_{x}=\varepsilon_{y}=0$ , and the bound on the number of iterations becomes $\mathcal{O}\left(\left\lceil d_{0}^{2}/\rho^{2}\,\right\rceil\right)$ . Our study is done by first establishing global ergodic and pointwise convergence rates for a variable metric hybrid proximal extragradient (VM-HPE) framework for finding zeroes of maximal monotone operators, and then by showing that the VM-PADMM (2)–(4) can be seen as an instance of the latter framework. To the best of our knowledge, this is the first time that global pointwise (resp. pointwise and ergodic) convergence rates are obtained for the VM-PADMM (2)–(4) (resp. VM-HPE framework). Besides, our analysis allows degenerate metrics (induced by positive semidefinite linear operators) which makes the VM-PADMM (2)–(4) (and the VM-HPE framework) more suitable for applications. We next briefly review some related works to the VM-HPE framework.

VM-HPE type frameworks. The VM-HPE framework proposed in this work is a generalization of a special instance of the HPE framework [37] allowing variations in the metric (induced by positive semidefinite linear operators) along the iterations. The iteration complexity of the HPE framework was first analyzed in [28] and subsequently applied to the study of several methods; see, for example, [24, 27, 29, 30]. An inexact variable metric proximal point type method was proposed in [32] but, contrary to our VM-HPE framework, it demands the metrics to be nondegenerate (induced by invertible linear operators). Moreover, the convergence analysis presented in [32] does not include nonasymptotic convergence rates.

Outline of the paper. Subsection 1.1 presents our notation and basic results. Section 2 introduces the VM-HPE framework and presents its nonasymptotic pointwise and ergodic convergence rates, whose proofs are postponed to Appendix A. Section 3 contains two subsections. In Subsection 3.1, we formally state the VM-PADMM (2)–(4) and present its nonasymptotic pointwise and ergodic convergence rates. In Subsection 3.2, we obtain the convergence rates of the VM-PADMM by viewing it as an instance of the VM-HPE framework.

1.1 Basic results and notation

Let $\mathcal{Z}$ be a finite-dimensional real vector space with inner product $\langle{\cdot},{\cdot}\rangle_{\mathcal{Z}}$ and induced norm $\|\cdot\|_{\mathcal{Z}}:=\sqrt{\langle{\cdot},{\cdot}\rangle_{\mathcal{Z}}}$ . Denote by $\mathcal{M}^{\mathcal{Z}}_{+}$ (resp. $\mathcal{M}^{\mathcal{Z}}_{++}$ ) the space of selfadjoint positive semidefinite (resp. definite) linear operators on $\mathcal{Z}$ . Each element $M\in\mathcal{M}^{\mathcal{Z}}_{+}$ induces a symmetric bilinear form $\langle{M(\cdot)},{\cdot}\rangle_{\mathcal{Z}}$ on $\mathcal{Z}\times\mathcal{Z}$ and a seminorm $\|\cdot\|_{\mathcal{Z},M}:=\sqrt{\langle{M(\cdot)},{\cdot}\rangle_{\mathcal{Z}}}$ on $\mathcal{Z}$ . Since $\langle{M(\cdot)},{\cdot}\rangle_{\mathcal{Z}}$ is symmetric and bilinear, the following hold, for all $z,z^{\prime}\in\mathcal{Z}$ ,

[TABLE]

Moreover, each $M\in\mathcal{M}^{\mathcal{Z}}_{+}$ also induces an (extended) dual seminorm on $\mathcal{Z}$ defined by

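$\displaystyle\|z\|_{\mathcal{Z},M}^{*}:=\sup_{\|z^{\prime}\|_{\mathcal{Z},M}\leq 1}\langle z,z^{\prime}\rangle_{\mathcal{Z}}\quad(z\in\mathcal{Z}).$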

On the other hand, each $M\in\mathcal{M}^{\mathcal{Z}}_{++}$ induces an inner product $\langle{M(\cdot)},{\cdot}\rangle_{\mathcal{Z}}$ and a norm $\|\cdot\|_{\mathcal{Z},M}:=\sqrt{\langle{M(\cdot)},{\cdot}\rangle_{\mathcal{Z}}}$ on $\mathcal{Z}$ , etc.

The next two propositions, whose proofs are omitted, will be useful in this paper.

Proposition 1.1.

For every $M\in\mathcal{M}_{+}^{\mathcal{Z}}$, we have $\mathrm{dom}\,\|\cdot\|^{*}_{\mathcal{Z},M}=\mathcal{R}(M)$ and $\|M(\cdot)\|^{*}_{\mathcal{Z},M}=\|\cdot\|_{\mathcal{Z},M}$, where $\mathcal{R}(M)$ denotes the range of $M$.
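Proposition 1.1 can be checked numerically in $\mathbb{R}^{n}$ with the Euclidean inner product. The sketch below is illustrative only: the helper names `seminorm` and `dual_seminorm` are hypothetical, and it relies on the standard fact that, for $z\in\mathcal{R}(M)$, the supremum defining the dual seminorm equals $\sqrt{\langle z,M^{+}z\rangle}$, where $M^{+}$ is the Moore-Penrose pseudoinverse.

```python
import numpy as np

def seminorm(M, z):
    """Seminorm ||z||_{Z,M} = sqrt(<Mz, z>) for a symmetric PSD matrix M."""
    return float(np.sqrt(max(z @ M @ z, 0.0)))

def dual_seminorm(M, z, tol=1e-10):
    """Extended dual seminorm: sup of <z, z'> over ||z'||_{Z,M} <= 1.

    It is finite exactly when z lies in the range of M, in which case it
    equals sqrt(<z, M^+ z>) with M^+ the Moore-Penrose pseudoinverse.
    """
    M_pinv = np.linalg.pinv(M)
    # z is in range(M) iff projecting it onto the range leaves it unchanged.
    if np.linalg.norm(M @ (M_pinv @ z) - z) > tol * max(1.0, np.linalg.norm(z)):
        return float("inf")
    return float(np.sqrt(max(z @ M_pinv @ z, 0.0)))

M = np.diag([2.0, 0.0])                           # degenerate (noninvertible) metric
w = np.array([3.0, -1.0])
lhs = dual_seminorm(M, M @ w)                     # ||M w||^*_{Z,M}
rhs = seminorm(M, w)                              # ||w||_{Z,M}
outside = dual_seminorm(M, np.array([0.0, 1.0]))  # vector outside range(M)
```

Here `lhs` and `rhs` agree, as the second identity of the proposition predicts, while `outside` is infinite because the vector does not belong to $\mathcal{R}(M)$.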

Let the partial order $\preceq$ on $\mathcal{M}^{\mathcal{Z}}_{+}$ be defined by

$\displaystyle M\preceq N\iff N-M\in\mathcal{M}_{+}^{\mathcal{Z}}.$

Proposition 1.2.

Let $M,N\in\mathcal{M}^{\mathcal{Z}}_{+}$ and $c>0$ . If $M\preceq cN$ , then

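$\displaystyle\|\cdot\|_{\mathcal{Z},M}\leq\sqrt{c}\,\|\cdot\|_{\mathcal{Z},N}\quad\text{and}\quad\|\cdot\|_{\mathcal{Z},N}^{*}\leq\sqrt{c}\,\|\cdot\|_{\mathcal{Z},M}^{*}.$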
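Although the proofs are omitted, the comparison of seminorms under $M\preceq cN$ takes one line. The following is a sketch of the standard argument, not the authors' proof:

```latex
M \preceq cN
\;\Longrightarrow\;
\|z\|_{\mathcal{Z},M}^{2} = \langle Mz, z\rangle_{\mathcal{Z}}
\le c\,\langle Nz, z\rangle_{\mathcal{Z}} = c\,\|z\|_{\mathcal{Z},N}^{2}
\quad \forall z\in\mathcal{Z},
\qquad\text{hence}\qquad
\{z' : \|z'\|_{\mathcal{Z},N}\le 1\} \subset \{z' : \|z'\|_{\mathcal{Z},M}\le \sqrt{c}\}.
```

Taking square roots gives $\|\cdot\|_{\mathcal{Z},M}\leq\sqrt{c}\,\|\cdot\|_{\mathcal{Z},N}$, and taking the supremum of $\langle z,z^{\prime}\rangle_{\mathcal{Z}}$ over the two sets in the inclusion gives $\|\cdot\|^{*}_{\mathcal{Z},N}\leq\sqrt{c}\,\|\cdot\|^{*}_{\mathcal{Z},M}$.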

A set-valued mapping $T:\mathcal{Z}\rightrightarrows\mathcal{Z}$ is said to be monotone if

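$\displaystyle\langle v-v^{\prime},z-z^{\prime}\rangle\geq 0\quad\forall z,z^{\prime}\in\mathcal{Z},\;\forall v\in T(z),\;\forall v^{\prime}\in T(z^{\prime}).$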

Moreover, $T$ is maximal monotone if it is monotone and, additionally, if $S$ is a monotone operator such that $T(z)\subset S(z)$ for every $z\in\mathcal{Z}$ then $T=S$ . The inverse operator $T^{-1}:\mathcal{Z}\rightrightarrows\mathcal{Z}$ of $T$ is given by $T^{-1}(v):=\{z\in\mathcal{Z}\;|\;v\in T(z)\}$ . Given $\varepsilon\geq 0$ , the $\varepsilon$ -enlargement $T^{\varepsilon}:\mathcal{Z}\rightrightarrows\mathcal{Z}$ of a set-valued mapping $T:\mathcal{Z}\rightrightarrows\mathcal{Z}$ is defined as

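$\displaystyle T^{\varepsilon}(z):=\{v\in\mathcal{Z}\;|\;\langle v-v^{\prime},z-z^{\prime}\rangle\geq-\varepsilon,\;\forall z^{\prime}\in\mathcal{Z},\;\forall v^{\prime}\in T(z^{\prime})\}\quad\forall z\in\mathcal{Z}.$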

Recall that the $\varepsilon$ -subdifferential of a convex function $f:\mathcal{Z}\to\overline{\mathbb{R}}$ is defined by $\partial_{\varepsilon}f(z):=\{v\in\mathcal{Z}\,|\,f(z^{\prime})\geq f(z)+\langle{v},{z^{\prime}-z}\rangle-\varepsilon\;\;\forall z^{\prime}\in\mathcal{Z}\}$ for every $z\in\mathcal{Z}$ . When $\varepsilon=0$ , then $\partial_{0}f(z)$ is denoted by $\partial f(z)$ and is called the subdifferential of $f$ at $z$ . The operator $\partial f$ is trivially monotone if $f$ is proper. If $f$ is a proper closed and convex function, then $\partial f$ is also maximal monotone [34].
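As a worked example (not taken from the paper), consider $f(z)=z^{2}/2$ on $\mathbb{R}$. Minimizing the left-hand side over $z^{\prime}$ in the defining inequality gives a closed form for the $\varepsilon$-subdifferential:

```latex
v\in\partial_{\varepsilon}f(z)
\iff \tfrac{1}{2}(z')^{2}-vz' \;\ge\; \tfrac{1}{2}z^{2}-vz-\varepsilon \quad \forall z'\in\mathbb{R}
\iff -\tfrac{1}{2}v^{2} \;\ge\; \tfrac{1}{2}z^{2}-vz-\varepsilon
\iff \tfrac{1}{2}(v-z)^{2}\le\varepsilon .
```

Hence $\partial_{\varepsilon}f(z)=[z-\sqrt{2\varepsilon},\,z+\sqrt{2\varepsilon}]$, which collapses to $\partial f(z)=\{z\}$ when $\varepsilon=0$.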

The following result is a particular case of the weak transportation formula in [6, Theorem 2.3] combined with [5, Proposition 2(i)].

Theorem 1.3.

Suppose $T:\mathcal{Z}\rightrightarrows\mathcal{Z}$ is maximal monotone and let $\tilde{z}_{i},r_{i}\in\mathcal{Z}$ , for $i=1,\dots,k$ , be such that $r_{i}\in T(\tilde{z}_{i})$ and define

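$\displaystyle\tilde{z}_{k}^{a}:=\frac{1}{k}\sum_{i=1}^{k}\tilde{z}_{i},\quad r_{k}^{a}:=\frac{1}{k}\sum_{i=1}^{k}r_{i},\quad\varepsilon_{k}^{a}:=\frac{1}{k}\sum_{i=1}^{k}\langle r_{i},\tilde{z}_{i}-\tilde{z}_{k}^{a}\rangle.$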

Then, the following hold:

(a)

$\varepsilon_{k}^{a}\geq 0$ and $r_{k}^{a}\in T^{\varepsilon_{k}^{a}}(\tilde{z}_{k}^{a})$;

(b)

if, in addition, $T=\partial f$ for some proper closed and convex function $f$ , then $r_{k}^{a}\in\partial_{\varepsilon_{k}^{a}}f(\tilde{z}_{k}^{a})$ .
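Theorem 1.3(a) can be illustrated numerically. The sketch below assumes the ergodic quantities are the averages $\tilde{z}_{k}^{a}=\frac{1}{k}\sum_{i}\tilde{z}_{i}$, $r_{k}^{a}=\frac{1}{k}\sum_{i}r_{i}$ and $\varepsilon_{k}^{a}=\frac{1}{k}\sum_{i}\langle r_{i},\tilde{z}_{i}-\tilde{z}_{k}^{a}\rangle$, and it uses a hypothetical linear monotone operator $T(z)=Az$ chosen only for illustration; the function name `ergodic_triple` is likewise an assumption.

```python
import numpy as np

def ergodic_triple(z_tilde, r):
    """Ergodic means and epsilon_k^a for rows z_tilde[i], r[i], i = 1..k."""
    z_avg = z_tilde.mean(axis=0)
    r_avg = r.mean(axis=0)
    eps = float(np.mean(np.sum(r * (z_tilde - z_avg), axis=1)))
    return z_avg, r_avg, eps

# Hypothetical monotone operator T(z) = A z: the symmetric part of A is PSD.
A = np.array([[1.0, 2.0], [-2.0, 0.5]])
rng = np.random.default_rng(0)
z_tilde = rng.standard_normal((5, 2))
r = z_tilde @ A.T                      # r_i = A z_i, so r_i is in T(z_i)
z_avg, r_avg, eps = ergodic_triple(z_tilde, r)

# Enlargement inequality <r_avg - A z', z_avg - z'> >= -eps on sampled z'.
z_test = rng.standard_normal((1000, 2))
gaps = np.sum((r_avg - z_test @ A.T) * (z_avg - z_test), axis=1)
```

For a linear operator the averaged residual still satisfies $r_{k}^{a}=A\tilde{z}_{k}^{a}$, so the enlargement is not needed here; for nonlinear $T$ the averaged pair generally lies only in the graph of $T^{\varepsilon_{k}^{a}}$, which is why the theorem is stated with the enlargement.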

2 A variable metric HPE framework

Consider the monotone inclusion problem

$\displaystyle 0\in T(z),$

(9)

where $\mathcal{Z}$ is a finite-dimensional inner product real vector space and $T:\mathcal{Z}\rightrightarrows\mathcal{Z}$ is maximal monotone. Assume that the solution set $T^{-1}(0)$ of (9) is nonempty.

In this section, we propose a variable metric hybrid proximal extragradient (VM-HPE) framework for solving (9) and analyze its nonasymptotic convergence rates. The proposed framework finds its roots in the hybrid proximal extragradient (HPE) framework of [37], for which the iteration complexity was recently obtained in [28]. Our main results on pointwise and ergodic convergence rates for the VM-HPE framework are presented in Theorems 2.2 and 2.3, respectively. In Section 3, we will show how the VM-HPE framework can be used to analyze the nonasymptotic convergence of a VM-PADMM for solving linearly constrained convex optimization problems.

We begin by stating the VM-HPE framework.

A variable metric hybrid proximal extragradient (VM-HPE) framework

(0)

Let $z_{0}\in\mathcal{Z}$ , $\eta_{0}\in\mathbb{R}_{+}$ and $\sigma\in[0,1)$ be given, and set $k=1$ .

(1)

Choose $M_{k}\in\mathcal{M}^{\mathcal{Z}}_{+}$ and find $(z_{k},\tilde{z}_{k},\eta_{k})\in\mathcal{Z}\times\mathcal{Z}\times\mathbb{R}_{+}$ such that

$\displaystyle r_{k}:=M_{k}(z_{k-1}-{z_{k}})\in T(\tilde{z}_{k}),$

(10)

$\displaystyle\|{z_{k}}-{\tilde{z}}_{k}\|_{\mathcal{Z},M_{k}}^{2}+\eta_{k}\leq\sigma\|{z_{k-1}}-\tilde{z}_{k}\|_{\mathcal{Z},M_{k}}^{2}+\eta_{k-1}.$

(11)

(2)

Set $k\leftarrow k+1$ and go to step 1.

end

Remarks. 1) Letting $M_{k}\equiv I$ and $\eta_{k}\equiv 0$ in (10) and (11), respectively, we find that the sequences $\{z_{k}\}$ , $\{\tilde{z}_{k}\}$ and $\{r_{k}\}$ satisfy

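$\displaystyle r_{k}\in T(\tilde{z}_{k}),\quad\|r_{k}+\tilde{z}_{k}-z_{k-1}\|_{\mathcal{Z}}^{2}\leq\sigma\|\tilde{z}_{k}-z_{k-1}\|_{\mathcal{Z}}^{2},\quad z_{k}=z_{k-1}-r_{k},$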

which is to say that, in this case, the VM-HPE framework reduces to a special case of the HPE framework (see p. 2763 in [28]) with $\lambda_{k}\equiv 1$ (in the notation of [28]). In other words, the VM-HPE framework generalizes a special case of the HPE framework by allowing the metric to vary along the iterations. 2) If the sequence $\{M_{k}\}_{k\geq 0}$ is taken to be constant, then the VM-HPE framework reduces to a special case of the NE-HPE framework studied in [16]. 3) We also mention that a variable metric inexact proximal point method with relative error tolerance was proposed in [32] but, contrary to our framework, the method of [32] demands that every operator $M_{k}$ be positive definite. Moreover, the convergence analysis presented in [32] does not include nonasymptotic convergence rates. The fact that the VM-HPE framework allows positive semidefinite operators $M_{k}$ will be crucial for viewing the VM-PADMM of Section 3 as a special instance of it.
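To make steps (10) and (11) concrete, the following toy sketch (an illustration under stated assumptions, not part of the paper) runs the exact variant with $\tilde{z}_{k}=z_{k}$, $\sigma=0$ and $\eta_{k}\equiv 0$, so that (11) holds trivially. The linear monotone operator $T(z)=Az$ and the constant degenerate metric $M$ are hypothetical choices; each step solves $M(z_{k-1}-z_{k})=Az_{k}$.

```python
import numpy as np

A = np.array([[1.0, 2.0], [-2.0, 1.0]])   # T(z) = A z is monotone, T^{-1}(0) = {0}
M = np.diag([1.0, 0.0])                    # degenerate metric, M_k = M for all k
z = np.array([1.0, 1.0])                   # initial point z_0
d0 = float(np.sqrt(z @ M @ z))             # distance from z_0 to the solution 0 in ||.||_{Z,M}

dual_norms = []
for k in range(1, 11):
    z_new = np.linalg.solve(M + A, M @ z)  # exact step: M (z - z_new) = A z_new
    r = M @ (z - z_new)                    # residual r_k of (10)
    assert np.allclose(r, A @ z_new)       # inclusion r_k in T(z_k)
    # r lies in range(M) = span(e_1), where the dual seminorm equals |r[0]|.
    dual_norms.append(abs(r[0]))
    z = z_new
```

With this data the residuals decay geometrically, which is much faster than the worst-case $\mathcal{O}(1/\sqrt{k})$ guarantee; the point of the sketch is only that the inclusion and the dual seminorm remain well defined although $M$ is not invertible.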

From now on in this section, we assume the following condition to hold:

Assumption 2.1.

For the sequence $\{M_{k}\}_{k\geq 1}$ generated by the VM-HPE framework, there exist $M_{0}\in\mathcal{M}^{\mathcal{Z}}_{+}$ , $0\leq C_{S}<\infty$ and, for each $k\geq 0$ , $c_{k}\geq 0$ such that $\{c_{k}\}_{k\geq 0}$ and $\{M_{k}\}_{k\geq 0}$ satisfy

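$\displaystyle\sum_{i=0}^{k}c_{i}\leq C_{S},\quad\frac{1}{1+c_{k}}M_{k}\preceq M_{k+1}\preceq(1+c_{k})M_{k}\quad\forall k\geq 0.$

(12)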

Remark. The above assumption (which is similar to condition (1.4) in [32]) is satisfied, for instance, if the sequence $\{M_{k}\}_{k\geq 0}$ is taken to be constant and $c_{k}\equiv 0$ , in which case one can choose $C_{S}=0$ .

It is easy to check that Assumption 2.1 implies the existence of a constant $C_{P}>0$ such that $\{c_{k}\}_{k\geq 0}$ and $\{M_{k}\}_{k\geq 0}$ satisfy

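$\displaystyle\prod_{i=0}^{k}(1+c_{i})\leq C_{P}\quad\text{and}\quad M_{j}\preceq C_{P}M_{k},\quad\forall j,k\geq 0.$

(13)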
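The omitted verification can be sketched as follows, assuming Assumption 2.1 reads $\sum_{i=0}^{k}c_{i}\leq C_{S}$ and $\frac{1}{1+c_{k}}M_{k}\preceq M_{k+1}\preceq(1+c_{k})M_{k}$ for all $k\geq 0$. Using $1+t\leq e^{t}$ and chaining the two operator inequalities,

```latex
\prod_{i=0}^{k}(1+c_{i}) \;\le\; \exp\Big(\sum_{i=0}^{k}c_{i}\Big) \;\le\; e^{C_{S}},
\qquad
M_{j} \preceq \Big(\prod_{i=j}^{k-1}(1+c_{i})\Big) M_{k} \quad (j<k),
\qquad
M_{j} \preceq \Big(\prod_{i=k}^{j-1}(1+c_{i})\Big) M_{k} \quad (j>k).
```

Under these assumptions, $C_{P}:=e^{C_{S}}$ is therefore one admissible (not necessarily the smallest) constant.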

In the remaining part of this section, we present pointwise and ergodic convergence rates for the VM-HPE framework. These results will depend on the quantity:

$\displaystyle d_{0}:=\inf\{\|z^{*}-z_{0}\|_{\mathcal{Z},M_{0}}\;|\;z^{*}\in T^{-1}(0)\},$

(14)

which measures the “quality” of the initial guess $z_{0}\in\mathcal{Z}$ in the VM-HPE framework with respect to the solution set $T^{-1}(0)$ .

For technical reasons and for the convenience of the reader, the proofs of the next two theorems will be given in Appendix A.

Theorem 2.2.

(Pointwise convergence rate of the VM-HPE framework)

Let $\{\tilde{z}_{k}\}$, $\{r_{k}\}$ and $\{M_{k}\}$ be generated by the VM-HPE framework. Let also $C_{P}$ and $d_{0}$ be as in (13) and (14), respectively. Then, for every $k\geq 1$, $r_{k}\in T(\tilde{z}_{k})$ and there exists $i\leq k$ such that

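$\displaystyle\|r_{i}\|_{\mathcal{Z},M_{i}}^{*}\leq\left(\frac{2(1+\sigma)C_{P}(d_{0}^{2}+\eta_{0})+2(1-\sigma)\eta_{0}}{(1-\sigma)k}\right)^{1/2}.$

(15)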

Remarks. 1) If $c_{k}\equiv 0$ in Assumption 2.1 (in which case $M_{k}\equiv M_{0}$ ), then the upper bound in (15) with $C_{S}=0$ and $C_{P}=1$ reduces essentially to a special case of [16, Theorem 3.3(a)] (with $\lambda_{k}\equiv 1,\varepsilon_{k}\equiv 0$ and $d(w)_{z}(z^{\prime})=(1/2)\|z-z^{\prime}\|^{2}$ ). Additionally, if $M_{0}=I$ and $\eta_{0}=0$ , then the bound (15) becomes similar to the corresponding one in [28, Theorem 4.4(a)]. 2) For a given tolerance $\rho>0$ , Theorem 2.2 ensures that there exists an index

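$\displaystyle i=\mathcal{O}\left(\left\lceil\frac{C_{P}(d_{0}^{2}+\eta_{0})}{\rho^{2}}\right\rceil\right)$

(16)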

such that

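$\displaystyle r_{i}\in T(\tilde{z}_{i})\quad\text{and}\quad\|r_{i}\|_{\mathcal{Z},M_{i}}^{*}\leq\rho.$

(17)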

In this case, $\tilde{z}_{i}\in\mathcal{Z}$ can be interpreted as a $\rho$ -approximate solution of (9) with residual $r_{i}\in\mathcal{Z}$ (see, e.g., [28] for the definition of a related concept). 3) Although $M_{i}$ may not be invertible, criterion (17) makes sense due to the fact that $r_{i}$ belongs to the image of $M_{i}$ (see (10)). Indeed, if $\|r_{i}\|_{\mathcal{Z},M_{i}}^{*}=0$ , then (10) and Proposition 1.1 imply that $r_{i}=0$ , and hence it follows from (17) that $\tilde{z}_{i}$ is a solution of problem (9).

Before presenting the ergodic convergence of the VM-HPE framework, let us define the ergodic sequences $\{\tilde{z}_{k}^{a}\}$ , $\{r_{k}^{a}\}$ and $\{\varepsilon_{k}^{a}\}$ associated to $\{\tilde{z}_{k}\}$ and $\{r_{k}\}$ as follows:

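$\displaystyle\tilde{z}_{k}^{a}:=\frac{1}{k}\sum_{i=1}^{k}\tilde{z}_{i},\quad r_{k}^{a}:=\frac{1}{k}\sum_{i=1}^{k}r_{i},\quad\varepsilon_{k}^{a}:=\frac{1}{k}\sum_{i=1}^{k}\langle r_{i},\tilde{z}_{i}-\tilde{z}_{k}^{a}\rangle.$

(18)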

Theorem 2.3.

(Ergodic convergence rate of the VM-HPE framework)

Let $\{\tilde{z}_{k}^{a}\}$, $\{r_{k}^{a}\}$ and $\{\varepsilon_{k}^{a}\}$ be given as in (18) and $\{M_{k}\}$ be generated by the VM-HPE framework. Let also $C_{S}$, $C_{P}$ and $d_{0}$ be as in (12), (13) and (14), respectively. Then, for every $k\geq 1$, we have $r^{a}_{k}\in T^{\varepsilon^{a}_{k}}(\tilde{z}^{a}_{k})$ and

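$\displaystyle\|r_{k}^{a}\|_{\mathcal{Z},M_{k}}^{*}\leq\frac{\mathcal{E}\sqrt{d_{0}^{2}+\eta_{0}}}{k},$

(19)

$\displaystyle 0\leq\varepsilon_{k}^{a}\leq\frac{\widehat{\mathcal{E}}(d_{0}^{2}+\eta_{0})}{k},$

(20)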

where $\mathcal{E}:=(1+C_{P})\left(\sqrt{C_{P}}+C_{S}C_{P}\right)+C_{S}C_{P}^{3/2}$ and $\widehat{\mathcal{E}}:=2C_{P}(1+C_{S})\left[{\sigma C_{P}}/{(1-\sigma)}+2(1+C_{P})\right]$ .

Remarks. 1) Similarly to the first remark after Theorem 2.2, Theorem 2.3 is also related to [16, Theorem 3.4] and [28, Theorem 4.7]. 2) For given tolerances $\rho,\varepsilon>0$, Theorem 2.3 ensures that in at most

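$\displaystyle\mathcal{O}\left((1+C_{S})C_{P}^{2}\max\left\{\left\lceil\frac{\sqrt{d_{0}^{2}+\eta_{0}}}{\rho}\right\rceil,\left\lceil\frac{d_{0}^{2}+\eta_{0}}{\varepsilon}\right\rceil\right\}\right)$

(21)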

iterations there hold

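$\displaystyle r_{k}^{a}\in T^{\varepsilon_{k}^{a}}(\tilde{z}_{k}^{a}),\quad\|r_{k}^{a}\|_{\mathcal{Z},M_{k}}^{*}\leq\rho\quad\text{and}\quad\varepsilon_{k}^{a}\leq\varepsilon.$

(22)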

Note that (21), in terms of the dependence on $\rho>0$ , is better than the bound in (16) by a factor of $\mathcal{O}\left(\rho\right)$ but, on the other hand, since $\varepsilon_{k}^{a}$ can be strictly positive, the inclusion in (22) is potentially weaker than the one in (17).
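The trade-off between the two guarantees can be made tangible with a small calculator. This is a sketch under explicit assumptions: the pointwise bound is taken to be $\left(\frac{2(1+\sigma)C_{P}(d_{0}^{2}+\eta_{0})+2(1-\sigma)\eta_{0}}{(1-\sigma)k}\right)^{1/2}$, the ergodic bounds are taken to be $\mathcal{E}\sqrt{d_{0}^{2}+\eta_{0}}/k$ and $\widehat{\mathcal{E}}(d_{0}^{2}+\eta_{0})/k$ with $\mathcal{E}$ and $\widehat{\mathcal{E}}$ as defined after Theorem 2.3, and the function names are hypothetical.

```python
import math

def pointwise_iterations(rho, d0, eta0=0.0, sigma=0.0, C_P=1.0):
    """Smallest k for which the assumed pointwise bound is at most rho."""
    num = 2 * (1 + sigma) * C_P * (d0**2 + eta0) + 2 * (1 - sigma) * eta0
    return math.ceil(num / ((1 - sigma) * rho**2))

def ergodic_iterations(rho, eps, d0, eta0=0.0, sigma=0.0, C_P=1.0, C_S=0.0):
    """Smallest k for which both assumed ergodic bounds meet rho and eps."""
    E = (1 + C_P) * (math.sqrt(C_P) + C_S * C_P) + C_S * C_P**1.5
    E_hat = 2 * C_P * (1 + C_S) * (sigma * C_P / (1 - sigma) + 2 * (1 + C_P))
    D = d0**2 + eta0
    return max(math.ceil(E * math.sqrt(D) / rho), math.ceil(E_hat * D / eps))

# Constant metric (C_S = 0, C_P = 1), exact steps (sigma = 0), eta_0 = 0, d_0 = 1.
k_pointwise = pointwise_iterations(rho=0.125, d0=1.0)
k_ergodic = ergodic_iterations(rho=0.125, eps=0.125, d0=1.0)
```

In this setting the pointwise count grows like $1/\rho^{2}$ while the ergodic count grows like $1/\rho$ and $1/\varepsilon$, matching the comparison made in the remark.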

3 A variable metric proximal alternating direction method of multipliers

This section contains two subsections. In Subsection 3.1, we formally state the VM-PADMM (2)–(4) and present its nonasymptotic convergence rates. The main results are Theorems 3.2 and 3.3 in which pointwise and ergodic convergence rates are obtained, respectively. The proofs of the latter theorems are discussed separately in Subsection 3.2 by viewing the method as an instance of the VM-HPE framework and by applying the results of Section 2.

3.1 VM-PADMM and its convergence rates

Let $\mathcal{X}$ , $\mathcal{Y}$ and $\Gamma$ be finite-dimensional real inner product vector spaces. Consider the convex optimization problem (1), i.e.,

$\displaystyle\text{minimize}\;\;f(x)+g(y)\quad\text{subject to}\;\;Ax+By=b,$

(23)

where the following conditions are assumed to hold:

(O1)

$f:\mathcal{X}\to\overline{\mathbb{R}}$ and $g:\mathcal{Y}\to\overline{\mathbb{R}}$ are proper closed and convex functions;

(O2)

$A:\mathcal{X}\to\Gamma$ and $B:\mathcal{Y}\to\Gamma$ are linear operators and $b\in\Gamma$ ;

(O3)

the solution set of (23) is nonempty.

Under the above assumptions and standard constraint qualifications (see, e.g., [33, Corollaries 28.2.2 and 28.3.1]), a vector $(x^{*},y^{*})\in\mathcal{X}\times\mathcal{Y}$ is a solution of (23) if and only if there exists a (Lagrange multiplier) $\gamma^{*}\in\Gamma$ such that $(x^{*},y^{*},\gamma^{*})$ is a solution of

$\displaystyle 0\in\partial f(x)-A^{*}\gamma,\quad 0\in\partial g(y)-B^{*}\gamma,\quad Ax+By-b=0.$

(24)

Motivated by the above statement, we define

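$\displaystyle\Omega^{*}:=\{(x^{*},y^{*},\gamma^{*})\in\mathcal{X}\times\mathcal{Y}\times\Gamma\;|\;(x^{*},y^{*},\gamma^{*})\text{ is a solution of (24)}\},$

(25)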

which is assumed to be nonempty.

The convergence rates of the VM-PADMM (stated below) for solving (23) will be obtained by viewing the optimization problem (23) as the monotone inclusion (24), which is associated to a certain maximal monotone operator (see (48)) in $\mathcal{X}\times\mathcal{Y}\times\Gamma$ , and by applying the results of the previous section.

Variable metric proximal alternating direction method of multipliers (VM-PADMM).

(0)

Let $(x_{0},y_{0},\gamma_{0})\in\mathcal{X}\times\mathcal{Y}\times{\Gamma}$ and $\theta\in(0,(\sqrt{5}+1)/2)$ be given, and set $k=1$ .

(1)

Choose $R_{k}\in\mathcal{M}^{\mathcal{X}}_{+}$ , $S_{k}\in\mathcal{M}^{\mathcal{Y}}_{+}$ and $H_{k}\in\mathcal{M}^{\Gamma}_{++}$ and compute an optimal solution $x_{k}\in\mathcal{X}$ of the subproblem

$\min_{x\in\mathcal{X}}\left\{f(x)-\langle{{\gamma}_{k-1}},{Ax}\rangle_{\mathcal{X}}+\frac{1}{2}\|Ax+By_{k-1}-b\|^{2}_{\Gamma,H_{k}}+\frac{1}{2}\|x-x_{k-1}\|_{\mathcal{X},R_{k}}^{2}\right\}$

(26)

and compute an optimal solution $y_{k}\in\mathcal{Y}$ of the subproblem

$\min_{y\in\mathcal{Y}}\left\{g(y)-\langle{{\gamma}_{k-1}},{By}\rangle_{\mathcal{Y}}+\frac{1}{2}\|Ax_{k}+By-b\|^{2}_{\Gamma,H_{k}}+\frac{1}{2}\|y-y_{k-1}\|_{\mathcal{Y},S_{k}}^{2}\right\}.$

(27)

(2)

Set

$\gamma_{k}=\gamma_{k-1}-\theta H_{k}\left(Ax_{k}+By_{k}-b\right),$

(28)

$k\leftarrow k+1$ , and go to step (1).

end

Remarks. 1) As already mentioned in Section 1, the VM-PADMM can be regarded as a class of ADMM instances, allowing a unified study of different variants of ADMM. 2) A usual choice for the linear operator $H_{k}$ is $\beta_{k}I$, where $\beta_{k}>0$ plays the role of a penalty parameter. 3) The proximal terms in (26) and (27) defined by $R_{k}$ and $S_{k}$, respectively, may play different roles. Namely, they can be used to regularize the subproblems in (26) and (27), making them strongly convex (when $R_{k}$ and $S_{k}$ are positive definite operators) and hence admitting unique solutions. Moreover, by a careful choice of these operators, subproblems (26) and (27) may become much easier to solve; for instance, if $H_{k}=\beta_{k}I$, then $R_{k}=\tau_{k}I-\beta_{k}A^{*}A$ with $\tau_{k}>\beta_{k}\|A^{*}A\|$ and $S_{k}=s_{k}I-\beta_{k}B^{*}B$ with $s_{k}>\beta_{k}\|B^{*}B\|$ eliminate the presence of quadratic forms associated to $A^{*}A$ and $B^{*}B$ in (26) and (27), respectively.
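The steps of the method can be sketched numerically as follows. This is a minimal illustration, not the authors' implementation: it assumes the hypothetical quadratic data $f(x)=\frac{1}{2}\|x-p\|^{2}$ and $g(y)=\frac{1}{2}\|y-q\|^{2}$, so that subproblems (26) and (27) reduce to linear systems, and it uses $H_{k}=\beta_{k}I$ with $\beta_{k}=1+1/k^{2}$, the linearizing choice $R_{k}=\beta_{k}(1.01\|A^{*}A\|I-A^{*}A)$ and the degenerate choice $S_{k}=0$. The function name and all data are assumptions made for the example.

```python
import numpy as np

def vm_padmm_quadratic(A, B, b, p, q, theta=1.0, iters=500):
    """VM-PADMM sketch for f(x) = 0.5||x-p||^2 and g(y) = 0.5||y-q||^2."""
    n, m, d = A.shape[1], B.shape[1], A.shape[0]
    x, y, gam = np.zeros(n), np.zeros(m), np.zeros(d)
    norm_AtA = np.linalg.norm(A.T @ A, 2)
    for k in range(1, iters + 1):
        beta = 1.0 + 1.0 / k**2                  # penalty with summable variation
        H = beta * np.eye(d)                     # H_k = beta_k I, positive definite
        R = beta * (1.01 * norm_AtA * np.eye(n) - A.T @ A)  # PSD, cancels A^T A
        S = np.zeros((m, m))                     # degenerate metric is allowed
        # x-subproblem (26): its optimality condition is a linear system.
        x = np.linalg.solve(np.eye(n) + A.T @ H @ A + R,
                            p + A.T @ gam - A.T @ H @ (B @ y - b) + R @ x)
        # y-subproblem (27), using the freshly computed x.
        y = np.linalg.solve(np.eye(m) + B.T @ H @ B + S,
                            q + B.T @ gam - B.T @ H @ (A @ x - b) + S @ y)
        # Multiplier update (28) with stepsize theta.
        gam = gam - theta * (H @ (A @ x + B @ y - b))
    return x, y, gam

A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
B = -np.eye(3)
b = np.zeros(3)
p = np.array([1.0, 2.0])
q = np.array([0.0, 1.0, -1.0])
x, y, gam = vm_padmm_quadratic(A, B, b, p, q)
```

For this data the KKT system is linear, so the output can be compared with the multiplier solving $(AA^{*}+BB^{*})\gamma=b-Ap-Bq$ together with $x=p+A^{*}\gamma$ and $y=q+B^{*}\gamma$. With the chosen $R_{k}$, the matrix of the $x$-system is a multiple of the identity, which is the simplification described in the third remark.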

From now on in this section, the following conditions are assumed to hold:

Assumption 3.1.

For the sequences $\{R_{k}\}_{k\geq 1}$ , $\{S_{k}\}_{k\geq 1}$ and $\{H_{k}\}_{k\geq 1}$ generated by the VM-PADMM, there exist $R_{0}\in\mathcal{M}_{+}^{\mathcal{X}}$ , $S_{0}\in\mathcal{M}_{+}^{\mathcal{Y}}$ , $H_{0}\in\mathcal{M}_{++}^{\Gamma}$ , $0\leq C_{S}<\infty$ and, for each $k\geq 0$ , $c_{k}\in[0,1]$ such that $\{c_{k}\}_{k\geq 0}$ , $\{Q_{k,1}:=R_{k}\}_{k\geq 0}$ , $\{Q_{k,2}:=S_{k}\}_{k\geq 0}$ and $\{Q_{k,3}:=H_{k}\}_{k\geq 0}$ satisfy

[TABLE]

Analogously to condition (13), Assumption 3.1 implies the existence of $C_{P}>0$ such that $\{c_{k}\}_{k\geq 0}$ satisfies

[TABLE]

We mention that Assumption 3.1 is similar to Condition C in [19] but, contrary to the latter reference, none of the operators $R_{k}$ and $S_{k}$ is assumed to be positive definite.

Similarly to the previous section, the following quantity will be needed:

[TABLE]

where $(x_{0},y_{0},\gamma_{0})$ and $\theta$ are given in Step (0) of the VM-PADMM, $R_{0}\in\mathcal{M}_{+}^{\mathcal{X}}$ , $S_{0}\in\mathcal{M}_{+}^{\mathcal{Y}}$ and $H_{0}\in\mathcal{M}_{++}^{\Gamma}$ are given in Assumption 3.1, and $\Omega^{*}$ is defined in (25).

Next we present the two main results of this paper, whose proofs are given in Subsection 3.2.

Theorem 3.2.

(Pointwise convergence rate of the VM-PADMM)

Let $\{(x_{k},y_{k},\gamma_{k})\}$, $\{R_{k}\}$, $\{S_{k}\}$ and $\{H_{k}\}$ be generated by the VM-PADMM and let

[TABLE]

Let also $C_{P}$ and $d_{0}$ be as in (30) and (31), respectively. Then, there exists a parameter $\sigma_{\theta}\in(0,1)$ such that, for all $k\geq 1$ ,

[TABLE]

and, for some $i\leq k$ ,

[TABLE]

where $\tau_{\theta}:=(8(\sigma_{\theta}+\theta-1)\max\{1,\theta/{(2-\theta)}\})/\sqrt{\theta^{3}}$.

Remark. For a given tolerance $\rho>0$ , Theorem 3.2 guarantees the existence of triples $(x,y,\tilde{\gamma})$ , $(r_{x},r_{y},r_{\gamma})$ and operators $R\in\mathcal{M}_{+}^{\mathcal{X}}$ , $S\in\mathcal{M}_{+}^{\mathcal{Y}}$ and $H\in\mathcal{M}_{++}^{\Gamma}$ (generated by the VM-PADMM) such that

[TABLE]

in at most

[TABLE]

iterations, where $C_{P}$ and $d_{0}$ are as in (30) and (31), respectively. The triple $(x,y,\tilde{\gamma})$ in (35) can be seen as a $\rho$ -approximate solution of the KKT system (24) with residual $(r_{x},r_{y},r_{\gamma})$ .

Before presenting the ergodic convergence of the VM-PADMM, we need to introduce its associated ergodic sequences. Let $\{(x_{k},y_{k},\gamma_{k})\}$ be generated by the VM-PADMM, let $\{\tilde{\gamma}_{k}\}$ and $\{(r_{k,x},r_{k,y},r_{k,\gamma})\}$ be defined as in (32) and (33), respectively, and let the ergodic sequences associated with them be defined by

[TABLE]
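A minimal sketch of how ergodic iterates are typically formed is given below. It assumes that the ergodic sequences are uniform running averages of the iterates, which is the standard construction; the precise definitions in (37)–(39), in particular those of $\varepsilon^{a}_{k,x}$ and $\varepsilon^{a}_{k,y}$, are not reproduced, and the sample points are hypothetical.

```python
def ergodic_average(points):
    """Return the running uniform averages of a list of equal-length vectors."""
    averages = []
    total = [0.0] * len(points[0])
    for k, p in enumerate(points, start=1):
        total = [t + pi for t, pi in zip(total, p)]
        averages.append([t / k for t in total])
    return averages

# Hypothetical iterates x_1, x_2, x_3 in R^2.
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
x_avg = ergodic_average(xs)
```

The same helper would apply to the multiplier and residual sequences under the same averaging assumption.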

Theorem 3.3.

(Ergodic convergence rate of the VM-PADMM)

Let $\{R_{k}\}$, $\{S_{k}\}$ and $\{H_{k}\}$ be generated by the VM-PADMM and let $\{(x_{k}^{a},y_{k}^{a})\}$, $\{\tilde{\gamma}_{k}^{a}\}$, $\{(r^{a}_{k,x},r^{a}_{k,y},r^{a}_{k,\gamma})\}$ and $\{(\varepsilon^{a}_{k,x},\varepsilon^{a}_{k,y})\}$ be the ergodic sequences defined as in (37)–(39). Let also $C_{S}$, $C_{P}$, and $d_{0}$ be as in (29), (30) and (31), respectively. Then, there exists a parameter $\sigma_{\theta}\in(0,1)$ such that, for all $k\geq 1$, there hold $\varepsilon_{k,x}^{a},\,\varepsilon_{k,y}^{a}\geq 0$,

[TABLE]

and

[TABLE]

where $\mathcal{E}$ and $\widehat{\mathcal{E}}$ are as in Theorem 2.3 with $\sigma=\sigma_{\theta}$ and $\tau_{\theta}$ is as in Theorem 3.2.

Remark. Given tolerances $\rho,\varepsilon>0$ , Theorem 3.3 guarantees that there exist scalars $\varepsilon_{x},\varepsilon_{y}\geq 0$ , triples $(x,y,\tilde{\gamma})$ , $(r_{x},r_{y},r_{\gamma})$ and operators $R\in\mathcal{M}_{+}^{\mathcal{X}}$ , $S\in\mathcal{M}_{+}^{\mathcal{Y}}$ and $H\in\mathcal{M}_{++}^{\Gamma}$ (generated by the VM-PADMM) such that

[TABLE]

in at most

[TABLE]

iterations, where $C_{S},C_{P}$ and $d_{0}$ are as in Assumption 3.1, (30) and (31), respectively. Note that while the dependence on the tolerance $\rho$ in (44) is better than the corresponding one in (36) by a factor of $\mathcal{O}(\rho)$ , the inclusions in (43) are potentially weaker than the corresponding ones in (35). The triple $(x,y,\tilde{\gamma})$ in (43) can be seen as a $(\rho,\varepsilon)$ -approximate solution of the KKT system (24) with residual $(r_{x},r_{y},r_{\gamma})$ .
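The gap between the two complexity estimates can be illustrated with a rough sketch. It assumes only the orders stated in the paper, namely a residual decaying like $1/\sqrt{k}$ in the pointwise case and like $1/k$ in the ergodic case. The argument `const` is a hypothetical lumped constant standing in for the dependence on $C_{S}$, $C_{P}$, $d_{0}$ and $\theta$, and the dependence of (44) on $\varepsilon$ is ignored.

```python
import math

def iterations_pointwise(const, rho):
    """Smallest integer k with const / sqrt(k) <= rho (up to rounding)."""
    return math.ceil((const / rho) ** 2)

def iterations_ergodic(const, rho):
    """Smallest integer k with const / k <= rho (up to rounding)."""
    return math.ceil(const / rho)

# Hypothetical tolerance and constant.
rho = 1e-2
k_point = iterations_pointwise(1.0, rho)
k_erg = iterations_ergodic(1.0, rho)
```

For small $\rho$ the ergodic count is smaller by a factor of order $\rho$, which is the trade-off against the weaker inclusions noted in the remark.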

3.2 Proof of Theorems 3.2 and 3.3

The main goal of this subsection is to prove Theorems 3.2 and 3.3 by viewing the VM-PADMM as an instance of the VM-HPE framework of Section 2 for solving (9) with $T:\mathcal{Z}\rightrightarrows\mathcal{Z}$ defined by

[TABLE]

where $\mathcal{Z}:=\mathcal{X}\times\mathcal{Y}\times\Gamma$ is endowed with the usual inner product of vectors $z=(x,y,\gamma),z^{\prime}=(x^{\prime},y^{\prime},\gamma^{\prime})$ :

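$$\langle z,z^{\prime}\rangle:=\langle x,x^{\prime}\rangle+\langle y,y^{\prime}\rangle+\langle\gamma,\gamma^{\prime}\rangle.\qquad(49)$$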
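In finite dimensions, the usual inner product on a product space is the sum of the inner products of the blocks. The sketch below assumes Euclidean inner products on $\mathcal{X}$, $\mathcal{Y}$ and $\Gamma$ and represents a point $z=(x,y,\gamma)$ as a tuple of lists; both choices are illustrative assumptions.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def product_inner(z, z_prime):
    """Inner product on X x Y x Gamma: sum of the blockwise inner products."""
    return sum(dot(b, b_prime) for b, b_prime in zip(z, z_prime))

# Hypothetical points z = (x, y, gamma) and z' = (x', y', gamma').
z = ([1.0, 2.0], [3.0], [0.5])
z_prime = ([0.0, 1.0], [2.0], [4.0])
value = product_inner(z, z_prime)
```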

The desired results will then follow essentially from Theorems 2.2 and 2.3, and from the identity

$$T^{-1}(0)=\Omega^{*},\qquad(50)$$

where $T^{-1}(0)$ and $\Omega^{*}$ are the solution sets defined in (9) and (25), respectively. The following linear operators will be needed in our analysis:

[TABLE]

where $\{R_{k}\}_{k\geq 1}$ , $\{S_{k}\}_{k\geq 1}$ and $\{H_{k}\}_{k\geq 1}$ are generated by the VM-PADMM and $R_{0}\in\mathcal{M}_{+}^{\mathcal{X}}$ , $S_{0}\in\mathcal{M}_{+}^{\mathcal{Y}}$ , $H_{0}\in\mathcal{M}_{++}^{\Gamma}$ are given in Assumption 3.1.

We begin by presenting a preliminary technical result.

Proposition 3.4.

Let $\{(x_{k},y_{k},\gamma_{k})\}$ be generated by the VM-PADMM and let $\{\tilde{\gamma}_{k}\}$ be defined as in (32). Let also $\{M_{k}\}$ be defined as in (54). Then,

[TABLE]

Proof.

From the first order optimality conditions for (26) and (27), we obtain, respectively,

[TABLE]

which, combined with (32), yields

[TABLE]

On the other hand, (28) (and the assumption $H_{k}\in\mathcal{M}_{++}^{\Gamma}$ ) gives

[TABLE]

Using (54), (56) and (57) we obtain (55). ∎

The next lemma will allow us to use the main results of Section 2 for analyzing the nonasymptotic convergence of the VM-PADMM.

Lemma 3.5.

The sequence $\{M_{k}\}_{k\geq 0}$ defined in (54), the scalar $C_{S}$ and the sequence $\{c_{k}\}$ given in Assumption 3.1 satisfy condition (12) of Assumption 2.1.

Proof.

Note that the first condition in (29) is identical to the first one in (12). To finish the proof, note that the second condition in (29), which by Assumption 3.1 is assumed to hold for $\{R_{k}\}_{k\geq 0}$ , $\{S_{k}\}_{k\geq 0}$ and $\{H_{k}\}_{k\geq 0}$ , combined with the (block) diagonal structure of $M_{k}$ gives the second condition in (12) for $\{c_{k}\}_{k\geq 0}$ and $\{M_{k}\}_{k\geq 0}$ . ∎

The following two technical results will be used to prove that the VM-PADMM is an instance of the VM-HPE framework.

Lemma 3.6.

Let $\{(x_{k},y_{k},\gamma_{k})\}$, $\{S_{k}\}$ and $\{H_{k}\}$ be generated by the VM-PADMM and let $\{\tilde{\gamma}_{k}\}$ be defined as in (32). Let also $d_{0}$ be defined as in (31). Then, the following hold:

(a) for any $k\geq 1$, we have

[TABLE]

(b) we have

[TABLE]

(c) for any $t>0$ and $k\geq 2$, we have

[TABLE]

Proof.

(a) This item follows trivially from (28) and (32).

(b) First note that

[TABLE]

which combined with the property (7) yields, for all $z^{*}:=(x^{*},y^{*},\gamma^{*})\in\Omega^{*}$ ,

[TABLE]

Direct use of the above inequality and (54) yields

[TABLE]

where $z_{0}:=(x_{0},y_{0},\gamma_{0})$ and $z_{1}:=(x_{1},y_{1},\gamma_{1})$. On the other hand, from Proposition 3.4 and (54) with $k=1$, we have $r_{1}:=M_{1}(z_{0}-z_{1})\in T(\tilde{z}_{1})$, where $T$ is given in (48). Using this fact, (50) and the monotonicity of $T$, we obtain $\langle\tilde{z}_{1}-z^{*},r_{1}\rangle\geq 0$ for all $z^{*}=(x^{*},y^{*},\gamma^{*})\in\Omega^{*}$. Hence, from the latter inequality and Lemma A.1 with $(z,z_{+},\tilde{z})=(z_{0},z_{1},\tilde{z}_{1})$ and $M=M_{1}$, we have, for all $z^{*}=(x^{*},y^{*},\gamma^{*})\in\Omega^{*}$,

[TABLE]

Note now that letting $\tilde{z}_{1}:=(x_{1},y_{1},\tilde{\gamma}_{1})$ , it follows from (54), item (a) and some direct calculations that

[TABLE]

Moreover, using (54) with $k=1$ and item (a), we find

[TABLE]

Combining the previous two estimates, we obtain

[TABLE]

If $\theta\in(0,1]$ , then the last inequality implies that

[TABLE]

Now, if $\theta\in(1,(\sqrt{5}+1)/2)$ , we have

[TABLE]

where the second inequality is due to property (7), and the last inequality is due to (54) and the definitions of $z_{0},z_{1}$ and ${z}^{*}$. Hence, combining the last estimate with (59), we obtain

[TABLE]

Thus, it follows from (59), (62) and the last inequality that

[TABLE]

Since $M_{1}\preceq(1+c_{0})M_{0}\preceq 2M_{0}$ (see Assumption 3.1 and Lemma 3.5), the desired inequality follows from (58), (63) and the definition of $d_{0}$ in (31).

(c) Using the first order optimality condition for (27), (32) and item (a), we find, for every $k\geq 1$ ,

[TABLE]

For any $k\geq 2$ , using the above inclusion with $k\leftarrow k$ and $k\leftarrow k-1$ , the monotonicity of $\partial g$ and the property (6), we find

[TABLE]

where the last inequality is due to Proposition 1.2 and Assumption 3.1, and so the proof of the lemma follows. ∎

Lemma 3.7.

For every $\theta\in(0,(\sqrt{5}+1)/2)$ , there exists a parameter $\sigma_{\theta}\in(0,1)$ such that, for all $\sigma\in[\sigma_{\theta},1)$ , the matrix

[TABLE]

is symmetric positive definite, and

[TABLE]

Proof.

Since $M_{\theta}(\sigma)$ is symmetric, the proof is immediate by noting that for $\sigma=1$ and for every $\theta\in(0,(\sqrt{5}+1)/2)$, $M_{\theta}(\sigma)$ is positive definite and (64) trivially holds. ∎

Next we show that the VM-PADMM can be regarded as an instance of the VM-HPE framework.

Proposition 3.8.

Let $\{(x_{k},y_{k},\gamma_{k})\}$ be generated by the VM-PADMM and let $\{\tilde{\gamma}_{k}\}$ and $\{M_{k}\}$ be defined as in (32) and (54), respectively. Let also $d_{0}$ , $T$ , $\sigma_{\theta}$ and $\tau_{\theta}$ be as in (31), (48), Lemma 3.7, and Theorem 3.2, respectively. Define $z_{0}:=(x_{0},y_{0},\gamma_{0})$ , $\eta_{0}:=\tau_{\theta}d_{0}^{2}$ and, for all $k\geq 1$ ,

[TABLE]

Then, for all $k\geq 1$ ,

[TABLE]

As a consequence, the VM-PADMM falls within the VM-HPE framework (with input $z_{0}$, $\eta_{0}$ and $\sigma=\sigma_{\theta}$) for solving (9) with $T$ as in (48).

Proof.

First note that the inclusion in (67) follows from (48), (55) and the definitions of $z_{k}$ , $\tilde{z}_{k}$ and $r_{k}$ in (65). Now, using (49), (54), (65) and some direct calculations, we obtain

[TABLE]

Using the same reasoning and Lemma 3.6(a), we also find

[TABLE]

Hence, from Lemma 3.6(a) and some algebraic manipulations, we obtain

[TABLE]

which in turn, combined with (3.2) and (69), yields

[TABLE]

We will now consider two cases: $k=1$ and $k>1$ . In the first case, it follows from (3.2) with $k=1$ , Lemma 3.6(b), the first inequality in (64) with $\sigma=\sigma_{\theta}$ , and definitions of $\eta_{0}$ and $\eta_{1}$ that

[TABLE]

where the last inequality is due to $\sqrt{\theta}\leq 3/2$ . Hence, since $({2-3\sqrt{2}})/3\geq(2\sqrt{2}-4)/{\sqrt{2}}$ , inequality (67) for $k=1$ now follows from the second inequality in (64) with $\sigma=\sigma_{\theta}$ . On the other hand, assuming $k>1$ , from inequality (3.2), Lemma 3.6(c) with $t=\sqrt{2}$ , the first inequality in (64) with $\sigma=\sigma_{\theta}$ , and definition of $\{\eta_{k}\}$ in (66), we have

[TABLE]

Since $c_{k-1}\leq 1$ (see Assumption 3.1), we obtain from (64) with $\sigma=\sigma_{\theta}$ that the term inside the brackets is nonnegative. Hence, inequality (67) for $k>1$ now follows from the first statement of Lemma 3.7.

The last statement of the proposition follows directly from (67) and the definition of the VM-HPE framework. ∎

We are now ready to prove Theorems 3.2 and 3.3.

Proof of Theorem 3.2: Using Proposition 3.8 and Theorem 2.2, we conclude that, for every $k\geq 1$ , (33) holds and there exists $i\leq k$ such that

[TABLE]

where $\{M_{k}\}$ and $\{z_{k}\}$ are defined in (54) and (65), respectively. Hence, using Proposition 1.1, we obtain

[TABLE]

On the other hand, using Proposition 1.1 and the definition in (33), we find

[TABLE]

which, combined with (71) and (3.2), proves (34). ∎

Proof of Theorem 3.3: Combining Proposition 3.8 and Theorem 2.3, and taking into account that $r_{k}^{a}=(r_{k,\,x}^{a},r_{k,\,y}^{a},r_{k,\,\gamma}^{a})$ , we conclude that, for every $k\geq 1$ ,

[TABLE]

On the other hand, (33), (37) and (38) yield

[TABLE]

Additionally, (37), (38) and some algebraic manipulations give

[TABLE]

Hence, combining the identity in (74) with the last two displayed equations, we also find

[TABLE]

where the last equality is due to the definitions of $\varepsilon_{k,x}^{a}$ and $\varepsilon_{k,y}^{a}$ in (39). Therefore, the inequalities in (41) and (42) now follow from (73) and (74), respectively.

To finish the proof of the theorem, note that direct use of Theorem 1.3(b) (for $f$ and $g$), (33) and (37)–(39) gives $\varepsilon_{k,x}^{a},\,\varepsilon_{k,y}^{a}\geq 0$ and (40). ∎

Appendix A Proof of Theorems 2.2 and 2.3

We start by presenting the following two lemmas.

Lemma A.1.

For any $z^{*},z,z_{+},\tilde{z}\in\mathcal{Z}$ and $M\in\mathcal{M}^{\mathcal{Z}}_{+}$ , we have

[TABLE]

Proof.

Direct calculations yield

[TABLE]

∎
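Identities of this kind are easy to check numerically. One such identity, which follows by expanding the squares for any symmetric $M$, is $\|z^{*}-z\|_{M}^{2}-\|z^{*}-z_{+}\|_{M}^{2}=\|\tilde{z}-z\|_{M}^{2}-\|\tilde{z}-z_{+}\|_{M}^{2}+2\langle\tilde{z}-z^{*},M(z-z_{+})\rangle$. Whether this is exactly the form displayed in Lemma A.1 is an assumption; the sketch below only verifies the identity as written here, using a hypothetical singular positive semidefinite metric and random test vectors.

```python
import random

def mat_vec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in M]

def inner(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def sub(u, v):
    return [ui - vi for ui, vi in zip(u, v)]

def sq_norm(M, v):
    """Squared seminorm ||v||_M^2 = <v, M v>."""
    return inner(v, mat_vec(M, v))

def identity_gap(M, z_star, z, z_plus, z_tilde):
    """Left-hand side minus right-hand side of the identity in the lead-in."""
    lhs = sq_norm(M, sub(z_star, z)) - sq_norm(M, sub(z_star, z_plus))
    rhs = (sq_norm(M, sub(z_tilde, z)) - sq_norm(M, sub(z_tilde, z_plus))
           + 2.0 * inner(sub(z_tilde, z_star), mat_vec(M, sub(z, z_plus))))
    return lhs - rhs

# Degenerate metric M = a a^T: symmetric, positive semidefinite, rank one.
a = [1.0, -2.0, 0.5]
M = [[ai * aj for aj in a] for ai in a]

random.seed(0)
vectors = [[random.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(4)]
gap = identity_gap(M, *vectors)  # zero up to rounding error
```

The rank-one choice of `M` reflects the paper's point that the metrics are allowed to be degenerate.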

Lemma A.2.

Let $\{z_{k}\}$, $\{M_{k}\}$, $\{\tilde{z}_{k}\}$ and $\{\eta_{k}\}$ be generated by the VM-HPE framework. For every $k\geq 1$ and $z^{*}\in T^{-1}(0)$:

(a) we have

[TABLE]

(b) we have

[TABLE]

where $C_{P}$ and $M_{0}$ are as in (13) and Assumption 2.1, respectively.

Proof.

(a) From Lemma A.1 with $(z,z_{+},\tilde{z})=(z_{k-1},z_{k},\tilde{z}_{k})$ and $M=M_{k}$ , (10) and (11), we obtain

[TABLE]

Hence, (a) follows from the above inequality, the fact that $0\in T(z^{*})$ and $r_{k}\in T(\tilde{z}_{k})$ (see (10)), and the monotonicity of $T$ .

(b) Using (a), (8) and Assumption 2.1, we find

[TABLE]

Thus, the result follows by applying the above inequality recursively and by using (13). ∎

We are now ready to prove Theorem 2.2.

Proof of Theorem 2.2: First, note that the desired inclusion holds due to (10). Now, using (7) and (11), we obtain, respectively,

[TABLE]

Combining the above inequalities, we find

[TABLE]

which in turn, combined with Lemma A.2(b), yields

[TABLE]

for all $z^{*}\in T^{-1}(0)$ . Hence, (15) follows from Proposition 1.1, (10), (14), (75) and the fact that $\sum_{i=1}^{k}\,t_{i}\geq k\min_{i=1,\dots,k}\{t_{i}\}$ . ∎
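The last step of this proof turns a bound on a sum into a bound on its smallest term, which is where a pointwise rate of order $1/\sqrt{k}$ typically comes from: if $t_{i}\geq 0$ are squared residual norms with $\sum_{i=1}^{k}t_{i}\leq C$, then $\min_{i}t_{i}\leq C/k$, so the smallest residual norm is at most $\sqrt{C/k}$. A minimal sketch follows, with hypothetical values for the $t_{i}$.

```python
import math

def min_and_average(terms):
    """Return (smallest term, average of the terms) for a nonempty list."""
    return min(terms), sum(terms) / len(terms)

# Hypothetical squared residual norms t_1, ..., t_4.
squared_residuals = [4.0, 1.0, 0.25, 2.0]
smallest, average = min_and_average(squared_residuals)

# The best residual norm is bounded by the square root of the average.
best_residual = math.sqrt(smallest)
pointwise_bound = math.sqrt(average)
```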

Before proceeding to the proof of the ergodic convergence of the VM-HPE framework, let us first present an auxiliary result.

Proposition A.3.

Let $\{z_{k}\}$ , $\{M_{k}\}$ and $\{\eta_{k}\}$ be generated by the VM-HPE framework and consider $\{\tilde{z}_{k}^{a}\}$ and $\{\varepsilon_{k}^{a}\}$ as in (18). Then, for every $k\geq 1$ ,

[TABLE]

where $M_{0}$ and $\{c_{k}\}$ are given in Assumption 2.1.

Proof.

Using Lemma A.1 with $(z^{*},z,z_{+},\tilde{z})=(\tilde{z}^{a}_{k},z_{i-1},z_{i},\tilde{z}_{i})$ and $M=M_{i}$ , (10) and (11), we find, for every $i=1,\dots,k$ ,

[TABLE]

where the second inequality is due to the fact that $1-\sigma\geq 0$ . Hence, using Assumption 2.1 and simple calculations, we obtain

[TABLE]

Summing up the last inequality from $i=1$ to $i=k$ and using the definition of $\varepsilon_{k}^{a}$ in (18), we have

[TABLE]

which clearly gives (76). ∎

Proof of Theorem 2.3: Note first that the desired inclusion and the first inequality in (20) follow from (10), (18) and Theorem 1.3(a). Take $z^{*}\in T^{-1}(0)$ . Now, let us prove the second inequality in (20), which will follow by bounding the term in the right-hand side of (76). Note that, using the convexity of $\|\cdot\|_{M_{i-1}}^{2}$ , inequality (7) and (18), we find

[TABLE]

From (13), we have $M_{i-1}\preceq C_{P}M_{j}$ for all $j=1,\dots,k$ . Hence, using Proposition 1.2, inequality (11), Lemma A.2(b) and (14), we find

[TABLE]

On the other hand, using (7), $M_{i-1}\preceq C_{P}M_{j}$ for all $j=1,\dots,k$ , Proposition 1.2, Lemma A.2(b) and (14), we obtain

[TABLE]

It follows from inequalities (77)–(A) and the fact that $k\geq 1$ that

[TABLE]

which, combined with Proposition A.3 and the first condition in (12), yields

[TABLE]

Therefore, the second inequality in (20) now follows from the definition of $\widehat{\mathcal{E}}$ and simple calculations.

To finish the proof of the theorem, it remains to prove (19). Assume first that $k\geq 2$. Using (18) and simple calculations, we have

[TABLE]

From (13), we obtain $M_{1}\preceq C_{P}M_{k}$ and $M_{1}\preceq C_{P}M_{0}$ . Hence, it follows from Propositions 1.1 and 1.2 that

[TABLE]

Direct use of Proposition 1.1 yields

[TABLE]

The next step is to estimate the general term in the summation in (80). To do this, first note that, using Assumption 2.1, we find

[TABLE]

and so

[TABLE]

From (13) and the last inequality in (83), we obtain, respectively, $M_{i}\preceq C_{P}M_{k}$ and $L_{i}\preceq c_{i}(2+c_{i})M_{i}$. Hence, using Propositions 1.1 and 1.2, we have

[TABLE]

Again, from (13), we obtain $M_{i+1}\preceq C_{P}M_{k}$ and $M_{i+1}\preceq(1+c_{i})M_{i}$ , and consequently

[TABLE]

Hence, using (13) and (A)–(A), we find

[TABLE]

Finally, using the definition of $d_{0}$ in (14), (80)–(82), (A) and Lemma A.2(b), we conclude that

[TABLE]

which gives (19) for the case $k\geq 2$ . Note now that by (13), we have $M_{1}\preceq C_{P}M_{0}$ and so using Propositions 1.1 and 1.2, Lemma A.2(b), (14) and the second identity in (18) with $k=1$ , we find

[TABLE]

which in turn, combined with the fact that $C_{P}\geq 1$ , gives (19) for $k=1$ . ∎

Bibliography

[1] H. Attouch and M. Soueycatt. Augmented Lagrangian and proximal alternating direction methods of multipliers in Hilbert spaces. Applications to games, PDE's and control. Pac. J. Optim., 5(1):17–37, 2008.
[2] S. Banert, R. I. Bot, and E. R. Csetnek. Fixing and extending some recent results on the ADMM algorithm. Available on http://www.arxiv.org .
[3] B. He, H. Liu, Z. Wang, and X. Yuan. A strictly contractive Peaceman–Rachford splitting method for convex programming. SIAM J. Optim., 24(3):1011–1040, 2014.
[4] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn., 3(1):1–122, 2011.
[5] R. S. Burachik, A. N. Iusem, and B. F. Svaiter. Enlargement of monotone operators with applications to variational inequalities. Set-Valued Anal., 5(2):159–180, 1997.
[6] R. S. Burachik, C. A. Sagastizábal, and B. F. Svaiter. $\epsilon$-enlargements of maximal monotone operators: theory and applications. In Reformulation: nonsmooth, piecewise smooth, semismooth and smoothing methods (Lausanne, 1997), volume 22 of Appl. Optim., pages 25–43. Kluwer Acad. Publ., Dordrecht, 1999.
[7] A. Chambolle and T. Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis., 40(1):120–145, 2011.
[8] Y. Cui, X. Li, D. Sun, and K. C. Toh. On the convergence properties of a majorized ADMM for linearly constrained convex optimization problems with coupled objective functions. J. Optim. Theory Appl., 169(3):1013–1041, 2016.
