Alternating Direction Method of Multipliers with Variable Metric Indefinite Proximal Terms for Convex Optimization

Yan Gu; Nobuo Yamashita

arXiv:1906.12112·math.OC·July 1, 2019

TL;DR

This paper introduces a variable metric indefinite proximal ADMM for convex optimization, providing convergence conditions and a new BFGS-based proximal term that enhances algorithm speed and applicability.

Contribution

It develops a globally convergent variable metric indefinite proximal ADMM and proposes a novel BFGS-based indefinite proximal term.

Findings

01

The proposed method converges globally under certain conditions.

02

A new BFGS-based indefinite proximal term satisfies convergence criteria.

03

Numerical experiments show improved performance over fixed positive semidefinite proximal terms.

Equations

min {f (x) + g (y) ∣ A x + B y = b, x \in R^{n}, y \in R^{n}},

L_{β} (x, y, λ) := f (x) + g (y) - ⟨ λ, A x + B y - b ⟩ + \frac{β}{2} ∥ A x + B y - b ∥^{2},

(x^{k + 1}, y^{k + 1}) = \arg\min_{x, y} L_{β} (x, y, λ^{k}), \quad λ^{k + 1} = λ^{k} - β (A x^{k + 1} + B y^{k + 1} - b) .

x^{k + 1} = \arg\min_{x} L_{β} (x, y^{k}, λ^{k}) + \frac{1}{2} ∥ x - x^{k} ∥_{S}^{2},

y^{k + 1} = \arg\min_{y} L_{β} (x^{k + 1}, y, λ^{k}) + \frac{1}{2} ∥ y - y^{k} ∥_{T}^{2},

λ^{k + 1} = λ^{k} - α β (A x^{k + 1} + B y^{k + 1} - b),

T = τ r I - β B^{⊤} B \quad \text{with} \quad r > β ∥ B^{⊤} B ∥, \quad τ \in (0.75, 1) .

x^{k + 1} = \arg\min_{x} L_{β} (x, y^{k}, λ^{k}) + \frac{1}{2} ∥ x - x^{k} ∥_{S}^{2},

y^{k + 1} = \arg\min_{y} L_{β} (x^{k + 1}, y, λ^{k}) + \frac{1}{2} ∥ y - y^{k} ∥_{T_{k}}^{2},

λ^{k + 1} = λ^{k} - β (A x^{k + 1} + B y^{k + 1} - b),

ξ_{x}^{*} - A^{⊤} λ^{*} = 0,

ξ_{y}^{*} - B^{⊤} λ^{*} = 0,

A x^{*} + B y^{*} - b = 0,

ξ_{x}^{*} \in \partial f (x^{*}), ξ_{y}^{*} \in \partial g (y^{*}) .

(x - x^{k + 1})^{⊤} (ξ_{x}^{k + 1} - A^{⊤} λ^{k} + β A^{⊤} (A x^{k + 1} + B y^{k} - b) + S (x^{k + 1} - x^{k})) \geq 0, \forall x \in R^{n},

(y - y^{k + 1})^{⊤} (ξ_{y}^{k + 1} - B^{⊤} λ^{k} + β B^{⊤} (A x^{k + 1} + B y^{k + 1} - b) + T_{k} (y^{k + 1} - y^{k})) \geq 0, \forall y \in R^{n},

- A^{⊤} λ^{k} + β A^{⊤} (A x^{k + 1} - b) = - A^{⊤} λ^{k + 1} - β A^{⊤} B y^{k + 1}

- B^{⊤} λ^{k} + β B^{⊤} (A x^{k + 1} + B y^{k + 1} - b) = - B^{⊤} λ^{k + 1} .

(x - x^{k + 1})^{⊤} (ξ_{x}^{k + 1} - A^{⊤} λ^{k + 1} + β A^{⊤} B (y^{k} - y^{k + 1}) + S (x^{k + 1} - x^{k})) \geq 0, \forall x \in R^{n},

(y - y^{k + 1})^{⊤} (ξ_{y}^{k + 1} - B^{⊤} λ^{k + 1} + T_{k} (y^{k + 1} - y^{k})) \geq 0, \forall y \in R^{n} .

u=\left(\begin{array}[]{c}x\\ y\end{array}\right),\;w=\left(\begin{array}[]{c}x\\ y\\ \lambda\end{array}\right).

(x - \hat{x})^{⊤} (ξ_{x} - \hat{ξ}_{x}) \geq ∥ x - \hat{x} ∥_{Σ_{f}}^{2},

(y - \hat{y})^{⊤} (ξ_{y} - \hat{ξ}_{y}) \geq ∥ y - \hat{y} ∥_{Σ_{g}}^{2} .

\Sigma=\left(\begin{array}[]{c c}\Sigma_{f}&0\\ 0&\Sigma_{g}\end{array}\right).

P_{k}=\left(\begin{array}[]{c c}S&0\\ 0&T_{k}\end{array}\right),D_{k}=\left(\begin{array}[]{c c c}S&0&0\\ 0&T_{k}&0\\ 0&0&\frac{1}{\beta}I\end{array}\right),\mathrm{and}\;G_{k}=\left(\begin{array}[]{c c c}S+\Sigma_{f}&0&0\\ 0&T_{k}+\Sigma_{g}+\beta B^{\top}B&0\\ 0&0&\frac{1}{\beta}I\end{array}\right),

Γ_{k} = T_{+}^{k} + T_{-}, \forall k,

Λ_{k} = - \frac{γ_{k - 1}}{2} T_{+}^{k} - 2 T_{-} + Σ_{g}, \forall k,

Δ_{k} = T_{k} + \frac{3}{2} Σ_{g} - \frac{γ_{k - 1}}{2} T_{+}^{k} - 2 T_{-} + (\frac{3}{4} - \frac{1}{2} c) β B^{⊤} B, \forall k,

(w^{k + 1} - w^{*})^{⊤} D_{k} (w^{k + 1} - w^{k}) + ∥ u^{k + 1} - u^{*} ∥_{Σ}^{2} \leq β (A x^{k + 1} - A x^{*})^{⊤} (B y^{k + 1} - B y^{k}) .

(x^{k + 1} - x^{*})^{⊤} (ξ_{x}^{k + 1} - A^{⊤} λ^{k + 1} + β A^{⊤} B (y^{k} - y^{k + 1}) + S (x^{k + 1} - x^{k})) \leq 0,

(y^{k + 1} - y^{*})^{⊤} (ξ_{y}^{k + 1} - B^{⊤} λ^{k + 1} + T_{k} (y^{k + 1} - y^{k})) \leq 0,

(x^{k + 1} - x^{*})^{⊤} S (x^{k + 1} - x^{k}) + (x^{k + 1} - x^{*})^{⊤} (ξ_{x}^{k + 1} - A^{⊤} λ^{k + 1}) \leq β (A x^{k + 1} - A x^{*})^{⊤} (B y^{k + 1} - B y^{k})

(y^{k + 1} - y^{*})^{⊤} T_{k} (y^{k + 1} - y^{k}) + (y^{k + 1} - y^{*})^{⊤} (ξ_{y}^{k + 1} - B^{⊤} λ^{k + 1}) \leq 0.

(x^{k + 1} - x^{*})^{⊤} (ξ_{x}^{k + 1} - ξ_{x}^{*}) \geq ∥ x^{k + 1} - x^{*} ∥_{Σ_{f}}^{2},

(y^{k + 1} - y^{*})^{⊤} (ξ_{y}^{k + 1} - ξ_{y}^{*}) \geq ∥ y^{k + 1} - y^{*} ∥_{Σ_{g}}^{2},

Full text

Alternating Direction Method of Multipliers with Variable Metric Indefinite Proximal Terms for Convex Optimization

Yan Gu∗ and Nobuo Yamashita∗

∗Graduate School of Informatics, Kyoto University, Kyoto 6068501, Japan.

(March 15, 2024)

Abstract

This paper studies a proximal alternating direction method of multipliers (ADMM) with variable metric indefinite proximal terms for linearly constrained convex optimization problems. The proximal ADMM plays an important role in many application areas, since its subproblems are easy to solve. It has recently been reported that the proximal ADMM with a certain fixed indefinite proximal term is faster than that with a positive semidefinite term, while still retaining the global convergence property. On the other hand, Gu and Yamashita studied a variable metric semidefinite proximal ADMM whose proximal term is generated by the BFGS update. They reported that a slightly indefinite matrix also makes the algorithm work well in their numerical experiments. Motivated by this observation, we consider a variable metric indefinite proximal ADMM and give sufficient conditions on the proximal terms for global convergence. Moreover, we propose a new indefinite proximal term based on the BFGS update that can satisfy the conditions for global convergence.

Keywords: alternating direction method of multipliers, variable metric indefinite proximal term, BFGS update, global convergence, convex optimization

1 Introduction

We consider the following convex composite optimization problem:

$$\min\ \{f(x)+g(y)\mid Ax+By=b,\ x\in{\mathbb{R}}^{n},\ y\in{\mathbb{R}}^{n}\},\tag{1.1}$$

where $f\colon{\mathbb{R}}^{n}\rightarrow{\mathbb{R}}\cup\{\infty\}$ and $g\colon{\mathbb{R}}^{n}\rightarrow{\mathbb{R}}\cup\{\infty\}$ are proper convex functions, $A\in{\mathbb{R}}^{m\times n}$, $B\in{\mathbb{R}}^{m\times n}$ and $b\in{\mathbb{R}}^{m}$. Various practical problems in science and engineering, such as machine learning [33, 43], total variation denoising [38] and statistics [39], can be formulated as Problem (1.1). Typically, $f$ is a loss function and $g$ is a structured regularization term.

The augmented Lagrangian function of (1.1) is defined as

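$${\cal L}_{\beta}(x,y,\lambda):=f(x)+g(y)-\langle\lambda,Ax+By-b\rangle+\frac{\beta}{2}\|Ax+By-b\|^{2},\tag{1.2}$$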

where $\lambda\in{\mathbb{R}}^{m}$ is the Lagrange multiplier for the linear constraints $Ax+By=b$ in (1.1), and $\beta$ is a positive penalty parameter. Note that ${\cal L}_{\beta}\colon{\mathbb{R}}^{n}\times{\mathbb{R}}^{n}\times{\mathbb{R}}^{m}\rightarrow{\mathbb{R}}$.
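As a small concrete check of (1.2), the Python sketch below evaluates ${\cal L}_{\beta}$ for user-supplied callables $f$ and $g$. The function name `augmented_lagrangian` and the example data ($f(x)=\frac{1}{2}\|x\|^{2}$, $g(y)=\|y\|_{1}$, constraint $x-y=0$) are illustrative assumptions, not part of the paper.

```python
import numpy as np

def augmented_lagrangian(f, g, A, B, b, x, y, lam, beta):
    """Evaluate L_beta(x, y, lam) as in (1.2).

    f and g are callables returning scalars; A and B have shape (m, n),
    while b and lam have length m.
    """
    r = A @ x + B @ y - b  # constraint residual Ax + By - b
    return f(x) + g(y) - lam @ r + 0.5 * beta * (r @ r)

# Illustrative data: f(x) = 0.5*||x||^2, g(y) = ||y||_1, constraint x - y = 0
A, B, b = np.eye(2), -np.eye(2), np.zeros(2)
f = lambda x: 0.5 * x @ x
g = lambda y: np.abs(y).sum()
val = augmented_lagrangian(f, g, A, B, b,
                           np.array([1.0, 0.0]), np.zeros(2), np.ones(2), beta=2.0)
```

Here the residual is $(1,0)$, so ${\cal L}_{\beta}=0.5+0-1+1=0.5$; at any feasible point the last two terms vanish and ${\cal L}_{\beta}=f(x)+g(y)$.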

A number of efficient first-order algorithms have been developed for problem (1.1), including operator splitting methods [1, 3, 8, 11, 13, 35], gradient methods [37, 40, 41] and primal-dual methods [5, 7, 17]. One classical approach to problem (1.1) is the augmented Lagrangian method (ALM), which generates the iterates

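$$\begin{cases}(x^{k+1},y^{k+1})=\arg\min_{x,y}\ {\cal L}_{\beta}(x,y,\lambda^{k}),\\ \lambda^{k+1}=\lambda^{k}-\beta(Ax^{k+1}+By^{k+1}-b).\end{cases}\tag{1.3}$$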

In the ALM, $x^{k+1}$ and $y^{k+1}$ must be updated simultaneously, which ignores the separability of the objective function. In general, the joint minimization problem (1.3) is difficult to solve exactly, or even approximately to high accuracy. Exploiting the separability of the objective function reduces this difficulty; the classical ADMM is one such method, and it solves problem (1.1) efficiently [22, 20]. For convergence analyses of the classical ADMM, see [22, 20, 19, 4, 14].

Fazel et al. [18] proposed a more convenient semi-proximal ADMM, which adds proximal terms to the subproblems and takes the following form:

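$$x^{k+1}=\arg\min_{x}\ {\cal L}_{\beta}(x,y^{k},\lambda^{k})+\frac{1}{2}\|x-x^{k}\|_{S}^{2},\tag{1.4a}$$
$$y^{k+1}=\arg\min_{y}\ {\cal L}_{\beta}(x^{k+1},y,\lambda^{k})+\frac{1}{2}\|y-y^{k}\|_{T}^{2},\tag{1.4b}$$
$$\lambda^{k+1}=\lambda^{k}-\alpha\beta(Ax^{k+1}+By^{k+1}-b),\tag{1.4c}$$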

where $\alpha\in(0,(1+\sqrt{5})/2)$ and $S,T\succeq 0$. For a vector $z\in{\mathbb{R}}^{n}$ and a positive semidefinite matrix $G$, the norm $\|\cdot\|_{G}$ is defined by $\|z\|_{G}=\sqrt{z^{\top}Gz}$. In this paper, even if $G\in{\mathbb{R}}^{n\times n}$ is not positive semidefinite, we write $\|z\|^{2}_{G}=z^{\top}Gz$ for simplicity.

The semi-proximal ADMM covers the classical ADMM when $S=T=0$. When $S$ and $T$ are positive definite and $\alpha=1$, it reduces to the proximal ADMM proposed by Eckstein [12]. The proximal ADMM has the advantage that its subproblems are easy to solve, and it can also efficiently handle multi-block convex optimization problems, as in the block-wise ADMM [31]. See [9, 30, 18, 42] for more details on the semi-proximal ADMM.

It is well known that the global convergence of the semi-proximal ADMM (1.4c) is relatively easy to prove; however, its numerical performance is not always satisfactory. The paper [10] mentioned that the proximal matrix $T$ in (1.4b) could be indefinite if $\alpha\in(0,1)$, though it gave no further discussion of the theoretical properties. Li et al. [34] then proved its global convergence. He et al. [29] proposed a linearized version of ADMM with a positive-indefinite proximal term. They considered the case $S=0$ and $\alpha=1$ in (1.4c), and chose the proximal matrix $T$ as

$$T=\tau rI-\beta B^{\top}B\quad\text{with}\quad r>\beta\|B^{\top}B\|,\quad\tau\in(0.75,1).$$

This proximal matrix $T$ is not necessarily positive semidefinite. Compared with $\tau=1$, which makes $T$ positive definite, a smaller value $\tau\in(0.75,1)$ still ensures convergence and also gives better numerical performance.
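The following sketch, assuming NumPy, builds the fixed proximal matrix $T=\tau rI-\beta B^{\top}B$ of [29] for random data and checks the two facts that matter: with $\tau<1$ and $r$ only slightly above $\beta\|B^{\top}B\|$, $T$ has a negative eigenvalue, yet $T+\beta B^{\top}B=\tau rI$ is positive definite, so the quadratic part of the $y$-subproblem stays strongly convex. The function name and the margin factor 1.01 used to pick $r$ are arbitrary illustrative choices.

```python
import numpy as np

def indefinite_proximal_T(B, beta, tau, margin=1.01):
    """Return (T, r) with T = tau*r*I - beta*B^T B and r = margin*beta*||B^T B||_2.

    Any margin > 1 gives r > beta*||B^T B|| as required; 1.01 is arbitrary.
    """
    BtB = B.T @ B
    r = margin * beta * np.linalg.norm(BtB, 2)  # spectral norm = largest eigenvalue of B^T B
    return tau * r * np.eye(B.shape[1]) - beta * BtB, r

rng = np.random.default_rng(0)
B = rng.standard_normal((8, 5))
beta, tau = 1.0, 0.8
T, r = indefinite_proximal_T(B, beta, tau)
# Smallest eigenvalue of T is (tau*margin - 1)*beta*||B^T B|| < 0 because tau*margin < 1
min_eig_T = np.linalg.eigvalsh(T).min()
# T + beta*B^T B equals tau*r*I, so the y-subproblem's quadratic part is tau*r/2 * ||y||^2
min_eig_shifted = np.linalg.eigvalsh(T + beta * B.T @ B).min()
```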

How to choose the proximal term is another important research topic for ADMM. The proximal term is usually chosen as a constant matrix. He et al. [27] extended the framework by allowing the parameter $\beta$ and the proximal terms $T$ and $S$ to vary over the iterations, with $T$ and $S$ replaced by bounded sequences of positive definite matrices $\{T_{k}\}$ and $\{S_{k}\}$. The resulting method is a variable metric proximal ADMM, which is also closely related to the inexact ADMM [13, 6, 27, 44, 15, 16]. The convergence of such methods has been studied in [36, 2, 23], but how to select the sequence $\{T_{k}\}$ well has not been addressed.

Quite recently, Gu and Yamashita [25] proposed constructing a variable positive semidefinite sequence $\{T_{k}\}$ with $T_{k}=B_{k}-\nabla^{2}_{xx}{\cal L}_{\beta}(x,y,\lambda)$ when $f$ is quadratic. In this case, $M=\nabla^{2}_{xx}{\cal L}_{\beta}(x,y,\lambda)$ is a constant matrix, and $B_{k}$ is generated via the BFGS update with respect to $M$ at every iteration. Gu and Yamashita [26] further extended this proximal ADMM to more general convex optimization problems, with the proximal term generated by the Broyden family update. In these ADMMs, the proximal terms $T_{k}$ contain second-order information about the augmented Lagrangian function. The papers [25, 26] report numerical results for LASSO and L1-regularized logistic regression, showing that these algorithms obtain a solution faster than the indefinite proximal ADMM with a fixed proximal term. Another interesting observation in [25, 26] is that a variable indefinite sequence generated via the BFGS update also performs well.

Inspired by the variable metric semi-proximal ADMM [25, 26] and the indefinite proximal ADMM [29], we consider ADMM with a sequence of indefinite proximal matrices, which we call the variable metric indefinite proximal ADMM (VMIP-ADMM). Throughout our discussion, we fix the step size $\alpha$ in (1.4c) to 1, as in [29]; this is good enough for such methods in practice and simplifies the convergence analysis.

We now introduce the whole update scheme of the VMIP-ADMM:

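$$x^{k+1}=\arg\min_{x}\ {\cal L}_{\beta}(x,y^{k},\lambda^{k})+\frac{1}{2}\|x-x^{k}\|_{S}^{2},\tag{1.6a}$$
$$y^{k+1}=\arg\min_{y}\ {\cal L}_{\beta}(x^{k+1},y,\lambda^{k})+\frac{1}{2}\|y-y^{k}\|_{T_{k}}^{2},\tag{1.6b}$$
$$\lambda^{k+1}=\lambda^{k}-\beta(Ax^{k+1}+By^{k+1}-b),\tag{1.6c}$$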

where $S$ is a fixed positive semidefinite matrix and $T_{k}$ is possibly indefinite. VMIP-ADMM unifies several existing ADMMs:

•

If $S=0$ and $T_{k}\equiv 0$, VMIP-ADMM reduces to the classical ADMM;

•

If $S$ and $T_{k}\equiv T$ are positive semidefinite matrices, VMIP-ADMM becomes the semi-proximal ADMM (1.4c) with $\alpha=1$;

•

If $\{T_{k}\}$ is a positive semidefinite sequence, that is, $T_{k}\succeq 0$ for all $k$, VMIP-ADMM becomes the variable metric semi-proximal ADMM;

•

If $S=0$ and $T_{k}\equiv T$ is a positive-indefinite matrix, VMIP-ADMM covers the indefinite proximal ADMM proposed in [29].
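To make the scheme (1.6a)-(1.6c) concrete, the Python sketch below runs it on a toy problem in which both $f$ and $g$ are strongly convex quadratics, so each subproblem is a single linear solve. The quadratic data, the function name `vmip_admm_quadratic`, the hypothetical callback `prox_T(k)` returning $T_k$, and the iteration count are illustrative assumptions; the example uses the fixed indefinite choice $T=\tau rI-\beta B^{\top}B$ of [29] with $S=0$.

```python
import numpy as np

def vmip_admm_quadratic(P, p, Q, q, A, B, b, beta, S, prox_T, iters=500):
    """Sketch of the VMIP-ADMM iteration (1.6a)-(1.6c) for quadratic f and g.

    Assumes f(x) = 0.5 x'Px + p'x and g(y) = 0.5 y'Qy + q'y. prox_T(k) is a
    hypothetical callback returning the (possibly indefinite) matrix T_k.
    """
    x, y, lam = np.zeros(A.shape[1]), np.zeros(B.shape[1]), np.zeros(A.shape[0])
    for k in range(iters):
        # (1.6a): Px + p - A'lam + beta*A'(Ax + By^k - b) + S(x - x^k) = 0
        rhs_x = -p + A.T @ lam - beta * A.T @ (B @ y - b) + S @ x
        x_new = np.linalg.solve(P + beta * A.T @ A + S, rhs_x)
        # (1.6b): needs Q + beta*B'B + T_k nonsingular (cf. Condition 2.1(d))
        T_k = prox_T(k)
        rhs_y = -q + B.T @ lam - beta * B.T @ (A @ x_new - b) + T_k @ y
        y_new = np.linalg.solve(Q + beta * B.T @ B + T_k, rhs_y)
        # (1.6c): multiplier update with unit step size
        lam = lam - beta * (A @ x_new + B @ y_new - b)
        x, y = x_new, y_new
    return x, y, lam

# Illustrative data: strongly convex quadratics and the fixed indefinite T of [29]
rng = np.random.default_rng(1)
n, m, beta, tau = 4, 3, 1.0, 0.9
A, B = rng.standard_normal((m, n)), rng.standard_normal((m, n))
b, p, q = rng.standard_normal(m), rng.standard_normal(n), rng.standard_normal(n)
P, Q = 2.0 * np.eye(n), np.eye(n)
BtB = B.T @ B
r = 1.01 * beta * np.linalg.norm(BtB, 2)
T_fixed = tau * r * np.eye(n) - beta * BtB  # indefinite since tau * 1.01 < 1
x, y, lam = vmip_admm_quadratic(P, p, Q, q, A, B, b, beta, np.zeros((n, n)),
                                lambda k: T_fixed, iters=5000)
```

For quadratic data the limit can be compared against the KKT point obtained from one linear solve; a variable sequence $\{T_k\}$ only changes what `prox_T` returns.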

We present sufficient conditions on $\{T_{k}\}$ for the global convergence of VMIP-ADMM. The proof follows the analysis technique of Gu et al. [24], which splits a constant indefinite term $T$ into two positive semidefinite parts as $T=T_{+}-T_{-}$. Moreover, we construct the indefinite term $T_{k}$ via the BFGS update. For the special case in which the $y$-subproblems (1.6b) are unconstrained quadratic programming problems, we extend a useful theorem in [25]. We construct $T_{k}$ as $T_{k}=B_{k}-M$, where $M$ is the Hessian matrix of the augmented Lagrangian function (1.2) with respect to $y$ and $B_{k}$ is generated by the BFGS update with respect to $\tau M$, $\tau<1$. We also show that this construction of $T_{k}$ satisfies the above conditions for global convergence when $\tau\in(0.75,1)$.

The rest of the paper is organized as follows. In Section 2, we first give notation and preliminaries for the subsequent analysis, and then present sufficient conditions on the proximal matrices $\{T_{k}\}$ for global convergence. In Section 3, we discuss choices of the proximal matrix $T_{k}$ that guarantee global convergence, and show how to determine the value of $\tau$. Section 4 gives conclusions and future work.

2 Global convergence of the variable metric indefinite proximal ADMM

In this section, we show the global convergence of the variable metric indefinite proximal ADMM (1.6c) (VMIP-ADMM) for problem (1.1). To this end, we first present optimality conditions of problem (1.1) and some useful properties which will be frequently used in our analysis. Then we give sufficient conditions on $\{T_{k}\}$ under which VMIP-ADMM converges globally.

2.1 Optimality conditions for problem (1.1)

Let $\Omega={\mathbb{R}}^{n}\times{\mathbb{R}}^{n}\times{\mathbb{R}}^{m}.$ The KKT conditions of problem (1.1) are written as:

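$$\xi_{x}^{*}-A^{\top}\lambda^{*}=0,\tag{2.1a}$$
$$\xi_{y}^{*}-B^{\top}\lambda^{*}=0,\tag{2.1b}$$
$$Ax^{*}+By^{*}-b=0,\tag{2.1c}$$
$$\xi_{x}^{*}\in\partial f(x^{*}),\quad\xi_{y}^{*}\in\partial g(y^{*}).\tag{2.1d}$$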

Let $\Omega^{*}$ be the set of points $(x^{*},y^{*},\lambda^{*})$ satisfying the KKT conditions (2.1a)-(2.1d).
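The KKT conditions (2.1a)-(2.1d) translate directly into a residual that is convenient for monitoring iterates. The sketch below assumes $f$ and $g$ are differentiable, so the subgradients $\xi_{x}^{*}$ and $\xi_{y}^{*}$ are simply gradients; the function name `kkt_residual` and the quadratic test data, built so that a chosen point is a KKT point, are hypothetical.

```python
import numpy as np

def kkt_residual(grad_f, grad_g, A, B, b, x, y, lam):
    """Largest violation of the KKT conditions (2.1a)-(2.1c).

    Assumes f and g are differentiable, so the subgradients in (2.1d)
    are grad_f(x) and grad_g(y).
    """
    r_x = grad_f(x) - A.T @ lam   # (2.1a)
    r_y = grad_g(y) - B.T @ lam   # (2.1b)
    r_p = A @ x + B @ y - b       # (2.1c)
    return max(np.abs(r_x).max(), np.abs(r_y).max(), np.abs(r_p).max())

# Build a KKT point by construction for quadratic f and g (illustrative data)
rng = np.random.default_rng(2)
m, n = 3, 4
A, B = rng.standard_normal((m, n)), rng.standard_normal((m, n))
x_s, y_s, lam_s = rng.standard_normal(n), rng.standard_normal(n), rng.standard_normal(m)
b = A @ x_s + B @ y_s
p = A.T @ lam_s - x_s            # f(x) = 0.5||x||^2 + p'x, so grad f(x_s) = A' lam_s
q = B.T @ lam_s - 2.0 * y_s      # g(y) = ||y||^2 + q'y,    so grad g(y_s) = B' lam_s
grad_f = lambda x: x + p
grad_g = lambda y: 2.0 * y + q
res_star = kkt_residual(grad_f, grad_g, A, B, b, x_s, y_s, lam_s)
res_off = kkt_residual(grad_f, grad_g, A, B, b, x_s + 0.1, y_s, lam_s)
```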

Throughout this paper, we make the following assumption.

Assumption 2.1.

The set $\Omega^{*}$ of KKT points is non-empty.

The optimality conditions of subproblems (1.6a) and (1.6b) are, respectively,

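$$(x-x^{k+1})^{\top}\big(\xi_{x}^{k+1}-A^{\top}\lambda^{k}+\beta A^{\top}(Ax^{k+1}+By^{k}-b)+S(x^{k+1}-x^{k})\big)\geq 0,\quad\forall x\in{\mathbb{R}}^{n},$$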

and

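$$(y-y^{k+1})^{\top}\big(\xi_{y}^{k+1}-B^{\top}\lambda^{k}+\beta B^{\top}(Ax^{k+1}+By^{k+1}-b)+T_{k}(y^{k+1}-y^{k})\big)\geq 0,\quad\forall y\in{\mathbb{R}}^{n},$$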

where $\xi_{x}^{k+1}\in\partial f(x^{k+1})$ and $\xi_{y}^{k+1}\in\partial g(y^{k+1})$ .

Since $\lambda^{k+1}=\lambda^{k}-\beta(Ax^{k+1}+By^{k+1}-b)$ from (1.6c), we have

$$-A^{\top}\lambda^{k}+\beta A^{\top}(Ax^{k+1}-b)=-A^{\top}\lambda^{k+1}-\beta A^{\top}By^{k+1}$$

and

$$-B^{\top}\lambda^{k}+\beta B^{\top}(Ax^{k+1}+By^{k+1}-b)=-B^{\top}\lambda^{k+1}.$$

Then the above optimality conditions can be written as

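$$(x-x^{k+1})^{\top}\big(\xi_{x}^{k+1}-A^{\top}\lambda^{k+1}+\beta A^{\top}B(y^{k}-y^{k+1})+S(x^{k+1}-x^{k})\big)\geq 0,\quad\forall x\in{\mathbb{R}}^{n},$$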

and

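$$(y-y^{k+1})^{\top}\big(\xi_{y}^{k+1}-B^{\top}\lambda^{k+1}+T_{k}(y^{k+1}-y^{k})\big)\geq 0,\quad\forall y\in{\mathbb{R}}^{n}.$$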
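For completeness, the substitution behind the rewritten optimality conditions can be spelled out; it uses only the multiplier update $\lambda^{k+1}=\lambda^{k}-\beta(Ax^{k+1}+By^{k+1}-b)$ from (1.6c):

```latex
\begin{aligned}
-A^{\top}\lambda^{k}+\beta A^{\top}(Ax^{k+1}+By^{k}-b)
&= -A^{\top}\lambda^{k}+\beta A^{\top}(Ax^{k+1}+By^{k+1}-b)+\beta A^{\top}B(y^{k}-y^{k+1})\\
&= -A^{\top}\bigl(\lambda^{k}-\beta(Ax^{k+1}+By^{k+1}-b)\bigr)+\beta A^{\top}B(y^{k}-y^{k+1})\\
&= -A^{\top}\lambda^{k+1}+\beta A^{\top}B(y^{k}-y^{k+1}),\\[4pt]
-B^{\top}\lambda^{k}+\beta B^{\top}(Ax^{k+1}+By^{k+1}-b)
&= -B^{\top}\bigl(\lambda^{k}-\beta(Ax^{k+1}+By^{k+1}-b)\bigr) = -B^{\top}\lambda^{k+1}.
\end{aligned}
```

The $x$-condition keeps the extra term $\beta A^{\top}B(y^{k}-y^{k+1})$ because the $x$-subproblem (1.6a) uses $y^{k}$, whereas the $y$-subproblem (1.6b) already uses $x^{k+1}$.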

2.2 Notations and Conditions on $\{T_{k}\}$

We use the following notations throughout this paper:

$$u=\begin{pmatrix}x\\ y\end{pmatrix},\quad w=\begin{pmatrix}x\\ y\\ \lambda\end{pmatrix}.$$

Since the subdifferential mappings of the closed proper convex functions $f$ and $g$ are maximal monotone, there exist two positive semidefinite matrices $\Sigma_{f}$ and $\Sigma_{g}$ such that for all $x,\hat{x}\in{\mathbb{R}}^{n}$ , $\xi_{x}\in\partial f(x)$ , and $\hat{\xi}_{x}\in\partial f(\hat{x})$ ,

$$(x-\hat{x})^{\top}(\xi_{x}-\hat{\xi}_{x})\geq\|x-\hat{x}\|_{\Sigma_{f}}^{2},$$

and for all $y,\hat{y}\in{\mathbb{R}}^{n}$ , $\xi_{y}\in\partial g(y)$ , and $\hat{\xi}_{y}\in\partial g(\hat{y})$ ,

$$(y-\hat{y})^{\top}(\xi_{y}-\hat{\xi}_{y})\geq\|y-\hat{y}\|_{\Sigma_{g}}^{2}.$$

Let $\Sigma\in{\mathbb{R}}^{2n\times 2n}$ denote

$$\Sigma=\begin{pmatrix}\Sigma_{f}&0\\ 0&\Sigma_{g}\end{pmatrix}.$$

We first give the conditions for $S$ and the indefinite proximal sequence $\{T_{k}\}$ to guarantee the global convergence.

Condition 2.1.

The matrix $S$ in (1.6a) satisfies

(a)

$S+\frac{1}{2}\Sigma_{f}\succeq 0$ ;

(b)

$S+\Sigma_{f}+\beta A^{\top}A\succ 0$ .

Moreover, for the sequence $\{T_{k}\}$ generated in (1.6c), there exist a nonnegative sequence $\{\gamma_{k}\}$, a sequence of positive semidefinite matrices $\{T_{+}^{k}\}$ and a fixed positive semidefinite matrix $T_{-}$ such that

(c)

$T_{k}=T_{+}^{k}-T_{-}$ for all $k$;

(d)

$T_{k}+\Sigma_{g}+\beta B^{\top}B\succ 0$ for all $k$;

(e)

$\frac{1}{1+\gamma_{k}}T_{+}^{k}\preceq T_{+}^{k+1}\preceq(1+\gamma_{k})T_{+}^{k}$ for all $k\geq 0$, and $\sum\limits_{k=0}^{\infty}\gamma_{k}<\infty$;

(f)

$T_{k+1}+\Sigma_{g}+\beta B^{\top}B\preceq(1+\gamma_{k})(T_{k}+\Sigma_{g}+\beta B^{\top}B)$ for all $k$;

(g)

there exists $c\in(0,0.5)$ such that $T_{k}+\frac{3}{2}\Sigma_{g}-\frac{\gamma_{k-1}}{2}T_{+}^{k}-2T_{-}+(\frac{3}{4}-\frac{1}{2}c)\beta B^{\top}B\succeq 0$ for all $k$.

Conditions (a) and (b) allow the proximal matrix $S$ to be slightly indefinite, as long as $S\succeq-\frac{1}{2}\Sigma_{f}$. Condition (c) decomposes the indefinite matrix $T_{k}$ into two positive semidefinite parts. Note that we require the second part $T_{-}$ to be fixed; this plays an important role in the main analysis. Condition (d) allows $T_{k}$ to be indefinite as long as $T_{k}+\Sigma_{g}+\beta B^{\top}B$ remains positive definite. Conditions (e) and (f) bound the variation of the positive semidefinite part $T_{+}^{k}$ and of the indefinite matrix $T_{k}$, respectively. Condition (g) is a requirement for global convergence and also determines how indefinite $T_{k}$ may be.
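Parts (a), (b), (d) and (g) of Condition 2.1 are matrix inequalities at a single index $k$, so they can be checked numerically by eigenvalue computations. The following NumPy sketch does this; the function names, the tolerance and the example matrices are illustrative assumptions. Part (c) holds by construction here, while (e) and (f) involve consecutive iterates and are not checked. The example also shows that (d) alone does not imply (g).

```python
import numpy as np

def min_eig(M):
    """Smallest eigenvalue of the symmetric part of M."""
    return np.linalg.eigvalsh(0.5 * (M + M.T)).min()

def check_condition_2_1(S, T_plus, T_minus, A, B, beta, Sigma_f, Sigma_g,
                        gamma_prev, c, tol=1e-10):
    """Check Condition 2.1 (a), (b), (d), (g) at one index k (sketch)."""
    T = T_plus - T_minus  # part (c) by construction
    BtB = B.T @ B
    return {
        "a": min_eig(S + 0.5 * Sigma_f) >= -tol,
        "b": min_eig(S + Sigma_f + beta * A.T @ A) > tol,
        "d": min_eig(T + Sigma_g + beta * BtB) > tol,
        "g": 0.0 < c < 0.5 and min_eig(
            T + 1.5 * Sigma_g - 0.5 * gamma_prev * T_plus - 2.0 * T_minus
            + (0.75 - 0.5 * c) * beta * BtB) >= -tol,
    }

n = 3
I, Z = np.eye(n), np.zeros((n, n))
# Mildly indefinite split T_k = 0.5 I - 0.05 I: (g) gives 1.05 I, so all parts hold
ok = check_condition_2_1(Z, 0.5 * I, 0.05 * I, I, I, 1.0, I, Z, 0.0, 0.1)
# T_k = 0 - 0.5 I: (d) still holds (0.5 I), but (g) gives -0.8 I and fails
bad = check_condition_2_1(Z, Z, 0.5 * I, I, I, 1.0, I, Z, 0.0, 0.1)
```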

For simplicity, we further define the following matrices. For all $k$ ,

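$$P_{k}=\begin{pmatrix}S&0\\ 0&T_{k}\end{pmatrix},\quad D_{k}=\begin{pmatrix}S&0&0\\ 0&T_{k}&0\\ 0&0&\frac{1}{\beta}I\end{pmatrix},\quad\text{and}\quad G_{k}=\begin{pmatrix}S+\Sigma_{f}&0&0\\ 0&T_{k}+\Sigma_{g}+\beta B^{\top}B&0\\ 0&0&\frac{1}{\beta}I\end{pmatrix},\tag{2.6}$$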

where $S,T_{k}$ and $\beta$ are given in (1.6c).

Moreover, we also define the following matrices

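$$\Gamma_{k}=T_{+}^{k}+T_{-},\quad\forall k,\tag{2.7a}$$
$$\Lambda_{k}=-\frac{\gamma_{k-1}}{2}T_{+}^{k}-2T_{-}+\Sigma_{g},\quad\forall k,\tag{2.7b}$$
$$\Delta_{k}=T_{k}+\frac{3}{2}\Sigma_{g}-\frac{\gamma_{k-1}}{2}T_{+}^{k}-2T_{-}+\Big(\frac{3}{4}-\frac{1}{2}c\Big)\beta B^{\top}B,\quad\forall k,\tag{2.7c}$$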

where $\{\gamma_{k}\}$ is a sequence satisfying Condition 2.1. Note that $\Gamma_{k}\succeq 0$ for all $k$ .
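The block-diagonal matrices $D_k$ and $G_k$ in (2.6) differ only by the added terms $\Sigma_f$ and $\Sigma_g+\beta B^{\top}B$, which is why $G_k$ can be positive definite while $D_k$ is indefinite. A minimal NumPy sketch, with illustrative data ($S=0$, $\Sigma_f=I$, $\Sigma_g=0$) and the indefinite $T_k=\tau rI-\beta B^{\top}B$ of [29]; the function name `build_D_G` is hypothetical.

```python
import numpy as np

def build_D_G(S, T_k, Sigma_f, Sigma_g, B, beta):
    """Assemble D_k and G_k of (2.6) as dense block-diagonal matrices."""
    n1, n2, m = S.shape[0], T_k.shape[0], B.shape[0]
    Z = np.zeros
    D = np.block([
        [S, Z((n1, n2)), Z((n1, m))],
        [Z((n2, n1)), T_k, Z((n2, m))],
        [Z((m, n1)), Z((m, n2)), np.eye(m) / beta],
    ])
    G = D.copy()
    G[:n1, :n1] += Sigma_f                                 # S + Sigma_f
    G[n1:n1 + n2, n1:n1 + n2] += Sigma_g + beta * B.T @ B  # T_k + Sigma_g + beta*B'B
    return D, G

rng = np.random.default_rng(6)
n, m, beta, tau = 4, 3, 1.0, 0.8
B = rng.standard_normal((m, n))
BtB = B.T @ B
r = 1.01 * beta * np.linalg.norm(BtB, 2)
T_k = tau * r * np.eye(n) - beta * BtB  # indefinite proximal matrix
D, G = build_D_G(np.zeros((n, n)), T_k, np.eye(n), np.zeros((n, n)), B, beta)
```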

2.3 Technical lemmas for convergence analysis of the variable metric indefinite proximal ADMM

In order to show that VMIP-ADMM converges globally to a solution of (1.1), we first give some properties of the sequence $\{w^{k}\}=\{(x^{k},y^{k},\lambda^{k})\}$ generated by (1.6c).

Lemma 2.2.

Let $\{w^{k}\}$ be generated by (1.6c). Then, for given $w^{*}=(x^{*},y^{*},\lambda^{*})\in\Omega^{*}$ , we have

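$$(w^{k+1}-w^{*})^{\top}D_{k}(w^{k+1}-w^{k})+\|u^{k+1}-u^{*}\|_{\Sigma}^{2}\leq\beta(Ax^{k+1}-Ax^{*})^{\top}(By^{k+1}-By^{k}).\tag{2.8}$$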

Proof.

By taking $x=x^{*}$ and $y=y^{*}$ in the optimality conditions (2.2) and (2.3), respectively, we have

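$$(x^{k+1}-x^{*})^{\top}\big(\xi_{x}^{k+1}-A^{\top}\lambda^{k+1}+\beta A^{\top}B(y^{k}-y^{k+1})+S(x^{k+1}-x^{k})\big)\leq 0,$$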

and

$$(y^{k+1}-y^{*})^{\top}\big(\xi_{y}^{k+1}-B^{\top}\lambda^{k+1}+T_{k}(y^{k+1}-y^{k})\big)\leq 0,$$

where $\xi_{x}^{k+1}\in\partial f(x^{k+1})$ and $\xi_{y}^{k+1}\in\partial g(y^{k+1})$ .

The inequalities are further rearranged as

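$$(x^{k+1}-x^{*})^{\top}S(x^{k+1}-x^{k})+(x^{k+1}-x^{*})^{\top}(\xi_{x}^{k+1}-A^{\top}\lambda^{k+1})\leq\beta(Ax^{k+1}-Ax^{*})^{\top}(By^{k+1}-By^{k})$$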

and

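$$(y^{k+1}-y^{*})^{\top}T_{k}(y^{k+1}-y^{k})+(y^{k+1}-y^{*})^{\top}(\xi_{y}^{k+1}-B^{\top}\lambda^{k+1})\leq 0.$$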

Moreover, from (2.4)-(2.5) with $x=x^{k+1}$ , $y=y^{k+1}$ , $\hat{x}=x^{*}$ and $\hat{y}=y^{*}$ , we have

$$(x^{k+1}-x^{*})^{\top}(\xi_{x}^{k+1}-\xi_{x}^{*})\geq\|x^{k+1}-x^{*}\|_{\Sigma_{f}}^{2},$$

and

$$(y^{k+1}-y^{*})^{\top}(\xi_{y}^{k+1}-\xi_{y}^{*})\geq\|y^{k+1}-y^{*}\|_{\Sigma_{g}}^{2},$$

where $\xi_{x}^{*}\in\partial f(x^{*})$ and $\xi_{y}^{*}\in\partial g(y^{*})$ satisfy the KKT conditions (2.1a) and (2.1b), respectively. It then follows from (2.1a) and (2.11) that

[TABLE]

Combining this inequality and (2.9), we have

[TABLE]

In a similar way, we have from (2.1b), (2.10) and (2.12) that

[TABLE]

Rearranging (1.6c), we have $Ax^{k+1}+By^{k+1}-b=\frac{1}{\beta}\left(\lambda^{k}-\lambda^{k+1}\right)$ . It then follows from (2.1c) that

[TABLE]

Adding (2.3) and (2.14), and recalling the definition of $D_{k}$ and $\Sigma$ , it holds that

[TABLE]

The inequality (2.8) in Lemma 2.2 is further rearranged as follows.

Lemma 2.3.

Let $\{w^{k}\}$ be generated by (1.6c). Then, for given $w^{*}=(x^{*},y^{*},\lambda^{*})\in\Omega^{*}$ , we have

[TABLE]

Proof.

Noting that $Ax^{*}+By^{*}-b=0$, twice the right-hand side of (2.8) can be written as

[TABLE]

where the last equality follows from (1.6c). Then the assertion is directly obtained from (2.8). ∎

Next we give a simple but important lemma.

Lemma 2.4.

For vectors $a,b\in{\mathbb{R}}^{n}$ , and symmetric positive semidefinite matrices $M_{1},M_{2}\in{\mathbb{R}}^{n\times n}$ , we have that

[TABLE]

Proof.

For a positive semidefinite matrix $M_{1}$ , we have

[TABLE]

which implies

[TABLE]

In a similar way for $M_{2}$ , we have

[TABLE]

The assertion immediately follows by adding (2.17) and (2.18). ∎

In order to bound $(w^{k+1}-w^{*})^{\top}D_{k}(w^{k+1}-w^{k})$ further, we now give two technical lemmas that provide upper bounds on the cross term $(By^{k+1}-By^{k})^{\top}(\lambda^{k}-\lambda^{k+1})$ in (2.3).

Lemma 2.5.

Let $\{w^{k}\}$ be generated by the scheme (1.6c). Suppose that the proximal sequence $\{T_{k}\}$ satisfies Condition 2.1. Then it holds that

[TABLE]

where $\Gamma_{k}$ and $\Lambda_{k}$ are defined in (2.7).

Proof.

From the optimality condition (2.3) for $y^{k+1}$ , we can easily derive the optimality condition for $y^{k}$ as

[TABLE]

Choosing $y=y^{k}$ in (2.3), we have

[TABLE]

Moreover, letting $y=y^{k+1}$ in (2.20), we have

[TABLE]

Summing inequalities (2.3) and (2.22), we obtain that

[TABLE]

It then follows from (2.5) that

[TABLE]

which is equivalent to

[TABLE]

Recall that $T_{k-1}=T_{+}^{k-1}-T_{-}$ from (c) in Condition 2.1 and $T_{+}^{k-1},T_{-}\succeq 0$ . Then we have

[TABLE]

where the inequality follows from (2.16) with $a=(y^{k+1}-y^{k})$ , $b=(y^{k}-y^{k-1})$ , $M_{1}=T_{+}^{k-1}$ and $M_{2}=T_{-}$ .

We then have from (2.23) that

[TABLE]

where the second inequality follows from $T_{k}=T_{+}^{k}-T_{-}$ and (2.3), the third inequality follows from Condition 2.1 (d), and the last equality follows from the definitions (2.7a) and (2.7b). This proves the assertion (2.19). ∎

Besides Lemma 2.5, we can derive another estimate of $(By^{k+1}-By^{k})^{\top}(\lambda^{k}-\lambda^{k+1})$, whose proof is similar to that of [29, Lemma 4.4].

Lemma 2.6.

Let $\{w^{k}\}$ be generated by the scheme (1.6c). Then, for any $c\in(0,0.5)$ , it holds that

[TABLE]

Proof.

See [29, Lemma 4.4]. ∎

Based on the above two lemmas for $(By^{k+1}-By^{k})^{\top}(\lambda^{k}-\lambda^{k+1})$ , we can further bound $(w^{k+1}-w^{*})^{\top}D_{k}(w^{k+1}-w^{k})$ in (2.3) of Lemma 2.3.

Lemma 2.7.

Let $\{w^{k}\}$ be generated by (1.6c). Suppose that the proximal sequence $\{T_{k}\}$ satisfies Condition 2.1. Then, for given $w^{*}=(x^{*},y^{*},\lambda^{*})\in\Omega^{*}$ , we have

[TABLE]

Proof.

The term $2(By^{k+1}-By^{k})^{\top}(\lambda^{k}-\lambda^{k+1})$ in inequality (2.3) can be bounded by the above lemmas (2.19) and (2.25), and then the assertion is obtained. ∎

2.4 Global Convergence of the variable metric indefinite proximal ADMM

In this subsection, we show global convergence based on the results of the previous subsection and Condition 2.1. First, we establish the following contraction result, which plays a key role in proving the convergence of (1.6c).

Lemma 2.8.

Let $w^{*}=(x^{*},y^{*},\lambda^{*})\in\Omega^{*}$ , and let $\{w^{k}\}$ be generated by the scheme (1.6c). Suppose that the proximal sequence $\{T_{k}\}$ satisfies Condition 2.1. Then we have

[TABLE]

where $\Gamma_{k}$ and $\Delta_{k}$ are given in (2.7).

Proof.

By the identity $\|a+b\|^{2}=\|a\|^{2}-\|b\|^{2}+2(a+b)^{\top}b$ , we get

[TABLE]

Moreover,

[TABLE]

Then we have

[TABLE]

Since the term $2(w^{k+1}-w^{*})^{\top}D_{k}(w^{k+1}-w^{k})$ in equality (2.4) can be bounded by (2.7) in Lemma 2.7, we can rearrange (2.4) as

[TABLE]

where the last equality follows from the definitions of $P_{k}$ and $D_{k}$ in (2.6). Rearranging (2.4) further, we have

[TABLE]

that is,

[TABLE]

From the definition of $G_{k}$ in (2.6), inequality (2.4) can be written as

[TABLE]

where the second inequality follows from the well-known inequality $\|a\|_{M}^{2}+\|b\|_{M}^{2}\geq\frac{1}{2}\|a-b\|_{M}^{2}$ with $M=\Sigma$ , $a=u^{k}-u^{*}$ and $b=u^{k+1}-u^{*}$ .

From the definitions (2.7b) and (2.7c), we have that

[TABLE]

Thus the proof is completed. ∎
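The two elementary facts used in the proof of Lemma 2.8 can be verified directly by expanding the quadratic forms. The identity needs only symmetry of the weighting matrix, which matters because $D_k$ may be indefinite, whereas the inequality uses $M\succeq 0$ (in the proof, $M=\Sigma$):

```latex
\begin{aligned}
\|a+b\|_{D}^{2} &= \|a\|_{D}^{2}+2a^{\top}Db+\|b\|_{D}^{2}
= \|a\|_{D}^{2}-\|b\|_{D}^{2}+2(a+b)^{\top}Db \qquad (D=D^{\top}),\\
\|a\|_{M}^{2}+\|b\|_{M}^{2} &= \tfrac{1}{2}\|a-b\|_{M}^{2}+\tfrac{1}{2}\|a+b\|_{M}^{2}
\;\geq\; \tfrac{1}{2}\|a-b\|_{M}^{2} \qquad (M\succeq 0).
\end{aligned}
```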

Condition 2.1 (a) implies $\|x^{k+1}-x^{k}\|_{S+\frac{1}{2}\Sigma_{f}}^{2}\geq 0$ for all $k$. Moreover, Condition 2.1 (g) implies $\|y^{k+1}-y^{k}\|_{\Delta_{k}}^{2}\geq 0$ for all $k$. Therefore, Term 1 in (2.8) is always nonnegative, which yields the contraction of the sequence $\{w^{k}\}$.

It follows from the definition of $\{G_{k}\}$ and Condition 2.1 (a), (c) and (e) that $0\preceq G_{k+1}\preceq(1+\gamma_{k})G_{k}$ for all $k$ . We define two constants $C_{s}$ and $C_{p}$ as follows:

[TABLE]

From the assumptions $\sum_{k=0}^{\infty}\gamma_{k}<\infty$ and $\gamma_{k}\geq 0$, we have $0\leq C_{s}<\infty$ and $1\leq C_{p}<\infty$. Moreover, we can easily obtain

[TABLE]

which means that the sequence $\{G_{k}\}$ is bounded.
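The display defining $C_s$ and $C_p$ is not reproduced in this text. Assuming the usual choices $C_s=\sum_{k\geq 0}\gamma_k$ and $C_p=\prod_{k\geq 0}(1+\gamma_k)$, which match the stated bounds $0\leq C_s<\infty$ and $1\leq C_p<\infty$, the scalar sketch below follows the worst case of $G_{k+1}\preceq(1+\gamma_k)G_k$: the accumulated growth is exactly the partial product, which stays below $e^{C_s}$ because $1+t\leq e^{t}$. The function name and the summable sequence $\gamma_k=1/(k+1)^2$ are illustrative assumptions.

```python
import math

def growth_constants(gammas):
    """Return (C_s, C_p) = (sum of gamma_k, product of (1 + gamma_k)) for a finite list."""
    return sum(gammas), math.prod(1.0 + g for g in gammas)

# Summable, nonnegative sequence (illustrative choice)
gammas = [1.0 / (k + 1) ** 2 for k in range(10_000)]
C_s, C_p = growth_constants(gammas)

# Scalar worst case of G_{k+1} <= (1 + gamma_k) G_k, starting from G_0 = 1
G = 1.0
for g in gammas:
    G *= 1.0 + g
# Hence G_k <= C_p * G_0 <= exp(C_s) * G_0, i.e. {G_k} stays bounded
```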

We now give the main convergence theorem of this subsection.

Theorem 2.9.

Let $w^{*}=(x^{*},y^{*},\lambda^{*})\in\Omega^{*}$, and let $\{w^{k}\}$ be a sequence generated by (1.6c). Suppose that $\{T_{k}\}$ is a sequence satisfying Condition 2.1. Then the sequence $\{w^{k}\}$ converges to a point in $\Omega^{*}$.

Proof.

First we show that the sequence $\{w^{k}\}$ is bounded. Since $0\preceq G_{k+1}\preceq(1+\gamma_{k})G_{k}$ , we have

[TABLE]

Combining the inequality (2.31) with (2.8) in Lemma 2.8, we have

[TABLE]

It then follows that for all $k$ ,

[TABLE]

Note that

[TABLE]

$T_{k}+\Sigma_{g}+\beta B^{\top}B$ is positive definite from Condition 2.1 (d), and $C_{p}\left(\|w^{0}-w^{*}\|_{G_{0}}^{2}+\frac{1}{2}\|y^{0}-y^{1}\|_{\Gamma_{0}}^{2}\right)$ is a constant. It then follows from (2.4) that $\{y^{k}\}$ and $\{\lambda^{k}\}$ are bounded. We now show that $\{x^{k}\}$ is also bounded.

From (2.4) and (2.4), we have

[TABLE]

Summing up the inequalities, we obtain

[TABLE]

Since $(1+C_{s}C_{p})\left(\|w^{0}-w^{*}\|_{G_{0}}^{2}+\frac{1}{2}\|y^{0}-y^{1}\|_{\Gamma_{0}}^{2}\right)$ is a finite constant, we have

[TABLE]

which indicates that

[TABLE]

Note that $Ax^{*}+By^{*}-b=0$ , and

[TABLE]

It then follows from (2.35) that $\|A(x^{k+1}-x^{*})\|$ is bounded. Moreover, inequalities (2.4) and (2.34) imply that $\|x^{k+1}-x^{*}\|_{S+\Sigma_{f}}^{2}$ is bounded. Therefore $\|x^{k+1}-x^{*}\|_{S+\Sigma_{f}+\beta A^{\top}A}^{2}$ is bounded since

[TABLE]

From the positive definiteness of $S+\Sigma_{f}+\beta A^{\top}A$ in Condition 2.1 (b), it shows that $\{x^{k}\}$ is also bounded. Consequently, the sequence $\{w^{k}\}$ is bounded.

It remains to show that any cluster point of the sequence $\{w^{k}\}$ is an optimal solution of (1.1) and that $\{w^{k}\}$ has only one cluster point. This can be done in a way similar to the proof in [25]. ∎

3 VMIP-ADMM with the BFGS update

As shown in recent studies [25, 26], a variable metric proximal term generated by the BFGS update can find a solution in fewer iterations and less CPU time than the proximal ADMM [18, 29] with a fixed proximal matrix $T$. Moreover, in those experiments a slightly indefinite variable proximal term also performed well, although no theoretical analysis was available. Note that this choice assumes that the $y$-subproblems (1.6b) are unconstrained quadratic programming problems. Based on the above analysis and these previous studies, we propose indefinite proximal terms $\{T_{k}\}$ generated by the BFGS update and show that $\{T_{k}\}$ satisfies Condition 2.1.

3.1 Construction of the indefinite proximal matrix $T_{k}$ via the BFGS update

Inspired by the semidefinite proximal ADMM with the BFGS update [25, 26], we construct the indefinite matrix $T_{k}$ by the BFGS update.

We first explain the pure BFGS update for the following unconstrained quadratic optimization problem:

[TABLE]

where $M\in{\mathbb{R}}^{n\times n}$ is a positive definite matrix. Let $s\in{\mathbb{R}}^{n}$ and $l=Ms$. Note that $s^{\top}l>0$ when $s\neq 0$. The BFGS update generates a sequence $\{B_{k}\}$ of approximations to $M$ and their inverses $H_{k}=B_{k}^{-1}$. For a given matrix $B_{k}$, the BFGS update generates $B_{k+1}^{\mathrm{BFGS}}$ and $H_{k+1}^{\mathrm{BFGS}}$ from $s$ and $l$ as follows:

[TABLE]

Note that $B^{\rm BFGS}_{k+1}$ and $H^{\rm BFGS}_{k+1}$ are positive definite whenever $B_{k},H_{k}\succ 0$ since $s^{\top}l>0$ . Note also that $H^{\rm BFGS}_{k+1}l=s=M^{-1}l$ .
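The displayed formulas (3.1)-(3.2) are not reproduced here. The sketch below implements the standard BFGS updates of $B$ and $H=B^{-1}$, which we assume coincide with (3.1)-(3.2), and checks the properties just stated: the secant equations $B_{k+1}s=l$ and $H_{k+1}l=s$, positive definiteness, and that the two updates remain inverse to each other. The function name `bfgs_update` and the random positive definite $M$ are illustrative.

```python
import numpy as np

def bfgs_update(B, H, s, l):
    """One standard BFGS update of B (approximating M) and H (approximating M^{-1}).

    Requires the curvature condition s'l > 0, which holds for l = M s with M > 0, s != 0.
    """
    sl = s @ l
    if sl <= 0:
        raise ValueError("BFGS requires s^T l > 0")
    Bs = B @ s
    B_new = B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(l, l) / sl
    V = np.eye(len(s)) - np.outer(s, l) / sl  # I - rho * s l^T with rho = 1/(s'l)
    H_new = V @ H @ V.T + np.outer(s, s) / sl
    return B_new, H_new

rng = np.random.default_rng(3)
n = 5
R = rng.standard_normal((n, n))
M = R.T @ R + n * np.eye(n)  # positive definite target matrix (illustrative)
s = rng.standard_normal(n)
l = M @ s
B1, H1 = bfgs_update(np.eye(n), np.eye(n), s, l)
```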

We now explain how to construct $T_{k}$ via the BFGS update. Throughout this section we suppose that $g$ in the objective function (1.1) is a convex quadratic function. Then $y$ -subproblems (1.6b) are unconstrained quadratic programming problems, and the Hessian matrix of the augmented Lagrangian function (1.2) is a constant matrix given as

$$M:=\bar{M}+\beta B^{\top}B,$$

where $\bar{M}\colon=\nabla^{2}_{yy}g(y)$ . Note that $M$ is always positive semidefinite since $\bar{M}\succeq 0$ .

We consider the perturbed matrix $M^{\delta}\colon=M+\delta I\succ 0$ with a sufficiently small $\delta>0$, and construct an approximation $B_{k}$ of $M^{\delta}$ via the BFGS update (3.1). Let $s_{k}=x^{k+1}-x^{k}$, where $\{x^{k}\}$ is a sequence generated by (1.6c). We propose to generate $\{B_{k}\}$ as

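$$B_{k+1}=B_{k}+c_{k}\left(-\frac{B_{k}s_{k}s_{k}^{\top}B_{k}}{s_{k}^{\top}B_{k}s_{k}}+\frac{\tilde{l}_{k}\tilde{l}_{k}^{\top}}{\tilde{l}_{k}^{\top}s_{k}}\right),\tag{3.3}$$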

where $\tilde{l}_{k}=Ms_{k}+\delta s_{k}=M^{\delta}s_{k}$, and $\{c_{k}\}$ is a sequence such that $c_{k}\in[0,1]$ for all $k$ and $\sum\limits_{k=0}^{\infty}c_{k}<\infty$. We can rewrite the update formula (3.3) as

$$B_{k+1}=(1-c_{k})B_{k}+c_{k}B^{\rm BFGS}_{k+1},$$

where $B^{\rm BFGS}_{k+1}$ is updated by the pure BFGS update (3.1) with respect to $M^{\delta}$ at every iteration. Note that $B_{k+1}=B^{\rm BFGS}_{k+1}$ when $c_{k}=1$ .
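The following is a minimal sketch of the damped update. It assumes that the rewritten form of (3.3) is the convex combination $B_{k+1}=(1-c_k)B_k+c_kB^{\rm BFGS}_{k+1}$, which agrees with $B_{k+1}=B^{\rm BFGS}_{k+1}$ when $c_k=1$. We take $M=\bar{M}+\beta B^{\top}B$ as the Hessian of the augmented Lagrangian in $y$. All of the following are hypothetical choices made for illustration:

- the random $\bar M$;
- the constraint matrix, named `Bc` to avoid a clash with $B_k$;
- the start $B_0=\lambda_{\max}(M^\delta)I$;
- the schedule $c_k=1/(k+1)^2$;
- the random vectors standing in for $s_k$.

```python
import numpy as np

def bfgs_B(B, s, l):
    """Pure BFGS update (3.1) of B for the pair (s, l) with s^T l > 0."""
    Bs = B @ s
    return B - np.outer(Bs, Bs) / float(s @ Bs) + np.outer(l, l) / float(s @ l)

def damped_bfgs_step(B, s, M, delta, c):
    """Assumed form of (3.3): B_{k+1} = (1 - c) B_k + c * B^BFGS_{k+1} w.r.t. M^delta."""
    l_tilde = M @ s + delta * s                   # \tilde l_k = M^delta s_k
    return (1.0 - c) * B + c * bfgs_B(B, s, l_tilde)

rng = np.random.default_rng(1)
n, beta, delta = 5, 1.0, 1e-3
G = rng.standard_normal((2, n))
M_bar = G.T @ G                                   # hypothetical Hessian of a convex quadratic g
Bc = rng.standard_normal((3, n))                  # hypothetical constraint matrix B
M = M_bar + beta * Bc.T @ Bc                      # Hessian in y (positive semidefinite)
M_delta = M + delta * np.eye(n)
B = np.linalg.eigvalsh(M_delta).max() * np.eye(n) # hypothetical B_0 with B_0 >= M^delta
for k in range(20):
    c_k = 1.0 / (k + 1) ** 2                      # c_k in [0, 1] and summable
    s_k = rng.standard_normal(n)                  # stand-in for an iterate difference
    B = damped_bfgs_step(B, s_k, M, delta, c_k)
```

Because $\delta>0$ gives $s_k^\top\tilde l_k>0$, each $B^{\rm BFGS}_{k+1}$ is positive definite, and so is every convex combination $B_{k+1}$.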

We then propose to construct the proximal matrix via the BFGS update as $T_{k}:=B_{k}-M$, where $\{B_{k}\}$ is generated by (3.3).

3.2 Discussion on the Condition 2.1 for the indefinite matrix $T_{k}$

We now consider matrices $\{T_{+}^{k}\}$ and $T_{-}$ such that $T_{k}=T_{+}^{k}-T_{-}$, $T_{+}^{k}\succeq 0$, and $T_{-}\succeq 0$, as required in Condition 2.1 (c). Let

$$T_{+}^{k}:=B_{k}-\tau M,\qquad T_{-}:=(1-\tau)M,\qquad\tau\in(0,1].$$

Note that $T_{k}=T_{+}^{k}-T_{-}=B_{k}-M$ and $T_{-}\succeq 0$. Thus it remains only to show that $T_{+}^{k}$ is positive semidefinite.

To this end, we give an extension of Theorem 2.2 in [25].

Lemma 3.1.

Let $M\in{\mathbb{R}}^{n\times n}$ be a positive definite matrix. Let $s\in{\mathbb{R}}^{n}$ with $s\neq 0$, and let $l=Ms$. If a given matrix $H_{k}\in{\mathbb{R}}^{n\times n}$ satisfies $H_{k}\preceq\tau_{1}M^{-1}$ with $\tau_{1}\geq 1$, then the matrix $H_{k+1}^{\mathrm{BFGS}}$ generated by the BFGS update (3.2) with respect to $M$ also satisfies $H_{k+1}^{\mathrm{BFGS}}\preceq\tau_{1}M^{-1}$.

Proof.

Let $v$ be an arbitrary nonzero vector in ${\mathbb{R}}^{n}$ , and $\Psi=\{z\in{\mathbb{R}}^{n}\;|\;s^{\top}z=0\}$ . As shown in [25, Lemma 2.1], there exist $c\in{\mathbb{R}}$ and $z\in\Psi$ such that $v=cl+z$ . Together with $H_{k+1}^{\rm BFGS}l=s=M^{-1}l$ and $s^{\top}z=0$ , we can obtain that for any $\tau_{1}\geq 1$ ,

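$$\begin{aligned}
v^{\top}H_{k+1}^{\rm BFGS}v&=c^{2}l^{\top}H_{k+1}^{\rm BFGS}l+2c\,z^{\top}H_{k+1}^{\rm BFGS}l+z^{\top}H_{k+1}^{\rm BFGS}z\\
&=c^{2}l^{\top}s+2c\,z^{\top}s+z^{\top}H_{k+1}^{\rm BFGS}z\\
&=c^{2}l^{\top}M^{-1}l+z^{\top}H_{k+1}^{\rm BFGS}z\\
&=c^{2}l^{\top}M^{-1}l+z^{\top}H_{k}z\\
&\leq\tau_{1}c^{2}l^{\top}M^{-1}l+\tau_{1}z^{\top}M^{-1}z\\
&=\tau_{1}(cl+z)^{\top}M^{-1}(cl+z)=\tau_{1}v^{\top}M^{-1}v,
\end{aligned}$$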

where the fourth equality follows from (3.2), and the inequality follows from $\tau_{1}\geq 1$, the positive definiteness of $M^{-1}$, and the assumption that $H_{k}\preceq\tau_{1}M^{-1}$. Since $v$ is arbitrary, we have $H_{k+1}^{\rm BFGS}\preceq\tau_{1}M^{-1}$. ∎
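Lemma 3.1 can also be checked numerically. The sketch below is only an illustration on assumed random data; the helper name `bfgs_H` is hypothetical. It builds an $H_k$ with $H_k\preceq\tau_1M^{-1}$, applies the update (3.2) repeatedly with $l=Ms$, and tracks the smallest eigenvalue of $\tau_1M^{-1}-H^{\rm BFGS}_{k+1}$. The lemma says this eigenvalue stays nonnegative, up to rounding.

```python
import numpy as np

def bfgs_H(H, s, l):
    """Inverse BFGS update (3.2) for the pair (s, l) with s^T l > 0."""
    sl = float(s @ l)
    V = np.eye(s.size) - np.outer(s, l) / sl      # I - s l^T / (s^T l)
    return V @ H @ V.T + np.outer(s, s) / sl

rng = np.random.default_rng(2)
n, tau1 = 6, 1.5
A = rng.standard_normal((n, n))
M = A @ A.T + np.eye(n)                           # positive definite M
M_inv = np.linalg.inv(M)

# Build H_k = tau1 * M^{-1/2} D M^{-1/2} with 0 < D < I, so that H_k <= tau1 * M^{-1}.
w, U = np.linalg.eigh(M_inv)
M_inv_half = U @ np.diag(np.sqrt(w)) @ U.T
D = np.diag(rng.uniform(0.1, 1.0, n))
H = tau1 * M_inv_half @ D @ M_inv_half

worst = np.inf
for _ in range(30):
    s = rng.standard_normal(n)
    H = bfgs_H(H, s, M @ s)                       # l = M s, as in Lemma 3.1
    # Lemma 3.1: tau1 * M^{-1} - H_{k+1} stays positive semidefinite
    worst = min(worst, np.linalg.eigvalsh(tau1 * M_inv - H).min())

print(worst >= -1e-8)
```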

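Lemma 3.1, applied with $M^{\delta}$ in place of $M$, implies that $B^{\rm BFGS}_{k+1}\succeq\tau M^{\delta}$ whenever $B_{k}\succeq\tau M^{\delta}$ with $\tau=\frac{1}{\tau_{1}}\leq 1$. This follows because, for positive definite matrices, $H_{k}\preceq\tau_{1}(M^{\delta})^{-1}$ is equivalent to $B_{k}=H_{k}^{-1}\succeq\tau M^{\delta}$. Since $c_{k}\in[0,1]$, it follows that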

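$$B_{k+1}=(1-c_{k})B_{k}+c_{k}B^{\rm BFGS}_{k+1}\succeq(1-c_{k})\tau M^{\delta}+c_{k}\tau M^{\delta}=\tau M^{\delta}.$$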

By induction, if $B_{0}\succeq\tau M^{\delta}$ and $\tau\leq 1$, then $B_{k}\succeq\tau M^{\delta}$ for all $k$, and hence $T_{+}^{k}=B_{k}-\tau M\succeq\tau\delta I\succeq 0$ for all $k$. When $\tau=1$, the method reduces to the variable metric semi-proximal ADMM in [25].
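The sketch below illustrates the invariance $B_k\succeq\tau M^\delta$ and the splitting $T_k=T_+^k-T_-$. Here $T_-=(1-\tau)M$ follows from $T_k=B_k-M$ and $T_+^k=B_k-\tau M$, as stated in the text. The example is hypothetical:

- $g\equiv0$, so $\bar M=0$;
- the start is $B_0=\tau M^\delta$, the smallest admissible choice;
- the update is the assumed convex-combination form, with $c_k=1/(k+1)^2$.

It shows that $T_+^k\succeq\tau\delta I$ at every iteration while $T_k$ itself has a negative eigenvalue. In other words, the proximal term is genuinely indefinite.

```python
import numpy as np

def bfgs_B(B, s, l):
    """Pure BFGS update (3.1) of B for the pair (s, l) with s^T l > 0."""
    Bs = B @ s
    return B - np.outer(Bs, Bs) / float(s @ Bs) + np.outer(l, l) / float(s @ l)

rng = np.random.default_rng(3)
n, beta, delta, tau = 5, 1.0, 1e-2, 0.8
Bc = rng.standard_normal((3, n))          # hypothetical constraint matrix B
M = beta * Bc.T @ Bc                      # g = 0 here, so M_bar = 0 (illustrative choice)
M_delta = M + delta * np.eye(n)
B = tau * M_delta                         # B_0 = tau * M^delta, the smallest admissible start
T_minus = (1.0 - tau) * M                 # constant negative part T_-

min_Tplus, min_Tk = np.inf, np.inf
for k in range(30):
    T_k = B - M                           # proximal matrix T_k = B_k - M
    T_plus = B - tau * M                  # positive part T_+^k = B_k - tau*M
    assert np.allclose(T_k, T_plus - T_minus)
    min_Tplus = min(min_Tplus, np.linalg.eigvalsh(T_plus).min())
    min_Tk = min(min_Tk, np.linalg.eigvalsh(T_k).min())
    c_k = 1.0 / (k + 1) ** 2              # hypothetical summable schedule
    s_k = rng.standard_normal(n)          # stand-in for an iterate difference
    B = (1.0 - c_k) * B + c_k * bfgs_B(B, s_k, M_delta @ s_k)

print(min_Tplus >= tau * delta - 1e-9, min_Tk < 0)
```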

For instance, we can choose the initial matrix $B_{0}$ as

[TABLE]

It is easy to see that $B_{0}\succeq\tau M^{\delta}$ .

Next we show that $T_{k}$, $T_{+}^{k}$ and $T_{-}$ satisfy Condition 2.1 (d)-(g). Throughout, we suppose that $B_{0}\succeq\tau M^{\delta}$ and $\tau\in(\frac{3}{4},1)$.

First we show Condition 2.1 (e). Note the following facts:

- ${s}_{k}^{\top}B_{k}{s}_{k}\geq\tau{s}_{k}^{\top}M^{\delta}{s}_{k}\geq\tau\delta\|s_{k}\|^{2}$;
- $\tilde{l}_{k}^{\top}s_{k}=s_{k}^{\top}Ms_{k}+\delta\|s_{k}\|^{2}\geq\delta\|s_{k}\|^{2}$;
- $M$ is a constant matrix.

Hence the denominators in the BFGS update (3.1) are bounded below by positive multiples of $\|s_{k}\|^{2}$. Therefore, we can suppose that $\|B^{\rm BFGS}_{k+1}-B_{k}\|$ is bounded above by some constant $Q>0$, that is, $-QI\preceq B^{\rm BFGS}_{k+1}-B_{k}\preceq QI$. Moreover, $T_{+}^{k}=B_{k}-\tau M\succeq\tau M^{\delta}-\tau M=\tau\delta I$. Then we can obtain that

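$$T_{+}^{k+1}-T_{+}^{k}=B_{k+1}-B_{k}=c_{k}\left(B^{\rm BFGS}_{k+1}-B_{k}\right)\preceq c_{k}QI\preceq\frac{c_{k}Q}{\tau\delta}\,T_{+}^{k}.$$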

On the other hand, we have

[TABLE]

Let $\gamma_{k}=\frac{Q}{\tau\delta}c_{k}$ . Then we have

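$$T_{+}^{k+1}\preceq(1+\gamma_{k})T_{+}^{k},\qquad\sum_{k=0}^{\infty}\gamma_{k}=\frac{Q}{\tau\delta}\sum_{k=0}^{\infty}c_{k}<\infty.$$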
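The following is a numerical sketch of the bound $T_+^{k+1}\preceq(1+\gamma_k)T_+^k$. The text assumes the constant $Q$ rather than computing it, so the sketch uses the per-iteration value $Q_k=\|B^{\rm BFGS}_{k+1}-B_k\|_2\le Q$ in its place. The resulting $\gamma_k=Q_kc_k/(\tau\delta)$ is therefore no larger than the text's. The inequality holds with this smaller $\gamma_k$, and a fortiori with the text's. The data, $B_0$, and $c_k$ are hypothetical, and the update is the assumed convex-combination form.

```python
import numpy as np

def bfgs_B(B, s, l):
    """Pure BFGS update (3.1) of B for the pair (s, l) with s^T l > 0."""
    Bs = B @ s
    return B - np.outer(Bs, Bs) / float(s @ Bs) + np.outer(l, l) / float(s @ l)

rng = np.random.default_rng(4)
n, delta, tau = 5, 0.1, 0.8
Bc = rng.standard_normal((3, n))                   # hypothetical constraint matrix (beta = 1, g = 0)
M = Bc.T @ Bc
M_delta = M + delta * np.eye(n)
B = np.linalg.eigvalsh(M_delta).max() * np.eye(n)  # hypothetical B_0 >= M^delta >= tau*M^delta

gammas, worst = [], np.inf
for k in range(30):
    c_k = 1.0 / (k + 1) ** 2                       # hypothetical summable schedule
    s_k = rng.standard_normal(n)
    D = bfgs_B(B, s_k, M_delta @ s_k) - B          # B^BFGS_{k+1} - B_k
    Q_k = np.linalg.norm(D, 2)                     # spectral norm; never exceeds the text's Q
    gamma_k = Q_k * c_k / (tau * delta)
    T_plus = B - tau * M                           # T_+^k
    B = B + c_k * D                                # assumed form: (1 - c_k) B_k + c_k B^BFGS_{k+1}
    T_plus_next = B - tau * M                      # T_+^{k+1}
    # (1 + gamma_k) T_+^k - T_+^{k+1} should be positive semidefinite
    worst = min(worst, np.linalg.eigvalsh((1 + gamma_k) * T_plus - T_plus_next).min())
    gammas.append(gamma_k)

print(worst >= -1e-8, sum(gammas))
```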

Note that $\bar{M}=\nabla^{2}_{yy}g(y)=\Sigma_{g}$. Hence $T_{k}+\Sigma_{g}+\beta B^{\top}B=B_{k}-M+\Sigma_{g}+\beta B^{\top}B=B_{k}\succ 0$, where the last inequality holds because $B_{k}\succeq\tau M^{\delta}\succ 0$. This shows that Condition (d) holds.

Next we show Condition (f). Since (3.4) implies that $B_{k+1}-\tau M=T_{+}^{k+1}\preceq(1+\gamma_{k})T_{+}^{k}=(1+\gamma_{k})(B_{k}-\tau M)$ and $M$ is positive semidefinite, we have

[TABLE]

Obviously,

[TABLE]
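One consequence of the inequality $B_{k+1}-\tau M\preceq(1+\gamma_k)(B_k-\tau M)$ stated above is a uniform upper bound on $\{B_k\}$. The derivation below is our own sketch. It uses only that inequality, $M\succeq 0$, $\gamma_k\geq 0$, and $\sum_k\gamma_k<\infty$, and it is not necessarily the exact content of the omitted displays:

$$\begin{aligned}
B_{k+1}&\preceq(1+\gamma_{k})B_{k}-\gamma_{k}\tau M\preceq(1+\gamma_{k})B_{k},\\
B_{k}&\preceq\Big(\prod_{i=0}^{k-1}(1+\gamma_{i})\Big)B_{0}\preceq\exp\Big(\sum_{i=0}^{\infty}\gamma_{i}\Big)B_{0}.
\end{aligned}$$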

Finally, we show Condition (g). From the definition of $M$ , we have

[TABLE]

where the matrix inequality follows from $B_{k}\succeq\tau M^{\delta}=\tau M+\tau\delta I\succeq\tau M$. Since $\sum_{k}\gamma_{k}<\infty$, we have $\gamma_{k}\to 0$, so there exists $\bar{k}$ such that $\gamma_{k}\leq 1$ for all $k\geq\bar{k}$. Without loss of generality, we assume $\bar{k}=0$, and thus $1-\frac{\gamma_{k-1}}{2}\geq 0$ for all $k$.

Let $c=2(\tau-\frac{3}{4})$ . It is easy to see that $c\in(0,\frac{1}{2})$ . Moreover, $3\tau-\frac{3}{2}>0$ and $3\tau-\frac{9}{4}-\frac{1}{2}c=2\tau-\frac{3}{2}>0$ .
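As a quick arithmetic check of these constants, note that in general $3\tau-\frac{9}{4}-\frac{1}{2}c=3\tau-\frac{9}{4}-(\tau-\frac{3}{4})=2\tau-\frac{3}{2}$. For the example value $\tau=0.9$, chosen only for illustration:

$$c=2(0.9-0.75)=0.3\in(0,\tfrac{1}{2}),\qquad 3\tau-\tfrac{3}{2}=1.2>0,\qquad 3\tau-\tfrac{9}{4}-\tfrac{1}{2}c=2.7-2.25-0.15=0.3=2\tau-\tfrac{3}{2}>0.$$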

In conclusion, the indefinite proximal term $T_{k}$ generated via the BFGS update satisfies Condition 2.1 under the above assumptions. The VMIP-ADMM also covers the general indefinite proximal ADMM with a fixed proximal term, as the following remark shows.

Remark 3.2.

Suppose that $\{T_{k}\}$ is a constant sequence, that is, $T_{k}=T$ for all $k$. Then we can write $T=T_{+}-T_{-}$ with $T_{+},T_{-}\succeq 0$, and it is easy to check that the boundedness Conditions (e) and (f) hold immediately with $\gamma_{k}\equiv 0$. Letting $T_{+}=\tau(rI-\beta B^{\top}B)\succ 0$ and $T_{-}=(1-\tau)\beta B^{\top}B\succeq 0$, we choose

[TABLE]

Then Condition (d) holds. For $\tau\in(0.75,1)$, taking $c=2(\tau-\frac{3}{4})$, Condition (g) becomes

[TABLE]

This reduces to the indefinite proximal ADMM in [29].
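Remark 3.2 can be illustrated numerically. With $T_+=\tau(rI-\beta B^\top B)$ and $T_-=(1-\tau)\beta B^\top B$, one gets $T=T_+-T_-=\tau rI-\beta B^\top B$, so $T+\beta B^\top B=\tau rI\succ0$. Meanwhile, $T$ fails to be positive semidefinite whenever $r<\beta\lambda_{\max}(B^\top B)/\tau$. The sketch below uses the hypothetical values $\tau=0.8$, $\beta=2$, a random $3\times4$ matrix $B$, and $r=1.1\,\beta\lambda_{\max}(B^\top B)$; the choice of $r$ in the omitted display may differ. Because this $B^\top B$ is singular, $T$ also has a positive eigenvalue, so it is genuinely indefinite.

```python
import numpy as np

rng = np.random.default_rng(5)
n, beta, tau = 4, 2.0, 0.8
Bc = rng.standard_normal((3, n))          # hypothetical constraint matrix B (so B^T B is singular)
BtB = Bc.T @ Bc
lam_max = np.linalg.eigvalsh(BtB).max()
r = 1.1 * beta * lam_max                  # beta*lam_max < r < beta*lam_max/tau (hypothetical choice)

T_plus = tau * (r * np.eye(n) - beta * BtB)   # positive definite since r > beta*lam_max
T_minus = (1.0 - tau) * beta * BtB            # positive semidefinite
T = T_plus - T_minus                          # equals tau*r*I - beta*B^T B
eig_T = np.linalg.eigvalsh(T)

print(np.allclose(T + beta * BtB, tau * r * np.eye(n)), eig_T.min() < 0 < eig_T.max())
```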

4 Conclusions

In this paper, we proposed a variable metric indefinite proximal ADMM whose indefinite proximal term can be chosen differently at every iteration. We proved global convergence of the proposed method under suitable conditions on the proximal terms, using an analysis technique from [24]. Moreover, for problems whose $y$-subproblems are unconstrained quadratic programming problems, we proposed constructing the indefinite term $T_{k}$ via the BFGS update. We showed that this construction satisfies the general convergence conditions.

Note that a strictly contractive version of the original ADMM, known as the Peaceman-Rachford splitting method (PRSM), sometimes performs better in numerical experiments for some penalty parameters [28]. Indefinite proximal versions of the PRSM have also been studied by many researchers [21, 32]. A further extension is to consider variable metric indefinite proximal terms for the PRSM, which we leave for future work.

On the other hand, choosing an appropriate proximal term is important for designing more efficient algorithms. The BFGS update improves performance for special problems whose $y$-subproblems are quadratic programming problems. It is worth developing efficient proximal terms for general nonlinear subproblems.

Bibliography


[1] H. Attouch, L. M. Briceno-Arias, and P. L. Combettes, A parallel splitting method for coupled monotone inclusions, SIAM Journal on Control and Optimization, 48 (2010), pp. 3246–3270.
[2] S. Banert, R. I. Bot, and E. R. Csetnek, Fixing and extending some recent results on the ADMM algorithm, arXiv preprint arXiv:1612.05057, (2016).
[3] H. H. Bauschke, P. L. Combettes, et al., Convex analysis and monotone operator theory in Hilbert spaces, vol. 408, Springer, 2011.
[4] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers, Foundations and Trends® in Machine Learning, 3 (2011), pp. 1–122.
[5] A. Chambolle and T. Pock, A first-order primal-dual algorithm for convex problems with applications to imaging, Journal of Mathematical Imaging and Vision, 40 (2011), pp. 120–145.
[6] G. Chen and M. Teboulle, A proximal-based decomposition method for convex minimization problems, Mathematical Programming, 64 (1994), pp. 81–101.
[7] P. Chen, J. Huang, and X. Zhang, A primal–dual fixed point algorithm for convex separable minimization with applications to image restoration, Inverse Problems, 29 (2013), p. 025011.
[8] P. L. Combettes, Iterative construction of the resolvent of a sum of maximal monotone operators, J. Convex Anal., 16 (2009), pp. 727–748.
