Alternating Direction Method of Multipliers with Variable Metric Indefinite Proximal Terms for Convex Optimization
Yan Gu, Nobuo Yamashita

TL;DR
This paper introduces a variable metric indefinite proximal ADMM for convex optimization, providing convergence conditions and a new BFGS-based proximal term that enhances algorithm speed and applicability.
Contribution
It develops a globally convergent variable metric indefinite proximal ADMM and proposes a novel BFGS-based indefinite proximal term.
Findings
The proposed method converges globally under certain conditions.
A new BFGS-based indefinite proximal term satisfies convergence criteria.
Earlier numerical experiments by the same authors indicated that slightly indefinite BFGS-based proximal terms also perform well, which motivates the analysis.
Alternating Direction Method of Multipliers with Variable Metric Indefinite Proximal Terms for Convex Optimization
Yan Gu∗ and Nobuo Yamashita∗
∗Graduate School of Informatics, Kyoto University, Kyoto 6068501, Japan.
Email: [email protected]; [email protected].
(March 15, 2024)
Abstract
This paper studies a proximal alternating direction method of multipliers (ADMM) with variable metric indefinite proximal terms for linearly constrained convex optimization problems. The proximal ADMM plays an important role in many application areas, since the subproblems of the method are easy to solve. Recently, it has been reported that the proximal ADMM with a certain fixed indefinite proximal term is faster than that with a positive semidefinite term, and still has the global convergence property. On the other hand, Gu and Yamashita studied a variable metric semidefinite proximal ADMM whose proximal term is generated by the BFGS update. They reported that a slightly indefinite matrix also makes the algorithm work well in their numerical experiments. Motivated by this fact, we consider a variable metric indefinite proximal ADMM, and give sufficient conditions on the proximal terms for the global convergence. Moreover, we propose a new indefinite proximal term based on the BFGS update which can satisfy the conditions for the global convergence.
Keywords: alternating direction method of multipliers, variable metric indefinite proximal term, BFGS update, global convergence, convex optimization
1 Introduction
We consider the following convex composite optimization problem:
[TABLE]
where and are proper convex functions, and . Various practical problems of science and engineering, such as machine learning [33, 43], total variation denoising [38] and statistics [39] can be formulated as Problem (1.1). Usually, we say that is a loss function and is a structured regularization term.
The augmented Lagrangian function of (1.1) is defined as
[TABLE]
where is the Lagrangian multiplier for the linear constraints in (1.1), and is a positive scalar. Note that .
A number of efficient first-order algorithms have been developed for problem (1.1), including operator splitting methods [1, 3, 8, 11, 13, 35], gradient methods [37, 40, 41], primal-dual methods [5, 7, 17], etc. One way to solve problem (1.1) is the classical augmented Lagrangian method (ALM), which generates the updates
[TABLE]
In this case, the vectors and must be updated simultaneously, which ignores the separability of the objective function. In general, the joint minimization problem (1.3) is difficult to solve exactly, or even approximately with high accuracy. We want to exploit the separability of the objective function to reduce this difficulty. The classical ADMM is one such method, and it solves problem (1.1) efficiently [22, 20]. Convergence analyses of the classical ADMM can be found in [22, 20, 19, 4, 14].
Fazel et al. [18] proposed a more convenient semi-proximal ADMM by adding proximal terms to subproblems which takes the following scheme:
[TABLE]
where , and . For a vector and a semidefinite matrix , the norm is defined by . In this paper, even if is not positive semidefinite, we denote for simplicity.
The proximal ADMM covers the classical ADMM when . When and are two positive definite matrices and , this semi-proximal ADMM reduces to the proximal ADMM proposed by Eckstein [12]. The proximal ADMM has an advantage that its subproblems are easy to solve, and it also can efficiently handle the multi-block convex optimization problem which is known as block-wise ADMM [31]. See [9, 30, 18, 42] for more details of the semi-proximal ADMM.
It is well known that the global convergence of the semi-proximal ADMM (1.4c) is relatively easy to prove. However, its numerical performance is not satisfactory. The paper [10] mentioned that the proximal matrix in (1.4b) could be indefinite if though it provided no further discussion of the theoretical properties. Li et al. [34] then proved the global convergence. He et al. [29] proposed a linearized version of ADMM with a positive-indefinite proximal term. They considered the case that matrix and in (1.4c), and generated the proximal matrix as
[TABLE]
The proximal matrix is not necessarily positive semidefinite. A smaller value can ensure the convergence and also give better numerical performance.
How to choose the proximal term is another important research topic for ADMM. The proximal term is usually chosen as a constant matrix. He et al. [27] extended the work to allow the parameters , proximal terms and to be replaced by some bounded sequences of positive definite matrices and . The resulting ADMM is a variable metric proximal ADMM, which is also closely related to the inexact ADMM [13, 6, 27, 44, 15, 16]. The convergence of such methods has been studied in [36, 2, 23], but no better way of selecting the sequence has been provided.
Quite recently, Gu and Yamashita [25] proposed to construct a variable positive semi-definite sequence with when is quadratic. Note that is a constant matrix. They generated via the BFGS update with respect to at every iteration. Gu and Yamashita [26] further extended such a proximal ADMM for more general convex optimization problems with the proximal term generated by the Broyden family update. In these ADMMs, the proximal terms contain some second order information on the augmented Lagrangian function. The papers [25, 26] report some numerical results for LASSO and L1 regularized logistic regression. The results show that the algorithms can get a solution faster than the general indefinite proximal ADMM whose proximal term is fixed. Another interesting numerical result in [25, 26] is that a variable indefinite sequence via the BFGS update also shows a good performance.
Inspired by the variable metric semi-proximal ADMM [25, 26] and the indefinite proximal ADMM [29], it is worth considering ADMM with a sequence of indefinite proximal matrices. We call the resulting ADMM a variable metric indefinite proximal ADMM (VMIP-ADMM). Throughout our discussion, we always set the stepsize in (1.4c) to 1, as in [29], which is good enough for such methods in practice and keeps the convergence analysis simple.
We now introduce the whole update scheme of the VMIP-ADMM:
[TABLE]
where is a fixed positive semidefinite matrix and is possibly indefinite. Note that VMIP-ADMM unifies several existing ADMMs:
- Let , ; then VMIP-ADMM reduces to the classical ADMM.
- Let and be positive semidefinite matrices; then VMIP-ADMM becomes the semi-proximal ADMM (1.4c).
- Let be a positive semidefinite sequence, that is, for all ; then VMIP-ADMM becomes the variable semi-proximal ADMM.
- Let , be a positive-indefinite matrix; then VMIP-ADMM covers the indefinite proximal ADMM proposed in [29].
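To make the scheme and its special cases concrete, the following minimal Python sketch runs the iteration on a scalar toy problem. Everything specific in it is an assumption made for illustration and is not taken from the paper: the toy problem min ½(x − a)² + μ|y| subject to x − y = 0, the sign convention L_β = f + g − λ(x − y) + (β/2)(x − y)², the names `vmip_admm_scalar`, `s` and `t_of_k`, and the use of scalar proximal weights in place of matrices.

```python
def soft_threshold(v, kappa):
    """Proximal map of kappa*|.| evaluated at v."""
    if v > kappa:
        return v - kappa
    if v < -kappa:
        return v + kappa
    return 0.0


def vmip_admm_scalar(a, mu, beta=1.0, s=0.0, t_of_k=lambda k: 0.0, iters=200):
    """Proximal ADMM on the toy problem
        min 0.5*(x - a)^2 + mu*|y|   s.t.   x - y = 0.

    s      : fixed proximal weight of the x-subproblem (assumed >= 0).
    t_of_k : hypothetical callable giving the proximal weight of the
             y-subproblem at iteration k; it may be negative as long as
             beta + t_of_k(k) > 0, so that the y-subproblem stays
             strongly convex.
    """
    x = y = lam = 0.0
    for k in range(iters):
        t = t_of_k(k)
        if beta + t <= 0:
            raise ValueError("y-subproblem is not strongly convex")
        # x-step: minimize 0.5(x-a)^2 - lam*x + beta/2 (x-y)^2 + s/2 (x-x_k)^2
        x = (a + lam + beta * y + s * x) / (1.0 + beta + s)
        # y-step: minimize mu|y| + lam*y + beta/2 (x-y)^2 + t/2 (y-y_k)^2
        y = soft_threshold((beta * x - lam + t * y) / (beta + t),
                           mu / (beta + t))
        # multiplier step with unit stepsize
        lam = lam - beta * (x - y)
    return x, y, lam
```

With the default arguments (`s=0`, `t_of_k` returning 0) the loop is the classical ADMM; a constant non-negative `t_of_k` gives a semi-proximal variant, and a slightly negative one such as `lambda k: -0.2` gives an indefinite proximal term. On this toy problem with `a=3.0`, `mu=1.0`, all three settings drive `x` and `y` toward the soft-thresholded value 2.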
We present sufficient conditions on for the global convergence of VMIP-ADMM. The proof follows the analysis technique of Gu et al. [24], which separates the constant indefinite term “” into two semidefinite parts as . Moreover, we provide a construction of the indefinite term via the BFGS update. We extend a useful theorem in [25] to the special case in which the -subproblems (1.6b) are unconstrained quadratic programming problems. We construct the with , where is the Hessian matrix of the augmented Lagrangian function (1.2) and is generated by the BFGS update with respect to , . We also show that this construction of satisfies the above conditions for the global convergence property when .
The rest of the paper is organized as follows. In Section 2, we first give notation and some preliminaries that will be useful for the subsequent analysis, and then present sufficient conditions on the proximal matrices for the global convergence. In Section 3, we discuss choices of the proximal matrix that guarantee the global convergence. We also show how to determine the value of . Conclusions and future work are given in Section 4.
2 Global convergence of the variable metric indefinite proximal ADMM
In this section, we show the global convergence of the variable metric indefinite proximal ADMM (1.6c) (VMIP-ADMM) for problem (1.1). To this end, we first present optimality conditions of problem (1.1) and some useful properties which will be frequently used in our analysis. Then we give sufficient conditions on under which VMIP-ADMM converges globally.
2.1 Optimality conditions for problem (1.1)
Let The KKT conditions of problem (1.1) are written as:
[TABLE]
Let be a set of satisfying the KKT conditions (2.1d).
Throughout this paper, we make the following assumption.
Assumption 2.1.
The set of KKT points is non-empty.
The optimality conditions of subproblems (1.6a) and (1.6b) are, respectively,
[TABLE]
and
[TABLE]
where and .
Since from (1.6c), we have
[TABLE]
and
[TABLE]
Then the above optimality conditions can be written as
[TABLE]
and
[TABLE]
2.2 Notations and Conditions on
We use the following notations throughout this paper:
[TABLE]
Since the subdifferential mappings of the closed proper convex functions and are maximal monotone, there exist two positive semidefinite matrices and such that for all , , and ,
[TABLE]
and for all , , and ,
[TABLE]
Let denote
[TABLE]
We first give the conditions for and the indefinite proximal sequence to guarantee the global convergence.
Condition 2.1.
The matrix in (1.6a) satisfies
- (a) ;
- (b) .
Moreover, for sequence generated in (1.6c), there exist a non-negative sequence and positive semidefinite sequences and such that
- (c) for all ;
- (d) for all ;
- (e) ;
- (f) for all ;
- (g) , for all .
Conditions (a) and (b) indicate that the proximal matrix is allowed to be slightly indefinite but no less than . Condition (c) decomposes the indefinite matrix into two positive semidefinite parts. Note that we require the second part to be fixed. This condition will play an important role in the main analysis. Condition (d) allows to be indefinite. Conditions (e) and (f) are the boundedness conditions for the positive semidefinite part and the indefinite , respectively. Condition (g) is a requirement for global convergence and also an important condition for discussing the range of the indefiniteness.
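The idea of writing an indefinite symmetric matrix as a difference of two positive semidefinite parts can be illustrated with a small Python sketch. It uses a generic eigenvalue split of a single 2×2 matrix, which is an assumption chosen for illustration: the condition in the paper additionally asks the subtracted part to be the same for all iterations, which this sketch does not enforce. The names `eig_sym2` and `psd_split` are hypothetical.

```python
import math


def eig_sym2(m):
    """Eigenpairs of a symmetric 2x2 matrix [[a, b], [b, d]].

    Returns ((lam_max, v_max), (lam_min, v_min)) with orthonormal vectors.
    """
    (a, b), (_, d) = m
    mean, half = 0.5 * (a + d), 0.5 * (a - d)
    r = math.hypot(half, b)
    theta = 0.5 * math.atan2(b, half)  # rotation angle that diagonalizes m
    c, s = math.cos(theta), math.sin(theta)
    return (mean + r, (c, s)), (mean - r, (-s, c))


def psd_split(m):
    """Return (p, n) with m = p - n and both p and n positive semidefinite."""
    p = [[0.0, 0.0], [0.0, 0.0]]
    n = [[0.0, 0.0], [0.0, 0.0]]
    for lam, v in eig_sym2(m):
        # non-negative eigenvalues go to p, negative ones (sign-flipped) to n
        target, weight = (p, lam) if lam >= 0 else (n, -lam)
        for i in range(2):
            for j in range(2):
                target[i][j] += weight * v[i] * v[j]
    return p, n
```

For example, `psd_split([[1.0, 2.0], [2.0, 1.0]])` splits a matrix with eigenvalues 3 and −1 into a rank-one positive part and a rank-one part that is subtracted.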
For simplicity, we further define the following matrices. For all ,
[TABLE]
where and are given in (1.6c).
Moreover, we also define the following matrices
[TABLE]
where is a sequence satisfying Condition 2.1. Note that for all .
2.3 Technical lemmas for convergence analysis of the variable metric indefinite proximal ADMM
In order to show that VMIP-ADMM converges to a solution of (1.1) globally, we first give some properties for the sequence generated by (1.6c).
Lemma 2.2.
Let be generated by (1.6c). Then, for given , we have
[TABLE]
Proof.
By taking and in the optimality conditions (2.2) and (2.3), respectively, we have
[TABLE]
and
[TABLE]
where and .
The inequalities are further rearranged as
[TABLE]
and
[TABLE]
Moreover, from (2.4)-(2.5) with , , and , we have
[TABLE]
and
[TABLE]
where and satisfy the KKT conditions (2.1a) and (2.1b), respectively. It then follows from (2.1a) and (2.11) that
[TABLE]
Combining this inequality and (2.9), we have
[TABLE]
In a similar way, we have from (2.1b), (2.10) and (2.12) that
[TABLE]
Rearranging (1.6c), we have . It then follows from (2.1c) that
[TABLE]
Adding (2.3) and (2.14), and recalling the definition of and , it holds that
[TABLE]
The inequality (2.8) in Lemma 2.2 is further rearranged as follows.
Lemma 2.3.
Let be generated by (1.6c). Then, for given , we have
[TABLE]
Proof.
Noting that , twice the right-hand side of (2.8) can be written as
[TABLE]
where the last equality follows from (1.6c). Then the assertion is directly obtained from (2.8). ∎
Next we give a simple but important lemma.
Lemma 2.4.
For vectors , and symmetric positive semidefinite matrices , we have that
[TABLE]
Proof.
For a positive semidefinite matrix , we have
[TABLE]
which implies
[TABLE]
In a similar way for , we have
[TABLE]
The assertion immediately follows by adding (2.17) and (2.18). ∎
In order to bound further, we now give two technical lemmas that provide upper bounds for the cross term in (2.3).
Lemma 2.5.
Let be generated by the scheme (1.6c). Suppose that the proximal sequence satisfies Condition 2.1. Then it holds that
[TABLE]
where and are defined in (2.7).
Proof.
From the optimality condition (2.3) for , we can easily derive the optimality condition for as
[TABLE]
Choosing in (2.3), we have
[TABLE]
Moreover, letting in (2.20), we have
[TABLE]
Summing inequalities (2.3) and (2.22), we obtain that
[TABLE]
It then follows from (2.5) that
[TABLE]
which is equivalent to
[TABLE]
Recall that from (c) in Condition 2.1 and . Then we have
[TABLE]
where the inequality follows from (2.16) with , , and .
We then have from (2.23) that
[TABLE]
where the second inequality follows from and (2.3), the third inequality follows from Condition 2.1 (d), and the last equality follows from the definitions (2.7a) and (2.7b). This proves the assertion (2.19). ∎
Besides Lemma 2.5, we can derive another estimation for , whose proof is similar to that in [29, Lemma 4.4].
Lemma 2.6.
Let be generated by the scheme (1.6c). Then, for any , it holds that
[TABLE]
Proof.
See [29, Lemma 4.4]. ∎
Based on the above two lemmas for , we can further bound in (2.3) of Lemma 2.3.
Lemma 2.7.
Let be generated by (1.6c). Suppose that the proximal sequence satisfies Condition 2.1. Then, for given , we have
[TABLE]
Proof.
The term in inequality (2.3) can be bounded using (2.19) and (2.25) from the two lemmas above, which yields the assertion. ∎
2.4 Global Convergence of the variable metric indefinite proximal ADMM
In this subsection we show the global convergence based on the results in the previous subsection and Condition 2.1. Firstly, we obtain the following contractive result, which will play a key role in proving the convergence of (1.6c).
Lemma 2.8.
Let , and let be generated by the scheme (1.6c). Suppose that the proximal sequence satisfies Condition 2.1. Then we have
[TABLE]
where and are given in (2.7).
Proof.
By the identity , we get
[TABLE]
Moreover,
[TABLE]
Then we have
[TABLE]
Since the term in equality (2.4) can be bounded by (2.7) in Lemma 2.7, we can rearrange (2.4) as
[TABLE]
where the last equality follows from the definitions of and in (2.6). Rearranging (2.4) further, we have
[TABLE]
that is,
[TABLE]
From the definition of in (2.6), inequality (2.4) can be written as
[TABLE]
where the second inequality follows from the well-known inequality with , and .
From the definitions (2.7b) and (2.7c), we have that
[TABLE]
Thus the proof is completed. ∎
Condition 2.1 (a) implies for all . Moreover, Condition 2.1 (g) implies for all . Therefore, Term1 in (2.8) is always nonnegative, which indicates the contraction of the sequence .
It follows from the definition of and Condition 2.1 (a), (c) and (e) that for all . We define two constants and as follows:
[TABLE]
From the assumption and , we have and . Moreover, we can easily get
[TABLE]
which means that the sequence is bounded.
Now we give the main convergence theorem of this subsection.
Theorem 2.9.
Let , and let be a sequence generated by (1.6c). Suppose that is a sequence satisfying Condition 2.1. Then the sequence converges to a point .
Proof.
First we show that the sequence is bounded. Since , we have
[TABLE]
Combining the inequality (2.31) with (2.8) in Lemma 2.8, we have
[TABLE]
It then follows that for all ,
[TABLE]
Note that
[TABLE]
is positive definite from Condition 2.1 (d), and is a constant. It then follows from (2.4) that and are bounded. We now show that is also bounded.
[TABLE]
Summing up the inequalities, we obtain
[TABLE]
Since is a finite constant, we have
[TABLE]
which indicates that
[TABLE]
Note that , and
[TABLE]
It then follows from (2.35) that is bounded. Moreover, inequalities (2.4) and (2.34) imply that is bounded. Therefore is bounded since
[TABLE]
From the positive definiteness of in Condition 2.1 (b), it follows that is also bounded. Consequently, the sequence is bounded.
Next we show that any cluster point of the sequence is an optimal solution of (1.1) and that the sequence has only one cluster point. This can be done in a way similar to the proof in [25]. ∎
3 VMIP-ADMM with the BFGS update
As shown in recent work [25, 26], a special variable metric proximal term generated via the BFGS update can reach a solution in fewer iterations and less CPU time than the proximal ADMM [18, 29] with a fixed proximal matrix . Moreover, in those experiments, a slightly indefinite variable also performs well, although no theoretical analysis was given. Note that this choice assumes that the -subproblems (1.6b) are unconstrained quadratic programming problems. Based on the analysis above and the previous studies, we propose indefinite proximal terms updated by the BFGS update, and show that satisfies Condition 2.1.
3.1 Construction of the indefinite proximal matrix via the BFGS update
Inspired by the semidefinite proximal ADMM with the BFGS update [25, 26], we construct the indefinite matrix by the BFGS update.
We first explain the pure BFGS update for the following unconstrained quadratic optimization:
[TABLE]
where is a positive definite matrix. Let and . Note that when . The BFGS update generates a sequence of approximate matrices of , and its inverse . For a given matrix , the BFGS update generates and with and as follows
[TABLE]
[TABLE]
Note that and are positive definite whenever since . Note also that .
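The update can be sketched in a few lines of Python. The formulas below are the textbook BFGS recursions for a Hessian approximation and for its inverse, written under the assumption that they match the form used in this section. The names `b`, `h`, `s`, `y` are chosen here for illustration and need not coincide with the paper's symbols. For a quadratic with positive definite matrix M, the curvature vector is taken as y = M s.

```python
def matvec(m, v):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def bfgs_update(b, s, y):
    """Textbook BFGS update of a symmetric Hessian approximation b."""
    sy = dot(s, y)
    if sy <= 0:
        raise ValueError("curvature condition s.y > 0 violated")
    bs = matvec(b, s)
    sbs = dot(s, bs)
    n = len(s)
    # B+ = B - (B s)(B s)^T / (s^T B s) + y y^T / (s^T y)
    return [[b[i][j] - bs[i] * bs[j] / sbs + y[i] * y[j] / sy
             for j in range(n)] for i in range(n)]


def bfgs_inverse_update(h, s, y):
    """Textbook BFGS update of the symmetric inverse approximation h."""
    sy = dot(s, y)
    if sy <= 0:
        raise ValueError("curvature condition s.y > 0 violated")
    hy = matvec(h, y)
    yhy = dot(y, hy)
    n = len(s)
    # H+ = (I - s y^T / sy) H (I - y s^T / sy) + s s^T / sy, expanded
    return [[h[i][j] - (s[i] * hy[j] + hy[i] * s[j]) / sy
             + (1.0 + yhy / sy) * s[i] * s[j] / sy
             for j in range(n)] for i in range(n)]
```

Starting from a matrix and its inverse, the two updates stay inverse to each other, and the new approximation satisfies the secant equation `bfgs_update(b, s, y)` times `s` equals `y`. Because s·y = sᵀM s > 0 for a positive definite M, positive definiteness is carried from one iterate to the next.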
We now explain how to construct via the BFGS update. Throughout this section we suppose that in the objective function (1.1) is a convex quadratic function. Then -subproblems (1.6b) are unconstrained quadratic programming problems, and the Hessian matrix of the augmented Lagrangian function (1.2) is a constant matrix given as
[TABLE]
where . Note that is always positive semidefinite since .
We consider a perturbed matrix with a sufficiently small , and construct an approximate matrix of via the BFGS update (3.1). Let , where is a sequence generated by (1.6c). We propose that is generated as
[TABLE]
where , and is a sequence such that and . We can rewrite the update formula (3.3) as
[TABLE]
where is updated by the pure BFGS update (3.1) with respect to at every iteration. Note that when .
We then propose the following construction of via the BFGS update.
3.2 Discussion on the Condition 2.1 for the indefinite matrix
We now consider matrices and such that in Condition 2.1 (c). Let
[TABLE]
Note that and . Thus we only show that is positive semidefinite.
To this end, we give an extension result related to Theorem 2.2 in [25].
Lemma 3.1.
Let be a positive definite matrix. Let such that , and let . If a given matrix satisfies with , then which is generated by the BFGS update (3.2) with respect to also satisfies .
Proof.
Let be an arbitrary nonzero vector in , and . As shown in [25, Lemma 2.1], there exist and such that . Together with and , we can obtain that for any ,
[TABLE]
where the fourth equality follows from (3.2), and the inequality follows from the positive definiteness of and the assumption that . Since is arbitrary, we have . ∎
Lemma 3.1 implies that when with , and hence
[TABLE]
That is, if and , we have for all , and hence for all . When , it is reduced to the variable metric semi-proximal ADMM in [25].
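As a numerical sanity check (not a proof), the following Python sketch tests the kind of order preservation discussed here on a 2×2 quadratic. It assumes that the bound has the form B ⪰ c·M with a constant 0 < c ≤ 1, since the exact symbols are not recoverable from the text. It also assumes the textbook BFGS update with curvature vector y = M s. The names `order_gaps` and `c` are hypothetical.

```python
import math


def matvec(m, v):
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def bfgs_update(b, s, y):
    """Textbook BFGS update of a symmetric Hessian approximation b."""
    sy = sum(si * yi for si, yi in zip(s, y))
    bs = matvec(b, s)
    sbs = sum(si * bsi for si, bsi in zip(s, bs))
    n = len(s)
    return [[b[i][j] - bs[i] * bs[j] / sbs + y[i] * y[j] / sy
             for j in range(n)] for i in range(n)]


def min_eig_sym2(m):
    """Smallest eigenvalue of a symmetric 2x2 matrix."""
    return (0.5 * (m[0][0] + m[1][1])
            - math.hypot(0.5 * (m[0][0] - m[1][1]), m[0][1]))


def order_gaps(m, b0, steps, c=1.0):
    """Run BFGS with y = M s and record the smallest eigenvalue of B - c*M."""
    b, gaps = b0, []
    for s in steps:
        b = bfgs_update(b, s, matvec(m, s))
        diff = [[b[i][j] - c * m[i][j] for j in range(2)] for i in range(2)]
        gaps.append(min_eig_sym2(diff))
    return gaps
```

If the initial matrix satisfies the bound, every recorded gap should be non-negative up to rounding. With `c=1.0` this corresponds to the semidefinite case, while `c<1` allows B − M itself to be indefinite.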
For instance, we can choose the initial matrix as
[TABLE]
It is easy to see that .
Next we show that the , and satisfy Condition 2.1 (d)-(g). We suppose that and .
First we show Condition 2.1 (e). Note that , , and is the constant matrix. Therefore, we can suppose that is bounded above by some constant , that is, . Moreover, . Then we can obtain that
[TABLE]
On the other hand, we have
[TABLE]
Let . Then we have
[TABLE]
Note that . Then which shows that Condition (d) holds.
Next we show Condition (f). Since (3.4) implies that and is positive semidefinite, we have
[TABLE]
Obviously,
[TABLE]
Finally, we show Condition (g). From the definition of , we have
[TABLE]
where the matrix inequality follows from . Note that there exist such that for all . Without loss of generality, we assume and thus for all .
Let . It is easy to see that . Moreover, and .
As a conclusion of the above discussion, the indefinite proximal term generated via the BFGS update can satisfy Condition 2.1. The VMIP-ADMM also covers the general indefinite proximal ADMM, as the following remark shows.
Remark 3.2.
When is a constant sequence for all , that is, , we can write , where . It is easy to check that the boundedness Conditions (e) and (f) hold immediately when . Letting and , we choose
[TABLE]
Then Condition (d) holds. For , taking , Condition (g) becomes
[TABLE]
The method then reduces to the indefinite proximal ADMM in [29].
4 Conclusions
In this paper, we proposed a variable metric indefinite proximal ADMM whose indefinite proximal term can be chosen differently at every iteration. We proved the global convergence of the proposed method under some requirements by applying an analysis technique in [24]. Moreover, for special problems whose -subproblems are unconstrained quadratic programming problems, we proposed to construct the indefinite term via the BFGS update. We showed that such a construction can satisfy the general convergence conditions.
Note that a strictly contractive version of the original ADMM, which is known as the Peaceman-Rachford splitting method (PRSM), sometimes performs better in numerical experiments with some penalty parameters [28]. An indefinite proximal version of the PRSM has also been studied by many researchers [21, 32]. A further extension is to consider the variable metric indefinite term for PRSM. We leave this topic for future work.
On the other hand, choosing a suitable proximal term is important for designing a more efficient algorithm. The BFGS update provides better performance for some special problems whose -subproblem is a quadratic problem. It is worth developing efficient proximal terms for general nonlinear subproblems.
References
- [1] H. Attouch, L. M. Briceno-Arias, and P. L. Combettes, A parallel splitting method for coupled monotone inclusions, SIAM Journal on Control and Optimization, 48 (2010), pp. 3246–3270.
- [2] S. Banert, R. I. Bot, and E. R. Csetnek, Fixing and extending some recent results on the ADMM algorithm, arXiv preprint arXiv:1612.05057, (2016).
- [3] H. H. Bauschke, P. L. Combettes, et al., Convex analysis and monotone operator theory in Hilbert spaces, vol. 408, Springer, 2011.
- [4] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein, Distributed optimization and statistical learning via the alternating direction method of multipliers, Foundations and Trends® in Machine Learning, 3 (2011), pp. 1–122.
- [5] A. Chambolle and T. Pock, A first-order primal-dual algorithm for convex problems with applications to imaging, Journal of Mathematical Imaging and Vision, 40 (2011), pp. 120–145.
- [6] G. Chen and M. Teboulle, A proximal-based decomposition method for convex minimization problems, Mathematical Programming, 64 (1994), pp. 81–101.
- [7] P. Chen, J. Huang, and X. Zhang, A primal–dual fixed point algorithm for convex separable minimization with applications to image restoration, Inverse Problems, 29 (2013), p. 025011.
- [8] P. L. Combettes, Iterative construction of the resolvent of a sum of maximal monotone operators, J. Convex Anal, 16 (2009), pp. 727–748.
