Convergence Revisit on Generalized Symmetric ADMM
Jianchao Bai, Xiaokai Chang, Jicheng Li, Fengmin Xu

TL;DR
This paper revisits the convergence properties of the generalized symmetric ADMM algorithm, establishing sublinear and linear convergence rates under specific conditions, thereby enhancing understanding of its theoretical performance.
Contribution
It provides new convergence rate results for the generalized symmetric ADMM, including sublinear and linear rates under particular assumptions and parameter settings.
Findings
Sublinear nonergodic convergence rate established.
Linear convergence under piecewise linear sub-differential and polyhedral constraints.
Convergence results depend on dual stepsize parameters within a specific isosceles triangle region.
Abstract
In this note, we show a sublinear nonergodic convergence rate for the algorithm developed in [Bai, et al. Generalized symmetric ADMM for separable convex optimization. Comput. Optim. Appl. 70, 129-170 (2018)], as well as its linear convergence under the assumptions that the sub-differential of each component objective function is piecewise linear and all the constraint sets are polyhedra. These remaining convergence results are established for the stepsize parameters of dual variables belonging to a special isosceles triangle region, with the aim of strengthening our understanding of the convergence of the generalized symmetric ADMM.
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Advanced Optimization Algorithms Research · Optimization and Variational Analysis
Funding: The work was supported by the National Natural Science Foundation of China (Nos. 11671318; 11571271; 11631013) and the Natural Science Foundation of Fujian Province (No. 2016J01028). The second author Xiaokai Chang was supported by the Hongliu Foundation of First-class Disciplines of Lanzhou University of Technology.
Jianchao Bai: Department of Applied Mathematics, Northwestern Polytechnical University, Xi’an 710129, China. Past address: School of Mathematics and Statistics, Xi’an Jiaotong University, Xi’an 710049, China.
Xiaokai Chang: College of Science, Lanzhou University of Technology, Lanzhou 730050, China.
Jicheng Li: School of Mathematics and Statistics, Xi’an Jiaotong University, Xi’an 710049, China.
Fengmin Xu: School of Economics and Finance, Xi’an Jiaotong University, Xi’an 710049, China.
Keywords: Convex optimization; Alternating direction method of multipliers; Symmetric parameter domain; Convergence rate
Mathematics Subject Classification(2010): 65K10; 68W40; 90C25
1 Introduction
We revisit the following prototype multi-block separable convex optimization problem:
[TABLE]
where are closed and proper convex functions (possibly nonsmooth); and are given matrices and vectors, respectively; and are polyhedra; and denote two integers. Throughout we assume the solution set of the problem (1) is nonempty and all the matrices and have full column rank.
By denoting and , the augmented Lagrangian function of the problem (1) is written as
[TABLE]
where is a penalty parameter and
[TABLE]
denotes the Lagrangian function associated with a Lagrange multiplier . As studied in our recent work [2], the Generalized Symmetric Alternating Direction Method of Multipliers (GS-ADMM) performs the following updates:
[TABLE]
where and are stepsize parameters satisfying
[TABLE]
and are proximal parameters for the regularization terms and , respectively.
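The update pattern described above (a first primal group update, an intermediate multiplier update, a second primal group update, and a final multiplier update) can be illustrated on a toy problem. The sketch below is a minimal illustration under stated assumptions, not the paper's implementation: it takes one scalar block per group, the objective 0.5*(x-a)^2 + 0.5*(y-b)^2 with constraint x + y = c, the sign convention L = f + g - lam*(x + y - c), and hypothetical names `tau`, `s`, `sigma1`, `sigma2`, `beta` for the two dual stepsizes, the two proximal parameters and the penalty parameter. The default values are assumed to lie in the admissible parameter region.

```python
def symmetric_admm_scalar(a, b, c, beta=1.0, tau=0.5, s=0.5,
                          sigma1=0.5, sigma2=0.5, iterations=200):
    """Toy symmetric ADMM-type loop for
    min 0.5*(x-a)^2 + 0.5*(y-b)^2  subject to  x + y = c.

    All parameter names are illustrative assumptions."""
    x = y = lam = 0.0
    for _ in range(iterations):
        x_prev, y_prev = x, y
        # x-update: closed-form minimizer of the augmented Lagrangian in x
        # plus the proximal term 0.5*sigma1*beta*(x - x_prev)^2.
        x = (a + lam - beta * (y_prev - c) + sigma1 * beta * x_prev) / (
            1.0 + beta + sigma1 * beta)
        # Intermediate multiplier update with stepsize tau.
        lam_half = lam - tau * beta * (x + y_prev - c)
        # y-update: same structure, using the new x and the intermediate multiplier.
        y = (b + lam_half - beta * (x - c) + sigma2 * beta * y_prev) / (
            1.0 + beta + sigma2 * beta)
        # Final multiplier update with stepsize s.
        lam = lam_half - s * beta * (x + y - c)
    return x, y, lam


# KKT solution of the toy problem: lam* = (c - a - b)/2, x* = a + lam*, y* = b + lam*.
x, y, lam = symmetric_admm_scalar(a=1.0, b=2.0, c=5.0)
print(x, y, lam)
```

For this strongly convex toy instance the iterates can be compared directly with the closed-form KKT point, which makes it easy to experiment with other stepsize pairs.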
By making use of a prediction-correction interpretation of GS-ADMM, we analyzed its global convergence, its sublinear convergence rate in the ergodic sense, and the convergence complexity of two special cases allowing either or to be zero. However, as mentioned by the past reviewers, two tasks remained unsettled: (1) how to establish its worst-case convergence rate in the nonergodic sense, where denotes the iteration number; and (2) whether GS-ADMM admits a linear convergence rate under some mild assumptions. This note aims to give positive answers to these questions, but only for the following subregion (shown in the right-hand side of Fig. 1) of , that is,
[TABLE]
Notice that the above region is much wider than that () in [8, Algorithm 3]. Moreover, it can be seen from the later analysis that the symmetric ADMM (S-ADMM, [9]) for solving the two-block separable convex optimization problem also has the worst-case convergence rate in the nonergodic sense, as well as a global linear convergence rate, for parameters belonging to .
1.1 Relationship of GS-ADMM to related works
The algorithm GS-ADMM was initially proposed to generalize S-ADMM [9] for solving the grouped multi-block separable convex optimization problem (1); its convergence and iteration complexity can still be ensured for a larger domain of stepsizes of dual variables than that introduced in [9]. In practice, convergence of GS-ADMM was analyzed by estimating the lower bound of directly and by treating the domain of stepsize parameters as a whole, while convergence of S-ADMM was shown separately by splitting the domain of into several subdomains, where and are called the predictive variable and the correcting variable, respectively. Note that by taking , GS-ADMM with reduces to S-ADMM but continues to converge in the relatively larger convergence domain . In addition, the original S-ADMM only works for the two-block case and may not be convenient for solving large-scale problems, while GS-ADMM can handle large-scale multiple-block problems since the block variables within each group are updated in a Jacobian scheme.
Without the additional dual variable update (i.e. ), GS-ADMM becomes a proximal ADMM-type algorithm with . Moreover, it becomes the classical ADMM proposed by Glowinski-Marrocco [7] when considering the simple two-block case without proximal regularization terms. To the best of our knowledge, the first proximal ADMM was proposed by Eckstein [3] as GS-ADMM with and with the following proximal terms
[TABLE]
where for any nonzero scalars . Later, an extension of the convergence analysis from the classical ADMM to GS-ADMM with and , but allowing the stepsize to stay in the range , was studied; see Xu-Wu [14] and Fazel et al. [5] for more details. Recently, He-Xu-Yuan [10] constructed a proximal ADMM for solving the problem (1) with only block variables, and their algorithm could be regarded as a special version of GS-ADMM with barring the -updates. In particular, the partially proximal ADMM-type algorithm [12] with a specified regularization term as ours could be treated as GS-ADMM with and . Considering the middle update (i.e. ), the convergence domain of the dual stepsizes of GS-ADMM is still larger than that in the symmetric ADMM with indefinite proximal regularization [6, 13].
1.2 Notations and organizations
Throughout the note, the symbols denote the sets of real numbers, dimensional real column vectors and real matrices, respectively. For any , represents their inner product and denotes the Euclidean norm of , where T denotes the transpose operation. For any symmetric matrix , we define which is not necessarily nonnegative unless is positive definite. The symbols and denote respectively the maximum and minimum eigenvalue of a square matrix. The notations and stand for the identity matrix and zero matrix with proper dimensions, respectively. We call a piecewise linear multifunction if its graph is a union of finitely many polyhedra. For convenience, let
[TABLE]
and the corresponding solution set be where and . We also preset
[TABLE]
[TABLE]
[TABLE]
and
[TABLE]
The rest of this paper is organized as follows. In Section 2, by making use of some well-known identities, inequalities and matrix decomposition techniques, we first establish sublinear convergence rate of GS-ADMM in the nonergodic sense. Then, its global linear convergence rate, measured by an error function or , is analyzed under mild assumptions. Finally, we briefly conclude the paper in Section 3.
2 Main results
At the beginning of this section, we first analyze the worst-case nonergodic convergence rate of GS-ADMM for any . Then, by using several well-known inequalities its convergence rate is strengthened to linear under the assumption that the subdifferential of each objective function is piecewise linear.
2.1 Sublinear nonergodic convergence rate
Let us review the following two basic lemmas given in [2], which interpret GS-ADMM as a prediction-correction procedure.
Lemma 2.1
For the iterates defined in (5), we have and
[TABLE]
where and
[TABLE]
with
[TABLE]
[TABLE]
Lemma 2.2
For the sequences and generated by GS-ADMM, the following equality holds
[TABLE]
where
[TABLE]
Now, we give a lemma to guarantee the positive definiteness of , defined by
[TABLE]
which plays a significant role in showing the whole convergence rate of GS-ADMM.
Lemma 2.3
Let be given by (8) and (11), respectively. Then, the matrix is symmetric positive definite for any .
Proof. By simple calculations, the matrix can be explicitly written as
[TABLE]
where is defined in (9) and
[TABLE]
Clearly, the matrix is symmetric positive definite if and only if both and are symmetric positive definite. Now, is symmetric, and its positive definiteness is guaranteed by the known conditions that and the full column rank assumption on the matrices . Hence, we just need to demonstrate the positive definiteness of the matrix .
Noting that by the region shown in (3) we have
[TABLE]
Besides, it follows
[TABLE]
where is a diagonal matrix and
[TABLE]
In the above decomposition, we have
[TABLE]
and
[TABLE]
where
[TABLE]
So, the matrix is positive definite if and only if
[TABLE]
is positive definite. Notice that is positive definite if , and is positive definite if
[TABLE]
which is clearly guaranteed by the conditions (12). This completes the proof.
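The block-wise step in the proof above can be sanity-checked numerically. The sketch below assumes the matrix in question is block diagonal, which is what the "if and only if" statement suggests; the helper names `is_positive_definite` and `block_diag` and the sample blocks are illustrative and not taken from the paper. Positive definiteness is tested by attempting a Cholesky factorization in plain Python.

```python
import math


def is_positive_definite(matrix, tol=1e-12):
    """Return True if `matrix` (list of rows) is symmetric positive definite.

    Uses a plain Cholesky attempt: the factorization succeeds exactly when
    every pivot is strictly positive."""
    n = len(matrix)
    for i in range(n):
        for j in range(i):
            if abs(matrix[i][j] - matrix[j][i]) > tol:
                return False  # not symmetric
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                pivot = matrix[i][i] - partial
                if pivot <= 0.0:
                    return False
                lower[i][j] = math.sqrt(pivot)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return True


def block_diag(top, bottom):
    """Assemble the block-diagonal matrix diag(top, bottom)."""
    n, m = len(top), len(bottom)
    out = [[0.0] * (n + m) for _ in range(n + m)]
    for i in range(n):
        for j in range(n):
            out[i][j] = float(top[i][j])
    for i in range(m):
        for j in range(m):
            out[n + i][n + j] = float(bottom[i][j])
    return out


pd_block = [[2.0, 1.0], [1.0, 2.0]]          # eigenvalues 1 and 3
indefinite_block = [[1.0, 2.0], [2.0, 1.0]]  # eigenvalues -1 and 3
print(is_positive_definite(block_diag(pd_block, pd_block)))
print(is_positive_definite(block_diag(pd_block, indefinite_block)))
```

A single indefinite diagonal block is enough to make the assembled matrix fail the test, which mirrors why the proof can treat the two blocks separately.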
Theorem 2.1
[2] The sequences and generated by GS-ADMM satisfy
[TABLE]
where and
[TABLE]
is symmetric positive definite for any .
In view of both Lemma 2.3 and Theorem 2.1, the sequence generated by GS-ADMM is contractive, which implies a global convergence of GS-ADMM. In fact, by estimating the lower bound of , a global convergence of GS-ADMM was proved in [2] for the larger region . Next, we will show sublinear nonergodic convergence rate of GS-ADMM for our discussed stepsize region .
Lemma 2.4
Let be given by (8), (11) and (16), respectively. Then, the sequences and generated by GS-ADMM satisfy
[TABLE]
Proof. Setting in (7), we obtain
[TABLE]
Meanwhile, the inequality (7) with also implies
[TABLE]
which, by letting , gives
[TABLE]
Because of the skew-symmetric property of , i.e.,
[TABLE]
we have from (17) and (18) that
[TABLE]
Thus, adding the identity
[TABLE]
to both sides of (19), we get
[TABLE]
which immediately completes the whole proof by the relationships in (10) and (16).
Next, we establish the worst-case nonergodic convergence rate of GS-ADMM in terms of optimality errors based on the following theorem.
Theorem 2.2
Let the sequences and be generated by GS-ADMM. Then, for any integer there exists a constant such that
[TABLE]
Proof. Combining the aforementioned Theorem 2.1 and Lemma 2.3, there exists a constant such that
[TABLE]
which suggests
[TABLE]
for any integer . Meanwhile, by setting and into the following well-known identity
[TABLE]
we have
[TABLE]
where the above first inequality uses Lemma 2.4 and the final equality uses Lemma 2.3. Therefore, it holds by (21) that
[TABLE]
Substituting it into (20), the proof is completed.
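The counting argument behind a nonergodic bound of this type fits in one line. The sketch below uses a generic sequence a_k (an assumed stand-in for the squared distance between successive iterates in the weighted norm) and a generic constant S; neither symbol is taken from the paper, and the exact constants in the theorem are not reproduced here.

```latex
% Assumptions (illustrative): a_k >= 0, a_{k+1} <= a_k for all k,
% and \sum_{k=0}^{\infty} a_k <= S < \infty.
\[
  (t+1)\, a_t \;\le\; \sum_{k=0}^{t} a_k \;\le\; S
  \quad\Longrightarrow\quad
  a_t \;\le\; \frac{S}{t+1}.
\]
```

Monotonicity gives the first inequality, since a_t is the smallest of the first t+1 terms, and summability gives the second.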
Theorem 2.3
For any integer , there exists a constant such that
[TABLE]
where is defined by (23) satisfying (25), and depends on the problem data and the parameters of GS-ADMM.
Proof. Let
[TABLE]
componentwisely defined as
[TABLE]
Then, according to the proof of [2, Lemma 2], that is, the first-order optimality conditions of the subproblems of GS-ADMM, we have
[TABLE]
which implies
[TABLE]
Here the notation denotes the normal cone of at . By (24) and Theorem 2.2, it can be deduced that
[TABLE]
where and in the following proof, depends only on the problem data and the parameters of GS-ADMM.
We next prove the inequality in the right-hand side of (22). Since the equality (6) can be rewritten as
[TABLE]
we have
[TABLE]
Clearly, a nonergodic convergence rate in general is stronger than the ergodic convergence rate for GS-ADMM. Let Then, for any tolerance , Theorem 2.2 tells us that it needs at most iterations to ensure If and , then we will have . Hence, we could use (or equivalently the iterate since by the proof of [2, Theorem 6]) as an approximate solution of the problem when the right-hand sides of the inequalities in (22) are sufficiently small.
2.2 Linear convergence rate
Throughout this subsection, all subdifferentials of the functions in (1) are assumed to be piecewise linear multi-functions. Under this hypothesis we will prove a global linear convergence rate of GS-ADMM with the aid of an error function
[TABLE]
If we simply denote by
Since each in the problem (1) is a polyhedron, so is convex and any projection operator is piecewise linear from [4, Proposition 4.1.4]. Here is nonexpansive, that is, the following inequality holds:
[TABLE]
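As a concrete instance of the nonexpansiveness just stated, consider the simplest polyhedral set, a box. The sketch below is illustrative only: the box, the sampling range and the helper names `project_onto_box` and `max_expansion_ratio` are assumptions, not objects from the paper. Projection onto a box is a coordinate-wise clip, which is also piecewise linear.

```python
import math
import random


def project_onto_box(point, lower, upper):
    """Euclidean projection onto the box {z : lower <= z <= upper} (coordinate-wise clip)."""
    return [min(max(v, lo), hi) for v, lo, hi in zip(point, lower, upper)]


def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def max_expansion_ratio(lower, upper, trials=1000, seed=0):
    """Largest observed ratio ||P(u) - P(v)|| / ||u - v|| over random pairs.

    Nonexpansiveness predicts that this ratio never exceeds 1."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        u = [rng.uniform(-3.0, 3.0) for _ in lower]
        v = [rng.uniform(-3.0, 3.0) for _ in lower]
        gap = euclidean_distance(u, v)
        if gap == 0.0:
            continue
        projected_gap = euclidean_distance(
            project_onto_box(u, lower, upper),
            project_onto_box(v, lower, upper))
        worst = max(worst, projected_gap / gap)
    return worst


lower, upper = [-1.0, -1.0, -1.0], [1.0, 1.0, 1.0]
print(project_onto_box([2.0, -5.0, 0.3], lower, upper))
print(max_expansion_ratio(lower, upper) <= 1.0 + 1e-12)
```

Random sampling is only a sanity check, not a proof; the inequality itself follows from convexity of the set.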
Let be the sub-differential of a convex function , defined as
[TABLE]
Then, for any saddle-point of (1), there exist and such that
[TABLE]
which can be characterized by solving the equation with
[TABLE]
Under the assumption that and are piecewise linear multi-functions, is also piecewise linear. Besides, if and only if . The following lemma, which follows from Robinson’s continuity property [11] for polyhedral multi-functions, shows that could provide a global error bound on the distance of to the solution set .
Lemma 2.5
Under the assumption that and are piecewise linear multi-functions, there exists a constant such that
[TABLE]
For convenience of analysis, let
[TABLE]
Define
[TABLE]
with
[TABLE]
Note that all the above quantities are positive since the matrices have full column rank. Hence, is a positive number.
Theorem 2.4
Let be defined in (27) with being defined in (26). Then, the sequences and generated by GS-ADMM satisfy
[TABLE]
Proof. Firstly, by the equation (20) mentioned in [2], that is,
[TABLE]
there exists such that
[TABLE]
Therefore, we have from the definition of and the nonexpansive property of the projection operator that
[TABLE]
where the second equality uses the fact
[TABLE]
Similarly, there exists such that
[TABLE]
Hence, we have
[TABLE]
Secondly, we can get by the update of and in GS-ADMM as well as (6) and (38) that
[TABLE]
which further shows
[TABLE]
Denote by
[TABLE]
Then, by combining (2.2), (2.2)-(50) together with the following identity
[TABLE]
for any , it can be achieved by the fact that
[TABLE]
[TABLE]
Based on the above preparations, we show a global linear convergence rate of GS-ADMM.
Theorem 2.5
Let be defined in (27) with being defined in (26). Then, there exists a constant such that the sequence generated by GS-ADMM satisfies
[TABLE]
where
[TABLE]
Proof. Because is a closed convex set, there exists a satisfying
[TABLE]
Then, by Lemma 2.5 and Theorem 2.4 there exists a constant such that
[TABLE]
where and are respectively defined in Lemma 2.3 and (16). So, we will have from the above inequality that
[TABLE]
This completes the whole proof.
Next, we show that generated by GS-ADMM converges to a point R-linearly.
Corollary 2.1
Let be defined in Theorem 2.5 and the sequence be generated by GS-ADMM. Then, there exists a point such that
[TABLE]
where
[TABLE]
Proof. Select such that and let
[TABLE]
Then, it follows from Theorem 2.1 that implying
[TABLE]
where the last inequality comes from Theorem 2.5. According to [2, Theorem 6], the sequence generated by GS-ADMM converges to a . Hence, we obtain by (52) that , which together with (53) shows
[TABLE]
Hence, the assertion (51) holds, namely, converges R-linearly.
3 Concluding remarks
In this note, we further study the iteration complexity of GS-ADMM for solving the prototype multi-block separable convex optimization model. We establish its sublinear nonergodic convergence rate and also an R-linear convergence rate under the assumptions that the sub-differential of each component function in the objective function is piecewise linear and all the constraint sets are polyhedra. By the fourth part discussed in [2] and the analysis in this work, GS-ADMM with either or has a convergence rate similar to that described in Theorem 2.3, Theorem 2.5 and Corollary 2.1. As the proof of Theorem 2.5 shows, the linear convergence analysis depends mainly on Theorem 2.4 and the positive definiteness of the matrix . Hence, if the sequence generated by an algorithm has a property similar to the results of Theorem 2.1, then one can prove that such an algorithm converges linearly provided that the weighted matrix is positive definite.
References
- [1]
- [2] Bai, J.C., Li, J.C., Xu, F.M., Zhang, H.C.: Generalized symmetric ADMM for separable convex optimization. Comput. Optim. Appl. 70, 129-170 (2018)
- [3] Eckstein, J.: Some saddle-function splitting methods for convex programming. Optim. Methods Softw. 4, 75-83 (1994)
- [4] Facchinei, F., Pang, J.S.: Finite-Dimensional Variational Inequalities and Complementarity Problems. Springer-Verlag, Berlin (2003)
- [5] Fazel, M., Pong, T.K., Sun, D.F., Tseng, P.: Hankel matrix rank minimization with applications to system identification and realization. SIAM J. Matrix Anal. Appl. 34, 946-977 (2013)
- [6] Gao, B., Ma, F.: Symmetric alternating direction method with indefinite proximal regularization for linearly constrained convex optimization. J. Optim. Theory Appl. 176, 178-204 (2018)
- [7] Glowinski, R., Marrocco, A.: Approximation par éléments finis d’ordre un et résolution, par pénalisation-dualité d’une classe de problèmes de Dirichlet non linéaires. Rev. Fr. Autom. Inform. Rech. Opér. Anal. Numér. 2, 41-76 (1975)
- [8] He, B.S., Yuan, X.M.: Block-wise alternating direction method of multipliers for multiple-block convex programming and beyond. SMAI J. Comput. Math. 1, 145-174 (2015)
