A block symmetric Gauss-Seidel decomposition theorem for convex composite quadratic programming and its applications
Xudong Li, Defeng Sun, Kim-Chuan Toh

TL;DR
This paper establishes a decomposition theorem for the block symmetric Gauss-Seidel method, linking it to convex quadratic programming, and extends it to solve convex composite quadratic problems with inexact and accelerated variants.
Contribution
The paper introduces the block sGS decomposition theorem and extends the classical block sGS method to convex composite quadratic programming, including inexact and accelerated versions.
Findings
Exact solution of quadratic programming via block sGS cycles
Extension to convex composite quadratic programming with nonsmooth terms
Achieves accelerated convergence rate of O(1/k^2) with inexact computation
Xudong Li: Department of Mathematics, National University of Singapore, 10 Lower Kent Ridge Road, Singapore 119076.
Defeng Sun: Department of Mathematics and Risk Management Institute, National University of Singapore, 10 Lower Kent Ridge Road, Singapore 119076.
Kim-Chuan Toh: Department of Mathematics, National University of Singapore, 10 Lower Kent Ridge Road, Singapore 119076.
(May 16, 2017)
Abstract
For a symmetric positive semidefinite linear system of equations $\mathcal{Q}\mathbf{x}=\mathbf{b}$, where $\mathbf{x}=(x_1,\ldots,x_s)$ is partitioned into $s$ blocks, with $s\geq 2$, we show that each cycle of the classical block symmetric Gauss-Seidel (block sGS) method exactly solves the associated quadratic programming (QP) problem with an extra proximal term of the form $\frac{1}{2}\|\mathbf{x}-\mathbf{x}^{k}\|_{\mathcal{T}}^{2}$ added, where $\mathcal{T}$ is a symmetric positive semidefinite matrix related to the sGS decomposition of $\mathcal{Q}$ and $\mathbf{x}^{k}$ is the previous iterate. By leveraging this connection to optimization, we are able to extend the result (which we name the block sGS decomposition theorem) to a convex composite QP (CCQP) with an additional possibly nonsmooth term in $x_1$, i.e., $\min\{p(x_{1})+\frac{1}{2}\langle\mathbf{x},\,\mathcal{Q}\mathbf{x}\rangle-\langle\mathbf{b},\,\mathbf{x}\rangle\}$, where $p(\cdot)$ is a proper closed convex function. Based on the block sGS decomposition theorem, we extend the classical block sGS method to solve a CCQP. In addition, our extended block sGS method has the flexibility of allowing for inexact computation in each step of the block sGS cycle. At the same time, we can also accelerate the inexact block sGS method to achieve an iteration complexity of $O(1/k^{2})$ after performing $k$ cycles. As a fundamental building block, the block sGS decomposition theorem has played a key role in various recently developed algorithms such as the inexact semiproximal ALM/ADMM for linearly constrained multi-block convex composite conic programming (CCCP), and the accelerated block coordinate descent method for multi-block CCCP.
Keywords: Convex composite quadratic programming, block symmetric Gauss-Seidel, Schur complement, augmented Lagrangian method
AMS subject classifications: 90C06, 90C20, 90C25, 65F10
1 Introduction
It is well known that the classical block symmetric Gauss-Seidel (block sGS) method [1, 7, 13, 23] can be used to solve a symmetric positive semidefinite linear system of equations $\mathcal{Q}\mathbf{x}=\mathbf{b}$, where $\mathbf{x}=(x_{1};\ldots;x_{s})$ is partitioned into $s$ blocks with $s\geq 2$. We are particularly interested in the case when $s>2$. In this paper, we show that each cycle of the classical block sGS method exactly solves the corresponding convex quadratic programming (QP) problem with an extra proximal term added that depends on the previous iterate (say $\mathbf{x}^{k}$). Through this connection to optimization, we are able to extend the result (which we name the block sGS decomposition theorem) to a convex composite QP (CCQP) with an additional possibly nonsmooth term in $x_1$, and subsequently extend the classical block sGS method to solve a CCQP. We can also extend the classical block sGS method to the inexact setting, where the underlying linear system for each block of the new iterate $\mathbf{x}^{k+1}$ need not be solved exactly. Moreover, by borrowing ideas from the optimization literature, we are able to accelerate the classical block sGS method and provide new convergence results. More details will be given later.
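Assume that $\mathcal{X}_i=\mathbb{R}^{n_i}$ for $i=1,\ldots,s$, and $\mathcal{X}=\mathcal{X}_1\times\cdots\times\mathcal{X}_s$, where $s\geq 2$ is a given integer. Consider the following symmetric positive semidefinite block linear system of equations:
$$\mathcal{Q}\mathbf{x}=\mathbf{b}, \tag{1}$$
where $\mathbf{x}=[x_{1};\;\ldots;\;x_{s}]\in\mathcal{X}$, $\mathbf{b}=[b_{1};\;\ldots;\;b_{s}]\in\mathcal{X}$, and
$$\mathcal{Q}=\begin{pmatrix} Q_{11} & \cdots & Q_{1s}\\ \vdots & \ddots & \vdots\\ Q_{1s}^{*} & \cdots & Q_{ss}\end{pmatrix} \tag{5}$$
with $Q_{ij}\in\mathbb{R}^{n_i\times n_j}$ for $1\leq i,j\leq s$. It is well known that (1) is the optimality condition for the following unconstrained QP:
$$\min\Big\{q(\mathbf{x}):=\frac{1}{2}\langle\mathbf{x},\,\mathcal{Q}\mathbf{x}\rangle-\langle\mathbf{b},\,\mathbf{x}\rangle\;\Big|\;\mathbf{x}\in\mathcal{X}\Big\}. \tag{6}$$
Note that even though our problem is phrased in the matrix-vector setting for convenience, one can consider the setting where each $\mathcal{X}_i$ is a real $n_i$-dimensional inner product space and $Q_{ij}$ is a linear map from $\mathcal{X}_j$ to $\mathcal{X}_i$. Throughout the paper, we make the following assumption: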
Assumption 1.
$\mathcal{Q}$ is symmetric positive semidefinite and each diagonal block $Q_{ii}$ is symmetric positive definite for $i=1,\ldots,s$.
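From the following decomposition of $\mathcal{Q}$:
$$\mathcal{Q}=\mathcal{U}^{*}+\mathcal{D}+\mathcal{U}, \tag{7}$$
where
$$\mathcal{U}=\begin{pmatrix} 0 & Q_{12} & \cdots & Q_{1s}\\ & \ddots & \ddots & \vdots\\ & & \ddots & Q_{(s-1)s}\\ & & & 0\end{pmatrix},\qquad \mathcal{D}=\mathrm{diag}(Q_{11},\ldots,Q_{ss}), \tag{16}$$
the classical block sGS iteration in numerical analysis is usually derived as a natural generalization of the classical pointwise sGS for solving a symmetric positive definite linear system of equations, and the latter is typically derived as a fixed-point iteration for the sGS matrix splitting based on (7); see for example [23, Sec. 4.1.1], [13, Sec. 4.5]. Specifically, the block sGS fixed-point iteration in the third normal form (in the terminology used in [13]) reads as follows:
$$\widehat{\mathcal{Q}}(\mathbf{x}^{k+1}-\mathbf{x}^{k})=\mathbf{b}-\mathcal{Q}\mathbf{x}^{k}, \tag{17}$$
where $\widehat{\mathcal{Q}}=(\mathcal{D}+\mathcal{U})\mathcal{D}^{-1}(\mathcal{D}+\mathcal{U}^{*})$.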
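As a rough numerical illustration of the fixed-point form described above, the following Python sketch applies the update $\widehat{\mathcal{Q}}(\mathbf{x}^{k+1}-\mathbf{x}^{k})=\mathbf{b}-\mathcal{Q}\mathbf{x}^{k}$ with $\widehat{\mathcal{Q}}=(\mathcal{D}+\mathcal{U})\mathcal{D}^{-1}(\mathcal{D}+\mathcal{U}^{*})$ to a small dense system. The helper names `split_blocks` and `sgs_step`, the random test matrix, and the use of dense NumPy solves are assumptions made purely for illustration; the explicit form of $\widehat{\mathcal{Q}}$ is the standard sGS splitting and is assumed here.

```python
import numpy as np

def split_blocks(Q, sizes):
    """Return (D, U): block-diagonal and strictly block-upper-triangular parts of Q."""
    offsets = np.concatenate(([0], np.cumsum(sizes)))
    D = np.zeros_like(Q, dtype=float)
    U = np.zeros_like(Q, dtype=float)
    for i in range(len(sizes)):
        a, c = offsets[i], offsets[i + 1]
        D[a:c, a:c] = Q[a:c, a:c]   # diagonal block Q_ii
        U[a:c, c:] = Q[a:c, c:]     # blocks Q_ij with j > i
    return D, U

def sgs_step(Q, b, x, sizes):
    """One block sGS cycle written as Q_hat (x_new - x) = b - Q x."""
    D, U = split_blocks(Q, sizes)
    Q_hat = (D + U) @ np.linalg.solve(D, D + U.T)
    return x + np.linalg.solve(Q_hat, b - Q @ x)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((6, 6))
    Q = A @ A.T / 6 + 2 * np.eye(6)   # symmetric positive definite test matrix
    b = rng.standard_normal(6)
    x = np.zeros(6)
    for _ in range(100):
        x = sgs_step(Q, b, x, sizes=[2, 3, 1])
    print("residual norm:", np.linalg.norm(Q @ x - b))
```

Forming $\widehat{\mathcal{Q}}$ explicitly is only sensible for tiny examples; in practice a cycle is carried out by block-wise backward and forward sweeps.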
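In this paper, we give a derivation of the classical block sGS method (17) from the optimization perspective. By doing so, we are able to extend the classical block sGS method to solve a structured CCQP problem of the form:
$$\min\Big\{F(\mathbf{x}):=p(x_{1})+q(\mathbf{x})\;\Big|\;\mathbf{x}\in\mathcal{X}\Big\}, \tag{18}$$
where $p:\mathcal{X}_1\to(-\infty,\infty]$ is a proper closed convex function such as $p(x_1)=\|x_1\|_1$ or $p(x_1)=\delta_{\mathbb{R}^{n_1}_+}(x_1)$ (the indicator function of $\mathbb{R}^{n_1}_+$, defined by $\delta_{\mathbb{R}^{n_1}_+}(x_1)=0$ if $x_1\in\mathbb{R}^{n_1}_+$ and $\delta_{\mathbb{R}^{n_1}_+}(x_1)=\infty$ otherwise). Our specific contributions are described in the next few paragraphs. We note that the main results presented here are parts of the thesis of the first author [18].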
First, we establish the key result of the paper, the block sGS decomposition theorem, which states that each cycle of the block sGS method, say at the $k$th iteration, corresponds exactly to solving (18) with an additional proximal term $\frac{1}{2}\|\mathbf{x}-\mathbf{x}^{k}\|^{2}_{\mathcal{T}_{\mathcal{Q}}}$ added to its objective function, i.e.,
$$\mathbf{x}^{k+1}=\arg\min_{\mathbf{x}\in\mathcal{X}}\Big\{p(x_{1})+q(\mathbf{x})+\frac{1}{2}\|\mathbf{x}-\mathbf{x}^{k}\|^{2}_{\mathcal{T}_{\mathcal{Q}}}\Big\}, \tag{19}$$
where $\mathcal{T}_{\mathcal{Q}}=\mathcal{U}\mathcal{D}^{-1}\mathcal{U}^{*}$, and $\|\mathbf{x}\|_{\mathcal{T}_{\mathcal{Q}}}^{2}=\langle\mathbf{x},\,\mathcal{T}_{\mathcal{Q}}\mathbf{x}\rangle$. It is clear that when $p(\cdot)\equiv 0$, the problem (18) is exactly the QP (6) associated with the linear system (1). Therefore, we can interpret the classical block sGS method as a proximal-point minimization method for solving the QP (6), and each cycle of the classical block sGS method solves exactly the proximal subproblem (19) associated with the QP (6). As far as we are aware, this is the first time that the classical block sGS method (17) (and also the pointwise sGS method) has been derived from an optimization perspective.
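The claimed equivalence can be checked numerically in the smooth case. The sketch below is a minimal illustration under stated assumptions: $p(\cdot)\equiv 0$, exact block solves, and the sGS operator taken as $\mathcal{T}_{\mathcal{Q}}=\mathcal{U}\mathcal{D}^{-1}\mathcal{U}^{*}$. It compares one backward-then-forward sweep with the minimizer of the proximal QP, which in this case solves $(\mathcal{Q}+\mathcal{T}_{\mathcal{Q}})\mathbf{x}=\mathbf{b}+\mathcal{T}_{\mathcal{Q}}\mathbf{x}^{k}$. The function names are hypothetical.

```python
import numpy as np

def block_slices(sizes):
    offsets = np.concatenate(([0], np.cumsum(sizes)))
    return [slice(offsets[i], offsets[i + 1]) for i in range(len(sizes))]

def sgs_sweeps(Q, b, x_prev, sizes):
    """One block sGS cycle: backward sweep over blocks s,...,2, then forward over 1,...,s."""
    sl = block_slices(sizes)
    s = len(sl)
    x = x_prev.copy()
    for i in list(range(s - 1, 0, -1)) + list(range(s)):
        # Right-hand side of block row i with the contribution of block i removed.
        r = b[sl[i]] - Q[sl[i], :] @ x + Q[sl[i], sl[i]] @ x[sl[i]]
        x[sl[i]] = np.linalg.solve(Q[sl[i], sl[i]], r)
    return x

def prox_qp_solution(Q, b, x_prev, sizes):
    """Minimizer of q(x) + 0.5*||x - x_prev||_T^2 with T = U D^{-1} U^T (p = 0 assumed)."""
    D = np.zeros_like(Q, dtype=float)
    U = np.zeros_like(Q, dtype=float)
    for si in block_slices(sizes):
        D[si, si] = Q[si, si]
        U[si, si.stop:] = Q[si, si.stop:]
    T = U @ np.linalg.solve(D, U.T)
    return np.linalg.solve(Q + T, b + T @ x_prev)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((6, 6))
    Q = A @ A.T / 6 + 2 * np.eye(6)
    b, x_prev = rng.standard_normal(6), rng.standard_normal(6)
    gap = sgs_sweeps(Q, b, x_prev, [2, 3, 1]) - prox_qp_solution(Q, b, x_prev, [2, 3, 1])
    print("difference between the two computations:", np.linalg.norm(gap))
```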
Second, we also establish a factorization view of the block sGS decomposition theorem and show its equivalence to the Schur complement based (SCB) reduction procedure proposed in [17] for solving a recursively defined variant of the proximal subproblem (19). The SCB reduction procedure in [17] is derived by inductively finding an appropriate proximal term to be added to the objective function of (18) so that the block variables can be eliminated in a sequential manner, ending with a minimization problem involving only the variable $x_1$. In a nutshell, we show that the SCB reduction procedure sequentially eliminates the blocks (in the reversed order starting from $x_s$) in the variable $\mathbf{x}$ of the proximal subproblem (19) by decomposing the proximal term $\frac{1}{2}\|\mathbf{x}-\mathbf{x}^{k}\|^{2}_{\mathcal{T}_{\mathcal{Q}}}$ also in a sequential manner. In turn, each reduction step corresponds exactly to one step in a cycle of the block sGS method.
Third, based on the block sGS decomposition theorem, we are able to extend the classical block sGS method for solving the QP (6) to solve the CCQP (18), and each cycle of the extended block sGS method corresponds precisely to solving the proximal subproblem (19). Our extension of the block sGS method thus overcomes the limitation of the classical method by allowing us to solve the nonsmooth CCQP, which often arises in practice, for example, in semidefinite programming where $p(x_1)=\delta_{\mathbb{S}^{n}_{+}}(x_1)$ and $\mathbb{S}^{n}_{+}$ is the cone of symmetric positive semidefinite matrices. Moreover, our extension also allows the updates of the blocks to be inexact. As a consequence, we also obtain an inexact version of the classical block sGS method, where the iterate $\mathbf{x}^{k+1}$ need not be computed exactly from (17). We should emphasize that the inexact block sGS method is potentially very useful when a diagonal block, say $Q_{ii}$, in (5) is large and the computation of $x^{k+1}_{i}$ must be done via an iterative solver rather than a direct solver. Note that even for the linear system (17), our systematic approach (in section 4) to derive the inexact extension of the classical block sGS method appears to be new. The only inexact variant of the classical block sGS method for (17) with a convergence proof that we are aware of is the pioneering work of Bank et al. in [3]. In [3], the authors showed that by modifying the diagonal blocks in $\mathcal{D}$, the linear system involved in each block can be solved by a given fixed number of pointwise sGS cycles.
Fourth, armed with the optimization interpretation of each cycle of the block sGS method, it becomes easy for us to adapt ideas from the optimization literature to establish the iteration complexity of $O(\|\mathbf{x}^{0}-\mathbf{x}^{*}\|_{\widehat{\mathcal{Q}}}^{2}/k)$ for the extended block sGS method as well as to accelerate it to obtain the complexity of $O(\|\mathbf{x}^{0}-\mathbf{x}^{*}\|_{\widehat{\mathcal{Q}}}^{2}/(k+1)^{2})$, after running for $k$ cycles, where $\mathbf{x}^{*}$ is an optimal solution for (18). Just as in the classical block sGS method, we can obtain a linear rate of convergence for our extended inexact block sGS method under the assumption that $\mathcal{Q}$ is positive definite. With the help of the extensive optimization literature on the linear convergence of proximal gradient methods, we are further able to relax the positive definiteness assumption on $\mathcal{Q}$ to a mild error bound assumption on the function $F$ in (18) and derive at least R-linear convergence results for our extended block sGS method. The error bound assumption in fact holds automatically for many interesting applications, including the important case when $p$ is a piecewise linear-quadratic function. We note that there is active research on the convergence of proximal gradient methods for a convex composite minimization problem of the form $\min\{f(\mathbf{x})+g(\mathbf{x})\mid\mathbf{x}\in\mathcal{X}\}$, with $f$ being a smooth convex function and $g$ a proper closed convex function whose proximal map is easy to compute; see for example [24] and the references therein. In each iteration of a typical proximal gradient method, a simple proximal term $\frac{L}{2}\|\mathbf{x}-\bar{\mathbf{x}}\|^{2}$, where $L$ is a Lipschitz constant for the gradient of $f$, is added to the objective function. Our extended block sGS method for the CCQP (18) differs from those proximal gradient methods in the literature in that the proximal term we add comes from the sophisticated positive semidefinite linear operator associated with the sGS decomposition of $\mathcal{Q}$.
Recent research works in [6, 16, 17, 25, 26] have shown that our block sGS decomposition theorem for the CCQP (18) can play an essential role in the design of efficient algorithms for solving various convex optimization problems such as convex composite quadratic semidefinite programming problems. Indeed, the block sGS decomposition based ADMM algorithms designed in [6, 25, 26] have found applications in various recent papers such as [2, 10, 15]. Our experience has shown that the inexact block sGS cycle can provide the much needed computational efficiency when one is designing an algorithm based on the framework of the proximal augmented Lagrangian method (ALM) or the proximal alternating direction method of multipliers (ADMM) for solving important classes of large scale convex composite optimization problems. As a concrete illustration of the application of our block sGS decomposition theorem, we will briefly describe in section 5 how to utilize the theorem in the design of the proximal augmented Lagrangian method for solving a linearly constrained convex composite quadratic programming problem.
The idea of sequentially updating the blocks of a multi-block variable, either in the Gauss-Seidel fashion or the successive over-relaxation (SOR) fashion, has been incorporated into quite a number of optimization algorithms [5] and into methods for solving nonlinear equations [22]. Indeed the Gauss-Seidel (also known as the block coordinate descent) approach for solving optimization problems has been considered extensively; we refer the readers to [4, 12] for the literature review on the recent developments, especially for the case where $s>2$. Here we would like to emphasize that even for the case of an unconstrained smooth convex minimization problem $\min\{f(\mathbf{x})\mid\mathbf{x}\in\mathcal{X}\}$, whose objective function $f(\mathbf{x})$ (not necessarily strongly convex) has a Lipschitz continuous gradient of modulus $L$, it was only proven recently in [4] that the block coordinate (gradient) descent method is globally convergent with the iteration complexity of $O(Ls/k)$ after $k$ cycles, where $s$ is the number of blocks. When $f(\mathbf{x})$ is the quadratic function in (6), the block coordinate descent method is precisely the classical block Gauss-Seidel (GS) method. In contrast to the block sGS method, each iteration of the block GS method does not appear to have an optimization equivalence. Despite the extensive work on the Gauss-Seidel approach for solving convex optimization problems, surprisingly little is known about the symmetric Gauss-Seidel approach for solving the same problems, except for the recent paper [25], which utilized our block sGS decomposition theorem to design an inexact accelerated block coordinate descent method to solve a problem of the form $\min\{p(x_{1})+f(\mathbf{x})\mid\mathbf{x}\in\mathcal{X}\}$.
The remaining parts of the paper are organized as follows. The next section is devoted to the block sGS decomposition theorem for the CCQP (18). In section 3, we present a factorization view of the block sGS theorem and prove its equivalence to the SCB reduction procedure proposed in [17, 18]. In the following section, we derive the block sGS method from an optimization perspective and extend it to solve the CCQP (18). The convergence results for our extended block sGS method are also presented in this section. In section 5, the application of our block sGS decomposition theorem is demonstrated in the design of a proximal augmented Lagrangian method for solving a linearly constrained convex composite quadratic programming problem. The extension of the classical block symmetric SOR method for solving (18) is presented in section 6. We conclude our paper in the final section.
We end the section by giving some notation. For a symmetric matrix $A$, the notation $A\succeq 0$ ($A\succ 0$) means that the matrix $A$ is symmetric positive semidefinite (definite). The spectral norm of $A$ is denoted by $\|A\|_2$.
2 Derivation of the block sGS decomposition theorem for (18)
In this section, we present the derivation of one cycle of the block sGS method for (18) from the optimization perspective as mentioned in the introduction.
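Recall the decomposition of $\mathcal{Q}$ in (7), $\mathcal{U}$, $\mathcal{D}$ in (16) and the sGS linear operator defined by
$$\mathcal{T}_{\mathcal{Q}}=\mathcal{U}\mathcal{D}^{-1}\mathcal{U}^{*}. \tag{20}$$
Given $\bar{\mathbf{x}}\in\mathcal{X}$, corresponding to problem (18), we consider solving the following subproblem
$$\mathbf{x}^{+}:=\arg\min_{\mathbf{x}\in\mathcal{X}}\Big\{p(x_{1})+q(\mathbf{x})+\frac{1}{2}\|\mathbf{x}-\bar{\mathbf{x}}\|^{2}_{\mathcal{T}_{\mathcal{Q}}}-\langle\mathbf{x},\,\Delta(\boldsymbol{\delta}',\boldsymbol{\delta})\rangle\Big\}, \tag{21}$$
where $\boldsymbol{\delta}',\boldsymbol{\delta}\in\mathcal{X}$ are two given error vectors with $\delta'_1=\delta_1$, and
$$\Delta(\boldsymbol{\delta}',\boldsymbol{\delta}):=\boldsymbol{\delta}+\mathcal{U}\mathcal{D}^{-1}(\boldsymbol{\delta}-\boldsymbol{\delta}'). \tag{22}$$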
We note that the vectors \mbox{\boldmath{\delta^{\prime}}},\mbox{\boldmath{\delta}} need not be known a priori. We should view \mbox{\boldmath{x}}^{+} as an approximate solution to (21) without the perturbation term \langle\mbox{\boldmath{x}},\,\Delta(\mbox{\boldmath{\delta^{\prime}}},\mbox{\boldmath{\delta}})\rangle. Once \mbox{\boldmath{x}}^{+} has been computed, the associated error vectors can then be obtained, and \mbox{\boldmath{x}}^{+} is then the exact solution to the perturbed problem (21).
The following theorem shows that $\mathbf{x}^{+}$ can be computed by performing exactly one cycle of the block sGS method for (18). In particular, if $p(\cdot)\equiv 0$ and $\boldsymbol{\delta}'=0=\boldsymbol{\delta}$, then the computation of $\mathbf{x}^{+}$ corresponds exactly to one cycle of the classical block sGS method. For the proof, we need to define the following notation for a given $\mathbf{x}=(x_{1};\ldots;x_{s})$,
$$\mathbf{x}_{\geq i}=(x_{i};\ldots;x_{s}),\qquad \mathbf{x}_{\leq i}=(x_{1};\ldots;x_{i}),\qquad i=1,\ldots,s.$$
We also define $\mathbf{x}_{\geq s+1}=\emptyset$.
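Theorem 1 (sGS Decomposition).
Assume that $\mathcal{Q}\succeq 0$ and the self-adjoint linear operators $Q_{ii}$ are positive definite for all $i=1,\ldots,s$. Then, it holds that
$$\widehat{\mathcal{Q}}:=\mathcal{Q}+\mathcal{T}_{\mathcal{Q}}=(\mathcal{D}+\mathcal{U})\mathcal{D}^{-1}(\mathcal{D}+\mathcal{U}^{*})\succ 0. \tag{23}$$
For $i=s,\ldots,2$, suppose that we have computed $x'_{i}\in\mathcal{X}_i$ defined by
$$x'_{i}:=\arg\min_{x_{i}\in\mathcal{X}_i}\Big\{p(\bar{x}_{1})+q(\bar{\mathbf{x}}_{\leq i-1};x_{i};\mathbf{x}'_{\geq i+1})-\langle\delta'_{i},\,x_{i}\rangle\Big\}=Q_{ii}^{-1}\Big(b_{i}+\delta'_{i}-\sum_{j=1}^{i-1}Q_{ji}^{*}\bar{x}_{j}-\sum_{j=i+1}^{s}Q_{ij}x'_{j}\Big). \tag{24}$$
Then the optimal solution $\mathbf{x}^{+}$ for (21) can be computed exactly via the following steps:
$$\begin{aligned} x_{1}^{+}&=\arg\min_{x_{1}\in\mathcal{X}_1}\Big\{p(x_{1})+q(x_{1};\mathbf{x}'_{\geq 2})-\langle\delta_{1},\,x_{1}\rangle\Big\},\\ x_{i}^{+}&=\arg\min_{x_{i}\in\mathcal{X}_i}\Big\{p(x_{1}^{+})+q(\mathbf{x}^{+}_{\leq i-1};x_{i};\mathbf{x}'_{\geq i+1})-\langle\delta_{i},\,x_{i}\rangle\Big\}\\ &=Q_{ii}^{-1}\Big(b_{i}+\delta_{i}-\sum_{j=1}^{i-1}Q_{ji}^{*}x^{+}_{j}-\sum_{j=i+1}^{s}Q_{ij}x'_{j}\Big),\quad i=2,\ldots,s. \end{aligned} \tag{25}$$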
Proof.
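Since $\mathcal{D}\succ 0$, we know that $\mathcal{D}$, $\mathcal{D}+\mathcal{U}$ and $\mathcal{D}+\mathcal{U}^{*}$ are all nonsingular. Then, (23) can easily be obtained from the following observation
$$\mathcal{Q}+\mathcal{T}_{\mathcal{Q}}=\mathcal{D}+\mathcal{U}+\mathcal{U}^{*}+\mathcal{U}\mathcal{D}^{-1}\mathcal{U}^{*}=(\mathcal{D}+\mathcal{U})\mathcal{D}^{-1}(\mathcal{D}+\mathcal{U}^{*}). \tag{26}$$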
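Next we show the equivalence between (21) and (25). By noting that $\delta'_{1}=\delta_{1}$ and $Q_{11}\succ 0$, we can define $x'_{1}$ as follows:
$$x'_{1}=\arg\min_{x_{1}\in\mathcal{X}_1}\Big\{p(x_{1})+q(x_{1};\mathbf{x}'_{\geq 2})-\langle\delta'_{1},\,x_{1}\rangle\Big\}=\arg\min_{x_{1}\in\mathcal{X}_1}\Big\{p(x_{1})+q(x_{1};\mathbf{x}'_{\geq 2})-\langle\delta_{1},\,x_{1}\rangle\Big\}=x_{1}^{+}. \tag{27}$$
The optimality conditions corresponding to $x'_{1}$ and $x_{1}^{+}$ in (27) can be written as
$$Q_{11}x'_{1}=b_{1}-\gamma_{1}+\delta'_{1}-\sum_{j=2}^{s}Q_{1j}x'_{j}, \tag{28a}$$
$$Q_{11}x^{+}_{1}=b_{1}-\gamma_{1}+\delta_{1}-\sum_{j=2}^{s}Q_{1j}x'_{j}, \tag{28b}$$
where $\gamma_{1}\in\partial p(x'_{1})\equiv\partial p(x_{1}^{+})$. Simple calculations show that (28a) together with (24) can equivalently be rewritten as
$$(\mathcal{D}+\mathcal{U})\mathbf{x}'=\mathbf{b}-\boldsymbol{\gamma}+\boldsymbol{\delta}'-\mathcal{U}^{*}\bar{\mathbf{x}},$$
where $\boldsymbol{\gamma}=(\gamma_{1};0;\ldots;0)\in\mathcal{X}$, while (25) can equivalently be recast as
$$(\mathcal{D}+\mathcal{U}^{*})\mathbf{x}^{+}=\mathbf{b}-\boldsymbol{\gamma}+\boldsymbol{\delta}-\mathcal{U}\mathbf{x}'.$$
By substituting $\mathbf{x}'=(\mathcal{D}+\mathcal{U})^{-1}(\mathbf{b}-\boldsymbol{\gamma}+\boldsymbol{\delta}'-\mathcal{U}^{*}\bar{\mathbf{x}})$ into the above equation, we obtain that
$$(\mathcal{D}+\mathcal{U}^{*})\mathbf{x}^{+}=\mathcal{D}(\mathcal{D}+\mathcal{U})^{-1}(\mathbf{b}-\boldsymbol{\gamma})+\mathcal{U}(\mathcal{D}+\mathcal{U})^{-1}\mathcal{U}^{*}\bar{\mathbf{x}}+\boldsymbol{\delta}-\mathcal{U}(\mathcal{D}+\mathcal{U})^{-1}\boldsymbol{\delta}',$$
which, together with (26), (22) and the definition of $\mathcal{T}_{\mathcal{Q}}$ in (20), implies that
$$\boldsymbol{\gamma}+\mathcal{Q}\mathbf{x}^{+}-\mathbf{b}+\mathcal{T}_{\mathcal{Q}}(\mathbf{x}^{+}-\bar{\mathbf{x}})-\Delta(\boldsymbol{\delta}',\boldsymbol{\delta})=0. \tag{29}$$
In the above, we have used the fact that $(\mathcal{D}+\mathcal{U})\mathcal{D}^{-1}\mathcal{U}(\mathcal{D}+\mathcal{U})^{-1}=\mathcal{U}\mathcal{D}^{-1}$. By noting that (29) is in fact the optimality condition for (21) and $\widehat{\mathcal{Q}}\succ 0$, we have thus obtained the equivalence between (21) and (25). This completes the proof of the theorem. ∎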
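To make the statement of the theorem concrete, here is a minimal Python sketch of one exact cycle (all error vectors zero) for a nonsmooth example. It assumes, purely for illustration, that the first block is a scalar and that $p(x_1)=\lambda|x_1|$, so that the $x_1$ subproblem has a closed-form soft-thresholding solution; the names `soft_threshold` and `sgs_cycle_l1` are hypothetical. The accompanying check verifies the optimality condition of the proximal subproblem with $\mathcal{T}_{\mathcal{Q}}=\mathcal{U}\mathcal{D}^{-1}\mathcal{U}^{*}$.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal map of tau*|.| applied elementwise."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def sgs_cycle_l1(Q, b, x_bar, sizes, lam):
    """One exact block sGS cycle for p(x_1) = lam*|x_1| with a scalar first block."""
    assert sizes[0] == 1, "this sketch assumes a scalar first block"
    offsets = np.concatenate(([0], np.cumsum(sizes)))
    sl = [slice(offsets[i], offsets[i + 1]) for i in range(len(sizes))]
    x = x_bar.copy()

    def rhs(i):
        # b_i minus all off-diagonal block contributions, using the current entries of x.
        return b[sl[i]] - Q[sl[i], :] @ x + Q[sl[i], sl[i]] @ x[sl[i]]

    # Backward sweep: blocks s, s-1, ..., 2 (linear systems only).
    for i in range(len(sl) - 1, 0, -1):
        x[sl[i]] = np.linalg.solve(Q[sl[i], sl[i]], rhs(i))
    # Block 1: minimize lam*|x_1| + 0.5*Q_11*x_1^2 - r*x_1, solved by soft-thresholding.
    x[sl[0]] = soft_threshold(rhs(0), lam) / Q[0, 0]
    # Forward sweep: blocks 2, ..., s.
    for i in range(1, len(sl)):
        x[sl[i]] = np.linalg.solve(Q[sl[i], sl[i]], rhs(i))
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((6, 6))
    Q = A @ A.T / 6 + 2 * np.eye(6)
    b, x_bar = rng.standard_normal(6), rng.standard_normal(6)
    print(sgs_cycle_l1(Q, b, x_bar, [1, 2, 3], lam=0.05))
```

Only the first block involves the nonsmooth term; every other update is a linear solve with a diagonal block, which is the structure the theorem exploits.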
We shall explain here the roles of the error vectors $\boldsymbol{\delta}'$ and $\boldsymbol{\delta}$ in the above block sGS decomposition theorem. There is no need to choose these error vectors in advance. We emphasize that $x'_{i}$ and $x^{+}_{i}$ obtained from (24) and (25) should be viewed as approximate solutions to the minimization problems without the terms involving $\delta'_{i}$ and $\delta_{i}$. Once these approximate solutions have been computed, they generate $\delta'_{i}$ and $\delta_{i}$ automatically. With these known error vectors, we know that the computed approximate solutions are the exact solutions to the minimization problems in (24) and (25).
The following proposition is useful in estimating the error term \Delta(\mbox{\boldmath{\delta}}^{\prime},\mbox{\boldmath{\delta}}) in (21).
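Proposition 1.
Denote $\widehat{\mathcal{Q}}=\mathcal{Q}+\mathcal{T}_{\mathcal{Q}}$, which is positive definite. Let $\xi=\|\widehat{\mathcal{Q}}^{-1/2}\Delta(\boldsymbol{\delta}',\boldsymbol{\delta})\|$. It holds that
$$\xi\leq\|\mathcal{D}^{-1/2}(\boldsymbol{\delta}-\boldsymbol{\delta}')\|+\|\widehat{\mathcal{Q}}^{-1/2}\boldsymbol{\delta}'\|.$$
Proof.
Recall that $\widehat{\mathcal{Q}}=(\mathcal{D}+\mathcal{U})\mathcal{D}^{-1}(\mathcal{D}+\mathcal{U}^{*})$. Thus, we have
$$\widehat{\mathcal{Q}}^{-1}=(\mathcal{D}+\mathcal{U}^{*})^{-1}\mathcal{D}^{1/2}\,\mathcal{D}^{1/2}(\mathcal{D}+\mathcal{U})^{-1},$$
which, together with the definition of $\Delta(\boldsymbol{\delta}',\boldsymbol{\delta})$ in (22), implies that
$$\xi=\|\mathcal{D}^{1/2}(\mathcal{D}+\mathcal{U})^{-1}\Delta(\boldsymbol{\delta}',\boldsymbol{\delta})\|=\|\mathcal{D}^{-1/2}(\boldsymbol{\delta}-\boldsymbol{\delta}')+\mathcal{D}^{1/2}(\mathcal{D}+\mathcal{U})^{-1}\boldsymbol{\delta}'\|\leq\|\mathcal{D}^{-1/2}(\boldsymbol{\delta}-\boldsymbol{\delta}')\|+\|\widehat{\mathcal{Q}}^{-1/2}\boldsymbol{\delta}'\|.$$
The desired result then follows. ∎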
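A bound of this kind is easy to sanity-check numerically. The sketch below assumes that the error term is $\Delta(\boldsymbol{\delta}',\boldsymbol{\delta})=\boldsymbol{\delta}+\mathcal{U}\mathcal{D}^{-1}(\boldsymbol{\delta}-\boldsymbol{\delta}')$ and tests the inequality $\xi\leq\|\mathcal{D}^{-1/2}(\boldsymbol{\delta}-\boldsymbol{\delta}')\|+\|\widehat{\mathcal{Q}}^{-1/2}\boldsymbol{\delta}'\|$, which follows from that form of $\Delta$; both the assumed form and the helper names (`inv_sqrt_apply`, `xi_and_bound`) are illustrative rather than taken verbatim from the proposition.

```python
import numpy as np

def inv_sqrt_apply(M, v):
    """Apply M^{-1/2} to v for a symmetric positive definite matrix M."""
    w, V = np.linalg.eigh(0.5 * (M + M.T))
    return V @ ((V.T @ v) / np.sqrt(w))

def xi_and_bound(Q, sizes, delta_p, delta):
    """Return xi = ||Q_hat^{-1/2} Delta|| and the upper bound discussed above."""
    offsets = np.concatenate(([0], np.cumsum(sizes)))
    D = np.zeros_like(Q, dtype=float)
    U = np.zeros_like(Q, dtype=float)
    for i in range(len(sizes)):
        a, c = offsets[i], offsets[i + 1]
        D[a:c, a:c] = Q[a:c, a:c]
        U[a:c, c:] = Q[a:c, c:]
    Q_hat = (D + U) @ np.linalg.solve(D, D + U.T)
    # Assumed form of the error term: Delta = delta + U D^{-1} (delta - delta').
    Delta = delta + U @ np.linalg.solve(D, delta - delta_p)
    xi = np.linalg.norm(inv_sqrt_apply(Q_hat, Delta))
    bound = (np.linalg.norm(inv_sqrt_apply(D, delta - delta_p))
             + np.linalg.norm(inv_sqrt_apply(Q_hat, delta_p)))
    return xi, bound

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((6, 6))
    Q = A @ A.T / 6 + 2 * np.eye(6)
    xi, bound = xi_and_bound(Q, [2, 3, 1], rng.standard_normal(6), rng.standard_normal(6))
    print(xi, "<=", bound)
```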
Theorem 1 shows that instead of solving the QP subproblem (21) directly with an $N$-dimensional variable $\mathbf{x}$, where $N=\sum_{i=1}^{s}n_{i}$, the computation can be decomposed into $s$ pieces of smaller dimensional problems involving only the variable $x_{i}$ for each $i=1,\ldots,s$. Such a decomposition is obviously highly useful for dealing with a large scale CCQP of the form (18) when $N$ is very large. The benefit is especially important because the computation of $x_{i}$ for $i=2,\ldots,s$ involves only solving linear systems of equations. Of course, one would still have to solve a potentially difficult subproblem involving the variable $x_{1}$ due to the presence of the possibly nonsmooth term $p(x_{1})$, i.e.,
$$x_{1}^{+}=\arg\min_{x_{1}\in\mathcal{X}_1}\Big\{p(x_{1})+\frac{1}{2}\langle x_{1},\,Q_{11}x_{1}\rangle-\langle c_{1},\,x_{1}\rangle\Big\},$$
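where $c_{1}$ is a known vector depending on the previously computed $x'_{s},\ldots,x'_{2}$. However, in many applications, $p(x_{1})$ is usually a simple nonsmooth function such as $\|x_{1}\|_{1}$, $\|x_{1}\|_{\infty}$, or $\delta_{\mathbb{R}^{n_1}_{+}}(x_{1})$ for which the corresponding subproblem is not difficult to solve. As a concrete example, suppose that $Q_{11}=I_{n_{1}}$. Then $x_{1}^{+}={\rm Prox}_{p}(c_{1})$ and the Moreau-Yosida proximal map ${\rm Prox}_{p}(c_{1}):=\arg\min\{p(x_{1})+\frac{1}{2}\|x_{1}-c_{1}\|^{2}\}$ can be computed efficiently for various nonsmooth functions including the examples just mentioned. In fact, one can always make the subproblem easier to solve by (a) adding an additional proximal term $\frac{1}{2}\|x_{1}-\bar{x}_{1}\|^{2}_{J_{1}}$ to (21), where $J_{1}=\mu_{1}I_{n_{1}}-Q_{11}$ with $\mu_{1}=\|Q_{11}\|_{2}$; and (b) modifying the sGS operator to $\mathcal{U}\widetilde{\mathcal{D}}^{-1}\mathcal{U}^{*}$, where $\widetilde{\mathcal{D}}=\mathcal{D}+{\rm diag}(J_{1},0,\ldots,0)$. With the additional proximal term involving $J_{1}$, the subproblem corresponding to $x_{1}$ then becomes
$$x_{1}^{+}=\arg\min_{x_{1}\in\mathcal{X}_1}\Big\{p(x_{1})+\frac{\mu_{1}}{2}\|x_{1}\|^{2}-\langle c_{1}+J_{1}\bar{x}_{1},\,x_{1}\rangle\Big\}={\rm Prox}_{p/\mu_{1}}\big(\mu_{1}^{-1}(c_{1}+J_{1}\bar{x}_{1})\big).$$
In fact, more generally, one can also modify the other diagonal blocks in $\mathcal{D}$ to make the linear systems involved easier to solve by adding the proximal term $\frac{1}{2}\|\mathbf{x}-\bar{\mathbf{x}}\|^{2}_{{\rm diag}(J_{1},J_{2},\ldots,J_{s})}$ to (21), where $J_{i}\succeq 0$, $i=1,\ldots,s$, are given symmetric matrices. Correspondingly, the sGS linear operator for the proximal term to be added to the problem (21) then becomes $\mathcal{U}\widetilde{\mathcal{D}}^{-1}\mathcal{U}^{*}$, where $\widetilde{\mathcal{D}}=\mathcal{D}+{\rm diag}(J_{1},\ldots,J_{s})$, and $\widehat{\mathcal{Q}}$ in (23) becomes $(\widetilde{\mathcal{D}}+\mathcal{U})\widetilde{\mathcal{D}}^{-1}(\widetilde{\mathcal{D}}+\mathcal{U}^{*})$. There are many suitable choices for $J_{i}$, $i=1,\ldots,s$. A conservative choice would be $J_{i}=\|Q_{ii}\|_{2}I_{n_{i}}-Q_{ii}$, in which case the linear system to be solved has its coefficient matrix given by $\|Q_{ii}\|_{2}I_{n_{i}}$. Another possible choice of $J_{i}$ is the sGS linear operator associated with the matrix $Q_{ii}$, in which case the linear system involved has its coefficient matrix given by $Q_{ii}+J_{i}$ and its solution can be computed by using one cycle of the sGS method. The latter choice has been considered in [3] for its variant of the classical block sGS method. Despite the advantage of simplifying the linear systems to be solved, one should note that the price to pay for adding the extra proximal term $\frac{1}{2}\|\mathbf{x}-\bar{\mathbf{x}}\|^{2}_{{\rm diag}(J_{1},\ldots,J_{s})}$ is a worse convergence rate for the overall block sGS method.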
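The effect of such a majorizing proximal term on the $x_1$ subproblem can be sketched as follows. The example assumes, for illustration only, that $p(x_1)=\lambda\|x_1\|_1$ and that the added term is $\frac{1}{2}\|x_1-\bar{x}_1\|^2_{J_1}$ with $J_1=\mu_1 I-Q_{11}$ and $\mu_1=\|Q_{11}\|_2$; under these assumptions the subproblem collapses to a single soft-thresholding step. The names `prox_l1` and `majorized_x1_step`, and the symbol `r` for the linear term, are hypothetical.

```python
import numpy as np

def prox_l1(c, tau):
    """Moreau-Yosida proximal map of tau*||.||_1 (soft-thresholding)."""
    return np.sign(c) * np.maximum(np.abs(c) - tau, 0.0)

def majorized_x1_step(Q11, r, x1_bar, lam):
    """Solve  min_x  lam*||x||_1 + 0.5*<x, Q11 x> - <r, x> + 0.5*||x - x1_bar||_{J1}^2,
    where J1 = mu*I - Q11 and mu = ||Q11||_2 (assumed majorization)."""
    mu = np.linalg.norm(Q11, 2)          # spectral norm, so J1 is positive semidefinite
    # Completing the square gives lam*||x||_1 + (mu/2)*||x - c||^2 up to a constant.
    c = x1_bar + (r - Q11 @ x1_bar) / mu
    return prox_l1(c, lam / mu)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    B = rng.standard_normal((4, 4))
    Q11 = B @ B.T + np.eye(4)
    print(majorized_x1_step(Q11, rng.standard_normal(4), rng.standard_normal(4), lam=0.3))
```

The step costs one matrix-vector product instead of a nonsmooth solve with $Q_{11}$, at the price of a larger proximal term, which is the trade-off described above.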
3 A factorization view of the block sGS decomposition theorem and its equivalence to the SCB reduction procedure
In this section, we present a factorization view of the block sGS decomposition theorem and show its equivalence to the Schur complement based (SCB) reduction procedure developed in [17, 18].
Let be the zero matrix in and . For , let and define and as follows:
[TABLE]
and
[TABLE]
Then, the above definitions indicate that, for ,
[TABLE]
In [17, 18], the SCB reduction procedure corresponding to problem (18) is derived through the construction of the above self-adjoint linear operator on . Now we recall the key steps in the SCB reduction procedure derived in the previous work. For , define
[TABLE]
Note that . It is easy to show that
[TABLE]
By first solving the inner minimization problem with respect to , we get the solution as a function of as follows:
[TABLE]
And the minimum value is given by
[TABLE]
Thus (32) reduces to a problem involving only the variables , which, up to a constant, is given by
[TABLE]
where Observe that (41) has exactly the same form as (32). By repeating the above procedure to sequentially eliminate the variables , we will finally arrive at a minimization problem involving only the variable . Once that minimization problem is solved, we can recover the solutions for in a sequential manner.
Now we will prove the equivalence between the block sGS decomposition theorem and the SCB reduction procedure in the subsequent analysis by proving that , where is given in (20). For , define the block matrices and by
[TABLE]
where is the identity matrix. Note that . Given , we have, by simple calculations, that for any ,
[TABLE]
From (30) and (43), we have that
[TABLE]
Lemma 1.
Let and be given in (16). It holds that
[TABLE]
Proof.
It can be verified directly that
[TABLE]
The second equality follows readily from the first. ∎
In the proof of the next lemma, we will make use of the well known fact that for given symmetric matrices such that and , we have that
[TABLE]
Theorem 2.
It holds that
[TABLE]
Proof.
By using (54), for , we have that
[TABLE]
where
[TABLE]
Thus, from (44), we know that for ,
[TABLE]
For , by (31), we have that
[TABLE]
and consequently,
[TABLE]
Thus, by recalling the definitions of and in (42) and using (56), we obtain through simple calculations that
[TABLE]
Thus, by using the fact that , we get
[TABLE]
By Lemma 1, it follows that
[TABLE]
where the last equation follows from (23) in Theorem 1. Since , we know that
[TABLE]
This completes the proof of the theorem. ∎
4 An extended block sGS method for solving the CCQP (18)
With the block sGS decomposition theorem (Theorem 1) and Proposition 1, we can now extend the classical block sGS method to solve the CCQP (18). The detailed steps of the algorithm for solving (18) are given as follows.
Algorithm 1: An sGS based inexact proximal gradient method for (18).
Input $\widetilde{\mathbf{x}}^{1}=\mathbf{x}^{0}\in{\rm dom}(p)\times\mathbb{R}^{n_{2}}\times\ldots\times\mathbb{R}^{n_{s}}$, $t_{1}=1$, and a summable sequence of nonnegative numbers $\{\epsilon_{k}\}$. For $k=1,2,\ldots$, perform the following steps in each iteration.
Step 1. Compute
$$\mathbf{x}^{k}=\arg\min_{\mathbf{x}\in\mathcal{X}}\Big\{p(x_{1})+q(\mathbf{x})+\frac{1}{2}\|\mathbf{x}-\widetilde{\mathbf{x}}^{k}\|_{\mathcal{T}_{\mathcal{Q}}}^{2}-\langle\mathbf{x},\,\Delta(\widetilde{\boldsymbol{\delta}}^{k},\boldsymbol{\delta}^{k})\rangle\Big\} \tag{57}$$
via the sGS decomposition procedure described in Theorem 1, where $\widetilde{\boldsymbol{\delta}}^{k},\,\boldsymbol{\delta}^{k}\in\mathcal{X}$ are error vectors such that
$$\max\{\|\widetilde{\boldsymbol{\delta}}^{k}\|,\|\boldsymbol{\delta}^{k}\|\}\leq\frac{\epsilon_{k}}{t_{k}}. \tag{58}$$
Step 2. Choose $t_{k+1}$ such that $t_{k+1}^{2}-t_{k+1}\leq t_{k}^{2}$ and set $\beta_{k}=\frac{t_{k}-1}{t_{k+1}}$. Compute
$$\widetilde{\mathbf{x}}^{k+1}=\mathbf{x}^{k}+\beta_{k}(\mathbf{x}^{k}-\mathbf{x}^{k-1}).$$
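The two steps of the algorithm can be sketched in a few lines of Python. This is a minimal illustration under several assumptions that are not part of the algorithm statement: the block solves are exact (all error vectors are zero), the first block is a scalar with $p(x_1)=\lambda|x_1|$, and the extrapolation weights follow the common choice $t_1=1$, $t_{k+1}=\frac{1}{2}(1+\sqrt{1+4t_k^2})$, $\beta_k=(t_k-1)/t_{k+1}$, with $t_k\equiv 1$ giving the non-accelerated variant. The function names are hypothetical.

```python
import numpy as np

def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def sgs_cycle(Q, b, x_tilde, sizes, lam):
    """Step 1 with exact solves: one block sGS cycle started from x_tilde.
    Assumes a scalar first block and p(x_1) = lam*|x_1|."""
    offsets = np.concatenate(([0], np.cumsum(sizes)))
    sl = [slice(offsets[i], offsets[i + 1]) for i in range(len(sizes))]
    x = x_tilde.copy()

    def rhs(i):
        return b[sl[i]] - Q[sl[i], :] @ x + Q[sl[i], sl[i]] @ x[sl[i]]

    for i in range(len(sl) - 1, 0, -1):          # backward sweep
        x[sl[i]] = np.linalg.solve(Q[sl[i], sl[i]], rhs(i))
    x[sl[0]] = soft_threshold(rhs(0), lam) / Q[0, 0]
    for i in range(1, len(sl)):                  # forward sweep
        x[sl[i]] = np.linalg.solve(Q[sl[i], sl[i]], rhs(i))
    return x

def extended_sgs(Q, b, x0, sizes, lam, n_cycles, accelerate=True):
    """Run n_cycles of the (optionally accelerated) extended block sGS method."""
    x_prev, x_tilde, t = x0.copy(), x0.copy(), 1.0
    for _ in range(n_cycles):
        x = sgs_cycle(Q, b, x_tilde, sizes, lam)                 # Step 1
        t_next = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t)) if accelerate else 1.0
        beta = (t - 1.0) / t_next                                # Step 2
        x_tilde = x + beta * (x - x_prev)
        x_prev, t = x, t_next
    return x_prev

def objective(Q, b, x, lam):
    return lam * abs(x[0]) + 0.5 * x @ Q @ x - b @ x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((6, 6))
    Q = A @ A.T / 6 + 2 * np.eye(6)
    b = rng.standard_normal(6)
    x = extended_sgs(Q, b, np.zeros(6), [1, 2, 3], lam=0.1, n_cycles=50)
    print("objective value:", objective(Q, b, x, 0.1))
```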
We have the following iteration complexity convergence results for Algorithm 1.
Proposition 2.
Suppose $\mathbf{x}^{*}$ is an optimal solution of problem (18). Let $\{\mathbf{x}^{k}\}$ be the sequence generated by Algorithm 1. Define $M=2\|\mathcal{D}^{-1/2}\|_{2}+\|\widehat{\mathcal{Q}}^{-1/2}\|_{2}$.
(a) If $t_{j}=1$ for all $j\geq 1$, it holds that
[TABLE]
*where and .
(b) If $t_{j}=\frac{1+\sqrt{1+4t_{j-1}^{2}}}{2}$ for all $j\geq 2$, it holds that
[TABLE]
where .
Proof.
(a) The result can be proved by applying Theorem 2.1 in [14]. In order to apply the theorem, we need to verify that the error $\mathbf{e}:=\boldsymbol{\gamma}+\mathcal{Q}\mathbf{x}^{k}-\mathbf{b}+\mathcal{T}_{\mathcal{Q}}(\mathbf{x}^{k}-\widetilde{\mathbf{x}}^{k})$, where $\boldsymbol{\gamma}=(\gamma_{1};0;\ldots;0)$ and $\gamma_{1}\in\partial p(x_{1}^{k})$, incurred in solving the subproblem in Step 1 inexactly (that is, without the perturbation term $\Delta(\widetilde{\boldsymbol{\delta}}^{k},\boldsymbol{\delta}^{k})$) is sufficiently small. From Theorem 1, we know that
[TABLE]
The theorem is proved via Theorem 2.1 in [14] if we can show that \|\widehat{{\cal Q}}^{-1/2}\Delta(\,\widetilde{}\mbox{\boldmath{\delta}}^{k},\mbox{\boldmath{\delta}}^{k})\|\leq M\frac{\epsilon_{k}}{t_{k}}. But from (58) and Proposition 1, we have that
[TABLE]
thus the required inequality indeed holds true, and the proof is completed.
(b) Since no existing theorem can be applied directly to prove this result, we provide the proof in the Appendix. ∎
Remark 1.
It is not difficult to show that if $p(\cdot)\equiv 0$, $t_{k}=1$, and $\boldsymbol{\delta}^{k}=\widetilde{\boldsymbol{\delta}}^{k}=0$ for all $k\geq 1$, then Algorithm 1 exactly coincides with the classical block sGS method (17); and if $\boldsymbol{\delta}^{k}$, $\widetilde{\boldsymbol{\delta}}^{k}$ are allowed to be non-zero but satisfy the condition (58) for all $k\geq 1$, then we obtain the inexact extension of the classical block sGS method.
Remark 2.
Proposition 2 shows that the classical block sGS method for solving (1) can be extended to solve the convex composite QP (18). It also demonstrates the advantage of interpreting the block sGS method from the optimization perspective. For example, one can obtain the iteration complexity result for the classical block sGS method without assuming that $\mathcal{Q}$ is positive definite. To the best of our knowledge, such a complexity result for the classical block sGS method is new. More importantly, inexact and accelerated versions of the block sGS method can also be derived for (1).
Remark 3.
In solving (57) via the sGS decomposition procedure to satisfy the error condition (58), let $\mathbf{x}'=[x'_{1};\ldots;x'_{s}]$ be the intermediate solution computed during the backward GS sweep (in Theorem 1) and let the associated error vector be $\widetilde{\boldsymbol{\delta}}^{k}=[\widetilde{\delta}^{k}_{1};\ldots;\widetilde{\delta}^{k}_{s}]$. In the forward GS sweep, one can often save computations by using the computed $x'_{i}$ to estimate $x^{k+1}_{i}$ for $i\geq 2$, and the resulting error vector will be given by $\delta^{k}_{i}=\widetilde{\delta}^{k}_{i}+\sum_{j=1}^{i-1}Q_{ji}^{*}(x_{j}^{k+1}-\widetilde{x}^{k}_{j})$. If we have that
$$\Big\|\sum_{j=1}^{i-1}Q_{ji}^{*}(x_{j}^{k+1}-\widetilde{x}^{k}_{j})\Big\|\leq\rho, \tag{59}$$
where $\rho=\frac{c}{\sqrt{s}}\|\widetilde{\boldsymbol{\delta}}^{k}\|$ and $c>0$ is some given constant, then clearly $\|\delta^{k}_{i}\|^{2}\leq 2\|\widetilde{\delta}^{k}_{i}\|^{2}+2\rho^{2}$. When all the error components $\|\delta^{k}_{i}\|^{2}$ satisfy the previous bound for $i=1,\ldots,s$, regardless of whether $x^{k+1}_{i}$ is estimated from $x'_{i}$ or computed afresh, we get $\|\boldsymbol{\delta}^{k}\|\leq\sqrt{2(1+c^{2})}\|\widetilde{\boldsymbol{\delta}}^{k}\|$. Consequently the error condition (58) can be satisfied with a slightly larger error tolerance $\sqrt{2(1+c^{2})}\,\epsilon_{k}/t_{k}$. It is easy to see that one can use the condition in (59) to decide whether $x^{k+1}_{i}$ can be estimated from $x'_{i}$ without contributing a large error to $\|\boldsymbol{\delta}^{k}\|$ for each $i=2,\ldots,s$.
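The constant $\sqrt{2(1+c^{2})}$ in this remark can be traced through a short calculation. The derivation below is a sketch under the assumption that the displayed condition bounds the correction term $\sum_{j=1}^{i-1}Q_{ji}^{*}(x_{j}^{k+1}-\widetilde{x}^{k}_{j})$ by $\rho$ in norm for every block $i$; it uses only the elementary inequality $\|a+b\|^{2}\leq 2\|a\|^{2}+2\|b\|^{2}$ and the identity $s\rho^{2}=c^{2}\|\widetilde{\boldsymbol{\delta}}^{k}\|^{2}$.

```latex
\begin{align*}
\|\delta^{k}_{i}\|^{2}
  &= \Big\|\widetilde{\delta}^{k}_{i} + \sum_{j=1}^{i-1} Q_{ji}^{*}\big(x_{j}^{k+1}-\widetilde{x}^{k}_{j}\big)\Big\|^{2}
   \;\le\; 2\|\widetilde{\delta}^{k}_{i}\|^{2} + 2\rho^{2}, \\
\|\boldsymbol{\delta}^{k}\|^{2}
  &= \sum_{i=1}^{s}\|\delta^{k}_{i}\|^{2}
   \;\le\; 2\|\widetilde{\boldsymbol{\delta}}^{k}\|^{2} + 2s\rho^{2}
   \;=\; 2\|\widetilde{\boldsymbol{\delta}}^{k}\|^{2} + 2c^{2}\|\widetilde{\boldsymbol{\delta}}^{k}\|^{2}
   \;=\; 2(1+c^{2})\,\|\widetilde{\boldsymbol{\delta}}^{k}\|^{2}.
\end{align*}
```

Taking square roots gives $\|\boldsymbol{\delta}^{k}\|\leq\sqrt{2(1+c^{2})}\,\|\widetilde{\boldsymbol{\delta}}^{k}\|$, so a tolerance on the backward-sweep errors carries over to the forward sweep up to this factor.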
Besides the above iteration complexity results, one can also study the linear convergence rate of Algorithm 1. Indeed, just as in the case of the classical block sGS method, the convergence rate of our extended inexact block sGS method for solving (18) can also be established when $\mathcal{Q}\succ 0$. The precise result is given in the next theorem.
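Theorem 3.
Suppose that the relative interior of the domain of $p$, ${\rm ri}({\rm dom}(p))$, is non-empty, $\mathcal{Q}\succ 0$ and $t_{k}=1$ for all $k\geq 1$. Then
[TABLE]
where $\mathcal{B}=I-\widehat{\mathcal{Q}}^{-1/2}\mathcal{Q}\widehat{\mathcal{Q}}^{-1/2}$, and $M$ is defined as in Proposition 2. Note that $0\preceq\mathcal{B}=\widehat{\mathcal{Q}}^{-1/2}\mathcal{T}_{\mathcal{Q}}\widehat{\mathcal{Q}}^{-1/2}\prec I$.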
Proof.
For notational convenience, we let \Delta^{j}=\Delta(\widetilde{\mbox{\boldmath{\delta}}}^{j},\mbox{\boldmath{\delta}}^{j}) in this proof.
Define by E_{1}(\mbox{\boldmath{x}})=x_{1} and by \widehat{p}(\mbox{\boldmath{x}})=p(E_{1}{\widehat{{\cal Q}}}^{-1/2}\mbox{\boldmath{x}}). Since , it is clear that and hence . By [21, Theorem 23.9], we have that
[TABLE]
From the optimality condition of \mbox{\boldmath{x}}^{j}, we have that
[TABLE]
where \mbox{\boldmath{\gamma}}^{j}=(\gamma_{1}^{j};0;\ldots;0) with . Let \hat{\mbox{\boldmath{x}}}^{j}={\widehat{{\cal Q}}}^{1/2}\mbox{\boldmath{x}}^{j} and \hat{\mbox{\boldmath{x}}}^{j-1}={\widehat{{\cal Q}}}^{1/2}\mbox{\boldmath{x}}^{j-1}. Then we have that
[TABLE]
Similarly if \mbox{\boldmath{x}}^{*} is an optimal solution of (18), then we have that
[TABLE]
By using the nonexpansive property of , we have that
[TABLE]
By applying the above inequality sequentially for , we get the required result in (60). ∎
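The contraction argument in the proof can be illustrated numerically. The following sketch makes several assumptions that are not taken from the text: scalar blocks, exact computations, $p(x_{1})=\lambda|x_{1}|$ (so the first block update is a soft-thresholding step), the sGS operator $T=UD^{-1}U^{*}$, and the contraction factor $\rho=\|\widehat{{\cal Q}}^{-1/2}T\widehat{{\cal Q}}^{-1/2}\|$ with $\widehat{{\cal Q}}={\cal Q}+T$. It checks that the scaled error $\|\widehat{{\cal Q}}^{1/2}(x^{j}-x^{*})\|$ shrinks by at least the factor $\rho$ per cycle.

```python
import numpy as np

def soft_threshold(v, t):
    """Minimizer of t*|u| + 0.5*(u - v)**2 over scalars u."""
    return np.sign(v) * max(abs(v) - t, 0.0)

def extended_sgs_cycle(Q, b, x, lam):
    """One cycle with scalar blocks and p(x_1) = lam * |x_1| (assumed p)."""
    z = x.copy()
    n = len(b)
    # Backward sweep over coordinates n, ..., 2.
    for i in range(n - 1, 0, -1):
        z[i] = (b[i] - Q[i] @ z + Q[i, i] * z[i]) / Q[i, i]
    # First coordinate: min lam*|t| + 0.5*Q11*t^2 - r*t, a soft-threshold.
    r = b[0] - Q[0] @ z + Q[0, 0] * z[0]
    z[0] = soft_threshold(r, lam) / Q[0, 0]
    # Forward sweep over coordinates 2, ..., n.
    for i in range(1, n):
        z[i] = (b[i] - Q[i] @ z + Q[i, i] * z[i]) / Q[i, i]
    return z

rng = np.random.default_rng(1)
n, lam = 6, 0.5
G = rng.standard_normal((n, n))
Q = G.T @ G + n * np.eye(n)   # positive definite test matrix
b = rng.standard_normal(n)

D = np.diag(np.diag(Q))
U = np.triu(Q, 1)
T = U @ np.linalg.solve(D, U.T)          # assumed sGS operator
w, V = np.linalg.eigh(Q + T)             # Q_hat = Q + T is positive definite
H_half = (V * np.sqrt(w)) @ V.T          # Q_hat^{1/2}
H_inv_half = (V / np.sqrt(w)) @ V.T      # Q_hat^{-1/2}
rho = np.linalg.norm(H_inv_half @ T @ H_inv_half, 2)

# Reference solution: iterate long enough to reach round-off accuracy.
x_star = np.zeros(n)
for _ in range(2000):
    x_star = extended_sgs_cycle(Q, b, x_star, lam)

# Record the scaled errors of the first few cycles from a random start.
x = 10.0 * rng.standard_normal(n)
errors = []
for _ in range(15):
    errors.append(np.linalg.norm(H_half @ (x - x_star)))
    x = extended_sgs_cycle(Q, b, x, lam)

print(rho, all(e1 <= rho * e0 + 1e-9 for e0, e1 in zip(errors, errors[1:])))
```

The small additive slack in the final check only absorbs round-off in the reference solution.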
Remark 4.
In fact, one can weaken the positive definiteness assumption on ${\cal Q}$ in the above theorem and still expect a linear rate of convergence. As a simple illustration, we only discuss here the exact version of Algorithm 1, i.e., $\widetilde{\boldsymbol{\delta}}^{k}=\boldsymbol{\delta}^{k}=0$, under the error bound condition [19, 20] on $F$, which holds automatically if $p$ is a convex piecewise quadratic/linear function such as , or if . When for all , one can prove that $\{F(\boldsymbol{x}^{k})\}$ converges at least Q-linearly and $\{\boldsymbol{x}^{k}\}$ converges at least R-linearly to an optimal solution of problem (18) by using the techniques developed in [19, 20]. Interested readers may refer to [28, 30] for more details. For the accelerated case, with the additional fixed restarting scheme incorporated in Algorithm 1, the R-linear convergence of both $\{F(\boldsymbol{x}^{k})\}$ and $\{\boldsymbol{x}^{k}\}$ can be obtained from [29, Corollary 3.8].
5 An illustration on the application of the block sGS decomposition theorem in designing an efficient proximal ALM
In this section, we demonstrate the usefulness of our block sGS decomposition theorem as a building block for designing an efficient proximal ALM for solving a linearly constrained convex composite QP problem given by
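$$\min\Big\{\,p(x_{1})+\frac{1}{2}\langle\boldsymbol{x},\,{\cal Q}\boldsymbol{x}\rangle-\langle\boldsymbol{g},\,\boldsymbol{x}\rangle\;\Big|\;{\cal A}\boldsymbol{x}=d\,\Big\},\qquad (62)$$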
where ${\cal Q}$ is a positive semidefinite linear operator on ${\cal X}$, ${\cal A}:{\cal X}\to{\cal Y}$ is a given linear map, and $\boldsymbol{g}\in{\cal X}$, $d\in{\cal Y}$ are given data. Here ${\cal X}$ and ${\cal Y}$ are two finite dimensional inner product spaces. Specifically, we show how the block sGS decomposition theorem given in Theorem 1 can be applied within the proximal ALM. We emphasize that our main purpose here is to briefly illustrate the usefulness of the block sGS decomposition theorem rather than to focus on the proximal ALM itself. Indeed, simply being capable of handling the nonsmooth function $p$ already distinguishes our approach from other approaches that use the sGS technique in optimization algorithms, e.g., [8, 9], where the authors incorporated the pointwise sGS splitting as a preconditioner within the Douglas–Rachford splitting method for a convex-concave saddle point problem.
In-depth analyses of various recently developed ADMM-type algorithms and accelerated block coordinate descent algorithms that employ the block sGS decomposition theorem as a building block can be found in [6, 15, 16, 17, 25]. Thus we shall not elaborate here again on the essential role played by the block sGS decomposition theorem in the design of those algorithms.
Although the problem (62) looks deceptively simple, it is in fact a powerful model that includes the important class of standard convex quadratic semidefinite programming (QSDP) in the dual form given by
[TABLE]
where , are given data, is a given linear map that is assumed to be surjective, is a self-adjoint positive semidefinite linear operator, and is any subspace containing , the range space of . Here denotes the space of symmetric matrices and denotes the cone of symmetric positive semidefinite matrices in . One can obviously express the QSDP problem (63) in the form of (62) by defining \mbox{\boldmath{x}}=(Z;\xi;W), , , and .
We begin with the augmented Lagrangian function associated with (62):
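$${\cal L}_{\sigma}(\boldsymbol{x};y)=p(x_{1})+\frac{1}{2}\langle\boldsymbol{x},\,{\cal Q}\boldsymbol{x}\rangle-\langle\boldsymbol{g},\,\boldsymbol{x}\rangle+\langle y,\,{\cal A}\boldsymbol{x}-d\rangle+\frac{\sigma}{2}\|{\cal A}\boldsymbol{x}-d\|^{2},\qquad (64)$$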
where $\sigma>0$ is a given penalty parameter and $y$ is the multiplier associated with the equality constraint. The template for a proximal ALM is given as follows. Given $\sigma>0$, $\boldsymbol{x}^{0}\in{\cal X}$ and an initial multiplier $y^{0}$, perform the following steps in each iteration.
Step 1.
Compute
[TABLE]
where \mbox{\boldmath{b}}=\mbox{\boldmath{g}}+{\cal A}^{*}(\sigma d-y^{k}).
Step 2.
Compute $y^{k+1}=y^{k}+\tau\sigma({\cal A}\boldsymbol{x}^{k+1}-d)$, where $\tau$ is the step-length.
It is clear that the subproblem (65) has the form given in (18). Thus, one can apply the block sGS decomposition theorem to efficiently solve the subproblem if we choose , i.e., the sGS operator associated with . For the QSDP problem (63) with , we have that
[TABLE]
and that the subproblem (65) can be efficiently solved by one cycle of the extended block sGS method explicitly as follows, given the iterate and multiplier .
Step 1a.
Compute as the solution of where .
Step 1b.
Compute from where .
Step 1c.
Compute $Z^{k+1}=\mbox{argmin}\Big\{\delta_{\mathbb{S}^{n}_{+}}(Z)+\frac{\sigma}{2}\|Z+{\cal B}^{*}\xi^{\prime}+{\cal H}W^{\prime}-\sigma^{-1}b_{Z}\|^{2}\Big\}$, where .
Step 1d.
Compute from
Step 1e.
Compute from
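Among the steps above, Step 1c has a closed form: minimizing $\delta_{\mathbb{S}^{n}_{+}}(Z)+\frac{\sigma}{2}\|Z-C\|^{2}$ with $C=\sigma^{-1}b_{Z}-{\cal B}^{*}\xi^{\prime}-{\cal H}W^{\prime}$ is the projection of $C$ onto the cone of positive semidefinite matrices, which can be computed from one eigenvalue decomposition. A minimal sketch follows; the helper `step_1c` and its argument names are hypothetical, and the three matrix arguments are assumed to be formed by the caller.

```python
import numpy as np

def project_psd(C):
    """Frobenius-norm projection of a symmetric matrix onto the PSD cone."""
    C = 0.5 * (C + C.T)                 # symmetrize to guard against round-off
    w, V = np.linalg.eigh(C)
    return (V * np.maximum(w, 0.0)) @ V.T   # keep only nonnegative eigenvalues

def step_1c(Bt_xi, H_W, b_Z, sigma):
    """Hypothetical helper for Step 1c.

    Bt_xi, H_W and b_Z stand for the already-formed matrices B^* xi',
    H W' and b_Z; they are assumptions of this sketch.
    """
    return project_psd(b_Z / sigma - Bt_xi - H_W)

# Small check on an indefinite matrix.
C = np.array([[1.0, 2.0], [2.0, -3.0]])
Z = project_psd(C)
print(np.linalg.eigvalsh(Z))
```

The projection satisfies $Z\succeq 0$, $Z-C\succeq 0$ and $\langle Z,\,Z-C\rangle=0$, which gives a convenient correctness check.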
From the above implementation, one can see how simple it is for one to apply the block sGS decomposition theorem to solve the complicated subproblem (65) arising from QSDP. Note that in Step 1a and Step 1e, we only need to compute and , respectively, and we do not need the values of and explicitly. Here, for simplicity, we only write down the exact version of a proximal ALM by using our exact block sGS decomposition theorem. Without any difficulty, one can also apply the inexact version of the block sGS decomposition theorem to derive a more practical inexact proximal ALM for solving (62), say when the linear systems involved are large scale and have to be solved by a Krylov subspace iterative method.
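To make the two-step template concrete, the sketch below runs a proximal ALM on a small equality-constrained QP, with the subproblem in Step 1 solved by exactly one sGS cycle. It is a toy illustration under stated assumptions: $p\equiv 0$, scalar (pointwise) blocks, $\sigma=\tau=1$, the proximal term chosen as the sGS operator of ${\cal Q}+\sigma{\cal A}^{*}{\cal A}$, and hypothetical function names and test data. The result is compared with a direct solve of the KKT system.

```python
import numpy as np

def sgs_cycle(M, b, x):
    """One pointwise sGS cycle for M x = b: backward, then forward sweep."""
    z = x.copy()
    n = len(b)
    for i in range(n - 1, 0, -1):       # backward sweep, coordinates n..2
        z[i] = (b[i] - M[i] @ z + M[i, i] * z[i]) / M[i, i]
    for i in range(n):                  # forward sweep, coordinates 1..n
        z[i] = (b[i] - M[i] @ z + M[i, i] * z[i]) / M[i, i]
    return z

def proximal_alm_sgs(Q, g, A, d, sigma=1.0, tau=1.0, iters=5000):
    """Proximal ALM sketch for min 0.5 x'Qx - g'x  s.t.  A x = d (p = 0)."""
    x = np.zeros(Q.shape[0])
    y = np.zeros(A.shape[0])
    M = Q + sigma * A.T @ A             # Hessian of the augmented Lagrangian
    for _ in range(iters):
        b = g + A.T @ (sigma * d - y)   # right-hand side of the subproblem
        x = sgs_cycle(M, b, x)          # Step 1: one sGS cycle
        y = y + tau * sigma * (A @ x - d)   # Step 2: multiplier update
    return x, y

# Illustrative data: tridiagonal Q and two linear constraints.
n = 6
Q = 4.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A = np.array([[1.0] * 6, [1.0, -1.0] * 3])
g = np.arange(1.0, n + 1)
d = np.array([1.0, 0.0])

x, y = proximal_alm_sgs(Q, g, A, d)

# Reference: solve the KKT system  Q x + A' y = g,  A x = d  directly.
K = np.block([[Q, A.T], [A, np.zeros((2, 2))]])
ref = np.linalg.solve(K, np.concatenate([g, d]))
print(np.linalg.norm(x - ref[:n]), np.linalg.norm(y - ref[n:]))
```

In a large-scale setting the inner divisions would be replaced by block solves, possibly inexact, which is where the inexact version of the decomposition theorem becomes relevant.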
6 Extension of the classical block symmetric SOR method for solving (18)
In a way similar to what we have done in section 2, we show in this section that the classical block symmetric SOR (block sSOR) method can also be interpreted from an optimization perspective.
Given a parameter , the th iteration of the classical block sSOR method in the third normal form is defined by
[TABLE]
where
[TABLE]
, and . Note that for , we have that and . We should mention that the classical block sSOR method is typically not derived in the form given in (67), see for example [13, p.117], but one can show with some algebraic manipulations that (67) is an equivalent reformulation.
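The equivalence between the sweep form and a one-line update can be illustrated for the textbook pointwise sSOR method. The sketch below does not reproduce (67), whose exact expression is not recoverable here. Instead it assumes the standard form $x^{+}=x+W^{-1}(b-{\cal Q}x)$ with $W=\frac{1}{\omega(2-\omega)}(D+\omega U)D^{-1}(D+\omega U^{*})$, where $\omega$ denotes the relaxation parameter, and checks that a backward SOR sweep followed by a forward SOR sweep gives the same iterate. The function names are hypothetical.

```python
import numpy as np

def ssor_two_sweeps(Q, b, x, omega):
    """Backward SOR sweep followed by forward SOR sweep (pointwise)."""
    z = x.copy()
    n = len(b)
    for i in range(n - 1, -1, -1):      # backward sweep
        gs = (b[i] - Q[i] @ z + Q[i, i] * z[i]) / Q[i, i]
        z[i] = (1.0 - omega) * z[i] + omega * gs
    for i in range(n):                  # forward sweep
        gs = (b[i] - Q[i] @ z + Q[i, i] * z[i]) / Q[i, i]
        z[i] = (1.0 - omega) * z[i] + omega * gs
    return z

def ssor_normal_form(Q, b, x, omega):
    """Assumed one-line form: x + W^{-1} (b - Q x)."""
    D = np.diag(np.diag(Q))
    U = np.triu(Q, 1)
    W = (D + omega * U) @ np.linalg.solve(D, D + omega * U.T)
    W = W / (omega * (2.0 - omega))
    return x + np.linalg.solve(W, b - Q @ x)

rng = np.random.default_rng(2)
n = 5
G = rng.standard_normal((n, n))
Q = G.T @ G + np.eye(n)                 # symmetric positive definite
b = rng.standard_normal(n)
x0 = rng.standard_normal(n)

for omega in (0.7, 1.0, 1.3):
    gap = ssor_two_sweeps(Q, b, x0, omega) - ssor_normal_form(Q, b, x0, omega)
    print(omega, np.linalg.norm(gap))
```

For $\omega=1$ the matrix $W$ reduces to ${\cal Q}+UD^{-1}U^{*}$, so the update coincides with one sGS cycle.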
Denote
[TABLE]
In the next proposition, we show that can be decomposed as the sum of and . Similar to the linear operator in section 2, is the key ingredient which enables us to derive the block sSOR method from the optimization perspective, and to extend it to solve the CCQP (18).
Proposition 3.
Let , and denote , . It holds that
[TABLE]
Proof.
Let and . Note that and
[TABLE]
Now
[TABLE]
From here, we get the required expression for in (68). ∎
Given two error tolerance vectors and \mbox{\boldmath{\delta}}^{\prime} with , let
[TABLE]
Given $\bar{\boldsymbol{x}}\in{\cal X}$, similar to Theorem 1, one can prove without much difficulty that the optimal solution of the following minimization subproblem
[TABLE]
can be computed by performing exactly one cycle of the block sSOR method. In particular, when and $\boldsymbol{\delta}=\boldsymbol{\delta}^{\prime}=\mathbf{0}$, the optimal solution to (69) can be computed by (67), i.e., if we set $\bar{\boldsymbol{x}}=\boldsymbol{x}^{k}$, then $\boldsymbol{x}^{k+1}$ obtained from (67) is the optimal solution to (69). By replacing and in Algorithm 1 with and , respectively, one can obtain a block sSOR based inexact proximal gradient method for solving (18), and the convergence results presented in Proposition 2 and Theorem 3 remain valid with replaced by .
Remark 5.
For the classical pointwise sSOR method, it was shown in [13, Theorem 4.8.14] that if there exist positive constants and such that
[TABLE]
then its convergence rate is where Interestingly, for the convergence rate of our block sSOR method in Theorem 3, we also have a similar estimate given by
[TABLE]
In order to minimize the upper bound, we can choose and then we get
[TABLE]
7 Conclusion
In this paper, we give an optimization interpretation that each cycle of the classical block sGS method is equivalent to solving the associated multi-block convex QP problem with an additional proximal term. This equivalence is fully characterized via our block sGS decomposition theorem. A factorization view of this theorem and its equivalence to the SCB reduction procedure are also established. The classical block sGS method, viewed from the optimization perspective via the block sGS decomposition theorem, is then extended to the inexact setting for solving a class of multi-block convex composite QP problems involving nonsmooth functions. Moreover, we are able to derive $O(1/k)$ and $O(1/k^{2})$ iteration complexities for our inexact block sGS method and its accelerated version, respectively. These new interpretations and convergence results, together with the incorporation of the (inexact) sGS decomposition techniques in the design of efficient algorithms for core optimization problems in [6, 15, 16, 17, 25], demonstrate the power and usefulness of our simple yet elegant block sGS decomposition theorem. We believe this decomposition theorem will prove even more useful in solving other optimization problems and beyond.
Appendix: Proof of part (b) of Proposition 2
To begin the proof, we state the following lemma from [24].
Lemma 2.
Suppose that and are two sequences of nonnegative scalars, and is a nondecreasing sequence of scalars such that . Suppose that for all , the inequality holds. Then for all , where .
Proof.
In this proof, we let \Delta^{j}=\Delta(\tilde{\mbox{\boldmath{\delta}}}^{j},\mbox{\boldmath{\delta}}^{j}). Note that under the assumption that for all , \widetilde{\mbox{\boldmath{x}}}^{j}=\mbox{\boldmath{x}}^{j-1}. Note also that from (58), we have that , where is given as in Proposition 2.
From the optimality of \mbox{\boldmath{x}}^{j} in (57), one can show that
[TABLE]
Let \mbox{\boldmath{e}}^{j}=\mbox{\boldmath{x}}^{j}-\mbox{\boldmath{x}}^{*}. By setting \mbox{\boldmath{x}}=\mbox{\boldmath{x}}^{j-1} and \mbox{\boldmath{x}}=\mbox{\boldmath{x}}^{*} in (70), we get
[TABLE]
By multiplying to (71) and combining with (72), we get
[TABLE]
where a_{j}=2j[F(\mbox{\boldmath{x}}^{j})-F(\mbox{\boldmath{x}}^{*})] and b_{j}=\|\mbox{\boldmath{e}}^{j}\|_{\widehat{{\cal Q}}}. Note that the last inequality follows from (72) with and some simple manipulations. To summarize, we have . By applying Lemma 2, we get
[TABLE]
where with . Applying the above result to (73), we get
[TABLE]
From here, the required result in Part (b) of Proposition 2 follows. ∎
References
- [1] O. Axelsson, Iterative Solution Methods, Cambridge University Press, 1994.
- [2] M. R. Bai, X. J. Zhang, G. Y. Ni, and C. F. Cui, An adaptive correction approach for tensor completion, SIAM J. Imaging Sciences, 9 (2016), pp. 1298–1323.
- [3] R. E. Bank, T. F. Dupont, and H. Yserentant, The hierarchical basis multigrid method, Numer. Math., 52 (1988), pp. 427–458.
- [4] A. Beck and L. Tetruashvili, On the convergence of block coordinate descent type methods, SIAM Journal on Optimization, 23 (2013), pp. 2037–2060.
- [5] D. P. Bertsekas, Nonlinear Programming, 2nd ed., Athena Scientific, Belmont, Massachusetts, 1995.
- [6] L. Chen, D. F. Sun, and K.-C. Toh, An efficient inexact symmetric Gauss-Seidel based majorized ADMM for high-dimensional convex composite conic programming, Mathematical Programming, 161 (2017), pp. 237–270.
- [7] A. Greenbaum, Iterative Methods for Solving Linear Systems, SIAM, Philadelphia, 1997.
- [8] K. Bredies and H. P. Sun, Preconditioned Douglas–Rachford splitting methods for convex-concave saddle-point problems, SIAM Journal on Numerical Analysis, 53 (2015), pp. 421–444.
