A Control-Theoretic Approach to Analysis and Parameter Selection of Douglas-Rachford Splitting
Jacob H. Seidman, Mahyar Fazlyab, Victor M. Preciado, George J. Pappas

TL;DR
This paper introduces a control-theoretic framework for analyzing and selecting parameters in Douglas-Rachford splitting and ADMM algorithms, providing convergence guarantees and simplifying rate proofs.
Contribution
It offers a unified, dynamical system-based analysis method for convergence and parameter tuning of Douglas-Rachford splitting, applicable under various convexity and smoothness conditions.
Findings
Derived a dimensionally independent matrix inequality for convergence analysis.
Provided performance guarantees and optimal parameter settings.
Simplified proof of O(1/k) convergence rate without strong convexity.
Abstract
Douglas-Rachford splitting and its equivalent dual formulation ADMM are widely used iterative methods in composite optimization problems arising in control and machine learning applications. The performance of these algorithms depends on the choice of step size parameters, for which the optimal values are known in some specific cases, and otherwise are set heuristically. We provide a new unified method of convergence analysis and parameter selection by interpreting the algorithm as a linear dynamical system with nonlinear feedback. This approach allows us to derive a dimensionally independent matrix inequality whose feasibility is sufficient for the algorithm to converge at a specified rate. By analyzing this inequality, we are able to give performance guarantees and parameter settings of the algorithm under a variety of assumptions regarding the convexity and smoothness of the…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
A Control-Theoretic Approach to Analysis and Parameter Selection of Douglas-Rachford Splitting
Jacob H. Seidman1, Mahyar Fazlyab2, Victor M. Preciado2, George J. Pappas2 1: Department of Applied Mathematics and Computational Science, University of Pennsylvania Email: [email protected]. 2: Department of Electrical and Systems Engineering, University of Pennsylvania. Email: {mahyarfa, preciado, pappasg}@seas.upenn.edu. Work was supported by ARL CRA DCIST W911NF-17-2- 0181 and the NSF under grants CAREER-ECCS-1651433.
Abstract
Douglas-Rachford splitting and its equivalent dual formulation ADMM are widely used iterative methods in composite optimization problems arising in control and machine learning applications. The performance of these algorithms depends on the choice of step size parameters, for which the optimal values are known in some specific cases, and otherwise are set heuristically. We provide a new unified method of convergence analysis and parameter selection by interpreting the algorithm as a linear dynamical system with nonlinear feedback. This approach allows us to derive a dimensionally independent matrix inequality whose feasibility is sufficient for the algorithm to converge at a specified rate. By analyzing this inequality, we are able to give performance guarantees and parameter settings of the algorithm under a variety of assumptions regarding the convexity and smoothness of the objective function. In particular, our framework enables us to obtain a new and simple proof of the convergence rate of the algorithm when the objective function is not strongly convex.
Index Terms:
Optimization algorithms, Lyapunov methods
I Introduction
In this paper, we consider problems of the form
[TABLE]
where are convex, closed, and proper (c.c.p.). Douglas-Rachford splitting (DRS) solves problem (1) with the following iterations:
[TABLE]
where is the proximal operator (see Definition 1) and and are known as the proximal step size and relaxation parameter, respectively. For a proper selection of these parameters, the limiting values of both and will be a solution to (1). The goal of this work is to provide convergence rates for DRS over various assumptions on and , and optimize these rates with respect to the algorithm parameters and using semidefinite programs (SDPs).
The algorithm was first proposed in [1], and has since found application in general separable optimization problems [2]. Its dual formulation, ADMM, has been particularly useful in distributed optimization problems [3]. Since the iterates of ADMM can be written as applying DRS to the dual problem [4, 5], convergence results for one algorithm are valid for the other as well when strong duality holds.
The convergence of DRS has previously been analyzed using monotone operator theory and variational inequalities, see [6], [7]. These techniques have led to proofs of a convergence rate for the non-strongly convex case [8], and linear convergence when is smooth and strongly convex [9], [10]. In the most general case, a condition for convergence is that with [11], though there exist cases where the algorithm converges with [12].
Recently, there has been interest in automating the analysis and design of optimization algorithms via SDPs, [13, 14, 15, 16, 17, 18, 19]. In particular, through the method of integral quadratic constraints proposed in [14], the authors of [12] derive an SDP for choosing the parameters of ADMM in the case of smooth and strongly convex . Using a similar framework, the authors of [20] provide evidence that as the relaxation parameter approaches from below, the linear convergence rate is close to being optimal and in [21] are able to analytically solve the SDP to give a convergence rate. The work in [22] gives an optimal choice for the relaxation parameter when is quadratic. Furthermore, [23] gives a set of assumptions in which a bound on the linear convergence rate is minimized by setting .
Our Contribution: By viewing DRS as a linear system with non-linear feedback, we derive a dimensionally independent matrix inequality which gives convergence guarantees via Lyapunov functions. Whereas such an approach was previously applied in [12] to the case of smooth and strongly convex , our framework is novel in that it encompasses varying assumptions on the smoothness and convexity of . By changing a single term in the Lyapunov function for each scenario, we are able to relate the satisfaction of a matrix inequality to the convergence of the algorithm. In particular, we give a new and simple proof of convergence in the non-strongly convex case. These symbolic results can then be used to select step sizes that optimize the derived rates.
In the strongly convex case, the corresponding matrix inequality is sufficient to guarantee a linear convergence rate. We are able to modify the matrix inequality to linearize the dependence on , allowing us to numerically optimize its value for the convergence rate directly. While previous work derived SDP’s which can verify the performance of the algorithm for a given parameter setting, to the best of our knowledge this is the first time such a method immediately gives an optimal relaxation parameter when solved numerically, as opposed to having to search over a range of values for .
II Preliminaries
We denote the set of real numbers by , the set of real -dimensional vectors by , the set of real -dimensional matrices by , and the -dimensional identity matrix and zero matrix by and , respectively. For a function , we denote by the effective domain of . The subdifferential of a function at a point is . By abuse of notation we will also refer to a subgradient, that is an element of the subdifferential by as well. The indicator function of a set is given by if and if . For two matrices and their Kronecker product is .
We say a differentiable function is -smooth on if for some and all . This also implies for all , . A differentiable function is -strongly convex on if for some and all . The class of functions which are -smooth and -strongly convex is denoted by .
Definition 1** **(Proximal Operator)
Given a c.c.p. function and , the proximal operator is defined as
[TABLE]
The point also is given by the implicit solution to the subgradient equation
[TABLE]
We say that a nonlinear function satisfies the incremental quadratic constraint [24] (or point-wise integral quadratic constraint [14]) defined by if for all ,
[TABLE]
For define
[TABLE]
and define as . It was noted in [25, 14] that a differentiable function belongs to the class on if and only if the gradient satisfies the incremental QC defined by . When , the subgradient satisfies the QC defined by . If we define
[TABLE]
then the proximal operator of a function , , satisfies the incremental QC defined by [15].
III Analysis of Douglas-Rachford Splitting via Matrix Inequalities
III-A Douglas-Rachford Splitting as a Dynamical System
We can write the updates in (2) as a linear system with state and feedback nonlinearity ,
[TABLE]
where
[TABLE]
Our main technique is to describe the nonlinearity with incremental QCs representing the operator. This allows us to derive a matrix inequality as a sufficient condition for closed-loop stability of the system via a Lyapunov function argument. We perform this derivation in the following three cases:
- •
Case 1: and ,
- •
Case 2: and , with ,
- •
Case 3: and , with .
We will see that for each case only one term in the Lyapunov function needs to be modified to obtain the convergence result. We then use the matrix inequality condition for each case to obtain information about optimal choices of the algorithm parameters both symbolically and numerically.
III-B Characterization of Fixed Points
From relation (4), the iterates (2) can be rewritten as
[TABLE]
The fixed points of (10) satisfy
[TABLE]
Since , the rightmost equality is exactly the optimality condition for (1).
We will also make use of the following relation, obtained from adding the equations in (10) and the definition of ,
[TABLE]
From this, we can interpret the feedback nonlinearity as the optimality residual of problem (1), which is driven to zero by the linear system in the feedback interconnection.
III-C Convergence Certificates via Matrix Inequalities
III-C1 Case 1: Non-strongly convex and non-smooth case
We first assume that . We propose the following family of Lyapunov functions parameterized by a sequence with ,
[TABLE]
for all and . For notational convenience define the partial sums . The presence of the running sum of subgradients is reminiscent of the Popov criterion [26]. It can also be interpreted as the running weighted sum of fixed point residuals (see (11)). The next lemma shows how this Lyapunov function can ensure a convergence rate in terms of the growth of .
Lemma 1
Consider the algorithm in (2). Suppose there exists a sequence with such that for all . Then
[TABLE]
Proof:
Since for all , in particular we have that , or
[TABLE]
Removing the first term on the left, and dividing through by gives
[TABLE]
The result follows from the fact that the left side is a weighted average, as and . ∎
In the following theorem, we derive a matrix inequality in terms of , and as a sufficient condition to guarantee , which in turn implies (14).
Theorem 1
Let , , and consider the following matrix inequality
[TABLE]
where
[TABLE]
If and are chosen so that (17) is satisfied for all , then for all and the iterates in (2) satisfy
[TABLE]
Proof:
We first see that can be written as a quadratic form. Define the error signal
[TABLE]
Using the updates in (10), the fact that (see (11)), and the relation (12), it can be verified that
[TABLE]
where is given by (18a). Next, note that
[TABLE]
where is defined in (7). Since and , this is exactly the incremental QC that the operator satisfies. Thus, we have for all , . We also note that
[TABLE]
As and , we similarly conclude that is implied from the incremental QC that satisfies. Returning to (17), if we multiply from the left and right by and respectively, we obtain
[TABLE]
Since and we have shown that and , it must be that . Hence, , and the result now follows from Lemma 1. ∎
III-C2 Case 2: Non-strongly convex and smooth
If with , we may leverage the smoothness of to refine the result of the previous section. In particular, we can use the inequality for -smooth functions (see Preliminaries) to relate the behavior of the subgradients to the objective values, whereas in the previous section this inequality was not available. For let
[TABLE]
with defined as in the previous case.
This Lyapunov function leads to the following Lemma, the proof of which is identical to that of Lemma 1.
Lemma 2
Consider the algorithm in (2). Suppose there exists a sequence with such that for all . Then
[TABLE]
This allows us to prove the following theorem for when is non-strongly convex and smooth.
Theorem 2
Let and consider the following matrix inequality
[TABLE]
where
[TABLE]
and and are defined in (18). If and are chosen so that (25) is satisfied for all , then for all with and the iterates in (2) satisfy
[TABLE]
Proof:
We begin by bounding the difference of the Lyapunov function defined in (24), , by a quadratic form in the error signal (see (20)). From the convexity and smoothness of , we can write
[TABLE]
where we have used that . From the convexity of
[TABLE]
Adding these three inequalities together and using the relation (12) allows us to conclude
[TABLE]
Using the recursion for , we then find that
[TABLE]
The proof now proceeds identically as in the proof of Theorem 1 up to the statement that (25) implies . Then by (32), we have that , and the result follows from Lemma 2. ∎
III-C3 Case 3: Strongly convex and smooth
We now assume that and , with . For this scenario let
[TABLE]
The following lemma characterizes when we can extract a linear convergence rate from this Lyapunov function.
Lemma 3
Consider the algorithm in (2). Suppose there exists such the Lyapunov function defined by (33) satisfies for all . Then
[TABLE]
Proof:
The proof follows immediately , the definition (33) of , and induction. ∎
We again see that the difference can be written as a quadratic form acting on the error signal as defined in (20). Using the definition for in terms of the previous iterates, we can write
[TABLE]
where is given by
[TABLE]
From (35), we can use the same reasoning developed in the non-strongly convex case to arrive at the following theorem.
Theorem 3
Let and consider the following matrix inequality
[TABLE]
where is given in (36) and and are given in (18). If , , and are chosen so that (37) is satisfied for all , then for all and with , the iterates in (2) satisfy the following linear convergence rate
[TABLE]
Proof:
We proceed identically as in the proof of Theorems 1 and 2. If (37) is satisfied, then for all . This is equivalent to , by which (38) follows from Lemma 3. ∎
IV Optimizing the Bound and Relaxation Parameter
For each of the cases presented in the previous section we use the associated matrix inequality to optimize our bounds on the convergence rate. In the case of non-strongly convex we provide analytic convergence rates, while for strongly convex we optimize our bounds numerically.
IV-1 Case 1: Non-strongly convex and non-smooth case
We now select algorithm parameters that satisfy the matrix inequality in (17). In doing so, we arrive at a new and simple proof of the convergence of DRS in the non-strongly convex and non-smooth case.
Theorem 4
If and , then for any choice of and , if we set , with
[TABLE]
then and satisfy the matrix inequality (17).
Proof:
Making these substitutions gives ∎
Remark 1
The convergence rate bound provided by the parameter choices in Theorem 4 guarantees convergence only if , which in this case means . This is consistent with the conditions on the relaxation parameter found in [11].
Remark 2
After setting as in (39), we can maximize by setting which results in and the following result
[TABLE]
Remark 3
Using the relation (12) we can rewrite (40) as
[TABLE]
Thus we see that Theorem 4 also gives a rate toward the iterates being a fixed point of the algorithm.
IV-2 Case 2: Non-strongly convex and smooth case
When with , we have the following result on feasibility of the matrix inequality (25).
Theorem 5
For any and with , if we set ,
[TABLE]
Then , and satisfy the matrix inequality (25).
Proof:
This can be verified by substituting the expressions for , , and into the minors of and seeing that Sylverster’s criterion is satisfied [27]. ∎
Remark 4
Note that for a constant selection of , . If (25) holds then
[TABLE]
Remark 5
For moderate values of , we can take a second order Taylor expansion of the rightmost term in (42b) and maximize the resulting expression with respect to . This suggests that we should set to
[TABLE]
IV-3 Case 3: Strongly convex and smooth case
When , , with , we can modify the matrix inequality (37) to get a linear dependence on the relaxation parameter. If we define and
[TABLE]
then (37) is equivalent to
[TABLE]
As , if (46) is satisfied then it must be the case that . We now recognize that is the Schur Complement of the bottom right entry in the matrix
[TABLE]
By the properties of the Schur complement [28], we can conclude that (37) is satisfied if and only if . The advantage of using instead of (37), is that now both the convergence rate and the relaxation parameter appear linearly. If we set , , and for all , the corresponding bound in (38) can be optimized by solving the following SDP, where is now a decision variable,
[TABLE]
where the decision variables are and . The optimal from solving this program over a range of step sizes and condition numbers is shown in Figure 1. We see that with increasing , the optimal choice of decreases. Across the range of values of and , the SDP (48) returns as the optimal relaxation parameter.
Remark 6
We may repeat the same derivation for the inequalities (17) and (25) to linearize their dependence on as well.
V Numerical Experiments
We investigate how our theoretical results compare with the experimental performance of DRS in the three scenarios described above. For the non-smooth and non-strongly convex case, we consider a basis pursuit problem (see [3]),
[TABLE]
with data and . We run DRS on the dual of this problem (ADMM) which is non-smooth and non-strongly convex. We test the convergence over a range of values of , including (see Remark 2), and fixed .
For the smooth cases we consider a LASSO problem,
[TABLE]
with , and . For the non-strongly convex case we set to be rank deficient and plot the convergence of DRS over a range of with fixed. The value is found by performing a grid search over possible values and choosing that which maximizes the rate given by Theorem 2. For the strongly convex case is set to be full rank and plot the convergence of DRS over a range of with fixed. Again, is the value of which gives the best rate as provided by Theorem 3. The results are presented in Figure 2.
VI Conclusion
We presented a unified framework for deriving convergence bounds for DRS and parameter settings that optimize these bounds. Our framework encompasses different assumptions on the smoothness and convexity of . We are able to give simple proofs of convergence and find optimal choices for the relaxation parameter by solving a small convex program for a fixed . It is important to note that the parameter selections optimize our bounds in the sense of the best worst-case convergence rate over the entire class of objective functions with . While there are scenarios where additional structure in the problem might make alternate parameter settings more effective, the settings we see here bound the worst-case performance agnostic of any additional problem structure. For future work, this framework will be extended to encompass accelerated variants of DRS, as well as three or more operator splitting and multi-block ADMM.
The reference list from the paper itself. Each links out to its DOI / PubMed record.
- 1[1] J. Douglas and H. H. Rachford, “On the numerical solution of heat conduction problems in two and three space variables,” Transactions of the American mathematical Society , vol. 82, no. 2, pp. 421–439, 1956.
- 2[2] G. Stathopoulos, H. Shukla, A. Szucs, Y. Pu, C. N. Jones et al. , “Operator splitting methods in control,” Foundations and Trends® in Systems and Control , vol. 3, no. 3, pp. 249–362, 2016.
- 3[3] S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein et al. , “Distributed optimization and statistical learning via the alternating direction method of multipliers,” Foundations and Trends® in Machine learning , vol. 3, no. 1, pp. 1–122, 2011.
- 4[4] D. Gabay, “Chapter ix applications of the method of multipliers to variational inequalities,” in Studies in mathematics and its applications . Elsevier, 1983, vol. 15, pp. 299–331.
- 5[5] J. Eckstein, “Splitting methods for monotone operators with applications to parallel optimization,” Ph.D. dissertation, Massachusetts Institute of Technology, 1989.
- 6[6] P.-L. Lions and B. Mercier, “Splitting algorithms for the sum of two nonlinear operators,” SIAM Journal on Numerical Analysis , vol. 16, no. 6, pp. 964–979, 1979.
- 7[7] J. Eckstein and D. P. Bertsekas, “On the douglas—rachford splitting method and the proximal point algorithm for maximal monotone operators,” Mathematical Programming , vol. 55, no. 1-3, pp. 293–318, 1992.
- 8[8] B. He and X. Yuan, “On the o(1/n) convergence rate of the douglas–rachford alternating direction method,” SIAM Journal on Numerical Analysis , vol. 50, no. 2, pp. 700–709, 2012.
