Relative-error inertial-relaxed inexact versions of Douglas-Rachford and ADMM splitting algorithms
M. Marques Alves, Jonathan Eckstein, Marina Geremia, Jefferson Melo

TL;DR
This paper introduces new inexact, inertial, and relaxed variants of Douglas-Rachford and ADMM algorithms for convex optimization, demonstrating improved computational performance on LASSO and logistic regression problems.
Contribution
It develops novel inexact inertial-relaxed algorithms for Douglas-Rachford and ADMM, expanding their theoretical framework and practical efficiency.
Findings
Improved computational performance on LASSO and logistic regression
New inexact variants with inertial and overrelaxation features
Theoretical analysis based on a new inexact proximal point framework
Abstract
This paper derives new inexact variants of the Douglas-Rachford splitting method for maximal monotone operators and the alternating direction method of multipliers (ADMM) for convex optimization. The analysis is based on a new inexact version of the proximal point algorithm that includes both an inertial step and overrelaxation. We apply our new inexact ADMM method to LASSO and logistic regression problems and obtain somewhat better computational performance than earlier inexact ADMM methods.
| Problem | relerr | primDR | primDR_relx_in | primDR_relx_in / relerr | primDR_relx_in / primDR | |||||
|---|---|---|---|---|---|---|---|---|---|---|
| Ball64_singlepixcam | 280 | 278 | 123 | 0.439 | 0.442 | |||||
| Logo64_singlepixcam | 283 | 282 | 139 | 0.491 | 0.493 | |||||
| Mug32_singlepixcam | 153 | 153 | 136 | 0.888 | 0.888 | |||||
| Mug128_singlepixcam | 920 | 914 | 435 | 0.473 | 0.476 | |||||
| finance1000 | 974 | 1709 | 1079 | 1.107 | 0.631 | |||||
| PEMS | 3354 | 3648 | 1088 | 0.324 | 0.298 | |||||
| Brain | 1855 | 2295 | 1219 | 0.657 | 0.531 | |||||
| Colon | 450 | 482 | 256 | 0.568 | 0.531 | |||||
| Leukemia | 675 | 774 | 424 | 0.628 | 0.547 | |||||
| Lymphoma | 908 | 925 | 482 | 0.531 | 0.521 | |||||
| Prostate | 1520 | 1739 | 998 | 0.656 | 0.574 | |||||
| srbct | 426 | 401 | 221 | 0.519 | 0.551 | |||||
| Geometric mean | 692.06 | 761.02 | 399.85 | 0.577 | 0.525 |
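A quick arithmetic check suggests that the last two columns of the table above are the ratios of the primDR_relx_in entry to the relerr and primDR entries, and that the final row is a column-wise geometric mean. The following is a minimal sketch of that check, using values copied from the table; the helper names `geometric_mean` and `ratios` are illustrative and not taken from the paper.

```python
import math

# (relerr, primDR, primDR_relx_in) values copied from the table above.
rows = {
    "Ball64_singlepixcam": (280, 278, 123),
    "Logo64_singlepixcam": (283, 282, 139),
    "Mug32_singlepixcam": (153, 153, 136),
    "Mug128_singlepixcam": (920, 914, 435),
    "finance1000": (974, 1709, 1079),
    "PEMS": (3354, 3648, 1088),
    "Brain": (1855, 2295, 1219),
    "Colon": (450, 482, 256),
    "Leukemia": (675, 774, 424),
    "Lymphoma": (908, 925, 482),
    "Prostate": (1520, 1739, 998),
    "srbct": (426, 401, 221),
}


def geometric_mean(values):
    """Geometric mean computed through logarithms."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def ratios(relerr, primdr, relx_in):
    """Ratios of the primDR_relx_in entry to the two baseline entries."""
    return round(relx_in / relerr, 3), round(relx_in / primdr, 3)


# Column-wise geometric means, in the order relerr, primDR, primDR_relx_in.
geo = [geometric_mean([r[i] for r in rows.values()]) for i in range(3)]

for name, row in rows.items():
    print(name, ratios(*row))
print("geometric means:", [round(g, 2) for g in geo])
```

A ratio below 1 means primDR_relx_in needed fewer of the counted units than the method it is compared against on that problem.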
| Problem | relerr | primDR | primDR_relx_in | primDR_relx_in / relerr | primDR_relx_in / primDR | |||||
|---|---|---|---|---|---|---|---|---|---|---|
| Ball64_singlepixcam | 603 | 382 | 191 | 0.316 | 0.500 | |||||
| Logo64_singlepixcam | 621 | 369 | 212 | 0.341 | 0.574 | |||||
| Mug32_singlepixcam | 998 | 307 | 302 | 0.303 | 0.984 | |||||
| Mug128_singlepixcam | 1214 | 1046 | 488 | 0.402 | 0.466 | |||||
| finance1000 | 18944 | 7852 | 9737 | 0.514 | 1.240 | |||||
| PEMS | 85858 | 9318 | 9235 | 0.107 | 0.991 | |||||
| Brain | 24612 | 7116 | 7655 | 0.311 | 1.075 | |||||
| Colon | 5847 | 1401 | 1461 | 0.249 | 1.042 | |||||
| Leukemia | 7888 | 2321 | 2543 | 0.322 | 1.095 | |||||
| Lymphoma | 15266 | 3179 | 3083 | 0.202 | 0.969 | |||||
| Prostate | 20615 | 5193 | 6629 | 0.321 | 1.276 | |||||
| srbct | 6213 | 1505 | 1334 | 0.215 | 0.886 | |||||
| Geometric mean | 5859.43 | 1876.32 | 1652.97 | 0.282 | 0.880 |
| Problem | relerr | primDR | primDR_relx_in | primDR_relx_in / relerr | primDR_relx_in / primDR | |||||
|---|---|---|---|---|---|---|---|---|---|---|
| Ball64_singlepixcam | 11.02 | 7.86 | 3.75 | 0.341 | 0.477 | |||||
| Logo64_singlepixcam | 11.37 | 7.62 | 4.04 | 0.355 | 0.531 | |||||
| Mug32_singlepixcam | 1.07 | 0.51 | 0.43 | 0.374 | 0.862 | |||||
| Mug128_singlepixcam | 248.38 | 218.08 | 101.17 | 0.407 | 0.464 | |||||
| finance1000 | 805.17 | 327.56 | 347.97 | 0.432 | 1.062 | |||||
| PEMS | 7546.11 | 1092.16 | 988.12 | 0.131 | 0.905 | |||||
| Brain | 13.59 | 5.94 | 5.53 | 0.407 | 0.929 | |||||
| Colon | 1.56 | 0.45 | 0.28 | 0.179 | 0.620 | |||||
| Leukemia | 4.24 | 2.23 | 1.59 | 0.375 | 0.717 | |||||
| Lymphoma | 7.18 | 2.63 | 2.03 | 0.283 | 0.773 | |||||
| Prostate | 33.21 | 13.15 | 11.88 | 0.357 | 0.904 | |||||
| srbct | 1.83 | 0.42 | 0.35 | 0.192 | 0.847 | |||||
| Geometric mean | 21.13 | 8.75 | 6.41 | 0.303 | 0.733 |
| Problem | absgeom | relerr | primDR | primDR_relx_in | |||
| Colon | 2666 | 2145 | 1979 | 1578 | |||
| Leukemia | 1662 | 1116 | 922 | 788 | |||
| Prostate | 1936 | 1583 | 1677 | 1198 | |||
| Arcene | 419 | 276 | 359 | 290 | |||
| Geometric mean | 1376.91 | 1011.28 | 1023.76 | 810.72 | |||
| Problem | |||||||
| Colon | 0.5919 | 0.7356 | 0.7974 | 1.0839 | |||
| Leukemia | 0.4741 | 0.7061 | 0.8546 | 1.2104 | |||
| Prostate | 0.6188 | 0.7568 | 0.7144 | 0.9439 | |||
| Arcene | 0.6921 | 1.0507 | 0.8078 | 0.7688 | |||
| Geometric mean | 0.5887 | 0.8016 | 0.7924 | 0.9849 |
| Problem | absgeom | relerr | primDR | FISTA | primDR_relx_in | ||||
| Colon | 20612 | 23919 | 21697 | 26247 | 8283 | ||||
| Leukemia | 7715 | 12086 | 11625 | 6536 | 4448 | ||||
| Prostate | 18901 | 27505 | 24548 | 13730 | 10997 | ||||
| Arcene | 780 | 3236 | 3589 | 4648 | 1450 | ||||
| Geometric mean | 6958.73 | 12665.18 | 12209.43 | 10228.97 | 4923.21 | ||||
| Problem | |||||||||
| Colon | 0.4018 | 0.3463 | 0.3817 | 0.3156 | 0.9499 | ||||
| Leukemia | 0.5765 | 0.3681 | 0.3826 | 0.6805 | 0.6636 | ||||
| Prostate | 0.5818 | 0.3998 | 0.4479 | 0.8009 | 0.7699 | ||||
| Arcene | 1.8589 | 0.4481 | 0.4041 | 0.3119 | 0.2173 | ||||
| Geometric mean | 0.7074 | 0.4032 | 0.4032 | 0.4813 | 0.5699 |
| Problem | absgeom | relerr | primDR | FISTA | primDR_relx_in | ||||
| Colon | 182.3601 | 36.5207 | 91.5726 | 73.2987 | 12.8243 | ||||
| Leukemia | 112.7412 | 105.4221 | 241.1378 | 60.9476 | 23.0547 | ||||
| Prostate | 342.1609 | 719.6731 | 850.8159 | 206.3883 | 128.6972 | ||||
| Arcene | 122.7208 | 312.1101 | 370.9415 | 184.3489 | 46.1276 | ||||
| Geometric mean | 171.41 | 224.11 | 288.93 | 114.18 | 36.39 | ||||
| Problem | |||||||||
| Colon | 0.0703 | 0.1203 | 0.1401 | 0.1749 | 0.8003 | ||||
| Leukemia | 0.2045 | 0.2186 | 0.0956 | 1.0215 | 0.2527 | ||||
| Prostate | 0.3761 | 0.1788 | 0.1513 | 0.6236 | 0.2426 | ||||
| Arcene | 0.3759 | 0.1478 | 0.1244 | 0.2502 | 0.4969 | ||||
| Geometric mean | 0.2123 | 0.1623 | 0.1259 | 0.3187 | 0.3951 |
M. Marques Alves
Departamento de Matemática, Universidade Federal de Santa Catarina, Florianópolis, Brazil, 88040-900 ([email protected]). The work of this author was partially supported by CNPq grants no. 405214/2016-2 and 304692/2017-4.
Jonathan Eckstein
Department of Management Science and Information Systems and RUTCOR, Rutgers Business School Newark and New Brunswick, Piscataway, NJ 08854, USA ([email protected]). The work of this author was partially supported by National Science Foundation grant CCF-161761 and Air Force Office of Scientific Research grant FA9550-15-1-0251.
Marina Geremia
Departamento de Matemática, Universidade Federal de Santa Catarina, Florianópolis, Brazil, 88040-900 ([email protected]).
Jefferson G. Melo
IME, Universidade Federal de Goiás, Goiânia, Brazil, 74001-970 ([email protected]).
(May 9, 2014)
2000 Mathematics Subject Classification: 90C25, 90C30, 47H05.
Key words: Inertial, proximal point algorithm, operator splitting, ADMM, relative error criterion, relaxation.
1 Introduction
This paper develops a sequence of three algorithms, each building on the previous one. The first algorithm is a new variant of the proximal point algorithm [28] for the general, abstract problem $0 \in T(z)$, where $T$ is a set-valued maximal monotone operator on $\mathbb{R}^n$ for which $T^{-1}(0) \neq \emptyset$. Our proposed method is a new inertial variant of the relaxed hybrid proximal projection (HPP) method introduced in [31]; see also [30]. It lacks the full generality of [31], but introduces a new “inertial” step modification.
Using this first algorithm, we then develop a new inexact variant of the Douglas-Rachford (DR) splitting method for monotone inclusion problems of the form $0 \in A(z) + B(z)$, where $A$ and $B$ are set-valued maximal monotone operators.
Finally, based on this latter method, we derive a new inexact variant of the alternating direction method of multipliers (ADMM) algorithm for solving convex optimization problems of the form $\min_x \{ f(x) + g(x) \}$, where $f$ and $g$ are closed proper convex functions. Using the well known LASSO and logistic regression problems as examples, we perform some computational tests on this last algorithm in Section 5 below, finding somewhat better practical performance than earlier proposed inexact ADMM methods from [17, 18].
This path for developing approximate DR and ADMM methods was pioneered in [16], and is also taken in the more recent paper by Eckstein and Yao [18]: in each case, one takes an approximate form of proximal point algorithm (PPA) [28] and uses it to obtain an approximate form of DR splitting, which can then be used to obtain a new “primDR” variant of the ADMM; the iteration complexity of the “primDR” ADMM was later studied in [3]. The main difference between this paper and the development of “primDR” in [18] is in the underlying variant of the PPA. The “primDR” analysis used the hybrid proximal extragradient (HPE) method [29] due to Solodov and Svaiter, whereas here we instead use the new inexact HPP developed in Section 2.
Our general approach resembles that of [18] in that it uses a primal derivation and the “coupling matrix” between $f$ and $g$ in the optimization formulation must be the identity, whereas [16], drawing on early work in [22], uses a dual derivation and allows for more general coupling matrices. Our analysis is also much closer to [18] than that of [17], which uses a primal-dual “Lagrangian splitting” analysis patterned after [19].
Inertial algorithms for convex optimization and monotone inclusions [2] have been a subject of intense research in recent years. They appear in connection with continuous dynamics — see, e.g. [2, 7, 8] — accelerated first- and second-order algorithms, and operator splitting methods — see e.g. [5, 6, 12, 13, 25] — with good theoretical and practical performance improvements over prior methods. The inertial methods we propose here have the novel property of simultaneously combining inexact iterations, inertia, and relaxation, with the maximum inertial step and maximum relaxation factor being subject to a mutual constraint; see (20) and (21) below. However, the inertial and relaxation parameters may be chosen independently of the relative-error tolerances.
The remainder of this paper is organized as follows: Section 2 presents our inertial-relaxed HPP method (Algorithm 1) and its convergence analysis (Theorems 2.4 and 2.5). Section 3 then uses the HPP method to develop an inexact inertial-relaxed DR method (Algorithm 2), for which convergence is established in Theorem 3.3. Section 4 then uses the inertial-relaxed DR method to derive a partially inexact relative-error ADMM method (Algorithm 3). The main result of this section is Theorem 4.4. Section 5 presents numerical experiments on LASSO and logistic regression problems.
2 An inertial-relaxed hybrid proximal projection (HPP) method
We begin by developing a new method for the problem
$$0 \in T(z), \tag{1}$$
where $T$ is a maximal monotone operator; we assume that this problem has a solution. Our new proposed procedure for this problem, related to the method of [31] but having a new “inertial” step feature, is given below as Algorithm 1.
We make the following remarks concerning this algorithm:
- (i)
The extrapolation step in (2) introduces inertial effects — see e.g. [1, 2] — controlled by the parameter . The effect of the overrelaxation parameter in (4) is similar but not identical, as shown in Figure 1 below. Conditions on , and that guarantee the convergence of Algorithm 1 are given in Theorem 2.5 — see (20) and (21) and Figure 2 below.
- (ii)
If , in which case , Algorithm 1 reduces to a special case of the HPP method of [31]; see also [30]. Algorithm 1 is also closely related to the inertial version of the HPP method presented in [1], although that method uses a different relative error criterion.
- (iii)
At each iteration , condition (3) is a relative error criterion for the inexact solution of the proximal subproblem . If , then this equation must be solved exactly and the pair may be written . Here, we are primarily concerned with situations in which the calculation of is relatively difficult and must be approached with an iterative algorithm. In such cases, we use the condition (3) as an acceptance criterion to truncate such an iterative calculation, possibly saving computational effort. We do not specify the exact form of the iterative algorithm used to produce a pair satisfying (3), as it depends on the class of problems to which the algorithm is being applied (and thus the structure of the operator ). See [30, 31] for a related discussion; an abstract formalism of the class of algorithm needed to find a solution to (3) is the “-procedure” described in [18] and also used in Section 3 below.
- (iv)
The point in (4) may be viewed as , where denotes orthogonal projection onto the hyperplane
[TABLE]
which strictly separates from the solution set of (1). This kind of projective approach to approximate proximal point algorithms was pioneered in [30].
- (v)
Algorithm 1 is an inexact variant of the proximal point algorithm (PPA) [28]. In particular, each of its iterations performs an approximate resolvent calculation subject to a relative error criterion, and then executes a projection operation in the manner introduced in [30]; see [29, 31] for related work. The main difference from [30] is the inertial step (2).
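The remarks above can be illustrated with a small sketch. It assumes that (2) is the extrapolation $w = z_k + \alpha (z_k - z_{k-1})$, that (3) is the test $\|\lambda v + \tilde z - w\|^2 \le \sigma^2 (\|\tilde z - w\|^2 + \|\lambda v\|^2)$ with $v \in T(\tilde z)$, and that (4) is the relaxed hyperplane projection $z_{k+1} = w - \rho \, \langle w - \tilde z, v \rangle v / \|v\|^2$. These forms, the toy linear operator, the Richardson inner solver, and the parameter values are assumptions made for illustration; the code is not the paper's statement of Algorithm 1, and the parameters are not checked against (20) and (21).

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def T(z):
    # Illustrative monotone operator T(z) = M z with M = [[1, 1], [-1, 1]];
    # its only zero is the origin.
    return [z[0] + z[1], -z[0] + z[1]]


def rel_err_ok(w, zt, v, lam, sigma):
    # Assumed form of the relative-error test (3).
    r = [lam * vi + zi - wi for vi, zi, wi in zip(v, zt, w)]
    d = [zi - wi for zi, wi in zip(zt, w)]
    return dot(r, r) <= sigma ** 2 * (dot(d, d) + lam ** 2 * dot(v, v))


def inexact_prox(w, lam, sigma, tau=0.3, max_inner=100):
    # Richardson iterations on x + lam * T(x) = w, truncated as soon as
    # the acceptance test holds (hypothetical inner solver).
    x = list(w)
    for _ in range(max_inner):
        v = T(x)
        if rel_err_ok(w, x, v, lam, sigma):
            break
        x = [xi - tau * (lam * vi + xi - wi) for xi, vi, wi in zip(x, v, w)]
    return x, T(x)


def inertial_relaxed_hpp(z0, alpha=0.1, rho=1.2, lam=1.0, sigma=0.5,
                         tol=1e-12, max_iter=1000):
    z_prev, z = list(z0), list(z0)
    for _ in range(max_iter):
        # Inertial extrapolation, assumed form of (2).
        w = [zi + alpha * (zi - pi) for zi, pi in zip(z, z_prev)]
        zt, v = inexact_prox(w, lam, sigma)
        vv = dot(v, v)
        if math.sqrt(vv) <= tol:
            return zt  # v is numerically zero, so zt approximately solves (1)
        # Relaxed projection onto the separating hyperplane, assumed form of (4).
        step = rho * dot([wi - ti for wi, ti in zip(w, zt)], v) / vv
        z_prev, z = z, [wi - step * vi for wi, vi in zip(w, v)]
    return z


z = inertial_relaxed_hpp([1.0, -2.0])
print(z)
```

The inner loop stops at the first iterate that passes the acceptance test, so the proximal subproblem is never solved to high accuracy; the projection step is what compensates for the remaining error.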
If in Algorithm 1, then it follows from the inclusion in (3) that is a solution of (1), that is, , so we halt immediately with the solution . For the remainder of this section, we assume that and hence that Algorithm 1 generates an infinite sequence of iterates. The following well-known identity will be useful in the analysis of Algorithm 1:
[TABLE]
Lemma 2.1 ([31, Lemma 2]).
For each , condition (3) implies that
[TABLE]
An immediate implication of Lemma 2.1 is that if and only if .
The proof of the following proposition can be found, using different notation, in [31]. For the convenience of the reader, we also present it here.
Proposition 2.2.
Let , and be generated by Algorithm 1 and define, for all ,
[TABLE]
Then, for any ,
[TABLE]
Proof.
We start by defining as the orthogonal projection of onto the hyperplane , i.e.,
[TABLE]
Next we show that the hyperplane strictly separates the current point from the solution set , that is,
[TABLE]
To this end, , and the monotonicity of yield , which is equivalent to the second inequality in (11). On the other hand, note that from (3) and the Young inequality we have
[TABLE]
which in turn yields
[TABLE]
One consequence of (12) is the first inequality in (11), so (11) must hold.
From (10) and (11), we may infer that is the projection onto the halfspace , which is a convex set containing . The well-known firm nonexpansiveness properties of the projection operation then imply that
[TABLE]
Algebraic manipulation of (4) and (10) yields . Combining this equation with (6) with gives
[TABLE]
which after some rearrangement yields
[TABLE]
Using (13) in the first term on the right-hand side of this identity produces
[TABLE]
To finish the proof, we observe that (14) and (4) yield
[TABLE]
Combining this inequality with (15), (8) and the bounds results in (9). ∎
The inequality (17) presented in the following proposition plays a role in the convergence analysis of inertial proximal algorithms — see e.g. [2] — similar to that played by Fejér monotonicity in the analysis of standard proximal algorithms.
Proposition 2.3.
Let , and be generated by Algorithm 1 and let be as in (8). Further let and define
[TABLE]
Then, and
[TABLE]
that is, the sequences , , and satisfy the assumptions of Lemma A.5 below.
Proof.
From (2) we obtain , which in conjunction with (6) and some algebraic manipulation yields
[TABLE]
Using the above identity and (16) we obtain, for all , that
[TABLE]
From (9) in Proposition 2.2 and the definition of in (16), the above inequality yields (17). Finally, follows from the initialization and the first definition in (16). ∎
The following theorem presents our first result on the asymptotic convergence of Algorithm 1 under the summability assumption (18). Next, Theorem 2.5 gives sufficient conditions (20) and (21) on the inertial and relaxation parameters to assure that (18) is satisfied.
Theorem 2.4 (Convergence of Algorithm 1).
Let , , and be generated by Algorithm 1. If and
[TABLE]
then converges to a solution of the monotone inclusion problem (1). Moreover, converges to the same solution and converges to zero.
Proof.
Define as in (8). Using Proposition 2.3, (18), that for all , and Lemma A.5, it follows that (i) exist for every and . So, in particular, is bounded and (ii) . From the form of (8), that , and the assumption that , and Lemma 2.1, we conclude that
[TABLE]
Now let be any cluster point of the bounded sequence . By (19), this point is also a cluster point of and . Let be an increasing sequence of indices such that . We then have
[TABLE]
which by the standard closure property of maximal monotone operators yields . Hence, the desired result on follows from (i) and Opial’s lemma (stated below as Lemma A.4). On the other hand, the convergence of and (19) yields the remaining results regarding and . ∎
Theorem 2.5 (Convergence of Algorithm 1).
Let , and be generated by Algorithm 1. Assume that , and satisfy the following (for some ):
[TABLE]
and
[TABLE]
Then,
[TABLE]
As a consequence, it follows that under the assumptions (20) and (21) the sequence generated by Algorithm 1 converges to a solution of the monotone inclusion problem (1) whenever . Moreover, under the above assumptions, converges to the same solution and converges to zero.
Proof.
Using (2), the Cauchy-Schwarz inequality and the Young inequality with and we find
[TABLE]
Starting with a rearrangement of (17), we then obtain
[TABLE]
where
[TABLE]
Some elementary algebraic manipulations of (24) then yield
[TABLE]
Define now the scalar function:
[TABLE]
and
[TABLE]
where is as in (16). Using (26)-(28) and the assumption that is nondecreasing — see (20) — we obtain, for all ,
[TABLE]
We will now show that admits a uniform positive lower bound. To this end, note first that from (21) and Lemma A.2 below we have
[TABLE]
Using the latter identity, (27), and Lemma A.3 below with , , and , we conclude that is decreasing in and is a root of . Thus, in view of (20), we conclude that
[TABLE]
which gives the desired uniform positive lower bound on .
[TABLE]
which, in turn, combined with (20) and the definition of in (28), gives
[TABLE]
Note now that (31), (20) and (28) also yield
[TABLE]
and so,
[TABLE]
Hence, (22) follows directly from (2) and (33). On the other hand, the second statement of the theorem follows from (22) and Theorem 2.4 (recall that for all ). ∎
We close this section with a few further remarks about the analysis of Algorithm 1:
- (i)
Conditions (20) and (21) on , and guarantee that the summability condition (18) is satisfied, thus guaranteeing the convergence of Algorithm 1. Similar conditions were also recently proposed and studied in [4, 6]. Since Algorithm 1 is the basis of the DR and ADMM methods developed in the next two sections, conditions (20) and (21) will also play an important role in their convergence analyses.
- (ii)
If we set in (20), then it follows immediately from (21) that . On the other hand, we have in (21) whenever (see also Figure 2). Setting in (20) corresponds to the standard strategy in the literature on inertial proximal algorithms; see e.g. [2, 12].
3 A partially inexact inertial-relaxed Douglas-Rachford (DR) algorithm
Consider the monotone inclusion problem of finding $z$ such that
$$0 \in A(z) + B(z), \tag{34}$$
where $A$ and $B$ are (set-valued) maximal monotone operators on $\mathbb{R}^n$ for which the solution set $(A+B)^{-1}(0)$ of (34) is nonempty.
A popular operator splitting algorithm for finding approximate solutions to (34) is the Douglas-Rachford (DR) algorithm [15, 24, 16]:
[TABLE]
where is a scaling parameter, is the current iterate and and are the resolvent operators of and , respectively. The DR algorithm (35) is a splitting algorithm for solving the (structured) inclusion (34) in the sense that the resolvents and are employed separately, but the resolvent of is not. Such methods may be useful in situations in which the values of and are relatively easy to evaluate in comparison to those of .
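To make the splitting structure concrete, here is a minimal sketch of the classical exact DR iteration on a toy scalar problem. The operators ($A(x) = x - 3$ and $B$ the subdifferential of $|x|$), the order in which the two resolvents are applied, and the function names are illustrative assumptions rather than anything taken from the paper; the sum $A + B$ has its unique zero at $x = 2$.

```python
def soft_threshold(u, t):
    # Resolvent of t * (subdifferential of |.|), i.e. the proximal map of t*|.|.
    sign = 1.0 if u >= 0 else -1.0
    return sign * max(abs(u) - t, 0.0)


def douglas_rachford(z, gamma=1.0, iters=60):
    # Each pass evaluates the two resolvents separately; the resolvent of
    # A + B itself is never needed.
    for _ in range(iters):
        x = soft_threshold(z, gamma)               # resolvent of gamma*B at z
        y = (2 * x - z + 3 * gamma) / (1 + gamma)  # resolvent of gamma*A at 2x - z
        z = z + y - x
    return soft_threshold(z, gamma), z


x_star, z_star = douglas_rachford(0.0)
print(x_star, z_star)
```

In this example the governing sequence `z` settles at 3 while the resolvent output `x` settles at the solution 2, which reflects the general fact that the DR fixed point is not itself the solution of the inclusion.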
This section will develop an inexact version of the DR algorithm (35) for the situation in which the resolvent of one of the operators, say , is relatively hard, but evaluating is a simple calculation. To this end, we consider the following equivalent formulation of (35) (see, e.g., [16]): given some ,
[TABLE]
In this case, . Since the resolvent of is assumed to be easily computable, the pair in (37) is explicitly given by
[TABLE]
For , we by contrast suppose that exact computation of the pair satisfying (36) requires a relatively time-consuming iterative process, which we model immediately below by the notion of a -procedure as introduced in [18]. We first remark that (36) can be posed in the more general framework of solving monotone inclusion problems of the form
[TABLE]
where and .
Definition 3.1 (–procedure for solving (38)).
A –procedure for (approximately) solving any instance of (38) is a mapping such that if one lets for all and any given and , then , for all , the sequence is convergent, and .
Following [18], the intuitive meaning of is that is the trial approximation generated by some iterative procedure for solving (38), starting from some initial guess . We refer the interested reader to [18, Section 5] for a more detailed discussion and interpretation of the -procedure concept.
We make the following standing assumption:
Assumption 1.
There exists a -procedure (according to Definition 3.1) for approximately solving any instance of (38).
We now combine the hypothesized -procedure with an acceptance criterion for the approximate solution of (36). We will follow the general approach of [18], which is to exploit the connection between the DR algorithm (36)-(37) and the proximal point algorithm as established in [16]. Specifically, the DR algorithm (36), (37) is a special instance of the PP algorithm in the sense that,
[TABLE]
where the “splitting” operator is defined as [16]
[TABLE]
The operator defined in (47) is maximal monotone and
[TABLE]
which, in particular, gives that any solution of the monotone inclusion problem (1) with , namely
[TABLE]
yields a solution of (34).
Here, we follow a similar derivation to [18], but apply Algorithm 1 of Section 2 to (49) in place of the HPE method of [29]. The result is an inertial-relaxed inexact relative-error DR algorithm for solving (34). We should emphasize that even when (there is no inertial step) and (no overrelaxation), the resulting algorithm differs from that of [18]. This difference arises because the underlying “convergence engine” of Algorithm 1 is a form of hybrid proximal-projection (HPP) algorithm, whereas [18] used an HPE algorithm in the equivalent role, using an extragradient step instead of projection.
The proposed algorithm for solving (34) is shown as Algorithm 2. We should mention that a different inexact DR splitting algorithm in which relative errors are allowed in both (40) and (42) was recently proposed and studied in [32], but without computational testing. The following proposition shows that Algorithm 2 is indeed a special instance of Algorithm 1 for solving (1) with .
Proposition 3.2.
Consider the sequences evolved by Algorithm 2 and for each let denote the value of for which (43) is satisfied. For each , define, with as in Algorithm 2,
[TABLE]
Then these latter sequences satisfy the conditions (2)-(4) of Algorithm 1 with and .
Proof.
Fix any . From (39) and the definitions of and in (50) we have
[TABLE]
which is exactly (2). Now note that the inclusion in (3) follows from the fact that , (47), (42), from (40), and the definitions of and in (50).
[TABLE]
which is exactly the inequality in (3) with . Finally,
[TABLE]
which establishes (4) and thus completes the proof of the proposition. ∎
The following theorem states the asymptotic convergence properties of Algorithm 2, which are essentially direct consequences of Proposition 3.2 and Theorem 2.5.
Theorem 3.3 (Convergence of Algorithm 2).
Consider the sequences evolved by Algorithm 2 with the parameters , and satisfying the conditions (20) and (21) of Theorem 2.5. Then
- (a)
If the outer loop (over ) executes an infinite number of times, with each inner loop (over ) terminating in a finite number of iterations , then and both converge to some solution of (34), and and both converge to some , with converging to .
- (b)
If the outer loop executes only a finite number of times, ending with , with the last invocation of the inner loop executing an infinite number of times, then and both converge to some solution of (34), and converges to some , with converging to .
- (c)
If Algorithm 2 stops with , then is a solution of (34).
Proof.
(a) For each , again let be the index of the inner iteration that first meets the inner-loop termination condition. Using Proposition 3.2, (44), the descriptions of Algorithms 1 and 2, and Theorem 2.5, we conclude that there exists such that and
[TABLE]
From and (48) we obtain that is a solution of (34). Moreover, it follows from (51), the inclusion in (40), (44), and the continuity of that
[TABLE]
We also have since, from (51), . Altogether, we have that is a solution of (34) and and both converge to . From (52) we now have
[TABLE]
From we then obtain . On the other hand, using the equation in (42), (44), (51) and (53) we find
[TABLE]
Using the above convergence result, that , the inclusion in (42), and Lemma A.1, we obtain that . Finally, .
(b) First note that using (41) we obtain , which in view of Definition 3.1 yields , for all , , , and , for some . Combining limits, we obtain that . From Lemma A.1, we also have . Now combining the limits with (42) and the continuity of , we also find
[TABLE]
and so
[TABLE]
From the inclusion in (42) and (again) Lemma A.1 we obtain that . On the other hand, using (43) and the hypothesis that the inner loop executes an infinite number of times at iteration , we obtain, for all , that
[TABLE]
Since the left-hand side of the above inequality converges to zero and the right-hand side is nonnegative, the right-hand side also converges to zero and in particular . Since and , we conclude that and, hence, from (54), that .
(c) If , then it follows from the inclusion in (40) and (42) that . ∎
4 A partially inexact relative-error inertial-relaxed ADMM
We now consider the convex optimization problem
$$\min_{x} \; f(x) + g(x), \tag{56}$$
where $f$ and $g$ are proper, convex and lower semicontinuous functions for which $(\partial f + \partial g)^{-1}(0) \neq \emptyset$.
The alternating direction method of multipliers (ADMM) [21, 23] is a first-order algorithm for solving (56) which has become popular over the last decade largely due to its wide range of applications in data science (see, e.g., [11]). As applied to (56), one iteration of the ADMM may be described as:
[TABLE]
In many applications, the function is such that (58) has a closed-form or otherwise straightforward solution (e.g., ). We consider situations in which this is the case, but solving (57) is more difficult and requires some form of iterative process. Eckstein and Yao [18, Section 6] proposed and studied the asymptotic convergence of an inexact version of the ADMM tailored to such situations: at each iteration, (57) may be approximately solved within a relative-error tolerance. This method is a special version of their inexact relative-error Douglas-Rachford (DR) algorithm mentioned in Section 3, as applied to the monotone inclusion problem
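For orientation, the following is a minimal sketch of the classical exact ADMM iteration on a toy scalar instance of the problem class, with $f(x) = \tfrac12 (x-3)^2$ and $g(z) = |z|$, whose minimizer is $x = 2$. The update formulas follow one common sign convention for the augmented Lagrangian (multiplier `p`, penalty `c`); that convention, the toy functions, and all names are assumptions for illustration and are not copied from (57)-(59).

```python
def soft_threshold(u, t):
    # Proximal map of t*|.|, which gives the g-update in closed form.
    sign = 1.0 if u >= 0 else -1.0
    return sign * max(abs(u) - t, 0.0)


def admm(iters=100, c=1.0):
    x = z = p = 0.0
    for _ in range(iters):
        # x-update: argmin_x 0.5*(x - 3)^2 + p*x + (c/2)*(x - z)^2.
        # In the setting of the text this is the step that may require an
        # inner iterative solver; here it happens to be available in closed form.
        x = (3.0 - p + c * z) / (1.0 + c)
        # z-update: argmin_z |z| - p*z + (c/2)*(x - z)^2.
        z = soft_threshold(x + p / c, 1.0 / c)
        # Multiplier update.
        p = p + c * (x - z)
    return x, z, p


x_admm, z_admm, p_admm = admm()
print(x_admm, z_admm, p_admm)
```

At convergence `x` and `z` agree, and the multiplier `p` equals the subgradient of $g$ that balances the gradient of $f$ at the solution.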
[TABLE]
which is, in particular, a special case of (34) with and . Problem (60) is, under standard qualification conditions, equivalent to (56). Recall that we are assuming , i.e., that (60) admits at least one solution.
In this section, we propose and study the asymptotic behaviour of a (partially) inexact relative-error inertial-relaxed ADMM algorithm for solving (56). The proposed method, namely Algorithm 3, is a special version of Algorithm 2 when applied to solving (60) and may be viewed as an alternative to the Eckstein-Yao approximate ADMM [18] that incorporates inertial and relaxation effects to accelerate convergence.
To formalize the inexact solution process for the subproblems (57), we introduce the notion of an -procedure [18]. First, we note that any instance of (57) can be posed slightly more abstractly as
[TABLE]
where and .
Definition 4.1 (-procedure for solving (61)).
A –procedure for (approximately) solving any instance of (61) is a mapping such that if one lets for all and any given and , then
[TABLE]
Quoting [18, Assumption 2], “the idea behind this definition is that is the iterate produced by the -subproblem solution procedure with penalty parameter , the Lagrange multiplier estimate equal to , and , starting from the solution estimate ”. For the remainder of this section, we assume the following.
Assumption 2.
There exists a –procedure (according to Definition 4.1) for approximately solving any instance of (61).
The next lemma shows that the -procedure is essentially a form of –procedure (see Definition 3.1). Although the proof essentially duplicates analysis in [17, 18], it is not presented as a separate result there. Therefore we include the proof in the interest of rigor and completeness.
Lemma 4.2.
Let be a –procedure for solving (61), where , for , and define by
[TABLE]
Then, is a –procedure (see Definition 3.1) for approximately solving (38) in which , , , and .
Proof.
Assume that for some , and all . In view of (63) and the fact that we have
[TABLE]
and so, for all ,
[TABLE]
Using the latter identity and the fact that is a –procedure (see Definition 4.1) we obtain
[TABLE]
which, in particular, after some computations, yields , i.e., for all . Using this fact and the definition of we find , which in turn combined with the fact that and the continuity of implies that . On the other hand, using the definition of (again) we also obtain , which gives that is convergent and . Altogether, we proved that , for all , that the sequence is convergent and , which finishes the proof. ∎
Our inertial-relaxed inexact ADMM for solving (56) is presented as Algorithm 3. Before establishing its convergence, we make the following remarks regarding this algorithm:
- (i)
Similarly to Algorithm 2, Algorithm 3 benefits from inertial and relaxation effects — see (64) and (72) — as well as from the relative error criterion (69) allowing inexact solution of the -subproblem (65).
- (ii)
Algorithm 3 can be viewed as an inertial-relaxed version of Algorithm 4 in [18], but we emphasize that even without inertia or relaxation (that is, when and ) it differs from the latter algorithm since Algorithm 4 is based on an approximate proximal point algorithm using an extragradient “corrector” step, while Algorithm 3 is instead based indirectly on Algorithm 1, an approximate proximal point method using projective corrector steps. In developing Algorithm 3, we also experimented with using extragradient correction, but obtained better numerical performance from projective correction.
- (iii)
The derivation of Algorithm 3 mirrors that in [18], except that the underlying convergence “engine” from [29] is replaced by Algorithm 1. It should be noted that [17] provides a different way of deriving approximate ADMM algorithms. This approach results in different approximate forms of the ADMM, allowing for both relative and absolute error criteria, both of a practically verifiable form. It is also possible that the work in [32] could lead to still more approximate forms of the ADMM.
Proposition 4.3**.**
For any given execution of Algorithm 3, define
[TABLE]
for all applicable and . Then these sequences conform to the recursions (39)-(46) in Algorithm 2 with , the -procedure (63), and the maximal monotone operators and .
Proof.
In view of (73) and (64) we have
[TABLE]
which is identical to (39) in Algorithm 2. Fix . Then (66), Definition 4.1, (73) lead to
[TABLE]
Combining (74), (67), (66), (75), (73), and (63), we deduce that
[TABLE]
which yields (40) and (41). Note now that (68) is equivalent to the condition , which, in view of (74), is clearly equivalent to (42) with . To prove (43), note that from (73), (74), (67) and (69) we obtain
[TABLE]
which in view of (73) and (74) is equivalent to (43). Finally, similar reasoning establishes that (44)-(46) are equivalent to (70)-(72). ∎
Theorem 4.4** (Convergence of Algorithm 3).**
Consider any execution of Algorithm 3 for which , , and satisfy conditions (20) and (21) of Theorem 2.5. Then:
- (a)
If for each the outer loop (over ) executes an infinite number of times, with each inner loop (over ) terminating in a finite number of iterations , then and both converge to some solution of (60), and converges to some such that .
- (b)
If the outer loop executes only a finite number of times, ending with , with the last invocation of the inner loop executing an infinite number of times, then and both converge to some solution of (60), and converges to some such that .
- (c)
If Algorithm 3 stops with either or then is a solution of (60).
Proof.
The result follows immediately by combining Proposition 4.3, Theorem 3.3, and the definitions of Algorithms 2 and 3. ∎
5 Numerical experiments
This section describes numerical experiments on the LASSO and logistic regression problems, which are both instances of the minimization problem (56). We tested the following algorithms: the inexact relative-error ADMM admm_primDR from [18]; the relative-error method relerr from [17]; Algorithm 3 from this paper, which we denote as admm_primDR_relx_in; the absolute-error approximate ADMM absgeom discussed in [18]; and a backtracking variant of FISTA [10] (also discussed in [18]). We implemented all algorithms in MATLAB and, analogously to [18], used the following condition to terminate the outer loop:
[TABLE]
where , and is a tolerance parameter set to .
Moreover, in our implementation of Algorithm 3 from this paper, we replaced the error condition (69) with the stronger condition
[TABLE]
which we empirically found to yield better numerical performance.
5.1 Numerical experiments on the LASSO problem
In this subsection, we report numerical experiments on the LASSO problem [33]
[TABLE]
where , and , which is an instance of (56) with and . For the data and , we used the same (non-artificial) datasets as in [18].
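As a concrete reading of this splitting, the sketch below evaluates a LASSO objective and the proximal operator of the scaled ℓ1 norm (componentwise soft-thresholding), which is what an ADMM step on the nonsmooth term reduces to. It assumes the standard LASSO form (1/2)‖Ax − b‖² + ν‖x‖₁, with the smooth least-squares term and the ℓ1 term playing the roles of the two functions in (56). The function names and the tiny data are hypothetical and are not taken from the paper's MATLAB code.

```python
def soft_threshold(v, t):
    """Proximal operator of t*||.||_1, applied componentwise."""
    out = []
    for vi in v:
        if vi > t:
            out.append(vi - t)
        elif vi < -t:
            out.append(vi + t)
        else:
            out.append(0.0)
    return out


def lasso_objective(A, b, x, nu):
    """0.5*||A x - b||^2 + nu*||x||_1 for A given as a list of rows."""
    residual = [sum(aij * xj for aij, xj in zip(row, x)) - bi
                for row, bi in zip(A, b)]
    return 0.5 * sum(r * r for r in residual) + nu * sum(abs(xj) for xj in x)


# Hypothetical toy data, for illustration only.
A = [[1.0, 0.0], [0.0, 1.0]]
b = [1.0, 2.0]
print(soft_threshold([3.0, -0.5, -2.0], 1.0))
print(lasso_objective(A, b, [1.0, 0.0], nu=0.5))
```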
We tested three algorithms for solving (78):
- •
The inexact relative-error ADMM admm_primDR from [18]. For this algorithm, we used the same parameter values as in [18], namely and (except for the PEMS problem instance, for which ).
- •
The relative-error algorithm relerr from [17]. We also used , (for all problem instances except PEMS, for which we used ). For this set of LASSO problems, the experiments in [17, 18] already show that admm_primDR outperforms the algorithms of [17], as well as FISTA [10].
- •
Algorithm 3 from this paper, which we denote as admm_primDR_relx_in. We used the parameter settings , and (see conditions (20) and (21) and Figure 2). We also set and (except for the PEMS problem instance, for which ).
We implemented all of the algorithms in MATLAB, using a conjugate gradient procedure to approximately solve the subproblems corresponding to , exactly as in [18]. Table 1 shows the number of outer iterations, Table 2 shows the total number of inner (conjugate gradient) iterations, and Table 3 shows runtimes in seconds. Figure 3 shows the same results graphically. In each table, the smallest value in each row appears in bold. In terms of runtime, the new algorithm outperforms that of [18] for all problems except the finance1000 instance.
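The inner conjugate gradient solves can be pictured with the following sketch, which applies CG to a linear system of the form (AᵀA + cI)x = r and stops as soon as a residual test is met, returning the iteration count that a tally of inner iterations would accumulate. This is an illustration under stated assumptions: the system form, the simple relative-residual stopping rule (standing in for a relative-error criterion such as (69)), and all names are hypothetical. The paper's actual implementation is in MATLAB and follows [18].

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def normal_matvec(A, c):
    """Return x -> A^T (A x) + c x without forming A^T A explicitly."""
    def matvec(x):
        Ax = [dot(row, x) for row in A]
        AtAx = [dot([row[j] for row in A], Ax) for j in range(len(x))]
        return [s + c * xj for s, xj in zip(AtAx, x)]
    return matvec


def conjugate_gradient(matvec, rhs, x0, tol=1e-8, max_iter=100):
    """Inexact CG solve; returns the iterate and the number of CG steps used."""
    x = list(x0)
    r = [bi - qi for bi, qi in zip(rhs, matvec(x))]   # initial residual
    p = list(r)
    rs = dot(r, r)
    threshold = tol * max(dot(rhs, rhs) ** 0.5, 1e-300)
    iters = 0
    while iters < max_iter and rs ** 0.5 > threshold:
        q = matvec(p)
        step = rs / dot(p, q)
        x = [xi + step * pi for xi, pi in zip(x, p)]
        r = [ri - step * qi for ri, qi in zip(r, q)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
        iters += 1
    return x, iters


# Hypothetical toy system whose exact solution is (1, -1).
A = [[1.0, 2.0], [3.0, 4.0]]
matvec = normal_matvec(A, c=1.0)
rhs = matvec([1.0, -1.0])
x, inner_iters = conjugate_gradient(matvec, rhs, x0=[0.0, 0.0])
print(x, inner_iters)
```

In an ADMM loop, `x0` would typically be the previous outer iterate (a warm start), so a looser acceptance test translates directly into fewer inner iterations per outer iteration.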
5.2 Numerical experiments on logistic regression problems
This section describes numerical experiments on the –regularized logistic regression problem [20, 26]
[TABLE]
using a training dataset consisting of pairs , where is a feature vector, is the corresponding label, represents a weighting of the feature and represents a kind of bias. Problem (79) is clearly a special instance of (56) with and
[TABLE]
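For readers who want a concrete objective to keep in mind, the sketch below evaluates a regularized logistic regression objective in a numerically stable way. It assumes the common form in which labels lie in {−1, +1}, the loss is the sum of log(1 + exp(−b(aᵀw + v))) over the training pairs, and only the weight vector (not the bias) carries an ℓ1 penalty with weight `nu`. These modeling details and all names are assumptions for illustration, not a transcription of (79) and (80).

```python
import math


def log1p_exp(t):
    """Numerically stable evaluation of log(1 + exp(t))."""
    return max(t, 0.0) + math.log1p(math.exp(-abs(t)))


def logistic_objective(features, labels, w, v, nu):
    """Sum of logistic losses plus nu * ||w||_1 (assumed problem form)."""
    loss = 0.0
    for a, b in zip(features, labels):
        margin = b * (sum(aj * wj for aj, wj in zip(a, w)) + v)
        loss += log1p_exp(-margin)
    return loss + nu * sum(abs(wj) for wj in w)


# Hypothetical toy data: two training pairs with two features each.
features = [[1.0, 2.0], [-1.0, 0.5]]
labels = [1, -1]
print(logistic_objective(features, labels, w=[0.0, 0.0], v=0.0, nu=0.1))
```

The `log1p_exp` helper avoids overflow for badly misclassified points, which matters when an inner solver such as L-BFGS evaluates the smooth term at trial points far from the solution.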
We considered four standard, non-artificial cancer DNA microarray datasets from [14] (also used in [18, Subsection 7.2]) and tested five algorithms: absgeom, relerr, admm_primDR, FISTA and admm_primDR_relx_in. For the relerr and admm_primDR algorithms we used the same parameter values as in Subsection 5.1; for admm_primDR_relx_in we used the parameter settings , and (see conditions (20) and (21) and Figure 2). We also set and .
Analogously to [18], we used an L-BFGS procedure to approximately solve the subproblems corresponding to from (80). Tables 4, 5 and 6 show outer iterations, total inner iterations and runtimes, respectively. These results are also graphically summarized in Figure 4. The new algorithm has the best aggregate performance by all measures, and the best runtime for all the datasets.
Appendix A Auxiliary results
Lemma A.1** (See for example Proposition 20.33 of [9]).**
If is maximal monotone on , is such that for all , , and , then .
Lemma A.2**.**
The inverse function of the scalar map
[TABLE]
is
[TABLE]
Proof.
We first claim that for all and for all . To establish this claim, we first note that by elementary calculus and some simplifications, we have
[TABLE]
The discriminant of is negative, so it has no real roots and the denominator of (81) is always positive. The expression in the numerator is convex, and applying the quadratic formula yields that its roots are and , so it is nonpositive on and negative on . Therefore, exists for all and is negative for all , implying that is a decreasing function on . By direct calculation, and , so $\{\psi(\beta)\mid\beta\in[0,1]\}=[0,2]$ and $\{\psi(\beta)\mid\beta\in(0,1)\}=(0,2)$, establishing the initial claim. To continue the proof, we next establish that
[TABLE]
To this end, fix any and define
[TABLE]
which implies the quadratic equation
[TABLE]
We now consider three cases in (83): , , and .
:
in this case, simplification of (83) and the definition of yield that .
:
the unique minimizer of the quadratic function in (83) is $\beta^{*}:=(4-\rho)/\big(4(1-\rho)\big)$, which must be greater than because . Thus, we have , so is the smaller root of the quadratic equation in (83). Using the quadratic formula and rationalizing the denominator,
[TABLE]
:
in this case, as defined in the previous case is the unique maximizer of the quadratic function in (83) and . So and is the larger root of the quadratic in (83). Since the coefficient of the quadratic term is negative in this case, this root also takes the form (84), and consequently (85) still holds.
The proof of (82) is now complete. Finally, we now prove that
[TABLE]
To this end, let and define
[TABLE]
Using the above definition and the quadratic formula, we conclude that also satisfies the quadratic equation (83), which after some simple algebra gives
[TABLE]
that is, , which in turn is equivalent to (86). ∎
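The inverse relationship in Lemma A.2 can be checked numerically. The sketch below assumes that the scalar map has the form ψ(β) = 2(β−1)² / (2(β−1)² + 3β − 1) and that its inverse is 2(2−ρ) / (4 − ρ + √(16ρ − 7ρ²)). Both formulas are assumptions made for this illustration, since the displayed equations are not reproduced in the text here. They are, however, consistent with the properties used in the proof: the endpoint values 2 and 0 on [0,1], a denominator with negative discriminant, and a quadratic in β whose vertex is (4−ρ)/(4(1−ρ)).

```python
import math


def psi(beta):
    """Assumed scalar map: 2(beta-1)^2 / (2(beta-1)^2 + 3*beta - 1)."""
    num = 2.0 * (beta - 1.0) ** 2
    return num / (num + 3.0 * beta - 1.0)


def psi_inverse(rho):
    """Assumed closed-form inverse on [0, 2], in rationalized form."""
    return 2.0 * (2.0 - rho) / (4.0 - rho + math.sqrt(16.0 * rho - 7.0 * rho ** 2))


# Largest round-trip error of psi_inverse(psi(beta)) over a grid on [0, 1].
worst = max(abs(psi_inverse(psi(k / 100.0)) - k / 100.0) for k in range(101))
print(worst)
```

The rationalized form is convenient because it stays well defined at ρ = 1, where the unrationalized quadratic-formula expression would divide by zero; this mirrors the three-case split in the proof.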
Lemma A.3**.**
Let be a real function and assume that and . Define
[TABLE]
- (i)
If , then is a decreasing affine function and as in (87) is its unique root (see Figure 5(a)).
- (ii)
If (resp. ), then is a convex (resp. concave) quadratic function and as in (87) is its smaller (resp. larger) root (see Figure 5(b) and Figure 5(c), resp.).
In both cases (i) and (ii), as in (87) is a root of , and is decreasing in the interval (see Figure 5).
Proof.
The proof of (i) is straightforward. To prove (ii), note that rationalizing the denominator of (87) results in , which in turn implies that (ii) follows from the quadratic formula and the assumption that . The last statement of the lemma is a direct consequence of (i), (ii) and the assumption that . ∎
Lemma A.4** (Opial [27]).**
Let and be a sequence in such that every cluster point of belongs to and exists for every . Then converges to a point in .
The following lemma was essentially proved by Alvarez and Attouch in [2, Theorem 2.1].
Lemma A.5**.**
Let the sequences , , and in and be such that , and
[TABLE]
The following hold:
- (a)
For all ,
[TABLE]
- (b)
If , then exists, i.e., the sequence converges to some element in .
Proof.
It was proved in [2, Theorem 2.1] that , where . Using this, the assumptions , and (88), and some algebraic manipulations, we find, for all ,
[TABLE]
which proves (a). To finish the proof, we note that (b) was established within the proof of [2, Theorem 2.1]. ∎
References
- [1] F. Alvarez. Weak convergence of a relaxed and inertial hybrid projection-proximal point algorithm for maximal monotone operators in Hilbert space. SIAM J. Optim., 14(3):773–782, 2003.
- [2] F. Alvarez and H. Attouch. An inertial proximal method for maximal monotone operators via discretization of a nonlinear oscillator with damping. Set-Valued Anal., 9(1-2):3–11, 2001.
- [3] M. Marques Alves and M. Geremia. Iteration complexity of an inexact Douglas-Rachford method and of a Douglas-Rachford-Tseng’s F-B four-operator splitting method for solving monotone inclusions. Numerical Algorithms, to appear, 2019.
- [4] M. Marques Alves and R.T. Marcavillaca. On inexact relative-error hybrid proximal extragradient, forward-backward and Tseng’s modified forward-backward methods with inertial effects. Set-Valued and Variational Analysis, to appear, 2019.
- [5] H. Attouch and A. Cabot. Convergence of a relaxed inertial forward-backward algorithm for structured monotone inclusions. Preprint hal-01708216, HAL Open Archive, 2018.
- [6] H. Attouch and A. Cabot. Convergence of a relaxed inertial proximal algorithm for maximally monotone operators. Preprint hal-01708905, HAL Open Archive, 2018.
- [7] H. Attouch, Z. Chbani, J. Peypouquet, and P. Redont. Fast convergence of inertial dynamics and algorithms with asymptotic vanishing viscosity. Math. Program., 168(1-2, Ser. B):123–175, 2018.
- [8] H. Attouch, J. Peypouquet, and P. Redont. Fast convex optimization via inertial dynamics with Hessian driven damping. J. Differential Equations, 261(10):5734–5783, 2016.
