Proximal extrapolated gradient methods with prediction and correction for monotone variational inequalities
Xiaokai Chang, Sanyang Liu, Jianchao Bai, Jun Yang

TL;DR
This paper introduces a proximal extrapolated gradient method with prediction and correction for monotone variational inequalities, enabling larger step sizes and improved numerical efficiency through theoretical convergence guarantees and practical experiments.
Contribution
It extends proximal gradient methods by allowing larger step sizes via prediction and correction, with proven convergence and enhanced numerical performance.
Findings
The method converges under a very weak condition.
Larger step sizes improve numerical efficiency.
Numerical experiments confirm theoretical advantages.
Table 1. Results for Problem 1.
| TFBF Iter | TFBF prox | TFBF F | TFBF Time | PEG Iter | PEG F | PEG Time | MPG Iter | MPG Time | IPEG() Iter | IPEG() Time | IPEG() Iter | IPEG() Time |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 141 | 294 | 435 | 0.05 | 73 | 143 | 0.02 | 243 | 0.03 | 62 | 0.01 | 48 | 0.01 |
| 163 | 341 | 504 | 0.1 | 76 | 149 | 0.04 | 262 | 0.03 | 66 | 0.02 | 50 | 0.01 |
| 174 | 365 | 539 | 2.21 | 80 | 157 | 0.76 | 284 | 1.23 | 70 | 0.32 | 53 | 0.31 |
| 139 | 292 | 431 | 0.05 | 78 | 154 | 0.02 | 229 | 0.03 | 77 | 0.01 | 63 | 0.01 |
| 145 | 305 | 450 | 0.31 | 83 | 164 | 0.09 | 249 | 0.24 | 83 | 0.08 | 67 | 0.07 |
| 170 | 359 | 529 | 4.89 | 88 | 174 | 1.44 | 270 | 3.41 | 88 | 1.13 | 71 | 1.01 |
Table 2. Results for Problem 2.
| TFBF Iter | TFBF prox | TFBF F | TFBF Time | PEG Iter | PEG F | PEG Time | IPEG() Iter | IPEG() Time | IPEG() Iter | IPEG() Time |
|---|---|---|---|---|---|---|---|---|---|---|
| 81 | 173 | 254 | 0.02 | 82 | 164 | 0.1 | 72 | 0.01 | 58 | 0.01 |
| 84 | 177 | 261 | 0.02 | 79 | 156 | 0.1 | 70 | 0.01 | 56 | 0.01 |
| 88 | 186 | 274 | 0.02 | 85 | 169 | 0.1 | 75 | 0.01 | 59 | 0.01 |
Table 3. Results for Problem 3.
|  |  | TFBF Iter | TFBF prox | TFBF F | TFBF Time | PEG Iter | PEG Time | IPEG() Iter | IPEG() Time | IPEG() Iter | IPEG() Time |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 500 | 1066 | 2279 | 3345 | 0.25 | 1185 | 0.17 | 1185 | 0.13 | 972 | 0.09 |
| 1 | 1000 | 1155 | 2469 | 3624 | 1.75 | 1323 | 0.93 | 1268 | 0.42 | 1033 | 0.39 |
| 1 | 5000 | 1389 | 2969 | 4358 | 56.87 | 1575 | 27.89 | 1630 | 26.43 | 1326 | 22.86 |
| 2 | 500 | 1270 | 2715 | 3985 | 0.29 | 1447 | 0.19 | 1480 | 0.14 | 1165 | 0.12 |
| 2 | 1000 | 1134 | 2424 | 3558 | 1.56 | 1274 | 0.86 | 1262 | 0.41 | 1028 | 0.40 |
| 2 | 5000 | 1365 | 2918 | 4283 | 55.74 | 1554 | 33.91 | 1603 | 29.94 | 1303 | 25.64 |
Table 4. Results for Problem 4.
| data | TFBF Iter | TFBF prox | TFBF F | TFBF Time | PEG Iter | PEG F | PEG Time | IPEG() Iter | IPEG() Time | IPEG() Iter | IPEG() Time |
|---|---|---|---|---|---|---|---|---|---|---|---|
| w7a | 971 | 1950 | 2867 | 4.1 | 968 | 1933 | 2.9 | 827 | 1.6 | 716 | 1.4 |
| a9a | 6758 | 14439 | 21197 | 27.8 | 4241 | 8601 | 12.2 | 3498 | 6.1 | 2844 | 5.0 |
| real-sim | 3984 | 8510 | 12494 | 153.8 | 2651 | 5312 | 70.9 | 2230 | 35.1 | 1796 | 32.8 |
✉ Xiaokai Chang: [email protected]
✉ Jianchao Bai: [email protected]
1 School of Science, Lanzhou University of Technology, Lanzhou, P. R. China.
2 School of Mathematics and Statistics, Xidian University, Xi’an, P. R. China.
3 Department of Applied Mathematics, Northwestern Polytechnical University, Xi’an, P. R. China.
4 School of Mathematics and Information Science, Xianyang Normal University, Xianyang, P. R. China.
Xiaokai Chang1,2
Sanyang Liu1
Jianchao Bai3
Jun Yang4
Abstract
An efficient proximal-gradient-based method, called the proximal extrapolated gradient method, is designed for solving monotone variational inequalities in Hilbert space. The proposed method extends the acceptable range of parameters to obtain larger step sizes. The step size is predicted from local information of the operator and corrected by a linesearch procedure to satisfy a very weak condition, which is even weaker than boundedness of the generated sequence and always holds when the operator is the gradient of a convex function. We establish convergence and an ergodic convergence rate under the larger range of parameters. Furthermore, we improve numerical efficiency by employing the proposed method with a nonmonotonic step size, and obtain an upper bound on the parameter related to the step size from an extremely simple example. Numerical experiments illustrate the efficiency gained from the larger step size.
Keywords:
Variational inequalities · proximal gradient method · convex optimization · nonmonotonic step size
MSC:
47J20 65C10 65C15 90C33
Journal: COAP
1 Introduction
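Let $H$ be a real Hilbert space equipped with inner product $\langle\cdot,\cdot\rangle$ and its induced norm $\|\cdot\|$. We consider the variational inequality problem: find $x^*\in H$ such that
$$\langle F(x^*), x-x^*\rangle + g(x) - g(x^*) \ge 0, \quad \forall x\in H, \tag{1}$$
where $F: H\to H$ is an operator and $g: H\to(-\infty,+\infty]$ is a proper lower semicontinuous convex function. We use $\operatorname{dom} g$ to represent the domain of $g$, defined by $\operatorname{dom} g := \{x\in H : g(x) < +\infty\}$. For a continuously differentiable convex function $f$ with gradient $\nabla f = F$, problem (1) is equivalent to
$$\min_{x\in H}\; f(x) + g(x). \tag{2}$$
Let $C$ be a closed and convex subset of $H$. Let $\delta_C$ be the indicator function of the set $C$, that is, $\delta_C(x) = 0$ if $x\in C$ and $+\infty$ otherwise. When $g = \delta_C$, variational inequality (1) reduces to: find $x^*\in C$ such that
$$\langle F(x^*), x-x^*\rangle \ge 0, \quad \forall x\in C. \tag{3}$$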
Problem (1) and its special cases (2) and (3) have wide applications in disciplines including mechanics, signal and image processing, and economics app1 ; app2 ; app3 ; app4 ; statistical_learning ; 11. , to cite a few. Throughout the paper, the solution set of problem (1) is assumed to be nonempty, and the following assumptions hold:
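(A1) $F$ is monotone, i.e., $\langle F(x)-F(y), x-y\rangle \ge 0$ for all $x, y$;
(A2) $F$ is $L$-Lipschitz continuous ($L>0$), that is,
$$\|F(x)-F(y)\| \le L\,\|x-y\| \quad \text{for all } x, y;$$
(A3) $g$ restricted to $\operatorname{dom} g$ is a continuous function.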
Many efficient methods have been proposed for solving problem (1) and its special cases, for instance, the alternating direction method of multipliers (ADMM) statistical_learning ; PC-ADMM ; He_ADMM-based ; ADMM , the extragradient method 6. ; extragradient ; extragradient-type ; 13. , the proximal (projected) gradient method 8. ; FBS ; FBS_P ; FBS2015 ; modified-FB ; New_properties and its accelerated version FISTA-CD ; Nesterov1983 . Here, we concentrate on the simplest of these approaches: the forward-backward splitting (FBS) method. Under the assumption that $F$ is $L$-Lipschitz continuous, the iterative scheme of the classical FBS method for problem (1) reads
[TABLE]
where is some positive number and can be viewed as a step size of the forward step, and the proximal operator is defined in Section 2.
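As a concrete illustration of the forward-backward step, the following minimal Python sketch assumes the standard FBS update, namely a forward step along $-\lambda F(x_k)$ followed by the proximal operator of $\lambda g$, with a fixed step size. The helper names `soft_threshold` and `forward_backward` and the toy data `c` and `gamma` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (componentwise shrinkage)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def forward_backward(F, prox, x0, lam, iters=500):
    """Repeat x <- prox(x - lam * F(x), lam) for a fixed number of iterations."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        x = prox(x - lam * F(x), lam)
    return x

# Toy instance (assumed for illustration): f(x) = 0.5 * ||x - c||^2 and
# g = gamma * ||x||_1, so F = grad f is 1-Lipschitz and the minimizer of
# f + g is soft_threshold(c, gamma).
c = np.array([3.0, -0.2, 0.5])
gamma = 0.4
x_fbs = forward_backward(lambda x: x - c,
                         lambda v, lam: soft_threshold(v, lam * gamma),
                         np.zeros(3), lam=0.5)
```

On this toy problem the iteration is a contraction, so `x_fbs` agrees with the closed-form minimizer `soft_threshold(c, gamma)` up to rounding.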
Establishing convergence of the iteration (4) often requires the restrictive assumptions that $F$ is $L$-Lipschitz continuous and strongly (or inverse strongly) monotone with . To overcome this drawback, Korpelevich extragradient and Antipin 13. proposed the following extragradient method for (3), which uses two projections per iteration:
[TABLE]
where denotes the (metric) projection onto , is any positive sequence verifying for some values . The extragradient method has received great attention and has been improved in various ways 5. ; M-extra ; Non-Lip ; Low-cost ; 9. , for example by adding linesearch procedures, avoiding the Lipschitz-continuity assumption, or reducing the number of metric projections. For instance, Censor, Gibali and Reich 5. introduced
[TABLE]
where the step size satisfies . Since the second projection in (8) can be found in a closed form, this method is more applicable when a projection onto the closed convex set is a nontrivial problem. For a more general problem (1), Tseng modified-FB modified the iteration (4) and proposed the following forward-backward-forward (FBF) method involving one proximal operator and two values of per iteration:
[TABLE]
where . Since then, Tseng’s method has attracted a lot of interest owing to its simplicity and generality; see FB-Tseng ; Tseng ; inertial-FBF for more details.
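To make the difference between these schemes tangible, here is a small Python sketch of the textbook forms of the extragradient iteration (two projections and two operator evaluations per step) and of Tseng's forward-backward-forward iteration (one prox and two operator evaluations per step), both with a fixed step below $1/L$. The function names, the rotation operator `A` and the box constraint are assumptions chosen only for illustration.

```python
import numpy as np

def extragradient(F, proj, x0, lam, iters=2000):
    """Extragradient: predictor projection, then a projection using F at the predictor."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        y = proj(x - lam * F(x))      # predictor step
        x = proj(x - lam * F(y))      # corrector step, second projection
    return x

def tseng_fbf(F, prox, x0, lam, iters=2000):
    """Forward-backward-forward: one prox, then an explicit correction with F(x) - F(y)."""
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        Fx = F(x)
        y = prox(x - lam * Fx)
        x = y + lam * (Fx - F(y))     # no second prox or projection is needed
    return x

# Assumed test operator: a rotation, which is monotone but not strongly
# monotone, with Lipschitz constant L = 1. The unique solution on the box is 0.
A = np.array([[0.0, 1.0], [-1.0, 0.0]])
F = lambda x: A @ x
box = lambda v: np.clip(v, -1.0, 1.0)   # projection onto C = [-1, 1]^2

x_eg = extragradient(F, box, [1.0, 1.0], lam=0.5)
x_fbf = tseng_fbf(F, box, [1.0, 1.0], lam=0.5)
```

Both runs drive the iterate to the solution at the origin, whereas a plain projected forward step is not guaranteed to converge for a merely monotone operator like this rotation.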
In the literature, inertial extrapolation has been used to accelerate proximal gradient methods in the spirit of Nesterov’s extrapolation techniques Nesterov1983 ; Nesterov2004 , whose basic idea is to make full use of historical information at each iteration. A typical scheme of the proximal gradient method with extrapolation for solving (1) is
[TABLE]
where . Recently, using a fixed parameter in (9), Malitsky 9. introduced the iteration
[TABLE]
for solving (3). However, the step size ( or ) requires knowledge of the Lipschitz constant , which is a main drawback of the algorithms introduced above. In fact, a large value of leads to a very small step size, which may result in slow convergence 10. . To obtain a proper step size, Armijo-type linesearch and outer approximation techniques were used in Khobotov ; search-strategy ; Solodov ; extragradient-type . Owing to the extra proximal operators and evaluations of , these algorithms become computationally expensive when the proximal operator or is hard to compute.
For getting a proper step without using the Lipschitz constant , Malitsky 9. introduced an efficient method whose main updates are
[TABLE]
With the step size updated by a specific procedure according to the progress of the algorithm, a weak convergence result was proved, but this process involves the computation of additional projections onto . Later, Mainge and Gobinddass 10. introduced a more general framework:
[TABLE]
where the step size must satisfy several inequality constraints and can be obtained by a linesearch procedure; see (10., Sections 3.1 and 3.2.2). Based on the scheme (10), local information of the operator and some linesearch procedures, Malitsky Proximal-extrapolated proposed simpler schemes which do not require Lipschitz continuity of the operator. Furthermore, the linesearch procedure involved does not need an extra prox or projection, and it can be applied to the more general problem (1). By avoiding both the estimation of and the linesearch procedure for the scheme (10), Yang and Liu yang proposed an extragradient method with lower computational complexity but nonincreasing step sizes. The important parameter related to the step size was restricted to with in yang and with variable from linesearch in 9. to guarantee convergence.
The aim of this paper is to propose a proximal gradient algorithm with a larger step size, extend the range of to that is less than or equal to 1, and then improve the range of . The proposed method does not require the Lipschitz constant: its step size is predicted using the two previous iterates and corrected by linesearch to satisfy a very weak condition, which always holds when for a convex function . Specifically, with the aid of the key inequalities in the convergence proof, we first introduce a function defined as
[TABLE]
for any to ensure some convergence properties. Then we get , and use to control the step size. Our range of is larger than that presented in 9. ; yang , see Lemma 2 for more explanations. Secondly, the region of is partitioned as
[TABLE]
to explore convergence of the proposed method, and the ergodic convergence rate is established. Finally, we obtain the upper bound of by an extremely simple example, and improve numerical efficiency by introducing nonmonotonic step size but . In fact, the proposed nonmonotonic step size reduces the dependence on the initial point, but it must eventually become monotonic to guarantee convergence.
The paper is organized as follows. In Section 2, we provide some useful facts and notation. In Section 3, we introduce our algorithm and explore the properties of the function . A weak convergence theorem of our method is proved in Section 3.1. In Section 3.2, we establish the ergodic convergence rate of the proposed algorithms, and we improve the algorithms in Section 3.3 to avoid the adverse effects of the nonincreasing step size. In Section 4, we show by an example that any value of with does not guarantee convergence of our algorithm. Numerical experiments on some problems from the literature are provided and analyzed in Section 5. We finally conclude our paper in Section 6.
2 Preliminaries
In this section, we introduce some notation and well-known facts about the proximal operator, the Opial condition and Young’s inequality, which are used in the subsequent convergence analysis.
The proximal operator prox with prox, is defined by
[TABLE]
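For intuition, the sketch below evaluates two common proximal operators in Python, assuming the standard definition in which the prox of $\lambda g$ at $x$ minimizes $\lambda g(y) + \tfrac12\|y-x\|^2$ over $y$. The function names and the test vector are hypothetical; the $\ell_1$ norm and the box indicator are chosen because they correspond to the sparse regression and projection cases discussed in this paper.

```python
import numpy as np

def prox_l1(x, lam):
    """Prox of lam * ||.||_1: soft-thresholding of each component."""
    return np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)

def prox_box(x, lo, hi):
    """Prox of the indicator of the box [lo, hi]^n: the metric projection (independent of lam)."""
    return np.clip(x, lo, hi)

def prox_objective(y, x, lam):
    """Value of lam * ||y||_1 + 0.5 * ||y - x||^2, which prox_l1(x, lam) should minimize."""
    return lam * np.abs(y).sum() + 0.5 * np.sum((y - x) ** 2)

x = np.array([1.5, -0.3, 0.0, -2.0])
p = prox_l1(x, 0.5)               # components shrink toward 0 by 0.5
q = prox_box(x, -1.0, 1.0)        # components are clipped to [-1, 1]
```

A quick sanity check is that no nearby point achieves a smaller value of `prox_objective` than `p`, which is how the minimization property in the definition can be verified numerically.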
Setting
[TABLE]
it is clear that problem (1) is equivalent to finding such that for all .
Fact 1
(Bauschke2011Convex) Let be a convex function, and . Then if and only if
[TABLE]
Fact 2
Opial (Opial 1967) Let be a nonempty set of and be a sequence in such that the following two conditions hold:
(1) for every , exists;
(2) every sequential weak cluster point of is in .
Then converges weakly to a point in .
Fact 3
Let , be two nonnegative real sequences and such that
[TABLE]
Then is convergent and .
Fact 4
(Young’s inequality) For all and , we have
[TABLE]
The following identity (cosine rule) appears many times, and we use it to simplify the convergence analysis. For all ,
[TABLE]
3 Proximal Extrapolated Gradient Method with Prediction and Correction
In this section, we state our proximal extrapolated gradient method with prediction and correction (PEG), by using the step size function defined in (11).
**Algorithm 1 (PEG for solving (1))**
Step 0.
Take , choose , , and a bounded sequence . Set , and .
Step 1.
**Prediction:**
1.a. Compute
[TABLE]
1.b. Compute
[TABLE]
if , then stop: is a solution.
Step 2.
Correction when :
Check
[TABLE]
If this does not hold, set and return to Step 1.b.
Step 3.
Set and return to Step 1.
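The following Python sketch illustrates only the prediction idea, under assumed updates that are simpler than Algorithm 1: the extrapolation weight is fixed to 1 (a reflection, $y_k = 2x_k - x_{k-1}$), the step is predicted from a local Lipschitz estimate built from the last two extrapolated points, and the Correction step is omitted. The function name, the parameter `alpha`, and the test operator are hypothetical, and this should not be read as the authors' exact scheme.

```python
import numpy as np

def reflected_prox_gradient(F, prox, x0, lam0=1.0, alpha=0.4, iters=500):
    """Prediction-only sketch with assumed updates (not the paper's exact formulas):
        y_k     = 2 * x_k - x_{k-1}
        lam_k   = min(lam_{k-1}, alpha * ||y_k - y_{k-1}|| / ||F(y_k) - F(y_{k-1})||)
        x_{k+1} = prox(x_k - lam_k * F(y_k), lam_k)
    """
    x_prev = np.asarray(x0, dtype=float)
    y_prev, Fy_prev = x_prev.copy(), F(x_prev)
    lam = lam0
    x = prox(x_prev - lam * Fy_prev, lam)       # first step uses the initial guess lam0
    steps = [lam]
    for _ in range(iters):
        y = 2.0 * x - x_prev                    # extrapolated (reflected) point
        Fy = F(y)
        dF = np.linalg.norm(Fy - Fy_prev)
        if dF > 0.0:                            # predict the step from local information
            lam = min(lam, alpha * np.linalg.norm(y - y_prev) / dF)
        x_prev, x = x, prox(x - lam * Fy, lam)  # one prox and one F-evaluation per iteration
        y_prev, Fy_prev = y, Fy
        steps.append(lam)
    return x, steps

# Assumed test problem: F(x) = A2 x is monotone with L = sqrt(5); solution x* = 0.
A2 = np.array([[1.0, 2.0], [-2.0, 1.0]])
x_out, steps = reflected_prox_gradient(lambda x: A2 @ x,
                                       lambda v, lam: np.clip(v, -5.0, 5.0),
                                       [1.0, 1.0])
```

In this run the predicted step drops from the initial guess to about `alpha / L` after one iteration and never increases afterwards, which mirrors the nonincreasing behaviour discussed in Remark 1.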
The aim of the Correction step is to bound by the given sequence when , as the convergence analysis requires . In practice, we do not need to specify the sequence in advance, but can generate it adaptively by
[TABLE]
for given and small (e.g., ), then for all and . Moreover, we observe for bounding more tightly due to .
For a convex function , if we observe , see (40), so the Correction step is not necessary. In other cases, however, one needs to apply linesearch to ensure . Interestingly, for all the tested problems in Section 5, the termination condition is already met by the predicted step when using (16) with , so the linesearch in the Correction step never starts. Namely, the predicted step is good enough for obtaining a convergent sequence for the tested problems, though convergence without the Correction step is unknown in general.
The following lemma shows that the correction procedure described in Algorithm 1 is well-defined.
Lemma 1
The correction procedure always terminates, i.e., is well defined when .
Proof. Denote
[TABLE]
From (Bauschke2011Convex, Theorem 23.47), we have that as ( denotes the closure of ), which together with the nonexpansivity of yields
[TABLE]
By taking the limit as , we deduce that . Notice that , we observe .
Suppose, for contradiction, that the correction procedure in Algorithm 1 fails to terminate at the -th iteration. Then, for all with , we have . Since as , we get , which gives the contradiction and completes the proof.
Remark 1
Note that the sequence is monotonically decreasing. Since is a -Lipschitz continuous mapping (), we have
[TABLE]
for . Thus the predicted step sequence has a lower bound , then when its limit exists and . If , is well defined from Lemma 1, and has a lower bound for some , which implies as well.
Below, we derive the analytical expression of .
Lemma 2
For the function defined in (11), we have with for .
Proof. Fix , then . Noting that the structure of (11) and is a maximum value, so , which together with and shows
[TABLE]
By the first-order optimality condition of the optimization problem (17), we have . Substituting it into (17), the result can be deduced.
By Lemma 2 and Fig. 1, the maximum value of is when , and in fact,
[TABLE]
In this case, we have , and .
Remark 2
Note that the method proposed in yang is a special case of Algorithm 1 when and , but from Lemma 2. Namely, we extend the range of and then improve the upper bound of when the operator is the gradient of a convex function or when linesearch is used (see Fig. 1), which yields a larger step size and hence better numerical efficiency.
3.1 Convergence Analysis
This section is devoted to studying convergence properties of Algorithm 1. For , its convergence and convergence rate can be obtained by combining the methods in yang ; Proximal-extrapolated with basic limit theory. However, the situation is completely different for , since the desired properties (such as monotonicity and nonnegativity) are no longer valid in the case of , although we can adopt a larger value of .
We next give a basic lemma about the iterations generated by Algorithm 1 for any , which plays a crucial role in proving the main convergence results.
Lemma 3
Let and be two sequences generated by Algorithm 1. For any , we have
[TABLE]
Proof. From and Fact 1, we have
[TABLE]
which shows
[TABLE]
Substituting and into the above inequality respectively, we obtain
[TABLE]
Multiplying (21) by and then adding it to (20) yields, by ,
[TABLE]
Multiplying (22) by and using again, we get
[TABLE]
Finally, adding (19) to (23) gives us
[TABLE]
Then, using (13), the updating of and Cauchy-Schwarz inequality, we obtain
[TABLE]
The proof is completed.
Lemma 4
Let , be two sequences generated by Algorithm 1 and (the solution set of problem (1)). Then, for any , we have
[TABLE]
where is defined as in (12).
Proof. Using Fact 4, for any we have
[TABLE]
Meanwhile, for any we deduce
[TABLE]
Combining the above inequalities we have
[TABLE]
In addition, the monotonicity of implies for any
[TABLE]
Substituting (24) and (25) into (18), we deduce with the aid of in (12) that
[TABLE]
Since and is a monotonically decreasing sequence, we have . Note that for any ; then
[TABLE]
This completes the proof.∎
By Lemma 4 and some transpositions, we have the following results directly.
Lemma 5
Let , be two sequences generated by Algorithm 1 and . Then, for any , we have
[TABLE]
where
[TABLE]
or
[TABLE]
Because the sequence is monotonically decreasing, we have for any . But for , we have . So, convergence of Algorithm 1 with is different from that with , and hence cannot be established by methods similar to those in yang ; Proximal-extrapolated .
Notice that in (32) when , we take (32) to study the convergence of Algorithm 1 with . Consequently, a larger upper bound of is obtained than that in yang . While for the case of , we take (36) as for all , and further investigate the properties of to ensure convergence of Algorithm 1.
Below we state and prove our main convergence results for Algorithm 1 in the above two regions: and .
Theorem 1
Let be the sequence generated by Algorithm 1 with . Then, converges weakly to a solution of problem (1).
Proof. From Remark 1, we have . Then for any and , we have
[TABLE]
Thus, there exists an integer such that for any ,
[TABLE]
which implies that in (32) when . Recalling for any , we deduce in (32). Hence, by Lemma 5 and Fact 3, is convergent and . This means that is bounded and so is . Also, we have and . By , we also have that and is bounded.
In what follows, we prove that the sequence converges weakly to a solution of problem (1). For any weak cluster point of , there exists a subsequence that converges weakly to , namely . It is obvious that also converges weakly to . Next we verify that . Applying Fact 1, we deduce
[TABLE]
Letting in (38) and using the facts , is lower semicontinuous and we obtain
[TABLE]
which confirms .
Finally, we prove that . We take in the definition (32) of and label as . Notice that is bounded and is continuous from (A3), we observe
[TABLE]
Therefore, , which by Fact 2 shows .
Now, we focus on the convergence analysis of Algorithm 1 with and use (36). In this case, we cannot establish the nonnegativity of or the monotonic decrease of because . Consequently, convergence of cannot be obtained from (27). We thus need to investigate the sequence further to establish convergence, using the boundedness of from the Correction step.
First, we show that when and the operator is the gradient of a convex function , i.e., . From (26) with and , we deduce
[TABLE]
Using and the convexity of yields
[TABLE]
where . This together with , and gives us
[TABLE]
where , which implies from that
[TABLE]
That is to say, the Correction step is not necessary when , for a convex function .
Theorem 2
Let be the sequence generated by Algorithm 1 with . Then, converges weakly to a solution of problem (1).
Proof. Firstly, gives . Note that , by taking the limit and from , we have
[TABLE]
for any . Thus, there exists an integer such that for any ,
[TABLE]
By , Remark 1 and Lemma 5, for any and , we have
[TABLE]
where from Remark 1 and Correction step. This together with in (36) implies that is bounded and
[TABLE]
so and By the fact , we have .
Since , is bounded. We can complete the proof by Remark 1 and arguments similar to those in the proof of Theorem 1.
Remark 3
From the above analysis, it seems that convergence of the proposed algorithm could still be ensured without assumption (A3), but as far as we know it is not clear how to prove this. Actually, assumption (A3) is not restrictive: is continuous on when is an open set (this includes all finite-valued functions) or for any closed convex set . Moreover, (A3) holds for any separable lower semicontinuous convex function by (Bauschke2011Convex, Corollary 9.15).
3.2 Ergodic Convergence Rate for
Since the convergence rate when has been studied extensively, we focus on the case . Actually, the optimal rate of convergence is for the extragradient method rate-convergence . In this subsection, we investigate the ergodic convergence rate of the sequence for the general case (1).
From 11. and (Proximal-extrapolated, , Lemma 2.12), if and only if and
[TABLE]
The following theorem shows that the above criteria can be used to find under a desired accuracy.
Theorem 3
Let and be generated by Algorithm 1. For any and a sufficiently large related to , we define
[TABLE]
for any , then and
[TABLE]
Proof. First of all, we have by (26) that
[TABLE]
Since as , there exists a sufficiently large such that for any , it holds (If , then , else is a solution). So, we let with . Recalling (48) we deduce for any that
[TABLE]
Note that the function is convex. Now, applying Jensen’s inequality to the left-hand side of the above inequality and taking
[TABLE]
into account, we have
[TABLE]
where
[TABLE]
Evidently, which ends the proof.
Notice that has a lower bound from Remark 1. Fixing , then we get as . This implies and Algorithm 1 has the ergodic convergence rate when .
3.3 Heuristics on Nonmonotonic Step Sizes
Generally speaking, a variable step is more beneficial than a fixed step for proximal gradient methods. In Algorithm 1, the step size is updated but in a nonincreasing way, which can be harmful if the algorithm starts in a region where has large curvature. Namely, the step size in Algorithm 1 depends too heavily on the initial point. To obtain nonmonotonic step sizes, we present an improved algorithm as follows:
**Algorithm 2 (Improved PEG with nonmonotonic step size)**
Step 0.
Take , choose , , and a bounded sequence . Set , and . Choose and a sequence with and when for given .
Step 1.
**Prediction:**
1.a. Compute
[TABLE]
1.b. Compute
[TABLE]
if , then stop: is a solution.
Step 2.
Correction:
Check
[TABLE]
If this does not hold, set and return to Step 1.b.
Step 3.
Set and return to Step 1.
Since the step size is no longer monotonically decreasing, in (32) is not necessarily valid when , so Algorithm 2 implements the Correction step for any . By and , we can deduce . Then Lemmas 3, 4 and 5 with (36) are still valid for sequences and generated by Algorithm 2.
The constant in Algorithm 2 is given only to ensure the upper boundedness of . Hence, it makes sense to choose quite large. In this case, the step sizes generated are allowed to increase but be bounded from Remark 1. Consequently, it follows from when for given that the sequence generated by Algorithm 2 is monotonically decreasing and then convergent,
[TABLE]
and . Under these conditions, it is not difficult to prove the following convergence theorem by using Lemma 5 with (36), though we do not know how to choose a proper .
Theorem 4
Let be a sequence generated by Algorithm 2 with . Then, converges weakly to a solution of problem (1).
4 Further Discussion
From the statement above, the condition for any is sufficient to ensure convergence of the proposed method. In this section, we show by an extremely simple example that Algorithm 1 does not converge when for any . That is, we derive an upper bound of that guarantees the convergence of Algorithm 1, while the behavior of Algorithm 1 with in some regions remains to be studied; see Fig. 2.
Consider the simplest optimization problem
[TABLE]
Obviously, it can be formulated as a special case of problem (3) with (the identity operator), and . Following the updates of Algorithm 1, we have
[TABLE]
For any , , if
[TABLE]
then we can rewrite (54) as
[TABLE]
By (55) and Vieta’s Theorem, we have
[TABLE]
If , then the iteration (56) does not converge. As a result, (54) does not converge either. Namely, if
[TABLE]
then the iteration (54) does not converge. By Remark 1 and , the convergence of Algorithm 1 cannot be guaranteed if and for any .
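The Vieta-type argument can be checked numerically on a scalar recursion. The sketch below assumes the identity operator, a fixed step `lam`, and an extrapolation of the form $y_k = x_k + \delta(x_k - x_{k-1})$, so that $x_{k+1} = x_k - \lambda y_k$; these symbols and the resulting characteristic polynomial are illustrative assumptions and need not coincide with (54)-(57). Under them, the product of the two characteristic roots is $-\lambda\delta$, so at least one root has modulus above 1 whenever $\lambda\delta > 1$.

```python
import numpy as np

def spectral_radius(lam, delta):
    """Largest root modulus of z^2 - (1 - lam*(1 + delta)) * z - lam*delta = 0."""
    roots = np.roots([1.0, -(1.0 - lam * (1.0 + delta)), -lam * delta])
    return float(np.max(np.abs(roots)))

def run_recursion(lam, delta, x0=1.0, iters=50):
    """Scalar recursion x_{k+1} = x_k - lam * (x_k + delta * (x_k - x_{k-1})), with x_{-1} = x_0."""
    x_prev, x = x0, x0
    for _ in range(iters):
        x_prev, x = x, x - lam * (x + delta * (x - x_prev))
    return x

rho_small = spectral_radius(0.3, 1.0)   # both roots inside the unit disc
rho_large = spectral_radius(1.5, 1.0)   # product of root moduli is 1.5 > 1
x_small = run_recursion(0.3, 1.0)
x_large = run_recursion(1.5, 1.0)
```

With the small step the iterates decay toward the minimizer at 0, while with the large step they blow up, in line with the root moduli.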
5 Numerical Experiments
In this section, we apply Algorithm 2 (denoted by “IPEG”; all codes are available at http://www.escience.cn/people/changxiaokai/Codes.html) to some randomly generated minimization problems over difficult nonlinear constraints. IPEG is compared with the following state-of-the-art algorithms to assess its computational efficiency:
- Tseng’s forward-backward-forward splitting method as used in (Proximal-extrapolated, Section 4) (denoted by “TFBF”), with ;
- Proximal extrapolated gradient method (Proximal-extrapolated, Algorithm 2) (denoted by “PEG”), with linesearch and ;
- Modified projected gradient method yang (denoted by “MPG”), with ;
- FISTA Nesterov1983 with standard linesearch (denoted by “FISTA”), with .
We denote by the random number generator used to regenerate the data in Python 3.8. All experiments are performed on an Intel(R) Core(TM) i5-4590 CPU @ 3.30 GHz PC with 8GB of RAM running a 64-bit Windows operating system.
Since solutions of (1) coincide with zeros of the residual function
[TABLE]
for some positive number , and implies , we use with given to terminate our algorithms; the same criterion is used to terminate PEG, MPG, FB and FISTA. For TFBF in particular, we use
[TABLE]
as in Proximal-extrapolated .
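A residual-based stopping test of this kind can be sketched in Python as follows, assuming the usual natural residual $\|x - \mathrm{prox}_{\lambda g}(x - \lambda F(x))\|$. The function name `natural_residual`, the default `lam=1.0`, and the small test instance are assumptions made for illustration.

```python
import numpy as np

def natural_residual(x, F, prox, lam=1.0):
    """||x - prox(x - lam * F(x), lam)||, which vanishes exactly at solutions of the VI."""
    x = np.asarray(x, dtype=float)
    return float(np.linalg.norm(x - prox(x - lam * F(x), lam)))

# Assumed instance: F(x) = x - c on the nonnegative orthant, whose solution is max(c, 0).
c = np.array([1.0, -2.0, 0.5])
F = lambda x: x - c
proj_pos = lambda v, lam: np.maximum(v, 0.0)   # prox of the orthant indicator
x_star = np.maximum(c, 0.0)

res_at_solution = natural_residual(x_star, F, proj_pos)
res_at_origin = natural_residual(np.zeros(3), F, proj_pos)
```

An algorithm would stop once this residual falls below the chosen tolerance; here it is zero at `x_star` and strictly positive at the origin.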
We generate as in M-extra , choose as a small perturbation of and take . This gives us an approximation of the local inverse Lipschitz constant of at . There are many choices of the sequence , but since a large range of is beneficial for selecting a proper step size in the early iterations, we use
[TABLE]
for a given . In this section, we fix and . To apply the Correction step, we use and (16) with and .
We report the number of iterations (Iter), the number of proximal operator evaluations (prox), the number of operator evaluations (F) and the computing time (Time) measured in seconds. Note that the number of iterations equals the number of proximal operator evaluations for PEG and IPEG, and is 2 smaller than the number of operator evaluations for IPEG; we thus report the number of iterations and the number of operator evaluations for PEG and only the number of iterations for IPEG. Bold entries indicate the best results in the following tables.
Problem 1
The first problem (called Sun’s problem) was considered in 9. ; 15. ; yang , and the Lipschitz-continuous and monotone operator was generated by
[TABLE]
where
[TABLE]
and Here is a square matrix defined by
[TABLE]
and We choose the feasible set as and .
For Problem 1, the initial point is generated uniformly randomly from . For every and every above, the test results are listed in Table 1. In addition, we show the evolutions of and with respect to Iter for solving Problem 1 with , in Fig. 3.
Problem 2
The second test problem is the so-called Kojima-Shindo Nonlinear Complementarity Problem (NCP), considered in 10. ; P-G , where and the mapping is defined by
[TABLE]
The feasible set is and .
We choose three particular starting points: , and . The numerical results are reported in Table 2 and the evolutions of and with respect to Iter for solving Problem 2 with are shown in Fig. 4.
Problem 3
The third problem is HpHard problem, considered as in yang ; Proximal-extrapolated . Let with and , where , and , is a skew-symmetric matrix, every entry of and is uniformly generated from . The matrix is diagonal and its diagonal entry is uniformly generated from . Every entry of is uniformly generated from . The feasible set is and .
For every , as shown in Table 3, we have generated randomly two different and with and . For all tests, we take . Since is an affine operator, the number of iterations is 2 smaller than that of for PEG, thus we just report the number of iterations.
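A test instance of this type can be generated as in the Python sketch below, which assumes the usual HpHard construction $F(x) = Mx + q$ with $M = AA^{T} + S + D$, where $S$ is skew-symmetric and $D$ is a nonnegative diagonal matrix. The sampling ranges passed to `make_hphard` are placeholder parameters rather than the values used in the experiments, and building $S$ as $B - B^{T}$ is one convenient way to obtain a skew-symmetric matrix.

```python
import numpy as np

def make_hphard(n, rng, a_range=(-1.0, 1.0), d_range=(0.0, 1.0), q_range=(-1.0, 0.0)):
    """Return (M, q) with M = A A^T + S + D; the ranges are illustrative placeholders."""
    A = rng.uniform(*a_range, size=(n, n))
    B = rng.uniform(*a_range, size=(n, n))
    S = B - B.T                                  # skew-symmetric by construction
    D = np.diag(rng.uniform(*d_range, size=n))   # nonnegative diagonal part
    q = rng.uniform(*q_range, size=n)
    M = A @ A.T + S + D
    return M, q

rng = np.random.default_rng(0)
M, q = make_hphard(20, rng)
F = lambda x: M @ x + q
L = np.linalg.norm(M, 2)                         # Lipschitz constant of the affine map F
sym_min_eig = np.linalg.eigvalsh(0.5 * (M + M.T)).min()
```

Because the symmetric part of `M` is $AA^{T} + D$, it is positive semidefinite, so `F` is monotone even though `M` itself is not symmetric.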
Problem 4
The fourth example is a sparse logistic regression problem for binary classification. Let be the training set, where is the feature vector of each data sample, and is the binary label. The formulation of sparse logistic regression reads
[TABLE]
where and is set to be in the numerical test.
Let and set . Then the objective in (61) is with and . It is easy to derive that . Thus, . We take three popular datasets from LIBSVM (https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/): w7a with , , a9a with , and real-sim with , .
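The smooth part of the sparse logistic regression objective and its gradient can be sketched in Python as follows. The sketch assumes the common reformulation $K_{ij} = -b_i a_{ij}$ and $f(x) = \sum_i \log(1 + \exp((Kx)_i))$, for which $\nabla f(x) = K^{T}\sigma(Kx)$ and $\tfrac14\|K\|_2^2$ bounds the Lipschitz constant of $\nabla f$. The function names and the synthetic data are hypothetical and stand in for the LIBSVM datasets.

```python
import numpy as np

def logistic_loss(x, K):
    """f(x) = sum_i log(1 + exp((K x)_i)), evaluated stably with logaddexp."""
    return float(np.sum(np.logaddexp(0.0, K @ x)))

def logistic_grad(x, K):
    """grad f(x) = K^T sigmoid(K x); the sigmoid is written via tanh to avoid overflow."""
    z = K @ x
    return K.T @ (0.5 * (1.0 + np.tanh(0.5 * z)))

def lipschitz_bound(K):
    """0.25 * ||K||_2^2 bounds the Lipschitz constant of grad f, since sigmoid' <= 1/4."""
    return 0.25 * np.linalg.norm(K, 2) ** 2

# Synthetic data (assumed): rows a_i, labels b_i in {-1, +1}, and K_ij = -b_i * a_ij.
rng = np.random.default_rng(1)
a = rng.standard_normal((30, 5))
b = rng.choice([-1.0, 1.0], size=30)
K = -b[:, None] * a
x = rng.standard_normal(5)
grad_x = logistic_grad(x, K)
```

The $\ell_1$ term is then handled by soft-thresholding in the proximal step, so each iteration needs only one evaluation of `logistic_grad`.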
Since is convex and , we apply IPEG to (61) without the Correction step. We use to terminate all the algorithms in order to obtain more accurate solutions, and choose the smallest objective value among all methods and set it to . The results are shown in Table 4. To illustrate how the values and change over time, we give two convergence plots for data “a9a” in Fig. 6.
To summarize our numerical experiments on Problems 1-4, we make some observations. Firstly, the advantage of IPEG over the other algorithms is a larger interval for the possible step size (see Fig. 3(b), Fig. 4(b) and Fig. 5(b)), which results from the proper choice of and the larger value of .
Secondly, for the majority of the test problems, IPEG is more efficient than the other algorithms in both the number of iterations and the CPU time. Furthermore, the convergence plots of in Fig. 3(a), Fig. 4(a) and Fig. 5(a) show that IPEG with performs more efficiently than IPEG with , which is mainly due to the larger step size and the fact that only one value of the mapping is required per iteration. Although linesearch is involved in the Correction step, the required condition is so weak that the linesearch is not started for many problems.
In addition, since MPG yang adopts nonincreasing step sizes, it is at a disadvantage when starting in a region where has large curvature; see Fig. 3(b) and the results of MPG for Problem 1. From Fig. 5, the step sizes generated by IPEG fluctuate within a range during the first 500 iterations; after that the range shrinks, as we use (59) with to control the increase of step sizes.
6 Conclusions
Without knowledge of the Lipschitz constant, we have proposed a proximal extrapolated gradient method that uses a prediction-correction procedure to determine step sizes, and improved it numerically with a nonmonotonic step size. The method extends the range of parameters (considering the case of ) and obtains a larger step size than existing methods by using the correction step. Finally, a number of experiments illustrate that the proposed method is efficient, and that the improvement results from the larger step size.
In addition, we have shown by an extremely simple example that our method does not converge if for any . From Fig. 2, the convergence of the proposed method remains unknown for in some regions. Especially for , it remains to be explored whether there are any (larger) such that Algorithms 1 and 2 are convergent. Perhaps our method without the correction step is convergent as well, and can be generalized to other methods that need to estimate the Lipschitz constant. We leave this as an interesting topic for our future research.
Acknowledgements.
The research of Xiaokai Chang was supported by the Hongliu Foundation of First-class Disciplines of Lanzhou University of Technology. The project was supported by the National Natural Science Foundation of China under Grant 61877046 and the Natural Science Basic Research Plan in Shaanxi Province of China (2017JM1014).
References
- (1) Antipin, A.S.: On a method for convex programs using a symmetrical modification of the Lagrange function. Ekonomika i Matematicheskie Metody 12(6), 1164–1173 (1976)
- (2) Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer Berlin, New York (2011)
- (3) Bertsekas, D.P., Gafni, E.M.: Projection methods for variational inequalities with applications to the traffic assignment problem. Math. Program. Study 17, 139–159 (1982)
- (4) Burachik, R.S., Lopes, J.O., Svaiter, B.F.: An outer approximation method for the variational inequality problem. SIAM J. Control Optim. 43(6), 2071–2088 (2005)
- (5) Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3(1), 1–122 (2011)
- (6) Bot, R.I., Csetnek, E.R.: Forward-backward and Tseng’s type penalty schemes for monotone inclusion problems. Set-Valued Var. Anal. 22, 313–331 (2014)
- (7) Bot, R.I., Csetnek, E.R.: An inertial forward-backward-forward primal-dual splitting algorithm for solving monotone inclusion problems. Numer. Algor. 71, 519–540 (2016)
- (8) Chang, X., Liu, S., Zhao, P., Li, X.: Convergent prediction-correction-based ADMM for multi-block separable convex programming. J. Comput. Appl. Math. 335, 270–288 (2018)
