An efficient adaptive accelerated inexact proximal point method for solving linearly constrained nonconvex composite problems
Weiwei Kong, Jefferson G. Melo, Renato D.C. Monteiro

TL;DR
This paper introduces an adaptive accelerated inexact proximal point method for efficiently solving linearly constrained nonconvex composite optimization problems, improving upon previous methods with adaptive strategies and nonconvex subproblem handling.
Contribution
It develops a novel adaptive variant of the quadratic penalty accelerated inexact proximal point method that handles nonconvex subproblems more efficiently.
Findings
The proposed methods outperform existing approaches in numerical tests.
Adaptive stepsize adjustment improves convergence speed.
The methods effectively solve large-scale nonconvex constrained problems.
Weiwei Kong, Renato D.C. Monteiro: School of Industrial and Systems Engineering, Georgia Institute of Technology, Atlanta, GA, 30332-0205 (email: [email protected] & [email protected]). The works of these authors were partially supported by ONR Grant N00014-18-1-2077.
Jefferson G. Melo: Institute of Mathematics and Statistics, Federal University of Goias, Campus II- Caixa Postal 131, CEP 74001-970, Goiânia-GO, Brazil (email: [email protected]). The work of this author was supported in part by CNPq Grant 406975/2016-7.
(March 17, 2024)
Abstract
This paper proposes an efficient adaptive variant of a quadratic penalty accelerated inexact proximal point (QP-AIPP) method proposed earlier by the authors. Both the QP-AIPP method and its variant solve linearly set constrained nonconvex composite optimization problems using a quadratic penalty approach where the generated penalized subproblems are solved by a variant of the underlying AIPP method. The variant, in turn, solves a given penalized subproblem by generating a sequence of proximal subproblems which are then solved by an accelerated composite gradient algorithm. The main difference between AIPP and its variant is that the proximal subproblems in the former are always convex while the ones in the latter are not necessarily convex due to the fact that their prox parameters are chosen as aggressively as possible so as to improve efficiency. The possibly nonconvex proximal subproblems generated by the AIPP variant are also tentatively solved by a novel adaptive accelerated composite gradient algorithm based on the validity of some key convergence inequalities. As a result, the variant generates a sequence of proximal subproblems where the stepsizes are adaptively changed according to the responses obtained from the calls to the accelerated composite gradient algorithm. Finally, numerical results are given to demonstrate the efficiency of the proposed AIPP and QP-AIPP variants.
2000 Mathematics Subject Classification: 47J22, 90C26, 90C30, 90C60, 65K10.
Key words: quadratic penalty method, nonconvex program, iteration-complexity, proximal point method, first-order accelerated methods.
1 Introduction
This paper presents a computationally efficient variant of the quadratic penalty accelerated inexact proximal point (QP-AIPP) method studied in WJRproxmet1 .
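The QP-AIPP method of WJRproxmet1 is designed for solving the linearly-constrained nonconvex composite optimization problem
$$\min \left\{ f(z) + h(z) : Az = b, \ z \in \mathbb{R}^n \right\} \tag{1}$$
where $A : \mathbb{R}^n \to \mathbb{R}^l$ is a linear operator, $b \in \mathbb{R}^l$, $h : \mathbb{R}^n \to (-\infty, \infty]$ is a closed proper convex function, and $f$ is a real-valued differentiable (possibly nonconvex) function whose gradient is $L_f$-Lipschitz and which, for some $m_f \in (0, L_f]$, satisfies
$$-\frac{m_f}{2}\|u - z\|^2 \le f(u) - \ell_f(u; z) \quad \forall z, u \in \mathrm{dom}\, h, \tag{2}$$
where $\ell_f(\cdot\,; z)$ denotes the affine approximation of $f$ at $z$ (see Subsection 1.1).
The QP-AIPP method solves (1) via a quadratic penalty method, i.e., a sequence of penalty subproblems of the form
$$\min \left\{ f(z) + h(z) + \frac{c}{2}\|Az - b\|^2 : z \in \mathbb{R}^n \right\} \tag{3}$$
for an increasing sequence of positive penalty parameters $c$ is solved by the accelerated inexact proximal point (AIPP) method (discussed below), in which each penalty subproblem is solved using a common starting point (i.e., a cold-start strategy is adopted).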
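As a rough illustration of the quadratic penalty idea with a cold start, the following Python sketch minimizes a toy convex quadratic under a single linear constraint. It is not the QP-AIPP method: plain gradient descent stands in for the AIPP solver, and the function name `quadratic_penalty`, the doubling rule for the penalty parameter, the tolerances, and the toy data are all assumptions made for this example.

```python
def quadratic_penalty(grad_f, a, b, z0, lip_f, c0=1.0, feas_tol=1e-3, grad_tol=1e-6):
    """Cold-start quadratic penalty loop for: min f(z) subject to <a, z> = b.

    grad_f : gradient of the smooth objective f (assumed lip_f-Lipschitz)
    a, b   : data of the single linear constraint (illustrative restriction)
    """
    norm_a_sq = sum(ai * ai for ai in a)
    c = c0
    while True:
        z = list(z0)  # cold start: every penalty subproblem restarts from z0
        # 1 / (Lipschitz constant of the gradient of the penalized objective)
        step = 1.0 / (lip_f + c * norm_a_sq)
        while True:
            r = sum(ai * zi for ai, zi in zip(a, z)) - b  # constraint residual
            g = [gi + c * r * ai for gi, ai in zip(grad_f(z), a)]
            if sum(gi * gi for gi in g) ** 0.5 <= grad_tol:
                break  # penalty subproblem solved to the requested accuracy
            z = [zi - step * gi for zi, gi in zip(z, g)]
        if abs(r) <= feas_tol:
            return z, c  # nearly feasible: stop
        c *= 2.0  # otherwise increase the penalty parameter and start over


def toy_grad(z):
    # Gradient of the toy objective (z1 - 1)^2 + (z2 - 2)^2.
    return [2.0 * (z[0] - 1.0), 2.0 * (z[1] - 2.0)]


z_pen, c_final = quadratic_penalty(toy_grad, a=[1.0, 1.0], b=1.0, z0=[0.0, 0.0], lip_f=2.0)
```

On this toy instance the constrained minimizer is (0, 1), and the returned point approaches it as the feasibility tolerance is tightened. The cost of each subproblem grows with the penalty parameter, which is one reason the choice of starting point for each subproblem matters.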
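We briefly outline the AIPP method of WJRproxmet1 . First, note that (3) is a special case of
$$\min \left\{ \phi(z) := g(z) + h(z) : z \in \mathbb{R}^n \right\} \tag{4}$$
where $g$ is a function satisfying
$$-\frac{m}{2}\|u - z\|^2 \le g(u) - \ell_g(u; z) \le \frac{M}{2}\|u - z\|^2 \quad \forall z, u \in \mathrm{dom}\, h, \tag{5}$$
where $M \ge m > 0$. In the general setting of (4)–(5), the AIPP method generates a sequence $\{z_k\}$ using an inexact proximal point (IPP) framework (see for example Rock:ppa ; hpe_svaiter99 ), i.e., given $z_{k-1}$, it computes $z_k$ as a suitable approximate solution of the proximal subproblem
$$\min \left\{ \lambda \phi(z) + \frac{1}{2}\|z - z_{k-1}\|^2 : z \in \mathbb{R}^n \right\} \tag{6}$$
for some prox-parameter $\lambda > 0$. Note that the first inequality in (5) implies that the objective function of (6) is convex as long as $\lambda$ is not larger than $1/m$. The AIPP method sets $\lambda = 1/(2m)$ for every $k$ and uses an accelerated composite gradient (ACG) variant (see for example beck2009fast ; MontSvaiter_fista ; Nesterov1983 ) to approximately solve (6).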
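The following Python sketch illustrates an inexact proximal point loop with a constant prox-parameter on a one-dimensional nonconvex example. It makes several simplifying assumptions: the smooth part is g(z) = -cos(z), whose curvature lies between -1 and 1 (so the lower and upper curvature bounds are m = M = 1), the convex part h is zero, plain gradient descent replaces the ACG variant, and fixed iteration counts replace the approximate-solution criterion. The name `ipp_constant_stepsize` is hypothetical.

```python
import math


def ipp_constant_stepsize(grad_g, m, M, z0, outer_iters=100, inner_iters=50):
    """Inexact proximal point loop with prox-parameter lam = 1 / (2 m).

    Each outer iteration approximately minimizes
        lam * g(z) + 0.5 * (z - center)^2
    which is strongly convex because lam * m = 1/2 < 1.
    """
    lam = 1.0 / (2.0 * m)
    # The subproblem gradient is (1 + lam * M)-Lipschitz.
    inner_step = 1.0 / (1.0 + lam * M)
    z = z0
    for _ in range(outer_iters):
        center = z  # prox center for this outer iteration
        for _ in range(inner_iters):
            # Gradient step on the prox subproblem (stand-in for ACG).
            z -= inner_step * (lam * grad_g(z) + (z - center))
    return z


# g(z) = -cos(z) has gradient sin(z); its stationary point nearest z0 = 1 is z = 0.
z_final = ipp_constant_stepsize(math.sin, m=1.0, M=1.0, z0=1.0)
```

With this conservative prox-parameter every subproblem is convex, but each outer step only moves a bounded fraction of the way to the stationary point, which is the behavior a more aggressive parameter choice aims to improve.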
Since the above IPP framework converges faster to a desirable approximate solution for larger values of the prox-parameter, the main goal of this paper is to develop an aggressive AIPP variant, and subsequently an aggressive QP-AIPP variant, which possibly chooses the prox-parameter substantially larger than the threshold that guarantees convexity of (6), despite the potential loss of convexity. An important ingredient in obtaining this aggressive AIPP variant is the development of a relaxed ACG (R-ACG) algorithm that approximately solves (6) according to a more relaxed termination criterion. More specifically, within a reasonable number of iterations, the algorithm: (i) either solves the possibly nonconvex subproblem (6) according to the relaxed criterion or stops with failure due to the prox-parameter being too large; and (ii) always solves (6) according to the relaxed criterion when its objective function is convex. The aforementioned relaxed AIPP (R-AIPP) variant starts with a relatively large initial prox-parameter and, in each of its steps, calls the R-ACG algorithm to solve the corresponding prox subproblem. If the R-ACG algorithm stops with failure or a key descent inequality fails, then the prox-parameter is halved, the prox center is maintained, and the R-ACG algorithm is invoked once again to solve the resulting prox subproblem; otherwise, the prox-parameter is preserved and the newly computed point takes the place of the prox center.
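The halving logic described above can be sketched in a few lines of Python. This is only a control-flow illustration under stated assumptions: `try_solve` and `descent_ok` are hypothetical callables standing in for the inner solver and the descent test, and the toy oracle below is artificially made to fail for large prox-parameters; neither reproduces the criteria used by the R-ACG algorithm.

```python
def adaptive_prox_step(try_solve, descent_ok, center, lam, min_lam=1e-12):
    """Retry the prox subproblem at `center`, halving lam until a point is accepted.

    try_solve(center, lam)            -> candidate point, or None on failure
    descent_ok(center, candidate, lam) -> True if the descent test passes
    """
    while lam >= min_lam:
        candidate = try_solve(center, lam)
        if candidate is not None and descent_ok(center, candidate, lam):
            # Accepted: lam is preserved and candidate becomes the next prox center.
            return candidate, lam
        # Rejected: keep the same prox center and retry with half the prox-parameter.
        lam *= 0.5
    raise RuntimeError("prox-parameter became too small")


def toy_solver(center, lam):
    # Artificial stand-in: report failure whenever lam > 1; otherwise return the
    # exact prox point of phi(z) = z^2 / 2, namely center / (1 + lam).
    return None if lam > 1.0 else center / (1.0 + lam)


def toy_descent(center, candidate, lam):
    # Artificial stand-in for a descent test.
    return abs(candidate) <= abs(center)


z_next, lam_used = adaptive_prox_step(toy_solver, toy_descent, center=3.0, lam=8.0)
```

Starting from a prox-parameter of 8, the toy oracle rejects 8, 4, and 2 and accepts 1, so the prox center is reused three times before a new point is produced.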
This paper also considers a more general version of (1) in which the linear constraint is replaced by the linear set constraint , where is a closed convex set. Clearly, when , the more general problem reduces to (1). Under the assumption that is bounded and all penalty subproblems are solved by the AIPP variant using the aforementioned cold–start strategy, it turns out that the iteration complexity of the QP-AIPP variant for finding the desired approximate solution is considerably worse than that of the QP-AIPP method of WJRproxmet1 . If, on the other hand, the QP-AIPP variant adopts the warm–start strategy in which the R-AIPP method for solving the current penalty subproblem starts from the approximate solution found for the previous subproblem, then the iteration complexity of this relaxed QP-AIPP (R-QP-AIPP) variant is shown to be the same as that of the QP-AIPP method of WJRproxmet1 , up to a logarithmic factor.
The proposed AIPP and QP-AIPP variants are compared with three state-of-the-art optimization methods on five different optimization problems. The computational results obtained show that the variants can substantially outperform most of the competing methods on many problem instances.
Related works. We first discuss papers dealing with related algorithms for solving the convex version of (1) and other related monotone problems. Iteration-complexity analysis of quadratic penalty methods for solving (1) under the assumption that is convex and is a convex indicator function was first studied in LanRen2013PenMet and further explored in Aybatpenalty ; IterComplConicprog . Iteration-complexity of first-order augmented Lagrangian methods for solving the latter class of linearly constrained convex programs was studied in AybatAugLag ; LanMonteiroAugLag ; ShiqiaMaAugLag16 ; zhaosongAugLag18 ; Patrascu2017 ; YangyangAugLag17 . Inexact proximal point methods using accelerated gradient algorithms to solve their prox-subproblems were previously considered in GlanPDaccel2014 ; YHe2 ; YheMoneiroNash ; OliverMonteiro ; MonteiroSvaiterAcceleration in the setting of convex-concave saddle point problems and monotone variational inequalities.
We now discuss papers dealing with related algorithms for solving (1) when the smooth component of the objective is nonconvex and the assumptions mentioned after (1) hold. Paper WJRproxmet1 is, to the best of our knowledge, the first one to consider a proximal method with an acceleration strategy for solving (1). Previous works using acceleration strategies were concerned with the unconstrained problem (4). Namely, nonconv_lan16 proposed an accelerated gradient framework to solve (4) with better iteration complexity than the usual composite gradient method. Since then, many authors have proposed other accelerated frameworks for solving (4) under different assumptions on its two component functions (see, for example, Aaronetal2017 ; Paquette2017 ; Ghadimi2019 ; Li_Lin2015 ; CatalystNC ). In particular, by exploiting the lower curvature, Aaronetal2017 ; Paquette2017 ; CatalystNC proposed some algorithms which improve the iteration-complexity bound of nonconv_lan16 in terms of the dependence on the upper curvature. Finally, there has been a growing interest in the iteration complexity of methods for solving optimization problems using second order information (see, for example, Aaronetal2017 ; MonteiroSvaiterNewton ; NesterovSec_ord ; CartToint ).
Organization of the paper. Subsection 1.1 provides some basic definitions and notation. Section 2 begins by presenting some background material and then defines a general descent (GD) framework for solving the nonconvex optimization problem (4). Section 3 presents and derives the complexity of an R-ACG algorithm which attempts to solve (6) even when it is not convex. Section 4 presents a relaxed variant of the AIPP method proposed in WJRproxmet1 . Section 5 presents a relaxed variant of the QP-AIPP method proposed in WJRproxmet1 . Section 6 presents numerical results to illustrate the efficiency of the AIPP and QP-AIPP variants. Finally, Section 7 presents some concluding remarks.
1.1 Basic definitions and notation
This subsection provides some basic definitions and notation used in this paper.
The set of natural numbers is denoted by . The set of real numbers is denoted by . The set of non-negative real numbers and the set of positive real numbers are denoted by and , respectively. Let denote a real valued –dimension inner product space, whose inner product and its associated induced norm are denoted by and , respectively. Let denote the Frobenius inner product. Let denote the cone of positive semidefinite –by– matrices. For , define . The set of proper lower semi-continuous convex functions defined on is denoted by . Given a linear operator , the operator norm of is denoted by .
Let be given. The effective domain of is denoted by and is proper if . If is differentiable at , then its affine approximation at is denoted by
[TABLE]
Also, for , its -subdifferential at is denoted by
[TABLE]
The subdifferential of at , denoted by , corresponds to .
For a given , the closure of the set is denoted by , the indicator function of , denoted by , is defined as if and if . Moreover, the normal cone of at a point is denoted by
[TABLE]
2 A general descent framework
As discussed in Section 1, all the penalized subproblems (see (3)) that arise during the execution of the QP-AIPP method, as well as the R-QP-AIPP method, are of the form (4). Hence, efficiently obtaining a solution of (4) is of paramount importance for both the QP-AIPP and R-QP-AIPP methods. While the QP-AIPP method uses the AIPP method to solve (4), the R-QP-AIPP method uses the R-AIPP method, which will be discussed in Section 4. The discussion of this section (as well as Section 3) will essentially pave the way towards the presentation of the R-AIPP method.
More specifically, this section presents and analyzes a GD framework for solving (4) that makes use of a black box (see step 1 of the GD framework below). In addition, it describes: the assumptions and relevant quantities underlying problem (4), the notion of approximate stationary point of (4) adopted in this section and Section 4, and the relationship between the GD framework and the GIPP framework of WJRproxmet1 , of which the AIPP method is an instance.
Our problem of interest in this section and Section 4 is (4) which is assumed to satisfy the following assumptions:
- (A1)
;
- (A2)
is a nonconvex differentiable function on and there exists a scalar such that
[TABLE]
- (A3)
.
In addition, the analysis in Section 4 makes use of the quantity
[TABLE]
which is positive in view of assumption (A2). While it is generally difficult to compute the above quantity, it is well known that assumption (A2) implies that . Moreover, it is shown in Proposition 6 below that the smaller is, the better the iteration complexity of the R-AIPP method in Section 4 becomes.
It is well-known that a necessary condition for to be a local minimum of (4) is that be a stationary point of , i.e., . A relaxation of this inclusion leads to the following definition of an approximate stationary point of (4): given a tolerance , a pair is said to be a –approximate stationary point of (4) if
[TABLE]
Given a general quadruple , the following simple refinement procedure shows how to obtain a pair satisfying the inclusion in (11) with a technically useful bound on the residual (see Proposition 1 below).
Refinement procedure.
Input: a scalar , a pair of functions satisfying assumptions (A1) and (A2), and a quadruple ;
Output: a triple satisfying (16);
- (0)
set
[TABLE]
- (1)
compute
[TABLE]
- (2)
return the triple .
For the sake of brevity, we write to indicate that the triple is the output of the above refinement procedure with inputs , , and . We now state an important property of this procedure, whose proof can be found in Appendix A.
Proposition 1
Let a pair of functions satisfying (A1)–(A3) and a quadruple be given and let . Then, and
[TABLE]
where is as in (12).
The above proposition shows that the pair , computed as in (13) and (14), clearly satisfies the inclusion in (11) and that the quantity has an upper bound expressed in terms of the two quantities: and . Given a tolerance , it will be shown in Proposition 2 below that the GD framework stated next generates a sequence of iterates whose corresponding refined sequence obtained as for every yields a –approximate stationary point of (4).
GD framework.
Input: a scalar , a function pair satisfying assumptions (A1)–(A3), an initial point , and a scalar pair ;
- (0)
set and ;
- (1)
find a triple such that its corresponding refined triple
[TABLE]
satisfies
[TABLE]
- (2)
set and go to step 1.
We now make four remarks about the GD framework. First, no termination criterion is added to the GD framework so as to be able to discuss convergence rate results about its generated sequence. A discussion of how to terminate it is given after Proposition 2 below. Second, step 1 should be viewed as an oracle in that it does not specify how to compute the triple . Third, Corollary 1 below shows that if the stepsize is chosen so that the prox subproblem (6) is a strongly convex composite problem, i.e., where is as in (10), the point is chosen as its unique optimal solution, and is set to zero, then the triple satisfies (17) and (18) with and . Thus, when , we conclude that: (i) there always exists a triple satisfying (17) and (18); and, (ii) the GD framework can be viewed as an IPP method. Fourth, the R-AIPP of Section 4, being a special instance of the GD framework, can also be viewed as an IPP method which chooses in the open rectangle and applies an ACG variant, such as the one described in Section 3, to problem (6) in order to obtain a triple satisfying (17) and (18).
The following result shows an important property about the sequence of iterates .
Proposition 2
The sequences of stepsizes and iterate pairs satisfy
[TABLE]
for every , where .
Proof
Let be fixed. The inclusion in (19) follows from Proposition 1 with and the definitions of and in step 1 of the GD framework. To show the inequality in (19), first observe that (17) and the definition of in (4) implies that
[TABLE]
Now, let be arbitrary. In view of step 1 of the GD framework we have . Hence Proposition 1 with and (18) with imply that
[TABLE]
The inequality in (19) now follows by combining (20) and (21).
We now make three remarks about the GD framework in light of Proposition 2. First, if the GD framework stops when a pair such that is found, then it follows from (11) and the inclusion in (19) that is a –approximate stationary point of (4). Second, if the sequence of stepsizes satisfies , then it follows from the inequality in (19) and assumption (A3) that the GD framework indeed stops according to the above termination criterion. Third, (19) indicates that the larger the stepsizes are, the faster the quantity approaches zero.
For the remainder of this section, our goal is to show that the GD framework can be seen as a relaxation of the GIPP framework studied in WJRproxmet1 . The proof of this fact is not essential in establishing any results pertaining to the R-AIPP method in Section 4 or the R-QP-AIPP method in Section 5 and may be skipped without any loss of continuity.
Recall that, for a given and , the GIPP framework in WJRproxmet1 considers a sequence satisfying
[TABLE]
for every . We now state a simple technical result which will not only be used in this section but also later in the analysis of the R-ACG algorithm (see Section 3).
Lemma 1
Assume that and satisfy
[TABLE]
Then, the quantity defined in (15) satisfies .
Proof
Let be computed as in (13) and (15). It follows from (8) and (23) that
[TABLE]
Considering the above inequality at the point , along with some algebraic manipulation, we have
[TABLE]
where the last equality is due to the definitions of and given in (4) and (15), respectively.
The following result shows the relationship between the GIPP framework of WJRproxmet1 and the GD framework of this section.
Proposition 3
If, for some , constant , and index , the quadruple satisfies (22), then satisfies (17) and (18) for any and . As a consequence, if , then every instance of the GIPP framework is an instance of the GD framework for any satisfying
[TABLE]
Proof
The proof that satisfies (17) with can be found in (WJRproxmet1, , Proposition 5(a)). Now, let and observe that from Lemma 1 with and we have . It follows from the last inequality and the inequality in (22) that . Combining the previous inequality with the assumption on now shows that satisfies (18). The second part of the proposition follows immediately from the first part and condition (24).
The above proposition shows that if is bounded and the parameter triple satisfies (24), then the condition for finding an iterate in the GD framework is more relaxed than the condition for finding an iterate in the GIPP framework. As a consequence, under the conditions in (24), an optimization algorithm (such as the R-ACG algorithm of Section 3) applied to (6) is expected to find the triple for the GD framework faster than the quadruple for the GIPP framework.
The following corollary justifies the third remark following the GD framework.
Corollary 1
Let and be given, where is as in (10). Then, (6) has a unique global minimum and the triple where satisfies (17) and (18) with and .
Proof
The existence and uniqueness of the global minimum of (6) follow from the fact that its objective function is strongly convex. Moreover, the fact that is the unique global minimum of (6) implies that the quadruple , where , satisfies (22) with . The conclusion of the corollary now follows immediately from the first part of Proposition 3 with .
3 A relaxed accelerated composite gradient algorithm
This section presents and analyzes an ACG variant, namely, the R-ACG algorithm, which is used as an important tool in the development of the R-AIPP method of Section 4. More specifically, the R-AIPP method can be viewed as a special instance of the GD framework where step 1 is implemented by repeatedly calling the ACG variant of this section.
Before describing the variant, we consider its assumptions as well as the problem that it solves. First, we describe the assumptions. Let be given and assume that it can be decomposed as where:
- (B1)
;
- (B2)
is a differentiable function on such that for some ,
[TABLE]
We now describe our problem of interest in this section.
Problem A: Given satisfying the above assumptions, a point , and a pair of parameters , the problem is to find a triple such that
[TABLE]
The following simple result shows how the ability to solve Problem A allows us to implement the “step 1” oracle in the GD framework.
Proposition 4
Assume that satisfies conditions (A1) and (A2), and let be given. Then the following statements hold:
- (a)
if satisfies (25) with for some , then the triple satisfies (17);
- (b)
if solves Problem A with input for some , then the triple solves step 1 of the GD framework.
Proof
(a) Assume that satisfies (25). It follows from the fact that and the definition of that
[TABLE]
and thus the triple satisfies (17).
(b) Assume that satisfies (26) and define and . Moreover, let be computed as in (15) with as in (13). It follows from Lemma 1, the definition of , the fact that , and the inclusion in (26) that . Using the inequality in (26) and the fact that gives and thus the pair satisfies (18) in view of the definition of . As a consequence, the triple solves step 1 of the GD framework.
The R-ACG algorithm presented below, which is a modified ACG variant for minimizing the function , solves Problem A under the assumption that is convex (see Proposition 5(c) below). As a consequence, it can be used to implement step 1 of the GD framework whenever is sufficiently small. More specifically, since is clearly convex whenever is chosen in , where is as in (10), we can use the R-ACG algorithm to solve Problem A with and , and hence the “step 1” oracle in the GD framework in view of Proposition 4(b). In fact, the AIPP method developed in WJRproxmet1 is an instance of the GIPP framework (and hence an instance of the GD framework) in which, given an upper bound on , it chooses for all and in which step 1 is implemented with a single call to the R-ACG algorithm presented below.
However, our main goal in this paper is the development of an instance of the GD framework which aggressively chooses (possibly) much larger than since, according to Proposition 2, this strategy can potentially reduce its number of iterations. In this regard, the R-ACG algorithm presented below accepts as input a function of the form for some in which is not necessarily convex, and terminates with either failure or by finding a triple satisfying (25) within iterations (see statements (a) and (b) of Proposition 5 below). Clearly, in the second case, the triple is guaranteed to satisfy (17) but not necessarily (18) (see Proposition 4(a)). If (18) is satisfied then the R-ACG algorithm clearly provides a solution of the “step 1” oracle of the GD framework; otherwise, the stepsize is considered large. The R-AIPP method of Section 4 is an instance of the GD framework which attempts to provide a solution of its “step 1” oracle in this manner and adaptively reduces whenever it is found to be large.
R-ACG algorithm.
Input: a scalar , a function pair satisfying assumptions (B1) and (B2), an initial point , and a pair of parameters ;
Output: a triple satisfying (25) or a failure status;
- (0)
set , , , , and define
[TABLE]
- (1)
compute
[TABLE]
and set
[TABLE]
- (2)
if both inequalities
[TABLE]
hold, then go to step 3; otherwise, stop with failure;
- (3)
if both inequalities
[TABLE]
hold, then return ; otherwise, increment and go to step 1.
Some comments about the above algorithm are in order. First, step 1 is essentially a standard step of an ACG variant (see, for example, YHe2 ; WJRproxmet1 ) applied to the problem with the exception that it also computes in (33) the quantities and which, together with , determine the termination criteria for the method. Second, it is shown in (WJRproxmet1, , Lemma 9) that a simplified version of the above algorithm, namely, one that does not include the two tests performed in step 2 and stops whenever the inequality in (22) is satisfied with , implements step 1 of the GIPP framework in WJRproxmet1 . Finally, it is well-known (see, for example, (YHe2, , Proposition 2.3)) that the scalar updated according to (29) satisfies
[TABLE]
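To make the role of the scalar sequence in an ACG step more concrete, the sketch below implements a generic FISTA-type accelerated composite gradient loop in Python. It is not the R-ACG algorithm: it omits the failure tests of step 2 and the termination test of step 3, it assumes a convex smooth part, and the names `acg`, `soft_threshold`, and the toy problem (a diagonal quadratic plus the l1 norm) are assumptions made for illustration. For the recursion used here one can check that the scalar `A` after k iterations is at least k^2 / (4 * lip).

```python
import math


def soft_threshold(v, t):
    # Proximal map of t * |.| applied to the scalar v.
    return math.copysign(max(abs(v) - t, 0.0), v)


def acg(grad_s, prox_n, x0, lip, iters):
    """Generic accelerated composite gradient loop for min psi_s + psi_n.

    grad_s : gradient of the smooth part (assumed lip-Lipschitz)
    prox_n : prox_n(v, t) returns the proximal point of t * psi_n at v
    """
    x = list(x0)
    y = list(x0)
    A = 0.0
    for _ in range(iters):
        # a solves lip * a^2 = A + a, so that lip * a^2 equals the next A.
        a = (1.0 + math.sqrt(1.0 + 4.0 * lip * A)) / (2.0 * lip)
        A_next = A + a
        # Convex combination of the two iterate sequences.
        x_tilde = [(A * yi + a * xi) / A_next for yi, xi in zip(y, x)]
        g = grad_s(x_tilde)
        # Composite gradient (proximal gradient) step from x_tilde.
        y = prox_n([xt - gi / lip for xt, gi in zip(x_tilde, g)], 1.0 / lip)
        # Update of the auxiliary sequence.
        x = [xi + a * lip * (yi - xt) for xi, yi, xt in zip(x, y, x_tilde)]
        A = A_next
    return y, A


def toy_grad_s(x):
    # Gradient of 0.5 * (4 * (x1 - 1)^2 + (x2 + 2)^2); Lipschitz constant 4.
    return [4.0 * (x[0] - 1.0), x[1] + 2.0]


def toy_prox_n(v, t):
    # Proximal map of t * ||.||_1, applied coordinate-wise.
    return [soft_threshold(vi, t) for vi in v]


y_acg, A_acg = acg(toy_grad_s, toy_prox_n, x0=[0.0, 0.0], lip=4.0, iters=500)
```

The minimizer of the toy problem is (0.75, -1), obtained coordinate by coordinate from the optimality conditions, and the returned point is close to it after 500 iterations.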
The next result establishes the iteration-complexity bound and some properties of the R-ACG algorithm.
Proposition 5
The R-ACG algorithm satisfies the following statements:
- (a)
it stops (either with success or failure) in at most
[TABLE]
iterations, where
[TABLE]
- (b)
if it stops with success then its output satisfies
[TABLE]
- (c)
if is convex then it always terminates with success and its output solves Problem A.
Proof
(a) See Appendix A.2.
(b) This follows from the fact that when the R-ACG algorithm stops with success, the last iterate satisfies (37).
(c) It follows from (WJRproxmet1, , Proposition 8(c)) that if is convex, then the iterate satisfies (34) and the inclusion for every . Hence, since the aforementioned inclusion and the definition of in (27) imply (35), we conclude that the R-ACG algorithm does not terminate with failure (see step 2). As a consequence, it follows from statement (a) that it must terminate with success. It then follows from the previous inclusion, and the fact that the last iterate satisfies (36), that fulfills (26).
4 A relaxed accelerated inexact proximal point method
This section states and analyzes a relaxed variant of the AIPP method proposed in WJRproxmet1 , namely, the R-AIPP method, for computing an approximate stationary point of (4) as in (11).
The R-AIPP method stated below is an instance of the GD framework which implements its step 1 by repeatedly invoking the ACG variant in Section 3 and thereby generates the method’s iteration sequence. More specifically, if denotes the previous iterate in the GD framework and then the R-ACG algorithm is invoked to attempt to solve Problem A with curvature , function pair , and initial point given by
[TABLE]
If it succeeds, it obtains a pair which will satisfy condition (25) of Problem A. Consequently, if the triple satisfies (18), then it is a solution of step 1 of the GD framework. If the R-ACG algorithm declares failure or the triple does not satisfy (18), then the stepsize is reduced and the above procedure is repeated.
R-AIPP method.
Input: a tolerance , a scalar , a function pair satisfying assumptions (A1)–(A3), an initial point , a scalar , and a pair of parameters ;
Output: a pair satisfying (11);
- (0)
set and ;
- (1)
apply the R-ACG algorithm to Problem A in Section 3 with inputs , , , and , where
[TABLE]
if the R-ACG algorithm stops with failure then set and repeat this step; otherwise, let denote its output triple and go to step 2;
- (2)
compute through the refinement procedure; if
[TABLE]
then set and go to step 1; otherwise, set
[TABLE]
and go to step 3;
- (3)
if satisfies
[TABLE]
then return ; otherwise, increment and go to step 1.
We now give some comments about the above method. First, it performs two types of iterations, namely, the outer iterations which are indexed by and the inner ones which are performed by the R-ACG algorithm every time it is called in step 1. Second, if the call to the R-ACG algorithm in step 1 does not stop with failure then, by Proposition 5(b), the triple output by the R-ACG algorithm together with the stepsize will satisfy (41) where . Hence, by Proposition 4(a), the triple will satisfy (17). If is also not halved in step 2 then the definition of and Proposition 4(b) imply that the triple also satisfies (18). As a consequence, a single iteration of the R-AIPP method implements step 1 of the GD framework. Third, the termination condition (43) and Proposition 1, with , imply that the required solution, i.e., a pair satisfying (11), is obtained when the R-AIPP method terminates. Fourth, since the R-AIPP iterates implement step 1 of the GD framework, and the sequence is bounded below (see Lemma 2(b) below), Proposition 2 implies that the sequence generated by the R-AIPP method has a subsequence approaching zero, and thus the method must terminate in step 3. Fifth, although the R-AIPP method does not necessarily generate proximal subproblems with convex objective functions, it is shown in Proposition 6 below that it has an iteration-complexity similar to that of the AIPP method of WJRproxmet1 . Finally, in contrast to the aforementioned AIPP method, the R-AIPP method neither requires an upper bound on the quantity in (10) as part of its input nor places any restriction on the initial stepsize .
Each iteration of the R-AIPP method may call the R-ACG algorithm multiple times (possibly just one time). Invocations of the R-ACG algorithm that stop with success are said to be of type while the other invocations are said to be of type . Let (resp., ) denote the total number of R-ACG calls of type (resp., type ). The following technical result provides some basic facts about , and the sequence of stepsizes .
Lemma 2
The following statements hold for the R-AIPP method:
- (a)
if the stepsize for some , then every iteration is of type and, as a consequence, for every ;
- (b)
can be bounded as ;
- (c)
is non-increasing and satisfies for all .
Proof
(a) Since , the definition of in (10) implies that is convex, where is as defined in (42) with . Hence, Proposition 5(c) together with Proposition 4(b) imply that step 1 and step 2 do not halve at the iteration, which is to say that this iteration is of type . Since is clearly nonincreasing, the same conclusion holds true for every iteration . Moreover, as is not halved for subsequent iterations following , it follows that for every .
(b) Using the fact that immediately before each iteration of type , the stepsize is halved, we see that the condition in part (a) would eventually be satisfied for some iteration , and hence is finite. Now, note that if then the inequality in part (b) follows immediately. Assume then that . It now follows from part (a) and the definition of that , which clearly implies the inequality in part (b).
(c) The first statement follows trivially from the update rule of in the R-AIPP method. Now, note that the definition of together with the update rule for imply, for every , that The inequality in part (c) then follows from the inequality in part (b).
In view of Lemma 2(a), choosing an initial stepsize satisfying results in an R-AIPP variant with constant stepsize, which resembles the AIPP method described in WJRproxmet1 .
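The halving argument behind Lemma 2 can be illustrated with a short Python sketch. It counts, in the worst case, how many times a stepsize is halved before it falls below a threshold under which every R-ACG call succeeds. The name `lam_convex`, the function name, and the sample values are assumptions made for illustration only; the actual threshold is the one stated in Lemma 2(a).

```python
import math


def count_halvings(lam0, lam_convex):
    """Worst-case number of halvings needed to reach lam <= lam_convex.

    lam_convex is a hypothetical threshold playing the role of the
    condition in Lemma 2(a): once the stepsize is at or below it, no
    further failures (and hence no further halvings) occur.
    """
    if lam0 <= 0 or lam_convex <= 0:
        raise ValueError("stepsizes must be positive")
    lam, halvings = lam0, 0
    while lam > lam_convex:
        lam *= 0.5  # one failed call halves the stepsize
        halvings += 1
    return halvings, lam


# Illustrative values only: 8 -> 4 -> 2 -> 1 -> 0.5 -> 0.25.
halvings, lam_final = count_halvings(lam0=8.0, lam_convex=0.3)
closed_form = max(0, math.ceil(math.log2(8.0 / 0.3)))
```

In this sketch the count agrees with the logarithmic closed form, and the final stepsize never drops below half of the threshold, which mirrors the kind of lower bound asserted in Lemma 2(c).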
The next proposition presents a worst-case iteration complexity bound on the number of inner iterations of the R-AIPP method with respect to the inputs and , the quantity in (10), and the tolerance .
Proposition 6
Defining , the R-AIPP method outputs a –approximate stationary point of (4) in at most
[TABLE]
inner iterations.
Proof
Let (resp. ) denote the total number of inner iterations performed during all calls of type (resp. type ) (see the paragraph preceding Lemma 2). Clearly, the total number of inner iterations is . We now bound each one of the quantities and separately by using the fact that assumption (A2), (42), and Proposition 5(a) imply that the number of inner iterations performed during each call to the R-ACG algorithm is bounded by
[TABLE]
where is the value of just before the call and is as in (40) with .
We first consider . Note that Lemma 2(b) implies that is finite. Since when , we may assume without loss of generality that . Note that the values of just before the calls of type O are exactly . Hence, we conclude that
[TABLE]
where the second inequality is due to the fact that Lemma 2(b) implies for every . Thus, we obtain
[TABLE]
We now bound . Suppose that and observe that the termination criterion (43) is not satisfied in the first iterations. Since the R-AIPP method is an instance of the GD framework, it follows from Proposition 2 that
[TABLE]
Using the fact that Lemma 2(c) implies and for every , we obtain
[TABLE]
Hence, we conclude that
[TABLE]
It can be easily seen that the bound in (47) trivially holds when in view of the last term in it. Indeed, to prove this, just assume that in the above argument bounding . Now, since , the bound in (44) follows by adding (45) and (47).
The last statement of the proposition follows due to Proposition 1 and the termination condition in step 3 of the R-AIPP method.
Observe that, unless is large or is small, the first term in (44) dominates the second one.
The numerical experiments in Section 6 consider three variants of the R-AIPP method, two of which are R-AIPP instances with different choices of . More specifically, given an upper bound on , one of the R-AIPP instances chooses while the other one chooses . For the problem instances considered, the former choice of is relatively small, while the latter choice is relatively large.
We now end this section by discussing some possible choices of the initial stepsize and how the corresponding R-AIPP instances compare to the AIPP method of WJRproxmet1 . First, the AIPP method requires knowledge of an upper bound on such that , and, as a consequence of a more general iteration complexity bound derived in (WJRproxmet1, , Corollary 14), its inner iteration complexity can be shown to be
[TABLE]
Now, if as above is also known to the R-AIPP and the input is set to , then its inner iteration complexity (44) reduces to
[TABLE]
which is the same as (48) up to a logarithmic factor. On the other hand, if is chosen so that then (44) reduces to
[TABLE]
whose dominant first term is as good as the dominant first term in (48) whenever .
5 A relaxed quadratic penalty AIPP method
This section presents the R-QP-AIPP method for solving a class of linearly–set–constrained nonconvex composite optimization problems. Similar to the QP-AIPP method of WJRproxmet1 , the R-QP-AIPP method is a quadratic penalty–based method that solves a sequence of penalized subproblems, for increasing values of the penalty parameter, using the R-AIPP method of Section 4. The section contains two subsections. The first one describes the main problem of interest, its underlying assumptions, and the notion of a corresponding approximate stationary point which the R-QP-AIPP method will provably obtain, and briefly outlines a cold–started quadratic penalty–based method for obtaining such a point. The second one presents a warm–started quadratic penalty–based method, namely, the R-QP-AIPP method, for obtaining the desired stationary point and establishes its ACG iteration complexity.
5.1 The linearly–set–constrained problem
This subsection describes the main problem of interest in this section, namely, the linearly–set–constrained nonconvex composite optimization problem (51), its underlying assumptions, and a notion of an approximate stationary point of it. Moreover, it describes the quadratic penalty subproblem (parameterized by a penalty parameter) associated with (51) and discusses the relationship between their corresponding approximate stationary points. It then outlines a (static and dynamic) cold–started quadratic penalty–based method and its corresponding iteration-complexity bound, which turns out to be larger than that of the QP-AIPP method of WJRproxmet1 .
The main problem of interest for this section is the linearly–set–constrained nonconvex composite optimization problem
[TABLE]
where closed convex set , linear operator , and functions , satisfy the following assumptions:
- (C1)
and its diameter
[TABLE]
is finite;
- (C2)
and ;
- (C3)
is a nonconvex differentiable function on and there exists a scalar such that
[TABLE]
- (C4)
.
We make two remarks about the above assumptions. First, Lemma 4 in Appendix A.3 shows that (C1), (C3), and the additional assumption that be lower semicontinuous on imply (C4). Second, denoting as the quantity (10) with , assumption (C3) implies that . Moreover, it is shown in Theorem 5.1 below that the smaller is, the better the iteration complexity of the R-QP-AIPP method becomes.
We now discuss a notion of approximate stationary point for (51). Clearly, (51) is equivalent to the problem
[TABLE]
Moreover, a necessary condition for a point to be a local minimum to the above problem is that there exists a multiplier such that
[TABLE]
Given a tolerance pair , a triple is said to be a –approximate stationary point of (1) if it satisfies
[TABLE]
Clearly, a –approximate stationary point of (51) when means that the pair and the multiplier satisfy (55).
We now describe the quadratic penalty subproblem (parameterized by a penalty parameter) with respect to (51). Defining the quadratic penalty function as
[TABLE]
where
[TABLE]
for every , the quadratic penalty subproblem parameterized by a penalty parameter with respect to (51) is
[TABLE]
We now make four remarks regarding (59). First, (3) is an instance of (59) in which . Second, when , the optimal value of (59) coincides with in (C4), and hence there is no abuse of notation made here. Third, it is easily seen that
[TABLE]
where is as in (51). Finally, (59) is a penalty subproblem involving only the original variable of formulation (51) rather than the one associated with (54) (constructed as in Section 1 with replaced by ), which involves the pair of variables .
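As a concrete illustration of a quadratic penalty term of this kind, the Python sketch below evaluates a penalty of the form (c/2) dist²(Az, S) and its gradient c Aᵀ(Az − Π_S(Az)), relying on the standard fact that half the squared distance to a closed convex set is differentiable with gradient y − Π_S(y). The box-shaped set, the function names, and the sample data are assumptions chosen for illustration; they are not taken from the paper.

```python
import numpy as np


def project_box(y, lo, hi):
    """Euclidean projection onto the box [lo, hi]^m (a hypothetical choice of S)."""
    return np.clip(y, lo, hi)


def quad_penalty(z, A, c, project):
    """Return (c/2) * dist(Az, S)^2 and its gradient with respect to z.

    `project` is any callable computing the Euclidean projection onto S.
    """
    Az = A @ z
    r = Az - project(Az)          # residual of Az with respect to S
    value = 0.5 * c * float(r @ r)
    grad = c * (A.T @ r)          # chain rule applied to y -> dist(y, S)^2 / 2
    return value, grad


# Illustrative data only.
A = np.array([[1.0, 2.0], [0.0, -1.0], [3.0, 1.0]])
z = np.array([0.7, -1.3])
proj = lambda y: project_box(y, -1.0, 1.0)
val, g = quad_penalty(z, A, 2.0, proj)
```

For the sample data, Az = (−1.9, 1.3, 0.8), so only the first two coordinates violate the box and contribute to the penalty.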
The following result shows how a –approximate stationary point of (59) yields a –approximate stationary point of (51) when the penalty parameter is sufficiently large.
Proposition 7
Let and be given and suppose that is a –approximate stationary point of (59) as in (11) with . Moreover, let be as in (10) with and define
[TABLE]
where and are as in (51) and (C4), respectively. Then, the following statements hold:
- (a)
for every , the pair satisfies (9);
- (b)
the triple satisfies the inclusions and the first inequality of (56) and
[TABLE]
- (c)
if, in addition, the penalty parameter satisfies
[TABLE]
then , and hence is a –approximate stationary point of (51).
Proof
Throughout this proof, we will make use of the well known fact (see, for example, (beck2017first, , Theorems 6.39 & 6.60)) that is convex, differentiable, its gradient is –Lipschitz, and, for every ,
[TABLE]
(a) This follows immediately from the definition of in (61), assumption (C3), and the fact that is 1–Lipschitz continuous.
(b) Using the definitions of and given in (62), and the fact that (65) at implies , observe that: (i) ; and (ii) . It now follows from the definition of a –approximate stationary point of (59) with and the previous observations that
[TABLE]
Hence, with the additional fact that from (11), it follows that the inclusions and first inequality of (56) hold. Next, observe that the convexity of and the first inclusion in (5.1) imply that or equivalently,
[TABLE]
Considering (67) at any and using the fact that for any , the definition of in (10), and the definitions of and , we conclude that
[TABLE]
Taking the infimum over immediately implies (63).
(c) Using (64), the fact that , and the definition of , it follows from part (b) that
[TABLE]
In view of the above proposition, we now outline a static penalty method for obtaining a –approximate stationary point of (51). First, let be given and select a penalty parameter satisfying (64). Second, obtain a –approximate stationary point of (59) using the R-AIPP method of Section 4 with starting point and inputs and , which satisfy assumptions (A1)–(A3) in view of Proposition 7(a) and assumptions and . Finally, compute the pair according to (62) and output the triple , which is a –approximate stationary point of (51) in view of Proposition 7(c). Using (61) with , the definitions in (61), the fact that , and the complexity bound for the R-AIPP method described in Proposition 6 with , it is easy to see that the ACG iteration complexity of the outlined method is
[TABLE]
where and the last quantity ignores any constants aside from the tolerances. A drawback of this static penalty method is that it requires in its first step the selection of a single parameter , which is generally difficult to obtain. This issue can be circumvented by considering a dynamic cold–started penalty method in which the static penalty method is repeated for a sequence of increasing values of and common starting point . It can be shown that the resulting cold–started dynamic penalty method has an ACG iteration complexity that is still on the same order as (68). Note that the bound (68) is actually when (see (C2)) but our interest lies in the case where since an initial point is generally not known.
The QP-AIPP method of WJRproxmet1 is a modified cold–started dynamic penalty method like the one just outlined, but which replaces the R-AIPP method called in step 2 of the static penalty method with the AIPP method of WJRproxmet1 . It has been shown in (WJRproxmet1, , Theorem 18) that its ACG iteration complexity bound for finding a –approximate stationary point of (1) is . This bound is established without assuming that is bounded and is clearly better than the one in (68).
The next subsection considers a warm–started dynamic penalty method, similar to the one described immediately after Proposition 7, in which the input to the R-AIPP call for solving the next penalty subproblem is chosen to be the output from the R-AIPP call for solving the current one. It is shown in Theorem 5.1 of Subsection 5.2 that its ACG iteration complexity is , which is the same as the one for the QP-AIPP method up to a logarithmic factor. As a side remark, we note that although a warm–started version of the QP-AIPP method in WJRproxmet1 can be also considered, the aforementioned ACG iteration complexity bound was derived for its cold–started version.
5.2 The R-QP-AIPP method
The goal of this subsection is to describe the R-QP-AIPP method, i.e., the warm–started dynamic penalty method mentioned at the end of Subsection 5.1, and establish its corresponding ACG iteration complexity.
We start by describing the R-QP-AIPP method.
R-QP-AIPP method.
Input: a problem instance of the form in (51), a scalar , a tolerance pair , an initial point , a scalar , and a pair of parameters ;
Output: a triple satisfying (56);
- (0)
set and ;
- (1)
set and
[TABLE]
call the R-AIPP method on (4) with inputs , , , , , and , to obtain a -approximate stationary point of (4), and set
[TABLE]
- (2)
if the residual
[TABLE]
then return ; otherwise, set , increment , and go to step 1.
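The overall loop structure (solve a penalized subproblem, test feasibility, double the penalty parameter, warm start) can be sketched in Python as follows. This is only a sketch under stated assumptions: the inner solver is plain gradient descent standing in for the R-AIPP method, the composite term is taken to be zero, the multiplier is formed as q = c(Az − Π_S(Az)), and all function names and the toy problem are hypothetical.

```python
import numpy as np


def solve_penalized(z0, grad_f, L_f, A, c, project, rho, max_iter=100_000):
    """Stand-in for the R-AIPP call: plain gradient descent (NOT R-AIPP)."""
    step = 1.0 / (L_f + c * np.linalg.norm(A, 2) ** 2)
    z = z0.copy()
    for _ in range(max_iter):
        r = A @ z - project(A @ z)
        g = grad_f(z) + c * (A.T @ r)   # gradient of the penalized objective
        if np.linalg.norm(g) <= rho:
            break
        z = z - step * g
    return z


def warm_started_penalty(z0, grad_f, L_f, A, project, c1, rho, eta, max_loops=60):
    """Dynamic penalty loop: double c until the feasibility residual is <= eta."""
    z, c = z0.copy(), c1
    for _ in range(max_loops):
        z = solve_penalized(z, grad_f, L_f, A, c, project, rho)  # warm start at z
        p = project(A @ z)
        q = c * (A @ z - p)             # assumed form of the multiplier estimate
        if np.linalg.norm(A @ z - p) <= eta:
            return z, p, q, c
        c *= 2.0
    raise RuntimeError("penalty loop did not reach the feasibility tolerance")


# Toy problem (hypothetical): minimize 0.5*||z - a||^2 subject to z1 + z2 = 1.
a = np.array([2.0, 0.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
z, p, q, c = warm_started_penalty(
    z0=np.zeros(2), grad_f=lambda z: z - a, L_f=1.0, A=A,
    project=lambda y: b, c1=1.0, rho=1e-8, eta=1e-2)
```

On this toy problem the feasibility residual of the penalized minimizer equals 1/(1 + 2c), so the loop stops at the first doubled value of c for which this quantity falls below the feasibility tolerance, and the multiplier estimate approaches the exact multiplier 0.5 as c grows.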
Before giving some remarks about the above method, we discuss its general structure. Every loop of the R-QP-AIPP method invokes in its step 1 the R-AIPP method of Section 4 to compute a -approximate stationary point of the current penalty subproblem (59). The latter method in turn uses the R-ACG algorithm of Section 3 as a subroutine in its implementation (see step 1 of the R-AIPP method). Moreover, step 1 of the R-QP-AIPP implements a warm–start strategy, namely, the input point of the current R-AIPP call is set to be the output point of the previous R-AIPP call.
We now make three remarks about the R-QP-AIPP method. First, it follows from Proposition 7(b) that, for every , the triple satisfies the inclusions and the first inequality in (56). Second, since every loop of the R-QP-AIPP method doubles , the condition (64) will be eventually satisfied. Hence, in view of Proposition 7(c), the pair corresponding to this will satisfy the condition and the R-QP-AIPP method will stop in view of its stopping criterion in step 2. Finally, in view of the first and second remarks, we conclude that the R-QP-AIPP method outputs a triple satisfying (56).
Before deriving the ACG iteration complexity of the R-QP-AIPP method, we note that the number of ACG iterations needed in the execution of its step 1 depends on the quantity (see the left–hand–side of (68) with ). The result below shows that the warm–start strategy in step 1 of the method together with the boundedness of imply that the aforementioned quantity has an upper bound that is independent of the size of the parameter .
Lemma 3
Let and be as in step 0 and the input of the R-QP-AIPP method, respectively, and define
[TABLE]
where and are as in (59) and (62), respectively. Then, for every , we have
[TABLE]
Proof
The case in which follows trivially from the definition of in (69). Consider now the case in which . Remark that due to step 2 of R-QP-AIPP and (59) and that is a –approximate stationary point of (59) with due to the warm–start strategy in step 1 of the R-QP-AIPP method. It now follows from the aforementioned remarks, the last inequality in (60) with , and Proposition 7(b) with , that
[TABLE]
Grouping terms in the last expression together, using the definition of given in (69), and the fact that , we conclude that
[TABLE]
Combining (71) and (72) yields (70).
The following result establishes the iteration complexity of the R-QP-AIPP method with respect to the inputs and , the quantity in (10) with , and the tolerance pair .
Theorem 5.1
Given a tolerance pair , define
[TABLE]
where is given in (62). Then, defining , the R-QP-AIPP method outputs a –approximate stationary point of (51) in at most
[TABLE]
ACG iterations, where is as in (69).
Proof
Define and let be the smallest index such that . Since the R-QP-AIPP invokes the R-AIPP method with , it follows from Lemma 3 and Proposition 6, with , that the total number of ACG iterations at the iteration of the R-QP-AIPP method is on the order of
[TABLE]
Hence, the R-QP-AIPP method stops in a total number of ACG iterations bounded above by the sum of the quantity in (75) over .
We now focus on simplifying some of the quantities in the aforementioned sum. Using the fact that , we obtain the bound
[TABLE]
Now, if , then the above inequality implies that . Assume then that . Observe that the definition of implies that or, equivalently, . Combining the previous inequality with (76), we conclude that
[TABLE]
and also
[TABLE]
It now follows from (75), (77), and (78) that the R-QP-AIPP method stops in a total number of ACG iterations bounded by the quantity in (74).
The statement that is a –approximate stationary point follows from Proposition 7(b) and the termination condition in step 2 of the R-QP-AIPP method.
We now make two remarks about the complexity bound in (74). First, in terms of the tolerance pair , it is , which improves upon the complexity in (68) by a factor. Second, unless is large or is small, the first term in (74) dominates the second one.
We now end this section by discussing some possible choices of the initial stepsize and how the corresponding R-QP-AIPP instances compare to the QP-AIPP method of WJRproxmet1 . First, recall that the QP-AIPP method requires the knowledge of an upper bound on such that , and remark that, under the same assumptions of this paper, it can be shown using (WJRproxmet1, , Theorem 18) that its ACG iteration complexity is
[TABLE]
Now, if as above is also known to the R-AIPP and the input is set to , then the ACG iteration complexity (74) reduces to
[TABLE]
which is the same as (79) up to a logarithmic factor. On the other hand, if is chosen so that then (74) reduces to
[TABLE]
whose dominant first term is as good as the dominant first term in (79) when .
6 Numerical experiments
This section presents computational results that highlight the performance of the R-AIPP and R-QP-AIPP methods. It contains three subsections. The first subsection compares three variants of the R-AIPP method against three state-of-the-art nonconvex composite optimization algorithms. The second subsection uses the six algorithms in the first subsection as subroutines in a quadratic penalty method similar to the one in Section 5. More specifically, given an algorithm out of the six algorithms in the first subsection, a corresponding quadratic penalty method is considered in which steps 0 to 2 of the R-QP-AIPP method in Section 5 are executed with algorithm replacing the R-AIPP method. The third subsection presents a summary of the numerical experiments.
We first describe the three different R-AIPP variants considered. While the second variant does not assume knowledge of an upper bound on the quantity in (10), the first and third variants do in order to determine their initial stepsize . More specifically, the first variant, referred to as R-AIPPc, is the R-AIPP method with initial stepsize chosen to be . As opposed to the two algorithms explained below, which can adaptively change between iterations, this algorithm is a constant stepsize method (see Lemma 2 and the paragraph following it). The second variant, referred to as R-AIPPv1, is the R-AIPP method with initial stepsize chosen to be . Since is relatively large in the experiments considered, is halved in some of its outer iterations. The third variant, referred to as R-AIPPv2, is a variant of the R-AIPP method with initial stepsize chosen to be . This variant modifies the R-AIPP method by adding conditions that allow the stepsize to increase between subproblems. More specifically, the R-AIPPv2 method doubles the value of at the end of iteration when: (a) has never been halved in step 1 or 2 and (b) the number of inner iterations performed by the R-ACG algorithm in step 1 is less than 250. All R-AIPP variants are run with , a problem–specific value of , and adaptively estimate the constant that is used in each iteration of the R-ACG algorithm.
We now make three remarks about the above R-AIPP variants and the AIPP method of WJRproxmet1 . First, while both the R-AIPPc and AIPP method choose the stepsizes to be constant, the former method differs from the latter one in that it uses a more relaxed criterion, i.e., (17) and (18), for solving the prox subproblem (6). Moreover, the limited numerical experiments in Appendix A.4 show that this relaxation drastically improves upon the efficiency of the AIPP method, regardless of the magnitude of the ratio . As we believe that this effect would be observed in the other problem instances of this section, we choose not to include the AIPP method as part of our suite of benchmark algorithms for the sake of brevity. Second, the R-AIPPv1 and R-AIPPv2 methods differ from the R-AIPPc method in that they permit the stepsizes to be significantly larger than the constant ones chosen for the R-AIPPc method. As will be observed in the numerical experiments below, this can drastically improve the efficiency of the adaptive stepsize R-AIPP variants. Third, in view of the descriptions of the R-AIPP variants in the previous paragraph, both the R-AIPPc and R-AIPPv1 methods are instances of the R-AIPP method while the R-AIPPv2 method is not. However, the R-AIPPv2 method is clearly an instance of the GD framework, and hence a similar analysis to the one in Section 4 may be used to establish its ACG iteration complexity. For the sake of brevity, we omit its analysis in this paper.
We now describe the three other nonconvex composite optimization algorithms considered. The first algorithm is an implementation of the unified problem-parameter free accelerated gradient (UPFAG) method that is proposed and analyzed in Ghadimi2019 . The particular implementation considered is the UPFAG-fullBB method, which utilizes a Barzilai–Borwein type stepsize selection strategy and is described in (Ghadimi2019, , Section 4). Its input parameters include and . The second algorithm is an implementation of the NC-FISTA method in liang2019fistatype . The particular implementation considered uses input parameters . The third algorithm is an implementation of the accelerated gradient (AG) method that is proposed and analyzed in nonconv_lan16 . The particular implementation considered is Algorithm 2, which is described in (nonconv_lan16, , Section 2).
Finally, we state some additional details about the numerical experiments. First, for each linearly–set–constrained problem of the form given in (51), the quadratic penalty method used to solve it starts with the initial penalty parameter chosen to be . Second, each algorithm is run with a time limit of 4000 seconds. If an algorithm does not terminate with a solution for a particular problem instance, we do not report any details about its iteration count or function value at the point of termination and the runtime for that instance is marked with a [*] symbol. Third, the iterations listed in the tables in this section include backtracking iterations if a parameter line search method is used as part of the algorithm. Finally, all algorithms described at the beginning of this section are implemented in MATLAB 2019a and are run on Linux 64-bit machines each containing Xeon E5520 processors and at least 8 GB of memory.
6.1 Unconstrained problems
This subsection examines the performance of the R-AIPP method as a nonconvex composite optimization solver for solving problems of the form given in (4). Given a function pair satisfying assumptions (A1)–(A3) with , tolerance , and an initial point , each algorithm seeks a pair satisfying
[TABLE]
Two problems are considered, namely: (i) the quadratic matrix problem; and (ii) the support vector machine problem in Ghadimi2019 .
All methods that terminated within 4000 seconds converged to the same objective value, which, for each table in this subsection, is given in a column labeled . The bold numbers in each of the aforementioned tables highlight the algorithm that performed the most efficiently in terms of iteration count or total runtime.
6.1.1 Quadratic matrix problem
Given a pair of dimensions , scalar pair , linear operators and defined by
[TABLE]
for matrices , positive diagonal matrix , and vector , this sub–subsection considers the following quadratic matrix (QM) problem:
[TABLE]
where denotes the –dimensional spectraplex.
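Since the feasible set here is a spectraplex, i.e., the set of symmetric positive semidefinite matrices with unit trace, projections onto it are a natural building block for any proximal-type solver applied to this problem. The Python sketch below shows one standard way to compute such a projection: diagonalize the matrix and project its eigenvalues onto the unit simplex with a sort-based routine. The function names are hypothetical and the code is an illustration, not the implementation used in the experiments.

```python
import numpy as np


def project_simplex(v):
    """Euclidean projection of a vector onto the unit simplex (sort-based)."""
    n = v.size
    u = np.sort(v)[::-1]                    # entries in decreasing order
    css = np.cumsum(u)
    ks = np.arange(1, n + 1)
    cond = u - (css - 1.0) / ks > 0         # holds for k = 1 at least
    k = ks[cond][-1]                        # largest index satisfying the test
    tau = (css[k - 1] - 1.0) / k            # shift that makes the result sum to 1
    return np.maximum(v - tau, 0.0)


def project_spectraplex(X):
    """Projection (Frobenius norm) of a square matrix onto {Z PSD, trace(Z) = 1}."""
    S = 0.5 * (X + X.T)                     # symmetrize first
    w, V = np.linalg.eigh(S)
    w_proj = project_simplex(w)             # project the spectrum
    return (V * w_proj) @ V.T               # V diag(w_proj) V^T


# Illustrative input only.
P = project_spectraplex(np.diag([2.0, 0.0]))
```

The cost is dominated by the eigendecomposition, which is one reason the dimension of the matrix variable matters in experiments of this type.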
We now describe the experiment parameters for the instances considered. First, the dimensions were set to be and only 2.5% of the entries of the submatrices and being nonzero. Second, the entries of , and (resp., ) are generated by sampling from the uniform distribution (resp., ). Third, the initial starting point is , where is the -dimensional identity matrix. Fourth, with respect to the termination criterion (82), the inputs, for every , are
[TABLE]
Fifth, the R-AIPP variants used a parameter value of . Finally, each problem instance considered is based on a specific curvature pair for which the scalar pair is selected so that and .
We now present the numerical tables for this set of problem instances. We start with instances in which is fixed.
We now present instances where .
6.1.2 Support vector machine problem
Given a pair of dimensions , matrix and vector this sub–subsection considers the following (sigmoid) support vector machine (SVM) problem
[TABLE]
where denotes the column of .
We now describe the experiment parameters for the instances considered. First, the entries of are generated by sampling from the uniform distribution , with only 5% of the entries being nonzero, and where the entries of are sampled from the uniform distribution over the –dimensional ball centered at 0 with radius 50. Second, the initial starting point is . Third, the curvature parameters for each problem instance are Fourth, with respect to the termination criterion (82), the inputs, for every , are
[TABLE]
Fifth, the R-AIPP variants used a parameter value of . Finally, each problem instance considered is based on a specific dimension pair .
We now present the numerical tables for this set of problem instances.
6.2 Linearly constrained problems
This subsection examines the performance of the R-QP-AIPP method as a nonconvex linearly–set–constrained composite optimization solver for solving problems of the form given in (51). Given a linear operator , convex set , function pair satisfying assumptions (C1)–(C3), tolerance pair , and an initial point , each algorithm seeks a triple satisfying
[TABLE]
Three problems are considered, namely: (i) the linearly–constrained quadratic matrix problem; (ii) the sparse principal component analysis problem in NIPS2014_5615 ; and (iii) the bounded matrix completion problem in yao2017efficient .
The bold numbers in each of the tables in this subsection highlight the algorithm that performed the most efficiently in terms of iteration count or total runtime.
6.2.1 Linearly–constrained quadratic matrix problem
Given a pair of dimensions , scalar pair , linear operators , , and defined by
[TABLE]
for matrices , positive diagonal matrix , and vector pair , this sub–subsection considers the following linearly–constrained quadratic matrix (LCQM) problem:
[TABLE]
where denotes the –dimensional spectraplex.
We now describe the experiment parameters for the instances considered. First, the dimensions were set to be and only 1.0% of the entries of the submatrices and being nonzero. Second, the entries of , and (resp., ) were generated by sampling from the uniform distribution (resp., ). Third, the initial starting point was chosen to be a random point in . More specifically, three unit vectors and three scalars are first generated by sampling vectors and scalars and setting and for . The initial iterate for the first subproblem is then set to . Fourth, with respect to the termination criterion (82), the inputs, for every , are
[TABLE]
Fifth, the R-AIPP variants used a parameter value of . Finally, each problem instance considered is based on a specific curvature pair for which the scalar pair is selected so that and .
We now present the numerical tables for this set of problem instances.
6.2.2 Sparse principal component analysis problem
Given integer , positive scalar pair , and matrix , this sub–subsection considers the following sparse principal component analysis (PCA) problem:
[TABLE]
where denotes the –Fantope and is the minimax concave penalty (MCP) function given by
[TABLE]
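For readers who want to experiment with the penalty, the Python sketch below implements the minimax concave penalty in its commonly used form: ν|t| − t²/(2b) for |t| ≤ bν and the constant bν²/2 otherwise. The parameter names `nu` and `b` and this exact parametrization are assumptions based on the standard definition; they should be checked against the formula displayed above before being relied upon.

```python
import numpy as np


def mcp(t, nu, b):
    """Minimax concave penalty (standard form; parametrization assumed).

    Equals nu*|t| - t^2/(2b) on |t| <= b*nu and the constant b*nu^2/2
    beyond that point, so it is continuous and flat for large |t|.
    """
    if nu <= 0 or b <= 0:
        raise ValueError("nu and b must be positive")
    a = np.abs(np.asarray(t, dtype=float))
    inner = nu * a - a**2 / (2.0 * b)
    outer = 0.5 * b * nu**2
    return np.where(a <= b * nu, inner, outer)


# Illustrative evaluation on a few points with nu = 1 and b = 2.
values = mcp(np.array([0.0, 1.0, 2.0, 5.0, -5.0]), nu=1.0, b=2.0)
```

In this form the penalty behaves like a scaled absolute value near zero and saturates for large arguments, which is what makes it nonconvex.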
We now describe the experiment parameters for the instances considered. First, the scalar parameters are chosen to be . Second, the matrix is generated according to an eigenvalue decomposition , based on a parameter pair , where is as in the problem description and is a positive integer. In particular, we choose , the first column of to be a sparse vector whose first entries are , and the other entries of to be sampled randomly from the standard Gaussian distribution. Third, the initial starting point is where is a diagonal matrix whose first entries are 1 and whose remaining entries are 0. Fourth, the curvature parameters for each problem instance are Fifth, with respect to the termination criterion (82), the inputs, for every , are
[TABLE]
Sixth, the R-AIPP variants used a parameter value of . Finally, each problem instance considered is based on a specific parameter pair where is part of the process of generating (see the second description above).
We now present the numerical tables for this set of problem instances.
6.2.3 Bounded matrix completion problem
Given a dimension pair , positive scalar triple , scalar pair , matrix , and indices , this sub–subsection considers the following bounded matrix completion (BMC) problem:
[TABLE]
where denotes the nuclear norm, the function is the linear operator that zeros out any entry not in , the function denotes the largest singular value of , and
[TABLE]
We now describe the experiment parameters for the instances considered. First, the matrix is the user–movie ratings data matrix of the MovieLens 100K dataset (containing 610 users and 9724 movies; see https://grouplens.org/datasets/movielens/), the index set is the set of nonzero entries in , and the dimension pair is set to be . Second, the initial starting point was chosen to be . Third, the curvature parameters for each problem instance are and and the bounds are set to . Fourth, with respect to the termination criterion (82), the inputs, for every , are
[TABLE]
Fifth, the R-AIPP variants used a parameter value of . Finally, each problem instance considered is based on a specific parameter triple .
We now present the numerical tables for this set of problem instances.
6.3 Summary of the numerical experiments
All three variants of the R-AIPP method perform well (relative to the other methods) in the numerical experiments of this section. The R-AIPPv2 method, in particular, is the best performing method in a large proportion of both the unconstrained and constrained problem instances. A potential explanation is that the stepsizes generated by this method may become significantly larger than the initial stepsize parameters and used in the R-AIPPv1 and R-AIPPc methods, respectively, which in view of the third remark following Proposition 2, speeds up the convergence of the quantity to zero.
Moreover, the adaptive stepsize R-AIPP variants, namely, the R-AIPPv1 and R-AIPPv2 methods, have been shown to perform well regardless of the size of the ratio (see, for example, Tables 1–4). This is a significant improvement over the AIPP method of WJRproxmet1 which has only been shown to perform well when the ratio is large (see, for example, Table 13).
7 Concluding remarks
Observing the arguments used in the proofs of Proposition 7, Lemma 3, and Theorem 5.1, it is straightforward to see that the assumption of being bounded can be relaxed to assuming that the iterates generated by the R-QP-AIPP method of Section 5 be bounded. Explicitly assuming that the iterates satisfy , for every and some , the resulting ACG iteration complexity of the R-QP-AIPP method is (74) with replaced by the quantity
[TABLE]
where is as in step 0 of the method, , the quantity is as in (10) with , and the quantities and are from the input of the R-QP-AIPP method and (59). It should be noted, however, that we were not able to show that the iterates are bounded. Hence, it is still an open problem to establish the iteration complexity of R-QP-AIPP when is unbounded.
Note that the description of the R-AIPP (resp. R-QP-AIPP) method of Section 4 (resp. Section 5) does not actually require knowledge of an upper bound on the parameter in (10). This is in contrast to the AIPP (resp. QP-AIPP) method of WJRproxmet1 , which requires in order to establish its validity and iteration complexity. In addition, one could consider an R-AIPP (resp. R-QP-AIPP) variant in which the quantity (resp. ) is adaptively inferred from its iterates rather than requiring knowledge of its value beforehand. While for the sake of brevity we omit the formal description and analysis of such a variant in this paper, we conjecture that the iteration complexity of the R-AIPP (resp. R-QP-AIPP) variant is as in (44) (resp. (74)) with (resp. ) replaced with a quantity that lower bounds it, e.g., the maximum of the lower estimates of (resp. ) which are inferred by the generated iterates.
Appendix A
This appendix contains proofs and statements of several technical results used in the main body of the paper, together with additional computational results. It contains four subsections. The first subsection consists of proofs about the refinement procedure of Section 2; the second subsection consists of proofs about the R-ACG algorithm of Section 3; the third subsection consists of technical results related to Section 5; and the fourth subsection compares the AIPP method of WJRproxmet1 with the R-AIPPc method of Section 6.
A.1 Properties of the refinement procedure
Proof (of Proposition 1)
It follows from (WJRproxmet1, Lemma 19) with that and
[TABLE]
Dividing by and rearranging terms yields
[TABLE]
Adding to both sides and using the definition of gives
[TABLE]
which is the inclusion in (16).
We now bound . Since (WJRproxmet1, Lemma 19) implies that and is –Lipschitz continuous, it follows that
[TABLE]
which is the inequality in (16).
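The structure of this argument (an inclusion obtained by rearranging an optimality condition, followed by a norm bound that uses Lipschitz continuity of a gradient) can be illustrated with a generic composite gradient step. The sketch below is a simplified analogue under stated assumptions, not the refinement procedure of Section 2: it takes a smooth function `f` whose gradient is `L`-Lipschitz and a convex function `h` with an available proximal map, and all names (`refine`, `grad_f`, `prox_h`, `L`) are hypothetical. Under these assumptions the output satisfies v_hat ∈ ∇f(z_hat) + ∂h(z_hat) and ||v_hat|| ≤ 2L ||z − z_hat||.

```python
import numpy as np


def refine(z, grad_f, prox_h, L):
    """One composite gradient step from z plus its stationarity residual.

    z_hat minimizes <grad_f(z), u> + h(u) + (L/2)||u - z||^2, so that
    L*(z - z_hat) - grad_f(z) lies in the subdifferential of h at z_hat.
    Adding grad_f(z_hat) to both sides gives v_hat in grad_f(z_hat) + dh(z_hat).
    """
    z_hat = prox_h(z - grad_f(z) / L, 1.0 / L)
    v_hat = L * (z - z_hat) + grad_f(z_hat) - grad_f(z)
    return z_hat, v_hat


# Illustrative instance (hypothetical): f(x) = 0.5*||x - a||^2 and
# h = indicator of the box [0, 1]^3, whose prox is a projection (clipping).
a = np.array([2.0, -1.0, 0.3])
grad_f = lambda x: x - a                    # 1-Lipschitz gradient
prox_h = lambda x, t: np.clip(x, 0.0, 1.0)  # projection is independent of t
L = 2.0                                     # any upper bound on the Lipschitz constant
z = np.array([0.5, 0.5, 0.5])
z_hat, v_hat = refine(z, grad_f, prox_h, L)

# Element of the normal cone of the box at z_hat implied by the inclusion.
normal = v_hat - grad_f(z_hat)
```

For the box indicator, the inclusion means that `normal` is nonnegative at coordinates where `z_hat` sits at the upper bound, nonpositive at the lower bound, and zero in the interior, which can be checked directly on this instance.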
A.2 Properties of the R-ACG algorithm
Proof (of Proposition 5(a))
Let denote the quantity in (39). Assume that the R-ACG algorithm has performed -iterations without declaring failure. In view of step 2 of the R-ACG algorithm, it follows that both (34) and (35) hold for every . We will show that it must stop successfully at the end of the iteration, and hence that the conclusion of the proposition holds. Indeed, note that (38), (39), and the fact that for all imply that
[TABLE]
Combining the triangle inequality, (34), the fact that and from (86), and the relation for all , we obtain
[TABLE]
On the other hand, using the triangle inequality and the fact that for every (under the choice of ), we obtain
[TABLE]
Combining the previous estimates, we then conclude that
[TABLE]
which, after a simple algebraic manipulation, easily implies that
[TABLE]
Using the first term in the maximum of (40) together with the second inequality of (88) immediately implies that (36) holds with . To show that (37) holds at , observe that the definition of in (27), (35) with , the second inequality of (88), and the second term in the maximum of (40) imply that
[TABLE]
A.3 Results related to Section 5
Lemma 4
Assume that satisfy assumptions (C1) and (C3) in Section 5, and that, in addition, is lower semicontinuous on . Then, is a proper lower semicontinuous function which has a global minimum over .
Proof
Suppose . Since is closed, there exists such that for every satisfying . Hence, . Now suppose . By the lower semicontinuity of and we have
[TABLE]
and, since is differentiable on , the function is proper lower semicontinuous with . The last statement of the lemma follows from the well-known fact that the infimum of a lower semicontinuous function over a bounded set, namely, , is always attained.
A.4 Comparison with the AIPP method
This subsection presents some computational results that compare the AIPP method of WJRproxmet1 with the R-AIPPc method described at the beginning of Section 6. The main problem of interest for this subsection is the quadratic matrix problem described in Sub-subsection 6.1.1.
We now describe the particular implementation of the AIPP method used in this subsection, which differs from its description in WJRproxmet1 in two ways. First, its innermost subroutine, namely, the ACG method, stops immediately when a quadruple satisfying (22) is found. Second, for each iteration of the method, a triple is generated from the refinement procedure in Section 2 by assigning , and the method stops with the desired output when satisfies condition (82).
All experiment parameters for the R-AIPPc method and the problem instances are as described in Sub-subsection 6.1.1, while the AIPP method uses a parameter input of for its results.
We now present the numerical tables for this set of problem instances.
References
- [1] A. Beck. First-order methods in optimization, volume 25. SIAM, 2017.
- [2] N.S. Aybat and G. Iyengar. A first-order smoothed penalty method for compressed sensing. SIAM J. Optim., 21(1):287–313, 2011.
- [3] N.S. Aybat and G. Iyengar. A first-order augmented Lagrangian method for compressed sensing. SIAM J. Optim., 22(2):429–459, 2012.
- [4] A. Beck and M. Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci., 2(1):183–202, 2009.
- [5] Y. Carmon, J.C. Duchi, O. Hinder, and A. Sidford. Accelerated methods for nonconvex optimization. SIAM J. Optim., 28(2):1751–1772, 2018.
- [6] C. Cartis, N. Gould, and P. Toint. On the complexity of steepest descent, Newton’s and regularized Newton’s methods for nonconvex unconstrained optimization problems. SIAM J. Optim., 20(6):2833–2852, 2010.
- [7] Y. Chen, G. Lan, and Y. Ouyang. Optimal primal-dual methods for a class of saddle point problems. SIAM J. Optim., 24(4):1779–1814, 2014.
- [8] D. Drusvyatskiy and C. Paquette. Efficiency of minimizing compositions of convex functions and smooth maps. Math. Program., pages 1–56, 2018.
