Exact augmented Lagrangian functions for nonlinear semidefinite programming
Ellen H. Fukuda, Bruno F. Lourenço

TL;DR
This paper develops a unified framework for constructing exact augmented Lagrangian functions for nonlinear semidefinite programming, enabling reformulation into unconstrained problems with proven differentiability and exactness.
Contribution
It generalizes previous work to NSDP, introduces a practical exact augmented Lagrangian function, and proves its properties under nondegeneracy conditions.
Findings
The proposed augmented Lagrangian is continuously differentiable.
The function is exact under nondegeneracy conditions.
Preliminary numerical experiments demonstrate its effectiveness.
Abstract
In this paper, we study augmented Lagrangian functions for nonlinear semidefinite programming (NSDP) problems with exactness properties. The term exact is used in the sense that the penalty parameter can be taken appropriately, so a single minimization of the augmented Lagrangian recovers a solution of the original problem. This leads to reformulations of NSDP problems into unconstrained nonlinear programming ones. Here, we first establish a unified framework for constructing these exact functions, generalizing Di Pillo and Lucidi's work from 1996, that was aimed at solving nonlinear programming problems. Then, through our framework, we propose a practical augmented Lagrangian function for NSDP, proving that it is continuously differentiable and exact under the so-called nondegeneracy condition. We also present some preliminary numerical experiments.
| | Iterations | Evaluations | Initial | Final | Time (s) |
|---|---|---|---|---|---|
| 5 | 114.62 | 371.22 | 21.98 | 805.2 | 0.208 |
| 10 | 520.96 | 1844.62 | 23.31 | 1000.0 | 1.923 |
| 15 | 1191.62 | 4297.74 | 24.77 | 1000.0 | 10.170 |
| 20 | 2101.02 | 7801.00 | 25.39 | 1000.0 | 42.490 |
Exact Augmented Lagrangian Functions for Nonlinear Semidefinite Programming
This is a pre-print of an article published in Computational Optimization and Applications. The final authenticated version is available online at: https://doi.org/10.1007/s10589-018-0017-z. This work was supported by the Grant-in-Aid for Young Scientists (B) (26730012) and for Scientific Research (B) (15H02968) from Japan Society for the Promotion of Science.
Ellen H. Fukuda, Department of Applied Mathematics and Physics, Graduate School of Informatics, Kyoto University, Kyoto 606–8501, Japan.
Bruno F. Lourenço, Department of Mathematical Informatics, Graduate School of Information Science & Technology, University of Tokyo, Tokyo 113–8656, Japan.
(June 20, 2018)
Keywords: Differentiable exact merit functions, generalized augmented Lagrangian functions, nonlinear semidefinite programming.
1 Introduction
The following nonlinear semidefinite programming (NSDP) problem is considered:
\[
\begin{array}{ll}
\underset{x \in \mathbb{R}^n}{\text{minimize}} & f(x) \\
\text{subject to} & G(x) \in \mathbb{S}^m_+,
\end{array}
\tag{NSDP}
\]
where $f\colon \mathbb{R}^n \to \mathbb{R}$ and $G\colon \mathbb{R}^n \to \mathbb{S}^m$ are twice continuously differentiable functions, $\mathbb{S}^m$ is the linear space of all real symmetric matrices of dimension $m \times m$, and $\mathbb{S}^m_+$ is the cone of all positive semidefinite matrices in $\mathbb{S}^m$. For simplicity, here we do not take equality constraints into consideration. The above problem extends the well-known nonlinear programming (NLP) and the linear semidefinite programming (linear SDP) problems. NLP and linear SDP models are certainly important, but they may be insufficient in applications where more general constraints are necessary. In particular, in the recent literature, some applications of NSDP are considered in different fields, such as control theory [3, 16], structural optimization [24, 26], truss design problems [4], and finance [25]. However, compared to NLP and linear SDP models, there are still few methods available to solve NSDP, and the theory behind them requires more investigation.
Some theoretical issues associated to NSDP, like optimality conditions, are discussed in [8, 18, 23, 28, 32]. There are, in fact, some methods for NSDPs proposed in the literature such as primal-dual interior-point, augmented Lagrangian, filter-based, sequential quadratic programming, and exact penalty methods. Nevertheless, there are few implementations and, as far as we know, only two general-purpose solvers are able to handle nonlinear semidefinite constraints: PENLAB/PENNON [17] and NuOpt [36]. For a complete survey, see Yamashita and Yabe [35], and references therein.
Here, our main objects of interest are the so-called augmented Lagrangian functions, and this work can be seen as a stepping stone towards new algorithms for NSDPs. An augmented Lagrangian function is basically the usual Lagrangian function with an additional term that depends on a positive coefficient, called the penalty parameter. When there exists an appropriate choice of the parameter such that a single minimization of the augmented Lagrangian recovers a solution to the original problem, we say that this function is exact. This is actually the same definition as that of the so-called exact penalty function. The difference is that an exact augmented Lagrangian function is defined on the product space of the problem’s variables and the Lagrange multipliers, whereas an exact penalty function is defined on the space of the original problem’s variables only. Both exact functions, which are also called exact merit functions, have been studied quite extensively when the original problem is an NLP.
The first proposed exact merit functions were nondifferentiable, and the basic idea was to incorporate terms into the objective function that penalize constraint violations. However, unconstrained minimization of nondifferentiable functions demands special methods, and so continuously differentiable exact functions were considered subsequently. For NLPs, both exact penalty and exact augmented Lagrangian functions were studied. The former has the advantage of dealing with fewer variables, but it tends to have a more complicated formula, because the information of the Lagrange multipliers is, in some sense, hidden in the formula. In most exact penalty functions, this is done by using a function that estimates the value of the Lagrange multipliers associated with a point [12]. The evaluation of this estimate is, however, computationally expensive.
To overcome such a drawback, exact augmented Lagrangian functions can be considered, at the price of increasing the number of variables. The choice between these two types of exact functions depends, of course, on the optimization problem at hand. Exact augmented Lagrangian functions were proposed by Di Pillo and Grippo in [10] and [11] for NLP problems with equality and inequality constraints, respectively. They were further investigated in [6, 13, 15, 29], with additional theoretical issues and schemes for box-constrained NLP problems. However, as far as we know, there are no proposals for exact augmented Lagrangian functions for more general conic constrained problems, in particular, for NSDP. The augmented Lagrangian function considered by Correa and Ramírez [9], and Shapiro and Sun [33], for example, is not exact.
In this paper, we introduce a continuously differentiable exact augmented Lagrangian function for NSDP problems. We also give a unified framework for constructing such functions. More precisely, we propose a generalized augmented Lagrangian function for NSDP, and give conditions for it to be exact. The main difference between the classical (and not exact) augmented Lagrangian and this exact version is the addition of a term that we define in Section 3. This is a continuously differentiable function defined in the product space of the problem’s variables and the Lagrange multipliers, with key properties that guarantee the exactness of the augmented Lagrangian function. A general framework with such a term was also given by Di Pillo and Lucidi in [14] for the NLP case. Besides the optimization problem, a difference between [14] and our work is that, here, we propose the generalization first, and then construct one particular exact augmented Lagrangian function. We believe that the generalized function can be used in the future to easily build other exact merit functions, together with possibly useful methods. Meanwhile, we carry out some preliminary numerical experiments with the particular exact function, using a quasi-Newton method.
The paper is organized as follows. In Section 2, we start with basic definitions and necessary results associated with NSDP problems. In Section 3, a general framework for constructing augmented Lagrangian functions with exactness properties is given. A practical exact augmented Lagrangian function, as well as its exactness results, is given in Section 4. This particular function is used in Section 5, where some numerical examples are presented. We conclude in Section 6 with some final remarks.
2 Preliminaries
Let us first present some basic notations that will be used throughout the paper. Let be a -dimensional column vector and a symmetric matrix with dimension . We use and to denote the th element of and entry (th row and th column) of , respectively. We also use the notation and to denote and , respectively. The trace of is denoted by . Moreover, if , then the inner product of and is written as , and the Frobenius norm of is given by . The identity matrix, with dimension defined in each context, is denoted by , and denotes the projection onto the cone .
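The notation above can be made concrete with a few lines of NumPy. The sketch below is only an illustration: the function names `inner`, `frobenius_norm`, and `proj_psd` are hypothetical, and the projection uses the standard eigenvalue-clipping formula, which is assumed here rather than taken from the paper.

```python
import numpy as np

def inner(Y, Z):
    """Trace inner product tr(YZ) of two symmetric matrices."""
    return float(np.trace(Y @ Z))

def frobenius_norm(Z):
    """Frobenius norm induced by the trace inner product."""
    return float(np.sqrt(inner(Z, Z)))

def proj_psd(Z):
    """Project a symmetric matrix onto the positive semidefinite cone
    by replacing its negative eigenvalues with zero."""
    Z = 0.5 * (Z + Z.T)  # symmetrize to guard against round-off
    eigvals, eigvecs = np.linalg.eigh(Z)
    return (eigvecs * np.maximum(eigvals, 0.0)) @ eigvecs.T

# Small example: Z has eigenvalues 3 and -1, so only the 3 survives.
Z = np.array([[1.0, 2.0], [2.0, 1.0]])
P = proj_psd(Z)
```

For this `Z`, the projection keeps the eigenpair associated with the eigenvalue 3 and discards the one associated with -1.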
For a function , its gradient and Hessian at a point are given by and , respectively. For , denotes the matrix with term given by the partial derivatives . If , then its gradient at with respect to and are denoted by and , respectively. Similarly, the Hessian of at with respect to is written as . For any linear operator defined by with , , and , the adjoint operator is defined by
[TABLE]
Given a mapping , its derivative at a point is denoted by and defined by
[TABLE]
where are the partial derivative matrices.
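As a sanity check on the adjoint of a linear operator built from symmetric matrices, the following sketch verifies the defining identity of an adjoint numerically. The data (`G_mats`, three random symmetric 4 x 4 matrices) and the helper names are hypothetical. The adjoint is assumed to map a symmetric matrix to the vector of its trace inner products with each defining matrix, which is the standard form.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_symmetric(m):
    A = rng.standard_normal((m, m))
    return 0.5 * (A + A.T)

# Hypothetical data: three symmetric 4 x 4 matrices defining d -> sum_i d_i * G_i.
G_mats = [random_symmetric(4) for _ in range(3)]

def apply_op(d):
    """Linear operator from vectors to symmetric matrices."""
    return sum(d_i * G_i for d_i, G_i in zip(d, G_mats))

def apply_adjoint(Z):
    """Assumed adjoint: vector of trace inner products <G_i, Z>."""
    return np.array([np.trace(G_i @ Z) for G_i in G_mats])

d = rng.standard_normal(3)
Z = random_symmetric(4)
lhs = np.trace(apply_op(d) @ Z)   # <op(d), Z> in the matrix space
rhs = d @ apply_adjoint(Z)        # <d, adjoint(Z)> in the vector space
```

The two numbers `lhs` and `rhs` agree up to round-off, which is exactly what the adjoint identity requires.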
One important operator that is necessary when dealing with NSDP problems is the Jordan product associated to the space $\mathbb{S}^m$. For any $Y, Z \in \mathbb{S}^m$, it is defined by
\[
Y \circ Z := \frac{YZ + ZY}{2}.
\]
Taking , we also denote by the linear operator given by
[TABLE]
Since we are only considering the space of symmetric matrices, we have . In the following lemmas, we present some useful results associated to this Jordan product and the projection operator .
Lemma 2.1.
For any matrix , the following statements hold:
- (a)
;
- (b)
.
Proof.
See [34, Section 1]. ∎
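Two standard facts connect the Jordan product with the projection onto the positive semidefinite cone: a symmetric matrix is the difference between its projection and the projection of its negative, and the Jordan product of these two projections vanishes. The sketch below checks both numerically. The helper names are hypothetical, and the two identities are stated as well-known properties assumed for illustration, not as a transcription of the lemma.

```python
import numpy as np

def jordan(Y, Z):
    """Jordan product (YZ + ZY) / 2 of two symmetric matrices."""
    return 0.5 * (Y @ Z + Z @ Y)

def proj_psd(Z):
    """Projection onto the PSD cone via eigenvalue clipping."""
    Z = 0.5 * (Z + Z.T)
    eigvals, eigvecs = np.linalg.eigh(Z)
    return (eigvecs * np.maximum(eigvals, 0.0)) @ eigvecs.T

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
Z = 0.5 * (A + A.T)

Z_plus = proj_psd(Z)     # positive part of Z
Z_minus = proj_psd(-Z)   # positive part of -Z

# Z should equal Z_plus - Z_minus, and the two parts should be Jordan-orthogonal.
decomposition_error = np.linalg.norm(Z - (Z_plus - Z_minus))
jordan_error = np.linalg.norm(jordan(Z_plus, Z_minus))
```

Both errors are at round-off level because the two projections share the eigenvectors of `Z` and act on disjoint sets of eigenvalues.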
Lemma 2.2.
If , then the following statements are equivalent:
- (a)
* and ;*
- (b)
* and ;*
- (c)
.
Proof.
It follows from [5, Section 8.12] and [34, Lemma 2.1(b)]. ∎
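Equivalences of this kind express complementarity between two positive semidefinite matrices in three interchangeable ways: a vanishing Jordan product, a vanishing inner product, and a projection equation. The sketch below illustrates them on a hand-built complementary pair. The construction and helper names are hypothetical, and the projection form `Y - proj_psd(Y - Z) = 0` is the standard one, assumed here for illustration.

```python
import numpy as np

def jordan(Y, Z):
    return 0.5 * (Y @ Z + Z @ Y)

def proj_psd(Z):
    Z = 0.5 * (Z + Z.T)
    eigvals, eigvecs = np.linalg.eigh(Z)
    return (eigvecs * np.maximum(eigvals, 0.0)) @ eigvecs.T

rng = np.random.default_rng(3)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))  # random orthogonal basis

# Y and Z are PSD, share eigenvectors, and have disjoint supports.
Y = Q @ np.diag([2.0, 1.0, 0.0, 0.0]) @ Q.T
Z = Q @ np.diag([0.0, 0.0, 3.0, 0.5]) @ Q.T

jordan_residual = np.linalg.norm(jordan(Y, Z))
inner_residual = abs(np.trace(Y @ Z))
projection_residual = np.linalg.norm(Y - proj_psd(Y - Z))

# A non-complementary pair (Y with itself) violates the projection equation.
violated_residual = np.linalg.norm(Y - proj_psd(Y - Y))
```

The projection form is convenient in algorithms because it encodes both cone memberships and complementarity in a single equation.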
Lemma 2.3.
The following statements hold.
- (a)
Let be a differentiable function, and define as . Then, the gradient of at is given by
[TABLE]
A similar result holds when the domain of the functions and is changed to .
- (b)
Let be differentiable functions, and define as . Then, we have
[TABLE]
- (c)
Let be a differentiable function, and define as , with . Then, we obtain
[TABLE]
- (d)
Let be a differentiable function, and define as . Then, we have
[TABLE]
Proof.
Item (a) follows from [27, Corollary 3.2] and item (b) follows easily from the definitions of adjoint operator and Jordan product. Item (c) holds also from the definition of adjoint operator, and because for all . For item (d), observe that for all , we obtain
[TABLE]
where is the directional derivative of at in the direction . From the differentiability of , we have . Recalling that denotes the adjoint of , this equality yields
[TABLE]
for all , which completes the proof. ∎
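Let us return to problem (NSDP). Define $L\colon \mathbb{R}^n \times \mathbb{S}^m \to \mathbb{R}$ as the Lagrangian function associated to problem (NSDP), that is,
\[
L(x,\Lambda) := f(x) - \langle G(x), \Lambda \rangle.
\]
The pair $(x,\Lambda) \in \mathbb{R}^n \times \mathbb{S}^m$ satisfies the KKT conditions of problem (NSDP) (or, it is a KKT pair) if the following conditions hold:
\[
\nabla_x L(x,\Lambda) = 0, \qquad \Lambda \circ G(x) = 0, \qquad G(x) \in \mathbb{S}^m_+, \qquad \Lambda \in \mathbb{S}^m_+,
\tag{2.1}
\]
where
\[
\nabla_x L(x,\Lambda) = \nabla f(x) - \nabla G(x)^* \Lambda.
\]
The above conditions are necessary for optimality under a constraint qualification. Moreover, Lemma 2.2 shows that the condition $\Lambda \circ G(x) = 0$ can be replaced by $\langle \Lambda, G(x) \rangle = 0$ because $G(x) \in \mathbb{S}^m_+$ and $\Lambda \in \mathbb{S}^m_+$ hold. Furthermore, it can be shown that this condition can also be replaced by $\Lambda G(x) = 0$ [35, Section 2].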
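To make the KKT conditions concrete, the following sketch evaluates their residuals on a toy problem that is not from the paper: minimize `x` subject to the 2 x 2 matrix `[[x, 1], [1, x]]` being positive semidefinite, whose solution is `x = 1`. The sign convention of the Lagrangian (objective minus the inner product of constraint and multiplier) and all names are assumptions made for illustration.

```python
import numpy as np

def f_grad(x):
    """Gradient of the toy objective f(x) = x."""
    return np.array([1.0])

def G(x):
    """Toy constraint map; G(x) is PSD exactly when x >= 1."""
    return np.array([[x[0], 1.0], [1.0, x[0]]])

# Partial derivative matrices of G (constant for this toy map).
dG = [np.eye(2)]

def kkt_residuals(x, Lam):
    """Residuals of stationarity, complementarity, and the two cone conditions."""
    grad_lagrangian = f_grad(x) - np.array([np.trace(Gi @ Lam) for Gi in dG])
    Gx = G(x)
    complementarity = 0.5 * (Lam @ Gx + Gx @ Lam)  # Jordan product
    return {
        "stationarity": np.linalg.norm(grad_lagrangian),
        "complementarity": np.linalg.norm(complementarity),
        "min_eig_G": np.linalg.eigvalsh(Gx).min(),
        "min_eig_Lambda": np.linalg.eigvalsh(Lam).min(),
    }

x_star = np.array([1.0])
Lam_star = np.array([[0.5, -0.5], [-0.5, 0.5]])  # multiplier found by hand
res = kkt_residuals(x_star, Lam_star)
```

Here the multiplier is supported on the null space of `G(x_star)`, which is what complementarity requires, and its trace equals 1 so that stationarity holds.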
Now, consider the nonlinear programming problem below:
[TABLE]
where , and is a penalty parameter. Observe that the above problem is unconstrained, with both the original variable and the Lagrange multiplier as variables. As usual, we say that is stationary of (or for problem (2.2)) when . We use and to denote the sets of global and local minimizers, respectively, of problem (2.2). We also define and as the set of global and local minimizers of problem (NSDP), respectively. Using such notations, we present the formal definition of exact augmented Lagrangian functions.
Definition 2.4.
A function is called an exact augmented Lagrangian function associated to (NSDP) if, and only if, there exists satisfying the following:
- (a)
For all , if , then and is a corresponding Lagrange multiplier. Conversely, if with as a corresponding Lagrange multiplier, then for all .
- (b)
For all , if , then and is a corresponding Lagrange multiplier.
Basically, the above definition shows that is an exact augmented Lagrangian function when, without considering Lagrange multipliers, there is an equivalence between the global minimizers, and when all local solutions of (2.2) are local solutions of (NSDP), for penalty parameters greater than a threshold value. It means that the original constrained conic problem (NSDP) can be replaced with an unconstrained nonlinear programming problem (2.2) when the penalty parameter is chosen appropriately. Note that the definition of exact penalty functions is similar. The only difference is that in the exact penalty case, the objective function of problem (2.2) does not involve Lagrange multipliers explicitly.
3 A general framework
In this section, we propose a general formula for continuously differentiable augmented Lagrangian functions associated to NSDP problems, with exactness properties. It can be seen as a generalization of the one proposed by Di Pillo and Lucidi in [14] for NLP problems. With this purpose, let us first define the following function :
[TABLE]
Observe that this function is continuously differentiable because and are both continuously differentiable. Moreover, it has the properties below.
Lemma 3.1.
Let be defined by (3.1). Then, the following statements hold.
- (a)
If and , then .
- (b)
If , then for all .
Proof.
(a) Clearly, because is a cone. From Lemma 2.2, we have . Thus, taking the square of the Frobenius norm in both sides of this expression gives the result.
(b) Since , we obtain . Using this fact and the nonexpansive property of the projection, we get
[TABLE]
Thus, the result follows by squaring both sides of the above inequality. ∎
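The nonexpansive property of the projection used in this proof can be checked numerically. The sketch below samples random symmetric matrices and also tests one consequence that is assumed here to be the kind of bound used in the argument: when a matrix `G_val` is positive semidefinite and `c > 0`, the projection of `Lam - c * G_val` has Frobenius norm at most that of `Lam`. All names and the sampling scheme are hypothetical.

```python
import numpy as np

def proj_psd(Z):
    Z = 0.5 * (Z + Z.T)
    eigvals, eigvecs = np.linalg.eigh(Z)
    return (eigvecs * np.maximum(eigvals, 0.0)) @ eigvecs.T

rng = np.random.default_rng(2)

def random_symmetric(m):
    A = rng.standard_normal((m, m))
    return 0.5 * (A + A.T)

nonexpansive_gaps = []
feasible_gaps = []
for _ in range(100):
    # Nonexpansiveness: ||P(A) - P(B)|| <= ||A - B|| in the Frobenius norm.
    A, B = random_symmetric(4), random_symmetric(4)
    nonexpansive_gaps.append(
        np.linalg.norm(A - B) - np.linalg.norm(proj_psd(A) - proj_psd(B))
    )
    # Consequence for a PSD matrix G_val: P(-c * G_val) = 0, hence
    # ||P(Lam - c * G_val)|| <= ||Lam|| for any symmetric Lam.
    M = rng.standard_normal((4, 4))
    G_val = M @ M.T
    Lam = random_symmetric(4)
    c = 10.0 ** rng.uniform(-2, 2)
    feasible_gaps.append(
        np.linalg.norm(Lam) - np.linalg.norm(proj_psd(Lam - c * G_val))
    )
```

Every entry of both lists is nonnegative up to round-off, for small and large values of `c` alike.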
We propose a generalized augmented Lagrangian function as follows:
[TABLE]
where is a penalty parameter, , and is given in (3.1), namely
[TABLE]
We will show now that is an exact augmented Lagrangian function associated to (NSDP) in the sense of Definition 2.4, when certain assumptions for , , and are satisfied.
Assumption 3.2.
The functions satisfy the following conditions.
- (a)
* are continuously differentiable for all .*
- (b)
* for all feasible for (NSDP), , and all .*
Moreover, if is a KKT pair of (NSDP), then the conditions below hold.
- (c)
* for all .*
- (d)
, , and .
- (e)
There exist neighborhoods and of and , respectively, and a continuous function such that and for all .
Proposition 3.3.
Suppose that Assumption 3.2(a) holds. Then, the function defined in (3.2) is continuously differentiable. Moreover, its gradient with respect to and , respectively, can be written as follows:
[TABLE]
Proof.
The continuous differentiability of follows from Assumption 3.2(a) and the fact that , , and are continuously differentiable. For the gradient’s formula, we use Lemma 2.3(a),(c),(d) and some simple calculations. ∎
Before proving the exactness results, we will first show the relation between the function and the objective function of (NSDP). As we can see in the next propositions, the values of and at KKT points coincide, but if a point is only feasible, then a simple inequality holds.
Proposition 3.4.
Suppose that Assumption 3.2(b) holds. Let be a feasible point of (NSDP). Then, for all and all .
Proof.
Let and be taken arbitrarily. Since is feasible for (NSDP), we have . Thus, Lemma 3.1(b) shows that is satisfied. The proof is complete because also holds from Assumption 3.2(b). ∎
Proposition 3.5.
Suppose that Assumption 3.2 holds. Let be a KKT pair of (NSDP). Then, is also stationary of , and for all .
Proof.
Let be arbitrarily given and recall the formulas of and given in Proposition 3.3. From Assumption 3.2(b),(c), we have . So, from the KKT conditions (2.1), , , and also hold, which imply that
[TABLE]
from Lemma 3.1(a). Moreover, Lemma 2.2 shows that
[TABLE]
The equalities (3.3), (3.4) and Assumption 3.2(d) yield . Moreover, from (3.4) and Assumption 3.2(c), we have
[TABLE]
So, once again using Assumption 3.2(d), equalities (3.3), (3.4) and the KKT condition , we can conclude that holds. Finally, (3.3) and Assumption 3.2(d) also yield , and the proof is complete. ∎
The above proposition shows that a KKT pair of (NSDP) is stationary of , and this assertion does not depend on the parameter . The exactness properties of can be shown only if the other implication also holds, that is, a stationary point of should be a KKT pair of (NSDP), at least when is greater than some threshold value. If such a statement holds, then the exactness of is guaranteed, as it can be seen below. Before that, we recall that the global (local) minimizers of problems (NSDP) and (2.2) with are, respectively, denoted by () and (). We also consider the following assumption:
Assumption 3.6.
The sets and are nonempty for all . Moreover, for every there is at least one such that is a KKT pair of (NSDP).
The existence of optimal solutions of the unconstrained problem is guaranteed if an extraneous compact set is considered [12], or by exploiting some properties of the problem, such as coercivity and monotonicity [1]. Furthermore, we can ensure the existence of a Lagrange multiplier by imposing some constraint qualification. Now, if we define
[TABLE]
then, using Assumption 3.6, we obtain .
Lemma 3.7.
Suppose that Assumptions 3.2 and 3.6 hold. Then for all , implies .
Proof.
Let be arbitrarily given and . By assumption, also holds, and thus, is a KKT pair of (NSDP). From Proposition 3.5, we obtain
[TABLE]
Recall that because of Assumption 3.6. So, take , with . Since satisfies the KKT conditions of (NSDP), once again from Proposition 3.5, we have . This fact, together with (3.6), and the definition of global solutions, gives
[TABLE]
which shows that the whole expression above holds with equalities. Therefore, , which completes the proof. ∎
Theorem 3.8.
Suppose that Assumption 3.2 and 3.6 hold. Assume also that there exists such that every stationary point of is also a KKT pair of (NSDP) for all . Then, is an exact augmented Lagrangian function associated to (NSDP), in other words:
- (a)
* for all .*
- (b)
$L_{\text{NLP}}(c) \subseteq \big\{ (x,\Lambda) \colon x \in L_{\text{NSDP}} \text{ and } \Lambda \text{ is a corresponding multiplier} \big\}$ for all $c$ above that threshold.
Proof.
(a) Let be arbitrarily given. From Lemma 3.7, we only need to prove that . Let . From (3), we need to show that , with as a corresponding Lagrange multiplier. Then, is a stationary point of . From this theorem’s assumption, it is also a KKT pair of (NSDP), which implies from Proposition 3.5. Now, assume that there exists such that . Since satisfies a constraint qualification from Assumption 3.6, there exists such that satisfies the KKT conditions of (NSDP). Once again by Proposition 3.5, we have . So, the definition of global minimizers gives
[TABLE]
which shows that the whole expression above holds with equalities. Therefore, , with as a corresponding Lagrange multiplier.
(b) Let and . Since is a stationary point of , it is also a KKT pair of (NSDP) by theorem’s assumption. So, from Proposition 3.5, we have
[TABLE]
Moreover, from the definition of local minimizer, there exist neighborhoods and of and , respectively, such that
[TABLE]
Here, we suppose that and are sufficiently small, which guarantees the existence of a function as in Assumption 3.2(e). In particular, we obtain for all . This inequality, together with (3.7), shows that
[TABLE]
Thus, from Proposition 3.4 and Assumption 3.2(e), we get
[TABLE]
for all that is feasible for (NSDP). So, we conclude that . ∎
The above result shows that the generalized function is an exact augmented Lagrangian function if a finite penalty parameter satisfying
[TABLE]
is guaranteed to exist. However, even if Assumption 3.2 holds, we usually cannot expect that such exists. In the next section, we will observe that the functions , , and , used in the formula of , should be taken carefully for such a purpose.
4 The proposed exact augmented Lagrangian function
Here, we construct a particular exact augmented Lagrangian function by choosing the functions , , and , used in (formula (3.2)) appropriately. Before that, let us note that by defining
[TABLE]
where is the penalty parameter, we obtain the augmented Lagrangian function for NSDP given in [9, 33]. This function is actually an extension of the classical augmented Lagrangian function for NLP (see [6] for instance), and it is equal to the Lagrangian function with some additional terms. However, it is not exact in the sense of Definition 2.4.
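For orientation, the sketch below evaluates a classical augmented Lagrangian of the form commonly stated in the literature for constraints written as a positive semidefiniteness requirement: the objective plus `(||proj_psd(Lam - c * G(x))||^2 - ||Lam||^2) / (2c)`. This form is an assumption and should be checked against [9, 33] for the exact expression. The toy problem (minimize `x` subject to `[[x, 1], [1, x]]` being positive semidefinite) and all names are hypothetical.

```python
import numpy as np

def proj_psd(Z):
    Z = 0.5 * (Z + Z.T)
    eigvals, eigvecs = np.linalg.eigh(Z)
    return (eigvecs * np.maximum(eigvals, 0.0)) @ eigvecs.T

def f(x):
    return x[0]

def G(x):
    return np.array([[x[0], 1.0], [1.0, x[0]]])

def classical_aug_lagrangian(x, Lam, c):
    """Assumed classical form: f(x) + (||[Lam - c G(x)]_+||^2 - ||Lam||^2) / (2c)."""
    shifted = proj_psd(Lam - c * G(x))
    return f(x) + (np.linalg.norm(shifted) ** 2 - np.linalg.norm(Lam) ** 2) / (2.0 * c)

# KKT pair of the toy problem (computed by hand).
x_star = np.array([1.0])
Lam_star = np.array([[0.5, -0.5], [-0.5, 0.5]])
value_at_kkt = classical_aug_lagrangian(x_star, Lam_star, c=2.0)

# A feasible point with an arbitrary symmetric multiplier.
x_feasible = np.array([1.5])
Lam_any = np.array([[2.0, 0.3], [0.3, -1.0]])
value_at_feasible = classical_aug_lagrangian(x_feasible, Lam_any, c=2.0)
```

At the KKT pair the value coincides with the objective value, and at a feasible point it does not exceed the objective value. Both facts match the behavior that Propositions 3.4 and 3.5 describe for the generalized function.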
In order to construct an augmented Lagrangian function with exactness property, we choose a more complex , that satisfies Assumption 3.2(d),(e). As in [14], the function of item (e) can be taken as a function that estimates the value of the Lagrange multipliers associated to a point. One possibility for such an estimate for NSDP problems is given in [21], which in turn extends the ones proposed in [19, 20]. Basically, given , we consider the following unconstrained problem:
[TABLE]
where are positive scalars, and denotes the residual function associated to the feasible set, that is,
[TABLE]
Observe that if, and only if, is feasible for (NSDP). The idea underlying problem (4.1) is to force KKT conditions (2.1) to hold, except for the feasibility of the Lagrange multiplier. Actually, this problem can be seen as a linear least squares problem, and so its solution can be written explicitly, when the so-called nondegeneracy assumption holds. It is well-known that the nondegeneracy condition, defined below, extends the classical linear independence constraint qualification for nonlinear programming [8, 32], see also Section 4 and Corollary 2 in [28]. In particular, under nondegeneracy, Lagrange multipliers are ensured to exist at optimal points.
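The least-squares idea can be sketched at a feasible point, where the term weighted by the residual function is assumed to vanish. The code below is not the paper's problem (4.1). It minimizes a simplified objective made of a squared stationarity residual plus a weighted squared Jordan product of the constraint value and the multiplier, with an assumed weight `zeta`. The toy problem, function names, and the orthonormal basis construction are hypothetical.

```python
import numpy as np

def sym_basis(m):
    """Orthonormal basis of the symmetric m x m matrices (trace inner product)."""
    basis = []
    for i in range(m):
        for j in range(i, m):
            E = np.zeros((m, m))
            if i == j:
                E[i, i] = 1.0
            else:
                E[i, j] = E[j, i] = 1.0 / np.sqrt(2.0)
            basis.append(E)
    return basis

def multiplier_estimate(grad_f, dG, Gx, zeta=1.0):
    """Least-squares multiplier for the assumed simplified objective
    ||grad_f - adjoint(Lam)||^2 + zeta^2 * ||Gx o Lam||_F^2."""
    m = Gx.shape[0]
    basis = sym_basis(m)
    columns = []
    for E in basis:
        stationarity_part = np.array([np.trace(Gi @ E) for Gi in dG])
        jordan_part = zeta * 0.5 * (Gx @ E + E @ Gx)
        columns.append(np.concatenate([stationarity_part, jordan_part.ravel()]))
    A = np.column_stack(columns)
    b = np.concatenate([grad_f, np.zeros(m * m)])
    coeffs, *_ = np.linalg.lstsq(A, b, rcond=None)
    return sum(coef * E for coef, E in zip(coeffs, basis))

# Toy problem: minimize x subject to [[x, 1], [1, x]] being PSD, solved at x = 1.
grad_f = np.array([1.0])
dG = [np.eye(2)]
G_at_solution = np.array([[1.0, 1.0], [1.0, 1.0]])
Lam_est = multiplier_estimate(grad_f, dG, G_at_solution)
```

At the solution of this toy problem the estimate reproduces the hand-computed multiplier `[[0.5, -0.5], [-0.5, 0.5]]`, consistent with the property that a multiplier estimate should return the true multiplier at a KKT pair.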
Assumption 4.1.
Every feasible for (NSDP) is nondegenerate, that is,
[TABLE]
where denotes the tangent cone of at , is the image of the linear map , and means lineality space.
Lemma 4.2.
Suppose that Assumption 4.1 holds. For a given , define as
[TABLE]
Then, the following statements are true.
- (a)
* is continuously differentiable and for all , the matrix is positive definite.*
- (b)
The solution of problem (4.1) is unique and it is given by
[TABLE]
- (c)
If is a KKT pair of (NSDP), then .
- (d)
The operator is continuously differentiable, and , where
[TABLE]
Proof.
See [21, Lemma 2.2 and Proposition 2.3]. ∎
The augmented Lagrangian function that we propose is given by
[TABLE]
where and are given in Lemma 4.2. It is equivalent to the usual augmented Lagrangian function for NSDP, except for the last term. So, comparing to the generalized one (3.2), we have
[TABLE]
Observe that the functions defined in such a way satisfy Assumption 3.2. In fact, items (a), (b), and (c) of this assumption hold trivially, and item (d) is satisfied because of Lemma 4.2(c). The function of item (e) corresponds to , with the necessary properties described in Lemma 4.2(c),(d). Note that for all in this case.
Now, from (4.2) and (4.3), observe that
[TABLE]
Also, consider the following auxiliary function defined by
[TABLE]
The gradient of with respect to is given by
[TABLE]
with
[TABLE]
where the second equality follows from (4.6). Using Lemma 2.3(a), as well as some additional calculations, we obtain
[TABLE]
Moreover, the gradient of with respect to can be written as follows:
[TABLE]
Here, we point out that the formulas of , and , presented respectively in (4.4), (4) and (4.9), do not require explicit computation of the multiplier estimate . In fact, the estimate only appears in the expression , that can be written as (4.6). It means that both and their gradients do not require solving the linear least squares problem (4.1), which is computationally expensive.
4.1 Exactness results
Throughout this section, we suppose that Assumptions 3.6 and 4.1 hold. Indeed, note that the assertion about constraint qualifications in Assumption 3.6 holds automatically under Assumption 4.1. Here, we will show that the particular augmented Lagrangian , defined in (4.4), is in fact exact. For this purpose, we will first establish the relation between the KKT points of the original (NSDP) problem and the stationary points of the unconstrained problem:
[TABLE]
Proposition 4.3.
Let be a KKT pair of (NSDP). Then, for all , and is stationary of , that is, and .
Proof.
Recalling that the functions defined in (4.5) satisfy Assumption 3.2, the result follows from Proposition 3.5. ∎
Proposition 4.4.
Let be feasible for (NSDP) and . So, there exist such that if is stationary of with , and , then is a KKT pair of (NSDP).
Proof.
Let us first consider an arbitrary pair and . For convenience, we also define the following function:
[TABLE]
with the last equality following from Lemma 2.1(a). Lemma 2.1(b) and the definition of in (4.7) show that
[TABLE]
The above expression can be rewritten using the distributivity of the Jordan product:
[TABLE]
Moreover, from (4.11), the above equality, the distributivity and the commutativity of the Jordan product, we obtain
[TABLE]
Using this expression, we have
[TABLE]
Now, the formula of in (4) and the equality (4.6) show that
[TABLE]
Thus, from (4.12), we have
[TABLE]
where
[TABLE]
Let us now consider that is stationary of . Since , we obtain, from (4.9),
[TABLE]
because is nonsingular from Lemma 4.2(a). Recalling that also holds, then, from (4.13) we obtain
[TABLE]
where
[TABLE]
Using the fact that for any matrices , from (4.15), we can write
[TABLE]
Moreover, the definition of projection and Lemma 2.1(a) yield
[TABLE]
The above inequality, together with (4.16) implies
[TABLE]
where denotes the smallest singular value function.
Recalling that is feasible for (NSDP), we observe that if , then for all . Since , this also shows that
[TABLE]
with defined in (4.2). Now, note that from Lemma 4.2(a), is positive definite. Also, define
[TABLE]
Observe that is continuous because all functions involved in its formula are continuous, and that , which is positive definite. Therefore, there is such that implies that is also positive definite.
Letting , there exist with such that both and are positive definite for all in the set
[TABLE]
Now, we would like to prove that there is such that is positive definite for all and all in . To do so, suppose that this statement is false. Then, there are sequences and such that and is not positive definite for all . Since is compact, we may assume that converges to some . However, we have
[TABLE]
Since is positive definite, should also be positive definite for sufficiently large. This contradicts the fact that no is positive definite, by construction. We conclude that there is such that is positive definite for all and all in .
Considering one such , we now seek some such that is nonsingular for all . We remark that we already know that is positive over . Denote by the maximum singular value function. Then, recalling the formula for and elementary properties of singular values, we have that if
[TABLE]
then will be nonsingular for all . As is a continuous function of , we have that is finite. Thus, is positive definite, and hence for all and . Similarly, there exists such that
[TABLE]
Finally, consider and such that is stationary of , , and . From (4.17) and the above inequality, it means that . Also, from Lemma 2.2, and the fact that , it yields
[TABLE]
Moreover, from (4.14) and the fact that is nonsingular by Lemma 4.2(a), we have . Since also holds, from (4), we obtain . Therefore, is a KKT pair of (NSDP). ∎
Proposition 4.5.
Let , , and be sequences such that and is stationary of for all . Assume that there are subsequences and of and , respectively, such that and for some . Then, either there exists such that is a KKT pair of (NSDP) for all , or is a stationary point of the residual function that is infeasible for (NSDP).
Proof.
We first show that is a stationary point of , in other words,
[TABLE]
In fact, using (4) and dividing the equation by , we have
[TABLE]
Recalling Lemma 4.2, we observe that all the functions involved in the above equation are continuous. Thus, taking the limit , from the definition of in (4.7), we obtain , as we claimed. Now, assume that is feasible. Then, from Proposition 4.4, there exists such that is a KKT pair of (NSDP) for all , which completes the proof. ∎
Now, recalling Definition 2.4, once again, we use the notations () and () to denote the sets of global (local) minimizers of problems (NSDP) and (4.10), respectively. The following theorems show that the proposed function given in (4.4) is in fact an exact augmented Lagrangian function. However, the results are established as in [2], where it is admitted that we can end up with a stationary point of the residual function that is infeasible for (NSDP).
Theorem 4.6.
Let , , and be sequences such that and for all . Assume that there are subsequences and of and , respectively, such that and for some . Then, either there exists such that , with an associated Lagrange multiplier for all , or is a stationary point of the residual function that is infeasible for (NSDP).
Proof.
From Proposition 4.5, either there exists such that is a KKT pair of (NSDP) for all , or is a stationary point of that is infeasible for (NSDP). So, the result follows in the same way as the proof of Theorem 3.8(b). ∎
For the above result, which concerns local minimizers, we note that the existence of the subsequence is guaranteed, for example, when the whole sequence is bounded. Moreover, if the constraint function is convex with respect to the cone , then the residual function is also convex, which means that all stationary points of are feasible for (NSDP). In the case of global minimizers, it is possible to prove full equivalence, and we do not have to be concerned about stationary points of that are infeasible for (NSDP).
Proposition 4.7.
Let , , and be sequences such that and for some , , and for all . Then, there exists such that with an associated Lagrange multiplier for all .
Proof.
Let , which exists by Assumption 3.6. Because of the nondegeneracy constraint qualification, there exists such that is a KKT pair of (NSDP). So, from Proposition 4.3, holds for all . Moreover, since , we have
[TABLE]
for all . Taking the supremum limit in this inequality, we obtain
[TABLE]
Observe now that the formula of in (4.4) can be written equivalently as
[TABLE]
Recall from Lemma 4.2 that the functions involved in the above equality are all continuous. This fact, together with inequality (4.19), shows that , that is, is feasible. So, from Proposition 4.5, we conclude that there exists such that is a KKT pair of (NSDP) for all . Since and the norm is always nonnegative, (4.20) implies . Again, taking the supremum limit in such an inequality, we have
[TABLE]
which, together with (4.19) shows that . Thus, holds.
Now, since is feasible for (NSDP), there exist as in Proposition 4.4. Consider large enough so that , , , and for all . Since is stationary of , from Proposition 4.4, we obtain that is also a KKT pair of (NSDP) for all . Once again from Proposition 4.3 and (4.18), we have . Therefore, for all . ∎
Theorem 4.8.
Assume that there exists such that is bounded. Then, there exists such that for all , where is defined in (3).
Proof.
From Lemma 3.7, we only need to show the existence of such that for all . Assume that this statement is false. Then, there exist sequences and with , , and , but such that . Since is bounded, we can assume, without loss of generality, that and for some . Thus, Proposition 4.7 shows that there exists such that for all , which is a contradiction. ∎
5 Preliminary numerical experiments
This work focuses on the theoretical aspects of exact augmented Lagrangian functions. Nevertheless, we take a look at the numerical prospects of our approach by examining two simple problems in the next two subsections. We now briefly explain our proposal. Given a problem (NSDP), the idea is to use some unconstrained optimization method to solve (4.10), i.e., minimization of given in (4.4). First, an initial point , together with the initial penalty parameter are selected. Then, we choose the initial Lagrange multiplier . One possibility is to use the multiplier estimate, i.e., to set as , where is defined in (4.3).
Then, we run the unconstrained optimization method of our choice. However, since the penalty values for which becomes exact are not known beforehand, we attempt to adjust the penalty parameter between iterations as follows. Let and . Denote by and , the values of , and at the th iteration, respectively. Recalling the function defined in (4.7), if at the th iteration we have
[TABLE]
then we let be . Otherwise, we let be . This is an idea that appears in many augmented Lagrangian methods, for example in [7]. The motivation is that is zero if and only if , and . Therefore, is a measure of the degree to which complementarity and feasibility are satisfied, taking into account the current penalty parameter. In summary, whenever there is not enough progress, we increase the penalty. To prevent the problem from becoming too ill-conditioned, we never increase the penalty past some fixed value . We also point out that the update of the penalty parameter can be done by using the so-called test function, which was originally defined in [20]. However, as can be seen in the paper about exact penalty functions [2], the above approach using is more efficient, which justifies its use here.
In our implementation, , and are set to , and , respectively. The maximum number of iterations is . The values of and , which control the behavior of in (4.2), are set to and , respectively. The initial penalty parameter is computed using the formula:
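The update rule described above can be sketched as follows. This is only an illustration of the general pattern (increase the penalty when a feasibility/complementarity measure does not decrease enough, and cap it at a maximum). The function name, the argument names, the form of the test `r_curr > tau * r_prev`, and all default values are assumptions made for this sketch and are not taken from the paper.

```python
def update_penalty(c, r_curr, r_prev, tau=0.5, gamma=10.0, c_max=1000.0):
    """Return the penalty parameter for the next iteration.

    c       : current penalty parameter
    r_curr  : feasibility/complementarity measure at the current iterate
    r_prev  : the same measure at the previous iterate
    tau     : required reduction factor in (0, 1)   (assumed value)
    gamma   : growth factor greater than 1          (assumed value)
    c_max   : cap that limits ill-conditioning      (assumed value)
    """
    if r_curr > tau * r_prev:
        # Not enough progress: increase the penalty, but never past c_max.
        return min(gamma * c, c_max)
    # Enough progress: keep the penalty unchanged.
    return c


# Hypothetical usage between two iterations of the unconstrained solver.
c_next = update_penalty(c=1.0, r_curr=0.9, r_prev=1.0)
```

Capping the penalty trades some theoretical safety for numerical stability, which matches the motivation given in the text.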
[TABLE]
with , which is similar to the one used in [7]. The unconstrained method of our choice is the BFGS method using Armijo's condition for the line search. We stop the algorithm when the KKT conditions are satisfied within or when the norm of the gradient of is less than .
We implemented the algorithm in Python and ran all the experiments on an Intel Core i7-6700 machine with 8 cores and 16GB of memory. As we already pointed out after (4.9), an important implementation aspect is that we never need to explicitly evaluate the function , except in the optional way of computing the initial Lagrange multiplier.
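For readers unfamiliar with the solver named above, the following is a minimal sketch of a BFGS iteration with a backtracking line search that enforces Armijo's condition. It is a generic textbook version applied to an arbitrary smooth function, not the authors' implementation; the function name `bfgs_armijo`, the parameters `sigma` and `beta`, the tolerances, and the test problem are all assumptions of this sketch.

```python
import numpy as np


def bfgs_armijo(f, grad, x0, tol=1e-6, max_iter=500, sigma=1e-4, beta=0.5):
    """Minimize f with BFGS and Armijo backtracking (illustrative sketch)."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    H = np.eye(n)          # approximation of the inverse Hessian
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        d = -H @ g
        if g @ d >= 0:     # safeguard: fall back to steepest descent
            H = np.eye(n)
            d = -g
        # Backtrack until f(x + t d) <= f(x) + sigma * t * g^T d.
        t, fx = 1.0, f(x)
        while f(x + t * d) > fx + sigma * t * (g @ d) and t > 1e-16:
            t *= beta
        x_new = x + t * d
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        sy = s @ y
        if sy > 1e-12:     # update only when the curvature condition holds
            rho = 1.0 / sy
            I = np.eye(n)
            H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
                + rho * np.outer(s, s)
        x, g = x_new, g_new
    return x


# Hypothetical test problem: a strictly convex quadratic with minimizer A^{-1} b.
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 1.0])
x_star = bfgs_armijo(lambda x: 0.5 * x @ A @ x - b @ x,
                     lambda x: A @ x - b,
                     np.zeros(2))
```

Since Armijo's condition alone does not guarantee a positive curvature product, the sketch skips the BFGS update whenever `s @ y` is not safely positive.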
5.1 Noll’s example
As an initial example, we took a look at this simple instance by Noll [30]:
[TABLE]
The problem (Noll) is already in the format (NSDP), which can be seen by letting be the function defined by
[TABLE]
In order to compute and its gradient (see (4.4), (4.6), (4) and (4.9)), we need the partial derivative matrices of and the adjoint of the gradient of , which are given below:
[TABLE]
where is an arbitrary matrix with entry denoted by .
The optimal value of (Noll) is and it is achieved at . Starting at , our method found a solution satisfying the optimality criteria in 14 iterations and seconds. The initial and final penalty parameters were and , respectively. The objective function, the constraint function and their gradients were evaluated times each.
5.2 The closest correlation matrix problem
Let be a symmetric matrix. The goal is to find a correlation matrix that is as close as possible to . In other words, we seek a solution to the following problem:
[TABLE]
There are many variants of (Cor) where weighting factors are added, constraints on the eigenvalues are considered, and so on. As a result, this family of problems has found a wealth of applications in statistics and finance [22].
In this example, it is possible to show that the nondegeneracy condition is satisfied at every feasible point (Assumption 4.1), which guarantees the theoretical properties of the exact augmented Lagrangian function . This was proved by Qi and Sun in [31], but since there are some differences in notation, we will first take a look at this issue. Qi and Sun proved the following result.
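As a small illustration of the feasible set of this problem, the sketch below checks whether a given matrix is a correlation matrix, that is, symmetric, positive semidefinite, and with unit diagonal. The function name `is_correlation_matrix` and the tolerance `tol` are assumptions of this sketch.

```python
import numpy as np


def is_correlation_matrix(X, tol=1e-10):
    """Check symmetry, unit diagonal, and positive semidefiniteness."""
    X = np.asarray(X, dtype=float)
    if X.ndim != 2 or X.shape[0] != X.shape[1]:
        return False
    if not np.allclose(X, X.T, rtol=0.0, atol=tol):
        return False
    if not np.allclose(np.diag(X), 1.0, rtol=0.0, atol=tol):
        return False
    # eigvalsh is appropriate because X is symmetric at this point.
    return bool(np.linalg.eigvalsh(X).min() >= -tol)


# Hypothetical examples.
ok = is_correlation_matrix([[1.0, 0.5], [0.5, 1.0]])
bad = is_correlation_matrix([[1.0, 2.0], [2.0, 1.0]])  # has eigenvalue -1
```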
Proposition 5.1.
Let be such that , for all . Then,
[TABLE]
where is the linear map that maps a symmetric matrix to its diagonal .
Proof.
See Proposition 2.1 and Equation (2.2) in [31]. ∎
We now write (Cor) in a format similar to (NSDP). For that, denote by the matrix that has in the and entries and [math] elsewhere. Then, by discarding constant terms in the objective function, (Cor) can be reformulated equivalently as follows.
[TABLE]
Here, can be thought of as an upper triangular matrix without the diagonal. That is why the dimension of is and we index by using for .
Proposition 5.2.
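The parametrization by the strictly upper triangular entries can be sketched as below. The sketch assumes that each basis matrix has ones in the two symmetric off-diagonal positions and zeros elsewhere, and that the symmetric matrix is recovered as the identity plus a linear combination of these basis matrices, so that the unit diagonal is built in. Both assumptions, as well as the names `basis_matrix` and `assemble`, are introduced here for illustration, since the corresponding formulas are not legible in this copy.

```python
import numpy as np


def basis_matrix(n, i, j):
    """Symmetric n-by-n matrix with ones at (i, j) and (j, i) (assumed form)."""
    E = np.zeros((n, n))
    E[i, j] = 1.0
    E[j, i] = 1.0
    return E


def assemble(x, n):
    """Build the symmetric matrix I + sum_{i<j} x_ij E_ij from the vector x.

    x holds the n(n-1)/2 strictly upper triangular entries in row-major order.
    """
    iu = np.triu_indices(n, k=1)
    U = np.zeros((n, n))
    U[iu] = x
    return np.eye(n) + U + U.T


# Hypothetical usage with n = 3, so x has 3 entries: x_01, x_02, x_12.
x = np.array([0.1, 0.2, 0.3])
X = assemble(x, 3)
```

Storing only the strictly upper triangular part keeps the unconstrained variable free of redundant entries.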
Problem (Cor2) satisfies Assumption 4.1.
Proof.
We must show that
[TABLE]
where is the function such that
[TABLE]
Now, let be arbitrary. We can write as
[TABLE]
From Proposition 5.1, the first summation belongs to . Then, noting that is the space spanned by , we conclude that the second summation belongs to . This shows that Assumption 4.1 is satisfied. ∎
We now write some useful formulae which can be used in conjunction with (4.6), (4) and (4.9) to compute and its gradient. Let be an arbitrary symmetric matrix. Then,
[TABLE]
for , where corresponds to the upper triangular part of without the diagonal.
We now move on to the experiments. We generated symmetric matrices such that the diagonal entries are all and non-diagonal elements are uniform random numbers between and . This was repeated for . We then ran our algorithm using as initial point the matrix having in all its entries. The results can be seen in Table 1. All the values depicted in Table 1 are averages among runs. The column “Iterations” corresponds to the average number of BFGS iterations. At each run, we recorded the number of function evaluations for , which is the same for and . Then, the column “Evaluations” in Table 1 is the average number of function evaluations. Columns “Initial ” and “Final ” correspond to the average of the initial and final penalty parameters, respectively. Finally, column “Time (s)” is the average running time, in seconds.
No failures were detected, that is, we obtained approximate KKT points within for all the instances. We also observed that, except for , the final penalty parameter climbed up to the maximum value. At first glance, this suggests that the penalty was not large enough. However, we were still able to solve the problems without increasing the maximum value. In fact, we noted that, in some cases, the performance degraded when the maximum penalty value was increased. At this moment, the method presented here is not competitive with the approach in [31], where we observed that an instance is typically solved in less than a second on our hardware. However, it should be emphasized that a second-order method is used in [31], whereas here we used BFGS. It would be interesting to apply and analyze a second-order method in combination with the exact augmented Lagrangian function , but this investigation is beyond the scope of this paper.
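The instance generation described above can be sketched as follows. The bounds of the uniform distribution and the diagonal value are not legible in this copy, so the sketch takes the bounds as parameters `low` and `high` and uses a unit diagonal; these choices, the function name `random_instance`, the seed, and the values used in the example call are assumptions of this sketch.

```python
import numpy as np


def random_instance(n, low, high, rng):
    """Symmetric n-by-n matrix with unit diagonal and uniform off-diagonals.

    low, high : bounds of the uniform distribution (caller-supplied assumption)
    rng       : a numpy.random.Generator, for reproducibility
    """
    iu = np.triu_indices(n, k=1)
    G = np.zeros((n, n))
    G[iu] = rng.uniform(low, high, size=len(iu[0]))
    G = G + G.T                 # mirror the upper triangle
    np.fill_diagonal(G, 1.0)    # unit diagonal (assumed)
    return G


# Hypothetical example call; the bounds -1 and 1 are illustrative only.
rng = np.random.default_rng(0)
G = random_instance(5, -1.0, 1.0, rng)
```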
6 Final remarks
We proposed a generalized augmented Lagrangian function for NSDP problems, giving conditions for it to be exact. After that, we considered a particular function , and we proved that it is exact under the nondegeneracy condition and some reasonable assumptions as in Theorem 4.8. We also presented some preliminary numerical experiments using a quasi-Newton method with the BFGS formula, showing the validity of the approach. One direction for future work is to analyze more efficient methods that can solve the unconstrained minimization of . From Lemma 4.2 and the formula of , given in (4.4), we observe that is an function, i.e., it is continuously differentiable and its gradient is semismooth. This means that the unconstrained problem can be solved with second-order methods, such as the semismooth Newton method. However, the gradient of , given in (4), contains second-order terms of problem functions and , and thus, a second-order method would have to deal with third-order derivatives. We believe that we can use the idea proposed in [19], which avoids these third-order terms while still guaranteeing global superlinear convergence.
By using the generalized function as a tool, other practical exact augmented Lagrangian functions can be studied for NSDP, or other important conic optimization problems. In fact, as can be seen in [14], many other exact augmented Lagrangian functions exist, but only for classical nonlinear programming. For example, recalling (4.5), we note that is defined by choosing functions and as constants. By considering more sophisticated formulas, it may be possible to weaken the assumptions used here. This should be another matter of investigation.
Acknowledgements
We would like to thank the anonymous referees for their suggestions which improved the original version of the paper. We are also thankful to Akiko Kobayashi for valuable discussions about exact augmented Lagrangian functions.
- [1] T. A. André and P. J. S. Silva. Exact penalties for variational inequalities with applications to nonlinear complementarity problems. Computational Optimization and Applications, 47(3):401–429, 2010.
- [2] R. Andreani, E. H. Fukuda, and P. J. S. Silva. A Gauss-Newton approach for solving constrained optimization problems using differentiable exact penalties. Journal of Optimization Theory and Applications, 156(2):417–449, 2013.
- [3] P. Apkarian, D. Noll, and H. D. Tuan. Fixed-order H∞ control design via a partially augmented Lagrangian method. International Journal of Robust and Nonlinear Control, 13(12):1137–1148, 2003.
- [4] A. Ben-Tal, F. Jarre, M. Kočvara, A. Nemirovski, and J. Zowe. Optimal design of trusses under a nonconvex global buckling constraint. Optimization and Engineering, 1(2):189–213, 2000.
- [5] D. S. Bernstein. Matrix Mathematics: Theory, Facts, and Formulas. Princeton University Press, 2nd edition, 2009.
- [6] D. P. Bertsekas. Constrained Optimization and Lagrange Multipliers Methods. Academic Press, New York, 1982.
- [7] E. G. Birgin and J. M. Martínez. Practical augmented Lagrangian methods for constrained optimization. Society for Industrial and Applied Mathematics, Philadelphia, PA, 2014.
- [8] J. F. Bonnans and A. Shapiro. Perturbation Analysis of Optimization Problems. Springer-Verlag, New York, 2000.
