Exact augmented Lagrangian functions for nonlinear semidefinite programming

Ellen H. Fukuda; Bruno F. Lourenço

arXiv:1705.06551·math.OC·June 27, 2018·Comput. Optim. Appl.

TL;DR

This paper develops a unified framework for constructing exact augmented Lagrangian functions for nonlinear semidefinite programming, enabling reformulation into unconstrained problems with proven differentiability and exactness.

Contribution

It generalizes previous work to NSDP, introduces a practical exact augmented Lagrangian function, and proves its properties under nondegeneracy conditions.

Findings

01 The proposed augmented Lagrangian is continuously differentiable.

02 The function is exact under nondegeneracy conditions.

03 Preliminary numerical experiments demonstrate its effectiveness.

Abstract

In this paper, we study augmented Lagrangian functions for nonlinear semidefinite programming (NSDP) problems with exactness properties. The term exact is used in the sense that the penalty parameter can be taken appropriately, so a single minimization of the augmented Lagrangian recovers a solution of the original problem. This leads to reformulations of NSDP problems into unconstrained nonlinear programming ones. Here, we first establish a unified framework for constructing these exact functions, generalizing Di Pillo and Lucidi's work from 1996, that was aimed at solving nonlinear programming problems. Then, through our framework, we propose a practical augmented Lagrangian function for NSDP, proving that it is continuously differentiable and exact under the so-called nondegeneracy condition. We also present some preliminary numerical experiments.

Tables

Table 1: Results for (Cor).

$m$	Iterations	Evaluations	Initial $c$	Final $c$	Time (s)
5	114.62	371.22	21.98	805.2	0.208
10	520.96	1844.62	23.31	1000.0	1.923
15	1191.62	4297.74	24.77	1000.0	10.170
20	2101.02	7801.00	25.39	1000.0	42.490

Equations

\begin{array}{ll}\underset{x}{\mbox{minimize}}&f(x)\\ \mbox{subject to}&G(x)\in\mathbb{S}_{+}^{m},\end{array}

\mathcal{G}^{*}Z=(\langle\mathcal{G}_{1},Z\rangle,\dots,\langle\mathcal{G}_{s},Z\rangle)^{\top},\quad Z\in\mathbb{S}^{\ell}.

\nabla\mathcal{G}(x)v=\sum_{i=1}^{s}v_{i}\frac{\partial\mathcal{G}(x)}{\partial x_{i}},\quad v\in\mathbb{R}^{s},

Y\circ Z:=\frac{YZ+ZY}{2}.

\mathcal{L}_{Y}(Z):=Y\circ Z.

\nabla\psi(x)=2\nabla Q(x)^{*}P_{\mathbb{S}^{m}_{+}}(Q(x)).

\nabla P(x)^{*}Z=\left[\left\langle\frac{\partial R_{1}(x)}{\partial x_{i}}\circ R_{2}(x)+R_{1}(x)\circ\frac{\partial R_{2}(x)}{\partial x_{i}},Z\right\rangle\right]_{i=1}^{n}\quad\mbox{for all }Z\in\mathbb{S}^{m}.

\nabla S(x)^{*}Z=\langle W,Z\rangle\nabla\xi(x)\quad\mbox{for all }Z\in\mathbb{S}^{m}.

\nabla T(Y)^{*}Z=\langle Y,Z\rangle\nabla\eta(Y)+\eta(Y)Z\quad\mbox{for all }Z\in\mathbb{S}^{m}.

\nabla T(Y)(W)

\langle W,\nabla T(Y)^{*}Z\rangle=\langle\nabla T(Y)(W),Z\rangle=\langle\nabla\eta(Y),W\rangle\langle Y,Z\rangle+\eta(Y)\langle W,Z\rangle

L(x,\Lambda):=f(x)-\langle G(x),\Lambda\rangle.

\begin{array}{rcl}\nabla_{x}L(x,\Lambda)&=&0,\\ \Lambda\circ G(x)&=&0,\\ G(x)&\in&\mathbb{S}_{+}^{m},\\ \Lambda&\in&\mathbb{S}_{+}^{m},\end{array}

\nabla_{x}L(x,\Lambda)=\nabla f(x)-\nabla G(x)^{*}\Lambda.

\begin{array}{ll}\underset{x,\Lambda}{\mbox{minimize}}&\Psi_{c}(x,\Lambda)\\ \mbox{subject to}&(x,\Lambda)\in\mathbb{R}^{n}\times\mathbb{S}^{m},\end{array}

\varphi(Y,Z):=\left\|P_{\mathbb{S}^{m}_{+}}\left(\frac{Z}{2}-Y\right)\right\|_{F}^{2}-\frac{\|Z\|_{F}^{2}}{4}.

\left\|P_{\mathbb{S}^{m}_{+}}\left(\frac{Z}{2}-Y\right)\right\|_{F}=\left\|P_{\mathbb{S}^{m}_{+}}\left(\frac{Z}{2}-Y\right)-P_{\mathbb{S}^{m}_{+}}(-Y)\right\|_{F}\leq\left\|\frac{Z}{2}\right\|_{F}.

\mathcal{A}_{c}(x,\Lambda):=f(x)+\alpha_{c}(x,\Lambda)\varphi\big(G(x),\beta_{c}(x,\Lambda)\Lambda\big)+\gamma(x,\Lambda),

\varphi\big(G(x),\beta_{c}(x,\Lambda)\Lambda\big)=\left\|P_{\mathbb{S}^{m}_{+}}\left(\frac{\beta_{c}(x,\Lambda)}{2}\Lambda-G(x)\right)\right\|_{F}^{2}-\frac{\beta_{c}(x,\Lambda)^{2}}{4}\|\Lambda\|^{2}_{F}.

\nabla_{x}\mathcal{A}_{c}(x,\Lambda)

\nabla_{\Lambda}\mathcal{A}_{c}(x,\Lambda)

\varphi\big(G(x),\beta_{c}(x,\Lambda)\Lambda\big)=0

P_{\mathbb{S}^{m}_{+}}\left(\frac{\beta_{c}(x,\Lambda)}{2}\Lambda-G(x)\right)=\frac{\beta_{c}(x,\Lambda)}{2}\Lambda.

\nabla f(x)-2\alpha_{c}(x,\Lambda)\nabla G(x)^{*}P_{\mathbb{S}^{m}_{+}}\left(\frac{\beta_{c}(x,\Lambda)}{2}\Lambda-G(x)\right)=\nabla f(x)-\nabla G(x)^{*}\Lambda.

\tilde{G}_{\mbox{\tiny{NSDP}}}

f(\bar{x})=\mathcal{A}_{c}(\bar{x},\bar{\Lambda}).

\mathcal{A}_{c}(\tilde{x},\tilde{\Lambda})=f(\tilde{x})\leq f(\bar{x})=\mathcal{A}_{c}(\bar{x},\bar{\Lambda})\leq\mathcal{A}_{c}(\tilde{x},\tilde{\Lambda}),

\mathcal{A}_{c}(\bar{x},\bar{\Lambda})=f(\bar{x}).

\mathcal{A}_{c}(\bar{x},\bar{\Lambda})\leq\mathcal{A}_{c}(x,\Lambda)\quad\mbox{for all }(x,\Lambda)\in V_{\bar{x}}\times V_{\bar{\Lambda}}.

f(\bar{x})\leq\mathcal{A}_{c}(x,\Gamma(x))\quad\mbox{for all }x\in V_{\bar{x}}.

Full text

Exact Augmented Lagrangian Functions for Nonlinear Semidefinite Programming

This is a pre-print of an article published in Computational Optimization and Applications. The final authenticated version is available online at: https://doi.org/10.1007/s10589-018-0017-z. This work was supported by the Grant-in-Aid for Young Scientists (B) (26730012) and for Scientific Research (B) (15H02968) from Japan Society for the Promotion of Science.

Ellen H. Fukuda Department of Applied Mathematics and Physics, Graduate School of Informatics, Kyoto University, Kyoto 606–8501, Japan ([email protected]).

Bruno F. Lourenço Department of Mathematical Informatics, Graduate School of Information Science & Technology, University of Tokyo, Tokyo 113–8656, Japan ([email protected]).

(June 20, 2018)

Abstract

In this paper, we study augmented Lagrangian functions for nonlinear semidefinite programming (NSDP) problems with exactness properties. The term exact is used in the sense that the penalty parameter can be taken appropriately, so a single minimization of the augmented Lagrangian recovers a solution of the original problem. This leads to reformulations of NSDP problems into unconstrained nonlinear programming ones. Here, we first establish a unified framework for constructing these exact functions, generalizing Di Pillo and Lucidi’s work from 1996, that was aimed at solving nonlinear programming problems. Then, through our framework, we propose a practical augmented Lagrangian function for NSDP, proving that it is continuously differentiable and exact under the so-called nondegeneracy condition. We also present some preliminary numerical experiments.

Keywords: Differentiable exact merit functions, generalized augmented Lagrangian functions, nonlinear semidefinite programming.

1 Introduction

The following nonlinear semidefinite programming (NSDP) problem is considered:

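\[\begin{array}{ll}\underset{x}{\mbox{minimize}}&f(x)\\ \mbox{subject to}&G(x)\in\mathbb{S}_{+}^{m},\end{array}\tag{NSDP}\]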

where $f\colon\mathbb{R}^{n}\to\mathbb{R}$ and $G\colon\mathbb{R}^{n}\to\mathbb{S}^{m}$ are twice continuously differentiable functions, $\mathbb{S}^{m}$ is the linear space of all real symmetric matrices of dimension $m\times m$ , and $\mathbb{S}^{m}_{+}$ is the cone of all positive semidefinite matrices in $\mathbb{S}^{m}$ . For simplicity, here we do not take equality constraints into consideration. The above problem extends the well-known nonlinear programming (NLP) and the linear semidefinite programming (linear SDP) problems. NLP and linear SDP models are certainly important, but they may be insufficient in applications where more general constraints are necessary. In particular, in the recent literature, some applications of NSDP are considered in different fields, such as control theory [3, 16], structural optimization [24, 26], truss design problems [4], and finance [25]. However, compared to NLP and linear SDP models, there are still few methods available to solve NSDP, and the theory behind them requires more investigation.

Some theoretical issues associated to NSDP, like optimality conditions, are discussed in [8, 18, 23, 28, 32]. There are, in fact, some methods for NSDPs proposed in the literature such as primal-dual interior-point, augmented Lagrangian, filter-based, sequential quadratic programming, and exact penalty methods. Nevertheless, there are few implementations and, as far as we know, only two general-purpose solvers are able to handle nonlinear semidefinite constraints: PENLAB/PENNON [17] and NuOpt [36]. For a complete survey, see Yamashita and Yabe [35], and references therein.

Here, our main objects of interest are the so-called augmented Lagrangian functions, and this work can be seen as a stepping stone towards new algorithms for NSDPs. An augmented Lagrangian function is basically the usual Lagrangian function with an additional term that depends on a positive coefficient, called the penalty parameter. When there exists an appropriate choice of the parameter such that a single minimization of the augmented Lagrangian recovers a solution to the original problem, we say that this function is exact. This is actually the same as the definition of the so-called exact penalty function. The difference is that an exact augmented Lagrangian function is defined on the product space of the problem’s variables and the Lagrange multipliers, whereas an exact penalty function is defined on the same space as the original problem’s variables. Both exact functions, which are also called exact merit functions, have been studied quite extensively when the original problem is an NLP.

The first proposed exact merit functions were nondifferentiable, and the basic idea was to incorporate terms into the objective function that penalize constraint violations. However, unconstrained minimization of nondifferentiable functions demands special methods, and so continuously differentiable exact functions were considered subsequently. For NLPs, both exact penalty and exact augmented Lagrangian functions were studied. The former has the advantage of dealing with fewer variables, but it tends to have a more complicated formula, because the information about the Lagrange multipliers is, in some sense, hidden in the formula. In most exact penalty functions, this is done by using a function that estimates the value of the Lagrange multipliers associated to a point [12]. The evaluation of this estimate is, however, computationally expensive.

To overcome such a drawback, exact augmented Lagrangian functions can be considered, with a price of increasing the number of variables. The choice between these two types of exact functions depends, of course, on the optimization problem at hand. So, exact augmented Lagrangian functions were proposed in [10] and [11], by Di Pillo and Grippo for NLP problems with equality and inequality constraints, respectively. They were further investigated in [6, 13, 15, 29], with additional theoretical issues and schemes for box-constrained NLP problems. However, as far as we know, there are no proposals for exact augmented Lagrangian functions for more general conic constrained problems, in particular, for NSDP. The augmented Lagrangian function considered by Correa and Ramírez [9], and Shapiro and Sun [33], for example, is not exact.

In this paper, we introduce a continuously differentiable exact augmented Lagrangian function for NSDP problems. We also give a unified framework for constructing such functions. More precisely, we propose a generalized augmented Lagrangian function for NSDP, and give conditions for it to be exact. The main difference between the classical (and not exact) augmented Lagrangian and this exact version is the addition of a term, which we define in Section 3 as $\gamma$. This is a continuously differentiable function defined on the product space of the problem’s variables and the Lagrange multipliers, with key properties that guarantee the exactness of the augmented Lagrangian function. A general framework with such a $\gamma$ term was also given by Di Pillo and Lucidi in [14] for the NLP case. Besides the class of optimization problems considered, a difference between [14] and our work is that, here, we propose the generalization first, and then construct one particular exact augmented Lagrangian function. We believe that the generalized function can be used in the future to easily build other exact merit functions, together with possibly useful methods. Meanwhile, we report some preliminary numerical experiments with the particular exact function, using a quasi-Newton method.

The paper is organized as follows. In Section 2, we start with basic definitions and necessary results associated to NSDP problems. In Section 3, a general framework for constructing augmented Lagrangian functions with exactness properties is given. A practical exact augmented Lagrangian function, as well as its exactness results, is given in Section 4. This particular function is used in Section 5, where some numerical examples are presented. We conclude in Section 6 with some final remarks.

2 Preliminaries

Let us first present some basic notation that will be used throughout the paper. Let $x\in\mathbb{R}^{r}$ be an $r$-dimensional column vector and $Z\in\mathbb{S}^{s}$ a symmetric matrix with dimension $s\times s$. We use $x_{i}$ and $Z_{ij}$ to denote the $i$th element of $x$ and the $(i,j)$ entry ($i$th row and $j$th column) of $Z$, respectively. We also use the notation $[x_{i}]_{i=1}^{r}$ and $[Z_{ij}]_{i,j=1}^{s}$ to denote $x$ and $Z$, respectively. The trace of $Z$ is denoted by $\mathrm{tr}(Z):=\sum_{i=1}^{s}Z_{ii}$. Moreover, if $Y\in\mathbb{S}^{s}$, then the inner product of $Y$ and $Z$ is written as $\langle Y,Z\rangle:=\mathrm{tr}(YZ)$, and the Frobenius norm of $Z$ is given by $\|Z\|_{F}:=\langle Z,Z\rangle^{1/2}$. The identity matrix, with dimension defined in each context, is denoted by $I$, and $P_{\mathbb{S}^{m}_{+}}$ denotes the projection onto the cone $\mathbb{S}^{m}_{+}$.

For a function $p\colon\mathbb{R}^{s}\to\mathbb{R}$ , its gradient and Hessian at a point $x\in\mathbb{R}^{s}$ are given by $\nabla p(x)\in\mathbb{R}^{s}$ and $\nabla^{2}p(x)\in\mathbb{R}^{s\times s}$ , respectively. For $q\colon\mathbb{S}^{\ell}\to\mathbb{R}$ , $\nabla q(Z)$ denotes the matrix with $(i,j)$ term given by the partial derivatives $\partial q(Z)/\partial Z_{ij}$ . If $\psi\colon\mathbb{R}^{s}\times\mathbb{S}^{\ell}\to\mathbb{R}$ , then its gradient at $(x,Z)\in\mathbb{R}^{s}\times\mathbb{S}^{\ell}$ with respect to $x$ and $Z$ are denoted by $\nabla_{x}\psi(x,{Z})\in\mathbb{R}^{s}$ and $\nabla_{{Z}}\psi(x,{Z})\in\mathbb{S}^{\ell}$ , respectively. Similarly, the Hessian of $\psi$ at $(x,Z)$ with respect to $x$ is written as $\nabla_{xx}^{2}\psi(x,{Z})$ . For any linear operator $\mathcal{G}\colon\mathbb{R}^{s}\to\mathbb{S}^{\ell}$ defined by $\mathcal{G}v=\sum_{i=1}^{s}v_{i}\mathcal{G}_{i}$ with $\mathcal{G}_{i}\in\mathbb{S}^{\ell}$ , $i=1,\dots,s$ , and $v\in\mathbb{R}^{s}$ , the adjoint operator $\mathcal{G}^{*}$ is defined by

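\[\mathcal{G}^{*}Z=(\langle\mathcal{G}_{1},Z\rangle,\dots,\langle\mathcal{G}_{s},Z\rangle)^{\top},\quad Z\in\mathbb{S}^{\ell}.\]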

Given a mapping $\mathcal{G}\colon\mathbb{R}^{s}\to\mathbb{S}^{\ell}$ , its derivative at a point $x\in\mathbb{R}^{s}$ is denoted by $\nabla\mathcal{G}(x)\colon\mathbb{R}^{s}\to\mathbb{S}^{\ell}$ and defined by

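\[\nabla\mathcal{G}(x)v=\sum_{i=1}^{s}v_{i}\frac{\partial\mathcal{G}(x)}{\partial x_{i}},\quad v\in\mathbb{R}^{s},\]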

where $\partial\mathcal{G}(x)/\partial x_{i}\in\mathbb{S}^{\ell}$ are the partial derivative matrices.

One important operator that is necessary when dealing with NSDP problems is the Jordan product associated to the space $\mathbb{S}^{m}$ . For any $Y,Z\in\mathbb{S}^{m}$ , it is defined by

\[Y\circ Z:=\frac{YZ+ZY}{2}.\]

Taking $Y\in\mathbb{S}^{m}$ , we also denote by $\mathcal{L}_{Y}\colon\mathbb{S}^{m}\to\mathbb{S}^{m}$ the linear operator given by

\[\mathcal{L}_{Y}(Z):=Y\circ Z.\]

Since we are only considering the space $\mathbb{S}^{m}$ of symmetric matrices, we have $\mathcal{L}_{Y}(Z)=\mathcal{L}_{Z}(Y)$ . In the following lemmas, we present some useful results associated to this Jordan product and the projection operator $P_{\mathbb{S}^{m}_{+}}$ .

Lemma 2.1.

For any matrix $Z\in\mathbb{R}^{m\times m}$ , the following statements hold:

(a) $P_{\mathbb{S}^{m}_{+}}(-Z)=P_{\mathbb{S}^{m}_{+}}(Z)-Z$;

(b) $P_{\mathbb{S}^{m}_{+}}(Z)\circ P_{\mathbb{S}^{m}_{+}}(-Z)=0$.

Proof.

See [34, Section 1]. ∎
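Both identities are easy to check numerically. The sketch below is not part of the paper: it assumes NumPy, restricts attention to a symmetric test matrix, computes the projection by clipping negative eigenvalues in a spectral decomposition, and uses the hypothetical helper names `project_psd` and `jordan`.

```python
import numpy as np


def project_psd(Z):
    """Project a symmetric matrix onto the PSD cone by clipping eigenvalues at zero."""
    eigvals, eigvecs = np.linalg.eigh(Z)
    return (eigvecs * np.maximum(eigvals, 0.0)) @ eigvecs.T


def jordan(Y, Z):
    """Jordan product Y o Z = (YZ + ZY) / 2."""
    return (Y @ Z + Z @ Y) / 2.0


rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
Z = (B + B.T) / 2.0  # random symmetric test matrix (illustrative data)

P_plus = project_psd(Z)
P_minus = project_psd(-Z)

# Lemma 2.1(a): P(-Z) = P(Z) - Z
residual_a = np.linalg.norm(P_minus - (P_plus - Z))
# Lemma 2.1(b): P(Z) o P(-Z) = 0
residual_b = np.linalg.norm(jordan(P_plus, P_minus))
print(residual_a, residual_b)
```

Both residuals should be at the level of floating-point rounding error.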

Lemma 2.2.

If $Y,Z\in\mathbb{S}^{m}$ , then the following statements are equivalent:

(a) $Y,Z\in\mathbb{S}^{m}_{+}$ and $Y\circ Z=0$;

(b) $Y,Z\in\mathbb{S}^{m}_{+}$ and $\langle Y,Z\rangle=0$;

(c) $Y-P_{\mathbb{S}^{m}_{+}}(Y-Z)=0$.

Proof.

It follows from [5, Section 8.12] and [34, Lemma 2.1(b)]. ∎
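For a concrete instance of this equivalence (an illustrative example chosen here, not taken from the paper), consider two diagonal matrices in $\mathbb{S}^{2}$:

```latex
\[
Y=\begin{pmatrix}1&0\\0&0\end{pmatrix},\quad
Z=\begin{pmatrix}0&0\\0&2\end{pmatrix}
\quad\Longrightarrow\quad
Y\circ Z=0,\qquad \langle Y,Z\rangle=0,\qquad
P_{\mathbb{S}^{2}_{+}}(Y-Z)
=P_{\mathbb{S}^{2}_{+}}\begin{pmatrix}1&0\\0&-2\end{pmatrix}
=\begin{pmatrix}1&0\\0&0\end{pmatrix}=Y.
\]
```

If instead $Z=\mathrm{diag}(1,2)$, then $Y\circ Z=\mathrm{diag}(1,0)\neq 0$, $\langle Y,Z\rangle=1\neq 0$, and $P_{\mathbb{S}^{2}_{+}}(Y-Z)=P_{\mathbb{S}^{2}_{+}}(\mathrm{diag}(0,-2))=0\neq Y$, so the three statements fail together.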

Lemma 2.3.

The following statements hold.

(a) Let $Q\colon\mathbb{R}^{n}\to\mathbb{S}^{m}$ be a differentiable function, and define $\psi\colon\mathbb{R}^{n}\to\mathbb{R}$ as $\psi(x):=\|P_{\mathbb{S}^{m}_{+}}(Q(x))\|^{2}_{F}$. Then, the gradient of $\psi$ at $x\in\mathbb{R}^{n}$ is given by

\[\nabla\psi(x)=2\nabla Q(x)^{*}P_{\mathbb{S}^{m}_{+}}(Q(x)).\]

A similar result holds when the domain of the functions $Q$ and $\psi$ is changed to $\mathbb{S}^{m}$.

(b) Let $R_{1},R_{2}\colon\mathbb{R}^{n}\to\mathbb{S}^{m}$ be differentiable functions, and define $P\colon\mathbb{R}^{n}\to\mathbb{S}^{m}$ as $P(x):=\mathcal{L}_{R_{1}(x)}(R_{2}(x))=R_{1}(x)\circ R_{2}(x)$. Then, we have

\[\nabla P(x)^{*}Z=\left[\left\langle\frac{\partial R_{1}(x)}{\partial x_{i}}\circ R_{2}(x)+R_{1}(x)\circ\frac{\partial R_{2}(x)}{\partial x_{i}},Z\right\rangle\right]_{i=1}^{n}\quad\mbox{for all }Z\in\mathbb{S}^{m}.\]

(c) Let $\xi\colon\mathbb{R}^{n}\to\mathbb{R}$ be a differentiable function, and define $S\colon\mathbb{R}^{n}\to\mathbb{S}^{m}$ as $S(x):=\xi(x)W$, with $W\in\mathbb{S}^{m}$. Then, we obtain

\[\nabla S(x)^{*}Z=\langle W,Z\rangle\nabla\xi(x)\quad\mbox{for all }Z\in\mathbb{S}^{m}.\]

(d) Let $\eta\colon\mathbb{S}^{m}\to\mathbb{R}$ be a differentiable function, and define $T\colon\mathbb{S}^{m}\to\mathbb{S}^{m}$ as $T(Y):=\eta(Y)Y$. Then, we have

\[\nabla T(Y)^{*}Z=\langle Y,Z\rangle\nabla\eta(Y)+\eta(Y)Z\quad\mbox{for all }Z\in\mathbb{S}^{m}.\]

Proof.

Item (a) follows from [27, Corollary 3.2] and item (b) follows easily from the definitions of adjoint operator and Jordan product. Item (c) holds also from the definition of adjoint operator, and because $\partial S(x)/\partial x_{i}=(\partial\xi(x)/\partial x_{i})W$ for all $i$ . For item (d), observe that for all $W\in\mathbb{S}^{m}$ , we obtain

\[\nabla T(Y)(W)=\eta^{\prime}(Y;W)Y+\eta(Y)W,\]

where $\eta^{\prime}(Y;W)$ is the directional derivative of $\eta$ at $Y$ in the direction $W$ . From the differentiability of $\eta$ , we have $\nabla T(Y)(W)=\langle\nabla\eta(Y),W\rangle Y+\eta(Y)W$ . Recalling that $\nabla T(Y)^{*}$ denotes the adjoint of $\nabla T(Y)$ , this equality yields

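\[\langle W,\nabla T(Y)^{*}Z\rangle=\langle\nabla T(Y)(W),Z\rangle=\langle\nabla\eta(Y),W\rangle\langle Y,Z\rangle+\eta(Y)\langle W,Z\rangle\]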

for all $W,Z\in\mathbb{S}^{m}$ , which completes the proof. ∎
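The gradient formula of item (a) can be sanity-checked against central finite differences. The sketch below is an illustration only: it assumes NumPy and an affine map $Q(x)=A_{0}+\sum_{i}x_{i}A_{i}$ with random symmetric coefficients (a hypothetical choice, so that $\partial Q/\partial x_{i}=A_{i}$ and $\nabla Q(x)^{*}Z=[\langle A_{i},Z\rangle]_{i=1}^{n}$).

```python
import numpy as np


def project_psd(Z):
    """Project a symmetric matrix onto the PSD cone by clipping eigenvalues at zero."""
    eigvals, eigvecs = np.linalg.eigh(Z)
    return (eigvecs * np.maximum(eigvals, 0.0)) @ eigvecs.T


def sym(B):
    """Symmetric part of a square matrix."""
    return (B + B.T) / 2.0


rng = np.random.default_rng(1)
n, m = 3, 4
A0 = sym(rng.standard_normal((m, m)))
A = [sym(rng.standard_normal((m, m))) for _ in range(n)]  # A[i] = dQ/dx_i


def Q(x):
    """Affine test map Q(x) = A0 + sum_i x_i A_i (hypothetical example)."""
    return A0 + sum(x[i] * A[i] for i in range(n))


def psi(x):
    """psi(x) = ||P(Q(x))||_F^2."""
    return np.sum(project_psd(Q(x)) ** 2)


def grad_psi(x):
    """Lemma 2.3(a): grad psi(x) = 2 * (grad Q(x))^* P(Q(x))."""
    P = project_psd(Q(x))
    return 2.0 * np.array([np.trace(A[i] @ P) for i in range(n)])


x = rng.standard_normal(n)
h = 1e-6
fd = np.array([(psi(x + h * e) - psi(x - h * e)) / (2.0 * h) for e in np.eye(n)])
grad_error = np.max(np.abs(fd - grad_psi(x)))
print(grad_error)
```

The printed discrepancy should be small compared with the size of the gradient entries; it reflects only finite-difference and rounding error.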

Let us return to problem (NSDP). Define $L\colon\mathbb{R}^{n}\times\mathbb{S}^{m}\to\mathbb{R}$ as the Lagrangian function associated to problem (NSDP), that is,

\[L(x,\Lambda):=f(x)-\langle G(x),\Lambda\rangle.\]

The pair $(x,\Lambda)\in\mathbb{R}^{n}\times\mathbb{S}^{m}$ satisfies the KKT conditions of problem (NSDP) (or, it is a KKT pair) if the following conditions hold:

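\[\begin{array}{rcl}\nabla_{x}L(x,\Lambda)&=&0,\\ \Lambda\circ G(x)&=&0,\\ G(x)&\in&\mathbb{S}_{+}^{m},\\ \Lambda&\in&\mathbb{S}_{+}^{m},\end{array}\tag{2.1}\]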

where

\[\nabla_{x}L(x,\Lambda)=\nabla f(x)-\nabla G(x)^{*}\Lambda.\]

The above conditions are necessary for optimality under a constraint qualification. Moreover, Lemma 2.2 shows that the condition $\Lambda\circ G(x)=0$ can be replaced by $\langle\Lambda,G(x)\rangle=0$ because $G(x)\in\mathbb{S}_{+}^{m}$ and $\Lambda\in\mathbb{S}_{+}^{m}$ hold. Furthermore, it can be shown that this condition can also be replaced by $\Lambda G(x)=0$ [35, Section 2].
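As a toy illustration of the KKT conditions (an instance constructed here, not taken from the paper), take $n=1$, $m=2$, $f(x)=x$, and $G(x)=\mathrm{diag}(x,1)$. The feasible set is $\{x\geq 0\}$ and the minimizer is $\bar{x}=0$. Writing $\Lambda=[\Lambda_{ij}]$, the conditions at $\bar{x}$ read:

```latex
\begin{align*}
\nabla_{x}L(\bar{x},\Lambda)
  &= 1-\left\langle\frac{\partial G(\bar{x})}{\partial x},\Lambda\right\rangle
   = 1-\Lambda_{11}=0
  &&\Longrightarrow\quad \Lambda_{11}=1,\\
\Lambda\circ G(\bar{x})
  &= \begin{pmatrix}0&\Lambda_{12}/2\\ \Lambda_{12}/2&\Lambda_{22}\end{pmatrix}=0
  &&\Longrightarrow\quad \Lambda_{12}=\Lambda_{22}=0.
\end{align*}
```

Hence $\bar{\Lambda}=\mathrm{diag}(1,0)\in\mathbb{S}^{2}_{+}$ is the multiplier, and one can verify directly that $\langle\bar{\Lambda},G(\bar{x})\rangle=0$ and $\bar{\Lambda}G(\bar{x})=0$ hold as well, in line with the equivalent forms of complementarity mentioned above.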

Now, consider the nonlinear programming below:

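\[\begin{array}{ll}\underset{x,\Lambda}{\mbox{minimize}}&\Psi_{c}(x,\Lambda)\\ \mbox{subject to}&(x,\Lambda)\in\mathbb{R}^{n}\times\mathbb{S}^{m},\end{array}\tag{2.2}\]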

where $\Psi_{c}\colon\mathbb{R}^{n}\times\mathbb{S}^{m}\to\mathbb{R}$ , and $c>0$ is a penalty parameter. Observe that the above problem is unconstrained, with both the original variable $x$ and the Lagrange multiplier $\Lambda$ as variables. As usual, we say that $(x,\Lambda)$ is stationary of $\Psi_{c}$ (or for problem (2.2)) when $\nabla\Psi_{c}(x,\Lambda)=0$ . We use $G_{\mbox{\tiny{NLP}}}(c)$ and $L_{\mbox{\tiny{NLP}}}(c)$ to denote the sets of global and local minimizers, respectively, of problem (2.2). We also define $G_{\mbox{\tiny{NSDP}}}$ and $L_{\mbox{\tiny{NSDP}}}$ as the set of global and local minimizers of problem (NSDP), respectively. Using such notations, we present the formal definition of exact augmented Lagrangian functions.

Definition 2.4.

A function $\Psi_{c}\colon\mathbb{R}^{n}\times\mathbb{S}^{m}\to\mathbb{R}$ is called an exact augmented Lagrangian function associated to (NSDP) if, and only if, there exists $\hat{c}>0$ satisfying the following:

(a) For all $c\geq\hat{c}$, if $(x,\Lambda)\in G_{\mbox{\tiny{NLP}}}(c)$, then $x\in G_{\mbox{\tiny{NSDP}}}$ and $\Lambda$ is a corresponding Lagrange multiplier. Conversely, if $x\in G_{\mbox{\tiny{NSDP}}}$ with $\Lambda$ as a corresponding Lagrange multiplier, then $(x,\Lambda)\in G_{\mbox{\tiny{NLP}}}(c)$ for all $c\geq\hat{c}$.

(b) For all $c\geq\hat{c}$, if $(x,\Lambda)\in L_{\mbox{\tiny{NLP}}}(c)$, then $x\in L_{\mbox{\tiny{NSDP}}}$ and $\Lambda$ is a corresponding Lagrange multiplier.

Basically, the above definition states that $\Psi_{c}$ is an exact augmented Lagrangian function when, disregarding the Lagrange multipliers, the global minimizers of (2.2) and (NSDP) are equivalent and every local solution of (2.2) is a local solution of (NSDP), for all penalty parameters greater than a threshold value. This means that the original constrained conic problem (NSDP) can be replaced with the unconstrained nonlinear programming problem (2.2) when the penalty parameter is chosen appropriately. Note that the definition of exact penalty functions is similar. The only difference is that, in the exact penalty case, the objective function of problem (2.2) does not involve Lagrange multipliers explicitly.

3 A general framework

In this section, we propose a general formula for continuously differentiable augmented Lagrangian functions associated to NSDP problems, with exactness properties. It can be seen as a generalization of the one proposed by Di Pillo and Lucidi in [14] for NLP problems. With this purpose, let us first define the following function $\varphi\colon\mathbb{S}^{m}\times\mathbb{S}^{m}\to\mathbb{R}$ :

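\[\varphi(Y,Z):=\left\|P_{\mathbb{S}^{m}_{+}}\left(\frac{Z}{2}-Y\right)\right\|_{F}^{2}-\frac{\|Z\|_{F}^{2}}{4}.\tag{3.1}\]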

Observe that this function is continuously differentiable because $\|\cdot\|^{2}_{F}$ and $\|P_{\mathbb{S}^{m}_{+}}(\cdot)\|^{2}_{F}$ are both continuously differentiable. Moreover, it has the properties below.

Lemma 3.1.

Let $\varphi\colon\mathbb{S}^{m}\times\mathbb{S}^{m}\to\mathbb{R}$ be defined by (3.1). Then, the following statements hold.

(a) If $Y,Z\in\mathbb{S}^{m}_{+}$ and $\langle Y,Z\rangle=0$, then $\varphi(Y,Z)=0$.

(b) If $Y\in\mathbb{S}^{m}_{+}$, then $\varphi(Y,Z)\leq 0$ for all $Z\in\mathbb{S}^{m}$.

Proof.

(a) Clearly, $Z/2\in\mathbb{S}^{m}_{+}$ because $\mathbb{S}^{m}_{+}$ is a cone. From Lemma 2.2, we have $Z/2=P_{\mathbb{S}^{m}_{+}}(Z/2-Y)$ . Thus, taking the square of the Frobenius norm in both sides of this expression gives the result.

(b) Since $Y\in\mathbb{S}^{m}_{+}$ , we obtain $P_{\mathbb{S}^{m}_{+}}(-Y)=0$ . Using this fact and the nonexpansive property of the projection, we get

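\[\left\|P_{\mathbb{S}^{m}_{+}}\left(\frac{Z}{2}-Y\right)\right\|_{F}=\left\|P_{\mathbb{S}^{m}_{+}}\left(\frac{Z}{2}-Y\right)-P_{\mathbb{S}^{m}_{+}}(-Y)\right\|_{F}\leq\left\|\frac{Z}{2}\right\|_{F}.\]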

Thus, the result follows by squaring both sides of the above inequality. ∎
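The two properties of $\varphi$ can be observed numerically. The following is a minimal sketch, assuming NumPy; the helper names `project_psd` and `phi` and the test matrices are choices made here for illustration.

```python
import numpy as np


def project_psd(Z):
    """Project a symmetric matrix onto the PSD cone by clipping eigenvalues at zero."""
    eigvals, eigvecs = np.linalg.eigh(Z)
    return (eigvecs * np.maximum(eigvals, 0.0)) @ eigvecs.T


def phi(Y, Z):
    """phi(Y, Z) = ||P(Z/2 - Y)||_F^2 - ||Z||_F^2 / 4, as in (3.1)."""
    return np.sum(project_psd(Z / 2.0 - Y) ** 2) - np.sum(Z ** 2) / 4.0


# (a) A complementary PSD pair: <Y, Z> = 0, so phi should vanish.
Y = np.diag([1.0, 0.0, 2.0])
Z = np.diag([0.0, 3.0, 0.0])
phi_complementary = phi(Y, Z)

# (b) Y PSD and Z an arbitrary symmetric matrix: phi should be nonpositive.
rng = np.random.default_rng(2)
B = rng.standard_normal((3, 3))
Y_psd = B @ B.T
phi_values = []
for _ in range(100):
    C = rng.standard_normal((3, 3))
    phi_values.append(phi(Y_psd, (C + C.T) / 2.0))
phi_max = max(phi_values)
print(phi_complementary, phi_max)
```

In case (a), $Z/2-Y=\mathrm{diag}(-1,1.5,-2)$ projects to $\mathrm{diag}(0,1.5,0)=Z/2$, which is exactly the identity used in the proof.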

We propose a generalized augmented Lagrangian function $\mathcal{A}_{c}\colon\mathbb{R}^{n}\times\mathbb{S}^{m}\to\mathbb{R}$ as follows:

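\[\mathcal{A}_{c}(x,\Lambda):=f(x)+\alpha_{c}(x,\Lambda)\varphi\big(G(x),\beta_{c}(x,\Lambda)\Lambda\big)+\gamma(x,\Lambda),\tag{3.2}\]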

where $c>0$ is a penalty parameter, $\alpha_{c},\beta_{c},\gamma\colon\mathbb{R}^{n}\times\mathbb{S}^{m}\to\mathbb{R}$ , and $\varphi$ is given in (3.1), namely

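\[\varphi\big(G(x),\beta_{c}(x,\Lambda)\Lambda\big)=\left\|P_{\mathbb{S}^{m}_{+}}\left(\frac{\beta_{c}(x,\Lambda)}{2}\Lambda-G(x)\right)\right\|_{F}^{2}-\frac{\beta_{c}(x,\Lambda)^{2}}{4}\|\Lambda\|^{2}_{F}.\]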

We will show now that $\mathcal{A}_{c}$ is an exact augmented Lagrangian function associated to (NSDP) in the sense of Definition 2.4, when certain assumptions for $\alpha_{c}$ , $\beta_{c}$ , and $\gamma$ are satisfied.

Assumption 3.2.

The functions $\alpha_{c},\beta_{c},\gamma\colon\mathbb{R}^{n}\times\mathbb{S}^{m}\to\mathbb{R}$ satisfy the following conditions.

(a) $\alpha_{c},\beta_{c},\gamma$ are continuously differentiable for all $c>0$.

(b) $\alpha_{c}(x,\Lambda)>0$ for all $x$ feasible for (NSDP), $\Lambda\in\mathbb{S}^{m}$, and all $c>0$.

Moreover, if $(\bar{x},\bar{\Lambda})\in\mathbb{R}^{n}\times\mathbb{S}^{m}$ is a KKT pair of (NSDP), then the conditions below hold.

(c) $\alpha_{c}(\bar{x},\bar{\Lambda})\beta_{c}(\bar{x},\bar{\Lambda})=1$ for all $c>0$.

(d) $\gamma(\bar{x},\bar{\Lambda})=0$, $\nabla_{x}\gamma(\bar{x},\bar{\Lambda})=0$, and $\nabla_{\Lambda}\gamma(\bar{x},\bar{\Lambda})=0$.

(e) There exist neighborhoods $V_{\bar{x}}$ and $V_{\bar{\Lambda}}$ of $\bar{x}$ and $\bar{\Lambda}$, respectively, and a continuous function $\Gamma\colon V_{\bar{x}}\to V_{\bar{\Lambda}}$ such that $\Gamma(\bar{x})=\bar{\Lambda}$ and $\gamma(x,\Gamma(x))=0$ for all $x\in V_{\bar{x}}$.
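To see how a familiar function fits this template, consider the constant choices $\alpha_{c}:=c/2$, $\beta_{c}:=2/c$, and $\gamma:\equiv 0$ (choices made here purely for illustration, not a statement from the paper). Using the positive homogeneity $P_{\mathbb{S}^{m}_{+}}(tZ)=tP_{\mathbb{S}^{m}_{+}}(Z)$ for $t>0$, the penalty term becomes:

```latex
\[
\alpha_{c}\,\varphi\big(G(x),\beta_{c}\Lambda\big)
=\frac{c}{2}\left(\left\|P_{\mathbb{S}^{m}_{+}}\left(\frac{\Lambda}{c}-G(x)\right)\right\|_{F}^{2}
 -\frac{\|\Lambda\|_{F}^{2}}{c^{2}}\right)
=\frac{1}{2c}\left(\left\|P_{\mathbb{S}^{m}_{+}}\big(\Lambda-cG(x)\big)\right\|_{F}^{2}
 -\|\Lambda\|_{F}^{2}\right).
\]
```

With these choices, $\alpha_{c}\beta_{c}=1$ everywhere and conditions (a) to (c) hold trivially. The resulting expression has the form of the penalty term in the classical augmented Lagrangian for semidefinite constraints, which by itself is not exact.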

Proposition 3.3.

Suppose that Assumption 3.2(a) holds. Then, the function $\mathcal{A}_{c}$ defined in (3.2) is continuously differentiable. Moreover, its gradient with respect to $x$ and $\Lambda$ , respectively, can be written as follows:

[TABLE]

Proof.

The continuous differentiability of $\mathcal{A}_{c}$ follows from Assumption 3.2(a) and the fact that $f$ , $G$ , and $\varphi$ are continuously differentiable. For the gradient’s formula, we use Lemma 2.3(a),(c),(d) and some simple calculations. ∎
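The "simple calculations" can be sketched as follows. This is a reconstruction obtained by applying Lemma 2.3(a),(c),(d) and the chain rule to (3.2); the shorthand $\Pi$ is introduced here for brevity and is not the paper's notation, and the arrangement of terms may differ from the authors' statement. Arguments $(x,\Lambda)$ of $\alpha_{c}$, $\beta_{c}$, and $\gamma$ are omitted.

```latex
\begin{align*}
\Pi &:= P_{\mathbb{S}^{m}_{+}}\left(\frac{\beta_{c}}{2}\Lambda-G(x)\right),\\
\nabla_{x}\mathcal{A}_{c}(x,\Lambda)
 &= \nabla f(x)+\varphi\big(G(x),\beta_{c}\Lambda\big)\nabla_{x}\alpha_{c}
   +\alpha_{c}\left[\left\langle\Pi-\frac{\beta_{c}}{2}\Lambda,\Lambda\right\rangle\nabla_{x}\beta_{c}
   -2\nabla G(x)^{*}\Pi\right]+\nabla_{x}\gamma,\\
\nabla_{\Lambda}\mathcal{A}_{c}(x,\Lambda)
 &= \varphi\big(G(x),\beta_{c}\Lambda\big)\nabla_{\Lambda}\alpha_{c}
   +\alpha_{c}\left[\left\langle\Pi-\frac{\beta_{c}}{2}\Lambda,\Lambda\right\rangle\nabla_{\Lambda}\beta_{c}
   +\beta_{c}\left(\Pi-\frac{\beta_{c}}{2}\Lambda\right)\right]+\nabla_{\Lambda}\gamma.
\end{align*}
```

As a consistency check, whenever $\varphi(G(x),\beta_{c}\Lambda)=0$, $\Pi=\frac{\beta_{c}}{2}\Lambda$, $\alpha_{c}\beta_{c}=1$, and the gradients of $\gamma$ vanish, these expressions reduce to $\nabla_{x}\mathcal{A}_{c}(x,\Lambda)=\nabla f(x)-\nabla G(x)^{*}\Lambda$ and $\nabla_{\Lambda}\mathcal{A}_{c}(x,\Lambda)=0$.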

Before proving the exactness results, we will first show the relation between the function $\mathcal{A}_{c}$ and the objective function $f$ of (NSDP). As we can see in the next propositions, the values of $\mathcal{A}_{c}$ and $f$ at KKT points coincide, but if a point is only feasible, then a simple inequality holds.

Proposition 3.4.

Suppose that Assumption 3.2(b) holds. Let $x\in\mathbb{R}^{n}$ be a feasible point of (NSDP). Then, $\mathcal{A}_{c}(x,\Lambda)\leq f(x)+\gamma(x,\Lambda)$ for all $\Lambda\in\mathbb{S}^{m}$ and all $c>0$ .

Proof.

Let $\Lambda\in\mathbb{S}^{m}$ and $c>0$ be taken arbitrarily. Since $x$ is feasible for (NSDP), we have $G(x)\in\mathbb{S}^{m}_{+}$ . Thus, Lemma 3.1(b) shows that $\varphi(G(x),\beta_{c}(x,\Lambda)\Lambda)\leq 0$ is satisfied. The proof is complete because $\alpha_{c}(x,\Lambda)>0$ also holds from Assumption 3.2(b). ∎

Proposition 3.5.

Suppose that Assumption 3.2 holds. Let $(x,\Lambda)\in\mathbb{R}^{n}\times\mathbb{S}^{m}$ be a KKT pair of (NSDP). Then, $(x,\Lambda)$ is also stationary of $\mathcal{A}_{c}$ , and $\mathcal{A}_{c}(x,\Lambda)=f(x)$ for all $c>0$ .

Proof.

Let $c>0$ be arbitrarily given and recall the formulas of $\nabla_{x}\mathcal{A}_{c}(x,\Lambda)$ and $\nabla_{\Lambda}\mathcal{A}_{c}(x,\Lambda)$ given in Proposition 3.3. From Assumption 3.2(b),(c), we have $\beta_{c}(x,\Lambda)>0$ . So, from the KKT conditions (2.1), $G(x)\in\mathbb{S}^{m}_{+}$ , $\beta_{c}(x,\Lambda)\Lambda\in\mathbb{S}^{m}_{+}$ , and $\langle G(x),\beta_{c}(x,\Lambda)\Lambda\rangle=0$ also hold, which imply that

\[\varphi\big(G(x),\beta_{c}(x,\Lambda)\Lambda\big)=0\tag{3.3}\]

from Lemma 3.1(a). Moreover, Lemma 2.2 shows that

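\[P_{\mathbb{S}^{m}_{+}}\left(\frac{\beta_{c}(x,\Lambda)}{2}\Lambda-G(x)\right)=\frac{\beta_{c}(x,\Lambda)}{2}\Lambda.\tag{3.4}\]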

The equalities (3.3), (3.4) and Assumption 3.2(d) yield $\nabla_{\Lambda}\mathcal{A}_{c}(x,\Lambda)=0$ . Moreover, from (3.4) and Assumption 3.2(c), we have

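\[\nabla f(x)-2\alpha_{c}(x,\Lambda)\nabla G(x)^{*}P_{\mathbb{S}^{m}_{+}}\left(\frac{\beta_{c}(x,\Lambda)}{2}\Lambda-G(x)\right)=\nabla f(x)-\nabla G(x)^{*}\Lambda.\]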

So, once again using Assumption 3.2(d), equalities (3.3), (3.4) and the KKT condition $\nabla f(x)-\nabla G(x)^{*}\Lambda=0$, we can conclude that $\nabla_{x}\mathcal{A}_{c}(x,\Lambda)=0$ holds. Finally, (3.3) and Assumption 3.2(d) also yield $\mathcal{A}_{c}(x,\Lambda)=f(x)$, and the proof is complete. ∎
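Propositions 3.4 and 3.5 can be observed on a toy instance. The sketch below is an assumption-laden illustration, not the paper's function: it assumes NumPy, uses the hypothetical problem $f(x)=x$, $G(x)=\mathrm{diag}(x,1)$ with KKT pair $(\bar{x},\bar{\Lambda})=(0,\mathrm{diag}(1,0))$, and takes the constant choices $\alpha_{c}=c/2$, $\beta_{c}=2/c$, $\gamma\equiv 0$.

```python
import numpy as np


def project_psd(Z):
    """Project a symmetric matrix onto the PSD cone by clipping eigenvalues at zero."""
    eigvals, eigvecs = np.linalg.eigh(Z)
    return (eigvecs * np.maximum(eigvals, 0.0)) @ eigvecs.T


def f(x):
    """Toy objective (hypothetical example)."""
    return x


def G(x):
    """Toy constraint map; G(x) is PSD exactly when x >= 0."""
    return np.diag([x, 1.0])


def aug_lagrangian(x, Lam, c):
    """A_c from (3.2) with alpha_c = c/2, beta_c = 2/c, gamma = 0 (illustrative choice)."""
    alpha, beta = c / 2.0, 2.0 / c
    P = project_psd(beta / 2.0 * Lam - G(x))
    phi = np.sum(P ** 2) - beta ** 2 / 4.0 * np.sum(Lam ** 2)
    return f(x) + alpha * phi


# Proposition 3.5: at a KKT pair, A_c equals f for every c > 0.
x_bar, Lam_bar = 0.0, np.diag([1.0, 0.0])
gaps_kkt = [abs(aug_lagrangian(x_bar, Lam_bar, c) - f(x_bar)) for c in (0.5, 1.0, 10.0)]

# Proposition 3.4: at feasible x, A_c(x, Lam) <= f(x) + gamma = f(x) for any symmetric Lam.
rng = np.random.default_rng(3)
worst = -np.inf
for _ in range(200):
    x = rng.uniform(0.0, 5.0)  # feasible point
    B = rng.standard_normal((2, 2))
    Lam = (B + B.T) / 2.0
    c = rng.uniform(0.1, 10.0)
    worst = max(worst, aug_lagrangian(x, Lam, c) - f(x))
print(gaps_kkt, worst)
```

The gaps at the KKT pair should vanish up to rounding, and `worst` should be nonpositive. Note that this particular choice of $\alpha_{c}$, $\beta_{c}$, $\gamma$ is used only to exercise the two propositions; it is not claimed to yield an exact function.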

The above proposition shows that a KKT pair of (NSDP) is stationary of $\mathcal{A}_{c}$ , and this assertion does not depend on the parameter $c$ . The exactness properties of $\mathcal{A}_{c}$ can be shown only if the other implication also holds, that is, a stationary point of $\mathcal{A}_{c}$ should be a KKT pair of (NSDP), at least when $c$ is greater than some threshold value. If such a statement holds, then the exactness of $\mathcal{A}_{c}$ is guaranteed, as it can be seen below. Before that, we recall that the global (local) minimizers of problems (NSDP) and (2.2) with $\Psi_{c}:=\mathcal{A}_{c}$ are, respectively, denoted by $G_{\mbox{\tiny{NSDP}}}$ ( $L_{\mbox{\tiny{NSDP}}}$ ) and $G_{\mbox{\tiny{NLP}}}(c)$ ( $L_{\mbox{\tiny{NLP}}}(c)$ ). We also consider the following assumption:

Assumption 3.6.

The sets $G_{\mbox{\tiny{NSDP}}}$ and $G_{\mbox{\tiny{NLP}}}(c)$ are nonempty for all $c>0$ . Moreover, for every $x\in G_{\mbox{\tiny{NSDP}}}$ there is at least one $\Lambda\in\mathbb{S}^{m}$ such that $(x,\Lambda)$ is a KKT pair of (NSDP).

The existence of optimal solutions of the unconstrained problem is guaranteed if an extraneous compact set is considered [12], or by exploiting some properties of the problem, as coercivity and monotonicity [1]. Furthermore, we can ensure the existence of a Lagrange multiplier by imposing some constraint qualification. Now, if we define

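\[\tilde{G}_{\mbox{\tiny{NSDP}}}:=\big\{(x,\Lambda)\colon x\in G_{\mbox{\tiny{NSDP}}}\mbox{ and }\Lambda\mbox{ is a corresponding Lagrange multiplier}\big\},\tag{3.5}\]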

then, using Assumption 3.6, we obtain $\tilde{G}_{\mbox{\tiny{NSDP}}}\neq\emptyset$ .

Lemma 3.7.

Suppose that Assumptions 3.2 and 3.6 hold. Then for all $c>0$ , $G_{\mbox{\tiny{NLP}}}(c)\subseteq\tilde{G}_{\mbox{\tiny{NSDP}}}$ implies $G_{\mbox{\tiny{NLP}}}(c)=\tilde{G}_{\mbox{\tiny{NSDP}}}$ .

Proof.

Let $c>0$ be arbitrarily given and $(\bar{x},\bar{\Lambda})\in G_{\mbox{\tiny{NLP}}}(c)$ . By assumption, $(\bar{x},\bar{\Lambda})\in\tilde{G}_{\mbox{\tiny{NSDP}}}$ also holds, and thus, $(\bar{x},\bar{\Lambda})$ is a KKT pair of (NSDP). From Proposition 3.5, we obtain

\[f(\bar{x})=\mathcal{A}_{c}(\bar{x},\bar{\Lambda}).\tag{3.6}\]

Recall that $\tilde{G}_{\mbox{\tiny{NSDP}}}\neq\emptyset$ because of Assumption 3.6. So, take $(\tilde{x},\tilde{\Lambda})\in\tilde{G}_{\mbox{\tiny{NSDP}}}$ , with $(\tilde{x},\tilde{\Lambda})\neq(\bar{x},\bar{\Lambda})$ . Since $(\tilde{x},\tilde{\Lambda})$ satisfies the KKT conditions of (NSDP), once again from Proposition 3.5, we have $f(\tilde{x})=\mathcal{A}_{c}(\tilde{x},\tilde{\Lambda})$ . This fact, together with (3.6), and the definition of global solutions, gives

[TABLE]

which shows that the whole expression above holds with equalities. Therefore, $(\tilde{x},\tilde{\Lambda})\in G_{\mbox{\tiny{NLP}}}(c)$ , which completes the proof. ∎

Theorem 3.8.

Suppose that Assumptions 3.2 and 3.6 hold. Assume also that there exists $\hat{c}>0$ such that every stationary point of $\mathcal{A}_{c}$ is also a KKT pair of (NSDP) for all $c\geq\hat{c}$. Then, $\mathcal{A}_{c}$ is an exact augmented Lagrangian function associated to (NSDP), in other words:

(a)

$G_{\mbox{\tiny{NLP}}}(c)=\tilde{G}_{\mbox{\tiny{NSDP}}}$ for all $c\geq\hat{c}$.

(b)

$L_{\mbox{\tiny{NLP}}}(c)\subseteq\big\{(x,\Lambda)\colon x\in L_{\mbox{\tiny{NSDP}}}\mbox{ and }\Lambda\mbox{ is a corresponding multiplier}\big\}$ for all $c\geq\hat{c}$.

Proof.

(a) Let $c\geq\hat{c}$ be arbitrarily given. From Lemma 3.7, we only need to prove that $G_{\mbox{\tiny{NLP}}}(c)\subseteq\tilde{G}_{\mbox{\tiny{NSDP}}}$. Let $(\bar{x},\bar{\Lambda})\in G_{\mbox{\tiny{NLP}}}(c)$. From (3), we need to show that $\bar{x}\in G_{\mbox{\tiny{NSDP}}}$, with $\bar{\Lambda}$ as a corresponding Lagrange multiplier. Being a global minimizer of the unconstrained problem, $(\bar{x},\bar{\Lambda})$ is a stationary point of $\mathcal{A}_{c}$. From this theorem’s assumption, it is also a KKT pair of (NSDP), which implies $f(\bar{x})=\mathcal{A}_{c}(\bar{x},\bar{\Lambda})$ from Proposition 3.5. Now, assume that there exists $\tilde{x}\in G_{\mbox{\tiny{NSDP}}}$ such that $\tilde{x}\neq\bar{x}$. By Assumption 3.6, there exists $\tilde{\Lambda}$ such that $(\tilde{x},\tilde{\Lambda})$ satisfies the KKT conditions of (NSDP). Once again by Proposition 3.5, we have $f(\tilde{x})=\mathcal{A}_{c}(\tilde{x},\tilde{\Lambda})$. So, the definition of global minimizers gives

[TABLE]

which shows that the whole expression above holds with equalities. Therefore, $\bar{x}\in G_{\mbox{\tiny{NSDP}}}$ , with $\bar{\Lambda}$ as a corresponding Lagrange multiplier.

(b) Let $c\geq\hat{c}$ and $(\bar{x},\bar{\Lambda})\in L_{\mbox{\tiny{NLP}}}(c)$. Since $(\bar{x},\bar{\Lambda})$ is a stationary point of $\mathcal{A}_{c}$, it is also a KKT pair of (NSDP) by the theorem’s assumption. So, from Proposition 3.5, we have

[TABLE]

Moreover, from the definition of local minimizer, there exist neighborhoods $V_{\bar{x}}$ and $V_{\bar{\Lambda}}$ of $\bar{x}$ and $\bar{\Lambda}$ , respectively, such that

[TABLE]

Here, we suppose that $V_{\bar{x}}$ and $V_{\bar{\Lambda}}$ are sufficiently small, which guarantees the existence of a function $\Gamma$ as in Assumption 3.2(e). In particular, we obtain $\mathcal{A}_{c}(\bar{x},\bar{\Lambda})\leq\mathcal{A}_{c}(x,\Gamma(x))$ for all $x\in V_{\bar{x}}$ . This inequality, together with (3.7), shows that

[TABLE]

Thus, from Proposition 3.4 and Assumption 3.2(e), we get

[TABLE]

for all $x\in V_{\bar{x}}$ that is feasible for (NSDP). So, we conclude that $\bar{x}\in L_{\mbox{\tiny{NSDP}}}$ . ∎

The above result shows that the generalized function $\mathcal{A}_{c}$ is an exact augmented Lagrangian function if a finite penalty parameter $\hat{c}>0$ satisfying

[TABLE]

is guaranteed to exist. However, even if Assumption 3.2 holds, we usually cannot expect that such $\hat{c}$ exists. In the next section, we will observe that the functions $\alpha_{c}$ , $\beta_{c}$ , and $\gamma$ , used in the formula of $\mathcal{A}_{c}$ , should be taken carefully for such a purpose.

4 The proposed exact augmented Lagrangian function

Here, we construct a particular exact augmented Lagrangian function by choosing the functions $\alpha_{c}$ , $\beta_{c}$ , and $\gamma$ , used in $\mathcal{A}_{c}$ (formula (3.2)) appropriately. Before that, let us note that by defining

[TABLE]

where $c>0$ is the penalty parameter, we obtain the augmented Lagrangian function for NSDP given in [9, 33]. This function is actually an extension of the classical augmented Lagrangian function for NLP (see [6] for instance), and it is equal to the Lagrangian function with some additional terms. However, it is not exact in the sense of Definition 2.4.

In order to construct an augmented Lagrangian function with the exactness property, we choose a more complex $\gamma$ that satisfies Assumption 3.2(d),(e). As in [14], the function $\Gamma$ of item (e) can be taken as a function that estimates the value of the Lagrange multipliers associated to a point. One possibility for such an estimate for NSDP problems is given in [21], which in turn extends the ones proposed in [19, 20]. Basically, given $x\in\mathbb{R}^{n}$, we consider the following unconstrained problem:

[TABLE]

where $\zeta_{1},\zeta_{2}\in\mathbb{R}$ are positive scalars, and $r\colon\mathbb{R}^{n}\to\mathbb{R}$ denotes the residual function associated to the feasible set, that is,

$$r(x):=\frac{1}{2}\big\|P_{\mathbb{S}^{m}_{+}}(-G(x))\big\|_{F}^{2}.$$

Observe that $r(x)=0$ if, and only if, $x$ is feasible for (NSDP). The idea underlying problem (4.1) is to force KKT conditions (2.1) to hold, except for the feasibility of the Lagrange multiplier. Actually, this problem can be seen as a linear least squares problem, and so its solution can be written explicitly when the so-called nondegeneracy assumption holds. It is well-known that the nondegeneracy condition, defined below, extends the classical linear independence constraint qualification for nonlinear programming [8, 32]; see also Section 4 and Corollary 2 in [28]. In particular, under nondegeneracy, Lagrange multipliers are ensured to exist at optimal points.
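To make the role of the residual function concrete, the following is a minimal NumPy sketch. It assumes that the feasible set is $\{x\colon G(x)\in\mathbb{S}^{m}_{+}\}$ and that $r(x)=\frac{1}{2}\|P_{\mathbb{S}^{m}_{+}}(-G(x))\|_{F}^{2}$; the function names `project_psd` and `residual` are hypothetical and are not taken from the authors' implementation.

```python
import numpy as np

def project_psd(A):
    """Frobenius-norm projection of a symmetric matrix onto the PSD cone."""
    A = 0.5 * (A + A.T)                     # symmetrize against round-off
    w, U = np.linalg.eigh(A)                # A = U diag(w) U^T
    return (U * np.maximum(w, 0.0)) @ U.T   # clip negative eigenvalues at zero

def residual(Gx):
    """Assumed residual r(x) = 0.5 * ||P(-G(x))||_F^2, evaluated at Gx = G(x)."""
    return 0.5 * np.linalg.norm(project_psd(-Gx), "fro") ** 2

r_feasible = residual(np.eye(2))               # G(x) is PSD
r_infeasible = residual(np.diag([1.0, -2.0]))  # G(x) has eigenvalue -2
```

Under this assumption the residual vanishes exactly when $G(x)$ is positive semidefinite, and otherwise equals half the sum of the squared negative eigenvalues of $G(x)$.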

Assumption 4.1.

Every $x\in\mathbb{R}^{n}$ feasible for (NSDP) is nondegenerate, that is,

$$\mathbb{S}^{m}=\mathrm{lin}\,\mathcal{T}_{\mathbb{S}^{m}_{+}}(G(x))+\mathrm{Im}\,\nabla G(x),$$

where $\mathcal{T}_{\mathbb{S}^{m}_{+}}(G(x))$ denotes the tangent cone of $\mathbb{S}^{m}_{+}$ at $G(x)$ , $\mathrm{Im}\,\nabla G(x)$ is the image of the linear map $\nabla G(x)$ , and $\mathrm{lin}$ means lineality space.

Lemma 4.2.

Suppose that Assumption 4.1 holds. For a given $x\in\mathbb{R}^{n}$ , define $N\colon\mathbb{R}^{n}\to\mathbb{S}^{m}$ as

[TABLE]

Then, the following statements are true.

(a)

$N(\cdot)$ is continuously differentiable and for all $x\in\mathbb{R}^{n}$, the matrix $N(x)$ is positive definite.

(b)

The solution of problem (4.1) is unique and it is given by

[TABLE]

(c)

If $(x,\Lambda)\in\mathbb{R}^{n}\times\mathbb{S}^{m}$ is a KKT pair of (NSDP), then $\Lambda(x)=\Lambda$ .

(d)

The operator $\Lambda(\cdot)$ is continuously differentiable, and $\nabla\Lambda(x)=N(x)^{-1}Q(x)$ , where

[TABLE]

Proof.

See [21, Lemma 2.2 and Proposition 2.3]. ∎

The augmented Lagrangian function $L_{c}\colon\mathbb{R}^{n}\times\mathbb{S}^{m}\to\mathbb{R}$ that we propose is given by

[TABLE]

where $\Lambda(\cdot)$ and $N(\cdot)$ are given in Lemma 4.2. It is equivalent to the usual augmented Lagrangian function for NSDP, except for the last term. So, comparing to the generalized one (3.2), we have

[TABLE]

Observe that the functions $\alpha_{c},\beta_{c},\gamma$ defined in such a way satisfy Assumption 3.2. In fact, items (a), (b), and (c) of this assumption hold trivially, and item (d) is satisfied because of Lemma 4.2(c). The function $\Gamma$ of item (e) corresponds to $\Lambda(\cdot)$ , with the necessary properties described in Lemma 4.2(c),(d). Note that $\gamma(x,\Lambda(x))=0$ for all $x$ in this case.

Now, from (4.2) and (4.3), observe that

[TABLE]

Also, consider the following auxiliary function $Y_{c}\colon\mathbb{R}^{n}\times\mathbb{S}^{m}\to\mathbb{S}^{m}$ defined by

[TABLE]

The gradient of $L_{c}(x,\Lambda)$ with respect to $x$ is given by

[TABLE]

with

[TABLE]

where the second equality follows from (4.6). Using Lemma 2.3(a), as well as some additional calculations, we obtain

[TABLE]

Moreover, the gradient of $L_{c}(x,\Lambda)$ with respect to $\Lambda$ can be written as follows:

[TABLE]

Here, we point out that the formulas of $L_{c}$, $\nabla_{x}L_{c}$ and $\nabla_{\Lambda}L_{c}$, presented respectively in (4.4), (4) and (4.9), do not require explicit computation of the multiplier estimate $\Lambda(x)$. In fact, the estimate only appears in the expression $N(x)(\Lambda(x)-\Lambda)$, which can be written as (4.6). This means that neither $L_{c}$ nor its gradients require solving the linear least squares problem (4.1), which is computationally expensive.

4.1 Exactness results

In the whole section, we suppose that Assumptions 3.6 and 4.1 hold. Indeed, it can be noted that the assertion about constraint qualifications in Assumption 3.6 holds automatically from Assumption 4.1. Here, we will show that the particular augmented Lagrangian $L_{c}$ , defined in (4.4), is in fact exact. With this purpose, we will first establish the relation between the KKT points of the original (NSDP) problem and the stationary points of the unconstrained problem:

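$$\begin{array}{ll}\underset{x,\Lambda}{\mbox{minimize}}&L_{c}(x,\Lambda)\\ \mbox{subject to}&(x,\Lambda)\in\mathbb{R}^{n}\times\mathbb{S}^{m}.\end{array} \tag{4.10}$$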

Proposition 4.3.

Let $(x,\Lambda)\in\mathbb{R}^{n}\times\mathbb{S}^{m}$ be a KKT pair of (NSDP). Then, for all $c>0$ , $L_{c}(x,\Lambda)=f(x)$ and $(x,\Lambda)$ is stationary of $L_{c}$ , that is, $\nabla_{x}L_{c}(x,\Lambda)=0$ and $\nabla_{\Lambda}L_{c}(x,\Lambda)=0$ .

Proof.

Recalling that the functions defined in (4.5) satisfy Assumption 3.2, the result follows from Proposition 3.5. ∎

Proposition 4.4.

Let $\hat{x}\in\mathbb{R}^{n}$ be feasible for (NSDP) and $\hat{\Lambda}\in\mathbb{S}^{m}$. Then, there exist $\hat{c},\hat{\delta}_{1},\hat{\delta}_{2}>0$ such that if $(x,\Lambda)\in\mathbb{R}^{n}\times\mathbb{S}^{m}$ is stationary of $L_{c}$ with $\|x-\hat{x}\|\leq\hat{\delta}_{1}$, $\|\Lambda-\hat{\Lambda}\|_{F}\leq\hat{\delta}_{2}$ and $c\geq\hat{c}$, then $(x,\Lambda)$ is a KKT pair of (NSDP).

Proof.

Let us first consider an arbitrary pair $(x,\Lambda)\in\mathbb{R}^{n}\times\mathbb{S}^{m}$ and $c>0$ . For convenience, we also define the following function:

[TABLE]

with the last equality following from Lemma 2.1(a). Lemma 2.1(b) and the definition of $Y_{c}(x,\Lambda)$ in (4.7) show that

[TABLE]

The above expression can be rewritten using the distributivity of the Jordan product:

[TABLE]

Moreover, from (4.11), the above equality, the distributivity and the commutativity of the Jordan product, we obtain

[TABLE]

Using this expression, we have

[TABLE]

Now, the formula of $\nabla_{x}L_{c}(x,\Lambda)$ in (4) and the equality (4.6) show that

[TABLE]

Thus, from (4.12), we have

[TABLE]

where

[TABLE]

Let us now consider $(x,\Lambda)\in\mathbb{R}^{n}\times\mathbb{S}^{m}$ that is stationary of $L_{c}$ . Since $\nabla_{\Lambda}L_{c}(x,\Lambda)=0$ , we obtain, from (4.9),

[TABLE]

because $N(x)$ is nonsingular from Lemma 4.2(a). Recalling that $\nabla_{x}L_{c}(x,\Lambda)=0$ also holds, then, from (4.13) we obtain

[TABLE]

where

[TABLE]

Using the fact that $\|W\|_{F}^{2}/2-\|Z\|_{F}^{2}\leq\|W-Z\|_{F}^{2}$ for any matrices $W,Z$ , from (4.15), we can write

[TABLE]
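For completeness, the elementary inequality $\|W\|_{F}^{2}/2-\|Z\|_{F}^{2}\leq\|W-Z\|_{F}^{2}$ invoked in this step can be checked directly. For any matrices $W,Z$ of the same size, the triangle inequality and the bound $(a+b)^{2}\leq 2a^{2}+2b^{2}$ give

$$\|W\|_{F}^{2}=\|(W-Z)+Z\|_{F}^{2}\leq\big(\|W-Z\|_{F}+\|Z\|_{F}\big)^{2}\leq 2\|W-Z\|_{F}^{2}+2\|Z\|_{F}^{2},$$

and dividing by $2$ and rearranging yields the claimed inequality.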

Moreover, the definition of projection and Lemma 2.1(a) yield

[TABLE]

The above inequality, together with (4.16) implies

[TABLE]

where $\sigma_{\min}(\cdot)$ denotes the smallest singular value function.

Recalling that $\hat{x}$ is feasible for (NSDP), we observe that if $c\to\infty$ , then $\hat{Y}_{c}(\hat{x},\Lambda)\to P_{\mathbb{S}^{m}_{+}}(G(\hat{x}))=G(\hat{x})$ for all $\Lambda$ . Since $r(\hat{x})=0$ , this also shows that

[TABLE]

with $N(\hat{x})$ defined in (4.2). Now, note that from Lemma 4.2(a), $N(\hat{x})$ is positive definite. Also, define

[TABLE]

Observe that $M$ is continuous because all functions involved in its formula are continuous, and that $M(\hat{x})=N(\hat{x})$ , which is positive definite. Therefore, there is $\delta_{1}>0$ such that $\|x-\hat{x}\|\leq\delta_{1}$ implies that $M(x)$ is also positive definite.

Letting $\hat{\Lambda}\in\mathbb{S}^{m}$ , there exist $c_{0},\hat{\delta}_{1},\hat{\delta}_{2}>0$ with $\hat{\delta}_{1}<\delta_{1}$ such that both $M(x)$ and $N_{c_{0}}(x,\Lambda)$ are positive definite for all $(x,\Lambda)$ in the set

[TABLE]

Now, we would like to prove that there is $\hat{c}_{0}>0$ such that $N_{c}(x,\Lambda)$ is positive definite for all $c\geq\hat{c}_{0}$ and all $(x,\Lambda)$ in $\mathcal{V}$. To do so, suppose that this statement is false. Then, there are sequences $\{c_{k}\}\subset\mathbb{R}_{++}$ and $\{(x^{k},\Lambda_{k})\}\subset\mathcal{V}$ such that $c_{k}\to\infty$ and $N_{c_{k}}(x^{k},\Lambda_{k})$ is not positive definite for all $k$. Since $\mathcal{V}$ is compact, we may assume that $\{(x^{k},\Lambda_{k})\}$ converges to some $(\tilde{x},\tilde{\Lambda})\in\mathcal{V}$. However, we have

[TABLE]

Since $M(\tilde{x})$ is positive definite, $N_{c_{k}}(x^{k},\Lambda_{k})$ should also be positive definite for $k$ sufficiently large. This contradicts the fact that no $N_{c_{k}}(x^{k},\Lambda_{k})$ is positive definite, by construction. We conclude that there is $\hat{c}_{0}$ such that $N_{c}(x,\Lambda)$ is positive definite for all $c\geq\hat{c}_{0}$ and all $(x,\Lambda)$ in $\mathcal{V}$ .

Considering one such $(x,\Lambda)\in\mathcal{V}$, we now seek some $c(x,\Lambda)\geq\hat{c}_{0}$ such that $\tilde{N}_{c}(x,\Lambda)$ is nonsingular for all $c\geq c(x,\Lambda)$. We remark that we already know that $\sigma_{\min}(N_{c}(x,\Lambda))$ is positive over $\mathcal{V}$. Denote by $\sigma_{\max}(\cdot)$ the maximum singular value function. Then, recalling the formula for $\tilde{N}_{c}(x,\Lambda)$ and elementary properties of singular values (namely, that $\inf_{\|x\|=1}\|Ax-Bx\|=\sigma_{\min}(A-B)\geq\sigma_{\min}(A)-\sigma_{\max}(B)$), we have that if

[TABLE]

then $\tilde{N}_{c}(x,\Lambda)$ will be nonsingular for all $c\geq c(x,\Lambda)$ . As $c(x,\Lambda)$ is a continuous function of $(x,\Lambda)$ , we have that $c_{1}:=\sup_{\mathcal{V}}c(x,\Lambda)$ is finite. Thus, $\tilde{N}_{c}(x,\Lambda)$ is positive definite, and hence $\sigma_{\min}(\tilde{N}_{c}(x,\Lambda))>0$ for all $c\geq c_{1}$ and $(x,\Lambda)\in\mathcal{V}$ . Similarly, there exists $\hat{c}\geq c_{1}$ such that

[TABLE]
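The singular value bound $\sigma_{\min}(A-B)\geq\sigma_{\min}(A)-\sigma_{\max}(B)$ used in this part of the proof is a standard perturbation fact. The short NumPy sketch below merely sanity-checks it on random square matrices; the helper names and the tolerance `tol` are illustrative assumptions and play no role in the argument itself.

```python
import numpy as np

def sigma_min(M):
    """Smallest singular value of M."""
    return np.linalg.svd(M, compute_uv=False).min()

def sigma_max(M):
    """Largest singular value of M."""
    return np.linalg.svd(M, compute_uv=False).max()

def bound_holds(A, B, tol=1e-12):
    """Check sigma_min(A - B) >= sigma_min(A) - sigma_max(B) up to round-off."""
    return sigma_min(A - B) >= sigma_min(A) - sigma_max(B) - tol

rng = np.random.default_rng(0)
checks = [
    bound_holds(rng.standard_normal((4, 4)), 0.1 * rng.standard_normal((4, 4)))
    for _ in range(100)
]
```

In particular, whenever $\sigma_{\max}(B)<\sigma_{\min}(A)$ the difference $A-B$ is nonsingular, which is how the bound is used above.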

Finally, consider $(x,\Lambda)\in\mathbb{R}^{n}\times\mathbb{S}^{m}$ and $c\in\mathbb{R}_{++}$ such that $(x,\Lambda)$ is stationary of $L_{c}$ , $\|x-\hat{x}\|\leq\hat{\delta}_{1}$ , $\|\Lambda-\hat{\Lambda}\|_{F}\leq\hat{\delta}_{2}$ and $c\geq\hat{c}$ . From (4.17) and the above inequality, it means that $Y_{c}(x,\Lambda)=0$ . Also, from Lemma 2.2, and the fact that $c>0$ , it yields

[TABLE]

Moreover, from (4.14) and the fact that $N(x)$ is nonsingular by Lemma 4.2(a), we have $\Lambda(x)=\Lambda$ . Since $\nabla_{x}L_{c}(x,\Lambda)=0$ also holds, from (4), we obtain $\nabla_{x}L(x,\Lambda)=0$ . Therefore, $(x,\Lambda)$ is a KKT pair of (NSDP). ∎

Proposition 4.5.

Let $\{x^{k}\}\subset\mathbb{R}^{n}$ , $\{\Lambda_{k}\}\subset\mathbb{S}^{m}$ , and $\{c_{k}\}\subset\mathbb{R}_{++}$ be sequences such that $c_{k}\to\infty$ and $(x^{k},\Lambda_{k})$ is stationary of $L_{c_{k}}$ for all $k$ . Assume that there are subsequences $\{x^{k_{j}}\}$ and $\{\Lambda_{k_{j}}\}$ of $\{x^{k}\}$ and $\{\Lambda_{k}\}$ , respectively, such that $x^{k_{j}}\to\hat{x}$ and $\Lambda_{k_{j}}\to\hat{\Lambda}$ for some $(\hat{x},\hat{\Lambda})\in\mathbb{R}^{n}\times\mathbb{S}^{m}$ . Then, either there exists $\hat{k}>0$ such that $(x^{k_{j}},\Lambda_{k_{j}})$ is a KKT pair of (NSDP) for all $k_{j}\geq\hat{k}$ , or $\hat{x}$ is a stationary point of the residual function $r$ that is infeasible for (NSDP).

Proof.

We first show that $\hat{x}$ is a stationary point of $r$ , in other words,

$$\nabla G(\hat{x})^{*}P_{\mathbb{S}^{m}_{+}}(-G(\hat{x}))=0.$$

In fact, using (4) and dividing the equation $\nabla_{x}L_{c_{k_{j}}}(x^{k_{j}},\Lambda_{k_{j}})=0$ by $c_{k_{j}}$ , we have

[TABLE]

Recalling Lemma 4.2, we observe that all the functions involved in the above equation are continuous. Thus, taking the limit $k_{j}\to\infty$ , from the definition of $Y_{c}$ in (4.7), we obtain $-\nabla G(\hat{x})^{*}P_{\mathbb{S}^{m}_{+}}(-G(\hat{x}))=0$ , as we claimed. Now, assume that $\hat{x}$ is feasible. Then, from Proposition 4.4, there exists $\hat{k}>0$ such that $(x^{k_{j}},\Lambda_{k_{j}})$ is a KKT pair of (NSDP) for all $k_{j}\geq\hat{k}$ , which completes the proof. ∎

Now, recalling Definition 2.4, once again, we use the notations $G_{\mbox{\tiny{NSDP}}}$ ( $L_{\mbox{\tiny{NSDP}}}$ ) and $G_{\mbox{\tiny{NLP}}}(c)$ ( $L_{\mbox{\tiny{NLP}}}(c)$ ) to denote the sets of global (local) minimizers of problems (NSDP) and (4.10), respectively. The following theorems show that the proposed function $L_{c}$ given in (4.4) is in fact an exact augmented Lagrangian function. However, the results are established as in [2], where it is admitted that we can end up with a stationary point of the residual function $r$ that is infeasible for (NSDP).

Theorem 4.6.

Let $\{x^{k}\}\subset\mathbb{R}^{n}$ , $\{\Lambda_{k}\}\subset\mathbb{S}^{m}$ , and $\{c_{k}\}\subset\mathbb{R}_{++}$ be sequences such that $c_{k}\to\infty$ and $(x^{k},\Lambda_{k})\in L_{\mbox{\tiny{NLP}}}(c_{k})$ for all $k$ . Assume that there are subsequences $\{x^{k_{j}}\}$ and $\{\Lambda_{k_{j}}\}$ of $\{x^{k}\}$ and $\{\Lambda_{k}\}$ , respectively, such that $x^{k_{j}}\to\hat{x}$ and $\Lambda_{k_{j}}\to\hat{\Lambda}$ for some $(\hat{x},\hat{\Lambda})\in\mathbb{R}^{n}\times\mathbb{S}^{m}$ . Then, either there exists $\hat{k}>0$ such that $x^{k_{j}}\in L_{\mbox{\tiny{NSDP}}}$ , with an associated Lagrange multiplier $\Lambda_{k_{j}}$ for all $k_{j}\geq\hat{k}$ , or $\hat{x}$ is a stationary point of the residual function $r$ that is infeasible for (NSDP).

Proof.

From Proposition 4.5, either there exists $\hat{k}>0$ such that $(x^{k_{j}},\Lambda_{k_{j}})$ is a KKT pair of (NSDP) for all $k_{j}\geq\hat{k}$ , or $\hat{x}$ is a stationary point of $r$ that is infeasible for (NSDP). So, the result follows in the same way as the proof of Theorem 3.8(b). ∎

For the above result, which concerns local minimizers, we note that the existence of the subsequence $\{x^{k_{j}}\}$ is guaranteed, for example, when the whole sequence $\{x^{k}\}$ is bounded. Moreover, if the constraint function $G$ is convex with respect to the cone $\mathbb{S}^{m}_{+}$, then the residual function $r$ is also convex, which means that all stationary points of $r$ are feasible for (NSDP). In the case of global minimizers, it is possible to prove full equivalence, and we do not have to be concerned about stationary points of $r$ that are infeasible for (NSDP).

Proposition 4.7.

Let $\{x^{k}\}\subset\mathbb{R}^{n}$ , $\{\Lambda_{k}\}\subset\mathbb{S}^{m}$ , and $\{c_{k}\}\subset\mathbb{R}_{++}$ be sequences such that $x^{k}\to\hat{x}$ and $\Lambda_{k}\to\hat{\Lambda}$ for some $(\hat{x},\hat{\Lambda})\in\mathbb{R}^{n}\times\mathbb{S}^{m}$ , $c_{k}\to\infty$ , and $(x^{k},\Lambda_{k})\in G_{\mbox{\tiny{NLP}}}(c_{k})$ for all $k$ . Then, there exists $\hat{k}>0$ such that $x^{k}\in G_{\mbox{\tiny{NSDP}}}$ with an associated Lagrange multiplier $\Lambda_{k}$ for all $k\geq\hat{k}$ .

Proof.

Let $\bar{x}\in G_{\mbox{\tiny{NSDP}}}$ , which exists by Assumption 3.6. Because of the nondegeneracy constraint qualification, there exists $\bar{\Lambda}\in\mathbb{S}^{m}$ such that $(\bar{x},\bar{\Lambda})$ is a KKT pair of (NSDP). So, from Proposition 4.3, $f(\bar{x})=L_{c}(\bar{x},\bar{\Lambda})$ holds for all $c>0$ . Moreover, since $(x^{k},\Lambda_{k})\in G_{\mbox{\tiny{NLP}}}(c_{k})$ , we have

$$L_{c_{k}}(x^{k},\Lambda_{k})\leq L_{c_{k}}(\bar{x},\bar{\Lambda})=f(\bar{x}) \tag{4.18}$$

for all $k$ . Taking the supremum limit in this inequality, we obtain

$$\limsup_{k\to\infty}L_{c_{k}}(x^{k},\Lambda_{k})\leq f(\bar{x}). \tag{4.19}$$

Observe now that the formula of $L_{c_{k}}(x^{k},\Lambda_{k})$ in (4.4) can be written equivalently as

[TABLE]

Recall from Lemma 4.2 that the functions involved in the above equality are all continuous. This fact, together with inequality (4.19), shows that $P_{\mathbb{S}^{m}_{+}}(-G(\hat{x}))=0$ , that is, $\hat{x}$ is feasible. So, from Proposition 4.5, we conclude that there exists $\hat{k}>0$ such that $(x^{k},\Lambda_{k})$ is a KKT pair of (NSDP) for all $k\geq\hat{k}$ . Since $c_{k}>0$ and the norm is always nonnegative, (4.20) implies $L_{c_{k}}(x^{k},\Lambda_{k})\geq f(x^{k})-\|\Lambda_{k}\|_{F}^{2}/(2c_{k})$ . Again, taking the supremum limit in such an inequality, we have

$$\limsup_{k\to\infty}L_{c_{k}}(x^{k},\Lambda_{k})\geq f(\hat{x}),$$

which, together with (4.19) shows that $f(\hat{x})\leq f(\bar{x})$ . Thus, $\hat{x}\in G_{\mbox{\tiny{NSDP}}}$ holds.

Now, since $\hat{x}$ is feasible for (NSDP), there exist $\hat{c},\hat{\delta}_{1},\hat{\delta}_{2}$ as in Proposition 4.4. Consider $\hat{k}$ large enough so that $\|x^{k}-\hat{x}\|\leq\hat{\delta}_{1}$ , $\|\Lambda_{k}-\hat{\Lambda}\|_{F}\leq\hat{\delta}_{2}$ , $c_{k}\geq\hat{c}$ , and $(x^{k},\Lambda_{k})\in G_{\mbox{\tiny{NLP}}}(c_{k})$ for all $k\geq\hat{k}$ . Since $(x^{k},\Lambda_{k})$ is stationary of $L_{c_{k}}$ , from Proposition 4.4, we obtain that $(x^{k},\Lambda_{k})$ is also a KKT pair of (NSDP) for all $k\geq\hat{k}$ . Once again from Proposition 4.3 and (4.18), we have $f(x^{k})=L_{c_{k}}(x^{k},\Lambda_{k})\leq f(\bar{x})$ . Therefore, $x^{k}\in G_{\mbox{\tiny{NSDP}}}$ for all $k\geq\hat{k}$ . ∎

Theorem 4.8.

Assume that there exists $\bar{c}>0$ such that $\bigcup_{c\geq\bar{c}}G_{\mbox{\tiny{NLP}}}(c)$ is bounded. Then, there exists $\hat{c}>0$ such that $G_{\mbox{\tiny{NLP}}}(c)=\tilde{G}_{\mbox{\tiny{NSDP}}}$ for all $c\geq\hat{c}$ , where $\tilde{G}_{\mbox{\tiny{NSDP}}}$ is defined in (3).

Proof.

From Lemma 3.7, we only need to show the existence of $\hat{c}>0$ such that $G_{\mbox{\tiny{NLP}}}(c)\subseteq\tilde{G}_{\mbox{\tiny{NSDP}}}$ for all $c\geq\hat{c}$ . Assume that this statement is false. Then, there exist sequences $\{(x^{k},\Lambda_{k})\}\subset\mathbb{R}^{n}\times\mathbb{S}^{m}$ and $\{c_{k}\}\subset\mathbb{R}_{++}$ with $c_{k}\to\infty$ , $c_{k}\geq\bar{c}$ , and $(x^{k},\Lambda_{k})\in G_{\mbox{\tiny{NLP}}}(c_{k})$ , but such that $(x^{k},\Lambda_{k})\notin\tilde{G}_{\mbox{\tiny{NSDP}}}$ . Since $\bigcup_{c\geq\bar{c}}G_{\mbox{\tiny{NLP}}}(c)$ is bounded, we can assume, without loss of generality, that $x^{k}\to\hat{x}$ and $\Lambda_{k}\to\hat{\Lambda}$ for some $(\hat{x},\hat{\Lambda})\in\mathbb{R}^{n}\times\mathbb{S}^{m}$ . Thus, Proposition 4.7 shows that there exists $\hat{k}>0$ such that $(x^{k},\Lambda_{k})\in\tilde{G}_{\mbox{\tiny{NSDP}}}$ for all $k\geq\hat{k}$ , which is a contradiction. ∎

5 Preliminary numerical experiments

This work focuses on the theoretical aspects of exact augmented Lagrangian functions. Nevertheless, we take a look at the numerical prospects of our approach by examining two simple problems in the next two subsections. We now briefly explain our proposal. Given a problem (NSDP), the idea is to use some unconstrained optimization method to solve (4.10), i.e., to minimize $L_{c}$ given in (4.4). First, an initial point $x^{0}\in\mathbb{R}^{n}$ and an initial penalty parameter $c_{0}>0$ are selected. Then, we choose the initial Lagrange multiplier $\Lambda_{0}\in\mathbb{S}^{m}$. One possibility is to use the multiplier estimate, i.e., to set $\Lambda_{0}$ as $\Lambda(x^{0})$, where $\Lambda(\cdot)$ is defined in (4.3).

Then, we run the unconstrained optimization method of our choice. However, since the penalty values for which $L_{c}$ becomes exact are not known beforehand, we attempt to adjust the penalty parameter between iterations as follows. Let $\tau\in(0,1)$ and $\rho>1$. Denote by $x^{k}$, $\Lambda_{k}$ and $c_{k}$ the values of $x$, $\Lambda$ and $c$ at the $k$th iteration, respectively. Recalling the function $Y_{c}(\cdot,\cdot)$ defined in (4.7), if at the $k$th iteration we have

[TABLE]

then we let $c_{k}$ be $\rho\,c_{k-1}$. Otherwise, we let $c_{k}$ be $c_{k-1}$. This is an idea that appears in many augmented Lagrangian methods, for example in [7]. The motivation is that $\|Y_{c_{k}}(x^{k},\Lambda_{k})\|_{F}$ is zero if and only if $G(x^{k})\in\mathbb{S}^{m}_{+}$, $\Lambda_{k}\in\mathbb{S}^{m}_{+}$ and $G(x^{k})\circ\Lambda_{k}=0$. Therefore, $\|Y_{c_{k}}(x^{k},\Lambda_{k})\|_{F}$ is a measure of the degree to which complementarity and feasibility are satisfied, taking into account the current penalty parameter. In summary, whenever there is not enough progress, we increase the penalty. In order to avoid the problem becoming too ill-conditioned, we never increase the penalty past some fixed value $c_{\max}$. We also point out that the update of the penalty parameter can be done by using the so-called test function, which was originally defined in [20]. However, as can be seen in the paper about exact penalty functions [2], the above approach using $Y_{c}(\cdot,\cdot)$ is more efficient, which justifies its use here.
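The update rule described above can be sketched as follows. This is only an illustration under stated assumptions: the form $Y_{c}(x,\Lambda)=P_{\mathbb{S}^{m}_{+}}(\Lambda/c-G(x))-\Lambda/c$ is assumed here (it is consistent with the zero characterization just given), the precise quantities compared in the progress test are also an assumption, and the names `y_c` and `update_penalty` are hypothetical.

```python
import numpy as np

def project_psd(A):
    """Frobenius-norm projection of a symmetric matrix onto the PSD cone."""
    w, U = np.linalg.eigh(0.5 * (A + A.T))
    return (U * np.maximum(w, 0.0)) @ U.T

def y_c(Gx, Lam, c):
    """Assumed form Y_c(x, Lam) = P(Lam / c - G(x)) - Lam / c, with Gx = G(x)."""
    return project_psd(Lam / c - Gx) - Lam / c

def update_penalty(c_prev, y_norm, y_norm_prev, tau=0.9, rho=1.1, c_max=1000.0):
    """Increase c by the factor rho when ||Y|| failed to shrink by the factor tau.

    The comparison below is an assumed version of the progress test; the cap
    c_max keeps the penalty from growing without bound.
    """
    if y_norm > tau * y_norm_prev:
        return min(rho * c_prev, c_max)
    return c_prev

# Feasible and complementary pair: G(x) and Lam are PSD with G(x) Lam = 0.
norm_at_kkt_like_pair = np.linalg.norm(
    y_c(np.diag([1.0, 0.0]), np.diag([0.0, 2.0]), 3.0), "fro"
)
```

The default values of `tau`, `rho` and `c_max` mirror the settings reported for the experiments, and the cap implements the rule that the penalty is never increased past $c_{\max}$.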

In our implementation, $\rho$, $\tau$ and $c_{\max}$ are set to $1.1$, $0.9$ and $1000$, respectively. The maximum number of iterations is $5000$. The values of $\zeta_{1}$ and $\zeta_{2}$, which control the behavior of $N(x)$ in (4.2), are set to $1$ and $10^{-4}$, respectively. The initial penalty parameter $c_{0}$ is computed using the formula:

[TABLE]

with $c_{\min}=0.1$, which is similar to the one used in [7]. The unconstrained method of our choice is the BFGS method with Armijo’s condition for the line search. We stop the algorithm when the KKT conditions are satisfied within $10^{-5}$ or when the norm of the gradient of $L_{c}$ is less than $10^{-5}$.
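As an illustration of the line search mentioned above, the following is a generic backtracking sketch of Armijo's condition. It is not the authors' code: the function name `armijo_step`, the parameters `sigma`, `shrink` and `max_backtracks`, and the toy quadratic are assumptions made for this example. Here `z` stands for the stacked variable $(x,\Lambda)$ and `phi` for $L_{c}$ at a fixed $c$.

```python
import numpy as np

def armijo_step(phi, grad, z, d, sigma=1e-4, shrink=0.5, max_backtracks=50):
    """Backtrack from t = 1 until phi(z + t d) <= phi(z) + sigma * t * <grad(z), d>."""
    phi0 = phi(z)
    slope = float(np.dot(grad(z), d))
    if slope >= 0.0:
        raise ValueError("d is not a descent direction")
    t = 1.0
    for _ in range(max_backtracks):
        if phi(z + t * d) <= phi0 + sigma * t * slope:
            return t
        t *= shrink
    return t  # last trial step; the caller may treat this as a failure

# Toy problem: a badly scaled quadratic, searched along steepest descent.
phi = lambda z: 0.5 * (z[0] ** 2 + 10.0 * z[1] ** 2)
grad = lambda z: np.array([z[0], 10.0 * z[1]])
z0 = np.array([2.0, 1.0])
d0 = -grad(z0)
t0 = armijo_step(phi, grad, z0, d0)
```

In a BFGS iteration the direction `d` would be the negative of the inverse Hessian approximation applied to the gradient, rather than the steepest descent direction used in this toy call.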

We implemented the algorithm in Python and ran all the experiments on an Intel Core i7-6700 machine with 8 cores and 16GB of memory. As we already pointed out after (4.9), an important implementation aspect is that we never need to explicitly evaluate the function $\Lambda(\cdot)$, except in the optional way of computing the initial Lagrange multiplier.

5.1 Noll’s example

As an initial example, we took a look at this simple instance by Noll [30]:

[TABLE]

The problem (Noll) is already in the format (NSDP), which can be seen by letting $G$ be the function defined by

[TABLE]

In order to compute $L_{c}$ and its gradient (see (4.4), (4.6), (4) and (4.9)), we need the partial derivative matrices of $G$ and the adjoint of the gradient of $G$ , which are given below:

[TABLE]

where $V\in\mathbb{S}^{3}$ is an arbitrary matrix with $(i,j)$ entry denoted by $V_{ij}$ .

The optimal value of (Noll) is $-2$ and it is achieved at $(2,0)$ . Starting at $(1,0)$ , our method found a solution satisfying the optimality criteria in 14 iterations and $0.01$ seconds. The initial and final penalty parameters were $6.66$ and $10.74$ , respectively. The objective function, the constraint function and their gradients were evaluated $41$ times each.

5.2 The closest correlation matrix problem

Let $H$ be a $m\times m$ symmetric matrix. The goal is to find a correlation matrix $X$ that is as close as possible to $H$ . In other words, we seek a solution to the following problem:

[TABLE]

There are many variants of (Cor) where weighting factors are added, constraints on the eigenvalues are considered, and so on. As a result, this family of problems has found a wealth of applications in statistics and finance [22].

In this example, it is possible to show that the nondegeneracy condition is satisfied at every feasible point (Assumption 4.1), which guarantees the theoretical properties of the exact augmented Lagrangian function $L_{c}$. This was proved by Qi and Sun in [31], but since there are some differences in notation, we will first take a look at this issue. Qi and Sun proved the following result.

Proposition 5.1.

Let $Y\in\mathbb{S}^{m}_{+}$ be such that $Y_{ii}=1$ , for all $i$ . Then,

[TABLE]

where $\mathrm{diag}\colon\mathbb{S}^{m}\to\mathbb{R}^{m}$ is the linear map that maps a symmetric matrix $Z$ to its diagonal $(Z_{11},Z_{22},\ldots,Z_{mm})$.

Proof.

See Proposition 2.1 and Equation (2.2) in [31]. ∎

We now write (Cor) in a format similar to (NSDP). For that, denote by $A^{ij}$ the $m\times m$ matrix that has $1$ in the $(i,j)$ and $(j,i)$ entries and $0$ elsewhere. Then, by discarding constant terms in the objective function, (Cor) can be reformulated equivalently as follows.

[TABLE]

Here, $x$ can be thought of as an upper triangular matrix without the diagonal. That is why the dimension of $x$ is $m(m-1)/2$ and we index $x$ by using $x_{ij}$ for $1\leq i<j\leq m$.

Proposition 5.2.

Problem (Cor2) satisfies Assumption 4.1.

Proof.

We must show that

[TABLE]

where $G:\mathbb{R}^{m(m-1)/2}\to\mathbb{S}^{m}$ is the function such that

[TABLE]

Now, let $Y\in\mathbb{S}^{m}$ be arbitrary. We can write $Y$ as

[TABLE]

From Proposition 5.1, the first summation belongs to $\mathrm{lin}\mathcal{T}_{\mathbb{S}^{m}_{+}}(Y)$. Then, noting that $\mathrm{Im}\,\nabla G(x)$ is the space spanned by $\{A^{ij}\colon 1\leq i<j\leq m\}$, we conclude that the second summation belongs to $\mathrm{Im}\,\nabla G(x)$. This shows that Assumption 4.1 is satisfied. ∎
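The conclusion of Proposition 5.2 can also be spot-checked numerically. The sketch below relies on a standard characterization that is an assumption relative to this text: if the columns of $U_{0}$ form a basis of $\ker Y$, then $\mathrm{lin}\,\mathcal{T}_{\mathbb{S}^{m}_{+}}(Y)=\{Z\colon U_{0}^{\top}ZU_{0}=0\}$, so nondegeneracy holds exactly when the matrices $U_{0}^{\top}A^{ij}U_{0}$ span the symmetric matrices of order $\dim\ker Y$. The names `svec` and `is_nondegenerate` are hypothetical.

```python
import itertools
import numpy as np

def svec(S):
    """Stack the upper triangle (including the diagonal) of a symmetric matrix."""
    return S[np.triu_indices(S.shape[0])]

def is_nondegenerate(Y, tol=1e-9):
    """Check S^m = lin T(Y) + span{A^ij : i < j} for a PSD matrix Y."""
    m = Y.shape[0]
    w, U = np.linalg.eigh(Y)
    U0 = U[:, np.abs(w) <= tol]       # orthonormal basis of ker Y
    k = U0.shape[1]
    if k == 0:
        return True                   # Y positive definite: lin T(Y) = S^m
    rows = []
    for i, j in itertools.combinations(range(m), 2):
        A = np.zeros((m, m))
        A[i, j] = A[j, i] = 1.0
        rows.append(svec(U0.T @ A @ U0))
    # The compressed matrices must span all symmetric k x k matrices.
    return bool(np.linalg.matrix_rank(np.array(rows)) == k * (k + 1) // 2)

# The all-ones matrix is a rank-one correlation matrix (unit diagonal, PSD).
ok_rank_one = is_nondegenerate(np.ones((3, 3)))
```

This is a numerical illustration on one instance, not a substitute for the proof; the kernel tolerance `tol` is an arbitrary choice.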

We now write some useful formulae which can be used in conjunction with (4.6), (4) and (4.9) to compute $L_{c}$ and its gradient. Let $V$ be an arbitrary $m\times m$ symmetric matrix. Then,

[TABLE]

for $1\leq i<j\leq m$ , where $v$ corresponds to the upper triangular part of $V$ without the diagonal.
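For implementation purposes, the constraint map of (Cor2) and the adjoint of its derivative can be sketched as below. The sketch assumes that $G(x)=I+\sum_{i<j}x_{ij}A^{ij}$, that is, the symmetric matrix with unit diagonal and off-diagonal entries $x_{ij}$; the function names are hypothetical and the row-major ordering of the pairs $(i,j)$ is a convention chosen here.

```python
import numpy as np

def G(x, m):
    """Assumed constraint map: unit diagonal, off-diagonal entries x_ij (i < j)."""
    X = np.eye(m)
    iu = np.triu_indices(m, k=1)
    X[iu] = x                  # upper triangle
    X[(iu[1], iu[0])] = x      # mirrored lower triangle
    return X

def dG(d, m):
    """Derivative applied to d: sum_{i<j} d_ij A^ij (G is affine, so no x)."""
    return G(d, m) - np.eye(m)

def dG_adjoint(V):
    """Adjoint applied to symmetric V: component (i, j) is <A^ij, V> = 2 V_ij."""
    return 2.0 * V[np.triu_indices(V.shape[0], k=1)]

rng = np.random.default_rng(1)
m = 4
d = rng.standard_normal(m * (m - 1) // 2)
V = rng.standard_normal((m, m))
V = V + V.T
lhs = np.sum(dG(d, m) * V)     # <dG d, V> in the trace inner product
rhs = d @ dG_adjoint(V)        # <d, dG^* V> in the Euclidean inner product
```

The two inner products agree, which is the defining property of the adjoint and a convenient unit test when coding the gradient formulas.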

We now move on to the experiments. We generated $50$ symmetric matrices $H$ such that the diagonal entries are all $1$ and the off-diagonal elements are uniform random numbers between $-1$ and $1$. This was repeated for $m=5,10,15,20$. We then ran our algorithm using as initial point the matrix having $1$ in all its entries. The results can be seen in Table 1. All the values depicted in Table 1 are averages over $50$ runs. The column “Iterations” corresponds to the average number of BFGS iterations. At each run, we recorded the number of function evaluations for $f$, which is the same for $G,\nabla f$ and $\nabla G$. Then, the column “Evaluations” in Table 1 is the average number of function evaluations. Columns “Initial $c$” and “Final $c$” correspond to the average of the initial and final penalty parameters, respectively. Finally, column “Time (s)” is the average running time, in seconds.
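The instance generation described above can be reproduced along the following lines. This is a sketch only: the seed, the generator, and the names `random_instance` and `initial_point` are assumptions, so the matrices will not coincide with those used to produce Table 1.

```python
import numpy as np

def random_instance(m, rng):
    """Symmetric H with unit diagonal and off-diagonal entries uniform in [-1, 1]."""
    U = np.triu(rng.uniform(-1.0, 1.0, size=(m, m)), k=1)
    return U + U.T + np.eye(m)

def initial_point(m):
    """Off-diagonal part of the all-ones matrix, as a vector of length m(m-1)/2."""
    return np.ones(m * (m - 1) // 2)

rng = np.random.default_rng(0)   # arbitrary seed chosen for this sketch
instances = {m: [random_instance(m, rng) for _ in range(50)] for m in (5, 10, 15, 20)}
x0 = initial_point(5)
```

Each entry of `instances` holds the $50$ matrices $H$ for one value of $m$, and `x0` encodes the all-ones starting matrix in the variable $x$ of (Cor2).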

No failures were detected, that is, we obtained approximate KKT points within $10^{-5}$ for all the instances. We also observed that, except for $m=5$, the final penalty parameter climbed up to the maximum value. At first glance, this suggests that the penalty was not large enough. However, we were still able to solve the problems without increasing the maximum value. In fact, we noted that, in some cases, the performance degraded when the maximum penalty value was increased. At this moment, the method presented here is not competitive against the approach in [31], for which we observed that a $20\times 20$ instance is typically solved in less than a second on our hardware. However, it should be emphasized that a second-order method is used in [31], whereas here we used BFGS. It would be interesting to apply and analyze a second-order method in combination with the exact augmented Lagrangian function $L_{c}$, but this investigation is beyond the scope of this paper.

6 Final remarks

We proposed a generalized augmented Lagrangian function $\mathcal{A}_{c}$ for NSDP problems, giving conditions for it to be exact. After that, we considered a particular function $L_{c}$, and we proved that it is exact under the nondegeneracy condition and some reasonable assumptions as in Theorem 4.8. We also presented some preliminary numerical experiments using a quasi-Newton method with the BFGS formula, showing the validity of the approach. One direction for future work is to analyze more efficient methods that can solve the unconstrained minimization of $L_{c}$. From Lemma 4.2 and the formula of $L_{c}$, given in (4.4), we observe that $L_{c}$ is an $\mbox{SC}^{1}$ function, i.e., it is continuously differentiable and its gradient is semismooth. This means that the unconstrained problem can be solved with second-order methods, such as the semismooth Newton method. However, the gradient of $L_{c}$, given in (4), contains second-order terms of the problem functions $f$ and $G$, and thus, a second-order method would have to deal with third-order derivatives. We believe that we can use the idea proposed in [19], which avoids these third-order terms while still guaranteeing global superlinear convergence.

By using the generalized function $\mathcal{A}_{c}$ as a tool, other practical exact augmented Lagrangian functions can be studied for NSDP, or for other important conic optimization problems. In fact, as can be seen in [14], many other exact augmented Lagrangian functions exist, but only for classical nonlinear programming. For example, recalling (4.5), we note that $L_{c}$ is defined by choosing the functions $\alpha_{c}$ and $\beta_{c}$ as constants. By considering more sophisticated formulas, it is possible to weaken the assumptions used here. This should be another matter of investigation.

Acknowledgements

We would like to thank the anonymous referees for their suggestions, which improved the original version of the paper. We are also thankful to Akiko Kobayashi for valuable discussions about exact augmented Lagrangian functions.

Bibliography

[1] T. A. André and P. J. S. Silva. Exact penalties for variational inequalities with applications to nonlinear complementarity problems. Computational Optimization and Applications, 47(3):401–429, 2010.
[2] R. Andreani, E. H. Fukuda, and P. J. S. Silva. A Gauss-Newton approach for solving constrained optimization problems using differentiable exact penalties. Journal of Optimization Theory and Applications, 156(2):417–449, 2013.
[3] P. Apkarian, D. Noll, and H. D. Tuan. Fixed-order $H_{\infty}$ control design via a partially augmented Lagrangian method. International Journal of Robust and Nonlinear Control, 13(12):1137–1148, 2003.
[4] A. Ben-Tal, F. Jarre, M. Kočvara, A. Nemirovski, and J. Zowe. Optimal design of trusses under a nonconvex global buckling constraint. Optimization and Engineering, 1(2):189–213, 2000.
[5] D. S. Bernstein. Matrix Mathematics: Theory, Facts, and Formulas. Princeton University Press, 2nd edition, 2009.
[6] D. P. Bertsekas. Constrained Optimization and Lagrange Multipliers Methods. Academic Press, New York, 1982.
[7] E. G. Birgin and J. M. Martínez. Practical augmented Lagrangian methods for constrained optimization. Society for Industrial and Applied Mathematics, Philadelphia, PA, 2014.
[8] J. F. Bonnans and A. Shapiro. Perturbation Analysis of Optimization Problems. Springer-Verlag, New York, 2000.
