A Method to Guarantee Local Convergence for Sequential Quadratic Programming with Poor Hessian Approximation

Tuan T. Nguyen; Mircea Lazar; Hans Butler

arXiv:1704.03064 · math.OC · April 12, 2017 · CDC


TL;DR

This paper introduces a simple method to ensure local convergence of SQP algorithms even when using poor Hessian approximations, addressing practical computational challenges.

Contribution

It proposes a novel approach that guarantees local convergence of SQP with low-quality Hessian approximations, which was not previously established.

Findings

1. The method guarantees local convergence despite poor Hessian approximations.
2. A numerical example demonstrates the effectiveness of the proposed approach.

Abstract

Sequential Quadratic Programming (SQP) is a powerful class of algorithms for solving nonlinear optimization problems. Local convergence of SQP algorithms is guaranteed when the Hessian approximation used in each Quadratic Programming subproblem is close to the true Hessian. However, a good Hessian approximation can be expensive to compute. Low-cost Hessian approximations only guarantee local convergence under some assumptions, which are not always satisfied in practice. To address this problem, this paper proposes a simple method to guarantee local convergence for SQP with poor Hessian approximations. The effectiveness of the proposed algorithm is demonstrated in a numerical example.


TABLE I: Computation times

Method	Number of iterations	Computation time (ms)
SQP-EH	13	113
iSQP-GGN, $\alpha = 0.30$	23	141
iSQP-GGN, $\alpha = 0.35$	18	107
iSQP-GGN, $\alpha = 0.40$	22	136
iSQP-GGN, $\alpha = 0.45$	27	160
iSQP-I, $\alpha = 0.25$	73	359
iSQP-I, $\alpha = 0.30$	72	347

Equations

\nabla_{x}\Phi(x)=\begin{bmatrix}\frac{\partial\Phi(x)}{\partial x_{[1]}}&\frac{\partial\Phi(x)}{\partial x_{[2]}}&\dots&\frac{\partial\Phi(x)}{\partial x_{[n]}}\end{bmatrix}.

\min_{x}\ F_{1}(x)\quad\text{subject to}\quad F_{2}(x)=0_{m\times 1}

\mathcal{L}(x,\lambda):=F_{1}(x)+\lambda^{T}F_{2}(x),

\nabla_{[x,\lambda]^{T}}\mathcal{L}(x_{*},\lambda_{*})^{T}=\begin{bmatrix}J_{1}(x_{*})^{T}+J_{2}(x_{*})^{T}\lambda_{*}\\ F_{2}(x_{*})\end{bmatrix}=0_{(n+m)\times 1},

x_{k+1}=x_{k}+\Delta x_{k},

\min_{\Delta x_{k}}\ F_{1|k}+J_{1|k}\Delta x_{k}+\frac{1}{2}\Delta x_{k}^{T}B_{k}\Delta x_{k}\quad\text{subject to}\quad F_{2|k}+J_{2|k}\Delta x_{k}=0_{m\times 1}

F_{1|k}:=F_{1}(x_{k}),\quad F_{2|k}:=F_{2}(x_{k}),\quad J_{1|k}:=J_{1}(x_{k}),\quad J_{2|k}:=J_{2}(x_{k})

d^{T}B_{k}d\geq\beta_{1}\|d\|_{2}^{2},

J_{2|k}d=0_{m\times 1}.

\|B_{k}\|_{2}\leq\beta_{2}.

B_{k}\Delta x_{k}^{SQP}+J_{1|k}^{T}+J_{2|k}^{T}\lambda_{k+1}=0_{n\times 1},

J_{2|k}\Delta x_{k}^{SQP}+F_{2|k}=0_{m\times 1},

\begin{bmatrix}B_{k}&J_{2|k}^{T}\\ J_{2|k}&0_{m\times m}\end{bmatrix}\begin{bmatrix}\Delta x_{k}^{SQP}\\ \lambda_{k+1}\end{bmatrix}=-\begin{bmatrix}J_{1|k}^{T}\\ F_{2|k}\end{bmatrix}.

\begin{bmatrix}\Delta x_{k}^{SQP}\\ \lambda_{k+1}\end{bmatrix}=-\begin{bmatrix}B_{k}&J_{2|k}^{T}\\ J_{2|k}&0_{m\times m}\end{bmatrix}^{-1}\begin{bmatrix}J_{1|k}^{T}\\ F_{2|k}\end{bmatrix}.

B_{k}+cJ_{2|k}^{T}J_{2|k}\succ 0,\quad\forall c>c_{0}.

C_{k}:=B_{k}+cJ_{2|k}^{T}J_{2|k},\quad D_{k}:=J_{2|k}C_{k}^{-1}J_{2|k}^{T},\quad c>c_{0}

\begin{bmatrix}B_{k}&J_{2|k}^{T}\\ J_{2|k}&0_{m\times m}\end{bmatrix}^{-1}=\begin{bmatrix}C_{k}^{-1}-C_{k}^{-1}J_{2|k}^{T}D_{k}^{-1}J_{2|k}C_{k}^{-1}&C_{k}^{-1}J_{2|k}^{T}D_{k}^{-1}\\ D_{k}^{-1}J_{2|k}C_{k}^{-1}&cI_{m}-D_{k}^{-1}\end{bmatrix}

\Delta x_{k}^{SQP}=-(I_{n}-T_{2|k}^{C_{k}}J_{2|k})C_{k}^{-1}J_{1|k}^{T}-T_{2|k}^{C_{k}}F_{2|k},

T_{2|k}^{C_{k}}:=C_{k}^{-1}J_{2|k}^{T}(J_{2|k}C_{k}^{-1}J_{2|k}^{T})^{-1}.

\Delta x_{k}^{SQP}=-(I_{n}-T_{2|k}^{B_{k}}J_{2|k})B_{k}^{-1}J_{1|k}^{T}-T_{2|k}^{B_{k}}F_{2|k},

T_{2|k}^{B_{k}}:=B_{k}^{-1}J_{2|k}^{T}(J_{2|k}B_{k}^{-1}J_{2|k}^{T})^{-1}.

\Delta x_{k}=\alpha\Delta x_{k}^{SQP}+(1-\alpha)\Delta x_{k}^{f},

J_{2|k}\Delta x_{k}^{f}+F_{2|k}=0_{m\times 1}.

\Delta x_{k}^{f1}=-T_{2|k}F_{2|k},\quad\Delta x_{k}^{f2}=-T_{2|k}^{C_{k}}F_{2|k}

T_{2|k}:=J_{2|k}^{T}(J_{2|k}J_{2|k}^{T})^{-1}.

\Delta x_{k}^{f}=-\left(\frac{1}{1-\alpha}T_{2|k}-\frac{\alpha}{1-\alpha}T_{2|k}^{C_{k}}\right)F_{2|k}.

J_{2|k}\Delta x_{k}^{f}=-\left(\frac{1}{1-\alpha}-\frac{\alpha}{1-\alpha}\right)F_{2|k}=-F_{2|k}

\Delta x_{k}=-\alpha(I_{n}-T_{2|k}^{C_{k}}J_{2|k})C_{k}^{-1}J_{1|k}^{T}-T_{2|k}F_{2|k}.

G_{k}:=(I_{n}-T_{2|k}^{C_{k}}J_{2|k})C_{k}^{-1}J_{1|k}^{T}.



Full text


Tuan T. Nguyen, Mircea Lazar and Hans Butler

The authors are with the Department of Electrical Engineering, Eindhoven University of Technology, P.O. Box 513, 5600 MB Eindhoven, The Netherlands. E-mails: {t.t.nguyen,m.lazar,h.butler}@tue.nl


I INTRODUCTION

Sequential Quadratic Programming (SQP) is one of the most effective methods for solving nonlinear optimization problems. The idea of SQP is to iteratively approximate the Nonlinear Programming (NLP) problem by a sequence of Quadratic Programming (QP) subproblems [1]. The QP subproblems should be constructed such that the resulting sequence of solutions converges to a local optimum of the NLP.

There are different ways to construct the QP subproblems. When the exact Hessian is used to construct the QP subproblems, local convergence at a quadratic rate is guaranteed. However, the exact Hessian can be indefinite far from the solution. Consequently, the QP subproblems can be non-convex and generally difficult to solve, since the objective may be unbounded below and there may be many local solutions [2]. Moreover, computing the exact Hessian is generally expensive, which makes SQP with exact Hessian difficult to apply to large-scale problems and real-time applications.

To overcome these drawbacks, positive (semi-)definite Hessian approximations are usually used in practice. SQP methods using Hessian approximations generally guarantee local convergence only under some assumptions. Some SQP variants employ iterative update schemes that keep the Hessian approximation close to the true Hessian. The Broyden-Fletcher-Goldfarb-Shanno (BFGS) update is one of the most popular schemes of this type [3, 4]. BFGS-SQP guarantees superlinear convergence when the initial Hessian estimate is close enough to the true Hessian [1]. Another variant, which is very popular for constrained nonlinear least-squares problems, is the Generalized Gauss-Newton (GGN) method [5, 6]. The GGN method converges locally only if the residual function is small at the solution [7]. Some other SQP variants belong to the class of Sequential Convex Programming (SCP), or Sequential Convex Quadratic Programming (SCQP), methods, which exploit convexity in either the objective or the constraint functions to formulate convex QP subproblems [8, 9]. SCP methods also converge locally under a similar small-residual assumption. However, these assumptions are not always satisfied in practice; the Hessian approximation is then poor and convergence is no longer guaranteed.

This paper proposes a simple method to guarantee local convergence for SQP methods with poor Hessian approximations. The proposed method interpolates between the search direction provided by solving the QP subproblem and a feasible search direction. It is proven that there exists a suitable interpolation coefficient such that the resulting algorithm converges locally to a local optimum of the NLP with linear convergence rate. A numerical example is presented to demonstrate the effectiveness of the proposed method.

The idea of interpolating an optimal search direction with a feasible search direction was proposed in our previous work for quadratic optimization problems with nonlinear equality constraints [10]. The method proposed in [10] was applied successfully to the commutation of linear motors [11]. This paper extends the idea to general nonlinear programming problems.

The remainder of this paper is organized as follows. Section II introduces the notation used in the paper. Section III reviews the basic SQP method. Section IV presents the proposed algorithm and proves the optimality property and local convergence property of the algorithm. An example is shown in Section V for demonstration. Section VI summarizes the conclusions.

II NOTATION

Let $\mathbb{N}$ denote the set of natural numbers and $\mathbb{R}$ the set of real numbers. The notation $\mathbb{R}_{[c_{1},c_{2})}$ denotes the set $\{c\in\mathbb{R}:c_{1}\leq c<c_{2}\}$. Let $\mathbb{R}^{n}$ denote the set of real column vectors of dimension $n$, and $\mathbb{R}^{n\times m}$ the set of real $n\times m$ matrices. For a vector $x\in\mathbb{R}^{n}$, $x_{[i]}$ denotes the $i$-th element of $x$. The notation $0_{n\times m}$ denotes the $n\times m$ zero matrix and $I_{n}$ denotes the $n\times n$ identity matrix. Let $\|\cdot\|_{2}$ denote the 2-norm. The nabla symbol $\nabla$ denotes the gradient operator: for a vector $x\in\mathbb{R}^{n}$ and a mapping $\Phi:\mathbb{R}^{n}\rightarrow\mathbb{R}$, the gradient is the row vector

$$\nabla_{x}\Phi(x)=\begin{bmatrix}\frac{\partial\Phi(x)}{\partial x_{[1]}}&\frac{\partial\Phi(x)}{\partial x_{[2]}}&\dots&\frac{\partial\Phi(x)}{\partial x_{[n]}}\end{bmatrix}.$$

Let $\mathcal{B}(x_{0},r)$ denote the open ball $\{x\in\mathbb{R}^{n}:\|x-x_{0}\|_{2}<r\}$ .

III THE BASIC SQP METHOD

This section reviews the basic SQP method. Consider the nonlinear optimization problem with nonlinear equality constraints:

Problem III.1 (NLP)

$$\min_{x}\ F_{1}(x)\quad\text{subject to}\quad F_{2}(x)=0_{m\times 1},$$

where $x\in\mathbb{R}^{n}$ , $F_{1}:\mathbb{R}^{n}\rightarrow\mathbb{R}$ and $F_{2}:\mathbb{R}^{n}\rightarrow\mathbb{R}^{m}$ . Here, $n$ is the number of optimization variables and $m$ is the number of constraints. In this paper, we are only interested in the case when the constraint set has an infinite number of points, i.e. $m<n$ , since the other cases are trivial. Furthermore, let us assume that the columns of $\nabla_{x}F_{2}(x)^{T}$ are linearly independent at the solutions of the NLP.

For ease of presentation, in this paper we only consider equality constraints. The method can be extended to inequality constraints using an active set strategy or squared slack variables [1, Section 4].

First, let us define the Lagrangian function of the NLP Problem III.1

$$\mathcal{L}(x,\lambda):=F_{1}(x)+\lambda^{T}F_{2}(x),\qquad (1)$$

where $\lambda\in\mathbb{R}^{m}$ is the vector of Lagrange multipliers. The Karush-Kuhn-Tucker (KKT) optimality conditions of Problem III.1 are

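$$\nabla_{[x,\lambda]^{T}}\mathcal{L}(x_{*},\lambda_{*})^{T}=\begin{bmatrix}J_{1}(x_{*})^{T}+J_{2}(x_{*})^{T}\lambda_{*}\\ F_{2}(x_{*})\end{bmatrix}=0_{(n+m)\times 1},\qquad (2)$$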

where $J_{1}(x):=\nabla_{x}F_{1}(x)$ and $J_{2}(x):=\nabla_{x}F_{2}(x)$ are the Jacobian matrices of $F_{1}(x)$ and $F_{2}(x)$, respectively. Note that $J_{1}(x)\in\mathbb{R}^{1\times n}$ and $J_{2}(x)\in\mathbb{R}^{m\times n}$.
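To make the shape conventions concrete, the sketch below approximates $J_{1}(x)$ and $J_{2}(x)$ by forward differences with NumPy. The toy problem ($F_{1}(x)=x_{[1]}+x_{[2]}$, $F_{2}(x)=x_{[1]}^{2}+x_{[2]}^{2}-2$, so $n=2$ and $m=1$) and the helper name `jacobian_fd` are illustrative assumptions, not part of the paper.

```python
import numpy as np

def jacobian_fd(fun, x, h=1e-7):
    """Forward-difference Jacobian: rows index outputs, columns index x."""
    x = np.asarray(x, dtype=float)
    f0 = np.atleast_1d(fun(x))
    jac = np.empty((f0.size, x.size))
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        jac[:, i] = (np.atleast_1d(fun(x + step)) - f0) / h
    return jac

# Hypothetical toy problem with n = 2 variables and m = 1 constraint.
def F1(x):
    return x[0] + x[1]

def F2(x):
    return np.array([x[0] ** 2 + x[1] ** 2 - 2.0])

x = np.array([-1.2, -0.9])
J1 = jacobian_fd(F1, x)    # shape (1, n): the gradient as a row vector
J2 = jacobian_fd(F2, x)    # shape (m, n)
print(J1.shape, J2.shape)  # (1, 2) (1, 2)
```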

The solution of the optimization problem is searched for in an iterative way. At a current iterate $x_{k}$ , the next iterate is computed as

$$x_{k+1}=x_{k}+\Delta x_{k},\qquad (3)$$

where $\Delta x_{k}$ is the search direction. In SQP methods, the search direction $\Delta x^{SQP}_{k}$ is the solution of the following QP subproblem

Problem III.2 (QP Subproblem)

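$$\min_{\Delta x_{k}}\ F_{1|k}+J_{1|k}\Delta x_{k}+\frac{1}{2}\Delta x_{k}^{T}B_{k}\Delta x_{k}\quad\text{subject to}\quad F_{2|k}+J_{2|k}\Delta x_{k}=0_{m\times 1},$$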

where we introduce the following notation for brevity

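$$F_{1|k}:=F_{1}(x_{k}),\quad F_{2|k}:=F_{2}(x_{k}),\quad J_{1|k}:=J_{1}(x_{k}),\quad J_{2|k}:=J_{2}(x_{k}).$$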

Here, $B_{k}\in\mathbb{R}^{n\times n}$ is either the exact Hessian of the Lagrangian $\nabla_{xx}^{2}\mathcal{L}(x,\lambda)$ , or a positive (semi-) definite approximation of the Hessian. Similar to [1], to guarantee that the QP subproblem has a unique solution, we assume that the matrices $B_{k}$ satisfy the following conditions:

Assumption III.3

The matrices $B_{k}$ are uniformly positive definite on the null spaces of the matrices $J_{2|k}$ , i.e., there exists a $\beta_{1}>0$ such that for each $k$

$$d^{T}B_{k}d\geq\beta_{1}\|d\|_{2}^{2},$$

for all $d\in\mathbb{R}^{n}$ which satisfy

$$J_{2|k}d=0_{m\times 1}.$$

Assumption III.4

The sequence $\{B_{k}\}$ is uniformly bounded, i.e., there exists a $\beta_{2}>0$ such that for each $k$

$$\|B_{k}\|_{2}\leq\beta_{2}.$$

The KKT optimality conditions of the QP subproblem III.2 are

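$$B_{k}\Delta x_{k}^{SQP}+J_{1|k}^{T}+J_{2|k}^{T}\lambda_{k+1}=0_{n\times 1},\qquad (4)$$
$$J_{2|k}\Delta x_{k}^{SQP}+F_{2|k}=0_{m\times 1},\qquad (5)$$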

or equivalently

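$$\begin{bmatrix}B_{k}&J_{2|k}^{T}\\ J_{2|k}&0_{m\times m}\end{bmatrix}\begin{bmatrix}\Delta x_{k}^{SQP}\\ \lambda_{k+1}\end{bmatrix}=-\begin{bmatrix}J_{1|k}^{T}\\ F_{2|k}\end{bmatrix}.\qquad (6)$$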

It should be noted that $B_{k}$ is positive (semi-) definite and is not necessarily invertible, but the matrix $\begin{bmatrix}B_{k}&J_{2|k}^{T}\\ J_{2|k}&0_{m\times m}\end{bmatrix}$ is invertible due to Assumption III.3 [12, Theorem 3.2]. Therefore, the KKT condition (6) has a unique solution

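$$\begin{bmatrix}\Delta x_{k}^{SQP}\\ \lambda_{k+1}\end{bmatrix}=-\begin{bmatrix}B_{k}&J_{2|k}^{T}\\ J_{2|k}&0_{m\times m}\end{bmatrix}^{-1}\begin{bmatrix}J_{1|k}^{T}\\ F_{2|k}\end{bmatrix}.\qquad (7)$$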
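As a numerical sanity check of (6) and (7), the sketch below solves one QP subproblem by assembling the KKT matrix with NumPy. The data are hypothetical: $B_{k}$ is deliberately singular (hence not invertible) but positive definite on the null space of $J_{2|k}$, so Assumption III.3 holds and the KKT matrix is still invertible. The helper name `sqp_step_kkt` is illustrative.

```python
import numpy as np

def sqp_step_kkt(B, J1, J2, F2):
    """Solve the KKT system (6) for the SQP direction and multipliers, cf. (7)."""
    n, m = B.shape[0], J2.shape[0]
    K = np.block([[B, J2.T], [J2, np.zeros((m, m))]])
    rhs = -np.concatenate([J1.ravel(), F2])
    sol = np.linalg.solve(K, rhs)
    return sol[:n], sol[n:]

# Hypothetical data with n = 3, m = 1.
J2 = np.array([[1.0, 0.0, 0.0]])
B = np.diag([0.0, 1.0, 2.0])      # singular, but positive definite on null(J2)
J1 = np.array([[1.0, 2.0, 3.0]])  # gradient of F1 as a row vector
F2 = np.array([0.5])

dx, lam = sqp_step_kkt(B, J1, J2, F2)
print(dx, lam)
```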

For convergence analysis, it is convenient to have an explicit expression of $\Delta x^{SQP}_{k}$. Since $J_{2|k}^{T}J_{2|k}$ is positive semidefinite and $B_{k}$ is positive definite on the null space of $J_{2|k}$, there exists a constant $c_{0}$ such that [13, Lemma 3.2.1]

$$B_{k}+cJ_{2|k}^{T}J_{2|k}\succ 0,\quad\forall c>c_{0}.\qquad (8)$$

Let us define

$$C_{k}:=B_{k}+cJ_{2|k}^{T}J_{2|k},\qquad D_{k}:=J_{2|k}C_{k}^{-1}J_{2|k}^{T},\qquad c>c_{0}.$$

We have that $C_{k}$ and $D_{k}$ are positive definite due to Assumption III.3. It holds that [14, Chapter 6]

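$$\begin{bmatrix}B_{k}&J_{2|k}^{T}\\ J_{2|k}&0_{m\times m}\end{bmatrix}^{-1}=\begin{bmatrix}C_{k}^{-1}-C_{k}^{-1}J_{2|k}^{T}D_{k}^{-1}J_{2|k}C_{k}^{-1}&C_{k}^{-1}J_{2|k}^{T}D_{k}^{-1}\\ D_{k}^{-1}J_{2|k}C_{k}^{-1}&cI_{m}-D_{k}^{-1}\end{bmatrix}.\qquad (9)$$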

The solution $\Delta x^{SQP}_{k}$ can then be written in an explicit form

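$$\Delta x_{k}^{SQP}=-(I_{n}-T_{2|k}^{C_{k}}J_{2|k})C_{k}^{-1}J_{1|k}^{T}-T_{2|k}^{C_{k}}F_{2|k},\qquad (10)$$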

where

$$T_{2|k}^{C_{k}}:=C_{k}^{-1}J_{2|k}^{T}(J_{2|k}C_{k}^{-1}J_{2|k}^{T})^{-1}.$$

Notice that $T^{C_{k}}_{2|k}\in\mathbb{R}^{n\times m}$ is a generalized right inverse of $J_{2|k}$ , i.e. $J_{2|k}T^{C_{k}}_{2|k}=I_{m}$ .
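The explicit expression (10) can be checked against a direct solve of the KKT system on hypothetical data of the same kind. The sketch assumes $c=1$, which is admissible here because $B_{k}+cJ_{2|k}^{T}J_{2|k}=\mathrm{diag}(c,1,2)\succ 0$ for every $c>0$; it also confirms that $T^{C_{k}}_{2|k}$ is a right inverse of $J_{2|k}$. The helper name `sqp_step_explicit` is illustrative.

```python
import numpy as np

def sqp_step_explicit(B, J1, J2, F2, c=1.0):
    """Explicit SQP direction (10), built from C_k = B_k + c J2^T J2."""
    n = B.shape[0]
    C = B + c * J2.T @ J2                  # must be positive definite, cf. (8)
    Cinv = np.linalg.inv(C)
    D = J2 @ Cinv @ J2.T                   # D_k = J2 C_k^{-1} J2^T
    TC = Cinv @ J2.T @ np.linalg.inv(D)    # generalized right inverse T^{C_k}
    dx = -(np.eye(n) - TC @ J2) @ Cinv @ J1.ravel() - TC @ F2
    return dx, TC

# Hypothetical data: B is singular but positive definite on null(J2).
J2 = np.array([[1.0, 0.0, 0.0]])
B = np.diag([0.0, 1.0, 2.0])
J1 = np.array([[1.0, 2.0, 3.0]])
F2 = np.array([0.5])

dx, TC = sqp_step_explicit(B, J1, J2, F2, c=1.0)
K = np.block([[B, J2.T], [J2, np.zeros((1, 1))]])
dx_kkt = np.linalg.solve(K, -np.concatenate([J1.ravel(), F2]))[:3]
print(np.allclose(dx, dx_kkt), np.allclose(J2 @ TC, np.eye(1)))  # True True
```

Changing $c$ to any other admissible value leaves the direction unchanged, which is a quick way to test an implementation of (10).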

It should be noted that if $B_{k}$ is nonsingular then $\Delta x^{SQP}_{k}$ can also be written as

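$$\Delta x_{k}^{SQP}=-(I_{n}-T_{2|k}^{B_{k}}J_{2|k})B_{k}^{-1}J_{1|k}^{T}-T_{2|k}^{B_{k}}F_{2|k},\qquad (11)$$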

where

$$T_{2|k}^{B_{k}}:=B_{k}^{-1}J_{2|k}^{T}(J_{2|k}B_{k}^{-1}J_{2|k}^{T})^{-1}.$$

In this case, both (10) and (11) give the same solution.
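For a nonsingular $B_{k}$, the statement that (10) and (11) give the same direction can be checked numerically. The sketch uses randomly generated data (a hypothetical instance with $n=4$, $m=2$) and an arbitrary admissible $c=3$; the helper names `right_inverse` and `explicit_step` are illustrative.

```python
import numpy as np

def right_inverse(M, J):
    """T^M := M^{-1} J^T (J M^{-1} J^T)^{-1}, with M = C_k or M = B_k."""
    MinvJt = np.linalg.solve(M, J.T)
    return MinvJt @ np.linalg.inv(J @ MinvJt)

def explicit_step(M, J1, J2, F2):
    """-(I - T^M J2) M^{-1} J1^T - T^M F2 for an invertible M."""
    T = right_inverse(M, J2)
    n = M.shape[0]
    return -(np.eye(n) - T @ J2) @ np.linalg.solve(M, J1.ravel()) - T @ F2

rng = np.random.default_rng(0)
n, m = 4, 2
A = rng.standard_normal((n, n))
B = A @ A.T + np.eye(n)            # nonsingular (here even positive definite)
J2 = rng.standard_normal((m, n))   # full row rank with probability one
J1 = rng.standard_normal((1, n))
F2 = rng.standard_normal(m)

dx_11 = explicit_step(B, J1, J2, F2)                    # form (11)
dx_10 = explicit_step(B + 3.0 * J2.T @ J2, J1, J2, F2)  # form (10) with c = 3
print(np.allclose(dx_10, dx_11))  # True
```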

If $B_{k}$ is the exact Hessian then the basic SQP method is equivalent to applying Newton’s method to solve the KKT conditions (2), which guarantees quadratic local convergence rate [15, Chapter 18]. When an approximation is used instead, local convergence is guaranteed only when $B_{k}$ is close enough to the true Hessian. The readers are referred to [1, Section 3] for more details on local convergence of SQP.
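To illustrate the equivalence between SQP with the exact Hessian and Newton's method on (2), the sketch below runs SQP-EH on a hypothetical two-variable toy problem ($F_{1}(x)=x_{[1]}+x_{[2]}$, $F_{2}(x)=x_{[1]}^{2}+x_{[2]}^{2}-2$, KKT point $x_{*}=[-1,-1]^{T}$ with $\lambda_{*}=0.5$), for which $\nabla^{2}_{xx}\mathcal{L}(x,\lambda)=2\lambda I_{2}$. The starting point is an arbitrary choice; the error is roughly squared at each iteration.

```python
import numpy as np

def F2(x):  # hypothetical constraint: circle of radius sqrt(2)
    return np.array([x[0] ** 2 + x[1] ** 2 - 2.0])

def J2(x):
    return np.array([[2.0 * x[0], 2.0 * x[1]]])

J1 = np.array([[1.0, 1.0]])       # F1(x) = x1 + x2 has a constant gradient
x_star = np.array([-1.0, -1.0])   # KKT point with lambda_* = 0.5

x, lam = np.array([-1.2, -0.9]), np.array([0.5])
errors = []
for k in range(8):
    B = 2.0 * lam[0] * np.eye(2)  # exact Hessian of the Lagrangian for this problem
    K = np.block([[B, J2(x).T], [J2(x), np.zeros((1, 1))]])
    sol = np.linalg.solve(K, -np.concatenate([J1.ravel(), F2(x)]))
    x, lam = x + sol[:2], sol[2:]  # SQP update (3) and new multiplier from (7)
    errors.append(np.linalg.norm(x - x_star))
print(errors[:4])
```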

IV PROPOSED METHOD

This section proposes a simple method to guarantee local convergence for SQP with poor Hessian approximation. The proposed method interpolates between an optimal search iteration, without local convergence guarantee, and a feasible search iteration with guaranteed local convergence.

The search direction $\Delta x^{SQP}_{k}$ can be viewed as the optimal direction which iteratively leads to the optimal solution of the NLP, if the iteration converges. However, local convergence is not guaranteed if $B_{k}$ is a poor approximation of the true Hessian.

To guarantee local convergence with poor Hessian approximation, we propose a new search direction which is the interpolation between the optimal search direction $\Delta x^{SQP}_{k}$ and a feasible search direction $\Delta x^{f}_{k}$ , i.e.

$$\Delta x_{k}=\alpha\Delta x_{k}^{SQP}+(1-\alpha)\Delta x_{k}^{f},\qquad (12)$$

where $\alpha\in\mathbb{R}_{(0,1)}$ . The feasible search direction $\Delta x^{f}_{k}$ only searches for a feasible solution of the set of constraints, but its local convergence is guaranteed. The idea of this proposed interpolated update is to combine the optimality property of the SQP update and the local convergence property of the feasible update.

The feasible search direction can be found as a solution of the linearized constraints

$$J_{2|k}\Delta x_{k}^{f}+F_{2|k}=0_{m\times 1}.\qquad (13)$$

Since $m<n$ , there is an infinite number of solutions for (13). Two possible solutions are

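$$\Delta x_{k}^{f1}=-T_{2|k}F_{2|k},\qquad (14)$$
$$\Delta x_{k}^{f2}=-T_{2|k}^{C_{k}}F_{2|k},\qquad (15)$$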

where $T_{2|k}\in\mathbb{R}^{n\times m}$ is the Moore-Penrose generalized right inverse of $J_{2|k}$ [16], i.e.

$$T_{2|k}:=J_{2|k}^{T}(J_{2|k}J_{2|k}^{T})^{-1}.$$

We propose the following feasible search direction

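$$\Delta x_{k}^{f}=-\left(\frac{1}{1-\alpha}T_{2|k}-\frac{\alpha}{1-\alpha}T_{2|k}^{C_{k}}\right)F_{2|k}.\qquad (16)$$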

It can be verified that $\Delta x^{f}_{k}$ is a solution of (13) as follows

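$$J_{2|k}\Delta x_{k}^{f}=-\left(\frac{1}{1-\alpha}J_{2|k}T_{2|k}-\frac{\alpha}{1-\alpha}J_{2|k}T_{2|k}^{C_{k}}\right)F_{2|k}=-\left(\frac{1}{1-\alpha}-\frac{\alpha}{1-\alpha}\right)F_{2|k}=-F_{2|k}.\qquad (17)$$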

It has been proven that the feasible updates (14), (15) and (16) converge locally to a feasible solution of the constraints [17, 18].
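The verification (17) and the role of the Moore-Penrose inverse can be reproduced numerically. The sketch below uses randomly generated, hypothetical data and the arbitrary choices $\alpha=0.3$, $c=1$; `np.linalg.pinv` is used only to confirm that $T_{2|k}$ coincides with the pseudoinverse when $J_{2|k}$ has full row rank.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, alpha, c = 5, 2, 0.3, 1.0
A = rng.standard_normal((n, n))
B = A @ A.T                               # hypothetical B_k (positive definite with probability one)
J2 = rng.standard_normal((m, n))
F2 = rng.standard_normal(m)

T = J2.T @ np.linalg.inv(J2 @ J2.T)       # Moore-Penrose right inverse T_{2|k}
C = B + c * J2.T @ J2                     # C_k
CinvJt = np.linalg.solve(C, J2.T)
TC = CinvJt @ np.linalg.inv(J2 @ CinvJt)  # T^{C_k}_{2|k}

# Proposed feasible direction (16).
dx_f = -(T / (1 - alpha) - alpha / (1 - alpha) * TC) @ F2
print(np.allclose(J2 @ dx_f, -F2), np.allclose(T, np.linalg.pinv(J2)))  # True True
```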

It is worth mentioning that using the search direction $\Delta x^{f2}_{k}$ in (15) can also guarantee local convergence for the interpolated update. However, this search direction results in the presence of the term $\alpha T^{C_{k}}_{2|k}F_{2|k}$ in the interpolated update (12), which unnecessarily increases the computational load. Therefore, the search direction $\Delta x^{f}_{k}$ in (16) is proposed to help eliminate the unnecessary term $\alpha T^{C_{k}}_{2|k}F_{2|k}$ from the interpolated update (12).

Substituting (10) and (16) into the interpolated update (12) results in

$$\Delta x_{k}=-\alpha(I_{n}-T_{2|k}^{C_{k}}J_{2|k})C_{k}^{-1}J_{1|k}^{T}-T_{2|k}F_{2|k}.\qquad (18)$$

For brevity, let us denote $G:\mathbb{R}^{n}\rightarrow\mathbb{R}^{n}$ as follows

$$G_{k}:=(I_{n}-T_{2|k}^{C_{k}}J_{2|k})C_{k}^{-1}J_{1|k}^{T}.\qquad (19)$$
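The cancellation of the term $\alpha T^{C_{k}}_{2|k}F_{2|k}$ can be confirmed numerically: interpolating the KKT-based SQP direction with (16) as in (12) gives the same step as the closed form (18). The data and the values $\alpha=0.4$, $c=1$ are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, alpha, c = 5, 2, 0.4, 1.0
A = rng.standard_normal((n, n))
B = A @ A.T + 0.1 * np.eye(n)   # hypothetical positive definite B_k
J1 = rng.standard_normal((1, n))
J2 = rng.standard_normal((m, n))
F2 = rng.standard_normal(m)

C = B + c * J2.T @ J2
CinvJt = np.linalg.solve(C, J2.T)
TC = CinvJt @ np.linalg.inv(J2 @ CinvJt)
T = J2.T @ np.linalg.inv(J2 @ J2.T)
G = (np.eye(n) - TC @ J2) @ np.linalg.solve(C, J1.ravel())  # G_k in (19)

# SQP direction from the KKT system (6) and feasible direction (16).
K = np.block([[B, J2.T], [J2, np.zeros((m, m))]])
dx_sqp = np.linalg.solve(K, -np.concatenate([J1.ravel(), F2]))[:n]
dx_f = -(T / (1 - alpha) - alpha / (1 - alpha) * TC) @ F2

dx_interp = alpha * dx_sqp + (1 - alpha) * dx_f   # interpolation (12)
dx_18 = -alpha * G - T @ F2                       # closed form (18)
print(np.allclose(dx_interp, dx_18))  # True
```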

In what follows we will prove the optimality property and the local convergence property of the proposed search iteration (18).

Theorem IV.1

If the iteration (18) converges to a fixed point $x_{*}$ , then $x_{*}$ satisfies the KKT optimality conditions (2).

Proof.

Let us denote

[TABLE]

From (5), (13) and (12), it follows that

$$J_{2|k}\Delta x_{k}+F_{2|k}=0_{m\times 1}.\qquad (20)$$

By definition, $x_{*}$ is a fixed point of the proposed iteration (18) if

[TABLE]

As a result we have

[TABLE]

Substituting (22) into (16) results in

[TABLE]

It follows from (12), (21) and (23) that

[TABLE]

Due to (4) and (24) we have

[TABLE]

From (22) and (25), it can be concluded that $x_{*}$ satisfies the KKT optimality conditions (2). ∎
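A small numerical consistency check of the fixed-point property, on a hypothetical toy problem ($F_{1}(x)=x_{[1]}+x_{[2]}$, $F_{2}(x)=x_{[1]}^{2}+x_{[2]}^{2}-2$) with a deliberately poor $B_{k}=0.2I_{2}$ and $\alpha=0.1$ (both arbitrary choices): the step (18) vanishes at the KKT point $x_{*}=[-1,-1]^{T}$ and is nonzero at a feasible point that violates the first row of (2). This illustrates the converse direction of Theorem IV.1; it does not prove the theorem. The function name `step_18` is illustrative.

```python
import numpy as np

def step_18(x, B, alpha=0.1, c=1.0):
    """Interpolated step (18) for the hypothetical toy problem
    min x1 + x2  s.t.  x1^2 + x2^2 - 2 = 0."""
    J1 = np.array([1.0, 1.0])
    J2 = np.array([[2.0 * x[0], 2.0 * x[1]]])
    F2 = np.array([x @ x - 2.0])
    C = B + c * J2.T @ J2
    CinvJt = np.linalg.solve(C, J2.T)
    TC = CinvJt @ np.linalg.inv(J2 @ CinvJt)
    T = J2.T @ np.linalg.inv(J2 @ J2.T)
    G = (np.eye(2) - TC @ J2) @ np.linalg.solve(C, J1)
    return -alpha * G - T @ F2

B_poor = 0.2 * np.eye(2)
x_kkt = np.array([-1.0, -1.0])           # satisfies (2) with lambda = 0.5
x_feas = np.array([0.0, -np.sqrt(2.0)])  # feasible, but not a KKT point
print(np.linalg.norm(step_18(x_kkt, B_poor)), np.linalg.norm(step_18(x_feas, B_poor)))
```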

Next, we will prove local convergence of the proposed iteration. Let us assume that the approximations $B_{k}$ satisfy the following condition

Assumption IV.2

There exists a $\beta_{3}>0$ such that for each $k\geq 1$

[TABLE]

The following proposition will be used in the proof.

Proposition IV.3

Let $\mathcal{D}\subseteq\mathbb{R}^{n}$ be a convex set in which $F_{2}:\mathcal{D}\rightarrow\mathbb{R}^{m}$ is differentiable and $J_{2}(x)$ is Lipschitz continuous for all $x\in\mathcal{D}$ , i.e. there exists a $\gamma>0$ such that

$$\|J_{2}(x)-J_{2}(y)\|_{2}\leq\gamma\|x-y\|_{2},\quad\forall x,y\in\mathcal{D}.$$

Then

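$$\|F_{2}(y)-F_{2}(x)-J_{2}(x)(y-x)\|_{2}\leq\frac{\gamma}{2}\|y-x\|_{2}^{2},\quad\forall x,y\in\mathcal{D}.$$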

A proof of Proposition IV.3 can be found in [18].

Theorem IV.4

Let $\mathcal{D}\subset\mathbb{R}^{n}$ be a bounded convex set in which the following conditions hold

(i) $F_{1}(x)$ and $F_{2}(x)$ are Lipschitz continuous and continuously differentiable,
(ii) $J_{1}(x)$ and $J_{2}(x)$ are Lipschitz continuous and bounded,
(iii) $T_{2}(x)$ is bounded,
(iv) there exists a solution $x_{*}$ of the KKT optimality conditions (2) in $\mathcal{D}$.

Then there exist an $\alpha\in\mathbb{R}_{(0,1)}$ and an $r\in\mathbb{R}_{>0}$ such that $\mathcal{B}(x_{*},r)\subseteq\mathcal{D}$ and iteration (18) converges to $x_{*}$ for any initial estimate $x_{0}\in\mathcal{B}(x_{*},r)$.

Proof.

Let us consider two cases:
• $F_{2}(x)$ is linear;
• $F_{2}(x)$ is nonlinear.

Case 1: in the first case when $F_{2}(x)$ is linear, for any iterate $x_{k}$ we can write

$$F_{2|k+1}=F_{2}(x_{k}+\Delta x_{k})=F_{2|k}+J_{2|k}\Delta x_{k}.$$

Since $\Delta x_{k}$ satisfies (20), it follows that

$$F_{2|k+1}=0_{m\times 1}.$$

Therefore, we have that $F_{2|k}=0_{m\times 1}$ for all $k\geq 1$ . The interpolated update is then reduced to

$$x_{k+1}=x_{k}-\alpha G_{k}=x_{k}+\alpha\Delta x_{k}^{SQP}.$$

Let us denote

[TABLE]

From (7) and (9) we have

[TABLE]

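The matrix $\begin{bmatrix}B_{k}&J_{2|k}^{T}\\ J_{2|k}&0_{m\times m}\end{bmatrix}$ is invertible due to Assumption III.3, but it is indefinite. Nevertheless, it follows from (9) that $W_{k}=C_{k}^{-1}-C_{k}^{-1}J_{2|k}^{T}D_{k}^{-1}J_{2|k}C_{k}^{-1}=C_{k}^{-1/2}(I_{n}-P_{k})C_{k}^{-1/2}$, where $P_{k}:=C_{k}^{-1/2}J_{2|k}^{T}D_{k}^{-1}J_{2|k}C_{k}^{-1/2}$ is an orthogonal projection matrix. Hence $W_{k}$ is positive semidefinite, and $J_{1|k}W_{k}J_{1|k}^{T}=\|(I_{n}-P_{k})C_{k}^{-1/2}J_{1|k}^{T}\|_{2}^{2}$ vanishes if and only if $J_{1|k}^{T}$ lies in the range of $J_{2|k}^{T}$, which, since $F_{2|k}=0_{m\times 1}$, means that $x_{k}$ already satisfies the KKT conditions (2). As a result, for any iterate $x_{k}$, $k\geq 1$, that is not a KKT point we have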

[TABLE]

This shows that $\Delta x^{SQP}_{k}$ is a descent direction that leads to a decrease in the cost function $F_{1}(x)$. In addition, since $F_{2|k}=0_{m\times 1}$ for all $k\geq 1$, we have that $\Delta x^{SQP}_{k}$ is also a feasible direction. Therefore, there exists a stepsize $\alpha\in\mathbb{R}_{(0,1)}$ such that the iteration (30) converges [15, Chapter 3].
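For a hypothetical instance of Case 1, the sketch below computes $\Delta x^{SQP}_{k}$ from the KKT system (6) with $F_{2|k}=0_{m\times 1}$ and an indefinite $B_{k}$ that is nevertheless positive definite on the null space of $J_{2|k}$ (so Assumption III.3 holds). The directional derivative $J_{1|k}\Delta x^{SQP}_{k}$ is negative and $J_{2|k}\Delta x^{SQP}_{k}=0$, i.e., the direction is a feasible descent direction. All numbers are illustrative.

```python
import numpy as np

# Hypothetical data: B_k is indefinite, yet positive definite on null(J2),
# so Assumption III.3 holds; F2 = 0 as in Case 1 for k >= 1.
J2 = np.array([[1.0, 0.0, 0.0]])
B = np.diag([-5.0, 1.0, 2.0])
J1 = np.array([1.0, 2.0, 3.0])
F2 = np.zeros(1)

K = np.block([[B, J2.T], [J2, np.zeros((1, 1))]])
dx_sqp = np.linalg.solve(K, -np.concatenate([J1, F2]))[:3]

print(J1 @ dx_sqp)   # directional derivative of F1 along dx_sqp
print(J2 @ dx_sqp)   # change of the (linear) constraint along dx_sqp
```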

Case 2: let us now consider the case when $F_{2}(x)$ is nonlinear. Since the nonlinear constraints are solved by successive linearization (20), we can assume that the solution is reached asymptotically, i.e. $F_{2|k-1}\to 0$ as $k\to\infty$ and $F_{2|k-1}\neq 0$ for all $k<\infty$ . We have

[TABLE]

The second equality in (34) was obtained due to equation (20).

Let us consider the first term on the right hand side of (34). Here, $(I_{n}-T_{2|k-1}J_{2|k-1})$ is the orthogonal projection onto the null space of $J_{2|k-1}$ [14, Chapter 6]. It holds that

[TABLE]

A proof of (35) can be found in [10]. It follows that

[TABLE]

The equality holds if and only if $\Delta x_{k-1}$ is in the null space of $J_{2|k-1}$ , which is equivalent to

[TABLE]

This shows that the equality holds if and only if $x_{k-1}$ is an exact solution of the constraints. This contradicts the assumption that $F_{2|k-1}\neq 0$ for all $k<\infty$ . Therefore, there exists a constant $M\in\mathbb{R}_{(0,1)}$ such that

[TABLE]

Next, let us consider the second term on the right hand side of (34). Observe that by (19), the definition of the matrix inverse and the strict positive definiteness of $C_{k}$, each element of $G_{k}$ is obtained by addition, multiplication and/or division of real-valued functions. Division only occurs due to the inverse of $C_{k}$, via the term $\frac{1}{\det(C_{k})}$. This allows the application of Theorem 12.4 and Theorem 12.5 in [19], to establish Lipschitz continuity in $x$ of $G_{k}$, from Assumptions III.4 and IV.2 and the conditions that $J_{1}(x)$ and $J_{2}(x)$ are Lipschitz continuous and bounded for all $x\in\mathcal{D}$. Note that although the theorems in [19] consider functions from $\mathbb{R}$ to $\mathbb{R}$, the same arguments apply to functions from $\mathbb{R}^{n}$ to $\mathbb{R}$, by using an appropriate, norm-based Lipschitz inequality. As a result we have

[TABLE]

where $N>0$ .

For the third term on the right hand side of (34), due to the condition that $T_{2}(x)$ is bounded and Proposition IV.3, we have

[TABLE]

where $L>0$ .

From (34), (38), (39) and (40), it follows that

[TABLE]

where $K=M+\alpha N$ . We have $K<1$ for any $\alpha$ that satisfies

$$\alpha<\frac{1-M}{N}.\qquad (42)$$

From (18), (21) and (22), it follows that

[TABLE]

Due to the Lipschitz continuity of $G(x)$ and $F_{2}(x)$ , we have

[TABLE]

where $Q>0$ . If $x_{0}$ is close enough to $x_{*}$ such that

[TABLE]

then

[TABLE]

Next, we will prove that if

[TABLE]

then

[TABLE]

Indeed, if (47) holds then due to (41) we have

[TABLE]

This leads to

[TABLE]

We have proven that if (47) holds then (48) holds. Since (46) also holds for any $x_{0}$ which satisfies (45), it follows by induction that

[TABLE]

Therefore, it follows from (41) that

[TABLE]

Therefore, algorithm (18) converges, and by Theorem IV.1, it converges to a KKT point, for any initial estimate $x_{0}\in\mathcal{B}\left(x_{*},r\right)$ , where

[TABLE]

and any $\alpha\in\mathbb{R}_{(0,1)}$ which makes $K<1$ . ∎

It can be seen from (52) that the proposed algorithm has a linear convergence rate.
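To make the linear rate tangible, the sketch below runs iteration (18) on a hypothetical toy problem ($F_{1}(x)=x_{[1]}+x_{[2]}$, $F_{2}(x)=x_{[1]}^{2}+x_{[2]}^{2}-2$, $x_{*}=[-1,-1]^{T}$) with the deliberately poor constant approximation $B_{k}=0.2I_{2}$, while $\nabla^{2}_{xx}\mathcal{L}(x_{*},\lambda_{*})=I_{2}$. The starting point, $\alpha=0.1$ and $c=1$ are arbitrary choices, and the function name `isqp` is illustrative. For this toy problem, a short linearization around $x_{*}$ (not taken from the paper) predicts the error ratio $1-\alpha/0.2=0.5$.

```python
import numpy as np

def isqp(x0, alpha, B, c=1.0, iters=30):
    """Iteration (18) on the hypothetical toy problem
    min x1 + x2  s.t.  x1^2 + x2^2 - 2 = 0, with a fixed approximation B."""
    x, history = np.asarray(x0, dtype=float), []
    J1 = np.array([1.0, 1.0])
    x_star = np.array([-1.0, -1.0])
    for _ in range(iters):
        J2 = np.array([[2.0 * x[0], 2.0 * x[1]]])
        F2 = np.array([x @ x - 2.0])
        C = B + c * J2.T @ J2                               # C_k, cf. (8)
        CinvJt = np.linalg.solve(C, J2.T)
        TC = CinvJt @ np.linalg.inv(J2 @ CinvJt)            # T^{C_k}_{2|k}
        T = J2.T @ np.linalg.inv(J2 @ J2.T)                 # T_{2|k}
        G = (np.eye(2) - TC @ J2) @ np.linalg.solve(C, J1)  # G_k, cf. (19)
        x = x - alpha * G - T @ F2                          # update (18)
        history.append(np.linalg.norm(x - x_star))
    return x, history

B_poor = 0.2 * np.eye(2)   # the true Hessian of the Lagrangian at x_* is I_2
x, err = isqp([-1.5, -0.5], alpha=0.1, B=B_poor)
print(err[15] / err[14])   # close to 1 - alpha / 0.2 = 0.5
```

With the same $B_{k}$ and $\alpha=1$ (plain SQP), the same linearization gives the factor $1-1/0.2=-4$, so on this toy problem plain SQP is locally repelled from $x_{*}$, which is the situation the interpolation is meant to handle.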

Remark IV.5

The explicit expression (10) is of interest for convergence analysis. For implementation, instead of (10), the SQP search direction can also be computed as

[TABLE]

Note that (54) differs from (10) in implementation, but they both give the same solution. In this case, using the feasible search direction $\Delta x^{f2}_{k}$ in (15) for the interpolated iteration is more convenient. It can be proven in a similar way that the optimality property and local convergence property hold for the resulting interpolated iteration (12).

The proposed method can be applied to any positive (semi-)definite Hessian approximation that satisfies Assumptions III.3, III.4 and IV.2. Popular Hessian approximations, such as the GGN approximation or any constant positive definite approximation, satisfy these conditions. It is worth noting that the simple identity approximation $B_{k}=I_{n}$ also satisfies them.

The proposed method therefore can be useful in some of the following situations. When the exact Hessian is indefinite or is too expensive to compute and the search iteration using Hessian approximations fails to converge, the proposed method can be used to enforce convergence. For large-scale cases when even Hessian approximations are computationally costly, the simple identity Hessian approximation $B_{k}=I_{n}$ can be used together with the proposed interpolation method. This results in the same search iteration as proposed in [20, 21], although the iteration and convergence therein were derived in a different way. Furthermore, if the cost function is just the 2-norm $F_{1}(x)=x^{T}x$ and the identity Hessian approximation is used then the proposed algorithm recovers the algorithm in our previous work [10]. It should be noted, however, that the identity Hessian approximation may result in a slower convergence rate compared to other Hessian approximations, as can be seen in the example in Section V.

V NUMERICAL EXAMPLE

This section presents a numerical example to verify the performance of the proposed algorithm. Let us consider the test problem 77 in [22].

Problem V.1

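$$\min_{x}\ (x_{[1]}-1)^{2}+(x_{[1]}-x_{[2]})^{2}+(x_{[3]}-1)^{2}+(x_{[4]}-1)^{4}+(x_{[5]}-1)^{6}$$

subject to

$$x_{[1]}^{2}x_{[4]}+\sin(x_{[4]}-x_{[5]})-2\sqrt{2}=0,\qquad x_{[2]}+x_{[3]}^{4}x_{[4]}^{2}-8-\sqrt{2}=0.$$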

The initial estimate is $x_{0}=\begin{bmatrix}2&2&2&2&2\end{bmatrix}^{T}$ and $\lambda_{0}=\begin{bmatrix}0&0\end{bmatrix}^{T}$. This is a nonlinear equality constrained least-squares problem with nonzero residual.
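The sketch below encodes Problem V.1 (test problem 77 of [22], in its standard Hock-Schittkowski form) in NumPy and checks that the solution reported in this section is feasible to within the rounding of its six printed decimals. The function names are illustrative.

```python
import numpy as np

def F1(x):
    """Objective of Problem V.1 (Hock-Schittkowski problem 77)."""
    return ((x[0] - 1) ** 2 + (x[0] - x[1]) ** 2 + (x[2] - 1) ** 2
            + (x[3] - 1) ** 4 + (x[4] - 1) ** 6)

def F2(x):
    """Equality constraints of Problem V.1, F2(x) = 0."""
    return np.array([
        x[0] ** 2 * x[3] + np.sin(x[3] - x[4]) - 2.0 * np.sqrt(2.0),
        x[1] + x[2] ** 4 * x[3] ** 2 - 8.0 - np.sqrt(2.0),
    ])

x0 = np.full(5, 2.0)   # initial estimate used in the example
x_rep = np.array([1.166172, 1.182111, 1.380257, 1.506036, 0.610920])  # reported solution
print(F2(x0))          # the initial estimate is far from feasible
print(F2(x_rep), F1(x_rep))
```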

In nonlinear constrained least-squares problems, the cost function has the least-squares form

[TABLE]

where $R:\mathbb{R}^{n}\rightarrow\mathbb{R}^{p}$ . A popular Hessian approximation for this type of problems is the GGN approximation

[TABLE]

It is well known that the SQP method with GGN Hessian approximation, also called the GGN method, converges locally if the residual function $R(x)$ is small at the solution [7].
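The GGN approximation for Problem V.1 can be sketched as follows. We assume the residual $R(x)=[x_{[1]}-1,\ x_{[1]}-x_{[2]},\ x_{[3]}-1,\ (x_{[4]}-1)^{2},\ (x_{[5]}-1)^{3}]^{T}$ and the cost $F_{1}(x)=R(x)^{T}R(x)$, so that the GGN matrix is $2J_{R}(x)^{T}J_{R}(x)$, with $J_{R}$ the Jacobian of $R$; with a factor $\frac{1}{2}$ in the cost, the factor 2 disappears. These choices, and the helper names, are illustrative and may differ from the scaling used in the paper.

```python
import numpy as np

def R(x):
    """Assumed residual of Problem V.1, so that F1(x) = R(x)^T R(x)."""
    return np.array([x[0] - 1, x[0] - x[1], x[2] - 1,
                     (x[3] - 1) ** 2, (x[4] - 1) ** 3])

def JR(x):
    """Analytic Jacobian of R (rows: residuals, columns: variables)."""
    return np.array([
        [1.0, 0.0, 0.0, 0.0, 0.0],
        [1.0, -1.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 2.0 * (x[3] - 1), 0.0],
        [0.0, 0.0, 0.0, 0.0, 3.0 * (x[4] - 1) ** 2],
    ])

def ggn_hessian(x):
    """GGN approximation of the Hessian of F1 = R^T R (second-order terms of R
    and the constraint curvature are ignored)."""
    J = JR(x)
    return 2.0 * J.T @ J

x0 = np.full(5, 2.0)
B0 = ggn_hessian(x0)
print(np.linalg.eigvalsh(B0).min() > 0)  # True
```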

In this example, we test the exact Hessian SQP method (SQP-EH), the GGN method (SQP-GGN), the proposed interpolated method with GGN Hessian approximation (iSQP-GGN), and the proposed interpolated method with identity Hessian approximation $B_{k}=I_{n}$ (iSQP-I). The optimization algorithms are programmed in Matlab and tested on a 2.4 GHz computer. The measure of convergence is the 2-norm of the left-hand side of the KKT conditions (2) at the current iterate, which is called the KKT residual. The optimization algorithms terminate when the KKT residual is less than $10^{-7}$.

The test results are as follows. The SQP-EH method converges quadratically, as expected. The SQP-GGN method does not converge. The iSQP-GGN method converges linearly, which demonstrates that the proposed interpolation scheme can enforce convergence with the GGN Hessian approximation. The iSQP-I method also converges linearly, but at a slower rate. This is expected, since the GGN matrix is a better approximation of the Hessian than the identity matrix. The convergence of the methods is shown in Fig. 1. The interpolation coefficients $\alpha$ shown there are among those that give the fastest convergence for each method.

The SQP-EH method and the proposed iSQP-GGN and iSQP-I methods converge to the same solution $x=[1.166172,1.182111,1.380257,1.506036,0.610920]^{T}$, which agrees with the solution reported in [22].

The number of iterations and computation times are summarized in Table I. The SQP-EH method requires the fewest iterations, as it converges quadratically. The iSQP-GGN method with $\alpha=0.35$ needs more iterations, but its total computation time is lower, since it requires less computation per iteration. This demonstrates that with a suitable choice of $\alpha$, the proposed method can be more efficient than the SQP-EH method, especially in large-scale cases when computation of the exact Hessian can be very expensive.

Examples of large-scale problems are nonlinear model predictive control (NMPC) problems. In [20, 21], the iSQP-I method, which is called projected gradient and constraint linearization method therein, is shown to outperform some commercial solvers when applying it to the NMPC problem for an inverted pendulum. The results of the example above suggest that with a suitable choice of the Hessian approximation, e.g. GGN approximation, the proposed method may even perform better, given the special sparse structure of the NMPC problem. Demonstrating this will be a subject of our future research.

VI CONCLUSIONS

This paper proposed a method to guarantee local convergence for SQP with poor Hessian approximation. The proposed method interpolates between the SQP search direction and a suitable feasible search direction, in order to combine the optimality property and the local convergence property of the two search directions. It was proven that the proposed algorithm converges locally at linear rate to a KKT point of the nonlinear programming problem. The effectiveness of the method was illustrated in a numerical example.

In this paper we only consider the local convergence property. For future work, we will extend the convergence result to global convergence using an augmented Lagrangian merit function [23].

Bibliography


[1] P. T. Boggs and J. W. Tolle, “Sequential quadratic programming,” Acta Numerica, vol. 4, pp. 1–51, 1995.
[2] P. E. Gill, M. A. Saunders, and E. Wong, On the Performance of SQP Methods for Nonlinear Optimization. Cham: Springer International Publishing, 2015, pp. 95–123.
[3] M. J. D. Powell, A fast algorithm for nonlinearly constrained optimization calculations. Berlin, Heidelberg: Springer Berlin Heidelberg, 1978, pp. 144–157.
[4] R. Fletcher, Practical Methods of Optimization, 2nd ed. New York, NY, USA: Wiley-Interscience, 1987.
[5] H. G. Bock, Recent Advances in Parameter Identification Techniques for O.D.E. Boston, MA: Birkhäuser Boston, 1983, pp. 95–121.
[6] H. G. Bock, E. Kostina, and J. P. Schlöder, Direct Multiple Shooting and Generalized Gauss-Newton Method for Parameter Estimation Problems in ODE Models. Cham: Springer International Publishing, 2015, pp. 1–34.
[7] M. Diehl, “Real-time optimization for large scale nonlinear processes,” Ph.D. dissertation, Universität Heidelberg, Germany, 2001.
[8] Q. T. Dinh, C. Savorgnan, and M. Diehl, “Adjoint-based predictor-corrector sequential convex programming for parametric nonlinear optimization,” SIAM Journal on Optimization, vol. 22, no. 4, pp. 1258–1284, 2012.
