No-gap second-order conditions under $n$-polyhedric constraints and finitely many nonlinear constraints
Gerd Wachsmuth

Abstract
We consider an optimization problem subject to an abstract constraint and finitely many nonlinear constraints. Using the recently introduced concept of $n$-polyhedricity, we are able to provide second-order optimality conditions under weak regularity assumptions. In particular, we prove necessary optimality conditions of first and second order under the constraint qualification of Robinson, Zowe and Kurcyusz. Similarly, sufficient optimality conditions are stated. The gap between both conditions is as small as possible.
\addbibresource{world.bib}
keywords:
second order optimality condition, critical cone, polyhedricity, Legendre form, non-unique multiplier
msc:
49K27
1 Introduction
In this work, we are interested in problems of type
[TABLE]
Our goal is the derivation of second-order necessary and sufficient optimality conditions with minimal gap. Here, and , , are twice Fréchet differentiable, is a Banach space and is assumed to be closed and convex. In particular, we are interested in the situation that
[TABLE]
where is a finite measure space and are measurable functions such that becomes non-empty.
It is clear that first-order necessary optimality conditions for (P) can be obtained by using the constraint qualification of Robinson-Zowe-Kurcyusz (RZKCQ), see \cite{ZoweKurcyusz1979,Robinson1976:1}. Several papers consider second-order conditions for problems of type (P); as examples, we mention \cite{BonnansZidani1999,CasasTroeltzsch2002}. However, in these papers, the authors have to impose additional regularity assumptions to arrive at second-order necessary conditions. These regularity conditions are rather strong and, in particular, they imply uniqueness of Lagrange multipliers. Thus, these conditions cannot be satisfied in situations in which the Lagrange multipliers associated with a stationary point are not unique. The case of infinite-dimensional equality constraints is considered in \cite{Ioffe1979}.
In this paper, we utilize the recently introduced notion of $n$-polyhedricity, see \cite[Definition 4.3]{Wachsmuth2016:2} and \cref{subsec:n_polyhedricity} below, to derive second-order necessary conditions. Note that the set from (1) is $n$-polyhedric for all non-negative integers $n$, see \cite[Example 4.21(1)]{Wachsmuth2016:2}. Let be a local optimizer of (P). Under the RZKCQ, there exist multipliers , such that
[TABLE]
Here, denotes the Lagrangian (see (14) below) and is the normal cone of at the point . The set of all multipliers satisfying the above conditions is denoted by . Our main contributions are the following. We shall show that the condition
[TABLE]
is necessary for local optimality if RZKCQ is satisfied and if the set is $n$-polyhedric, where $n$ is bigger than the number of active constraints. If, additionally, a quadratic growth condition is satisfied at , we can show
[TABLE]
for some . Under a slight additional assumption, this last condition is also sufficient for the local optimality of .
The paper is organized as follows. In \cref{sec:not} we introduce the necessary notation and review some known results. The well-known first-order optimality conditions are given in \cref{sec:foc}. The main results of this paper concerning the second-order conditions for (P) are given in \cref{sec:soc}. In \cref{sec:examples} we present two examples which indicate that the results in this paper are sharp. The first example shows that the supremum in (2) is really necessary if the Lagrange multipliers are not unique. The second example demonstrates that the $n$-polyhedricity assumption on is crucial and cannot be replaced by requiring polyhedricity only.
2 Notation, preliminaries and known results
2.1 Notation
We use the definitions and .
For a convex subset of a Banach space and , we define the radial cone, the tangent cone, the normal cone and the polar cone via
[TABLE]
respectively. The annihilator of a functional is defined as
[TABLE]
For and , we define the closed ball
[TABLE]
In order to discuss (P), it will be convenient to define via
[TABLE]
Moreover, we consider as a function from to . Let a point with be given. Using the active and inactive sets of indices, defined via
[TABLE]
respectively, it is easy to check that
[TABLE]
Moreover, for a feasible point of (P) we define the critical cone via
[TABLE]
2.2 On $n$-polyhedricity
As mentioned in the introduction, we are going to employ the concept of $n$-polyhedricity to derive second-order conditions for (P). The notion of $n$-polyhedricity was recently introduced in \cite{Wachsmuth2016:2} and generalizes the well-known notion of polyhedricity due to \cite{Mignot1976,Haraux1977}.
We recall that a closed convex set is called polyhedric at if
[TABLE]
It was shown in \cite[Lemma 4.1]{Wachsmuth2016:2} that this condition is equivalent to
[TABLE]
The latter condition is amenable to the following generalization. We say that is $n$-polyhedric at for some , if
[TABLE]
holds, see \cite[Definition 4.3]{Wachsmuth2016:2}. Many sets which were known to be polyhedric are even $n$-polyhedric for all , see, e.g., \cite[Example 4.21]{Wachsmuth2016:2}. In particular, this applies to the set of interest from (1).
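For orientation, the cone notions and the two polyhedricity conditions of this subsection are commonly written as sketched below. The symbols $\mathcal R_C$, $\mathcal T_C$, $X^\star$ and $\mu_i$ are assumptions made for this sketch and need not agree with the notation of the displays above; the authoritative statements are \cite[Lemma 4.1 and Definition 4.3]{Wachsmuth2016:2}.

```latex
% Sketch in assumed notation; see Wachsmuth2016:2 for the precise statements.
\begin{gather*}
  \mathcal R_C(x) := \bigcup_{\lambda > 0} \lambda \, (C - x),
  \qquad
  \mathcal T_C(x) := \overline{\mathcal R_C(x)},
  \\
  % polyhedricity at x (in the equivalent form quantified over all of X^\star)
  \mathcal T_C(x) \cap \mu^\perp
  = \overline{\mathcal R_C(x) \cap \mu^\perp}
  \qquad \forall \mu \in X^\star,
  \\
  % n-polyhedricity at x: the same identity with n annihilators at once
  \mathcal T_C(x) \cap \bigcap_{i=1}^n \mu_i^\perp
  = \overline{\mathcal R_C(x) \cap \bigcap_{i=1}^n \mu_i^\perp}
  \qquad \forall \mu_1, \ldots, \mu_n \in X^\star.
\end{gather*}
```

In this form, the case $n = 1$ is the second characterization of polyhedricity, which is why the latter generalizes so directly.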
We provide a lemma, which follows from a simple calculation, see also \cite[Lemma 4.4]{Wachsmuth2016:2}.
Lemma 2.1.
Assume that the set is $n$-polyhedric for some at . Further, let , for be given such that . Then, the set
[TABLE]
is dense in
[TABLE]
2.3 Review of known results
We start by reviewing the results of \citeCasasTroeltzsch2002, see also \citeCasasTroeltzsch1999. In this paper, the authors studied a problem very similar to (P) with (1). However, they considered the situation in which the underlying space is a Lebesgue space and their analysis incorporates the important phenomenon of two-norm discrepancy. In the situation in which all functions are already differentiable in , the problem of \citeCasasTroeltzsch2002 coincides with (P). The main assumption for deriving second-order necessary conditions is a regularity assumption on the solution . For , the -inactive set is defined via
[TABLE]
With this notation, the regularity condition is given by
[TABLE]
Here, we used the notation
[TABLE]
Under the regularity assumption (5), \citeCasasTroeltzsch2002 prove the existence of unique multipliers , such that
[TABLE]
Moreover, they prove the second-order necessary condition
[TABLE]
The appearance of in this formula comes through the general setting of \citeCasasTroeltzsch2002 which includes the two-norm discrepancy. We mention that also sufficient second-order conditions are derived.
Next, we review the results of \cite{BonnansZidani1999}. In this work, a problem slightly more general than (P) is considered. In fact, the nonlinear constraints are replaced by , where is twice Fréchet differentiable and is a closed convex set in the Banach space . However, the strongest results are obtained in the case that is a polyhedron, i.e., a finite intersection of closed half-spaces, and this is very similar to (P). To facilitate the comparison with our results, we apply their results to our problem (P). In this case, they use the regularity condition
[TABLE]
for a given KKT multiplier . Via the generalized open mapping theorem from \citeZoweKurcyusz1979, this condition is equivalent to
[TABLE]
In the literature, this condition is often called “strict qualification condition”. To our knowledge, this condition appears first in \cite[Theorem 3.3]MaurerZowe1979. Moreover, it is known that this condition implies the uniqueness of the multipliers , see \citeShapiro1997. Moreover, it is straightforward to check that (5) is strictly stronger than (8). Under condition (8), \cite[Theorem 2.7(iii)]BonnansZidani1999 gives the second-order necessary condition
[TABLE]
Under the additional assumption that the second derivative of the Lagrangian is a Legendre form, they also derive sufficient conditions. Consequently, the gap between necessary and sufficient conditions of second order is as small as possible.
Using the inheritance property \cite[Lemma 3.3]Wachsmuth2016:2 of polyhedric sets, it is possible to generalize the results of \citeBonnansZidani1999 in the following way. Instead of (P), we consider the much more general problem
[TABLE]
Here, , are Banach spaces, are twice Fréchet differentiable and , are closed, convex and polyhedric sets. Given multipliers , the condition (8) becomes
[TABLE]
Under this condition, we can apply \cite[Theorem 5.4]Wachsmuth2016:2 and obtain the second-order necessary condition
[TABLE]
Note that one has to rewrite the constraints as to apply this theorem. Thus, if this strong regularity condition (10) is satisfied, we can replace the assumption of being polyhedral in \citeBonnansZidani1999 by the much weaker assumption of polyhedricity. We note that also necessary conditions of second order can be found in \cite[Theorems 5.6, 5.7]Wachsmuth2016:2.
3 First-order optimality conditions and constraint qualifications
In this section, we briefly recall first-order optimality conditions for the problem (P) and the constraint qualifications which are required for the derivation. In order to put our problem into the framework of \citeZoweKurcyusz1979, we recall
[TABLE]
and . Now, our problem (P) reads
[TABLE]
An application of \cite[Theorem 3.1]ZoweKurcyusz1979 implies the following first-order necessary conditions.
Theorem 3.1.
Assume that is a local minimizer of (P) such that
[TABLE]
is satisfied. Then, there exist , such that
[TABLE]
It is clear that is equivalent to
[TABLE]
Further, condition (13) can be written concisely as
[TABLE]
where the Lagrangian is defined via
[TABLE]
and a prime denotes partial differentiation w.r.t. . For convenience, we recall the expressions for the first and second derivative of the Lagrangian w.r.t.
[TABLE]
for . Here, we used the common abbreviation for the action of a bilinear form on the tuple .
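For readers who want the formulas at hand, a Lagrangian of this type and its derivatives with respect to the first argument typically take the following shape. This is a sketch under assumptions: the number of constraints $m$, the multiplier name $\lambda$ and the symbol $\mathcal L$ are chosen for illustration and may differ from the notation in (14).

```latex
% Sketch in assumed notation: m nonlinear constraints g_1, ..., g_m,
% multiplier \lambda \in \mathbb{R}^m, direction h in the underlying space.
\begin{align*}
  \mathcal L(x, \lambda) &:= f(x) + \sum_{i=1}^m \lambda_i \, g_i(x), \\
  \mathcal L'(x, \lambda) \, h &= f'(x) \, h + \sum_{i=1}^m \lambda_i \, g_i'(x) \, h, \\
  \mathcal L''(x, \lambda) \, h^2 &= f''(x) \, h^2 + \sum_{i=1}^m \lambda_i \, g_i''(x) \, h^2,
  \qquad \text{where } b \, h^2 := b[h, h] \text{ for a bilinear form } b.
\end{align*}
```

Written this way, the second derivative is affine in the multiplier for every fixed direction $h$, which is the structural property exploited whenever a supremum over the multiplier set is taken.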
For an arbitrary feasible point , we define the set of Lagrange multipliers via
[TABLE]
We also recall from \cite[Theorem 4.1]{ZoweKurcyusz1979} that (RZKCQ) implies the boundedness of . We mention that the boundedness of can be shown under the slightly weaker condition
[TABLE]
by a suitable modification of the proof of \cite[Theorem 4.1]ZoweKurcyusz1979. Note that, however, might be empty if only (16) is satisfied.
4 No-gap second-order optimality conditions
In this section, we consider second-order optimality conditions for problem (P).
We begin with the derivation of necessary optimality conditions. In order to apply the results from \cite[Section 3.2.3]{BonnansShapiro2000}, we introduce
[TABLE]
Now, (P) reads
[TABLE]
Let be a feasible point of (P). From \cite[(3.20) and (3.122)]BonnansShapiro2000, we recall the definition of the critical cone
[TABLE]
which matches our definition (4), and of the set of radial critical directions
[TABLE]
Note that we have
[TABLE]
due to (3).
From \cite[Proposition 3.53]BonnansShapiro2000 we get the following result.
Lemma 4.1.
Assume that is a local minimizer of (P) such that (RZKCQ) is satisfied. Further suppose that is dense in . Then,
[TABLE]
The density assumption in this result can be shown under an additional condition on the constraint set .
Theorem 4.2.
Assume that is a local minimizer of (P) such that (RZKCQ) is satisfied. We denote by the number of active constraints in , i.e., the number of indices with . Under the assumption that is -polyhedric, we have
[TABLE]
Proof 4.3.
We recall the formula
[TABLE]
for the tangent cone of , where denotes the set of active indices. For brevity, we set . Thus,
[TABLE]
In these sets, we have many scalar equalities and inequalities. Due to the assumption that is -polyhedric, we can invoke \creflem:n_polyhedricity. This implies that is dense in . Thus, the assertion follows from \creflem:necessary_condition.
Note that the supremum in the above inequality is attained, since the set of multipliers is weak- compact, see \cite[Theorem 3.9]BonnansShapiro2000, and the second derivative of the Lagrangian is weak- continuous w.r.t. the multipliers. Hence, (18) can be rephrased as follows. For every critical direction , there exist multipliers such that .
If a quadratic growth condition is satisfied at , we get a better inequality.
Corollary 4.4.
In addition to the assumptions of \cref{thm:SNC}, we assume that the growth condition
[TABLE]
is satisfied for some at , where $F=\set{x\in C\given g(x)\in K}$ is the feasible set of (P). Then,
[TABLE]
Proof 4.5.
Under (19), is a local minimizer of on . Note that is twice Fréchet differentiable if is a Hilbert space. In this case, a direct application of \cref{thm:SNC} yields the claim. If is not a Hilbert space, we can still reproduce \cite[Lemma 3.44]{BonnansShapiro2000}, which is enough to prove \cite[Prop. 3.53]{BonnansShapiro2000} and, consequently, \cref{thm:SNC}. To this end, we set and check that a second-order Taylor expansion similar to \cite[(3.100)]{BonnansShapiro2000} holds. For this purpose, let and be given such that . We define the path . Then,
[TABLE]
as . Hence, the modified function satisfies the required second-order Taylor expansion.
As usual, second-order sufficient conditions can be derived by a contradiction argument.
Theorem 4.6.
Assume that is a stationary point of (P), i.e., there exist . Further, we suppose that the CQ (16), which is slightly weaker than Robinson’s CQ, is satisfied. We assume that
[TABLE]
holds for some , where the extended critical cone is given by
[TABLE]
Then, for all , there is such that
[TABLE]
where $F=\set{x\in C\given g(x)\in K}$ is the feasible set of (P).
Proof 4.7.
We fix and proceed by contradiction. This yields a sequence with and .
Using the Fréchet differentiability of , we have
[TABLE]
Owing to the CQ and the generalized open mapping theorem \cite[Theorem 2.1]ZoweKurcyusz1979, we find sequences , with
[TABLE]
and . In particular, and
[TABLE]
Further,
[TABLE]
yields . Hence, for large enough.
Now, for large , we choose , such that
[TABLE]
This is possible since for large enough.
For large enough we have
[TABLE]
Next, we are going to use a Taylor expansion of the Lagrangian. Since and are twice Fréchet differentiable, we have the Taylor expansion
[TABLE]
and analogously for . Now, we utilize that the CQ (16) implies the boundedness of the multipliers . This yields that we can use a Taylor expansion for at and the remainder term is uniform w.r.t. the multipliers . Thus, we can continue with
[TABLE]
In order to deal with the second and third addend, we use again the boundedness of . Together with , both addends belong to as . Thus, we can continue via
[TABLE]
Dividing by and passing to the limit yields the contradiction .
Using the notion of Legendre forms, it is possible to weaken the assumed inequality (20). We recall from \cite[Section 6.2]{IoffeTichomirov1979:2} that a continuous bilinear form on a Hilbert space is called a Legendre form, if is sequentially weakly lower semicontinuous and if
[TABLE]
Clearly, this definition can also be used if is not a Hilbert space, but only a Banach space. However, it was shown recently in \citeHarder2018 that a reflexive Banach space permits a Legendre form only if it possesses an equivalent Hilbert space norm. The notion of Legendre forms was generalized to non-quadratic forms in \cite[Definition 3.73]BonnansShapiro2000. Therein, a function is called an extended Legendre form, if it is weakly lower semicontinuous, positively homogeneous of degree and if
[TABLE]
is satisfied. We are interested in the case that
[TABLE]
is the maximized Hessian of the Lagrangian. Under the assumption that the set of multipliers is bounded, which holds, e.g., under (16), and non-empty, the function is finite, i.e., it maps to . The next result states sufficient conditions which ensure that a sum of two functions is an extended Legendre form. It is inspired by \cite[Proposition 3.76 (ii)]{BonnansShapiro2000}.
Lemma 4.8.
Suppose that is an extended Legendre form and that is positively homogeneous of degree and weakly lower semicontinuous. Then, is an extended Legendre form.
Proof 4.9.
It is clear that is positively homogeneous of degree and weakly lower semicontinuous. Now, suppose that and . From
[TABLE]
we infer . Since is an extended Legendre form, follows. This shows that is an extended Legendre form.
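A concrete instance may help to see how such a sum behaves. The following sketch is an illustration under assumptions: the Hilbert space $H$, the element $a$ and the names $Q_1$, $Q_2$ are hypothetical and not taken from the text, and the second summand is even weakly continuous, which makes the argument shorter than in the general weakly lower semicontinuous case.

```latex
% Illustrative instance; H, a, Q_1, Q_2 are assumed names, not taken from the text.
Let $H$ be a Hilbert space, $a \in H$, and
\begin{equation*}
  Q_1(h) := \|h\|_H^2,
  \qquad
  Q_2(h) := \max\{0, (a, h)_H\}^2 .
\end{equation*}
Both functions are positively homogeneous of degree $2$,
$Q_1$ is a Legendre form and $Q_2$ is weakly continuous.
If $h_k \rightharpoonup h$ and $(Q_1 + Q_2)(h_k) \to (Q_1 + Q_2)(h)$, then
\begin{equation*}
  \|h_k\|_H^2
  = (Q_1 + Q_2)(h_k) - Q_2(h_k)
  \to (Q_1 + Q_2)(h) - Q_2(h)
  = \|h\|_H^2 ,
\end{equation*}
and weak convergence together with convergence of the norms
yields $h_k \to h$ in $H$.
```

Note that $Q_1 + Q_2$ is not a quadratic form, so the extended notion is genuinely needed here.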
The next result is an adaptation of \cite[Proposition 3.77]{BonnansShapiro2000} to the situation at hand.
Lemma 4.10.
Let be a feasible point such that is not empty and bounded. Further, we assume that is a Legendre form and that
- is weakly continuous for all and
- is weakly lower semicontinuous for all .
Then, the function defined in (21) is an extended Legendre form.
Proof 4.11.
For every , the function is weakly lower semicontinuous, since for . Moreover, these functions are positively -homogeneous. As the supremum of weakly lower semicontinuous functions, the function
[TABLE]
is weakly lower semicontinuous. Now, an application of \creflem:ex_leg_form yields the assertion.
From \cite[Lemma 3.75]BonnansShapiro2000, we obtain the following result.
Lemma 4.12.
Let be a reflexive Banach space. Suppose that (16) is satisfied at the feasible point and that is not empty. We further assume that
[TABLE]
is an extended Legendre form. Then, the condition (20) is equivalent to
[TABLE]
In this case, we have a minimal gap between the necessary and sufficient conditions of \crefthm:SNC,thm:sufficient_condition.
5 Examples
In this section, we provide two examples. These examples illustrate two crucial ingredients of \crefthm:SNC.
The first example is constructed in such a way that the assumptions of \crefthm:SNC are satisfied and, hence, the necessary conditions (18) hold. However, the set of multipliers is not a singleton and the condition
[TABLE]
is violated for all . Hence, it is crucial to take the supremum over all multipliers in (18).
In the other example, we demonstrate that the assumption that is -polyhedric is crucial. To this end, we have to use a polyhedric set which is not -polyhedric.
5.1 Non-unique multipliers
This example is heavily inspired by \cite[Counterexample 1.2]CrouzeixMartinezLegazSeeger1995. We repeat this counterexample, since it will be important in the sequel. We define the matrices
[TABLE]
These matrices have the property that
[TABLE]
Indeed, this can be shown, e.g., by a distinction of the cases and . However, for every , the convex combination
[TABLE]
is not coercive on non-negative vectors, since at least one of the numbers
[TABLE]
will be negative.
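The phenomenon that a pointwise maximum of quadratic forms can be positive although no single convex combination is can also be checked numerically. The following Python sketch uses a hypothetical toy family that is not the pair of matrices from the counterexample above: three symmetric $2\times 2$ matrices, tested on all of $\mathbb{R}^2$ instead of on non-negative vectors. All names (`make_form`, `FORMS`, the value `-0.4`) are assumptions made for this illustration. Each matrix satisfies $d^\top Q d = |d|^2 (\gamma + \cos(2\theta - \varphi))$ for $d = |d| (\cos\theta, \sin\theta)$, so with $\gamma = -0.4$ and three equally spaced phases the maximum is bounded below by $0.1\,|d|^2$, whereas every convex combination has trace $-0.8$ and is therefore indefinite or negative.

```python
import numpy as np


def make_form(gamma, rho, phi):
    """Symmetric 2x2 matrix Q with d^T Q d = |d|^2 (gamma + rho cos(2 theta - phi))."""
    rotation_part = np.array([[np.cos(phi), np.sin(phi)],
                              [np.sin(phi), -np.cos(phi)]])
    return gamma * np.eye(2) + rho * rotation_part


# Hypothetical toy data (NOT the matrices of the counterexample in the text).
PHASES = [0.0, 2.0 * np.pi / 3.0, 4.0 * np.pi / 3.0]
FORMS = [make_form(-0.4, 1.0, phi) for phi in PHASES]


def min_of_max_form(forms, num_angles=3600):
    """Minimum over sampled unit directions d of max_i d^T Q_i d."""
    thetas = np.linspace(0.0, 2.0 * np.pi, num_angles, endpoint=False)
    dirs = np.stack([np.cos(thetas), np.sin(thetas)], axis=1)
    values = np.stack([np.einsum("ni,ij,nj->n", dirs, Q, dirs) for Q in forms])
    return values.max(axis=0).min()


def best_min_eigenvalue(forms, grid=50):
    """Largest smallest-eigenvalue over a grid of convex combinations of three forms."""
    best = -np.inf
    for i in range(grid + 1):
        for j in range(grid + 1 - i):
            lam = np.array([i, j, grid - i - j]) / grid
            Q = sum(l * F for l, F in zip(lam, forms))
            best = max(best, np.linalg.eigvalsh(Q)[0])
    return best


if __name__ == "__main__":
    print("min over directions of the max form:", min_of_max_form(FORMS))
    print("best smallest eigenvalue of a convex combination:", best_min_eigenvalue(FORMS))
```

The design mirrors the role of the supremum in (18): the maximum over the family is taken direction by direction, while fixing one convex combination in advance corresponds to fixing a single multiplier.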
We are going to construct a problem of the form
[TABLE]
Here, are (continuous) quadratic functions to be defined below and
[TABLE]
Our point of interest will be defined via
[TABLE]
It is clear that
[TABLE]
The function will satisfy
[TABLE]
The first condition renders feasible for (24). Due to
[TABLE]
it is easy to check that
[TABLE]
thus, (RZKCQ) is satisfied. Next, we require
[TABLE]
and we compute the set of Lagrange multipliers . This amounts to finding all , such that the corresponding satisfies
[TABLE]
By using the formula for the normal cone, we see that this is equivalent to . Thus is a stationary point and
[TABLE]
Note that the critical cone is given by
[TABLE]
Next, we define the second derivatives of and at . To this end, we use the notation
[TABLE]
for the average of a function over an interval and for the difference of the function with this average. With this notation, we introduce
[TABLE]
Note that the quadratic functions and are uniquely determined via the first and second derivatives in and the requirement . Let us check that the second-order sufficient condition (20) is satisfied. For we set . By utilizing (23), we have
[TABLE]
Hence, \crefthm:sufficient_condition implies that is a local minimizer.
It remains to check that the condition
[TABLE]
is violated for all . To this end, we take
[TABLE]
and observe . It is easy to check that
[TABLE]
for . Thus, for every and the associated , we have
[TABLE]
and these two terms cannot be simultaneously non-negative. Hence, there does not exist any multiplier such that (25) holds. This means that (25) fails to be a necessary optimality condition.
5.2 Constraint set which is not -polyhedric
Next, we give a counterexample to demonstrate that the assumption of being -polyhedric in \crefthm:SNC is crucial. Therefore, we need a set which is polyhedric (i.e., -polyhedric), but not -polyhedric. To the best of our knowledge, the set given in \cite[Example 4.24]Wachsmuth2016:2 is the only known set with this property. In order to state our counterexample, we need to adapt the construction from \cite[Example 4.24]Wachsmuth2016:2. In we consider the points
[TABLE]
where . We set
[TABLE]
Since the sequences and converge towards , the set is closed. In what follows, we check that is polyhedric. By arguing as in \cite[Example 4.24]Wachsmuth2016:2, we find that is polyhedric in . Next, it is a little bit tedious to check that is the intersection of the half-spaces which are defined by the following inequalities and that the points on the right-hand side are exactly those points of , , which lie on the boundary of the half-spaces:
[TABLE]
where . In the last two lines, we have used the coefficients
[TABLE]
From this representation of , we learn two things. First, all , are extreme points of and, thus, is not polyhedral. Second, the intersection is a polyhedron for all , since it can be written as a finite intersection of half-spaces. Thus, is closed for all . Hence, is polyhedric at all .
Hence, we have shown that is polyhedric, but not polyhedral. As in \cite[Example 4.24]Wachsmuth2016:2, we can also check that is not -polyhedric.
Next, we compute the intersection of with the hyperplane . To this end, let be the intersection of this hyperplane with the line segment joining and , i.e.,
[TABLE]
One can check that
[TABLE]
We define
[TABLE]
and claim that all points belong to the convex set
[TABLE]
and that the points belong to the relative boundary of this set. Indeed, after a straightforward manipulation, this claim is equivalent to the inequality
[TABLE]
and that we have equality for . This latter equality is clear. Moreover, one can check that for the derivative w.r.t. and for the derivative w.r.t. of the left-hand side is non-negative, both by using the definition of .
Now, we consider the optimization problem
[TABLE]
In order to cast this problem in the form (12), we set and . The feasible set of this problem is and this set is contained in . Hence,
[TABLE]
shows that is a local minimizer of the above problem. Since has a positive -coordinate and since has a negative -coordinate, it is easy to check that (RZKCQ), i.e.,
[TABLE]
is satisfied. Hence, there exist , such that the necessary condition from \crefthm:fonc, i.e.,
[TABLE]
is satisfied. Finally, we check that the necessary optimality condition of second order (18) does not hold. Since the constraint is linear, its second derivative vanishes and the precise value of the multiplier is irrelevant. Next, we construct an element of the tangent cone . From
[TABLE]
we find that . Moreover, and are clear. Thus, belongs to the critical cone , cf. (17). However,
[TABLE]
is negative. Hence, (18) is violated.
We mention that the only assumption of \crefthm:SNC which does not hold is the assumption that is -polyhedric. Hence, this assumption is essential. On the other hand, in the context of \citeBonnansZidani1999, the only assumption which might not hold is the satisfaction of the regularity condition (8). We check that this condition indeed fails. To this end, we start by computing the set of multipliers. It is clear that are multipliers at , if and only if the two conditions
[TABLE]
hold. Due to the construction of the set and due to , we have
[TABLE]
Hence, has to satisfy the inequalities
[TABLE]
Hence, and are the unique Lagrange multipliers for . Finally, the regularity condition (8) is violated, since
[TABLE]
This example also shows that assuming (8) in \citeBonnansZidani1999 cannot be replaced by the assumption of unique multipliers.
6 Conclusions
We have investigated problem (P) featuring an abstract constraint and finitely many nonlinear constraints . Previously, second-order necessary optimality conditions have been obtained under the rather strong regularity condition (8). We propose to use the concept of $n$-polyhedricity of as a novel approach for deriving second-order necessary conditions. In fact, “almost all” sets which are known to be polyhedric are even $n$-polyhedric, see, e.g., \cite[Example 4.21]{Wachsmuth2016:2}. This allows us to prove second-order necessary conditions under the assumption of the CQ of Robinson, Zowe and Kurcyusz. Second-order sufficient conditions can be obtained by the usual contradiction argument. By means of two counterexamples, we have seen that the assumptions and the formulation of \cref{thm:SNC} are sharp. The inclusion of the phenomenon of two-norm discrepancy is subject to future research. It would also be interesting to replace the finite-dimensional polyhedral cone by a set involving curvature, e.g., the cone of semi-definite matrices.
\printbibliography
