Strong Metric Subregularity of Mappings in Variational Analysis and Optimization
Radek Cibulka, Asen Dontchev, Alexander Kruger

TL;DR
This paper explores the properties, stability, and applications of strong metric subregularity in variational analysis, extending classical theorems and providing conditions for superlinear convergence of Newton-type methods.
Contribution
It demonstrates the stability of strong metric subregularity under perturbations, extends criteria involving graphical derivatives, and analyzes convergence of Newton's methods in this context.
Findings
Strong metric subregularity is stable under small calmness perturbations.
Extension of Rockafellar's criterion to infinite-dimensional spaces.
Superlinear convergence of Newton's method under strong metric subregularity.
Strong Metric Subregularity of Mappings
in Variational Analysis and Optimization
R. Cibulka1, A. L. Dontchev2 and A. Y. Kruger3
Dedicated to the memory of Jonathan M. Borwein
Abstract
Although the property of strong metric subregularity of set-valued mappings has been present in the literature under various names and with various (equivalent) definitions for more than two decades, it has attracted much less attention than its older “siblings”, the metric regularity and the strong (metric) regularity. The purpose of this paper is to show that the strong metric subregularity shares the main features of these two most popular regularity properties and is not less instrumental in applications. We show that the strong metric subregularity of a mapping $F$ acting between metric spaces is stable under perturbations of the form $f+F$, where $f$ is a function with a small calmness constant. This result is parallel to the Lyusternik-Graves theorem for metric regularity and to the Robinson theorem for strong regularity, where the perturbations are represented by a function $f$ with a small Lipschitz constant. Then we study perturbation stability of the same kind for mappings acting between Banach spaces, where $f$ is not necessarily differentiable but admits a set-valued derivative-like approximation. Strong metric $q$-subregularity is also considered, where $q$ is a positive real constant appearing as exponent in the definition. Rockafellar’s criterion for strong metric subregularity involving injectivity of the graphical derivative is extended to mappings acting in infinite-dimensional spaces. A sufficient condition for strong metric subregularity is established in terms of surjectivity of the Fréchet coderivative, and it is shown by a counterexample that surjectivity of the limiting coderivative is not a sufficient condition for this property, in general. Then various versions of Newton’s method for solving generalized equations are considered including inexact and semismooth methods, for which superlinear convergence is shown under strong metric subregularity.
As applications to optimization, a characterization of the strong metric subregularity of the KKT mapping is obtained, as well as a radius theorem for the optimality mapping of a nonlinear programming problem. Finally, an error estimate is derived for a discrete approximation in optimal control under strong metric subregularity of the mapping involved in the Pontryagin principle.
Key Words. strong metric subregularity, perturbations and approximations, generalized derivatives, Newton’s method, nonlinear programming, optimal control.
AMS Subject Classification (2010) 49J53, 49K40, 90C31.
1Department of Mathematics, Faculty of Applied Sciences, University of West Bohemia, Univerzitní 22, 306 14 Pilsen, Czech Republic, [email protected]. Supported by the project GA15-00735S.
2Mathematical Reviews, 416 Fourth Street, Ann Arbor, MI 48107-8604, USA, [email protected]; Institute of Statistics and Mathematical Methods in Economics, Vienna University of Technology, Wiedner Hauptstrasse 8, A-1040, Austria. Supported by NSF, grant 1562209, the Austrian Science Foundation (FWF), grant P26640-N25, and the Australian Research Council, project DP160100854.
3 Centre for Informatics and Applied Optimization, Federation University Australia, POB 663, Ballarat, VIC 3350, Australia, [email protected]. Supported by the Australian Research Council, project DP160100854.
1 Introduction
There are three basic properties of linear mappings in analysis and topology: surjectivity, injectivity and invertibility. Specifically, a linear and bounded mapping $A$ acting from a Banach space $X$ to a Banach space $Y$ is said to be surjective when for every $y\in Y$ there exists $x\in X$ such that $Ax=y$; it is said to be injective when $Ax=0$ implies $x=0$; it is said to be invertible when for every $y\in Y$ there exists a unique $x\in X$ such that $Ax=y$. The combination of surjectivity and injectivity implies invertibility and in this case the inverse mapping is linear and bounded. When $X=Y=\mathbb{R}^n$ all three properties are equivalent. An extension of surjectivity to nonlinear/set-valued mappings which goes back to the Banach open mapping principle is the well-known property of metric regularity, a name coined by Borwein in [3]. An extension of invertibility, which is particularly useful in optimization, is known as strong metric regularity, a property introduced by Robinson in [32]. In this paper we focus on an extension of injectivity to nonlinear/set-valued mappings called strong metric subregularity, for which in this paper we also use the name “strong subregularity” for short. Although this property has been present in the literature under various names and with various (mostly equivalent) definitions for more than two decades, it has attracted much less attention than its older “siblings”, the metric regularity and the strong (metric) regularity. The purpose of this paper is to demonstrate that the strong subregularity shares the main features of the other two regularity properties and is not less instrumental in applications.
To set the stage, let us first fix the notation and terminology. Throughout, $X$ and $Y$ are metric spaces in general and any metric is denoted by $\varrho(\cdot,\cdot)$. The space $Y$ also appears as a linear metric space with shift invariant metric, that is, a metric with the property that $\varrho(y+z,y'+z)=\varrho(y,y')$ for all $y,y',z\in Y$. Both $X$ and $Y$ could also be Banach spaces and this is always explicitly stated or clear from the context. A norm is generally denoted by $\|\cdot\|$, sometimes with a subscript indicating a specific space. The $n$-dimensional Euclidean space is denoted by $\mathbb{R}^n$ and the set of nonnegative integers is denoted by $\mathbb{N}_0$. The distance from a point $x$ to a set $A$ in a metric space is $d(x,A)=\inf_{a\in A}\varrho(x,a)$; the distance to the empty set is always $+\infty$. The closed ball centered at $x$ with radius $r$ is denoted by $\mathbb{B}_r(x)$ and the closed unit ball is $\mathbb{B}$. A set $U$ is said to be a neighborhood of a point $x$ when there exists a real $r>0$ such that $\mathbb{B}_r(x)\subset U$.
A set-valued mapping $F$ acting from $X$ to the subsets of $Y$, denoted $F:X\rightrightarrows Y$, is associated with its graph $\operatorname{gph}F=\{(x,y)\in X\times Y \mid y\in F(x)\}$, its domain $\operatorname{dom}F=\{x\in X \mid F(x)\neq\emptyset\}$ and its range $\operatorname{rge}F=\{y\in Y \mid \exists\,x\in X \text{ with } y\in F(x)\}$. The inverse of $F$ is defined as $y\mapsto F^{-1}(y)=\{x\in X \mid y\in F(x)\}$. The space of all linear bounded (single-valued) mappings acting between Banach spaces $X$ and $Y$ and equipped with the standard operator norm is denoted by $\mathcal{L}(X,Y)$. A mapping acting between Banach spaces $X$ and $Y$ is said to be positively homogeneous when its graph is a cone. For a positively homogeneous mapping $H:X\rightrightarrows Y$ the expression $\sup_{\|x\|\le 1}\inf_{y\in H(x)}\|y\|$ is said to be the inner norm of $H$ and denoted by $\|H\|^{-}$, while the expression $\sup_{\|x\|\le 1}\sup_{y\in H(x)}\|y\|$ is the outer norm of $H$ and denoted by $\|H\|^{+}$. Also, recall that the measure of non-compactness [1] of a set $\mathcal{A}\subset\mathcal{L}(X,Y)$ is defined as
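\[ \chi(\mathcal{A})=\inf\Big\{\, r>0 \;:\; \mathcal{A}\subset\bigcup_{A\in\mathcal{F}}\mathbb{B}_r(A) \ \text{for some finite set } \mathcal{F}\subset\mathcal{A} \Big\}. \]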
Given a (set-valued) mapping $F$ acting from a metric space $X$ to (the subsets of) a metric space $Y$, a point $(\bar x,\bar y)\in\operatorname{gph}F$ and neighborhoods $U$ of $\bar x$ and $V$ of $\bar y$, the submapping $U\ni x\mapsto F(x)\cap V$ is said to be a graphical localization of $F$ at $\bar x$ for $\bar y$. Local invertibility of $F$ at $\bar x$ is identified with $F^{-1}$ having a localization at $\bar y$ for $\bar x$ which is single-valued (a function). The best-known manifestation of invertibility of a (nonlinear) function is the classical inverse function theorem: the inverse $f^{-1}$ of a function $f:X\to Y$ between Banach spaces that is strictly differentiable at $\bar x$ has a single-valued localization at $f(\bar x)$ for $\bar x$ that is strictly differentiable at $f(\bar x)$ if and only if the strict derivative $Df(\bar x)$ is invertible. For a general mapping $F:X\rightrightarrows Y$, the property that $F^{-1}$ has a Lipschitz continuous single-valued localization at $\bar y$ for $\bar x$ is known as strong metric regularity of $F$ at $\bar x$ for $\bar y$. In this paper we also use the shorter name strong regularity as in Robinson’s original definition in [32] which, strictly speaking, is somewhat different but is based on the same idea.
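A mapping $F:X\rightrightarrows Y$ is said to be metrically regular at $\bar x$ for $\bar y$ when $\bar y\in F(\bar x)$, $\operatorname{gph}F$ is locally closed at $(\bar x,\bar y)$, meaning that there exists a neighborhood $W$ of $(\bar x,\bar y)$ such that the set $\operatorname{gph}F\cap W$ is closed in $W$, and there is a constant $\kappa\ge 0$ along with neighborhoods $U$ of $\bar x$ and $V$ of $\bar y$ such that
\[ d\big(x,F^{-1}(y)\big)\le\kappa\, d\big(y,F(x)\big)\quad\text{for all } (x,y)\in U\times V. \tag{1} \]
The infimum of $\kappa$ for which there exist neighborhoods $U$ and $V$ such that (1) holds is called the regularity modulus of $F$ at $\bar x$ for $\bar y$ and denoted $\operatorname{reg}(F;\bar x\,|\,\bar y)$. We use the convention that $\operatorname{reg}(F;\bar x\,|\,\bar y)<\infty$ if and only if $F$ is metrically regular at $\bar x$ for $\bar y$. A mapping $A\in\mathcal{L}(X,Y)$ is metrically regular at any point if and only if it is surjective, in which case $\operatorname{reg}(A;\bar x\,|\,A\bar x)=\|A^{-1}\|^{-}$ for every $\bar x\in X$; this comes from the Banach open mapping principle. A mapping $F$ is strongly regular at $\bar x$ for $\bar y$ if and only if $F$ is metrically regular at $\bar x$ for $\bar y$ and the inverse $F^{-1}$ has a graphical localization at $\bar y$ for $\bar x$ which is nowhere multivalued; in this case for every $\kappa>\operatorname{reg}(F;\bar x\,|\,\bar y)$ there exists a neighborhood of $\bar y$ where the localization is Lipschitz continuous with a Lipschitz constant $\kappa$.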
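A generally set-valued mapping $F$ acting from a metric space $X$ to the subsets of a metric space $Y$ is said to be strongly metrically subregular at $\bar x$ for $\bar y$ when $\bar y\in F(\bar x)$ and there is a constant $\kappa\ge 0$ along with neighborhoods $U$ of $\bar x$ and $V$ of $\bar y$ such that
\[ \varrho(x,\bar x)\le\kappa\, d\big(\bar y,F(x)\cap V\big)\quad\text{for all } x\in U. \tag{2} \]
This property can be equivalently defined, see [15, Section 3I, p. 194], with just one neighborhood by adjusting its size, as follows: there is a constant $\kappa\ge 0$ along with a neighborhood $U$ of $\bar x$ such that
\[ \varrho(x,\bar x)\le\kappa\, d\big(\bar y,F(x)\big)\quad\text{for all } x\in U. \tag{3} \]
Either definition yields that $\bar x$ is the only point $x$ in $U$ such that $\bar y\in F(x)$; that is, $\bar x$ is an isolated point of $F^{-1}(\bar y)$. The infimum of $\kappa$ over neighborhoods $U$ and $V$ such that (2) holds (or over $U$ such that (3) holds) is called the subregularity modulus of $F$ at $\bar x$ for $\bar y$ and denoted by $\operatorname{subreg}(F;\bar x\,|\,\bar y)$. We adopt the convention that $\operatorname{subreg}(F;\bar x\,|\,\bar y)=\infty$ whenever $F$ is not strongly subregular at $\bar x$ for $\bar y$. Note that we do not assume that the graph of $F$ is locally closed at the reference point in the definition of strong subregularity. A mapping $A\in\mathcal{L}(X,Y)$ whose range is closed is strongly subregular everywhere if and only if it is injective; in this case $\operatorname{subreg}(A;\bar x\,|\,A\bar x)=\|A^{-1}\|^{+}$ for every $\bar x\in X$; note that in finite dimensions the range of a linear bounded mapping is always closed.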
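For a linear injective mapping between Euclidean spaces the defining estimate reads $\|x\|\le\kappa\|Ax\|$, and the smallest admissible $\kappa$ is the reciprocal of the smallest singular value of $A$. The following minimal sketch checks this numerically; the matrix `A` and all helper names are hypothetical, chosen only for illustration.

```python
import math
import random

def smallest_singular_value_two_columns(A):
    """Smallest singular value of a matrix with exactly two columns,
    via the closed-form eigenvalues of the 2x2 Gram matrix A^T A."""
    a = sum(row[0] * row[0] for row in A)
    b = sum(row[0] * row[1] for row in A)
    c = sum(row[1] * row[1] for row in A)
    mean = 0.5 * (a + c)
    radius = math.sqrt((0.5 * (a - c)) ** 2 + b * b)
    return math.sqrt(max(mean - radius, 0.0))

def apply(A, x):
    """Matrix-vector product for a matrix with two columns."""
    return [row[0] * x[0] + row[1] * x[1] for row in A]

def norm(v):
    """Euclidean norm."""
    return math.sqrt(sum(t * t for t in v))

def worst_ratio(A, samples=2000, seed=0):
    """Largest sampled value of ||x|| / ||A x|| over random nonzero x."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(samples):
        x = [rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)]
        nx = norm(x)
        if nx == 0.0:
            continue
        worst = max(worst, nx / norm(apply(A, x)))
    return worst

# Hypothetical injective matrix A: R^2 -> R^3 (illustration only).
A = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
kappa = 1.0 / smallest_singular_value_two_columns(A)
print("candidate modulus 1/sigma_min:", kappa)
print("largest sampled ratio        :", worst_ratio(A))
```

The sampled ratio never exceeds `kappa` and approaches it as the samples align with the singular direction of the smallest singular value.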
There is a close connection between strong metric subregularity and the properties of the distance function $x\mapsto d(\bar y,F(x))$, see [15, Theorem 3I.5]. Directly from the definition it follows that a set-valued mapping $F:X\rightrightarrows Y$ is strongly subregular at $\bar x$ for $\bar y$ if and only if $\bar x$ is a local sharp minimizer of the function $x\mapsto d(\bar y,F(x))$. Recall that a point $\bar x$ is called a local sharp minimizer of a function $\varphi:X\to\mathbb{R}\cup\{+\infty\}$ whenever there is a neighborhood $U$ of $\bar x$ and a constant $\alpha>0$ such that $\varphi(x)\ge\varphi(\bar x)+\alpha\,\varrho(x,\bar x)$ for all $x\in U$.
A mapping $F:X\rightrightarrows Y$ is strongly subregular at $\bar x$ for $\bar y$ if and only if its inverse $F^{-1}$ has the so-called isolated calmness property at $\bar y$ for $\bar x$. Specifically, whenever $F$ is strongly subregular at $\bar x$ for $\bar y$ there exist a constant $\kappa\ge 0$ and neighborhoods $U$ of $\bar x$ and $V$ of $\bar y$ such that
\[ F^{-1}(y)\cap U\subset\mathbb{B}_{\kappa\varrho(y,\bar y)}(\bar x)\quad\text{for all } y\in V. \tag{4} \]
Moreover, the infimum of all $\kappa$ such that this inclusion holds for some neighborhoods $U$ and $V$, which we denote as $\operatorname{clm}(F^{-1};\bar y\,|\,\bar x)$, equals $\operatorname{subreg}(F;\bar x\,|\,\bar y)$. The proof of this statement is straightforward, see e.g. [15, Theorem 3I.3] where it is stated in finite dimensions but can be easily translated into the language of metric spaces.
Strong subregularity and isolated calmness have been considered in various contexts and under various names in the literature. Isolated calmness was formally introduced by the second author in [9] under the name “local upper Lipschitz continuity at a point”; in the same paper the perturbation stability of this property was first proved. The equivalent property of strong subregularity was considered earlier, without giving it a name, by Rockafellar [33]. The name “strong metric subregularity” was first used in [14] where its equivalence with the isolated calmness was proved.
In finite dimensions there is a class of strongly subregular mappings with a particularly simple description. The following theorem is based on an important result by Robinson [31]:
Theorem 1.1.
Consider a mapping $F:\mathbb{R}^{n}\rightrightarrows\mathbb{R}^{m}$ whose graph is the union of finitely many polyhedral convex sets. Then $F$ is strongly subregular at $\bar x$ for $\bar y$ if and only if $\bar x$ is an isolated point of $F^{-1}(\bar y)$.
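The isolated-point criterion can be tried on two one-dimensional piecewise linear functions whose graphs are each a union of two rays. This is a minimal sketch with hypothetical example functions (not taken from the paper): $x\mapsto|x|$ has the isolated zero $0$, while $x\mapsto\max\{x,0\}$ vanishes on a whole half-line.

```python
import math

def abs_map(x):
    # Graph: the rays y = x (x >= 0) and y = -x (x <= 0); zero set {0}.
    return abs(x)

def plus_map(x):
    # Graph: the rays y = x (x >= 0) and y = 0 (x <= 0); zero set (-inf, 0].
    return max(x, 0.0)

def sampled_modulus(f, xs):
    """Largest sampled value of |x - 0| / |f(x) - 0| over nonzero points xs.
    Returns math.inf when some sampled x != 0 solves f(x) = 0, i.e. when
    0 is not an isolated zero on the sample."""
    worst = 0.0
    for x in xs:
        fx = abs(f(x))
        if fx == 0.0:
            return math.inf
        worst = max(worst, abs(x) / fx)
    return worst

# Nonzero sample points in [-0.1, 0.1].
xs = [k / 1000.0 for k in range(-100, 101) if k != 0]
print("abs :", sampled_modulus(abs_map, xs))
print("plus:", sampled_modulus(plus_map, xs))
```

A finite quotient for `abs_map` is consistent with strong subregularity at $0$ for $0$; the infinite quotient for `plus_map` reflects that $0$ is not an isolated point of its zero set.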
The strong subregularity obeys the paradigm of the inverse function theorem, by which we mean that the property is stable (persistent) under addition of a function whose calmness constant is smaller than the reciprocal of the subregularity modulus. The metric regularity and the strong regularity also obey this paradigm but when the function added to the mapping has a Lipschitz constant smaller than the reciprocal of the regularity modulus. In the case when the mapping is represented by a strictly differentiable function this yields that all three properties are preserved under linearization.
If we fix $y=\bar y$ in the definition of metric regularity (1) we obtain the property of metric subregularity:
\[ d\big(x,F^{-1}(\bar y)\big)\le\kappa\, d\big(\bar y,F(x)\big)\quad\text{for all } x\in U. \tag{5} \]
In contrast to metric regularity, the property (5) does not obey the paradigm of the inverse function theorem, as explained in [15, Section 3.8]. Indeed, from Theorem 1.1 every linear mapping between $\mathbb{R}^n$ and $\mathbb{R}^m$ is metrically subregular, but not every smooth function has this property. Nevertheless, for some special kinds of mappings one may expect stability criteria in terms of infinitesimal approximations, see [21].
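A standard one-dimensional illustration (an assumption of this sketch, not necessarily the example used in [15]) is the zero mapping on $\mathbb{R}$, which is linear and metrically subregular at $0$, perturbed by $x\mapsto x^2$, whose Lipschitz constant near $0$ is arbitrarily small. The perturbed function $x\mapsto x^2$ is not metrically subregular at $0$, since the quotient $d(x,f^{-1}(0))/|f(x)|=1/|x|$ is unbounded near $0$.

```python
import math

def subregularity_quotient(dist_to_solutions, residual, x):
    """d(x, f^{-1}(0)) / |f(x)| with the conventions 0/0 := 0 and c/0 := inf."""
    num = dist_to_solutions(x)
    den = residual(x)
    if num == 0.0:
        return 0.0
    return math.inf if den == 0.0 else num / den

# Zero mapping: every point solves f(x) = 0, so the distance to the solution set is 0.
zero_dist = lambda x: 0.0
zero_res = lambda x: 0.0

# Square function: the only solution of x^2 = 0 is x = 0.
square_dist = lambda x: abs(x)
square_res = lambda x: x * x

points = [10.0 ** (-k) for k in range(1, 7)]
zero_quotients = [subregularity_quotient(zero_dist, zero_res, x) for x in points]
square_quotients = [subregularity_quotient(square_dist, square_res, x) for x in points]

for x, q in zip(points, square_quotients):
    print(f"x = {x:.0e}  quotient for x^2 = {q:.3e}")
```

The quotients for the zero mapping stay at zero, while those for $x^2$ grow without bound as $x\to 0$, so no constant $\kappa$ works in (5).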
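The following proposition puts together the strong regularity, the metric regularity, and the strong subregularity of a function $f$ at $\bar x$ against the invertibility, surjectivity and injectivity of its strict derivative $Df(\bar x)$. With some abuse of notation, for a function $f$ we say that $f$ is (strongly) metrically (sub)regular at $\bar x$ and write $\operatorname{(sub)reg}(f;\bar x)$ instead of $\operatorname{(sub)reg}(f;\bar x\,|\,f(\bar x))$.
Proposition 1.2.
Let $X$ and $Y$ be Banach spaces and let $f:X\to Y$ be strictly differentiable at $\bar x$. Then
(i) $f$ is strongly regular at $\bar x$ if and only if $Df(\bar x)$ is invertible, in which case $\operatorname{reg}(f;\bar x)=\|Df(\bar x)^{-1}\|$;
(ii) $f$ is metrically regular at $\bar x$ if and only if $Df(\bar x)$ is surjective, in which case $\operatorname{reg}(f;\bar x)=\|Df(\bar x)^{-1}\|^{-}$;
(iii) Suppose that $\operatorname{rge}Df(\bar x)$ is closed. Then $f$ is strongly subregular at $\bar x$ if and only if $Df(\bar x)$ is injective, in which case $\operatorname{subreg}(f;\bar x)=\|Df(\bar x)^{-1}\|^{+}$. Moreover, in this case it is sufficient to assume that $f$ is Fréchet differentiable at $\bar x$.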
The first statement is a version of the classical inverse function theorem. The second statement follows from the Lyusternik-Graves theorem. We will present a general version of the third statement in Section 2 where we also show that in infinite dimensions the assumption regarding the closedness of the range of the derivative mapping cannot be removed.
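A small worked instance may help fix ideas for statement (iii). The function below is a hypothetical example chosen for illustration, not one taken from the paper; $\|\cdot\|^{+}$ denotes the outer norm, $\operatorname{subreg}$ the subregularity modulus, and $\mathbb{R}^2$ carries the Euclidean norm.

```latex
% Illustrative example (not from the paper): f : \mathbb{R} \to \mathbb{R}^2.
\[
f(x) = (\sin x,\; x^2), \qquad \bar x = 0, \qquad Df(0)h = (h,\,0).
\]
% Df(0) is injective and its range is closed (finite dimensions), with
\[
\|Df(0)^{-1}\|^{+} = \sup\{\, |h| \;:\; \|(h,0)\| \le 1 \,\} = 1 .
\]
% Direct check of the defining estimate: for every \varepsilon \in (0,1)
% and all sufficiently small |x|,
\[
\|f(x) - f(0)\| \;\ge\; |\sin x| \;\ge\; (1-\varepsilon)\,|x| ,
\qquad\text{so}\qquad \operatorname{subreg}(f;0) \le 1 .
\]
% Conversely, \|f(x)\| = \sqrt{\sin^2 x + x^4} \le |x|\sqrt{1+x^2}, so no
% constant \kappa < 1 satisfies |x| \le \kappa \|f(x)\| near 0. Hence
\[
\operatorname{subreg}(f;0) = 1 = \|Df(0)^{-1}\|^{+} .
\]
```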
From Proposition 1.2 we obtain that if a smooth function is both strongly subregular and metrically regular at , then it is strongly regular at . This is not true however for set-valued mappings even if we require strong subregularity around the reference point. As a counterexample, take , which is both strongly subregular and metrically regular at [math] for [math], strongly regular at every point in its graph different from the origin, and not strongly regular at [math] for [math].
In this paper we present a collection of new results regarding strong metric subregularity; we also give extended versions of known results, as is clearly indicated in the text. The paper has two main parts. The first part presents theoretical results mostly related to stability of strong subregularity with respect to (derivative-type) approximations. First we focus on showing perturbation stability in general metric spaces and some consequences for differentiable functions and polyhedral mappings in finite dimensions. Then we deal with mappings of the form $f+F$ where $f$ is a not necessarily differentiable function and $F$ is a set-valued mapping. Section 4 shows extensions to the so-called strong $q$-subregularity. In Section 5 a partial extension of Rockafellar’s criterion for strong subregularity is obtained for mappings acting in infinite-dimensional spaces. A sufficient condition for strong subregularity is established in terms of surjectivity of the Fréchet coderivative, and it is shown by a counterexample that surjectivity of the limiting coderivative cannot serve as a sufficient condition for this property to hold.
The second part of the paper is devoted to applications that are the main motivation of this study. We consider first various versions of Newton’s method including inexact and semismooth methods, for which a specific mode of convergence is shown under strong subregularity. For a standard nonlinear programming problem, a characterization of the strong subregularity of the optimality mapping is obtained in terms of a strong form of the Mangasarian-Fromovitz constraint qualification and a quadratic growth condition for the objective function. A related result is obtained in [2] for a proper lower semicontinuous convex function $g:X\to\mathbb{R}\cup\{+\infty\}$ defined on a Banach space $X$, whose dual is denoted by $X^{*}$. Namely, it is shown that the subdifferential mapping $\partial g:X\rightrightarrows X^{*}$, understood in the sense of convex analysis, is strongly subregular at a point $\bar x$ for $\bar x^{*}\in\partial g(\bar x)$ if and only if there exist positive constants $c$ and $r$ such that
\[ g(x)\ge g(\bar x)+\langle\bar x^{*},x-\bar x\rangle+c\,\|x-\bar x\|^{2}\quad\text{for all } x\in\mathbb{B}_r(\bar x), \]
where $\langle\cdot,\cdot\rangle$ denotes the duality pairing. Generalizations of the above results to a non-convex function $g$ by using the limiting subdifferential and under appropriate additional assumptions can be found in [18, Corollary 3.3 and 3.5], see also [35]. If $X=\mathbb{R}^n$, a relation between strong subregularity of the limiting subdifferential and quadratic growth of a semi-algebraic function can be found in [17, Theorem 3.1].
As another application, a radius theorem for the optimality mapping for a nonlinear programming problem is proven, giving an expression for the minimal perturbation of the objective function by a quadratic form for which the second-order sufficient optimality condition is violated. Finally, an error estimate is derived for a discrete approximation in optimal control under strong subregularity of the mapping involved in the Pontryagin principle.
2 Perturbed strong subregularity
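Recall [15, Section 1.3] that a function $f$ acting between metric spaces $X$ and $Y$ is said to be calm at $\bar x$ when $\bar x\in\operatorname{dom}f$ and there exist a neighborhood $U$ of $\bar x$ and a constant $\mu\ge 0$ such that
\[ \varrho\big(f(x),f(\bar x)\big)\le\mu\,\varrho(x,\bar x)\quad\text{for all } x\in U\cap\operatorname{dom}f. \tag{6} \]
The infimum of $\mu$ such that (6) holds for some neighborhood $U$ of $\bar x$ is the calmness modulus of $f$ at $\bar x$ and is denoted by $\operatorname{clm}(f;\bar x)$. Note that $\bar x$ does not have to be an interior point of $\operatorname{dom}f$.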
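Calmness at a point is weaker than Lipschitz continuity around it. The sketch below estimates the calmness quotient by sampling for two hypothetical test functions: $x\sin(1/x)$, which is calm at $0$ with constant $1$ although it is not Lipschitz continuous near $0$, and $\sqrt{|x|}$, which is not calm at $0$. The function names and the sampling scheme are illustrative assumptions.

```python
import math

def calmness_quotient(f, x_bar, radius, samples=10000):
    """Largest sampled value of |f(x) - f(x_bar)| / |x - x_bar|
    over points x with 0 < |x - x_bar| <= radius."""
    worst = 0.0
    f_bar = f(x_bar)
    for k in range(1, samples + 1):
        h = radius * k / samples
        for x in (x_bar - h, x_bar + h):
            worst = max(worst, abs(f(x) - f_bar) / abs(x - x_bar))
    return worst

def oscillating(x):
    # Calm at 0 with constant 1, but its derivative is unbounded near 0.
    return x * math.sin(1.0 / x) if x != 0.0 else 0.0

def root(x):
    # Not calm at 0: the quotient behaves like 1 / sqrt(|x|).
    return math.sqrt(abs(x))

print("x*sin(1/x):", calmness_quotient(oscillating, 0.0, 0.1))
print("sqrt(|x|) :", calmness_quotient(root, 0.0, 1e-4))
```

Shrinking `radius` leaves the first quotient bounded by $1$, while the second keeps growing, which is the numerical signature of a missing calmness constant.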
The following theorem shows that the strong subregularity obeys the paradigm of the inverse function theorem: the property is preserved under perturbations by a function with a small calmness modulus. A version of it appeared first in [9, Theorem 3.2] and was echoed later in other publications. More recently, [15, Theorem 3I.7] uses an equivalent definition of strong subregularity and is given in finite dimensions, while the proof in [34, Theorem 3.2] uses the notion of the steepest displacement rate of a set-valued mapping. The proof given here is just an application of the definitions; we present it for completeness.
Theorem 2.1.
Suppose that $X$ is a metric space and $Y$ is a linear metric space with shift invariant metric. Let , , and be positive constants such that . Consider a mapping $G:X\rightrightarrows Y$ which is strongly subregular at for with a constant and a neighborhood , and a function which is calm at with a constant and a neighborhood . Then is strongly subregular at for with the constant and the neighborhood ; in particular
[TABLE]
Proof.
By assumption, we have
[TABLE]
Observe that . Take any and any (if there is no such we have and there is nothing to prove). Then there exists such that and from (7) we get
[TABLE]
Taking into account that and is an arbitrary point in , we obtain
[TABLE]
The proof is complete.
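The chain of estimates behind a perturbation result of this kind can be sketched as follows. The notation is assumed here for illustration: $\varrho$ is the metric, $d(\cdot,\cdot)$ the point-to-set distance, $G$ is strongly subregular at $\bar x$ for $\bar y$ with constant $\kappa$ on a neighborhood $U$, the function $g$ is defined on $U$ and calm at $\bar x$ with constant $\mu$ on $U$, and $\kappa\mu<1$.

```latex
% Sketch under the assumptions stated in the lead-in (symbols are illustrative).
% Fix x \in U and y \in g(x) + G(x), so that y - g(x) \in G(x).
\begin{align*}
\varrho(x,\bar x)
  &\le \kappa\, d\big(\bar y, G(x)\big)
   \le \kappa\, \varrho\big(\bar y,\, y - g(x)\big) \\
  &=   \kappa\, \varrho\big(\bar y + g(x),\, y\big)
       && \text{(shift invariance)} \\
  &\le \kappa\, \varrho\big(\bar y + g(x),\, \bar y + g(\bar x)\big)
     + \kappa\, \varrho\big(\bar y + g(\bar x),\, y\big)
       && \text{(triangle inequality)} \\
  &=   \kappa\, \varrho\big(g(x),\, g(\bar x)\big)
     + \kappa\, \varrho\big(\bar y + g(\bar x),\, y\big)
       && \text{(shift invariance)} \\
  &\le \kappa\mu\, \varrho(x,\bar x)
     + \kappa\, \varrho\big(\bar y + g(\bar x),\, y\big)
       && \text{(calmness of } g\text{)}.
\end{align*}
% Rearranging (using \kappa\mu < 1) and taking the infimum over y \in (g+G)(x):
\[
\varrho(x,\bar x) \;\le\; \frac{\kappa}{1-\kappa\mu}\,
  d\big(\bar y + g(\bar x),\, (g+G)(x)\big)
  \qquad \text{for all } x \in U .
\]
```

The resulting constant $\kappa/(1-\kappa\mu)$ blows up as $\kappa\mu\to 1$, which is why the calmness constant of the perturbation must stay below the reciprocal of the subregularity modulus.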
The above statement fails when the perturbation is represented by a (calm) set-valued mapping even for . Indeed, the mapping is strongly subregular at [math] for [math]. Let ; clearly has the isolated calmness property at [math] for [math]. However, as easily seen, the sum is not strongly subregular at [math] for [math].
The following corollary specifies the result in Theorem 2.1 for the case when the (single-valued) function is approximated by another function.
Corollary 2.2.
Suppose that $X$ is a metric space and $Y$ is a linear metric space with shift invariant metric. Consider $F:X\rightrightarrows Y$, a point and two functions and with . Suppose that is strongly subregular at for , the difference is calm at , and
[TABLE]
Then the mapping is strongly subregular at for and
[TABLE]
In particular, if , then the mapping is strongly subregular at for if and only if is strongly subregular at for , in which case
[TABLE]
Proof.
To show the first statement, fix any and such that . Clearly, there is such that the assumptions of Theorem 2.1 hold for and . Hence is strongly subregular at for with modulus not greater than . The second statement follows from the first one and the fact that and can be interchanged.
Remark 2.3.
When and are Banach spaces and is Fréchet differentiable at then the function satisfies the conditions in the second part of Corollary 2.2. Taking we arrive at Proposition 1.2 (iii). But we can consider the much larger class of semidifferentiable functions. Recall that a function is called semidifferentiable at , if there is a (unique) continuous and positively homogeneous function such that the function is the first-order approximation to at , that is, . Every piecewise smooth function is semidifferentiable at any interior point of its domain [15, Proposition 2D.8]. Also if is locally Lipschitz at , then is semidifferentiable at if and only if is directionally differentiable at [15, Proposition 2D.1].
Remark 2.4.
Let , with and being normed spaces, and be such that there is a positively homogeneous function which is continuous at [math] and for some positive (such a function is called the first-order -approximation of at in [34]). Taking and observing that is strongly subregular at if and only if so is at [math], we get [34, Theorem 4.1]: If is strongly subregular at [math] and , then is strongly subregular at with modulus not greater than .
We present next a theorem regarding perturbation stability of strong subregularity in an implicit function form. It is an infinite-dimensional version of [15, Theorem 3I.14] whose proof also works in this case with a few minor adjustments and therefore will not be reproduced here.
Theorem 2.5.
Let $P$, $X$ and $Y$ be Banach spaces and let $f:P\times X\to Y$ and $F:X\rightrightarrows Y$. Consider the generalized equation $f(p,x)+F(x)\ni 0$, its solution mapping $S:p\mapsto\{x \mid f(p,x)+F(x)\ni 0\}$, and a pair $(\bar p,\bar x)$ with $\bar x\in S(\bar p)$, and suppose that $f$ is continuously Fréchet differentiable on a neighborhood of $(\bar p,\bar x)$. If the mapping
[TABLE]
is strongly subregular at for [math], then has the isolated calmness property at for with
[TABLE]
Furthermore, when and are Hilbert spaces and is surjective, then the converse implication holds as well: the mapping is strongly subregular at for [math] provided that has the isolated calmness property at for .
Proof.
The proof of the first part of the theorem which gives the estimate (8) is identical with the proof of [15, Theorem 3I.13] with general Banach space norms replacing the Euclidean ones. Consider the mapping
[TABLE]
Let . Since and are Hilbert spaces, the mapping , where is the adjoint to , has a linear bounded inverse. Let . The further proof is identical to the proof of [15, Lemma 2C.1]. To finish, use the argument in the proof of [15, Proposition 3I.15] replacing the Euclidean norms by the norms of , and spaces, respectively.
Generalizations of the first part of the above statement for parametric generalized equations with a nonsmooth single-valued part can be found in [34, Section 5] (cf. Theorem 3.7 in the next section). Combining Corollary 2.2 and Theorem 1.1 we obtain the following result:
Theorem 2.6.
Let $X$ and $Y$ be Banach spaces. Consider a function $f:X\to Y$ which is Fréchet differentiable at a point $\bar x$ and a set-valued mapping $F:X\rightrightarrows Y$ with $\bar y\in f(\bar x)+F(\bar x)$. Then the mapping $f+F$ is strongly subregular at $\bar x$ for $\bar y$ if and only if the mapping $x\mapsto f(\bar x)+Df(\bar x)(x-\bar x)+F(x)$ has the same property. In the case when $X=\mathbb{R}^n$, $Y=\mathbb{R}^m$ and the graph of $F$ is the union of finitely many polyhedral convex sets, the mapping $x\mapsto f(\bar x)+Df(\bar x)(x-\bar x)+F(x)$, and hence $f+F$, is strongly subregular at $\bar x$ for $\bar y$ if and only if $\bar x$ is an isolated point of the set of all $x$ satisfying $\bar y\in f(\bar x)+Df(\bar x)(x-\bar x)+F(x)$.
Theorem 2.6 yields the statement (iii) in Proposition 1.2 but note that the latter imposes the additional condition that the range of $Df(\bar x)$ is closed. Indeed, $f$ is strongly subregular at $\bar x$ if and only if the linearization $x\mapsto f(\bar x)+Df(\bar x)(x-\bar x)$ has the same property. The problem is that an injective linear and bounded mapping is not necessarily strongly subregular. Let us take a closer look at that.
By linearity, a mapping $A\in\mathcal{L}(X,Y)$ is strongly subregular everywhere if and only if $A$ is strongly subregular at $0$ for $0$. From (3) we obtain that $A$ is strongly subregular at $0$ for $0$ if and only if
\[ \inf\{\,\|Ax\| \;:\; x\in X,\ \|x\|=1\,\}>0. \tag{9} \]
If the dimension of $X$ is finite, then (9) holds if and only if $\ker A=\{0\}$, that is, $A$ is injective. This is not true in general as Example 2.7 shows. However, if an operator $A\in\mathcal{L}(X,Y)$ has a closed range then the Banach open mapping theorem yields that there is a constant $\kappa>0$ such that for any $y\in\operatorname{rge}A$ there is $x\in X$ such that $Ax=y$ and $\|x\|\le\kappa\|y\|$. Then the injectivity of $A$ implies that such a point $x$ is unique and therefore
\[ \|x\|\le\kappa\,\|Ax\|\quad\text{for all } x\in X. \]
Consequently, any bounded linear operator which is injective and has a closed range is strongly subregular at $0$ for $0$, and hence strongly subregular everywhere.
Example 2.7.
Let , the space of (infinite) sequences in equipped with the norm , and , the space of (infinite) sequences in equipped with the norm . Define the operator by
[TABLE]
Then with . Indeed, letting , , we get and . On the other hand, for any and we have for any , which means that . The mapping is injective, but not strongly subregular at [math] for [math]. Indeed, suppose on the contrary that there are and such that
[TABLE]
Pick any such that and then set if and otherwise. Then and . Thus
[TABLE]
a contradiction. Given , let if and otherwise. Then is such that and . Hence , that is, (9) fails. The range of is not closed. Indeed, given , let if and otherwise; then . For each , if we set if and otherwise, then and . Then but .
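The same phenomenon can be observed numerically on a different, standard operator, used here purely as an assumed illustration and not claimed to be the operator of Example 2.7: the diagonal operator $Ax=(x_j/j)_{j\ge 1}$ on $\ell_2$. It is injective, yet the quotient $\|x\|/\|Ax\|$ equals $k$ on the $k$-th unit vector, so no finite subregularity constant exists. The sketch works with finite sections of the sequences.

```python
import math

def apply_diagonal(x):
    """Finite section of the diagonal operator (A x)_j = x_j / j, j = 1, 2, ..."""
    return [xj / (j + 1) for j, xj in enumerate(x)]

def l2_norm(x):
    return math.sqrt(sum(t * t for t in x))

def unit_vector(k, n):
    """k-th unit vector (1-based) in R^n."""
    return [1.0 if j == k - 1 else 0.0 for j in range(n)]

def quotient_on_unit_vector(k, n=64):
    """||e_k|| / ||A e_k||, which equals k up to rounding."""
    e = unit_vector(k, n)
    return l2_norm(e) / l2_norm(apply_diagonal(e))

for k in (1, 5, 10, 50):
    print(k, quotient_on_unit_vector(k))
```

The quotient grows linearly in $k$, so an estimate $\|x\|\le\kappa\|Ax\|$ cannot hold on the whole space for any fixed $\kappa$, although $Ax=0$ forces $x=0$.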
3 Set-valued derivative-type approximations
In this section we continue the analysis started in the preceding section of mappings of the form $f+F$, where now $f$ is a function which is calm at the reference point but not necessarily differentiable there, and $F$ is a set-valued mapping. We will now approximate the possibly nonsmooth function $f$ around the reference point by a set in $\mathcal{L}(X,Y)$. This approach goes back to [23] and the concept of a prederivative which is generated by a set of linear operators.
Theorem 3.1.
Let $X$ and $Y$ be Banach spaces and consider a function $f:X\to Y$, a mapping $F:X\rightrightarrows Y$ and a point such that . Suppose that there exist a subset of $\mathcal{L}(X,Y)$ and a constant such that: (i) there is a constant such that for every one can find satisfying
[TABLE]
(ii) for every the mapping
[TABLE]
is strongly subregular at for and
[TABLE]
where
[TABLE]
Then is strongly subregular at for ; moreover
[TABLE]
Proof.
Note that from (10) we have and also (12) yields that . Choose and such that
[TABLE]
Let be as in condition (i). We will show first that there exists such that
[TABLE]
By the definition of , there is a finite set such that
[TABLE]
Pick any . Then there exists such that
[TABLE]
Let . Since , Theorem 2.1 implies that
[TABLE]
Thus, for any there is such that for each the above inequality holds. Let . Taking into account (15), we obtain (14).
Choose any , then use (i) to find such that (10) is satisfied. Then (10) along with (14) gives us
[TABLE]
Since , we obtain
[TABLE]
Thus, is strongly subregular at for . Since and can be arbitrarily close to and [math], respectively, this yields (13).
Let be Lipschitz continuous around . Bouligand’s limiting Jacobian, denoted by , is defined as the set of all matrices obtained as limits of the usual Jacobians for sequences such that is differentiable at . The convex hull of is Clarke’s generalized Jacobian of at denoted by . If in Theorem 3.1 we choose , , and , then, as well known, see [15, Proposition 6F.3], for every there exists such that (10) is satisfied; that is, assumption (i) holds with an arbitrarily small . In that case we also have , and then Theorem 3.1 gives us the following:
Corollary 3.2.
Let , and $F:\mathbb{R}^{n}\rightrightarrows\mathbb{R}^{m}$ be such that . Suppose that is Lipschitz continuous around and for every the mapping defined in (11) is strongly subregular at for . Then is strongly subregular at for ; moreover,
[TABLE]
As an application of the above corollary, consider the inequality
[TABLE]
where is a Lipschitz continuous function around some . Inequalities in are understood componentwise. Then, by combining Corollary 3.2 with Theorem 1.1, we obtain
Corollary 3.3.
In the context of the inequality system (16), suppose that for every , the point is the only solution of the inequality
[TABLE]
Then the mapping is strongly subregular at for [math].
When is the zero mapping, from Corollary 3.2 we obtain an analogue of Clarke’s inverse function theorem, which seems to be new:
Theorem 3.4.
Consider a function $f:\mathbb{R}^{n}\to\mathbb{R}^{m}$ which is Lipschitz continuous around $\bar x$. If all matrices in Clarke’s generalized Jacobian of $f$ at $\bar x$ have rank $n$ (which is only possible if $n\le m$), then $f$ is strongly subregular at $\bar x$.
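The rank condition can be checked by hand on a small nonsmooth example. The sketch below uses two hypothetical functions, not taken from the paper. For $f(x)=(|x|,x)$ from $\mathbb{R}$ to $\mathbb{R}^2$, Clarke’s generalized Jacobian at $0$ is the set of columns $(t,1)$ with $t\in[-1,1]$, all of rank $1$. For $g(x)=|x|$ from $\mathbb{R}$ to $\mathbb{R}$, the generalized Jacobian at $0$ is $[-1,1]$, which contains $0$, although $g$ is still strongly subregular at $0$; the rank condition is thus sufficient but not necessary.

```python
import math

def f(x):
    # Lipschitz, nonsmooth at 0; generalized Jacobian at 0: {(t, 1) : t in [-1, 1]}.
    return (abs(x), x)

def g(x):
    # Generalized Jacobian at 0 is [-1, 1], which contains the rank-0 element 0.
    return abs(x)

def quotient_f(x):
    """|x - 0| / ||f(x) - f(0)|| in the Euclidean norm, for x != 0."""
    return abs(x) / math.hypot(*f(x))

def quotient_g(x):
    """|x - 0| / |g(x) - g(0)|, for x != 0."""
    return abs(x) / abs(g(x))

# Every column (t, 1) has a nonzero entry, hence rank 1 = n.
ts = [k / 10.0 for k in range(-10, 11)]
all_rank_one = all(math.hypot(t, 1.0) > 0.0 for t in ts)

xs = [k / 100.0 for k in range(-50, 51) if k != 0]
print("rank condition for f holds on the sample:", all_rank_one)
print("largest quotient for f:", max(quotient_f(x) for x in xs))
print("largest quotient for g:", max(quotient_g(x) for x in xs))
```

Both quotients stay bounded near $0$, in line with strong subregularity of $f$ and of $g$ at $0$.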
In a different direction, Theorem 3.1 may be extended in the following way:
Theorem 3.5.
Let $X$ and $Y$ be Banach spaces and consider a function , a set-valued mapping $F:X\rightrightarrows Y$ and a point such that . Suppose that there exist a mapping $\mathcal{H}:X\rightrightarrows\mathcal{L}(X,Y)$ and a constant such that (i) there is a constant along with a selection for such that
[TABLE]
(ii) the assumption (ii) in Theorem 3.1 holds with replaced by ; (iii) for any there exists such that whenever .
Then is strongly subregular at for with modulus satisfying (13) where is replaced by .
Proof.
Let and be as in Theorem 3.1 (ii) with replaced by . Then there exists satisfying
[TABLE]
By (iii), we may make smaller if necessary to have
[TABLE]
From the definition of measure of non-compactness, there is a finite set such that
[TABLE]
Hence, from (19), for any we get
[TABLE]
that is,
[TABLE]
This shows that the measure of non-compactness of the set is not greater than . Since for each the assumption (i) of Theorem 3.1 holds. By (19) we have . We will now prove that
[TABLE]
Choose any . Find such that . Note that, by (18), we have . Inasmuch as , Corollary 2.2 implies that . Since was arbitrarily chosen in we get (20).
Remembering (18), we have that ; that is, the assumptions in (ii) of Theorem 3.1 hold with replaced by . Then is strongly subregular at for with modulus not greater than . This finishes the proof of (13) with , because can be chosen arbitrarily close to [math], which means that and can be made arbitrarily close to and , respectively.
Recall that a function is said to be semismooth at when it is Lipschitz continuous around , directionally differentiable in every direction, and for every there exists such that
[TABLE]
If is semismooth at then for any there is such that inequality (17) is satisfied with being any selection of ; thus Theorem 3.5 is a subregularity version of a statement in [22]. It also yields a version of Corollary 3.2 for Bouligand’s limiting Jacobian which is known to be outer semicontinuous (at any point ) [20, Proposition 7.4.11], that is, condition (iii) in Theorem 3.5 holds.
Corollary 3.6.
Let , and $F:\mathbb{R}^{n}\rightrightarrows\mathbb{R}^{m}$ be such that . Suppose that is Lipschitz continuous around and that for every there exists along with a selection for such that
[TABLE]
Assume that, for each , the mapping defined in (11) is strongly subregular at for . Then is strongly subregular at for ; moreover,
[TABLE]
Finally, we consider a derivative-type approximation of the function by a positively homogeneous set-valued mapping.
Theorem 3.7.
Let and be Banach spaces and consider a function , a set-valued mapping $F:X\rightrightarrows Y$ and a point such that . Suppose that there exist a positively homogeneous mapping $G:X\rightrightarrows Y$ and a constant such that (i) there exists a constant such that
[TABLE]
(ii) the mapping is strongly subregular at for with .
Then is strongly subregular at for ; moreover,
[TABLE]
Proof.
Let be such that . Shrink , if necessary, to have
[TABLE]
Choose any and then an arbitrary . By (22) we find such that . Then and we have
[TABLE]
Therefore for any . Thus, we have
[TABLE]
Noting that was arbitrarily chosen in and can be chosen arbitrarily close to , the proof is complete.
Taking the above proof gives a direct proof of [34, Theorem 4.2]. We show next that Theorem 3.7 implies Theorem 3.1.
Remark 3.8.
Let , , , , , and be as in Theorem 3.1. Define $G:X\rightrightarrows Y$ by $G(u):=\{Au\mid A\in\mathcal{A}\}$, . Then the condition (i) in Theorem 3.1 implies (i) in Theorem 3.7. The mapping from Theorem 3.7 (ii) has . Indeed, in the proof of (14) we showed that for any and any sufficiently close to and [math], respectively, there exists such that
[TABLE]
Fix any , and then pick arbitrary (if any). The very definition of the mapping implies that there is such that . Then
[TABLE]
Taking into account that is a fixed element of , and the constants and can be arbitrarily close to and [math], respectively, we obtain the desired estimate for the subregularity modulus of . Inequality (12) implies that . Therefore condition (ii) in Theorem 3.7 holds. Hence is strongly subregular at for and
[TABLE]
A result analogous to Corollary 3.2 for strong regularity was stated in [25]; a complete proof extended to Banach spaces is given in [5]. In a more recent paper [7] a nonsmooth version of the Lyusternik-Graves theorem for metric regularity is obtained. We note that the proofs in [5] and [7] are much more involved than the proofs of Theorems 3.1 and 3.5 and use other conditions, for example, convexity of the set of derivative approximations.
4 Strong -subregularity
We consider in this section an extension of the strong metric subregularity, the so-called strong metric -subregularity, defined as follows. For a positive scalar , a mapping $F:X\rightrightarrows Y$ acting between metric spaces and is said to be strongly -subregular at for when and there exist a constant and a neighborhood of such that
[TABLE]
The (usual) strong subregularity is obtained for .
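The role of the exponent can be seen in a small numerical sketch. It rests on an assumption: that the defining inequality, whose display is missing above, has the usual Hölder form d(x, x̄) <= kappa d(ȳ, F(x))^q. The test function f(x) = x² and the helper `sampled_q_constant` are hypothetical and are not claimed to be the paper's own example. Under that reading, f satisfies the inequality at 0 with exponent 1/2, whereas with exponent 1 the sampled constant blows up as the ball shrinks.

```python
def sampled_q_constant(func, x_bar, q, radius, samples=2001):
    """Largest sampled ratio |x - x_bar| / |func(x) - func(x_bar)|**q on the ball."""
    worst = 0.0
    for i in range(samples):
        x = x_bar - radius + 2.0 * radius * i / (samples - 1)
        dist = abs(x - x_bar)
        if dist == 0.0:
            continue  # skip the reference point itself
        gap = abs(func(x) - func(x_bar))
        if gap == 0.0:
            return float("inf")
        worst = max(worst, dist / gap ** q)
    return worst


def square(x):
    """Test function with a vanishing derivative at 0."""
    return x * x


# For each radius: (constant with exponent 1/2, constant with exponent 1).
results = [
    (sampled_q_constant(square, 0.0, 0.5, r), sampled_q_constant(square, 0.0, 1.0, r))
    for r in (0.5, 0.05, 0.005)
]
print(results)
```

The first entry of each pair stays near 1, while the second grows as the radius decreases, which is the numerical signature of a property that holds for a fractional exponent but fails for exponent 1.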
Observe that for this property is not stable under linearization, in the sense of Proposition 1.2. As a counterexample take with . However, if we consider perturbations by a function which is calm of order , then a simple modification of the proof of Theorem 2.1 gives us perturbation stability. Given , a function is said to be -calm at with the constant provided that there is a neighborhood of such that
[TABLE]
The precise result is as follows:
Theorem 4.1.
Let be a metric space and be a linear metric space with shift-invariant metric. Let , , and , and let and be positive constants such that . Suppose that a mapping $G:X\rightrightarrows Y$ is strongly -subregular at for with constant and neighborhood . Also, consider a function which is -calm at with constant and neighborhood . Then is strongly -subregular at for with constant and neighborhood .
Proof.
The proof repeats that of Theorem 2.1 with some adjustments of the exponents. By assumption, we have
[TABLE]
Observe that . Take any . If is empty we are done. If then
[TABLE]
Since and we have . Taking into account that , we obtain
[TABLE]
and the proof is complete.
As in the standard case with , when and are Banach spaces and the perturbation is represented by a Fréchet differentiable function, we can say more about perturbation stability.
Theorem 4.2.
Let and be Banach spaces and let and . Consider a function which is Fréchet differentiable at and a set-valued mapping $F:X\rightrightarrows Y$ such that . Then the mapping is strongly -subregular at for if and only if the mapping has the same property.
Assume, in addition, that is Fréchet differentiable around and is continuous at . Then is strongly -subregular at for if and only if there are and such that for any the mapping is strongly -subregular at for with constant and neighborhood .
Proof.
The Fréchet differentiability of means that the function has . Let be such that is strongly -subregular at for with constant . Clearly, there are and such that and satisfy the assumptions of Theorem 4.1 with ; hence is strongly -subregular at for . To prove the opposite implication, use and as and , respectively.
Now suppose that is continuously differentiable at . Let and be such that the mapping is strongly -subregular at for with constant and neighborhood . Let be such that . Using standard calculus and making smaller, if necessary, we have that
[TABLE]
Fix any . Then is calm at with a constant and a neighborhood ; moreover . Applying Theorem 4.1 with , we get that is strongly -subregular at for with a constant , which is independent of . The opposite direction follows from the first part of the statement.
We end this section with some comments regarding the recent paper [28]. Taking and in Theorem 4.1, one obtains [28, Theorem 4.1] where the authors use the stronger assumption that the single-valued perturbation is Lipschitz continuous around . The first part of Theorem 4.2 slightly improves [28, Corollary 4.2] where strict differentiability of the single-valued part is assumed, while the second echoes [28, Theorem 4.4].
5 Conditions involving generalized derivatives
In this section and are Banach spaces and and are their duals, respectively. It follows directly from the definition that a mapping $F:X\rightrightarrows Y$ with is strongly subregular at for if and only if its steepest displacement rate at for defined as
[TABLE]
is positive (with the convention that the limit in (24) is when is an isolated point in ). This notion was introduced by A. Uderzo in [34]. It is elementary to check (see [34, Proposition 2.1]) that
[TABLE]
where we set . Thus, if is strongly subregular at for with a constant then we have . Conversely, if for some then is strongly subregular at for with the constant .
When is not an isolated point in , then . Otherwise, the steepest displacement rate (24) coincides with the subregularity constant
[TABLE]
extensively used in [26] when characterizing metric subregularity.
First, we focus on conditions based on tangential approximation of the graph of the mapping in question. Let be a set in and let . The Bouligand-Severi tangent cone to at , denoted by , is the set of all such that there are sequences in and in converging to and [math], respectively, such that for each . For a mapping $F:X\rightrightarrows Y$ with , the graphical derivative mapping of at is defined as
[TABLE]
The following is a generalization of [15, Theorem 4E.1] which goes back to Rockafellar [33]:
Theorem 5.1.
Consider a mapping $F:X\rightrightarrows Y$ with . Then
[TABLE]
If, in addition, the dimension of is finite, then
[TABLE]
that is, is strongly subregular at for if and only if is finite.
Moreover, if both and are finite-dimensional, then (26) holds as equality.
Proof.
For the first part of the claim, note that if the right-hand side of (26) is infinite then we are done. If not, pick and then such that
[TABLE]
Fix an arbitrary . Then there exist sequences in and in , as well as in , converging to , , and [math], respectively, such that for each . For sufficiently large we have and hence
[TABLE]
Consequently, for each . Thus . Letting we get (26).
Now, let be finite-dimensional. By [15, Proposition 5A.7] we know that is finite if and only if . In view of (26), it is sufficient to prove the part in the first equivalence. Let be finite. Suppose on the contrary that is not strongly subregular at for . Then there is a sequence in converging to such that
[TABLE]
Let , , and , . By the above inequality, and as . Since is finite-dimensional, we can assume that converges to some with . Noting that
[TABLE]
we get that for , that is, , a contradiction.
Let be finite-dimensional as well. Suppose that (26) is strict; then there is a (positive) constant such that . Find a sequence in converging to such that
[TABLE]
Let , , and be defined as in the previous paragraph. For each , we have , , and . Also as . Since both and are finite-dimensional, we can assume that converges to some with and that converges to some . By (27) we conclude that . Dividing (28) by and taking the limit as we get . Hence , a contradiction.
We will now consider dual space conditions for strong subregularity. Unless clearly indicated otherwise, we equip with the product (box) topology. Given a set and a point , the Fréchet normal cone to at , denoted by , is the set of all such that for every there exists such that
[TABLE]
For a mapping $F:X\rightrightarrows Y$ with , the Fréchet coderivative of at acts from to the subsets of and is defined as
[TABLE]
We give next coderivative conditions for strong subregularity:
Theorem 5.2.
Consider a mapping $F:X\rightrightarrows Y$ with . If is finite-dimensional, then
[TABLE]
If, in addition, is locally convex at , meaning that is convex for some neighborhood of in , then (29) becomes an equality.
Proof.
If either the right-hand side of (29) is infinite or is an isolated point of (implying that the left-hand side of (29) is zero) then we are done. Suppose that this is not the case, and fix any .
First, we show that
[TABLE]
To obtain (30), it is sufficient to show that, given with , for each there is a constant such that
[TABLE]
Assume on the contrary that there are with and along with a sequence converging to such that
[TABLE]
For each , choose a point such that
[TABLE]
this means in particular that
[TABLE]
The choice of implies that there is with . Hence, we have . Let
[TABLE]
Observe that (33) implies that converges to and
[TABLE]
For each , using (32), we obtain
[TABLE]
Thus , a contradiction. We proved that (31) holds, and consequently so does (30).
Second, we show that (30) implies that . Indeed, let be any sequence in converging to such that
[TABLE]
Let , . By the Hahn-Banach theorem, for each , there is with such that . Passing to subsequences, if necessary, we may assume that converges to some with and that converges to some with . Then
[TABLE]
Let . Then (30) implies that
[TABLE]
By (25), we have . Letting , we get (29).
Suppose now that is locally convex at . We will show the inequality opposite to (29). Fix an arbitrary (if any). Then there is such that is convex and
[TABLE]
Clearly, in this case , where is the usual normal cone to at in the sense of convex analysis. For any from the dual ball of , we have
[TABLE]
that is, is a subgradient at of the sum of two convex functions on : the continuous function and the indicator function of the set , which is convex but not necessarily closed. Applying the convex sum rule [29, Theorem 3.39], we get
[TABLE]
Hence for any with there is with . Thus . Letting we get the desired inequality.
Note that inequality (29) in Theorem 5.2 can often be strict. For instance, if the normal cone is trivial, then . Take, for example, $F:\mathbb{R}\rightrightarrows\mathbb{R}$ defined by , . Then while . This particular example was also mentioned in the introduction to illustrate the differences among the regularity properties for set-valued mappings.
Suppose that is finite-dimensional. Combining Theorem 5.2 and Theorem 5.1, we get that for any $F:X\rightrightarrows Y$ with
[TABLE]
For any two positively homogeneous mappings , $H_{2}:Y\rightrightarrows X$ such that we have . Hence one could expect that, by taking a coderivative of at based on a normal cone bigger than the Fréchet one, we can make its inner norm equal to and, therefore, to . In finite dimensions, a candidate for that to happen could be the limiting coderivative $D^{*}F(\bar{x}\,|\,\bar{y}):\mathbb{R}^{m}\rightrightarrows\mathbb{R}^{n}$ with values
[TABLE]
where the limiting normal cone to at is the collection of vectors such that there are sequences in and in converging to and , respectively, such that for each . However, the limiting coderivative cannot, in general, provide a criterion for strong subregularity. As a counterexample, let $F:\mathbb{R}\rightrightarrows\mathbb{R}$ be defined by . Then is not strongly subregular at [math] for [math] and , but which means that is finite.
Given , we consider an equivalent norm in the product space defined by
[TABLE]
Now we present a necessary and sufficient condition for strong subregularity similar to the statement by Fabian and Preiss [19] guaranteeing that a set-valued mapping is open with a linear rate at a reference point. Note that this statement was proved independently by Ioffe [24] who showed that it implies openness with a linear rate around the reference point.
Theorem 5.3.
Consider a mapping $F:X\rightrightarrows Y$ the graph of which is locally closed at . Then equals the supremum of for which there exists such that for any with and , one can find a point satisfying
[TABLE]
Proof.
Denote by the supremum from the statement and let .
First, we show that . If , the inequality holds trivially. If not then fix any . Find such that
[TABLE]
Fix an arbitrary with and . Then is distinct from and (35) implies that
[TABLE]
Hence . As , we have . Noting that , we arrive at (34). Thus . The claimed inequality follows after letting .
To show that , assume on the contrary that . Choose such that the set is closed in . Fix any and then pick . Let be arbitrary, and set
[TABLE]
As , there is different from and such that
[TABLE]
Consider a function on a complete metric space . Applying to this function the Ekeland variational principle [4, Theorem 7.1.2] with
[TABLE]
we find a point such that
[TABLE]
Using (36), (37), (38) and (39) we have
[TABLE]
Thus we have and , and, as , also that
[TABLE]
Since (38) means that , from (40) we get
[TABLE]
If , then, by (41),
[TABLE]
which in combination with (39), (37), and (36) implies that
[TABLE]
Summarizing, we have shown that for every and every there exists with and such that no point can satisfy (34). Hence cannot be strictly greater than , a contradiction.
We immediately get a statement characterizing strong subregularity via local and nonlocal slopes/rates of descent.
Corollary 5.4.
Consider a mapping $F:X\rightrightarrows Y$ the graph of which is locally closed at . Then is strongly subregular at for if and only if
[TABLE]
Moreover, the limit in (42) coincides with .
The limit (42) is taken in the product space and involves all points near excluding those with (external points). At every such point a kind of (nonlocal) descent rate is computed for the distance from to and can be underestimated by the corresponding easier to compute infinitesimal quantities:
[TABLE]
By analogy with the strong slope by De Giorgi, Marino, and Tosques [8], the quantity on the right-hand side of (43) can be interpreted as a kind of slope of at (cf. [26]). It is easy to check that, when is convex, (43) holds as equality.
6 The Newton method
We study the Newton method for solving the generalized equation
[TABLE]
where both and are Banach spaces, is a function, and $F:X\rightrightarrows Y$ is a set-valued mapping. Provided that is Fréchet differentiable, the Newton iteration applied to (44) has the form
[TABLE]
In [15, Chapter 6] several results are presented regarding the method (45) under (strong) metric (sub)regularity. In the following subsections we extend some of these results and add new ones.
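To make the iteration concrete, here is a minimal one-dimensional sketch. It assumes that (45) has the standard form in which f is replaced by its linearization at the current iterate and the set-valued part is kept unchanged, and it takes F to be the normal cone mapping of the half-line [0, ∞), so that the generalized equation is the complementarity problem x >= 0, f(x) >= 0, x f(x) = 0. The function name `newton_complementarity_1d` and both test problems are hypothetical. With a positive slope a and intercept c, the linearized problem x >= 0, c + a x >= 0, x (c + a x) = 0 has the unique solution max(0, -c/a).

```python
def newton_complementarity_1d(f, df, x0, steps=8):
    """Newton iterates for  f(x) + N(x) ∋ 0,  N = normal cone to [0, inf)."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        slope = df(x)
        if slope <= 0.0:
            raise ValueError("this sketch assumes a positive slope at every iterate")
        intercept = f(x) - slope * x  # linearization is intercept + slope * x
        xs.append(max(0.0, -intercept / slope))
    return xs


# Interior solution: f(x) = x^2 + x - 2 vanishes at x = 1 > 0.
interior = newton_complementarity_1d(
    lambda x: x * x + x - 2.0, lambda x: 2.0 * x + 1.0, 3.0
)

# Boundary solution: f(x) = x^2 + x + 1 > 0, so the solution is x = 0.
boundary = newton_complementarity_1d(
    lambda x: x * x + x + 1.0, lambda x: 2.0 * x + 1.0, 3.0
)

print(interior[-1], boundary[-1])
```

In the first problem the normal cone is inactive near the solution and the iteration behaves like the classical Newton method. In the second the iterates land on the boundary point after a few steps and stay there.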
6.1 Convergence
The following theorem reveals the mode of convergence of the iteration (45) under strong subregularity of the mapping in (44). It improves [15, Theorem 6E.2].
Theorem 6.1.
Suppose that the function is Fréchet differentiable around a solution of (44) and the derivative mapping is continuous at . Also suppose that the mapping is strongly subregular at for [math]. Then there exists a neighborhood of such that if a sequence is generated by the Newton method (45) and has a tail with for all , then is superlinearly convergent to .
Proof.
The continuous differentiability of implies that for each there is such that
[TABLE]
By the strong subregularity of , there are positive constants and such that
[TABLE]
Let be such that (46) holds with and set . Let be any sequence generated by the Newton method (45) such that there is such that for all . For any we have and thus
[TABLE]
Therefore for each . Hence as . To see the rate of convergence, let be arbitrary. Find such that and (46) holds with and . Then there is such that whenever . As above, for such an index , we get
[TABLE]
Therefore for any we have . Hence superlinearly.
Clearly, the theorem above can be equivalently stated with the assumption that the entire sequence belongs to ; the statement we choose adds some information which can be meaningful numerically.
Our next theorem extends the result just presented to the case of strong -subregularity.
Theorem 6.2.
Assume that is Hölder continuous around with an exponent and that is strongly -subregular at for with . Then there exists a neighborhood of such that if a sequence is generated by the Newton method (45) and has a tail with for all , then is convergent to with convergence rate .
Proof.
The assumptions of Theorem 6.1 are satisfied, hence, for a neighborhood of , if has a tail in , then as . Using standard calculus, we find and such that
[TABLE]
In view of Theorem 4.2, adjust , if necessary, and choose a constant such that
[TABLE]
Let be any infinite set for which for all . Fix . Using the inclusion
[TABLE]
we obtain
[TABLE]
This gives us the desired convergence rate.
6.2 Inexact quasi-Newton method
In this subsection we consider an inexact version of the Newton method (45) for solving (44) of the form
[TABLE]
where is a sequence in which represents an approximation of the derivative of provided by, for example, the Broyden update, BFGS, and the like. The sequence of functions represents inexactness. The following theorem extends Theorem 6.1 to the iteration (48) and can be regarded as a version of the Dennis-Moré theorem for generalized equations; for related results see [10]:
Theorem 6.3.
Suppose that the function is Fréchet differentiable at a solution of (44) and the mapping is strongly subregular at for [math]. Then there exists a neighborhood of such that if a sequence is generated by the method (48), has a tail in and also
[TABLE]
then is superlinearly convergent to .
Proof.
By the definition of the Fréchet differentiability of at , for each there is such that
[TABLE]
Corollary 2.2 implies that is strongly subregular at for [math] if and only if so is , hence there are positive constants and such that
[TABLE]
Let be such that (50) holds with and set . Let be any sequence generated by (48) for which there is such that for all and (49) holds. Make bigger, if necessary, to have
[TABLE]
For any we have
[TABLE]
and thus the combination of (50) and (51) implies that
[TABLE]
Therefore for each . Hence as . To estimate the rate of convergence, let be arbitrary. Find such that and (50) holds with and . Then there is such that and
[TABLE]
As in preceding lines, for such an index we get
[TABLE]
Therefore for any we have . Hence superlinearly.
In the same way, by mimicking Theorem 6.2 one can obtain a statement analogous to Theorem 6.3 for a strongly -subregular mapping, extending a result in [28].
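A classical scalar instance of a quasi-Newton scheme is the secant method, sketched below under explicit assumptions: the set-valued part is zero, there is no inexactness term, and the iteration reduces to f(x_k) + A_k (x_{k+1} - x_k) = 0 with A_k the secant slope. The helper `secant_iterates` and the test function f(x) = x³ + x - 2, whose root is 1 with f'(1) = 4, are hypothetical. In one dimension a Dennis-Moré-type quantity reduces to |A_k - f'(x̄)|, which the sketch tracks through the list of slopes.

```python
def secant_iterates(f, x_prev, x_curr, steps=8):
    """Quasi-Newton iterates with the secant slope as derivative approximation."""
    xs = [x_prev, x_curr]
    slopes = []
    for _ in range(steps):
        denom = xs[-1] - xs[-2]
        if denom == 0.0:
            break  # iterates coincide, nothing left to update
        a_k = (f(xs[-1]) - f(xs[-2])) / denom  # secant approximation of f'
        if a_k == 0.0:
            break  # the linear model has no unique zero
        slopes.append(a_k)
        xs.append(xs[-1] - f(xs[-1]) / a_k)
    return xs, slopes


def cubic(x):
    """Test function with root 1 and derivative 4 at the root."""
    return x ** 3 + x - 2.0


xs, slopes = secant_iterates(cubic, 2.0, 1.5)
errors = [abs(x - 1.0) for x in xs]
slope_gaps = [abs(a - 4.0) for a in slopes]
print(errors)
print(slope_gaps)
```

The slope gaps shrink toward zero along the run, and the errors decay faster than any fixed linear rate, which is the qualitative behavior a Dennis-Moré-type condition is designed to capture.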
6.3 Semismooth Newton method
We continue our study of the Newton method for solving the generalized equation (44) where is Lipschitz continuous but not necessarily differentiable around a reference solution . To deal with a Newton-type iteration we use the “linearization” of at of the form given by the mapping (11) where the matrix is an arbitrarily chosen element of Clarke’s generalized Jacobian. We consider the following version of Newton’s iteration: given choose and then find which satisfies
[TABLE]
When the function in (44) is semismooth (see the paragraph before Corollary 3.6 for the definition), this method is usually referred to as the semismooth Newton method. Note that in the theorem below we assume that possesses the semismoothness property but do not use the directional differentiability of which appears in its definition.
Theorem 6.4.
Consider the method (52) applied to (44) with a solution for a function which is semismooth at and assume that for each the mapping defined in (11) is strongly subregular at for [math]. Then there exists a neighborhood of such that if a sequence is generated by (52) and has a tail with for all , then is superlinearly convergent to .
Proof.
First we show that there are positive constants and such that
[TABLE]
Since the set is compact, there exists a constant (cf. the proof of (14)). Fix any . The mapping is outer semicontinuous at , hence there exists such that
[TABLE]
Compactness of the set implies that there is a finite set such that . Hence
[TABLE]
Given there exists such that the mapping is strongly subregular at for [math] with the constant and neighborhood . Let and . Fix any and as in (53). As , using inclusion (54) we find with . Therefore
[TABLE]
Since we get (53).
The semismoothness of implies that for each there is such that
[TABLE]
Let be such that (55) holds with and set . Let be any sequence generated by (52) such that for all . Fix any . As and , using (53) and (55), we get
[TABLE]
Hence as . To establish the rate of convergence, let be arbitrary. Find such that and (55) holds with and . Then there is such that whenever . As above, for such an index , we get
[TABLE]
Hence superlinearly.
Remark 6.5.
In view of Corollary 3.2, the assumptions of the above theorem imply that the mapping is strongly subregular at for [math].
If one considers (48) instead of (52), by using the above arguments one can obtain a slight generalization of [6, Theorem 3.2 (ii)].
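The semismooth iteration can be sketched in one dimension as follows, assuming the set-valued part is zero so that each step solves f(x_k) + A_k (x_{k+1} - x_k) = 0 with A_k an element of Clarke's generalized Jacobian at x_k. The piecewise smooth test function f(x) = 2x + |x| + x², which is semismooth with solution 0, and the helper names are hypothetical. At the kink the generalized Jacobian is the interval [1, 3], and the sketch arbitrarily picks the midpoint.

```python
def f(x):
    """Piecewise smooth (hence semismooth) test function with f(0) = 0."""
    return 2.0 * x + abs(x) + x * x


def clarke_element(x):
    """One element of Clarke's generalized Jacobian of f at x."""
    if x > 0.0:
        return 3.0 + 2.0 * x
    if x < 0.0:
        return 1.0 + 2.0 * x
    return 2.0  # arbitrary choice from the interval [1, 3] at the kink


def semismooth_newton(x0, steps=8):
    """Iterates of x_{k+1} = x_k - f(x_k) / A_k with A_k = clarke_element(x_k)."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        a_k = clarke_element(x)
        if a_k == 0.0:
            raise ValueError("selected Jacobian element is singular")
        xs.append(x - f(x) / a_k)
    return xs


xs = semismooth_newton(-0.3)
print([abs(x) for x in xs])
```

Starting on the branch x < 0, the first step crosses the kink and later iterates contract rapidly on the other branch, so the run uses Jacobian elements from both sides of the nonsmooth point.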
6.4 Strong subregularity of Newton sequences
Denote by the space of (infinite) sequences in with elements , , , , equipped with the norm Consider the mapping
[TABLE]
that is, is the set of all sequences generated by the (perturbed) Newton method starting from the point . Note that if , then the constant sequence . In particular, if is a solution of (44), then .
Theorem 6.6.
Suppose that is Fréchet differentiable around and is continuous at . The mapping is strongly subregular at for [math] if and only if there is such that for any there is with the property that for each and each we have
[TABLE]
In this case, the infimum of such constants is equal to .
Proof.
Denote by the infimum of such that for any there is such that inequality (56) holds for each and each .
First, assume that and fix any . Pick any . Then there is such that for each and each we have
[TABLE]
Let be arbitrary. Pick arbitrary (if any). Then the constant sequence , hence it satisfies (57), that is
[TABLE]
which yields
[TABLE]
As was arbitrary, we conclude that is strongly subregular at for [math] with the constant and neighborhood . Letting we get that , and consequently .
Assume that is strongly subregular at for [math]. Fix any and any . Without loss of generality assume that is small enough to have that . Find such that
[TABLE]
Let . Continuous differentiability of implies that we can make smaller, if necessary, so that
[TABLE]
Fix any sequence . Pick arbitrary (if any). Note that
[TABLE]
Fix any index , then (58), (60), and (59) imply that
[TABLE]
Noting that and , we get
[TABLE]
We claim that
[TABLE]
Indeed, as , (61) with is (62) for . We proceed by induction, assume that (62) holds for some . This and (61) with imply that
[TABLE]
which is (62) for . Inequality (62) is proved. Noting that we have
[TABLE]
As was arbitrary, the mapping is strongly subregular at for and (56) holds. Clearly, , hence .
7 Applications to optimization
7.1 Nonlinear programming
In this subsection we study strong subregularity of a mapping which plays a major role in the nonlinear programming problem
[TABLE]
subject to equality and inequality constraints:
[TABLE]
where the functions , are twice continuously differentiable everywhere. Under a constraint qualification condition which will be specified a bit later, the first-order necessary optimality condition is represented by the Karush-Kuhn-Tucker (KKT) system
[TABLE]
where
[TABLE]
is the Lagrangian associated with the problem (63); here is the vector of Lagrange multipliers. We study the strong subregularity of the following mapping associated with the KKT system (65):
[TABLE]
Let be a reference solution of (65). Define the index sets
[TABLE]
In further lines we utilize the following condition:
[TABLE]
This condition implies the well-known Mangasarian-Fromovitz Constraint Qualification (MFCQ) condition, in which the set is replaced by . As is well known, MFCQ implies that the set of Lagrange multipliers for problem (63) satisfying (65) is nonempty, convex and compact. The condition (67) was introduced in [27] under the name Strict Mangasarian-Fromovitz Constraint Qualification. This name, however, does not reflect the nature of the condition since the latter is a condition on the optimality system while MFCQ is a condition on the constraint mapping; actually, MFCQ is equivalent to the metric regularity of that mapping. Condition (67) implies that the set of Lagrange multipliers consists of a single point; we will prove this claim in the proof of the next theorem.
Denote and ; that is, is the matrix whose rows are the vectors Define the so-called critical cone
[TABLE]
Recall that the second-order necessary condition for local optimality has the form
[TABLE]
while the second-order sufficient condition is
[TABLE]
Now we are ready to state the main result of this subsection.
Theorem 7.1.
The following are equivalent: (i) The conditions (67) and (69) are both satisfied; (ii) The KKT mapping defined in (66) is strongly subregular at for [math] and is a strong local minimizer of (63), meaning that there is a neighborhood of and a constant such that
[TABLE]
where $C:=\{x\in\mathbb{R}^{n}\mid\text{(64) is satisfied}\}$.
Proof.
Linearizing the functions appearing in the mapping (66) at we obtain the mapping
[TABLE]
where we take into account that and , and use the notation
[TABLE]
in which is a vector with components . We can now apply Theorem 2.6 according to which the mapping in (66) is strongly subregular at for [math] if and only if the mapping defined in (70) has the same property. The graph of the mapping is the union of polyhedral convex sets hence the strong subregularity of is equivalent to the property that the vector is an isolated point in .
Without loss of generality suppose that and . Denote by and the submatrices of corresponding to the index sets and , respectively; that is, the rows of are the vectors , and analogously for .
Let (i) hold. We will now show that is the unique solution of the variational inequality
[TABLE]
where is the subvector of whose components have indices in and is the set of vectors with nonnegative components. Suppose that the mapping is not strongly subregular at for [math]. Then there is a nonzero vector satisfying (71)–(73). Assume that . Multiplying (71) by and taking into account (72) and (73) we obtain which contradicts (69). Hence . But then there exists a nonzero such that and , hence . This contradicts (67). Thus the mapping in (65) is strongly subregular at for [math]. It is a standard fact that when satisfies (65) and the second-order sufficient condition (69), then is a strong local solution of problem (63). Hence, (ii) is established.
In the opposite direction, suppose that the conditions in (ii) are satisfied. Then from the analysis at the beginning of the proof we conclude that the vector is an isolated point in . This in turn implies that is the unique solution of the variational inequality (71)–(73). But this immediately implies (67). Furthermore, from the assumed optimality of , the second-order necessary condition (68) holds:
[TABLE]
We only need to show that this inequality is strict. On the contrary, suppose that there exists a nonzero such that . Then the nonzero vector is a solution of (71)–(73), a contradiction. Hence the conditions in (i) are satisfied.
Theorem 7.1 partially extends [13, Theorem 2.6] with a new proof; in the latter theorem it is also shown that under the conditions in (i) there exist neighborhoods of and of [math] such that for every the set is nonempty.
7.2 A radius theorem
A classical result, sometimes called the Eckart-Young theorem, says that for any nonsingular matrix ,
[TABLE]
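As a quick numerical illustration of the Eckart-Young identity, the classical statement can be read as: the smallest norm of a perturbation that makes a nonsingular matrix singular equals the reciprocal of the norm of its inverse. The sketch below assumes the spectral norm and uses NumPy's SVD; the helper name `distance_to_singularity` and the test matrix are hypothetical choices for illustration, not taken from the paper.

```python
import numpy as np

def distance_to_singularity(A):
    """Return (sigma_min, B), where B is a rank-one matrix with
    ||B||_2 = sigma_min such that A + B is singular (spectral norm assumed)."""
    U, s, Vt = np.linalg.svd(A)
    sigma_min = s[-1]  # singular values are returned in descending order
    # Cancel the smallest singular triple so that A + B loses rank.
    B = -sigma_min * np.outer(U[:, -1], Vt[-1, :])
    return sigma_min, B

# Hypothetical nonsingular test matrix.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
sigma_min, B = distance_to_singularity(A)
inv_norm = np.linalg.norm(np.linalg.inv(A), 2)
```

Here `sigma_min` should agree with `1 / inv_norm` up to rounding, and the perturbation that attains the distance has rank one, which matches the remark in Theorem 7.2 that the infimum is unchanged when restricted to linear mappings of rank 1.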
A far-reaching generalization of this result was proved in [16], see also [15, Theorem 6A.7], for the property of metric regularity of a set-valued mapping acting between Euclidean spaces. This result was later extended in [14, Theorem 5.12], see also [15, Theorem 6A.9], to the property of strong subregularity as follows:
Theorem 7.2.
Consider a mapping $F:\mathbb{R}^{n}\rightrightarrows\mathbb{R}^{m}$ which is strongly subregular at for . Then
[TABLE]
Moreover, the infimum remains unchanged when either taken with respect to linear mappings of rank 1 or enlarged to all functions that are calm at , with replaced by the calmness modulus of at .
Note that in Theorem 7.2 the perturbation is represented by an arbitrary linear and bounded mapping . In a number of cases, however, one should focus on mappings that have special structure. Such a situation arises in particular when one attempts to determine the “radius of good behavior” of an optimization problem. To be specific, consider the problem
[TABLE]
where is a nonempty polyhedral convex subset of and is twice continuously differentiable everywhere. The first-order necessary optimality condition for problem (74) has the form
[TABLE]
In the sequel the mapping is called the optimality mapping. Every solution of the variational inequality (75) is said to be a critical point. The critical cone at for is defined as
[TABLE]
The second-order sufficient optimality condition for problem (74) has the form
[TABLE]
The following theorem is proved in [15, Theorem 4G.4]:
Theorem 7.3.
*Let be a critical point for (74). Then the following are equivalent: (a) the second-order sufficient condition (76) holds at ; (b) the point is a local minimizer for problem (74) and the optimality mapping is strongly subregular at for [math].
In either case, is actually a strong local minimizer.*
We now apply this last result to obtain a radius theorem for problem (74). Let be a local minimizer for (74). Along with (74) we consider the perturbed problem
[TABLE]
where is a symmetric matrix which enters the quadratic form representing the perturbation.
Theorem 7.4.
Let be a local minimizer for (74), let and be the associated critical cone, and let the second-order sufficient condition (76) hold at . Then
[TABLE]
Proof.
From Theorem 7.3 the quantity on the left side of (78) is the same as the quantity
[TABLE]
Since the strong subregularity is stable under linearization, the optimality mapping for (77) is not strongly subregular at for [math] exactly when the mapping is not strongly subregular at for [math]. Then the quantity in (79) is the same as
[TABLE]
Since the critical cone remains the same for the perturbed problem (77), by Theorem 7.3 the latter quantity equals
[TABLE]
By assumption, is symmetric positive definite on the cone , thus we have
[TABLE]
Let this minimum be attained for some . The matrix
[TABLE]
is symmetric (and negative definite). We have
[TABLE]
hence is not positive definite on . Moreover,
[TABLE]
Thus
[TABLE]
To prove the opposite inequality, observe that for any matrix and any , , we have
[TABLE]
Then
[TABLE]
provided that
[TABLE]
Thus, for any symmetric such that , we have that is positive definite. Hence, Putting this together with (81) we obtain . This proves that the quantity in (80) equals the right side of (79).
Note that when then the right side of (79) equals the smallest eigenvalue of , which, as is well known, is equal to the reciprocal of , and we arrive at the finite-dimensional version of the extension of the Eckart-Young theorem described in [36]: if is symmetric positive definite, then the norm of the smallest-norm symmetric matrix such that is singular equals . If is a subspace, then the radius quantity becomes where the columns of form a basis of .
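The subspace case can be explored numerically. The formula for the radius is not recoverable from the text above, so the sketch below rests on an assumption suggested by the proof of Theorem 7.4: that the quantity of interest is the minimum of the quadratic form of the matrix over unit vectors in the subspace. It computes this minimum through an orthonormal basis and builds a symmetric rank-one perturbation of exactly that norm which destroys positive definiteness on the subspace. The function name `radius_on_subspace` and the data are hypothetical.

```python
import numpy as np

def radius_on_subspace(A, M):
    """Minimum of <w, A w> over unit vectors w in range(M), together with
    a minimizer w and the symmetric rank-one matrix B = -lam * w w^T."""
    Q, _ = np.linalg.qr(M)             # orthonormal basis of the subspace
    evals, evecs = np.linalg.eigh(Q.T @ A @ Q)
    lam = evals[0]                     # eigh returns ascending eigenvalues
    w = Q @ evecs[:, 0]                # unit minimizer lying in the subspace
    B = -lam * np.outer(w, w)          # symmetric perturbation of norm lam
    return lam, w, B

# Hypothetical data: A is symmetric positive definite, subspace = span{e2, e3}.
A = np.diag([1.0, 2.0, 3.0])
M = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
lam, w, B = radius_on_subspace(A, M)
```

For this data the minimum over the subspace is 2, larger than the smallest eigenvalue of the full matrix, and the quadratic form of `A + B` vanishes at `w`, so the perturbed matrix is no longer positive definite on the subspace.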
Finally, we note that various versions of Theorem 7.3 are available in the literature as mentioned in the Introduction. Theorem 7.4 is new.
7.3 Discrete approximations in optimal control
Consider the following optimal control problem with control constraints:
[TABLE]
subject to
[TABLE]
where , , is a closed convex set in of feasible control values, denotes the derivative of the function with respect to time , and a.e. means almost every in the sense of Lebesgue measure. The admissible controls are functions in , the space of essentially bounded and measurable functions on with values in , and the state trajectories belong to , the space of Lipschitz continuous functions with weak derivatives in and value zero at . In the sequel we sometimes use the shortened notation instead of , etc. We assume that problem (82) has a solution and also that there exists a closed set and a with for almost every so that the functions and are twice continuously differentiable in an open set containing .
It is well known that under some mild conditions, which we will not reproduce here, the first-order necessary condition in normal form for a weak minimum, known as the Pontryagin maximum principle, at a solution of problem (82) can be expressed in terms of the Hamiltonian in the following way: there exists , the so-called adjoint variable, such that is a solution of the following two-point boundary value problem coupled with a pointwise in variational inequality:
[TABLE]
for a.e. where, as before, is the normal cone to the set at the point . Denote , and let and . Further, for let
[TABLE]
The optimality system (83) then takes the form of the generalized equation where and $F:X\rightrightarrows Y$. We show below that strong subregularity of the mapping described by (84) for the optimality system (83) provides a basis for obtaining an error estimate for a discrete approximation of this system.
Suppose that the optimality system (83) is solved inexactly by means of a numerical method applied to a discrete approximation provided by the Euler scheme. Specifically, let be a natural number, let be the mesh spacing, and let , . Denote by the space of piecewise linear and continuous functions over the grid with values in and such that , by the space of piecewise linear and continuous functions over the grid with values in and such that , and by the space of piecewise constant and continuous from the right functions over the grid with values in . Clearly, , and . Then introduce the products as an approximation space for the triple . We identify with the vector of its values at the mesh points, and similarly for the adjoint variable , and is regarded as the vector of the values of in the mesh subintervals.
Now, suppose that, as a result of the computations, for certain natural a function is found that satisfies the discrete optimality system:
[TABLE]
for . The system (85) represents the Euler discretization of the optimality system (83) with step-size .
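To make the role of the Euler scheme concrete, here is a minimal sketch of the explicit Euler discretization applied to a state equation alone. It does not reproduce system (85), which also involves the adjoint equation and the variational inequality for the control. The toy dynamics, the fixed control, and the names `euler` and `max_error` are assumptions introduced for illustration.

```python
import numpy as np

def euler(g, x0, T, N):
    """Explicit Euler for x'(t) = g(t, x), x(0) = x0, on a uniform grid
    with N steps of size h = T / N. Returns the grid and the nodal values."""
    h = T / N
    t = np.linspace(0.0, T, N + 1)
    x = np.empty(N + 1)
    x[0] = x0
    for i in range(N):
        x[i + 1] = x[i] + h * g(t[i], x[i])
    return t, x

def max_error(N):
    # Toy state equation x' = -x + u(t) with fixed control u(t) = 1 and
    # x(0) = 0; its exact solution is x(t) = 1 - exp(-t).
    t, x = euler(lambda t, x: -x + 1.0, 0.0, 1.0, N)
    return np.max(np.abs(x - (1.0 - np.exp(-t))))

e1, e2 = max_error(100), max_error(200)
ratio = e1 / e2  # close to 2 for a first-order method
```

Halving the step size roughly halves the maximal nodal error, which is the first-order behavior one expects from the Euler scheme on smooth data.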
Suppose that the mapping , where and are described in (84), is strongly subregular at for [math]. Then there exist positive scalars and such that if , then
[TABLE]
where the right side of this inequality is the residual associated with the approximate solution . In our specific case, the residual can be estimated by the norm of a function defined for each and as follows:
[TABLE]
Thus, estimating the residual reduces to finding an estimate for the norm . By the definition of the norm in we obtain
[TABLE]
Observe that here is a piecewise linear function across the grid with uniformly bounded derivative, since both and are in some neighborhood of and respectively. Hence, taking into account that the functions , , and are continuously differentiable, this leads us to an estimate of order for the error of the discretization. Specifically, we obtain the following result:
Theorem 7.5.
Assume that the optimality mapping associated with (83), where and are defined in (84), is strongly subregular at for [math]. Then there exist and positive reals and such that if for an integer a solution of the discrete optimality system (85) satisfies then
[TABLE]
We should note that the assumption of strong subregularity of the mapping associated with (83) and considered as a mapping from to is quite strong. For example, it follows from the estimate (86) that if the reference optimal control has a point of discontinuity in , its piecewise constant discrete approximation must have a jump at the same point. In the paper [11], see also [12], strong regularity in is obtained under coercivity of the objective function, an assumption which automatically implies continuity of the optimal control as a function of time . Without coercivity, for example when the problem is linear in control, one needs metric regularity in larger spaces; for some new results in this direction see the recent paper [30]. In such spaces, however, it may not be possible to differentiate, and hence to pass to a linearization. Theorem 7.5 should be treated as a first step towards employing strong subregularity to obtain error estimates for discrete approximations in optimal control.
References
- [1] R. R. Akhmerov, M. I. Kamenskii, A. S. Potapova, A. E. Rodkina, and B. N. Sadovskii, Measure of Noncompactness and Condensing Operators. Birkhäuser, Basel, 1992.
- [2] F. J. Aragón Artacho, M. H. Geoffroy, Metric subregularity of the convex subdifferential in Banach spaces. J. Nonlinear Convex Anal. 15 (2014), 35–47.
- [3] J. M. Borwein, Stability and regular points of inequality systems. J. Optim. Th. and Appl. 48 (1986), 9–52.
- [4] J. M. Borwein, A. S. Lewis, Convex analysis and nonlinear optimization: theory and examples. Springer Science & Business Media, 2010.
- [5] R. Cibulka, A. L. Dontchev, A nonsmooth Robinson’s inverse function theorem in Banach spaces. Math. Program. 156 (2016), 257–270.
- [6] R. Cibulka, A. L. Dontchev, M. H. Geoffroy, Inexact Newton methods and Dennis–Moré theorems for nonsmooth generalized equations. SIAM J. Control Optim. 53 (2015), 1003–1019.
- [7] R. Cibulka, A. L. Dontchev, V. M. Veliov, Lyusternik-Graves theorems for the sum of a Lipschitz function and a set-valued mapping. SIAM J. Control Optim., to appear.
- [8] E. De Giorgi, A. Marino, and M. Tosques, Problems of evolution in metric spaces and maximal decreasing curve. Atti Accad. Naz. Lincei Rend. Cl. Sci. Fis. Mat. Natur. (8) 68 (1980), 180–187, in Italian. English translation in De Giorgi, Selected papers, Springer, Heidelberg 2013, 527–533.
