This paper develops new stability and error bounds for complex, nonconvex optimization problems and generalized equations, especially in challenging infinite-dimensional and irregular settings, using advanced metric space techniques.
Contribution
It introduces bounds on solution errors using truncated Hausdorff distances and extends calculus tools for these distances to handle compositions and complex problem structures.
Findings
01 Bounds on solution errors for nonconvex problems
02 Extensions of Hausdorff distance calculus to compositions
03 Applications to KKT systems and difference-of-convex functions
Abstract
Stability and error analysis remain challenging for problems that lack regularity properties near solutions, are subject to large perturbations, and might be infinite dimensional. We consider nonconvex optimization and generalized equations defined on metric spaces and develop bounds on solution errors using the truncated Hausdorff distance applied to graphs and epigraphs of the underlying set-valued mappings and functions. In the process, we extend the calculus of such distances to cover compositions and other constructions that arise in nonconvex problems. The results are applied to constrained problems with feasible sets that might have empty interiors, solution of KKT systems, and optimality conditions for difference-of-convex functions and composite functions.
[TABLE]
1 Introduction
Since the early days of convex analysis, epigraphs have been central to understanding functions in the context of minimization problems. Local properties of epigraphs can be used to define subgradients while global properties characterize convexity and lower semicontinuity. The distance between two epigraphs bounds the discrepancy between the corresponding minima and near-minimizers. Likewise, set-valued mappings can be fully represented by their graphs, with graphical convergence being key to understanding approximations of solutions of generalized equations defined by such mappings. These set-based perspectives lead to a unified approach to stability and error analysis for a wide range of variational problems. In this paper, we estimate the truncated Hausdorff distance between sets and demonstrate that it provides insight about the stability of constraint systems and optimization problems even when the feasible sets have empty interiors.
Without assuming any local properties, we establish that the truncated Hausdorff distance bounds the discrepancy between near-solutions of two generalized equations when applied to the graphs of the underlying set-valued mappings. The result is illustrated in the context of optimality conditions for difference-of-convex functions, composite functions, and nonlinear programs. Throughout, we focus on nonconvex problems. Most of the results are established for general metric spaces and therefore apply broadly, including in areas such as nonparametric statistics, optimal control, function identification, and decision rule optimization.
Stability and error analysis for optimization and, more generally, variational problems have been developed from several angles; see for example [23, 1, 31, 32, 14] for comprehensive treatments. There is an extensive literature on local stability based on metric regularity and calmness [20, 30], tilt-stability [18, 24, 17], full-stability [27], and connections with iterative schemes [22]; see also the monographs [7, 26, 25] and the surveys [29, 8]. This paper takes an alternative, global perspective that can be traced back to the late 1960s and pioneering studies of the truncated Hausdorff distance between convex cones [40] and general convex sets [28]. The full potential of the approach emerges in [4, 5, 6], which establish that the truncated Hausdorff distances between epigraphs furnish bounds on the corresponding discrepancies between minima and minimizers; see also [10, 2, 12, 13] for parallel developments and especially the monograph [11] with its detailed treatment of topologies and metrics on spaces of closed sets. Among the myriad possibilities, the Attouch-Wets distance [3] emerges as the theoretically most useful by virtue of being a metric on spaces of nonempty closed sets, among other factors. Still, we concentrate on the truncated Hausdorff distance due to its more intuitive form and direct relationship to quantities of interest such as minima and minimizers. In any case, it furnishes accurate estimates of the Attouch-Wets distance [32, 33]. This global perspective based on set distances provides foundations for computationally attractive approximations of functions [35, 33, 34] and formulations of function identification problems [35], especially in nonparametric statistics [38, 37].
The difficulty of estimating the truncated Hausdorff distance for actual problem instances remains a major hurdle for its practical use. Fundamental results and calculus rules are laid out in [9, 4], but mostly for epigraphs in the convex case. Results on epi-multiplication and epi-sums are given in [4]. Inverse images of convex sets are well-behaved under sufficiently small perturbations. This fact enables the development of results for intersections of sets and sums of functions in the convex case [9]. Since the Legendre-Fenchel transform is an isometry for lower semicontinuous proper convex functions under a closely related pseudo-metric defined in terms of the epi-regularized functions [3], additional estimates of the truncated Hausdorff distance emerge via the dual operations under this transform [4]. In this paper, we switch the focus to nonconvex sets and functions and develop a series of results that support calculations of the truncated Hausdorff distance in practice.
Section 2 lays out the terminology and provides some motivating facts. Section 3 develops estimates for the truncated Hausdorff distance between arbitrary sets. Section 4 turns to specific results for epigraphs and applications in disjunctive programming, formulations with constraint softening, and penalty methods. Section 5 extends the methodology to set-valued mappings and demonstrates its usefulness for generalized equations such as those arising from optimality conditions. An appendix supplements with proofs.
2 Distances and Applications
For a point x in a metric space (X,dX) and C⊂X, we denote by dist(x,C) the usual point-to-set distance, i.e.,
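\[ \mathop{\rm dist}(x,C) := \inf\{d_X(x,\bar{x}) \mid \bar{x}\in C\} \mbox{ if $C$ is nonempty and } \mathop{\rm dist}(x,\emptyset) := \infty. \]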
The excess of C over D⊂X is given by
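\[ \mathop{\rm exs}(C;D) := \sup\{\mathop{\rm dist}(x,D) \mid x\in C\} \mbox{ if $C$ and $D$ are nonempty}, \]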
exs(C;D):=∞ if C nonempty and D empty, and exs(C;D):=0 otherwise.
The Pompeiu-Hausdorff distance between C and D is max{exs(C;D),exs(D;C)}, but it tends to be infinite for unbounded sets and therefore is not central to our development. Instead, we rely on a localization argument relative to a point xctr∈X, which we call the centroid of X. The choice of centroid can be made arbitrarily, but results might be sharper if it is near the “interesting” parts of the sets at hand, as we often restrict attention to intersections of sets with the centered closed ball
\[ \mathbb{B}_X(\rho) := \{x\in X \mid d_X(x,x^{\rm ctr}) \leq \rho\}. \]
Given ρ≥0, we define the truncated Hausdorff distance between two sets C,D⊂X as
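\[ d\hat{l}_{\rho}(C,D) := \max\big\{\mathop{\rm exs}(C\cap\mathbb{B}_X(\rho);D),\ \mathop{\rm exs}(D\cap\mathbb{B}_X(\rho);C)\big\}, \]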
which is always finite as long as C and D are nonempty and ρ<∞.
Trivially, dl^∞(C,D) is the Pompeiu-Hausdorff distance between C and D, but we focus on finite ρ in the following.
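The definitions above can be tried out numerically. The following is a minimal sketch, assuming finite point sets in R^n, the sup-norm as metric, and the origin as centroid; the function names `dist`, `excess`, and `trunc_hausdorff` are hypothetical and chosen only for this illustration.

```python
import math

def dist(x, C):
    """Sup-norm distance from point x to the finite set C (infinity if C is empty)."""
    return min((max(abs(a - b) for a, b in zip(x, y)) for y in C), default=math.inf)

def excess(C, D):
    """exs(C; D): largest distance from a point of C to D (zero if C is empty)."""
    return max((dist(x, D) for x in C), default=0.0)

def trunc_hausdorff(C, D, rho):
    """Truncated Hausdorff distance with the origin as centroid (an assumption)."""
    in_ball = lambda S: [x for x in S if max(abs(a) for a in x) <= rho]
    return max(excess(in_ball(C), D), excess(in_ball(D), C))

C = [(0, 0), (3, 0)]
D = [(0, 1), (10, 0)]
d_small = trunc_hausdorff(C, D, 2)         # only points near the origin count
d_mid = trunc_hausdorff(C, D, 5)           # the point (3, 0) now enters the ball
d_full = trunc_hausdorff(C, D, math.inf)   # Pompeiu-Hausdorff distance
print(d_small, d_mid, d_full)
```

In this sketch the distance grows with ρ because more points of each set fall inside the ball, and an empty second set yields an infinite value, in line with the convention for the excess.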
The notation for the truncated Hausdorff distance suppresses its dependence on the choice of metric and centroid. The following results hold for all metrics and centroids unless otherwise specified. In particular, for a normed linear space the metric is consistently assumed to be the one induced by the norm and the centroid is the zero point of the space. This is a harmless assumption, easily overcome, but kept here to simplify expressions. The “hat-notation” hints at a broader landscape of closely related distances between sets including the Attouch-Wets metric; see [32, Chapter 4] for a summary of results. Although the truncated Hausdorff distance fails to be a metric on spaces of nonempty closed sets, it is obviously nonnegative and symmetric. A triangle inequality of sorts also holds. Let R+:=[0,∞).
2.1 Proposition
(triangle inequality, extended sense).
For a metric space X with centroid xctr, sets C1,C2,C3⊂X, and ρ∈R+,
\[ d\hat{l}_{\rho}(C_1,C_2) \leq d\hat{l}_{\bar\rho}(C_1,C_3) + d\hat{l}_{\bar\rho}(C_3,C_2), \]
provided that ρˉ > 2ρ + max_{i=1,2,3} dist(xctr,Ci).
**Proof.** The arguments in the proofs of [4, Prop. 1.2] and [33, Prop. 3.1] can easily be modified for the present assumptions.
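To see why the radius has to be enlarged, the sketch below checks a three-set example on the real line with centroid 0. It assumes that the extended triangle inequality has the form dl^ρ(C1,C2) ≤ dl^ρˉ(C1,C3) + dl^ρˉ(C3,C2); the sets and helper names are hypothetical.

```python
import math

def dist(x, C):
    """Distance from the real number x to the finite set C."""
    return min((abs(x - y) for y in C), default=math.inf)

def excess(C, D):
    return max((dist(x, D) for x in C), default=0.0)

def trunc_hausdorff(C, D, rho):
    """Truncated Hausdorff distance on the real line, centroid 0."""
    ball = lambda S: [x for x in S if abs(x) <= rho]
    return max(excess(ball(C), D), excess(ball(D), C))

C1, C2, C3 = [1.0], [5.0], [1.25]
rho = 1.0

lhs = trunc_hausdorff(C1, C2, rho)
# With the same radius on the right-hand side the inequality fails,
# because C3 and C2 lie outside the ball of radius rho.
same_radius = trunc_hausdorff(C1, C3, rho) + trunc_hausdorff(C3, C2, rho)
# A radius exceeding 2*rho + max_i dist(0, C_i) restores it.
rho_bar = 2 * rho + max(dist(0.0, C) for C in (C1, C2, C3)) + 0.5
larger_radius = trunc_hausdorff(C1, C3, rho_bar) + trunc_hausdorff(C3, C2, rho_bar)
print(lhs, same_radius, larger_radius)
```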
For a function f:X→R:=[−∞,∞], the characterizing set in the context of minimization problems is its epigraph
\[ \mathop{\rm epi} f := \{(x,\alpha)\in X\times\mathbb{R} \mid f(x)\leq\alpha\}. \]
The truncated Hausdorff distance between epigraphs requires a metric and centroid for X×R and we consistently adopt the product metric ((x,α),(xˉ,αˉ))↦max{dX(x,xˉ),∣α−αˉ∣} and centroid (xctr,0), where xctr is a centroid of X.
The main motivation for studying the truncated Hausdorff distance between epigraphs is its relation to minima and minimizers. We recall that inf f:=inf{f(x)∣x∈X}, ε-argmin f:={x∈dom f∣f(x)≤inf f+ε} for ε≥0, with dom f:={x∈X∣f(x)<∞}, and lev_δ f:={x∈X∣f(x)≤δ} for δ∈R. (We adopt the usual arithmetic rules for extended real-valued numbers with an orientation towards minimization so that ∞−∞ as well as −∞+∞ are set to ∞; see [32, 1.E].)
The application in the context of minimization problems becomes clear from the following two propositions, which are essentially in [5, 33]. Still, due to minor adjustments in assumptions we provide proofs in the appendix.
2.2 Proposition
(approximation of infima and near-minimizers).
For a metric space X, functions f,g:X→R, and ε,ρ∈R+,
[TABLE]
provided that inf f, inf g∈[−ρ,ρ−ε) and γ-argmin f∩BX(ρ) as well as γ-argmin g∩BX(ρ) are nonempty for all γ>0, with the second assertion also requiring δ>ε+2dl^ρ(epi f,epi g).
These bounds are sharp as discussed in [33]. We note that δ cannot generally be equal to ε+2dl^ρ(epif,epig). For example, suppose that f(x)=x for x>0 and f(x)=∞ otherwise; and g(x)=x for x≥0 and g(x)=∞ otherwise. Then, for ρ≥0, dl^ρ(epif,epig)=0, argming={0}, argminf=∅, and exs(argming;argminf)=∞. The role of ρ emerges from the proposition: it needs to be large enough so that the epigraphs intersected with BX×R(ρ) retain points corresponding to infima and near-minimizers.
2.3 Proposition
(approximation of level sets). For a metric space X, functions f,g:X→R, ρ∈R+, and δ∈[−ρ,ρ],
[TABLE]
provided that ε>δ+exs(epig∩BX×R(ρ);epif).
A parallel development is possible for set-valued mappings from a metric space (X,dX) to a metric space (Y,dY). The values of a set-valued mapping S:X⇉Y are the subsets S(x)⊂Y, x∈X, and the graph of S is
\[ \mathop{\rm gph} S := \{(x,y)\in X\times Y \mid y\in S(x)\}. \]
The truncated Hausdorff distance between such graphs requires a metric on X×Y. Throughout, we adopt the product metric ((x,y),(xˉ,yˉ))↦max{dX(x,xˉ),dY(y,yˉ)}. The centroid is likewise constructed from those of X and Y. A prime example of such mappings is the subgradient mapping ∂f:X⇉X for a convex function f on a Hilbert space X. We recall that a function f:X→R is proper if epi f≠∅ and f>−∞. It is lower semicontinuous (lsc) if epi f is closed as a subset of X×R.
2.4 Proposition
(approximation of subgradient mappings [4]).
For a Hilbert space X, proper lsc convex functions f,g:X→R, and ρ∈R+ exceeding dist(0,epif) and dist(0,epig), there exist κ,ρˉ∈R+ such that
[TABLE]
Explicit expressions for the constants κ and ρˉ in the proposition are available in [4]. Section 5 establishes that dl^ρ(gph∂f,gph∂g) bounds the discrepancy between near-solutions of the generalized equations 0∈∂f(x) and 0∈∂g(x). Thus, the proposition provides yet another way of bounding the distance between minimizers of f and those of g in the convex case.
We can bring forward the effect of a constraint set C⊂X when the function of interest is expressed as f+ιC, where
\[ \iota_C(x) := 0 \mbox{ if } x\in C \mbox{ and } \iota_C(x) := \infty \mbox{ otherwise}. \]
Then, optimality conditions can be stated using normal cones. For example, if C⊂Rn and f:Rn→R are convex, then the generalized equation 0∈∂f(x)+NC(x) characterizes minimizers of f+ιC, where NC(x) is the normal cone of C at x in the sense of convex analysis; see [32, 6.C]. Consequently, it becomes important to examine the graph of a normal cone mapping NC:X⇉X and its approximations.
2.5 Proposition
(approximation of normal cone mappings).
For closed convex subsets C,D of a Hilbert space and ρ∈R+ exceeding dist(0,C) and dist(0,D), there exist κ,ρˉ∈R+ such that
[TABLE]
**Proof.** In view of Cor. 3.2 below, the result is a direct application of Prop. 2.4 to the functions f=ιC and g=ιD.
These preliminary facts point to a strategy for stability and error analysis of optimization and variational problems that extends much beyond the convex case: estimate the truncated Hausdorff distances between the relevant constraint sets, graphs, and/or epigraphs, which then immediately provide bounds on the discrepancy between solutions. The next sections develop practical guidelines for computing the truncated Hausdorff distance and illustrate the strategy in concrete instances.
3 Distances between Sets
We start with results about product sets, unions, and convex hulls. The main theorem of the section bounds the truncated Hausdorff distance between images of sets under Lipschitz continuous set-valued mappings.
3.1 Proposition
(product sets). For each i=1,…,m, suppose that Ci,Di are subsets of a metric space (Xi,dXi) with centroid xictr and X=X1×⋯×Xm is equipped with the metric dX=max_{i=1,…,m} dXi and centroid xctr=(x1ctr,…,xmctr). Then, with C=C1×⋯×Cm and D=D1×⋯×Dm,
\[ d\hat{l}_{\rho}(C,D) \leq \max_{i=1,\dots,m} d\hat{l}_{\rho}(C_i,D_i). \]
If C∩BX(ρ) and D∩BX(ρ) are nonempty, then the relation holds with equality.
**Proof.** Let η=max_{i=1,…,m} dl^ρ(Ci,Di), x=(x1,…,xm)∈C∩BX(ρ), and ε>0. Since xi∈Ci∩BXi(ρ) and dist(xi,Di)≤exs(Ci∩BXi(ρ);Di)≤η, there exists yi∈Di with dXi(xi,yi)≤η+ε. We can repeat this construction for all i and obtain y=(y1,…,ym). Then, dX(x,y)=max_{i=1,…,m} dXi(xi,yi)≤η+ε. Thus, dist(x,D)≤η+ε and also exs(C∩BX(ρ);D)≤η+ε, which holds trivially also when C∩BX(ρ)=∅. Repeating the argument with the roles of C and D reversed establishes that dl^ρ(C,D)≤η+ε. Since this holds for all ε>0, dl^ρ(C,D)≤η and the first conclusion holds.
To establish the inequality the other way, let x=(x1,…,xm)∈C∩BX(ρ), ε>0, and i∈{1,…,m}. Then, there exists y=(y1,…,ym)∈D such that
[TABLE]
Since x∈C∩BX(ρ) is arbitrary, exs(Ci∩BXi(ρ);Di)≤dl^ρ(C,D)+ε. A similar argument with the roles of C and D reversed, allows us to conclude that exs(Di∩BXi(ρ);Ci)≤dl^ρ(C,D)+ε. Thus, dl^ρ(Ci,Di)≤dl^ρ(C,D)+ε. Since i and ε are arbitrary, the conclusion follows.
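The product rule can be checked on a small instance. The sketch below assumes finite subsets of the real line as factors, the max-metric on the product, and the origin as centroid; the data and helper names are hypothetical.

```python
import math
from itertools import product

def dist(x, C):
    """Sup-norm distance from the tuple x to the finite set C."""
    return min((max(abs(a - b) for a, b in zip(x, y)) for y in C), default=math.inf)

def excess(C, D):
    return max((dist(x, D) for x in C), default=0.0)

def trunc_hausdorff(C, D, rho):
    ball = lambda S: [x for x in S if max(abs(a) for a in x) <= rho]
    return max(excess(ball(C), D), excess(ball(D), C))

def cartesian(A, B):
    """Product set A x B, with points stored as concatenated tuples."""
    return [a + b for a, b in product(A, B)]

C1, D1 = [(0.0,), (2.0,)], [(1.0,)]
C2, D2 = [(0.0,)], [(0.5,)]
rho = 3.0

lhs = trunc_hausdorff(cartesian(C1, C2), cartesian(D1, D2), rho)
rhs = max(trunc_hausdorff(C1, D1, rho), trunc_hausdorff(C2, D2, rho))
print(lhs, rhs)
```

Both products meet the ball of radius ρ here, so the two numbers agree, which is the equality case of the proposition.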
3.2 Corollary
(indicator functions).
For subsets C,D of a metric space and ρ∈R+,
\[ d\hat{l}_{\rho}(\mathop{\rm epi}\iota_C, \mathop{\rm epi}\iota_D) = d\hat{l}_{\rho}(C,D). \]
**Proof.** By Prop. 3.1, dl^ρ(epiιC,epiιD)=dl^ρ(C×R+,D×R+)=max{dl^ρ(C,D), dl^ρ(R+,R+)}=dl^ρ(C,D) as long as C∩BX(ρ) and D∩BX(ρ) are nonempty. If one or both of these sets are empty, the corollary holds trivially.
3.3 Proposition
(union of sets). For a metric space X, {Cα,Dα⊂X,α∈A}, with A being an arbitrary set, and ρ∈R+,
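\[ d\hat{l}_{\rho}\Big(\bigcup_{\alpha\in A} C_\alpha, \bigcup_{\alpha\in A} D_\alpha\Big) \leq \sup_{\alpha\in A} d\hat{l}_{\rho}(C_\alpha,D_\alpha). \]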
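**Proof.** Let C=∪α∈ACα, D=∪α∈ADα, and η=sup_{α∈A} dl^ρ(Cα,Dα). Suppose that x∈C∩BX(ρ). Then, there exists α∈A such that x∈Cα. Since Dα⊂D and x∈Cα∩BX(ρ),
\[ \mathop{\rm dist}(x,D) \leq \mathop{\rm dist}(x,D_\alpha) \leq \mathop{\rm exs}(C_\alpha\cap\mathbb{B}_X(\rho);D_\alpha) \leq d\hat{l}_{\rho}(C_\alpha,D_\alpha) \leq \eta. \]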
The arbitrary choice of x∈C∩BX(ρ) allows us to conclude that exs(C∩BX(ρ);D)≤η. The roles of C and D can be reversed yielding the conclusion.
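The bound for unions can be strict. A minimal sketch, assuming finite subsets of the real line with centroid 0 and hypothetical data in which the two pairs swap their points:

```python
import math

def dist(x, C):
    return min((abs(x - y) for y in C), default=math.inf)

def excess(C, D):
    return max((dist(x, D) for x in C), default=0.0)

def trunc_hausdorff(C, D, rho):
    ball = lambda S: [x for x in S if abs(x) <= rho]
    return max(excess(ball(C), D), excess(ball(D), C))

# Each pair (C_a, D_a) is one unit apart, but the unions coincide.
pairs = [([0.0], [1.0]), ([1.0], [0.0])]
rho = 2.0

union_C = [x for C, _ in pairs for x in C]
union_D = [x for _, D in pairs for x in D]
union_dist = trunc_hausdorff(union_C, union_D, rho)
sup_dist = max(trunc_hausdorff(C, D, rho) for C, D in pairs)
print(union_dist, sup_dist)
```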
There is no similar result for intersections. A revealing example is furnished already on R by C1=C2={0}, D1={−ε}, and D2={ε} with ε>0. Then, dl^ρ(C1∩C2,D1∩D2)=∞ because D1∩D2=∅. However, dl^ρ(Ci,Di)=ε for ρ≥ε and i=1,2.
The difficulty occurs even if C1∩C2 and D1∩D2 have nonempty interiors. Consider C1=D1=[−1,0]∪[1,2] and C2=[−1,0]∪[2,3] and D2=[−1,0]∪[2+ε,3] with ε∈(0,1). Then, C1∩C2=[−1,0]∪{2}, D1∩D2=[−1,0], and dl^ρ(Ci,Di)≤ε for i=1,2 and ρ≥3. Still, dl^ρ(C1∩C2,D1∩D2)=2. In the convex case, having intersections with nonempty interior remedies the situation to a large extent; see [9, Cor. 2.5]. In the general case, however, it is difficult to say more than exs(∩α∈ACα;∩α∈ADα+)≤0, where Dα+={x∈X∣dist(x,Dα)≤exs(Cα;Dα)} for α∈A, which nevertheless provides guidance towards constructing outer approximations.
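The first intersection example, with singletons on the real line, can be reproduced directly. The sketch assumes centroid 0, a concrete value for ε, and hypothetical helper names.

```python
import math

def dist(x, C):
    return min((abs(x - y) for y in C), default=math.inf)

def excess(C, D):
    return max((dist(x, D) for x in C), default=0.0)

def trunc_hausdorff(C, D, rho):
    ball = lambda S: [x for x in S if abs(x) <= rho]
    return max(excess(ball(C), D), excess(ball(D), C))

eps, rho = 0.25, 1.0
C1, C2 = {0.0}, {0.0}
D1, D2 = {-eps}, {eps}

# The individual sets are close to each other ...
individual = [trunc_hausdorff(C1, D1, rho), trunc_hausdorff(C2, D2, rho)]
# ... but D1 and D2 do not intersect, so the distance between
# the intersections is infinite.
intersection = trunc_hausdorff(C1 & C2, D1 & D2, rho)
print(individual, intersection)
```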
For large enough ρ, the operation of taking the convex hull is non-expansive under dl^ρ. We denote by conC the convex hull of a set C and N the natural numbers.
3.4 Proposition
(convex hulls).
For subsets C and D of a normed linear space X,
\[ d\hat{l}_{\rho}(\mathop{\rm con} C, \mathop{\rm con} D) \leq d\hat{l}_{\rho}(C,D) \]
when ρ∈[0,∞] is such that C,D⊂BX(ρ).
**Proof.** Suppose that x∈conC∩BX(ρ). Thus, there exist r∈N, x1,…,xr∈C, and α1,…,αr≥0, with ∑_{i=1}^{r} αi=1, such that x=∑_{i=1}^{r} αixi. Let ε>0. Since xi∈C∩BX(ρ), there exists yi∈D with ∥xi−yi∥−ε≤dist(xi,D)≤exs(C∩BX(ρ);D)≤dl^ρ(C,D). For y=∑_{i=1}^{r} αiyi, ∥x−y∥≤∑_{i=1}^{r} αi∥xi−yi∥≤dl^ρ(C,D)+ε. Thus, dist(x,conD)≤dl^ρ(C,D)+ε because y∈conD. Since ε and x are arbitrary, exs(conC∩BX(ρ);conD)≤dl^ρ(C,D). The conclusion then follows by symmetry.
The difficulty with unbounded sets and a finite ρ is illustrated by C={λ(−1,1), λ(1,−1)}⊂R2 and D={λ(1,1),λ(−1,−1)}⊂R2, with λ>0. For the norm ∥⋅∥∞ and ρ<λ, dl^ρ(conC,conD)=ρ but dl^ρ(C,D)=0. Near the origin C and D look the same (empty), but their convex hulls are locally rather different.
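The two-point example can be checked numerically. In the sketch below the convex hulls, which are line segments through the origin, are replaced by finite grids; this discretization, the values of λ and ρ, and the helper names are assumptions of the illustration.

```python
import math

def dist(x, C):
    """Sup-norm distance from the point x to the finite set C."""
    return min((max(abs(a - b) for a, b in zip(x, y)) for y in C), default=math.inf)

def excess(C, D):
    return max((dist(x, D) for x in C), default=0.0)

def trunc_hausdorff(C, D, rho):
    ball = lambda S: [x for x in S if max(abs(a) for a in x) <= rho]
    return max(excess(ball(C), D), excess(ball(D), C))

lam, rho, n = 1.0, 0.5, 100
C = [(-lam, lam), (lam, -lam)]
D = [(lam, lam), (-lam, -lam)]

# Grid approximations of con C and con D; the grid contains t = 0 and t = rho.
ts = [lam * k / n for k in range(-n, n + 1)]
con_C = [(-t, t) for t in ts]
con_D = [(t, t) for t in ts]

d_sets = trunc_hausdorff(C, D, rho)           # no point of C or D lies in the ball
d_hulls = trunc_hausdorff(con_C, con_D, rho)  # the hulls differ inside the ball
print(d_sets, d_hulls)
```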
Next, we turn the focus towards images of sets, which provide foundations for several subsequent results.
For metric spaces (X,dX) and (Y,dY), we say that a set-valued mapping S:X⇉Y is Lipschitz continuous with modulus κ:R+→R+ relative to ρ∗∈[0,∞] if
\[ d\hat{l}_{\rho^{*}}\big(S(x),S(\bar{x})\big) \leq \kappa(\rho)\, d_X(x,\bar{x}) \mbox{ for } x,\bar{x}\in\mathbb{B}_X(\rho) \mbox{ and } \rho\in\mathbb{R}_+. \]
We retain this terminology also for point-valued mappings, in which case the left-hand side amounts to the truncated Hausdorff distance between two points.
The image of C⊂X under a set-valued mapping S:X⇉Y is the set S(C):=∪_{x∈C} S(x). The corresponding inverse set-valued mapping is S⁻¹(y):={x∈X∣y∈S(x)} for y∈Y. Moreover, for any nonempty C⊂X and f:X→R, inf_C f:=inf{f(x)∣x∈C} and sup_C f:=sup{f(x)∣x∈C}. When C is empty, inf_C f=∞ and sup_C f=−∞.
3.5 Theorem
(images under Lipschitz mappings).
Suppose that (X,dX) and (Y,dY) are metric spaces, with centroids xctr and yctr, respectively, ρ∈R+, and S,T:X⇉Y are nonempty-valued Lipschitz continuous with common modulus κ:R+→R+ relative to ρ∗∈[0,∞]. Then, for any nonempty C,D⊂X,
[TABLE]
provided that ρ∗>2ρ+max{dist(yctr,S(C)),dist(yctr,S(D)),dist(yctr,T(D))}, ρˉ>0 exceeds
[TABLE]
and ρ^>ρˉ+dl^ρˉ(C,D).
**Proof.** First, we bound dl^ρ∗(S(C),S(D)). Suppose that yˉ∈S(C)∩BY(ρ∗). Then there exists xˉ∈S−1(yˉ)∩C such that dX(xˉ,xctr)≤ρˉ, i.e., xˉ∈C∩BX(ρˉ). Let ε∈(0,ρ^−ρˉ−dl^ρˉ(C,D)). There exists x∈D such that dl^ρˉ(C,D)≥exs(C∩BX(ρˉ);D)≥dist(xˉ,D)≥dX(xˉ,x)−ε. Thus, dX(x,xctr)≤dX(xˉ,xctr)+dX(xˉ,x)≤ρˉ+dl^ρˉ(C,D)+ε≤ρ^ so that both xˉ and x are in BX(ρ^). There exists y∈S(x) such that dY(yˉ,y)≤dist(yˉ,S(x))+ε, which implies that y∈S(D). Then,
[TABLE]
which implies that exs(S(C)∩BY(ρ∗);S(D))≤κ(ρ^)dl^ρˉ(C,D)+(κ(ρ^)+1)ε. Repeating the arguments with the roles of C and D reversed and recognizing that ε is arbitrary, lead to
[TABLE]
Second, we bound dl^ρ∗(S(D),T(D)). Suppose that yˉ∈S(D)∩BY(ρ∗). Then there exists xˉ∈S−1(yˉ)∩D such that dX(xˉ,xctr)≤ρˉ, i.e., xˉ∈D∩BX(ρˉ). Let ε>0. There exists y∈T(xˉ) such that dY(yˉ,y)≤dist(yˉ,T(xˉ))+ε, which implies that y∈T(D). Then,
[TABLE]
which implies that exs(S(D)∩BY(ρ∗);T(D))≤sup_{x∈BX(ρˉ)} dl^ρ∗(S(x),T(x))+ε. Again by symmetry and the fact that ε is arbitrary, we conclude that
\[ d\hat{l}_{\rho^{*}}\big(S(D),T(D)\big) \leq \sup_{x\in\mathbb{B}_X(\bar\rho)} d\hat{l}_{\rho^{*}}\big(S(x),T(x)\big). \]
The requirement on ρˉ in the theorem is most easily verified when C and D are bounded, but other possibilities exist, for example under a Lipschitz property on the inverse set-valued mappings. An example of this appears in Cor. 4.8 below.
Sums of sets arise among other places in subdifferential calculus: For functions f1 and f2, the set of subgradients ∂(f1+f2)(x)=∂f1(x)+∂f2(x) under appropriate assumptions [32, Sec. 10.9]; here and below subgradients are of the general kind [32, 25] (see Footnote 1). Of course, the previous theorem could be used to establish a result about sums. We pursue a direct approach, with a proof in the appendix, as it is instructive and also brings forth a possible adjustment in the case of unbounded sets.
Footnote 1: For f:Rn→R and a point xˉ where f is finite, we recall that v∈∂^f(xˉ) (a subgradient of the regular kind) if and only if f(x)≥f(xˉ)+⟨v,x−xˉ⟩+o(∥x−xˉ∥₂). Moreover, v∈∂f(xˉ) (a subgradient of the general kind) if and only if there exist vν→v and xν→xˉ, with f(xν)→f(xˉ), such that vν∈∂^f(xν). In the convex case, regular and general subgradients coincide.
3.6 Proposition
(sums of sets).
For a normed linear space X, nonempty sets {Ci,Di⊂X,i=1,…,m}, and ρ∈R+,
[TABLE]
provided that Ci,Di⊂BX(ρ) for all i=1,2,…,m. If Ci,Di⊂BX(ρ) holds only for i=2,3,…,m, then the inequality remains valid as long as dl^ρ(C1,D1) is replaced by dl^mρ(C1,D1).
A motivation for allowing one unbounded set emerges when studying a locally Lipschitz continuous function f:Rn→R, a nonempty closed set C⊂Rn, and the optimality condition 0∈∂f(x)+NC(x) [32, Exer 10.10], where NC(x) is the normal cone of C at x in the general sense [32, 25], i.e., NC(x)=∂ιC(x). Here, ∂f(x) is bounded, but NC(x) is not in the interesting cases. We observe that if there are two or more unbounded sets, then the assertion in the proposition fails. For an example in R2, let C1={λ(1,1+δ)∣λ≥0}, C2={λ(−1,−1+δ)∣λ≥0}, with δ>0, D1={λ(1,1)∣λ≥0}, and D2={λ(−1,−1)∣λ≥0}. All the sets are rays and therefore unbounded. Now, dl^ρ(Ci,Di)≤δρ for i=1,2. However, because C1+C2 is “nearly” the halfspace {(x1,x2)∣x1−x2≤0} for small δ but D1+D2={(x1,x2)∣x1=x2}, dl^ρ(C1+C2,D1+D2)=ρ.
The inequality in the proposition is sharp because for x,y,z∈X and C1={x}, C2={y}, D1={x+z}, and D2={y+z}, we have dl^ρ(C1+C2,D1+D2)=2∥z∥ and dl^ρ(Ci,Di)=∥z∥ for i=1,2 for sufficiently large ρ. Still, we can have strict inequality. For example, for x,y∈X with x≠y, and C1={x}, C2={−x}, D1={y}, and D2={−y}, we have dl^ρ(C1+C2,D1+D2)=0 and dl^ρ(Ci,Di)=∥x−y∥ for i=1,2 for sufficiently large ρ.
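Both singleton examples can be verified in R2. The sketch assumes the sup-norm, the origin as centroid, and concrete hypothetical vectors x, y, z; it only reproduces the stated values and compares them with the sum of the individual distances.

```python
import math
from itertools import product

def dist(x, C):
    return min((max(abs(a - b) for a, b in zip(x, y)) for y in C), default=math.inf)

def excess(C, D):
    return max((dist(x, D) for x in C), default=0.0)

def trunc_hausdorff(C, D, rho):
    ball = lambda S: [x for x in S if max(abs(a) for a in x) <= rho]
    return max(excess(ball(C), D), excess(ball(D), C))

def add(x, y):
    return tuple(a + b for a, b in zip(x, y))

def neg(x):
    return tuple(-a for a in x)

def minkowski(A, B):
    """Minkowski sum of two finite sets."""
    return [add(a, b) for a, b in product(A, B)]

x, y, z = (1.0, 0.0), (0.0, 2.0), (0.5, 0.5)
rho = 10.0

# Sharp case: both sets are shifted by the same vector z.
sharp_lhs = trunc_hausdorff(minkowski([x], [y]), minkowski([add(x, z)], [add(y, z)]), rho)
sharp_rhs = trunc_hausdorff([x], [add(x, z)], rho) + trunc_hausdorff([y], [add(y, z)], rho)

# Strict case: the perturbations cancel in the sum.
strict_lhs = trunc_hausdorff(minkowski([x], [neg(x)]), minkowski([y], [neg(y)]), rho)
strict_rhs = trunc_hausdorff([x], [y], rho) + trunc_hausdorff([neg(x)], [neg(y)], rho)
print(sharp_lhs, sharp_rhs, strict_lhs, strict_rhs)
```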
3.7 Corollary
(set multiplications).
For nonempty subsets C and D of a normed linear space, nonzero λ,μ∈R, and ρ∈R+,
[TABLE]
when ρˉ>(2ρ+max{∣λ∣dist(0,C),∣λ∣dist(0,D),∣μ∣dist(0,D)})max{∣λ⁻¹∣,∣μ⁻¹∣}.
**Proof.** The result follows from Thm. 3.5 by setting S(x)=λx and T(x)=μx.
We end the section by recording a useful fact about the distance between level-sets of two convex functions, which extends [32, Prop. 7.68] by allowing the functions to be different.
3.8 Proposition
(level-sets; convex case).
For ρ∈R+, α,β∈[−ρ,ρ], and proper convex lsc functions f,g:Rn→R, suppose that α>inf f, β>inf g, argmin f≠∅, and argmin g≠∅. Then, with η=dl^ρ(epi f,epi g),
[TABLE]
provided that ρ0≥max{dist(0,argminf),dist(0,argming)} and ρ∗≥max{ρ0,ρ+dl^ρ(epif,epig)}.
**Proof.** By Prop. 4.5 in [33], exs(levαf∩BRn(ρ);levα+ηg)≤η. An application of Prop. 7.68 in [32] yields
[TABLE]
whenever α+η>β. If α+η≤β, then exs(levα+ηg∩BRn(ρ∗);levβg)=0.
Let x∈levαf∩BRn(ρ). There exists y∈levα+ηg with ∥y−x∥≤η so that y∈BRn(ρ∗). Thus, we have established that
[TABLE]
Repeating the argument with the roles of f and g reversed leads to the conclusion.
The proposition relies heavily on the assumption that levαf and levβg have nonempty interiors. The next section dispenses with that requirement as well as convexity.
4 Distances between Epigraphs of Functions
As special sets, epigraphs offer several possibilities to specialize the results of the previous section and also develop new ones. First, we examine the Kenmochi conditions and their numerous applications including in the analysis of constrained problems with feasible sets that lack interiors. Second, we develop a series of calculus rules relying, in part, on Section 3.
For a metric space (X,dX), let the closed balls at x∈X be denoted by
\[ \mathbb{B}_X(x,\rho) := \{\bar{x}\in X \mid d_X(x,\bar{x})\leq\rho\}. \]
4.1 Kenmochi Conditions and Applications
An alternative expression for the truncated Hausdorff distance between epigraphs is provided by the Kenmochi conditions, which can be traced back to [21]; see also [4].
The following result generalizes [33, Prop. 3.2] by relaxing a lsc assumption and establishing that the conditions provide tight estimates. A proof is provided in the appendix.
4.1 Proposition
(Kenmochi conditions).
For a metric space X, functions f,g:X→R, both with nonempty epigraphs, and ρ∈R+,
[TABLE]
For α∈(0,∞), a function f:X→R is α-Hölder continuous with modulus κ:R+→R+ if
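\[ |f(x)-f(\bar{x})| \leq \kappa(\rho)\,[d_X(x,\bar{x})]^{\alpha} \mbox{ for } x,\bar{x}\in\mathbb{B}_X(\rho) \mbox{ and } \rho\in\mathbb{R}_+. \]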
The function is Lipschitz continuous with modulus κ:R+→R+ if the relation holds with α=1.
The truncated Hausdorff distance between epigraphs of functions of this kind can be bounded by an expression involving the worst pointwise difference between the functions over a set.
4.2 Proposition
(estimates from sup-norm). For a metric space X, functions f,g:X→R with nonempty epigraphs, and ρ∈R+, we have that
[TABLE]
where Aρ=levρf∪levρg∩BX(ρ). (Supremum over an empty set is interpreted as zero in this case.)
Suppose also that f and g are α-Hölder continuous with common modulus κ:R+→R+ and α∈(0,∞). Then, for any nonempty C⊂X,
[TABLE]
provided that ρ^>ρ+exs(Aρ;C).
**Proof.** The first assertion holds via Prop. 4.1. For the second assertion, set η=exs(Aρ;C) and let ε∈(0,ρ^−ρ−η). Suppose that x∈levρf∩BX(ρ). Then, there exists xˉ∈C with dX(x,xˉ)≤ηˉ=η+ε and
[TABLE]
A similar result holds with the roles of f and g reversed. Thus, by Prop. 4.1, dl^ρ(epi f,epi g)≤max{ηˉ,κ(ρ^)ηˉ^α+sup_C∣f−g∣}. Since ε is arbitrary, ηˉ can be replaced by η and the second conclusion holds.
Example 1: sample average approximations. In stochastic optimization and statistical learning, f:X→R is often given as f(x)=E[ψ(ξ,x)], where ψ:Ξ×X→R and E denotes the expectation under the distribution of the random vector ξ with values in Ξ. Under standard assumptions (see [32, Ch. 14], [39, Ch. 7]), f is well defined and Lipschitz continuous with modulus κ:R+→R+. An approximation of f could be the sample average function fν:X→R given by fν(x)=ν⁻¹∑_{i=1}^{ν} ψ(ξi,x), where ξ1,…,ξν∈Ξ are given data. Under related assumptions, fν is also Lipschitz continuous with the same modulus as f.
When X is finitely compact (that is, all its balls are compact), Aρ in Prop. 4.2 is compact and it is possible to construct for any ε>0 a set C consisting of only a finite number of points and still have exs(Aρ;C)≤ε. Since C is finite, there exists a variety of ways of bounding supC∣f−fν∣, say by δ, using the theory of large deviations; see for example [39, Ch. 7]. Prop. 4.2 then gives that dl^ρ(epif,epifν)≤max{ε,κ(ρ^)ε+δ} when ρ^>ρ+ε.
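The recipe of Example 1 can be sketched on the real line. The integrand ψ(ξ,x)=|x−ξ| with ξ uniform on [0,1], the sample size, the net spacing, and all names below are hypothetical choices for illustration; here δ is computed directly rather than bounded by large deviations theory.

```python
import math
import random

def f(x):
    """Exact expectation E|x - xi| for xi uniform on [0, 1] (hypothetical integrand)."""
    if x < 0.0:
        return 0.5 - x
    if x > 1.0:
        return x - 0.5
    return x * x - x + 0.5

rng = random.Random(0)
nu = 2000
sample = [rng.random() for _ in range(nu)]

def f_nu(x):
    """Sample average approximation of f."""
    return sum(abs(x - xi) for xi in sample) / nu

# Both f and f_nu are Lipschitz continuous with modulus 1.
kappa, rho, eps = 1.0, 2.0, 0.05

# Finite net C: every point of [-rho, rho] is within eps of some net point.
n = math.ceil(rho / eps)
net = [-rho + eps * (2 * k + 1) for k in range(n)]

delta = max(abs(f(c) - f_nu(c)) for c in net)   # sup over the finite set C
bound = max(eps, kappa * eps + delta)           # bound of the form in Example 1
print(len(net), delta, bound)
```

The design point is that only finitely many evaluations of f−fν are needed; the Lipschitz modulus transfers the control from the net to the whole ball.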
The next result extends [33, Prop. 3.3] by moving from indicator functions to general functions and from Lipschitz to Hölder continuous functions; see also [4, 9] for results on sums in the convex case.
4.3 Proposition
(sums under Hölder continuity).
For a metric space X, suppose that fi,gi:X→R, i=1,2, are functions such that f1,g1 are α-Hölder continuous with common modulus κ:R+→R+, α∈(0,∞), and both epi(f1+f2) and epi(g1+g2) are nonempty. Then, for ρ∈R+,
[TABLE]
where η=dl^ρˉ(epi f2,epi g2), provided that Aρ=levρ(f1+f2)∪levρ(g1+g2)∩BX(ρ)≠∅, ρˉ≥ρ+max{sup_{BX(ρ)}∣f1∣,sup_{BX(ρ)}∣g1∣}, and ρ^>ρ+η.
**Proof.** Let ε∈(0,ρ^−ρ−η) and x∈levρ(f1+f2)∩BX(ρ). Then, f2(x)≤ρ−f1(x)≤ρˉ. First, suppose that f2(x)≥−ρˉ so that (x,f2(x))∈epif2∩BX×R(ρˉ). Consequently, there is (xˉ,αˉ)∈epig2 with dX(x,xˉ)≤η+ε and ∣αˉ−f2(x)∣≤η+ε. Thus, g2(xˉ)≤αˉ≤f2(x)+η+ε and
[TABLE]
Second, suppose that f2(x)<−ρˉ. Then, (x,−ρˉ)∈epif2∩BX×R(ρˉ) and there is (xˉ,αˉ)∈epig2 with dX(x,xˉ)≤η+ε and ∣αˉ+ρˉ∣≤η+ε. Thus, g2(xˉ)≤αˉ≤−ρˉ+η+ε and, similar to above,
[TABLE]
The last inequality follows because f1(x)−ρˉ≤sup_{BX(ρ)}∣f1∣−ρˉ≤−ρ. Thus, in both cases, we obtain the same upper bound on inf_{BX(x,η+ε)} g1+g2. Repeating these arguments with the roles of f1,f2 switched with those of g1,g2, we obtain via Prop. 4.1 that dl^ρ(epi(f1+f2),epi(g1+g2))≤max{η+ε, sup_{Aρ}∣f1−g1∣+κ(ρ^)(η+ε)^α+η+ε}. Since ε is arbitrary, the conclusion follows.
Example 1: continued. Suppose that in addition to f the problem of interest involves a “regularizer” r:X→[0,∞), which is common in statistical learning, i.e., we aim to minimize f+r. We may want to examine the stability of solutions under changes to r. Let rν:X→[0,∞) be such an alternative regularizer. A prime example is when r=0 and we want to quantify the effect of the regularizer rν. We are therefore interested in comparing epi(f+r) to epi(fν+rν). Suppose that r and rν are α-Hölder continuous with common modulus μ:R+→R+ and α∈(0,∞), and X=Rn. A possible choice is to have rν(x)=∑_{j=1}^{n} sν(xj) with sν(τ)=λ∣τ∣−ντ²/2 when ∣τ∣≤λ/ν and sν(τ)=λ²/(2ν) otherwise, with λ>0 being a parameter. This makes rν a nonconvex function with Lipschitz modulus λ globally. An even more aggressive regularizer would be sν(τ)=ν⁻¹√∣τ∣, possibly further scaled, which is nonconvex but 1/2-Hölder continuous. Regardless, Prop. 4.3 establishes that
[TABLE]
where η=dl^ρˉ(epi f,epi fν) can be expressed in terms of κ, ε, and δ, and Aρ and ρ^ are sufficiently large as stipulated by the proposition. In particular when r=0, this error bound provides guidance on how fast the regularizer should vanish as the sample size ν grows. Typically, the sample error δ is of order ν^(−1/2), which indicates that rν should vanish at the same rate, at least when α=1.
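The properties claimed for the first regularizer can be probed numerically. The sketch assumes the one-dimensional term s(τ)=λ|τ|−ντ²/2 for |τ|≤λ/ν and s(τ)=λ²/(2ν) otherwise, with hypothetical parameter values; the Lipschitz constant is only estimated from slopes on a grid.

```python
def s(tau, lam=1.0, nu=4.0):
    """One-dimensional regularizer term; lam and nu are illustrative values."""
    if abs(tau) <= lam / nu:
        return lam * abs(tau) - nu * tau * tau / 2.0
    return lam * lam / (2.0 * nu)

lam, nu = 1.0, 4.0

# Largest slope between neighboring grid points, as an estimate
# of the global Lipschitz modulus (expected not to exceed lam).
grid = [k / 1000.0 for k in range(-1000, 1001)]
slopes = [abs(s(b) - s(a)) / (b - a) for a, b in zip(grid, grid[1:])]
lipschitz_estimate = max(slopes)

# Nonconvexity: the value at a midpoint exceeds the average of the endpoints.
a, b = 0.0, 2.0 * lam / nu
midpoint_gap = s((a + b) / 2.0) - (s(a) + s(b)) / 2.0
print(lipschitz_estimate, midpoint_gap)
```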
Example 2: disjunctive programming. Suppose that {Cα,α∈A} is a collection of nonempty subsets of a Hilbert space X and c∈X. Disjunctive programming studies problems of the form minimize ⟨c,x⟩ subject to x∈∪α∈ACα. The effect of replacing c by d∈X and the sets by {Dα≠∅,α∈A} on the minimum value and set of near-minimizers can be bounded by Prop. 2.2 via Prop. 4.3 and Prop. 3.3. Specifically, let f(x)=⟨c,x⟩ if x∈C=∪α∈ACα and f(x)=∞ otherwise. Likewise, g(x)=⟨d,x⟩ if x∈D=∪α∈ADα and g(x)=∞ otherwise. Since inf_{x∈BX(ρ)}⟨c,x⟩≥−ρ∥c∥ and similarly with c replaced by d, ρˉ can be set to ρ(1+max{∥c∥,∥d∥}) in Prop. 4.3 and, in view of the Lipschitz continuity of ⟨c,⋅⟩ and ⟨d,⋅⟩,
[TABLE]
where the last inequality follows by Cor. 3.2 and Prop. 3.3. Consequently, solutions of disjunctive programs exhibit a Lipschitz property in this sense under a remarkable absence of assumptions.
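To make the setting of Example 2 concrete, the sketch below solves a small disjunctive program in R2 and a perturbed version of it. It assumes, purely for illustration, that each Cα and Dα is a box, so that the linear objective can be minimized over each piece in closed form; the data `c`, `d`, `C`, and `D` are hypothetical.

```python
def min_linear_over_box(c, box):
    """Minimize <c, x> over a box given as [(lo_1, hi_1), ..., (lo_n, hi_n)]."""
    x = [lo if cj >= 0 else hi for cj, (lo, hi) in zip(c, box)]
    return sum(cj * xj for cj, xj in zip(c, x)), x

def min_linear_over_union(c, boxes):
    """Minimize <c, x> over a union of boxes by solving each piece separately."""
    return min((min_linear_over_box(c, b) for b in boxes), key=lambda r: r[0])

# Actual problem: cost vector c and union of two boxes (hypothetical data).
c = [1.0, -1.0]
C = [[(0.0, 1.0), (0.0, 1.0)], [(2.0, 3.0), (0.0, 0.5)]]

# Perturbed problem: cost vector d and slightly shifted first box.
d = [1.05, -0.95]
D = [[(0.02, 1.0), (0.0, 0.98)], [(2.0, 3.0), (0.0, 0.5)]]

val_C, x_C = min_linear_over_union(c, C)
val_D, x_D = min_linear_over_union(d, D)
value_change = abs(val_C - val_D)
```

In this instance the minimum value moves from −1 to about −0.91, a change of the same order as the perturbations of the cost vector and the sets, which is the kind of behavior the Lipschitz property in the example describes.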
As already discussed in Section 3, intersections of sets are generally not stable under perturbations of the individual sets. This fact is the source of many difficulties in constrained optimization. In particular, if the problem of minimizing f0(x) subject to x∈Cα for all α∈A is “approximated” by minimizing g0(x) subject to x∈Dα for all α∈A, with both supX∣f0−g0∣ and dl^ρ(Cα,Dα) being “small” for all α∈A, then their solutions can still be arbitrarily far apart. The issue surfaces even in one dimension: for example, set f0(x)=g0(x)=x, C1=D1={0,1}, C2=[0,1−ε], and D2=[ε,1] for ε∈(0,1). Thus, a major challenge is to construct approximating problems that are associated with small truncated Hausdorff distances to their original counterparts.
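The one-dimensional example can be reproduced directly. The sketch below is a minimal illustration that uses the ordinary Hausdorff distance between the intervals C2 and D2 (which bounds the truncated distance from above) and reports the gap between the two minimizers; the helper names are hypothetical.

```python
def intersect_points_with_interval(points, interval):
    """Intersection of a finite point set with a closed interval."""
    lo, hi = interval
    return [p for p in points if lo <= p <= hi]

def hausdorff_intervals(a, b):
    """Hausdorff distance between two compact intervals of the real line."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

results = []
for eps in (0.1, 0.01, 0.001):
    C1 = D1 = [0.0, 1.0]
    C2, D2 = (0.0, 1.0 - eps), (eps, 1.0)
    x_actual = min(intersect_points_with_interval(C1, C2))   # minimizes f0(x) = x on C1 ∩ C2
    x_approx = min(intersect_points_with_interval(D1, D2))   # minimizes g0(x) = x on D1 ∩ D2
    results.append((eps, hausdorff_intervals(C2, D2), abs(x_actual - x_approx)))
```

Each entry of `results` pairs a set distance of ε with a solution gap of one, so the gap does not shrink as ε decreases.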
We observe that in the convex case having an intersection of constraint sets with nonempty interior suffices to avoid this difficulty as long as the approximations are sufficiently accurate; see [9, Cor. 2.5].
We illustrate three cases without assuming that the feasible sets are convex or have nonempty interiors. Moreover, the approximations can be arbitrarily poor, i.e., we are not only considering small perturbations. This forces us to construct approximating problems that are rather different from the actual problems because simply replacing objective functions and constraint sets by approximating counterparts usually fails to achieve small solution errors, as the trivial one-dimensional example above highlights.
Case I. The first case analyzes the feasibility problem of finding an x∈∩i=1mCi when we only have approximating sets D1,…,Dm. We construct an approximating optimization problem in a higher-dimensional space that furnishes an approximating solution of the actual feasibility problem and is computationally attractive as it “nearly” decomposes into m subproblems.
4.4 Theorem
(approximation of feasibility problem). For subsets C1,…,Cm and D1,…,Dm of a metric space (X,dX), with centroid xctr, λ∈(0,∞), ρ>2λ(m−1)maxi=1,…,mdX(xctr,Di), with ∩i=1mCi∩BX(ρ)≠∅, and ρˉ∈(3ρ,∞), suppose that the following constraint qualification holds: there exists a nondecreasing function ψ:R+→R+ such that
[TABLE]
Then, any solution
[TABLE]
satisfies
[TABLE]
**Proof.** Let C=C1×⋯×Cm⊂Xm, D=D1×⋯×Dm⊂Xm, and define f,fλ,gλ:Xm→R to have f(x1,…,xm)=0 if (x1,…,xm)∈C and xi=x1 for all i, fλ(x1,…,xm)=λ∑i=1mdX(xi,x1) if (x1,…,xm)∈C, and gλ(x1,…,xm)=λ∑i=1mdX(xi,x1) if (x1,…,xm)∈D. Otherwise, the functions take the value ∞.
First, we examine the Kenmochi conditions for f and fλ. Suppose (x1,…,xm)∈levρˉf∩BXm(ρˉ). (Note that Xm=X×⋯×X is equipped with the product metric.) Then, (x1,…,xm)∈C and xi=x1 for all i. Thus, infBXm((x1,…,xm),0)fλ≤fλ(x1,…,xm)=0=f(x1,…,xm) and the first set of Kenmochi conditions holds with η=0. Next, suppose that (x1,…,xm)∈levρˉfλ∩BXm(ρˉ). Then, xi∈Ci for all i and λ∑i=1mdX(xi,x1)≤ρˉ. In view of the constraint qualification, this implies that
[TABLE]
Let ε>0. There exists xˉ∈∩i=1mCi such that dist(x1,∩i=1mCi)≥dX(x1,xˉ)−ε. Certainly,
[TABLE]
Then, with η=ρˉ/λ+ψ(ρˉ/λ)+ε,
[TABLE]
and the second set of Kenmochi conditions holds with this η.
Since ε is arbitrary, we have established via Prop. 4.1 that
[TABLE]
Second, we estimate dl^ρˉ(epifλ,epigλ). The Lipschitz modulus of the function (x1,…,xm)↦λ∑i=1mdX(xi,x1) is the constant 2mλ. By Prop. 3.1, Prop. 4.3, and Cor. 3.2,
[TABLE]
For any ε>0, we have that
[TABLE]
Thus, ρˉ>3ρ is sufficiently large for use in Prop. 2.1 and
[TABLE]
We next apply Prop. 2.2 to the functions f and gλ. The conditions of the proposition are easily verified. In particular, for (x1,…,xm)∈D,
[TABLE]
which together with the fact that dX(xi,x1)≤2maxi=1,…,mdist(xctr,Di)+ε for any ε>0 ensure that
[TABLE]
Consequently, Prop. 2.2 yields exs(argmingλ∩BXm(ρ); δ-argminf)≤η for δ>2η. Since δ-argminf={(x1,…,xm)∈C∣xi=x1,i=1,…,m} for δ≥0, the conclusion holds.
The constraint qualification quantifies how close the points {xi∈Ci,i=1,…,m} will be to ∩i=1mCi when the points are close to each other. An example similar to the one discussed prior to the theorem is furnished by C1=D1={0,1}, C2=[0,1−δ], with δ∈(0,1), and D2=[ε,1−δ], with ε∈(0,1−δ], where dl^ρ(Ci,Di)≤ε for i=1,2 and ρ≥ε. Thus, C1∩C2={0}, but D1∩D2=∅ and it would be futile to attempt to find a feasible point in C1∩C2 by solving x∈D1∩D2. However, the approximating problem of the theorem produces the desired result. Specifically, in this case we can take ψ(γ)=γ/δ for γ≥0. Thus, the approximating problem produces a solution with error of at most ρˉ(λ−1+δ−1λ−1)+(1+4λ)ε. As ε↘0, this error vanishes as long as λ is set appropriately, for example to ε−1/2.
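The example can be explored numerically. The sketch below assumes that the approximating problem amounts to minimizing the function gλ from the proof, i.e., λ∣x2−x1∣ over (x1,x2)∈D1×D2, and solves it by brute force on a grid. It also evaluates the error expression stated in the text with λ=ε−1/2. The values `delta = 0.5` and `rho_bar = 4.0` are hypothetical.

```python
def solve_approximating_problem(eps, delta, lam, n_grid=2001):
    """Brute-force minimization of lam*|x2 - x1| over (x1, x2) in D1 x D2."""
    D1 = [0.0, 1.0]
    lo, hi = eps, 1.0 - delta
    D2 = [lo + (hi - lo) * k / (n_grid - 1) for k in range(n_grid)]
    candidates = ((lam * abs(x2 - x1), x1, x2) for x1 in D1 for x2 in D2)
    return min(candidates, key=lambda t: t[0])

def stated_error_bound(eps, delta, lam, rho_bar):
    """The expression rho_bar*(1/lam + 1/(delta*lam)) + (1 + 4*lam)*eps from the text."""
    return rho_bar * (1.0 / lam + 1.0 / (delta * lam)) + (1.0 + 4.0 * lam) * eps

delta, rho_bar = 0.5, 4.0          # hypothetical values
summary = []
for eps in (1e-1, 1e-2, 1e-3):
    lam = eps ** -0.5              # the choice of lambda suggested in the text
    value, x1, x2 = solve_approximating_problem(eps, delta, lam)
    summary.append((eps, x1, x2, stated_error_bound(eps, delta, lam, rho_bar)))
```

For each ε in this run the computed x1 equals 0, the unique point of C1∩C2, and the evaluated bound decreases as ε decreases.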
In general, the rate of convergence depends on the conditioning function ψ. Poor conditioning requires a large λ that in turn increases the third term in the conclusion of Thm. 4.4. Even in the convex case, the conditioning can be arbitrarily poor: let C1={x∈R2∣x2≤0} and C2={x∈R2∣x1α≤x2} for α>1, with C1∩C2={0}. Then, ψ(γ)=γ1/α and x1∈C1 and x2∈C2 can be close even though x1 is far from the origin for large α. Further details about constraint qualifications arise in the following two theorems for the case of inequality constraints.
Case II. The second case considers the optimization problem
[TABLE]
for which the actual functions need to be approximated by g0,…,gm. As already mentioned, an “approximating” problem obtained by simply replacing fi by gi for i=0,1,…,m might fail to be epigraphically close to the actual problem (1) even though maxi=0,…,msupx∈X∣fi(x)−gi(x)∣ is small. In particular, {x∈X∣gi(x)≤0,i=1,…,m} could be empty while the actual feasible set is nonempty. As an alternative, we examine for λ>0 the approximating problem
[TABLE]
with variable y=(y1,…,ym)∈Rm. We see next that this approximating problem furnishes approximating solutions for (1) via Prop. 2.2.
4.5 Theorem
(approximation by constraint softening). For a metric space X and fi,gi:X→R, i=0,1,…,m, where f0 and g0 are Lipschitz continuous with common modulus κ:R+→R+, consider the functions f,gλ:X×Rm→R defined by
[TABLE]
and, with λ∈(0,∞),
[TABLE]
Then, for ρ∈R+ (here we use the product metric on X×Rm constructed from the sup-norm on Rm),
[TABLE]
as long as ρˉ>2ρ+max{dist((xctr,0),epif),dist((xctr,0),epigλ)}, ρ∗≥ρˉ+max{0,−infBX(ρˉ)f0}, ρ^>ρˉ+max{ρ∗/λ,ψ−1(ρ∗/λ)}, and the following constraint qualification holds: there is a strictly increasing function ψ:R+→R+ such that
[TABLE]
**Proof.** As intermediate steps, we define h,hλ,fλ:X×Rm→R to have values h(x,y)=ιX×{0}(x,y)+ιC(x,y), with C={(x,y)∈X×Rm∣fi(x)≤yi,yi≥0,i=1,…,m}, and
[TABLE]
First, we examine the Kenmochi conditions for h and hλ.
Let (x,y)∈levρ∗hλ∩BX×Rm(ρ∗). Thus, (x,y)∈C, λ∑i=1myi≤ρ∗, and ∥y∥∞≤ρ∗/λ. Let ε>0 and η=max{ρ∗/λ,ψ−1(ρ∗/λ)}+ε. If fi(x)≤0 for all i, then
[TABLE]
Otherwise there is i∗ with fi∗(x)>0 so that
[TABLE]
and ψ−1(ρ∗/λ)≥dist(x,lev0{maxi=1,…,mfi}). There exists xˉ∈lev0{maxi=1,…,mfi} such that dX(x,xˉ)≤dist(x,lev0{maxi=1,…,mfi})+ε≤ψ−1(ρ∗/λ)+ε. Consequently,
[TABLE]
Thus, the second set of Kenmochi conditions holds with this η.
Since hλ≤h, the first set also holds. Consequently, since ε>0 is arbitrary and Prop. 4.1 applies, we have established that dl^ρ∗(epih,epihλ)≤max{ρ∗/λ,ψ−1(ρ∗/λ)}.
Second, we consider the Kenmochi conditions for fλ and gλ. Let δ=maxi=0,1,…,msupBX(ρˉ)∣fi−gi∣ and (x,y)∈levρˉfλ∩BX×Rm(ρˉ). Then, (x,y)∈C, fi(x)≤yi, and gi(x)≤yi+δ for all i=1,…,m. Set η=(1+mλ)δ and yˉ=y+(δ,…,δ). With B=BX×Rm((x,y),η), we obtain
[TABLE]
Repeating this argument with the roles of gλ and fλ reversed, we obtain via Prop. 4.1 that dl^ρˉ(epifλ,epigλ)≤(1+mλ)δ. Prop. 2.1 then yields the conclusion.
The theorem presents a tradeoff between two error terms. If the conditioning function ψ(γ)=γβ for β>0, then λ should be of the order O(δ−β/(1+β)) to balance the two terms, where δ=maxi=0,1,…,msupBX(ρˉ)∣fi−gi∣. This leads to the overall rate of convergence O(δ1/(1+β)), which can be significantly worse than what is indicated by the pointwise error δ. Still, the situation is much improved from the approach of simply minimizing g0(x) subject to gi(x)≤0 for i=1,…,m. As discussed prior to the theorem, that problem may have solutions that are arbitrarily far away from those of the actual problem (1). In some sense, the theorem explains the popularity of formulations with constraint softening in practice (see [15] for a prime example); they are in a fundamental way “robust” to inaccuracy in the constraint functions.
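The balancing argument can be checked with a short computation. The sketch below uses a simplified two-term model of the bound, an assumption made for illustration: the conditioning term is taken to be ψ−1(1/λ)=λ−1/β and the approximation term (1+mλ)δ, with all other constants set to one and β≥1.

```python
import math

def two_term_error(lam, delta, beta, m=1):
    """Simplified model: psi^{-1}(1/lam) + (1 + m*lam)*delta with psi(g) = g**beta."""
    conditioning = (1.0 / lam) ** (1.0 / beta)
    approximation = (1.0 + m * lam) * delta
    return conditioning + approximation

def balanced_lambda(delta, beta):
    """Penalty parameter of order delta**(-beta/(1+beta)) that balances the two terms."""
    return delta ** (-beta / (1.0 + beta))

beta = 2.0
deltas = [1e-2, 1e-4, 1e-6]
errors = [two_term_error(balanced_lambda(d, beta), d, beta) for d in deltas]

# Empirical convergence rate: slope of log(error) against log(delta).
rate = (math.log(errors[-1]) - math.log(errors[0])) / (
    math.log(deltas[-1]) - math.log(deltas[0])
)
```

Under these assumptions `rate` is close to 1/(1+β), in agreement with the rate O(δ1/(1+β)) stated above.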
Theorem 4.5 imposes no Slater-type constraint qualification on the actual problem and places no restrictions on the properties of the constraint functions at points in the feasible set. Naturally, if such conditions are brought in, we can improve the results; cf. Prop. 3.8 and [33, Thm. 4.6].
Case III. While still addressing the actual problem (1), the third case examines the classical penalty method and the resulting unconstrained approximating problems.
4.6 Theorem
(approximation by penalty formulation). For a metric space X, with centroid xctr, λ∈(0,∞), and fi,gi:X→R, i=0,1,…,m, where f0 and g0 are Lipschitz continuous with common modulus κ:R+→R+, consider the functions f,gλ:X×Rm→R defined by
[TABLE]
Then,
[TABLE]
provided that ρˉ>2ρ+max{dist(xctr,epif),dist(xctr,epigλ)}, ρ^>ρˉ+ψ−1((ρˉ−infBX(ρˉ)f0)λ−1), and the same constraint qualification as in Thm. 4.5 holds.
**Proof.** As an intermediate quantity, we define fλ:X→R to have values fλ(x)=f0(x)+λ∑i=1mmax{0,fi(x)}. We start by examining the Kenmochi conditions for f and fλ. Let x∈levρˉfλ∩BX(ρˉ) so that f0(x)+λ∑i=1mmax{0,fi(x)}≤ρˉ. If maxi=1,…,mfi(x)>0, then
[TABLE]
Since f0(x)≤ρˉ, infBX(ρˉ)f0≤ρˉ. These facts together with the constraint qualification lead to
[TABLE]
Let ε∈(0,ρ^−ρˉ−ψ−1((ρˉ−infBX(ρˉ)f0)λ−1)]. There exists xˉ∈lev0{maxi=1,…,mfi} such that dX(x,xˉ)≤η+ε and
[TABLE]
Alternatively, if maxi=1,…,mfi(x)≤0, then infBX(x,0)f≤f0(x)≤fλ(x). We have therefore established the second Kenmochi condition for f and fλ with error max{1,κ(ρ^)}(η+ε). Since f≥fλ, the first Kenmochi condition holds with an error of zero. Since ε>0 is arbitrary, we have established via Prop. 4.1 that
[TABLE]
Trivially, ∣fλ(x)−gλ(x)∣≤(1+mλ)maxi=0,1,…,msupBX(ρˉ)∣fi−gi∣ for x∈BX(ρˉ) so that dl^ρˉ(epifλ,epigλ) is also bounded by the same quantity; cf. Prop. 4.2. The conclusion then follows by Prop. 2.1.
We again find a tradeoff between two error terms that are nearly identical to those in Thm. 4.5. From this perspective, the penalty formulation has the same rate of convergence as that in Case II and is therefore stable even when the actual feasible set in (1) has an empty interior.
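A small instance illustrates the stability of the penalty formulation when the feasible set has empty interior. The sketch below uses hypothetical data on X=R: f0(x)=x and f1(x)=x2, so the actual feasible set is {0}, and the perturbed constraint function g1(x)=x2+δ, for which {g1≤0} is empty. The penalized problem is solved by grid search.

```python
def penalty_objective(x, lam, delta):
    """g0(x) + lam*max(0, g1(x)) with g0(x) = x and g1(x) = x^2 + delta."""
    g0 = x                       # objective, Lipschitz with modulus 1
    g1 = x * x + delta           # perturbed constraint; g1 <= 0 is infeasible when delta > 0
    return g0 + lam * max(0.0, g1)

def argmin_on_grid(fun, lo=-1.0, hi=1.0, n=20001):
    """Minimizer of fun over an equally spaced grid on [lo, hi]."""
    grid = [lo + (hi - lo) * k / (n - 1) for k in range(n)]
    return min(grid, key=fun)

delta = 1e-3                     # hypothetical perturbation level
minimizers = {
    lam: argmin_on_grid(lambda x, lam=lam: penalty_objective(x, lam, delta))
    for lam in (1.0, 10.0, 100.0)
}
```

In this instance the penalized minimizer sits near −1/(2λ) and approaches the actual solution 0 as λ grows, even though directly replacing f1 by g1 leaves no feasible point at all.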
4.2 Calculus Rules for Compositions
The truncated Hausdorff distance between epigraphs of functions that are certain compositions can be bounded as we see next. The results of this subsection extend in some sense Prop. 4.3, which deals with sums. Composition rules for epi-sum and epi-multiplication can be found in [4]; see also [9] for a systematic treatment of the convex case including sums of convex functions.
4.7 Proposition
(compositions; Lipschitz inner mapping). For metric spaces (X,dX) and (Y,dY), with centroids xctr and yctr, respectively, f,g:Y→R, and F,G:X→Y, suppose that F−1,G−1:Y⇉X are nonempty-valued and Lipschitz continuous with common modulus κ:R+→R+ relative to ρ∗∈[0,∞]. Then, for ρ∈R+,
[TABLE]
provided that ρ∗>2ρ+max{∣α∣,∣αˉ∣,dist(xctr,F−1(y)),dist(xctr,F−1(yˉ)), and dist(xctr,G−1(yˉ))} for some (y,α)∈epif and (yˉ,αˉ)∈epig,
[TABLE]
and ρ^>ρˉ+dl^ρˉ(epif,epig).
**Proof.** Let F^,G^:X×R→Y×R have F^(x,α)=(F(x),α) and G^(x,α)=(G(x),α) for (x,α)∈X×R. Then, it follows directly that
[TABLE]
and we can bring in Thm. 3.5 with S=F^−1 and T=G^−1. Let ε>0. There exists x∈F−1(y) such that dX(xctr,x)≤dist(xctr,F−1(y))+ε. Then, f(F(x))=f(y)≤α and (x,α)∈epi(f∘F). Consequently,
[TABLE]
Similar arguments establish that
[TABLE]
This ensures that ρ∗ is selected sufficiently large for the application of Thm. 3.5. Next, we consider the size of ρˉ and find that
[TABLE]
Since similar statements hold with F replaced by G and epif replaced by epig, the condition on ρˉ suffices and Thm. 3.5 yields the conclusion.
4.8 Corollary
(compositions; linear inner mapping). For f,g:Rn→R and nonsingular n×n matrices A and B, suppose that φ,ψ:Rn→R are defined by φ(x)=f(Ax) and ψ(x)=g(Bx), x∈Rn. Then, for ρ∈R+ (here we use the operator norm for matrices),
[TABLE]
as long as ρˉ>max{1,∥A∥,∥B∥}(2ρ+max{∣α∣,∣αˉ∣,dist(0,A−1y), dist(0,A−1yˉ), and dist(0,B−1yˉ)}) for some (y,α)∈epif and (yˉ,αˉ)∈epig.
**Proof.** The result follows directly from Prop. 4.7.
The corollary extends in some sense [9, Cor. 2.6] by allowing for nonconvex f,g and different linear mappings, but at the expense of requiring invertible mappings.
4.9 Proposition
(compositions; Lipschitz outer function). For metric spaces (X,dX) and (Y,dY), with yctr being the centroid of Y, suppose that f:Y→R is Lipschitz continuous with modulus κ:R+→R+, and F,G:X→Y. Then, for ρ∈R+,
[TABLE]
provided that ρ^>ρˉ+dl^ρˉ(gphF,gphG) and
[TABLE]
**Proof.** Let η=dl^ρˉ(gphF,gphG), x∈levρ(f∘F)∩BX(ρ), and ε∈(0,ρ^−ρˉ−η]. Then, (x,F(x))∈BX×Y(ρˉ) and there exists xˉ∈X with dX(xˉ,x)≤η+ε and dY(F(x),G(xˉ))≤η+ε. Since both F(x),G(xˉ)∈BY(ρ^),
[TABLE]
We repeat the argument with the roles of F and G reversed and obtain via Prop. 4.1 that dl^ρ(epi(f∘F),epi(f∘G))≤max{1,κ(ρ^)}(η+ε). Since ε is arbitrary, the conclusion follows.
The previous two propositions largely summarize the line of reasoning in the proofs of Thm. 4.4, 4.5, and 4.6 and thereby facilitate various extensions of Cases I, II, and III.
4.10 Proposition
(inf-projections).
For a metric space X and {fα,gα:X→R,α∈A}, with A an arbitrary set, define f,g:X→R as f(x)=infα∈Afα(x) and g(x)=infα∈Agα(x). Then, for ρ∈R+,
[TABLE]
**Proof.** In view of the fact that epif=∪α∈Aepifα and similarly for epig, the conclusion follows immediately from Prop. 3.3.
Since a function f=supα∈Afα has as epigraph the intersection of epifα,α∈A, it is clear from the discussion in Section 3 that no comparable result is possible for sup-projections. We refer to [9, Cor. 2.5] for a result in the convex case and [36, Thm. 5.6] for one under Lipschitz continuity assumptions.
Given metric spaces X and Y as well as f:X→R and F:X→Y, the epi-composition Ff:Y→R has
[TABLE]
Epi-compositions arise, for example, in parametric studies of equality constrained problems.
4.11 Proposition
(epi-compositions).
For metric spaces (X,dX) and (Y,dY), with xctr being the centroid of X, f,g:X→R, and Lipschitz continuous F,G:X→Y with common modulus κ:R+→R+ relative to ∞, suppose that
[TABLE]
Then, for ρ∈R+,
[TABLE]
provided that ρ∗>2ρ+max{dY(F(x),yctr),dY(F(xˉ),yctr),dY(G(xˉ),yctr),∣α∣,∣αˉ∣} for some (x,α)∈epif and (xˉ,αˉ)∈epig, ρˉ>ρ∗ and also exceeds
[TABLE]
and ρ^>ρˉ+dl^ρˉ(epif,epig).
**Proof.** We start by confirming that epiFf={(F(x),α)∣(x,α)∈epif}; a finite-dimensional version of this fact is asserted as Exercise 1.31 in [32]. For (xˉ,αˉ)∈epif, we have that inf{f(x)∣F(x)=F(xˉ)}≤f(xˉ)≤αˉ. Thus, epiFf⊃{(F(x),α)∣(x,α)∈epif}. Suppose that (y,α)∈epiFf. Then, (Ff)(y)<∞. If (Ff)(y)=−∞, then there exists xˉ∈X such that f(xˉ)≤α and F(xˉ)=y. Consequently, (y,α)∈{(F(x),α)∣(x,α)∈epif}. If (Ff)(y)∈R, then there exists by assumption xˉ∈X such that f(xˉ)=inf{f(x)∣F(x)=y} and F(xˉ)=y. Thus, f(xˉ)=(Ff)(y)≤α, (xˉ,α)∈epif, and epiFf⊂{(F(x),α)∣(x,α)∈epif}. We have confirmed the assertion, which also holds for Gf.
The conclusion follows by Thm. 3.5 applied to the mappings F^,G^:X×R→Y×R defined by F^(x,α)=(F(x),α) and G^(x,α)=(G(x),α). Since F and G are Lipschitz continuous with common modulus κ:R+→R+ relative to ∞, F^ and G^ are Lipschitz continuous with modulus ρ↦max{1,κ(ρ)} relative to any real number. The requirement on ρ∗ in Thm. 3.5 is satisfied because dist((yctr,0),F^(epif))≤max{dY(F(x),yctr),∣α∣} for (x,α)∈epif, with similar inequalities holding for G^ and epig. The requirement on ρˉ in Thm. 3.5 also is satisfied because
[TABLE]
with similar expressions for G^ and epig.
5 Distances between Graphs of Set-Valued Mappings
We next turn to the solution of generalized equations. For metric spaces X and Y, a set-valued mapping S:X⇉Y and a point y⋆∈Y define the generalized equation y⋆∈S(x). Its solution set is S−1(y⋆). In this section, we focus on the set of near-solutions that consists of those x∈X with S(x) “nearly reaching” y⋆. Specifically, for ε≥0, the set of ε-solutions is defined as
[TABLE]
For example, suppose that f:Rn→R is locally Lipschitz continuous and C⊂Rn is nonempty and closed. Then, an optimality condition for the problem of minimizing f+ιC would be
[TABLE]
see [32, Exercise 10.10]. With S=∂f+NC and y⋆=0, the set of ε-solutions becomes
[TABLE]
The next theorem bounds the discrepancy between near-solutions of generalized equations in terms of the truncated Hausdorff distance without making assumptions about local regularity properties of the underlying set-valued mappings.
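As a concrete instance of near-solutions, the sketch below takes hypothetical one-dimensional data f(x)=x2/2 and C=[1,2], so that S(x)=x+NC(x), and y⋆=0. It assumes the ε-solution set is S−1(BY(y⋆,ε)), which for closed values S(x) amounts to dist(0,S(x))≤ε.

```python
import math

def dist_zero_to_S(x, lo=1.0, hi=2.0):
    """dist(0, S(x)) for S(x) = x + N_C(x) with f(x) = x^2/2 and C = [lo, hi]."""
    if x < lo or x > hi:
        return math.inf                 # N_C(x) is empty outside C, so S(x) is empty
    if x == lo:
        a, b = -math.inf, x             # x + (-inf, 0]
    elif x == hi:
        a, b = x, math.inf              # x + [0, inf)
    else:
        a, b = x, x                     # N_C(x) = {0} in the interior of C
    return 0.0 if a <= 0.0 <= b else min(abs(a), abs(b))

def eps_solutions(eps, grid):
    """Grid points x with dist(0, S(x)) <= eps."""
    return [x for x in grid if dist_zero_to_S(x) <= eps]

grid = [1.0 + k / 100.0 for k in range(101)]   # grid on C = [1, 2]
small = eps_solutions(0.5, grid)               # only the stationary point x = 1
large = eps_solutions(1.205, grid)             # x = 1 together with points just above it
```

For ε<1 only the stationary point x=1 qualifies, whereas larger ε admits interior points of C whose gradient has magnitude at most ε.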
5.1 Theorem
(approximation of near-solutions of generalized equations). For metric spaces X and Y, suppose that S,T:X⇉Y have nonempty graphs, 0≤ε≤ρ<∞, and y⋆∈BY(ρ−ε). Then,
[TABLE]
provided that δ>ε+dl^ρ(gphS,gphT). If X and Y are finitely compact and gphT is closed, then the result also holds for δ=ε+dl^ρ(gphS,gphT).
**Proof.** Let γ∈(0,δ−ε−dl^ρ(gphS,gphT)]. Suppose that x∈S−1(BY(y⋆,ε))∩BX(ρ). Then, there is y∈S(x) with dY(y,y⋆)≤ε so that (x,y)∈BX×Y(ρ). Consequently, for some (xˉ,yˉ)∈gphT,
[TABLE]
Moreover, dY(yˉ,y⋆)≤dY(yˉ,y)+dY(y,y⋆)≤dl^ρ(gphS,gphT)+γ+ε≤δ, which implies that xˉ∈T−1(BY(y⋆,δ)). We have established that
[TABLE]
Since γ is arbitrary, the first conclusion follows. The minimum distance to a nonempty closed subset of a finitely compact space is attained [33, Lemma 2.2], which allows us to use γ=0 in the above arguments. This establishes the second conclusion.
The result of the theorem is sharp. For example, consider S,T:R⇉R with S(x)=[x,∞) when x∈[0,1] and S(x)=∅ otherwise; and T(x)=(1,∞) when x∈[1,2] and T(x)=∅ otherwise. Then for ρ≥0, dl^ρ(gphS,gphT)=1, S−1(0)={0}, T−1(δ)=[1,2], and exs(S−1(0)∩BR(ρ);T−1(BR(δ)))=1 when δ>1. When δ≤1, the excess becomes infinity because T−1(δ)=∅. If T is modified to have T(x)=[1,∞) for x∈[1,2], then δ=1 gives an excess of one.
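The excess values in this example follow from elementary interval arithmetic, which the sketch below spells out. The helper names are hypothetical, and the sets S−1(0) and T−1(BR(δ)) are entered by hand from the definitions of S and T.

```python
import math

def T_inverse_of_ball(delta):
    """Points x with T(x) meeting [-delta, delta], for T(x) = (1, inf) on [1, 2]."""
    # The open half-line (1, inf) meets [-delta, delta] only when delta > 1.
    return (1.0, 2.0) if delta > 1.0 else None

def excess_of_point_over_interval(x, interval):
    """Excess of the singleton {x} over a closed interval (None encodes the empty set)."""
    if interval is None:
        return math.inf                 # distance to the empty set is infinite
    lo, hi = interval
    return max(lo - x, x - hi, 0.0)

S_inverse_of_zero = 0.0                 # S(x) = [x, inf) on [0, 1] contains 0 only at x = 0

excess_large_delta = excess_of_point_over_interval(S_inverse_of_zero, T_inverse_of_ball(1.5))
excess_small_delta = excess_of_point_over_interval(S_inverse_of_zero, T_inverse_of_ball(1.0))
```

The first value equals one, matching dl^ρ(gphS,gphT)=1 as stated in the text, and the second is infinite because the near-solution set of T is empty for δ≤1.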
5.2 Theorem
(sum of mappings under Lipschitz property). For normed linear spaces X and Y, suppose that S1,T1:X⇉Y are nonempty-valued and Lipschitz continuous with common modulus κ:R+→R+ relative to ρ∗∈[0,∞] and S2,T2:X⇉Y have nonempty graphs. Then, for ρ∈R+,
[TABLE]
provided that ρˉ≥ρ+ρ′, with ρ′ such that BY(ρ′) contains both S1(x) and T1(x) for all x∈BX(ρ), ρ^>ρ+dl^ρˉ(gphS2,gphT2), and ρ∗>3ρ′+κ(ρ^)(ρ^−ρ).
**Proof.** Let (x,y)∈gph(T1+T2)∩BX×Y(ρ). Thus, for some y1∈T1(x) and y2∈T2(x) we have y=y1+y2 and ∥y2∥≤∥y∥+∥y1∥≤ρ+ρ′≤ρˉ. Let ε∈(0,ρ^−ρ−dl^ρˉ(gphS2,gphT2)]. Consequently, (x,y2)∈gphT2∩BX×Y(ρˉ) so there exists (xˉ,yˉ2)∈gphS2 with max{∥x−xˉ∥,∥y2−yˉ2∥}≤dl^ρˉ(gphS2,gphT2)+ε≤ρ^−ρ, which ensures that ∥xˉ∥≤∥x−xˉ∥+∥x∥≤ρ^−ρ+ρ≤ρ^. Since S1 is nonempty-valued, there is yˉ1∈S1(xˉ) such that dist(y1,S1(xˉ))≥∥y1−yˉ1∥−ε.
Therefore, (xˉ,yˉ1+yˉ2)∈gph(S1+S2).
Since y1∈BY(ρ′), it follows that
[TABLE]
where the last inequality is a consequence of Prop. 2.1; ρ∗ is indeed sufficiently large because dist(yctr,T1(x))≤ρ′, dist(yctr,S1(x))≤ρ′, and
[TABLE]
Moreover, with yˉ=yˉ1+yˉ2, ∥y−yˉ∥ is not greater than
[TABLE]
This establishes that (xˉ,yˉ)∈gph(S1+S2) satisfies
[TABLE]
Since (x,y) and ε are arbitrary, we obtain that
[TABLE]
The roles of (S1,S2) and (T1,T2) can be reversed, which leads to the conclusion.
A series of results is now possible with applications to games as well as equilibrium and generalized fixed-point problems. We limit the discussion to optimality conditions. As a preliminary example, let C,D⊂Rn be nonempty, possibly nonconvex sets and let f,g:Rn→R be smooth with gradients that are Lipschitz continuous with modulus κ:R+→R+ relative to ρ∗=∞, i.e., ∥∇f(x)−∇f(xˉ)∥≤κ(ρ)∥x−xˉ∥ for ∥x∥≤ρ, ∥xˉ∥≤ρ, and ρ∈R+, with the same condition holding for ∇g. Thm. 5.2 enables a study of the optimality conditions 0∈∇f(x)+NC(x) and 0∈∇g(x)+ND(x). The discrepancy between the corresponding near-stationary points is bounded via Thm. 5.1 by
[TABLE]
for sufficiently large ρ^ and ρˉ with further simplifications possible if C and D are convex, cf. Prop. 2.5.
Example 3: difference-of-convex functions. For convex functions f1:Rn→R and f2:Rn→R, the latter also lsc and proper, as well as a point xˉ with f2(xˉ) finite, the following optimality condition holds [19], where for subsets A and B of a linear space, A−B:={a−b∣a∈A,b∈B}:
[TABLE]
The minimization of such difference-of-convex functions arises in numerous applications, including some in modern statistics [16, 34]. Error analysis of near-stationarity in this case can be carried out as follows.
Suppose initially that f1,g1 are also smooth and ρ∈R+. Then, there are α,ρˉ∈R+ such that (using the Euclidean distance on Rn)
[TABLE]
which via Thm. 5.1 gives error estimates of near-stationary points. We can establish this fact by setting S1=−∇f1, T1=−∇g1, S2=∂f2, and T2=∂g2 so that S1 and T1 are nonempty-valued and Lipschitz continuous with some common modulus κ:R+→R+ relative to ρ∗=∞. An application of Thm. 5.2 with these set-valued mappings and ρ′=sup∥x∥2≤ρmax{∥∇f1(x)∥2,∥∇g1(x)∥2}, ρˉ=ρ+ρ′, and ρ^>ρ+dl^ρˉ(gph∂f2,gph∂g2) yields
[TABLE]
An application of Prop. 2.4 gives the result after an appropriate enlargement of ρˉ.
We can relax the assumption about f1 and g1 being smooth by stating the optimality condition in terms of the set-valued mappings S,T:Rn×Rn⇉Rn×Rn with expressions
[TABLE]
Clearly, 0∈S(x,v) implies that 0∈∂f2(x)−∂f1(x); and 0∈∂f2(x)−∂f1(x) implies that there exists a “multiplier vector” v∈Rn such that 0∈S(x,v).
A bound on dl^ρ(gphS,gphT) will then via Thm. 5.1 furnish a bound on the difference between near-stationary points in the “primal-dual” space Rn×Rn as one passes from minimizing f2−f1 to minimizing g2−g1. For simplicity, we adopt the sup-norm for the remainder of this example. Specifically, we find that for ρ∈R+
[TABLE]
To see this let ((xˉ,vˉ),(yˉ1,yˉ2))∈gphS∩BR4n(ρ), i.e., yˉ1+vˉ∈∂f1(xˉ) and yˉ2+vˉ∈∂f2(xˉ).
For i=1,2, since ∥xˉ∥∞≤ρ and ∥yˉi+vˉ∥∞≤2ρ, there exists yi∈Rn such that
[TABLE]
which implies ((xˉ,vˉ),(y1,y2))∈gphT. The distance between ((xˉ,vˉ),(y1,y2)) and ((xˉ,vˉ),(yˉ1,yˉ2)) then yields the stated upper bound on dl^ρ(gphS,gphT).
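The primal-dual formulation can be illustrated on a one-dimensional instance with hypothetical data f1(x)=∣x∣ and f2(x)=x2. The sketch assumes, consistent with the membership conditions yˉ1+vˉ∈∂f1(xˉ) and yˉ2+vˉ∈∂f2(xˉ) used above, that S(x,v)=(∂f1(x)−v)×(∂f2(x)−v), and it measures the sup-norm distance from 0 to S(x,v).

```python
def subdiff_abs(x):
    """Subdifferential of f1(x) = |x| as a closed interval (lo, hi)."""
    if x > 0:
        return (1.0, 1.0)
    if x < 0:
        return (-1.0, -1.0)
    return (-1.0, 1.0)

def subdiff_square(x):
    """Subdifferential (gradient) of f2(x) = x^2 as a degenerate interval."""
    return (2.0 * x, 2.0 * x)

def dist_to_interval(t, interval):
    """Distance from the scalar t to a closed interval."""
    lo, hi = interval
    return max(lo - t, t - hi, 0.0)

def residual(x, v):
    """Sup-norm distance from 0 to S(x, v) = (subdiff f1(x) - v) x (subdiff f2(x) - v)."""
    return max(dist_to_interval(v, subdiff_abs(x)), dist_to_interval(v, subdiff_square(x)))

stationary_pairs = [(0.5, 1.0), (-0.5, -1.0), (0.0, 0.0)]   # (x, v) with v in both subdifferentials
residuals = [residual(x, v) for x, v in stationary_pairs]
off_residual = residual(0.25, 0.5)                           # not stationary: v misses subdiff f1(x)
```

The three listed pairs have zero residual, so 0∈S(x,v) there, while the pair (0.25, 0.5) has residual 0.5 and is therefore an ε-solution only for ε≥0.5.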
Example 4: KKT conditions. Theorem 5.1 also applies to the KKT conditions for the problem
[TABLE]
when compared to those of an alternative, possibly approximating, problem obtained by replacing the functions by the smooth functions g0,g1,…,gm. Clearly, (x,y)∈Rn+m satisfies the KKT conditions for the actual problem if and only if 0∈S(x,y) and likewise those of the alternative problem if and only if 0∈T(x,y), where the set-valued mappings S,T:Rn+m⇉R3m+n have values
[TABLE]
with y=(y1,…,ym). A bound on the truncated Hausdorff distance between the graphs of these two set-valued mappings furnishes the critical component in the application of Thm. 5.1. In this example, we equip Rn+m and R3m+n with the sup-norm. Then, for ρ∈R+,
[TABLE]
where
[TABLE]
This assertion is realized as follows. Let ((x,y),(u,v,w,s))∈gphS∩BR4m+2n(ρ) be arbitrary and construct xˉ=x, yˉ=y, uˉ=(uˉ1,…,uˉm), with uˉi=max{gi(x),ui} for all i, vˉ=v, wˉ=(wˉ1,…,wˉm), with wˉi=yigi(x) for all i, and sˉ=∇g0(x)+∑i=1myi∇gi(x). It is trivial to verify that ((xˉ,yˉ),(uˉ,vˉ,wˉ,sˉ))∈gphT. For all i,
[TABLE]
[TABLE]
[TABLE]
Consequently, the distance between ((x,y),(u,v,w,s)) and ((xˉ,yˉ),(uˉ,vˉ,wˉ,sˉ)) is at most max{δ,ρδ,(1+mρ)η} and we have that exs(gphS∩BR4m+2n(ρ);gphT) is bounded by the same quantity. The assertion then follows by symmetry.
We see that despite the fact that minimizers of inequality-constrained problems are unstable under pointwise perturbations of the constraint functions (cf. Section 4), the KKT system has stable solutions in the sense that the excess of near-solutions of one KKT system over those of the other exhibits a Lipschitz property in those perturbations.
We end the paper with a result that generalizes the ideas of Examples 3 and 4. For a proper lsc function φ:Rm→R and a smooth mapping F:Rn→Rm, we recall that under rather weak assumptions (for example, if φ is convex, then it suffices that domφ cannot be separated from the range of the linearized mapping w↦F(xˉ)+∇F(xˉ)w for a local minimizer xˉ) the composite function φ∘F has 0∈∇F(x)⊤∂φ(F(x)) as a necessary optimality condition [32, Thm. 10.6], where the m×n-matrix ∇F(x) is the Jacobian of F at x. By introducing auxiliary vectors y,z∈Rm, the optimality condition is equivalently stated in terms of the set-valued mapping S:Rn×Rm×Rm⇉Rm×Rm×Rn as 0∈S(x,y,z), with
[TABLE]
Since 0∈S(x,y,z) is also an optimality condition for the problem of minimizing φ(z) subject to F(x)=z, y can be interpreted as a multiplier vector and z as representing feasibility. Parallel conditions hold for a composite function ψ∘G expressed in terms of ψ:Rm→R and G:Rn→Rm, which we may think of as approximations of φ and F. Specifically, under the appropriate assumptions, an optimality condition becomes 0∈T(x,y,z), where the set-valued mapping T:Rn×Rm×Rm⇉Rm×Rm×Rn has
[TABLE]
In view of Thm. 5.1, a bound on dl^ρ(gphS,gphT) leads to estimates of the change in near-stationary points as we pass from φ∘F to ψ∘G.
5.3 Theorem
(stationarity of composite functions). For proper lsc functions φ,ψ:Rm→R, smooth mappings F,G:Rn→Rm, and the resulting set-valued mappings S and T expressed in (2) and (3), we have for ρ∈R+ that (here, dl^ρ is defined in terms of the product norm on Rn×Rm×Rm×Rm×Rm×Rn constructed by any norms on Rn and Rm and the matrix norm is any one compatible with the norm on Rm)
[TABLE]
**Proof.** Suppose that ((xˉ,yˉ,zˉ),(uˉ,vˉ,wˉ))∈gphS∩BX(ρ), where X=Rn×Rm×Rm×Rm×Rm×Rn is equipped with the norm specified in the theorem. Then,
[TABLE]
Since (zˉ,vˉ+yˉ)∈gph∂φ∩BRm×Rm(2ρ) (using the product norm on Rm×Rm) and the fact that gph∂ψ is nonempty [32, Cor. 8.10], there exist z,v∈Rm such that (z,v+yˉ)∈gph∂ψ and neither ∥z−zˉ∥ nor ∥(vˉ+yˉ)−(v+yˉ)∥ exceeds dl^2ρ(gph∂φ,gph∂ψ). Construct u=G(xˉ)−z and w=∇G(xˉ)⊤yˉ. Clearly, ((xˉ,yˉ,z),(u,v,w))∈gphT and
[TABLE]
Moreover, due to the assumed compatibility of the adopted matrix norm relative to the norm on Rm,
[TABLE]
The point ((xˉ,yˉ,z),(u,v,w)) is therefore within a distance of
[TABLE]
of ((xˉ,yˉ,zˉ),(uˉ,vˉ,wˉ)), which establishes the conclusion after we realize the obvious symmetry in the result.
Appendix A Proofs
Proof of Prop. 2.2. Denote by dX the metric on X and η=dl^ρ(epif,epig). Let γ∈(0,ρ−ε−inff). Since γ-argminf∩BX(ρ)≠∅, there exists xˉ∈BX(ρ) such that f(xˉ)≤inff+γ<ρ−ε≤ρ. Moreover, f(xˉ)≥inff≥−ρ. Thus, (xˉ,f(xˉ))∈epif∩BX×R(ρ) and there exists (x,α)∈epig such that max{dX(x,xˉ),∣α−f(xˉ)∣}≤dist((xˉ,f(xˉ)),epig)+γ. Then,
[TABLE]
and also η≥∣α−f(xˉ)∣−γ. Collecting the above results yields infg≤g(x)≤α≤f(xˉ)+η+γ≤inff+η+2γ. Since γ is arbitrary, we have established that infg≤inff+η. The same argument with the roles of f and g reversed leads to the first conclusion.
Let xˉ∈ε-argming∩BX(ρ). Then, g(xˉ)≤infg+ε<ρ, g(xˉ)≥infg≥−ρ, and (xˉ,g(xˉ))∈epig∩BX×R(ρ). Let γ>0. There exists (x,α)∈epif such that max{dX(x,xˉ),∣α−g(xˉ)∣}≤dist((xˉ,g(xˉ)),epif)+γ. Consequently, η≥dX(x,xˉ)−γ and η≥∣α−g(xˉ)∣−γ. These facts together with the first conclusion establish that f(x)≤α≤g(xˉ)+η+γ≤infg+ε+η+γ≤inff+ε+2η+γ. Thus, x∈(ε+2η+γ)-argminf and dX(x,xˉ)≤η+γ, and then also exs(ε-argming∩BX(ρ); (ε+2η+γˉ)-argminf)≤η+γ when γˉ≥γ. Since γ is arbitrary, the second conclusion follows.
Proof of Prop. 2.3. Let xˉ∈levδg∩BX(ρ) and B=BX×R(ρ). Then, g(xˉ)≤δ≤ρ. There are two cases. Suppose that g(xˉ)≥−ρ. Then, (xˉ,g(xˉ))∈epig∩B. Let γ∈(0,ε−δ−exs(epig∩B;epif)). There exists (x,α)∈epif such that max{dX(x,xˉ),∣α−g(xˉ)∣}≤dist((xˉ,g(xˉ)),epif)+γ≤exs(epig∩B;epif)+γ. Consequently,
[TABLE]
Thus, x∈levεf and dX(x,xˉ)≤exs(epig∩B;epif)+γ. This implies that
[TABLE]
If g(xˉ)<−ρ, the same holds because the arguments in that case can be carried out with g(xˉ) replaced by −ρ.
Since γ is arbitrary, the second conclusion follows.
Proof of Prop. 3.6. Let C=∑i=1mCi, D=∑i=1mDi, and ε>0. Suppose without loss of generality that dl^ρ(C,D)=exs(C∩BX(ρ);D). If C∩BX(ρ)=∅, dl^ρ(C,D)=0 and the result holds trivially. Thus, suppose that C∩BX(ρ)≠∅. Then, there are xi∈Ci and yi∈Di, i=1,…,m, such that x=∑i=1mxi∈C∩BX(ρ), ∥xi−yi∥≤dist(xi,Di)+ε, and
[TABLE]
Since xi∈Ci implies xi∈BX(ρ),
[TABLE]
Hence, dl^ρ(C,D)≤∑i=1mdl^ρ(Ci,Di)+(m+1)ε. Since ε is arbitrary, the first conclusion follows. Under the relaxed assumption, x1∈BX(mρ) because {x,xi∈BX(ρ),i=2,…,m}.
Thus,
[TABLE]
Since the other arguments carry over, the second conclusion follows.
Proof of Prop. 4.1. Let η=dl^ρ(epif,epig) and ε>0. Suppose that (x,f(x))∈epif∩BX×R(ρ). Then, there exists (xˉ,α)∈epig such that dX(xˉ,x)≤η+ε, ∣α−f(x)∣≤η+ε, and g(xˉ)≤α<∞. Thus, g(xˉ)≤α≤f(x)+η+ε≤max{f(x),−ρ}+η+ε. This establishes that infB(x,η+ε)g≤max{f(x),−ρ}+η+ε for x∈levρf∩BX(ρ) and f(x)≥−ρ. Suppose that x∈levρf∩BX(ρ) and f(x)<−ρ. Then, (x,−ρ)∈epif∩BX×R(ρ) and there exists (xˉ,α)∈epig such that dX(xˉ,x)≤η+ε, ∣α+ρ∣≤η+ε, and g(xˉ)≤α<∞. Consequently,
[TABLE]
Repeating the arguments with the roles of f and g reversed, we establish that the two sets of constraints on the right-hand side in the proposition are satisfied with η+ε. Thus, the right-hand side does not exceed η+ε. Since ε is arbitrary, the right-hand side furnishes a lower bound on dl^ρ(epif,epig). By [33, Prop. 3.2], it is also an upper bound; the lsc assumption in that proposition is not needed in its proof.
Acknowledgement. This work is supported in part by DARPA (Lagrange) under HR0011-8-34187, ONR (Science of Autonomy) under N0001419WX00183, and AFOSR (Optimization and Discrete Mathematics) under F4FGA08272G001.
Bibliography
[1] H. Attouch. Variational Convergence for Functions and Operators. Applicable Mathematics Sciences. Pitman, 1984.
[2] H. Attouch, R. Lucchetti, and R. J-B Wets. The topology of the ρ-Hausdorff distance. Annali di Matematica pura ed applicata, CLX:303–320, 1991.
[3] H. Attouch and R. J-B Wets. Isometries for the Legendre-Fenchel transform. Transactions of the American Mathematical Society, 296:33–60, 1986.
[4] H. Attouch and R. J-B Wets. Quantitative stability of variational systems: I. The epigraphical distance. Transactions of the American Mathematical Society, 328(2):695–729, 1991.
[5] H. Attouch and R. J-B Wets. Quantitative stability of variational systems: II. A framework for nonlinear conditioning. SIAM J. Optimization, 3:359–381, 1993.
[6] H. Attouch and R. J-B Wets. Quantitative stability of variational systems: III. ε-approximate solutions. Mathematical Programming, 61:197–214, 1993.
[7] J.-P. Aubin and I. Ekeland. Applied Nonlinear Analysis. Issue 1237 of Pure and applied mathematics. Wiley, 1984.
[8] D. Aze. A survey on error bounds for lower semicontinuous functions. In Proceedings of 2003 MODESMAI Conference, ESAIM Proc., vol. 13. EDP Sci., Les Ulis (2003), pages 1–17, 2003.