Stability and Error Analysis for Optimization and Generalized Equations

Johannes O. Royset

arXiv:1903.08754 · math.OC · February 25, 2020 · SIAM J. Optim.


TL;DR

This paper develops stability and error bounds for nonconvex optimization problems and generalized equations, including infinite-dimensional problems and problems that lack regularity near solutions, using truncated Hausdorff distances between epigraphs and graphs on metric spaces.

Contribution

It introduces bounds on solution errors using truncated Hausdorff distances and extends calculus tools for these distances to handle compositions and complex problem structures.

Findings

01. Bounds on solution errors for nonconvex problems

02. Extensions of Hausdorff distance calculus to compositions

03. Applications to KKT systems and difference-of-convex functions

Abstract

Stability and error analysis remain challenging for problems that lack regularity properties near solutions, are subject to large perturbations, and might be infinite dimensional. We consider nonconvex optimization and generalized equations defined on metric spaces and develop bounds on solution errors using the truncated Hausdorff distance applied to graphs and epigraphs of the underlying set-valued mappings and functions. In the process, we extend the calculus of such distances to cover compositions and other constructions that arise in nonconvex problems. The results are applied to constrained problems with feasible sets that might have empty interiors, solution of KKT systems, and optimality conditions for difference-of-convex functions and composite functions.

Equations

$$\mathop{\rm dist}(x,C):=\inf\big\{d_{X}(x,\bar{x})~|~\bar{x}\in C\big\}\mbox{ if }C\mbox{ is nonempty and }\mathop{\rm dist}(x,\emptyset):=\infty.$$

$$\mathop{\rm exs}(C;D):=\sup\big\{\mathop{\rm dist}(x,D)~|~x\in C\big\}\mbox{ if }C,D\mbox{ are nonempty},$$

$$\mathbb{B}_{X}(\rho):=\big\{x\in X~|~d_{X}(x^{\rm ctr},x)\leq\rho\big\}\mbox{ for }\rho\geq 0.$$

$$d\hat{\kern-1.49994ptl}_{\rho}(C,D):=\max\Big\{\mathop{\rm exs}\big(C\cap\mathbb{B}_{X}(\rho);D\big),~\mathop{\rm exs}\big(D\cap\mathbb{B}_{X}(\rho);C\big)\Big\},$$

$$d\hat{\kern-1.49994ptl}_{\rho}(C_{1},C_{3})\leq d\hat{\kern-1.49994ptl}_{\bar{\rho}}(C_{1},C_{2})+d\hat{\kern-1.49994ptl}_{\bar{\rho}}(C_{2},C_{3})$$

$$\mathop{\rm epi}f:=\big\{(x,\alpha)\in X\times\mathbb{R}~|~f(x)\leq\alpha\big\}.$$

$$|\inf f-\inf g|$$

$$\mathop{\rm exs}\big(\varepsilon\mbox{-}\mathop{\rm argmin}\nolimits g\cap\mathbb{B}_{X}(\rho);~\delta\mbox{-}\mathop{\rm argmin}\nolimits f\big)$$

$$\mathop{\rm exs}\big(\mathop{\rm lev}\nolimits_{\delta}g\cap\mathbb{B}_{X}(\rho);\mathop{\rm lev}\nolimits_{\varepsilon}f\big)\leq\mathop{\rm exs}\big(\mathop{\rm epi}g\cap\mathbb{B}_{X\times\mathbb{R}}(\rho);\mathop{\rm epi}f\big)\leq d\hat{\kern-1.49994ptl}_{\rho}(\mathop{\rm epi}f,\mathop{\rm epi}g)$$

$$\mathop{\rm gph}\nolimits S:=\big\{(x,y)\in X\times Y~\big|~y\in S(x)\big\}.$$

$$d\hat{\kern-1.49994ptl}_{\rho}(\mathop{\rm gph}\nolimits\partial f,\mathop{\rm gph}\nolimits\partial g)\leq\kappa\,d\hat{\kern-1.49994ptl}_{\bar{\rho}}(\mathop{\rm epi}f,\mathop{\rm epi}g).$$

$$\iota_{C}(x):=0\mbox{ if }x\in C\mbox{ and }\iota_{C}(x):=\infty\mbox{ otherwise}.$$

$$d\hat{\kern-1.49994ptl}_{\rho}(\mathop{\rm gph}\nolimits N_{C},\mathop{\rm gph}\nolimits N_{D})\leq\kappa\,d\hat{\kern-1.49994ptl}_{\bar{\rho}}(C,D).$$

$$d\hat{\kern-1.49994ptl}_{\rho}(C,D)\leq\max_{i=1,\dots,m}d\hat{\kern-1.49994ptl}_{\rho}(C_{i},D_{i})\mbox{ for any }\rho\in\mathbb{R}_{+}.$$

$$\mathop{\rm dist}(x_{i},D_{i})-\varepsilon\leq d_{X_{i}}(x_{i},y_{i})-\varepsilon\leq d_{X}(x,y)-\varepsilon\leq\mathop{\rm dist}(x,D)\leq d\hat{\kern-1.49994ptl}_{\rho}(C,D).$$

$$d\hat{\kern-1.49994ptl}_{\rho}(\mathop{\rm epi}\iota_{C},\mathop{\rm epi}\iota_{D})=d\hat{\kern-1.49994ptl}_{\rho}(C,D).$$

$$d\hat{\kern-1.49994ptl}_{\rho}\Bigg(\bigcup_{\alpha\in A}C_{\alpha},\bigcup_{\alpha\in A}D_{\alpha}\Bigg)\leq\sup_{\alpha\in A}d\hat{\kern-1.49994ptl}_{\rho}(C_{\alpha},D_{\alpha}).$$

$$\mathop{\rm dist}(x,D)\leq\mathop{\rm dist}(x,D_{\alpha})\leq\mathop{\rm exs}\big(C_{\alpha}\cap\mathbb{B}_{X}(\rho);D_{\alpha}\big)\leq d\hat{\kern-1.49994ptl}_{\rho}(C_{\alpha},D_{\alpha})\leq\eta.$$

$$d\hat{\kern-1.49994ptl}_{\rho}(\mathop{\rm con}C,\mathop{\rm con}D)\leq d\hat{\kern-1.49994ptl}_{\rho}(C,D)$$

$$d\hat{\kern-1.49994ptl}_{\rho^{*}}\big(S(x),S(\bar{x})\big)\leq\kappa(\rho)d_{X}(x,\bar{x})\mbox{ for }x,\bar{x}\in\mathbb{B}_{X}(\rho)\mbox{ and }\rho\in\mathbb{R}_{+}.$$

$$d\hat{\kern-1.49994ptl}_{\rho}\big(S(C),T(D)\big)\leq\mathop{\rm sup}\nolimits_{x\in\mathbb{B}_{X}(\bar{\rho})}d\hat{\kern-1.49994ptl}_{\rho^{*}}\big(S(x),T(x)\big)+\kappa(\hat{\rho})d\hat{\kern-1.49994ptl}_{\bar{\rho}}(C,D)$$

$$\sup_{y\in U(E)\cap\mathbb{B}_{Y}(\rho^{*})}\Big\{\inf\nolimits_{U^{-1}(y)\cap E}d_{X}(\cdot,x^{\rm ctr})\Big\}\mbox{ for }U=S,T\mbox{ and }E=C,D,$$

$$d_{Y}(\bar{y},y)\leq d\hat{\kern-1.49994ptl}_{\rho^{*}}\big(S(\bar{x}),S(x)\big)+\varepsilon\leq\kappa(\hat{\rho})d_{X}(\bar{x},x)+\varepsilon\leq\kappa(\hat{\rho})d\hat{\kern-1.49994ptl}_{\bar{\rho}}(C,D)+(\kappa(\hat{\rho})+1)\varepsilon,$$

$$d\hat{\kern-1.49994ptl}_{\rho^{*}}\big(S(C),S(D)\big)\leq\kappa(\hat{\rho})d\hat{\kern-1.49994ptl}_{\bar{\rho}}(C,D).$$

$$d_{Y}(\bar{y},y)\leq d\hat{\kern-1.49994ptl}_{\rho^{*}}\big(S(\bar{x}),T(\bar{x})\big)+\varepsilon\leq\sup_{x\in\mathbb{B}_{X}(\bar{\rho})}d\hat{\kern-1.49994ptl}_{\rho^{*}}\big(S(x),T(x)\big)+\varepsilon,$$

$$d\hat{\kern-1.49994ptl}_{\rho^{*}}\big(S(D),T(D)\big)\leq\sup_{x\in\mathbb{B}_{X}(\bar{\rho})}d\hat{\kern-1.49994ptl}_{\rho^{*}}\big(S(x),T(x)\big).$$

$$d\hat{\kern-1.49994ptl}_{\rho}\Bigg(\sum_{i=1}^{m}C_{i},\sum_{i=1}^{m}D_{i}\Bigg)\leq\sum_{i=1}^{m}d\hat{\kern-1.49994ptl}_{\rho}(C_{i},D_{i})$$

$$d\hat{\kern-1.49994ptl}_{\rho}(\lambda C,\mu D)\leq\bar{\rho}\,|\lambda-\mu|+\max\{|\lambda|,|\mu|\}\,d\hat{\kern-1.49994ptl}_{\bar{\rho}}(C,D),$$

$$d\hat{\kern-1.49994ptl}_{\rho}(\mathop{\rm lev}\nolimits_{\alpha}f,\mathop{\rm lev}\nolimits_{\beta}g)\leq\eta+(\rho^{*}+\rho_{0})\max\Big\{\frac{\alpha+\eta-\beta}{\alpha+\eta-\inf g},\frac{\beta+\eta-\alpha}{\beta+\eta-\inf f}\Big\}$$

$$\mathop{\rm exs}\big(\mathop{\rm lev}\nolimits_{\alpha+\eta}g\cap\mathbb{B}_{\mathbb{R}^{n}}(\rho^{*});\mathop{\rm lev}\nolimits_{\beta}g\big)\leq\frac{\alpha+\eta-\beta}{\alpha+\eta-\inf g}(\rho^{*}+\rho_{0})$$


Full text

**Stability and Error Analysis for Optimization and Generalized Equations**

Johannes O. Royset

Operations Research Department

Naval Postgraduate School



[TABLE]

1 Introduction

Since the early days of convex analysis, epigraphs have been central to understanding functions in the context of minimization problems. Local properties of epigraphs can be used to define subgradients while global properties characterize convexity and lower semicontinuity. The distance between two epigraphs bounds the discrepancy between the corresponding minima and near-minimizers. Likewise, set-valued mappings can be fully represented by their graphs, with graphical convergence being key to understanding approximations of solutions of generalized equations defined by such mappings. These set-based perspectives lead to a unified approach to stability and error analysis for a wide range of variational problems. In this paper, we estimate the truncated Hausdorff distance between sets and demonstrate that it provides insight about the stability of constraint systems and optimization problems even when the feasible sets have empty interiors. Without assuming any local properties, we establish that the truncated Hausdorff distance bounds the discrepancy between near-solutions of two generalized equations when applied to the graphs of the underlying set-valued mappings. The result is illustrated in the context of optimality conditions for difference-of-convex functions, composite functions, and nonlinear programs. Throughout, we focus on nonconvex problems. Most of the results are established for general metric spaces and therefore apply broadly, including in areas such as nonparametric statistics, optimal control, function identification, and decision rule optimization.

Stability and error analysis for optimization and, more generally, variational problems have been developed from several angles; see for example [23, 1, 31, 32, 14] for comprehensive treatments. There is an extensive literature on local stability based on metric regularity and calmness [20, 30], tilt-stability [18, 24, 17], full-stability [27], and connections with iterative schemes [22]; see also the monographs [7, 26, 25] and the surveys [29, 8]. This paper takes an alternative, global perspective that can be traced back to the late 60s and pioneering studies of the truncated Hausdorff distance between convex cones [40] and general convex sets [28]. The full potential of the approach emerges in [4, 5, 6], which establish that the truncated Hausdorff distances between epigraphs furnish bounds on the corresponding discrepancies between minima and minimizers; see also [10, 2, 12, 13] for parallel developments and especially the monograph [11] with its detailed treatment of topologies and metrics on spaces of closed sets. From the myriad of possibilities the Attouch-Wets distance [3] emerges as the theoretically most useful by virtue of being a metric on spaces of nonempty closed sets as well as other factors. Still, we concentrate on the truncated Hausdorff distance due to its more intuitive form and direct relationship to quantities of interest such as minima and minimizers. It anyhow furnishes accurate estimates of the Attouch-Wets distance [32, 33]. This global perspective based on set distances provides foundations for computationally attractive approximations of functions [35, 33, 34] and formulations of function identification problems [35], especially in nonparametric statistics [38, 37].

The difficulty of estimating the truncated Hausdorff distance for actual problem instances remains a major hurdle for its practical use. Fundamental results and calculus rules are laid out in [9, 4], but mostly for epigraphs in the convex case. Results on epi-multiplication and epi-sums are given in [4]. Inverse images of convex sets are well-behaved under sufficiently small perturbations. This fact enables the development of results for intersections of sets and sums of functions in the convex case [9]. Since the Legendre-Fenchel transform is an isometry for lower semicontinuous proper convex functions under a closely related pseudo-metric defined in terms of the epi-regularized functions [3], additional estimates of the truncated Hausdorff distance emerge via the dual operations under this transform [4]. In this paper, we switch the focus to nonconvex sets and functions and develop a series of results that support calculations of the truncated Hausdorff distance in practice.

Section 2 lays out the terminology and provides some motivating facts. Section 3 develops estimates for the truncated Hausdorff distance between arbitrary sets. Section 4 turns to specific results for epigraphs and applications in disjunctive programming, formulations with constraint softening, and penalty methods. Section 5 extends the methodology to set-valued mappings and demonstrates its usefulness for generalized equations such as those arising from optimality conditions. An appendix supplements with proofs.

2 Distances and Applications

For a point $x$ in a metric space $(X,d_{X})$ and $C\subset X$ , we denote by $\mathop{\rm dist}(x,C)$ the usual point-to-set distance, i.e.,

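$$\mathop{\rm dist}(x,C):=\inf\big\{d_{X}(x,\bar{x})~|~\bar{x}\in C\big\}\mbox{ if }C\mbox{ is nonempty and }\mathop{\rm dist}(x,\emptyset):=\infty.$$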

The excess of $C$ over $D\subset X$ is given by

$$\mathop{\rm exs}(C;D):=\sup\big\{\mathop{\rm dist}(x,D)~|~x\in C\big\}\mbox{ if }C,D\mbox{ are nonempty},$$

$\mathop{\rm exs}(C;D):=\infty$ if $C$ is nonempty and $D$ is empty, and $\mathop{\rm exs}(C;D):=0$ otherwise. The Pompeiu-Hausdorff distance between $C$ and $D$ is $\max\{\mathop{\rm exs}(C;D),\mathop{\rm exs}(D;C)\}$, but it tends to be infinite for unbounded sets and therefore is not central to our development. Instead, we rely on a localization argument relative to a point $x^{\text{ctr}}\in X$, which we call the centroid of $X$. The choice of centroid can be made arbitrarily, but results might be sharper if it is near the “interesting” parts of the sets at hand, as we often restrict attention to intersections of sets with the centered closed ball

$$\mathbb{B}_{X}(\rho):=\big\{x\in X~|~d_{X}(x^{\rm ctr},x)\leq\rho\big\}\mbox{ for }\rho\geq 0.$$

Given $\rho\geq 0$ , we define the truncated Hausdorff distance between two sets $C,D\subset X$ as

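$$d\hat{\kern-1.49994ptl}_{\rho}(C,D):=\max\Big\{\mathop{\rm exs}\big(C\cap\mathbb{B}_{X}(\rho);D\big),~\mathop{\rm exs}\big(D\cap\mathbb{B}_{X}(\rho);C\big)\Big\},$$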

which is always finite as long as $C$ and $D$ are nonempty and $\rho<\infty$ . Trivially, $d\hat{\kern-1.49994ptl}_{\infty}(C,D)$ is the Pompeiu-Hausdorff distance between $C$ and $D$ , but we focus on finite $\rho$ in the following.
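The definitions above can be tried out on finite sets. The following is a minimal sketch, assuming finite subsets of the real line with the absolute-value metric and centroid 0; the helper names `dist`, `exs`, `ball`, and `trunc_hausdorff` are illustrative and not from the paper.

```python
import math

def dist(x, C):
    """Point-to-set distance on the real line; dist(x, empty set) = inf."""
    return min((abs(x - c) for c in C), default=math.inf)

def exs(C, D):
    """Excess of C over D; equals 0 by convention when C is empty."""
    return max((dist(x, D) for x in C), default=0.0)

def ball(C, rho, ctr=0.0):
    """Points of C within distance rho of the centroid ctr."""
    return [x for x in C if abs(x - ctr) <= rho]

def trunc_hausdorff(C, D, rho, ctr=0.0):
    """Truncated Hausdorff distance between finite subsets of the real line."""
    return max(exs(ball(C, rho, ctr), D), exs(ball(D, rho, ctr), C))

C = [0.0, 1.0, 10.0]
D = [0.5, 1.0]
print(trunc_hausdorff(C, D, rho=2.0))   # 0.5: the point 10.0 lies outside the ball
print(trunc_hausdorff(C, D, rho=20.0))  # 9.0: the point 10.0 now counts
```

The two calls illustrate the role of the truncation: a far-away point of `C` affects the distance only once `rho` is large enough to include it.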

The notation for the truncated Hausdorff distance suppresses its dependence on the choice of metric and centroid. The following results hold for all metrics and centroids unless otherwise specified. In particular,

for a normed linear space the metric is consistently assumed to be the one induced by the norm and the centroid is the zero point of the space.

This is a harmless assumption, easily overcome, but kept here to simplify expressions. The “hat-notation” hints at a broader landscape of closely related distances between sets including the Attouch-Wets metric; see [32, Chapter 4] for a summary of results. Although the truncated Hausdorff distance fails to be a metric on spaces of nonempty closed sets, it is obviously nonnegative and symmetric. A triangle inequality of sorts also holds. Let $\mathbb{R}_{+}:=[0,\infty)$.

2.1 Proposition

(triangle inequality, extended sense). For a metric space $X$ with centroid $x^{\rm ctr}$ , sets $C_{1},C_{2},C_{3}\subset X$ , and $\rho\in\mathbb{R}_{+}$ ,

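$$d\hat{\kern-1.49994ptl}_{\rho}(C_{1},C_{3})\leq d\hat{\kern-1.49994ptl}_{\bar{\rho}}(C_{1},C_{2})+d\hat{\kern-1.49994ptl}_{\bar{\rho}}(C_{2},C_{3})$$

provided that $\bar{\rho}>2\rho+\max_{i=1,2,3}\mathop{\rm dist}(x^{\rm ctr},C_{i})$.

**Proof.** The arguments in the proofs of [4, Prop. 1.2] and [33, Prop. 3.1] can easily be modified for the present assumptions.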
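To see why the radius has to be enlarged on the right-hand side, the following sketch compares both sides on three singletons in the real line (centroid 0). The sets and the margin added to form `rho_bar` are illustrative assumptions, not taken from the paper.

```python
import math

def dist(x, C):
    """Point-to-set distance on the real line; dist(x, empty set) = inf."""
    return min((abs(x - c) for c in C), default=math.inf)

def exs(C, D):
    """Excess of C over D; equals 0 by convention when C is empty."""
    return max((dist(x, D) for x in C), default=0.0)

def trunc_hausdorff(C, D, rho):
    """Truncated Hausdorff distance on the real line with centroid 0."""
    CB = [x for x in C if abs(x) <= rho]
    DB = [x for x in D if abs(x) <= rho]
    return max(exs(CB, D), exs(DB, C))

C1, C2, C3 = [1.0], [1.1], [3.0]
rho = 1.0

lhs = trunc_hausdorff(C1, C3, rho)
# Same radius on both sides: C2 and C3 miss the ball, so their distance is 0.
same_radius = trunc_hausdorff(C1, C2, rho) + trunc_hausdorff(C2, C3, rho)
# Enlarged radius satisfying rho_bar > 2*rho + max_i dist(0, C_i).
rho_bar = 2 * rho + max(dist(0.0, C) for C in (C1, C2, C3)) + 1.0
enlarged = trunc_hausdorff(C1, C2, rho_bar) + trunc_hausdorff(C2, C3, rho_bar)

print(lhs, same_radius, enlarged)
```

With the same radius on both sides the sum falls well below `lhs`, so the ordinary triangle inequality fails; with the enlarged radius the sum is at least `lhs`, up to floating-point rounding.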

For a function $f:X\to\overline{\mathbb{R}}:=[-\infty,\infty]$ , the characterizing set in the context of minimization problems is its epigraph

$$\mathop{\rm epi}f:=\big\{(x,\alpha)\in X\times\mathbb{R}~|~f(x)\leq\alpha\big\}.$$

The truncated Hausdorff distance between epigraphs requires a metric and centroid for $X\times\mathbb{R}$ and we consistently adopt

the product metric $((x,\alpha),(\bar{x},\bar{\alpha}))\mapsto\max\{d_{X}(x,\bar{x}),|\alpha-\bar{\alpha}|\}$ and centroid $(x^{\text{ctr}},0)$ , where $x^{\text{ctr}}$ is a centroid of $X$ .

The main motivation for studying the truncated Hausdorff distance between epigraphs is its relation to minima and minimizers. We recall that $\inf f:=\inf\{f(x)~|~x\in X\}$, $\varepsilon\mbox{-}\mathop{\rm argmin}\nolimits f:=\{x\in\mathop{\rm dom}f~|~f(x)\leq\inf f+\varepsilon\}$ for $\varepsilon\geq 0$, with $\mathop{\rm dom}f:=\{x\in X~|~f(x)<\infty\}$, and $\mathop{\mathop{\rm lev}}\nolimits_{\delta}f:=\{x\in X~|~f(x)\leq\delta\}$ for $\delta\in\overline{\mathbb{R}}$. (We adopt the usual arithmetic rules for extended real-valued numbers with an orientation towards minimization so that $\infty-\infty$ as well as $-\infty+\infty$ are set to $\infty$; see [32, 1.E].) The application in the context of minimization problems becomes clear from the following two propositions, which are essentially in [5, 33]. Still, due to minor adjustments in assumptions, we provide proofs in the appendix.

2.2 Proposition

(approximation of infima and near-minimizers). For a metric space $X$ , functions $f,g:X\to\overline{\mathbb{R}}$ , and $\varepsilon,\rho\in\mathbb{R}_{+}$ ,

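$$\begin{aligned}|\inf f-\inf g|&\leq d\hat{\kern-1.49994ptl}_{\rho}(\mathop{\rm epi}f,\mathop{\rm epi}g)\\ \mathop{\rm exs}\big(\varepsilon\mbox{-}\mathop{\rm argmin}\nolimits g\cap\mathbb{B}_{X}(\rho);~\delta\mbox{-}\mathop{\rm argmin}\nolimits f\big)&\leq d\hat{\kern-1.49994ptl}_{\rho}(\mathop{\rm epi}f,\mathop{\rm epi}g)\end{aligned}$$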

provided that $\mathop{\rm inf}\nolimits f,\inf g\in[-\rho,\rho-\varepsilon)$ and $\gamma\mbox{-}\mathop{\rm argmin}\nolimits f\cap\mathbb{B}_{X}(\rho)$ as well as $\gamma\mbox{-}\mathop{\rm argmin}\nolimits g\cap\mathbb{B}_{X}(\rho)$ are nonempty for all $\gamma>0$ , with the second assertion also requiring $\delta>\varepsilon+2d\hat{\kern-1.49994ptl}_{\rho}(\mathop{\rm epi}f,\mathop{\rm epi}g)$ .

These bounds are sharp as discussed in [33]. We note that $\delta$ cannot generally be equal to $\varepsilon+2d\hat{\kern-1.49994ptl}_{\rho}(\mathop{\rm epi}f,\mathop{\rm epi}g)$ . For example, suppose that $f(x)=x$ for $x>0$ and $f(x)=\infty$ otherwise; and $g(x)=x$ for $x\geq 0$ and $g(x)=\infty$ otherwise. Then, for $\rho\geq 0$ , $d\hat{\kern-1.49994ptl}_{\rho}(\mathop{\rm epi}f,\mathop{\rm epi}g)=0$ , $\mathop{\rm argmin}\nolimits g=\{0\}$ , $\mathop{\rm argmin}\nolimits f=\emptyset$ , and $\mathop{\rm exs}(\mathop{\rm argmin}\nolimits g;\mathop{\rm argmin}\nolimits f)=\infty$ . The role of $\rho$ emerges from the proposition: it needs to be large enough so that the epigraphs intersected with $\mathbb{B}_{X\times\mathbb{R}}(\rho)$ retain points corresponding to infima and near-minimizers.
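As an illustrative check of sharpness (a reader's example, not taken from the paper), consider $X=\mathbb{R}$ with centroid $0$, $f(x)=x^{2}$, and $g(x)=x^{2}+c$ for an assumed constant $c\geq 0$. Since $\mathop{\rm epi}g\subset\mathop{\rm epi}f$, only the excess of $\mathop{\rm epi}f$ over $\mathop{\rm epi}g$ matters. Every $(x,\alpha)\in\mathop{\rm epi}f$ has the point $(x,\alpha+c)\in\mathop{\rm epi}g$ at distance $c$ under the product metric, while the point $(0,0)\in\mathop{\rm epi}f$ lies in every ball and satisfies

$$\mathop{\rm dist}\big((0,0),\mathop{\rm epi}g\big)=\inf\big\{\max\{|x|,|\alpha|\}~|~\alpha\geq x^{2}+c\big\}=c.$$

Hence, for every $\rho\geq 0$,

$$d\hat{\kern-1.49994ptl}_{\rho}(\mathop{\rm epi}f,\mathop{\rm epi}g)=c=|\inf f-\inf g|,$$

so a bound on $|\inf f-\inf g|$ in terms of the epigraphical distance cannot be improved by a constant factor in this instance.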

2.3 Proposition

(approximation of level sets). For a metric space $X$ , functions $f,g:X\to\overline{\mathbb{R}}$ , $\rho\in\mathbb{R}_{+}$ , and $\delta\in[-\rho,\rho]$ ,

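$$\mathop{\rm exs}\big(\mathop{\rm lev}\nolimits_{\delta}g\cap\mathbb{B}_{X}(\rho);\mathop{\rm lev}\nolimits_{\varepsilon}f\big)\leq\mathop{\rm exs}\big(\mathop{\rm epi}g\cap\mathbb{B}_{X\times\mathbb{R}}(\rho);\mathop{\rm epi}f\big)\leq d\hat{\kern-1.49994ptl}_{\rho}(\mathop{\rm epi}f,\mathop{\rm epi}g)$$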

provided that $\varepsilon>\delta+\mathop{\rm exs}(\mathop{\rm epi}g\cap\mathbb{B}_{X\times\mathbb{R}}(\rho);\mathop{\rm epi}f)$ .

A parallel development is possible for set-valued mappings from a metric space $(X,d_{X})$ to a metric space $(Y,d_{Y})$. The values of a set-valued mapping $S:X\rightrightarrows Y$ are the subsets $S(x)\subset Y$, $x\in X$, and the graph of $S$ is

$$\mathop{\rm gph}\nolimits S:=\big\{(x,y)\in X\times Y~\big|~y\in S(x)\big\}.$$

The truncated Hausdorff distance between such graphs requires a metric on $X\times Y$. Throughout, we adopt the product metric $((x,y),(\bar{x},\bar{y}))\mapsto\max\{d_{X}(x,\bar{x}),d_{Y}(y,\bar{y})\}$. The centroid is likewise constructed from those of $X$ and $Y$. A prime example of such mappings is the subgradient mapping $\partial f:X\rightrightarrows X$ for a convex function $f$ on a Hilbert space $X$. We recall that a function $f:X\to\overline{\mathbb{R}}$ is proper if $\mathop{\rm epi}f\neq\emptyset$ and $f>-\infty$. It is lower semicontinuous (lsc) if $\mathop{\rm epi}f$ is closed as a subset of $X\times\mathbb{R}$.

2.4 Proposition

(approximation of subgradient mappings [4]). For a Hilbert space $X$ , proper lsc convex functions $f,g:X\to\overline{\mathbb{R}}$ , and $\rho\in\mathbb{R}_{+}$ exceeding $\mathop{\rm dist}(0,\mathop{\rm epi}f)$ and $\mathop{\rm dist}(0,\mathop{\rm epi}g)$ , there exist $\kappa,\bar{\rho}\in\mathbb{R}_{+}$ such that

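$$d\hat{\kern-1.49994ptl}_{\rho}(\mathop{\rm gph}\nolimits\partial f,\mathop{\rm gph}\nolimits\partial g)\leq\kappa\,d\hat{\kern-1.49994ptl}_{\bar{\rho}}(\mathop{\rm epi}f,\mathop{\rm epi}g).$$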

Explicit expressions for the constants $\kappa$ and $\bar{\rho}$ in the proposition are available in [4]. Section 5 establishes that $d\hat{\kern-1.49994ptl}_{\rho}(\mathop{\rm gph}\nolimits\partial f,\mathop{\rm gph}\nolimits\partial g)$ bounds the discrepancy between near-solutions of the generalized equations $0\in\partial f(x)$ and $0\in\partial g(x)$ . Thus, the proposition provides yet another way of bounding the distance between minimizers of $f$ and those of $g$ in the convex case.

We can bring forward the effect of a constraint set $C\subset X$ when the function of interest is expressed as $f+\iota_{C}$ , where

$$\iota_{C}(x):=0\mbox{ if }x\in C\mbox{ and }\iota_{C}(x):=\infty\mbox{ otherwise}.$$

Then, optimality conditions can be stated using normal cones. For example, if $C\subset\mathbb{R}^{n}$ and $f:\mathbb{R}^{n}\to\mathbb{R}$ are convex, then the generalized equation $0\in\partial f(x)+N_{C}(x)$ characterizes minimizers of $f+\iota_{C}$, where $N_{C}(x)$ is the normal cone of $C$ at $x$ in the sense of convex analysis; see [32, 6.C]. Consequently, it becomes important to examine the graph of a normal cone mapping $N_{C}:X\rightrightarrows X$ and its approximations.

2.5 Proposition

(approximation of normal cone mappings). For closed convex subsets $C,D$ of a Hilbert space and $\rho\in\mathbb{R}_{+}$ exceeding $\mathop{\rm dist}(0,C)$ and $\mathop{\rm dist}(0,D)$ , there exist $\kappa,\bar{\rho}\in\mathbb{R}_{+}$ such that

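$$d\hat{\kern-1.49994ptl}_{\rho}(\mathop{\rm gph}\nolimits N_{C},\mathop{\rm gph}\nolimits N_{D})\leq\kappa\,d\hat{\kern-1.49994ptl}_{\bar{\rho}}(C,D).$$

**Proof.** In view of Cor. 3.2 below, the result is a direct application of Prop. 2.4 to the functions $f=\iota_{C}$ and $g=\iota_{D}$.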

These preliminary facts point to a strategy for stability and error analysis of optimization and variational problems that extends much beyond the convex case: estimate the truncated Hausdorff distances between the relevant constraint sets, graphs, and/or epigraphs, which then immediately provide bounds on the discrepancy between solutions. The next sections develop practical guidelines for computing the truncated Hausdorff distance and illustrate the strategy in concrete instances.

3 Distances between Sets

We start with results about product sets, unions, and convex hulls. The main theorem of the section bounds the truncated Hausdorff distance between images of sets under Lipschitz continuous set-valued mappings.

3.1 Proposition

(product sets). For each $i=1,\dots,m$, suppose that $C_{i},D_{i}$ are subsets of a metric space $(X_{i},d_{X_{i}})$ with centroid $x^{\rm ctr}_{i}$ and $X=X_{1}\times\dots\times X_{m}$ is equipped with the metric $d_{X}=\max_{i=1,\dots,m}d_{X_{i}}$ and centroid $x^{\rm ctr}=(x^{\rm ctr}_{1},\dots,x^{\rm ctr}_{m})$. Then, with $C=C_{1}\times\dots\times C_{m}$ and $D=D_{1}\times\dots\times D_{m}$,

$$d\hat{\kern-1.49994ptl}_{\rho}(C,D)\leq\max_{i=1,\dots,m}d\hat{\kern-1.49994ptl}_{\rho}(C_{i},D_{i})\mbox{ for any }\rho\in\mathbb{R}_{+}.$$

If $C\cap\mathbb{B}_{X}(\rho)$ and $D\cap\mathbb{B}_{X}(\rho)$ are nonempty, then the relation holds with equality.

**Proof.** Let $\eta=\max_{i=1,\dots,m}d\hat{\kern-1.49994ptl}_{\rho}(C_{i},D_{i})$, $x=(x_{1},\dots,x_{m})\in C\cap\mathbb{B}_{X}(\rho)$, and $\varepsilon>0$. Since $x_{i}\in C_{i}\cap\mathbb{B}_{X_{i}}(\rho)$ and $\mathop{\rm dist}(x_{i},D_{i})\leq\mathop{\rm exs}(C_{i}\cap\mathbb{B}_{X_{i}}(\rho);D_{i})\leq\eta$, there exists $y_{i}\in D_{i}$ with $d_{X_{i}}(x_{i},y_{i})\leq\eta+\varepsilon$. We can repeat this construction for all $i$ and obtain $y=(y_{1},\dots,y_{m})$. Then, $d_{X}(x,y)=\max_{i=1,\dots,m}d_{X_{i}}(x_{i},y_{i})\leq\eta+\varepsilon$. Thus, $\mathop{\rm dist}(x,D)\leq\eta+\varepsilon$ and also $\mathop{\rm exs}(C\cap\mathbb{B}_{X}(\rho);D)\leq\eta+\varepsilon$, which holds trivially also when $C\cap\mathbb{B}_{X}(\rho)=\emptyset$. Repeating the argument with the roles of $C$ and $D$ reversed establishes that $d\hat{\kern-1.49994ptl}_{\rho}(C,D)\leq\eta+\varepsilon$. Since this holds for all $\varepsilon>0$, $d\hat{\kern-1.49994ptl}_{\rho}(C,D)\leq\eta$ and the first conclusion holds.

To establish the inequality the other way, let $x=(x_{1},\dots,x_{m})\in C\cap\mathbb{B}_{X}(\rho)$ , $\varepsilon>0$ , and $i\in\{1,\dots,m\}$ . Then, there exists $y=(y_{1},\dots,y_{m})\in D$ such that

$$\mathop{\rm dist}(x_{i},D_{i})-\varepsilon\leq d_{X_{i}}(x_{i},y_{i})-\varepsilon\leq d_{X}(x,y)-\varepsilon\leq\mathop{\rm dist}(x,D)\leq d\hat{\kern-1.49994ptl}_{\rho}(C,D).$$

Since $x\in C\cap\mathbb{B}_{X}(\rho)$ is arbitrary, $\mathop{\rm exs}(C_{i}\cap\mathbb{B}_{X_{i}}(\rho);D_{i})\leq d\hat{\kern-1.49994ptl}_{\rho}(C,D)+\varepsilon$. A similar argument with the roles of $C$ and $D$ reversed allows us to conclude that $\mathop{\rm exs}(D_{i}\cap\mathbb{B}_{X_{i}}(\rho);C_{i})\leq d\hat{\kern-1.49994ptl}_{\rho}(C,D)+\varepsilon$. Thus, $d\hat{\kern-1.49994ptl}_{\rho}(C_{i},D_{i})\leq d\hat{\kern-1.49994ptl}_{\rho}(C,D)+\varepsilon$. Since $i$ and $\varepsilon$ are arbitrary, the conclusion follows.
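The product-set relation can be checked numerically on small finite sets. The sketch below assumes two factors in the real line, the max-metric on the product, and centroid $(0,0)$; the concrete sets and helper names are illustrative assumptions.

```python
import math
from itertools import product

def dist(x, C, d):
    """Point-to-set distance under the metric d; dist(x, empty set) = inf."""
    return min((d(x, c) for c in C), default=math.inf)

def exs(C, D, d):
    """Excess of C over D; equals 0 by convention when C is empty."""
    return max((dist(x, D, d) for x in C), default=0.0)

def trunc_hausdorff(C, D, rho, d, ctr):
    """Truncated Hausdorff distance for finite sets, metric d, centroid ctr."""
    CB = [x for x in C if d(x, ctr) <= rho]
    DB = [x for x in D if d(x, ctr) <= rho]
    return max(exs(CB, D, d), exs(DB, C, d))

d1 = lambda a, b: abs(a - b)                                # metric on each factor
dmax = lambda a, b: max(abs(s - t) for s, t in zip(a, b))   # product (max) metric

C1, D1 = [0.0, 1.0], [0.25, 1.0]
C2, D2 = [0.0, 2.0], [0.0, 1.5]
rho = 3.0

C = list(product(C1, C2))
D = list(product(D1, D2))

lhs = trunc_hausdorff(C, D, rho, dmax, (0.0, 0.0))
rhs = max(trunc_hausdorff(C1, D1, rho, d1, 0.0),
          trunc_hausdorff(C2, D2, rho, d1, 0.0))
print(lhs, rhs)  # both sides agree here since C and D meet the ball
```

Both truncated products are nonempty in this instance, so the two sides coincide, in line with the equality case of the proposition.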

3.2 Corollary

(indicator functions). For subsets $C,D$ of a metric space and $\rho\in\mathbb{R}_{+}$ ,

$$d\hat{\kern-1.49994ptl}_{\rho}(\mathop{\rm epi}\iota_{C},\mathop{\rm epi}\iota_{D})=d\hat{\kern-1.49994ptl}_{\rho}(C,D).$$

**Proof.** By Prop. 3.1, $d\hat{\kern-1.49994ptl}_{\rho}(\mathop{\rm epi}\iota_{C},\mathop{\rm epi}\iota_{D})=d\hat{\kern-1.49994ptl}_{\rho}(C\times\mathbb{R}_{+},D\times\mathbb{R}_{+})=\max\{d\hat{\kern-1.49994ptl}_{\rho}(C,D),d\hat{\kern-1.49994ptl}_{\rho}(\mathbb{R}_{+},\mathbb{R}_{+})\}=d\hat{\kern-1.49994ptl}_{\rho}(C,D)$ as long as $C\cap\mathbb{B}_{X}(\rho)$ and $D\cap\mathbb{B}_{X}(\rho)$ are nonempty. If one or both of these sets are empty, the corollary holds trivially.

3.3 Proposition

(union of sets). For a metric space $X$ , $\{C_{\alpha},D_{\alpha}\subset X,\alpha\in A\}$ , with $A$ being an arbitrary set, and $\rho\in\mathbb{R}_{+}$ ,

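$$d\hat{\kern-1.49994ptl}_{\rho}\Bigg(\bigcup_{\alpha\in A}C_{\alpha},\bigcup_{\alpha\in A}D_{\alpha}\Bigg)\leq\sup_{\alpha\in A}d\hat{\kern-1.49994ptl}_{\rho}(C_{\alpha},D_{\alpha}).$$

**Proof.** Let $C=\cup_{\alpha\in A}C_{\alpha}$, $D=\cup_{\alpha\in A}D_{\alpha}$, and $\eta=\sup_{\alpha\in A}d\hat{\kern-1.49994ptl}_{\rho}(C_{\alpha},D_{\alpha})$. Suppose that $x\in C\cap\mathbb{B}_{X}(\rho)$. Then, there exists $\alpha\in A$ such that $x\in C_{\alpha}$. Since $D_{\alpha}\subset D$ and $x\in C_{\alpha}\cap\mathbb{B}_{X}(\rho)$,

$$\mathop{\rm dist}(x,D)\leq\mathop{\rm dist}(x,D_{\alpha})\leq\mathop{\rm exs}\big(C_{\alpha}\cap\mathbb{B}_{X}(\rho);D_{\alpha}\big)\leq d\hat{\kern-1.49994ptl}_{\rho}(C_{\alpha},D_{\alpha})\leq\eta.$$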

The arbitrary choice of $x\in C\cap\mathbb{B}_{X}(\rho)$ allows us to conclude that $\mathop{\rm exs}(C\cap\mathbb{B}_{X}(\rho);D)\leq\eta$ . The roles of $C$ and $D$ can be reversed yielding the conclusion.

There is no similar result for intersections. A revealing example is furnished already on $\mathbb{R}$ by $C_{1}=C_{2}=\{0\}$, $D_{1}=\{-\varepsilon\}$, and $D_{2}=\{\varepsilon\}$ with $\varepsilon>0$. Then, $d\hat{\kern-1.49994ptl}_{\rho}(C_{1}\cap C_{2},D_{1}\cap D_{2})=\infty$ because $D_{1}\cap D_{2}=\emptyset$. However, $d\hat{\kern-1.49994ptl}_{\rho}(C_{i},D_{i})=\varepsilon$ for $\rho\geq\varepsilon$ and $i=1,2$. The difficulty occurs even if $C_{1}\cap C_{2}$ and $D_{1}\cap D_{2}$ have nonempty interiors. Consider $C_{1}=D_{1}=[-1,0]\cup[1,2]$ and $C_{2}=[-1,0]\cup[2,3]$ and $D_{2}=[-1,0]\cup[2+\varepsilon,3]$ with $\varepsilon\in(0,1)$. Then, $C_{1}\cap C_{2}=[-1,0]\cup\{2\}$, $D_{1}\cap D_{2}=[-1,0]$, and $d\hat{\kern-1.49994ptl}_{\rho}(C_{i},D_{i})\leq\varepsilon$ for $i=1,2$ and $\rho\geq 3$. Still, $d\hat{\kern-1.49994ptl}_{\rho}(C_{1}\cap C_{2},D_{1}\cap D_{2})=2$. In the convex case, having intersections with nonempty interior remedies the situation to a large extent; see [9, Cor. 2.5]. In the general case, however, it is difficult to say more than $\mathop{\rm exs}(\cap_{\alpha\in A}C_{\alpha};\cap_{\alpha\in A}D_{\alpha}^{+})\leq 0$, where $D_{\alpha}^{+}=\{x\in X~|~\mathop{\rm dist}(x,D_{\alpha})\leq\mathop{\rm exs}(C_{\alpha};D_{\alpha})\}$ for $\alpha\in A$, which nevertheless provides guidance towards constructing outer approximations.
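The contrast between unions and intersections can be reproduced with the singleton example on the real line. The sketch below assumes centroid 0 and the illustrative values `eps = 0.1` and `rho = 1.0`; the helper names are not from the paper.

```python
import math

def dist(x, C):
    """Point-to-set distance on the real line; dist(x, empty set) = inf."""
    return min((abs(x - c) for c in C), default=math.inf)

def exs(C, D):
    """Excess of C over D; equals 0 by convention when C is empty."""
    return max((dist(x, D) for x in C), default=0.0)

def trunc_hausdorff(C, D, rho):
    """Truncated Hausdorff distance on the real line with centroid 0."""
    CB = [x for x in C if abs(x) <= rho]
    DB = [x for x in D if abs(x) <= rho]
    return max(exs(CB, D), exs(DB, C))

eps, rho = 0.1, 1.0
C1, C2 = {0.0}, {0.0}
D1, D2 = {-eps}, {eps}

# Distances between the individual pairs stay small.
pieces = max(trunc_hausdorff(C1, D1, rho), trunc_hausdorff(C2, D2, rho))
# The union inherits the bound by the largest individual distance.
union_dist = trunc_hausdorff(C1 | C2, D1 | D2, rho)
# The intersection of D1 and D2 is empty, so the distance blows up.
inter_dist = trunc_hausdorff(C1 & C2, D1 & D2, rho)

print(pieces, union_dist, inter_dist)
```

The union obeys the bound of Prop. 3.3, whereas the intersection distance is infinite even though each pair of sets is within `eps` of each other.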

For large enough $\rho$ , the operation of taking the convex hull is non-expansive under $d\hat{\kern-1.49994ptl}_{\rho}$ . We denote by $\mathop{\rm con}C$ the convex hull of a set $C$ and $\mathbb{N}$ the natural numbers.

3.4 Proposition

(convex hulls). For subsets $C$ and $D$ of a normed linear space $X$ ,

$$d\hat{\kern-1.49994ptl}_{\rho}(\mathop{\rm con}C,\mathop{\rm con}D)\leq d\hat{\kern-1.49994ptl}_{\rho}(C,D)$$

when $\rho\in[0,\infty]$ is such that $C,D\subset\mathbb{B}_{X}(\rho)$.

**Proof.** Suppose that $x\in\mathop{\rm con}C\cap\mathbb{B}_{X}(\rho)$. Thus, there exist $r\in\mathbb{N}$, $x^{1},\dots,x^{r}\in C$, and $\alpha_{1},\dots,\alpha_{r}\geq 0$ with $\sum_{i=1}^{r}\alpha_{i}=1$ such that $x=\sum_{i=1}^{r}\alpha_{i}x^{i}$. Let $\varepsilon>0$. Since $x^{i}\in C\cap\mathbb{B}_{X}(\rho)$, there exists $y^{i}\in D$ with $\|x^{i}-y^{i}\|-\varepsilon\leq\mathop{\rm dist}(x^{i},D)\leq\mathop{\rm exs}(C\cap\mathbb{B}_{X}(\rho);D)\leq d\hat{\kern-1.49994ptl}_{\rho}(C,D)$. For $y=\sum_{i=1}^{r}\alpha_{i}y^{i}$, $\|x-y\|\leq\sum_{i=1}^{r}\alpha_{i}\|x^{i}-y^{i}\|\leq d\hat{\kern-1.49994ptl}_{\rho}(C,D)+\varepsilon$. Thus, $\mathop{\rm dist}(x,\mathop{\rm con}D)\leq d\hat{\kern-1.49994ptl}_{\rho}(C,D)+\varepsilon$ because $y\in\mathop{\rm con}D$. Since $\varepsilon$ and $x$ are arbitrary, $\mathop{\rm exs}(\mathop{\rm con}C\cap\mathbb{B}_{X}(\rho);\mathop{\rm con}D)\leq d\hat{\kern-1.49994ptl}_{\rho}(C,D)$. The conclusion then follows by symmetry.

The difficulty with unbounded sets and a finite $\rho$ is illustrated by $C=\{\lambda(-1,1)$ , $\lambda(1,-1)\}$ $\subset\mathbb{R}^{2}$ and $D=\{\lambda(1,1),\lambda(-1,-1)\}\subset\mathbb{R}^{2}$ , with $\lambda>0$ . For the norm $\|\cdot\|_{\infty}$ and $\rho<\lambda$ , $d\hat{\kern-1.49994ptl}_{\rho}(\mathop{\rm con}C,\mathop{\rm con}D)=\rho$ but $d\hat{\kern-1.49994ptl}_{\rho}(C,D)=0$ . Near the origin $C$ and $D$ look the same (empty), but their convex hulls are locally rather different.
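The value $\rho$ in this example can be verified directly; the following short computation is a reader's check under the stated norm $\|\cdot\|_{\infty}$ and centroid $0$. The convex hulls are the segments $\mathop{\rm con}C=\{t(-1,1)~|~|t|\leq\lambda\}$ and $\mathop{\rm con}D=\{s(1,1)~|~|s|\leq\lambda\}$. A point $t(-1,1)$ belongs to $\mathbb{B}_{\mathbb{R}^{2}}(\rho)$ exactly when $|t|\leq\rho$, and

$$\mathop{\rm dist}\big(t(-1,1),\mathop{\rm con}D\big)=\inf_{|s|\leq\lambda}\max\{|t+s|,|t-s|\}=\inf_{|s|\leq\lambda}\big(|t|+|s|\big)=|t|.$$

Taking the supremum over $|t|\leq\rho<\lambda$ gives $\mathop{\rm exs}(\mathop{\rm con}C\cap\mathbb{B}_{\mathbb{R}^{2}}(\rho);\mathop{\rm con}D)=\rho$, and the same value results with the roles of $C$ and $D$ reversed. In contrast, neither $C$ nor $D$ has points in $\mathbb{B}_{\mathbb{R}^{2}}(\rho)$ when $\rho<\lambda$, so both excesses defining $d\hat{\kern-1.49994ptl}_{\rho}(C,D)$ are zero.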

Next, we turn the focus towards images of sets, which provide foundations for several subsequent results. For metric spaces $(X,d_{X})$ and $(Y,d_{Y})$ , we say that a set-valued mapping $S:X\;{\lower 1.0pt\hbox{$ \rightarrow $}}\kern-12.0pt\hbox{\raise 2.5pt\hbox{$ \rightarrow $}}\;Y$ is Lipschitz continuous with modulus $\kappa:\mathbb{R}_{+}\to\mathbb{R}_{+}$ relative to $\rho^{*}\in[0,\infty]$ if

[TABLE]

We retain this terminology also for point-valued mappings, in which case the left-hand side amounts to the truncated Hausdorff distance between two points.

The image of $C\subset X$ under a set-valued mapping $S:X\rightrightarrows Y$ is the set $S(C):=\cup_{x\in C}S(x)$ . The corresponding inverse set-valued mapping is $S^{-1}(y):=\{x\in X~{}|~{}y\in S(x)\}$ for $y\in Y$ . Moreover, for any nonempty $C\subset X$ and $f:X\to\overline{\mathbb{R}}$ , $\mathop{\rm inf}\nolimits_{C}f:=\mathop{\rm inf}\nolimits\{f(x)~{}|~{}x\in C\}$ and $\mathop{\rm sup}\nolimits_{C}f:=\mathop{\rm sup}\nolimits\{f(x)~{}|~{}x\in C\}$ . When $C$ is empty, $\mathop{\rm inf}\nolimits_{C}f=\infty$ and $\mathop{\rm sup}\nolimits_{C}f=-\infty$ .

3.5 Theorem

(images under Lipschitz mappings). Suppose that $(X,d_{X})$ and $(Y,d_{Y})$ are metric spaces, with centroids $x^{\rm ctr}$ and $y^{\rm ctr}$ , respectively, $\rho\in\mathbb{R}_{+}$ , and $S,T:X\rightrightarrows Y$ are nonempty-valued Lipschitz continuous with common modulus $\kappa:\mathbb{R}_{+}\to\mathbb{R}_{+}$ relative to $\rho^{*}\in[0,\infty]$ . Then, for any nonempty $C,D\subset X$ ,

[TABLE]

provided that $\rho^{*}>2\rho+\max\{\mathop{\rm dist}(y^{\rm ctr},S(C)),\mathop{\rm dist}(y^{\rm ctr},S(D)),\mathop{\rm dist}(y^{\rm ctr},T(D))\}$ , $\bar{\rho}>0$ exceeds

[TABLE]

and $\hat{\rho}>\bar{\rho}+d\hat{\kern-1.49994ptl}_{\bar{\rho}}(C,D)$ .

**Proof. **First, we bound $d\hat{\kern-1.49994ptl}_{\rho^{*}}(S(C),S(D))$ . Suppose that $\bar{y}\in S(C)\cap\mathbb{B}_{Y}(\rho^{*})$ . Then there exists $\bar{x}\in S^{-1}(\bar{y})\cap C$ such that $d_{X}(\bar{x},x^{\rm ctr})\leq\bar{\rho}$ , i.e., $\bar{x}\in C\cap\mathbb{B}_{X}(\bar{\rho})$ . Let $\varepsilon\in(0,\hat{\rho}-\bar{\rho}-d\hat{\kern-1.49994ptl}_{\bar{\rho}}(C,D))$ . There exists $x\in D$ such that $d\hat{\kern-1.49994ptl}_{\bar{\rho}}(C,D)\geq\mathop{\rm exs}\big{(}C\cap\mathbb{B}_{X}(\bar{\rho});D\big{)}\geq\mathop{\rm dist}(\bar{x},D)\geq d_{X}(\bar{x},x)-\varepsilon$ . Thus, $d_{X}(x,x^{\rm ctr})\leq d_{X}(\bar{x},x^{\rm ctr})+d_{X}(\bar{x},x)\leq\bar{\rho}+d\hat{\kern-1.49994ptl}_{\bar{\rho}}(C,D)+\varepsilon\leq\hat{\rho}$ so that both $\bar{x}$ and $x$ are in $\mathbb{B}_{X}(\hat{\rho})$ . There exists $y\in S(x)$ such that $d_{Y}(\bar{y},y)\leq\mathop{\rm dist}(\bar{y},S(x))+\varepsilon$ , which implies that $y\in S(D)$ . Then,

[TABLE]

which implies that $\mathop{\rm exs}(S(C)\cap\mathbb{B}_{Y}(\rho^{*});S(D))\leq\kappa(\hat{\rho})d\hat{\kern-1.49994ptl}_{\bar{\rho}}(C,D)+(\kappa(\hat{\rho})+1)\varepsilon$ . Repeating the arguments with the roles of $C$ and $D$ reversed and recognizing that $\varepsilon$ is arbitrary leads to

[TABLE]

Second, we bound $d\hat{\kern-1.49994ptl}_{\rho^{*}}\big{(}S(D),T(D)\big{)}$ . Suppose that $\bar{y}\in S(D)\cap\mathbb{B}_{Y}(\rho^{*})$ . Then there exists $\bar{x}\in S^{-1}(\bar{y})\cap D$ such that $d_{X}(\bar{x},x^{\rm ctr})\leq\bar{\rho}$ , i.e., $\bar{x}\in D\cap\mathbb{B}_{X}(\bar{\rho})$ . Let $\varepsilon>0$ . There exists $y\in T(\bar{x})$ such that $d_{Y}(\bar{y},y)\leq\mathop{\rm dist}(\bar{y},T(\bar{x}))+\varepsilon$ , which implies that $y\in T(D)$ . Then,

[TABLE]

which implies that $\mathop{\rm exs}(S(D)\cap\mathbb{B}_{Y}(\rho^{*});T(D))\leq\sup_{x\in\mathbb{B}_{X}(\bar{\rho})}d\hat{\kern-1.49994ptl}_{\rho^{*}}\big{(}S(x),T(x)\big{)}+\varepsilon$ . Again by symmetry and the fact that $\varepsilon$ is arbitrary, we conclude that

[TABLE]

The result now follows by Prop. 2.1.

The requirement on $\bar{\rho}$ in the theorem is most easily verified when $C$ and $D$ are bounded, but other possibilities exist, for example under a Lipschitz property on the inverse set-valued mappings. An example of this appears in Cor. 4.8 below.

Sums of sets arise among other places in subdifferential calculus: For functions $f_{1}$ and $f_{2}$ , the set of subgradients $\partial(f_{1}+f_{2})(x)=\partial f_{1}(x)+\partial f_{2}(x)$ under appropriate assumptions [32, Sec. 10.9]; here and below subgradients are of the general kind [32, 25]. (For $f:\mathbb{R}^{n}\to\overline{\mathbb{R}}$ and a point $\bar{x}$ where $f$ is finite, we recall that $v\in\widehat{\partial}f(\bar{x})$ , a subgradient of the regular kind, if and only if $f(x)\geq f(\bar{x})+\langle v,x-\bar{x}\rangle+o(\|x-\bar{x}\|_{2})$ . Moreover, $v\in\partial f(\bar{x})$ , a subgradient of the general kind, if and only if there exist $v^{\nu}\to v$ and $x^{\nu}\to\bar{x}$ , with $f(x^{\nu})\to f(\bar{x})$ , such that $v^{\nu}\in\widehat{\partial}f(x^{\nu})$ . In the convex case, regular and general subgradients coincide.) Of course, the previous theorem could be used to establish a result about sums. We pursue a direct approach, with a proof in the appendix, as it is instructive and also brings forth a possible adjustment in the case of unbounded sets.

3.6 Proposition

(sums of sets). For a normed linear space $X$ , nonempty sets $\{C_{i},D_{i}\subset X,i=1,\dots,m\}$ , and $\rho\in\mathbb{R}_{+}$ ,

[TABLE]

provided that $C_{i},D_{i}\subset\mathbb{B}_{X}(\rho)$ for all $i=1,2,\dots,m$ . If $C_{i},D_{i}\subset\mathbb{B}_{X}(\rho)$ holds only for $i=2,3,\dots,m$ , then the inequality remains valid as long as $d\hat{\kern-1.49994ptl}_{\rho}(C_{1},D_{1})$ is replaced by $d\hat{\kern-1.49994ptl}_{m\rho}(C_{1},D_{1})$ .

A motivation for allowing one unbounded set emerges when studying a locally Lipschitz continuous function $f:\mathbb{R}^{n}\to\mathbb{R}$ , a nonempty closed set $C\subset\mathbb{R}^{n}$ , and the optimality condition $0\in\partial f(x)+N_{C}(x)$ [32, Exer 10.10], where $N_{C}(x)$ is the normal cone of $C$ at $x$ in the general sense [32, 25], i.e., $N_{C}(x)=\partial\iota_{C}(x)$ . Here, $\partial f(x)$ is bounded, but $N_{C}(x)$ is not in the interesting cases. We observe that if there are two or more unbounded sets, then the assertion in the proposition fails. For an example in $\mathbb{R}^{2}$ , let $C_{1}=\{\lambda(1,1+\delta)~{}|~{}\lambda\geq 0\}$ , $C_{2}=\{\lambda(-1,-1+\delta)~{}|~{}\lambda\geq 0\}$ , with $\delta>0$ , $D_{1}=\{\lambda(1,1)~{}|~{}\lambda\geq 0\}$ , and $D_{2}=\{\lambda(-1,-1)~{}|~{}\lambda\geq 0\}$ . All the sets are rays and therefore unbounded. Now, $d\hat{\kern-1.49994ptl}_{\rho}(C_{i},D_{i})\leq\delta\rho$ for $i=1,2$ . However, because $C_{1}+C_{2}$ is “nearly” the halfspace $\{(x_{1},x_{2})~{}|~{}x_{1}-x_{2}\leq 0\}$ for small $\delta$ but $D_{1}+D_{2}=\{(x_{1},x_{2})~{}|~{}x_{1}=x_{2}\}$ , $d\hat{\kern-1.49994ptl}_{\rho}(C_{1}+C_{2},D_{1}+D_{2})=\rho$ .

The inequality in the proposition is sharp because for $x,y,z\in X$ and $C_{1}=\{x\}$ , $C_{2}=\{y\}$ , $D_{1}=\{x+z\}$ , and $D_{2}=\{y+z\}$ , we have $d\hat{\kern-1.49994ptl}_{\rho}(C_{1}+C_{2},D_{1}+D_{2})=2\|z\|$ and $d\hat{\kern-1.49994ptl}_{\rho}(C_{i},D_{i})=\|z\|$ for $i=1,2$ for sufficiently large $\rho$ . Still, we can have strict inequality. For example, with $x,y\in X$ , $x\neq y\neq 0$ , and $C_{1}=\{x\}$ , $C_{2}=\{-x\}$ , $D_{1}=\{y\}$ , and $D_{2}=\{-y\}$ , we have $d\hat{\kern-1.49994ptl}_{\rho}(C_{1}+C_{2},D_{1}+D_{2})=0$ and $d\hat{\kern-1.49994ptl}_{\rho}(C_{i},D_{i})=\|x-y\|$ for $i=1,2$ for sufficiently large $\rho$ .
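Both singleton examples can be reproduced with a few lines of code. This is a minimal sketch in $\mathbb{R}^{2}$ with the Euclidean norm, assuming the truncated Hausdorff distance is the larger of the two excesses after truncation to $\mathbb{B}_{X}(\rho)$; the helper names and the particular vectors $x$, $y$, $z$ are hypothetical choices made only for illustration.

```python
import numpy as np

def excess(C, D):
    """exs(C; D) for finite point sets stored as arrays of shape (k, n)."""
    if len(C) == 0:
        return 0.0  # assumed convention for an empty truncation
    dists = np.linalg.norm(C[:, None, :] - D[None, :, :], axis=2)
    return float(dists.min(axis=1).max())

def truncated_hausdorff(C, D, rho):
    """Assumed form: max of the two excesses after truncating to the ball."""
    def in_ball(A):
        return A[np.linalg.norm(A, axis=1) <= rho]
    return max(excess(in_ball(C), D), excess(in_ball(D), C))

def minkowski_sum(A, B):
    """All pairwise sums a + b for finite sets A and B."""
    return (A[:, None, :] + B[None, :, :]).reshape(-1, A.shape[1])

rho = 10.0  # "sufficiently large" for the points below
x, y, z = np.array([1.0, 0.0]), np.array([0.0, 2.0]), np.array([0.3, 0.4])

# Sharp case: every set is shifted by the same vector z.
C1, C2 = x[None, :], y[None, :]
D1, D2 = (x + z)[None, :], (y + z)[None, :]
sum_dist = truncated_hausdorff(minkowski_sum(C1, C2), minkowski_sum(D1, D2), rho)
parts = [truncated_hausdorff(C1, D1, rho), truncated_hausdorff(C2, D2, rho)]

# Strict case: the perturbations cancel in the sum.
E1, E2, F1, F2 = x[None, :], -x[None, :], y[None, :], -y[None, :]
cancel_dist = truncated_hausdorff(minkowski_sum(E1, E2), minkowski_sum(F1, F2), rho)
cancel_part = truncated_hausdorff(E1, F1, rho)
print(sum_dist, parts, cancel_dist, cancel_part)
```

With $\|z\|=0.5$, the first case returns approximately $1=2\|z\|$ for the sums and $0.5$ for each pair, while the second returns $0$ for the sums and $\|x-y\|=\sqrt{5}$ for each pair.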

3.7 Corollary

(set multiplications). For nonempty subsets $C$ and $D$ of a normed linear space, nonzero $\lambda,\mu\in\mathbb{R}$ , and $\rho\in\mathbb{R}_{+}$ ,

[TABLE]

when $\bar{\rho}>(2\rho+\max\{|\lambda|\mathop{\rm dist}(0,C),|\lambda|\mathop{\rm dist}(0,D),|\mu|\mathop{\rm dist}(0,D)\})\max\{|\lambda^{-1}|,|\mu^{-1}|\}$ .

**Proof. **The result follows from Thm. 3.5 by setting $S(x)=\lambda x$ and $T(x)=\mu x$ .

We end the section by recording a useful fact about the distance between level-sets of two convex functions, which extends [32, Prop. 7.68] by allowing the functions to be different.

3.8 Proposition

(level-sets; convex case). For $\rho\in\mathbb{R}_{+}$ , $\alpha,\beta\in[-\rho,\rho]$ , and proper convex lsc functions $f,g:\mathbb{R}^{n}\to\overline{\mathbb{R}}$ , suppose that $\alpha>\inf f$ , $\beta>\inf g$ , $\mathop{\rm argmin}\nolimits f\neq\emptyset$ , and $\mathop{\rm argmin}\nolimits g\neq\emptyset$ . Then, with $\eta=d\hat{\kern-1.49994ptl}_{\rho}(\mathop{\rm epi}f,\mathop{\rm epi}g)$ ,

[TABLE]

provided that $\rho_{0}\geq\max\{\mathop{\rm dist}(0,\mathop{\rm argmin}\nolimits f),\mathop{\rm dist}(0,\mathop{\rm argmin}\nolimits g)\}$ and $\rho^{*}\geq\max\{\rho_{0},\rho+d\hat{\kern-1.49994ptl}_{\rho}(\mathop{\rm epi}f,\mathop{\rm epi}g)\}$ .

**Proof. **By Prop. 4.5 in [33], $\mathop{\rm exs}(\mathop{\mathop{\rm lev}}\nolimits_{\alpha}f\cap\mathbb{B}_{\mathbb{R}^{n}}(\rho);\mathop{\mathop{\rm lev}}\nolimits_{\alpha+\eta}g)\leq\eta$ . An application of Prop. 7.68 in [32] yields

[TABLE]

whenever $\alpha+\eta>\beta$ . If $\alpha+\eta\leq\beta$ , then $\mathop{\rm exs}(\mathop{\mathop{\rm lev}}\nolimits_{\alpha+\eta}g\cap\mathbb{B}_{\mathbb{R}^{n}}(\rho^{*});\mathop{\mathop{\rm lev}}\nolimits_{\beta}g)=0$ . Let $x\in\mathop{\mathop{\rm lev}}\nolimits_{\alpha}f\cap\mathbb{B}_{\mathbb{R}^{n}}(\rho)$ . There exists $y\in\mathop{\mathop{\rm lev}}\nolimits_{\alpha+\eta}g$ with $\|y-x\|\leq\eta$ so that $y\in\mathbb{B}_{\mathbb{R}^{n}}(\rho^{*})$ . Thus, we have established that

[TABLE]

Repeating the argument with the roles of $f$ and $g$ reversed leads to the conclusion.

The proposition relies heavily on the assumption that $\mathop{\mathop{\rm lev}}\nolimits_{\alpha}f$ and $\mathop{\mathop{\rm lev}}\nolimits_{\beta}g$ have nonempty interiors. The next section dispenses with that requirement as well as convexity.

4 Distances between Epigraphs of Functions

As special sets, epigraphs offer several possibilities to specialize the results of the previous section and also develop new ones. First, we examine the Kenmochi conditions and their numerous applications including in the analysis of constrained problems with feasible sets that lack interiors. Second, we develop a series of calculus rules relying, in part, on Section 3.

For a metric space $(X,d_{X})$ , let the closed balls at $x\in X$ be denoted by

[TABLE]

4.1 Kenmochi Conditions and Applications

An alternative expression for the truncated Hausdorff distance between epigraphs is provided by the Kenmochi conditions, which can be traced back to [21]; see also [4]. The following result generalizes [33, Prop. 3.2] by relaxing a lsc assumption and establishing that the conditions provide tight estimates. A proof is provided in the appendix.

4.1 Proposition

(Kenmochi conditions). For a metric space $X$ , functions $f,g:X\to\overline{\mathbb{R}}$ , both with nonempty epigraphs, and $\rho\in\mathbb{R}_{+}$ ,

[TABLE]

For $\alpha\in(0,\infty)$ , a function $f:X\to\overline{\mathbb{R}}$ is $\alpha$ -Hölder continuous with modulus $\kappa:\mathbb{R}_{+}\to\mathbb{R}_{+}$ if

[TABLE]

The function is Lipschitz continuous with modulus $\kappa:\mathbb{R}_{+}\to\mathbb{R}_{+}$ if the relation holds with $\alpha=1$ .

The truncated Hausdorff distance between epigraphs of functions of this kind can be bounded by an expression involving the worst pointwise difference between the functions over a set.

4.2 Proposition

(estimates from sup-norm). For a metric space $X$ , functions $f,g:X\to\overline{\mathbb{R}}$ with nonempty epigraphs, and $\rho\in\mathbb{R}_{+}$ , we have that

[TABLE]

where $A_{\rho}=\mathop{\mathop{\rm lev}}\nolimits_{\rho}f\cup\mathop{\mathop{\rm lev}}\nolimits_{\rho}g\cap\mathbb{B}_{X}(\rho)$ . (Supremum over an empty set is interpreted as zero in this case.) Suppose also that $f$ and $g$ are $\alpha$ -Hölder continuous with common modulus $\kappa:\mathbb{R}_{+}\to\mathbb{R}_{+}$ and $\alpha\in(0,\infty)$ . Then, for any nonempty $C\subset X$ ,

[TABLE]

provided that $\hat{\rho}>\rho+\mathop{\rm exs}(A_{\rho};C)$ .

**Proof. **The first assertion holds via Prop. 4.1. For the second assertion, set $\eta=\mathop{\rm exs}(A_{\rho};C)$ and let $\varepsilon\in(0,\hat{\rho}-\rho-\eta)$ . Suppose that $x\in\mathop{\mathop{\rm lev}}\nolimits_{\rho}f\cap\mathbb{B}_{X}(\rho)$ . Then, there exists $\bar{x}\in C$ with $d_{X}(x,\bar{x})\leq\bar{\eta}=\eta+\varepsilon$ and

[TABLE]

A similar result holds with the roles of $f$ and $g$ reversed. Thus, by Prop. 4.1, $d\hat{\kern-1.49994ptl}_{\rho}(\mathop{\rm epi}f,\mathop{\rm epi}g)\leq\max\{\bar{\eta},\kappa(\hat{\rho})\bar{\eta}^{\alpha}+\mathop{\rm sup}\nolimits_{C}|f-g|\}$ . Since $\varepsilon$ is arbitrary, $\bar{\eta}$ can be replaced by $\eta$ and the second conclusion holds.

Example 1: sample average approximations. In stochastic optimization and statistical learning, $f:X\to\mathbb{R}$ is often given as $f(x)=\mathbb{E}[\psi(\boldsymbol{\xi},x)]$ , where $\psi:\Xi\times X\to\mathbb{R}$ and $\mathbb{E}$ denotes the expectation under the distribution of the random vector $\boldsymbol{\xi}$ with values in $\Xi$ . Under standard assumptions (see [32, Ch. 14], [39, Ch. 7]), $f$ is well defined and Lipschitz continuous with modulus $\kappa:\mathbb{R}_{+}\to\mathbb{R}_{+}$ . An approximation of $f$ could be the sample average function $f^{\nu}:X\to\mathbb{R}$ given by $f^{\nu}(x)=\nu^{-1}\sum_{i=1}^{\nu}\psi(\xi^{i},x)$ , where $\xi^{1},\dots,\xi^{\nu}\in\Xi$ are given data. Under related assumptions, $f^{\nu}$ is also Lipschitz continuous with the same modulus as $f$ . When $X$ is finitely compact (recall that a metric space is finitely compact if all its balls are compact), $A_{\rho}$ in Prop. 4.2 is compact and it is possible to construct for any $\varepsilon>0$ a set $C$ consisting of only a finite number of points and still have $\mathop{\rm exs}(A_{\rho};C)\leq\varepsilon$ . Since $C$ is finite, there exists a variety of ways of bounding $\mathop{\rm sup}\nolimits_{C}|f-f^{\nu}|$ , say by $\delta$ , using the theory of large deviations; see for example [39, Ch. 7]. Prop. 4.2 then gives that $d\hat{\kern-1.49994ptl}_{\rho}(\mathop{\rm epi}f,\mathop{\rm epi}f^{\nu})\leq\max\{\varepsilon,\kappa(\hat{\rho})\varepsilon+\delta\}$ when $\hat{\rho}>\rho+\varepsilon$ .
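To make the recipe of Example 1 concrete, here is a minimal one-dimensional sketch. The integrand $\psi(\xi,x)=|x-\xi|$ with $\boldsymbol{\xi}$ uniform on $[0,1]$, the function names, and the sample size are all assumptions made for illustration; they are chosen because $f$ then has a closed form and both $f$ and $f^{\nu}$ are Lipschitz with constant modulus $\kappa=1$. Here $\delta$ is computed directly on the finite set $C$ rather than bounded by large deviations theory.

```python
import math
import random

def f_true(x):
    """E|x - xi| for xi uniform on [0, 1], in closed form."""
    if x < 0.0:
        return 0.5 - x
    if x > 1.0:
        return x - 0.5
    return x * x - x + 0.5

def f_saa(x, sample):
    """Sample average approximation of f_true at x."""
    return sum(abs(x - xi) for xi in sample) / len(sample)

def finite_net(rho, eps):
    """Grid on [-rho, rho] with spacing at most 2*eps, so that every point
    of the interval (hence of A_rho) lies within eps of the grid."""
    n = max(1, math.ceil(rho / eps))
    return [-rho + 2.0 * rho * k / n for k in range(n + 1)]

def epi_distance_bound(rho, eps, sample, kappa=1.0):
    """Evaluate max{eps, kappa*eps + delta} with delta = sup_C |f - f_saa|."""
    net = finite_net(rho, eps)
    delta = max(abs(f_true(c) - f_saa(c, sample)) for c in net)
    return max(eps, kappa * eps + delta), delta

random.seed(0)
sample = [random.random() for _ in range(2000)]
bound, delta = epi_distance_bound(rho=2.0, eps=0.05, sample=sample)
print(bound, delta)
```

Because $\kappa\varepsilon+\delta\geq\varepsilon$ whenever $\kappa\geq 1$, the bound here reduces to $\varepsilon+\delta$, which shows how the mesh size of $C$ and the sampling error enter additively.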

The next result extends [33, Prop. 3.3] by moving from indicator functions to general functions and from Lipschitz to Hölder continuous functions; see also [4, 9] for results on sums in the convex case.

4.3 Proposition

(sums under Hölder continuity). For a metric space $X$ , functions $f_{i},g_{i}:X\to\overline{\mathbb{R}}$ , $i=1,2$ , where $f_{1},g_{1}$ are $\alpha$ -Hölder continuous with common modulus $\kappa:\mathbb{R}_{+}\to\mathbb{R}_{+}$ , $\alpha\in(0,\infty)$ , and both $\mathop{\rm epi}(f_{1}+f_{2})$ and $\mathop{\rm epi}(g_{1}+g_{2})$ are nonempty. Then, for $\rho\in\mathbb{R}_{+}$ ,

[TABLE]

where $\eta=d\hat{\kern-1.49994ptl}_{\bar{\rho}}(\mathop{\rm epi}f_{2},\mathop{\rm epi}g_{2})$ , provided that $A_{\rho}=\mathop{\mathop{\rm lev}}\nolimits_{\rho}(f_{1}+f_{2})\cup\mathop{\mathop{\rm lev}}\nolimits_{\rho}(g_{1}+g_{2})\cap\mathbb{B}_{X}(\rho)\neq\emptyset$ , $\bar{\rho}\geq\rho+\max\{\mathop{\rm sup}\nolimits_{\mathbb{B}_{X}(\rho)}|f_{1}|,\mathop{\rm sup}\nolimits_{\mathbb{B}_{X}(\rho)}|g_{1}|\}$ , and $\hat{\rho}>\rho+\eta$ .

**Proof. **Let $\varepsilon\in(0,\hat{\rho}-\rho-\eta)$ and $x\in\mathop{\mathop{\rm lev}}\nolimits_{\rho}(f_{1}+f_{2})\cap\mathbb{B}_{X}(\rho)$ . Then, $f_{2}(x)\leq\rho-f_{1}(x)\leq\bar{\rho}$ . First, suppose that $f_{2}(x)\geq-\bar{\rho}$ so that $(x,f_{2}(x))\in\mathop{\rm epi}f_{2}\cap\mathbb{B}_{X\times\mathbb{R}}(\bar{\rho})$ . Consequently, there is $(\bar{x},\bar{\alpha})$ $\in$ $\mathop{\rm epi}g_{2}$ with $d_{X}(x,\bar{x})\leq\eta+\varepsilon$ and $|\bar{\alpha}-f_{2}(x)|\leq\eta+\varepsilon$ . Thus, $g_{2}(\bar{x})\leq\bar{\alpha}\leq f_{2}(x)+\eta+\varepsilon$ and

[TABLE]

Second, suppose that $f_{2}(x)<-\bar{\rho}$ . Then, $(x,-\bar{\rho})\in\mathop{\rm epi}f_{2}\cap\mathbb{B}_{X\times\mathbb{R}}(\bar{\rho})$ and there is $(\bar{x},\bar{\alpha})\in\mathop{\rm epi}g_{2}$ with $d_{X}(x,\bar{x})\leq\eta+\varepsilon$ and $|\bar{\alpha}+\bar{\rho}|\leq\eta+\varepsilon$ . Thus, $g_{2}(\bar{x})\leq\bar{\alpha}\leq-\bar{\rho}+\eta+\varepsilon$ and, similar to above,

[TABLE]

The last inequality follows because $f_{1}(x)-\bar{\rho}\leq\mathop{\rm sup}\nolimits_{\mathbb{B}_{X}(\rho)}|f_{1}|-\bar{\rho}\leq-\rho$ . Thus, in both cases, we obtain the same upper bound on $\mathop{\rm inf}\nolimits_{\mathbb{B}_{X}(x,\eta+\varepsilon)}g_{1}+g_{2}$ . Repeating these arguments with the roles of $f_{1},f_{2}$ switched with those of $g_{1},g_{2}$ , we obtain via Prop. 4.1 that $d\hat{\kern-1.49994ptl}_{\rho}\big{(}\mathop{\rm epi}(f_{1}+f_{2}),\mathop{\rm epi}(g_{1}+g_{2})\big{)}\leq\max\{\eta+\varepsilon,\mathop{\rm sup}\nolimits_{A_{\rho}}|f_{1}-g_{1}|+\kappa(\hat{\rho})(\eta+\varepsilon)^{\alpha}+\eta+\varepsilon\}$ . Since $\varepsilon$ is arbitrary, the conclusion follows.

Example 1: continued. Suppose that in addition to $f$ the problem of interest involves a “regularizer” $r:X\to[0,\infty)$ , which is common in statistical learning, i.e., we aim to minimize $f+r$ . We may want to examine the stability of solutions under changes to $r$ . Let $r^{\nu}:X\to[0,\infty)$ be such an alternative regularizer. A prime example is when $r=0$ and we want to quantify the effect of the regularizer $r^{\nu}$ . We are therefore interested in comparing $\mathop{\rm epi}(f+r)$ to $\mathop{\rm epi}(f^{\nu}+r^{\nu})$ . Suppose that $r$ and $r^{\nu}$ are $\alpha$ -Hölder continuous with common modulus $\mu:\mathbb{R}_{+}\to\mathbb{R}_{+}$ and $\alpha\in(0,\infty)$ , and $X=\mathbb{R}^{n}$ . A possible choice is to have $r^{\nu}(x)=\sum_{j=1}^{n}s^{\nu}(x_{j})$ with $s^{\nu}(\tau)=\lambda|\tau|-\nu\tau^{2}/2$ when $|\tau|\leq\lambda/\nu$ and $s^{\nu}(\tau)=\lambda^{2}/(2\nu)$ otherwise, with $\lambda>0$ being a parameter. This makes $r^{\nu}$ a nonconvex function with Lipschitz modulus $\lambda$ globally. An even more aggressive regularizer would be $s^{\nu}(\tau)=\nu^{-1}\sqrt{|\tau|}$ , possibly further scaled, which is nonconvex but $1/2$ -Hölder continuous. Regardless, Prop. 4.3 establishes that

[TABLE]

where $\eta=d\hat{\kern-1.49994ptl}_{\bar{\rho}}(\mathop{\rm epi}f,\mathop{\rm epi}f^{\nu})$ can be expressed in terms of $\kappa$ , $\varepsilon$ , and $\delta$ , and $A_{\rho}$ and $\hat{\rho}$ are sufficiently large as stipulated by the proposition. In particular when $r=0$ , this error bound provides guidance on how fast the regularizer should vanish as the sample size $\nu$ grows. Typically, the sample error $\delta$ is of order $\nu^{-1/2}$ , which indicates that $r^{\nu}$ should vanish at the same rate at least when $\alpha=1$ .
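The properties claimed for the first regularizer in the continued example (continuity at the switching point $|\tau|=\lambda/\nu$, global Lipschitz modulus $\lambda$, and nonconvexity) can be sanity-checked numerically. The sketch below uses hypothetical parameter values $\lambda=0.7$ and $\nu=2$ and a finite-difference check on a grid, so it illustrates the claims rather than proving them.

```python
def s_nu(tau, lam, nu):
    """Scalar regularizer: lam*|tau| - nu*tau^2/2 near zero, constant beyond."""
    a = abs(tau)
    if a <= lam / nu:
        return lam * a - nu * tau * tau / 2.0
    return lam * lam / (2.0 * nu)

def largest_slope(lam, nu, width=3.0, n=6001):
    """Largest absolute difference quotient of s_nu on a uniform grid."""
    grid = [-width + 2.0 * width * k / (n - 1) for k in range(n)]
    vals = [s_nu(t, lam, nu) for t in grid]
    return max(
        abs(vals[k + 1] - vals[k]) / (grid[k + 1] - grid[k])
        for k in range(n - 1)
    )

lam, nu = 0.7, 2.0  # illustrative parameter values
slope = largest_slope(lam, nu)

# Continuity at the switching point |tau| = lam/nu.
jump = abs(s_nu(lam / nu, lam, nu) - lam * lam / (2.0 * nu))

# Nonconvexity: the midpoint of 0 and 2*lam/nu lies above the chord.
a, b = 0.0, 2.0 * lam / nu
midpoint_gap = s_nu(0.5 * (a + b), lam, nu) - 0.5 * (s_nu(a, lam, nu) + s_nu(b, lam, nu))
print(slope, jump, midpoint_gap)
```

The largest difference quotient stays below $\lambda$, and the positive midpoint gap confirms that $s^{\nu}$ violates the convexity inequality.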

Example 2: disjunctive programming. Suppose that $\{C_{\alpha},\alpha\in A\}$ is a collection of nonempty subsets of a Hilbert space $X$ and $c\in X$ . Disjunctive programming studies problems of the form minimize $\langle c,x\rangle$ subject to $x\in\cup_{\alpha\in A}C_{\alpha}$ . The effect of replacing $c$ by $d\in X$ and the sets by $\{D_{\alpha}\neq\emptyset,\alpha\in A\}$ on the minimum value and set of near-minimizers can be bounded by Prop. 2.2 via Prop. 4.3 and Prop. 3.3. Specifically, let $f(x)=\langle c,x\rangle$ if $x\in C=\cup_{\alpha\in A}C_{\alpha}$ and $f(x)=\infty$ otherwise. Likewise, $g(x)=\langle d,x\rangle$ if $x\in D=\cup_{\alpha\in A}D_{\alpha}$ and $g(x)=\infty$ otherwise. Since $\inf_{x\in\mathbb{B}_{X}(\rho)}\langle c,x\rangle\geq-\rho\|c\|$ and similarly with $c$ replaced by $d$ , $\bar{\rho}$ can be set to $\rho(1+\max\{\|c\|,\|d\|\})$ in Prop. 4.3 and, in view of the Lipschitz continuity of $\langle c,\cdot\rangle$ and $\langle d,\cdot\rangle$ ,

[TABLE]

where the last inequality follows by Cor. 3.2 and Prop. 3.3. Consequently, solutions of disjunctive programs exhibit a Lipschitz property in this sense under a remarkable absence of assumptions.

As already discussed in Section 3, intersections of sets are generally not stable under perturbations of the individual sets. This fact is the source of many difficulties in constrained optimization. In particular, if the problem of minimizing $f_{0}(x)$ subject to $x\in C_{\alpha}$ for all $\alpha\in A$ is “approximated” by minimizing $g_{0}(x)$ subject to $x\in D_{\alpha}$ for all $\alpha\in A$ , with both $\mathop{\rm sup}\nolimits_{X}|f_{0}-g_{0}|$ and $d\hat{\kern-1.49994ptl}_{\rho}(C_{\alpha},D_{\alpha})$ being “small” for all $\alpha\in A$ , then their solutions can still be arbitrarily far apart. The issue surfaces even in one dimension: for example, set $f_{0}(x)=g_{0}(x)=x$ , $C_{1}=D_{1}=\{0,1\}$ , $C_{2}=[0,1-\varepsilon]$ , and $D_{2}=[\varepsilon,1]$ for $\varepsilon\in(0,1)$ . Thus, a major challenge is to construct approximating problems that are associated with small truncated Hausdorff distances to their original counterparts. We observe that in the convex case having an intersection of constraint sets with nonempty interior suffices to avoid this difficulty as long as the approximations are sufficiently accurate; see [9, Cor. 2.5].
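The one-dimensional example can be verified directly. In this minimal sketch the function names are hypothetical and the objective is $f_{0}(x)=g_{0}(x)=x$ as in the text; it shows that the constraint intervals differ by only $\varepsilon$ in Hausdorff distance, yet the minimizers of the two problems are a unit distance apart.

```python
def minimizer_on_intersection(points, interval):
    """Minimize x over {x in points : lo <= x <= hi}; None if infeasible."""
    lo, hi = interval
    feasible = [p for p in points if lo <= p <= hi]
    return min(feasible) if feasible else None

def interval_hausdorff(a, b):
    """Hausdorff distance between two nonempty compact intervals."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))

eps = 0.01
C1 = D1 = [0.0, 1.0]
C2, D2 = (0.0, 1.0 - eps), (eps, 1.0)

x_actual = minimizer_on_intersection(C1, C2)  # feasible set {0}
x_approx = minimizer_on_intersection(D1, D2)  # feasible set {1}
set_error = interval_hausdorff(C2, D2)        # equals eps
print(x_actual, x_approx, set_error)
```

The solution error $|x_{\rm actual}-x_{\rm approx}|=1$ does not shrink as $\varepsilon$ decreases, which is the instability the paragraph describes.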

We illustrate three cases, while neither making assumptions about the feasible sets having an interior nor being convex. Moreover, the approximations can be arbitrarily poor, i.e., we are not only considering small perturbations. This forces us to construct approximating problems that are rather different from the actual problems because simply replacing objective functions and constraint sets by approximating counterparts usually fails to achieve small solution errors, as the trivial example in the previous paragraph highlights.

Case I. The first case analyzes the feasibility problem of finding an $x\in\cap_{i=1}^{m}C_{i}$ when we only have approximating sets $D_{1},\dots,D_{m}$ . We construct an approximating optimization problem in a higher-dimensional space that furnishes an approximating solution of the actual feasibility problem and is computationally attractive as it “nearly” decomposes into $m$ subproblems.

4.4 Theorem

(approximation of feasibility problem). For subsets $C_{1},\dots,C_{m}$ and $D_{1},\dots,D_{m}$ of a metric space $(X,d_{X})$ , with centroid $x^{\rm ctr}$ , $\lambda\in(0,\infty)$ , $\rho>2\lambda(m-1)\max_{i=1,\dots,m}d_{X}(x^{\rm ctr},D_{i})$ , with $\cap_{i=1}^{m}C_{i}\cap\mathbb{B}_{X}(\rho)\neq\emptyset$ , and $\bar{\rho}\in(3\rho,\infty)$ , suppose that the following constraint qualification holds: there exists a nondecreasing function $\psi:\mathbb{R}_{+}\to\mathbb{R}_{+}$ such that

[TABLE]

Then, any solution

[TABLE]

satisfies

[TABLE]

**Proof. **Let $C=C_{1}\times\dots\times C_{m}\subset X^{m}$ , $D=D_{1}\times\dots\times D_{m}\subset X^{m}$ , and define $f,f^{\lambda},g^{\lambda}:X^{m}\to\overline{\mathbb{R}}$ to have $f(x_{1},\dots,x_{m})=0$ if $(x_{1},\dots,x_{m})\in C$ and $x_{i}=x_{1}$ for all $i$ , $f^{\lambda}(x_{1},\dots,x_{m})=\lambda\sum_{i=1}^{m}d_{X}(x_{i},x_{1})$ if $(x_{1},\dots,x_{m})\in C$ , and $g^{\lambda}(x_{1},\dots,x_{m})=\lambda\sum_{i=1}^{m}d_{X}(x_{i},x_{1})$ if $(x_{1},\dots,x_{m})\in D$ . Otherwise, the functions take the value $\infty$ .

First, we examine the Kenmochi conditions for $f$ and $f^{\lambda}$ . Suppose $(x_{1},\dots,x_{m})$ $\in$ $\mathop{\mathop{\rm lev}}\nolimits_{\bar{\rho}}f\cap\mathbb{B}_{X^{m}}(\bar{\rho})$ . (Note that $X^{m}=X\times\dots\times X$ is equipped with the product metric.) Then, $(x_{1},\dots,x_{m})\in C$ and $x_{i}=x_{1}$ for all $i$ . Thus, $\mathop{\rm inf}\nolimits_{\mathbb{B}_{X^{m}}((x_{1},\dots,x_{m}),0)}f^{\lambda}\leq f^{\lambda}(x_{1},\dots,x_{m})=0=f(x_{1},\dots,x_{m})$ and the first set of Kenmochi conditions holds with $\eta=0$ . Next, suppose that $(x_{1},\dots,x_{m})\in\mathop{\mathop{\rm lev}}\nolimits_{\bar{\rho}}f^{\lambda}\cap\mathbb{B}_{X^{m}}(\bar{\rho})$ . Then, $x_{i}\in C_{i}$ for all $i$ and $\lambda\sum_{i=1}^{m}d_{X}(x_{i},x_{1})\leq\bar{\rho}$ . In view of the constraint qualification, this implies that

[TABLE]

Let $\varepsilon>0$ . There exists $\bar{x}\in\cap_{i=1}^{m}C_{i}$ such that $\mathop{\rm dist}(x_{1},\cap_{i=1}^{m}C_{i})\geq d_{X}(x_{1},\bar{x})-\varepsilon$ . Certainly,

[TABLE]

Then, with $\eta=\bar{\rho}/\lambda+\psi(\bar{\rho}/\lambda)+\varepsilon$ ,

[TABLE]

and the second set of Kenmochi conditions holds with this $\eta$ . Since $\varepsilon$ is arbitrary, we have established via Prop. 4.1 that

[TABLE]

Second, we estimate $d\hat{\kern-1.49994ptl}_{\bar{\rho}}(\mathop{\rm epi}f^{\lambda},\mathop{\rm epi}g^{\lambda})$ . The Lipschitz modulus of the function $(x_{1},\dots,x_{m})\mapsto\lambda\sum_{i=1}^{m}d_{X}(x_{i},x_{1})$ is the constant $2m\lambda$ . By Prop. 3.1, Prop. 4.3, and Cor. 3.2,

[TABLE]

For any $\varepsilon>0$ , we have that

[TABLE]

Thus, $\bar{\rho}>3\rho$ is sufficiently large for use in Prop. 2.1 and

[TABLE]

We next apply Prop. 2.2 to the functions $f$ and $g^{\lambda}$ . The conditions of the proposition are easily verified. In particular, for $(x_{1},\dots,x_{m})\in D$ ,

[TABLE]

which together with the fact that $d_{X}(x_{i},x_{1})\leq 2\max_{i=1,\dots,m}\mathop{\rm dist}(x^{\rm ctr},D_{i})+\varepsilon$ for any $\varepsilon>0$ ensure that

[TABLE]

Consequently, Prop. 2.2 yields $\mathop{\rm exs}\big{(}\mathop{\rm argmin}\nolimits g^{\lambda}\cap\mathbb{B}_{X^{m}}(\rho);~{}\delta\mbox{-}\mathop{\rm argmin}\nolimits f\big{)}\leq\eta$ for $\delta>2\eta$ . Since $\delta\mbox{-}\mathop{\rm argmin}\nolimits f=\{(x_{1},\dots,x_{m})\in C~{}|~{}x_{i}=x_{1},i=1,\dots,m\}$ for $\delta\geq 0$ , the conclusion holds.

The constraint qualification quantifies how close the points $\{x_{i}\in C_{i},i=1,\dots,m\}$ will be to $\cap_{i=1}^{m}C_{i}$ when the points are close to each other. An example similar to the one discussed prior to the theorem is furnished by $C_{1}=D_{1}=\{0,1\}$ , $C_{2}=[0,1-\delta]$ , with $\delta\in(0,1)$ , and $D_{2}=[\varepsilon,1-\delta]$ , with $\varepsilon\in(0,1-\delta]$ , where $d\hat{\kern-1.49994ptl}_{\rho}(C_{i},D_{i})\leq\varepsilon$ for $i=1,2$ and $\rho\geq\varepsilon$ . Thus, $C_{1}\cap C_{2}=\{0\}$ , but $D_{1}\cap D_{2}=\emptyset$ and it would be futile to attempt to find a feasible point in $C_{1}\cap C_{2}$ by solving $x\in D_{1}\cap D_{2}$ . However, the approximating problem of the theorem produces the desired result. Specifically, in this case we can take $\psi(\gamma)=\gamma/\delta$ for $\gamma\geq 0$ . Thus, the approximating problem produces a solution with error of at most $\bar{\rho}(\lambda^{-1}+\delta^{-1}\lambda^{-1})+(1+4\lambda)\varepsilon$ . As $\varepsilon\searrow 0$ , this error vanishes as long as $\lambda$ is set appropriately, for example to $\varepsilon^{-1/2}$ .
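For this two-set example the approximating problem can be solved by enumeration. The sketch below assumes the problem is to minimize $\lambda\sum_{i}d_{X}(x_{i},x_{1})$ over $x_{i}\in D_{i}$, which is the form of $g^{\lambda}$ in the proof of Thm. 4.4; the function name and the values $\delta=0.2$, $\varepsilon=0.01$ are hypothetical.

```python
def solve_approximating_problem(d1_points, d2_interval, lam):
    """Minimize lam*|x2 - x1| over x1 in a finite set and x2 in an interval.

    For each x1 the best x2 is the projection of x1 onto the interval,
    so enumerating x1 suffices.
    """
    lo, hi = d2_interval
    best = None
    for x1 in d1_points:
        x2 = min(max(x1, lo), hi)  # projection of x1 onto [lo, hi]
        value = lam * abs(x2 - x1)
        if best is None or value < best[0]:
            best = (value, x1, x2)
    return best

delta, eps = 0.2, 0.01
lam = eps ** -0.5  # the choice lambda = eps^(-1/2) suggested in the text

value, x1, x2 = solve_approximating_problem([0.0, 1.0], (eps, 1.0 - delta), lam)
print(value, x1, x2)
```

Although $D_{1}\cap D_{2}$ is empty, the minimizer has $x_{1}=0$, the unique point of $C_{1}\cap C_{2}$, and $x_{2}=\varepsilon$, which tends to that point as $\varepsilon$ decreases, provided $\varepsilon<\delta$.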

In general, the rate of convergence depends on the conditioning function $\psi$ . Poor conditioning requires a large $\lambda$ that in turn increases the third term in the conclusion of Thm. 4.4. Even in the convex case, the conditioning can be arbitrarily poor: let $C_{1}=\{x\in\mathbb{R}^{2}~{}|~{}x_{2}\leq 0\}$ and $C_{2}=\{x\in\mathbb{R}^{2}~{}|~{}x_{1}^{\alpha}\leq x_{2}\}$ for $\alpha>1$ , with $C_{1}\cap C_{2}=\{0\}$ . Then, $\psi(\gamma)=\gamma^{1/\alpha}$ and $x_{1}\in C_{1}$ and $x_{2}\in C_{2}$ can be close even though $x_{1}$ is far from the origin for large $\alpha$ . Further details about constraint qualifications arise in the following two theorems for the case of inequality constraints.

Case II. The second case considers the optimization problem

[TABLE]

for which the actual functions need to be approximated by $g_{0},\dots,g_{m}$ . As already mentioned, an “approximating” problem obtained by simply replacing $f_{i}$ by $g_{i}$ for $i=0,1,\dots,m$ might fail to be epigraphically close to the actual problem (1) even though $\max_{i=0,\dots,m}\mathop{\rm sup}\nolimits_{x\in X}|f_{i}(x)-g_{i}(x)|$ is small. In particular, $\{x\in X~{}|~{}g_{i}(x)\leq 0,i=1,\dots,m\}$ could be empty while the actual feasible set is nonempty. As an alternative, we examine for $\lambda>0$ the approximating problem

[TABLE]

with variable $y=(y_{1},\dots,y_{m})\in\mathbb{R}^{m}$ . We see next that this approximating problem furnishes approximating solutions for (1) via Prop. 2.2.

4.5 Theorem

(approximation by constraint softening). For a metric space $X$ and $f_{i},g_{i}:X\to\mathbb{R}$ , $i=0,1,\dots,m$ , where $f_{0}$ and $g_{0}$ are Lipschitz continuous with common modulus $\kappa:\mathbb{R}_{+}\to\mathbb{R}_{+}$ , consider the functions $f,g^{\lambda}:X\times\mathbb{R}^{m}\to\overline{\mathbb{R}}$ defined by

[TABLE]

and, with $\lambda\in(0,\infty)$ ,

[TABLE]

Then, for $\rho\in\mathbb{R}_{+}$ (here we use the product metric on $X\times\mathbb{R}^{m}$ constructed from the sup-norm on $\mathbb{R}^{m}$ ),

[TABLE]

as long as $\bar{\rho}>2\rho+\max\{\mathop{\rm dist}((x^{\rm ctr},0),\mathop{\rm epi}f),\mathop{\rm dist}((x^{\rm ctr},0),\mathop{\rm epi}g^{\lambda})\}$ , $\rho^{*}\geq\bar{\rho}$ $+$ $\max\{0,$ $-\inf_{\mathbb{B}_{X}(\bar{\rho})}f_{0}\}$ , $\hat{\rho}>\bar{\rho}+\max\{\rho^{*}/\lambda,\psi^{-1}(\rho^{*}/\lambda)\}$ , and the following constraint qualification holds: there is a strictly increasing function $\psi:\mathbb{R}_{+}\to\mathbb{R}_{+}$ such that

[TABLE]

**Proof. **As intermediate steps, we define $h,h^{\lambda},f^{\lambda}:X\times\mathbb{R}^{m}\to\overline{\mathbb{R}}$ to have values $h(x,y)=\iota_{X\times\{0\}}(x,y)+\iota_{C}(x,y)$ , with $C=\{(x,y)\in X\times\mathbb{R}^{m}~{}|~{}f_{i}(x)\leq y_{i},y_{i}\geq 0,i=1,\dots,m\}$ , and

[TABLE]

First, we examine the Kenmochi conditions for $h$ and $h^{\lambda}$ . Let $(x,y)\in\mathop{\mathop{\rm lev}}\nolimits_{\rho^{*}}h^{\lambda}\cap\mathbb{B}_{X\times\mathbb{R}^{m}}(\rho^{*})$ . Thus, $(x,y)\in C$ , $\lambda\sum_{i=1}^{m}y_{i}\leq\rho^{*}$ , and $\|y\|_{\infty}\leq\rho^{*}/\lambda$ . Let $\varepsilon>0$ and $\eta=\max\{\rho^{*}/\lambda,\psi^{-1}(\rho^{*}/\lambda)\}+\varepsilon$ . If $f_{i}(x)\leq 0$ for all $i$ , then

[TABLE]

Otherwise there is $i^{*}$ with $f_{i^{*}}(x)>0$ so that

[TABLE]

and $\psi^{-1}(\rho^{*}/\lambda)\geq\mathop{\rm dist}(x,\mathop{\mathop{\rm lev}}\nolimits_{0}\{\max_{i=1,\dots,m}f_{i}\})$ . There exists $\bar{x}\in\mathop{\mathop{\rm lev}}\nolimits_{0}\{\max_{i=1,\dots,m}f_{i}\}$ such that $d_{X}(x,\bar{x})\leq\mathop{\rm dist}(x,\mathop{\mathop{\rm lev}}\nolimits_{0}\{\max_{i=1,\dots,m}f_{i}\})+\varepsilon\leq\psi^{-1}(\rho^{*}/\lambda)\ +\varepsilon$ . Consequently,

[TABLE]

Thus, the second set of Kenmochi conditions holds with this $\eta$ . Since $h^{\lambda}\leq h$ , the first set also holds. Consequently, since $\varepsilon>0$ is arbitrary and Prop. 4.1 applies, we have established that

[TABLE]

We obtain via Prop. 4.3 that

[TABLE]

Second, we consider the Kenmochi conditions for $f^{\lambda}$ and $g^{\lambda}$ . Let $\delta=\max_{i=0,1,\dots,m}$ $\sup_{\mathbb{B}_{X}(\bar{\rho})}|f_{i}-g_{i}|$ and $(x,y)\in\mathop{\mathop{\rm lev}}\nolimits_{\bar{\rho}}f^{\lambda}\cap\mathbb{B}_{X\times\mathbb{R}^{m}}(\bar{\rho})$ . Then, $(x,y)\in C$ , $f_{i}(x)\leq y_{i}$ , and $g_{i}(x)\leq y_{i}+\delta$ for all $i=1,\dots,m$ . Set $\eta=(1+m\lambda)\delta$ and $\bar{y}=y+(\delta,\dots,\delta)$ . With $B=\mathbb{B}_{X\times\mathbb{R}^{m}}((x,y),\eta)$ , we obtain

[TABLE]

Repeating this argument with the roles of $g^{\lambda}$ and $f^{\lambda}$ reversed, we obtain via Prop. 4.1 that $d\hat{\kern-1.49994ptl}_{\bar{\rho}}(\mathop{\rm epi}f^{\lambda},\mathop{\rm epi}g^{\lambda})\leq(1+m\lambda)\delta$ . Prop. 2.1 then yields the conclusion.

The theorem presents a tradeoff between two error terms. If the conditioning function $\psi(\gamma)=\gamma^{\beta}$ for $\beta>0$ , then $\lambda$ should be of the order $O(\delta^{-\beta/(1+\beta)})$ to balance the two terms, where $\delta=\mathop{\rm max}\nolimits_{i=0,1,\dots,m}\mathop{\rm sup}\nolimits_{\mathbb{B}_{X}(\bar{\rho})}|f_{i}-g_{i}|$ . This leads to the overall rate of convergence $O(\delta^{1/(1+\beta)})$ , which can be significantly worse than what is indicated by the pointwise error $\delta$ . Still, the situation is much improved from the approach of simply minimizing $g_{0}(x)$ subject to $g_{i}(x)\leq 0$ for $i=1,\dots,m$ . As discussed prior to the theorem, that problem may have solutions that are arbitrarily far away from those of the actual problem (1). In some sense, the theorem explains the popularity of formulations with constraint softening in practice (see [15] for a prime example); they are in a fundamental way “robust” to inaccuracy in the constraint functions.
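The stated order of $\lambda$ can be recovered by a short balancing argument. This is a sketch with constants suppressed, assuming the two competing terms are $\psi^{-1}(\rho^{*}/\lambda)$ from the constraint qualification and $(1+m\lambda)\delta$ from the second step of the proof:

$$
\psi^{-1}(\rho^{*}/\lambda)=(\rho^{*}/\lambda)^{1/\beta}\asymp\lambda^{-1/\beta},\qquad(1+m\lambda)\delta\asymp\lambda\delta,
$$

$$
\lambda^{-1/\beta}=\lambda\delta\;\Longleftrightarrow\;\lambda^{(1+\beta)/\beta}=\delta^{-1}\;\Longleftrightarrow\;\lambda=\delta^{-\beta/(1+\beta)},\qquad\text{so that}\qquad\lambda\delta=\lambda^{-1/\beta}=\delta^{1/(1+\beta)}.
$$

With this choice the remaining term $\rho^{*}/\lambda$ is of order $\delta^{\beta/(1+\beta)}$ , which is no larger than $\delta^{1/(1+\beta)}$ when $\beta\geq 1$ and $\delta\leq 1$ .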

Theorem 4.5 makes no Slater-type constraint qualification for the actual problem and places no restrictions on the properties of the constraint functions at points in the feasible set. Naturally, if such conditions are brought in, we can improve the results; cf. Prop. 3.8 and [33, Thm. 4.6].

Case III. While still addressing the actual problem (1), the third case examines the classical penalty method and the resulting unconstrained approximating problems.

4.6 Theorem

(approximation by penalty formulation). For a metric space $X$ , with centroid $x^{\rm ctr}$ , $\lambda\in(0,\infty)$ , and $f_{i},g_{i}:X\to\mathbb{R}$ , $i=0,1,\dots,m$ , where $f_{0}$ and $g_{0}$ are Lipschitz continuous with common modulus $\kappa:\mathbb{R}_{+}\to\mathbb{R}_{+}$ , consider the functions $f,g^{\lambda}:X\times\mathbb{R}^{m}\to\overline{\mathbb{R}}$ defined by

[TABLE]

Then,

[TABLE]

provided that $\bar{\rho}>2\rho+\max\{\mathop{\rm dist}(x^{\rm ctr},\mathop{\rm epi}f),\mathop{\rm dist}(x^{\rm ctr},\mathop{\rm epi}g^{\lambda})\}$ , $\hat{\rho}>\bar{\rho}+\psi^{-1}((\bar{\rho}-\mathop{\rm inf}\nolimits_{\mathbb{B}_{X}(\bar{\rho})}f_{0})\lambda^{-1})$ , and the same constraint qualification as in Thm. 4.5 holds.

**Proof.** As an intermediate quantity, we define $f^{\lambda}:X\to\mathbb{R}$ to have values $f^{\lambda}(x)=f_{0}(x)+\lambda\sum_{i=1}^{m}\max\{0,f_{i}(x)\}$ . We start by examining the Kenmochi conditions for $f$ and $f^{\lambda}$ . Let $x\in\mathop{\mathop{\rm lev}}\nolimits_{\bar{\rho}}f^{\lambda}\cap\mathbb{B}_{X}(\bar{\rho})$ so that $f_{0}(x)+\lambda\sum_{i=1}^{m}\max\{0,f_{i}(x)\}\leq\bar{\rho}$ . If $\max_{i=1,\dots,m}f_{i}(x)>0$ , then

[TABLE]

Since $f_{0}(x)\leq\bar{\rho}$ , $\inf_{\mathbb{B}_{X}(\bar{\rho})}f_{0}\leq\bar{\rho}$ . These facts together with the constraint qualification lead to

[TABLE]

Let $\varepsilon\in(0,\hat{\rho}-\bar{\rho}-\psi^{-1}((\bar{\rho}-\mathop{\rm inf}\nolimits_{\mathbb{B}_{X}(\bar{\rho})}f_{0})\lambda^{-1})]$ . There exists $\bar{x}\in\mathop{\mathop{\rm lev}}\nolimits_{0}\{\mathop{\rm max}\nolimits_{i=1,\dots,m}f_{i}\}$ such that $d_{X}(x,\bar{x})\leq\eta+\varepsilon$ and

[TABLE]

Alternatively, if $\max_{i=1,\dots,m}f_{i}(x)\leq 0$ , then $\mathop{\rm inf}\nolimits_{\mathbb{B}_{X}(x,0)}f\leq f_{0}(x)\leq f^{\lambda}(x)$ . We have therefore established the second Kenmochi condition for $f$ and $f^{\lambda}$ with error $\max\{1,\kappa(\hat{\rho})\}(\eta+\varepsilon)$ . Since $f\geq f^{\lambda}$ , the first Kenmochi condition holds with an error of zero. Since $\varepsilon>0$ is arbitrary, we have established via Prop. 4.1 that

[TABLE]

Trivially, $|f^{\lambda}(x)-g^{\lambda}(x)|\leq(1+m\lambda)\mathop{\rm max}\nolimits_{i=0,1,\dots,m}\mathop{\rm sup}\nolimits_{\mathbb{B}_{X}(\bar{\rho})}|f_{i}-g_{i}|$ for $x\in\mathbb{B}_{X}(\bar{\rho})$ so that $d\hat{\kern-1.49994ptl}_{\bar{\rho}}(\mathop{\rm epi}f^{\lambda},\mathop{\rm epi}g^{\lambda})$ is also bounded by the same quantity; cf. Prop. 4.2. The conclusion then follows by Prop. 2.1.

We again find a tradeoff between two error terms that are nearly identical to those in Thm. 4.5. From this perspective, the penalty formulation has the same rate of convergence as that in Case II and is therefore stable even when the actual feasible set in (1) has an empty interior.
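A small one-dimensional sketch may help to illustrate this stability. The functions below are hypothetical and chosen only for illustration: the actual constraint $f_{1}(x)=|x|\leq 0$ has the feasible set $\{0\}$ (empty interior), and the approximation $g_{1}=f_{1}+\delta$ has pointwise error $\delta$. The helper names and the grid search are assumptions of this sketch and not part of the paper.

```python
def naive_feasible(grid, g1):
    """Grid points satisfying the constraint g1(x) <= 0."""
    return [x for x in grid if g1(x) <= 0.0]


def penalty_minimizer(grid, g0, g1, lam):
    """Grid minimizer of the penalty function g0(x) + lam * max(0, g1(x))."""
    return min(grid, key=lambda x: g0(x) + lam * max(0.0, g1(x)))


grid = [i / 100.0 for i in range(-200, 201)]  # the interval [-2, 2] with step 0.01
delta = 0.1                                   # pointwise error in the constraint function


def f0(x):
    return -x                 # objective (left unperturbed in this sketch)


def f1(x):
    return abs(x)             # actual constraint function; feasible set is {0}


def g1(x):
    return abs(x) + delta     # approximating constraint function


print(naive_feasible(grid, f1))                  # actual feasible points on the grid
print(naive_feasible(grid, g1))                  # feasible points of the naive approximation
print(penalty_minimizer(grid, f0, g1, lam=5.0))  # minimizer of the penalized approximation
```

In this instance the naive approximating problem is infeasible, whereas the penalized approximation still recovers the actual minimizer $x=0$.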

4.2 Calculus Rules for Compositions

The truncated Hausdorff distance between epigraphs of functions that are certain compositions can be bounded as we see next. The results of this subsection extend in some sense Prop. 4.3, which deals with sums. Composition rules for epi-sum and epi-multiplication can be found in [4]; see also [9] for a systematic treatment of the convex case including sums of convex functions.

4.7 Proposition

(compositions; Lipschitz inner mapping). For metric spaces $(X,d_{X})$ and $(Y,d_{Y})$ , with centroids $x^{\rm ctr}$ and $y^{\rm ctr}$ , respectively, $f,g:Y\to\overline{\mathbb{R}}$ , and $F,G:X\to Y$ , suppose that $F^{-1},G^{-1}:Y\;{\lower 1.0pt\hbox{$ \rightarrow $}}\kern-12.0pt\hbox{\raise 2.5pt\hbox{$ \rightarrow $}}\;X$ are nonempty-valued and Lipschitz continuous with common modulus $\kappa:\mathbb{R}_{+}\to\mathbb{R}_{+}$ relative to $\rho^{*}\in[0,\infty]$ . Then, for $\rho\in\mathbb{R}_{+}$ ,

[TABLE]

provided that $\rho^{*}>2\rho+\max\{|\alpha|,|\bar{\alpha}|,\mathop{\rm dist}(x^{\rm ctr},F^{-1}(y)),\mathop{\rm dist}(x^{\rm ctr},F^{-1}(\bar{y})),\mathop{\rm dist}(x^{\rm ctr},G^{-1}(\bar{y}))\}$ for some $(y,\alpha)\in\mathop{\rm epi}f$ and $(\bar{y},\bar{\alpha})\in\mathop{\rm epi}g$ ,

[TABLE]

and $\hat{\rho}>\bar{\rho}+d\hat{\kern-1.49994ptl}_{\bar{\rho}}(\mathop{\rm epi}f,\mathop{\rm epi}g)$ .

**Proof.** Let $\hat{F},\hat{G}:X\times\mathbb{R}\to Y\times\mathbb{R}$ have $\hat{F}(x,\alpha)=(F(x),\alpha)$ and $\hat{G}(x,\alpha)=(G(x),\alpha)$ for $(x,\alpha)\in X\times\mathbb{R}$ . Then, it follows directly that

[TABLE]

and we can bring in Thm. 3.5 with $S=\hat{F}^{-1}$ and $T=\hat{G}^{-1}$ . Let $\varepsilon>0$ . There exists $x\in F^{-1}(y)$ such that $d_{X}(x^{\rm ctr},x)\leq\mathop{\rm dist}(x^{\rm ctr},F^{-1}(y))+\varepsilon$ . Then, $f(F(x))=f(y)\leq\alpha$ and $(x,\alpha)\in\mathop{\rm epi}(f\circ F)$ . Consequently,

[TABLE]

Similar arguments establish that

[TABLE]

This ensures that $\rho^{*}$ is selected sufficiently large for the application of Thm. 3.5. Next, we consider the size of $\bar{\rho}$ and find that

[TABLE]

Since similar statements hold with $F$ replaced by $G$ and $\mathop{\rm epi}f$ replaced by $\mathop{\rm epi}g$ , the condition on $\bar{\rho}$ suffices and Thm. 3.5 yields the conclusion.

4.8 Corollary

(compositions; linear inner mapping). For $f,g:\mathbb{R}^{n}\to\overline{\mathbb{R}}$ and nonsingular $n\times n$ matrices $A$ and $B$ , suppose that $\varphi,\psi:\mathbb{R}^{n}\to\overline{\mathbb{R}}$ are defined by $\varphi(x)=f(Ax)$ and $\psi(x)=g(Bx)$ , $x\in\mathbb{R}^{n}$ . Then (using the operator norm for matrices), for $\rho\in\mathbb{R}_{+}$ ,

[TABLE]

as long as $\bar{\rho}>\max\{1,\|A\|,\|B\|\}(2\rho+\max\{|\alpha|,|\bar{\alpha}|,\mathop{\rm dist}(0,A^{-1}y),\mathop{\rm dist}(0,A^{-1}\bar{y}),\mathop{\rm dist}(0,B^{-1}\bar{y})\})$ for some $(y,\alpha)\in\mathop{\rm epi}f$ and $(\bar{y},\bar{\alpha})\in\mathop{\rm epi}g$ .

**Proof.** The result follows directly from Prop. 4.7.

The corollary extends in some sense [9, Cor. 2.6] by allowing for nonconvex $f,g$ and different linear mappings, but at the expense of requiring invertible mappings.

4.9 Proposition

(compositions; Lipschitz outer function). For metric spaces $(X,d_{X})$ and $(Y,d_{Y})$ , with $y^{\rm ctr}$ being the centroid of $Y$ , suppose that $f:Y\to\mathbb{R}$ is Lipschitz continuous with modulus $\kappa:\mathbb{R}_{+}\to\mathbb{R}_{+}$ , and $F,G:X\to Y$ . Then, for $\rho\in\mathbb{R}_{+}$ ,

[TABLE]

provided that $\hat{\rho}>\bar{\rho}+d\hat{\kern-1.49994ptl}_{\bar{\rho}}(\mathop{\rm gph}\nolimits F,\mathop{\rm gph}\nolimits G)$ and

[TABLE]

**Proof.** Let $\eta=d\hat{\kern-1.49994ptl}_{\bar{\rho}}(\mathop{\rm gph}\nolimits F,\mathop{\rm gph}\nolimits G)$ , $x\in\mathop{\mathop{\rm lev}}\nolimits_{\rho}(f\circ F)\cap\mathbb{B}_{X}(\rho)$ , and $\varepsilon\in(0,\hat{\rho}-\bar{\rho}-\eta]$ . Then, $(x,F(x))\in\mathbb{B}_{X\times Y}(\bar{\rho})$ and there exists $\bar{x}\in X$ with $d_{X}(\bar{x},x)\leq\eta+\varepsilon$ and $d_{Y}(F(x),G(\bar{x}))\leq\eta+\varepsilon$ . Since both $F(x),G(\bar{x})\in\mathbb{B}_{Y}(\hat{\rho})$ ,

[TABLE]

We repeat the argument with the roles of $F$ and $G$ reversed and obtain via Prop. 4.1 that $d\hat{\kern-1.49994ptl}_{\rho}(\mathop{\rm epi}(f\circ F),\mathop{\rm epi}(f\circ G))\leq\max\{1,\kappa(\hat{\rho})\}(\eta+\varepsilon)$ . Since $\varepsilon$ is arbitrary, the conclusion follows.

The previous two propositions largely summarize the line of reasoning in the proofs of Thm. 4.4, 4.5, and 4.6 and thereby facilitate various extensions of Cases I, II, and III.

4.10 Proposition

(inf-projections). For a metric space $X$ and $\{f_{\alpha},g_{\alpha}:X\to\overline{\mathbb{R}},\alpha\in A\}$ , with $A$ an arbitrary set, define $f,g:X\to\overline{\mathbb{R}}$ as $f(x)=\mathop{\rm inf}\nolimits_{\alpha\in A}f_{\alpha}(x)$ and $g(x)=\mathop{\rm inf}\nolimits_{\alpha\in A}g_{\alpha}(x)$ . Then, for $\rho\in\mathbb{R}_{+}$ ,

[TABLE]

**Proof.** In view of the fact that $\mathop{\rm epi}f=\cup_{\alpha\in A}\mathop{\rm epi}f_{\alpha}$ and similarly for $\mathop{\rm epi}g$ , the conclusion follows immediately from Prop. 3.3.

Since a function $f=\sup_{\alpha\in A}f_{\alpha}$ has as epigraph the intersection of $\mathop{\rm epi}f_{\alpha},\alpha\in A$ , it is clear from the discussion in Section 3 that no comparable result is possible for sup-projections. We refer to [9, Cor. 2.5] for a result in the convex case and [36, Thm. 5.6] for one under Lipschitz continuity assumptions.

Given metric spaces $X$ and $Y$ as well as $f:X\to\overline{\mathbb{R}}$ and $F:X\to Y$ , the epi-composition $Ff:Y\to\overline{\mathbb{R}}$ has

[TABLE]

Epi-compositions arise, for example, in parametric studies of equality constrained problems.
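As a minimal illustration, the following sketch evaluates an epi-composition over a finite sample of $X$, using the expression $(Ff)(y)=\inf\{f(x)~|~F(x)=y\}$ that appears in the proof of Prop. 4.11. The function name `epi_composition` and the example data are hypothetical. Values of $y$ that are not attained by $F$ on the sample are simply absent from the result, which corresponds to the value $\infty$.

```python
import math


def epi_composition(points, f, F):
    """Return a dict mapping y to inf{ f(x) : F(x) = y } over a finite sample of X."""
    values = {}
    for x in points:
        y = F(x)
        # Keep the smallest objective value seen so far among points mapped to y.
        values[y] = min(values.get(y, math.inf), f(x))
    return values


# Parametric equality-constrained problem: minimize x subject to x**2 = y.
points = range(-3, 4)
Ff = epi_composition(points, f=lambda x: x, F=lambda x: x * x)
print(Ff)
```

Here `Ff[y]` is the optimal value of the problem with right-hand side `y`, so the epi-composition plays the role of a value function in the parameter $y$.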

4.11 Proposition

(epi-compositions). For metric spaces $(X,d_{X})$ and $(Y,d_{Y})$ , with $x^{\rm ctr}$ being the centroid of $X$ , $f,g:X\to\mathbb{R}$ , and Lipschitz continuous $F,G:X\to Y$ with common modulus $\kappa:\mathbb{R}_{+}\to\mathbb{R}_{+}$ relative to $\infty$ , suppose that

[TABLE]

Then, for $\rho\in\mathbb{R}_{+}$ ,

[TABLE]

provided that $\rho^{*}>2\rho+\max\{d_{Y}(F(x),y^{\rm ctr}),d_{Y}(F(\bar{x}),y^{\rm ctr}),d_{Y}(G(\bar{x}),y^{\rm ctr}),|\alpha|,|\bar{\alpha}|\}$ for some $(x,\alpha)\in\mathop{\rm epi}f$ and $(\bar{x},\bar{\alpha})\in\mathop{\rm epi}g$ , $\bar{\rho}>\rho^{*}$ and also exceeds

[TABLE]

and $\hat{\rho}>\bar{\rho}+d\hat{\kern-1.49994ptl}_{\bar{\rho}}(\mathop{\rm epi}f,\mathop{\rm epi}g)$ .

**Proof.** We start by confirming that $\mathop{\rm epi}Ff=\{(F(x),\alpha)~{}|~{}(x,\alpha)\in\mathop{\rm epi}f\}$ ; a finite-dimensional version of this fact is asserted as Exercise 1.31 in [32]. For $(\bar{x},\bar{\alpha})\in\mathop{\rm epi}f$ , we have that $\inf\{f(x)~{}|~{}F(x)=F(\bar{x})\}\leq f(\bar{x})\leq\bar{\alpha}$ . Thus, $\mathop{\rm epi}Ff\supset\{(F(x),\alpha)~{}|~{}(x,\alpha)\in\mathop{\rm epi}f\}$ . Suppose that $(y,\alpha)\in\mathop{\rm epi}Ff$ . Then, $(Ff)(y)<\infty$ . If $(Ff)(y)=-\infty$ , then there exists $\bar{x}\in X$ such that $f(\bar{x})\leq\alpha$ and $F(\bar{x})=y$ . Consequently, $(y,\alpha)\in\{(F(x),\alpha)~{}|~{}(x,\alpha)\in\mathop{\rm epi}f\}$ . If $(Ff)(y)\in\mathbb{R}$ , then there exists by assumption $\bar{x}\in X$ such that $f(\bar{x})=\inf\{f(x)~{}|~{}F(x)=y\}$ and $F(\bar{x})=y$ . Thus, $f(\bar{x})=(Ff)(y)\leq\alpha$ , $(\bar{x},\alpha)\in\mathop{\rm epi}f$ , and $\mathop{\rm epi}Ff\subset\{(F(x),\alpha)~{}|~{}(x,\alpha)\in\mathop{\rm epi}f\}$ . We have confirmed the assertion, which also holds for $Gf$ .

The conclusion follows by Thm. 3.5 applied to the mappings $\hat{F},\hat{G}:X\times\mathbb{R}\to Y\times\mathbb{R}$ defined by $\hat{F}(x,\alpha)=(F(x),\alpha)$ and $\hat{G}(x,\alpha)=(G(x),\alpha)$ . Since $F$ and $G$ are Lipschitz continuous with common modulus $\kappa:\mathbb{R}_{+}\to\mathbb{R}_{+}$ relative to $\infty$ , $\hat{F}$ and $\hat{G}$ are Lipschitz continuous with modulus $\rho\mapsto\max\{1,\kappa(\rho)\}$ relative to any real number. The requirement on $\rho^{*}$ in Thm. 3.5 is satisfied because $\mathop{\rm dist}((y^{\rm ctr},0),\hat{F}(\mathop{\rm epi}f))\leq\max\{d_{Y}(F(x),y^{\rm ctr}),|\alpha|\}$ for $(x,\alpha)\in\mathop{\rm epi}f$ , with similar inequalities holding for $\hat{G}$ and $\mathop{\rm epi}g$ . The requirement on $\bar{\rho}$ in Thm. 3.5 also is satisfied because

[TABLE]

with similar expressions for $\hat{G}$ and $\mathop{\rm epi}g$ .

5 Distances between Graphs of Set-Valued Mappings

We next turn to the solution of generalized equations. For metric spaces $X$ and $Y$ , a set-valued mapping $S:X\;{\lower 1.0pt\hbox{$ \rightarrow $}}\kern-12.0pt\hbox{\raise 2.5pt\hbox{$ \rightarrow $}}\;Y$ and a point $y^{\star}\in Y$ define the generalized equation $y^{\star}\in S(x)$ . Its solution set is $S^{-1}(y^{\star})$ . In this section, we focus on the set of near-solutions that consists of those $x\in X$ with $S(x)$ “nearly reaching” $y^{\star}$ . Specifically, for $\varepsilon\geq 0$ , the set of $\varepsilon$ -solutions is defined as

[TABLE]

For example, suppose that $f:\mathbb{R}^{n}\to\mathbb{R}$ is locally Lipschitz continuous and $C\subset\mathbb{R}^{n}$ is nonempty and closed. Then, an optimality condition for the problem of minimizing $f+\iota_{C}$ would be

[TABLE]

see [32, Exercise 10.10]. With $S=\partial f+N_{C}$ and $y^{\star}=0$ , the set of $\varepsilon$ -solutions becomes

[TABLE]
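To make the notion of $\varepsilon$-solutions concrete, the following sketch treats a hypothetical one-dimensional instance that is not taken from the paper: $f(x)=x^{2}/2$ and $C=[1,3]$, so that $S(x)=x+N_{C}(x)$. Since $S(x)$ is closed here, $x$ is an $\varepsilon$-solution for $y^{\star}=0$ exactly when $\mathop{\rm dist}(0,S(x))\leq\varepsilon$. The function names and the grid are assumptions of the sketch.

```python
import math


def dist_zero_to_S(x, lo=1.0, hi=3.0):
    """dist(0, f'(x) + N_C(x)) for f(x) = x**2 / 2 and C = [lo, hi] with lo < hi."""
    if x < lo or x > hi:
        return math.inf           # N_C(x) is empty outside C, so S(x) is empty
    grad = x                      # derivative of x**2 / 2
    if x == lo:
        return max(0.0, -grad)    # S(x) = (-inf, grad] at the left endpoint
    if x == hi:
        return max(0.0, grad)     # S(x) = [grad, inf) at the right endpoint
    return abs(grad)              # S(x) = {grad} in the interior of C


def eps_solutions(grid, eps):
    """Grid points x with dist(0, S(x)) <= eps."""
    return [x for x in grid if dist_zero_to_S(x) <= eps]


grid = [i / 100 for i in range(0, 401)]   # the interval [0, 4] with step 0.01
exact = eps_solutions(grid, 0.0)          # stationary points on the grid
near = eps_solutions(grid, 1.5)           # near-stationary points on the grid
print(exact, near[0], near[-1])
```

In this instance the only stationary point is $x=1$, while the set of $\varepsilon$-solutions for $\varepsilon=1.5$ is the interval $[1,1.5]$.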

The next theorem bounds the discrepancy between near-solutions of generalized equations in terms of the truncated Hausdorff distance without making assumptions about local regularity properties of the underlying set-valued mappings.

5.1 Theorem

(approximation of near-solutions of generalized equations). For metric spaces $X$ and $Y$ , suppose that $S,T:X\;{\lower 1.0pt\hbox{$ \rightarrow $}}\kern-12.0pt\hbox{\raise 2.5pt\hbox{$ \rightarrow $}}\;Y$ have nonempty graphs, $0\leq\varepsilon\leq\rho<\infty$ , and $y^{\star}\in\mathbb{B}_{Y}(\rho-\varepsilon)$ . Then,

[TABLE]

provided that $\delta>\varepsilon+d\hat{\kern-1.49994ptl}_{\rho}(\mathop{\rm gph}\nolimits S,\mathop{\rm gph}\nolimits T)$ . If $X$ and $Y$ are finitely compact and $\mathop{\rm gph}\nolimits T$ is closed, then the result also holds for $\delta=\varepsilon+d\hat{\kern-1.49994ptl}_{\rho}(\mathop{\rm gph}\nolimits S,\mathop{\rm gph}\nolimits T)$ .

**Proof.** Let $\gamma\in(0,\delta-\varepsilon-d\hat{\kern-1.49994ptl}_{\rho}(\mathop{\rm gph}\nolimits S,\mathop{\rm gph}\nolimits T)]$ . Suppose that $x\in S^{-1}\big{(}\mathbb{B}_{Y}(y^{\star},\varepsilon)\big{)}\cap\mathbb{B}_{X}(\rho)$ . Then, there is $y\in S(x)$ with $d_{Y}(y,y^{\star})\leq\varepsilon$ so that $(x,y)\in\mathbb{B}_{X\times Y}(\rho)$ . Consequently, for some $(\bar{x},\bar{y})\in\mathop{\rm gph}\nolimits T$ ,

[TABLE]

Moreover, $d_{Y}(\bar{y},y^{\star})\leq d_{Y}(\bar{y},y)+d_{Y}(y,y^{\star})\leq d\hat{\kern-1.49994ptl}_{\rho}(\mathop{\rm gph}\nolimits S,\mathop{\rm gph}\nolimits T)+\gamma+\varepsilon\leq\delta$ , which implies that $\bar{x}\in T^{-1}(\mathbb{B}_{Y}(y^{\star},\delta))$ . We have established that

[TABLE]

Since $\gamma$ is arbitrary, the first conclusion follows. The minimum distance to a nonempty closed subset of a finitely compact space is attained [33, Lemma 2.2], which allows us to use $\gamma=0$ in the above arguments. This establishes the second conclusion.

The result of the theorem is sharp. For example, consider $S,T:\mathbb{R}\;{\lower 1.0pt\hbox{$ \rightarrow $}}\kern-12.0pt\hbox{\raise 2.5pt\hbox{$ \rightarrow $}}\;\mathbb{R}$ with $S(x)=[x,\infty)$ when $x\in[0,1]$ and $S(x)=\emptyset$ otherwise; and $T(x)=(1,\infty)$ when $x\in[1,2]$ and $T(x)=\emptyset$ otherwise. Then for $\rho\geq 0$ , $d\hat{\kern-1.49994ptl}_{\rho}(\mathop{\rm gph}\nolimits S,\mathop{\rm gph}\nolimits T)=1$ , $S^{-1}(0)=\{0\}$ , $T^{-1}(\delta)=[1,2]$ , and $\mathop{\rm exs}(S^{-1}(0)\cap\mathbb{B}_{\mathbb{R}}(\rho);T^{-1}(\mathbb{B}_{\mathbb{R}}(\delta)))=1$ when $\delta>1$ . When $\delta\leq 1$ , the excess becomes infinity because $T^{-1}(\delta)=\emptyset$ . If $T$ is modified to having $T(x)=[1,\infty)$ for $x\in[1,2]$ , then $\delta=1$ gives an excess of one.
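The quantities in this example can be reproduced on a grid. The sketch below uses the modified mapping with $T(x)=[1,\infty)$ for $x\in[1,2]$, the sup-norm on $\mathbb{R}\times\mathbb{R}$, and $\rho=2$. The grid step, the cutoff of the graphs at $y=4$, and all function names are assumptions made for the computation and are not part of the paper.

```python
def excess(C, D, dist):
    """exs(C; D): sup over c in C of the distance from c to D (0 if C is empty)."""
    if not C:
        return 0.0
    if not D:
        return float("inf")
    return max(min(dist(c, d) for d in D) for c in C)


def sup_dist(p, q):
    """Sup-norm distance between two points of R x R."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))


def truncated_hausdorff(C, D, rho):
    """Larger of the two excesses after truncating each set to the ball of radius rho."""
    ball = lambda P: [p for p in P if max(abs(p[0]), abs(p[1])) <= rho]
    return max(excess(ball(C), D, sup_dist), excess(ball(D), C, sup_dist))


def near_solutions(graph, y_star, eps):
    """x-values for which some image point lies within eps of y_star."""
    return sorted({x for x, y in graph if abs(y - y_star) <= eps})


J = 40  # graphs are cut off at y = J / 10 = 4 for the computation
gph_S = [(i / 10, j / 10) for i in range(0, 11) for j in range(i, J + 1)]    # S(x) = [x, inf)
gph_T = [(i / 10, j / 10) for i in range(10, 21) for j in range(10, J + 1)]  # T(x) = [1, inf)

rho = 2.0
abs_dist = lambda a, b: abs(a - b)
d_graphs = truncated_hausdorff(gph_S, gph_T, rho)
sol_S = [x for x in near_solutions(gph_S, 0.0, 0.0) if abs(x) <= rho]
exs_delta_1 = excess(sol_S, near_solutions(gph_T, 0.0, 1.0), abs_dist)
exs_delta_small = excess(sol_S, near_solutions(gph_T, 0.0, 0.9), abs_dist)
print(d_graphs, sol_S, exs_delta_1, exs_delta_small)
```

On this grid the distance between the graphs and the excess for $\delta=1$ both equal one, while the excess is infinite for $\delta=0.9$ because no point of the graph of $T$ comes within $0.9$ of $y^{\star}=0$.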

5.2 Theorem

(sum of mappings under Lipschitz property). For normed linear spaces $X$ and $Y$ , suppose that $S_{1},T_{1}:X\;{\lower 1.0pt\hbox{$ \rightarrow $}}\kern-12.0pt\hbox{\raise 2.5pt\hbox{$ \rightarrow $}}\;Y$ are nonempty-valued and Lipschitz continuous with common modulus $\kappa:\mathbb{R}_{+}\to\mathbb{R}_{+}$ relative to $\rho^{*}\in[0,\infty]$ and $S_{2},T_{2}:X\;{\lower 1.0pt\hbox{$ \rightarrow $}}\kern-12.0pt\hbox{\raise 2.5pt\hbox{$ \rightarrow $}}\;Y$ have nonempty graphs. Then, for $\rho\in\mathbb{R}_{+}$ ,

[TABLE]

provided that $\bar{\rho}\geq\rho+\rho^{\prime}$ , with $\rho^{\prime}$ such that $\mathbb{B}_{Y}(\rho^{\prime})$ contains both $S_{1}(x)$ and $T_{1}(x)$ for all $x\in\mathbb{B}_{X}(\rho)$ , $\hat{\rho}>\rho+d\hat{\kern-1.49994ptl}_{\bar{\rho}}(\mathop{\rm gph}\nolimits S_{2},\mathop{\rm gph}\nolimits T_{2})$ , and $\rho^{*}>3\rho^{\prime}+\kappa(\hat{\rho})(\hat{\rho}-\rho)$ .

**Proof.** Let $(x,y)\in\mathop{\rm gph}\nolimits(T_{1}+T_{2})\cap\mathbb{B}_{X\times Y}(\rho)$ . Thus, for some $y_{1}\in T_{1}(x)$ and $y_{2}\in T_{2}(x)$ we have $y=y_{1}+y_{2}$ and $\|y_{2}\|\leq\|y\|+\|y_{1}\|\leq\rho+\rho^{\prime}\leq\bar{\rho}$ . Let $\varepsilon\in(0,\hat{\rho}-\rho-d\hat{\kern-1.49994ptl}_{\bar{\rho}}(\mathop{\rm gph}\nolimits S_{2},\mathop{\rm gph}\nolimits T_{2})]$ . Consequently, $(x,y_{2})\in\mathop{\rm gph}\nolimits T_{2}\cap\mathbb{B}_{X\times Y}(\bar{\rho})$ so there exists $(\bar{x},\bar{y}_{2})\in\mathop{\rm gph}\nolimits S_{2}$ with $\max\{\|x-\bar{x}\|,\|y_{2}-\bar{y}_{2}\|\}\leq d\hat{\kern-1.49994ptl}_{\bar{\rho}}(\mathop{\rm gph}\nolimits S_{2},\mathop{\rm gph}\nolimits T_{2})+\varepsilon\leq\hat{\rho}-\rho$ , which ensures that $\|\bar{x}\|\leq\|x-\bar{x}\|+\|x\|\leq\hat{\rho}-\rho+\rho\leq\hat{\rho}$ . Since $S_{1}$ is nonempty-valued, there is $\bar{y}_{1}\in S_{1}(\bar{x})$ such that $\mathop{\rm dist}(y_{1},S_{1}(\bar{x}))\geq\|y_{1}-\bar{y}_{1}\|-\varepsilon$ . Therefore, $(\bar{x},\bar{y}_{1}+\bar{y}_{2})\in\mathop{\rm gph}\nolimits(S_{1}+S_{2})$ . Since $y_{1}\in\mathbb{B}_{Y}(\rho^{\prime})$ , it follows that

[TABLE]

where the last inequality is a consequence of Prop. 2.1; $\rho^{*}$ is indeed sufficiently large because $\mathop{\rm dist}(y^{\rm ctr},T_{1}(x))\leq\rho^{\prime}$ , $\mathop{\rm dist}(y^{\rm ctr},S_{1}(x))\leq\rho^{\prime}$ , and

[TABLE]

Moreover, with $\bar{y}=\bar{y}_{1}+\bar{y}_{2}$ , $\|y-\bar{y}\|$ is not greater than

[TABLE]

This establishes that $(\bar{x},\bar{y})\in\mathop{\rm gph}\nolimits(S_{1}+S_{2})$ satisfies

[TABLE]

Since $(x,y)$ and $\varepsilon$ are arbitrary, we obtain that

[TABLE]

The roles of $(S_{1},S_{2})$ and $(T_{1},T_{2})$ can be reversed, which leads to the conclusion.

A series of results is now possible with applications to games as well as equilibrium and generalized fixed-point problems. We limit the discussion to optimality conditions. As a preliminary example, let $C,D\subset\mathbb{R}^{n}$ be nonempty, possibly nonconvex sets and $f,g:\mathbb{R}^{n}\to\mathbb{R}$ be smooth and their gradients be Lipschitz continuous with modulus $\kappa:\mathbb{R}_{+}\to\mathbb{R}_{+}$ relative to $\rho^{*}=\infty$ , i.e., $\|\nabla f(x)-\nabla f(\bar{x})\|\leq\kappa(\rho)\|x-\bar{x}\|$ for $\|x\|\leq\rho$ , $\|\bar{x}\|\leq\rho$ , and $\rho\in\mathbb{R}_{+}$ , with the same condition holding for $\nabla g$ . Thm. 5.2 enables a study of the optimality conditions $0\in\nabla f(x)+N_{C}(x)$ and $0\in\nabla g(x)+N_{D}(x)$ . The discrepancy between the corresponding near-stationary points is bounded via Thm. 5.1 by

[TABLE]

for sufficiently large $\hat{\rho}$ and $\bar{\rho}$ with further simplifications possible if $C$ and $D$ are convex, cf. Prop. 2.5.

Example 3: difference-of-convex functions. For convex functions $f_{1}:\mathbb{R}^{n}\to\mathbb{R}$ and $f_{2}:\mathbb{R}^{n}\to\overline{\mathbb{R}}$ , the latter also lsc and proper, as well as a point $\bar{x}$ with $f_{2}(\bar{x})$ finite, the following optimality condition holds [19], where for subsets $A$ and $B$ of a linear space, $A-B:=\{a-b~{}|~{}a\in A,b\in B\}$ :

[TABLE]

The minimization of such difference-of-convex functions arises in numerous applications, including some in modern statistics [16, 34]. Error analysis of near-stationarity in this case can be carried out as follows.

Suppose initially that $f_{1},g_{1}$ are also smooth and $\rho\in\mathbb{R}_{+}$ . Then, using the Euclidean distance on $\mathbb{R}^{n}$ , there are $\alpha,\bar{\rho}\in\mathbb{R}_{+}$ such that

[TABLE]

which via Thm. 5.1 gives error estimates of near-stationary points. We can establish this fact by setting $S_{1}=-\nabla f_{1}$ , $T_{1}=-\nabla g_{1}$ , $S_{2}=\partial f_{2}$ , and $T_{2}=\partial g_{2}$ so that $S_{1}$ and $T_{1}$ are nonempty-valued and Lipschitz continuous with some common modulus $\kappa:\mathbb{R}_{+}\to\mathbb{R}_{+}$ relative to $\rho^{*}=\infty$ . An application of Thm. 5.2 with these set-valued mappings and $\rho^{\prime}=\sup_{\|x\|_{2}\leq\rho}\max\{\|\nabla f_{1}(x)\|_{2},\|\nabla g_{1}(x)\|_{2}\}$ , $\bar{\rho}=\rho+\rho^{\prime}$ , and $\hat{\rho}>\rho+d\hat{\kern-1.49994ptl}_{\bar{\rho}}(\mathop{\rm gph}\nolimits\partial f_{2},\mathop{\rm gph}\nolimits\partial g_{2})$ yields

[TABLE]

An application of Prop. 2.4 gives the result after an appropriate enlargement of $\bar{\rho}$ .

We can relax the assumption about $f_{1}$ and $g_{1}$ being smooth by stating the optimality condition in terms of the set-valued mappings $S,T:\mathbb{R}^{n}\times\mathbb{R}^{n}\;{\lower 1.0pt\hbox{$ \rightarrow $}}\kern-12.0pt\hbox{\raise 2.5pt\hbox{$ \rightarrow $}}\;\mathbb{R}^{n}\times\mathbb{R}^{n}$ with expressions

[TABLE]

Clearly, $0\in S(x,v)$ implies that $0\in\partial f_{2}(x)-\partial f_{1}(x)$ ; and $0\in\partial f_{2}(x)-\partial f_{1}(x)$ implies that there exists a “multiplier vector” $v\in\mathbb{R}^{n}$ such that $0\in S(x,v)$ . A bound on $d\hat{\kern-1.49994ptl}_{\rho}(\mathop{\rm gph}\nolimits S,\mathop{\rm gph}\nolimits T)$ will then via Thm. 5.1 furnish a bound on the difference between near-stationary points in the “primal-dual” space $\mathbb{R}^{n}\times\mathbb{R}^{n}$ as one passes from minimizing $f_{2}-f_{1}$ to minimizing $g_{2}-g_{1}$ . For simplicity, we adopt the sup-norm for the remainder of this example. Specifically, we find that for $\rho\in\mathbb{R}_{+}$

[TABLE]

To see this let $((\bar{x},\bar{v}),(\bar{y}_{1},\bar{y}_{2}))\in\mathop{\rm gph}\nolimits S\cap\mathbb{B}_{\mathbb{R}^{4n}}(\rho)$ , i.e., $\bar{y}_{1}+\bar{v}\in\partial f_{1}(\bar{x})$ and $\bar{y}_{2}+\bar{v}\in\partial f_{2}(\bar{x})$ . For $i=1,2$ , since $\|\bar{x}\|_{\infty}\leq\rho$ and $\|\bar{y}_{i}+\bar{v}\|_{\infty}\leq 2\rho$ , there exists $y_{i}\in\mathbb{R}^{n}$ such that

[TABLE]

which implies $((\bar{x},\bar{v}),(y_{1},y_{2}))\in\mathop{\rm gph}\nolimits T$ . The distance between $((\bar{x},\bar{v}),(y_{1},y_{2}))$ and $((\bar{x},\bar{v}),(\bar{y}_{1},\bar{y}_{2}))$ then yields the stated upper bound on $d\hat{\kern-1.49994ptl}_{\rho}(\mathop{\rm gph}\nolimits S,\mathop{\rm gph}\nolimits T)$ .

Example 4: KKT conditions. Theorem 5.1 also applies to the KKT conditions for the problem

[TABLE]

when compared to those of an alternative, possibly approximating, problem obtained by replacing the functions by the smooth functions $g_{0},g_{1},\dots,g_{m}$ . Clearly, $(x,y)\in\mathbb{R}^{n+m}$ satisfies the KKT conditions for the actual problem if and only if $0\in S(x,y)$ and likewise those of the alternative problem if and only if $0\in T(x,y)$ , where the set-valued mappings $S,T:\mathbb{R}^{n+m}\;{\lower 1.0pt\hbox{$ \rightarrow $}}\kern-12.0pt\hbox{\raise 2.5pt\hbox{$ \rightarrow $}}\;\mathbb{R}^{3m+n}$ have values

[TABLE]

with $y=(y_{1},\dots,y_{m})$ . A bound on the truncated Hausdorff distance between the graphs of these two set-valued mappings furnishes the critical component in the application of Thm. 5.1. In this example, we equip $\mathbb{R}^{n+m}$ and $\mathbb{R}^{3m+n}$ with the sup-norm. Then, for $\rho\in\mathbb{R}_{+}$ ,

[TABLE]

where

[TABLE]

This assertion is realized as follows. Let $((x,y),(u,v,w,s))\in\mathop{\rm gph}\nolimits S\cap\mathbb{B}_{\mathbb{R}^{4m+2n}}(\rho)$ be arbitrary and construct $\bar{x}=x$ , $\bar{y}=y$ , $\bar{u}=(\bar{u}_{1},\dots,\bar{u}_{m})$ , with $\bar{u}_{i}=\max\{g_{i}(x),u_{i}\}$ for all $i$ , $\bar{v}=v$ , $\bar{w}=(\bar{w}_{1},\dots,\bar{w}_{m})$ , with $\bar{w}_{i}=y_{i}g_{i}(x)$ for all $i$ , and $\bar{s}=\nabla g_{0}(x)+\sum_{i=1}^{m}y_{i}\nabla g_{i}(x)$ . It is trivial to verify that $((\bar{x},\bar{y}),(\bar{u},\bar{v},\bar{w},\bar{s}))\in\mathop{\rm gph}\nolimits T$ . For all $i$ ,

[TABLE]

Consequently, the distance between $((x,y),(u,v,w,s))$ and $((\bar{x},\bar{y}),(\bar{u},\bar{v},\bar{w},\bar{s}))$ is at most $\max\{\delta,\rho\delta,(1+m\rho)\eta\}$ and we have that $\mathop{\rm exs}(\mathop{\rm gph}\nolimits S\cap\mathbb{B}_{\mathbb{R}^{4m+2n}}(\rho);\mathop{\rm gph}\nolimits T)$ is bounded by the same quantity. The assertion then follows by symmetry.

We see that despite the fact that minimizers of inequality-constrained problems are unstable under pointwise perturbations of the constraint functions (cf. Section 4), the KKT system has stable solutions in the sense that the excess of near-solutions of one KKT system over those of the other exhibits a Lipschitz property in those perturbations.

We end the paper with a result that generalizes the ideas of Examples 3 and 4. For a proper lsc function $\varphi:\mathbb{R}^{m}\to\overline{\mathbb{R}}$ and a smooth mapping $F:\mathbb{R}^{n}\to\mathbb{R}^{m}$ , we recall that under rather weak assumptions (for example, if $\varphi$ is convex, then it suffices that $\mathop{\rm dom}\varphi$ cannot be separated from the range of the linearized mapping $w\mapsto F(\bar{x})+\nabla F(\bar{x})w$ for a local minimizer $\bar{x}$ ) the composite function $\varphi\circ F$ has $0\in\nabla F(x)^{\top}\partial\varphi(F(x))$ as a necessary optimality condition [32, Thm. 10.6], where the $m\times n$ -matrix $\nabla F(x)$ is the Jacobian of $F$ at $x$ . By introducing auxiliary vectors $y,z\in\mathbb{R}^{m}$ , the optimality condition is equivalently stated in terms of the set-valued mapping $S:\mathbb{R}^{n}\times\mathbb{R}^{m}\times\mathbb{R}^{m}\;{\lower 1.0pt\hbox{$ \rightarrow $}}\kern-12.0pt\hbox{\raise 2.5pt\hbox{$ \rightarrow $}}\;\mathbb{R}^{m}\times\mathbb{R}^{m}\times\mathbb{R}^{n}$ as $0\in S(x,y,z)$ , with

[TABLE]

Since $0\in S(x,y,z)$ is also an optimality condition for the problem of minimizing $\varphi(z)$ subject to $F(x)=z$ , $y$ can be interpreted as a multiplier vector and $z$ as representing feasibility. Parallel conditions hold for a composite function $\psi\circ G$ expressed in terms of $\psi:\mathbb{R}^{m}\to\overline{\mathbb{R}}$ and $G:\mathbb{R}^{n}\to\mathbb{R}^{m}$ , which we may think of as approximations of $\varphi$ and $F$ . Specifically, under the appropriate assumptions, an optimality condition becomes $0\in T(x,y,z)$ , where the set-valued mapping $T:\mathbb{R}^{n}\times\mathbb{R}^{m}\times\mathbb{R}^{m}\;{\lower 1.0pt\hbox{$ \rightarrow $}}\kern-12.0pt\hbox{\raise 2.5pt\hbox{$ \rightarrow $}}\;\mathbb{R}^{m}\times\mathbb{R}^{m}\times\mathbb{R}^{n}$ has

[TABLE]

In view of Thm. 5.1, a bound on $d\hat{\kern-1.49994ptl}_{\rho}(\mathop{\rm gph}\nolimits S,\mathop{\rm gph}\nolimits T)$ leads to estimates of the change in near-stationary points as we pass from $\varphi\circ F$ to $\psi\circ G$ .

5.3 Theorem

(stationarity of composite functions). For proper lsc functions $\varphi,\psi:\mathbb{R}^{m}\to\overline{\mathbb{R}}$ , smooth mappings $F,G:\mathbb{R}^{n}\to\mathbb{R}^{m}$ , and the resulting set-valued mappings $S$ and $T$ expressed in (2) and (3), let $d\hat{\kern-1.49994ptl}_{\rho}$ be defined in terms of the product norm on $\mathbb{R}^{n}\times\mathbb{R}^{m}\times\mathbb{R}^{m}\times\mathbb{R}^{m}\times\mathbb{R}^{m}\times\mathbb{R}^{n}$ constructed by any norms on $\mathbb{R}^{n}$ and $\mathbb{R}^{m}$ , and let the matrix norm be any one compatible with the norm on $\mathbb{R}^{m}$ . Then, for $\rho\in\mathbb{R}_{+}$ ,

[TABLE]

**Proof.** Suppose that $((\bar{x},\bar{y},\bar{z}),(\bar{u},\bar{v},\bar{w}))\in\mathop{\rm gph}\nolimits S\cap\mathbb{B}_{X}(\rho)$ , where $X=\mathbb{R}^{n}\times\mathbb{R}^{m}\times\mathbb{R}^{m}\times\mathbb{R}^{m}\times\mathbb{R}^{m}\times\mathbb{R}^{n}$ is equipped with the product norm indicated in the theorem. Then,

[TABLE]

Since $(\bar{z},\bar{v}+\bar{y})\in\mathop{\rm gph}\nolimits\partial\varphi\cap\mathbb{B}_{\mathbb{R}^{m}\times\mathbb{R}^{m}}(2\rho)$ (using the product norm on $\mathbb{R}^{m}\times\mathbb{R}^{m}$ ) and the fact that $\mathop{\rm gph}\nolimits\partial\psi$ is nonempty [32, Cor. 8.10], there exist $z,v\in\mathbb{R}^{m}$ such that $(z,v+\bar{y})\in\mathop{\rm gph}\nolimits\partial\psi$ and neither $\|z-\bar{z}\|$ nor $\|(\bar{v}+\bar{y})-(v+\bar{y})\|$ exceeds $d\hat{\kern-1.49994ptl}_{2\rho}(\mathop{\rm gph}\nolimits\partial\varphi,\mathop{\rm gph}\nolimits\partial\psi)$ . Construct $u=G(\bar{x})-z$ and $w=\nabla G(\bar{x})^{\top}\bar{y}$ . Clearly, $((\bar{x},\bar{y},z),(u,v,w))\in\mathop{\rm gph}\nolimits T$ and

[TABLE]

Moreover, due to the assumed compatibility of the adopted matrix norm relative to the norm on $\mathbb{R}^{m}$ ,

[TABLE]

The point $((\bar{x},\bar{y},z),(u,v,w))$ is therefore within a distance of

[TABLE]

of $((\bar{x},\bar{y},\bar{z}),(\bar{u},\bar{v},\bar{w}))$ , which establishes the conclusion after we realize the obvious symmetry in the result.

Appendix A Proofs

Proof of Prop. 2.2. Denote by $d_{X}$ the metric on $X$ and $\eta=d\hat{\kern-1.49994ptl}_{\rho}(\mathop{\rm epi}f,\mathop{\rm epi}g)$ . Let $\gamma\in(0,\rho-\varepsilon-\inf f)$ . Since $\gamma\mbox{-}\mathop{\rm argmin}\nolimits f\cap\mathbb{B}_{X}(\rho)\neq\emptyset$ , there exists $\bar{x}\in\mathbb{B}_{X}(\rho)$ such that $f(\bar{x})\leq\inf f+\gamma<\rho-\varepsilon\leq\rho$ . Moreover, $f(\bar{x})\geq\inf f\geq-\rho$ . Thus, $(\bar{x},f(\bar{x}))\in\mathop{\rm epi}f\cap\mathbb{B}_{X\times\mathbb{R}}(\rho)$ and there exists $(x,\alpha)\in\mathop{\rm epi}g$ such that $\max\{d_{X}(x,\bar{x}),|\alpha-f(\bar{x})|\}\leq\mathop{\rm dist}((\bar{x},f(\bar{x})),\mathop{\rm epi}g)+\gamma$ . Then,

[TABLE]

and also $\eta\geq|\alpha-f(\bar{x})|-\gamma$ . Collecting the above results yields $\inf g\leq g(x)\leq\alpha\leq f(\bar{x})+\eta+\gamma\leq\inf f+\eta+2\gamma$ . Since $\gamma$ is arbitrary, we have established that $\inf g\leq\inf f+\eta$ . The same argument with the roles of $f$ and $g$ reversed leads to the first conclusion.

Let $\bar{x}\in\varepsilon\mbox{-}\mathop{\rm argmin}\nolimits g\cap\mathbb{B}_{X}(\rho)$ . Then, $g(\bar{x})\leq\inf g+\varepsilon<\rho$ , $g(\bar{x})\geq\inf g\geq-\rho$ , and $(\bar{x},g(\bar{x}))\in\mathop{\rm epi}g\cap\mathbb{B}_{X\times\mathbb{R}}(\rho)$ . Let $\gamma>0$ . There exists $(x,\alpha)\in\mathop{\rm epi}f$ such that $\max\{d_{X}(x,\bar{x}),|\alpha-g(\bar{x})|\}\leq\mathop{\rm dist}((\bar{x},g(\bar{x})),\mathop{\rm epi}f)+\gamma$ . Consequently, $\eta\geq d_{X}(x,\bar{x})-\gamma\mbox{ and }\eta\geq|\alpha-g(\bar{x})|-\gamma$ . These facts together with the first conclusion establish that $f(x)\leq\alpha\leq g(\bar{x})+\eta+\gamma\leq\inf g+\varepsilon+\eta+\gamma\leq\inf f+\varepsilon+2\eta+\gamma$ . Thus, $x\in(\varepsilon+2\eta+\gamma)\mbox{-}\mathop{\rm argmin}\nolimits f$ and $d_{X}(x,\bar{x})\leq\eta+\gamma$ , and then also $\mathop{\rm exs}(\varepsilon\mbox{-}\mathop{\rm argmin}\nolimits g\cap\mathbb{B}_{X}(\rho);~{}(\varepsilon+2\eta+\bar{\gamma})\mbox{-}\mathop{\rm argmin}\nolimits f\big{)}\leq\eta+\gamma$ when $\bar{\gamma}\geq\gamma$ . Since $\gamma$ is arbitrary, the second conclusion follows.

Proof of Prop. 2.3. Let $\bar{x}\in\mathop{\mathop{\rm lev}}\nolimits_{\delta}g\cap\mathbb{B}_{X}(\rho)$ and $B=\mathbb{B}_{X\times\mathbb{R}}(\rho)$ . Then, $g(\bar{x})\leq\delta\leq\rho$ . There are two cases. Suppose that $g(\bar{x})\geq-\rho$ . Then, $(\bar{x},g(\bar{x}))\in\mathop{\rm epi}g\cap B$ . Let $\gamma\in(0,\varepsilon-\delta-\mathop{\rm exs}(\mathop{\rm epi}g\cap B;\mathop{\rm epi}f))$ . There exists $(x,\alpha)\in\mathop{\rm epi}f$ such that $\max\{d_{X}(x,\bar{x}),|\alpha-g(\bar{x})|\}\leq\mathop{\rm dist}((\bar{x},g(\bar{x})),\mathop{\rm epi}f)+\gamma\leq\mathop{\rm exs}\big{(}\mathop{\rm epi}g\cap B;\mathop{\rm epi}f\big{)}+\gamma$ . Consequently,

[TABLE]

Thus, $x\in\mathop{\mathop{\rm lev}}\nolimits_{\varepsilon}f$ and $d_{X}(x,\bar{x})\leq\mathop{\rm exs}(\mathop{\rm epi}g\cap B;\mathop{\rm epi}f)+\gamma$. This implies that

[TABLE]

If $g(\bar{x})<-\rho$ , the same holds because the arguments in that case can be carried out with $g(\bar{x})$ replaced by $-\rho$ . Since $\gamma$ is arbitrary, the second conclusion follows.

Proof of Prop. 3.6. Let $C=\sum_{i=1}^{m}C_{i}$ , $D=\sum_{i=1}^{m}D_{i}$ , and $\varepsilon>0$ . Suppose without loss of generality that $d\hat{\kern-1.49994ptl}_{\rho}(C,D)=\mathop{\rm exs}(C\cap\mathbb{B}_{X}(\rho);D)$ . If $C\cap\mathbb{B}_{X}(\rho)=\emptyset$ , $d\hat{\kern-1.49994ptl}_{\rho}(C,D)=0$ and the result holds trivially. Thus, suppose that $C\cap\mathbb{B}_{X}(\rho)\neq\emptyset$ . Then, there are $x_{i}\in C_{i}$ and $y_{i}\in D_{i}$ , $i=1,\dots,m$ , such that $x=\sum_{i=1}^{m}x_{i}\in C\cap\mathbb{B}_{X}(\rho)$ , $\|x_{i}-y_{i}\|\leq\mathop{\rm dist}(x_{i},D_{i})+\varepsilon$ , and

[TABLE]

Since $x_{i}\in C_{i}$ implies $x_{i}\in\mathbb{B}_{X}(\rho)$ ,

[TABLE]

Hence, $d\hat{\kern-1.49994ptl}_{\rho}(C,D)\leq\sum_{i=1}^{m}d\hat{\kern-1.49994ptl}_{\rho}(C_{i},D_{i})+(m+1)\varepsilon$. Since $\varepsilon$ is arbitrary, the first conclusion follows. Under the relaxed assumption, $x_{1}\in\mathbb{B}_{X}(m\rho)$ because $x\in\mathbb{B}_{X}(\rho)$, $x_{i}\in\mathbb{B}_{X}(\rho)$ for $i=2,\dots,m$, and $x_{1}=x-\sum_{i=2}^{m}x_{i}$. Thus,

[TABLE]

Since the other arguments carry over, the second conclusion follows.
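The first conclusion of Prop. 3.6 can be illustrated with finite sets. The sketch below is not from the paper. It assumes $X=\mathbb{R}$ with the absolute value as norm, balls centered at the origin, all sets $C_{i},D_{i}$ contained in $\mathbb{B}_{X}(\rho)$, and the truncated Hausdorff distance taken as the larger of the two truncated excesses, as in the proof. The function names and data are hypothetical.

```python
def dist(x, D):
    """Distance from the point x to the finite set D (infinity if D is empty)."""
    return min(abs(x - y) for y in D) if D else float("inf")


def exs(C, D):
    """Excess of C over D; 0 when C is empty."""
    return max(dist(x, D) for x in C) if C else 0.0


def trunc_hausdorff(C, D, rho):
    """max{exs(C ∩ B(rho); D), exs(D ∩ B(rho); C)} with the ball centered at 0."""
    C_rho = [x for x in C if abs(x) <= rho]
    D_rho = [y for y in D if abs(y) <= rho]
    return max(exs(C_rho, D), exs(D_rho, C))


def minkowski_sum(A, B):
    """All pairwise sums a + b with a in A and b in B."""
    return sorted({a + b for a in A for b in B})


# Illustrative data (assumed): every set lies in the ball of radius rho.
rho = 1.0
C1, D1 = [0.0, 0.5], [0.1, 0.4]
C2, D2 = [-0.5, 0.25], [-0.4, 0.3]

lhs = trunc_hausdorff(minkowski_sum(C1, C2), minkowski_sum(D1, D2), rho)
rhs = trunc_hausdorff(C1, D1, rho) + trunc_hausdorff(C2, D2, rho)
print("distance between sums:", lhs, " sum of distances:", rhs)
```

Up to floating-point rounding, the distance between the sums should not exceed the sum of the individual distances, mirroring the inequality $d\hat{\kern-1.49994ptl}_{\rho}(C,D)\leq\sum_{i=1}^{m}d\hat{\kern-1.49994ptl}_{\rho}(C_{i},D_{i})$ with $m=2$.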

Proof of Prop. 4.1. Let $\eta=d\hat{\kern-1.49994ptl}_{\rho}(\mathop{\rm epi}f,\mathop{\rm epi}g)$ and $\varepsilon>0$. Suppose that $(x,f(x))\in\mathop{\rm epi}f\cap\mathbb{B}_{X\times\mathbb{R}}(\rho)$. Then, there exists $(\bar{x},\alpha)\in\mathop{\rm epi}g$ such that $d_{X}(\bar{x},x)\leq\eta+\varepsilon$, $|\alpha-f(x)|\leq\eta+\varepsilon$, and $g(\bar{x})\leq\alpha<\infty$. Thus, $g(\bar{x})\leq\alpha\leq f(x)+\eta+\varepsilon\leq\max\{f(x),-\rho\}+\eta+\varepsilon$. This establishes that $\inf_{\mathbb{B}(x,\eta+\varepsilon)}g\leq\max\{f(x),-\rho\}+\eta+\varepsilon$ for $x\in\mathop{\mathop{\rm lev}}\nolimits_{\rho}f\cap\mathbb{B}_{X}(\rho)$ with $f(x)\geq-\rho$. Suppose now that $x\in\mathop{\mathop{\rm lev}}\nolimits_{\rho}f\cap\mathbb{B}_{X}(\rho)$ and $f(x)<-\rho$. Then, $(x,-\rho)\in\mathop{\rm epi}f\cap\mathbb{B}_{X\times\mathbb{R}}(\rho)$ and there exists $(\bar{x},\alpha)\in\mathop{\rm epi}g$ such that $d_{X}(\bar{x},x)\leq\eta+\varepsilon$, $|\alpha+\rho|\leq\eta+\varepsilon$, and $g(\bar{x})\leq\alpha<\infty$. Consequently,

[TABLE]

Repeating the arguments with the roles of $f$ and $g$ reversed, we establish that the two sets of constraints on the right-hand side in the proposition are satisfied with $\eta+\varepsilon$. Thus, the right-hand side does not exceed $\eta+\varepsilon$. Since $\varepsilon$ is arbitrary, the right-hand side furnishes a lower bound on $d\hat{\kern-1.49994ptl}_{\rho}(\mathop{\rm epi}f,\mathop{\rm epi}g)$. By [33, Prop. 3.2], it is also an upper bound; the lsc assumption in that proposition is not needed in its proof.

Acknowledgement. This work is supported in part by DARPA (Lagrange) under HR0011-8-34187, ONR (Science of Autonomy) under N0001419WX00183, and AFOSR (Optimization and Discrete Mathematics) under F4FGA08272G001.

Bibliography

[1] H. Attouch. Variational Convergence for Functions and Operators. Applicable Mathematics Sciences. Pitman, 1984.
[2] H. Attouch, R. Lucchetti, and R. J-B Wets. The topology of the $\rho$-Hausdorff distance. Annali di Matematica pura ed applicata, CLX:303–320, 1991.
[3] H. Attouch and R. J-B Wets. Isometries for the Legendre-Fenchel transform. Transactions of the American Mathematical Society, 296:33–60, 1986.
[4] H. Attouch and R. J-B Wets. Quantitative stability of variational systems: I. The epigraphical distance. Transactions of the American Mathematical Society, 328(2):695–729, 1991.
[5] H. Attouch and R. J-B Wets. Quantitative stability of variational systems: II. A framework for nonlinear conditioning. SIAM J. Optimization, 3:359–381, 1993.
[6] H. Attouch and R. J-B Wets. Quantitative stability of variational systems: III. $\varepsilon$-approximate solutions. Mathematical Programming, 61:197–214, 1993.
[7] J.-P. Aubin and I. Ekeland. Applied Nonlinear Analysis. Issue 1237 of Pure and applied mathematics. Wiley, 1984.
[8] D. Aze. A survey on error bounds for lower semicontinuous functions. In Proceedings of 2003 MODESMAI Conference, ESAIM Proc., vol. 13. EDP Sci., Les Ulis (2003), pages 1–17, 2003.