Risk-Averse Models in Bilevel Stochastic Linear Programming

J. Burtscheidt; M. Claus; S. Dempe

arXiv:1901.11349·math.OC·February 1, 2019·SIAM J. Optim.

TL;DR

This paper studies bilevel stochastic linear programming models where the leader's risk-averse decision-making is analyzed under distributional perturbations, providing stability, continuity, and reformulation results.

Contribution

It introduces stability and differentiability results for risk measures in bilevel stochastic problems and offers a reformulation approach for finite discrete distributions.

Findings

1. Qualitative stability under probability distribution perturbations

2. Lipschitz continuity and differentiability conditions for risk measures

3. Reformulation of finite discrete distribution problems as standard bilevel problems

Abstract

We consider bilevel linear problems, where some parameters are stochastic, and the leader has to decide in a here-and-now fashion, while the follower has complete information. In this setting, the leader's outcome can be modeled by a random variable, which we evaluate based on some law-invariant convex risk measure. A qualitative stability result under perturbations of the underlying probability distribution is presented. Moreover, for the expectation, the expected excess, and the upper semideviation, we establish Lipschitz continuity as well as sufficient conditions for differentiability. Finally, for finite discrete distributions, we reformulate the bilevel stochastic problems as standard bilevel problems and propose a regularization scheme for bilevel linear problems.

Equations

$$\min_{x}\big\{c^{\top}x+\min_{y}\{q^{\top}y\;|\;y\in\Psi(x,z)\}\;|\;x\in X\big\},$$

$$\Psi(x,z):=\underset{y}{\mathrm{Argmin}}\{d^{\top}y\;|\;Ay\leq Tx+z\}$$

$$f(x,Z(\cdot)):=c^{\top}x+\min_{y}\{q^{\top}y\;|\;y\in\Psi(x,Z(\cdot))\}$$

$$\min_{x}\{\mathcal{R}[f(x,Z(\cdot))]\;|\;x\in X\}.$$

$$\mathrm{CVaR}_{\alpha}[\cdot]=\min_{\eta\in\mathbb{R}}\Big\{\eta+\frac{1}{1-\alpha}\mathrm{EE}_{\eta}[\cdot]\Big\}$$

$$Q_{\mathcal{R}}(x):=\mathcal{R}[f(x,Z(\cdot))].$$

$$|f(x,z)-f(x^{\prime},z^{\prime})|\leq\|c\|\|x-x^{\prime}\|+\|q\|\|y-y^{\prime}\|$$

$$\Psi(x^{\prime},z^{\prime})\subseteq\Psi(x,z)+\Lambda\|(x,z)-(x^{\prime},z^{\prime})\|\mathbb{B}$$

$$\min_{y}\{q^{\top}y\;|\;y\in\Psi(x^{\prime},z^{\prime})\}$$

$$\mathcal{M}^{p}_{s}:=\Big\{\mu\in\mathcal{P}(\mathbb{R}^{s})\;|\;\int_{\mathbb{R}^{s}}\|z\|^{p}\;\mu(dz)<\infty\Big\}$$

$$F_{Z}=\{x\in\mathbb{R}^{n}\;|\;(x,z)\in F\;\forall z\in\mathrm{supp}\;\mu_{Z}\}$$

$$\begin{aligned}|Q_{\mathbb{E}}(x)|&\leq|f(x,z_{0})|+\int_{\mathrm{supp}\;\mu_{Z}}|f(x,z)-f(x,z_{0})|\;\mu_{Z}(dz)\\&\leq|f(x,z_{0})|+L\|z_{0}\|+\int_{\mathrm{supp}\;\mu_{Z}}L\|z\|\;\mu_{Z}(dz)<\infty.\end{aligned}$$

$$|Q_{\mathbb{E}}(x)-Q_{\mathbb{E}}(x^{\prime})|\leq\int_{\mathrm{supp}\;\mu_{Z}}|f(x,z)-f(x^{\prime},z)|\;\mu_{Z}(dz)\leq L\|x-x^{\prime}\|$$

$$f(x,z)=c^{\top}x+\min_{y_{+},y_{-},t}\big\{q^{\top}(y_{+}-y_{-})\;|\;(y_{+},y_{-},t)\in\Psi_{=}(x,z)\big\},$$

$$\Psi_{=}(x,z)=\underset{y_{+},y_{-},t}{\mathrm{Argmin}}\{d^{\top}(y_{+}-y_{-})\;|\;A(y_{+}-y_{-})+t=Tx+z,\;y_{+},y_{-},t\geq 0\}.$$

$$\hat{q}:=\begin{pmatrix}q\\-q\\0_{s}\end{pmatrix},\quad\hat{y}:=\begin{pmatrix}y_{+}\\y_{-}\\t\end{pmatrix},\quad\hat{d}:=\begin{pmatrix}d\\-d\\0_{s}\end{pmatrix},\quad\text{and}\quad\hat{A}:=(A,-A,I_{s})$$

$$f(x,z)=c^{\top}x+\min_{\hat{y}}\big\{\hat{q}^{\top}\hat{y}\;|\;\hat{y}\in\underset{y^{\prime}}{\mathrm{Argmin}}\{\hat{d}^{\top}y^{\prime}\;|\;\hat{A}y^{\prime}=Tx+z,\;y^{\prime}\geq 0\}\big\}.$$

$$\mathcal{A}:=\{\hat{A}_{B}\in\mathbb{R}^{s\times s}\;|\;\hat{A}_{B}\text{ is a regular submatrix of }\hat{A}\}$$

$$\hat{A}_{B^{\prime}}^{-1}(Tx+z)=\hat{A}_{B}^{-1}(Tx+z)\quad\text{and}\quad\hat{d}_{N}^{\top}-\hat{d}_{B}^{\top}\hat{A}_{B}^{-1}\hat{A}_{N}\geq 0.$$

$$\mathcal{A}^{\ast}:=\{\hat{A}_{B}\in\mathcal{A}\;|\;\hat{d}_{N}^{\top}-\hat{d}_{B}^{\top}\hat{A}_{B}^{-1}\hat{A}_{N}\geq 0\}$$

$$f(x,z)=c^{\top}x+\min_{\hat{A}_{B}}\big\{\hat{q}^{\top}_{B}\hat{A}_{B}^{-1}(Tx+z)\;|\;\hat{A}_{B}^{-1}(Tx+z)\geq 0,\;\hat{A}_{B}\in\mathcal{A}^{\ast}\big\}$$

$$\mathcal{R}(\hat{A}_{B}):=\{(x,z)\in F\;|\;\hat{A}_{B}^{-1}(Tx+z)\geq 0,\;c^{\top}x+\hat{q}^{\top}_{B}\hat{A}_{B}^{-1}(Tx+z)=f(x,z)\}.$$

$$\mathcal{N}_{x_{0}}=\mathcal{F}_{x_{0}}\;\cup\bigcup_{\hat{A}_{B}\in\mathcal{A}^{\ast}}\big(\mathcal{Z}_{x_{0}}(\hat{A}_{B})\setminus\mathrm{int}\;\mathcal{Z}_{x_{0}}(\hat{A}_{B})\big)\;\cup\bigcup_{\hat{A}_{B},\hat{A}_{B^{\prime}}\in\mathcal{A}^{\ast}:\;\hat{q}^{\top}_{B}\hat{A}_{B}^{-1}\neq\hat{q}^{\top}_{B^{\prime}}\hat{A}_{B^{\prime}}^{-1}}\mathcal{V}_{x_{0}}(\hat{A}_{B},\hat{A}_{B^{\prime}})$$

$\mathcal{F}_{x_{0}}$, $\mathcal{Z}_{x_{0}}(\hat{A}_{B})$, $\mathcal{V}_{x_{0}}(\hat{A}_{B},\hat{A}_{B^{\prime}})$

$$\nabla_{x}f(x_{0},z_{0})\in\{c^{\top}+\hat{q}^{\top}_{B}\hat{A}_{B}^{-1}T\;|\;\hat{A}_{B}\in\mathcal{A}^{\ast}\}.$$

$$F\subseteq\bigcup_{\hat{A}_{B}\in\mathcal{A}^{\ast}}\mathcal{R}(\hat{A}_{B}).$$

$$(x_{0},z_{0})\in\bigcap_{i=1,\ldots,k}\mathcal{R}(\hat{A}_{B^{i}})\;\cap\;\mathrm{int}\bigcup_{i=1,\ldots,k}\mathcal{R}(\hat{A}_{B^{i}}).$$

$$\hat{q}^{\top}_{B^{1}}\hat{A}_{B^{1}}^{-1}(Tx_{0}+z_{0})=\ldots=\hat{q}^{\top}_{B^{k}}\hat{A}_{B^{k}}^{-1}(Tx_{0}+z_{0}),$$

Taxonomy

Topics: Risk and Portfolio Optimization · Optimization and Variational Analysis · Optimization and Mathematical Programming

Full text

Risk-Averse Models in Bilevel Stochastic Linear Programming

J. Burtscheidt, M. Claus: Faculty of Mathematics, University of Duisburg-Essen, Campus Essen, Thea-Leymann-Str. 9, D-45127 Essen, Germany, [johanna.burtscheidt][matthias.claus]@uni-due.de

S. Dempe: Faculty of Mathematics and Computer Science, TU Bergakademie Freiberg, Akademiestraße 6, D-09599 Freiberg, Germany, [email protected]

Keywords: Bilevel Stochastic Programming, Risk Measures, Differentiability, Stability, Finite Discrete Models

AMS Subject Classification: 90C15, 90C26, 90C31, 90C34, 91A65

1 Introduction

Bilevel problems arise from the interplay of two decision makers at different levels of a hierarchy. The leader decides first and passes the upper level decision on to the follower. Incorporating the leader’s decision as a parameter, the follower then solves the lower level problem reflecting his or her own goals and returns an optimal solution back to the leader. The leader’s outcome depends on both his or her decision and the solution that is returned from the lower level. In bilevel optimization, it is assumed that the leader has full information about the influence of his or her decision on the lower level problem. As the latter may have more than one solution, models typically consider the case where the follower returns either the best (optimistic model) or the worst (pessimistic model) solution with respect to the leader’s objective. The bilevel optimization problem is to find an optimal upper level decision which, even in a linear setting, results in a nonconvex, nondifferentiable and NP-hard problem (cf. [9, Chapter 3]).

The present work is on bilevel stochastic linear problems, where the realization of some random vector whose distribution does not depend on the upper level decision enters the lower level problem as an additional parameter. It is assumed that the leader has to make his or her decision without knowing the realization of the randomness, while the follower decides under full information. This setting encapsulates two-stage stochastic programming with linear recourse as the special case, where the upper and lower level objective functions coincide.

In classical two-stage stochastic programming, the upper level objective function gives rise to a family of random variables defined by the optimal value function of the recourse problem. In contrast, the arising random variables in optimistic bilevel stochastic programming models depend on the optimal value of a problem where only optimal solutions of the lower level problem are feasible and the decision is made by a different actor. This is a crucial difference that entails a loss of convexity and poses additional challenges.

Nevertheless, bilevel stochastic problems are of great relevance for practical applications and have been discussed in the context of pricing of electricity swing options ([25]), economics ([6]), supply chain planning ([39]), telecommunications ([43]) and general agency problems ([18]). Other works focus on solution methods ([5]), bilevel stochastic problems with knapsack constraints ([24]) and SMPECs ([29]).

In [21], Ivanov examines bilevel stochastic linear problems with uncertainty in the right-hand side of the lower level problem and utilizes the Value-at-Risk to rank the arising random variables. The results include continuity of the objective function, the existence of a solution, and equivalence to a mixed-integer linear program, if the underlying distribution is finite discrete. The latter result has been extended to the fully random case in [12].

In the present work, we rank the random variables arising from right-hand side uncertainty in the lower level by law-invariant risk measures. In particular, we consider the expectation, the expected excess over a fixed target level, the mean upper semideviation and the Conditional Value-at-Risk and establish Lipschitz continuity of the resulting objective function.

It is well known that stochastic programming models may be smoother than their underlying deterministic counterparts. For instance, for a class of stochastic Stackelberg games employing the expectation, differentiability has been derived in [8]. Overcoming additional challenges arising from nondifferentiable integrands, we establish continuous differentiability for bilevel stochastic linear problems using the expectation, the expected excess or the mean upper semideviation.

Incomplete information or the need for computational efficiency may lead to optimization models where an approximation of the true underlying distribution is employed. This motivates the analysis of the behavior of optimal values and (local) optimal solution sets under perturbations of the underlying distribution (see e.g. [30], [32] and [33] for stability analysis of related models). For bilevel stochastic linear problems, we establish a qualitative stability result that holds for all law-invariant convex risk measures.

All our results regarding finiteness, (Lipschitz) continuity, differentiability and stability cover both the optimistic and the pessimistic approach of bilevel stochastic linear programming.

For finite discrete distributions and optimistic models, we show that the risk-averse bilevel stochastic linear problems using the expectation, the expected excess or the mean upper semideviation are equivalent to standard bilevel linear problems. The resulting problems for the expectation and expected excess have at most one coupling constraint involving variables from different scenarios, which paves the way for decomposition approaches.

Finally, we show that a simplified version of the regularization scheme in [41] can be used to solve bilevel linear problems.

2 Model

Using the optimistic model, we shall consider parametric bilevel linear problems of the form

$$\min_{x}\big\{c^{\top}x+\min_{y}\{q^{\top}y\;|\;y\in\Psi(x,z)\}\;|\;x\in X\big\},$$

where $X\subseteq\mathbb{R}^{n}$ is nonempty, $c\in\mathbb{R}^{n}$ and $q\in\mathbb{R}^{m}$ are vectors, and $\Psi:\mathbb{R}^{n}\times\mathbb{R}^{s}\rightrightarrows\mathbb{R}^{m}$ is the lower level optimal solution set mapping defined by

$$\Psi(x,z):=\underset{y}{\mathrm{Argmin}}\{d^{\top}y\;|\;Ay\leq Tx+z\}$$

with matrices $A\in\mathbb{R}^{s\times m}$ , $T\in\mathbb{R}^{s\times n}$ and a vector $d\in\mathbb{R}^{m}$ . A bilevel stochastic program arises if we assume that the parameter $z=Z(\omega)$ is the realization of a known random vector $Z$ defined on some probability space $(\Omega,\mathcal{F},\mathbb{P})$ . We impose an additional non-anticipativity constraint that creates the following pattern of decision and observation:

Leader decides $x$ $\rightarrow$ $z=Z(\omega)$ is revealed $\rightarrow$ Follower decides $y$ .

Throughout the analysis, we assume the stochasticity to be purely exogenous, i.e. the distribution of $Z$ to be independent of $x$ . In this setting, the leader’s decision $x$ gives rise to the random variable

$$f(x,Z(\cdot)):=c^{\top}x+\min_{y}\{q^{\top}y\;|\;y\in\Psi(x,Z(\cdot))\}$$

and the problem can be understood as picking an optimal random variable from the family $f(X,Z):=\{f(x,Z(\cdot))\;|\;x\in X\}\subseteq L^{0}(\Omega,\mathcal{F},\mathbb{P})$ . We shall rank these random variables according to some mapping $\mathcal{R}:L^{0}(\Omega,\mathcal{F},\mathbb{P})\to\mathbb{R}\cup\{\pm\infty\}=:\overline{\mathbb{R}}$ , i.e. consider the bilevel stochastic problem

$$\min_{x}\{\mathcal{R}[f(x,Z(\cdot))]\;|\;x\in X\}.\tag{1}$$

We shall assume that there is some $p\in[1,\infty)$ such that the restriction $\mathcal{R}|_{L^{p}(\Omega,\mathcal{F},\mathbb{P})}$ is real-valued, convex, and nondecreasing w.r.t. the $\mathbb{P}$ -almost sure partial order. Furthermore, let $\mathcal{R}$ be law-invariant, i.e. $\mathcal{R}[Y]=\mathcal{R}[Y^{\prime}]$ whenever the induced Borel measures $\mathbb{P}\circ Y^{-1}$ and $\mathbb{P}\circ(Y^{\prime})^{-1}$ coincide.

Remark 2.1.

The above assumptions are fulfilled for any law-invariant convex risk measure in the sense of [14, 17] (see also [15, 16]). However, we do not assume translation equivariance for the present analysis.

Example 2.2.

1. The expectation $\mathbb{E}[\cdot]$ ,

2. the expected excess $\mathrm{EE}_{\eta}[\cdot]=\mathbb{E}[\max\{\cdot-\eta,0\}]$ over a fixed target level $\eta\in\mathbb{R}$ ,

3. any weighted sum $\mathrm{SD}_{\rho}[\cdot]=\mathbb{E}[\cdot]+\rho\,\mathrm{EE}_{\mathbb{E}[\cdot]}[\cdot]$ of the expectation and the upper semideviation with $\rho\in[0,1)$ and

4. the Conditional Value-at-Risk

$$\mathrm{CVaR}_{\alpha}[\cdot]=\min_{\eta\in\mathbb{R}}\Big\{\eta+\frac{1}{1-\alpha}\mathrm{EE}_{\eta}[\cdot]\Big\}$$

for a fixed level $\alpha\in(0,1)$ (cf. [37, Theorem 10])

are law-invariant and fulfill the above assumptions (see e.g. [42], [34]). In all of the above situations $p$ can be chosen as $1$ .
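For a finite discrete distribution, the four risk measures of Example 2.2 reduce to weighted sums. The following Python sketch evaluates them for a list of outcomes with probabilities. The function names and the sample data are illustrative assumptions, not taken from the paper. For CVaR, the sketch relies on the fact that $\eta\mapsto\eta+\frac{1}{1-\alpha}\mathrm{EE}_{\eta}$ is convex and piecewise linear with kinks only at the outcomes, so a minimizer can be found among them.

```python
def expectation(values, probs):
    """E[Y] for a finite discrete random variable Y."""
    return sum(p * v for v, p in zip(values, probs))


def expected_excess(values, probs, eta):
    """EE_eta[Y] = E[max(Y - eta, 0)]."""
    return sum(p * max(v - eta, 0.0) for v, p in zip(values, probs))


def mean_upper_semideviation(values, probs, rho):
    """SD_rho[Y] = E[Y] + rho * EE_{E[Y]}[Y]."""
    mean = expectation(values, probs)
    return mean + rho * expected_excess(values, probs, mean)


def cvar(values, probs, alpha):
    """CVaR_alpha[Y] via the minimization formula.

    The objective is convex and piecewise linear in eta with kinks at the
    outcomes, so it suffices to search over the outcomes themselves.
    """
    return min(
        eta + expected_excess(values, probs, eta) / (1.0 - alpha)
        for eta in values
    )


# Illustrative data (assumption): four equally likely leader outcomes.
outcomes = [1.0, 2.0, 3.0, 4.0]
weights = [0.25, 0.25, 0.25, 0.25]
print(expectation(outcomes, weights))
print(expected_excess(outcomes, weights, eta=2.0))
print(mean_upper_semideviation(outcomes, weights, rho=0.5))
print(cvar(outcomes, weights, alpha=0.5))
```

In a bilevel setting the outcomes would be the values $f(x,z_{k})$ for a fixed leader decision $x$ and scenarios $z_{k}$.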

3 Structural properties

In this section, we shall consider the case where $\mathcal{R}$ is given by $\mathbb{E}$ , $\mathrm{EE}_{\eta}$ or $\mathrm{SD}_{\rho}$ and examine properties of the mapping $Q_{\mathcal{R}}:\mathbb{R}^{n}\to\overline{\mathbb{R}}$ given by

$$Q_{\mathcal{R}}(x):=\mathcal{R}[f(x,Z(\cdot))].$$

First, we shall prove that the function $f$ defined above is Lipschitz continuous and hence Borel measurable.

Lemma 3.1.

Assume that $\mathrm{dom}\;f\neq\emptyset$ , then $f$ is real-valued and Lipschitz continuous on the polyhedron $F=\{(x,z)\in\mathbb{R}^{n}\times\mathbb{R}^{s}\;|\;\exists y\in\mathbb{R}^{m}:Ay\leq Tx+z\}$ .

Proof.

By [13], $\emptyset\neq\mathrm{dom}\;f\subseteq\mathrm{dom}\;\Psi$ implies $\mathrm{dom}\;\Psi=F$ . Consequently, the linear program in the definition of $f(x,z)$ is solvable for any $(x,z)\in F$ by parametric linear programming theory (see [2]). Consider any $(x,z),(x^{\prime},z^{\prime})\in F$ . Without loss of generality, assume that $f(x,z)\geq f(x^{\prime},z^{\prime})$ and let $y^{\prime}\in\Psi(x^{\prime},z^{\prime})$ be such that $f(x^{\prime},z^{\prime})=c^{\top}x^{\prime}+q^{\top}y^{\prime}$ . Following [22] we obtain

$$|f(x,z)-f(x^{\prime},z^{\prime})|\leq\|c\|\|x-x^{\prime}\|+\|q\|\|y-y^{\prime}\|$$

for any $y\in\Psi(x,z)$ . Let $\mathbb{B}$ denote the Euclidean unit ball, then Theorem 7.1 in the Appendix yields

$$\Psi(x^{\prime},z^{\prime})\subseteq\Psi(x,z)+\Lambda\|(x,z)-(x^{\prime},z^{\prime})\|\mathbb{B}$$

and hence $|f(x,z)-f(x^{\prime},z^{\prime})|\;\leq\;(\|c\|+\Lambda\|q\|)\|(x,z)-(x^{\prime},z^{\prime})\|$ . ∎

Remark 3.2.

In view of Theorem 7.1 in the Appendix, the above result can be easily extended to the case of a convex quadratic lower level problem.

The next result follows directly from linear programming theory and provides verifiable conditions for $\mathrm{dom}\;f\neq\emptyset$ :

Lemma 3.3.

$\mathrm{dom}\;f\neq\emptyset$ holds if and only if there exists $(x,z)\in\mathbb{R}^{n}\times\mathbb{R}^{s}$ such that

1. $\{y\;|\;Ay\leq Tx+z\}$ is nonempty,

2. there is some $u\in\mathbb{R}^{s}$ satisfying $A^{\top}u=d$ and $u\leq 0$ , and

3. the function $y\mapsto q^{\top}y$ is bounded from below on $\Psi(x,z)$ .

Under these conditions,

$$\min_{y}\{q^{\top}y\;|\;y\in\Psi(x^{\prime},z^{\prime})\}$$

is attained for any $(x^{\prime},z^{\prime})\in F$ .

Under an appropriate moment condition, Lemma 3.1 implies finiteness and Lipschitz continuity of $Q_{\mathbb{E}}$ , $Q_{\mathrm{EE}_{\eta}}$ and $Q_{\mathrm{SD}_{\rho}}$ . Let

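$$\mathcal{M}^{p}_{s}:=\Big\{\mu\in\mathcal{P}(\mathbb{R}^{s})\;|\;\int_{\mathbb{R}^{s}}\|z\|^{p}\;\mu(dz)<\infty\Big\}$$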

denote the set of Borel probability measures on $\mathbb{R}^{s}$ with finite moments of order $p\in[0,\infty)$ .

Theorem 3.4.

Assume $\mathrm{dom}\;f\neq\emptyset$ and $\mu_{Z}:=\mathbb{P}\circ Z^{-1}\in\mathcal{M}^{1}_{s}$ . Then the mappings $Q_{\mathbb{E}}$ , $Q_{\mathrm{EE}_{\eta}}$ , $Q_{\mathrm{SD}_{\rho}}$ and $Q_{\mathrm{CVaR}_{\alpha}}$ are real-valued and Lipschitz continuous on

$$F_{Z}=\{x\in\mathbb{R}^{n}\;|\;(x,z)\in F\;\forall z\in\mathrm{supp}\;\mu_{Z}\}$$

for any $\eta\in\mathbb{R}$ , $\rho\in[0,1)$ and $\alpha\in(0,1)$ .

Proof.

$Q_{\mathbb{E}}$ : Let $L$ be the Lipschitz constant from Lemma 3.1. For any $z_{0}\in\mathrm{supp}\;\mu_{Z}$ and $x\in F_{Z}$ we have

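$$\begin{aligned}|Q_{\mathbb{E}}(x)|&\leq|f(x,z_{0})|+\int_{\mathrm{supp}\;\mu_{Z}}|f(x,z)-f(x,z_{0})|\;\mu_{Z}(dz)\\&\leq|f(x,z_{0})|+L\|z_{0}\|+\int_{\mathrm{supp}\;\mu_{Z}}L\|z\|\;\mu_{Z}(dz)<\infty.\end{aligned}$$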

Furthermore,

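$$|Q_{\mathbb{E}}(x)-Q_{\mathbb{E}}(x^{\prime})|\leq\int_{\mathrm{supp}\;\mu_{Z}}|f(x,z)-f(x^{\prime},z)|\;\mu_{Z}(dz)\leq L\|x-x^{\prime}\|$$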

holds for any $x,x^{\prime}\in F_{Z}$ .

$Q_{\mathrm{EE}_{\eta}}$ : Invoking $\max\{f(x,z)-\eta,0\}\leq|f(x,z)|+|\eta|$ and the Lipschitz continuity of $x\mapsto\max\{f(x,z)-\eta,0\}$ on $F_{Z}$ , finiteness and Lipschitz continuity of $Q_{\mathrm{EE}_{\eta}}$ can be shown by the same arguments as for $Q_{\mathbb{E}}$ .

$Q_{\mathrm{SD}_{\rho}}$ : Finiteness and Lipschitz continuity follow from the corresponding results for $Q_{\mathbb{E}}$ and $Q_{\mathrm{EE}_{\eta}}$ .

$Q_{\mathrm{CVaR}_{\alpha}}$ : Consider the mapping $g:\mathbb{R}^{n}\to L^{0}(\Omega,\mathcal{F},\mathbb{P})$ , $g(x):=f(x,Z(\cdot))$ . By the results for $Q_{\mathbb{E}}$ , we have $g(F_{Z})\subseteq L^{1}(\Omega,\mathcal{F},\mathbb{P})$ and the restriction $g|_{F_{Z}}$ is Lipschitz continuous w.r.t. the $L^{1}$ -norm. Consequently, the composition $Q_{\mathrm{CVaR}_{\alpha}}=\mathrm{CVaR}_{\alpha}\circ g$ is finite and Lipschitz continuous on $F_{Z}$ by [35, Corollary 3.7 and the subsequent remark].

∎
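The Lipschitz estimate for $Q_{\mathbb{E}}$ can be checked numerically on a small instance. The Python sketch below uses an illustrative lower-level problem that is an assumption, not an example from the paper: $\min_{y}\{y\;|\;-y\leq -x+z_{1},\;-y\leq x+z_{2}\}$ with $c=0$ and $q=1$, whose unique solution gives $f(x,z)=\max\{x-z_{1},-x-z_{2}\}$. For this instance $f(\cdot,z)$ is Lipschitz with constant $1$, so the same bound should hold for $Q_{\mathbb{E}}$ under any two-scenario distribution. The sketch also compares one-sided difference quotients at a point where one scenario sits exactly on a kink of $f$.

```python
def f(x, z):
    """Leader outcome for the illustrative instance: f(x, z) = max(x - z1, -x - z2)."""
    z1, z2 = z
    return max(x - z1, -x - z2)


# Hypothetical finite discrete distribution: (realization, probability).
SCENARIOS = [((0.0, 0.0), 0.5), ((2.0, 0.0), 0.5)]


def q_expectation(x):
    """Q_E(x) = sum_k p_k * f(x, z_k)."""
    return sum(p * f(x, z) for z, p in SCENARIOS)


def lipschitz_holds(points, constant=1.0, tol=1e-12):
    """Check |Q_E(a) - Q_E(b)| <= constant * |a - b| on all pairs of points."""
    return all(
        abs(q_expectation(a) - q_expectation(b)) <= constant * abs(a - b) + tol
        for a in points
        for b in points
    )


def one_sided_slopes(x, h=1e-6):
    """Left and right difference quotients of Q_E at x."""
    here = q_expectation(x)
    left = (here - q_expectation(x - h)) / h
    right = (q_expectation(x + h) - here) / h
    return left, right


grid = [-3.0 + 0.1 * i for i in range(61)]
print(lipschitz_holds(grid))
print(one_sided_slopes(0.0))  # scenario (0, 0) has its kink at x = 0
```

The one-sided slopes at $x=0$ differ, so Lipschitz continuity does not give differentiability when a scenario carries positive mass on a kink. This is the situation that the condition $\mu_{Z}[\mathcal{N}_{x_{0}}]=0$ in the differentiability results of this section excludes.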

Under the assumptions of Theorem 3.4, the bilevel stochastic linear problem is solvable whenever $X$ is a nonempty compact subset of $F_{Z}$ . A similar result holds for a comprehensive class of risk measures and will be discussed in Section 4 (cf. Corollary 4.9).

We shall now focus on differentiability of $Q_{\mathcal{R}}$ . It will be convenient to reformulate $f$ as

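$$f(x,z)=c^{\top}x+\min_{y_{+},y_{-},t}\big\{q^{\top}(y_{+}-y_{-})\;|\;(y_{+},y_{-},t)\in\Psi_{=}(x,z)\big\},$$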

where

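$$\Psi_{=}(x,z)=\underset{y_{+},y_{-},t}{\mathrm{Argmin}}\{d^{\top}(y_{+}-y_{-})\;|\;A(y_{+}-y_{-})+t=Tx+z,\;y_{+},y_{-},t\geq 0\}.$$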

Setting

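$$\hat{q}:=\begin{pmatrix}q\\-q\\0_{s}\end{pmatrix},\quad\hat{y}:=\begin{pmatrix}y_{+}\\y_{-}\\t\end{pmatrix},\quad\hat{d}:=\begin{pmatrix}d\\-d\\0_{s}\end{pmatrix},\quad\text{and}\quad\hat{A}:=(A,-A,I_{s})$$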

we obtain

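$$f(x,z)=c^{\top}x+\min_{\hat{y}}\big\{\hat{q}^{\top}\hat{y}\;|\;\hat{y}\in\underset{y^{\prime}}{\mathrm{Argmin}}\{\hat{d}^{\top}y^{\prime}\;|\;\hat{A}y^{\prime}=Tx+z,\;y^{\prime}\geq 0\}\big\}.$$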

As the rows of $\hat{A}$ are linearly independent, we may consider the nonempty set

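$$\mathcal{A}:=\{\hat{A}_{B}\in\mathbb{R}^{s\times s}\;|\;\hat{A}_{B}\text{ is a regular submatrix of }\hat{A}\}$$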

of lower level base matrices. A base matrix $\hat{A}_{B}\in\mathcal{A}$ is optimal for the lower level problem for a given $(x,z)$ if it is feasible, i.e. $\hat{A}_{B}^{-1}(Tx+z)\geq 0$ , and the associated reduced cost vector $\hat{d}_{N}^{\top}-\hat{d}_{B}^{\top}\hat{A}_{B}^{-1}\hat{A}_{N}$ is nonnegative. Furthermore, for any optimal base matrix $\hat{A}_{B^{\prime}}\in\mathcal{A}$ , there exists a feasible base matrix $\hat{A}_{B}\in\mathcal{A}$ satisfying

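$$\hat{A}_{B^{\prime}}^{-1}(Tx+z)=\hat{A}_{B}^{-1}(Tx+z)\quad\text{and}\quad\hat{d}_{N}^{\top}-\hat{d}_{B}^{\top}\hat{A}_{B}^{-1}\hat{A}_{N}\geq 0.$$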

Set

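$$\mathcal{A}^{\ast}:=\{\hat{A}_{B}\in\mathcal{A}\;|\;\hat{d}_{N}^{\top}-\hat{d}_{B}^{\top}\hat{A}_{B}^{-1}\hat{A}_{N}\geq 0\}$$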

and assume $\mathrm{dom}\;f\neq\emptyset$ , then

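$$f(x,z)=c^{\top}x+\min_{\hat{A}_{B}}\big\{\hat{q}^{\top}_{B}\hat{A}_{B}^{-1}(Tx+z)\;|\;\hat{A}_{B}^{-1}(Tx+z)\geq 0,\;\hat{A}_{B}\in\mathcal{A}^{\ast}\big\}\tag{2}$$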

holds for any $(x,z)\in F$ .
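The representation of $f$ as a minimum over feasible base matrices with nonnegative reduced costs can be evaluated by brute force for very small instances. The Python sketch below enumerates all $s$-column submatrices of $\hat{A}=(A,-A,I_{s})$ using exact rational arithmetic. The helper names `solve` and `f_by_bases` and the test instance are illustrative assumptions. The enumeration grows combinatorially and is meant only to make the formula concrete, not as a solution method.

```python
from fractions import Fraction
from itertools import combinations


def solve(M, b):
    """Solve M u = b exactly by Gauss-Jordan elimination; None if M is singular."""
    n = len(M)
    aug = [[Fraction(v) for v in row] + [Fraction(b[i])] for i, row in enumerate(M)]
    for col in range(n):
        piv = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if piv is None:
            return None
        aug[col], aug[piv] = aug[piv], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n] for row in aug]


def f_by_bases(x, z, c, q, d, A, T):
    """Evaluate f(x, z) by enumerating base matrices of A_hat = (A, -A, I_s)."""
    s, m = len(A), len(A[0])
    A_hat = [list(A[i]) + [-a for a in A[i]] + [int(i == j) for j in range(s)]
             for i in range(s)]
    d_hat = list(d) + [-v for v in d] + [0] * s
    q_hat = list(q) + [-v for v in q] + [0] * s
    rhs = [sum(T[i][j] * x[j] for j in range(len(x))) + z[i] for i in range(s)]
    cols = range(2 * m + s)
    best = None
    for B in combinations(cols, s):
        A_B = [[A_hat[i][j] for j in B] for i in range(s)]
        y_B = solve(A_B, rhs)
        if y_B is None or min(y_B) < 0:
            continue  # singular submatrix, or infeasible for this (x, z)
        # Simplex multipliers solve A_B^T pi = d_B; reduced costs are d_j - pi^T A_j.
        pi = solve([list(col) for col in zip(*A_B)], [d_hat[j] for j in B])
        if any(d_hat[j] - sum(pi[i] * A_hat[i][j] for i in range(s)) < 0 for j in cols):
            continue  # reduced costs not nonnegative: base matrix is not in A*
        value = sum(q_hat[j] * yj for j, yj in zip(B, y_B))
        best = value if best is None else min(best, value)
    if best is None:
        return None  # no feasible base matrix in A*, e.g. (x, z) outside F
    return sum(ci * xi for ci, xi in zip(c, x)) + best


# Illustrative instance: the follower is indifferent (d = 0) on {y | -z2 <= y <= x + z1}.
A, T = [[1], [-1]], [[1], [0]]
print(f_by_bases([1], [0, 0], c=[0], q=[-1], d=[0], A=A, T=T))
```

In the instance shown, the lower-level solution set for $x=1$, $z=(0,0)$ is the interval $[0,1]$, and the optimistic leader with $q=-1$ selects its right endpoint.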

Definition 3.5.

The region of stability associated with a base matrix $\hat{A}_{B}\in\mathcal{A}^{\ast}$ is the set

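$$\mathcal{R}(\hat{A}_{B}):=\{(x,z)\in F\;|\;\hat{A}_{B}^{-1}(Tx+z)\geq 0,\;c^{\top}x+\hat{q}^{\top}_{B}\hat{A}_{B}^{-1}(Tx+z)=f(x,z)\}.$$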

Lemma 3.6.

Assume $\mathrm{dom}\;f\neq\emptyset$ and let $x_{0}$ be an inner point of $F_{Z}$ . Then $f(\cdot,z_{0})$ is continuously differentiable at $x_{0}$ for any $z_{0}\in\mathrm{supp}\;\mu_{Z}\setminus\mathcal{N}_{x_{0}}$ , where

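$$\mathcal{N}_{x_{0}}=\mathcal{F}_{x_{0}}\;\cup\bigcup_{\hat{A}_{B}\in\mathcal{A}^{\ast}}\big(\mathcal{Z}_{x_{0}}(\hat{A}_{B})\setminus\mathrm{int}\;\mathcal{Z}_{x_{0}}(\hat{A}_{B})\big)\;\cup\bigcup_{\hat{A}_{B},\hat{A}_{B^{\prime}}\in\mathcal{A}^{\ast}:\;\hat{q}^{\top}_{B}\hat{A}_{B}^{-1}\neq\hat{q}^{\top}_{B^{\prime}}\hat{A}_{B^{\prime}}^{-1}}\mathcal{V}_{x_{0}}(\hat{A}_{B},\hat{A}_{B^{\prime}})$$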

with

[TABLE]

Furthermore, $\mathcal{N}_{x_{0}}$ is contained in a finite union of affine hyperplanes in $\mathbb{R}^{s}$ and we have

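$$\nabla_{x}f(x_{0},z_{0})\in\{c^{\top}+\hat{q}^{\top}_{B}\hat{A}_{B}^{-1}T\;|\;\hat{A}_{B}\in\mathcal{A}^{\ast}\}.$$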

Proof.

$x_{0}\in\mathrm{int}\;F_{Z}$ and $z_{0}\in\mathrm{supp}\;\mu\setminus\mathcal{N}_{x_{0}}\subseteq\mathrm{supp}\;\mu\setminus\mathcal{F}_{x_{0}}$ imply $(x_{0},z_{0})\in\mathrm{int}\;F$ by definition. In view of (2), we have

$$F\subseteq\bigcup_{\hat{A}_{B}\in\mathcal{A}^{\ast}}\mathcal{R}(\hat{A}_{B}).$$

If $(x_{0},z_{0})\in\mathrm{int}\;\mathcal{R}(\hat{A}_{B})$ holds for some $\hat{A}_{B}\in\mathcal{A}^{\ast}$ , there is a neighborhood $U$ of $x_{0}$ such that $f(x,z_{0})=c^{\top}x+\hat{q}^{\top}_{B}\hat{A}_{B}^{-1}(Tx+z_{0})$ holds for all $x\in U$ . In particular, $f(\cdot,z_{0})$ is continuously differentiable at $x_{0}$ and $\nabla_{x}f(x_{0},z_{0})=c^{\top}+\hat{q}^{\top}_{B}\hat{A}_{B}^{-1}T$ .

Suppose that $(x_{0},z_{0})\notin\mathrm{int}\;\mathcal{R}(\hat{A}_{B})$ for all $\hat{A}_{B}\in\mathcal{A}^{\ast}$ . The continuity of $f$ implies that there are $k\geq 2$ pairwise different base matrices $\hat{A}_{B^{1}},\ldots,\hat{A}_{B^{k}}\in\mathcal{A}^{\ast}$ such that

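$$(x_{0},z_{0})\in\bigcap_{i=1,\ldots,k}\mathcal{R}(\hat{A}_{B^{i}})\;\cap\;\mathrm{int}\bigcup_{i=1,\ldots,k}\mathcal{R}(\hat{A}_{B^{i}}).$$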

In particular, we have

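$$\hat{q}^{\top}_{B^{1}}\hat{A}_{B^{1}}^{-1}(Tx_{0}+z_{0})=\ldots=\hat{q}^{\top}_{B^{k}}\hat{A}_{B^{k}}^{-1}(Tx_{0}+z_{0}),$$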

i.e. $z_{0}\in\mathcal{V}_{x_{0}}(\hat{A}_{B^{i}},\hat{A}_{B^{j}})$ for all $i,j\in\{1,\ldots,k\}$ . Thus, $z_{0}\in\mathrm{supp}\;\mu\setminus\mathcal{N}_{x_{0}}$ implies

[TABLE]

For any $i\in\{1,\ldots,k\}$ we shall consider the sets

[TABLE]

By (3) we have

[TABLE]

for all $i\in\{1,\ldots,k\}$ . Thus,

[TABLE]

We have $(x_{0},z_{0})\in\mathcal{Z}(\hat{A}_{B^{1}})$ , i.e. $z_{0}\in\mathcal{Z}_{x_{0}}(\hat{A}_{B^{1}})$ . Thus, $z_{0}\in\mathrm{supp}\;\mu\setminus\mathcal{N}_{x_{0}}$ implies $z_{0}\in\mathrm{int}\;\mathcal{Z}_{x_{0}}(\hat{A}_{B^{1}})$ . Consequently, there is a neighborhood $W$ of $z_{0}$ such that $\hat{A}_{B^{1}}^{-1}(Tx_{0}+z)\geq 0$ for all $z\in W$ . This implies $(x_{0},z_{0})\in\mathrm{int}\;\mathcal{Z}(\hat{A}_{B^{1}})$ for continuity reasons. Hence, $(x_{0},z_{0})\in\mathrm{int}\;\mathcal{O}(\hat{A}_{B^{1}})\;\cap\;\mathrm{int}\;\mathcal{Z}(\hat{A}_{B^{1}})=\mathrm{int}\;\mathcal{R}(\hat{A}_{B^{1}})$ , which contradicts $(x_{0},z_{0})\notin\mathrm{int}\;\mathcal{R}(\hat{A}_{B})$ for all $\hat{A}_{B}\in\mathcal{A}^{\ast}$ .

It remains to show that $\mathcal{N}_{x_{0}}$ is contained in a finite union of affine hyperplanes. Suppose that $z$ is such that $Ay<Tx_{0}+z$ holds for some $y\in\mathbb{R}^{m}$ , then $(x_{0},z)\in\mathrm{int}\;F$ . Consequently,

[TABLE]

is contained in a finite union of affine hyperplanes. Similarly we have

[TABLE]

$\hat{A}_{B}\in\mathcal{A}^{\ast}$ , where $e_{i}^{\top}\hat{A}_{B}^{-1}\neq 0$ due to the regularity of $\hat{A}_{B}$ . Finally, $\mathcal{V}_{x_{0}}(\hat{A}_{B},\hat{A}_{B^{\prime}})$ is an affine hyperplane for any $\hat{A}_{B},\hat{A}_{B^{\prime}}\in\mathcal{A}^{\ast}$ satisfying $\hat{q}^{\top}_{B}\hat{A}_{B}^{-1}\neq\hat{q}^{\top}_{B^{\prime}}\hat{A}_{B^{\prime}}^{-1}$ . ∎

Theorem 3.7.

Assume $\mathrm{dom}\;f\neq\emptyset$ , $\mu_{Z}\in\mathcal{M}^{1}_{s}$ , and let $x_{0}\in\mathrm{int}\;F_{Z}$ be such that $\mu_{Z}[\mathcal{N}_{x_{0}}]=0$ . Then $Q_{\mathbb{E}}$ is continuously differentiable at $x_{0}$ and

[TABLE]

Proof.

We shall prove that Lemma 7.2 in the Appendix is applicable. First, note that condition (a) is satisfied by $\mu_{Z}[\mathcal{N}_{x_{0}}]=0$ and Lemma 3.6. Furthermore, by $x_{0}\in\mathrm{int}\;F_{Z}$ there is a neighborhood $U$ of $x_{0}$ that is contained in $F_{Z}$ . In particular, $Q_{\mathbb{E}}$ is well-defined and finite on $U$ by Theorem 3.4, i.e. the first part of condition (b) of Lemma 7.2 is satisfied. To see that the second part holds as well, let $L$ denote the Lipschitz constant from Lemma 3.1. Fix any $x\in U\setminus\{x_{0}\}$ and $z_{0}\in\mathrm{supp}\;\mu_{Z}\setminus\mathcal{N}_{x_{0}}$ , then

[TABLE]

follows immediately from the characterization of the derivative in Lemma 3.6. Thus, Lemma 7.2 yields the differentiability of $Q_{\mathbb{E}}$ .

We shall now prove that the derivative is indeed continuous. By construction, there exists a neighborhood $U\subseteq\mathrm{int}\;F_{Z}$ of $x_{0}$ such that $\mathcal{N}_{x}\subseteq\mathcal{N}_{x_{0}}$ holds for any $x\in U$ . Consequently, by $\mu_{Z}[\mathcal{N}_{x}]=0$ and the previous arguments, $Q_{\mathbb{E}}$ is differentiable at any $x\in U$ and we have

[TABLE]

where $D:=\{\hat{q}_{B}^{\top}\hat{A}_{B}^{-1}T\;|\;\hat{A}_{B}\in\mathcal{A}^{\ast}\}$ and

[TABLE]

By Lemma 7.3 in the Appendix, the set-valued mapping $\overline{\mathcal{W}}:\mathbb{R}^{n}\times D\rightrightarrows\mathbb{R}^{s}$ ,

[TABLE]

is outer semicontinuous. Furthermore, by the arguments used in the proof of Lemma 3.6 we obtain

[TABLE]

and thus $\mu_{Z}[\mathcal{N}_{x}]=0$ implies $\mu_{Z}[\mathcal{W}(x,\Delta)]=\mu_{Z}[\overline{\mathcal{W}}(x,\Delta)]$ .

We shall use the above representation to prove that for any $\Delta\in D$ , the mapping $M_{\Delta}:\mathbb{R}^{n}\to\mathbb{R}$ , $M_{\Delta}(x):=\mu_{Z}[\mathcal{W}(x,\Delta)]$ is continuous at $x_{0}$ . Consider any sequence $\{x_{l}\}_{l\in\mathbb{N}}\subset\mathbb{R}^{n}$ that converges to $x_{0}$ . Without loss of generality we may assume that $x_{l}\in U$ holds for all $l\in\mathbb{N}$ . We have

[TABLE]

where

[TABLE]

denotes the indicator function associated with the set $\overline{\mathcal{W}}(x_{l},\Delta)$ and the final inequality is obtained by using Fatou’s Lemma. We shall show that

[TABLE]

holds for any $z\in\mathrm{supp}\;\mu_{Z}$ . If the left-hand side in (4) equals zero, the above inequality holds because the right-hand side is nonnegative. On the other hand, $\limsup_{l\to\infty}\mathrm{1}_{\overline{\mathcal{W}}(x_{l},\Delta)}(z)=1$ implies that there is a subsequence $\{x^{\prime}_{l}\}_{l\in\mathbb{N}}$ of $\{x_{l}\}_{l\in\mathbb{N}}$ such that $z\in\overline{\mathcal{W}}(x^{\prime}_{l},\Delta)$ holds for all $l\in\mathbb{N}$ . Thus, $z\in\limsup_{l\to\infty}\overline{\mathcal{W}}(x_{l},\Delta)$ by definition and (4) is satisfied.

Invoking (4) and the previous estimates we obtain

[TABLE]

where the second inequality holds due to the outer semicontinuity of $\overline{\mathcal{W}}$ and the monotonicity of the indicator function. Consequently, $M_{\Delta}$ is upper semicontinuous at $x_{0}$ for any $\Delta\in D$ .

By $U\subseteq\mathrm{int}\;F_{Z}$ and the arguments used in the proof of Lemma 3.6,

[TABLE]

holds for any $x\in U$ . By $\mathcal{W}(x,\Delta_{1})\cap\mathcal{W}(x,\Delta_{2})=\emptyset$ for any $\Delta_{1},\Delta_{2}\in D$ satisfying $\Delta_{1}\neq\Delta_{2}$ , (5) implies

[TABLE]

for any $x\in U$ . Consequently, as $M_{\Delta}$ is upper semicontinuous at $x_{0}$ for any $\Delta\in D$ , we obtain that

[TABLE]

is representable as a sum of functions that are lower semicontinuous at $x_{0}$ . Thus, $M_{\Delta}$ is continuous at $x_{0}$ for any $\Delta\in D$ , which implies the continuity of

[TABLE]

at $x_{0}$ . ∎

When working with the expected excess, the inner maximum may cause additional points of nondifferentiability.
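The extra kinks introduced by the inner maximum can be seen on a one-dimensional instance. In the Python sketch below, the lower-level problem $\min_{y}\{-y\;|\;y\leq x+z\}$ with $c=0$ and $q=1$ is an illustrative assumption, chosen so that $f(x,z)=x+z$ is smooth in $x$. Any nondifferentiability of the expected excess then comes only from scenarios with $f(x_{0},z)=\eta$ that carry positive probability.

```python
def f(x, z):
    """Leader outcome for the illustrative instance: the follower's unique
    solution of min { -y | y <= x + z } is y = x + z, and c = 0, q = 1."""
    return x + z


# Hypothetical two-scenario distribution: (realization z, probability).
SCENARIOS = [(0.0, 0.5), (2.0, 0.5)]


def expected_excess(x, eta):
    """Q_EE_eta(x) = E[max(f(x, Z) - eta, 0)] for the discrete distribution."""
    return sum(p * max(f(x, z) - eta, 0.0) for z, p in SCENARIOS)


def one_sided_slopes(x, eta, h=1e-6):
    """Left and right difference quotients of x -> Q_EE_eta(x)."""
    here = expected_excess(x, eta)
    left = (here - expected_excess(x - h, eta)) / h
    right = (expected_excess(x + h, eta) - here) / h
    return left, right


print(one_sided_slopes(1.0, eta=1.0))  # scenario z = 0 satisfies f(1, 0) = eta
print(one_sided_slopes(0.0, eta=1.0))  # no scenario hits the target level
```

At $x_{0}=1$ the scenario $z=0$ lies exactly on the target level and the one-sided slopes disagree, while at $x_{0}=0$ they coincide.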

Theorem 3.8.

Assume $\mathrm{dom}\;f\neq\emptyset$ , $\mu_{Z}\in\mathcal{M}^{1}_{s}$ , and let $x_{0}\in\mathrm{int}\;F_{Z}$ and $\eta\in\mathbb{R}$ be such that $\mu_{Z}[\mathcal{N}_{x_{0}}\cup\mathcal{L}(x_{0},\eta)]=0$ , where

[TABLE]

Then $Q_{\mathrm{EE}_{\eta}}$ is continuously differentiable at $x_{0}$ .

Proof.

Consider the mapping $g_{\eta}:\mathbb{R}^{n}\times\mathbb{R}^{s}\to\overline{\mathbb{R}}$ given by

[TABLE]

which is finite and Lipschitz continuous on $F$ by Lemma 3.1. Consider any fixed $z_{0}\in\mathrm{supp}\;\mu_{Z}\setminus\big{(}\mathcal{N}_{x_{0}}\cup\mathcal{L}(x_{0},\eta)\big{)}$ . If $f(x_{0},z_{0})\neq\eta$ , there is a neighborhood $U$ of $x_{0}$ such that either $g_{\eta}(x,z_{0})=f(x,z_{0})-\eta$ for all $x\in U$ or $g_{\eta}(x,z_{0})=0$ for all $x\in U$ . In both cases $g_{\eta}(\cdot,z_{0})$ is continuously differentiable at $x_{0}$ by Theorem 3.7.

Now consider the case where $f(x_{0},z_{0})=\eta$ . The proof of Lemma 3.6 shows that there is some $\hat{A}_{B}\in\mathcal{A}^{\ast}$ such that $(x_{0},z_{0})\in\mathrm{int}\;\mathcal{R}(\hat{A}_{B})$ . In particular, we have $c^{\top}x_{0}+\hat{q}_{B}^{\top}\hat{A}_{B}^{-1}(Tx_{0}+z_{0})=\eta$ and $z_{0}\notin\mathcal{L}(x_{0},\eta)$ implies $\hat{q}_{B}^{\top}\hat{A}_{B}^{-1}=0$ . Thus, $\eta=0$ and there is a neighborhood $V$ of $x_{0}$ such that $g_{\eta}(x,z_{0})=\max\{\hat{q}_{B}^{\top}\hat{A}_{B}^{-1}(Tx+z_{0}),0\}=0$ for all $x\in V$ . Hence, $g_{\eta}(\cdot,z_{0})$ is continuously differentiable at $x_{0}$ .

Invoking Lemma 7.2 and the above considerations, the differentiability of $Q_{\mathrm{EE}_{\eta}}$ and the continuity of

[TABLE]

at $x_{0}$ can be shown by a straightforward extension of the arguments used in the proof of Theorem 3.7. ∎

Theorem 3.9.

Assume $\mathrm{dom}\;f\neq\emptyset$ , $\mu_{Z}\in\mathcal{M}^{1}_{s}$ , and let $x_{0}\in\mathrm{int}\;F_{Z}$ be such that $Q_{\mathbb{E}}(x_{0})\neq 0$ and $\mu_{Z}[\mathcal{N}_{x_{0}}\cup\mathcal{L}(x_{0},Q_{\mathbb{E}}(x_{0}))]=0$ . Then $Q_{\mathrm{SD}_{\rho}}$ is continuously differentiable at $x_{0}$ for any $\rho\in[0,1)$ .

Proof.

Fix any $\rho\in[0,1)$ . By Theorem 3.7 and the definition of $Q_{\mathrm{SD}_{\rho}}$ it is sufficient to show differentiability of the mapping $x\mapsto Q_{\mathrm{EE}_{Q_{\mathbb{E}}(x)}}(x)$ . Consider the function $g:\mathbb{R}^{n}\times\mathbb{R}^{s}\to\overline{\mathbb{R}}$ defined by

[TABLE]

which is finite and Lipschitz continuous on $F$ by Lemma 3.1 and Theorem 3.4. Fix any $z_{0}\in\mathrm{supp}\;\mu_{Z}\setminus\big{(}\mathcal{N}_{x_{0}}\cup\mathcal{L}(x_{0},Q_{\mathbb{E}}(x_{0}))\big{)}$ and suppose that $f(x_{0},z_{0})=Q_{\mathbb{E}}(x_{0})$ . By the proof of Lemma 3.6 there is some $\hat{A}_{B}\in\mathcal{A}^{\ast}$ such that $(x_{0},z_{0})\in\mathrm{int}\;\mathcal{R}(\hat{A}_{B})$ . In particular, we have $\hat{q}_{B}^{\top}\hat{A}_{B}^{-1}(Tx_{0}+z_{0})=Q_{\mathbb{E}}(x_{0})$ and $z_{0}\notin\mathcal{L}(x_{0},Q_{\mathbb{E}}(x_{0}))$ implies $\hat{q}_{B}^{\top}\hat{A}_{B}^{-1}=0$ . Hence, $Q_{\mathbb{E}}(x_{0})=f(x_{0},z_{0})=0$ , which contradicts the assumptions.

Thus, $f(x_{0},z_{0})\neq Q_{\mathbb{E}}(x_{0})$ and there is a neighborhood $U$ of $x_{0}$ such that either $g(x,z_{0})=f(x,z_{0})-Q_{\mathbb{E}}(x_{0})$ for all $x\in U$ or $g(x,z_{0})=0$ for all $x\in U$ . In both cases $g(\cdot,z_{0})$ is continuously differentiable at $x_{0}$ by Theorem 3.7.

Consequently, the differentiability of $Q_{\mathrm{SD}_{\rho}}$ and the continuity of

[TABLE]

at $x_{0}$ can be shown by a straightforward extension of the arguments used in the proof of Theorem 3.7. ∎

Corollary 3.10.

Assume $\mathrm{dom}\;f\neq\emptyset$ and that $\mu_{Z}\in\mathcal{M}^{1}_{s}$ is absolutely continuous with respect to the Lebesgue measure. Fix any $\eta\in\mathbb{R}$ , then $Q_{\mathbb{E}}$ and $Q_{\mathrm{EE}_{\eta}}$ are continuously differentiable at any $x_{0}\in\mathrm{int}\;F_{Z}$ . Furthermore, for any $\rho\in[0,1)$ , $Q_{\mathrm{SD}_{\rho}}$ is continuously differentiable at any $x_{0}\in\mathrm{int}\;F_{Z}$ satisfying $Q_{\mathbb{E}}(x_{0})\neq 0$ .

Proof.

$Q_{\mathbb{E}}$ : Since $\mathcal{N}_{x_{0}}$ is a finite union of affine hyperplanes, i.e. a set with Lebesgue measure zero, $\mu_{Z}[\mathcal{N}_{x_{0}}]=0$ holds for all $x_{0}\in\mathrm{int}\;F_{Z}$ and the statement is a direct consequence of Theorem 3.7.

$Q_{\mathrm{EE}_{\eta}}$ : By definition, $\mathcal{L}(x_{0},\eta)$ is a finite union of affine hyperplanes, which implies $\mu_{Z}[\mathcal{N}_{x_{0}}\cup\mathcal{L}(x_{0},\eta)]=0$ for any $x_{0}\in\mathrm{int}\;F_{Z}$ and Theorem 3.8 is applicable.

$Q_{\mathrm{SD}_{\rho}}$ : For any fixed $x_{0}$ , $\mathcal{L}(x_{0},Q_{\mathbb{E}}(x_{0}))$ is a set of Lebesgue measure zero and the statement follows from Theorem 3.9. ∎
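The role of the null-set conditions in Theorems 3.7 to 3.9 and Corollary 3.10 can be illustrated numerically on a toy integrand. The sketch below is not taken from the paper: the function `g(x, z) = max(x + z, 0)`, the two distributions, and all helper names are assumptions chosen for illustration. At $x_{0}=0$ the integrand fails to be differentiable in $x$ exactly for $z=0$. If the whole mass sits on this point (Dirac measure at $0$), the one-sided difference quotients of $x\mapsto\int g(x,z)\,\mu(dz)$ disagree; for the uniform distribution on $[-1,1]$, which is absolutely continuous, they agree up to discretization error.

```python
def g(x, z):
    # Toy piecewise linear integrand (hypothetical, not the paper's f).
    return max(x + z, 0.0)

def h_dirac(x):
    # Integral of g(x, .) with respect to the Dirac measure at z = 0.
    return g(x, 0.0)

def h_uniform(x, cells=20000):
    # Midpoint rule for the integral of g(x, .) with respect to Uniform[-1, 1].
    width = 2.0 / cells
    total = 0.0
    for i in range(cells):
        z = -1.0 + (i + 0.5) * width
        total += g(x, z) * width / 2.0  # the density of Uniform[-1, 1] is 1/2
    return total

def one_sided_slopes(h, x0=0.0, step=1e-3):
    # Left and right difference quotients of h at x0.
    left = (h(x0) - h(x0 - step)) / step
    right = (h(x0 + step) - h(x0)) / step
    return left, right

dirac_left, dirac_right = one_sided_slopes(h_dirac)
unif_left, unif_right = one_sided_slopes(h_uniform)
print("Dirac at 0:    ", dirac_left, dirac_right)
print("Uniform[-1, 1]:", unif_left, unif_right)
```

For the Dirac measure the left and right quotients are $0$ and $1$, so the integral has a kink at $x_{0}$. For the uniform distribution both quotients are close to $1/2$, the probability of $\{z>0\}$.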

The previous results give sufficient conditions for differentiability of the objective function of problem (1). In the presence of differentiability, necessary optimality conditions can be formulated in terms of directional derivatives.

Proposition 3.11.

Assume $\mathrm{dom}\;f\neq\emptyset$ , $\mu_{Z}\in\mathcal{M}^{p}_{s}$ , and $X\subseteq F_{Z}$ . Furthermore, let $x_{0}\in X$ be a local minimizer of problem (1) and assume that $Q_{\mathcal{R}}$ is differentiable at $x_{0}$ . Then

[TABLE]

holds for any feasible direction

[TABLE]

Proof.

$\mathrm{dom}\;f\neq\emptyset$ , $\mu_{Z}\in\mathcal{M}^{p}_{s}$ and $X\subseteq F_{Z}$ imply that $Q_{\mathcal{R}}$ is real-valued on $X$ by Corollary 4.9 below. For a proof of the necessity of (6) we refer to [3, Proposition 2.1.2]. ∎

Corollary 3.12.

Assume $\mathrm{dom}\;f\neq\emptyset$ , $X\subseteq F_{Z}$ , and let $\mu_{Z}\in\mathcal{M}^{1}_{s}$ be absolutely continuous with respect to the Lebesgue measure. Furthermore, assume

[TABLE]

then any local minimizer of

[TABLE]

is an element of $X\setminus\mathrm{int}\;X$ .

Proof.

Suppose that $x_{0}\in\mathrm{int}\;X$ is a local minimizer of (1), then Corollary 3.10 and Proposition 3.11 yield $0=Q_{\mathbb{E}}^{\prime}(x_{0})$ as $\mathcal{D}(x_{0},X)=\mathbb{R}^{n}$ . Invoking the proof of Theorem 3.7 we have

[TABLE]

and thus $-c^{\top}\in\mathrm{conv}\;D$ , which contradicts the assumptions. ∎

4 A stability result for bilevel stochastic linear problems

The aim of this section is to establish a qualitative stability result for the bilevel stochastic linear problem (1) with respect to perturbations of the underlying probability measure. Taking into account that the support of the perturbed measure may differ from the original support, we shall assume that $X\times\mathbb{R}^{s}\subseteq F$ to ensure that the objective function of (1) remains well defined.

Throughout this section, we shall consider the general case where $\mathcal{R}$ is law-invariant and there exists some $p\geq 1$ such that the restriction $\mathcal{R}|_{L^{p}(\Omega,\mathcal{F},\mathbb{P})}$ is a real-valued convex risk measure. Furthermore, for the sake of notational simplicity, we assume that the probability space $(\Omega,\mathcal{F},\mathbb{P})$ is atomless (cf. Remark 4.1 below). Then for any $x\in X$ and $\mu\in\mathcal{M}^{p}_{s}$ , Lemma 3.1 implies $(\delta_{x}\otimes\mu)\circ f^{-1}\in\mathcal{M}^{p}_{1}$ and the atomlessness ensures that there exists some $Y_{(x,\mu)}\in L^{p}(\Omega,\mathcal{F},\mathbb{P})$ such that $\mathbb{P}\circ Y_{(x,\mu)}^{-1}=(\delta_{x}\otimes\mu)\circ f^{-1}$ . Thus, we may consider the mapping $\mathcal{Q}_{\mathcal{R}}:X\times\mathcal{M}^{p}_{s}\to\mathbb{R}$ ,

$$\mathcal{Q}_{\mathcal{R}}(x,\mu):=\mathcal{R}[Y_{(x,\mu)}].$$

Note that the specific choice of $Y_{(x,\mu)}$ does not matter due to the law-invariance of $\mathcal{R}$ .

Remark 4.1.

The assumption that $(\Omega,\mathcal{F},\mathbb{P})$ is atomless does not entail a loss of generality: We may just fix an arbitrary atomless probability space $(\overline{\Omega},\overline{\mathcal{F}},\overline{\mathbb{P}})$, consider a law-invariant convex risk measure $\overline{\mathcal{R}}:L^{p}(\overline{\Omega},\overline{\mathcal{F}},\overline{\mathbb{P}})\to\mathbb{R}$ and define the restriction $\mathcal{R}|_{L^{p}(\Omega,\mathcal{F},\mathbb{P})}$ via $\mathcal{R}[Y]=\overline{\mathcal{R}}[\overline{Y}]$, where $\overline{Y}$ is an arbitrary random variable in $L^{p}(\overline{\Omega},\overline{\mathcal{F}},\overline{\mathbb{P}})$ satisfying $\overline{\mathbb{P}}\circ\overline{Y}^{-1}=\mathbb{P}\circ Y^{-1}$.

Consider the parametric optimization problem

[TABLE]

As ( $\mathrm{P}_{\mu}$ ) may be nonconvex, we shall pay special attention to sets of local optimal solutions. For any open set $V\subseteq\mathbb{R}^{n}$ we introduce the optimal value function $\varphi_{V}:\mathcal{M}^{p}_{s}\to\overline{\mathbb{R}}$ ,

[TABLE]

as well as the localized optimal solution set mapping $\phi_{V}:\mathcal{M}^{p}_{s}\rightrightarrows\mathbb{R}^{n}$ ,

[TABLE]

It is well known that additional assumptions are needed when studying stability of local solutions.

Definition 4.2.

Given $\mu\in\mathcal{M}^{p}_{s}$ and an open set $V\subseteq\mathbb{R}^{n}$ , $\phi_{V}(\mu)$ is called a complete local minimizing (CLM) set of ( $\mathrm{P}_{\mu}$ ) w.r.t. $V$ if $\emptyset\neq\phi_{V}(\mu)\subseteq V$ .

Remark 4.3.

The set of global optimal solutions $\phi_{\mathbb{R}^{n}}(\mu)$ and any set of isolated minimizers are CLM sets. However, in general, sets of strict local minimizers may fail to be CLM sets (cf. [36]).

In the following, we shall equip $\mathcal{P}(\mathbb{R}^{s})$ with the topology of weak convergence, i.e. the topology where a sequence $\{\mu_{l}\}_{l\in\mathbb{N}}\subset\mathcal{P}(\mathbb{R}^{s})$ converges weakly to $\mu\in\mathcal{P}(\mathbb{R}^{s})$ , written $\mu_{l}\stackrel{{\scriptstyle w}}{{\rightarrow}}\mu$ , iff

$$\lim_{l\to\infty}\int_{\mathbb{R}^{s}}h(t)\,\mu_{l}(dt)=\int_{\mathbb{R}^{s}}h(t)\,\mu(dt)$$

holds for any bounded continuous function $h:\mathbb{R}^{s}\to\mathbb{R}$ (cf. [4]). The example below shows that even $\varphi_{\mathbb{R}^{n}}$ may fail to be weakly continuous on the entire space $\mathcal{P}(\mathbb{R}^{s})$ .

Example 4.4.

The problem

[TABLE]

arises from a bilevel stochastic linear problem, where $\Psi(x,z)=\{z\}$ holds for any $(x,z)$. Assume that $\mu=\mathbb{P}\circ Z^{-1}=\delta_{0}$ is the Dirac measure at $0$. Then the above problem can be rewritten as $\min_{x}\{x\;|\;0\leq x\leq 1\}$ and its optimal value is $0$.

However, while the sequence $\mu_{l}:=(1-\frac{1}{l})\delta_{0}+\frac{1}{l}\delta_{l}$ converges weakly to $\delta_{0}$ , replacing $\mu$ with $\mu_{l}$ yields the problem

[TABLE]

whose optimal value is equal to $1$ for any $l\in\mathbb{N}$ .
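The mechanism behind Example 4.4 can be checked numerically. The following sketch (helper names and the test function $\arctan$ are illustrative assumptions) represents $\mu_{l}=(1-\frac{1}{l})\delta_{0}+\frac{1}{l}\delta_{l}$ by its atoms and weights. The integral of a bounded continuous function tends to its value under $\delta_{0}$, as weak convergence requires, whereas the first moment equals $1$ for every $l$, so any quantity that depends on the mean of $Z$ cannot converge to its value under $\delta_{0}$.

```python
import math

def mu(l):
    # Atoms and weights of mu_l = (1 - 1/l) * delta_0 + (1/l) * delta_l.
    return [(0.0, 1.0 - 1.0 / l), (float(l), 1.0 / l)]

def integral(h, measure):
    # Integral of h with respect to a finitely supported measure.
    return sum(weight * h(atom) for atom, weight in measure)

def first_moment(measure):
    # The identity is continuous but unbounded, so weak convergence
    # gives no control over this integral.
    return integral(lambda t: t, measure)

for l in (1, 10, 100, 1000):
    print(l, integral(math.atan, mu(l)), first_moment(mu(l)))
```

The second column equals $\arctan(l)/l$ and decreases to $\arctan(0)=0$, while the third column stays at $1$. This is the kind of escaping mass that the uniform integrability requirement of Definition 4.5 rules out.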

In the present work, we shall follow the approach of [7] and confine the stability analysis to locally uniformly $\|\cdot\|^{p}$ -integrating sets.

Definition 4.5.

A set $\mathcal{M}\subseteq\mathcal{M}^{p}_{s}$ is said to be locally uniformly $\|\cdot\|^{p}$ -integrating iff for any $\mu\in\mathcal{M}$ and any $\epsilon>0$ there exists some open neighborhood $\mathcal{N}$ of $\mu$ w.r.t. the topology of weak convergence such that

[TABLE]

A detailed discussion of locally uniformly $\|\cdot\|^{p}$ -integrating sets is provided in [15], [26], [27], and [28]. The following examples demonstrate the relevance of the concept.

Example 4.6.

(a) Fix $\kappa,\epsilon>0$ . Then by [15, Corollary A.47, (c)], the set

[TABLE]

of Borel probability measures with uniformly bounded moments of order $p+\epsilon$ is locally uniformly $\|\cdot\|^{p}$ -integrating.

(b) Fix any compact set $\Xi\subset\mathbb{R}^{s}$ . By [15, Corollary A.47, (b)], the set

[TABLE]

of Borel probability measures whose support is contained in $\Xi$ is locally uniformly $\|\cdot\|^{p}$ -integrating.

(c) Any singleton $\{\mu\}\subset\mathcal{M}^{p}_{s}$ is locally uniformly $\|\cdot\|^{p}$ -integrating by [27, Lemma 5.2].

Theorem 4.7.

Assume $\mathrm{dom}\;f\neq\emptyset$ and $X\times\mathbb{R}^{s}\subseteq F$ . Let $\mathcal{M}\subseteq\mathcal{M}^{p}_{s}$ be locally uniformly $\|\cdot\|^{p}$ -integrating, then

(a) $\mathcal{Q}_{\mathcal{R}}|_{X\times\mathcal{M}}$ is real-valued and weakly continuous.

(b) $\varphi_{\mathbb{R}^{n}}|_{\mathcal{M}}$ is weakly upper semicontinuous.

In addition, assume that $\mu_{0}\in\mathcal{M}$ is such that $\phi_{V}(\mu_{0})$ is a CLM set of ( $\mathrm{P}_{\mu_{0}}$ ) w.r.t. some open bounded set $V\subset\mathbb{R}^{n}$. Then the following statements hold true:

(c) $\varphi_{V}|_{\mathcal{M}}$ is weakly continuous at $\mu_{0}$.

(d) $\phi_{V}|_{\mathcal{M}}$ is weakly Berge upper semicontinuous at $\mu_{0}$, i.e. for any open set $\mathcal{O}\subseteq\mathbb{R}^{n}$ with $\phi_{V}(\mu_{0})\subseteq\mathcal{O}$ there exists a weakly open neighborhood $\mathcal{N}$ of $\mu_{0}$ such that $\phi_{V}(\mu)\subseteq\mathcal{O}$ for all $\mu\in\mathcal{N}\cap\mathcal{M}$.

(e) There exists some weakly open neighborhood $\mathcal{U}$ of $\mu_{0}$ such that $\phi_{V}(\mu)$ is a CLM set for ( $\mathrm{P}_{\mu}$ ) w.r.t. $V$ for any $\mu\in\mathcal{U}\cap\mathcal{M}$.

Proof.

Fix any $x_{0}\in X$ . By Lemma 3.1, $f$ is Lipschitz continuous on $X\times\mathbb{R}^{s}$ . Thus, there exists a constant $L>0$ such that

[TABLE]

and the result follows from [7, Corollary 2.4.]. ∎

Remark 4.8.

The assumption $X\times\mathbb{R}^{s}\subseteq F$ is equivalent to $F=\mathbb{R}^{n}\times\mathbb{R}^{s}$ and holds if and only if there is some $y\in\mathbb{R}^{m}$ such that $Ay<0$ . By Gordan’s Theorem ([19]), the latter holds iff $u=0$ is the only nonnegative solution to $A^{\top}u=0$ . Under this condition, the feasible set of the lower level is full dimensional for any leader’s decision $x$ and any parameter $z$ .
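The alternative in Gordan's Theorem can be seen on two small matrices with $m=1$ and two rows. Both matrices are illustrative choices, not taken from the paper.

```latex
% Case 1: a strictly feasible point exists.
A = \begin{pmatrix} 1 \\ 2 \end{pmatrix}: \quad
Ay < 0 \text{ for } y = -1, \qquad
A^{\top}u = u_{1} + 2u_{2} = 0,\; u \geq 0 \;\Longrightarrow\; u = 0.

% Case 2: no strictly feasible point exists.
A = \begin{pmatrix} 1 \\ -1 \end{pmatrix}: \quad
Ay < 0 \iff y < 0 \text{ and } -y < 0 \quad (\text{impossible}), \qquad
A^{\top}u = u_{1} - u_{2} = 0 \text{ has the nonnegative solution } u = (1,1)^{\top} \neq 0.
```

In the second case the system $y\leq b_{1}$, $-y\leq b_{2}$ is infeasible whenever $b_{1}+b_{2}<0$, so the lower level feasible set is empty for some right-hand sides. In the first case $y\leq b_{1}$, $2y\leq b_{2}$ is solvable for every right-hand side.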

If the underlying distribution is fixed, the assumptions of Theorem 4.7 (a) can be weakened significantly.

Corollary 4.9.

Assume $\mathrm{dom}\;f\neq\emptyset$ and $\mu_{Z}\in\mathcal{M}^{p}_{s}$ . Then $Q_{\mathcal{R}}$ is real-valued and continuous on $F_{Z}$ . In addition, assume that $X\subseteq F_{Z}$ is nonempty and compact, then problem (1) is solvable.

Proof.

The set $\{\mu_{Z}\}$ is locally uniformly $\|\cdot\|^{p}$ -integrating by Example 4.6. Thus, continuity of $Q_{\mathcal{R}}(\cdot)=\mathcal{Q}_{\mathcal{R}}(\cdot,\mu_{Z})$ can be established as in the proof of Theorem 4.7 (a) and the solvability of (1) is a direct consequence of the compactness of $X$ . ∎

Example 4.10.

(a) The assumptions of Section 2 are fulfilled for the expected excess of order $p\geq 1$ given by

[TABLE]

where $\eta\in\mathbb{R}$ is a fixed target level (cf. [42, Example 6.22]). Thus, the mapping $Q_{\mathrm{EE}_{\eta}^{p}}$ is continuous on $F_{Z}$ under the assumptions of Corollary 4.9.

(b) The mean upper semideviation of order $p\geq 1$ given by

[TABLE]

is a law-invariant coherent risk measure for any $\rho\in[0,1)$ by [42, Example 6.20]. Thus, Corollary 4.9 gives sufficient conditions for continuity of $Q_{\mathrm{SD}_{\rho}^{p}}$ .

Remark 4.11.

All results of Sections 3 and 4 can be easily extended to the pessimistic approach to bilevel stochastic linear optimization, where $f$ takes the form

[TABLE]

As any Borel probability measure is the weak limit of a sequence of measures having finite support, Theorem 4.7 justifies an approach where the true underlying measure is approximated by a sequence of finite discrete ones.
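A minimal sketch of such an approximation is given below. It rests on assumptions that are not part of the paper: $Z$ is taken to be uniformly distributed on $[0,1]$, the discretization places equiprobable atoms at the midpoints of a uniform grid, and the integrand $\max\{z-\frac{1}{2},0\}$ merely stands in for a leader's outcome. All approximating measures are supported in the compact set $[0,1]$, which is the situation of Example 4.6 (b).

```python
def discretize_uniform(num_atoms):
    # Equiprobable atoms at the midpoints of a uniform grid on [0, 1].
    return [((k + 0.5) / num_atoms, 1.0 / num_atoms) for k in range(num_atoms)]

def expected_excess(measure, eta):
    # E[max(Z - eta, 0)] for a finitely supported measure.
    return sum(weight * max(atom - eta, 0.0) for atom, weight in measure)

exact = 0.125  # integral of (z - 0.5) over [0.5, 1]
for num_atoms in (5, 50, 500):
    approx = expected_excess(discretize_uniform(num_atoms), 0.5)
    print(num_atoms, approx, abs(approx - exact))
```

With five atoms the approximation is $0.12$. For the finer grids it agrees with the exact value $0.125$ up to rounding.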

5 Finite discrete distributions

Throughout this section, we shall assume that the underlying random vector $Z$ is discrete with a finite number of realizations $Z_{1},\ldots,Z_{K}\in\mathbb{R}^{s}$ and respective probabilities $\pi_{1},\ldots,\pi_{K}\in(0,1]$ . Let $I$ denote the index set $\{1,\ldots,K\}$ , then $F_{Z}$ takes the form

[TABLE]

Suppose that $x_{0}\in X$ is such that $\{y\in\mathbb{R}^{m}\;|\;Ay\leq Tx_{0}+Z_{k}\}=\emptyset$ holds for some $k\in I$. Then the probability of $f(x_{0},Z(\omega))=\infty$ is at least $\pi_{k}>0$, i.e. $x_{0}$ should be considered as infeasible for problem (1). Consequently, $X\subseteq F_{Z}$ can be understood as an induced constraint. Note that $X\cap F_{Z}$ is a polyhedron if $X$ is a polyhedron.

We shall show that for models involving the expectation, the expected excess or the mean upper semideviation, problem (1) can be reduced to a standard bilevel linear program.

Theorem 5.1 (Expectation).

Assume $\mathrm{dom}\;f\neq\emptyset$ and let $X\subseteq F_{Z}$ be a polyhedron, then the risk neutral bilevel stochastic linear problem

[TABLE]

is equivalent to the optimistic bilevel linear program

[TABLE]

where $\Psi_{\mathbb{E}}:\mathbb{R}^{n}\rightrightarrows\mathbb{R}^{Km}$ is given by

[TABLE]

Proof.

We have

[TABLE]

and the result follows from $\Psi_{\mathbb{E}}(x)=\Psi(x,Z_{1})\times\ldots\times\Psi(x,Z_{K})$ . ∎

Remark 5.2.

The proof of Theorem 5.1 shows that the inner minimization problem in (8) can be decomposed into $K$ problems of similar structure.
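The identity behind this decomposition can be written out as follows. The display is a reconstruction in the notation of the surrounding text, not a quotation of the paper's formula. It uses only that the objective is separable in the scenario copies $y_{1},\ldots,y_{K}$, that $\Psi_{\mathbb{E}}(x)$ is a Cartesian product, and that all $\pi_{k}$ are positive.

```latex
\min_{(y_{1},\ldots,y_{K})}\Big\{\sum_{k\in I}\pi_{k}\,q^{\top}y_{k}\;\Big|\;
(y_{1},\ldots,y_{K})\in\Psi(x,Z_{1})\times\ldots\times\Psi(x,Z_{K})\Big\}
=\sum_{k\in I}\pi_{k}\,\min_{y_{k}}\big\{q^{\top}y_{k}\;\big|\;y_{k}\in\Psi(x,Z_{k})\big\}
```

Adding $c^{\top}x$ to both sides gives $\sum_{k\in I}\pi_{k}f(x,Z_{k})$ on the right, so each of the $K$ inner problems can be solved independently for fixed $x$.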

Theorem 5.3 (Expected excess).

Assume $\mathrm{dom}\;f\neq\emptyset$ and let $X\subseteq F_{Z}$ be a polyhedron, then for any $\eta\in\mathbb{R}$ , the risk-averse bilevel stochastic linear problem

[TABLE]

is equivalent to the optimistic bilevel linear program

[TABLE]

where $\Psi_{\mathrm{EE}_{\eta}}:\mathbb{R}^{n}\rightrightarrows\mathbb{R}^{Km+K}$ is given by

[TABLE]

Proof.

We have $Q_{\text{EE}_{\eta}}(x)=\sum_{k\in I}\pi_{k}g_{k}(x)$ , where

[TABLE]

holds for any $(x,k)\in X\times I$ . Thus,

[TABLE]

which completes the proof. ∎

Remark 5.4.

Let $\Psi_{\mathrm{EE}_{\eta,k}}:X\rightrightarrows\mathbb{R}^{m+1}$ be given by

[TABLE]

then $\Psi_{\mathrm{EE}_{\eta}}(x)$ admits the representation

[TABLE]

Thus, the inner minimization problem in (9) decomposes into $K$ problems of similar structure.
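For a fixed leader decision $x$, the sum representation $Q_{\mathrm{EE}_{\eta}}(x)=\sum_{k\in I}\pi_{k}g_{k}(x)$ is easy to evaluate once the scenario outcomes $f(x,Z_{k})$ are known. The sketch below assumes the standard form $g_{k}(x)=\max\{f(x,Z_{k})-\eta,0\}$ of the excess term. The outcome values and probabilities are hypothetical data, and the function names are not from the paper.

```python
def excess_terms(outcomes, eta):
    # Scenario-wise excess max(f_k - eta, 0); each term uses one scenario only.
    return [max(f_k - eta, 0.0) for f_k in outcomes]

def expected_excess(outcomes, probabilities, eta):
    # Q_EE(x) = sum_k pi_k * max(f(x, Z_k) - eta, 0) for a fixed x.
    return sum(p * g for p, g in zip(probabilities, excess_terms(outcomes, eta)))

outcomes = [1.0, 4.0, 6.0]       # hypothetical values f(x, Z_k)
probabilities = [0.5, 0.3, 0.2]  # hypothetical pi_k, summing to 1
print(excess_terms(outcomes, eta=3.0))
print(expected_excess(outcomes, probabilities, eta=3.0))
```

Here only the second and third scenario exceed the target level $\eta=3$, and the result is $0.3\cdot 1+0.2\cdot 3=0.9$. Each excess term depends on a single scenario, which matches the scenario-wise decomposition.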

Theorem 5.5 (Mean upper semideviation).

Assume $\mathrm{dom}\;f\neq\emptyset$ and let $X\subseteq F_{Z}$ be a polyhedron, then for any $\rho\in[0,1)$ , the risk-averse bilevel stochastic linear problem

[TABLE]

is equivalent to the optimistic bilevel linear program

[TABLE]

where $\Psi_{\mathrm{SD}_{\rho}}:\mathbb{R}^{n}\rightrightarrows\mathbb{R}^{Km+K}$ is given by

[TABLE]

Proof.

By

[TABLE]

and the representation of $Q_{\mathbb{E}}$ that was established in the proof of Theorem 5.1, we have

[TABLE]

which completes the proof. ∎

Remark 5.6.

The inner minimization problem in (10) does not decompose scenariowise due to the $K$ coupling constraints $v_{k}\geq\sum_{j\in I}\pi_{j}q^{\top}y_{j}$ for $k\in I$ in the description of $Q_{\mathrm{SD}_{\rho}}(x)$ .
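The coupling can be made visible by evaluating the mean upper semideviation for given scenario outcomes. The sketch assumes the standard order-one definition $\mathrm{SD}_{\rho}[Y]=\mathbb{E}[Y]+\rho\,\mathbb{E}[\max\{Y-\mathbb{E}[Y],0\}]$ (cf. [42, Example 6.20]). The data and function names are hypothetical. The second function uses the equivalent form $(1-\rho)\mathbb{E}[Y]+\rho\,\mathbb{E}[\max\{Y,\mathbb{E}[Y]\}]$, in which every auxiliary value $v_{k}$ depends on all scenarios through the mean.

```python
def mean(outcomes, probabilities):
    return sum(p * f for f, p in zip(outcomes, probabilities))

def mean_upper_semideviation(outcomes, probabilities, rho):
    # E[Y] + rho * E[max(Y - E[Y], 0)]
    m = mean(outcomes, probabilities)
    excess = sum(p * max(f - m, 0.0) for f, p in zip(outcomes, probabilities))
    return m + rho * excess

def mean_upper_semideviation_coupled(outcomes, probabilities, rho):
    # (1 - rho) * E[Y] + rho * sum_k pi_k * v_k with v_k = max(f_k, E[Y]).
    # Every v_k involves the mean, i.e. all scenarios at once.
    m = mean(outcomes, probabilities)
    v = [max(f, m) for f in outcomes]
    return (1.0 - rho) * m + rho * mean(v, probabilities)

outcomes = [1.0, 4.0, 6.0]       # hypothetical values f(x, Z_k)
probabilities = [0.5, 0.3, 0.2]  # hypothetical pi_k, summing to 1
sd_direct = mean_upper_semideviation(outcomes, probabilities, rho=0.5)
sd_coupled = mean_upper_semideviation_coupled(outcomes, probabilities, rho=0.5)
print(sd_direct, sd_coupled)
```

Both evaluations give $3.375$ for this data (mean $2.9$, expected excess over the mean $0.95$). Changing a single outcome shifts the mean and therefore every $v_{k}$, which is why the inner problem cannot be split by scenario.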

Finally, we shall consider models involving the Conditional Value at Risk.

Theorem 5.7 (Conditional Value at Risk).

Assume $\mathrm{dom}\;f\neq\emptyset$ and let $X\subseteq F_{Z}$ be a polyhedron, then for any $\alpha\in(0,1)$ , the risk-averse bilevel stochastic linear problem

[TABLE]

is equivalent to

[TABLE]

Proof.

As

[TABLE]

the result follows directly from the representation of $Q_{\mathrm{EE}_{\eta}}(x)$ that was established in the proof of Theorem 5.3. ∎

Remark 5.8.

Every evaluation of the objective function in (11) corresponds to solving a bilevel linear problem with scalar upper level variable $\eta$.
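For fixed scenario outcomes, the outer minimization over $\eta$ can be carried out by enumeration. The sketch below assumes the Rockafellar-Uryasev convention $\mathrm{CVaR}_{\alpha}[Y]=\min_{\eta}\{\eta+\frac{1}{1-\alpha}\mathrm{EE}_{\eta}[Y]\}$. The paper's own parametrization of $\alpha$ may differ, and the data and function names are hypothetical. The objective is convex and piecewise linear in $\eta$ with kinks at the outcomes, so a minimizer can be found among the outcomes themselves.

```python
def expected_excess(outcomes, probabilities, eta):
    # E[max(Y - eta, 0)] for a finite discrete distribution.
    return sum(p * max(f - eta, 0.0) for f, p in zip(outcomes, probabilities))

def cvar(outcomes, probabilities, alpha):
    # min over eta of eta + E[max(Y - eta, 0)] / (1 - alpha), alpha in (0, 1).
    # Assumed convention; candidates for eta are the kinks of the objective.
    return min(
        eta + expected_excess(outcomes, probabilities, eta) / (1.0 - alpha)
        for eta in outcomes
    )

outcomes = [1.0, 2.0, 3.0, 4.0]  # hypothetical values f(x, Z_k)
probabilities = [0.25] * 4       # equiprobable scenarios
print(cvar(outcomes, probabilities, alpha=0.5))
```

With four equiprobable outcomes and $\alpha=0.5$, the value is $3.5$, the average of the two largest outcomes. In the bilevel setting the outcomes themselves depend on $x$ and on the follower's reaction, so this enumeration only illustrates the role of $\eta$.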

6 A regularization scheme for bilevel linear problems

In the setting of Theorems 5.1, 5.3 and 5.5, the risk-averse bilevel stochastic linear problem may be reformulated as a standard optimistic bilevel linear problem of the form

[TABLE]

where $\Psi:\mathbb{R}^{k}\rightrightarrows\mathbb{R}^{l}$ is given by

[TABLE]

for vectors $g\in\mathbb{R}^{k}$ , $h,t\in\mathbb{R}^{l}$ and $b\in\mathbb{R}^{r}$ , matrices $W\in\mathbb{R}^{r\times l}$ and $B\in\mathbb{R}^{r\times k}$ , and a nonempty polyhedron $U\subseteq\mathbb{R}^{k}$ .

We shall discuss a solution approach for (12) that relies on replacing it with a regularized single level problem involving the Karush-Kuhn-Tucker (KKT) conditions of the lower level problem.

Theorem 6.1 (cf. [20, Theorem 3.7], [31]).

Assume that $\mathrm{Argmin}_{w}\{t^{\top}w\;|\;w\in\Psi(u)\}$ is nonempty for any $u\in U$ . Then the following statements hold true:

(a) The optimal values of (12) and

[TABLE]

coincide.

(b) $\overline{u}$ is a global minimizer of (12) if and only if there exists some $\overline{w}$ such that $(\overline{u},\overline{w})$ is a global minimizer of (13).

(c) $\overline{u}$ is a local minimizer of (12) if and only if there exists some $\overline{w}$ such that $(\overline{u},\overline{w})$ is a local minimizer of (13).

Proof.

By assumption, the mapping $\varphi_{o}:U\to\mathbb{R}$

[TABLE]

is well-defined and for any $\tilde{u}\in U$ there exists some $\tilde{w}\in\Psi(\tilde{u})$ such that $h^{\top}\tilde{w}=\varphi_{o}(\tilde{u})$. Furthermore, $\varphi_{o}(\tilde{u})\leq h^{\top}w$ holds for any $w\in\Psi(\tilde{u})$, which implies (a), (b) and the "if" part of (c).

To show the "only if" part of (c), suppose that $(\overline{u},\overline{w})$ is a local minimizer of (13). Then there exists some $\epsilon>0$ such that

[TABLE]

holds for any $(u,w)\in B_{\epsilon}(\overline{u},\overline{w})$ satisfying $u\in U$ and $w\in\Psi(u)$ . In particular, we have $g^{\top}\overline{u}+h^{\top}w\geq g^{\top}\overline{u}+h^{\top}\overline{w}$ for any $w\in\Psi(\overline{u})\cap B_{\epsilon}(\overline{w})$ , which implies that $\overline{w}$ is a local and thus global minimizer of the linear program

[TABLE]

Consider the mapping $M:U\rightrightarrows\mathbb{R}^{l}$ defined by

[TABLE]

As $\varphi_{o}$ is Lipschitz continuous by Theorem 7.1 in the Appendix, Lipschitz continuity of $M$ follows from the same result. Suppose that $\overline{u}$ is not a local minimizer of (12). Then there exists a sequence $\{u_{n}\}_{n\in\mathbb{N}}$ such that $u_{n}\in U$ and

[TABLE]

hold for any $n\in\mathbb{N}$ and we have $\lim_{n\to\infty}u_{n}=\overline{u}$. The Lipschitz continuity of $M$ and $\overline{w}\in M(\overline{u})$ imply

[TABLE]

Thus, there exists a sequence $\{w_{n}\}_{n\in\mathbb{N}}$ satisfying $\lim_{n\to\infty}w_{n}=\overline{w}$ and $w_{n}\in M(u_{n})$ for all $n\in\mathbb{N}$ . Consequently, by (15), there is some $N\in\mathbb{N}$ such that for any $n\geq N$ , we have $(u_{n},w_{n})\in B_{\epsilon}(\overline{u},\overline{w})$ and

[TABLE]

which contradicts (14). Thus, $\overline{u}$ is a local minimizer of (12). ∎

Next, we use the KKT conditions of the lower level problem to replace (13) with the single-level problem

[TABLE]

The relationship between bilevel problems and mathematical programs with complementarity constraints arising from the lower level KKT system has been investigated in [10]. In the special case of bilevel linear problems, the following holds:

Theorem 6.2 (cf. [10, Theorem 3.2]).

(a) The optimal values of (13) and (16) coincide.

(b) $(\overline{u},\overline{w})$ is a global minimizer of (13) if and only if there exists some $\overline{v}$ such that $(\overline{u},\overline{w},\overline{v})$ is a global minimizer of (16).

(c) $(\overline{u},\overline{w})$ is a local minimizer of (13) if and only if $(\overline{u},\overline{w},\overline{v})$ is a local minimizer of (16) for any $\overline{v}\leq 0$ satisfying $W^{\top}\overline{v}=t$ and $\overline{v}^{\top}(W\overline{w}-B\overline{u}-b)=0$.

Proof.

As the lower level problem is linear, its KKT conditions are necessary and sufficient for optimality. Thus, we have $w\in\Psi(u)$ if and only if there exists some $v\leq 0$ such that $W^{\top}v=t$ and $v^{\top}(Ww-Bu-b)=0$, which implies (a), (b) and the "if" part of (c).

To show the "only if" part of (c), let $(\overline{u},\overline{w},\overline{v})$ be a local minimizer of (16) for any $\overline{v}\leq 0$ satisfying $W^{\top}\overline{v}=t$ and $\overline{v}^{\top}(W\overline{w}-B\overline{u}-b)=0$ and suppose that $(\overline{u},\overline{w})$ is not a local minimizer of (13). Then there exist sequences $\{u_{n}\}_{n\in\mathbb{N}}\subseteq U$ and $\{w_{n}\}_{n\in\mathbb{N}}$ such that $\lim_{n\to\infty}u_{n}=\overline{u}$, $\lim_{n\to\infty}w_{n}=\overline{w}$ and for any $n\in\mathbb{N}$ we have $w_{n}\in\Psi(u_{n})$ and

[TABLE]

As the mapping $\Lambda:\mathrm{gph}\;\Psi\rightrightarrows\mathbb{R}^{r}$ given by

[TABLE]

is outer semicontinuous by Lemma 7.3 in the Appendix, there exists some $N\in\mathbb{N}$ such that

[TABLE]

holds for all $n\geq N$ . Fix any converging sequence $\{v_{n}\}_{n\in\mathbb{N}}$ such that $v_{n}\in\Lambda(u_{n},w_{n})$ holds for any $n\in\mathbb{N}$ . By (19) we have $\overline{v}=\lim_{n\to\infty}v_{n}\in\Lambda(\overline{u},\overline{w})$ . Thus, $(\overline{u},\overline{w},\overline{v})$ is a local minimizer of (16). In particular, there exists some $\overline{N}\in\mathbb{N}$ such that $g^{\top}u_{n}+h^{\top}w_{n}\geq g^{\top}\overline{u}+h^{\top}\overline{w}$ for all $n\geq\overline{N}$ , which contradicts (17). ∎

It is known that commonly used regularity conditions such as the Mangasarian-Fromovitz constraint qualification or Slater's constraint qualification are violated at every feasible point of (16) (cf. [40]). To overcome the difficulties related to this property, we propose to replace (16) by

[TABLE]

and solve this problem for $\varepsilon\downarrow 0$ . This approach and its use to solve general mathematical programs with equilibrium constraints has been investigated in [41]. For the special case of the bilevel linear optimization problem (13) we can prove the following result:

Theorem 6.3.

Let $(\overline{u},\overline{w},\overline{v})$ be an accumulation point of a sequence $\{(u_{n},w_{n},v_{n})\}_{n\in\mathbb{N}}$ of local minimizers of problem $\mathrm{P(}\varepsilon_{n}\mathrm{)}$ for $\varepsilon_{n}\downarrow 0$ . Then $(\overline{u},\overline{w})$ is a local minimizer of (13).

Proof.

Without loss of generality, we may assume that $\{(u_{n},w_{n},v_{n})\}_{n\in\mathbb{N}}$ converges. Suppose that $(\overline{u},\overline{w})$ is not a local minimizer of (13). Then, since $U$ is a polyhedron and $\mathrm{gph}\;\Psi$ is polyhedral (cf. [9, Theorem 3.1]), i.e. equal to the union of a finite number of polyhedra, there exist a direction $(d_{u},d_{w})\in\mathbb{R}^{k}\times\mathbb{R}^{l}$ and a sequence $\alpha_{m}\downarrow 0$ such that $\overline{u}+\alpha_{m}d_{u}\in U$ , $\overline{w}+\alpha_{m}d_{w}\in\Psi(\overline{u}+\alpha_{m}d_{u})$ and

[TABLE]

hold for any $m\in\mathbb{N}$ . As the mapping $\Lambda$ defined by (18) is outer semicontinuous, there exists a constant $N\in\mathbb{N}$ such that $\Lambda(\overline{u}+\alpha_{m}d_{u},\overline{w}+\alpha_{m}d_{w})\subseteq\Lambda(\overline{u},\overline{w})$ for any $m\geq N$ . In particular, there exists some vertex $\tilde{v}$ of $\Lambda(\overline{u},\overline{w})$ such that $\tilde{v}$ is a vertex of $\Lambda(\overline{u}+\alpha_{m}d_{u},\overline{w}+\alpha_{m}d_{w})$ for any $m\geq N$ . We shall prove that there exists some $\overline{N}\in\mathbb{N}$ such that

[TABLE]

holds for any $m,n\geq\overline{N}$ .

For any $i\in\{1,\ldots,r\}$ with $e_{i}^{\top}\tilde{v}<0$ and any $m\geq N$ , we have

[TABLE]

As $e_{i}^{\top}(W\overline{w}-B\overline{u}-b)=0$ and $\alpha_{m}>0$ , this implies $e_{i}^{\top}(Wd_{w}-Bd_{u})=0$ . Furthermore, since $(u_{n},w_{n})$ is feasible for $\mathrm{P(\varepsilon_{n})}$ , we conclude that

[TABLE]

for any $m,n\in\mathbb{N}$ .

Similarly, for any $i\in\{1,\ldots,r\}$ such that $e_{i}^{\top}\tilde{v}=0$ and $e_{i}^{\top}(W\overline{w}-B\overline{u}-b)=0$ , we obtain $e_{i}^{\top}(Wd_{w}-Bd_{u})\leq 0$ and thus

[TABLE]

for any $m,n\in\mathbb{N}$ .

Finally, for any $i\in\{1,\ldots,r\}$ such that $e_{i}^{\top}\tilde{v}=0$ and $e_{i}^{\top}(W\overline{w}-B\overline{u}-b)<0$ , the existence of some $\overline{N}\in\mathbb{N}$ such that

[TABLE]

for any $m,n\geq\overline{N}$ follows from the continuity of the mapping

[TABLE]

By the above considerations, we have

[TABLE]

and $\lim_{n\to\infty}\tilde{v}^{\top}(Ww_{n}-Bu_{n}-b)=0$ . Furthermore, as $\varepsilon_{n}\downarrow 0$ , $(u_{n},w_{n},v_{n})$ is feasible for $\mathrm{P(}\varepsilon_{n^{\prime}}\mathrm{)}$ for any $n^{\prime}\geq n$ . Thus, we may assume that

[TABLE]

holds for any $n\in\mathbb{N}$ without loss of generality. (21), (22) and (23) imply that $(u_{n}+\alpha_{m}d_{u},w_{n}+\alpha_{m}d_{w},\tilde{v})$ is feasible for $\mathrm{P(}\varepsilon_{n}\mathrm{)}$ for any $m,n\geq\overline{N}$ .

Fix $n\geq\overline{N}$ . We shall prove that for any $\lambda\in(0,1]$ , there is some $M_{\lambda}\geq\overline{N}$ such that

[TABLE]

is feasible for $\mathrm{P(}\varepsilon_{n}\mathrm{)}$ whenever $m\geq M_{\lambda}$ . As $\lim_{m\to\infty}(1-\lambda)\lambda\alpha_{m}v_{n}^{\top}\big{(}Wd_{w}-Bd_{u}\big{)}=0$ , there exists some $M_{\lambda}\geq\overline{N}$ such that

[TABLE]

for all $m\geq M_{\lambda}$ . By (22), (23), and the feasibility of $(u_{n},w_{n},v_{n})$ for $\mathrm{P(}\varepsilon_{n}\mathrm{)}$ , we have

[TABLE]

for any $m\geq M_{\lambda}$ and feasibility follows from the linearity of the remaining restrictions.

As (21) implies $g^{\top}d_{u}+h^{\top}d_{w}<0$ ,

[TABLE]

holds for any $\lambda\in(0,1]$ and $m\geq M_{\lambda}$ , which, by

[TABLE]

yields a contradiction to the local optimality of $(u_{n},w_{n},v_{n})$ for $\mathrm{P(}\varepsilon_{n}\mathrm{)}$ . ∎

Remark 6.4.

Let $(\overline{u},\overline{w},\overline{v})$ be an accumulation point of a sequence $\{(u_{n},w_{n},v_{n})\}_{n\in\mathbb{N}}$ of global minimizers of problem $\mathrm{P(}\varepsilon_{n}\mathrm{)}$ for $\varepsilon_{n}\downarrow 0$ . Then $(\overline{u},\overline{w})$ is a global minimizer of (13) (see the ideas in the proof of Theorem 2.1 in [11] in combination with [10]).
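The relaxation idea of this section can be sketched as a feasibility check. The exact constraint system of $\mathrm{P(}\varepsilon\mathrm{)}$ is not reproduced here. The sketch assumes that only the complementarity equation of the KKT system is relaxed, to $v^{\top}(Ww-Bu-b)\leq\varepsilon$, while $Ww\leq Bu+b$, $v\leq 0$ and $W^{\top}v=t$ are kept. The upper level constraint $u\in U$ is omitted, and all function names and the toy data are hypothetical.

```python
def mat_vec(matrix, vector):
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

def transpose(matrix):
    return [list(column) for column in zip(*matrix)]

def lower_level_slack(W, B, b, u, w):
    # Ww - Bu - b, which must be <= 0 for lower level feasibility.
    Ww, Bu = mat_vec(W, w), mat_vec(B, u)
    return [ww - bu - bi for ww, bu, bi in zip(Ww, Bu, b)]

def is_feasible_relaxed(W, B, b, t, u, w, v, eps, tol=1e-9):
    # Assumed relaxed KKT system: primal feasibility, v <= 0, W^T v = t,
    # and v^T (Ww - Bu - b) <= eps instead of = 0.
    slack = lower_level_slack(W, B, b, u, w)
    primal = all(s <= tol for s in slack)
    sign = all(vi <= tol for vi in v)
    stationarity = all(
        abs(x - ti) <= tol for x, ti in zip(mat_vec(transpose(W), v), t)
    )
    complementarity = sum(vi * si for vi, si in zip(v, slack)) <= eps + tol
    return primal and sign and stationarity and complementarity

# Toy lower level: min_w { w | w >= u }, written as -w <= -u.
W, B, b, t = [[-1.0]], [[-1.0]], [0.0], [1.0]
print(is_feasible_relaxed(W, B, b, t, u=[1.0], w=[1.05], v=[-1.0], eps=0.1))
print(is_feasible_relaxed(W, B, b, t, u=[1.0], w=[1.05], v=[-1.0], eps=0.01))
print(is_feasible_relaxed(W, B, b, t, u=[1.0], w=[1.0], v=[-1.0], eps=0.0))
```

In the toy problem the multiplier is forced to $v=-1$ and the relaxed condition reads $w-u\leq\varepsilon$, so the admissible responses form the interval $[u,u+\varepsilon]$, which shrinks to the lower level solution $w=u$ as $\varepsilon\downarrow 0$.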

7 Appendix

We shall recall some technical results used throughout the paper.

Theorem 7.1 ([23, Theorem 4.2]).

If $D$ is positive semidefinite, the set-valued mapping $C:\mathbb{R}^{k}\rightrightarrows\mathbb{R}^{m}$ given by

[TABLE]

is Lipschitz continuous on $\mathrm{dom}\;C:=\{t\in\mathbb{R}^{k}\;|\;C(t)\neq\emptyset\}$ , i.e. there exists a constant $\Lambda>0$ such that $d_{\infty}(C(t),C(t^{\prime}))\leq\Lambda\|t-t^{\prime}\|$ holds for any $t,t^{\prime}\in\mathrm{dom}\;C$ .

The following result is a well-known direct consequence of Lebesgue’s Dominated Convergence Theorem:

Lemma 7.2.

Let $\mu$ be a Borel probability measure on $\mathbb{R}^{s}$, $V\subseteq\mathbb{R}^{n}\times\mathbb{R}^{s}$ open, and $g:V\to\mathbb{R}$ such that the following conditions are satisfied:

(a) $g(\cdot,z)$ is differentiable at $x_{0}\in V_{\mu}:=\{x\;|\;(x,z)\in V\;\forall z\in\mathrm{supp}\;\mu\}$ for $\mu$ -almost all $z\in\mathbb{R}^{s}$ and the derivative $g^{\prime}(x_{0},z)$ is measurable with respect to $z$.

(b) There exists a neighborhood $U\subseteq V_{\mu}$ of $x_{0}$ such that

(i) the integral $\int_{\mathbb{R}^{s}}g(x,z)\,\mu(dz)$ is well-defined and finite for all $x\in U$ and

(ii) there is an integrable function $m:\mathbb{R}^{s}\to\mathbb{R}$ such that $|e(x,z)|\leq m(z)$ holds for all $x\in U\setminus\{x_{0}\}$ and $\mu$ -almost all $z\in\mathbb{R}^{s}$, where

[TABLE]

Then $h:V_{\mu}\to\overline{\mathbb{R}}$, $h(x)=\int_{\mathbb{R}^{s}}g(x,z)\,\mu(dz)$ is differentiable at $x_{0}$ and

[TABLE]

Proof.

Set

[TABLE]

By assumption, we have $\lim_{x\to x_{0}}|e(x,z)|=0$ for $\mu$ -almost all $z\in\mathbb{R}^{s}$ and Lebesgue’s Dominated Convergence Theorem implies

[TABLE]

which completes the proof. ∎

Lemma 7.3.

Let $\mathcal{C}\subseteq\mathbb{R}^{k}\times\mathbb{R}^{s}$ be closed, then the set-valued mapping $\mathcal{T}:\mathbb{R}^{k}\rightrightarrows\mathbb{R}^{s}$ ,

[TABLE]

is outer semicontinuous (cf. [38]), i.e. $\limsup_{t\to t_{0}}\;\mathcal{T}(t)\subseteq\mathcal{T}(t_{0})\;\forall t_{0}\in\mathbb{R}^{k}$ .

Proof.

By definition of the outer limit, $z\in\limsup_{t\to t_{0}}\;\mathcal{T}(t)$ holds if and only if there are sequences $\{t_{n}\}_{n\in\mathbb{N}}\subset\mathbb{R}^{k}$ and $\{z_{n}\}_{n\in\mathbb{N}}\subset\mathbb{R}^{s}$ such that

[TABLE]

For any such sequences we have $(t_{n},z_{n})\in\mathcal{C}$ for all $n\in\mathbb{N}$ and thus $(t_{0},z)\in\mathcal{C}$ by the closedness of $\mathcal{C}$. Consequently, $z\in\mathcal{T}(t_{0})$. ∎

Acknowledgement

The second author thanks the Deutsche Forschungsgemeinschaft for its support via the Collaborative Research Center TRR 154.

Bibliography

[1] B. Bank, J. Guddat, D. Klatte, B. Kummer and K. Tammer, Non-Linear Parametric Optimization, Akademie-Verlag, Berlin (1982).
[2] K. Beer, Lösung großer linearer Optimierungsaufgaben, Deutscher Verlag der Wissenschaften, Berlin (1977).
[3] D. P. Bertsekas, Nonlinear Programming, 2nd edition, Athena Scientific, Belmont, Massachusetts (1999).
[4] P. Billingsley, Convergence of Probability Measures, Wiley, New York (1968).
[5] S. I. Birbil, G. Gürkan and O. Listes, Simulation-based solution of stochastic mathematical problems with complementarity constraints: Sample-path analysis, Technical report, ERIM Report Series Research in Management, ERS-2004-016-LIS (2014).
[6] M. Carrion, J. M. Arroyo and A. J. Conejo, A bilevel stochastic programming approach for retailer futures market trading, Power Systems, IEEE Transactions on, 24(3), pp. 1446-1456 (2009).
[7] M. Claus, V. Krätschmer and R. Schultz, Weak continuity of risk functionals with applications to stochastic programming, SIAM Journal on Optimization, 27(1), pp. 91-108 (2017).
[8] V. De Miguel and H. Xu, A stochastic multiple-leader Stackelberg model: analysis, computation, and application, Operations Research, 57(5), pp. 1220-1235 (2009).
