Approximation Algorithms for Distributionally Robust Stochastic Optimization with Black-Box Distributions

Andre Linhares; Chaitanya Swamy

arXiv:1904.07381 · cs.DS · October 25, 2023


TL;DR

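This paper develops approximation algorithms for distributionally robust 2-stage stochastic optimization problems in which the collection of distributions is a ball around a central distribution that is accessed only via black-box sampling, and applies them to classic combinatorial problems such as set cover, vertex cover, edge cover, facility location, and Steiner tree.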

Contribution

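It introduces a framework that combines sample average approximation, the ellipsoid method, and LP-rounding to solve distributionally robust problems with black-box distributions, with guarantees that are in most cases within $O(1)$ factors of those known for the deterministic problems.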

Findings

01

First approximation algorithms for distributionally robust set cover, vertex cover, edge cover, facility location, and Steiner tree.

02

Guarantees that are, except for set cover, within $O(1)$ factors of the guarantees known for the deterministic version of each problem.

03

Framework applicable to problems with uncertain distributions accessed via sampling.

Abstract

Two-stage stochastic optimization is a framework for modeling uncertainty, where we have a probability distribution over possible realizations of the data, called scenarios, and decisions are taken in two stages: we make first-stage decisions knowing only the underlying distribution and before a scenario is realized, and may take additional second-stage recourse actions after a scenario is realized. The goal is typically to minimize the total expected cost. A criticism of this model is that the underlying probability distribution is itself often imprecise! To address this, a versatile approach that has been proposed is the distributionally robust 2-stage model: given a collection of probability distributions, our goal now is to minimize the maximum expected total cost with respect to a distribution in this collection. We provide a framework for designing approximation algorithms…

Tables

Table 1: A summary of our results. Recall that $\mathcal{A}_{\leq k}=\{A\subseteq U:|A|\leq k\}$. We have omitted the $O(\varepsilon)$ terms that appear in the factors. The $\ell^{\mathsf{asym}}_{\infty}$ setting does not apply to vertex cover, edge cover, and set cover. The $\beta$-approximation for $g(x,y,A)$ is the factor $\beta_{1}\beta_{2}$ in Theorem 1. The * entries are open questions.

Problem	Wasserstein, $\frac{1}{2}L_{1}$, $\mathcal{A}=2^{U}$	Wasserstein, $\frac{1}{2}L_{1}$, $\mathcal{A}_{\leq k}$	Wasserstein, $\ell^{\mathsf{asym}}_{\infty}$ (see § 2), $\mathcal{A}=2^{U}$	Wasserstein, $\ell^{\mathsf{asym}}_{\infty}$ (see § 2), $\mathcal{A}_{\leq k}$	Wasserstein, general $\mathcal{A}$ and $\ell$ ($\beta$ = approx. factor for $g(x,y,A)$)	$L_{\infty}$, $\mathcal{A}=2^{U}$
Facility location	$21.96$	$196$	$21.96$	$196$	$O(\beta)$	$10.98$
Vertex cover	$16$	$101.25$	–	–	$O(\beta)$	$8$
Edge cover	$12$	$36$	–	–	$O(\beta)$	$6$
Set cover	$O(\log n)$	$O(\log^{2} n)$	–	–	$O(\beta\log n)$	$O(\log n)$
Steiner tree	$160$	*	$160$	*	*	*

Equations

\min_{x\in X}\quad c^{\intercal}x+\max_{q\in\mathcal{D}}{\textstyle\operatorname*{E}_{A\sim q}}\bigl[g(x,A)\bigr]

g(x,y,A):=\max_{A^{\prime}\in\mathcal{A}}\bigl(g(x,A^{\prime})-y\cdot\ell(A,A^{\prime})\bigr),\quad\text{given a first-stage decision }x\in X,\text{ scenario }A\in\mathcal{A},\text{ and }y\geq 0.

\min_{x\in X}\quad h({\mathring{p}}\,;{x}):=c^{\intercal}x+\max_{q:L(\mathring{p},q)\leq r}{\textstyle\operatorname*{E}_{A\sim q}}\bigl{[}g(x,A)\bigr{]}


\min_{x\in P}\ h(\mathring{p}\,;x)

\min_{x\in X}\ h(\mathring{p}\,;x):=c^{\intercal}x+z(\mathring{p}\,;x),\qquad\text{where}\quad z(\mathring{p}\,;x):=\max\Bigl\{\sum_{A,A^{\prime}}\gamma_{A,A^{\prime}}\,g(x,A^{\prime}):\ \ \gamma\geq 0,\ \ \sum_{A^{\prime}}\gamma_{A,A^{\prime}}\leq\mathring{p}_{A}\ \ \forall A\in\mathcal{A},\ \ \sum_{A,A^{\prime}}\gamma_{A,A^{\prime}}\,\ell(A,A^{\prime})\leq r\Bigr\}

\displaystyle\min_{x\in X}\quad\Bigl{[}c^{\intercal}x+\underbrace{\min_{y\geq 0}\Bigl{(}ry+\max\ \Bigl{\{}\sum_{A,A^{\prime}}\gamma_{A,A^{\prime}}(g(x,A^{\prime})-y\cdot\ell(A,A^{\prime})):\ \ \gamma\geq 0,\ \ \sum_{A^{\prime}}\gamma_{A,A^{\prime}}\leq\mathring{p}_{A}\ \ \forall A\in\mathcal{A}\Bigr{\}}\Bigr{)}}_{\text{\small{$z({\mathring{p}}\,;{x})$}}}\Bigr{]}


\displaystyle\text{which simplifies to}\quad\min_{x\in X,y\geq 0}\ \ h({\mathring{p}}\,;{x,y}):=c^{\intercal}x+ry+{\textstyle\operatorname*{E}_{A\sim\mathring{p}}}\bigl{[}\max_{A^{\prime}\in\mathcal{A}}\bigl{(}g(x,A^{\prime})-y\cdot\ell(A,A^{\prime})\bigr{)}\bigr{]}.

g(0,0,A)-g(x,y,A)\leq g(0,A^{\prime})-\bigl{(}g(x,A^{\prime})-y\cdot\ell(A,A^{\prime})\bigr{)}\leq\lambda c^{\intercal}x+y\cdot\ell_{\max}\leq\max\Bigl{\{}\lambda,\tfrac{\ell_{\max}}{r}\Bigr{\}}(c^{\intercal}x+ry).


\displaystyle z^{\mathrm{sh}}({\mathring{p}}\,;{x})\


\displaystyle z^{\mathrm{lg}}({\mathring{p}}\,;{x})\

c^{\intercal}x+z^{\mathrm{sh}}(p\,;x)+z^{\mathrm{lg}}(p\,;0)\leq c^{\intercal}x+z^{\mathrm{sh}}(p\,;x)+z^{\mathrm{lg}}(p\,;x)+c^{\intercal}x\leq 2c^{\intercal}x+2z(p\,;x)=2h(p\,;x),

\sum_{A,A^{\prime}}\gamma_{A,A^{\prime}}\,g(x,A^{\prime})\leq\sum_{A,A^{\prime}}\gamma_{A,A^{\prime}}\,g(0,A^{\prime})\leq\sum_{A,A^{\prime}}\gamma_{A,A^{\prime}}\,g(x,A^{\prime})+(\lambda c^{\intercal}x)\sum_{A,A^{\prime}}\gamma_{A,A^{\prime}}\leq\sum_{A,A^{\prime}}\gamma_{A,A^{\prime}}\,g(x,A^{\prime})+c^{\intercal}x.

\min_{x\in X}\ \overline{h}(\mathring{p}\,;x):=c^{\intercal}x+z^{\mathrm{sh}}(\mathring{p}\,;x),

\min_{x\in X,y\geq 0}\ \overline{h}({\mathring{p}}\,;{x,y}):=c^{\intercal}x+ry+{\textstyle\operatorname*{E}_{A\sim\mathring{p}}}\bigl{[}\overline{g}(x,y,A)\bigr{]},


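\min_{x\in X}\ h(p\,;x):=c^{\intercal}x+z(p\,;x),\qquad\text{where}\quad z(p\,;x):=\max\Bigl\{\sum_{A,A^{\prime}}\gamma_{A,A^{\prime}}\,g(x,A^{\prime}):\ \ \gamma\geq 0,\ \ \sum_{A^{\prime}}\gamma_{A,A^{\prime}}\leq p_{A}\ \ \forall A\in\mathcal{A},\ \ \sum_{A,A^{\prime}}\gamma_{A,A^{\prime}}\,\ell(A,A^{\prime})\leq r\Bigr\}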

\begin{split}z({\widehat{p}}\,;{x})=\sum_{(A,A^{\prime})\in\mathcal{A}^{\mathrm{sup}}\times\mathcal{A}}\gamma^{*}_{A,A^{\prime}}g(x,A^{\prime})&=\tfrac{1}{\beta}\cdot\quad\sum_{\mathclap{(A,A^{\prime})\in\mathcal{A}^{\mathrm{sup}}\times\mathcal{A}}}\ \gamma^{*}_{A,A^{\prime}}g(x,A^{\prime})+\Bigl{(}1-\tfrac{1}{\beta}\Bigr{)}\cdot\quad\sum_{\mathclap{(A,A^{\prime})\in\mathcal{A}^{\mathrm{sup}}\times\mathcal{A}}}\ \gamma^{*}_{A,A^{\prime}}g(x,A^{\prime})\\ &\leq\sum_{\mathclap{(A,A^{\prime})\in\mathcal{A}^{\mathrm{sup}}\times\mathcal{A}}}\ \gamma_{A,A^{\prime}}g(x,A^{\prime})+\Bigl{(}1-\tfrac{1}{\beta}\Bigr{)}\cdot\quad\sum_{\mathclap{(A,A^{\prime})\in\mathcal{A}^{\mathrm{sup}}\times\mathcal{A}}}\ \gamma^{*}_{A,A^{\prime}}g(x,A^{\prime}).\end{split}


\begin{split}h({\widehat{p}}\,;{x^{\prime}})-h({\widehat{p}}\,;{x})&\geq c^{\intercal}(x^{\prime}-x)+\quad\sum_{\mathclap{(A,A^{\prime})\in\mathcal{A}^{\mathrm{sup}}\times\mathcal{A}}}\ \gamma_{A,A^{\prime}}\Bigl{(}g(x^{\prime},A^{\prime})-g(x,A^{\prime})\Bigr{)}-\Bigl{(}1-\tfrac{1}{\beta}\Bigr{)}\cdot\quad\sum_{\mathclap{(A,A^{\prime})\in\mathcal{A}^{\mathrm{sup}}\times\mathcal{A}}}\ \gamma^{*}_{A,A^{\prime}}g(x,A^{\prime})\\ &\geq c^{\intercal}(x^{\prime}-x)+\quad\sum_{\mathclap{(A,A^{\prime})\in\mathcal{A}^{\mathrm{sup}}\times\mathcal{A}}}\ \gamma_{A,A^{\prime}}d^{x,A^{\prime}}\cdot(x^{\prime}-x)-\Bigl{(}1-\tfrac{1}{\beta}\Bigr{)}h({\widehat{p}}\,;{x})\\ &=d^{\intercal}(x^{\prime}-x)-\Bigl{(}1-\tfrac{1}{\beta}\Bigr{)}h({\widehat{p}}\,;{x}).\end{split}


\mathsf{vol}(\mathcal{P}_{k})\leq\mathsf{vol}(E_{N})\leq e^{-N/(2m)}\,\mathsf{vol}(E_{0})\leq\Bigl{(}\tfrac{\mu V}{2}\Bigr{)}^{m}\mathsf{vol}_{m}<\mathsf{vol}(W).


\widetilde{f}_{l}\leq\rho\cdot h({\widehat{p}}\,;{x^{\prime}})\leq\rho\bigl{(}h({\widehat{p}}\,;{x^{*}})+\eta\bigr{)}\leq\rho\cdot h({\widehat{p}}\,;{x^{*}})+\rho\varepsilon\cdot\mathsf{LB}\leq\rho(1+\varepsilon)h({\widehat{p}}\,;{x^{*}})


\theta_{A}\geq g(x,\overline{A})-y\cdot\ell(A,\overline{A})\geq g(x,A^{\prime})/\beta_{1}-\beta_{2}\,y\cdot\ell(A,A^{\prime}).

g(0,y,\emptyset)\geq g(0,A^{*})-y\cdot\ell(\emptyset,A^{*})>T-T\cdot 1=0,

\theta_{A}\geq g(x,A^{\prime})-y\cdot\ell(A,A^{\prime})\qquad\forall A\in\mathcal{A}^{\mathrm{sup}},\ A^{\prime}\in\phi(A).


Taxonomy

Topics: Risk and Portfolio Optimization · Auction Theory and Applications · Complexity and Algorithms in Graphs

Full text

Approximation Algorithms for Distributionally Robust Stochastic Optimization with Black-Box Distributions

A preliminary version [26] appeared in the Proceedings of the 51st ACM Symposium on Theory of Computing (STOC), 2019.

André Linhares, Chaitanya Swamy

{alinhare,cswamy}@uwaterloo.ca. Dept. of Combinatorics and Optimization, University of Waterloo, Waterloo, ON N2L 3G1. Supported in part by NSERC grant 327620-09 and an NSERC Discovery Accelerator Supplement award.

Abstract

Two-stage stochastic optimization is a widely used framework for modeling uncertainty, where we have a probability distribution over possible realizations of the data, called scenarios, and decisions are taken in two stages: we make first-stage decisions knowing only the underlying distribution and before a scenario is realized, and may take additional second-stage recourse actions after a scenario is realized. The goal is typically to minimize the total expected cost. A common criticism levied at this model is that the underlying probability distribution is itself often imprecise! To address this, an approach that is quite versatile and has gained popularity in the stochastic-optimization literature is the distributionally robust 2-stage model: given a collection $\mathcal{D}$ of probability distributions, our goal now is to minimize the maximum expected total cost with respect to a distribution in $\mathcal{D}$ .

There has been almost no prior work, however, on developing approximation algorithms for distributionally robust problems when the underlying scenario set is discrete, as is the case with discrete-optimization problems. We provide a framework for designing approximation algorithms in such settings when the collection $\mathcal{D}$ is a ball around a central distribution and the central distribution is accessed only via a sampling black box.

We first show that one can utilize the sample average approximation (SAA) method—solve the distributionally robust problem with an empirical estimate of the central distribution—to reduce the problem to the case where the central distribution has polynomial-size support. This follows because we argue that a distributionally robust problem can be reduced in a novel way to a standard 2-stage problem with bounded inflation factor, which enables one to use the SAA machinery developed for 2-stage problems. Complementing this, we show how to approximately solve a fractional relaxation of the SAA (i.e., polynomial-scenario central-distribution) problem. Unlike in 2-stage stochastic- or robust- optimization, this turns out to be quite challenging. We utilize the ellipsoid method in conjunction with several new ideas to show that this problem can be approximately solved provided that we have an (approximation) algorithm for a certain max-min problem that is akin to, and generalizes, the $k$ - $\max$ - $\min$ problem—find the worst-case scenario consisting of at most $k$ elements—encountered in 2-stage robust optimization. We obtain such a procedure for various discrete-optimization problems; by complementing this via LP-rounding algorithms that provide local (i.e., per-scenario) approximation guarantees, we obtain the first approximation algorithms for the distributionally robust versions of a variety of discrete-optimization problems including set cover, vertex cover, edge cover, facility location, and Steiner tree, with guarantees that are, except for set cover, within $O(1)$ -factors of the guarantees known for the deterministic version of the problem.

1 Introduction

Stochastic-optimization models capture uncertainty by modeling it via a probability distribution over a collection $\mathcal{A}$ of possible realizations of the data, called scenarios. An important and widely used model is the 2-stage recourse model, where one seeks to take actions both before and after the data has been realized (stages I and II) so as to minimize the expected total cost incurred. Many applications come under this setting. An oft-cited prototypical example is 2-stage stochastic facility location, wherein one needs to decide where to set up facilities to serve clients. The client-demand pattern is uncertain, but one does have some statistical information about the demands. One can open some facilities initially, given only the distributional information about demands; after a specific demand pattern is realized (according to this distribution), one can take additional recourse actions such as opening more facilities incurring their recourse costs. The recourse costs are usually higher than the first-stage costs, as they may entail making decisions in rapid reaction to the observed scenario (e.g., deploying resources with smaller lead time).

An issue with the above 2-stage model, which is a common source of criticism, is that the distribution modeling the uncertainty is itself often imprecise! Usually, one models the distribution to be statistically consistent with some historical data, so we really have a collection of distributions, and a more robust approach is to hedge against the worst possible distribution. This gives rise to the distributionally robust 2-stage model: the setup is similar to that of the 2-stage model, but we now have a collection $\mathcal{D}$ of probability distributions; our goal is to minimize the maximum expected total cost with respect to a distribution in $\mathcal{D}$ . Formally, if $X\subseteq\mathbb{R}_{+}^{m}$ is the set of first-stage actions and the cost associated with $x\in X$ is $c^{\intercal}x$ , we want to solve the following problem:

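$\min_{x\in X}\quad c^{\intercal}x+\max_{q\in\mathcal{D}}{\textstyle\operatorname*{E}_{A\sim q}}\bigl[g(x,A)\bigr]$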

where $g(x,A):=\min_{\text{second-stage actions }z^{A}}\bigl{(}\text{cost of }z^{A}\bigr{)}$ .

Distributionally robust (DR) stochastic optimization is a versatile approach dating back to [34] that has (re)gained interest recently in the Operations Research literature, where it is sometimes called data-driven or ambiguous stochastic optimization (see, e.g., [13, 2, 29, 9] and their references). The DR 2-stage model also serves to nicely interpolate between the extremes of: (a) 2-stage stochastic optimization, which optimistically assumes that one knows the underlying distribution $p$ precisely (i.e., $\mathcal{D}=\{p\}$); and (b) 2-stage robust optimization, which abandons the distributional view and seeks to minimize the maximum cost incurred in a scenario, thereby adopting the overly cautious approach of being robust against every possible scenario regardless of how likely it is for a scenario to materialize; this can be captured by letting $\mathcal{D}=\{\text{all distributions over }\mathcal{A}\}$, where $\mathcal{A}$ is the scenario-collection in the 2-stage robust problem. Both extremes can lead to suboptimal decisions: with stochastic optimization, the optimal solution for a specific distribution $p$ could be quite suboptimal even for a “nearby” distribution $q$ (footnote 1: there are examples where $\|q-p\|_{1}\leq\varepsilon$ but an optimal solution for $p$ can be arbitrarily bad when evaluated under $q$); with robust optimization, the presence of a single scenario, however unlikely, may force certain decisions that are undesirable for all other scenarios.

Despite its modeling benefits and popularity, to our knowledge, there has been almost no prior work on developing approximation algorithms for DR 2-stage discrete-optimization, and, more generally, for DR 2-stage problems with a discrete underlying scenario set (as is the case in discrete optimization). (The exception is [1], which we discuss in Section 1.2. Footnote 2: peripherally related is [40], who consider a version of DR facility location, where the uncertainty only influences the costs and not the constraints, which yields a much-simpler and more restrictive model.)

1.1 Our contributions

We initiate a systematic study of distributionally robust discrete 2-stage problems from the perspective of approximation algorithms. We develop a general framework for designing approximation algorithms for these problems, when the collection $\mathcal{D}$ is a ball around a central distribution $\mathring{p}$ in the $L_{\infty}$ metric, $\frac{1}{2}L_{1}$ metric (total-variation distance), or Wasserstein metric (defined below). (Note that this still allows interpolating between stochastic and robust optimization.) We make no assumptions about $\mathring{p}$; it could have exponential-size support, and our only means of accessing $\mathring{p}$ is via a sampling black box (footnote 3: the DR problem remains challenging even if $\mathring{p}$ has polynomial-size support, but $|\mathcal{A}|$ is exponential). We view sampling from the black box as an elementary operation, so our running time bounds also imply sample-complexity bounds. Settings where $\mathcal{D}$ is a ball in some probability metric arise naturally when one tries to infer a scenario distribution from observed data (see, e.g., [8, 9, 41]), hence the moniker data-driven optimization, and it has been argued that defining $\mathcal{D}$ using the Wasserstein metric has various benefits [9, 41, 13, 29].

We view the frameworks that we develop for DR discrete 2-stage problems as our chief contribution, and the techniques that we devise for dealing with Wasserstein metrics as the main feature of our work (see Theorem 1 below). We demonstrate the utility of our frameworks by using them to obtain the first approximation guarantees for the distributionally robust versions of various discrete-optimization problems such as set cover, vertex cover, edge cover, facility location, and Steiner tree. The guarantees that we obtain are, in most cases, within $O(1)$ -factors of the guarantees known for the deterministic (and 2-stage-{stochastic, robust}) counterpart of the problem (see Table 1).

Formal model description.

We study the following distributionally robust 2-stage model. We are given an underlying set $\mathcal{A}$ of scenarios, and a ball $\mathcal{D}=\{q:L(\mathring{p},q)\leq r\}$ of distributions around a central distribution $\mathring{p}$ over $\mathcal{A}$ under some metric $L$ on probability distributions. We can take first-stage actions $x\in X\subseteq\mathbb{R}_{+}^{m}$ before a scenario is realized, incurring a first-stage cost $c^{\intercal}x$, and second-stage recourse actions $z^{A}$ after a scenario $A\in\mathcal{A}$ is realized; for each scenario $A$, the combination $(x,z^{A})$ of first- and second-stage actions must yield a feasible solution for $A$. Using $A\sim q$ to denote that scenario $A$ is drawn according to distribution $q$, we want to solve: $\min_{x\in X}\ \bigl(c^{\intercal}x+\max_{q:L(\mathring{p},q)\leq r}{\textstyle\operatorname*{E}_{A\sim q}}\bigl[\text{cost of }z^{A}\bigr]\bigr)$.

We use $\mathcal{I}$ to denote the input size, which always measures the encoding size of the underlying deterministic problem, along with the first- and second-stage costs and the radius $r$ of the ball $\mathcal{D}$ . It is standard in the study of 2-stage problems in the CS literature to assume that every first-stage action has a corresponding recourse action (e.g., facilities may be opened in either stage). We use $\lambda\geq 1$ to denote an inflation parameter that measures the maximum factor by which the cost of a first-stage action increases in the second stage. We consider the cases where $L$ is the $L_{\infty}$ metric, $\|p-q\|_{\infty}:=\max_{A\in\mathcal{A}}|p_{A}-q_{A}|$ ; $\frac{1}{2}L_{1}$ metric, $\frac{1}{2}\|p-q\|_{1}:=\frac{1}{2}\sum_{A\in\mathcal{A}}|p_{A}-q_{A}|$ , which is the total-variation distance; or a Wasserstein metric.

To motivate and define the rich class of Wasserstein metrics, note that while the choice of $L$ is a problem-dependent modeling decision, we would like the ball $\mathcal{D}$ to contain other “reasonably similar” distributions, and exclude completely unrelated distributions, as the latter could lead to overly-conservative decisions, à la robust optimization. One way of measuring the similarity between two distributions is to see if they spread their probability mass on “similar” scenarios. Wasserstein metrics capture this viewpoint crisply, and lift an underlying scenario metric $\ell$ to a metric on distributions over scenarios. The Wasserstein distance between two distributions $p$ and $q$ is the minimal cost of moving probability mass to transform $p$ into $q$, where the cost of moving $\gamma_{A,A^{\prime}}$ mass from scenario $A$ to scenario $A^{\prime}$ is $\gamma_{A,A^{\prime}}\ell(A,A^{\prime})$. (Observe that $\frac{1}{2}L_{1}$ is the Wasserstein metric with respect to the discrete scenario metric: $\ell^{\mathsf{dis}}(A,A^{\prime})=1$ if $A\neq A^{\prime}$, and $0$ otherwise.)
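The parenthetical remark about $\frac{1}{2}L_{1}$ can be illustrated on a toy instance. The Python sketch below builds one transport plan between two small distributions and checks that, under the discrete scenario metric, its cost equals the total-variation distance. The scenario labels, the helper names (`tv_distance`, `greedy_plan`, `transport_cost`, `ell_dis`), and the numbers are assumptions made for this illustration and do not come from the paper.

```python
from fractions import Fraction

def tv_distance(p, q):
    """Total-variation distance (1/2) * ||p - q||_1 between two distributions."""
    keys = set(p) | set(q)
    return sum(abs(p.get(a, 0) - q.get(a, 0)) for a in keys) / 2

def ell_dis(a, b):
    """Discrete scenario metric: 0 for identical scenarios, 1 otherwise."""
    return 0 if a == b else 1

def greedy_plan(p, q):
    """Transport plan gamma[(A, A')] that keeps min(p_A, q_A) in place and
    ships only the surplus mass of p to the scenarios where q has more mass."""
    gamma, surplus, deficit = {}, [], []
    for a in sorted(set(p) | set(q)):
        pa, qa = p.get(a, 0), q.get(a, 0)
        if min(pa, qa) > 0:
            gamma[(a, a)] = min(pa, qa)
        if pa > qa:
            surplus.append([a, pa - qa])
        elif qa > pa:
            deficit.append([a, qa - pa])
    i = j = 0
    while i < len(surplus) and j < len(deficit):
        moved = min(surplus[i][1], deficit[j][1])
        gamma[(surplus[i][0], deficit[j][0])] = moved
        surplus[i][1] -= moved
        deficit[j][1] -= moved
        if surplus[i][1] == 0:
            i += 1
        if deficit[j][1] == 0:
            j += 1
    return gamma

def transport_cost(gamma, ell):
    """Cost of a plan: sum of gamma[(A, A')] * ell(A, A')."""
    return sum(mass * ell(a, b) for (a, b), mass in gamma.items())

# Toy distributions over three hypothetical scenarios.
p = {"A1": Fraction(1, 2), "A2": Fraction(1, 2)}
q = {"A1": Fraction(1, 4), "A2": Fraction(1, 4), "A3": Fraction(1, 2)}
plan = greedy_plan(p, q)
```

Here only the surplus mass is moved, and every unit of moved mass costs exactly 1 under `ell_dis`, which is why the plan's cost coincides with `tv_distance(p, q)`. For a general scenario metric $\ell$ this greedy plan need not be optimal; computing the Wasserstein distance then requires solving the transportation LP.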

Example: DR 2-stage facility location ($\mathsf{DRSFL}$). As a concrete example, consider the DR version of 2-stage facility location. We have a metric space $\bigl(\mathcal{F}\cup\mathcal{C},\{w_{ij}\}_{i,j\in\mathcal{F}\cup\mathcal{C}}\bigr)$, where $\mathcal{F}$ is a set of facilities, and $\mathcal{C}$ is a set of clients. A scenario is a subset of $\mathcal{C}$ indicating the set of clients that need to be served in that scenario. (We can model integer demands by creating co-located clients.) We may open a facility $i\in\mathcal{F}$ in stages I or II, incurring costs of $f_{i}$ and $f^{\mathrm{II}}_{i}$ respectively. In scenario $A$, we need to assign every $j\in A$ to a facility $i^{A}(j)$ opened in stage I or in scenario $A$; the second-stage cost of scenario $A$ is $\sum_{i\text{ opened in scenario }A}f^{\mathrm{II}}_{i}+\sum_{j\in A}w_{i^{A}(j)j}$. The goal is to minimize $\sum_{i\text{ opened in stage I}}f_{i}+\max_{q:L(\mathring{p},q)\leq r}{\textstyle\operatorname*{E}_{A\sim q}}\bigl[\text{second-stage cost of }A\bigr]$. Here $\lambda:=\max\{1,\max_{i\in\mathcal{F}}f^{\mathrm{II}}_{i}/f_{i}\}$, and $\mathcal{I}$ is the encoding size of $\bigl(\mathcal{F},\mathcal{C},w,f,f^{\mathrm{II}},r\bigr)$.
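To make the second-stage cost in this example concrete, here is a minimal brute-force sketch in Python for a tiny instance. It enumerates integral recourse decisions, so it is a stand-in for illustration only (the paper's $g(x,A)$ allows fractional second-stage actions). The function name `second_stage_cost`, the facility and client labels, and all costs are hypothetical.

```python
from itertools import combinations
from math import inf

def second_stage_cost(open_first, scenario, facilities, f2, w):
    """Cheapest integral recourse for one scenario: pay f2[i] for every facility
    opened in stage II, plus w[i][j] to assign each client j in the scenario to
    its nearest open facility (opened in stage I or in this scenario)."""
    if not scenario:
        return 0
    closed = [i for i in facilities if i not in open_first]
    best = inf
    for size in range(len(closed) + 1):
        for extra in combinations(closed, size):
            available = set(open_first) | set(extra)
            if not available:
                continue  # clients cannot be served with no open facility
            cost = sum(f2[i] for i in extra)
            cost += sum(min(w[i][j] for i in available) for j in scenario)
            best = min(best, cost)
    return best

# Hypothetical instance: two facilities, two clients.
facilities = ["u", "v"]
f2 = {"u": 4, "v": 6}                      # stage-II opening costs
w = {"u": {"a": 1, "b": 5}, "v": {"a": 5, "b": 1}}  # assignment costs

cost_with_u = second_stage_cost({"u"}, {"a", "b"}, facilities, f2, w)
cost_from_scratch = second_stage_cost(set(), {"a"}, facilities, f2, w)
```

With facility `u` already open, serving both clients from `u` costs 1 + 5 = 6, which beats additionally opening `v` (6 + 1 + 1 = 8). With nothing open in stage I, the scenario `{a}` is served most cheaply by opening `u` in stage II (4 + 1 = 5).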

We consider two common choices for $\mathcal{A}$ : (a) the unrestricted setting: $\mathcal{A}:=2^{\mathcal{C}}$ , which is the usual setting in 2-stage stochastic optimization; and (b) the $k$ -bounded setting: $\mathcal{A}=\mathcal{A}_{\leq k}:=\{A\subseteq\mathcal{C}:|A|\leq k\}$ , which is the usual setup in 2-stage robust optimization for modeling an exponential number of scenarios [11, 23, 17]. These two settings for $\mathcal{A}$ arise for other problems as well (where $\mathcal{C}$ is a suitable ground set).

In addition to $L$ being the $L_{\infty}$ or $\frac{1}{2}L_{1}$ metrics, we can consider various ways of defining a scenario metric $\ell$ in terms of the underlying assignment-cost metric $w$ to capture that two scenarios involving demand locations in the same vicinity are deemed similar; lifting these scenario metrics to the Wasserstein metric over distributions yields a rich class of DR 2-stage facility location models. For instance, we can define the asymmetric metric $\ell^{\mathsf{asym}}_{\infty}(A,A^{\prime}):=\max_{j^{\prime}\in A^{\prime}}w(j^{\prime},A)$ , where $w(j^{\prime},A):=\min_{j\in A}w_{j^{\prime}j}$ , which measures the maximum separation between clients in $A^{\prime}$ and locations in $A$ (the resulting Wasserstein metric $L_{\mathrm{W}}$ will now be an asymmetric metric on distributions). (There are other natural scenario metrics: the asymmetric metric $\ell^{\mathsf{asym}}_{1}(A,A^{\prime}):=\sum_{j^{\prime}\in A^{\prime}}w(j^{\prime},A)$ , and the symmetrizations of these asymmetric metrics:

Our results.

Our main result pertains to Wasserstein metrics, which have a great deal of modeling power. Let $L_{\mathrm{W}}$ be the Wasserstein metric with respect to a scenario metric $\ell$ . To gain mathematical traction, it will be convenient to move to a relaxation of the DR 2-stage problem where we allow fractional second-stage decisions. Let $g(x,A)$ be the optimal second-stage cost of scenario $A$ given $x$ as the first-stage actions when we allow fractional second-stage actions. (We will obtain integral second-stage actions by rounding an optimal solution to $g(x,A)$ using an LP-relative $\alpha$ -approximation algorithm for the deterministic problem.)

We relate the approximability of the DR problem to that of known tasks in 2-stage-stochastic- and deterministic- optimization, and the following deterministic problem:

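$g(x,y,A):=\max_{A^{\prime}\in\mathcal{A}}\bigl(g(x,A^{\prime})-y\cdot\ell(A,A^{\prime})\bigr)$, given a first-stage decision $x\in X$, scenario $A\in\mathcal{A}$, and $y\geq 0$.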

Notice that $g(x,y,A)$ ties together three distinct sources of complexity in the DR 2-stage problem: the combinatorial complexity of the underlying optimization problem, captured by $g(x,A^{\prime})$ ; the complexity of the scenario set $\mathcal{A}$ ; and the complexity of the scenario metric $\ell$ , captured by the $y\cdot\ell(A,A^{\prime})$ term.
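The interplay of these three ingredients can be seen on a toy instance by evaluating $g(x,y,A)=\max_{A^{\prime}\in\mathcal{A}}\bigl(g(x,A^{\prime})-y\cdot\ell(A,A^{\prime})\bigr)$ by exhaustive search. The Python sketch below is only an illustration: the per-scenario cost (total weight of uncovered elements), the symmetric-difference scenario metric, and all names and numbers are assumptions chosen for the example, and brute force is of course not the approximation algorithm the framework asks for.

```python
from itertools import combinations

def k_bounded_scenarios(ground, k):
    """All subsets of the ground set with at most k elements."""
    items = sorted(ground)
    return [frozenset(c) for size in range(k + 1)
            for c in combinations(items, size)]

def sym_diff_metric(A, B):
    """Hypothetical scenario metric: size of the symmetric difference."""
    return len(A ^ B)

def make_toy_cost(weights, covered):
    """Toy stand-in for g(x, .): weight of scenario elements not covered by x."""
    def cost(B):
        return sum(weights[e] for e in B if e not in covered)
    return cost

def g_xyA(cost, y, A, scenarios, ell):
    """Brute-force value and maximizer of max_{A'} cost(A') - y * ell(A, A')."""
    best = max(scenarios, key=lambda B: cost(B) - y * ell(A, B))
    return cost(best) - y * ell(A, best), best

weights = {1: 5, 2: 3, 3: 1}
cost = make_toy_cost(weights, covered={1})       # first stage covers element 1
scenarios = k_bounded_scenarios(weights, k=2)
A = frozenset({3})

value_y0, worst_y0 = g_xyA(cost, 0, A, scenarios, sym_diff_metric)
value_y2, worst_y2 = g_xyA(cost, 2, A, scenarios, sym_diff_metric)
value_y5, worst_y5 = g_xyA(cost, 5, A, scenarios, sym_diff_metric)
```

At $y=0$ the maximizer is simply the most expensive scenario `{2, 3}`; as $y$ grows, moving away from $A$ is penalized more heavily, and at $y=5$ the maximizer collapses back to $A=\{3\}$ itself.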

Theorem 1 (Combination of Theorems 3.5 and 3.7).

Suppose that we have the following.

(1) A $(\beta_{1},\beta_{2})$-approximation algorithm for computing $g(x,y,A)$, which is an algorithm that given $(x,y,A)\in X\times\mathbb{R}_{+}\times\mathcal{A}$ returns $\overline{A}\in\mathcal{A}$ such that $g(x,\overline{A})-y\cdot\ell(A,\overline{A})\geq\max_{A^{\prime}\in\mathcal{A}}\bigl(\frac{g(x,A^{\prime})}{\beta_{1}}-\beta_{2}\cdot y\cdot\ell(A,A^{\prime})\bigr)$;

(2) A local $\rho$-approximation algorithm for the underlying 2-stage problem, which is an algorithm that rounds a fractional first-stage solution to an integral one while incurring at most a $\rho$-factor blowup in the first-stage cost, and in the cost of each scenario; and

(3) An LP-relative $\alpha$-approximation algorithm for the underlying deterministic problem.

Then we can obtain an $O(\alpha\beta_{1}\beta_{2}\rho+\varepsilon)$-approximation for the DR problem in time $\operatorname{\mathsf{poly}}\bigl(\text{input size},\frac{\lambda}{\varepsilon}\bigr)$.

Ingredients (2) and (3) can be obtained using known results for 2-stage-stochastic- and deterministic- optimization; ingredient (1) is the new component we need to supply to instantiate Theorem 1 and obtain results for specific DR 2-stage problems. (The non-standard notion of approximation for $g(x,y,A)$ is necessary, as the mixed-sign objective precludes any guarantee under the standard notion of approximation; see Theorem 3.12.) In various settings, we show that a $(\beta_{1},\beta_{1})$ -approximation for $g(x,y,A)$ can be obtained by utilizing results for the simpler $\max$ - $\min$ problem— $\max_{A^{\prime}\in\mathcal{A}}g(x,A^{\prime})$ (i.e., $g(x,0,A)$ )—encountered in 2-stage robust optimization (see the proof of Theorem 3.14 in Section 3.3.6): in the $k$ -bounded setting, where $\mathcal{A}=\mathcal{A}_{\leq k}$ , this is called the $k$ - $\max$ - $\min$ problem [11, 23, 17]. In particular, this applies to the $\frac{1}{2}L_{1}$ -metric, as in this case we have $g(x,y,A)=\max\{g(x,A),\max_{A^{\prime}\in\mathcal{A}}g(x,A^{\prime})-y\}$ .
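For the $\frac{1}{2}L_{1}$ metric, the identity $g(x,y,A)=\max\{g(x,A),\max_{A^{\prime}\in\mathcal{A}}g(x,A^{\prime})-y\}$ and the reformulation $\min_{y\geq 0}\ ry+\operatorname{E}_{A\sim\mathring{p}}[g(x,y,A)]$ of the worst-case expected second-stage cost (listed among the paper's equations) can be sanity-checked numerically. The following Python sketch fixes $x$, represents $g(x,\cdot)$ by a table of toy per-scenario costs, and compares the reformulation against a direct greedy computation of the worst case over the total-variation ball. The function names, the scenario labels, and the numbers are assumptions for this illustration.

```python
from fractions import Fraction

def g_tv(g, y, a):
    """g(x, y, A) from the definition, using the discrete scenario metric."""
    return max(g[b] - (0 if a == b else y) for b in g)

def worst_case_tv(p, g, r):
    """max_q E_q[g] over {q : (1/2)||p - q||_1 <= r}: move up to r units of
    mass from the cheapest scenarios onto the most expensive one."""
    top = max(g, key=g.get)
    q = dict(p)
    budget = r
    for a in sorted(g, key=g.get):       # cheapest scenarios first
        if a == top or budget <= 0:
            continue
        moved = min(q[a], budget)
        q[a] -= moved
        q[top] += moved
        budget -= moved
    return sum(q[a] * g[a] for a in g)

def dual_value(p, g, r):
    """min_{y >= 0} r*y + E_{A~p}[g(x, y, A)]. The objective is convex and
    piecewise linear in y, so it suffices to try y = 0 and the breakpoints."""
    g_max = max(g.values())
    candidates = {0} | {g_max - v for v in g.values()}
    return min(r * y + sum(p[a] * g_tv(g, y, a) for a in g)
               for y in candidates)

# Toy data: per-scenario costs g(x, A) for a fixed x, central distribution p,
# and radius r of the total-variation ball.
g = {"a": 1, "b": 3, "c": 10}
p = {"a": Fraction(1, 2), "b": Fraction(3, 10), "c": Fraction(1, 5)}
r = Fraction(3, 5)

primal = worst_case_tv(p, g, r)
dual = dual_value(p, g, r)
```

On this instance the adversary moves all of scenario `a`'s mass and part of `b`'s mass onto `c`, and both computations give 43/5. The sketch relies on exact rational arithmetic so that the two values can be compared with `==`.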

Corollary 1.

Consider a DR 2-stage problem where the Wasserstein metric $L_{\mathrm{W}}$ is the $\frac{1}{2}L_{1}$ metric. Suppose that we have a $\beta$-approximation for the problem $\max_{A^{\prime}\in\mathcal{A}}g(x,A^{\prime})$ (given $x\in X$ as input), and we have ingredients (2) and (3) in Theorem 1. Then we can obtain an $O(\alpha\beta\rho+\varepsilon)$-approximation for the DR problem in time $\operatorname{\mathsf{poly}}\bigl(\text{input size},\frac{\lambda}{\varepsilon}\bigr)$.

Theorem 1 (to a partial extent) and Corollary 1 thus provide novel, useful reductions from DR 2-stage optimization to 2-stage {stochastic, robust} (and deterministic) optimization. (For instance, [15] devise approximations for the $\max$ - $\min$ problem in Corollary 1 (i.e., $\max_{A^{\prime}\in\mathcal{A}}g(x,A^{\prime})$ ) for scenario sets defined by matroid-independence and/or knapsack constraints; Corollary 1 enables us to export these guarantees to the corresponding DR 2-stage problem with the $\frac{1}{2}L_{1}$ metric.) In some cases, we can improve upon the guarantees in Theorem 1. For certain covering problems, [35] showed how to obtain $\rho=2\alpha$ via a decoupling idea; by incorporating this idea within our reduction, we can improve the guarantee in Theorem 1 and obtain an $O(\beta_{1}\beta_{2}\rho+\varepsilon)$ -approximation (see “Set cover” in Section 3.3).

We demonstrate the versatility of our framework by applying Theorem 1 and these refinements to obtain guarantees for the DR versions of set cover, vertex cover, edge cover, facility location, and Steiner tree (Section 3.3). These constitute the majority of problems investigated for 2-stage optimization. Our strongest results are for facility location, vertex cover, and edge cover; for Steiner tree, we obtain results in the unrestricted setting. Table 1 summarizes these results.

Technical takeaways for DR problems with Wasserstein metrics.

The reduction in Theorem 1 is obtained by supplementing tools from 2-stage {stochastic, robust} optimization with various additional ideas. Its proof consists of two main components, both of which are of independent interest.

$\bullet$ **Sample average approximation (SAA) for DR problems.**

In Section 3.1, we prove that a simple and appealing approach in stochastic optimization called the SAA method can be applied to reduce the DR problem to the setting where $\mathring{p}$ has a polynomial-size support. In the SAA method, we draw some $N$ samples to estimate $\mathring{p}$ by its empirical distribution $\widehat{p}$ , and solve the distributionally robust problem for $\widehat{p}$ . We show that (roughly speaking) by taking $N=\operatorname{\mathsf{poly}}\bigl{(}\text{input size},\frac{\lambda}{\varepsilon}\bigr{)}$ samples, we can ensure that a $\beta$ -approximate oracle for the SAA objective value can be combined with a $\rho$ -approximation algorithm for the SAA problem, to obtain an $O(\beta\rho+\varepsilon)$ -approximate solution to the original problem, with high probability (see Theorem 3.5). It is well known that $\Omega(\lambda)$ samples are needed even for (standard) 2-stage stochastic problems in the black-box model [35]. Our SAA result substantially expands the scope of problems for which the SAA method is known to be effective (with $\operatorname{\mathsf{poly}}(\text{input size},\lambda)$ sample size). Previously, such results were known for the special case of 2-stage stochastic problems [4, 38] (see also [24]), and multi-stage stochastic problems with a constant number of stages [38] (for $\beta,\rho=1$ ).

Proving our SAA result requires augmenting the SAA machinery for 2-stage stochastic problems [4, 38] with various new ingredients to deal with the challenges presented by DR problems. We elaborate in Section 3.1.

$\bullet$ **Solving the polynomial-size central-distribution case.**

Complementing the above SAA result, we show how to approximately solve the DR 2-stage problem with a polynomial-size central distribution $\widehat{p}$ (Section 3.2). It is natural to move to a fractional relaxation of the problem, by replacing the first-stage set $X$ by a suitable polytope $\mathcal{P}\supseteq X$ . In stark contrast with 2-stage {stochastic, robust} optimization, where the fractional relaxation of the polynomial-scenario problem immediately gives a polynomial-size LP and is therefore straightforward to solve in polytime, it is substantially more challenging to even approximately solve the fractional DR problem with a polynomial-size central distribution. In fact, this is perhaps the technically more-challenging part of the paper. The crux of the problem is that, while $\widehat{p}$ has polynomial-size support, there are (numerous) distributions $q$ in $\mathcal{D}$ that have exponential-size support, and one needs to optimize over such distributions. In particular, if we use duality to reformulate the problem $\max_{q:L_{\mathrm{W}}(\widehat{p},q)\leq r}{\textstyle\operatorname*{E}_{A\sim q}}\bigl{[}g(x,A)\bigr{]}$ as a minimization LP, this leads to an LP with an exponential number of both constraints and variables (see the discussion in Section 3.2). Thus, while we started with a polynomial-support central distribution, we have ended up in a situation similar to that in 2-stage stochastic or robust optimization with an exponential number of scenarios!

To surmount these obstacles, we work with the convex program $\min_{x\in\mathcal{P}}h({\widehat{p}}\,;{x})$ , and solve this approximately by leveraging the ellipsoid-based machinery in [35] (see Theorem 3.7). Not surprisingly, this poses various fresh difficulties, chiefly because we are unable to compute approximate subgradients as required by [35]. We delve into these issues, and the ideas needed to overcome them in Section 3.2.

Approximating $g(x,y,A)$ .

We use the following natural strategy: “guess” $\mu=\ell(A,A^{*})$ for the optimal $A^{*}$ , possibly within a $(1+\varepsilon)$ -factor, and solve the constrained problem ( $\Phi(x,{\mu},A)$ ): $\max_{A^{\prime}\in\mathcal{A}:\ell(A,A^{\prime})\leq\mu}g(x,A^{\prime})$ . It is easy to show that a $\beta$ -approximation to ( $\Phi(x,{\mu},A)$ ) yields a $\beta(1+\varepsilon)$ -approximation for $g(x,y,A)$ (Lemma 3.25). In the unrestricted setting ( $\mathcal{A}=2^{U}$ ), we will usually be able to solve ( $\Phi(x,{\mu},A)$ ) exactly, exploiting the fact that our problems are covering problems. In the $k$ -bounded setting, we cast ( $\Phi(x,{\mu},A)$ ) as a $k$ - $\max$ - $\min$ problem (note that $x$ is integral), and utilize known results for this problem.

For $\mathsf{DRSFL}$ , the result by [23] requires creating co-located clients, which does not work for us. We illuminate a novel connection between cost-sharing schemes and $k$ - $\max$ - $\min$ problems by showing that a cost-sharing scheme for FL having certain properties can be leveraged to obtain an approximation algorithm for $k$ - $\max$ - $\min$ {integral, fractional} FL (see the proof of Theorem 3.20). In doing so, we also end up improving the approximation factor for $k$ - $\max$ - $\min$ FL from $10$ [23] to $6$ . Whereas cost-sharing schemes have played a role in 2-stage stochastic optimization, in the context of the boosted-sampling approach of [18], they have not been used previously for $k$ - $\max$ - $\min$ problems. (The approach in [17] has some similar elements, but there is no explicit use of cost shares.) Cost-sharing schemes offer a useful tool for designing algorithms for $k$ - $\max$ - $\min$ problems, one that we believe will find further application.

DR problems with the $L_{\infty}$ metric.

For the $L_{\infty}$ metric (Section 4), we directly consider the fractional relaxation of the problem. As with the Wasserstein metric, even for a polynomial-scenario central distribution, solving the resulting problem is quite challenging since it (again) leads to an LP with exponentially many variables and constraints. We move to a proxy objective that is pointwise close to the true objective, and show that an $\omega$ -subgradient of the proxy objective can be computed efficiently at any point, even for $\omega=1/\operatorname{\mathsf{poly}}(\text{input size})$ . This enables us to use the algorithm in [35] to solve the fractional problem; rounding this solution using a local approximation algorithm yields results for the DR discrete 2-stage problem. Table 1 lists the results we obtain for the $L_{\infty}$ metric as well.

1.2 Related work

Stochastic optimization is a field with a vast amount of literature (see, e.g., [3, 31, 33]), but its study from an approximation-algorithms perspective is relatively recent. Various approximation results have been obtained in the 2-stage recourse model over the last 15 years in the CS and Operations-Research (OR) literature (see, e.g., [37]), but more general models, such as distributionally robust stochastic optimization, have received little or no attention in this regard.

To the best of our knowledge, with the exception of [1], which we discuss below, there are no prior approximation algorithms for distributionally robust 2-stage discrete optimization problems, when the number $|\mathcal{A}|$ of possible scenarios is (finite, but) exponentially large (even if $\mathring{p}$ has polynomial-size support). Much of the work in the stochastic-optimization and OR literature on these problems has focused on proving suitable duality results that sometimes allow one to reformulate the DR problem more compactly. Moreover, in many cases, the results obtained are for continuous scenario spaces and with other assumptions about the recourse costs. For instance, [9, 13, 41, 20] all consider the setting where $\mathcal{D}$ is a ball in the Wasserstein metric, and provide a closed-form description of the worst-case distribution in $\mathcal{D}$ , which is then used to reformulate the DR problem under further convexity assumptions of the scenario collection $\mathcal{A}$ . DR problems have gained attention in recent years due to their usefulness in inferring decisions from observed data while avoiding the risk of overfitting: here $\mathcal{D}$ is used to model a class of distributions from which the observed data could arise (with high confidence). Various works have advocated the use of a Wasserstein ball around the empirical distribution $\widehat{p}$ for this purpose [9, 41, 13, 29], but there are no results proving polynomial bounds on the number of samples needed in order to produce provably-good results. Note that these works, by definition, consider the setting where the central distribution has polynomial-size support. The distributionally robust setting has also been considered for chance-constrained problems; see, e.g. [8] and the references therein.

The work of [1] in the CS literature on correlation gap can be interpreted as studying distributionally robust discrete-optimization problems, but in a very different setting where $\mathcal{D}$ is not a ball. Instead, $\mathcal{D}$ is the collection of distributions that agree with some given expected values; the correlation gap quantifies the worst-case ratio of the DR objective when one chooses the optimal decisions with respect to the distribution in $\mathcal{D}$ that treats all random variables as independent, versus the optimum of the DR problem. Agrawal et al. [1] proved various $O(1)$ bounds on the correlation gap for submodular functions and subadditive functions admitting suitable cost shares. Various other works (see, e.g., [5, 30] and the references therein) have considered such moment-based collections, but again under continuity and/or convexity assumptions about the scenario space and/or recourse costs.

We now briefly survey the work on approximation algorithms under the stochastic- and robust- optimization models, which the DR model generalizes. As noted above, various approximation results have been obtained for 2-stage, and even multistage problems. In the black-box model, a common approach is the SAA method, which simply consists of solving the stochastic-optimization problem for the empirical distribution $\widehat{p}$ obtained by sampling. The effectiveness of this method has been analyzed both for 2-stage stochastic problems [24, 4, 38] and multi-stage stochastic problems [38]. The sample-complexity bound in [24] is a non-polynomial bound for general 2-stage stochastic problems, whereas [4, 38] both obtain $\operatorname{\mathsf{poly}}(\text{input size},\lambda)$ bounds for structured problems. The proof in [38] applies also to structured multistage linear programs, and [4] show that even approximate solutions to the 2-stage SAA problem translate to approximate solutions to the original 2-stage problem. We build upon the SAA machinery of Charikar et al. [4]. Previously, Shmoys and Swamy [35] showed how to use the ellipsoid method to solve structured 2-stage linear programs in the black-box model, and how to round the resulting fractional solution. We utilize their machinery based on approximate subgradients to solve the polynomial-scenario central-distribution setting. Approximation algorithms for 2-stage problems have also been developed via combinatorial means. The prominent technique here is the boosted sampling technique of Gupta et al. [18]; the survey [37] gives a detailed description of these and other approximation results for 2-stage optimization.

Two-stage robust optimization where uncertainty is reflected in the constraints and not the data was proposed in [6], who devised approximation algorithms for various problems in the polynomial-scenario setting. Notice that it is not clear how to even specify problems with exponentially many scenarios in the robust model. Feige et al. [11] expanded the model of [6] by considering what we call the $k$ -bounded setting, where every subset of at most $k$ elements is a scenario. Subsequently, [23] and [17] expanded the collection of results known for 2-stage robust problems in the $k$ -bounded setting. We utilize results for the closely-related $k$ - $\max$ - $\min$ problem encountered in this setting in our work.

We briefly discuss a few other snippets that consider intermediary approaches between stochastic and robust optimization. Swamy [39] considers a model for risk-averse 2-stage stochastic optimization that interpolates between the stochastic and robust optimization approaches. In the context of online algorithms, Mirrokni et al. [27] and Esfandiari et al. [10] give online algorithms for allocation problems that are simultaneously competitive both in a random input model and in an adversarial input model. Finally, we note that our distributionally robust setting can be seen to be in a similar spirit as a recent focus in algorithmic mechanism design, where one does not assume precise knowledge of the underlying distribution; rather one (implicitly) has a collection of distributions, and one seeks to design mechanisms that work for every distribution in this collection; see, e.g., [21].

2 Problem definitions, and our general class of DR 2-stage problems

Recall that we consider settings where we have a ball $\mathcal{D}=\{q:L(\mathring{p},q)\leq r\}$ of distributions (over the scenario-collection $\mathcal{A}$ ) around a central distribution $\mathring{p}$ under some metric $L$ on distributions, and we seek to minimize the maximum expected cost with respect to a distribution in $\mathcal{D}$ . As mentioned earlier, we make no assumptions about $\mathring{p}$ , and only require the ability to draw samples from $\mathring{p}$ . The metrics that we consider for $L$ are the $L_{\infty}$ metric, $\frac{1}{2}L_{1}$ metric, and the Wasserstein metric. We now define Wasserstein metrics precisely.

Definition 2.1 (Wasserstein (a.k.a. transportation or earth-mover) distance).

The Wasserstein distance between two probability distributions $p$ and $q$ over $\mathcal{A}$ is defined with respect to an underlying metric $\ell$ on $\mathcal{A}$ . A transportation plan or flow from $p$ to $q$ is a vector $\gamma\in\mathbb{R}_{+}^{\mathcal{A}\times\mathcal{A}}$ such that: (i) $\sum_{A^{\prime}\in\mathcal{A}}\gamma_{A,A^{\prime}}=p_{A}$ for all $A\in\mathcal{A}$ ; and (ii) $\sum_{A\in\mathcal{A}}\gamma_{A,A^{\prime}}=q_{A^{\prime}}$ for all $A^{\prime}\in\mathcal{A}$ . The Wasserstein distance between $p$ and $q$ , denoted $L_{\mathrm{W}}(p,q)$ , is the minimum value of $\sum_{A,A^{\prime}}\gamma_{A,A^{\prime}}\ell(A,A^{\prime})$ over all transportation plans from $p$ to $q$ .

If $\ell$ is an asymmetric metric, then $L_{\mathrm{W}}$ is an asymmetric metric; if $\ell$ is a pseudometric (i.e., $\ell$ satisfies the triangle inequality but $\ell(A,A^{\prime})$ could be $0$ for $A\neq A^{\prime}$ ), then so is $L_{\mathrm{W}}$ .
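To make Definition 2.1 concrete, the following minimal sketch checks conditions (i) and (ii) for a candidate transportation plan and evaluates its cost, which upper-bounds $L_{\mathrm{W}}(p,q)$. The helper names, the toy distributions, and the choice of the discrete metric for $\ell$ are assumptions made for illustration; computing the exact distance would additionally require minimizing over all plans (a linear program), which is not attempted here.

```python
def is_transport_plan(gamma, p, q, tol=1e-9):
    """Check conditions (i) and (ii) of the definition for a candidate plan.

    gamma: dict mapping (A, A') -> mass moved from A to A' (must be >= 0).
    p, q:  dicts mapping scenario -> probability.
    """
    if any(mass < -tol for mass in gamma.values()):
        return False
    scenarios = set(p) | set(q) | {a for a, _ in gamma} | {b for _, b in gamma}
    for A in scenarios:
        out_mass = sum(m for (a, _), m in gamma.items() if a == A)  # row sum
        in_mass = sum(m for (_, b), m in gamma.items() if b == A)   # column sum
        if abs(out_mass - p.get(A, 0.0)) > tol:
            return False
        if abs(in_mass - q.get(A, 0.0)) > tol:
            return False
    return True


def plan_cost(gamma, ell):
    """Cost sum_{A,A'} gamma[A,A'] * ell(A,A'); an upper bound on L_W(p, q)."""
    return sum(m * ell(a, b) for (a, b), m in gamma.items())


def discrete(a, b):
    """Discrete scenario metric (illustrative choice of ell)."""
    return 0.0 if a == b else 1.0


# Toy example: scenarios are subsets of a ground set {u, v}.
S1, S2 = frozenset({"u"}), frozenset({"u", "v"})
p = {S1: 0.7, S2: 0.3}
q = {S1: 0.5, S2: 0.5}
gamma = {(S1, S1): 0.5, (S1, S2): 0.2, (S2, S2): 0.3}

assert is_transport_plan(gamma, p, q)
assert abs(plan_cost(gamma, discrete) - 0.2) < 1e-12
```

In this toy example the plan moves only the excess mass $0.2$ from one scenario to the other, so its cost coincides with $\frac{1}{2}\|p-q\|_{1}=0.2$.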

In Section 3.3, we consider the DR versions of set cover (and some special cases), facility location, and Steiner tree. DR 2-stage facility location ( $\mathsf{DRSFL}$ ) was defined in Section 1.1; we define the remaining problems below, and then discuss the general class of DR 2-stage problems to which our framework applies. Recall that $\mathcal{I}$ denotes the input size.

$\bullet$

DR 2-stage set cover ( $\mathsf{DRSSC}$ ). We have a collection $\mathcal{S}$ of subsets over a ground set $U$ . A scenario is a subset of $U$ and specifies the set of elements to be covered in that scenario. We may buy a set $S\in\mathcal{S}$ in either stage, incurring costs of $c_{S}$ and $c^{\mathrm{II}}_{S}$ in stages I and II respectively. The sets chosen in stage I and in each scenario $A$ must together cover $A$ . The goal is to choose some first-stage sets $\mathcal{S}^{\mathrm{I}}\subseteq\mathcal{S}$ and sets $\mathcal{S}^{A}\subseteq\mathcal{S}$ in each scenario $A$ so as to minimize $\sum_{S\in\mathcal{S}^{\mathrm{I}}}c_{S}+\max_{q:L(\mathring{p},q)\leq r}{\textstyle\operatorname*{E}_{A\sim q}}\bigl{[}\sum_{S\in\mathcal{S}^{A}}c^{\mathrm{II}}_{S}\bigr{]}$ .

We have $\lambda:=\max\{1,\max_{S\in\mathcal{S}}c^{\mathrm{II}}_{S}/c_{S}\}$ , and $\mathcal{I}$ is the encoding size of $\bigl{(}U,\mathcal{S},c,c^{\mathrm{II}},r\bigr{)}$ . We consider the unrestricted ( $\mathcal{A}=2^{U}$ ) and $k$ -bounded ( $\mathcal{A}=\{A\subseteq U:|A|\leq k\}$ ) settings. Different scenarios could be quite unrelated, so there does not seem to be a natural choice for a (non-discrete) scenario-metric; we therefore consider (balls in) the $L_{\infty}$ or $\frac{1}{2}L_{1}$ metrics.

$\bullet$

DR 2-stage Steiner tree ( $\mathsf{DRSST}$ ). We have a complete graph $G=(V,E)$ with metric edge costs $\{c_{e}\}_{e\in E}$ , root $s\in V$ , and inflation factor $\lambda\geq 1$ . A scenario is a subset of nodes $A\subseteq V$ (called terminals) specifying the nodes that need to be connected to $s$ . We may buy an edge $e\in E$ in stages I or II, incurring costs $c_{e}$ or $c^{\mathrm{II}}_{e}=\lambda c_{e}$ respectively. The union of the edges $F\subseteq E$ bought in stage I, and $F^{A}\subseteq E$ bought in scenario $A$ , must connect all nodes in $A$ to $s$ , and we want to minimize $\sum_{e\in F}c_{e}+\max_{q:L(\mathring{p},q)\leq r}{\textstyle\operatorname*{E}_{A\sim q}}\bigl{[}\sum_{e\in F^{A}}c^{\mathrm{II}}_{e}\bigr{]}$ . (With non-uniform inflation factors for different edges, even 2-stage stochastic Steiner tree becomes at least as hard as group Steiner tree [32].)

Here $\mathcal{I}$ is the encoding size of $(G,c,r)$ . We obtain results in the unrestricted setting, and leave the $k$ -bounded setting for future work. As with $\mathsf{DRSFL}$ , in addition to the $L_{\infty}$ and $\frac{1}{2}L_{1}$ metrics, we can consider scenario metrics defined using $c$ (e.g., $\ell^{\mathsf{asym}}_{\infty}$ ) and the resulting Wasserstein metrics.

A general class of DR 2-stage problems.

Abstracting away the key properties of $\mathsf{DRSFL}$ , $\mathsf{DRSSC}$ , $\mathsf{DRSST}$ , we now define the generic DR 2-stage problem that we consider. As before, $X$ denotes the finite first-stage action set of the discrete problem. It will be convenient to consider the natural fractional relaxation of the DR problem obtained by enlarging the discrete second-stage action set and $X$ to suitable polytopes. Recall that $g(x,A)$ is the optimal second-stage cost of scenario $A$ given $x$ as the first-stage decision, when we allow fractional second-stage actions. Let $\mathcal{P}\subseteq\mathbb{R}_{+}^{m}$ denote the polytope specifying the fractional first-stage decisions, with $X=\mathcal{P}\cap\mathbb{Z}^{m}$ . (For example, for $\mathsf{DRSSC}$ , $g(x,A)$ is the optimal value of a set-cover LP where we may buy sets fractionally in the second stage, and $\mathcal{P}=[0,1]^{m}$ .) One benefit of moving to the fractional relaxation is that, for every scenario $A$ , $g(x,A)$ is a convex function of $x$ , whose value and subgradient can be exactly computed.

Definition 2.2.

Let $f:\mathbb{R}^{m}\rightarrow\mathbb{R}$ be a function. We say that $d\in\mathbb{R}^{m}$ is a subgradient of $f$ at $u\in\mathbb{R}^{m}$ if we have $f(v)-f(u)\geq d\cdot(v-u)$ for all $v\in\mathbb{R}^{m}$ . Given $S\subseteq\mathbb{R}^{m}$ , we say that $\widehat{d}$ is an $(\omega,S)$ -subgradient of $f$ at the point $u\in S$ if for every $v\in S$ , we have $f(v)-f(u)\geq\widehat{d}\cdot(v-u)-\omega f(u)$ . We abbreviate $(\omega,\mathcal{P})$ -subgradient to $\omega$ -subgradient.
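Definition 2.2 can be illustrated numerically. The sketch below uses a piecewise-linear convex function $f(u)=\max_{i}(a_{i}\cdot u+b_{i})$, for which the slope of any piece attaining the maximum at $u$ is a subgradient, and tests the $(\omega,S)$-subgradient inequality on a finite sample of points standing in for $S$. The function, the pieces, and the sampling scheme are all hypothetical; a finite sample can refute the inequality but cannot prove it over all of $S$.

```python
import random


def f(u, pieces):
    """Convex piecewise-linear function f(u) = max_i (a_i . u + b_i)."""
    return max(sum(ai * ui for ai, ui in zip(a, u)) + b for a, b in pieces)


def subgradient(u, pieces):
    """Return the slope a_i of a piece attaining the maximum at u."""
    a, _ = max(
        pieces,
        key=lambda ab: sum(ai * ui for ai, ui in zip(ab[0], u)) + ab[1],
    )
    return a


def is_omega_subgradient(d, u, points, pieces, omega=0.0, tol=1e-9):
    """Test f(v) - f(u) >= d.(v - u) - omega * f(u) for every sampled v."""
    fu = f(u, pieces)
    return all(
        f(v, pieces) - fu
        >= sum(di * (vi - ui) for di, vi, ui in zip(d, v, u)) - omega * fu - tol
        for v in points
    )


random.seed(0)
pieces = [((1.0, 0.0), 0.0), ((0.0, 2.0), 1.0), ((-1.0, -1.0), 3.0)]
u = (0.5, 0.5)
sample = [(random.random(), random.random()) for _ in range(200)]  # stand-in for S

d = subgradient(u, pieces)
assert is_omega_subgradient(d, u, sample, pieces)  # exact subgradient, omega = 0
assert not is_omega_subgradient((5.0, 5.0), u, sample, pieces)
```

Setting `omega` to a positive value relaxes the test by the multiplicative slack $\omega f(u)$, which is the form of approximation that the ellipsoid-based machinery of [35] works with.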

Following [4, 35, 38], we consider the following generic DR 2-stage problem (Q ${}_{\mathring{p}}$ ) with discrete first-stage set $X$ , and its (further) fractional relaxation (Q ${}^{\mathrm{fr}}_{\mathring{p}}$ ), and require that they satisfy properties (P1)–(P6) listed below. Let $\|u\|$ denote the $L_{2}$ -norm of $u$ .

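$$\min_{x\in X}\quad h({\mathring{p}}\,;{x})\ :=\ c^{\intercal}x+\max_{q:L(\mathring{p},q)\leq r}{\textstyle\operatorname*{E}_{A\sim q}}\bigl[g(x,A)\bigr]\tag{Q$_{\mathring{p}}$}$$

$$\min_{x\in\mathcal{P}}\quad h({\mathring{p}}\,;{x})\tag{Q$^{\mathrm{fr}}_{\mathring{p}}$}$$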

In proving their SAA result for 2-stage stochastic problems, [4] define properties (P1), (P2) below to capture the fact that every first-stage action has a corresponding recourse action that is more expensive by a bounded factor, and hence, it is always feasible to not take any first-stage actions.

(P1)

$0\in X$ , $c\geq 0$ , $\log|X|=\operatorname{\mathsf{poly}}(\mathcal{I})$ , and $0\leq g(x,A)\leq g(0,A)$ for all $x\in\mathcal{P},A\in\mathcal{A}$ .

(P2)

We know an inflation parameter $\lambda\geq 1$ such that $g(0,A)\leq g(x,A)+\lambda c^{\intercal}x$ for all $x\in\mathcal{P},A\in\mathcal{A}$ .

Since we apply the ellipsoid-based machinery in [35] to solve the fractional problem with a polynomial-size central distribution, we need bounds on the feasible region $\mathcal{P}$ in terms of enclosing and enclosed balls; this is captured by (P3), which is directly lifted from [35]. Note that the vast majority of 2-stage problems (including $\mathsf{DRSFL}$ , $\mathsf{DRSSC}$ , $\mathsf{DRSST}$ ) involve $\{0,1\}$ decisions, with $X=\{0,1\}^{m}$ and so $\mathcal{P}=[0,1]^{m}$ , so (P3) is readily satisfied. As in [35], we need to be able to compute the value and subgradient of the recourse cost $g(x,A)$ , which is a benign requirement since $g(x,A)$ is the optimal value of a polytime-solvable LP in all our applications. Whereas [35] define a syntactic class of 2-stage stochastic LPs and show (implicitly) that they satisfy this requirement, we explicitly isolate this requirement in (P4), (P5).

(P3)

We have positive bounds $R$ and $V\leq 1$ such that $\mathcal{P}\subseteq B(0,R):=\{x:\|x\|\leq R\}$ and $\mathcal{P}$ contains a ball of radius $V$ such that $\ln\bigl{(}\frac{R}{V}\bigr{)}=\operatorname{\mathsf{poly}}(\mathcal{I})$ .

(P4)

For every $A\in\mathcal{A}$ , $g(x,A)$ is convex over $\mathcal{P}$ , and can be efficiently computed for every $x\in\mathcal{P}$ .

(P5)

For every $x\in\mathcal{P},A\in\mathcal{A}$ , we can efficiently compute a subgradient $d$ of $g(x,A)$ at $x$ with $\|d\|\leq K$ , where $\ln K=\operatorname{\mathsf{poly}}(\mathcal{I})$ . Hence, the Lipschitz constant of $g(x,A)$ is at most $K$ (due to Definition 2.2).

Finally, we need the following additional mild condition.

(P6)

When $L$ is the Wasserstein metric with respect to a scenario metric $\ell$ , we know $\tau\geq 1$ with $\ln\tau=\operatorname{\mathsf{poly}}(\mathcal{I})$ such that $g(x,A^{\prime})-g(x,A)\leq\tau\cdot\ell(A,A^{\prime})$ for all $x\in\mathcal{P}$ and all $(A,A^{\prime})$ with $\ell(A,A^{\prime})>0$ .

As noted above, (P1)–(P5) are gathered from [4, 35], and hold for all the 2-stage problems considered in the CS literature (see [38, 6, 11, 23, 17]); (P6) is a new requirement, but is also rather mild and holds for all the problems we consider. (P1), (P2) and (P6) are used to prove that SAA works for the DR problem under the Wasserstein metric (Section 3.1). (P3)–(P5) pertain to the fractional relaxation, and are utilized to show that one can efficiently solve the SAA problem approximately (Section 3.2).

A solution to (Q ${}_{\mathring{p}}$ ) needs to be rounded to yield integral second-stage actions: any LP-relative $\alpha$ -approximation algorithm for the deterministic version of the problem can be used to obtain recourse actions for each scenario $A$ having cost at most $\alpha\cdot g(x,A)$ . To round a fractional solution to (Q ${}^{\mathrm{fr}}_{\mathring{p}}$ ), we utilize a local approximation algorithm for the 2-stage problem: we say that $\mathsf{Alg}$ is a local $\rho$ -approximation algorithm for (Q ${}^{\mathrm{fr}}_{\mathring{p}}$ ) if, given any $x\in\mathcal{P}$ , it returns an integral solution $\widetilde{x}\in X$ and implicitly specifies integral recourse actions $\widetilde{z}^{A}$ for every $A\in\mathcal{A}$ , such that $c^{\intercal}\widetilde{x}\leq\rho(c^{\intercal}x)$ and $\bigl{(}\text{cost of }\widetilde{z}^{A}\bigr{)}\leq\rho\cdot g(x,A)$ for all $A\in\mathcal{A}$ . An $\alpha$ -approximate solution to (Q ${}^{\mathrm{fr}}_{\mathring{p}}$ ) combined with a local $\rho$ -approximation yields an $\alpha\rho$ -approximate solution to the discrete DR 2-stage problem. Local approximation algorithms exist for various 2-stage problems—e.g., set cover, vertex cover, facility location [35]—with approximation factors that are comparable to the approximation factors known for their deterministic counterparts.
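The $\alpha\rho$ claim follows from a short calculation, sketched here under the assumption that the objective has the form $h(\mathring{p};x)=c^{\intercal}x+\max_{q\in\mathcal{D}}\operatorname{E}_{A\sim q}[g(x,A)]$ (as in the set-cover and Steiner-tree objectives above), that $x\in\mathcal{P}$ is the $\alpha$-approximate fractional solution, and that $(\widetilde{x},\{\widetilde{z}^{A}\}_{A})$ is the output of the local $\rho$-approximation algorithm on $x$:

$$
\begin{aligned}
c^{\intercal}\widetilde{x}+\max_{q\in\mathcal{D}}\operatorname{E}_{A\sim q}\bigl[\text{cost of }\widetilde{z}^{A}\bigr]
&\leq \rho\,c^{\intercal}x+\max_{q\in\mathcal{D}}\operatorname{E}_{A\sim q}\bigl[\rho\cdot g(x,A)\bigr]
= \rho\cdot h(\mathring{p};x)\\
&\leq \alpha\rho\cdot\min_{x^{\prime}\in\mathcal{P}}h(\mathring{p};x^{\prime})
\leq \alpha\rho\cdot\min_{x^{\prime}\in X}h(\mathring{p};x^{\prime}).
\end{aligned}
$$

The first inequality uses that the per-scenario guarantee holds simultaneously for every $A\in\mathcal{A}$, so it survives taking the expectation under any $q\in\mathcal{D}$ and hence the maximum over $q$; the last inequality uses $X\subseteq\mathcal{P}$.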

3 Distributionally robust problems under the Wasserstein metric

We now focus on the DR 2-stage problem (Q ${}_{\mathring{p}}$ ) when $L$ is the Wasserstein metric $L_{\mathrm{W}}$ with respect to a metric $\ell$ on scenarios. Plugging in the definition of $L_{\mathrm{W}}$ (with respect to scenario metric $\ell$ ), we can rewrite (Q ${}_{\mathring{p}}$ ) as follows.

[TABLE]

Let $O^{*}:=\min_{x\in X}h({\mathring{p}}\,;{x})$ denote the optimal value of (Q ${}_{\mathring{p}}$ ). We note that a naive, simplistic approach that ignores the uncertainty in the underlying distribution, and only considers the central distribution $\mathring{p}$ , yields (expectedly) poor bounds. Suppose $\bar{x}$ is an $\alpha$ -approximate solution for the 2-stage problem $\min_{x\in X}\bigl{(}c^{\intercal}x+{\textstyle\operatorname*{E}_{A\sim\mathring{p}}}\bigl{[}g(x,A)\bigr{]}\bigr{)}$ . Given (P6), one can show that $z({\mathring{p}}\,;{\bar{x}})\leq{\textstyle\operatorname*{E}_{A\sim\mathring{p}}}\bigl{[}g(\bar{x},A)\bigr{]}+\tau\cdot r$ (and is at least ${\textstyle\operatorname*{E}_{A\sim\mathring{p}}}\bigl{[}g(\bar{x},A)\bigr{]}$ ), which implies $h({\mathring{p}}\,;{\bar{x}})\leq\alpha\cdot O^{*}+\tau\cdot r$ , but this is too weak a guarantee since $\tau\cdot r$ could be quite large compared to $O^{*}$ .
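The bound on $z(\mathring{p};\bar{x})$ can be verified directly. The following is a sketch, assuming that $z(\mathring{p};x)$ denotes the inner maximum $\max_{q:L_{\mathrm{W}}(\mathring{p},q)\leq r}\operatorname{E}_{A\sim q}[g(x,A)]$ and that $\ell$ is a metric (so that $\ell(A,A^{\prime})=0$ only when $A=A^{\prime}$). Fix any $q$ in the ball and a transportation plan $\gamma$ from $\mathring{p}$ to $q$ of cost at most $r$. Then

$$
\begin{aligned}
\operatorname{E}_{A^{\prime}\sim q}\bigl[g(\bar{x},A^{\prime})\bigr]
&=\sum_{A,A^{\prime}}\gamma_{A,A^{\prime}}\,g(\bar{x},A^{\prime})
\leq\sum_{A,A^{\prime}}\gamma_{A,A^{\prime}}\bigl(g(\bar{x},A)+\tau\cdot\ell(A,A^{\prime})\bigr)\\
&=\operatorname{E}_{A\sim\mathring{p}}\bigl[g(\bar{x},A)\bigr]+\tau\sum_{A,A^{\prime}}\gamma_{A,A^{\prime}}\,\ell(A,A^{\prime})
\leq\operatorname{E}_{A\sim\mathring{p}}\bigl[g(\bar{x},A)\bigr]+\tau\cdot r,
\end{aligned}
$$

where the first inequality applies (P6) to pairs with $\ell(A,A^{\prime})>0$ and is trivial for $A^{\prime}=A$. Since $\mathring{p}$ itself lies in the ball, $\operatorname{E}_{A\sim\mathring{p}}[g(x,A)]\leq z(\mathring{p};x)$ for every $x$, so the optimum of the 2-stage problem is at most $O^{*}$, and therefore $h(\mathring{p};\bar{x})\leq c^{\intercal}\bar{x}+\operatorname{E}_{A\sim\mathring{p}}[g(\bar{x},A)]+\tau\cdot r\leq\alpha\cdot O^{*}+\tau\cdot r$.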

In Section 3.1, we work with (Q ${}_{\mathring{p}}$ ) and show that the SAA approach can be used to reduce to the case where the central distribution has polynomial-size support. In Section 3.2, we show how to approximately solve the polynomial-size support case by applying the ellipsoid method to its (further) relaxation (Q ${}^{\mathrm{fr}}_{\mathring{p}}$ ), where we replace $X$ with $\mathcal{P}$ . Here, we utilize a local approximation algorithm to move from $\mathcal{P}$ to $X$ , and thereby interface with, and complement, the SAA result for (Q ${}_{\mathring{p}}$ ) proved in Section 3.1. This result applies more generally, even when $\ell$ is not a metric; we only require that $\ell(A,A)=0$ for all $A\in\mathcal{A}$ . (If $\ell$ is not a metric, the Wasserstein distance with respect to $\ell$ need not yield a metric on distributions.)

In Section 3.3, we consider various combinatorial-optimization problems, and utilize the above results in conjunction to obtain the first approximation results for the DR versions of these problems.

3.1 A sample-average-approximation (SAA) result for distributionally robust problems

The SAA approach is the following simple, intuitive idea: draw some $N$ samples from $\mathring{p}$ , estimate $\mathring{p}$ by the empirical distribution $\widehat{p}$ induced by these samples, and solve the SAA problem (Q ${}_{\widehat{p}}$ ). We prove the following SAA result. For any $\varepsilon\leq\frac{1}{3}$ , if we construct $O\bigl{(}\frac{1}{\varepsilon}\bigr{)}$ SAA problems, each using $\operatorname{\mathsf{poly}}\bigl{(}\mathcal{I},\frac{\lambda}{\varepsilon},\log(\frac{1}{\eta})\bigr{)}$ independent samples, and if we have a $\beta$ -approximation algorithm for computing the objective value of the SAA problem at any given point, then we can utilize $\rho$ -approximate solutions to these SAA problems to obtain a solution $\widehat{x}\in X$ satisfying $h({\mathring{p}}\,;{\widehat{x}})\leq 4\beta\rho\bigl{(}1+O(\varepsilon)\bigr{)}\cdot O^{*}+2\beta\rho\eta$ with high probability; Theorem 3.5 gives the precise statement.
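The sampling step of the SAA approach can be sketched as follows. The sampler `black_box`, the ground set `U`, and the sample size are hypothetical stand-ins for the black-box access to $\mathring{p}$; the sample size actually required is the polynomial bound stated in Theorem 3.5, which this sketch does not compute.

```python
import random
from collections import Counter


def empirical_distribution(sample_scenario, num_samples):
    """Draw num_samples scenarios from a black-box sampler and return p-hat.

    sample_scenario: zero-argument callable returning a hashable scenario.
    Returns a dict scenario -> empirical probability. Its support has at
    most num_samples scenarios, however large the scenario collection is.
    """
    counts = Counter(sample_scenario() for _ in range(num_samples))
    return {A: c / num_samples for A, c in counts.items()}


# Hypothetical black box: each element of U is present independently w.p. 0.3.
U = ["a", "b", "c", "d"]
rng = random.Random(1)


def black_box():
    return frozenset(e for e in U if rng.random() < 0.3)


p_hat = empirical_distribution(black_box, 500)
assert abs(sum(p_hat.values()) - 1.0) < 1e-9
assert len(p_hat) <= 500
```

The resulting `p_hat` plays the role of $\widehat{p}$: the SAA problem is the DR problem with central distribution `p_hat`, whose support is polynomial in size even when $\mathcal{A}$ is exponentially large.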

The proof has several ingredients. There are two main approaches [4, 38] for showing that the SAA method with a polynomial number of samples works for stochastic-optimization problems. Charikar et al. [4] prove the following SAA result for 2-stage problems.

Theorem 3.1 ([4]).

Consider a 2-stage problem (2St-P) : $\min_{x\in\widetilde{X}}\ \bigl{(}f({p};{x}):=\tilde{c}^{\intercal}x+{\textstyle\operatorname*{E}_{A\sim p}}\bigl{[}\tilde{g}(x,A)\bigr{]}\bigr{)}$ , with scenario set $\tilde{\mathcal{A}}$ , where $(\widetilde{X},\tilde{c},\tilde{g},\tilde{\mathcal{A}})$ satisfy (P1), (P2) with inflation parameter $\Lambda$ . With probability at least $1-\delta$ , any optimal solution to the SAA problem constructed using $\operatorname{\mathsf{poly}}\bigl{(}\log|\widetilde{X}|,\frac{\Lambda}{\varepsilon},\log(\frac{1}{\delta})\bigr{)}$ samples is a $(1+\varepsilon)$ -approximate solution to (2St-P). More generally, there is a way of using an $\alpha$ -approximation algorithm for the SAA problem, in conjunction with a $\beta$ -approximate objective-value oracle for the SAA problem, to obtain an $\bigl{(}\alpha\beta+O(\varepsilon)\bigr{)}$ -approximate solution to (2St-P) with high probability.

Note that (Q ${}_{\mathring{p}}$ ) is not a standard 2-stage stochastic-optimization problem because constraint (2) couples the various scenarios, which prevents us from applying Theorem 3.1 to (Q ${}_{\mathring{p}}$ ). The SAA result in Swamy and Shmoys [38] applies to the fractional relaxation of the problem, and works whenever the objective functions of the SAA and original problems satisfy a certain “closeness-in-subgradients” property. A subgradient of $h({\mathring{p}}\,;{\cdot})$ at a point $x\in\mathcal{P}$ is obtained from the optimal distribution $q$ to the inner maximization problem in (Q ${}_{\mathring{p}}$ ). This is however an exponential-size object and utilizing this to prove closeness-in-subgradients seems quite daunting.

Our first insight is that we can decouple the scenarios by Lagrangifying constraint (2) using a dual variable $y\geq 0$ . By standard duality arguments, this leads to the following reformulation of (Q ${}_{\mathring{p}}$ ).

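$$\min_{x\in X,\ y\geq 0}\quad h({\mathring{p}}\,;{x,y})\ :=\ c^{\intercal}x+r\cdot y+{\textstyle\operatorname*{E}_{A\sim\mathring{p}}}\bigl[g(x,y,A)\bigr]\tag{R$_{\mathring{p}}$}$$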

Recall that $g(x,y,A):=\max_{A^{\prime}\in\mathcal{A}}\bigl{(}g(x,A^{\prime})-y\cdot\ell(A,A^{\prime})\bigr{)}$ . Let $\ell_{\max}:=\max_{A,A^{\prime}}\ell(A,A^{\prime})$ . The chief benefit of the reformulation (R ${}_{\mathring{p}}$ ) is that we can view (R ${}_{\mathring{p}}$ ) as a 2-stage problem: the first-stage action-set is $X\times\mathbb{R}_{+}$ , and the optimal second-stage cost of scenario $A$ under first-stage actions $(x,y)$ is given by $g(x,y,A)$ . This makes it more amenable to utilize the SAA machinery developed for 2-stage problems. We can exploit (P6) to show that we may limit $y$ to the range $[0,\tau]$ in (R ${}_{\mathring{p}}$ ), and use (P2) to bound the inflation factor of (R ${}_{\mathring{p}}$ ).

Lemma 3.2.

For any $x\in X$ , there exists $y\in[0,\tau]$ such that $h({\mathring{p}}\,;{x})=h({\mathring{p}}\,;{x,y})$ . Hence, $x\in X$ is an $\alpha$ -approximate solution to (Q ${}_{\mathring{p}}$ ) iff $\exists y\in[0,\tau]$ such that $(x,y)$ is an $\alpha$ -approximate solution to (R ${}_{\mathring{p}}$ ).

Proof.

The second statement is immediate from the first one since (Q ${}_{\mathring{p}}$ ) and (R ${}_{\mathring{p}}$ ) have the same optimal values. So we focus on showing the first statement.

Consider any $x\in X$ . There exists $y^{*}\geq 0$ such that $h({\mathring{p}}\,;{x})=h({\mathring{p}}\,;{x,y^{*}})$ . If $y^{*}\leq\tau$ , then we are done. So suppose $y^{*}>\tau$ . We argue that $h({\mathring{p}}\,;{x,\tau})\leq h({\mathring{p}}\,;{x,y^{*}})$ . This completes the proof since we also have $h({\mathring{p}}\,;{x})\leq h({\mathring{p}}\,;{x,y})$ for all $y\geq 0$ . Clearly, $c^{\intercal}x+r\tau\leq c^{\intercal}x+ry^{*}$ . If $A^{\prime}\in\mathcal{A}$ is such that $g(x,y^{*},A)=g(x,A^{\prime})-y^{*}\cdot\ell(A,A^{\prime})$ , then it must be that $\ell(A,A^{\prime})=0$ . Otherwise, $g(x,A^{\prime})-y^{*}\cdot\ell(A,A^{\prime})<g(x,A^{\prime})-\tau\cdot\ell(A,A^{\prime})\leq g(x,A)$ , where the last inequality follows from (P6). This contradicts the choice of $A^{\prime}$ . Therefore, we have $g(x,y^{*},A)=\max_{A^{\prime}\in\mathcal{A}:\ell(A,A^{\prime})=0}g(x,A^{\prime})=g(x,\tau,A)$ , completing the proof. ∎
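The key step of this proof, namely that $g(x,y,A)$ stops changing once $y\geq\tau$, can be checked by brute force on a small instance. In the sketch below the scenarios, the recourse costs `cost` (standing for $g(x,\cdot)$ at a fixed $x$), and the metric `ell` are hypothetical, and `tau` is computed as the smallest value at least $1$ that satisfies (P6) on this instance.

```python
def g_xy(cost, ell, y, A):
    """g(x, y, A) = max_{A'} (g(x, A') - y * ell(A, A')) for a fixed x."""
    return max(cost[Ap] - y * ell[A][Ap] for Ap in cost)


# Hypothetical instance: three scenarios, fixed x, symmetric metric ell.
cost = {"A1": 1.0, "A2": 4.0, "A3": 6.0}  # g(x, A')
ell = {
    "A1": {"A1": 0.0, "A2": 1.0, "A3": 2.0},
    "A2": {"A1": 1.0, "A2": 0.0, "A3": 1.0},
    "A3": {"A1": 2.0, "A2": 1.0, "A3": 0.0},
}

# Smallest tau >= 1 with g(x, B) - g(x, A) <= tau * ell(A, B) whenever ell(A, B) > 0.
tau = max(
    1.0,
    max(
        (cost[B] - cost[A]) / ell[A][B]
        for A in cost
        for B in cost
        if ell[A][B] > 0
    ),
)

for A in cost:
    for y in (tau, tau + 0.5, 10 * tau):
        # Beyond tau, moving away from A never pays off, so the max is at A' = A.
        assert g_xy(cost, ell, y, A) == g_xy(cost, ell, tau, A) == cost[A]
```

Since the term $r\cdot y$ only grows with $y$ while $g(x,y,A)$ stays fixed for $y\geq\tau$, restricting $y$ to $[0,\tau]$ loses nothing on such an instance.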

Lemma 3.3.

For the 2-stage problem (R ${}_{\mathring{p}}$ ), we can set the parameter $\Lambda$ in Theorem 3.1 to be $\max\bigl{\{}\lambda,\frac{\ell_{\max}}{r}\bigr{\}}$ .

Proof.

Consider any $x\in X$ , $y\geq 0$ , and $A\in\mathcal{A}$ . Let $A^{\prime}\in\mathcal{A}$ be such that $g(0,0,A)=g(0,A^{\prime})$ . Then

[TABLE]

The second inequality above follows from (P2). ∎

Given Lemmas 3.2 and 3.3, by suitably discretizing $[0,\tau]$ , one can use Theorem 3.1 to show that: if we construct the SAA problem $\min_{x\in X}h({\widehat{p}}\,;{x})\ \equiv\ \min_{x\in X,y\in[0,\tau]}h({\widehat{p}}\,;{x,y})$ using $\operatorname{\mathsf{poly}}\bigl{(}\mathcal{I},\frac{\lambda}{\varepsilon},\log\tau,\frac{\ell_{\max}}{r}\bigr{)}$ samples, and can compute (approximately) the SAA objective value $h({\widehat{p}}\,;{x,y})$ at any given point, then, with high probability, one can translate an $\alpha$ -approximate solution to the SAA problem to an $O(\alpha+\varepsilon)$ -approximate solution to (Q ${}_{\mathring{p}}$ ). But this result does not quite suit our purposes due to various reasons.

The term $\frac{\ell_{\max}}{r}$ could be rather large, and is not $\operatorname{\mathsf{poly}}(\mathcal{I},\lambda)$ , so this does not yield polynomial sample complexity. (Footnote 4: The problem persists even if we apply the closeness-in-subgradients machinery in [38] to the fractional version of (R ${}_{\mathring{p}}$ ). This would involve estimating ${\textstyle\operatorname*{E}_{A\sim p}}\bigl{[}\ell(A,\pi(x,y,A))\bigr{]}$ to within an $\varepsilon r$ term, where $\pi(x,y,A)=\operatorname{argmax}_{A^{\prime}\in\mathcal{A}}\bigl{(}g(x,A^{\prime})-y\cdot\ell(A,A^{\prime})\bigr{)}$ , which requires $O\left(\frac{\ell_{\max}}{\varepsilon r}\right)$ samples.) Moreover, it seems difficult to compute the SAA objective value $h({\widehat{p}}\,;{x,y})$ , or even approximate it. This difficulty arises because computing $g(x,y,A)$ encompasses the NP-hard $k$ - $\max$ - $\min$ problem encountered in 2-stage robust optimization, and furthermore, the mixed-sign objective in $g(x,y,A)$ makes it hard to even approximate $g(x,y,A)$ (see Theorem 3.12).

We need various ideas to circumvent these issues. We show that we can eliminate the dependence on $\frac{\ell_{\max}}{r}$ altogether at the expense of a slight deterioration in the approximation ratio when moving from the SAA to the original problem. The $\frac{\ell_{\max}}{r}$ term arises because $g(0,0,A)$ might be attained by a scenario $A^{\prime}$ where $\ell(A,A^{\prime})\approx\ell_{\max}$ (see the proof of Lemma 3.3). Our crucial second insight is that we can eliminate this term and reduce the sample complexity to $\operatorname{\mathsf{poly}}(\mathcal{I},\lambda)$ , by specifically imposing that we never encounter $(A,A^{\prime})$ pairs with $\ell(A,A^{\prime})>M:=\lambda r$ ; we call such pairs long edges, and the remaining pairs short edges. Any $\gamma$ satisfying (2) can send at most $\frac{r}{M}=\frac{1}{\lambda}$ flow on the long edges. Motivated by this, we “decompose” $z({\mathring{p}}\,;{x})$ into $z^{\mathrm{sh}}({\mathring{p}}\,;{x})$ and $z^{\mathrm{lg}}({\mathring{p}}\,;{x})$ , which are (roughly speaking) the contributions from the short and long edges respectively. (This decomposition is akin to the division into low- and high-cost scenarios used by [4] to prove Theorem 3.1, but there are significant technical differences, which complicate things for us, as we discuss below.) We define $z^{\mathrm{sh}}({\mathring{p}}\,;{x})$ and $z^{\mathrm{lg}}({\mathring{p}}\,;{x})$ as follows.

[TABLE]
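The bound of $\frac{1}{\lambda}$ on the flow carried by long edges is a Markov-type estimate. It is written out here as a sketch, under the assumption that constraint (2) is the transport budget $\sum_{A,A^{\prime}}\gamma_{A,A^{\prime}}\ell(A,A^{\prime})\leq r$ with $\gamma\geq 0$ (the constraint itself is not reproduced in this section):

$$\sum_{(A,A^{\prime}):\,\ell(A,A^{\prime})>M}\gamma_{A,A^{\prime}}\ \leq\ \frac{1}{M}\sum_{(A,A^{\prime}):\,\ell(A,A^{\prime})>M}\gamma_{A,A^{\prime}}\,\ell(A,A^{\prime})\ \leq\ \frac{1}{M}\sum_{(A,A^{\prime})}\gamma_{A,A^{\prime}}\,\ell(A,A^{\prime})\ \leq\ \frac{r}{M}\ =\ \frac{1}{\lambda}.$$

The first inequality uses $\ell(A,A^{\prime})/M>1$ on long edges, the second uses nonnegativity of $\gamma$ and $\ell$, and the last equality uses $M=\lambda r$.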

Lemma 3.4.

For every central distribution $p$ , and every $x\in\mathcal{P}$ , we have $h({p}\,;{x})\leq c^{\intercal}x+z^{\mathrm{sh}}({p}\,;{x})+z^{\mathrm{lg}}({p}\,;{0})\leq 2h({p}\,;{x})$ .

Proof.

We prove this by showing that: (i) $z({p}\,;{x})\leq z^{\mathrm{sh}}({p}\,;{x})+z^{\mathrm{lg}}({p}\,;{x})\leq 2z({p}\,;{x})$ ; and (ii) $z^{\mathrm{lg}}({p}\,;{x})\leq z^{\mathrm{lg}}({p}\,;{0})\leq z^{\mathrm{lg}}({p}\,;{x})+c^{\intercal}x$ . Given these bounds, the upper bound on $h({p}\,;{x})$ follows from the upper bounds on $z({p}\,;{x})$ and $z^{\mathrm{lg}}({p}\,;{x})$ in parts (i) and (ii) respectively. For the other direction, we have

[TABLE]

where the first and second inequalities follow from the second inequalities of parts (ii) and (i) respectively.

Part (ii) follows from property (P2). For any feasible solution $\gamma$ to the optimization problem defining $z^{\mathrm{lg}}({p}\,;{0})$ (and $z^{\mathrm{lg}}({p}\,;{x})$ ), we have

[TABLE]

We now prove part (i). It is clear from the definition that $z^{\mathrm{sh}}({p}\,;{x}),z^{\mathrm{lg}}({p}\,;{x})\leq z({p}\,;{x})$ , so the second inequality holds. For the first inequality, consider any feasible solution $\gamma$ to (T ${}_{\mathring{p},x}$ ). Let $\gamma^{\mathrm{sh}}$ be the restriction of $\gamma$ to the short edges, along with $0$ s for the long edges. Similarly, let $\gamma^{\mathrm{lg}}$ be the restriction of $\gamma$ to the long edges, along with $0$ s for the short edges. Then $\gamma^{\mathrm{sh}}$ and $\gamma^{\mathrm{lg}}$ are feasible solutions to the optimization problems defining $z^{\mathrm{sh}}({p}\,;{x})$ and $z^{\mathrm{lg}}({p}\,;{x})$ respectively. This yields the first inequality in (i). ∎

Given Lemma 3.4, we focus on the thresholded proxy problem ( $\overline{\mathrm{Q}}_{\mathring{p}}$ ) below, and its reformulation obtained (as before) by Lagrangifying (2) and simplifying.

[TABLE]

where $\overline{g}(x,y,A):=\max_{A^{\prime}\in\mathcal{A}:\ell(A,A^{\prime})\leq M}\bigl{(}g(x,A^{\prime})-y\cdot\ell(A,A^{\prime})\bigr{)}$ . After suitably discretizing the $y$ -interval $[0,\tau]$ , we obtain that the 2-stage problem ( $\overline{\mathrm{R}}_{\mathring{p}}$ ) satisfies (P1) and (P2) with inflation parameter $\Lambda=\lambda$ . So Theorem 3.1 applied to ( $\overline{\mathrm{R}}_{\mathring{p}}$ ) suggests an improved $\operatorname{\mathsf{poly}}\bigl{(}\mathcal{I},\frac{\lambda}{\varepsilon}\bigr{)}$ sample complexity, but two sources of difficulty remain.

First, while we would like to consider the proxy problem ( $\overline{\mathrm{R}}_{\widehat{p}}$ ), which is the SAA version of ( $\overline{\mathrm{R}}_{\mathring{p}}$ ), we are in fact solving the (true) SAA problem (Q ${}_{\widehat{p}}$ ) approximately. Whereas $h({p}\,;{x})$ and $\overline{h}({p}\,;{x})+z^{\mathrm{lg}}({p}\,;{0})$ are pointwise close, $z^{\mathrm{lg}}({p}\,;{0})$ could be significant compared to $z({p}\,;{x})$ (as indicated by the factor- $2$ loss in Lemma 3.4). Therefore, an $\alpha$ -approximation to (Q ${}_{\widehat{p}}$ ) does not yield an $O(\alpha)$ -approximation to ( $\overline{\mathrm{Q}}_{\widehat{p}}$ ) (or equivalently, ( $\overline{\mathrm{R}}_{\widehat{p}}$ )). We will in fact not be able to obtain an approximate solution to ( $\overline{\mathrm{R}}_{\widehat{p}}$ ), and so it is unclear why transferring approximation guarantees from ( $\overline{\mathrm{R}}_{\widehat{p}}$ ) to ( $\overline{\mathrm{R}}_{\mathring{p}}$ ) (and hence ( $\overline{\mathrm{Q}}_{\mathring{p}}$ )) is helpful. That is, the obstacle we encounter is that the 2-stage SAA problem that has bounded inflation factor is not the one that we are able to approximate. (Note that Theorem 3.1 is not equipped to deal with this issue since its starting point is an approximate solution to the SAA problem.)

The way around this is to realize that our goal is to evaluate the quality of the SAA solution for the original problem (Q ${}_{\mathring{p}}$ ), and not ( $\overline{\mathrm{R}}_{\mathring{p}}$ ). In 2-stage stochastic optimization, the contribution $f_{h}(p)$ from high-cost scenarios to the total expected cost is linear in $p$ , which provides a handle on how to relate $f_{h}(\mathring{p})$ and $f_{h}(\widehat{p})$ . In our case, the contribution $z^{\mathrm{lg}}({p}\,;{0})$ is nonlinear in $p$ , and we need to derive new insights to reason about how this changes when we move from $\mathring{p}$ to its empirical estimate $\widehat{p}$ ; we then proceed by carefully adapting the ideas in [4]. We explain this in more detail under “Overview” in Appendix A.

Second, we (still) do not have an approximate value oracle for $h({\widehat{p}}\,;{x,y})$ (or $\overline{h}({\widehat{p}}\,;{x,y})$ ). However, we will show in Section 3.2 (see Lemma 3.9) that if we have the non-standard type of approximation for $g(x,y,A)$ mentioned in Theorem 1, then one can obtain an approximate value oracle for $h({\widehat{p}}\,;{x})$ . While this is not the same as a value oracle for $h({\widehat{p}}\,;{x,y})$ , we show that this nevertheless suffices.

Combining these ingredients yields the following theorem, which is the main result of this section. Recall that $O^{*}:=\min_{x\in X}h({\mathring{p}}\,;{x})$ , and $\log|X|$ and $\log\tau$ are $\operatorname{\mathsf{poly}}(\mathcal{I})$ .

Theorem 3.5.

Let $\varepsilon\leq\frac{1}{3}$ , $\eta>0$ . Consider $k=\frac{2}{\varepsilon}\log\bigl{(}\frac{1}{\delta}\bigr{)}$ SAA problems with objective functions $h({\widehat{p}^{i}}\,;{x}):=c^{\intercal}x+z({\widehat{p}^{i}}\,;{x})$ , for $i=1,\ldots,k$ , where each $\widehat{p}^{i}$ is an empirical estimate of $\mathring{p}$ constructed using $N=\operatorname{\mathsf{poly}}\bigl{(}\frac{\lambda}{\varepsilon},\log|X|,\log(\frac{\tau}{\eta}),\log(\frac{1}{\delta})\bigr{)}$ independent samples. Suppose that for every $i=1,\ldots,k$ , we have a solution $\widehat{x}^{i}\in X$ and an estimate $f^{i}$ , such that: (S1) $h({\widehat{p}^{i}}\,;{\widehat{x}^{i}})\leq\beta f^{i}$ ; and (S2) $f^{i}\leq\rho\cdot\min_{x\in X}h({\widehat{p}^{i}}\,;{x})$ (where $\beta,\rho\geq 1$ ). Let $j=\operatorname{argmin}_{i=1,\ldots,k}f^{i}$ and $\widehat{x}=\widehat{x}^{j}$ . Then, $h({\mathring{p}}\,;{\widehat{x}})\leq 4\beta\rho\bigl{(}1+O(\varepsilon)\bigr{)}O^{*}+2\beta\rho\eta$ with probability at least $1-3\delta$ .
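The selection step in Theorem 3.5 is simple to state in code. The sketch below only illustrates the bookkeeping: the function names are hypothetical, the logarithm is assumed to be natural, $k$ is rounded up to an integer, and the candidate pairs $(\widehat{x}^{i},f^{i})$ are assumed to come from some solver satisfying (S1) and (S2), which is not modeled here.

```python
import math

def num_saa_problems(eps, delta):
    """k = (2/eps) * log(1/delta), rounded up (natural log is an assumption)."""
    return math.ceil((2.0 / eps) * math.log(1.0 / delta))

def pick_solution(candidates):
    """candidates: list of (x_hat_i, f_i) pairs, one per SAA problem.
    Returns the pair whose estimate f_i is smallest (j = argmin_i f^i)."""
    return min(candidates, key=lambda pair: pair[1])

# Hypothetical usage with placeholder solutions and estimates.
k = num_saa_problems(eps=0.25, delta=0.01)
x_hat, f_hat = pick_solution([("x1", 5.0), ("x2", 3.0), ("x3", 4.0)])
```

Note that the choice is made on the estimates $f^{i}$ rather than on the true SAA values $h({\widehat{p}^{i}}\,;{\widehat{x}^{i}})$ , which are not assumed to be computable.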

The mixed (i.e., multiplicative + additive) guarantee obtained above can be turned into a purely multiplicative guarantee if we have a lower bound $\mathsf{LB}$ on $O^{*}$ with $\log\bigl{(}\frac{1}{\mathsf{LB}}\bigr{)}=\operatorname{\mathsf{poly}}(\mathcal{I})$ . We show that such a lower bound can indeed be obtained under some very mild assumptions (Lemma 3.11).

The proof of Theorem 3.5 is further complicated due to the peculiarities of the estimates $f^{i}$ that we have for $h({\widehat{p}^{i}}\,;{\widehat{x}^{i}})$ . Note that (S1), (S2) only imply that $\widehat{x}^{i}$ is a $\beta\rho$ -approximation to the SAA problem, and $h({\widehat{p}^{i}}\,;{\widehat{x}^{i}})\in[f^{i}/\rho,\beta f^{i}]$ , so a statement of the form in Theorem 3.1 would yield an inferior approximation bound of $O(\beta^{2}\rho^{2}+\varepsilon)$ . Instead, we need to adapt the arguments of [4] to suit the numerous peculiarities of our setting. The proof is therefore somewhat technical and we defer this to Appendix A.

We remark that the proxy problem ( $\overline{\mathrm{Q}}_{\mathring{p}}$ ) (or ( $\overline{\mathrm{R}}_{\mathring{p}}$ )) is used only in the analysis. One takeaway here is that we derive a substantially improved sample-complexity bound by taking a slight hit in the approximation ratio when moving from the SAA to the original problem. This is a novel, nuanced result regarding the effectiveness of the SAA method for DR 2-stage problems. We do not know of any other setting where one obtains drastically improved sample complexity by settling for a worse than $(1+\varepsilon)$ -factor (but still $O(1)$ ) loss when moving from the SAA to the original problem. (In particular, no such result is known for standard 2-stage problems.)

3.2 Solving distributionally robust problems for polynomial-support central distributions

We now show how to approximately solve the distributionally robust problem (Q ${}_{\widehat{p}}$ ) when the central distribution $\widehat{p}$ has polynomial-size support. This will allow us to solve the SAA problem(s) constructed in Section 3.1, and complement Theorem 3.5. Let $\mathcal{A}^{\mathrm{sup}}$ denote the support of $\widehat{p}$ . So we have

[TABLE]

We consider the fractional relaxation of (Q ${}_{\widehat{p}}$ ), where we replace $X$ with its relaxation $\mathcal{P}$ to obtain (Q ${}^{\mathrm{fr}}_{\widehat{p}}$ ): $\min_{x\in\mathcal{P}}h({\widehat{p}}\,;{x})$ . As noted earlier, unlike the case with 2-stage {stochastic, robust} optimization, where the fractional relaxation of the polynomial-scenario problem gives a polynomial-size LP and is therefore straightforward to solve in polytime, it is substantially more challenging to even approximately solve the fractional DR polynomial-scenario problem. In particular, reformulating $z({\widehat{p}}\,;{x})$ (and hence (Q ${}^{\mathrm{fr}}_{\widehat{p}}$ )) as a minimization LP leads to an LP with exponential number of constraints and variables. The issue is that (T ${}_{\widehat{p},x}$ ) involves an exponential number of $\gamma_{A,A^{\prime}}$ variables. So if we reformulate $z({\widehat{p}}\,;{x})$ as a minimization LP by taking the dual of (T ${}_{\widehat{p},x}$ ) (and replacing $g(x,A^{\prime})$ by its LP formulation), we obtain an exponential number of constraints (due to the $\gamma_{A,A^{\prime}}$ variables), and an exponential number of variables (needed to encode the LP for $g(x,A^{\prime})$ , for each $A^{\prime}\in\mathcal{A}$ ). (An exception to all this is the unrestricted setting (i.e., $\mathcal{A}=2^{U}$ for some set $U$ ) with the discrete scenario metric $\ell$ (so $L_{\mathrm{W}}$ is the $\frac{1}{2}L_{1}$ -metric), under the assumption that $g(x,A)\leq g(x,A^{\prime})$ for all $x$ , $A\subseteq A^{\prime}$ , which holds for covering problems. Here, we can reformulate $z({\widehat{p}}\,;{x})$ as a polynomial-size minimization LP and hence, obtain a compact LP for (Q ${}^{\mathrm{fr}}_{\widehat{p}}$ ), and round its optimal solution using a local approximation algorithm. Theorem 3.13 proves a more general result along these lines.)

To overcome these obstacles, we work with the convex program given by (Q ${}^{\mathrm{fr}}_{\widehat{p}}$ ). Recall that $g(x,y,A):=\max_{A^{\prime}\in\mathcal{A}}\bigl{(}g(x,A^{\prime})-y\cdot\ell(A,A^{\prime})\bigr{)}$ , where $x\in\mathcal{P}$ , $y\geq 0$ , and $A\in\mathcal{A}$ . We show that the complexity of solving (Q ${}_{\widehat{p}}$ ) is tied to the problem of finding a near-optimal solution to $g(x,y,A)$ . However, as noted earlier, under the standard notion of approximation, it is impossible to obtain any approximation guarantee due to the mixed-sign objective in $g(x,y,A)$ (see Theorem 3.12). To evade this difficulty, we consider the following non-standard notion of approximation for $g(x,y,A)$ .

Definition 3.6.

We say that $\mathsf{Alg}$ is a $(\beta_{1},\beta_{2})$ -approximation algorithm for $g(x,y,A)$ , where $\beta_{1},\beta_{2}\geq 1$ , if it returns a scenario $\overline{A}\in\mathcal{A}$ such that $g(x,\overline{A})-y\cdot\ell(A,\overline{A})\geq\frac{g(x,A^{\prime})}{\beta_{1}}-\beta_{2}\cdot y\cdot\ell(A,A^{\prime})$ for all $A^{\prime}\in\mathcal{A}$ .
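To make Definition 3.6 concrete, the following sketch checks the $(\beta_{1},\beta_{2})$ condition by enumeration on a small explicit scenario set for one fixed $x$. The scenario costs, the line metric `ell`, and the function name `is_beta_approx` are hypothetical; in the paper's setting $\mathcal{A}$ is typically exponentially large, so such a brute-force check is only illustrative.

```python
def is_beta_approx(g, ell, y, A, A_bar, beta1, beta2):
    """Check Definition 3.6 for a returned scenario A_bar:
    g(x, A_bar) - y*ell(A, A_bar) >= g(x, A')/beta1 - beta2*y*ell(A, A') for all A'."""
    lhs = g[A_bar] - y * ell(A, A_bar)
    return all(lhs >= g[B] / beta1 - beta2 * y * ell(A, B) for B in g)

# Hypothetical toy data: costs g(x, .) for a fixed x, scenarios on a line.
g = {"a": 1.0, "b": 4.0, "c": 6.0}
pos = {"a": 0.0, "b": 1.0, "c": 3.0}
ell = lambda A, B: abs(pos[A] - pos[B])

exact = is_beta_approx(g, ell, 1.0, "a", "b", 1, 1)      # "b" is an exact maximizer
too_weak = is_beta_approx(g, ell, 1.0, "a", "a", 1, 1)   # "a" is not optimal
relaxed = is_beta_approx(g, ell, 1.0, "a", "a", 2, 1)    # but passes with beta1 = 2
```

With $\beta_{1}=\beta_{2}=1$ the condition says that $\overline{A}$ is an exact maximizer; increasing $\beta_{1}$ discounts the cost term and increasing $\beta_{2}$ inflates the penalty term of the competitor $A^{\prime}$ , which is what lets the notion sidestep the mixed-sign objective.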

Recall that a local $\rho$ -approximation for (Q ${}^{\mathrm{fr}}_{\widehat{p}}$ ) is an algorithm that given $x\in\mathcal{P}$ , returns an integral solution $\widetilde{x}\in X$ and integral recourse actions $\widetilde{z}^{A}$ for every $A\in\mathcal{A}$ (implicitly), such that $c^{\intercal}\widetilde{x}\leq\rho(c^{\intercal}x)$ and $(\text{cost of }\widetilde{z}^{A})\leq\rho\,g(x,A)$ for all $A\in\mathcal{A}$ . The main result of this section, which is used to interface with Theorem 3.5, is as follows.

Theorem 3.7.

Suppose that we have a polytime separation oracle for $\mathcal{P}$ , a local $\rho$ -approximation algorithm for (Q ${}^{\mathrm{fr}}_{\widehat{p}}$ ), and a $(\beta_{1},\beta_{2})$ -approximation algorithm for $g(x,y,A)$ for any $(x,y,A)\in X\times\mathbb{R}_{+}\times\mathcal{A}$ . For any $\varepsilon>0$ , in $\operatorname{\mathsf{poly}}\bigl{(}\mathcal{I},\log(\frac{1}{\varepsilon})\bigr{)}$ time, we can compute $\widetilde{x}\in X$ and an estimate $\widetilde{f}$ of $h({\widehat{p}}\,;{\widetilde{x}})$ such that: $\widetilde{f}\leq h({\widehat{p}}\,;{\widetilde{x}})\leq\beta_{1}\beta_{2}\cdot\widetilde{f}$ , and $\widetilde{f}\leq\rho(1+\varepsilon)\cdot\min_{x\in X}h({\widehat{p}}\,;{x})$ .

We prove the above theorem by utilizing the ellipsoid method. For this, we need to be able to compute a subgradient of the objective function $h({\widehat{p}}\,;{x})$ . Shmoys and Swamy [35] showed that it suffices to have $\omega$ -subgradients (Definition 2.2). We show that a near-optimal solution to (T ${}_{\widehat{p},x}$ ) yields an approximate subgradient of $h({\widehat{p}}\,;{x})$ (Lemma 3.8), and we can obtain such a solution to (T ${}_{\widehat{p},x}$ ) using a $(\beta_{1},\beta_{2})$ -approximation to $g(x,y,A)$ (Lemma 3.9). Recall from properties (P4), (P5) that for every $A\in\mathcal{A}$ , the function $g(\bullet,A)$ is convex, and at every $x\in\mathcal{P}$ , $A\in\mathcal{A}$ , we can efficiently compute $g(x,A)$ , and a subgradient $d^{x,A}$ with $\|d^{x,A}\|\leq K$ , where $\ln K=\operatorname{\mathsf{poly}}(\mathcal{I})$ . The proof of Lemma 3.9 appears after the proof of Theorem 3.7, right before Section 3.2.1.

Lemma 3.8.

Let $x\in\mathcal{P}$ , and $\gamma$ be a $\beta$ -approximate solution to (T ${}_{\widehat{p},x}$ ). Then $d:=c+\sum_{(A,A^{\prime})\in\mathcal{A}^{\mathrm{sup}}\times\mathcal{A}}\gamma_{A,A^{\prime}}d^{x,A^{\prime}}$ is a $\bigl{(}1-\frac{1}{\beta}\bigr{)}$ -subgradient of $h({\widehat{p}}\,;{.})$ at $x$ .

Proof.

Consider any $x^{\prime}\in\mathcal{P}$ . Since $\gamma$ is a feasible solution to (T ${}_{{\widehat{p}},{x^{\prime}}}$ ), we have $h({\widehat{p}}\,;{x^{\prime}})\geq c^{\intercal}x^{\prime}+\sum_{(A,A^{\prime})\in\mathcal{A}^{\mathrm{sup}}\times\mathcal{A}}\gamma_{A,A^{\prime}}g(x^{\prime},A^{\prime})$ . Let $\gamma^{*}$ be an optimal solution to (T ${}_{\widehat{p},x}$ ). Since $\gamma$ is a $\beta$ -approximate solution to (T ${}_{\widehat{p},x}$ ), we have

[TABLE]

Therefore,

[TABLE]

The second inequality follows since $d^{x,A^{\prime}}$ is a subgradient of $g(\cdot,A^{\prime})$ at $x$ . ∎

Lemma 3.9.

Let $x\in\mathcal{P}$ . Suppose we have a $(\beta_{1},\beta_{2})$ -approximation algorithm for $g(x,y,A)$ for all $y\geq 0$ and all $A\in\mathcal{A}$ . Then, (i) we can compute a $\beta_{1}\beta_{2}$ -approximate solution $\gamma$ to (T ${}_{\widehat{p},x}$ ); (ii) hence, $f=c^{\intercal}x+\sum_{(A,A^{\prime})\in\mathcal{A}^{\mathrm{sup}}\times\mathcal{A}}\gamma_{A,A^{\prime}}g(x,A^{\prime})$ satisfies $f\leq h({\widehat{p}}\,;{x})\leq\beta_{1}\beta_{2}\cdot f$ .
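Lemmas 3.8 and 3.9 both build a quantity from a sparse flow $\gamma$ : the vector $d$ and the estimate $f$ . The sketch below assembles both, assuming $\gamma$ is given as a dictionary over pairs $(A,A^{\prime})$ and that callables returning $g(x,A^{\prime})$ and a subgradient $d^{x,A^{\prime}}$ are available. The names `cut_data`, `g_val`, `subgrad`, and the one-dimensional toy cost are hypothetical, and how $\gamma$ is obtained (via the dual and $\mathsf{Alg}$ ) is not modeled.

```python
def cut_data(c, x, gamma, g_val, subgrad):
    """Return (f, d) with
    f = c.x + sum gamma[A, A'] * g(x, A')     (estimate in Lemma 3.9 (ii))
    d = c   + sum gamma[A, A'] * d^{x, A'}    (vector in Lemma 3.8)."""
    f = sum(ci * xi for ci, xi in zip(c, x))
    d = list(c)
    for (_, A_prime), flow in gamma.items():
        f += flow * g_val(x, A_prime)
        d = [di + flow * gi for di, gi in zip(d, subgrad(x, A_prime))]
    return f, d

# Hypothetical toy recourse cost: g(x, A') = max(0, w[A'] - x[0]).
w = {"s": 1.0, "t": 2.0}
g_val = lambda x, A: max(0.0, w[A] - x[0])
subgrad = lambda x, A: [-1.0] if w[A] > x[0] else [0.0]

# Flow that moves half of the mass of scenario "s" to scenario "t".
gamma = {("s", "s"): 0.5, ("s", "t"): 0.5}
f, d = cut_data([1.0], [0.5], gamma, g_val, subgrad)
```

Only the pairs in the support of $\gamma$ are touched, which is why a polynomial-size support for $\gamma$ matters even though $\mathcal{A}$ itself may be exponentially large.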

The ellipsoid-based algorithm in [35] (and for convex optimization in general) has two phases: one where we use approximate subgradients to obtain a polynomial number of feasible points such that at least one of them is a near-optimal solution, and the other, where we choose the best among these feasible points. In the first phase, starting with an ellipsoid that contains the entire feasible region, at each step, we add a cut (i.e., a hyperplane) passing through the center $\bar{x}$ of the current ellipsoid to chop off a half-ellipsoid that does not contain points of interest. If $\bar{x}$ is infeasible, we use a violated inequality to obtain such a cut. Otherwise, we find an $\omega$ -subgradient $\widehat{d}$ of $h({\widehat{p}}\,;{\bullet})$ at $\bar{x}$ and use the cut $\widehat{d}^{\intercal}(y-\bar{x})\leq 0$ ; the definition of $\omega$ -subgradient ensures that any point $y$ discarded by this cut has $h({\widehat{p}}\,;{y})\geq(1-\omega)h({\widehat{p}}\,;{\bar{x}})$ . We continue this until the volume of the current ellipsoid becomes sufficiently small, which happens after a polynomial number of iterations. The first phase can be executed using $\omega$ -subgradients, for an arbitrary $\omega$ . Shmoys and Swamy [35] showed that the second phase can be implemented even without having an (approximate) objective-function oracle (which can be hard to obtain with exponentially many scenarios) provided that we have $\omega$ -subgradients for sufficiently small $\omega$ ( $=1/\operatorname{\mathsf{poly}}(\mathcal{I})$ ).
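The cut-and-shrink step of the first phase can be illustrated with the textbook central-cut update. The sketch below is the standard minimum-volume formula for an ellipsoid $\{x:(x-\bar{x})^{\intercal}P^{-1}(x-\bar{x})\leq 1\}$ in dimension $n\geq 2$ , written in plain Python; the function name and the matrix representation are assumptions made for illustration and are not taken from [35].

```python
import math

def central_cut_update(center, P, a):
    """Minimum-volume ellipsoid containing {x in E : a.(x - center) <= 0},
    where E = {x : (x - center)^T P^{-1} (x - center) <= 1}. Requires n >= 2."""
    n = len(center)
    Pa = [sum(P[i][j] * a[j] for j in range(n)) for i in range(n)]
    norm = math.sqrt(sum(a[i] * Pa[i] for i in range(n)))
    b = [v / norm for v in Pa]                       # scaled direction P a / sqrt(a^T P a)
    new_center = [center[i] - b[i] / (n + 1) for i in range(n)]
    scale = n * n / (n * n - 1.0)
    new_P = [[scale * (P[i][j] - 2.0 / (n + 1) * b[i] * b[j]) for j in range(n)]
             for i in range(n)]
    return new_center, new_P

# Unit disk in the plane, cut by the halfspace x_1 <= 0.
new_center, new_P = central_cut_update([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0])
```

In this example the center moves to $(-\frac{1}{3},0)$ and the new shape matrix is $\mathrm{diag}(\frac{4}{9},\frac{4}{3})$ , so the new ellipsoid is narrower along the cut direction and slightly wider in the orthogonal one.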

Computing $\omega$ -subgradients efficiently for such small $\omega$ would require an FPTAS for (T ${}_{\widehat{p},x}$ ). But, in general, the optimization problems $g(x,y,A)$ and (T ${}_{\widehat{p},x}$ ) are complicated problems that can capture the APX-hard $k$ - $\max$ - $\min$ problem, namely $\max_{A\subseteq U:|A|\leq k}g(x,A)$ , encountered in 2-stage robust optimization [11, 17, 23] (see Theorem 3.12). This rules out an FPTAS for (T ${}_{\widehat{p},x}$ ); moreover, the approximation we can obtain for $g(x,y,A)$ will naturally depend on the application. We sidestep this difficulty by noting that Lemma 3.9 (ii) gives a $\beta_{1}\beta_{2}$ -approximate value oracle for $h({\widehat{p}}\,;{x})$ , which can be used to implement the second phase.

A final difficulty that remains is that for our applications (see Section 3.3), we will only be able to approximate $g(x,y,A)$ for integral $x$ (as is the case with robust $k$ - $\max$ - $\min$ problems); indeed Theorem 3.7 only assumes that we have an approximation algorithm for computing $g(x,y,A)$ when $x\in X=\mathcal{P}\cap\mathbb{Z}^{m}$ . However, we need to add an $\omega$ -subgradient cut passing through the center $\bar{x}$ of our current ellipsoid, which will typically not be integral; so we will not be able to use Lemmas 3.9 and 3.8 to obtain an $\omega$ -subgradient at $\bar{x}$ . To bypass this difficulty, we use the unorthodox approach of generating a cut from a point different from the ellipsoid-center $\bar{x}$ . We round $\bar{x}$ to $\widetilde{x}\in X$ using our local approximation algorithm, and use Lemma 3.8 at $\bar{x}$ , but with an approximate solution to (T ${}_{{\widehat{p}},{\widetilde{x}}}$ ) (obtained by approximating $g(\widetilde{x},y,A)$ ), to compute a vector $d$ ; we add the cut $d^{\intercal}(x-\bar{x})\leq 0$ . While $d$ need not be an $\omega$ -subgradient at $\bar{x}$ , we argue that this cut is still valid, in that any point $x^{\prime}$ cut off by the inequality has $h({\widehat{p}}\,;{x^{\prime}})$ large compared to $h({\widehat{p}}\,;{\widetilde{x}})$ .

Lemma 3.10.

Let $\bar{x}\in\mathcal{P}$ and $\widetilde{x}\in X$ be obtained by rounding $\bar{x}$ using a local $\rho$ -approximation algorithm. Let $\gamma$ be a $\beta$ -approximate solution to $\mathrm{(T_{\widehat{p},\widetilde{x}})}$ , and let $\widetilde{d}=c+\sum_{(A,A^{\prime})\in\mathcal{A}^{\mathrm{sup}}\times\mathcal{A}}\gamma_{A,A^{\prime}}d^{\bar{x},A^{\prime}}$ . If $x^{\prime}\in\mathcal{P}$ is such that $\widetilde{d}^{\intercal}(x^{\prime}-\bar{x})\geq 0$ , then $h({\widehat{p}}\,;{x^{\prime}})\geq\frac{1}{\rho}\cdot\bigl{(}c^{\intercal}\widetilde{x}+\sum_{(A,A^{\prime})\in\mathcal{A}^{\mathrm{sup}}\times\mathcal{A}}\gamma_{A,A^{\prime}}g(\widetilde{x},A^{\prime})\bigr{)}\geq\frac{1}{\beta\rho}\cdot h({\widehat{p}}\,;{\widetilde{x}})$ .

Proof.

Define $f(x):=c^{\intercal}x+\sum_{(A,A^{\prime})\in\mathcal{A}^{\mathrm{sup}}\times\mathcal{A}}\gamma_{A,A^{\prime}}g(x,A^{\prime})$ for all $x\in\mathcal{P}$ . Clearly, $f(x)\leq h({\widehat{p}}\,;{x})$ for all $x\in\mathcal{P}$ . Also, since we use a local approximation algorithm to obtain $\widetilde{x}$ , we have $f(\bar{x})\geq f(\widetilde{x})/\rho$ . By mimicking the proof of Lemma 3.8, we have that $c+\sum_{(A,A^{\prime})\in\mathcal{A}^{\mathrm{sup}}\times\mathcal{A}}\gamma_{A,A^{\prime}}d^{x,A^{\prime}}$ is a subgradient of $f$ at $x$ . We have $h({\widehat{p}}\,;{x^{\prime}})-f(\bar{x})\geq f(x^{\prime})-f(\bar{x})\geq\widetilde{d}^{\intercal}(x^{\prime}-\bar{x})\geq 0$ . So $h({\widehat{p}}\,;{x^{\prime}})\geq f(\bar{x})\geq f(\widetilde{x})/\rho$ . Finally, $f(\widetilde{x})\geq h({\widehat{p}}\,;{\widetilde{x}})/{\beta}$ by Lemma 3.9 (ii). ∎

We describe below the algorithm $\mathsf{PolyAlg}$ leading to Theorem 3.7. By (P3), $\mathcal{P}\subseteq B(0,R)$ , and contains a ball of radius $V\leq 1$ , where $\ln\bigl{(}\frac{R}{V}\bigr{)}$ , $\ln K$ are $\operatorname{\mathsf{poly}}(\mathcal{I})$ . Lemma 3.8 implies that the Lipschitz constant of $h({\widehat{p}}\,;{.})$ is at most $K^{\prime}:=\|c\|+K$ , so $\ln K^{\prime}=\operatorname{\mathsf{poly}}(\mathcal{I})$ . To utilize $\mathsf{PolyAlg}$ to obtain Theorem 3.7, we require a lower bound $\mathsf{LB}$ on $O^{*}_{\widehat{p}}:=\min_{x\in X}h({\widehat{p}}\,;{x})$ with $\log\bigl{(}\frac{1}{\mathsf{LB}}\bigr{)}=\operatorname{\mathsf{poly}}(\mathcal{I})$ . Under a standard, rather mild assumption (that originated in [35]), we argue that we can either compute such a lower bound, or determine that $x=0$ is an optimal solution (Lemma 3.11), and show that this suffices. Call a scenario $A$ a “null scenario” if $g(x,A)=g(0,A)$ for all $x\in\mathcal{P}$ (e.g., $A=\emptyset$ in $\mathsf{DRSSC}$ ). We assume that in every non-null scenario $A$ , we have $c^{\intercal}x+g(x,A)\geq 1$ for all $x\in\mathcal{P}$ . We assume that we are given $\ell_{\max}=\max_{A,A^{\prime}}\ell(A,A^{\prime})$ (or an upper bound on it) in the input.

Algorithm $\mathsf{PolyAlg}(\eta)$ .

Require: separation oracle for $\mathcal{P}$ , local $\rho$ -approximation algorithm $\mathcal{B}$ , and a $(\beta_{1},\beta_{2})$ -approximation algorithm $\mathsf{Alg}$ for $g(x,y,A)$ for all $(x,y,A)\in X\times\mathbb{R}_{+}\times\mathcal{A}$ .

Output: $\widetilde{x}\in X$ and $\widetilde{f}$ satisfying: $\widetilde{f}\leq h({\widehat{p}}\,;{\widetilde{x}})\leq\beta_{1}\beta_{2}\widetilde{f}$ , and $\widetilde{f}\leq\rho\bigl{(}\min_{x\in X}h({\widehat{p}}\,;{x})+\eta\bigr{)}$ .

A1. Set $k\leftarrow 0,\ \bar{x}_{0}\leftarrow 0,\ \mu\leftarrow\min\bigl{\{}1,\frac{\eta}{2K^{\prime}R}\bigr{\}},\ N\leftarrow\lceil 2m^{2}\ln\bigl{(}\frac{2R}{\mu V}\bigr{)}\rceil$ . Let $E_{0}\leftarrow B(0,R)$ and $\mathcal{P}_{0}\leftarrow\mathcal{P}$ .

A2. For $i=0,\ldots,N$ do the following. (We maintain that $E_{i}$ is an ellipsoid centered at $\bar{x}_{i}$ containing $\mathcal{P}_{k}$ .)

a) If $\bar{x}_{i}\notin\mathcal{P}_{k}$ , let $a^{\intercal}x\leq b$ be an inequality that is satisfied by all $x\in\mathcal{P}_{k}$ but violated by $\bar{x}_{i}$ . (This is either obtained from a separation oracle for $\mathcal{P}$ , or from inequalities added in prior iterations.) Let $H$ be the halfspace $\{x\in\mathbb{R}^{m}:a\cdot(x-\bar{x}_{i})\leq 0\}$ .

b) If $\bar{x}_{i}\in\mathcal{P}_{k}$ , let $\widetilde{x}_{k}\in X$ be obtained by rounding $\bar{x}_{i}$ using $\mathcal{B}$ . Use Lemma 3.9 and $\mathsf{Alg}$ to obtain a $\beta_{1}\beta_{2}$ -approximate solution $\gamma$ to (T ${}_{{\widehat{p}},{\widetilde{x}_{k}}}$ ) (which has polynomial-size support). Define $\widetilde{d}_{k}:=c+\sum_{(A,A^{\prime})\in\mathcal{A}^{\mathrm{sup}}\times\mathcal{A}}\gamma_{A,A^{\prime}}d^{\bar{x}_{i},A^{\prime}}$ , and $\widetilde{f}_{k}:=c^{\intercal}\widetilde{x}_{k}+\sum_{(A,A^{\prime})\in\mathcal{A}^{\mathrm{sup}}\times\mathcal{A}}\gamma_{A,A^{\prime}}g(\widetilde{x}_{k},A^{\prime})$ . If $\widetilde{d}_{k}=0$ , then return $\widetilde{x}_{k}$ and $\widetilde{f}_{k}$ . Otherwise, let $H$ denote the halfspace $\{x\in\mathbb{R}^{m}:\widetilde{d}_{k}^{\intercal}(x-\bar{x}_{i})\leq 0\}$ . Set $\mathcal{P}_{k+1}\leftarrow\mathcal{P}_{k}\cap H$ , and $k\leftarrow k+1$ .

c) Set $E_{i+1}$ to be the ellipsoid of minimum volume containing the half-ellipsoid $E_{i}\cap H$ , and let $\bar{x}_{i+1}$ be its center.

A3. Let $k\leftarrow k-1$ . Let $j=\operatorname{argmin}_{i=0,\ldots,k}\widetilde{f}_{i}$ . Return $\widetilde{x}_{j}$ and $\widetilde{f}_{j}$ .
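The parameter choices in step A1 determine how many ellipsoid iterations $\mathsf{PolyAlg}$ performs. The sketch below just evaluates the two formulas; the function name and the sample numbers are hypothetical, and the remaining steps (separation, rounding with $\mathcal{B}$ , calls to $\mathsf{Alg}$ ) are not modeled.

```python
import math

def polyalg_parameters(eta, K_prime, R, V, m):
    """Step A1: mu = min{1, eta / (2 K' R)} and N = ceil(2 m^2 ln(2R / (mu V)))."""
    mu = min(1.0, eta / (2.0 * K_prime * R))
    N = math.ceil(2 * m * m * math.log(2.0 * R / (mu * V)))
    return mu, N

# Hypothetical numbers: a 2-dimensional instance with R = V = K' = eta = 1.
mu, N = polyalg_parameters(eta=1.0, K_prime=1.0, R=1.0, V=1.0, m=2)
```

Since $\eta$ enters only through $\ln(1/\mu)$ , halving $\eta$ adds on the order of $m^{2}$ iterations rather than doubling the iteration count, consistent with the $\log(\frac{1}{\varepsilon})$ dependence in Theorem 3.7.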

Lemma 3.11.

Suppose that we have a $(\beta_{1},\beta_{2})$ -approximation for $g(0,0,A^{\prime\prime})$ for some scenario $A^{\prime\prime}\in\mathcal{A}$ . We can efficiently determine that either $\mathsf{LB}=\frac{r}{\beta_{1}\ell_{\max}}$ is a lower bound on $\min_{x\in\mathcal{P}}h({p}\,;{x})$ for every distribution $p$ , or $x=0$ is an optimal solution to $\min_{x\in\mathcal{P}}h({p}\,;{x})$ for every distribution $p$ .

Proof.

We first show that $\min_{x\in X}h({p}\,;{x})\geq\frac{r}{\ell_{\max}}$ for every $p$ , if $\mathcal{A}$ contains any non-null scenario. Otherwise $x=0$ is an optimal solution to $\min_{x\in X}h({p}\,;{x})$ for every $p$ . Note that a non-null scenario $A$ must satisfy $g(0,A)\geq 1$ .

Say $A^{*}\in\mathcal{A}$ is a non-null scenario. Fix any $x\in X$ . There is a feasible solution $\gamma$ to (T ${}_{p,x}$ ) that sends at least $\frac{r}{\ell_{\max}}$ flow to $A^{*}$ , i.e., $\sum_{A\in\mathcal{A}^{\mathrm{sup}}}\gamma_{A,A^{*}}\geq\frac{r}{\ell_{\max}}$ . So $z({p}\,;{x})\geq\frac{r}{\ell_{\max}}\cdot g(x,A^{*})$ , and so $h({p}\,;{x})\geq\frac{r}{\ell_{\max}}$ , since $c^{\intercal}x+g(x,A^{*})\geq 1$ as $A^{*}$ is a non-null scenario. This holds for every $p$ .

If all scenarios in $\mathcal{A}$ are null scenarios, then $z({p}\,;{x})=z({p}\,;{0})$ for all $x\in X$ , since $g(x,A)=g(0,A)$ for all $A\in\mathcal{A}$ and $x\in X$ . Hence, $h({p}\,;{0})=\min_{x\in X}h({p}\,;{x})$ ; again, this holds for all $p$ .

We use the $(\beta_{1},\beta_{2})$ -approximation algorithm for $g(0,0,A^{\prime\prime})$ to obtain a scenario $\overline{A}\in\mathcal{A}$ . Therefore, we have $g(0,\overline{A})\geq\frac{1}{\beta_{1}}\cdot\max_{A\in\mathcal{A}}g(0,A)$ . So if $g(0,\overline{A})<\frac{1}{\beta_{1}}$ , then $g(0,A)<1$ for all $A\in\mathcal{A}$ , which means that all scenarios in $\mathcal{A}$ are null scenarios, and we return $x=0$ as an optimal solution. Otherwise, we return the lower bound $\mathsf{LB}$ . To see why $\mathsf{LB}$ is a valid lower bound when $g(0,\overline{A})\geq\frac{1}{\beta_{1}}$ , note that there are two cases. If $\mathcal{A}$ contains a non-null scenario then we have established that $\frac{r}{\ell_{\max}}$ is a lower bound. Otherwise, we have established that $x=0$ is an optimal solution; there is a feasible solution to (T ${}_{p,0}$ ) that sends at least $\frac{r}{\ell_{\max}}$ flow to $\overline{A}$ , so $h({p}\,;{0})\geq g(0,\overline{A})\cdot\frac{r}{\ell_{\max}}\geq\frac{r}{\beta_{1}\ell_{\max}}$ . ∎
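The case analysis in this proof amounts to a single threshold test on the value $g(0,\overline{A})$ of the scenario returned by the approximation algorithm. A minimal sketch follows; the function name and the tagged return values are hypothetical, and the call to the approximation algorithm that produces $\overline{A}$ is assumed to have already happened.

```python
def lower_bound_or_zero(g0_bar, beta1, r, ell_max):
    """g0_bar = g(0, A_bar) for the scenario A_bar returned by the
    (beta1, beta2)-approximation algorithm run on g(0, 0, A'')."""
    if g0_bar < 1.0 / beta1:
        # Then g(0, A) < 1 for every A, so every scenario is a null scenario.
        return ("zero_optimal", None)
    # Otherwise LB = r / (beta1 * ell_max) is a valid lower bound.
    return ("lower_bound", r / (beta1 * ell_max))

# Hypothetical inputs.
case_null = lower_bound_or_zero(g0_bar=0.1, beta1=2.0, r=1.0, ell_max=4.0)
case_lb = lower_bound_or_zero(g0_bar=1.0, beta1=2.0, r=1.0, ell_max=4.0)
```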

Proof of Theorem 3.7.

We first apply Lemma 3.11 to either determine that $x=0$ is an optimal solution, or obtain a lower bound $\mathsf{LB}=\frac{r}{\beta_{1}\ell_{\max}}$ on $O^{*}_{\widehat{p}}$ . If Lemma 3.11 returns $x=0$ as an optimal solution, then we use Lemma 3.9 and $\mathsf{Alg}$ to obtain a $\beta_{1}\beta_{2}$ -approximate solution $\gamma$ to (T ${}_{{\widehat{p}},{0}}$ ). We return $x=0$ as the optimal solution, and $\sum_{(A,A^{\prime})\in\mathcal{A}^{\mathrm{sup}}\times\mathcal{A}}\gamma_{A,A^{\prime}}g(0,A^{\prime})$ as an estimate of $h({\widehat{p}}\,;{0})$ , which is a suitable estimate due to Lemma 3.9 (ii).

So suppose Lemma 3.11 returns the lower bound $\mathsf{LB}$ . We run Algorithm $\mathsf{PolyAlg}$ with $\eta=\varepsilon\mathsf{LB}$ . By Lemma 3.9 (ii), we immediately obtain that $\widetilde{f}_{l}\leq h({\widehat{p}}\,;{\widetilde{x}_{l}})\leq\beta_{1}\beta_{2}\cdot\widetilde{f}_{l}$ for all $l=1,\ldots,k$ .

We re-work the arguments in Lemma 4.5 from [35]. For $S\subseteq\mathbb{R}^{m}$ , let $\mathsf{vol}(S)$ denote the volume of $S$ . Let $\mathsf{vol}_{m}$ denote the volume of the unit ball (in the $L_{2}$ -norm) in $\mathbb{R}^{m}$ . It is well known that $\frac{\mathsf{vol}(E_{i+1})}{\mathsf{vol}(E_{i})}\leq e^{-1/(2m)}$ for every $i=0,\ldots,N$ (see, e.g., [14]).

Let $x^{*}\in X$ be an optimal solution to $\min_{x\in X}h({\widehat{p}}\,;{x})$ . Recall that $\mu=\min\bigl{\{}1,\frac{\eta}{2K^{\prime}R}\bigr{\}}$ . If $\widetilde{d}_{l}\cdot(x^{*}-\bar{x}_{l})\geq 0$ for some $l$ (this includes the case when $\widetilde{d}_{l}=0$ ), then Lemma 3.10 shows that $\widetilde{f}_{l}\leq\rho\cdot h({\widehat{p}}\,;{x^{*}})$ . Otherwise consider the affine transformation $T$ defined by $T(x)=\mu I_{m}(x-x^{*})+x^{*}=\mu x+(1-\mu)x^{*}$ where $I_{m}$ is the $m\times m$ identity matrix, and let $W=T(\mathcal{P})$ , so $W\subseteq\mathcal{P}$ is a shrunken version of $\mathcal{P}$ . By properties of affine transformations, we have $\mathsf{vol}(W)=\mu^{m}\mathsf{vol}(\mathcal{P})\geq(\mu V)^{m}\mathsf{vol}_{m}$ , where the last inequality follows since $\mathcal{P}$ contains a ball of radius $V$ . For any $x^{\prime}=T(x)\in W$ , we have $\|x^{\prime}-x^{*}\|=\mu\|x-x^{*}\|\leq\frac{\eta}{K^{\prime}}$ since $x,x^{*}\in B(0,R)$ ; so $h({\widehat{p}}\,;{x^{\prime}})\leq h({\widehat{p}}\,;{x^{*}})+\eta$ since $h({\widehat{p}}\,;{\cdot})$ has Lipschitz constant at most $K^{\prime}$ . The volume of the ball $E_{0}=B(0,R)$ is $R^{m}\mathsf{vol}_{m}$ . Therefore,

[TABLE]

So there must be a point $x^{\prime}\in W$ that lies on a boundary of $\mathcal{P}_{k}$ generated by a hyperplane $\widetilde{d}_{l}\cdot(x-\bar{x}_{l})=0$ . This implies (by Lemma 3.10) that

[TABLE]

where the last inequality follows since $\mathsf{LB}$ is a lower bound on $O^{*}_{\widehat{p}}$ . ∎

Proof of Lemma 3.9.

Part (ii) follows immediately from part (i) and the definition of $h({\widehat{p}}\,;{x})$ . We focus on proving part (i). We consider the dual of (T ${}_{\widehat{p},x}$ ), and show that a $(\beta_{1},\beta_{2})$ -approximation algorithm $\mathsf{Alg}$ for $g(x,y,A)$ yields an approximate separation oracle for the dual. The dual of (T ${}_{\widehat{p},x}$ ) is as follows.

[TABLE]

Notice that (D) is an LP (since $x$ is fixed) with an exponential number of constraints, but a polynomial number of variables. It is evident that $\mathsf{Alg}$ yields some type of approximate separation oracle for (D). Using a standard technique in approximation algorithms, we prove that (D), and the primal (T ${}_{\widehat{p},x}$ ), can be solved approximately (see, e.g., [22, 12]).

Define $\mathcal{Q}(\nu):=\{(\theta,y):\text{(4)},\ \text{(5)},\ \sum_{A\in\mathcal{A}^{\mathrm{sup}}}\widehat{p}_{A}\theta_{A}+ry\leq\nu\}$ . Note that $\mathit{OPT}_{\text{(D)}}$ is the smallest $\nu$ such that $\mathcal{Q}(\nu)\neq\emptyset$ . We use $\mathsf{Alg}$ to give an approximate separation oracle in the following sense. Given $\nu,(\theta,y)$ , we either show that $(\beta_{1}\theta,\beta_{1}\beta_{2}y)\in\mathcal{Q}(\beta_{1}\beta_{2}\nu)$ , or we exhibit a hyperplane separating $(\theta,y)$ from $\mathcal{Q}(\nu)$ . Thus, for a fixed $\nu$ , in polynomial time, the ellipsoid method either certifies that $\mathcal{Q}(\nu)=\emptyset$ , or returns a point $(\theta,y)$ with $(\beta_{1}\theta,\beta_{1}\beta_{2}y)\in\mathcal{Q}(\beta_{1}\beta_{2}\nu)$ . The approximate separation oracle proceeds as follows. We first check if $\sum_{A\in\mathcal{A}^{\mathrm{sup}}}\widehat{p}_{A}\theta_{A}+ry\leq\nu$ and (5) hold, and if not, use the appropriate inequality as the separating hyperplane. Next, for every $A\in\mathcal{A}^{\mathrm{sup}}$ , we run $\mathsf{Alg}$ for the point $(x,y,A)$ . If in this process, we ever obtain a scenario $\overline{A}$ such that $g(x,\overline{A})-y\cdot\ell(A,\overline{A})>\theta_{A}$ then we return $\theta_{A}\geq g(x,\overline{A})-y\cdot\ell(A,\overline{A})$ as the separating hyperplane. Otherwise, for all $A\in\mathcal{A}^{\mathrm{sup}}$ and $A^{\prime}\in\mathcal{A}$ , we have

[TABLE]

This implies that $(\beta_{1}\theta,\beta_{1}\beta_{2}y)\in\mathcal{Q}(\beta_{1}\beta_{2}\nu)$ .
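The oracle just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `approx_separation_oracle`, `exact_alg`, and the toy instance are hypothetical, constraint (5) is assumed to be the nonnegativity of $y$, and the toy uses an exact maximizer in place of $\mathsf{Alg}$ (so $\beta_{1}=\beta_{2}=1$).

```python
from typing import Callable, Dict, Hashable, Optional, Tuple

Scenario = Hashable


def approx_separation_oracle(
    theta: Dict[Scenario, float],
    y: float,
    nu: float,
    p_hat: Dict[Scenario, float],                 # empirical distribution on A^sup
    r: float,
    g: Callable[[Scenario], float],               # A' -> g(x, A') for the fixed x
    ell: Callable[[Scenario, Scenario], float],   # scenario metric
    alg: Callable[[float, Scenario], Scenario],   # (y, A) -> scenario returned by Alg
) -> Tuple[str, Optional[tuple]]:
    """Return ("cut", info) with a violated inequality, or ("accept", None)."""
    # Objective constraint: sum_A p_hat[A] * theta[A] + r * y <= nu.
    if sum(p_hat[A] * theta[A] for A in p_hat) + r * y > nu:
        return ("cut", ("objective",))
    # Assumed form of constraint (5): y >= 0.
    if y < 0:
        return ("cut", ("nonneg",))
    # Scenario constraints: theta_A >= g(x, A') - y * ell(A, A').
    for A in p_hat:
        A_bar = alg(y, A)
        rhs = g(A_bar) - y * ell(A, A_bar)
        if rhs > theta[A]:
            return ("cut", ("scenario", A, A_bar, rhs))
    return ("accept", None)


# Hypothetical toy instance: three scenarios, discrete metric, g = cardinality.
SCENARIOS = [frozenset(), frozenset({1}), frozenset({1, 2})]
g_toy = lambda A: float(len(A))
ell_toy = lambda A, B: 0.0 if A == B else 1.0


def exact_alg(y: float, A: Scenario) -> Scenario:
    # Exact maximizer of g(A') - y * ell(A, A'); stands in for Alg.
    return max(SCENARIOS, key=lambda B: g_toy(B) - y * ell_toy(A, B))
```

With an exact maximizer, acceptance means $(\theta,y)\in\mathcal{Q}(\nu)$; with a genuine $(\beta_{1},\beta_{2})$-approximation, acceptance only certifies membership of the scaled point, as argued above.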

It is easy to find an upper bound $\mathsf{UB}$ with $\log\mathsf{UB}$ polynomially bounded such that $\mathcal{Q}(\mathsf{UB})\neq\emptyset$ . For a given $\epsilon>0$ , we use binary search in $[0,\mathsf{UB}]$ to find $\nu^{*}$ such that the ellipsoid method when run for $\nu^{*}$ (with the above separation oracle), returns a solution $(\theta^{*},y^{*})$ with $(\beta_{1}\theta^{*},\beta_{1}\beta_{2}y^{*})\in\mathcal{Q}(\beta_{1}\beta_{2}\nu^{*})$ , and when run for $\nu^{*}-\epsilon$ certifies that $\mathcal{Q}(\nu^{*}-\epsilon)=\emptyset$ . So $\mathit{OPT}_{\text{(D)}}\leq\beta_{1}\beta_{2}\nu^{*}$ . For $\nu^{*}-\epsilon$ , we obtain a polynomial-size certificate for the emptiness of $\mathcal{Q}(\nu^{*}-\epsilon)$ . This consists of the polynomially many violated inequalities returned by the separation oracle during the execution of the ellipsoid method, and the inequality $\sum_{A\in\mathcal{A}^{\mathrm{sup}}}\widehat{p}_{A}\theta_{A}+ry\leq\nu^{*}-\epsilon$ . By duality (or Farkas’ lemma), this means that if we restrict (T ${}_{\widehat{p},x}$ ) to only use the $\gamma_{A,A^{\prime}}$ variables corresponding to (the polynomially-many) violated inequalities of type (4) returned during the execution of the ellipsoid method, we can obtain a polynomial-size feasible solution $\overline{\gamma}$ to (T ${}_{\widehat{p},x}$ ) whose value is at least $\nu^{*}-\epsilon$ . If we take $\epsilon$ to be $1/\exp(\mathcal{I})$ (so the binary search still takes polynomial time), this also implies that $\overline{\gamma}$ has value at least $\nu^{*}\geq\mathit{OPT}_{\text{(D)}}/(\beta_{1}\beta_{2})$ . ∎
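The binary search over $\nu$ can be sketched as follows. This is a minimal illustration, not the procedure of the paper verbatim: the function name `find_nu_star` and the callback `is_certified_empty` (standing in for one run of the ellipsoid method with the approximate separation oracle) are hypothetical, and the sketch assumes the callback is monotone in $\nu$ and accepts at $\mathsf{UB}$.

```python
from typing import Callable


def find_nu_star(
    is_certified_empty: Callable[[float], bool],  # True if the run certifies Q(nu) is empty
    ub: float,
    eps: float,
) -> float:
    """Return nu* such that the run accepts at nu* and certifies emptiness at nu* - eps."""
    if not is_certified_empty(0.0):
        return 0.0
    lo, hi = 0.0, ub  # invariant: emptiness certified at lo, acceptance at hi
    while hi - lo > eps:
        mid = (lo + hi) / 2.0
        if is_certified_empty(mid):
            lo = mid
        else:
            hi = mid
    return hi


# Hypothetical stand-in: Q(nu) is declared empty exactly when nu < 3.7.
toy_is_empty = lambda nu: nu < 3.7
```

The loop performs $O(\log(\mathsf{UB}/\epsilon))$ oracle calls, which is why taking $\epsilon=1/\exp(\mathcal{I})$ keeps the running time polynomial.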

3.2.1 Hardness results for the SAA problem

First, observe that for the DR 2-stage problem $\min_{x\in X}h({\widehat{p}}\,;{x})$ , where $\widehat{p}$ has polynomial-size support, if we set $r=\ell_{\max}$ , then $z({\widehat{p}}\,;{0})=\max_{A\in\mathcal{A}}g(0,A)$ , so that computing $z({\widehat{p}}\,;{0})$ is equivalent to the $\max$ - $\min$ problem $\max_{A\in\mathcal{A}}g(0,A)$ .

Theorem 3.12.

Consider the DR 2-stage problem $\min_{x\in X}h({\widehat{p}}\,;{x})$ , where the support of $\widehat{p}$ is a polynomial-size subset of $\mathcal{A}_{\leq k}=\{A\subseteq U:|A|\leq k\}$ . Consider the following two settings.

(B1) the $k$ -bounded setting with the $\frac{1}{2}L_{1}$ metric;

(B2) the unrestricted setting with scenario metric $\ell$ given by: $\ell(A,A)=0$ for all $A\in\mathcal{A}$ ; for $A\neq A^{\prime}\in\mathcal{A}$ , we have $\ell(A,A^{\prime})=1$ if $|A|,|A^{\prime}|\leq k$ , and $Z$ otherwise, where $\frac{Z}{2}$ is an upper bound on $g(0,U)$ .

Assume that $g(0,\emptyset)=0$ , the $k$ - $\max$ - $\min$ problem $(\Pi):\ \max_{A\subseteq U:|A|\leq k}g(0,A)$ , is NP-hard, and the optimum value of $(\Pi)$ is at least $1$ . We have the following hardness results in both settings, assuming P $\neq$ NP.

(a) No polytime multiplicative approximation is possible for computing $g(0,y,\emptyset)$ , given $y\geq 0$ as input.

(b) By choosing $\widehat{p}$ suitably, the hardness result in (a) carries over to the problem of computing ${\textstyle\operatorname*{E}_{A\sim\widehat{p}}}\bigl{[}g(0,y,A)\bigr{]}$ , given $y\geq 0$ as input.

(c) One can choose $r$ , $\widehat{p}$ so that the problem of computing $z({\widehat{p}}\,;{0})$ is at least as hard as $(\Pi)$ .

Proof.

Part (b) follows from part (a) by simply taking $\widehat{p}$ to be the distribution that puts a weight of $1$ on the scenario $\emptyset$ ; then ${\textstyle\operatorname*{E}_{A\sim\widehat{p}}}\bigl{[}g(0,y,A)\bigr{]}=g(0,y,\emptyset)$ , so the hardness result in part (a) carries over. Let $A^{*}\in\mathcal{A}_{\leq k}$ be an optimal solution to $(\Pi)$ , and $\mathit{OPT}_{\Pi}=g(0,A^{*})$ be its objective value.

Part (a).

We consider the setting (B1) first. Clearly, $g(0,y,\emptyset)$ also seeks to find an optimum of $(\Pi)$ . By exploiting the mixed-sign objective, we can argue that any multiplicative approximation would allow us to decide if $\mathit{OPT}_{\Pi}>T$ by setting $y$ appropriately, which is NP-complete. More precisely, suppose we have a $\beta$ -approximation algorithm for $g(0,y,\emptyset)$ . Then, we can decide if $\mathit{OPT}_{\Pi}>T$ for a given number $T\geq 0$ as follows. Set $y=T$ , and run the $\beta$ -approximation algorithm. If $\mathit{OPT}_{\Pi}>T$ , then

$$g(0,y,\emptyset)\ \geq\ g(0,A^{*})-y\cdot\ell(\emptyset,A^{*})\ =\ \mathit{OPT}_{\Pi}-T\ >\ 0,$$

so the approximation algorithm would return a solution with positive value. If instead we have $\mathit{OPT}_{\Pi}\leq T$ , then for every scenario $A^{\prime}\in\mathcal{A}_{\leq k}$ with $A^{\prime}\neq\emptyset$ , we have $g(0,A^{\prime})-y\cdot\ell(\emptyset,A^{\prime})=g(0,A^{\prime})-T\leq 0$ . Since we also have $g(0,\emptyset)-y\cdot\ell(\emptyset,\emptyset)=0-T\cdot 0=0$ , we conclude that $g(0,y,\emptyset)=0$ , and so the approximation algorithm must return a solution with value $0$ . So we can distinguish between $\mathit{OPT}_{\Pi}>T$ and $\mathit{OPT}_{\Pi}\leq T$ .

Now consider the setting (B2). Again, suppose we are given $T\geq 0$ and we want to decide if $\mathit{OPT}_{\Pi}>T$ . We may assume that $T\geq 1$ , as otherwise the answer is yes. Again take $y=T$ . If $\mathit{OPT}_{\Pi}>T$ , then scenario $A^{*}$ satisfies $g(0,A^{*})-y\cdot\ell(\emptyset,A^{*})>T-T\cdot 1\geq 0$ , so a multiplicative approximation for $g(0,y,\emptyset)$ must return a solution with positive objective value. If $\mathit{OPT}_{\Pi}\leq T$ , then we claim that $g(0,y,\emptyset)=0$ , and so the approximation algorithm must return a solution with objective value 0. Thus, we can distinguish between $\mathit{OPT}_{\Pi}>T$ and $\mathit{OPT}_{\Pi}\leq T$ . To prove the claim, we have $g(0,\emptyset)-y\cdot\ell(\emptyset,\emptyset)=0$ . For every $A^{\prime}\in\mathcal{A}_{\leq k}$ , we have $g(0,A^{\prime})-y\cdot\ell(\emptyset,A^{\prime})\leq T-T\cdot 1=0$ . For every $A^{\prime}\notin\mathcal{A}_{\leq k}$ , we have $g(0,A^{\prime})-y\cdot\ell(\emptyset,A^{\prime})\leq Z/2-TZ\leq 0$ .
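The threshold argument of Part (a) in setting (B1) can be illustrated on a toy instance. The sketch below is only a sanity check under stated assumptions: all names are hypothetical, $g(0,\cdot)$ is taken to be additive (so $(\Pi)$ is easy here and brute force suffices), and a $\beta$-approximation is simulated by dividing the exact value by $\beta$.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Sequence


def g_robust_from_empty(
    g0: Callable[[FrozenSet], float], universe: Sequence, k: int, y: float
) -> float:
    """Exact g(0, y, emptyset) for scenarios of size <= k, where ell(emptyset, A') = 1
    for every nonempty A' (computed by brute force)."""
    best = 0.0  # A' = emptyset contributes g0(emptyset) - y * 0 = 0
    for size in range(1, k + 1):
        for A in combinations(universe, size):
            best = max(best, g0(frozenset(A)) - y)
    return best


def simulated_beta_approx(value: float, beta: float) -> float:
    # Stand-in for any beta-approximation: returns a number in [value / beta, value].
    return value / beta


def decide_opt_greater_than(T, g0, universe, k, beta) -> bool:
    """Decide whether OPT_Pi > T by setting y = T and testing positivity."""
    return simulated_beta_approx(g_robust_from_empty(g0, universe, k, y=T), beta) > 0


# Hypothetical toy instance: additive second-stage cost, so OPT_Pi = 3 + 2 = 5 for k = 2.
WEIGHTS = {0: 3.0, 1: 1.0, 2: 2.0}
g0_toy = lambda A: sum(WEIGHTS[e] for e in A)
```

However large $\beta$ is, the returned value is positive exactly when $\mathit{OPT}_{\Pi}>T$, which is the point of the mixed-sign objective.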

Part (c).

For the setting (B1), we simply set $r=\ell_{\max}$ (and $\widehat{p}$ to be arbitrary). Then, we have $z({\widehat{p}}\,;{0})=\max_{A^{\prime}\in\mathcal{A}_{\leq k}}g(0,A^{\prime})$ , which is exactly the same as problem $(\Pi)$ .

For the setting (B2), we set $r=1$ and take $\widehat{p}$ to be the distribution that puts weight of 1 on $\emptyset$ . We claim that $z({\widehat{p}}\,;{0})$ is again the same as problem $(\Pi)$ . Setting $\gamma_{\emptyset,A^{*}}=1$ and $\gamma_{A,A^{\prime}}=0$ everywhere else gives a feasible solution to (T ${}_{{\widehat{p}},{0}}$ ) of objective value $\mathit{OPT}_{\Pi}$ . Let $\gamma^{*}$ be an optimal solution to (T ${}_{{\widehat{p}},{0}}$ ). Let $\alpha$ be the amount of flow sent by $\gamma^{*}$ on $(\emptyset,A^{\prime})$ pairs with $\ell(\emptyset,A^{\prime})=Z$ . Let $\theta=\gamma^{*}_{\emptyset,\emptyset}$ . The flow on the remaining $(\emptyset,A)$ pairs has volume $1-\alpha-\theta$ , contributes at most $(1-\alpha-\theta)\mathit{OPT}_{\Pi}$ to the objective, and has $\ell$ -cost $1-\alpha-\theta$ . So we have $\alpha\cdot Z+(1-\alpha-\theta)\leq 1$ and $\mathit{OPT}_{\Pi}\leq\alpha\cdot\frac{Z}{2}+(1-\alpha-\theta)\mathit{OPT}_{\Pi}$ , which implies that $(\alpha+\theta)\bigl{(}\mathit{OPT}_{\Pi}-\frac{1}{2}\bigr{)}\leq 0$ . Since $\mathit{OPT}_{\Pi}\geq 1$ by assumption, we have that $\alpha+\theta=0$ , and hence $\gamma^{*}$ has objective value $\mathit{OPT}_{\Pi}$ . ∎
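For completeness, the final implication in the (B2) argument can be unpacked as follows; this is a routine rearrangement of the two displayed inequalities, using only $r=1$, $g(0,\emptyset)=0$, and $g(0,A^{\prime})\leq g(0,U)\leq\frac{Z}{2}$.

$$
\begin{aligned}
\alpha Z+(1-\alpha-\theta)\leq 1 \quad &\Longrightarrow\quad \alpha Z\leq\alpha+\theta,\\
\mathit{OPT}_{\Pi}\leq\alpha\cdot\tfrac{Z}{2}+(1-\alpha-\theta)\,\mathit{OPT}_{\Pi} \quad &\Longrightarrow\quad (\alpha+\theta)\,\mathit{OPT}_{\Pi}\leq\tfrac{\alpha Z}{2}\leq\tfrac{\alpha+\theta}{2},\\
&\Longrightarrow\quad (\alpha+\theta)\bigl(\mathit{OPT}_{\Pi}-\tfrac{1}{2}\bigr)\leq 0.
\end{aligned}
$$

Since $\alpha+\theta\geq 0$ and $\mathit{OPT}_{\Pi}-\frac{1}{2}>0$, the product can be nonpositive only if $\alpha+\theta=0$.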

3.2.2 Refinements: formulating (Q ${}^{\mathrm{fr}}_{\widehat{p}}$ ) as a compact LP in special cases

We say that the set of scenarios $\mathcal{A}$ is collapsible under the scenario metric $\ell$ if for every scenario $A\in\mathcal{A}$ , we can efficiently compute a polynomial-size collection of scenarios $\phi(A)$ such that for every $x\in\mathcal{P}$ , $y\geq 0$ , we have $g(x,y,A)=\max_{A^{\prime}\in\phi(A)}\bigl{(}g(x,A^{\prime})-y\cdot\ell(A,A^{\prime})\bigr{)}$ . For example, if $\mathcal{A}=2^{U}$ for a ground set $U$ , $\ell$ is the discrete scenario metric, and $g(x,A)\leq g(x,A^{\prime})$ for all $x$ , $A\subseteq A^{\prime}$ , then $\mathcal{A}$ is collapsible under $\ell$ since $g(x,y,A)$ is attained by scenarios $A$ or $U$ , for all $(x,y,A)\in\mathcal{P}\times\mathbb{R}_{+}\times\mathcal{A}$ . We show that if $\mathcal{A}$ is collapsible under $\ell$ then (Q ${}^{\mathrm{fr}}_{\widehat{p}}$ ) can be cast as a polytime-solvable LP, and its optimal solution can be rounded using an algorithm that is weaker than a local approximation algorithm. (Note also that in this special case, we have a simple, application-independent polytime algorithm for computing $g(x,y,A)$ exactly.)
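The discrete-metric example can be checked by brute force on a small ground set. The following is a hedged sketch: the function names and the toy cost function are hypothetical, and $g(x,\cdot)$ for a fixed $x$ is modeled as an arbitrary monotone set function.

```python
from itertools import combinations
from typing import Callable, FrozenSet, Iterator, Sequence


def subsets(U: Sequence) -> Iterator[FrozenSet]:
    """Enumerate all subsets of U (exponential; for illustration only)."""
    for n in range(len(U) + 1):
        for c in combinations(U, n):
            yield frozenset(c)


def discrete_metric(A: FrozenSet, B: FrozenSet) -> float:
    return 0.0 if A == B else 1.0


def g_robust_bruteforce(g: Callable, U: Sequence, y: float, A: FrozenSet) -> float:
    """max over all A' of g(A') - y * ell(A, A')."""
    return max(g(Ap) - y * discrete_metric(A, Ap) for Ap in subsets(U))


def g_robust_collapsed(g: Callable, U: Sequence, y: float, A: FrozenSet) -> float:
    """Same maximum restricted to phi(A) = {A, U}."""
    return max(g(Ap) - y * discrete_metric(A, Ap) for Ap in (A, frozenset(U)))


# Hypothetical monotone second-stage cost for a fixed x: additive element costs.
GROUND = ["a", "b", "c"]
COST = {"a": 2.0, "b": 1.0, "c": 4.0}
g_toy = lambda A: sum(COST[e] for e in A)
```

The two functions agree because every $A^{\prime}\neq A$ pays the same penalty $y$, so among those scenarios the monotone cost is maximized at $U$.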

A restricted local $\rho$ -approximation algorithm takes as input a point $x\in\mathcal{P}$ and a set of scenarios $\widetilde{\mathcal{A}}\subseteq\mathcal{A}$ , and returns an integral solution $\widetilde{x}\in X$ and integral recourse actions $\widetilde{z}^{A}$ for every $A\in\widetilde{\mathcal{A}}$ (possibly specified implicitly), such that $c^{\intercal}\widetilde{x}\leq\rho(c^{\intercal}x)$ and $(\text{cost of }\widetilde{z}^{A})\leq\rho\cdot g(x,A)$ for all $A\in\widetilde{\mathcal{A}}$ . (A local $\rho$ -approximation algorithm is a special case of this.) This weaker notion will be crucial for the Steiner-tree application in Section 3.3.

Theorem 3.13.

Suppose that $\mathcal{A}$ is collapsible under the scenario metric $\ell$ , and $g(x,A)$ is the optimal value of a polytime-solvable LP for all $(x,A)\in\mathcal{P}\times\mathcal{A}$ . Suppose that we have a polytime separation oracle for $\mathcal{P}$ , and a restricted local $\rho$ -approximation algorithm for (Q ${}^{\mathrm{fr}}_{\widehat{p}}$ ). Then, in $\operatorname{\mathsf{poly}}(\mathcal{I})$ time, we can compute:

(a) an optimal solution $\bar{x}\in\mathcal{P}$ to $\min_{x\in\mathcal{P}}h({\widehat{p}}\,;{x})$ , and its objective value $h({\widehat{p}}\,;{\bar{x}})$ ;

(b) $\widetilde{x}\in X$ , and its objective value $h({\widehat{p}}\,;{\widetilde{x}})$ , satisfying $h({\widehat{p}}\,;{\widetilde{x}})\leq\rho\cdot\min_{x\in\mathcal{P}}h({\widehat{p}}\,;{x})$ .

Proof.

We reformulate $z({\widehat{p}}\,;{x})$ as an LP. The dual of (T ${}_{\widehat{p},x}$ ) is as follows.

[TABLE]

Since by assumption $\mathcal{A}$ is collapsible under the scenario metric $\ell$ , the exponentially many constraints in (6) can be collapsed to the polynomially many constraints:

[TABLE]

Suppose that $g(x,A)$ is captured by the polytime-solvable LP: $\min\ c^{A}\cdot z_{A}\ \text{s.t.}\ (x,z_{A})\in\mathcal{F}(A)$ , where $\mathcal{F}(A)$ is a polytope (over which we can optimize linear functions efficiently). Then, incorporating this in the above constraints, we obtain the following LP-formulation for $\min_{x\in\mathcal{P}}h({\widehat{p}}\,;{x})$ .

[TABLE]

Since we have polytime separation oracles for the polytopes $\mathcal{P}$ and $\{\mathcal{F}(A^{\prime})\}$ , we can efficiently compute an optimal solution $\bar{x}$ for (DR-LP) using the ellipsoid method. This proves part (a).

Part (b) follows from part (a) by applying the restricted local $\rho$ -approximation algorithm with the scenario set $\widetilde{\mathcal{A}}:=\cup_{A\in\mathcal{A}^{\mathrm{sup}}}\phi(A)$ to round $\bar{x}$ and obtain $\widetilde{x}\in X$ . As shown above, we can efficiently compute $z({\widehat{p}}\,;{\widetilde{x}})$ , and hence $h({\widehat{p}}\,;{\widetilde{x}})$ , by solving an LP. Observe that if $(\theta^{*},y^{*})$ is an optimal solution to (D ${}_{\widehat{p},\bar{x}}$ ), then $(\rho\theta^{*},\rho y^{*})$ satisfies constraints (7), which implies that $z({\widehat{p}}\,;{\widetilde{x}})\leq\rho\cdot z({\widehat{p}}\,;{\bar{x}})$ . Since we also have $c^{\intercal}\widetilde{x}\leq\rho c^{\intercal}\bar{x}$ , this implies $h({\widehat{p}}\,;{\widetilde{x}})\leq\rho\cdot h({\widehat{p}}\,;{\bar{x}})$ . ∎

3.3 Applications to distributionally robust combinatorial optimization

We now apply our framework—i.e., Theorems 3.5 and 3.7—for handling general DR 2-stage problems to obtain the first approximation guarantees for the DR versions of various combinatorial-optimization problems (under the Wasserstein metric) such as set cover, vertex cover, edge cover, facility location, and Steiner tree. Except for set cover, our approximation factors are within $O(1)$ factors of the guarantees known for the deterministic counterparts of these problems. In order to apply Theorems 3.5 and 3.7 for a specific problem, we need to do the following.

1. Verify that properties (P1)–(P6) hold. This is usually quite immediate. (P1)–(P3) follow from the problem definition (in most cases $X=\{0,1\}^{m}$ , $\mathcal{P}=[0,1]^{m}$ ), with $\lambda$ being the maximum factor by which the cost of a first-stage action increases in the second stage. (P4), (P5) follow from prior work [35, 38] as the underlying 2-stage problem falls into the class of 2-stage programs considered therein. (P6) can usually be satisfied by taking $\tau=\mathsf{UB}/(\min_{A,A^{\prime}:\ell(A,A^{\prime})>0}\ell(A,A^{\prime}))$ , for a suitable upper bound $\mathsf{UB}$ on $\max_{A\in\mathcal{A}}g(0,A)$ .

2. Furnish the following algorithms.

(a) An LP-relative $\alpha$ -approximation algorithm for the deterministic counterpart, so as to round $g(x,A)$ and obtain integral second-stage decisions: we simply plug in known approximation results.

(b) A local $\rho$ -approximation algorithm for the 2-stage problem: we have $\rho=2\alpha$ for set cover, vertex cover, and edge cover [35], and $\rho=O(1)$ for facility location [35]. (For Steiner tree, we use Theorem 3.13 in place of Theorem 3.7; see below.)

(c) A $(\beta_{1},\beta_{2})$ -approximation algorithm for computing $g(x,y,A)$ , where $(x,y,A)\in X\times\mathbb{R}_{+}\times\mathcal{A}$ . This is a new component that we need to devise, whose design will depend on the scenario set $\mathcal{A}$ and the scenario metric $\ell$ (and of course the underlying problem). For various problems, we show how to obtain such an approximation by building upon results known for $k$ - $\max$ - $\min$ problems. We defer the proof of Theorem 3.14 to the end of this section (Section 3.3.6).

Theorem 3.14.

For the $k$ -bounded setting with $\ell$ being the discrete metric, for any $(x,y,A)\in X\times\mathbb{R}_{+}\times\mathcal{A}$ , we can obtain $(\beta,1)$ -approximation algorithms for computing $g(x,y,A)$ , where $\beta$ is: (a) $O(\log n)$ for set cover; (b) $\frac{2e}{e-1}$ for vertex cover; and (c) $2$ for edge cover.

Theorems 3.5 and 3.7 then show that, for any $\varepsilon>0$ , we can obtain a solution to the distributionally robust discrete 2-stage problem (i.e., integral first- and second-stage decisions) of cost at most $4\alpha\rho\beta_{1}\beta_{2}\bigl{(}1+O(\varepsilon)\bigr{)}$ times the optimum in $\operatorname{\mathsf{poly}}\bigl{(}\mathcal{I},\frac{\lambda}{\varepsilon}\bigr{)}$ time (and hence, sample complexity).

In certain cases, we can obtain improved guarantees by exploiting the fact that the fractional SAA problem, $\min_{x\in\mathcal{P}}h({\widehat{p}}\,;{x})$ , can be solved in a better way, without resorting to a local approximation algorithm. The most generic such setting is the unrestricted setting when the scenario collection $\mathcal{A}=2^{U}$ is collapsible under the scenario metric. This includes the following natural choices of the scenario metric.

Lemma 3.15.

Suppose that for all $x\in\mathcal{P}$ , and all $A\subseteq A^{\prime}$ , we have $g(x,A)\leq g(x,A^{\prime})$ . Then the collection of scenarios $\mathcal{A}=2^{U}$ is collapsible under: (i) the discrete metric $\ell^{\mathsf{dis}}$ ; and (ii) the asymmetric metric $\ell^{\mathsf{asym}}_{\infty}(A,A^{\prime})=\max_{j^{\prime}\in A^{\prime}}w(j^{\prime},A)$ , where $w$ is a metric on $U$ .

Proof.

Let $A\in\mathcal{A}$ be an arbitrary scenario. If $\ell$ is the discrete metric $\ell^{\mathsf{dis}}$ , we take $\phi(A):=\{A,U\}$ . If $\ell$ is the asymmetric metric $\ell^{\mathsf{asym}}_{\infty}$ , we take $\phi(A):=\left\{\{j\in U:\min_{k\in A}w_{kj}\leq\mu\}:\mu\in\mathcal{L}\right\}$ , where $\mathcal{L}:=\{w_{jj^{\prime}}:j,j^{\prime}\in U\}$ is the set of all distances between two elements of the ground set. Note that in both settings, if we choose an arbitrary pair $(x,\mu)\in\mathcal{P}\times\mathcal{L}$ , the collection of scenarios $\phi(A)$ contains the (unique) maximal solution $A^{\prime}$ for the constrained problem (3.25). By the monotonicity property of the second-stage costs $g(\cdot,\cdot)$ imposed in the lemma statement, $A^{\prime}$ is optimal for (3.25). By Lemma 3.25, it follows that $\phi(A)$ contains an optimal solution for the unconstrained problem $g(x,y,A)$ for every pair $(x,y)\in\mathcal{P}\times\mathbb{R}_{+}$ , and so $\mathcal{A}$ is collapsible under $\ell$ . ∎
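The construction of $\phi(A)$ for the asymmetric metric can be sketched in Python. This is an illustration under stated assumptions: the names are hypothetical, $A$ is assumed nonempty, the metric $w$ is passed as a function, and the toy instance places four points on a line with an additive (hence monotone) cost.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List, Sequence


def dist_to_set(j, A: FrozenSet, w: Callable) -> float:
    """w(j, A) = min over k in A of w(k, j); assumes A is nonempty."""
    return min(w(k, j) for k in A)


def phi_asym(A: FrozenSet, U: Sequence, w: Callable) -> List[FrozenSet]:
    """One scenario {j : w(j, A) <= mu} per pairwise distance mu, duplicates removed."""
    levels = sorted({w(j, jp) for j in U for jp in U})
    family: List[FrozenSet] = []
    for mu in levels:
        S = frozenset(j for j in U if dist_to_set(j, A, w) <= mu)
        if S not in family:
            family.append(S)
    return family


def ell_asym(A: FrozenSet, A_prime: FrozenSet, w: Callable) -> float:
    """max over j' in A' of w(j', A); taken to be 0 when A' is empty."""
    return max((dist_to_set(j, A, w) for j in A_prime), default=0.0)


# Hypothetical toy instance: points 0..3 on a line, additive monotone cost.
POINTS = [0, 1, 2, 3]
w_line = lambda j, jp: float(abs(j - jp))
g_toy = lambda A: float(sum(1 + e for e in A))
```

The family has at most $|U|^{2}$ members, one per distinct distance, and each member is the maximal scenario within its distance level.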

The condition on $g$ in Lemma 3.15 holds for all our applications, since they are covering problems. Thus, in the unrestricted setting with Wasserstein metric corresponding to the scenario metrics in Lemma 3.15, Theorem 3.13 combined with Theorem 3.5 yields an improved $4\alpha\rho\bigl{(}1+O(\varepsilon)\bigr{)}$ -approximation, using a restricted local $\rho$ -approximation algorithm, a weaker requirement that is crucial for Steiner tree. There are other, orthogonal benefits that result from achieving a better approximation for the fractional SAA problem than that given by Theorem 3.7. These require taking a different route than Theorem 3.5 to transfer approximation guarantees from the SAA problem to the original problem. We discuss these in the context of the specific problems to which they apply.

3.3.1 Set cover

The DR version was defined in Section 2. Recall that an instance is given by $\bigl{(}U,\mathcal{S},\{c_{S},c^{\mathrm{II}}_{S}\}_{S\in\mathcal{S}}\bigr{)}$ , where $\mathcal{S}\subseteq 2^{U}$ and $c,c^{\mathrm{II}}$ denote the first- and second-stage costs respectively. Let $n=|U|$ . We have $\alpha=O(\log n)$ , and $\rho=2\alpha$ . Different scenarios could be quite unrelated, so there does not seem to be a natural choice for $\ell$ other than the discrete metric $\ell^{\mathsf{dis}}$ ; we therefore consider the $\frac{1}{2}L_{1}$ -metric. We can take $\tau=\sum_{S\in\mathcal{S}}c^{\mathrm{II}}_{S}$ . Instantiating the above results yields an $O(\log^{2}n)$ -approximation in the unrestricted setting, and an $O(\log^{3}n)$ -approximation in the $k$ -bounded setting (using Theorem 3.14 (a)). But we can do better and improve these guarantees by an $O(\log n)$ factor.

By incorporating a decoupling idea of [35] in our ellipsoid-based algorithm (in a manner similar to [11] in their work on 2-stage robust set cover), we can avoid the use of a local approximation algorithm in Algorithm $\mathsf{PolyAlg}$ , and instead use a $(\beta_{1},\beta_{2})$ -approximation algorithm for $g(x,y,A)$ more directly.

Theorem 3.16.

Consider the fractional SAA $\mathsf{DRSSC}$ problem: $\min_{x\in\mathcal{P}}h({\widehat{p}}\,;{x})$ . Suppose that we have a $(\beta_{1},\beta_{2})$ -approximation algorithm for $g(0,y,A)$ for any $(y,A)\in\mathbb{R}_{+}\times\mathcal{A}$ . For any $\varepsilon>0$ , in $\operatorname{\mathsf{poly}}\bigl{(}\mathcal{I},\log(\frac{1}{\varepsilon})\bigr{)}$ time, we can compute $\bar{x}\in\mathcal{P}$ , and an estimate $\overline{f}$ of $h({\widehat{p}}\,;{\bar{x}})$ , satisfying $h({\widehat{p}}\,;{\bar{x}})\leq\beta_{1}\beta_{2}\overline{f}$ and $\overline{f}\leq 2(1+\varepsilon)\cdot\min_{x\in\mathcal{P}}h({\widehat{p}}\,;{x})$ .

We complement Theorem 3.16 with an analogue of Theorem 3.5, to transfer approximation guarantees from the fractional SAA problem, $\min_{x\in\mathcal{P}}h({\widehat{p}}\,;{x})$ , to the original fractional problem, $\min_{x\in\mathcal{P}}h({\mathring{p}}\,;{x})$ .

Note that by Lemma 3.11, we can find in polytime (under very mild assumptions) a lower bound $\mathsf{LB}$ (independent of $p$ ) on the optimal value of $\min_{x\in\mathcal{P}}h({p}\,;{x})$ such that $\log\bigl{(}\frac{1}{\mathsf{LB}}\bigr{)}=\operatorname{\mathsf{poly}}(\mathcal{I})$ , or determine if $x=0$ is an optimal solution to $\min_{x\in\mathcal{P}}h({p}\,;{x})$ for every distribution $p$ . In the latter case, there is nothing to be done, so assume otherwise.

Theorem 3.17.

Let $\varepsilon\leq\frac{1}{3}$ , $\eta>0$ . Let (Q ${}^{\mathrm{fr}}_{\mathring{p}}$ ): $\min_{x\in\mathcal{P}}h({\mathring{p}}\,;{x})$ , be the fractional version of a DR problem satisfying properties (P1)–(P6). Let $\mathsf{LB}$ be a lower bound on $\min_{x\in\mathcal{P}}h({p}\,;{x})$ for all $p$ . Consider $k=\frac{2}{\varepsilon}\log\bigl{(}\frac{1}{\delta}\bigr{)}$ SAA problems with objective functions $h({\widehat{p}^{i}}\,;{x}):=c^{\intercal}x+z({\widehat{p}^{i}}\,;{x})$ , for $i=1,\ldots,k$ , where each $\widehat{p}^{i}$ is an empirical estimate of $\mathring{p}$ constructed using $N=\operatorname{\mathsf{poly}}\bigl{(}\frac{\lambda}{\varepsilon},\log(\frac{\tau R}{V\mathsf{LB}}),\log(\frac{1}{\delta})\bigr{)}$ independent samples. Suppose that for every $i=1,\ldots,k$ , we have a solution $\bar{x}^{i}\in\mathcal{P}$ and an estimate $\overline{f}^{i}$ of $h({\widehat{p}^{i}}\,;{\bar{x}^{i}})$ satisfying $h({\widehat{p}^{i}}\,;{\bar{x}^{i}})\leq\overline{\beta}\cdot\overline{f}^{i}$ and $\overline{f}^{i}\leq\rho\cdot\min_{x\in\mathcal{P}}h({\widehat{p}^{i}}\,;{x})$ (where $\overline{\beta},\rho\geq 1$ ). Let $j=\operatorname{argmin}_{i=1,\ldots,k}\overline{f}^{i}$ and $\bar{x}=\bar{x}^{j}$ . Then, $h({\mathring{p}}\,;{\bar{x}})\leq 4\overline{\beta}\rho\bigl{(}1+O(\varepsilon)\bigr{)}\cdot\min_{x\in\mathcal{P}}h({\mathring{p}}\,;{x})$ with probability at least $1-3\delta$ .
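The selection step in the theorem statement is simple enough to sketch directly. The helper names below are hypothetical, and the logarithm in $k$ is assumed to be the natural logarithm and rounded up to an integer; the theorem itself does not fix these details.

```python
import math
from typing import List, Sequence, Tuple, TypeVar

X = TypeVar("X")


def num_saa_problems(eps: float, delta: float) -> int:
    """k = (2 / eps) * log(1 / delta), rounded up (natural log assumed)."""
    return math.ceil((2.0 / eps) * math.log(1.0 / delta))


def select_solution(solutions: Sequence[X], estimates: Sequence[float]) -> Tuple[int, X]:
    """Return (j, solutions[j]) where j minimizes the estimates f^i."""
    j = min(range(len(estimates)), key=lambda i: estimates[i])
    return j, solutions[j]
```

Note that the choice of $j$ uses only the estimates $\overline{f}^{i}$, never the true objective $h({\mathring{p}}\,;{\cdot})$, which is not accessible in the black-box model.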

Before proving Theorems 3.16 and 3.17, we state the results that follow from these (and other prior results). Combining Theorems 3.13 (a) and 3.17, and a local $\rho$ -approximation algorithm (where $\rho=O(\log n)$ ), we obtain an $O(\log n)$ -approximation in the unrestricted setting. Combining Theorems 3.14 (a), 3.16, and 3.17, and a local $\rho$ -approximation algorithm, we obtain an $O(\log^{2}n)$ -approximation in the $k$ -bounded setting.

Proof of Theorem 3.17.

The proof follows by suitably discretizing $\mathcal{P}$ and applying Theorem 3.5 to the discretized version of $\mathcal{P}$ . By Lemma 3.8, for every distribution $p$ , we have that the Lipschitz constant of $h({p}\,;{x})$ is at most $K^{\prime}:=\|c\|+K$ , and $\ln K^{\prime}=\operatorname{\mathsf{poly}}(\mathcal{I})$ . Recall that by (P3), $\mathcal{P}$ is contained in the ball $B(0,R)$ , and contains a ball of radius $V\leq 1$ such that $\ln\bigl{(}\frac{R}{V}\bigr{)}=\operatorname{\mathsf{poly}}(\mathcal{I})$ . We discretize $\mathcal{P}$ as in [38]. Let $\Delta=\frac{\varepsilon\cdot\mathsf{LB}\cdot V}{8K^{\prime}R\sqrt{m}}$ , and consider the grid $\mathcal{G}=\{x\in\mathcal{P}:x_{i}=n_{i}\Delta,\ \ n_{i}\in\mathbb{Z}_{+}\text{ for all }i=1,\dots,m\}$ . (Note that $V$ needs to be a part of the specification of the grid size; otherwise, a “flat” $\mathcal{P}$ could evade the grid across arbitrarily large distances.) As shown in [38], we have: (i) $|\mathcal{G}|\leq\bigl{(}\frac{2R}{\Delta}\bigr{)}^{m}$ , and so $\log|\mathcal{G}|=\operatorname{\mathsf{poly}}\bigl{(}\mathcal{I},\log(\frac{1}{\varepsilon\cdot\mathsf{LB}})\bigr{)}$ ; and (ii) for any $x\in\mathcal{P}$ , letting $\phi(x)$ denote the point in $\mathcal{G}$ closest to $x$ in Euclidean distance, we have $\bigl{\|}x-\phi(x)\bigr{\|}\leq\frac{\varepsilon\mathsf{LB}}{K^{\prime}}$ , and hence, $\bigl{|}h({p}\,;{x})-h({p}\,;{\phi(x)})\bigr{|}\leq\varepsilon\mathsf{LB}$ .
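For the common case $\mathcal{P}=[0,1]^{m}$, the grid and the distance bound in (ii) can be checked numerically. The sketch below is an assumption-laden illustration: the function names are hypothetical, and it rounds each coordinate down to a multiple of $\Delta$ rather than to the nearest grid point, which stays inside the box and still moves $x$ by at most $\Delta\sqrt{m}\leq\frac{\varepsilon\mathsf{LB}}{K^{\prime}}$.

```python
import math
from typing import List, Sequence


def grid_step(eps: float, lb: float, V: float, K_prime: float, R: float, m: int) -> float:
    """Delta = eps * LB * V / (8 * K' * R * sqrt(m))."""
    return eps * lb * V / (8.0 * K_prime * R * math.sqrt(m))


def snap_down_to_grid(x: Sequence[float], delta: float) -> List[float]:
    """Round every coordinate down to a nonnegative multiple of delta.

    For P = [0, 1]^m this lands on a grid point inside P; it is a simple
    surrogate for the nearest-point map phi used in the proof.
    """
    return [math.floor(xi / delta) * delta for xi in x]


def euclidean_distance(x: Sequence[float], z: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, z)))
```

For instance, with $m=3$, $R=\sqrt{3}$, $V=\frac{1}{2}$, $K^{\prime}=2$, $\mathsf{LB}=0.1$ and $\varepsilon=0.1$, the step is $\Delta=0.005/48$, far below the allowed displacement $0.005$.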

Let $N$ , the number of samples used to construct each empirical estimate $\widehat{p}^{i}$ , be as given by Theorem 3.5, when we apply it taking $X$ to be the grid $\mathcal{G}$ —i.e., we are considering the DR 2-stage problem $\min_{x\in\mathcal{G}}h({\mathring{p}}\,;{x})$ —and $\eta=\varepsilon\mathsf{LB}$ . Note that properties (P1)–(P6) hold for this DR problem (since by assumption they hold for the DR problem $\min_{x\in X}h({\mathring{p}}\,;{x})$ ).

To apply Theorem 3.5 with $X=\mathcal{G}$ , we also need to supply the points $\widehat{x}^{i}$ and the estimates $f^{i}$ as required by the theorem statement. We set $\widehat{x}^{i}=\phi(\bar{x}^{i})$ , and $f^{i}=\max\{\overline{f}^{i},\mathsf{LB}\}$ for all $i=1,\ldots,k$ . We show that these satisfy properties (S1) and (S2) in the statement of Theorem 3.5, with $\beta=\overline{\beta}(1+\varepsilon)$ . To see this, consider any $i=1,\ldots,k$ . We have

[TABLE]

and since $\mathsf{LB}\leq\min_{x\in\mathcal{P}}h({\mathring{p}}\,;{x})$ , we have $f^{i}\leq\rho\min_{x\in\mathcal{P}}h({\widehat{p}^{i}}\,;{x})\leq\rho\min_{x\in\mathcal{G}}h({\widehat{p}^{i}}\,;{x})$ . Moreover, the index $j$ , which is a minimizer of the $\{\overline{f}^{i}\}$ estimates, is also a minimizer for the new estimates $\{f^{i}\}$ . So applying Theorem 3.5, we obtain that with probability at least $1-3\delta$ ,

[TABLE]

Note that $\min_{x\in\mathcal{G}}h({\mathring{p}}\,;{x})\leq\min_{x\in\mathcal{P}}h({\mathring{p}}\,;{\phi(x)})\leq\min_{x\in\mathcal{P}}h({\mathring{p}}\,;{x})+\varepsilon\mathsf{LB}$ . Therefore, we have

[TABLE]

Proof of Theorem 3.16.

Let $\bigl{(}U,\mathcal{S},\{c_{S},c^{\mathrm{II}}_{S}\}_{S\in\mathcal{S}}\bigr{)}$ be the DR set cover instance being solved. For any point $\bar{x}\in\mathcal{P}$ , let $S_{\bar{x}}:=\{e\in U:\sum_{S\in\mathcal{S}:e\in S}\bar{x}_{S}\geq 1/2\}$ be the set of elements covered to an extent of at least $1/2$ by the first-stage sets.

The improvement comes from a better way of generating a cut passing through the center $\bar{x}$ of the current ellipsoid, when $\bar{x}\in\mathcal{P}$ . Instead of rounding $\bar{x}$ to $\widetilde{x}\in X$ using a local $\rho$ -approximation algorithm and using approximate solutions to $g(\widetilde{x},y,A)$ to generate a suitable cut at $\bar{x}$ in step A2.A2.b) of Algorithm $\mathsf{PolyAlg}$ , we do the following. Since elements in $S_{\bar{x}}$ are mostly covered by $\bar{x}$ , and the remaining elements are mostly uncovered, intuitively only these remaining elements should matter. Indeed, we argue that approximate solutions to $\max_{A^{\prime}\in\mathcal{A}}\bigl{(}g(0,A^{\prime}\setminus S_{\bar{x}})-y\cdot\ell(A,A^{\prime})\bigr{)}$ can be used to obtain a suitable cut at $\bar{x}$ . Note that this problem can be cast as $g(0,y,A)$ for a modified instance where we add $S_{\bar{x}}$ to our set-system, with costs $c_{S_{\bar{x}}}=c^{\mathrm{II}}_{S_{\bar{x}}}=0$ . Thus, we avoid the $\rho$ -factor loss that was incurred earlier due to the local approximation.

Consider the following LP.

[TABLE]

We prove analogues of Lemmas 3.9 and 3.10 showing that one can compute an approximate solution to (W ${}_{\bar{x}}$ ) using an approximation algorithm for $g(0,y,A)$ (Lemma 3.18 (i)), which allows us to both approximate $h({\widehat{p}}\,;{\widetilde{x}})$ for a related point $\widetilde{x}$ (Lemma 3.18 (ii)), and obtain a suitable cut passing through $\bar{x}$ (Lemma 3.19).

Lemma 3.18.

Let $\bar{x}\in\mathcal{P}$ and $\widetilde{x}:=(\min\{2\bar{x}_{S},1\})_{S\in\mathcal{S}}$ . Suppose we have a $(\beta_{1},\beta_{2})$ -approximation algorithm for $g(0,y,A)$ for all $(y,A)\in\mathbb{R}_{+}\times\mathcal{A}$ . Then, (i) we can compute a $\beta_{1}\beta_{2}$ -approximate solution $\gamma$ to (W ${}_{\bar{x}}$ ); (ii) hence, letting $\widetilde{f}=2c^{\intercal}\bar{x}+\sum_{(A,A^{\prime})\in\mathcal{A}^{\mathrm{sup}}\times\mathcal{A}}\gamma_{A,A^{\prime}}g(0,A^{\prime}\setminus S_{\bar{x}})$ , we have $h({\widehat{p}}\,;{\widetilde{x}})\leq\beta_{1}\beta_{2}\cdot\widetilde{f}$ .

Proof.

Consider the instance of DR set cover obtained from the original instance $\bigl{(}U,\mathcal{S},\{c_{S},c^{\mathrm{II}}_{S}\}_{S\in\mathcal{S}}\bigr{)}$ by adding the set $S_{\bar{x}}$ to $\mathcal{S}$ , with costs $c_{S_{\bar{x}}}=c^{\mathrm{II}}_{S_{\bar{x}}}=0$ . Let $\{g^{\mathrm{new}}(x,A)\}_{x\in\mathcal{P},A\in\mathcal{A}}$ denote the second-stage costs for this new instance of DR set cover. Note that, for every scenario $A\in\mathcal{A}$ , we have $g^{\mathrm{new}}(0,A)=g(0,A\setminus S_{\bar{x}})$ . Therefore, if we were to write the LP ( $\text{T}_{\widehat{p},0}$ ) for this modified instance of DR set cover (i.e., ( $\text{T}_{\widehat{p},0}$ ) with $g$ substituted by $g^{\mathrm{new}}$ ), we would obtain (W ${}_{\bar{x}}$ ). This means that we can obtain a $\beta_{1}\beta_{2}$ -approximate solution $\gamma$ to (W ${}_{\bar{x}}$ ) by applying Lemma 3.9 (i) to the modified instance (using the $(\beta_{1},\beta_{2})$ -approximation algorithm for $g(0,y,A)$ given to us, also applied to the modified instance). This proves (i).

To prove (ii), let $\gamma^{*}$ be an optimal solution of (T ${}_{\widehat{p},\widetilde{x}}$ ). We obtain

[TABLE]

The first inequality follows because $\widetilde{x}\leq 2\bar{x}$ and, for every scenario $A^{\prime}\in\mathcal{A}$ , we have $g(\widetilde{x},A^{\prime})\leq g(0,A^{\prime}\setminus S_{\bar{x}})$ . The latter inequality holds because every feasible fractional second-stage solution for scenario $A^{\prime}\setminus S_{\bar{x}}$ with $x=0$ as the first-stage solution, covers all elements of $A^{\prime}\setminus S_{\bar{x}}$ fully, and hence, combined with $\widetilde{x}$ , fully covers all elements of $A^{\prime}$ ; therefore, it yields feasible fractional second-stage actions for scenario $A^{\prime}$ given the first-stage actions $\widetilde{x}$ . The second inequality above follows because $\gamma$ is a $\beta_{1}\beta_{2}$ -approximate solution for (W ${}_{\bar{x}}$ ). The final inequality uses the fact that $\beta_{1},\beta_{2}\geq 1$ . ∎
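The covering argument above can be checked on a small instance. The sketch below uses hypothetical names and a hypothetical three-set instance; it computes $S_{\bar{x}}$ and $\widetilde{x}$ and confirms that $\widetilde{x}$ alone fully covers every element of $S_{\bar{x}}$.

```python
from typing import Dict, Hashable, Iterable, Set


def coverage(e: Hashable, sets: Dict[str, Set], x: Dict[str, float]) -> float:
    """Total fractional extent to which element e is covered by x."""
    return sum(x[name] for name, members in sets.items() if e in members)


def half_covered_elements(
    universe: Iterable, sets: Dict[str, Set], x_bar: Dict[str, float]
) -> Set:
    """S_xbar: elements covered to an extent of at least 1/2 by x_bar."""
    return {e for e in universe if coverage(e, sets, x_bar) >= 0.5}


def double_and_cap(x_bar: Dict[str, float]) -> Dict[str, float]:
    """x_tilde = (min{2 * x_bar_S, 1}) for every set S."""
    return {name: min(2.0 * v, 1.0) for name, v in x_bar.items()}


# Hypothetical toy set-cover instance.
UNIVERSE = [1, 2, 3]
SETS = {"S1": {1, 2}, "S2": {2, 3}, "S3": {3}}
X_BAR = {"S1": 0.3, "S2": 0.25, "S3": 0.1}
```

Here only element $2$ is covered to an extent of at least $\frac{1}{2}$ (namely $0.55$), and after doubling its coverage is $1.1\geq 1$. In general, either some set containing the element is capped at $1$, or no cap applies and the coverage exactly doubles.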

Lemma 3.19.

Let $\bar{x}\in\mathcal{P}$ and $\widetilde{x}:=(\min\{2\bar{x}_{S},1\})_{S\in\mathcal{S}}$ . Let $\gamma$ be a $\beta$ -approximate solution to the LP (W ${}_{\bar{x}}$ ), and let $\widetilde{d}=c+\sum_{(A,A^{\prime})\in\mathcal{A}^{\mathrm{sup}}\times\mathcal{A}}\gamma_{A,A^{\prime}}d^{\bar{x},A^{\prime}\setminus S_{\bar{x}}}$ . If $x^{\prime}\in\mathcal{P}$ is such that $\widetilde{d}^{\intercal}(x^{\prime}-\bar{x})\geq 0$ , then $h({\widehat{p}}\,;{x^{\prime}})\geq\frac{1}{2}\bigl{(}2c^{\intercal}\bar{x}+\sum_{(A,A^{\prime})\in\mathcal{A}^{\mathrm{sup}}\times\mathcal{A}}\gamma_{A,A^{\prime}}g(0,A^{\prime}\setminus S_{\bar{x}})\bigr{)}\geq\frac{1}{2\beta}\cdot h({\widehat{p}}\,;{\widetilde{x}})$ .

Proof.

Consider the function $f(x)=c^{\intercal}x+\sum_{(A,A^{\prime})\in\mathcal{A}^{\mathrm{sup}}\times\mathcal{A}}\gamma_{A,A^{\prime}}g(x,A^{\prime}\setminus S_{\bar{x}})$ defined over $\mathcal{P}$ . Note that $\gamma$ is feasible for the LP (T ${}_{\widehat{p},x^{\prime}}$ ) and $g(x^{\prime},A^{\prime}\setminus S_{\bar{x}})\leq g(x^{\prime},A^{\prime})$ for every scenario $A^{\prime}\in\mathcal{A}$ , which implies $h({\widehat{p}}\,;{x^{\prime}})\geq f(x^{\prime})$ . By mimicking the proof of Lemma 3.8, we have that $\widetilde{d}$ is a subgradient of $f$ at $\bar{x}$ . So

[TABLE]

Now, note that for every scenario $A^{\prime}\in\mathcal{A}$ , we have $g(\bar{x},A^{\prime}\setminus S_{\bar{x}})\geq\frac{1}{2}g(0,A^{\prime}\setminus S_{\bar{x}})$ . This is because if $z$ is a feasible second-stage solution to scenario $A^{\prime}\setminus S_{\bar{x}}$ given $\bar{x}$ as the first-stage actions, then it covers elements of $A^{\prime}\setminus S_{\bar{x}}$ to an extent of at least $\frac{1}{2}$ , and so $(\min\{2z_{S},1\})_{S\in\mathcal{S}}$ is a feasible second-stage solution for $A^{\prime}\setminus S_{\bar{x}}$ given $0$ as the first-stage actions, at no more than twice the cost of $z$ . So we obtain

[TABLE]

where the last inequality follows from Lemma 3.18 (ii). ∎

We now exploit Lemmas 3.18 and 3.19 to obtain Theorem 3.17. We do so by mimicking the proof of Theorem 3.7, and pointing out the changes to Algorithm $\mathsf{PolyAlg}$ and its analysis. Let $\mathsf{Alg}$ be a $(\beta_{1},\beta_{2})$ -approximation algorithm for $g(0,y,A)$ for all $(y,A)\in\mathbb{R}_{+}\times\mathcal{A}$ . As before, we start by using Lemma 3.11, either certifying that $x=0$ is an optimal solution to (Q ${}^{\mathrm{fr}}_{\widehat{p}}$ ) (in which case we return $x=0$ , and an estimate of $h({\widehat{p}}\,;{0})$ computed via Lemma 3.9), or that $\min_{x\in\mathcal{P}}h({\widehat{p}}\,;{x})\geq\mathsf{LB}$ , where $\mathsf{LB}=\frac{r}{\beta_{1}\ell_{\max}}$ . Suppose we are in the latter case. We run Algorithm $\mathsf{PolyAlg}$ with parameter $\eta=\varepsilon\mathsf{LB}$ , but modify step A2.b) as follows.

•

If $\bar{x}_{i}\in\mathcal{P}_{k}$ , let $\widetilde{x}_{k}:=(\min\{2\bar{x}_{i,S},1\})_{S\in\mathcal{S}}$ . Use Lemma 3.18 and $\mathsf{Alg}$ to obtain a $\beta_{1}\beta_{2}$ -approximate solution $\gamma$ to (W ${}_{\bar{x}_{i}}$ ) (which has polynomial-size support). Define $\widetilde{d}_{k}:=c+\sum_{(A,A^{\prime})\in\mathcal{A}^{\mathrm{sup}}\times\mathcal{A}}\gamma_{A,A^{\prime}}d^{\bar{x}_{i},A^{\prime}\setminus S_{\bar{x}_{i}}}$ , and $\widetilde{f}_{k}:=2c^{\intercal}\bar{x}_{i}+\sum_{(A,A^{\prime})\in\mathcal{A}^{\mathrm{sup}}\times\mathcal{A}}\gamma_{A,A^{\prime}}g(0,A^{\prime}\setminus S_{\bar{x}_{i}})$ . If $\widetilde{d}_{k}=0$ , then return $\widetilde{x}_{k}$ and $\widetilde{f}_{k}$ . Otherwise, let $H$ denote the halfspace $\{x\in\mathbb{R}^{m}:\widetilde{d}_{k}^{\intercal}(x-\bar{x}_{i})\leq 0\}$ . Set $\mathcal{P}_{k+1}\leftarrow\mathcal{P}_{k}\cap H$ , and $k\leftarrow k+1$ .

By Lemma 3.18 (ii), we immediately obtain that $h({\widehat{p}}\,;{\widetilde{x}_{l}})\leq\beta_{1}\beta_{2}\cdot\widetilde{f}_{l}$ for all $l=1,\ldots,k$ . Let $x^{*}\in\mathcal{P}$ be an optimal solution to $\min_{x\in\mathcal{P}}h({\widehat{p}}\,;{x})$ . We show that there exists an index $l$ such that $\widetilde{f}_{l}\leq 2(1+\varepsilon)\cdot h({\widehat{p}}\,;{x^{*}})$ . We have two cases to consider.

•

Case 1: we have $\widetilde{d}_{l}\cdot(x^{*}-\bar{x}_{l})\geq 0$ for some $l$ (this includes the case where $\widetilde{d}_{l}=0$ ). Then Lemma 3.19 shows that $\widetilde{f}_{l}\leq 2\cdot h({\widehat{p}}\,;{x^{*}})$ .

•

Case 2: we have $\widetilde{d}_{l}\cdot(x^{*}-\bar{x}_{l})<0$ for all $l$ . In this case, as argued in the proof of Theorem 3.7, we can show that there must be a point $x^{\prime}\in\mathcal{P}$ such that $h({\widehat{p}}\,;{x^{\prime}})\leq h({\widehat{p}}\,;{x^{*}})+\eta$ and $\widetilde{d}_{l}\cdot(x^{\prime}-\bar{x}_{l})=0$ for some $l$ . Using Lemma 3.19 again, we obtain $\widetilde{f}_{l}\leq 2\cdot h({\widehat{p}}\,;{x^{\prime}})\leq 2\bigl{(}h({\widehat{p}}\,;{x^{*}})+\eta\bigr{)}=2\cdot h({\widehat{p}}\,;{x^{*}})+2\varepsilon\cdot\mathsf{LB}\leq 2(1+\varepsilon)h({\widehat{p}}\,;{x^{*}})$ , where the last inequality uses $\mathsf{LB}\leq h({\widehat{p}}\,;{x^{*}})$ . ∎

3.3.2 Vertex cover

This is the special case of set cover where we want to cover edges of a graph by vertices, and we again consider the $\frac{1}{2}L_{1}$ -metric. We have $\alpha=2$ , $\rho=2\alpha$ , so we obtain approximation factors of $\bigl{(}4\rho+O(\varepsilon)\bigr{)}=\bigl{(}16+O(\varepsilon)\bigr{)}$ in the unrestricted setting (using Theorems 3.13 (a) and 3.17), and $\bigl{(}4\rho\alpha\cdot\frac{2e}{e-1}+O(\varepsilon)\bigr{)}=\bigl{(}101.25+O(\varepsilon)\bigr{)}$ in the $k$ -bounded setting (via Theorems 3.14 (b), 3.7, and 3.5).

3.3.3 Edge cover

This is the special case of set cover where we want to cover vertices of a graph by edges, and we again consider the $\frac{1}{2}L_{1}$ -metric. We have $\alpha=\frac{3}{2}$ , $\rho=2\alpha$ , so we obtain approximation factors of $\bigl{(}12+O(\varepsilon)\bigr{)}$ in the unrestricted setting (via Theorems 3.13 (a) and 3.17), and $\bigl{(}36+O(\varepsilon)\bigr{)}$ in the $k$ -bounded setting (via Theorems 3.14 (c), 3.7, and 3.5).

3.3.4 Facility location

The DR version ( $\mathsf{DRSFL}$ ) was defined in Section 2. Recall that an instance is given by the tuple $\bigl{(}\mathcal{F},\mathcal{C},\{w_{ij}\}_{i,j\in\mathcal{F}\cup\mathcal{C}},\{f_{i},f^{\mathrm{II}}_{i}\}_{i\in\mathcal{F}}\bigr{)}$ , where $\mathcal{F}$ , $\mathcal{C}$ are the facility and client-sets respectively, $w$ is the underlying metric, and $f,f^{\mathrm{II}}$ are the first- and second-stage facility-opening costs. We have $\alpha=1.488$ [25]. Shmoys and Swamy [35] showed that an LP-relative $\varrho$ -approximation for deterministic FL having a certain “demand-obliviousness” property can be turned into a $(\varrho+\alpha)$ -approximation algorithm for 2-stage FL. If the $\varrho$ -approximation algorithm has the property that it returns a solution where every cost component of the rounded solution—i.e., the facility cost, and each client’s assignment cost—is at most $\varrho$ times the corresponding cost component of the fractional solution, then the resulting algorithm is a local approximation algorithm. Using the deterministic $4$ -approximation algorithm of [36] gives a local $\rho$ -approximation with $\rho=5.488$ .

As noted in Section 2, besides the discrete scenario metric, we could define various other natural scenario metrics here in terms of the metric $w$ and obtain a rich class of DR models under the Wasserstein metric. We consider one such setting: the asymmetric metric given by $\ell^{\mathsf{asym}}_{\infty}(A,A^{\prime}):=\max_{j^{\prime}\in A^{\prime}}w(j^{\prime},A)$ .
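To make the asymmetry of $\ell^{\mathsf{asym}}_{\infty}$ concrete, here is a minimal Python sketch. It assumes that $w(j^{\prime},A)$ denotes the distance from $j^{\prime}$ to its nearest point in $A$; the function names and the conventions for empty sets are ours, not the paper's.

```python
import math

def dist_to_set(w, j, A):
    """w(j, A): distance from client j to the nearest client in A.
    Assumed convention: the distance to an empty set is infinite."""
    return min((w[(j, a)] for a in A), default=math.inf)

def ell_asym_inf(w, A, A_prime):
    """Asymmetric scenario metric: max over j' in A' of w(j', A).
    Assumed convention: an empty A' is at distance 0 from every A."""
    return max((dist_to_set(w, j, A) for j in A_prime), default=0.0)

# Toy metric (hypothetical): three clients on a line.
pos = {"a": 0.0, "b": 1.0, "c": 5.0}
w = {(u, v): abs(pos[u] - pos[v]) for u in pos for v in pos}
```

On this toy instance, moving from $\{a\}$ to $\{a,b\}$ costs 1, while moving from $\{a,b\}$ to $\{a\}$ costs 0, so shrinking a scenario is free under this metric.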

Theorem 3.20.

For $\mathsf{DRSFL}$ with $\ell$ being either the discrete metric $\ell^{\mathsf{dis}}$ or the asymmetric metric $\ell^{\mathsf{asym}}_{\infty}$ , there is a $(6,1)$ -approximation for computing $g(x,y,A)$ in the $k$ -bounded setting, for any $(x,y,A)\in X\times\mathbb{R}_{+}\times\mathcal{A}$ .

For the Wasserstein metric with respect to both the discrete metric and $\ell^{\mathsf{asym}}_{\infty}$ , we can take $\tau=\bigl{(}\sum_{i\in\mathcal{F}}f^{\mathrm{II}}_{i}+\sum_{i\in\mathcal{F},j\in\mathcal{C}}w_{ij}\bigr{)}/(\min_{i,j:w_{ij}>0}w_{ij})$ . We obtain the following approximation guarantees for $\mathsf{DRSFL}$ with the Wasserstein metric corresponding to the above scenario metrics: (i) $\bigl{(}4\rho+O(\varepsilon)\bigr{)}=\bigl{(}21.96+O(\varepsilon)\bigr{)}$ in the unrestricted setting (using Theorems 3.13 (a) and 3.17); and (ii) $\bigl{(}24\rho\alpha+O(\varepsilon)\bigr{)}=\bigl{(}196+O(\varepsilon)\bigr{)}$ in the $k$ -bounded setting (using Theorems 3.20, 3.7, and 3.5).
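As a quick arithmetic check of the constants quoted for vertex cover, edge cover, and facility location, the following Python sketch recomputes them. The helper names are ours, the $O(\varepsilon)$ terms are dropped, and the general pattern ($4\rho$ in the unrestricted setting, $4\rho\alpha\beta$ in the $k$-bounded setting, with $\beta$ the $k$-max-min factor) is read off the stated instances rather than quoted from a theorem.

```python
import math

def unrestricted_factor(rho):
    """Leading constant 4*rho in the unrestricted setting."""
    return 4.0 * rho

def k_bounded_factor(rho, alpha, beta):
    """Leading constant 4*rho*alpha*beta in the k-bounded setting."""
    return 4.0 * rho * alpha * beta

# (unrestricted, k-bounded) constants, using the values of alpha, rho, beta from the text.
FACTORS = {
    "vertex cover": (unrestricted_factor(4.0),
                     k_bounded_factor(4.0, 2.0, 2 * math.e / (math.e - 1))),
    "edge cover": (unrestricted_factor(3.0),
                   k_bounded_factor(3.0, 1.5, 2.0)),
    "facility location": (unrestricted_factor(5.488),
                          k_bounded_factor(5.488, 1.488, 6.0)),
}
```

The facility-location values come out slightly below the quoted $21.96$ and $196$, which suggests that those figures are rounded up.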

Proof of Theorem 3.20

Fix $(x,y,A)\in X\times\mathbb{R}_{+}\times\mathcal{A}$ , where $\mathcal{A}=\mathcal{A}_{\leq k}:=\{A\subseteq\mathcal{C}:|A|\leq k\}$ . Fix $\ell$ to be either the discrete scenario metric $\ell^{\mathsf{dis}}$ or the asymmetric metric $\ell^{\mathsf{asym}}_{\infty}$ . Since $\ell(A,A^{\prime})$ takes polynomially-many values, by Lemma 3.25 (i), it suffices to give a $6$ -approximation for the constrained problem (3.25): $\max_{A^{\prime}\in\mathcal{A}:\ell(A,A^{\prime})\leq\mu}g(x,A^{\prime})$ .

With both scenario metrics, this amounts to approximating the $k$ - $\max$ - $\min$ fractional facility location problem for an underlying facility-location instance $\bigl{(}\mathcal{F},\mathcal{C}^{\prime},\{w_{ij}\}_{i,j\in\mathcal{F}\cup\mathcal{C}^{\prime}},\{\widetilde{f}_{i}\}_{i\in\mathcal{F}}\bigr{)}$ , where $\widetilde{f}_{i}=0$ if $x_{i}=1$ , and is $f^{\mathrm{II}}_{i}$ otherwise. If $\ell=\ell^{\mathsf{dis}}$ and $\mu>0$ , then $\mathcal{C}^{\prime}=\mathcal{C}$ (if $\mu=0$ , the optimum of the constrained problem is $g(x,A)$ ); if $\ell=\ell^{\mathsf{asym}}_{\infty}$ , then $\mathcal{C}^{\prime}:=\{j\in\mathcal{C}:w(j,A)\leq\mu\}$ .
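The reduction above can be sketched in Python as follows. This is only an illustration under stated assumptions: the dictionary-based encoding, the function names, and the reading of $w(j,A)$ as the distance to the nearest point of $A$ are ours.

```python
import math

def residual_facility_costs(x, f2):
    """f~_i = 0 if facility i was opened in stage I (x_i = 1), else f^II_i."""
    return {i: 0.0 if x[i] == 1 else f2[i] for i in f2}

def eligible_clients(clients, w, A, mu, metric):
    """Client set C' of the k-max-min instance for the constraint ell(A, A') <= mu.

    metric == "dis":  C' = C, valid only for mu > 0 (for mu = 0 only A' = A is feasible).
    metric == "asym": C' = {j : w(j, A) <= mu}, with w(j, A) taken as the
                      distance to the nearest point of A (assumed convention).
    """
    if metric == "dis":
        if mu <= 0:
            raise ValueError("for mu = 0 the constrained optimum is simply g(x, A)")
        return set(clients)
    if metric == "asym":
        return {j for j in clients
                if min((w[(j, a)] for a in A), default=math.inf) <= mu}
    raise ValueError(f"unknown metric: {metric}")

# Toy data (hypothetical): three clients on a line, two facilities.
pos = {"a": 0.0, "b": 1.0, "c": 5.0}
w = {(u, v): abs(pos[u] - pos[v]) for u in pos for v in pos}
```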

A $6$ -approximation algorithm for $k$ - $\max$ - $\min$ facility location.

We now devise an algorithm for the $k$ - $\max$ - $\min$ fractional facility-location problem corresponding to a facility-location instance (such as the one obtained above) $\bigl{(}\mathcal{F},\mathcal{C}^{\prime},\{w_{ij}\}_{i,j\in\mathcal{F}\cup\mathcal{C}^{\prime}},\{\widetilde{f}_{i}\}_{i\in\mathcal{F}}\bigr{)}$ .

Khandekar et al. [23] give a $10$ -approximation for the version of $k$ - $\max$ - $\min$ integral FL, where a scenario may place an arbitrary number of co-located clients at a location in $\mathcal{C}^{\prime}$ (and the total number of clients must be at most $k$ ). (Footnote 6: Since the gap between the integral and fractional optimal values for FL is at most $\alpha=1.488$ [25], a $\beta$ -approximation for the integral (resp. fractional) version implies an $\alpha\beta$ -approximation for $k$ - $\max$ - $\min$ fractional (resp. integral) facility location.) However, in our setting, we may place at most one client at any location in $\mathcal{C}^{\prime}$ , so the algorithm in [23] does not work for our purposes. (Clearly, our setting is more general, since we can encode the scenario-setting of [23] by creating $k$ co-located copies at every $j\in\mathcal{C}^{\prime}$ .) As noted earlier, we can model more-general settings, where clients have (integer) demands, by creating a fixed number of co-located clients at locations in $\mathcal{C}^{\prime}$ ; but, here again, we have a constraint that limits the number of co-located clients at any $j\in\mathcal{C}^{\prime}$ .

We therefore need to develop new techniques to devise an approximation algorithm for $k$ - $\max$ - $\min$ fractional FL. The key tool that we exploit here is that of cost-sharing schemes. We uncover a novel connection between cost-sharing schemes and $k$ - $\max$ - $\min$ problems by demonstrating that one can exploit a cost-sharing scheme for FL having certain properties to obtain an approximation algorithm for $k$ - $\max$ - $\min$ {integral, fractional} FL. Our result also improves the approximation factor for $k$ - $\max$ - $\min$ integral FL from $10$ to $6$ .

A cost-sharing method is a function $\xi:2^{\mathcal{C}^{\prime}}\times\mathcal{C}^{\prime}\rightarrow\mathbb{R}_{+}$ , where $\xi(S,j)$ for $j\in S$ , intuitively gives the contribution of $j$ towards the cost incurred in satisfying the client-set $S$ (i.e., the cost of opening facilities and assigning clients in $S$ to these open facilities). Pál and Tardos [28] devised a cost-sharing method $\xi$ satisfying the following properties. For sets $S,T\subseteq\mathcal{C}^{\prime}$ , define $\xi(S,T):=\sum_{j\in T}\xi(S,j)$ .

•

$\xi(S,j)=0$ if $j\notin S$ .

•

(Competitiveness) For every $S\subseteq\mathcal{C}^{\prime}$ , we have $\xi(S,S)\leq g(x,S)$ .

•

(Cost-recovery) For every $S\subseteq\mathcal{C}^{\prime}$ , we have $\xi(S,S)\geq g(x,S)/3$ .

•

(Cross-monotonicity) For all $S_{1}\subseteq S_{2}\subseteq\mathcal{C}^{\prime}$ and every client $j\in\mathcal{C}^{\prime}$ , we have $\xi(S_{2},j)\leq\xi(S_{1},j)$ .

We will prove an additional useful property about $\xi$ , for which we very briefly describe how $\xi$ is computed. For every $S\subseteq\mathcal{C}^{\prime}$ and $i\in\mathcal{F}$ , we compute a certain time $t(S,i)\geq 0$ . The cost-share of a client $j\in S$ is then defined as $\xi(S,j):=\min_{i\in\mathcal{F}}\max\{t(S,i),w_{ij}\}$ . The function $t(\cdot,\cdot)$ satisfies the following property: for every set $S\subseteq\mathcal{C}^{\prime}$ , every client $j\not\in S$ , and every facility $i\in\mathcal{F}$ , we have $t(S+j,i)\leq t(S,i)$ . Further, if this inequality is strict, then $t(S+j,i)\geq w_{ij}$ .
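The cost-share formula just described is easy to evaluate once the times $t(S,i)$ are known. The sketch below assumes they are supplied as a dictionary; it does not reproduce how [28] computes them, and the function name and data layout are ours.

```python
def cost_share(S, t_S, w, j):
    """xi(S, j) = min over facilities i of max{t(S, i), w_ij}, and 0 if j is not in S.

    S   : set of clients
    t_S : dict mapping each facility i to the time t(S, i) (assumed given)
    w   : dict mapping (facility, client) pairs to distances
    """
    if j not in S:
        return 0.0
    return min(max(t_S[i], w[(i, j)]) for i in t_S)

# Toy numbers (hypothetical): two facilities, one client.
t_S = {"f1": 2.0, "f2": 0.5}
w = {("f1", "j"): 1.0, ("f2", "j"): 3.0}
```

Here facility `f1` gives $\max\{2,1\}=2$ and `f2` gives $\max\{0.5,3\}=3$, so the share of `j` is $2$.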

Lemma 3.21.

Consider $S\subseteq\mathcal{C}^{\prime}$ and two clients $j_{1}\in S$ and $j_{2}\not\in S$ . Then $\xi(S+j_{2},j_{1})\geq\min\bigl{\{}\xi(S,j_{1}),\xi(S+j_{2},j_{2})\bigr{\}}$ .

Proof.

By cross-monotonicity, we have $\xi(S+j_{2},j_{1})\leq\xi(S,j_{1})$ . If this holds at equality, then the result follows immediately. So assume otherwise. By the way in which the cost-shares are defined, $\xi(S+j_{2},j_{1})<\xi(S,j_{1})$ implies that $\xi(S+j_{2},j_{1})=t(S+j_{2},i)$ for some facility $i$ and $t(S+j_{2},i)<t(S,i)$ . This implies that $t(S+j_{2},i)\geq w_{ij_{2}}$ , and it follows that $\xi(S+j_{2},j_{2})\leq\max\{t(S+j_{2},i),w_{ij_{2}}\}=t(S+j_{2},i)=\xi(S+j_{2},j_{1})$ . ∎

We may assume that $k\leq|\mathcal{C}^{\prime}|$ (otherwise, we simply set $k=|\mathcal{C}^{\prime}|$ ). Consider the following simple greedy algorithm. Initialize $t\leftarrow 0$ , $S_{0}\leftarrow\emptyset$ . For $t=1,\ldots,k$ , we find $\overline{j}\leftarrow\operatorname{argmax}_{j\in\mathcal{C}^{\prime}\setminus S_{t-1}}\xi(S_{t-1}+j,j)$ , and set $S_{t}\leftarrow S_{t-1}\cup\{\overline{j}\}$ .
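The greedy rule above can be sketched in Python with the cost-sharing method passed in as a black-box callable. The toy cost share used for illustration (a fixed weight divided by $|S|$) is hypothetical: it is cross-monotone but is not the Pál-Tardos scheme, and ties are broken by input order, a detail the text leaves open.

```python
def greedy_k_max_min(clients, k, xi):
    """Build S_1, ..., S_k greedily: at each step add the client j outside the
    current set that maximizes its own cost share xi(S + j, j).

    clients : iterable of client identifiers
    xi      : callable xi(S, j) -> float, with S a frozenset (assumed interface)
    Returns the chosen clients in the order they were added.
    """
    remaining = list(clients)
    k = min(k, len(remaining))          # we may assume k <= |C'|
    chosen = []
    for _ in range(k):
        current = frozenset(chosen)
        j_bar = max(remaining, key=lambda j: xi(current | {j}, j))
        chosen.append(j_bar)
        remaining.remove(j_bar)
    return chosen

# Toy cross-monotone cost share (hypothetical): a base weight split evenly over S.
BASE = {"a": 3.0, "b": 5.0, "c": 1.0}

def xi_toy(S, j):
    return BASE[j] / len(S) if j in S else 0.0
```

With these weights the greedy picks `b` first and then `a`.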

Let $O^{*}\in\mathcal{A}$ be such that $g(x,O^{*})=\max_{A\in\mathcal{A}}g(x,A)$ . We claim that $\xi(S_{k},S_{k})\geq\xi(S_{k}\cup O^{*},S_{k}\cup O^{*})/2$ . This will complete the proof since this implies that

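$$g(x,S_{k})\geq\xi(S_{k},S_{k})\geq\tfrac{1}{2}\,\xi(S_{k}\cup O^{*},S_{k}\cup O^{*})\geq\tfrac{1}{6}\,g(x,S_{k}\cup O^{*})\geq\tfrac{1}{6}\,g(x,O^{*}),$$

where the first inequality is competitiveness, the second is the claim, the third is cost-recovery, and the last holds because $g(x,\cdot)$ is monotone in the scenario.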

In fact [28] show a stronger form of cost-recovery, namely, that there is an integer solution $z^{S}$ feasible for scenario $S$ given first-stage decisions $x$ such that $\xi(S,S)\geq\bigl{(}\text{cost of }z^{S}\bigr{)}/3$ for every $S\subseteq\mathcal{C}^{\prime}$ , and using this in the above chain of inequalities shows that $S_{k}$ yields a $6$ -approximation also for $k$ - $\max$ - $\min$ integral facility location.

We now prove the above claim. For any $t=1,\ldots,k$ , we show that $\xi(S_{t},j)\geq\psi_{t}$ for all $j\in S_{t}$ , where $\psi_{t}:=\max_{j^{\prime}\in\mathcal{C}^{\prime}\setminus S_{t-1}}\xi(S_{t-1}+j^{\prime},j^{\prime})$ . We prove this by induction on $t$ . Note that $\psi_{t}\geq\psi_{t+1}$ due to cross-monotonicity, and since $\mathcal{C}^{\prime}\setminus S_{t-1}\supseteq\mathcal{C}^{\prime}\setminus S_{t}$ . The statement is clearly true for $t=1$ , since $S_{1}$ consists of the single client attaining $\psi_{1}$ . Suppose this is true for index $t$ , and consider index $t+1$ . Consider any $j\in S_{t+1}$ . Let $\overline{j}$ be the element added to $S_{t}$ in iteration $t+1$ . By definition, $\xi(S_{t+1},\overline{j})=\psi_{t+1}$ . If $j\in S_{t}$ , then $\xi(S_{t+1},j)\geq\min\{\xi(S_{t},j),\xi(S_{t+1},\overline{j})\}\geq\min\{\psi_{t},\psi_{t+1}\}=\psi_{t+1}$ , where the first inequality follows from Lemma 3.21 and the second from the induction hypothesis. Thus, for every $j\in S_{t+1}$ , we have $\xi(S_{t+1},j)\geq\psi_{t+1}$ . This completes the induction step.

Therefore, by repeatedly using cross-monotonicity, we have

[TABLE]

The first inequality follows from the statement proved in the previous paragraph; the second is simply because we restricted $\mathcal{C}^{\prime}\setminus S_{k-1}$ to $O^{*}\setminus S_{k}$ ; the third follows from cross-monotonicity; the fourth is because we replaced $\max$ by an average and all cost shares are nonnegative; the fifth is because $|O^{*}|\leq k$ ; and the last inequality is again due to cross-monotonicity. ∎

3.3.5 Steiner tree

The DR version ( $\mathsf{DRSST}$ ) was defined in Section 2. Recall that an instance is given by $\bigl{(}G=(V,E),c,s,\lambda\bigr{)}$ , where $(G,c)$ is a metric, $s$ is the root, and $c_{e},c^{\mathrm{II}}_{e}=\lambda c_{e}$ are the costs of buying edge $e$ in stages I and II respectively.

We do not have a local approximation algorithm for $\mathsf{DRSST}$ , but there is a restricted local $O(1)$ -approximation algorithm for a monotone version of $\mathsf{DRSST}$ , wherein we require that in every scenario $A$ , the path from each node $v\in A$ to the root $s$ consists of a segment starting at $v$ comprising edges bought in scenario $A$ , followed by a segment ending at $s$ comprising first-stage edges. (Thus, in effect, the first-stage edges $F$ should form a tree containing $s$ .) This monotonicity property was stipulated by [16, 6] in the context of 2-stage {stochastic, robust} Steiner tree respectively, where they show that imposing this condition only incurs a factor- $2$ loss. We argue that the same holds in the DR setting. Thus, by utilizing the restricted local $10$ -approximation algorithm devised by [19] for this monotone 2-stage Steiner tree problem in Theorem 3.13, and the well-known LP-relative $2$ -approximation for Steiner tree, we obtain the following results for the unrestricted setting.

Theorem 3.22.

$\mathsf{DRSST}$ admits a $(160+O(\varepsilon))$ -approximation algorithm in the unrestricted setting with the scenario metrics $\ell^{\mathsf{dis}}$ and $\ell^{\mathsf{asym}}_{\infty}$ (defined with respect to the metric $c$ on $V$ ).

Proof of Theorem 3.22

For $\mathsf{DRSST}$ , the discrete first-stage action set is $X=\{0,1\}^{E}$ . We first show that imposing the monotonicity condition incurs a factor- $2$ loss for the DR problem. Recall that the monotonicity condition states that in every scenario $A$ , the path from a node $v\in A$ to the root $s$ consists of a segment of second-stage edges starting at $v$ followed by a segment of first-stage edges ending at $s$ ; we call such a path a monotone path. For $x=\chi^{F}\in X$ , we say that $x+\chi^{F^{A}}$ contains a $v$ - $s$ path (respectively, a monotone $v$ - $s$ path) if $F\cup F^{A}$ contains a $v$ - $s$ path (respectively, a monotone $v$ - $s$ path). We want to compare the following two DR 2-stage Steiner tree problems.

[TABLE]

Lemma 3.23 ([6]).

For every first-stage decision $\bar{x}\in X$ , there exists $\widetilde{x}\in X$ such that $c^{\intercal}\widetilde{x}\leq 2c^{\intercal}\bar{x}$ and $g^{\mathrm{int,mon}}(\widetilde{x},A)\leq 2g^{\mathrm{int}}(\bar{x},A)$ for every set $A\subseteq V$ .

Corollary 3.24.

Consider the DR problems ( $\mathsf{DRSST}$ ) and ( $\mathsf{M}\mathsf{DRSST}$ ) for an arbitrary scenario collection $\mathcal{A}$ . If $\widetilde{x}$ is an $\alpha$ -approximate solution to ( $\mathsf{M}\mathsf{DRSST}$ ), then it is a $(2\alpha)$ -approximate solution to ( $\mathsf{DRSST}$ ).

Proof.

By applying Lemma 3.23 to an optimal solution to ( $\mathsf{DRSST}$ ), we infer that $\mathit{OPT}_{\mathsf{M}\mathsf{DRSST}}\leq 2\mathit{OPT}_{\mathsf{DRSST}}$ . Note that for every scenario $A\in\mathcal{A}$ , we have $g^{\mathrm{int}}(\widetilde{x},A)\leq g^{\mathrm{int,mon}}(\widetilde{x},A)$ by definition. It follows that the objective value of $\widetilde{x}$ in ( $\mathsf{DRSST}$ ) is no larger than its objective value in ( $\mathsf{M}\mathsf{DRSST}$ ), which by assumption is at most $\alpha\cdot\mathit{OPT}_{\mathsf{M}\mathsf{DRSST}}\leq 2\alpha\cdot\mathit{OPT}_{\mathsf{DRSST}}$ . ∎

Gupta et al. [16] consider the following integer program (IP) for $g^{\mathrm{int,mon}}(x,A)$ . For notational simplicity, we assume that $s\notin A$ ; clearly, this can always be ensured without changing the problem. We have variables $\{z^{A}_{e}\}_{e\in E}$ to indicate the edges bought in stage II. To encode the requirement that there is a monotone $v$ - $s$ path for every $v\in A$ , we bidirect the edges to obtain the set of arcs $\overleftrightarrow{E}$ , and use flow variables $\{f^{\mathrm{I},A,v}_{e}\}_{e\in\overleftrightarrow{E}}$ and $\{f^{\mathrm{II},A,v}_{e}\}_{e\in\overleftrightarrow{E}}$ to specify the segments of $v$ ’s path comprising first-stage and second-stage edges. For a vertex $v\in V$ , let $\delta^{\text{in}}(v)$ (respectively $\delta^{\text{out}}(v)$ ) denote the arcs of $\overleftrightarrow{E}$ entering (respectively leaving) $v$ . For an arc $e\in\overleftrightarrow{E}$ , we abuse notation and use $x_{e}$ to denote the component of $x$ corresponding to the undirected version of $e$ .

[TABLE]

Constraints (8) and (9) enforce that $f^{\mathrm{I},A,v}+f^{\mathrm{II},A,v}$ sends one unit of flow from $v$ to $s$ for every terminal $v\in A$ (so it dominates a directed $v\leadsto s$ path), and (10) enforces that this flow is supported on edges bought in stages I and II. Constraints (11) encode the monotonicity requirement on the $v$ - $s$ path.

Letting $g(x,A)$ denote the optimal value of the LP-relaxation obtained by relaxing the integrality constraints (12), (13) to nonnegativity constraints, the DR 2-stage Steiner problem (with fractional second-stage decisions) we consider is: $\min\ \bigl{(}h({\mathring{p}}\,;{x}):=c^{\intercal}x+\max_{q:L_{\mathrm{W}}(\mathring{p},q)\leq r}{\textstyle\operatorname*{E}_{A\sim q}}\bigl{[}g(x,A)\bigr{]}\bigr{)}$ ; we call this monotone $\mathsf{DRSST}$ . By the discussion in the beginning of Section 3.3, properties (P1)–(P6) hold for monotone $\mathsf{DRSST}$ , setting $\lambda=\max_{e\in E}c^{\mathrm{II}}_{e}/c_{e}$ and $\tau=\sum_{e\in E}c^{\mathrm{II}}_{e}/\min_{e\in E:c_{e}>0}c_{e}$ .

Recall that we are in the unrestricted setting (so $\mathcal{A}=2^{V}$ ), and $L_{\mathrm{W}}$ is the Wasserstein metric with respect to the discrete scenario metric $\ell^{\mathsf{dis}}$ or the asymmetric metric $\ell^{\mathsf{asym}}_{\infty}$ . The set of scenarios is collapsible under both these scenario metrics by Lemma 3.15. Gupta et al. [16] presented a restricted local $20$ -approximation algorithm for monotone $\mathsf{DRSST}$ , and the approximation factor was improved to $10$ by [19]. Therefore, utilizing Theorems 3.5 and 3.13, taking $\rho=10$ and $\alpha=2$ (and $\beta=1$ in Theorem 3.5), we obtain an $\bigl{(}80+O(\varepsilon)\bigr{)}$ -approximation for ( $\mathsf{M}\mathsf{DRSST}$ ). This yields a $\bigl{(}160+O(\varepsilon)\bigr{)}$ -approximation for $\mathsf{DRSST}$ (using Corollary 3.24). ∎

3.3.6 Proof of Theorem 3.14

We first give a reduction, showing that one can approximate $g(x,y,A)$ under very general settings provided that we have a (standard) approximation algorithm for a certain constrained problem.

Lemma 3.25.

Let $\mathcal{A}$ be any scenario set, and $\ell:\mathcal{A}\times\mathcal{A}\rightarrow\mathbb{R}_{+}$ be any function satisfying $\ell(A,A)=0$ for all $A\in\mathcal{A}$ . Fix $x\in X$ , and scenario $A\in\mathcal{A}$ . Consider the constrained problem:

$$\max_{A^{\prime}\in\mathcal{A}:\,\ell(A,A^{\prime})\leq\mu}\ g(x,A^{\prime}).\tag{$\Phi(x,\mu,A)$}$$

Suppose that we have a $\beta$ -approximation algorithm $\mathsf{Alg}$ for (3.25). Let $\mathcal{L}:=\{\ell(A,A^{\prime}):A,A^{\prime}\in\mathcal{A}\}$ .

(i) We can compute a $(\beta,1)$ -approximation to $g(x,y,A)$ using $|\mathcal{L}|$ calls to $\mathsf{Alg}$ .

(ii) For any $\varepsilon>0$ , we can compute a $(\beta,1+\varepsilon)$ -approximation to $g(x,y,A)$ using $O\bigl{(}\log_{1+\varepsilon}(\frac{\ell_{\max}}{\ell_{\min}})\bigr{)}$ calls to $\mathsf{Alg}$ , where $\ell_{\max}:=\max_{A,A^{\prime}}\ell(A,A^{\prime})$ and $\ell_{\min}:=\min_{A,A^{\prime}:\ell(A,A^{\prime})>0}\ell(A,A^{\prime})$ .

Proof.

The proof is based on a standard idea of enumerating over all $\ell(A,A^{\prime})$ values. For $\mu\in\mathcal{L}$ , let $A_{\mu}\in\mathcal{A}$ denote the scenario output by $\mathsf{Alg}$ for (3.25).

For part (i), we do the following. We compute $A_{\mu}$ for all $\mu\in\mathcal{L}$ . Let $\mu^{*}:=\operatorname{argmax}_{\mu\in\mathcal{L}}\bigl{(}g(x,A_{\mu})-y\cdot\ell(A,A_{\mu})\bigr{)}$ . We return $A_{\mu^{*}}$ . To show that this yields a $(\beta,1)$ -approximation for computing $g(x,y,A)$ , consider any $A^{\prime}\in\mathcal{A}$ , and let $\mu^{\prime}=\ell(A,A^{\prime})$ . We have

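$$g(x,A_{\mu^{*}})-y\cdot\ell(A,A_{\mu^{*}})\geq g(x,A_{\mu^{\prime}})-y\cdot\ell(A,A_{\mu^{\prime}})\geq\frac{g(x,A^{\prime})}{\beta}-y\cdot\mu^{\prime}=\frac{g(x,A^{\prime})}{\beta}-y\cdot\ell(A,A^{\prime}).$$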

The first inequality follows from the definition of $\mu^{*}$ , and the second follows since $A_{\mu^{\prime}}$ is a $\beta$ -approximate solution for ( $\Phi(x,{\mu^{\prime}},A)$ ).
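The enumeration in part (i) can be sketched in Python as follows. The oracle interface `alg(x, mu, A)`, the callables `g` and `ell`, and the toy instance are all hypothetical; the toy oracle is an exact brute-force solver (so $\beta=1$) over a two-element ground set with the discrete metric.

```python
from itertools import combinations

def best_response_scenario(x, y, A, mu_values, alg, g, ell):
    """Part (i): call `alg` once per distance value mu, then return the candidate
    scenario A_mu that maximizes g(x, A_mu) - y * ell(A, A_mu)."""
    candidates = [alg(x, mu, A) for mu in mu_values]
    return max(candidates, key=lambda Ap: g(x, Ap) - y * ell(A, Ap))

# --- Toy instance (hypothetical) ---
U = (1, 2)
SCENARIOS = [frozenset(c) for n in range(len(U) + 1) for c in combinations(U, n)]

def g_toy(x, Ap):
    """Toy second-stage cost: one unit per element of the scenario (x is ignored)."""
    return float(len(Ap))

def ell_dis(A, Ap):
    """Discrete scenario metric."""
    return 0.0 if A == Ap else 1.0

def alg_exact(x, mu, A):
    """Exact solver for the constrained problem max {g(x, A') : ell(A, A') <= mu}."""
    feasible = [Ap for Ap in SCENARIOS if ell_dis(A, Ap) <= mu]
    return max(feasible, key=lambda Ap: g_toy(x, Ap))
```

With a small penalty $y$ the adversary moves to the full scenario; with a large penalty it stays at $A$.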

For part (ii), we enumerate values in $[\ell_{\min},\ell_{\max}]$ in powers of $(1+\varepsilon)$ . More precisely, define $\overline{\mathcal{L}}:=\{0\}\cup\bigl{\{}(1+\varepsilon)^{i}\ell_{\min}:i=0,\dots,\left\lceil\log_{1+\varepsilon}{\frac{\ell_{\max}}{\ell_{\min}}}\right\rceil\bigr{\}}$ . Note that $|\overline{\mathcal{L}}|=O\bigl{(}\log_{1+\varepsilon}({\frac{\ell_{\max}}{\ell_{\min}}})\bigr{)}$ . We now compute $A_{\mu}$ for all $\mu\in\overline{\mathcal{L}}$ . Let $\mu^{*}:=\operatorname{argmax}_{\mu\in\overline{\mathcal{L}}}\bigl{(}g(x,A_{\mu})-y\cdot\ell(A,A_{\mu})\bigr{)}$ . We return $A_{\mu^{*}}$ . Consider any $A^{\prime}\in\mathcal{A}$ . By construction of $\overline{\mathcal{L}}$ , there is some $\mu^{\prime}\in\overline{\mathcal{L}}$ such that $\ell(A,A^{\prime})\leq\mu^{\prime}\leq(1+\varepsilon)\ell(A,A^{\prime})$ . Again, by the definition of $\mu^{*}$ , and since $A_{\mu^{\prime}}$ is a $\beta$ -approximate solution for ( $\Phi(x,{\mu^{\prime}},A)$ ), we have

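$$g(x,A_{\mu^{*}})-y\cdot\ell(A,A_{\mu^{*}})\geq g(x,A_{\mu^{\prime}})-y\cdot\ell(A,A_{\mu^{\prime}})\geq\frac{g(x,A^{\prime})}{\beta}-y\cdot\mu^{\prime}\geq\frac{g(x,A^{\prime})}{\beta}-(1+\varepsilon)\,y\cdot\ell(A,A^{\prime}).$$ ∎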
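The geometric grid $\overline{\mathcal{L}}$ used in part (ii) can be sketched as below. The function names are ours, and floating-point rounding in the logarithm may add one extra grid point, which does not affect the rounding guarantee.

```python
import math

def geometric_grid(ell_min, ell_max, eps):
    """Grid {0} U {(1+eps)^i * ell_min : i = 0, ..., ceil(log_{1+eps}(ell_max/ell_min))}."""
    n = math.ceil(math.log(ell_max / ell_min) / math.log(1.0 + eps))
    return [0.0] + [ell_min * (1.0 + eps) ** i for i in range(n + 1)]

def round_up_to_grid(value, grid):
    """Smallest grid point mu' with mu' >= value (value is 0 or lies in [ell_min, ell_max])."""
    return min(mu for mu in grid if mu >= value)

GRID = geometric_grid(1.0, 10.0, 0.5)
```

For every distance value $v$ that is $0$ or lies in $[\ell_{\min},\ell_{\max}]$, the rounded point $\mu^{\prime}$ satisfies $v\leq\mu^{\prime}\leq(1+\varepsilon)v$, which is the property the proof relies on.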

We now consider the setting in Theorem 3.14, namely, the $k$ -bounded setting with $\ell$ being the discrete metric, i.e., $\mathcal{A}=\{A\subseteq U:|A|\leq k\}$ for some ground set $U$ , and $\ell(A,A^{\prime})=1$ if $A\neq A^{\prime}$ , and $0$ otherwise.

Fix $x\in X$ and a scenario $A\in\mathcal{A}$ . By Lemma 3.25, it suffices to give an approximation algorithm for the constrained problem (3.25). When $\mu=0$ , the optimum of the constrained problem is simply $g(x,A)$ (which is easy to compute), and otherwise, the constrained problem simplifies to $\max_{A^{\prime}\in\mathcal{A}}g(x,A^{\prime})$ . So it suffices to obtain a $\beta$ -approximation to this latter problem, which is what we focus on in the sequel.

Part (a) of Theorem 3.14.

Gupta et al. [17] give an $O(\log n)$ -approximation algorithm for $k$ - $\max$ - $\min$ set cover, wherein the goal is to choose a set $A\in\mathcal{A}$ so as to maximize the cost of an optimal integral set-cover for $A$ . It is implicit in their analysis that this also yields an $O(\log n)$ -approximation for $k$ - $\max$ - $\min$ fractional set cover, where we seek to maximize the cost of an optimal fractional set cover. (Footnote 7: See Theorem 4.2 and Claim 4.3 in [17]; Theorem 4.2 proves that the optimal fractional cost of the set-cover instance $(S,\mathcal{F})$ is at most $c(\Phi^{*})+12T^{*}$ .)

This immediately implies an $O(\log n)$ -approximation for $\max_{A^{\prime}\in\mathcal{A}}g(x,A^{\prime})$ as follows. Consider the set cover instance with ground set $U$ , and set-costs given by $w_{S}=0$ if $x_{S}=1$ , and $w_{S}=c^{\mathrm{II}}_{S}$ otherwise. The $k$ - $\max$ - $\min$ fractional set cover for this instance is precisely the problem $\max_{A^{\prime}\in\mathcal{A}}g(x,A^{\prime})$ . So we obtain an $O(\log n)$ -approximation to $\max_{A^{\prime}\in\mathcal{A}}g(x,A^{\prime})$ .

Part (b) of Theorem 3.14.

The problem $\max_{A^{\prime}\in\mathcal{A}}g(x,A^{\prime})$ can be viewed as $k$ - $\max$ - $\min$ fractional vertex cover, where the cost $w_{v}$ of a vertex $v$ is $0$ if $x_{v}=1$ , and $c^{\mathrm{II}}_{v}$ otherwise. Feige et al. [11] give a $\frac{2e}{e-1}$ -approximation algorithm for $k$ - $\max$ - $\min$ fractional vertex cover, so we obtain a $\bigl{(}\frac{2e}{e-1},1\bigr{)}$ -approximation for $\max_{A^{\prime}\in\mathcal{A}}g(x,A^{\prime})$ .

Part (c) of Theorem 3.14.

The problem $\max_{A^{\prime}\in\mathcal{A}}g(x,A^{\prime})$ can be viewed as $k$ - $\max$ - $\min$ fractional edge cover, where the cost $w_{e}$ of an edge $e$ is $0$ if $x_{e}=1$ , and $c^{\mathrm{II}}_{e}$ otherwise. Feige et al. [11] give a $2$ -approximation algorithm for $k$ - $\max$ - $\min$ fractional edge cover, so we obtain a $(2,1)$ -approximation for $\max_{A^{\prime}\in\mathcal{A}}g(x,A^{\prime})$ . ∎

4 Distributionally robust problems under the $L_{\infty}$ -metric

We now focus on the DR 2-stage problem (Q ${}_{\mathring{p}}$ ), and its fractional relaxation (Q ${}^{\mathrm{fr}}_{\mathring{p}}$ ), in the unrestricted setting (so $\mathcal{A}=2^{U}$ , for some $U$ ) when $L$ is the $L_{\infty}$ -metric. Note that since the $L_{\infty}$ -distance between two probability distributions is at most $1$ , we can assume without loss of generality that $r\leq 1$ . We devise an algorithm that, given any $\varepsilon>0$ , runs in time $\operatorname{\mathsf{poly}}\bigl{(}\mathcal{I},\frac{\lambda}{r\varepsilon}\bigr{)}$ , and returns a $\bigl{(}2+O(\varepsilon)\bigr{)}$ -approximate solution to the fractional relaxation (Q ${}^{\mathrm{fr}}_{\mathring{p}}$ ). Combining this with a local $\rho$ -approximation algorithm, we obtain a $\rho(2+O(\varepsilon))$ -approximation for the DR discrete 2-stage problem (i.e., with discrete first- and second- stage actions). This leads to the first guarantees for the DR versions of set cover, vertex cover, edge cover, and facility location under the $L_{\infty}$ -metric (Theorem 4.2).

At a high level, our approach is as follows. We first show how to obtain a suitable convex proxy function $h^{\mathrm{pr}}({\mathring{p}}\,;{x})$ that is pointwise close to the objective function $h({\mathring{p}}\,;{x})$ so that one can cast the problem of minimizing $h^{\mathrm{pr}}({\mathring{p}}\,;{x})$ as a standard 2-stage problem. One could then follow the SAA approach: move to an SAA-version of $h^{\mathrm{pr}}({\mathring{p}}\,;{x})$ with a polynomial-size central distribution, show that a near-optimal solution to the SAA problem translates to a near-optimal solution to the original problem, and finally show how to approximately solve the SAA problem (which is again challenging since this does not reduce to a polynomial-size LP). It is simpler, however, to directly solve the proxy problem, $\min_{x\in\mathcal{P}}h^{\mathrm{pr}}({\mathring{p}}\,;{x})$ , using the approximate-subgradient based machinery in [35]. We show that, under the assumption that $g(x,A)\leq g(x,A^{\prime})$ for all $x$ , $A\subseteq A^{\prime}$ , which holds for all our applications, one can compute an $\omega$ -subgradient of $h^{\mathrm{pr}}({\mathring{p}}\,;{x})$ efficiently in time $\operatorname{\mathsf{poly}}\bigl{(}\mathcal{I},\frac{\lambda}{\omega}\bigr{)}$ , and hence can directly use the ellipsoid-based approach in [35] to obtain a solution $\bar{x}\in\mathcal{P}$ such that $h^{\mathrm{pr}}({\mathring{p}}\,;{\bar{x}})\leq\bigl{(}1+O(\varepsilon)\bigr{)}\min_{x\in\mathcal{P}}h^{\mathrm{pr}}({\mathring{p}}\,;{x})+\eta$ . This in turn implies that $h({\mathring{p}}\,;{\bar{x}})\leq\bigl{(}2+O(\varepsilon)\bigr{)}\min_{x\in\mathcal{P}}h({\mathring{p}}\,;{x})+\eta$ . We can fold the additive error into the multiplicative error by obtaining a lower bound on the optimum.

Theorem 4.1.

Let $\varepsilon\leq\frac{1}{3}$ . Suppose that for all $x\in\mathcal{P}$ , and all $A\subseteq A^{\prime}$ , we have $g(x,A)\leq g(x,A^{\prime})$ . In the unrestricted setting ( $\mathcal{A}=2^{U}$ ) under the $L_{\infty}$ metric, we can compute a solution $\bar{x}\in\mathcal{P}$ satisfying $h({\mathring{p}}\,;{\bar{x}})\leq\bigl{(}2+O(\varepsilon)\bigr{)}\min_{x\in\mathcal{P}}h({\mathring{p}}\,;{x})$ with probability at least $1-\delta$ , in time $\operatorname{\mathsf{poly}}\bigl{(}\mathcal{I},\frac{\lambda}{\varepsilon r},\log(\frac{1}{\delta})\bigr{)}$ .

Theorem 4.2.

We obtain the following approximation factors for the DR discrete 2-stage problems in the unrestricted setting under the $L_{\infty}$ metric: (a) $O(\log n)$ for set cover; (b) $8+O(\varepsilon)$ for vertex cover; (c) $6+O(\varepsilon)$ for edge cover; and (d) $10.98+O(\varepsilon)$ for facility location.

Proof.

This follows by rounding the solution returned by Theorem 4.1, because, as noted in Section 3.3, we have local approximation algorithms with guarantees of (a) $O(\log n)$ for set cover (where $n=|U|$ ); (b) $4$ for vertex cover; (c) $3$ for edge cover; and (d) $5.488$ for facility location. ∎

In the sequel, we focus on proving Theorem 4.1. We first work our way towards defining the proxy function that we use. Note that for every distribution $q$ with $L_{\infty}(\mathring{p},q)\leq r$ , we must have $q_{A}\geq\max\{\mathring{p}_{A}-r,0\}$ for every scenario $A\in\mathcal{A}$ . We refer to the right side of this inequality as the blocked mass in scenario $A$ . The remainder of the probability mass $\mathring{p}_{A}$ (i.e., the difference between $\mathring{p}_{A}$ and the blocked mass) may be moved to other scenarios, and hence we call it the free mass in scenario $A$ . Separating the blocked mass and the free mass of all the scenarios, we obtain a decomposition $\mathring{p}=\overline{p}+\widetilde{p}$ , where $\overline{p}_{A}=\max\{\mathring{p}_{A}-r,0\}$ and $\widetilde{p}_{A}=\mathring{p}_{A}-\overline{p}_{A}=\min\{\mathring{p}_{A},r\}$ for every scenario $A\in\mathcal{A}$ . We write $P^{\mathrm{free}}:=\sum_{A\in\mathcal{A}}\widetilde{p}_{A}$ for the total free mass.
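The decomposition into blocked and free mass can be illustrated on a small explicit distribution. The following is only a sketch: the function name `split_mass` and the dictionary representation of a distribution are assumptions made for illustration, and an explicit dictionary is only feasible when the support is small (the paper's setting has a black-box distribution over exponentially many scenarios).

```python
def split_mass(p, r):
    """Split each probability p[A] into blocked and free mass.

    p: dict mapping a scenario label to its probability (hypothetical encoding).
    r: radius of the L_infinity ball.
    Returns (blocked, free) with blocked[A] = max(p[A] - r, 0) and
    free[A] = min(p[A], r), so that blocked[A] + free[A] = p[A].
    """
    blocked = {A: max(pA - r, 0.0) for A, pA in p.items()}
    free = {A: min(pA, r) for A, pA in p.items()}
    return blocked, free


# Toy example: three scenarios, radius r = 0.25.
p = {"A1": 0.5, "A2": 0.3, "A3": 0.2}
blocked, free = split_mass(p, r=0.25)
total_free = sum(free.values())  # total free mass of this toy distribution
```

In this toy instance only the scenarios with probability above $r$ carry blocked mass, and the total free mass is at least $r$.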

Estimating $P^{\mathrm{free}}$ .

To define our proxy function, we will need an estimate of $P^{\mathrm{free}}$ that is accurate within a $(1+\varepsilon)$ factor. Lemma 4.3 shows that $P^{\mathrm{free}}\geq r$ , which suggests that such an estimate can be obtained with high probability using $\operatorname{\mathsf{poly}}\bigl{(}\frac{1}{r\varepsilon}\bigr{)}$ samples. We prove a few simple results below leading up to this (Lemma 4.6).

Lemma 4.3.

We have $P^{\mathrm{free}}\geq r$ .

Proof.

If there exists a scenario $A\in\mathcal{A}$ with $\widetilde{p}_{A}\geq r$ , then we have $P^{\mathrm{free}}\geq\widetilde{p}_{A}\geq r$ . Otherwise, we have $P^{\mathrm{free}}=\sum_{A\in\mathcal{A}}\widetilde{p}_{A}=\sum_{A\in\mathcal{A}}\mathring{p}_{A}=1\geq r$ . ∎

We partition the set of scenarios $\mathcal{A}$ into a set of frequent scenarios $\mathcal{A}^{\mathrm{freq}}:=\{A\in\mathcal{A}:\mathring{p}_{A}\geq r\}$ and a set of rare scenarios $\mathcal{A}^{\mathrm{rare}}:=\{A\in\mathcal{A}:\mathring{p}_{A}<r\}$ . Note that $|\mathcal{A}^{\mathrm{freq}}|\leq\frac{1}{r}$ , and $\widetilde{p}_{A}=\mathring{p}_{A}$ for every scenario $A\in\mathcal{A}^{\mathrm{rare}}$ .

Lemma 4.4.

Consider a partition $\mathcal{A}=\widehat{\mathcal{A}}^{\mathrm{freq}}\cup\widehat{\mathcal{A}}^{\mathrm{rare}}$ of the scenarios, with $\mathcal{A}^{\mathrm{freq}}\subseteq\widehat{\mathcal{A}}^{\mathrm{freq}}$ (and hence $\widehat{\mathcal{A}}^{\mathrm{rare}}\subseteq\mathcal{A}^{\mathrm{rare}}$ ). Let $\widehat{p}$ be a probability distribution such that $\sum_{A\in\widehat{\mathcal{A}}^{\mathrm{freq}}}|\widehat{p}_{A}-\mathring{p}_{A}|\leq\frac{1}{4}\varepsilon r$ . Let $Q^{\mathrm{free}}:=\sum_{A\in\widehat{\mathcal{A}}^{\mathrm{freq}}}\min\{\widehat{p}_{A},r\}+\sum_{A\in\widehat{\mathcal{A}}^{\mathrm{rare}}}\widehat{p}_{A}$ and $\widehat{P}^{\mathrm{free}}:=\min\left\{Q^{\mathrm{free}}+\frac{1}{2}\varepsilon r,1\right\}$ . Then $P^{\mathrm{free}}\leq\widehat{P}^{\mathrm{free}}\leq\min\{(1+\varepsilon)P^{\mathrm{free}},1\}$ .

Proof.

We first show that the first sum in the definition of $Q^{\mathrm{free}}$ is a good estimate of the amount of free mass in $\widehat{\mathcal{A}}^{\mathrm{freq}}$ . We have

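$$\Bigl|\sum_{A\in\widehat{\mathcal{A}}^{\mathrm{freq}}}\min\{\widehat{p}_{A},r\}-\sum_{A\in\widehat{\mathcal{A}}^{\mathrm{freq}}}\widetilde{p}_{A}\Bigr|\leq\sum_{A\in\widehat{\mathcal{A}}^{\mathrm{freq}}}\bigl|\min\{\widehat{p}_{A},r\}-\widetilde{p}_{A}\bigr|\leq\sum_{A\in\widehat{\mathcal{A}}^{\mathrm{freq}}}|\widehat{p}_{A}-\mathring{p}_{A}|\leq\tfrac{1}{4}\varepsilon r, \tag{14}$$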

where the first step uses the triangle inequality; the second step uses the definition of $\widetilde{p}$ ; the third step is by assumption.

Now we show that the second sum in the definition of $Q^{\mathrm{free}}$ is a good estimate of the amount of free mass in $\widehat{\mathcal{A}}^{\mathrm{rare}}$ . We have

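$$\Bigl|\sum_{A\in\widehat{\mathcal{A}}^{\mathrm{rare}}}\widehat{p}_{A}-\sum_{A\in\widehat{\mathcal{A}}^{\mathrm{rare}}}\widetilde{p}_{A}\Bigr|=\Bigl|\sum_{A\in\widehat{\mathcal{A}}^{\mathrm{rare}}}\widehat{p}_{A}-\sum_{A\in\widehat{\mathcal{A}}^{\mathrm{rare}}}\mathring{p}_{A}\Bigr|=\Bigl|\sum_{A\in\widehat{\mathcal{A}}^{\mathrm{freq}}}\mathring{p}_{A}-\sum_{A\in\widehat{\mathcal{A}}^{\mathrm{freq}}}\widehat{p}_{A}\Bigr|\leq\sum_{A\in\widehat{\mathcal{A}}^{\mathrm{freq}}}|\widehat{p}_{A}-\mathring{p}_{A}|\leq\tfrac{1}{4}\varepsilon r, \tag{15}$$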

where the first step uses the fact that $\widehat{\mathcal{A}}^{\mathrm{rare}}\subseteq\mathcal{A}^{\mathrm{rare}}$ ; the second step uses the fact that $\mathring{p}$ and $\widehat{p}$ are probability distributions; the third step uses the triangle inequality; the fourth step is by assumption.

Combining (14) and (15) yields $|P^{\mathrm{free}}-Q^{\mathrm{free}}|\leq\frac{1}{2}\varepsilon r$ . This, combined with Lemma 4.3 and the definition of $\widehat{P}^{\mathrm{free}}$ , yields the result. ∎

Lemma 4.5.

Let $\widehat{p}$ be an empirical estimate of $\mathring{p}$ using $N=\operatorname{\mathsf{poly}}(\frac{1}{r},\log\left(\frac{1}{\delta}\right))$ samples, and let $\widehat{\mathcal{A}}^{\mathrm{freq}}:=\{A\in\mathcal{A}:\widehat{p}_{A}\geq\frac{r}{2}\}$ . Then we have $|\widehat{\mathcal{A}}^{\mathrm{freq}}|\leq\frac{2}{r}$ , and with probability at least $1-\delta$ we have $\mathcal{A}^{\mathrm{freq}}\subseteq\widehat{\mathcal{A}}^{\mathrm{freq}}$ .

Proof.

The inequality $|\widehat{\mathcal{A}}^{\mathrm{freq}}|\leq\frac{2}{r}$ follows from the definition of $\widehat{\mathcal{A}}^{\mathrm{freq}}$ and the fact that $\widehat{p}$ is a probability distribution.

Since $\mathring{p}$ is a probability distribution and $\mathring{p}_{A}\geq r$ for every $A\in\mathcal{A}^{\mathrm{freq}}$ , we have $|\mathcal{A}^{\mathrm{freq}}|\leq\frac{1}{r}$ . If we choose $N$ appropriately, by using Chernoff bounds we have $\Pr\left[|\widehat{p}_{A}-\mathring{p}_{A}|>\frac{r}{2}\right]\leq\delta r$ for any fixed scenario $A\in\mathcal{A}$ . It follows that for any fixed scenario $A\in\mathcal{A}^{\mathrm{freq}}$ , we have $\Pr\bigl{[}A\not\in\widehat{\mathcal{A}}^{\mathrm{freq}}\bigr{]}\leq\delta r$ . By the union bound, we have $\Pr\bigl{[}\mathcal{A}^{\mathrm{freq}}\not\subseteq\widehat{\mathcal{A}}^{\mathrm{freq}}\bigr{]}\leq|\mathcal{A}^{\mathrm{freq}}|\delta r\leq\delta$ . ∎

Lemma 4.6.

We can compute an estimate $\widehat{P}^{\mathrm{free}}$ of $P^{\mathrm{free}}$ such that $P^{\mathrm{free}}\leq\widehat{P}^{\mathrm{free}}\leq\min\{(1+\varepsilon)P^{\mathrm{free}},1\}$ with probability at least $1-2\delta$ in time $\operatorname{\mathsf{poly}}(\mathcal{I},\frac{1}{\varepsilon r},\log\left(\frac{1}{\delta}\right))$ .

Proof.

First, we use Lemma 4.5 to obtain a set of scenarios $\widehat{\mathcal{A}}^{\mathrm{freq}}$ of size $|\widehat{\mathcal{A}}^{\mathrm{freq}}|\leq\frac{2}{r}$ that is a superset of $\mathcal{A}^{\mathrm{freq}}$ with probability at least $1-\delta$ . Next, we compute an empirical estimate $\widehat{p}$ of $\mathring{p}$ using $N$ samples. Using Chernoff bounds, we can choose $N=\operatorname{\mathsf{poly}}(\frac{1}{\varepsilon r},\log\left(\frac{1}{\delta}\right))$ so that $\Pr\left[|\widehat{p}_{A}-\mathring{p}_{A}|>\frac{1}{4}\frac{1}{|\widehat{\mathcal{A}}^{\mathrm{freq}}|}\varepsilon r\right]\leq\frac{1}{|\widehat{\mathcal{A}}^{\mathrm{freq}}|}\delta$ for every scenario $A\in\mathcal{A}$ . By the union bound, this event does not happen for any of the scenarios $A\in\widehat{\mathcal{A}}^{\mathrm{freq}}$ with probability at least $1-|\widehat{\mathcal{A}}^{\mathrm{freq}}|\frac{1}{|\widehat{\mathcal{A}}^{\mathrm{freq}}|}\delta=1-\delta$ . In this case, the probability distribution $\widehat{p}$ and the partition $(\widehat{\mathcal{A}}^{\mathrm{freq}},\widehat{\mathcal{A}}^{\mathrm{rare}}:=\mathcal{A}\setminus\widehat{\mathcal{A}}^{\mathrm{freq}})$ of $\mathcal{A}$ satisfy the conditions of Lemma 4.4, and so we can compute $\widehat{P}^{\mathrm{free}}$ as described in that lemma.

The success probability is at least $(1-\delta)^{2}\geq 1-2\delta$ . ∎
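The two-phase estimation in Lemmas 4.4 to 4.6 can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function name `estimate_p_free` and the representation of samples as lists of hashable scenario labels are assumptions, and the sample sizes required by the Chernoff bounds are left to the caller. The sketch uses the fact that the empirical mass on the empirically rare scenarios equals one minus the empirical mass on the empirically frequent ones, so the (possibly exponentially many) rare scenarios never need to be enumerated.

```python
from collections import Counter


def estimate_p_free(samples_freq, samples_est, r, eps):
    """Estimate the total free mass from two batches of samples.

    samples_freq: samples used to identify empirically frequent scenarios
                  (threshold r/2, as in Lemma 4.5).
    samples_est:  samples used for the empirical estimate p_hat (Lemma 4.6).
    Returns min(Q_free + eps*r/2, 1), following Lemma 4.4.
    """
    n1 = len(samples_freq)
    freq_hat = {A for A, cnt in Counter(samples_freq).items() if cnt / n1 >= r / 2}

    n2 = len(samples_est)
    counts = Counter(samples_est)
    p_hat_freq = {A: counts.get(A, 0) / n2 for A in freq_hat}

    # Empirical mass on all remaining (empirically rare) scenarios.
    rare_mass = 1.0 - sum(p_hat_freq.values())
    q_free = sum(min(v, r) for v in p_hat_freq.values()) + rare_mass
    return min(q_free + 0.5 * eps * r, 1.0)
```

The additive shift of $\frac{1}{2}\varepsilon r$ is what turns a two-sided estimate into the one-sided guarantee $P^{\mathrm{free}}\leq\widehat{P}^{\mathrm{free}}$ stated in Lemma 4.4.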

A proxy function for $h({\mathring{p}}\,;{x})$ .

We assume in the sequel that the estimate $\widehat{P}^{\mathrm{free}}$ computed in Lemma 4.6 satisfies $P^{\mathrm{free}}\leq\widehat{P}^{\mathrm{free}}\leq\min\{(1+\varepsilon)P^{\mathrm{free}},1\}$ . Consider the polytope $\mathcal{K}:=\bigl{\{}q\in\mathbb{R}_{+}^{\mathcal{A}}:\sum_{A\in\mathcal{A}}q_{A}\leq\widehat{P}^{\mathrm{free}},\quad q_{A}\leq r\ \forall A\in\mathcal{A}\bigr{\}}$ . Our proxy function is then defined as

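$$h^{\mathrm{pr}}({\mathring{p}}\,;{x}):=c^{\intercal}x+{\textstyle\operatorname*{E}_{A\sim\mathring{p}}}\bigl[g(x,A)\bigr]+\max_{q\in\mathcal{K}}\sum_{A\in\mathcal{A}}q_{A}\,g(x,A),$$

where we refer to the maximization problem in the last term as (K$_x$).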

Informally, ${\textstyle\operatorname*{E}_{A\sim\mathring{p}}}\bigl{[}g(x,A)\bigr{]}$ and $\max_{q\in\mathcal{K}}\sum_{A\in\mathcal{A}}q_{A}g(x,A)$ can be seen as upper bounds on the contributions to $\max_{q:L_{\infty}(\mathring{p},q)\leq r}{\textstyle\operatorname*{E}_{A\sim q}}\bigl{[}g(x,A)\bigr{]}$ from the blocked mass and the free mass of $\mathring{p}$ respectively. We will argue that this proxy function $h^{\mathrm{pr}}({\mathring{p}}\,;{x})$ is a good pointwise approximation of $h({\mathring{p}}\,;{x})$ . First, we need the following preliminary lemma.

Lemma 4.7.

For every $x\in\mathcal{P}$ , we have $\max_{q:\|\mathring{p}-q\|_{\infty}\leq r}{\textstyle\operatorname*{E}_{A\sim q}}\bigl{[}g(x,A)\bigr{]}\geq\frac{1}{1+\varepsilon}\max_{q\in\mathcal{K}}\sum_{A\in\mathcal{A}}q_{A}g(x,A)$ .

Proof.

Let $q^{*}$ be an optimal solution to (K$_x$). We prove that there exists a distribution $\tilde{q}$ with $\|\mathring{p}-\tilde{q}\|_{\infty}\leq r$ such that $\tilde{q}\geq\frac{1}{1+\varepsilon}q^{*}$ . This yields the result, since we obtain

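$$\max_{q:\|\mathring{p}-q\|_{\infty}\leq r}{\textstyle\operatorname*{E}_{A\sim q}}\bigl[g(x,A)\bigr]\geq\sum_{A\in\mathcal{A}}\tilde{q}_{A}\,g(x,A)\geq\frac{1}{1+\varepsilon}\sum_{A\in\mathcal{A}}q^{*}_{A}\,g(x,A)=\frac{1}{1+\varepsilon}\max_{q\in\mathcal{K}}\sum_{A\in\mathcal{A}}q_{A}\,g(x,A).$$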

We give a constructive proof of the existence of $\widetilde{q}$ , via an iterative algorithm. Recall that $\overline{p}:=(\max\{\mathring{p}_{A}-r,0\})_{A\in\mathcal{A}}$ denotes the blocked mass of the distribution $\mathring{p}$ . We start by setting $\widetilde{q}:=\overline{p}+\frac{1}{1+\varepsilon}q^{*}$ . Note that for all $A\in\mathcal{A}$ we have $\widetilde{q}_{A}\geq\overline{p}_{A}\geq\mathring{p}_{A}-r$ and $\widetilde{q}_{A}\geq\frac{1}{1+\varepsilon}q^{*}_{A}$ . From now on, we will only increase components of $\widetilde{q}$ , so these two properties will be preserved; in particular, we maintain the invariant $\tilde{q}\geq\frac{1}{1+\varepsilon}q^{*}$ . We only need to work towards ensuring that $\widetilde{q}$ is a probability distribution and that $\widetilde{q}_{A}\leq\mathring{p}_{A}+r$ for every $A\in\mathcal{A}$ (which, along with $\widetilde{q}_{A}\geq\mathring{p}_{A}-r$ for every $A\in\mathcal{A}$ , implies $\|\mathring{p}-\tilde{q}\|_{\infty}\leq r$ ).

Note that for every $A\in\mathcal{A}$ we have $\widetilde{q}_{A}\leq\max\{\mathring{p}_{A},r\}\leq 1$ (which also implies $\widetilde{q}_{A}\leq\mathring{p}_{A}+r$ ). Moreover, since $\sum_{A\in\mathcal{A}}q^{*}_{A}\leq\widehat{P}^{\mathrm{free}}\leq(1+\varepsilon)P^{\mathrm{free}}$ , we have $\sum_{A\in\mathcal{A}}\widetilde{q}_{A}=\sum_{A\in\mathcal{A}}\overline{p}_{A}+\frac{1}{1+\varepsilon}\sum_{A\in\mathcal{A}}q^{*}_{A}\leq\sum_{A\in\mathcal{A}}\overline{p}_{A}+P^{\mathrm{free}}=1$ . It is possible that $\widetilde{q}$ is not a probability distribution yet, if this inequality is not tight. If this is the case, then there must be a scenario $A\in\mathcal{A}$ such that $\widetilde{q}_{A}<\mathring{p}_{A}$ . We increase the component $\widetilde{q}_{A}$ until either we obtain $\sum_{A\in\mathcal{A}}\widetilde{q}_{A}=1$ (and hence $\widetilde{q}$ is a probability distribution) or $\widetilde{q}_{A}=\mathring{p}_{A}+r$ . If $\widetilde{q}$ is still not a probability distribution, we repeat the same step with a different scenario. As each step (except possibly the final one) decreases the number of scenarios $A$ such that $\widetilde{q}_{A}<\mathring{p}_{A}$ , this process eventually stops. At this moment, $\widetilde{q}$ is a probability distribution and satisfies $\mathring{p}_{A}-r\leq\widetilde{q}_{A}\leq\mathring{p}_{A}+r$ for every $A$ , and so $\|\mathring{p}-\tilde{q}\|_{\infty}\leq r$ . ∎
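The iterative construction in this proof can be sketched for a small, explicitly listed scenario set. The function name `lift_to_distribution`, the dictionary encoding, and the numerical tolerance `tol` are assumptions made for illustration; the proof itself is purely existential and never needs to run this procedure on an exponential-size support.

```python
def lift_to_distribution(p, q_star, r, eps, tol=1e-12):
    """Build q with ||p - q||_inf <= r and q >= q_star / (1 + eps).

    p:      dict scenario -> probability (sums to 1).
    q_star: dict scenario -> mass, assumed to satisfy q_star[A] <= r and
            sum(q_star) <= (1 + eps) * (total free mass of p).
    """
    # Start from the blocked mass plus the scaled-down q_star.
    q = {A: max(p[A] - r, 0.0) + q_star.get(A, 0.0) / (1.0 + eps) for A in p}
    deficit = 1.0 - sum(q.values())
    # Raise components that lie below p[A], never exceeding p[A] + r.
    for A in p:
        if deficit <= tol:
            break
        if q[A] < p[A]:
            step = min(deficit, p[A] + r - q[A])
            q[A] += step
            deficit -= step
    return q
```

A single pass over the scenarios suffices here: if mass were still missing after the pass, every component would be at least $\mathring{p}_{A}$ , which would force the total to be at least 1.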

Lemma 4.8.

For every $x\in\mathcal{P}$ , we have $h({\mathring{p}}\,;{x})\leq h^{\mathrm{pr}}({\mathring{p}}\,;{x})\leq 2(1+\varepsilon)h({\mathring{p}}\,;{x})$ .

Proof.

We start by proving the first inequality. Let $q^{*}:=\operatorname{argmax}_{q:\|\mathring{p}-q\|_{\infty}\leq r}{\textstyle\operatorname*{E}_{A\sim q}}\bigl{[}g(x,A)\bigr{]}$ , so that $h({\mathring{p}}\,;{x})=c^{\intercal}x+{\textstyle\operatorname*{E}_{A\sim q^{*}}}\bigl{[}g(x,A)\bigr{]}$ . We decompose $q^{*}$ into two vectors as follows: we write $q^{*}=q^{1}+q^{2}$ , where $q^{1}_{A}:=\min\{q^{*}_{A},\mathring{p}_{A}\}$ and $q^{2}_{A}:=q^{*}_{A}-q^{1}_{A}$ for every scenario $A\in\mathcal{A}$ . Next we upper bound the contribution of each of these two vectors to the objective value $h({\mathring{p}}\,;{x})$ . Since $q^{1}\leq\mathring{p}$ , we have $\sum_{A\in\mathcal{A}}q^{1}_{A}g(x,A)\leq{\textstyle\operatorname*{E}_{A\sim\mathring{p}}}\bigl{[}g(x,A)\bigr{]}$ . Note that since $\|\mathring{p}-q^{*}\|_{\infty}\leq r$ , and by the way we defined $q^{2}$ , we must have $q^{2}_{A}\leq r$ for every scenario $A\in\mathcal{A}$ . Further, we have $\sum_{A\in\mathcal{A}}q^{2}_{A}\leq P^{\mathrm{free}}\leq\widehat{P}^{\mathrm{free}}$ . It follows that $q^{2}\in\mathcal{K}$ , and so $\sum_{A\in\mathcal{A}}q^{2}_{A}g(x,A)\leq\max_{q\in\mathcal{K}}\sum_{A\in\mathcal{A}}q_{A}g(x,A)$ . Therefore we have

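$$h({\mathring{p}}\,;{x})=c^{\intercal}x+\sum_{A\in\mathcal{A}}q^{1}_{A}\,g(x,A)+\sum_{A\in\mathcal{A}}q^{2}_{A}\,g(x,A)\leq c^{\intercal}x+{\textstyle\operatorname*{E}_{A\sim\mathring{p}}}\bigl[g(x,A)\bigr]+\max_{q\in\mathcal{K}}\sum_{A\in\mathcal{A}}q_{A}\,g(x,A)=h^{\mathrm{pr}}({\mathring{p}}\,;{x}),$$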

proving the first inequality.

Now we proceed to prove the second inequality. We have

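$$h^{\mathrm{pr}}({\mathring{p}}\,;{x})=c^{\intercal}x+{\textstyle\operatorname*{E}_{A\sim\mathring{p}}}\bigl[g(x,A)\bigr]+\max_{q\in\mathcal{K}}\sum_{A\in\mathcal{A}}q_{A}\,g(x,A)\leq c^{\intercal}x+(2+\varepsilon)\max_{q:\|\mathring{p}-q\|_{\infty}\leq r}{\textstyle\operatorname*{E}_{A\sim q}}\bigl[g(x,A)\bigr]\leq 2(1+\varepsilon)\,h({\mathring{p}}\,;{x}).$$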

The second step uses Lemma 4.7 and the fact that $\max_{q:\|\mathring{p}-q\|_{\infty}\leq r}{\textstyle\operatorname*{E}_{A\sim q}}\bigl{[}g(x,A)\bigr{]}\geq{\textstyle\operatorname*{E}_{A\sim\mathring{p}}}\bigl{[}g(x,A)\bigr{]}$ (since $\mathring{p}$ is feasible for the maximization problem on the left side). ∎

Solving the proxy problem $\min_{x\in\mathcal{P}}h^{\mathrm{pr}}({\mathring{p}}\,;{x})$ .

We assume that for all $x\in X$ , and all $A\subseteq A^{\prime}$ , we have $g(x,A)\leq g(x,A^{\prime})$ , which holds for all covering problems. Recall that $\mathcal{P}\subseteq\mathbb{R}_{+}^{m}$ . Recall from property (P4) that for every $A\in\mathcal{A}$ , the function $g(\cdot,A)$ is convex, and at every $x\in\mathcal{P}$ we can efficiently compute its value. We will assume the following stronger version of (P5):

(P5’)

For every $x\in\mathcal{P}$ and $A\in\mathcal{A}$ , we can efficiently compute a subgradient $d^{x,A}$ of $g(\cdot,A)$ at $x$ with $-\lambda c\leq d^{x,A}\leq 0$ .

Shmoys and Swamy [35] define a broad class of 2-stage problems for which (P5’) holds, which includes all the 2-stage problems considered in the literature. Recall that by (P3), $\mathcal{P}\subseteq B(0,R)=\{x:\|x\|\leq R\}$ and $\mathcal{P}$ contains a ball of radius $V\leq 1$ such that $\ln\bigl{(}\frac{R}{V}\bigr{)}=\operatorname{\mathsf{poly}}(\mathcal{I})$ . Let $\widetilde{K}$ be the Lipschitz constant of $h^{\mathrm{pr}}({\mathring{p}}\,;{\cdot})$ ; we show in Lemma 4.13 that $\log\widetilde{K}=\operatorname{\mathsf{poly}}(\mathcal{I})$ . Under this setup, we have the following result from [35].

Theorem 4.9 (see Theorem 4.7, Lemma 4.14 in [35]).

Let $\varepsilon<1/2$ , $\delta>0$ . Define $N=\left\lceil 2m^{2}\ln\bigl{(}\frac{16\widetilde{K}R^{2}}{V\eta}\bigr{)}\right\rceil$ and $n=N\ln\bigl{(}\frac{8N\widetilde{K}R}{\eta}\bigr{)}$ , and $\omega=\varepsilon/(2n)=\operatorname{\mathsf{poly}}\bigl{(}\frac{\varepsilon}{\mathcal{I}},\log(\frac{1}{\eta})\bigr{)}$ . Suppose we have a procedure that given any point $x\in\mathcal{P}$ finds an $\omega$ -subgradient of $h^{\mathrm{pr}}({\mathring{p}}\,;{\cdot})$ at $x$ with probability at least $1-\delta$ in time $T(\omega,\delta)$ . Then, we can find $\bar{x}\in\mathcal{P}$ satisfying $h^{\mathrm{pr}}({\mathring{p}}\,;{\bar{x}})\leq\frac{1}{1-\varepsilon}\cdot\min_{x\in\mathcal{P}}h^{\mathrm{pr}}({\mathring{p}}\,;{x})+\eta$ with probability at least $1-\delta$ in time $O\bigl{(}T(\omega,\frac{\delta}{N+n})\cdot m^{2}\log^{2}(\frac{\widetilde{K}Rm}{V\eta})\bigr{)}=\operatorname{\mathsf{poly}}\bigl{(}\mathcal{I},T(\omega,\frac{\delta}{N+n}),\log(\frac{1}{\eta})\bigr{)}$ .

We show that one can compute an $\omega$ -subgradient with probability at least $1-\delta$ in time $T(\omega,\delta)=\operatorname{\mathsf{poly}}\bigl{(}\mathcal{I},\frac{\lambda}{r\omega},\log(\frac{1}{\delta})\bigr{)}$ . Lemma 4.10 (ii) shows that to obtain an $\omega$ -subgradient, it suffices to be able to (a) find a vector that is componentwise close to ${\textstyle\operatorname*{E}_{A\sim\mathring{p}}}\bigl{[}d^{x,A}\bigr{]}$ , and (b) find an optimal solution to the maximization problem (K$_x$) in the definition of $h^{\mathrm{pr}}({\mathring{p}}\,;{x})$ . Lemma 4.11 argues using simple Chernoff bounds that one can obtain a vector that is componentwise close to ${\textstyle\operatorname*{E}_{A\sim\mathring{p}}}\bigl{[}d^{x,A}\bigr{]}$ , and Lemma 4.12 shows that one can compute an optimal solution to (K$_x$) (with polynomial support). Finally, Lemma 4.13 bounds the Lipschitz constant of $h^{\mathrm{pr}}({\mathring{p}}\,;{\cdot})$ . Putting everything together yields Theorem 4.1.

Lemma 4.10.

(i)

The function $h^{\mathrm{pr}}({\mathring{p}}\,;{\cdot})$ is convex, and the vector $d:=c+{\textstyle\operatorname*{E}_{A\sim\mathring{p}}}\bigl{[}d^{x,A}\bigr{]}+\sum_{A\in\mathcal{A}}q^{*}_{A}d^{x,A}$ is a subgradient of $h^{\mathrm{pr}}({\mathring{p}}\,;{\cdot})$ at $x$ ; here $q^{*}$ is an optimal solution to (K$_x$).

(ii)

Moreover, if $d^{\mathrm{est}}$ is a vector such that $-\omega c\leq d^{\mathrm{est}}-{\textstyle\operatorname*{E}_{A\sim\mathring{p}}}\bigl{[}d^{x,A}\bigr{]}\leq 0$ , then $\widehat{d}:=c+d^{\mathrm{est}}+\sum_{A\in\mathcal{A}}q^{*}_{A}d^{x,A}$ is an $\omega$ -subgradient of $h^{\mathrm{pr}}({\mathring{p}}\,;{\cdot})$ at $x$ .

Proof.

Convexity of $h^{\mathrm{pr}}({\mathring{p}}\,;{\cdot})$ will follow from the fact that we have a subgradient of $h^{\mathrm{pr}}({\mathring{p}}\,;{\cdot})$ at every point $x\in\mathcal{P}$ . Part (i) is a special case of part (ii) with $\omega=0$ , so we focus on part (ii). Consider any $x^{\prime}\in\mathcal{P}$ . We have

[TABLE]

The first inequality follows since $q^{*}$ is a feasible solution to (K$_{x^{\prime}}$); the second follows since $d^{x,A}$ is a subgradient of $g(\cdot,A)$ at $x$ ; the third follows from the componentwise closeness of $d^{\mathrm{est}}$ and ${\textstyle\operatorname*{E}_{A\sim\mathring{p}}}\bigl{[}d^{x,A}\bigr{]}$ ; the fourth follows since $x,x^{\prime}\geq 0$ , and the last inequality is because $h^{\mathrm{pr}}({\mathring{p}}\,;{x})\geq c^{\intercal}x$ . ∎

Lemma 4.11.

Let $x\in\mathcal{P}$ . For any $\omega>0$ and $\delta\in(0,1)$ , we can compute a vector $d^{\mathrm{est}}$ such that $-\omega c\leq d^{\mathrm{est}}-{\textstyle\operatorname*{E}_{A\sim\mathring{p}}}\bigl{[}d^{x,A}\bigr{]}\leq 0$ with probability at least $1-\delta$ in time $T(\omega,\delta):=\operatorname{\mathsf{poly}}(\mathcal{I},\frac{\lambda}{\omega},\log\frac{1}{\delta})$ .

Proof.

This is a simple application of Chernoff-Hoeffding bounds. For $i=1,\ldots,\mathcal{N}$ , we sample a scenario $A$ from $\mathring{p}$ , and compute $Z^{i}=d^{x,A}$ , so $Z^{i}_{e}/\lambda c_{e}\in[-1,0]$ for every $e=1,\dots,m$ by (P5’). Taking the average of $\mathcal{N}$ independent samples, we obtain using Chernoff bounds (see Theorem 1.1 in [7]), that

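$$\Pr\Bigl[\Bigl|\frac{1}{\mathcal{N}}\sum_{i=1}^{\mathcal{N}}Z^{i}_{e}-{\textstyle\operatorname*{E}_{A\sim\mathring{p}}}\bigl[d^{x,A}_{e}\bigr]\Bigr|>\tfrac{1}{2}\omega c_{e}\Bigr]\leq 2\exp\Bigl(-\frac{\mathcal{N}\omega^{2}}{2\lambda^{2}}\Bigr)$$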

for every $e=1,\dots,m$ . So $\mathcal{N}=\frac{2\lambda^{2}}{\omega^{2}}\ln\bigl{(}\frac{2m}{\delta}\bigr{)}$ ensures that the above probability is at most $\delta/m$ . We return $d^{\mathrm{est}}=\frac{1}{\mathcal{N}}\sum_{i=1}^{\mathcal{N}}Z^{i}-\frac{1}{2}\omega c$ . By the union bound, this satisfies $-\omega c\leq d^{\mathrm{est}}-{\textstyle\operatorname*{E}_{A\sim\mathring{p}}}\bigl{[}d^{x,A}\bigr{]}\leq 0$ with probability at least $1-m\frac{\delta}{m}=1-\delta$ . ∎
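The sampling procedure of Lemma 4.11 can be sketched as below. The names `sample_size`, `estimate_expected_subgradient`, `sample_scenario`, and `subgrad` are hypothetical: `sample_scenario()` stands for one draw from the black-box distribution, and `subgrad(A)` stands for the subgradient $d^{x,A}$ guaranteed by (P5’) at the fixed point $x$ . Rounding the sample size up to an integer is an added assumption.

```python
import math


def sample_size(lam, omega, m, delta):
    """Number of samples from the lemma, rounded up to an integer."""
    return math.ceil(2 * lam**2 / omega**2 * math.log(2 * m / delta))


def estimate_expected_subgradient(sample_scenario, subgrad, c, lam, omega, delta):
    """Return d_est = (sample average of subgradients) - (omega / 2) * c.

    sample_scenario: callable returning one scenario drawn from the distribution.
    subgrad:         callable mapping a scenario to a length-m list d with
                     -lam * c[e] <= d[e] <= 0 componentwise.
    """
    m = len(c)
    n = sample_size(lam, omega, m, delta)
    total = [0.0] * m
    for _ in range(n):
        d = subgrad(sample_scenario())
        for e in range(m):
            total[e] += d[e]
    # Shifting by omega*c/2 turns a two-sided error into a one-sided one.
    return [total[e] / n - 0.5 * omega * c[e] for e in range(m)]
```

The downward shift by $\frac{1}{2}\omega c$ mirrors the last step of the proof: a sample average within $\frac{1}{2}\omega c$ of the mean on either side becomes an estimate that lies between the mean minus $\omega c$ and the mean.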

We say $(A_{1},\dots,A_{t})$ is a good $t$ -sequence for $x$ if $A_{1},\ldots,A_{t}$ are the $t$ scenarios with maximum second-stage cost $g(x,A)$ in that order; i.e., more precisely, we have $g(x,A_{1})\geq g(x,A_{2})\geq\dots\geq g(x,A_{t})\geq\max_{A\in\mathcal{A}\setminus\{A_{1},\ldots,A_{t}\}}g(x,A)$ .

Lemma 4.12.

Let $t:=\min\bigl{\{}\bigl{\lceil}\widehat{P}^{\mathrm{free}}/r\bigr{\rceil},|\mathcal{A}|\bigr{\}}$ , and fix $x\in\mathcal{P}$ . Suppose that $g(x,A)\leq g(x,A^{\prime})$ for all $A\subseteq A^{\prime}$ .

(a)

We can compute a good $t$ -sequence $(A_{1},\dots,A_{t})$ in time $\operatorname{\mathsf{poly}}(\mathcal{I},t)$ .

(b)

Define the vector $q^{*}$ as follows:

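$$q^{*}_{A}:=\begin{cases}r&\text{if }A=A_{i}\text{ for some }i\in\{1,\dots,t-1\},\\ \min\bigl\{r,\ \widehat{P}^{\mathrm{free}}-(t-1)r\bigr\}&\text{if }A=A_{t},\\ 0&\text{otherwise.}\end{cases}$$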

Then $q^{*}$ is an optimal solution to $\max_{q\in\mathcal{K}}\sum_{A\in\mathcal{A}}q_{A}g(x,A)$ .

Proof.

By the monotonicity assumption of $g(x,\cdot)$ , the costliest scenario is $U$ , so we start by setting $A_{1}=U$ . We then proceed as follows for $i=2,\dots,t$ . Suppose that we have already computed $A_{1},\dots,A_{i-1}$ . Computing $A_{i}$ amounts to solving the problem

$$\max\ g(x,A)\quad\text{s.t.}\quad A\in\mathcal{A}\setminus\{A_{1},\dots,A_{i-1}\}. \tag{16}$$

We claim that (16) admits an optimal solution that is a maximal proper subset of $A_{i^{\prime}}$ for some $1\leq i^{\prime}<i$ . Indeed, let $A^{*}$ be an optimal solution of (16) with maximum cardinality, and suppose for a contradiction that it is not a maximal proper subset of $A_{i^{\prime}}$ for any $1\leq i^{\prime}<i$ . Note that since $A_{1}=U$ , we have $A^{*}\neq U$ , so there is an element $e\in U\setminus A^{*}$ . Now, consider the scenario $\overline{A}:=A^{*}\cup\{e\}$ . Since by assumption $A^{*}$ is not a maximal proper subset of $A_{i^{\prime}}$ for any $1\leq i^{\prime}<i$ , we have $\overline{A}\notin\{A_{1},\dots,A_{i-1}\}$ , and so $\overline{A}$ is feasible for (16). By the monotonicity assumption, since $A^{*}\subseteq\overline{A}$ , we have $g(x,\overline{A})\geq g(x,A^{*})$ , and so $\overline{A}$ is also an optimal solution for (16). Since $|\overline{A}|>|A^{*}|$ , this contradicts the definition of $A^{*}$ .

We now utilize the observation above to show that given $x$ and $A_{1},\dots,A_{i-1}$ , we can solve (16) in $\operatorname{\mathsf{poly}}(\mathcal{I},i)$ time. This can be done by enumerating all maximal proper subsets of $A_{1},\dots,A_{i-1}$ . Since each set $A_{i^{\prime}}$ has $|A_{i^{\prime}}|$ maximal proper subsets, we enumerate $\sum_{i^{\prime}=1}^{i-1}|A_{i^{\prime}}|\leq(i-1)|U|=\operatorname{\mathsf{poly}}(\mathcal{I},i)$ scenarios, and the claim follows. We conclude that we can compute a good $t$ -sequence by solving (16) for $i=2,\dots,t$ , which takes $\sum_{i=2}^{t}\operatorname{\mathsf{poly}}(\mathcal{I},i)=\operatorname{\mathsf{poly}}(\mathcal{I},t)$ time.
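The enumeration described above can be sketched with a priority queue of candidate scenarios. This is a variant chosen for brevity rather than the exact procedure in the proof: instead of re-enumerating all maximal proper subsets in every round, it keeps previously generated candidates in a heap. The name `good_t_sequence`, the encoding of scenarios as `frozenset`s, and the argument `g` (a monotone set function standing in for $g(x,\cdot)$ at a fixed $x$ ) are assumptions for illustration.

```python
import heapq
from itertools import count


def good_t_sequence(U, g, t):
    """Return the t scenarios with largest g-value, in non-increasing order.

    U: iterable of ground-set elements (the full scenario is all of U).
    g: monotone set function on frozensets, g(A) <= g(B) whenever A is a subset of B.
    """
    top = frozenset(U)
    seq, seen = [], {top}
    tie = count()  # tie-breaker so the heap never compares frozensets
    heap = [(-g(top), next(tie), top)]
    while heap and len(seq) < t:
        _, _, A = heapq.heappop(heap)
        seq.append(A)
        # Every maximal proper subset of a chosen scenario becomes a candidate.
        for e in A:
            B = A - {e}
            if B not in seen:
                seen.add(B)
                heapq.heappush(heap, (-g(B), next(tie), B))
    return seq
```

At every step the heap holds exactly the maximal proper subsets of already chosen scenarios that have not themselves been chosen, which by the claim above always contain an optimal solution to (16).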

For part (b), consider the polytope $\frac{1}{r}\mathcal{K}:=\{\frac{1}{r}q:q\in\mathcal{K}\}$ . Note that the problem $\max_{q\in\mathcal{K}}\sum_{A\in\mathcal{A}}q_{A}g(x,A)$ is equivalent to the problem $\max_{q\in\frac{1}{r}\mathcal{K}}\sum_{A\in\mathcal{A}}q_{A}g(x,A)$ (up to scaling of the solutions), which can be seen as a fractional knapsack problem: we have one item of value $g(x,A)$ and weight $1$ for every $A\in\mathcal{A}$ ; the capacity of the knapsack is set to $\frac{\widehat{P}^{\mathrm{free}}}{r}$ . The result then follows by using the fact that one can compute an optimal solution to a fractional knapsack problem in a greedy fashion, by repeatedly picking among the available items the one with the highest value/weight ratio. ∎
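The greedy fractional-knapsack step in part (b) can be sketched as follows, assuming the scenarios are already given in non-increasing order of second-stage cost (for instance as a good $t$ -sequence). The function name `greedy_q_star` and the dictionary output are illustrative assumptions.

```python
def greedy_q_star(sequence, p_free_hat, r):
    """Fill scenarios in order of decreasing cost, at most r per scenario,
    until the budget p_free_hat is exhausted.

    sequence:   scenarios sorted by non-increasing g(x, A).
    p_free_hat: total mass budget (the estimate of the free mass).
    r:          per-scenario cap.
    Returns a dict scenario -> assigned mass; unlisted scenarios get 0.
    """
    q, remaining = {}, p_free_hat
    for A in sequence:
        if remaining <= 0:
            break
        q[A] = min(r, remaining)
        remaining -= q[A]
    return q


# Toy example: budget 0.7 and cap 0.25 fill two scenarios fully, one partially.
q = greedy_q_star(["A1", "A2", "A3"], p_free_hat=0.7, r=0.25)
```

Because every item has the same weight, sorting by value/weight ratio coincides with sorting by $g(x,A)$ , so only the first $\lceil\widehat{P}^{\mathrm{free}}/r\rceil$ scenarios of the order are ever touched.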

Lemma 4.13.

The function $h^{\mathrm{pr}}({\mathring{p}}\,;{\cdot})$ has Lipschitz constant at most $\widetilde{K}=(2\lambda+1)\|c\|$ .

Proof.

It suffices to show that $h^{\mathrm{pr}}({\mathring{p}}\,;{\cdot})$ admits a subgradient of Euclidean norm at most $\widetilde{K}$ at every point $x\in\mathcal{P}$ . Fix $x\in\mathcal{P}$ , and consider the subgradient $d:=c+{\textstyle\operatorname*{E}_{A\sim\mathring{p}}}\bigl{[}d^{x,A}\bigr{]}+\sum_{A\in\mathcal{A}}q^{*}_{A}d^{x,A}$ given by Lemma 4.10. We have

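$$\|d\|\leq\|c\|+{\textstyle\operatorname*{E}_{A\sim\mathring{p}}}\bigl[\|d^{x,A}\|\bigr]+\sum_{A\in\mathcal{A}}q^{*}_{A}\|d^{x,A}\|\leq(2\lambda+1)\|c\|.$$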

The first step follows from the triangle inequality, and the final step follows because $\|d^{x,A}\|\leq\lambda\|c\|$ for every $A\in\mathcal{A}$ by assumption (P5’) and $\sum_{A\in\mathcal{A}}q^{*}_{A}\leq\widehat{P}^{\mathrm{free}}\leq 1$ . ∎

Proof of Theorem 4.1.

Note that $g(0,U)=\max_{A\in\mathcal{A}}g(0,A)$ by the monotonicity property of the second-stage costs. If $g(0,U)=0$ (so $\mathcal{A}$ contains only null scenarios) then $\max_{q:\|\mathring{p}-q\|_{\infty}\leq r}{\textstyle\operatorname*{E}_{A\sim q}}\bigl{[}g(x,A)\bigr{]}=0$ , and so $x=0$ is an optimal solution to the DR problem. Otherwise, the optimal value of (K$_x$) is at least $\mathsf{LB}:=r\cdot g(0,U)$ since there is always a distribution $q$ with $\|q-\mathring{p}\|_{\infty}\leq r$ that places a weight of at least $r$ on $U$ (e.g., take $q=\mathring{p}$ if $\mathring{p}_{U}\geq r$ ; otherwise, take $q_{U}=r$ , $q_{A}=(1-r)\mathring{p}_{A}/\sum_{A^{\prime}\subsetneq U}\mathring{p}_{A^{\prime}}$ for all $A\subsetneq U$ ). Note that $\log\bigl{(}\frac{1}{\mathsf{LB}}\bigr{)}=\operatorname{\mathsf{poly}}(\mathcal{I})$ .

We compute a $(1+\varepsilon)$ -estimate of $P^{\mathrm{free}}$ using Lemma 4.6. We then run the algorithm of Theorem 4.9, utilizing Lemmas 4.10–4.12 to compute $\omega$ -subgradients, and setting $\eta=\varepsilon\cdot\mathsf{LB}$ and $\widetilde{K}=(2\lambda+1)\|c\|$ (using Lemma 4.13). Let $\bar{x}$ be the solution returned. Using Lemma 4.8, we obtain that

[TABLE]

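where $\frac{2+2\varepsilon}{1-\varepsilon}\leq 2+6\varepsilon$ since $\varepsilon\leq\frac{1}{3}$ (indeed, $(2+6\varepsilon)(1-\varepsilon)=2+4\varepsilon-6\varepsilon^{2}\geq 2+2\varepsilon$ exactly when $\varepsilon\leq\frac{1}{3}$ ). The success probability is at least $1-3\delta$ . ∎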

Appendix A Proof of Theorem 3.5

Overview.

Let $\widehat{p}$ denote a generic empirical estimate of $\mathring{p}$ (which could be any of $\widehat{p}^{1},\ldots,\widehat{p}^{k}$ ). We discretize $[0,\tau]$ suitably to obtain a set $Y$ so that for any $x\in X$ , and $y\in[0,\tau]$ , there is some $y^{\prime}\in Y$ such that $\overline{h}({p}\,;{x,y^{\prime}})$ is close to $\overline{h}({p}\,;{x,y})$ for any central distribution $p$ (Claim A.2). It follows that approximate solutions to $\min_{x\in X,y\in Y}\overline{h}({p}\,;{x,y})$ translate to approximate solutions to $\min_{x\in X,y\in[0,\tau]}\overline{h}({p}\,;{x,y})$ .

The arguments in [4] can be used to show that an approximate solution to $\min_{x\in X,y\in Y}\overline{h}({\widehat{p}}\,;{x,y})$ can be used to obtain an approximate solution to $\min_{x\in X,y\in Y}\overline{h}({\mathring{p}}\,;{x,y})$ (given a suitable value oracle for $\overline{h}({\widehat{p}}\,;{x,y})$ ). Recall that $\overline{h}({p}\,;{x,y})=c^{\intercal}x+ry+{\textstyle\operatorname*{E}_{A\sim p}}\bigl{[}\overline{g}(x,y,A)\bigr{]}$ . The proof in [4] proceeds by decomposing ${\textstyle\operatorname*{E}_{A\sim p}}\bigl{[}\overline{g}(x,y,A)\bigr{]}$ into two terms, ${\textstyle\operatorname*{E}^{l}_{A\sim p}}\bigl{[}.\bigr{]}$ and ${\textstyle\operatorname*{E}^{h}_{A\sim p}}\bigl{[}.\bigr{]}$ , which are the contributions from “low” cost and “high” cost scenarios respectively. For the low scenarios, Chernoff bounds imply that ${\textstyle\operatorname*{E}^{l}_{A\sim\widehat{p}}}\bigl{[}.\bigr{]}$ and ${\textstyle\operatorname*{E}^{l}_{A\sim\mathring{p}}}\bigl{[}.\bigr{]}$ are close to each other, for all $(x,y)\in X\times Y$ , and all SAA problems; this is stated in (20).

But the high-scenario contribution could be quite different in the SAA and original problems, although in both problems, this contribution is essentially independent of $(x,y)$ since the choice of “high” ensures that high scenarios occur with small probability; this is shown by inequalities (18), (19).

Since ${\textstyle\operatorname*{E}^{h}_{A\sim p}}\bigl{[}.\bigr{]}$ is linear in $p$ , the expectation of ${\textstyle\operatorname*{E}^{h}_{A\sim\widehat{p}}}\bigl{[}.\bigr{]}$ , over the choice of $\widehat{p}$ , is precisely ${\textstyle\operatorname*{E}^{h}_{A\sim\mathring{p}}}\bigl{[}.\bigr{]}$ . Thus, among our multiple SAA problems (involving empirical estimates $\widehat{p}^{i}$ of $\mathring{p}$ ), we can guarantee by Markov’s inequality that (with high probability) for at least one of them, ${\textstyle\operatorname*{E}^{h}_{A\sim\widehat{p}^{i}}}\bigl{[}.\bigr{]}$ will be close to ${\textstyle\operatorname*{E}^{h}_{A\sim\mathring{p}}}\bigl{[}.\bigr{]}$ . It follows that an $\alpha$ -approximate solution to this SAA problem is also an $\alpha\bigl{(}1+O(\varepsilon)\bigr{)}$ -approximate solution to the original problem. But we do not a priori know this index $i$ , and evaluating or estimating ${\textstyle\operatorname*{E}_{A\sim\mathring{p}}}\bigl{[}\overline{g}(x,y,A)\bigr{]}$ (and hence, $\overline{h}({\mathring{p}}\,;{x,y})$ ) is challenging because (other than the difficulty of evaluating $\overline{g}(x,y,A)$ for a specific scenario $A$ ) $\mathring{p}$ can have exponential support; in fact, this is often #P-hard even for standard 2-stage problems. In [4], it is shown that if one can estimate the objective value $\overline{h}({\widehat{p}}\,;{x,y})$ for the SAA problem (which seems easier since $\widehat{p}$ has polynomial support), then choosing the (solution corresponding to the) SAA problem with best SAA objective value works.

In our case, we actually want to evaluate $h({\mathring{p}}\,;{x,y})$ , or roughly equivalently (by Lemma 3.4), the objective $\overline{h}({\mathring{p}}\,;{x,y})+z^{\mathrm{lg}}({\mathring{p}}\,;{0})$ for the solution returned by the SAA problem. While we can once again decompose ${\textstyle\operatorname*{E}_{A\sim p}}\bigl{[}g(x,y,A)\bigr{]}$ into ${\textstyle\operatorname*{E}^{l}_{A\sim p}}\bigl{[}.\bigr{]}$ and ${\textstyle\operatorname*{E}^{h}_{A\sim p}}\bigl{[}.\bigr{]}$ , as with ${\textstyle\operatorname*{E}^{h}_{A\sim p}}\bigl{[}.\bigr{]}$ , the term $z^{\mathrm{lg}}({p}\,;{0})$ could have very different contributions in the SAA and original problems, and we need to reason about this separately. Moreover, a complicating factor is that this term is not linear in $p$ . We show in Claim A.3 that this term is concave in $p$ , and this allows us to still use Markov’s inequality as above. In the proof below, we consider the combined term ${\textstyle\operatorname*{E}^{h}_{A\sim p}}\bigl{[}.\bigr{]}+z^{\mathrm{lg}}({p}\,;{0})$ , and apply Markov’s inequality to show that among our multiple SAA problems, there is some index $t$ for which this term is close to ${\textstyle\operatorname*{E}^{h}_{A\sim\mathring{p}}}\bigl{[}.\bigr{]}+z^{\mathrm{lg}}({\mathring{p}}\,;{0})$ ; see inequality (21).

Finally, we show that, although we do not know $t$ , and we do not know how to evaluate $h({\widehat{p}}\,;{x,y})$ or $\overline{h}({\widehat{p}}\,;{x,y})$ , the index $j$ corresponding to the best $f^{i}$ estimate works as well as $t$ ; this is captured by (23).

Details.

Instead of directly working with $h({p}\,;{x})$ and $h({p}\,;{x,y})$ , we will work with the quantities $\overline{h}({p}\,;{x})+z^{\mathrm{lg}}({p}\,;{0})$ and $\overline{h}({p}\,;{x,y})+z^{\mathrm{lg}}({p}\,;{0})$ . It will be cumbersome to carry around the $z^{\mathrm{lg}}({p}\,;{0})$ term, so we define $\widetilde{h}({p}\,;{x}):=\overline{h}({p}\,;{x})+z^{\mathrm{lg}}({p}\,;{0})$ , and $\widetilde{h}({p}\,;{x,y}):=\overline{h}({p}\,;{x,y})+z^{\mathrm{lg}}({p}\,;{0})$ . To simplify further, we abbreviate the notation as follows. The convention we follow is that whenever there is an index $i$ in the superscript of a quantity, it refers to that quantity for the central distribution $\widehat{p}^{i}$ of the $i$ -th SAA problem. So we use

–

$\widehat{h}^{i}(x)$ and $\widehat{h}^{i}(x,y)$ to denote $h({\widehat{p}^{i}}\,;{x})$ and $h({\widehat{p}^{i}}\,;{x,y})$ respectively;

–

$\overline{h}^{i}(x)$ and $\overline{h}^{i}(x,y)$ to denote $\overline{h}({\widehat{p}^{i}}\,;{x})$ and $\overline{h}({\widehat{p}^{i}}\,;{x,y})$ respectively;

–

$\widetilde{h}^{i}(x)$ and $\widetilde{h}^{i}(x,y)$ to denote $\widetilde{h}({\widehat{p}^{i}}\,;{x})$ and $\widetilde{h}({\widehat{p}^{i}}\,;{x,y})$ respectively;

–

$\widehat{z}^{\mathrm{sh},{i}}(x)$ and $\widehat{z}^{\mathrm{lg},{i}}$ to denote $z^{\mathrm{sh}}({\widehat{p}^{i}}\,;{x})$ and $z^{\mathrm{lg}}({\widehat{p}^{i}}\,;{0})$ respectively.

We focus on showing that

[TABLE]

Combining this with Lemma 3.4 completes the proof.

Let $\eta^{\prime}:=\frac{\eta}{2+8\varepsilon}$ . Define $Y:=\{0,\tau\}\cup\bigl\{\text{integer multiples of }\frac{\eta^{\prime}}{\lambda r}\text{ in }[0,\tau]\bigr\}$ . Note that $|Y|=O\bigl{(}\frac{\tau\lambda r}{\eta^{\prime}}\bigr{)}$ . (Footnote 8: The discretization considered in [4] is incorrect: it assumes implicitly that the search region of the SAA problem is (or may be) restricted to points whose first-stage cost is within some factor of the optimum of the original problem, but this need not hold. It also assumes that the grid points lie in the feasible region, which again need not hold.)

Claim A.1.

The discretized 2-stage problem $\min_{x\in X,y\in Y}\overline{h}({p}\,;{x,y})$ satisfies properties (P1), (P2) with inflation parameter $\Lambda=\lambda$ , i.e., we have

[TABLE]

Claim A.2.

For any $x\in X$ , $y\in[0,\tau]$ , and any distribution $p$ , there is some $y^{\prime}\in Y$ such that $\overline{h}({p}\,;{x,y})-\eta^{\prime}\leq\overline{h}({p}\,;{x,y^{\prime}})\leq\overline{h}({p}\,;{x,y})+\eta^{\prime}$ .

Proof.

There is some $y^{\prime}\in Y$ with $|y-y^{\prime}|\leq\frac{\eta^{\prime}}{\lambda r}$ . If $y^{\prime}\geq y$ , then $\overline{h}({p}\,;{x,y^{\prime}})\leq\overline{h}({p}\,;{x,y})+r\cdot\frac{\eta^{\prime}}{\lambda r}\leq\overline{h}({p}\,;{x,y})+\eta^{\prime}$ . Also, $\overline{g}(x,y,A)\leq\overline{g}(x,y^{\prime},A)+(y^{\prime}-y)\cdot\lambda r$ for all $A$ , so $\overline{h}({p}\,;{x,y})\leq\overline{h}({p}\,;{x,y^{\prime}})+\eta^{\prime}$ . If $y^{\prime}<y$ , then we can interchange the arguments; the claim follows. ∎

We now adapt and generalize the arguments in [4]. Let $\bar{x}\in X$ be an optimal solution to $\min_{x\in X}\overline{h}({\mathring{p}}\,;{x})$ , which is also an optimal solution to $\min_{x\in X}\widetilde{h}({\mathring{p}}\,;{x})$ . Let $\overline{O}:=\overline{h}({\mathring{p}}\,;{\bar{x}})$ . Let $y^{*}\in[0,\tau]$ be such that $\overline{O}=\overline{h}({\mathring{p}}\,;{\bar{x},y^{*}})$ , and let $\bar{y}\in Y$ given by Claim A.2 be such that $\overline{O}-\eta^{\prime}\leq\overline{h}({\mathring{p}}\,;{\bar{x},\bar{y}})\leq\overline{O}+\eta^{\prime}$ .

Let $H=\frac{2\lambda}{\varepsilon}\cdot\overline{O}$. Call a scenario $A$ “high” if $\overline{g}(0,0,A)>H$, and “low” otherwise. Let $\mathring{p}^{h}=\sum_{A:A\text{ is high}}\mathring{p}_{A}$. We use ${\textstyle\operatorname*{E}^{l}_{A}}\bigl{[}.\bigr{]}$ (respectively ${\textstyle\operatorname*{E}^{h}_{A}}\bigl{[}.\bigr{]}$ ) to denote the expectation ${\textstyle\operatorname*{E}_{A\sim\mathring{p}}}\bigl{[}.\bigr{]}$ where non-low (respectively non-high) scenarios contribute 0 (so ${\textstyle\operatorname*{E}_{A\sim\mathring{p}}}\bigl{[}.\bigr{]}={\textstyle\operatorname*{E}^{l}_{A}}\bigl{[}.\bigr{]}+{\textstyle\operatorname*{E}^{h}_{A}}\bigl{[}.\bigr{]}$ ). Let $\widehat{p}^{i,h}$, $\widehat{\operatorname*{E}}^{{i},l}_{A}\bigl{[}.\bigr{]}$, and $\widehat{\operatorname*{E}}^{{i},h}_{A}\bigl{[}.\bigr{]}$ denote these quantities for the $i$-th SAA problem. Since $\overline{h}({\mathring{p}}\,;{\bar{x}})\geq{\textstyle\operatorname*{E}^{h}_{A}}\bigl{[}\overline{g}(\bar{x},y^{*},A)\bigr{]}\geq\mathring{p}^{h}\bigl{(}H-\lambda(c^{\intercal}\bar{x}+ry^{*})\bigr{)}$ (the second inequality is due to Claim A.1) and $c^{\intercal}\bar{x}+ry^{*}\leq\overline{O}$, for $\overline{O}>0$ we have $\mathring{p}^{h}\leq\frac{\overline{O}}{H-\lambda\overline{O}}=\frac{\varepsilon}{(2-\varepsilon)\lambda}\leq\frac{\varepsilon}{\lambda}$. (Footnote 9: If $\overline{O}=0$, then $c^{\intercal}\bar{x}+ry^{*}=0$, and $\overline{g}(\bar{x},y^{*},A)=0$ for all $A$ with $\mathring{p}_{A}>0$. Therefore, $\overline{g}(0,0,A)=0=H$ for all $A$ with $\mathring{p}_{A}>0$, and all scenarios in the support of $\mathring{p}$ are low scenarios.) The sample size $N$ is chosen so that Chernoff bounds ensure that with probability at least $1-\delta$, for every $i$, we have $\widehat{p}^{i,h}\leq\frac{2\varepsilon}{\lambda}$. Hence,

[TABLE]
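The high/low split used above can be illustrated on an explicit finite distribution. The sketch below is only an illustration under stated assumptions: `split_high_low` is a hypothetical helper, `probs` stands in for $\mathring{p}$, `base_costs` for the values $\overline{g}(0,0,A)$, and `opt_value` for $\overline{O}$; in the black-box setting none of these are available explicitly.

```python
def split_high_low(probs, base_costs, lam, eps, opt_value):
    """Classify scenarios by the threshold H = (2 * lam / eps) * opt_value.

    probs[a] plays the role of the scenario probability and base_costs[a]
    the role of g-bar(0, 0, A). Returns (H, high_mass, exp_low, exp_high),
    where exp_low and exp_high are the contributions of low and high
    scenarios to the expectation of base_costs.
    """
    H = (2 * lam / eps) * opt_value
    high_mass = 0.0
    exp_low = 0.0
    exp_high = 0.0
    for p, cost in zip(probs, base_costs):
        if cost > H:  # "high" scenario
            high_mass += p
            exp_high += p * cost
        else:  # "low" scenario
            exp_low += p * cost
    return H, high_mass, exp_low, exp_high
```

By construction, `exp_low + exp_high` equals the full expectation, mirroring the identity $\operatorname{E}_{A\sim\mathring{p}}[.]=\operatorname{E}^{l}_{A}[.]+\operatorname{E}^{h}_{A}[.]$.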

Since $\overline{g}(x,y,A)\leq\overline{g}(0,0,A)\leq H$ for all low scenarios $A$ and all $(x,y)\in X\times\mathbb{R}_{+}$ , the choice of $N$ shows that, again using Chernoff bounds, with probability $1-\delta$ , we have

[TABLE]

Next, we argue that there is some index $t$ such that $\widehat{\operatorname*{E}}^{{t},h}_{A}\bigl{[}\overline{g}(0,0,A)\bigr{]}+\widehat{z}^{\mathrm{lg},{t}}$ is close to ${\textstyle\operatorname*{E}^{h}_{A}}\bigl{[}\overline{g}(0,0,A)\bigr{]}+z^{\mathrm{lg}}({\mathring{p}}\,;{0})$. For every $i$, the expected value of $\widehat{\operatorname*{E}}^{{i},h}_{A}\bigl{[}\overline{g}(0,0,A)\bigr{]}$ is precisely ${\textstyle\operatorname*{E}^{h}_{A}}\bigl{[}\overline{g}(0,0,A)\bigr{]}$, so we can use Markov’s inequality. It is trickier to reason about the expected value of $\widehat{z}^{\mathrm{lg},{i}}$, since $z^{\mathrm{lg}}({p}\,;{0})$ is not linear in $p$.

Claim A.3.

$z^{\mathrm{lg}}({p}\,;{0})$ is a concave function of $p$.

Proof.

Consider any two distributions $p$ and $q$, and $\bar{p}=\theta\cdot p+(1-\theta)\cdot q$, where $\theta\in[0,1]$. Let $\gamma^{p}$ and $\gamma^{q}$ be the optimal solutions to the optimization problems defining $z^{\mathrm{lg}}({p}\,;{0})$ and $z^{\mathrm{lg}}({q}\,;{0})$. Then, $\theta\cdot\gamma^{p}+(1-\theta)\cdot\gamma^{q}$ is a feasible solution to the optimization problem defining $z^{\mathrm{lg}}({\bar{p}}\,;{0})$, and its objective value is $\theta\cdot z^{\mathrm{lg}}({p}\,;{0})+(1-\theta)\cdot z^{\mathrm{lg}}({q}\,;{0})$. Hence $z^{\mathrm{lg}}({\bar{p}}\,;{0})\geq\theta\cdot z^{\mathrm{lg}}({p}\,;{0})+(1-\theta)\cdot z^{\mathrm{lg}}({q}\,;{0})$. ∎

Using the above claim and Jensen’s inequality, we obtain that the expected value of $\widehat{z}^{\mathrm{lg},{i}}$ is at most $z^{\mathrm{lg}}({\mathring{p}}\,;{0})$ . Therefore, by Markov’s inequality, we have that the event $\widehat{\operatorname*{E}}^{{i},h}_{A}\bigl{[}\overline{g}(0,0,A)\bigr{]}+\widehat{z}^{\mathrm{lg},{i}}>(1+\varepsilon)\bigl{(}{\textstyle\operatorname*{E}^{h}_{A}}\bigl{[}\overline{g}(0,0,A)\bigr{]}+z^{\mathrm{lg}}({\mathring{p}}\,;{0})\bigr{)}$ happens with probability at most $\frac{1}{1+\varepsilon}\leq 1-\varepsilon/2$ . The probability that this happens for all $i=1,\ldots,k$ is at most $(1-\varepsilon/2)^{k}\leq\delta$ . So we may assume that there is some index $t\in\{1,\ldots,k\}$ such that

[TABLE]
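The repetition argument above is easy to check numerically. The following is a minimal sketch, assuming $\varepsilon\in(0,1]$ and $\delta\in(0,1)$; the helper names are hypothetical, and the value of $k$ actually fixed in the paper may be chosen differently (for instance, via a cruder closed-form bound).

```python
import math


def markov_failure_bound(eps):
    """Markov bound 1 / (1 + eps) on the probability that a single
    SAA problem overshoots its expectation by a factor (1 + eps)."""
    return 1.0 / (1.0 + eps)


def repetitions_needed(eps, delta):
    """Smallest k with (1 - eps / 2) ** k <= delta, i.e. enough independent
    SAA problems so that all of them overshoot with probability <= delta."""
    return math.ceil(math.log(delta) / math.log(1.0 - eps / 2.0))
```

For example, `repetitions_needed(0.5, 0.01)` asks how many independent SAA problems suffice when each fails with probability at most $0.75$ and the overall failure probability should be at most $0.01$.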

Now we show that the index $j$ obtained from the $f^{i}$ estimates can be used in place of the index $t$ . To do this, we first use the properties of the $f^{i}$ ’s and the index $j$ to relate the quality of $\widehat{x}^{j}$ for the $j$ -th SAA problem to the quality of $(\bar{x},\bar{y})$ under any of the other SAA problems. Let $y^{j}\geq 0$ be such that $\overline{h}^{j}(\widehat{x}^{j})=\overline{h}^{j}(\widehat{x}^{j},y^{j})$ , and let $\widehat{y}^{j}$ be the point in $Y$ given by Claim A.2. We have that for every $i=1,\ldots,k$ ,

[TABLE]

The first inequality follows from Claim A.2; the second follows from Lemma 3.4; the next three inequalities follow from the properties of the $f^{i}$ estimates, and the choice of index $j$ ; the last inequality again uses Lemma 3.4, and that $\widetilde{h}^{i}(\bar{x})\leq\widetilde{h}^{i}(\bar{x},y)$ for any $y\geq 0$ .

Let $\alpha=2\beta\rho$ . Let $\Delta^{j}={\textstyle\operatorname*{E}^{h}_{A}}\bigl{[}\overline{g}(0,0,A)\bigr{]}+z^{\mathrm{lg}}({\mathring{p}}\,;{0})-\widehat{\operatorname*{E}}^{{j},h}_{A}\bigl{[}\overline{g}(0,0,A)\bigr{]}-\widehat{z}^{\mathrm{lg},{j}}$ , and $\Delta^{t}={\textstyle\operatorname*{E}^{h}_{A}}\bigl{[}\overline{g}(0,0,A)\bigr{]}+z^{\mathrm{lg}}({\mathring{p}}\,;{0})-\widehat{\operatorname*{E}}^{{t},h}_{A}\bigl{[}\overline{g}(0,0,A)\bigr{]}-\widehat{z}^{\mathrm{lg},{t}}$ . Applying (22) to $j$ and $t$ , we have $\widetilde{h}^{j}(\widehat{x},\widehat{y}^{j})-\eta^{\prime}\leq\alpha\cdot\widetilde{h}^{j}(\bar{x},\bar{y})$ and $\widetilde{h}^{j}(\widehat{x},\widehat{y}^{j})-\eta^{\prime}\leq\alpha\cdot\widetilde{h}^{t}(\bar{x},\bar{y})$ . Multiplying the first inequality by $\frac{1}{\alpha}$ and the second by $1-\frac{1}{\alpha}$ and adding, we get

[TABLE]
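For readers who want the weighted sum written out, the following is a sketch of the arithmetic, assuming $\alpha\geq 1$ so that both weights are nonnegative. The shorthand $L$ is introduced here only for illustration and is not the paper's notation.

```latex
\begin{align*}
L &:= \widetilde{h}^{j}(\widehat{x},\widehat{y}^{j})-\eta^{\prime}, \\
\tfrac{1}{\alpha}\,L &\leq \widetilde{h}^{j}(\bar{x},\bar{y}), \\
\bigl(1-\tfrac{1}{\alpha}\bigr)\,L &\leq \bigl(1-\tfrac{1}{\alpha}\bigr)\,\alpha\,\widetilde{h}^{t}(\bar{x},\bar{y})
  = (\alpha-1)\,\widetilde{h}^{t}(\bar{x},\bar{y}), \\
L &\leq \widetilde{h}^{j}(\bar{x},\bar{y})+(\alpha-1)\,\widetilde{h}^{t}(\bar{x},\bar{y}).
\end{align*}
```

The last line is the sum of the two preceding ones, since the weights $\frac{1}{\alpha}$ and $1-\frac{1}{\alpha}$ add up to 1.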

We now combine these various inequalities to obtain the desired result. By repeatedly using (18)–(20), we get

[TABLE]

where the last inequality above follows by applying (23). We bound $\widetilde{h}^{j}(\bar{x},\bar{y})+\Delta^{j}$ as follows.

[TABLE]

Similarly, we have

[TABLE]

Substituting $-\Delta^{t}\leq\varepsilon\bigl{(}\overline{O}+z^{\mathrm{lg}}({\mathring{p}}\,;{0})\bigr{)}$ from (21), we can simplify this to

[TABLE]

Finally, substituting this bound and (25), in (24), we obtain

[TABLE]

This implies that

[TABLE]

where $\frac{1+\varepsilon}{1-2\varepsilon}\leq 1+4\varepsilon$ since $\varepsilon\leq\frac{1}{3}$ . This proves (17). Combining this with Lemma 3.4 yields the inequality in Theorem 3.5. The success probability is the probability that inequalities (19)–(21) hold, which is at least $1-3\delta$ . ∎

Bibliography

[1] Shipra Agrawal, Yichuan Ding, Amin Saberi, and Yinyu Ye. Price of Correlations in Stochastic Optimization. Operations Research, 60(1):150–162, 2012.
[2] D. Bertsimas, M. Sim, and M. Zhang. A practicable framework for distributionally robust linear optimization. optimization-online.org, 2013.
[3] John R. Birge and François Louveaux. Introduction to Stochastic Programming. Springer Science & Business Media, June 2011.
[4] Moses Charikar, Chandra Chekuri, and Martin Pál. Sampling bounds for stochastic optimization. In Proceedings of the 8th International Workshop on Approximation, Randomization and Combinatorial Optimization Problems (APPROX), pages 257–269, 2005.
[5] Erick Delage and Yinyu Ye. Distributionally Robust Optimization Under Moment Uncertainty with Application to Data-Driven Problems. Operations Research, 58(3):595–612, 2010.
[6] Kedar Dhamdhere, Vineet Goyal, R. Ravi, and Mohit Singh. How to pay, come what may: Approximation algorithms for demand-robust covering problems. In Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 367–378, 2005.
[7] Devdatt Dubhashi and Alessandro Panconesi. Concentration of Measure for the Analysis of Randomized Algorithms. Cambridge University Press, New York, NY, USA, 1st edition, 2009.
[8] Emre Erdoğan and Garud Iyengar. Ambiguous chance constrained problems and robust optimization. Math. Program., 107(1-2):37–61, December 2005.
