Approximation Algorithms for Distributionally Robust Stochastic Optimization with Black-Box Distributions
Andre Linhares, Chaitanya Swamy

TL;DR
This paper develops approximation algorithms for distributionally robust stochastic optimization problems where the underlying distribution is uncertain and accessed via black-box sampling, extending solutions to classic combinatorial problems.
Contribution
It introduces a framework using sample average approximation and LP-rounding to solve distributionally robust problems with black-box distributions, achieving near-optimal guarantees.
Findings
First approximation algorithms for distributionally robust set cover, vertex cover, edge cover, facility location, and Steiner tree.
Guarantees within O(1) factors of deterministic problem solutions for most cases.
Framework applicable to problems with uncertain distributions accessed via sampling.
Table 1: Approximation guarantees for DR 2-stage facility location, vertex cover, edge cover, set cover, and Steiner tree under Wasserstein metrics and the other metrics considered (problem definitions in § 2). [Table entries not recoverable.]
A preliminary version [26] appeared in the Proceedings of the 51st ACM Symposium on Theory of Computing (STOC), 2019.
André Linhares, Chaitanya Swamy. {alinhare,cswamy}@uwaterloo.ca. Dept. of Combinatorics and Optimization, University of Waterloo, Waterloo, ON N2L 3G1. Supported in part by NSERC grant 327620-09 and an NSERC Discovery Accelerator Supplement award.
Abstract
Two-stage stochastic optimization is a widely used framework for modeling uncertainty, where we have a probability distribution over possible realizations of the data, called scenarios, and decisions are taken in two stages: we make first-stage decisions knowing only the underlying distribution and before a scenario is realized, and may take additional second-stage recourse actions after a scenario is realized. The goal is typically to minimize the total expected cost. A common criticism levied at this model is that the underlying probability distribution is itself often imprecise! To address this, an approach that is quite versatile and has gained popularity in the stochastic-optimization literature is the distributionally robust 2-stage model: given a collection $\mathcal{D}$ of probability distributions, our goal now is to minimize the maximum expected total cost with respect to a distribution in $\mathcal{D}$.
There has been almost no prior work, however, on developing approximation algorithms for distributionally robust problems when the underlying scenario set is discrete, as is the case with discrete-optimization problems. We provide a framework for designing approximation algorithms in such settings when the collection $\mathcal{D}$ is a ball around a central distribution and the central distribution is accessed only via a sampling black box.
We first show that one can utilize the sample average approximation (SAA) method (solve the distributionally robust problem with an empirical estimate of the central distribution) to reduce the problem to the case where the central distribution has polynomial-size support. This follows because we argue that a distributionally robust problem can be reduced in a novel way to a standard 2-stage problem with bounded inflation factor, which enables one to use the SAA machinery developed for 2-stage problems. Complementing this, we show how to approximately solve a fractional relaxation of the SAA (i.e., polynomial-scenario central-distribution) problem. Unlike in 2-stage stochastic or robust optimization, this turns out to be quite challenging. We utilize the ellipsoid method in conjunction with several new ideas to show that this problem can be approximately solved provided that we have an (approximation) algorithm for a certain max-min problem that is akin to, and generalizes, the $k$-max-min problem (find the worst-case scenario consisting of at most $k$ elements) encountered in 2-stage robust optimization. We obtain such a procedure for various discrete-optimization problems; by complementing this via LP-rounding algorithms that provide local (i.e., per-scenario) approximation guarantees, we obtain the first approximation algorithms for the distributionally robust versions of a variety of discrete-optimization problems including set cover, vertex cover, edge cover, facility location, and Steiner tree, with guarantees that are, except for set cover, within $O(1)$-factors of the guarantees known for the deterministic version of the problem.
1 Introduction
Stochastic-optimization models capture uncertainty by modeling it via a probability distribution over a collection of possible realizations of the data, called scenarios. An important and widely used model is the 2-stage recourse model, where one seeks to take actions both before and after the data has been realized (stages I and II) so as to minimize the expected total cost incurred. Many applications come under this setting. An oft-cited prototypical example is 2-stage stochastic facility location, wherein one needs to decide where to set up facilities to serve clients. The client-demand pattern is uncertain, but one does have some statistical information about the demands. One can open some facilities initially, given only the distributional information about demands; after a specific demand pattern is realized (according to this distribution), one can take additional recourse actions such as opening more facilities incurring their recourse costs. The recourse costs are usually higher than the first-stage costs, as they may entail making decisions in rapid reaction to the observed scenario (e.g., deploying resources with smaller lead time).
An issue with the above 2-stage model, which is a common source of criticism, is that the distribution modeling the uncertainty is itself often imprecise! Usually, one models the distribution to be statistically consistent with some historical data, so we really have a collection of distributions, and a more robust approach is to hedge against the worst possible distribution. This gives rise to the distributionally robust 2-stage model: the setup is similar to that of the 2-stage model, but we now have a collection $\mathcal{D}$ of probability distributions; our goal is to minimize the maximum expected total cost with respect to a distribution in $\mathcal{D}$. Formally, if $X$ is the set of first-stage actions and the cost associated with $x\in X$ is $c(x)$, we want to solve the following problem:
$$\min_{x\in X}\ \Bigl(c(x)+\max_{q\in\mathcal{D}}\operatorname{E}_{A\sim q}\bigl[g(x,A)\bigr]\Bigr),$$
where $g(x,A):=\min_{\text{second-stage actions }z^{A}}\bigl(\text{cost of }z^{A}\bigr)$.
Distributionally robust (DR) stochastic optimization is a versatile approach dating back to [34] that has (re)gained interest recently in the Operations Research literature, where it is sometimes called data-driven or ambiguous stochastic optimization (see, e.g., [13, 2, 29, 9] and their references). The DR 2-stage model also serves to nicely interpolate between the extremes of: (a) 2-stage stochastic optimization, which optimistically assumes that one knows the underlying distribution precisely (i.e., $|\mathcal{D}|=1$); and (b) 2-stage robust optimization, which abandons the distributional view and seeks to minimize the maximum cost incurred in a scenario, thereby adopting the overly cautious approach of being robust against every possible scenario regardless of how likely it is to materialize; this can be captured by letting $\mathcal{D}=\{\text{all distributions over }\mathcal{A}\}$, where $\mathcal{A}$ is the scenario-collection in the 2-stage robust problem. Both extremes can lead to suboptimal decisions: with stochastic optimization, the optimal solution for a specific distribution could be quite suboptimal even for a "nearby" distribution (there are examples where an optimal solution for one distribution is arbitrarily bad when evaluated under a nearby distribution); with robust optimization, the presence of a single scenario, however unlikely, may force certain decisions that are undesirable for all other scenarios.
Despite its modeling benefits and popularity, to our knowledge, there has been almost no prior work on developing approximation algorithms for DR 2-stage discrete optimization, and, more generally, for DR 2-stage problems with a discrete underlying scenario set (as is the case in discrete optimization). (The exception is [1], which we discuss in Section 1.2. Peripherally related is [40], which considers a version of DR facility location where the uncertainty only influences the costs and not the constraints, which yields a much simpler and more restrictive model.)
1.1 Our contributions
We initiate a systematic study of distributionally robust discrete 2-stage problems from the perspective of approximation algorithms. We develop a general framework for designing approximation algorithms for these problems, when the collection $\mathcal{D}$ is a ball around a central distribution $\mathring{p}$ in the $L_\infty$ metric, $L_1$ metric (total-variation distance), or Wasserstein metric (defined below). (Note that this still allows interpolating between stochastic and robust optimization.) We make no assumptions about $\mathring{p}$; it could have exponential-size support, and our only means of accessing $\mathring{p}$ is via a sampling black box. (The DR problem remains challenging even if $\mathring{p}$ has polynomial-size support but the scenario collection is exponentially large.) We view sampling from the black box as an elementary operation, so our running time bounds also imply sample-complexity bounds. Settings where $\mathcal{D}$ is a ball in some probability metric arise naturally when one tries to infer a scenario distribution from observed data (see, e.g., [8, 9, 41]), hence the moniker data-driven optimization, and it has been argued that defining $\mathcal{D}$ using the Wasserstein metric has various benefits [9, 41, 13, 29].
We view the frameworks that we develop for DR discrete 2-stage problems as our chief contribution, and the techniques that we devise for dealing with Wasserstein metrics as the main feature of our work (see Theorem 1 below). We demonstrate the utility of our frameworks by using them to obtain the first approximation guarantees for the distributionally robust versions of various discrete-optimization problems such as set cover, vertex cover, edge cover, facility location, and Steiner tree. The guarantees that we obtain are, in most cases, within $O(1)$-factors of the guarantees known for the deterministic (and 2-stage-{stochastic, robust}) counterpart of the problem (see Table 1).
Formal model description.
We study the following distributionally robust 2-stage model. We are given an underlying set $\mathcal{A}$ of scenarios, and a ball $\mathcal{D}=\{q:L(\mathring{p},q)\leq r\}$ of distributions around a central distribution $\mathring{p}$ over $\mathcal{A}$ under some metric $L$ on probability distributions. We can take first-stage actions $x\in X$ before a scenario is realized, incurring a first-stage cost $c^{\intercal}x$, and second-stage recourse actions $z^{A}$ after a scenario $A\in\mathcal{A}$ is realized; the combination of first- and second-stage actions for a scenario must yield a feasible solution for each scenario $A\in\mathcal{A}$. Using $A\sim q$ to denote that scenario $A$ is drawn according to distribution $q$, we want to solve: $\min_{x\in X}\ \bigl(c^{\intercal}x+\max_{q:L(\mathring{p},q)\leq r}\operatorname{E}_{A\sim q}\bigl[\text{cost of }z^{A}\bigr]\bigr)$.
The input size always measures the encoding size of the underlying deterministic problem, along with the first- and second-stage costs and the radius $r$ of the ball $\mathcal{D}$. It is standard in the study of 2-stage problems in the CS literature to assume that every first-stage action has a corresponding recourse action (e.g., facilities may be opened in either stage). We use $\lambda$ to denote an inflation parameter that measures the maximum factor by which the cost of a first-stage action increases in the second stage. We consider the cases where $L$ is the $L_\infty$ metric; the $L_1$ metric, which is the total-variation distance; or a Wasserstein metric.
To motivate and define the rich class of Wasserstein metrics, note that while the choice of $\mathring{p}$ is a problem-dependent modeling decision, we would like the ball $\mathcal{D}$ to contain other "reasonably similar" distributions, and exclude completely unrelated distributions, as the latter could lead to overly conservative decisions, à la robust optimization. One way of measuring the similarity between two distributions is to see if they spread their probability mass on "similar" scenarios. Wasserstein metrics capture this viewpoint crisply, and lift an underlying scenario metric $\ell$ to a metric on distributions over scenarios. The Wasserstein distance between two distributions $p$ and $q$ is the minimal cost of moving probability mass to transform $p$ into $q$, where the cost of moving a unit of mass from scenario $A$ to scenario $A'$ is $\ell(A,A')$. (Observe that $L_1$ is the Wasserstein metric with respect to the discrete scenario metric: $\ell(A,A')=1$ if $A\neq A'$, and $0$ otherwise.)
Example: DR 2-stage facility location. As a concrete example, consider the DR version of 2-stage facility location. We have a metric space $\bigl(\mathcal{F}\cup\mathcal{C},\{w_{ij}\}_{i,j\in\mathcal{F}\cup\mathcal{C}}\bigr)$, where $\mathcal{F}$ is a set of facilities, and $\mathcal{C}$ is a set of clients. A scenario is a subset of $\mathcal{C}$ indicating the set of clients that need to be served in that scenario. (We can model integer demands by creating co-located clients.) We may open a facility $i\in\mathcal{F}$ in stages I or II, incurring costs of $f_{i}$ and $f^{\mathrm{II}}_{i}$ respectively. In scenario $A$, we need to assign every $j\in A$ to a facility opened in stage I or in scenario $A$; the second-stage cost of scenario $A$ is the cost of the facilities opened in scenario $A$ plus the total assignment cost of the clients in $A$. The goal is to minimize $\sum_{i\text{ opened in stage I}}f_{i}+\max_{q:L(\mathring{p},q)\leq r}\operatorname{E}_{A\sim q}\bigl[\text{second-stage cost of }A\bigr]$. Here $\lambda=\max_{i\in\mathcal{F}}f^{\mathrm{II}}_{i}/f_{i}$, and the input size is the encoding size of $\bigl(\mathcal{F},\mathcal{C},w,f,f^{\mathrm{II}},r\bigr)$.
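To make the second-stage cost in this example concrete, here is a minimal brute-force sketch in Python. It assumes the standard recourse cost (second-stage opening costs plus nearest-open-facility assignment costs); the function name `second_stage_cost`, the list-based encoding of the assignment costs and second-stage facility costs, and the toy numbers are all hypothetical and chosen only for illustration. It enumerates every set of extra facilities, so it is only usable on tiny instances.

```python
from itertools import combinations

def second_stage_cost(first_stage_open, scenario, w, f2):
    """Cheapest recourse for one scenario, by brute force over extra facilities.

    first_stage_open: set of facility indices opened in stage I
    scenario: iterable of client indices that must be served
    w[i][j]: assignment cost of client j to facility i (assumed encoding)
    f2[i]: stage-II opening cost of facility i (assumed encoding)
    """
    scenario = list(scenario)
    closed = [i for i in range(len(f2)) if i not in first_stage_open]
    best = float("inf")
    for size in range(len(closed) + 1):
        for extra in combinations(closed, size):
            open_now = set(first_stage_open) | set(extra)
            if scenario and not open_now:
                continue  # clients exist but nothing is open: infeasible
            opening = sum(f2[i] for i in extra)
            assignment = sum(min(w[i][j] for i in open_now) for j in scenario)
            best = min(best, opening + assignment)
    return best

# Toy instance: two facilities, two clients.
w = [[1, 5], [5, 1]]
f2 = [3, 3]
print(second_stage_cost({0}, [0, 1], w, f2))  # open facility 1 in stage II
print(second_stage_cost({0}, [0], w, f2))     # no recourse opening needed
```

In the first call it is cheaper to open facility 1 in the second stage than to serve client 1 from facility 0; in the second call the first-stage facility suffices.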
We consider two common choices for $\mathcal{A}$: (a) the unrestricted setting: $\mathcal{A}=2^{\mathcal{C}}$, which is the usual setting in 2-stage stochastic optimization; and (b) the $k$-bounded setting: $\mathcal{A}=\{A\subseteq\mathcal{C}:|A|\leq k\}$, which is the usual setup in 2-stage robust optimization for modeling an exponential number of scenarios [11, 23, 17]. These two settings for $\mathcal{A}$ arise for other problems as well (with $\mathcal{C}$ replaced by a suitable ground set).
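The two scenario collections can be enumerated explicitly for a tiny ground set, which makes their sizes easy to compare. The following Python sketch is illustrative only: the helper name `k_bounded_scenarios` is hypothetical, and it assumes the empty scenario is allowed.

```python
from itertools import combinations

def k_bounded_scenarios(ground_set, k):
    """All subsets of ground_set with at most k elements (empty set included)."""
    items = sorted(ground_set)
    return [
        frozenset(c)
        for size in range(min(k, len(items)) + 1)
        for c in combinations(items, size)
    ]

clients = range(4)
bounded = k_bounded_scenarios(clients, 2)        # k-bounded setting with k = 2
unrestricted = k_bounded_scenarios(clients, 4)   # k = |ground set| gives all subsets
print(len(bounded), len(unrestricted))
```

Setting `k` to the size of the ground set recovers the unrestricted setting, whose size grows exponentially; this is why the algorithms in the text never enumerate scenarios explicitly.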
In addition to being the or metrics, we can consider various ways of defining a scenario metric in terms of the underlying assignment-cost metric to capture that two scenarios involving demand locations in the same vicinity are deemed similar; lifting these scenario metrics to the Wasserstein metric over distributions yields a rich class of DR 2-stage facility location models. For instance, we can define the asymmetric metric , where , which measures the maximum separation between clients in and locations in (the resulting Wasserstein metric will now be an asymmetric metric on distributions). (There are other natural scenario metrics: the asymmetric metric , and the symmetrizations of these asymmetric metrics:
Our results.
Our main result pertains to Wasserstein metrics, which have a great deal of modeling power. Let $L_{\mathrm{W}}$ be the Wasserstein metric with respect to a scenario metric $\ell$. To gain mathematical traction, it will be convenient to move to a relaxation of the DR 2-stage problem where we allow fractional second-stage decisions. Let $g(x,A)$ be the optimal second-stage cost of scenario $A$ given $x$ as the first-stage actions when we allow fractional second-stage actions. (We will obtain integral second-stage actions by rounding an optimal solution to $g(x,A)$ using an LP-relative approximation algorithm for the deterministic problem.)
We relate the approximability of the DR problem to that of known tasks in 2-stage-stochastic- and deterministic- optimization, and the following deterministic problem:
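$$\max_{A'\in\mathcal{A}}\ \bigl(g(x,A')-y\cdot\ell(A,A')\bigr),\qquad\text{given }(x,y,A).$$
Notice that this problem ties together three distinct sources of complexity in the DR 2-stage problem: the combinatorial complexity of the underlying optimization problem, captured by $g(x,A')$; the complexity of the scenario set $\mathcal{A}$; and the complexity of the scenario metric $\ell$, captured by the $y\cdot\ell(A,A')$ term.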
Theorem 1 (Combination of Theorems 3.5 and 3.7).
Suppose that we have the following.
(1) A $(\beta_{1},\beta_{2})$-approximation algorithm for the above max-min problem, which is an algorithm that given $(x,y,A)$ returns $\overline{A}\in\mathcal{A}$ such that $g(x,\overline{A})-y\cdot\ell(A,\overline{A})\geq\max_{A'\in\mathcal{A}}\bigl(\frac{g(x,A')}{\beta_{1}}-\beta_{2}\cdot y\cdot\ell(A,A')\bigr)$;
(2) A local -approximation algorithm for the underlying 2-stage problem, which is an algorithm that rounds a fractional first-stage solution to an integral one while incurring at most a -factor blowup in the first-stage cost, and in the cost of each scenario; and
(3) An LP-relative -approximation algorithm for the underlying deterministic problem.
Then we can obtain an $O(\alpha\beta_{1}\beta_{2}\rho+\varepsilon)$-approximation for the DR problem in time $\operatorname{\mathsf{poly}}\bigl(\text{input size},\frac{\lambda}{\varepsilon}\bigr)$.
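The bicriteria guarantee in ingredient (1) can be checked by brute force on a small explicit scenario set. The Python sketch below is only an illustration of the inequality, not an algorithm from the paper: the function name `satisfies_guarantee`, the list `g` holding the values $g(x,\cdot)$ for one fixed $x$, and the matrix `ell` holding the scenario metric are all assumed encodings.

```python
def satisfies_guarantee(g, ell, y, A, A_bar, beta1, beta2):
    """Check g[A_bar] - y*ell[A][A_bar] >= max_B (g[B]/beta1 - beta2*y*ell[A][B]).

    g[B]: second-stage cost of scenario B for a fixed first-stage solution
    ell[A][B]: scenario metric (assumed to be given as a full matrix)
    """
    lhs = g[A_bar] - y * ell[A][A_bar]
    rhs = max(g[B] / beta1 - beta2 * y * ell[A][B] for B in range(len(g)))
    return lhs >= rhs

# Two scenarios under the discrete scenario metric.
g = [1.0, 3.0]
ell = [[0, 1], [1, 0]]

print(satisfies_guarantee(g, ell, y=1.0, A=0, A_bar=1, beta1=1, beta2=1))
print(satisfies_guarantee(g, ell, y=1.0, A=0, A_bar=0, beta1=1, beta2=1))
print(satisfies_guarantee(g, ell, y=1.0, A=0, A_bar=0, beta1=2, beta2=1))
```

Here scenario 1 is the exact maximizer, so it passes with factors $(1,1)$; scenario 0 fails the exact test but passes once $\beta_1$ is relaxed to 2.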
Ingredients (2) and (3) can be obtained using known results for 2-stage-stochastic and deterministic optimization; ingredient (1) is the new component we need to supply to instantiate Theorem 1 and obtain results for specific DR 2-stage problems. (The non-standard notion of approximation for this max-min problem is necessary, as the mixed-sign objective precludes any guarantee under the standard notion of approximation; see Theorem 3.12.) In various settings, we show that a $(\beta_{1},\beta_{2})$-approximation can be obtained by utilizing results for the simpler max-min problem $\max_{A'\in\mathcal{A}}g(x,A')$ encountered in 2-stage robust optimization (see the proof of Theorem 3.14 in Section 3.3.6): in the $k$-bounded setting, where every subset of at most $k$ elements is a scenario, this is called the $k$-max-min problem [11, 23, 17]. In particular, this applies to the $L_1$ metric, as in this case $\ell$ is the discrete scenario metric.
Corollary 1.
Consider a DR 2-stage problem where the Wasserstein metric is the $L_1$ metric. Suppose that we have a $\beta$-approximation for the problem $\max_{A\in\mathcal{A}}g(x,A)$ (given $x$ as input), and we have ingredients (2) and (3) in Theorem 1. Then we can obtain an $O(\alpha\beta\rho+\varepsilon)$-approximation for the DR problem in time $\operatorname{\mathsf{poly}}\bigl(\text{input size},\frac{\lambda}{\varepsilon}\bigr)$.
Theorem 1 (to a partial extent) and Corollary 1 thus provide novel, useful reductions from DR 2-stage optimization to 2-stage {stochastic, robust} (and deterministic) optimization. (For instance, [15] devise approximations for the max-min problem in Corollary 1 for scenario sets defined by matroid-independence and/or knapsack constraints; Corollary 1 enables us to export these guarantees to the corresponding DR 2-stage problem with the $L_1$ metric.) In some cases, we can improve upon the guarantees in Theorem 1. For certain covering problems, [35] showed how to obtain via a decoupling idea; by incorporating this idea within our reduction, we can improve the guarantee in Theorem 1 and obtain an -approximation (see "Set cover" in Section 3.3).
We demonstrate the versatility of our framework by applying Theorem 1 and these refinements to obtain guarantees for the DR versions of set cover, vertex cover, edge cover, facility location, and Steiner tree (Section 3.3). These constitute the majority of problems investigated for 2-stage optimization. Our strongest results are for facility location, vertex cover, and edge cover; for Steiner tree, we obtain results in the unrestricted setting. Table 1 summarizes these results.
Technical takeaways for DR problems with Wasserstein metrics.
The reduction in Theorem 1 is obtained by supplementing tools from 2-stage {stochastic, robust} optimization with various additional ideas. Its proof consists of two main components, both of which are of independent interest.
Sample average approximation (SAA) for DR problems.
In Section 3.1, we prove that a simple and appealing approach in stochastic optimization called the SAA method can be applied to reduce the DR problem to the setting where $\mathring{p}$ has a polynomial-size support. In the SAA method, we draw some samples to estimate $\mathring{p}$ by its empirical distribution $\widehat{p}$, and solve the distributionally robust problem for $\widehat{p}$. We show that (roughly speaking) by taking $N=\operatorname{\mathsf{poly}}\bigl(\text{input size},\frac{\lambda}{\varepsilon}\bigr)$ samples, we can ensure that a -approximate oracle for the SAA objective value can be combined with a -approximation algorithm for the SAA problem, to obtain an -approximate solution to the original problem, with high probability (see Theorem 3.5). It is well known that samples are needed even for (standard) 2-stage stochastic problems in the black-box model [35]. Our SAA result substantially expands the scope of problems for which the SAA method is known to be effective (with sample size). Previously, such results were known for the special case of 2-stage stochastic problems [4, 38] (see also [24]), and multi-stage stochastic problems with a constant number of stages [38] (for ).
Proving our SAA result requires augmenting the SAA machinery for 2-stage stochastic problems [4, 38] with various new ingredients to deal with the challenges presented by DR problems. We elaborate in Section 3.1.
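The first step of the SAA method described above, replacing the central distribution by the empirical distribution of samples drawn from the black box, can be sketched in a few lines of Python. The names `empirical_distribution` and `black_box` are hypothetical, and the toy sampler (each client appears independently with probability 1/2) is an assumption made purely for illustration; the number of samples needed for the guarantee is the subject of Theorem 3.5 and is not reproduced here.

```python
import random
from collections import Counter

def empirical_distribution(sample_scenario, num_samples):
    """Estimate the central distribution from black-box samples.

    sample_scenario: zero-argument callable returning a hashable scenario
    Returns a dict mapping each observed scenario to its empirical frequency.
    """
    counts = Counter(sample_scenario() for _ in range(num_samples))
    return {scenario: c / num_samples for scenario, c in counts.items()}

# Toy black box: each client is active independently with probability 1/2.
rng = random.Random(0)
clients = ["a", "b", "c"]

def black_box():
    return frozenset(j for j in clients if rng.random() < 0.5)

p_hat = empirical_distribution(black_box, 1000)
print(len(p_hat), "distinct scenarios observed")
```

The support of `p_hat` has size at most the number of samples, which is the sense in which SAA reduces the problem to a central distribution with polynomial-size support.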
Solving the polynomial-size central-distribution case.
Complementing the above SAA result, we show how to approximately solve the DR 2-stage problem with a polynomial-size central distribution (Section 3.2). It is natural to move to a fractional relaxation of the problem, by replacing the first-stage set $X$ by a suitable polytope. In stark contrast with 2-stage {stochastic, robust} optimization, where the fractional relaxation of the polynomial-scenario problem immediately gives a polynomial-size LP and is therefore straightforward to solve in polytime, it is substantially more challenging to even approximately solve the fractional DR problem with a polynomial-size central distribution. In fact, this is perhaps the technically more challenging part of the paper. The crux of the problem is that, while $\widehat{p}$ has polynomial-size support, there are (numerous) distributions in $\mathcal{D}$ that have exponential-size support, and one needs to optimize over such distributions. In particular, if we use duality to reformulate the problem $\max_{q:L_{\mathrm{W}}(\widehat{p},q)\leq r}\operatorname{E}_{A\sim q}\bigl[g(x,A)\bigr]$ as a minimization LP, this leads to an LP with an exponential number of both constraints and variables (see the discussion in Section 3.2). Thus, while we started with a polynomial-support central distribution, we have ended up in a situation similar to that in 2-stage stochastic or robust optimization with an exponential number of scenarios!
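For a scenario set small enough to enumerate, the worst-case expectation over a Wasserstein ball can be evaluated through the standard Lagrangian dual $\min_{y\geq 0}\bigl(y\,r+\sum_{A}p_A\max_{A'}(g(A')-y\,\ell(A,A'))\bigr)$, whose inner maximization has the same shape as the max-min problem discussed earlier. The Python sketch below assumes this textbook dual (it is not claimed to be the exact reformulation used in Section 3.2), assumes $\ell(A,A)=0$, and minimizes the convex one-dimensional dual by ternary search; the function name and data layout are hypothetical.

```python
def worst_case_expectation(p, g, ell, r, iters=200):
    """Approximate max of E_q[g] over {q : Wasserstein(p, q) <= r}, via the dual.

    p[a]: central probability of scenario a
    g[b]: second-stage cost of scenario b for a fixed first-stage solution
    ell[a][b]: scenario metric, with ell[a][a] == 0 (assumed)
    """
    n = len(p)

    def dual(y):
        # y * r plus the expected value of the inner max-min problem.
        return y * r + sum(
            p[a] * max(g[b] - y * ell[a][b] for b in range(n))
            for a in range(n)
        )

    positive = [ell[a][b] for a in range(n) for b in range(n) if ell[a][b] > 0]
    if not positive:
        return dual(0.0)
    # Beyond this multiplier no mass moves a positive distance, so the
    # dual is nondecreasing there and the minimum lies in [0, hi].
    lo, hi = 0.0, (max(g) - min(g)) / min(positive)
    for _ in range(iters):  # ternary search on a convex function
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if dual(m1) <= dual(m2):
            hi = m2
        else:
            lo = m1
    return dual((lo + hi) / 2)

p = [0.5, 0.5]
g = [1.0, 3.0]
discrete = [[0, 1], [1, 0]]
for radius in (0.0, 0.25, 1.0):
    print(radius, round(worst_case_expectation(p, g, discrete, radius), 6))
```

With radius 0 the value is the plain expectation, with a large radius it is the maximum scenario cost, and intermediate radii interpolate between the two, matching the stochastic-to-robust interpolation described in the introduction. The sketch enumerates all scenarios, which is exactly what becomes impossible when the scenario set is exponential.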
To surmount these obstacles, we work with the convex program , and solve this approximately by leveraging the ellipsoid-based machinery in [35] (see Theorem 3.7). Not surprisingly, this poses various fresh difficulties, chiefly because we are unable to compute approximate subgradients as required by [35]. We delve into these issues, and the ideas needed to overcome them in Section 3.2.
Approximating .
We use the following natural strategy: "guess" for the optimal , possibly within a -factor, and solve the constrained problem (): . It is easy to show that a -approximation to () yields a -approximation for (Lemma 3.25). In the unrestricted setting (), we will usually be able to solve () exactly, exploiting the fact that our problems are covering problems. In the $k$-bounded setting, we cast () as a $k$-max-min problem (note that is integral), and utilize known results for this problem.
For , the result by [23] requires creating co-located clients, which does not work for us. We illuminate a novel connection between cost-sharing schemes and $k$-max-min problems by showing that a cost-sharing scheme for FL having certain properties can be leveraged to obtain an approximation algorithm for $k$-max-min {integral, fractional} FL (see the proof of Theorem 3.20). In doing so, we also end up improving the approximation factor for $k$-max-min FL from  [23] to . Whereas cost-sharing schemes have played a role in 2-stage stochastic optimization, in the context of the boosted-sampling approach of [18], they have not been used previously for $k$-max-min problems. (The approach in [17] has some similar elements, but there is no explicit use of cost shares.) Cost-sharing schemes offer a useful tool for designing algorithms for $k$-max-min problems that we believe will find further application.
DR problems with the $L_\infty$ metric.
For the $L_\infty$ metric (Section 4), we directly consider the fractional relaxation of the problem. As with the Wasserstein metric, even for a polynomial-scenario central distribution, solving the resulting problem is quite challenging since it (again) leads to an LP with exponentially many variables and constraints. We move to a proxy objective that is pointwise close to the true objective, and show that an -subgradient of the proxy objective can be computed efficiently at any point, even for . This enables us to use the algorithm in [35] to solve the fractional problem; rounding this solution using a local approximation algorithm yields results for the DR discrete 2-stage problem. Table 1 lists the results we obtain for the $L_\infty$ metric as well.
1.2 Related work
Stochastic optimization is a field with a vast amount of literature (see, e.g., [3, 31, 33]), but its study from an approximation-algorithms perspective is relatively recent. Various approximation results have been obtained in the 2-stage recourse model over the last 15 years in the CS and Operations-Research (OR) literature (see, e.g., [37]), but more general models, such as distributionally robust stochastic optimization, have received little or no attention in this regard.
To the best of our knowledge, with the exception of [1], which we discuss below, there are no prior approximation algorithms for distributionally robust 2-stage discrete optimization problems, when the number of possible scenarios is (finite, but) exponentially large (even if $\mathring{p}$ has polynomial-size support). Much of the work in the stochastic-optimization and OR literature on these problems has focused on proving suitable duality results that sometimes allow one to reformulate the DR problem more compactly. Moreover, in many cases, the results obtained are for continuous scenario spaces and with other assumptions about the recourse costs. For instance, [9, 13, 41, 20] all consider the setting where $\mathcal{D}$ is a ball in the Wasserstein metric, and provide a closed-form description of the worst-case distribution in $\mathcal{D}$, which is then used to reformulate the DR problem under further convexity assumptions of the scenario collection $\mathcal{A}$. DR problems have gained attention in recent years due to their usefulness in inferring decisions from observed data while avoiding the risk of overfitting: here $\mathcal{D}$ is used to model a class of distributions from which the observed data could arise (with high confidence). Various works have advocated the use of a Wasserstein ball around the empirical distribution for this purpose [9, 41, 13, 29], but there are no results proving polynomial bounds on the number of samples needed in order to produce provably good results. Note that these works, by definition, consider the setting where the central distribution has polynomial-size support. The distributionally robust setting has also been considered for chance-constrained problems; see, e.g., [8] and the references therein.
The work of [1] in the CS literature on correlation gap can be interpreted as studying distributionally robust discrete-optimization problems, but in a very different setting where $\mathcal{D}$ is not a ball. Instead, $\mathcal{D}$ is the collection of distributions that agree with some given expected values; the correlation gap quantifies the worst-case ratio of the DR objective when one chooses the optimal decisions with respect to the distribution in $\mathcal{D}$ that treats all random variables as independent, versus the optimum of the DR problem. Agrawal et al. [1] proved various bounds on the correlation gap for submodular functions and subadditive functions admitting suitable cost shares. Various other works (see, e.g., [5, 30] and the references therein) have considered such moment-based collections, but again under continuity and/or convexity assumptions about the scenario space and/or recourse costs.
We now briefly survey the work on approximation algorithms under the stochastic- and robust- optimization models, which the DR model generalizes. As noted above, various approximation results have been obtained for 2-stage, and even multistage problems. In the black-box model, a common approach is the SAA method, which simply consists of solving the stochastic-optimization problem for the empirical distribution obtained by sampling. The effectiveness of this method has been analyzed both for 2-stage stochastic problems [24, 4, 38] and multi-stage stochastic problems [38]. The sample-complexity bound in [24] is a non-polynomial bound for general 2-stage stochastic problems, whereas [4, 38] both obtain bounds for structured problems. The proof in [38] applies also to structured multistage linear programs, and [4] show that even approximate solutions to the 2-stage SAA problem translate to approximate solutions to the original 2-stage problem. We build upon the SAA machinery of Charikar et al. [4]. Previously, Shmoys and Swamy [35] showed how to use the ellipsoid method to solve structured 2-stage linear programs in the black-box model, and how to round the resulting fractional solution. We utilize their machinery based on approximate subgradients to solve the polynomial-scenario central-distribution setting. Approximation algorithms for 2-stage problems have also been developed via combinatorial means. The prominent technique here is the boosted sampling technique of Gupta et al. [18]; the survey [37] gives a detailed description of these and other approximation results for 2-stage optimization.
Two-stage robust optimization where uncertainty is reflected in the constraints and not the data was proposed in [6], who devised approximation algorithms for various problems in the polynomial-scenario setting. Notice that it is not clear how to even specify problems with exponentially many scenarios in the robust model. Feige et al. [11] expanded the model of [6] by considering what we call the $k$-bounded setting, where every subset of at most $k$ elements is a scenario. Subsequently, [23] and [17] expanded the collection of results known for 2-stage robust problems in the $k$-bounded setting. We utilize results for the closely related $k$-max-min problem encountered in this setting in our work.
We briefly discuss a few other snippets that consider intermediary approaches between stochastic and robust optimization. Swamy [39] considers a model for risk-averse 2-stage stochastic optimization that interpolates between the stochastic and robust optimization approaches. In the context of online algorithms, Mirrokni et al. [27] and Esfandiari et al. [10] give online algorithms for allocation problems that are simultaneously competitive both in a random input model and in an adversarial input model. Finally, we note that our distributionally robust setting can be seen to be in a similar spirit as a recent focus in algorithmic mechanism design, where one does not assume precise knowledge of the underlying distribution; rather one (implicitly) has a collection of distributions, and one seeks to design mechanisms that work for every distribution in this collection; see, e.g., [21].
2 Problem definitions, and our general class of DR 2-stage problems
Recall that we consider settings where we have a ball of distributions (over the scenario-collection ) around a central distribution under some metric on distributions, and we seek to minimize the maximum expected cost with respect to a distribution in . As mentioned earlier, we make no assumptions about , and only require the ability to draw samples from . The metrics that we consider for are the metric, metric, and the Wasserstein metric. We now define Wasserstein metrics precisely.
Definition 2.1 (Wasserstein (a.k.a. transportation or earth-mover) distance).
The Wasserstein distance between two probability distributions and over is defined with respect to an underlying metric on . A transportation plan or flow from to is a vector such that: (i) for all ; and (ii) for all . The Wasserstein distance between and , denoted , is the minimum value of over all transportation plans from to .
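As a small illustration of Definition 2.1, the sketch below checks whether a candidate vector is a transportation plan and computes its cost. It assumes the two stripped conditions are the standard marginal constraints (the flow leaving each scenario equals its probability under the first distribution, and the flow entering each scenario equals its probability under the second). The names `is_transport_plan`, `plan_cost`, and `discrete_metric` are hypothetical. The cost of any feasible plan is only an upper bound on the Wasserstein distance; computing the distance itself requires minimizing over all plans, e.g., by solving an LP.

```python
def is_transport_plan(gamma, p, q, tol=1e-9):
    """Check nonnegativity and the two (assumed standard) marginal conditions."""
    scenarios = list(p)
    if any(flow < -tol for flow in gamma.values()):
        return False
    # (i) total flow leaving A equals p[A]
    rows_ok = all(
        abs(sum(gamma.get((a, b), 0.0) for b in scenarios) - p[a]) <= tol
        for a in scenarios
    )
    # (ii) total flow entering A' equals q[A']
    cols_ok = all(
        abs(sum(gamma.get((a, b), 0.0) for a in scenarios) - q[b]) <= tol
        for b in scenarios
    )
    return rows_ok and cols_ok


def plan_cost(gamma, ell):
    """Cost of a plan: sum of flow times scenario distance."""
    return sum(flow * ell(a, b) for (a, b), flow in gamma.items())


def discrete_metric(a, b):
    """Scenario metric that is 0 on identical scenarios and 1 otherwise."""
    return 0.0 if a == b else 1.0


# Toy instance with two scenarios (illustrative values only).
p = {"A1": 0.5, "A2": 0.5}
q = {"A1": 0.25, "A2": 0.75}
gamma = {("A1", "A1"): 0.25, ("A1", "A2"): 0.25, ("A2", "A2"): 0.5}
cost = plan_cost(gamma, discrete_metric)
```

Here the plan moves 0.25 units of probability mass from one scenario to the other, so its cost under the discrete metric is 0.25.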
If is an asymmetric metric, then is an asymmetric metric; if is a pseudometric (i.e., satisfies the triangle inequality but could be [math] for ), then so is .
In Section 3.3, we consider the DR versions of set cover (and some special cases), facility location, and Steiner tree. DR 2-stage facility location () was defined in Section 1.1; we define the remaining problems below, and then discuss the general class of DR 2-stage problems to which our framework applies. Recall that denotes the input size.
DR 2-stage set cover (). We have a collection of subsets over a ground set . A scenario is a subset of and specifies the set of elements to be covered in that scenario. We may buy a set in either stage, incurring costs of and in stages I and II respectively. The sets chosen in stage I and in each scenario must together cover . The goal is to choose some first-stage sets and sets in each scenario so as to minimize $\sum_{S\in\mathcal{S}^{\mathrm{I}}}c_{S}+\max_{q:L(\mathring{p},q)\leq r}{\textstyle\operatorname*{E}_{A\sim q}}\bigl[\sum_{S\in\mathcal{S}^{A}}c^{\mathrm{II}}_{S}\bigr]$.
We have , and is the encoding size of $\bigl(U,\mathcal{S},c,c^{\mathrm{II}},r\bigr)$. We consider the unrestricted () and -bounded () settings. Different scenarios could be quite unrelated, so there does not seem to be a natural choice for a (non-discrete) scenario-metric; we therefore consider (balls in) the or metrics.
DR 2-stage Steiner tree (). We have a complete graph with metric edge costs , root , and inflation factor . A scenario is a subset of nodes (called terminals) specifying the nodes that need to be connected to . We may buy an edge in stages I or II, incurring costs or respectively. The union of the edges bought in stage I, and bought in scenario , must connect all nodes in to , and we want to minimize $\sum_{e\in F}c_{e}+\max_{q:L(\mathring{p},q)\leq r}{\textstyle\operatorname*{E}_{A\sim q}}\bigl[\sum_{e\in F^{A}}c^{\mathrm{II}}_{e}\bigr]$. (With non-uniform inflation factors for different edges, even 2-stage stochastic Steiner tree becomes at least as hard as group Steiner tree [32].)
Here is the encoding size of . We obtain results in the unrestricted setting, and leave the -bounded setting for future work. As with , in addition to the and metrics, we can consider scenario metrics defined using (e.g., ) and the resulting Wasserstein metrics.
A general class of DR 2-stage problems.
Abstracting away the key properties of , , , we now define the generic DR 2-stage problem that we consider. As before, denotes the finite first-stage action set of the discrete problem. It will be convenient to consider the natural fractional relaxation of the DR problem obtained by enlarging the discrete second-stage action set and to suitable polytopes. Recall that is the optimal second-stage cost of scenario given as the first-stage decision, when we allow fractional second-stage actions. Let denote the polytope specifying the fractional first-stage decisions, with . (For example, for , is the optimal value of a set-cover LP where we may buy sets fractionally in the second stage, and .) One benefit of moving to the fractional relaxation is that, for every scenario , is a convex function of , whose value and subgradient can be exactly computed.
Definition 2.2.
Let be a function. We say that is a subgradient of at if we have for all . Given , we say that is an -subgradient of at the point if for every , we have . We abbreviate -subgradient to -subgradient.
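To make the approximate-subgradient notion concrete, the following sketch tests the defining inequality numerically for a one-dimensional convex function. It assumes Definition 2.2 takes the form used in [35], namely $f(v)-f(u)\geq d\cdot(v-u)-\omega f(u)$ for all $v$ in the domain, with $\omega=0$ giving an ordinary subgradient. The function name `is_omega_subgradient`, the test function, and the grid are illustrative choices; checking finitely many points can refute the property but cannot prove it.

```python
def is_omega_subgradient(f, d, u, points, omega, tol=1e-12):
    """Test f(v) - f(u) >= d*(v - u) - omega*f(u) on a finite set of points v."""
    fu = f(u)
    return all(f(v) - fu >= d * (v - u) - omega * fu - tol for v in points)


def f(x):
    """A convex, strictly positive test function (hypothetical example)."""
    return x * x + 1.0


grid = [i / 10.0 for i in range(-30, 31)]

exact = is_omega_subgradient(f, 2.0, 1.0, grid, omega=0.0)      # true slope at u = 1
perturbed = is_omega_subgradient(f, 2.5, 1.0, grid, omega=0.0)  # wrong slope
relaxed = is_omega_subgradient(f, 2.5, 1.0, grid, omega=0.05)   # same slope, with slack
```

In this example the slope 2.5 violates the exact subgradient inequality near $v=1.2$, but passes once a slack of $\omega f(u)=0.1$ is allowed.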
Following [4, 35, 38], we consider the following generic DR 2-stage problem (Q) with discrete first-stage set , and its (further) fractional relaxation (Q), and require that they satisfy properties (P1)–(P6) listed below. Let denote the -norm of .
[TABLE]
[TABLE]
In proving their SAA result for 2-stage stochastic problems, [4] define properties (P1), (P2) below to capture the fact that every first-stage action has a corresponding recourse action that is more expensive by a bounded factor, and hence, it is always feasible to not take any first-stage actions.
- (P1) , , , and for all .
- (P2) We know an inflation parameter such that for all .
Since we apply the ellipsoid-based machinery in [35] to solve the fractional problem with a polynomial-size central distribution, we need bounds on the feasible region in terms of enclosing and enclosed balls; this is captured by (P3), which is directly lifted from [35]. Note that the vast majority of 2-stage problems (including , , ) involve decisions, with and so , so (P3) is readily satisfied. As in [35], we need to be able to compute the value and subgradient of the recourse cost , which is a benign requirement since is the optimal value of a polytime-solvable LP in all our applications. Whereas [35] define a syntactic class of 2-stage stochastic LPs and show (implicitly) that they satisfy this requirement, we explicitly isolate this requirement in (P4), (P5).
- (P3) We have positive bounds and such that and contains a ball of radius such that $\ln\bigl(\frac{R}{V}\bigr)=\operatorname{\mathsf{poly}}(\mathcal{I})$.
- (P4) For every , is convex over , and can be efficiently computed for every .
- (P5) For every , we can efficiently compute a subgradient of at with , where . Hence, the Lipschitz constant of is at most (due to Definition 2.2).
Finally, we need the following additional mild condition.
- (P6) When is the Wasserstein metric with respect to a scenario metric , we know with such that for all and all with .
As noted above, (P1)–(P5) are gathered from [4, 35], and hold for all the 2-stage problems considered in the CS literature (see [38, 6, 11, 23, 17]); (P6) is a new requirement, but is also rather mild and holds for all the problems we consider. (P1), (P2) and (P6) are used to prove that SAA works for the DR problem under the Wasserstein metric (Section 3.1). (P3)–(P5) pertain to the fractional relaxation, and are utilized to show that one can efficiently solve the SAA problem approximately (Section 3.2).
A solution to (Q) needs to be rounded to yield integral second-stage actions: any LP-relative -approximation algorithm for the deterministic version of the problem can be used to obtain recourse actions for each scenario having cost at most . To round a fractional solution to (Q), we utilize a local approximation algorithm for the 2-stage problem: we say that is a local -approximation algorithm for (Q) if, given any , it returns an integral solution and implicitly specifies integral recourse actions for every , such that and $(\text{cost of }\widetilde{z}^{A})\leq\rho g(x,A)$ for all . An -approximate solution to (Q) combined with a local -approximation yields an -approximate solution to the discrete DR 2-stage problem. Local approximation algorithms exist for various 2-stage problems (e.g., set cover, vertex cover, facility location [35]) with approximation factors that are comparable to the approximation factors known for their deterministic counterparts.
3 Distributionally robust problems under the Wasserstein metric
We now focus on the DR 2-stage problem (Q) when is the Wasserstein metric with respect to a metric on scenarios. Plugging in the definition of (with respect to scenario metric ), we can rewrite (Q) as follows.
[TABLE]
[TABLE]
Let denote the optimal value of (Q). We note that a naive, simplistic approach that ignores the uncertainty in the underlying distribution, and only considers the central distribution , yields (expectedly) poor bounds. Suppose is an -approximate solution for the 2-stage problem \min_{x\in X}\bigl{(}c^{\intercal}x+{\textstyle\operatorname*{E}_{A\sim\mathring{p}}}\bigl{[}g(x,A)\bigr{]}\bigr{)}. Given (P6), one can show that z({\mathring{p}}\,;{\bar{x}})\leq{\textstyle\operatorname*{E}_{A\sim\mathring{p}}}\bigl{[}g(\bar{x},A)\bigr{]}+\tau\cdot r (and is at least {\textstyle\operatorname*{E}_{A\sim\mathring{p}}}\bigl{[}g(\bar{x},A)\bigr{]}), which implies , but this is too weak a guarantee since could be quite large compared to .
In Section 3.1, we work with (Q) and show that the SAA approach can be used to reduce to the case where the central distribution has polynomial-size support. In Section 3.2, we show how to approximately solve the polynomial-size support case by applying the ellipsoid method to its (further) relaxation (Q), where we replace with . Here, we utilize a local approximation algorithm to move from to , and thereby interface with, and complement, the SAA result for (Q) proved in Section 3.1. This result applies more generally, even when is not a metric; we only require that for all . (If is not a metric, the Wasserstein distance with respect to need not yield a metric on distributions.)
In Section 3.3, we consider various combinatorial-optimization problems, and utilize the above results in conjunction to obtain the first approximation results for the DR versions of these problems.
3.1 A sample-average-approximation (SAA) result for distributionally robust problems
The SAA approach is the following simple, intuitive idea: draw some samples from , estimate by the empirical distribution induced by these samples, and solve the SAA problem (Q). We prove the following SAA result. For any , if we construct O\bigl{(}\frac{1}{\varepsilon}\bigr{)} SAA problems, each using \operatorname{\mathsf{poly}}\bigl{(}\mathcal{I},\frac{\lambda}{\varepsilon},\log(\frac{1}{\eta})\bigr{)} independent samples, and if we have a -approximation algorithm for computing the objective value of the SAA problem at any given point, then we can utilize -approximate solutions to these SAA problems to obtain a solution satisfying h({\mathring{p}}\,;{\widehat{x}})\leq 4\beta\rho\bigl{(}1+O(\varepsilon)\bigr{)}\cdot O^{*}+2\beta\rho\eta with high probability; Theorem 3.5 gives the precise statement.
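The sampling step of the SAA approach can be sketched as follows: draw independent samples from a black box and form the empirical distribution. This is a minimal illustration, not the paper's procedure for choosing the number of samples. The sampler `draw_scenario`, its support, and its weights are hypothetical stand-ins for the black-box central distribution.

```python
import random
from collections import Counter


def empirical_distribution(draw_scenario, num_samples):
    """Estimate the central distribution by sample frequencies."""
    counts = Counter(draw_scenario() for _ in range(num_samples))
    return {scenario: count / num_samples for scenario, count in counts.items()}


# Hypothetical black box: each scenario is the set of elements to be covered.
rng = random.Random(0)
support = [frozenset({"a"}), frozenset({"a", "b"}), frozenset()]


def draw_scenario():
    return rng.choices(support, weights=[0.5, 0.3, 0.2])[0]


p_hat = empirical_distribution(draw_scenario, 1000)
```

The empirical distribution has support size at most the number of samples, which is what makes the resulting SAA problem a polynomial-support instance.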
The proof has several ingredients. There are two main approaches [4, 38] for showing that the SAA method with a polynomial number of samples works for stochastic-optimization problems. Charikar et al. [4] prove the following SAA result for 2-stage problems.
Theorem 3.1 ([4]).
Consider a 2-stage problem (2St-P) : \min_{x\in\widetilde{X}}\ \bigl{(}f({p};{x}):=\tilde{c}^{\intercal}x+{\textstyle\operatorname*{E}_{A\sim p}}\bigl{[}\tilde{g}(x,A)\bigr{]}\bigr{)}, with scenario set , where satisfy (P1), (P2) with inflation parameter . With probability at least , any optimal solution to the SAA problem constructed using \operatorname{\mathsf{poly}}\bigl{(}\log|\widetilde{X}|,\frac{\Lambda}{\varepsilon},\log(\frac{1}{\delta})\bigr{)} samples is a -approximate solution to (2St-P). More generally, there is a way of using an -approximation algorithm for the SAA problem, in conjunction with a -approximate objective-value oracle for the SAA problem, to obtain an \bigl{(}\alpha\beta+O(\varepsilon)\bigr{)}-approximate solution to (2St-P) with high probability.
Note that (Q) is not a standard 2-stage stochastic-optimization problem because constraint (2) couples the various scenarios, which prevents us from applying Theorem 3.1 to (Q). The SAA result in Swamy and Shmoys [38] applies to the fractional relaxation of the problem, and works whenever the objective functions of the SAA and original problems satisfy a certain "closeness-in-subgradients" property. A subgradient of at a point is obtained from the optimal distribution to the inner maximization problem in (Q). This is however an exponential-size object and utilizing this to prove closeness-in-subgradients seems quite daunting.
Our first insight is that we can decouple the scenarios by Lagrangifying constraint (2) using a dual variable . By standard duality arguments, this leads to the following reformulation of (Q).
[TABLE]
Recall that g(x,y,A):=\max_{A^{\prime}\in\mathcal{A}}\bigl{(}g(x,A^{\prime})-y\cdot\ell(A,A^{\prime})\bigr{)}. Let . The chief benefit of the reformulation (R) is that we can view (R) as a 2-stage problem: the first-stage action-set is , and the optimal second-stage cost of scenario under first-stage actions is given by . This makes it more amenable to utilize the SAA machinery developed for 2-stage problems. We can exploit (P6) to show that we may limit to the range in (R), and use (P2) to bound the inflation factor of (R).
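The decoupling effect of the dual variable can be seen on a toy instance. The sketch below evaluates $g(x,y,A)=\max_{A'}\bigl(g(x,A')-y\cdot\ell(A,A')\bigr)$ by brute force over an explicit scenario list, for a fixed first-stage decision. It assumes that the stripped reformulation (R) has the standard Lagrangian form in which the inner worst-case expectation equals $\min_{y\geq 0}\bigl(r\,y+\operatorname{E}_{A\sim p}[g(x,y,A)]\bigr)$. The instance values and the grid over $y$ are illustrative only; in the paper's setting the scenario collection is typically exponentially large, so this enumeration is not an algorithm for the real problem.

```python
def g_dual(g, ell, y, A):
    """g(x, y, A) = max over A' of g(x, A') - y * ell(A, A'), for a fixed x."""
    return max(g[B] - y * ell(A, B) for B in g)


def dual_objective(g, ell, p, r, y):
    """r*y + E_{A~p}[g(x, y, A)]; the first-stage cost c^T x is omitted."""
    return r * y + sum(prob * g_dual(g, ell, y, A) for A, prob in p.items())


def ell(a, b):
    """Discrete scenario metric (an assumption for this toy instance)."""
    return 0.0 if a == b else 1.0


g = {"A0": 1.0, "A1": 3.0}   # second-stage costs g(x, A) at a fixed x
p = {"A0": 1.0}              # central distribution: all mass on A0
r = 0.25                     # radius of the Wasserstein ball

# Primal worst case: move r units of mass from A0 to the costlier scenario A1.
primal = (1 - r) * g["A0"] + r * g["A1"]

# Dual value: minimize over a grid of y values in [0, 5].
dual = min(dual_objective(g, ell, p, r, k / 100.0) for k in range(0, 501))
```

On this instance both values equal 1.5, with the dual minimum attained at $y=2$, the point where paying $y$ per unit distance makes moving mass to the costlier scenario no longer profitable.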
Lemma 3.2.
For any , there exists such that . Hence, is an -approximate solution to (Q) iff such that is an -approximate solution to (R).
Proof.
The second statement is immediate from the first one since (Q) and (R) have the same optimal values. So we focus on showing the first statement.
Consider any . There exists such that . If , then we are done. So suppose . We argue that . This completes the proof since we also have for all . Clearly, . If is such that , then it must be that . Otherwise, , where the last inequality follows from (P6). This contradicts the choice of . Therefore, we have , completing the proof. ∎
Lemma 3.3.
For the 2-stage problem (R), we can set the parameter in Theorem 3.1 to be $\max\bigl\{\lambda,\frac{\ell_{\max}}{r}\bigr\}$.
Proof.
Consider any , , and . Let be such that . Then
[TABLE]
The second inequality above follows from (P2). ∎
Given Lemmas 3.2 and 3.3, by suitably discretizing , one can use Theorem 3.1 to show that: if we construct the SAA problem using \operatorname{\mathsf{poly}}\bigl{(}\mathcal{I},\frac{\lambda}{\varepsilon},\log\tau,\frac{\ell_{\max}}{r}\bigr{)} samples, and can compute (approximately) the SAA objective value at any given point, then, with high probability, one can translate an -approximate solution to the SAA problem to an -approximate solution to (Q). But this result does not quite suit our purposes due to various reasons.
The term could be rather large, and is not , so this does not yield polynomial sample complexity. (Footnote 4: The problem persists even if we apply the closeness-in-subgradients machinery in [38] to the fractional version of (R). This would involve estimating ${\textstyle\operatorname*{E}_{A\sim p}}\bigl[\ell(A,\pi(x,y,A))\bigr]$ to within an term, where $\pi(x,y,A)=\operatorname{argmax}_{A^{\prime}\in\mathcal{A}}\bigl(g(x,A^{\prime})-y\cdot\ell(A,A^{\prime})\bigr)$, which requires samples.) Moreover, it seems difficult to compute the SAA objective value , or even approximate it. This difficulty arises because computing encompasses the NP-hard -- problem encountered in 2-stage robust optimization, and furthermore, the mixed-sign objective in makes it hard to even approximate (see Theorem 3.12).
We need various ideas to circumvent these issues. We show that we can eliminate the dependence on altogether at the expense of a slight deterioration in the approximation ratio when moving from the SAA to the original problem. The term arises because might be attained by a scenario where (see the proof of Lemma 3.3). Our crucial second insight is that we can eliminate this and reduce the sample complexity to , by specifically imposing that we never encounter pairs with ; we call such pairs long edges, and the remaining pairs short edges. Any satisfying (2) can send at most flow on the long edges. Motivated by this, we "decompose" into and , which are (roughly speaking) the contribution from the short and long edges respectively. (This decomposition is akin to the division of low- and high- cost scenarios used by [4] to prove Theorem 3.1, but there are significant technical differences, which complicate things for us, as we discuss below.) We define and as follows.
[TABLE]
Lemma 3.4.
For every central distribution , and every , we have .
Proof.
We prove this by showing that: (i) ; and (ii) . Given these bounds, the upper bound on follows from the upper bounds on and in parts (i) and (ii) respectively. For the other direction, we have
[TABLE]
where the first and second inequalities follow from the second inequalities of parts (ii) and (i) respectively.
Part (ii) follows from property (P2). For any feasible solution to the optimization problem defining (and ), we have
[TABLE]
We now prove part (i). It is clear from the definition that , so the second inequality holds. For the first inequality, consider any feasible solution to (T). Let be the restriction of to the short edges, along with [math]s for the long edges. Similarly, let be the restriction of to the long edges, along with [math]s for the short edges. Then and are feasible solutions to the optimization problems defining and respectively. This yields the first inequality in (i). ∎
Given Lemma 3.4, we focus on the thresholded proxy problem () below, and its reformulation obtained (as before) by Lagrangifying (2) and simplifying.
[TABLE]
[TABLE]
where \overline{g}(x,y,A):=\max_{A^{\prime}\in\mathcal{A}:\ell(A,A^{\prime})\leq M}\bigl{(}g(x,A^{\prime})-y\cdot\ell(A,A^{\prime})\bigr{)}. After suitably discretizing the -interval , we obtain that the 2-stage problem () satisfies (P1) and (P2) with inflation parameter . So Theorem 3.1 applied to () suggests an improved \operatorname{\mathsf{poly}}\bigl{(}\mathcal{I},\frac{\lambda}{\varepsilon}\bigr{)} sample complexity, but two sources of difficulty remain.
First, while we would like to consider the proxy problem (), which is the SAA version of (), we are in fact solving the (true) SAA problem (Q) approximately. Whereas and are pointwise close, could be significant compared to (as indicated by the factor- loss in Lemma 3.4). Therefore, an -approximation to (Q) does not yield an -approximation to () (or equivalently, ()). We will in fact not be able to obtain an approximate solution to (), and so it is unclear why transferring approximation guarantees from () to () (and hence ()) is helpful. That is, the artifact we encounter is that the 2-stage SAA problem that has bounded inflation factor is not the one that we are able to approximate. (Note that Theorem 3.1 is not equipped to deal with this issue since its starting point is an approximate solution to the SAA problem.)
The way around this is to realize that our goal is to evaluate the quality of the SAA solution for the original problem (Q), and not (). In 2-stage stochastic optimization, the contribution from high-cost scenarios to the total expected cost is linear in , which provides a handle on how to relate and . In our case, the contribution is nonlinear in , and we need to derive new insights to reason about how this changes when we move from to its empirical estimate ; we then proceed by carefully adapting the ideas in [4]. We explain this in more detail under "Overview" in Appendix A.
Second, we (still) do not have an approximate value oracle for (or ). However, we will show in Section 3.2 (see Lemma 3.9) that if we have the non-standard type of approximation for mentioned in Theorem 1, then one can obtain an approximate value oracle for . While this is not the same as a value oracle for , we show that this nevertheless suffices.
Combining these ingredients yields the following theorem, which is the main result of this section. Recall that , and and are .
Theorem 3.5.
Let , . Consider $k=\frac{2}{\varepsilon}\log\bigl(\frac{1}{\delta}\bigr)$ SAA problems with objective functions , for , where each is an empirical estimate of constructed using $N=\operatorname{\mathsf{poly}}\bigl(\frac{\lambda}{\varepsilon},\log|X|,\log(\frac{\tau}{\eta}),\log(\frac{1}{\delta})\bigr)$ independent samples. Suppose that for every , we have a solution and an estimate , such that: (S1) ; and (S2) (where ). Let and . Then, $h({\mathring{p}}\,;{\widehat{x}})\leq 4\beta\rho\bigl(1+O(\varepsilon)\bigr)O^{*}+2\beta\rho\eta$ with probability at least .
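The repetition scheme in Theorem 3.5 can be sketched in a few lines. This is a hedged illustration only: it assumes that the logarithm in the expression for $k$ is natural and that $k$ is rounded up, and it assumes that the stripped selection rule picks the SAA solution whose estimate is smallest (as is common in arguments of this type). The names `num_saa_problems` and `pick_best` are hypothetical, and the solutions and estimates below are placeholders rather than outputs of an actual SAA solver.

```python
import math


def num_saa_problems(eps, delta):
    """k = (2/eps) * log(1/delta), rounded up (natural log is an assumption)."""
    return math.ceil((2.0 / eps) * math.log(1.0 / delta))


def pick_best(solutions, estimates):
    """Return the solution with the smallest estimate (assumed selection rule)."""
    j = min(range(len(estimates)), key=estimates.__getitem__)
    return solutions[j], estimates[j]


k = num_saa_problems(eps=0.5, delta=0.01)

# Placeholder solutions and estimates for three SAA problems.
solutions = ["x1", "x2", "x3"]
estimates = [7.5, 6.0, 9.25]
x_hat, f_hat = pick_best(solutions, estimates)
```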
The mixed (i.e., multiplicative + additive) guarantee obtained above can be turned into a purely multiplicative guarantee if we have a lower bound on with \log\bigl{(}\frac{1}{\mathsf{LB}}\bigr{)}=\operatorname{\mathsf{poly}}(\mathcal{I}). We show that such a lower bound can indeed be obtained under some very mild assumptions (Lemma 3.11).
The proof of Theorem 3.5 is further complicated due to the peculiarities of the estimates that we have for . Note that (S1), (S2) only imply that is a -approximation to the SAA problem, and , so a statement of the form in Theorem 3.1 would yield an inferior approximation bound of . Instead, we need to adapt the arguments of [4] to suit the numerous peculiarities of our setting. The proof is therefore somewhat technical and we defer this to Appendix A.
We remark that the proxy problem () (or ()) is used only in the analysis. One takeaway here is that we derive a substantially improved sample-complexity bound by taking a slight hit in the approximation ratio when moving from the SAA to the original problem. This is a novel, nuanced result regarding the effectiveness of the SAA method for DR 2-stage problems. We do not know of any other setting where one obtains drastically improved sample complexity by settling for a worse than -factor (but still ) loss when moving from the SAA to the original problem. (In particular, no such result is known for standard 2-stage problems.)
3.2 Solving distributionally robust problems for polynomial-support central distributions
We now show how to approximately solve the distributionally robust problem (Q) when the central distribution has polynomial-size support. This will allow us to solve the SAA problem(s) constructed in Section 3.1, and complement Theorem 3.5. Let denote the support of . So we have
[TABLE]
[TABLE]
We consider the fractional relaxation of (Q), where we replace with its relaxation to obtain (Q): . As noted earlier, unlike the case with 2-stage {stochastic, robust} optimization, where the fractional relaxation of the polynomial-scenario problem gives a polynomial-size LP and is therefore straightforward to solve in polytime, it is substantially more challenging to even approximately solve the fractional DR polynomial-scenario problem. In particular, reformulating (and hence (Q)) as a minimization LP leads to an LP with an exponential number of constraints and variables. The issue is that (T) involves an exponential number of variables. So if we reformulate as a minimization LP by taking the dual of (T) (and replacing by its LP formulation), we obtain an exponential number of constraints (due to the variables), and an exponential number of variables (needed to encode the LP for , for each ). (An exception to all this is the unrestricted setting (i.e., for some set ) with the discrete scenario metric (so is the -metric), under the assumption that for all , , which holds for covering problems. Here, we can reformulate as a polynomial-size minimization LP and hence, obtain a compact LP for (Q), and round its optimal solution using a local approximation algorithm. Theorem 3.13 proves a more general result along these lines.)
To overcome these obstacles, we work with the convex program given by (Q). Recall that g(x,y,A):=\max_{A^{\prime}\in\mathcal{A}}\bigl{(}g(x,A^{\prime})-y\cdot\ell(A,A^{\prime})\bigr{)}, where , , and . We show that the complexity of solving (Q) is tied to the problem of finding a near-optimal solution to . However, as noted earlier, under the standard notion of approximation, it is impossible to obtain any approximation guarantee due to the mixed-sign objective in (see Theorem 3.12). To evade this difficulty, we consider the following non-standard notion of approximation for .
Definition 3.6.
We say that is a -approximation algorithm for , where , if it returns a scenario such that for all .
Recall that a local -approximation for (Q) is an algorithm that given , returns an integral solution and integral recourse actions for every (implicitly), such that and \text{(cost of \widetilde{z}^{A})}\leq\rho g(x,A) for all . The main result of this section, which is used to interface with Theorem 3.5, is as follows.
Theorem 3.7.
Suppose that we have a polytime separation oracle for , a local -approximation algorithm for (Q), and a -approximation algorithm for for any . For any , in \operatorname{\mathsf{poly}}\bigl{(}\mathcal{I},\log(\frac{1}{\varepsilon})\bigr{)} time, we can compute and an estimate of such that: , and .
We prove the above theorem by utilizing the ellipsoid method. For this, we need to be able to compute a subgradient of the objective function . Shmoys and Swamy [35] showed that it suffices to have -subgradients (Definition 2.2). We show that a near-optimal solution to (T) yields an approximate subgradient of (Lemma 3.8), and we can obtain such a solution to (T) using a -approximation to (Lemma 3.9). Recall from properties (P4), (P5) that for every , the function is convex, and at every , , we can efficiently compute , and a subgradient with , where . The proof of Lemma 3.9 appears after the proof of Theorem 3.7, right before Section 3.2.1.
Lemma 3.8.
Let , and be a -approximate solution to (T). Then is a $\bigl(1-\frac{1}{\beta}\bigr)$-subgradient of at .
Proof.
Consider any . Since is a feasible solution to (T), we have . Let be an optimal solution to (T). Since is a -approximate solution to (T), we have
[TABLE]
Therefore,
[TABLE]
The second inequality follows since is a subgradient of at . ∎
Lemma 3.9.
Let . Suppose we have a -approximation algorithm for for all and all . Then, (i) we can compute a -approximate solution to (T); (ii) hence, satisfies .
The ellipsoid-based algorithm in [35] (and for convex optimization in general) has two phases: one where we use approximate subgradients to obtain a polynomial number of feasible points such that at least one of them is a near-optimal solution, and the other, where we choose the best among these feasible points. In the first phase, starting with an ellipsoid that contains the entire feasible region, at each step, we add a cut (i.e., a hyperplane) passing through the center of the current ellipsoid to chop off a half-ellipsoid that does not contain points of interest. If is infeasible, we use a violated inequality to obtain such a cut. Otherwise, we find an -subgradient of at and use the cut ; the definition of -subgradient ensures that any point discarded by this cut has . We continue this until the volume of the current ellipsoid becomes sufficiently small, which happens after a polynomial number of iterations. The first phase can be executed using -subgradients, for an arbitrary . Shmoys and Swamy [35] showed that the second phase can be implemented even without having an (approximate) objective-function oracle (which can be hard to obtain with exponentially many scenarios) provided that we have -subgradients for sufficiently small ().
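The cut-and-shrink step described above can be illustrated with the textbook central-cut ellipsoid update. This is a generic sketch of the standard formula, not the specific procedure of [35] or of the algorithm in this section. It assumes the ellipsoid is represented as $\{x:(x-c)^{\intercal}P^{-1}(x-c)\leq 1\}$, that the dimension is at least 2, and that the half to keep is $\{x: a\cdot x\leq a\cdot c\}$. The function name `central_cut_update` is hypothetical.

```python
import math


def central_cut_update(center, P, a):
    """Minimum-volume ellipsoid containing the half-ellipsoid
    {x in E : a.x <= a.center}, where
    E = {x : (x - center)^T P^{-1} (x - center) <= 1} and n >= 2."""
    n = len(center)
    Pa = [sum(P[i][j] * a[j] for j in range(n)) for i in range(n)]
    scale = math.sqrt(sum(a[i] * Pa[i] for i in range(n)))
    b = [v / scale for v in Pa]
    # Shift the center away from the discarded half.
    new_center = [center[i] - b[i] / (n + 1) for i in range(n)]
    factor = n * n / (n * n - 1.0)
    new_P = [
        [factor * (P[i][j] - 2.0 / (n + 1) * b[i] * b[j]) for j in range(n)]
        for i in range(n)
    ]
    return new_center, new_P


# Unit disc in the plane, cut by the halfspace x_1 <= 0.
new_center, new_P = central_cut_update(
    [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [1.0, 0.0]
)
# Ratio of new to old volume equals sqrt(det(new_P)) here, since det(P) = 1.
volume_ratio = math.sqrt(new_P[0][0] * new_P[1][1] - new_P[0][1] * new_P[1][0])
```

Each such step shrinks the volume by a factor bounded away from 1 (about 0.77 in the plane), which is why a polynomial number of iterations suffices to make the ellipsoid sufficiently small.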
Computing -subgradients efficiently for such small would require an FPTAS for (T). But, in general, the optimization problems and (T) are complicated problems that can capture the APX-hard -- problem encountered in 2-stage robust optimization [11, 17, 23] (see Theorem 3.12). This hardness rules out an FPTAS for (T); moreover, the approximation we can obtain for will naturally depend on the application. We sidestep this difficulty by noting that Lemma 3.9 (ii) gives a -approximate value oracle for , which can be used to implement the second phase.
A final difficulty that remains is that for our applications (see Section 3.3), we will only be able to approximate for integral (as is the case with robust -- problems); indeed Theorem 3.7 only assumes that we have an approximation algorithm for computing when . However, we need to add an -subgradient cut passing through the center of our current ellipsoid, which will typically not be integral; so we will not be able to use Lemmas 3.9 and 3.8 to obtain an -subgradient at . To bypass this difficulty, we use the unorthodox approach of generating a cut from a point different from the ellipsoid-center . We round to using our local approximation algorithm, and use Lemma 3.8 at , but with an approximate solution to (T) (obtained by approximating ), to compute a vector ; we add the cut . While need not be an -subgradient at , we argue that this cut is still valid, in that any point cut off by the inequality has large compared to .
Lemma 3.10.
Let and be obtained by rounding using a local -approximation algorithm. Let be a -approximate solution to , and let . If is such that , then $h({\widehat{p}}\,;{x^{\prime}})\geq\frac{1}{\rho}\cdot\bigl(c^{\intercal}\widetilde{x}+\sum_{(A,A^{\prime})\in\mathcal{A}^{\mathrm{sup}}\times\mathcal{A}}\gamma_{A,A^{\prime}}g(\widetilde{x},A^{\prime})\bigr)\geq\frac{1}{\beta\rho}\cdot h({\widehat{p}}\,;{\widetilde{x}})$.
Proof.
Define for all . Clearly, for all . Also, since we use a local approximation algorithm to obtain , we have . By mimicking the proof of Lemma 3.8, we have that is a subgradient of at . We have . So . Finally, by Lemma 3.9 (ii). ∎
We describe below the algorithm leading to Theorem 3.7. By (P3), , and contains a ball of radius , where $\ln\bigl(\frac{R}{V}\bigr)$, are . Lemma 3.8 implies that the Lipschitz constant of is at most , so . To utilize to obtain Theorem 3.7, we require a lower bound on with $\log\bigl(\frac{1}{\mathsf{LB}}\bigr)=\operatorname{\mathsf{poly}}(\mathcal{I})$. Under a standard, rather mild assumption (that originated in [35]), we argue that we can either compute such a lower bound, or determine that is an optimal solution (Lemma 3.11), and show that this suffices. Call a scenario a "null scenario" if for all (e.g., in ). We assume that in every non-null scenario , we have for all . We assume that we are given (or an upper bound on it) in the input.
Algorithm .
Require: separation oracle for , local -approximation algorithm , and a -approximation algorithm for for all .
Output: and satisfying: , and $\widetilde{f}\leq\rho\bigl(\min_{x\in X}h({\widehat{p}}\,;{x})+\eta\bigr)$.
- A1. Set $k\leftarrow 0,\ \bar{x}_{0}\leftarrow 0,\ \mu\leftarrow\min\bigl\{1,\frac{\eta}{2K^{\prime}R}\bigr\},\ N\leftarrow\lceil 2m^{2}\ln\bigl(\frac{2R}{\mu V}\bigr)\rceil$. Let and .
- A2. For do the following. (We maintain that is an ellipsoid centered at containing .)
  - a) If , let be an inequality that is satisfied by all but violated by . (This is either obtained from a separation oracle for , or from inequalities added in prior iterations.) Let be the halfspace .
  - b) If , let be obtained by rounding using . Use Lemma 3.9 and to obtain a -approximate solution to (T) (which has polynomial-size support). Define , and . If , then return and . Otherwise, let denote the halfspace . Set , and .
  - c) Set to be the ellipsoid of minimum volume containing the half-ellipsoid , and let be its center.
- A3. Let . Let . Return and .
Lemma 3.11.
Suppose that we have a -approximation for for some scenario . We can efficiently determine that either is a lower bound on for every distribution , or is an optimal solution to for every distribution .
Proof.
We first show that for every , if contains any non-null scenario. Otherwise is an optimal solution to for every . Note that a non-null scenario must satisfy .
Say is a non-null scenario. Fix any . There is a feasible solution to (T$_{p,x}$) that sends at least flow to , i.e., . So , and so , since as is a non-null scenario. This holds for every .
If all scenarios in are null scenarios, then for all , since for all and . Hence, ; again, this holds for all .
We use the -approximation algorithm for to obtain a scenario . Therefore, we have . So if , then for all , which means that all scenarios in are null scenarios, and we return as an optimal solution. Otherwise, we return the lower bound . To see why is a valid lower bound, when , note that there are two cases. If contains a non-null scenario then we have established that is a lower bound. Otherwise, we have established that is an optimal solution; there is a feasible solution to (Tp,0) that sends at least to , so . ∎
Proof of Theorem 3.7.
We first apply Lemma 3.11 to either determine that is an optimal solution, or obtain a lower bound on . If Lemma 3.11 returns as an optimal solution, then we use Lemma 3.9 and to obtain a -approximate solution to (T). We return as the optimal solution, and as an estimate of , which is a suitable estimate due to Lemma 3.9 (ii).
So suppose Lemma 3.11 returns the lower bound . We run Algorithm with . By Lemma 3.9 (ii), we immediately obtain that for all .
We re-work the arguments in Lemma 4.5 from [35]. For , let denote the volume of . Let denote the volume of the unit ball (in the -norm) in . It is well known that for every (see, e.g., [14]).
Let be an optimal solution to . Recall that $\mu=\min\bigl\{1,\frac{\eta}{2K^{\prime}R}\bigr\}$. If for some (this includes the case when ), then Lemma 3.10 shows that . Otherwise consider the affine transformation defined by where is the identity matrix, and let , so is a shrunken version of . By properties of affine transformations, we have , where the last inequality follows since contains a ball of radius . For any , we have since ; so since has Lipschitz constant at most . The volume of the ball is . Therefore,
[TABLE]
So there must be a point that lies on a boundary of generated by a hyperplane . This implies (by Lemma 3.10) that
[TABLE]
where the last inequality follows since is a lower bound on . ∎
Proof of Lemma 3.9.
Part (ii) follows immediately from part (i) and the definition of . We focus on proving part (i). We consider the dual of (T), and show that a -approximation algorithm for yields an approximate separation oracle for the dual. The dual of (T) is as follows.
[TABLE]
Notice that (D) is an LP (since is fixed) with an exponential number of constraints, but a polynomial number of variables. It is evident that yields some type of approximate separation oracle for (D). Using a standard technique in approximation algorithms, we prove that (D), and the primal (T), can be solved approximately (see, e.g., [22, 12]).
Define . Note that is the smallest such that . We use to give an approximate separation oracle in the following sense. Given , we either show that , or we exhibit a hyperplane separating from . Thus, for a fixed , in polynomial time, the ellipsoid method either certifies that , or returns a point with . The approximate separation oracle proceeds as follows. We first check if and (5) hold, and if not, use the appropriate inequality as the separating hyperplane. Next, for every , we run for the point . If in this process, we ever obtain a scenario such that then we return as the separating hyperplane. Otherwise, for all and , we have
[TABLE]
This implies that .
It is easy to find an upper bound with polynomially bounded such that . For a given , we use binary search in to find such that the ellipsoid method when run for (with the above separation oracle), returns a solution with , and when run for certifies that . So . For , we obtain a polynomial-size certificate for the emptiness of . This consists of the polynomially many violated inequalities returned by the separation oracle during the execution of the ellipsoid method, and the inequality . By duality (or Farkas' lemma), this means that if we restrict (T) to only use the variables corresponding to (the polynomially-many) violated inequalities of type (4) returned during the execution of the ellipsoid method, we can obtain a polynomial-size feasible solution to (T) whose value is at least . If we take to be (so the binary search still takes polynomial time), this also implies that has value at least . ∎
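The binary search in the proof above can be pictured with a generic bracketing routine. This is a minimal sketch under stated assumptions: `is_nonempty` is a hypothetical stand-in for one run of the ellipsoid method with the approximate separation oracle (returning `True` when a point is found and `False` when emptiness is certified), and it is assumed to be monotone in the threshold and to succeed at `upper_bound`. None of these names come from the paper.

```python
def bracket_threshold(is_nonempty, upper_bound, tol):
    """Bracket the smallest threshold at which a monotone oracle succeeds.

    Assumes is_nonempty(v) is False below some unknown threshold and True at
    or above it, and that is_nonempty(upper_bound) is True. Returns (lo, hi)
    with hi - lo <= tol, where is_nonempty(hi) is True and lo is either the
    initial value 0 or a value at which the oracle reported emptiness.
    """
    lo, hi = 0.0, float(upper_bound)
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if is_nonempty(mid):
            hi = mid   # a point was found: the threshold is at most mid
        else:
            lo = mid   # emptiness certified: the threshold exceeds mid
    return lo, hi

# Toy oracle whose true threshold is 0.3 (hypothetical value).
demo_lo, demo_hi = bracket_threshold(lambda v: v >= 0.3, 1.0, 1e-3)
```

The loop halves the interval each round, so the number of oracle calls is logarithmic in `upper_bound / tol`, matching the remark that the search stays polynomial when the tolerance has polynomially many bits.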
3.2.1 Hardness results for the SAA problem
First, observe that for the DR 2-stage problem , where has polynomial-size support, if we set , then , so that computing is equivalent to the - problem .
Theorem 3.12.
Consider the DR 2-stage problem , where the support of is a polynomial-size subset of . Consider the following two settings.
- (B1) the -bounded setting with the metric;
- (B2) the unrestricted setting with scenario metric given by: for all ; for , we have if , and otherwise, where is an upper bound on .
Assume that , the -- problem , is NP-hard, and the optimum value of is at least . We have the following hardness results in both settings, assuming P ≠ NP.
- (a) No polytime multiplicative approximation is possible for computing , given as input.
- (b) By choosing suitably, the hardness result in (a) carries over to the problem of computing $\operatorname{E}_{A\sim\widehat{p}}\bigl[g(0,y,A)\bigr]$, given as input.
- (c) One can choose , so that the problem of computing is at least as hard as .
Proof.
Part (b) follows from part (a) by simply taking to be the distribution that puts a weight of on the scenario ; then $\operatorname{E}_{A\sim\widehat{p}}\bigl[g(0,y,A)\bigr]=g(0,y,\emptyset)$, so the hardness result in part (a) carries over. Let be an optimal solution to , and be its objective value.
Part (a).
We consider the setting (B1) first. Clearly, also seeks to find an optimum of . By exploiting the mixed-sign objective, we can argue that any multiplicative approximation would allow us to decide if by setting appropriately, which is NP-complete. More precisely, suppose we have a -approximation algorithm for . Then, we can decide if for a given number as follows. Set , and run the -approximation algorithm. If , then
[TABLE]
so the approximation algorithm would return a solution with positive value. If instead we have , then for every scenario with , we have . Since we also have , we conclude that , and so the approximation algorithm must return a solution with value [math]. So we can distinguish between and .
Now consider the setting (B2). Again, suppose we are given and we want to decide if . We may assume that , as otherwise the answer is yes. Again take . If , then scenario satisfies , so a multiplicative approximation for must return a solution with positive objective value. If , then we claim that , and so the approximation algorithm must return a solution with objective value 0. Thus, we can distinguish between and . To prove the claim, we have . For every , we have . For every , we have .
Part (c).
For the setting (B1), we simply set (and to be arbitrary). Then, we have , which is exactly the same as problem .
For the setting (B2), we set and take to be the distribution that puts weight of 1 on . We claim that is again the same as problem . Setting and everywhere else gives a feasible solution to (T) of objective value . Let be an optimal solution to (T). Let be the amount of flow sent by on pairs with . Let . The flow on the remaining pairs has volume , contributes at most to the objective, and has -cost . So we have and , which implies that $(\alpha+\theta)\bigl(\mathit{OPT}_{\Pi}-\frac{1}{2}\bigr)\leq 0$. Since by assumption, we have that , and hence has objective value . ∎
3.2.2 Refinements: formulating (Q) as a compact LP in special cases
We say that the set of scenarios is collapsible under the scenario metric if for every scenario , we can efficiently compute a polynomial-size collection of scenarios such that for every , , we have $g(x,y,A)=\max_{A^{\prime}\in\phi(A)}\bigl(g(x,A^{\prime})-y\cdot\ell(A,A^{\prime})\bigr)$. For example, if for a ground set , is the discrete scenario metric, and for all , , then is collapsible under since is attained by scenarios or , for all . We show that if is collapsible under then (Q) can be cast as a polytime-solvable LP, and its optimal solution can be rounded using an algorithm that is weaker than a local approximation algorithm. (Note also that in this special case, we have a simple, application-independent polytime algorithm for computing exactly.)
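The discrete-metric example of collapsibility can be checked by brute force on a toy instance. The sketch below fixes the first-stage point, so the second-stage cost is written as a function `g` of the scenario alone; it assumes a monotone cost (here a sum of hypothetical integer weights) and a nonnegative multiplier `y`, and compares the maximum over all scenarios with the maximum over the two candidates, the scenario itself and the full ground set. All names and values are illustrative.

```python
from itertools import combinations

def all_subsets(ground):
    """All subsets of the ground set, as frozensets."""
    items = sorted(ground)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def discrete_metric(A, B):
    """Discrete scenario metric: 0 if the scenarios coincide, 1 otherwise."""
    return 0 if A == B else 1

def robust_cost_bruteforce(g, y, A, ground):
    """max over every scenario B of g(B) - y * l(A, B)."""
    return max(g(B) - y * discrete_metric(A, B) for B in all_subsets(ground))

def robust_cost_collapsed(g, y, A, ground):
    """Same maximum restricted to the two candidates A and the full ground set."""
    full = frozenset(ground)
    return max(g(B) - y * discrete_metric(A, B) for B in (A, full))

# Hypothetical monotone cost: total weight of the elements in the scenario.
weights = {"a": 2, "b": 3, "c": 5}
toy_cost = lambda B: sum(weights[e] for e in B)
ground_set = frozenset(weights)
agree = all(
    robust_cost_bruteforce(toy_cost, y, A, ground_set)
    == robust_cost_collapsed(toy_cost, y, A, ground_set)
    for A in all_subsets(ground_set) for y in (0, 1, 4)
)
```

Monotonicity is what makes the full ground set the best choice among all scenarios at distance 1, so only two candidates need to be examined per scenario.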
A restricted local -approximation algorithm takes as input a point and a set of scenarios , and returns an integral solution and integral recourse actions for every (possibly specified implicitly), such that and $(\text{cost of }\widetilde{z}^{A})\leq\rho\, g(x,A)$ for all . (A local -approximation algorithm is a special case of this.) This weaker notion will be crucial for the Steiner-tree application in Section 3.3.
Theorem 3.13.
Suppose that is collapsible under the scenario metric , and is the optimal value of a polytime-solvable LP for all . Suppose that we have a polytime separation oracle for , and a restricted local -approximation algorithm for (Q). Then, in time, we can compute:
- (a) an optimal solution to , and its objective value ;
- (b) , and its objective value , satisfying .
Proof.
We reformulate as an LP. The dual of (T) is as follows.
[TABLE]
Since by assumption is collapsible under the scenario metric , the exponentially many constraints in (6) can be collapsed to the polynomially many constraints:
[TABLE]
Suppose that is captured by the polytime-solvable LP: , where is a polytope (over which we can optimize linear functions efficiently). Then, incorporating this in the above constraints, we obtain the following LP-formulation for .
[TABLE]
Since we have polytime separation oracles for the polytopes and , we can efficiently compute an optimal solution for (DR-LP) using the ellipsoid method. This proves part (a).
Part (b) follows from part (a) by applying the restricted local -approximation algorithm with the scenario set to round and obtain . As shown above, we can efficiently compute , and hence , by solving an LP. Observe that if is an optimal solution to (D), then satisfies constraints (7), which implies that . Since we also have , this implies . ∎
3.3 Applications to distributionally robust combinatorial optimization
We now apply our framework (i.e., Theorems 3.5 and 3.7) for handling general DR 2-stage problems to obtain the first approximation guarantees for the DR versions of various combinatorial-optimization problems (under the Wasserstein metric) such as set cover, vertex cover, edge cover, facility location, and Steiner tree. Except for set cover, our approximation factors are within $O(1)$ factors of the guarantees known for the deterministic counterparts of these problems. In order to apply Theorems 3.5 and 3.7 for a specific problem, we need to do the following.
1. Verify that properties (P1)–(P6) hold. This is usually quite immediate. (P1)–(P3) follow from the problem definition (in most cases , ), with being the maximum factor by which the cost of a first-stage action increases in the second stage. (P4), (P5) follow from prior work [35, 38] as the underlying 2-stage problem falls into the class of 2-stage programs considered therein. (P6) can usually be satisfied by taking , for a suitable upper bound on .
2. Furnish the following algorithms.
   - (a) An LP-relative -approximation algorithm for the deterministic counterpart, so as to round and obtain integral second-stage decisions: we simply plug in known approximation results.
   - (b) A local -approximation algorithm for the 2-stage problem: we have for set cover, vertex cover, and edge cover [35], and for facility location [35]. (For Steiner tree, we use Theorem 3.13 in place of Theorem 3.7; see below.)
   - (c) A -approximation algorithm for computing , where . This is a new component that we need to devise, whose design will depend on the scenario set and the scenario metric (and of course the underlying problem). For various problems, we show how to obtain such an approximation by building upon results known for -- problems. We defer the proof of Theorem 3.14 to the end of this section (Section 3.3.6).
Theorem 3.14.
For the -bounded setting with being the discrete metric, for any , we can obtain -approximation algorithms for computing , where is: (a) for set cover; (b) for vertex cover; and (c) for edge cover.
Theorems 3.5 and 3.7 then show that, for any , we can obtain a solution to the distributionally robust discrete 2-stage problem (i.e., integral first- and second-stage decisions) of cost at most $4\alpha\rho\beta_{1}\beta_{2}\bigl(1+O(\varepsilon)\bigr)$ times the optimum in $\operatorname{\mathsf{poly}}\bigl(\mathcal{I},\frac{\lambda}{\varepsilon}\bigr)$ time (and hence, sample complexity).
In certain cases, we can obtain improved guarantees by exploiting the fact that the fractional SAA problem, , can be solved in a better way, without resorting to a local approximation algorithm. The most generic such setting is the unrestricted setting when the scenario collection is collapsible under the scenario metric. This includes the following natural choices of the scenario metric.
Lemma 3.15.
Suppose that for all , and all , we have . Then the collection of scenarios is collapsible under: (i) the discrete metric ; and (ii) the asymmetric metric , where is a metric on .
Proof.
Let be an arbitrary scenario. If is the discrete metric , we take . If is the asymmetric metric , we take , where is the set of all distances between two elements of the ground set. Note that in both settings, if we choose an arbitrary pair , the collection of scenarios contains the (unique) maximal solution for the constrained problem (3.25). By the monotonicity property of the second-stage costs imposed in the lemma statement, is optimal for (3.25). By Lemma 3.25, it follows that contains an optimal solution for the unconstrained problem for every pair , and so is collapsible under . ∎
The condition on in Lemma 3.15 holds for all our applications, since they are covering problems. Thus, in the unrestricted setting with Wasserstein metric corresponding to the scenario metrics in Lemma 3.15, Theorem 3.13 combined with Theorem 3.5 yields an improved $4\alpha\rho\bigl(1+O(\varepsilon)\bigr)$-approximation, using a restricted local -approximation algorithm, a weaker requirement that is crucial for Steiner tree. There are other, orthogonal benefits that result from achieving a better approximation for the fractional SAA problem than that given by Theorem 3.7. These require taking a different route than Theorem 3.5 to transfer approximation guarantees from the SAA problem to the original problem. We discuss these in the context of the specific problems to which they apply.
3.3.1 Set cover
The DR version was defined in Section 2. Recall that an instance is given by $\bigl(U,\mathcal{S},\{c_{S},c^{\mathrm{II}}_{S}\}_{S\in\mathcal{S}}\bigr)$, where and denote the first- and second-stage costs respectively. Let . We have , and . Different scenarios could be quite unrelated, so there does not seem to be a natural choice for other than the discrete metric ; we therefore consider the -metric. We can take . Instantiating the above results yields an -approximation in the unrestricted setting, and an -approximation in the -bounded setting (using Theorem 3.14 (a)). But we can do better and improve these guarantees by an factor.
By incorporating a decoupling idea of [35] in our ellipsoid-based algorithm (in a manner similar to [11] in their work on 2-stage robust set cover), we can avoid the use of a local approximation algorithm in Algorithm , and instead use a -approximation algorithm for more directly.
Theorem 3.16.
Consider the fractional SAA problem: . Suppose that we have a -approximation algorithm for for any . For any , in $\operatorname{\mathsf{poly}}\bigl(\mathcal{I},\log(\frac{1}{\varepsilon})\bigr)$ time, we can compute , and an estimate of , satisfying and .
We complement Theorem 3.16 with an analogue of Theorem 3.5, to transfer approximation guarantees from the fractional SAA problem, , to the original fractional problem, .
Note that by Lemma 3.11, we can find in polytime (under very mild assumptions) a lower bound (independent of ) on the optimal value of such that $\log\bigl(\frac{1}{\mathsf{LB}}\bigr)=\operatorname{\mathsf{poly}}(\mathcal{I})$, or determine if is an optimal solution to for every distribution . In the latter case, there is nothing to be done, so assume otherwise.
Theorem 3.17.
Let , . Let (Q): , be the fractional version of a DR problem satisfying properties (P1)–(P6). Let be a lower bound on for all . Consider $k=\frac{2}{\varepsilon}\log\bigl(\frac{1}{\delta}\bigr)$ SAA problems with objective functions , for , where each is an empirical estimate of constructed using $N=\operatorname{\mathsf{poly}}\bigl(\frac{\lambda}{\varepsilon},\log(\frac{\tau R}{V\mathsf{LB}}),\log(\frac{1}{\delta})\bigr)$ independent samples. Suppose that for every , we have a solution and an estimate of satisfying and (where ). Let and . Then, $h(\mathring{p}\,;\bar{x})\leq 4\overline{\beta}\rho\bigl(1+O(\varepsilon)\bigr)\cdot\min_{x\in\mathcal{P}}h(\mathring{p}\,;x)$ with probability at least .
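The selection step in Theorem 3.17 (solve several independent SAA problems and keep the solution whose estimate is smallest, as the proof uses a minimizer of the estimates) can be sketched as follows. This is an illustration only: the function names are hypothetical, the logarithm is assumed to be natural, and the count is rounded up to an integer, none of which is specified in the surviving text.

```python
import math

def num_saa_problems(eps, delta):
    """k = (2 / eps) * log(1 / delta), rounded up.

    Assumption: natural logarithm and rounding up; the theorem statement
    gives the expression without fixing either.
    """
    return math.ceil((2.0 / eps) * math.log(1.0 / delta))

def select_solution(solutions, estimates):
    """Return the solution (and its estimate) with the smallest estimate."""
    best = min(range(len(estimates)), key=lambda i: estimates[i])
    return solutions[best], estimates[best]

# Hypothetical outputs of three SAA runs.
chosen, chosen_estimate = select_solution(["x1", "x2", "x3"], [3.0, 1.5, 2.0])
```

Repeating the SAA construction and taking the minimum estimate is what boosts the success probability, which is why the number of repetitions grows with log(1/delta).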
Before proving Theorems 3.16 and 3.17, we state the results that follow from these (and other prior results). Combining Theorems 3.13 (a) and 3.17, and a local -approximation algorithm (where ), we obtain an -approximation in the unrestricted setting. Combining Theorems 3.14 (a), 3.16, and 3.17, and a local -approximation algorithm, we obtain an -approximation in the -bounded setting.
Proof of Theorem 3.17.
The proof follows by suitably discretizing and applying Theorem 3.5 to the discretized version of . By Lemma 3.8, for every distribution , we have that the Lipschitz constant of is at most , and . Recall that by (P3), is contained in the ball , and contains a ball of radius such that $\ln\bigl(\frac{R}{V}\bigr)=\operatorname{\mathsf{poly}}(\mathcal{I})$. We discretize as in [38]. Let , and consider the grid . (Footnote 5: Note that needs to be a part of the specification of the grid size; otherwise, a "flat" could evade the grid across arbitrarily large distances.) As shown in [38], we have: (i) $|\mathcal{G}|\leq\bigl(\frac{2R}{\Delta}\bigr)^{m}$, and so $\log|\mathcal{G}|=\operatorname{\mathsf{poly}}\bigl(\mathcal{I},\log(\frac{1}{\varepsilon\cdot\mathsf{LB}})\bigr)$; and (ii) for any , letting denote the point in closest to in Euclidean distance, we have $\bigl\|x-\phi(x)\bigr\|\leq\frac{\varepsilon\mathsf{LB}}{K^{\prime}}$, and hence, $\bigl|h(p\,;x)-h(p\,;\phi(x))\bigr|\leq\varepsilon\mathsf{LB}$.
Let , the number of samples used to construct each empirical estimate , be as given by Theorem 3.5, when we apply it taking to be the grid (i.e., we are considering the DR 2-stage problem ) and . Note that properties (P1)–(P6) hold for this DR problem (since by assumption they hold for the DR problem ).
To apply Theorem 3.5 with , we also need to supply the points and the estimates as required by the theorem statement. We set , and for all . We show that these satisfy properties (S1) and (S2) in the statement of Theorem 3.5, with . To see this, consider any . We have
[TABLE]
and since , we have . Moreover, the index , which is a minimizer of the estimates, is also a minimizer for the new estimates . So applying Theorem 3.5, we obtain that with probability at least ,
[TABLE]
Note that . Therefore, we have
[TABLE]
Proof of Theorem 3.16.
Let $\bigl(U,\mathcal{S},\{c_{S},c^{\mathrm{II}}_{S}\}_{S\in\mathcal{S}}\bigr)$ be the DR set cover instance being solved. For any point , let be the set of elements covered to an extent of at least by the first-stage sets.
The improvement comes from a better way of generating a cut passing through the center of the current ellipsoid, when . Instead of rounding to using a local -approximation algorithm and using approximate solutions to to generate a suitable cut at in step A2.b) of Algorithm , we do the following. Since elements in are mostly covered by , and the remaining elements are mostly uncovered, intuitively only these remaining elements should matter. Indeed, we argue that approximate solutions to $\max_{A^{\prime}\in\mathcal{A}}\bigl(g(0,A^{\prime}\setminus S_{\bar{x}})-y\cdot\ell(A,A^{\prime})\bigr)$ can be used to obtain a suitable cut at . Note that this problem can be cast as for a modified instance where we add to our set-system, with costs . Thus, we avoid the -factor loss that was incurred earlier due to the local approximation.
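The set of elements that a fractional first-stage point already covers to a given extent can be computed directly from the definition above. The sketch below is illustrative: the coverage threshold did not survive in the text, so it is left as an explicit parameter `threshold` rather than guessed, and the data layout (a dict from set names to member sets, and a dict of fractional values) is an assumption.

```python
def covered_elements(universe, sets, x, threshold):
    """Elements whose total fractional coverage under x is at least threshold.

    universe  : iterable of elements
    sets      : dict mapping a set name to the collection of elements it contains
    x         : dict mapping a set name to its fractional first-stage value
    threshold : required extent of coverage (left as a parameter on purpose)
    """
    result = set()
    for e in universe:
        # Coverage of e is the sum of x over all sets that contain e
        coverage = sum(x[name] for name, members in sets.items() if e in members)
        if coverage >= threshold:
            result.add(e)
    return result

# Hypothetical toy instance.
toy_sets = {"S1": {1, 2}, "S2": {2, 3}}
toy_x = {"S1": 0.5, "S2": 0.25}
mostly_covered = covered_elements({1, 2, 3}, toy_sets, toy_x, threshold=0.5)
```

Elements outside this set are the ones the modified cut-generation step still has to pay for in the second stage.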
Consider the following LP.
[TABLE]
We prove analogues of Lemmas 3.9 and 3.10 showing that one can compute an approximate solution to (W) using an approximation algorithm for (Lemma 3.18 (i)), which allows us to both approximate for a related point (Lemma 3.18 (ii)), and obtain a suitable cut passing through (Lemma 3.19).
Lemma 3.18.
Let and . Suppose we have a -approximation algorithm for for all . Then, (i) we can compute a -approximate solution to (W); (ii) hence, letting , we have .
Proof.
Consider the instance of DR set cover obtained from the original instance $\bigl(U,\mathcal{S},\{c_{S},c^{\mathrm{II}}_{S}\}_{S\in\mathcal{S}}\bigr)$ by adding the set to , with costs . Let denote the second-stage costs for this new instance of DR set cover. Note that, for every scenario , we have . Therefore, if we were to write the LP () for this modified instance of DR set cover (i.e., () with substituted by ), we would obtain (W). This means that we can obtain a -approximate solution to (W) by applying Lemma 3.9 (i) to the modified instance (using the -approximation algorithm for given to us, also applied to the modified instance). This proves (i).
To prove (ii), let be an optimal solution of (T). We obtain
[TABLE]
The first inequality follows because and, for every scenario , we have . The latter inequality holds because every feasible fractional second-stage solution for scenario with as the first-stage solution, covers all elements of fully, and hence, combined with , fully covers all elements of ; therefore, it yields feasible fractional second-stage actions for scenario given the first-stage actions . The second inequality above follows because is a -approximate solution for (W). The final inequality uses the fact that . ∎
Lemma 3.19.
Let and . Let be a -approximate solution to the LP (W), and let . If is such that , then $h(\widehat{p}\,;x^{\prime})\geq\frac{1}{2}\bigl(2c^{\intercal}\bar{x}+\sum_{(A,A^{\prime})\in\mathcal{A}^{\mathrm{sup}}\times\mathcal{A}}\gamma_{A,A^{\prime}}g(0,A^{\prime}\setminus S_{\bar{x}})\bigr)\geq\frac{1}{2\beta}\cdot h(\widehat{p}\,;\widetilde{x})$.
Proof.
Consider the function defined over . Note that is feasible for the LP (T) and for every scenario , which implies . By mimicking the proof of Lemma 3.8, we have that is a subgradient of at . So
[TABLE]
Now, note that for every scenario , we have . This is because if is a feasible second-stage solution to scenario given as the first-stage actions, then it covers elements of to an extent of at least , and so is a feasible second-stage solution for given [math] as the first-stage actions. So we obtain
[TABLE]
where the last inequality follows from Lemma 3.18 (ii). ∎
We now exploit Lemmas 3.18 and 3.19 to obtain Theorem 3.16. We do so by mimicking the proof of Theorem 3.7, and pointing out the changes to Algorithm and its analysis. Let be a -approximation algorithm for for all . As before, we start by using Lemma 3.11, either certifying that is an optimal solution to (Q) (in which case we return , and an estimate of computed via Lemma 3.9), or that , where . Suppose we are in the latter case. We run Algorithm with parameter , but modify step A2.b) as follows.
- If , let . Use Lemma 3.18 and to obtain a -approximate solution to (W) (which has polynomial-size support). Define , and . If , then return and . Otherwise, let denote the halfspace . Set , and .
By Lemma 3.18 (ii), we immediately obtain that for all . Let be an optimal solution to . We show that there exists an index such that . We have two cases to consider.
- Case 1: we have for some (this includes the case where ). Then Lemma 3.19 shows that .
- Case 2: we have for all . In this case, as argued in the proof of Theorem 3.7, we can show that there must be a point such that and for some . Using Lemma 3.19 again, we obtain $\widetilde{f}_{l}\leq 2\cdot h(\widehat{p}\,;x^{\prime})\leq 2\bigl(h(\widehat{p}\,;x^{*})+\eta\bigr)=2\cdot h(\widehat{p}\,;x^{*})+2\varepsilon\cdot\mathsf{LB}\leq 2(1+\varepsilon)h(\widehat{p}\,;x^{*})$. ∎
3.3.2 Vertex cover
This is the special case of set cover where we want to cover edges of a graph by vertices, and we again consider the -metric. We have , , so we obtain approximation factors of $\bigl(4\rho+O(\varepsilon)\bigr)=\bigl(16+O(\varepsilon)\bigr)$ in the unrestricted setting (using Theorems 3.13 (a) and 3.17), and $\bigl(4\rho\alpha\cdot\frac{2e}{e-1}+O(\varepsilon)\bigr)=\bigl(101.25+O(\varepsilon)\bigr)$ in the -bounded setting (via Theorems 3.14 (b), 3.7, and 3.5).
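The two constants quoted for vertex cover can be sanity-checked numerically. In the sketch below, the value 4 for the local approximation factor is read off from the stated equality 4ρ = 16, and the value 2 for the LP-relative factor is the one that makes the second stated equality come out to about 101.25; both are inferences from the displayed formulas, not values quoted from the surviving text, and the function names are illustrative.

```python
import math

def unrestricted_factor(rho):
    """Leading constant 4 * rho of the unrestricted-setting guarantee."""
    return 4 * rho

def bounded_factor(rho, alpha):
    """Leading constant 4 * rho * alpha * 2e / (e - 1) of the bounded-setting guarantee."""
    return 4 * rho * alpha * (2 * math.e / (math.e - 1))

# Inferred values (assumptions): rho = 4 from 4 * rho = 16, alpha = 2 to match 101.25.
vc_unrestricted = unrestricted_factor(4)
vc_bounded = bounded_factor(4, 2)
```

With these inputs the bounded-setting constant evaluates to roughly 101.25, agreeing with the figure in the text up to rounding.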
3.3.3 Edge cover
This is the special case of set cover where we want to cover vertices of a graph by edges, and we again consider the -metric. We have , , so we obtain approximation factors of $\bigl(12+O(\varepsilon)\bigr)$ in the unrestricted setting (via Theorems 3.13 (a) and 3.17), and $\bigl(36+O(\varepsilon)\bigr)$ in the -bounded setting (via Theorems 3.14 (c), 3.7, and 3.5).
3.3.4 Facility location
The DR version () was defined in Section 2. Recall that an instance is given by the tuple $\bigl(\mathcal{F},\mathcal{C},\{w_{ij}\}_{i,j\in\mathcal{F}\cup\mathcal{C}},\{f_{i},f^{\mathrm{II}}_{i}\}_{i\in\mathcal{F}}\bigr)$, where , are the facility and client-sets respectively, is the underlying metric, and are the first- and second-stage facility-opening costs. We have [25]. Shmoys and Swamy [35] showed that an LP-relative -approximation for deterministic FL having a certain "demand-obliviousness" property can be turned into a -approximation algorithm for 2-stage FL. If the -approximation algorithm has the property that it returns a solution where every cost component of the rounded solution (i.e., the facility cost, and each client's assignment cost) is at most times the corresponding cost component of the fractional solution, then the resulting algorithm is a local approximation algorithm. Using the deterministic -approximation algorithm of [36] gives a local -approximation with .
As noted in Section 2, besides the discrete scenario metric, we could define various other natural scenario metrics here in terms of the metric and obtain a rich class of DR models under the Wasserstein metric. We consider one such setting: the asymmetric metric given by .
Theorem 3.20.
For with being either the discrete metric or the asymmetric metric , there is a -approximation for computing in the -bounded setting, for any .
For the Wasserstein metric with respect to both the discrete metric and , we can take $\tau=\bigl(\sum_{i\in\mathcal{F}}f^{\mathrm{II}}_{i}+\sum_{i\in\mathcal{F},j\in\mathcal{C}}w_{ij}\bigr)/(\min_{i,j:w_{ij}>0}w_{ij})$. We obtain the following approximation guarantees for with the Wasserstein metric corresponding to the above scenario metrics: (i) $\bigl(4\rho+O(\varepsilon)\bigr)=\bigl(21.96+O(\varepsilon)\bigr)$ in the unrestricted setting (using Theorems 3.13 (a) and 3.17); and (ii) $\bigl(24\rho\alpha+O(\varepsilon)\bigr)=\bigl(196+O(\varepsilon)\bigr)$ in the -bounded setting (using Theorems 3.20, 3.7, and 3.5).
Proof of Theorem 3.20.
Fix , where . Fix to be either the discrete scenario metric or the asymmetric metric . Since takes polynomially-many values, by Lemma 3.25 (i), it suffices to give a -approximation for the constrained problem (3.25): .
With both scenario metrics, this amounts to approximating the -- fractional facility location problem for an underlying facility-location instance $\bigl(\mathcal{F},\mathcal{C}^{\prime},\{w_{ij}\}_{i,j\in\mathcal{F}\cup\mathcal{C}^{\prime}},\{\widetilde{f}_{i}\}_{i\in\mathcal{F}}\bigr)$, where if , and is otherwise. If and , then (if , the optimum of the constrained problem is ); if , then .
A -approximation algorithm for -- facility location.
We now devise an algorithm for the -- fractional facility-location problem corresponding to a facility-location instance (such as the one obtained above) $\bigl(\mathcal{F},\mathcal{C}^{\prime},\{w_{ij}\}_{i,j\in\mathcal{F}\cup\mathcal{C}^{\prime}},\{\widetilde{f}_{i}\}_{i\in\mathcal{F}}\bigr)$.
Khandekar et al. [23] give a -approximation for the version of -- integral FL, where a scenario may place an arbitrary number of co-located clients at a location in (and the total number of clients must be at most ). (Footnote 6: Since the gap between the integral and fractional optimal values for FL is at most [25], a -approximation for the integral (resp. fractional) version implies an -approximation for -- fractional (resp. integral) facility location.) However, in our setting, we may place at most one client at any location in , so the algorithm in [23] does not work for our purposes. (Clearly, our setting is more general, since we can encode the scenario-setting of [23] by creating co-located copies at every .) As noted earlier, we can model more-general settings, where clients have (integer) demands, by creating a fixed number of co-located clients at locations in ; but, here again, we have a constraint that limits the number of co-located clients at any .
We therefore need to develop new techniques to devise an approximation algorithm for -- fractional FL. The key tool that we exploit here is that of cost-sharing schemes. We uncover a novel connection between cost-sharing schemes and -- problems by demonstrating that one can exploit a cost-sharing scheme for FL having certain properties to obtain an approximation algorithm for -- {integral, fractional} FL. Our result also improves the approximation factor for -- integral FL from to .
A cost-sharing method is a function , where for , intuitively gives the contribution of towards the cost incurred in satisfying the client-set (i.e., the cost of opening facilities and assigning clients in to these open facilities). Pál and Tardos [28] devised a cost-sharing method satisfying the following properties. For sets , define .
1. if .
2. (Competitiveness) For every , we have .
3. (Cost-recovery) For every , we have .
4. (Cross-monotonicity) For all and every client , we have .
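Cross-monotonicity, in its standard form, says that a client's share never increases when the client set grows. The brute-force checker below illustrates that standard definition on a toy scheme; it is not the Pál–Tardos method, and the names `is_cross_monotonic` and `equal_split_share`, the single-facility cost model, and all numeric values are assumptions made for this sketch.

```python
from itertools import combinations

def is_cross_monotonic(xi, clients, tol=1e-12):
    """Check xi(S, j) >= xi(T, j) for all nonempty S within T and all j in S."""
    items = sorted(clients)
    subsets = [frozenset(c) for r in range(1, len(items) + 1)
               for c in combinations(items, r)]
    for S in subsets:
        for T in subsets:
            if S <= T:
                for j in S:
                    if xi(S, j) < xi(T, j) - tol:
                        return False
    return True

# Toy scheme (assumption): one facility with opening cost 6 is split equally
# among the clients present, and each client also pays its own distance.
OPENING_COST = 6.0
DISTANCE = {"u": 1.0, "v": 2.0, "w": 0.5}

def equal_split_share(S, j):
    """Share of client j in client set S under the toy equal-split scheme."""
    return OPENING_COST / len(S) + DISTANCE[j] if j in S else 0.0

toy_ok = is_cross_monotonic(equal_split_share, DISTANCE)
# A scheme whose shares grow with the set size violates the property.
toy_bad = is_cross_monotonic(lambda S, j: float(len(S)), DISTANCE)
```

The check enumerates all pairs of nested subsets, so it is only meant for tiny instances; its purpose is to make the direction of the inequality concrete.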
We will prove an additional useful property about , for which we very briefly describe how is computed. For every and , we compute a certain time . The cost-share of a client is then defined as . The function satisfies the following property: for every set , every client , and every facility , we have . Further, if this inequality is strict, then .
Lemma 3.21.
Consider and two clients and . Then $\xi(S+j_{2},j_{1})\geq\min\bigl\{\xi(S,j_{1}),\xi(S+j_{2},j_{2})\bigr\}$.
Proof.
By cross-monotonicity, we have . If this holds at equality, then the result follows immediately. So assume otherwise. By the way in which the cost-shares are defined, implies that for some facility and . This implies that , and it follows that . ∎
We may assume that (otherwise, we simply set ). Consider the following simple greedy algorithm. Initialize , . For , we find , and set .
Let be such that . We claim that . This will complete the proof since this implies that
[TABLE]
In fact, [28] show a stronger form of cost-recovery, namely, that there is an integer solution feasible for scenario given first-stage decisions such that $\xi(S,S)\geq\bigl(\text{cost of }z^{S}\bigr)/3$ for every , and using this in the above chain of inequalities shows that yields a -approximation also for -- integral facility location.
We now prove the above claim. For any , we show that for all , where . We prove this by induction on . Note that due to cross-monotonicity, and since . The statement is clearly true for . Suppose this is true for index , and consider index . Consider any . Let be the element added to in iteration . By definition, . If , then , where the second inequality follows from the induction hypothesis. Thus, for every , we have . This completes the induction step.
Therefore, by repeatedly using cross-monotonicity, we have
[TABLE]
The first inequality follows from the statement proved in the previous paragraph; the second is simply because we restricted to ; the third follows from cross-monotonicity; the fourth is because we replaced by an average and all cost shares are nonnegative; the fifth is because ; and the last inequality is again due to cross-monotonicity. ∎
3.3.5 Steiner tree
The DR version () was defined in Section 2. Recall that an instance is given by \bigl{(}G=(V,E),c,s,\lambda\bigr{)}, where is a metric, is the root, and are the costs of buying edge in stages I and II respectively.
We do not have a local approximation algorithm for , but there is a restricted local -approximation algorithm for a monotone version of , wherein we require that in every scenario , the path from each node to the root consists of a segment starting at comprising edges bought in scenario , followed by a segment ending at comprising first-stage edges. (Thus, in effect, the first-stage edges should form a tree containing .) This monotonicity property was stipulated by [16, 6] in the context of 2-stage {stochastic, robust} Steiner tree respectively, where they show that imposing this condition only incurs a factor- loss. We argue that the same holds in the DR setting. Thus, by utilizing the restricted local -approximation algorithm devised by [19] for this monotone 2-stage Steiner tree problem in Theorem 3.13, and the well-known LP-relative -approximation for Steiner tree, we obtain the following results for the unrestricted setting.
Theorem 3.22.
admits a -approximation algorithm in the unrestricted setting with the scenario metrics and (defined with respect to the metric on ).
Proof of Theorem 3.22
For , the discrete first-stage action set is . We first show that imposing the monotonicity condition incurs a factor- loss for the DR problem. Recall that the monotonicity condition states that in every scenario , the path from a node to the root consists of a segment of second-stage edges starting at followed by a segment of first-stage edges ending at ; we call such a path a monotone path. For , we say that contains a - path (respectively a monotone - path) if contains a - path (respectively a monotone - path). We want to compare the following two DR 2-stage Steiner tree problems.
[TABLE]
Lemma 3.23 ([6]).
For every first-stage decision , there exists such that and for every set .
Corollary 3.24.
Consider the DR problems () and () for an arbitrary scenario collection . If is an -approximate solution to (), then it is a -approximate solution to ().
Proof.
By applying Lemma 3.23 to an optimal solution to (), we infer that . Note that for every scenario , we have by definition. It follows that the objective value of in () is no larger than its objective value in (), which by assumption is at most . ∎
Gupta et al. [16] consider the following integer program (IP) for . For notational simplicity, we assume that ; clearly, this can always be ensured without changing the problem. We have variables to indicate the edges bought in stage II. To encode the requirement that there is a monotone - path for every , we bidirect the edges to obtain the set of arcs , and use flow variables and to specify the segments of 's path comprising first-stage and second-stage edges. For a vertex , let (respectively ) denote the arcs of entering (respectively leaving) . For an arc , we abuse notation and use to denote the component of corresponding to the undirected version of .
[TABLE]
Constraints (8) and (9) enforce that sends one unit of flow from to for every terminal (so it dominates a directed path), and (10) enforces that this flow is supported on edges bought in stages I and II. Constraints (11) encode the monotonicity requirement on the - path.
Letting denote the optimal value of the LP-relaxation obtained by relaxing the integrality constraints (12), (13) to nonnegativity constraints, the DR 2-stage Steiner problem (with fractional second-stage decisions) we consider is: \min\ \bigl{(}h({\mathring{p}}\,;{x}):=c^{\intercal}x+\max_{q:L_{\mathrm{W}}(\mathring{p},q)\leq r}{\textstyle\operatorname*{E}_{A\sim q}}\bigl{[}g(x,A)\bigr{]}\bigr{)}; we call this monotone . By the discussion in the beginning of Section 3.3, properties (P1)–(P6) hold for monotone , setting and .
Recall that we are in the unrestricted setting (so ), and is the Wasserstein metric with respect to the discrete scenario metric or the asymmetric metric . The set of scenarios is collapsible under both these scenario metrics by Lemma 3.15. Gupta et al. [16] presented a restricted local -approximation algorithm for monotone , and the approximation factor was improved to by [19]. Therefore, utilizing Theorems 3.5 and 3.13, taking and (and in Theorem 3.5), we obtain an \bigl{(}80+O(\varepsilon)\bigr{)}-approximation for (). This yields a \bigl{(}160+O(\varepsilon)\bigr{)}-approximation for (using Corollary 3.24). ∎
3.3.6 Proof of Theorem 3.14
We first give a reduction, showing that one can approximate under very general settings provided that we have a (standard) approximation algorithm for a certain constrained problem.
Lemma 3.25.
Let be any scenario set, and be any function satisfying for all . Fix , and scenario . Consider the constrained problem:
[TABLE]
Suppose that we have a -approximation algorithm for (3.25). Let .
(i) We can compute a -approximation to using calls to .
(ii) For any , we can compute a -approximation to using O\bigl{(}\log_{1+\varepsilon}(\frac{\ell_{\max}}{\ell_{\min}})\bigr{)} calls to , where and .
Proof.
The proof is based on a standard idea of enumerating over all values. For , let denote the scenario output by for (3.25).
For part (i), we do the following. We compute for all . Let \mu^{*}:=\operatorname{argmax}_{\mu\in\mathcal{L}}\bigl{(}g(x,A_{\mu})-y\cdot\ell(A,A_{\mu})\bigr{)}. We return . To show that this yields a -approximation for computing , consider any , and let . We have
[TABLE]
The first inequality follows from the definition of , and the second follows since is a -approximate solution for ().
For part (ii), we enumerate values in in powers of (1+\varepsilon). More precisely, define \overline{\mathcal{L}}:=\{0\}\cup\bigl{\{}(1+\varepsilon)^{i}\ell_{\min}:i=0,\dots,\left\lceil\log_{1+\varepsilon}{\frac{\ell_{\max}}{\ell_{\min}}}\right\rceil\bigr{\}}. Note that |\overline{\mathcal{L}}|=O\bigl{(}\log_{1+\varepsilon}({\frac{\ell_{\max}}{\ell_{\min}}})\bigr{)}. We now compute for all . Let \mu^{*}:=\operatorname{argmax}_{\mu\in\overline{\mathcal{L}}}\bigl{(}g(x,A_{\mu})-y\cdot\ell(A,A_{\mu})\bigr{)}. We return . Consider any . By construction of , there is some such that . Again, by the definition of , and since is a -approximate solution for (), we have
[TABLE]
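The enumeration in part (ii) can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `geometric_grid`, `pick_best`, `oracle`, and `score` are ours, `oracle(mu)` is a hypothetical stand-in for the approximation algorithm for the constrained problem, and `score` is a hypothetical stand-in for the map A_{\mu} \mapsto g(x,A_{\mu})-y\cdot\ell(A,A_{\mu}).

```python
import math

def geometric_grid(l_min, l_max, eps):
    """Return [0, l_min, (1+eps)*l_min, ...], stopping at the first power reaching l_max."""
    if not (0 < l_min <= l_max) or eps <= 0:
        raise ValueError("need 0 < l_min <= l_max and eps > 0")
    # Largest exponent: ceil(log_{1+eps}(l_max / l_min)).
    top = math.ceil(math.log(l_max / l_min) / math.log(1 + eps))
    return [0.0] + [l_min * (1 + eps) ** i for i in range(top + 1)]

def pick_best(grid, oracle, score):
    """Call the (hypothetical) oracle once per grid value; keep the best-scoring output."""
    candidates = [oracle(mu) for mu in grid]
    return max(candidates, key=score)
```

The grid has one point per power of (1+\varepsilon), so the number of oracle calls matches the bound on |\overline{\mathcal{L}}| stated above.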
We now consider the setting in Theorem 3.14, namely, the -bounded setting with being the discrete metric, i.e., for some ground set , and if , and [math] otherwise.
Fix and a scenario . By Lemma 3.25, it suffices to give an approximation algorithm for the constrained problem (3.25). When , the optimum of the constrained problem is simply (which is easy to compute), and otherwise, the constrained problem simplifies to . So it suffices to obtain a -approximation to this latter problem, which is what we focus on in the sequel.
Part (a) of Theorem 3.14.
Gupta et al. [17] give an -approximation algorithm for -- set cover, wherein the goal is to choose a set so as to maximize the cost of an optimal integral set-cover for . It is implicit in their analysis (see Theorem 4.2 and Claim 4.3 in [17]; Theorem 4.2 proves that the optimal fractional cost of the set-cover instance is at most ) that this also yields an -approximation for -- fractional set cover, where we seek to maximize the cost of an optimal fractional set cover.
This immediately implies an -approximation for as follows. Consider the set cover instance with ground set , and set-costs given by if , and otherwise. The -- fractional set cover for this instance is precisely the problem . So we obtain an -approximation to .
Part (b) of Theorem 3.14.
The problem can be viewed as -- fractional vertex cover, where the cost of a vertex is [math] if , and otherwise. Feige et al. [11] give a -approximation algorithm for -- fractional vertex cover, so we obtain a \bigl{(}\frac{2e}{e-1},1\bigr{)}-approximation for .
Part (c) of Theorem 3.14.
The problem can be viewed as -- fractional edge cover, where the cost of an edge is [math] if , and otherwise. Feige et al. [11] give a -approximation algorithm for -- fractional edge cover, so we obtain a -approximation for . ∎
4 Distributionally robust problems under the L_{\infty}-metric
We now focus on the DR 2-stage problem (Q), and its fractional relaxation (Q), in the unrestricted setting (so , for some ) when is the L_{\infty}-metric. Note that since the L_{\infty}-distance between two probability distributions is at most , we can assume without loss of generality that . We devise an algorithm that, given any , runs in time \operatorname{\mathsf{poly}}\bigl{(}\mathcal{I},\frac{\lambda}{r\varepsilon}\bigr{)}, and returns a \bigl{(}2+O(\varepsilon)\bigr{)}-approximate solution to the fractional relaxation (Q). Combining this with a local -approximation algorithm, we obtain a -approximation for the DR discrete 2-stage problem (i.e., with discrete first- and second-stage actions). This leads to the first guarantees for the DR versions of set cover, vertex cover, edge cover, and facility location under the L_{\infty}-metric (Theorem 4.2).
At a high level, our approach is as follows. We first show how to obtain a suitable convex proxy function that is pointwise close to the objective function so that one can cast the problem of minimizing as a standard 2-stage problem. The SAA approach would require us to move to an SAA-version of with a polynomial-size central distribution, show that a near-optimal solution to the SAA problem translates to a near-optimal solution to the original problem, and finally show how to approximately solve the SAA problem (which is again challenging since this does not reduce to a polynomial-size LP). It is simpler to directly solve the proxy problem, , using the approximate-subgradient based machinery in [35]. We show that, under the assumption that for all , , which holds for all our applications, one can compute an -subgradient of efficiently in time \operatorname{\mathsf{poly}}\bigl{(}\mathcal{I},\frac{\lambda}{\omega}\bigr{)}, and hence can directly use the ellipsoid-based approach in [35] to obtain a solution such that h^{\mathrm{pr}}({\mathring{p}}\,;{\bar{x}})\leq\bigl{(}1+O(\varepsilon)\bigr{)}\min_{x\in\mathcal{P}}h^{\mathrm{pr}}({\mathring{p}}\,;{x})+\eta. This in turn implies that h({\mathring{p}}\,;{\bar{x}})\leq\bigl{(}2+O(\varepsilon)\bigr{)}\min_{x\in\mathcal{P}}h({\mathring{p}}\,;{x})+\eta. We can fold the additive error into the multiplicative error by obtaining a lower bound on the optimum.
Theorem 4.1.
Let . Suppose that for all , and all , we have . In the unrestricted setting under the L_{\infty} metric, we can compute a solution satisfying h({\mathring{p}}\,;{x})\leq\bigl{(}2+O(\varepsilon)\bigr{)}\min_{x\in\mathcal{P}}h({\mathring{p}}\,;{x}) with probability at least , in time \operatorname{\mathsf{poly}}\bigl{(}\mathcal{I},\frac{\lambda}{\varepsilon r},\log(\frac{1}{\delta})\bigr{)}.
Theorem 4.2.
We obtain the following approximation factors for the DR discrete 2-stage problems in the unrestricted setting under the L_{\infty} metric: (a) for set cover; (b) for vertex cover; (c) for edge cover; and (d) for facility location.
Proof.
This follows by rounding the solution returned by Theorem 4.1, because, as noted in Section 3.3, we have local approximation algorithms with guarantees of (a) for set cover (where ); (b) for vertex cover; (c) for edge cover; and (d) for facility location. ∎
In the sequel, we focus on proving Theorem 4.1. We first work our way towards defining the proxy function that we use. Note that for every distribution with , we must have for every scenario . We refer to the right side of this inequality as the blocked mass in scenario . The remainder of the probability mass (i.e., the difference between and the blocked mass) may be moved to other scenarios, and hence we call it the free mass in scenario . Separating the blocked mass and the free mass of all the scenarios, we obtain a decomposition , where and for every scenario .
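As a small illustration of this decomposition, the Python sketch below splits an explicitly given distribution into blocked and free mass. It rests on two assumptions of ours: the central distribution is available as a dictionary `p` (in the paper it is only accessible via samples), and the blocked mass in scenario A is read as max(p_A - r, 0), which is the lower bound that any q within L_{\infty}-distance r of p must satisfy. The name `split_mass` is hypothetical.

```python
def split_mass(p, r):
    """Split each p[A] into blocked mass max(p[A] - r, 0) and free mass min(p[A], r).

    Assumes p is an explicit dict scenario -> probability (an illustrative
    simplification; the black-box model only provides samples).
    """
    blocked = {A: max(pA - r, 0.0) for A, pA in p.items()}
    free = {A: min(pA, r) for A, pA in p.items()}
    return blocked, free
```

For example, with p = {a: 0.7, b: 0.2, c: 0.1} and r = 0.25, only scenario a has blocked mass (0.45), and the total free mass is 0.25 + 0.2 + 0.1 = 0.55.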
Estimating .
To define our proxy function, we will need an estimate of that is accurate within a factor. Lemma 4.3 shows that , which suggests that such an estimate can be obtained with high probability using \operatorname{\mathsf{poly}}\bigl{(}\frac{1}{r\varepsilon}\bigr{)} samples. We prove a few simple results below leading up to this (Lemma 4.6).
Lemma 4.3.
We have .
Proof.
If there exists a scenario with , then we have . Otherwise, we have . ∎
We partition the set of scenarios into a set of frequent scenarios and a set of rare scenarios . Note that , and for every scenario .
Lemma 4.4.
Consider a partition of the scenarios, with (and hence ). Let be a probability distribution such that . Let and . Then .
Proof.
We first show that the first sum in the definition of is a good estimate of the amount of free mass in . We have
[TABLE]
where the first step uses the triangle inequality; the second step uses the definition of ; the third step is by assumption.
Now we show that the second sum in the definition of is a good estimate of the amount of free mass in . We have
[TABLE]
where the first step uses the fact that ; the second step uses the fact that and are probability distributions; the third step uses the triangle inequality; the fourth step is by assumption.
Combining (14) and (15) yields . This, combined with Lemma 4.3 and the definition of , yields the result. ∎
Lemma 4.5.
Let be an empirical estimate of using samples, and let . Then we have , and with probability at least we have .
Proof.
The inequality follows from the definition of and the fact that is a probability distribution.
Since is a probability distribution and for every , we have . If we choose appropriately, by using Chernoff bounds we have for any fixed scenario . It follows that for any fixed scenario , we have \Pr\bigl{[}A\not\in\widehat{\mathcal{A}}^{\mathrm{freq}}\bigr{]}\leq\delta r. By the union bound, we have \Pr\bigl{[}\mathcal{A}^{\mathrm{freq}}\not\subseteq\widehat{\mathcal{A}}^{\mathrm{freq}}\bigr{]}\leq|\mathcal{A}^{\mathrm{freq}}|\delta r\leq\delta. ∎
Lemma 4.6.
We can compute an estimate of such that with probability at least in time .
Proof.
First, we use Lemma 4.5 to obtain a set of scenarios of size that is a superset of with probability at least . Next, we compute an empirical estimate of using samples. Using Chernoff bounds, we can choose so that for every scenario . By the union bound, this event does not happen for any of the scenarios with probability at least . In this case, the probability distribution and the partition of satisfy the conditions of Lemma 4.4, and so we can compute as described in that lemma.
The success probability is at least . ∎
A proxy function for .
We assume in the sequel that the estimate computed in Lemma 4.6 satisfies . Consider the polytope \mathcal{K}:=\bigl{\{}q\in\mathbb{R}_{+}^{\mathcal{A}}:\sum_{A\in\mathcal{A}}q_{A}\leq\widehat{P}^{\mathrm{free}},\quad q_{A}\leq r\ \forall A\in\mathcal{A}\bigr{\}}. Our proxy function is then defined as
[TABLE]
Informally, {\textstyle\operatorname*{E}_{A\sim\mathring{p}}}\bigl{[}g(x,A)\bigr{]} and can be seen as upper bounds on the contributions to \max_{q:L_{\infty}(\mathring{p},q)\leq r}{\textstyle\operatorname*{E}_{A\sim q}}\bigl{[}g(x,A)\bigr{]} from the blocked mass and the free mass of respectively. We will argue that this proxy function is a good pointwise approximation of . First, we need the following preliminary lemma.
Lemma 4.7.
For every , we have \max_{q:\|\mathring{p}-q\|_{\infty}\leq r}{\textstyle\operatorname*{E}_{A\sim q}}\bigl{[}g(x,A)\bigr{]}\geq\frac{1}{1+\varepsilon}\max_{q\in\mathcal{K}}\sum_{A\in\mathcal{A}}q_{A}g(x,A).
Proof.
Let be an optimal solution to (Kx). We prove that there exists a distribution with such that . This yields the result, since we obtain
[TABLE]
We give a constructive proof of the existence of , via an iterative algorithm. Recall that denotes the blocked mass of the distribution . We start by setting . Note that for all we have and . From now on, we will only increase components of , so these two properties will be conserved; therefore we maintain the invariant . We only need to work towards ensuring that is a probability distribution and that for every (which, along with for every , implies ).
Note that for every we have (which also implies ). Moreover, we have . It is possible that is not a probability distribution yet, if this inequality is not tight. If this is the case, then there must be a scenario such that . We increase the component until either we obtain (and hence is a probability distribution) or . If is still not a probability distribution we repeat the same step with a different scenario. As each step (except possibly the final one) decreases the number of scenarios such that , this process eventually stops. At this moment, is a probability distribution and satisfies for every , and so . ∎
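The iterative completion step in this proof can be sketched in Python. This is a minimal illustration under assumptions of ours: the distributions are explicit dictionaries, the starting vector `start` is supplied by the caller (its exact definition is not reproduced here) and already satisfies start[A] <= p[A] + r with total mass at most 1, and the function name `complete_to_distribution` is hypothetical.

```python
def complete_to_distribution(start, p, r, tol=1e-12):
    """Raise components of `start` (never above p[A] + r) until the total mass is 1.

    Components are only increased, so any lower bounds that `start`
    satisfies are preserved, as in the invariant used in the proof.
    """
    q = dict(start)
    deficit = 1.0 - sum(q.values())
    for A in q:
        if deficit <= tol:
            break
        room = p[A] + r - q[A]          # how far q[A] may still grow
        if room > 0:
            inc = min(room, deficit)    # stop at the cap or when mass reaches 1
            q[A] += inc
            deficit -= inc
    if deficit > tol:
        raise ValueError("caps p[A] + r are too small to reach total mass 1")
    return q
```

Each loop iteration either finishes the construction or saturates one scenario at its cap, which mirrors the termination argument above.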
Lemma 4.8.
For every , we have .
Proof.
We start by proving the first inequality. Let q^{*}:=\operatorname{argmax}_{q:\|\mathring{p}-q\|_{\infty}\leq r}{\textstyle\operatorname*{E}_{A\sim q}}\bigl{[}g(x,A)\bigr{]}, so that h({\mathring{p}}\,;{x})=c^{\intercal}x+{\textstyle\operatorname*{E}_{A\sim q^{*}}}\bigl{[}g(x,A)\bigr{]}. We decompose into two vectors as follows: we write , where and for every scenario . Next we upper bound the contribution of each of these two vectors to the objective value . Since , we have \sum_{A\in\mathcal{A}}q^{1}_{A}g(x,A)\leq{\textstyle\operatorname*{E}_{A\sim\mathring{p}}}\bigl{[}g(x,A)\bigr{]}. Note that since , and by the way we defined , we must have for every scenario . Further, we have . It follows that , and so . Therefore we have
[TABLE]
proving the first inequality.
Now we proceed to prove the second inequality. We have
[TABLE]
The second step uses Lemma 4.7 and the fact that \max_{q:\|\mathring{p}-q\|_{\infty}\leq r}{\textstyle\operatorname*{E}_{A\sim q}}\bigl{[}g(x,A)\bigr{]}\geq{\textstyle\operatorname*{E}_{A\sim\mathring{p}}}\bigl{[}g(x,A)\bigr{]} (since is feasible for the maximization problem on the left side). ∎
Solving the proxy problem .
We assume that for all , and all , we have , which holds for all covering problems. Recall that . Recall from property (P4) that for every , the function is convex, and at every we can efficiently compute its value. We will assume the following stronger version of (P5):
- (P5′) For every and , we can efficiently compute a subgradient of at with .
Shmoys and Swamy [35] define a broad class of 2-stage problems for which (P5′) holds, which includes all the 2-stage problems considered in the literature. Recall that by (P3), and contains a ball of radius such that \ln\bigl{(}\frac{R}{V}\bigr{)}=\operatorname{\mathsf{poly}}(\mathcal{I}). Let be the Lipschitz constant of ; we show in Lemma 4.13 that . Under this setup, we have the following result from [35].
Theorem 4.9 (see Theorem 4.7, Lemma 4.14 in [35]).
Let , . Define N=\left\lceil 2m^{2}\ln\bigl{(}\frac{16KR^{2}}{V\eta}\bigr{)}\right\rceil and n=N\ln\bigl{(}\frac{8NKR}{\eta}\bigr{)}, and \omega=\varepsilon/2n=\operatorname{\mathsf{poly}}\bigl{(}\frac{\varepsilon}{\mathcal{I}},\log(\frac{1}{\eta})\bigr{)}. Suppose we have a procedure that given any point finds an -subgradient of at with probability at least in time . Then, we can find satisfying with probability at least in time O\bigl{(}T(\omega,\frac{\delta}{N+n})\cdot m^{2}\log^{2}(\frac{\widetilde{K}Rm}{V\eta})\bigr{)}=\operatorname{\mathsf{poly}}\bigl{(}\mathcal{I},T(\omega,\frac{\delta}{N+n}),\log(\frac{1}{\eta})\bigr{)}.
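To get a feel for the magnitudes involved, the following Python sketch simply evaluates the three parameters from the statement above. The function name `ellipsoid_parameters` is ours, and we read \omega=\varepsilon/2n as \varepsilon/(2n), which is an interpretation of the typeset formula rather than something the text confirms.

```python
import math

def ellipsoid_parameters(m, K, R, V, eta, eps):
    """Evaluate N, n and omega from the formulas in Theorem 4.9 (omega read as eps / (2n))."""
    N = math.ceil(2 * m ** 2 * math.log(16 * K * R ** 2 / (V * eta)))
    n = N * math.log(8 * N * K * R / eta)
    omega = eps / (2 * n)
    return N, n, omega
```

For instance, with m = 2, K = R = V = 1, \eta = 0.5 and \varepsilon = 0.1, the first parameter evaluates to N = 28.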
We show that one can compute an -subgradient with probability at least in time T(\omega,\delta)=\operatorname{\mathsf{poly}}\bigl{(}\mathcal{I},\frac{\lambda}{r\omega},\log(\frac{1}{\delta})\bigr{)}. Lemma 4.10 (ii) shows that to obtain an -subgradient, it suffices to be able to (a) find a vector that is componentwise close to {\textstyle\operatorname*{E}_{A\sim\mathring{p}}}\bigl{[}d^{x,A}\bigr{]}, and (b) find an optimal solution to the maximization problem (Kx) in the definition of . Lemma 4.11 argues using simple Chernoff bounds that one can obtain a vector that is componentwise close to {\textstyle\operatorname*{E}_{A\sim\mathring{p}}}\bigl{[}d^{x,A}\bigr{]}, and Lemma 4.12 shows that one can compute an optimal solution to (Kx) (with polynomial support). Finally, Lemma 4.13 bounds the Lipschitz constant of . Putting everything together yields Theorem 4.1.
Lemma 4.10.
- (i) The function is convex, and the vector d:=c+{\textstyle\operatorname*{E}_{A\sim\mathring{p}}}\bigl{[}d^{x,A}\bigr{]}+\sum_{A\in\mathcal{A}}q^{*}_{A}d^{x,A} is a subgradient of at ; here is an optimal solution to (Kx).
- (ii) Moreover, if is a vector such that -\omega c\leq d^{\mathrm{est}}-{\textstyle\operatorname*{E}_{A\sim\mathring{p}}}\bigl{[}d^{x,A}\bigr{]}\leq 0, then is an -subgradient of at .
Proof.
Convexity of will follow from the fact that we have a subgradient of at every point . Part (i) is a special case of part (ii) with , so we focus on part (ii). Consider any . We have
[TABLE]
The first inequality follows since is a feasible solution to (K); the second follows since is a subgradient of at ; the third follows from the componentwise closeness of and {\textstyle\operatorname*{E}_{A\sim\mathring{p}}}\bigl{[}d^{x,A}\bigr{]}; the fourth follows since , and the last inequality is because . ∎
Lemma 4.11.
Let . For any and , we can compute a vector such that -\omega c\leq d^{\mathrm{est}}-{\textstyle\operatorname*{E}_{A\sim\mathring{p}}}\bigl{[}d^{x,A}\bigr{]}\leq 0 with probability at least in time .
Proof.
This is a simple application of Chernoff-Hoeffding bounds. For , we sample a scenario from , and compute , so for every by (P5′). Taking the average of independent samples, we obtain using Chernoff bounds (see Theorem 1.1 in [7]), that
[TABLE]
for every . So \mathcal{N}=\frac{2\lambda^{2}}{\omega^{2}}\ln\bigl{(}\frac{2m}{\delta}\bigr{)} ensures that the above probability is at most . We return . By the union bound, this satisfies -\omega c\leq d^{\mathrm{est}}-{\textstyle\operatorname*{E}_{A\sim\mathring{p}}}\bigl{[}d^{x,A}\bigr{]}\leq 0 with probability at least . ∎
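The sampling step of this proof can be sketched in Python as follows. The sketch covers only the sample-size formula \mathcal{N}=\frac{2\lambda^{2}}{\omega^{2}}\ln(\frac{2m}{\delta}) (rounded up) and the plain averaging of sampled vectors; the final adjustment that produces the returned vector is not reproduced. The names `sample_size`, `average_vectors`, and `draw` are ours, and `draw()` is a hypothetical stand-in for sampling a scenario and computing the corresponding subgradient vector.

```python
import math

def sample_size(lam, omega, m, delta):
    """Number of samples from the formula above, rounded up to an integer."""
    return math.ceil(2 * lam ** 2 / omega ** 2 * math.log(2 * m / delta))

def average_vectors(draw, count):
    """Average `count` vectors, each obtained by one call to the hypothetical `draw()`."""
    total = None
    for _ in range(count):
        v = draw()
        total = list(v) if total is None else [a + b for a, b in zip(total, v)]
    return [a / count for a in total]
```

Note the quadratic dependence on \lambda/\omega: halving \omega quadruples the number of samples.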
We say is a good -sequence for if are the scenarios with maximum second-stage cost in that order; i.e., more precisely, we have .
Lemma 4.12.
Let t:=\min\bigl{\{}\bigl{\lceil}\widehat{P}^{\mathrm{free}}/r\bigr{\rceil},|\mathcal{A}|\bigr{\}}, and fix . Suppose that for all .
- (a) We can compute a good -sequence in time .
- (b) Define the vector as follows:
[TABLE]
Then is an optimal solution to .
Proof.
By the monotonicity assumption of , the costliest scenario is , so we start by setting . We then proceed as follows for . Suppose that we have already computed . Computing amounts to solving the problem
[TABLE]
We claim that (16) admits an optimal solution that is a maximal proper subset of for some . Indeed, let be an optimal solution of (16) with maximum cardinality, and suppose for a contradiction that it is not a maximal proper subset of for any . Note that since , we have , so there is an element . Now, consider the scenario . Since by assumption is not a maximal subset of for any , it follows that is feasible for (16). By the monotonicity assumption, since , we have , and so is also an optimal solution for (16). Since , this contradicts the definition of .
We now utilize the observation above to show that given and , we can solve (16) in time. This can be done by enumerating all maximal proper subsets of . Since each set has maximal proper subsets, we enumerate scenarios, and the claim follows. We conclude that we can compute a good -sequence by solving (16) for , which takes time.
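The enumeration over maximal proper subsets can be sketched in Python. This is an illustration under assumptions of ours: scenarios are subsets of a small explicit ground set, `cost` is a hypothetical monotone stand-in for the second-stage cost of a scenario at the fixed first-stage decision, and the name `good_sequence` is ours. For simplicity the sketch rebuilds the candidate set in every round, which is less efficient than the bound stated above.

```python
def good_sequence(ground_set, cost, t):
    """Return up to t scenarios in non-increasing order of `cost`, assuming `cost` is monotone.

    Candidates in each round are the sets A - {e} for already chosen A,
    i.e. the maximal proper subsets of the chosen scenarios.
    """
    first = frozenset(ground_set)       # the costliest scenario under monotonicity
    chosen, seen = [first], {first}
    while len(chosen) < t:
        candidates = {A - {e} for A in chosen for e in A} - seen
        if not candidates:
            break                       # fewer than t scenarios exist
        best = max(candidates, key=cost)
        chosen.append(best)
        seen.add(best)
    return chosen
```

With additive weights 5, 3, 1 on a three-element ground set, the first five scenarios found have costs 9, 8, 6, 5, 4, which agrees with sorting all eight subsets by cost.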
For part (b), consider the polytope . Note that the problem is equivalent to the problem (up to scaling of the solutions), which can be seen as a fractional knapsack problem: we have one item of value and weight for every ; the capacity of the knapsack is set to . The result then follows by using the fact that one can compute an optimal solution to a fractional knapsack problem in a greedy fashion, by repeatedly picking among the available items the one with the highest value/weight ratio. ∎
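The greedy solution of the maximization over the polytope \mathcal{K} (total mass at most \widehat{P}^{\mathrm{free}}, each component at most r) can be sketched in Python. We assume, for illustration only, that the second-stage costs of the relevant scenarios are given explicitly and are nonnegative; the names `solve_kx`, `costs`, and `p_free_hat` are ours.

```python
def solve_kx(costs, p_free_hat, r):
    """Greedy fractional knapsack: maximize sum q[A] * costs[A]
    subject to sum q[A] <= p_free_hat and 0 <= q[A] <= r.

    Assumes all costs are nonnegative.
    """
    q = {A: 0.0 for A in costs}
    remaining = p_free_hat
    # All items have the same weight per unit, so sort by cost alone.
    for A in sorted(costs, key=costs.get, reverse=True):
        if remaining <= 0:
            break
        q[A] = min(r, remaining)        # fill up to the cap r or the leftover budget
        remaining -= q[A]
    return q
```

The costliest scenarios each receive mass r, and the last scenario touched receives whatever budget is left, matching the structure of the vector in part (b).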
Lemma 4.13.
The function has Lipschitz constant at most .
Proof.
It suffices to show that admits a subgradient of Euclidean norm at most at every point . Fix , and consider the subgradient d:=c+{\textstyle\operatorname*{E}_{A\sim\mathring{p}}}\bigl{[}d^{x,A}\bigr{]}+\sum_{A\in\mathcal{A}}q^{*}_{A}d^{x,A} given by Lemma 4.10. We have
[TABLE]
The first step follows from the triangle inequality, and the final step follows because for every by assumption (P5′) and . ∎
Proof of Theorem 4.1.
Note that by the monotonicity property of the second-stage costs. If (so contains only null scenarios) then \max_{q:\|\mathring{p}-q\|_{\infty}\leq r}{\textstyle\operatorname*{E}_{A\sim q}}\bigl{[}g(x,A)\bigr{]}=0, and so is an optimal solution to the DR problem. Otherwise, the optimal value of (Kx) is at least since there is always a distribution with that places a weight of at least on (e.g., take if ; otherwise, take , for all ). Note that \log\bigl{(}\frac{1}{\mathsf{LB}}\bigr{)}=\operatorname{\mathsf{poly}}(\mathcal{I}).
We compute a -estimate of using Lemma 4.6. We then run the algorithm of Theorem 4.9, utilizing Lemmas 4.10–4.12 to compute -subgradients, and setting and (using Lemma 4.13). Let be the solution returned. Using Lemma 4.8, we obtain that
[TABLE]
where since . The success probability is at least . ∎
Appendix A Proof of Theorem 3.5
Overview.
Let denote a generic empirical estimate of (which could be any of ). We discretize suitably to obtain a set so that for any , and , there is some such that is close to for any central distribution (Claim A.2). It follows that approximate solutions to translate to approximate solutions to .
The arguments in [4] can be used to show that an approximate solution to can be used to obtain an approximate solution to (given a suitable value oracle for ). Recall that \overline{h}({p}\,;{x,y})=c^{\intercal}x+ry+{\textstyle\operatorname*{E}_{A\sim p}}\bigl{[}\overline{g}(x,y,A)\bigr{]}. The proof in [4] proceeds by decomposing {\textstyle\operatorname*{E}_{A\sim p}}\bigl{[}\overline{g}(x,y,A)\bigr{]} into two terms, {\textstyle\operatorname*{E}^{l}_{A\sim p}}\bigl{[}.\bigr{]} and {\textstyle\operatorname*{E}^{h}_{A\sim p}}\bigl{[}.\bigr{]}, which are the contributions from "low" cost and "high" cost scenarios respectively. For the low scenarios, Chernoff bounds imply that {\textstyle\operatorname*{E}^{l}_{A\sim\widehat{p}}}\bigl{[}.\bigr{]} and {\textstyle\operatorname*{E}^{l}_{A\sim\mathring{p}}}\bigl{[}.\bigr{]} are close to each other, for all , and all SAA problems; this is stated in (20).
But the high-scenario contribution could be quite different in the SAA and original problems, although in both problems, this contribution is essentially independent of since the choice of "high" ensures that high scenarios occur with small probability; this is shown by inequalities (18), (19).
Since {\textstyle\operatorname*{E}^{h}_{A\sim p}}\bigl{[}.\bigr{]} is linear in , the expectation of {\textstyle\operatorname*{E}^{h}_{A\sim\widehat{p}}}\bigl{[}.\bigr{]}, over the choice of , is precisely {\textstyle\operatorname*{E}^{h}_{A\sim\mathring{p}}}\bigl{[}.\bigr{]}. Thus, among our multiple SAA problems (involving empirical estimates of ), we can guarantee by Markov's inequality that (with high probability) for at least one of them, {\textstyle\operatorname*{E}^{h}_{A\sim\widehat{p}^{i}}}\bigl{[}.\bigr{]} will be close to {\textstyle\operatorname*{E}^{h}_{A\sim\mathring{p}}}\bigl{[}.\bigr{]}. It follows that an -approximate solution to this SAA problem is also an \alpha\bigl{(}1+O(\varepsilon)\bigr{)}-approximate solution to the original problem. But we do not a priori know this index , and evaluating or estimating {\textstyle\operatorname*{E}_{A\sim\mathring{p}}}\bigl{[}\overline{g}(x,y,A)\bigr{]} (and hence, ) is challenging because (other than the difficulty of evaluating for a specific scenario ) can have exponential support; in fact, this is often #P-hard even for standard 2-stage problems. In [4], it is shown that if one can estimate the objective value for the SAA problem (which seems easier since has polynomial support), then choosing the (solution corresponding to the) SAA problem with best SAA objective value works.
In our case, we actually want to evaluate , or roughly equivalently (by Lemma 3.4), the objective for the solution returned by the SAA problem. While we can once again decompose {\textstyle\operatorname*{E}_{A\sim p}}\bigl{[}g(x,y,A)\bigr{]} into {\textstyle\operatorname*{E}^{l}_{A\sim p}}\bigl{[}.\bigr{]} and {\textstyle\operatorname*{E}^{h}_{A\sim p}}\bigl{[}.\bigr{]}, as with {\textstyle\operatorname*{E}^{h}_{A\sim p}}\bigl{[}.\bigr{]}, the term could have very different contributions in the SAA and original problems, and we need to reason about this separately. Moreover, a complicating factor is that this term is not linear in . We show in Claim A.3 that this term is concave in , and this allows us to still use Markov's inequality as above. In the proof below, we consider the combined term {\textstyle\operatorname*{E}^{h}_{A\sim p}}\bigl{[}.\bigr{]}+z^{\mathrm{lg}}({p}\,;{0}), and apply Markov's inequality to show that among our multiple SAA problems, there is some index for which this term is close to {\textstyle\operatorname*{E}^{h}_{A\sim\mathring{p}}}\bigl{[}.\bigr{]}+z^{\mathrm{lg}}({\mathring{p}}\,;{0}); see inequality (21).
Finally, we show that, although we do not know , and we do not know how to evaluate or , the index corresponding to the best estimate works as well as ; this is captured by (23).
Details.
Instead of directly working with and , we will work with the quantities and . It will be cumbersome to carry around the term, so we define , and . To simplify further, we abbreviate the notation. The convention we follow is that whenever there is an index in the superscript of a quantity, it refers to that quantity for the central distribution of the -th SAA problem. So we use
- and to denote and respectively;
- and to denote and respectively;
- and to denote and respectively;
- and to denote and respectively.
We focus on showing that
[TABLE]
Combining this with Lemma 3.4 completes the proof.
Let . Define Y:=\{0,\tau\}\cup\{\text{integer multiples of }\frac{\eta^{\prime}}{\lambda r}\text{ in }[0,\tau]\}. (Footnote 8: The discretization considered in [4] is incorrect: it assumes implicitly that the search region of the SAA problem is (or may be) restricted to points whose first-stage cost is within some factor of the optimum of the original problem, but this need not hold. It also assumes that the grid points lie in the feasible region, which again need not hold.) Note that |Y|=O\bigl{(}\frac{\tau\lambda r}{\eta^{\prime}}\bigr{)}.
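The grid Y can be written down directly; the Python sketch below is an illustration with names of our choosing (`discretize`, `eta_prime`, `lam`), taking the step size to be \frac{\eta^{\prime}}{\lambda r} as in the definition above.

```python
import math

def discretize(tau, eta_prime, lam, r):
    """Return the sorted grid {0, tau} united with the multiples of eta_prime / (lam * r) in [0, tau]."""
    step = eta_prime / (lam * r)
    count = math.floor(tau / step)      # largest multiple not exceeding tau
    points = {i * step for i in range(count + 1)}
    points.update({0.0, tau})
    return sorted(points)
```

Consecutive grid points are at most one step apart, so every value in [0,\tau] has a grid point within distance \frac{\eta^{\prime}}{\lambda r}, and the grid has O\bigl(\frac{\tau\lambda r}{\eta^{\prime}}\bigr) points.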
Claim A.1.
The discretized 2-stage problem satisfies properties (P1), (P2) with inflation parameter , i.e., we have
[TABLE]
Claim A.2.
For any , , and any distribution , there is some such that .
Proof.
There is some with . If , then . Also, for all , so . If , then we can interchange the arguments; the claim follows. ∎
We now adapt and generalize the arguments in [4]. Let be an optimal solution to , which is also an optimal solution to . Let . Let be such that , and let given by Claim A.2 be such that .
Let . Call a scenario "high", if , and "low" otherwise. Let . We use {\textstyle\operatorname*{E}^{l}_{A}}\bigl{[}.\bigr{]} (respectively {\textstyle\operatorname*{E}^{h}_{A}}\bigl{[}.\bigr{]}) to denote the expectation {\textstyle\operatorname*{E}_{A\sim\mathring{p}}}\bigl{[}.\bigr{]} where non-low (respectively non-high) scenarios contribute 0 (so {\textstyle\operatorname*{E}_{A\sim\mathring{p}}}\bigl{[}.\bigr{]}={\textstyle\operatorname*{E}^{l}_{A}}\bigl{[}.\bigr{]}+{\textstyle\operatorname*{E}^{h}_{A}}\bigl{[}.\bigr{]}). Let , \widehat{\operatorname*{E}}^{{i},l}_{A}\bigl{[}.\bigr{]}, and \widehat{\operatorname*{E}}^{{i},h}_{A}\bigl{[}.\bigr{]} denote these quantities for the -th SAA problem. Since \overline{h}({\mathring{p}}\,;{\bar{x}})\geq{\textstyle\operatorname*{E}^{h}_{A}}\bigl{[}\overline{g}(\bar{x},y^{*},A)\bigr{]}\geq\mathring{p}^{h}\bigl{(}H-\lambda(c^{\intercal}\bar{x}+ry^{*})\bigr{)} (the second inequality is due to Claim A.1), we have . (Footnote 9: If , then , and for all with . Therefore, for all with , and all scenarios in the support of are low scenarios.) The sample size is chosen so that Chernoff bounds ensure that with probability at least , for every , we have . Hence,
[TABLE]
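The decomposition of an expectation into its low and high parts, $\operatorname*{E}_{A\sim\mathring{p}}[\cdot]=\operatorname*{E}^{l}_{A}[\cdot]+\operatorname*{E}^{h}_{A}[\cdot]$, can be illustrated on a toy finite distribution. The sketch below is an assumption-laden example, not the paper's construction: the helper `split_expectation`, the three scenarios, and the rule that a scenario is high when its cost is at least a hypothetical threshold `H` are all chosen only for illustration.

```python
def split_expectation(scenarios, is_high):
    """Split E[value] into contributions from low and high scenarios.

    scenarios: list of (probability, value) pairs.
    is_high:   predicate on the value (an assumed classification rule).
    Each part zeroes out the scenarios of the other type.
    """
    low = sum(p * v for p, v in scenarios if not is_high(v))
    high = sum(p * v for p, v in scenarios if is_high(v))
    return low, high


# Toy distribution: two cheap scenarios and one rare, expensive one.
scenarios = [(0.5, 2.0), (0.25, 4.0), (0.25, 100.0)]
H = 50.0  # hypothetical threshold separating low from high scenarios
low, high = split_expectation(scenarios, lambda v: v >= H)
total = sum(p * v for p, v in scenarios)
print(low, high, total)
```

The point of the split is that the low part involves only bounded values, which is what makes concentration bounds usable there, while the high part has to be handled separately.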
Since for all low scenarios and all , the choice of shows that, again using Chernoff bounds, with probability , we have
[TABLE]
Next, we argue that there is some index such that $\widehat{\operatorname*{E}}^{t,h}_{A}\bigl[\overline{g}(0,0,A)\bigr]+\widehat{z}^{\mathrm{lg},t}$ is close to $\operatorname*{E}^{h}_{A}\bigl[\overline{g}(0,0,A)\bigr]+z^{\mathrm{lg}}(\mathring{p}\,;0)$. For every , the expected value of $\widehat{\operatorname*{E}}^{i,h}_{A}\bigl[\overline{g}(0,0,A)\bigr]$ is precisely $\operatorname*{E}^{h}_{A}\bigl[\overline{g}(0,0,A)\bigr]$, so we can use Markov's inequality. But it is trickier to reason about the expected value of since is not linear in .
Claim A.3.
* is a concave function of .*
Proof.
Consider any two distributions and , and , where . Let and be the optimal solutions to the optimization problems defining and . Then, is a feasible solution to the optimization problem defining , and its objective value is . ∎
Using the above claim and Jensen's inequality, we obtain that the expected value of is at most . Therefore, by Markov's inequality, we have that the event $\widehat{\operatorname*{E}}^{i,h}_{A}\bigl[\overline{g}(0,0,A)\bigr]+\widehat{z}^{\mathrm{lg},i}>(1+\varepsilon)\bigl(\operatorname*{E}^{h}_{A}\bigl[\overline{g}(0,0,A)\bigr]+z^{\mathrm{lg}}(\mathring{p}\,;0)\bigr)$ happens with probability at most . The probability that this happens for all is at most . So we may assume that there is some index such that
[TABLE]
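The amplification step above follows a standard pattern: for a nonnegative random variable $X$ with $\operatorname*{E}[X]\leq\mu$, Markov's inequality gives $\Pr[X>(1+\varepsilon)\mu]\leq\frac{1}{1+\varepsilon}$, so among $k$ independent repetitions all of them fail with probability at most $(1+\varepsilon)^{-k}$. The sketch below computes the smallest such $k$ for a target failure probability; the function name `repetitions_needed` and the symbols `k` and `delta` are hypothetical and are not taken from the paper's parameter choices.

```python
def repetitions_needed(eps, delta):
    """Smallest k such that (1 + eps)**(-k) <= delta.

    Each independent repetition exceeds (1 + eps) times the mean bound
    with probability at most 1/(1 + eps) by Markov's inequality, so k
    repetitions all fail with probability at most (1 + eps)**(-k).
    """
    if eps <= 0 or not (0 < delta < 1):
        raise ValueError("need eps > 0 and 0 < delta < 1")
    k = 0
    fail_prob = 1.0
    while fail_prob > delta:
        fail_prob /= 1.0 + eps
        k += 1
    return k


print(repetitions_needed(1.0, 0.125))
print(repetitions_needed(0.5, 0.5))
```

Since $\ln(1+\varepsilon)\geq\varepsilon/2$ for $\varepsilon\in(0,1]$, taking $k\geq\frac{2}{\varepsilon}\ln\frac{1}{\delta}$ repetitions always suffices in that range.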
Now we show that the index obtained from the estimates can be used in place of the index . To do this, we first use the properties of the 's and the index to relate the quality of for the -th SAA problem to the quality of under any of the other SAA problems. Let be such that , and let be the point in given by Claim A.2. We have that for every ,
[TABLE]
The first inequality follows from Claim A.2; the second follows from Lemma 3.4; the next three inequalities follow from the properties of the estimates, and the choice of index ; the last inequality again uses Lemma 3.4, and that for any .
Let . Let $\Delta^{j}=\operatorname*{E}^{h}_{A}\bigl[\overline{g}(0,0,A)\bigr]+z^{\mathrm{lg}}(\mathring{p}\,;0)-\widehat{\operatorname*{E}}^{j,h}_{A}\bigl[\overline{g}(0,0,A)\bigr]-\widehat{z}^{\mathrm{lg},j}$, and $\Delta^{t}=\operatorname*{E}^{h}_{A}\bigl[\overline{g}(0,0,A)\bigr]+z^{\mathrm{lg}}(\mathring{p}\,;0)-\widehat{\operatorname*{E}}^{t,h}_{A}\bigl[\overline{g}(0,0,A)\bigr]-\widehat{z}^{\mathrm{lg},t}$. Applying (22) to and , we have and . Multiplying the first inequality by and the second by and adding, we get
[TABLE]
We now combine these various inequalities to obtain the desired result. By repeatedly using (18)–(20), we get
[TABLE]
where the last inequality above follows by applying (23). We bound as follows.
[TABLE]
Similarly, we have
[TABLE]
Substituting $-\Delta^{t}\leq\varepsilon\bigl(\overline{O}+z^{\mathrm{lg}}(\mathring{p}\,;0)\bigr)$ from (21), we can simplify this to
[TABLE]
Finally, substituting this bound and (25), in (24), we obtain
[TABLE]
This implies that
[TABLE]
where since . This proves (17). Combining this with Lemma 3.4 yields the inequality in Theorem 3.5. The success probability is the probability that inequalities (19)–(21) hold, which is at least . ∎
References
- [1] Shipra Agrawal, Yichuan Ding, Amin Saberi, and Yinyu Ye. Price of Correlations in Stochastic Optimization. Operations Research, 60(1):150–162, 2012.
- [2] D. Bertsimas, M. Sim, and M. Zhang. A practicable framework for distributionally robust linear optimization. optimization-online.org, 2013.
- [3] John R. Birge and François Louveaux. Introduction to Stochastic Programming. Springer Science & Business Media, June 2011.
- [4] Moses Charikar, Chandra Chekuri, and Martin Pál. Sampling bounds for stochastic optimization. In Proceedings of the 8th International Workshop on Approximation, Randomization and Combinatorial Optimization Problems (APPROX), pages 257–269, 2005.
- [5] Erick Delage and Yinyu Ye. Distributionally Robust Optimization Under Moment Uncertainty with Application to Data-Driven Problems. Operations Research, 58(3):595–612, 2010.
- [6] Kedar Dhamdhere, Vineet Goyal, R. Ravi, and Mohit Singh. How to pay, come what may: Approximation algorithms for demand-robust covering problems. In Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science (FOCS), pages 367–378, 2005.
- [7] Devdatt Dubhashi and Alessandro Panconesi. Concentration of Measure for the Analysis of Randomized Algorithms. Cambridge University Press, New York, NY, USA, 1st edition, 2009.
- [8] Emre Erdoğan and Garud Iyengar. Ambiguous chance constrained problems and robust optimization. Math. Program., 107(1-2):37–61, December 2005.
