(Near) Optimal Adaptivity Gaps for Stochastic Multi-Value Probing

Domagoj Bradac; Sahil Singla; Goran Zuzic

arXiv:1902.01461·cs.DS·February 7, 2019


TL;DR

This paper studies the adaptivity gap of stochastic multi-value probing and gives (near) optimal bounds for monotone submodular functions and for weighted rank functions of k-extendible systems over prefix-closed constraints, which justifies focusing on simpler non-adaptive strategies in these settings.

Contribution

It introduces a multi-value stochastic probing framework and establishes (near) tight bounds on the adaptivity gap for key classes of functions and constraints, resolving an open question of Gupta et al. [GNS17].

Findings

01

Adaptivity gap of exactly 2 for monotone submodular functions (matching upper and lower bounds).

02

Adaptivity gap between k and O(k log k) for weighted rank functions of k-extendible systems.

03

Both bounds hold for multi-value distributions and are new even in the Bernoulli case.

Abstract

Consider a kidney-exchange application where we want to find a max-matching in a random graph. To find whether an edge $e$ exists, we need to perform an expensive test, in which case the edge $e$ appears independently with a known probability $p_{e}$. Given a budget on the total cost of the tests, our goal is to find a testing strategy that maximizes the expected maximum matching size. The above application is an example of the stochastic probing problem. In general the optimal stochastic probing strategy is difficult to find because it is adaptive: it decides on the next edge to probe based on the outcomes of the probed edges. An alternate approach is to show the adaptivity gap is small, i.e., the best non-adaptive strategy always has a value close to the best adaptive strategy. This allows us to focus on designing non-adaptive strategies that are much…


Equations

$$\operatorname{val}_{\mathbf{X}}(S)\stackrel{\mathrm{def}}{=}f\big(\{\mathbf{X}_{e}\mid e\in S\}\big)=f(\mathbf{X}_{S})$$

$$f_{S}(A)\stackrel{\mathrm{def}}{=}f(S\cup A)-f(S)$$

$${\sf adap}(\mathcal{T},f)\stackrel{\mathrm{def}}{=}\mathbb{E}_{\mathbf{X}}\big[\operatorname{val}_{\mathbf{X}}(S(\mathbf{X}))\big]$$

$$\sup_{(\mathcal{C},\operatorname{val}_{\mathbf{X}})\in\mathcal{P}}\ \frac{\sup_{\mathcal{T}\text{ feasible for }\mathcal{C}}{\sf adap}(\mathcal{T},f)}{\sup_{S\in\mathcal{C}}\mathbb{E}_{\mathbf{X}}\big[\operatorname{val}_{\mathbf{X}}(S)\big]}$$

$${\sf alg}(\mathcal{T},f)\stackrel{\mathrm{def}}{=}\mathbb{E}_{\mathbf{X},\mathbf{X}^{\prime}}\big[\operatorname{val}_{\mathbf{X}^{\prime}}(S(\mathbf{X}))\big]$$

$${\sf alg}(\mathcal{T},f)\geq\tfrac{1}{2}\,{\sf adap}(\mathcal{T},f)$$

$${\sf adap}(\mathcal{T},f)=\mathbb{E}_{I}\big[f(I)+{\sf adap}(\mathcal{T}_{I},f_{I})\big]\quad\text{and}\quad{\sf alg}(\mathcal{T},f)=\mathbb{E}_{I,R}\big[f(R)+{\sf alg}(\mathcal{T}_{I},f_{R})\big]$$

$${\sf adap}(\mathcal{T},f)\leq\cdots\leq\mathbb{E}_{I,R}\big[f(I)+f(R)+{\sf adap}(\mathcal{T}_{I},f_{I\cup R})\big]$$

$${\sf adap}(\mathcal{T},f)\leq\mathbb{E}_{I,R}\big[2\cdot f(R)+{\sf adap}(\mathcal{T}_{I},f_{I\cup R})\big]$$

$${\sf alg}(\mathcal{T},f)=\mathbb{E}_{I,R}\big[f(R)+{\sf alg}(\mathcal{T}_{I},f_{R})\big]\geq\mathbb{E}_{I,R}\big[f(R)+{\sf alg}(\mathcal{T}_{I},f_{I\cup R})\big]$$

$${\sf alg}(\mathcal{T}_{I},f_{I\cup R})\geq\tfrac{1}{2}\,{\sf adap}(\mathcal{T}_{I},f_{I\cup R})$$

$${\sf alg}(\mathcal{T},f)\geq\tfrac{1}{2}\,{\sf adap}(\mathcal{T},f)$$

$$f(S)\stackrel{\mathrm{def}}{=}\sum_{k\in K(S)}(1-\epsilon)^{k}$$

$${\sf adap}(k)=(1-\epsilon)^{k}\cdot{\sf adap}(0)$$

$${\sf adap}(0)=\sum_{k=0}^{\infty}(1-\epsilon)^{k}\cdot\epsilon\cdot\big(1+{\sf adap}(k+1)\big)=\sum_{k=0}^{\infty}(1-\epsilon)^{k}\cdot\epsilon\cdot\big(1+(1-\epsilon)^{k+1}\cdot{\sf adap}(0)\big)$$

$${\sf adap}={\sf adap}(0)=2-\epsilon$$

$${\sf alg}(k)=(1-\epsilon)^{k}\cdot{\sf alg}(0)$$

$${\sf alg}(0)=\sup_{k\geq 1}\big\{1-(1-\epsilon)^{k}+{\sf alg}(k)\big\}=\sup_{k\geq 1}\big\{1-(1-\epsilon)^{k}+(1-\epsilon)^{k}\cdot{\sf alg}(0)\big\}$$

$${\sf alg}={\sf alg}(0)=1$$

$${\sf adap}(\mathcal{T},f)=\mathbb{E}_{\mathbf{X}}\big[f(\mathbf{X}_{S})\big]\leq k\cdot\mathbb{E}_{\mathbf{X},\mathbf{X}^{\prime}}\big[{\sf greedy}(\mathbf{X}_{S}\cup\mathbf{X}^{\prime}_{S})\big]$$

$$\mathbb{E}_{\mathbf{X},\mathbf{X}^{\prime}}\big[{\sf greedy}(\mathbf{X}_{S}\cup\mathbf{X}^{\prime}_{S})\big]\leq 2\cdot{\sf alg}(\mathcal{T},f)$$

$${\sf alg}(\mathcal{T},f)\geq\tfrac{1}{2}\cdot\mathbb{E}_{\mathbf{X},\mathbf{X}^{\prime}}\big[{\sf greedy}(\mathbf{X}_{S}\cup\mathbf{X}^{\prime}_{S})\big]\geq\tfrac{1}{2k}\cdot{\sf adap}(\mathcal{T},f)$$

$${\sf greedy}(\mathcal{T},f)\leq\mathbb{E}_{I,R}\big[f(I\cup R)+{\sf greedy}(\mathcal{T}_{I},(f/R)/I)\big]$$

$${\sf greedy}(\mathcal{T},f)\leq\cdots=\mathbb{E}_{I,R}\big[2\cdot f(R)+{\sf greedy}(\mathcal{T}_{I},(f/R)/I)\big]$$

$${\sf alg}(\mathcal{T}_{I},(f/R)/I)\geq\tfrac{1}{2}\,{\sf greedy}(\mathcal{T}_{I},(f/R)/I)$$

$${\sf greedy}(\mathcal{T},f)\leq 2\cdot{\sf alg}(\mathcal{T},f)$$

$${\sf adap}(\mathcal{T},f)\leq\sum_{j}2^{j}\cdot{\sf adap}(\mathcal{T},f_{j})$$

$${\sf alg}(\mathcal{T},f_{j})\geq\tfrac{1}{2k}\cdot{\sf adap}(\mathcal{T},f_{j})$$
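The two lower-bound recursions among these equations (for ${\sf adap}(0)$ and ${\sf alg}(0)$) can be checked numerically. The following Python sketch is our own illustration: it takes the recursions as given (the instance behind $K(S)$ is not reproduced in this excerpt), truncates the infinite series and the supremum at hypothetical cutoffs `terms` and `max_k`, and solves for both values.

```python
def solve_adap0(eps: float, terms: int = 10_000) -> float:
    """Solve adap(0) = sum_k (1-eps)^k * eps * (1 + (1-eps)^(k+1) * adap(0)).

    The right-hand side is affine in a = adap(0), i.e. a = s1 + s2 * a,
    so a = s1 / (1 - s2); both series are truncated after `terms` summands.
    """
    q = 1.0 - eps
    s1 = sum(q**k * eps for k in range(terms))                 # tends to 1
    s2 = sum(q**k * eps * q ** (k + 1) for k in range(terms))  # tends to (1-eps)/(2-eps)
    return s1 / (1.0 - s2)


def solve_alg0(eps: float, max_k: int = 200, iters: int = 500) -> float:
    """Fixed-point iteration for alg(0) = sup_{k>=1} {1 - q^k + q^k * alg(0)}, q = 1-eps.

    The supremum is approximated by a maximum over 1 <= k <= max_k.
    """
    q = 1.0 - eps
    x = 0.0
    for _ in range(iters):
        x = max(1.0 - q**k + q**k * x for k in range(1, max_k + 1))
    return x


for eps in (0.5, 0.1, 0.01):
    a, b = solve_adap0(eps), solve_alg0(eps)
    print(f"eps={eps}: adap(0)={a:.6f}  2-eps={2 - eps}  alg(0)={b:.6f}  ratio={a / b:.6f}")
```

For each tested $\epsilon$ the computed values should agree with $2-\epsilon$ and $1$, so the ratio approaches $2$ as $\epsilon\to 0$, in line with the lower bound of Theorem 1.1.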


Full text

(Near) Optimal Adaptivity Gaps for Stochastic Multi-Value Probing

Domagoj Bradac

Department of Mathematics, Faculty of Science, University of Zagreb.

Sahil Singla

Department of Computer Science, Princeton University. Most of this work was done when the author was a graduate student at Carnegie Mellon University.

Goran Zuzic

Computer Science Department, Carnegie Mellon University.

Abstract

Consider a kidney-exchange application where we want to find a max-matching in a random graph. To find whether an edge $e$ exists, we need to perform an expensive test, in which case the edge $e$ appears independently with a known probability $p_{e}$ . Given a budget on the total cost of the tests, our goal is to find a testing strategy that maximizes the expected maximum matching size.

The above application is an example of the stochastic probing problem. In general, the optimal stochastic probing strategy is difficult to find because it is adaptive: it decides which edge to probe next based on the outcomes of the edges probed so far. An alternative approach is to show that the adaptivity gap is small, i.e., that the best non-adaptive strategy always has a value close to that of the best adaptive strategy. This allows us to focus on designing non-adaptive strategies, which are much simpler. Previous works, however, have focused on Bernoulli random variables, which can only capture whether an edge appears or not. In this work we introduce a multi-value stochastic probing problem, which can also model situations where the weight of an edge has a probability distribution over multiple values.

Our main technical contribution is to obtain (near) optimal bounds for the (worst-case) adaptivity gaps for multi-value stochastic probing over prefix-closed constraints. For a monotone submodular function, we show the adaptivity gap is at most $2$ and provide a matching lower bound. For a weighted rank function of a $k$-extendible system (a generalization of intersection of $k$ matroids), we show the adaptivity gap is between $k$ and $O(k\log k)$. None of these results were known even in the Bernoulli case, where both our upper and lower bounds also apply, thereby resolving an open question of Gupta et al. [GNS17].

1 Introduction

Consider a kidney-exchange application where we want to find a maximum matching in a random graph. To find whether an edge $e$ exists, we need to perform an expensive test, in which case the edge $e$ appears independently with a known probability $p_{e}$ . Given a budget on the total cost of the tests, our goal is to design a testing strategy that maximizes the expected size of the found matching.

The above application can be modeled as a constrained stochastic probing problem [ANS08, GN13, ASW14, GNS16, GNS17]. In this setting, we are given a universe $V$ of elements (e.g., the set of all possible edges), each with an activation probability $p_{v}$ for $v\in V$ (e.g., the probability an edge exists). We define a random set $A\subseteq V$ of active elements that contains every $v$ independently with probability $p_{v}$ . A probe at $v$ reveals whether $v\in A$ or $v\not\in A$ , and we are only allowed to probe certain feasible subsets $S\in\mathcal{F}\subseteq 2^{V}$ (e.g., subsets of edges whose tests fit in our budget). Our goal is to design a probing strategy to find a feasible set $S\in\mathcal{F}$ of elements to maximize $\mathbb{E}_{A}[f(A\cap S)]$ , where $f$ is some combinatorial function $f:2^{V}\rightarrow\mathbb{R}_{\geq 0}$ (e.g., the cardinality of the maximum matching). Notice our probing strategy could be adaptive, i.e., we could decide which element to probe next based on the outcomes of already probed elements.
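A minimal sketch of the objective $\mathbb{E}_{A}[f(A\cap S)]$ for one fixed (non-adaptive) choice of probed edges $S$, taking $f$ to be the maximum-matching size as in the kidney-exchange example. The names `max_matching_size` and `expected_value` are hypothetical, and the exhaustive enumeration over all $2^{|S|}$ activation patterns is only meant for tiny instances.

```python
from itertools import combinations, product

Edge = tuple[str, str]


def max_matching_size(edges: list[Edge]) -> int:
    """Size of a maximum matching among `edges`, by brute force (tiny graphs only)."""
    for r in range(len(edges), 0, -1):
        for subset in combinations(edges, r):
            endpoints = [v for edge in subset for v in edge]
            if len(endpoints) == len(set(endpoints)):  # no shared vertex
                return r
    return 0


def expected_value(S: list[Edge], p: dict[Edge, float]) -> float:
    """E_A[f(A ∩ S)] for a fixed probed set S, enumerating all 2^|S| outcomes."""
    total = 0.0
    for outcome in product((False, True), repeat=len(S)):
        prob = 1.0
        active = []
        for edge, is_active in zip(S, outcome):
            prob *= p[edge] if is_active else 1.0 - p[edge]
            if is_active:
                active.append(edge)
        total += prob * max_matching_size(active)
    return total


# Hypothetical path a-b-c-d where every edge exists with probability 1/2.
p = {("a", "b"): 0.5, ("b", "c"): 0.5, ("c", "d"): 0.5}
print(expected_value(list(p), p))  # prints 1.125
```

On this path, probing all three edges yields an expected matching size of 9/8, while probing only the two outer edges yields exactly 1.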

Besides matching [CIK+09, BGL+12], stochastic probing has applications for stochastic variants of several other combinatorial problems. E.g., it can be used for Bayesian mechanism design problems [GN13], robot path-planning problems [GNS16, GNS17], and stochastic set cover problems that arise in database applications [LPRY08, DHK14]. As observed in these prior works, the optimal strategy for stochastic probing can be represented as a binary decision tree where each node represents an element of $V$: you first probe the root node element, and then depending on whether it is active or inactive, you either move to the right or the left subtree. In general, such an optimal decision tree can be exponentially sized and is hard to describe. We do not even understand how to capture it for very simple functions and constraints (e.g., the $\max$ function with cardinality constraints [HFX18]).

An alternative approach is to focus on non-adaptive strategies. Such a strategy commits to probing a feasible set $S\in\mathcal{F}$ in the beginning, irrespective of which of these elements turn out active. A non-adaptive strategy has several benefits: (a) it is easy to represent since we can just store the set $S$, (b) it is easy to find for many classes of functions and constraints (e.g., submodular functions over intersection of matroids [CVZ14]), and (c) it is parallelizable because we do not need feedback. The concern is that the expected value of the optimal non-adaptive strategy might be much smaller than that of the optimal adaptive strategy. This raises the (worst-case instance) adaptivity gap question: What is the maximum ratio between the expected values of the optimal adaptive and the optimal non-adaptive strategies for stochastic probing? If this ratio is small then we can focus on non-adaptive strategies and reap their benefits with only a small loss in value (see Figure 1).
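To make the adaptivity-gap question concrete, the following brute-force Python sketch (a toy illustration, not a construction from the paper) computes both optima on a three-element coverage instance under a budget of two probes. The instance (`COVER`, `P`, `BUDGET`) and the helper names are hypothetical. The adaptive optimum recurses over all probe outcomes; the non-adaptive optimum tries every feasible set.

```python
from functools import lru_cache
from itertools import combinations

# Hypothetical toy instance: element e covers COVER[e] if it turns out active,
# which happens independently with probability P[e].
COVER = {"a": frozenset({1, 2}), "b": frozenset({1, 2}), "c": frozenset({3})}
P = {"a": 0.5, "b": 0.5, "c": 0.9}
BUDGET = 2  # at most BUDGET probes: a downward-closed (hence prefix-closed) constraint


def f(active) -> int:
    """Coverage function |union of COVER[e] over active e|: monotone submodular."""
    return len(frozenset().union(*(COVER[e] for e in active)))


@lru_cache(maxsize=None)
def best_adaptive(probed: frozenset, active: frozenset) -> float:
    """Optimal expected value given which elements were probed and which were active."""
    best = float(f(active))  # stopping now is always allowed
    if len(probed) == BUDGET:
        return best
    for e in COVER:
        if e not in probed:
            value = (P[e] * best_adaptive(probed | {e}, active | {e})
                     + (1 - P[e]) * best_adaptive(probed | {e}, active))
            best = max(best, value)
    return best


def non_adaptive_value(S) -> float:
    """E[f(A ∩ S)] for a fixed set S, by enumerating which elements of S are active."""
    total = 0.0
    for r in range(len(S) + 1):
        for act in combinations(S, r):
            prob = 1.0
            for e in S:
                prob *= P[e] if e in act else 1 - P[e]
            total += prob * f(act)
    return total


adap = best_adaptive(frozenset(), frozenset())
non_adap = max(non_adaptive_value(S) for S in combinations(COVER, BUDGET))
print(adap, non_adap, adap / non_adap)
```

Here the adaptive strategy probes a first, then c if a is active and the replacement b otherwise, for an expected value of 1.95 against 1.9 for the best fixed pair (a ratio of about 1.03).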

Since for general combinatorial functions or constraints the adaptivity gaps can be made arbitrarily large, we need to consider special classes of functions and constraints. In a surprising result, Gupta et al. prove that for any monotone submodular function and any prefix-closed constraints¹, the adaptivity gap is at most $3$ [GNS17]. The best known lower bound in this setting, however, is only $\frac{e}{e-1}\approx 1.58$ due to Asadpour et al. [ANS08]. This leaves open the following question:

¹ Prefix-closed constraints stipulate that any prefix of a feasible probing sequence is also feasible. This class contains any downward-closed/packing constraint.

For stochastic probing, what is the (worst-case) adaptivity gap for monotone submodular functions over prefix-closed constraints?

We show that both the previously known upper bound of $3$ and the lower bound of $\frac{e}{e-1}$ are not tight. Instead, the adaptivity gap is exactly $2$ .

One might notice that submodular functions do not capture the max-matching function used to model kidney exchanges. This motivates us to consider more general combinatorial functions; in particular, we study the weighted rank function of a $k$-extendible system (defined in §2). This class generalizes intersections of $k$ matroids [Mes06]; e.g., a $2$-extendible system captures matching in general graphs (unlike intersections of two matroids). Our goal is to bound the adaptivity gap for such functions over arbitrary prefix-closed constraints.

A major drawback of the stochastic probing model is that it only considers Bernoulli random variables. One would ideally allow for more modeling power by permitting the outcome of a probe to be a non-binary value. For example, in the kidney-exchange application, one might desire to summarize an edge probe by the risk involved in performing the match: a value of $0$ describes an impossible match, a value of $1$ indicates a safe match, and the possibilities in between are represented by intermediate values. Notice that the optimal adaptive strategy is still a decision tree; however, it may no longer be binary.

The main contributions of this paper are (1) a model that extends binary stochastic probing to the multi-value setting, (2) the exact calculation of the adaptivity gap for stochastic probing of monotone submodular functions (in both the binary and the multi-value setting), and (3) a nearly tight adaptivity gap for stochastic probing of weighted rank functions over $k$-extendible systems.

1.1 Overview of Results

Our conceptual contribution is to present a generalization of the stochastic probing model to stochastic multi-value probing (${\sf SMP}$), described in §2. Roughly, the idea is that each element has $t$ potential types, and a probe reveals which one of its types it takes. This trivially captures stochastic probing for $t=2$, where the two types are active and inactive. In general these different types can be used to model different weights of an element, or even to encode different kinds of complementary relationships in the element values.

Although the ${\sf SMP}$ model is more general than the stochastic probing model, our main technical result in §3 is that for monotone submodular functions the adaptivity gap is bounded by $2$ . We also give a matching lower bound which proves this cannot be further reduced. This is despite the fact that the optimal decision tree for ${\sf SMP}$ may no longer be binary.

Theorem 1.1.

The adaptivity gap for ${\sf SMP}$ where the constraints are prefix-closed and the function is monotone non-negative submodular is exactly $2$ .

Since ${\sf SMP}$ is strictly more general than stochastic probing, Theorem 1.1 also improves the previously known upper bound of $3$ for monotone submodular stochastic probing. In fact, our lower bound ${\sf SMP}$ instance in Theorem 1.1 is Bernoulli. Thus it resolves an open question of [GNS17] of finding the optimal adaptivity gaps for submodular stochastic probing.

Our main technical result in §4 is that the adaptivity gap for the weighted rank function of a $k$-extendible system is $\tilde{\Theta}(k)$.

Theorem 1.2.

The adaptivity gap for ${\sf SMP}$ where the constraints are prefix-closed and the function is a weighted rank function of a $k$-extendible system is between $k$ and $O(k\log k)$. Moreover, for unweighted rank functions, the adaptivity gap is between $k$ and $2k$.

Since the weighted rank function of an intersection of $k$ matroids is a weighted rank function of a $k$-extendible system, Theorem 1.2 implies as a corollary that the adaptivity gap for this class is at most $O(k\log k)$. This improves the previously best known upper bound of $O(k^{4}\cdot\log n)$ for intersections of $k$ matroids, due to Gupta et al. [GNS16]. We also give an $\Omega(\sqrt{k})$ lower bound in this setting.

1.2 Techniques and Challenges

In this section we outline our main techniques and challenges for ${\sf SMP}$ adaptivity gaps.

Submodular Functions: To prove a small adaptivity gap, we need to show the existence of a “good” non-adaptive solution. A priori it is not clear how to construct such a solution; e.g., LP-based approaches do not extend beyond matroid constraints because of large integrality gaps. Since we only need to show existence, we can assume the optimal (exponential-sized) decision tree is known. A crucial idea of [GNS16] is to perform a random walk on this optimal decision tree (with probabilities given by the tree) and to probe the elements on the sampled root-leaf path. In other words, consider a non-adaptive strategy that randomly chooses a root-leaf path in the decision tree with the same probability as the optimal adaptive strategy. While this idea is natural in hindsight, analyzing the resulting non-adaptive strategy has been challenging.

In [GNS16], the authors use Freedman’s inequality (linear functions are “well-concentrated” for a martingale) to argue that simple submodular functions are well-concentrated. This step requires massive union bounds over a polynomial number of linear functions, which loses logarithmic factors. To overcome this super-constant loss, in [GNS17] the authors use an inductive approach and induct over subtrees, where in each step a stem (the all-no path) is observed. A “stem lemma” allows them to argue that for every stem the expected value of the non-adaptive algorithm is within a factor $2$ of the expected value of the adaptive strategy. Finally, they “stitch” the stems back together for the induction by using submodularity, overall losing a factor of $3$.

In this work, to prove the improved adaptivity gap of $2$ in Theorem 1.1, our insight is to modify the above induction to observe a single node at each step (instead of a stem as in [GNS17]). While we still induct over subtrees, this allows us to avoid any additional loss due to the stitching step. This induction turns out to be nontrivial because the adaptive and non-adaptive strategies can observe different types of the root element. In other words, although the non-adaptive random walk strategy follows the distribution of root-leaf paths of the adaptive strategy, it has to independently re-sample (re-probe) all the nodes on the chosen path. This hinders a direct application of induction as the marginal values in the subtrees change between the two strategies. We remedy this issue using two main ideas. First, we compare the non-adaptive strategy to a “super-strategy” that can choose from both the elements chosen by the adaptive and the non-adaptive strategies. (This is also the intuition for the gap of $2$, since the “super-strategy” has two chances to sample an element.) Second, the non-adaptive strategy forfeits any potential future value that the adaptive strategy gained at the root but the non-adaptive strategy missed due to re-sampling. (This can be done by contracting the element sampled by the adaptive strategy without receiving its value.) Notice that both these steps are pessimistic and hence give a valid upper bound on the adaptivity gap. Together these ideas suffice to match the marginal values in the subtrees and apply induction without the stitching step, yielding an adaptivity gap of $2$. Our lower bounds in §3.2 show examples where the super-strategy does not have any advantage over the adaptive strategy. Thus the adaptivity gap of $2$ is optimal.

Rank Functions: A technical challenge in extending the above inductive approach to $k$-extendible system rank functions is that their marginal values do not belong to the same class. Namely, after contracting an element, the marginal value of a submodular function is submodular, but the marginal value of a $k$-extendible system rank function may not even be subadditive. To overcome this, we first focus on unweighted rank functions. Instead of directly comparing the non-adaptive strategy to the adaptive strategy, our insight is to compare it to a greedy procedure. We show that this greedy procedure is a $k$-approximation to the adaptive strategy. Moreover, we show it has a notion of a marginal value. This allows us to compare the non-adaptive strategy to the greedy procedure in a similar way as for submodular functions, by losing another factor of $2$. Our lower bound in §4.3 shows that the factor $k$ loss in comparing to a greedy procedure is unavoidable, thereby making our analysis tight up to constants.
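The greedy comparison rests on a classical property of downward-closed $k$-extendible systems: scanning the elements in any order and keeping each one that preserves independence yields an independent set whose size is at least a $1/k$ fraction of the unweighted rank. The Python sketch below checks this by brute force for matchings ($k=2$). It is a standalone illustration of that property, not the paper's greedy procedure over the decision tree, and all names are hypothetical.

```python
from itertools import combinations, permutations


def is_matching(edges) -> bool:
    """Matchings of a graph form a 2-extendible system."""
    endpoints = [v for edge in edges for v in edge]
    return len(endpoints) == len(set(endpoints))


def greedy(order, independent):
    """Scan elements in the given order, keeping each one that preserves independence."""
    chosen = []
    for e in order:
        if independent(chosen + [e]):
            chosen.append(e)
    return chosen


def rank(elements, independent) -> int:
    """Unweighted rank: size of a largest independent subset (brute force)."""
    elements = list(elements)
    for r in range(len(elements), -1, -1):
        if any(independent(list(c)) for c in combinations(elements, r)):
            return r
    return 0


# Path a-b-c-d: scanning the middle edge first makes greedy keep only one edge.
path = [("b", "c"), ("a", "b"), ("c", "d")]
print(greedy(path, is_matching), rank(path, is_matching))

# On a small graph, every scanning order keeps at least rank/k edges (k = 2).
graph = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("a", "c")]
k = 2
assert all(k * len(greedy(order, is_matching)) >= rank(graph, is_matching)
           for order in permutations(graph))
```

On the path, greedy keeps one edge while the rank is 2, so the factor $k=2$ is attained for this order.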

Finally, the challenge in proving Theorem 1.2 for weighted $k$-extendible system rank functions is that the greedy procedure only guarantees a $k$-approximation if we go in the order of decreasing weights. Instead, our adaptivity gap proofs only work when we are greedy in the root-to-leaf path order. One way around this is to partition the elements into $O(\log n)$ exponentially weighted classes (e.g., $1,2,2^{2},\ldots$) and apply the unweighted argument to the most valuable class. Unfortunately, this loses an $\Omega(\log n)$ factor. To obtain bounds independent of the universe size $n$, our insight is that picking an element in a class “removes” at most $k$ elements from a lower weight class. We can therefore improve the $\log n$ factor loss to a $\log k$ by increasing the gap between successive classes to $\Omega(k)$. To achieve this we further combine $O(\log k)$ consecutive classes into a “super-class” (bucket). It is an interesting open question whether this $\log k$ loss is essential in going from unweighted to weighted $k$-extendible system rank functions.

1.3 Further Related Work

The adaptivity gap of stochastic packing problems has seen much interest; see, e.g., work on knapsack [DGV04, BGK11, Ma14], packing integer programs [DGV05, CIK+09, BGL+12], budgeted multi-armed bandits [GM07, GKMR11, LY13, Ma14], and orienteering [GM09, GKNR12, BN14]. All except the orienteering results rely on having relaxations that capture the constraints of the problem via linear constraints. For stochastic monotone submodular functions where the probing constraints are given by matroids, Asadpour et al. [AN16] bounded the adaptivity gap by $\frac{e}{e-1}$; Hellerstein et al. [HKL15] bounded it by $\frac{1}{\tau}$, where $\tau$ is the smallest probability of some set being materialized. Other relevant papers are [LPRY08, DHK14].

The work of Chen et al. [CIK+09] (see also [Ada11, BGL+12, BCN+15, AGM15]) sought to maximize the size of a matching subject to $b$-matching constraints; this was motivated by applications to online dating and kidney exchange. See also [RSÜ05, AR12] for pointers to other work on kidney exchange problems. The work of [GN13] abstracted out the general problem of maximizing a function (in their case, the rank function of the intersection of matroids or knapsacks) subject to probing constraints (again, intersection of matroids and knapsacks). This was improved and generalized by Adamczyk et al. [ASW14] to submodular objectives. All these results use LPs or geometric relaxations, and do not extend to arbitrary packing constraints due to large integrality gaps of the relaxations.

2 Stochastic Multi-Value Probing Model

In this section we formally define our stochastic multi-value probing (${\sf SMP}$) model using the idea of combinatorial valuation over independent elements. We also discuss some preliminaries.

2.1 Combinatorial Valuation over Independent Elements

The multi-value paradigm is based on the notion of type, which represents different “values” an element can take. This leads to combinatorial valuations over independent elements where each element independently takes its type. Similar notions have been defined before; e.g., see [RS17] and references therein.

Definition 2.1 (Combinatorial valuation $\operatorname{val}_{\mathbf{X}}$ over independent elements).

Consider a finite universe $V$ of elements of size $n=|V|$. Each element $e\in V$ obtains exactly one type from a finite set $T_{e}$ according to a given probability distribution $\mathcal{D}_{e}$ over $T_{e}$. These types are assigned independently across different elements, i.e., the random vector of types $\mathbf{X}\in\bigtimes_{e\in V}T_{e}$ is drawn from the product distribution $\prod_{e\in V}\mathcal{D}_{e}$. Given a combinatorial function $f:2^{T}\to\mathbb{R}_{\geq 0}$ for $T\stackrel{{\scriptstyle\mathrm{def}}}{{=}}\bigcup_{e\in V}T_{e}$, the valuation of a set $S\subseteq V$ is

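$$\operatorname{val}_{\mathbf{X}}(S)\stackrel{\mathrm{def}}{=}f\big(\{\mathbf{X}_{e}\mid e\in S\}\big)=f(\mathbf{X}_{S}),$$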

where we define $\mathbf{X}_{S}\stackrel{{\scriptstyle\mathrm{def}}}{{=}}\big{\{}\mathbf{X}_{e}\mid e\in S\big{\}}$ to simplify notation.

For example, in the Bernoulli case studied in the stochastic probing literature, each element has two types: active and inactive, the distributions $\mathcal{D}_{e}$ are Bernoulli, and the valuation function $\operatorname{val}_{\mathbf{X}}(S)=f(\{e\in S\mid e\text{ is active}\})$ . Another example is the multi-value max-weight matching problem described in the introduction. Here different types of an element (edge) correspond to its different weights and $\operatorname{val}_{\mathbf{X}}(S)$ is the max-weight matching in the induced subgraph on $S$ .
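A minimal sketch of Definition 2.1 for the multi-value max-weight matching example: each edge's type is its weight, drawn independently, and $\operatorname{val}_{\mathbf{X}}(S)$ is the maximum-weight matching among the probed edges. Encoding a type as the pair (edge, weight) keeps the sets $T_e$ disjoint, as the definition requires. The distributions in `DIST` and the function names are hypothetical.

```python
import math
from itertools import combinations, product

# Hypothetical multi-value instance: the type of an edge is its weight, drawn
# independently from DIST[edge]; a type is stored as the pair (edge, weight).
DIST = {
    ("u", "v"): {0.0: 0.3, 1.0: 0.7},
    ("v", "w"): {0.0: 0.5, 2.0: 0.5},
    ("w", "x"): {1.0: 1.0},
}


def f(types) -> float:
    """Max-weight matching over a set of (edge, weight) types, by brute force."""
    items = list(types)
    best = 0.0
    for r in range(1, len(items) + 1):
        for subset in combinations(items, r):
            endpoints = [v for edge, _w in subset for v in edge]
            if len(endpoints) == len(set(endpoints)):
                best = max(best, sum(w for _edge, w in subset))
    return best


def expected_val(S) -> float:
    """E_X[val_X(S)] = E_X[f(X_S)] under the product distribution over S."""
    S = list(S)
    total = 0.0
    for choice in product(*(DIST[e].items() for e in S)):  # one (weight, prob) per edge
        prob = math.prod(q for _w, q in choice)
        X_S = {(e, w) for e, (w, _q) in zip(S, choice)}
        total += prob * f(X_S)
    return total


print(expected_val(DIST))  # all three edges probed
```

With all three edges probed, the only outcome worth less than 2 is when both uncertain edges take weight 0, so the expected valuation is $0.15\cdot 1+0.85\cdot 2=1.85$.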

In this work we always assume the combinatorial function $f:2^{T}\to\mathbb{R}_{\geq 0}$ satisfies $f(\emptyset)=0$ and is monotone, i.e., $f(A)\leq f(B)$ for all $A\subseteq B$ . We also assume it belongs to one of the following classes.

•

subadditive if $f(A\cup B)\leq f(A)+f(B)$ for all $A,B\subseteq T$ .

•

submodular if $f(A\cup B)+f(A\cap B)\leq f(A)+f(B)$ for all $A,B\subseteq T$ . For $S\subseteq T$ , the contraction

$$f_{S}(A)\stackrel{\mathrm{def}}{=}f(S\cup A)-f(S)\qquad(1)$$

of a monotone submodular function is also monotone submodular.

•

weighted rank function of a family $\mathcal{F}\subseteq 2^{T}$ if $f(A)=\max_{B\in\mathcal{F}}w(A\cap B)$ where $w:2^{T}\to\mathbb{R}_{\geq 0}$ is a linear function with non-negative weights. When $w$ is the all ones vector (i.e., $w(A)=|A|$ ), we call it the unweighted rank function of $\mathcal{F}$ .
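The closure of monotone submodular functions under contraction can be spot-checked by brute force. The Python sketch below (an illustration on one hypothetical coverage function, not a proof) verifies monotonicity and the submodular inequality for $f$ and for every contraction $f_S$ over a four-element ground set of types.

```python
from itertools import chain, combinations

# Hypothetical ground set of types; type t covers the items COVER[t].
COVER = {"t1": {1, 2}, "t2": {2, 3}, "t3": {3, 4}, "t4": {1, 4, 5}}
GROUND = list(COVER)


def f(A) -> int:
    """Coverage function: monotone submodular with f(empty set) = 0."""
    return len(set().union(*(COVER[t] for t in A)))


def contract(g, S):
    """Contraction g_S(A) = g(S ∪ A) - g(S)."""
    S = frozenset(S)
    return lambda A: g(S | frozenset(A)) - g(S)


def all_subsets(ground):
    return [frozenset(c) for c in chain.from_iterable(
        combinations(ground, r) for r in range(len(ground) + 1))]


def is_monotone_submodular(g, ground) -> bool:
    """Check monotonicity and g(A ∪ B) + g(A ∩ B) <= g(A) + g(B) on every pair."""
    subsets = all_subsets(ground)
    for A in subsets:
        for B in subsets:
            if A <= B and g(A) > g(B):
                return False
            if g(A | B) + g(A & B) > g(A) + g(B):
                return False
    return True


print(all(is_monotone_submodular(contract(f, S), GROUND) for S in all_subsets(GROUND)))
```

As a sanity check of the checker itself, squaring the coverage function breaks submodularity (e.g., on $\{t_1\}$ and $\{t_3\}$) and is rejected.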

In particular, we work with rank functions of two special families $\mathcal{F}\subseteq 2^{T}$. Subsets in the family are called independent subsets. A family $\mathcal{F}\ni\emptyset$ forms a

•

matroid if for every $A,B\in\mathcal{F}$ with $|A|>|B|$ there exists $e\in A\setminus B$ such that $B\cup\{e\}\in\mathcal{F}$.

•

$k$-extendible system if for every $A\subseteq B\in\mathcal{F}$ and $e\in T$ with $A\cup\{e\}\in\mathcal{F}$, there is a set $Z\subseteq B\setminus A$ such that $|Z|\leq k$ and $(B\setminus Z)\cup\{e\}\in\mathcal{F}$.

This latter family is important because it generalizes the family of intersections of $k$ matroids; e.g., a $2$-extendible system captures general graph matchings (see [CCPV11] for further discussion).
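The $k$-extendible definition can be tested mechanically on small examples. This brute-force Python sketch (helper names are hypothetical) enumerates every $A\subseteq B\in\mathcal{F}$ and every element $e$, and searches for a set $Z$ of at most $k$ elements of $B\setminus A$ to drop. On the matchings of the path a-b-c-d it confirms 2-extendibility and shows that 1-extendibility fails.

```python
from itertools import chain, combinations


def powerset(items):
    items = list(items)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))]


def is_matching(edges) -> bool:
    endpoints = [v for edge in edges for v in edge]
    return len(endpoints) == len(set(endpoints))


def is_k_extendible(ground, independent, k) -> bool:
    """Brute-force check of the k-extendible definition on a small ground set."""
    family = [B for B in powerset(ground) if independent(B)]
    for B in family:
        for A in powerset(B):
            for e in ground:
                if not independent(A | {e}):
                    continue
                rest = list(B - A)
                # Need Z ⊆ B \ A with |Z| <= k and (B \ Z) ∪ {e} independent.
                if not any(independent((B - frozenset(Z)) | {e})
                           for r in range(min(k, len(rest)) + 1)
                           for Z in combinations(rest, r)):
                    return False
    return True


# Path a-b-c-d: adding the middle edge to {ab, cd} forces removing both edges.
PATH = [("a", "b"), ("b", "c"), ("c", "d")]
print(is_k_extendible(PATH, is_matching, 2), is_k_extendible(PATH, is_matching, 1))
```

The failing witness for $k=1$ is $A=\emptyset$, $B=\{ab,cd\}$ and $e=bc$: adding the middle edge forces both edges of $B$ out.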

2.2 Adaptive Strategies and ${\sf SMP}$

Roughly, the goal of an ${\sf SMP}$ problem is to maximize a combinatorial function over independent elements under some “feasibility constraints”. We define a probe of an element $e\in V$ to be an operation that reveals its random type $X_{e}\in T_{e}$ . A probing sequence is an ordered sequence of probes on some elements.

The ${\sf SMP}$ problem only allows a family of probing sequences $\mathcal{C}$, which are called feasible. We assume minimal properties from this family. Specifically, it is prefix-closed, i.e., for every sequence in $\mathcal{C}$, each of its prefixes is also in $\mathcal{C}$. This prefix-closed family is powerful because it generalizes any downward-closed family $\mathcal{F}$ (i.e., for all $A\in\mathcal{F}$ and $B\subseteq A$ we have $B\in\mathcal{F}$) and can also capture precedence constraints.
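A small Python sketch of the prefix-closed property (our own illustration with hypothetical costs `COST`, budget `BUDGET`, and a hypothetical precedence rule). It builds the family of probing sequences whose total cost fits the budget, restricts it by requiring $e_2$ to be probed only after $e_1$, and checks that both families are prefix-closed.

```python
from itertools import permutations


def is_prefix_closed(family) -> bool:
    """Every prefix of every sequence in the family must also be in the family."""
    family = set(family)
    return all(seq[:i] in family for seq in family for i in range(len(seq)))


# Hypothetical probing costs and a total budget (a packing constraint on the probed set).
COST = {"e1": 2, "e2": 3, "e3": 4}
BUDGET = 6
budgeted = {seq for r in range(len(COST) + 1) for seq in permutations(COST, r)
            if sum(COST[e] for e in seq) <= BUDGET}

# Hypothetical precedence constraint: e2 may only be probed after e1.
precedence = {seq for seq in budgeted
              if "e2" not in seq or ("e1" in seq and seq.index("e1") < seq.index("e2"))}

print(is_prefix_closed(budgeted), is_prefix_closed(precedence))
print(("e1", "e2") in precedence, ("e2",) in precedence)
```

The precedence family contains (e1, e2) but not (e2,), so viewed as a family of sets it is not downward-closed; it is nevertheless prefix-closed.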

We now define an adaptive strategy, which constitutes a feasible solution for ${\sf SMP}$; the nodes of its decision tree correspond to probes of elements.

Definition 2.2 (Adaptive strategy $\mathcal{T}$ ).

It is a rooted decision tree where each non-leaf node is labeled with an element $e\in V$ and has $|T_{e}|$ arcs to child nodes. Each arc is uniquely labeled with a type $t\in T_{e}$ . Whenever we encounter a node labeled $e$ , the adaptive strategy probes $e$ and proceeds to the subtree corresponding to the arc labeled $X_{e}\sim\mathcal{D}_{e}$ . The strategy terminates on reaching a leaf and receives a value of $\operatorname{val}_{\mathbf{X}}(S(\mathbf{X}))$ , where $S(\mathbf{X})\subseteq V$ is the set of probed elements by strategy $\mathcal{T}$ for type vector $\mathbf{X}$ . The objective is the expected valuation, which we denote by

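$${\sf adap}(\mathcal{T},f)\stackrel{\mathrm{def}}{=}\mathbb{E}_{\mathbf{X}}\big[\operatorname{val}_{\mathbf{X}}(S(\mathbf{X}))\big].$$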

Notice, since $f$ is monotone, a strategy never gains value by removing a probed element. We say a strategy $\mathcal{T}$ is feasible for $\mathcal{C}$ if every root-leaf path belongs to $\mathcal{C}$ . We now formally define an ${\sf SMP}$ problem.
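Definition 2.2 translates directly into a recursive computation of ${\sf adap}(\mathcal{T},f)$. In the Python sketch below (a hypothetical toy tree; the `Node` class and the distributions are our own), each internal node probes an element and weights each child by the probability of the corresponding type. A type is recorded as an (element, type) pair, and the objective `f_max`, the maximum observed value, is monotone submodular.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Internal node of a decision tree: probe `element`, then follow the arc of its type.

    `children` maps a type to the next Node; a type missing from `children` leads to a leaf.
    """
    element: str
    children: dict = field(default_factory=dict)


# Hypothetical type distributions D_e (a type here is a number).
DIST = {"a": {0: 0.5, 2: 0.5}, "b": {1: 1.0}, "c": {0: 0.6, 3: 0.4}}


def f_max(types) -> float:
    """Monotone submodular example: the largest value among observed (element, type) pairs."""
    return max((t for _e, t in types), default=0)


def adap(node, dist, f, observed=frozenset()) -> float:
    """adap(T, f) = E_X[f(X_{S(X)})], computed exactly by recursing over arc probabilities."""
    if node is None:  # leaf: collect the value of everything observed on the path
        return f(observed)
    return sum(q * adap(node.children.get(t), dist, f, observed | {(node.element, t)})
               for t, q in dist[node.element].items())


# Probe a; if a has type 2 probe c, otherwise probe b.
tree = Node("a", {2: Node("c"), 0: Node("b")})
print(adap(tree, DIST, f_max))
```

The expected valuation of this tree is $0.5\cdot(0.6\cdot 2+0.4\cdot 3)+0.5\cdot 1=1.7$.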

Definition 2.3 ( ${\sf SMP}$ problem $(\mathcal{C},\operatorname{val}_{\mathbf{X}})$ ).

Given a prefix-closed family of probing constraints $\mathcal{C}$ and a combinatorial valuation $\operatorname{val}_{\mathbf{X}}$ over independent elements, an ${\sf SMP}$ problem is to find a feasible adaptive strategy $\mathcal{T}$ to maximize the expected valuation ${\sf adap}(\mathcal{T},f)$ .

2.3 Non-Adaptive Strategies and Adaptivity Gaps

A strategy to solve an ${\sf SMP}$ problem can benefit from adjusting its probing sequence based on the outcomes of the already probed elements. For instance, in the kidney-exchange example if one finds an edge incident to a vertex $u$ , one may choose not to probe any other edges incident to $u$ . On the other hand, a strategy that always decides the next probe independent of the outcomes of the probed elements is called non-adaptive. Our goal is to study the largest ratio between adaptive and non-adaptive strategies.

Definition 2.4 (Adaptivity gap for $\mathcal{P}$ ).

Let $\mathcal{P}$ be a class of SMP problems (e.g., monotone submodular functions over prefix-closed constraints). Define the adaptivity gap as the largest (worst-case instance) ratio of the optimal adaptive and optimal non-adaptive strategies for a problem $(\mathcal{C},\operatorname{val}_{\mathbf{X}})\in\mathcal{P}$ , i.e.,

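$$\sup_{(\mathcal{C},\operatorname{val}_{\mathbf{X}})\in\mathcal{P}}\ \frac{\sup_{\mathcal{T}\text{ feasible for }\mathcal{C}}{\sf adap}(\mathcal{T},f)}{\sup_{S\in\mathcal{C}}\mathbb{E}_{\mathbf{X}}\big[\operatorname{val}_{\mathbf{X}}(S)\big]}.$$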

Notice that in the denominator $S$ does not depend on $\mathbf{X}$ .

The adaptivity gap for a general combinatorial function $f$ is unbounded [GNS16]. In this work we focus on monotone submodular functions and (weighted) rank functions of a $k$-extendible system. We bound adaptivity gaps by analyzing the following natural random walk non-adaptive strategy.

Definition 2.5 (Random walk non-adaptive strategy).

For any given adaptive strategy $\mathcal{T}$ , there is a corresponding non-adaptive strategy that (virtually) draws a sample $\mathbf{X}\sim\prod_{e\in V}\mathcal{D}_{e}$ from the product distribution and traverses $\mathcal{T}$ along the root-leaf path for $\mathbf{X}$ (i.e., when at a node labeled $e$ , traverse the unique arc labeled $X_{e}$ ). Let $S(\mathbf{X})$ be the random set of elements probed by such a root-leaf path. The true (non-virtual) types of elements correspond to the vector of outcomes $\mathbf{X}^{\prime}\sim\prod_{e\in V}\mathcal{D}_{e}$ . Here $\mathbf{X}$ and $\mathbf{X}^{\prime}$ are i.i.d. r.v.s. The random walk non-adaptive strategy probes $S$ according to the above distribution and receives the valuation

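$$\mathbb{E}_{\mathbf{X},\mathbf{X}^{\prime}}\big[\operatorname{val}_{\mathbf{X}^{\prime}}(S(\mathbf{X}))\big]=\mathbb{E}_{\mathbf{X},\mathbf{X}^{\prime}}\big[f(\mathbf{X}^{\prime}_{S(\mathbf{X})})\big].$$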
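To make Definition 2.5 concrete, here is a minimal Python sketch for the Bernoulli special case (two types per element). The instance (`P`, `TREE`, `COVERS`), the tree encoding, and the coverage valuation `f` are hypothetical choices for illustration only; both expectations are computed exactly by enumerating all outcome vectors, so this is only meant for tiny instances.

```python
from itertools import product

# Hypothetical Bernoulli instance: element e is active (type 1) with probability P[e].
P = {"a": 0.5, "b": 0.5, "c": 0.5}

# Hypothetical decision tree: a node is (element, {type: child}); None is a leaf.
# Probe a; if it is active probe c (new coverage), otherwise probe b (a backup for a).
TREE = ("a", {1: ("c", {0: None, 1: None}), 0: ("b", {0: None, 1: None})})

# Hypothetical coverage valuation: monotone submodular in the set of active elements.
COVERS = {"a": {1}, "b": {1}, "c": {2}}


def f(active):
    covered = set()
    for e in active:
        covered |= COVERS[e]
    return len(covered)


def probed_set(tree, x):
    """S(x): elements on the root-leaf path obtained by following the arcs labeled x[e]."""
    path = []
    while tree is not None:
        e, children = tree
        path.append(e)
        tree = children[x[e]]
    return path


def outcomes():
    """Yield every joint outcome vector together with its probability."""
    elems = sorted(P)
    for bits in product((0, 1), repeat=len(elems)):
        x = dict(zip(elems, bits))
        prob = 1.0
        for e in elems:
            prob *= P[e] if x[e] else 1 - P[e]
        yield x, prob


def adaptive_value(tree):
    # E_X[f(X_{S(X)})]: the path and the rewarded types both come from X.
    return sum(px * f([e for e in probed_set(tree, x) if x[e]]) for x, px in outcomes())


def random_walk_value(tree):
    # E_{X,X'}[f(X'_{S(X)})]: the virtual X picks the path, the true X' is rewarded.
    total = 0.0
    for x, px in outcomes():
        path = probed_set(tree, x)
        for x_true, p_true in outcomes():
            total += px * p_true * f([e for e in path if x_true[e]])
    return total


adap = adaptive_value(TREE)
alg = random_walk_value(TREE)
print(adap, alg)  # prints 1.0 0.875
```

On this instance the adaptive strategy benefits from switching to the backup `b` only when `a` fails, while the random walk commits to a path using a virtual outcome of `a`.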

3 Adaptivity Gaps for a Monotone Submodular Function

In this section we prove our first main result, Theorem 1.1, which gives the optimal adaptivity gap for monotone submodular functions. In §3.1 we prove its upper bound and in §3.2 its lower bound.

3.1 Upper Bound of $2$

Our non-adaptive strategy samples a random root-leaf path of the optimal adaptive strategy tree $\mathcal{T}$ (Definition 2.5). In other words, it performs a “dry run” of a random walk along the tree without probing anything, and at the end it probes all the elements on this random root-leaf path. We argue that its expected value is at least half that of the adaptive strategy. We encourage the reader to follow the proof idea outlined in §1.2, since the algebra can obscure the main ideas.

Proof of the upper bound in Theorem 1.1.

We induct over the depth of the tree $\mathcal{T}$ , i.e., for any monotone submodular function $f$ and tree $\mathcal{T}$ of depth at most $d$ , we have

[TABLE]

The base case $d=1$ is trivially true because the tree is a single node, so both strategies probe the same element and its virtual and true types are identically distributed. For the induction step, let $e$ be the root node of the optimal decision tree $\mathcal{T}$ . Denote by $I\stackrel{{\scriptstyle\mathrm{def}}}{{=}}X_{e}$ the (random) type of element $e$ when probed by the adaptive strategy (which is also its virtual type in the non-adaptive strategy), and let $R\stackrel{{\scriptstyle\mathrm{def}}}{{=}}X^{\prime}_{e}$ be the (random) true type seen by the non-adaptive strategy. Also, let $\mathcal{T}_{I}$ denote the subtree the adaptive strategy moves to when the root element has type $I$ , and let $f_{I}$ be the contraction from Eq. (1). This implies

[TABLE]

Now, using submodularity and monotonicity of $f$ , we bound the value of the adaptive strategy

[TABLE]

where the last inequality uses that every monotone submodular function is subadditive. Notice that $I$ and $R$ are i.i.d. variables. This along with linearity of expectation implies

[TABLE]

Next, we lower bound the expected value of the non-adaptive strategy from Eq. (4). We use monotonicity of $f$ to get

[TABLE]

Since $f_{I\cup R}$ is also a monotone submodular function over independent elements and $\mathcal{T}_{I}$ is an adaptive strategy tree of depth at most $d-1$ , by the induction hypothesis

[TABLE]

Combining this with Eq. (5) and Eq. (6), we get

[TABLE]

which finishes the proof of the upper bound by induction. ∎

3.2 Lower Bound of $2$

In this section we exhibit a monotone non-negative submodular function and a prefix-closed set of constraints for which the adaptivity gap of stochastic probing is arbitrarily close to $2$ . Combined with §3.1, this proves Theorem 1.1, i.e., that the optimal adaptivity gap is exactly $2$ .

The proof below uses a stochastic probing instance on an infinite universe. Since submodular functions are defined only on finite sets, this proof is informal; we present it to explain our main ideas and defer the formal proof to Appendix A.

Informal proof of the lower bound in Theorem 1.1.

Our example is on a universe $V:=\{e_{(k,l)}\mid k,l\in\mathbb{Z}_{\geq 0}\}$ where every element is independently active with probability $\epsilon$ for some $0<\epsilon<1$ .

Example:

We define our submodular objective $f$ to be the weighted rank function of a partition matroid that selects at most one element from each part. The elements are partitioned according to their first label: for every $k\in\mathbb{Z}_{\geq 0}$ the set $\{e_{(k,l)}\mid l\in\mathbb{Z}_{\geq 0}\}$ is a part of the partition matroid with weight $(1-\epsilon)^{k}$ . In other words, for any set $S\subseteq V$ let $K(S):=\{k\mid e_{(k,l)}\in S\}$ be the set of distinct first labels of elements in $S$ ; then

$$f(S)\stackrel{{\scriptstyle\mathrm{def}}}{{=}}\sum_{k\in K(S)}(1-\epsilon)^{k}.$$

Note that this series always converges, since it is bounded by $\sum_{k\geq 0}(1-\epsilon)^{k}=1/\epsilon$ , so $f$ is well defined.
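The objective can be checked mechanically on a finite window of the universe. The following Python sketch, with a hypothetical value `EPS = 0.1` and a hypothetical window of first labels $k<3$ and second labels $l<2$ , evaluates $f$ and brute-forces monotonicity and diminishing returns. It is only a sanity check on a small truncation, not a proof for the infinite instance.

```python
from itertools import combinations

EPS = 0.1  # hypothetical value of epsilon


def f(S, eps=EPS):
    """Weighted partition-matroid rank: sum of (1 - eps)^k over the distinct first labels k in S."""
    return sum((1 - eps) ** k for k in {k for (k, _l) in S})


def subsets(items):
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield set(combo)


# Hypothetical finite window of the universe: elements e_(k,l) with k < 3 and l < 2.
WINDOW = [(k, l) for k in range(3) for l in range(2)]

# Brute-force check of monotonicity and diminishing returns for all S ⊆ T ⊆ WINDOW.
submodular_and_monotone = all(
    f(S | {e}) - f(S) >= f(T | {e}) - f(T) - 1e-12 and f(T | {e}) >= f(T) - 1e-12
    for T in subsets(WINDOW)
    for S in subsets(T)
    for e in WINDOW
    if e not in T
)

# Any set is dominated by the geometric series sum_k (1 - eps)^k = 1 / eps.
many_columns = {(k, 0) for k in range(50)}
print(submodular_and_monotone, f(many_columns) < 1 / EPS)
```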

To define our prefix-closed constraints, we consider an infinite directed acyclic graph in which every element is identified with a single node. Every node/element $e_{(k,l)}$ has exactly two outgoing edges: towards $e_{(k,l+1)}$ and towards $e_{(k+l+1,0)}$ . We call $\{e_{(k,0)},e_{(k,1)},\ldots\}$ the elements on column $k$ . The probing constraint is that a sequence of elements can be probed if and only if it corresponds to a directed path starting at $e_{(0,0)}$ . See Figure 2 for an illustration.

Analysis:

We first give an adaptive strategy with value $2-\epsilon$ (in Eq. (7)) and later argue that every non-adaptive strategy has value at most $1$ (in Eq. (8)), thereby proving the lower bound. The probing constraint allows infinite strategies, and in general it would not be clear how to define their expected values. However, since $f$ is monotone and every active element is included in the solution, the values of longer and longer finite prefixes of a strategy are nondecreasing, so the expected value of an infinite strategy can be defined as the limit of the values of strategies that probe only a finite number of elements. The finite lower bound example in Appendix A is constructed by restricting $V$ so that the resulting strategies are close to this limit.

Our adaptive strategy ${\sf adap}$ starts by probing element $e_{(0,0)}$ and is defined recursively: after probing $e_{(k,l)}$ , the next element to probe is $e_{(k+l+1,0)}$ if $e_{(k,l)}$ is found active, and $e_{(k,l+1)}$ otherwise. In other words, it probes elements on a column until it finds an active one, and then moves to a new column.

Let ${\sf adap}(k)$ denote the expected additional value of the above adaptive strategy if the next probed element is $e_{(k,0)}$ , and let ${\sf adap}\stackrel{{\scriptstyle\mathrm{def}}}{{=}}{\sf adap}(0)$ denote the expected value of the entire strategy. Note that ${\sf adap}(k)$ does not depend on which elements were found active before probing $e_{(k,0)}$ (i.e., among the elements $e_{(k^{\prime},l^{\prime})}$ with $k^{\prime}<k$ ). Furthermore, the subgraph reachable from $e_{(k,0)}$ is isomorphic to the entire graph on $V$ via the relabeling $e_{(k+k^{\prime},l)}\mapsto e_{(k^{\prime},l)}$ , the only difference being that the value of any subset is multiplied by a factor of $(1-\epsilon)^{k}$ . Therefore, we have

$${\sf adap}(k)=(1-\epsilon)^{k}\cdot{\sf adap}(0)\qquad\text{for all }k\in\mathbb{Z}_{\geq 0}.$$

Now, summing over the number $l$ of inactive elements on column $0$ before the first active one, we get

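$${\sf adap}(0)=\sum_{l\geq 0}\epsilon(1-\epsilon)^{l}\big(1+{\sf adap}(l+1)\big)=1+\epsilon\,{\sf adap}(0)\sum_{l\geq 0}(1-\epsilon)^{2l+1}=1+\frac{1-\epsilon}{2-\epsilon}\,{\sf adap}(0),$$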

which uses ${\sf adap}(k)=(1-\epsilon)^{k}\cdot{\sf adap}(0)$ . Solving this equation yields the result:

$${\sf adap}={\sf adap}(0)=2-\epsilon.\tag{7}$$

Similarly, let ${\sf alg}(k)$ denote the expected additional value of the optimal non-adaptive strategy if the next probed element is $e_{(k,0)}$ , and let ${\sf alg}={\sf alg}(0)$ denote the expected value of the optimal non-adaptive strategy. By the same self-similarity argument as for ${\sf adap}(k)$ , we have

$${\sf alg}(k)=(1-\epsilon)^{k}\cdot{\sf alg}(0)\qquad\text{for all }k\in\mathbb{Z}_{\geq 0}.$$

Let $k$ denote the number of elements the optimal non-adaptive strategy probes on column $0$ , so that its next probe (if any) is $e_{(k,0)}$ . We get

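$${\sf alg}(0)\leq\big(1-(1-\epsilon)^{k}\big)+{\sf alg}(k)=1-(1-\epsilon)^{k}+(1-\epsilon)^{k}\cdot{\sf alg}(0),$$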

which uses ${\sf alg}(k)=(1-\epsilon)^{k}\cdot{\sf alg}(0)$ . This implies

$${\sf alg}={\sf alg}(0)\leq 1.\tag{8}$$

Combining Eq. (7) and Eq. (8), we get an adaptivity gap arbitrarily close to $2$ for $\epsilon\rightarrow 0$ . ∎
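The argument above can be checked numerically on the finite truncation used in Appendix A. The Python sketch below computes, by backward dynamic programming over columns, the exact value of the adaptive strategy and of the best non-adaptive strategy on $V=\{e_{(k,l)}:k+l\leq D\}$ ; the choice $\epsilon=0.02$ is a hypothetical example. For the non-adaptive side it relies on $f$ being additive across columns, so a non-adaptive path is described by how many elements it probes on each visited column.

```python
def truncated_values(eps):
    """Exact values of the adaptive strategy and the best non-adaptive strategy on the
    truncated instance V = {e_(k,l) : k + l <= D} described in Appendix A."""
    # D is the smallest integer with (1 - eps)^D < eps^2, as in Appendix A.
    D = 0
    while (1 - eps) ** D >= eps ** 2:
        D += 1
    weight = [(1 - eps) ** k for k in range(D + 2)]  # weight of column k
    adap = [0.0] * (D + 2)  # adap[k]: adaptive value when the next probe is e_(k,0)
    alg = [0.0] * (D + 2)   # alg[k]: best non-adaptive value when the next probe is e_(k,0)
    for k in range(D, -1, -1):
        # Adaptive: the first active element on column k is e_(k,l) with probability
        # eps * (1 - eps)^l; it earns weight[k] and the walk jumps to column k + l + 1.
        adap[k] = sum(
            eps * (1 - eps) ** l * (weight[k] + adap[k + l + 1])
            for l in range(D - k + 1)
        )
        # Non-adaptive: commit to probing m elements of column k, then continue at e_(k+m,0).
        alg[k] = max(
            weight[k] * (1 - (1 - eps) ** m) + alg[k + m]
            for m in range(1, D - k + 2)
        )
    return adap[0], alg[0], D


adap0, alg0, D = truncated_values(0.02)
print(D, round(adap0, 4), round(alg0, 4))  # adaptive close to 2 - eps, non-adaptive below 1
```

On the truncation, the column gains of any maximal non-adaptive path telescope, so every such path earns exactly $1-(1-\epsilon)^{D+1}$ ; the adaptive value stays just below the infinite-instance value $2-\epsilon$ .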

4 Adaptivity Gaps for a Weighted Rank Function of a $k$ -Extendible System

For a downward-closed family $\mathcal{F}$ , recall that we define its rank function $f_{\mathcal{F}}:2^{V}\rightarrow\mathbb{R}_{\geq 0}$ to be the size of the largest subset of $S$ that lies in $\mathcal{F}$ , i.e., $f_{\mathcal{F}}(S)\stackrel{{\scriptstyle\mathrm{def}}}{{=}}\max_{T\subseteq S~{}\&~{}T\in\mathcal{F}}|T|=\max_{T\in\mathcal{F}}|S\cap T|$ . In this section we prove our results on the adaptivity gaps of a weighted rank function of a $k$ -extendible system, stated in Theorem 1.2.

In §4.1 we prove the upper bound for unweighted $k$ -extendible systems, and in §4.2 we give a reduction from weighted to unweighted $k$ -extendible systems that loses a factor $O(\log k)$ in the adaptivity gap. Our lower bound is presented in §4.3.

To simplify our proofs, we call an element $e\in T$ a loop in $\mathcal{F}\subseteq 2^{T}$ if $\{e\}\not\in\mathcal{F}$ . Furthermore, given a non-loop element $e\in T$ , we define the contraction $\mathcal{F}/e$ as $\{F\setminus\{e\}\mid F\in\mathcal{F},e\in F\}$ , i.e., the family of subsets that contain $e$ , with $e$ removed. We also need the following property of $k$ -extendible systems, which intuitively says that adding a set $E\in\mathcal{F}$ to another set $B\in\mathcal{F}$ forces the removal of at most $k\cdot|E|$ elements of $B$ . We include the proof for completeness in Appendix B.

Fact 4.1.

Let $\mathcal{F}\subseteq 2^{T}$ be a $k$ -extendible system. For every $A\subseteq B\in\mathcal{F}$ and $E\subseteq T$ where $A\cup E\in\mathcal{F}$ , there exists a set $Z\subseteq B\setminus A$ such that $|Z|\leq k\cdot|E|$ and $(B\setminus Z)\cup E\in\mathcal{F}$ .
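Fact 4.1 can be sanity-checked by brute force on a tiny system. The sketch below assumes a hypothetical example: $\mathcal{F}$ is the family of matchings of a 4-cycle with one chord, a standard example of a 2-extendible system (adding an edge to a matching conflicts with at most two of its edges). It confirms the fact for $k=2$ and shows that it fails for $k=1$ on this graph.

```python
from itertools import combinations

# Hypothetical ground set: the edges of the 4-cycle 0-1-2-3-0 plus the chord (0, 2).
EDGES = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]


def is_matching(S):
    """Membership oracle for F = matchings (a 2-extendible system)."""
    seen = set()
    for u, v in S:
        if u in seen or v in seen:
            return False
        seen.update((u, v))
    return True


def subsets(items):
    items = list(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def fact_4_1_holds(k):
    """For all A ⊆ B ∈ F and E with A ∪ E ∈ F, look for Z ⊆ B - A
    with |Z| <= k|E| and (B - Z) ∪ E ∈ F."""
    family = [B for B in subsets(EDGES) if is_matching(B)]
    for B in family:
        for A in subsets(B):
            for E in subsets(EDGES):
                if not is_matching(A | E):
                    continue
                if not any(len(Z) <= k * len(E) and is_matching((B - Z) | E)
                           for Z in subsets(B - A)):
                    return False
    return True


print(fact_4_1_holds(2), fact_4_1_holds(1))  # True False
```

The $k=1$ failure comes from $B=\{(0,1),(2,3)\}$ , $A=\emptyset$ and $E=\{(0,2)\}$ : the chord conflicts with both edges of $B$ .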

4.1 Upper Bound of $2k$ for an Unweighted $k$ -Extendible System

Let $\mathcal{T}$ denote the optimal adaptive strategy for maximizing the rank function $f$ of a given $k$ -extendible system $\mathcal{F}$ . We prove the following unweighted upper bound of Theorem 1.2.

Theorem 4.2.

The adaptivity gap for ${\sf SMP}$ where the constraints are prefix-closed and the function is an unweighted rank function of a $k$ -extendible system is at most $2k$ .

We use the random walk strategy to convert the adaptive strategy $\mathcal{T}$ into a non-adaptive strategy. To analyze it, we define a natural greedy procedure that selects a subset of a given set $A\subseteq T$ lying in $\mathcal{F}\subseteq 2^{T}$ . Consider the elements of $A$ in an arbitrary order (which may even be determined on the fly). If the currently considered element is a non-loop of the current (contracted) family, it gets contracted; otherwise it is ignored. The set of contracted elements is in $\mathcal{F}$ , and the final output, the number of contracted elements, is denoted by $\operatorname{greedy}(A)$ . We first show that for $k$ -extendible systems this greedy procedure gives a $k$ -approximation to the size of the largest subset of $A$ in $\mathcal{F}$ . A similar statement has been proven by Mestre [Mes06].
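The greedy procedure only needs a membership oracle for $\mathcal{F}$ , because an element $e$ is a non-loop of the current contraction $\mathcal{F}/C$ exactly when $C\cup\{e\}\in\mathcal{F}$ . A minimal Python sketch, using a hypothetical matching instance (a 2-extendible system) to show that the output depends on the scanning order:

```python
def greedy(order, is_feasible):
    """Scan `order` and contract every element that is a non-loop of the current family.

    e is a non-loop of F / contracted exactly when contracted ∪ {e} ∈ F, so a
    membership oracle `is_feasible` for F suffices. Returns greedy(A) and the set.
    """
    contracted = []
    for e in order:
        if is_feasible(contracted + [e]):
            contracted.append(e)  # contract e
        # otherwise e is a loop of the current family and is ignored
    return len(contracted), contracted


def is_matching(S):
    # Hypothetical k-extendible system: matchings of a graph (k = 2).
    seen = set()
    for u, v in S:
        if u in seen or v in seen:
            return False
        seen.update((u, v))
    return True


# Hypothetical instance: the path 0-1-2-3, whose maximum matching has 2 edges.
print(greedy([(1, 2), (0, 1), (2, 3)], is_matching))  # (1, [(1, 2)])
print(greedy([(0, 1), (1, 2), (2, 3)], is_matching))  # (2, [(0, 1), (2, 3)])
```

The two orders give 1 and 2, so on this instance the greedy output is off by exactly the factor $k=2$ permitted by Lemma 4.3.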

Lemma 4.3.

Let $f$ be a rank function of a $k$ -extendible system $\mathcal{F}\subseteq 2^{T}$ . Fix any subset $A\subseteq T$ and consider the output of the greedy procedure $\operatorname{greedy}(A)$ with an arbitrary ordering of $A$ . Then $f(A)\leq k\cdot\operatorname{greedy}(A)$ . Moreover, for any $A\subseteq B\subseteq T$ we have $f(A)\leq k\cdot\operatorname{greedy}(B)$ .

Proof.

Let $G\subseteq B$ be the set picked by $\operatorname{greedy}(B)$ . Notice that $G$ is a maximal (though not necessarily maximum) subset of $B$ in $\mathcal{F}$ . On the other hand, let $\mathrm{OPT}\subseteq A$ be the set achieving $f(A)$ , i.e., a maximum-cardinality subset of $A$ in $\mathcal{F}$ . Our goal is to prove $|\mathrm{OPT}|\leq k\cdot|G|$ .

Let $C\stackrel{{\scriptstyle\mathrm{def}}}{{=}}\mathrm{OPT}\cap G$ . Since $G=C\cup(G\setminus C)\in\mathcal{F}$ and $C\subseteq\mathrm{OPT}$ , Fact 4.1 gives a set $Z\subseteq\mathrm{OPT}\setminus C$ with $|Z|\leq k\cdot|G\setminus C|=k\cdot|G|-k\cdot|C|$ such that $(\mathrm{OPT}\setminus Z)\cup(G\setminus C)=((\mathrm{OPT}\setminus C)\setminus Z)\cup G\in\mathcal{F}$ . However, $(\mathrm{OPT}\setminus C)\setminus Z\subseteq B$ is disjoint from $G$ , and $G$ is maximal among subsets of $B$ in $\mathcal{F}$ , so $(\mathrm{OPT}\setminus C)\setminus Z=\emptyset$ . Hence $|\mathrm{OPT}|\leq|Z|+|C|\leq k\cdot|G|-k\cdot|C|+|C|=k\cdot|G|-(k-1)|C|\leq k\cdot|G|$ . ∎
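Lemma 4.3 can also be checked exhaustively on a tiny example. The Python sketch below again assumes a hypothetical matching instance (so $k=2$ ); it checks $f(B)\leq k\cdot\operatorname{greedy}(B)$ for every subset $B$ and every scanning order, and reports the worst ratio observed, which shows the factor $k$ is attained here.

```python
from itertools import combinations, permutations

# Hypothetical 2-extendible system: matchings of the 4-cycle 0-1-2-3-0 plus the chord (0, 2).
EDGES = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
K = 2


def is_matching(S):
    seen = set()
    for u, v in S:
        if u in seen or v in seen:
            return False
        seen.update((u, v))
    return True


def rank(S):
    """f(S): size of a maximum matching contained in S (brute force)."""
    S = list(S)
    return max(r for r in range(len(S) + 1)
               for combo in combinations(S, r) if is_matching(combo))


def greedy(order):
    """Number of elements contracted when scanning `order` with the greedy procedure."""
    contracted = []
    for e in order:
        if is_matching(contracted + [e]):
            contracted.append(e)
    return len(contracted)


# Check f(B) <= K * greedy(B) for every nonempty B and every order of B; the case
# A ⊊ B of Lemma 4.3 then follows from monotonicity, f(A) <= f(B).
lemma_holds, worst = True, 0.0
for r in range(1, len(EDGES) + 1):
    for B in combinations(EDGES, r):
        f_B = rank(B)
        for order in permutations(B):
            g = greedy(order)
            lemma_holds &= f_B <= K * g
            worst = max(worst, f_B / g)
print(lemma_holds, worst)  # True 2.0
```

The worst case arises, for instance, when the chord $(0,2)$ is scanned first: it blocks every other edge, while the maximum matching has two edges.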

Given the above properties of a $k$ -extendible system, we can now prove Theorem 4.2.

Proof of Theorem 4.2.

Let $\mathbf{X}$ and $\mathbf{X}^{\prime}$ denote the element types for the adaptive and the non-adaptive algorithms, respectively. The adaptive strategy on the optimal decision tree $\mathcal{T}$ gets value $f(\mathbf{X}_{S})$ , where $S\subseteq V$ is the set of probed elements by strategy $\mathcal{T}$ for type vector $\mathbf{X}$ . We compare this value to a greedy strategy $\operatorname{greedy}(\mathbf{X}_{S}\cup\mathbf{X}^{\prime}_{S})$ in which

(a) we consider the elements of $S$ in the root-to-leaf order in which they appear on the tree, and

(b) for any $e\in S$ , we consider $\mathbf{X}^{\prime}_{e}$ (the true type) before $\mathbf{X}_{e}$ (the virtual type) in the greedy order.

By Lemma 4.3 we have

[TABLE]

By induction on the subtrees, below we prove

[TABLE]

This finishes the proof of Theorem 4.2 because the optimal non-adaptive algorithm has value at least

[TABLE]

To prove the missing Eq. (9), we induct on the height of the tree, simultaneously for all downward-closed families $\mathcal{F}$ . For convenience, we write $\operatorname{greedy}(\mathcal{T},f)$ for the value of the above greedy strategy when run on $\mathcal{T}$ with rank function $f$ ; thus, $\operatorname{greedy}(\mathcal{T},f)=\mathbb{E}_{\mathbf{X},\mathbf{X}^{\prime}}[\operatorname{greedy}(\mathbf{X}_{S}\cup\mathbf{X}^{\prime}_{S})]$ . Suppose $e\in V$ is the label of the root of $\mathcal{T}$ . Denote by $I\stackrel{{\scriptstyle\mathrm{def}}}{{=}}X_{e}$ the (random) type of element $e$ when probed by the adaptive strategy (which is also its virtual type in the non-adaptive strategy), and by $R\stackrel{{\scriptstyle\mathrm{def}}}{{=}}X^{\prime}_{e}$ the (random) true type seen by the non-adaptive strategy. Also, let $\mathcal{T}_{I}$ denote the subtree the adaptive strategy moves to when the root $e$ has type $I$ . We have

[TABLE]

where by $(f/R)/I$ we mean the rank function of $\mathcal{F}$ after we first contract $R$ if it is a non-loop, and then contract $I$ if it is still a non-loop. Now subadditivity of $f$ gives

[TABLE]

where the last equality uses linearity of expectation as $I$ and $R$ are identically distributed.

Next, we lower bound the value of our non-adaptive algorithm. Although it follows a random root-leaf path and decides which elements to retain only at the end, we lower bound its value by an online algorithm that greedily selects $R$ (unless it is a loop) and, in addition, always contracts $I$ if it is still a non-loop. This gives

[TABLE]

Since $(f/R)/I$ is also a rank function of a downward-closed system and $\mathcal{T}_{I}$ is an adaptive strategy tree of smaller height, by the induction hypothesis we have

[TABLE]

Combining this with Eq. (10) and Eq. (11), we get

[TABLE]

which proves Eq. (9) by induction. ∎

4.2 Reducing Weighted to Unweighted $k$ -Extendible System by Losing $O(\log k)$

We show how to extend the adaptivity gap result for an unweighted $k$ -extendible system to a weighted $k$ -extendible system by losing an $O(\log k)$ factor.

Theorem 4.4.

For ${\sf SMP}$ over prefix-closed constraints, the adaptivity gap for a weighted rank function of a $k$ -extendible system is at most $32k\log_{2}k$ .

Proof.

Given a weighted rank function $f$ of a $k$ -extendible system $\mathcal{F}\subseteq 2^{T}$ over a set of types $T$ , we define, for each $j\in\mathbb{Z}$ , $f_{j}$ to be the unweighted rank function of the same $k$ -extendible system $\mathcal{F}$ in which only the types with original weights in $(2^{j-1},2^{j}]$ participate, with new weight $1$ , while all other types get new weight $0$ . These weight classes partition the set of types $T$ into pairwise disjoint classes. Notice that we have

[TABLE]

where ${\sf adap}(\mathcal{T},f_{j})$ denotes the expected value of the adaptive strategy given by the common decision tree $\mathcal{T}$ with respect to the rank function $f_{j}$ .
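Assigning types to weight classes is simple arithmetic. The Python sketch below computes the class index $j$ with $w\in(2^{j-1},2^{j}]$ exactly using `math.frexp`, and, on a hypothetical weighted matching instance (a 2-extendible system), brute-force checks the per-set inequality $f(S)\leq\sum_{j}2^{j}f_{j}(S)$ . That inequality holds because an optimal set for $f$ splits into its class- $j$ parts, each of which is feasible and has only types of weight at most $2^{j}$ ; the instance and weights are illustrative assumptions.

```python
import math
from itertools import combinations


def weight_class(w):
    """The class index j with w in (2**(j - 1), 2**j], for w > 0 (exact, via frexp)."""
    m, e = math.frexp(w)  # w = m * 2**e with 0.5 <= m < 1
    return e - 1 if m == 0.5 else e


# Hypothetical weighted k-extendible system: matchings of a 4-cycle plus a chord.
WEIGHT = {(0, 1): 3.0, (1, 2): 5.0, (2, 3): 1.5, (3, 0): 0.7, (0, 2): 4.0}


def is_matching(S):
    seen = set()
    for u, v in S:
        if u in seen or v in seen:
            return False
        seen.update((u, v))
    return True


def weighted_rank(S, w):
    """Max over T ⊆ S with T in F of the total weight w(T) (brute force)."""
    S = list(S)
    return max(sum(w(e) for e in T)
               for r in range(len(S) + 1)
               for T in combinations(S, r) if is_matching(T))


def f(S):
    return weighted_rank(S, lambda e: WEIGHT[e])


def f_j(S, j):
    # Unweighted rank restricted to class j: types of class j get weight 1, all others 0.
    return weighted_rank(S, lambda e: 1.0 if weight_class(WEIGHT[e]) == j else 0.0)


CLASSES = sorted({weight_class(w) for w in WEIGHT.values()})
edges = list(WEIGHT)
decomposition_ok = all(
    f(S) <= sum(2.0 ** j * f_j(S, j) for j in CLASSES) + 1e-9
    for r in range(len(edges) + 1)
    for S in combinations(edges, r)
)
print(CLASSES, decomposition_ok)  # [0, 1, 2, 3] True
```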

Now, since $f_{j}$ is the rank function of an unweighted $k$ -extendible system, Theorem 4.2 implies that a random root-leaf path returns a solution with expected value

[TABLE]

In the following lemma, we show that these non-adaptive solutions for $f_{j}$ can be combined to obtain a feasible and “high-value” non-adaptive solution for $f$ .

Lemma 4.5.

The random-walk non-adaptive algorithm ${\sf alg}$ has expected value

[TABLE]

Before proving Lemma 4.5, we finish the proof of Theorem 4.4 by combining it with Eq. (13) and Eq. (12):

[TABLE]

Informally, in the proof of Lemma 4.5 we combine the unweighted solutions of ${\sf alg}(\mathcal{T},f_{i})$ by running a “greedy-optimal” algorithm from higher-weight to lower-weight classes, keeping the types chosen in earlier classes fixed. Unfortunately, in general such an approach loses an extra factor $k$ in the approximation. To fix this, our second idea is to increase the weight gap between successive classes. We achieve this by combining $O(\log k)$ consecutive classes into a bucket, where in each bucket we focus on the class with the largest non-adaptive value. Because of boundary issues, we only take either the odd or the even buckets.

Proof of Lemma 4.5.

Let $a\leq b\in\mathbb{Z}$ denote the indices of the lowest and the highest weight classes. We define buckets of $2\log k$ consecutive classes, where bucket $B_{i}$ (for $i\geq 1$ ) consists of the classes $\{b-2(i-1)\log k,\,b-2(i-1)\log k-1,\ldots,b-2i\log k+1\}$ . For each $B_{i}$ , let

[TABLE]

Since each bucket has size $2\log k$ , this implies

[TABLE]

Without loss of generality we can assume the odd indices satisfy

[TABLE]

Otherwise, use the same argument for even indices. Combining the last two equations, we get

[TABLE]
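The bucketing step can be sketched in Python as follows. The class values are hypothetical, $k$ is assumed to be a power of 2 so that every bucket has exactly $2\log_{2}k$ classes, and the rule for $j(i)$ (maximize $2^{j}\cdot{\sf alg}(\mathcal{T},f_{j})$ within the bucket) is our assumption, since its defining display is not reproduced here. The sketch checks that the kept classes retain at least a $1/(4\log k)$ fraction of $\sum_{j}2^{j}{\sf alg}(\mathcal{T},f_{j})$ and that kept classes are at least $2\log k+1$ apart, so their weights differ by more than a factor $k^{2}$ .

```python
import math


def choose_classes(values, k):
    """Bucket the weight classes and pick one class j(i) per bucket.

    values: hypothetical dict j -> alg(T, f_j), the non-adaptive value of class j.
    Assumes k >= 2 is a power of 2, so each bucket has exactly 2 log2(k) classes,
    and (an assumption) picks j(i) maximizing the scaled value 2^j * alg(T, f_j).
    """
    width = 2 * int(math.log2(k))
    b = max(values)
    buckets = {}
    for j in values:
        i = (b - j) // width + 1  # B_i = {b - 2(i-1)log k, ..., b - 2i log k + 1}
        buckets.setdefault(i, []).append(j)
    chosen = {i: max(js, key=lambda j: 2.0 ** j * values[j]) for i, js in buckets.items()}

    def score(parity):
        return sum(2.0 ** j * values[j] for i, j in chosen.items() if i % 2 == parity)

    parity = 1 if score(1) >= score(0) else 0  # keep either the odd or the even buckets
    kept = sorted((j for i, j in chosen.items() if i % 2 == parity), reverse=True)
    return kept, score(parity), width


values = {j: 1.0 / (1 + j % 3) for j in range(-3, 9)}  # hypothetical alg(T, f_j) values
kept, kept_score, width = choose_classes(values, k=4)
total = sum(2.0 ** j * v for j, v in values.items())
print(kept, kept_score >= total / (2 * width))  # [8, 0] True
```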

We now claim that a greedy-optimal algorithm has a large value. It goes over the classes $j(i)$ of the (odd) buckets in decreasing order, but in the current class $j(i)$ it always selects a maximum independent set (instead of a maximal greedy set) given its choices in the previous buckets. This algorithm is therefore a combination of the greedy and optimal algorithms.

Claim 4.6.

Consider an algorithm that goes over the odd-numbered buckets in decreasing order of weights and selects a maximum set from class $j(i)$ in bucket $i$ such that the resulting set is still feasible in $\mathcal{F}$ . (Once a set in a class is selected, it stays fixed for all later, lower-weight buckets.) The finally chosen set has value at least

[TABLE]

Proof.

The intuition is that, for a $k$ -extendible system, by Fact 4.1 any selected member can “hurt” at most $k$ members from lower buckets. Since we only consider odd-numbered buckets, two types in different buckets differ in their weights by at least a factor of $2^{2\log k}=k^{2}$ . Thus, losing $k$ types of lower weight should not significantly impact the value.

Let $\ell$ be the random variable denoting the leaf reached by the random walk on the decision tree $\mathcal{T}$ , and let $R$ be the random set of elements seen by the random-walk non-adaptive strategy on this path. Furthermore, let $A_{i}$ denote the set of elements picked by the non-adaptive strategy with respect to $f_{j(i)}$ , let $A^{\prime}_{i}\subseteq A_{i}$ be the set of elements picked by our greedy-optimal non-adaptive strategy from bucket $i$ , and let $A^{\prime}_{<i}$ denote $\bigcup_{i^{\prime}<i~{}:~{}i^{\prime}\text{ is odd}}A^{\prime}_{i^{\prime}}$ . In other words, $A^{\prime}_{<i}$ is the greedy-optimal solution before bucket $i$ , and $A^{\prime}_{i}$ is a maximum subset of $A_{i}$ such that $A^{\prime}_{i}\cup A^{\prime}_{<i}\in\mathcal{F}$ . Note that $A_{i}$ , $A^{\prime}_{i}$ and $A^{\prime}_{<i}$ are random variables depending on $\ell$ and $R$ .

Applying Fact 4.1 to the $k$ -extendible system $\mathcal{F}$ with $\emptyset$ , $A_{i}$ and $A^{\prime}_{<i}$ in the roles of $A$ , $B$ and $E$ (note that $A_{i}\in\mathcal{F}$ and $\emptyset\cup A^{\prime}_{<i}\in\mathcal{F}$ ), there exists a set $Z\subseteq A_{i}$ with $|Z|\leq k\cdot|A^{\prime}_{<i}|$ such that $(A_{i}\setminus Z)\cup A^{\prime}_{<i}\in\mathcal{F}$ . Hence, we have

[TABLE]

Multiplying by $2^{j(i)}$ and summing over all odd $i$ gives

[TABLE]

Now, since every bucket $i$ contains $2\log k$ classes, where two successive class weights differ by a factor of $2$ , we know

[TABLE]

Combining this with Eq. (4.2) gives

[TABLE]

where the last inequality uses

[TABLE]

After rearranging,

[TABLE]

Notice that by definition of a class, each type in class $j(i)$ has weight at least $2^{j(i)-1}$ . Using this fact and taking expectation over $\ell$ and $R$ , we get

[TABLE]

which finishes the proof of Claim 4.6. ∎

Using Claim 4.6, we have

[TABLE]

which, combined with Eq. (14), proves Lemma 4.5. ∎

4.3 Lower Bounds

We present two very similar lower bound examples: one where the adaptivity gap is $k-o(1)$ for a rank function of an unweighted $k$ -extendible system and another where the adaptivity gap is $\Omega(\sqrt{k})$ for a rank function of an intersection of $k$ matroids. A related example was also shown in [GNS17].

Example:

We work in the Bernoulli setting, where each element in $V$ is either active or inactive; since this is a special case of the multi-value setting, the lower bounds apply in general. Consider a perfect $w$ -ary tree of depth $k$ whose edges correspond to the ground set $V$ . Each edge is active with probability $p>0$ . For any leaf $\ell$ , let $P_{\ell}$ denote the unique path from the root to $\ell$ . The objective value on any set is the maximum number of edges in the set on the same root-leaf path, i.e., for any $S\subseteq V$ ,

$$f(S)\stackrel{{\scriptstyle\mathrm{def}}}{{=}}\max_{\ell}\,|S\cap P_{\ell}|.$$

The feasibility constraints are such that a set of edges can be probed if and only if there exists some root-leaf path $P_{\ell}$ such that every probed edge has at least one endpoint on $P_{\ell}.$ Note that this implies that a maximum of $w\cdot k$ edges can be probed.

Analysis:

Let the adaptive strategy be the following: probe all $w$ edges incident to the root. If any of them is active, continue by probing the edges directly below an active edge; otherwise, probe the edges below the first edge. Continue recursively until a leaf is reached. On every level, the adaptive strategy finds an active edge with probability $1-(1-p)^{w}$ . Therefore, the expected value of the adaptive strategy is $k\cdot(1-(1-p)^{w})$ .

For any non-adaptive strategy, the feasibility constraints imply that there exists a root-leaf path $P_{\ell}$ such that all probed edges have an endpoint on it. Suppose all $w\cdot k$ edges incident to $P_{\ell}$ are probed. Any root-leaf path contains at most one probed edge not on $P_{\ell}$ (the edge where it leaves $P_{\ell}$ ), so the non-adaptive strategy gets value at most $1$ from the edges not on $P_{\ell}$ , and in expectation at most $k\cdot p$ from the edges on $P_{\ell}$ . So the non-adaptive strategy has an expected value of at most $1+k\cdot p$ .
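The adaptive value $k\cdot(1-(1-p)^{w})$ can be sanity-checked by simulation. In this Python sketch the parameters $k=3$ , $w=4$ , $p=0.3$ are hypothetical; since every subtree of a given depth looks identical, the simulation only needs to track, level by level, whether some probed edge below the current node is active.

```python
import random


def simulate_adaptive(k, w, p, trials, seed=0):
    """Monte Carlo estimate of the adaptive strategy's value on the perfect w-ary tree of depth k."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        value = 0
        for _level in range(k):
            # Probe the w edges below the current node; each is active independently with prob. p.
            if any(rng.random() < p for _ in range(w)):
                value += 1  # descend below an active edge, which lies on the final root-leaf path
            # otherwise descend below the first (inactive) edge, which adds nothing
        total += value
    return total / trials


k, w, p = 3, 4, 0.3  # hypothetical parameters
estimate = simulate_adaptive(k, w, p, trials=20000)
exact = k * (1 - (1 - p) ** w)  # adaptive value from the analysis
non_adaptive_bound = 1 + k * p  # upper bound on any non-adaptive strategy
print(round(estimate, 3), round(exact, 3), round(non_adaptive_bound, 3))
```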

Lower Bound of $k$ for an unweighted $k$ -extendible system

Consider the example described above and set $w\stackrel{{\scriptstyle\mathrm{def}}}{{=}}k^{4}$ and $p\stackrel{{\scriptstyle\mathrm{def}}}{{=}}\frac{1}{k^{3}}$ . The function $f$ is trivially a rank function of a $k$ -extendible system because the rank of the system is $k$ , i.e., $f(V)=k$ : for any downward-closed family whose sets have size at most $k$ , taking $Z=B\setminus A$ in the definition of $k$ -extendibility works. The adaptive strategy has an expected value

$$k\cdot\Big(1-\big(1-\tfrac{1}{k^{3}}\big)^{k^{4}}\Big)\geq k\cdot\big(1-e^{-k}\big)=k-o(1),$$

whereas any non-adaptive strategy has an expected value at most $1+\frac{1}{k^{2}}.$ This gives an adaptivity gap of $k-o(1)$ .
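Evaluating the two bounds for a few values of $k$ shows how quickly the gap approaches $k$ . A small Python sketch; the list of $k$ values is arbitrary and the function simply evaluates the expressions from the analysis above.

```python
def extendible_gap(k):
    """Ratio of the adaptive value to the non-adaptive upper bound for w = k^4, p = 1/k^3."""
    w, p = k ** 4, 1 / k ** 3
    adaptive = k * (1 - (1 - p) ** w)  # at least k (1 - e^{-k})
    non_adaptive_upper = 1 + k * p     # = 1 + 1/k^2
    return adaptive / non_adaptive_upper


for k in (2, 3, 5, 10, 30):
    print(k, round(extendible_gap(k), 3))  # the gap approaches k as k grows
```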

Lower Bound of $\Omega(\sqrt{k})$ for an unweighted intersection of $k$ matroids

In this section we show how to model the above example as an intersection of $t=k^{2}$ matroids, yielding an adaptivity gap of $\Omega(\sqrt{t})$ for an intersection of $t$ matroids. Consider the example described above and set $w\stackrel{{\scriptstyle\mathrm{def}}}{{=}}k$ and $p\stackrel{{\scriptstyle\mathrm{def}}}{{=}}\frac{1}{k}$ . The adaptive strategy has an expected value of

$$k\cdot\Big(1-\big(1-\tfrac{1}{k}\big)^{k}\Big)\geq k\cdot\big(1-\tfrac{1}{e}\big),$$

and any non-adaptive strategy gets at most $1+k\cdot\frac{1}{k}=2$ in expectation; so the adaptivity gap is $\Omega(k)=\Omega(\sqrt{t})$ .
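The arithmetic behind the $\Omega(\sqrt{t})$ bound, as a short Python sketch: with $w=k$ and $p=1/k$ the adaptive value is $k(1-(1-1/k)^{k})\geq k(1-1/e)$ , so its ratio to the non-adaptive bound $2$ is at least $\frac{1-1/e}{2}\sqrt{t}$ for $t=k^{2}$ matroids. The primes listed are arbitrary examples.

```python
import math


def matroid_gap(k):
    """Adaptive value and its ratio to the non-adaptive bound 2, for w = k and p = 1/k."""
    adaptive = k * (1 - (1 - 1 / k) ** k)  # at least k (1 - 1/e) since (1 - 1/k)^k <= 1/e
    return adaptive, adaptive / 2


for k in (2, 3, 5, 7, 11):  # primes, as required by the matroid construction
    t = k * k  # number of matroids
    adaptive, gap = matroid_gap(k)
    print(k, t, round(gap, 3), round((1 - 1 / math.e) / 2 * math.sqrt(t), 3))
```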

All that remains to show is that $f$ can be represented as the rank function of an intersection of $k^{2}$ simple partition matroids. We use the term simple partition matroid for a matroid that partitions $V$ into parts, where a set is independent if it contains at most one element from every part.

Suppose that $k$ is prime and label each node $v$ with a list $L_{v}$ as follows. The root’s label is the empty list $()$ . Let $L(i)$ denote the $i^{th}$ element of the list $L$ and $L+x$ the list $L$ with $x$ appended to it. All other nodes are labeled recursively: if $v$ is a node with children $\{v_{0},v_{1},\ldots,v_{k-1}\}$ , define $L_{v_{i}}\stackrel{{\scriptstyle\mathrm{def}}}{{=}}L_{v}+i$ . Hence, $u$ is an ancestor of $v$ if and only if $L_{u}$ is a prefix of $L_{v}$ , and if neither of $u,v$ is an ancestor of the other, then $L_{u}(i)\neq L_{v}(i)$ for some $i$ .

Let $e_{v}$ denote the edge/element between $v$ and its parent. We define $k^{2}$ partition matroids $M_{i,j}$ for $i\in\{1,2,\ldots,k\}$ and $j\in\{0,1,\ldots,k-1\}$ . Each $M_{i,j}$ consists of $k$ big partitions indexed from $0$ to $k-1$ , and all other partitions contain only a single element. Let

[TABLE]

For a node $v$ at depth $d_{v}\geq i$ , element $e_{v}$ is in the $I_{v}(i,j)^{th}$ big partition of $M_{i,j}$ . For a node $v$ at depth $d_{v}<i$ , $e_{v}$ is the only element in its partition in $M_{i,j}$ .

We claim that $f$ is the rank function of $\mathcal{F}\stackrel{{\scriptstyle\mathrm{def}}}{{=}}\bigcap_{i=1}^{k}\bigcap_{j=0}^{k-1}M_{i,j}$ , which is an intersection of $k^{2}$ matroids. Since $\mathcal{F}$ is an intersection of simple partition matroids, $S\in\mathcal{F}$ if and only if $\{a,b\}\in\mathcal{F}$ for every $a,b\in S$ . Now consider two nodes $u,v$ such that $\{e_{u},e_{v}\}\not\in\mathcal{F}$ . This means $I_{u}(i,j)=I_{v}(i,j)$ for some $i\leq d_{u},d_{v}$ and $j\in\{0,1,...,k-1\}$ , which is equivalent to

[TABLE]

Since $k$ is prime, this holds for some $i,j$ if and only if $d_{u}=d_{v}$ (for $j=0,i=1$ ) or $L_{u}(i)\neq L_{v}(i)$ for some $i$ . That is, $\{e_{u},e_{v}\}\not\in\mathcal{F}$ if and only if $u$ and $v$ are not ancestors of one another, which completes the proof.

Acknowledgments. We thank Anupam Gupta for useful discussions. The second author was supported in part by NSF awards CCF-1319811, CCF-1536002, and CCF-1617790. The third author was supported in part by CCF-1527110, CCF-1618280 and NSF CAREER award CCF-1750808.

Appendix A Adaptivity Gap Lower Bound of $2$ for Submodular Functions

Proof.

As mentioned, the finite lower bound example is constructed by truncating the infinite example given in Section 3.2. However, this truncation destroys the self-similarity of the graph, so more calculation is required to bound the strategies.

Let $0<\epsilon<1/2$ and let $D$ be the smallest integer such that $(1-\epsilon)^{D}<\epsilon^{2}$ . The ground set is obtained by removing the elements $e_{(k,l)}$ with $k+l>D$ , that is, $V\stackrel{{\scriptstyle\mathrm{def}}}{{=}}\{e_{(k,l)}:k,l\in\mathbb{Z}_{\geq 0},k+l\leq D\}$ , where each element is active with probability $\epsilon$ . The probing constraint and the objective function $f$ are restricted to this set in the natural way: a sequence of elements can be probed if it corresponds to a (finite) path starting at $e_{(0,0)}$ in the given graph, and $f(S)\stackrel{{\scriptstyle\mathrm{def}}}{{=}}\sum_{k\in K(S)}(1-\epsilon)^{k}$ , where $K(S)$ is the (now finite) set of distinct first labels. As before, we call $\{e_{(k,0)},e_{(k,1)},\ldots,e_{(k,D-k)}\}$ the vertices on column $k$ .

We first show that any non-adaptive strategy has expected value at most $1$ . Let ${\sf alg}(k)$ denote the additional expected value of the optimal non-adaptive strategy if the next probed element is $e_{(k,0)}$ . We inductively prove ${\sf alg}(k)<(1-\epsilon)^{k}$ , which suffices for our claim. For the base case $k=D$ , the inequality clearly holds since ${\sf alg}(D)=\epsilon(1-\epsilon)^{D}<(1-\epsilon)^{D}$ . For $0\leq k<D$ , let $i$ be the second label of the last vertex probed on column $k$ .

[TABLE]

This completes the induction and proves that non-adaptive strategies get at most $1$ .

Finally, we show that there exists an adaptive strategy with expected value at least $2-O(\epsilon)$ for sufficiently small $\epsilon>0$ . This completes the proof, since it implies a gap approaching $2$ as $\epsilon\to 0$ . The strategy is restricted in the natural way: first probe $e_{(0,0)}$ and, after probing some $e_{(k,l)}$ , terminate if $k+l=D$ ; otherwise probe $e_{(k+l+1,0)}$ if $e_{(k,l)}$ is active and $e_{(k,l+1)}$ if not. Let ${\sf adap}(k)$ denote the expected value this strategy gets when the next probed element is $e_{(k,0)}$ , for $0\leq k\leq D$ . For convenience, define ${\sf adap}(D+i)\stackrel{{\scriptstyle\mathrm{def}}}{{=}}0$ for all $i\geq 1$ .

We prove by induction that ${\sf adap}(k)>\frac{4-6\epsilon}{2-\epsilon}(1-\epsilon)^{k}-8\epsilon$ , which suffices since then ${\sf adap}(0)>2-O(\epsilon)$ . For $k$ large enough that $\frac{4-6\epsilon}{2-\epsilon}(1-\epsilon)^{k}<8\epsilon$ , the inequality holds trivially because ${\sf adap}(k)\geq 0$ ; this is our base case. Otherwise, $(1-\epsilon)^{k}\geq 8\frac{2-\epsilon}{4-6\epsilon}\epsilon>4\epsilon$ . Let $i$ be the second label of the last vertex probed on column $k$ and let $A$ denote the set of active elements.

[TABLE]

Using the induction hypothesis, we get

[TABLE]

After dropping some positive summands and using $(1-\epsilon)^{D}<\epsilon$ and $(1-\epsilon)^{k}>\epsilon$ , we get

[TABLE]

It is sufficient to prove

[TABLE]

Multiplying by $\frac{(2-\epsilon)^{2}}{(1-\epsilon)^{k}}>0$ , we get an equivalent statement to prove:

[TABLE]

Finally, using $\epsilon^{2}\frac{(2-\epsilon)^{2}}{(1-\epsilon)^{k}}<\epsilon^{2}(2-\epsilon)^{2}\frac{1}{4\epsilon}=\epsilon+O(\epsilon^{2})$ and expanding out, we note that the left-hand side is $8-15\epsilon+O(\epsilon^{2})$ , while the right-hand side is $8-16\epsilon+O(\epsilon^{2})$ . Therefore, the inequality holds for sufficiently small $\epsilon>0$ . This concludes the proof. ∎

Appendix B Proof of the $k$ -Extendible Property for Set Extension

We restate Fact 4.1 and prove it.

Proof.

Enumerate the elements as $E=\{e_{1},\ldots,e_{r}\}$ , where $r\stackrel{{\scriptstyle\mathrm{def}}}{{=}}|E|$ , and denote $E_{i}\stackrel{{\scriptstyle\mathrm{def}}}{{=}}\{e_{1},\ldots,e_{i}\}$ for $0\leq i\leq r$ . Initialize $Z_{0}\stackrel{{\scriptstyle\mathrm{def}}}{{=}}\emptyset$ and consider the following procedure to construct $Z_{1},Z_{2},\ldots,Z_{r}$ satisfying the invariants $A\subseteq B\setminus Z_{i}$ , $(B\setminus Z_{i})\cup E_{i}\in\mathcal{F}$ and $|Z_{i}|\leq k\cdot i$ .

In the $i^{th}$ step, we have $A\cup E_{i-1}\cup\{e_{i}\}\in\mathcal{F}$ by downward-closedness (it is a subset of $A\cup E$ ), and $A\cup E_{i-1}\subseteq(B\setminus Z_{i-1})\cup E_{i-1}\in\mathcal{F}$ by the induction hypothesis. Hence, by $k$ -extendibility we can find $Z^{\prime}\subseteq B\setminus(Z_{i-1}\cup A\cup E_{i-1})$ with $|Z^{\prime}|\leq k$ such that $\big(((B\setminus Z_{i-1})\cup E_{i-1})\setminus Z^{\prime}\big)\cup\{e_{i}\}=\big(B\setminus(Z_{i-1}\cup Z^{\prime})\big)\cup E_{i}\in\mathcal{F}$ . Set $Z_{i}\stackrel{{\scriptstyle\mathrm{def}}}{{=}}Z_{i-1}\cup Z^{\prime}$ and note that $|Z_{i}|\leq|Z_{i-1}|+|Z^{\prime}|\leq(i-1)\cdot k+k=i\cdot k$ . Furthermore, we already deduced that $(B\setminus Z_{i})\cup E_{i}\in\mathcal{F}$ , and finally $A\subseteq B\setminus Z_{i}=(B\setminus Z_{i-1})\setminus Z^{\prime}$ since $Z^{\prime}\cap A=\emptyset$ . All invariants are maintained, so $Z\stackrel{{\scriptstyle\mathrm{def}}}{{=}}Z_{r}$ satisfies the claim. ∎

Bibliography


[Ada11] Marek Adamczyk. Improved analysis of the greedy algorithm for stochastic matching. Inf. Process. Lett., 111(15):731–737, 2011.
[AGM15] Marek Adamczyk, Fabrizio Grandoni, and Joydeep Mukherjee. Improved approximation algorithms for stochastic matching. In Algorithms-ESA 2015, pages 1–12. Springer, 2015.
[AN16] Arash Asadpour and Hamid Nazerzadeh. Maximizing stochastic monotone submodular functions. Management Science, 62(8):2374–2391, 2016.
[ANS08] Arash Asadpour, Hamid Nazerzadeh, and Amin Saberi. Stochastic submodular maximization. In International Workshop on Internet and Network Economics, pages 477–489. Springer, 2008. Full version appears as [AN16].
[AR12] Itai Ashlagi and Alvin E. Roth. New challenges in multihospital kidney exchange. American Economic Review, 102(3):354–59, 2012.
[ASW14] Marek Adamczyk, Maxim Sviridenko, and Justin Ward. Submodular stochastic probing on matroids. In STACS, pages 29–40, 2014.
[BCN+15] Alok Baveja, Amit Chavan, Andrei Nikiforov, Aravind Srinivasan, and Pan Xu. Improved bounds in stochastic matching and optimization. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2015, August 24-26, 2015, Princeton, NJ, USA, pages 124–134, 2015.
[BGK11] Anand Bhalgat, Ashish Goel, and Sanjeev Khanna. Improved approximation results for stochastic knapsack problems. In SODA, pages 1647–1665, 2011.


Definition 2.5 (Random walk non-adaptive strategy).

3.1 Upper Bound of $2$

3.2 Lower Bound of $2$

4 Adaptivity Gaps for a Weighted Rank Function of a $k$ -Extendible

Fact 4.1.

4.1 Upper Bound of $2k$ for an Unweighted $k$ -Extendible System

Theorem 4.2.

Lemma 4.3.

4.2 Reducing Weighted to Unweighted $k$ -Extendible System by Losing

Theorem 4.4.

Lemma 4.5.

Claim 4.6.

Lower Bound of $k$ for an unweighted $k$ -extendible system

Lower Bound of $\Omega(\sqrt{k})$ for an unweighted

Appendix A Adaptivity Gap Lower Bound of $2$ for Submodular Functions

Appendix B Proof of the $k$ -Extendible Property for Set Extension