Pandora's Problem with Nonobligatory Inspection
Hedyeh Beyhaghi, Robert Kleinberg

TL;DR
This paper studies a variant of Pandora's search problem where inspection costs are nonobligatory, providing the first approximation guarantees for simple, computationally efficient policies that closely approximate the optimal solution.
Contribution
It introduces a family of committing policies for the nonobligatory inspection problem and proves that the best committing policy, which is easy to compute, attains a constant fraction of the optimal policy's expected utility.
Findings
The optimal committing policy attains at least a 1 − 1/e ≈ 63% fraction of the fully optimal policy's expected utility.
For two boxes, the approximation factor improves to 4/5 (80%).
The 80% approximation is tight for committing policies.
Cornell University. Supported in part by NSF grant CCF-1512964.
Abstract
Martin Weitzman’s “Pandora’s problem” furnishes the mathematical basis for optimal search theory in economics. Nearly 40 years later, Laura Doval introduced a version of the problem in which the searcher is not obligated to pay the cost of inspecting an alternative’s value before selecting it. Unlike the original Pandora’s problem, the version with nonobligatory inspection cannot be solved optimally by any simple ranking-based policy, and it is unknown whether there exists any polynomial-time algorithm to compute the optimal policy. This motivates the study of approximately optimal policies that are simple and computationally efficient. In this work we provide the first non-trivial approximation guarantees for this problem. We introduce a family of “committing policies” such that it is computationally easy to find and implement the optimal committing policy. We prove that the optimal committing policy is guaranteed to approximate the fully optimal policy within a $1 - 1/e$ factor, and for the special case of two boxes we improve this factor to $4/5$ and show that this approximation is tight for the class of committing policies.
1 Introduction
Search theory, which concerns the ways in which costs of obtaining information affect the structure and outcome of optimization procedures, was born in 1961 when the economist George Stigler [11] sought to understand the phenomenon of price dispersion. When sellers charge different prices for identical goods, why do consumers ever choose the higher-priced seller? Stigler realized that this counter-intuitive behavior could be explained by search frictions whereby consumers must expend costly effort to find and/or evaluate sellers.
The insight that optimization has qualitatively different outcomes under search frictions resounded beyond economics, and particularly within computer science. Models of costly information acquisition have been incorporated into information retrieval, robotics, database theory, distributed systems, and of course also into sub-areas of CS such as algorithmic pricing and mechanism design that explicitly relate to economics.
From a mathematical standpoint, the most foundational model of optimal search was articulated by Martin Weitzman [12] under the name Pandora’s problem. The basic elements of the problem are as follows. A searcher is allowed to select a prize from one of $n$ closed boxes. The values of the prizes inside the boxes, $v_1, \dots, v_n$, are independent random variables, sampled from (not necessarily identical) distributions that are known to the searcher. The searcher chooses a sequence of operations, each of which is either opening a box or selecting a box. Opening box $i$ has an associated cost $c_i$ and results in learning the value $v_i$ of the prize contained inside. Selecting box $i$ results in a payoff of $v_i$ and immediately ends the search process; this operation can only be performed after box $i$ has been opened. The searcher’s goal is to design an adaptive policy (i.e., a choice of which operation to perform next, for every possible past history of operations and their outcomes) to maximize the expectation of the prize selected, minus the sum of the inspection costs accrued while opening boxes.
A priori, it would appear that the solution to Pandora’s problem may be horribly complex. An optimal adaptive policy must specify the next operation to be performed given any past history. If each $v_i$ is drawn from a distribution with support size $k$, the number of possible histories grows exponentially with $n$, so adaptive policies in general have exponential description size. It is easy to see that the optimal policy can be implemented in polynomial space, but there is no obvious reason why the complexity of Pandora’s problem should lie anywhere below PSPACE.
Surprisingly, though, the solution to Pandora’s problem is not complex at all. Weitzman proved that the optimal policy has a beautifully simple structure: one computes a reservation value for each box, sorts the boxes in decreasing order of reservation value, and opens them in this order, stopping as soon as the largest prize value found so far exceeds the reservation value of every remaining closed box (or once every box is open), and then selecting the box that contains this prize. This entire process can be implemented to run in polynomial time.
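The ranking-and-stopping rule can be sketched in a few lines of Python. This is a minimal sketch, not the authors' code: it assumes the reservation values have already been computed and are passed in as `sigmas`, and `weitzman_run` is a hypothetical name. It simulates a single realization of the prizes in the original model, where a box must be opened before it is selected.

```python
from typing import List, Tuple


def weitzman_run(values: List[float], costs: List[float],
                 sigmas: List[float]) -> Tuple[int, float]:
    """Run Weitzman's index policy on one realization of the prizes.

    values[i]: prize hidden in box i (revealed only when box i is opened)
    costs[i]:  cost of opening box i
    sigmas[i]: reservation value of box i, assumed to be precomputed
    Returns (index of the selected box, realized utility).
    """
    # Open boxes in decreasing order of reservation value.
    order = sorted(range(len(values)), key=lambda i: sigmas[i], reverse=True)
    best_box, best_value, paid = -1, float("-inf"), 0.0
    for i in order:
        # Stop once the best prize found beats the largest remaining index.
        if best_box >= 0 and best_value > sigmas[i]:
            break
        paid += costs[i]                  # open box i and pay its cost
        if values[i] > best_value:
            best_box, best_value = i, values[i]
    # Select the best opened box (inspection is obligatory in this version).
    return best_box, best_value - paid


# Hypothetical realization: box 0 has the higher index and is opened first.
print(weitzman_run([9.0, 10.0], [1.0, 1.0], [8.0, 6.0]))   # (0, 8.0)
```

In the example, the prize 9 found in box 0 already exceeds the remaining index 6, so box 1 is never opened even though it holds the larger prize.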
A key assumption in Pandora’s problem is that the searcher must open a box, and suffer the attendant cost, before selecting it. This assumption limits the applicability of the model. Given that the value of the prize inside a box is drawn from a distribution known in advance, in many cases it may be more advantageous to select a box without paying to inspect its contents. For example, when using Pandora’s problem to model a firm searching for an employee to hire, boxes represent job candidates. The cost of opening a box represents the cost to the firm of undertaking a process, such as an interview or internship, to assess the value of hiring a candidate. If the evidence of a candidate’s promise is sufficiently strong a priori, it may be realistic to assume that the firm is willing to hire him or her directly, skipping the costly evaluation process. This motivates a version of Pandora’s problem in which a box may be selected without opening it, if the searcher so desires.
Given that Weitzman’s original model dates from 1979 and has been cited almost 900 times, it is quite surprising that this close variant never appeared in the literature until a 2018 paper by Laura Doval [3]. The relative unpopularity of the variant with nonobligatory inspection can probably be attributed to the apparent complexity and lack of structure in its optimal solution. For example, Doval presents an example of a problem instance (Problem 3 in [3]) with three boxes — A, B, and C — such that the optimal policy first opens box C, but the question of whether it subsequently opens box A before B or vice-versa depends on the value of the prize discovered inside box C. As before, one can easily show that this variant of Pandora’s problem belongs to PSPACE, but unlike Weitzman’s version of Pandora’s problem, there is no evidence that this version is easier than PSPACE-complete.
These considerations motivate the study of approximately optimal policies that are computationally efficient, structurally simple, or both. Our work initiates this study.
1.1 Our results and techniques
To put our results in context, we begin this section with an easy observation showing that simple, computationally efficient policies can attain at least a $\frac{1}{2}$-approximation to the optimal policy. Consider the following two policies.
1. [Policy A] Run Weitzman’s optimal policy, ignoring the fact that the searcher has the option to select boxes without opening them.
2. [Policy B] Leave every box closed, and select the one with the highest expected value.
Among all adaptive policies, Policy A is the one that maximizes the expected net contribution (i.e., the value if selected, minus the inspection cost) of open boxes, whereas Policy B maximizes the expected net contribution of closed boxes. Hence the combined value of Policies A and B bounds from above the combined value that the optimal policy obtains from both open and closed boxes. The better of A and B must consequently attain at least half the value of the optimal policy.
For any specified $\epsilon > 0$, it is not hard to construct a problem instance such that neither Policy A nor Policy B attains more than a $\frac{1}{2} + \epsilon$ fraction of the value of the optimal policy. To achieve a better approximation factor, we focus on a broader class of simple policies that includes both of the aforementioned ones.
Let us define a committing policy to be one that, before it opens any boxes, must pre-commit to a partition of the boxes into a set of boxes that will never be opened and a set that will never be selected without first being opened; in addition it pre-commits to an order in which the boxes in the latter set will be opened. Such a policy is almost non-adaptive; the only way in which it may adjust its behavior in response to information revealed during the search process is that it may terminate the search early. In this sense, questions about the ability of committing policies to approximate the optimal adaptive policy are akin to questions about adaptivity gaps in stochastic optimization [1, 2, 6, 7].
The foregoing discussion inspires two interrelated questions.
Question 1
For which values of $\alpha$ is there a polynomial-time algorithm that $\alpha$-approximates the optimal adaptive policy?
Question 2
What is the worst-case ratio between the value of the optimal committing policy and that of the optimal adaptive policy?
We show that for the general case of Pandora’s problem with nonobligatory inspection, there is a polynomial-time algorithm to identify the optimal committing policy, and this policy always attains at least a $1 - 1/e$ fraction of the optimal policy’s value. This furnishes a non-trivial lower bound on the answers to Questions 1 and 2 above. Our second main result fully settles Question 2 for the case of two boxes: we show that the optimal committing policy is always at least a $4/5$-approximation to the optimal adaptive policy, and that this approximation factor is tight. The main question left open by our work is whether the factor of $1 - 1/e$ for the general case of $n$ boxes can be improved. We conjecture that the answer is yes. In fact, we believe it is plausible that the ratio between the values of the optimal committing policy and the optimal adaptive policy is never less than $4/5$, even when the number of boxes is greater than 2.
In the remainder of this section, we briefly discuss the techniques used to achieve these results. Our proof that committing policies attain a $(1 - 1/e)$-approximation to the value of the optimal adaptive policy starts with a crucial observation: Pandora’s problem with nonobligatory inspection can be recast as an equivalent problem in which inspection is obligatory, and the boxes are grouped into pairs, each consisting of one of the boxes from the original (nonobligatory) problem instance paired with a “doppelganger” whose inspection cost is zero and whose value is deterministically equal to the expected value of the first box. To make the problem with paired boxes equivalent to the original problem instance, we must impose an additional constraint that search policies for the paired-box problem may open at most one of the two boxes in each pair. This reduction, which appears simple and natural in hindsight, is crucial because it enables the application of two powerful tools. The first is a lemma of Kleinberg, Waggoner, and Weyl [8] that reduces the analysis of policies for Pandora’s problem and generalizations to the analysis of algorithms for the same optimization problem when the values of items are revealed for free, but are sampled from modified distributions. In Section 2.2 we generalize the lemma to account for policies that may select a box without opening it, a generalization which is vital for our application. The second tool is a theorem of Asadpour and Nazerzadeh [1] about the adaptivity gap of stochastic submodular function maximization problems. Once Pandora’s problem with nonobligatory inspection has been transformed into a form where these two ingredients apply, the derivation of the $(1 - 1/e)$-approximation result becomes nearly automatic.
The combination of the two ingredients — the Kleinberg-Waggoner-Weyl amortization lemma from [8] together with adaptivity gaps for stochastic probing — was pioneered by Singla [10] to solve a problem he refers to as constrained utility maximization in the price of information model, which generalizes our paired-box problem with probing constraints.
To prove that the optimal committing policy can be identified in polynomial time we combine three easy observations.
1. Of the $2^n$ ways of partitioning the boxes into those that always remain closed and those that are never selected while closed, we need only consider the partitions in which there is at most one box of the former type.
2. For a fixed partition of boxes into two sets as above, the optimal committing policy constrained to use this partition can easily be determined by applying Weitzman’s theorem.
3. The value of this constrained optimal policy can be calculated in polynomial time.
Finally, to show that the gap between committing policies and fully adaptive policies is $4/5$ in the special case of two boxes, we express the value of the optimal policy as a convex combination of two quantities: its expected value conditional on selecting a closed box, and its expected value conditional on selecting an open box. We then design a probability distribution over committing policies whose expected value can be bounded below by a weighted sum of the same two quantities. Minimizing the ratio of these two weighted sums boils down to a question about minimizing a specific bivariate function, which can be solved by direct calculation.
1.2 Related Work
We have already discussed the foundational work on optimal search theory in economics, particularly Weitzman’s paper [12] that introduced Pandora’s problem and derived its solution. The optimality of Weitzman’s procedure turns out to be a special case of the Gittins Index Theorem [4, 5], which, ironically, was proven earlier, although Weitzman obtained his results independently and the connection between these two theorems was only realized afterward.
Doval [3] was the first to address Pandora’s problem with nonobligatory inspection, though special cases were anticipated in earlier unpublished work by Postl [9]. In addition to examples illustrating that optimal policies in general need to be adaptive (as described above), Doval’s main results identify sufficient conditions for the optimal policy to have a simple structure. In particular, Theorem 1 in [3] identifies a sufficient condition under which the optimal policy is a committing policy. The sufficient condition is quite technical, but one corollary is that a committing policy is optimal whenever boxes have equal inspection costs and are totally ordered by the “mean-preserving spread” relation. Doval also provides a complete solution for the case when the boxes have equal costs, the value of each is sampled from a distribution with two-point support, and the lower support point is the same for all boxes.
The blending of Pandora’s problem with ideas from combinatorial optimization and algorithmic game theory was initiated by Kleinberg, Waggoner, and Weyl [8]. Their paper introduced a novel method for analyzing optimal and approximately-optimal policies for Pandora’s problem and generalizations, by relating the expected utility of the policy to expected values of related quantities in a simpler environment without inspection costs. The paper primarily applies this method to analyze the price of anarchy of a descending price auction when bidders face a cost to inspect their own value, but it also analyzes various extensions including one in which inspection is optional; the price of anarchy of the descending auction in this setting is shown to be no worse than . Singla [10] applied the analysis technique introduced in [8] to a much broader family of combinatorial optimization problems, providing a general transformation to convert frugal algorithms (a type of greedy algorithm) for combinatorial optimization problems into policies for solving combinatorial counterparts to Pandora’s problem, i.e., generalizations in which the searcher still must pay a cost to open each box, but may be allowed to select multiple boxes, subject to feasibility constraints on the set of selected boxes. As noted earlier, among the problems solved in [10] is a constrained utility maximization problem featuring probing constraints that generalize the probing constraint in our paired-box problem.
Adaptivity gaps have been studied for various stochastic optimization problems. Any such problem consists of a set of elements whose values are independent random variables. The algorithm knows the distributions of these variables, but not the actual realizations. The only way to learn the actual realizations is to probe these elements. If the value of the optimal adaptive probing policy can always be approximated, to within a factor of $\alpha$, by the value of a simple policy that performs probes in a fixed, predetermined order until a stopping time is reached, then we say the problem has an adaptivity gap of $\alpha$. In one of the earliest papers on adaptivity gaps in stochastic optimization, Dean, Goemans, and Vondrak [2] studied a stochastic variant of the knapsack problem, where items have deterministic values but their sizes are independent random variables and the act of placing an item in the knapsack reveals its size. They showed that the adaptivity gap is constant and provided constant-factor non-adaptive approximations.
The proof of our main result makes use of adaptivity gaps for stochastic submodular optimization with constraints on probing. Asadpour and Nazerzadeh [1] show that, for maximizing stochastic monotone submodular functions when the set of probed elements must satisfy a matroid feasibility constraint, the best non-adaptive policy attains at least a $1 - 1/e$ fraction of the value of the optimal adaptive policy. Adaptivity gaps for much more general families of constraints were subsequently proven by Gupta, Nagarajan, and Singla [6, 7]. In addition to feasibility constraints over sets of elements to probe, there may also be constraints on the ordering of the probes. Gupta, Nagarajan, and Singla showed a constant adaptivity gap for submodular functions under arbitrary prefix-closed constraints on the sequence of elements probed [7].
2 Preliminaries
In this section, we formally define our model and discuss two related problems: Pandora’s problem with required inspection and maximizing a stochastic monotone submodular function. Then we introduce a class of search procedures called committing policies and explain why the optimal committing policy has a simple structure and is computationally easy to identify and implement.
2.1 Model
An agent has a set of $n$ boxes. Box $i$, $1 \le i \le n$, contains a prize, $v_i$, distributed according to distribution $F_i$ with expected value $\mathbb{E}[v_i]$. Prizes inside boxes are independently distributed. Box $i$ has inspection cost $c_i$. While $F_i$ and $c_i$ are known, $v_i$ is not.
The agent sequentially inspects boxes, and search is with recall. Given the set $U$ of uninspected boxes and the vector of prizes realized in the boxes inspected so far, the agent decides whether to stop or to continue search; if she decides to continue search, she decides which box in $U$ to inspect next. If she decides to inspect box $i$, she pays cost $c_i$ to instantaneously learn her value $v_i$. If she decides to stop search, she can choose to select whichever box she pleases, regardless of whether it is inspected or not. We use $I_i \in \{0, 1\}$ as an indicator for box $i$ being inspected and $A_i \in \{0, 1\}$ as an indicator for the agent obtaining box $i$. Since at most one box can be obtained, $\sum_{i=1}^{n} A_i \le 1$. The agent is an expected utility maximizer, where utility, $u$, is defined as the value of the box selected minus the sum of inspection costs paid. Given $\mathbf{v} = (v_1, \dots, v_n)$, the vector of realized sampled prizes, and the two vectors of indicator variables, $\mathbf{A}$ and $\mathbf{I}$, respectively indicating which boxes were selected and inspected, we have:
$$u(\mathbf{v}, \mathbf{A}, \mathbf{I}) = \sum_{i=1}^{n} \left( A_i v_i - I_i c_i \right).$$
2.2 Required Inspection
Consider imposing the additional constraint that a box can only be selected after it is inspected. In other words, we require $A_i \le I_i$ for each $i$.
Weitzman [12] finds the optimal procedure to maximize expected utility when inspection is required. The optimal solution is an index-based policy, in which the agent inspects boxes in decreasing order of their indices, $\sigma_i$, where $\sigma_i$ is the unique solution to
$$\mathbb{E}\left[(v_i - \sigma_i)^+\right] = c_i, \qquad \text{where } (x)^+ = \max\{x, 0\},$$
and $\sigma_i$ is also known as the reservation value of box $i$. The search stops either when one of the realized values is above the reservation value of every remaining uninspected box, or when the agent has inspected all of the boxes.
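For a prize with finite support, the defining equation can be solved exactly, because $g(\sigma) = \mathbb{E}[(v - \sigma)^+]$ is continuous, nonincreasing and piecewise linear, with breakpoints at the support points. The sketch below is one way to do this, assuming the distribution is given as parallel lists of support points and (positive) probabilities; `reservation_value` is a hypothetical helper, not something defined in the paper.

```python
from typing import Sequence


def reservation_value(support: Sequence[float], probs: Sequence[float],
                      cost: float) -> float:
    """Solve E[(v - sigma)^+] = cost for a prize v with finite support.

    Assumes every listed support point has positive probability. Scans the
    support from the top, solving the linear equation on each interval
    between consecutive support points.
    """
    if cost < 0:
        raise ValueError("inspection cost must be nonnegative")
    pts = sorted(zip(support, probs), reverse=True)   # largest values first
    mass = weighted = 0.0   # sums of p and of p * x over points above sigma
    for j, (x, p) in enumerate(pts):
        mass += p
        weighted += p * x
        # On [next support point, x]: g(sigma) = weighted - sigma * mass.
        sigma = (weighted - cost) / mass
        lower = pts[j + 1][0] if j + 1 < len(pts) else float("-inf")
        if sigma >= lower:
            return sigma
    raise AssertionError("unreachable: the last interval is unbounded below")


# Hypothetical box: prize 0 or 10 with equal probability, inspection cost 1.
print(reservation_value([0.0, 10.0], [0.5, 0.5], 1.0))   # 8.0
```

The result can be checked against the defining equation: with $\sigma = 8$, $\mathbb{E}[(v - 8)^+] = 0.5 \cdot 2 = 1$. When the cost exceeds what the top interval can absorb, the scan continues downward, and below the smallest support point the equation reduces to $\mathbb{E}[v] - \sigma = c$.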
Kleinberg et al. [8] develop a new interpretation of Weitzman’s characterization. They introduce an important property of policies that we will call “non-exposure”, defined as follows.
Definition 1 (Non-exposed Policy).
A policy is non-exposed if it is guaranteed to select any inspected box $i$ whose value is found to satisfy $v_i > \sigma_i$. In other words, a policy is non-exposed if the event $\{I_i = 1,\ A_i = 0,\ v_i > \sigma_i\}$ has probability zero, for every box $i$.
The key to the analysis of Weitzman’s optimal policy in [8] is a family of random variables $\kappa_i = \min\{v_i, \sigma_i\}$ defined for each box $i$. Kleinberg et al. prove that for any policy that satisfies the required-inspection constraint $A_i \le I_i$, the net contribution of box $i$ to the expected value of the policy, $\mathbb{E}[A_i v_i - I_i c_i]$, is bounded above by $\mathbb{E}[A_i \kappa_i]$, with equality if and only if the policy is non-exposed.
Lemma 2 ([8]).
Given any $i$ and any policy that satisfies $A_i \le I_i$ pointwise,
$$\mathbb{E}\left[A_i v_i - I_i c_i\right] \le \mathbb{E}\left[A_i \kappa_i\right].$$
Furthermore, this holds with equality for every box $i$ if and only if the policy is non-exposed.
Lemma 2 can be interpreted as providing an accounting scheme that amortizes a policy’s expected inspection costs by deducting them from the expected value of the box it eventually selects. This accounting scheme exactly characterizes the value of non-exposed policies, and furnishes an upper bound on the value of every other policy. The benefit of the amortization is that it reduces the problem of analyzing policies for Pandora’s problem to the (generally simpler) problem of analyzing rules for selecting boxes in an environment where the value of box $i$ is $\kappa_i$, and this value can be queried at no cost. A first application of this technique is the following characterization of the optimal policy with required inspection, and its expected utility.
Corollary 3 ([8]).
Weitzman’s policy on boxes with distributions $F_1, \dots, F_n$ and inspection costs $c_1, \dots, c_n$ achieves expected utility $\mathbb{E}\left[\max_i \kappa_i\right]$; the expected utility of any other policy cannot exceed this bound.
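The identity in Corollary 3 can be checked by brute force on a toy instance. The sketch below uses a hypothetical two-box instance whose reservation values were solved by hand, enumerates every joint realization of the prizes, and compares the exact expected utility of Weitzman's policy with $\mathbb{E}[\max_i \kappa_i]$. It is an illustration under these assumptions, not a proof, and all names are illustrative.

```python
from itertools import product

# Hypothetical toy instance: (support, probabilities, inspection cost) per box.
BOXES = [([0.0, 10.0], [0.5, 0.5], 1.0),
         ([2.0, 6.0], [0.5, 0.5], 0.5)]
# Reservation values solved by hand from E[(v - sigma)^+] = c:
# box 0: 0.5 * (10 - 8) = 1;  box 1: 0.5 * (6 - 5) = 0.5.
SIGMAS = [8.0, 5.0]


def expected_excess(support, probs, sigma):
    """E[(v - sigma)^+]; equals the box's cost at its reservation value."""
    return sum(p * max(x - sigma, 0.0) for x, p in zip(support, probs))


def weitzman_utility(values, costs, sigmas):
    """Realized utility of Weitzman's policy (obligatory inspection)."""
    order = sorted(range(len(values)), key=lambda i: sigmas[i], reverse=True)
    best, paid = float("-inf"), 0.0
    for i in order:
        if best > sigmas[i]:   # best open prize beats all remaining indices
            break
        paid += costs[i]
        best = max(best, values[i])
    return best - paid


def exact_comparison(boxes, sigmas):
    """Return (expected utility of Weitzman's policy, E[max_i kappa_i])."""
    costs = [c for _, _, c in boxes]
    policy_value = kappa_value = 0.0
    # Enumerate every joint realization of the independent prizes.
    for outcome in product(*[list(zip(s, p)) for s, p, _ in boxes]):
        prob = 1.0
        for _, p in outcome:
            prob *= p
        values = [x for x, _ in outcome]
        policy_value += prob * weitzman_utility(values, costs, sigmas)
        kappa_value += prob * max(min(v, s) for v, s in zip(values, sigmas))
    return policy_value, kappa_value


print(exact_comparison(BOXES, SIGMAS))   # (5.75, 5.75)
```

Exhaustive enumeration is exponential in the number of boxes, so this style of check is only practical for tiny instances.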
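Since Pandora’s problem with nonobligatory inspection allows policies that may violate the inequality $A_i \le I_i$, in the sequel we will need a generalization of Lemma 2 that pertains to such policies.
Lemma 4.
Given any policy for Pandora’s problem with nonobligatory inspection, and any box $i$, let
$$\tilde{\kappa}_i = \begin{cases} \kappa_i & \text{if } I_i = 1, \\ \mathbb{E}[v_i] & \text{if } I_i = 0. \end{cases}$$
The inequality
$$\mathbb{E}\left[A_i v_i - I_i c_i\right] \le \mathbb{E}\left[A_i \tilde{\kappa}_i\right] \qquad (2)$$
is always satisfied, and the two sides are equal for every box $i$ if and only if the policy is non-exposed.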
Proof.
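First observe that $I_i$ is independent of $v_i$, since the decision whether to inspect box $i$ is made before $v_i$ is revealed; for a deterministic policy, $I_i$ is in fact a function of the values $\mathbf{v}_{-i}$ of the other boxes. Hence
$$\mathbb{E}\left[I_i c_i\right] = \mathbb{E}\left[I_i (v_i - \sigma_i)^+\right] \quad \text{and} \quad \mathbb{E}\left[I_i c_i \mid \mathbf{v}_{-i}\right] = \mathbb{E}\left[I_i (v_i - \sigma_i)^+ \mid \mathbf{v}_{-i}\right].$$
Both of these equations will be used in the sequel.
To prove the inequality asserted in the lemma, we will prove the following inequality of conditional expectations pointwise, then integrate over $\mathbf{v}_{-i}$.
$$\mathbb{E}\left[A_i v_i - I_i c_i \mid \mathbf{v}_{-i}\right] \le \mathbb{E}\left[A_i \tilde{\kappa}_i \mid \mathbf{v}_{-i}\right] \qquad (5)$$
There are two cases to consider. When $I_i = 0$, $A_i$ is conditionally independent of $v_i$ because the contents of box $i$ are never even inspected, so $v_i$ can have no influence on the decision whether to select box $i$ or not. Hence
$$\mathbb{E}\left[A_i v_i - I_i c_i \mid \mathbf{v}_{-i}\right] = \mathbb{E}\left[A_i \mid \mathbf{v}_{-i}\right] \mathbb{E}[v_i] = \mathbb{E}\left[A_i \tilde{\kappa}_i \mid \mathbf{v}_{-i}\right],$$
which establishes that the integrands on the two sides of inequality (5) are equal when $I_i = 0$. When $I_i = 1$ we use the equation $\kappa_i = v_i - (v_i - \sigma_i)^+$ in the following manipulation.
$$\begin{aligned} \mathbb{E}\left[A_i v_i - I_i c_i \mid \mathbf{v}_{-i}\right] &= \mathbb{E}\left[A_i v_i - (v_i - \sigma_i)^+ \mid \mathbf{v}_{-i}\right] \\ &\le \mathbb{E}\left[A_i v_i - A_i (v_i - \sigma_i)^+ \mid \mathbf{v}_{-i}\right] \\ &= \mathbb{E}\left[A_i \kappa_i \mid \mathbf{v}_{-i}\right] = \mathbb{E}\left[A_i \tilde{\kappa}_i \mid \mathbf{v}_{-i}\right]. \end{aligned}$$
Hence inequality (5) also holds when $I_i = 1$.
The final sentence of the lemma asserts a necessary and sufficient condition for equality in (2). To justify this condition, note that every step in the derivation of inequality (2) is an equation except for the inequality
$$(v_i - \sigma_i)^+ \ge A_i (v_i - \sigma_i)^+ \qquad \text{(used when } I_i = 1\text{)}.$$
Hence, strict inequality holds in (2) if and only if there is a positive probability that $I_i = 1$, $A_i = 0$, and $v_i > \sigma_i$. These relations hold simultaneously precisely when the policy violates the definition of non-exposure. ∎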
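Lemma 4 can be illustrated numerically, with $\tilde{\kappa}_i = \kappa_i$ for an inspected box and $\mathbb{E}[v_i]$ for an uninspected one. The sketch below uses a hypothetical two-box instance and two hypothetical policies that both open box 0: one keeps box 0 whenever $v_0 > \sigma_0$ (non-exposed), the other always takes box 1 closed (exposed). Exact enumeration should show equality in (2) for the first policy and a strict inequality for box 0 under the second.

```python
from itertools import product

# Hypothetical two-box instance.
SUPPORTS = [[0.0, 10.0], [2.0, 6.0]]
PROBS = [[0.5, 0.5], [0.5, 0.5]]
COSTS = [1.0, 0.5]
SIGMAS = [8.0, 5.0]            # solve E[(v_i - sigma_i)^+] = c_i for these boxes
MEANS = [sum(x * p for x, p in zip(s, q)) for s, q in zip(SUPPORTS, PROBS)]


def non_exposed_policy(v):
    """Open box 0; keep it if its prize exceeds sigma_0, else take box 1 closed."""
    if v[0] > SIGMAS[0]:
        return [1, 0], [1, 0]          # (A, I): select box 0, inspected box 0
    return [0, 1], [1, 0]              # select box 1 without inspecting it


def exposed_policy(v):
    """Open box 0, then always take box 1 closed, even when v[0] > sigma_0."""
    return [0, 1], [1, 0]


def lemma4_sides(policy, i):
    """Return (E[A_i v_i - I_i c_i], E[A_i * kappa_tilde_i]) by enumeration."""
    lhs = rhs = 0.0
    for outcome in product(*[list(zip(s, q)) for s, q in zip(SUPPORTS, PROBS)]):
        prob = 1.0
        for _, p in outcome:
            prob *= p
        v = [x for x, _ in outcome]
        a, insp = policy(v)
        kappa_tilde = min(v[i], SIGMAS[i]) if insp[i] else MEANS[i]
        lhs += prob * (a[i] * v[i] - insp[i] * COSTS[i])
        rhs += prob * a[i] * kappa_tilde
    return lhs, rhs


print(lemma4_sides(non_exposed_policy, 0))   # (4.0, 4.0): equality
print(lemma4_sides(exposed_policy, 0))       # (-1.0, 0.0): strict inequality
```

For box 1, which both policies select without inspection, the two sides coincide, matching the $I_i = 0$ case of the proof.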
2.3 Stochastic Submodular Maximization
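Consider the problem of maximizing a stochastic monotone submodular function $f$ with respect to a matroid constraint $\mathcal{M}$. Suppose $f$ is a function of $m$ random variables, namely, $f(X_1, \dots, X_m)$. Assume $f$ is submodular, meaning
$$f(\mathbf{x} \vee \mathbf{y}) + f(\mathbf{x} \wedge \mathbf{y}) \le f(\mathbf{x}) + f(\mathbf{y}) \qquad \text{for all } \mathbf{x}, \mathbf{y},$$
where $\mathbf{x} \wedge \mathbf{y}$ and $\mathbf{x} \vee \mathbf{y}$ respectively denote the coordinate-wise minimum and maximum of vectors $\mathbf{x}$ and $\mathbf{y}$.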
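The lattice form of submodularity can be spot-checked numerically. The sketch below is a randomized test, not a proof, with a hypothetical helper name. The coordinate-wise maximum always passes, since $\max(\mathbf{x} \vee \mathbf{y}) = \max\{\max \mathbf{x}, \max \mathbf{y}\}$ and $\max(\mathbf{x} \wedge \mathbf{y}) \le \min\{\max \mathbf{x}, \max \mathbf{y}\}$; the coordinate-wise minimum fails.

```python
import random


def lattice_submodular_on(f, x, y, tol=1e-12):
    """Check f(x v y) + f(x ^ y) <= f(x) + f(y) for one pair of vectors."""
    join = [max(a, b) for a, b in zip(x, y)]   # coordinate-wise maximum
    meet = [min(a, b) for a, b in zip(x, y)]   # coordinate-wise minimum
    return f(join) + f(meet) <= f(x) + f(y) + tol


rng = random.Random(0)   # fixed seed so the spot check is reproducible
pairs = [([rng.uniform(0, 1) for _ in range(4)],
          [rng.uniform(0, 1) for _ in range(4)]) for _ in range(1000)]

# max satisfies the inequality on every sampled pair ...
print(all(lattice_submodular_on(max, x, y) for x, y in pairs))   # True
# ... while min fails it, e.g. on x = (1, 0), y = (0, 1): 1 + 0 > 0 + 0.
print(lattice_submodular_on(min, [1.0, 0.0], [0.0, 1.0]))        # False
```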
A policy $\pi$ picks the elements to inspect one by one (perhaps based on the realized values of the previously inspected elements) until it stops. Once $\pi$ stops, the current state is a random vector $\mathbf{X}^\pi = (X^\pi_1, \dots, X^\pi_m)$, where $X^\pi_j$ denotes the realization of $X_j$ if $X_j$ is inspected by the policy, and is equal to $0$ otherwise. The objective of stochastic submodular maximization is to optimize the expected value of a policy, i.e., $\mathbb{E}[f(\mathbf{X}^\pi)]$, subject to feasibility. The feasibility constraint is modeled using a matroid. For a given matroid $\mathcal{M}$ defined on the ground set $\{X_1, \dots, X_m\}$ of random variables, a policy is called feasible if the subset of random variables it inspects is always an independent set of $\mathcal{M}$.
Asadpour and Nazerzadeh [1] compare the performance of the best adaptive and non-adaptive policies. In adaptive policies, at each point in time all the information regarding the previous inspections of the policy is known. In other words, the policy has access to the actual realized value of all the elements it has inspected so far. In contrast, non-adaptive policies do not have access to such information and should make their decisions (about which random variables to inspect) before observing the outcome of any of them. They show that there exists a non-adaptive policy that achieves at least a $1 - 1/e$ fraction of the value of the optimal adaptive policy.
Lemma 5 ([1]).
There exists a non-adaptive policy that achieves a $1 - 1/e$ fraction of the value of the optimal policy in maximizing a stochastic monotone submodular function with respect to matroid feasibility.
We now use the multilinear relaxation of $f$ to define the value of fractional non-adaptive policies [1]. A fractional non-adaptive policy is determined by a vector $\mathbf{y} \in [0, 1]^m$. This policy inspects elements in the (random) set $S(\mathbf{y})$, a set that is defined to include each $X_j$ with probability $y_j$, independently for each $j$.
We use $\bar{f}(\mathbf{y})$ to denote the expected value obtained by the fractional non-adaptive policy associated with $\mathbf{y}$. Using the notation $\mathbf{X}^S$ to denote the random vector $\mathbf{X}^\pi$ when $\pi$ is the non-adaptive policy associated with set $S$, we have
$$\bar{f}(\mathbf{y}) = \sum_{S \subseteq \{X_1, \dots, X_m\}} \mathbb{E}\left[f(\mathbf{X}^S)\right] \prod_{X_j \in S} y_j \prod_{X_j \notin S} (1 - y_j).$$
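The multilinear extension can be evaluated directly from its definition for tiny examples. The sketch below enumerates all $2^m$ probe sets, so it is exponential in $m$. As assumptions for illustration, it takes $f$ to be the coordinate-wise maximum and the $X_j$ to be independent, nonnegative and discrete, given as `(value, probability)` lists; the function names are hypothetical.

```python
from itertools import product


def expected_f_of_probed(dists, probed):
    """E[f(X^S)] for f = max and independent discrete X_j.

    dists[j] lists (value, probability) pairs for X_j; unprobed coordinates
    are 0, so with nonnegative values they never change the maximum.
    """
    total = 0.0
    for outcome in product(*[dists[j] for j in probed]):
        prob, best = 1.0, 0.0
        for value, p in outcome:
            prob *= p
            best = max(best, value)
        total += prob * best
    return total


def multilinear(dists, y):
    """sum_S E[f(X^S)] * prod_{j in S} y_j * prod_{j not in S} (1 - y_j)."""
    m = len(dists)
    value = 0.0
    for mask in product([0, 1], repeat=m):       # all 2^m subsets S
        weight = 1.0
        for j in range(m):
            weight *= y[j] if mask[j] else 1.0 - y[j]
        probed = [j for j in range(m) if mask[j]]
        value += weight * expected_f_of_probed(dists, probed)
    return value


# Hypothetical variables: X_0 is 0 or 4 with equal probability, X_1 is always 2.
DISTS = [[(0.0, 0.5), (4.0, 0.5)], [(2.0, 1.0)]]
print(multilinear(DISTS, [0.5, 0.5]))   # 1.75
print(multilinear(DISTS, [1.0, 0.0]))   # 2.0 (deterministically probe X_0 only)
```

At an integral $\mathbf{y}$ the extension reduces to the value of the corresponding deterministic non-adaptive policy, as the second call shows.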
Lemma 6 ([1]).
For any monotone submodular function $f$ with matroid feasibility constraint $\mathcal{M}$, for any $\mathbf{y}$ in the base polytope of $\mathcal{M}$, there exists an integral (deterministic) non-adaptive policy with expected value greater than or equal to $\bar{f}(\mathbf{y})$.
2.4 Committing Policies with Nonobligatory Inspection
Consider the problem of maximizing expected utility for the $n$-box problem with nonobligatory inspection (as discussed in Section 2.1). A class of policies that will be central to our analysis are the committing policies, which were discussed in Section 1 and are defined formally as follows.
Definition 7 (Committing Policy).
A policy is called committing if there exists a partition of the boxes into two sets, $R$ and $\bar{R} = \{1, \dots, n\} \setminus R$, and a total ordering of the elements of $\bar{R}$, denoted by $\prec$, such that the following properties hold.
1. The policy never inspects a box in $R$: $I_i = 0$ for all $i \in R$.
2. The policy never selects a box in $\bar{R}$ before inspecting it: $A_i \le I_i$ for all $i \in \bar{R}$.
3. If $i, j \in \bar{R}$ and $i \prec j$, then the policy never inspects $j$ before it has inspected $i$.
The set $R$ is called the reservation set of the committing policy.
Among committing policies with a fixed reservation set, $R$, it is easy to identify the one that maximizes expected utility.
Definition 8.
Policy $\pi_R$ simulates running Weitzman’s optimal policy on a modified set of boxes, in which the boxes in $\bar{R}$ are unchanged, but each box $i$ in $R$ is modified so that its inspection cost is zero, and its value distribution is a point mass on $\mathbb{E}[v_i]$. When the policy in the simulation inspects or selects a box in $\bar{R}$, policy $\pi_R$ performs the same operation. When it inspects a box in $R$, policy $\pi_R$ instead selects the same box without inspecting it.
The proof of the following lemma is easy, and we defer it to Appendix A, along with the (also easy) proofs of the remaining two lemmas in this section.
Lemma 9.
For every $R \subseteq \{1, \dots, n\}$, policy $\pi_R$ attains the highest expected utility among all committing policies with reservation set $R$.
According to Lemma 9, the optimal committing policy must be one of the elements in the set $\{\pi_R : R \subseteq \{1, \dots, n\}\}$. In fact, it is easy to see that the optimal committing policy must belong to a much smaller set with just $n + 1$ elements. Define $\pi_0$ to be Weitzman’s optimal policy on the given (unmodified) set of boxes; equivalently, $\pi_0 = \pi_\emptyset$. Also, for $1 \le j \le n$, define $\pi_j$ to be the optimal committing policy with reservation set $\{j\}$; equivalently, $\pi_j = \pi_{\{j\}}$.
Lemma 10.
The optimal committing policy always belongs to the set $\{\pi_0, \pi_1, \dots, \pi_n\}$.
Lemma 11.
For any $0 \le j \le n$, the expected utility of policy $\pi_j$ can be computed in time polynomial in $n$ and $k$, where $k$ is the maximum number of support points in any of the distributions $F_1, \dots, F_n$.
Therefore, one can identify the optimal committing policy in polynomial time by evaluating the expected utility of each policy in the set $\{\pi_0, \pi_1, \dots, \pi_n\}$ and selecting the best of these alternatives.
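The evaluation step can be sketched as follows. Combining Definition 8 with Corollary 3, the expected utility of $\pi_R$ equals $\mathbb{E}[\max_i \kappa'_i]$, where $\kappa'_i = \mathbb{E}[v_i]$ for $i \in R$ (a zero-cost box whose prize is a point mass at $\mathbb{E}[v_i]$ has $\kappa$-value $\mathbb{E}[v_i]$) and $\kappa'_i = \kappa_i = \min\{v_i, \sigma_i\}$ otherwise. The maximum of independent discrete variables is then evaluated exactly through products of CDFs. This is a sketch under assumptions (finite supports, reservation values precomputed, hypothetical function names), not necessarily the procedure behind Lemma 11.

```python
from typing import FrozenSet, List, Sequence, Tuple

Dist = List[Tuple[float, float]]   # (value, probability) pairs


def expected_max(dists: Sequence[Dist]) -> float:
    """E[max_i Z_i] for independent finite-support Z_i, using CDF products."""
    points = sorted({x for d in dists for x, _ in d})
    total, prev_cdf = 0.0, 0.0
    for t in points:
        cdf = 1.0
        for d in dists:
            cdf *= sum(p for x, p in d if x <= t)
        total += t * (cdf - prev_cdf)   # P(max = t) = F(t) - F(previous point)
        prev_cdf = cdf
    return total


def committing_value(boxes, sigmas, reserved: FrozenSet[int]) -> float:
    """Expected utility of pi_R, computed as E[max_i kappa'_i]."""
    dists = []
    for i, (support, probs) in enumerate(boxes):
        if i in reserved:   # selected closed: acts like a point mass at E[v_i]
            dists.append([(sum(x * p for x, p in zip(support, probs)), 1.0)])
        else:               # inspected first: contributes kappa_i = min(v_i, sigma_i)
            dists.append([(min(x, sigmas[i]), p) for x, p in zip(support, probs)])
    return expected_max(dists)


def best_committing_policy(boxes, sigmas):
    """Evaluate the n + 1 candidates R = {} and R = {j}; return the best (R, value)."""
    candidates = [frozenset()] + [frozenset({j}) for j in range(len(boxes))]
    return max(((r, committing_value(boxes, sigmas, r)) for r in candidates),
               key=lambda pair: pair[1])


# Hypothetical instance: box 0 is 0 or 10 (cost 1), box 1 is 2 or 6 (cost 0.5),
# with reservation values 8 and 5 solved by hand from E[(v - sigma)^+] = c.
BOXES = [([0.0, 10.0], [0.5, 0.5]), ([2.0, 6.0], [0.5, 0.5])]
SIGMAS = [8.0, 5.0]
print(best_committing_policy(BOXES, SIGMAS))   # (frozenset({1}), 6.0)
```

Here Weitzman's policy alone ($R = \emptyset$) is worth 5.75, while reserving box 1 gives 6.0: open box 0, keep it if it holds 10, and otherwise take box 1 unopened. A brute-force pass over all four reservation sets, e.g. `max(committing_value(BOXES, SIGMAS, frozenset(s)) for s in [(), (0,), (1,), (0, 1)])`, returns the same 6.0, consistent with Lemma 10.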
3 Approximation
In this section we analyze the worst-case ratio between the value of the optimal committing policy and that of the optimal policy.
Theorem 12.
At least one of the policies $\pi_0$ and $\pi_j$, $1 \le j \le n$, achieves at least a $1 - 1/e$ fraction of the optimal utility for the $n$-box problem with nonobligatory inspection.
We establish a correspondence between the $n$-box problem and stochastic submodular optimization. Recall from Section 2.3 that an instance of stochastic submodular optimization is specified by a set of random variables $\mathcal{X}$, a submodular function $f$, and a matroid $\mathcal{M}$ with ground set $\mathcal{X}$. A policy $\pi$ chooses (either adaptively or non-adaptively) a subset $S \subseteq \mathcal{X}$ of random variables whose values it probes, subject to the constraint that $S$ must be an independent set in $\mathcal{M}$. The value obtained when running policy $\pi$ is the random variable $f(\mathbf{X}^S)$, where $\mathbf{X}^S$ denotes the random vector specified by setting, for each $X \in \mathcal{X}$, $X^S = X$ if $X \in S$ and $X^S = 0$ otherwise.
Definition 13 (Associated Stochastic Optimization Problem).
Given an instance of Pandora’s problem with nonobligatory inspection, having $n$ boxes with costs $c_1, \dots, c_n$ and values $v_1, \dots, v_n$, the associated stochastic optimization problem has $2n$ random variables denoted by
$$\mathcal{X} = \{X_1, Y_1, X_2, Y_2, \dots, X_n, Y_n\},$$
submodular objective function
$$f(x_1, y_1, \dots, x_n, y_n) = \max\{x_1, y_1, \dots, x_n, y_n\},$$
and matroid constraint defined by the partition matroid whose independent sets are all the subsets of $\mathcal{X}$ that contain at most one element of each pair $\{X_i, Y_i\}$. The distributions of the random variables are defined as follows: $X_i$ is drawn from the same distribution as $\kappa_i = \min\{v_i, \sigma_i\}$, whereas $Y_i$ is deterministically equal to $\mathbb{E}[v_i]$.
Probing the first element of pair in the associated stochastic optimization problem corresponds to inspecting box in the box problem. Probing the second element of the pair corresponds to selecting box uninspected. This correspondence is formalized by the following pair of policy transformations.
Definition 14.
Let $\mathcal{I}$ denote an instance of Pandora’s problem with nonobligatory inspection, and let $\mathcal{J}$ denote the associated stochastic optimization problem.
If $\pi$ is any (possibly adaptive) policy for Pandora’s problem $\mathcal{I}$, let $P_\pi$ denote the adaptive policy for $\mathcal{J}$ that simulates running $\pi$ in $\mathcal{I}$ and performs the following sequence of probes: whenever $\pi$ inspects box $i$, $P_\pi$ probes $x_i$, and whenever $\pi$ stops and selects any box, $P_\pi$ probes every variable in the set $\{y_i : i \in U\}$, where $U$ denotes the set of boxes in $\mathcal{I}$ that were uninspected at the moment when $\pi$ stopped.
If $P$ is a non-adaptive policy for the stochastic optimization problem $\mathcal{J}$ and $R$ is the set of random variables that $P$ probes, let $T = \{i : y_i \in R\}$ denote the corresponding set of boxes and let $\pi_T$ denote the corresponding committing policy for Pandora’s problem $\mathcal{I}$.
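Both transformations in Definition 14 act on index sets, which the sketch below makes explicit. It uses a hypothetical tagged-tuple encoding of the ground set and records only the final set of probes that $P_\pi$ makes in one run of $\pi$, ignoring their adaptive timing; the function names are hypothetical.

```python
from typing import FrozenSet, Iterable, Set, Tuple

# Hypothetical encoding of the ground set of the associated problem:
# ("x", i) = first element of pair i (inspect box i),
# ("y", i) = second element of pair i (select box i uninspected).
Var = Tuple[str, int]

def reservation_set(R: Iterable[Var]) -> FrozenSet[int]:
    """Non-adaptive probe set R  ->  reservation set T = {i : y_i in R} of pi_T."""
    return frozenset(i for kind, i in R if kind == "y")

def probes_of_run(inspected: Set[int], n: int) -> FrozenSet[Var]:
    """Final probe set of P_pi for one run of pi on n boxes: x_i for every box pi
    inspected, plus y_i for every box still uninspected when pi stopped."""
    xs = {("x", i) for i in inspected}
    ys = {("y", i) for i in range(n) if i not in inspected}
    return frozenset(xs | ys)

# Usage: a run of pi that inspected boxes 0 and 2 out of 4.
R = probes_of_run({0, 2}, 4)
T = reservation_set(R)  # boxes 1 and 3, whose y-elements were probed
```

Every box is either inspected or uninspected when $\pi$ stops, so the probe set of $P_\pi$ contains exactly one element of each pair and is therefore independent in the partition matroid.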
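In the following lemmas, as in the preceding definition, $\mathcal{I}$ denotes an instance of Pandora’s problem with nonobligatory inspection and $\mathcal{J}$ denotes its associated stochastic optimization problem. If $Q$ is a policy for either problem $\mathcal{I}$ or $\mathcal{J}$, we use the notation $\mathrm{EU}(Q)$ to denote the expected utility of running policy $Q$. In the case of Pandora’s problem this means the expected value of the selected box minus the expected total inspection cost. In the case of the associated stochastic optimization problem it means $\mathbb{E}[f(X_R)]$, where $R$ is the set of random variables that $Q$ probes.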
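Lemma 15.
If $P$ is a non-adaptive policy for $\mathcal{J}$ and $\pi_T$ is the corresponding committing policy for $\mathcal{I}$, then
$$\mathrm{EU}(P) \le \mathrm{EU}(\pi_T) \le \max\{\mathrm{EU}(\pi_\emptyset), \mathrm{EU}(\pi_{\{1\}}), \dots, \mathrm{EU}(\pi_{\{n\}})\}.$$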
Proof.
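Since $\pi_T$ is a committing policy, the inequality $\mathrm{EU}(\pi_T) \le \max\{\mathrm{EU}(\pi_\emptyset), \mathrm{EU}(\pi_{\{1\}}), \dots, \mathrm{EU}(\pi_{\{n\}})\}$ follows directly from Lemma 10, so we focus on the inequality $\mathrm{EU}(P) \le \mathrm{EU}(\pi_T)$ for the remainder of the proof.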
Couple the probability spaces of the two optimization problems such that when the prize inside box $i$ is $v_i$, the value of random variable $x_i$ equals $\kappa_i = \min\{v_i, \sigma_i\}$. Such a coupling exists because the random variables $x_1, \dots, x_n$ are mutually independent and $x_i$ has the same marginal distribution as $\kappa_i$ by construction.
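By construction, policy $\pi_T$ is non-exposed. According to Lemma 2, then,
$$\mathrm{EU}(\pi_T) = \mathbb{E}\Big[\max_{1 \le i \le n} w_i\Big], \tag{10}$$
where $w_i = \kappa_i$ if $i \notin T$ and $w_i = \mathbb{E}[v_i]$ if $i \in T$. As for $\mathrm{EU}(P)$, by the definition of $f$ and of $\mathcal{J}$ we have
$$\mathrm{EU}(P) = \mathbb{E}\Big[\max_{1 \le i \le n} z_i\Big], \tag{11}$$
where $z_i = x_i$ if $x_i \in R$, $z_i = y_i$ if $y_i \in R$, and $z_i = 0$ otherwise. Under the coupling, in the former two cases $z_i = w_i$, whereas in the third case $z_i = 0 \le w_i$. Hence $\max_i z_i \le \max_i w_i$ pointwise. Combining this inequality with (10)-(11) and using the fact that the random variables $x_1, \dots, x_n$ are mutually independent, as are $\kappa_1, \dots, \kappa_n$, the inequality $\mathrm{EU}(P) \le \mathrm{EU}(\pi_T)$ follows. ∎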
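The heart of this proof is a pointwise comparison: with $x_i$ identified with $\kappa_i$, set $z_i = \kappa_i$ if $x_i$ is probed, $z_i = \mathbb{E}[v_i]$ if $y_i$ is probed, and $z_i = 0$ otherwise, and set $w_i = \mathbb{E}[v_i]$ for reserved boxes and $w_i = \kappa_i$ for the rest. The randomized check below illustrates $\max_i z_i \le \max_i w_i$, assuming nonnegative capped values; it is an illustration under our reading of the definitions, not a substitute for the proof, and the names are hypothetical.

```python
import random

def max_z(R, kappa, means):
    """max_i z_i: kappa_i if x_i is probed, E[v_i] if y_i is probed, else 0."""
    z = [kappa[i] if ("x", i) in R else (means[i] if ("y", i) in R else 0.0)
         for i in range(len(kappa))]
    return max(z)

def max_w(T, kappa, means):
    """max_i w_i: E[v_i] for reserved boxes i in T, kappa_i otherwise."""
    return max(means[i] if i in T else kappa[i] for i in range(len(kappa)))

def check_pointwise(trials=10_000, n=4, seed=0):
    """Random independent probe sets R; True if max z <= max w in every trial."""
    rng = random.Random(seed)
    for _ in range(trials):
        kappa = [rng.uniform(0, 1) for _ in range(n)]  # assumed nonnegative
        means = [rng.uniform(0, 1) for _ in range(n)]
        R = set()
        for i in range(n):
            choice = rng.choice(["x", "y", None])  # at most one probe per pair
            if choice is not None:
                R.add((choice, i))
        T = {i for kind, i in R if kind == "y"}  # reservation set of pi_T
        if max_z(R, kappa, means) > max_w(T, kappa, means):
            return False
    return True
```

The only slack comes from pairs with no probe: there $z_i = 0$ while $w_i = \kappa_i \ge 0$.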
Lemma 16.
If $\pi$ is any (possibly adaptive) policy for Pandora’s problem $\mathcal{I}$, and $P_\pi$ is the corresponding policy for the associated stochastic optimization problem $\mathcal{J}$, then $\mathrm{EU}(\pi) \le \mathrm{EU}(P_\pi)$.
Proof.
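As in the proof of Lemma 15, couple the probability spaces of the two optimization problems such that the value of the random variable $x_i$ equals $\kappa_i$. By construction of policy $P_\pi$, the set of random variables it probes is $R = \{x_i : i \in A\} \cup \{y_i : i \in U\}$, where $A$ is the set of boxes that $\pi$ inspects and $U = [n] \setminus A$ is the set of boxes that are still uninspected when $\pi$ stops. Hence, if we define $z_i = \kappa_i$ when $i \in A$ and $z_i = \mathbb{E}[v_i]$ when $i \in U$, then we have
$$\mathrm{EU}(P_\pi) = \mathbb{E}\Big[\max_{1 \le i \le n} z_i\Big]. \tag{12}$$
Lemma 4 implies the following upper bound on $\mathrm{EU}(\pi)$:
$$\mathrm{EU}(\pi) \le \mathbb{E}\Big[\max_{1 \le i \le n} z_i\Big].$$
Combining this bound with (12) completes the proof. ∎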
Proof of Theorem 12.
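If $\pi^*$ denotes the optimal policy for an instance $\mathcal{I}$ of Pandora’s problem with nonobligatory inspection, and $\mathcal{J}$ denotes the associated stochastic optimization problem, let $P^*$ denote an optimal non-adaptive policy for $\mathcal{J}$. We have the chain of inequalities
$$\max\{\mathrm{EU}(\pi_\emptyset), \mathrm{EU}(\pi_{\{1\}}), \dots, \mathrm{EU}(\pi_{\{n\}})\} \ge \mathrm{EU}(P^*) \ge \Big(1 - \frac{1}{e}\Big)\,\mathrm{EU}(P_{\pi^*}) \ge \Big(1 - \frac{1}{e}\Big)\,\mathrm{EU}(\pi^*),$$
where the first inequality is Lemma 15, the second is Lemma 5, and the third is Lemma 16. ∎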
4 4/5 Approximation for Two Boxes
In this section we show that for the case of two boxes, $n = 2$, the best of the policies $\pi_\emptyset, \pi_{\{1\}}, \pi_{\{2\}}$ achieves at least $4/5$ of the utility of the optimal policy. We also provide an example showing that this approximation factor is tight.
Theorem 17.
At least one of the policies $\pi_\emptyset, \pi_{\{1\}}, \pi_{\{2\}}$ achieves at least $4/5$ of the utility of the optimal policy for the box problem with nonobligatory inspection in a setting with two boxes. This approximation factor is tight.
The proof supplies an upper bound on the optimal value by characterizing the optimal policy in the two-box case. Using ideas similar to those of Asadpour and Nazerzadeh [1], given the optimal policy we consider a corresponding fractional non-adaptive policy. By comparing the optimal policy with the better of the fractional non-adaptive policy and a policy that leaves all boxes uninspected, we show that $4/5$ of the optimal utility is achievable.
Optimal Policy Characterization and Evaluation
We first characterize the candidate optimal policies in a problem with two boxes. The following lemma summarizes some straightforward observations, so its proof is omitted.
Lemma 18.
The optimal policy in the two-box problem with nonobligatory inspection falls into one of three categories:
1. it always selects an open box;
2. it always selects a closed box;
3. it sometimes selects an open box and sometimes a closed box.
In case 1, the policy is equivalent to $\pi_\emptyset$, with expected utility equal to $\mathbb{E}[\max\{\kappa_1, \kappa_2\}]$. In case 2, the expected utility equals $\max\{\mathbb{E}[v_1], \mathbb{E}[v_2]\}$. Suppose the maximum is attained at index $i$. In this case policy $\pi_{\{i\}}$ achieves at least $\mathbb{E}[v_i]$ and is therefore optimal.
The better of $\pi_\emptyset$ and $\pi_{\{i\}}$ achieves the optimal value in cases 1 and 2. Therefore we only need to show that the approximation holds in case 3, where the optimal policy starts by inspecting a box. Without loss of generality, suppose that the optimal policy starts by inspecting box 1.
Lemma 19.
If the optimal policy starts by inspecting box 1, it selects box 2 without inspecting it only if $v_1$ is less than the threshold $\theta$, where $\theta$ is the solution to $\mathbb{E}[\max\{\theta, \kappa_2\}] = \mathbb{E}[v_2]$.
Proof.
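Consider the realized value $v_1$. The agent has the option to choose between $\mathbb{E}[v_2]$ (the value of selecting box 2 without inspecting it) and $\mathbb{E}[\max\{v_1, \kappa_2\}]$ (the value of emulating Weitzman’s policy on the remaining box). To maximize the expected value, box 2 is selected uninspected only if $\mathbb{E}[v_2] > \mathbb{E}[\max\{v_1, \kappa_2\}]$. Since $\mathbb{E}[\max\{v_1, \kappa_2\}]$ is nondecreasing in $v_1$ and equals $\mathbb{E}[v_2]$ at $v_1 = \theta$, this requires $v_1 < \theta$. ∎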
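Assuming the threshold equation of Lemma 19 reads $\mathbb{E}[\max\{\theta, \kappa_2\}] = \mathbb{E}[v_2]$ with $\kappa_2 = \min\{v_2, \sigma_2\}$ (our reading of the comparison in the proof, since the extracted equation is incomplete), $\theta$ can be found by bisection because the left-hand side is nondecreasing in $\theta$. The sketch below does this for a finitely supported $\kappa_2$; the function names are hypothetical.

```python
def expected_max_with_constant(theta, kappa_dist):
    """E[max(theta, kappa_2)] for a finitely supported kappa_2 given as {value: prob}."""
    return sum(p * max(theta, k) for k, p in kappa_dist.items())

def threshold(kappa_dist, mean_v, iters=200):
    """Bisection for theta with E[max(theta, kappa_2)] = E[v_2].

    The left side is nondecreasing in theta, equals E[kappa_2] <= E[v_2] at
    theta = min kappa_2, and is at least E[v_2] at theta = E[v_2], so the
    root lies in [min kappa_2, E[v_2]]."""
    lo, hi = min(kappa_dist), mean_v
    for _ in range(iters):
        mid = (lo + hi) / 2
        if expected_max_with_constant(mid, kappa_dist) < mean_v:
            lo = mid
        else:
            hi = mid
    return hi
```

For instance, with $\kappa_2$ equal to $5.5$ with probability $0.1$ and $0$ otherwise, and $\mathbb{E}[v_2] = 1$, the left-hand side is $0.9\,\theta + 0.55$ on $[0, 5.5]$, so the sketch returns $\theta \approx 0.5$.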
Let . The optimal policy achieves utility .
Lemma 20.
In the optimal policy that starts by inspecting box 1 and selects box 2 uninspected with probability , the expected utility achieved is . Let be a random variable distributed according to the conditional distribution of given the event . Then the expected utility is
[TABLE]
Lower Bound on the Optimal Non-adaptive Policy
Let NonAdapt be a fractional non-adaptive policy (defined in Equation 8) that inspects each element with the marginal probabilities of its inspection in the optimal policy. For our case, in pair , the first element is inspected with probability and the second element with probability [math]. In pair , the first element is inspected with probability and the second element with probability . Since the probability of inspection of elements of each pair sums to , NonAdapt belongs to the base polytope of the partition matroid.
Consider a modified random variable for the first element of pair with a dominated distribution. Let this random variable be [math] with probability and with probability , where is a random variable distributed according to the conditional distribution of given the event . Due to the independence of random variables in non-adaptive policies and the monotonicity of maximization, this modification results in a fractional non-adaptive policy with a (weakly) lower value. Since value [math] has no effect in maximizing non-negative numbers, we can consider the following modified realizations for our lower bound on the fractional non-adaptive policy: random variables , and are inspected with probabilities , , and respectively.
By Formula (8), for the expected value of NonAdapt we have:
[TABLE]
Using , for the first four terms we have:
[TABLE]
Using the same argument, for the last three terms we have:
[TABLE]
Therefore
[TABLE]
Another valid non-adaptive policy is with value at least .
[TABLE]
Inequalities 15, 16 and Lemma 10 imply:
[TABLE]
Comparing the Optimal Adaptive and Non-Adaptive Policies
We compare the lower bound on the optimal non-adaptive policy with the utility of the optimal policy from Equation 14, , and show that the ratio is at least $4/5$. Note that by Lemma 19, . Let where . We have:
[TABLE]
The formula for the first part is decreasing in for a fixed and achieves its minimum at . The formula for the second part is increasing with fixed and therefore achieves its minimum at . Therefore the maximum ratio occurs at and is equal to:
[TABLE]
Since ,
[TABLE]
This concludes the proof of Theorem 17.
The following is a tight example for Theorem 17.
Example 1.
Consider boxes A and B. Suppose box A has value [math] with probability and value with probability , and its inspection cost is [math]. Box B has value [math] with probability and value with probability ; and its inspection cost is .
The optimal policy starts with inspecting box A, and if the value is 0, selects uninspected box B. If the value of box A is 1, the optimal policy inspects box B and takes the maximum value of the two boxes. The expected utility of this policy is
[TABLE]
which approaches as goes to infinity.
Policies , and each achieve utility : Policy , inspects both boxes and obtains the maximum value. The expected utility in this case is . Policy starts by inspecting box B. If box B has value , it selects it. Otherwise it selects uninspected box A. Therefore it has utility . Policy inspects box A. If the value is 0, it selects uninspected box B. If the value of box A is 1, it is indifferent between selecting box A and uninspected box B. The expected utility in this case is .
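The numerical parameters of Example 1 did not survive extraction. The sketch below checks one parametrization that matches the verbal description; it is our own assumption, not necessarily the authors' choice: box A is 0 or 1 with probability 1/2 each and costs nothing to inspect, while box B is $n$ with probability $1/n$ and 0 otherwise, with inspection cost $(n-1)/(2n)$. Then $\mathbb{E}[v_B] = 1$, which makes policy $\pi_{\{B\}}$ indifferent when box A holds 1, as the example describes.

```python
from fractions import Fraction

def example_utilities(n):
    """Exact utilities (opt, pi_empty, pi_A, pi_B) in a hypothetical parametrization
    of Example 1: A is 0 or 1 w.p. 1/2 each at inspection cost 0; B is n w.p. 1/n
    and 0 otherwise, at inspection cost (n - 1) / (2n)."""
    half, c_b = Fraction(1, 2), Fraction(n - 1, 2 * n)
    p_hi = Fraction(1, n)          # Pr[v_B = n]
    mean_b = p_hi * n              # E[v_B] = 1
    # Optimal policy: inspect A (free); if v_A = 0 take B closed, else inspect B, keep the max.
    opt = half * mean_b + half * (p_hi * n + (1 - p_hi) * 1 - c_b)
    # pi_empty: inspect both boxes and keep the larger value.
    pi_empty = p_hi * n + (1 - p_hi) * half - c_b
    # pi_A (A reserved): inspect B; keep it if v_B = n, otherwise take A closed (mean 1/2).
    pi_a = -c_b + p_hi * n + (1 - p_hi) * half
    # pi_B (B reserved): inspect A; take B closed (worth 1), or A itself when v_A = 1.
    pi_b = half * mean_b + half * 1
    return opt, pi_empty, pi_a, pi_b
```

In this parametrization every committing policy earns exactly 1 while the optimal policy earns $5/4 - 1/(4n)$, so the ratio $1/(5/4 - 1/(4n))$ tends to $4/5$ as $n$ grows.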
Appendix A Omitted Proofs Concerning Committing Policies
In this section we reiterate and prove Lemmas 9, 10 and 11, which concern the structure and computation of optimal committing policies.
Lemma 21 (Lemma 9 restated).
For every $S \subseteq [n]$, policy $\pi_S$ attains the highest expected utility among all committing policies with reservation set $S$.
Proof.
Define a modified set of boxes as in Definition 8. Observe that in this modified problem instance, for any box $i \in S$ we have $\hat{\sigma}_i = \mathbb{E}[v_i]$, since box $i$ has inspection cost $\hat{c}_i = 0$ and deterministic value $\mathbb{E}[v_i]$. Because the value of box $i$ is deterministically equal to $\mathbb{E}[v_i]$, whenever Weitzman’s policy inspects box $i$ it finds that $\hat{v}_i = \hat{\sigma}_i$, and hence it immediately selects box $i$. Thus, every execution path of Weitzman’s policy on the modified set of boxes can be represented by a sequence of operations, each of which is either inspecting a box in $[n] \setminus S$, selecting a box in $[n] \setminus S$, or inspecting-and-immediately-selecting a box in $S$. Policy $\pi_S$ duplicates each of these three types of operations and receives the same cost or expected benefit whenever it performs one of them; hence the expected utility of running Weitzman’s optimal policy on the modified problem instance equals the expected utility of running $\pi_S$ on the original instance.
We must now show that no other committing policy with reservation set $S$ can attain a higher expected utility. This is straightforward, using the fact that Weitzman’s policy is optimal for the modified instance. If $\pi$ is any committing policy with reservation set $S$, there is a corresponding policy $\pi'$ for the modified set of boxes that operates as follows: when $\pi$ inspects or selects a box in $[n] \setminus S$, $\pi'$ performs the same operation. When $\pi$ selects a box in $S$, $\pi'$ inspects and immediately selects that box. (There is no need to define the behavior of $\pi'$ when $\pi$ inspects a box in $S$, since that event never happens.) The expected utility of running $\pi'$ on the modified set of boxes is the same as the expected utility of running $\pi$ on the original set of boxes, since the extra inspection operations that $\pi'$ performs on elements of $S$ have zero cost. Since the utility of running Weitzman’s policy on the modified set of boxes is an upper bound on the utility of running $\pi'$, it follows that the utility of running $\pi_S$ is an upper bound on the utility of running $\pi$, as claimed. ∎
Lemma 22 (Lemma 10 restated).
The optimal committing policy always belongs to the set $\{\pi_\emptyset, \pi_{\{1\}}, \dots, \pi_{\{n\}}\}$.
Proof.
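Suppose $S$ is any set of two or more elements, and consider any two distinct elements $i, j \in S$ with $\mathbb{E}[v_i] \ge \mathbb{E}[v_j]$. A committing policy with reservation set $S$ can never open box $i$ or box $j$, and the operation of selecting closed box $j$ is always dominated by the operation of selecting closed box $i$. Hence, any committing policy with reservation set $S$ is dominated by a committing policy with reservation set $S \setminus \{j\}$. In particular, the optimal such policy, $\pi_{S \setminus \{j\}}$, has at least as much expected utility as $\pi_S$. Repeating this argument until at most one element remains shows that some policy in $\{\pi_\emptyset, \pi_{\{1\}}, \dots, \pi_{\{n\}}\}$ is at least as good as $\pi_S$. ∎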
Lemma 23 (Lemma 11 restated).
For any $S \subseteq [n]$, the expected utility of policy $\pi_S$ can be computed in time polynomial in $n$ and $k$, where $k$ is the maximum number of support points in any of the distributions of $v_1, \dots, v_n$.
Proof.
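Let us start with the case $S = \emptyset$. According to Corollary 3, the expected utility of Weitzman’s optimal policy, $\pi_\emptyset$, is equal to $\mathbb{E}[\max_i \kappa_i]$. Let $G$ denote the cumulative distribution function of $\max_i \kappa_i$, i.e.
$$G(x) = \Pr\Big[\max_{1 \le i \le n} \kappa_i \le x\Big] = \prod_{i=1}^{n} \Pr[\kappa_i \le x].$$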
Then we have the formula
$$\mathbb{E}\Big[\max_{1 \le i \le n} \kappa_i\Big] = \int_0^\infty \big(1 - G(x)\big)\,dx.$$
The integrand on the right side is a step function whose discontinuities all lie in the union of the support sets of the distributions of $\kappa_1, \dots, \kappa_n$, a set of at most $nk$ points. Hence the integral can be computed in time polynomial in $n$ and $k$ by simply summing over the steps.
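Computing the expected utility of policy $\pi_S$ in the general case $S \neq \emptyset$ reduces to the special case $S = \emptyset$, because the expected utility of $\pi_S$ is equal to the expected utility of Weitzman’s policy on a modified set of boxes, as was shown in the proof of Lemma 9. In that modified instance each box $i \in S$ has a single support point, $\mathbb{E}[v_i]$, so the bound on the number of steps still applies. ∎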
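A sketch of the computation in this proof, assuming all capped values $\kappa_i$ are nonnegative so that the integral starts at 0. The input is the list of capped distributions; each interval between consecutive breakpoints contributes $(1 - G)$ times its length, and reserved boxes are replaced by point masses at $\mathbb{E}[v_i]$. The function names are hypothetical. As written, each of the at most $nk$ intervals recomputes $G$ from scratch at cost $O(nk)$, which is enough to show the running time is polynomial; an incremental update of the product would be faster.

```python
from fractions import Fraction
from math import prod

def expected_max_by_integral(dists):
    """E[max_i kappa_i] = integral_0^inf (1 - G(x)) dx for independent, nonnegative,
    finitely supported kappa_i (each given as {value: prob}), where
    G(x) = prod_i Pr[kappa_i <= x]."""
    points = sorted({Fraction(0)} | {Fraction(x) for d in dists for x in d})
    if points[0] < 0:
        raise ValueError("this sketch assumes nonnegative values")
    total = Fraction(0)
    for left, right in zip(points, points[1:]):
        # G can only jump at support points, so it is constant on [left, right).
        g = prod(sum((p for x, p in d.items() if x <= left), Fraction(0)) for d in dists)
        total += (1 - g) * (right - left)
    return total  # beyond the largest breakpoint G = 1, so nothing more is added

def committing_utility(capped_dists, means, reserved):
    """Reduction to S = emptyset: each box i in S becomes a point mass at E[v_i]."""
    dists = [{means[i]: Fraction(1)} if i in reserved else d
             for i, d in enumerate(capped_dists)]
    return expected_max_by_integral(dists)
```

For a single box with $\kappa$ equal to 0 or 1 with probability 1/2 each, the only step is $[0, 1)$ with $1 - G = 1/2$, giving $1/2$.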
- [1] Arash Asadpour and Hamid Nazerzadeh. Maximizing stochastic monotone submodular functions. Management Science, 62(8):2374–2391, 2015.
- [2] Brian C. Dean, Michel X. Goemans, and Jan Vondrák. Approximating the stochastic knapsack problem: The benefit of adaptivity. Mathematics of Operations Research, 33(4):945–964, 2008.
- [3] Laura Doval. Whether or not to open Pandora’s box. Journal of Economic Theory, 175:127–158, 2018.
- [4] J. C. Gittins. Bandit processes and dynamic allocation indices. Journal of the Royal Statistical Society, 41(2):148–177, 1979.
- [5] J. C. Gittins and D. M. Jones. A dynamic allocation index for the sequential design of experiments. pages 241–266, 1974.
- [6] Anupam Gupta, Viswanath Nagarajan, and Sahil Singla. Algorithms and adaptivity gaps for stochastic probing. In Proc. 27th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1731–1747. SIAM, 2016.
- [7] Anupam Gupta, Viswanath Nagarajan, and Sahil Singla. Adaptivity gaps for stochastic probing: Submodular and XOS functions. In Proc. 28th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1688–1702. SIAM, 2017.
- [8] Robert Kleinberg, Bo Waggoner, and E. Glen Weyl. Descending price optimally coordinates search. In Proc. 17th ACM Conference on Economics and Computation (EC), pages 23–24, 2016. arXiv:1603.07682 [cs.GT].
