Improved Regret and Contextual Linear Extension for Pandora's Box and Prophet Inequality
Junyan Liu, Ziyun Chen, Kun Wang, Haipeng Luo, Lillian J. Ratliff

TL;DR
This paper introduces improved algorithms for the Pandora's Box problem and Prophet Inequality in online learning settings, achieving lower regret bounds and extending to contextual linear models with time-varying features.
Contribution
The paper presents a new algorithm that tightens the regret bounds for the Pandora's Box and Prophet Inequality problems, and extends the results to contextual linear reward models.
Findings
Achieves $\tilde{O}(\sqrt{nT})$ regret for the Pandora's Box problem.
Extends the results to contextual linear reward settings with $\tilde{O}(nd\sqrt{T})$ regret.
Applies the same techniques to the online Prophet Inequality problem.
Abstract
We study the Pandora's Box problem in an online learning setting with semi-bandit feedback. In each round, the learner sequentially pays to open up to $n$ boxes with unknown reward distributions, observes rewards upon opening, and decides when to stop. The utility of the learner is the maximum observed reward minus the cumulative cost of opened boxes, and the goal is to minimize regret defined as the gap between the cumulative expected utility and that of the optimal policy. We propose a new algorithm that achieves $\tilde{O}(\sqrt{nT})$ regret after $T$ rounds, which improves the bound of Agarwal et al. [2024] and matches the known lower bound up to logarithmic factors. To better capture real-life applications, we then extend our results to a natural but challenging contextual linear setting, where each box's expected reward is linear in some known but…
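To make the round structure described in the abstract concrete, here is a minimal Python sketch of how a single round's utility could be computed. It is not the paper's algorithm. The function name `round_utility`, the `should_stop` callback, and the convention that opening no box yields a reward of 0 are all assumptions made for illustration.

```python
from typing import Callable, Sequence


def round_utility(
    rewards: Sequence[float],
    costs: Sequence[float],
    order: Sequence[int],
    should_stop: Callable[[float, int], bool],
) -> float:
    """Utility of one round: max observed reward minus total opening cost.

    rewards[i] is the realized reward in box i (seen only if box i is opened),
    costs[i] is the price of opening box i, and `order` is the sequence in
    which the learner considers the boxes. `should_stop(best, i)` is a
    hypothetical stopping rule consulted before paying for box i.
    """
    best = 0.0  # assumption: stopping before opening any box yields reward 0
    paid = 0.0
    for i in order:
        if should_stop(best, i):
            break
        paid += costs[i]              # pay to open box i
        best = max(best, rewards[i])  # semi-bandit feedback: reward is observed
    return best - paid


# Example: stop as soon as the best observed reward reaches 0.75.
example = round_utility(
    rewards=[0.25, 1.0, 0.5],
    costs=[0.125, 0.125, 0.125],
    order=[0, 1, 2],
    should_stop=lambda best, i: best >= 0.75,
)
```

In this example the learner opens boxes 0 and 1, pays 0.25 in total, and stops with a best reward of 1.0, so `example` is 0.75. Regret over many rounds would compare the expectation of this quantity under the learner's policy with that of the optimal policy.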
Taxonomy
Topics: Neural Networks and Applications
