Online Learning with Probing for Sequential User-Centric Selection
Tianyi Xu, Yiting Chen, Henger Li, Zheyong Bian, Emiliano Dall'Anese, Zizhan Zheng

TL;DR
This paper introduces the PUCS framework for sequential decision-making with costly probing, providing algorithms with provable guarantees for both offline and online settings, and demonstrating effectiveness on real data.
Contribution
The paper formalizes the PUCS framework, proposes a greedy probing algorithm with a constant-factor approximation guarantee for the offline setting, and develops OLPA, an online learning algorithm with regret bounds, filling a gap in resource-aware sequential decision-making.
Findings
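The greedy probing algorithm achieves a constant-factor approximation in the offline setting with known distributions.
The OLPA algorithm attains near-optimal regret bounds in the online setting with unknown distributions.
Experiments on real data validate the effectiveness of the proposed methods.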
Abstract
We formalize sequential decision-making with information acquisition as the probing-augmented user-centric selection (PUCS) framework, where a learner first probes a subset of arms to obtain side information on resources and rewards, and then assigns plays to arms. PUCS covers applications such as ridesharing, wireless scheduling, and content recommendation, in which both resources and payoffs are initially unknown and probing is costly. For the offline setting with known distributions, we present a greedy probing algorithm with a constant-factor approximation guarantee. For the online setting with unknown distributions, we introduce OLPA, a stochastic combinatorial bandit algorithm that achieves a regret bound. We also prove a lower bound, showing that the upper bound is tight up to logarithmic…
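To give a rough feel for the probe-then-assign idea, the following is a minimal sketch of generic greedy probe-set selection by marginal gain. It is not the paper's algorithm: the names `expected_best_reward` and `greedy_probe_set`, the toy utility (expected best reward among probed arms, each arm paying a fixed value with some probability), and the simple cardinality budget are all assumptions made for illustration, and probing cost is not modeled.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple


def expected_best_reward(
    arms: Sequence[Tuple[float, float]], probed: FrozenSet[int]
) -> float:
    """Toy utility (an assumption, not from the paper).

    Arm i pays value v_i with probability p_i and 0 otherwise, independently.
    Returns E[max reward over the probed arms].
    """
    total, none_hit = 0.0, 1.0
    # Scan arms from highest value down; an arm sets the max only if
    # every higher-valued arm paid nothing.
    for value, prob in sorted((arms[i] for i in probed), reverse=True):
        total += none_hit * prob * value
        none_hit *= 1.0 - prob
    return total


def greedy_probe_set(
    n_arms: int,
    budget: int,
    utility: Callable[[FrozenSet[int]], float],
) -> List[int]:
    """Pick up to `budget` arms, each time adding the arm with the largest
    marginal gain in `utility`. Stops early if no arm adds positive gain."""
    chosen: FrozenSet[int] = frozenset()
    order: List[int] = []
    for _ in range(min(budget, n_arms)):
        base = utility(chosen)
        best_arm, best_gain = None, 0.0
        for i in range(n_arms):
            if i in chosen:
                continue
            gain = utility(chosen | {i}) - base
            if gain > best_gain:
                best_arm, best_gain = i, gain
        if best_arm is None:
            break
        chosen = chosen | {best_arm}
        order.append(best_arm)
    return order
```

In the online setting the distributions behind such a utility would be unknown and would have to be estimated from observed feedback; this sketch only covers the case where they are given.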
