Irrevocable Multi-Armed Bandit Policies
Vivek Farias, Ritesh Madan

TL;DR
This paper introduces an 'irrevocable' heuristic for multi-armed bandit problems with multiple simultaneous pulls. Arms that have been discarded are never pulled again, which keeps exploration to a minimum while remaining close to optimal performance, notably for bandits whose arms are coins of unknown bias.
Contribution
The paper proposes a novel irrevocable heuristic for multi-armed bandits, providing theoretical bounds and demonstrating practical effectiveness with minimal exploration costs.
Findings
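In computational experiments on large-scale problems, the irrevocable heuristic loses up to 5 to 10% relative to an upper bound on the performance of an optimal policy.
Expected rewards are within a factor of 1/8 of those of the unrestricted optimal policy.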
The heuristic is robust across various problem parameters.
Abstract
This paper considers the multi-armed bandit problem with multiple simultaneous arm pulls. We develop a new 'irrevocable' heuristic for this problem. In particular, we do not allow recourse to arms that were pulled at some point in the past but then discarded. This irrevocable property is highly desirable from a practical perspective. As a consequence of this property, our heuristic entails a minimum amount of 'exploration'. At the same time, we find that the price of irrevocability is limited for a broad useful class of bandits we characterize precisely. This class includes one of the most common applications of the bandit model, namely, bandits whose arms are 'coins' of unknown biases. Computational experiments with a generative family of large scale problems within this class indicate losses of up to 5 to 10% relative to an upper bound on the performance of an optimal policy with no…
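The irrevocability constraint described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' heuristic: it only shows what it means for a policy on coin-type arms to pull several arms at once and never return to an arm it has discarded. The function name `run_irrevocable`, the Beta(1, 1) prior, and the `threshold` and `min_pulls` parameters are all assumptions made for illustration.

```python
import random


def run_irrevocable(biases, k, horizon, threshold=0.4, min_pulls=5, seed=0):
    """Toy irrevocable policy on coins with hidden biases (illustrative only).

    Pulls k arms per step. An arm whose posterior mean (under an assumed
    Beta(1, 1) prior) drops below `threshold` after `min_pulls` pulls is
    discarded for good and replaced by an arm that was never pulled.
    """
    rng = random.Random(seed)
    n = len(biases)
    wins = [0] * n
    pulls = [0] * n
    fresh = list(range(n))  # arms that have never been pulled
    active = [fresh.pop(0) for _ in range(min(k, n))]
    discarded = []          # arms that may never be pulled again
    history = []            # arms pulled at each step
    total = 0

    for _ in range(horizon):
        history.append(list(active))
        for arm in active:
            reward = 1 if rng.random() < biases[arm] else 0
            wins[arm] += reward
            pulls[arm] += 1
            total += reward

        next_active = []
        for arm in active:
            posterior_mean = (wins[arm] + 1) / (pulls[arm] + 2)
            weak = pulls[arm] >= min_pulls and posterior_mean < threshold
            if weak and fresh:
                # Irrevocable step: the arm leaves and cannot come back.
                discarded.append(arm)
                next_active.append(fresh.pop(0))
            else:
                next_active.append(arm)
        active = next_active

    return total, history, discarded


if __name__ == "__main__":
    total, history, discarded = run_irrevocable(
        [0.2, 0.8, 0.5, 0.3, 0.9], k=2, horizon=200
    )
    print("total reward:", total)
    print("discarded arms, in order:", discarded)
```

Because a discarded arm is never re-added to the active set, the steps at which any given arm is pulled always form one contiguous block. That is the property the abstract calls irrevocability, and the discard rule here is a simple stand-in for whatever rule the paper actually uses.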
Taxonomy
Topics: Advanced Bandit Algorithms Research · Data Stream Mining Techniques · Auction Theory and Applications
