Minimax Optimal Algorithms for Adversarial Bandit Problem with Multiple Plays
N. Mert Vural, Hakan Gokcesu, Kaan Gokcesu, Suleyman S. Kozat

TL;DR
This paper presents a new minimax optimal algorithm for the adversarial multi-play bandit problem with semi-bandit feedback, achieving improved regret bounds and strong empirical performance without statistical assumptions on the arm gains.
Contribution
Introduces a novel expert advice algorithm for multi-play bandits that achieves minimax optimal regret and improves high-probability bounds.
Findings
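Achieves asymptotically minimax optimal regret bounds.
Improves the best-known high-probability bound for the multi-play setting by O(√m).
Demonstrates significant empirical performance gains over state-of-the-art algorithms on synthetic and real data.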
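The regret guarantees above compare the learner against the best switching m-arm strategy. One standard way to write this comparison down is sketched below. The notation (K arms, comparator sequences V_t, at most S switches) is our own, and the paper's exact definitions may differ. In each round t the learner picks a set U_t of m arms and, under semi-bandit feedback, observes only the gains g_{t,i} for i in U_t.

```latex
% K arms, m < K plays per round, gains g_{t,i} \in [0,1]; U_t is the learner's set, |U_t| = m.
% Comparator class: sequences of m-subsets that switch at most S times.
\mathcal{V}_S = \Big\{ (V_1,\dots,V_T) : V_t \subseteq \{1,\dots,K\},\ |V_t| = m,\ \sum_{t=1}^{T-1} \mathbb{1}\{V_{t+1} \neq V_t\} \le S \Big\}

R_T(S) = \max_{(V_t) \in \mathcal{V}_S} \sum_{t=1}^{T} \sum_{i \in V_t} g_{t,i} \;-\; \sum_{t=1}^{T} \sum_{i \in U_t} g_{t,i}
```

Taking S = 0 recovers the best fixed m-arm subset, and asymptotically matching the comparator corresponds to R_T(S)/T → 0 as T → ∞. Bounds in expectation control E[R_T(S)] over the learner's internal randomization, while high-probability bounds control R_T(S) itself with probability at least 1 - δ.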
Abstract
We investigate the adversarial bandit problem with multiple plays under semi-bandit feedback. We introduce a highly efficient algorithm that asymptotically achieves the performance of the best switching m-arm strategy with minimax optimal regret bounds. To construct our algorithm, we introduce a new expert advice algorithm for the multiple-play setting. By using our expert advice algorithm, we additionally improve the best-known high-probability bound for the multi-play setting by O(√m). Our results are guaranteed to hold in an individual sequence manner since we have no statistical assumption on the bandit arm gains. Through an extensive set of experiments involving synthetic and real data, we demonstrate significant performance gains achieved by the proposed algorithm with respect to the state-of-the-art algorithms.
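To make the multiple-play semi-bandit protocol concrete, the following is a minimal sketch of a generic exponential-weights baseline. It is not the minimax optimal algorithm or the expert advice algorithm introduced in the paper. It assumes gains in [0, 1] and an oblivious adversary given as a fixed gain table. The helpers cap_marginals, systematic_sample and run_multiplay_exp_weights are hypothetical, and the parameters eta (learning rate) and gamma (uniform exploration, must be positive) are illustrative. Each round, the sketch turns the weights into per-arm inclusion probabilities p_i that sum to m and samples exactly m distinct arms with those marginals. It then reads only the played arms' gains and updates with importance-weighted estimates g/p_i, which are unbiased because arm i is played with probability p_i.

```python
import math
import random

# Illustrative exponential-weights baseline for m-of-K plays with semi-bandit
# feedback. Not the algorithm proposed in the paper; all names are hypothetical.


def cap_marginals(q, m):
    """Turn a probability vector q (entries > 0, summing to 1) into inclusion
    probabilities p_i = min(1, c * q_i) that sum to m, by iteratively capping
    arms at 1 and rescaling the rest."""
    k = len(q)
    if m >= k:
        return [1.0] * k  # every arm is played in every round
    capped = set()
    while True:
        free = [i for i in range(k) if i not in capped]
        c = (m - len(capped)) / sum(q[i] for i in free)
        newly = {i for i in free if c * q[i] >= 1.0}
        if not newly:
            return [1.0 if i in capped else c * q[i] for i in range(k)]
        capped |= newly


def systematic_sample(p, rng):
    """Madow-style systematic sampling: when sum(p) is an integer m, returns m
    distinct indices, and index i is included with probability p[i] (p[i] in [0, 1])."""
    u = rng.random()
    chosen, cum = [], 0.0
    for i, pi in enumerate(p):
        lo, cum = cum, cum + pi
        # i is chosen iff some point u + j (j an integer) falls in [lo, cum)
        if math.ceil(cum - u) - math.ceil(lo - u) >= 1:
            chosen.append(i)
    return chosen


def run_multiplay_exp_weights(gains, m, eta, gamma, seed=0):
    """Play m of K arms per round against a fixed gain table gains[t][i] in [0, 1].
    Only the gains of the played arms are read (semi-bandit feedback).
    Returns the learner's total gain."""
    rng = random.Random(seed)
    k = len(gains[0])
    log_w = [0.0] * k
    total = 0.0
    for g_t in gains:
        top = max(log_w)
        w = [math.exp(x - top) for x in log_w]  # shift by max to avoid overflow
        s = sum(w)
        q = [(1 - gamma) * wi / s + gamma / k for wi in w]  # mix in uniform exploration
        p = cap_marginals(q, m)
        played = systematic_sample(p, rng)
        for i in played:
            total += g_t[i]
            # importance-weighted estimate g/p is unbiased since P(i played) = p[i]
            log_w[i] += eta * g_t[i] / p[i]
    return total
```

Systematic sampling is used here only because a single uniform draw yields a set of exactly m arms with the prescribed marginals; dependent rounding would serve the same role. For example, with gains = [[1.0, 1.0, 0.0, 0.0, 0.0]] * 1000 and m = 2, the learner's total gain should move toward the 2000 earned by always playing the first two arms.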
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms
