Optimism in the Face of Ambiguity Principle for Multi-Armed Bandits
Mengmeng Li, Daniel Kuhn, Bahar Taşkesen

TL;DR
This paper introduces a novel FTPL algorithm for multi-armed bandits that balances computational efficiency with optimal regret, using ambiguity-aware perturbations and a new arm sampling method.
Contribution
It proposes an ambiguity-based FTPL algorithm that unifies and generalizes existing methods, achieving optimal regret with low computational costs.
Findings
Achieves optimal regret for adversarial and stochastic bandits.
Provides a bisection algorithm for arm sampling that is up to 10,000 times faster than standard methods.
Generalizes existing FTPL and FTRL algorithms, settling conjectures.
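The paper's own bisection algorithm is not reproduced in this summary. To illustrate why bisection suits this kind of computation, the sketch below applies it to a standard FTRL method instead: FTRL with the 1/2-Tsallis-entropy regularizer, as used by Tsallis-INF (Zimmert and Seldin). Here the per-round optimization over the probability simplex reduces to finding a single scalar normalizer. The function name `tsallis_inf_probs`, the tolerance, and the iteration cap are hypothetical choices made for illustration. This is not the authors' method.

```python
import numpy as np

def tsallis_inf_probs(cum_loss_est, eta, tol=1e-12, max_iter=200):
    """Hypothetical helper: 1/2-Tsallis FTRL sampling probabilities via bisection.

    Solves min_p <p, L> - (4/eta) * sum(sqrt(p)) over the simplex (a standard
    FTRL step, not the paper's algorithm). The KKT conditions give
    p_i = 4 / (eta * (L_i - x))**2 for one scalar x < min(L) chosen so that
    the p_i sum to one; the total mass is increasing in x, so bisect on x.
    """
    L = np.asarray(cum_loss_est, dtype=float)
    K = L.size
    lo = L.min() - 2.0 * np.sqrt(K) / eta  # every p_i <= 1/K here, so mass <= 1
    hi = L.min() - 2.0 / eta               # best arm alone has p_i = 1, so mass >= 1
    for _ in range(max_iter):
        x = 0.5 * (lo + hi)
        mass = np.sum(4.0 / (eta * (L - x)) ** 2)
        if mass > 1.0:
            hi = x  # too much mass: the normalizer must move left
        else:
            lo = x  # too little mass: the normalizer must move right
        if hi - lo < tol:
            break
    x = 0.5 * (lo + hi)
    p = 4.0 / (eta * (L - x)) ** 2
    return p / p.sum()  # absorb the small residual bisection error
```

For example, `tsallis_inf_probs([0.0, 1.0, 5.0], eta=1.0)` returns a distribution that favors the arm with the lowest estimated cumulative loss. Each bisection step costs O(K), and the bracket is known in closed form. This is what makes a one-dimensional search cheap compared with calling a generic convex solver every round.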
Abstract
Follow-The-Regularized-Leader (FTRL) algorithms often enjoy optimal regret for adversarial as well as stochastic bandit problems and allow for a streamlined analysis. Nonetheless, FTRL algorithms require the solution of an optimization problem in every iteration and are thus computationally challenging. In contrast, Follow-The-Perturbed-Leader (FTPL) algorithms achieve computational efficiency by perturbing the estimates of the rewards of the arms, but their regret analysis is cumbersome. We propose a new FTPL algorithm that generates optimal policies for both adversarial and stochastic multi-armed bandits. Like FTRL, our algorithm admits a unified regret analysis, and similar to FTPL, it offers low computational costs. Unlike existing FTPL algorithms that rely on independent additive disturbances governed by a \textit{known} distribution, we allow for disturbances governed by an…
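The abstract contrasts FTRL with FTPL without giving pseudocode. Below is a minimal sketch of classical FTPL for adversarial bandits. It is not the paper's ambiguity-based algorithm, whose perturbations come from an ambiguity set rather than one known distribution. We assume i.i.d. Gumbel perturbations because the Gumbel-max trick gives the arm-sampling probabilities in closed form (the softmax of the scaled estimates), which the importance-weighted reward estimate requires. With this choice FTPL coincides with an Exp3-style exponential-weights update. For most other perturbation distributions, these probabilities have no closed form and must be computed numerically or estimated, for example by geometric resampling. The function name and parameters are hypothetical.

```python
import numpy as np

def ftpl_gumbel_bandit(reward_fn, n_arms, horizon, eta, seed=0):
    """Hypothetical sketch: FTPL for adversarial bandits with Gumbel perturbations.

    Each round plays argmax_i (eta * R_hat[i] + G[i]) with G[i] ~ Gumbel(0, 1).
    By the Gumbel-max trick, arm i is chosen with probability
    softmax(eta * R_hat)[i], which the importance-weighted estimate needs.
    """
    rng = np.random.default_rng(seed)
    R_hat = np.zeros(n_arms)  # cumulative importance-weighted reward estimates
    total = 0.0
    for t in range(horizon):
        G = rng.gumbel(size=n_arms)              # fresh perturbation each round
        arm = int(np.argmax(eta * R_hat + G))    # follow the perturbed leader
        z = eta * R_hat
        p = np.exp(z - z.max())
        p /= p.sum()                             # exact sampling probabilities
        r = reward_fn(t, arm)                    # only the chosen arm is observed
        total += r
        R_hat[arm] += r / p[arm]                 # unbiased estimate of the reward vector
    return total, R_hat
```

A quick usage check: with `reward_fn = lambda t, a: 1.0 if a == 0 else 0.0`, three arms, and `eta=0.05`, the perturbed leader quickly locks onto arm 0. Note that computing the action needs only an argmax, which is the computational appeal of FTPL noted in the abstract.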
