An Improved Parametrization and Analysis of the EXP3++ Algorithm for Stochastic and Adversarial Bandits
Yevgeny Seldin, Gábor Lugosi

TL;DR
This paper introduces an improved version of the EXP3++ algorithm for multi-armed bandits, reducing regret dependence on time horizon in stochastic settings while maintaining adversarial guarantees.
Contribution
It proposes a new gap estimation strategy that enhances the EXP3++ algorithm, improving regret bounds in stochastic regimes without affecting adversarial performance.
Findings
Reduced regret dependence on the time horizon from $(\ln t)^3$ to $(\ln t)^2$ in stochastic settings
Eliminated an additive regret factor related to the minimal gap $\Delta$
Maintained existing regret guarantees in adversarial regimes
Abstract
We present a new strategy for gap estimation in randomized algorithms for multiarmed bandits and combine it with the EXP3++ algorithm of Seldin and Slivkins (2014). In the stochastic regime the strategy reduces dependence of regret on a time horizon from $(\ln t)^3$ to $(\ln t)^2$ and eliminates an additive factor of order $\Delta e^{1/\Delta^2}$, where $\Delta$ is the minimal gap of a problem instance. In the adversarial regime the regret guarantee remains unchanged.
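To make the role of gap estimation concrete, here is a minimal Python sketch of an EXP3++-style sampling rule: a Gibbs distribution over importance-weighted cumulative losses, mixed with per-arm exploration rates that shrink as an arm's estimated gap grows. It is an illustration under stated assumptions, not the paper's exact parametrization. The learning rate and the two exploration caps follow the EXP3++ form of Seldin and Slivkins (2014) as we understand it; the constant `beta`, the shape of `xi`, and the Hoeffding-style `estimate_gaps` helper are hypothetical placeholders chosen for readability.

```python
import math
import random


def estimate_gaps(loss_sum, pulls, t):
    """Hypothetical gap estimator (an assumption, not the paper's):
    lower confidence bound of each arm minus the smallest upper
    confidence bound over all arms, clipped at zero."""
    lcb, ucb = [], []
    for s, n in zip(loss_sum, pulls):
        if n == 0:
            lcb.append(0.0)
            ucb.append(1.0)
        else:
            mean = s / n
            width = math.sqrt(2.0 * math.log(max(t, 2)) / n)
            lcb.append(max(0.0, mean - width))
            ucb.append(min(1.0, mean + width))
    best_ucb = min(ucb)
    return [max(0.0, low - best_ucb) for low in lcb]


def exp3pp_probabilities(cum_loss_est, t, gap_est, beta=4.0):
    """Sampling distribution for round t (1-indexed).
    beta is an illustrative constant, not taken from the paper."""
    K = len(cum_loss_est)
    eta = 0.5 * math.sqrt(math.log(K) / (t * K))  # learning rate
    # Gibbs weights, shifted by the minimum loss for numerical stability.
    m = min(cum_loss_est)
    w = [math.exp(-eta * (L - m)) for L in cum_loss_est]
    total = sum(w)
    rho = [x / total for x in w]
    # Exploration: capped at 1/(2K) and at eta, and reduced further
    # for arms whose estimated gap is large.
    cap = min(1.0 / (2 * K), eta)
    eps = []
    for g in gap_est:
        xi = beta * math.log(t) / (t * g * g) if g > 0 else float("inf")
        eps.append(min(cap, xi))
    mix = 1.0 - sum(eps)  # at least 1/2, since each eps <= 1/(2K)
    return [mix * r + e for r, e in zip(rho, eps)]


def run(means, horizon, seed=0):
    """Simulate Bernoulli losses with the given means; return pull counts."""
    rng = random.Random(seed)
    K = len(means)
    cum_loss_est = [0.0] * K  # importance-weighted cumulative losses
    loss_sum = [0.0] * K      # observed losses, used only for gap estimates
    pulls = [0] * K
    for t in range(1, horizon + 1):
        gaps = estimate_gaps(loss_sum, pulls, t)
        p = exp3pp_probabilities(cum_loss_est, t, gaps)
        a = rng.choices(range(K), weights=p)[0]
        loss = 1.0 if rng.random() < means[a] else 0.0
        cum_loss_est[a] += loss / p[a]
        loss_sum[a] += loss
        pulls[a] += 1
    return pulls
```

Calling `run([0.2, 0.8], 2000)` simulates two arms with mean losses 0.2 and 0.8. The mixing weight never drops below 1/2, so the Gibbs component always dominates exploration. That is one way to read the abstract's claim that the adversarial guarantee is unaffected while exploration adapts to the estimated gaps.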
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Fault Detection and Control Systems
