A Blackbox Approach to Best of Both Worlds in Bandits and Beyond
Christoph Dann, Chen-Yu Wei, Julian Zimmert

TL;DR
This paper introduces a blackbox reduction that transforms existing online learning algorithms into ones that achieve near-optimal regret in both stochastic and adversarial environments across a range of problem settings.
Contribution
It provides a general reduction from best-of-both-worlds guarantees to a broad class of FTRL and OMD algorithms, enabling new guarantees without specialized tuning.
Findings
Achieves $O(\log T)$ regret in the stochastic regime.
Achieves $\tilde{O}(\sqrt{T})$ regret in the adversarial regime.
Extends to contextual bandits, graph bandits, and MDPs.
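To make the two regimes concrete, the guarantees can be written in terms of regret. The notation here ($\ell_t$ for the loss at round $t$, $a_t$ for the learner's action, $\mathcal{A}$ for the action set, $\mathrm{Reg}_T$ for the regret) is an assumption for illustration and does not come from this summary; problem-dependent factors such as the number of actions or gap parameters are suppressed.

$$
\mathrm{Reg}_T = \max_{a \in \mathcal{A}} \mathbb{E}\left[\sum_{t=1}^{T} \ell_t(a_t) - \sum_{t=1}^{T} \ell_t(a)\right],
\qquad
\mathrm{Reg}_T =
\begin{cases}
O(\log T) & \text{in the stochastic regime},\\
\tilde{O}(\sqrt{T}) & \text{in the adversarial regime}.
\end{cases}
$$

A best-of-both-worlds algorithm attains both bounds without being told in advance which regime it faces.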
Abstract
Best-of-both-worlds algorithms for online learning which achieve near-optimal regret in both the adversarial and the stochastic regimes have received growing attention recently. Existing techniques often require careful adaptation to every new problem setup, including specialised potentials and careful tuning of algorithm parameters. Yet, in domains such as linear bandits, it is still unknown if there exists an algorithm that can simultaneously obtain $O(\log T)$ regret in the stochastic regime and $\tilde{O}(\sqrt{T})$ regret in the adversarial regime. In this work, we resolve this question positively and present a general reduction from best of both worlds to a wide family of follow-the-regularized-leader (FTRL) and online-mirror-descent (OMD) algorithms. We showcase the capability of this reduction by transforming existing algorithms that are only known to achieve worst-case…
Taxonomy
Topics: Advanced Bandit Algorithms Research · Adversarial Robustness in Machine Learning · Machine Learning and Algorithms
