Follow the Leader If You Can, Hedge If You Must
Steven de Rooij, Tim van Erven, Peter D. Grünwald, Wouter M. Koolen

TL;DR
The paper introduces FlipFlop, an algorithm that keeps its regret within a constant factor of Follow-the-Leader's (FTL), which is constant in the stochastic setting, while retaining the worst-case guarantees of hedging strategies.
Contribution
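The paper presents FlipFlop, the first method that provably combines FTL's low regret on non-adversarial data with the worst-case robustness of hedging strategies. It also presents AdaHedge, which tunes the learning rate in Hedge dynamically without the doubling trick.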
Findings
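FlipFlop achieves regret within a constant factor of the FTL regret, without sacrificing AdaHedge's worst-case guarantees.
AdaHedge refines the learning-rate tuning method of Cesa-Bianchi, Mansour and Stoltz (2007), avoiding the doubling trick and yielding slightly improved worst-case guarantees.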
Both algorithms are invariant under loss rescaling and can handle negative losses.
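The invariance finding can be checked numerically. The following is a minimal sketch, not the authors' reference code: it assumes an AdaHedge-style rule that sets the learning rate to ln(K) divided by the cumulative mixability gap (the accumulated difference between Hedge's expected loss and its mix loss). That update rule is not spelled out in this summary, so treat it, and the helper names `mix` and `adahedge_weights`, as assumptions made for illustration.

```python
import math

def mix(eta, cum_losses):
    """Return Hedge weights and the cumulative mix loss for learning rate eta."""
    best = min(cum_losses)
    if math.isinf(eta):
        # Infinite learning rate: all weight on the current leaders (FTL).
        raw = [1.0 if L == best else 0.0 for L in cum_losses]
    else:
        raw = [math.exp(-eta * (L - best)) for L in cum_losses]
    total = sum(raw)
    weights = [r / total for r in raw]
    if math.isinf(eta):
        mix_loss = best
    else:
        mix_loss = best - math.log(total / len(cum_losses)) / eta
    return weights, mix_loss

def adahedge_weights(losses):
    """Weights played in each round by the assumed AdaHedge-style update."""
    K = len(losses[0])
    cum = [0.0] * K
    gap = 0.0  # cumulative mixability gap (assumed tuning quantity)
    history = []
    for loss in losses:
        eta = math.log(K) / gap if gap > 0 else math.inf
        w, m_prev = mix(eta, cum)
        history.append(w)
        hedge_loss = sum(wi * li for wi, li in zip(w, loss))
        cum = [c + l for c, l in zip(cum, loss)]
        _, m_now = mix(eta, cum)
        gap += max(0.0, hedge_loss - (m_now - m_prev))
    return history

# Hypothetical losses for three experts over four rounds.
losses = [(0.2, 0.9, 0.5), (0.7, 0.1, 0.4), (0.3, 0.8, 0.6), (0.9, 0.2, 0.1)]
# Rescale and shift so that some losses become negative.
rescaled = [tuple(4.0 * l - 3.0 for l in row) for row in losses]

original_w = adahedge_weights(losses)
rescaled_w = adahedge_weights(rescaled)
max_diff = max(
    abs(a - b)
    for wa, wb in zip(original_w, rescaled_w)
    for a, b in zip(wa, wb)
)
print(max_diff)
```

Under these assumptions the weights played on the original and on the rescaled, partly negative losses should agree up to floating-point error, because the learning rate shrinks by the same factor by which the losses grow.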
Abstract
Follow-the-Leader (FTL) is an intuitive sequential prediction strategy that guarantees constant regret in the stochastic setting, but has terrible performance for worst-case data. Other hedging strategies have better worst-case guarantees but may perform much worse than FTL if the data are not maximally adversarial. We introduce the FlipFlop algorithm, which is the first method that provably combines the best of both worlds. As part of our construction, we develop AdaHedge, which is a new way of dynamically tuning the learning rate in Hedge without using the doubling trick. AdaHedge refines a method by Cesa-Bianchi, Mansour and Stoltz (2007), yielding slightly improved worst-case guarantees. By interleaving AdaHedge and FTL, the FlipFlop algorithm achieves regret within a constant factor of the FTL regret, without sacrificing AdaHedge's worst-case guarantees. AdaHedge and FlipFlop…
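To make the contrast in the abstract concrete, here is a small sketch that runs FTL and fixed-learning-rate Hedge on a standard two-expert adversarial sequence in which the current leader always suffers loss 1. The loss sequence, the horizon `T = 1000`, and the learning rate sqrt(8 ln K / T) are illustrative assumptions and do not come from the paper summary; FTL is implemented as Hedge with an infinite learning rate, splitting weight evenly over tied leaders.

```python
import math

def adversarial_losses(T):
    """Two-expert sequence on which the current leader always suffers loss 1."""
    losses = [(0.5, 0.0)]
    for t in range(1, T):
        losses.append((0.0, 1.0) if t % 2 == 1 else (1.0, 0.0))
    return losses

def hedge_weights(cum_losses, eta):
    """Exponential weights; eta = math.inf gives Follow-the-Leader."""
    best = min(cum_losses)
    if math.isinf(eta):
        raw = [1.0 if L == best else 0.0 for L in cum_losses]
    else:
        raw = [math.exp(-eta * (L - best)) for L in cum_losses]
    total = sum(raw)
    return [r / total for r in raw]

def regret(losses, eta):
    """Learner's expected cumulative loss minus the best expert's loss."""
    K = len(losses[0])
    cum = [0.0] * K
    learner = 0.0
    for loss in losses:
        w = hedge_weights(cum, eta)
        learner += sum(wi * li for wi, li in zip(w, loss))
        cum = [c + l for c, l in zip(cum, loss)]
    return learner - min(cum)

T = 1000
losses = adversarial_losses(T)
ftl_regret = regret(losses, math.inf)
hedge_regret = regret(losses, math.sqrt(8 * math.log(2) / T))
print("FTL regret:  ", ftl_regret)
print("Hedge regret:", hedge_regret)
```

On this sequence FTL's regret grows linearly in T, while Hedge with this learning rate stays within the standard sqrt(T ln(K) / 2) bound. The fixed rate requires knowing T in advance, which is the kind of tuning problem that dynamic learning-rate schemes such as AdaHedge are meant to avoid.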
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning
