Online Learning with Switching Costs and Other Adaptive Adversaries
Nicolo Cesa-Bianchi, Ofer Dekel, Ohad Shamir

TL;DR
This paper investigates the impact of adaptive adversaries with switching costs on online learning, revealing that bandit feedback leads to higher regret rates than full-information scenarios, and introduces new bounds and strategies.
Contribution
It characterizes the power of adaptive adversaries with switching costs and bounded memory, providing nearly complete regret bounds and a novel reduction from experts to bandits.
Findings
Bandit feedback with switching costs yields a regret rate of $\widetilde{\Theta}(T^{2/3})$.
The full-information case with switching costs achieves a $\Theta(T^{1/2})$ regret rate.
Bounded memory adversaries can force $\widetilde{\Theta}(T^{2/3})$ regret even with full information.
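To give a feel for how far apart the two rates in the findings above are, the following minimal sketch compares T^{2/3} with T^{1/2} for a few horizons. It ignores constants and logarithmic factors, so it illustrates growth only and is not a computation of actual regret; the function names `regret_rate` and `rate_gap` are hypothetical and do not come from the paper.

```python
def regret_rate(T: int, exponent: float) -> float:
    """Return T**exponent, ignoring constants and logarithmic factors."""
    return float(T) ** exponent


def rate_gap(T: int) -> float:
    """Ratio of the T^{2/3} rate to the T^{1/2} rate, which equals T^{1/6}."""
    return regret_rate(T, 2.0 / 3.0) / regret_rate(T, 0.5)


if __name__ == "__main__":
    # Horizons chosen only for illustration.
    for T in (10**2, 10**4, 10**6):
        bandit = regret_rate(T, 2.0 / 3.0)   # bandit feedback with switching costs
        full_info = regret_rate(T, 0.5)      # full information with switching costs
        print(f"T={T:>8}: T^(2/3)={bandit:10.1f}  T^(1/2)={full_info:8.1f}  "
              f"gap={rate_gap(T):5.2f}")
```

The gap grows as T^{1/6}, so it widens slowly but without bound as the horizon increases.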
Abstract
We study the power of different types of adaptive (nonoblivious) adversaries in the setting of prediction with expert advice, under both full-information and bandit feedback. We measure the player's performance using a new notion of regret, also known as policy regret, which better captures the adversary's adaptiveness to the player's behavior. In a setting where losses are allowed to drift, we characterize, in a nearly complete manner, the power of adaptive adversaries with bounded memories and switching costs. In particular, we show that with switching costs, the attainable rate with bandit feedback is $\widetilde{\Theta}(T^{2/3})$. Interestingly, this rate is significantly worse than the $\Theta(T^{1/2})$ rate attainable with switching costs in the full-information case. Via a novel reduction from experts to bandits, we also show that a bounded memory adversary can force…
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Reinforcement Learning in Robotics
