Online Learning with Switching Costs and Other Adaptive Adversaries

Nicolo Cesa-Bianchi; Ofer Dekel; Ohad Shamir

arXiv:1302.4387·cs.LG·June 4, 2013·53 cites

Open Access

TL;DR

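This paper studies online learning against adaptive adversaries, in particular adversaries with switching costs and with bounded memory. It shows that, with switching costs, bandit feedback forces a strictly higher regret rate, $\widetilde{\Theta}(T^{2/3})$, than full information, $\Theta(\sqrt{T})$, and it introduces a reduction from experts to bandits.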

Contribution

The paper characterizes the power of adaptive adversaries with switching costs and with bounded memory, gives nearly complete regret bounds for these settings, and introduces a novel reduction from experts to bandits.

Findings

01

Bandit feedback with switching costs yields a regret rate of $\widetilde{\Theta}(T^{2/3})$.

02

The full-information case with switching costs achieves a regret rate of $\Theta(\sqrt{T})$.

03

Bounded memory adversaries can force $\widetilde{\Theta}(T^{2/3})$ regret even with full information.
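
To give a feel for the gap between the two rates in the findings, the following sketch compares $T^{2/3}$ with $T^{1/2}$ for a few horizons. It ignores constants and logarithmic factors, and the function name `rate_gap` is purely illustrative, not something defined in the paper.

```python
def rate_gap(horizon: int) -> float:
    """Return T^(2/3) / T^(1/2) = T^(1/6), ignoring constants and log factors."""
    return horizon ** (2 / 3) / horizon ** 0.5


if __name__ == "__main__":
    # Print both rates and their ratio for increasing horizons T.
    for horizon in (10**3, 10**6, 10**9):
        bandit_rate = horizon ** (2 / 3)   # order of regret: bandit feedback, switching costs
        full_info_rate = horizon ** 0.5    # order of regret: full information, switching costs
        print(horizon, round(bandit_rate), round(full_info_rate), round(rate_gap(horizon), 2))
```

The ratio grows like $T^{1/6}$, so the gap between the two settings widens slowly but without bound as the horizon increases.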

Abstract

We study the power of different types of adaptive (nonoblivious) adversaries in the setting of prediction with expert advice, under both full-information and bandit feedback. We measure the player's performance using a new notion of regret, also known as policy regret, which better captures the adversary's adaptiveness to the player's behavior. In a setting where losses are allowed to drift, we characterize, in a nearly complete manner, the power of adaptive adversaries with bounded memories and switching costs. In particular, we show that with switching costs, the attainable rate with bandit feedback is $\widetilde{\Theta}(T^{2/3})$. Interestingly, this rate is significantly worse than the $\Theta(\sqrt{T})$ rate attainable with switching costs in the full-information case. Via a novel reduction from experts to bandits, we also show that a bounded memory adversary can force…
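
As a rough illustration of the switching-cost setting described in the abstract, the sketch below computes a player's regret when each change of action costs a fixed penalty on top of the per-round loss. It assumes a fixed table of losses chosen in advance, which is a simplification: the paper's policy regret also covers adversaries that react to the player's past actions. The function name `switching_cost_regret` and the unit default cost are assumptions made for this example.

```python
from typing import Sequence


def switching_cost_regret(
    losses: Sequence[Sequence[float]],
    actions: Sequence[int],
    switch_cost: float = 1.0,
) -> float:
    """Player's loss plus switching costs, minus the loss of the best fixed action.

    losses[t][k] is the loss of action k in round t (assumed fixed in advance).
    actions[t] is the action the player chose in round t.
    """
    if len(losses) != len(actions):
        raise ValueError("losses and actions must cover the same number of rounds")
    if not losses:
        return 0.0

    # Loss actually incurred by the chosen actions.
    player_loss = sum(losses[t][a] for t, a in enumerate(actions))

    # One penalty for every round in which the action differs from the previous one.
    switches = sum(1 for prev, cur in zip(actions, actions[1:]) if prev != cur)

    # A fixed action never switches, so the comparator pays no switching cost.
    num_actions = len(losses[0])
    best_fixed = min(sum(row[k] for row in losses) for k in range(num_actions))

    return player_loss + switch_cost * switches - best_fixed


if __name__ == "__main__":
    alternating_losses = [[0, 1], [1, 0], [0, 1], [1, 0]]
    # Chasing the zero-loss action every round avoids all loss but pays for 3 switches.
    print(switching_cost_regret(alternating_losses, [0, 1, 0, 1]))
    # Staying put pays no switching cost and matches the best fixed action.
    print(switching_cost_regret(alternating_losses, [0, 0, 0, 0]))
```

The example shows the tension the paper analyzes: tracking the best action closely reduces loss but accumulates switching costs, so a low-regret player must limit how often it switches.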

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Reinforcement Learning in Robotics