Bandits with Feedback Graphs and Switching Costs

Raman Arora; Teodor V. Marinov; Mehryar Mohri

arXiv:1907.12189 · cs.LG · March 24, 2020 · 1 citation

TL;DR

This paper introduces new algorithms for adversarial multi-armed bandits with feedback graphs and switching costs. Their regret bounds depend on the domination number of the feedback graph, and a further algorithm improves policy regret when partial counterfactual feedback is available.

Contribution

The paper presents an algorithm whose regret guarantee depends only on the domination number of the feedback graph, complements it with a lower bound, and gives a second algorithm with improved policy regret bounds when partial counterfactual feedback is available.

Findings

1. Regret depends only on the domination number of the feedback graph
2. The new algorithm improves on all previously known guarantees, which scale with the independence number
3. Improved policy regret bounds when partial counterfactual feedback is available
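To make the gap between the two graph parameters concrete, the following brute-force sketch computes both for small graphs. It assumes an undirected feedback graph in which playing an action reveals its own loss and the losses of its neighbours. Under that model, a dominating set is a set of actions that together reveal every action's loss. The helper names (`closed_neighborhood`, `domination_number`, `independence_number`, `star`) are hypothetical and not from the paper. For undirected graphs any maximal independent set is also dominating, so the domination number never exceeds the independence number. A star graph shows the gap can be as large as the number of leaves.

```python
from itertools import combinations


def closed_neighborhood(adj, v):
    """Actions whose losses are observed when v is played: v itself plus its neighbours."""
    return {v} | adj[v]


def domination_number(adj):
    """Smallest number of actions whose closed neighbourhoods cover every action (brute force)."""
    vertices = set(adj)
    for k in range(1, len(vertices) + 1):
        for subset in combinations(sorted(vertices), k):
            covered = set()
            for v in subset:
                covered |= closed_neighborhood(adj, v)
            if covered == vertices:
                return k
    return 0  # only reached for an empty graph


def independence_number(adj):
    """Size of the largest set of actions with no edge between any two of them (brute force)."""
    vertices = sorted(adj)
    for k in range(len(vertices), 0, -1):
        for subset in combinations(vertices, k):
            if all(u not in adj[v] for u, v in combinations(subset, 2)):
                return k
    return 0


def star(n_leaves):
    """Undirected star: centre 0 connected to leaves 1..n_leaves (symmetric adjacency sets)."""
    adj = {0: set(range(1, n_leaves + 1))}
    for leaf in range(1, n_leaves + 1):
        adj[leaf] = {0}
    return adj


if __name__ == "__main__":
    g = star(5)
    print(domination_number(g), independence_number(g))  # 1 5
```

Both functions enumerate subsets, so they are exponential in the number of actions and only meant for toy graphs.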

Abstract

We study the adversarial multi-armed bandit problem where partial observations are available and where, in addition to the loss incurred for each action, a switching cost is incurred for shifting to a new action. All previously known results incur a factor proportional to the independence number of the feedback graph. We give a new algorithm whose regret guarantee depends only on the domination number of the graph. We further supplement that result with a lower bound. Finally, we also give a new algorithm with improved policy regret bounds when partial counterfactual feedback is available.
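The following minimal sketch shows the quantity being bounded. It assumes a fixed (oblivious) loss sequence and a switching cost of `switch_cost` per change of action; the paper's exact normalization may differ. Under these assumptions, regret with switching costs compares the learner's total loss plus its switching charges against the best fixed action in hindsight, which never switches. The function name `regret_with_switching_costs` is hypothetical. Policy regret against adaptive adversaries is a different notion and is not modelled here.

```python
from typing import Sequence


def regret_with_switching_costs(
    losses: Sequence[Sequence[float]],
    actions: Sequence[int],
    switch_cost: float = 1.0,
) -> float:
    """losses[t][a] is the loss of action a at round t; actions[t] is the action played at round t."""
    T = len(actions)
    n_actions = len(losses[0])
    # Learner's cumulative loss on the actions it actually played.
    learner_loss = sum(losses[t][actions[t]] for t in range(T))
    # One switching charge each time the played action changes between consecutive rounds.
    switches = sum(1 for t in range(1, T) if actions[t] != actions[t - 1])
    # Best single action in hindsight; a fixed action never pays a switching cost.
    best_fixed = min(sum(losses[t][a] for t in range(T)) for a in range(n_actions))
    return learner_loss + switch_cost * switches - best_fixed


if __name__ == "__main__":
    losses = [[0, 1], [1, 0], [0, 1], [1, 0]]
    print(regret_with_switching_costs(losses, [0, 1, 0, 1]))  # 1.0
    print(regret_with_switching_costs(losses, [0, 0, 0, 0]))  # 0.0
```

In this toy run, chasing the best action every round drives the learner's loss to zero, but the three switches leave it worse off than simply staying on one action.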

Taxonomy

Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Reinforcement Learning in Robotics