Bandits with Feedback Graphs and Switching Costs
Raman Arora, Teodor V. Marinov, Mehryar Mohri

TL;DR
This paper introduces new algorithms for adversarial multi-armed bandits with feedback graphs and switching costs, achieving regret bounds based on the domination number and improving policy regret with partial counterfactual feedback.
Contribution
The paper presents an algorithm whose regret guarantee depends only on the domination number of the feedback graph, supplements it with a lower bound, and gives a second algorithm with improved policy regret bounds when partial counterfactual feedback is available.
Findings
Regret depends on the domination number of the feedback graph
The new algorithm improves on previously known results, all of which incur a factor proportional to the independence number
Improved policy regret bounds with partial counterfactual feedback
Abstract
We study the adversarial multi-armed bandit problem where partial observations are available and where, in addition to the loss incurred for each action, a switching cost is incurred for shifting to a new action. All previously known results incur a factor proportional to the independence number of the feedback graph. We give a new algorithm whose regret guarantee depends only on the domination number of the graph. We further supplement that result with a lower bound. Finally, we also give a new algorithm with improved policy regret bounds when partial counterfactual feedback is available.
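To make the gap between the two graph parameters concrete, the following minimal sketch computes both by brute force on a small example. It is an illustration only, not the paper's algorithm. It assumes an undirected feedback graph in which playing an action also reveals that action's own loss; the function names and the star-graph example are hypothetical choices made here. On a star graph, the center alone observes every action, so the domination number is 1, while the leaves form an independent set whose size grows with the number of actions.

```python
from itertools import combinations


def domination_number(adj):
    """Size of the smallest vertex set whose closed neighborhoods cover the graph.

    Assumption: adj maps each vertex to the set of its neighbors in an
    undirected graph, and every vertex observes itself.
    """
    vertices = list(adj)
    for k in range(1, len(vertices) + 1):
        for subset in combinations(vertices, k):
            covered = set()
            for v in subset:
                covered |= {v} | adj[v]  # closed neighborhood of v
            if len(covered) == len(vertices):
                return k
    return 0  # empty graph


def independence_number(adj):
    """Size of the largest vertex set with no edge between any two members."""
    vertices = list(adj)
    for k in range(len(vertices), 0, -1):
        for subset in combinations(vertices, k):
            if all(u not in adj[v] for u, v in combinations(subset, 2)):
                return k
    return 0  # empty graph


def star_graph(n_leaves):
    """Hypothetical example: vertex 0 is the center, 1..n_leaves are leaves."""
    adj = {0: set(range(1, n_leaves + 1))}
    for leaf in range(1, n_leaves + 1):
        adj[leaf] = {0}
    return adj


star = star_graph(5)
gamma = domination_number(star)    # center dominates all leaves
alpha = independence_number(star)  # the leaves are pairwise non-adjacent
print(f"domination number = {gamma}, independence number = {alpha}")
```

Both routines enumerate subsets and are exponential in the number of vertices, so they are suitable only for small illustrative graphs.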
Taxonomy
Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Reinforcement Learning in Robotics
