Online learning with graph-structured feedback against adaptive adversaries
Zhili Feng, Po-Ling Loh

TL;DR
This paper investigates online learning with graph-structured feedback against adaptive adversaries with bounded memory, providing upper and lower bounds on policy regret for different graph observability scenarios.
Contribution
It establishes upper and lower bounds on policy regret for online learning with graph feedback against adaptive adversaries with bounded memory. The T^{2/3} upper bound for strongly observable graphs is matched by a lower bound in the full-information setting.
Findings
Upper bounds of T^{2/3} and T^{3/4} on policy regret for strongly observable and weakly observable graphs, respectively
Matching lower bound of T^{2/3} in the full-information setting when the adversary has a bounded memory of size 1
Lower bound of T^{2/3} for an oblivious adversary with switching costs on non-revealing strongly observable feedback graphs
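The findings above depend on whether the feedback graph is strongly or weakly observable. As a minimal sketch, assuming the standard definitions from the graph-feedback literature (a vertex is observable if some action reveals its loss, and strongly observable if it has a self-loop or is revealed by every other action), a graph can be classified as follows. The function name and the adjacency format are illustrative choices, not taken from the paper.

```python
from typing import Dict, Set


def classify_feedback_graph(out_neighbors: Dict[int, Set[int]]) -> str:
    """Classify a directed feedback graph by observability.

    out_neighbors[i] is the set of actions whose losses are revealed when
    action i is played (i in out_neighbors[i] means i has a self-loop).
    This adjacency format is an assumption made for illustration.
    """
    vertices = set(out_neighbors)
    all_strong = True
    for v in vertices:
        # Actions whose play reveals the loss of v.
        in_nbrs = {u for u in vertices if v in out_neighbors[u]}
        if not in_nbrs:
            # Some loss is never revealed: the graph is not observable.
            return "unobservable"
        has_self_loop = v in in_nbrs
        seen_by_all_others = (vertices - {v}) <= in_nbrs
        if not (has_self_loop or seen_by_all_others):
            all_strong = False
    return "strongly observable" if all_strong else "weakly observable"


# Full information: every action reveals every loss.
full_info = {0: {0, 1, 2}, 1: {0, 1, 2}, 2: {0, 1, 2}}
# Bandit: every action reveals only its own loss.
bandit = {0: {0}, 1: {1}, 2: {2}}
# Loopless clique: every action reveals all losses except its own.
loopless_clique = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
# Every loss is revealed by some action, but no vertex is strongly observable.
weak = {0: {1, 2}, 1: {0}, 2: set()}
```

Under these definitions, full-information and bandit feedback are both strongly observable, which is why the T^{2/3} results are stated for that class.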
Abstract
We derive upper and lower bounds for the policy regret of T-round online learning problems with graph-structured feedback, where the adversary is nonoblivious but assumed to have a bounded memory. We obtain upper bounds of T^{2/3} and T^{3/4} for strongly-observable and weakly-observable graphs, respectively, based on analyzing a variant of the Exp3 algorithm. When the adversary is allowed a bounded memory of size 1, we show that a matching lower bound of T^{2/3} is achieved in the case of full-information feedback. We also study the particular loss structure of an oblivious adversary with switching costs, and show that in such a setting, non-revealing strongly-observable feedback graphs achieve a lower bound of T^{2/3}, as well.
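The abstract refers to a variant of the Exp3 algorithm without giving its details here. For orientation only, the following is a generic sketch of Exp3 with graph feedback: exponential weights with importance-weighted loss estimates, where each observed loss is divided by the probability of observing it. It is not the paper's variant. It assumes a loss sequence fixed in advance (so it does not model an adaptive adversary), uniform exploration over all actions, and hypothetical parameter names `eta` and `gamma`.

```python
import numpy as np


def exp3_graph_feedback(losses, feedback, eta, gamma, rng):
    """Run a generic Exp3-style learner with graph-structured feedback.

    losses:   (T, K) array with entries in [0, 1], fixed in advance
              (an illustrative assumption; no adaptive adversary here).
    feedback: (K, K) boolean array; feedback[i, j] is True when playing
              action i reveals the loss of action j.
    eta, gamma: learning rate and uniform-exploration rate (assumed names).
    Returns the total loss incurred by the learner.
    """
    feedback = np.asarray(feedback, dtype=bool)
    num_rounds, num_actions = losses.shape
    # Every action must be observable, or its loss estimate is undefined.
    assert feedback.any(axis=0).all(), "every action needs an in-neighbor"
    assert 0.0 < gamma <= 1.0

    log_weights = np.zeros(num_actions)  # log-domain for numerical stability
    total_loss = 0.0
    for t in range(num_rounds):
        # Exponential-weights distribution mixed with uniform exploration.
        q = np.exp(log_weights - log_weights.max())
        q /= q.sum()
        p = (1.0 - gamma) * q + gamma / num_actions

        action = rng.choice(num_actions, p=p)
        total_loss += float(losses[t, action])

        # Probability that each action's loss is observed this round.
        observe_prob = p @ feedback.astype(float)
        observed = feedback[action]
        # Importance-weighted estimate: unbiased for every observable action.
        estimates = np.where(observed, losses[t] / observe_prob, 0.0)
        log_weights -= eta * estimates
    return total_loss


# Usage: three actions, action 0 is clearly best, full-information feedback.
rng = np.random.default_rng(0)
example_losses = np.full((2000, 3), 0.9)
example_losses[:, 0] = 0.1
full_information = np.ones((3, 3), dtype=bool)
total = exp3_graph_feedback(example_losses, full_information, 0.1, 0.05, rng)
```

Swapping `full_information` for `np.eye(3, dtype=bool)` gives the bandit case, where only the played action's loss is observed and the estimate reduces to the usual Exp3 importance weighting.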
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Optimization and Search Problems
