Online learning with graph-structured feedback against adaptive adversaries

Zhili Feng; Po-Ling Loh

arXiv:1804.00335 · cs.LG · April 3, 2018 · 1 citation

TL;DR

This paper investigates online learning with graph-structured feedback against adaptive adversaries with bounded memory, providing upper and lower bounds on policy regret for different graph observability scenarios.

Contribution

It establishes upper and lower bounds on policy regret for online learning with graph feedback against adaptive adversaries with bounded memory. The bounds match at order T^{2/3} in the full-information case.

Findings

01

Upper bounds of O(T^{2/3}) and O(T^{3/4}) on policy regret for strongly observable and weakly observable graphs, respectively

02

Matching lower bound of Ω(T^{2/3}) under full-information feedback when the adversary has a bounded memory of size 1

03

Lower bound of Ω(T^{2/3}) for an oblivious adversary with switching costs on non-revealing strongly-observable feedback graphs

Abstract

We derive upper and lower bounds for the policy regret of $T$-round online learning problems with graph-structured feedback, where the adversary is nonoblivious but assumed to have a bounded memory. We obtain upper bounds of $O(T^{2/3})$ and $O(T^{3/4})$ for strongly-observable and weakly-observable graphs, respectively, based on analyzing a variant of the Exp3 algorithm. When the adversary is allowed a bounded memory of size 1, we show that a matching lower bound of $\Omega(T^{2/3})$ is achieved in the case of full-information feedback. We also study the particular loss structure of an oblivious adversary with switching costs, and show that in such a setting, non-revealing strongly-observable feedback graphs achieve a lower bound of $\Omega(T^{2/3})$ as well.
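
For readers unfamiliar with Exp3 under graph feedback, the following is a minimal sketch of a generic exponential-weights learner that uses importance-weighted loss estimates over a feedback graph. It is not the variant analyzed in the paper, and it does nothing specific to handle adaptive or bounded-memory adversaries. The function name `exp3_graph`, the boolean-matrix encoding of the graph, and the parameters `eta` (learning rate) and `gamma` (uniform exploration) are assumptions made for illustration.

```python
import math
import random


def exp3_graph(losses, graph, eta, gamma, rng):
    """Illustrative Exp3-style learner with graph feedback (hypothetical helper).

    losses[t][j]: loss of action j at round t, assumed to lie in [0, 1].
    graph[i][j]:  True if playing action i reveals the loss of action j.
    Returns the list of actions played.
    """
    k = len(graph)
    weights = [1.0] * k
    actions = []
    for round_losses in losses:
        total = sum(weights)
        # Mix the exponential-weights distribution with uniform exploration.
        probs = [(1 - gamma) * w / total + gamma / k for w in weights]
        action = rng.choices(range(k), weights=probs)[0]
        actions.append(action)
        for j in range(k):
            if not graph[action][j]:
                continue  # the loss of action j was not revealed this round
            # Probability that the loss of j is observed under `probs`.
            observe_prob = sum(probs[i] for i in range(k) if graph[i][j])
            # Importance-weighted estimate of the loss of action j.
            estimate = round_losses[j] / observe_prob
            weights[j] *= math.exp(-eta * estimate)
        # Rescale so the largest weight is 1, which avoids numerical underflow.
        top = max(weights)
        weights = [w / top for w in weights]
    return actions
```

Passing a graph whose entries are all `True` gives full-information feedback, while a graph with only self-loops gives the standard bandit setting. The loss sequence here is fixed in advance, so the sketch corresponds to an oblivious adversary only.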

Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Optimization and Search Problems