Non-trivial two-armed partial-monitoring games are bandits
András Antos, Gábor Bartók, Csaba Szepesvári

TL;DR
This paper shows that non-trivial two-action partial-monitoring games can be reduced to bandit problems, which gives a minimax regret rate of $\Theta(\sqrt{T})$.
Contribution
It establishes a reduction from non-trivial two-action partial-monitoring games to bandit problems, clarifying their regret bounds.
Findings
Non-trivial two-action partial-monitoring games are reducible to bandit-like games.
Minimax regret in these games is $\Theta(\sqrt{T})$.
Reduction simplifies analysis of such games.
Abstract
We consider online learning in partial-monitoring games against an oblivious adversary. We show that when the number of actions available to the learner is two and the game is nontrivial, then it is reducible to a bandit-like game and thus the minimax regret is $\Theta(\sqrt{T})$.
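To make the bandit side of the claim concrete, the following is a minimal sketch of Exp3, a standard adversarial bandit algorithm whose expected regret is $O(\sqrt{T})$ for a fixed number of arms. It is not the reduction from the paper: the function name `exp3_two_armed`, the loss-sequence format, and the learning-rate tuning are assumptions made for illustration only.

```python
import math
import random


def exp3_two_armed(losses, seed=0):
    """Run loss-based Exp3 on a two-armed bandit (illustrative sketch).

    losses: list of (loss_arm0, loss_arm1) pairs in [0, 1], fixed in
            advance, which models an oblivious adversary.
    Returns the realized regret against the best fixed arm in hindsight.
    """
    rng = random.Random(seed)
    T, K = len(losses), 2
    # Assumed tuning: eta = sqrt(2 ln K / (T K)), which yields an expected
    # regret bound of sqrt(2 T K ln K) for loss-based Exp3.
    eta = math.sqrt(2.0 * math.log(K) / (T * K))
    cum_est = [0.0, 0.0]  # cumulative importance-weighted loss estimates
    total = 0.0           # learner's cumulative loss

    for loss in losses:
        # Exponential weights; subtract the minimum for numerical stability.
        m = min(cum_est)
        w = [math.exp(-eta * (c - m)) for c in cum_est]
        s = sum(w)
        p = [x / s for x in w]

        # Sample an arm; only the chosen arm's loss is observed.
        arm = 0 if rng.random() < p[0] else 1
        observed = loss[arm]
        total += observed

        # Unbiased estimate: observed loss divided by selection probability.
        cum_est[arm] += observed / p[arm]

    best = min(sum(l[0] for l in losses), sum(l[1] for l in losses))
    return total - best
```

For example, `exp3_two_armed([(0.0, 1.0)] * 1000)` returns the number of rounds in which the learner pulled the worse arm, which should stay far below the horizon of 1000 rounds.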
Taxonomy
Topics: Advanced Bandit Algorithms Research · Game Theory and Applications · Auction Theory and Applications
