Non-trivial two-armed partial-monitoring games are bandits

András Antos; Gábor Bartók; Csaba Szepesvári

arXiv:1108.4961 · cs.LG · August 26, 2011 · 1 citation

TL;DR

This paper demonstrates that non-trivial two-action partial-monitoring games can be reduced to bandit problems, which gives a minimax regret rate of $\Theta(\sqrt{T})$.

Contribution

It establishes a reduction from non-trivial two-action partial-monitoring games to bandit problems, clarifying their regret bounds.

Findings

01

Non-trivial two-action partial-monitoring games are reducible to bandit-like games.

02

Minimax regret in these games is $\Theta(\sqrt{T})$.

03

The reduction simplifies the analysis of such games.

Abstract

We consider online learning in partial-monitoring games against an oblivious adversary. We show that when the number of actions available to the learner is two and the game is nontrivial, then it is reducible to a bandit-like game and thus the minimax regret is $\Theta(\sqrt{T})$.
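
For readers unfamiliar with the notation, the rate stated in the abstract can be unpacked roughly as follows. This is a sketch in standard online-learning notation and is not taken from the page itself: the loss matrix $L$, the learner's action $I_t$, the outcome $J_t$, the algorithm $\mathcal{A}$, and the constants $c, C$ are all assumed symbols.

```latex
% Assumed notation (not from the source page): L is the loss matrix,
% I_t the learner's (possibly randomized) action at time t, and
% J_1, ..., J_T the outcomes fixed in advance by the oblivious adversary.
\[
  R_T(\mathcal{A}) \;=\; \sup_{J_1,\dots,J_T}
  \left( \mathbb{E}\!\left[\sum_{t=1}^{T} L(I_t, J_t)\right]
       - \min_{i} \sum_{t=1}^{T} L(i, J_t) \right),
  \qquad
  R_T^{*} \;=\; \inf_{\mathcal{A}} R_T(\mathcal{A}).
\]
% The claimed rate: there exist constants 0 < c \le C such that,
% for all sufficiently large T,
\[
  c\,\sqrt{T} \;\le\; R_T^{*} \;\le\; C\,\sqrt{T}.
\]
```

Under this reading, the result says that a non-trivial two-action game is neither easier nor harder, up to constant factors, than a bandit problem over the same horizon $T$.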

Taxonomy

Topics: Advanced Bandit Algorithms Research · Game Theory and Applications · Auction Theory and Applications