Exploration by Optimisation in Partial Monitoring

Tor Lattimore; Csaba Szepesvari

arXiv:1907.05772·cs.LG·October 28, 2019·20 cites

TL;DR

This paper introduces a simple, efficient algorithm for adversarial partial monitoring. On non-degenerate locally observable games its regret matches the best known information-theoretic upper bound, and it is also near-optimal for full information, bandit and globally observable games.

Contribution

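The paper presents a new algorithm whose minimax regret bound for adversarial non-degenerate locally observable partial monitoring matches the best known information-theoretic upper bound, and which remains near-optimal across several other game types.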

Findings

01

Achieves an n-round regret bound of 6(d+1)k^{3/2}√(n log k) for adversarial k-action, d-outcome non-degenerate locally observable partial monitoring.

02

Matches the best known information-theoretic upper bound.

03

Performs near-optimally in full information, bandit, and globally observable games.

Abstract

We provide a simple and efficient algorithm for adversarial $k$-action $d$-outcome non-degenerate locally observable partial monitoring games for which the $n$-round minimax regret is bounded by $6(d + 1) k^{3/2} \sqrt{n \log(k)}$, matching the best known information-theoretic upper bound. The same algorithm also achieves near-optimal regret for full information, bandit and globally observable games.
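To give a feel for the scale of the bound stated in the abstract, the following minimal sketch evaluates the expression 6(d+1)k^{3/2}√(n log k) numerically. It only computes the formula; it is not the paper's algorithm. The function name `regret_upper_bound` and the example values of n, k and d are assumptions chosen for illustration, and the logarithm is taken to be the natural logarithm.

```python
import math


def regret_upper_bound(n: int, k: int, d: int) -> float:
    """Evaluate 6 (d + 1) k^{3/2} sqrt(n log k).

    n: number of rounds, k: number of actions, d: number of outcomes.
    The natural logarithm is an assumption made for this illustration.
    """
    if n < 0 or k < 1 or d < 1:
        raise ValueError("expected n >= 0, k >= 1 and d >= 1")
    return 6 * (d + 1) * k ** 1.5 * math.sqrt(n * math.log(k))


if __name__ == "__main__":
    # Hypothetical game size, chosen only to show how the bound scales with n.
    k, d = 3, 2
    for n in (10**4, 10**6, 10**8):
        bound = regret_upper_bound(n, k, d)
        # The per-round value shrinks as n grows because the bound is O(sqrt(n)).
        print(f"n={n:>9}  bound={bound:.1f}  bound/n={bound / n:.5f}")
```

Because the bound grows as √n, multiplying the horizon by four doubles it, so the per-round value tends to zero as n increases. The constant 6(d+1)k^{3/2} is large, so for small n the bound can exceed n and is then uninformative.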

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Adversarial Robustness in Machine Learning