Exploration by Optimisation in Partial Monitoring
Tor Lattimore, Csaba Szepesvari

TL;DR
This paper introduces a simple, efficient algorithm for adversarial partial monitoring games. Its regret matches the best known information-theoretic upper bound for non-degenerate locally observable games, and it is also near-optimal for full information, bandit, and globally observable games.
Contribution
The paper presents a new algorithm for adversarial partial monitoring whose regret bound matches the best known information-theoretic upper bound, and which remains near-optimal across several other game types.
Findings
Achieves an n-round regret bound of 6(d+1)k^{3/2}√(n log k) for k-action, d-outcome non-degenerate locally observable partial monitoring games.
Matches the best known information-theoretic upper bounds.
Performs near-optimally in full information, bandit, and globally observable games.
Abstract
We provide a simple and efficient algorithm for adversarial k-action d-outcome non-degenerate locally observable partial monitoring game for which the n-round minimax regret is bounded by 6(d+1)k^{3/2}√(n log k), matching the best known information-theoretic upper bound. The same algorithm also achieves near-optimal regret for full information, bandit and globally observable games.
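To get a feel for the size of the stated bound, the following is a minimal sketch that only evaluates the expression 6(d+1)k^{3/2}√(n log k) numerically, with k the number of actions, d the number of outcomes, and n the number of rounds. It does not implement the paper's algorithm. The function names `regret_bound` and `per_round_bound` are hypothetical, and the use of the natural logarithm is an assumption.

```python
import math


def regret_bound(k: int, d: int, n: int) -> float:
    """Evaluate 6 (d+1) k^{3/2} sqrt(n log k).

    k: number of actions, d: number of outcomes, n: number of rounds.
    Assumption: log is the natural logarithm.
    """
    if k < 2 or d < 1 or n < 1:
        raise ValueError("expected k >= 2, d >= 1, n >= 1")
    return 6 * (d + 1) * k ** 1.5 * math.sqrt(n * math.log(k))


def per_round_bound(k: int, d: int, n: int) -> float:
    """Bound on the average regret per round (the bound divided by n)."""
    return regret_bound(k, d, n) / n


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the paper.
    for n in (10**4, 10**6, 10**8):
        print(n, regret_bound(k=3, d=2, n=n), per_round_bound(k=3, d=2, n=n))
```

Because the bound grows like √n, quadrupling the number of rounds only doubles it, so the per-round value shrinks at rate 1/√n as n grows.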
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Adversarial Robustness in Machine Learning
