Best-of-Both-Worlds Algorithms for Partial Monitoring
Taira Tsuchiya, Shinji Ito, Junya Honda

TL;DR
This paper introduces the first best-of-both-worlds algorithms for partial monitoring: their regret is favorably bounded in both the stochastic and adversarial regimes, with bounds that depend on the observability class of the game.
Contribution
It presents novel best-of-both-worlds algorithms for partial monitoring, with regret bounds for both stochastic and adversarial regimes based on game observability.
Findings
Regret bounds for non-degenerate locally observable games in stochastic and adversarial regimes.
Regret bounds for globally observable games in stochastic and adversarial regimes.
Algorithms are based on follow-the-regularized-leader with adaptive learning rates.
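To make the last finding concrete, here is a minimal sketch of follow-the-regularized-leader (FTRL) with a time-varying learning rate. It is not the paper's algorithm. It assumes full-information losses rather than partial-monitoring feedback, a negative Shannon entropy regularizer, and a simple decaying schedule `eta_t = sqrt(log(k) / t)` standing in for an adaptive rule. The names `ftrl_distribution` and `run_ftrl` are hypothetical.

```python
import math

def ftrl_distribution(cum_loss, eta):
    """FTRL step with a negative-Shannon-entropy regularizer (assumed here).

    Minimizing <p, L> + (1/eta) * sum_i p_i log p_i over the simplex
    gives the closed form p_i proportional to exp(-eta * L_i).
    """
    shift = min(cum_loss)  # subtract the minimum for numerical stability
    weights = [math.exp(-eta * (loss - shift)) for loss in cum_loss]
    total = sum(weights)
    return [w / total for w in weights]

def run_ftrl(loss_rounds):
    """Run FTRL over a list of per-round loss vectors (full information).

    Returns the distribution played in the last round and the
    cumulative expected loss of the learner.
    """
    k = len(loss_rounds[0])
    cum_loss = [0.0] * k
    expected_loss = 0.0
    p = [1.0 / k] * k
    for t, losses in enumerate(loss_rounds, start=1):
        # Illustrative decaying schedule; an assumption, not the paper's rule.
        eta = math.sqrt(math.log(k) / t)
        p = ftrl_distribution(cum_loss, eta)
        expected_loss += sum(pi * li for pi, li in zip(p, losses))
        for i in range(k):
            cum_loss[i] += losses[i]
    return p, expected_loss

if __name__ == "__main__":
    # Two actions: action 0 always incurs loss 0, action 1 always incurs loss 1.
    final_p, total = run_ftrl([[0.0, 1.0]] * 200)
    print(final_p, total)
```

In this toy run the distribution concentrates on the better action as the cumulative loss gap grows. In partial monitoring the learner never observes the loss vector directly, so an actual algorithm would replace `losses` with estimates built from the observed feedback symbols.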
Abstract
This study considers the partial monitoring problem over a finite set of actions and a finite set of outcomes and provides the first best-of-both-worlds algorithms, whose regrets are favorably bounded in both the stochastic and adversarial regimes. In particular, we show regret bounds for non-degenerate locally observable games in both the stochastic and the adversarial regime; the bounds depend on the number of rounds, the maximum number of distinct observations per action, the minimum suboptimality gap, and the number of Pareto optimal actions. Moreover, we show regret bounds for globally observable games in the stochastic regime and in…
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Reinforcement Learning in Robotics
