An Information-Theoretic Approach to Minimax Regret in Partial Monitoring
Tor Lattimore, Csaba Szepesvári

TL;DR
This paper develops an information-theoretic framework for deriving minimax regret bounds in partial monitoring, improving on existing results in several game settings.
Contribution
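It proves a new minimax theorem linking worst-case Bayesian regret and minimax regret, with no assumptions on the space of signals or the decisions of the adversary, and generalizes the information-theoretic tools of Russo and Van Roy (2016) to obtain tighter regret bounds in partial monitoring and bandit problems.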
Findings
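Derived regret bounds for 'non-degenerate easy' and 'hard' finite partial monitoring that are independent of arbitrarily large game-dependent constants.
Showed that the minimax regret for k-armed adversarial bandits is at most $\sqrt{2kn}$, improving on existing results by a factor of 2.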
Provided better constants in the analysis of the cops and robbers game.
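To make the bandit finding concrete, the following minimal Python sketch evaluates the $\sqrt{2kn}$ bound for a few values of k (number of arms) and n (number of rounds). The function names are hypothetical, and the "previous" bound is an assumption: it simply reads "improving by a factor of 2" as a prior bound of $2\sqrt{2kn} = \sqrt{8kn}$. This is arithmetic on the stated bound, not an implementation of any algorithm from the paper.

```python
import math


def minimax_regret_upper_bound(k: int, n: int) -> float:
    """Upper bound sqrt(2kn) on the minimax regret, as stated in the abstract.

    k: number of arms, n: number of rounds (horizon).
    """
    if k < 1 or n < 0:
        raise ValueError("need k >= 1 and n >= 0")
    return math.sqrt(2 * k * n)


def previous_upper_bound(k: int, n: int) -> float:
    """Assumed prior bound: twice the new one, i.e. 2*sqrt(2kn) = sqrt(8kn).

    This follows only from reading 'a factor of 2' literally.
    """
    return 2.0 * minimax_regret_upper_bound(k, n)


if __name__ == "__main__":
    for k, n in [(2, 1_000), (10, 10_000), (100, 1_000_000)]:
        new = minimax_regret_upper_bound(k, n)
        old = previous_upper_bound(k, n)
        # Regret per round shrinks like sqrt(2k/n) as the horizon grows.
        print(f"k={k:>3} n={n:>9}  new={new:10.1f}  old={old:10.1f}  "
              f"per-round={new / n:.4f}")
```

Because the bound grows like the square root of n, the per-round regret it guarantees tends to zero as the horizon increases.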
Abstract
We prove a new minimax theorem connecting the worst-case Bayesian regret and minimax regret under partial monitoring with no assumptions on the space of signals or decisions of the adversary. We then generalise the information-theoretic tools of Russo and Van Roy (2016) for proving Bayesian regret bounds and combine them with the minimax theorem to derive minimax regret bounds for various partial monitoring settings. The highlight is a clean analysis of 'non-degenerate easy' and 'hard' finite partial monitoring, with new regret bounds that are independent of arbitrarily large game-dependent constants. The power of the generalised machinery is further demonstrated by proving that the minimax regret for k-armed adversarial bandits is at most $\sqrt{2kn}$, improving on existing results by a factor of 2. Finally, we provide a simple analysis of the cops and robbers game, also improving best…
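The minimax theorem described in the abstract can be written schematically as follows. The notation here is an assumption made for illustration and is not taken from the summary above: $\pi$ ranges over the learner's policies, $x$ over the adversary's choices, $\nu$ over priors on those choices, $n$ is the horizon, $\mathfrak{R}_n$ is the regret and $\mathfrak{BR}_n$ the Bayesian regret. The exact class of priors and the technical conditions are given in the paper and are not reproduced here.

```latex
% Schematic statement: minimax regret equals worst-case Bayesian regret.
\[
  \underbrace{\inf_{\pi} \sup_{x} \mathfrak{R}_n(\pi, x)}_{\text{minimax regret}}
  \;=\;
  \underbrace{\sup_{\nu} \inf_{\pi} \mathfrak{BR}_n(\pi, \nu)}_{\text{worst-case Bayesian regret}},
  \qquad
  \mathfrak{BR}_n(\pi, \nu) = \int \mathfrak{R}_n(\pi, x) \, \mathrm{d}\nu(x).
\]
```

Read this way, any bound on the Bayesian regret that holds uniformly over priors becomes a bound on the minimax regret, which is how the abstract combines the theorem with the information-theoretic tools.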
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Machine Learning and Algorithms
