Information-Theoretic Minimax Regret Bounds for Reinforcement Learning based on Duality
Raghav Bongole, Amaury Gouverneur, Borja Rodríguez-Gálvez, Tobias J. Oechtering, and Mikael Skoglund

TL;DR
This paper derives information-theoretic minimax regret bounds for reinforcement learning in finite-horizon MDPs, using minimax theorems to turn existing Bayesian regret bounds into guarantees that hold across all environment parameters.
Contribution
It introduces a novel minimax regret framework for MDPs and establishes bounds using information-theoretic and Bayesian regret analysis.
Findings
Derived minimax regret bounds for finite-horizon MDPs
Established minimax theorems linking Bayesian and minimax regret
Applied bounds to various reinforcement learning scenarios
Abstract
We study agents acting in an unknown environment where the agent's goal is to find a robust policy. We consider robust policies as policies that achieve high cumulative rewards for all possible environments. To this end, we consider agents minimizing the maximum regret over different environment parameters, leading to the study of minimax regret. This research focuses on deriving information-theoretic bounds for minimax regret in Markov Decision Processes (MDPs) with a finite time horizon. Building on concepts from supervised learning, such as minimum excess risk (MER) and minimax excess risk, we use recent bounds on the Bayesian regret to derive minimax regret bounds. Specifically, we establish minimax theorems and use bounds on the Bayesian regret to perform minimax regret analysis using these minimax theorems. Our contributions include defining a suitable minimax regret in the…
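To make the relation between Bayesian regret and minimax regret concrete, here is a minimal sketch on a toy problem. It assumes a finite set of policies and a finite set of environment parameters, with regrets given as a small hand-picked table. The table `REGRET` and the function names are hypothetical and do not come from the paper, which works with finite-horizon MDPs rather than a lookup table. The sketch only illustrates the elementary inequality that, for any prior over environments, the Bayesian regret is at most the minimax regret.

```python
from typing import List, Sequence

# Hypothetical regret table: REGRET[i][j] is the regret of policy i
# in environment j. The values are illustrative, not from the paper.
REGRET: List[List[float]] = [
    [0.0, 3.0],
    [2.0, 1.0],
    [4.0, 0.0],
]


def minimax_regret(regret: Sequence[Sequence[float]]) -> float:
    """Smallest worst-case regret over the listed (deterministic) policies."""
    return min(max(row) for row in regret)


def bayesian_regret(regret: Sequence[Sequence[float]],
                    prior: Sequence[float]) -> float:
    """Smallest prior-averaged regret over the listed policies."""
    return min(
        sum(p * r for p, r in zip(prior, row))
        for row in regret
    )


if __name__ == "__main__":
    worst_case = minimax_regret(REGRET)
    print("minimax regret:", worst_case)
    # For every prior, the Bayesian regret cannot exceed the minimax regret.
    for prior in ([1.0, 0.0], [0.5, 0.5], [0.25, 0.75], [0.0, 1.0]):
        bayes = bayesian_regret(REGRET, prior)
        print("prior", prior, "-> Bayesian regret:", bayes)
        assert bayes <= worst_case
```

In this toy table the inequality can be strict when only the listed deterministic policies are allowed. The minimax theorems mentioned in the abstract concern conditions under which the largest Bayesian regret over priors matches the minimax regret; the sketch does not reproduce those results.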
Taxonomy
Topics: Neural Networks and Applications · Reinforcement Learning in Robotics
