An Information-Theoretic Analysis of Bayesian Reinforcement Learning
Amaury Gouverneur, Borja Rodríguez-Gálvez, Tobias J. Oechtering, and Mikael Skoglund

TL;DR
This paper applies information-theoretic methods to analyze the fundamental limits of Bayesian reinforcement learning, deriving bounds on the minimum Bayesian regret for MDPs, including bandit and online optimization problems.
Contribution
It introduces a framework for bounding Bayesian regret in model-based RL using information-theoretic measures like relative entropy and Wasserstein distance.
Findings
Derived upper bounds on Bayesian regret for MDPs.
Reproduced and improved existing bounds for bandit and online optimization problems.
Provided a unified information-theoretic approach to RL performance limits.
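To make the bandit case concrete, the following is a minimal simulation sketch, not the paper's method. It estimates by Monte Carlo the Bayesian regret of Thompson sampling on a Bernoulli multi-armed bandit whose arm means are drawn from a uniform prior. Since the minimum Bayesian regret is the smallest regret any learning policy can achieve, the regret of one particular policy can only upper-bound it. The function name, the choice of Thompson sampling, the uniform prior, and all parameter values are assumptions made for illustration.

```python
import random


def thompson_bayesian_regret(n_arms=3, horizon=200, n_runs=200, seed=0):
    """Monte Carlo estimate of the Bayesian regret of Thompson sampling.

    Assumed setup (illustrative only): Bernoulli arms whose means are drawn
    i.i.d. from a uniform prior, i.e. Beta(1, 1).
    """
    rng = random.Random(seed)
    total_regret = 0.0
    for _ in range(n_runs):
        # Draw an environment from the prior.
        means = [rng.random() for _ in range(n_arms)]
        best_mean = max(means)
        # Beta(1, 1) posterior parameters for each arm.
        alpha = [1] * n_arms
        beta = [1] * n_arms
        for _ in range(horizon):
            # Sample one plausible mean per arm and play the best sample.
            samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n_arms)]
            arm = max(range(n_arms), key=samples.__getitem__)
            reward = 1 if rng.random() < means[arm] else 0
            # Conjugate posterior update for the played arm.
            alpha[arm] += reward
            beta[arm] += 1 - reward
            # Expected per-step gap to an oracle that knows the means.
            total_regret += best_mean - means[arm]
    # Average over environments drawn from the prior.
    return total_regret / n_runs


if __name__ == "__main__":
    print(thompson_bayesian_regret())
```

The regret is accumulated from the arm means rather than from realized rewards, so every term is non-negative and the estimate has lower variance than one based on sampled rewards.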
Abstract
Building on the framework introduced by Xu and Raginsky [1] for supervised learning problems, we study the best achievable performance for model-based Bayesian reinforcement learning problems. With this purpose, we define minimum Bayesian regret (MBR) as the difference between the maximum expected cumulative reward obtainable either by learning from the collected data or by knowing the environment and its dynamics. We specialize this definition to reinforcement learning problems modeled as Markov decision processes (MDPs) whose kernel parameters are unknown to the agent and whose uncertainty is expressed by a prior distribution. One method for deriving upper bounds on the MBR is presented and specific bounds based on the relative entropy and the Wasserstein distance are given. We then focus on two particular cases of MDPs, the multi-armed bandit problem (MAB) and the online optimization…
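The verbal definition of the MBR above can be written as a formula. The notation below is an assumption made for illustration and may differ from the paper's: $\Theta$ denotes the unknown environment parameters drawn from the prior, $T$ the horizon, $\psi$ a policy that observes $\Theta$, $\varphi$ a policy that sees only the data collected so far, and $R_t^{\pi}$ the reward obtained at step $t$ under policy $\pi$.

```latex
\mathrm{MBR}_T
  \;=\;
  \sup_{\psi}\, \mathbb{E}\!\left[\sum_{t=1}^{T} R_t^{\psi}\right]
  \;-\;
  \sup_{\varphi}\, \mathbb{E}\!\left[\sum_{t=1}^{T} R_t^{\varphi}\right]
```

Both expectations are taken over the prior on $\Theta$ and the randomness of the environment. Under this reading the MBR is non-negative, because any policy that learns from data alone is also available to an agent that knows $\Theta$ and simply ignores it.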
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning
