An Information-Theoretic Analysis of Bayesian Reinforcement Learning

Amaury Gouverneur, Borja Rodríguez-Gálvez, Tobias J. Oechtering, and Mikael Skoglund

arXiv:2207.08735 · cs.LG · July 19, 2022


TL;DR

This paper applies information-theoretic methods to analyze the fundamental limits of Bayesian reinforcement learning, deriving bounds on the minimum Bayesian regret for MDPs, including bandit and online optimization problems.

Contribution

It introduces a framework for bounding Bayesian regret in model-based RL using information-theoretic measures like relative entropy and Wasserstein distance.
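The two divergences named above can be computed directly for discrete distributions. The sketch below is illustrative only: it assumes finitely supported distributions on the real line, and the helper names `kl_divergence` and `wasserstein_1` are hypothetical, not taken from the paper. It does not reproduce the paper's regret bounds; it only shows what each quantity measures.

```python
import math


def kl_divergence(p, q):
    """Relative entropy D(p || q) in nats for discrete distributions on a common support."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # convention: 0 * log(0 / q) = 0
        if qi == 0.0:
            return math.inf  # p puts mass where q does not
        total += pi * math.log(pi / qi)
    return total


def wasserstein_1(support, p, q):
    """1-Wasserstein distance between two distributions on the same sorted real support.

    In one dimension W1 is the integral of |F_p(x) - F_q(x)| dx; for discrete
    distributions this is a finite sum over the gaps between support points.
    """
    dist, cdf_p, cdf_q = 0.0, 0.0, 0.0
    for i in range(len(support) - 1):
        cdf_p += p[i]
        cdf_q += q[i]
        dist += abs(cdf_p - cdf_q) * (support[i + 1] - support[i])
    return dist


# Two hypothetical reward distributions on {0, 0.5, 1}.
support = [0.0, 0.5, 1.0]
p = [0.2, 0.3, 0.5]
q = [0.5, 0.3, 0.2]
print(round(kl_divergence(p, q), 4))          # 0.2749
print(round(wasserstein_1(support, p, q), 4))  # 0.3
```

Relative entropy becomes infinite whenever q assigns zero mass where p does not, while the Wasserstein distance stays finite and reflects how far probability mass must move on the reward scale; that contrast is one plausible reason to consider bounds in terms of both.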

Findings

1. Derived upper bounds on Bayesian regret for MDPs.
2. Reproduced and improved existing bounds for bandit and online optimization problems.
3. Provided a unified information-theoretic approach to RL performance limits.
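For comparison, a well-known information-theoretic bandit bound is that of Russo and Van Roy (2016) for Thompson sampling with rewards in [0, 1]: the Bayesian regret over T rounds is at most sqrt(K · H(A*) · T / 2), where K is the number of arms and H(A*) is the prior entropy of the optimal arm, and hence at most sqrt(K · T · log K / 2). The sketch below only evaluates that expression as a reference point; it is not claimed to be the bound derived in this paper, and the function name `bandit_regret_bound` is hypothetical.

```python
import math
from typing import Optional


def bandit_regret_bound(num_arms: int, horizon: int,
                        entropy_opt_arm: Optional[float] = None) -> float:
    """Evaluate sqrt(K * H(A*) * T / 2), the Russo and Van Roy Thompson-sampling bound.

    If the entropy of the optimal arm is not supplied, use its worst case log K
    (a prior under which every arm is equally likely to be optimal).
    """
    if entropy_opt_arm is None:
        entropy_opt_arm = math.log(num_arms)
    return math.sqrt(0.5 * num_arms * entropy_opt_arm * horizon)


# The bound grows like sqrt(T): quadrupling the horizon doubles it.
print(bandit_regret_bound(10, 1000))
print(bandit_regret_bound(10, 4000))
```

A more informative prior (smaller H(A*)) tightens the bound, which is the kind of prior dependence that information-theoretic analyses make explicit.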

Abstract

Building on the framework introduced by Xu and Raginsky [1] for supervised learning problems, we study the best achievable performance for model-based Bayesian reinforcement learning problems. To this end, we define the minimum Bayesian regret (MBR) as the difference between the maximum expected cumulative reward obtainable either by learning from the collected data or by knowing the environment and its dynamics. We specialize this definition to reinforcement learning problems modeled as Markov decision processes (MDPs) whose kernel parameters are unknown to the agent and whose uncertainty is expressed by a prior distribution. We present a method for deriving upper bounds on the MBR and give specific bounds based on the relative entropy and the Wasserstein distance. We then focus on two particular cases of MDPs, the multi-armed bandit problem (MAB) and the online optimization…
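Because the MBR compares an agent that knows the environment with the best possible learning agent, the Bayesian regret of any particular learning policy is an upper bound on it. The sketch below estimates that quantity by Monte Carlo in one assumed setting: a Bernoulli multi-armed bandit (an MDP with a single state) with independent Uniform(0, 1) priors on the arm means, using Thompson sampling as the learning policy. The setting, the policy, and the function name `thompson_bayesian_regret` are assumptions for illustration, not the paper's construction.

```python
import random


def thompson_bayesian_regret(num_arms=3, horizon=200, num_runs=200, seed=0):
    """Monte Carlo estimate of the Bayesian regret of Thompson sampling.

    Environment: Bernoulli bandit whose arm means are drawn from a Beta(1, 1)
    prior. Since the MBR is the regret of the best learning policy, this
    estimate (for one specific policy) upper-bounds the MBR, up to Monte Carlo error.
    """
    rng = random.Random(seed)
    total_regret = 0.0
    for _ in range(num_runs):
        # Draw the unknown environment parameters from the prior.
        theta = [rng.random() for _ in range(num_arms)]
        best_mean = max(theta)  # per-round reward of an agent that knows theta
        alpha = [1] * num_arms  # Beta posterior: 1 + number of successes
        beta = [1] * num_arms   # Beta posterior: 1 + number of failures
        for _ in range(horizon):
            # Thompson sampling: sample a mean per arm from the posterior, play the argmax.
            samples = [rng.betavariate(alpha[a], beta[a]) for a in range(num_arms)]
            arm = max(range(num_arms), key=samples.__getitem__)
            reward = 1 if rng.random() < theta[arm] else 0
            alpha[arm] += reward
            beta[arm] += 1 - reward
            # Accumulate expected (not realized) regret to reduce variance.
            total_regret += best_mean - theta[arm]
    return total_regret / num_runs


print(round(thompson_bayesian_regret(), 2))
```

Swapping in a different policy gives a different, possibly looser, upper bound. Computing the MBR exactly would require optimizing over all learning policies, which is generally intractable; this is where analytical upper bounds are useful.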


Taxonomy

Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning