Posterior Sampling Reinforcement Learning with Gaussian Processes for Continuous Control: Sublinear Regret Bounds for Unbounded State Spaces
Hamish Flynn, Joe Watson, Ingmar Posner, Jan Peters

TL;DR
This paper provides a theoretical analysis of Gaussian process posterior sampling reinforcement learning (GP-PSRL) for continuous control, establishing sublinear regret bounds that hold even when the state space is unbounded.
Contribution
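The paper derives the first Bayesian regret bounds for GP-PSRL that both achieve tight dependence on the maximum information gain and account for the unbounded state space, using a recursive application of the Borell-Tsirelson-Ibragimov-Sudakov (Borell-TIS) inequality together with the chaining method.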
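For reference, here is the standard textbook form of the Borell-TIS inequality named in the abstract. It is general Gaussian-process theory, not reproduced from this paper, and how the authors apply it recursively to the visited states is not shown on this page. Let \((f_x)_{x \in \mathcal{X}}\) be a centred Gaussian process that is almost surely bounded on \(\mathcal{X}\), and let \(\sigma_{\mathcal{X}}^2 = \sup_{x \in \mathcal{X}} \mathbb{E}[f_x^2]\) be its largest variance. Then \(\mathbb{E}[\sup_{x} f_x]\) is finite and, for every \(u > 0\),

```latex
\mathbb{P}\Big(\sup_{x \in \mathcal{X}} f_x \;-\; \mathbb{E}\Big[\sup_{x \in \mathcal{X}} f_x\Big] > u\Big)
\;\le\; \exp\!\left(-\frac{u^2}{2\,\sigma_{\mathcal{X}}^2}\right).
```

In words: the supremum of a Gaussian process rarely exceeds its expected value by more than a few multiples of \(\sigma_{\mathcal{X}}\), which is the kind of statement needed to argue that the states visited stay inside a ball of controlled radius.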
Findings
Regret bound of order \(H^{3/2}\sqrt{\frac{\text{max info gain}}{T}}\,T\), i.e. \(H^{3/2}\sqrt{(\text{max info gain})\,T}\)
With high probability, the states visited by the algorithm lie within a ball of near-constant radius
Provides a theoretical foundation for analyzing GP-PSRL in environments with unbounded state spaces
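The bound depends on the maximum information gain: the largest mutual information between a GP-distributed function and T noisy observations of it, \(\tfrac{1}{2}\log\det(I + \sigma^{-2}K)\), maximised over all input sets of size T. As a hedged numerical illustration (not from the paper), the Python sketch below computes this quantity for one fixed, evenly spaced set of 1-D inputs under an RBF kernel, which only gives a lower bound on the maximum, and evaluates the order of the regret bound with constants and logarithmic factors dropped. The kernel, lengthscale, noise variance, input interval [-3, 3] and horizon H = 10 are arbitrary assumptions.

```python
import numpy as np


def rbf_kernel(x, y, lengthscale=1.0):
    """Squared-exponential kernel matrix between 1-D input arrays x and y."""
    diff = x[:, None] - y[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)


def information_gain(x, noise_var=1.0, lengthscale=1.0):
    """Mutual information 0.5 * log det(I + K / noise_var) for the inputs x."""
    k = rbf_kernel(x, x, lengthscale)
    # slogdet is numerically safer than log(det(...)) for large matrices.
    _, logdet = np.linalg.slogdet(np.eye(len(x)) + k / noise_var)
    return 0.5 * logdet


def regret_bound_order(horizon, total_steps, info_gain):
    """H^{3/2} * sqrt(info_gain / T) * T, ignoring constants and log factors."""
    return horizon ** 1.5 * np.sqrt(info_gain / total_steps) * total_steps


for n in (50, 200, 800):
    # Inputs confined to a fixed interval, echoing the bounded-radius finding.
    x = np.linspace(-3.0, 3.0, n)
    gamma = information_gain(x)
    print(n, round(gamma, 2), regret_bound_order(10, n, gamma) / n)
```

For this input set the information gain grows only slowly while T grows linearly, so the bound divided by T shrinks as T increases, which is what "sublinear regret" means.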
Abstract
We analyze the Bayesian regret of the Gaussian process posterior sampling reinforcement learning (GP-PSRL) algorithm. Posterior sampling is an effective heuristic for decision-making under uncertainty that has been used to develop successful algorithms for a variety of continuous control problems. However, theoretical work on GP-PSRL is limited. All known regret bounds either fail to achieve a tight dependence on a kernel-dependent quantity called the maximum information gain or fail to properly account for the fact that the set of possible system states is unbounded. Through a recursive application of the Borell-Tsirelson-Ibragimov-Sudakov inequality, we show that, with high probability, the states actually visited by the algorithm are contained within a ball of near-constant radius. To obtain tight dependence on the maximum information gain, we use the chaining method to control the…
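The abstract describes GP-PSRL only at a high level. As a rough illustration of the posterior-sampling loop (sample a dynamics model from the GP posterior, plan as if that sample were the true system, act, then update the posterior), here is a minimal Python sketch. It is not the authors' algorithm and rests on several assumptions that do not come from the paper: a hypothetical 1-D system, a random-Fourier-feature approximation to an RBF-kernel GP (a swapped-in technique, not exact GP sampling), a known nominal model x + a with the GP learning only the residual, and a planner that simply searches over clipped linear-feedback gains.

```python
import numpy as np

rng = np.random.default_rng(0)

HORIZON = 10        # H: steps per episode
NOISE_STD = 0.05    # transition noise standard deviation (assumed known)
N_FEATURES = 200    # number of random Fourier features
LENGTHSCALE = 1.0   # RBF lengthscale (arbitrary choice)


def true_step(x, a):
    """Hypothetical system; the agent does not know the 0.3 * sin(x) residual."""
    return x + a + 0.3 * np.sin(x) + NOISE_STD * rng.standard_normal()


# Random Fourier features: features(x, a) @ w with w ~ N(0, I) approximates
# a GP with a unit-variance RBF kernel over the input z = (x, a).
W = rng.standard_normal((N_FEATURES, 2)) / LENGTHSCALE
B = rng.uniform(0.0, 2.0 * np.pi, N_FEATURES)


def features(x, a):
    return np.sqrt(2.0 / N_FEATURES) * np.cos(W @ np.array([x, a]) + B)


def sample_posterior_weights(phi, targets):
    """Draw one weight vector from the Bayesian linear-regression posterior."""
    precision = np.eye(N_FEATURES) + phi.T @ phi / NOISE_STD**2
    mean = np.linalg.solve(precision, phi.T @ targets / NOISE_STD**2)
    chol = np.linalg.cholesky(precision)
    # If precision = L L^T, then solve(L^T, eps) has covariance precision^{-1}.
    return mean + np.linalg.solve(chol.T, rng.standard_normal(N_FEATURES))


def policy(x, gain):
    return float(np.clip(-gain * x, -1.0, 1.0))


def plan(w, x0, gains):
    """Pick the feedback gain with the lowest cost under the sampled model."""
    best_gain, best_cost = gains[0], np.inf
    for gain in gains:
        x, cost = x0, 0.0
        for _ in range(HORIZON):
            a = policy(x, gain)
            x = x + a + features(x, a) @ w  # nominal model plus sampled residual
            cost += x**2
        if cost < best_cost:
            best_gain, best_cost = gain, cost
    return best_gain


def run_psrl(n_episodes=5, x0=2.0):
    phi_rows, targets, episode_costs = [], [], []
    gains = np.linspace(0.0, 2.0, 21)
    for _ in range(n_episodes):
        phi = np.array(phi_rows).reshape(-1, N_FEATURES)
        w = sample_posterior_weights(phi, np.array(targets))  # posterior sample
        gain = plan(w, x0, gains)  # act optimally for the sampled model
        x, cost = x0, 0.0
        for _ in range(HORIZON):
            a = policy(x, gain)
            x_next = true_step(x, a)
            phi_rows.append(features(x, a))
            targets.append(x_next - (x + a))  # residual the GP has to learn
            x = x_next
            cost += x**2
        episode_costs.append(cost)
    return episode_costs, len(targets)


costs, n_transitions = run_psrl()
print([round(c, 2) for c in costs], n_transitions)
```

Sampling once per episode and then committing to the sampled model for the whole episode is what distinguishes posterior sampling from acting greedily on the posterior mean: the randomness of the sample drives exploration.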
Taxonomy
Topics: Advanced Bandit Algorithms Research · Gaussian Processes and Bayesian Inference · Reinforcement Learning in Robotics
