No-Regret Thompson Sampling for Finite-Horizon Markov Decision Processes with Gaussian Processes
Jasmine Bayrooti, Sattar Vakili, Amanda Prorok, Carl Henrik Ek

TL;DR
This paper establishes theoretical no-regret guarantees for Thompson sampling in finite-horizon RL using Gaussian process models, addressing complex temporal structures and extending classical analysis tools.
Contribution
It provides the first regret bounds for TS in episodic RL with GP priors over rewards and transitions, handling non-Gaussian value functions and recursive Bellman updates.
Findings
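Regret bound of \(\tilde{\mathcal{O}}(\sqrt{KH\,\Gamma(KH)})\) for TS in episodic RL, with \(K\) episodes of horizon \(H\) and \(\Gamma(\cdot)\) capturing the complexity of the GP model.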
Addresses challenges of non-Gaussian value functions and recursive Bellman structure.
Extends classical elliptical potential lemma to multi-output Gaussian process settings.
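For orientation, the classical elliptical potential lemma can be stated in its standard single-output linear form as follows. This is the textbook version, not the multi-output GP extension developed in the paper; the symbols \(x_t\), \(V_t\), \(\lambda\), \(d\), and \(T\) are notation assumed here for illustration.

```latex
% Classical (single-output, linear) elliptical potential lemma.
% Assumed notation: x_1, ..., x_T in R^d with ||x_t||_2 <= 1, lambda > 0.
\[
  V_0 = \lambda I, \qquad
  V_t = V_0 + \sum_{s=1}^{t} x_s x_s^{\top},
\]
\[
  \sum_{t=1}^{T} \min\!\left(1,\; \lVert x_t \rVert_{V_{t-1}^{-1}}^{2}\right)
  \;\le\; 2 \log \frac{\det V_T}{\det V_0}
  \;\le\; 2 d \log\!\left(1 + \frac{T}{d\lambda}\right).
\]
```

The lemma controls the cumulative predictive uncertainty along the observed sequence, which is the quantity a regret analysis typically needs to sum over rounds.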
Abstract
Thompson sampling (TS) is a powerful and widely used strategy for sequential decision-making, with applications ranging from Bayesian optimization to reinforcement learning (RL). Despite its success, the theoretical foundations of TS remain limited, particularly in settings with complex temporal structure such as RL. We address this gap by establishing no-regret guarantees for TS using models with Gaussian marginal distributions. Specifically, we consider TS in episodic RL with joint Gaussian process (GP) priors over rewards and transitions. We prove a regret bound of \(\tilde{\mathcal{O}}(\sqrt{KH\,\Gamma(KH)})\) over \(K\) episodes of horizon \(H\), where \(\Gamma(\cdot)\) captures the complexity of the GP model. Our analysis addresses several challenges, including the non-Gaussian nature of value functions and the recursive structure of Bellman updates, and extends classical tools such as the…
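The episodic TS loop described in the abstract can be illustrated with a heavily simplified sketch. It is not the paper's algorithm: it assumes a toy chain MDP with known deterministic transitions, places a GP prior (RBF kernel) over rewards only, and plans by exact backward induction. All names here (`make_chain`, `gp_posterior`, `plan`, `run_thompson_sampling`) are hypothetical and introduced only for illustration.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0):
    """Squared-exponential kernel between rows of a (n, d) and b (m, d)."""
    sq = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / lengthscale**2)

def gp_posterior(x_train, y_train, x_query, noise=0.1):
    """GP posterior mean and covariance at x_query (zero prior mean)."""
    k_qq = rbf_kernel(x_query, x_query)
    if len(x_train) == 0:
        return np.zeros(len(x_query)), k_qq
    k_tt = rbf_kernel(x_train, x_train) + noise**2 * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train)
    solved = np.linalg.solve(k_tt, np.column_stack([y_train, k_qt.T]))
    return k_qt @ solved[:, 0], k_qq - k_qt @ solved[:, 1:]

def make_chain(n_states):
    """Toy chain (an assumption): action 0 moves left, action 1 moves right.
    Reward 1 for taking 'right' in the last state, 0 elsewhere."""
    states = np.arange(n_states)
    next_state = np.stack(
        [np.maximum(states - 1, 0), np.minimum(states + 1, n_states - 1)], axis=1
    )
    reward = np.zeros((n_states, 2))
    reward[-1, 1] = 1.0
    return next_state, reward

def plan(reward, next_state, horizon):
    """Backward induction (finite-horizon Bellman recursion) for a known model."""
    value = np.zeros(reward.shape[0])
    policy = np.zeros((horizon, reward.shape[0]), dtype=int)
    for h in reversed(range(horizon)):
        q = reward + value[next_state]      # Q_h(s, a) = r(s, a) + V_{h+1}(s')
        policy[h] = q.argmax(axis=1)
        value = q.max(axis=1)
    return policy

def run_thompson_sampling(n_episodes=30, horizon=6, n_states=4, noise=0.1, seed=0):
    rng = np.random.default_rng(seed)
    next_state, true_reward = make_chain(n_states)
    # GP inputs are (state, action) pairs.
    features = np.array([[s, a] for s in range(n_states) for a in range(2)], float)
    xs, ys, returns = [], [], []
    for _ in range(n_episodes):
        # 1. Sample a reward function from the current GP posterior.
        mean, cov = gp_posterior(np.array(xs).reshape(-1, 2), np.array(ys), features, noise)
        sample = rng.multivariate_normal(mean, cov + 1e-8 * np.eye(len(mean)))
        # 2. Plan greedily with respect to the sampled model.
        policy = plan(sample.reshape(n_states, 2), next_state, horizon)
        # 3. Roll out one episode and record noisy reward observations.
        s, total = 0, 0.0
        for h in range(horizon):
            a = policy[h, s]
            xs.append([s, a])
            ys.append(true_reward[s, a] + noise * rng.normal())
            total += true_reward[s, a]
            s = next_state[s, a]
        returns.append(total)
    return np.array(returns)

if __name__ == "__main__":
    print(run_thompson_sampling())
```

Each episode follows the sample-plan-act pattern: draw one model from the posterior, act optimally for that draw, then condition on the new data. The paper's setting additionally places a GP prior over transitions, which this sketch sidesteps by assuming they are known.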
Taxonomy
Topics: Advanced Bandit Algorithms Research · Gaussian Processes and Bayesian Inference · Reinforcement Learning in Robotics
