No-Regret Thompson Sampling for Finite-Horizon Markov Decision Processes with Gaussian Processes

Jasmine Bayrooti; Sattar Vakili; Amanda Prorok; Carl Henrik Ek

arXiv:2510.20725 · cs.LG · October 24, 2025

TL;DR

This paper establishes no-regret guarantees for Thompson sampling in finite-horizon episodic RL with Gaussian process models, handling the recursive temporal structure of Bellman updates and extending classical analysis tools such as the elliptical potential lemma.

Contribution

The paper provides the first regret bounds for Thompson sampling (TS) in episodic RL with joint GP priors over rewards and transitions, handling non-Gaussian value functions and the recursive structure of Bellman updates.

Findings

01

Regret bound of $\tilde{O}(K H \Gamma(K H))$ for TS in episodic RL, over $K$ episodes of horizon $H$.

02

Addresses challenges of non-Gaussian value functions and recursive Bellman structure.

03

Extends classical elliptical potential lemma to multi-output Gaussian process settings.
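For orientation, the classical (finite-dimensional, single-output) elliptical potential lemma that this finding refers to can be stated as follows. This is the standard textbook form from the linear bandit literature, not the paper's multi-output GP extension, which is not reproduced on this page. The symbols $x_t$, $V_t$, $\lambda$, $L$, $d$, and $T$ are introduced here for illustration only. Let $x_1, \dots, x_T \in \mathbb{R}^d$ with $\|x_t\|_2 \le L$, let $V_0 = \lambda I$ with $\lambda > 0$, and let $V_t = V_{t-1} + x_t x_t^\top$. Then

$$
\sum_{t=1}^{T} \min\left\{1, \|x_t\|_{V_{t-1}^{-1}}^{2}\right\}
\;\le\; 2 \log \frac{\det V_T}{\det V_0}
\;\le\; 2 d \log\left(1 + \frac{T L^{2}}{d \lambda}\right).
$$

The left-hand side sums the predictive uncertainties at the queried points, and the right-hand side grows only logarithmically in $T$, which is what typically turns per-step uncertainty into a sublinear regret bound.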

Abstract

Thompson sampling (TS) is a powerful and widely used strategy for sequential decision-making, with applications ranging from Bayesian optimization to reinforcement learning (RL). Despite its success, the theoretical foundations of TS remain limited, particularly in settings with complex temporal structure such as RL. We address this gap by establishing no-regret guarantees for TS using models with Gaussian marginal distributions. Specifically, we consider TS in episodic RL with joint Gaussian process (GP) priors over rewards and transitions. We prove a regret bound of $\tilde{O}(K H \Gamma(K H))$ over $K$ episodes of horizon $H$, where $\Gamma(\cdot)$ captures the complexity of the GP model. Our analysis addresses several challenges, including the non-Gaussian nature of value functions and the recursive structure of Bellman updates, and extends classical tools such as the…
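To make the basic mechanism concrete, here is a minimal sketch of GP-based Thompson sampling in the much simpler bandit setting (a single step per round, no transitions), not the episodic RL algorithm analyzed in the paper. Everything in it is an assumption made for illustration: the RBF kernel, the lengthscale and noise level, the 1-D grid of arms, and the hypothetical reward function `true_reward`. Each round draws one function from the GP posterior, plays the arm that maximizes the draw, and then conditions on the noisy observation.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=0.2):
    """Squared-exponential kernel between 1-D point sets a and b (assumed choice)."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)

def gp_posterior(x_obs, y_obs, x_grid, noise=0.1):
    """Posterior mean and covariance of a zero-mean GP evaluated on x_grid."""
    if len(x_obs) == 0:
        # No data yet: the posterior is the prior.
        return np.zeros(len(x_grid)), rbf_kernel(x_grid, x_grid)
    k_oo = rbf_kernel(x_obs, x_obs) + noise**2 * np.eye(len(x_obs))
    k_go = rbf_kernel(x_grid, x_obs)
    mean = k_go @ np.linalg.solve(k_oo, y_obs)
    cov = rbf_kernel(x_grid, x_grid) - k_go @ np.linalg.solve(k_oo, k_go.T)
    return mean, cov

def sample_gaussian(mean, cov, rng):
    """Draw one sample from N(mean, cov), clipping tiny negative eigenvalues."""
    eigvals, eigvecs = np.linalg.eigh(0.5 * (cov + cov.T))
    scale = np.sqrt(np.clip(eigvals, 0.0, None))
    return mean + eigvecs @ (scale * rng.standard_normal(len(mean)))

def true_reward(x):
    """Hypothetical reward function, used only to simulate observations."""
    return np.sin(6.0 * x)

def run_thompson_sampling(n_rounds=30, noise=0.1, seed=0):
    """Run GP Thompson sampling on a grid and return the cumulative regret."""
    rng = np.random.default_rng(seed)
    x_grid = np.linspace(0.0, 1.0, 50)
    best = true_reward(x_grid).max()
    xs, ys, regrets = [], [], []
    for _ in range(n_rounds):
        mean, cov = gp_posterior(np.array(xs), np.array(ys), x_grid, noise)
        draw = sample_gaussian(mean, cov, rng)   # one posterior sample
        x = x_grid[int(np.argmax(draw))]         # act greedily on the sample
        y = true_reward(x) + noise * rng.standard_normal()
        xs.append(x)
        ys.append(y)
        regrets.append(best - true_reward(x))
    return np.cumsum(regrets)

if __name__ == "__main__":
    cumulative = run_thompson_sampling()
    print("cumulative regret after 30 rounds:", cumulative[-1])
```

In the episodic RL setting described in the abstract, the sampled object would instead be a joint model of rewards and transitions, and the greedy step would be replaced by planning over the horizon $H$ in the sampled model; that part is not sketched here.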

Taxonomy

Topics: Advanced Bandit Algorithms Research · Gaussian Processes and Bayesian Inference · Reinforcement Learning in Robotics