On Online Learning in Kernelized Markov Decision Processes
Sayak Ray Chowdhury, Aditya Gopalan

TL;DR
This paper introduces online learning algorithms for Markov decision processes with continuous state and action spaces. Using kernel methods, the algorithms achieve low regret under both UCB and Thompson Sampling strategies when the transition dynamics satisfy smoothness assumptions.
Contribution
It presents algorithms that use kernel approximation techniques for low-regret learning in episodic MDPs with continuous state and action spaces, assuming the unknown transition dynamics have smoothness induced by a Reproducing Kernel Hilbert Space (RKHS).
Findings
Algorithms with low regret for continuous MDPs
Effective use of kernel methods for smooth transition dynamics
Algorithms given in both UCB and Thompson Sampling (PSRL) variants
Abstract
We develop algorithms with low regret for learning episodic Markov decision processes based on kernel approximation techniques. The algorithms are based on both the Upper Confidence Bound (UCB) as well as Posterior or Thompson Sampling (PSRL) philosophies, and work in the general setting of continuous state and action spaces when the true unknown transition dynamics are assumed to have smoothness induced by an appropriate Reproducing Kernel Hilbert Space (RKHS).
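The two philosophies named in the abstract can be illustrated on a much simpler problem than the one the paper studies. The sketch below is not the authors' algorithm: it fits a Gaussian process with a squared-exponential kernel to a few noisy observations of a scalar function, then picks the next query point once with an upper confidence bound and once by sampling from the posterior. The function names, the kernel choice, and the values of `lengthscale`, `noise`, and `beta` are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(a, b, lengthscale=0.5):
    """Squared-exponential kernel between two 1-D arrays of points."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)


def gp_posterior(x_train, y_train, x_query, noise=0.1):
    """Posterior mean and covariance of a zero-mean GP at x_query."""
    k_tt = rbf_kernel(x_train, x_train) + noise ** 2 * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train)
    k_qq = rbf_kernel(x_query, x_query)
    solved = np.linalg.solve(k_tt, k_qt.T)  # shape (n_train, n_query)
    mean = solved.T @ y_train
    cov = k_qq - k_qt @ solved
    return mean, cov


def ucb_choice(mean, cov, beta=2.0):
    """Optimism: maximize posterior mean plus beta posterior std devs."""
    std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
    return int(np.argmax(mean + beta * std))


def thompson_choice(mean, cov, rng):
    """Posterior sampling: draw one function sample and maximize it."""
    jitter = 1e-8 * np.eye(len(mean))  # numerical stabilizer (assumed value)
    sample = rng.multivariate_normal(mean, cov + jitter, check_valid="ignore")
    return int(np.argmax(sample))


# Toy data: three observations of an assumed target function on [0, 1].
rng = np.random.default_rng(0)
x_train = np.array([0.1, 0.4, 0.9])
y_train = np.sin(2 * np.pi * x_train)
x_query = np.linspace(0.0, 1.0, 50)

mean, cov = gp_posterior(x_train, y_train, x_query)
ucb_idx = ucb_choice(mean, cov)
ts_idx = thompson_choice(mean, cov, rng)
print("UCB picks x =", x_query[ucb_idx])
print("Thompson Sampling picks x =", x_query[ts_idx])
```

In the paper's setting the uncertain object is the transition dynamics of an MDP rather than a scalar function, so each choice would involve planning in an optimistic or sampled model; the sketch only shows how a kernel-based posterior yields both an optimistic index and a posterior sample.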
