On Online Learning in Kernelized Markov Decision Processes

Sayak Ray Chowdhury; Aditya Gopalan

arXiv:1911.01871·cs.LG·November 6, 2019


TL;DR

This paper introduces algorithms for online learning in episodic Markov decision processes with continuous state and action spaces. The algorithms use kernel methods and achieve low regret with UCB and Thompson Sampling strategies under smoothness assumptions on the transition dynamics.

Contribution

It presents algorithms that use kernel approximation techniques to achieve low regret in continuous MDPs, extending UCB- and posterior-sampling-based learning to the general setting of continuous state and action spaces.
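To make "kernel approximation" concrete, here is a minimal sketch of the kind of estimate such methods build. It is not the authors' algorithm: it fits a Gaussian-process (kernel ridge) posterior over an unknown transition function from observed (state, action) to next-state data. The squared-exponential kernel, the noise level, the toy dynamics and the helper names `rbf_kernel` and `gp_posterior` are all assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(X, Y, lengthscale=1.0):
    """Squared-exponential kernel k(x, y) = exp(-||x - y||^2 / (2 l^2)); an assumed choice."""
    sq_dists = (
        np.sum(X**2, axis=1)[:, None]
        + np.sum(Y**2, axis=1)[None, :]
        - 2.0 * X @ Y.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * lengthscale**2))


def gp_posterior(X_train, y_train, X_query, noise=0.1, lengthscale=1.0):
    """Posterior mean and standard deviation of a zero-mean GP at X_query."""
    K = rbf_kernel(X_train, X_train, lengthscale) + noise**2 * np.eye(len(X_train))
    k_star = rbf_kernel(X_train, X_query, lengthscale)         # shape (n, m)
    L = np.linalg.cholesky(K)                                  # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))  # alpha = K^{-1} y
    mean = k_star.T @ alpha
    v = np.linalg.solve(L, k_star)                             # L^{-1} k_*
    var = 1.0 - np.sum(v**2, axis=0)                           # prior variance k(x, x) = 1
    return mean, np.sqrt(np.maximum(var, 0.0))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical toy dynamics: s' = 0.9 s + 0.5 sin(a) + small noise.
    Z = rng.uniform(-1.0, 1.0, size=(30, 2))                   # columns: state, action
    s_next = 0.9 * Z[:, 0] + 0.5 * np.sin(Z[:, 1]) + 0.05 * rng.standard_normal(30)
    queries = np.array([[0.2, -0.3], [5.0, 5.0]])              # near vs far from data
    mean, std = gp_posterior(Z, s_next, queries)
    print(mean, std)
```

The posterior standard deviation is small near visited (state, action) pairs and approaches the prior value of 1 far from them. This uncertainty is what confidence-based (UCB) and sampling-based (Thompson/PSRL) methods use to balance exploration against exploitation.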

Findings

1. Algorithms with low regret for continuous MDPs
2. Effective use of kernel methods for smooth transition dynamics
3. Applicable to both UCB and Thompson Sampling frameworks
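For reference, regret in the episodic setting is usually defined as the cumulative value gap between an optimal policy and the policies the learner actually plays. In the standard form below, the notation ($\tau$ episodes of horizon $H$, true MDP $M$, optimal policy $\pi^\star$, policy $\pi_l$ played in episode $l$ from initial state $s_{l,1}$) is assumed and may differ from the paper's.

```latex
\mathrm{Regret}(T) \;=\; \sum_{l=1}^{\tau} \Big( V^{M}_{\pi^\star,1}\big(s_{l,1}\big) - V^{M}_{\pi_l,1}\big(s_{l,1}\big) \Big),
\qquad T = \tau H .
```

"Low regret" means this sum grows sublinearly in $T$, so the average per-episode gap to the optimal value tends to zero.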

Abstract

We develop algorithms with low regret for learning episodic Markov decision processes based on kernel approximation techniques. The algorithms are based on both the Upper Confidence Bound (UCB) as well as Posterior or Thompson Sampling (PSRL) philosophies, and work in the general setting of continuous state and action spaces when the true unknown transition dynamics are assumed to have smoothness induced by an appropriate Reproducing Kernel Hilbert Space (RKHS).
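The abstract contrasts two exploration philosophies. The following is a simplified, one-step (bandit-style) sketch of how each turns a kernel posterior into a decision. It is not the paper's episodic algorithm, which plans over a whole episode under an optimistic or sampled transition model. The kernel, the exploration weight `beta`, the toy data and all function names (`rbf`, `posterior`, `ucb_choice`, `thompson_choice`) are hypothetical.

```python
import numpy as np


def rbf(A, B, lengthscale=0.5):
    """Squared-exponential kernel matrix between the rows of A and B (assumed kernel)."""
    sq_dists = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * lengthscale**2))


def posterior(X, y, Xq, noise=0.1, lengthscale=0.5):
    """Joint GP posterior mean vector and covariance matrix at the query points Xq."""
    K = rbf(X, X, lengthscale) + noise**2 * np.eye(len(X))
    Ks = rbf(X, Xq, lengthscale)
    mean = Ks.T @ np.linalg.solve(K, y)
    cov = rbf(Xq, Xq, lengthscale) - Ks.T @ np.linalg.solve(K, Ks)
    return mean, 0.5 * (cov + cov.T)  # symmetrize away round-off error


def ucb_choice(mean, cov, beta=2.0):
    """Optimism: pick the candidate with the largest upper confidence bound."""
    std = np.sqrt(np.clip(np.diag(cov), 0.0, None))
    return int(np.argmax(mean + beta * std))


def thompson_choice(mean, cov, rng):
    """Posterior sampling: draw one plausible function and act greedily on it."""
    jitter = 1e-8 * np.eye(len(mean))  # small diagonal term for numerical stability
    sample = rng.multivariate_normal(mean, cov + jitter)
    return int(np.argmax(sample))


if __name__ == "__main__":
    X = np.linspace(0.0, 0.5, 6)[:, None]        # actions tried so far
    y = np.sin(3.0 * X[:, 0])                     # hypothetical observed values
    candidates = np.linspace(0.0, 1.0, 11)[:, None]
    mean, cov = posterior(X, y, candidates)
    print("UCB picks", candidates[ucb_choice(mean, cov, beta=10.0), 0])
    print("TS picks", candidates[thompson_choice(mean, cov, np.random.default_rng(1)), 0])
```

With a large `beta`, UCB is drawn toward the unexplored end of the candidate range, where the posterior is widest; Thompson sampling explores by randomness instead, since a single posterior draw is sometimes optimistic in regions the data has not yet ruled out.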
