Modelling transition dynamics in MDPs with RKHS embeddings

Steffen Grunewalder (University College London); Guy Lever (University College London); Luca Baldassarre (University College London); Massi Pontil (University College London); Arthur Gretton (MPI for Intelligent Systems)

arXiv:1206.4655 · cs.LG · June 22, 2012 · 44 cites

Open Access

TL;DR

This paper introduces a nonparametric RKHS embedding method for modeling transition dynamics in MDPs, enabling efficient policy and value function learning without estimating transition probabilities, and demonstrates superior performance in control and navigation tasks.

Contribution

The paper presents a novel RKHS embedding approach for transition dynamics in MDPs that simplifies computations and guarantees convergence, improving policy and value estimation.

Findings

01. Achieves better performance than Gaussian process-based methods.

02. Provides convergence guarantees for value iteration in MDPs.

03. Effective in control and sensor-based navigation tasks.

Abstract

We propose a new, nonparametric approach to learning and representing transition dynamics in Markov decision processes (MDPs), which can be combined easily with dynamic programming methods for policy optimisation and value estimation. This approach makes use of a recently developed representation of conditional distributions as embeddings in a reproducing kernel Hilbert space (RKHS). Such representations bypass the need for estimating transition probabilities or densities, and apply to any domain on which kernels can be defined. This avoids the need to calculate intractable integrals, since expectations are represented as RKHS inner products whose computation has linear complexity in the number of points used to represent the embedding. We provide guarantees for the proposed applications in MDPs: in the context of a value iteration algorithm, we prove convergence to either the…
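The claim that expectations reduce to inner products of linear cost can be illustrated with a minimal sketch. It assumes the regularised least-squares form of the conditional embedding estimator that is common in the kernel embedding literature, which may differ in detail from the estimator used in the paper. The Gaussian kernel, the bandwidth and regularisation values, the toy dynamics s' = 0.9 s + noise, and all function and variable names are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def gaussian_gram(A, B, bandwidth):
    """Gram matrix with entries exp(-||a - b||^2 / (2 * bandwidth^2))."""
    sq = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * (A @ B.T)
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def embedding_weights(Z, z_query, bandwidth=0.5, reg=1e-4):
    """Weights alpha(z) = (K + reg * n * I)^{-1} k(z) for a query point z.

    Z holds the n observed inputs (state-action pairs in an MDP; here
    just 1-D states, an assumption made to keep the toy small).
    """
    n = Z.shape[0]
    K = gaussian_gram(Z, Z, bandwidth)
    k_query = gaussian_gram(Z, z_query[None, :], bandwidth)[:, 0]
    return np.linalg.solve(K + reg * n * np.eye(n), k_query)


# Toy transition data (hypothetical): s' = 0.9 * s + small Gaussian noise.
rng = np.random.default_rng(0)
n = 200
Z = rng.uniform(-2.0, 2.0, size=(n, 1))
S_next = 0.9 * Z + 0.05 * rng.standard_normal((n, 1))

# Evaluate the function of interest (here f(s') = s') at the sampled
# successor states. In value iteration this would be the current value
# function evaluated at the successors.
f_next = S_next[:, 0]

# Estimated E[f(s') | s = 1.0]: a single dot product over n terms,
# with no transition density estimated and no integral computed.
alpha = embedding_weights(Z, np.array([1.0]))
estimate = float(alpha @ f_next)
print(estimate)
```

For this toy system the true conditional mean at s = 1.0 is 0.9, so the printed estimate can be compared against that value. Computing the weights requires solving an n-by-n linear system, but once they are available each expectation costs only O(n), which is the linear complexity the abstract refers to.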

Taxonomy

Topics: Reinforcement Learning in Robotics · Gaussian Processes and Bayesian Inference · Machine Learning and Algorithms

Methods: Gaussian Process