Policy Evaluation in Continuous MDPs with Efficient Kernelized Gradient Temporal Difference

Alec Koppel; Garrett Warnell; Ethan Stump; Peter Stone; Alejandro Ribeiro

arXiv:1709.04221·math.OC·May 19, 2020·IEEE Trans. Autom. Control.

TL;DR

This paper introduces a memory-efficient, non-parametric stochastic method for policy evaluation in continuous MDPs, leveraging kernelized gradient TD learning to achieve faster convergence with less memory.

Contribution

It extends gradient temporal difference learning to a non-parametric, kernel-based setting with guaranteed convergence and improved efficiency in continuous state spaces.

Findings

01 Faster convergence to lower Bellman error in the Mountain Car domain

02 Convergence with finite memory and finite complexity

03 Outperforms existing methods in efficiency and accuracy

Abstract

We consider policy evaluation in infinite-horizon discounted Markov decision problems (MDPs) with infinite spaces. We reformulate this task as a compositional stochastic program with a function-valued decision variable that belongs to a reproducing kernel Hilbert space (RKHS). We approach this problem via a new functional generalization of stochastic quasi-gradient methods operating in tandem with stochastic sparse subspace projections. The result is an extension of gradient temporal difference learning that yields nonlinearly parameterized value function estimates of the solution to the Bellman evaluation equation. Our main contribution is a memory-efficient non-parametric stochastic method guaranteed to converge exactly to the Bellman fixed point with probability $1$ with attenuating step-sizes. Further, with constant step-sizes, we obtain mean convergence to a neighborhood and that the…
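
To make the idea in the abstract concrete, here is a minimal sketch of a kernelized gradient-TD-style update. It is not the authors' algorithm. It assumes a Gaussian kernel, a scalar running average `z` of the temporal-difference error as the quasi-gradient auxiliary variable, and a quasi-gradient step on the squared expected TD error. The paper's sparse subspace projection is replaced by a much cruder stand-in that merges a new kernel center into an existing one when the two are nearly identical. All function names, parameters, and default values are hypothetical.

```python
import numpy as np


def gaussian_kernel(x, centers, bandwidth=0.5):
    """Gaussian (RBF) kernel between one state x and each row of centers."""
    diff = centers - np.asarray(x, dtype=float)
    return np.exp(-np.sum(diff * diff, axis=1) / (2.0 * bandwidth ** 2))


def evaluate(centers, weights, s, bandwidth=0.5):
    """Value estimate V(s) = sum_i weights[i] * k(centers[i], s)."""
    if len(weights) == 0:
        return 0.0
    return float(weights @ gaussian_kernel(s, centers, bandwidth))


def _add_atom(centers, weights, point, coeff, bandwidth, novelty_tol):
    """Add coeff * k(point, .) to the expansion.

    Assumption: a simple novelty test stands in for a sparse projection.
    If an existing center is almost identical to `point`, its weight
    absorbs the coefficient; otherwise a new center is appended.
    """
    if len(weights) > 0:
        sims = gaussian_kernel(point, centers, bandwidth)
        j = int(np.argmax(sims))
        if sims[j] >= 1.0 - novelty_tol:
            weights = weights.copy()
            weights[j] += coeff
            return centers, weights
    return np.vstack([centers, point]), np.append(weights, coeff)


def kernel_gtd_sketch(transitions, gamma=0.9, alpha=0.1, beta=0.5,
                      reg=1e-3, bandwidth=0.5, novelty_tol=1e-3):
    """Run one pass over (state, reward, next_state) samples.

    Returns (centers, weights) of the kernel expansion of the value estimate.
    """
    dim = len(transitions[0][0])
    centers = np.empty((0, dim))
    weights = np.empty(0)
    z = 0.0  # running average of the TD error (auxiliary variable)

    for s, r, s_next in transitions:
        s = np.asarray(s, dtype=float)
        s_next = np.asarray(s_next, dtype=float)

        # TD error under the current estimate, then its running average.
        delta = (r + gamma * evaluate(centers, weights, s_next, bandwidth)
                 - evaluate(centers, weights, s, bandwidth))
        z = (1.0 - beta) * z + beta * delta

        # Regularization shrinks the existing coefficients.
        weights = (1.0 - alpha * reg) * weights

        # Functional step: V <- V - alpha * z * (gamma * k(s', .) - k(s, .)).
        centers, weights = _add_atom(centers, weights, s, alpha * z,
                                     bandwidth, novelty_tol)
        centers, weights = _add_atom(centers, weights, s_next,
                                     -alpha * gamma * z,
                                     bandwidth, novelty_tol)
    return centers, weights
```

As a sanity check, consider a single state that loops to itself with reward 1 and discount 0.5, whose true value is 1 / (1 - 0.5) = 2. Calling `kernel_gtd_sketch([([0.0], 1.0, [0.0])] * 1000, gamma=0.5)` and then `evaluate(centers, weights, [0.0])` should give an estimate close to 2, slightly shrunk by the regularizer. The merge rule keeps the number of stored centers bounded in this toy case, but it carries none of the guarantees the abstract claims for the actual projection scheme.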
