Stochastic Policy Gradient Ascent in Reproducing Kernel Hilbert Spaces

Santiago Paternain; Juan Andrés Bazerque; Austin Small; Alejandro Ribeiro

arXiv:1807.11274·cs.SY·July 31, 2018

TL;DR

This paper introduces a novel stochastic policy gradient ascent algorithm for reinforcement learning in RKHS, featuring unbiased gradient estimates, variance reduction, and sparse representations, enabling efficient learning of low-complexity policies.

Contribution

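The paper presents a stochastic policy gradient method for policies in an RKHS. Its gradient estimates are unbiased, which supports a proof of convergence to a stationary point of the expected cumulative reward; their variance is reduced using ideas from numerical differentiation; and sparse RKHS representations keep policy complexity under control.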

Findings

01

Successful learning of low-complexity policies

02

Convergence to stationary points demonstrated

03

Numerical examples confirm effectiveness

Abstract

Reinforcement learning consists of finding policies that maximize an expected cumulative long-term reward in a Markov decision process with unknown transition probabilities and instantaneous rewards. In this paper, we consider the problem of finding such optimal policies while assuming they are continuous functions belonging to a reproducing kernel Hilbert space (RKHS). To learn the optimal policy we introduce a stochastic policy gradient ascent algorithm with three unique novel features: (i) The stochastic estimates of policy gradients are unbiased. (ii) The variance of stochastic gradients is reduced by drawing on ideas from numerical differentiation. (iii) Policy complexity is controlled using sparse RKHS representations. Novel feature (i) is instrumental in proving convergence to a stationary point of the expected cumulative reward. Novel feature (ii) facilitates reasonable…
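
To make the setting in the abstract concrete, here is a minimal sketch of a stochastic gradient step for a policy whose mean lives in an RKHS. It is not the authors' algorithm. It assumes a Gaussian policy with mean h(s) = sum_i w_i k(s_i, s), a Gaussian kernel, and a reward-to-go estimate `q_estimate` supplied by the caller. The class name `KernelPolicy`, its parameters, and the weight-threshold `prune` step are hypothetical choices made for illustration. The pruning is a naive stand-in for a sparse representation, and the sketch does not implement the unbiased estimation or variance reduction described above.

```python
import numpy as np


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two state vectors."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-np.dot(d, d) / (2.0 * bandwidth ** 2)))


class KernelPolicy:
    """Hypothetical Gaussian policy with mean h(s) = sum_i w_i k(s_i, s)."""

    def __init__(self, action_dim, sigma=0.5, bandwidth=1.0):
        self.centers = []  # states s_i anchoring the kernel expansion
        self.weights = []  # one weight vector w_i per center
        self.action_dim = action_dim
        self.sigma = sigma  # fixed exploration noise (assumption)
        self.bandwidth = bandwidth

    def mean(self, state):
        """Evaluate h(state) from the current kernel expansion."""
        h = np.zeros(self.action_dim)
        for c, w in zip(self.centers, self.weights):
            h += w * gaussian_kernel(c, state, self.bandwidth)
        return h

    def sample(self, state, rng):
        """Draw an action from N(h(state), sigma^2 I)."""
        noise = rng.standard_normal(self.action_dim)
        return self.mean(state) + self.sigma * noise

    def gradient_step(self, state, action, q_estimate, step_size):
        """Functional gradient ascent step.

        For a Gaussian policy, the gradient of log pi(a|s) with respect
        to h is k(s, .) (a - h(s)) / sigma^2, so each step appends one
        kernel center at the visited state.
        """
        score = (action - self.mean(state)) / self.sigma ** 2
        self.centers.append(np.asarray(state, dtype=float))
        self.weights.append(step_size * q_estimate * score)

    def prune(self, tol):
        """Naive sparsification: drop centers with small weight norm."""
        keep = [i for i, w in enumerate(self.weights)
                if np.linalg.norm(w) > tol]
        self.centers = [self.centers[i] for i in keep]
        self.weights = [self.weights[i] for i in keep]
```

Because every gradient step adds a kernel center, the expansion grows with the number of samples unless it is pruned, which is why some form of complexity control is needed for the learned policy to stay low-complexity.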
