Randomized Policy Learning for Continuous State and Action MDPs

Hiteshi Sharma; Rahul Jain

arXiv:2006.04331·cs.LG·November 17, 2020

TL;DR

This paper introduces RANDPOL, a generalized policy iteration algorithm for MDPs with continuous state and action spaces. It represents both the policy and the value function with randomized networks, which are cheaper to train than fully connected deep networks, and it comes with finite-time performance guarantees and competitive numerical results.

Contribution

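The paper proposes RANDPOL, a generalized policy iteration method for continuous MDPs in which both the policy and the value function are represented with randomized networks. It gives finite-time guarantees on the algorithm's performance and compares it numerically with deep neural network based methods.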

Findings

01 RANDPOL achieves competitive numerical performance on challenging environments.

02 Randomized networks are cheaper to train than fully connected networks and improve numerical performance.

03 Finite-time guarantees are given on the performance of the algorithm.

Abstract

Deep reinforcement learning methods have achieved state-of-the-art results in a variety of challenging, high-dimensional domains ranging from video games to locomotion. The key to their success has been the use of deep neural networks to approximate the policy and value function. Yet substantial tuning of weights is required for good results. We instead use randomized function approximation. Such networks are not only cheaper to train than fully connected networks but also improve numerical performance. We present RANDPOL, a generalized policy iteration algorithm for MDPs with continuous state and action spaces. Both the policy and value functions are represented with randomized networks. We also give finite-time guarantees on the performance of the algorithm. We then show its numerical performance on challenging environments and compare it with deep neural network based…
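To make the idea of randomized function approximation concrete, here is a minimal sketch of one common form of it: random Fourier features, where the hidden-layer weights are sampled once and frozen and only the linear output weights are fit. This is an illustration under stated assumptions, not the paper's construction. The function names (`make_random_features`, `fit_output_weights`), the cosine feature map, the ridge regression fit, and the synthetic target values are all hypothetical choices made for the example.

```python
import numpy as np

def make_random_features(state_dim, num_features, bandwidth=1.0, seed=0):
    """Return a feature map whose hidden weights are sampled once and never trained.

    Assumption: random Fourier (cosine) features; the paper may use a different basis.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / bandwidth, size=(state_dim, num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)

    def features(states):
        states = np.atleast_2d(states)  # accept a single state or a batch
        return np.sqrt(2.0 / num_features) * np.cos(states @ W + b)

    return features

def fit_output_weights(phi, targets, ridge=1e-6):
    """Fit only the linear output layer by ridge-regularized least squares."""
    A = phi.T @ phi + ridge * np.eye(phi.shape[1])
    return np.linalg.solve(A, phi.T @ targets)

# Synthetic stand-in for value targets on sampled continuous states (hypothetical data).
rng = np.random.default_rng(1)
states = rng.uniform(-1.0, 1.0, size=(500, 2))
targets = np.sin(3.0 * states[:, 0]) + states[:, 1] ** 2

features = make_random_features(state_dim=2, num_features=200, bandwidth=0.5)
phi = features(states)                    # shape (500, 200), fixed random hidden layer
theta = fit_output_weights(phi, targets)  # the only trained parameters
mse = float(np.mean((phi @ theta - targets) ** 2))
print(f"training MSE: {mse:.2e}")
```

Because the hidden layer is fixed, fitting reduces to a single linear solve rather than iterative gradient descent over all weights, which is the sense in which such networks are cheaper to train than fully connected ones.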

Taxonomy

Topics: Reinforcement Learning in Robotics · Data Stream Mining Techniques · Auction Theory and Applications