Proximal Reinforcement Learning: A New Theory of Sequential Decision Making in Primal-Dual Spaces

Sridhar Mahadevan; Bo Liu; Philip Thomas; Will Dabney; Steve Giguere; Nicholas Jacek; Ian Gemp; Ji Liu

arXiv:1405.6757 · cs.LG · May 28, 2014 · 46 cites

TL;DR

This paper introduces proximal reinforcement learning, a new theoretical framework using primal-dual spaces and proximal operators to develop reliable, safe, and stable RL algorithms with convergence guarantees.

Contribution

It presents a novel proximal operator-based framework that unifies and generalizes existing RL algorithms, enabling stable off-policy learning and integration with stochastic optimization.
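For readers unfamiliar with the term, the standard textbook definition of a proximal operator can be written as below. This is a sketch for orientation only: the symbols f (a convex function), \alpha (a step size), and x, u (parameter vectors) are notation chosen here and are not taken from the paper. The second line shows the well-known fact that an ordinary gradient step is itself a proximal step applied to a linearized objective.

```latex
\operatorname{prox}_{\alpha f}(x)
  = \arg\min_{u} \left( f(u) + \frac{1}{2\alpha} \lVert u - x \rVert_2^2 \right)

x_t - \alpha \nabla f(x_t)
  = \arg\min_{u} \left( \langle \nabla f(x_t),\, u \rangle
      + \frac{1}{2\alpha} \lVert u - x_t \rVert_2^2 \right)
```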

Findings

01 Provides a rigorous mathematical foundation for RL algorithms.

02 Develops operator splitting methods for safe gradient decomposition.

03 Shows connections between natural gradient, mirror descent, and proximal methods.
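As a rough illustration of how mirror descent relates to ordinary gradient methods, the following minimal sketch implements one generic mirror descent step: map the parameters to a dual space, take a gradient step there, and map back. It is not the paper's algorithm. The function names (`mirror_descent_step`, `euclidean_step`, `entropy_step`) and the choice of the two distance-generating functions are assumptions made for illustration.

```python
import numpy as np

def mirror_descent_step(w, grad, step, to_dual, to_primal):
    """One generic mirror descent step (hypothetical helper, for illustration).

    to_dual is the gradient of a strictly convex function psi;
    to_primal is its inverse map.
    """
    theta = to_dual(w)            # primal -> dual
    theta = theta - step * grad   # gradient step taken in the dual space
    return to_primal(theta)       # dual -> primal

def euclidean_step(w, grad, step):
    # psi(w) = 0.5 * ||w||^2: both maps are the identity,
    # so the update is plain gradient descent.
    identity = lambda v: v
    return mirror_descent_step(w, grad, step, identity, identity)

def entropy_step(w, grad, step):
    # psi(w) = sum(w * log(w) - w) on positive weights: the maps are log and exp,
    # so the update is the multiplicative (exponentiated gradient) rule.
    return mirror_descent_step(w, grad, step, np.log, np.exp)
```

With the Euclidean choice the step equals `w - step * grad`; with the entropy choice it equals `w * np.exp(-step * grad)`, which keeps the weights positive without an explicit constraint.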

Abstract

In this paper, we set forth a new vision of reinforcement learning developed by us over the past few years, one that yields mathematically rigorous solutions to longstanding important questions that have remained unresolved: (i) how to design reliable, convergent, and robust reinforcement learning algorithms; (ii) how to guarantee that reinforcement learning satisfies pre-specified "safety" guarantees and remains in a stable region of the parameter space; (iii) how to design "off-policy" temporal difference learning algorithms in a reliable and stable manner; and finally (iv) how to integrate the study of reinforcement learning into the rich theory of stochastic optimization. In this paper, we provide detailed answers to all these questions using the powerful framework of proximal operators. The key idea that emerges is the use of primal-dual spaces connected through the use of a…

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Model Reduction and Neural Networks