Proximal Reinforcement Learning: A New Theory of Sequential Decision Making in Primal-Dual Spaces
Sridhar Mahadevan, Bo Liu, Philip Thomas, Will Dabney, Steve Giguere, Nicholas Jacek, Ian Gemp, Ji Liu

TL;DR
This paper introduces proximal reinforcement learning, a new theoretical framework using primal-dual spaces and proximal operators to develop reliable, safe, and stable RL algorithms with convergence guarantees.
Contribution
It presents a novel proximal operator-based framework that unifies and generalizes existing RL algorithms, enabling stable off-policy learning and integration with stochastic optimization.
Findings
Provides a rigorous mathematical foundation for RL algorithms.
Develops operator splitting methods for safe gradient decomposition.
Shows connections between natural gradient, mirror descent, and proximal methods.
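
The link between proximal methods and mirror descent can be made concrete with a small sketch. The code below is illustrative only and is not taken from the paper: the function names `prox_l1` and `entropy_mirror_step` are hypothetical, and the choices of an L1 penalty and a negative-entropy mirror map are assumptions made for the example. It shows two standard proximal-style updates: soft-thresholding, which is the proximal operator of a scaled L1 norm, and the exponentiated-gradient update, which is the mirror descent step on the probability simplex when the squared Euclidean distance in the proximal step is replaced by a KL (Bregman) divergence.

```python
import math


def prox_l1(v, lam):
    """Proximal operator of lam * ||x||_1 (soft-thresholding).

    Solves argmin_x  lam * ||x||_1 + 0.5 * ||x - v||^2  coordinate-wise.
    """
    out = []
    for vi in v:
        # Shrink the magnitude by lam, clipping at zero, and keep the sign.
        shrunk = max(abs(vi) - lam, 0.0)
        out.append(math.copysign(shrunk, vi))
    return out


def entropy_mirror_step(x, grad, step):
    """One mirror descent step on the probability simplex.

    Assumes the negative-entropy mirror map, so the update is
    argmin_y  <grad, y> + (1 / step) * KL(y || x),
    which has the closed form of a multiplicative (exponentiated-gradient)
    update followed by normalization.
    """
    weights = [xi * math.exp(-step * gi) for xi, gi in zip(x, grad)]
    total = sum(weights)
    return [wi / total for wi in weights]


# Illustrative usage with made-up numbers.
sparse = prox_l1([3.0, -0.5, 1.0], 1.0)
simplex_point = entropy_mirror_step([0.5, 0.5], [1.0, 0.0], math.log(2.0))
```

In both functions the update minimizes a linear or penalty term plus a distance to the current point; only the distance-generating function changes. With the Euclidean distance the same template reduces to an ordinary gradient step, which is the sense in which mirror descent generalizes gradient descent.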
Abstract
In this paper, we set forth a new vision of reinforcement learning developed by us over the past few years, one that yields mathematically rigorous solutions to longstanding important questions that have remained unresolved: (i) how to design reliable, convergent, and robust reinforcement learning algorithms; (ii) how to guarantee that reinforcement learning satisfies pre-specified "safety" guarantees and remains in a stable region of the parameter space; (iii) how to design "off-policy" temporal difference learning algorithms in a reliable and stable manner; and finally (iv) how to integrate the study of reinforcement learning into the rich theory of stochastic optimization. In this paper, we provide detailed answers to all these questions using the powerful framework of proximal operators. The key idea that emerges is the use of primal-dual spaces connected through the use of a…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Model Reduction and Neural Networks
