Constrained Reinforcement Learning via Dissipative Saddle Flow Dynamics

Tianqi Zheng, Pengcheng You, and Enrique Mallada

arXiv:2212.01505 · cs.LG · December 6, 2022 · 1 citation

Open Access

TL;DR

This paper introduces a constrained reinforcement learning algorithm based on dissipative saddle flow dynamics. The algorithm converges almost surely to the optimal policy and avoids the history-dependent output-mixing stage that earlier primal-dual stochastic gradient methods require.

Contribution

The paper proposes a stochastic gradient descent-ascent algorithm for constrained RL, built on regularized saddle-flow dynamics, that converges almost surely to the optimal policy.
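The constrained problem that a descent-ascent scheme of this kind targets is commonly posed as a Lagrangian saddle-point problem. The block below is a sketch in generic notation: the symbols V_i, c_i, lambda_i, the discount factor gamma, and the discounted infinite-horizon form are assumptions for illustration, not notation taken from the paper.

```latex
% Assumed generic notation (not from the paper):
%   V_0 = primary value, V_1..V_m = secondary values, c_i = required minimum levels.
\begin{aligned}
V_i(\pi) &= \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t}\, r_i(s_t, a_t)\right],
  \qquad i = 0, 1, \dots, m, \\[4pt]
\max_{\pi}\;& V_0(\pi)
  \quad \text{s.t.} \quad V_i(\pi) \ge c_i, \qquad i = 1, \dots, m, \\[4pt]
\mathcal{L}(\pi, \lambda) &= V_0(\pi) + \sum_{i=1}^{m} \lambda_i \bigl(V_i(\pi) - c_i\bigr),
  \qquad \lambda_i \ge 0, \\[4pt]
&\max_{\pi}\; \min_{\lambda \ge 0}\; \mathcal{L}(\pi, \lambda).
\end{aligned}
```

In this form the policy variable ascends on the Lagrangian while the multipliers descend, which is where the descent-ascent structure comes from.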

Findings

01. The algorithm converges almost surely to the optimal policy.

02. It removes the mismatch between the behavioral policy and the optimal one that arises in earlier sample-based primal-dual methods.

03. It provides a theoretically grounded approach for constrained RL.

Abstract

In constrained reinforcement learning (C-RL), an agent seeks to learn from the environment a policy that maximizes the expected cumulative reward while satisfying minimum requirements in secondary cumulative reward constraints. Several algorithms rooted in sample-based primal-dual methods have been recently proposed to solve this problem in policy space. However, such methods are based on stochastic gradient descent-ascent algorithms whose trajectories are connected to the optimal policy only after a mixing output stage that depends on the algorithm's history. As a result, there is a mismatch between the behavioral policy and the optimal one. In this work, we propose a novel algorithm for constrained RL that does not suffer from these limitations. Leveraging recent results on regularized saddle-flow dynamics, we develop a novel stochastic gradient descent-ascent algorithm whose…
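To make the contrast in the abstract concrete, here is a minimal deterministic sketch on a toy bilinear saddle problem, L(x, y) = x * y, with saddle point (0, 0). It is not the paper's algorithm and involves no RL or sampling. It only illustrates why plain descent-ascent iterates fail to settle on the saddle point, and how one regularization from the saddle-flow literature (auxiliary variables z and w coupled through a quadratic penalty with weight rho) makes the iterates themselves converge. The function names, the toy objective, the step size, and the choice of regularizer are all assumptions made for illustration.

```python
import math


def plain_gda(x, y, step, iters):
    """Euler-discretized descent in x, ascent in y, on L(x, y) = x * y."""
    for _ in range(iters):
        grad_x, grad_y = y, x  # dL/dx = y, dL/dy = x
        x, y = x - step * grad_x, y + step * grad_y
    return x, y


def regularized_gda(x, y, step, iters, rho=1.0):
    """Same toy problem with assumed auxiliary variables z, w.

    Augmented objective (an illustrative assumption):
        (rho/2) * (x - z)**2 + x * y - (rho/2) * (y - w)**2
    Descent in (x, z), ascent in (y, w).
    """
    z, w = x, y  # auxiliary variables start at the primal/dual iterates
    for _ in range(iters):
        grad_x, grad_y = y, x
        dx = -grad_x - rho * (x - z)
        dz = rho * (x - z)
        dy = grad_y - rho * (y - w)
        dw = rho * (y - w)
        x, z = x + step * dx, z + step * dz
        y, w = y + step * dy, w + step * dw
    return x, y


x0, y0 = 1.0, 1.0
plain_norm = math.hypot(*plain_gda(x0, y0, step=0.1, iters=2000))
reg_norm = math.hypot(*regularized_gda(x0, y0, step=0.1, iters=2000))

print(f"distance to saddle, plain GDA:       {plain_norm:.3e}")
print(f"distance to saddle, regularized GDA: {reg_norm:.3e}")
```

On this toy problem the plain iterates spiral away from the saddle point, so only some averaged or mixed output could be meaningful, while the regularized iterates contract toward it. This mirrors, in a much simpler setting, the mismatch between the behavioral and optimal policies described in the abstract.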

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques