Safe Reinforcement Learning using Finite-Horizon Gradient-based Estimation
Juntao Dai, Yaodong Yang, Qian Zheng, Gang Pan

TL;DR
This paper introduces a finite-horizon gradient-based estimation method for safe reinforcement learning, enabling more accurate constraint estimation and safer policy updates in non-discounted, finite-horizon scenarios.
Contribution
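The paper proposes Gradient-based Estimation (GBE), which it describes as the first estimation method for finite-horizon non-discounted constraints in deep Safe RL, and a new Safe RL algorithm (CGPO) built on a surrogate optimization problem constructed with GBE.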
Findings
CGPO accurately estimates constraint functions for subsequent policies.
CGPO ensures safe and feasible policy updates in finite-horizon scenarios.
Theoretical and empirical analyses indicate that GBE can effectively estimate constraint changes over a finite horizon.
Abstract
A key aspect of Safe Reinforcement Learning (Safe RL) involves estimating the constraint condition for the next policy, which is crucial for guiding the optimization of safe policy updates. However, the existing Advantage-based Estimation (ABE) method relies on the infinite-horizon discounted advantage function. This dependence leads to catastrophic errors in finite-horizon scenarios with non-discounted constraints, resulting in safety-violation updates. In response, we propose the first estimation method for finite-horizon non-discounted constraints in deep Safe RL, termed Gradient-based Estimation (GBE), which relies on the analytic gradient derived along trajectories. Our theoretical and empirical analyses demonstrate that GBE can effectively estimate constraint changes over a finite horizon. Constructing a surrogate optimization problem with GBE, we developed a novel Safe RL…
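The idea of estimating a finite-horizon, non-discounted constraint from an analytic gradient along a trajectory can be illustrated with a small sketch. This is not the paper's implementation: the one-dimensional system, the linear policy, the quadratic cost, and the function names `rollout_cost_and_grad` and `first_order_estimate` are all hypothetical, and the first-order Taylor form of the estimate is an assumption made here for illustration.

```python
def rollout_cost_and_grad(theta, x0=1.0, horizon=10):
    """Return the undiscounted finite-horizon cost and its derivative w.r.t. theta.

    Toy differentiable system (an assumption, not taken from the paper):
      dynamics x_{t+1} = x_t + a_t, policy a_t = theta * x_t, cost c_t = x_t**2.
    """
    x, dx = x0, 0.0            # state and d(state)/d(theta)
    cost, dcost = 0.0, 0.0     # cumulative cost and its derivative
    for _ in range(horizon):
        cost += x * x
        dcost += 2.0 * x * dx  # chain rule through the per-step cost
        # x_next = (1 + theta) * x, so dx_next = x + (1 + theta) * dx
        x, dx = (1.0 + theta) * x, x + (1.0 + theta) * dx
    return cost, dcost


def first_order_estimate(theta, delta, x0=1.0, horizon=10):
    """Estimate the cost of the updated policy theta + delta without a new rollout."""
    cost, grad = rollout_cost_and_grad(theta, x0=x0, horizon=horizon)
    return cost + grad * delta


if __name__ == "__main__":
    theta, delta = -0.5, 0.01
    estimated = first_order_estimate(theta, delta)
    actual, _ = rollout_cost_and_grad(theta + delta)
    print("estimated cost after update:", estimated)
    print("actual cost after update:   ", actual)
```

For small parameter changes the estimate tracks the true cost of the updated policy closely, and no discount factor or infinite-horizon advantage function is involved. In a realistic setting the hand-derived chain rule would presumably be replaced by automatic differentiation through a differentiable simulator.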
Taxonomy
Topics: Advanced Optical Sensing Technologies · Autonomous Vehicle Technology and Safety · Machine Learning and ELM
