Exchange Policy Optimization Algorithm for Semi-Infinite Safe Reinforcement Learning
Jiaming Zhang, Yujie Yang, Haoning Wang, Liping Zhang, Shengbo Eben Li

TL;DR
This paper introduces exchange policy optimization (EPO), an algorithm for semi-infinite safe reinforcement learning. EPO handles infinitely many constraints by iteratively solving safe RL subproblems over a finite constraint set and adjusting that active set between iterations, with the goal of optimal policy performance and bounded safety violations.
Contribution
The paper proposes EPO, a framework that replaces the infinite constraint family of semi-infinite safe RL with a finite active set that is expanded and pruned adaptively, targeting optimal policy performance and deterministically bounded safety violations.
Findings
EPO achieves near-optimal policy performance.
EPO maintains safety violations within prescribed bounds.
Theoretical guarantees support EPO's effectiveness.
Abstract
Safe reinforcement learning (safe RL) aims to respect safety requirements while optimizing long-term performance. In many practical applications, however, the problem involves an infinite number of constraints, known as semi-infinite safe RL (SI-safe RL). Such constraints typically appear when safety conditions must be enforced across an entire continuous parameter space, such as ensuring adequate resource distribution at every spatial location. In this paper, we propose exchange policy optimization (EPO), an algorithmic framework that achieves optimal policy performance and deterministic bounded safety. EPO works by iteratively solving safe RL subproblems with finite constraint sets and adaptively adjusting the active set through constraint expansion and deletion. At each iteration, constraints with violations exceeding the predefined tolerance are added to refine the policy, while…
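The exchange idea described in the abstract can be illustrated on a toy problem. The sketch below is not the authors' implementation: it replaces the safe RL subproblem with a one-dimensional program (maximize a scalar `theta` subject to `theta <= limit(y)` for every `y` in [0, 1]), uses a grid search as a stand-in for finding the most violated constraint, and implements only the expansion step. The deletion step is left out because its criterion is not stated in the text above. All names (`exchange_loop`, `limit`, `solve_subproblem`, `violation`) and the tolerance value are hypothetical.

```python
# Toy illustration of an exchange-style loop for a semi-infinite program.
# Assumption: the "policy" is a single scalar theta, and the infinite
# constraint family is theta <= limit(y) for every y in [0, 1].


def limit(y):
    """Constraint bound at parameter y (hypothetical toy function)."""
    return 1.0 + (y - 0.3) ** 2


def solve_subproblem(active):
    """Maximize theta subject only to the constraints in the finite active set."""
    return min(limit(y) for y in active)


def violation(theta, y):
    """Positive when the constraint at y is violated by theta."""
    return theta - limit(y)


def exchange_loop(grid, y0, tol, max_iter=50):
    """Expansion-only exchange loop: add the worst violated constraint until
    no constraint on the grid is violated by more than tol."""
    active = [y0]
    for iteration in range(1, max_iter + 1):
        theta = solve_subproblem(active)
        # Grid search stands in for a real worst-case constraint search.
        worst_y = max(grid, key=lambda y: violation(theta, y))
        if violation(theta, worst_y) <= tol:
            return theta, active, iteration
        active.append(worst_y)
    raise RuntimeError("exchange loop did not converge within max_iter")


grid = [i / 1000 for i in range(1001)]  # discretized parameter space [0, 1]
tol = 1e-3                              # assumed violation tolerance
theta, active, iterations = exchange_loop(grid, y0=0.0, tol=tol)
print(theta, active, iterations)
```

Starting from the single constraint at `y = 0.0`, the first subproblem returns a `theta` that violates the constraint near `y = 0.3`; adding that constraint and re-solving brings every violation on the grid within the tolerance. In an actual safe RL setting the subproblem solver would be a constrained policy optimization routine rather than a closed-form minimum.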
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Adversarial Robustness in Machine Learning
