Joint Learning of Policy with Unknown Temporal Constraints for Safe Reinforcement Learning
Lunet Yifru, Ali Baheri

TL;DR
This paper introduces a framework that simultaneously learns safety constraints and optimal policies in reinforcement learning environments where the safety constraints are unknown, backed by a convergence guarantee and error bounds between the learned policy and the true optimal policy.
Contribution
It presents a joint learning framework that combines a logically-constrained RL algorithm with an evolutionary algorithm to synthesize signal temporal logic (STL) safety specifications, with proven convergence of the joint learning process.
Findings
Successfully identified safety constraints and policies in grid-world environments.
Provided theoretical guarantees for convergence and error bounds.
Demonstrated that the theorems hold in practice in the grid-world experiments.
Abstract
In many real-world applications, safety constraints for reinforcement learning (RL) algorithms are either unknown or not explicitly defined. We propose a framework that concurrently learns safety constraints and optimal RL policies in such environments, supported by theoretical guarantees. Our approach merges a logically-constrained RL algorithm with an evolutionary algorithm to synthesize signal temporal logic (STL) specifications. The framework is underpinned by theorems that establish the convergence of our joint learning process and provide error bounds between the discovered policy and the true optimal policy. We showcased our framework in grid-world environments, successfully identifying both acceptable safety constraints and RL policies while demonstrating the effectiveness of our theorems in practice.
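To make the idea of synthesizing an STL specification from data concrete, here is a minimal sketch in Python. It is not the authors' algorithm: it assumes a single hypothetical specification template, "always stay more than `radius` cells (Manhattan distance) from a hazard cell", and uses a small deterministic evolutionary loop to pick `radius` from trajectories labeled safe or unsafe. The RL policy-learning half of the framework is omitted, and every name here (`always`, `avoid_margin`, `fitness`, `evolve_radius`, `HAZARD`, `TRACES`) is illustrative.

```python
from typing import Callable, List, Sequence, Tuple

State = Tuple[int, int]                      # (row, col) grid cell
LabeledTrace = Tuple[List[State], bool]      # (trajectory, is_safe label)


def always(trace: Sequence[State], margin: Callable[[State], float]) -> float:
    """Quantitative (robustness) semantics of G(margin > 0): the worst-case margin."""
    return min(margin(s) for s in trace)


def avoid_margin(hazard: State, radius: float) -> Callable[[State], float]:
    """Signed margin of the predicate 'Manhattan distance to hazard > radius'."""
    def margin(s: State) -> float:
        return abs(s[0] - hazard[0]) + abs(s[1] - hazard[1]) - radius
    return margin


def fitness(radius: float, hazard: State, traces: Sequence[LabeledTrace]) -> int:
    """Number of traces whose robustness sign agrees with the safe/unsafe label."""
    correct = 0
    for trace, is_safe in traces:
        rho = always(trace, avoid_margin(hazard, radius))
        correct += int((rho > 0) == is_safe)
    return correct


def evolve_radius(hazard: State, traces: Sequence[LabeledTrace],
                  candidates: Sequence[float], generations: int = 5) -> float:
    """Toy (mu + lambda) search: mutate each radius by +/-0.5, keep the fittest."""
    population = list(candidates)
    for _ in range(generations):
        children = [r + step for r in population for step in (-0.5, 0.5)]
        pool = sorted(set(population + children),
                      key=lambda r: (-fitness(r, hazard, traces), r))
        population = pool[:len(candidates)]
    return population[0]


# Hypothetical demonstration data: one hazard cell and four labeled trajectories.
HAZARD: State = (2, 2)
TRACES: List[LabeledTrace] = [
    ([(0, 0), (0, 1), (0, 2), (0, 3)], True),   # closest approach: distance 2
    ([(4, 4), (4, 3), (3, 4)], True),           # closest approach: distance 3
    ([(0, 0), (1, 1), (2, 1)], False),          # closest approach: distance 1
    ([(1, 0), (1, 1), (1, 2)], False),          # closest approach: distance 1
]

if __name__ == "__main__":
    best = evolve_radius(HAZARD, TRACES, candidates=[0.0, 3.0])
    print("learned radius:", best, "fitness:", fitness(best, HAZARD, TRACES))
```

With these four trajectories, any radius in the interval [1, 2) separates the safe traces from the unsafe ones, so the search has a well-defined target. In a joint-learning setting, the candidate specification would additionally be scored by how well a policy trained under it performs, which this sketch leaves out.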
Taxonomy
Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Formal Methods in Verification
