Cautious Reinforcement Learning with Logical Constraints
Mohammadhosein Hasanbeig, Alessandro Abate, Daniel Kroening

TL;DR
This paper introduces an adaptive safe padding approach to reinforcement learning that keeps the agent safe during learning while optimizing control policies to satisfy temporal logic goals with maximal probability. The approach automatically balances exploration against safety, with theoretical guarantees on policy optimality and convergence.
Contribution
The paper proposes an adaptive safe padding method for reinforcement learning with temporal logic goals. The method ensures safety during learning and comes with theoretical guarantees on the optimality of the synthesised policies and on the convergence of the learning algorithm.
Findings
The method effectively balances exploration and safety during learning.
Theoretical guarantees on policy optimality and convergence are established.
Experimental results showcase the method's performance with respect to safety and goal satisfaction.
Abstract
This paper presents the concept of an adaptive safe padding that forces Reinforcement Learning (RL) to synthesise optimal control policies while ensuring safety during the learning process. Policies are synthesised to satisfy a goal, expressed as a temporal logic formula, with maximal probability. Forcing the RL agent to stay safe during learning might limit exploration; however, we show that the proposed architecture is able to automatically handle the trade-off between efficient progress in exploration (towards goal satisfaction) and ensuring safety. Theoretical guarantees are available on the optimality of the synthesised policies and on the convergence of the learning algorithm. Experimental results are provided to showcase the performance of the proposed method.
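The abstract does not detail how the safe padding is computed. As a rough, hypothetical illustration of the general idea only (not the authors' construction), the Python sketch below masks out actions whose estimated probability of entering an unsafe state within a short horizon exceeds a threshold, and lets an epsilon-greedy learner explore only among the remaining actions. The function names (`risk_within_horizon`, `safe_actions`, `cautious_epsilon_greedy`, `corridor`), the uniform-random continuation after the first action, the fixed horizon and threshold, and the toy corridor are all assumptions made for illustration; the temporal logic goal is omitted.

```python
import random
from collections import defaultdict


def risk_within_horizon(P, actions, unsafe, horizon):
    """Probability of entering an unsafe state within `horizon` steps.

    P[s][a] maps next states to probabilities (a known or estimated model).
    Action a is taken first in s; afterwards actions are ASSUMED to be drawn
    uniformly at random, mimicking exploration. Returns a dict keyed by (s, a).
    Requires horizon >= 1.
    """
    # v[s]: probability of hitting an unsafe state within k steps from s (k = 0 here).
    v = {s: 1.0 if s in unsafe else 0.0 for s in P}
    for _ in range(horizon - 1):
        v = {
            s: 1.0 if s in unsafe else
            sum(sum(p * v[t] for t, p in P[s][a].items()) for a in actions) / len(actions)
            for s in P
        }
    # One more step for the first (chosen) action gives the horizon-step risk.
    return {(s, a): sum(p * v[t] for t, p in P[s][a].items()) for s in P for a in actions}


def safe_actions(risk, s, actions, threshold):
    """Actions whose estimated risk is within the threshold (a hypothetical 'padding')."""
    allowed = [a for a in actions if risk[(s, a)] <= threshold]
    # If every action is too risky, fall back to the least risky one.
    return allowed or [min(actions, key=lambda a: risk[(s, a)])]


def cautious_epsilon_greedy(Q, risk, s, actions, threshold, epsilon, rng):
    """Epsilon-greedy action choice restricted to the safe action set."""
    allowed = safe_actions(risk, s, actions, threshold)
    if rng.random() < epsilon:
        return rng.choice(allowed)
    return max(allowed, key=lambda a: Q[(s, a)])


def corridor(n, slip):
    """Toy chain 0..n-1: state 0 is unsafe, state n-1 is the goal; both are absorbing.
    Each move succeeds with probability 1 - slip; otherwise the agent stays put."""
    P = {}
    for s in range(n):
        if s in (0, n - 1):
            P[s] = {"L": {s: 1.0}, "R": {s: 1.0}}
        else:
            P[s] = {"L": {s - 1: 1 - slip, s: slip}, "R": {s + 1: 1 - slip, s: slip}}
    return P


if __name__ == "__main__":
    actions = ["L", "R"]
    P = corridor(5, slip=0.1)
    risk = risk_within_horizon(P, actions, unsafe={0}, horizon=2)
    print(risk[(1, "L")], risk[(1, "R")])  # roughly 0.945 and 0.045
    Q = defaultdict(float)
    a = cautious_epsilon_greedy(Q, risk, 1, actions, threshold=0.1,
                                epsilon=0.3, rng=random.Random(0))
    print(a)  # "R": "L" is masked out, so even exploratory picks stay safe
```

In a learning setting, `P` would be an empirical model re-estimated from observed transitions, so the set of permitted actions changes as the estimate improves. This is one loose reading of "adaptive" and is not taken from the paper.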
Taxonomy
Topics: Reinforcement Learning in Robotics · Formal Methods in Verification · AI-based Problem Solving and Planning
