Model-Free Learning of Safe yet Effective Controllers
Alper Kamil Bozkurt, Yu Wang, Miroslav Pajic

TL;DR
This paper introduces a model-free reinforcement learning method for unknown environments that learns control policies which, in priority order, maximize the probability of staying safe, maximize the probability of satisfying a linear temporal logic (LTL) specification, and optimize a discounted control reward.
Contribution
It presents a novel RL algorithm that sequentially prioritizes safety, LTL satisfaction, and control performance without requiring environment models.
Findings
The proposed method learns policies that put safety first, then LTL task satisfaction, then control performance.
Its applicability is illustrated on unknown environments modeled as Markov decision processes.
Abstract
We study the problem of learning safe control policies that are also effective; i.e., maximizing the probability of satisfying a linear temporal logic (LTL) specification of a task, and the discounted reward capturing the (classic) control performance. We consider unknown environments modeled as Markov decision processes. We propose a model-free reinforcement learning algorithm that learns a policy that first maximizes the probability of ensuring safety, then the probability of satisfying the given LTL specification, and lastly, the sum of discounted Quality of Control rewards. Finally, we illustrate the applicability of our RL-based approach.
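The priority ordering described in the abstract (safety first, then LTL satisfaction, then control reward) can be illustrated with a small lexicographic action-selection sketch. This is not the authors' algorithm; it only shows how three already-learned value estimates could be compared in priority order at a single state. The function name `lexicographic_greedy`, the tolerance `tol`, the action names, and all numeric values are hypothetical and chosen for illustration.

```python
from typing import Dict, Sequence


def lexicographic_greedy(
    q_safe: Dict[str, float],
    q_ltl: Dict[str, float],
    q_ctrl: Dict[str, float],
    actions: Sequence[str],
    tol: float = 1e-6,
) -> str:
    """Pick an action by filtering on three estimates in priority order.

    Assumption: each dict maps an action to an estimated value at the
    current state (safety probability, LTL satisfaction probability,
    discounted control reward). The tolerance `tol` is a hypothetical
    knob for treating near-equal estimates as ties.
    """
    candidates = list(actions)
    for q in (q_safe, q_ltl, q_ctrl):
        best = max(q[a] for a in candidates)
        # Keep only the actions that are (nearly) optimal for this objective;
        # lower-priority objectives can only break the remaining ties.
        candidates = [a for a in candidates if q[a] >= best - tol]
    return candidates[0]


# Made-up estimates for one state with three actions.
q_safe = {"left": 0.99, "right": 0.99, "stay": 0.70}
q_ltl = {"left": 0.40, "right": 0.85, "stay": 0.95}
q_ctrl = {"left": 5.0, "right": 1.0, "stay": 9.0}

chosen = lexicographic_greedy(q_safe, q_ltl, q_ctrl, ["left", "right", "stay"])
print(chosen)  # right
```

Here "stay" has the best LTL and control estimates but is discarded at the first step because it is less safe; among the two equally safe actions, "right" wins on LTL satisfaction, so the control reward never gets to decide.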
