Model-Free Learning of Safe yet Effective Controllers

Alper Kamil Bozkurt; Yu Wang; Miroslav Pajic

arXiv:2103.14600·cs.RO·April 7, 2026

TL;DR

This paper introduces a model-free reinforcement learning method that learns control policies in unknown environments. The learned policy prioritizes safety first, then satisfaction of a linear temporal logic (LTL) specification, and finally the discounted control reward.

Contribution

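The paper presents a model-free RL algorithm that optimizes three objectives in strict priority order: the probability of ensuring safety first, the probability of satisfying the LTL specification second, and the sum of discounted Quality of Control rewards third. No model of the environment is required.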
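The priority ordering can be written as a nested optimization. The formulation below is only a sketch of the ordering stated in the abstract. The symbols are assumed notation rather than the authors' own: \pi is a policy, \varphi_{safe} the safety requirement, \varphi the LTL task specification, r_t the Quality of Control reward at step t, and \gamma the discount factor.

```latex
\begin{align*}
\Pi_1 &= \arg\max_{\pi} \; \Pr\nolimits_{\pi}\!\left(\varphi_{\mathrm{safe}}\right), \\
\Pi_2 &= \arg\max_{\pi \in \Pi_1} \; \Pr\nolimits_{\pi}\!\left(\varphi\right), \\
\pi^{*} &\in \arg\max_{\pi \in \Pi_2} \; \mathbb{E}_{\pi}\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_t\right].
\end{align*}
```

Each stage restricts the search to the maximizers of the previous stage, so a lower-priority objective can only break ties left by the higher-priority ones.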

Findings

01

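The proposed method learns control policies that are both safe and effective, where effective means maximizing the probability of satisfying the LTL specification and the discounted control reward.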

02

It demonstrates applicability in unknown Markov decision process environments.

Abstract

We study the problem of learning safe control policies that are also effective; i.e., maximizing the probability of satisfying a linear temporal logic (LTL) specification of a task, and the discounted reward capturing the (classic) control performance. We consider unknown environments modeled as Markov decision processes. We propose a model-free reinforcement learning algorithm that learns a policy that first maximizes the probability of ensuring safety, then the probability of satisfying the given LTL specification, and lastly, the sum of discounted Quality of Control rewards. Finally, we illustrate the applicability of our RL-based approach.
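To make the "first safety, then LTL, then reward" ordering concrete, here is a minimal Python sketch of lexicographic action selection over three per-action value estimates. It is an illustration under stated assumptions, not the authors' algorithm: the function name `lexicographic_action`, the three estimate lists, and the tie tolerance `tol` are all hypothetical, and the sketch says nothing about how the estimates would be learned.

```python
from typing import Sequence


def lexicographic_action(
    q_safe: Sequence[float],
    q_ltl: Sequence[float],
    q_qoc: Sequence[float],
    tol: float = 1e-6,
) -> int:
    """Pick an action index by comparing three estimates in priority order.

    All argument names are hypothetical: q_safe[a], q_ltl[a], and q_qoc[a]
    stand for estimates of the safety probability, the LTL satisfaction
    probability, and the discounted control reward of action a.
    """
    if not (len(q_safe) == len(q_ltl) == len(q_qoc)) or len(q_safe) == 0:
        raise ValueError("estimates must be non-empty and of equal length")

    # Stage 1: keep only actions whose safety estimate is within tol of the best.
    best_safe = max(q_safe)
    candidates = [a for a in range(len(q_safe)) if q_safe[a] >= best_safe - tol]

    # Stage 2: among those, keep actions with near-best LTL estimate.
    best_ltl = max(q_ltl[a] for a in candidates)
    candidates = [a for a in candidates if q_ltl[a] >= best_ltl - tol]

    # Stage 3: break remaining ties by the control-reward estimate.
    return max(candidates, key=lambda a: q_qoc[a])


if __name__ == "__main__":
    # Action 2 has the highest reward but is less safe, so it is never chosen.
    print(lexicographic_action([0.9, 0.9, 0.5], [0.2, 0.7, 0.9], [5.0, 1.0, 10.0]))
```

In the usage line, actions 0 and 1 tie on safety, action 1 wins on the LTL estimate, and the reward estimate is never consulted, so the call returns 1. The tolerance matters in practice because learned estimates are rarely exactly equal; choosing it is a design decision this sketch leaves open.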
