Safe Exploration Using Bayesian World Models and Log-Barrier Optimization
Yarden As, Bhavya Sukhija, Andreas Krause

TL;DR
This paper introduces CERL, a reinforcement learning method that combines Bayesian world models with log-barrier optimization to keep the policy safe while it explores and learns, including in image-based environments.
Contribution
CERL is a novel approach combining Bayesian modeling and log-barrier optimization to achieve safe, robust exploration in constrained Markov decision processes.
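For orientation, a constrained Markov decision process (CMDP) and a log-barrier relaxation of it are commonly written as below. This is the standard textbook form, not the paper's exact objective: the symbols J_r (expected return), J_c (expected cumulative cost), d (cost budget), and η (barrier weight) are notation assumed here for illustration.

```latex
\begin{align}
\max_{\pi}\; & J_r(\pi) \quad \text{s.t.} \quad J_c(\pi) \le d, \\
\max_{\pi}\; & J_r(\pi) + \eta \,\log\bigl(d - J_c(\pi)\bigr), \qquad \eta > 0.
\end{align}
```

The barrier term tends to negative infinity as J_c(π) approaches d from below, so an optimizer that starts from a strictly feasible policy and takes sufficiently small steps is discouraged from ever reaching the constraint boundary.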
Findings
In the paper's experiments, CERL outperforms the current state of the art in both safety and optimality when solving CMDPs from image observations.
CERL maintains safety during learning with image observations.
Because its policies are pessimistic with respect to the model's epistemic uncertainty, the approach is robust to model inaccuracies.
Abstract
A major challenge in deploying reinforcement learning in online tasks is ensuring that safety is maintained throughout the learning process. In this work, we propose CERL, a new method for solving constrained Markov decision processes while keeping the policy safe during learning. Our method leverages Bayesian world models and suggests policies that are pessimistic w.r.t. the model's epistemic uncertainty. This makes CERL robust towards model inaccuracies and leads to safe exploration during learning. In our experiments, we demonstrate that CERL outperforms the current state-of-the-art in terms of safety and optimality in solving CMDPs from image observations.
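The idea of pairing pessimism under epistemic uncertainty with a log barrier can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the Bayesian world model is approximated by a small ensemble of cost predictions, takes the worst ensemble member as the pessimistic estimate, and scores hypothetical candidate policies with a log-barrier objective. All names and numbers (`pessimistic_cost`, `log_barrier_objective`, `candidates`, `budget`, `eta`) are made up for illustration.

```python
import math


def pessimistic_cost(cost_estimates):
    """Largest cost predicted by any ensemble member (assumed form of pessimism)."""
    return max(cost_estimates)


def log_barrier_objective(reward, cost_estimates, budget, eta=0.1):
    """Reward plus eta * log(slack), where slack = budget - pessimistic cost.

    Returns -inf when the pessimistic cost meets or exceeds the budget,
    so such a policy can never be selected.
    """
    slack = budget - pessimistic_cost(cost_estimates)
    if slack <= 0.0:
        return -math.inf
    return reward + eta * math.log(slack)


# Hypothetical candidates: (predicted reward, per-model predicted costs).
candidates = {
    "cautious": (1.0, [0.2, 0.3, 0.25]),
    "aggressive": (2.0, [0.5, 1.2, 0.7]),  # one model predicts a violation
    "balanced": (1.5, [0.4, 0.6, 0.5]),
}
budget = 1.0

best = max(
    candidates,
    key=lambda name: log_barrier_objective(*candidates[name], budget),
)
print(best)
```

In this toy setting the "aggressive" candidate is rejected even though most ensemble members consider it safe, because a single pessimistic prediction exceeds the budget. That is the sense in which disagreement among models (epistemic uncertainty) translates into caution.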
Taxonomy
Topics: Reservoir Engineering and Simulation Methods · AI-based Problem Solving and Planning · Fault Detection and Control Systems
