Safe Exploration Using Bayesian World Models and Log-Barrier Optimization

Yarden As; Bhavya Sukhija; Andreas Krause

arXiv:2405.05890 · cs.LG · May 10, 2024 · 1 citation

TL;DR

This paper introduces CERL, a reinforcement learning method that combines Bayesian world models with log-barrier optimization to explore safely and keep the policy safe throughout learning, including in environments observed through images.

Contribution

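CERL is a new method for solving constrained Markov decision processes (CMDPs) that combines Bayesian world models with log-barrier optimization. Its policies are pessimistic with respect to the model's epistemic uncertainty, which keeps exploration safe and robust to model errors.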
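As a rough illustration of the log-barrier idea (not CERL's actual objective, which this summary does not spell out), the sketch below turns a toy constrained problem, maximize reward subject to cost <= budget, into an unconstrained one by adding (1/t) * log(budget - cost). The barrier tends to -inf as the cost approaches the budget, so every maximizer stays strictly feasible, and raising t moves it toward the constrained optimum. The toy reward theta, cost theta**2, budget 1, all function names, and the ternary-search inner solver are assumptions chosen for illustration.

```python
import math
from typing import Callable


def log_barrier_objective(reward: float, cost: float, budget: float, t: float) -> float:
    """Log-barrier relaxation of 'maximize reward subject to cost <= budget'.

    Returns -inf when the constraint is active or violated, so any finite
    maximizer is strictly feasible. Larger t weakens the barrier term.
    """
    slack = budget - cost
    if slack <= 0.0:
        return -math.inf
    return reward + math.log(slack) / t


def maximize_concave_1d(f: Callable[[float], float], lo: float, hi: float,
                        iters: int = 200) -> float:
    """Ternary search for the maximizer of a concave function on (lo, hi)."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1  # for concave f the maximizer lies in [m1, hi]
        else:
            hi = m2  # otherwise it lies in [lo, m2]
    return (lo + hi) / 2.0


def solve_toy(t: float) -> float:
    """Hypothetical toy problem: reward(theta) = theta, cost(theta) = theta**2, budget 1."""
    def objective(theta: float) -> float:
        return log_barrier_objective(theta, theta * theta, 1.0, t)
    return maximize_concave_1d(objective, -1.0, 1.0)


if __name__ == "__main__":
    # Increasing t tightens the approximation of the constrained problem.
    for t in (1.0, 10.0, 100.0):
        theta = solve_toy(t)
        print(f"t={t:6.1f}  theta={theta:.4f}  cost={theta * theta:.4f}")
```

For this toy problem the barrier maximizer has the closed form theta(t) = (sqrt(1 + t^2) - 1) / t, roughly 0.414 at t = 1, 0.905 at t = 10 and 0.990 at t = 100: always inside the feasible set theta^2 < 1, and approaching the boundary optimum theta* = 1 as t grows.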

Findings

01

CERL outperforms current state-of-the-art methods in both safety and optimality when solving CMDPs from image observations.

02

CERL keeps the policy safe throughout learning from image observations.

03

Because its policies are pessimistic with respect to the world model's epistemic uncertainty, the approach is robust to model inaccuracies.

Abstract

A major challenge in deploying reinforcement learning in online tasks is ensuring that safety is maintained throughout the learning process. In this work, we propose CERL, a new method for solving constrained Markov decision processes while keeping the policy safe during learning. Our method leverages Bayesian world models and suggests policies that are pessimistic w.r.t. the model's epistemic uncertainty. This makes CERL robust towards model inaccuracies and leads to safe exploration during learning. In our experiments, we demonstrate that CERL outperforms the current state-of-the-art in terms of safety and optimality in solving CMDPs from image observations.
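The abstract does not say how the pessimism is computed. One common way to be pessimistic with respect to epistemic uncertainty, sketched below purely as an assumption, is to evaluate a policy's constraint cost under each member of an ensemble of learned world models and use the ensemble mean plus beta times the ensemble standard deviation as an upper estimate. The ensemble, the value of beta, and the function names are illustrative and are not CERL's published interface.

```python
import statistics
from typing import Sequence


def pessimistic_cost(ensemble_costs: Sequence[float], beta: float) -> float:
    """Upper estimate of a policy's expected constraint cost.

    ensemble_costs holds the cost predicted for the same policy by each member
    of a (hypothetical) ensemble of world models; their spread serves as a
    proxy for epistemic uncertainty.
    """
    mean = statistics.fmean(ensemble_costs)
    spread = statistics.pstdev(ensemble_costs)
    return mean + beta * spread


def is_certified_safe(ensemble_costs: Sequence[float], budget: float, beta: float = 2.0) -> bool:
    """Accept a policy only if its pessimistic cost stays within the budget."""
    return pessimistic_cost(ensemble_costs, beta) <= budget


if __name__ == "__main__":
    # Two hypothetical policies with the same mean predicted cost (0.2).
    confident = [0.20, 0.22, 0.18, 0.21, 0.19]  # ensemble members agree
    uncertain = [0.05, 0.40, 0.10, 0.35, 0.10]  # ensemble members disagree
    for name, costs in (("confident", confident), ("uncertain", uncertain)):
        print(name, round(pessimistic_cost(costs, 2.0), 3),
              is_certified_safe(costs, budget=0.3))
```

With a budget of 0.3, the policy whose models disagree is rejected even though both policies have the same mean predicted cost. Larger beta makes the check more conservative; beta = 0 falls back to trusting the mean model.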

Taxonomy

Topics: Reservoir Engineering and Simulation Methods · AI-based Problem Solving and Planning · Fault Detection and Control Systems