Safe Continuous Control with Constrained Model-Based Policy Optimization

Moritz A. Zanger; Karam Daaboul; J. Marius Zöllner

arXiv:2104.06922 · cs.LG · April 15, 2021

TL;DR

This paper introduces a model-based safe exploration algorithm for high-dimensional continuous control that reduces sample complexity relative to model-free methods while approximately satisfying safety constraints, validated on simulated robotic tasks.

Contribution

It proposes a novel model-based constrained policy optimization method with adaptive uncertainty quantification and dynamic rollout limits for safe reinforcement learning.

Findings

01. Achieves 10-20x reduction in training samples compared to model-free methods.

02. Maintains approximate safety constraints during learning.

03. Validates effectiveness on simulated robotic control tasks.

Abstract

The applicability of reinforcement learning (RL) algorithms in real-world domains often requires adherence to safety constraints, a need difficult to address given the asymptotic nature of the classic RL optimization objective. In contrast to the traditional RL objective, safe exploration considers the maximization of expected returns under safety constraints expressed in expected cost returns. We introduce a model-based safe exploration algorithm for constrained high-dimensional control to address the often prohibitively high sample complexity of model-free safe exploration algorithms. Further, we provide theoretical and empirical analyses regarding the implications of model-usage on constrained policy optimization problems and introduce a practical algorithm that accelerates policy search with model-generated data. The need for accurate estimates of a policy's constraint satisfaction…
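
The safe exploration objective described in the abstract is commonly written as a constrained optimization problem. The following is a sketch in standard constrained-MDP notation, not necessarily the paper's own: the reward function $r$, cost function $c$, discount factor $\gamma$, cost threshold $d$, and the names $J_R$ and $J_C$ are assumed symbols introduced here for illustration.

```latex
\begin{aligned}
\max_{\pi}\quad & J_R(\pi) = \mathbb{E}_{\tau \sim \pi}\left[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\right] \\
\text{s.t.}\quad & J_C(\pi) = \mathbb{E}_{\tau \sim \pi}\left[\sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t)\right] \le d
\end{aligned}
```

Here $J_R(\pi)$ is the expected return and $J_C(\pi)$ the expected cost return of policy $\pi$. In this form a policy counts as safe when its expected cost return stays at or below $d$, which is why accurate estimates of $J_C$ matter for constraint satisfaction.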

Code & Models

Repositories

anyboby/mujoco_safety_gym

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Multi-Objective Optimization Algorithms