Safe Exploration in Continuous Action Spaces

Gal Dalal; Krishnamurthy Dvijotham; Matej Vecerik; Todd Hester; Cosmin Paduraru; Yuval Tassa

arXiv:1801.08757 · cs.AI · January 29, 2018 · 274 cites

TL;DR

This paper introduces a safety layer for reinforcement learning in continuous action spaces. The layer analytically corrects each action using a learned linearized model so that constraints are satisfied during learning, which makes the approach suitable for physical systems.

Contribution

The authors propose a safety layer with a closed-form solution for constraint satisfaction in RL. Its linearized model is learned from past trajectories of arbitrary actions, such as real-world data logs.

Findings

01 Guarantees zero constraint violations during learning.

02 Effective in physics-based environments where reward shaping fails.

03 Applicable to systems with smooth dynamics and arbitrary past data.

Abstract

We address the problem of deploying a reinforcement learning (RL) agent on a physical system such as a datacenter cooling unit or robot, where critical constraints must never be violated. We show how to exploit the typically smooth dynamics of these systems and enable RL algorithms to never violate constraints during learning. Our technique is to directly add to the policy a safety layer that analytically solves an action correction formulation per each state. The novelty of obtaining an elegant closed-form solution is attained due to a linearized model, learned on past trajectories consisting of arbitrary actions. This is to mimic the real-world circumstances where data logs were generated with a behavior policy that is implausible to describe mathematically; such cases render the known safety-aware off-policy methods inapplicable. We demonstrate the efficacy of our approach on new…
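
The action correction described in the abstract can be illustrated with a minimal sketch. It assumes a single constraint whose next value is approximated linearly in the action, c_now + g · a <= c_max, where g stands for the learned sensitivity of the constraint to the action. Under that assumption, the closest safe action in the Euclidean sense is a projection onto a half-space, which has a closed form. The function name `safe_action` and the parameters `g`, `c_now`, and `c_max` are hypothetical names chosen for illustration, not identifiers from the paper or its repositories.

```python
def safe_action(action, g, c_now, c_max):
    """Return the action closest to `action` that satisfies c_now + g . a <= c_max.

    Assumes one linearized constraint; `g` is a hypothetical stand-in for the
    learned sensitivity of the constraint to the action.
    """
    gg = sum(gi * gi for gi in g)
    if gg == 0.0:
        # The constraint does not depend on the action, so no correction is possible.
        return list(action)
    # Predicted constraint violation if the proposed action were applied.
    violation = c_now + sum(gi * ai for gi, ai in zip(g, action)) - c_max
    # The multiplier is zero when the proposed action is already predicted safe.
    lam = max(0.0, violation / gg)
    return [ai - lam * gi for ai, gi in zip(action, g)]


proposed = [1.0, 0.5]   # action suggested by the policy
g = [2.0, 0.0]          # assumed constraint sensitivity at the current state
corrected = safe_action(proposed, g, c_now=0.5, c_max=1.0)
unchanged = safe_action([0.1, 0.5], g, c_now=0.5, c_max=1.0)
```

In this sketch an action that is already predicted to be safe passes through untouched, and an unsafe one is moved only along the direction g, by just enough to meet the bound. Handling several constraints at once would need additional assumptions that this example does not make.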

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Data Stream Mining Techniques