Data Generation Method for Learning a Low-dimensional Safe Region in Safe Reinforcement Learning

Zhehua Zhou; Ozgur S. Oguz; Yi Ren; Marion Leibold; Martin Buss

arXiv:2109.05077 · eess.SY · September 14, 2021

TL;DR

This paper proposes a data generation method combining two sampling techniques to improve safety estimation in high-dimensional safe reinforcement learning, demonstrated on a three-link inverted pendulum.

Contribution

The paper introduces a data generation approach that balances learning performance against the risk of being unsafe, with the aim of a more reliable estimate of the safe region.

Findings

01. Enhanced safety estimates with the proposed sampling method
02. Improved learning performance in high-dimensional systems
03. Validated on a three-link inverted pendulum example

Abstract

Safe reinforcement learning aims to learn a control policy while ensuring that neither the system nor the environment gets damaged during the learning process. For implementing safe reinforcement learning on highly nonlinear and high-dimensional dynamical systems, one possible approach is to find a low-dimensional safe region via data-driven feature extraction methods, which provides safety estimates to the learning algorithm. As the reliability of the learned safety estimates is data-dependent, we investigate in this work how different training data will affect the safe reinforcement learning approach. By balancing between the learning performance and the risk of being unsafe, a data generation method that combines two sampling methods is proposed to generate representative training data. The performance of the method is demonstrated with a three-link inverted pendulum example.
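
The abstract does not name the two sampling methods it combines, so the following is only a minimal sketch of the general idea of mixing two samplers to build labeled training data. Everything in it is an assumption made for illustration: the uniform and near-origin Gaussian samplers, the `mix` ratio, the six-dimensional state, and the ball-shaped `is_safe` label (a hypothetical stand-in for a label that would in practice come from simulating or testing the system). It is not the authors' method.

```python
import random

def sample_uniform(rng, dim, bound):
    """Broad coverage: draw a state uniformly from the box [-bound, bound]^dim."""
    return [rng.uniform(-bound, bound) for _ in range(dim)]

def sample_near_origin(rng, dim, sigma):
    """Focused coverage: draw a state from a Gaussian centred on the origin."""
    return [rng.gauss(0.0, sigma) for _ in range(dim)]

def is_safe(state, radius):
    """Hypothetical safety label: True if the state lies inside a ball.
    A real label would come from rolling out the system dynamics."""
    return sum(x * x for x in state) <= radius * radius

def generate_dataset(n, mix, dim=6, bound=1.0, sigma=0.2, radius=0.8, seed=0):
    """Combine the two samplers: the first round(n * mix) states are uniform,
    the remainder are concentrated near the origin.
    Returns a list of (state, label) pairs."""
    rng = random.Random(seed)
    n_uniform = round(n * mix)
    data = []
    for i in range(n):
        if i < n_uniform:
            state = sample_uniform(rng, dim, bound)
        else:
            state = sample_near_origin(rng, dim, sigma)
        data.append((state, is_safe(state, radius)))
    return data

# Half broad samples, half samples concentrated near the origin.
dataset = generate_dataset(n=1000, mix=0.5)
```

In this sketch, raising `mix` spends more of the sampling budget on broad coverage of the state space, while lowering it concentrates samples in the region the second sampler favors. That ratio is one simple way to express the trade-off the abstract describes between learning performance and the risk of being unsafe.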

Taxonomy

Topics: Fault Detection and Control Systems · Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control