Data Generation Method for Learning a Low-dimensional Safe Region in Safe Reinforcement Learning
Zhehua Zhou, Ozgur S. Oguz, Yi Ren, Marion Leibold, Martin Buss

TL;DR
This paper proposes a data generation method combining two sampling techniques to improve safety estimation in high-dimensional safe reinforcement learning, demonstrated on a three-link inverted pendulum.
Contribution
The paper introduces a data generation approach that balances learning performance against the risk of unsafe behavior, so that the low-dimensional safe region is estimated from more representative training data.
Findings
Enhanced safety estimates with the proposed sampling method
Improved learning performance in high-dimensional systems
Validated on a three-link inverted pendulum example
Abstract
Safe reinforcement learning aims to learn a control policy while ensuring that neither the system nor the environment gets damaged during the learning process. For implementing safe reinforcement learning on highly nonlinear and high-dimensional dynamical systems, one possible approach is to find a low-dimensional safe region via data-driven feature extraction methods, which provides safety estimates to the learning algorithm. As the reliability of the learned safety estimates is data-dependent, we investigate in this work how different training data will affect the safe reinforcement learning approach. By balancing between the learning performance and the risk of being unsafe, a data generation method that combines two sampling methods is proposed to generate representative training data. The performance of the method is demonstrated with a three-link inverted pendulum example.
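The idea of combining two sampling methods to build a training set can be illustrated with a minimal sketch. The abstract does not name the two samplers, so everything here is an assumption made for illustration: the uniform and Gaussian samplers, the `mix_ratio` parameter, the six-dimensional state (three joint angles plus three angular velocities), and the toy `is_safe` label are all hypothetical and are not the authors' method.

```python
import random


def sample_uniform(rng, low, high, dim):
    # Hypothetical sampler 1: broad coverage of the state space.
    return [rng.uniform(low, high) for _ in range(dim)]


def sample_near_origin(rng, scale, dim):
    # Hypothetical sampler 2: states concentrated around the origin.
    return [rng.gauss(0.0, scale) for _ in range(dim)]


def is_safe(state, limit):
    # Toy label (assumption): safe when every component stays within +/- limit.
    return all(abs(x) <= limit for x in state)


def generate_dataset(n, mix_ratio, dim=6, seed=0):
    """Return n (state, safe_label) pairs drawn from two samplers.

    mix_ratio is the fraction of samples taken from the uniform sampler;
    the remainder comes from the concentrated sampler.
    """
    rng = random.Random(seed)
    n_uniform = round(n * mix_ratio)
    data = []
    for i in range(n):
        if i < n_uniform:
            state = sample_uniform(rng, -2.0, 2.0, dim)
        else:
            state = sample_near_origin(rng, 0.3, dim)
        data.append((state, is_safe(state, limit=1.0)))
    return data
```

Calling `generate_dataset(100, 0.4)` would return 40 broadly spread states followed by 60 concentrated ones, each paired with its toy safety label. In this sketch, a larger `mix_ratio` spreads samples more widely across the state space, including states the toy check labels unsafe; this is only a stand-in for the trade-off the abstract describes, not a reproduction of it.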
Taxonomy
Topics: Fault Detection and Control Systems · Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control
