Physics-model-guided Worst-case Sampling for Safe Reinforcement Learning

Hongpeng Cao; Yanbing Mao; Lui Sha; Marco Caccamo

arXiv:2412.13224·cs.RO·December 19, 2024

TL;DR

This paper introduces a physics-guided worst-case sampling method for deep reinforcement learning to improve safety and robustness in critical corner cases of cyber-physical systems, validated through extensive experiments.

Contribution

It proposes a novel physics-model-guided worst-case sampling strategy integrated into Phy-DRL for safer, more efficient learning in safety-critical systems.

Findings

01. Enhanced sampling efficiency in training safe policies
02. Improved robustness of policies in safety-critical scenarios
03. Validated on multiple simulated and real robotic systems

Abstract

Real-world accidents in learning-enabled cyber-physical systems (CPS) frequently occur in challenging corner cases. When training a deep reinforcement learning (DRL) policy, the standard setup either fixes the training condition at a single initial condition or samples it uniformly from the admissible state space. This setup often overlooks the challenging but safety-critical corner cases. To bridge this gap, this paper proposes a physics-model-guided worst-case sampling strategy for training safe policies that can handle safety-critical cases toward guaranteed safety. Furthermore, we integrate the proposed worst-case sampling strategy into the physics-regulated deep reinforcement learning (Phy-DRL) framework to build a more data-efficient and safe learning algorithm for safety-critical CPS. We validate the proposed training strategy with Phy-DRL through extensive experiments on a simulated cart-pole…
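To make the contrast in the abstract concrete, the sketch below compares uniform sampling of initial conditions with sampling on the boundary of a safety region. It is an illustration only and not the authors' algorithm: it assumes the safe region is an ellipsoid {s : sᵀPs ≤ 1} for some positive-definite matrix P, which this page does not state, and the function names `sample_uniform` and `sample_envelope_boundary` are hypothetical.

```python
import numpy as np


def sample_uniform(low, high, n, rng):
    """Baseline: draw n initial states uniformly from the box [low, high]."""
    return rng.uniform(low, high, size=(n, len(low)))


def sample_envelope_boundary(P, n, rng):
    """Draw n states on the boundary {s : s^T P s = 1} of an ellipsoid.

    Assumption (not from the source page): the safety region is an
    ellipsoid defined by a symmetric positive-definite matrix P, and
    "worst-case" initial conditions are taken to lie on its boundary.
    The samples are not uniformly distributed over the surface.
    """
    d = P.shape[0]
    L = np.linalg.cholesky(P)                 # P = L @ L.T
    u = rng.standard_normal((n, d))
    u /= np.linalg.norm(u, axis=1, keepdims=True)   # unit directions
    # Solve L.T @ s = u for each direction, so s^T P s = u^T u = 1.
    return np.linalg.solve(L.T, u.T).T


rng = np.random.default_rng(0)
P = np.array([[4.0, 0.0], [0.0, 1.0]])        # hypothetical envelope matrix
corner_cases = sample_envelope_boundary(P, 5, rng)
uniform_cases = sample_uniform(np.array([-1.0, -1.0]), np.array([1.0, 1.0]), 5, rng)
```

Under these assumptions, every row of `corner_cases` satisfies sᵀPs = 1, so each training episode starts at the edge of the assumed safe region, whereas rows of `uniform_cases` rarely land there.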

Taxonomy

Topics: Reinforcement Learning in Robotics