Physics-Regulated Deep Reinforcement Learning: Invariant Embeddings

Hongpeng Cao; Yanbing Mao; Lui Sha; Marco Caccamo

arXiv:2305.16614 · cs.AI · July 9, 2024 · 1 citation

TL;DR

This paper introduces Phy-DRL, a physics-regulated deep reinforcement learning framework that ensures safety and efficiency in autonomous systems through invariant embeddings and physics-based neural network modifications.

Contribution

It presents a novel invariant-embedding design for DRL that integrates physics models, providing provable safety guarantees and improved training efficiency.

Findings

01. Validated safety guarantees on cart-pole and quadruped systems.

02. Achieved faster training with fewer parameters compared to traditional DRL.

03. Demonstrated strict physics compliance in neural network design.

Abstract

This paper proposes the Phy-DRL: a physics-regulated deep reinforcement learning (DRL) framework for safety-critical autonomous systems. The Phy-DRL has three distinguished invariant-embedding designs: i) residual action policy (i.e., integrating data-driven-DRL action policy and physics-model-based action policy), ii) automatically constructed safety-embedded reward, and iii) physics-model-guided neural network (NN) editing, including link editing and activation editing. Theoretically, the Phy-DRL exhibits 1) a mathematically provable safety guarantee and 2) strict compliance of critic and actor networks with physics knowledge about the action-value function and action policy. Finally, we evaluate the Phy-DRL on a cart-pole system and a quadruped robot. The experiments validate our theoretical results and demonstrate that Phy-DRL features guaranteed safety compared to purely…
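The residual action policy named in design (i) can be illustrated with a minimal sketch. It assumes the physics-model-based action is a linear state feedback `F @ s` on a linearised model; the matrices `A`, `B`, `F`, `P`, the function names, and the Lyapunov-like reward are hypothetical choices made for illustration and are not taken from the paper.

```python
import numpy as np

def physics_action(F, s):
    """Model-based action a_phy = F s (linear state feedback; F is an assumed gain)."""
    return F @ s

def residual_action(a_drl, F, s):
    """Residual policy: data-driven DRL action plus physics-model-based action."""
    return a_drl + physics_action(F, s)

def lyapunov_like_reward(P, s, s_next):
    """Hypothetical safety-oriented reward: positive when s^T P s decreases in one step."""
    return float(s @ P @ s - s_next @ P @ s_next)

# Toy discrete-time double integrator (assumed, not a system from the paper).
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.0],
              [0.1]])
F = np.array([[-1.0, -1.5]])   # hypothetical feedback gain
P = np.eye(2)                  # hypothetical positive-definite matrix

s = np.array([1.0, 0.0])       # current state
a_drl = np.array([0.0])        # stand-in for the neural network's output

a = residual_action(a_drl, F, s)   # terminal action applied to the system
s_next = A @ s + B @ a             # one step of the linear model
r = lyapunov_like_reward(P, s, s_next)
```

With `a_drl` fixed at zero the terminal action reduces to the model-based term alone, which is why such a policy has a sensible starting behaviour before any training; the learned term then only has to correct what the linear model misses.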

Peer Reviews

Decision: ICLR 2024 spotlight

Reviewer 01 · Rating 6 (marginally above the acceptance threshold) · Confidence 3

Strengths

- Considers safety information and known system dynamics in the final policy.
- Introduces novel scheme for neural network policy and action-value function editing using the known dynamics of the environment.

Weaknesses

- The paper addresses the case of RL with partially known environment dynamics but does not adequately consider comparative approaches such as model-based RL [1, 2] or ODE-regularized RL [3] in all the experiments.
- The effect of each of the invariant embeddings is not shown empirically. An included ablation analysis would be useful to determine this.
- Presentation could be improved slightly, perhaps by including some pseudo code in the Appendix.

[1] End-to-End Safe Reinforcement Learnin

Reviewer 02 · Rating 8 (accept, good paper) · Confidence 2

Strengths

Addressing Safety in DRL: The paper introduces a DRL framework specifically designed for safety-critical autonomous systems, highlighting the significance of adhering to physical laws in AI applications.

Mathematically-Provable Safety Guarantee: One of the notable features of the Phy-DRL is its mathematically provable safety guarantee, which is crucial for real-world applications where safety is paramount. (although I am taking the claim of authors at face value and couldn't do a thorough analy

Weaknesses

What I feel is missing is a definition of invariant embeddings. What are they invariant to? How does adding a residual to a model-based policy make it invariant (and again, invariant to what? The state space?) Where do the invariant-embedding principles come from? (Please cite any papers.) The lack of real-world results is also concerning, given the efforts by the community to test robots in the real world. The presentation also needs to be improved. Specifically, Important terms

Reviewer 03 · Rating 8 (accept, good paper) · Confidence 3

Strengths

1. The paper is well-motivated and well-organized. It is in general easy to follow.
2. The approach, as far as the reviewer can tell, is sound and solid with remarkable provable safety guarantees.
3. This framework is novel, specifically the combination of RL with a linearised model and a systematic reward construction approach to achieve provable safety guarantees.
4. The experiments are significant compared to pure DRL and pure model-based approach.

Weaknesses

1. In general, the paper is easy to follow. However, Section 6 can be a bit confusing to the reviewer and needs back-and-forth checking while reading. This section can be improved by adding more overview, intuition, and connecting intro between and among subsections. It is somewhat difficult to accept a bunch of symbols such as in Equation (13), (14), (15), (16).
2. It is somewhat hard to understand Algorithms (1) and (2). There is a lot of white space in Algorithm (1) and (2), why don't use i

Code & Models

Repositories

hp-cao/phy_rl · Official

Videos

Physics-Regulated Deep Reinforcement Learning: Invariant Embeddings · SlidesLive

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Fault Detection and Control Systems