Automatic Exploration Process Adjustment for Safe Reinforcement Learning with Joint Chance Constraint Satisfaction
Yoshihiro Okawa, Tomotake Sasaki, Hidenao Iwane

TL;DR
This paper introduces a method that automatically adjusts exploration in safe reinforcement learning, so that state constraints are satisfied with high probability while learning in continuous state and action spaces.
Contribution
The paper proposes an exploration adjustment technique that switches the exploratory input on or off at each time step and tunes the variance-covariance matrix of the Gaussian exploration policy, with a theoretical guarantee of joint chance constraint satisfaction.
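For orientation, a joint chance constraint in its generic form requires that all state constraints hold simultaneously over a whole horizon with at least a prescribed probability. The notation below is assumed for illustration and is not taken from the paper: $x_k$ is the state at time $k$, $h_j^\top x \le d_j$ are $n_c$ linear state constraints, $T$ is the horizon, and $\eta$ is the required probability level.

```latex
\Pr\Bigl\{\, h_j^\top x_k \le d_j \quad \forall j \in \{1,\dots,n_c\},\ \forall k \in \{1,\dots,T\} \,\Bigr\} \;\ge\; \eta
```

This is stricter than bounding the violation probability of each constraint at each time step separately, because the single probability covers every constraint and every step at once.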
Findings
The method theoretically guarantees constraint satisfaction with high probability.
Numerical simulations demonstrate the method's effectiveness and safety.
Automatic adjustment improves learning stability.
Abstract
In reinforcement learning (RL) algorithms, exploratory control inputs are used during learning to acquire knowledge for decision making and control while the true dynamics of the controlled object are unknown. However, this exploration sometimes causes undesired situations by violating constraints on the state of the controlled object. In this paper, we propose an automatic exploration process adjustment method for safe RL in continuous state and action spaces that utilizes a linear nominal model of the controlled object. Specifically, at each time step our proposed method automatically selects whether or not to use the exploratory input, depending on the state and its predicted value, and also adjusts the variance-covariance matrix used in the Gaussian policy for exploration. We also show that our exploration process adjustment method theoretically guarantees the satisfaction of…
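The switching and variance adjustment described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm and does not reproduce their joint guarantee: it assumes the linear nominal model is exact, handles a single linear constraint `h @ x <= d` over one step only, and uses isotropic Gaussian exploration noise. The function name `exploration_std` and the parameters `delta` (allowed one-step violation probability) and `sigma_max` (largest exploration standard deviation) are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def exploration_std(x, u_nominal, A, B, h, d, delta, sigma_max):
    """Pick an exploration standard deviation for one step (0.0 = exploration off).

    Assumptions (illustrative, not from the paper): the next state is exactly
    A @ x + B @ u, the applied input is u_nominal + eps with
    eps ~ N(0, sigma^2 I), and 0 < delta < 0.5.
    """
    # One-step prediction with the noise-free input.
    x_pred = A @ x + B @ u_nominal
    # Slack of the constraint h @ x <= d at the predicted state.
    margin = d - h @ x_pred
    if margin <= 0:
        # Predicted state already violates the constraint: switch exploration off.
        return 0.0
    # h @ x_next is Gaussian with mean h @ x_pred and std sigma * ||B^T h||.
    gain = np.linalg.norm(B.T @ h)
    if gain == 0:
        # The input noise cannot move the constrained quantity in one step.
        return sigma_max
    # Largest sigma with P(h @ x_next > d) <= delta.
    z = NormalDist().inv_cdf(1.0 - delta)
    return min(sigma_max, margin / (z * gain))


# Hypothetical double-integrator-like model with a bound on the second state.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
h = np.array([0.0, 1.0])
d = 1.0
u0 = np.array([0.0])

sigma_far = exploration_std(np.array([0.0, 0.5]), u0, A, B, h, d, 0.05, 1.0)
sigma_near = exploration_std(np.array([0.0, 0.99]), u0, A, B, h, d, 0.05, 1.0)
sigma_out = exploration_std(np.array([0.0, 1.2]), u0, A, B, h, d, 0.05, 1.0)
```

In this sketch the noise scale shrinks as the predicted state approaches the constraint boundary and drops to zero once the prediction lies outside it, which mirrors the on/off selection and variance tuning in spirit only. Handling model error and a guarantee over the whole horizon would require the analysis given in the paper.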
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Multi-Objective Optimization Algorithms · Advanced Control Systems Optimization
