Automatic Exploration Process Adjustment for Safe Reinforcement Learning with Joint Chance Constraint Satisfaction

Yoshihiro Okawa; Tomotake Sasaki; Hidenao Iwane

arXiv:2103.03656 · cs.LG · March 8, 2021

Open Access

TL;DR

This paper introduces a method that automatically adjusts exploration in safe reinforcement learning, so that state constraints are satisfied with high probability while learning in continuous state and action spaces.

Contribution

It proposes an exploration adjustment technique that switches the exploratory input on or off at each time step and tunes the variance-covariance matrix of the Gaussian policy, guaranteeing joint chance constraint satisfaction.

Findings

01. The method guarantees constraint satisfaction with high probability.

02. Numerical simulations demonstrate effectiveness and safety.

03. Automatic adjustment improves learning stability.

Abstract

In reinforcement learning (RL) algorithms, exploratory control inputs are used during learning to acquire knowledge for decision making and control while the true dynamics of the controlled object are unknown. However, this exploration sometimes leads to undesired situations in which constraints on the state of the controlled object are violated. In this paper, we propose an automatic exploration process adjustment method for safe RL in continuous state and action spaces that uses a linear nominal model of the controlled object. Specifically, the proposed method automatically decides at each time step whether to use the exploratory input, depending on the state and its predicted value, and also adjusts the variance-covariance matrix used in the Gaussian policy for exploration. We also show that our exploration process adjustment method theoretically guarantees the satisfaction of…
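
The switching idea described in the abstract can be illustrated with a small sketch. This is not the authors' algorithm, only a one-dimensional toy under stated assumptions: a scalar nominal model x_next = a*x + b*u with b nonzero, a hypothetical box constraint |x| <= x_max, and a scalar standard deviation standing in for the variance-covariance matrix. The function name `choose_input` and the parameters `sigma_max` and `n_sigma` are invented for illustration.

```python
import random


def choose_input(x, u_det, a, b, x_max, sigma_max, n_sigma=3.0, rng=random):
    """Return (u, sigma): the applied input and the exploration std used.

    Assumptions (not from the paper): scalar nominal model
    x_next = a*x + b*u with b != 0, and state constraint |x| <= x_max.
    """
    # Predict the next state with the nominal model and no exploration.
    x_pred = a * x + b * u_det
    # Distance left between the predicted state and the constraint bound.
    margin = x_max - abs(x_pred)
    if margin <= 0.0:
        # Prediction is already at or beyond the bound: exploration off.
        return u_det, 0.0
    # Shrink the std so that n_sigma standard deviations of input noise,
    # mapped through b, still fit inside the remaining margin.
    sigma = min(sigma_max, margin / (n_sigma * abs(b)))
    # Exploration on: add zero-mean Gaussian noise to the deterministic input.
    return u_det + rng.gauss(0.0, sigma), sigma


if __name__ == "__main__":
    rng = random.Random(0)
    for x in (0.0, 2.7, 3.5):
        u, sigma = choose_input(x, u_det=0.0, a=1.0, b=1.0,
                                x_max=3.0, sigma_max=0.5, rng=rng)
        print(f"x={x:.1f}  sigma={sigma:.3f}  u={u:.3f}")
```

Far from the bound the sketch explores with the full `sigma_max`, near the bound it shrinks the noise, and once the nominal prediction leaves the admissible region it applies only the deterministic input. The sketch looks one step ahead and treats the nominal model as exact, so it does not by itself provide the joint chance constraint guarantee that the paper establishes.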

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Multi-Objective Optimization Algorithms · Advanced Control Systems Optimization