PPO-UE: Proximal Policy Optimization via Uncertainty-Aware Exploration

Qisheng Zhang; Zhen Guo; Audun Jøsang; Lance M. Kaplan; Feng Chen; Dong H. Jeong; Jin-Hee Cho

arXiv:2212.06343·cs.LG·December 14, 2022

Open Access

TL;DR

This paper introduces PPO-UE, an enhanced version of PPO that incorporates self-adaptive uncertainty-aware exploration to improve training stability, convergence speed, and performance in continuous control tasks.

Contribution

PPO-UE is a novel PPO variant that adapts its exploration to a ratio uncertainty level, addressing PPO's training-stability issues and improving performance.

Findings

01. PPO-UE outperforms baseline PPO in Roboschool continuous control tasks.

02. Sensitivity analysis over the ratio uncertainty level shows that a well-chosen level improves results.

03. PPO-UE improves convergence speed and training stability.

Abstract

Proximal Policy Optimization (PPO) is a highly popular policy-based deep reinforcement learning (DRL) approach. However, we observe that the homogeneous exploration process in PPO can cause unexpected instability during training. To address this issue, we propose PPO-UE, a PPO variant equipped with self-adaptive uncertainty-aware explorations (UEs) based on a ratio uncertainty level. PPO-UE is designed to improve convergence speed and performance with an optimized ratio uncertainty level. Extensive sensitivity analysis over the ratio uncertainty level shows that PPO-UE considerably outperforms the baseline PPO in Roboschool continuous control tasks.
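The abstract assumes familiarity with PPO itself. As background only, here is a minimal sketch of standard PPO's clipped surrogate objective, built on the probability ratio r_t(θ) = π_θ(a_t | s_t) / π_θ_old(a_t | s_t) and a clip range ε (0.2 here, a common default assumed for illustration). It is not PPO-UE: the paper's ratio uncertainty level and exploration rule are not defined on this page, so nothing below models them. The function names `ppo_clip_objective` and `clip_fraction` are hypothetical; `clip_fraction` mirrors a common PPO training diagnostic (the share of ratios falling outside the clip range) and is included only to show how ratio statistics can be monitored.

```python
# Background sketch of standard PPO (NOT PPO-UE); function names are hypothetical.
import math
from typing import Sequence


def _check_lengths(*seqs: Sequence[float]) -> int:
    """Ensure all batches are non-empty and the same length; return that length."""
    n = len(seqs[0])
    if n == 0 or any(len(s) != n for s in seqs):
        raise ValueError("inputs must be non-empty and of equal length")
    return n


def ppo_clip_objective(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    advantages: Sequence[float],
    eps: float = 0.2,
) -> float:
    """Mean clipped surrogate objective of standard PPO (to be maximized)."""
    n = _check_lengths(logp_new, logp_old, advantages)
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        # Probability ratio r_t = pi_new(a|s) / pi_old(a|s), computed in log space.
        ratio = math.exp(lp_new - lp_old)
        # Clip the ratio to [1 - eps, 1 + eps].
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        # Take the pessimistic (lower) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / n


def clip_fraction(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    eps: float = 0.2,
) -> float:
    """Fraction of samples whose ratio lies outside [1 - eps, 1 + eps]."""
    n = _check_lengths(logp_new, logp_old)
    outside = sum(
        abs(math.exp(lp_new - lp_old) - 1.0) > eps
        for lp_new, lp_old in zip(logp_new, logp_old)
    )
    return outside / n
```

In a real training loop the negated objective would serve as the policy loss minimized by gradient descent over autodiff tensors; plain floats are used here so the sketch only evaluates the value. For example, a sample with ratio 1.5 and advantage 1.0 contributes 1.2, because the ratio is clipped at 1 + ε.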

Taxonomy

Topics: Reinforcement Learning in Robotics

Methods: Entropy Regularization · Proximal Policy Optimization