Proximal Policy Optimization with Adaptive Threshold for Symmetric Relative Density Ratio

Taisuke Kobayashi

arXiv:2203.09809·cs.LG·July 4, 2023

TL;DR

This paper introduces PPO-RPE, a novel reinforcement learning method that uses a symmetric relative density ratio with an adaptive threshold to improve policy regularization and stability in complex environments.

Contribution

It proposes a new PPO variant utilizing relative Pearson divergence to adaptively set the threshold based on symmetry, enhancing policy update control.

Findings

01 The adaptive threshold improves policy regularization effectiveness.

02 PPO-RPE outperforms traditional methods in benchmark simulations.

03 The method statistically enhances task success in locomotion tasks.

Abstract

Deep reinforcement learning (DRL) is a promising approach for introducing robots into complicated environments. Recent progress in DRL rests on policy regularization, which allows the policy to improve stably and efficiently. A popular method, proximal policy optimization (PPO), and its variants constrain the density ratio between the latest and baseline policies when that ratio exceeds a given threshold. This threshold can be designed relatively intuitively, and a recommended range of values has been suggested. However, the density ratio is asymmetric about its center, and the possible error scale from that center, which should be close to the threshold, depends on how the baseline policy is given. To maximize the value of policy regularization, this paper proposes a new PPO derived using relative Pearson (RPE) divergence,…
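
The asymmetry mentioned in the abstract can be illustrated with a small numerical sketch. The code below is not taken from the paper: it uses the standard clipped surrogate objective of PPO and the generic relative density ratio from the density-ratio estimation literature, r / (beta * r + (1 - beta)), as an assumed stand-in for the quantity the paper builds on. The function names, the mixing weight `beta`, and the threshold `eps` are illustrative choices.

```python
import numpy as np


def density_ratio(logp_new, logp_old):
    """Plain density ratio r = pi_new(a|s) / pi_old(a|s), range (0, inf)."""
    return np.exp(logp_new - logp_old)


def relative_density_ratio(logp_new, logp_old, beta=0.5):
    """Generic relative density ratio p / (beta * p + (1 - beta) * q).

    Assumption: this is the textbook definition, written in terms of
    r = p / q. It is bounded above by 1 / beta, so beta = 0.5 gives the
    range (0, 2), which is centered on 1.
    """
    r = density_ratio(logp_new, logp_old)
    return r / (beta * r + (1.0 - beta))


def ppo_clip_objective(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate for one sample (to be maximized)."""
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return np.minimum(ratio * advantage, clipped * advantage)


# Equal-sized shifts of the log-probability in opposite directions.
for delta in (0.5, 1.0, 2.0):
    r_up = density_ratio(delta, 0.0)
    r_down = density_ratio(0.0, delta)
    rr_up = relative_density_ratio(delta, 0.0)
    rr_down = relative_density_ratio(0.0, delta)
    # The plain ratio deviates from 1 by different amounts in each direction;
    # the relative ratio with beta = 0.5 deviates by the same amount.
    print(f"delta={delta}: plain {r_up - 1:+.3f} / {r_down - 1:+.3f}, "
          f"relative {rr_up - 1:+.3f} / {rr_down - 1:+.3f}")
```

With `beta = 0.5`, the two relative ratios for opposite log-probability shifts always sum to 2, so a single threshold bounds deviations on both sides of the center equally. For the plain ratio, the same threshold permits a larger multiplicative change in one direction than in the other.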

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Multi-Objective Optimization Algorithms

Methods: Entropy Regularization · Proximal Policy Optimization