Proximal Policy Optimization with Adaptive Threshold for Symmetric Relative Density Ratio
Taisuke Kobayashi

TL;DR
This paper introduces PPO-RPE, a novel reinforcement learning method that uses a symmetric relative density ratio with an adaptive threshold to improve policy regularization and stability in complex environments.
Contribution
It proposes a PPO variant derived from the relative Pearson (RPE) divergence, in which the symmetry of the relative density ratio is used to set the threshold adaptively and so improve control over policy updates.
Findings
The adaptive threshold improves policy regularization effectiveness.
PPO-RPE outperforms traditional methods in benchmark simulations.
The method statistically improves success on locomotion tasks.
Abstract
Deep reinforcement learning (DRL) is a promising approach for introducing robots into complicated environments. The recent remarkable progress of DRL rests on policy regularization, which allows the policy to improve stably and efficiently. A popular method, proximal policy optimization (PPO), and its variants constrain the density ratio of the latest and baseline policies when that ratio exceeds a given threshold. This threshold can be designed relatively intuitively, and a recommended range of values has in fact been suggested. However, the density ratio is asymmetric about its center, and the possible error scale from its center, which should be close to the threshold, depends on how the baseline policy is given. In order to maximize the value of policy regularization, this paper proposes a new PPO derived using relative Pearson (RPE) divergence,…
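The asymmetry the abstract refers to can be illustrated with a small sketch. It is not the paper's PPO-RPE loss or its adaptive threshold rule, neither of which appears in this excerpt. It assumes the widely used PPO clipped surrogate and the standard relative density ratio p / (beta*p + (1-beta)*q); the function names, the mixture weight `beta = 0.5`, and the threshold `eps = 0.2` are illustrative choices, not values taken from the paper.

```python
def clipped_surrogate(ratio, advantage, eps=0.2):
    """Common PPO clipped surrogate for a single sample.

    ratio     : latest policy density / baseline policy density
    advantage : advantage estimate for the sampled action
    eps       : clipping threshold (illustrative default)
    """
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped * advantage)


def relative_ratio(ratio, beta=0.5):
    """Relative density ratio p / (beta*p + (1-beta)*q), with ratio = p/q.

    beta is an assumed mixture weight; beta = 0.5 makes the result
    symmetric about 1 when the two densities swap roles.
    """
    return ratio / (beta * ratio + (1.0 - beta))


# Swapping the roles of the two policies maps ratio -> 1/ratio.
# The raw ratio moves a different distance from its center (1.0) in each
# direction, while the relative ratio with beta = 0.5 moves the same distance.
for ratio in (0.5, 2.0):
    raw_dev = abs(ratio - 1.0)
    rel_dev = abs(relative_ratio(ratio) - 1.0)
    print(f"ratio={ratio}: raw deviation={raw_dev:.4f}, relative deviation={rel_dev:.4f}")
```

With the raw ratio, a single threshold treats a doubling and a halving of the action probability very differently. With the relative ratio at `beta = 0.5`, the two cases sit equally far from the center, which is the kind of symmetry that makes an error scale easier to reason about.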
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Multi-Objective Optimization Algorithms
Methods: Entropy Regularization · Proximal Policy Optimization
