Directional-Clamp PPO
Gilad Karpel, Ruida Zhou, Shoham Sabach, Mohammad Ghavamzadeh

TL;DR
The paper introduces Directional-Clamp PPO (DClamp-PPO), a reinforcement learning algorithm that penalizes importance ratios moving in the "wrong" direction (falling for positive-advantage actions, rising for negative-advantage ones), leading to more stable and effective policy optimization.
Contribution
It proposes a new penalty mechanism in PPO that steers importance ratios away from wrong directions, improving stability and performance in continuous control tasks.
Findings
DClamp-PPO outperforms standard PPO and variants across MuJoCo environments.
The method reduces wrong-direction updates and maintains importance ratios closer to 1.
Theoretical analysis confirms improved stability of the optimization process.
Abstract
Proximal Policy Optimization (PPO) is widely regarded as one of the most successful deep reinforcement learning algorithms, known for its robustness and effectiveness across a range of problems. The PPO objective encourages the importance ratio between the current and behavior policies to move in the "right" direction -- starting from importance sampling ratios equal to 1, increasing the ratios for actions with positive advantages and decreasing those for actions with negative advantages. A clipping function is introduced to prevent over-optimization when updating the importance ratio in these "right" direction regions. Many PPO variants have been proposed to extend its success, most of which modify the objective's behavior by altering the clipping in the "right" direction regions. However, due to randomness in the rollouts and stochasticity of the policy optimization, we observe that the ratios…
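To make the "right" and "wrong" direction terminology concrete, here is a minimal Python sketch of the standard PPO clipped surrogate together with a check for wrong-direction samples. It is not the DClamp-PPO objective: the form of the paper's penalty is not given in this summary, so it is left out. The function names, the clip range `eps=0.2`, and the sample values are illustrative assumptions.

```python
def clipped_surrogate(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate for a single sample (to be maximized).

    ratio     : pi_new(a|s) / pi_old(a|s), the importance ratio
    advantage : estimated advantage of the sampled action
    eps       : clip range (0.2 is a common default, assumed here)
    """
    clipped = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped * advantage)


def is_wrong_direction(ratio, advantage):
    """True when the ratio moved against the sign of the advantage:
    below 1 for a positive advantage, or above 1 for a negative one."""
    return (advantage > 0 and ratio < 1.0) or (advantage < 0 and ratio > 1.0)


def wrong_direction_fraction(ratios, advantages):
    """Share of samples in a batch whose ratio moved in the wrong direction."""
    flags = [is_wrong_direction(r, a) for r, a in zip(ratios, advantages)]
    return sum(flags) / len(flags)


# Hypothetical batch: two positive-advantage and two negative-advantage samples.
ratios = [1.3, 0.9, 0.7, 1.1]
advantages = [1.0, 1.0, -1.0, -1.0]

surrogates = [clipped_surrogate(r, a) for r, a in zip(ratios, advantages)]
fraction = wrong_direction_fraction(ratios, advantages)
```

In this batch the first and third samples moved in the right direction and are clipped at 1 + eps and 1 - eps respectively. The second and fourth moved in the wrong direction, where the standard clip has no effect on the surrogate; this is the region the summary above says DClamp-PPO targets with an additional penalty.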
Taxonomy
Topics: Reinforcement Learning in Robotics · Stochastic Gradient Optimization Techniques · Advanced Multi-Objective Optimization Algorithms
