Directional-Clamp PPO

Gilad Karpel; Ruida Zhou; Shoham Sabach; Mohammad Ghavamzadeh

arXiv:2511.02577 · cs.LG · November 5, 2025

TL;DR

The paper introduces Directional-Clamp PPO, a novel reinforcement learning algorithm that penalizes updates moving in the wrong direction, leading to more stable and effective policy optimization.

Contribution

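It proposes a penalty mechanism for PPO that discourages importance ratios from moving in the "wrong" direction (decreasing for actions with positive advantages, or increasing for actions with negative advantages), improving stability and performance on continuous-control tasks.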
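A minimal per-sample sketch of what a direction-aware penalty could look like. This uses assumptions not taken from the paper: the standard PPO clipped term is kept unchanged, and once the ratio moves past a hypothetical threshold `beta` in the wrong direction (below `1 - beta` for a positive advantage, above `1 + beta` for a negative one), the surrogate's slope is multiplied by a hypothetical factor `alpha > 1`. The function name and both parameters are illustrative only, not the authors' exact objective.

```python
def directional_clamp_objective(ratio, advantage, eps=0.2, beta=0.2, alpha=3.0):
    """Hypothetical per-sample surrogate to be MAXIMIZED.

    Assumption (illustrative, not the paper's exact form): PPO's clipped
    term plus a steeper slope alpha * A once the ratio has moved past
    1 - beta (A > 0) or 1 + beta (A < 0) in the wrong direction.
    """
    if alpha < 1.0:
        raise ValueError("alpha must be >= 1 so the extra term is a penalty")
    # Standard PPO clipped surrogate for one sample.
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    ppo = min(ratio * advantage, clipped * advantage)

    # Pick the wrong-direction pivot, if the sample has crossed it.
    if advantage > 0 and ratio < 1.0 - beta:
        pivot = 1.0 - beta  # good action whose probability shrank too much
    elif advantage < 0 and ratio > 1.0 + beta:
        pivot = 1.0 + beta  # bad action whose probability grew too much
    else:
        return ppo  # right direction or inside the band: plain PPO

    # In these regions PPO's active term is ratio * advantage, so adding
    # (alpha - 1) * A * (ratio - pivot) raises the slope to alpha * A while
    # staying continuous at ratio == pivot. The added term is always <= 0.
    return ppo + (alpha - 1.0) * advantage * (ratio - pivot)


# A positive-advantage action whose ratio fell to 0.5 is penalized below
# PPO's value of 0.5; the right-direction case matches PPO exactly.
print(directional_clamp_objective(0.5, 1.0))  # about -0.1
print(directional_clamp_objective(1.5, 1.0))  # 1.2, same as clipped PPO
```

Because the extra term activates only in the wrong-direction regions, right-direction behavior is identical to PPO clipping; the steeper slope makes a gradient-ascent step push the ratio back toward 1 more strongly than PPO alone would.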

Findings

01

DClamp-PPO outperforms standard PPO and its variants across MuJoCo environments.

02

The method reduces wrong-direction updates and maintains importance ratios closer to 1.

03

Theoretical analysis confirms improved stability of the optimization process.
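Finding 02 refers to two quantities that are easy to log during training. The small sketch below assumes "wrong direction" means a ratio below 1 for a positive advantage or above 1 for a negative advantage (the reading given in the abstract) and computes, for one batch, the fraction of such samples and the mean distance of the ratios from 1. The function name is hypothetical.

```python
def ratio_diagnostics(ratios, advantages):
    """Return (wrong_direction_fraction, mean_abs_deviation_from_1).

    Assumed definition: a sample moved in the wrong direction if its
    advantage is positive and its ratio is below 1, or its advantage is
    negative and its ratio is above 1.
    """
    if not ratios or len(ratios) != len(advantages):
        raise ValueError("need equal-length, non-empty ratio/advantage lists")
    wrong = sum(
        1
        for r, a in zip(ratios, advantages)
        if (a > 0 and r < 1.0) or (a < 0 and r > 1.0)
    )
    # Average |r - 1| measures how far the policy drifted from the behavior policy.
    deviation = sum(abs(r - 1.0) for r in ratios) / len(ratios)
    return wrong / len(ratios), deviation


frac, dev = ratio_diagnostics([1.1, 0.9, 1.2, 0.8], [1.0, 1.0, -1.0, -1.0])
print(frac, dev)  # 0.5 and roughly 0.15
```

Tracking both numbers over training is one way to check whether a method keeps fewer samples in the wrong-direction regions and keeps the ratios concentrated near 1.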

Abstract

Proximal Policy Optimization (PPO) is widely regarded as one of the most successful deep reinforcement learning algorithms, known for its robustness and effectiveness across a range of problems. The PPO objective encourages the importance ratio between the current and behavior policies to move in the "right" direction: starting from importance sampling ratios equal to 1, it increases the ratios for actions with positive advantages and decreases those with negative advantages. A clipping function is introduced to prevent over-optimization when updating the importance ratio in these "right" direction regions. Many PPO variants have been proposed to extend its success, most of which modify the objective's behavior by altering the clipping in the "right" direction regions. However, due to randomness in the rollouts and stochasticity of the policy optimization, we observe that the ratios…
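For reference, the standard PPO clipped surrogate is the batch mean of min(r·A, clip(r, 1 − ε, 1 + ε)·A), where r is the importance ratio and A the advantage. The plain-Python sketch below computes it together with the per-sample derivative with respect to r, which makes the asymmetry described above concrete: the gradient is zeroed only after the ratio has moved far enough in the "right" direction, while moves in the "wrong" direction are never clipped. Function names and the default ε = 0.2 are illustrative choices.

```python
def ppo_clip_objective(ratios, advantages, eps=0.2):
    """Batch mean of PPO's clipped surrogate (to be maximized)."""
    if not ratios or len(ratios) != len(advantages):
        raise ValueError("need equal-length, non-empty ratio/advantage lists")
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped = min(max(r, 1.0 - eps), 1.0 + eps)
        # The pessimistic minimum removes any incentive to push r past the clip range.
        total += min(r * a, clipped * a)
    return total / len(ratios)


def ppo_clip_grad(ratio, advantage, eps=0.2):
    """Derivative of min(r*A, clip(r)*A) with respect to r for one sample."""
    # Right direction, past the clip boundary: the clipped term is active, slope 0.
    if advantage > 0 and ratio > 1.0 + eps:
        return 0.0
    if advantage < 0 and ratio < 1.0 - eps:
        return 0.0
    # Otherwise r * A is active, including every wrong-direction move.
    return advantage


print(ppo_clip_objective([1.5, 0.5], [1.0, -1.0]))  # about 0.2
print(ppo_clip_grad(1.5, 1.0), ppo_clip_grad(0.5, 1.0))  # 0.0 1.0
```

A ratio of 0.5 on a positive-advantage action still receives the full gradient A, but nothing in the objective penalizes it beyond that linear slope; this is the gap that direction-aware variants target.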

Taxonomy

Topics: Reinforcement Learning in Robotics · Stochastic Gradient Optimization Techniques · Advanced Multi-Objective Optimization Algorithms