The Sufficiency of Off-Policyness and Soft Clipping: PPO is still Insufficient according to an Off-Policy Measure
Xing Chen, Dongcui Diao, Hechang Chen, Hengshuai Yao, Haiyin Piao, Zhixiao Sun, Zhiwei Yang, Randy Goebel, Bei Jiang, Yi Chang

TL;DR
This paper demonstrates that PPO's clipped policy space is insufficient by introducing a novel off-policy measure, showing that better policies exist outside PPO's constrained space, and proposing an exploration method that surpasses PPO in policy optimization.
Contribution
The paper introduces a new surrogate objective using sigmoid functions, revealing PPO's limitations and enabling exploration beyond the clipped policy space, improving CPI optimization.
Findings
PPO is insufficient in off-policyness according to the DEON metric.
The proposed method explores a larger policy space than PPO.
The proposed algorithm maximizes the CPI objective better than PPO during training.
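To make the notion of "off-policyness" concrete, here is a minimal sketch of one way to measure how far a new policy has moved from the behavior policy. This summary does not define DEON, so the measure below (the largest absolute deviation of the probability ratio from 1 over a batch) is an assumption chosen for illustration and may not match the paper's definition. The function names and the `eps` parameter are hypothetical.

```python
import math


def max_ratio_deviation(new_log_probs, old_log_probs):
    """Largest |pi_new(a|s) / pi_old(a|s) - 1| over a batch of samples.

    Assumed stand-in for an off-policyness measure; not taken from the paper.
    A value of 0 means the two policies agree on every sampled action.
    """
    return max(
        abs(math.exp(new_lp - old_lp) - 1.0)
        for new_lp, old_lp in zip(new_log_probs, old_log_probs)
    )


def stays_in_clip_range(new_log_probs, old_log_probs, eps=0.2):
    """True if every ratio lies inside [1 - eps, 1 + eps] (PPO's clip range)."""
    return max_ratio_deviation(new_log_probs, old_log_probs) <= eps


# Hypothetical batch: the second sample moves from probability 0.5 to 0.9.
old_lp = [math.log(0.5), math.log(0.5)]
new_lp = [math.log(0.5), math.log(0.9)]
deviation = max_ratio_deviation(new_lp, old_lp)
inside = stays_in_clip_range(new_lp, old_lp)
```

Under this assumed measure, a policy that remains inside the clip range has a deviation of at most `eps`, whereas a policy that explores outside it has a larger value.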
Abstract
The popular Proximal Policy Optimization (PPO) algorithm approximates the solution in a clipped policy space. Do better policies exist outside of this space? By using a novel surrogate objective that employs the sigmoid function (which provides an interesting way of exploration), we found that the answer is "YES", and the better policies are in fact located very far from the clipped space. We show that PPO is insufficient in "off-policyness", according to an off-policy metric called DEON. Our algorithm explores in a much larger policy space than PPO, and it maximizes the Conservative Policy Iteration (CPI) objective better than PPO during training. To the best of our knowledge, all current PPO methods have the clipping operation and optimize in the clipped policy space. Our method is the first of this kind, which advances the understanding of CPI optimization and policy…
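The contrast between hard clipping and a sigmoid-based surrogate can be sketched per sample as follows. The CPI and PPO-clip objectives are the standard ones. The soft-clipped form, `sigmoid(tau * (ratio - 1)) * (4 / tau) * advantage`, and the temperature `tau` are assumptions made for illustration, since this summary does not give the paper's exact objective. Gradients are estimated by central finite differences to keep the example dependency-free.

```python
import math


def cpi_surrogate(ratio, advantage):
    """Unclipped CPI objective for one sample: ratio * advantage."""
    return ratio * advantage


def ppo_clip_surrogate(ratio, advantage, eps=0.2):
    """Standard PPO clipped objective for one sample."""
    clipped = max(1.0 - eps, min(1.0 + eps, ratio))
    return min(ratio * advantage, clipped * advantage)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def soft_clip_surrogate(ratio, advantage, tau=4.0):
    """Assumed sigmoid-based surrogate (illustrative form, not from the summary).

    The 4 / tau factor makes the slope at ratio == 1 equal to the advantage,
    which matches the slope of the CPI objective at that point.
    """
    return sigmoid(tau * (ratio - 1.0)) * (4.0 / tau) * advantage


def grad_wrt_ratio(objective, ratio, advantage, h=1e-6):
    """Central finite-difference estimate of d(objective)/d(ratio)."""
    upper = objective(ratio + h, advantage)
    lower = objective(ratio - h, advantage)
    return (upper - lower) / (2.0 * h)


# A sample with positive advantage whose ratio is already outside the clip range.
ratio, advantage = 1.5, 1.0
ppo_grad = grad_wrt_ratio(ppo_clip_surrogate, ratio, advantage)
soft_grad = grad_wrt_ratio(soft_clip_surrogate, ratio, advantage)
```

In this sketch the PPO gradient vanishes once the ratio leaves `[1 - eps, 1 + eps]` in the direction favored by the advantage, so such samples stop pushing the policy further. The sigmoid form keeps a small but nonzero gradient everywhere, which is one plausible reading of how a soft-clipped objective can continue to explore beyond the clipped policy space.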
Taxonomy
Topics: Optimization and Search Problems · Age of Information Optimization
Methods: Entropy Regularization · Proximal Policy Optimization
