An Adaptive Clipping Approach for Proximal Policy Optimization

Gang Chen; Yiming Peng; Mengjie Zhang

arXiv:1804.06461 · cs.LG · April 20, 2018 · 25 cites

TL;DR

This paper introduces PPO-λ, an adaptive clipping method for PPO that enhances learning efficiency and reliability by dynamically adjusting policy updates, showing improved performance on Atari and control benchmarks.

Contribution

It proposes a novel adaptive clipping mechanism and a new algorithm, PPO-λ, which improves policy learning by dynamically controlling update magnitudes based on theoretical targets.

Findings

01 PPO-λ outperforms PPO on Atari game tasks.

02 Adaptive clipping improves policy update stability.

03 Empirical results demonstrate better overall performance.

Abstract

Very recently proximal policy optimization (PPO) algorithms have been proposed as first-order optimization methods for effective reinforcement learning. While PPO is inspired by the same learning theory that justifies trust region policy optimization (TRPO), PPO substantially simplifies algorithm design and improves data efficiency by performing multiple epochs of clipped policy optimization from sampled data. Although clipping in PPO stands for an important new mechanism for efficient and reliable policy update, it may fail to adaptively improve learning performance in accordance with the importance of each sampled state. To address this issue, a new surrogate learning objective featuring an adaptive clipping mechanism is proposed in this paper, enabling us to develop a new algorithm, known as PPO-λ. PPO-λ optimizes policies repeatedly based on a theoretical…
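For context on the clipping mechanism the abstract refers to, the following is a minimal sketch of the standard PPO clipped surrogate, not the adaptive PPO-λ objective, which the truncated abstract does not specify. The function name `clipped_surrogate`, the argument names, and the default `epsilon=0.2` are illustrative assumptions rather than details taken from this page.

```python
def clipped_surrogate(ratios, advantages, epsilon=0.2):
    """Average standard PPO clipped surrogate over a batch of samples.

    ratios     -- probability ratios pi_new(a|s) / pi_old(a|s), one per sample
    advantages -- advantage estimates, one per sample
    epsilon    -- fixed clipping range (assumed value; identical for every state)
    """
    total = 0.0
    for r, a in zip(ratios, advantages):
        # Restrict the ratio to [1 - epsilon, 1 + epsilon].
        clipped_r = min(max(r, 1.0 - epsilon), 1.0 + epsilon)
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(r * a, clipped_r * a)
    return total / len(ratios)


if __name__ == "__main__":
    # Hypothetical batch of two samples, purely for illustration.
    print(clipped_surrogate([1.5, 0.5], [1.0, -1.0]))
```

Note that `epsilon` is the same for every sampled state in this sketch. That uniform treatment is the behaviour the abstract identifies as the limitation an adaptive clipping mechanism is meant to address.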

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Evolutionary Algorithms and Applications

Methods: Entropy Regularization · Proximal Policy Optimization