Clipped-Objective Policy Gradients for Pessimistic Policy Optimization

Jared Markowitz; Edward W. Staley

arXiv:2311.05846 · cs.LG · November 13, 2023 · 2 cites

TL;DR

This paper introduces a simple clipped-objective policy gradient (COPG) method that promotes pessimistic policy updates, which enhances exploration and improves learning performance over PPO in deep reinforcement learning, especially in continuous action spaces.

Contribution

The paper proposes a clipped-objective policy gradient whose objective is more pessimistic than that of PPO and promotes exploration, leading to better performance than PPO and results comparable to or better than TRPO.

Findings

01. COPG improves learning performance over PPO in various settings.

02. The pessimistic objective promotes enhanced exploration.

03. COPG achieves results comparable or superior to TRPO.

Abstract

To facilitate efficient learning, policy gradient approaches to deep reinforcement learning (RL) are typically paired with variance reduction measures and strategies for making large but safe policy changes based on a batch of experiences. Natural policy gradient methods, including Trust Region Policy Optimization (TRPO), seek to produce monotonic improvement through bounded changes in policy outputs. Proximal Policy Optimization (PPO) is a commonly used, first-order algorithm that instead uses loss clipping to take multiple safe optimization steps per batch of data, replacing the bound on the single step of TRPO with regularization on multiple steps. In this work, we find that the performance of PPO, when applied to continuous action spaces, may be consistently improved through a simple change in objective. Instead of the importance sampling objective of PPO, we recommend a…
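
As a rough illustration of the loss clipping described in the abstract, the Python sketch below computes the standard PPO clipped surrogate next to a hypothetical log-probability variant clipped to the same band. The second function is only an assumption about what a clipped-objective policy gradient could look like; the abstract shown here is truncated and does not give the exact form. The function names, the eps default, and the NumPy formulation are illustrative and are not taken from the paper or the listed repository.

```python
import numpy as np


def ppo_clip_objective(logp_new, logp_old, adv, eps=0.2):
    """Standard PPO clipped surrogate (to be maximized).

    logp_new, logp_old: log-probabilities of the sampled actions under the
    current and the data-collecting policy. adv: advantage estimates.
    """
    ratio = np.exp(logp_new - logp_old)          # importance sampling ratio
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    # Elementwise minimum keeps the more pessimistic of the two estimates.
    return np.mean(np.minimum(ratio * adv, clipped * adv))


def clipped_logprob_objective(logp_new, logp_old, adv, eps=0.2):
    """Hypothetical clipped policy gradient objective (an assumption).

    Replaces the ratio with the log-probability itself and clips it to the
    band that corresponds to ratios in [1 - eps, 1 + eps].
    """
    lower = logp_old + np.log(1.0 - eps)
    upper = logp_old + np.log(1.0 + eps)
    clipped = np.clip(logp_new, lower, upper)
    return np.mean(np.minimum(logp_new * adv, clipped * adv))


if __name__ == "__main__":
    logp_old = np.log(np.array([0.5, 0.5]))
    logp_new = np.log(np.array([0.9, 0.1]))
    adv = np.array([1.0, -1.0])
    print("PPO surrogate:        ", ppo_clip_objective(logp_new, logp_old, adv))
    print("Clipped log-prob obj.:", clipped_logprob_objective(logp_new, logp_old, adv))
```

In both functions, a sample with positive advantage stops contributing gradient once its probability has risen past the upper edge of the band, which is what allows several optimization steps on one batch. For a single sample with ratio 1.8, advantage 1, and eps 0.2, the PPO surrogate is capped at 1.2.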

Code & Models

Repositories

BIT-aerial-robotics/AquaML/blob/2.1.11/AquaML/rlalgo/COPGAgent.py (TensorFlow)

Taxonomy

Topics: Machine Learning and ELM · Advanced Memory and Neural Computing · Fuel Cells and Related Materials

Methods: Entropy Regularization · Proximal Policy Optimization · Trust Region Policy Optimization