Optimistic Policy Regularization

Mai Pham; Vikrant Vaze; Peter Chin

arXiv:2603.06793·cs.LG·March 10, 2026

TL;DR

This paper introduces Optimistic Policy Regularization (OPR), a method for deep reinforcement learning that keeps a buffer of high-performing episodes and biases policy updates toward them, improving sample efficiency and performance on Atari games and in cyber-defense environments.

Contribution

The paper proposes OPR, a regularization technique that preserves historically successful behaviors during policy training by combining directional log-ratio reward shaping with an auxiliary behavioral cloning objective. It is instantiated on Proximal Policy Optimization (PPO).

Findings

01

When instantiated on PPO, OPR improves sample efficiency on the Arcade Learning Environment (Atari games).

02

OPR achieves the highest score in 22 of 49 Atari environments at the 10-million step benchmark.

03

OPR outperforms baseline methods in cyber-defense tasks.

Abstract

Deep reinforcement learning agents frequently suffer from premature convergence, where early entropy collapse causes the policy to discard exploratory behaviors before discovering globally optimal strategies. We introduce Optimistic Policy Regularization (OPR), a lightweight mechanism designed to preserve and reinforce historically successful trajectories during policy optimization. OPR maintains a dynamic buffer of high-performing episodes and biases learning toward these behaviors through directional log-ratio reward shaping and an auxiliary behavioral cloning objective. When instantiated on Proximal Policy Optimization (PPO), OPR substantially improves sample efficiency on the Arcade Learning Environment. Across 49 Atari games evaluated at the 10-million step benchmark, OPR achieves the highest score in 22 environments despite baseline methods being reported at the standard…
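
The two ingredients the abstract names explicitly, a dynamic buffer of high-performing episodes and an auxiliary behavioral cloning objective, can be illustrated with a small sketch. This is not the authors' implementation. The class name `TopKEpisodeBuffer`, the function `behavioral_cloning_loss`, the top-k retention rule, and the toy tabular policy (a dict of per-state logits) are all assumptions made for illustration. The directional log-ratio reward shaping is left out because the abstract does not define it.

```python
import heapq
import itertools
import math
from typing import Dict, List, Tuple

# Toy discrete setting (an assumption): an episode is a list of (state, action) pairs.
Episode = List[Tuple[int, int]]


class TopKEpisodeBuffer:
    """Hypothetical buffer that keeps the k highest-return episodes seen so far."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._heap: list = []  # min-heap of (return, insertion_id, episode)
        self._ids = itertools.count()  # tie-breaker so episodes are never compared

    def add(self, episode_return: float, episode: Episode) -> None:
        entry = (episode_return, next(self._ids), episode)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, entry)
        elif episode_return > self._heap[0][0]:
            # Evict the current worst episode in favor of the better one.
            heapq.heapreplace(self._heap, entry)

    def returns(self) -> List[float]:
        return sorted(r for r, _, _ in self._heap)

    def transitions(self) -> Episode:
        return [pair for _, _, episode in self._heap for pair in episode]


def softmax(logits: List[float]) -> List[float]:
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def behavioral_cloning_loss(policy_logits: Dict[int, List[float]],
                            transitions: Episode) -> float:
    """Mean negative log-likelihood of buffered actions under the current policy."""
    if not transitions:
        return 0.0
    nll = 0.0
    for state, action in transitions:
        nll -= math.log(softmax(policy_logits[state])[action])
    return nll / len(transitions)


if __name__ == "__main__":
    buffer = TopKEpisodeBuffer(capacity=2)
    buffer.add(1.0, [(0, 0)])
    buffer.add(5.0, [(0, 1)])
    buffer.add(3.0, [(1, 1)])  # evicts the return-1.0 episode
    uniform_policy = {0: [0.0, 0.0], 1: [0.0, 0.0]}
    print(buffer.returns())
    print(behavioral_cloning_loss(uniform_policy, buffer.transitions()))
```

In a PPO-style training loop, a term like this would typically be added to the policy loss with a small weight, so that gradient steps raise the likelihood of actions taken in the buffered episodes. How OPR weights or schedules that term is not stated in the abstract.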

Taxonomy

Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Adversarial Robustness in Machine Learning