Optimistic Proximal Policy Optimization

Takahisa Imagawa; Takuya Hiraoka; Yoshimasa Tsuruoka

arXiv:1906.11075 · cs.LG · June 27, 2019 · 1 citation

TL;DR

This paper introduces OPPO, an extension of proximal policy optimization that applies optimism under uncertainty to reinforcement learning in environments with sparse rewards, and reports that it outperforms existing methods on a tabular task.

Contribution

The paper proposes OPPO, a reinforcement learning algorithm that evaluates the policy optimistically according to the uncertainty of its estimated total return, in order to better handle sparse-reward domains.

Findings

01 OPPO outperforms the existing methods on a tabular task.

02 Evaluating the policy optimistically, based on the uncertainty of the estimated total return, helps when rewards are rare.

03 OPPO alleviates the difficulty of learning a good policy in sparse-reward domains.

Abstract

Reinforcement learning, a machine learning framework for training an autonomous agent based on rewards, has shown outstanding results in various domains. However, learning a good policy is known to be difficult in domains where rewards are rare. We propose a method, optimistic proximal policy optimization (OPPO), to alleviate this difficulty. OPPO considers the uncertainty of the estimated total return and evaluates the policy optimistically based on that uncertainty. We show that OPPO outperforms the existing methods in a tabular task.
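The idea of evaluating a policy optimistically can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how OPPO measures uncertainty, so the count-based 1/sqrt(n) bonus below is a stand-in assumption, and the names optimistic_value, optimistic_advantage, and beta are hypothetical. Only clipped_surrogate follows a known formula, the standard PPO clipped objective for a single sample.

```python
import math


def optimistic_value(mean_return, visit_count, beta=1.0):
    """Estimated return plus an uncertainty bonus.

    ASSUMPTION: the bonus beta / sqrt(n + 1) is an illustrative stand-in
    for return uncertainty; it is not taken from the paper.
    """
    return mean_return + beta / math.sqrt(visit_count + 1)


def optimistic_advantage(reward, v_next, v_curr, gamma=1.0):
    """One-step advantage computed from optimistic value estimates."""
    return reward + gamma * v_next - v_curr


def clipped_surrogate(ratio, advantage, eps=0.2):
    """Standard PPO clipped objective for a single sample."""
    clipped_ratio = max(1.0 - eps, min(ratio, 1.0 + eps))
    return min(ratio * advantage, clipped_ratio * advantage)


# Sparse-reward setting: both states have an estimated return of zero,
# but one has been visited often and the other never.
v_familiar = optimistic_value(0.0, visit_count=99)  # bonus 1/sqrt(100)
v_novel = optimistic_value(0.0, visit_count=0)      # bonus 1/sqrt(1)

# Moving from the familiar state to the novel one earns no reward, yet
# the optimistic advantage is positive, so the update favors exploring.
adv = optimistic_advantage(0.0, v_novel, v_familiar)
objective = clipped_surrogate(ratio=1.5, advantage=adv)
```

With plain value estimates the advantage in this example would be zero and the policy would receive no learning signal; the bonus is what makes the rarely visited state look attractive. Any uncertainty estimate could replace the count-based term without changing the rest of the sketch.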

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Machine Learning and Algorithms