Improving Policy Gradient by Exploring Under-appreciated Rewards

Ofir Nachum; Mohammad Norouzi; Dale Schuurmans

arXiv:1611.09321 · cs.LG · March 17, 2017 · 5 cites

Open Access

TL;DR

This paper introduces a directed exploration strategy for policy gradient reinforcement learning that focuses on under-appreciated reward regions, leading to improved performance on challenging tasks like multi-digit addition.

Contribution

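It proposes a novel exploration method that emphasizes under-appreciated reward regions, that is, action sequences whose log-probability under the current policy under-estimates their reward. This makes policy gradient methods more effective in high-dimensional, sparse-reward environments.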

Findings

01. Successfully solves multi-digit addition with pure RL.

02. Reduces hyper-parameter sensitivity.

03. Outperforms baseline methods on challenging tasks.

Abstract

This paper presents a novel form of policy gradient for model-free reinforcement learning (RL) with improved exploration properties. Current policy-based methods use entropy regularization to encourage undirected exploration of the reward landscape, which is ineffective in high dimensional spaces with sparse rewards. We propose a more directed exploration strategy that promotes exploration of under-appreciated reward regions. An action sequence is considered under-appreciated if its log-probability under the current policy under-estimates its resulting reward. The proposed exploration strategy is easy to implement, requiring small modifications to an implementation of the REINFORCE algorithm. We evaluate the approach on a set of algorithmic tasks that have long challenged RL methods. Our approach reduces hyper-parameter sensitivity and demonstrates significant improvements over baseline…
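The abstract's definition of "under-appreciated" can be made concrete with a small sketch. It is not the authors' implementation. It assumes a batch of sampled action sequences with known log-probabilities and rewards, a hypothetical temperature parameter `tau` that puts rewards on the same scale as log-probabilities, and a softmax normalization across the batch. The function names and the way the exploration term is added to the REINFORCE coefficient are illustrative assumptions.

```python
import math


def under_appreciation_weights(log_probs, rewards, tau=0.1):
    """Score each sampled action sequence by how much its log-probability
    under-estimates its reward, then normalize the scores over the batch.

    `tau` is a hypothetical temperature (an assumption of this sketch).
    """
    # Higher score = reward is high relative to the policy's log-probability.
    scores = [r / tau - lp for lp, r in zip(log_probs, rewards)]
    # Numerically stable softmax over the sampled sequences.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def per_sample_coefficients(log_probs, rewards, tau=0.1):
    """Coefficient that would multiply grad log pi(a) for each sample.

    The first term is the usual REINFORCE term with a mean-reward baseline.
    The second term adds the exploration weight; combining the two terms
    this way is an assumption of this sketch.
    """
    k = len(rewards)
    baseline = sum(rewards) / k
    weights = under_appreciation_weights(log_probs, rewards, tau)
    return [(r - baseline) / k + tau * w for r, w in zip(rewards, weights)]


if __name__ == "__main__":
    # Two sequences with equal reward; the second is far less likely
    # under the current policy, so it receives the larger weight.
    sample_log_probs = [-1.0, -5.0]
    sample_rewards = [1.0, 1.0]
    print(under_appreciation_weights(sample_log_probs, sample_rewards, tau=1.0))
    print(per_sample_coefficients(sample_log_probs, sample_rewards, tau=1.0))
```

In a training loop, these coefficients would scale the gradient of each sequence's log-probability in place of the plain REINFORCE advantage, which is consistent with the abstract's claim that only small changes to a REINFORCE implementation are needed.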

Taxonomy

Topics: Viral Infectious Diseases and Gene Expression in Insects · Reinforcement Learning in Robotics · Receptor Mechanisms and Signaling

Methods: REINFORCE · Entropy Regularization