Hindsight policy gradients

Paulo Rauber; Avinash Ummadisingu; Filipe Mutz; Juergen Schmidhuber

arXiv:1711.06006·cs.LG·February 21, 2019·25 cites

TL;DR

This paper introduces a method to incorporate hindsight into policy gradient reinforcement learning algorithms, significantly improving sample efficiency in sparse-reward environments by enabling agents to learn from achieved goals retrospectively.

Contribution

It generalizes the concept of hindsight to a broad class of policy gradient methods, enhancing their ability to learn efficiently in sparse-reward settings.

Findings

01 Hindsight improves sample efficiency across various environments.

02 The method generalizes to multiple policy gradient algorithms.

03 Experimental results show significant performance gains.

Abstract

A reinforcement learning agent that needs to pursue different goals across episodes requires a goal-conditional policy. In addition to their potential to generalize desirable behavior to unseen goals, such policies may also enable higher-level planning based on subgoals. In sparse-reward environments, the capacity to exploit information about the degree to which an arbitrary goal has been achieved while another goal was intended appears crucial to enable sample efficient learning. However, reinforcement learning agents have only recently been endowed with such capacity for hindsight. In this paper, we demonstrate how hindsight can be introduced to policy gradient methods, generalizing this idea to a broad class of successful algorithms. Our experiments on a diverse selection of sparse-reward environments show that hindsight leads to a remarkable increase in sample efficiency.
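To make the idea of hindsight concrete, the sketch below re-evaluates a single trajectory, collected while pursuing one goal, as if a different goal had been intended. It is an illustration under stated assumptions, not the estimator derived in the paper: the tabular softmax policy, the names `theta`, `action_probs`, and `hindsight_returns`, the toy states, and the use of cumulative likelihood ratios as the off-policy correction are all hypothetical choices made for this example.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over a 1-D vector of logits.
    shifted = logits - logits.max()
    exps = np.exp(shifted)
    return exps / exps.sum()

def action_probs(theta, state, goal):
    # Hypothetical goal-conditional policy: one logit vector per (state, goal) pair.
    # theta has shape (n_states, n_goals, n_actions).
    return softmax(theta[state, goal])

def reward(state, goal):
    # Sparse reward (assumed form): 1 only when the state equals the goal.
    return 1.0 if state == goal else 0.0

def hindsight_returns(theta, states, actions, original_goal, alt_goal):
    """Importance-weighted returns-to-go for alt_goal.

    The trajectory was collected while pursuing original_goal.
    states has length T + 1 and actions has length T.
    """
    T = len(actions)
    # Likelihood ratio of each taken action under the alternative vs. original goal.
    ratios = np.array([
        action_probs(theta, states[t], alt_goal)[actions[t]]
        / action_probs(theta, states[t], original_goal)[actions[t]]
        for t in range(T)
    ])
    # cumulative[t] is the product of the ratios for actions 0..t.
    cumulative = np.cumprod(ratios)
    # Reward for reaching states[t + 1], judged against alt_goal and reweighted.
    weighted = np.array([
        cumulative[t] * reward(states[t + 1], alt_goal) for t in range(T)
    ])
    # Return-to-go at step t: sum of the weighted rewards from step t onward.
    return np.cumsum(weighted[::-1])[::-1]

# Toy usage: 4 states, 4 goals, 2 actions, uniform policy (all logits zero).
theta = np.zeros((4, 4, 2))
states = [0, 1, 2, 2]   # the agent ended up at state 2
actions = [1, 1, 0]
print(hindsight_returns(theta, states, actions, original_goal=3, alt_goal=3))
print(hindsight_returns(theta, states, actions, original_goal=3, alt_goal=2))
```

Under the intended goal (state 3) the trajectory earns no reward, so it carries no learning signal in a sparse-reward setting. Judged against the goal that was actually reached (state 2), the same trajectory yields nonzero returns, which is the kind of information hindsight is meant to exploit.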

Code & Models

Repositories

paulorauber/hpg (TensorFlow)

Taxonomy

Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Adaptive Dynamic Programming Control