Evolved Policy Gradients

Rein Houthooft; Richard Y. Chen; Phillip Isola; Bradly C. Stadie; Filip Wolski; Jonathan Ho; Pieter Abbeel

arXiv:1802.04821 · cs.LG · May 1, 2018 · 96 cites

TL;DR

This paper introduces a metalearning approach that evolves a differentiable loss function for reinforcement learning, enabling faster and more adaptable policy learning with better generalization to new tasks.

Contribution

It presents a method for evolving a flexible, differentiable loss function for RL. Agents that minimize the evolved loss learn faster than with an off-the-shelf policy gradient method, and the loss generalizes to out-of-distribution tasks.

Findings

01 EPG learns faster than an off-the-shelf policy gradient method on several randomized environments.

02 EPG's learned loss generalizes to out-of-distribution test-time tasks.

03 EPG exhibits qualitatively different behavior from other popular metalearning algorithms.

Abstract

We propose a metalearning approach for learning gradient-based reinforcement learning (RL) algorithms. The idea is to evolve a differentiable loss function, such that an agent, which optimizes its policy to minimize this loss, will achieve high rewards. The loss is parametrized via temporal convolutions over the agent's experience. Because this loss is highly flexible in its ability to take into account the agent's history, it enables fast task learning. Empirical results show that our evolved policy gradient algorithm (EPG) achieves faster learning on several randomized environments compared to an off-the-shelf policy gradient method. We also demonstrate that EPG's learned loss can generalize to out-of-distribution test time tasks, and exhibits qualitatively different behavior from other popular metalearning algorithms.
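
The two-level structure described in the abstract (an outer loop that evolves the loss, and an inner loop in which the agent minimizes it by gradient descent) can be illustrated with a toy sketch. Everything below is an assumption made for illustration rather than the paper's implementation: the one-dimensional task, the Gaussian policy with fixed standard deviation, the 3-tap reward convolution standing in for the temporal-convolution loss, the gradient clipping, and the simple evolution-strategies-style update. The names `rollout`, `learned_loss_grad`, `inner_loop`, and `evolve_loss` are hypothetical.

```python
import numpy as np

SIGMA_PI = 0.5  # fixed policy standard deviation (assumption for this toy)


def rollout(theta, target, horizon, rng):
    """Sample actions from N(theta, SIGMA_PI^2) and score them on a toy task."""
    actions = theta + SIGMA_PI * rng.standard_normal(horizon)
    rewards = -(actions - target) ** 2  # reward peaks when the action hits the target
    return actions, rewards


def learned_loss_grad(phi, theta, actions, rewards):
    """Gradient w.r.t. theta of a toy learned loss
        L_phi = -mean_t w_t * log pi(a_t | theta),
    where the weights w_t come from a temporal convolution over the rewards.
    phi holds a 3-tap kernel followed by a bias (hypothetical parametrization)."""
    kernel, bias = phi[:-1], phi[-1]
    centered = rewards - rewards.mean()
    weights = np.convolve(centered, kernel, mode="same") + bias
    dlogp = (actions - theta) / SIGMA_PI ** 2  # d/dtheta of log N(a; theta, sigma^2)
    grad = -np.mean(weights * dlogp)
    return float(np.clip(grad, -10.0, 10.0))  # clipped to keep the toy stable


def inner_loop(phi, target, rng, steps=20, horizon=16, lr=0.05):
    """Agent side: gradient descent on the learned loss, then report mean reward."""
    theta = 0.0
    for _ in range(steps):
        actions, rewards = rollout(theta, target, horizon, rng)
        theta -= lr * learned_loss_grad(phi, theta, actions, rewards)
    _, final_rewards = rollout(theta, target, horizon, rng)
    return float(final_rewards.mean())


def evolve_loss(generations=30, population=16, sigma=0.1, lr=0.05, seed=0):
    """Outer loop: perturb the loss parameters, score each perturbation by the
    return the inner loop reaches, and move phi toward better perturbations."""
    rng = np.random.default_rng(seed)
    phi = np.zeros(4)  # 3-tap kernel + bias
    for _ in range(generations):
        eps = rng.standard_normal((population, phi.size))
        fitness = np.empty(population)
        for i in range(population):
            target = rng.uniform(-2.0, 2.0)  # a freshly randomized task per worker
            fitness[i] = inner_loop(phi + sigma * eps[i], target, rng)
        scores = (fitness - fitness.mean()) / (fitness.std() + 1e-8)
        phi = phi + lr / (population * sigma) * (eps.T @ scores)
    return phi


if __name__ == "__main__":
    print("evolved loss parameters:", evolve_loss())
```

Note that the outer loop never differentiates through the inner loop: it only needs the final return of each perturbed loss, which is why a gradient-free evolutionary update is a natural fit for this kind of setup.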

Taxonomy

Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Machine Learning and ELM