Evolved Policy Gradients
Rein Houthooft, Richard Y. Chen, Phillip Isola, Bradly C. Stadie, Filip Wolski, Jonathan Ho, Pieter Abbeel

TL;DR
This paper introduces a metalearning approach that evolves a differentiable loss function for reinforcement learning, enabling faster and more adaptable policy learning with better generalization to new tasks.
Contribution
It presents a method for evolving a flexible, differentiable loss function for RL. Policies trained to minimize this loss learn faster than an off-the-shelf policy gradient method and generalize to out-of-distribution tasks.
Findings
EPG learns faster than an off-the-shelf policy gradient method on several randomized environments.
EPG's learned loss generalizes to out-of-distribution test-time tasks.
EPG exhibits qualitatively different behavior from other popular metalearning algorithms.
Abstract
We propose a metalearning approach for learning gradient-based reinforcement learning (RL) algorithms. The idea is to evolve a differentiable loss function, such that an agent, which optimizes its policy to minimize this loss, will achieve high rewards. The loss is parametrized via temporal convolutions over the agent's experience. Because this loss is highly flexible in its ability to take into account the agent's history, it enables fast task learning. Empirical results show that our evolved policy gradient algorithm (EPG) achieves faster learning on several randomized environments compared to an off-the-shelf policy gradient method. We also demonstrate that EPG's learned loss can generalize to out-of-distribution test time tasks, and exhibits qualitatively different behavior from other popular metalearning algorithms.
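The two-level structure described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: it assumes a one-dimensional task (move a scalar policy mean `theta` toward a hidden `target`), a hand-picked pair of per-timestep features, a finite-difference gradient in the inner loop, and a basic evolution strategies update in the outer loop. All function names, features, and hyperparameters here are hypothetical.

```python
import numpy as np

def learned_loss(phi, theta, actions, rewards):
    """Hypothetical loss L_phi: a temporal convolution over recent experience."""
    # Illustrative per-timestep features: reward-weighted and plain squared
    # distance between the policy mean theta and the actions actually taken.
    d2 = (theta - actions) ** 2
    feats = np.stack([rewards * d2, d2], axis=1)            # shape (T, 2)
    k = phi.shape[0]                                        # kernel width
    # Valid 1-D convolution along time: each output mixes k consecutive steps.
    out = [np.sum(phi * feats[t:t + k]) for t in range(len(feats) - k + 1)]
    return float(np.mean(out))

def loss_grad(phi, theta, actions, rewards, eps=1e-4):
    """Central finite-difference gradient of the learned loss w.r.t. theta."""
    hi = learned_loss(phi, theta + eps, actions, rewards)
    lo = learned_loss(phi, theta - eps, actions, rewards)
    return (hi - lo) / (2 * eps)

def inner_loop(phi, target, rng, steps=20, horizon=16, lr=0.1, sigma=0.3):
    """Agent side: gradient descent on the learned loss, not on the reward."""
    theta = 0.0
    for _ in range(steps):
        actions = theta + sigma * rng.standard_normal(horizon)
        rewards = -(actions - target) ** 2                  # hidden from the update rule
        grad = np.clip(loss_grad(phi, theta, actions, rewards), -1.0, 1.0)
        theta -= lr * grad                                  # clipped step keeps theta bounded
    return theta, -(theta - target) ** 2                    # final policy mean and its return

def es_step(phi, rng, pop=8, noise=0.1, meta_lr=0.05):
    """Outer loop: one evolution strategies update of the loss parameters phi."""
    eps = rng.standard_normal((pop,) + phi.shape)
    returns = np.empty(pop)
    for i in range(pop):
        target = rng.uniform(-1.0, 1.0)                     # a freshly sampled task per worker
        _, returns[i] = inner_loop(phi + noise * eps[i], target, rng)
    adv = (returns - returns.mean()) / (returns.std() + 1e-8)
    grad = np.tensordot(adv, eps, axes=1) / (pop * noise)   # same shape as phi
    return phi + meta_lr * grad
```

Calling `es_step` repeatedly, starting from for example `phi = np.zeros((4, 2))` and `rng = np.random.default_rng(0)`, plays the role of the evolutionary outer loop, while `inner_loop` is what an agent would run on a new task with the evolved `phi` held fixed. The key design point the sketch tries to show is that the reward never enters the policy update directly; it reaches the policy only through the features consumed by the learned loss.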
Taxonomy
Topics: Reinforcement Learning in Robotics · Domain Adaptation and Few-Shot Learning · Machine Learning and ELM
