Policy Gradient RL Algorithms as Directed Acyclic Graphs

Juan Jose Garau Luis

arXiv:2012.07763·cs.LG·January 19, 2021

TL;DR

This paper extends a DAG-based framework for meta RL to include Policy Gradient algorithms, enabling automated discovery and optimization of these algorithms within a unified graph-based representation.

Contribution

It introduces an extended search language and DAG representations for five Policy Gradient algorithms, filling a gap in the existing meta RL framework.

Findings

01 Represented five Policy Gradient algorithms (VPG, PPO, DDPG, TD3, and SAC) as DAGs.

02 Extended the framework's search language so that it can generate and optimize Policy Gradient methods.

03 Made Policy Gradient algorithms available to the framework's automated search for new update rules.

Abstract

Meta Reinforcement Learning (RL) methods focus on automating the design of RL algorithms that generalize to a wide range of environments. The framework introduced in (Anonymous, 2020) addresses the problem by representing different RL algorithms as Directed Acyclic Graphs (DAGs), and using an evolutionary meta learner to modify these graphs and find good agent update rules. While the search language used to generate graphs in the paper serves to represent numerous already-existing RL algorithms (e.g., DQN, DDQN), it has limitations when it comes to representing Policy Gradient algorithms. In this work we try to close this gap by extending the original search language and proposing graphs for five different Policy Gradient algorithms: VPG, PPO, DDPG, TD3, and SAC.
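
To make the idea of "an algorithm as a DAG" concrete, the following is a minimal sketch, not the paper's search language or its graphs. It encodes the basic VPG loss, the negated mean of log-probability times advantage, as a small graph of named nodes and evaluates it in topological order. The operation names (`mul`, `mean`, `neg`), the node names, and the `evaluate` helper are all hypothetical and chosen only for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical operation set; the paper's actual search language is not reproduced here.
OPS = {
    "mul": lambda a, b: [x * y for x, y in zip(a, b)],  # elementwise product
    "mean": lambda a: sum(a) / len(a),                  # reduce a list to a scalar
    "neg": lambda a: -a,                                # negate a scalar
}

# Each node maps to (operation name, list of input node names).
# Input nodes carry no operation and are fed in from outside the graph.
VPG_LOSS_GRAPH = {
    "log_prob": (None, []),
    "advantage": (None, []),
    "weighted": ("mul", ["log_prob", "advantage"]),
    "avg": ("mean", ["weighted"]),
    "loss": ("neg", ["avg"]),
}


def evaluate(graph, inputs, output="loss"):
    """Evaluate the graph in topological order and return the output node's value."""
    deps = {name: set(srcs) for name, (_, srcs) in graph.items()}
    values = dict(inputs)
    # static_order() yields every node after its predecessors and raises
    # graphlib.CycleError if the graph is not acyclic.
    for name in TopologicalSorter(deps).static_order():
        op, srcs = graph[name]
        if op is not None:
            values[name] = OPS[op](*(values[s] for s in srcs))
    return values[output]


loss = evaluate(
    VPG_LOSS_GRAPH,
    {"log_prob": [-1.0, -2.0], "advantage": [1.0, 0.5]},
)
```

In a representation of this kind, a meta learner can change the update rule by editing the graph (swapping an operation, rewiring an input) rather than editing code, which is the property an evolutionary search relies on.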

Code & Models

Repositories

jjgarau/DAGPolicyGradient
none · Official

Taxonomy

Topics: Reinforcement Learning in Robotics · Formal Methods in Verification · Scheduling and Optimization Algorithms

Methods: Dilated Convolution · Global Average Pooling · Average Pooling · 1x1 Convolution · Switchable Atrous Convolution · Batch Normalization · Adam · Experience Replay · Convolution