Policy Gradient RL Algorithms as Directed Acyclic Graphs
Juan Jose Garau Luis

TL;DR
This paper extends a DAG-based framework for meta RL to include Policy Gradient algorithms, enabling automated discovery and optimization of these algorithms within a unified graph-based representation.
Contribution
It extends the search language of the original framework and proposes DAG representations for five Policy Gradient algorithms (VPG, PPO, DDPG, TD3, and SAC), a family the original language has limitations in representing.
Findings
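Proposes DAG representations for five Policy Gradient algorithms: VPG, PPO, DDPG, TD3, and SAC.
Extends the original search language so that the framework can express Policy Gradient methods.
Brings Policy Gradient algorithms within reach of the framework's evolutionary meta learner, which modifies graphs to find good agent update rules.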
Abstract
Meta Reinforcement Learning (RL) methods focus on automating the design of RL algorithms that generalize to a wide range of environments. The framework introduced in (Anonymous, 2020) addresses the problem by representing different RL algorithms as Directed Acyclic Graphs (DAGs), and using an evolutionary meta learner to modify these graphs and find good agent update rules. While the search language used to generate graphs in the paper serves to represent numerous already-existing RL algorithms (e.g., DQN, DDQN), it has limitations when it comes to representing Policy Gradient algorithms. In this work we try to close this gap by extending the original search language and proposing graphs for five different Policy Gradient algorithms: VPG, PPO, DDPG, TD3, and SAC.
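To make the idea of an update rule expressed as a DAG more concrete, the following is a minimal sketch in Python of the vanilla policy gradient (VPG) surrogate loss, -mean(log pi(a|s) * A), written as a small graph of operations. It is an illustration only: the Node class, the evaluate function, and the operation names (log, mul, neg_mean) are hypothetical and are not the search language from the paper or from (Anonymous, 2020).

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Node:
    """One DAG vertex: an operation applied to the outputs of its parents."""
    op: Callable[..., List[float]]
    parents: Tuple[str, ...] = ()


def evaluate(graph: Dict[str, Node], inputs: Dict[str, List[float]],
             target: str) -> List[float]:
    """Evaluate `target`, computing each ancestor once (memoised recursion)."""
    cache: Dict[str, List[float]] = dict(inputs)

    def visit(name: str) -> List[float]:
        if name not in cache:
            node = graph[name]
            cache[name] = node.op(*(visit(p) for p in node.parents))
        return cache[name]

    return visit(target)


# Hypothetical operation vocabulary (assumed for this sketch only).
def log(x: List[float]) -> List[float]:
    return [math.log(v) for v in x]


def mul(x: List[float], y: List[float]) -> List[float]:
    return [a * b for a, b in zip(x, y)]


def neg_mean(x: List[float]) -> List[float]:
    return [-sum(x) / len(x)]


# VPG surrogate loss as a graph. "action_prob" and "advantage" are input
# nodes supplied at evaluation time; every other node is an operation.
vpg_graph: Dict[str, Node] = {
    "log_prob": Node(log, ("action_prob",)),
    "weighted": Node(mul, ("log_prob", "advantage")),
    "loss": Node(neg_mean, ("weighted",)),
}


def vpg_loss(action_prob: List[float], advantage: List[float]) -> float:
    """Scalar loss for a batch of action probabilities and advantages."""
    inputs = {"action_prob": action_prob, "advantage": advantage}
    return evaluate(vpg_graph, inputs, "loss")[0]
```

In a representation of this kind, a meta learner could mutate the graph by swapping a node's operation or rewiring its parents, for example replacing the mul node with a clipped-ratio node to move toward a PPO-style objective. That mutation step is not shown here and is likewise an assumption about how such a search might be set up.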
Taxonomy
Topics: Reinforcement Learning in Robotics · Formal Methods in Verification · Scheduling and Optimization Algorithms
Methods: Dilated Convolution · Global Average Pooling · Average Pooling · 1x1 Convolution · Switchable Atrous Convolution · Batch Normalization · Adam · Experience Replay · Convolution
