Evolving Pareto-Optimal Actor-Critic Algorithms for Generalizability and Stability
Juan Jose Garau-Luis, Yingjie Miao, John D. Co-Reyes, Aaron Parisi, Jie Tan, Esteban Real, Aleksandra Faust

TL;DR
MetaPG is an evolutionary approach that automatically designs actor-critic algorithms, significantly improving their generalizability and stability for real-world reinforcement learning tasks.
Contribution
The paper introduces MetaPG, a method that evolves actor-critic loss functions for performance, generalizability, and stability. The evolved algorithms outperform SAC across the environments tested.
Findings
MetaPG improves generalizability by 20% over SAC.
MetaPG reduces instability by up to 67%.
Evolved algorithms perform well across different environments and conditions.
Abstract
Generalizability and stability are two key objectives for operating reinforcement learning (RL) agents in the real world. Designing RL algorithms that optimize these objectives can be a costly and painstaking process. This paper presents MetaPG, an evolutionary method for automated design of actor-critic loss functions. MetaPG explicitly optimizes for generalizability and performance, and implicitly optimizes the stability of both metrics. We initialize our loss function population with Soft Actor-Critic (SAC) and perform multi-objective optimization using fitness metrics encoding single-task performance, zero-shot generalizability to unseen environment configurations, and stability across independent runs with different random seeds. On a set of continuous control tasks from the Real-World RL Benchmark Suite, we find that our method, using a single environment during evolution, evolves…
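The multi-objective selection described in the abstract can be illustrated with a Pareto-front filter over two fitness scores. This is a minimal sketch, not the authors' implementation: the names `dominates`, `pareto_front`, and `candidates` are hypothetical, the fitness values are made up for illustration, and the paper's actual selection procedure may differ.

```python
from typing import Dict, List, Tuple

# (performance, generalizability); higher is better on both.
# The two-objective layout is an assumption made for this sketch.
Fitness = Tuple[float, float]


def dominates(a: Fitness, b: Fitness) -> bool:
    """True if a is no worse than b on every objective and better on at least one."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(scores: Dict[str, Fitness]) -> List[str]:
    """Return the names of candidates that no other candidate dominates."""
    return sorted(
        name
        for name, fit in scores.items()
        if not any(
            dominates(other_fit, fit)
            for other_name, other_fit in scores.items()
            if other_name != name
        )
    )


# Hypothetical fitness values, not taken from the paper.
candidates = {
    "sac_baseline": (0.80, 0.50),
    "evolved_a": (0.85, 0.55),
    "evolved_b": (0.78, 0.65),
    "evolved_c": (0.70, 0.40),
}

front = pareto_front(candidates)
print(front)
```

In this toy population, `evolved_a` and `evolved_b` trade performance against generalizability and neither dominates the other, so both stay on the front, while the baseline and `evolved_c` are dominated and drop out.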
Taxonomy
Topics: Reinforcement Learning in Robotics · Evolutionary Algorithms and Applications · Metaheuristic Optimization Algorithms Research
