Scaling Policy Gradient Quality-Diversity with Massive Parallelization via Behavioral Variations
Konstantinos Mitsides, Maxence Faldor, Antoine Cully

TL;DR
This paper introduces ASCII-ME, a fast, scalable, and sample-efficient policy gradient-based algorithm for Quality-Diversity optimization that significantly reduces runtime and enables massive parallelization without centralized training.
Contribution
ASCII-ME is a novel method that scales Quality-Diversity optimization with policy gradients, avoiding centralized actor-critic training and enabling rapid, parallel solution generation.
Findings
Generates diverse high-performing neural network policies in under 250 seconds.
Operates five times faster than current state-of-the-art algorithms.
Maintains competitive sample efficiency with massive parallelization.
Abstract
Quality-Diversity optimization comprises a family of evolutionary algorithms that aim to generate a collection of diverse and high-performing solutions. MAP-Elites (ME), a notable example, has been used effectively in fields such as evolutionary robotics. However, ME relies on random mutations from Genetic Algorithms, which limits its ability to evolve high-dimensional solutions. Methods proposed to overcome this include gradient-based operators such as policy gradients or natural evolution strategies. While successful at scaling ME for neuroevolution, these methods often suffer from slow training speeds, or they struggle to scale with massive parallelization because of high computational demands or reliance on centralized actor-critic training. In this work, we introduce a fast, sample-efficient ME-based algorithm capable of scaling up with massive parallelization, significantly reducing…
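The abstract describes MAP-Elites only at a high level. As a rough illustration of the baseline it refers to (plain ME with random Gaussian mutations, not ASCII-ME, whose operators are not described in this summary), the loop below keeps one elite per cell of a behavior-descriptor grid and replaces it only when a mutated offspring scores higher. The toy fitness function, the two-dimensional descriptor, the grid size, and the mutation scale are all illustrative assumptions.

```python
import random

def fitness(x):
    # Toy objective (assumption): maximize the negative sphere function
    return -sum(v * v for v in x)

def descriptor(x):
    # Toy behavior descriptor (assumption): the first two coordinates
    return (x[0], x[1])

def cell_index(desc, bins, low=-1.0, high=1.0):
    # Map each descriptor dimension onto a uniform grid of `bins` cells
    idx = []
    for d in desc:
        t = (min(max(d, low), high) - low) / (high - low)
        idx.append(min(int(t * bins), bins - 1))
    return tuple(idx)

def mutate(x, sigma, rng):
    # GA-style random mutation: add Gaussian noise, then clip to [-1, 1]
    return [min(max(v + rng.gauss(0.0, sigma), -1.0), 1.0) for v in x]

def map_elites(dim=5, bins=10, init=100, iterations=5000, sigma=0.1, seed=0):
    rng = random.Random(seed)
    archive = {}  # cell -> (fitness, solution)

    def try_insert(x):
        f = fitness(x)
        c = cell_index(descriptor(x), bins)
        # Keep x only if its cell is empty or x beats the current elite
        if c not in archive or f > archive[c][0]:
            archive[c] = (f, x)

    # Seed the archive with uniformly random solutions
    for _ in range(init):
        try_insert([rng.uniform(-1.0, 1.0) for _ in range(dim)])

    # Main loop: pick a random elite, mutate it, try to insert the offspring
    for _ in range(iterations):
        _, parent = rng.choice(list(archive.values()))
        try_insert(mutate(parent, sigma, rng))

    return archive

archive = map_elites()
print(len(archive), "cells filled")
print("best fitness:", round(max(f for f, _ in archive.values()), 4))
```

The undirected mutation in `mutate` is the component the abstract identifies as the bottleneck for high-dimensional solutions such as neural network policies; gradient-based variants replace or complement this step.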
Taxonomy
Topics: Advanced Bandit Algorithms Research · Stochastic Gradient Optimization Techniques · Reinforcement Learning in Robotics
