TL;DR
This paper introduces QDPG, a novel algorithm combining policy gradients and quality-diversity methods to efficiently generate diverse, high-performing neural policies for continuous control tasks, enhancing exploration and robustness.
Contribution
The paper proposes a new Diversity Policy Gradient (DPG) method that improves sample efficiency in quality-diversity optimization by leveraging time-step level information.
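To make "time-step level information" concrete, here is a minimal sketch of one way a per-step diversity signal could be computed. It assumes (this is not stated in the summary above) that each visited state is mapped to a low-dimensional descriptor and scored by its mean distance to the k nearest descriptors already stored in an archive. The names `novelty_rewards`, `state_descriptors`, `archive`, and `k` are hypothetical and chosen only for illustration.

```python
import math


def novelty_rewards(state_descriptors, archive, k=3):
    """Assumed per-time-step diversity reward.

    For each state descriptor in a trajectory, return the mean Euclidean
    distance to its k nearest neighbours in the archive. An empty archive
    yields a reward of 0.0 for every step.
    """
    rewards = []
    for descriptor in state_descriptors:
        distances = sorted(math.dist(descriptor, stored) for stored in archive)
        nearest = distances[:k]
        rewards.append(sum(nearest) / len(nearest) if nearest else 0.0)
    return rewards


# Hypothetical 2-D descriptors (for example, an agent's x/y position).
archive = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
trajectory = [(0.0, 0.0), (3.0, 4.0)]
rewards = novelty_rewards(trajectory, archive, k=2)
```

Under this assumption every time step receives its own reward, so a standard policy-gradient update can be applied to the diversity objective instead of scoring a whole episode with a single number, as an evolutionary method typically would.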
Findings
QDPG outperforms evolutionary algorithms in sample efficiency.
QDPG produces diverse and high-quality policies.
The method is effective in continuous control environments.
Abstract
A fascinating aspect of nature lies in its ability to produce a large and diverse collection of organisms that are all high-performing in their niche. By contrast, most AI algorithms focus on finding a single efficient solution to a given problem. Aiming for diversity in addition to performance is a convenient way to deal with the exploration-exploitation trade-off that plays a central role in learning. It also allows for increased robustness when the returned collection contains several working solutions to the considered problem, making it well-suited for real applications such as robotics. Quality-Diversity (QD) methods are evolutionary algorithms designed for this purpose. This paper proposes a novel algorithm, QDPG, which combines the strength of Policy Gradient algorithms and Quality Diversity approaches to produce a collection of diverse and high-performing neural policies in…
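The idea of keeping one high-performing solution per niche can be sketched with a grid archive, a structure commonly used by QD methods such as MAP-Elites. The text above does not say which archive QDPG uses, so the grid, the descriptor range of [0, 1], and the names `cell_index` and `try_insert` are assumptions made for illustration.

```python
def cell_index(descriptor, bins, low=0.0, high=1.0):
    """Map a behavior descriptor to a grid cell, which plays the role of a niche."""
    index = []
    for value in descriptor:
        fraction = (value - low) / (high - low)
        # Clamp so that out-of-range descriptors fall into the border cells.
        index.append(min(bins - 1, max(0, int(fraction * bins))))
    return tuple(index)


def try_insert(archive, policy, fitness, descriptor, bins=10):
    """Keep the policy only if its niche is empty or it beats the incumbent."""
    cell = cell_index(descriptor, bins)
    incumbent = archive.get(cell)
    if incumbent is None or fitness > incumbent[0]:
        archive[cell] = (fitness, policy)
        return True
    return False


archive = {}
kept_a = try_insert(archive, "policy_a", 1.0, (0.12, 0.82))
kept_b = try_insert(archive, "policy_b", 0.5, (0.15, 0.85))  # same niche, lower fitness
kept_c = try_insert(archive, "policy_c", 0.2, (0.95, 0.05))  # different niche
```

The returned collection is then the set of values stored in `archive`: policies that differ in behavior, each being the best found so far in its own cell.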
