Diversity Policy Gradient for Sample Efficient Quality-Diversity Optimization

Thomas Pierrot; Valentin Macé; Félix Chalumeau; Arthur Flajolet; Geoffrey Cideron; Karim Beguir; Antoine Cully; Olivier Sigaud; Nicolas Perrin-Gilbert

arXiv:2006.08505 · cs.AI · June 1, 2022

TL;DR

This paper introduces QDPG, a novel algorithm combining policy gradients and quality-diversity methods to efficiently generate diverse, high-performing neural policies for continuous control tasks, enhancing exploration and robustness.

Contribution

The paper proposes a new Diversity Policy Gradient (DPG) method that improves sample efficiency in quality-diversity optimization by leveraging time-step-level information.
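The summary does not spell out how the time-step-level signal is built. As an illustration only, one common way to obtain a per-step diversity reward is to score each visited state by its mean distance to the k nearest states stored in an archive of previously visited states. The functions `knn_novelty` and `diversity_rewards`, the Euclidean metric, and the "store every visited state" archive update are hypothetical choices for this sketch, not taken from the paper or its repository.

```python
import math
from typing import List, Sequence

State = Sequence[float]


def knn_novelty(state: State, archive: List[State], k: int = 3) -> float:
    """Mean Euclidean distance from `state` to its k nearest neighbours in `archive`."""
    if not archive:
        return 0.0  # nothing to compare against yet
    dists = sorted(math.dist(state, other) for other in archive)
    nearest = dists[:k]
    return sum(nearest) / len(nearest)


def diversity_rewards(trajectory: List[State], archive: List[State], k: int = 3) -> List[float]:
    """Return one novelty reward per time step, then add the visited states to the archive."""
    # Score every state against the archive as it was before this trajectory.
    rewards = [knn_novelty(s, archive, k) for s in trajectory]
    # Hypothetical update rule: keep all visited states for future comparisons.
    archive.extend(trajectory)
    return rewards


archive = [(0.0, 0.0), (1.0, 0.0)]
print(diversity_rewards([(0.0, 0.0), (3.0, 4.0)], archive, k=1))
```

Because the reward is attached to individual states rather than to a whole episode, a policy-gradient learner can credit the specific actions that led to novel states; in the usage above, the state already present in the archive receives 0.0 while the distant one receives a positive reward.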

Findings

1. QDPG outperforms evolutionary algorithms in sample efficiency.
2. QDPG produces diverse and high-quality policies.
3. The method is effective in continuous control environments.

Abstract

A fascinating aspect of nature lies in its ability to produce a large and diverse collection of organisms that are all high-performing in their niche. By contrast, most AI algorithms focus on finding a single efficient solution to a given problem. Aiming for diversity in addition to performance is a convenient way to deal with the exploration-exploitation trade-off that plays a central role in learning. It also allows for increased robustness when the returned collection contains several working solutions to the considered problem, making it well-suited for real applications such as robotics. Quality-Diversity (QD) methods are evolutionary algorithms designed for this purpose. This paper proposes a novel algorithm, QDPG, which combines the strength of Policy Gradient algorithms and Quality Diversity approaches to produce a collection of diverse and high-performing neural policies in…
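For readers new to the QD side: the canonical QD algorithm, MAP-Elites, discretises a behaviour-descriptor space into a grid and keeps the best-performing solution found in each cell. The sketch below is a generic illustration of such an archive under stated assumptions, not QDPG's implementation or the API of the linked JAX repository; the class `GridArchive`, the uniform grid over `[low, high]`, and the strict `>` replacement rule are all hypothetical choices.

```python
from typing import Dict, Sequence, Tuple

Cell = Tuple[int, ...]


class GridArchive:
    """MAP-Elites-style archive: one elite (best fitness) per descriptor cell."""

    def __init__(self, cells_per_dim: int, low: float = 0.0, high: float = 1.0):
        self.cells_per_dim = cells_per_dim
        self.low = low
        self.high = high
        # cell index -> (fitness, solution)
        self.elites: Dict[Cell, Tuple[float, object]] = {}

    def cell_of(self, descriptor: Sequence[float]) -> Cell:
        """Discretise each descriptor coordinate into a bin, clamping to the grid."""
        width = (self.high - self.low) / self.cells_per_dim
        return tuple(
            max(0, min(int((d - self.low) / width), self.cells_per_dim - 1))
            for d in descriptor
        )

    def add(self, solution: object, fitness: float, descriptor: Sequence[float]) -> bool:
        """Keep `solution` if its cell is empty or it beats the current elite."""
        cell = self.cell_of(descriptor)
        current = self.elites.get(cell)
        if current is None or fitness > current[0]:
            self.elites[cell] = (fitness, solution)
            return True
        return False


archive = GridArchive(cells_per_dim=4)
archive.add("policy_a", 1.0, (0.1, 0.9))    # fills cell (0, 3)
archive.add("policy_b", 0.5, (0.2, 0.8))    # same cell, lower fitness: rejected
archive.add("policy_c", 2.0, (0.15, 0.95))  # same cell, higher fitness: replaces policy_a
archive.add("policy_d", 0.1, (1.0, 0.0))    # new cell (3, 0)
print(sorted(archive.elites))
```

Only the archive side is shown here; how candidate policies are generated (by mutation, gradient steps, or both) is deliberately left out.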

Code & Models

Repositories

adaptive-intelligent-robotics/dcg-map-elites (JAX)