Guiding Evolutionary Strategies by Differentiable Robot Simulators

Vladislav Kurenkov; Bulat Maksudov

arXiv:2110.00438 · cs.RO · November 10, 2021

TL;DR

This paper proposes combining gradients from Differentiable Robot Simulators (DRS) with Evolutionary Strategies to reduce sample complexity in robotic policy search. Preliminary results suggest a 3x-5x reduction in both simulation and the real world.

Contribution

The paper shows how DRS gradients, which are not always useful for first-order optimization on their own, can be used in conjunction with Evolutionary Strategies to improve sample efficiency in robotic policy search.

Findings

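01 Preliminary results suggest that the sample complexity of Evolutionary Strategies drops by 3x-5x.

02 The reduction is reported in both simulation and the real world.

03 A DRS gradient that is not directly useful for first-order optimization can still be used in conjunction with Evolutionary Strategies.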

Abstract

In recent years, Evolutionary Strategies have been actively explored in robotic tasks for policy search, as they provide a simpler alternative to reinforcement learning algorithms. However, this class of algorithms is often claimed to be extremely sample-inefficient. On the other hand, there is growing interest in Differentiable Robot Simulators (DRS), as they can potentially find successful policies with only a handful of trajectories, but the resulting gradient is not always useful for first-order optimization. In this work, we demonstrate how the DRS gradient can be used in conjunction with Evolutionary Strategies. Preliminary results suggest that this combination can reduce the sample complexity of Evolutionary Strategies by 3x-5x in both simulation and the real world.
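
The abstract does not say how the DRS gradient enters the search. One plausible reading, assumed here and not confirmed by this page, is a Guided-ES-style scheme (Maheswaranathan et al., 2019), in which the simulator gradient stretches the ES sampling distribution along a guiding direction. The following is a minimal NumPy sketch on a toy quadratic loss. The names `guided_es_step` and `run_demo`, the noisy `guide` vector standing in for a simulator gradient, and all hyperparameter values are hypothetical and are not taken from the paper or its repository.

```python
import numpy as np


def guided_es_step(loss, x, guide, rng, sigma=0.1, alpha=0.5,
                   beta=1.0, pairs=4, lr=0.1):
    """One Guided-ES-style update: antithetic ES whose search noise is
    stretched along `guide` (for example, a simulator gradient)."""
    n = x.size
    norm = np.linalg.norm(guide)
    if norm == 0.0:
        # No usable guide: fall back to isotropic search.
        alpha, u = 1.0, np.zeros(n)
    else:
        u = guide / norm  # unit basis of the 1-D guiding subspace (k = 1)
    grad_est = np.zeros(n)
    for _ in range(pairs):
        # eps ~ N(0, sigma^2 * (alpha / n * I + (1 - alpha) * u u^T))
        eps = sigma * (np.sqrt(alpha / n) * rng.standard_normal(n)
                       + np.sqrt(1.0 - alpha) * rng.standard_normal() * u)
        # Antithetic pair: two loss evaluations per perturbation.
        grad_est += eps * (loss(x + eps) - loss(x - eps))
    grad_est *= beta / (2.0 * sigma**2 * pairs)
    return x - lr * grad_est


def run_demo(seed=0, steps=200, n=20):
    """Toy stand-in: quadratic loss, with a noisy copy of its gradient
    playing the role of the (assumed imperfect) simulator gradient."""
    rng = np.random.default_rng(seed)
    loss = lambda x: 0.5 * float(x @ x)
    x = np.ones(n)
    initial = loss(x)
    for _ in range(steps):
        # Hypothetical noisy DRS gradient: true gradient plus Gaussian noise.
        guide = x + 0.5 * rng.standard_normal(n)
        x = guided_es_step(loss, x, guide, rng)
    return initial, loss(x)
```

Calling `run_demo()` returns the loss before and after optimization. Each perturbation costs two loss evaluations, which in a robotics setting would correspond to two rollouts, so a more informative guiding direction is one way fewer rollouts could yield the same progress.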

Code & Models

Repositories

vkurenkov/guided-es-by-differentiable-simulators (Official)
