Shaped Policy Search for Evolutionary Strategies using Waypoints

Kiran Lekkala; Laurent Itti

arXiv:2105.14639·cs.RO·July 4, 2023

TL;DR

This paper introduces a method to enhance exploration in evolutionary strategies for reinforcement learning by leveraging intermediate waypoints and learned dynamics, demonstrated on driving and robotic arm tasks.

Contribution

The paper proposes incorporating intermediate waypoints and learned agent dynamics into evolution strategies to improve training efficiency on RL tasks.

Findings

01. Improved exploration in RL with waypoints.
02. Faster training convergence in experiments.
03. Applicable to diverse simulation environments.

Abstract

In this paper, we aim to improve exploration in black-box methods, particularly Evolution Strategies (ES), when applied to Reinforcement Learning (RL) problems where intermediate waypoints/subgoals are available. Since Evolution Strategies are highly parallelizable, instead of extracting just a scalar cumulative reward, we use the state-action pairs from the trajectories obtained during rollouts/evaluations to learn the dynamics of the agent. The learnt dynamics are then used in the optimization procedure to speed up training. Lastly, we show how our proposed approach is universally applicable by presenting results from experiments conducted on the CARLA driving and UR5 robotic arm simulators.
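
The abstract names three ingredients: waypoint-based rewards, state-action pairs collected during ES rollouts, and a dynamics model fitted to those pairs. The following is a minimal sketch of how they could fit together on a toy 1-D task; it is not the authors' implementation. Everything in it is an assumption made for illustration: the track, the `WAYPOINTS` values, the linear policy, the scalar-gain dynamics model, and in particular the choice to alternate real rollouts with rollouts under the learned model, since the abstract does not say how the learnt dynamics enter the optimization.

```python
import random
import statistics

WAYPOINTS = [1.0, 2.0, 3.0]  # hypothetical subgoals along a 1-D track
TRUE_GAIN = 0.5              # unknown to the learner: x' = x + TRUE_GAIN * a
HORIZON = 30


def rollout(theta, gain):
    """Run policy a = k * (wp - x) + b under dynamics x' = x + gain * a.

    Returns the waypoint-shaped return and the (x, a, x') transitions.
    """
    k, b = theta
    x, idx, total, transitions = 0.0, 0, 0.0, []
    for _ in range(HORIZON):
        wp = WAYPOINTS[idx]
        a = max(-1.0, min(1.0, k * (wp - x) + b))  # clipped action
        x_next = x + gain * a
        transitions.append((x, a, x_next))
        x = x_next
        total -= abs(wp - x)                       # dense shaping term
        if abs(wp - x) < 0.05 and idx < len(WAYPOINTS) - 1:
            idx += 1                               # advance to next waypoint
            total += 1.0                           # bonus for reaching it
    return total, transitions


def fit_gain(transitions):
    """Least-squares estimate of g in (x' - x) = g * a (toy dynamics model)."""
    num = sum(a * (xn - x) for x, a, xn in transitions)
    den = sum(a * a for _, a, _ in transitions)
    return num / den if den > 0 else 0.0


def es_step(theta, gain, rng, pop=20, sigma=0.1, lr=0.05):
    """One antithetic ES update; also returns the transitions it gathered."""
    samples, data = [], []
    for _ in range(pop):
        eps = [rng.gauss(0, 1) for _ in theta]
        plus = [t + sigma * e for t, e in zip(theta, eps)]
        minus = [t - sigma * e for t, e in zip(theta, eps)]
        r_plus, tr_plus = rollout(plus, gain)
        r_minus, tr_minus = rollout(minus, gain)
        samples.append((eps, r_plus, r_minus))
        data += tr_plus + tr_minus
    # Standardize returns so the step size does not depend on reward scale.
    scale = statistics.pstdev([r for _, rp, rm in samples for r in (rp, rm)]) or 1.0
    grad = [sum((rp - rm) / scale * eps[i] for eps, rp, rm in samples)
            for i in range(len(theta))]
    new_theta = [t + lr * g / (2 * pop * sigma) for t, g in zip(theta, grad)]
    return new_theta, data


def train(iterations=40, seed=0):
    """Alternate real-environment ES steps with steps under the fitted model."""
    rng = random.Random(seed)
    theta, buffer = [0.0, 0.0], []
    for it in range(iterations):
        if it % 2 == 0 or not buffer:
            theta, data = es_step(theta, TRUE_GAIN, rng)       # real rollouts
            buffer += data                                      # keep (x, a, x')
        else:
            theta, _ = es_step(theta, fit_gain(buffer), rng)   # model rollouts
    return theta, fit_gain(buffer)
```

Calling `train()` returns the final policy parameters and the fitted gain. Because the toy dynamics are noise-free and linear, the fitted gain matches `TRUE_GAIN` after the first real-environment step; a realistic agent such as a vehicle or robotic arm would need a richer model class and would not be recovered this cleanly.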

Taxonomy

Methods: Entropy Regularization · Proximal Policy Optimization · CARLA: An Open Urban Driving Simulator