Shaped Policy Search for Evolutionary Strategies using Waypoints
Kiran Lekkala, Laurent Itti

TL;DR
This paper introduces a method to enhance exploration in evolutionary strategies for reinforcement learning by leveraging intermediate waypoints and learned dynamics, demonstrated on driving and robotic arm tasks.
Contribution
It proposes a novel approach that incorporates intermediate waypoints and learned dynamics into evolutionary strategies to improve training efficiency in RL tasks.
Findings
Improved exploration in RL with waypoints.
Faster training convergence in experiments.
Applicable to diverse simulation environments.
Abstract
In this paper, we try to improve exploration in black-box methods, particularly evolution strategies (ES), when applied to reinforcement learning (RL) problems where intermediate waypoints/subgoals are available. Since evolution strategies are highly parallelizable, instead of extracting just a scalar cumulative reward, we use the state-action pairs from the trajectories obtained during rollouts/evaluations to learn the dynamics of the agent. The learnt dynamics are then used in the optimization procedure to speed up training. Lastly, we show how our proposed approach is universally applicable by presenting results from experiments conducted on the CARLA driving and UR5 robotic arm simulators.
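The core idea in the abstract, keeping the state-action pairs from ES rollouts instead of discarding everything but the scalar return, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the toy 2-D point agent, the linear policy, the waypoint-distance reward, the linear least-squares dynamics model, and the function names `rollout`, `es_step`, and `fit_linear_dynamics`. The abstract does not say how the learnt dynamics enter the optimization procedure, so the sketch stops at fitting the model.

```python
import numpy as np

def rollout(theta, waypoints, steps=40):
    """Toy 2-D point agent (hypothetical): a linear policy steers toward the active waypoint."""
    W = theta.reshape(2, 2)
    s = np.zeros(2)
    transitions, ret, idx = [], 0.0, 0
    for _ in range(steps):
        a = np.clip(W @ (waypoints[idx] - s), -1.0, 1.0)
        s_next = s + 0.1 * a                  # true dynamics of the toy task, unknown to the learner
        transitions.append((s, a, s_next))    # keep the transition, not just the reward
        dist = np.linalg.norm(waypoints[idx] - s_next)
        ret -= dist                           # assumed shaped reward: distance to active waypoint
        if dist < 0.1 and idx < len(waypoints) - 1:
            idx += 1                          # advance to the next waypoint once close enough
        s = s_next
    return ret, transitions

def es_step(theta, waypoints, rng, pop=16, sigma=0.1, lr=0.05):
    """One basic ES update that also returns every transition seen during evaluation."""
    eps = rng.standard_normal((pop, theta.size))
    returns, data = [], []
    for e in eps:
        r, tr = rollout(theta + sigma * e, waypoints)
        returns.append(r)
        data.extend(tr)
    returns = np.asarray(returns)
    adv = (returns - returns.mean()) / (returns.std() + 1e-8)  # normalized returns
    grad = eps.T @ adv / (pop * sigma)                         # standard ES gradient estimate
    return theta + lr * grad, data

def fit_linear_dynamics(data):
    """Least-squares fit of s_next ~= [s, a] @ M (a linear model is an assumption here)."""
    X = np.array([np.concatenate([s, a]) for s, a, _ in data])
    Y = np.array([s_next for _, _, s_next in data])
    M, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return M  # shape (4, 2)
```

A single call such as `theta, data = es_step(np.eye(2).ravel(), waypoints, np.random.default_rng(0))` followed by `fit_linear_dynamics(data)` yields a dynamics estimate from the same rollouts that produced the policy update, so no extra environment interaction is needed for the model.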
Taxonomy
Methods: Entropy Regularization · Proximal Policy Optimization · CARLA: An Open Urban Driving Simulator
