TL;DR
This paper introduces a reward-conditioned neural movement primitive framework that leverages variational inference and evolutionary strategies to enable robots to learn complex trajectories efficiently, with demonstrated success in simulation and real-world tasks.
Contribution
The paper presents a novel neural process-based approach with a crossover operation in latent space for improved policy exploration and trajectory generation in robotic reinforcement learning.
Findings
Enhanced sample efficiency over state-of-the-art methods
Stable learning progress across diverse tasks
Successful real-world robot obstacle avoidance
Abstract
The aim of this paper is to study the reward-based policy exploration problem with a supervised learning approach and to enable robots to form complex movement trajectories in challenging reward settings and search spaces. For this, the experience of the robot, which can be bootstrapped from demonstrated trajectories, is used to train a novel Neural Processes-based deep network that samples from its latent space and generates the required trajectories given desired rewards. Our framework can generate progressively improved trajectories by sampling them from high-reward landscapes, increasing the reward gradually. Variational inference is used to create a stochastic latent space from which varying trajectories can be sampled, so that a population of trajectories is generated for given target rewards. We benefit from Evolutionary Strategies and propose a novel crossover operation, which is applied in the self-organized…
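To make the idea of recombining candidates in a latent space concrete, here is a minimal Python sketch. It is not the paper's operator: the abstract is cut off before the crossover is described, so this uses a generic uniform crossover from the evolutionary computation literature. The function names, the top-half parent selection rule, and the toy reward are all assumptions made for illustration.

```python
import random


def uniform_crossover(parent_a, parent_b, rng):
    """Generic uniform crossover (an assumption, not the paper's operator):
    each latent dimension is copied from one of the two parents at random."""
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same latent dimension")
    return [a if rng.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]


def next_population(latents, rewards, n_children, rng):
    """Keep the higher-reward half as parents, then breed children by crossover.
    The selection rule is a hypothetical choice for this sketch."""
    ranked = sorted(zip(rewards, latents), key=lambda pair: pair[0], reverse=True)
    parents = [z for _, z in ranked[: max(2, len(ranked) // 2)]]
    children = []
    for _ in range(n_children):
        a, b = rng.sample(parents, 2)  # two distinct parents
        children.append(uniform_crossover(a, b, rng))
    return children


rng = random.Random(0)
# Toy latent vectors standing in for samples from a learned latent space.
latents = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(6)]
# Toy reward (hypothetical): latents closer to the origin score higher.
rewards = [-sum(v * v for v in z) for z in latents]
children = next_population(latents, rewards, n_children=6, rng=rng)
```

In a full system, each child latent would be decoded into a trajectory conditioned on a target reward and then evaluated on the robot or in simulation. That decoding step is omitted here because it depends on the trained network.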
