Contextual Latent-Movements Off-Policy Optimization for Robotic Manipulation Skills
Samuele Tosatto, Georgia Chalvatzaki, Jan Peters

TL;DR
This paper introduces LAMPO, a novel off-policy reinforcement learning algorithm that efficiently optimizes low-dimensional latent movement primitives for robotic manipulation, improving sample efficiency in both simulation and real-world settings.
Contribution
The paper proposes a new framework combining low-dimensional latent dynamics with a contextual off-policy RL algorithm, LAMPO, for more efficient robot skill learning.
Findings
LAMPO achieves higher sample efficiency than existing methods.
Experimental results demonstrate successful transfer from simulation to real robots.
The approach effectively handles high-dimensional movement primitive optimization.
Abstract
Parameterized movement primitives have been extensively used for imitation learning of robotic tasks. However, the high dimensionality of the parameter space hinders the improvement of such primitives in the reinforcement learning (RL) setting, especially when learning with physical robots. In this paper, we propose a novel view on handling the demonstrated trajectories to acquire low-dimensional, non-linear latent dynamics, using mixtures of probabilistic principal component analyzers (MPPCA) on the movements' parameter space. Moreover, we introduce a new contextual off-policy RL algorithm, named LAtent-Movements Policy Optimization (LAMPO). LAMPO can provide gradient estimates from previous experience using self-normalized importance sampling, thereby making full use of samples collected in previous learning iterations. These advantages combined provide a complete framework for…
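The off-policy reuse mentioned in the abstract rests on self-normalized importance sampling (SNIS). The sketch below is a minimal illustration, assuming a one-dimensional Gaussian policy over a single movement parameter rather than LAMPO's contextual policy in the MPPCA latent space. It reweights previously collected samples by π(θ)/q(θ), normalizes the weights to sum to one, and forms the score-function gradient Σ w̃ᵢ (Rᵢ − Ĵ) ∇ log π(θᵢ) with respect to the policy mean. The function names (`gaussian_logpdf`, `snis_objective_and_grad`) and the toy reward are hypothetical, not taken from the paper's code.

```python
import math
import random


def gaussian_logpdf(x, mean, std):
    # Log-density of a univariate Gaussian N(mean, std^2) evaluated at x.
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std) - 0.5 * math.log(2 * math.pi)


def snis_objective_and_grad(samples, rewards, behavior_logps, mean, std):
    """Self-normalized IS estimate of E_pi[R] and its gradient w.r.t. the mean.

    Hypothetical illustration (1-D Gaussian policy), not LAMPO's estimator.
    samples:        parameters theta_i drawn earlier from behavior distributions q
    behavior_logps: log q(theta_i) under the distribution that generated theta_i
    mean, std:      parameters of the current Gaussian target policy pi
    """
    log_w = [gaussian_logpdf(x, mean, std) - lq
             for x, lq in zip(samples, behavior_logps)]
    # Subtract the largest log-weight before exponentiating for numerical
    # stability; the constant shift cancels after normalization.
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    w_tilde = [wi / total for wi in w]  # normalized weights, sum to 1

    j_hat = sum(wt * r for wt, r in zip(w_tilde, rewards))
    # d/d(mean) log N(x; mean, std^2) = (x - mean) / std^2
    grad_mean = sum(wt * (r - j_hat) * (x - mean) / std ** 2
                    for wt, r, x in zip(w_tilde, rewards, samples))
    # Effective sample size of the normalized weights.
    ess = 1.0 / sum(wt ** 2 for wt in w_tilde)
    return j_hat, grad_mean, ess


if __name__ == "__main__":
    random.seed(0)
    # Samples collected once by an earlier behavior policy N(0, 1).
    thetas = [random.gauss(0.0, 1.0) for _ in range(500)]
    logq = [gaussian_logpdf(t, 0.0, 1.0) for t in thetas]
    rewards = [-(t - 1.0) ** 2 for t in thetas]  # toy reward peaked at theta = 1
    mean = 0.0
    # Gradient ascent on the mean that reuses the same old samples every step.
    for _ in range(20):
        j, g, ess = snis_objective_and_grad(thetas, rewards, logq, mean, 1.0)
        mean += 0.1 * g
    print(f"mean={mean:.2f}, ess={ess:.1f}")
```

Because the weights are normalized, any constant offset in the log-densities cancels, which is why subtracting the maximum log-weight is safe. The effective sample size `ess` shrinks as the current policy drifts away from the one that collected the data, a common signal that old samples are becoming less informative.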
