Deep Black-Box Reinforcement Learning with Movement Primitives
Fabian Otto, Onur Celik, Hongyi Zhou, Hanna Ziesche, Ngo Anh Vien, Gerhard Neumann

TL;DR
This paper introduces a deep episode-based reinforcement learning (ERL) algorithm that combines movement primitives with differentiable trust region layers, aimed at robotic control tasks with sparse and non-Markovian rewards.
Contribution
It presents a novel deep ERL algorithm with differentiable trust region layers, enabling high-precision policy learning for complex control tasks.
Findings
ERL outperforms step-based algorithms on sparse and non-Markovian rewards.
Trust region layers impose trust regions on the policy update that are solved exactly via convex optimization.
High-quality policies are achieved with sparse and non-Markovian reward formulations.
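To make the trust region idea concrete, here is a minimal sketch of projecting a Gaussian policy mean back into a trust region around the old policy. It is not the authors' implementation: it assumes the bound is placed on the squared Mahalanobis distance between the new and old mean (measured with the old covariance), covers only the mean, and leaves out the covariance projection. The names `mahalanobis_sq`, `project_mean`, and `eps` are hypothetical.

```python
import numpy as np


def mahalanobis_sq(mean, old_mean, old_cov):
    """Squared Mahalanobis distance of `mean` from `old_mean` under `old_cov`."""
    diff = mean - old_mean
    return float(diff @ np.linalg.solve(old_cov, diff))


def project_mean(mean, old_mean, old_cov, eps):
    """Return the closest mean (in the same metric) that satisfies the bound.

    Assumption: the trust region is mahalanobis_sq(mean, old_mean, old_cov) <= eps.
    If the bound is violated, the step is shrunk along the line to the old mean
    until it lies exactly on the boundary.
    """
    dist = mahalanobis_sq(mean, old_mean, old_cov)
    if dist <= eps:
        return mean.copy()  # already inside the trust region
    scale = np.sqrt(eps / dist)
    return old_mean + scale * (mean - old_mean)


old_mean = np.zeros(2)
old_cov = np.diag([1.0, 4.0])
proposed = np.array([3.0, 0.0])  # squared distance 9, outside a bound of 1
projected = project_mean(proposed, old_mean, old_cov, eps=1.0)
```

Because the projection is a closed-form rescaling, it is differentiable almost everywhere in `mean`, which is what would allow such a step to sit inside a network as a layer.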
Abstract
Episode-based reinforcement learning (ERL) algorithms treat reinforcement learning (RL) as a black-box optimization problem where we learn to select a parameter vector of a controller, often represented as a movement primitive, for a given task descriptor called a context. ERL offers several distinct benefits in comparison to step-based RL. It generates smooth control trajectories, can handle non-Markovian reward definitions, and the resulting exploration in parameter space is well suited for solving sparse reward settings. Yet, the high dimensionality of the movement primitive parameters has so far hampered the effective use of deep RL methods. In this paper, we present a new algorithm for deep ERL. It is based on differentiable trust region layers, a successful on-policy deep RL algorithm. These layers allow us to specify trust regions for the policy update that are solved exactly…
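The episode-based setting described in the abstract can be sketched in a few lines. This is an illustrative toy, not the paper's method: it assumes a scalar context, a linear-Gaussian search distribution over the parameter vector, and a normalized radial-basis trajectory generator as a simplified stand-in for a movement primitive. All function and variable names are hypothetical.

```python
import numpy as np


def rbf_features(num_steps, num_basis, width=0.02):
    """Normalized radial basis features over phase in [0, 1]; shape (num_steps, num_basis)."""
    phase = np.linspace(0.0, 1.0, num_steps)[:, None]
    centers = np.linspace(0.0, 1.0, num_basis)[None, :]
    phi = np.exp(-((phase - centers) ** 2) / (2.0 * width))
    return phi / phi.sum(axis=1, keepdims=True)


def trajectory(weights, num_steps=100):
    """Map one parameter vector to a full 1-D desired trajectory."""
    return rbf_features(num_steps, weights.shape[0]) @ weights


def episode_return(traj, context):
    """Episode-level return, computed once from the whole trajectory.

    Only the final position relative to the goal (the context) is rewarded,
    minus a smoothness penalty over the entire trajectory.
    """
    goal_error = (traj[-1] - context) ** 2
    smoothness = np.sum(np.diff(traj, n=2) ** 2)
    return -(goal_error + smoothness)


def sample_parameters(rng, context, gain, bias, cov):
    """Draw a parameter vector from a context-conditioned Gaussian (assumed linear mean)."""
    mean = gain * context + bias
    return rng.multivariate_normal(mean, cov)


rng = np.random.default_rng(0)
num_basis = 5
context = 0.7  # task descriptor, here a goal position
gain = np.zeros(num_basis)
bias = np.zeros(num_basis)
cov = 0.1 * np.eye(num_basis)

weights = sample_parameters(rng, context, gain, bias, cov)  # exploration in parameter space
traj = trajectory(weights)                                  # one smooth trajectory per episode
ret = episode_return(traj, context)                         # one scalar per episode
```

The learner only ever sees the triple (context, parameter vector, return), which is why the problem can be treated as black-box optimization and why the return does not need to decompose into per-step rewards.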
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Bandit Algorithms Research · Age of Information Optimization
