TL;DR
This paper extends the Black-DROPS model-based policy search algorithm with parameterized black-box priors, so that it scales to high-dimensional robots, tolerates inaccurate prior information, and still needs only seconds of interaction time.
Contribution
The paper proposes a new model learning procedure for Black-DROPS that uses parameterized black-box priors to (1) scale model-based policy search to high-dimensional systems and (2) remain robust to large inaccuracies in the prior information.
Findings
Outperforms previous algorithms in data efficiency.
Enables a hexapod robot to learn gaits in 16-30 seconds of interaction time.
Successfully applied to high-dimensional robot control tasks.
Abstract
The most data-efficient algorithms for reinforcement learning in robotics are model-based policy search algorithms, which alternate between learning a dynamical model of the robot and optimizing a policy to maximize the expected return given the model and its uncertainties. Among the few proposed approaches, the recently introduced Black-DROPS algorithm exploits a black-box optimization algorithm to achieve both high data-efficiency and good computation times when several cores are used; nevertheless, like all model-based policy search approaches, Black-DROPS does not scale to high dimensional state/action spaces. In this paper, we introduce a new model learning procedure in Black-DROPS that leverages parameterized black-box priors to (1) scale up to high-dimensional systems, and (2) be robust to large inaccuracies of the prior information. We demonstrate the effectiveness of our…
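The alternation described in the abstract (run the policy on the robot, learn a model, optimize the policy against the model) can be illustrated with a toy sketch. This is not the Black-DROPS implementation: the 1-D system, the names `TRUE_GAIN`, `prior_step`, `fit_model`, and `learn`, the least-squares fit of the prior's parameter, the constant residual (standing in for a learned probabilistic residual model), and the random search (standing in for a proper black-box optimizer) are all assumptions made for illustration.

```python
import random

TRUE_GAIN = 0.5   # hidden actuator gain of the toy "robot" (illustrative assumption)
TARGET = 1.0      # goal position
HORIZON = 10      # steps per episode


def real_step(x, u):
    """Ground-truth dynamics; the learner only sees its transitions."""
    return x + TRUE_GAIN * u


def prior_step(x, u, gain):
    """Parameterized prior: right structure, but the gain must be tuned."""
    return x + gain * u


def fit_model(data):
    """Tune the prior's gain on the data, then fit a constant residual."""
    # Least squares for gain in (x_next - x) = gain * u.
    num = sum(u * (xn - x) for x, u, xn in data)
    den = sum(u * u for _, u, _ in data)
    gain = num / den if den > 0 else 1.0
    # Constant correction for whatever the tuned prior still gets wrong.
    residual = sum(xn - prior_step(x, u, gain) for x, u, xn in data) / len(data)
    return (lambda x, u: prior_step(x, u, gain) + residual), gain


def policy(x, k):
    """Proportional controller with actions clipped to [-1, 1]."""
    return max(-1.0, min(1.0, k * (TARGET - x)))


def rollout(step, k):
    """Run one episode with dynamics `step`; return (return, transitions)."""
    x, ret, traj = 0.0, 0.0, []
    for _ in range(HORIZON):
        u = policy(x, k)
        xn = step(x, u)
        traj.append((x, u, xn))
        ret -= (xn - TARGET) ** 2
        x = xn
    return ret, traj


def optimize_policy(model, rng, n_candidates=200):
    """Black-box search: sample policy parameters, keep the best under the model."""
    candidates = [rng.uniform(0.0, 4.0) for _ in range(n_candidates)]
    return max(candidates, key=lambda k: rollout(model, k)[0])


def learn(episodes=3, seed=0):
    rng = random.Random(seed)
    data, k, history = [], 0.2, []       # start from a deliberately weak policy
    for _ in range(episodes):
        ret, traj = rollout(real_step, k)  # interact with the real system
        history.append(ret)
        data.extend(traj)
        model, gain = fit_model(data)      # learn the model from all data so far
        k = optimize_policy(model, rng)    # improve the policy on the model only
    final_ret, _ = rollout(real_step, k)
    return gain, k, final_ret, history
```

Calling `learn()` returns the fitted prior gain, the final policy parameter, the return of the final policy on the real system, and the per-episode returns, which makes it easy to check how much a single episode of real data improves the policy in this toy setting.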
