A survey on policy search algorithms for learning robot controllers in a handful of trials
Konstantinos Chatzilygeroudis, Vassilis Vassiliades, Freek Stulp, Sylvain Calinon, Jean-Baptiste Mouret

TL;DR
This survey reviews methods enabling robots to learn effective policies within a few trials by leveraging prior knowledge and data-driven models, addressing the challenge of micro-data reinforcement learning.
Contribution
It categorizes existing approaches for rapid policy learning, highlighting the combination of priors and surrogate models as key to success in micro-data reinforcement learning.
Findings
Leveraging prior knowledge improves sample efficiency.
Data-driven surrogate models enable faster policy optimization.
Combining priors and models is most effective for micro-data learning.
Abstract
Most policy search algorithms require thousands of training episodes to find an effective policy, which is often infeasible with a physical robot. This survey article focuses on the extreme other end of the spectrum: how can a robot adapt with only a handful of trials (a dozen) and a few minutes? By analogy with the word "big-data", we refer to this challenge as "micro-data reinforcement learning". We show that a first strategy is to leverage prior knowledge on the policy structure (e.g., dynamic movement primitives), on the policy parameters (e.g., demonstrations), or on the dynamics (e.g., simulators). A second strategy is to create data-driven surrogate models of the expected reward (e.g., Bayesian optimization) or the dynamical model (e.g., model-based policy search), so that the policy optimizer queries the model instead of the real system. Overall, all successful micro-data…
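To make the second strategy concrete, here is a minimal sketch of policy search with a surrogate model of the expected reward, in the style of Bayesian optimization. It is an illustration under stated assumptions, not the method of any particular paper covered by the survey: `run_trial` is a hypothetical stand-in for one episode on the robot, the policy has a single parameter in [0, 1], the surrogate is a zero-mean Gaussian process with hand-picked hyperparameters, and the upper-confidence-bound acquisition is just one common choice. The point it shows is that the optimizer queries the surrogate at many candidate parameters and spends a real trial only on the most promising one.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.2, variance=1.0):
    # Squared-exponential kernel between two 1-D arrays of policy parameters.
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(x_train, y_train, x_query, noise=1e-4):
    # Posterior mean and standard deviation of a zero-mean GP at x_query.
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = rbf_kernel(x_query, x_query).diagonal() - np.sum(v ** 2, axis=0)
    return mean, np.sqrt(np.maximum(var, 1e-12))

def run_trial(theta):
    # Hypothetical stand-in for one episode on the physical robot.
    # Assumption: the (unknown to the learner) reward peaks at theta = 0.7.
    return -(theta - 0.7) ** 2

def bayes_opt(n_trials=12, kappa=2.0, seed=0):
    rng = np.random.default_rng(seed)
    candidates = np.linspace(0.0, 1.0, 201)  # parameters scored on the surrogate
    thetas = list(rng.uniform(0.0, 1.0, size=2))  # two initial random trials
    rewards = [run_trial(t) for t in thetas]
    while len(thetas) < n_trials:
        mean, std = gp_posterior(np.array(thetas), np.array(rewards), candidates)
        # Upper confidence bound: favor high predicted reward or high uncertainty.
        theta_next = candidates[np.argmax(mean + kappa * std)]
        thetas.append(theta_next)
        rewards.append(run_trial(theta_next))  # the only call to the real system
    best = int(np.argmax(rewards))
    return thetas[best], rewards[best], len(thetas)

best_theta, best_reward, n_used = bayes_opt()
```

With the default budget of 12 trials, the loop matches the "handful of trials (a dozen)" regime the abstract describes: every acquisition step evaluates 201 candidates on the model but only one on `run_trial`.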
