A survey on policy search algorithms for learning robot controllers in a handful of trials

Konstantinos Chatzilygeroudis; Vassilis Vassiliades; Freek Stulp; Sylvain Calinon; Jean-Baptiste Mouret

arXiv:1807.02303·cs.RO·December 5, 2019

TL;DR

This survey reviews methods enabling robots to learn effective policies within a few trials by leveraging prior knowledge and data-driven models, addressing the challenge of micro-data reinforcement learning.

Contribution

The survey categorizes existing approaches to rapid policy learning and identifies the combination of priors and surrogate models as the key to success in micro-data reinforcement learning.

Findings

1. Leveraging prior knowledge improves sample efficiency.
2. Data-driven surrogate models enable faster policy optimization.
3. Combining priors and models is most effective for micro-data learning.

Abstract

Most policy search algorithms require thousands of training episodes to find an effective policy, which is often infeasible with a physical robot. This survey article focuses on the extreme other end of the spectrum: how can a robot adapt with only a handful of trials (a dozen) and a few minutes? By analogy with the word "big-data", we refer to this challenge as "micro-data reinforcement learning". We show that a first strategy is to leverage prior knowledge on the policy structure (e.g., dynamic movement primitives), on the policy parameters (e.g., demonstrations), or on the dynamics (e.g., simulators). A second strategy is to create data-driven surrogate models of the expected reward (e.g., Bayesian optimization) or the dynamical model (e.g., model-based policy search), so that the policy optimizer queries the model instead of the real system. Overall, all successful micro-data…
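
To make the second strategy in the abstract more concrete, here is a minimal sketch of a surrogate model of the expected reward, in the style of Bayesian optimization. It is an illustration under stated assumptions, not a method taken from the survey: the policy has a single parameter `theta` in [0, 1], the reward function `run_episode` is a hypothetical stand-in for one trial on the robot, and the kernel length scale, the exploration weight `kappa`, and the budget of 12 trials are all arbitrary illustrative choices. The optimizer evaluates the Gaussian process surrogate on a dense grid of candidates and only spends a real trial on the single most promising one.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel with unit prior variance (assumed)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)


def gp_posterior(x_train, y_train, x_query, noise=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP."""
    K = rbf_kernel(x_train, x_train) + noise * np.eye(len(x_train))
    K_s = rbf_kernel(x_train, x_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    # Prior variance is 1.0 because the kernel has unit variance.
    var = np.clip(1.0 - np.sum(v ** 2, axis=0), 0.0, None)
    return mean, np.sqrt(var)


def run_episode(theta):
    """Hypothetical stand-in for one trial on the real system.

    Returns a made-up reward that peaks at theta = 0.7.
    """
    return float(np.exp(-((theta - 0.7) ** 2) / 0.02))


def micro_data_search(n_trials=12, kappa=2.0):
    """Spend n_trials real episodes, choosing each one via the surrogate."""
    candidates = np.linspace(0.0, 1.0, 201)
    thetas = [0.0, 0.5, 1.0]  # a few initial trials
    rewards = [run_episode(t) for t in thetas]
    while len(thetas) < n_trials:
        mean, std = gp_posterior(np.array(thetas), np.array(rewards), candidates)
        # Upper confidence bound: query the model, not the robot.
        ucb = mean + kappa * std
        theta_next = float(candidates[np.argmax(ucb)])
        thetas.append(theta_next)
        rewards.append(run_episode(theta_next))
    best = int(np.argmax(rewards))
    return thetas[best], rewards[best], len(thetas)


if __name__ == "__main__":
    best_theta, best_reward, n_used = micro_data_search()
    print(f"best theta {best_theta:.3f}, reward {best_reward:.3f}, trials {n_used}")
```

In this sketch the expensive resource is the call to `run_episode`; every other computation runs on the surrogate. Replacing the zero-mean prior with one informed by a simulator or a demonstration would be one way to combine the two strategies the abstract describes.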
