Learning Gaussian Policies from Corrective Human Feedback

Daan Wout; Jan Scholten; Carlos Celemin; Jens Kober

arXiv:1903.05216 · cs.LG · March 14, 2019 · 1 citation

TL;DR

This paper introduces Gaussian Process Coach (GPC), a novel method for learning policies from corrective human feedback that improves performance and robustness over existing approaches by leveraging Gaussian Processes and policy uncertainty.

Contribution

The paper proposes GPC, which eliminates feature engineering by using Gaussian Processes and utilizes policy uncertainty for active feedback querying and adaptive learning rates.

Findings

1. GPC outperforms COACH in final performance and convergence rate.

2. GPC shows increased robustness to erroneous feedback.

3. GPC is effective in both simulated and real human teaching scenarios.

Abstract

Learning from human feedback is a viable alternative to control design that does not require modelling or control expertise. In particular, learning from corrective advice has advantages over evaluative feedback, as it is a more intuitive and scalable format. The current state-of-the-art in this field, COACH, has proven to be an effective approach for confined problems. However, it parameterizes the policy with Radial Basis Function networks, which require meticulous feature space engineering for higher order systems. We introduce Gaussian Process Coach (GPC), where feature space engineering is avoided by employing Gaussian Processes. In addition, we use the available policy uncertainty to 1) inquire feedback samples of maximal utility and 2) adapt the learning rate to the teacher's learning phase. We demonstrate that the novel algorithm outperforms the current state-of-the-art in…
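The two uses of policy uncertainty named in the abstract can be illustrated with a toy sketch. This is not the authors' GPC implementation: the abstract does not give the update rule, so the code assumes a 1-D state and action, a squared-exponential kernel, feedback given as a sign (+1 or -1), and a correction step proportional to the posterior standard deviation. The names `GPPolicySketch`, `correct`, `most_uncertain`, and `max_step` are hypothetical.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=0.5, signal_var=1.0):
    """Squared-exponential kernel between two sets of 1-D states."""
    diff = a[:, None] - b[None, :]
    return signal_var * np.exp(-0.5 * (diff / length_scale) ** 2)


class GPPolicySketch:
    """Toy GP policy over 1-D states and actions (illustrative only, not the paper's GPC)."""

    def __init__(self, noise_var=1e-2, max_step=1.0):
        self.noise_var = noise_var
        self.max_step = max_step  # hypothetical scale for the correction size
        self.states = np.empty(0)
        self.actions = np.empty(0)

    def predict(self, query):
        """Return the posterior mean and standard deviation at the query states."""
        query = np.atleast_1d(np.asarray(query, dtype=float))
        prior_var = np.diag(rbf_kernel(query, query))
        if self.states.size == 0:
            # No data yet: zero-mean prior with prior variance.
            return np.zeros_like(query), np.sqrt(prior_var)
        k_train = rbf_kernel(self.states, self.states)
        k_train = k_train + self.noise_var * np.eye(self.states.size)
        k_cross = rbf_kernel(query, self.states)
        mean = k_cross @ np.linalg.solve(k_train, self.actions)
        solved = np.linalg.solve(k_train, k_cross.T)
        var = prior_var - np.einsum("ij,ji->i", k_cross, solved)
        return mean, np.sqrt(np.clip(var, 0.0, None))

    def correct(self, state, feedback_sign):
        """Assumed update: shift the action in the advised direction,
        with a step that shrinks as the policy becomes more certain."""
        mean, std = self.predict(state)
        step = self.max_step * std[0]
        self.states = np.append(self.states, float(state))
        self.actions = np.append(self.actions, mean[0] + feedback_sign * step)

    def most_uncertain(self, candidates):
        """Assumed query rule: pick the candidate state with the largest posterior std."""
        candidates = np.asarray(candidates, dtype=float)
        _, std = self.predict(candidates)
        return float(candidates[np.argmax(std)])


policy = GPPolicySketch()
policy.correct(0.0, +1)                      # teacher advises "increase the action" at state 0
_, std_near = policy.predict(0.0)            # uncertainty where feedback was given
_, std_far = policy.predict(3.0)             # uncertainty far from any feedback
query_state = policy.most_uncertain([0.0, 0.1, 3.0])
print(std_near[0], std_far[0], query_state)
```

In this sketch, a state that has already received feedback has a low posterior standard deviation, so later corrections there are small, while unvisited states keep a large step and are preferred when asking for feedback. The kernel needs no hand-designed feature space beyond a length scale, which is the property the abstract contrasts with Radial Basis Function networks.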

Taxonomy

Topics: Gaussian Processes and Bayesian Inference · Control Systems and Identification · Reinforcement Learning in Robotics