Learning Gaussian Policies from Corrective Human Feedback
Daan Wout, Jan Scholten, Carlos Celemin, Jens Kober

TL;DR
This paper introduces Gaussian Process Coach (GPC), a novel method for learning policies from corrective human feedback that improves performance and robustness over existing approaches by leveraging Gaussian Processes and policy uncertainty.
Contribution
The paper proposes GPC, which eliminates feature engineering by using Gaussian Processes and utilizes policy uncertainty for active feedback querying and adaptive learning rates.
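The two uses of policy uncertainty named above can be illustrated with a small sketch. This is not the paper's implementation: the 1-D state, the squared-exponential kernel, its hyperparameters, the update rule (corrected target = current mean + feedback sign × step), and the names GPPolicy, correct, and most_uncertain_state are all assumptions made for illustration.

```python
import numpy as np

LENGTH_SCALE = 0.5   # assumed kernel length scale
SIGNAL_VAR = 1.0     # assumed prior variance of the policy output


def rbf_kernel(a, b):
    """Squared-exponential kernel between two arrays of 1-D states."""
    d = a[:, None] - b[None, :]
    return SIGNAL_VAR * np.exp(-0.5 * (d / LENGTH_SCALE) ** 2)


class GPPolicy:
    """Hypothetical GP policy: maps a scalar state to a scalar action."""

    def __init__(self, noise_var=1e-2):
        self.noise_var = noise_var
        self.states = np.empty(0)
        self.actions = np.empty(0)

    def predict(self, s):
        """Return the GP posterior mean and variance of the action at s."""
        s = np.atleast_1d(np.asarray(s, dtype=float))
        prior_var = np.full(s.shape, SIGNAL_VAR)
        if self.states.size == 0:
            return np.zeros_like(s), prior_var  # zero-mean prior
        K = rbf_kernel(self.states, self.states)
        K += self.noise_var * np.eye(self.states.size)
        k_star = rbf_kernel(self.states, s)
        mean = k_star.T @ np.linalg.solve(K, self.actions)
        var = prior_var - np.sum(k_star * np.linalg.solve(K, k_star), axis=0)
        return mean, np.maximum(var, 0.0)

    def correct(self, s, h, base_rate=0.5):
        """Apply corrective feedback h in {-1, +1} at state s.

        Assumed rule: the step size scales with the predictive standard
        deviation, so uncertain regions receive larger corrections.
        """
        mean, var = self.predict(s)
        step = base_rate * np.sqrt(var[0])
        target = mean[0] + h * step
        self.states = np.append(self.states, s)
        self.actions = np.append(self.actions, target)
        return step


def most_uncertain_state(policy, candidates):
    """Pick the candidate state where feedback is assumed most useful."""
    candidates = np.asarray(candidates, dtype=float)
    _, var = policy.predict(candidates)
    return candidates[np.argmax(var)]


policy = GPPolicy()
first_step = policy.correct(0.0, h=+1)    # large step: nothing known yet
second_step = policy.correct(0.0, h=+1)   # smaller step: uncertainty dropped
query = most_uncertain_state(policy, [0.0, 0.2, 3.0])
```

In this sketch, repeated feedback at the same state shrinks the correction step because the posterior variance there falls, and the query helper prefers states far from any previous correction. No radial basis function features are placed by hand; the kernel alone defines how corrections generalize to nearby states.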
Findings
GPC outperforms COACH in final performance and convergence rate.
GPC shows increased robustness to erroneous feedback.
GPC is effective in both simulated and real human teaching scenarios.
Abstract
Learning from human feedback is a viable alternative to control design that does not require modelling or control expertise. In particular, learning from corrective advice has advantages over evaluative feedback, as it is a more intuitive and scalable format. The current state-of-the-art in this field, COACH, has proven to be an effective approach for confined problems. However, it parameterizes the policy with Radial Basis Function networks, which require meticulous feature space engineering for higher order systems. We introduce Gaussian Process Coach (GPC), where feature space engineering is avoided by employing Gaussian Processes. In addition, we use the available policy uncertainty 1) to query feedback samples of maximal utility and 2) to adapt the learning rate to the teacher's learning phase. We demonstrate that the novel algorithm outperforms the current state-of-the-art in…
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Control Systems and Identification · Reinforcement Learning in Robotics
