Active Preference-Based Gaussian Process Regression for Reward Learning

Erdem Bıyık, Nicolas Huynh, Mykel J. Kochenderfer, Dorsa Sadigh

arXiv:2005.02575 · cs.RO · June 5, 2020 · 6 cites

TL;DR

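This paper introduces an active preference-based Gaussian process method for reward learning. It infers expressive reward functions from human preferences between trajectories, addressing two problems of earlier approaches: the data inefficiency of loosely structured reward models and the restrictive structure (e.g., linearity in predefined features) of others.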

Contribution

It proposes an active learning framework that uses Gaussian processes (GPs) to learn reward functions solely from human preferences, without assuming a rigid reward structure such as linearity in a predefined set of features.
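To make the idea concrete, here is a minimal Python/NumPy sketch of a generic preference-based GP, assuming each trajectory is summarized by a fixed-length feature vector. It places a GP prior over latent reward values, uses the standard probit likelihood for pairwise comparisons, fits the posterior mode with a Laplace (Newton) approximation, and picks the next query with a simple maximum-variance heuristic. This is not the authors' implementation (see the official repository under Code & Models). The feature representation, RBF kernel, noise scale `sigma`, and all function names (`fit_preference_gp`, `diff_posterior`, `pref_prob`, `most_uncertain_pair`) are hypothetical. The query rule in particular is a swapped-in uncertainty-sampling heuristic, not the paper's acquisition function.

```python
import itertools
import math

import numpy as np


def rbf_kernel(A, B, lengthscale=1.0):
    """Squared-exponential kernel between the rows of A (n x d) and B (m x d)."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq / lengthscale**2)


def norm_cdf(z):
    # erfc form stays accurate for large negative z
    return 0.5 * math.erfc(-z / math.sqrt(2.0))


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def fit_preference_gp(X, prefs, sigma=0.5, lengthscale=1.0, iters=100):
    """Laplace (Newton) MAP estimate of latent rewards f at the rows of X.

    prefs: list of (i, j) pairs meaning trajectory i was preferred over j.
    Probit likelihood: P(i preferred over j | f) = Phi((f_i - f_j) / (sqrt(2) * sigma)).
    """
    n = len(X)
    K = rbf_kernel(X, X, lengthscale) + 1e-6 * np.eye(n)  # jitter for stability
    s = math.sqrt(2.0) * sigma
    f = np.zeros(n)
    for _ in range(iters):
        g = np.zeros(n)       # gradient of the log-likelihood
        W = np.zeros((n, n))  # negative Hessian of the log-likelihood (PSD)
        for i, j in prefs:
            z = max((f[i] - f[j]) / s, -30.0)
            r = norm_pdf(z) / norm_cdf(z)  # d log Phi(z) / dz
            c = r * (z + r) / s**2         # curvature term, always > 0
            g[i] += r / s
            g[j] -= r / s
            W[i, i] += c
            W[j, j] += c
            W[i, j] -= c
            W[j, i] -= c
        # Newton step: f <- (K^-1 + W)^-1 (W f + g), written as K (I + W K)^-1 (W f + g)
        f_new = K @ np.linalg.solve(np.eye(n) + W @ K, W @ f + g)
        done = np.max(np.abs(f_new - f)) < 1e-9
        f = f_new
        if done:
            break
    return f, K, W


def diff_posterior(Xa, Xb, X, f_hat, K, W, lengthscale=1.0):
    """Approximate posterior mean and variance of f(a) - f(b) for each row pair."""
    n = len(X)
    B = np.linalg.solve(np.eye(n) + W @ K, W)  # equals (K + W^-1)^-1 when W is invertible
    D = rbf_kernel(X, Xa, lengthscale) - rbf_kernel(X, Xb, lengthscale)  # n x m
    mean = D.T @ np.linalg.solve(K, f_hat)
    k_ab = np.exp(-0.5 * ((Xa - Xb) ** 2).sum(axis=1) / lengthscale**2)
    var = (2.0 - 2.0 * k_ab) - np.sum(D * (B @ D), axis=0)
    return mean, np.maximum(var, 0.0)


def pref_prob(mean, var, sigma=0.5):
    """P(a preferred over b), with the probit likelihood integrated over the posterior."""
    return np.array([norm_cdf(m / math.sqrt(2.0 * sigma**2 + v)) for m, v in zip(mean, var)])


def most_uncertain_pair(pool, X, f_hat, K, W, lengthscale=1.0):
    """Hypothetical query rule: pick the pool pair whose reward difference is most uncertain."""
    pairs = list(itertools.combinations(range(len(pool)), 2))
    Xa = pool[[a for a, _ in pairs]]
    Xb = pool[[b for _, b in pairs]]
    _, var = diff_posterior(Xa, Xb, X, f_hat, K, W, lengthscale)
    return pairs[int(np.argmax(var))]


if __name__ == "__main__":
    # Synthetic 1-D example: the larger feature value was preferred in every comparison.
    X = np.array([[0.0], [1.0], [2.0], [3.0]])
    f_hat, K, W = fit_preference_gp(X, [(1, 0), (2, 1), (3, 2)])
    mean, var = diff_posterior(np.array([[3.0]]), np.array([[0.0]]), X, f_hat, K, W)
    print("P(x=3 preferred over x=0):", pref_prob(mean, var)[0])
    pool = np.array([[0.0], [3.0], [-10.0], [10.0]])
    print("next query:", most_uncertain_pair(pool, X, f_hat, K, W))
```

In this toy run the heuristic should select the pair (2, 3), the two pool points far from every compared trajectory, because the existing comparisons say nothing about their reward difference. Maximum variance is the simplest uncertainty-sampling rule. It ignores the noise term in the likelihood, so in practice an information-based criterion may choose more informative queries.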

Findings

1. Efficiently learns reward functions from a limited number of human preferences.
2. Outperforms existing methods in simulations and user studies.
3. Handles high-dimensional robotic tasks effectively.

Abstract

Designing reward functions is a challenging problem in AI and robotics. Humans usually have a difficult time directly specifying all the desirable behaviors that a robot needs to optimize. One common approach is to learn reward functions from collected expert demonstrations. However, learning reward functions from demonstrations introduces many challenges: some methods require highly structured models, e.g., reward functions that are linear in some predefined set of features, while others adopt less structured reward functions that, on the other hand, require a tremendous amount of data. In addition, humans tend to have a difficult time providing demonstrations on robots with high degrees of freedom, or even quantifying reward values for given demonstrations. To address these challenges, we present a preference-based learning approach, where as an alternative, the human feedback is only in…

Code & Models

Repositories

Stanford-ILIAD/active-preference-based-gpr (official)

Taxonomy

Topics: Gaussian Processes and Bayesian Inference · Advanced Control Systems Optimization · Advanced Multi-Objective Optimization Algorithms

Methods: Gaussian Process