Learning Reward Functions from Diverse Sources of Human Feedback: Optimally Integrating Demonstrations and Preferences
Erdem Bıyık, Dylan P. Losey, Malayandi Palan, Nicholas C. Landolfi, Gleb Shevchuk, Dorsa Sadigh

TL;DR
This paper introduces a framework for integrating multiple human feedback sources, such as demonstrations and preferences, to learn reward functions more effectively for robotic tasks, combining passive and active data collection.
Contribution
The paper presents an algorithm that combines passive demonstrations with active preference queries to improve reward learning and adaptively determines when to use each data type.
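To make "combining passive demonstrations with preference queries" concrete, here is a minimal Python sketch built on stated assumptions. It is not the authors' implementation. It assumes a linear reward r(ξ) = w·Φ(ξ) over trajectory features Φ, a Boltzmann-rational demonstrator whose likelihood is normalized over a small, finite set of alternative trajectories, a logistic (Bradley-Terry) preference model, and a sample-based posterior over unit-norm w. The function names, the rationality coefficient `beta`, and the feature vectors are all hypothetical.

```python
import math
import random

# Minimal sketch (assumptions, not the paper's code): sample-based Bayesian
# reward learning that combines demonstrations and pairwise preferences.
# Assumes a linear reward r(xi) = w . Phi(xi); all names are hypothetical.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sample_unit_weights(n, dim, rng):
    """Prior: n reward-weight vectors drawn uniformly from the unit sphere."""
    samples = []
    for _ in range(n):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(dot(v, v))
        samples.append([x / norm for x in v])
    return samples

def log_sum_exp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def demo_log_likelihood(w, demo_phi, alternative_phis, beta):
    """Boltzmann-rational demonstrator: log P(demo | w), normalized over the
    demonstration plus a finite set of alternative trajectories."""
    scores = [beta * dot(w, phi) for phi in [demo_phi] + alternative_phis]
    return scores[0] - log_sum_exp(scores)

def pref_log_likelihood(w, phi_a, phi_b, beta):
    """Logistic (Bradley-Terry) model: log P(A preferred over B | w)."""
    d = beta * (dot(w, phi_a) - dot(w, phi_b))
    # Numerically stable log(sigmoid(d)).
    return -math.log1p(math.exp(-d)) if d >= 0 else d - math.log1p(math.exp(d))

def posterior_mean(samples, demos, prefs, alternative_phis, beta=5.0):
    """Weight each prior sample by the demo and preference likelihoods,
    then return the weighted mean reward-weight vector."""
    log_ws = []
    for w in samples:
        lw = sum(demo_log_likelihood(w, d, alternative_phis, beta) for d in demos)
        lw += sum(pref_log_likelihood(w, a, b, beta) for a, b in prefs)
        log_ws.append(lw)
    m = max(log_ws)
    weights = [math.exp(lw - m) for lw in log_ws]
    total = sum(weights)
    return [sum(wt * s[k] for wt, s in zip(weights, samples)) / total
            for k in range(len(samples[0]))]

# Usage with made-up 2-D trajectory features.
rng = random.Random(0)
samples = sample_unit_weights(2000, 2, rng)
alternatives = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
demos = [[0.9, 0.1]]                # one demonstration (passive data)
prefs = [([1.0, 0.0], [0.0, 1.0])]  # user preferred A over B (active data)
estimate = posterior_mean(samples, demos, prefs, alternatives)
print(estimate)
```

Both data types enter the same posterior simply as additional log-likelihood terms. The demonstration shifts the belief before any question is asked, and each answered preference query sharpens it further. Importance weighting of prior samples is used here only because it keeps the sketch short; a real system would more likely use MCMC or another sampler.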
Findings
The integrated approach outperforms single-source methods in simulations.
User studies show the framework is user-friendly and effective.
The method achieves more accurate reward functions with fewer queries.
Abstract
Reward functions are a common way to specify the objective of a robot. As designing reward functions can be extremely challenging, a more promising approach is to directly learn reward functions from human teachers. Importantly, data from human teachers can be collected either passively or actively in a variety of forms: passive data sources include demonstrations (e.g., kinesthetic guidance), whereas preferences (e.g., comparative rankings) are actively elicited. Prior research has independently applied reward learning to these different data sources. However, there exist many domains where multiple sources are complementary and expressive. Motivated by this general problem, we present a framework to integrate multiple sources of information, which are either passively or actively collected from human users. In particular, we present an algorithm that first utilizes user…
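Actively eliciting preferences means the robot chooses which comparison to ask. The sketch below assumes one common criterion: pick the pair of candidate trajectories whose answer has the highest expected information gain about w, estimated from weighted posterior samples (for example, importance-weighted prior samples). This is an illustrative assumption; the paper's exact acquisition criterion, including how it accounts for how easy a query is for the user to answer, may differ. All names, the `beta` value, and the features are hypothetical, and the reward is again assumed linear in trajectory features.

```python
import itertools
import math

# Minimal sketch (assumption, not the paper's code): choose the pairwise
# preference query that maximizes mutual information between the user's
# answer and the reward weights w, using weighted posterior samples.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def sigmoid(x):
    # Numerically stable logistic function.
    return 1.0 / (1.0 + math.exp(-x)) if x >= 0 else math.exp(x) / (1.0 + math.exp(x))

def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli(p) answer."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))

def information_gain(phi_a, phi_b, samples, weights, beta):
    """I(answer; w) = H(mean_w q) - mean_w H(q), where q = P(A preferred | w)."""
    total = sum(weights)
    qs = [sigmoid(beta * (dot(w, phi_a) - dot(w, phi_b))) for w in samples]
    q_bar = sum(wt * q for wt, q in zip(weights, qs)) / total
    expected_h = sum(wt * binary_entropy(q) for wt, q in zip(weights, qs)) / total
    return binary_entropy(q_bar) - expected_h

def select_query(candidate_phis, samples, weights, beta=5.0):
    """Return the index pair of candidate trajectories with the highest gain."""
    best_pair, best_gain = None, -1.0
    for i, j in itertools.combinations(range(len(candidate_phis)), 2):
        gain = information_gain(candidate_phis[i], candidate_phis[j],
                                samples, weights, beta)
        if gain > best_gain:
            best_pair, best_gain = (i, j), gain
    return best_pair, best_gain

# Usage: two equally likely reward hypotheses and three candidate trajectories.
samples = [[1.0, 0.0], [0.0, 1.0]]
weights = [1.0, 1.0]
candidates = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
pair, gain = select_query(candidates, samples, weights)
print(pair, round(gain, 3))
```

Here the query comparing trajectories 0 and 1 wins: the two hypotheses predict nearly opposite, confident answers to it, so its answer is the most informative. Comparisons involving the "middle" trajectory produce less confident predictions and therefore a smaller gain.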
