I Know What You Meant: Learning Human Objectives by (Under)estimating Their Choice Set
Ananth Jonnavittula, Dylan P. Losey

TL;DR
This paper introduces a risk-averse approach for assistive robots to learn human objectives: the robot interprets each user demonstration relative to similar or simpler alternatives, which helps it infer what the user meant when they cannot fully demonstrate the behavior they want.
Contribution
The paper proposes a reward-learning method that deliberately underestimates the human's choice set (the demonstrations the user could have provided), formalizes the properties that generated alternatives should satisfy, and reports improved learning in simulations and a user study.
Findings
Better extrapolation of human objectives in simulations.
Improved performance in a user study with real participants.
Theoretical proof of a risk-averse advantage from underestimating the human's choice set.
Abstract
Assistive robots have the potential to help people perform everyday tasks. However, these robots first need to learn what it is their user wants them to do. Teaching assistive robots is hard for inexperienced users, elderly users, and users living with physical disabilities, since often these individuals are unable to show the robot their desired behavior. We know that inclusive learners should give human teachers credit for what they cannot demonstrate. But today's robots do the opposite: they assume every user is capable of providing any demonstration. As a result, these robots learn to mimic the demonstrated behavior, even when that behavior is not what the human really meant! Here we propose a different approach to reward learning: robots that reason about the user's demonstrations in the context of similar or simpler alternatives. Unlike prior works -- which err towards…
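The truncated abstract above does not state the learning rule, so the following is only a minimal sketch of the general idea under common reward-learning assumptions, not the authors' implementation. It assumes a reward that is linear in trajectory features, a Boltzmann-rational human who picks a demonstration from some choice set, and a uniform prior over candidate reward weights. The function name `reward_posterior`, the parameter `beta`, and the toy one-dimensional feature are all hypothetical.

```python
import numpy as np

def reward_posterior(demo_features, choice_set_features, thetas, beta=1.0):
    """Posterior over candidate reward weights given one demonstration.

    Assumptions (not taken from the source text): reward is theta . phi,
    the human is Boltzmann-rational over the given choice set, and the
    prior over the candidate thetas is uniform.
    """
    thetas = np.asarray(thetas, dtype=float)                 # (n_thetas, d)
    demo = np.asarray(demo_features, dtype=float)            # (d,)
    choices = np.asarray(choice_set_features, dtype=float)   # (n_choices, d)

    demo_scores = beta * (thetas @ demo)                     # (n_thetas,)
    choice_scores = beta * (thetas @ choices.T)              # (n_thetas, n_choices)

    # Numerically stable log of the normalizer over the choice set.
    m = choice_scores.max(axis=1, keepdims=True)
    log_norm = m[:, 0] + np.log(np.exp(choice_scores - m).sum(axis=1))

    log_lik = demo_scores - log_norm
    post = np.exp(log_lik - log_lik.max())
    return post / post.sum()

# Toy setup: one feature (how far the robot moves). The user only manages 0.5.
demo = [0.5]
thetas = [[1.0], [-1.0]]            # "wants more distance" vs. "wants less"

# Assuming the user could have shown anything, including the full motion 1.0:
full_set = [[0.0], [0.5], [1.0]]
post_full = reward_posterior(demo, full_set, thetas, beta=5.0)

# Underestimated choice set: only the demonstration and a simpler alternative.
small_set = [[0.0], [0.5]]
post_small = reward_posterior(demo, small_set, thetas, beta=5.0)

print("full choice set:          ", post_full)
print("underestimated choice set:", post_small)
```

In this toy case the full choice set leaves the two hypotheses equally likely, because the demonstration sits in the middle of what the user supposedly could have done. With the smaller set, the demonstration is the best option available under "wants more distance", so the posterior shifts toward that hypothesis.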
