Learning Preferences for Interactive Autonomy
Erdem Bıyık

TL;DR
This thesis studies how robots can learn reward functions from human comparative feedback instead of demonstrations. To address suboptimal human input, it proposes several forms of feedback and active learning strategies, evaluated across diverse domains.
Contribution
It introduces new forms of comparative feedback and active learning methods to improve reward function inference from human input, especially when demonstrations are suboptimal.
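Inferring a reward function from pairwise comparisons can be illustrated with a minimal sketch. This is not the thesis's own implementation. It assumes a reward that is linear in hand-designed trajectory features, r(ξ) = w · φ(ξ), and a logistic (Bradley-Terry) model of the human's noisy choices, P(A ≻ B | w) = σ(w · (φ(A) - φ(B))). The weights are fit by regularized maximum likelihood. The function names, the synthetic "human", and all numbers are hypothetical.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def pref_prob(w, phi_a, phi_b):
    """Assumed logistic (Bradley-Terry) model: P(A preferred over B | w)
    for linear rewards w . phi_a and w . phi_b (numerically stable sigmoid)."""
    diff = dot(w, phi_a) - dot(w, phi_b)
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    z = math.exp(diff)
    return z / (1.0 + z)


def fit_reward(comparisons, dim, lr=0.5, steps=300, l2=0.01):
    """Regularized maximum-likelihood estimate of w by batch gradient ascent.

    comparisons: list of (phi_preferred, phi_other) feature-vector pairs.
    """
    w = [0.0] * dim
    n = len(comparisons)
    for _ in range(steps):
        grad = [-l2 * wi for wi in w]  # gradient of the penalty -l2/2 * ||w||^2
        for phi_win, phi_lose in comparisons:
            p = pref_prob(w, phi_win, phi_lose)
            # d/dw log sigmoid(w . (phi_win - phi_lose)) = (1 - p) * (phi_win - phi_lose)
            for i in range(dim):
                grad[i] += (1.0 - p) * (phi_win[i] - phi_lose[i])
        w = [wi + lr * gi / n for wi, gi in zip(w, grad)]
    return w


def simulate_answer(w_true, phi_a, phi_b, rng):
    """Hypothetical noisy 'human' answering with the same logistic model;
    returns the pair ordered as (preferred, other)."""
    if rng.random() < pref_prob(w_true, phi_a, phi_b):
        return phi_a, phi_b
    return phi_b, phi_a


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return dot(a, b) / math.sqrt(dot(a, a) * dot(b, b))


if __name__ == "__main__":
    rng = random.Random(0)
    w_true = [1.0, -0.5, 0.25]  # hypothetical ground-truth reward weights
    data = []
    for _ in range(300):
        phi_a = [rng.uniform(-1, 1) for _ in range(3)]
        phi_b = [rng.uniform(-1, 1) for _ in range(3)]
        data.append(simulate_answer(w_true, phi_a, phi_b, rng))
    w_hat = fit_reward(data, dim=3)
    # Compare the estimated and true reward directions.
    print("cosine(w_hat, w_true) =", round(cosine(w_hat, w_true), 3))
```

Here the synthetic human answers with exactly the model being fit. The demo therefore only checks that the estimator recovers the direction of `w_true`. Real human comparisons will not match the assumed model this closely.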
Findings
Effective reward learning from comparative feedback demonstrated in multiple domains
Active learning improves the efficiency of human feedback collection
The proposed methods outperform traditional demonstration-based inverse reinforcement learning
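Active learning makes comparison queries more efficient, and one widely used way to do this is sketched below. It is written under stated assumptions and is not presented as the thesis's exact procedure. The sketch keeps weighted samples of w as an approximate posterior. It asks the candidate pair whose answer has the largest mutual information with w, then reweights the samples by Bayes' rule once the answer arrives. The linear-reward logistic model, the standard-normal prior, the random candidate pairs, and every function name are illustrative assumptions.

```python
import math
import random


def dot(a, b):
    """Inner product of two equal-length feature vectors."""
    return sum(x * y for x, y in zip(a, b))


def pref_prob(w, phi_a, phi_b):
    """Assumed logistic (Bradley-Terry) probability that A is preferred over B
    under linear reward w . phi (numerically stable sigmoid)."""
    diff = dot(w, phi_a) - dot(w, phi_b)
    if diff >= 0:
        return 1.0 / (1.0 + math.exp(-diff))
    z = math.exp(diff)
    return z / (1.0 + z)


def binary_entropy(p):
    """Entropy (in nats) of a Bernoulli(p) answer."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def information_gain(query, samples, weights):
    """Mutual information I(answer; w) for a pairwise query, estimated from
    weighted posterior samples of w as H(E[p]) - E[H(p)]."""
    phi_a, phi_b = query
    probs = [pref_prob(w, phi_a, phi_b) for w in samples]
    mean_p = sum(wt * p for wt, p in zip(weights, probs))
    expected_h = sum(wt * binary_entropy(p) for wt, p in zip(weights, probs))
    return binary_entropy(mean_p) - expected_h


def select_query(candidates, samples, weights):
    """Greedily pick the candidate pair with the highest estimated information gain."""
    return max(candidates, key=lambda q: information_gain(q, samples, weights))


def update_weights(samples, weights, preferred, other):
    """Bayes update of sample weights after observing 'preferred' over 'other'."""
    new = [wt * pref_prob(w, preferred, other) for w, wt in zip(samples, weights)]
    total = sum(new)
    return [x / total for x in new]


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 3
    # Assumed prior: i.i.d. standard-normal samples of w, equally weighted.
    samples = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(500)]
    weights = [1.0 / len(samples)] * len(samples)
    w_true = [1.0, -0.5, 0.25]  # hypothetical simulated human
    for _ in range(10):
        candidates = [
            ([rng.uniform(-1, 1) for _ in range(dim)],
             [rng.uniform(-1, 1) for _ in range(dim)])
            for _ in range(50)
        ]
        a, b = select_query(candidates, samples, weights)
        if rng.random() < pref_prob(w_true, a, b):
            weights = update_weights(samples, weights, a, b)
        else:
            weights = update_weights(samples, weights, b, a)
    w_mean = [sum(wt * w[i] for w, wt in zip(samples, weights)) for i in range(dim)]
    print("posterior mean of w:", [round(x, 2) for x in w_mean])
```

The score H(E[p]) - E[H(p)] favors pairs on which the current samples disagree while each sample is individually confident. Such a pair is likely to move the posterior. A pair of identical trajectories scores zero, because every sample predicts a 50/50 answer.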
Abstract
When robots enter everyday human environments, they need to understand their tasks and how they should perform those tasks. To encode these, robots use reward functions, which specify the robot's objective. However, designing reward functions can be extremely challenging for complex tasks and environments. A promising approach is to learn reward functions from humans. Recently, several robot learning works have embraced this approach and leveraged human demonstrations to learn the reward functions. Known as inverse reinforcement learning, this approach relies on a fundamental assumption: humans can provide near-optimal demonstrations to the robot. Unfortunately, this is rarely the case. Human demonstrations to the robot are often suboptimal for various reasons, e.g., the difficulty of teleoperation, the robot having many degrees of freedom, or humans' cognitive limitations. This thesis is…
Taxonomy
Topics: Reinforcement Learning in Robotics
