Learning Preferences for Manipulation Tasks from Online Coactive Feedback
Ashesh Jain, Shikhar Sharma, Thorsten Joachims, Ashutosh Saxena

TL;DR
This paper introduces a coactive online learning framework for teaching mobile manipulators preferences over trajectories in complex, context-rich environments, using incremental feedback instead of optimal demonstrations.
Contribution
It proposes a novel coactive feedback method for preference learning in manipulation tasks, backed by theoretical regret bounds and an implementation on real robots.
Findings
Users can train a robot in a few minutes with only a small amount of feedback.
The algorithm's regret bounds match the asymptotic rates of methods that learn from optimal trajectories.
The approach is validated in experiments on household and grocery-checkout scenarios.
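The regret finding is easier to read with the quantity written out. Below is the standard average-regret definition from the coactive learning literature, in generic notation assumed here rather than copied from the paper: $s^*$ is the user's hidden scoring function, $x_t$ the context at round $t$, $y_t$ the trajectory the robot proposes, and $y_t^*$ the trajectory the user would consider optimal.

```latex
\mathrm{REG}_T = \frac{1}{T} \sum_{t=1}^{T} \left[ s^*(x_t, y_t^*) - s^*(x_t, y_t) \right]
```

In generic coactive-learning analyses with sufficiently informative feedback, this average regret shrinks on the order of $1/\sqrt{T}$. "Matching optimal trajectory methods" means reaching the same asymptotic rate even though the user never supplies $y_t^*$ itself.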
Abstract
We consider the problem of learning preferences over trajectories for mobile manipulators such as personal robots and assembly line robots. The preferences we learn are more intricate than simple geometric constraints on trajectories; they are rather governed by the surrounding context of various objects and human interactions in the environment. We propose a coactive online learning framework for teaching preferences in contextually rich environments. The key novelty of our approach lies in the type of feedback expected from the user: the human user does not need to demonstrate optimal trajectories as training data, but merely needs to iteratively provide trajectories that slightly improve over the trajectory currently proposed by the system. We argue that this coactive preference feedback can be more easily elicited than demonstrations of optimal trajectories. Nevertheless,…
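The feedback loop described in the abstract can be illustrated with a preference-perceptron update of the kind used in coactive learning. This is a minimal sketch under stated assumptions, not the authors' implementation. Each candidate trajectory is summarised by a hypothetical feature vector, and the robot scores trajectories linearly. A simulated user with hidden weights `w_star` returns a trajectory that is only slightly better than the proposal. The robot then moves its weights toward the improved trajectory's features. The names `propose`, `simulated_user_feedback`, `coactive_update`, and `train` are all illustrative.

```python
import numpy as np

# Assumption: each candidate trajectory y in context x is summarised by a
# feature vector phi(x, y), and the robot scores trajectories as w . phi.

def propose(w, candidate_feats):
    """Index of the candidate trajectory the robot currently scores highest."""
    return int(np.argmax(candidate_feats @ w))

def simulated_user_feedback(w_star, candidate_feats, proposed_idx):
    """Hypothetical user: return a trajectory only *slightly* better than the
    proposal under hidden utility w_star, or the proposal if none is better."""
    true_scores = candidate_feats @ w_star
    gains = true_scores - true_scores[proposed_idx]
    better = np.flatnonzero(gains > 0)
    if better.size == 0:
        return proposed_idx
    return int(better[np.argmin(gains[better])])  # smallest strict improvement

def coactive_update(w, phi_proposed, phi_improved):
    """Preference-perceptron step: shift w toward the improved trajectory."""
    return w + (phi_improved - phi_proposed)

def train(w_star, rounds=200, n_candidates=20, dim=3, seed=0):
    rng = np.random.default_rng(seed)
    w = np.zeros(dim)
    regrets = []
    for _ in range(rounds):
        feats = rng.normal(size=(n_candidates, dim))  # a fresh context each round
        y = propose(w, feats)
        y_bar = simulated_user_feedback(w_star, feats, y)
        true_scores = feats @ w_star
        regrets.append(true_scores.max() - true_scores[y])  # per-round regret
        if y_bar != y:
            w = coactive_update(w, feats[y], feats[y_bar])
    return w, np.array(regrets)

if __name__ == "__main__":
    w_star = np.array([1.0, -2.0, 0.5])
    w, regrets = train(w_star)
    cos = w @ w_star / (np.linalg.norm(w) * np.linalg.norm(w_star))
    print(f"cosine(w, w_star) = {cos:.3f}")
    print(f"mean regret, first 50 rounds: {regrets[:50].mean():.3f}")
    print(f"mean regret, last 50 rounds:  {regrets[-50:].mean():.3f}")
```

The learner never sees the optimal trajectory, only the user's marginal improvement. Every update therefore points in a direction the hidden utility prefers, which is the intuition behind learning from coactive feedback rather than from optimal demonstrations.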
Taxonomy
Topics: Machine Learning and Algorithms · Reinforcement Learning in Robotics · Optimization and Search Problems
