Learning Trajectory Preferences for Manipulators via Iterative Improvement
Ashesh Jain, Brian Wojcik, Thorsten Joachims, Ashutosh Saxena

TL;DR
This paper introduces a co-active online learning framework for teaching robots personalized manipulation trajectories through iterative user feedback, avoiding the need for optimal demonstrations.
Contribution
It presents a novel co-active preference learning approach with theoretical regret bounds, applicable to diverse manipulation tasks and environments.
Findings
The algorithm effectively learns user preferences for manipulation trajectories.
It achieves regret bounds comparable to optimal trajectory algorithms.
The approach is demonstrated on grocery checkout tasks with varying environmental factors.
Abstract
We consider the problem of learning good trajectories for manipulation tasks. This is challenging because the criterion defining a good trajectory varies with users, tasks and environments. In this paper, we propose a co-active online learning framework for teaching robots the preferences of their users for object manipulation tasks. The key novelty of our approach lies in the type of feedback expected from the user: the human user does not need to demonstrate optimal trajectories as training data, but merely needs to iteratively provide trajectories that slightly improve over the trajectory currently proposed by the system. We argue that this co-active preference feedback can be more easily elicited from the user than demonstrations of optimal trajectories, which are often challenging and non-intuitive to provide on high degree-of-freedom manipulators. Nevertheless, theoretical regret…
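The feedback loop described in the abstract can be illustrated with a small sketch. It uses a perceptron-style update that is common in the co-active learning literature; whether the paper uses exactly this update is an assumption, since the abstract above does not say. The feature vectors, the hidden utility `true_w`, the simulated user, and the name `coactive_perceptron` are all hypothetical and chosen only for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def coactive_perceptron(candidates, true_w, rounds):
    """Toy co-active loop over a fixed set of trajectory feature vectors.

    candidates: list of feature vectors, one per candidate trajectory
                (hypothetical stand-ins for real trajectory features).
    true_w:     the user's hidden utility weights; used only to simulate
                feedback and to measure regret, never read by the learner.
    """
    w = [0.0] * len(true_w)  # learner's current estimate of the preferences
    regrets = []
    for _ in range(rounds):
        # The system proposes the trajectory it currently scores highest.
        proposed = max(candidates, key=lambda f: dot(w, f))
        best = max(candidates, key=lambda f: dot(true_w, f))

        # Simulated user: returns the *smallest* strict improvement over the
        # proposal, not the optimal trajectory. If none exists, the proposal
        # is accepted unchanged.
        better = [f for f in candidates
                  if dot(true_w, f) > dot(true_w, proposed)]
        feedback = (min(better, key=lambda f: dot(true_w, f))
                    if better else proposed)

        # Perceptron-style update: move toward the improved trajectory's
        # features and away from the proposed trajectory's features.
        w = [wi + fi - pi for wi, fi, pi in zip(w, feedback, proposed)]

        # Regret: utility gap between the best candidate and the proposal.
        regrets.append(dot(true_w, best) - dot(true_w, proposed))
    return w, regrets
```

In this toy setting the learner never sees an optimal demonstration, only a slightly better alternative each round, and the per-round regret records how far each proposal was from the user's best option.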
Taxonomy
Topics: Machine Learning and Algorithms · Reinforcement Learning in Robotics · Robotic Path Planning Algorithms
