Learning Reward Functions for Robotic Manipulation by Observing Humans
Minttu Alakuijala, Gabriel Dulac-Arnold, Julien Mairal, Jean Ponce, and Cordelia Schmid

TL;DR
This paper introduces a method to learn general reward functions for robotic manipulation by observing unlabeled human videos, enabling robots to better explore and learn tasks without task-specific demonstrations.
Contribution
The work presents a novel approach to derive task-agnostic reward functions from human videos, generalizing across robot embodiments and environments without requiring task-specific data.
Findings
The learned reward function generalizes to unseen robot and environment configurations.
The method accelerates reinforcement learning for manipulation tasks in simulation.
It does not require task-specific human demonstrations or predefined correspondences.
Abstract
Observing a human demonstrator manipulate objects provides a rich, scalable and inexpensive source of data for learning robotic policies. However, transferring skills from human videos to a robotic manipulator poses several challenges, not least a difference in action and observation spaces. In this work, we use unlabeled videos of humans solving a wide range of manipulation tasks to learn a task-agnostic reward function for robotic manipulation policies. Thanks to the diversity of this training data, the learned reward function sufficiently generalizes to image observations from a previously unseen robot embodiment and environment to provide a meaningful prior for directed exploration in reinforcement learning. We propose two methods for scoring states relative to a goal image: through direct temporal regression, and through distances in an embedding space obtained with…
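The two scoring ideas named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The encoder here is a fixed random linear projection standing in for a learned embedding, and all function names (`embed`, `embedding_reward`, `temporal_targets`) are hypothetical. The sketch also assumes that the last frame of a video serves as the goal image and that the regression target is the number of frames remaining until it.

```python
import numpy as np


def embed(image, projection):
    """Hypothetical stand-in encoder: flatten the image and project it linearly."""
    return projection @ image.reshape(-1).astype(np.float64)


def embedding_reward(state_image, goal_image, projection):
    """Score a state as the negative Euclidean distance to the goal embedding."""
    diff = embed(state_image, projection) - embed(goal_image, projection)
    return -float(np.linalg.norm(diff))


def temporal_targets(num_frames):
    """Regression targets: frames remaining until the final (assumed goal) frame."""
    return np.arange(num_frames - 1, -1, -1)


# Toy data: tiny 4x4 RGB "images" and an 8-dimensional embedding (assumed sizes).
rng = np.random.default_rng(0)
projection = rng.normal(size=(8, 4 * 4 * 3))
goal = rng.random((4, 4, 3))
far = rng.random((4, 4, 3))
near = goal + 0.01 * (far - goal)  # a state much closer to the goal image

r_goal = embedding_reward(goal, goal, projection)
r_near = embedding_reward(near, goal, projection)
r_far = embedding_reward(far, goal, projection)
targets = temporal_targets(4)
```

In this toy setup the reward is highest when the observation matches the goal image and decreases as the observation moves away from it, which is the kind of shaped signal that could guide exploration in reinforcement learning.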
Taxonomy
Topics: Reinforcement Learning in Robotics · Robot Manipulation and Learning · Domain Adaptation and Few-Shot Learning
