Human Preference Modeling Using Visual Motion Prediction Improves Robot Skill Learning from Egocentric Human Video
Mrinal Verghese, Christopher G. Atkeson

TL;DR
This paper introduces a novel method for robot skill learning from egocentric human videos by modeling human preferences through visual motion prediction, leading to improved performance in real robot tasks.
Contribution
It proposes a reward function based on the agreement between predicted and observed object motion, so the robot can learn from human videos without transferring a learned value function across the embodiment and environment gap.
Findings
Learned policies outperform prior methods on multiple tasks.
The approach works effectively on real robots and in simulation.
Learning is initialized with only 10 on-robot demonstrations.
Abstract
We present an approach to robot learning from egocentric human videos by modeling human preferences in a reward function and optimizing robot behavior to maximize this reward. Prior work on reward learning from human videos attempts to measure the long-term value of a visual state as the temporal distance between it and the terminal state in a demonstration video. These approaches make assumptions that limit performance when learning from video. They must also transfer the learned value function across the embodiment and environment gap. Our method models human preferences by learning to predict the motion of tracked points between subsequent images and defines a reward function as the agreement between predicted and observed object motion in a robot's behavior at each step. We then use a modified Soft Actor Critic (SAC) algorithm initialized with 10 on-robot demonstrations to estimate…
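The reward described in the abstract, agreement between predicted and observed object motion at each step, can be illustrated with a small sketch. The abstract does not state how agreement is measured, so the version below assumes mean cosine similarity between per-point 2D displacements. The function name `motion_agreement_reward`, the `(dx, dy)` displacement format, and the `eps` parameter are hypothetical choices for illustration, not the authors' implementation.

```python
import math


def motion_agreement_reward(predicted, observed, eps=1e-8):
    """Mean cosine similarity between predicted and observed point motion.

    ASSUMPTION: cosine similarity is an illustrative stand-in for the
    paper's agreement measure, which the abstract does not specify.

    predicted, observed: equal-length sequences of (dx, dy) displacements,
    one entry per tracked point between two subsequent images.
    """
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not predicted:
        return 0.0  # no tracked points, so no evidence either way

    total = 0.0
    for (px, py), (ox, oy) in zip(predicted, observed):
        dot = px * ox + py * oy
        norm = math.hypot(px, py) * math.hypot(ox, oy)
        # eps avoids division by zero when a point does not move
        total += dot / (norm + eps)
    return total / len(predicted)


# Hypothetical displacements for two tracked points
predicted_motion = [(1.0, 0.0), (0.0, 2.0)]
same_direction = motion_agreement_reward(predicted_motion, predicted_motion)
opposite_direction = motion_agreement_reward(
    predicted_motion, [(-1.0, 0.0), (0.0, -2.0)]
)
orthogonal = motion_agreement_reward([(1.0, 0.0)], [(0.0, 1.0)])
```

Under this assumption the reward is close to 1 when the robot moves objects the way the model predicts a human would, close to -1 when the motion is reversed, and 0 when it is orthogonal. A per-step signal of this kind is what a reinforcement learning algorithm such as SAC could then maximize.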
Taxonomy
Topics: Human Pose and Action Recognition · Social Robot Interaction and HRI · Robot Manipulation and Learning
