WARPED: Wrist-Aligned Rendering for Robot Policy Learning from Egocentric Human Demonstrations
Harry Freeman, Chung Hee Kim, George Kantor

TL;DR
WARPED is a framework that synthesizes wrist-view observations from egocentric human demonstration videos, so that visuomotor policies can be trained from monocular RGB data alone, with 5-8x less data collection time than traditional methods.
Contribution
The paper introduces WARPED, a method for generating realistic wrist-view observations from monocular egocentric videos of human demonstrations, enabling robot policy learning without multiview camera setups, depth sensors, or custom hardware.
Findings
WARPED achieves success rates comparable to teleoperated demonstrations.
Requires 5-8x less data collection time than traditional methods.
Effective for five tabletop manipulation tasks.
Abstract
Recent advancements in learning from human demonstration have shown promising results in addressing the scalability and high cost of data collection required to train robust visuomotor policies. However, existing approaches are often constrained by a reliance on multiview camera setups, depth sensors, or custom hardware and are typically limited to policy execution from third-person or egocentric cameras. In this paper, we present WARPED, a framework designed to synthesize realistic wrist-view observations from human demonstration videos to facilitate the training of visuomotor policies using only monocular RGB data. With data collected from an egocentric RGB camera, our system leverages vision foundation models to initialize the interactive scene. A hand-object interaction pipeline is then employed to track the hand and manipulated object and retarget the trajectories to a robotic…
