Learning to Imitate Object Interactions from Internet Videos
Austin Patel, Andrew Wang, Ilija Radosavovic, Jitendra Malik

TL;DR
This paper introduces a new method for reconstructing 4D hand-object interactions from videos and demonstrates how to imitate these interactions in a physics simulator, enabling applications in robotics and animation.
Contribution
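The paper presents RHOV (Reconstructing Hands and Objects from Videos), a technique that reconstructs 4D trajectories of both the hand and the object from video using 2D image cues and temporal smoothness constraints, together with a system that uses reinforcement learning to imitate the reconstructed object interactions in a physics simulator.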
Findings
Successfully reconstructed 4D trajectories from 100 challenging videos
Imitated diverse object interactions in a physics simulator
Applicable to different embodiments, including robotic arms
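One way to see why an object-centric formulation can transfer across embodiments is that the imitation signal can be written purely in terms of the object's state. The snippet below is a minimal sketch under that assumption, not the reward used in the paper (which this summary does not specify); the function name `object_tracking_reward` and the `scale` parameter are hypothetical.

```python
import numpy as np

def object_tracking_reward(obj_pos, ref_pos, scale=10.0):
    """Reward in (0, 1] that depends only on how closely the simulated object
    follows the reference (reconstructed) object position.

    Hypothetical illustration: nothing here refers to the hand or robot,
    so the same reward could be used with any end-effector.
    """
    err = np.linalg.norm(np.asarray(obj_pos, dtype=float) - np.asarray(ref_pos, dtype=float))
    return float(np.exp(-scale * err ** 2))

# Perfect tracking gives the maximum reward; larger errors give smaller rewards.
r_exact = object_tracking_reward([0.1, 0.2, 0.3], [0.1, 0.2, 0.3])
r_near = object_tracking_reward([0.1, 0.2, 0.35], [0.1, 0.2, 0.3])
r_far = object_tracking_reward([0.1, 0.2, 0.8], [0.1, 0.2, 0.3])
```

Because the reward never inspects the agent's own pose, swapping a human-like hand for a robotic gripper would change the policy's action space but not the objective.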
Abstract
We study the problem of imitating object interactions from Internet videos. This requires understanding the hand-object interactions in 4D, spatially in 3D and over time, which is challenging due to mutual hand-object occlusions. In this paper we make two main contributions: (1) a novel reconstruction technique RHOV (Reconstructing Hands and Objects from Videos), which reconstructs 4D trajectories of both the hand and the object using 2D image cues and temporal smoothness constraints; (2) a system for imitating object interactions in a physics simulator with reinforcement learning. We apply our reconstruction technique to 100 challenging Internet videos. We further show that we can successfully imitate a range of different object interactions in a physics simulator. Our object-centric approach is not limited to human-like end-effectors and can learn to imitate object interactions using…
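The combination of 2D image cues and temporal smoothness constraints described in the abstract can be illustrated with a toy objective. This is a minimal sketch and not the RHOV formulation: it assumes a single 3D point tracked over time, a pinhole camera with a hypothetical focal length `focal`, and a hypothetical weighting `smooth_weight` between the two terms.

```python
import numpy as np

def project(points_3d, focal=1.0):
    """Pinhole projection of (T, 3) camera-frame points to (T, 2) image coordinates."""
    return focal * points_3d[:, :2] / points_3d[:, 2:3]

def trajectory_loss(points_3d, keypoints_2d, smooth_weight=1.0, focal=1.0):
    """Toy 4D objective: 2D reprojection error plus a temporal smoothness penalty.

    points_3d:    (T, 3) candidate 3D trajectory (assumed to lie in front of the camera)
    keypoints_2d: (T, 2) observed 2D image locations, one per frame
    """
    # Data term: the projected trajectory should match the 2D observations.
    reproj = np.sum((project(points_3d, focal) - keypoints_2d) ** 2)
    # Smoothness term: penalize frame-to-frame acceleration (second differences).
    accel = points_3d[2:] - 2.0 * points_3d[1:-1] + points_3d[:-2]
    smooth = np.sum(accel ** 2)
    return reproj + smooth_weight * smooth

# A constant-velocity trajectory observed without noise incurs (near) zero loss.
smooth_traj = np.stack([np.linspace(0.0, 1.0, 5), np.zeros(5), np.full(5, 2.0)], axis=1)
observed = project(smooth_traj)
loss_smooth = trajectory_loss(smooth_traj, observed)

# Perturbing one frame raises both the reprojection and the smoothness terms.
jittered_traj = smooth_traj.copy()
jittered_traj[2, 1] += 0.1
loss_jittered = trajectory_loss(jittered_traj, observed)
```

In a sketch like this, the smoothness term is what lets frames with weak or occluded 2D evidence borrow information from their temporal neighbors.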
Taxonomy
Topics: Human Pose and Action Recognition · Human Motion and Animation · Robot Manipulation and Learning
