AnyLift: Scaling Motion Reconstruction from Internet Videos via 2D Diffusion
Hongjie Li, Heng Yu, Jiaman Li, Hong-Xing Yu, Ehsan Adeli, C. Karen Liu, Jiajun Wu

TL;DR
AnyLift introduces a two-stage diffusion-based framework that synthesizes multi-view 2D motion data from Internet videos to reconstruct globally consistent 3D human motion and interactions, even for rare motions.
Contribution
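AnyLift uses 2D diffusion and synthetic multi-view 2D data to improve 3D human motion and human-object interaction (HOI) reconstruction from Internet videos. It targets two limitations of existing methods: recovering globally consistent motion under dynamic cameras, and handling motion types underrepresented in motion-capture datasets.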
Findings
Outperforms prior work at reconstructing realistic human motion.
Effectively captures challenging motions like gymnastics.
Successfully recovers coherent human-object interactions in the wild.
Abstract
Reconstructing 3D human motion and human-object interactions (HOI) from Internet videos is a fundamental step toward building large-scale datasets of human behavior. Existing methods struggle to recover globally consistent 3D motion under dynamic cameras, especially for motion types underrepresented in current motion-capture datasets, and face additional difficulty recovering coherent human-object interactions in 3D. We introduce a two-stage framework leveraging 2D diffusion that reconstructs 3D human motion and HOI from Internet videos. In the first stage, we synthesize multi-view 2D motion data for each domain, leveraging 2D keypoints extracted from Internet videos to incorporate human motions that rarely appear in existing MoCap datasets. In the second stage, a camera-conditioned multi-view 2D motion diffusion model is trained on the domain-specific synthetic data to recover 3D human…
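To illustrate why consistent multi-view 2D keypoints with known cameras can pin down 3D motion, here is a minimal sketch. It is not the paper's method: the abstract does not specify a camera model or a lifting procedure, so the orthographic cameras rotated about the vertical axis, and the function names `project_orthographic` and `lift_from_two_views`, are assumptions made purely for illustration.

```python
import math

def project_orthographic(joint, yaw):
    """Project a 3D joint (x, y, z) into a hypothetical orthographic camera
    rotated by `yaw` radians about the vertical y-axis (an assumed camera model)."""
    x, y, z = joint
    u = x * math.cos(yaw) + z * math.sin(yaw)  # horizontal image coordinate
    return (u, y)                              # vertical coordinate is unchanged

def lift_from_two_views(kp_a, yaw_a, kp_b, yaw_b):
    """Recover (x, y, z) from two 2D observations of the same joint.

    Solves the 2x2 linear system
        cos(yaw_a) * x + sin(yaw_a) * z = u_a
        cos(yaw_b) * x + sin(yaw_b) * z = u_b
    with Cramer's rule, and averages the two vertical coordinates for y.
    """
    ca, sa = math.cos(yaw_a), math.sin(yaw_a)
    cb, sb = math.cos(yaw_b), math.sin(yaw_b)
    det = ca * sb - sa * cb  # equals sin(yaw_b - yaw_a)
    if abs(det) < 1e-9:
        raise ValueError("views are parallel or opposite; depth is not observable")
    ua, va = kp_a
    ub, vb = kp_b
    x = (ua * sb - sa * ub) / det
    z = (ca * ub - cb * ua) / det
    y = 0.5 * (va + vb)
    return (x, y, z)

# Usage: project one joint into two views, then lift it back to 3D.
joint = (0.3, 1.2, -0.5)
yaws = [0.0, math.pi / 3]
views = [project_orthographic(joint, t) for t in yaws]
recovered = lift_from_two_views(views[0], yaws[0], views[1], yaws[1])
```

A single view leaves depth ambiguous, which is the case the error branch guards against. Two or more consistent views remove that ambiguity, which is one reason multi-view 2D data can stand in for 3D supervision.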
