Lifting Motion to the 3D World via 2D Diffusion
Jiaman Li, C. Karen Liu, Jiajun Wu

TL;DR
This paper presents MVLift, a method that uses 2D motion diffusion models to estimate global 3D motion (joint rotations and root trajectories in world coordinates) from 2D pose sequences, without requiring 3D ground truth data for training.
Contribution
MVLift introduces a multi-stage framework leveraging 2D diffusion models to recover 3D motion from 2D data without 3D supervision, improving generalization across domains.
Findings
Outperforms prior methods on five datasets.
Does not require 3D ground truth for training.
Effective across human, human-object, and animal motions.
Abstract
Estimating 3D motion from 2D observations is a long-standing research challenge. Prior work typically requires training on datasets containing ground truth 3D motions, limiting their applicability to activities well-represented in existing motion capture data. This dependency particularly hinders generalization to out-of-distribution scenarios or subjects where collecting 3D ground truth is challenging, such as complex athletic movements or animal motion. We introduce MVLift, a novel approach to predict global 3D motion -- including both joint rotations and root trajectories in the world coordinate system -- using only 2D pose sequences for training. Our multi-stage framework leverages 2D motion diffusion models to progressively generate consistent 2D pose sequences across multiple views, a key step in recovering accurate global 3D motion. MVLift generalizes across various domains,…
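To illustrate why consistent 2D poses across multiple views matter for recovering 3D motion, the sketch below triangulates a single joint from its 2D projections in several views. It is a generic least-squares ray intersection, not MVLift's actual procedure, which this summary does not detail. The camera layout (four pinhole cameras on a circle around the subject), the unit focal length, and all function names are assumptions made for illustration.

```python
import math

def make_camera(yaw, radius=4.0):
    """Hypothetical pinhole camera on a circle of the given radius, looking at the origin."""
    c, s = math.cos(yaw), math.sin(yaw)
    R = [[c, 0.0, -s], [0.0, 1.0, 0.0], [s, 0.0, c]]  # world -> camera rotation
    center = [-radius * s, 0.0, -radius * c]          # camera position in world space
    return R, center

def matvec(M, v):
    return [sum(M[i][k] * v[k] for k in range(3)) for i in range(3)]

def transpose(M):
    return [[M[j][i] for j in range(3)] for i in range(3)]

def project(cam, X):
    """Project a 3D world point to normalized 2D image coordinates (focal length 1)."""
    R, center = cam
    x, y, z = matvec(R, [X[i] - center[i] for i in range(3)])
    return (x / z, y / z)

def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for k in range(col, 4):
                M[r][k] -= f * M[col][k]
    x = [0.0, 0.0, 0.0]
    for i in (2, 1, 0):
        x[i] = (M[i][3] - sum(M[i][k] * x[k] for k in range(i + 1, 3))) / M[i][i]
    return x

def triangulate(cams, points_2d):
    """Find the 3D point closest (in least squares) to the viewing rays of all cameras."""
    A = [[0.0] * 3 for _ in range(3)]
    b = [0.0] * 3
    for (R, center), (u, v) in zip(cams, points_2d):
        # Back-project the 2D observation to a unit ray direction in world space.
        d = matvec(transpose(R), [u, v, 1.0])
        norm = math.sqrt(sum(x * x for x in d))
        d = [x / norm for x in d]
        # Accumulate the normal equations: sum_i (I - d d^T) X = sum_i (I - d d^T) c_i.
        for i in range(3):
            for j in range(3):
                P = (1.0 if i == j else 0.0) - d[i] * d[j]
                A[i][j] += P
                b[i] += P * center[j]
    return solve3(A, b)

# Usage: project one joint into four views, then recover it from the 2D points alone.
cams = [make_camera(math.radians(a)) for a in (0, 90, 180, 270)]
joint = [0.3, 1.2, -0.5]
views = [project(cam, joint) for cam in cams]
recovered = triangulate(cams, views)
```

When the per-view 2D points describe the same underlying 3D joint, the rays meet at a single point and the recovery is exact up to floating-point error. If the views disagree, the rays no longer intersect and the estimate degrades, which is one way to see why cross-view consistency is described as a key step.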
Taxonomy
Topics: 3D Shape Modeling and Analysis · Computer Graphics and Visualization Techniques · Human Motion and Animation
Methods: Diffusion
