DuoMo: Dual Motion Diffusion for World-Space Human Reconstruction

Yufu Wang; Evonne Ng; Soyong Shin; Rawal Khirodkar; Yuan Dong; Zhaoen Su; Jinhyung Park; Kris Kitani; Alexander Richard; Fabian Prada; Michael Zollhofer

arXiv:2603.03265 · cs.CV · March 4, 2026

TL;DR

DuoMo uses two diffusion models to reconstruct world-space human motion from noisy or incomplete video, reducing world-space error relative to previous methods by 16% on EMDB and 30% on RICH while keeping foot skating low.

Contribution

The paper proposes a dual diffusion framework: a camera-space model first estimates motion from video in camera coordinates, and a world-space model then lifts that estimate into world coordinates and refines it for global consistency.

Findings

01. Achieves a 16% reduction in world-space error on the EMDB dataset.

02. Achieves a 30% reduction in world-space error on the RICH dataset.

03. Maintains low foot skating while improving accuracy.

Abstract

We present DuoMo, a generative method that recovers human motion in world-space coordinates from unconstrained videos with noisy or incomplete observations. Reconstructing such motion requires solving a fundamental trade-off: generalizing from diverse and noisy video inputs while maintaining global motion consistency. Our approach addresses this problem by factorizing motion learning into two diffusion models. The camera-space model first estimates motion from videos in camera coordinates. The world-space model then lifts this initial estimate into world coordinates and refines it to be globally consistent. Together, the two models can reconstruct motion across diverse scenes and trajectories, even from highly noisy or incomplete observations. Moreover, our formulation is general, generating the motion of mesh vertices directly and bypassing parametric models. DuoMo achieves…
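The abstract describes a hand-off in which a camera-space estimate is lifted into world coordinates before refinement. The listing does not say how DuoMo performs that lift, so the following is only a minimal sketch of the underlying coordinate change, assuming per-frame camera-to-world rotations and translations are already known. The names `camera_to_world` and `lift_motion` are hypothetical, and the world-space diffusion refinement is not shown.

```python
from typing import List, Sequence

Vec3 = Sequence[float]
Mat3 = Sequence[Sequence[float]]


def camera_to_world(point_cam: Vec3, R_wc: Mat3, t_wc: Vec3) -> List[float]:
    """Apply X_world = R_wc * X_cam + t_wc to a single 3D point.

    R_wc and t_wc are an ASSUMED camera-to-world pose for one frame;
    the source does not specify where such poses come from.
    """
    return [
        sum(R_wc[i][j] * point_cam[j] for j in range(3)) + t_wc[i]
        for i in range(3)
    ]


def lift_motion(
    verts_cam: Sequence[Sequence[Vec3]],
    rotations: Sequence[Mat3],
    translations: Sequence[Vec3],
) -> List[List[List[float]]]:
    """Lift a clip of mesh vertices from camera to world coordinates.

    verts_cam[f][v] is vertex v in frame f, in camera coordinates.
    Each frame uses its own (hypothetical) camera pose.
    """
    return [
        [camera_to_world(v, R, t) for v in frame]
        for frame, R, t in zip(verts_cam, rotations, translations)
    ]


# Toy clip: one vertex, two frames (illustrative values only).
verts_cam = [[[1.0, 0.0, 0.0]], [[1.0, 2.0, 3.0]]]
rotations = [
    [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]],  # 90 degrees about Z
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]],   # identity
]
translations = [[1.0, 0.0, 0.0], [0.0, 0.0, 2.0]]

verts_world = lift_motion(verts_cam, rotations, translations)
```

In this sketch the lifted vertices would serve only as an initial world-space estimate; according to the abstract, global consistency comes from the second model's refinement rather than from the rigid transform itself.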

Taxonomy

Topics: Human Pose and Action Recognition · Human Motion and Animation · 3D Shape Modeling and Analysis