3D-Aware Implicit Motion Control for View-Adaptive Human Video Generation
Zhixue Fang, Xu He, Songlin Tang, Haoxian Zhang, Qingfeng Li, Xiaoqiang Liu, Pengfei Wan, Kun Gai

TL;DR
This paper introduces 3DiMo, a view-agnostic, implicit motion control method for human video generation. It improves motion fidelity and view flexibility by jointly training a motion encoder with a pretrained video generator under diverse multi-view supervision.
Contribution
The paper proposes an implicit, view-agnostic motion representation, together with a training framework that transitions from external 3D guidance to intrinsic 3D understanding, to improve view-adaptive human video synthesis.
Findings
Outperforms existing methods in motion fidelity and visual quality.
Enables flexible, text-driven camera control in generated videos.
Successfully transitions from external 3D supervision to intrinsic 3D motion understanding.
Abstract
Existing methods for human motion control in video generation typically rely on either 2D poses or explicit 3D parametric models (e.g., SMPL) as control signals. However, 2D poses rigidly bind motion to the driving viewpoint, precluding novel-view synthesis. Explicit 3D models, though structurally informative, suffer from inherent inaccuracies (e.g., depth ambiguity and inaccurate dynamics) which, when used as a strong constraint, override the powerful intrinsic 3D awareness of large-scale video generators. In this work, we revisit motion control from a 3D-aware perspective, advocating for an implicit, view-agnostic motion representation that naturally aligns with the generator's spatial priors rather than depending on externally reconstructed constraints. We introduce 3DiMo, which jointly trains a motion encoder with a pretrained video generator to distill driving frames into compact,…
Taxonomy
Topics: Human Motion and Animation · Human Pose and Action Recognition · 3D Shape Modeling and Analysis
