HumanGenesis: Agent-Based Geometric and Generative Modeling for Synthetic Human Dynamics
Weiqi Li, Zehao Zhang, Liang Lin, Guangrun Wang

TL;DR
HumanGenesis introduces an agent-based framework that combines geometric reconstruction and generative modeling to produce photorealistic, expressive human videos with improved fidelity and scene coherence.
Contribution
The paper presents a novel multi-agent system integrating 3D reconstruction, critique, pose guidance, and video harmonization for synthetic human dynamics.
Findings
Achieves state-of-the-art results in text-guided synthesis and video reenactment.
Significantly improves geometric fidelity and scene integration.
Enhances motion generalization for expressive pose sequences.
Abstract
Synthetic human dynamics aims to generate photorealistic videos of human subjects performing expressive, intention-driven motions. However, current approaches face two core challenges: (1) geometric inconsistency and coarse reconstruction, due to limited 3D modeling and detail preservation; and (2) motion generalization limitations and scene inharmonization, stemming from weak generative capabilities. To address these, we present HumanGenesis, a framework that integrates geometric and generative modeling through four collaborative agents: (1) Reconstructor builds 3D-consistent human-scene representations from monocular video using 3D Gaussian Splatting and deformation decomposition. (2) Critique Agent enhances reconstruction fidelity by identifying and refining poor regions via multi-round MLLM-based reflection. (3)…
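The abstract describes the Critique Agent only at a high level: it identifies poor regions and refines them over multiple rounds. The following is a minimal sketch of what such a critique-and-refine loop might look like, not the authors' implementation. Every name here (`Region`, `critique_loop`, `critic`, `refine`, the integer quality score, the threshold, and the round limit) is a hypothetical stand-in; in particular, a plain scoring function takes the place of the MLLM-based reflection, and bumping an integer takes the place of refining a 3D reconstruction.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Region:
    """Hypothetical stand-in for one part of a reconstruction (e.g. face, hand)."""
    name: str
    quality: int  # assumed score in 0..100; higher means better reconstructed


def critique_loop(
    regions: List[Region],
    critic: Callable[[Region], int],
    refine: Callable[[Region], None],
    threshold: int = 80,
    max_rounds: int = 3,
) -> List[List[str]]:
    """Run up to max_rounds of critique; return the regions flagged in each round.

    critic  -- assumed scoring function (stands in for an MLLM judgment)
    refine  -- assumed in-place refinement step for a flagged region
    """
    history: List[List[str]] = []
    for _ in range(max_rounds):
        # Critique step: flag every region the critic scores below the threshold.
        poor = [r for r in regions if critic(r) < threshold]
        history.append([r.name for r in poor])
        if not poor:
            break  # nothing left to refine, stop early
        # Refinement step: improve only the flagged regions.
        for r in poor:
            refine(r)
    return history


def toy_critic(region: Region) -> int:
    # Toy critic: simply reads the stored score.
    return region.quality


def toy_refine(region: Region) -> None:
    # Toy refinement: raise the score by 30, capped at 100.
    region.quality = min(100, region.quality + 30)


if __name__ == "__main__":
    parts = [Region("face", 40), Region("hand", 60), Region("torso", 90)]
    for round_idx, flagged in enumerate(critique_loop(parts, toy_critic, toy_refine)):
        print(f"round {round_idx}: flagged {flagged}")
```

The loop stops as soon as a round flags nothing, so the round limit acts only as an upper bound on how much refinement is attempted.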
Taxonomy
Topics: Human Motion and Animation · 3D Shape Modeling and Analysis · Generative Adversarial Networks and Image Synthesis
