TL;DR
HUG3D reconstructs high-fidelity, physically plausible 3D models of multiple people from a single image by modeling group interactions and combining multi-view diffusion with physics-based priors.
Contribution
The paper introduces a holistic approach that combines group-level context, multi-view diffusion, and physics-based priors for multi-human 3D reconstruction from a single image.
Findings
Outperforms existing methods in multi-human 3D reconstruction accuracy.
Produces more physically plausible, higher-fidelity models.
Effectively resolves occlusions and interaction artifacts.
Abstract
Reconstructing textured 3D human models from a single image is fundamental for AR/VR and digital human applications. However, existing methods mostly focus on single individuals and thus fail in multi-human scenes, where naive composition of individual reconstructions often leads to artifacts such as unrealistic overlaps, missing geometry in occluded regions, and distorted interactions. These limitations highlight the need for approaches that incorporate group-level context and interaction priors. We introduce a holistic method that explicitly models both group- and instance-level information. To mitigate perspective-induced geometric distortions, we first transform the input into a canonical orthographic space. Our primary component, Human Group-Instance Multi-View Diffusion (HUG-MVD), then generates complete multi-view normals and images by jointly modeling individuals and group…
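As a rough illustration of the perspective-induced distortion the abstract refers to, the sketch below compares a simple pinhole projection with an orthographic one for two people of equal height standing at different depths. This is not the paper's canonicalization step, which the excerpt does not describe; the function names, focal length, and scene coordinates are all hypothetical and chosen only for the example.

```python
# Toy comparison of perspective vs. orthographic projection.
# Assumption: a basic pinhole camera looking down +z; not the paper's method.

def project_perspective(point, focal_length=1.0):
    """Pinhole projection: image coordinates shrink as depth z grows."""
    x, y, z = point
    return (focal_length * x / z, focal_length * y / z)


def project_orthographic(point, scale=1.0):
    """Orthographic projection: depth is dropped, so size ignores depth."""
    x, y, _ = point
    return (scale * x, scale * y)


def projected_height(person, project):
    """Image-space height of a person given as (feet, head) 3D points."""
    (_, y_feet), (_, y_head) = (project(p) for p in person)
    return y_head - y_feet


# Two people of the same real height (1.7 units) at depths 2 and 4.
near_person = [(0.0, 0.0, 2.0), (0.0, 1.7, 2.0)]  # feet, head
far_person = [(1.0, 0.0, 4.0), (1.0, 1.7, 4.0)]   # feet, head

persp_near = projected_height(near_person, project_perspective)
persp_far = projected_height(far_person, project_perspective)
ortho_near = projected_height(near_person, project_orthographic)
ortho_far = projected_height(far_person, project_orthographic)

print("perspective heights:", persp_near, persp_far)
print("orthographic heights:", ortho_near, ortho_far)
```

Under perspective, the person twice as far away appears half as tall, so relative scale between people depends on depth. Under the orthographic projection both heights match, which suggests why an orthographic space can be a more convenient place to reason about relative body sizes and contact in a group.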
