MultiPly: Reconstruction of Multiple People from Monocular Video in the Wild
Zeren Jiang, Chen Guo, Manuel Kaufmann, Tianjian Jiang, Julien Valentin, Otmar Hilliges, Jie Song

TL;DR
MultiPly is a novel framework that reconstructs multiple 3D human shapes from monocular in-the-wild videos, effectively disentangling individuals and capturing detailed shapes despite challenging interactions.
Contribution
MultiPly introduces a layered neural scene representation, a hybrid instance segmentation approach, and confidence-guided optimization for accurate multi-person 3D reconstruction from monocular videos.
Findings
Outperforms prior methods on public datasets
Achieves temporally consistent 3D reconstructions
Handles close interactions and complex scenes effectively
Abstract
We present MultiPly, a novel framework to reconstruct multiple people in 3D from monocular in-the-wild videos. Reconstructing multiple individuals moving and interacting naturally from monocular in-the-wild videos poses a challenging task. Addressing it necessitates precise pixel-level disentanglement of individuals without any prior knowledge about the subjects. Moreover, it requires recovering intricate and complete 3D human shapes from short video sequences, intensifying the level of difficulty. To tackle these challenges, we first define a layered neural representation for the entire scene, composited by individual human and background models. We learn the layered neural representation from videos via our layer-wise differentiable volume rendering. This learning process is further enhanced by our hybrid instance segmentation approach which combines the self-supervised 3D…
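The idea of compositing a scene from individual human and background models along a camera ray can be illustrated with a small NumPy sketch. This is a generic layered volume-rendering example, not necessarily the formulation used in the paper: the function name `composite_layers`, its parameters, and the choice to sum per-person densities and split each sample's opacity in proportion to density are all assumptions made for illustration.

```python
import numpy as np

def composite_layers(sigmas, colors, deltas, background):
    """Composite P person layers and a background color along one ray.

    Hypothetical helper for illustration only.
    sigmas:     (P, N) density of each person layer at N ray samples
    colors:     (P, N, 3) color of each person layer at each sample
    deltas:     (N,) spacing between consecutive samples
    background: (3,) color predicted by the background model
    Returns the pixel color (3,) and a per-person opacity (P,).
    """
    sigma_total = sigmas.sum(axis=0)                       # (N,)
    alpha = 1.0 - np.exp(-sigma_total * deltas)            # (N,)
    # Transmittance: fraction of light reaching sample i unoccluded.
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alpha)[:-1]])
    # Split each sample's opacity among layers in proportion to density.
    share = sigmas / np.maximum(sigma_total, 1e-10)        # (P, N)
    weights = trans * alpha * share                        # (P, N)
    person_opacity = weights.sum(axis=1)                   # (P,)
    rgb_people = (weights[..., None] * colors).sum(axis=(0, 1))
    # Whatever light is not absorbed by any person comes from the background.
    rgb = rgb_people + (1.0 - person_opacity.sum()) * background
    return rgb, person_opacity

# Two people on one ray: person 0 (red) stands in front of person 1 (green).
sigmas = np.array([[100.0, 0.0], [0.0, 100.0]])
colors = np.zeros((2, 2, 3))
colors[0, :, 0] = 1.0
colors[1, :, 1] = 1.0
rgb, opacity = composite_layers(sigmas, colors, np.ones(2), np.array([0.0, 0.0, 1.0]))
```

In this sketch the per-person opacity doubles as a soft instance mask: the occluded person receives almost no weight at this pixel, which is the kind of pixel-level disentanglement the abstract describes.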
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis
