Coherent Human-Scene Reconstruction from Multi-Person Multi-View Video in a Single Pass
Sangmin Kim, Minhyuk Hwang, Geonho Cha, Dongyoon Wee, Jaesik Park

TL;DR
CHROMM is a unified neural framework for multi-person, multi-view 3D human and scene reconstruction that operates in a single pass, integrating geometric and human priors for fast, accurate results.
Contribution
The paper introduces CHROMM, a novel end-to-end neural network that jointly estimates cameras, scene point clouds, and human meshes from multi-view videos without external modules.
Findings
Achieves competitive accuracy in human motion and pose estimation.
Runs over 8 times faster than previous optimization-based methods.
Effectively handles scale discrepancies and multi-person association.
Abstract
Recent advances in 3D foundation models have led to growing interest in reconstructing humans and their surrounding environments. However, most existing approaches focus on monocular inputs, and extending them to multi-view settings requires additional overhead modules or preprocessed data. To this end, we present CHROMM, a unified framework that jointly estimates cameras, scene point clouds, and human meshes from multi-person multi-view videos without relying on external modules or preprocessing. We integrate strong geometric and human priors from Pi3X and Multi-HMR into a single trainable neural network architecture, and introduce a scale adjustment module to solve the scale discrepancy between humans and the scene. We also introduce a multi-view fusion strategy to aggregate per-view estimates into a single representation at test-time. Finally, we propose a geometry-based multi-person…
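The scale discrepancy mentioned in the abstract can be illustrated with a generic alignment step. The sketch below is not CHROMM's scale adjustment module, whose design this summary does not describe. It is a plain median-ratio alignment, assuming that for the same pixels we have a depth from the human mesh and a depth from the scene point cloud. The names `estimate_scale` and `rescale_points` are hypothetical and chosen only for illustration.

```python
from statistics import median


def estimate_scale(human_depths, scene_depths):
    """Return a scalar s such that s * human_depth roughly matches scene_depth.

    Assumption: both sequences hold depths for the same pixels, one taken
    from the human mesh and one from the scene point cloud.
    """
    # Keep only pairs where both depths are positive, then take per-pixel ratios.
    ratios = [s / h for h, s in zip(human_depths, scene_depths) if h > 0 and s > 0]
    if not ratios:
        raise ValueError("no valid depth pairs")
    # The median is robust to a few outlier pixels (e.g. occlusion boundaries).
    return median(ratios)


def rescale_points(points, scale):
    """Apply a uniform scale to a list of (x, y, z) points."""
    return [(scale * x, scale * y, scale * z) for x, y, z in points]


if __name__ == "__main__":
    # Toy data: the third pair is an outlier that the median ignores.
    s = estimate_scale([1.0, 2.0, 4.0], [2.0, 4.0, 100.0])
    print(s)
    print(rescale_points([(1.0, 2.0, 3.0)], s))
```

A median is used here instead of a mean so that a handful of mismatched pixels does not skew the estimate. A learned module, as the abstract describes, would presumably replace this hand-written rule.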
Taxonomy
Topics: Human Pose and Action Recognition · Gait Recognition and Analysis · Video Surveillance and Tracking Methods
