Towards Metric-Aware Multi-Person Mesh Recovery by Jointly Optimizing Human Crowd in Camera Space
Kaiwen Wang, Kaili Zheng, Yiming Shi, Chenyi Guo, Ji Wu

TL;DR
This paper introduces a scene-consistent multi-person mesh recovery method that jointly optimizes the placement of all people in camera space using anthropometric priors and depth cues, which improves both depth reasoning and mesh accuracy.
Contribution
It proposes Depth-conditioned Translation Optimization (DTO) for joint scene-level refinement and a Metric-Aware HMR network for metric-scale human mesh estimation, advancing multi-person mesh recovery.
Findings
Achieves state-of-the-art depth reasoning accuracy.
Constructs a large-scale, scene-consistent multi-person dataset.
Demonstrates improved human mesh recovery performance.
Abstract
Multi-person human mesh recovery from a single image is a challenging task, hindered by the scarcity of in-the-wild training data. Prevailing in-the-wild human mesh pseudo-ground-truth (pGT) generation pipelines are single-person-centric: each human is processed individually, without joint optimization. This oversight leads to a lack of scene-level consistency, producing individuals with conflicting depths and scales within the same image. To address this, we introduce Depth-conditioned Translation Optimization (DTO), a novel optimization-based method that jointly refines the camera-space translations of all individuals in a crowd. By leveraging anthropometric priors on human height and depth cues from a monocular depth estimator, DTO solves for a scene-consistent placement of all subjects within a principled maximum a posteriori (MAP) framework. Applying DTO to the 4D-Humans…
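To make the idea of a joint MAP placement concrete, here is a minimal sketch and not the paper's actual formulation. It assumes a pinhole camera, a Gaussian prior on standing height (mean 1.7 m, standard deviation 0.1 m), and a monocular depth estimate that matches metric depth up to a single unknown scene-wide scale with Gaussian noise. The function name `solve_depths` and all parameters are hypothetical. Under these assumptions the negative log-posterior is a linear least-squares problem in the per-person depths and the shared scale, so every person's depth is coupled to the others through that scale.

```python
import numpy as np


def solve_depths(pixel_heights, rel_depths, focal_px,
                 mu_h=1.7, sigma_h=0.1, sigma_d=0.3):
    """Jointly estimate metric depths z_i and one shared depth scale s.

    Illustrative objective (an assumption, not taken from the paper):
        sum_i ((h_i * z_i / f - mu_h) / sigma_h)^2      # height prior
      + sum_i ((z_i - s * d_i) / sigma_d)^2             # depth cue
    where h_i is the person's height in pixels and d_i is the
    relative depth predicted by a monocular depth estimator.
    """
    h = np.asarray(pixel_heights, dtype=float)
    d = np.asarray(rel_depths, dtype=float)
    n = h.size
    idx = np.arange(n)

    # Unknown vector is [z_1, ..., z_n, s]; stack both residual groups.
    A = np.zeros((2 * n, n + 1))
    b = np.zeros(2 * n)

    # Height-prior rows: implied metric height h_i * z_i / f should be near mu_h.
    A[idx, idx] = h / (focal_px * sigma_h)
    b[idx] = mu_h / sigma_h

    # Depth-cue rows: z_i should agree with the scaled relative depth s * d_i.
    A[n + idx, idx] = 1.0 / sigma_d
    A[n + idx, n] = -d / sigma_d

    x, *_ = np.linalg.lstsq(A, b, rcond=None)
    return x[:n], x[n]
```

Calling `solve_depths(pixel_heights, rel_depths, focal_px)` returns the array of per-person depths and the scalar scale. Because the scale is shared, a person whose pixel height is misleading (for example, a child or a crouching adult) is pulled toward a depth consistent with the rest of the scene rather than being placed purely from their own apparent size.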
Taxonomy
Topics: Human Pose and Action Recognition · Video Surveillance and Tracking Methods · Advanced Neural Network Applications
