TL;DR
MetricHMSR is a new framework that accurately recovers metric human meshes and 3D scenes from a single image by disentangling pose and global position, improving 3D alignment.
Contribution
It introduces a bounding camera ray map and a Human MoE to explicitly recover metric scale and disentangle pose from global translation.
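To make the idea of a camera ray map over a bounding box concrete, here is a minimal sketch under stated assumptions. It assumes a pinhole camera with known intrinsics and reads "bounding camera ray map" as a grid of unit ray directions sampled inside the person's bounding box. The function name `bbox_ray_map`, its parameters, and the grid resolution are hypothetical and are not taken from the paper, whose actual representation may differ.

```python
import math

def bbox_ray_map(fx, fy, cx, cy, bbox, grid_hw=(3, 3)):
    """Return a rows x cols grid of unit camera rays sampled inside bbox.

    Assumption (not from the paper): pinhole intrinsics (fx, fy, cx, cy) in
    pixels and bbox = (x0, y0, x1, y1) in pixels. Both grid sizes must be >= 2.
    """
    x0, y0, x1, y1 = bbox
    rows, cols = grid_hw
    ray_map = []
    for r in range(rows):
        # Evenly spaced vertical sample positions from the top to the bottom edge.
        v = y0 + (y1 - y0) * r / (rows - 1)
        row = []
        for c in range(cols):
            # Evenly spaced horizontal sample positions from left to right edge.
            u = x0 + (x1 - x0) * c / (cols - 1)
            # Back-project the pixel through the pinhole model, then normalize.
            d = ((u - cx) / fx, (v - cy) / fy, 1.0)
            n = math.sqrt(sum(t * t for t in d))
            row.append(tuple(t / n for t in d))
        ray_map.append(row)
    return ray_map

# Hypothetical example: a 200 x 200 px box centered on the principal point.
rays = bbox_ray_map(fx=1000.0, fy=1000.0, cx=320.0, cy=240.0,
                    bbox=(220.0, 140.0, 420.0, 340.0))
```

Unlike a weak-perspective crop, such a map depends on both the focal length and where the box sits in the image, which is one plausible way a network could receive explicit perspective and scale cues.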
Findings
Achieves state-of-the-art results in human mesh recovery.
Improves 3D scene and human alignment accuracy.
Effectively disentangles local pose from global position.
Abstract
We introduce MetricHMSR, a novel framework for recovering metric human meshes and 3D scenes from a single monocular image. Existing methods struggle to recover metric scale due to monocular scale ambiguity and weak-perspective camera assumptions. Moreover, their fully coupled feature representations make it difficult to disentangle local pose from global translation, often requiring multi-stage pipelines that introduce accumulated errors. To address these challenges, we propose MetricHMR (Metric Human Mesh Recovery), which incorporates a bounding camera ray map representation to provide explicit metric cues for human reconstruction, together with a Human Mixture-of-Experts (HumanMoE) that dynamically routes image features to specialized experts, enabling the disentangled perception of local human pose and global metric position. Leveraging the recovered metric human as a geometric…
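The routing idea behind a mixture-of-experts can be illustrated with a generic top-k softmax gate. This is a sketch of the general technique only, not the paper's HumanMoE. The gate weights, the two toy experts (`pose_expert`, `translation_expert`), and the `route` function are all hypothetical, and the abstract does not specify the gating function, the number of experts, or how they are trained.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route(feature, gate_weights, experts, top_k=1):
    """Send a feature vector to its top_k experts and mix their outputs.

    gate_weights holds one weight row per expert (a linear gate); each expert
    is a callable mapping a feature list to an output list of equal length.
    """
    # One gating logit per expert: dot product of its weight row and the feature.
    logits = [sum(w * x for w, x in zip(row, feature)) for row in gate_weights]
    probs = softmax(logits)
    # Keep the top_k most probable experts and renormalize their weights.
    chosen = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    outputs = {i: experts[i](feature) for i in chosen}
    dim = len(outputs[chosen[0]])
    mixed = [sum(probs[i] / norm * outputs[i][d] for i in chosen) for d in range(dim)]
    return mixed, chosen

# Toy experts standing in for "local pose" and "global position" specialists.
pose_expert = lambda f: [2.0 * x for x in f]
translation_expert = lambda f: [x + 1.0 for x in f]

gate = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical gate: one row per expert
mixed, chosen = route([3.0, 0.0], gate, [pose_expert, translation_expert], top_k=1)
```

With `top_k=1` each feature is handled by a single specialist. A larger `top_k` blends several experts using the renormalized gate probabilities.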
