FactorizedHMR: A Hybrid Framework for Video Human Mesh Recovery
Patrick Kwon, Chen Chen

TL;DR
FactorizedHMR is a two-stage hybrid framework that improves video human mesh recovery by separately handling well-constrained torso regions and uncertain limb articulations, especially under occlusion.
Contribution
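The paper introduces a two-stage approach that pairs a deterministic regression module, which recovers a torso-root anchor, with a probabilistic flow-matching module, which completes the remaining articulation. It also adds a synthetic data pipeline for diverse supervision. Together these aim to improve recovery accuracy under occlusion and weak depth cues.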
Findings
Outperforms strong baselines in occlusion-heavy scenarios
Provides clearer gains in world-space metrics
Maintains competitive performance across benchmarks
Abstract
Human Mesh Recovery (HMR) is fundamentally ambiguous: under occlusion or weak depth cues, multiple 3D bodies can explain the same image evidence. This ambiguity is not uniform across the body, as torso pose and root structure are often relatively well constrained, whereas distal articulations such as the arms and legs are more uncertain. Building on this observation, we propose FactorizedHMR, a two-stage framework that treats these two regimes differently. A deterministic regression module first recovers a stable torso-root anchor, and a probabilistic flow-matching module then completes the remaining non-torso articulation. To make this completion reliable, we combine a composite target representation with geometry-aware supervision and feature-aware classifier-free guidance, preserving the torso-root anchor while improving single-reference recovery of ambiguity-prone articulation. We…
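The two-stage idea in the abstract can be illustrated with a toy sketch: a fixed torso-root anchor is kept untouched while the non-torso parameters are sampled by integrating a flow-matching ODE under classifier-free guidance. This is not the authors' implementation. The velocity field `toy_velocity`, the function names, the parameter layout, and the guidance weight `w` are all hypothetical stand-ins, and plain classifier-free guidance is used here because the "feature-aware" variant is not specified in this excerpt.

```python
import numpy as np

def guided_velocity(v_cond, v_uncond, w):
    # Standard classifier-free guidance: start from the unconditional
    # velocity and move toward the conditional one, scaled by w.
    return v_uncond + w * (v_cond - v_uncond)

def toy_velocity(x, t, target):
    # Hypothetical stand-in for a learned network. For a straight-line
    # path x_t = (1 - t) * x_0 + t * x_1, the velocity that reaches
    # `target` at t = 1 is (target - x) / (1 - t).
    return (target - x) / max(1.0 - t, 1e-3)

def complete_limbs(anchor, cond_target, uncond_target, w=1.5, steps=50, seed=0):
    # Stage 1 output (`anchor`) is assumed given and is never modified.
    # Stage 2 starts from Gaussian noise and integrates the guided
    # velocity from t = 0 to t = 1 with forward Euler steps.
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(cond_target.shape)
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v_c = toy_velocity(x, t, cond_target)
        v_u = toy_velocity(x, t, uncond_target)
        x = x + dt * guided_velocity(v_c, v_u, w)
    # Composite output: deterministic anchor followed by sampled limbs.
    return np.concatenate([anchor, x])

anchor = np.array([0.1, -0.2, 0.3])          # toy torso-root parameters
cond_target = np.array([1.0, 2.0, -1.0, 0.5])  # toy image-conditioned limb pose
uncond_target = np.zeros(4)                    # toy unconditional (prior) limb pose
pose = complete_limbs(anchor, cond_target, uncond_target, w=1.0)
```

With `w = 1.0` the guided velocity reduces to the conditional one, so in this toy setting the sampled limb parameters land on `cond_target`; larger `w` pushes the sample further from the unconditional prior.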
