FactorizedHMR: A Hybrid Framework for Video Human Mesh Recovery

Patrick Kwon; Chen Chen

arXiv:2605.14854·cs.CV·May 19, 2026

TL;DR

FactorizedHMR is a two-stage hybrid framework that improves video human mesh recovery by separately handling well-constrained torso regions and uncertain limb articulations, especially under occlusion.

Contribution

FactorizedHMR introduces a two-stage approach that pairs a deterministic regression module with a probabilistic flow-matching module, together with a synthetic data pipeline for diverse supervision, to improve recovery accuracy under challenging conditions such as occlusion.

Findings

01. Outperforms strong baselines in occlusion-heavy scenarios
02. Provides clearer gains in world-space metrics
03. Maintains competitive performance across benchmarks

Abstract

Human Mesh Recovery (HMR) is fundamentally ambiguous: under occlusion or weak depth cues, multiple 3D bodies can explain the same image evidence. This ambiguity is not uniform across the body, as torso pose and root structure are often relatively well constrained, whereas distal articulations such as the arms and legs are more uncertain. Building on this observation, we propose FactorizedHMR, a two-stage framework that treats these two regimes differently. A deterministic regression module first recovers a stable torso-root anchor, and a probabilistic flow-matching module then completes the remaining non-torso articulation. To make this completion reliable, we combine a composite target representation with geometry-aware supervision and feature-aware classifier-free guidance, preserving the torso-root anchor while improving single-reference recovery of ambiguity-prone articulation. We…
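The two-stage idea in the abstract can be illustrated with a minimal sketch. It is a toy under stated assumptions, not the authors' implementation: the pose is a flat vector, the torso entries come from a hypothetical stage-one regressor and are held fixed by a mask, and the learned flow-matching network is replaced by an analytic straight-line velocity field (`toy_velocity`). The guidance shown is standard classifier-free guidance, not the paper's feature-aware variant, and all function and parameter names are invented for illustration.

```python
import numpy as np


def toy_velocity(x, t, target):
    """Velocity of a straight-line path that reaches `target` at t = 1.

    Toy stand-in for a learned flow-matching network (assumption).
    """
    return (target - x) / (1.0 - t)


def guided_velocity(x, t, cond_target, uncond_target, guidance_scale):
    """Standard classifier-free guidance on velocities:
    start from the unconditional prediction and extrapolate toward the conditional one.
    """
    v_cond = toy_velocity(x, t, cond_target)
    v_uncond = toy_velocity(x, t, uncond_target)
    return v_uncond + guidance_scale * (v_cond - v_uncond)


def complete_pose(torso_anchor, limb_mask, cond_target, uncond_target,
                  guidance_scale=1.0, num_steps=16, seed=0):
    """Integrate the guided flow with Euler steps, updating only the limb entries.

    torso_anchor: full-length pose vector; only its torso entries are used.
    limb_mask:    boolean vector, True where the entry belongs to a limb.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(limb_mask.shape)   # limbs start from Gaussian noise
    x[~limb_mask] = torso_anchor[~limb_mask]   # torso entries come from stage one
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt                             # t stays below 1, so no division by zero
        v = guided_velocity(x, t, cond_target, uncond_target, guidance_scale)
        x = x + dt * np.where(limb_mask, v, 0.0)  # masked update keeps the anchor fixed
    return x


# Hypothetical 5-dimensional pose: two torso entries, three limb entries.
limb_mask = np.array([False, False, True, True, True])
torso_anchor = np.array([0.3, -0.1, 0.0, 0.0, 0.0])
cond_target = np.array([0.0, 0.0, 1.0, -2.0, 0.5])
uncond_target = np.zeros(5)

pose = complete_pose(torso_anchor, limb_mask, cond_target, uncond_target)
```

Masking the update is only one simple way to keep the anchor intact; the abstract does not say how the paper enforces it. In this toy, a guidance scale above 1 pushes the limb entries past the conditional target and away from the unconditional one, which is the usual trade-off of classifier-free guidance.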
