DanceHMR: Hand-Aware Whole-Body Human Mesh Recovery from Monocular Videos
Wenhao Shen, Ming Zhou, Hengyuan Zhang, Siyuan Bian, Youjiang Xu, Xi Lin

TL;DR
DanceHMR is a framework for temporally stable, detailed whole-body human mesh recovery from monocular videos that captures both hand articulation and body motion in challenging real-world footage.
Contribution
DanceHMR introduces a single, temporally coherent model that combines residual body-hand fusion with close-up-aware augmentation to improve both hand and body mesh recovery.
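The summary names residual body-hand fusion but does not define it. One common reading of "residual fusion" is that the body branch supplies a base hand pose from whole-body context, and a hand-specific branch adds a correction on top of it. The sketch below illustrates that generic pattern under stated assumptions: the function `residual_hand_fusion`, the flat per-joint pose vectors, and the confidence weight are hypothetical and are not the paper's actual design.

```python
from typing import List


def residual_hand_fusion(body_hand_pose: List[float],
                         hand_residual: List[float],
                         hand_confidence: float) -> List[float]:
    """Hypothetical residual fusion (not the paper's method).

    body_hand_pose: hand pose parameters predicted from body context.
    hand_residual:  correction predicted from a hand-specific observation.
    hand_confidence: how much to trust the hand branch, clamped to [0, 1].
    """
    if len(body_hand_pose) != len(hand_residual):
        raise ValueError("pose and residual must have the same length")
    # Clamp so an out-of-range score cannot over-amplify the residual.
    w = min(max(hand_confidence, 0.0), 1.0)
    # Residual form: the body estimate is the base and the hand branch only
    # adds a correction, so w == 0 falls back to the body-context pose.
    return [b + w * r for b, r in zip(body_hand_pose, hand_residual)]
```

A design benefit of the residual form is graceful degradation: when the hand is occluded or blurred and the hand branch is unreliable, a low weight leaves the temporally smoother body-context estimate in place.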
Findings
Enhanced hand articulation recovery compared to prior methods.
Achieved stable, temporally consistent SMPL-X mesh motion in real-world videos.
Demonstrated competitive accuracy on benchmark datasets.
Abstract
Human mesh recovery (HMR) from monocular video is essential for digital humans, avatar animation, and embodied simulation, where both temporal stability and expressive whole-body motion are required. Existing video HMR methods produce coherent body motion but often overlook detailed hand articulation, while image-based whole-body methods recover SMPL-X meshes independently per frame, often leading to jittery and inaccurate hand motion. We present a temporally coherent whole-body HMR framework for challenging in-the-wild monocular videos. Our model unifies body context and part-specific hand observations through residual body-hand fusion, enabling stable body motion and detailed hand recovery within a single temporal architecture. We further introduce close-up-aware augmentation to improve robustness under upper-body framing. Experiments on whole-body and body-only benchmarks demonstrate improved…
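The abstract says close-up-aware augmentation targets upper-body framing but gives no details. A plausible minimal version, sketched here purely as an assumption, crops the person bounding box so that only its top portion survives, mimicking close-up shots, and flags keypoints that fall outside the new crop so a training loop could mask them. The function `close_up_crop`, the `min_keep` parameter, and the visibility mask are hypothetical and not taken from the paper.

```python
import random
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in pixels


def close_up_crop(box: Box,
                  keypoints: List[Tuple[float, float]],
                  rng: random.Random,
                  min_keep: float = 0.4) -> Tuple[Box, List[bool]]:
    """Hypothetical close-up augmentation (not the paper's method).

    Keeps a random top fraction of the person box, in [min_keep, 1],
    to simulate upper-body framing, and reports which 2D keypoints
    remain inside the cropped box.
    """
    if not 0.0 < min_keep <= 1.0:
        raise ValueError("min_keep must be in (0, 1]")
    x0, y0, x1, y1 = box
    # Fraction of the box height to keep, measured from the top edge.
    keep = rng.uniform(min_keep, 1.0)
    new_y1 = y0 + keep * (y1 - y0)
    new_box = (x0, y0, x1, new_y1)
    # Keypoints below the new bottom edge (e.g. knees, ankles) are cut off.
    visible = [x0 <= x <= x1 and y0 <= y <= new_y1 for x, y in keypoints]
    return new_box, visible
```

Passing an explicit `random.Random` instance keeps the augmentation reproducible per seed, which is convenient when debugging a data pipeline.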
