THUNDR: Transformer-based 3D HUmaN Reconstruction with Markers
Mihai Zanfir, Andrei Zanfir, Eduard Gabriel Bazavan, William T. Freeman, Rahul Sukthankar, Cristian Sminchisescu

TL;DR
THUNDR introduces a transformer-based approach for 3D human reconstruction from monocular images, combining model-free predictions with anthropometric constraints, achieving state-of-the-art results in both supervised and self-supervised settings.
Contribution
The paper presents a novel transformer-based pipeline that integrates 3D marker prediction with a statistical human model for improved 3D human pose and shape estimation.
Findings
State-of-the-art results on Human3.6M and 3DPW datasets.
Effective in both supervised and self-supervised regimes.
Robust performance on challenging in-the-wild poses.
Abstract
We present THUNDR, a transformer-based deep neural network methodology to reconstruct the 3d pose and shape of people, given monocular RGB images. Key to our methodology is an intermediate 3d marker representation, where we aim to combine the predictive power of model-free-output architectures and the regularizing, anthropometrically-preserving properties of a statistical human surface model like GHUM -- a recently introduced, expressive full body statistical 3d human model, trained end-to-end. Our novel transformer-based prediction pipeline can focus on image regions relevant to the task, supports self-supervised regimes, and ensures that solutions are consistent with human anthropometry. We show state-of-the-art results on Human3.6M and 3DPW, for both the fully-supervised and the self-supervised models, for the task of inferring 3d human shape, joint positions, and global translation.…
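To make the marker idea concrete, here is a minimal sketch of how free-form 3D marker predictions can be regularized by a statistical shape model. It is not the paper's pipeline and does not use GHUM: it assumes a toy linear model (a hypothetical `mean_markers` plus a hypothetical `basis` of shape directions) and fits its coefficients to noisy markers with ridge-regularized least squares. The function name `fit_shape_to_markers` and all of the data are invented for illustration.

```python
import numpy as np

def fit_shape_to_markers(markers, mean_markers, basis, ridge=1e-3):
    """Project free-form 3D markers onto a toy linear shape model.

    markers, mean_markers: arrays of shape (M, 3).
    basis: array of shape (M * 3, K), one column per shape direction.
    Returns the K fitted coefficients and the regularized (M, 3) markers.
    """
    residual = (markers - mean_markers).reshape(-1)
    k = basis.shape[1]
    # Ridge-regularized normal equations: (B^T B + ridge * I) beta = B^T r
    beta = np.linalg.solve(basis.T @ basis + ridge * np.eye(k),
                           basis.T @ residual)
    fitted = mean_markers + (basis @ beta).reshape(-1, 3)
    return beta, fitted

# Synthetic stand-in for "model-free" marker predictions (assumed data).
rng = np.random.default_rng(0)
M, K = 20, 4
mean_markers = rng.normal(size=(M, 3))
basis = rng.normal(size=(M * 3, K))
true_beta = np.array([0.5, -1.0, 0.25, 2.0])
clean = mean_markers + (basis @ true_beta).reshape(-1, 3)
noisy = clean + 0.01 * rng.normal(size=clean.shape)

beta, fitted = fit_shape_to_markers(noisy, mean_markers, basis)
```

In this toy setting the fitted markers lie in the span of the model, so noise outside that span is discarded. That is the intuition behind pairing unconstrained marker predictions with an anthropometric prior. The actual method described in the abstract trains the network and the body model end-to-end rather than solving a separate least-squares problem.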
