TL;DR
SPEC is an in-the-wild 3D human pose and shape estimation method that estimates the perspective camera (field of view, pitch, and roll) from a single image, improving reconstruction accuracy over previous approaches that assume weak-perspective projection, a large constant focal length, and zero camera rotation.
Contribution
The paper presents the first in-the-wild 3D human pose and shape estimation method that estimates the perspective camera from a single image and uses it to reconstruct 3D human bodies more accurately.
Findings
Outperforms prior methods on the 3DPW benchmark
Creates new synthetic and in-the-wild datasets with camera calibration
Improves 3D human body reconstruction accuracy
Abstract
Due to the lack of camera parameter information for in-the-wild images, existing 3D human pose and shape (HPS) estimation methods make several simplifying assumptions: weak-perspective projection, large constant focal length, and zero camera rotation. These assumptions often do not hold and we show, quantitatively and qualitatively, that they cause errors in the reconstructed 3D shape and pose. To address this, we introduce SPEC, the first in-the-wild 3D HPS method that estimates the perspective camera from a single image and employs this to reconstruct 3D human bodies more accurately. First, we train a neural network to estimate the field of view, camera pitch, and roll given an input image. We employ novel losses that improve the calibration accuracy over previous work. We then train a novel network that concatenates the camera calibration to the image features and uses these together…
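The abstract contrasts weak-perspective projection with a full perspective camera described by field of view, pitch, and roll. The following is a minimal sketch of that contrast using standard pinhole-camera geometry; it is not SPEC's implementation. The function names, the axis and sign conventions for pitch and roll, and the example values are all assumptions made for illustration.

```python
import numpy as np


def focal_from_vfov(vfov_rad, image_height_px):
    """Pinhole focal length in pixels for a given vertical field of view."""
    return 0.5 * image_height_px / np.tan(0.5 * vfov_rad)


def rotation_from_pitch_roll(pitch_rad, roll_rad):
    """Camera rotation from pitch (about x) and roll (about z).

    The axis assignment and composition order are assumptions; other
    conventions are equally valid.
    """
    cp, sp = np.cos(pitch_rad), np.sin(pitch_rad)
    cr, sr = np.cos(roll_rad), np.sin(roll_rad)
    rot_x = np.array([[1.0, 0.0, 0.0], [0.0, cp, -sp], [0.0, sp, cp]])
    rot_z = np.array([[cr, -sr, 0.0], [sr, cr, 0.0], [0.0, 0.0, 1.0]])
    return rot_z @ rot_x


def perspective_project(points, rotation, translation, focal_px, center_px):
    """Full perspective projection: each point is divided by its own depth."""
    cam = points @ rotation.T + translation
    return focal_px * cam[:, :2] / cam[:, 2:3] + center_px


def weak_perspective_project(points, scale, offset_px):
    """Weak perspective: one shared scale, per-point depth is ignored."""
    return scale * points[:, :2] + offset_px


# Hypothetical example: two points at different depths, 640x480 image.
points = np.array([[0.5, 0.0, 2.0], [0.5, 0.0, 4.0]])
center = np.array([320.0, 240.0])
focal = focal_from_vfov(np.deg2rad(60.0), image_height_px=480)
rotation = rotation_from_pitch_roll(np.deg2rad(10.0), np.deg2rad(0.0))

persp = perspective_project(points, rotation, np.zeros(3), focal, center)
weak = weak_perspective_project(points, scale=focal / 3.0, offset_px=center)
print(persp)
print(weak)
```

Under weak perspective both points land on the same pixel because depth is discarded, while the perspective model separates them and also shifts them when the camera is pitched. That difference is the kind of error the abstract attributes to the simplifying assumptions.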
