CameraHMR: Aligning People with Perspective
Priyanka Patel, Michael J. Black

TL;DR
CameraHMR improves pseudo ground truth (pGT) generation for 3D human pose and shape estimation by estimating camera intrinsics and fitting to dense surface keypoints, which yields more accurate models for monocular images.
Contribution
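The paper presents two methods that improve pGT quality for training 3D human pose and shape models: a field-of-view predictor (HumanFoV) for estimating camera intrinsics, and a dense surface keypoint detector trained on BEDLAM that supplies stronger constraints on body shape during fitting.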
Findings
Enhanced pseudo ground truth leads to more accurate 3D pose estimation.
Incorporating full perspective camera models improves fitting accuracy.
CameraHMR achieves state-of-the-art performance in 3D human pose estimation.
Abstract
We address the challenge of accurate 3D human pose and shape estimation from monocular images. The key to accuracy and robustness lies in high-quality training data. Existing training datasets containing real images with pseudo ground truth (pGT) use SMPLify to fit SMPL to sparse 2D joint locations, assuming a simplified camera with default intrinsics. We make two contributions that improve pGT accuracy. First, to estimate camera intrinsics, we develop a field-of-view prediction model (HumanFoV) trained on a dataset of images containing people. We use the estimated intrinsics to enhance the 4D-Humans dataset by incorporating a full perspective camera model during SMPLify fitting. Second, 2D joints provide limited constraints on 3D body shape, resulting in average-looking bodies. To address this, we use the BEDLAM dataset to train a dense surface keypoint detector. We apply this detector…
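The link between a predicted field of view and the intrinsics of a full perspective camera can be illustrated with a short sketch. It is not the authors' code. It assumes a pinhole camera with square pixels, a principal point at the image center, and a vertical field of view in degrees; the abstract does not state which convention HumanFoV uses. The function names (`focal_from_fov`, `intrinsics_from_fov`, `project_point`) are hypothetical.

```python
import math


def focal_from_fov(fov_deg, size_px):
    """Focal length in pixels for a pinhole camera.

    fov_deg is the field of view along the image axis that is
    size_px pixels long (assumed here to be the vertical axis).
    """
    return 0.5 * size_px / math.tan(math.radians(fov_deg) / 2.0)


def intrinsics_from_fov(fov_deg, width, height):
    """Build a 3x3 intrinsics matrix K as nested lists.

    Assumptions (not from the paper): square pixels, zero skew,
    principal point at the image center.
    """
    f = focal_from_fov(fov_deg, height)
    return [
        [f, 0.0, width / 2.0],
        [0.0, f, height / 2.0],
        [0.0, 0.0, 1.0],
    ]


def project_point(point_cam, K):
    """Full perspective projection of a 3D point in camera coordinates."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    u = K[0][0] * x / z + K[0][2]
    v = K[1][1] * y / z + K[1][2]
    return u, v


if __name__ == "__main__":
    K = intrinsics_from_fov(fov_deg=60.0, width=1920, height=1080)
    # A point 0.5 m to the right of the optical axis, 3 m from the camera.
    print(project_point((0.5, 0.0, 3.0), K))
```

With a fixed default focal length, the same 3D point projects to a different pixel than it does under the estimated intrinsics, and that mismatch is the kind of error a fitting procedure would otherwise absorb into pose or shape.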
