PersPose: 3D Human Pose Estimation with Perspective Encoding and Perspective Rotation
Xiaoyang Hao, Han Li

TL;DR
PersPose introduces perspective encoding and rotation techniques to improve monocular 3D human pose estimation by accounting for camera intrinsics and perspective distortions, achieving state-of-the-art results on multiple datasets.
Contribution
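The paper proposes Perspective Encoding (PE), which encodes the camera intrinsics of the cropped image, and Perspective Rotation (PR), which targets the perspective distortion that grows as the subject moves away from the image center. Together they aim to improve the accuracy of 3D human pose estimation from monocular images.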
Findings
Achieves SOTA performance on 3DPW, MPI-INF-3DHP, and Human3.6M datasets.
Reduces MPJPE by 7.54% on 3DPW compared to previous methods.
Demonstrates effectiveness of perspective-aware techniques in 3D HPE.
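For readers unfamiliar with the metric, MPJPE (mean per-joint position error) is conventionally the average Euclidean distance between predicted and ground-truth joint positions. The sketch below shows that standard definition and how a relative reduction is computed. The function names and all numbers are illustrative assumptions, not values or code from the paper, and the paper's exact evaluation protocol (for example root alignment) is not reproduced here.

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean per-joint position error: average Euclidean distance over joints.

    pred, gt: arrays of shape (..., num_joints, 3) in the same unit (e.g. mm).
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

def relative_reduction(baseline, new):
    """Fraction by which `new` is lower than `baseline`."""
    return (baseline - new) / baseline

# Toy data (made up): three joints, ground truth at the origin.
gt = np.zeros((3, 3))
pred = np.array([[3.0, 4.0, 0.0],   # error 5
                 [0.0, 0.0, 0.0],   # error 0
                 [0.0, 0.0, 5.0]])  # error 5
toy_error = mpjpe(pred, gt)

# Toy baseline/new errors (made up) that give a 7.54% relative reduction.
toy_reduction = relative_reduction(100.0, 92.46)
```

Note that the reduction reported above is relative, so it is a percentage of the baseline error rather than a difference in millimetres.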
Abstract
Monocular 3D human pose estimation (HPE) methods estimate the 3D positions of joints from individual images. Existing 3D HPE approaches often use the cropped image alone as input for their models. However, the relative depths of joints cannot be accurately estimated from cropped images without the corresponding camera intrinsics, which determine the perspective relationship between 3D objects and the cropped images. In this work, we introduce Perspective Encoding (PE) to encode the camera intrinsics of the cropped images. Moreover, since the human subject can appear anywhere within the original image, the perspective relationship between the 3D scene and the cropped image differs significantly, which complicates model fitting. Additionally, the further the human subject deviates from the image center, the greater the perspective distortions in the cropped image. To address these issues,…
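To make the point about cropped-image intrinsics concrete, here is a minimal sketch of how cropping and resizing change a pinhole camera's intrinsic matrix. It illustrates standard pinhole geometry only. It is not the paper's Perspective Encoding, and the function name, the bounding-box format, and the example numbers are assumptions chosen for illustration. Half-pixel offset conventions are ignored.

```python
import numpy as np

def crop_intrinsics(K, box, out_size):
    """Intrinsics of a crop that is resized to out_size x out_size pixels.

    K:        3x3 intrinsic matrix of the original image.
    box:      (x0, y0, w, h) crop rectangle in original pixel coordinates
              (hypothetical format for this sketch).
    out_size: side length of the resized crop in pixels.
    """
    x0, y0, w, h = box
    sx, sy = out_size / w, out_size / h
    # Shift the origin to the crop's top-left corner, then scale to out_size.
    T = np.array([[sx, 0.0, -sx * x0],
                  [0.0, sy, -sy * y0],
                  [0.0, 0.0, 1.0]])
    return T @ np.asarray(K, dtype=float)

def project(K, point_3d):
    """Pinhole projection of a 3D point in camera coordinates to pixels."""
    uvw = K @ np.asarray(point_3d, dtype=float)
    return uvw[:2] / uvw[2]

# Example values (made up): a 1920x1080 camera and a person near the right edge.
K = np.array([[1000.0, 0.0, 960.0],
              [0.0, 1000.0, 540.0],
              [0.0, 0.0, 1.0]])
box = (1400.0, 300.0, 400.0, 400.0)
K_crop = crop_intrinsics(K, box, out_size=256)

# The same 3D point projects consistently through either matrix.
p = np.array([1.5, -0.2, 3.0])
uv_full = project(K, p)
uv_crop = project(K_crop, p)
```

In this example the principal point of the crop falls at a negative x coordinate, outside the cropped image. Two crops that look alike can therefore correspond to quite different viewing geometry, which is why the crop alone does not pin down the perspective relationship.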
