TL;DR
This paper introduces PROX, a method that incorporates 3D scene constraints to improve monocular 3D human pose estimation, reducing errors by enforcing scene-body interactions and avoiding body-scene intersections.
Contribution
The paper presents a novel approach that leverages static 3D scene structure to enhance 3D human pose estimation from monocular images, including a new dataset and scene-aware constraints.
Findings
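Adding scene constraints reduces 3D pose estimation error compared with fitting that ignores the scene.
The exclusion constraint discourages body-scene inter-penetration, so the estimated body does not pass through scene surfaces.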
Quantitative results show improved accuracy over baseline methods.
Abstract
To understand and analyze human behavior, we need to capture humans moving in, and interacting with, the world. Most existing methods perform 3D human pose estimation without explicitly considering the scene. We observe however that the world constrains the body and vice-versa. To motivate this, we show that current 3D human pose estimation methods produce results that are not consistent with the 3D scene. Our key contribution is to exploit static 3D scene structure to better estimate human pose from monocular images. The method enforces Proximal Relationships with Object eXclusion and is called PROX. To test this, we collect a new dataset composed of 12 different 3D scenes and RGB sequences of 20 subjects moving in and interacting with the scenes. We represent human pose using the 3D human body model SMPL-X and extend SMPLify-X to estimate body pose using scene constraints. We make use…
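The two ideas named in the abstract, proximity (contact) and object exclusion (no inter-penetration), can be illustrated with a small numerical sketch. This is not the paper's implementation. It assumes the scene is available as a signed distance function, here a hypothetical analytic floor plane `floor_sdf`, and that the body is a plain array of 3D vertices rather than a SMPL-X mesh. The function names, the squared penalties, and the choice of contact vertices are all illustrative assumptions.

```python
import numpy as np


def floor_sdf(points, floor_height=0.0):
    """Hypothetical scene: signed distance to a horizontal floor plane.

    Positive above the floor, negative below it (inside the scene geometry).
    """
    return points[:, 1] - floor_height


def penetration_penalty(vertices, sdf):
    """Exclusion term: sum of squared depths of vertices inside the scene."""
    d = sdf(vertices)
    # Only negative distances (vertices below the surface) contribute.
    return float(np.sum(np.minimum(d, 0.0) ** 2))


def contact_penalty(vertices, contact_idx, sdf):
    """Proximity term: sum of squared gaps between contact vertices and the scene."""
    d = sdf(vertices[contact_idx])
    # Only positive distances (vertices hovering above the surface) contribute.
    return float(np.sum(np.maximum(d, 0.0) ** 2))


# Toy "body": two foot vertices and one head vertex (x, y, z), in meters.
vertices = np.array([
    [0.0, -0.05, 0.0],  # foot sunk 5 cm into the floor
    [0.1, 0.02, 0.0],   # foot hovering 2 cm above the floor
    [0.0, 0.90, 0.0],   # head, far from the floor
])
contact_idx = np.array([0, 1])  # assumed contact candidates: the two feet

pen = penetration_penalty(vertices, floor_sdf)
con = contact_penalty(vertices, contact_idx, floor_sdf)
print(f"penetration penalty: {pen:.4f}, contact penalty: {con:.4f}")
```

In a fitting pipeline, terms like these would be added with weights to the image-based objective and minimized over the body parameters. Here they are only evaluated for a fixed set of vertices, to show which vertices each term penalizes.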
