TL;DR
DOPE distills separately trained body, hand and face experts into a single model for accurate, real-time whole-body 3D human pose estimation in the wild, working around the lack of training data with whole-body 3D annotations.
Contribution
The paper introduces a distillation approach that integrates separate part experts into one efficient model for whole-body 3D pose detection in the wild.
Findings
The distilled whole-body model significantly outperforms non-distilled models.
It achieves near-expert performance at lower computational cost.
Enables real-time whole-body 3D pose estimation.
Abstract
We introduce DOPE, the first method to detect and estimate whole-body 3D human poses, including bodies, hands and faces, in the wild. Achieving this level of detail is key for a number of applications that require understanding the interactions of people with each other or with the environment. The main challenge is the lack of in-the-wild data with labeled whole-body 3D poses. In previous work, training data has been annotated or generated for simpler tasks focusing on bodies, hands or faces separately. In this work, we propose to take advantage of these datasets to train independent experts for each part, namely a body, a hand and a face expert, and distill their knowledge into a single deep network designed for whole-body 2D-3D pose detection. In practice, given a training image with partial or no annotation, each part expert detects its subset of keypoints in 2D and 3D and the…
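The distillation idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the keypoint format, the function names `build_pseudo_labels` and `distillation_loss`, and the choice of a mean L1 loss are all assumptions made for illustration. The sketch only shows the general pattern of treating per-part expert detections as pseudo-labels for a single whole-body student model.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical keypoint format: (x, y, z). The real method predicts 2D and 3D
# poses per part; this sketch collapses them into one tuple for brevity.
Keypoint = Tuple[float, float, float]

PARTS = ("body", "hand", "face")


def build_pseudo_labels(
    expert_outputs: Dict[str, Optional[List[Keypoint]]],
) -> Dict[str, List[Keypoint]]:
    """Keep only the parts for which an expert returned a detection.

    Assumption: an expert returns None (or an empty list) when it
    detects nothing for its part in the image.
    """
    return {
        part: keypoints
        for part, keypoints in expert_outputs.items()
        if part in PARTS and keypoints
    }


def distillation_loss(
    student: Dict[str, List[Keypoint]],
    pseudo: Dict[str, List[Keypoint]],
) -> float:
    """Mean L1 distance per keypoint between student and pseudo-labels.

    Parts without a pseudo-label contribute nothing, so the student is
    only supervised where an expert produced a detection.
    """
    total = 0.0
    count = 0
    for part, targets in pseudo.items():
        predictions = student.get(part)
        if predictions is None:
            continue
        for pred, target in zip(predictions, targets):
            total += sum(abs(p - t) for p, t in zip(pred, target))
            count += 1
    return total / count if count else 0.0


# Toy example: the hand expert found nothing, so hands are not supervised.
experts = {
    "body": [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
    "hand": None,
    "face": [(2.0, 2.0, 2.0)],
}
student_prediction = {
    "body": [(0.0, 0.0, 1.0), (1.0, 1.0, 1.0)],
    "hand": [(5.0, 5.0, 5.0)],
    "face": [(2.0, 3.0, 2.0)],
}
pseudo_labels = build_pseudo_labels(experts)
loss = distillation_loss(student_prediction, pseudo_labels)
```

Masking parts that have no pseudo-label is one simple way to train on images with partial or no annotation; the paper's actual loss and matching procedure may differ.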
