TL;DR
FrankMocap is a fast, accurate system for monocular 3D whole-body pose estimation that jointly recovers face, hand, and body pose from single in-the-wild images and outperforms optimization-based whole-body methods.
Contribution
It introduces a modular approach combining independent 3D pose regressions with flexible integration modules for seamless whole-body estimation.
Findings
Outperforms optimization-based methods in accuracy
Runs fast, with three integration modules that trade off latency against accuracy
Effectively combines face, hands, and body data
Abstract
Most existing monocular 3D pose estimation approaches only focus on a single body part, neglecting the fact that the essential nuance of human motion is conveyed through a concert of subtle movements of face, hands, and body. In this paper, we present FrankMocap, a fast and accurate whole-body 3D pose estimation system that can produce 3D face, hands, and body simultaneously from in-the-wild monocular images. The core idea of FrankMocap is its modular design: We first run 3D pose regression methods for face, hands, and body independently, followed by composing the regression outputs via an integration module. The separate regression modules allow us to take full advantage of their state-of-the-art performances without compromising the original accuracy and reliability in practice. We develop three different integration modules that trade off between latency and accuracy. All of them are…
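The modular design described in the abstract (independent regressors followed by an integration step) can be pictured with a minimal sketch. Everything named here is hypothetical and chosen for illustration: the function names, the parameter keys, and the stub regressors are assumptions, and the simple overwrite rule stands in for an integration module without claiming to match any of the three the authors develop.

```python
from typing import Callable, Dict, List

# Hypothetical parameter container: name -> flat list of values.
Params = Dict[str, List[float]]
Regressor = Callable[[object], Params]


def overwrite_integration(body: Params, hands: Params, face: Params) -> Params:
    """Illustrative integration rule (an assumption, not the paper's module):
    start from the body estimate and let the specialised hand and face
    outputs overwrite any parameters they also predict."""
    merged = dict(body)
    merged.update(hands)
    merged.update(face)
    return merged


def estimate_whole_body(
    image: object,
    body_reg: Regressor,
    hand_reg: Regressor,
    face_reg: Regressor,
    integrate: Callable[[Params, Params, Params], Params] = overwrite_integration,
) -> Params:
    """Run each regressor independently, then compose their outputs."""
    body = body_reg(image)
    hands = hand_reg(image)
    face = face_reg(image)
    return integrate(body, hands, face)


# Stub regressors returning placeholder values; real ones would be networks.
def stub_body(image: object) -> Params:
    return {
        "body_pose": [0.0] * 3,
        "left_hand_pose": [0.0] * 3,
        "right_hand_pose": [0.0] * 3,
    }


def stub_hands(image: object) -> Params:
    return {"left_hand_pose": [0.1] * 3, "right_hand_pose": [0.2] * 3}


def stub_face(image: object) -> Params:
    return {"expression": [0.5] * 2}


result = estimate_whole_body(None, stub_body, stub_hands, stub_face)
print(sorted(result))
```

Because the integration step is passed in as a plain callable, a slower but more accurate rule could be swapped in without touching the regressors, which is the kind of latency versus accuracy trade-off the abstract mentions.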
