TL;DR
This paper introduces ExPose, a fast and accurate method that regresses an expressive 3D human body, including the face and hands, from a single RGB image, despite scarce expressive training data and the few image pixels that hands and faces occupy.
Contribution
The paper presents a novel body-driven attention mechanism and a curated dataset to improve 3D human reconstruction from images, addressing the limits of scarce training data and low hand and face resolution.
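The summary does not spell out how body-driven attention works, so the following is only a sketch of one plausible reading: a body estimate made on a downsampled image is used to locate a hand or the face, and that region is then cropped from the original high-resolution image. It assumes the body stage yields 2D keypoints for each part; the names `part_box` and `crop` and the `upscale` and `pad` parameters are hypothetical and not taken from the paper.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[int, int, int, int]  # (left, top, right, bottom)


def part_box(keypoints: List[Point], upscale: float, pad: float,
             image_size: Tuple[int, int]) -> Box:
    """Square box around one part's keypoints, in full-resolution pixels.

    keypoints:  2D points of a hand or the face, predicted on the
                downsampled body image (assumed input).
    upscale:    full-resolution size divided by body-image size.
    pad:        factor by which the tight box is enlarged (hypothetical).
    image_size: (width, height) of the full-resolution image.
    """
    # Map keypoints from the downsampled body image to the original image.
    xs = [x * upscale for x, _ in keypoints]
    ys = [y * upscale for _, y in keypoints]

    # Centre and half-size of a padded square box around the keypoints.
    cx = (min(xs) + max(xs)) / 2
    cy = (min(ys) + max(ys)) / 2
    half = pad * max(max(xs) - min(xs), max(ys) - min(ys)) / 2

    # Clamp the box to the image bounds.
    width, height = image_size
    left = max(0, int(cx - half))
    top = max(0, int(cy - half))
    right = min(width, int(cx + half))
    bottom = min(height, int(cy + half))
    return left, top, right, bottom


def crop(image: List[List[int]], box: Box) -> List[List[int]]:
    """Cut the box out of an image stored as a list of pixel rows."""
    left, top, right, bottom = box
    return [row[left:right] for row in image[top:bottom]]
```

In such a setup the crop would go to a dedicated hand or face sub-network, so that part is seen at a higher resolution than the downsampled body image provides.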
Findings
ExPose outperforms existing optimization-based methods in accuracy.
It runs inference faster than those methods, with comparable or better quality.
The approach effectively captures detailed facial and hand expressions.
Abstract
To understand how people look, interact, or perform tasks, we need to quickly and accurately capture their 3D body, face, and hands together from an RGB image. Most existing methods focus only on parts of the body. A few recent approaches reconstruct full expressive 3D humans from images using 3D body models that include the face and hands. These methods are optimization-based and thus slow, prone to local optima, and require 2D keypoints as input. We address these limitations by introducing ExPose (EXpressive POse and Shape rEgression), which directly regresses the body, face, and hands, in SMPL-X format, from an RGB image. This is a hard problem due to the high dimensionality of the body and the lack of expressive training data. Additionally, hands and faces are much smaller than the body, occupying very few image pixels. This makes hand and face estimation hard when body images are…
Taxonomy
Methods: Residual Connection · Batch Normalization · Convolution · HRNet · 1x1 Convolution
