TL;DR
This paper introduces a deep learning model that predicts 3D hand gestures from body motion in conversational settings. The model enables hand gesture synthesis from body motion and improves 3D hand pose estimation from single images.
Contribution
A novel learned deep prior of body motion enables accurate prediction of 3D hand gestures from arm motion alone, outperforming previous methods and generalizing to multi-person conversations.
Findings
Outperforms previous state-of-the-art approaches in 3D hand pose estimation.
Effective in synthesizing hand gestures from body motion.
Generalizes beyond monologue data to multi-person interactions.
Abstract
We propose a novel learned deep prior of body motion for 3D hand shape synthesis and estimation in the domain of conversational gestures. Our model builds upon the insight that body motion and hand gestures are strongly correlated in non-verbal communication settings. We formulate the learning of this prior as a prediction task of 3D hand shape over time given body motion input alone. Trained with 3D pose estimations obtained from a large-scale dataset of internet videos, our hand prediction model produces convincing 3D hand gestures given only the 3D motion of the speaker's arms as input. We demonstrate the efficacy of our method on hand gesture synthesis from body motion input, and as a strong body prior for single-view image-based 3D hand pose estimation. We demonstrate that our method outperforms previous state-of-the-art approaches and can generalize beyond the monologue-based…
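The prediction-task formulation above (3D hand shape over time given body motion alone) can be illustrated with a minimal, untrained sketch. It assumes each frame of arm motion is a flat vector of joint parameters and each output frame is a flat vector of hand pose parameters. The frame sizes, the two-layer temporal convolution, and the L1 loss against pseudo-ground-truth hand estimates (standing in for 3D pose estimates from internet video) are illustrative assumptions, not the paper's architecture or training objective. All names (`BodyToHandSketch`, `temporal_conv`, `l1_loss`) are hypothetical.

```python
import random
from typing import List

Frame = List[float]  # one frame's pose vector (assumed flat layout)


def temporal_conv(seq: List[Frame], weights: List[List[List[float]]],
                  bias: List[float]) -> List[Frame]:
    """1D convolution over time with zero padding, so output length == input length.

    weights[k][o][i] is kernel tap k, output channel o, input channel i.
    """
    k_size = len(weights)
    pad = k_size // 2
    t_len, in_dim, out_dim = len(seq), len(seq[0]), len(bias)
    out = []
    for t in range(t_len):
        frame = list(bias)
        for k in range(k_size):
            src = t + k - pad
            if 0 <= src < t_len:  # frames outside the sequence count as zeros
                x = seq[src]
                for o in range(out_dim):
                    w = weights[k][o]
                    frame[o] += sum(w[i] * x[i] for i in range(in_dim))
        out.append(frame)
    return out


def relu(seq: List[Frame]) -> List[Frame]:
    return [[max(0.0, v) for v in f] for f in seq]


def init_kernel(k_size: int, out_dim: int, in_dim: int, rng: random.Random):
    return [[[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
             for _ in range(out_dim)] for _ in range(k_size)]


class BodyToHandSketch:
    """Hypothetical model: arm-motion sequence -> hand-pose sequence (untrained)."""

    def __init__(self, body_dim: int, hand_dim: int, hidden: int = 16,
                 k_size: int = 3, seed: int = 0):
        rng = random.Random(seed)
        self.w1 = init_kernel(k_size, hidden, body_dim, rng)
        self.b1 = [0.0] * hidden
        self.w2 = init_kernel(k_size, hand_dim, hidden, rng)
        self.b2 = [0.0] * hand_dim

    def predict(self, arm_motion: List[Frame]) -> List[Frame]:
        # Each output frame sees a short temporal window of arm motion.
        hidden = relu(temporal_conv(arm_motion, self.w1, self.b1))
        return temporal_conv(hidden, self.w2, self.b2)


def l1_loss(pred: List[Frame], target: List[Frame]) -> float:
    """Mean absolute error against (assumed) pseudo-ground-truth hand poses."""
    n = sum(len(f) for f in pred)
    return sum(abs(p - q) for fp, fq in zip(pred, target)
               for p, q in zip(fp, fq)) / n


# Usage with hypothetical sizes: 8 frames, 12 arm parameters, 30 hand parameters.
T, BODY_DIM, HAND_DIM = 8, 12, 30
rng = random.Random(1)
arm_motion = [[rng.gauss(0.0, 1.0) for _ in range(BODY_DIM)] for _ in range(T)]
model = BodyToHandSketch(BODY_DIM, HAND_DIM)
hands = model.predict(arm_motion)
print(len(hands), len(hands[0]))  # 8 30
```

The temporal convolution is one simple way to let each predicted hand frame depend on neighboring arm frames; a real system would be trained by gradient descent in a deep-learning framework, which this sketch omits.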
