DexAvatar: 3D Sign Language Reconstruction with Hand and Body Pose Priors
Kaustubh Kundu, Hrishav Bakul Barua, Lucy Robertson-Bell, Zhixi Cai, Kalin Stefanov

TL;DR
DexAvatar is a new framework that reconstructs detailed 3D hand and body movements from monocular sign language videos using learned priors, significantly improving pose estimation accuracy over existing methods.
Contribution
It introduces a novel approach combining hand and body pose priors for accurate 3D reconstruction from in-the-wild videos, addressing limitations of current datasets and estimation techniques.
Findings
Achieves a 35.11% improvement in pose estimation accuracy on the SGNify dataset.
Effectively reconstructs fine-grained hand articulations and body movements.
Outperforms state-of-the-art methods in sign language pose estimation.
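The summary above does not say which metric or baseline the 35.11% figure refers to. As a minimal sketch, assuming the figure is a relative reduction in an error-style metric (lower is better), such a percentage is commonly computed as below. The function name and the numbers are hypothetical and chosen only for illustration; they are not results from the paper.

```python
def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Return the percentage reduction of new_error relative to baseline_error.

    Assumes an error-style metric where lower values are better.
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error * 100.0


# Hypothetical illustration values, not taken from the paper.
example = relative_improvement(baseline_error=100.0, new_error=75.0)
print(f"{example:.2f}% improvement")
```

A relative figure like this depends on the baseline it is measured against, so it is only comparable across papers when the metric and the baseline method match.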
Abstract
The trend in sign language generation is centered around data-driven generative methods that require vast amounts of precise 2D and 3D human pose data to achieve an acceptable generation quality. However, currently, most sign language datasets are video-based and limited to automatically reconstructed 2D human poses (i.e., keypoints) and lack accurate 3D information. Furthermore, the existing state of the art in automatic 3D human pose estimation from sign language videos is prone to self-occlusion, noise, and motion blur effects, resulting in poor reconstruction quality. In response to this, we introduce DexAvatar, a novel framework to reconstruct bio-mechanically accurate fine-grained hand articulations and body movements from in-the-wild monocular sign language videos, guided by learned 3D hand and body priors. DexAvatar achieves strong performance on the SGNify motion capture dataset,…
Taxonomy
Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Human Motion and Animation
