Text-Driven 3D Hand Motion Generation from Sign Language Data
Léore Bensabath, Mathis Petrovich, Gül Varol

TL;DR
This paper introduces a large-scale dataset and a diffusion-based generative model for creating 3D hand motions from natural language descriptions, advancing sign language understanding and synthesis.
Contribution
We automatically generate a large paired dataset of 3D hand motions and text, and develop a robust text-conditioned diffusion model for diverse sign language motion synthesis.
Findings
Model generalizes to unseen sign categories
Effective across different sign languages
Supports non-sign hand movements
Abstract
Our goal is to train a generative model of 3D hand motions, conditioned on natural language descriptions specifying motion characteristics such as handshapes, locations, and finger/hand/arm movements. To this end, we automatically build pairs of 3D hand motions and their associated textual labels at unprecedented scale. Specifically, we leverage a large-scale sign language video dataset, along with noisy pseudo-annotated sign categories, which we translate into hand motion descriptions via an LLM that utilizes a dictionary of sign attributes, as well as our complementary motion-script cues. This data enables training a text-conditioned hand motion diffusion model, HandMDM, that is robust across domains: unseen sign categories from the same sign language, signs from another sign language, and non-sign hand movements. We contribute extensive experimental investigation of these…
