Text-Driven 3D Hand Motion Generation from Sign Language Data

Léore Bensabath; Mathis Petrovich; Gül Varol

arXiv:2508.15902 · cs.CV · August 27, 2025

TL;DR

This paper automatically builds a large-scale dataset of 3D hand motions paired with natural language descriptions, derived from sign language videos, and uses it to train a diffusion-based model that generates 3D hand motions from text.

Contribution

We automatically build a large paired dataset of 3D hand motions and text descriptions from sign language videos, and train HandMDM, a text-conditioned diffusion model that synthesizes diverse hand motions and remains robust across domains.
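The contribution above centers on a text-conditioned diffusion model. As a generic illustration of how such models are commonly trained, and not the paper's HandMDM implementation, the following sketch noises a clean motion sequence with the standard DDPM forward process and scores a denoiser that predicts the clean sequence from the noisy one, the timestep, and a text embedding. The linear noise schedule, the clean-sample prediction target, the 60×126 motion shape, and the `denoiser`/`text_emb` placeholders are all assumptions made for illustration.

```python
from typing import Callable

import numpy as np

# Hypothetical denoiser signature: (noisy motion, timestep, text embedding) -> predicted clean motion.
Denoiser = Callable[[np.ndarray, int, np.ndarray], np.ndarray]


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> np.ndarray:
    """Noise variances beta_1..beta_T, linearly spaced (a common DDPM default, assumed here)."""
    return np.linspace(beta_start, beta_end, num_steps)


def q_sample(x0: np.ndarray, t: int, alpha_bar: np.ndarray, noise: np.ndarray) -> np.ndarray:
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(alpha_bar_t) * x_0, (1 - alpha_bar_t) * I)."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise


def denoising_loss(denoiser: Denoiser, x0: np.ndarray, text_emb: np.ndarray,
                   alpha_bar: np.ndarray, rng: np.random.Generator) -> float:
    """One Monte Carlo sample of the training loss with a clean-motion (x_0) prediction target."""
    t = int(rng.integers(0, len(alpha_bar)))   # random diffusion step
    noise = rng.standard_normal(x0.shape)      # Gaussian noise with the motion's shape
    x_t = q_sample(x0, t, alpha_bar, noise)    # corrupted motion
    x0_pred = denoiser(x_t, t, text_emb)       # text-conditioned prediction of the clean motion
    return float(np.mean((x0_pred - x0) ** 2))


# Usage with placeholder data: 60 frames of a hypothetical 126-D hand pose vector
# (e.g. 2 hands x 21 joints x 3 coordinates) and a random stand-in for a text embedding.
rng = np.random.default_rng(0)
betas = linear_beta_schedule(1000)
alpha_bar = np.cumprod(1.0 - betas)
x0 = rng.standard_normal((60, 126))
text_emb = rng.standard_normal(512)
zero_denoiser: Denoiser = lambda x_t, t, c: np.zeros_like(x_t)
print(denoising_loss(zero_denoiser, x0, text_emb, alpha_bar, rng))
```

In a real model the placeholder denoiser would be a neural network (for example a transformer over frames) that receives the text embedding from a pretrained text encoder; which architecture and target HandMDM actually uses is not stated in this summary.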

Findings

01. Generalizes to unseen sign categories from the same sign language

02. Transfers to signs from another sign language

03. Handles non-sign hand movements

Abstract

Our goal is to train a generative model of 3D hand motions, conditioned on natural language descriptions specifying motion characteristics such as handshapes, locations, and finger/hand/arm movements. To this end, we automatically build pairs of 3D hand motions and their associated textual labels at unprecedented scale. Specifically, we leverage a large-scale sign language video dataset, along with noisy pseudo-annotated sign categories, which we translate into hand motion descriptions via an LLM that utilizes a dictionary of sign attributes, as well as our complementary motion-script cues. This data enables training a text-conditioned hand motion diffusion model, HandMDM, that is robust across domains: unseen sign categories from the same sign language, signs from another sign language, and non-sign hand movements. We contribute extensive experimental investigation of these…
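The abstract describes turning pseudo-labelled sign categories into text by prompting an LLM with dictionary attributes plus motion-script cues. The dictionary schema, cue vocabulary, and prompt wording are not given in this summary, so the following is a hypothetical illustration only: `SignAttributes`, `wrist_motion_cue`, and `build_prompt` are invented names, the cue is a single coarse rule on net wrist displacement, and the LLM call itself is omitted.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class SignAttributes:
    """Hypothetical dictionary entry; the real attribute schema is not specified here."""
    handshape: str
    location: str
    movement: str


def wrist_motion_cue(wrist_xyz: np.ndarray, min_disp: float = 0.05) -> str:
    """Hypothetical motion-script cue from a (frames, 3) wrist trajectory.

    Assumes x points to the signer's right, y up, z forward, in metres, and reports
    only the dominant axis of the net start-to-end displacement.
    """
    disp = wrist_xyz[-1] - wrist_xyz[0]
    axis = int(np.argmax(np.abs(disp)))
    if abs(disp[axis]) < min_disp:
        return "the hand stays roughly in place"
    directions = [("right", "left"), ("up", "down"), ("forward", "backward")]
    positive, negative = directions[axis]
    return f"the hand moves {positive if disp[axis] > 0 else negative}"


def build_prompt(category: str, dictionary: dict[str, SignAttributes], cues: list[str]) -> str:
    """Assemble an LLM prompt; uses only the cues if the category is missing from the dictionary."""
    lines = ["Write a short description of this hand motion, covering handshape, location and movement."]
    attrs = dictionary.get(category)
    if attrs is not None:
        lines += [
            f"Handshape: {attrs.handshape}",
            f"Location: {attrs.location}",
            f"Movement: {attrs.movement}",
        ]
    lines.append("Observed motion cues: " + "; ".join(cues))
    return "\n".join(lines)


# Usage with made-up values: a wrist rising 20 cm over 30 frames.
traj = np.linspace([0.0, 0.0, 0.0], [0.0, 0.2, 0.0], num=30)
demo_dictionary = {"EXAMPLE_SIGN": SignAttributes("flat hand", "in front of the chest", "upward")}
print(build_prompt("EXAMPLE_SIGN", demo_dictionary, [wrist_motion_cue(traj)]))
```

Keeping the geometry-derived cues separate from the dictionary lookup, as in this sketch, lets a pipeline still produce a description when a noisy category label has no dictionary entry.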
