Motion is the Choreographer: Learning Latent Pose Dynamics for Seamless Sign Language Generation
Jiayi He, Xu Wang, Shengeng Tang, Yaxiong Wang, Lechao Cheng, Dan Guo

TL;DR
This paper introduces a sign language video generation method that separates motion semantics from signer identity, enabling realistic, flexible, signer-independent synthesis from only one recording per sign.
Contribution
It proposes a two-phase framework that constructs a signer-independent motion lexicon and synthesizes continuous motion trajectories, improving generalization and personalization in sign language video generation.
Findings
Disentangling motion from identity improves synthesis quality.
The method requires only one recording per sign for the lexicon.
It enables flexible signer personalization with high realism.
Abstract
Sign language video generation requires producing natural signing motions with realistic appearances under precise semantic control, yet faces two critical challenges: excessive signer-specific data requirements and poor generalization. We propose a new paradigm for sign language video generation that decouples motion semantics from signer identity through a two-phase synthesis framework. First, we construct a signer-independent multimodal motion lexicon, where each gloss is stored as identity-agnostic pose, gesture, and 3D mesh sequences, requiring only one recording per sign. This compact representation enables our second key innovation: a discrete-to-continuous motion synthesis stage that transforms retrieved gloss sequences into temporally coherent motion trajectories, followed by identity-aware neural rendering to produce photorealistic videos of arbitrary signers. Unlike prior…
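To make the two-phase idea concrete, here is a minimal sketch of the retrieve-then-smooth flow described in the abstract. It is an illustration under stated assumptions, not the authors' method: the lexicon is a toy dictionary of hand-written 2D pose frames, the names `retrieve`, `bridge`, and `stitch` are hypothetical, and plain linear interpolation stands in for the paper's learned discrete-to-continuous motion synthesis stage. The identity-aware rendering step is omitted entirely.

```python
from typing import Dict, List

Frame = List[float]        # flattened joint coordinates for one time step
PoseSequence = List[Frame]  # one identity-agnostic recording of a gloss


def retrieve(lexicon: Dict[str, PoseSequence], glosses: List[str]) -> List[PoseSequence]:
    """Look up the single stored pose sequence for each gloss, in order."""
    return [lexicon[g] for g in glosses]


def bridge(start: Frame, end: Frame, n: int) -> PoseSequence:
    """Return n frames linearly interpolated strictly between start and end.

    ASSUMPTION: linear interpolation is a stand-in for the learned
    discrete-to-continuous synthesis stage, which the abstract does not detail.
    """
    frames = []
    for k in range(1, n + 1):
        w = k / (n + 1)
        frames.append([(1 - w) * a + w * b for a, b in zip(start, end)])
    return frames


def stitch(segments: List[PoseSequence], transition_frames: int = 3) -> PoseSequence:
    """Concatenate gloss segments, inserting bridge frames at each boundary."""
    trajectory = list(segments[0])
    for nxt in segments[1:]:
        trajectory.extend(bridge(trajectory[-1], nxt[0], transition_frames))
        trajectory.extend(nxt)
    return trajectory


# Toy lexicon (hypothetical data): one short recording per gloss.
TOY_LEXICON = {
    "HELLO": [[0.0, 0.0], [1.0, 0.0]],
    "WORLD": [[5.0, 4.0], [5.0, 6.0]],
}

trajectory = stitch(retrieve(TOY_LEXICON, ["HELLO", "WORLD"]))
```

With the default of three transition frames, the two 2-frame segments yield a 7-frame trajectory whose middle frames move evenly from the last HELLO pose to the first WORLD pose. A learned model would replace `bridge` to produce transitions that respect co-articulation rather than straight-line motion.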
