StgcDiff: Spatial-Temporal Graph Condition Diffusion for Sign Language Transition Generation

Jiashu He; Jiayi He; Shengeng Tang; Huixia Ben; Lechao Cheng; Richang Hong

arXiv:2506.13156·cs.CV·June 17, 2025

TL;DR

StgcDiff is a novel graph-based diffusion framework that generates smooth, coherent sign language transitions by modeling complex spatial-temporal dependencies, significantly improving over existing concatenation methods.

Contribution

We introduce a structure-aware, graph-based diffusion model with a Sign-GCN module for realistic sign language transition generation, capturing spatial-temporal cues more effectively.
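To make the idea of a graph-based skeleton representation concrete, here is a minimal sketch of a single spatial graph-convolution layer over a toy skeleton. It is not the paper's Sign-GCN module, whose details are not given on this page. It is the generic symmetric-normalized graph convolution, ReLU(Â X W), and the three-joint chain, the feature values, and all function names are assumptions chosen for illustration.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def normalized_adjacency(num_joints: int, bones: List[Tuple[int, int]]) -> Matrix:
    """Return D^{-1/2} (A + I) D^{-1/2} for an undirected skeleton graph."""
    a = [[0.0] * num_joints for _ in range(num_joints)]
    for i in range(num_joints):
        a[i][i] = 1.0  # self-loop so each joint keeps its own features
    for i, j in bones:
        a[i][j] = 1.0
        a[j][i] = 1.0
    deg = [sum(row) for row in a]
    return [
        [a[i][j] / ((deg[i] ** 0.5) * (deg[j] ** 0.5)) for j in range(num_joints)]
        for i in range(num_joints)
    ]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain matrix product of two nested-list matrices."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def graph_conv(features: Matrix, adj_norm: Matrix, weight: Matrix) -> Matrix:
    """One layer: mix features along bones, project, then apply ReLU."""
    projected = matmul(matmul(adj_norm, features), weight)
    return [[max(0.0, v) for v in row] for row in projected]


# Hypothetical 3-joint chain: shoulder (0) - elbow (1) - wrist (2).
BONES = [(0, 1), (1, 2)]
A_HAT = normalized_adjacency(3, BONES)

# One frame of 2-D joint coordinates (made-up values) and an identity weight.
X = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
H = graph_conv(X, A_HAT, W)
```

Each output row of `H` blends a joint's features with those of its direct neighbours only, so the shoulder and wrist do not exchange information in a single layer. A spatial-temporal variant would additionally link each joint to itself in adjacent frames; that extension is not shown here.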

Findings

01. Outperforms existing methods on the PHOENIX14T, USTC-CSL100, and USTC-SLR500 datasets.

02. Produces more natural and semantically accurate sign language transitions.

03. Effectively models complex spatial-temporal dependencies in sign language data.

Abstract

Sign language transition generation seeks to convert discrete sign language segments into continuous sign videos by synthesizing smooth transitions. However, most existing methods merely concatenate isolated signs, resulting in poor visual coherence and semantic accuracy in the generated videos. Unlike textual languages, sign language is inherently rich in spatial-temporal cues, making it more complex to model. To address this, we propose StgcDiff, a graph-based conditional diffusion framework that generates smooth transitions between discrete signs by capturing the unique spatial-temporal dependencies of sign language. Specifically, we first train an encoder-decoder architecture to learn a structure-aware representation of spatial-temporal skeleton sequences. Next, we optimize a diffusion denoiser conditioned on the representations learned by the pre-trained encoder, which is tasked with…
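The second stage described in the abstract, a denoiser conditioned on encoder representations, can be illustrated with the standard DDPM forward (noising) process. The sketch below is an assumption-laden illustration, not the authors' implementation: the linear beta schedule, the noise-prediction target, the flat-list data layout, and every function name are hypothetical, and the abstract is truncated before it states the denoiser's actual objective.

```python
import math
import random
from typing import Dict, List


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4,
                         beta_end: float = 0.02) -> List[float]:
    """Linearly spaced noise variances, a common DDPM default (assumed here)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas: List[float]) -> List[float]:
    """alpha_bar_t = prod_{s<=t} (1 - beta_s)."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def q_sample(x0: List[float], t: int, alpha_bars: List[float],
             noise: List[float]) -> List[float]:
    """Closed-form noising: x_t = sqrt(alpha_bar_t) x0 + sqrt(1 - alpha_bar_t) eps."""
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    return [signal * x + sigma * e for x, e in zip(x0, noise)]


def denoiser_training_pair(transition: List[float], condition: List[float],
                           alpha_bars: List[float],
                           rng: random.Random) -> Dict[str, object]:
    """Build one training example for a hypothetical conditional denoiser.

    `transition` stands in for flattened transition-frame poses and
    `condition` for the encoder's representation of the surrounding signs.
    """
    t = rng.randrange(len(alpha_bars))
    noise = [rng.gauss(0.0, 1.0) for _ in transition]
    return {
        "x_t": q_sample(transition, t, alpha_bars, noise),
        "t": t,
        "condition": condition,   # passed to the denoiser unchanged
        "target_noise": noise,    # assumed regression target
    }


ALPHA_BARS = cumulative_alphas(linear_beta_schedule(1000))
PAIR = denoiser_training_pair([0.1, 0.2, 0.3], [1.0, -1.0], ALPHA_BARS,
                              random.Random(0))
```

In this setup only the transition frames are noised, while the conditioning vector stays clean at every step. That is the usual way to condition a diffusion model, though whether StgcDiff follows it exactly is not stated in the text above.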

Taxonomy

Topics: Hand Gesture Recognition Systems · Hearing Impairment and Communication · Speech and dialogue systems