GLOS: Sign Language Generation with Temporally Aligned Gloss-Level Conditioning
Taeryung Lee, Hyeongjin Nam, Gyeongsik Moon, and Kyoung Mu Lee

TL;DR
GLOS introduces a sign language generation framework that uses temporally aligned gloss-level conditioning and a new fusion module to improve lexical order and semantic accuracy in generated signs.
Contribution
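The paper presents gloss-level conditioning, in which gloss embeddings are temporally aligned with the motion sequence, together with a new fusion module. The combination improves lexical ordering, semantic accuracy, and word-level control in sign language generation.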
Findings
Outperforms prior methods on CSL-Daily and Phoenix-2014T datasets.
Generates signs with correct lexical order and high semantic accuracy.
Enables fine-grained control of signs through word-level semantics.
Abstract
Sign language generation (SLG), or text-to-sign generation, bridges the gap between signers and non-signers. Despite recent progress in SLG, existing methods still often suffer from incorrect lexical ordering and low semantic accuracy. This is primarily due to sentence-level conditioning, which encodes the entire sentence of the input text into a single feature vector as a condition for SLG. This approach fails to capture the temporal structure of sign language and lacks the granularity of word-level semantics, often leading to disordered sign sequences and ambiguous motions. To overcome these limitations, we propose GLOS, a sign language generation framework with temporally aligned gloss-level conditioning. First, we employ gloss-level conditions, which we define as sequences of gloss embeddings temporally aligned with the motion sequence. This enables the model to access both the…
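The contrast the abstract draws between sentence-level and gloss-level conditioning can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, mean pooling stands in for a sentence encoder, and the alignment assumes each gloss covers an equal-length, contiguous span of frames, whereas the excerpt above does not say how GLOS obtains its alignment.

```python
from typing import List, Sequence

Vector = List[float]


def sentence_level_condition(gloss_embeddings: Sequence[Vector]) -> Vector:
    """Collapse all gloss embeddings into one vector.

    Mean pooling is an illustrative stand-in for a sentence encoder.
    The result does not depend on the order of the glosses.
    """
    n = len(gloss_embeddings)
    dim = len(gloss_embeddings[0])
    return [sum(e[d] for e in gloss_embeddings) / n for d in range(dim)]


def gloss_level_condition(
    gloss_embeddings: Sequence[Vector], num_frames: int
) -> List[Vector]:
    """Return one condition vector per motion frame.

    Assumption (hypothetical): glosses occupy equal-length, contiguous
    segments of the motion sequence, in sentence order.
    """
    n = len(gloss_embeddings)
    if n == 0 or num_frames < n:
        raise ValueError("need at least one frame per gloss")
    # Frame t falls in segment floor(t * n / num_frames).
    return [list(gloss_embeddings[t * n // num_frames]) for t in range(num_frames)]


# Three toy 2-D gloss embeddings and a 6-frame motion sequence.
glosses = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
frames = gloss_level_condition(glosses, num_frames=6)
pooled = sentence_level_condition(glosses)
```

In this toy setup, reversing the gloss order leaves `pooled` unchanged but reverses `frames`, which is the ordering information a single sentence vector cannot carry.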
Taxonomy
Topics: Hand Gesture Recognition Systems · Hearing Impairment and Communication · Social Robot Interaction and HRI
