GLOS: Sign Language Generation with Temporally Aligned Gloss-Level Conditioning

Taeryung Lee, Hyeongjin Nam, Gyeongsik Moon, and Kyoung Mu Lee

arXiv:2506.07460 · cs.CV · June 10, 2025

TL;DR

GLOS introduces a sign language generation framework that uses temporally aligned gloss-level conditioning and a new fusion module to improve lexical order and semantic accuracy in generated signs.

Contribution

The paper presents a novel gloss-level conditioning approach with temporal alignment and a fusion module, enhancing sign language generation accuracy and control.

Findings

1. Outperforms prior SLG methods on the CSL-Daily and Phoenix-2014T datasets.
2. Generates signs with correct lexical order and high semantic accuracy.
3. Enables fine-grained control of signs through word-level semantics.

Abstract

Sign language generation (SLG), or text-to-sign generation, bridges the gap between signers and non-signers. Despite recent progress in SLG, existing methods still often suffer from incorrect lexical ordering and low semantic accuracy. This is primarily due to the sentence-level condition, which encodes the entire input text into a single feature vector that serves as the condition for SLG. This approach fails to capture the temporal structure of sign language and lacks the granularity of word-level semantics, often leading to disordered sign sequences and ambiguous motions. To overcome these limitations, we propose GLOS, a sign language generation framework with temporally aligned gloss-level conditioning. First, we employ gloss-level conditions, which we define as sequences of gloss embeddings temporally aligned with the motion sequence. This enables the model to access both the…
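The contrast the abstract draws between a single sentence-level condition vector and a sequence of gloss embeddings aligned with the motion frames can be illustrated with a toy sketch. This is not GLOS's implementation. The gloss embeddings are made-up 2-D vectors, the sentence encoder is replaced by mean pooling, and the frame-to-gloss alignment is a hypothetical uniform split (`uniform_alignment`), because the abstract excerpt does not say how the alignment is obtained. All function names are illustrative.

```python
from typing import List

Vector = List[float]


def sentence_level_condition(gloss_embs: List[Vector], num_frames: int) -> List[Vector]:
    """Collapse the whole sequence into ONE vector (mean pooling here, as a
    stand-in for a sentence encoder) and repeat it for every motion frame."""
    dim = len(gloss_embs[0])
    pooled = [sum(e[d] for e in gloss_embs) / len(gloss_embs) for d in range(dim)]
    return [list(pooled) for _ in range(num_frames)]


def uniform_alignment(num_glosses: int, num_frames: int) -> List[int]:
    """Hypothetical frame-to-gloss alignment: split the frames into contiguous,
    nearly equal segments, one per gloss, in gloss order."""
    return [t * num_glosses // num_frames for t in range(num_frames)]


def gloss_level_condition(gloss_embs: List[Vector], frame_to_gloss: List[int]) -> List[Vector]:
    """Give each frame the embedding of the gloss it is aligned to, so the
    condition sequence has the same length and order as the motion."""
    return [list(gloss_embs[g]) for g in frame_to_gloss]


# Toy 2-D gloss embeddings (made up for illustration).
vocab = {"I": [1.0, 0.0], "GO": [0.0, 1.0], "HOME": [1.0, 1.0]}
embs = [vocab[g] for g in ["I", "GO", "HOME"]]
reordered = [vocab[g] for g in ["HOME", "GO", "I"]]

align = uniform_alignment(len(embs), num_frames=6)
print(align)  # [0, 0, 1, 1, 2, 2]
print(gloss_level_condition(embs, align))

# Mean pooling ignores order, so both gloss orders give identical sentence-level conditions.
print(sentence_level_condition(embs, 6) == sentence_level_condition(reordered, 6))  # True
# The aligned gloss-level conditions differ, because each frame sees its own gloss.
print(gloss_level_condition(embs, align) == gloss_level_condition(reordered, align))  # False
```

Mean pooling is deliberately order-invariant here to make the contrast stark. A learned sentence encoder can be order-sensitive, but it still compresses the sentence into a single vector with no per-frame correspondence to individual signs, which is the limitation the abstract points to.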

Taxonomy

Topics: Hand Gesture Recognition Systems · Hearing Impairment and Communication · Social Robot Interaction and HRI