A Spatio-Temporal Representation Learning as an Alternative to Traditional Glosses in Sign Language Translation and Production

Eui Jun Hwang; Sukmin Cho; Huije Lee; Youngwoo Yoon; Jong C. Park

arXiv:2407.02854 · cs.CL · December 5, 2024

Open Access

TL;DR

This paper introduces UniGloR, a spatio-temporal representation framework that replaces traditional glosses in sign language translation and production by capturing the dynamic features of signing that glosses strip away.

Contribution

We propose UniGloR, a novel dense spatio-temporal representation derived from keypoints, addressing gloss annotation limitations and enhancing sign language processing tasks.

Findings

01. Outperforms previous methods on PHOENIX14T and How2Sign datasets.

02. Effectively captures sign language dynamics without gloss annotations.

03. Matches or exceeds state-of-the-art performance in SLT and SLP.

Abstract

This work addresses the challenges associated with the use of glosses in both Sign Language Translation (SLT) and Sign Language Production (SLP). While glosses have long been used as a bridge between sign language and spoken language, they come with two major limitations that impede the advancement of sign language systems. First, annotating the glosses is a labor-intensive and time-consuming process, which limits the scalability of datasets. Second, the glosses oversimplify sign language by stripping away its spatio-temporal dynamics, reducing complex signs to basic labels and missing the subtle movements essential for precise interpretation. To address these limitations, we introduce Universal Gloss-level Representation (UniGloR), a framework designed to capture the spatio-temporal features inherent in sign language, providing a more dynamic and detailed alternative to the use of the…

Taxonomy

Topics: Hand Gesture Recognition Systems · Hearing Impairment and Communication · Subtitles and Audiovisual Media