Autoregressive Sign Language Production: A Gloss-Free Approach with Discrete Representations
Eui Jun Hwang, Huije Lee, Jong C. Park

TL;DR
This paper introduces a novel gloss-free sign language production method using vector quantization to derive discrete representations, improving translation quality and coherence in sign language generation.
Contribution
It proposes the Sign Language Vector Quantization Network, a new approach that leverages discrete representations and latent alignment for direct spoken-to-sign translation.
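The discrete representations mentioned above come from vector quantization. Below is a minimal sketch of the standard nearest-neighbour lookup at the core of VQ, assuming each sign pose frame has already been encoded into a continuous feature vector. The encoder, the codebook size, and training details such as commitment losses or the straight-through estimator are not described on this page and are omitted. The function names `squared_distance` and `quantize` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(
    frames: Sequence[Vector], codebook: Sequence[Vector]
) -> Tuple[List[int], List[List[float]]]:
    """Map each continuous feature vector to its nearest codebook entry.

    Returns the discrete token sequence (codebook indices) and the
    quantized vectors that replace the continuous features.
    """
    indices: List[int] = []
    quantized: List[List[float]] = []
    for frame in frames:
        # Pick the codebook entry closest to this frame's features.
        best = min(range(len(codebook)), key=lambda k: squared_distance(frame, codebook[k]))
        indices.append(best)
        quantized.append(list(codebook[best]))
    return indices, quantized


if __name__ == "__main__":
    # Toy 2-D features and a 3-entry codebook (illustrative values only).
    codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
    frames = [[0.1, -0.1], [0.9, 1.2], [0.2, 1.7]]
    tokens, _ = quantize(frames, codebook)
    print(tokens)  # [0, 1, 2]
```

Once a pose sequence is a sequence of integer tokens like this, a text-conditioned model can predict it token by token, which is what makes an autoregressive formulation possible.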
Findings
Outperforms previous SLP methods in evaluations
Shows that Back-Translation and Fréchet Gesture Distance are reliable evaluation metrics
Supports manual and non-manual sign elements
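For reference, here is the standard Fréchet distance between two Gaussians, which is the form Fréchet Gesture Distance is usually given in (analogous to FID). The assumption is that Gaussians are fitted to features of real and generated pose sequences, extracted by a pretrained feature network that this page does not specify. Here $\mu_r, \Sigma_r$ are the mean and covariance of the real-sequence features, and $\mu_g, \Sigma_g$ are those of the generated sequences. Lower values indicate generated motion whose feature distribution is closer to real signing.

```latex
\mathrm{FGD} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```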
Abstract
Gloss-free Sign Language Production (SLP) offers a direct translation of spoken language sentences into sign language, bypassing the need for gloss intermediaries. This paper presents the Sign Language Vector Quantization Network, a novel approach to SLP that leverages vector quantization to derive discrete representations from sign pose sequences. Our method, rooted in both manual and non-manual elements of signing, supports advanced decoding methods and integrates latent-level alignment for enhanced linguistic coherence. Through comprehensive evaluations, we demonstrate that our method outperforms prior SLP methods and highlight the reliability of Back-Translation and Fréchet Gesture Distance as evaluation metrics.
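The abstract does not say which decoding methods the model uses. The sketch below shows only the simplest autoregressive case, greedy decoding over discrete sign codes, followed by a codebook lookup that turns the codes back into feature vectors. The scorer stands in for a text-conditioned model and is entirely hypothetical, as are the names `greedy_decode`, `codes_to_features`, and `toy_scorer`. Beam search or sampling would replace the `max` step.

```python
from typing import Callable, List, Sequence

# Hypothetical scorer: given the codes generated so far, return one score per
# codebook entry plus one extra score for end-of-sequence (index == vocab_size).
# In a real system this would be a model conditioned on the spoken-language text.
NextTokenScorer = Callable[[List[int]], Sequence[float]]


def greedy_decode(score_next: NextTokenScorer, vocab_size: int, max_len: int) -> List[int]:
    """Generate sign codes one at a time, always taking the highest-scoring token."""
    eos = vocab_size
    tokens: List[int] = []
    for _ in range(max_len):
        scores = score_next(tokens)
        best = max(range(len(scores)), key=lambda k: scores[k])
        if best == eos:
            break  # the model chose to stop
        tokens.append(best)
    return tokens


def codes_to_features(tokens: Sequence[int], codebook: Sequence[Sequence[float]]) -> List[List[float]]:
    """Replace each discrete code with its codebook vector (input to a pose decoder)."""
    return [list(codebook[t]) for t in tokens]


def toy_scorer(prefix: List[int]) -> List[float]:
    """Toy stand-in for a model: emits codes 2, 0, 1 and then end-of-sequence (3)."""
    plan = [2, 0, 1]
    target = plan[len(prefix)] if len(prefix) < len(plan) else 3
    return [1.0 if k == target else 0.0 for k in range(4)]


if __name__ == "__main__":
    codes = greedy_decode(toy_scorer, vocab_size=3, max_len=10)
    print(codes)  # [2, 0, 1]
    print(codes_to_features(codes, [[0.0, 0.0], [1.0, 1.0], [0.0, 2.0]]))
```

Keeping the decoding loop separate from the scorer is what lets different decoding strategies be swapped in without changing the model.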
Taxonomy
Topics: Hand Gesture Recognition Systems · Hearing Impairment and Communication · Speech and dialogue systems
