Bridging Sign and Spoken Languages: Pseudo Gloss Generation for Sign Language Translation
Jianyuan Guo, Peike Li, Trevor Cohn

TL;DR
This paper introduces a gloss-free pseudo gloss generation method for sign language translation (SLT): a large language model drafts pseudo glosses from spoken language text, and weakly supervised learning improves their alignment with the sign video, so the method does not rely on costly gloss annotations.
Contribution
The paper proposes a framework that uses LLMs to generate pseudo glosses from spoken language text, enhancing sign language translation without expert-annotated gloss labels.
Findings
Outperforms previous gloss-free methods on SLT benchmarks.
Achieves competitive results compared to gloss-based approaches.
Uses weakly supervised learning for better alignment of pseudo glosses.
Abstract
Sign Language Translation (SLT) aims to map sign language videos to spoken language text. A common approach relies on gloss annotations as an intermediate representation, decomposing SLT into two sub-tasks: video-to-gloss recognition and gloss-to-text translation. While effective, this paradigm depends on expert-annotated gloss labels, which are costly and rarely available in existing datasets, limiting its scalability. To address this challenge, we propose a gloss-free pseudo gloss generation framework that eliminates the need for human-annotated glosses while preserving the structured intermediate representation. Specifically, we prompt a Large Language Model (LLM) with a few example text-gloss pairs using in-context learning to produce draft sign glosses from spoken language text. To enhance the correspondence between LLM-generated pseudo glosses and the sign sequences in video, we…
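The in-context prompting step described in the abstract can be illustrated with a minimal sketch. It only shows how a few text-gloss pairs might be assembled into a few-shot prompt and how a reply might be split into gloss tokens. The function names, the prompt wording, and the example pair are all assumptions made for illustration; the paper's actual prompt and LLM are not given here, and no model is called.

```python
from typing import List, Tuple


def build_gloss_prompt(examples: List[Tuple[str, str]], sentence: str) -> str:
    """Assemble a few-shot prompt asking an LLM for a draft gloss sequence.

    `examples` holds (spoken text, gloss sequence) pairs. The instruction
    wording below is hypothetical, not the prompt used in the paper.
    """
    lines = ["Convert each spoken-language sentence into a sequence of sign glosses."]
    for text, gloss in examples:
        lines.append(f"Text: {text}")
        lines.append(f"Gloss: {gloss}")
    # The query sentence goes last, with the gloss left open for the model.
    lines.append(f"Text: {sentence}")
    lines.append("Gloss:")
    return "\n".join(lines)


def parse_gloss_reply(reply: str) -> List[str]:
    """Split the first line of a model reply into uppercase gloss tokens."""
    stripped = reply.strip()
    first_line = stripped.splitlines()[0] if stripped else ""
    return [token.upper() for token in first_line.split()]


# Hypothetical example pair, invented for this sketch.
prompt = build_gloss_prompt(
    [("tomorrow it will rain", "TOMORROW RAIN")],
    "it is cold today",
)
```

The resulting `prompt` string would be sent to whichever LLM is available, and `parse_gloss_reply` would turn its answer into a draft gloss list. Such drafts follow the word order of the spoken text rather than the signing order, which is the correspondence problem the abstract goes on to address.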
Taxonomy
Topics: Hand Gesture Recognition Systems · Hearing Impairment and Communication · Speech and dialogue systems
