SCOPE: Sign Language Contextual Processing with Embedding from LLMs
Yuqi Liu, Wenqian Zhang, Sihan Ren, Chengyu Huang, Jingyi Yu, Lan Xu

TL;DR
SCOPE is a context-aware sign language recognition and translation framework that combines dialogue context with large language models. It improves performance on multiple datasets, and its results are validated by community surveys.
Contribution
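The paper presents a novel multi-modal, context-aware framework for sign language recognition and translation. It incorporates dialogue context into gloss-level recognition, fine-tunes an LLM with prior conversational context for translation, and contributes a new dataset of Chinese sign language dialogue videos.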
Findings
Achieves state-of-the-art results on multiple datasets
Enhances recognition accuracy with dialogue context
Validated robustness through community surveys
Abstract
Sign languages, used by around 70 million Deaf individuals globally, are visual languages that convey visual and contextual information. Current methods in vision-based sign language recognition (SLR) and translation (SLT) struggle with dialogue scenes due to limited dataset diversity and the neglect of contextually relevant information. To address these challenges, we introduce SCOPE (Sign language Contextual Processing with Embedding from LLMs), a novel context-aware vision-based SLR and SLT framework. For SLR, we utilize dialogue contexts through a multi-modal encoder to enhance gloss-level recognition. For subsequent SLT, we further fine-tune a Large Language Model (LLM) by incorporating prior conversational context. We also contribute a new sign language dataset that contains 72 hours of Chinese sign language videos in contextual dialogues across various scenarios. Experimental…
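The abstract describes translation as conditioning an LLM on prior conversational context in addition to the recognized glosses. The following is a minimal sketch of how such an input might be assembled, not the authors' implementation: the Turn class, the build_slt_prompt function, the prompt layout, the max_turns cutoff, and the English placeholder glosses are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    """One earlier utterance in the dialogue (hypothetical structure)."""
    speaker: str
    text: str


def build_slt_prompt(glosses: List[str], context: List[Turn], max_turns: int = 3) -> str:
    """Combine recognized glosses with the most recent dialogue turns.

    The layout below is an assumption for illustration; the paper's actual
    LLM input format is not given in the abstract.
    """
    # Keep only the last `max_turns` turns; guard against max_turns <= 0,
    # because context[-0:] would return the whole list.
    recent = context[-max_turns:] if max_turns > 0 else []

    lines = ["Previous dialogue:"]
    if recent:
        lines.extend(f"{turn.speaker}: {turn.text}" for turn in recent)
    else:
        lines.append("(none)")
    lines.append("Glosses: " + " ".join(glosses))
    lines.append("Translation:")
    return "\n".join(lines)


if __name__ == "__main__":
    history = [
        Turn("A", "Are you hungry?"),
        Turn("B", "Yes, a little."),
    ]
    # Placeholder glosses; a real system would take these from the SLR stage.
    print(build_slt_prompt(["YOU", "EAT", "WHAT"], history))
```

The string returned here would be passed to whatever LLM is fine-tuned for translation. Truncating to the most recent turns is one simple way to bound input length, and other context-selection strategies are equally plausible.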
Taxonomy
Topics: Hand Gesture Recognition Systems · Hearing Impairment and Communication
Methods: Sign Language Recognition (SLR)
