SCOPE: Sign Language Contextual Processing with Embedding from LLMs

Yuqi Liu; Wenqian Zhang; Sihan Ren; Chengyu Huang; Jingyi Yu; Lan Xu

arXiv:2409.01073 · cs.CV · September 4, 2024

TL;DR

SCOPE introduces a context-aware sign language recognition and translation framework leveraging large language models and dialogue context, significantly improving performance on multiple datasets and validated by community surveys.

Contribution

The paper presents a novel multi-modal, context-aware framework for sign language recognition and translation, incorporating dialogue context and fine-tuning LLMs, along with a new dataset.

Findings

1. Achieves state-of-the-art results on multiple datasets

2. Improves recognition accuracy by using dialogue context

3. Validates robustness through surveys of the Deaf community

Abstract

Sign languages, used by around 70 million Deaf individuals globally, are visual languages that convey visual and contextual information. Current methods in vision-based sign language recognition (SLR) and translation (SLT) struggle with dialogue scenes due to limited dataset diversity and the neglect of contextually relevant information. To address these challenges, we introduce SCOPE (Sign language Contextual Processing with Embedding from LLMs), a novel context-aware vision-based SLR and SLT framework. For SLR, we utilize dialogue contexts through a multi-modal encoder to enhance gloss-level recognition. For subsequent SLT, we further fine-tune a Large Language Model (LLM) by incorporating prior conversational context. We also contribute a new sign language dataset that contains 72 hours of Chinese sign language videos in contextual dialogues across various scenarios. Experimental…
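The abstract states that the SLT stage fine-tunes an LLM "by incorporating prior conversational context", but this page does not say how that context is supplied. The following is a minimal sketch, assuming the context is passed to the LLM as plain text alongside the recognized gloss sequence. The `Turn` structure, the `build_slt_prompt` function, the `max_turns` parameter, and the prompt template are all hypothetical illustrations, not SCOPE's actual interface or prompt format.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Turn:
    """One utterance in the dialogue history (hypothetical structure)."""
    speaker: str
    text: str


def build_slt_prompt(glosses: List[str], history: List[Turn], max_turns: int = 3) -> str:
    """Assemble an LLM input from recent dialogue context plus recognized glosses.

    Illustrative assumption only: the template is not taken from the paper.
    """
    # Keep only the most recent turns; guard max_turns <= 0 because
    # history[-0:] would otherwise return the whole list.
    recent = history[-max_turns:] if max_turns > 0 else []
    context_lines = [f"{t.speaker}: {t.text}" for t in recent]
    context = "\n".join(context_lines) if context_lines else "(no prior context)"
    # Glosses come from the recognition (SLR) stage as a token sequence.
    gloss_str = " ".join(glosses)
    return (
        "Previous dialogue:\n"
        f"{context}\n"
        f"Sign glosses: {gloss_str}\n"
        "Translation:"
    )


if __name__ == "__main__":
    history = [
        Turn("A", "Where is the nearest hospital?"),
        Turn("B", "Two blocks north."),
    ]
    print(build_slt_prompt(["HOSPITAL", "OPEN", "TIME", "WHAT"], history))
```

Truncating to the last few turns is one simple way to keep the context relevant and the input short. A real system might instead select turns by relevance or encode them as embeddings rather than text.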

Code & Models

Repositories

Godheritage/SCOPE (official)

Videos

SCOPE: Sign Language Contextual Processing with Embedding from LLMs (Underline)

Taxonomy

Topics: Hand Gesture Recognition Systems · Hearing Impairment and Communication

Methods: Sign Language Recognition (SLR) · Sign Language Translation (SLT)