Scaling up Multimodal Pre-training for Sign Language Understanding

Wengang Zhou; Weichao Zhao; Hezhen Hu; Zecheng Li; Houqiang Li

arXiv:2408.08544 · cs.CV · August 19, 2024

Open Access

TL;DR

This paper introduces a large-scale multimodal pre-training approach to improve sign language understanding across recognition, translation, and retrieval tasks, addressing the challenge of learning effective representations of sign language videos.

Contribution

It proposes a unified multimodal pre-training framework, novel in this domain, that improves performance across multiple sign language understanding tasks.

Findings

01. Improved accuracy in sign language recognition and translation tasks.
02. Enhanced retrieval performance with the proposed pre-training model.
03. Demonstrated generalization across diverse sign language understanding (SLU) tasks.

Abstract

Sign language serves as the primary means of communication for the deaf-mute community. Unlike spoken language, it commonly conveys information through the combination of manual features, i.e., hand gestures and body movements, and non-manual features, i.e., facial expressions and mouth cues. To facilitate communication between deaf-mute and hearing people, a series of sign language understanding (SLU) tasks have been studied in recent years, including isolated/continuous sign language recognition (ISLR/CSLR), gloss-free sign language translation (GF-SLT), and sign language retrieval (SL-RT). Sign language recognition and translation aim to understand the semantic meaning conveyed by sign languages at the gloss level and the sentence level, respectively. In contrast, SL-RT focuses on retrieving sign videos or corresponding texts from a closed set under the query-by-example search…

Taxonomy

Topics: Hearing Impairment and Communication · Hand Gesture Recognition Systems · Speech and dialogue systems