Scaling up Multimodal Pre-training for Sign Language Understanding
Wengang Zhou, Weichao Zhao, Hezhen Hu, Zecheng Li, Houqiang Li

TL;DR
This paper introduces a large-scale multimodal pre-training approach to improve sign language understanding across recognition, translation, and retrieval tasks, addressing the challenge of learning effective representations of sign language videos.
Contribution
It proposes a unified multimodal pre-training framework, novel in this domain, that improves performance across multiple sign language understanding tasks.
Findings
Improved accuracy in sign language recognition and translation tasks.
Enhanced retrieval performance with the proposed pre-training model.
Demonstrated generalization across diverse SLU tasks.
Abstract
Sign language serves as the primary means of communication for the deaf-mute community. Different from spoken language, it commonly conveys information through the collaboration of manual features, i.e., hand gestures and body movements, and non-manual features, i.e., facial expressions and mouth cues. To facilitate communication between the deaf-mute and hearing people, a series of sign language understanding (SLU) tasks have been studied in recent years, including isolated/continuous sign language recognition (ISLR/CSLR), gloss-free sign language translation (GF-SLT) and sign language retrieval (SL-RT). Sign language recognition and translation aim to understand the semantic meaning conveyed by sign languages at the gloss level and the sentence level, respectively. In contrast, SL-RT focuses on retrieving sign videos or corresponding texts from a closed set under the query-by-example search…
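The closed-set, query-by-example retrieval setting described for SL-RT can be illustrated with a minimal sketch. It assumes that sign videos and texts have already been mapped into a shared embedding space and that candidates are ranked by cosine similarity; the function names, the toy embeddings, and the similarity measure are all illustrative assumptions, not the paper's method.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve(query, gallery, top_k=1):
    """Rank items of a closed gallery (id -> embedding) by similarity to the query.

    Hypothetical helper: returns the ids of the top_k most similar items.
    """
    ranked = sorted(gallery, key=lambda k: cosine(query, gallery[k]), reverse=True)
    return ranked[:top_k]

# Toy, made-up embeddings standing in for encoder outputs (assumption).
text_gallery = {
    "hello": [0.9, 0.1, 0.0],
    "thank you": [0.1, 0.9, 0.1],
    "goodbye": [0.0, 0.2, 0.9],
}
video_query = [0.2, 0.8, 0.1]  # embedding of a query sign video (made up)

best = retrieve(video_query, text_gallery, top_k=2)
print(best)
```

Swapping the roles of the query and the gallery (a text embedding queried against a gallery of video embeddings) gives the text-to-video direction of the same task.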
Taxonomy
Topics: Hearing Impairment and Communication · Hand Gesture Recognition Systems · Speech and dialogue systems
