SignMusketeers: An Efficient Multi-Stream Approach for Sign Language Translation at Scale
Shester Gueuwou, Xiaodan Du, Greg Shakhnarovich, Karen Livescu

TL;DR
This paper introduces SignMusketeers, a multi-stream model for sign language translation that learns from individual frames rather than video sequences and focuses on the signer's face, hands, and body pose, reaching translation quality comparable to a recent state-of-the-art model at a small fraction of the compute.
Contribution
The paper presents a novel self-supervised, multi-stream approach that learns sign language representations from frames, reducing computational costs while maintaining translation accuracy.
Findings
Achieves translation performance similar to a recent state-of-the-art model on the How2Sign dataset.
Uses less than 3% of the compute of that state-of-the-art model.
Focuses on key sign language attributes like face, hands, and body pose.
Abstract
A persistent challenge in sign language video processing, including the task of sign to written language translation, is how we learn representations of sign language in an effective and efficient way that preserves the important attributes of these languages, while remaining invariant to irrelevant visual differences. Informed by the nature and linguistics of signed languages, our proposed method focuses on just the most relevant parts in a signing video: the face, hands and body pose of the signer. However, instead of fully relying on pose estimation from off-the-shelf pose tracking models, which have inconsistent performance for hands and faces, we propose to learn a representation of the complex handshapes and facial expressions of sign languages in a self-supervised fashion. Our approach is based on learning from individual frames (rather than video sequences) and is therefore much…
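The per-frame, multi-stream idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the stream names, the bounding-box format, and the `mean_encoder` placeholder (which stands in for a learned self-supervised image encoder) are all assumptions made for illustration. It only shows the general shape of the computation: crop the relevant regions from a single frame, encode each crop independently, and concatenate the results with the body-pose values into one feature vector per frame.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Frame = List[List[float]]        # grayscale image as rows of pixel values
Box = Tuple[int, int, int, int]  # (top, left, bottom, right); hypothetical format

# Hypothetical stream names; the source only says "face, hands and body pose".
CROP_STREAMS = ("face", "left_hand", "right_hand")


def crop(frame: Frame, box: Box) -> Frame:
    """Cut a rectangular region out of a frame."""
    top, left, bottom, right = box
    return [row[left:right] for row in frame[top:bottom]]


def mean_encoder(patch: Frame) -> List[float]:
    """Placeholder encoder: mean intensity of the patch.

    Stands in for a learned image encoder; it is an assumption,
    not the model used in the paper.
    """
    pixels = [p for row in patch for p in row]
    return [sum(pixels) / len(pixels)] if pixels else [0.0]


def frame_features(
    frame: Frame,
    boxes: Dict[str, Box],
    pose: Sequence[float],
    encoder: Callable[[Frame], List[float]] = mean_encoder,
) -> List[float]:
    """Encode each cropped stream separately, then append the pose values."""
    features: List[float] = []
    for name in CROP_STREAMS:
        features.extend(encoder(crop(frame, boxes[name])))
    features.extend(pose)
    return features


def video_features(
    frames: Sequence[Frame],
    boxes_per_frame: Sequence[Dict[str, Box]],
    pose_per_frame: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Process every frame independently; no temporal modeling happens here."""
    return [
        frame_features(f, b, p)
        for f, b, p in zip(frames, boxes_per_frame, pose_per_frame)
    ]


if __name__ == "__main__":
    frame = [[0.0, 0.0, 1.0, 1.0],
             [0.0, 0.0, 1.0, 1.0],
             [2.0, 2.0, 3.0, 3.0],
             [2.0, 2.0, 3.0, 3.0]]
    boxes = {"face": (0, 0, 2, 2),
             "left_hand": (0, 2, 2, 4),
             "right_hand": (2, 0, 4, 2)}
    print(frame_features(frame, boxes, pose=[0.5, 0.25]))
```

Because each frame is encoded on its own, the per-frame work in this sketch scales with the size of the cropped regions rather than with the full video clip, which is the intuition behind the efficiency argument in the abstract. A downstream translation model would consume the resulting sequence of per-frame vectors.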
Taxonomy
Topics: Hand Gesture Recognition Systems · Hearing Impairment and Communication · Subtitles and Audiovisual Media
