SignMusketeers: An Efficient Multi-Stream Approach for Sign Language Translation at Scale

Shester Gueuwou; Xiaodan Du; Greg Shakhnarovich; Karen Livescu

arXiv:2406.06907 · cs.CL · June 4, 2025 · 2 cites

TL;DR

This paper introduces SignMusketeers, a scalable and efficient multi-stream model for sign language translation. It learns from individual frames and focuses on the signer's face, hands, and body pose, reaching translation performance comparable to the state of the art with far less compute.

Contribution

The paper presents a novel self-supervised, multi-stream approach that learns sign language representations from frames, reducing computational costs while maintaining translation accuracy.

Findings

01

Achieves translation performance comparable to the recent state-of-the-art model on the How2Sign dataset.

02

Uses less than 3% of the compute of that state-of-the-art model.

03

Focuses on key sign language attributes like face, hands, and body pose.

Abstract

A persistent challenge in sign language video processing, including the task of sign to written language translation, is how we learn representations of sign language in an effective and efficient way that preserves the important attributes of these languages, while remaining invariant to irrelevant visual differences. Informed by the nature and linguistics of signed languages, our proposed method focuses on just the most relevant parts in a signing video: the face, hands and body pose of the signer. However, instead of fully relying on pose estimation from off-the-shelf pose tracking models, which have inconsistent performance for hands and faces, we propose to learn a representation of the complex handshapes and facial expressions of sign languages in a self-supervised fashion. Our approach is based on learning from individual frames (rather than video sequences) and is therefore much…
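As a rough illustration of the frame-level, multi-stream idea described in the abstract, the sketch below fuses per-stream feature vectors for each frame into one vector and processes a video frame by frame. It is not the authors' implementation. The stream names (including the left/right hand split), the use of plain concatenation as the fusion step, and the helpers `fuse_frame` and `fuse_video` are all assumptions made for this example. The feature values are toy numbers standing in for the outputs of learned encoders and a pose estimator.

```python
from typing import Dict, List, Sequence

Vector = List[float]

# Hypothetical stream names. The abstract names the face, hands and body pose;
# splitting the hands into left and right is an assumption of this sketch.
STREAMS = ("face", "left_hand", "right_hand", "pose")


def fuse_frame(features: Dict[str, Sequence[float]]) -> Vector:
    """Concatenate the per-stream feature vectors of one frame in a fixed order.

    Concatenation is an assumed fusion strategy, chosen for simplicity.
    """
    missing = [name for name in STREAMS if name not in features]
    if missing:
        raise KeyError(f"missing streams: {missing}")
    fused: Vector = []
    for name in STREAMS:
        fused.extend(float(x) for x in features[name])
    return fused


def fuse_video(frames: Sequence[Dict[str, Sequence[float]]]) -> List[Vector]:
    """Fuse every frame independently; no temporal context is used here."""
    return [fuse_frame(frame) for frame in frames]


if __name__ == "__main__":
    # Toy features standing in for encoder outputs (not real data).
    toy_frames = [
        {"face": [0.1, 0.2], "left_hand": [0.3], "right_hand": [0.4], "pose": [0.5, 0.6]},
        {"face": [0.0, 0.1], "left_hand": [0.2], "right_hand": [0.3], "pose": [0.4, 0.5]},
    ]
    sequence = fuse_video(toy_frames)
    # One fused vector per frame, ready to be passed to a translation model.
    print(len(sequence), len(sequence[0]))
```

Because each frame is fused on its own, the per-frame features could be computed once and cached, which is one plausible reading of why learning from individual frames is cheaper than learning from video sequences.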

Taxonomy

Topics: Hand Gesture Recognition Systems · Hearing Impairment and Communication · Subtitles and Audiovisual Media