Scaling Sign Language Translation

Biao Zhang; Garrett Tanzer; Orhan Firat

arXiv:2407.11855 · cs.CL · July 17, 2024

TL;DR

This paper advances sign language translation by scaling data, models, and translation directions, achieving significant improvements and demonstrating zero-shot capabilities across multiple languages.

Contribution

It introduces a large-scale, unified pretraining approach for SLT using diverse data sources and model scaling, enabling open-domain translation and zero-shot transfer.

Findings

1. Significant performance improvements over baselines.
2. Successful zero-shot sign language translation.
3. Effective cross-lingual and cross-modal transfer.

Abstract

Sign language translation (SLT) addresses the problem of translating information from a sign language in video to a spoken language in text. Existing studies, while showing progress, are often limited to narrow domains and/or few sign languages and struggle with open-domain tasks. In this paper, we push forward the frontier of SLT by scaling pretraining data, model size, and number of translation directions. We perform large-scale SLT pretraining on different data including 1) noisy multilingual YouTube SLT data, 2) parallel text corpora, and 3) SLT data augmented by translating video captions to other languages with off-the-shelf machine translation models. We unify different pretraining tasks with task-specific prompts under the encoder-decoder architecture, and initialize the SLT model with pretrained (m/By)T5 models across model sizes. SLT pretraining results on How2Sign and…
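
The abstract says the pretraining tasks are unified with task-specific prompts under a single encoder-decoder model. The sketch below illustrates that general idea only. It is not the authors' implementation: the prompt format, the task names (`slt`, `aug_slt`, `mt`), the language tags, and the `PretrainExample` and `build_prompt` names are all hypothetical, and the sign video input is represented by a placeholder ID rather than real features.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PretrainExample:
    """One pretraining example; all field names here are illustrative assumptions."""
    task: str                           # "slt", "aug_slt", or "mt" (hypothetical task names)
    source_lang: str                    # hypothetical language tag, e.g. "ase" or "en"
    target_lang: str
    target_text: str                    # text the decoder is trained to produce
    source_text: Optional[str] = None   # set only for text-to-text MT examples
    video_id: Optional[str] = None      # placeholder standing in for sign video features


def build_prompt(ex: PretrainExample) -> str:
    """Build the text prompt that tells the shared model which task to perform."""
    if ex.task not in {"slt", "aug_slt", "mt"}:
        raise ValueError(f"unknown task: {ex.task}")
    # Hypothetical prompt format: task tag, source language, target language.
    prompt = f"[task={ex.task}] [src={ex.source_lang}] [tgt={ex.target_lang}]"
    if ex.task == "mt":
        # Text-to-text examples carry their source sentence in the prompt itself.
        if ex.source_text is None:
            raise ValueError("mt examples need source_text")
        prompt = f"{prompt} {ex.source_text}"
    elif ex.video_id is None:
        # Sign language examples take their content from the video, not the prompt.
        raise ValueError("sign language examples need video_id")
    return prompt


if __name__ == "__main__":
    batch = [
        PretrainExample("slt", "ase", "en", "hello", video_id="clip-001"),
        PretrainExample("aug_slt", "ase", "de", "hallo", video_id="clip-001"),
        PretrainExample("mt", "en", "de", "hallo", source_text="hello"),
    ]
    for ex in batch:
        print(build_prompt(ex), "->", ex.target_text)
```

Because the task is selected by the prompt, all three data sources can be mixed in one training stream and share one set of model parameters, which is the property the abstract relies on.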

Taxonomy

Topics: Hand Gesture Recognition Systems · Hearing Impairment and Communication · Swearing, Euphemism, Multilingualism