Is context all you need? Scaling Neural Sign Language Translation to Large Domains of Discourse
Ozge Mercanoglu Sincan, Necati Cihan Camgoz, Richard Bowden

TL;DR
This paper introduces a multi-modal transformer model for Sign Language Translation that leverages context from previous sequences and visual cues, significantly improving translation accuracy on large-scale datasets.
Contribution
The authors propose a novel context-aware transformer architecture with multiple encoders for sign language translation, enhancing performance by effectively utilizing contextual information.
Findings
Nearly doubled BLEU-4 scores over baselines
Significant improvements on large-scale datasets
Effective use of context in sign language translation
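For readers unfamiliar with the metric named in the findings, here is a minimal sketch of sentence-level BLEU-4: the geometric mean of clipped 1- to 4-gram precisions, multiplied by a brevity penalty. It assumes whitespace tokenisation, a single reference, and no smoothing. The function names `ngrams` and `bleu4` are illustrative and are not taken from the paper. Published scores are normally computed at corpus level with a standard toolkit, so this is not the authors' evaluation script.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count every contiguous n-gram in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(hypothesis, reference):
    """Sentence-level BLEU-4 against one reference, without smoothing."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, 5):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        total = sum(hyp_counts.values())
        # Clipping: a hypothesis n-gram is credited at most as many
        # times as it occurs in the reference.
        matched = sum(min(count, ref_counts[gram])
                      for gram, count in hyp_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU 0
        log_precision_sum += math.log(matched / total)
    # Brevity penalty: penalise hypotheses shorter than the reference.
    if len(hyp) > len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(hyp))
    return brevity_penalty * math.exp(log_precision_sum / 4)


score = bleu4("the cat sat on a mat", "the cat sat on the mat")
```

Because all four precisions are multiplied together, a single wrong word in a short sentence lowers the score sharply, which is one reason SLT results are reported over whole test sets rather than per sentence.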
Abstract
Sign Language Translation (SLT) is a challenging task that aims to generate spoken language sentences from sign language videos, which differ in grammar and word/gloss order. From a Neural Machine Translation (NMT) perspective, the straightforward way of training translation models is to use pairs of sign language phrases and spoken language sentences. However, human interpreters rely heavily on context to understand the conveyed information, especially in sign language interpretation, where the vocabulary may be significantly smaller than that of the spoken language equivalent. Taking direct inspiration from how humans translate, we propose a novel multi-modal transformer architecture that tackles the translation task in a context-aware manner, as a human would. We use the context from previous sequences and confident predictions to disambiguate weaker visual cues. To achieve…
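As a rough illustration of the multi-encoder idea, the sketch below shows one plausible way a decoder state could draw on two sources at once: scaled dot-product attention over the encoded video, and separately over the encoded preceding text, with the results added to the query. The excerpt above does not say how the paper combines its encoder outputs, so the summation, the function names (`attend`, `fuse_context`), and the toy vectors are all assumptions made for illustration, not the authors' architecture.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, memory):
    """Scaled dot-product attention of one query vector over memory vectors."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, vec)) / scale for vec in memory]
    weights = softmax(scores)
    dim = len(memory[0])
    # Weighted average of the memory vectors.
    return [sum(w * vec[i] for w, vec in zip(weights, memory)) for i in range(dim)]


def fuse_context(query, video_memory, context_memory):
    """Hypothetical fusion: residual sum of the query and both attention reads."""
    from_video = attend(query, video_memory)
    from_context = attend(query, context_memory)
    return [q + v + c for q, v, c in zip(query, from_video, from_context)]


# Toy 2-dimensional example: one video frame feature, one context token feature.
fused = fuse_context([1.0, 0.0], [[1.0, 0.0]], [[0.0, 2.0]])
```

In this toy setup the context memory supplies information along a dimension the video features leave empty, which is the intuition behind using earlier sentences to disambiguate weak visual cues.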
Taxonomy
Topics: Hand Gesture Recognition Systems · Human Pose and Action Recognition · Hearing Impairment and Communication
