Is context all you need? Scaling Neural Sign Language Translation to Large Domains of Discourse

Ozge Mercanoglu Sincan; Necati Cihan Camgoz; Richard Bowden

arXiv:2308.09622 · cs.CV · August 21, 2023 · 1 citation

Open Access

TL;DR

This paper introduces a multi-modal transformer model for Sign Language Translation that uses context from previous sequences and confident predictions to disambiguate weaker visual cues, nearly doubling the BLEU-4 scores of baselines on large-scale datasets.

Contribution

The authors propose a novel context-aware transformer architecture with multiple encoders for sign language translation, enhancing performance by effectively utilizing contextual information.
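As a rough illustration of how a decoder can draw on several encoders at once, the following minimal sketch lets a single query vector attend over the concatenated outputs of two streams. The fusion by concatenation, the stream names `video` and `context`, and the helper `fuse_and_attend` are all assumptions made for illustration; the summary above does not state how the paper's encoders are combined.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, memory):
    """Scaled dot-product attention of one query vector over memory vectors."""
    dim = len(query)
    scores = [
        sum(q * k for q, k in zip(query, vec)) / math.sqrt(dim)
        for vec in memory
    ]
    weights = softmax(scores)
    # Weighted sum of the memory vectors, one output dimension at a time.
    attended = [
        sum(w * vec[i] for w, vec in zip(weights, memory))
        for i in range(dim)
    ]
    return attended, weights


def fuse_and_attend(query, encoder_outputs):
    """Hypothetical fusion: concatenate all encoder outputs along time,
    attend once, and report how much attention mass each stream received."""
    memory, owners = [], []
    for name, vectors in encoder_outputs.items():
        memory.extend(vectors)
        owners.extend([name] * len(vectors))
    attended, weights = attend(query, memory)
    mass = {name: 0.0 for name in encoder_outputs}
    for name, weight in zip(owners, weights):
        mass[name] += weight
    return attended, mass


# Toy inputs: weak visual features and one strong context vector.
streams = {
    "video": [[0.1, 0.0], [0.0, 0.1]],
    "context": [[2.0, 0.0]],
}
attended, mass = fuse_and_attend([1.0, 0.0], streams)
print(mass)
```

In this toy setup the context stream receives most of the attention mass because its vector aligns strongly with the query, which mirrors the idea of context compensating for weak visual evidence.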

Findings

01 Nearly doubled BLEU-4 scores over baselines

02 Significant improvements on large-scale datasets

03 Effective use of context in sign language translation
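For readers unfamiliar with the metric behind the first finding, here is a minimal sketch of BLEU-4: the geometric mean of clipped 1- to 4-gram precisions, multiplied by a brevity penalty. It is a simplified sentence-level version with a single reference, whitespace tokenization, and no smoothing, all of which are assumptions made here. Published scores are normally computed at corpus level with a standard toolkit, so this is not the paper's evaluation script.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu4(hypothesis, reference):
    """Sentence-level BLEU-4 against one reference, without smoothing."""
    hyp = hypothesis.split()
    ref = reference.split()
    if not hyp:
        return 0.0
    log_precisions = []
    for n in range(1, 5):
        hyp_counts = ngrams(hyp, n)
        ref_counts = ngrams(ref, n)
        total = sum(hyp_counts.values())
        # Clip each n-gram's count by how often it occurs in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        if total == 0 or matches == 0:
            return 0.0  # unsmoothed BLEU is zero if any precision is zero
        log_precisions.append(math.log(matches / total))
    # Brevity penalty: penalize hypotheses shorter than the reference.
    if len(hyp) > len(ref):
        brevity = 1.0
    else:
        brevity = math.exp(1 - len(ref) / len(hyp))
    return brevity * math.exp(sum(log_precisions) / 4)


score = bleu4("the cat sat on the mat", "the cat sat on the red mat")
print(round(score * 100, 2))
```

Because the score is a geometric mean, a single missing 4-gram match drives the unsmoothed sentence score to zero, which is one reason corpus-level aggregation is preferred in practice.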

Abstract

Sign Language Translation (SLT) is a challenging task that aims to generate spoken language sentences from sign language videos; the two languages differ in grammar and word/gloss order. From a Neural Machine Translation (NMT) perspective, the straightforward way to train translation models is on pairs of sign language phrases and spoken language sentences. However, human interpreters rely heavily on context to understand the conveyed information, especially in sign language interpretation, where the vocabulary may be significantly smaller than that of the spoken language equivalent. Taking direct inspiration from how humans translate, we propose a novel multi-modal transformer architecture that tackles the translation task in a context-aware manner. We use the context from previous sequences and confident predictions to disambiguate weaker visual cues. To achieve…

Taxonomy

Topics: Hand Gesture Recognition Systems · Human Pose and Action Recognition · Hearing Impairment and Communication