TL;DR
This paper introduces a novel sign language translation method that uses sentence embeddings as supervision, eliminating the need for manual gloss annotations and enabling effective multilingual translation.
Contribution
It presents a new gloss-free training approach using sentence embeddings, improving sign language translation performance without relying on annotated gloss data.
Findings
Outperforms existing gloss-free methods significantly.
Sets a new state of the art on datasets without gloss annotations.
Reduces the gap between gloss-dependent and gloss-free systems.
Abstract
State-of-the-art sign language translation (SLT) systems facilitate the learning process through gloss annotations, either in an end-to-end manner or by involving an intermediate step. Unfortunately, gloss-labelled sign language data is usually not available at scale and, when available, gloss annotations differ widely from dataset to dataset. We present a novel approach that uses sentence embeddings of the target sentences at training time to take the role of glosses. This new kind of supervision does not need any manual annotation; instead, it is learned on raw textual data. As our approach easily facilitates multilinguality, we evaluate it on datasets covering German (PHOENIX-2014T) and American (How2Sign) sign languages and experiment with mono- and multilingual sentence embeddings and translation systems. Our approach significantly outperforms other gloss-free approaches, setting the new…
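To make the idea of sentence embeddings acting as supervision more concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not state the loss function or model architecture, so the cosine-distance objective, the function names, and the toy vectors below are all assumptions chosen for illustration. The sketch only shows how a vector predicted from sign video could be scored against the embedding of the target sentence, with no gloss labels involved.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def embedding_supervision_loss(predicted, target):
    """Hypothetical training signal (an assumption, not from the paper):
    1 - cosine similarity, which is near 0 when the predicted vector
    points in the same direction as the target sentence embedding."""
    return 1.0 - cosine_similarity(predicted, target)


# Toy 3-dimensional vectors; real sentence embeddings are much larger.
# target_embedding stands in for the embedding of the target sentence,
# which would come from a text encoder trained on raw text.
target_embedding = [0.2, 0.9, 0.1]
good_prediction = [0.4, 1.8, 0.2]    # same direction as the target
poor_prediction = [0.9, -0.1, 0.3]   # points elsewhere

loss_good = embedding_supervision_loss(good_prediction, target_embedding)
loss_poor = embedding_supervision_loss(poor_prediction, target_embedding)
print(f"aligned prediction loss:    {loss_good:.4f}")
print(f"misaligned prediction loss: {loss_poor:.4f}")
```

In this sketch the target vector requires only the written sentence, which is the property the abstract highlights: the supervision comes from raw text rather than from manually annotated glosses.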