Spatio-temporal Sign Language Representation and Translation
Yasser Hamidullah, Josef van Genabith, Cristina España-Bonet

TL;DR
This paper presents an end-to-end spatio-temporal feature learning and translation system for Swiss German Sign Language to German, aiming to improve generalization in sign language translation tasks.
Contribution
It introduces a unified model that learns spatio-temporal features and performs translation jointly, unlike traditional pipelines in which feature extraction and translation are separate stages.
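The practical difference between a pipeline and a single model is where the translation loss's gradient is allowed to flow. Below is a toy numpy sketch of that difference, assuming two linear stand-in modules: a "feature extractor" `W_f` and a "translation head" `W_h`. The names `train`, `loss`, `W_f` and `W_h` are hypothetical, and the per-frame regression target is only a placeholder for a real translation objective. It is not the paper's model.

```python
import numpy as np

# Toy stand-ins (hypothetical, not the paper's modules):
#   W_f: linear "feature extractor" from raw frame values to features
#   W_h: linear "translation head" from features to outputs

def loss(frames, targets, W_f, W_h):
    """Squared error of the two-stage toy model, averaged over samples."""
    preds = frames @ W_f.T @ W_h.T
    return float(np.sum((preds - targets) ** 2) / len(frames))

def train(frames, targets, W_f, W_h, lr=0.01, steps=50, end_to_end=True):
    """Plain gradient descent on `loss`.

    end_to_end=False freezes W_f, mimicking a pipeline whose features are
    extracted once and never adapted to the translation objective.
    """
    W_f, W_h = W_f.copy(), W_h.copy()
    n = len(frames)
    for _ in range(steps):
        feats = frames @ W_f.T                   # (n, d_feat)
        err = feats @ W_h.T - targets            # (n, d_out)
        grad_h = 2 * err.T @ feats / n           # dL/dW_h
        grad_f = 2 * (err @ W_h).T @ frames / n  # dL/dW_f, backpropagated through the head
        W_h -= lr * grad_h
        if end_to_end:
            W_f -= lr * grad_f                   # translation loss reaches the extractor
    return W_f, W_h

rng = np.random.default_rng(0)
frames = rng.normal(size=(32, 16))   # 32 toy "frames", 16 raw values each
targets = rng.normal(size=(32, 4))   # 4-dim toy output per frame
W_f0 = 0.1 * rng.normal(size=(8, 16))
W_h0 = 0.1 * rng.normal(size=(4, 8))

Wf_pipe, Wh_pipe = train(frames, targets, W_f0, W_h0, end_to_end=False)
Wf_e2e, Wh_e2e = train(frames, targets, W_f0, W_h0, end_to_end=True)
print(np.array_equal(Wf_pipe, W_f0))  # True: the pipeline never updates W_f
print(np.array_equal(Wf_e2e, W_f0))   # False: end-to-end training adapts W_f
```

In a real system both modules would be deep networks trained with automatic differentiation; the sketch only shows that, in the end-to-end setting, the extractor's parameters are shaped by the translation objective rather than fixed beforehand.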
Findings
Achieved 5±1 BLEU on development set
Test performance dropped to 0.11±0.06 BLEU
Demonstrates potential of end-to-end spatio-temporal modeling
Abstract
This paper describes the DFKI-MLT submission to the WMT-SLT 2022 sign language translation (SLT) task from Swiss German Sign Language (video) into German (text). State-of-the-art techniques for SLT use a generic seq2seq architecture with customized input embeddings: instead of the word embeddings used in textual machine translation, SLT systems use features extracted from video frames. Standard approaches often do not benefit from temporal features. In our participation, we present a system that learns spatio-temporal feature representations and translation in a single model, resulting in a truly end-to-end architecture that is expected to generalize better to new data sets. Our best system achieved 5±1 BLEU points on the development set, but the performance on the test set dropped to 0.11±0.06 BLEU points.
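To illustrate how video features can take the place of word embeddings, and what "spatio-temporal" adds over frame-wise features, here is a minimal numpy sketch assuming a single 3D convolution as the feature extractor. The helper `conv3d_embed`, the filter sizes, the temporal stride and the spatial average pooling are all hypothetical choices for illustration; the abstract does not specify the paper's actual extractor.

```python
import numpy as np

def conv3d_embed(video, kernels, t_stride=2):
    """Toy spatio-temporal embedding (hypothetical helper).

    video:   (T, H, W) grayscale clip
    kernels: (C, kt, kh, kw) filters, each spanning kt consecutive frames
    returns: (T_out, C) sequence with one vector per temporal window, built by
             sliding every filter over time (stride t_stride) and space
             (stride 1, cross-correlation as in deep-learning conv layers),
             then average-pooling the responses over spatial positions.
    """
    T, H, W = video.shape
    C, kt, kh, kw = kernels.shape
    t_starts = range(0, T - kt + 1, t_stride)
    out = np.zeros((len(t_starts), C))
    for i, t in enumerate(t_starts):
        clip = video[t:t + kt]                        # (kt, H, W)
        responses = np.zeros((C, H - kh + 1, W - kw + 1))
        for y in range(H - kh + 1):
            for x in range(W - kw + 1):
                patch = clip[:, y:y + kh, x:x + kw]   # (kt, kh, kw)
                # Sum over the time and space axes of each filter -> (C,)
                responses[:, y, x] = np.tensordot(kernels, patch, axes=3)
        out[i] = responses.mean(axis=(1, 2))          # spatial average pooling
    return out

rng = np.random.default_rng(0)
video = rng.random((16, 24, 24))           # 16 toy frames of 24x24 pixels
kernels = rng.normal(size=(32, 4, 3, 3))   # 32 filters, each spanning 4 frames
tokens = conv3d_embed(video, kernels)      # input sequence for a seq2seq encoder
print(tokens.shape)                        # (7, 32)

# Frame-wise baseline: a temporal extent of 1 turns this into per-frame 2D features.
per_frame = conv3d_embed(video, kernels[:, :1], t_stride=1)
print(per_frame.shape)                     # (16, 32)
```

Each row of `tokens` mixes information from four consecutive frames, so motion across frames can influence a single input vector; with `kt=1` every row depends on exactly one frame, which corresponds to the frame-wise features the abstract contrasts with.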
Taxonomy
Topics: Hand Gesture Recognition Systems · Human Pose and Action Recognition · Hearing Impairment and Communication
