Spatio-temporal Sign Language Representation and Translation

Yasser Hamidullah; Josef van Genabith; Cristina España-Bonet

arXiv:2510.19413·cs.CL·October 23, 2025

TL;DR

This paper presents an end-to-end spatio-temporal feature learning and translation system for Swiss German Sign Language to German, aiming to improve generalization in sign language translation tasks.

Contribution

It introduces a unified model that learns spatio-temporal features and performs translation simultaneously, differing from traditional separate feature extraction and translation pipelines.

Findings

01 Achieved 5 ± 1 BLEU on the development set

02 Test set performance dropped to 0.11 ± 0.06 BLEU

03 The end-to-end spatio-temporal architecture is expected to generalize better to new data sets, an expectation the test result does not yet bear out

Abstract

This paper describes the DFKI-MLT submission to the WMT-SLT 2022 sign language translation (SLT) task from Swiss German Sign Language (video) into German (text). State-of-the-art techniques for SLT use a generic seq2seq architecture with customized input embeddings. Instead of word embeddings as used in textual machine translation, SLT systems use features extracted from video frames. Standard approaches often do not benefit from temporal features. In our participation, we present a system that learns spatio-temporal feature representations and translation in a single model, resulting in a real end-to-end architecture expected to better generalize to new data sets. Our best system achieved $5 \pm 1$ BLEU points on the development set, but the performance on the test set dropped to $0.11 \pm 0.06$ BLEU points.
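To make the distinction between per-frame and spatio-temporal features concrete, here is a minimal sketch. It is not the authors' architecture: the paper's system learns its representations jointly with translation, whereas this example uses fixed, hand-written pooling purely for illustration. The names `temporal_windows` and `spatio_temporal_features`, the window size, and the stride are all assumptions introduced here. Each input frame is assumed to already be a small feature vector; each output vector combines an appearance summary (the window mean) with a motion summary (the average frame-to-frame change), which is the kind of temporal information a purely per-frame extractor cannot capture.

```python
from typing import List

# Hypothetical stand-in for a per-frame feature vector (e.g. pooled image features).
Frame = List[float]


def temporal_windows(frames: List[Frame], size: int, stride: int) -> List[List[Frame]]:
    """Group consecutive frames into (possibly overlapping) windows along time."""
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    # Only full windows are kept; a clip shorter than `size` yields no windows.
    return [frames[start:start + size]
            for start in range(0, len(frames) - size + 1, stride)]


def spatio_temporal_features(frames: List[Frame], size: int = 3, stride: int = 1) -> List[Frame]:
    """Return one vector per window: [mean appearance | mean motion]."""
    features = []
    for window in temporal_windows(frames, size, stride):
        dim = len(window[0])
        # Appearance: average of each feature dimension over the window.
        mean = [sum(frame[d] for frame in window) / len(window) for d in range(dim)]
        # Motion: average of consecutive frame differences, which telescopes
        # to (last - first) / (number of steps).
        if len(window) > 1:
            steps = len(window) - 1
            motion = [(window[-1][d] - window[0][d]) / steps for d in range(dim)]
        else:
            motion = [0.0] * dim
        features.append(mean + motion)
    return features


if __name__ == "__main__":
    # Toy clip: the first dimension changes over time, the second is static.
    clip = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0]]
    for vector in spatio_temporal_features(clip, size=3, stride=1):
        print(vector)
```

In a seq2seq setup, vectors like these would take the place of word embeddings at the encoder input. In a learned system the fixed pooling would be replaced by trainable layers optimized together with the translation objective.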

Taxonomy

Topics: Hand Gesture Recognition Systems · Human Pose and Action Recognition · Hearing Impairment and Communication