Gloss-Free Sign Language Translation: An Unbiased Evaluation of Progress in the Field
Ozge Mercanoglu Sincan, Jian He Low, Sobhan Asasi, Richard Bowden

TL;DR
This paper critically evaluates recent gloss-free sign language translation models by standardizing evaluation conditions, revealing that many reported improvements are attributable to implementation and setup differences rather than methodological advances.
Contribution
The paper provides a unified re-implementation and fair comparison of recent gloss-free SLT models, highlighting how evaluation practices affect reported performance gains.
Findings
Performance gains often diminish under standardized evaluation.
Implementation details significantly influence results.
Standardized benchmarks are essential for fair comparison.
Abstract
Sign Language Translation (SLT) aims to automatically convert visual sign language videos into spoken language text and vice versa. While recent years have seen rapid progress, the true sources of performance improvements often remain unclear. Do reported performance gains come from methodological novelty, or from the choice of a different backbone, training optimizations, hyperparameter tuning, or even differences in the calculation of evaluation metrics? This paper presents a comprehensive study of recent gloss-free SLT models by re-implementing key contributions in a unified codebase. We ensure fair comparison by standardizing preprocessing, video encoders, and training setups across all methods. Our analysis shows that many of the performance gains reported in the literature often diminish when models are evaluated under consistent conditions, suggesting that implementation…
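The abstract's point about "differences in the calculation of evaluation metrics" can be illustrated with a minimal sketch. The code below is not the paper's evaluation code: it is a simplified corpus-level BLEU (single reference, no smoothing), and the function names and example sentences are invented for illustration. It scores the same hypotheses twice, changing only the tokenization applied before n-gram matching.

```python
import math
import re
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hyps, refs, tokenize, max_n=4):
    """Simplified corpus-level BLEU: one reference per hypothesis, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        h, r = tokenize(hyp), tokenize(ref)
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_ngrams, r_ngrams = ngrams(h, n), ngrams(r, n)
            # Clipped counts: a hypothesis n-gram is credited at most as
            # often as it occurs in the reference.
            matches[n - 1] += sum(min(c, r_ngrams[g]) for g, c in h_ngrams.items())
            totals[n - 1] += sum(h_ngrams.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


def whitespace_cased(text):
    """Setting A (hypothetical): split on whitespace, keep case and punctuation."""
    return text.split()


def lowercase_split_punct(text):
    """Setting B (hypothetical): lowercase and split punctuation into tokens."""
    return re.findall(r"\w+|[^\w\s]", text.lower())


# Invented example sentences; the model outputs are identical in both runs.
hyps = ["Tomorrow it will rain in the north.", "The wind is weak today."]
refs = ["tomorrow it will rain in the north .", "the wind is weak today ."]

strict = corpus_bleu(hyps, refs, whitespace_cased)
normalized = corpus_bleu(hyps, refs, lowercase_split_punct)
print(f"whitespace, cased:      {strict:.2f}")
print(f"lowercased, punct split: {normalized:.2f}")
```

Under setting B the hypotheses match the references token for token, so the score is 100; under setting A the same outputs score far lower because capitalization and attached punctuation break n-gram matches. The gap comes entirely from the scoring setup, which is why fixing that setup across methods matters for a fair comparison.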
Taxonomy
Topics: Hand Gesture Recognition Systems · Hearing Impairment and Communication · Interactive and Immersive Displays
