Sign Language Translation from Instructional Videos
Laia Tarrés, Gerard I. Gállego, Amanda Duarte, Jordi Torres, Xavier Giró-i-Nieto

TL;DR
This paper presents a new baseline for sign language translation using a large dataset and Transformer models, achieving an 8.03 BLEU score and providing open-source tools to advance research.
Contribution
It introduces the first baseline results on the large How2Sign dataset and offers an open-source implementation for future research in sign language translation.
Findings
Achieved BLEU score of 8.03 on How2Sign dataset
Used Transformer model with I3D video features
Provided open-source implementation for the community
Abstract
The advances in automatic sign language translation (SLT) to spoken languages have been mostly benchmarked with datasets of limited size and restricted domains. Our work advances the state of the art by providing the first baseline results on How2Sign, a large and broad dataset. We train a Transformer over I3D video features, using the reduced BLEU as a reference metric for validation, instead of the widely used BLEU score. We report a result of 8.03 on the BLEU score, and publish the first open-source implementation of its kind to promote further advances.
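The abstract names reduced BLEU as the validation metric but does not define it. As a rough illustration only, the sketch below computes a plain corpus-level BLEU (clipped n-gram precision with a brevity penalty, one reference per hypothesis, no smoothing) and then a "reduced" variant under the assumption that reduced BLEU means scoring after dropping a set of very frequent words from both hypothesis and reference. The `BLACKLIST` contents and the function names are hypothetical, and this is not the authors' implementation.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of order n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU on a 0-100 scale, one reference per hypothesis."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            h, r = ngrams(hyp, n), ngrams(ref, n)
            # Counter intersection clips each n-gram count to the reference.
            matches[n - 1] += sum((h & r).values())
            totals[n - 1] += sum(h.values())
    if min(matches) == 0:
        return 0.0  # no smoothing: any empty n-gram order gives zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)


# Hypothetical list of frequent words; an assumption, not taken from the paper.
BLACKLIST = {"the", "a", "to", "and", "of", "i", "you"}


def reduced_bleu(hypotheses, references, blacklist=BLACKLIST):
    """Assumed 'reduced' variant: filter blacklisted words, then score."""
    def strip(tokens):
        return [t for t in tokens if t.lower() not in blacklist]
    return corpus_bleu([strip(h) for h in hypotheses],
                       [strip(r) for r in references])


ref = "i am going to show you how to tie a knot".split()
hyp = "i am going to the store to buy a thing".split()
plain = corpus_bleu([hyp], [ref])
reduced = reduced_bleu([hyp], [ref])
```

In this toy pair the hypothesis shares mostly function words with the reference, so the plain score is positive while the reduced score drops to zero. That illustrates why a metric that discounts frequent words could be a stricter signal for model selection when overall BLEU scores are low.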
Taxonomy
Topics: Hand Gesture Recognition Systems · Human Pose and Action Recognition · Hearing Impairment and Communication
Methods: Linear Layer · Position-Wise Feed-Forward Layer · Residual Connection · Label Smoothing · Dropout · Byte Pair Encoding · Adam · Softmax · Multi-Head Attention · Layer Normalization
