Sign Language Translation from Instructional Videos

Laia Tarrés; Gerard I. Gállego; Amanda Duarte; Jordi Torres; Xavier Giró-i-Nieto

arXiv:2304.06371 · cs.CL · April 17, 2023 · 1 citation

TL;DR

This paper presents the first baseline for sign language translation on the large How2Sign dataset. A Transformer trained over I3D video features reaches a BLEU score of 8.03, and the authors release an open-source implementation to support further research.

Contribution

The paper reports the first baseline results on the large How2Sign dataset and provides an open-source implementation for future research in sign language translation.

Findings

01 Achieved a BLEU score of 8.03 on the How2Sign dataset

02 Used a Transformer model over I3D video features

03 Provided an open-source implementation for the community

Abstract

The advances in automatic sign language translation (SLT) to spoken languages have been mostly benchmarked with datasets of limited size and restricted domains. Our work advances the state of the art by providing the first baseline results on How2Sign, a large and broad dataset. We train a Transformer over I3D video features, using the reduced BLEU as a reference metric for validation, instead of the widely used BLEU score. We report a result of 8.03 on the BLEU score, and publish the first open-source implementation of its kind to promote further advances.
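Since the headline result is a BLEU score, a rough sketch of what that metric measures may help. The code below is a simplified, single-reference, corpus-level BLEU with whitespace tokenization and no smoothing. It is not the authors' evaluation code, and it does not implement the reduced BLEU variant, which this page does not define. The function names `ngrams` and `corpus_bleu` are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus-level BLEU on a 0-100 scale (illustrative only).

    Assumes one reference per hypothesis and whitespace tokenization.
    """
    matches = [0] * max_n  # clipped n-gram matches, per order
    totals = [0] * max_n   # hypothesis n-gram counts, per order
    hyp_len = 0
    ref_len = 0

    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Counter intersection clips each match to the reference count.
            matches[n - 1] += sum((h_counts & r_counts).values())
            totals[n - 1] += sum(h_counts.values())

    # Without smoothing, a zero match count at any order gives a score of 0.
    if min(matches) == 0:
        return 0.0

    # Geometric mean of the modified n-gram precisions.
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty for hypotheses shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * bp * math.exp(log_precision)


if __name__ == "__main__":
    hyps = ["the cat sat on the mat"]
    refs = ["the cat sat on a mat"]
    print(round(corpus_bleu(hyps, refs), 2))
```

Published scores are normally computed with a standard tool and a fixed tokenization, so numbers from this sketch are not comparable to the 8.03 reported above.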

Code & Models

Repositories

imatge-upc/slt_how2sign_wicv2023
PyTorch · Official

Taxonomy

Topics: Hand Gesture Recognition Systems · Human Pose and Action Recognition · Hearing Impairment and Communication

Methods: Linear Layer · Position-Wise Feed-Forward Layer · Residual Connection · Label Smoothing · Dropout · Byte Pair Encoding · Adam · Softmax · Multi-Head Attention · Layer Normalization