Less is more: concatenating videos for Sign Language Translation from a small set of signs

David Vinicius da Silva, Valter Estevam, and David Menotti

arXiv:2409.01506·cs.CV·September 4, 2024

TL;DR

This paper introduces a method to generate large-scale sign language translation datasets by concatenating short isolated sign clips, enabling effective training with limited original data and reducing costs.

Contribution

It proposes a novel data augmentation approach using clip concatenation to expand training datasets for Sign Language Translation models from small sign vocabularies.

Findings

01 Generated training datasets of approximately 170K, 300K, and 500K videos

02 Achieved a BLEU-4 score of 9.2% and a METEOR score of 26.2%

03 Demonstrated cost-effective dataset creation for sign language translation

Abstract

The limited amount of labeled data for training Brazilian Sign Language (Libras) to Portuguese translation models is a challenging problem, owing to the cost of collecting and annotating video. This paper proposes generating sign language content by concatenating short clips containing isolated signs, and using that content to train Sign Language Translation models. We employ the V-LIBRASIL dataset, composed of 4,089 sign videos for 1,364 signs, each interpreted by at least three people, to create hundreds of thousands of sentences with their respective Libras translations, which are then used to train the model. More specifically, we run several experiments varying the vocabulary size and sentence structure, generating datasets with approximately 170K, 300K, and 500K videos. Our results achieve meaningful scores of 9.2% and 26.2% for BLEU-4 and METEOR, respectively. Our technique enables the creation or extension of…
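
The concatenation idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the glosses, the slot-based sentence templates, the helper names (`make_toy_inventory`, `concatenate_sentence`, `generate_dataset`), and the use of strings in place of video frames are all assumptions made for illustration, and the gloss sequence stands in for the Portuguese target sentence.

```python
import random
from typing import Dict, List, Tuple

Clip = List[str]  # a clip is a list of frame labels standing in for real frames


def make_toy_inventory(glosses: List[str], n_interpreters: int = 3,
                       frames_per_clip: int = 4) -> Dict[str, List[Clip]]:
    """Hypothetical inventory: one isolated-sign clip per interpreter per gloss."""
    return {
        g: [[f"{g}/p{p}/f{i}" for i in range(frames_per_clip)]
            for p in range(n_interpreters)]
        for g in glosses
    }


def concatenate_sentence(glosses: List[str], inventory: Dict[str, List[Clip]],
                         rng: random.Random) -> Clip:
    """Build one sentence video by joining one randomly chosen clip per gloss."""
    frames: Clip = []
    for g in glosses:
        frames.extend(rng.choice(inventory[g]))  # pick any interpreter's clip
    return frames


def generate_dataset(templates: List[List[List[str]]],
                     inventory: Dict[str, List[Clip]],
                     n_samples: int, seed: int = 0) -> List[Tuple[str, Clip]]:
    """Sample sentences from slot templates and pair each with its video.

    A template is a list of slots; each slot lists the glosses allowed there.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        template = rng.choice(templates)
        glosses = [rng.choice(slot) for slot in template]
        samples.append((" ".join(glosses),
                        concatenate_sentence(glosses, inventory, rng)))
    return samples


if __name__ == "__main__":
    # Hypothetical vocabulary and a single subject-verb-object template.
    subjects, verbs, objects = ["EU", "VOCE"], ["GOSTAR", "QUERER"], ["CASA", "LIVRO"]
    inventory = make_toy_inventory(subjects + verbs + objects)
    for text, frames in generate_dataset([[subjects, verbs, objects]], inventory, 3):
        print(text, len(frames))
```

Because both the sentence and the interpreter for each sign are sampled independently, a small inventory of isolated clips can yield many distinct sentence videos, which is the effect the abstract relies on to reach hundreds of thousands of training examples.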

Code & Models

Repositories

DavidVinicius/concatenating-videos-for-sign-language-translation
Official

Taxonomy

Topics: Hand Gesture Recognition Systems · Human Pose and Action Recognition · Hearing Impairment and Communication