SkelCap: Automated Generation of Descriptive Text from Skeleton Keypoint Sequences

Ali Emre Keskin; Hacer Yalim Keles

arXiv:2405.02977 · cs.CV · July 24, 2024

TL;DR

This paper introduces SkelCap, a model that generates textual descriptions from skeleton keypoint sequences, and presents a new dataset based on Turkish sign language to improve sign language understanding.

Contribution

The paper builds a new dataset and a baseline model for translating skeleton keypoint sequences into descriptive text, addressing data scarcity in sign language research.

Findings

01. ROUGE-L score of 0.98, indicating that generated descriptions closely match the reference text

02. BLEU-4 score of 0.94, indicating strong n-gram agreement with the reference text

03. The model performs well in signer-agnostic evaluations
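
To make the ROUGE-L figure above easier to interpret, here is a minimal sketch of how a sentence-level ROUGE-L F1 score can be computed from the longest common subsequence (LCS) of tokens. This is an illustration, not the paper's evaluation code: whitespace tokenization, the unweighted F1 form, and the example sentences are all assumptions, and published ROUGE implementations differ in tokenization and weighting.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """Sentence-level ROUGE-L as an unweighted F1 over LCS precision and recall.

    Assumption: plain whitespace tokenization, no stemming or casing rules.
    """
    ref, cand = reference.split(), candidate.split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


# Hypothetical movement descriptions, invented for illustration only.
reference = "the right hand moves up"
candidate = "the left hand moves up"
score = rouge_l_f1(reference, candidate)
```

With these two hypothetical sentences, four of the five tokens line up in order, so precision and recall are both 4/5. A score near 1.0, like the one reported above, means the generated text shares almost its entire token sequence with the reference.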

Abstract

Numerous sign language datasets exist, yet they typically cover only a limited selection of the thousands of signs used globally. Moreover, creating diverse sign language datasets is an expensive and challenging task due to the costs associated with gathering a varied group of signers. Motivated by these challenges, we aimed to develop a solution that addresses these limitations. In this context, we focused on textually describing body movements from skeleton keypoint sequences, leading to the creation of a new dataset. We structured this dataset around AUTSL, a comprehensive isolated Turkish sign language dataset. We also developed a baseline model, SkelCap, which can generate textual descriptions of body movements. This model processes the skeleton keypoints data as a vector, applies a fully connected layer for embedding, and utilizes a transformer neural network for…
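
The input stage the abstract outlines (keypoints treated as a vector, then a fully connected layer for embedding) can be sketched roughly as follows. This is a dependency-free illustration under stated assumptions, not the authors' implementation: the keypoint count, the embedding size, the 2D coordinates, and the random weights are all hypothetical, and the transformer stage is left out because the abstract is truncated here.

```python
import random


def flatten_frame(keypoints):
    """Turn one frame's [(x, y), ...] keypoints into a flat [x0, y0, x1, y1, ...] vector."""
    return [coord for point in keypoints for coord in point]


def make_linear(in_dim, out_dim, seed=0):
    """Create placeholder weights and a zero bias for a fully connected layer.

    Assumption: random initialisation stands in for learned parameters.
    """
    rng = random.Random(seed)
    scale = 1.0 / in_dim ** 0.5
    weights = [[rng.uniform(-scale, scale) for _ in range(in_dim)]
               for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weights, bias


def linear(vector, weights, bias):
    """Apply y = W x + b to a single input vector."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def embed_sequence(frames, weights, bias):
    """Embed every frame of a keypoint sequence independently."""
    return [linear(flatten_frame(frame), weights, bias) for frame in frames]


# Hypothetical sizes, chosen only to keep the example small.
NUM_KEYPOINTS = 4
EMBED_DIM = 8

# Three synthetic frames of (x, y) keypoints.
frames = [[(0.1 * t, 0.2 * k) for k in range(NUM_KEYPOINTS)] for t in range(3)]
weights, bias = make_linear(in_dim=2 * NUM_KEYPOINTS, out_dim=EMBED_DIM)
embedded = embed_sequence(frames, weights, bias)  # one EMBED_DIM vector per frame
```

Each frame becomes one embedding vector, so a sequence of T frames yields T vectors, which is the shape a sequence model would take as input. In practice this step would normally be a learned layer in a deep learning framework rather than hand-written list arithmetic.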
