SkelCap: Automated Generation of Descriptive Text from Skeleton Keypoint Sequences
Ali Emre Keskin, Hacer Yalim Keles

TL;DR
This paper introduces SkelCap, a model that generates textual descriptions from skeleton keypoint sequences, and presents a new dataset based on Turkish sign language to improve sign language understanding.
Contribution
The paper contributes a new dataset and a baseline model for translating skeleton keypoint sequences into descriptive text, addressing data scarcity in sign language research.
Findings
ROUGE-L score of 0.98, indicating that generated text closely matches the reference descriptions
BLEU-4 score of 0.94, indicating strong n-gram overlap (up to 4-grams) with the reference descriptions
Model performs well in signer-agnostic evaluations
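To make the ROUGE-L figure above concrete, here is a minimal sketch of how a sentence-level ROUGE-L score can be computed from the longest common subsequence (LCS) of tokens. It assumes whitespace tokenization, lowercasing, and the balanced F1 variant; the summary does not say which ROUGE-L variant or tokenizer the authors used, and the example sentences are invented for illustration rather than taken from the dataset.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """Sentence-level ROUGE-L as the F1 of LCS-based precision and recall.

    Assumption: whitespace tokens, lowercased, beta = 1.
    """
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    recall = lcs / len(ref)        # share of the reference that is covered
    precision = lcs / len(cand)    # share of the candidate that is matched
    return 2 * precision * recall / (precision + recall)


# Hypothetical description pair, not from the paper's dataset.
reference = "the right hand moves up to the chin"
candidate = "the right hand moves to the chin"
score = rouge_l_f1(reference, candidate)
print(round(score, 4))
```

In this example the candidate drops one word of the reference, so the LCS covers 7 of 8 reference tokens and all 7 candidate tokens. A corpus-level score near 0.98 would mean generated descriptions are, on average, nearly identical in word order and content to the references.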
Abstract
Numerous sign language datasets exist, yet they typically cover only a limited selection of the thousands of signs used globally. Moreover, creating diverse sign language datasets is an expensive and challenging task due to the costs associated with gathering a varied group of signers. Motivated by these challenges, we aimed to develop a solution that addresses these limitations. In this context, we focused on textually describing body movements from skeleton keypoint sequences, leading to the creation of a new dataset. We structured this dataset around AUTSL, a comprehensive isolated Turkish sign language dataset. We also developed a baseline model, SkelCap, which can generate textual descriptions of body movements. This model processes the skeleton keypoints data as a vector, applies a fully connected layer for embedding, and utilizes a transformer neural network for…
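The input stage the abstract describes (each frame's skeleton keypoints treated as a vector, then a fully connected layer for embedding) can be sketched as below. This is an illustration only, not the authors' implementation: the keypoint count, the 2D coordinates, the embedding size, the random weights, and all function names are assumptions, and pure Python is used to keep the sketch dependency-free where a real model would use a deep learning framework. The sketch stops at the embedded sequence that a transformer would consume.

```python
import random


def flatten_frame(keypoints):
    """Flatten one frame of (x, y) keypoints into [x0, y0, x1, y1, ...]."""
    return [coord for point in keypoints for coord in point]


def make_linear(in_dim, out_dim, seed=0):
    """Create hypothetical fully connected layer parameters (random, untrained)."""
    rng = random.Random(seed)
    weight = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
              for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def linear(vector, weight, bias):
    """Apply y = W x + b to a single input vector."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weight, bias)]


def embed_sequence(frames, weight, bias):
    """Embed every frame; the result is one vector (token) per frame."""
    return [linear(flatten_frame(frame), weight, bias) for frame in frames]


# Assumed toy sizes, chosen only for illustration.
NUM_FRAMES = 3
NUM_KEYPOINTS = 5
EMBED_DIM = 8

rng = random.Random(1)
frames = [[(rng.random(), rng.random()) for _ in range(NUM_KEYPOINTS)]
          for _ in range(NUM_FRAMES)]

weight, bias = make_linear(NUM_KEYPOINTS * 2, EMBED_DIM)
tokens = embed_sequence(frames, weight, bias)
print(len(tokens), len(tokens[0]))
```

Each frame becomes one embedding vector, so a clip of `NUM_FRAMES` frames yields a sequence of `NUM_FRAMES` tokens of width `EMBED_DIM`.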
