SignIT: A Comprehensive Dataset and Multimodal Analysis for Italian Sign Language Recognition
Alessia Micieli, Giovanni Maria Farinella, Francesco Ragusa

TL;DR
SignIT introduces a new Italian Sign Language dataset with multimodal annotations and benchmarks, revealing current model limitations and paving the way for improved recognition methods.
Contribution
The paper provides a comprehensive LIS dataset with multimodal annotations and benchmarks, highlighting the challenges and guiding future research in sign language recognition.
Findings
State-of-the-art models have limited performance on SignIT.
Temporal information, 2D keypoints, and RGB frames each influence recognition accuracy.
The dataset enables evaluation of sign language recognition methods.
Abstract
In this work we present SignIT, a new dataset to study the task of Italian Sign Language (LIS) recognition. The dataset is composed of 644 videos covering 3.33 hours. We manually annotated the videos using a taxonomy of 94 distinct sign classes belonging to 5 macro-categories: Animals, Food, Colors, Emotions and Family. We also extracted 2D keypoints related to the hands, face and body of the users. With the dataset, we propose a benchmark for the sign recognition task, adopting several state-of-the-art models to show how temporal information, 2D keypoints and RGB frames can influence the performance of these models. Results show the limitations of these models on this challenging LIS dataset. We release data and annotations at the following link: https://fpv-iplab.github.io/SignIT/.
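To put the abstract's figures in perspective, the short sketch below derives two back-of-the-envelope numbers from them: the mean video duration and the accuracy a uniform random guesser would reach on a 94-class task. The function names are illustrative and not part of the SignIT release, and the mean duration says nothing about how clip lengths are actually distributed.

```python
# Figures quoted in the abstract.
NUM_VIDEOS = 644
TOTAL_HOURS = 3.33
NUM_CLASSES = 94


def mean_video_seconds(num_videos: int, total_hours: float) -> float:
    """Average duration of one video, in seconds."""
    return total_hours * 3600 / num_videos


def uniform_chance_accuracy(num_classes: int) -> float:
    """Expected accuracy of guessing a class uniformly at random."""
    return 1 / num_classes


avg_seconds = mean_video_seconds(NUM_VIDEOS, TOTAL_HOURS)
chance = uniform_chance_accuracy(NUM_CLASSES)

print(f"Mean video duration: {avg_seconds:.1f} s")
print(f"Uniform-chance accuracy over {NUM_CLASSES} classes: {chance:.2%}")
```

The chance level is a useful floor when reading benchmark results: a model described as having limited performance should still be compared against roughly 1% accuracy, not against 50%.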
Taxonomy
Topics: Hand Gesture Recognition Systems · Hearing Impairment and Communication · Face Recognition and Analysis
