RoCoISLR: A Romanian Corpus for Isolated Sign Language Recognition
Cătălin-Alexandru Rîpanu, Andrei-Theodor Hotnog, Giulia-Stefania Imbrea, Dumitru-Clementin Cercel

TL;DR
This paper introduces RoCoISLR, a new Romanian sign language dataset with over 9,000 videos, and benchmarks seven video recognition models on it, finding that transformer architectures outperform convolutional ones for isolated sign language recognition.
Contribution
The paper presents the first large-scale, standardized Romanian sign language dataset and provides benchmark evaluations of state-of-the-art models for isolated sign language recognition.
Findings
Transformer models outperform convolutional models
Swin Transformer achieved 34.1% Top-1 accuracy
Challenges due to long-tail class distributions in low-resource sign languages
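Top-1 accuracy, the metric behind the 34.1% figure above, is the fraction of test clips whose highest-scoring predicted gloss matches the ground-truth gloss. A score of 34.1% therefore means roughly one in three test clips is labeled correctly on the first guess. The following is a minimal sketch of the metric, generalized to Top-k, assuming each model outputs one score per gloss class. The function name `top_k_accuracy` is hypothetical and is not taken from the paper's evaluation code.

```python
from typing import Sequence


def top_k_accuracy(scores: Sequence[Sequence[float]],
                   labels: Sequence[int],
                   k: int = 1) -> float:
    """Fraction of samples whose true class is among the k highest-scoring classes.

    scores[i][c] is the (hypothetical) model score of class c for clip i;
    labels[i] is the ground-truth class index of clip i.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("need at least one sample")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score; Python's sort is stable,
        # so ties keep the lower class index first.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return hits / len(labels)


# Toy example with three clips and three classes (illustrative data only).
scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [1, 1, 2]
print(top_k_accuracy(scores, labels, k=1))  # 2 of 3 clips correct
print(top_k_accuracy(scores, labels, k=2))  # all 3 clips within the top 2
```

With thousands of gloss classes and only a few training clips for most of them, Top-1 is a strict criterion. This is one reason benchmarks in this area often report Top-5 alongside it.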
Abstract
Automatic sign language recognition plays a crucial role in bridging the communication gap between deaf communities and hearing individuals; however, most available datasets focus on American Sign Language. For Romanian Isolated Sign Language Recognition (RoISLR), no large-scale, standardized dataset exists, which limits research progress. In this work, we introduce a new corpus for RoISLR, named RoCoISLR, comprising over 9,000 video samples that span nearly 6,000 standardized glosses from multiple sources. We establish benchmark results by evaluating seven state-of-the-art video recognition models (I3D, SlowFast, Swin Transformer, TimeSformer, Uniformer, VideoMAE, and PoseConv3D) under consistent experimental setups, and compare their performance with that of the widely used WLASL2000 corpus. According to the results, transformer-based architectures outperform convolutional baselines;…
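Over 9,000 clips spread across nearly 6,000 glosses works out to an average of only about 1.5 videos per gloss. This suggests a long-tailed distribution in which many glosses have just one or two examples. The following is a minimal sketch for quantifying that from a list of per-clip gloss labels. The function `gloss_frequency_summary`, its output fields, and the placeholder gloss names are hypothetical and are not part of any released RoCoISLR tooling.

```python
from collections import Counter
from typing import Iterable


def gloss_frequency_summary(gloss_labels: Iterable[str]) -> dict:
    """Summarize how video samples are distributed across gloss classes.

    gloss_labels holds one (hypothetical) gloss string per video clip.
    """
    counts = Counter(gloss_labels)
    n_glosses = len(counts)
    if n_glosses == 0:
        raise ValueError("no labels given")
    n_samples = sum(counts.values())
    return {
        "samples": n_samples,
        "glosses": n_glosses,
        "mean_per_gloss": n_samples / n_glosses,
        # Glosses seen exactly once cannot appear in both a train and a test split.
        "singleton_glosses": sum(1 for c in counts.values() if c == 1),
        "max_per_gloss": max(counts.values()),
    }


# Placeholder labels, not real RoCoISLR glosses.
print(gloss_frequency_summary(["GLOSS_A", "GLOSS_A", "GLOSS_A",
                               "GLOSS_B", "GLOSS_C", "GLOSS_D"]))
```

Counting singleton glosses is a quick way to see how much of the vocabulary is effectively few-shot or unseen at test time.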
Taxonomy
Topics: Hand Gesture Recognition Systems · Hearing Impairment and Communication · Interactive and Immersive Displays
