RoCoISLR: A Romanian Corpus for Isolated Sign Language Recognition

Cătălin-Alexandru Rîpanu; Andrei-Theodor Hotnog; Giulia-Stefania Imbrea; Dumitru-Clementin Cercel

arXiv:2511.12767·cs.CV·November 18, 2025

Open Access

TL;DR

This paper introduces RoCoISLR, a Romanian sign language dataset with over 9,000 videos spanning nearly 6,000 glosses, and benchmarks seven video recognition models on it; transformer-based architectures outperform the convolutional baselines.

Contribution

The paper presents the first large-scale, standardized Romanian sign language dataset and provides benchmark evaluations of state-of-the-art models for isolated sign language recognition.

Findings

01 Transformer-based models outperform convolutional baselines

02 Swin Transformer achieved 34.1% Top-1 accuracy

03 Long-tail class distributions remain a challenge in low-resource sign languages

Abstract

Automatic sign language recognition plays a crucial role in bridging the communication gap between deaf communities and hearing individuals; however, most available datasets focus on American Sign Language. For Romanian Isolated Sign Language Recognition (RoISLR), no large-scale, standardized dataset exists, which limits research progress. In this work, we introduce a new corpus for RoISLR, named RoCoISLR, comprising over 9,000 video samples that span nearly 6,000 standardized glosses from multiple sources. We establish benchmark results by evaluating seven state-of-the-art video recognition models (I3D, SlowFast, Swin Transformer, TimeSformer, Uniformer, VideoMAE, and PoseConv3D) under consistent experimental setups, and compare their performance with that on the widely used WLASL2000 corpus. According to the results, transformer-based architectures outperform convolutional baselines;…
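The benchmark figures on this page are Top-1 accuracies, and the findings point to long-tail class distributions as a difficulty. As a rough illustration of both notions, the sketch below computes Top-k accuracy from per-class scores and counts samples per class. It is not the authors' evaluation code: the function names `top_k_accuracy` and `samples_per_class` and the toy scores and labels are hypothetical and chosen only for this example.

```python
from collections import Counter


def top_k_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted by descending score; keep the first k.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)


def samples_per_class(labels):
    """Count how many samples each class has; many classes with few samples indicates a long tail."""
    return Counter(labels)


# Hypothetical toy data: 4 samples, 3 classes (not taken from RoCoISLR).
scores = [
    [0.7, 0.2, 0.1],
    [0.1, 0.3, 0.6],
    [0.5, 0.4, 0.1],
    [0.2, 0.5, 0.3],
]
labels = [0, 1, 1, 1]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
counts = samples_per_class(labels)
```

With over 9,000 videos spread across nearly 6,000 glosses, most classes can have only one or two samples, which is the kind of imbalance a per-class count like `counts` would expose.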

Taxonomy

Topics: Hand Gesture Recognition Systems · Hearing Impairment and Communication · Interactive and Immersive Displays