Bangla Sign Language Translation: Dataset Creation Challenges, Benchmarking and Prospects

Husne Ara Rubaiyeat; Hasan Mahmud; Md Kamrul Hasan

arXiv:2511.21533·cs.CL·November 27, 2025

Open Access

TL;DR

This paper introduces a new dataset for Bangla Sign Language translation, discusses challenges in dataset creation, benchmarks initial methods, and explores future research directions to aid deaf communities.

Contribution

It presents the IsharaKhobor dataset and its subsets, addressing low-resource challenges and providing benchmarks for Bangla Sign Language translation research.

Findings

01. Created the IsharaKhobor dataset and its subsets

02. Benchmarked with landmark-based raw and RQE embeddings

03. Ran an ablation study on vocabulary restriction and canonicalization

Abstract

Bangla Sign Language Translation (BdSLT) has so far been severely constrained because the language itself is very low-resource. Creating a standard sentence-level dataset for BdSLT is of immense importance for developing AI-based assistive tools for deaf and hard-of-hearing people in the Bangla-speaking community. In this paper, we present a dataset, IsharaKhobor, and two subsets of it to enable research. We also describe the challenges of developing the dataset and suggest a way forward by benchmarking with landmark-based raw and RQE embeddings. We perform ablations on vocabulary restriction and canonicalization within the dataset, which resulted in two more datasets, IsharaKhobor_small and IsharaKhobor_canonical_small. The dataset is publicly available at: www.kaggle.com/datasets/hasanssl/isharakhobor [1].
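
The abstract does not say how vocabulary restriction or canonicalization was carried out, so the following is only a minimal sketch of one plausible reading: map surface variants to a canonical token, then keep only sentences whose tokens all occur often enough. The function name `restrict_vocabulary`, the `min_count` threshold, the `canonical_map` argument, and the toy tokens are all assumptions made for illustration and are not taken from the paper.

```python
from collections import Counter


def restrict_vocabulary(sentences, min_count=2, canonical_map=None):
    """Return the sentences whose every token occurs at least min_count times.

    canonical_map is a hypothetical dict that maps a surface variant to one
    canonical token; it is applied before the token counts are computed.
    """
    canonical_map = canonical_map or {}
    # Whitespace tokenization is an assumption; real text may need more care.
    tokenized = [
        [canonical_map.get(tok, tok) for tok in sentence.split()]
        for sentence in sentences
    ]
    counts = Counter(tok for toks in tokenized for tok in toks)
    return [
        " ".join(toks)
        for toks in tokenized
        if toks and all(counts[tok] >= min_count for tok in toks)
    ]


# Toy placeholder tokens, not drawn from IsharaKhobor.
toy = ["a b", "a c", "b a", "d"]
small = restrict_vocabulary(toy, min_count=2)
canonical_small = restrict_vocabulary(toy, min_count=2, canonical_map={"c": "b"})
```

In this toy case, canonicalizing first lets a sentence containing a rare variant survive the frequency filter, which is one reason the two steps might be studied together.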

Taxonomy

Topics: Hand Gesture Recognition Systems · Hearing Impairment and Communication · Natural Language Processing Techniques