Introducing A Bangla Sentence-Gloss Pair Dataset for Bangla Sign Language Translation and Research
Neelavro Saha, Rafi Shahriyar, Nafis Ashraf Roudra, Saadman Sakib, Annajiat Alim Rasel

TL;DR
This paper introduces Bangla-SGP, a new dataset of 1,000 human-annotated Bangla sentence-gloss pairs augmented with around 3,000 synthetic pairs, to advance Bangla Sign Language translation research, and evaluates transformer models on it.
Contribution
The creation of a novel, high-quality Bangla sentence-gloss dataset with synthetic augmentation and the evaluation of transformer models for sentence-to-gloss translation.
Findings
Transformer models achieved promising BLEU scores on the dataset.
Synthetic data augmentation improved translation performance.
Model performance was compared with the RWTH-PHOENIX-2014T benchmark.
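The findings report BLEU scores for sentence-to-gloss translation. To illustrate what that metric measures on gloss sequences, here is a minimal sketch of standard corpus-level BLEU: clipped n-gram precision up to 4-grams, uniform weights, a brevity penalty, one reference per hypothesis, and no smoothing. The function names and the placeholder gloss tokens are assumptions, not taken from the paper. The published scores may come from a library implementation with different tokenization or smoothing, so values from this sketch are not directly comparable to them.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU (0-100) with one reference per hypothesis,
    uniform n-gram weights, and no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            hyp_ngrams = ngrams(hyp, n)
            ref_ngrams = ngrams(ref, n)
            # Clipping: a hypothesis n-gram is credited at most as often
            # as it appears in the reference.
            matches[n - 1] += sum(min(c, ref_ngrams[g]) for g, c in hyp_ngrams.items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    # Without smoothing, any zero n-gram precision makes the geometric mean zero.
    if min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Penalize hypotheses that are shorter overall than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)


# Placeholder gloss tokens, for illustration only.
hyp = [["G_BOY", "G_RICE", "G_EAT", "G_NOW"]]
ref = [["G_BOY", "G_RICE", "G_EAT", "G_NOW", "G_HOME"]]
print(corpus_bleu(hyp, ref))
```

In the usage example every n-gram precision is 1, so the score is reduced only by the brevity penalty, because the hypothesis has one gloss fewer than the reference.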
Abstract
Bangla Sign Language (BdSL) translation is a low-resource NLP task due to the lack of large-scale datasets that address sentence-level translation. Consequently, existing research in this field has been limited to word- and alphabet-level detection. In this work, we introduce Bangla-SGP, a novel parallel dataset of 1,000 human-annotated sentence-gloss pairs, augmented with around 3,000 synthetically generated pairs produced from syntactic and morphological rules through a rule-based Retrieval-Augmented Generation (RAG) pipeline. The gloss sequence for each spoken Bangla sentence is made up of individual glosses, which are Bangla sign-supported words and serve as an intermediate representation of a continuous sign. Our dataset consists of 1,000 high-quality Bangla sentences, each manually annotated into a gloss sequence by a professional signer. The augmentation…
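The abstract names a rule-based RAG pipeline for augmentation but does not describe its rules. The following is a hypothetical sketch of one way such a pipeline could work, not the authors' method. First, it retrieves seed pairs whose sentences overlap most with a query; Jaccard token overlap is an assumed similarity measure. Second, it applies a single syntactic rule: it swaps a word for another word of the same category in the sentence and its gloss together, so the two stay aligned. `Pair`, `LEXICON`, `retrieve`, `substitute`, and `augment` are illustrative names, and all tokens are placeholders rather than real Bangla words or BdSL glosses. Morphological rules such as inflection are not modeled.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    sentence: tuple[str, ...]  # spoken-language tokens
    gloss: tuple[str, ...]     # gloss tokens


# HYPOTHETICAL aligned lexicon: word -> (gloss token, syntactic category).
# Placeholder tokens only; not real Bangla words or BdSL glosses.
LEXICON = {
    "w_boy": ("G_BOY", "NOUN"),
    "w_girl": ("G_GIRL", "NOUN"),
    "w_teacher": ("G_TEACHER", "NOUN"),
    "w_eats": ("G_EAT", "VERB"),
    "w_reads": ("G_READ", "VERB"),
}


def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def retrieve(query, corpus, k):
    """Retrieval step (assumed): the k seed pairs whose sentences share the most tokens with the query."""
    return sorted(corpus, key=lambda p: jaccard(query, p.sentence), reverse=True)[:k]


def substitute(pair, rng):
    """Rule step (hypothetical): swap one lexicon word for another word of the same
    category, editing the sentence and its gloss together so they stay aligned."""
    slots = [i for i, w in enumerate(pair.sentence)
             if w in LEXICON and LEXICON[w][0] in pair.gloss]
    if not slots:
        return None
    i = rng.choice(slots)
    old_gloss, category = LEXICON[pair.sentence[i]]
    options = [w for w, (_, c) in LEXICON.items() if c == category and w != pair.sentence[i]]
    if not options:
        return None
    new_word = rng.choice(options)
    j = pair.gloss.index(old_gloss)
    return Pair(pair.sentence[:i] + (new_word,) + pair.sentence[i + 1:],
                pair.gloss[:j] + (LEXICON[new_word][0],) + pair.gloss[j + 1:])


def augment(query, corpus, k=2, attempts=20, seed=0):
    """Generate new, de-duplicated pairs from the seeds most similar to the query."""
    rng = random.Random(seed)
    seen = set(corpus)
    new_pairs = []
    for seed_pair in retrieve(query, corpus, k):
        for _ in range(attempts):
            candidate = substitute(seed_pair, rng)
            if candidate is not None and candidate not in seen:
                seen.add(candidate)
                new_pairs.append(candidate)
    return new_pairs


corpus = [
    Pair(("w_boy", "w_eats", "rice"), ("G_BOY", "RICE", "G_EAT")),
    Pair(("w_girl", "w_reads", "book"), ("G_GIRL", "BOOK", "G_READ")),
]
for p in augment(("w_boy", "w_eats"), corpus):
    print(p.sentence, "->", p.gloss)
```

Editing both sides of a pair in one step is what keeps each synthetic sentence consistent with its gloss. A real pipeline would also need the morphological agreement rules the abstract mentions.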
Taxonomy
Topics: Hand Gesture Recognition Systems · Hearing Impairment and Communication · Natural Language Processing Techniques
