TL;DR
BdSL-SPOTER is a transformer-based Bengali Sign Language recognition framework that improves accuracy and efficiency through cultural adaptation, curriculum learning, and a compact model suitable for real-world applications.
Contribution
It introduces a novel pose-based transformer model with cultural preprocessing and curriculum learning for Bengali Sign Language recognition, achieving high accuracy with low computational costs.
Findings
97.92% Top-1 accuracy on the BdSLW60 benchmark
22.82% improvement over the Bi-LSTM baseline
Fewer parameters and FLOPs for real-time use
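The summary does not say whether the 22.82% gain is absolute (percentage points of Top-1 accuracy) or relative. The two readings imply different Bi-LSTM baselines. The values below are derived here by arithmetic only; they are not figures reported in this summary:

```latex
\begin{aligned}
\text{absolute gain: } & a_{\text{Bi-LSTM}} = 97.92\% - 22.82\% = 75.10\% \\
\text{relative gain: } & a_{\text{Bi-LSTM}} = \frac{97.92\%}{1 + 0.2282} \approx 79.73\%
\end{aligned}
```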
Abstract
We introduce BdSL-SPOTER, a pose-based transformer framework for accurate and efficient recognition of Bengali Sign Language (BdSL). BdSL-SPOTER extends the SPOTER paradigm with culture-specific preprocessing and a compact four-layer transformer encoder featuring optimized learnable positional encodings, while employing curriculum learning to enhance generalization on limited data and accelerate convergence. On the BdSLW60 benchmark, it achieves 97.92% Top-1 validation accuracy, representing a 22.82% improvement over the Bi-LSTM baseline, all while keeping computational costs low. With fewer parameters, lower FLOPs, and higher FPS, BdSL-SPOTER provides a practical framework for real-world accessibility applications and serves as a scalable model for other low-resource regional sign languages.
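The abstract does not specify the curriculum it uses. As a hedged illustration only, one common scheme is the square-root competence schedule from competence-based curriculum learning (Platanios et al., 2019): training videos are sorted by a difficulty score and the training pool widens each epoch until it covers the whole set. The difficulty score (for example clip length or keypoint motion energy) and the names `competence`, `curriculum_pool`, `ramp_epochs`, and `start` are hypothetical, not taken from the paper:

```python
import math
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def competence(epoch: int, ramp_epochs: int, start: float = 0.2) -> float:
    """Fraction of the difficulty-sorted training set available at `epoch`.

    Square-root pacing: begins at `start` and reaches 1.0 at `ramp_epochs`.
    """
    if epoch >= ramp_epochs:
        return 1.0
    return math.sqrt(start ** 2 + (1.0 - start ** 2) * epoch / ramp_epochs)


def curriculum_pool(samples: Sequence[T], difficulty: Sequence[float],
                    epoch: int, ramp_epochs: int, start: float = 0.2) -> List[T]:
    """Return the easiest samples allowed at this epoch, easiest first."""
    order = sorted(range(len(samples)), key=lambda i: difficulty[i])
    k = max(1, math.ceil(competence(epoch, ramp_epochs, start) * len(samples)))
    return [samples[i] for i in order[:k]]


if __name__ == "__main__":
    videos = [f"clip_{i}" for i in range(10)]  # placeholder sample IDs
    scores = [0.9, 0.1, 0.5, 0.3, 0.7, 0.2, 0.8, 0.4, 0.6, 1.0]  # hypothetical difficulty
    rng = random.Random(0)
    for epoch in range(0, 12, 3):
        pool = curriculum_pool(videos, scores, epoch, ramp_epochs=10)
        rng.shuffle(pool)  # shuffle within the admitted pool before batching
        print(epoch, len(pool))
```

Sorting once and slicing keeps the schedule deterministic; only the order inside the admitted pool is randomized each epoch.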
Peer Reviews
Decision: Submitted to ICLR 2026
1. Cultural Adaptation: The model addresses cultural differences in BdSL through novel techniques, including cultural regularization and motion-aware attention biasing. These methods help the model better understand and adapt to BdSL’s unique signing conventions, making it culturally sensitive.
2. Dataset and Experimental Design: The paper uses the BdSLW60 dataset, which contains 9,307 videos, ensuring a diverse and representative dataset. The rigorous signer-independent 5-fold cross-validation
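As a minimal sketch of the signer-independent protocol credited in the review above, assuming each video carries a signer ID, the split below guarantees that no signer appears in both the training and validation sets of any fold. The round-robin assignment of signers to folds and the function name `signer_independent_folds` are assumptions; the paper's actual fold assignment is not given here:

```python
from typing import List, Sequence, Tuple


def signer_independent_folds(
    signer_ids: Sequence[str], k: int = 5
) -> List[Tuple[List[int], List[int]]]:
    """Build k (train, val) index splits in which no signer is in both sets.

    Distinct signers are sorted and dealt round-robin into k groups; each
    group is held out exactly once as the validation fold.
    """
    signers = sorted(set(signer_ids))
    if len(signers) < k:
        raise ValueError("need at least k distinct signers")
    folds = []
    for f in range(k):
        held_out = set(signers[f::k])  # signers reserved for this fold's validation set
        val = [i for i, s in enumerate(signer_ids) if s in held_out]
        train = [i for i, s in enumerate(signer_ids) if s not in held_out]
        folds.append((train, val))
    return folds
```

With 18 signers and k = 5, this assignment puts 4, 4, 4, 3, and 3 signers in the validation folds. scikit-learn's `GroupKFold`, given the signer IDs as `groups`, enforces the same no-overlap constraint but balances sample counts across folds instead of dealing signers round-robin.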
1. Single Dataset Limitation: Although the BdSLW60 dataset is a valuable resource, the model’s evaluation is based on a single dataset. The lack of cross-dataset validation limits the generalizability of the model. Future work could benefit from expanding the model’s evaluation to multiple datasets to assess its broader applicability.
2. Limited Signer Diversity: While the BdSLW60 dataset includes 18 signers, the diversity of the signers (in terms of age, gender, and regional variations) is sti
* Competitive results and ablations. Reported Top-1, Top-5, and Macro-F1 are high relative to listed baselines, and each cultural component contributes.
* Method description is clear enough to reproduce. The components are specified with equations and objectives, and the training pipeline is described at a level that enables re-implementation.
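The three metrics named in this review can be computed from per-class scores as follows. This is a generic, stdlib-only sketch with hypothetical function names; the macro-F1 convention used here (averaging over every class that appears in the labels or predictions, which matches the default of scikit-learn's `f1_score(average="macro")`) is an assumption about how the paper reports it:

```python
from typing import List, Sequence


def argmax_predictions(scores: Sequence[Sequence[float]]) -> List[int]:
    """Top-1 predicted class index for each sample."""
    return [max(range(len(row)), key=lambda c: row[c]) for row in scores]


def top_k_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int], k: int) -> float:
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for row, y in zip(scores, labels):
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += y in top_k
    return hits / len(labels)


def macro_f1(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 over classes seen in labels or preds."""
    classes = sorted(set(labels) | set(preds))
    total = 0.0
    for c in classes:
        tp = sum(p == c and y == c for p, y in zip(preds, labels))
        fp = sum(p == c and y != c for p, y in zip(preds, labels))
        fn = sum(p != c and y == c for p, y in zip(preds, labels))
        # Each class occurs in preds or labels, so the denominator is positive.
        total += 2 * tp / (2 * tp + fp + fn)
    return total / len(classes)
```

Macro-F1 weights every sign class equally, so it can fall well below Top-1 accuracy when a few rare signs are misclassified.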
1. Causal support for “cultural” choices is limited. The α choice is motivated by a compactness gap relative to an ASL reference, but comparability across acquisition conditions is not established. A sensitivity study or a learnable α would strengthen the claim.
2. Low-motion attention bias relies on a strong linguistic assumption. Holds can also reflect hesitation or tracking noise. The paper does not show robustness of κ, γ, and ε across sign speeds or styles.
3. Ethics statement is minimal. It
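The review does not reproduce the paper's low-motion bias equation, so the roles given to κ, γ, and ε in this sketch are assumptions: κ is treated as a threshold on peak-normalized per-frame motion, γ as the additive attention-logit bias for "hold" frames, and ε as a small constant guarding the normalization. The bias is added to the attention logits for each key frame before the softmax. All function names are hypothetical:

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[Tuple[float, float]]  # (x, y) keypoints for one video frame


def frame_motion(frames: Sequence[Frame]) -> List[float]:
    """Mean keypoint displacement per frame; frame 0 reuses frame 1's value."""
    disp = [
        sum(math.dist(p, q) for p, q in zip(prev, cur)) / len(cur)
        for prev, cur in zip(frames, frames[1:])
    ]
    return [disp[0] if disp else 0.0] + disp


def low_motion_bias(motion: Sequence[float], kappa: float, gamma: float,
                    eps: float = 1e-6) -> List[float]:
    """Bias gamma for frames whose motion, normalized by the peak, is below kappa."""
    peak = max(motion)
    return [gamma if m / (peak + eps) < kappa else 0.0 for m in motion]


def biased_attention(logits: Sequence[Sequence[float]],
                     key_bias: Sequence[float]) -> List[List[float]]:
    """Row-wise softmax over keys after adding a per-key (per-frame) bias."""
    weights = []
    for row in logits:
        z = [s + b for s, b in zip(row, key_bias)]
        z_max = max(z)  # subtract the max for numerical stability
        e = [math.exp(v - z_max) for v in z]
        total = sum(e)
        weights.append([v / total for v in e])
    return weights
```

Because the bias depends only on keypoint displacement, frames that are still because of hesitation or frozen keypoint tracking receive the same boost as genuine holds, which is the confound raised in the second point.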
Originality: The cultural adaptation approach addresses an important gap in SLR research.
Significance: Focusing on the under-resourced Bengali Sign Language is commendable.
Technical approach: Integrating linguistic insights with a transformer architecture is innovative.
Evaluation: Strict signer-independent cross-validation is methodologically sound.
Incomplete presentation: Missing figures undermine the credibility of claims.
Methodological opacity: Key parameters and implementation details are unclear.
Limited validation: Cultural adaptations need more rigorous evaluation and analysis.
Single-dataset evaluation: Results on only one dataset limit generalizability claims.
