A Comparative Analysis of Recurrent and Attention Architectures for Isolated Sign Language Recognition
Nigar Alishzade, Gulchin Abdullayeva

TL;DR
This paper systematically compares recurrent and attention-based neural architectures for isolated sign language recognition, finding that the Transformer outperforms ConvLSTM in accuracy while ConvLSTM is more computationally efficient.
Contribution
It provides the first comprehensive comparison of ConvLSTM and Transformer models on two sign language datasets (AzSLD and WLASL), highlighting their respective strengths and trade-offs.
Findings
Transformers achieve higher accuracy than ConvLSTM.
ConvLSTM is more computationally efficient.
Transformers show better signer independence.
Abstract
This study presents a systematic comparative analysis of recurrent and attention-based neural architectures for isolated sign language recognition. We implement and evaluate two representative models, ConvLSTM and Vanilla Transformer, on the Azerbaijani Sign Language Dataset (AzSLD) and the Word-Level American Sign Language (WLASL) dataset. Our results demonstrate that the attention-based Vanilla Transformer consistently outperforms the recurrent ConvLSTM in both Top-1 and Top-5 accuracy across datasets, achieving up to 76.8% Top-1 accuracy on AzSLD and 88.3% on WLASL. The ConvLSTM, while more computationally efficient, lags in recognition accuracy, particularly on smaller datasets. These findings highlight the complementary strengths of each paradigm: the Transformer excels in overall accuracy and signer independence, whereas the ConvLSTM offers advantages in computational efficiency and…
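For readers unfamiliar with the Top-1 and Top-5 metrics quoted in the abstract, the following is a minimal sketch of how Top-k accuracy is commonly computed. It is not the authors' evaluation code: the function name `top_k_accuracy`, the toy scores, and the labels are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int = 1,
) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")

    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Hypothetical toy data: 3 clips, 4 sign classes.
scores = [
    [0.1, 0.6, 0.2, 0.1],  # true class 1 is ranked first
    [0.4, 0.3, 0.2, 0.1],  # true class 1 is ranked second
    [0.1, 0.2, 0.3, 0.4],  # true class 0 is ranked last
]
labels = [1, 1, 0]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
```

Top-1 counts a prediction as correct only when the highest-scoring class is the true sign, while Top-5 also accepts any of the five highest-scoring classes, so Top-5 is never lower than Top-1 on the same predictions.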
Taxonomy
Topics: Hand Gesture Recognition Systems · Hearing Impairment and Communication · Interactive and Immersive Displays
