A Comparative Analysis of Recurrent and Attention Architectures for Isolated Sign Language Recognition

Nigar Alishzade; Gulchin Abdullayeva

arXiv:2511.13126 · cs.CL · November 18, 2025

Open Access

TL;DR

This paper systematically compares a recurrent architecture (ConvLSTM) and an attention-based architecture (Vanilla Transformer) for isolated sign language recognition. The Transformer achieves higher Top-1 and Top-5 accuracy on both datasets, while the ConvLSTM is more computationally efficient.

Contribution

It provides a systematic comparison of ConvLSTM and Vanilla Transformer models on two sign language datasets, AzSLD and WLASL, and highlights the strengths and trade-offs of each.

Findings

01 Transformers achieve higher Top-1 and Top-5 accuracy than ConvLSTM on both AzSLD and WLASL.

02 ConvLSTM is more computationally efficient.

03 Transformers show better signer independence.

Abstract

This study presents a systematic comparative analysis of recurrent and attention-based neural architectures for isolated sign language recognition. We implement and evaluate two representative models, ConvLSTM and Vanilla Transformer, on the Azerbaijani Sign Language Dataset (AzSLD) and the Word-Level American Sign Language (WLASL) dataset. Our results demonstrate that the attention-based Vanilla Transformer consistently outperforms the recurrent ConvLSTM in both Top-1 and Top-5 accuracy across datasets, achieving up to 76.8% Top-1 accuracy on AzSLD and 88.3% on WLASL. The ConvLSTM, while more computationally efficient, lags in recognition accuracy, particularly on smaller datasets. These findings highlight the complementary strengths of each paradigm: the Transformer excels in overall accuracy and signer independence, whereas the ConvLSTM offers advantages in computational efficiency and…
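The abstract reports results as Top-1 and Top-5 accuracy. As a minimal sketch of what those metrics measure (not the authors' evaluation code), the function below counts a prediction as correct when the true class is among the k highest-scoring classes. The function name `top_k_accuracy` and the toy scores are hypothetical and chosen only for illustration; they are not data from the paper.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]], labels: Sequence[int], k: int
) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("need at least one sample")
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(scores)


# Toy example: 4 samples over 3 hypothetical sign classes (made-up scores).
scores = [
    [0.7, 0.2, 0.1],  # true class 0: ranked first
    [0.3, 0.5, 0.2],  # true class 0: ranked second
    [0.1, 0.3, 0.6],  # true class 2: ranked first
    [0.5, 0.1, 0.4],  # true class 1: ranked third
]
labels = [0, 0, 2, 1]

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
print(f"Top-1: {top1:.2f}, Top-2: {top2:.2f}")
```

Top-k accuracy can never decrease as k grows, which is why Top-5 figures are always at least as high as Top-1 figures for the same model and dataset.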

Taxonomy

Topics: Hand Gesture Recognition Systems · Hearing Impairment and Communication · Interactive and Immersive Displays