# Pose-Based Static Sign Language Recognition with Deep Learning for Turkish, Arabic, and American Sign Languages

**Authors:** Rıdvan Yayla, Hakan Üçgün, Mahmud Abbas

PMC · DOI: 10.3390/s26020524 · Sensors (Basel, Switzerland) · 2026-01-13

## TL;DR

This paper introduces a deep learning framework for recognizing static signs in Turkish, American, and Arabic sign languages from MediaPipe hand landmarks, and compares ConvNeXt, Swin Transformer, and Vision Mamba on identical feature sets for accuracy and generalization.

## Contribution

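The paper proposes a cross-lingual sign language recognition system and a comparative analysis of CNN (ConvNeXt), ViT (Swin Transformer), and SSM (Vision Mamba) models, using datasets built from nine public-domain sources (four Arabic, three American, and two Turkish).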
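To make the idea of a like-for-like comparison concrete, the sketch below scores several classifiers on the same held-out samples and reports accuracy per language. It is an illustration only, not the authors' evaluation code: the function name `accuracy_by_language`, the sample layout, and the language tags (`"TSL"`, `"ASL"`, `"ArSL"`) are assumptions, and the toy lambdas merely stand in for trained models.

```python
from collections import defaultdict


def accuracy_by_language(models, samples):
    """Score every model on the same samples, grouped by language.

    models:  dict mapping a model name to a callable features -> label
             (hypothetical stand-ins for trained classifiers).
    samples: list of (language, features, true_label) tuples.
    Returns a dict mapping (model_name, language) to accuracy in [0, 1].
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for language, features, label in samples:
        total[language] += 1
        # Every model sees exactly the same features for this sample.
        for name, predict in models.items():
            if predict(features) == label:
                correct[(name, language)] += 1
    return {
        (name, language): correct[(name, language)] / total[language]
        for name in models
        for language in total
    }


# Toy usage with placeholder models and hypothetical language tags.
models = {
    "always_a": lambda features: "A",
    "echo": lambda features: features[0],
}
samples = [
    ("TSL", ["A"], "A"),
    ("TSL", ["B"], "B"),
    ("ASL", ["A"], "A"),
    ("ArSL", ["C"], "C"),
]
scores = accuracy_by_language(models, samples)
```

Holding the samples and features fixed across models means any difference in the resulting scores reflects the models rather than the inputs.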

## Key findings

- Vision Transformers and state space models outperformed CNNs in capturing spatial cues across sign languages.
- Curated datasets from multiple sources improved model generalization across Turkish, American, and Arabic sign languages.

## Abstract

Advancements in artificial intelligence have significantly enhanced communication for individuals with hearing impairments. This study presents a robust cross-lingual Sign Language Recognition (SLR) framework for Turkish, American English, and Arabic sign languages. The system utilizes the lightweight MediaPipe library for efficient hand landmark extraction, ensuring stable and consistent feature representation across diverse linguistic contexts. Datasets were meticulously constructed from nine public-domain sources (four Arabic, three American, and two Turkish). The final training data comprises curated image datasets, with frames for each language carefully selected from varying angles and distances to ensure high diversity. A comprehensive comparative evaluation was conducted across three state-of-the-art deep learning architectures—ConvNeXt (CNN-based), Swin Transformer (ViT-based), and Vision Mamba (SSM-based)—all applied to identical feature sets. The evaluation demonstrates the superior performance of contemporary vision Transformers and state space models in capturing subtle spatial cues across diverse sign languages. Our approach provides a comparative analysis of model generalization capabilities across three distinct sign languages, offering valuable insights for model selection in pose-based SLR systems.
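As an illustration of what a pose-based feature representation can look like, the sketch below turns one hand's landmarks into a vector that does not depend on where the hand sits in the frame or how large it appears. MediaPipe Hands reports 21 landmarks per hand, each with `x`, `y`, and `z` coordinates, with the wrist at index 0. The normalization itself (wrist-centering followed by scaling) and the name `normalize_landmarks` are assumptions for illustration; the abstract does not specify how the authors preprocess landmarks.

```python
import math

NUM_LANDMARKS = 21  # MediaPipe Hands reports 21 landmarks per detected hand
WRIST = 0           # index of the wrist landmark in MediaPipe's hand model


def normalize_landmarks(landmarks):
    """Return a flat feature vector invariant to translation and scale.

    landmarks: sequence of 21 (x, y, z) tuples for a single hand.
    This normalization is an assumed preprocessing step, not one
    confirmed by the paper.
    """
    if len(landmarks) != NUM_LANDMARKS:
        raise ValueError(f"expected {NUM_LANDMARKS} landmarks, got {len(landmarks)}")

    # Translate so the wrist becomes the origin.
    wx, wy, wz = landmarks[WRIST]
    centered = [(x - wx, y - wy, z - wz) for x, y, z in landmarks]

    # Scale by the largest wrist-to-landmark distance.
    scale = max(math.sqrt(x * x + y * y + z * z) for x, y, z in centered)
    if scale == 0.0:
        scale = 1.0  # degenerate input: all points coincide

    # Flatten to [x0, y0, z0, x1, y1, z1, ...], 63 values in total.
    return [coord / scale for point in centered for coord in point]


# Synthetic example: the same hand shape, shifted and enlarged.
hand = [(0.5 + 0.01 * i, 0.5 - 0.02 * i, 0.0) for i in range(NUM_LANDMARKS)]
moved = [(2 * x + 0.1, 2 * y - 0.3, 2 * z) for x, y, z in hand]
features = normalize_landmarks(hand)
features_moved = normalize_landmarks(moved)
```

Because both inputs describe the same hand shape, the two feature vectors agree up to floating-point error, which is the property a classifier trained on images taken from varying angles and distances would benefit from.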

## Full-text entities

- **Diseases:** hearing impairments (MESH:D034381)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12846149/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC12846149/full.md

## References

61 references — full list in the complete paper: https://tomesphere.com/paper/PMC12846149/full.md

---
Source: https://tomesphere.com/paper/PMC12846149