TSLFormer: A Lightweight Transformer Model for Turkish Sign Language Recognition Using Skeletal Landmarks
Kutay Ertürk, Furkan Altınışık, İrem Sarıaltın, Ömer Nezih Gerek

TL;DR
TSLFormer is a lightweight transformer-based model that recognizes Turkish Sign Language from skeletal joint data, enabling efficient, real-time gesture translation without raw video input.
Contribution
The paper introduces TSLFormer, a novel transformer model that uses skeletal joint data for sign language recognition, reducing computational complexity and improving efficiency.
Findings
Achieves competitive accuracy on the AUTSL dataset
Operates with minimal computational resources
Enables real-time sign language translation
Abstract
This study presents TSLFormer, a lightweight and robust word-level Turkish Sign Language (TSL) recognition model that treats sign gestures as ordered, string-like language. Instead of using raw RGB or depth videos, our method works only with 3D joint positions (articulation points) extracted using Google's MediaPipe library, focusing on the hand and torso skeletal locations. This substantially reduces the input dimensionality while preserving important semantic gesture information. Our approach recasts sign language recognition as sequence-to-sequence translation, inspired by the linguistic nature of sign languages and the success of transformers in natural language processing. Because TSLFormer uses the self-attention mechanism, it effectively captures temporal co-occurrence within gesture sequences and highlights meaningful motion patterns as words unfold. Evaluated on the AUTSL…
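To make the self-attention idea in the abstract concrete, here is a minimal sketch in plain Python, not the authors' implementation. It assumes each frame has already been reduced to a list of (x, y, z) joint positions (the MediaPipe extraction step is not shown), and it uses identity projections for queries, keys, and values where a trained model would learn them. The function names `flatten_frame` and `self_attention` and the toy landmark values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def flatten_frame(landmarks):
    """Flatten one frame of (x, y, z) joint positions into a feature vector."""
    return [coord for joint in landmarks for coord in joint]


def self_attention(frames):
    """Single-head scaled dot-product self-attention over a frame sequence.

    Assumption: queries, keys, and values are the raw frame vectors
    (identity projections); a trained model would learn these projections.
    Returns (outputs, weights), where weights[i][j] is how strongly
    frame i attends to frame j.
    """
    dim = len(frames[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for query in frames:
        # Similarity of this frame to every frame in the clip.
        scores = [
            sum(q * k for q, k in zip(query, key)) / scale for key in frames
        ]
        row = softmax(scores)
        weights.append(row)
        # Weighted average of all frames, one value per feature dimension.
        outputs.append(
            [sum(w * frame[d] for w, frame in zip(row, frames)) for d in range(dim)]
        )
    return outputs, weights


# Hypothetical toy clip: 3 frames, 2 joints each, (x, y, z) per joint.
clip = [
    [(0.0, 0.0, 0.0), (0.1, 0.2, 0.0)],
    [(0.0, 0.1, 0.0), (0.2, 0.3, 0.1)],
    [(0.0, 0.2, 0.0), (0.4, 0.5, 0.1)],
]
frames = [flatten_frame(frame) for frame in clip]
outputs, weights = self_attention(frames)
```

Each row of `weights` sums to 1 and says how much one frame draws on every other frame in the clip, which is the sense in which attention can relate motion at different points in time. A real model would add learned projections, multiple heads, positional encodings, and a classification head on top.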
