Sentence-Level Sign Language Recognition Framework
Atra Akandeh

TL;DR
This paper introduces two sentence-level sign language recognition models using CTC, one based on LRCN and the other on a Multi-Cue Network, achieving a 35% WER on RWTH-PHOENIX-Weather.
Contribution
The paper proposes two sentence-level SLR models built on CTC: an LRCN that is trained on raw frames without prior knowledge, and a Multi-Cue Network that uses hand shape, hand position, and hand movement features.
Findings
Achieved 35% Word Error Rate on RWTH-PHOENIX-Weather.
Compared LRCN-based and multi-cue models for SLR.
Demonstrated effectiveness of multi-cue features in sign language recognition.
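Word Error Rate, the metric quoted above, is conventionally the word-level edit distance between the predicted and reference gloss sequences, divided by the reference length. The sketch below shows that standard definition only; the function name and the example glosses are made up for illustration, and the paper's own evaluation script may differ in details such as tokenization.

```python
def word_error_rate(reference, hypothesis):
    """Return the word-level edit distance divided by the reference length.

    Both arguments are whitespace-separated gloss strings.
    The example glosses used below are hypothetical, not taken from the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(word_error_rate("MORGEN REGEN NORD WIND", "MORGEN REGEN SUED"))
```

Under this definition, a 35% WER means roughly one error (substitution, deletion, or insertion) for every three reference glosses.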
Abstract
We present two solutions to sentence-level sign language recognition (SLR). Sentence-level SLR requires mapping videos of sign language sentences to sequences of gloss labels. Connectionist Temporal Classification (CTC) is used as the classifier layer of both models, which avoids pre-segmenting the sentences into individual words. The first model is an LRCN-based model, and the second is a Multi-Cue Network. LRCN is a model in which a CNN feature extractor is applied to each frame before the resulting features are fed into an LSTM. In the first approach, no prior knowledge is leveraged: raw frames are fed into an 18-layer LRCN with a CTC layer on top. In the second approach, three main characteristics associated with each sign (hand shape, hand position, and hand movement) are extracted using MediaPipe. 2D landmarks of hand shape have been used to create the skeleton of the hands and then…
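To illustrate why CTC removes the need for pre-segmentation, the sketch below shows how a CTC output can be turned into a gloss sequence: each frame gets a score per label (including a special blank), consecutive repeats are merged, and blanks are dropped. This uses greedy best-path decoding as an assumption; the abstract does not say which decoder the paper uses, and the vocabulary and scores here are invented for the example.

```python
BLANK = 0  # index of the CTC blank label (a common convention, assumed here)


def ctc_greedy_decode(frame_scores, blank=BLANK):
    """Collapse per-frame scores into a label sequence with best-path decoding.

    frame_scores: one list of per-label scores for each video frame.
    Returns the label indices left after merging repeats and removing blanks.
    """
    # Pick the highest-scoring label independently for every frame.
    best_path = [max(range(len(scores)), key=scores.__getitem__)
                 for scores in frame_scores]

    decoded = []
    previous = None
    for label in best_path:
        # Keep a label only when it starts a new run and is not the blank.
        if label != previous and label != blank:
            decoded.append(label)
        previous = label
    return decoded


if __name__ == "__main__":
    vocabulary = ["<blank>", "MORGEN", "REGEN"]  # hypothetical glosses
    scores = [
        [0.1, 0.8, 0.1],    # MORGEN
        [0.2, 0.7, 0.1],    # MORGEN (repeat, merged)
        [0.9, 0.05, 0.05],  # blank
        [0.1, 0.1, 0.8],    # REGEN
        [0.1, 0.2, 0.7],    # REGEN (repeat, merged)
        [0.6, 0.2, 0.2],    # blank
    ]
    print([vocabulary[i] for i in ctc_greedy_decode(scores)])
```

Six frames collapse to two glosses without any frame-level word boundaries being supplied, which is the property the abstract relies on. A blank between two identical labels keeps them separate, so a gloss that is signed twice in a row can still be recovered.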
Taxonomy
Topics: Hand Gesture Recognition Systems · Hearing Impairment and Communication · Human Pose and Action Recognition
Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory · Surrogate Lagrangian Relaxation
