Video-based Sign Language Recognition without Temporal Segmentation
Jie Huang, Wengang Zhou, Qilin Zhang, Houqiang Li, Weiping Li

TL;DR
This paper introduces LS-HAN, a novel video-based sign language recognition framework that eliminates the need for temporal segmentation, improving recognition accuracy and reducing preprocessing complexity.
Contribution
The paper presents LS-HAN, a continuous sign language recognition model that removes the need for temporal segmentation and combines hierarchical attention with a latent space to improve accuracy.
Findings
Effective on large-scale datasets
Outperforms existing methods in recognition accuracy
Reduces preprocessing steps in sign language recognition
Abstract
Millions of hearing-impaired people around the world routinely use some variant of sign language to communicate, so the automatic translation of sign language is meaningful and important. Currently, there are two sub-problems in Sign Language Recognition (SLR): isolated SLR, which recognizes word by word, and continuous SLR, which translates entire sentences. Existing continuous SLR methods typically use isolated SLR as a building block, with an extra layer of preprocessing (temporal segmentation) and another layer of post-processing (sentence synthesis). Unfortunately, temporal segmentation itself is non-trivial and inevitably propagates errors into subsequent steps. Worse still, isolated SLR methods typically require strenuous labeling of each word separately in a sentence, severely limiting the amount of attainable training data. To address these challenges, we propose a…
Taxonomy
Topics: Hand Gesture Recognition Systems · Human Pose and Action Recognition · Hearing Impairment and Communication
