TL;DR
This paper introduces an attentional network for continuous sign language recognition that models multiple data streams and uses attention to capture the dependencies between them, so that handshapes are interpreted in their spatio-temporal context.
Contribution
It presents a novel attention-based approach to synchronize and integrate multi-channel sign language data, emphasizing the importance of context in sign interpretation.
Findings
Achieves competitive results on the RWTH-PHOENIX-Weather 2014 dataset
Models dependencies between handshapes, face, and other sign components
Improves sign recognition by aggregating hand features with their spatio-temporal context
Abstract
This paper proposes an attentional network for the task of Continuous Sign Language Recognition. The proposed approach exploits co-independent streams of data to model the sign language modalities. These different channels of information can share a complex temporal structure between each other. For that reason, we apply attention to synchronize and help capture entangled dependencies between the different sign language components. Even though Sign Language is multi-channel, handshapes represent the central entities in sign interpretation. Seeing handshapes in their correct context defines the meaning of a sign. Taking that into account, we utilize the attention mechanism to efficiently aggregate the hand features with their appropriate spatio-temporal context for better sign recognition. We found that by doing so the model is able to identify the essential Sign Language components that…
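The aggregation idea in the abstract, hand features attending over their surrounding context, can be illustrated with plain scaled dot-product attention. The sketch below is not the authors' architecture: the function names (`cross_attention`, `aggregate_hand_with_context`), the toy two-dimensional features, and the choice to concatenate each hand feature with its attended context are all assumptions made for illustration. It only shows how per-frame hand features (queries) could weight and pool features from another stream such as the face or full frame (keys and values).

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_attention(queries, keys, values):
    """Scaled dot-product attention: every query attends over all keys.

    Returns (contexts, weights), one context vector and one weight
    distribution per query.
    """
    scale = math.sqrt(len(keys[0]))
    contexts, weights = [], []
    for q in queries:
        w = softmax([dot(q, k) / scale for k in keys])
        # Weighted sum of the value vectors, one output dimension at a time.
        ctx = [
            sum(w_t * v[j] for w_t, v in zip(w, values))
            for j in range(len(values[0]))
        ]
        contexts.append(ctx)
        weights.append(w)
    return contexts, weights


def aggregate_hand_with_context(hand_feats, context_feats):
    """Hypothetical fusion step: concatenate each hand feature with the
    context it attends to. Concatenation is an assumption, not taken
    from the paper."""
    contexts, weights = cross_attention(hand_feats, context_feats, context_feats)
    fused = [h + c for h, c in zip(hand_feats, contexts)]  # list concatenation
    return fused, weights


# Toy data: 2 hand-feature frames, 3 context frames, feature size 2.
hand = [[1.0, 0.0], [0.0, 1.0]]
context = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
fused, weights = aggregate_hand_with_context(hand, context)
```

Each row of `weights` is a distribution over context frames, so inspecting it shows which context frames a given hand frame relies on. A real model would use learned projections for queries, keys, and values, and typically multiple heads.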
