Continuous Sign Language Recognition Using Intra-inter Gloss Attention
Hossein Ranjbar, Alireza Taheri

TL;DR
This paper introduces intra-inter gloss attention modules in transformer-based models for continuous sign language recognition, effectively capturing local and semantic dependencies to improve accuracy without extra supervision.
Contribution
The study proposes a novel intra-inter gloss attention mechanism that enhances sequence modeling in CSLR by focusing on local chunks and gloss-level relationships, reducing complexity and noise.
Findings
Achieved a 20.4% word error rate (WER) on the PHOENIX-2014 dataset.
Outperformed state-of-the-art methods without additional supervision.
Effectively captured local and semantic dependencies in sign language videos.
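For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions, deletions, insertions) between the recognized gloss sequence and the reference, divided by the reference length. The following is a minimal sketch of that standard definition, not the paper's evaluation script; the function name `word_error_rate` and the example glosses are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # turning i reference words into an empty hypothesis costs i
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical gloss sequences: one substitution plus one deletion over 5 words.
wer = word_error_rate("MORGEN REGEN NORD WIND STARK", "MORGEN REGEN SUED WIND")
```

Under this definition, a WER of 20.4% corresponds to roughly one word-level error for every five reference glosses.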
Abstract
Many continuous sign language recognition (CSLR) studies adopt transformer-based architectures for sequence modeling due to their powerful capacity for capturing global contexts. Nevertheless, vanilla self-attention, which serves as the core module of the transformer, calculates a weighted average over all time steps; therefore, the local temporal semantics of sign videos may not be fully exploited. In this study, we introduce a novel module in sign language recognition studies, called intra-inter gloss attention module, to leverage the relationships among frames within glosses and the semantic and grammatical dependencies between glosses in the video. In the intra-gloss attention module, the video is divided into equally sized chunks and a self-attention mechanism is applied within each chunk. This localized self-attention significantly reduces complexity and eliminates noise…
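The chunking step described in the abstract can be sketched as follows. This is an illustrative NumPy sketch under stated assumptions, not the authors' implementation: it uses a single head with identity query/key/value projections, requires the number of frames to be a multiple of the chunk size, and the names `intra_chunk_attention` and `chunk_size` are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def intra_chunk_attention(frames: np.ndarray, chunk_size: int):
    """Apply scaled dot-product self-attention independently within each chunk.

    frames: array of shape (T, d), one feature vector per video frame.
    Returns the attended features (T, d) and the attention weights
    of shape (T // chunk_size, chunk_size, chunk_size).
    """
    num_frames, dim = frames.shape
    if num_frames % chunk_size != 0:
        # Assumption of this sketch: no padding or ragged final chunk.
        raise ValueError("number of frames must be a multiple of chunk_size")

    # Split the sequence into equally sized, non-overlapping chunks.
    chunks = frames.reshape(num_frames // chunk_size, chunk_size, dim)

    # Assumption: queries, keys, and values are the raw features
    # (a real transformer layer would use learned linear projections).
    scores = chunks @ chunks.transpose(0, 2, 1) / np.sqrt(dim)
    weights = softmax(scores, axis=-1)
    attended = weights @ chunks
    return attended.reshape(num_frames, dim), weights


rng = np.random.default_rng(0)
features = rng.normal(size=(12, 8))  # 12 frames, 8-dimensional features
out, attn = intra_chunk_attention(features, chunk_size=4)
```

With T frames and chunks of size c, this computes T·c attention scores instead of the T² needed by full self-attention, and each output frame depends only on frames in its own chunk.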
Taxonomy
Topics: Hand Gesture Recognition Systems · Hearing Impairment and Communication · Gait Recognition and Analysis
Methods: Softmax · Attention Is All You Need · Sparse Evolutionary Training · Average Pooling · Focus
