A Fine-Grained Visual Attention Approach for Fingerspelling Recognition in the Wild
Kamala Gajurel, Cuncong Zhong, Guanghui Wang

TL;DR
This paper introduces a Transformer-based fine-grained visual attention method leveraging optical flow for improved fingerspelling recognition in unconstrained, real-world videos, addressing challenges of gesture ambiguity and variability.
Contribution
It proposes a novel fine-grained attention mechanism that uses optical flow within a Transformer model for sequence-to-sequence prediction on in-the-wild datasets, and it outperforms existing methods.
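To make the idea of motion-guided attention concrete, the following is a minimal sketch, not the authors' implementation. It assumes per-frame feature maps and a precomputed optical-flow field are already available, and it turns flow magnitude into softmax attention weights over spatial positions. The function name `motion_weighted_pool`, the `temperature` parameter, and the array shapes are all hypothetical choices made for illustration.

```python
import numpy as np

def motion_weighted_pool(features, flow, temperature=1.0):
    """Pool frame feature maps with attention weights derived from flow magnitude.

    Illustrative assumption only; the paper's actual mechanism may differ.

    features: (T, H, W, C) array of per-frame feature maps.
    flow:     (T, H, W, 2) array of per-location flow vectors (dx, dy).
    Returns (pooled, weights): pooled is (T, C), weights is (T, H, W).
    """
    T, H, W, C = features.shape
    # Motion strength at every spatial location of every frame.
    magnitude = np.linalg.norm(flow, axis=-1)               # (T, H, W)
    logits = magnitude.reshape(T, H * W) / temperature
    # Subtract the per-frame maximum for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    # Softmax over spatial positions, so each frame's weights sum to 1.
    weights = weights / weights.sum(axis=1, keepdims=True)
    # Weighted sum of features over spatial positions.
    pooled = np.einsum("tp,tpc->tc", weights, features.reshape(T, H * W, C))
    return pooled, weights.reshape(T, H, W)
```

In a setup like this, the pooled per-frame vectors would then be fed as a sequence into a Transformer encoder. Regions with more motion, such as a moving hand, receive larger weights, which is one simple way to bias attention toward fine-grained gesture detail.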
Findings
Outperforms state-of-the-art approaches
Effectively captures fine-grained motion details
Handles real-world, unconstrained video data
Abstract
Fingerspelling in sign language has been the means of communicating technical terms and proper nouns when they do not have dedicated sign language gestures. Automatic recognition of fingerspelling can help resolve communication barriers when interacting with deaf people. The main challenges prevalent in fingerspelling recognition are the ambiguity in the gestures and strong articulation of the hands. The automatic recognition model should address high inter-class visual similarity and high intra-class variation in the gestures. Most of the existing research in fingerspelling recognition has focused on datasets collected in a controlled environment. The recent collection of a large-scale annotated fingerspelling dataset in the wild, from social media and online platforms, captures the challenges in a real-world scenario. In this work, we propose a fine-grained visual attention…
Taxonomy
Topics: Hand Gesture Recognition Systems · Hearing Impairment and Communication · Human Pose and Action Recognition
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Dropout · Residual Connection · Dense Connections · Adam · Layer Normalization
