A Fine-Grained Visual Attention Approach for Fingerspelling Recognition in the Wild

Kamala Gajurel; Cuncong Zhong; Guanghui Wang

arXiv:2105.07625 · cs.CV · August 24, 2021 · 1 citation

TL;DR

This paper introduces a Transformer-based fine-grained visual attention method leveraging optical flow for improved fingerspelling recognition in unconstrained, real-world videos, addressing challenges of gesture ambiguity and variability.

Contribution

The paper proposes a fine-grained attention mechanism that uses optical flow within a Transformer model for sequence prediction on in-the-wild datasets, and reports outperforming existing methods.

Findings

01. Outperforms state-of-the-art approaches
02. Effectively captures fine-grained motion details
03. Handles real-world, unconstrained video data

Abstract

Fingerspelling in sign language has been the means of communicating technical terms and proper nouns when they do not have dedicated sign language gestures. Automatic recognition of fingerspelling can help resolve communication barriers when interacting with deaf people. The main challenges prevalent in fingerspelling recognition are the ambiguity in the gestures and strong articulation of the hands. The automatic recognition model should address high inter-class visual similarity and high intra-class variation in the gestures. Most of the existing research in fingerspelling recognition has focused on the dataset collected in a controlled environment. The recent collection of a large-scale annotated fingerspelling dataset in the wild, from social media and online platforms, captures the challenges in a real-world scenario. In this work, we propose a fine-grained visual attention…

Taxonomy

Topics: Hand Gesture Recognition Systems · Hearing Impairment and Communication · Human Pose and Action Recognition

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Dropout · Residual Connection · Dense Connections · Adam · Layer Normalization
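
Of the methods listed above, attention is the building block the rest of the Transformer is organized around. The sketch below is a rough illustration only: it shows generic single-head scaled dot-product attention as described in the Transformer literature, not the paper's fine-grained attention mechanism or its use of optical flow. The function names and the toy "frame" vectors are hypothetical and chosen purely for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Generic single-head attention; returns (outputs, weights).

    queries: list of n vectors of length d
    keys:    list of m vectors of length d
    values:  list of m vectors sharing a common length
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        # Weighted average of the value vectors.
        out = [sum(wj * v[c] for wj, v in zip(w, values)) for c in range(len(values[0]))]
        weights.append(w)
        outputs.append(out)
    return outputs, weights


# Hypothetical toy input: three per-frame feature vectors of dimension 2.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = scaled_dot_product_attention(frames, frames, frames)
```

Each row of `weights` sums to 1 and describes how strongly one frame attends to every other frame. A multi-head variant would run several such attentions on learned linear projections of the inputs and concatenate the results.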