American Sign Language fingerspelling recognition in the wild
Bowen Shi, Aurora Martinez Del Rio, Jonathan Keane, Jonathan Michaux, Diane Brentari, Greg Shakhnarovich, Karen Livescu

TL;DR
This paper introduces the first large-scale, natural video dataset for American Sign Language fingerspelling recognition in real-world conditions, and explores baseline models to address visual variability and low frame rates.
Contribution
It provides the largest natural video dataset for fingerspelling recognition and establishes baseline methods using attention-based and CTC models in challenging real-world scenarios.
Findings
Letter error rates are higher than in controlled settings.
Visual variability and low frame rates significantly impact recognition accuracy.
Model variants have different strengths and weaknesses in this challenging task.
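The letter error rate mentioned above is conventionally computed as the edit distance between the hypothesized and reference letter sequences, divided by the reference length. The sketch below assumes that standard definition; the function names are illustrative and are not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (substitution, insertion, deletion each cost 1)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def letter_error_rate(ref, hyp):
    """Edit distance normalized by reference length (assumed definition)."""
    if not ref:
        raise ValueError("reference sequence must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


print(letter_error_rate("kitten", "sitting"))  # 3 edits over 6 reference letters: 0.5
```

Because the distance is normalized by the reference length, a hypothesis with many insertions can yield a rate above 1.0.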
Abstract
We address the problem of American Sign Language fingerspelling recognition in the wild, using videos collected from websites. We introduce the largest data set available so far for the problem of fingerspelling recognition, and the first using naturally occurring video data. Using this data set, we present the first attempt to recognize fingerspelling sequences in this challenging setting. Unlike prior work, our video data is extremely challenging due to low frame rates and visual variability. To tackle the visual challenges, we train a special-purpose signing hand detector using a small subset of our data. Given the hand detector output, a sequence model decodes the hypothesized fingerspelled letter sequence. For the sequence model, we explore attention-based recurrent encoder-decoders and CTC-based approaches. As the first attempt at fingerspelling recognition in the wild, this work…
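For readers unfamiliar with CTC-based sequence models, the following is a minimal sketch of greedy CTC decoding: take the best-scoring label in each frame, collapse consecutive repeats, then drop blanks. It is a generic illustration, not the paper's decoder; the blank index, the alphabet mapping, and the toy scores are all assumptions.

```python
def ctc_greedy_decode(frame_scores, alphabet, blank=0):
    """Greedy CTC decoding over per-frame label scores.

    frame_scores: list of per-frame score lists; index `blank` is the CTC blank,
                  index k > 0 maps to alphabet[k - 1] (assumed layout, blank=0).
    """
    # Best-scoring label index for each frame.
    best = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]

    letters = []
    prev = None
    for idx in best:
        # Emit a letter only when the label changes and is not the blank.
        if idx != prev and idx != blank:
            letters.append(alphabet[idx - 1])
        prev = idx
    return "".join(letters)


# Toy example: frames predict a, a, blank, a, b.
scores = [
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.9, 0.05, 0.05],
    [0.1, 0.8, 0.1],
    [0.1, 0.2, 0.7],
]
print(ctc_greedy_decode(scores, "ab"))  # aab
```

The blank between the two runs of "a" is what allows a doubled letter to survive the collapse step, which matters for fingerspelled words with repeated letters.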
Taxonomy
Topics: Hand Gesture Recognition Systems · Human Pose and Action Recognition · Hearing Impairment and Communication
