Data-Efficient American Sign Language Recognition via Few-Shot Prototypical Networks
Meher Md Saad

TL;DR
This paper introduces a few-shot learning framework that combines prototypical networks with a spatiotemporal graph encoder for data-efficient American Sign Language recognition. It is especially effective when training data are limited and for signs not seen during training.
Contribution
It presents a novel few-shot prototypical network approach with a spatiotemporal graph encoder for sign language recognition, improving generalization in data-scarce scenarios.
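The summary does not describe the encoder's internals. As a rough illustration only, the sketch below implements a generic spatiotemporal graph-convolution block over skeleton keypoints in NumPy. A normalized joint adjacency mixes features across connected joints within each frame, and a short 1-D filter then mixes each joint's features across neighbouring frames. The toy five-joint skeleton, the layer sizes, and the names `normalized_adjacency` and `st_graph_conv` are hypothetical and are not taken from the paper.

```python
import numpy as np

def normalized_adjacency(edges, num_joints):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 of a skeleton graph."""
    a = np.eye(num_joints)  # self-loops so each joint keeps its own features
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :]

def st_graph_conv(x, a_hat, w_spatial, w_temporal):
    """One hypothetical spatiotemporal block.

    x:          (T, J, C_in) features per frame and joint
    a_hat:      (J, J) normalized adjacency
    w_spatial:  (C_in, C_out) weights shared across joints and frames
    w_temporal: (K,) 1-D temporal filter applied per joint and channel (K odd)
    """
    # Spatial step: aggregate neighbouring joints, then project channels.
    h = np.einsum("ij,tjc->tic", a_hat, x) @ w_spatial  # (T, J, C_out)
    h = np.maximum(h, 0.0)  # ReLU
    # Temporal step: zero-padded filter along the time axis, same output length.
    k = len(w_temporal)
    pad = k // 2
    h_pad = np.pad(h, ((pad, pad), (0, 0), (0, 0)))
    out = np.zeros_like(h)
    for t in range(h.shape[0]):
        out[t] = np.tensordot(w_temporal, h_pad[t:t + k], axes=(0, 0))
    return np.maximum(out, 0.0)

rng = np.random.default_rng(0)
T, J, C_in, C_out = 16, 5, 2, 8
edges = [(0, 1), (1, 2), (1, 3), (1, 4)]  # toy 5-joint skeleton (hypothetical)
a_hat = normalized_adjacency(edges, J)
x = rng.normal(size=(T, J, C_in))  # e.g. (x, y) coordinates per joint
h = st_graph_conv(x, a_hat, rng.normal(size=(C_in, C_out)), np.ones(3) / 3)
embedding = h.mean(axis=(0, 1))  # pool over time and joints -> (C_out,)
print(h.shape, embedding.shape)  # (16, 5, 8) (8,)
```

Pooling over time and joints turns a variable-length clip into one fixed-size vector, which is the form a prototype-based classifier needs.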
Findings
Achieves 43.75% Top-1 accuracy on the WLASL dataset.
Outperforms standard classifiers by over 13% under data-scarce conditions.
Reaches 30% accuracy on unseen signs without fine-tuning.
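This sketch assumes the standard nearest-prototype rule, under which classifying signs outside the training vocabulary needs no weight updates. A frozen encoder embeds a few labelled examples of each new sign, their mean becomes that sign's prototype, and each query goes to the nearest prototype. The code shows this step and the Top-1 accuracy computation on hand-picked 2-D embeddings standing in for encoder outputs. The sign IDs and all function names are hypothetical.

```python
import numpy as np

def class_prototypes(support_emb, support_labels):
    """Mean embedding per class; returns (classes, prototypes)."""
    classes = np.unique(support_labels)
    protos = np.stack([support_emb[support_labels == c].mean(axis=0) for c in classes])
    return classes, protos

def predict(query_emb, classes, protos):
    """Assign each query to the class of its nearest prototype (squared Euclidean)."""
    d = ((query_emb[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    return classes[d.argmin(axis=1)]

def top1_accuracy(pred, labels):
    """Fraction of queries whose single best prediction is correct."""
    return float((pred == labels).mean())

# Stand-in embeddings from a frozen encoder: two support clips per unseen sign.
support_emb = np.array([[0.0, 0.0], [0.2, 0.0],
                        [5.0, 5.0], [5.2, 5.0],
                        [0.0, 5.0], [0.0, 5.2]])
support_labels = np.array([10, 10, 11, 11, 12, 12])  # hypothetical sign IDs
query_emb = np.array([[0.3, -0.1], [4.9, 5.1], [0.1, 4.8], [2.6, 0.1]])
query_labels = np.array([10, 11, 12, 11])

classes, protos = class_prototypes(support_emb, support_labels)
pred = predict(query_emb, classes, protos)
acc = top1_accuracy(pred, query_labels)
print(pred, acc)  # [10 11 12 10] 0.75
```

The fourth query lies closer to the prototype of sign 10 than to that of its true class, so Top-1 accuracy is 3/4.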
Abstract
Isolated Sign Language Recognition (ISLR) is critical for bridging the communication gap between the Deaf and Hard-of-Hearing (DHH) community and the hearing world. However, robust ISLR is fundamentally constrained by data scarcity and the long-tail distribution of sign vocabulary, where gathering sufficient examples for thousands of unique signs is prohibitively expensive. Standard classification approaches struggle under these conditions, often overfitting to frequent classes while failing to generalize to rare ones. To address this bottleneck, we propose a Few-Shot Prototypical Network framework adapted for a skeleton-based encoder. Unlike traditional classifiers that learn fixed decision boundaries, our approach uses episodic training to learn a semantic metric space where signs are classified based on their proximity to dynamic class prototypes. We integrate a Spatiotemporal…
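Under the standard prototypical-network formulation, the episodic training described in the abstract can be sketched as follows. Each episode samples N classes (N-way) with K labelled support clips per class (K-shot) plus a few query clips. Each prototype is the mean support embedding, and queries are scored with a softmax over negative squared Euclidean distances. This NumPy version computes only the forward loss on random stand-in embeddings. In practice the encoder would be trained by backpropagating this loss with an autodiff framework. Episode sizes and function names are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def sample_episode(labels, n_way, k_shot, n_query, rng):
    """Pick n_way classes, then k_shot support and n_query query indices per class.

    Assumes every class has at least k_shot + n_query examples.
    """
    classes = rng.choice(np.unique(labels), size=n_way, replace=False)
    support, query = [], []
    for c in classes:
        idx = rng.permutation(np.flatnonzero(labels == c))[: k_shot + n_query]
        support.append(idx[:k_shot])
        query.append(idx[k_shot:])
    return classes, np.stack(support), np.stack(query)  # (n_way, k_shot), (n_way, n_query)

def episode_loss(emb, support, query):
    """Prototypical loss: cross-entropy of a softmax over negative squared distances."""
    protos = emb[support].mean(axis=1)                        # (n_way, D)
    q = emb[query].reshape(-1, emb.shape[1])                  # (n_way * n_query, D)
    targets = np.repeat(np.arange(support.shape[0]), query.shape[1])
    logits = -((q[:, None, :] - protos[None, :, :]) ** 2).sum(axis=-1)
    logits -= logits.max(axis=1, keepdims=True)               # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(targets)), targets].mean()

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(10), 6)          # 10 toy classes, 6 clips each
emb = rng.normal(size=(labels.size, 16))      # stand-in for encoder outputs
classes, support, query = sample_episode(labels, n_way=5, k_shot=1, n_query=2, rng=rng)
loss = episode_loss(emb, support, query)
print(support.shape, query.shape)  # (5, 1) (5, 2)
```

Because the sampled classes change from episode to episode, the intent is that the encoder learns an embedding space where any small set of examples yields a useful prototype, rather than fixed per-class decision boundaries.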
Taxonomy
Topics: Hand Gesture Recognition Systems · Hearing Impairment and Communication · Human Pose and Action Recognition
