Real-Time American Sign Language Recognition Using 3D Convolutional Neural Networks and LSTM: Architecture, Training, and Deployment
Dawnena Key

TL;DR
This paper introduces a real-time ASL recognition system that combines a 3D CNN with an LSTM. Trained on multiple datasets, it reaches per-class F1-scores of 0.71 to 0.99 and can be deployed on edge devices to improve communication accessibility.
Contribution
It presents a novel hybrid deep learning architecture for real-time sign language recognition, optimized for deployment on edge devices, and trained on diverse datasets for broad applicability.
Findings
Achieved F1-scores from 0.71 to 0.99 across sign classes.
Successfully deployed on AWS and edge devices for real-time inference.
Demonstrated effectiveness on multiple ASL datasets.
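For readers unfamiliar with the metric, the per-class F1 range reported above can be read as one score per sign, each computed one-vs-rest. The sketch below shows one common way to compute such scores in plain Python. The function name `per_class_f1` and the sample labels are illustrative assumptions, not the authors' evaluation code or data.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1}, treating each label one-vs-rest."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label gains a false positive
            fn[truth] += 1  # true label gains a false negative
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the label never occurs.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


# Made-up labels for illustration only; not results from the paper.
y_true = ["hello", "hello", "thanks", "thanks", "please", "please"]
y_pred = ["hello", "thanks", "thanks", "thanks", "please", "hello"]
scores = per_class_f1(y_true, y_pred)
```

Reporting the minimum and maximum of `scores.values()` gives a range in the same form as the one quoted in the findings.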
Abstract
This paper presents a real-time American Sign Language (ASL) recognition system utilizing a hybrid deep learning architecture combining 3D Convolutional Neural Networks (3D CNN) with Long Short-Term Memory (LSTM) networks. The system processes webcam video streams to recognize word-level ASL signs, addressing communication barriers for over 70 million deaf and hard-of-hearing individuals worldwide. Our architecture leverages 3D convolutions to capture spatial-temporal features from video frames, followed by LSTM layers that model sequential dependencies inherent in sign language gestures. Trained on the WLASL dataset (2,000 common words), ASL-LEX lexical database (~2,700 signs), and a curated set of 100 expert-annotated ASL signs, the system achieves F1-scores ranging from 0.71 to 0.99 across sign classes. The model is deployed on AWS infrastructure with edge deployment capability on…
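To make the 3D CNN to LSTM hand-off concrete, the sketch below traces tensor shapes through a small stack of 3D convolution and pooling layers and shows how the remaining time axis becomes the LSTM's sequence. Every number here (a 16-frame clip at 112x112, the kernel sizes, 64 output channels) is a hypothetical choice for illustration; the abstract does not specify the actual layer configuration.

```python
def conv3d_out(shape, kernel, stride=(1, 1, 1), padding=(0, 0, 0)):
    """Output (T, H, W) of a 3D convolution or pooling window.

    Uses the standard formula floor((n + 2p - k) / s) + 1 per axis.
    """
    return tuple(
        (n + 2 * p - k) // s + 1
        for n, k, s, p in zip(shape, kernel, stride, padding)
    )


# Hypothetical input clip: 16 frames of 112x112 pixels (assumed, not from the paper).
clip = (16, 112, 112)

x = conv3d_out(clip, kernel=(3, 3, 3), padding=(1, 1, 1))  # conv keeps the size
x = conv3d_out(x, kernel=(1, 2, 2), stride=(1, 2, 2))      # pool space only
x = conv3d_out(x, kernel=(3, 3, 3), padding=(1, 1, 1))     # conv keeps the size
x = conv3d_out(x, kernel=(2, 2, 2), stride=(2, 2, 2))      # pool time and space

channels = 64  # assumed channel count after the last convolution

# The time axis that survives the CNN is the LSTM sequence length; each step
# is the flattened channel x height x width feature map for that time slice.
seq_len = x[0]
features_per_step = channels * x[1] * x[2]
```

Under these assumptions the LSTM would receive 8 time steps of 50,176 features each. In practice a spatial pooling or projection layer is often placed before the LSTM to shrink that per-step vector.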
Taxonomy
Topics: Hand Gesture Recognition Systems · Hearing Impairment and Communication · Interactive and Immersive Displays
