Real-Time American Sign Language Recognition Using 3D Convolutional Neural Networks and LSTM: Architecture, Training, and Deployment

Dawnena Key

arXiv:2512.22177·cs.CV·December 30, 2025

Open Access

TL;DR

This paper introduces a real-time ASL recognition system combining 3D CNN and LSTM architectures, trained on multiple datasets, achieving high accuracy and deployable on edge devices for improved communication accessibility.

Contribution

It presents a novel hybrid deep learning architecture for real-time sign language recognition, optimized for deployment on edge devices, and trained on diverse datasets for broad applicability.

Findings

01. Achieved F1-scores from 0.71 to 0.99 across sign classes.

02. Successfully deployed on AWS and edge devices for real-time inference.

03. Demonstrated effectiveness on multiple ASL datasets.

Abstract

This paper presents a real-time American Sign Language (ASL) recognition system utilizing a hybrid deep learning architecture combining 3D Convolutional Neural Networks (3D CNN) with Long Short-Term Memory (LSTM) networks. The system processes webcam video streams to recognize word-level ASL signs, addressing communication barriers for over 70 million deaf and hard-of-hearing individuals worldwide. Our architecture leverages 3D convolutions to capture spatial-temporal features from video frames, followed by LSTM layers that model sequential dependencies inherent in sign language gestures. Trained on the WLASL dataset (2,000 common words), ASL-LEX lexical database (~2,700 signs), and a curated set of 100 expert-annotated ASL signs, the system achieves F1-scores ranging from 0.71 to 0.99 across sign classes. The model is deployed on AWS infrastructure with edge deployment capability on…
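The reported range of 0.71 to 0.99 is a spread of per-class F1-scores, one score per sign. As a minimal sketch of how such per-class scores can be computed, the following uses only the Python standard library. The function name `per_class_f1`, the sign glosses, and the predictions are hypothetical illustrations, not the paper's evaluation code or data.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label seen in y_true or y_pred.

    F1 = 2*TP / (2*TP + FP + FN), computed one-vs-rest for each label.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for true_label, pred_label in zip(y_true, y_pred):
        if true_label == pred_label:
            tp[true_label] += 1
        else:
            fp[pred_label] += 1  # predicted this label, but it was wrong
            fn[true_label] += 1  # the true label was missed
    scores = {}
    for label in set(y_true) | set(y_pred):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


# Hypothetical glosses and predictions, for illustration only.
y_true = ["HELLO", "HELLO", "THANK-YOU", "THANK-YOU", "YES", "YES"]
y_pred = ["HELLO", "HELLO", "THANK-YOU", "YES", "YES", "YES"]

scores = per_class_f1(y_true, y_pred)
for label in sorted(scores):
    print(f"{label}: {scores[label]:.2f}")
print(f"range: {min(scores.values()):.2f} to {max(scores.values()):.2f}")
```

Reporting the minimum and maximum over classes, as the abstract does, shows how uneven performance is across signs, which a single averaged score would hide.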

Taxonomy

Topics: Hand Gesture Recognition Systems · Hearing Impairment and Communication · Interactive and Immersive Displays