Deep learning based Hand gesture recognition system and design of a Human-Machine Interface
Abir Sen, Tapas Kumar Mishra, Ratnakar Dash

TL;DR
This paper presents a real-time hand gesture recognition system using deep learning models, including CNNs and ViT, to enable human-computer interaction and control desktop applications, benefiting disabled users.
Contribution
The work introduces a real-time gesture recognition system employing pre-trained CNNs and ViT, with improved accuracy and smoothness for practical HCI applications.
Findings
Inception-V1 outperformed other models in classification accuracy.
The system ran at 25 fps, fast enough for real-time interaction.
Gesture commands successfully controlled desktop applications.
Abstract
In this work, a human-computer interface (HCI) based on a real-time hand gesture recognition system is presented. The system consists of six stages: (1) hand detection, (2) gesture segmentation, (3) classification with five pre-trained convolutional neural network (CNN) models and a vision transformer (ViT), (4) building an interactive human-machine interface (HMI), (5) development of a gesture-controlled virtual mouse, and (6) use of a Kalman filter to estimate the hand position, which improves the smoothness of the pointer's motion. Five pre-trained CNN models (VGG16, VGG19, ResNet50, ResNet101, and Inception-V1) and ViT have been employed to classify hand gesture images. Two multi-class datasets (one public and one custom) have been used to validate the models. Considering the models' performance, it is observed that Inception-V1 has shown significantly better classification…
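Stage (6) of the abstract can be illustrated with a minimal sketch of Kalman-based pointer smoothing. This is not the authors' implementation: it assumes a constant-velocity motion model applied independently to the x and y coordinates, a frame interval taken from the 25 fps figure reported above, and noise parameters (`q`, `r`, and the initial covariance) that are hypothetical tuning values. The class names `AxisKalman` and `PointerSmoother` are likewise made up for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class AxisKalman:
    """Constant-velocity Kalman filter for a single pointer coordinate."""

    dt: float = 1.0 / 25.0  # frame interval in seconds (assumes 25 fps)
    q: float = 500.0        # process-noise intensity (hypothetical tuning value)
    r: float = 25.0         # measurement-noise variance in px^2 (hypothetical)
    pos: float = 0.0        # estimated position (px)
    vel: float = 0.0        # estimated velocity (px/s)
    p00: float = 25.0       # covariance entries of the [pos, vel] state
    p01: float = 0.0
    p10: float = 0.0
    p11: float = 1000.0
    initialized: bool = False

    def update(self, z: float) -> float:
        """Fuse one measured coordinate z and return the smoothed estimate."""
        if not self.initialized:
            # Start at the first detection instead of at the origin.
            self.pos, self.initialized = z, True
            return self.pos

        dt = self.dt
        # Predict: x = F x, P = F P F^T + Q with F = [[1, dt], [0, 1]].
        self.pos += dt * self.vel
        a, b, c, d = self.p00, self.p01, self.p10, self.p11
        p00 = a + dt * (b + c) + dt * dt * d + self.q * dt**4 / 4.0
        p01 = b + dt * d + self.q * dt**3 / 2.0
        p10 = c + dt * d + self.q * dt**3 / 2.0
        p11 = d + self.q * dt**2

        # Update with measurement model H = [1, 0] (position only).
        innovation = z - self.pos
        s = p00 + self.r
        k0, k1 = p00 / s, p10 / s
        self.pos += k0 * innovation
        self.vel += k1 * innovation
        # Covariance update: P = (I - K H) P.
        self.p00 = (1.0 - k0) * p00
        self.p01 = (1.0 - k0) * p01
        self.p10 = p10 - k1 * p00
        self.p11 = p11 - k1 * p01
        return self.pos


class PointerSmoother:
    """Smooths detected hand positions with one filter per screen axis."""

    def __init__(self) -> None:
        self.x = AxisKalman()
        self.y = AxisKalman()

    def update(self, x: float, y: float) -> Tuple[float, float]:
        return self.x.update(x), self.y.update(y)
```

In use, each frame's detected hand position would be passed to `PointerSmoother.update`, and the returned coordinates would drive the virtual mouse. A larger `r` relative to `q` trusts the detector less and gives a smoother but laggier pointer; the right balance depends on the detector's jitter and would need to be tuned.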
Taxonomy
Topics: Hand Gesture Recognition Systems · Hearing Impairment and Communication · Human Pose and Action Recognition
Methods: Attention Is All You Need · Linear Layer · Softmax · Multi-Head Attention · Residual Connection · Dense Connections · Layer Normalization · Vision Transformer · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
