Deep learning based Hand gesture recognition system and design of a Human-Machine Interface

Abir Sen; Tapas Kumar Mishra; Ratnakar Dash

arXiv:2207.03112·cs.CV·January 18, 2023·1 cites

TL;DR

This paper presents a real-time hand gesture recognition system using deep learning models, including CNNs and ViT, to enable human-computer interaction and control desktop applications, benefiting disabled users.

Contribution

The work introduces a real-time gesture recognition system employing pre-trained CNNs and ViT, with improved accuracy and smoothness for practical HCI applications.

Findings

1. Inception-V1 outperformed the other models in classification accuracy.

2. The system achieved 25 fps, which is suitable for real-time interaction.

3. The system successfully controlled desktop applications with gesture commands.

Abstract

This work presents a human-computer interface (HCI) built on a real-time hand gesture recognition system. The system consists of six stages: (1) hand detection, (2) gesture segmentation, (3) gesture classification using five pre-trained convolutional neural network (CNN) models and a vision transformer (ViT), (4) construction of an interactive human-machine interface (HMI), (5) development of a gesture-controlled virtual mouse, and (6) use of a Kalman filter to estimate the hand position, which smooths the motion of the pointer. Five pre-trained CNN models (VGG16, VGG19, ResNet50, ResNet101, and Inception-V1) and ViT are employed to classify hand gesture images. Two multi-class datasets (one public and one custom) are used to validate the models. Considering the models' performance, it is observed that Inception-V1 has significantly shown a better classification…
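
The abstract's sixth stage, Kalman filtering of the hand position to smooth the pointer, can be illustrated with a minimal sketch. The code below is not the authors' implementation: it assumes a constant-velocity motion model applied independently to each screen coordinate, and the class name `AxisKalman`, the noise values `q` and `r`, and the unit time step are all hypothetical choices made only for illustration.

```python
from dataclasses import dataclass


@dataclass
class AxisKalman:
    """Constant-velocity Kalman filter for one pointer coordinate (illustrative)."""
    q: float = 1e-2   # process noise added to position and velocity (assumed value)
    r: float = 4.0    # measurement noise variance of the detector (assumed value)
    pos: float = 0.0  # estimated position
    vel: float = 0.0  # estimated velocity
    # Entries of the 2x2 state covariance matrix P.
    p00: float = 1.0
    p01: float = 0.0
    p10: float = 0.0
    p11: float = 1.0
    initialized: bool = False

    def step(self, z: float, dt: float = 1.0) -> float:
        """Fuse one measured coordinate z and return the smoothed position."""
        if not self.initialized:
            # Start the track at the first measurement.
            self.pos = z
            self.initialized = True
            return self.pos

        # Predict: x = F x, P = F P F^T + Q with F = [[1, dt], [0, 1]].
        self.pos += self.vel * dt
        p00 = self.p00 + dt * (self.p10 + self.p01) + dt * dt * self.p11 + self.q
        p01 = self.p01 + dt * self.p11
        p10 = self.p10 + dt * self.p11
        p11 = self.p11 + self.q

        # Update: only the position is measured, so H = [1, 0].
        s = p00 + self.r          # innovation variance
        k0 = p00 / s              # gain applied to position
        k1 = p10 / s              # gain applied to velocity
        y = z - self.pos          # innovation (measurement residual)
        self.pos += k0 * y
        self.vel += k1 * y

        # P = (I - K H) P
        self.p00 = (1.0 - k0) * p00
        self.p01 = (1.0 - k0) * p01
        self.p10 = p10 - k1 * p00
        self.p11 = p11 - k1 * p01
        return self.pos


if __name__ == "__main__":
    # Jittery x-coordinates around 100 px, standing in for detector output.
    fx = AxisKalman()
    raw = [100.0 + (5.0 if i % 2 == 0 else -5.0) for i in range(20)]
    for z in raw:
        print(f"raw={z:6.1f}  smoothed={fx.step(z):7.2f}")
```

In use, one filter instance per axis (x and y) would be fed the detected hand position each frame, and the returned values would drive the pointer. A larger `r` relative to `q` gives a smoother but more sluggish pointer, so both values would need tuning against the actual detector noise.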

Taxonomy

Topics: Hand Gesture Recognition Systems · Hearing Impairment and Communication · Human Pose and Action Recognition

Methods: Attention Is All You Need · Linear Layer · Softmax · Multi-Head Attention · Residual Connection · Dense Connections · Layer Normalization · Vision Transformer · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings