TL;DR
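HGR-Net is a two-stage CNN architecture for hand gesture recognition: the first stage semantically segments the hand region, and the second stage classifies the gesture. It reaches near state-of-the-art accuracy at low computational cost, and remains robust under illumination variations and complex backgrounds.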
Contribution
The paper introduces a novel fusion network that integrates segmentation and recognition stages, improving robustness and efficiency in hand gesture recognition.
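As an illustration of the fusion idea, the Abstract describes the recognition stage as combining the deep representations of the RGB and segmented images in a fully connected layer before classification. The following is a minimal sketch of that kind of late fusion in plain Python, not the authors' implementation: the feature sizes, class count, random weights, and the names `dense`, `softmax`, and `fuse_and_classify` are all assumptions made for the example, and the two feature vectors stand in for the outputs of the two CNN streams.

```python
import math
import random


def dense(x, weights, bias):
    """Fully connected layer: one output value per weight row."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_and_classify(rgb_features, seg_features, weights, bias):
    """Concatenate the two stream representations, then classify them jointly."""
    fused = rgb_features + seg_features  # list concatenation = feature concatenation
    return softmax(dense(fused, weights, bias))


# Hypothetical sizes, chosen only for illustration.
NUM_CLASSES = 10
RGB_DIM, SEG_DIM = 8, 8

rng = random.Random(0)
# Placeholders for the deep representations produced by each stream.
rgb_features = [rng.random() for _ in range(RGB_DIM)]
seg_features = [rng.random() for _ in range(SEG_DIM)]
# Untrained random weights; a real model would learn these.
weights = [
    [rng.uniform(-0.1, 0.1) for _ in range(RGB_DIM + SEG_DIM)]
    for _ in range(NUM_CLASSES)
]
bias = [0.0] * NUM_CLASSES

probs = fuse_and_classify(rgb_features, seg_features, weights, bias)
predicted = max(range(NUM_CLASSES), key=probs.__getitem__)
print(f"predicted class index: {predicted}")
```

Because the fully connected layer sees both representations at once, its weights can relate appearance cues from the RGB stream to shape cues from the segmented stream, which is the usual motivation for fusing at the feature level rather than averaging two separate predictions.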
Findings
Achieves near state-of-the-art accuracy in gesture recognition.
Operates at an average of 23 ms per frame, suitable for real-time applications.
Requires less training time and runtime, and has a smaller model size, than existing methods.
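To put the 23 ms figure in context, per-frame latency converts directly to throughput. The snippet below is a small worked calculation; the 30 frames-per-second camera rate is an assumed reference point for comparison, not a number from the paper.

```python
def frames_per_second(ms_per_frame):
    """Convert a per-frame latency in milliseconds to throughput in frames per second."""
    return 1000.0 / ms_per_frame


def frame_budget_ms(target_fps):
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / target_fps


fps = frames_per_second(23.0)    # reported average latency
budget = frame_budget_ms(30.0)   # assumed 30 fps camera, for comparison only

print(f"throughput: {fps:.1f} frames per second")
print(f"budget at 30 fps: {budget:.1f} ms per frame")
```

At 23 ms per frame the throughput is roughly 43 frames per second, which fits within the approximately 33 ms per-frame budget of a 30 fps stream.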
Abstract
We propose a two-stage convolutional neural network (CNN) architecture for robust recognition of hand gestures, called HGR-Net, where the first stage performs accurate semantic segmentation to determine hand regions, and the second stage identifies the gesture. The segmentation stage architecture is based on the combination of fully convolutional residual network and atrous spatial pyramid pooling. Although the segmentation sub-network is trained without depth information, it is particularly robust against challenges such as illumination variations and complex backgrounds. The recognition stage deploys a two-stream CNN, which fuses the information from the red-green-blue and segmented images by combining their deep representations in a fully connected layer before classification. Extensive experiments on public datasets show that our architecture achieves almost as good as…
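The atrous spatial pyramid pooling mentioned in the abstract applies atrous (dilated) convolutions at several rates in parallel, so the same kernel size covers progressively wider context. The following is a simplified 1-D sketch of that idea, not the paper's segmentation network: the signal, kernel, rates, and the function names `atrous_conv1d` and `aspp_1d` are assumptions for illustration. Real ASPP modules work on 2-D feature maps, learn separate weights per branch, and pad the input so the branch outputs can be concatenated.

```python
def atrous_conv1d(signal, kernel, rate):
    """Valid 1-D atrous convolution: kernel taps are spaced `rate` samples apart."""
    span = (len(kernel) - 1) * rate + 1  # effective receptive field of the kernel
    return [
        sum(k * signal[start + i * rate] for i, k in enumerate(kernel))
        for start in range(len(signal) - span + 1)
    ]


def aspp_1d(signal, kernel, rates):
    """Apply the same kernel at several rates and return one response list per rate."""
    return {rate: atrous_conv1d(signal, kernel, rate) for rate in rates}


signal = list(range(10))  # toy input: 0, 1, ..., 9
kernel = [1, 1, 1]        # three taps, so each output sums three input samples

branches = aspp_1d(signal, kernel, rates=(1, 2, 3))
for rate, response in branches.items():
    print(f"rate {rate}: first output {response[0]}, {len(response)} outputs")
```

With three taps, rate 1 sums samples 0, 1, 2; rate 2 sums samples 0, 2, 4; and rate 3 sums samples 0, 3, 6. The number of weights stays fixed while the covered span grows from 3 to 5 to 7 samples, which is how the pyramid captures context at multiple scales without enlarging the kernel.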
