Real-Time Hand Gesture Recognition: Integrating Skeleton-Based Data Fusion and Multi-Stream CNN
Oluwaleke Yusuf, Maki Habib, Mohamed Moustafa

TL;DR
This paper presents a skeleton-based framework for real-time hand gesture recognition that combines data-level fusion with a multi-stream CNN, reducing hardware and computational demands while maintaining high accuracy.
Contribution
It introduces a novel static image classification approach for dynamic gestures using data-level fusion and an optimized multi-stream CNN architecture, enabling real-time performance.
Findings
Achieves competitive accuracy on five benchmark datasets (SHREC'17, DHG-14/28, FPHA, LMDHG, and CNR).
Supports real-time deployment on standard consumer hardware.
Demonstrates low latency and resource efficiency in practical scenarios.
Abstract
Hand Gesture Recognition (HGR) enables intuitive human-computer interactions in various real-world contexts. However, existing frameworks often struggle to meet the real-time requirements essential for practical HGR applications. This study introduces a robust, skeleton-based framework for dynamic HGR that simplifies the recognition of dynamic hand gestures into a static image classification task, effectively reducing both hardware and computational demands. Our framework utilizes a data-level fusion technique to encode 3D skeleton data from dynamic gestures into static RGB spatiotemporal images. It incorporates a specialized end-to-end Ensemble Tuner (e2eET) Multi-Stream CNN architecture that optimizes the semantic connections between data representations while minimizing computational needs. Tested across five benchmark datasets (SHREC'17, DHG-14/28, FPHA, LMDHG, and CNR), the…
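The idea of encoding a 3D skeleton sequence into a single static RGB image can be illustrated with a minimal sketch. This is not the authors' encoding: the function name `encode_skeleton_sequence`, the front-view projection (dropping depth), the 64-pixel canvas, and the red-to-blue temporal color ramp are all assumptions made for illustration. It only shows how spatial position and frame order might be folded into one image that an ordinary image classifier could consume.

```python
import numpy as np


def encode_skeleton_sequence(seq, size=64):
    """Encode a (T, J, 3) sequence of 3D joints as a (size, size, 3) uint8 image.

    Hypothetical encoding: joints are projected onto the x-y plane, and each
    frame is drawn in a color that shifts from red (first frame) to blue
    (last frame), so the image carries both spatial and temporal information.
    """
    seq = np.asarray(seq, dtype=np.float64)
    if seq.ndim != 3 or seq.shape[2] != 3:
        raise ValueError("expected an array of shape (T, J, 3)")

    num_frames = seq.shape[0]
    xy = seq[:, :, :2]  # assumed front view: ignore the depth coordinate

    # Rescale all joints of the whole sequence into pixel coordinates.
    lo = xy.min(axis=(0, 1))
    span = np.maximum(xy.max(axis=(0, 1)) - lo, 1e-9)  # avoid division by zero
    pix = np.rint((xy - lo) / span * (size - 1)).astype(int)

    img = np.zeros((size, size, 3), dtype=np.uint8)
    for t in range(num_frames):
        a = t / max(num_frames - 1, 1)  # 0.0 at the first frame, 1.0 at the last
        color = np.array([255 * (1 - a), 0, 255 * a]).astype(np.uint8)
        cols, rows = pix[t, :, 0], pix[t, :, 1]
        img[rows, cols] = color  # later frames overwrite earlier ones
    return img


if __name__ == "__main__":
    # Synthetic stand-in for a gesture: 30 frames, 22 joints, random positions.
    rng = np.random.default_rng(0)
    demo = rng.random((30, 22, 3))
    print(encode_skeleton_sequence(demo).shape)
```

In this sketch the resulting image would be passed to a standard image classifier; the multi-stream aspect described in the abstract (several data representations fed to parallel CNN streams) is not modeled here.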
