Real-Time Hand Gesture Recognition: Integrating Skeleton-Based Data Fusion and Multi-Stream CNN

Oluwaleke Yusuf; Maki Habib; Mohamed Moustafa

arXiv:2406.15003 · cs.CV · April 17, 2026 · 2 cites

TL;DR

This paper presents a skeleton-based framework for real-time hand gesture recognition that combines data-level fusion with a multi-stream CNN, reducing hardware and computational demands while maintaining high accuracy.

Contribution

The paper recasts dynamic gesture recognition as a static image classification task, using data-level fusion and an optimized multi-stream CNN architecture to enable real-time performance.

Findings

01. Competitive accuracy on five benchmark datasets.

02. Supports real-time deployment on standard consumer hardware.

03. Demonstrates low latency and resource efficiency in practical scenarios.

Abstract

Hand Gesture Recognition (HGR) enables intuitive human-computer interactions in various real-world contexts. However, existing frameworks often struggle to meet the real-time requirements essential for practical HGR applications. This study introduces a robust, skeleton-based framework for dynamic HGR that simplifies the recognition of dynamic hand gestures into a static image classification task, effectively reducing both hardware and computational demands. Our framework utilizes a data-level fusion technique to encode 3D skeleton data from dynamic gestures into static RGB spatiotemporal images. It incorporates a specialized end-to-end Ensemble Tuner (e2eET) Multi-Stream CNN architecture that optimizes the semantic connections between data representations while minimizing computational needs. Tested across five benchmark datasets (SHREC'17, DHG-14/28, FPHA, LMDHG, and CNR), the…
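The data-level fusion idea in the abstract, encoding a sequence of 3D skeleton frames into one static RGB image, can be illustrated with a minimal sketch. This is not the authors' encoder: the function name `encode_skeleton_sequence`, the 64-pixel canvas, the blue-to-red time colouring, and the choice to drop the z coordinate are all assumptions made for illustration.

```python
import numpy as np

def encode_skeleton_sequence(seq, size=64):
    """Rasterize a (T, J, 3) skeleton sequence into a (size, size, 3) RGB image.

    Hypothetical simplification: x, y give the pixel position, z is ignored,
    and the frame index sets the colour (blue = first frame, red = last).
    """
    seq = np.asarray(seq, dtype=np.float64)
    num_frames = seq.shape[0]

    # Normalize x, y over the whole sequence so the gesture fills the canvas.
    xy = seq[..., :2]
    lo = xy.reshape(-1, 2).min(axis=0)
    hi = xy.reshape(-1, 2).max(axis=0)
    span = np.where(hi - lo > 0, hi - lo, 1.0)  # avoid division by zero
    pix = np.rint((xy - lo) / span * (size - 1)).astype(int)  # (T, J, 2)

    img = np.zeros((size, size, 3), dtype=np.uint8)
    for t in range(num_frames):
        a = t / max(num_frames - 1, 1)  # 0.0 at first frame, 1.0 at last
        colour = np.array([255 * a, 0, 255 * (1 - a)]).astype(np.uint8)
        # Row index is y, column index is x; later frames overwrite earlier ones.
        img[pix[t, :, 1], pix[t, :, 0]] = colour
    return img

# Toy usage: one joint moving diagonally across two frames.
toy = [[[0.0, 0.0, 0.0]], [[1.0, 1.0, 0.0]]]
image = encode_skeleton_sequence(toy)
print(image.shape, image[0, 0], image[63, 63])
```

The resulting array can be passed to any standard image classifier, which is the sense in which the dynamic task becomes a static one. A fuller encoder would presumably also draw bones between joints and make use of depth, which this sketch omits.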
