Multi-Modality Fusion based on Consensus-Voting and 3D Convolution for Isolated Gesture Recognition
Jiali Duan, Shuai Zhou, Jun Wan, Xiaoyuan Guo, and Stan Z. Li

TL;DR
This paper introduces a multi-modality fusion framework using consensus-voting and 3D convolution for improved isolated gesture recognition, leveraging RGB and depth data to enhance accuracy and robustness.
Contribution
It proposes a two-stream consensus voting network (2SCVN) for RGB sequences combined with a 3D depth-saliency ConvNet stream (3DDSN) for RGB-D gesture recognition, reporting the best results on the Chalearn IsoGD and RGBD-HuDaAct benchmarks.
Findings
Outperformed the first-place entry on the Chalearn IsoGD leaderboard by 10.29%.
Obtained 96.74% accuracy on the RGBD-HuDaAct dataset.
Demonstrated the effectiveness of multi-modality fusion and consensus-voting in gesture recognition.
Abstract
Recently, the popularity of depth sensors such as Kinect has made depth videos easily available, yet their advantages have not been fully exploited. This paper investigates, for gesture recognition, how to exploit the spatial and temporal information complementarily embedded in RGB and depth sequences. We propose a convolutional two-stream consensus voting network (2SCVN) which explicitly models both the short-term and long-term structure of the RGB sequences. To alleviate distractions from the background, a 3D depth-saliency ConvNet stream (3DDSN) is aggregated in parallel to identify subtle motion characteristics. Combined in a unified framework, these two components significantly improve recognition accuracy. On the challenging Chalearn IsoGD benchmark, our proposed method outperforms the first place on the leader-board by a large margin (10.29%) while also achieving the best result on RGBD-HuDaAct…
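The abstract does not spell out how segment-level predictions are combined or how the RGB and depth streams are merged. As a rough illustration only, the sketch below assumes that consensus voting averages per-segment class scores and that the two streams are merged by a weighted average of their softmax probabilities. The function names, the averaging rule, and the `rgb_weight` parameter are hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Convert raw class scores to probabilities (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def consensus_vote(segment_scores):
    """Average per-segment class scores into one video-level score vector.

    Assumption: averaging stands in for the consensus function, which the
    abstract does not define.
    """
    if not segment_scores:
        raise ValueError("need at least one segment")
    n_classes = len(segment_scores[0])
    n_segments = len(segment_scores)
    return [sum(seg[c] for seg in segment_scores) / n_segments
            for c in range(n_classes)]


def fuse_streams(rgb_probs, depth_probs, rgb_weight=0.5):
    """Late fusion (assumed): weighted average of the two streams' probabilities."""
    return [rgb_weight * r + (1.0 - rgb_weight) * d
            for r, d in zip(rgb_probs, depth_probs)]


def predict(rgb_segment_scores, depth_scores, rgb_weight=0.5):
    """Return the index of the class with the highest fused probability."""
    rgb_probs = softmax(consensus_vote(rgb_segment_scores))
    depth_probs = softmax(depth_scores)
    fused = fuse_streams(rgb_probs, depth_probs, rgb_weight)
    return max(range(len(fused)), key=fused.__getitem__)


# Toy scores for three RGB segments and one depth clip over three classes.
rgb_segments = [[2.0, 1.0, 0.0], [0.0, 3.0, 0.0], [1.0, 2.0, 0.0]]
depth_clip = [0.0, 1.0, 0.0]
label = predict(rgb_segments, depth_clip)
```

Averaging lets every segment contribute to the video-level decision, so a single misleading segment (the first one in the toy data favours class 0) does not decide the label on its own.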
Taxonomy
Topics: Hand Gesture Recognition Systems · Human Pose and Action Recognition · Gait Recognition and Analysis
