TL;DR
DeepGRU is an end-to-end deep learning model for gesture recognition that is device-agnostic, achieves state-of-the-art accuracy on multiple datasets, and is efficient to train even on limited hardware and data.
Contribution
We introduce DeepGRU, a novel gesture recognition model using stacked GRUs and global attention, demonstrating superior accuracy and efficiency across diverse datasets.
Findings
Achieves 84.9% and 92.3% accuracy on the NTU RGB+D cross-subject and cross-view tests, respectively.
Attains 100% recognition accuracy on the UT-Kinect dataset.
Can be trained in under 10 minutes on small datasets using only a CPU.
Abstract
We propose DeepGRU, a novel end-to-end deep network model informed by recent developments in deep learning for gesture and action recognition, that is streamlined and device-agnostic. DeepGRU, which uses only raw skeleton, pose, or vector data, is quickly understood, implemented, and trained, and yet achieves state-of-the-art results on challenging datasets. At the heart of our method lies a set of stacked gated recurrent units (GRU), two fully-connected layers, and a novel global attention model. We evaluate our method on seven publicly available datasets, containing varying numbers of samples and spanning a broad range of interactions (full-body, multi-actor, hand gestures, etc.). In all but one case we outperform the state-of-the-art pose-based methods. For instance, we achieve a recognition accuracy of 84.9% and 92.3% on cross-subject and cross-view tests of the NTU RGB+D dataset…
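To make the pipeline named in the abstract concrete (stacked GRUs, a global attention step, then two fully-connected layers), here is a minimal NumPy sketch of a forward pass. It is an illustration under assumptions, not the authors' implementation: the layer sizes, the bilinear dot-product attention that uses the last hidden state as the query, the bias-free GRU gates, and all function names (`init_model`, `gru_layer`, `global_attention`, `classify`, `forward`) are hypothetical choices made for this example. The paper's actual attention formulation may differ.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_gru(rng, in_dim, hid_dim):
    # One weight matrix per gate, applied to the concatenation [x_t, h_{t-1}].
    # Biases are omitted for brevity (an assumption of this sketch).
    shape = (in_dim + hid_dim, hid_dim)
    return {k: rng.normal(0.0, 0.1, shape) for k in ("z", "r", "n")}

def gru_layer(params, xs):
    """Run one GRU layer over a sequence xs of shape (T, in_dim)."""
    h = np.zeros(params["z"].shape[1])
    outputs = []
    for x in xs:
        xh = np.concatenate([x, h])
        z = sigmoid(xh @ params["z"])                           # update gate
        r = sigmoid(xh @ params["r"])                           # reset gate
        n = np.tanh(np.concatenate([x, r * h]) @ params["n"])   # candidate state
        h = (1.0 - z) * n + z * h
        outputs.append(h)
    return np.stack(outputs)                                    # (T, hid_dim)

def global_attention(hs, W):
    """Assumed attention form: the last hidden state queries every time step."""
    query = hs[-1]
    scores = hs @ (W @ query)                                   # (T,)
    exp_scores = np.exp(scores - scores.max())                  # stable softmax
    weights = exp_scores / exp_scores.sum()
    context = weights @ hs                                      # weighted sum over time
    return np.concatenate([context, query]), weights

def classify(features, W1, W2):
    """Two fully-connected layers: ReLU hidden layer, then softmax output."""
    hidden = np.maximum(0.0, features @ W1)
    logits = hidden @ W2
    e = np.exp(logits - logits.max())
    return e / e.sum()

def init_model(seed=0, in_dim=6, hid_dims=(16, 8), fc_dim=12, n_classes=4):
    # All sizes here are illustrative placeholders, not values from the paper.
    rng = np.random.default_rng(seed)
    dims = (in_dim,) + tuple(hid_dims)
    gru = [init_gru(rng, dims[i], dims[i + 1]) for i in range(len(hid_dims))]
    h = hid_dims[-1]
    return {
        "gru": gru,
        "attn": rng.normal(0.0, 0.1, (h, h)),
        "fc1": rng.normal(0.0, 0.1, (2 * h, fc_dim)),
        "fc2": rng.normal(0.0, 0.1, (fc_dim, n_classes)),
    }

def forward(xs, model):
    hs = xs
    for layer in model["gru"]:          # stacked GRUs: each layer feeds the next
        hs = gru_layer(layer, hs)
    features, weights = global_attention(hs, model["attn"])
    return classify(features, model["fc1"], model["fc2"]), weights

# Usage: a random "gesture" of 20 frames with 6 raw features per frame.
model = init_model()
xs = np.random.default_rng(1).normal(size=(20, 6))
probs, weights = forward(xs, model)
print(probs.shape, weights.shape)
```

The input is just a sequence of raw per-frame vectors, which is what makes this kind of model device-agnostic: any sensor that yields a fixed-length vector per frame can feed the same network. The weights here are random, so the output probabilities carry no meaning until the model is trained.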
