TL;DR
This paper introduces a multi-task learning framework for online and early gesture detection that models gesture progression alongside frame-level recognition. It reaches 87.8% recognition accuracy on the NVIDIA gesture dataset and enables early, real-time recognition in gesture-based interfaces.
Contribution
It presents a novel multi-task learning approach for gesture progression modeling and early detection, along with new annotations and a baseline for gesture localization.
Findings
Achieves 87.8% recognition accuracy on the NVIDIA gesture dataset.
Outperforms previous state-of-the-art by more than 4%.
Provides competitive results on the Montalbano dataset.
Abstract
Online and early detection of gestures is crucial for building touchless, gesture-based interfaces. These interfaces should operate on a stream of video frames rather than the complete video, and should detect a gesture before it completes in order to provide a real-time user experience. To achieve this, it is important to recognize the progression of the gesture across its stages so that appropriate responses can be triggered when the desired execution stage is reached. We propose a simple yet effective multi-task learning framework that models the progression of the gesture along with frame-level recognition. The proposed framework recognizes gestures at an early stage with high precision and also achieves a state-of-the-art recognition accuracy of 87.8%, close to the human accuracy of 88.4%, on the NVIDIA gesture dataset in the offline…
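The idea of pairing frame-level recognition with a progression estimate can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not give the loss, the progression representation, or any thresholds. The sketch assumes progression is a scalar in [0, 1], combines cross-entropy with a squared-error term, and uses hypothetical names (`multitask_frame_loss`, `first_trigger_frame`, `min_progress`, `min_confidence`) chosen only for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def multitask_frame_loss(class_logits, true_class, pred_progress,
                         true_progress, weight=1.0):
    """Per-frame loss: cross-entropy on the gesture class plus a weighted
    squared error on the progression estimate.

    Assumption: progression is a scalar in [0, 1]; the paper may model it
    differently.
    """
    probs = softmax(class_logits)
    cross_entropy = -math.log(probs[true_class])
    progress_error = (pred_progress - true_progress) ** 2
    return cross_entropy + weight * progress_error


def first_trigger_frame(frame_outputs, min_progress=0.5, min_confidence=0.8):
    """Scan per-frame (class_logits, progress) pairs in stream order and
    return (frame_index, class_index) for the first frame whose progression
    and class confidence both reach their thresholds, or None if no frame
    qualifies. Both thresholds are illustrative assumptions.
    """
    for index, (class_logits, progress) in enumerate(frame_outputs):
        probs = softmax(class_logits)
        best = max(range(len(probs)), key=probs.__getitem__)
        if progress >= min_progress and probs[best] >= min_confidence:
            return index, best
    return None


# Hypothetical stream: two gesture classes, three frames.
stream = [
    ([0.1, 0.0], 0.1),  # low confidence, gesture barely started
    ([3.0, 0.0], 0.3),  # confident, but progression still below threshold
    ([3.0, 0.0], 0.6),  # confident and far enough along to respond
]
trigger = first_trigger_frame(stream)
```

With these hypothetical inputs, a response would fire on the third frame, before the gesture completes, which is the behavior the abstract motivates for online and early detection.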
