A Deep Learning Framework for Recognizing both Static and Dynamic Gestures

Osama Mazhar, Sofiane Ramdani, and Andrea Cherubini

arXiv:2006.06321·cs.CV·March 18, 2021

TL;DR

This paper introduces StaDNet, a unified deep learning framework that recognizes both static and dynamic gestures from RGB images alone, using pose estimation and pose-driven spatial attention. It outperforms the state of the art on the Chalearn 2016 dataset and is aimed at inexpensive human-robot interaction.

Contribution

The StaDNet framework combines pose-driven spatial attention with a CNN and an LSTM to recognize both static and dynamic gestures from RGB images, without depth sensors.

Findings

01. Outperforms the state of the art on the Chalearn 2016 dataset.

02. Successfully transfers knowledge to the Praxis gestures dataset.

03. Achieves high accuracy in both static and dynamic gesture recognition.

Abstract

Intuitive user interfaces are indispensable for interacting with human-centric smart environments. In this paper, we propose a unified framework that recognizes both static and dynamic gestures using simple RGB vision (without depth sensing). This makes it suitable for inexpensive human-robot interaction in social or industrial settings. We employ a pose-driven spatial attention strategy, which guides our proposed Static and Dynamic gestures Network (StaDNet). From an image of the person's upper body, we estimate their depth, along with the region of interest around each hand. The Convolutional Neural Network in StaDNet is fine-tuned on a background-substituted hand gestures dataset. It is used to detect 10 static gestures for each hand and to obtain the hand image embeddings. These are subsequently fused with the augmented pose vector and then passed to the…
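
The pose-driven spatial attention step in the abstract, where a depth estimate and a hand region of interest are derived from the upper-body image, might be sketched roughly as follows. This is an illustrative sketch and not the authors' implementation: the names `Box` and `hand_roi`, the reference side length and reference depth, and the inverse-depth scaling rule (a pinhole-camera assumption) are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned crop in pixel coordinates (right/bottom exclusive)."""
    left: int
    top: int
    right: int
    bottom: int


def hand_roi(wrist_xy, depth_m, image_size,
             ref_side_px=120.0, ref_depth_m=1.0):
    """Return a square crop centred on a wrist keypoint.

    Assumption (not from the paper): the crop side scales as
    ref_side_px * ref_depth_m / depth_m, so a person standing farther
    from the camera gets a proportionally smaller hand crop.
    """
    if depth_m <= 0:
        raise ValueError("depth_m must be positive")
    width, height = image_size
    half = (ref_side_px * ref_depth_m / depth_m) / 2.0
    x, y = wrist_xy
    # Clamp the box to the image so the crop never leaves the frame.
    return Box(
        left=max(0, int(round(x - half))),
        top=max(0, int(round(y - half))),
        right=min(width, int(round(x + half))),
        bottom=min(height, int(round(y + half))),
    )


if __name__ == "__main__":
    # Hypothetical wrist keypoint in a 640x480 frame, person 2 m away.
    print(hand_roi((320, 240), 2.0, (640, 480)))
```

In a pipeline of this kind, the resulting crop would be resized and passed to the CNN, which is why keeping the hand at a roughly constant apparent scale can be useful.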
