Multi-modal Fusion for Single-Stage Continuous Gesture Recognition

Harshala Gammulle; Simon Denman; Sridha Sridharan; Clinton Fookes

arXiv:2011.04945·cs.CV·September 22, 2021

TL;DR

This paper introduces a single-stage multi-modal fusion framework for continuous gesture recognition that detects and classifies multiple gestures in a video with one model and no pre-segmentation step. It is reported to outperform the state of the art on three datasets.

Contribution

The paper presents a unified single-stage model that combines multi-modal fusion, feature mapping, and a mid-point loss for continuous gesture recognition, replacing two-stage approaches that need separate models for detection and classification.

Findings

01 Outperforms state-of-the-art on three datasets

02 Handles variable-length input videos effectively

03 Highlights importance of each component through ablation studies

Abstract

Gesture recognition is a much-studied research area with myriad real-world applications, including robotics and human-machine interaction. Current gesture recognition methods have focused on recognising isolated gestures, and existing continuous gesture recognition methods are limited to two-stage approaches in which independent models are required for detection and classification, so classification performance is constrained by detection performance. In contrast, we introduce a single-stage continuous gesture recognition framework, called Temporal Multi-Modal Fusion (TMMF), that can detect and classify multiple gestures in a video via a single model. This approach learns the natural transitions between gestures and non-gestures without the need for a pre-processing segmentation step to detect individual gestures. To achieve this, we introduce a multi-modal fusion mechanism to…
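
To make the single-stage idea concrete, here is a minimal sketch of how one stream of per-frame class predictions could yield both detection (where a gesture starts and ends) and classification (which gesture it is) without a separate segmentation pass. It is not the TMMF model. The per-frame output format, the `NON_GESTURE` label id, and the helper name `frames_to_segments` are assumptions made for illustration and are not taken from the paper.

```python
from itertools import groupby

# Assumption: class id 0 marks "no gesture" frames. Hypothetical, not from the paper.
NON_GESTURE = 0


def frames_to_segments(frame_labels, background=NON_GESTURE):
    """Collapse per-frame class ids into (label, start, end) runs.

    `end` is exclusive. Runs of the background label are skipped, so the
    result lists only the detected gestures and their frame ranges.
    """
    segments = []
    start = 0
    for label, run in groupby(frame_labels):
        length = sum(1 for _ in run)
        if label != background:
            segments.append((label, start, start + length))
        start += length
    return segments


# Hypothetical per-frame predictions for a 10-frame clip.
frame_labels = [0, 0, 3, 3, 3, 0, 0, 5, 5, 0]
print(frames_to_segments(frame_labels))  # [(3, 2, 5), (5, 7, 9)]
```

In a two-stage pipeline the frame ranges would come from a separate detector and each range would then be passed to a classifier. Here both fall out of the same label sequence, which is the property the abstract attributes to a single-stage approach.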
