Searching Multi-Rate and Multi-Modal Temporal Enhanced Networks for Gesture Recognition

Zitong Yu; Benjia Zhou; Jun Wan; Pichao Wang; Haoyu Chen; Xin Liu; Stan Z. Li; Guoying Zhao

arXiv:2008.09412 · cs.CV · July 7, 2021


TL;DR

This paper introduces a neural architecture search-based approach for multi-modal gesture recognition, leveraging enhanced temporal features and optimized multi-rate, multi-modal networks to achieve state-of-the-art results on benchmark datasets.

Contribution

The paper presents the first NAS-based method for RGB-D gesture recognition, combining the 3D Central Difference Convolution (3D-CDC) family for temporal enhancement with optimized backbones for multi-rate, multi-modal learning.

Findings

1. Achieves state-of-the-art performance on the IsoGD, NvGesture, and EgoGesture datasets.
2. Demonstrates effective multi-modal and multi-rate integration for gesture recognition.
3. Provides a new perspective on the relationship between the RGB and depth modalities.

Abstract

Gesture recognition has attracted considerable attention owing to its great potential in applications. Although great progress has been made recently in multi-modal learning methods, existing methods still lack effective integration to fully explore synergies among spatio-temporal modalities for gesture recognition. This is partly because existing manually designed network architectures are inefficient at jointly learning multiple modalities. In this paper, we propose the first neural architecture search (NAS)-based method for RGB-D gesture recognition. The proposed method includes two key components: 1) enhanced temporal representation via the proposed 3D Central Difference Convolution (3D-CDC) family, which is able to capture rich temporal context by aggregating temporal difference information; and 2) optimized backbones for…
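The abstract describes 3D-CDC as aggregating temporal difference information. As a rough illustration only, the NumPy sketch below implements a temporal central difference convolution on a single-channel clip, assuming the formulation y(p0) = Σ_{p_n∈R} w(p_n)·x(p0 + p_n) − θ·x(p0)·Σ_{p_n∈R'} w(p_n), where R is the full 3×3×3 neighborhood and R' contains only its two non-central temporal slices. The function names `conv3d_valid` and `cdc_t`, the θ default, and the single-channel, unpadded, stride-1 setting are hypothetical choices for this sketch, not the official 3DCDC-NAS code.

```python
import numpy as np


def conv3d_valid(x, w):
    """Vanilla 3D cross-correlation with no padding and stride 1.

    x: single-channel clip of shape (T, H, W); w: kernel of shape (kt, kh, kw).
    """
    kt, kh, kw = w.shape
    T, H, W = x.shape
    out = np.zeros((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                out[t, i, j] = np.sum(x[t:t + kt, i:i + kh, j:j + kw] * w)
    return out


def cdc_t(x, w, theta=0.5):
    """Hypothetical sketch of a temporal central difference convolution.

    Returns sum_{p_n in R} w(p_n) x(p0 + p_n) - theta * x(p0) * sum_{p_n in R'} w(p_n),
    where R is the full kernel neighborhood and R' holds only the first and
    last temporal slices of the kernel (the adjacent frames).
    theta=0.5 is an illustrative default, not a value taken from the paper.
    Assumes odd spatial kernel sizes so that the center voxel is well defined.
    """
    kt, kh, kw = w.shape
    if kt != 3:
        raise ValueError("this sketch assumes a kernel with 3 temporal taps")
    normal = conv3d_valid(x, w)
    To, Ho, Wo = normal.shape
    # Center voxel x(p0) of every receptive field visited by conv3d_valid.
    center = x[1:1 + To, kh // 2:kh // 2 + Ho, kw // 2:kw // 2 + Wo]
    # Sum of weights on the adjacent (non-central) temporal slices, i.e. R'.
    w_adjacent = w[0].sum() + w[2].sum()
    return normal - theta * center * w_adjacent
```

With θ = 0 the sketch reduces to a vanilla 3D convolution. With θ = 1 on a completely uniform clip, the contributions of the adjacent frames cancel and only the central temporal slice of the kernel remains, which illustrates why the difference term emphasizes temporal change rather than static appearance.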


Code & Models

Repositories

ZitongYu/3DCDC-NAS (official PyTorch implementation)


Taxonomy

Methods: Convolution