Skeleton based Activity Recognition by Fusing Part-wise Spatio-temporal and Attention Driven Residues

Chhavi Dhiman; Dinesh Kumar Vishwakarma; Paras Aggarwal

arXiv:1912.00576·cs.CV·December 3, 2019·5 cites

Open Access

TL;DR

This paper introduces a novel skeleton-based 3D human action recognition framework that combines part-wise spatiotemporal features with attention-driven residues, achieving state-of-the-art accuracy on multiple datasets.

Contribution

It proposes a new part-wise spatiotemporal CNN architecture with attention-driven residues for improved skeleton-based action recognition.

Findings

01. Achieves highest top-1 accuracy on benchmark datasets

02. Demonstrates robustness across multiple challenging datasets

03. Highlights local skeleton features effectively

Abstract

Wide intra-class variation within the same action, together with inter-class similarity between different actions, makes action recognition in videos very challenging. In this paper, we present a novel skeleton-based, part-wise spatiotemporal CNN framework for 3D human action recognition, built on the RIAC Network, that visualises the action dynamics part by part and uses each part for recognition through a weighted late fusion mechanism. Part-wise skeleton-based motion dynamics help to highlight local features of the skeleton; they are obtained by partitioning the complete skeleton into five parts: Head to Spine, Left Leg, Right Leg, Left Hand, and Right Hand. The RIAFNet architecture is greatly inspired by the InceptionV4 architecture, which unified the ResNet and Inception based spatio-temporal feature representation concepts, achieving the highest…
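The part partitioning and weighted late fusion described in the abstract can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the 20-joint layout, the joint indices in `PARTS`, the function names, and the fusion weights are all hypothetical, and the per-part score vectors stand in for whatever each part-wise network would output.

```python
# Hypothetical 5-part grouping of a 20-joint skeleton.
# The joint indices are illustrative only, not the paper's layout.
PARTS = {
    "head_to_spine": [0, 1, 2, 3],
    "left_hand": [4, 5, 6, 7],
    "right_hand": [8, 9, 10, 11],
    "left_leg": [12, 13, 14, 15],
    "right_leg": [16, 17, 18, 19],
}


def split_into_parts(frame, parts=PARTS):
    """Split one frame (a list of (x, y, z) joints) into per-part joint lists."""
    return {name: [frame[i] for i in idx] for name, idx in parts.items()}


def weighted_late_fusion(part_scores, weights):
    """Fuse per-part class-score vectors with a normalized weighted sum."""
    total = sum(weights[name] for name in part_scores)
    n_classes = len(next(iter(part_scores.values())))
    fused = [0.0] * n_classes
    for name, scores in part_scores.items():
        w = weights[name] / total  # normalize so the weights sum to 1
        for c, s in enumerate(scores):
            fused[c] += w * s
    return fused


def predict(part_scores, weights):
    """Return the index of the class with the highest fused score."""
    fused = weighted_late_fusion(part_scores, weights)
    return max(range(len(fused)), key=fused.__getitem__)
```

In this sketch each part contributes a class-score vector, and the fusion weights decide how much each part influences the final label. How the actual framework chooses its weights is not stated in the truncated abstract, so they are left as a plain input here.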

Taxonomy

Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Gait Recognition and Analysis

Methods: Average Pooling · 1x1 Convolution · Batch Normalization · Bottleneck Residual Block · Global Average Pooling · Residual Block · Kaiming Initialization · Max Pooling · Residual Connection