Attention-Driven Body Pose Encoding for Human Activity Recognition

B Debnath; M O'Brien; S Kumar; A Behera

arXiv:2009.14326·cs.CV·October 5, 2020

TL;DR

This paper introduces an attention-based encoding method for 3D body pose data that enhances feature representation by capturing spatial and temporal relationships, improving human activity recognition accuracy.

Contribution

It presents a novel multi-stream attention-based approach that fuses spatial, temporal, and RGB video data for superior activity recognition performance.

Findings

01 Improved recognition accuracy over baseline models

02 Effective encoding of spatial and temporal pose features

03 Successful integration of RGB video and pose data

Abstract

This article proposes a novel attention-based body pose encoding for human activity recognition that presents an enriched, learned representation of body pose. The enriched data complements the 3D body joint position data and improves model performance. In this paper, we propose a novel approach that learns enhanced feature representations from a given sequence of 3D body joints. To achieve this encoding, the approach exploits 1) a spatial stream, which encodes the spatial relationship between the body joints at each time point to learn the spatial structure formed by their distribution, and 2) a temporal stream, which learns the temporal variation of individual body joints over the entire sequence duration to present a temporally enhanced representation. Afterwards, these two pose streams are fused with a multi-head attention mechanism. …
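To make the two pose streams concrete, here is a minimal pure-Python sketch of how one sequence of 3D joints can be read spatially (one row per time point) and temporally (one row per joint), and how two such sets of rows can be combined with multi-head attention. It is an illustration under stated assumptions, not the authors' architecture: the function names are hypothetical, there are no learned encoders or projection weights, and both streams are assumed to already share the same feature width (the toy data uses as many frames as joints so that this holds without a projection).

```python
import math

def spatial_view(seq):
    """seq: T frames, each a list of J joints, each an (x, y, z) tuple.
    Returns T rows; row t holds all joint coordinates at time point t."""
    return [[float(c) for joint in frame for c in joint] for frame in seq]

def temporal_view(seq):
    """Returns J rows; row j holds joint j's coordinates over the whole sequence."""
    n_joints = len(seq[0])
    return [[float(c) for frame in seq for c in frame[j]] for j in range(n_joints)]

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-width vectors."""
    scale = math.sqrt(len(keys[0]))
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out

def multi_head_attention(queries, keys, values, n_heads):
    """Assumption: no learned projections. The feature width is simply cut
    into n_heads equal slices, each slice attends independently, and the
    per-head outputs are concatenated back to the original width."""
    d = len(queries[0])
    if len(keys[0]) != d or len(values[0]) != d:
        raise ValueError("queries, keys and values must share one feature width")
    if d % n_heads != 0:
        raise ValueError("feature width must be divisible by n_heads")
    h = d // n_heads
    heads = []
    for i in range(n_heads):
        sl = slice(i * h, (i + 1) * h)
        heads.append(attention([q[sl] for q in queries],
                               [k[sl] for k in keys],
                               [v[sl] for v in values]))
    return [[c for head in heads for c in head[r]] for r in range(len(queries))]

# Toy sequence: 4 frames x 4 joints, so both views have rows of width 12.
toy_seq = [[(t + j, t * j, t - j) for j in range(4)] for t in range(4)]
spatial_rows = spatial_view(toy_seq)    # one row per time point
temporal_rows = temporal_view(toy_seq)  # one row per joint

# Each time point (query) attends over the joint trajectories (keys/values).
fused = multi_head_attention(spatial_rows, temporal_rows, temporal_rows, n_heads=3)
```

Here `fused` has one row per time point, each a weighted mixture of the per-joint trajectories. In a trained model the two streams would typically pass through learned encoders and projection layers before attention; those are deliberately left out of this sketch.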

Taxonomy

Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Video Surveillance and Tracking Methods

Methods: Attention Is All You Need · Linear Layer · Residual Connection · Max Pooling · Convolution · Reduction-A · Inception-ResNet-v2-B · 1x1 Convolution · Dropout