Attention-Driven Body Pose Encoding for Human Activity Recognition
B Debnath, M O'brien, S Kumar, A Behera

TL;DR
This paper introduces an attention-based encoding method for 3D body pose data that enhances feature representation by capturing spatial and temporal relationships, improving human activity recognition accuracy.
Contribution
It presents a novel multi-stream attention-based approach that fuses spatial, temporal, and RGB video data for superior activity recognition performance.
Findings
Improved recognition accuracy over baseline models
Effective encoding of spatial and temporal pose features
Successful integration of RGB video and pose data
Abstract
This article proposes a novel attention-based body pose encoding for human activity recognition that presents an enriched, learned representation of body pose. The enriched data complements the 3D body joint position data and improves model performance. In this paper, we propose a novel approach that learns enhanced feature representations from a given sequence of 3D body joints. To achieve this encoding, the approach exploits 1) a spatial stream, which encodes the spatial relationship between various body joints at each time point to learn the spatial structure involving the spatial distribution of different body joints, and 2) a temporal stream, which learns the temporal variation of individual body joints over the entire sequence duration to present a temporally enhanced representation. Afterwards, these two pose streams are fused with a multi-head attention mechanism. …
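The two pose streams and their attention-based fusion can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify layer types, dimensions, or which stream queries which, so the function names (`spatial_tokens`, `temporal_tokens`, `fuse_pose_streams`), the random stand-in weights, the shared width `d_model`, and the choice to let spatial tokens attend over temporal tokens are all assumptions made for illustration. The output projection that usually follows multi-head attention is omitted for brevity.

```python
import math
import random

def spatial_tokens(seq):
    # seq[t][j] is the (x, y, z) position of joint j at frame t.
    # One token per frame: all joint coordinates of that frame, flattened.
    return [[c for joint in frame for c in joint] for frame in seq]

def temporal_tokens(seq):
    # One token per joint: that joint's trajectory over every frame, flattened.
    return [[c for frame in seq for c in frame[j]] for j in range(len(seq[0]))]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def random_matrix(rows, cols, rng):
    # Hypothetical stand-in for learned weights; a real model would train these.
    scale = 1.0 / math.sqrt(rows)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def multi_head_attention(queries, keys_values, n_heads, rng):
    d_model = len(queries[0])
    assert d_model % n_heads == 0, "d_model must be divisible by n_heads"
    d_head = d_model // n_heads
    q = matmul(queries, random_matrix(d_model, d_model, rng))
    k = matmul(keys_values, random_matrix(d_model, d_model, rng))
    v = matmul(keys_values, random_matrix(d_model, d_model, rng))
    out = [[] for _ in queries]
    weights = []
    for h in range(n_heads):
        lo, hi = h * d_head, (h + 1) * d_head
        qh = [row[lo:hi] for row in q]
        kh = [row[lo:hi] for row in k]
        vh = [row[lo:hi] for row in v]
        # Scaled dot-product attention for this head.
        scores = [[sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d_head)
                   for kj in kh] for qi in qh]
        attn = [softmax(r) for r in scores]
        weights.append(attn)
        for i, row in enumerate(matmul(attn, vh)):
            out[i].extend(row)  # concatenate the per-head outputs
    return out, weights

def fuse_pose_streams(seq, d_model=8, n_heads=2, seed=0):
    rng = random.Random(seed)
    spatial = spatial_tokens(seq)    # T tokens, each of length J*3
    temporal = temporal_tokens(seq)  # J tokens, each of length T*3
    # Project both streams to a shared width so they can attend to each other.
    s = matmul(spatial, random_matrix(len(spatial[0]), d_model, rng))
    t = matmul(temporal, random_matrix(len(temporal[0]), d_model, rng))
    # Assumed fusion direction: spatial tokens query the temporal tokens.
    return multi_head_attention(s, t, n_heads, rng)

# Toy input: 4 frames, 5 joints, 3D coordinates.
_rng = random.Random(1)
pose_seq = [[[_rng.random() for _ in range(3)] for _ in range(5)] for _ in range(4)]
fused, attn_weights = fuse_pose_streams(pose_seq)
```

For the toy input, `fused` holds one `d_model`-wide vector per frame, and `attn_weights` holds, for each head, a frames-by-joints matrix whose rows sum to one. In a trained model those rows would indicate which joint trajectories each frame relies on.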
Taxonomy
Topics: Human Pose and Action Recognition · Hand Gesture Recognition Systems · Video Surveillance and Tracking Methods
Methods: Attention Is All You Need · Linear Layer · Residual Connection · Max Pooling · Convolution · Reduction-A · Inception-ResNet-v2-B · 1x1 Convolution · Dropout
