Pose-conditioned Spatio-Temporal Attention for Human Action Recognition
Fabien Baradel, Christian Wolf, Julien Mille

TL;DR
This paper introduces a pose-conditioned spatio-temporal attention model for human action recognition that integrates articulated pose and RGB data. It achieves state-of-the-art results on NTU RGB+D and competitive results on SBU Kinect Interaction and MSR Daily Activity 3D.
Contribution
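It proposes a two-stream approach in which features from a convolutional pose network condition a soft-attention mechanism over the RGB stream. A trainable glimpse sensor extracts appearance features at the hand locations given by the pose.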
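The glimpse sensor reads appearance only at a few pose-specified locations (the hands) rather than over the whole frame. The sketch below illustrates that idea. It assumes hand positions are already projected into pixel coordinates, that each glimpse is a fixed-size square crop (the `size=32` default is hypothetical), and that crops near the border are zero-padded. `extract_glimpses` and `glimpse_features` are hypothetical helpers. The paper's sensor is a trainable feature extractor; here it is replaced by a plain linear projection `W` only as a stand-in.

```python
import numpy as np

def extract_glimpses(frame, hand_xy, size=32):
    """Crop square patches centred on each hand location (illustrative sketch).

    frame:   H x W x C array (one RGB frame)
    hand_xy: K x 2 array of (x, y) pixel coordinates, e.g. K = 4 hands
    size:    side length of each crop (hypothetical default)
    Returns a K x size x size x C array.
    """
    h, w = frame.shape[:2]
    half = size // 2
    # Zero-pad the frame so crops near the border keep a fixed shape.
    padded = np.pad(frame, ((half, half), (half, half), (0, 0)))
    crops = []
    for x, y in np.round(hand_xy).astype(int):
        # Clamp the centre to the image, then shift into padded coordinates.
        px = int(np.clip(x, 0, w - 1)) + half
        py = int(np.clip(y, 0, h - 1)) + half
        crops.append(padded[py - half:py - half + size, px - half:px - half + size])
    return np.stack(crops)

def glimpse_features(crops, W):
    # Hypothetical stand-in for the trainable sensor: flatten each crop and
    # apply a linear projection W of shape (size * size * C, D).
    return crops.reshape(len(crops), -1) @ W
```

With four hands per frame (two people), `extract_glimpses` returns four crops that `glimpse_features` maps to four feature vectors. In a trained model, the projection would be replaced by a learned network.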
Findings
Achieves state-of-the-art results on the NTU RGB+D dataset.
Performs well on SBU Kinect Interaction dataset.
Close to state-of-the-art on MSR Daily Activity 3D dataset.
Abstract
We address human action recognition from multi-modal video data involving articulated pose and RGB frames and propose a two-stream approach. The pose stream is processed with a convolutional model taking as input a 3D tensor holding data from a sub-sequence. A specific joint ordering, which respects the topology of the human body, ensures that different convolutional layers correspond to meaningful levels of abstraction. The raw RGB stream is handled by a spatio-temporal soft-attention mechanism conditioned on features from the pose network. An LSTM network receives input from a set of image locations at each instant. A trainable glimpse sensor extracts features on a set of predefined locations specified by the pose stream, namely the 4 hands of the two people involved in the activity. Appearance features give important cues on hand motion and on objects held in each hand. We show that…
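The abstract describes an LSTM that, at each time step, receives features from a set of image locations, weighted by a soft attention conditioned on the pose stream. The sketch below shows one such step. It makes several assumptions. The attention uses a generic additive (tanh) form with hypothetical parameters `W_v`, `W_p`, `W_h` and `w`. The scores combine glimpse features, pose features and the previous LSTM hidden state. The LSTM step is the standard cell with gate order (i, f, o, g). This form is not taken from the paper, whose exact scoring function may differ, and only the spatial part of the spatio-temporal attention is shown.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pose_conditioned_attention(glimpses, pose_feat, hidden, W_v, W_p, W_h, w):
    """Additive soft attention over K glimpses, conditioned on pose (sketch).

    glimpses:  K x D features, one row per hand glimpse
    pose_feat: P-dim features from the pose network at time t
    hidden:    H-dim previous LSTM hidden state
    W_v: D x A, W_p: P x A, W_h: H x A, w: A  (hypothetical parameters)
    Returns attention weights (K,) and the attended feature vector (D,).
    """
    # Pose and hidden-state terms are broadcast across the K glimpses.
    scores = np.tanh(glimpses @ W_v + pose_feat @ W_p + hidden @ W_h) @ w
    alpha = softmax(scores)
    return alpha, alpha @ glimpses

def lstm_step(x, h, c, W, U, b):
    """One standard LSTM step. W: D x 4H, U: H x 4H, b: 4H."""
    z = x @ W + h @ U + b
    i, f, o, g = np.split(z, 4)
    i, f, o, g = sigmoid(i), sigmoid(f), sigmoid(o), np.tanh(g)
    c = f * c + i * g      # update the cell state
    h = o * np.tanh(c)     # expose the gated cell state as the hidden state
    return h, c
```

At each instant, the attended glimpse vector is fed to `lstm_step`, and the new hidden state conditions the attention at the next instant. When all attention scores are equal, the weights reduce to a uniform average over the hands.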
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Hand Gesture Recognition Systems
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
