3D Human Activity Recognition with Reconfigurable Convolutional Neural Networks
Keze Wang, Xiaolong Wang, Liang Lin, Meng Wang, Wangmeng Zuo

TL;DR
This paper introduces a reconfigurable deep neural network for 3D human activity recognition from RGB-D videos, capable of dynamically adjusting its structure to better capture temporal activity variations.
Contribution
It proposes a novel structured CNN model with latent variables that can adapt its configuration during inference, improving recognition accuracy over existing methods.
Findings
Outperforms state-of-the-art activity recognition methods
Effectively models temporal variations in activities
Validated on a large RGB-D video dataset
Abstract
Human activity understanding with 3D/depth sensors has received increasing attention in multimedia processing and interactions. This work aims to develop a novel deep model for automatic activity recognition from RGB-D videos. We represent each human activity as an ensemble of cubic-like video segments and learn to discover the temporal structure of each activity category, i.e., how the activities are to be decomposed for classification. Our model can be regarded as a structured deep architecture, as it extends convolutional neural networks (CNNs) by incorporating structure alternatives. Specifically, we build a network consisting of 3D convolutions and max-pooling operators over the video segments, and introduce latent variables in each convolutional layer that manipulate the activation of neurons. Our model thus advances existing approaches in two aspects: (i) it…
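To make the idea of choosing among structure alternatives concrete, here is a minimal, deliberately simplified sketch in pure Python. It is not the authors' model. It assumes a single-channel depth volume, one layer of hand-set 3D kernels followed by ReLU and global max-pooling, and hypothetical per-segment linear weights. It treats the latent variable as the choice of temporal segment boundaries, inferred by scoring every decomposition and keeping the best one. In the paper, latent variables act inside each convolutional layer; here that is collapsed into a single boundary choice. All names (`conv3d_valid`, `segment_features`, `decompositions`, `infer`) are illustrative.

```python
from itertools import combinations


def conv3d_valid(video, kernel):
    """'Valid' 3D cross-correlation of a T x H x W volume with a kt x kh x kw kernel."""
    T, H, W = len(video), len(video[0]), len(video[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                s = 0.0
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            s += video[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(s)
            plane.append(row)
        out.append(plane)
    return out


def segment_features(segment, kernels):
    """One feature per kernel: ReLU of the 3D convolution, then global max-pooling."""
    feats = []
    for k in kernels:
        response = conv3d_valid(segment, k)
        feats.append(max(max(0.0, v) for plane in response for row in plane for v in row))
    return feats


def decompositions(T, K, min_len):
    """Yield every split of frames [0, T) into K contiguous segments of length >= min_len."""
    for cuts in combinations(range(1, T), K - 1):
        bounds = (0,) + cuts + (T,)
        if all(b - a >= min_len for a, b in zip(bounds, bounds[1:])):
            yield list(zip(bounds, bounds[1:]))


def infer(video, kernels, weights, K):
    """Latent inference: return the best score and the decomposition that achieves it.

    weights[i] is a hypothetical linear weight vector applied to segment i's features.
    """
    min_len = len(kernels[0])  # a segment must be at least as long as the kernel depth
    best_score, best_segs = float("-inf"), None
    for segs in decompositions(len(video), K, min_len):
        score = 0.0
        for w, (a, b) in zip(weights, segs):
            feats = segment_features(video[a:b], kernels)
            score += sum(wi * fi for wi, fi in zip(w, feats))
        if score > best_score:
            best_score, best_segs = score, segs
    return best_score, best_segs


# Toy 6-frame, 2x2 "depth" clip: three empty frames, then three filled frames.
video = (
    [[[0.0] * 2 for _ in range(2)] for _ in range(3)]
    + [[[1.0] * 2 for _ in range(2)] for _ in range(3)]
)
ones = [[[1.0, 1.0], [1.0, 1.0]] for _ in range(2)]  # responds to sustained presence
diff = [[[-1.0, -1.0], [-1.0, -1.0]], [[1.0, 1.0], [1.0, 1.0]]]  # responds to onset
weights = [[-1.0, -1.0], [1.0, -1.0]]  # seg 0: empty, no onset; seg 1: filled, no onset

score, segs = infer(video, [ones, diff], weights, K=2)
print(score, segs)  # 8.0 [(0, 3), (3, 6)]
```

In this toy run the inferred boundary lands on the frame where the clip changes. Exhaustive enumeration grows combinatorially with the number of frames and segments; it keeps the sketch transparent, but a practical system would need a cheaper search.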
