Semantic Image Networks for Human Action Recognition
Sunder Ali Khowaja, Seok-Lyong Lee

TL;DR
This paper introduces semantic images, a video representation built from segmentation followed by approximate rank pooling, and pairs them with Inception-based networks and an LSTM for human action recognition in videos, reporting state-of-the-art accuracy on standard datasets.
Contribution
It proposes a novel semantic image representation with segmentation and ranking, and a four-stream network architecture that significantly enhances recognition performance.
Findings
Semantic images improve activation and convergence.
Segmentation prior enhances recognition accuracy.
LSTM effectively models temporal variances.
Abstract
In this paper, we propose the use of a semantic image, an improved representation for video analysis, principally in combination with Inception networks. The semantic image is obtained by applying localized sparse segmentation using global clustering (LSSGC) prior to approximate rank pooling, which summarizes the motion characteristics in single or multiple images. It incorporates the background information by overlaying a static background from the window onto the subsequent segmented frames. The idea is to improve the action-motion dynamics by focusing on the region that is important for action recognition and by encoding the temporal variances using the frame ranking method. We also propose the sequential combination of Inception-ResNetv2 and a long short-term memory (LSTM) network to leverage the temporal variances for improved recognition performance. Extensive analysis has been…
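The rank pooling step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the closed-form coefficients commonly used for approximate rank pooling in the dynamic-image literature, and it takes already-segmented frames as input; the LSSGC segmentation and the background overlay are not reproduced here. The function names `rank_pooling_coefficients` and `approximate_rank_pooling` are hypothetical and chosen only for illustration, so this should not be read as the authors' implementation.

```python
import numpy as np


def rank_pooling_coefficients(num_frames):
    """Closed-form weights alpha_t = 2(T - t + 1) - (T + 1)(H_T - H_{t-1}).

    H_t is the t-th harmonic number with H_0 = 0 (assumed formulation).
    """
    # harmonic[k] holds H_k for k = 0..T
    harmonic = np.concatenate(
        ([0.0], np.cumsum(1.0 / np.arange(1, num_frames + 1)))
    )
    t = np.arange(1, num_frames + 1)
    return (2.0 * (num_frames - t + 1)
            - (num_frames + 1) * (harmonic[num_frames] - harmonic[t - 1]))


def approximate_rank_pooling(frames):
    """Collapse a window of frames (T, H, W, C) into a single image (H, W, C).

    `frames` is assumed to hold the segmented frames of one window.
    """
    frames = np.asarray(frames, dtype=np.float64)
    alpha = rank_pooling_coefficients(frames.shape[0])
    # Weighted sum over the time axis
    return np.tensordot(alpha, frames, axes=(0, 0))


# Usage: pool a window of 10 random "frames" of size 8x8 with 3 channels
window = np.random.default_rng(0).random((10, 8, 8, 3))
pooled = approximate_rank_pooling(window)
```

Because these coefficients sum to zero, a window with no motion pools to an all-zero image, so the result depends only on how the frames change over time. The pooled image would typically be rescaled to a standard pixel range before being passed to an image network; that step is omitted here.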
Taxonomy
Methods: Sigmoid Activation · Tanh Activation · Memory Network · Long Short-Term Memory
