Action Recognition with Image Based CNN Features
Mahdyar Ravanbakhsh, Hossein Mousavi, Mohammad Rastegari, Vittorio Murino, Larry S. Davis

TL;DR
This paper introduces a hierarchical CNN-based feature structure that captures temporal variations and sub-actions in videos, significantly improving action recognition performance over traditional image-based CNN features.
Contribution
It proposes a hierarchical model on top of CNN features and a key-frame extraction method to effectively capture motion and sub-actions in videos.
Findings
Achieves superior accuracy on multiple action datasets.
Effectively captures temporal and sub-action information.
Outperforms existing state-of-the-art methods.
Abstract
Most human actions consist of complex temporal compositions of simpler actions. Action recognition tasks usually rely on complex handcrafted structures as features to represent the human action model. Convolutional Neural Nets (CNNs) have been shown to be a powerful tool that eliminates the need for designing handcrafted features. Usually, the output of the last layer in a CNN (the layer before the classification layer, known as fc7) is used as a generic feature for images. In this paper, we show that fc7 features, per se, cannot achieve good performance for the task of action recognition when the network is trained only on images. We present a feature structure on top of fc7 features, which can capture the temporal variation in a video. To represent the temporal components, which are needed to capture motion information, we introduce a hierarchical structure. The hierarchical model…
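To make the abstract's point about image-only features concrete, here is a minimal sketch, not the authors' implementation. It assumes per-frame descriptors (for example fc7 activations) are already available as a (T, D) array. The function names `mean_pool` and `temporal_difference_descriptor`, and the choice of absolute frame differences at doubling temporal strides, are illustrative assumptions. The abstract is truncated before it describes the actual hierarchical model.

```python
import numpy as np


def mean_pool(features):
    """Order-agnostic baseline: average the per-frame descriptors."""
    return np.asarray(features, dtype=np.float64).mean(axis=0)


def temporal_difference_descriptor(features, levels=2):
    """Hypothetical multi-scale temporal descriptor (an assumption, not the paper's model).

    features: (T, D) array, one row per frame (e.g. fc7 activations).
    For level l, frames that are 2**l steps apart are differenced, and the
    mean absolute difference over time is kept. Levels are concatenated,
    giving a vector of length levels * D.
    """
    features = np.asarray(features, dtype=np.float64)
    if features.ndim != 2:
        raise ValueError("features must have shape (T, D)")
    max_step = 2 ** (levels - 1)
    if features.shape[0] <= max_step:
        raise ValueError("need more than 2**(levels-1) frames")

    parts = []
    for level in range(levels):
        step = 2 ** level
        # Difference between frames that are `step` positions apart.
        diffs = features[step:] - features[:-step]
        parts.append(np.abs(diffs).mean(axis=0))
    return np.concatenate(parts)


if __name__ == "__main__":
    ordered = np.arange(4.0).reshape(4, 1)   # toy 1-D "features" for 4 frames
    shuffled = ordered[[0, 2, 1, 3]]         # same frames, different order
    print(mean_pool(ordered), mean_pool(shuffled))
    print(temporal_difference_descriptor(ordered),
          temporal_difference_descriptor(shuffled))
```

In this toy case, mean pooling gives the same vector for both frame orderings, while the difference-based descriptor separates them. That is the kind of temporal information the abstract says plain fc7 features lack.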
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Video Surveillance and Tracking Methods
