Complex Human Action Recognition in Live Videos Using Hybrid FR-DL Method
Fatemeh Serpush, Mahdi Rezaei

TL;DR
This paper introduces a hybrid FR-DL method that automates representative-frame selection and key-feature extraction for human action recognition in videos, reporting higher accuracy than six state-of-the-art methods and faster processing.
Contribution
It proposes a novel hybrid approach combining background subtraction, HOG, CNN, LSTM, and Softmax-KNN for efficient and accurate action recognition in live videos.
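As a rough illustration of the first two stages named above (background subtraction and HOG), the following sketch builds a foreground mask by differencing against a static background and then computes a single magnitude-weighted histogram of gradient orientations over the foreground pixels. This is a simplified stand-in, not the authors' implementation: the function names, the static-background assumption, the threshold of 25, and the use of one global 9-bin histogram (real HOG uses per-cell histograms with block normalisation) are all assumptions made for illustration. The CNN, LSTM, and Softmax-KNN stages are not covered.

```python
import math


def subtract_background(frame, background, threshold=25):
    """Binary foreground mask: 1 where a pixel differs from the background.

    `frame` and `background` are equal-sized 2D lists of grey levels.
    The threshold value is an illustrative assumption.
    """
    return [
        [1 if abs(p - b) > threshold else 0 for p, b in zip(frow, brow)]
        for frow, brow in zip(frame, background)
    ]


def orientation_histogram(frame, mask, bins=9):
    """Simplified HOG-style descriptor (one global histogram, not per cell).

    Accumulates gradient magnitude into unsigned-orientation bins
    (0 to 180 degrees) for interior foreground pixels, then L2-normalises.
    """
    height, width = len(frame), len(frame[0])
    hist = [0.0] * bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            if not mask[y][x]:
                continue
            # Central differences approximate the image gradient.
            gx = frame[y][x + 1] - frame[y][x - 1]
            gy = frame[y + 1][x] - frame[y - 1][x]
            magnitude = math.hypot(gx, gy)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle / (180.0 / bins)) % bins] += magnitude
    norm = math.sqrt(sum(v * v for v in hist)) or 1.0
    return [v / norm for v in hist]
```

In a fuller pipeline, a descriptor like this would be computed per selected frame and passed on to the learned stages; how the paper combines HOG with the CNN input is not specified in this summary.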
Findings
Significant accuracy improvement over six state-of-the-art methods.
Enhanced processing speed due to automated frame and feature selection.
Effective recognition of 101 complex activities in the wild.
Abstract
Automated human action recognition is one of the most attractive and practical research fields in computer vision, despite its high computational cost. In such systems, human action labelling is based on the appearance and motion patterns in the video sequences; however, conventional methodologies and classic neural networks cannot use temporal information to predict actions in the upcoming frames of a video sequence. Moreover, the computational cost of the preprocessing stage is high. In this paper, we address the challenges of the preprocessing phase by automatically selecting representative frames from the input sequences. Furthermore, we extract the key features of the representative frame rather than the entire set of features. We propose a hybrid technique using background subtraction and HOG, followed by application of a deep neural network…
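The representative-frame idea in the abstract can be sketched as follows. The abstract excerpt does not state the selection criterion, so this example assumes one for illustration: each frame is scored by the number of pixels that differ from a static background, and the k highest-scoring frames are kept. The function names, the threshold, and the scoring rule are hypothetical, not taken from the paper.

```python
def motion_score(frame, background, threshold=25):
    """Count pixels whose grey level differs from the background.

    Assumed scoring rule for illustration; the paper's criterion may differ.
    """
    return sum(
        1
        for frow, brow in zip(frame, background)
        for p, b in zip(frow, brow)
        if abs(p - b) > threshold
    )


def select_representative_frames(frames, background, k=1, threshold=25):
    """Return the indices of the k frames with the most foreground pixels.

    Indices are returned in temporal order so that a downstream
    sequence model sees the frames in their original sequence.
    """
    scores = [motion_score(f, background, threshold) for f in frames]
    ranked = sorted(range(len(frames)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])
```

Processing only the selected frames, rather than every frame, is what reduces the preprocessing cost: feature extraction runs k times per clip instead of once per frame.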
Taxonomy
Topics: Human Pose and Action Recognition · Anomaly Detection Techniques and Applications · Context-Aware Activity Recognition Systems
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Feature Selection · Tanh Activation · Sigmoid Activation · Long Short-Term Memory
